Friday, 2021-02-26

clarkbI do think we've mentioned a decent plan for addressing those now. Just need to figure out if it is viable when cross checking real data00:00
clarkbdoes that seem reasonable to other infra-root and stewie925 ?00:01
clarkbUnrelated: ze01.openstack.org's executor did restart a few minutes ago. I have asked it to pause so that I can docker-compose down it00:03
fungithe suggestion for dealing with duplicate accounts in that scenario, while not ideal, sounds reasonable00:03
stewie925clarkb:  so if those reviews are orphaned they would still be there, just curious as to how I can filter for them?00:04
clarkbfungi: which suggestion for which scenario? Sorry there are a couple things in play there00:04
clarkbstewie925: you'd do a reviewedby:accountid or reviewedby:email00:05
fungisorry, "I do think we've mentioned a decent plan for addressing those now" -> "the third option" -> "more quickly fix the account inconsistencies due to external id email conflicts then revisit this with stewie925 in a week or two"00:06
clarkbstewie925: basically we're in a situation where you can keep the change association or your review/comment association00:06
clarkbstewie925: because we have to decide if the new or old account is what ends up persisting into the future. If you choose to persist the old account then any activity on the new account gets orphaned (since you're not pushing code with the new account it would just be review comments via the web ui).00:06
clarkbif instead you decide to simply move forward with the new account then the changes pushed under the old account would be orphaned. They are all still there in both cases just harder to associate with the current (in the future) account00:07
stewie925clarkb:  ahh ok, what if I stay with the old account, and just purge the new account?00:11
*** tosky has quit IRC00:12
clarkbstewie925: that's the thing that requires us to make the gerrit notedb consistent to do without a downtime. In that case any reviews/comments you made with the new account will be harder to associate but the changes you've pushed should all still be associated to the active account00:13
kopecmartinianw: hi, shouldn't this be set by a var to either refstack01 or refstack depending whether it's a production deployment or not? https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/refstack/templates/refstack.vhost.j2#L1100:17
clarkbyou know what there is something I haven't considered yet (but is dangerous for other reasons)00:18
clarkbI wonder if we force push will gerrit bypass verification of the accounts?00:18
fungiwhee!00:18
clarkbthis is dangerous because if we race someone updating their emails or whatever we'll undo that for them00:18
fungiyep00:19
clarkbbut maybe we should test that on review-test and if it works instead of taking a downtime we can take a virtual downtime where we disable gerrit access00:19
clarkbI dunno just trying to think beyond previous considerations00:19
clarkbstewie925: I'm happy to think on this a few days if you are. And I'll start prioritizing looking at fixing the conflicts in other accounts so we can push updates sanely next week00:21
clarkbdefinitely seems like we won't have an answer today and maybe sleeping on it will produce some great ideas :)00:21
fungior at least produce better-rested people00:22
clarkbstewie925: does that work for you? feel free to ask questions as you have them and we'll do our best00:22
stewie925clarkb:  yes, no rush - take your time00:22
stewie925it's not a priority or rush for me00:22
clarkbgreat. The first step was in understanding the account situation (which I think we have done now). Next we need to let brains do what they do00:23
stewie925clarkb: fungi:  appreciate your help00:23
clarkbas part of doing debugging for ^ I made my http able account admin in gerrit. I have now removed it from that group00:28
clarkbthe force push idea is so tempting if it weren't so risky :)00:28
clarkbI'll work on filing this bug with migrating openids tomorrow morning too00:30
clarkbI don't expect they will fix it quickly or at all, but we may get suggestions on how to address that and we could write a patch or something00:30
clarkbI'll also catch up with ze01.openstack.org tomorrow morning as it is still pausing00:36
*** mlavalle has quit IRC01:08
*** hamalq has quit IRC01:10
*** stewie925 has quit IRC01:17
*** jhesketh_ is now known as jhesketh01:49
*** zbr has joined #opendev03:50
*** dmsimard has quit IRC04:08
*** dmsimard has joined #opendev04:09
*** brinzhang has quit IRC04:17
*** marios has joined #opendev05:35
*** marios has quit IRC06:28
*** lpetrut has joined #opendev07:29
*** icey_ is now known as icey07:43
*** elod has quit IRC07:45
*** elod has joined #opendev07:49
*** slaweq_ has joined #opendev07:50
*** gibi has joined #opendev07:53
*** slaweq_ has quit IRC08:22
*** andrewbonney has joined #opendev08:24
*** yoctozepto5 has joined #opendev08:26
*** yoctozepto has quit IRC08:28
*** yoctozepto5 is now known as yoctozepto08:28
*** yoctozepto6 has joined #opendev08:32
*** yoctozepto has quit IRC08:33
*** yoctozepto6 is now known as yoctozepto08:33
*** fressi has joined #opendev08:47
*** fressi has left #opendev08:47
*** roman_g has joined #opendev08:59
*** DSpider has joined #opendev09:31
*** mugsie_ is now known as mugsie10:16
*** yoctozepto8 has joined #opendev10:47
*** yoctozepto has quit IRC10:47
*** yoctozepto8 is now known as yoctozepto10:47
*** yoctozepto1 has joined #opendev10:54
*** yoctozepto has quit IRC10:55
*** yoctozepto1 is now known as yoctozepto10:55
*** iurygregory has quit IRC11:13
*** marios has joined #opendev11:23
*** tosky has joined #opendev11:25
*** yoctozepto2 has joined #opendev11:31
*** yoctozepto has quit IRC11:33
*** yoctozepto2 is now known as yoctozepto11:33
*** roman_g has quit IRC11:57
*** yoctozepto4 has joined #opendev12:45
*** yoctozepto has quit IRC12:46
*** yoctozepto4 is now known as yoctozepto12:46
*** slaweq_ has joined #opendev12:49
*** zbr9 has joined #opendev13:20
*** zigo has joined #opendev13:20
*** zbr has quit IRC13:22
*** zbr9 is now known as zbr13:22
*** slaweq_ has quit IRC13:31
*** stephenfin has joined #opendev13:38
*** zbr0 has joined #opendev13:43
*** zbr has quit IRC13:46
*** zbr0 is now known as zbr13:46
*** iurygregory has joined #opendev13:49
*** zbr3 has joined #opendev14:03
*** zbr has quit IRC14:05
*** zbr3 is now known as zbr14:05
*** zbr7 has joined #opendev14:08
*** zbr has quit IRC14:10
*** zbr7 is now known as zbr14:10
*** zbr4 has joined #opendev14:33
*** zbr has quit IRC14:35
*** zbr4 is now known as zbr14:35
openstackgerritMartin Kopec proposed opendev/system-config master: refstack: Edit URL of public RefStackAPI  https://review.opendev.org/c/opendev/system-config/+/77629214:36
*** fressi has joined #opendev14:51
*** zbr4 has joined #opendev14:52
*** fressi has left #opendev14:52
*** zbr has quit IRC14:53
*** zbr4 is now known as zbr14:53
*** marios has quit IRC15:00
*** zbr5 has joined #opendev15:12
*** zbr has quit IRC15:14
*** zbr5 is now known as zbr15:14
*** lpetrut has quit IRC15:15
*** tosky has quit IRC15:19
*** tosky has joined #opendev15:20
*** zbr8 has joined #opendev15:30
*** zbr has quit IRC15:33
*** zbr8 is now known as zbr15:33
clarkbze01.openstack.org seems to have paused, going to down it now16:01
clarkb#status log Added new focal ze01.opendev.org and stopped zuul-executor on ze01.openstack.org16:02
openstackstatusclarkb: finished logging16:02
*** lpetrut has joined #opendev16:05
*** Dmitrii-Sh has quit IRC16:11
*** zbr2 has joined #opendev16:11
*** Dmitrii-Sh has joined #opendev16:11
*** zbr has quit IRC16:13
*** zbr2 is now known as zbr16:13
openstackgerritClark Boylan proposed opendev/system-config master: Update zuul-executor shutdown handling  https://review.opendev.org/c/opendev/system-config/+/77776816:14
*** zbr6 has joined #opendev16:15
fungirackspace e-mailed us to say that the refstack.openstack.org instance had to be rebooted16:17
*** mlavalle has joined #opendev16:17
*** zbr has quit IRC16:17
*** zbr6 is now known as zbr16:17
fungiserver uptime is ~3 hours and the site seems to be up16:18
clarkbcorvus: ^ that change is the result of our conversation yesterday. It doesn't look like sighup is used by the executor for anything (and the scheduler deprecated its use for reloading configs). Also the executor seems to be the only service that supports graceful16:18
clarkbfungi: thanks for checking on it16:18
corvusclarkb: lgtm16:22
clarkbfungi: stewie925 is not here but I realized something about the account fixing. We'll also need to reassociate the openid email address with the account to avoid gerrit thinking it should create a new account again. That should also be doable via external id edits so not a huge deal. But it occurred to me that would be an extra step16:23
clarkbfungi: also separately it occurred to me if we're just going to declare bankruptcy on the inactive accounts we can probably use the gerrit rest api to delete their conflicting external ids16:24
clarkbthe upside to this is we should be able to do that one at a time and check on things as we go to ensure we are improving the situation16:24
clarkbthe downside is we won't get nice commit messages16:24
fungi#status log The refstack.openstack.org service was offline 12:57-13:35 UTC due to a localized outage in the cloud provider where it's hosted16:24
openstackstatusfungi: finished logging16:24
clarkbbut I think if we can use that to whittle down the list to the active set then do nicer commit messages for them that may be a win16:24
clarkbas we'll end up with a much easier commit to understand for the accounts that are more important16:25
fungii think we can assume any rest api changes made by one of our administrative accounts during this timeframe is related to cleanup16:25
clarkbya16:25
clarkbthe history won't be impossible to understand, just more robotic16:25
openstackgerritJeremy Stanley proposed opendev/git-review master: Add test helpers for unstaged/uncommitted changes  https://review.opendev.org/c/opendev/git-review/+/77768716:26
openstackgerritJeremy Stanley proposed opendev/git-review master: Don't test rebasing with unstaged changes  https://review.opendev.org/c/opendev/git-review/+/77745616:26
clarkbI mention that because I'm thinking the process might look something like: generate lists of active and inactive accounts, retire all inactive accounts, remove conflicting external ids via rest api for all inactive accounts, rerun consistency checks and confirm the situation is better. Then figure out what is hopefully a much smaller set of active accounts by hand with actual fixes16:27
clarkbthe key bit being we can actually check things are improving as we go rather than needing to do a giant commit that is hard to test and may not actually help16:28
fungiretiring inactive accounts with conflicts, not all inactive accounts, yeah?16:29
clarkbright only looking at the subset of accounts that have problems16:31
clarkbactive and inactive lists are subsets of the "these accounts are inconsistent" set16:31
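The split described above can be sketched roughly as follows. This is a minimal illustration, not the actual audit tooling: the `Account` class, its `inactive` flag, and `plan_cleanup` are hypothetical names, and how an account gets judged inactive is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Hypothetical record for one account that has an external ID conflict."""
    account_id: int
    inactive: bool  # how "inactive" is decided is an assumption left to the caller


def plan_cleanup(conflicting):
    """Split the conflicting accounts into two work lists.

    Only accounts already known to be inconsistent are passed in, so both
    result lists are subsets of that set.
    """
    # Inactive accounts: retire and strip conflicting external IDs via the API.
    automated = [a for a in conflicting if a.inactive]
    # Active accounts: hopefully a much smaller list to fix by hand.
    manual = [a for a in conflicting if not a.inactive]
    return automated, manual
```

Rerunning the consistency check after each automated removal would then show whether the conflict count is actually shrinking.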
*** icey has quit IRC16:33
*** icey has joined #opendev16:34
fungiianw: your rackspace ticket about the error deleting volumes got resolved, so i closed it16:36
*** zbr has quit IRC16:38
clarkbany idea why cacti doesn't have zm01-08.opendev.org in it? Also I expected ze01.opendev.org in it too. The cacti hosts were updated in system-config and puppet ran successfully against cacti16:40
clarkbfungi: this is a fun one, I've just discovered that querying accounts by email over the rest api doesn't always return all results17:01
clarkbwas trying to do that to fully automate the list creation (the gerrit inconsistency error only gives us emails and openids so have to map to accounts)17:01
clarkbwe may need to do an out of band lookup of the account ids, then automate from there :(17:02
clarkbor as another option just do our best with what the api gives us, address those, then figure out what is left after17:02
*** slaweq_ has joined #opendev17:08
fungiclarkb: interesting, i wonder if the rest api is assuming there wouldn't be conflicting accounts with the same address and optimizing by returning the first one it finds17:09
clarkbfungi: well it seems to work in other cases and return the duplicates. I'm looking at the one that doesn't and comparing it to others to see what may be different and not really noticing anything17:10
clarkbone thought was maybe the other account is inactive, but it isn't17:10
fungiyeah, that would have been my next guess too. oh well17:12
fungiwe should probably just assume that the rest api exhibits undefined behavior where e-mail address conflicts are concerned17:13
clarkbya it seems to be good enough in most cases and if we get back < 2 results we can set that account aside for now17:13
clarkbor generate the list externally17:14
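One way to picture the "set it aside if we get back fewer than two results" rule is the sketch below. The function name and the input shape (a mapping of email address to the account ids the lookup returned) are assumptions made for illustration only.

```python
def triage_lookups(results_by_email):
    """Separate conflicting emails we can map to accounts from ones we cannot.

    results_by_email maps an email address to the list of account ids the
    lookup returned for it. A conflict needs at least two distinct accounts,
    so anything with fewer is set aside for an out-of-band lookup later.
    """
    resolved = {}
    set_aside = []
    for email, account_ids in sorted(results_by_email.items()):
        unique_ids = sorted(set(account_ids))
        if len(unique_ids) >= 2:
            resolved[email] = unique_ids
        else:
            set_aside.append(email)
    return resolved, set_aside
```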
*** hamalq has joined #opendev17:16
clarkbfungi: another thing to test is I'm doing all this unauthenticated. Maybe switching to authenticated will give more complete results17:18
fungioh! yes recent gerrit got a lot stricter about user details and particularly e-mail addresses17:25
fungiyou'll basically need to be an admin to fetch lists of e-mail addresses for any account besides your own now17:26
fungibut there may be other related implications17:26
openstackgerritJeremy Stanley proposed openstack/project-config master: Move git-review to opendev tenant  https://review.opendev.org/c/openstack/project-config/+/77777417:36
*** lpetrut has quit IRC17:40
openstackgerritJeremy Stanley proposed opendev/git-review master: Update jobs for opendev tenant  https://review.opendev.org/c/opendev/git-review/+/77779917:54
*** slaweq_ has quit IRC17:54
*** andrewbonney has quit IRC18:15
fungiconfig-core: if someone has a moment to look over https://review.opendev.org/777774 i'd be most grateful, that way i can proceed with pending docs and python packaging polish before trying to tag a git-review release candidate18:19
mnaserfungi: lgtm, do we need to create a change to add project-template 'publish-opendev-tox-docs' ?18:20
fungimnaser: shouldn't need to, it's defined in the tenant we're moving to18:22
mnaserfungi: oh i see, that's why its complaining18:23
fungii'll recheck 777799 once 777774 deploys18:23
fungii'm working on other changes i plan to stack up behind that, but don't want to waste node cycles on them until i know jobs are passing in the new tenant18:24
*** slaweq_ has joined #opendev18:33
clarkbfungi: approved18:38
fungithanks!18:38
*** slaweq_ has quit IRC18:44
openstackgerritMerged openstack/project-config master: Move git-review to opendev tenant  https://review.opendev.org/c/openstack/project-config/+/77777418:50
clarkbok the number of accounts that don't return results as expected through the rest api is a lot higher than I had hoped :(19:01
clarkbthough at least some of them do appear to be due to inactive accounts19:02
clarkbI wonder if there is an option we can pass to say give me the inactive ones too19:02
clarkbthere is!19:05
fungimagic19:08
clarkbya now I just have to figure out how to express +(is:inactive+OR+is:active) in python requests such that gerrit will accept it19:12
clarkbsending it through curl works just fine :/19:12
clarkbI wanted spaces not +'s19:27
clarkbno idea why curl manages to make that work though19:28
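The spaces-versus-plus difference can be reproduced with the standard library alone. The sketch below uses `urllib.parse.urlencode`; as far as I know `requests` encodes a `params` dict in much the same way, but treat that as an assumption. The helper name and the example address are made up.

```python
from urllib.parse import urlencode


def accounts_query_string(email):
    """Build the query string for an account search that includes inactive accounts.

    The query is written with real spaces; urlencode turns each space into
    '+' and percent-encodes the other reserved characters.
    """
    query = f"email:{email} (is:inactive OR is:active)"
    return urlencode({"q": query})


# Spaces in the value become '+', which a server decodes back to spaces.
print(accounts_query_string("user@example.com"))

# A literal '+' in the value is escaped to %2B, so the server sees a real
# plus sign instead of a separator.
print(urlencode({"q": "is:inactive+OR+is:active"}))
```

A plausible explanation for curl working is that a hand-written URL is sent as typed, so its `+` characters reach the server unescaped and are decoded as spaces.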
clarkbthere are apparently still many that the rest api will only return one account for, interesting19:30
clarkbthat must mean there is more than one reason for this behavior :(19:30
clarkbfungi: ok. If I authenticate as a normal user I still only see the one account. If I authenticate as an admin I see both19:41
clarkbI think I'm going to need to put this down for the day though and make sure other work is getting done too19:41
clarkbbut on monday I can update my script to authenticate19:41
clarkband rerun it and hopefully get a complete picture19:41
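For the authenticated rerun, two Gerrit REST conventions matter: authenticated requests go under the `/a/` path prefix, and JSON responses start with a `)]}'` guard line that has to be stripped before parsing. The sketch below covers only URL building and parsing; the helper names are hypothetical and no request is sent.

```python
import json

# Gerrit prepends this guard line to JSON responses.
XSSI_PREFIX = ")]}'"


def accounts_url(base_url, query_string, authenticated):
    """Return the accounts search URL, using the /a/ prefix when authenticated."""
    prefix = "/a" if authenticated else ""
    return f"{base_url.rstrip('/')}{prefix}/accounts/?{query_string}"


def parse_gerrit_json(body):
    """Strip the guard prefix, if present, and decode the remaining JSON."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)
```

With `requests`, the URL from `accounts_url` would be fetched with HTTP credentials supplied through the `auth` argument, and `response.text` passed to `parse_gerrit_json`.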
fungiyeah, strange but not especially surprising19:42
*** slaweq_ has joined #opendev19:44
*** slaweq_ has quit IRC20:22
openstackgerritJeremy Stanley proposed opendev/git-review master: Switch to default Sphinx theme  https://review.opendev.org/c/opendev/git-review/+/77782520:46
openstackgerritJeremy Stanley proposed opendev/git-review master: Overhaul Python package metadata and OpenDev URLs  https://review.opendev.org/c/opendev/git-review/+/77782620:46
clarkbfungi: ^ those lgtm though the overhaul change failed python 3.9 testing (everything else was fine though)21:52
clarkbI +2'd it as I figure that may be a recheck situation and improve the testing as we go item21:52
clarkbinfra-root I've found that the create_graphs.sh script seems to have run successfully for ze01.opendev.org as well as zm01.opendev.org but they don't show up in the list of hosts on the left side of the web ui22:01
clarkbis there another step that needs to be taken?22:01
clarkboh interesting if i go to the list view they seem to be there22:02
clarkbof course now I can't figure out how to get the graphs for ze01.openstack.org on previous days to compare22:05
clarkblooks like memory and cpu use may be better though we're using slightly more swap22:09
clarkbnot seeing any crazy deltas which is good22:10
clarkbI'm thinking the last thing to check up on is if afs using jobs are still happy22:10
clarkbhttps://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_46b/05249aec2333ccad1a83d53cee7295a0784fb028/post/publish-openstack-python-branch-tarball/46b413b/job-output.txt that looks like one happy afs job22:12
clarkbfungi: are there specific jobs that use afs in ways that are different enough we should check them or is one tarball job being happy sufficient evidence you think?22:12
clarkbcacti for the mergers also looks fine in comparison22:16
fungiclarkb: the failure is probably test timeouts, seems to have been hitting that randomly on some nodes even before we added the 3.9 job22:43
fungiwill take a look, thanks22:44
fungiclarkb: i expect a tarball job and a docs publish to be relatively similar, nothing out of the ordinary in afs use between jobs afaik22:45
clarkbcool in that case I think ze01 looks good22:49
clarkbwe can probably continue to replace servers next week in a rolling fashion.22:49
fungiclarkb: yep, as suspected, TimeoutException: https://8f63d1ba12262ef08ef5-137d8ab1f6e795378a0b5ffa2abe6af6.ssl.cf5.rackcdn.com/777826/1/check/tox-py39/59b685c/testr_results.html23:05
fungiclarkb: does this apply to all tests? https://opendev.org/opendev/git-review/src/branch/master/git_review/tests/__init__.py#L20023:08
fungilike can i just bump that from 120 to 300 and increase the timeout for all tests that way?23:08
fungimost of git-review's functional tests share the same basic workflow, and so are equally prone to hit the timeout23:09
openstackgerritClark Boylan proposed opendev/system-config master: Add tools being used to make sense of gerrit account inconsistencies  https://review.opendev.org/c/opendev/system-config/+/77784623:09
clarkbfungi: it will apply to all tests that inherit from BaseGitReviewTestCase23:10
fungiokay, i think that's all the ones which time out23:10
fungithanks23:10
clarkbfungi: ^ if you get a chance can you look over that change? I've largely got things put together now I think and can run that as admin next week23:10
clarkbwould be curious to have input on whether or not that approach makes sense to you for identifying accounts that are completely dead23:11
fungilooks like that timeout was added by https://review.openstack.org/71223 in 2014, so it's lasted a while23:11
openstackgerritJeremy Stanley proposed opendev/git-review master: Increase test timeout to 5 minutes  https://review.opendev.org/c/opendev/git-review/+/77784823:14
fungiclarkb: first thought, we shouldn't care about inactive accounts which are in conflict... we should just strip out their ids, right?23:22
fungiif the account is already inactive, then we clean those up first and see what's still conflicting afterward23:23
clarkbfungi: ya that's sort of what I'm thinking. So my audit script should produce the list of those accounts, then we can write a separate script that deletes the external ids for each of those accounts23:23
clarkbwe can do that via the api23:24
fungiwe only care to try to fix accounts which aren't set inactive in gerrit, so if they're only conflicting with accounts set inactive then retiring the inactive ones may reduce the set dramatically23:24
clarkbI have a todo in the audit script that it needs to distinguish between gerrit inactive and my concept of inactive due to idleness because I think that is a good first pass23:24
fungihere i'm referring to "inactive" in the gerrit sense, not simply stale/unused for a while23:24
clarkbyup23:24
clarkbI don't know that it will be a dramatic reduction but it is a non zero set23:25
clarkbI agree though that that would be a good first pass and updating the script to call out the gerrit concept of inactive is in there as a todo23:25
clarkbbasically we retire and delete external ids for any accounts that are gerrit inactive already23:26
clarkbthat should be entirely safe and can be done without gerrit downtime or mega push23:26
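A separate cleanup script could describe its REST calls before sending anything, which makes a dry run easy to review. This sketch assumes the endpoints as I understand the Gerrit REST documentation (`DELETE /accounts/{id}/active` to deactivate an account, `POST /accounts/{id}/external.ids:delete` with a JSON list of external ids); verify both against the deployed Gerrit version. The function name and tuple layout are hypothetical.

```python
def retirement_requests(account_id, external_ids, already_inactive=True):
    """Describe the REST calls needed to retire one account.

    Returns (method, path, json_body) tuples and sends nothing, so the plan
    can be printed and checked one account at a time.
    """
    base = f"/a/accounts/{account_id}"
    requests_to_send = []
    if not already_inactive:
        # Assumed endpoint: deleting the "active" resource deactivates the account.
        requests_to_send.append(("DELETE", f"{base}/active", None))
    # Assumed endpoint: the body is the list of external ids to remove.
    requests_to_send.append(
        ("POST", f"{base}/external.ids:delete", list(external_ids))
    )
    return requests_to_send
```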
fungias for the query for push/review activity, why not use an "after" constraint in the changes query?23:26
clarkbfungi: mostly that I didn't consider it :) I think that is a reasonable idea. I would probably still do the math directly though to double check gerrit isn't doing something weird23:27
fungiyou can say after:2020-02-26 or whatever and then you won't get any results for changes touched older than that23:27
clarkbbut that is a good way to simplify the input23:27
fungiso much less to iterate through23:27
fungiof course, just because the change was touched more recently than the after date doesn't mean that's activity from our candidate, but it will result in a lot less data to churn through23:28
clarkbyes, that is actually an issue with the current setup. But I think it will get us close enough for now23:28
clarkb"at least a year old" is what that means23:28
clarkbwe shouldn't end up being greedy23:28
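The `after:` idea amounts to adding a cutoff date to the change search. Here is a minimal sketch, assuming a per-account query built from the `owner:` and `after:` operators; the function name and the one-year default are illustrative, and a review-activity check would need its own query.

```python
from datetime import date


def recent_changes_query(account_id, today, years=1):
    """Build a change query limited to changes touched since the cutoff.

    A match only means the change was modified after the cutoff, not that
    this account was the one that touched it, so results still need checking.
    """
    try:
        cutoff = today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        cutoff = today.replace(year=today.year - years, day=28)
    return f"owner:{account_id} after:{cutoff.isoformat()}"
```

Running it with a few different `years` values gives the comparison across timeframes with far less data to iterate through.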
clarkbfungi: there are ~51 accounts that gerrit sees as properly inactive that we can just go and remove external ids for23:31
clarkbwhich is just under 10% of the total number of conflicts23:31
clarkba decent chunk but not going to solve all our problems :) still I think that is a good suggestion for making progress and reducing the problem set so I'll get that done as the next thing23:32
fungiyeah, so not substantial, but at least it's that many fewer to have to ponder23:32
clarkbI also mean to make this audit script do this activity checking on the preferred email has no external id cases to see if we can just turn them all off due to inactivity23:34
clarkbthat's another 20 or so23:34
fungisure23:39
clarkbfungi: do you think ~1year is a decent metric for identifying aged out accounts?23:39
*** hamalq has quit IRC23:40
fungiit's a good starting point23:40
fungilike i said in review comments. we can run it with different timeframes and compare the number23:40
clarkboh I haven't seen the comments yet /me refreshes23:41
fungiit's possible most of these are inactive >4yr or something and so a longer period doesn't mean that many more we have to sort out23:41
clarkbyup good point23:42
*** gothicserpent has joined #opendev23:50
