Wednesday, 2021-03-03

fungiokay, hearing no objections i've pre-written the git-review 2.0.0 release announcement so will push the tag now and then approve 778056 so the docs get republished with the new tag present, and then send the announcement i've queued up00:23
clarkbsounds good00:24
clarkbI'm about to take advantage of some sun and give the brain a rest. Hope to get through the rest of those accounts tomorrow00:24
clarkbianw: it's a whole 12C and that is some of the best weather we have had in a while :)00:25
ianwheh, yeah i can't brag ATM it's like 15C here ATM00:26
fungilooking at the test nodes and node requests graphs today, i think the extra nodes from inap have made a great improvement00:26
fungiwe seem to have lpateaued a few times, but it was off and on, not all day00:27
fungier, plateaued00:27
funginew git-review is up now:
openstackgerritMerged opendev/git-review master: Remove comments for unstaged/uncommitted tests
fungidocs promote job for ^ has completed, next vos release for that volume should occur in two minutes00:48
fungi looks correct now, so sending release announcement01:00
fungiand sent01:01
fungi#status log released git-review 2.0.002:21
openstackstatusfungi: finished logging02:21
openstackgerritIan Wienand proposed opendev/system-config master: [wip] handle zuul-summary-results as .jar / per-project config
ianwclarkb: ^ that appears to be hanging doing gerrit init.  i can't replicate it :/  i have a feeling it might have to do with pulling from the intermediate repository.  anyway, still debugging but if you see anything similar ...02:33
openstackgerritIan Wienand proposed opendev/system-config master: Fix up openafs-client job matching
openstackgerritIan Wienand proposed opendev/system-config master: install-ansible: ensure stevedore
ianwclarkb / frickler: ^03:11
ianwok, i've in-place upgraded afsdb01 to focal now.  went ok, i manually ran base and just had to clear the ansible cache because it picked it up wrong.  i'll do 02 in a bit04:14
openstackgerritIan Wienand proposed opendev/system-config master: [wip] handle zuul-summary-results as .jar / per-project config
openstackgerritIan Wienand proposed opendev/system-config master: [wip] handle zuul-summary-results as .jar / per-project config
openstackgerritMerged openstack/project-config master: Normalize projects.yaml
*** eolivare has joined #opendev07:22
ianw#status log afsdb01 and afsdb02 in-place upgraded to focal07:32
openstackstatusianw: finished logging07:32
ianwi've manually run the base & afs playbooks against them, so i'm confident they will just keep ticking along now07:32
*** rpittau|afk is now known as rpittau07:52
*** jpena|off is now known as jpena08:57
openstackgerritMateusz Kowalski proposed openstack/diskimage-builder master: Change paths for bootloader files in iso element
openstackgerritMateusz Kowalski proposed openstack/diskimage-builder master: Change paths for bootloader files in iso element
openstackgerritDavid Ostrovsky proposed opendev/system-config master: Remove obsolete Bazel spawn strategies
*** dviroel has joined #opendev11:05
*** bhagyashris is now known as bhagyashris|rove11:30
*** bhagyashris|rove is now known as bhagyashri|rover11:30
*** hashar has joined #opendev12:29
*** hemanth_n has quit IRC12:32
fungiianw: thanks, so general rule of thumb is we need to clear the ansible cache immediately following an in-place upgrade?12:50
*** hemanth_n has joined #opendev13:41
*** ykarel has quit IRC13:42
*** ykarel_ is now known as ykarel13:42
*** zoharm has quit IRC14:47
fungipopping out to run some errands but should be back by 16:00z15:01
openstackgerritSorin Sbârnea proposed openstack/project-config master: Add tripleo-ci-health-queries to zuul
clarkbfungi: we need to clear out the ansible cache anytime the details of a particular host change dramatically. Another example would be replacing a server with like for like under the same inventory name aiui15:53
*** zoharm has quit IRC15:55
fungiyep, got it16:05
fungimakes total sense16:05
clarkbianw: thank you for pushing on the afs stuff. I'm trying to finish up the gerrit account stuff but then will take a look at the gerrit init thing16:19
auristorianw: the vlserver and ptservers on afsdb01 and afsdb02 look good from here16:21
mordredhave I mentioned that I think it's super neat that auristor can look at things like that?16:22
auristorI'm only taking advantage of the lack of privacy features of openafs :-)16:23
fungipublic transparency is one of the things i appreciate about it. then again, i miss the days when i could telnet to just about any site as guest and request an account16:24
auristorthe remote administration from anywhere feature is one of the strengths of the afs-family architecture16:25
mordredfungi: I remember thinking the web was stupid (just pages of links to other pages of links) compared to all the lovely ftp sites16:31
mordredI might not be the fastest adopter of new tech16:31
fungii remember thinking it was a mild improvement over gopher, but that was about it16:32
fungithen again, i also thought embedded images and media files were also a passing fad, same for html e-mail16:33
clarkbfungi and I have run the external id cleanups for the ~35 identified inactive accounts17:13
clarkbI am running consistency checks now to do a diff against17:13
clarkbthese were all the accounts similar to smcginnis' where one account is active and the other is inactive. We should've only modified the inactive side (and the script has a check for active accounts and will skip if active)17:14
clarkbalso the logs for that made it into review alongside the logs for previous cleanups17:27
yoctozeptodoes the zuul "eager run" of newly added jobs not apply to project templates?18:08
yoctozeptofor ref see
yoctozeptoI added the project template and thought I would get its job run18:08
yoctozepto(asking to make sure I understand it right)18:09
clarkbit should apply to anything in untrusted config18:09
clarkb defines the template in an untrusted context so I would've expected that to run18:10
clarkbyoctozepto: does that job filter files?18:10
yoctozeptoclarkb: checked, it filters branches18:10
yoctozeptodoes not run on stable branches18:11
yoctozeptobut this is master18:11
clarkbit shouldn't filter branches that way18:11
clarkbI don't think that is the problem but those branch filters never work the way people expect18:11
yoctozeptoyeah, it does not trigger on the followup patch I tried18:12
yoctozeptoso the project template is busted18:12
yoctozeptooh well18:12
clarkbyoctozepto: what happens if you try to add the job directly without the template? if that doesn't work then I would look at the job, if that does work then look at the template18:12
yoctozeptoclarkb: yeah, I think I will try that too18:13
yoctozeptothough I am supposed to use the template18:13
clarkboh osc does not have branches so the branch exclusion for stable is probably ok in this context (it causes problems when you have branches and do exclusions that conflict with the current branch)18:13
clarkbyoctozepto: I think I see why this is happening18:14
clarkbyoctozepto: the openstackclient .zuul.yaml is broken18:14
yoctozeptooh gosh18:14
yoctozeptoend of the world18:14
clarkbyoctozepto: search for openstackclient18:15
yoctozeptooh, nice!18:15
yoctozeptoI am really glad to be the one to find global issues lol18:15
clarkb Unknown projects: openstack/python-karborclient18:15
clarkbseems to be the root cause18:16
yoctozeptoyeah, need to add more instructions for retirement18:16
yoctozeptoclarkb: are you proposing the fix now?18:16
clarkbI am not (sorry still digging into gerrit account stuff)18:18
yoctozeptofwiw, the project is here but obviously retired
yoctozeptoah yes18:19
yoctozeptoit no longer has any zuul config18:19
yoctozeptomakes total sense18:19
clarkbfungi: I've uploaded the newer audit results to review now. As expected the active + inactive set is now empty. We have about 140 accounts that have pushed or reviewed code recently and the rest have not pushed or reviewed code recently18:19
clarkbI want to followup with weshay|ruck on the tripleo ruck rover account before moving too much further ahead as that is in the no pushes and reviews group and I think will give us good insight on further identifying recent usage patterns18:20
*** toomer has quit IRC18:21
clarkbI'm now going to try changing the recency period to 2 years and then 6 months and see if the data drastically changes18:22
yoctozeptonow it runs18:25
yoctozeptoclarkb: btw, gerrit shows you still have "Turkey time"18:28
yoctozeptothat's one big turkey there18:28
clarkbya I always forget to update it :)18:30
clarkbbut also turkey time is a good time18:30
zbrclarkb: fungi: please and thanks.18:30
yoctozeptoI agree18:30
zbrI was wondering about the same thing about clarkb timezone.18:31
yoctozeptoit's not timezone though18:31
yoctozeptoit's status, like vacationing :-)18:32
zbrif the timezone was correct, he would have been a night-turkey18:32
weshay|ruckclarkb, aye.. so we've confirmed tripleo-ci.ruck.rover@gmail is only for listening to gerrit events18:32
weshay|ruckit would never push a review18:32
weshay|ruckand has none18:32
weshay|ruckclarkb, does that answer the question well enough?18:33
clarkbweshay|ruck: yes, I think that basically means that we should be looking to see recent ssh logins as well to determine recent usage18:33
clarkbweshay|ruck: I may dig up more account details in a bit (want to finish up comparing different time deltas), and will bring up any additional questions if they arise18:33
fungiright, i think for anything in that bucket, just grepping the ssh api log for the username is sufficient. we have like a month of retention, should be plenty18:34
weshay|ruckcool, not a problem.. thanks as always for the help18:34
clarkbI suspect we want something like: if username external id is set then check ssh logs for use of that in sshd logs18:34
fungii agree. we could check whether there's an ssh key configured too if desired18:34
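The check discussed above (only look in the ssh logs when a username external id is set, then search the retained log for that username) could be sketched roughly as follows. This is a minimal illustration, not the actual audit script: the function name `used_over_ssh` and the sample log lines are hypothetical, and the real sshd log layout may differ.

```python
import re
from typing import Iterable, Optional


def used_over_ssh(username: Optional[str], log_lines: Iterable[str]) -> bool:
    """Return True if the account's username appears in any retained log line."""
    # No username external id means there is nothing to look for in the logs.
    if not username:
        return False
    # Match the username only as a whole whitespace-delimited field, so that
    # "os-tripleo" does not match a line that mentions "os-tripleo-ci".
    pattern = re.compile(r"(?<!\S)" + re.escape(username) + r"(?!\S)")
    return any(pattern.search(line) for line in log_lines)


# Hypothetical log lines (assumption); the real sshd log format may differ.
sample_log = [
    "[2021-03-03 18:00:01] 1a2b3c os-tripleo-ci a/1000 gerrit.stream-events",
    "[2021-03-03 18:05:42] 4d5e6f someuser a/1001 gerrit.query",
]
```

Matching on whole fields rather than substrings matters here because several account usernames share a common prefix, and a plain substring search would report a shorter name as active whenever a longer one appears.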
clarkbgerrit has a timezone?18:36
clarkbI couldn't find one18:58
mordredI would have thought it would just be the UTC from the server, right?18:58
clarkbthere are about 50 more "recently used" accounts if I switch the recency period from 1 year to 2 years18:58
clarkbit does show you timestamps in your browser specified timezone18:58
clarkbbut I can't see a way to tell the server that such that other people will see it18:59
clarkbnow to see what a 6 month period looks like18:59
clarkbif we look at a 6 month as the recency then we lose 36 accounts19:32
corvusi did not see an expected gerritbot msg; sorry i have to run right now and don't have time to check on it19:32
clarkb17:02:23 <-- | openstackgerrit (trim) has quit (Quit: Changing servers)19:33
clarkbI'll restart it19:34
clarkbthat is done19:34
clarkbhaving 3 data points for "recently used accounts" probably doesn't actually make a trend but it does seem there may be an attrition rate there19:38
clarkbthat actually makes me more comfortable with using a year because it seems that is a good balance between reducing problem set and "these accounts are unlikely to ever notice"19:40
clarkb6 months further reduces the problem set but those accounts are probably more likely to try and push code tomorrow19:41
clarkber maybe I've got that backwards. I should eat lunch then do thinking19:42
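The comparison of recency periods (6 months, 1 year, 2 years) amounts to counting how many accounts have any activity newer than a cutoff. A rough sketch under stated assumptions: the helper `count_recent`, the account names, and the dates below are all made up for illustration and are not the real audit data.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def count_recent(last_activity: Dict[str, Optional[datetime]],
                 now: datetime, window_days: int) -> int:
    """Count accounts whose last push or review falls inside the window."""
    cutoff = now - timedelta(days=window_days)
    # Accounts with no recorded activity (None) never count as recent.
    return sum(1 for ts in last_activity.values()
               if ts is not None and ts >= cutoff)


# Hypothetical data: account name -> time of last push or review.
now = datetime(2021, 3, 3)
last_activity = {
    "account-a": datetime(2021, 1, 1),
    "account-b": datetime(2020, 6, 1),
    "account-c": datetime(2019, 6, 1),
    "account-d": None,
}

# One count per candidate recency period, in days.
counts = {days: count_recent(last_activity, now, days)
          for days in (182, 365, 730)}
```

A shorter window leaves more accounts in the "not recently used" set, which shrinks the remaining problem but raises the chance that one of those accounts is still in occasional use.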
johnsomHello opendev neighbors.  I just wanted to mention an oddity I noticed on the nodejs jobs where zuul has to retry a few times.20:26
johnsomIt seems to be not happy trying to get a chromium package20:27
johnsomMy guess is the remote side for the snap store is having network issues or such. "Download snap "core18" (1988) from channel "stable" (unexpected EOF)"20:28
johnsomFrom what I can see we are moving away from using chrome in the nodejs tests, so this might go away, but thought I would mention it.20:29
clarkbjohnsom: I had no idea that anyone was using snaps for anything. Is chrome not properly packaged anymore (we mirror the packages)20:30
johnsomclarkb Me either, but I think that is the "magic" of focal20:30
johnsomWhen I track the task back:
johnsomIt's an apt install call via ansible, but snap seems to be getting involved20:31
fungisomeone in another channel started asking me about snaps this morning, and i went looking for my torch and pitchfork20:32
fungiyikes! "Transitional package - chromium-chromedriver -> chromium snap"20:33
johnsomI will not start the "debate" about "Is chrome not properly packaged anymore". grin.20:33
fungiseems like that debate is already over anyway20:34
clarkbweshay|ruck: ok, I've found that the two accounts that conflict over the single tripleo rover ruck email address both have usernames and ssh keys configured. However, grepping sshd logs the account with username os-tripleo-ci seems to be actively used but the one with as the username may not be used20:35
clarkbweshay|ruck: do you know if that is the case? If so we'd want to retire the account and remove its conflicting external ids. If you can help confirm that my investigation makes sense that would be great as we can apply rules like this to other accounts20:35
clarkbfungi: ^ I'm thinking the next good update to the audit will be to check if there are accounts with no username and or no sshkeys as those can probably be safely retired if they have also not been recently used20:36
clarkbjohnsom: well in this case I mean in a package and not a container20:36
clarkbjohnsom: since we mirror the packages but not the snap containers20:36
johnsomYeah, I know. It's a "hot" topic these days.20:37
johnsomNot one I have cycles to get in the middle of. grin20:38
clarkb it looks like the package is a transitional package that pulls the snap20:38
weshay|ruckclarkb, ah.. ok this makes sense now20:38
fungii find it interesting that ubuntu would decide to punt their chromium packages to a snap instead of just using debian's (which is also newer than any snap ubuntu seems to have)20:38
clarkbfungi: they may rely on chrom* updating itself within the snap/container20:39
fungi 88.0.4324.182-1, 1:85.0.4183.83-0ubuntu220:39
fungiyeah maybe20:39
fungibut seems silly to install chromium 85 only to immediately upgrade it20:39
clarkbif it gets you out of the business of doing frequent updates I can see the argument for it20:40
johnsomJust for an outside reference20:41
weshay|ruckclarkb, ok.. so os-tripleo-ci is used w/ our tripleo zuul reproducer20:42
weshay|ruckyou can nuke
weshay|ruckbut please leave os-tripleo-ci20:42
clarkbweshay|ruck: cool, that means my investigation produced what appears to be reliable results :)20:43
clarkbweshay|ruck: also fwiw I think what we may do is set a bunch of these accounts inactive, then wait a few days for screaming before doing the more invasive external id removals20:43
clarkbjust to be sure we haven't missed anything20:43
fungijust good to know they're not actively using both accounts20:43
weshay|ruckk.. if we can avoid setting os-tripleo-ci inactive, that would be appreciated.. even if I scream as loud as I can, not everyone will get the message20:45
clarkbweshay|ruck: yup we only have to pick one of the two to deactivate so we will deactivate the one you aren't using20:45
clarkband now I've got more info on improving our audit scripts to better find accounts like those two and classify them20:46
clarkbjust 607 more to figure out :)20:46
clarkbfungi: I'm thinking the next pass might be "no reviews, no pushes, no username, no sshkeys" and see what that produces20:47
clarkbthen the next set will be no reviews, no pushes, and no recent sshd log entries20:48
fungiwhere "recent" is roughly a month's log retention, i think20:48
clarkbthe reviews and pushes should catch those activities outside of the sshd log20:49
clarkbso we'd really be isolating accounts like the tripleo one used only for pulling events and not responding back again20:49
fungii think we discussed this previously, and talked about possibly preserving logs or pulling some from backups if we wanted to check a longer timeframe20:49
clarkbI feel like cross checking against pushes and reviews is probably good enough, but may depend on the size of those datasets20:49
fungiyeah, if they're not leaving comments or pushing changes, then the only conceivable things they could be doing are listening to the event stream or running queries over ssh (the relevant queries over rest api wouldn't need auth anyway)20:51
fungiand i expect those would be frequent periodic or continuous access anyway20:51
clarkbfungi: do you have time to review as the next step in zuul server replacements?20:52
fungiwe might also want to compare against active connections too though? possible our server is so stable they haven't had to reconnect to the event stream for longer than our log retention20:52
clarkbfungi: ya exactly re frequent or continuous20:52
clarkbfungi: we restarted gerrit recently, like 10 days ago?20:52
fungiyeah, so should be good enough if we do it in the next 20 days20:52
clarkbbut also it logs queries made not just connections20:52
fungitrue, but if the only thing they're doing is connecting to the event stream... meh that's probably fairly unlikely you're right20:53
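The two audit passes proposed above ("no reviews, no pushes, no username, no sshkeys", then "no reviews, no pushes, and no recent sshd log entries") can be read as a small classification function. The sketch below is only one possible reading: the `Account` fields, the bucket names, and the assumption that "recent" sshd activity has already been reduced to a boolean are all hypothetical and not taken from the real audit script.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    account_id: int
    has_recent_reviews: bool
    has_recent_pushes: bool
    username: Optional[str]
    ssh_key_count: int
    seen_in_ssh_log: bool  # assumed precomputed from the retained sshd log


def classify(acct: Account) -> str:
    """Place an account into one of the buckets discussed in the channel."""
    # Any recent review or push means the account is in use; leave it alone.
    if acct.has_recent_reviews or acct.has_recent_pushes:
        return "active"
    # First pass: no username and no ssh keys, so ssh use is not possible.
    if acct.username is None and acct.ssh_key_count == 0:
        return "pass1-no-credentials"
    # Second pass: could use ssh, but no entries in the retained log.
    if not acct.seen_in_ssh_log:
        return "pass2-no-recent-ssh"
    # Remaining case: ssh-only users, such as event stream listeners.
    return "ssh-only"
```

Ordering the checks this way keeps accounts like the event-listening one out of both retirement passes, since they fall through to the final bucket and can be reviewed by hand.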
ianwdid we have some known issues with the limestone mirror? e.g.
clarkbit appears to be up now, and no wasn't aware of any21:36
ianwalso seems we lost the gerrit announcer21:38
clarkbhrm I had to restart the gerritbot earlier today because it said it was switching servers then never came back21:41
clarkboh no it saw your stevedore push21:41
clarkbbut you weren't in here21:41
clarkbwas that the one you expected to see? Maybe you were split away too21:42
clarkbya looks like there was a netsplit, possible the bot got caught in it again though21:43
*** roman_g has joined #opendev21:51
openstackgerritMerged opendev/system-config master: Remove
jrosserianw: i had a few jobs with the same ipv6 fail on the limestone mirror today21:54
gothicserpentclarkb, i had an issue with that.. might be a k-line possibly22:07
gothicserpentwas the bot on a vpn?22:08
openstackgerritTristan Cacqueray proposed zuul/zuul-jobs master: cabal-test: add install_args and build_args role var
fungigothicserpent: nope, no vpn. it's just running from a virtual machine in rackspace connecting to directly with the python irc module22:22
fungirackspace's dfw pop to be precise22:22
gothicserpentah ok22:24
openstackgerritLee Yarwood proposed openstack/project-config master: Add custom cirros image with ahci module enabled to cache
ianwmirror-update is quiescent, so i'm thinking the best idea for upgrading the afs fileserver is to just shut it down for a bit and cycle through one by one22:32
ianwalthough docs may write out i guess22:34
ianwi think we can start with ord anyway22:35
fungisounds great22:50
openstackgerritIan Wienand proposed opendev/system-config master: Remove obsolete Bazel spawn strategies
openstackgerritIan Wienand proposed opendev/system-config master: system-config-roles: only match jobs on roles tested
clarkbI've got an audit run going that checks for accounts without usernames and ssh keys now23:54
clarkbif that looks good I'll push up my changes to the audit script and test new git-review in the process :)23:54
clarkbI'm hopeful this pass will give us another good set of accounts to retire and cleanup23:55

