Monday, 2022-05-02

opendevreviewSteve Baker proposed openstack/diskimage-builder master: Parse block device lvm lvs size attributes  https://review.opendev.org/c/openstack/diskimage-builder/+/83982902:04
opendevreviewSteve Baker proposed openstack/diskimage-builder master: WIP Support LVM thin provisioning  https://review.opendev.org/c/openstack/diskimage-builder/+/84014402:04
ianwlooks like we might be having some issues getting an arm64 node03:09
ianwhrm, might be a centos-8-stream arm64 node in particular03:10
ianwlooks like everything in linaro-us is stuck in build, which i think is the problem here03:23
ianwkevinz: ^03:23
ianwwe're just getting a no-info exception ...  raise exceptions.ServerDeleteException(03:26
ianwlooks like kevinz might be on holidays.  we might have to disable linaro-us, but i'm not sure about all this stuff in half-deleted mode, it might be a mess to clean up03:59
opendevreviewIan Wienand proposed openstack/project-config master: Temporarly disable linaro-us  https://review.opendev.org/c/openstack/project-config/+/84015004:06
opendevreviewMerged openstack/project-config master: Temporarly disable linaro-us  https://review.opendev.org/c/openstack/project-config/+/84015005:08
opendevreviewIan Wienand proposed opendev/system-config master: Test openafs roles on CentOS 9-stream  https://review.opendev.org/c/opendev/system-config/+/83984105:08
*** marios is now known as marios|ruck05:08
opendevreviewIan Wienand proposed opendev/system-config master: Test openafs roles on CentOS 9-stream  https://review.opendev.org/c/opendev/system-config/+/83984106:03
*** ysandeep is now known as ysandeep|lunch08:13
*** pojadhav is now known as pojadhav|afk08:49
*** ysandeep|lunch is now known as ysandeep08:53
*** ysandeep is now known as ysandeep|sick08:55
*** rlandy|out is now known as rlandy10:23
*** pojadhav|afk is now known as pojadhav10:27
opendevreviewIan Wienand proposed opendev/system-config master: Test openafs roles on CentOS 9-stream  https://review.opendev.org/c/opendev/system-config/+/83984111:12
*** pojadhav is now known as pojadhav|afk13:41
*** artom_ is now known as artom13:42
clarkbianw: when I had keyboard issues it was related to power on my usb bus. I had to move devices around15:16
*** dviroel is now known as dviroel|lunch15:23
clarkbfungi: frickler: any chance we can move forward with https://review.opendev.org/c/opendev/system-config/+/839422 and https://review.opendev.org/c/opendev/system-config/+/839972 to add more jammy mirroring?15:27
clarkbfor that second one do we need to hold the lock and run the script without a timeout? I think that is the case and I'm happy to do that if so15:27
fungiprobably, otherwise it's likely to take multiple passes in cron before the mirror is ready15:28
fungiwon't break anything if we don't though15:28
clarkbgot it. I'm happy to grab the lock and help it through15:29
clarkbalso ianw  has https://review.opendev.org/c/opendev/system-config/+/837637 proposed to do a bit more fedora mirror cleanup15:29
fungii can single-core approve the jammy changes, they should be entirely non-impacting15:30
clarkbthanks. I'll reboot for local updates then grab that lock on mirror-update15:32
fungii've approved them both15:32
clarkbI've got the lock (sorry got distracted after reboot putting keys back in place and stuff)15:52
clarkbthe lock is held in window 3 of the preexisting screen session on mirror-update. I'll run the script there once the change lands15:52
opendevreviewMerged opendev/system-config master: Mirror Jammy arm64 ubuntu-ports  https://review.opendev.org/c/opendev/system-config/+/83997215:56
opendevreviewMerged opendev/system-config master: Add Jammy Docker package mirroring  https://review.opendev.org/c/opendev/system-config/+/83942215:56
clarkbthe docker mirroring is much much smaller so I won't bother to manually run that one15:57
fungiyeah, it should take no more than a few minutes16:01
clarkbthe ubuntu-ports update script is running now and of course I forgot to turn off the timeout :/16:04
clarkbthat was even the whole point. I'll let it time out. Then rerun it turning the timeout off. In the meantime I think I'll look at making no timeout the default and then set a timeout when run in cron16:05
fungiyeah, that's fine really. it's more that this way you can catch it when it times out and restart it immediately rather than waiting up to 2 hours for it to continue automatically16:07
clarkb++16:09
*** marios|ruck is now known as marios|out16:10
opendevreviewClark Boylan proposed opendev/system-config master: Run reprepro with no timeout by default  https://review.opendev.org/c/opendev/system-config/+/84021416:16
clarkbSomething like ^ that?16:16
fungioh, right, i totally forgot we talked about dropping it completely16:16
clarkbat least for me I never remember it is timing out by default, but if we do it this way it is a bit more explicit in the cronjob command that you copy and run. This means you can drop it easily16:17
clarkbanyway I'll keep an eye on it and restart it once it times out. This time with no timeout :)16:18
*** dviroel|lunch is now known as dviroel16:23
*** rlandy is now known as rlandy|ruck16:25
clarkblooking at the gerrit memory stack's latest results after restacking them I continue to see no appreciable difference in a small setup16:38
clarkbI think that's probably about as good as it will get out of the CI system. Those changes are landable though and will give us useful info16:39
clarkbnow to see if we can ask gerrit/jvm for heap info as well16:39
clarkbI'm curious to see what kind of headroom we have16:39
clarkb`gerrit show-caches --show-jvm` maybe? I'll give that a go in a bit once I've got admin creds loaded16:40
fungiyeah, probably need a much larger and more active gerrit to see problems with it16:45
clarkblooks like we shouldn't need --show-jvm for the heap info so I'll run that show-caches command without that extra flag16:45
clarkbthis command does not return quickly16:46
clarkb'Mem: 96.00g total = 22.32g used + 73.29g free + 399.99m buffers' based on that I'm not very concerned about more memory use in 3.516:47
clarkbwould have to be ~3x more memory use to be a problem16:47
clarkbalso only 166 open files (noting that as discussion about increasing ulimits has come up in the past)16:47
clarkbso ya I think we can land that stack in preparation just to be sure we don't make it worse than it needs to be and we have extra logging info to track request costs. But otherwise proceed with 3.5 planning per usual16:49
fungiyeah, agreed16:49
clarkbfungi: wiki.openstack.org's SSL cert expires in 30 days16:52
clarkbdo we want to reissue another annual cert for that one?16:52
fungiclarkb: yeah, looks like that's what we did last year16:54
clarkbfungi: if you have time to rereview https://review.opendev.org/c/opendev/system-config/+/839251/ and parent that would be great (I just changed the order on them so that we would have CI results showing memory use with the performance logging toggle explicitly disabled)17:02
clarkber rather that we would have memory info with performance logging enabled and disabled to compare (and the comparison is pretty close and boring)17:02
opendevreviewClark Boylan proposed opendev/system-config master: Upgrade gitea to v1.16.7  https://review.opendev.org/c/opendev/system-config/+/84021817:12
clarkbinfra-root ^ that probably isn't super urgent but also should be low impact17:14
clarkbthe ubuntu-ports sync should timeout any minute now.17:31
clarkbfungi: re the gerrit updates there is a grandparent that needs to be approved before those land. I'm happy to see if ianw can review them though17:32
clarkbok reprepro restarted with NO_TIMEOUT=1 set17:36
fungithey're each fairly trivial, and easy to undo if concerns are raised17:40
clarkbfungi: yup I just noticed you approved the parent and child but not grandparent17:41
fungii just plain forgot it was there ;)17:48
clarkbgitea 1.16.7 passed ci. Always reassuring when those bugfixes don't magically stop working18:07
fungiyeah18:07
fungiregression testing is a wonderful thing18:07
clarkbfungi: and I guess we can plan for a gerrit restart later today as load drops? that way we'll get the new logfile18:10
fungiwfm, yeah18:24
*** rlandy|ruck is now known as rlandy|rover18:30
opendevreviewMerged opendev/system-config master: Update Gerrit build checkouts  https://review.opendev.org/c/opendev/system-config/+/83925018:37
opendevreviewMerged opendev/system-config master: Enable Gerrit httpd requestLog  https://review.opendev.org/c/opendev/system-config/+/83997618:37
opendevreviewMerged opendev/system-config master: Explicitly disable Gerrit tracing.performanceLogging  https://review.opendev.org/c/opendev/system-config/+/83925118:39
*** timburke__ is now known as timburke19:24
clarkbwe are mirroring supertux now heh20:08
clarkbkind of cool that that has an arm build20:08
clarkbthe ports mirroring is writing out a bunch of zstd errors and not yet complete21:56
clarkbI assume it will eventually get there though21:56
fungisame errors we saw with the main mirror, no doubt21:58
clarkbya22:02
ianwdistros love changing their zip formats22:02
clarkbianw: good morning. Probably two things to be aware of. We landed a stack of small updates to the gerrit images and config as part of more prep work for 3.5. We should probably restart gerrit on that soon just to be sure it's all working in production. The other thing is I have a change to update gitea to 1.16.722:03
ianwi'll be happy to restart gerrit this afternoon if we like?22:04
clarkbianw: sure. There are two config changes. One to disable performancelogging explicitly as that can apparently consume more memory and the other enables httpd logging so that we can track memory usage for requests (apparently useful for debugging if 3.5 ends up using all your memory)22:05
clarkbthe image updates just bring us up to date with the latest tags on things upstream22:05
ianwgitea lgtm, nothing crazy in the changelog22:05
clarkbon the memory side of things I can't really see it using more memory in CI. And our prod service has plenty of heap space according to `gerrit show-caches` so I'm not too worried about it22:05
ianwthat all sounds good.  we've certainly spent a share of time with gerrit running out of memory with little insight as to why22:06
clarkboh and jammy ubuntu-ports mirroring is in progress with the lock held in the screen (window 3) on mirror-update22:06
ianwalthough i don't feel like we've had anything like that since we moved to the bigger server22:07
clarkbI had hoped that would be done before my day ends in about 2 hours but it's still going so who knows22:07
clarkbianw: ya the bigger server helped a lot22:07
ianwyeah that will take a while, i can pull up that window and drop the lock if it finishes by eod22:07
ianwi want to try looking at centos 9 arm64, i'm hopeful it "just works"22:09
ianwyesterday i fixed publishing the openafs rpms, am just finishing fixing our testing22:09
clarkbianw: oh and I approved your zuul ansible json callback updates22:10
fungiwhich are the remaining changes for c9arm?22:10
ianwfungi: i think we need some node definitions and some general testing22:12
ianwi need to pick it apart a bit and see where we got to22:12
fungioh, and if you want a quick diversion, https://review.opendev.org/839990 is some long-overdue mailing list cleanup (once that merges i can take care of the manual steps)22:14
clarkbThe images are in nodepool and the mirrors appear to have aarch64 packages.22:14
clarkbwe've also got the labels defined. You may just need to use them at this point and see if they work?22:14
ianwyeah something seemed up yesterday but it might have also been my typos :)22:27
*** rlandy|rover is now known as rlandy|rover|bbl22:29
ianwfungi: 839990 lgtm, not sure if we want to link to opendev lists?  i assume it's safe to approve?22:29
fungiwdym "link to opendev lists?"22:30
fungioh, in the docs22:30
fungiwe could amend that, but it's worth a broader discussion on how (and if) we'd want to engage with third-party ci operators22:31
ianwyeah, service-discuss/announce?22:31
ianwok, fair enough it's not a 1:1 replacement scenario22:32
ianwschool run, bib22:32
clarkbfungi: I think we should continue to help them with setting up gerrit accounts and debugging gerrit interactions. But ya I'm wary about trying to solve all their CI issues. Thankfully that hasn't been a major issue for a while22:32
fungiright, what i'm questioning is whether we want to make "subscribe to service-*@opendev" a general recommendation for all third-party ci operators22:33
fungiwe used to tell them to subscribe to third-party-ci-announce@openstack because  at one point we were using that to notify of disabled accounts22:34
clarkbfungi: maybe we should email the accounts email address directly instead?22:34
fungii don't think we'd notify the service-discuss ml about specific accounts22:35
clarkbso something like "ensure your third party ci account's email address is up to date as important info may be sent to that address"22:35
clarkbyes agreed about not using service-discuss for that22:35
fungiyeah, i'm cool with that. i mostly wanted to avoid turning the ml removal change into a broader discussion about how we intend to interface with third-party ci operators, but am happy to do that in a followup patch. for this one i mainly just didn't feel right leaving references in the docs to an ml we're not using and are now removing22:37
corvusapparently the nodepool builders have been restarted frequently but not the launchers22:43
corvusso i will restart the launchers now22:43
corvusit would be cool if maybe when folks are updating them they could maybe try to keep them closer to being in sync?22:43
clarkbcorvus: updating them == the builders? I think the issue is that the builders automatically restart but the launchers do not22:44
clarkbwe could possibly decide that automatically restarting the launchers is safe enough and set them up that way too22:44
corvuswhy do the builders automatically restart (and why don't the launchers?)22:45
clarkbcorvus: I think the reason is that interrupting builds and breaking builds is low impact. We'll just use older images. But if the launchers break all CI will grind to a halt22:45
clarkbin theory the launchers will restart automatically and clean up their previously building nodes and continue where they left off without trouble though so it would mostly be if we land a bug in the launchers that presents a problem I think22:46
corvusyes, that's correct.22:46
clarkbI'd be open to auto restarting launchers too and see how we do. It's been a while since we landed a halt-the-world bug to nodepool (yay testing)22:48
fungii guess the frequent dib releases are the driving interest in automatic builder restarts?22:48
clarkbfungi: I think it's more that we can do automated builder restarts and not break anything user visible22:49
clarkbbecause if image builds break we just continue along with the old images22:49
fungiit seems like every time we need to fix a bug in image building or add support for a new platform, that's a dib change which needs to be merged, released, added to nodepool's minimum to trigger new container image uploads to dockerhub which we pull and restart onto22:49
corvusthough it also means that the builders and launchers were apparently 8 weeks apart in versions22:50
clarkbfungi: yup, but that wasn't why we did automatic restarts22:50
fungii guess remembering to manually restart builders wouldn't be that big of a deal given all the other steps22:50
clarkbyes, if we wanted to go the other direction we could do that too. Make them all manual then manually restart the full set when updating22:51
corvusseems like if we want to move toward CD, taking the next step and doing the launchers is reasonable.22:51
corvus#status log restarted nodepool launchers on a2e5e640ad13b5bf3e7322eb3b62005484e2176522:52
opendevstatuscorvus: finished logging22:52
corvuslooks like nodes are becoming ready, so at least it hasn't blowed up.22:53
clarkbanother thing we could do to reduce the risk there is reduce the interval we update nodepool on. Currently it is every hour. Which means if a change lands upstream in nodepool about an hour later we'd get it22:54
clarkber not reduce it increase it22:55
corvusnl02 has a pre-existing error about     if node.type[0] not in provider_pool.labels:  IndexError: list index out of range22:55
clarkbso that we update nodepool when we expect opendev admins to be around should it stop things22:55
corvusnl04 has a pre-existing error about  Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.22:55
corvus<class 'nova.exception.OrphanedObjectError'>22:55
corvus(by pre-existing i mean both of those appear in logs from before the recent restart)22:56
clarkba nova orphaned object error sounds like fun. nl04 is ovh. amorin may be interested in that one22:56
clarkbfor nl02's error I wonder if when we removed some older label we didn't properly let it steady state without nodes and now we just need to manually delete a zk record22:56
corvusyep that's from ovh gra122:56
corvusre nl02 probably something like that yeah22:57
clarkbI can look at nl02's thing more closely tomorrow22:57
corvus(or maybe that's the fake node for deleting issue that i think may have been addressed at some point recently?  look into that first if you want to look into it)22:58
clarkbk22:58
opendevreviewIan Wienand proposed opendev/system-config master: Test openafs roles on CentOS 9-stream  https://review.opendev.org/c/opendev/system-config/+/83984123:09
opendevreviewJeremy Stanley proposed opendev/system-config master: third-party CI: reminder to keep address current  https://review.opendev.org/c/opendev/system-config/+/84025123:16
fungiianw: clarkb: ^23:16
opendevreviewMerged opendev/system-config master: Upgrade gitea to v1.16.7  https://review.opendev.org/c/opendev/system-config/+/84021823:24
clarkbthat should start applying in a few minutes once the hourly deploy is complete23:24
ianwi guess we probably want to do an upload of our openafs to the ppa to support jammy too23:35
clarkboh right23:36
clarkbunless jammy's package is new enough? I suppose that is possible23:36
ianwtrue; we've generally just ended up on the release we have to work around issues23:39
clarkbgitea01 has updated. Looks ok at first glance23:41
fungi01 seems to be working for me23:41
clarkboh I need to send out an agenda for tomorrow /me does this23:42
clarkbok agenda has been updated, I'll send it out in ~10 minutes giving others to add any additional items23:47
clarkb* giving others time23:47
* fungi has nothing to add23:48
clarkblooks like ubuntu ports updating for jammy has completed. I'll release the lock now23:48
clarkbthe screen is still up but lock is dropped. We can probably close the screen session there if no one else needs it for anything?23:49
fungii don't23:49
fungithough as a general rule, when i'm doing those updates manually, i do try to log all the output to the same log the cronjob uses, so the screen history usually doesn't have much value over that anyway23:50
clarkbyup I logged to the same file. I just kept using the screen since it was there from previous recent mirror updates23:51
fungiah23:51
clarkball 8 giteas are updated now. The job isn't quite complete though23:58
*** dviroel is now known as dviroel|out23:59
