Friday, 2024-04-26

opendevreviewCedric Jeanneret proposed zuul/zuul-jobs master: Toggle synchronize to "quiet" mode  https://review.opendev.org/c/zuul/zuul-jobs/+/91711808:36
yoctozeptomorning folks10:33
yoctozeptoI need a little guidance if it is possible to set some extra requirements on node's hardware in opendev's Zuul10:33
yoctozeptospecifically, NebulOuS uses MongoDB in some places, and it requires AVX nowadays:10:34
yoctozeptoWARNING: MongoDB 5.0+ requires a CPU with AVX support, and your current system does not appear to have that!10:34
yoctozeptoso we have jobs randomly failing10:34
fungiyoctozepto: it's possible that using one of the less general node labels will get you that, at the expense of possibly having jobs wait or end in node_failure on the occasion that some of our providers are offline/unavailable13:13
fungifor example, maybe the providers with nested-virt acceleration also all have avx/avx2 capable processors13:29
yoctozeptoI see, so no clean way, thanks for confirming, fungi! I am now evaluating the alternative of pinning to mongodb 4.4 (it seems unlikely they really require 5.0+)13:33
fungiyeah, it's also a sign that you can't run latest mongodb in at least some public clouds. i have no idea if that's a consideration for your project13:38
fungifrom an end user perspective i mean13:38
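Editor's sketch: a job could check for AVX up front and fail with a clear message rather than hitting the MongoDB warning at random. The snippet below parses `/proc/cpuinfo`-style text for the `avx` flag. It assumes an x86 Linux node, where the feature list appears on a `flags` line; the helper names `cpu_flags` and `has_avx` are hypothetical and not part of any Zuul or opendev tooling.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text.

    Assumes the x86 Linux layout, where each processor entry has a line
    such as "flags : fpu vme ... avx avx2". Only the first such line is used.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx(cpuinfo_text):
    """True if the 'avx' flag is present in the given cpuinfo text."""
    return "avx" in cpu_flags(cpuinfo_text)
```

On a test node this might be called as `has_avx(open("/proc/cpuinfo").read())` in a pre-run step, so the job can skip or fail fast when the provider's CPUs lack AVX.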
opendevreviewAlbin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation.  https://review.opendev.org/c/zuul/zuul-jobs/+/76480814:04
yoctozeptoagreed fungi14:07
Clark[m]Also mongodb 5.0 isn't open source iirc.14:22
Clark[m]Ya all releases after October 2018 ish and 5.0 is from 202114:22
fungihow did i miss that bcachefs made it into linux 6.7?14:37
fungistill considered experimental for now, but encouraging nonetheless14:38
fungilooks like ems did the scheduled maintenance on our opendev.org matrix server 11:10:02-11:23:17 utc14:42
fungii didn't see any problems, but keep an eye out i guess14:42
Clark[m]I'm having a slow start today. Desktop was unresponsive and after a reboot it isn't fscking clean. I cleared out an orphaned inode and now we'll see if there are any clues to the original problem15:18
fungiyikes. hope your drive isn't on its way out15:18
Clark[m]smartctl says it is fine. Not sure how trustworthy that is. It is a new drive and statistically drives fail early or very late in life aiui15:27
Clark[m]I suspect though that this is an old Linux display port bug where it fails to rewake an idle device. And then rebooting was what made ext4 sad. I should probably switch over to HDMI 15:28
fungiah, yeah i've found dp very fiddly, but thought that was due to using a display hub to connect three monitors15:29
Clark[m]Though there is no kernel log for the prior boot. Almost like things ended up read only at some point and I didn't notice (because I put a lot of workload stuff in /home?)15:30
Clark[m]Arg15:30
Clark[m]Oh wait no I misunderstood the flags to journalctl I think15:31
Clark[m]Found it. Kernel panic. NULL pointer dereference for address 000...000815:32
Clark[m]In amdgpu related call stack stuff15:32
Clark[m]This caused xorg to crash which is why I had no more display15:33
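Editor's sketch: the log does not say which journalctl flags were eventually used. One common way to read kernel messages from the previous boot is `journalctl --dmesg --boot=-1`, and the hypothetical helper below only builds that argument list so it can be passed to `subprocess.run`. It assumes systemd's journal is persistent; otherwise earlier boots are not retained.

```python
def previous_boot_kernel_log_cmd(boots_back=1):
    """Build a journalctl command line for kernel messages of an earlier boot.

    --dmesg    restricts output to kernel messages (same as -k)
    --boot=-N  selects the Nth boot before the current one
    --no-pager prints directly instead of opening a pager
    """
    if boots_back < 0:
        raise ValueError("boots_back must be zero or positive")
    return ["journalctl", "--dmesg", "--boot=-%d" % boots_back, "--no-pager"]
```

For example, `subprocess.run(previous_boot_kernel_log_cmd(), check=True)` would print the prior boot's kernel log, which is where a panic like the one described would appear if it was flushed to disk.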
fungiyeah, sounds unpleasant for sure15:37
clarkbok reading the stack trace I think this is in displayport related code. So ya I may be able to workaround this with hdmi if it happens again15:46
clarkband I'm reasonably confident my disk is ok now15:47
clarkbI'm going to punt on further debugging unless it becomes a bigger issue15:47
clarkbalso in the future I may be able to change vtys and debug without rebooting15:49
clarkbI should've attempted that at the start, but it's early in the morning :)15:50
fungidepending on the panic, i can also usually leverage magic sysrq nerve pinches15:59
fungii typically try to do: sync, emergency unmount, reboot15:59
fungithat often avoids leaving the fs dirty16:00
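Editor's sketch: the sync, emergency remount, reboot sequence corresponds to the magic SysRq keys `s`, `u` and `b`. Besides the keyboard combination, Linux accepts the same keys written to `/proc/sysrq-trigger` as root. The function below is a hypothetical illustration of that ordering; it only helps when userspace is still responsive (for example over SSH or on another VT), and the two-second delay is an arbitrary assumption.

```python
import time

# SysRq keys in the order described: sync, emergency remount read-only, reboot.
SYSRQ_STEPS = [
    ("s", "sync all mounted filesystems"),
    ("u", "remount all filesystems read-only"),
    ("b", "reboot immediately, without further syncing"),
]


def send_sysrq_sequence(trigger_path="/proc/sysrq-trigger", delay=2.0):
    """Write each SysRq key to trigger_path in order; return the keys sent.

    On a real system this needs root, and the final 'b' reboots the machine
    at once, so the function will normally not return there.
    """
    sent = []
    for key, _description in SYSRQ_STEPS:
        with open(trigger_path, "w") as trigger:
            trigger.write(key)
        sent.append(key)
        time.sleep(delay)  # give the sync or remount a moment to finish
    return sent
```

Keeping the trigger path as a parameter lets the ordering be exercised against an ordinary file without touching the real kernel interface.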
clarkbdrm_dp_add_payload_part2 is where the stack trace jumps to the page fault routine. I think dp may be displayport so ya16:00
opendevreviewClark Boylan proposed opendev/glean master: Update zuul config to drop xenial jobs  https://review.opendev.org/c/opendev/glean/+/91695216:06
clarkbfrickler: ^ now with python31116:06
opendevreviewClark Boylan proposed openstack/project-config master: Remove last bit of system-config-puppet-apply-jobs usage  https://review.opendev.org/c/openstack/project-config/+/91719816:21
opendevreviewClark Boylan proposed opendev/system-config master: Remove old infra team puppet testing  https://review.opendev.org/c/opendev/system-config/+/91231116:22
clarkbinfra-root if you have time for reviews a subset of https://review.opendev.org/q/topic:%22drop-ubuntu-xenial%22+status:open should be mergeable for some Xenial cleanup. However this is really just scratching the surface so not sure if we want to wait a bit more until we can remove larger portions of config16:23
fungido you have a feel yet for whether ripping out d-g at the same time makes sense, or should happen separately?16:36
clarkbIn my head I think I'd like to do project retirements separately just so that I don't have too much stuff to page into memory. But I think if someone else wanted to get that done it would simplify some cleanup17:00
clarkbGerrit 3.9 removes stars info from the api ChangeInfo responses. I'm like 99.99% certain this is a non issue for Zuul, but it may be a problem for gertty so calling it out here17:05
* clarkb is currently looking at the gerrit 3.9 release note list again17:05
clarkbfungi: if you look at the etherpad the last breaking change has to do with how ssh keys are validated. If you still have your scripting around for looking at ssh key stats maybe you can check if we have any users with bad keys?17:08
clarkbI don't think it is a big deal but would be good to know ahead of time if that is straightforward.17:08
clarkbthe suggested edit feature is a little clunky but it is really cool that we can do that in 3.917:11
clarkbThe other thing that might be good for people to think about is we can optionally enable diff3 diffing for merge changes which may show better context17:12
clarkbtimburke: I believe you're someone that deals with merge changes in Gerrit semi often, do you have any opinion on whether or not the extra context diff3 provides would be helpful to you?17:13
clarkbfungi: I remembered to check the mailman uwsgi-error.log files for 'listen queue' errors and there haven't been any since we updated the webserver stuff17:20
clarkbhowever we also weren't getting those issues every day so still to be determined if that was sufficient for improving it17:20
fungiclarkb: is gerrit deprecating the starring functionality, or just requiring different api methods to get the detail now?17:25
fungiwhat's the link to the pad?17:26
clarkbfungi: they are just requiring you to supply extra flags to changeinfo requests to populate the data17:30
clarkbhttps://etherpad.opendev.org/p/gerrit-upgrade-3.917:30
fungioh, got it. that'll be easy to patch gertty for in that case17:31
fungiclarkb: i guarantee there are ssh keys in our gerrit which will fail that. many had mistyped or completely bogus algorithm fields17:33
fungiis that going to cause a problem for the server, or just for the accounts in question?17:33
clarkbmy read is the only thing it should do is result in extra error logs for existing entries17:34
clarkbthey will continue to function as will the server. New keys won't be able to be added if they fail the criteria though17:34
clarkbeventually we may see the keys stop working for those users though17:34
fungiodds are few, if any, of those accounts are still in use17:36
clarkbthen we're probably fine17:37
opendevreviewMerged openstack/diskimage-builder master: Add tox-py311 job  https://review.opendev.org/c/openstack/diskimage-builder/+/91705817:47
timburkeclarkb, re: diff3 -- i don't know exactly what that would end up looking like. for the most part, though, we (swift) would only see a merge change if someone's updating a feature branch; that person would pretty much always be a core themselves and just merge it once they see tests pass19:05
timburkeso idk that it really matters much (for me)19:05
timburkeif it'd be useful to have an example patch to see what resolved merge conflicts look like today, though, see https://review.opendev.org/c/openstack/swift/+/73538119:07
timburkei just noticed -- the "Size" column in the patchlist is a little funny for merges: it shows XL for that patch, "added 3174, removed 745 lines", but when you go into it the "Delta" summary is a much more reasonable +1 / -28 :P19:11
clarkbtimburke: I think it shows a third file state which is the "base" state19:39
clarkbtimburke: "You can pass --conflict either diff3 or merge (which is the default). If you pass it diff3, Git will use a slightly different version of conflict markers, not only giving you the “ours” and “theirs” versions, but also the “base” version inline to give you more context."19:40
clarkbso it's just extra information/context19:40
clarkbhttps://blog.nilbus.com/take-the-pain-out-of-git-conflict-resolution-use-diff3/ here's a writeup of it19:41
clarkbI think this boils down to deciding if we feel the extra information is helpful or too much noise and should be omitted19:41
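Editor's sketch: to make the "extra information" concrete, the hypothetical function below renders the same conflict in both marker styles. The `ours`, `base` and `theirs` labels are illustrative; real git output labels the sections with ref names or commit ids. Outside Gerrit, the equivalent git setting is `git config merge.conflictstyle diff3`.

```python
def render_conflict(ours, theirs, base=None, style="merge"):
    """Render a conflict hunk using git-style markers.

    style="merge" shows only the two sides; style="diff3" also shows the
    common ancestor ("base") between the ||||||| and ======= markers.
    """
    if style not in ("merge", "diff3"):
        raise ValueError("style must be 'merge' or 'diff3'")
    lines = ["<<<<<<< ours"] + list(ours)
    if style == "diff3":
        lines += ["||||||| base"] + list(base or [])
    lines += ["======="] + list(theirs) + [">>>>>>> theirs"]
    return "\n".join(lines)
```

Calling it with `ours=["timeout = 30"]`, `base=["timeout = 10"]`, `theirs=["timeout = 60"]` and `style="diff3"` shows that both sides changed the original value of 10, which the default style leaves the reviewer to work out.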
clarkbat this point I've managed to go through the entire list of things I put on the etherpad. There is a rather large list of other changes gerrit has made that I should probably skim too for any concerns. But good news is I don't think there are any major problems with the list we've already got20:05
clarkband the proposed edits feature is really nifty.20:07
clarkbupgrade doc now has a rough plan for the upgrade itself. I'll run through that on monday with a held node so that I can fill in details like log info and things to watch out for. I thought having those for the last upgrade was really helpful for ensuring the upgrade was going as anticipated21:02
clarkbI should test it now but I don't want to discover a problem with the upgrade friday afternoon just to stew on it all weekend :)21:04
fungino, no. it's wind-down time21:26
