Monday, 2023-06-05

opendevreviewIan Wienand proposed opendev/infra-openafs-deb master: Update Jammy to 1.8.9  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88520000:01
*** JasonF is now known as JayF00:01
opendevreviewIan Wienand proposed opendev/infra-openafs-deb master: Update Jammy to 1.8.9  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88520000:02
ianw"Could not perform action: move changes endpoint is disabled" ... huh00:04
fungiwhere did you see that error?00:06
opendevreviewIan Wienand proposed opendev/infra-openafs-deb jammy: Update Jammy to 1.8.9  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88520100:06
ianwtrying to move the change from the UI00:06
fungigerrit ui? what exactly does "moving" a change mean in that context?00:06
ianwsorry, yeah the gerrit UI.  there's a "move change" from the hamburger dots.  for i guess like me when you "git review" and push the change to master, when you really meant to push to the "jammy" branch00:08
fungiaha, so retargeting it to a different branch00:09
ianwi guess we can certainly get it fixed upstream; but this would not be the first time we've benefited from running from our own ppa built versions00:09
ianwupstream being ubuntu in this case00:10
fungimore like an sru for jammy00:10
fungikinertic already has it00:11
fungikinetic00:11
ianwthe time consuming part is generally if we want to backport 1.8.9 to the older distros in our ppa to be in sync00:11
ianwprobably a decent argument that they will never get a kernel that breaks this, so leaving well enough alone00:12
fungiyeah, i wouldn't be concerned with <jammy for support of newer kernels00:16
opendevreviewIan Wienand proposed opendev/infra-openafs-deb jammy: Update Jammy to 1.8.9  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88520100:44
opendevreviewIan Wienand proposed opendev/infra-openafs-deb jammy: Update Jammy to 1.8.9  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88520100:53
opendevreviewIan Wienand proposed opendev/infra-openafs-deb jammy: Update Jammy to 1.8.9  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88520101:22
ianwfungi: ^ that works now, and has produced the source @ https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_578/885201/4/check/openafs-deb-package-build/5786e2c/openafs-debs.tar.gz01:32
ianwso on commit it should get pushed to the PPA for build01:33
ianwone thing is I guess we don't have a pre-validation build stage; i'm not sure when i made all this if I forgot that or the idea is that the ppa will build it, and if it fails we have logs, can iterate and push again01:34
ianwi think i'm tending towards the second, because otherwise we're just re-creating the ppa build process but maybe not exactly, so doing more work to find the same thing01:34
ianwthe other thing is I think that it has only worked once under my supervision for 1.8.8.1 so i wouldn't class it as "fire and forget" at this point01:35
ianwit wants review but probably good for someone else to babysit it to reduce the bus factor of me pushing it again01:36
*** amoralej|off is now known as amoralej06:51
fungiianw: thanks! i went ahead and single-core approved the dependency to fix the build job. will wait until there are a few more people awake to determine how we want to test the updated build12:00
corvusgiven that jammy afs appears broken, seems like we can push it to the ppa, pull that onto one of the replacement ze hosts (they're all still running) and try it out?14:34
fungithat works for me. i mainly worried about other jammy hosts auto-updating to that before we know it works. i *think* the changelog scoping the upload to jammy will limit the ppa build to there, though i'm not 100% sure it won't update our packages for older releases14:42
fungii'm happy to approve it and keep an eye on it so we can mitigate if something goes horribly wrong14:42
fungialso i suppose the blast radius is further limited by the fact that the new module will only be used after a reboot14:43
fungiso as long as we test in a timely manner before we get bitten by some critical bit of infrastructure rebooting on us due to an emergency, it should be quite safe14:44
corvushrm good point, it might work on jammy hosts that don't execute that syscall.  do we have existing jammy afs clients?14:53
corvusoh and sorry i just picked up on the concern that it might affect older releases14:53
fungiwell, again since it needs a reboot to swap in the rebuild lkm, i'm not super worried14:54
corvusthen yeah, that is outside my area of expertise.14:54
fungii'll be able to tell for sure once the build finishes whethert it's showing up just for jammy14:54
fungiand as long as we take swift corrective action i think it will be fine14:55
fungii suspect we don't have any jammy-based afs clients yet anyway14:55
fungilooks like backup02.ca-ymq-1.vexxhost:/opt/backups-202010 is at 92%, i'll get a prune going in a root screen session there15:02
opendevreviewMerged opendev/infra-openafs-deb jammy: Update Jammy to 1.8.9  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88520115:02
fungiunderway15:03
clarkbnote that 885201 will update static's openafs deployment15:13
clarkbI don't think that is necessary a problem but we should be aware of it15:13
fungithanks, yeah i didn't realize static was upgraded to jammy already, but that's a good one to keep an eye on15:14
fungiit's read-only though i think, so likely why we didn't hit the same problems on it as the executors?15:14
clarkbya I'm guessing the issue the executors hit was due to write activity and not read activity since static has been fine as far as I can tell15:14
fungihttps://launchpad.net/~openstack-ci-core/+archive/ubuntu/openafs now has 1.8.9-1~ppa1~jammy15:23
fungii'll try an apt update on a focal server just to make sure it's not showing up, and a jammy server to make sure it is, then we can try upgrading one of the new executors15:23
fungioh, i guess that was the upload notice not the build notice we got15:27
fungiit's not showing up with an apt update yet15:27
fungithough i notice apt complains about the fact that we have two files trying to add the same ppa to sources15:28
fungi/etc/apt/sources.list.d/openafs.list and /etc/apt/sources.list.d/ppa_openstack_ci_core_openafs_jammy.list15:28
fungiaha, added by both our openafs-server-config role and our zuul-executor role, i think?15:29
clarkbI suspect static won't update until unattended upgrades occur. We may want ot get ahead of that?15:35
clarkbbut ya testing on servers not yet in use first is a good idea15:35
fungihttps://launchpad.net/~openstack-ci-core/+archive/ubuntu/openafs/+build/26277525 says the build completed 23 minutes ago15:37
fungibut an apt update on the replacement ze01 isn't showing it available15:37
fungistatic02 does show them as available though15:39
clarkbload balancer differences maybe?15:40
fungiaha, it was the duplicate sources.list entries causing it to be ignored. i moved one of them to an invalid filename and now it's appearing15:40
fungireview02 is running on focal and doesn't see updated openafs packages, so that's good15:44
fungion 104.130.124.246 (replacement ze02) i'm running `sudo apt upgrade` which should pull in the new builds of openafs-client, openafs-krb5, and openafs-modules-dkms15:46
fungier, replacement ze01 i mean15:46
fungiit's also applying some new security updates for grub, perl, and libssh15:46
fungidkms is going to take a few minutes to build the new lkm15:48
clarkbany idea if it will modprobe the update successfully replacing the old one or do we need a reboot? In particular i wonder because one is from a ppa and the other from upstream and not sure they properly replace each other (I think they do)15:54
fungiit looks like it did indeed modprobe it in and restarted the client15:55
fungithough we may want to reboot the machine anyway just to be on the safe side15:56
fungialso i totally forgot that module hotloading was possible for this, so yes the risk with static02 is a little higher than i estimated15:56
fungicorvus: any concerns with rebooting the replacement ze01? i see there is an executor container currently running on it, not sure if it needs to be stopped gracefully first15:57
fungiassuming things go well with the reboot, we can try upgrading static02 and make sure stuff served out of afs is still browseable thereafter15:58
corvusfungi: it should be a zombie16:02
corvusfungi: we should docker-compose down it16:02
corvusso docker-compose down, then reboot should be okay.  if something goes wrong and it does start up, then i think it will either work or safely crash, so no worries there.16:03
fungion it, thanks!16:04
fungidowned the container and then rebooted the server16:05
fungirebooted successfully, container does not appear to have started on boot16:07
fungias a cursory check, i'm able to browse /afs/openstack.org and /afs/.openstack.org from it, so static02 will likely be fine16:08
fungicorvus: did you have a manual reproducer for the bug you saw which i can test on the replacement ze01 or should we just try to swap it back into service?16:11
corvusfungi: no i don't; gimme 1 minute to see if i can repro on new-ze0216:16
corvusfungi: as root:  `docker run --rm -it --privileged -v /afs:/afs -v /etc/openafs:/etc/openafs quay.io/zuul-ci/zuul-executor:latest zuul-bwrap /tmp /bin/sh`16:21
fungithanks16:21
*** amoralej is now known as amoralej|off16:21
corvusfungi: if it works, you'll get a bunch of bwrap log entries and a /bin/sh prompt; if not, you'll get dropped right back into your current shell with a bad return code and no output16:21
corvus(and a kernel trace in dmesg)16:22
clarkbcorvus: oh wow the errors were just starting the container then? I guess idfferent than what I expected given static is happy but static isn't running in a container either so that could be the difference16:23
fungiBUG: kernel NULL pointer dereference16:23
fungi:(16:23
clarkbfungi: and I guess you double checked that the new module is installed and not the old one?16:26
corvusclarkb: it's in the bwrap check on startup when it tries to make a pag16:27
clarkbfcntl.ioctl(open("/proc/fs/openafs/afs_ioctl"), 0x40084301, struct.pack("lllll", 0, 0, 0, 0, 21)) I guess16:28
fungi`uname -a` says we're booted on 5.15.0-73-generic and the last update time on /lib/modules/5.15.0-73-generic/updates/dkms/openafs.ko was last modified half an hour ago16:34
clarkbfungi: I wonder if pagsh works16:34
clarkb(mostly out of curiousity I don't know that this would fix anyhting for us, but maybe it would point to us trying to hit internal apis that are no longer supported?)16:34
corvushttps://opendev.org/zuul/zuul/src/branch/master/zuul/driver/bubblewrap/__init__.py#L83-L8916:35
corvusclarkb: yeah, i think it's worth a check whether that is still the right api/data16:35
fungialso `find /lib/modules/ -name openafs.ko` only finds the one module so there's no other version it could be loading i don't think16:35
clarkbfungi: cool16:36
clarkblooking at the old strace we're sending a TCGETS ioctl request and getting back an inappropriate ioctl for device. fungi is the new traceback similar? That might lead some additional evidence towards doing the wrong thing with ioctl there now?16:48
clarkbI think struct.pack("lllll", 0, 0, 0, 0, 21) is setting AFSCALL_SETPAG which resets the PAG. I'm not sure about the request value though as that seems to line up with TCGETS16:58
fungii did not see any traceback. you mean strace?16:58
fungii can try to run the container under strace i think16:58
fungido i need to strace docker-run or zuul-bwrap?16:59
fungilooks like there's no strace inside the container image17:00
clarkbfungi: ya I meant strace but also dmesg should have something? You want to trace zuul-bwrap17:00
fungiahh, the call trace from the kernel, yeah17:01
corvuspython3 -c 'import fcntl; fcntl.ioctl(open("/proc/fs/openafs/afs_ioctl"), 0x40084301, struct.pack("lllll", 0, 0, 0, 0, 21))'17:01
corvusshorter reproducer with no docker/etc.  can just run that from the host17:01
clarkbI think https://github.com/openafs/openafs/blob/630d423897e5fffed1873aa9d12c4e74a8481041/src/sys/glue.c#L24-L44 is where afs is giving us an entrypoint to do this sutff17:04
fungihttps://paste.opendev.org/show/bgS2qV5CUcp1IGlqNcOE/17:05
fungithat's what dmesg reports from the kernel oops17:05
clarkbhttps://github.com/openafs/openafs/blob/630d423897e5fffed1873aa9d12c4e74a8481041/src/config/afs_args.h#L258 This defines the request in terms of _IOW()17:06
fungiyeah, on the updated server, corvus's python-based reproducer results in "Killed" on the terminal and a 137 exit code17:08
clarkbfungi: you should be able to trace that17:09
corvusclarkb: https://paste.opendev.org/show/byum0QqoWxpe9Ow6Li3e/  double checking current values17:10
corvus(that's on a local jammy host)17:11
fungihttps://paste.opendev.org/show/b4gQlSSvajxWGSi728Aa/17:11
fungithat's the strace on the ze01 replacement17:11
clarkbok so unlikely that we're hitting an invalid interface and instead the bug is likely in openafs :/17:11
clarkband TCGETS must just be an unfortunate collision and strace it trying to be helpful17:11
corvuswhat if the tcgets is a python thing and it's causing the issue?17:12
corvusopenat(AT_FDCWD, "/proc/fs/openafs/afs_ioctl", O_RDONLY) = 317:13
corvusioctl(3, _IOC(_IOC_WRITE, 0x43, 0x1, 0x8), 0x7ffcea36e210) = 017:13
corvusclose(3)                                = 017:13
corvusthat's from pagsh17:13
corvuslike python itself is performing an unrequested tcgets17:14
clarkboh weird17:14
clarkbyou can set strace --raw and set a syscall value for it to not interpolate arguments17:14
clarkbmaybe we do that to see the raw value being passed there just ot be sure it isn't a some helpful mixup but actually a different value?17:15
clarkbhrm its trying to do something with encodings17:17
clarkbis it possible that is just a red herring? its doing that to figure out terminal stuff then failing on our thing later?17:17
clarkbno scrolling to the end of fungi's paste the last ioctl it does is that after opening the file we want to talk to 17:18
clarkbunless strace can't log the lsat ioctl (the one we care about) because it crashes before strace can strace it? (I didn't think strace worked that way though)17:19
fungii could try to strace it with a null stdin to convince python3 there is no terminal17:19
clarkbor we just need a little C program to do it17:19
clarkbsee if that reproduces or not17:19
clarkbI guess that is what pagsh does though and it worked according to coruvs? though not sure where that ran17:20
fungistill does a tcgets when i </dev/null17:20
corvushttps://paste.opendev.org/show/bMllXA2i6CFZxvGeNGOT/17:21
corvusthat works for me17:21
corvusi think it's the high-level python open method17:21
clarkbI'm guessing older openafs just sort of ignored what python was doing? or maybe it answered properly and now it doesn't and we explode? this is a really fun bug :)17:22
fungior python on jammy is trying to be helpful17:22
clarkbcorvus: does open() work if we open it in byte mode17:22
clarkbfungi: zuul is on python3.11 from custom compiles (but also jammy's python seems to reproduce)17:23
fungioh, good point17:23
clarkbya I'm guessing if we switch to binary mode we might be able to do the high level open17:23
fungiso we're not seeing it with python 3.11 on older kernels17:23
corvusclarkb: "rb" doesn't fix it17:23
clarkbfungi: correct17:23
clarkbcorvus: darn, was worth a shot I guess, but switching to os.open seems reasonable?17:24
corvus(nor does "wb")17:24
fungikeep in mind we were seeing this break with jammy on the openafs version we use with focal17:24
corvusyeah working on the os.open change17:24
clarkbfungi: no I don't think so17:24
clarkbfungi: jammy nodes should use jammy openafs17:24
clarkb(because it is newer than what was in the ppa)17:24
fungiclarkb: anf our focal nodes use jammy openafs backported to focal via ppa17:24
clarkbfungi: no the ppa is older than jammy. I dug into this a bit when I deployed static17:25
clarkbstatic "properly" installed jammy openafs from upstream and not our ppa because apt found a newer version in upstream17:25
clarkbbut ya it could be the kernel itself being more discerning about ioctl requests against specific endpoints17:25
clarkband have nothing todo with afs itself other than afs not supporting that request17:26
fungi1.8.8.1-2~ppa0~focal vs 1.8.8.1-3ubuntu2~22.04.117:26
clarkbah ok so minimally newer17:26
fungiso the patches from -3ubuntu2 are the difference i guess17:26
clarkbfungi: either that or the kernel itself being to blame regardless of the openafs version17:27
fungiyeah, that's why i was leaning toward it being kernel-version-specific, but you're right the packages don't have exactly the same source17:27
fungii'll try to compare patches in a sec17:27
tonybQuestion from the peanut gallery?  that strace seems to have the ioctl file opened read-only shouldin't it we write-only?17:28
corvuswrite access isn't required for ioctls17:28
clarkbbut also it is doing a TCGETS which is a read operation17:29
clarkbthe problem here is python trying to infer encoding types for the file open I think17:29
corvus(it's definitely a good thing to check, and i did check "wb" on the high level open call just to make sure something weird isn't happening)17:29
clarkband to do that it does a TCGETS and that explodes on ioctl devices that don't expect it17:29
clarkbI'm guessing this works on normal files in a less explodey manner17:29
corvus(but since it isn't/shouldn't be necessary, best to request minimal perms so we don't run into a permission denied error)17:30
clarkbcorvus: fungi: so I guess the next steps are to land this then restest on new ze01 and new ze02 and decide if we even need new openafs for jammy?17:30
corvusclarkb: indeed -- my earlier reproducer (the python one-liner) actually has a bug in that i forgot the 'import struct' which means it was failing before we even got around to doing the ioctl, it really was just at the open() call.17:31
corvusclarkb: wfm17:31
clarkbok my only concern with that is we be aware of static's state since I think unattended upgrades will update it if it hasn't already17:31
clarkbthose run around 0600 UTC?17:31
corvusmight even be a good idea to pull that onto those from the intermediate registry; get answers sooner and without landing it17:32
clarkb++17:33
clarkbcorvus: I'm going to leave a comment about error checking :)17:33
corvusclarkb: i was going for like-for-like17:35
clarkbcorvus: posted. Ya but open() raises and handles some errors a bit more graceuflly I think. We don't have that anymore so need to check if the fd is -117:35
opendevreviewNeil Hanlon proposed openstack/diskimage-builder master: Set platform argument in container build command for cross-arch builds  https://review.opendev.org/c/openstack/diskimage-builder/+/88445117:35
clarkbcorvus: but ya its probably not critical either17:36
corvusclarkb: nothing about the old code was graceful; i think the new code will explode in equivalent ways17:36
clarkback17:36
clarkbcorvus: I guess a followup can add more error checking then. I'll +2 this17:36
corvusi mean we don't need/want error checking17:36
corvusif this fails, we want it to crash17:36
corvusif it does something other than crash, then we need error checking17:36
clarkbcorvus: looks like -1 does raise if you try to operate on it as an fd so it will crash17:37
corvusright, so this code has exactly the right error checking17:38
corvusi do not think we should mask the exception which is what adding a try/except handler would do17:38
clarkback wfm17:38
clarkbfwiw there are libraries that will calculate the _IOW values dynamically at runtime. If we need this to be more portable we could do that but seems that is overkill since we haven't needed it yet. That said maybe we should add a commentabout where that magic number comes from. I'll push that up on top of corvus' change17:43
clarkbcorvus: fungi: where do I find the new executor IP addrs? 17:57
fungiclarkb: in the revert commit near the top of system-config17:58
clarkb`python3 -c 'import fcntl; import os; import struct; fd = os.open("/proc/fs/openafs/afs_ioctl", os.O_RDONLY); fcntl.ioctl(fd, 0x40084301, struct.pack("lllll", 0, 0, 0, 0, 21))'` works on jammy ze0218:01
clarkbimplying distro openafs is ok18:01
clarkbhttps://zuul.opendev.org/t/zuul/build/a79927386da74218aa7394ee7d5875b2/artifacts we have artifacts now if we want to test zuul stuff on new ze01/ze0218:02
fungiso we should be able to unwind the ppa update18:02
clarkbfungi: I suspect so but we can test with the zuul images above on ze01 and ze02 to determine that. We also need to make sure we don't orphan that package on some hosts if they already pulled it18:03
clarkbfungi: maybe you want to pull the executor image from ^ and run your bwrap tests again?18:03
clarkbI think that is a good next step18:03
fungihow do i pull from the buildset registry?18:11
corvusyou can do that with right-click copy link for the image in question on the artifacts page; so it'd be like "docker pull insecure-ci-registry.opendev.org:5000/quay.io/zuul-ci/zuul-executor:a79927386da74218aa7394ee7d5875b2_latest" and then "docker run ..." same thing18:11
fungiaha, thanks18:11
fungiyou anticipated my question18:11
clarkbI'm trying to sort out in my head if we report this as a bug to either python or linux18:12
clarkbI think python probably should handle this more gracefully particularly since open() worked before. But at the same time linux says you shouldn't break userspace...18:12
clarkbmaybe we do neither and just accept that for ioctls using os.open is a good idea anyway18:12
corvusbut linux might stop reading at "tainted"18:13
clarkbya we might have to reproduce on some other ioctl with python18:13
fungisudo docker run --rm -it --privileged -v /afs:/afs -v /etc/openafs:/etc/openafs insecure-ci-registry.opendev.org:5000/quay.io/zuul-ci/zuul-executor:a79927386da74218aa7394ee7d5875b2_latest zuul-bwrap /tmp /bin/sh18:15
fungi[...]18:15
fungi#18:15
fungiyay!18:15
corvus\o/18:15
fungihole in one18:15
clarkbwoot18:15
clarkbso now we have two threads to pull on. Which openafs version do we want to use on jammy? And do we proceed with replacing executors?18:16
fungii vote we roll back to distro original packages on jammy18:16
fungiwhich presumably means reverting the build change and then cleaning up the ppa18:16
clarkbI'm good with that. We can always roll forward again too if we need to by reverting18:17
fungiand then double-checking the jammy hosts to make sure they haven't updated18:17
fungiworking on the change first18:17
clarkbstatic02 has not updated according to dpkg -l18:17
clarkbfungi: before we do that can you run our bwrap test on ze02?18:17
clarkbfungi: since that will be a bit more complete than the test I did18:17
fungisure, i can repeat it there18:18
fungi104.130.127.14 is the replacement ze0218:18
fungistill running openafs 1.8.8.1-3ubuntu2~22.04.118:19
fungipulling now18:19
fungistill worked!18:21
fungii've gone ahead and approved the zuul change18:21
clarkbgreat so ya I think trying distro openafs to start with is good and we can always do this update to openafs if we need to later18:21
clarkbI'm not sure I understand what made https://review.opendev.org/c/opendev/infra-openafs-deb/+/885201 jammy only18:25
opendevreviewJeremy Stanley proposed opendev/infra-openafs-deb master: Revert "Update Jammy to 1.8.9"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88530418:26
fungiclarkb: corvus: ianw: ^18:26
clarkbfungi: ^ do you know why that only made jammy packages?18:26
fungiyes, apparently we upload for specific ubuntu releases to build18:26
clarkbright but what controls that?18:26
fungithe repo itself is branched, and the jammy branch uploads for jammy18:26
clarkbaha I missed that was on the jammy branch. Thanks18:27
fungiyeah, me too originally, until i realized what i was looking for/at18:27
clarkbfungi: any idea what the process is there? Do we need to cleanup the ppa first? then land the change?18:27
clarkbI don't know if the ppa kept our old version around. I guess it doens't matter in this case bceause we'll grab the distro version.18:27
clarkbSo we can probably land this chagne. It will push the old stuff and then we delete the new stuff?18:28
clarkb+2 on https://review.opendev.org/c/opendev/infra-openafs-deb/+/885304 either way. Just let me know if I can help with the order of operations or anything18:28
fungiyeah, i'm assuming we clean up after that pushes18:31
fungithough i expect we can clean up sooner as long as nothing triggers a new upload18:32
clarkblooking at https://launchpad.net/~openstack-ci-core/+archive/ubuntu/openafs/+packages lading the revert may just replace the package? it isn't keeping older packges there18:33
fungihttps://help.launchpad.net/Packaging/PPA/Deleting18:33
clarkbassuming it doesn't do any special version comparisons then I think it would just replace newer thing with newly built older thing?18:33
fungiyeah18:33
fungiand then we presumably delete both18:33
clarkbI think it would just be the one and ya we could decide if we need to delete it or not18:34
clarkbbut ya I think first step is landing that revert and see what happens18:34
fungiagreed. if there end up being two versions then we can delete both at the same time18:34
clarkbsounds good I need to sort out lunch momentarily but I'm mostly around if we want to do that18:35
fungialso, thinking about it, unattended-upgrades probably won't upgrade it automatically since it's not a security update?18:35
clarkbfungi: oh good call. We might need the daily ansible runs to do it if the role explicitly updates the package? or maybe it was alwys a manual thing18:35
fungii'm around to continue hammering on this, so happy to monitor it18:35
clarkbfungi: approve at will then I guess18:35
fungithanks18:35
Clark[m]fungi: did you want me to approve it or are you waiting for a good time spot?19:00
fungioh, sorry, i misread earlier and thought you said you approved it. will do now19:19
fungithough chris just asked if we can go out to get something to eat, so maybe i should hold off?19:20
fungiregardless, i don't think the new upload is going to cause any issues that the earlier one wouldn't, so let's just go ahead and get it rolling19:21
fungiand approved19:21
opendevreviewMerged opendev/infra-openafs-deb master: Revert "Update Jammy to 1.8.9"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88530419:22
Clark[m]Sounds good19:22
fungiokay, headed out for a bit, shouldn't be more than an hour and then i'll keep working on this19:28
clarkbok that change didn't actually do anything19:47
clarkbbecause it was pushed to master19:48
clarkbfungi: ^ 19:48
clarkbhow did that even make sense to git I'm really confused now19:49
tonyborgin/master and origin/jammy were the same19:52
clarkbtonyb: no they differed I think. The first parent was 599c797 which only has a readme in it19:53
tonybYeah Okay I spoke to soon19:53
clarkbfirst thing is I feel like this is a gerrit issue19:53
clarkbbecause the diff is completely wrong given that situation19:54
clarkbbut I suspect this happened because the content is fast forwardable19:54
clarkbanyway we don't want to revert the merge we want ot revert all the changes between here and there which is annoying19:54
clarkblet me get a proper revert up first though19:55
opendevreviewClark Boylan proposed opendev/infra-openafs-deb jammy: Fix jammy branch .gitreview branch  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88530819:56
opendevreviewClark Boylan proposed opendev/infra-openafs-deb jammy: Revert "Update Jammy to 1.8.9"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88530919:57
opendevreviewClark Boylan proposed opendev/infra-openafs-deb master: Revert "Revert "Update Jammy to 1.8.9""  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88531020:03
opendevreviewClark Boylan proposed opendev/infra-openafs-deb master: Revert "Update Jammy to 1.8.9"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88531120:03
opendevreviewClark Boylan proposed opendev/infra-openafs-deb master: Revert "Add build stamp to push to production PPA"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88531220:03
opendevreviewClark Boylan proposed opendev/infra-openafs-deb master: Revert "Test generic job path"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88531320:03
opendevreviewClark Boylan proposed opendev/infra-openafs-deb master: Revert "Run promote job from project-config"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88531420:03
opendevreviewClark Boylan proposed opendev/infra-openafs-deb master: Revert "Build jammy packages"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88531520:03
clarkbThe git revert manpage says reverting merge commits is weird and I'm not sure it is what we want. So instead we get several changes to revert what the merge brought in... Could do it as a single squashed change if we prever20:03
clarkb*prefer20:04
clarkbhttps://github.com/git/git/blob/master/Documentation/howto/revert-a-faulty-merge.txt#L172-L17420:05
clarkbUltimately I think gerrit really let us down here :/20:06
tonybYeah It was nice of it to try and be smart  It needed a few alarm bells though20:17
tonybWhere are the zuul tenant configs kept?  I think that's what I want to ask.  I wanted to remove the windmill stuff but I can't quite see how it all comes togther.20:20
tonybI suspect there is one (or more) repos that I don't clone that I should20:21
clarkbtonyb: openstack/project-config/zuul.d iirc20:21
clarkbthat dir might be wrong but the repo is correct20:21
tonybOkay I'll keep poking.20:22
clarkbI just posted to the gerrit matrix/discord room about this and ya asked for better diffs and or a warning if the parent change is not proposed to the target branch or already on the target bracnh20:22
clarkbtonyb: sorry it is in zuul/main.yaml and I think windmill is already removed20:22
tonybAhhh zuul not zuul.d20:22
tonybIs it worth trying to clean up windmill stuff in gerrit/gitea/hound ?20:23
clarkbtonyb: I thought about it when I pulled them out of zuul but its a lot of effort to retire things and if the maintainers aren't interested in it them I'm somewhat inclined to just leave things as is20:23
clarkbbasically no reason to add extra work to ourselves20:24
clarkbBut if we want to clean things up further the project retirement steps are probably the ones to do. Then the repos will be auto removed from things like hound and have little content elsewhere20:24
tonybclarkb: Okay thanks20:25
clarkbI'm going to delete zp01.opendev.org now20:27
clarkb#status log Deleted zp01.opendev.org (0eb65b92-2ccc-4fc1-a410-c240c96851f0) as it has been replaced by a newer server.20:29
opendevstatusclarkb: finished logging20:29
opendevreviewClark Boylan proposed openstack/project-config master: DNM Testing merge mode errors in Zuul configuration  https://review.opendev.org/c/openstack/project-config/+/88531620:34
opendevreviewClark Boylan proposed opendev/system-config master: Update Gerrit 3.8 image to the final 3.8.0 release  https://review.opendev.org/c/opendev/system-config/+/88531720:45
clarkbfrickler: https://review.opendev.org/c/openstack/project-config/+/885316 confirms your hunch "Merge mode merge not supported by project github.com/ansible/ansible. Supported modes: []."20:48
clarkbcorvus: ^ fyi I don't think that is urgent because opendev removed those projects from its config instead, but may be something we need to sort out for other github users of zuul?20:48
clarkbI'm guessing this is a permissions or api issue.20:48
fungiokay, back20:58
clarkbfungi: tldr is we merged that revert to master which did an implicit merge of jammy into master20:58
fungiclarkb: catching up, i checked out the jammy branch but maybe we don't have a defaultbranch set in .gitreview (checking)20:58
corvusclarkb: it has not come up in my github-related work, so i suspect permissions or api edge case.  i agree that the behavior on opendev wasn't ideal, but also, i wouldn't want to simply say "if it's empty just pick something" since that could be a good canary that something important is wrong.  i think a solution should probably involve a deeper analysis of what conditions result in the empty list.  since it's not coming up except in this case so20:58
corvusfar (where, if i had to take a wild guess, someone just yanked our app access because who are these people anyway) i'm happy to leave it be for now.20:58
ianwhrm, i'm unsure if pushing an older version to the PPA will build or if it will reject it on version number differences20:59
clarkbfungi: yup the first change I pushed fixed that then the follow up reverts the change on the correct branch20:59
fungiokay, we do set defaultbranch in .gitreview, unfortunately it's set to "master"20:59
clarkbcorvus: ya I think I'm happy to leave it for now and i Think the error feedback should help people who have set up access properly. Then we can corss this in opendev if we start third party ci again20:59
corvusclarkb: (put another way: if all those assumptions are correct, zuul would have actually correctly informed us (in its own esoteric way) that we did not have a correct configuration to do any kind of speculative testing of changes to those repos)21:00
clarkbfungi: and then there is a stack of reverts on master to revert the changes from the implicit merge. And I sent gerrit a message on discord/matrix to be all this seems bad and wrong can we make it better?21:00
clarkbianw: we'll find out :)21:01
fungiokay, running through the new changes21:01
clarkblike I said ultimately I think Gerritfailed us here. It could've warned us the parent wasn't on the branch already or proposed to the branch. It could show a more complete diff etc.21:02
clarkbhopefully we can work with them to make this situation less confusing21:03
fungiyep, i've approved them all now i think. sorry about that21:03
ianwyeah; i did the same thing initially too yesterday uploading to the wrong branch.  i did notice that the "move branch" from the UI didn't work there either, saying the endpoint was disabled21:04
fungiwell, ultimately the fact that we had a defaultbanch set to master in the jammy branch was a foot cannon primed to fire21:04
ianwyeah, mea culpa on that21:04
clarkbI did not check the other branches in that repo but they may need default branch to be updated too?21:04
* clarkb looks21:04
ianwvhdutil might be worth checking on that too, uses the same process21:05
opendevreviewMerged opendev/infra-openafs-deb jammy: Fix jammy branch .gitreview branch  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88530821:05
clarkbI think the other branches have the same issue, but pushing updates to them will rebuild he package for all the things and I'm not sure we want ot be juggling that for everything right now?21:05
opendevreviewMerged opendev/infra-openafs-deb jammy: Revert "Update Jammy to 1.8.9"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88530921:05
opendevreviewMerged opendev/infra-openafs-deb master: Revert "Revert "Update Jammy to 1.8.9""  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88531021:05
opendevreviewMerged opendev/infra-openafs-deb master: Revert "Update Jammy to 1.8.9"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88531121:05
opendevreviewMerged opendev/infra-openafs-deb master: Revert "Add build stamp to push to production PPA"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88531221:05
opendevreviewMerged opendev/infra-openafs-deb master: Revert "Test generic job path"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88531321:05
ianwyeah, similarly i think that probably the ppa will reject the upload if it's the same version number, so it is *probably* a no-op21:06
ianwprobably ...21:06
ianw:)21:06
clarkbthe first promote job for the .gitreview update is running now21:11
clarkbthen it will try to downgrade the version in the second job after that one completes21:11
clarkbif that fails we'll need to manually remove the newer packages and then land a noopy change to the jammy branch to force a rebuild. I believe there are timestamp comments already in there to do this21:11
opendevreviewMerged opendev/infra-openafs-deb master: Revert "Run promote job from project-config"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88531421:14
opendevreviewMerged opendev/infra-openafs-deb master: Revert "Build jammy packages"  https://review.opendev.org/c/opendev/infra-openafs-deb/+/88531521:14
fungias far as approving a change proposed to the wrong branch which fast-forwards other commits from a different branch in gerrit, i thought there was a config option from long ago to toggle permitting that21:14
fungii remember it coming up back when one of the fuel maintainers accidentally approved a change which had been proposed to the wrong branch21:15
clarkbhttps://launchpad.net/~openstack-ci-core/+archive/ubuntu/openafs/+packages I don't think that updated the newer new version or downgraded21:16
clarkbso I think ianw is right if you try to push a new deb for the same version it is a noop and if you try to push an older version over a newer version it ignores you21:16
clarkbin that case we need to remove the jammy version and then land a change to push the old version to it?21:16
clarkbfungi: ^ were you still able to do that?21:17
fungii think we didn't have a jammy version before, did we?21:18
fungiin which case we just want to remove the 1.8.9 build21:18
fungiwe were just relying on the distro version anyway21:19
clarkbfungi: we had 1.8.8.1-2... before then we updated the version to 1.8.9 this morning that seems to have removed the 1.8.8.1-2 version from jammy21:19
clarkbyes we were just relying on the distro version and replacing 1.8.8.1-2 won't change that. It will just ensure that the jammy branch of our local repo is consistent with what we have in the PPA21:19
fungiahh, so we had a verison which would never be selected automatically because it sorted earlier than the distro version21:19
clarkbyes21:20
fungii'll work on removing the 1.8.9 packages, but per the document i linked earlier that won't be instantaneous21:20
clarkbah yup the emails that got caught in moderation say these updates were rejected so that all makes sense21:20
clarkbfungi: " A deleted package disappears from the archive indexes in at most 20 minutes. As soon as this happens, users will no longer be able to install it via apt." <- I think this is the bit we care about and that seems quick enough21:21
clarkbmostly we don't want static to end up in a weird spot21:21
fungiSource and binaries deleted by Jeremy Stanley: openafs 1.8.9-1~ppa1~jammy in jammy21:22
fungiDeletion comment: ended up being an unnecessary backport, the version shipped in jammy works fine as long as the caller isn't broken21:22
clarkb(if it does we can probably just redeplo it if necessary)21:22
clarkbso now if we push a noop change we should be able to publish 1.8.8.1-2 again. Though I wonder if it would reject it having seen the version before21:22
clarkb(I would hope not if the version isn't older and the old package doesn't exist anmore)21:23
fungialso not sure how long we need to wait before trying to do that21:23
clarkband ya we're using distro openafs anyway so it isn't vital More of a sync our local reality with the ppa reality for completeness21:23
clarkbnext steps: update jammy ppa to have current version in our downstream repo, downgrade afs on new ze01 and/or replace ze01?, continue with zuul executor replacement cc corvus 21:26
corvusi'd just go with a downgrade on ze01; pretty sure we can be sure that fallout is contained.21:29
corvusi don't think we can do 6-at-a-time during the week, and also, i'm not sure i have a lot of time to babysit that this week.  so if someone wants to re-chunk it into... maybe 3-at-a-time? i bet we could do that... feel free.  otherwise, i can probably pick it up again some time post-summit.21:31
fungisounds reasonable, i'm happy to babysit it21:33
clarkbfungi: per your comment in the gerrit channel I guess we can set receive.rejectImplicitMerges to true in All-Project config22:02
clarkbunlike some other receive settings this one doesn't appear to be settable gerrit wide in gerrit.conf so we need to set it in our root project config config (the default value for this item is to inherit from parent)22:03
clarkbI'll make note of that on hte meeting agenda for tomorrow22:05
fungiyeah, i mean, i had a todo list item about it from before i was infra ptl, if that means anything22:06
clarkbfungi: heh you can push it and knock that item off the todo list :)22:07
clarkbI think we just check that there isn't any reason to not do it then go ahead22:07
clarkbworst case it is an easy revert22:07
fungichecking that it's still relevant in the modern era22:08
clarkbyup it shows up in the docs22:08
clarkbhttps://gerrit-review.googlesource.com/Documentation/config-project-config.html22:08
fungicool22:08
clarkband does not show up in https://gerrit-review.googlesource.com/Documentation/config-gerrit.html so we need to set it in All-Projects or each project individually and not gerrit.conf22:09
fungiwe don't seem to include any receive options in the doc/source/gerrit.rst22:11
clarkbfungi: ya I think we've only set those that can go in gerrit.config22:11
clarkbto be fair thee are a number of useful ones there but ya I think this one should be added22:11
clarkbmy meeting updates are in. Please add any others if you have them in the next little bit22:11
opendevreviewJeremy Stanley proposed opendev/system-config master: Enable receive.rejectImplicitMerges for Gerrit  https://review.opendev.org/c/opendev/system-config/+/88531822:14
fungiclarkb: like that? ^22:14
clarkbyes that looks right. Slight nit I don't think fast forward is necessary it does an actual merge22:15
fungioh, yeah maybe it's just that the non-merge commits need to already be present on some other branch22:16
clarkbyup I think that is it22:16
clarkbGerrit must know about and have merged them to another branch. Then its yolo merging into another branch22:16
opendevreviewJeremy Stanley proposed opendev/system-config master: Enable receive.rejectImplicitMerges for Gerrit  https://review.opendev.org/c/opendev/system-config/+/88531822:16
fungitweaked the commit message to that22:16
clarkbfungi: what is the process of downgrading openafs on ze01? Just uninstall it then reinstall and since the newer package is no longer available it should find the old one instead?22:25
clarkbI guess maybe we need to clear the cache too?)22:25
fungiwe'll need to explicitly downgrade22:34
fungiprobably the easiest way is to `sudo apt purge openafs-client openafs-krb5 openafs-modules-dkms` and then 'sudo apt install openafs-client` (which should pull the other two back in as deps again)22:37
fungithere is a way to do an in-place downgrade, but it's not preferable22:37
fungii've got the purge in progress now22:37
clarkbya ok so remove then reinstall makes sense22:50
clarkbI think https://github.com/python/cpython/blob/main/Lib/_pyio.py#L241 is the line of code we are tripping over22:50
clarkbit is either that isatty (which I can't reproduce on its own) or the bufferedreader/writer/random classes that come later. If I set buffering=0 then this works22:58
clarkbok the issue still occurs if I manually set buffering to 4096 which should skip the isatty call so the issue is in the bufferedreader I think22:59
fungifwiw, the other two packages weren't dependencies so i explicitly installed all 323:00
fungilkm is building now23:00
clarkboh! its actually the lseek that comes after the ioctl that kills it. So the invalid ioctl is handled fine and it returns I'm not a tty. If you set buffering to say 4096 it never calls isatty and that ioctl call doesn't happen23:05
clarkbyup if I switch mode to appending then we seem to break here https://github.com/python/cpython/blob/main/Lib/_pyio.py#L1585 before we do the ioctl23:12
fungireplacement ze01 is downgraded to openafs 1.8.8.1-3ubuntu2~22.04.1 packages, i'll reboot it again for good measure (still no zuul-executor container running on it)23:25
fungipackages on static02 were never upgraded, fwiw23:26
clarkbcool23:28
clarkbapprently pdb cannot step into open()23:28
clarkbwhich is annoying because it is implemented as python so this should theoretically be possible (I may not be understanding how to do it I guess)23:29
clarkbpython3 -c 'open("/proc/fs/openafs/afs_ioctl", mode="rb", buffering=4096)` is the minimalest reproducer I've got so far23:30
clarkband the problem is definitely calling lseek() on the resulting fd.23:31
fungiis it necessarily implemented as python tho? there are a lot of calls which have python implementations shadowed by c when it builds23:31
clarkbfungi: yes its https://github.com/python/cpython/blob/v3.10.6/Lib/_pyio.py#L66 which is python23:33
clarkbI'm trying to figure out where the bufferedreader calls lseek from and not figuring that out. Was hoping pdb would help but it isn't with step at least23:33
fungimmm23:33
clarkbpython3 -c 'import os; fd = os.open("/proc/fs/openafs/afs_ioctl", os.O_RDONLY); os.lseek(fd, 0, os.SEEK_CUR)' <- is a bit more verbose as to what causes the crash though23:34
clarkbat the end of the day openafs isn't handling the seek gracefully and python3 open() appears to always do it one way or another23:38
clarkbIt is odd that we are seeking in an open() call particullay when we don't open with append set23:38
clarkbI'm thinking there may be a bug in python where it is seeking unnecessarily during open() then a separate bug in openafs not rejecting that request gracefully instead of panicing?23:39
corvusi'm going to remember this next time i do performance-critical io in python23:40
clarkbya lseek should raise OSError on unseekable files23:41
clarkbat the very least I would think open() should raise OSError instad of panicking and that likely is due to openafs not gracefully returning the appropriate errno to do that?23:42
clarkboh talking out loud here because I have no hard evidence yet. Does a buffered reader go out and preread the data? And the readmethods do a readall which do an lseek23:43
clarkbI wonder if that is the underlying cause of the lseek. Its buffering hte data before you ever doe a read()?23:43
clarkbI think I can test this on a regular file using strace23:44
clarkbno that doesn't appear to happen23:46

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!