Monday, 2023-09-18

01:40 <frickler> corvus: clarkb: there is something weird about zuul history. https://zuul.opendev.org/t/openstack/build/a6515b5787b349e490d9c24029d08f80 ran on Sep 1, but if I look at build history for that job, it only shows 49 builds none earlier than Sep 13
01:41 <frickler> I'm also not sure why and when that job started failing, seems related to switching to bookworm
01:54 <Clark[m]> frickler: related to zuul switching to bookworm?
01:55 <frickler> Clark[m]: nodepool rather, but I didn't check the details yet. I wanted to start by seeing when exactly the job started failing, but with the zuul amnesia I only have a rough guess so far
01:57 <frickler> though https://review.opendev.org/c/zuul/nodepool/+/892697 looks like a probable trigger
01:58 <Clark[m]> Oh I thought you were saying the history weirdness might be related to bookworm. But you are saying the actual job result may be
01:59 <frickler> ah, yes, sorry for mixing that up. the reason for history lacking things is completely unclear to me
01:59 <Clark[m]> I did test that the nodepool builds with dib ran successfully building an Ubuntu focal image in the bookworm switch. But maybe the problem is more specific
02:00 <frickler> dib history looks more complete and would match my assumption https://zuul.opendev.org/t/openstack/builds?job_name=nodepool-build-image-siblings&project=openstack/diskimage-builder
02:00 <Clark[m]> Got it. Not in a good spot to debug now but can probably look in the morning
02:01 <frickler> iiuc the issue is in building an actual nodepool image, not with generic images
07:21 <opendevreview> daniel.pawlik proposed zuul/zuul-jobs master: Add feature to set --vm-driver name for minikube  https://review.opendev.org/c/zuul/zuul-jobs/+/894755
07:33 <opendevreview> daniel.pawlik proposed zuul/zuul-jobs master: Add feature to set --vm-driver name for minikube  https://review.opendev.org/c/zuul/zuul-jobs/+/894755
07:45 <opendevreview> daniel.pawlik proposed zuul/zuul-jobs master: Bump cri-o default version to 1.26  https://review.opendev.org/c/zuul/zuul-jobs/+/895597
07:46 <opendevreview> daniel.pawlik proposed zuul/zuul-jobs master: Add feature to set --vm-driver name for minikube  https://review.opendev.org/c/zuul/zuul-jobs/+/894755
07:50 <opendevreview> daniel.pawlik proposed zuul/zuul-jobs master: Bump cri-o default version to 1.26  https://review.opendev.org/c/zuul/zuul-jobs/+/895597
08:09 <opendevreview> daniel.pawlik proposed zuul/zuul-jobs master: Bump cri-o default version to 1.28  https://review.opendev.org/c/zuul/zuul-jobs/+/895597
08:38 <opendevreview> daniel.pawlik proposed zuul/zuul-jobs master: Change CoreDNS configuration for Minikube  https://review.opendev.org/c/zuul/zuul-jobs/+/895604
08:55 <opendevreview> daniel.pawlik proposed zuul/zuul-jobs master: Fix tox job when stestr is used  https://review.opendev.org/c/zuul/zuul-jobs/+/895606
10:59 *** amoralej is now known as amoralej|lunch
12:22 *** amoralej|lunch is now known as amoralej
14:58 <corvus> frickler: Clark it appears to be a pagination error; here is a list of 100 builds: https://zuul.opendev.org/t/openstack/builds?job_name=nodepool-build-image-siblings&project=openstack/openstacksdk&limit=100
15:03 <frickler> corvus: oh, interesting, it only shows 99 builds there for me. some kind of bad/hidden apple that makes zuul think the list is complete?
15:03 <frickler> also going to 200 confirms Sep 6 as the date for the job failing
15:04 *** ykarel is now known as ykarel|away
15:11 <clarkb> is that when we switched to bookworm?
15:12 <clarkb> it is
15:14 <frickler> corvus: any idea what is happening with the pagination? nothing special found in web-debug.log
15:14 <frickler> clarkb: yes, that's the patch I mentioned earlier
15:15 <clarkb> ok the issue in the job is trying to install dib bindep packages that apparently don't exist on bookworm anymore
15:15 <clarkb> so updating dib to install the correct packages should fix it
15:15 <clarkb> I'll work on that
15:23 <clarkb> I'm rearranging things in the bindep.txt to hopefully be more future proof
15:23 <clarkb> basically stop explicitly matching new things forcing us to match new things every release and instead unmatch old things
15:27 <fungi> good call, also makes cleanup easier
15:35 <opendevreview> Clark Boylan proposed openstack/diskimage-builder master: Update bindep rules for Debuntu  https://review.opendev.org/c/openstack/diskimage-builder/+/895699
15:35 <clarkb> frickler: ^ I think that will fix it
15:40 <clarkb> frickler: did you want to review https://review.opendev.org/c/openstack/project-config/+/895514 to update nodepool image upload timeouts before we approve it?
15:54 <frickler> clarkb: I meant to, but got distracted, thx for the reminder
16:43 <opendevreview> Merged openstack/project-config master: Set a six hour nodepool image upload timeout  https://review.opendev.org/c/openstack/project-config/+/895514
16:57 <frickler> corvus: clarkb: when you restarted zuul from scratch, what exactly did you do to clean up the cache, is that documented somewhere? just some zookeeper commands?
16:57 <clarkb> frickler: it is a zuul command that clears out the zk data. Let me find a link
16:58 <clarkb> frickler: https://zuul-ci.org/docs/zuul/latest/client.html#delete-state
17:01 <frickler> clarkb: ah, I was only looking at zuul-scheduler commands, thx a lot
17:09 <frickler> clarkb: bonus question: how do you run that command, when all containers are stopped? some docker foo with manually mounted config volumes?
17:09 <clarkb> frickler: I believe that corvus indicated it was executed using the scheduler container and docker-compose run
17:10 <clarkb> frickler: something like `docker-compose run zuul-scheduler zuul delete-state` but corvus can confirm
17:10 <fungi> frickler: run works by spawning a new container, rather than exec which uses an already running container
17:10 <fungi> (and i think run overrides the init so services don't get started in it?)
17:11 <frickler> oh, that sounds easy enough, thanks again
17:13 <clarkb> run overrides the RUN command of the container if you supply an explicit command. But it does not override the ENTRYPOINT
17:18 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Revert "Disable base role testing that runs code on localhost"  https://review.opendev.org/c/zuul/zuul-jobs/+/895708
17:24 <opendevreview> Merged openstack/project-config master: Retire os-win: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/894419
17:25 <opendevreview> Merged openstack/project-config master: Retire compute-hyperv: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/894420
17:27 <opendevreview> Merged openstack/project-config master: Retire networking-hyperv: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/894441
17:27 <opendevreview> Merged openstack/project-config master: Retire oswin-tempest-plugin: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/894442
17:36 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Revert "Disable base role testing that runs code on localhost"  https://review.opendev.org/c/zuul/zuul-jobs/+/895708
17:36 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Pin python-subunit in fetch-subunit-output test  https://review.opendev.org/c/zuul/zuul-jobs/+/895715
17:36 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Unpin stestr and python-subunit in fetch-subunit-output test  https://review.opendev.org/c/zuul/zuul-jobs/+/895716
17:45 <frickler> just ftr I did 'docker-compose run zuul-scheduler bash' and then 'zuul delete-state' inside and it worked fine
17:46 <frickler> (on my local zuul, no need to worry ;)
18:55 <fungi> corvus: odd behavior that might be regex related, not sure... https://opendev.org/openstack/requirements/src/branch/stable/train/.zuul.d/cross-jobs.yaml#L185-L197 defines a job parented to build-openstack-sphinx-docs which https://zuul.opendev.org/t/openstack/job/build-openstack-sphinx-docs says is only relevant on stable/rocky, yet there is no config error for that reported in
18:55 <fungi> https://zuul.opendev.org/t/openstack/config-errors
18:55 <fungi> change https://review.opendev.org/891629 is attempting to remove the definition scoped to stable/rocky, and is getting the config error about stable/stein of openstack/requirements using it
18:57 <fungi> https://zuul.opendev.org/t/openstack/job/cross-osc-build-sphinx-docs also correctly shows only being relevant for stable/stein (despite having a parent that is only relevant for stable/rocky)
18:57 <fungi> it's left me scratching my head
19:09 <frickler> I think https://opendev.org/openstack/requirements/src/branch/stable/stein/.zuul.d/cross-jobs.yaml#L180-L192 is the questionable job reference, not the stable/train link above
19:13 <frickler> after reading the docs, my understanding of job.branches is that it only limits when the job will be run, it doesn't block the "parent" composition. so a job definition with "parent: build-openstack-sphinx-docs" will only become a config error once the last definition of that parent job gets removed, even if already earlier there is no chance that this job variant will ever be executed
19:15 <fungi> yeah, maybe it's just not intuitive to me that you can parent a job to a definition on another branch that doesn't match (either implicitly, explicitly, or via default fallback)
19:15 <fungi> i agree that seems possible though
19:23 <corvus> fungi: the parent definition doesn't specify a branch, it just specifies a job, and there's no way to know if the job will run without an exemplar ref (and even in the case where it seems like there should be no possibility, it's possible for a child to alter things to encourage zuul to use a parent on a different branch).
19:24 <corvus> fungi: so if you want to try to rewire your brain to match zuul: jobs are abstract ideas that exist independent of branches; branches are just things that are used to decide which variants of jobs to use.
19:25 <fungi> corvus: thanks, so essentially what frickler was also saying
19:26 <fungi> and https://review.opendev.org/891629 then really is creating an error by trying to remove a job definition that is referenced by another project's config
19:27 <corvus> fungi: well, i was hoping to try to address your comment that it was non-intuitive.  i was trying to find a way to meet you at your intuition.
19:30 <fungi> yep, got it. i wasn't thinking of the possibility that job definitions can be taken from other branches even when those aren't the default branch
19:32 <corvus> "jobs exist in a state of quantum superposition until a change is enqueued..." does that help? ;)
19:32 <fungi> yes, perfectly!
19:33 <corvus> definitely going to work "speculative execution" and "quantum superposition" into the same slide for my next talk
19:33 <fungi> so to be clear, the job wouldn't have actually run anyway because the parent wasn't defined for that branch
19:35 <corvus> fungi: it is very unlikely for that job to run in the normal course of business as i understand things in openstack, yes.
19:35 <fungi> wfm, thanks again
19:36 <fungi> (and yes i can imagine there might be some convoluted way to get it to run from a third project that defined its default branch as one of those and then had a change proposed for the other)
21:07 <corvus> here's a draft of the announcement about zuul regexes: https://etherpad.opendev.org/p/3FbfuhNmIWT33fCFK1oK
21:30 <clarkb> corvus: lgtm
21:49 <opendevreview> Merged openstack/diskimage-builder master: Update bindep rules for Debuntu  https://review.opendev.org/c/openstack/diskimage-builder/+/895699
21:50 <clarkb> I've done some edits to the team meeting agenda. Anything else to add?
21:51 <clarkb> also yuriy responded to my email already. So starting to work through the information gathering for that cloud replacement
21:51 <corvus> clarkb: probably worth bringing up that email in the meeting (not sure if that falls under an existing item)
21:52 <JayF> Hello infra friends; can someone hold a node for me: job: ironic-inspector-grenade, change: 895164 repo: openstack/ironic-inspector
21:52 <JayF> it's in experimental queue in the change I'm testing it on, if that matters
21:52 <JayF> let me know when that's in place and I'll run a `check experimental`
21:52 <clarkb> corvus: yup I added a new agenda item to discuss the whole cloud redeployment thing
21:53 <clarkb> JayF: yup one sec
21:56 <clarkb> JayF: it should be in place now
21:56 <corvus> clarkb: oh sorry i meant the regex notification email (though the other thing sounds good too)
21:56 <JayF> thank you
21:57 <JayF> clarkb: I will note if Julia somehow wants to steal this node in the morning before I get to it, please give it to her (or anyone else who wants to help fix it, really LOL)
22:01 <clarkb> corvus: oh got it
22:01 <clarkb> corvus: do you plan to send it before then and I can link to the archive or should I link to the draft?
22:11 <corvus> the draft -- i'd like to make sure everyone agrees with the approach since this is somewhat novel
22:31 <fungi> corvus: that announcement lgtm
22:32 <JayF> corvus: if you want a PTL+TC perspective, that LGTM as well. Thank you for the detail :)
22:32 <fungi> on the openstack side of things, i expect the biggest challenges are going to be trying to fix errors on several-years-old branches where nobody has kept their jobs passing, and may require removal/fixing of failing jobs or assistance from sysadmins to bypass gating
22:33 <fungi> but we can tackle that on a case-by-case basis
22:33 <corvus> JayF, fungi thanks :)
22:34 <JayF> fungi: I think there's a potential prioritization battle there: What is more important; the "fix CI" forcing function and CI-correctness, or fixing zuul-config-errors in an expedient manner through force-merging changes.
22:34 <fungi> JayF: or seeing it as a reminder that if you're going to keep branches open to accept changes, you need to keep whatever jobs you run on those branches passing
22:35 <fungi> (with several implied solutions to the problem, depending on your available time and preferences)
22:35 <JayF> that is another way to state 'forcing CI-correctness', at least in how I meant it
22:35 <fungi> yep
22:36 <JayF> I think you're right, and that probably is the correct move, but I know it always feels like I'm closing a door forever when I remove jobs from an old branch, which makes it tough to do
22:36 <fungi> when i said several implied solutions, i meant: 1. fixing jobs, 2. removing jobs, 3. closing branches
22:37 <fungi> which is appropriate will be situation-dependent
23:09 <clarkb> slittle: did further decisions get made around starlingx repo renaming?
23:10 <clarkb> meeting agenda sent
23:40 <fungi> thanks!
