Tuesday, 2022-10-11

clarkbianw: re console files we need to do similar with static iirc00:02
clarkb(sorry bike rides are good for pulling on threads in your head and I just got back from a bike ride)00:03
ianwoh yep, i've cleared out there too and am watching00:04
ianwi'm going to give it a few hours before i debug too much as iirc any periodic jobs don't update system-config between runs00:05
clarkbya I think waiting is fine. It's not like we haven't had this issue for a while anyway :)00:05
ianwclarkb: does hosts: bastion[0] work as a host-matcher for a playbook off the top of your head?  i was thinking the bastion group would never have more than one entry -- it's basically just a pointer00:08
clarkbianw: I want to say yes. It was noonedeadpunk maybe that was doing similar with osa stuff and zuul?00:09
ianwhttps://docs.ansible.com/ansible/latest/user_guide/intro_patterns.html#common-patterns doesn't list it, but it also says "common patterns" not "here is an exhaustive list of possible patterns" ...00:10
clarkband ya I think addressing it that way is fine we should just be consistent. If we can't be consistent maybe add a verbose comment about it to the groups file00:10
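
[Editor's sketch] As far as I recall, Ansible documents positional subscripts such as group[0] under the advanced pattern options rather than the common patterns table. The Python below only illustrates the idea of resolving such a pattern against a group; it is not Ansible's implementation. The resolve_pattern helper, the inventory dict and the hostname are hypothetical, and returning an empty list for an out-of-range index is this sketch's choice.

```python
import re


def resolve_pattern(pattern, groups):
    """Resolve 'group' or 'group[N]' against a dict of group -> host list.

    Hypothetical helper for illustration; not part of Ansible.
    """
    match = re.fullmatch(r"(?P<group>[^\[\]]+)(?:\[(?P<index>-?\d+)\])?", pattern)
    if match is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    hosts = groups.get(match.group("group"), [])
    if match.group("index") is None:
        # Bare group name: every host in the group matches.
        return list(hosts)
    index = int(match.group("index"))
    try:
        return [hosts[index]]
    except IndexError:
        # Sketch's choice: an out-of-range subscript matches nothing.
        return []


# Hypothetical inventory: a bastion group acting as a one-entry pointer.
groups = {"bastion": ["bridge01.example.org"]}
print(resolve_pattern("bastion[0]", groups))
```

With a single-entry group, "bastion" and "bastion[0]" resolve to the same host, which is why the choice between them is mostly about consistency.
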
ianwyeah i did call it out in https://review.opendev.org/c/opendev/system-config/+/858476/6/inventory/service/groups.yaml but we can try the direct addressing too00:11
ianwyou're also probably correct that using a different group to address the bastion host in the job setup, versus in the nested ansible, might be clearer00:12
clarkbya I had to think about that one for a minute before I realized what was going on there00:12
ianwi couldn't think of any other way than configuring the group in the job definition.  i toyed with the idea of a "fake" add_host that somehow runs once and makes a fake host you can dereference to find the bastion host name ... but i felt like that was getting even more confusing00:14
clarkbya I think what you've got is the right way to do it. Just a matter of distinguishing "this is the group management for the top level job ansible" from "this is our production group management"00:15
fungiit certainly is mind-bending in ways that i hope future zuul versions will be able to simplify00:19
*** dviroel|biab is now known as dviroel|out00:20
ianwfungi: are you ok with https://review.opendev.org/c/opendev/system-config/+/856593 which moves the ansible on bridge into a venv?  it's one that will require close monitoring but i'm happy to do that00:21
ianwper the merges above, i've taken the liberty of merging the more trivial cleanup changes 00:22
fungioh, i thought i had already reviewed that one... looking00:23
fungiianw: yeah, lgtm, thanks!00:26
fungimerge at your convenience00:26
ianwthanks; will do.  after that is in the clear, the stack will be focused on jammy upgrade and making it easier to swap the bastion host00:27
ianwrunning every single system-config-run job on one change does expose that change to a lot of failure possibilities :/00:34
fungiyou need popcorn00:36
opendevreviewMerged opendev/system-config master: run-selenium: Use latest tag on firefox image  https://review.opendev.org/c/opendev/system-config/+/85780301:58
opendevreviewMerged opendev/system-config master: afs-release: better info when can not get lockfile  https://review.opendev.org/c/opendev/system-config/+/85800901:58
opendevreviewIan Wienand proposed opendev/system-config master: bootstrap-bridge: drop pip3 role, add venv  https://review.opendev.org/c/opendev/system-config/+/85659302:29
opendevreviewIan Wienand proposed opendev/system-config master: Run jobs with a jammy bridge.openstack.org  https://review.opendev.org/c/opendev/system-config/+/85779902:29
opendevreviewIan Wienand proposed opendev/system-config master: testinfra: Update selenium calls  https://review.opendev.org/c/opendev/system-config/+/85800302:29
opendevreviewIan Wienand proposed opendev/system-config master: Abstract name of bastion host for testing path  https://review.opendev.org/c/opendev/system-config/+/85847602:29
opendevreviewIan Wienand proposed opendev/system-config master: Run a base test against "old" bridge  https://review.opendev.org/c/opendev/system-config/+/86080202:29
opendevreviewIan Wienand proposed opendev/system-config master: Convert production playbooks to bastion host group  https://review.opendev.org/c/opendev/system-config/+/85848602:29
ianwwe still see a few console files coming up on bridge -- they all appear to be related to the gpg encrypting of logs; hence suggesting the post playbook i guess02:30
opendevreviewIan Wienand proposed opendev/system-config master: Correct zuul_console_disabled flag  https://review.opendev.org/c/opendev/system-config/+/86091302:47
*** ysandeep|out is now known as ysandeep02:48
opendevreviewIan Wienand proposed opendev/base-jobs master: Fix zuul_console_disabled typo  https://review.opendev.org/c/opendev/base-jobs/+/86091402:53
*** ysandeep is now known as ysandeep|afk03:32
opendevreviewMerged opendev/system-config master: Correct zuul_console_disabled flag  https://review.opendev.org/c/opendev/system-config/+/86091303:53
opendevreviewMerged opendev/base-jobs master: Fix zuul_console_disabled typo  https://review.opendev.org/c/opendev/base-jobs/+/86091404:10
opendevreviewIan Wienand proposed opendev/system-config master: bootstrap-bridge: drop pip3 role, add venv  https://review.opendev.org/c/opendev/system-config/+/85659304:11
opendevreviewIan Wienand proposed opendev/system-config master: Run jobs with a jammy bridge.openstack.org  https://review.opendev.org/c/opendev/system-config/+/85779904:11
opendevreviewIan Wienand proposed opendev/system-config master: testinfra: Update selenium calls  https://review.opendev.org/c/opendev/system-config/+/85800304:11
opendevreviewIan Wienand proposed opendev/system-config master: Abstract name of bastion host for testing path  https://review.opendev.org/c/opendev/system-config/+/85847604:11
opendevreviewIan Wienand proposed opendev/system-config master: Run a base test against "old" bridge  https://review.opendev.org/c/opendev/system-config/+/86080204:11
*** dasm is now known as dasm|off04:20
opendevreviewIan Wienand proposed opendev/system-config master: Abstract name of bastion host for testing path  https://review.opendev.org/c/opendev/system-config/+/85847605:54
opendevreviewIan Wienand proposed opendev/system-config master: Run a base test against "old" bridge  https://review.opendev.org/c/opendev/system-config/+/86080205:54
*** ysandeep|afk is now known as ysandeep06:01
*** jpena|off is now known as jpena07:17
*** frenzyfriday is now known as frenzyfriday|sick07:50
*** kopecmartin|sick is now known as kopecmartin08:08
*** ysandeep is now known as ysandeep|lunch08:19
opendevreviewGrzegorz Grasza proposed opendev/irc-meetings master: Update Barbican meeting chair and time  https://review.opendev.org/c/opendev/irc-meetings/+/86092908:47
*** ysandeep|lunch is now known as ysandeep09:46
*** dviroel|out is now known as dviroel11:06
*** ysandeep is now known as ysandeep|afk11:42
*** pojadhav is now known as pojadhav|afk11:45
opendevreviewRafal Lewandowski proposed openstack/diskimage-builder master: Added cloud-init growpart element  https://review.opendev.org/c/openstack/diskimage-builder/+/85585611:49
*** ysandeep|afk is now known as ysandeep12:16
*** dasm|off is now known as dasm12:41
*** pojadhav|afk is now known as pojadhav12:45
opendevreviewRafal Lewandowski proposed openstack/diskimage-builder master: Added cloud-init growpart element  https://review.opendev.org/c/openstack/diskimage-builder/+/85585613:48
opendevreviewAmy Marrich proposed opendev/irc-meetings master: Change meeting day and time for Diversity and inclusion  https://review.opendev.org/c/opendev/irc-meetings/+/86095514:08
opendevreviewJeremy Stanley proposed opendev/system-config master: DNM force mm3 failure to hold the node  https://review.opendev.org/c/opendev/system-config/+/85529214:18
opendevreviewMerged opendev/irc-meetings master: Change meeting day and time for Diversity and inclusion  https://review.opendev.org/c/opendev/irc-meetings/+/86095514:19
fricklerclarkb: when you tested devstack with the new zuul ansible, you only tested master, not old stable branches, right?14:20
fricklerwondering whether https://bugs.launchpad.net/neutron/+bug/1992379 could be related, saw similar things in older devstack stable branches14:20
funginote that the ansible default change merged on thursday at 16:31z, so if the failures started with the friday periodic runs, that would be a fairly tight correlation14:23
fricklerthat seems to match pretty well https://zuul.opendev.org/t/openstack/builds?job_name=neutron-functional&project=openstack%2Fneutron&branch=stable%2Ftrain&skip=014:26
mlavallefungi: yes, it matches, based on what I've seen14:27
fungishort-term workaround is to set the ansible version in those jobs to 5, but that will only work until zuul drops ansible 5 support, which is coming very soon14:28
fricklermaybe opendev can delay that switch for some time? stable branches are already struggling without further changes imposed upon them14:34
fungiwill they struggle less if we delay it a week?14:35
fungia lot of those stable branches need to be eol already14:36
fricklernot really, I was thinking more in terms of months to either get them fixed or eold14:36
fungithe zuul community would like to drop ansible 5 support and release 8.0 next week: https://etherpad.opendev.org/p/qF7VE9HzqPVzsCyZLWxb14:37
funginormally zuul tags releases after the code has been running successfully in opendev's deployment14:37
*** ysandeep is now known as ysandeep|dinner14:37
fungidelaying removal of ansible 5 in opendev either means not tying zuul releases to opendev successfully running the code first, or delaying zuul's release plans by however long we want to delay the ansible 5 removal14:38
fricklerso we need to weigh interest of the zuul community vs. interest of the OpenStack community14:39
fungiyes. i could see suggesting we push the ansible 5 removal to the week after the ptg, for logistical reasons14:40
fricklerexcept possibly there is an easy fix for this14:40
fricklereasy and not requiring every repo to be touched, ie. likely localized to devstack14:41
frickleris the ansible 5 removal blocking anything else on the zuul side?14:42
fricklerfyi verifying in https://review.opendev.org/c/openstack/devstack/+/860797 now14:45
funginot needing to support eol ansible versions, but i'm not sure what else beyond that14:46
fungiwe basically got way behind because of the modules fork maintenance stuff, which once resolved allowed us to start catching back up to ansible's support schedule14:46
clarkbhuh my messages from matrix seem to have not made it to irc? that is unfortunate14:49
clarkbthis is likely the same issue that zuul-jobs ran into with ansible's handling of shebangs picking the wrong python version14:50
clarkbwe can just drop the shebang and that should fix it if this is the same issue14:50
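
[Editor's sketch] A minimal illustration of "dropping the shebang" from a module file, assuming the fix really is just removing the first line when it starts with "#!". The strip_shebang helper and the sample source are hypothetical and are not taken from the devstack-gate change.

```python
def strip_shebang(source: str) -> str:
    """Return source without a leading '#!' interpreter line, if present."""
    lines = source.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        # Only the very first line can be a shebang; later '#!' lines are kept.
        lines = lines[1:]
    return "".join(lines)


# Hypothetical module source used only to exercise the helper.
module_source = "#!/usr/bin/env python\nimport json\n"
print(strip_shebang(module_source), end="")
```

Files without a shebang pass through unchanged, so running this over a module more than once is harmless.
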
funginote that the ironic team ran into some problem related to the ansible default change as well. i didn't dig into it but JayF can probably say if theirs was that issue as well14:50
clarkbthe point I was trying to make that apparently matrix didn't send through for me is that we did try to accommodate openstack here and intentionally waited for the release to complete before doing this knowing it had the potential to be destructive14:51
JayFOur failures were the ansible openstack module complaining about openstacksdk versions14:51
clarkbat the same time if we never take the chance that some things would break here we would never update the ansible version14:51
clarkband it seems a little silly to prioritize ancient openstack stable branches that don't get required care and feeding14:52
JayFOh, I've totally seen this failure in some places too. It didn't occur to me this could be ansible version :| 14:52
fungioh, i see the metalsmith situation looks related to openstacksdk constraints vs ansible collection for openstack? https://review.opendev.org/c/openstack/metalsmith/+/860943/3/metalsmith_ansible/ansible_plugins/modules/metalsmith_instances.py14:52
JayFYes14:52
clarkbyou should decouple those things14:53
clarkbthe ansible running your job shouldn't leak into the python install you are testing14:53
JayFhttps://review.opendev.org/c/openstack/metalsmith/+/860943 being our "improved" workaround14:53
clarkboh wait this is a devstack-gate failure?14:53
clarkbI really don't think we should hold anything up for a devstack-gate problem14:54
JayFclarkb: that's basically what I said as soon as we connected the dots as to how it was broken; I'd be surprised if anyone has the time to refactor that job though14:54
fungiyes, devstack-gate is on borrowed time already. there will come a time in the not too distant future where we need to decide between eol'ing some em branches in openstack sooner or dropping integration tests on them so we can retire d-g for good14:55
clarkband ya the issue is precisely the one zuul-jobs hit. It's a one line fix. I'll push it up as soon as I load ssh keys14:55
clarkbBut I don't think we should hold anything up for devstack-gate14:56
clarkbit's one thing to try and work with devstack problems but devstack-gate shouldn't be used14:56
clarkbremote:   https://review.opendev.org/c/openstack/devstack-gate/+/860961 Remove shebang from ansible module14:58
opendevreviewRafal Lewandowski proposed openstack/diskimage-builder master: Added cloud-init growpart element  https://review.opendev.org/c/openstack/diskimage-builder/+/85585614:58
frickleroh, I didn't realise that these were all still ds-gate jobs. that's a good argument for retirement indeed15:06
*** ysandeep|dinner is now known as ysandeep15:06
fungiwhich would also explain why the canary devstack tests with ansible 6 didn't turn up things like that15:07
*** dviroel is now known as dviroel|lunch15:11
clarkbfungi: d-g actually does the exact thing that the ansible change breaks :/ it uses the module as both a script and a module15:12
clarkbI pushed a followup patchset, but I think that will break too15:12
fungiindeed15:12
clarkbit's a bit of a gg ansible situation. But also it's d-g which no one should be using...15:12
fungichange of topic... one catch to holding a job node for a service built on a speculative container... docker-compose up can't find the images in the registry and errors15:18
fungii guess there's a way to tell it to use the cached images it has?15:18
fungioh, or maybe it never actually had them at all. docker image list only lists the mariadb image15:19
fungi860157 built images on 2022-10-07, could they have expired out of the buildset registry at this point15:21
fungi?15:21
clarkbfungi: I thought that cleanup isn't working15:21
clarkbso no I don't think so15:21
fungii guess i should see if the dnm change failed for the intended reason or for other reasons15:21
clarkbin the past I think the issue has been in setting up the job requires and dependencies15:22
clarkbI would double check that15:22
clarkb(what job was it?)15:22
fungiERROR: for mailman-core  pull access denied for opendevorg/mailman-core, repository does not exist or may require 'docker login': denied: requested access to the resource is denied15:22
fungihttps://zuul.opendev.org/t/openstack/build/2b0da96429d942e39b862e41414ab4fa15:22
fungiso no, the held node never had the speculatively built images15:22
fungimissing provides/requires maybe?15:23
clarkbya I was actually concerned about this. I think that when things aren't live in zuul with the speculative state it loses that info somehow? Basically we may need to land the image additions to make this work reliably. I'm pretty sure I added all the requires and dependencies, but worth double checking that too15:24
clarkbthe issue (my hunch anyway) is that we don't know to pull in the speculative build from the intermediate registry to the buildset registry when those changes don't go up together15:25
clarkband since we haven't landed the change either there is no image in docker hub to pull as the fallback15:25
clarkbfungi: I think if you recheck both changes together that it will correct it15:25
fungiyeah, makes sense. thanks15:27
clarkbfungi: frickler: this affects normal devstack jobs train and older because devstack train uses the test-matrix role in playbooks/pre.yaml. Ussuri dropped it15:29
clarkb(I think that is still a bug in "don't use d-g", but does complicate cleanup a bit if we want to take it further)15:30
fungicould be a reason to fast-track eol for stable/train15:31
fungior just drop integration testing on it15:31
clarkbya15:31
fungipretty sure mass eol of pqrs are already underway15:32
fungielodilles: ^ just a heads up that devstack jobs on stable/train and older are broken by newer ansible and will either need fixing or dropping devstack/grenade jobs15:32
fungii wonder if it also affects stable/ussuri grenade jobs15:33
fricklerwith EM these things fall into each project's responsibility and my impression is they tend to just get ignored15:33
clarkbI've got https://review.opendev.org/c/openstack/devstack/+/860963 pushed testing https://review.opendev.org/c/openstack/devstack-gate/+/86096115:33
frickleradded that as PTG topic for the release team, but maybe also a TC topic15:33
clarkbI think the latest patchset should be functional15:34
* frickler takes a break, bbl15:35
clarkbfungi: if the rechecking at the same time corrects it we should ask corvus what we are doing wrong with my change there15:37
mlavalleclarkb: will https://review.opendev.org/c/openstack/devstack-gate/+/860961 help me with https://bugs.launchpad.net/neutron/+bug/1992379?15:39
fungiclarkb: i see that the dnm change does have its system-config-run-lists3 build in "waiting" while the parent change's system-config-build-image-mailman build is underway15:39
clarkbmlavalle: I think so but https://review.opendev.org/860963 should tell us. As mentioned above though really nothing should be using devstack-gate. It's a bug that devstack train is relying on devstack-gate15:39
clarkbso I think there is a bigger question of how to unwind that (as fungi suggests maybe we need to stop running those old tests entirely)15:40
fungione way to stop running them is to go ahead and eol stable/train15:40
mlavalleclarkb: thanks. will keep an eye on 86096315:46
fungiclarkb: yep, as suspected, once the image build completed on the parent, the mm3 test build started15:52
clarkbfungi: ya I seem to remember running into this once before (maybe adding new gerrit images?) I'm not quite sure if this is expected or not15:53
fungiit does seem side-effect-ey15:53
fungiif we didn't omit the system-config-build-image-mailman job on the child change, it would supply its own image to the tests15:54
clarkbya, and it is omitted due to not matching file matchers for that job15:55
clarkbmaybe we've overoptimized15:55
clarkband by we I guess I mean me as the author of that change :)15:55
fungiagreed, seems like it could be one of the many arguments about why file matchers are problematic15:56
fungianyway, this can be worked around easily enough now that i get what's going on, and i think zuul is working as intended here15:57
fungifor me, the surprising side effect is the parent change buildset supplying images to the child change buildset even though they're running in an independent pipeline15:58
clarkbya that's what the job requires and provides does15:59
clarkbfungi: are you holding the node now to do another test pass? Have we updated the mm2 prod strings that are too long?15:59
fungiright, i get that it's an intentional feature, but the fact that you might get a speculatively built image or you might not depending on timing does feel kinda magic16:00
clarkbya that part is a bit weird to me as well16:00
fungiclarkb: yes, i cleaned up the overly long fields in the prod lists and i have an autohold reset to catch this latest build of the dnm change16:00
*** marios is now known as marios|out16:00
clarkbawesome16:01
fungii was mid-rsync of production data to a node i'd held an hour or so ago when i had to mount the ephemeral drive (rackspace node) on /var/lib/mailman to make enough space for the migration tests and so was stopping/starting the containers for moving that homedir when i noticed the images for them were missing16:02
fungibut as soon as i have a new held node i'll get the ssh keys added on it and start pushing another copy of the data over16:02
fungiprobably about 10 minutes out if zuul's estimate is to be believed16:03
*** ysandeep is now known as ysandeep|out16:10
fungiwas enough time for me to scarf down cold leftovers for lunch16:16
funginew held node is 198.72.124.74 and it's not in rackspace so i don't need to fiddle with filesystems16:17
fungithough still no images :/16:18
fungisame error https://zuul.opendev.org/t/openstack/build/0940a25021fa4d9c891186457d4042e816:18
fungii guess it doesn't try to pull the one from the parent change anyway, it only delays building16:19
fungii'll amend the dnm change to touch something to trigger an image build16:19
clarkbhrm if it delayed then it knew about the artifacts and should've pulled those into the buildset registry from intermediate I would've thought16:20
clarkbmaybe the plumbing just doesn't work before the image actually exists?16:20
opendevreviewJeremy Stanley proposed opendev/system-config master: DNM force mm3 failure to hold the node  https://review.opendev.org/c/opendev/system-config/+/85529216:21
fungiswizzled16:21
funginow to reset the trap16:21
clarkbthe fact that my devstack test change of d-g hasn't all failed yet implies to me that the fix is working if we want to go ahead and approve the d-g change16:24
clarkbworst case it won't make anything worse :)16:24
fungilooks like it's running system-config-build-image-mailman this time, so should supply its own image16:24
*** dviroel|lunch is now known as dviroel16:24
fungii've reviewed two devstack-gate changes in the span of a week. i really don't want to resubscribe my gertty to that repo16:39
clarkbas a heads up the matrix oftc bridge isn't passing messages through to irc right now17:14
fungiouch17:14
clarkbmessages written here in IRC make it to matrix17:16
clarkbbut not the other way around17:16
clarkbI've asked for help in #irc:matrix.org17:19
fungiokay, 149.202.168.204 seems to be a working held mm3 server. now to copy data onto it17:19
*** jpena is now known as jpena|off17:25
clarkbhttps://github.com/matrix-org/matrix-appservice-irc/issues/1624 https://github.com/matrix-org/matrix-appservice-irc/issues/1590 https://github.com/matrix-org/matrix-appservice-irc/issues/1575 have been shared with me. Unfortunately, not a whole lot in those other than the problem exists17:25
fungiproduction data is replicating to the held server now17:30
clarkbnote those issues are all for messages working from matrix to irc but not irc to matrix. The inverse of what we observe. I've asked if we need to file a new issue17:31
fungii've been notified of two new element releases today in my browser client, not sure if that's related but it's a lot more than i normally expect17:40
fungiif nothing else, it could explain why the matrix admins aren't particularly responsive at the moment17:44
rotenseprohello18:57
rotenseproanyone here please18:57
clarkbrotensepro: yes, a few of us are here. Usually best to just ask your question if you have one18:58
rotenseproI'm an Outreachy intern....currently studying the docs & guides sent to me by my mentor18:59
rotenseproso happy to be here18:59
rotenseproMy project mentor's usernames are: fpantano, ashrodri19:03
fungiwelcome!19:05
fungiwe're in the middle of our weekly meeting (in the #opendev-meeting channel), so slightly slower to respond in here as a result19:05
JayFWelcome o/ 19:05
*** dviroel is now known as dviroel|biab19:19
mlavalleclarkb: yeap, it worked: https://review.opendev.org/c/openstack/networking-ovn/+/86061019:26
mlavallethanks!19:26
JayFmlavalle: I hope you're doing well o/ 19:35
corvusclarkb: you're saying an earlier patchset of 855292 ended up without images?19:40
fungicorvus: https://zuul.opendev.org/t/openstack/build/0940a25021fa4d9c891186457d4042e819:40
fungithat was the last example19:41
clarkbcorvus: yes, that is correct19:41
clarkbnormally we'd probably fall back to what is on docker hub but since the change adding the images to docker hub hasn't merged yet it doesn't do that19:41
corvusit looks like that is based on PS7 of the image build change19:43
corvus(just making sure i have all the configs lined up)19:44
corvusclarkb: fungi it appears zuul itself dtrt; here's the artifact info: https://zuul.opendev.org/t/openstack/build/0940a25021fa4d9c891186457d4042e8/log/zuul-info/inventory.yaml#12419:47
clarkbI guess that implies somewhere in the docker image pulling we aren't looking to the intermediate registry properly?19:48
corvusyeah, does that job inherit from one of the image-using base jobs?19:48
clarkboh! that's it I think. system-config-run and system-config-run-containers are distinct19:49
clarkband we need system-config-run-containers in this situation19:49
corvusyep, that looks like that should do it.19:50
corvusclarkb: maybe it's worth putting a "fail" task in the system-config-run playbook if there are container artifacts?19:51
clarkbthat's a good idea19:51
corvus(basically, that should detect the "you set up provides/requires but without the intermediate registry" case)19:51
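
[Editor's sketch] The suggested guard would presumably be an Ansible "fail" task in the system-config-run playbook; the Python below only sketches the condition it would test. It assumes each artifact in the job's inventory is a dict with a "name" and a "metadata" mapping whose "type" is "container_image". That shape, the check_job helper and the uses_intermediate_registry flag are illustrative assumptions, not confirmed by the log.

```python
def find_container_artifacts(artifacts):
    """Return the artifacts whose metadata marks them as container images."""
    return [
        artifact
        for artifact in artifacts
        if (artifact.get("metadata") or {}).get("type") == "container_image"
    ]


def check_job(artifacts, uses_intermediate_registry):
    """Raise if container artifacts were supplied to a job that cannot pull them.

    Hypothetical guard: 'uses_intermediate_registry' stands in for "this job
    inherits from a base job that wires up the intermediate registry".
    """
    found = find_container_artifacts(artifacts)
    if found and not uses_intermediate_registry:
        names = ", ".join(a.get("name", "<unnamed>") for a in found)
        raise RuntimeError(
            f"container artifacts present ({names}) but this job does not "
            "use the intermediate registry"
        )


# Hypothetical artifact list, shaped as assumed above.
example_artifacts = [
    {"name": "example-image", "metadata": {"type": "container_image"}},
    {"name": "example-docs", "metadata": {"type": "docs_site"}},
]
check_job(example_artifacts, uses_intermediate_registry=True)
```

Failing early this way would turn a confusing "pull access denied" at docker-compose time into an explicit configuration error.
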
fungiianw: yeah, not combining them into one change for sure, but if we want them to take effect at different times then do we want some specific amount of time separating them in order to avoid confusion?20:04
fungilike three days? a week?20:04
ianwyeah for mine, i think something like >= 2 days (just for tz's and the world to spin around everyone's "day" :)  gives enough granularity to detangle things20:06
fricklerforgot to mention that rocky9 image builds are still paused. I tested a couple of times to unpause, but no change to the failure20:19
frickleralso are we planning to have a normal meeting next week? I'd rather not since my day starts with a QA session at 7UTC20:20
fricklerand with that I'm off for today20:21
fungimm3 import test is underway now on 149.202.168.20420:34
fungiprior full imports took around 2.5-3 hours to complete20:34
ianwhrm, https://review.opendev.org/c/opendev/system-config/+/856593 failed on github timeouts getting a client file for bridge (2x) and a letsencrypt failure20:43
ianwi don't know what the realistic chance of running ~52 jobs (26-ish jobs * 2 for check and gate) is for getting this in ... 20:45
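
[Editor's sketch] A rough way to put a number on that worry, assuming each job passes independently with the same probability. The per-job success rates below are made-up illustrations, not measurements, and real failures such as an upstream outage tend to be correlated across jobs.

```python
def all_pass_probability(per_job_success: float, n_jobs: int) -> float:
    """Probability that n_jobs independent jobs all pass."""
    if not 0.0 <= per_job_success <= 1.0:
        raise ValueError("per_job_success must be between 0 and 1")
    if n_jobs < 0:
        raise ValueError("n_jobs must not be negative")
    return per_job_success ** n_jobs


# Hypothetical per-job success rates for the ~52 job runs mentioned above.
for rate in (0.99, 0.98, 0.95):
    chance = all_pass_probability(rate, 52)
    print(f"{rate:.2f} per job -> {chance:.1%} chance that all 52 pass")
```

Under these assumptions, even a 99% per-job success rate leaves a bit under a 60% chance of a clean run through both check and gate.
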
mlavalleJayF: yeah, doing great! o/20:51
JayFmlavalle: just good to see you around; you ever need anything lmk :D 20:54
clarkbfrickler: there was a change NeilHanlon pushed up to stop using some of the fancier mirror stuff to try and get better errors out of the logs. But I think that may require a dib release20:54
mlavalleJayF: will do. Thanks20:55
clarkbfungi: I'm going to update the mm3 image change to swap out the parent system-config-run job and remove the WIP from the commit message. I don't think this will impact you at all but heads up20:55
opendevreviewClark Boylan proposed opendev/system-config master: Fork the maxking/docker-mailman images  https://review.opendev.org/c/opendev/system-config/+/86015720:57
opendevreviewClark Boylan proposed opendev/system-config master: DNM force mm3 failure to hold the node  https://review.opendev.org/c/opendev/system-config/+/85529220:57
clarkbinfra-root https://etherpad.opendev.org/p/3BivmG_PowWf77iKffJU something like that for the ansible 5 and jammy email?21:08
ianwis the rocky failure related to https://review.opendev.org/c/openstack/diskimage-builder/+/860428 ?  it wasn't clear to me if that was to be merged or was for testing21:09
clarkbianw: yes, I think the idea was if we land that change and run it that we might get better error messages from dnf21:09
clarkbcurrently dnf fails and then doesn't tell you what actually failed because it's using the abstract mirror list21:10
clarkbwhich makes it really hard to debug getting short http responses21:10
clarkbit basically just says it got fewer bytes than expected for each of the mirrors it hits but doesn't tell you which mirrors21:10
clarkbI tried to manually reproduce by getting a mirror listing from the mirrorlist url on nb01, then fetched the files from that server and they came back fine. The other suspicion I had was that maybe we had cached something that was broken but I couldn't find that in the cache if we had done so21:11
clarkbhttps://bugzilla.redhat.com/show_bug.cgi?id=1262845 is the dnf behavior bug that has been open for 8 years21:12
ianwclarkb: ok, one other quick option would be just to hand-edit that into the container and run it, see what happens21:15
clarkbwell I tried to reproduce with the container outside of dib and could not21:15
clarkbdo you mean try and jump on a dib build and edit it really quickly before it fails?21:16
clarkb(this happens right at the beginning of the build so it might be tough to do that)21:16
ianwoh i just mean start a shell in the nodepool container and edit the element file, then trigger a build21:17
clarkboh21:17
clarkbthat container21:17
ianwi've done that a couple of times for things that i just couldn't reproduce outside the actual nb21:17
clarkbya let me give that a shot right now21:17
ianwobviously it doesn't survive etc. but handy for quick testing21:18
ianwi'm also happy to merge it, but it's a long path from there to fresh container builds deployed21:18
corvusclarkb: i made some changes to your etherpad; i'd like to avoid linking to the scratch etherpad where we batted around ideas for the zuul release sequence, and definitely want to avoid calling it a "published release schedule" :)21:19
ianwoh, that was me that added that21:19
clarkbcorvus: ack21:20
corvuseven now, that's still just an estimate, assuming that the change to actually remove ansible 5 gets written soon21:20
clarkbIf I manage to dig out from under everything else i may write that change if for no other reason than to keep my email as accurate as possible21:21
clarkbbut I still owe you a review on that static driver nodepool change21:21
clarkbianw: any idea what editor you use? I can't find one in these images21:21
clarkbI guess I could heredoc a file replacement21:22
ianwyou probably have to apt-get install a vim21:22
clarkbok nb01 has been updated (but not 02) and I have unpaused rockylinux-921:25
fungiclarkb: on the announcement, it's probably not necessary to call out the openstack release. readers who are concerned about impact to openstack release timing already know what point in the cycle it's at and the other projects on opendev likely don't care about openstack's release schedule anyway21:31
fungii'll separately post to openstack-discuss to draw their attention to the announcement anyway21:32
clarkbfungi: like that?21:34
fungiclarkb: yeah, looks great!21:34
fungishort and to the point21:34
clarkbI made one small additional edit related to that21:35
clarkbI will send that out in a few minutes just to be sure there are no new suggestions21:35
fungiyeah, still lgtm21:35
clarkbianw: frickler NeilHanlon the manually patched rocky 9 build seems to have gotten past where it was previously stuck21:37
clarkbwhich is great, but also not so great because now we don't have extra debugging info21:37
clarkbsomeone that understands mirrorlist vs not better than me should weigh in if we can get away with this change21:38
clarkbI suspect it will cause all our jobs to use the single base repo instead of mirrors? that is probably not ideal21:38
clarkbNeilHanlon: ^ the fact that it worked with that change makes me suspect there is a broken mirror in the mirror list though21:41
*** rotensepro_ is now known as rotensepro21:43
fungimailman import test has just reached openstack-stable-maint, this is the long one22:08
ianwclarkb: yeah, it sounds like the nb is getting hashed to some mirror that is unhappy ...22:22
ianwi guess it may not even show up when just wgetting a file, it might be something to do with actually grabbing multiple files or other things dnf does22:23
clarkbya what I did by hand outside of the nodepool container was fetch the mirror list then take the first result of that and try to fetch the file from there22:25
clarkband that worked, but maybe something about user agents changes the listing? Definitely weird to me that dnf can't log who/what it talked to in a mirrorlist setup to enable debugging though22:26
clarkbseems like thats a basic thing to have22:26
NeilHanlonyeah, agreed clarkb. i will get this handled tonight. just been busy22:27
clarkbNeilHanlon: well I don't think its a rush. I'm just not super well clued into all this stuff so looking for guidance22:30
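The manual check clarkb describes above (fetch the mirrorlist, take the first entry, fetch a file from it) can be repeated for every entry in the list so that a single broken mirror stands out. The following is a minimal sketch under assumptions, not something that was run in this channel: the function names, the `mirror-probe/0.1` user agent string and the example hostnames are all hypothetical, and the mirrorlist is assumed to be a plain text body with one base URL per line.

```python
import urllib.request
from urllib.error import URLError


def parse_mirrorlist(text):
    """Return mirror base URLs from a mirrorlist body, skipping blank lines and '#' comments."""
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            mirrors.append(line)
    return mirrors


def candidate_urls(mirrors, relative_path):
    """Build one URL per mirror for the same repo-relative file."""
    return [m.rstrip("/") + "/" + relative_path.lstrip("/") for m in mirrors]


def probe(url, user_agent="mirror-probe/0.1", timeout=10):
    """Fetch a URL and return (url, HTTP status or error string). Performs network I/O."""
    # The user agent is configurable because the listing or response may
    # differ by client; the default value here is a made-up placeholder.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return url, resp.status
    except (URLError, OSError) as exc:
        return url, str(exc)


if __name__ == "__main__":
    # Hypothetical mirrorlist body; a real one would be fetched from the
    # mirrorlist URL in the repo configuration.
    body = """\
# example mirrorlist
https://a.example.org/rocky/9/BaseOS/x86_64/os/
https://b.example.org/pub/rocky/9/BaseOS/x86_64/os
"""
    for url in candidate_urls(parse_mirrorlist(body), "repodata/repomd.xml"):
        print(probe(url))
```

As ianw notes, a single-file fetch may not reproduce whatever dnf trips over, so a clean result from a probe like this would not rule a mirror out as the culprit.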
ianwany objection if i force merge 856593?  i would rather be debugging it in production than sending it around endless rechecks for external issues22:47
clarkbianw: looks like letsencrypt failed consistently. We're happy it wasn't due to this change?22:49
clarkb(the other jobs that failed seemed to succeed at other times)22:49
clarkbI think I'm ok with it if we are happy that the LE failure is unrelated22:49
ianwyeah, it has all succeeded -- https://bbd364adb9ecffa2cba8-64c11a180f7c97233fe37e0ff9660661.ssl.cf1.rackcdn.com/856593/17/check/system-config-run-letsencrypt/8ea3d67/letsencrypt01.opendev.org/acme.sh/acme.sh.log22:51
ianw"I believe this error basically says “the Let’s Encrypt database was overloaded”."22:52
ianwoh in fact they have an incident listed22:52
ianwOctober 11, 2022 18:45 UTC22:53
ianw[Identified] A hardware failure has disrupted operation of the Let's Encrypt Staging Environment. We are working on restoring service.22:53
clarkbwould be good if fungi et al could weigh in too before doing that though23:02
fungichecking...23:07
fungiianw: the le job still failed on the most recent build, from what i can see, but once you're confident they've got it resolved i'm still fine with that23:09
ianwwell it's just that we can't merge because we use the staging env in our tests23:10
clarkbright all jobs needing le certs will fail until that is fixed23:12
clarkbI suppose if all the jobs failed for that reason we might be able to just wait? Depends on how urgent you feel this is I suppose23:12
ianwi would like to give myself a bit of a runway to debug it in production ...23:13
fungioh! i missed that you said "force merge"23:14
fungii guess if you're also ready to force merge a revert if stuff stops working once it lands, i'm okay with that (or temporarily disable the failing job?)23:14
ianwyeah, what i want to watch is the production jobs, and hopefully catch things before the periodic jobs start23:15
clarkbnote a revert won't actually undo things23:15
clarkbbut ya force merging a fix would be the necessary step if LE staging continues to be broken should that be necessary23:16
clarkbok going to send that ansible and jammy announcement now23:20
fungithanks!23:20
ianwit should run the deploy pipeline, right?23:22
clarkbianw: it should as that is triggered by changes merging23:23
clarkbone risk there is if we force merge something that expects the gate to build images that get promoted in deploy. But I don't think this is the case here23:23
ianwyeah, no images in this case, just the venv deployment to bridge23:24
clarkbemail sent to service-announce23:28
ianwi was hoping to move on with this today :/  i'll give it a little while longer and see if things clear up23:41
clarkbianw: it == LE?23:42
clarkbon the matrix bridging trouble i created a random room on oftc and tried to join it via matrix bridge and I don't even seem to be in the channel23:44
clarkbwhich I guess makes sense as I don't seem to be in here anymore from matrix at all23:44
clarkbwe all seem to have ping timed out at around 23:33 UTC yesterday23:45
clarkbabout 24 hours ago23:45
clarkbI'm working on filing an issue with their github repo since that seems to be where reports should go23:45
clarkbhrm now I notice that some matrix bridged users made it back23:47
corvusthere was bridge trouble?23:50
ianwclarkb: it == the venv deployment on bridge23:50
corvusfwiw, i have seen chatting here throughout my day to day23:50
clarkbianw: right I guess I'm wondering if you are waiting on LE instead of force merging or if something else is happening that I've missed23:52
clarkbcorvus: yes a number of us can see irc messages in matrix but cannot send them to irc from matrix23:52
clarkbcorvus: when I asked the irc bridge operators on matrix about it I got pointed at a number of issues with the inverse problem which wasn't super helpful23:53
ianwclarkb: the LE outage is the current problem ... but bigger picture too, it does seem running *all* the system-config-run jobs twice without any failures is quite a dice to roll23:54
clarkbI tried leaving #openstack-release and rejoining but that doesn't seem to have caused me to actually rejoin23:57
clarkbif I issue !listrooms in the management room I get back you are connected, but not joined to any channels23:59
clarkbso it seems to know something is up23:59
