Monday, 2021-12-06

ianwok, i've cleared out the deleting nodes manually00:26
ianwboth builders are working away on images and have built one successfully00:26
ianwopenstack.exceptions.NotSupported: The image service for iweb:mtl01 exists but does not have any supported versions03:09
ianwi wonder how new this is03:09
ianwhttps://meetings.opendev.org/irclogs/%23zuul/%23zuul.2020-04-23.log.html#t2020-04-23T20:25:18 is another instance03:12
ianw$ curl https://image.api.mtl01.cloud.iweb.com03:21
ianwcurl: (60) SSL certificate problem: unable to get local issuer certificate03:21
ianwi think this is the real issue ...03:21
ianwmy browser seems to trust it, but the nodepool-builder image doesn't03:22
ianwthe underlying focal system doesn't seem to trust it, and neither does my debian testing system03:31
ianwneither does fedora 3503:31
ianwCertificate chain03:36
ianw 0 s:CN = image.api.mtl01.cloud.iweb.com03:36
ianw   i:C = GB, ST = Greater Manchester, L = Salford, O = Sectigo Limited, CN = Sectigo RSA Domain Validation Secure Server CA03:36
ianw 1 s:C = US, O = DigiCert Inc, CN = RapidSSL TLS DV RSA Mixed SHA256 2020 CA-103:36
ianw   i:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA03:36
ianw 2 s:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA03:36
ianw   i:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA03:36
ianwi think this is actually sectigo03:36
ianwit looks like missing intermediate certificates to me03:41
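A minimal sketch of the check behind the "missing intermediate" reading: in a well-formed chain, each certificate's issuer should be the subject of the next certificate. The subject/issuer pairs below are copied (CN only) from the chain output above; the function name `find_chain_breaks` is hypothetical and this is only an illustration, not how openssl or nodepool verify anything.

```python
# (subject CN, issuer CN) pairs, copied from the chain output above.
chain = [
    ("image.api.mtl01.cloud.iweb.com",
     "Sectigo RSA Domain Validation Secure Server CA"),
    ("RapidSSL TLS DV RSA Mixed SHA256 2020 CA-1",
     "DigiCert Global Root CA"),
    ("DigiCert Global Root CA",
     "DigiCert Global Root CA"),
]


def find_chain_breaks(chain):
    """Return each index i where cert i's issuer is not cert i+1's subject."""
    breaks = []
    for i in range(len(chain) - 1):
        issuer = chain[i][1]
        next_subject = chain[i + 1][0]
        if issuer != next_subject:
            breaks.append(i)
    return breaks


# The leaf is issued by Sectigo, but the next cert served is a RapidSSL
# intermediate, so the link after cert 0 is broken.
print(find_chain_breaks(chain))
```

With the data above the only broken link is after certificate 0, which matches the server sending the wrong intermediate for the renewed leaf certificate.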
opendevreviewIan Wienand proposed openstack/project-config master: iweb: disable image uploads  https://review.opendev.org/c/openstack/project-config/+/82047304:12
ianwon the plus side, "Linux fedora-35-inmotion-iad3-0027609836" ... fedora 35 now boots04:13
ianwnote per the commit in 820473 i've filed a ticket on the certificate issue04:15
*** pojadhav is now known as pojadhav|ruck04:31
*** ysandeep|out is now known as ysandeep04:43
*** marios is now known as marios|ruck06:37
*** pojadhav|ruck is now known as pojadhav|rover06:39
opendevreviewRiccardo Pittau proposed opendev/irc-meetings master: Move Ironic meeting 1 hour forward  https://review.opendev.org/c/opendev/irc-meetings/+/82047708:22
*** ysandeep is now known as ysandeep|lunch08:52
*** ysandeep|lunch is now known as ysandeep09:11
dtantsurHey folks. I'm seeing node_failures on fedora-latest and opensuse-15, is it expected/known?09:16
*** pojadhav is now known as pojadhav|rover09:18
*** rlandy is now known as rlandy|ruck11:14
*** ysandeep is now known as ysandeep|afk12:02
*** pojadhav|rover is now known as pojadhav|brb12:08
*** pojadhav|brb is now known as pojadhav|rover12:28
*** ysandeep|afk is now known as ysandeep12:44
*** marios|ruck is now known as marios|call13:37
*** marios|call is now known as marios|ruck|call13:38
*** marios|ruck|call is now known as marios|ruck14:34
*** ysandeep is now known as ysandeep|afk14:40
*** ykarel is now known as ykarel|away14:42
opendevreviewMerged opendev/elastic-recheck rdo: Fix ER bot to report back to gerrit with bug/error report  https://review.opendev.org/c/opendev/elastic-recheck/+/80563814:56
fungidtantsur: i think that means we don't have any nodes matching those labels... i'll take a look14:58
fungii know ianw was working on getting fedora-35 booting, and i see one in a ready state in rax-dfw since almost 8 hours15:01
fungisorry, no, 8 minutes15:01
fungiwhich means we're probably using them successfully but maybe we didn't have any when you tried since that was only a few hours after ianw said he'd gotten it booting15:02
fungioh, though the fedora-latest nodeset uses the fedora-34 label15:03
fungiand yeah, we don't seem to have any of those15:04
fungiright, we don't appear to have any fedora-34 images built15:05
funginor opensuse-15 images15:05
fungilooks like the builders are still spinning their wheels trying to upload to iweb-mtl01 because of the cert issue ianw observed earlier15:13
fungiseems like this has effectively deadlocked our image builds, and for some reason the previous opensuse-15 and fedora-34 images were deleted and no replacements have been built yet15:14
fungiwe may need to temporarily remove iweb-mtl01 from our builder configs15:15
fungii'll work on a patch for that now15:15
*** ysandeep|afk is now known as ysandeep15:17
fungioh, ianw already tried to push up a patch for that as 820473 but i think it's hitting the wrong file15:24
opendevreviewJeremy Stanley proposed openstack/project-config master: iweb: disable image uploads  https://review.opendev.org/c/openstack/project-config/+/82047315:26
fungii'll emergency-approve that15:26
fungiinfra-root: i've put nb01/nb02 into the emergency disable list and manually applied that change while we wait for it to merge, though i think the builder containers are going to have to be restarted since they're looping far down enough they're not going to reach the point where they reread their configs automatically15:33
corvusfungi:  if you like, i'll take a look at the logs and see if i concur with the looping hypothesis?15:35
fungicorvus: please do15:36
fungii haven't restarted them yet15:36
corvusfungi: 2021-12-06 15:30:25,831 DEBUG nodepool.ProviderManager: Creating new ProviderManager object for iweb-mtl0115:37
corvus makes me think it did reload the config... not sure what it's doing right now though15:37
fungiyeah, actually, it looks like they may have15:37
fungithat's roughly the time i made the edit15:37
corvusit's deleting an image from mtl01 though, which is not what you want it to do right now?15:38
fungii wonder if it's trying to delete one it never successfully uploaded15:38
fungii set all the images to paused there in its config15:38
corvusyes, it's deleting failed uploads15:39
corvusand the one it said it was deleting is gone, so i think it may be done, and correctly idling now15:39
corvusfungi: i think your procedure worked as expected :)15:40
corvusthat was on nb01 btw15:40
fungiawesome. hopefully they'll proceed to build the missing images15:40
fungisince we're short at least fedora-34 and opensuse-15 at the moment15:40
funginot sure why those disappeared globally, maybe related to the cleanup ianw did when resolving the full filesystems15:41
opendevreviewMerged openstack/project-config master: iweb: disable image uploads  https://review.opendev.org/c/openstack/project-config/+/82047315:49
fungithe builders still don't actually seem to be doing anything now though... corvus: in theory they should be moving on to start building missing images, right?15:52
fungiit's been over 20 minutes since they logged that they were deleting records for failed uploads15:53
fungisince the pause merged, i'll take the builders back out of emergency disable now15:59
jrossercorvus: here's the keycloak realm config ansible tasks https://gist.github.com/jrosser/0444430988ee4d28788f2577c64712a916:02
jrosserfeels like an example of where some custom module might be warranted now rather than massive use of the uri module16:03
jrosseroh, it's not maintained the order, the entry point is configure_realms.yml16:04
fungicorvus: i'm going to restart the nodepool-builder container on nb01 and see if it starts building things. i'll leave nb02 as-is for further inspection for the moment16:11
corvusfungi: maybe a thread dump on nb02 would be warranted to see what it's up to16:12
fungiyeah, i was going there next16:12
fungithough the nb01 debug log has been silent since i restarted it, and it doesn't seem to think it needs to build any images16:14
fungistill no opensuse-15 nor fedora-34 images according to dib-image-list nor image-list16:16
funginot even in a building state16:16
clarkbsomething needs to trigger the daily builds if that is what you are waiting on16:16
fungii thought the builders automatically built any missing images?16:16
fungidid they stop doing that?16:17
clarkbthey do, but there is still a process that goes through and finds which ones need to be rebuilt and explicitly enqueues build requests for them16:17
clarkbif that thread isn't running or doesn't run extremely frequently we could still be waiting for that16:18
fungiaah16:18
fungiany idea how often? hourly? longer?16:19
fungii always recall it being fairly immediate16:19
*** ysandeep is now known as ysandeep|out16:20
clarkblooks like it checks every loop through on the build worker16:22
clarkbwhich means if we're building an image already that could explain it16:22
fungiexcept nothing is building from what i can see16:23
fungitwo thread dumps a minute apart and yappi stats are at the end of the nb02 debug log now16:24
fungi2021-12-06 16:21:45,839 DEBUG nodepool.stack_dump: Beginning debug handler16:25
fungistarts there16:25
clarkbdo we know why those images got deleted?16:26
clarkbboth thread dumps show the build worker is in _checkForManualBuildRequest which happens after checking for stale images that need to be built. I'm not sure we can conclude it is stuck in there yet though and may have just gotten lucky when the signals were sent16:28
fungii do not. speculation is it was something to do with the cleanup ianw was doing after the filesystem filled up on all the builders due to hanging deletes16:28
clarkbquickly double checking the images are still in the config and dib-image-list doesn't record them16:29
clarkbya I'm not sure why it isn't finding those images don't exist and need to be built16:30
clarkbwe can manually request a build and see if that behaves different16:31
clarkbshould I go ahead and do that or do we want to do more debugging first? cc corvus16:31
corvusclarkb: maybe do another dump and see?16:34
clarkbsure can do16:35
corvusoh actually it looks like the first thread dump was in _checkForScheduledImageUpdates16:36
clarkbasync_object.set_exception(ConnectionClosedError( <- my first caught that, second seemed happier though but all were in the manual build requests16:37
clarkbcorvus: oh hrm. Ya so where is it deciding that it doesn't need to build those images16:37
clarkbcheck if disk images are paused.16:40
opendevreviewRiccardo Pittau proposed openstack/diskimage-builder master: Install python versions specific pip and virtualenv modules  https://review.opendev.org/c/openstack/diskimage-builder/+/82056316:41
corvusfedora34 is paused in the config file16:41
clarkbyes as is opensuse-1516:41
clarkbso mystery solved. However, I bet that means we can't build those images now :/16:41
clarkbwe can try unpausing them. I'll look in git history to see why they were paused and propose reverts/toggles16:41
fungioh, ugh16:41
corvusprogram performs as instructed, news at 11 :)16:42
fungiwell, i guess that explains it16:42
fungimakes me wonder all the more how they got deleted globally16:42
clarkbfedora-34 was paused until https://review.opendev.org/c/openstack/diskimage-builder/+/817317 could be landed. It has landed but we also need it to be included in the nodepool builder images. I'm checking that next16:43
clarkbyes I think we're good for that one.16:44
clarkbthere is a dib release after that merged and nodepool requires that latest version16:44
clarkbopensuse-15 was paused due to "persistent build failures" almost a year ago16:45
fungifedora-34, gentoo-17-0-systemd, opensuse-15 and opensuse-tumbleweed are the ones which are paused. of those, only gentoo-17-0-systemd has images16:45
clarkbI think we unpause opensuse-15 and debug the build failures16:45
fungiso we seem to have lost fedora-34, opensuse-15 and opensuse-tumbleweed16:45
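A rough sketch of why nothing was being rebuilt, assuming (as the discussion above suggests) that the builder only enqueues builds for images that are missing and not paused. The image names and pause states come from the conversation; the `fedora-35` entry, the data layout, and the function name `images_needing_build` are illustrative assumptions, not nodepool's actual code or config format.

```python
# Pause state per configured image, as described in the channel.
# fedora-35 is included as an assumed example of an unpaused image.
configured = {
    "fedora-34": True,
    "gentoo-17-0-systemd": True,
    "opensuse-15": True,
    "opensuse-tumbleweed": True,
    "fedora-35": False,
}

# Images that currently have at least one build on disk.
built = {"gentoo-17-0-systemd", "fedora-35"}


def images_needing_build(configured, built):
    """Split missing images into those that will be built and those stuck."""
    to_build, stuck = [], []
    for name, paused in configured.items():
        if name in built:
            continue  # an image already exists, nothing to do
        if paused:
            stuck.append(name)  # missing, but paused: never enqueued
        else:
            to_build.append(name)
    return sorted(to_build), sorted(stuck)


to_build, stuck = images_needing_build(configured, built)
print("will build:", to_build)
print("missing but paused:", stuck)
```

Under these assumptions the builder has nothing to do, while fedora-34, opensuse-15 and opensuse-tumbleweed stay missing until they are unpaused.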
clarkbthen decide if we're going to keep building them16:45
clarkbfungi: re tumbleweed without suse's more active involvement I think we should consider not building it at all16:46
clarkbI thought it would be a good way to get leading edge stuff but only if we had people around that could care for it. Opensuse-15 seems a bit more important. Anyway unpauses should be up as soon as I can write the commit16:46
fungiyeah, makes sense, i'm currently just trying to discern if there's something about the paused image builds which resulted in their deletion16:46
clarkbno I suspect they all got caught in ianw's deletion. Maybe because they were old looking and didn't get filtered by active in the dib-image-list?16:47
clarkbor perhaps they had fallen out of the dib-image-list somehow but were still in the image-list?16:47
*** pojadhav|rover is now known as pojadhav|out16:48
fungileaves me wondering why the gentoo image was spared that fate16:48
opendevreviewClark Boylan proposed openstack/project-config master: Unpause fedora-34 and opensuse-15 image builds  https://review.opendev.org/c/openstack/project-config/+/82056516:49
fungithere's also 816933 proposing to remove f34 images since roughly a month16:51
clarkbwe should probably unpause for now since we had those images recently, then work to retire them in the normal process.16:52
clarkbif opensuse-15 completely fails to build still and no one shows up to fix them that might be another story though16:52
fungiyeah16:52
fungitesting on year-old opensuse images was probably not all that great in the first place16:52
fungiwilling to bet 15.2 isn't even the current point release16:53
fungiand may not be available any longer for that matter16:53
clarkbya I think a new release has happened. We should see where we are at after getting logs and send notice to people that the suse stuff needs care and we can guide people through that if there is interest otherwise start to retire it16:53
*** marios|ruck is now known as marios|out16:54
fungiwe do still seem to be mirroring 15.2 packages at least16:54
fungihttps://mirror.dfw.rax.opendev.org/opensuse/distribution/leap/15.2/repo/oss/x86_64/ has a bunch of rpms in it16:55
clarkbya part of the struggle here is that unlike ubuntu or centos when a major release gets updated the package mirrors don't auto roll forward16:55
clarkbeach "minor" release is a true release with an entirely separate package mirror aiui16:55
clarkbwhich means that we need to spin up the 15.3 mirror first, then spin up the images, then remove 15.2 images then remove 15.2 mirror16:56
clarkb(or maybe we start pushing back on mirroring for these lesser used images entirely and spin up 15.3 without a local mirror if people want 15.3)16:56
fungithat's a possibility, sure16:56
clarkbcompared to say 20.04.x becoming 20.04.x+1 where the main mirror just updates and we auto sync16:57
clarkbsimilar story with centos 8 (though it isn't updating anymore)16:57
clarkbWe should probably remind people that centos-8 is going away at the end of the year and plan to remove it in early january in our nodepools?16:57
clarkbI need to find breakfast but I'll probably land the matrix-gerritbot image update change after that. System-config updates are happening normally as expected now right?16:59
clarkbfungi: also how is lists.o.o fixing going /me finds those changes17:00
clarkblooks like we may still need the port 25 update to make them mergeable? (Also a rebase? I bet it was the keycloak change that conflicted)17:02
fungii haven't pushed the update yet17:02
fungiabout to head out to an appointment but planned to send them up when i get back in a couple hours17:02
clarkbsounds good, thanks17:02
clarkbre opensuse mirroring. I'm currently getting a few KBps at home pulling updates17:03
clarkbentirely possible that the suse distros aren't viable without mirroring :/17:03
fungithinking about how our deployment testing works, i have a feeling i'm going to need a custom test-only role to block 25/tcp egress in tests, as we normally test only applying the production iptables rules configured in our inventory. i guess i would include it in jobs after playbooks/roles/iptables so it doesn't get undone17:10
fungiand just have it inject rules via the cli17:10
clarkbI think you may also be able to add custom rules to the test base inventory?17:12
clarkbopendev/system-config/playbooks/zuul/templates/group_var/all.yaml.j2 is the file17:13
fungioh, is that only used for testing?17:13
fungii should be able to add an iptables_rules var in there, in that case17:14
clarkbya the templates under there are put in place as fake private group vars17:14
opendevreviewMerged openstack/project-config master: Unpause fedora-34 and opensuse-15 image builds  https://review.opendev.org/c/openstack/project-config/+/82056517:14
fungilike we do in inventory/service/group_vars/review.yaml17:14
clarkbya though the ones under inventory and on bridge are the real prod values17:15
clarkbonly the ones under playbooks/zuul/templates are test only17:15
fungiokay, headed out to my appointment, should hopefully be back around 20:00 utc17:15
*** sshnaidm is now known as sshnaidm|afk17:25
*** rlandy|ruck is now known as rlandy|ruck|brb17:26
clarkbonce I've confirmed nodepool has unpaused properly I'm going to approve tristanC's matrix-gerritbot update17:26
tristanCclarkb: nice, i'm here to check if it goes wrong17:30
clarkbthanks17:31
clarkbjust waiting for the nodepool job to run then I'll get to that17:31
clarkbthey are unpaused and building now \o/17:39
clarkband now matrix-gerritbot has been approved17:39
tristanCclarkb: i'm watching curl eavesdrop01.opendev.org:9001/metrics | grep error17:45
clarkbtristanC: ok I think it may take a little time to land the change since it does the deployment testing first17:45
clarkbinfra-root something like https://etherpad.opendev.org/p/xvu2oKUQVLiHIHsvGRUt for the image changes (this assumes the opensuse images are still broken)17:52
*** rlandy|ruck|brb is now known as rlandy|ruck18:13
fricklerclarkb: also mention the f34 to f35 step in the same run or was that already announced? and what about tumbleweed? otherwise I'm fine with the wording18:13
clarkbfrickler: oh ya good point. We should mention fedora-34 being replaced by 35. For tumbleweed I guess we should say we'd prefer to turn it off entirely since it really hasn't been maintained18:14
clarkbfrickler: updated with the additional thoughts. Thanks that was good feedback18:18
opendevreviewMerged opendev/system-config master: Update the gerritbot-matrix image to support arbitrary uid  https://review.opendev.org/c/opendev/system-config/+/81864518:18
clarkbtristanC: ^ I think that should apply shortly18:19
tristanCclarkb: ssh_errors counter is increasing, has the service restarted already?18:33
clarkbConnecting to review.opendev.org:29418 No user exists for uid 1100018:33
clarkbtristanC: yes it just restarted and ^ is the error. I guess openssh or something else (the fork tooling in ghc?) doesn't like forking to a non existent user18:34
tristanCthat's odd, it did work with rootless podman18:34
clarkbI'll manually undo the uid/gid specification and restart18:34
tristanCclarkb: yes please, we'll need another update18:34
clarkbok thats done. The new image seems to be working with the uid:gid specification commented out of the docker-compose.yaml file18:35
clarkb(I didn't revert the image)18:35
clarkbI guess we need stronger checks in our deployment job too18:36
opendevreviewClark Boylan proposed opendev/system-config master: Unset the matrix gerritbot uid:gid settings  https://review.opendev.org/c/opendev/system-config/+/82058318:38
clarkbI'll self approve ^18:38
clarkbtristanC: ^ I included the entire log context around the error if that is helpful in the commit message there18:39
clarkbsounds like openssh is the problem18:39
clarkbit very specifically wants the uid to exist? that is odd, but ok18:40
clarkbmaybe a security thing?18:40
*** avass[m] is now known as AlbinVass[m]18:45
tristanCit is openssh complaining about the arbitrary uid, but i don't get that error when running `podman --user 11000`18:48
opendevreviewClark Boylan proposed opendev/system-config master: Update gitea to 1.15.7  https://review.opendev.org/c/opendev/system-config/+/82026719:02
tristanCok, i reproduce the issue when using rootfull docker19:02
clarkbinfra-root ^ I decided to double check that the templates didn't change and there is only a minor update to the head navbar template in a block that we comment out. I figured best to deal with small deltas now rather than bigger deltas in future upgrades19:03
clarkboh wow opensuse 15 and fedora-34 both built on their first passes19:04
clarkbI'll need to update my email etherpad19:04
tristanCclarkb: oh i see, podman populates a correct /etc/passwd in that case19:05
clarkbthat is an interesting behavior. Images aren't so immutable I guess :)19:07
clarkbbut that certainly explains why it doesn't error19:07
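The "No user exists for uid 11000" message is consistent with a failed passwd lookup for the process uid. A small sketch of that kind of lookup, using Python's standard `pwd` module; the helper name `uid_has_passwd_entry` is hypothetical, and this only illustrates the lookup, it is not the gerritbot or openssh code.

```python
import pwd


def uid_has_passwd_entry(uid):
    """Return True if the passwd database has an entry for this uid."""
    try:
        pwd.getpwuid(uid)
    except KeyError:
        # No entry: tools that insist on a named user will refuse to run.
        return False
    return True


# In a container started with an arbitrary uid and an unmodified
# /etc/passwd, this prints False for that uid; with a passwd entry
# present for the uid (as described for podman above) it prints True.
print(uid_has_passwd_entry(11000))
```

Running this inside the container under both runtimes would be one way to confirm the difference tristanC describes.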
clarkbdtantsur: I think you should be set for fedora and opensuse now19:08
fungiokay, back and catching up19:18
fungiclarkb: built maybe, but did they boot?19:20
clarkbfungi: I haven't checked that19:23
fungii'll look19:23
fungi| fedora-34-0000024221             | fedora-34             | nb01.opendev.org | qcow2,raw,vhd | ready | 00:00:25:59  |19:23
fungino, sorry, that's not what i meant to check19:23
fungithough it's probably still uploading given it's that new19:24
fungiyeah, still uploading everywhere19:24
fungisame for opensuse-1519:24
fungii'll check on it again in half an hour19:24
clarkbfungi: I'll remove the ubuntu one 2fa topic from the meeting agenda? I think you've proceeded with that19:25
fungiyep, sgtm19:26
fungiunless you want to recap it19:26
clarkbnah I think its fine as that has been on the mailing list and people can follow along there19:26
fungiclarkb: for test nodes announcement, should we include anything about the debian-stretch/debian-stable image cleanups? are those still in progress?19:26
fungialso gentoo probably needs to get the axe unless someone has time to figure out how to get it building/booting successfully19:27
clarkbfungi: not sure what debian-stable is re our images. But stretch is long gone19:27
clarkbhrm ya can add gentoo to the list19:27
fungidebian-stable was an alias for debian-stretch, which we eliminated at the same time (in theory)19:27
fungilooks like vexxhost/ansible-role-wireguard still references the debian-stretch node label, but that's the only straggler19:29
fungiwe don't still build them at least19:29
fungicodesearch doesn't turn up any job configs referencing a debian-stable nodeset either, so i think that's all basically complete19:30
clarkbI'll update the email draft once I'm done editing meeting agenda stuff19:31
clarkbok I've got my agenda edits in. I'll let them simmer for a bit if others want to add stuff before I send the actual agenda out later today19:34
clarkbfungi: updated with a bit about gentoo.19:36
fungithanks... should we try un-pausing gentoo to see if it's somehow miraculously back to working again?19:36
fungii have at least one old zuul-jobs change blocked on being able to boot gentoo nodes (771106 and its parent)19:37
clarkbI guess we can. Do we have hints as to what the problem is in our logs (opensuse lacked that so I decided the best option was to unpause)19:38
fungii'm not really sure at this point, our existing images are over a year old19:39
fungichecking git history19:39
fungihttps://review.opendev.org/797790 nodepool: pause gentoo and tumbleweed builds19:40
fungi"Both of these are failing in ways that look like we need to fix them in dib.  Stop attempting to build them for now."19:41
funginot much to go on19:41
clarkbya so maybe we unpause and check the results, we may be surprised as with opensuse 1519:41
clarkbadded a note to the agenda about image spring cleaning19:42
clarkband it isn't spring anywhere in the world right now is it? so no worries about hemisphere confusion :)19:42
clarkbI've manually fixed up matrix gerritbot hopefully for the last time before my fix lands19:49
fungiit's when you clean the loose springs out of the machinery19:49
clarkb(just manually applying the comment out of the user line in the docker compose file and restarting things)19:50
ianwo/ ... thanks for fixing the image screwup19:50
fungiwell, fingers crossed they can actually boot19:51
fungichecking again now that some of the uploads have completed19:52
fungi| 0027618409 | ovh-gra1            | opensuse-15              | 597fa9a1-7752-43b4-8a61-62fe98f949e4 | 149.202.179.158 |                                         | ready    | 00:00:10:14  | unlocked |19:52
fungino fedora-34 nodes yet19:52
clarkbI was able to ssh to that suse node19:53
opendevreviewMerged opendev/system-config master: Unset the matrix gerritbot uid:gid settings  https://review.opendev.org/c/opendev/system-config/+/82058319:54
ianwclarkb: mail looks about right.  gentoo is building in the dib gate ... i'm not sure but it might build ok now19:55
clarkbok, lets unpause it and see what happens then we can send the email out once we've got the data19:55
clarkbfungi: did you want to push that change since you suggested it? or should I go ahead and push that up?19:55
fungii'm happy to push it now19:55
clarkbcool thanks19:56
ianwthe problem has been it's a bit unreliable, sometimes it takes a long time to build and other times it breaks.  prometheanfire is always responsive, but also i don't think anyone else has contributed to the image either19:56
fungishould gentoo still be python3.8 and default/linux/amd64/17.1/systemd or something newer?19:57
prometheanfireya, I don't think you need to have an infra gentoo image anymore19:57
prometheanfireI need to see what the current default is, it might be 3.919:57
opendevreviewMerged openstack/diskimage-builder master: Document EFI elements requirements  https://review.opendev.org/c/openstack/diskimage-builder/+/81940619:58
clarkbI think it has a bit of value to projects like zuul and bindep that try to have broad assurance of functionality, but the cost of fixing random issues with gentoo when they show up in those tools is probably less than keeping the image alive?19:59
prometheanfirelooks like we are setting it to 3.919:59
fungiprometheanfire: clarkb: ianw: i'll hold off pushing the un-pause for gentoo-17-0-systemd image builds until we know what we should build19:59
prometheanfirefungi: atm I don't think the image is used anywhere in gate20:00
clarkbfungi: what we should build meaning the python version?20:01
fungiopenstack/openstack-ansible-tests seems to reference it in several jobs (which are probably themselves unused)20:01
fungii have a long-wip change to fix gentoo support in the configure-mirrors role in zuul/zuul-jobs which is pending having working gentoo images again20:03
fungibut will just abandon that stack if we're dropping gentoo images entirely20:03
clarkbfungi: or do you mean not bother until I send the email out and check if anyone wants to fix them?20:04
prometheanfiregentoo, for now fully supports py38 and 39, for the python binary itself 37 and 3.10 are available20:04
clarkbwe should probably build with 39 if unpausing then20:05
fungiclarkb: what we should build meaning the python version, but also profile and thus nodename20:05
clarkbgot it20:05
fungiright now we have images defined for gentoo-17-0-systemd which set GENTOO_PROFILE: 'default/linux/amd64/17.1/systemd'20:05
fungiit sounds like 19 is the current gentoo release?20:05
clarkbalright I'm going to go find lunch. Back in a bit. Might be good to try and land the gitea 1.15.7 update later today if people have time to look that over20:05
prometheanfiregentoo has releases? :P20:06
fungiprometheanfire: it has versioned profiles, at least ;)20:06
ianwfungi: it's probably worth double-checking what is being built in the dib jobs too, to keep in sync20:06
prometheanfiretrue, 17.1/systemd is current20:06
fungioh, in that case we should stick with 17.1/systemd20:06
prometheanfirethat's separate from what you want for python20:07
fungiupping the GENTOO_PYTHON_TARGETS and GENTOO_PYTHON_ACTIVE_VERSION envvars at least doesn't seem to imply we'd want a different image name20:08
fungii'll switch those to 3.9 in the unpause change20:08
opendevreviewJeremy Stanley proposed openstack/project-config master: Un-pause gentoo-17-0-systemd images  https://review.opendev.org/c/openstack/project-config/+/82059020:14
fungiprometheanfire: clarkb: ianw: ^20:14
opendevreviewMerged openstack/diskimage-builder master: Disable all repositories after attaching a pool  https://review.opendev.org/c/openstack/diskimage-builder/+/72787920:20
fungii see fedora-34 nodes building in ovh-bhs1 now, so i'll see what happens with them20:31
opendevreviewTristan Cacqueray proposed opendev/system-config master: Update the gerritbot-matrix image to support arbitrary uid with docker  https://review.opendev.org/c/opendev/system-config/+/82059120:36
fungithe fedora-34 nodes in ovh-bhs1 repeatedly raised nodepool.exceptions.ConnectionTimeoutException20:37
funginow there20:38
tristanCclarkb: 820591 seems to fix the docker issue20:38
fungi's one booting in rax-ord i'll watch20:38
fungi2021-12-06 20:40:14,921 INFO nodepool.NodeLauncher: [e: dbb13cf88df149999975274eaf43da37] [node_request: 300-0016378697] [node: 0027618929] Node is ready20:45
clarkbfungi: the ovh problem is probably the problem we had previously just continuing20:45
fungilooks like we're successfully booting fedora-34 in rax-ord but not ovh-bhs1, which yeah seems like deja vu20:45
clarkbya I bet that is the old problem not being corrected but this should work long enough for us to delete f3420:46
fungiright20:47
clarkbtristanC: fungi +2 on your respective changes thanks20:47
ianwfungi: yep, i don't think we merged the initramfs regeneration20:56
ianwclarkb/fungi: https://review.opendev.org/c/zuul/zuul-jobs/+/818702 updates zuul jobs for f35, and surprisingly  (to me) passed without alteration20:57
clarkbianw: I too am surprised it passed without alteration but +2 from me. Do you have time for https://review.opendev.org/c/opendev/system-config/+/820267?21:15
clarkbfungi: ianw: I'm also happy to watch https://review.opendev.org/c/opendev/system-config/+/820591 if one of you have time to review that one as well21:15
ianwoh yep, looking21:18
ianwi thought we took some screenshots of the gitea page; we should do that21:18
fungioh poop, the iptables_rules var has -A openstack-INPUT hard-coded so i can't use it for egress21:18
clarkbfungi: oh we don't have an egress chain we can modify? hrm21:19
clarkbHow difficult would it be to add an output chain?21:19
clarkb(and is that a good idea?)21:19
clarkbianw: Ya taking a screenshot of the front page, the system-config repo page, and a file in system-config is probably a good idea21:21
clarkb(we populate system-config in that job which makes it a good test case)21:21
clarkblet me see if I can figure that out21:21
clarkbianw: hrm the testinfra stuff calls take_screenshot in test_gitea21:23
clarkbhttp://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f6e/820267/2/check/system-config-run-gitea/f6e2e68/bridge.openstack.org/screenshots/ ah yup there they are21:23
clarkbok ya I think you already set this up for us :)21:24
mgagnefungi: the issue with the certificate intermediate at iweb is being worked on21:27
ianwclarkb: haha i thought so!  i had the wrong job window open 21:27
clarkbianw: and the screenshots look good for 1.15.7, no need to remove the +A21:27
ianwmgagne: thanks!  was it a recent change, or did we just not notice21:28
mgagneThe API certificates were renewed and intermediates didn't properly get updated.21:29
mgagneianw: I've been told the issue has been solved.21:30
ianwmgagne: cool, yes looks like i can connect now21:31
ianwclarkb: not sure where we ended up, are the builders still in emergency?21:32
clarkbianw: no the builders should be out of emergency. At least the change I pushed to unpause opensuse15 and f34 landed and updated automatically21:32
opendevreviewIan Wienand proposed openstack/project-config master: Revert "iweb: disable image uploads"  https://review.opendev.org/c/openstack/project-config/+/82059521:37
opendevreviewMerged openstack/project-config master: Revert "iweb: disable image uploads"  https://review.opendev.org/c/openstack/project-config/+/82059521:56
fungiianw: yeah, my update to your pause change merged at which point i took the builders out of emergency disable22:18
fungiso just reverting that normally should be sufficient22:18
clarkbthe gitea change should merge shortly. I'll keep an eye on that and when that is done approve the matrix-gerritbot update22:18
opendevreviewMerged opendev/system-config master: Update gitea to 1.15.7  https://review.opendev.org/c/opendev/system-config/+/82026722:21
clarkbianw: should we go ahead and land https://review.opendev.org/c/openstack/project-config/+/820590 to restart gentoo image builds too?22:21
fungiadding egress rules shouldn't be hard, i'll just end up creating a separate var for them and corresponding support in the rules files22:22
opendevreviewMerged opendev/system-config master: Update letsencrypt role docs to suggest a specific order  https://review.opendev.org/c/opendev/system-config/+/82040922:23
clarkbfungi: ya I'm more just wondering if the lack of prior art indicates maybe we should approach this from another angle. But I still like the firewall rule as it captures what we want pretty well I think from a prevention of state leakage perspective22:24
fungii'm trying to decide between just conditionally adding to the default OUTPUT chain vs creating a separate openstack-OUTPUT chain guarded by nonzero content in an iptables_rules_egress var22:26
fungiwe have significant enough regression testing with testinfra that probably if adding to the OUTPUT chain in test jobs broke anything we'd notice before it could merge anyway22:27
clarkbthe other place that this might affect is docker22:28
clarkbfor that reason I think we should stick to a separate chain since that has worked well for input22:28
fungithat's fair22:29
fungialso the openstack in openstack-INPUT saddens me, but it would be nontrivial to change given existing production deployments22:30
fungiideally we'd have chosen something more neutral22:30
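(To make the option discussed above concrete: the separate-chain approach emits an `openstack-OUTPUT` chain, and the jump into it from `OUTPUT`, only when `iptables_rules_egress` is non-empty. The sketch below is an illustration under assumptions, not the system-config implementation. The function name `render_egress_chain` and the iptables-restore style output are assumed. Only the chain name and the variable name come from the conversation.)

```python
def render_egress_chain(iptables_rules_egress, chain="openstack-OUTPUT"):
    """Return iptables-restore style lines for a dedicated egress chain.

    Returns an empty list when no egress rules are configured, so the
    default OUTPUT chain is left untouched (the guard from the discussion).
    Hypothetical helper for illustration only.
    """
    rules = [r.strip() for r in iptables_rules_egress if r.strip()]
    if not rules:
        return []
    lines = [
        f":{chain} - [0:0]",      # declare the user-defined chain
        f"-A OUTPUT -j {chain}",  # route all outbound traffic through it
    ]
    # Each configured rule is appended to the dedicated chain, not to OUTPUT.
    lines.extend(f"-A {chain} {rule}" for rule in rules)
    return lines
```

(With an empty list nothing is rendered, which keeps hosts without egress rules, including those where docker manages its own chains, unchanged. A hypothetical rule such as `-p tcp --dport 25 -j REJECT` would be rendered as `-A openstack-OUTPUT -p tcp --dport 25 -j REJECT`.)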
clarkbok gitea01 is done. The stop and start of the containers was not quick. It took almost 4-5 minutes I think22:41
clarkbbut it seems to have done so successfully and I expect the next 7 will also be happy22:41
clarkbmeeting agenda sent22:48
clarkbI've gone ahead and approved the matrix gerritbot update since the gitea updates seem to be going well and are almost done22:55
clarkbgitea upgrade all done23:03
ianwall images available in iweb too23:14
opendevreviewMerged opendev/system-config master: Update the gerritbot-matrix image to support arbitrary uid with docker  https://review.opendev.org/c/opendev/system-config/+/82059123:26
opendevreviewIan Wienand proposed opendev/dstat_graph master: Add .gitreview  https://review.opendev.org/c/opendev/dstat_graph/+/82060923:26
opendevreviewIan Wienand proposed opendev/dstat_graph master: Update dependencies  https://review.opendev.org/c/opendev/dstat_graph/+/82063023:26
opendevreviewIan Wienand proposed opendev/dstat_graph master: Trim styles  https://review.opendev.org/c/opendev/dstat_graph/+/82063123:26
opendevreviewIan Wienand proposed opendev/dstat_graph master: Conver to bootstrap 5  https://review.opendev.org/c/opendev/dstat_graph/+/82063223:26
opendevreviewIan Wienand proposed opendev/dstat_graph master: Trim margin of overview panel  https://review.opendev.org/c/opendev/dstat_graph/+/82063323:26
opendevreviewIan Wienand proposed opendev/dstat_graph master: Fix walking elements of input list  https://review.opendev.org/c/opendev/dstat_graph/+/82063423:26
opendevreviewIan Wienand proposed opendev/dstat_graph master: Use list-group-item for graphs  https://review.opendev.org/c/opendev/dstat_graph/+/82063523:26
opendevreviewIan Wienand proposed opendev/dstat_graph master: Filter out blank lines  https://review.opendev.org/c/opendev/dstat_graph/+/82063623:26
opendevreviewIan Wienand proposed opendev/dstat_graph master: Replace remove .size() with .length  https://review.opendev.org/c/opendev/dstat_graph/+/82063723:26
opendevreviewIan Wienand proposed opendev/dstat_graph master: Fix focus panel chart selection for pcp-dstat  https://review.opendev.org/c/opendev/dstat_graph/+/82063823:26
opendevreviewIan Wienand proposed opendev/dstat_graph master: Update generate_page.sh  https://review.opendev.org/c/opendev/dstat_graph/+/82063923:26
opendevreviewIan Wienand proposed opendev/dstat_graph master: Add a sample.csv and update docs  https://review.opendev.org/c/opendev/dstat_graph/+/82064023:26
opendevreviewIan Wienand proposed opendev/dstat_graph master: Fix refresh for embedded csv  https://review.opendev.org/c/opendev/dstat_graph/+/82064123:26
opendevreviewMerged opendev/irc-meetings master: Move Ironic meeting 1 hour forward  https://review.opendev.org/c/opendev/irc-meetings/+/82047723:29
clarkbmatrix-gerritbot restarted about a minute ago and I see no errors23:34
clarkbneed an event to show up in the testing channel then I think we can be happy with this23:35
clarkbanyone have a change to push up? :)23:35
*** rlandy|ruck is now known as rlandy|out23:37
fungii'm too lazy to come up with one at this point in my evening23:38
clarkbhttps://review.opendev.org/c/opendev/system-config/+/815049 is an easy one we can approve if you aren't too lazy to take a look at that one. ianw another option is https://review.opendev.org/c/openstack/project-config/+/820590 I would approve it myself but don't want to leave you with a looping failing gentoo build if it is unhappy later today23:40
ianwclarkb: no problem, i can check on gentoo -- if it's non-obvious why it's failing we can pause it again23:41
ianwthe doc one lgtm23:42
clarkbianw: ya I'm hoping fungi can double check the doc one since we thought through that last time we did a rename23:42
fungii just approved it23:42
clarkbthanks!23:42
fungithanks for writing it!23:42
opendevreviewIan Wienand proposed opendev/dstat_graph master: Add basic Zuul job  https://review.opendev.org/c/opendev/dstat_graph/+/82064223:46
clarkbthat event showed up in matrix. I think we're good there. thank you tristanC 23:46
opendevreviewMerged openstack/project-config master: Un-pause gentoo-17-0-systemd images  https://review.opendev.org/c/openstack/project-config/+/82059023:51
opendevreviewIan Wienand proposed opendev/dstat_graph master: Add basic Zuul job  https://review.opendev.org/c/opendev/dstat_graph/+/82064223:52
clarkbthe letsencrypt job failed because the limestone mirror is not reachable23:54
clarkbI can confirm it doesn't ping and http doesn't seem to work either23:54
clarkbopenstack server list says the server is in a shut off state23:55
clarkbdid we do that? Should we try turning it back on again or add it to the emergency file instead for now?23:55
clarkbupdated                     | 2021-12-06T10:26:00Z presumably that is when it was shut off23:56
fungii don't recall shutting it off, likely something happened on the host23:57
clarkbI'm going to try starting it23:59
