Tuesday, 2022-10-25

ianw^ anyone have an issue with adding the dns/inventory entries for bridge01.opendev.org?  00:01
Clark[m]ianw: I've switched to dinner mode, but that inventory addition doesn't modify the groups so should be low impact if I understand your change stack. I'm good with that as a first step00:05
ianwno probs -- yes intended to be no-op.  i'll be watching closely as i figure out the final switching steps00:08
opendevreviewMerged opendev/zone-opendev.org master: Add bridge01.opendev.org  https://review.opendev.org/c/opendev/zone-opendev.org/+/86254300:14
ianwhrm, it looks like the prod playbooks aren't as happy as i would have thought :/00:27
ianwThe error appears to be in '/var/lib/zuul/builds/f118475c70fc47f78f1422c3365ed2d5/untrusted/project_0/opendev.org/opendev/system-config/playbooks/zuul/run-production-playbook-post.yaml': line 3, column 7, but may00:28
ianwthe role 'add-bastion-host' was not found00:28
opendevreviewIan Wienand proposed opendev/system-config master: Move add-bastion-host to playbooks/zuul/roles  https://review.opendev.org/c/opendev/system-config/+/86254500:44
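As background on the error above: Ansible resolves bare role names against a roles/ directory that sits alongside the playbook referencing them (plus any configured roles_path), so a role used from playbooks/zuul/run-production-playbook-post.yaml needs to live under playbooks/zuul/roles/. A minimal, hypothetical sketch of that layout (not the actual playbook contents):

    # playbooks/zuul/run-production-playbook-post.yaml (illustrative excerpt)
    - hosts: localhost
      roles:
        # found at playbooks/zuul/roles/add-bastion-host/ once the role is
        # moved next to the playbook that uses it
        - add-bastion-host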
opendevreviewMerged openstack/diskimage-builder master: Added example configuration  https://review.opendev.org/c/openstack/diskimage-builder/+/86158201:16
opendevreviewMerged opendev/system-config master: Move add-bastion-host to playbooks/zuul/roles  https://review.opendev.org/c/opendev/system-config/+/86254501:35
opendevreviewIan Wienand proposed opendev/system-config master: add-bastion-host: use hostname directly  https://review.opendev.org/c/opendev/system-config/+/86254601:58
opendevreviewIan Wienand proposed opendev/zone-opendev.org master: Bump serial number  https://review.opendev.org/c/opendev/zone-opendev.org/+/86254802:36
opendevreviewMerged opendev/system-config master: add-bastion-host: use hostname directly  https://review.opendev.org/c/opendev/system-config/+/86254602:54
opendevreviewMerged opendev/zone-opendev.org master: Bump serial number  https://review.opendev.org/c/opendev/zone-opendev.org/+/86254803:04
ianwok, sorry about that, prod jobs should be fixed now ... https://zuul.opendev.org/t/openstack/builds?job_name=infra-prod-service-nodepool&project=opendev/system-config03:25
ianwwhat i totally missed was that monitoring the bootstrap-bridge job is not enough, because that didn't actually use the production playbooks ... doh03:25
opendevreviewMerged opendev/system-config master: Add bridge01.opendev.org to inventory  https://review.opendev.org/c/opendev/system-config/+/86254404:04
opendevreviewIan Wienand proposed opendev/base-jobs master: Switch to bridge01.opendev.org  https://review.opendev.org/c/opendev/base-jobs/+/86255105:08
*** marios is now known as marios|ruck05:08
opendevreviewIan Wienand proposed opendev/system-config master: Switch bridge to bridge01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/86111205:12
fricklerI'm seeing very slow cloning happening from opendev.org right now for the nova repo. according to openssl I'm landing on gitea02, but cacti looks normal10:05
*** rlandy|out is now known as rlandy10:33
*** dviroel|out is now known as dviroel11:29
fungifrickler: fwiw, it seems slow for me too11:54
fungia little over 6 minutes for me to clone it11:57
fungiand i seem to be getting balanced to gitea0612:02
*** rlandy is now known as rlandy|mtg12:02
fungiand it may be more than twice as slow for me over ipv4 as compared to ipv612:23
fungigit clone -4 just now took over 14 minutes12:23
fungicloning directly from gitea06 in a shell on gitea-lb02 goes quickly though (~1 minute)12:37
fricklerif I override opendev.org in /etc/hosts to point to lb01, I get full speed. so seems to be something about lb0212:45
fricklerhmm, or maybe the issue has stopped somehow, still fast after reverting that override12:47
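For reference, the checks described above amount to roughly the following; the address in the /etc/hosts override is a placeholder for whichever load balancer or backend is being tested:

    # compare default (usually ipv6) vs forced ipv4 clone times
    time git clone https://opendev.org/openstack/nova nova-default
    time git clone -4 https://opendev.org/openstack/nova nova-v4

    # temporarily pin opendev.org to a specific host by adding a line like
    # this to /etc/hosts (203.0.113.10 is a placeholder address):
    # 203.0.113.10  opendev.org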
fungiyes, seems fast from here now too. based on observed behaviors, i expect it was a network issue upstream from our virtual servers (whether inside vexxhost sjc1, at the border, or within a backbone provider, hard to know which from our vantage point)13:17
fungiwell, now it's slowing down again for me (but not as slow as it was earlier)13:24
*** dasm|off is now known as dasm|rover13:24
fungiaround 3.5 minutes13:25
fungiso whatever the network problem is, it may be ongoing, just having a variable impact13:28
*** rlandy|mtg is now known as rlandy13:33
Clark[m]Can you try going through lb01 when it is slow just to confirm that it isn't load balancer specific?13:56
fungisure13:56
Clark[m]But also I'm fairly certain I've seen 5 minute nova clones in the past. So I'm not sure this is a new problem either13:57
Clark[m]Iirc our zuul clone timeout is ~10 minutes due to nova. But that clones from Gerrit.13:57
fungialso it was slow bypassing lb02 and going directly to a gitea backend13:58
fungiso i doubt lb01 would (even could) make that any better13:58
Clark[m]++13:58
fungii just got a 12 minute duration cloning nova13:59
fungi(through lb02 that time)13:59
fungitrying through lb01 now13:59
fungibut it doesn't look appreciably faster13:59
Clark[m]I guess another thing complicating this is lb02 may balance you to a slow backend and lb01 to a fast one. But it looks like both produce similar behavior, implying it isn't a fault of the load balancer upgrade14:00
fungiyeah, also similarly slow going directly to different backends14:00
fungiwatching the data transfer rates more closely, the speed is dramatically impacted by lengthy periods where basically nothing is getting through14:26
fungiso it seems bursty, not steadily slow14:27
fungibut i'm not seeing obvious signs of packet loss either14:34
*** rlandy is now known as rlandy|dr_appt14:37
Clark[m]That could be git being slow to construct pack files?14:52
fungipossible, i suppose, but it's not during the "Enumerating/Counting/Compressing objects" phases15:09
fungii was seeing it in the "Receiving objects" phase15:10
clarkbhuh15:10
clarkbas a heads up I've got people working on my house today which I think will at some point include shutting off power to my home. I've got my network stack on UPS and if that fails I can tether off of my phone. But I'll likely only bother if that occurs during our meeting. Otherwise I'll take it as an opportunity to go for a walk or something :)15:12
slittlePlease add me as first core of starlingx-app-security-profiles-operator-core.  I'll add the others.15:14
fungislittle: done!15:15
clarkbfungi: frickler: considering my possible network outage and the gitea stuff you've been looking at already, do we want to proceed with https://review.opendev.org/c/opendev/system-config/+/862374 today or should we hold off?15:16
slittlethanks15:16
fungiany time15:16
fungiclarkb: i think it's fine to move forward with that. i don't see anything to indicate that the new lb is at fault15:16
fungiand i expect to be around all day if we need to address something15:17
clarkbfungi: did you want to approve it or should I?15:18
clarkbanother thing to consider is that it will trigger jobs for all the things since it updates the inventory15:18
clarkbwe might want to check with ianw to see if the bridge work is stable enough for that15:19
fungii can approve it momentarily15:21
opendevreviewMerged opendev/system-config master: Remove gitea-lb01 and jvb02 from our inventory  https://review.opendev.org/c/opendev/system-config/+/86237415:30
clarkbthanks!15:30
clarkbfungi: https://review.opendev.org/c/opendev/gerritbot/+/861474 is the last change before we can drop python 3.8 image builds15:40
clarkbI'm hoping to be able to tackle the 3.9 -> 3.10 updates soon too. But those are much more involved (we had a lot more stuff on 3.9 after the 3.7 drop)15:41
clarkbinfra-root last call on feedback for the storyboard email draft https://etherpad.opendev.org/p/fAVaSBNXEzwsMpfcrKz315:44
*** jpena is now known as jpena|off15:48
clarkbalso today is the day we said we would switch the default nodeset to jammy15:50
*** marios|ruck is now known as marios|out15:51
*** dviroel is now known as dviroel|lunch15:56
clarkbhttps://review.opendev.org/c/zuul/zuul/+/862622 that might be the first python 3.11 job run on opendev?15:59
fungifunny, i was just trying out 3.12.0a1 on my workstation now that it exists16:03
clarkbapparently 3.12 will add some limited jit functionality to the interpreter16:04
fungithat's the hope, though it's way too early to know for sure all that will land16:05
fungiit looks like we don't have a base-jobs change for the default nodeset up. i'll propose that momentarily16:07
opendevreviewJeremy Stanley proposed opendev/base-jobs master: Switch default nodeset to ubuntu-jammy  https://review.opendev.org/c/opendev/base-jobs/+/86262416:12
fungiinfra-root: ^16:13
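For context, switching the tenant-wide default is roughly a one-line change to the base job in opendev/base-jobs; this is an illustrative sketch only, the real content is in change 862624:

    # zuul.d/jobs.yaml in opendev/base-jobs (illustrative excerpt)
    - job:
        name: base
        # (other base job settings unchanged)
        nodeset: ubuntu-jammy   # previously ubuntu-focal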
opendevreviewMerged opendev/gerritbot master: Switch the docker image over to python 3.10  https://review.opendev.org/c/opendev/gerritbot/+/86147416:14
clarkbhttps://zuul.opendev.org/t/zuul/build/3bad42f7277f414b934d80e025333b05 note the error on that one. Not sure I've ever seen that before16:16
clarkboh I interpreted it as a debian install error due to a hash mismatch16:17
clarkbbut I think this is a separate thing. Far less interesting :)16:18
clarkbif I manually pull the tarball the sha checks out. Looking at the log I suspect we got a short read so the file was incomplete?16:23
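A quick way to tell a short read apart from genuine upstream corruption, assuming the expected digest is known (URL and digest here are placeholders):

    curl -fSL -o client.tar.gz "$TARBALL_URL"
    echo "${EXPECTED_SHA256}  client.tar.gz" | sha256sum -c -
    # compare the on-disk size against what the server advertises
    curl -sIL "$TARBALL_URL" | grep -i content-length
    stat -c %s client.tar.gz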
clarkbnetwork problems abound?16:24
fungiperhaps16:25
fungibad day to be on the internet maybe16:26
fungii guess we can include the announcement of 862624 in today's meeting and then merge it once the meeting concludes16:27
clarkbworks for me16:27
corvusin case anyone finds it interesting, across all of opendev's cloud providers, it takes a pretty consistent 3-4 seconds for the initial nova api create http request to return.16:41
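For anyone wanting to reproduce that measurement, a minimal sketch using openstacksdk (cloud name and IDs are placeholders; authenticating first keeps token negotiation out of the timed call):

    import time
    import openstack

    conn = openstack.connect(cloud="mycloud")
    conn.authorize()  # fetch a token up front so auth isn't part of the timing

    start = time.monotonic()
    server = conn.compute.create_server(
        name="timing-test",
        image_id="IMAGE_UUID",
        flavor_id="FLAVOR_ID",
        networks=[{"uuid": "NETWORK_UUID"}],
    )
    print(f"create request returned in {time.monotonic() - start:.1f}s")

    # the request returning is separate from the instance going ACTIVE
    conn.compute.delete_server(server)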
*** rlandy|dr_appt is now known as rlandy16:55
*** dviroel|lunch is now known as dviroel|17:04
*** dviroel| is now known as dviroel17:04
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Pin py38 jobs to focal  https://review.opendev.org/c/zuul/zuul-jobs/+/86262817:09
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Add tox-py311 job  https://review.opendev.org/c/zuul/zuul-jobs/+/86262917:09
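Roughly what those two zuul-jobs changes amount to, sketched from the existing tox job pattern; the exact nodesets, descriptions and variables are in the linked reviews:

    # keep the 3.8 job on a nodeset that still ships python3.8
    - job:
        name: tox-py38
        parent: tox
        nodeset: ubuntu-focal
        vars:
          tox_envlist: py38
          python_version: "3.8"

    # add a 3.11 job
    - job:
        name: tox-py311
        parent: tox
        vars:
          tox_envlist: py311
          python_version: "3.11"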
clarkbinfra-root heads up docker hub appears to be having trouble17:12
clarkbprobably good to double check image promotions occur as expected when merging changes (the gerritbot promotion appears to have succeeded)17:12
fungithanks for spotting17:13
* corvus keeps refreshing https://status.docker.com/ expecting to see something other than green17:14
clarkbI'm going to send that storyboard email now as I haven't heard objections17:15
corvushttps://forums.docker.com/t/unexpected-http-status-530/130583/4 has a comment suggesting cloudflare issues17:15
corvushttps://www.cloudflarestatus.com/incidents/kdpqngcbbn2517:15
clarkbthat could explain why github release downloads (for the openshift client tarball) also had problems17:15
clarkband email sent17:21
*** dviroel is now known as dviroel|appt17:27
opendevreviewClark Boylan proposed opendev/system-config master: Add python 3.11 docker images  https://review.opendev.org/c/opendev/system-config/+/86263117:44
corvuscloudflare claims issue is resolved19:00
fungii've approved 862624 to switch our default nodeset to ubuntu-jammy now20:01
clarkbthanks!20:01
opendevreviewMerged opendev/base-jobs master: Switch default nodeset to ubuntu-jammy  https://review.opendev.org/c/opendev/base-jobs/+/86262420:06
fungiso it took effect as of 20:06 utc today20:14
clarkbI've pushed a PR to fix pip's testsuite, but github doesn't allow you to stack PRs and there is no depends on for github actions so I can't get the other fix to run on this fix20:37
*** dviroel|appt is now known as dviroel20:39
ianwclarkb: it's barbaric isn't it :)20:41
ianwi'm going to keep some notes at https://etherpad.opendev.org/p/bastion-upgrade-nodes-2022-10 as i figure out the upgrade.  i'll condense into a more checklist thing when it's all working20:42
clarkbturns out my PR to fix pip won't fix things because they actually do a `git submodule update` on the package install side of things outside of the test suite20:56
clarkbrip them.20:56
clarkb(it's possible they may have to stop supporting this feature unless users explicitly toggle a flag?)20:57
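For context, what pip does when it obtains a package from a git checkout is roughly the following (hedged; exact flags vary between pip versions), which is why fixing only the test suite wasn't enough:

    git clone https://example.org/some/project src
    cd src
    git checkout <ref>
    # pip initializes submodules as part of fetching the source, independently
    # of anything the project's own test suite does:
    git submodule update --init --recursive
    pip install .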
*** rlandy is now known as rlandy|bbl21:22
clarkbsince docker hub is apparently happier now I'm going to approve the change that removes the 3.8 images. I don't believe anything is running on them anymore21:55
*** dviroel is now known as dviroel|afk21:57
fungisounds great, thanks22:06
opendevreviewMerged opendev/system-config master: Drop python 3.8 base image builds  https://review.opendev.org/c/opendev/system-config/+/86148022:06
opendevreviewMerged opendev/base-jobs master: Switch to bridge01.opendev.org  https://review.opendev.org/c/opendev/base-jobs/+/86255122:38
opendevreviewClark Boylan proposed opendev/system-config master: WIP Upgrade to Gitea 1.18  https://review.opendev.org/c/opendev/system-config/+/86266123:49
clarkbI don't necessarily expect that to work yet. They didn't publish a list of changes for 1.18.0-rc0 so I've just compared the dockerfile (new golang version) and the templates23:50
clarkbthere is likely more that needs dealing with.23:50
clarkbIf it does work we should be able to confirm my vendor file classification fix makes our repos look less weird23:52
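One way to eyeball upstream changes when no changelog has been published yet is to diff the tags directly; the old tag below stands in for whatever version is currently deployed:

    git clone https://github.com/go-gitea/gitea
    cd gitea
    git diff <current-tag> v1.18.0-rc0 -- Dockerfile templates/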
