Tuesday, 2022-12-06

18:59 <clarkb> the weekly team meeting will begin momentarily
18:59 <ianw> o/
19:01 <clarkb> #startmeeting infra
19:01 <opendevmeet> Meeting started Tue Dec  6 19:01:06 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01 <opendevmeet> The meeting name has been set to 'infra'
19:01 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/GCABXQDEGIAYG4T63NXZJGNHACEICKAP/ Our Agenda
19:01 <clarkb> #topic Announcements
19:02 <clarkb> The foundation is entering election time for the board. Nominations for individual members close in 10 days on the 16th of december
19:02 <clarkb> Then an election is held in January
19:02 <clarkb> Any other announcements?
19:03 <fungi> board meeting today
19:04 <clarkb> oh right an hour after the end of this meeting (21:00 UTC) there will be a board meeting
19:04 <fungi> 21:00 utc in zoom
19:04 <fungi> yep
19:04 <clarkb> tools for openstack translations will be discussed which might interest this crowd
19:04 <fungi> https://board.openinfra.dev/meetings/2022-12-06
19:04 <fungi> that 'un
19:06 <clarkb> #topic Bastion Host Updates
19:06 <clarkb> I think we are getting very close to the end of this thread.
19:06 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/866542 addresses ansible installation on bridge to actually update to the ansible we are testing with
19:06 <clarkb> #link https://review.opendev.org/q/topic:prod-bastion-group parallelized zuul jobs on bridge. Should land when bridge is stable and we can monitor
19:07 <clarkb> #link https://review.opendev.org/q/topic:bridge-ansible-venv This group appears to have all of its changes merged or abandoned
19:07 <clarkb> ianw: Anything else to say on this topic? I need to re-review 866542 which is on my todo list for today
19:08 <ianw> yeah 866542 just got a rebase really since you looked at it yesterday, i removed a change that was updating a removed comment
19:08 <ianw> the stack really expanded to
19:08 <ianw> #link https://review.opendev.org/q/topic:boostrap-ansible-from-req
19:08 <clarkb> oh I see there are a few followons
19:09 <ianw> which just moves the same idea to the venv creation, which i noticed when watching the logs
19:09 <ianw> the other stack that needs feedback and action, particularly from infra-roots, is
19:09 <ianw> #link https://review.opendev.org/q/topic:bridge-backups
19:09 <clarkb> oh right I had that in my local agenda notes. sorry
19:10 <clarkb> and what that does is encrypt things locally so they can be backed up remotely right?
19:11 <ianw> essentially yes, with a key split requiring 2 people to recombine
19:11 <ianw> this is so nobody needs to feel like they need to set up fort knox to keep the backup
19:12 <clarkb> I'll have to read into that more to understand the mechanics of it. Like do we all need to forward gpg agents or something? But that can happen in review or in #opendev
19:13 <clarkb> I'll do my best to review those two stacks after the board meeting today
19:13 <clarkb> Anything else bastion related?
19:13 <ianw> thanks; https://review.opendev.org/c/opendev/system-config/+/866430 should be pretty self-explanatory for that i think
19:13 <ianw> nope, thanks
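[The 2-of-2 key split ianw describes (backups encrypted locally, with two people required to recombine the key) can be illustrated with a minimal XOR-based split. This is only a sketch of the idea, not the mechanism in the bridge-backups changes, which may use gpg or a Shamir-style scheme instead:]

```python
import secrets

def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Split a key into two shares; both are required to recover it."""
    share_a = secrets.token_bytes(len(key))               # random one-time pad
    share_b = bytes(a ^ k for a, k in zip(share_a, key))  # pad XOR key
    return share_a, share_b

def recombine(share_a: bytes, share_b: bytes) -> bytes:
    """XOR the two shares back together to recover the original key."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))

key = secrets.token_bytes(32)
a, b = split_key(key)
assert recombine(a, b) == key   # both shares together recover the key
```

[Each share alone is uniformly random, so neither holder learns anything about the backup key by themselves, which matches the "nobody needs fort knox" goal.]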
19:14 <clarkb> #topic Upgrading old servers
19:14 <clarkb> Nothing new here other than we should find time to do more of this :/
19:14 <clarkb> I guess technically the bastion work is a subset of this so we are pushing that along :)
19:14 <fungi> technically we've partially upgraded the listserv too
19:14 <clarkb> and the mm3 work isn't directly related but does get us off an old server that has kernel fun
19:14 <fungi> yeah that
19:14 <clarkb> progress then. I'll take it :)
19:15 <fungi> i guess we already upgraded the distro on the old mailman server anyway
19:15 <clarkb> #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes
19:15 <fungi> just not painlessly
19:15 <clarkb> yup
19:15 <clarkb> Which is a good lead into the next topic
19:15 <clarkb> #topic Mailman 3
19:15 <clarkb> #link https://etherpad.opendev.org/p/mm3migration Server and list migration notes
19:15 <clarkb> lists.opendev.org and lists.zuul-ci.org moved to mailman3 on the new server yesterday
19:16 <fungi> and within the scheduled window even
19:16 <fungi> though in retrospect i should have called it two hours just in case
19:16 <fungi> i didn't factor in gate/deploy time for the dns updates
19:16 <clarkb> there were/are a couple of issues we found in the process. One was fixed which corrected some url routes. The other is setting a site_owner value which was missed because all the other settings are set by env vars but not this one
19:16 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/866632 set mailman3 site_owner
19:17 <clarkb> fungi: we managed to make the timing work in the end
19:17 <clarkb> There is also a change to spin things down on the old server for these two domains
19:17 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/866630 disable old mm2 domains
19:17 <fungi> yeah, the broken archiving is totally my bad. when i re-tested the held node after we updated to the latest mm release, i forgot to double-check that new list messages ended up in the imported archive
19:18 <fungi> thanks to corvus and clarkb for figuring it out while i was off stuffing my face
19:18 <clarkb> One "exciting" thing is that the upstream mailing list has a thread from today (of course the day after we upgrade) suggesting people not run latest mm3, which we are doing.
19:18 <fungi> hah
19:18 <ianw> will 866632 require a restart?
19:18 <clarkb> The reason for this is a bug when handling more than 10 list sites, possibly postfix specific (we run exim)
19:18 <fungi> that's so our luck
19:18 <clarkb> ianw: yes
19:18 <clarkb> #link https://gitlab.com/mailman/mailman/-/issues/1044 bug with more than 10 lists
19:19 <clarkb> lists.opendev.org has 10 lists and lists.zuul-ci.org has fewer. I've also grepped for that warning string in our logs. I'm not sure if we are not affected because we use exim or because we have few lists, but I haven't found evidence we have a problem yet
19:19 <clarkb> something to watch though as lists.openstack.org has many more than 10 lists and we want this to be working before we upgrade
19:19 <clarkb> er s/upgrade/migrate lists.openstack.org/
19:20 <clarkb> And please say something if you see the indicated behavior with the migrated lists
19:20 <fungi> though lists.openstack.org also has way fewer lists than it used to, and might have fewer still if i can convince them to retire some more unused ones before we move it
19:20 <clarkb> Other than that I think this went really well. These two domains are newer, but we migrated stuff off of an ancient server and software to something modern and it seems to work
19:21 <clarkb> the testing and planning seem to have done their job. Thank you to everyone who helped with that. fungi in particular did a lot of work on that side of things
19:21 <fungi> thanks everyone for all the work on that
19:21 <fungi> it's been on my wishlist for years and i was never able to find the time to tackle it on my own
19:22 <clarkb> anything else to add?
19:22 <fungi> nothing on my end
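[The upstream issue clarkb links reportedly bites sites serving more than 10 lists. A quick way to see which domains are even in scope is to count lists per domain; this is a hypothetical helper for triage, not anything from the bug report itself, and the exact >10 threshold is taken from the discussion above:]

```python
def domains_over_limit(lists_by_domain: dict[str, list[str]], limit: int = 10) -> list[str]:
    """Return domains whose list count exceeds the threshold from the upstream bug."""
    return [domain for domain, lists in lists_by_domain.items() if len(lists) > limit]

# Illustrative counts: lists.opendev.org has 10 lists, lists.zuul-ci.org fewer.
sites = {
    "lists.opendev.org": [f"list{i}" for i in range(10)],
    "lists.zuul-ci.org": ["zuul-announce", "zuul-discuss"],
}
assert domains_over_limit(sites) == []  # neither migrated domain exceeds 10
```

[By this measure the two migrated domains sit at or under the threshold, which is consistent with not yet seeing the warning in our logs; lists.openstack.org would be the first domain to cross it.]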
19:23 <clarkb> #topic Quo vadis Storyboard
19:23 <clarkb> I just realized my old link should be updated to the new hyperkitty archives. Oh well
19:23 <clarkb> I did send a followup covering our options and asked for feedback
19:23 <clarkb> The one response I got was someone offering to help with the software, but unfortunately I think we need to start with the deployment if we are going to adopt it
19:24 <clarkb> *if we adopt software maintenance we need to commit to updating the deployment first
19:24 <clarkb> I'll leave it open for more feedback as it has only been about a week. I'd be happy to hear from you
19:24 <clarkb> and I guess if that doesn't work I can suggest that people provide semi anonymous feedback instead and I can try to collate it if people trust me to do that
19:25 <clarkb> But I want to make sure whatever we do here is reasonable and will be accepted
19:25 <fungi> yes, the software is already well ahead of what we're running in terms of major bug and performance fixes and new features
19:25 <fungi> which is a big part of the problem
19:25 <fungi> we had volunteers to develop the software, but nobody keeping our deployment up to date with it
19:26 <clarkb> right
19:26 <clarkb> anyway, let's see how we do over the next week for feedback and we can take a different approach if this continues to not generate input
19:26 <clarkb> I think 2 weeks is a reasonable amount of time for this sort of thing and we are halfway through that right now
19:26 <fungi> agreed. thanks!
19:27 <clarkb> #topic Vexxhost server rescues
19:27 <clarkb> jrosser shared image settings with me
19:27 <clarkb> #link https://paste.opendev.org/show/bxxFEEUWeUrkIVlBSGrw/ jrosser's image settings that work in their cloud
19:27 <clarkb> I've got a todo item to try and upload an image with those settings set and use it as a rescue image after modifying the root boot label
19:27 <clarkb> But I haven't done that yet
19:28 <clarkb> They use ceph too so I'm hopeful that this will work
19:28 <clarkb> #topic Gerrit 3.6
19:28 <clarkb> #link https://etherpad.opendev.org/p/gerrit-upgrade-3.6
19:29 <clarkb> ianw ran copy-approvals on all of our repos. We had a small problem in neutron due to a change with more than 1k patchsets, which is our current limit
19:29 <clarkb> ianw temporarily bumped that limit and reran, which caused things to work except for a corrupt change
19:29 <clarkb> even if that didn't work we would've been fine because all of the neutron changes are closed and not open, so their votes are largely there for historical accuracy
19:30 <clarkb> ianw: looks like you've noted the next steps are holding a node and double checking things a bit more directly
19:30 <clarkb> as well as working on a proposal for the upgrade
19:30 <clarkb> ianw: is there anything around the gerrit upgrade we can help with?
19:31 <clarkb> I will note the openstack release cycle schedule is at https://releases.openstack.org/antelope/schedule.html which we should avoid conflicts with
19:31 <ianw> not really, i just want to hold a node and validate downgrade, which i should be able to do very soon
19:32 <ianw> if the mm upgrade time worked, we could do it then
19:32 <clarkb> ianw: ~2000UTC on a monday you mean?
19:33 <fungi> wfm
19:33 <ianw> yep, that would open up the 12th or the 19th, i'm around for both, though less time after the 19th
19:33 <clarkb> ya it's a trade off I guess. Less lead time to test and announce with the 12th and less time to fix/debug if the 19th
19:34 <ianw> if we get some held node validation over the next few days, maybe the 12th?
19:34 <ianw> i'm fairly confident, there doesn't seem to be much more we could test
19:34 <clarkb> ya if that all looks good and doesn't show anything that users should need to worry about I'd be good with the 12th
19:34 <clarkb> we can even send an announcement nowish indicating we plan to do it on the 12th and postpone if necessary
19:34 <clarkb> I think there is a downgrade process for 3.6 -> 3.5 too so we have that option if necessary
19:35 <ianw> and that's an excuse to send a message through the list to keep it ungreylisted too :)
19:35 <fungi> heh
19:35 <clarkb> if we test and confirm the downgrade process seems to work then I'm extra happy to proceed early
19:35 <clarkb> I think 3.7 -> 3.6 has a less easy downgrade though so that upgrade will be a funner one
19:36 <ianw> ok, i will get onto all that, https://etherpad.opendev.org/p/gerrit-upgrade-3.6 will be updated as things happen
19:36 <clarkb> sounds good, thanks!
19:36 <clarkb> #topic Open Discussion
19:37 <clarkb> that was it for the agenda but this morning I noticed something I had on my back burner has made some progress and is worth calling out
19:37 <clarkb> The first bit is nodepool updated to latest openstacksdk which includes ianw's fix for network stuff against older apis
19:38 <clarkb> image uploads seem to work (we have recent images) and I haven't seen any launcher issues. But we should skim the grafana dashboard for any evidence of problems
19:38 <clarkb> And then that unlocked the path for updating zuul and nodepool images to python3.11
19:38 <clarkb> The zuul change has landed and nodepool is gating
19:38 <clarkb> nodepool will restart once that change lands. zuul will normally restart over the weekend. Do we want to manually restart zuul sooner to observe it?
19:39 <clarkb> I should be able to do that tomorrow if we think that is a good idea.
19:39 <clarkb> In particular one thing I realized is that ansible might not like python3.11? However, we do have zuul testing that exercises ansible so maybe it's fine?
19:39 <clarkb> cc corvus ^ if you have an opinion
19:39 <clarkb> I'm also happy to revert if we think we need more prep
19:40 <clarkb> Oh also last week I cleaned up the inmotion cloud's nova and placement records
19:41 <clarkb> There were two distinct issues. The first was that placement had leaked a few records for nodes that just didn't exist anymore, either on the host or in the nova db
19:41 <clarkb> the second was the nova db leaked instances that didn't exist in libvirt on the hosts
19:41 <clarkb> cleaning up the first thing is relatively straightforward and placement has docs on the process.
19:42 <clarkb> Cleaning up the second thing required manually editing the nova db to associate nodes with the cell they lived in, because some nova bug allowed them to be disassociated, which broke server deletion. Once those records were updated server delete worked fine
19:42 <clarkb> melwitt was a huge help in sorting that out, but now we have more nodes to test with so yay
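[Both cleanups clarkb describes amount to a set difference between what one service still tracks and what actually exists: placement allocations vs real instances in the first case, nova db instances vs libvirt domains in the second. A schematic sketch (the identifiers are illustrative, not the real placement or nova schemas):]

```python
def find_leaks(tracked: set[str], actual: set[str]) -> set[str]:
    """Records a service still tracks for resources that no longer exist."""
    return tracked - actual

# Issue 1: placement had allocations for nodes gone from both host and nova db.
placement_allocations = {"node-a", "node-b", "node-c"}
real_instances = {"node-a"}
assert find_leaks(placement_allocations, real_instances) == {"node-b", "node-c"}

# Issue 2: the nova db listed instances with no matching libvirt domain.
nova_instances = {"vm-1", "vm-2"}
libvirt_domains = {"vm-1"}
assert find_leaks(nova_instances, libvirt_domains) == {"vm-2"}
```

[Identifying the leaked records is the easy half; as noted above, actually removing the nova ones required re-associating them with their cell before deletion would work.]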
19:43 <clarkb> oh we also had leaked nodes in rax
19:43 <clarkb> they didn't have proper nodepool metadata so nodepool refused to clean them up. i manually cleared those out
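[Nodepool only deletes servers that carry its own metadata, which is why these rax leaks needed manual cleanup. A sketch of that filtering rule, assuming a simplified server representation and a hypothetical "nodepool" metadata key (nodepool's real keys differ):]

```python
def untracked_servers(servers: list[dict]) -> list[str]:
    """Names of servers lacking nodepool metadata; nodepool will never clean these up."""
    return [s["name"] for s in servers
            if "nodepool" not in s.get("metadata", {})]

servers = [
    {"name": "ok-node", "metadata": {"nodepool": "managed"}},
    {"name": "leaked-node", "metadata": {}},  # metadata lost or never set
    {"name": "mystery-node"},                 # no metadata at all
]
assert untracked_servers(servers) == ["leaked-node", "mystery-node"]
```

[The conservative behavior is deliberate: refusing to touch unrecognized servers avoids deleting someone else's instances in a shared tenant, at the cost of occasional manual cleanup like this.]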
19:43 <corvus> clarkb: i think zuul's own tests should give us a heads up on any ansible/python probs like that.  i don't have a strong feeling about whether we need to restart early or just let it happen on the weekend
19:44 <clarkb> corvus: ack thanks
19:44 <clarkb> as far as team meetings go I think we'll cancel the 27th. Any strong opinions for having meetings on the 13th or january 3?
19:45 <fungi> i'll be around on the 13th and 3rd but don't necessarily require a meeting
19:45 <clarkb> er sorry, the 20th and 3rd
19:45 <clarkb> I plan to be around on the 13th and have that meeting
19:45 <fungi> i also should be around on the 20th but may be a little distracted
19:46 <ianw> i should be around on 20th ... unsure on 3
19:46 <ianw> for sure not 27th
19:47 <clarkb> ok we can do a low key meeting on the 20th, then see what the new year looks like when we get there
19:47 <fungi> i do expect to have far more work than usual the week of the 3rd so may be distracted then too
19:47 <clarkb> ya it's the time of year when all the paperwork needs to be done :)
19:47 <fungi> so much paperwork
19:47 <clarkb> alright then we'll see you here on the 13th and probably the 20th. Then we can enjoy the holidays for a bit (and you should enjoy them earlier too if you are able :) )
19:48 <fungi> thanks clarkb!
19:48 <clarkb> anything else?
19:48 <corvus> schedule a holiday party for the 20th ;)
19:49 <clarkb> good idea. Let me see if I can figure something out for that
19:49 <clarkb> board game arena game or something :)
19:49 <clarkb> thank you everyone for your time, I'll let you go now. See you next week.
19:49 <clarkb> #endmeeting
19:49 <opendevmeet> Meeting ended Tue Dec  6 19:49:48 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
19:49 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-12-06-19.01.html
19:49 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-12-06-19.01.txt
19:49 <opendevmeet> Log:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-12-06-19.01.log.html
19:51 <clarkb> and now lunch before the board meeting

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!