19:01:06 <clarkb> #startmeeting infra
19:01:06 <opendevmeet> Meeting started Tue Feb  7 19:01:06 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:06 <opendevmeet> The meeting name has been set to 'infra'
19:01:31 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/QJK7E7D7HG5ZNT4UE7T5QIQ5TARIAXP6/ Our Agenda
19:01:35 <clarkb> #topic Announcements
19:02:30 <clarkb> The service coordinator nomination period is currently open. You have until February 14 to put your name into the hat. I'm happy to chat about it if there is interest too before any decisions are made
19:02:39 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/32BIEDDOWDUITX26NSNUSUB6GJYFHWWP/
19:02:59 <clarkb> Also, I'm going to be out tomorrow (just a heads up)
19:04:25 <clarkb> #topic Topics
19:04:31 <clarkb> #topic Bastion Host Updates
19:04:42 <clarkb> #link https://review.opendev.org/q/topic:bridge-backups
19:04:58 <clarkb> I truly feel bad for not getting to this. I should schedule an hour on my calendar just for this already. But too many fires keep coming up
19:05:12 <clarkb> ianw: fungi: were there any other bastion host updates you wanted to call out?
19:05:54 <fungi> i don't think so
19:06:21 <ianw> sorry, woke up to a dead vm, back now :)
19:06:39 <clarkb> you haven't missed much. Just wanted to make sure there wasn't anything else bastion related before continuing on
19:06:42 <ianw> no changes related to that this week
19:06:50 <clarkb> #topic Mailman 3
19:07:14 <clarkb> The restart of containers to pick up the new site owner email landed and fungi corrected the root alias email situation
19:07:36 <fungi> current state is that i need to work out how to create new sites in django using ansible so that the mailman domains can be associated with them
19:07:38 <clarkb> Fixing the vhosting is still a WIP, though I think fungi roughly understands the set of steps that need to be taken and now it's just a matter of figuring out how to automate django things
19:08:39 <fungi> and yeah, this is really designed to be done from the django webui. if i were a seasoned django app admin i'd have a better idea of what makemigrations could do to ease that from the command line
19:09:00 <clarkb> I wonder if we've got any of those in the broader community? Might be worth reaching out to the openstack mailing list?
19:09:15 <fungi> but it's basically all done behind the scenes by creating database migrations which prepopulate new tables for the site you're creating
19:10:20 <fungi> databases were never my strong suit to begin with, and db migrations are very much a black box for me still. django seems to build on that as a fundamental part of its management workflow
19:10:23 <clarkb> ya I suspect what we might end up with is having a templated migration file in ansible that gets written out to $dir for mailman for each site and then ansible triggers the migrations
19:10:45 <clarkb> and future migrations should just ensure that steady state without changing much
19:10:58 <clarkb> the tricky bit will be figuring out what goes into the migration file definition
19:11:17 <fungi> yeah, django already templates the migrations, as i loosely understand it, which is what manage.py makemigrations is for
19:11:56 <fungi> it seems you're expected to tell django to build the migrations necessary for the new site, and then to apply those migrations it's made
19:12:13 <fungi> which results in bringing the new site up
19:12:20 <ianw> it sort of seemed like you needed a common settings.py, and then each site would have its own settings.py but with a different SITE_ID?
19:12:44 <fungi> i think so, but then mailman when it runs needs SITE_ID=0 instead
19:12:54 <clarkb> ianw: I think that's for normal django multi-site setups. But mailman doesn't quite do it that way? You don't have a true extra site; it just uses the site db info to vhost its single deployment
19:13:08 <fungi> which is a magic value telling it to infer the site from the web requests
19:13:17 <clarkb> ya so ultimately we run a single site with ID=0 but the db has entries for a few sites
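A minimal sketch of the kind of step the Ansible automation would need to drive, assuming the mailman-web container exposes Django's manage.py and that django_mailman3 provides a MailDomain model tying a mail domain to a Site (the service name, paths, and example domain are placeholders):

# Run inside the mailman-web container; service name and domain are hypothetical.
docker-compose exec -T mailman-web python3 manage.py shell <<'EOF'
from django.contrib.sites.models import Site
from django_mailman3.models import MailDomain  # assumed model linking a Site to a mail domain

# Create (or reuse) the django site record for the new list domain.
site, _ = Site.objects.get_or_create(
    domain='lists.example.org', defaults={'name': 'lists.example.org'})

# Associate the mailman mail domain with that site so vhosting can resolve it.
MailDomain.objects.get_or_create(mail_domain='lists.example.org',
                                 defaults={'site': site})
EOF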
19:14:04 <fungi> the other related tidbit is i need to update docker on lists01 and restart the containers
19:14:18 <fungi> which i plan to do first on a held node i have that pre-dates the new docker release
19:15:10 <clarkb> cool sounds like we know what needs to happen just a matter of sorting through it. Anything else?
19:16:33 <fungi> i don't have anything else, no
19:16:47 <clarkb> #topic Git updates
19:16:54 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/873012 Update our base images
19:16:55 <fungi> i restacked the mm3 version upgrades change behind the vhosting work
19:17:32 <clarkb> The base python images did end up updating. Then I realized we use the -slim images which don't include git so this isn't really useful other than as a semi periodic update to the other things we have installed
19:17:56 <clarkb> I was looking at the non slim images to see if git had updated not realizing we only have git where we explicitly install it. All that to say next week we can drop this topic.
19:18:06 <clarkb> And that change is not urgent, but probably also a reasonable thing to do
19:18:49 <clarkb> #topic New Debuntu Releases Preventing sudo pip install
19:19:11 <clarkb> fungi called out that debian bookworm and consequently ubuntu 23.04 and after will prevent `sudo pip install` from working on those systems
19:19:55 <clarkb> For OpenDev we've shifted a lot of things into docker images built on our base python images. These don't use debian packaging for python and I suspect will be fine. However, if they are not, we should be able to modify the installation system on the image to use a single venv that gets added to $PATH
19:20:04 <clarkb> I think this means the risk to us is relatively low
19:20:27 <clarkb> Additionally ansible is already in a venv on bridge and we use venvs on our test images
19:20:46 <ianw> docker-compose isn't though.  that's one i've been meaning to get to
19:20:52 <clarkb> good call
19:21:18 <clarkb> definitely anything you can think of that is still running outside of a venv should be moved. We can do that ahead of the server OS upgrades that will break us, since the older systems can handle venvs too
19:21:53 <ianw> ++ i'm sure we can work around it, but it's a good push to do things better
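A minimal sketch of that kind of move, using docker-compose as the example; the venv location is hypothetical:

# Install into a dedicated venv instead of 'sudo pip install' into the system python.
sudo python3 -m venv /usr/local/docker-compose-venv   # hypothetical path
sudo /usr/local/docker-compose-venv/bin/pip install docker-compose
# Put the entry point on $PATH without touching distro-managed site-packages.
sudo ln -sf /usr/local/docker-compose-venv/bin/docker-compose /usr/local/bin/docker-compose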
19:22:04 <clarkb> Elsewhere we should expect projects like openstack and probably starlingx to struggle with this change
19:22:17 <clarkb> in particular tools like devstack are not venv ready
19:22:32 <fungi> yeah, i posted to openstack-discuss about it as well, just to raise awareness
19:22:48 <ianw> yeah there have been changes floating around for years, that we've never quite finished
19:23:37 <clarkb> and ya I think talking about it semi-regularly is a good way to keep encouraging people to chip away at it
19:23:52 <clarkb> for a lot of stuff we should be able to make small, measurable progress with minimal impact over time
19:25:01 <clarkb> #topic Gerrit Updates
19:25:27 <clarkb> A number of Gerrit related changes have landed over the last week. In particular our use of submit requirements was cleaned up and we have a 3.7 upgrade job
19:25:38 <clarkb> That expanded testing was used to land the base image swap for gerrit
19:25:53 <clarkb> this base image swap missed (at least) one thing: openssh-client installation
19:26:06 <clarkb> this broke jeepyb as it uses ssh to talk to gerrit for new repo creation via the manage-projects tool
19:26:13 <clarkb> Apologies for that.
19:26:59 <clarkb> fungi discovered that even after fixing openssh jeepyb's manage-projects wedges itself for projects if the initial creation fails. The reason for this is that no branch is created in gerrit if manage-projects fails on the first run. This causes subsequent runs to clone from gerrit and not be able to checkout master
19:27:19 <clarkb> To work around this fungi manually pushed a master branch to starlingx/public-keys
19:27:49 <fungi> and discovered in the process that you need an account which has agreed to a cla in gerrit in order to do that to a cla-enforced repository
19:28:11 <fungi> my fungi.admin account had not (as i suspect most/all of our admin accounts haven't)
19:28:18 <clarkb> I've only had a bit of time today to think about that, but part of me thinks this may be desirable, as I'm not sure we can fully automate around all the causes of failed gerrit repo creation?
19:28:32 <fungi> the bootstrapping account is in the "System CLA" group, which seems to be how it gets around that
19:28:36 <clarkb> in this specific case we could just fall back to reiniting from scratch but I'm not sure that is appropriate for all cases
19:28:55 <clarkb> fungi: ya I wonder if we should just go ahead and add the admin group to system cla or something like that
19:29:17 <fungi> or add project bootstrappers to it
19:29:24 <clarkb> ah yup
19:29:27 <fungi> as an included group
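A rough sketch of the manual recovery described above, i.e. seeding an initial master branch so later manage-projects runs can check it out (the pushing account is a placeholder and, per the CLA discussion, needs to be allowed to satisfy or bypass the CLA check):

# Create an empty initial commit locally and push it as a Gerrit admin.
git init seed-repo && cd seed-repo
git commit --allow-empty -m "Initial commit"
git push ssh://adminuser@review.opendev.org:29418/starlingx/public-keys HEAD:refs/heads/master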
19:29:52 <clarkb> with that all sorted I think ianw's change to modify acls is landable once communicated
19:29:54 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/867931 Cleaning up deprecated copy conditions in project ACLs
19:30:12 <clarkb> it would've had a bad time with no ssh :(
19:30:30 <fungi> indeed
19:30:43 <fungi> thanks for fixing it!
19:30:47 <ianw> yeah sorry, will send something up about that
19:30:55 <clarkb> Other Gerrit items include a possible upgrade to java 17
19:30:59 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/870877 Run Gerrit under Java 17
19:31:10 <clarkb> I'd still like to hunt down someone who can explain the workaround that is necessary for that to me a bit better
19:31:31 <clarkb> but I'm finding that the new discord bridge isn't as heavily trafficked as the old slack system. I may have to break down and sign up for discord
19:31:59 <clarkb> And yesterday we had a few users reporting issues with large repo fetches
19:32:09 <clarkb> ianw did some debugging on that and it resulted in this issue for MINA SSHD
19:32:11 <clarkb> #link https://github.com/apache/mina-sshd/issues/319 Gerrit SSH issues with flaky networks.
19:32:58 <ianw> oh, that just got a comment a few minutes ago :)
19:34:08 <ianw> ... sounds like whatever we try is going to involve a .java file :/
19:34:45 <clarkb> ya looks like tomas has a theory but we need to update gerrit to better instrument things in order to confirm it
19:34:51 <clarkb> Progress at least
19:35:51 <clarkb> Anything else gerrit related before we move on?
19:35:55 <ianw> jayf was the first to mention it, but it is a pretty constant thing in the logs
19:36:31 <clarkb> if it is a race the change in jdk could be exposing it more too
19:36:41 <clarkb> since that may affect underlying timing of actions
19:36:51 <fungi> and others are still reporting connectivity issues to gerrit today (jrosser at least)
19:37:21 <clarkb> oh side note: users can use https if necessary. It's maybe a bit more clunky if using git-review but is a fallback
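A hedged sketch of that HTTPS fallback for an existing checkout (the username and project are placeholders; the password is a Gerrit-generated HTTP credential from the user's settings page, not an SSO password):

# Point the gerrit remote at HTTPS and let git-review push over it.
git remote set-url gerrit https://myusername@review.opendev.org/openstack/nova
git config credential.helper cache   # optional, avoids re-typing the HTTP credential
git review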
19:37:42 <ianw> i think it would be easy-ish to add the close logging suggested there in the same file
19:38:07 <ianw> (if it is) i could try sending that upstream, and if it's ok, we could build with a patch
19:38:09 <clarkb> yup and we could even patch that into our image if upstream doesn't want the extra debugging (though ideally we'd be upstream first as I like not having a fork)
19:38:48 <ianw> yeah.  although we haven't had a lot of response on upstream things lately :/  but that was mail, not patches
19:39:21 <clarkb> ianw: oh also, on March 2 (at a terrible time of day for you; 8am for me) they have their community meeting. Why don't I go ahead and throw this on the agenda and I'll do my best to attend
19:39:25 <clarkb> I can ask about java 17 too
19:40:12 <clarkb> (not that we have to wait that long just figure having a direct conversation might help move some of these things forward)
19:40:29 <ianw> ++
19:40:53 <clarkb> #topic Python 2 removal from test images
19:41:05 <clarkb> 20 minutes left lets keep things moving
19:41:24 <clarkb> some projects have noticed the python2 removal. It turns out listing python2 as a dependency in bindep was not something everyone understood as necessary
19:41:37 <clarkb> some projects like nova and swift are fine. Others like glance and cinder and tripleo-heat-templates are not
19:42:16 <clarkb> When this came up earlier today I had three ideas for addressing this. A) revert the python2 removal from test images B) update things to fix buggy bindep.txt C) have -py27 jobs explicitly install python2
19:42:46 <clarkb> I'm beginning to wonder if we should do A), then announce we'll remove it again after the antelope release, so openstack should do either B or C in the meantime?
19:42:49 <fungi> per a post to the openstack-discuss ml, tripleo seems to have gone ahead with option b
19:43:09 <ianw> yeah i'm just pulling it up ...
19:43:18 <ianw> i think maybe we have openstack-tox-py27 install it
19:43:27 <fungi> apparently stable branch jobs supporting python 2.7 are very urgent to some of their constituency
19:43:30 <clarkb> my main concern here is that openstack isn't using bindep properly
19:43:52 <ianw> i agree on that
19:44:18 <ianw> if we put it back in the images, i feel like we just have to do a cleanup again at some point
19:44:37 <clarkb> ianw: yup I think we'd remove python2 again, say late April after the openstack release?
19:44:39 <ianw> at least if it's in the job, when the job eventually is unreferenced, we don't have to think about it again
19:44:46 <fungi> what is properly in this case? they failed to specify a python version their testing requires... i guess that means they should include python3 as well
19:44:46 <clarkb> thats a good point
19:44:56 <clarkb> fungi: yes python3 should be included too
19:45:14 <ianw> yeah, i mean the transition point between 2->3 was/is a bit of a weird time
19:45:34 <ianw> they *should* probably specify python3, but practically that's on all images
19:45:41 <ianw> at least until python4
19:45:48 <clarkb> I suspect that nova and swift have/had users using bindep outside of CI
19:46:01 <fungi> also a chicken-and-egg challenge for our jobs running bindep to find out they already have the python3 requested
19:46:02 <clarkb> and that is why theirs are fine. But the others never used bindep except for in CI and once things went green they shipped it
19:46:48 <clarkb> So maybe the fix is update openstack -py27 jobs to install python2 and encourage openstack to update their bindep files to include runtime dependencies
19:46:48 <fungi> basically we can't really have images without python3 on them, because ansible needs it even before it runs bindep
19:48:04 <fungi> so, yeah, i agree including python3 in bindep.txt is a good idea, it just can't be enforced by ci through exercising the file itself (a linting rule could catch it though)
19:48:08 <clarkb> we also don't need to solve that in the meeting (lack of time) but I wanted to make sure everyone was aware of the speed bump they hit
19:48:11 <ianw> ++ i'll have a suggested patch to openstack-zuul-jobs for that in a bit
19:48:16 <clarkb> thanks
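A hedged sketch of the bindep.txt side of that: declare the interpreters explicitly rather than assuming the image provides them (the py27 profile name is hypothetical and would have to be requested by the -py27 jobs):

# Append interpreter requirements to the project's bindep.txt.
cat >> bindep.txt <<'EOF'
python3 [platform:dpkg]
python3 [platform:rpm]
python2.7 [platform:dpkg py27]
EOF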
19:48:22 <clarkb> #topic Docker 23
19:48:46 <clarkb> Docker 23 released last week (skipping 21 and 22) and created some minor issues for us
19:49:07 <clarkb> In particular they have an unlisted hard dependency on apparmor which we've worked around in a couple of places by installing apparmor
19:49:42 <clarkb> Also things using buildx need to explicitly install buildx as it has a separate package now (docker 23 makes buildx the default builder for linux too, I'm not sure how that works if buildx isn't even installed by default though)
19:49:52 <fungi> hard dependency on apparmor for debian-derivatives anyway
19:50:00 <clarkb> right
19:50:07 <clarkb> and maybe on opensuse, but we don't use opensuse much
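For reference, a minimal sketch of the workaround on a Debian-derived node (package names are the ones from Docker's upstream apt repo):

# apparmor is an undeclared runtime requirement of docker 23 on Debian derivatives.
sudo apt-get update
sudo apt-get install -y apparmor
# buildx now ships as a separate package and is no longer pulled in implicitly.
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin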
19:50:32 <clarkb> at this point I think the CI situation is largely sorted out and ianw has started a list for working through prod updates
19:50:41 <clarkb> prod updates are done manually because upgrading docker implies container restarts
19:51:22 <clarkb> Mostly just a call out topic since these errors have been hitting things all across our world
19:51:27 <ianw> #link https://etherpad.opendev.org/p/docker-23-prod
19:51:29 <clarkb> thank you to everyone who has helped sort it out
19:51:50 <ianw> most done, have to think about zuul
19:52:02 <clarkb> ya zuul might be easiest in small batches
19:52:12 <ianw> i'm thinking maybe the regular restart playbook, but with a forced docker update
19:52:19 <ianw> rolling restart playbook
19:52:34 <clarkb> ya that could work too. A one off playbook modification?
19:52:43 <ianw> yeah, basically just run a custom playbook
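A rough sketch of what that could look like from bridge; the playbook name and extra variable are hypothetical, the point is just reusing the existing rolling-restart logic with a one-off flag:

# Hypothetical one-off invocation; playbook path and -e var are illustrative only.
ansible-playbook playbooks/zuul_restart.yaml -e force_docker_upgrade=true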
19:52:51 <fungi> the pad contains list.katacontainers.io (what are we using docker for there?) but not lists.openstack.org
19:53:08 <clarkb> fungi: we're not; I think the entire inventory went in there and has been edited to reflect reality?
19:53:11 <corvus> that seems like it should work
19:53:15 <fungi> oh, i see lists.openstack.org is in the not using list
19:53:27 <fungi> list.katacontainers.io probably just hasn't been checked yet
19:53:39 <ianw> yeah sorry, i didn't
19:53:54 <ianw> what i would like to do after this is rework things so we have one docker group
19:53:55 <fungi> no worries, i'll take a look
19:54:19 <ianw> so hosts that run install-docker now are all in that group.  will take a bit of playbook swizzling
19:54:23 <clarkb> ok running out of time and I want to get to ade_lee's topic
19:54:45 <clarkb> #topic FIPS jobs
19:54:52 <ade_lee> :)
19:54:56 <clarkb> speaking of swizzling
19:55:03 <fungi> at this point 866881 needs a second zuul/zuul-jobs reviewer
19:55:15 <fungi> the rest of the changes are ready to merge once that does?
19:55:16 <clarkb> #link https://review.opendev.org/c/zuul/zuul-jobs/+/866881
19:55:39 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/872222
19:55:39 <ade_lee> I think so yes
19:55:50 <clarkb> #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/872223
19:56:03 <fungi> ianw and i +2'd the later changes ready to approve once the zuul-jobs change is in
19:56:24 <clarkb> and the tldr here is the jobs are getting reorganized to handle pass to parent and early fips reboot needs. They should emulate how our jobs for docker images are set up
19:56:26 <clarkb> right?
19:56:53 <ade_lee> yup
19:56:56 <fungi> more to handle the need for secret handling in the new role that handles ubuntu advantage subscriptions
19:57:12 <clarkb> ah right thats the bit that needs the secret and uses pass to parent
19:57:29 <fungi> ua just ends up being a prerequisite for fips on ubuntu
19:57:44 <fungi> since it requires a license to get the packages
19:58:01 <fungi> (which opendev has been granted by canonical in order to make this work)
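For context, a hedged sketch of what that attach/enable step amounts to on an Ubuntu node (the token comes from the Zuul secret; newer releases rename the `ua` client to `pro`):

# Attach the node with the (secret) Ubuntu Advantage token, then enable FIPS.
sudo ua attach "$UA_TOKEN"
sudo ua enable fips --assume-yes
# The FIPS kernel and userspace only take effect after a reboot, hence the early-reboot handling.
sudo reboot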
19:58:25 <clarkb> sounds like mostly just need reviews at this point. I'll try to review today if I don't run out of time.
19:58:35 <clarkb> #topic Open Discussion
19:58:43 <clarkb> Any last minute concerns or topics before we can all go find a meal?
19:58:44 <ade_lee> clarkb, that would be great - thanks!
19:59:16 <fungi> we're running into dockerhub tag pruning issues which are blocking deployment from image updates
19:59:25 <clarkb> ianw has a change to aid in debugging that
19:59:34 <fungi> just a heads up to people who haven't seen the discussion around that yet
19:59:35 <clarkb> #link https://review.opendev.org/c/zuul/zuul-jobs/+/872842
20:00:05 <fungi> as soon as that's worked out we'll have donor logos on the main opendev.org page
20:00:09 <ianw> also speaking of distro deprecated things
20:00:13 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/872808
20:00:24 <ianw> was one to stop using apt-key for the docker install ... it warns on jammy now
20:00:43 <fungi> thanks for fixing that
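For context, a sketch of the keyring-plus-signed-by pattern that replaces apt-key for the Docker repository (jammy shown; the keyring path follows the common convention):

# Fetch Docker's signing key into a dedicated keyring instead of using apt-key add.
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
# Reference the keyring explicitly via signed-by in the source list entry.
echo "deb [signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu jammy stable" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-get update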
20:00:48 <clarkb> and reminder I'll be afk tomorrow
20:01:16 <clarkb> thats our hour. Thanks everyone
20:01:18 <clarkb> #endmeeting