19:01:06 #startmeeting infra
19:01:06 Meeting started Tue Feb 7 19:01:06 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:06 The meeting name has been set to 'infra'
19:01:31 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/QJK7E7D7HG5ZNT4UE7T5QIQ5TARIAXP6/ Our Agenda
19:01:35 #topic Announcements
19:02:30 The service coordinator nomination period is currently open. You have until February 14 to put your name into the hat. I'm happy to chat about it if there is interest too before any decisions are made
19:02:39 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/32BIEDDOWDUITX26NSNUSUB6GJYFHWWP/
19:02:59 Also, I'm going to be out tomorrow (just a heads up)
19:04:25 #topic Topics
19:04:31 #topic Bastion Host Updates
19:04:42 #link https://review.opendev.org/q/topic:bridge-backups
19:04:58 I truly feel bad for not getting to this. I should schedule an hour on my calendar just for this already. But too many fires keep coming up
19:05:12 ianw: fungi: were there any other bastion host updates you wanted to call out?
19:05:54 i don't think so
19:06:21 sorry, woke up to a dead vm, back now :)
19:06:39 you haven't missed much. Just wanted to make sure there wasn't anything else bastion related before continuing on
19:06:42 no changes related to that this week
19:06:50 #topic Mailman 3
19:07:14 The restart of containers to pick up the new site owner email landed and fungi corrected the root alias email situation
19:07:36 current state is that i need to work out how to create new sites in django using ansible so that the mailman domains can be associated with them
19:07:38 Fixing the vhosting is still a WIP though I think fungi roughly understands the set of steps that need to be taken and now is just a matter of figuring out how to automate django things
19:08:39 and yeah, this is really designed to be done from the django webui. if i were a seasoned django app admin i'd have a better idea of what makemigrations could do to ease that from the command line
19:09:00 I wonder if we've got any of those in the broader community? Might be worth reaching out to the openstack mailing list?
19:09:15 but it's basically all done behind the scenes by creating database migrations which prepopulate new tables for the site you're creating
19:10:20 databases were never my strong suit to begin with, and db migrations are very much a black box for me still. django seems to build on that as a fundamental part of its management workflow
19:10:23 ya I suspect what we might end up with is having a templated migration file in ansible that gets written out to $dir for mailman for each site and then ansible triggers the migrations
19:10:45 and future migrations should just ensure that steady state without changing much
19:10:58 the tricky bit will be figuring out what goes into the migration file definition
19:11:17 yeah, django already templates the migrations, as i loosely understand it, which is what manage.py makemigrations is for
19:11:56 it seems you're expected to tell django to build the migrations necessary for the new site, and then to apply those migrations it's made
19:12:13 which results in bringing the new site up
19:12:20 it sort of seemed like you needed a common settings.py, and then each site would have its own settings.py but with a different SITE_ID?
19:12:44 i think so, but then mailman when it runs needs SITE_ID=0 instead
19:12:54 ianw: I think that's for normal django multi sites. But mailman doesn't quite do it that way? You don't have a true extra site, it just uses the site db info to vhost its single deployment
19:13:08 which is a magic value telling it to infer the site from the web requests
19:13:17 ya so ultimately we run a single site with ID=0 but the db has entries for a few sites
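
[For context, a minimal sketch of what "creating the extra sites" amounts to, assuming the stock django.contrib.sites framework that mailman-web builds on (rows in the django_site table). This is an editor's illustration, not the actual OpenDev automation: the service name, working directory, and domains are placeholders.]

docker-compose exec -T mailman-web python3 manage.py shell <<'EOF'
from django.contrib.sites.models import Site
# get_or_create keeps this idempotent so an Ansible task could re-run it safely
for domain in ("lists.example.org", "lists.example.net"):
    Site.objects.get_or_create(domain=domain, defaults={"name": domain})
EOF
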
19:14:04 the other related tidbit is i need to update docker on lists01 and restart the containers
19:14:18 which i plan to do first on a held node i have that pre-dates the new docker release
19:15:10 cool sounds like we know what needs to happen just a matter of sorting through it. Anything else?
19:16:33 i don't have anything else, no
19:16:47 #topic Git updates
19:16:54 #link https://review.opendev.org/c/opendev/system-config/+/873012 Update our base images
19:16:55 i restacked the mm3 version upgrades change behind the vhosting work
19:17:32 The base python images did end up updating. Then I realized we use the -slim images which don't include git so this isn't really useful other than as a semi periodic update to the other things we have installed
19:17:56 I was looking at the non slim images to see if git had updated, not realizing we only have git where we explicitly install it. All that to say next week we can drop this topic.
19:18:06 And that change is not urgent, but probably also a reasonable thing to do
19:18:49 #topic New Debuntu Releases Preventing sudo pip install
19:19:11 fungi called out that debian bookworm and consequently ubuntu 23.04 and after will prevent `sudo pip install` from working on those systems
19:19:55 For OpenDev we've shifted a lot of things into docker images built on our base python images. These don't use debian packaging for python and I suspect will be fine. However if they are not we should be able to modify the installation system on the image to use a single venv that gets added to $PATH
19:20:04 I think this means the risk to us is relatively low
19:20:27 Additionally ansible is already in a venv on bridge and we use venvs on our test images
19:20:46 docker-compose isn't though. that's one i've been meaning to get to
19:20:52 good call
19:21:18 definitely anything you can think of that is still running outside of a venv should be moved. We can do that ahead of the system server upgrades that will break us since old stuff can handle venvs
19:21:53 ++ i'm sure we can work around it, but it's a good push to do things better
19:22:04 Elsewhere we should expect projects like openstack and probably starlingx to struggle with this change
19:22:17 in particular tools like devstack are not venv ready
19:22:32 yeah, i posted to openstack-discuss about it as well, just to raise awareness
19:22:48 yeah there have been changes floating around for years, that we've never quite finished
19:23:37 and ya I think talking about it semi regularly is a good way to keep encouraging people to chip away at it
19:23:52 for a lot of stuff we should be able to make small measurable progress with minimal impact over time
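
[A minimal sketch of the venv approach being discussed, using docker-compose as the example; the paths are placeholders, not what the OpenDev roles actually use.]

python3 -m venv /usr/local/lib/docker-compose-venv
/usr/local/lib/docker-compose-venv/bin/pip install docker-compose
# expose the entrypoint on $PATH without touching the distro-managed python
ln -sf /usr/local/lib/docker-compose-venv/bin/docker-compose /usr/local/bin/docker-compose
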
19:25:01 #topic Gerrit Updates
19:25:27 A number of Gerrit related changes have landed over the last week. In particular our use of submit requirements was cleaned up and we have a 3.7 upgrade job
19:25:38 That expanded testing was used to land the base image swap for gerrit
19:25:53 this base image swap missed (at least) one thing: openssh-client installation
19:26:06 this broke jeepyb as it uses ssh to talk to gerrit for new repo creation via the manage-projects tool
19:26:13 Apologies for that.
19:26:59 fungi discovered that even after fixing openssh jeepyb's manage-projects wedges itself for projects if the initial creation fails. The reason for this is that no branch is created in gerrit if manage-projects fails on the first run. This causes subsequent runs to clone from gerrit and not be able to checkout master
19:27:19 To work around this fungi manually pushed a master branch to starlingx/public-keys
19:27:49 and discovered in the process that you need an account which has agreed to a cla in gerrit in order to do that to a cla-enforced repository
19:28:11 my fungi.admin account had not (as i suspect most/all of our admin accounts haven't)
19:28:18 I've only had a bit of time today to think about that but part of me thinks that this may be desirable as I'm not sure we can fully automate around all the causes of failed gerrit repo creation?
19:28:32 the bootstrapping account is in the "System CLA" group, which seems to be how it gets around that
19:28:36 in this specific case we could just fall back to reiniting from scratch but I'm not sure that is appropriate for all cases
19:28:55 fungi: ya I wonder if we should just go ahead and add the admin group to system cla or something like that
19:29:17 or add project bootstrappers to it
19:29:24 ah yup
19:29:27 as an included group
19:29:52 with that all sorted I think ianw's change to modify acls is landable once communicated
19:29:54 #link https://review.opendev.org/c/openstack/project-config/+/867931 Cleaning up deprecated copy conditions in project ACLs
19:30:12 it would've had a bad time with no ssh :(
19:30:30 indeed
19:30:43 thanks for fixing it!
19:30:47 yeah sorry, will send something up about that
19:30:55 Other Gerrit items include a possible upgrade to java 17
19:30:59 #link https://review.opendev.org/c/opendev/system-config/+/870877 Run Gerrit under Java 17
19:31:10 I'd still like to hunt down someone who can explain the workaround that is necessary for that to me a bit better
19:31:31 but I'm finding that the new discord bridge isn't as heavily trafficked as the old slack system. I may have to break down and sign up for discord
19:31:59 And yesterday we had a few users reporting issues with large repo fetches
19:32:09 ianw did some debugging on that and it resulted in this issue for MINA SSHD
19:32:11 #link https://github.com/apache/mina-sshd/issues/319 Gerrit SSH issues with flaky networks.
19:32:58 oh, that just got a comment a few minutes ago :)
19:34:08 ... sounds like whatever we try is going to involve a .java file :/
19:34:45 ya looks like tomas has a theory but we need to update gerrit to better instrument things in order to confirm it
19:34:51 Progress at least
19:35:51 Anything else gerrit related before we move on?
19:35:55 jayf was the first to mention it, but it is a pretty constant thing in the logs
19:36:31 if it is a race the change in jdk could be exposing it more too
19:36:41 since that may affect underlying timing of actions
19:36:51 and others are still reporting connectivity issues to gerrit today (jrosser at least)
19:37:21 oh side note: users can use https if necessary. It's maybe a bit more clunky if using git-review but is a fallback
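
[A rough illustration of that https fallback; the remote name and project are examples, and the exact URL form and credential handling depend on the account setup. Generate an HTTP password in the Gerrit settings page, then repoint the remote git-review uses from ssh to https.]

git remote set-url gerrit https://review.opendev.org/opendev/system-config
git config credential.helper store   # or any preferred credential helper
git review   # now pushes over https instead of ssh port 29418
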
19:37:42 i think it would be easy-ish to add the close logging suggested there in the same file
19:38:07 (if it is) i could try sending that upstream, and if it's ok, we could build with a patch
19:38:09 yup and we could even patch that into our image if upstream doesn't want the extra debugging (though ideally we'd be upstream first as I like not having a fork)
19:38:48 yeah. although we haven't had a lot of response on upstream things lately :/ but that was mail, not patches
19:39:21 ianw: oh also March 2 at a terrible time of day for you (8am for me) they have their community meeting. Why don't I go ahead and throw this on the agenda and I'll do my best to attend
19:39:25 I can ask about java 17 too
19:40:12 (not that we have to wait that long just figure having a direct conversation might help move some of these things forward)
19:40:29 ++
19:40:53 #topic Python 2 removal from test images
19:41:05 20 minutes left let's keep things moving
19:41:24 some projects have noticed the python2 removal. It turns out listing python2 as a dependency in bindep was not something everyone understood as necessary
19:41:37 some projects like nova and swift are fine. Others like glance and cinder and tripleo-heat-templates are not
19:42:16 When this came up earlier today I had three ideas for addressing this. A) revert the python2 removal from test images B) update things to fix buggy bindep.txt C) have -py27 jobs explicitly install python2
19:42:46 I'm beginning to wonder if we should do A) then announce we'll remove it again after the antelope release so openstack should do either B or C in the meantime?
19:42:49 per a post to the openstack-discuss ml, tripleo seems to have gone ahead with option b
19:43:09 yeah i'm just pulling it up ...
19:43:18 i think maybe we have openstack-tox-py27 install it
19:43:27 apparently stable branch jobs supporting python 2.7 are very urgent to some of their constituency
19:43:30 my main concern here is that openstack isn't using bindep properly
19:43:52 i agree on that
19:44:18 if we put it back in the images, i feel like we just have to do a cleanup again at some point
19:44:37 ianw: yup I think we'd remove python2 again say late April after the openstack release?
19:44:39 at least if it's in the job, when the job eventually is unreferenced, we don't have to think about it again
19:44:46 what is properly in this case? they failed to specify a python version their testing requires... i guess that means they should include python3 as well
19:44:46 that's a good point
19:44:56 fungi: yes python3 should be included too
19:45:14 yeah, i mean the transition point between 2->3 was/is a bit of a weird time
19:45:34 they *should* probably specify python3, but practically that's on all images
19:45:41 at least until python4
19:45:48 I suspect that nova and swift have/had users using bindep outside of CI
19:46:01 also a chicken-and-egg challenge for our jobs running bindep to find out they already have the python3 requested
19:46:02 and that is why theirs are fine. But the others never used bindep except for in CI and once things went green they shipped it
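
[For anyone unfamiliar with the bindep convention being discussed, a hypothetical bindep.txt fragment that spells out the interpreters the tests need rather than assuming the image ships them; exact package names vary by distro release and this is not copied from any real project.]

# interpreters the test jobs rely on
python3
python3-dev [platform:dpkg]
python3-devel [platform:rpm]
# only pulled in when a py27 profile is requested
python2 [py27]
python2-dev [platform:dpkg py27]
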
19:46:48 So maybe the fix is to update openstack -py27 jobs to install python2 and encourage openstack to update their bindep files to include runtime dependencies
19:46:48 basically we can't really have images without python3 on them, because ansible needs it even before bindep runs
19:48:04 so, yeah, i agree including python3 in bindep.txt is a good idea, it just can't be enforced by ci through exercising the file itself (a linting rule could catch it though)
19:48:08 we also don't need to solve that in the meeting (lack of time) but I wanted to make sure everyone was aware of the speed bump they hit
19:48:11 ++ i'll have a suggested patch to openstack-zuul-jobs for that in a bit
19:48:16 thanks
19:48:22 #topic Docker 23
19:48:46 Docker 23 released last week (skipping 21 and 22) and created some minor issues for us
19:49:07 In particular they have an unlisted hard dependency on apparmor which we've worked around in a couple of places by installing apparmor
19:49:42 Also things using buildx need to explicitly install buildx as it has a separate package now (docker 23 makes buildx the default builder for linux too, I'm not sure how that works if buildx isn't even installed by default though)
19:49:52 hard dependency on apparmor for debuan-derivatives anyway
19:50:00 right
19:50:00 s/debuan/debian/
19:50:07 and maybe on opensuse but we don't opensuse much
19:50:32 at this point I think the CI situation is largely sorted out and ianw has started a list for working through prod updates
19:50:41 prod updates are done manually because upgrading docker implies container restarts
19:51:22 Mostly just a call out topic since these errors have been hitting things all across our world
19:51:27 #link https://etherpad.opendev.org/p/docker-23-prod
19:51:29 thank you to everyone who has helped sort it out
19:51:50 most done, have to think about zuul
19:52:02 ya zuul might be easiest in small batches
19:52:12 i'm thinking maybe the regular restart playbook, but with a forced docker update
19:52:19 rolling restart playbook
19:52:34 ya that could work too. A one off playbook modification?
19:52:43 yeah, basically just run a custom playbook
19:52:51 the pad contains list.katacontainers.io (what are we using docker for there?) but not lists.openstack.org
19:53:08 fungi: we're not. I think the entire inventory went in there and has been edited to reflect reality?
19:53:11 that seems like it should work
19:53:15 oh, i see lists.openstack.org is in the not using list
19:53:27 list.katacontainers.io probably just hasn't been checked yet
19:53:39 yeah sorry, i didn't
19:53:54 what i would like to do after this is rework things so we have one docker group
19:53:55 no worries, i'll take a look
19:54:19 so hosts that run install-docker now are all in that group. will take a bit of playbook swizzling
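
[A rough sketch of the manual per-host upgrade sequence being described, assuming the upstream docker-ce repository; the compose directory is illustrative and the real work is done via the playbooks mentioned above.]

apt-get update
apt-get install -y --only-upgrade docker-ce docker-ce-cli containerd.io docker-buildx-plugin
# the package upgrade restarts dockerd (and therefore the containers), so just
# make sure the service's containers come back up afterwards
cd /etc/gerrit-compose   # placeholder path for a service's docker-compose directory
docker-compose up -d
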
19:54:23 ok running out of time and I want to get to ade_lee's topic
19:54:45 #topic FIPS jobs
19:54:52 :)
19:54:56 speaking of swizzling
19:55:03 at this point 866881 needs a second zuul/zuul-jobs
19:55:05 reviewer
19:55:15 the rest of the changes are ready to merge once that does?
19:55:16 #link https://review.opendev.org/c/zuul/zuul-jobs/+/866881
19:55:27 #link https://review.opendev.org/c/zuul/zuul-jobs/+/866881
19:55:39 #link https://review.opendev.org/c/openstack/project-config/+/872222
19:55:39 I think so yes
19:55:50 https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/872223
19:56:03 ianw and i +2'd the later changes, ready to approve once the zuul-jobs change is in
19:56:24 and the tldr here is the jobs are getting reorganized to handle pass to parent and early fips reboot needs. They should emulate how our jobs for docker images are set up
19:56:26 right?
19:56:53 yup
19:56:56 more to handle the need for secret handling in the new role that handles ubuntu advantage subscriptions
19:57:12 ah right that's the bit that needs the secret and uses pass to parent
19:57:29 ua just ends up being a prerequisite for fips on ubuntu
19:57:44 since it requires a license to get the packages
19:58:01 (which opendev has been granted by canonical in order to make this work)
19:58:25 sounds like mostly just need reviews at this point. I'll try to review today if I don't run out of time.
19:58:35 #topic Open Discussion
19:58:43 Any last minute concerns or topics before we can all go find a meal?
19:58:44 clarkb, that would be great - thanks!
19:59:16 we're running into dockerhub tag pruning issues which are blocking deployment from image updates
19:59:25 ianw has a change to aid in debugging that
19:59:34 just a heads up to people who haven't seen the discussion around that yet
19:59:35 #link https://review.opendev.org/c/zuul/zuul-jobs/+/872842
20:00:05 as soon as that's worked out we'll have donor logos on the main opendev.org page
20:00:09 also speaking of distro deprecated things
20:00:13 #link https://review.opendev.org/c/opendev/system-config/+/872808
20:00:24 was one to stop using apt-key for the docker install ... it warns on jammy now
20:00:43 thanks for fixing that
20:00:48 and reminder I'll be afk tomorrow
20:01:16 that's our hour. Thanks everyone
20:01:18 #endmeeting
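
[Appendix note on the apt-key item at 20:00:24: the general shape of the apt-key-free setup is to drop a dearmored key into a keyring file and reference it with signed-by. The paths below follow the common convention and are not necessarily what change 872808 does.]

install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu jammy stable" > /etc/apt/sources.list.d/docker.list
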