19:01:26 <clarkb> #startmeeting infra
19:01:26 <opendevmeet> Meeting started Tue May 16 19:01:26 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:26 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:26 <opendevmeet> The meeting name has been set to 'infra'
19:01:40 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/34DDMF4OX5CPXU2ARFFXA66IHRFDS3F2/ Our Agenda
19:01:56 <clarkb> #topic Announcements
19:02:14 <clarkb> I didn't really have anythin gto announce.
19:02:26 <clarkb> Which means we can dive right in I guess
19:02:36 <clarkb> #topic Migrating to Quay.io
19:02:48 <clarkb> #link ttps://etherpad.opendev.org/p/opendev-quay-migration-2023 Plan/TODO list
19:03:10 <clarkb> unfortunately I discovered a deficiency in docker that impacts speculative testing of container images when they are not hosted on dockerhub
19:03:20 <clarkb> link https://etherpad.opendev.org/p/3anTDDTht91wLwohumzW Discovered a deficiency in Docker impacting speculative testing of container images
19:03:24 <clarkb> #link https://etherpad.opendev.org/p/3anTDDTht91wLwohumzW Discovered a deficiency in Docker impacting speculative testing of container images
19:03:38 <clarkb> I put together a document describing the problem and some potential solutions or workarounds.
19:04:11 <clarkb> I think if we want to workaround it our best option is to build all of our images with buildx (it doesn't have the same deficiency) and then use skopeo to fetch the images out of band in testing only
19:04:22 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/882977 illustration of skopeo prefetching
19:04:40 <clarkb> this change shows the skopeo workaround works and how we can apply it without impacting production
19:05:08 <clarkb> I keep getting distracted but my goal is to pcik this effort up again in the near future and implement th eworkaround everywhere
19:05:37 <clarkb> If we don't like those workarounds I think we should consider rolling back to docker hub and then implementing one of the more robust long term options of switching to podman
19:06:03 <clarkb> but that represents significant effort that will liekly need to be taken slowly to ensure we don't break anything hence the idea we should rollback the container image hosting move if we do that
19:08:14 <clarkb> I haven't heard a ton of feedback on whether or not people are ok with the workarounds other than fungi saying he was fine with them. Please let me know if you have thoughts
19:08:20 <clarkb> #topic Bastion Host Changes
19:08:30 <ianw> does this affect zuul as well?  or does testing there not pull?
19:08:54 <clarkb> ianw: it does affect zuul as well. Though less completely because zuul has k8s testing
19:09:14 <clarkb> and zuul is in a different tenant so depends on for opendev python base images may not work anyway?
19:09:34 <clarkb> #undo
19:09:34 <opendevmeet> Removing item from minutes: #topic Bastion Host Changes
19:10:03 <corvusx> hi um i said some things and didn't see a response so i don't know what the current state of lagging is
19:10:06 <clarkb> I suspect that it will be much easier for zuul to move to podman and friends for CI too since the testing doesn't directly map to a production deployment anywhere
19:10:22 <clarkb> corvusx: I havne't seen any messages from you
19:10:29 <corvusx> so anyway:
19:10:31 <corvusx> i think the skopeo hack basically means we throw away everything we've done with speculative builds and go with a completely different approach which is potentially very fragile
19:10:38 <corvusx> i mean, we have all of this speculative build stuff tested and working in zuul-jobs -- but i guess the quay stuff is only working with podman.  so it seems like that's the way we should go
19:11:19 <clarkb> my concern with that is that is likely months of effort as we go through each service one by one and figure out how permissions and stuff are affected and so on
19:11:22 <corvusx> basically, every interaction with an image is an opportunity to miss getting the speculative image
19:11:31 <clarkb> which is fine if we want to do that bu tI think we should rollback to docker hub first
19:11:35 <corvusx> podman is supposed to be a drop-in replacement
19:11:39 <corvusx> why would permissions be affected?
19:11:40 <clarkb> but we know it isn't
19:11:50 <clarkb> corvusx: because it runs as a regular user without a root daemon
19:12:07 <fungi> specifically, what i said i was good with was rolling forward to iterate on making things work for quay (whether that means temporary workarounds or whatever) rather than rolling back to dockerhub
19:12:11 <corvusx> i mean, they might have fixed whatever was wrong before?  i haven't see any issues with the bind mount stuff.
19:12:47 <clarkb> corvusx: yes the bind specific problems may be addressed. Its more as I dug into it podman-compose vs docker-compose + podman appera very different than docker-compose with docker due to podmans different design
19:12:59 <clarkb> there are also the nested podman issues that ianw hit with nodepool-builder
19:13:20 <clarkb> all of it should be doable, but I don't think its a week of work
19:13:20 <corvusx> how so?  i was certainly led to believe that it's a drop in by the folks who documented that in zuul
19:13:43 <clarkb> corvusx: so podman-compose is not feature complete compared to docker-compose aiui is the first thing
19:14:02 <corvusx> our docker-compose usage is limited to the feature set available like 5 years ago
19:14:05 <clarkb> corvusx: for this reason you can run podman + docker-compose using podman as a daemon. But if you do that I don't know if you are stuck on old docker-compose
19:14:38 <clarkb> but then separately if I run podman as root I will get different results then when I run it as a regular user
19:15:34 <corvusx> it's just our docker-compose usage is so simple i'm having a hard time believing that this could be a huge problem.  it's basically just bind mounts and host networking... if podman/podman-compose can't handle that, then i don't know why it's even a thing
19:16:02 <clarkb> corvusx: I thin kwe already hvae evidence of it being an issue though with nodepool builder running nested podman
19:16:21 <clarkb> I agree ti shouldn't be an issue but then ianw pointed at some historical ones
19:16:35 <clarkb> there is also the issue of installing the toolchain on ubuntu
19:16:44 <corvusx> what's the nodepool builder running nested podman problem (and does that have any relation to what we do on servers?)
19:17:24 <clarkb> corvusx: my undertsanding is that if you run nodepool-builder under podman then all of the diskimage builder container elements stop working because of the nesting of podman
19:17:28 <clarkb> I don't know why this occurs
19:17:36 <clarkb> it would affect our nodepool builder servers
19:18:07 <corvusx> that does sound like a problem
19:18:09 <ianw> https://opendev.org/zuul/nodepool/src/branch/master/Dockerfile#L90 is the things we know about; cgroups issues
19:18:31 <corvusx> so i'm still in the camp that we should not consider the skopeo thing an intermediate solution
19:18:33 <ianw> i don't think anyone really knows what would happen under podman
19:19:02 <corvusx> i think if we want to go that direction, we should very explicitly make a decision to completely throw out all the work we've done on this (which is fine, that's a thing we can decide)
19:19:32 <clarkb> corvusx: I'm not sure I understand why we would have to throw it all out. Th echange I pushed shows it works with all of the speculative stuff in place?
19:19:48 <corvusx> but i don't think it makes sense to do that as an interim step, because i think it's potentially very error prone (every container image interaction in any job is an opportunity to miss some place that needs to be preloaded)
19:19:52 <clarkb> It is fragile though as you hav eto be explicit about which images are potentially speculative
19:20:09 <corvusx> and by the time we find and fix all those things, we will have built a completely new solution that supercedes the old one
19:20:46 <clarkb> ok in that case I think we need to rollback quay.io moves and then do switch to podman then move to quay.io. I don't think we can move to podman in a week. I suspect it will be months before everything since the only time we tried it blew up on us.
19:21:31 <clarkb> the first thing that will need to be solved is installing podman and friends.
19:21:58 <clarkb> Then we can use system-config-run jobs to see what if anything just breaks. Then start holding nodes and sorting out the transition from one to the other
19:22:06 <corvusx> we could still consider doing the preload thing, or we could consider staying on docker
19:22:14 <corvusx> i mean staying on dockerhub
19:22:43 <ianw> the workaround in system-config does mean that if someone was implementing from first principles in zuul-jobs, there's some pretty big caveats that aren't mentioned?
19:22:45 <corvusx> (also, are we absolutely sure that there isn't a way to make this work?)
19:23:06 <corvusx> ianw: i don't follow, can you elaborate?
19:23:21 <clarkb> corvusx: I mean I'm as sure as  Ican be. There is an closed issue on moby with people saying this is still a problem as of February.
19:24:13 <clarkb> Digging around in the docker config docs I can't find any indication that you can set up a mirror for anything but docker.io. Another option woul dbe writing the custom proxy thing and configuring docker daemon with an http proxy that does what we need
19:24:18 <fungi> closed as a "wontfix" by the maintainers who apparently don't see the problem with fallbacks only working for dockerhub
19:24:56 <corvusx> clarkb: the proxy was the original idea and worked poorly
19:25:26 <clarkb> corvusx: ya it seems like something that would need a lot of from scratch implementation too rather than juts using an existing http proxy due to the specific behavior that is needed
19:25:36 <clarkb> and then potentially break when docker changes its behaviors
19:25:42 <corvusx> clarkb: no i literally mean we did that and it worked poorly
19:25:59 <corvusx> like that's part of why zuul-registry is designed the way it is
19:26:00 <clarkb> oh I see
19:26:16 <corvusx> i'm not going to say it can't be done, i'm just saying, it's not the easy way out here
19:26:25 <ianw> corvusx: i mean use-buildset-registry isn't a transparent thing if you're not pulling from docker.io?  which i think is at least a tacit assumption in zuul-jobs
19:26:51 <corvusx> ianw: i agree; i thought it was tested with non-docker images and thought it worked
19:27:09 <corvusx> ianw: so if it really is borked like that, i agree it needs a big red warning
19:27:44 <clarkb> corvusx: ianw: fwiw I think others should double check what I've found. Best I could tell gerrit 3.8 jobs failed because it couldn't find tags for 3.8 because it was talking directly to quay.io which hasn't had the newer 3.8 tag synced over
19:27:56 <clarkb> basically completely ignoring the speculative state and talking directly to quay.io
19:28:13 <fungi> #link https://github.com/moby/moby/pull/34319 "Implement private registry mirror support"
19:28:21 <clarkb> then I found the issue I mentioned previously which seemed to confirm this is expected behavior
19:28:21 <fungi> for reference
19:28:44 <corvusx> i feel like the best approach would be to either (after switching back to dockerhub if we so wish) move to podman completely, or redesign the system based on preloading images (and doing that in a suitably generic way that we can set out rules about what jobs need to expect and do).  something based on artifact data in zuul probably (like pull-from-intermediate-registry but for individual jobs).
19:28:50 <corvusx> it will be must less transparent but should work.
19:29:26 <corvusx> i think both of those have a good chance of producing a positive outcome
19:29:27 <clarkb> I thought about the second option there, but I don't think you can ever pull imgaes in your actual job workload because that would overwrite what is preloaded
19:29:50 <clarkb> I couldn't come up with a way to encode that in zuul-jobs as a "platform" level thing that our service deployments wouldn't immediately clobber over
19:30:09 <clarkb> which is why I ended up wiht the role embedded in systme-config in my example change proving it works (at least minimally)
19:30:57 <corvusx> clarkb: we risk getting into the weeds here, but how do we make opendev prod jobs work then?
19:31:16 <clarkb> corvusx: https://review.opendev.org/c/opendev/system-config/+/882977 illustrates it
19:31:26 <corvusx> yeah i'm reading it
19:31:41 <clarkb> corvusx: you basically hvae your normal docker-compose pull then after that and before docker-compose up you run skopeo to sideload the image(s)
19:32:04 <corvusx> right so that means that job doesn't test our production playbooks?
19:32:33 <clarkb> corvusx: I guess it depends on how you look at it. The skopeo side load only runs in CI but everything else is also tested and runs in production too
19:32:46 <corvusx> right, it has a "CI=1" style flag
19:33:08 <clarkb> yes but only for an additional step. It doesn't remove what runs in production
19:33:24 <clarkb> so what goes into production is still tested from a code coverage perspective
19:34:06 <corvusx> got it
19:34:10 <clarkb> anyawy it sounds like I should hold off on making any changes in the near future. I think it would be helpful if others can look at this and update the etherpad with their thoughts
19:34:27 <clarkb> I have spent a fair bit of time digging around this proble mand workarounds and ended up with that workaround.
19:34:57 <clarkb> I don't think there is a straightforward solution because docker quite simply is badly designed and doesn't support this workflow despite buildx and podman etc all doing so
19:35:22 <corvusx> yeah, if we look at it in a different perspective:
19:36:01 <corvusx> it seems what we have in zuul-jobs is: the docker world where all the roles work together as long as you only use docker images and tools; then basically the same for podman (with the exception that it also works with docker images)
19:36:48 <fungi> it does seem rather balkanized
19:37:07 <corvusx> so if we like what we have been using, then it seems like we're drawn toward "use podman family of tools".  and if we're willing to accept a different method of building and testing things (with much less magic and more explicit decisions about when and where we use speculative images) then we open the door to preloading
19:38:03 <clarkb> in my mind the workaround with explicit decisions is ideally a short/mid term workaround that we employ to complete the migration to quay.io. Then we continue to move to podman
19:38:16 <clarkb> because clearly docker is a problem here ( we want to move to quay.io for a different problem too)
19:38:48 <clarkb> as an alternativ ewe can stay on docker hub and move to podman then move to quay.io
19:39:09 <clarkb> I guess it depends on which chnage we feel is a priority
19:39:11 <corvusx> okay, let me try to be convinced of skopeo as a workaround -- it's certainly better than what we have now since we basically have no speculative testing
19:39:13 <ianw> i know it's unhelpful from the peanut gallery, but yeah, i've certainly touted the "ci == production" features of opendev deployment several times in talks, etc.  it is something that has been a big staple of the system
19:39:18 <fungi> right, it seems like a question of two paths to the same destination: whether we switch the toolchain first and then the hosting location, or the other way 'round
19:40:06 <corvusx> ianw: yes me too.  full disclosure, i have one scheduled a few weeks from now :)  but that's not driving my strong interest -- i can still give a talk and talk about what doesn't work too :)
19:40:37 <corvusx> i'm a bit worried about the nodepool thing
19:40:55 <corvusx> like does that mean it's simply not going to be possible to run it in nodepool?  does that make podman DOA for us?
19:40:57 <clarkb> I'm personally happy with either approach. I've been looking at short term workarounds because quay.io migration is halfway done and completing that would be nice. But rolling back isn't the end of the world either
19:41:25 <clarkb> corvusx: I don't think it is DOA we just likely need to run it that way and figure out what random settings need to be toggled
19:41:41 <clarkb> More just that it isn't likely to be directly drop in, there is work to do I think
19:42:10 <tonyb> I certainly don't have the background of you all but I feel like completeing the migration then start a new wholesale tool migration to podman and co is what I'd look at
19:42:30 <clarkb> as a time check we've spent most of our meeting on this topic. We can followup on this outside the meeting? There are a few other topics to dig into
19:42:54 <corvusx> okay, my concern with the preload "hack" is that it's fragile and doesn't give us full test coverage.  so if migrating nodepool will take a long time then we end up with a permanent hack
19:43:36 <corvusx> if we thought we could do it in a relatively short period and most of the skopeo hack work is done, then maybe that's the way to go.  but if it's really uncertain and going to take a long time, that makes me think that rollback is better.
19:44:18 <clarkb> corvusx: in that case maybe we should focus on "install podman and friends" first then we can use that to run some test jobs with podman instead of docker
19:44:34 <clarkb> spend some time on that then decide if we rollback or rollforward with the workaround
19:44:36 <corvusx> sounds like if we can run nodepool we're probably golden?
19:44:47 <clarkb> yes I think nodepool and gerrit are the two big questions
19:44:54 <corvusx> what's weird about gerrit?
19:45:09 <clarkb> corvusx: all of the sideband command stuff
19:45:21 <corvusx> like "docker run gerrit foo" ?
19:45:25 <corvusx> er "docker exec" ?
19:45:31 <clarkb> ya for reindexing and all that
19:46:05 <clarkb> I think a lot of it is run not exec currently so it spins up anothe rcontainer on the same bind mounts and does operations
19:46:17 <clarkb> I'm less concerend this will break than nodepool builder stuff
19:46:19 <fungi> i guess you're saying it makes a good canary because of the additional complexity?
19:46:20 <corvusx> "neat"
19:46:26 <clarkb> but I think it is good coverage of a workflow we have
19:46:29 <clarkb> fungi: ya
19:46:55 <clarkb> most of our services are a few bind mounts and an http server. Super simple and basic
19:46:59 <fungi> got it, so not for any specific thing that we suspect is problematic
19:47:04 <clarkb> gerrit is different. nodepool-builder is different
19:47:21 <fungi> (unlike the cgroups problem witnessed with nodepool)
19:47:34 <ianw> mysql/mariadb dumps are another thing that calls into the running container
19:47:38 <corvusx> so is anyone interested in getting nodepool-builder to work?
19:47:45 <clarkb> ianw: ya and gerrit would cover that too
19:48:22 <tonyb> I am interested but I don't have time to dedicate to it for several weeks
19:48:25 <ianw> i think the "interesting" path in nodepool-builder would be the containerfile element
19:48:30 <clarkb> corvusx: I can continue to push this along if others prefer. But it might be a day or two until I can spin up testing for that (I would probably rely on system-config-run after sorting out some podman install stuff)
19:48:40 <clarkb> ianw: ++
19:48:49 <ianw> which is used by fedora and rocky only.  fedora we've discussed ...
19:49:28 <clarkb> baically I think step 0 is a system-config role to install podman/buildah/skopeo/podman-compose or docker-compose with podman backing it
19:49:41 <clarkb> then we can have the CI system produce a nodepool-builder we can poke at
19:49:50 <clarkb> and I can do that but probably no sooner than tomorrow
19:50:13 <corvusx> is there some person or group who is driving the rocky requirement that can help?
19:50:13 <ianw> ++ running dib gate testing will stress containerfile
19:50:33 <clarkb> corvusx: NeilHanlon and the openstack ansible group
19:50:54 <clarkb> I can reach out to them once I've got the basic stuff running
19:51:44 <corvusx> is fedora being removed?
19:52:06 <clarkb> corvusx: I think so no one really hicmed in saying they have a use case for it since I last prodded
19:52:07 <ianw> (although now i wonder about the dib gate and if that's pulling the nodepool-builder speculatively built container properly?)
19:52:18 <corvusx> ianw: it is almost certainly not
19:53:26 <corvusx> i do have a suggestion for a last-ditch way to avoid the nested podman problem with nodepool
19:54:10 <corvusx> if we find there is no way to get it to work, we can implement my "build an image in zuul and hack nodepool-builder to upload it" idea that i mentioned in the nodepool-in-zuul spec
19:54:12 <tonyb> I spent a couplel of days verifying that the only thing we "loose" with switching off fedora is some newer virt stack testing and that repo is available for CentOS so it's not even a real loss
19:54:45 <clarkb> tonyb: thanks for confirming
19:54:49 <clarkb> corvusx: oh thats a neat idea.
19:54:55 <corvusx> essentially, anticipate the nodepool-in-zuul work (which itself would alleviate this problem) by doing the rocky build in zuul and then having the nodepool builder job for that download and upload the build
19:55:21 <clarkb> ok lets continue this outside the meeting. I think we have a few investigative next steps we can take in order to better understand potential impact and I'm volunteering to start on those within a day or two
19:55:39 <corvusx> that's not going to be the easiest thing (like, if the issue can be resolved by adding "--work-please" to podman, that's best) but if we literally can't make it work, that gets us out of the corner we backed ourselves into.
19:56:02 <clarkb> I'm going to skip the bastion, mailman3, gerrit, server upgrades, and storyboard topics today becaus eI think the only one with any real movement since last week is the fedora topic
19:56:11 <clarkb> #topic AFS volume utilization
19:56:31 <clarkb> utilization is trending back up. We seem to bounce around up and down with a general trend of upwards
19:56:53 <clarkb> as mentioned I asked for any last comments on whether or not people had a use case for fedora last week with the intent of making change sstarting this week
19:57:29 <clarkb> there really wasn't any "we need fedora" feedback. We got some "why no rhel? Why this why that?" feedback instead
19:57:39 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/883083 Cleanup unused fedora mirror content
19:57:49 <clarkb> I think this is a good step 0 that will free up fileserver disk
19:57:51 <fungi> sean is looking into the possibility of rhel again too
19:58:01 <fungi> has another question about it into rh legal
19:58:08 <clarkb> regardless of what happens to fedora this chagne should be safe as it removes unused fedora mirror content
19:58:22 <clarkb> I think if we can land that in the near future it would be great
19:58:33 <clarkb> that will address the immediate lets not run out of disk on AFS problem
19:58:44 <ianw> ... that has been a very "looked at" thing over the years
19:59:20 <clarkb> Then separately I also think we can proceed with removing the fedora bits we are using. The first step for that would be to configure the fedora jobs to stop relying on the mirrors. Then we can clean up the mirror content that was used. Then finally we can start removing jobs and remove the images
19:59:36 <ianw> i think given tonyb's input too, i don't see too much using this
19:59:56 <clarkb> The actual removal will porobably take some time as we clean up jobs. But we should be able to clean up mirrors which is the immediate concern then gracefully shutdown/remove th erest
20:00:03 <ianw> one thing is maybe we should stop the mirror setup first?  i think zuul-jobs breakage might be the main pain point
20:00:19 <clarkb> ianw: yes thats what I said :)
20:00:38 <clarkb> basically we can keep the nodes but they can talk to upstream package repos
20:00:45 <ianw> ++
20:00:53 <clarkb> note https://review.opendev.org/c/opendev/system-config/+/883083 removes mirroring for content we don't have nodes for
20:01:03 <clarkb> double check me on that but I think that means we can merge it right now
20:01:35 <clarkb> but for the bits we do use we need to stop consuming them in the jobs first and ya  Ithink that is a good first step in the shutdown of what does exist
20:01:38 <clarkb> and we are at time.
20:01:45 <clarkb> Sorry this went a bit long and we skipped over time topics
20:02:01 <clarkb> Feel free to continue discussion in #opendev or on the mailing list for any thing that was missed (including the skipped topics)
20:02:16 <clarkb> But I won't keep you any longer than I already have
20:02:21 <clarkb> thank you for your time today!
20:02:24 <clarkb> #endmeeting