19:01:26 #startmeeting infra
19:01:26 Meeting started Tue May 16 19:01:26 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:26 The meeting name has been set to 'infra'
19:01:40 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/34DDMF4OX5CPXU2ARFFXA66IHRFDS3F2/ Our Agenda
19:01:56 #topic Announcements
19:02:14 I didn't really have anything to announce.
19:02:26 Which means we can dive right in I guess
19:02:36 #topic Migrating to Quay.io
19:02:48 #link https://etherpad.opendev.org/p/opendev-quay-migration-2023 Plan/TODO list
19:03:10 unfortunately I discovered a deficiency in docker that impacts speculative testing of container images when they are not hosted on dockerhub
19:03:24 #link https://etherpad.opendev.org/p/3anTDDTht91wLwohumzW Discovered a deficiency in Docker impacting speculative testing of container images
19:03:38 I put together a document describing the problem and some potential solutions or workarounds.
19:04:11 I think if we want to work around it our best option is to build all of our images with buildx (it doesn't have the same deficiency) and then use skopeo to fetch the images out of band in testing only
19:04:22 #link https://review.opendev.org/c/opendev/system-config/+/882977 illustration of skopeo prefetching
19:04:40 this change shows the skopeo workaround works and how we can apply it without impacting production
19:05:08 I keep getting distracted but my goal is to pick this effort up again in the near future and implement the workaround everywhere
19:05:37 If we don't like those workarounds I think we should consider rolling back to docker hub and then implementing one of the more robust long term options of switching to podman
19:06:03 but that represents significant effort that will likely need to be taken slowly to ensure we don't break anything, hence the idea we should roll back the container image hosting move if we do that
19:08:14 I haven't heard a ton of feedback on whether or not people are ok with the workarounds other than fungi saying he was fine with them. Please let me know if you have thoughts
19:08:20 #topic Bastion Host Changes
19:08:30 does this affect zuul as well? or does testing there not pull?
19:08:54 ianw: it does affect zuul as well. Though less completely because zuul has k8s testing
19:09:14 and zuul is in a different tenant so depends-on for opendev python base images may not work anyway?
19:09:34 #undo
19:09:34 Removing item from minutes: #topic Bastion Host Changes
19:10:03 hi um i said some things and didn't see a response so i don't know what the current state of the lag is
19:10:06 I suspect that it will be much easier for zuul to move to podman and friends for CI too since the testing doesn't directly map to a production deployment anywhere
19:10:22 corvusx: I haven't seen any messages from you
19:10:29 so anyway:
19:10:31 i think the skopeo hack basically means we throw away everything we've done with speculative builds and go with a completely different approach which is potentially very fragile
19:10:38 i mean, we have all of this speculative build stuff tested and working in zuul-jobs -- but i guess the quay stuff is only working with podman.
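(A minimal sketch of the skopeo prefetch step discussed above, for readers following along. The buildset registry address, the image names, and the standalone-Python form are assumptions for illustration only; the actual change, 882977, drives skopeo from Ansible and only when a CI-style flag is set.)

    #!/usr/bin/env python3
    """Illustrative sketch of the skopeo sideload step, not the actual
    system-config implementation. Registry host, port, and image names
    below are hypothetical."""
    import subprocess

    # Hypothetical mapping: image published to the job's buildset registry
    # -> name the docker-compose file expects to find in the local daemon.
    SPECULATIVE_IMAGES = {
        "buildset-registry.example.org:5000/opendevorg/gerrit:3.8":
            "quay.io/opendevorg/gerrit:3.8",
    }

    def sideload(src: str, dst: str) -> None:
        # Copy from the job-local buildset registry straight into the
        # local docker daemon, so the later `docker-compose up` never
        # pulls the image from quay.io.
        subprocess.run(
            [
                "skopeo", "copy", "--src-tls-verify=false",
                f"docker://{src}",
                f"docker-daemon:{dst}",
            ],
            check=True,
        )

    if __name__ == "__main__":
        for src, dst in SPECULATIVE_IMAGES.items():
            sideload(src, dst)

(The fragility raised in the discussion that follows comes from the fact that a copy like this has to be repeated for every image a job might consume speculatively.)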
so it seems like that's the way we should go
19:11:19 my concern with that is that it is likely months of effort as we go through each service one by one and figure out how permissions and stuff are affected and so on
19:11:22 basically, every interaction with an image is an opportunity to miss getting the speculative image
19:11:31 which is fine if we want to do that but I think we should roll back to docker hub first
19:11:35 podman is supposed to be a drop-in replacement
19:11:39 why would permissions be affected?
19:11:40 but we know it isn't
19:11:50 corvusx: because it runs as a regular user without a root daemon
19:12:07 specifically, what i said i was good with was rolling forward to iterate on making things work for quay (whether that means temporary workarounds or whatever) rather than rolling back to dockerhub
19:12:11 i mean, they might have fixed whatever was wrong before? i haven't seen any issues with the bind mount stuff.
19:12:47 corvusx: yes the bind-specific problems may be addressed. It's more that as I dug into it, podman-compose vs docker-compose + podman appear very different than docker-compose with docker due to podman's different design
19:12:59 there are also the nested podman issues that ianw hit with nodepool-builder
19:13:20 all of it should be doable, but I don't think it's a week of work
19:13:20 how so? i was certainly led to believe that it's a drop-in by the folks who documented that in zuul
19:13:43 corvusx: so the first thing is that podman-compose is not feature complete compared to docker-compose, aiui
19:14:02 our docker-compose usage is limited to the feature set available like 5 years ago
19:14:05 corvusx: for this reason you can run podman + docker-compose using podman as a daemon. But if you do that I don't know if you are stuck on old docker-compose
19:14:38 but then separately if I run podman as root I will get different results than when I run it as a regular user
19:15:34 it's just our docker-compose usage is so simple i'm having a hard time believing that this could be a huge problem. it's basically just bind mounts and host networking... if podman/podman-compose can't handle that, then i don't know why it's even a thing
19:16:02 corvusx: I think we already have evidence of it being an issue though with nodepool builder running nested podman
19:16:21 I agree it shouldn't be an issue but then ianw pointed at some historical ones
19:16:35 there is also the issue of installing the toolchain on ubuntu
19:16:44 what's the nodepool builder running nested podman problem (and does that have any relation to what we do on servers?)
19:17:24 corvusx: my understanding is that if you run nodepool-builder under podman then all of the diskimage builder container elements stop working because of the nesting of podman
19:17:28 I don't know why this occurs
19:17:36 it would affect our nodepool builder servers
19:18:07 that does sound like a problem
19:18:09 https://opendev.org/zuul/nodepool/src/branch/master/Dockerfile#L90 is the thing we know about; cgroups issues
19:18:31 so i'm still in the camp that we should not consider the skopeo thing an intermediate solution
19:18:33 i don't think anyone really knows what would happen under podman
19:19:02 i think if we want to go that direction, we should very explicitly make a decision to completely throw out all the work we've done on this (which is fine, that's a thing we can decide)
19:19:32 corvusx: I'm not sure I understand why we would have to throw it all out.
The change I pushed shows it works with all of the speculative stuff in place?
19:19:48 but i don't think it makes sense to do that as an interim step, because i think it's potentially very error prone (every container image interaction in any job is an opportunity to miss some place that needs to be preloaded)
19:19:52 It is fragile though as you have to be explicit about which images are potentially speculative
19:20:09 and by the time we find and fix all those things, we will have built a completely new solution that supersedes the old one
19:20:46 ok in that case I think we need to roll back the quay.io moves, then switch to podman, then move to quay.io. I don't think we can move to podman in a week. I suspect it will be months before everything is switched, since the only time we tried it, it blew up on us.
19:21:31 the first thing that will need to be solved is installing podman and friends.
19:21:58 Then we can use system-config-run jobs to see what, if anything, just breaks. Then start holding nodes and sorting out the transition from one to the other
19:22:06 we could still consider doing the preload thing, or we could consider staying on docker
19:22:14 i mean staying on dockerhub
19:22:43 the workaround in system-config does mean that if someone was implementing from first principles in zuul-jobs, there are some pretty big caveats that aren't mentioned?
19:22:45 (also, are we absolutely sure that there isn't a way to make this work?)
19:23:06 ianw: i don't follow, can you elaborate?
19:23:21 corvusx: I mean I'm as sure as I can be. There is a closed issue on moby with people saying this is still a problem as of February.
19:24:13 Digging around in the docker config docs I can't find any indication that you can set up a mirror for anything but docker.io. Another option would be writing the custom proxy thing and configuring the docker daemon with an http proxy that does what we need
19:24:18 closed as a "wontfix" by the maintainers who apparently don't see the problem with fallbacks only working for dockerhub
19:24:56 clarkb: the proxy was the original idea and worked poorly
19:25:26 corvusx: ya it seems like something that would need a lot of from-scratch implementation too rather than just using an existing http proxy due to the specific behavior that is needed
19:25:36 and then potentially break when docker changes its behaviors
19:25:42 clarkb: no i literally mean we did that and it worked poorly
19:25:59 like that's part of why zuul-registry is designed the way it is
19:26:00 oh I see
19:26:16 i'm not going to say it can't be done, i'm just saying, it's not the easy way out here
19:26:25 corvusx: i mean use-buildset-registry isn't a transparent thing if you're not pulling from docker.io? which i think is at least a tacit assumption in zuul-jobs
19:26:51 ianw: i agree; i thought it was tested with non-docker images and thought it worked
19:27:09 ianw: so if it really is borked like that, i agree it needs a big red warning
19:27:44 corvusx: ianw: fwiw I think others should double check what I've found.
Best I could tell, the gerrit 3.8 jobs failed because it couldn't find tags for 3.8, since it was talking directly to quay.io which hasn't had the newer 3.8 tag synced over
19:27:56 basically completely ignoring the speculative state and talking directly to quay.io
19:28:13 #link https://github.com/moby/moby/pull/34319 "Implement private registry mirror support"
19:28:21 then I found the issue I mentioned previously which seemed to confirm this is expected behavior
19:28:21 for reference
19:28:44 i feel like the best approach would be to either (after switching back to dockerhub if we so wish) move to podman completely, or redesign the system based on preloading images (and doing that in a suitably generic way that we can set out rules about what jobs need to expect and do). something based on artifact data in zuul probably (like pull-from-intermediate-registry but for individual jobs).
19:28:50 it will be much less transparent but should work.
19:29:26 i think both of those have a good chance of producing a positive outcome
19:29:27 I thought about the second option there, but I don't think you can ever pull images in your actual job workload because that would overwrite what is preloaded
19:29:50 I couldn't come up with a way to encode that in zuul-jobs as a "platform" level thing that our service deployments wouldn't immediately clobber over
19:30:09 which is why I ended up with the role embedded in system-config in my example change proving it works (at least minimally)
19:30:57 clarkb: we risk getting into the weeds here, but how do we make opendev prod jobs work then?
19:31:16 corvusx: https://review.opendev.org/c/opendev/system-config/+/882977 illustrates it
19:31:26 yeah i'm reading it
19:31:41 corvusx: you basically have your normal docker-compose pull, then after that and before docker-compose up you run skopeo to sideload the image(s)
19:32:04 right so that means that job doesn't test our production playbooks?
19:32:33 corvusx: I guess it depends on how you look at it. The skopeo sideload only runs in CI but everything else is also tested and runs in production too
19:32:46 right, it has a "CI=1" style flag
19:33:08 yes but only for an additional step. It doesn't remove what runs in production
19:33:24 so what goes into production is still tested from a code coverage perspective
19:34:06 got it
19:34:10 anyway it sounds like I should hold off on making any changes in the near future. I think it would be helpful if others can look at this and update the etherpad with their thoughts
19:34:27 I have spent a fair bit of time digging around this problem and workarounds and ended up with that workaround.
19:34:57 I don't think there is a straightforward solution because docker quite simply is badly designed and doesn't support this workflow despite buildx and podman etc all doing so
19:35:22 yeah, if we look at it from a different perspective:
19:36:01 it seems what we have in zuul-jobs is: the docker world where all the roles work together as long as you only use docker images and tools; then basically the same for podman (with the exception that it also works with docker images)
19:36:48 it does seem rather balkanized
19:37:07 so if we like what we have been using, then it seems like we're drawn toward "use podman family of tools".
and if we're willing to accept a different method of building and testing things (with much less magic and more explicit decisions about when and where we use speculative images) then we open the door to preloading
19:38:03 in my mind the workaround with explicit decisions is ideally a short/mid-term workaround that we employ to complete the migration to quay.io. Then we continue to move to podman
19:38:16 because clearly docker is a problem here (we want to move to quay.io for a different problem too)
19:38:48 as an alternative we can stay on docker hub and move to podman, then move to quay.io
19:39:09 I guess it depends on which change we feel is a priority
19:39:11 okay, let me try to be convinced of skopeo as a workaround -- it's certainly better than what we have now since we basically have no speculative testing
19:39:13 i know it's unhelpful from the peanut gallery, but yeah, i've certainly touted the "ci == production" features of opendev deployment several times in talks, etc. it is something that has been a big staple of the system
19:39:18 right, it seems like a question of two paths to the same destination: whether we switch the toolchain first and then the hosting location, or the other way 'round
19:40:06 ianw: yes me too. full disclosure, i have one scheduled a few weeks from now :) but that's not driving my strong interest -- i can still give a talk and talk about what doesn't work too :)
19:40:37 i'm a bit worried about the nodepool thing
19:40:55 like does that mean it's simply not going to be possible to run it in nodepool? does that make podman DOA for us?
19:40:57 I'm personally happy with either approach. I've been looking at short-term workarounds because the quay.io migration is halfway done and completing that would be nice. But rolling back isn't the end of the world either
19:41:25 corvusx: I don't think it is DOA; we just likely need to run it that way and figure out what random settings need to be toggled
19:41:41 More just that it isn't likely to be a direct drop-in; there is work to do I think
19:42:10 I certainly don't have the background of you all but I feel like completing the migration then starting a new wholesale tool migration to podman and co is what I'd look at
19:42:30 as a time check, we've spent most of our meeting on this topic. We can follow up on this outside the meeting? There are a few other topics to dig into
19:42:54 okay, my concern with the preload "hack" is that it's fragile and doesn't give us full test coverage. so if migrating nodepool will take a long time then we end up with a permanent hack
19:43:36 if we thought we could do it in a relatively short period and most of the skopeo hack work is done, then maybe that's the way to go. but if it's really uncertain and going to take a long time, that makes me think that rollback is better.
19:44:18 corvusx: in that case maybe we should focus on "install podman and friends" first, then we can use that to run some test jobs with podman instead of docker
19:44:34 spend some time on that then decide if we roll back or roll forward with the workaround
19:44:36 sounds like if we can run nodepool we're probably golden?
19:44:47 yes I think nodepool and gerrit are the two big questions
19:44:54 what's weird about gerrit?
19:45:09 corvusx: all of the sideband command stuff
19:45:21 like "docker run gerrit foo" ?
19:45:25 er "docker exec" ?
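(A rough sketch of the kind of sideband invocation being asked about here: an offline Gerrit reindex run as a one-off container on the service's bind mounts, which clarkb confirms next. The compose service name, gerrit.war path, and site path are assumptions for illustration, not taken from system-config.)

    #!/usr/bin/env python3
    """Illustrative sketch of a sideband command: a one-off container on
    the same bind mounts as the running service. Service name, war path,
    and site path are hypothetical."""
    import subprocess

    def offline_reindex() -> None:
        # `docker-compose run` starts a fresh container from the gerrit
        # image with the service's volumes, runs the offline reindex, and
        # exits; the long-running gerrit container is untouched.
        subprocess.run(
            [
                "docker-compose", "run", "--rm", "gerrit",
                "java", "-jar", "/var/gerrit/bin/gerrit.war",
                "reindex", "-d", "/var/gerrit",
            ],
            check=True,
        )

    if __name__ == "__main__":
        offline_reindex()

(Under a podman-based toolchain, invocations like this would need to behave the same way, which is part of why gerrit is treated as a useful canary in the discussion that follows.)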
19:45:31 ya for reindexing and all that
19:46:05 I think a lot of it is run not exec currently so it spins up another container on the same bind mounts and does operations
19:46:17 I'm less concerned this will break than nodepool builder stuff
19:46:19 i guess you're saying it makes a good canary because of the additional complexity?
19:46:20 "neat"
19:46:26 but I think it is good coverage of a workflow we have
19:46:29 fungi: ya
19:46:55 most of our services are a few bind mounts and an http server. Super simple and basic
19:46:59 got it, so not for any specific thing that we suspect is problematic
19:47:04 gerrit is different. nodepool-builder is different
19:47:21 (unlike the cgroups problem witnessed with nodepool)
19:47:34 mysql/mariadb dumps are another thing that calls into the running container
19:47:38 so is anyone interested in getting nodepool-builder to work?
19:47:45 ianw: ya and gerrit would cover that too
19:48:22 I am interested but I don't have time to dedicate to it for several weeks
19:48:25 i think the "interesting" path in nodepool-builder would be the containerfile element
19:48:30 corvusx: I can continue to push this along if others prefer. But it might be a day or two until I can spin up testing for that (I would probably rely on system-config-run after sorting out some podman install stuff)
19:48:40 ianw: ++
19:48:49 which is used by fedora and rocky only. fedora we've discussed ...
19:49:28 basically I think step 0 is a system-config role to install podman/buildah/skopeo/podman-compose or docker-compose with podman backing it
19:49:41 then we can have the CI system produce a nodepool-builder we can poke at
19:49:50 and I can do that but probably no sooner than tomorrow
19:50:13 is there some person or group who is driving the rocky requirement that can help?
19:50:13 ++ running dib gate testing will stress containerfile
19:50:33 corvusx: NeilHanlon and the openstack ansible group
19:50:54 I can reach out to them once I've got the basic stuff running
19:51:44 is fedora being removed?
19:52:06 corvusx: I think so; no one really chimed in saying they have a use case for it since I last prodded
19:52:07 (although now i wonder about the dib gate and if that's pulling the nodepool-builder speculatively built container properly?)
19:52:18 ianw: it is almost certainly not
19:53:26 i do have a suggestion for a last-ditch way to avoid the nested podman problem with nodepool
19:54:10 if we find there is no way to get it to work, we can implement my "build an image in zuul and hack nodepool-builder to upload it" idea that i mentioned in the nodepool-in-zuul spec
19:54:12 I spent a couple of days verifying that the only thing we "lose" with switching off fedora is some newer virt stack testing, and that repo is available for CentOS so it's not even a real loss
19:54:45 tonyb: thanks for confirming
19:54:49 corvusx: oh that's a neat idea.
19:54:55 essentially, anticipate the nodepool-in-zuul work (which itself would alleviate this problem) by doing the rocky build in zuul and then having the nodepool builder job for that download and upload the build
19:55:21 ok let's continue this outside the meeting.
I think we have a few investigative next steps we can take in order to better understand potential impact and I'm volunteering to start on those within a day or two
19:55:39 that's not going to be the easiest thing (like, if the issue can be resolved by adding "--work-please" to podman, that's best) but if we literally can't make it work, that gets us out of the corner we backed ourselves into.
19:56:02 I'm going to skip the bastion, mailman3, gerrit, server upgrades, and storyboard topics today because I think the only one with any real movement since last week is the fedora topic
19:56:11 #topic AFS volume utilization
19:56:31 utilization is trending back up. We seem to bounce around up and down with a general upward trend
19:56:53 as mentioned, I asked for any last comments on whether or not people had a use case for fedora last week with the intent of making changes starting this week
19:57:29 there really wasn't any "we need fedora" feedback. We got some "why no rhel? Why this why that?" feedback instead
19:57:39 #link https://review.opendev.org/c/opendev/system-config/+/883083 Cleanup unused fedora mirror content
19:57:49 I think this is a good step 0 that will free up fileserver disk
19:57:51 sean is looking into the possibility of rhel again too
19:58:01 has another question about it into rh legal
19:58:08 regardless of what happens to fedora this change should be safe as it removes unused fedora mirror content
19:58:22 I think if we can land that in the near future it would be great
19:58:33 that will address the immediate "let's not run out of disk on AFS" problem
19:58:44 ... that has been a very "looked at" thing over the years
19:59:20 Then separately I also think we can proceed with removing the fedora bits we are using. The first step for that would be to configure the fedora jobs to stop relying on the mirrors. Then we can clean up the mirror content that was used. Then finally we can start removing jobs and remove the images
19:59:36 i think given tonyb's input too, i don't see too much using this
19:59:56 The actual removal will probably take some time as we clean up jobs. But we should be able to clean up mirrors, which is the immediate concern, then gracefully shut down/remove the rest
20:00:03 one thing is maybe we should stop the mirror setup first? i think zuul-jobs breakage might be the main pain point
20:00:19 ianw: yes that's what I said :)
20:00:38 basically we can keep the nodes but they can talk to upstream package repos
20:00:45 ++
20:00:53 note https://review.opendev.org/c/opendev/system-config/+/883083 removes mirroring for content we don't have nodes for
20:01:03 double check me on that but I think that means we can merge it right now
20:01:35 but for the bits we do use we need to stop consuming them in the jobs first, and ya I think that is a good first step in the shutdown of what does exist
20:01:38 and we are at time.
20:01:45 Sorry this went a bit long and we skipped over some topics due to time
20:02:01 Feel free to continue discussion in #opendev or on the mailing list for anything that was missed (including the skipped topics)
20:02:16 But I won't keep you any longer than I already have
20:02:21 thank you for your time today!
20:02:24 #endmeeting