Thursday, 2023-05-11

clarkbI've just discovered a problem with speculative gating of container images in quay using docker. tldr is that docker doesn't have an easy way to map quay.io/foo/bar to buildsetregistry/quay.io/foo/bar03:13
clarkbso none of the transparent fetching of images is working03:13
clarkbhttps://github.com/moby/moby/pull/34319 is the closed and never merged upstream pull request that would have addressed this03:13
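A minimal Python sketch of the name mapping docker lacks, for illustration only. It assumes a buildset registry at a hypothetical `buildsetregistry:5000` that stores each image under its full upstream name (the log later gives `br:5000/quay.io/foo/bar` as an example), and it assumes docker.io images would be stored under a `docker.io/` prefix. The helper names are hypothetical; the code only rewrites strings and never contacts a registry.

```python
DEFAULT_REGISTRY = "docker.io"


def split_registry(image: str) -> tuple[str, str]:
    """Split an image reference into (registry host, repository path).

    Follows docker's convention: the first path component counts as a
    registry host only if it contains '.' or ':' or is 'localhost'.
    """
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    if not sep:
        # A bare name such as "ubuntu" lives under docker.io/library/.
        return DEFAULT_REGISTRY, "library/" + image
    # No explicit registry host, so docker assumes docker.io.
    return DEFAULT_REGISTRY, image


def br_reference(image: str, br_host: str = "buildsetregistry:5000") -> str:
    """Hypothetical helper: where the buildset registry would hold `image`."""
    registry, path = split_registry(image)
    return f"{br_host}/{registry}/{path}"
```

For example, `br_reference("quay.io/foo/bar")` gives `buildsetregistry:5000/quay.io/foo/bar`. Podman and skopeo can be configured to try a location like that first. Docker cannot, at least for registries other than docker.io.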
clarkbthere are some potential workarounds. I'm going to need to dig into this more tomorrow. But I wanted to give a heads up on this03:14
clarkbthe lack of support for this makes me want to get off of docker hub even more quickly. Problem is it definitely makes it more difficult to do so03:15
clarkbI really wish I had realized this sooner. I guess this is a credit to the skopeo/podman crew to have solved this a long long time ago well enough that we just assumed this would work elsewhere...03:26
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role  https://review.opendev.org/c/zuul/zuul-jobs/+/87153903:50
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs  https://review.opendev.org/c/zuul/zuul-jobs/+/87167903:50
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init  https://review.opendev.org/c/zuul/zuul-jobs/+/87168003:50
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role  https://review.opendev.org/c/zuul/zuul-jobs/+/87153904:30
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs  https://review.opendev.org/c/zuul/zuul-jobs/+/87167904:30
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init  https://review.opendev.org/c/zuul/zuul-jobs/+/87168004:30
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role  https://review.opendev.org/c/zuul/zuul-jobs/+/87153904:36
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs  https://review.opendev.org/c/zuul/zuul-jobs/+/87167904:36
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init  https://review.opendev.org/c/zuul/zuul-jobs/+/87168004:36
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role  https://review.opendev.org/c/zuul/zuul-jobs/+/87153904:42
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs  https://review.opendev.org/c/zuul/zuul-jobs/+/87167904:42
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init  https://review.opendev.org/c/zuul/zuul-jobs/+/87168004:42
*** dmellado9 is now known as dmellado05:04
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role  https://review.opendev.org/c/zuul/zuul-jobs/+/87153905:17
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs  https://review.opendev.org/c/zuul/zuul-jobs/+/87167905:17
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init  https://review.opendev.org/c/zuul/zuul-jobs/+/87168005:17
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role  https://review.opendev.org/c/zuul/zuul-jobs/+/87153905:23
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs  https://review.opendev.org/c/zuul/zuul-jobs/+/87167905:23
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init  https://review.opendev.org/c/zuul/zuul-jobs/+/87168005:23
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role  https://review.opendev.org/c/zuul/zuul-jobs/+/87153905:37
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs  https://review.opendev.org/c/zuul/zuul-jobs/+/87167905:37
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init  https://review.opendev.org/c/zuul/zuul-jobs/+/87168005:37
*** atmark is now known as Guest110707:51
*** amoralej is now known as amoralej|lunch12:58
*** dviroel__ is now known as dviroel13:36
*** amoralej|lunch is now known as amoralej14:12
mnasiadkaMaybe a stupid question, but is there a mirror of cirros image on the OpenDev CI mirrors?15:22
fungitaking advantage of nice spring weather to grab an early lunch at the biergarten while i can. back in an hour-ish15:22
fungimnasiadka: we bake copies of it into our node images15:22
fungimnasiadka: https://review.opendev.org/873735 is a pending change to update that, but probably needs to be refreshed since it's a few months old now15:23
fungianyway, biab15:23
opendevreviewRadosław Piliszek proposed openstack/project-config master: Add nebulous/component-template  https://review.opendev.org/c/openstack/project-config/+/88297515:24
clarkbI've tried to summarize the docker speculative image problem with images in quay here: https://etherpad.opendev.org/p/3anTDDTht91wLwohumzW feel free to add more info to that if you have ideas or clarifications15:40
opendevreviewRodolfo Alonso proposed openstack/project-config master: Revert "Temporary disable nested-virt labels in vexxhost-ca-ymq-1"  https://review.opendev.org/c/openstack/project-config/+/88278715:44
opendevreviewClark Boylan proposed opendev/system-config master: DNM testing sideloading container images  https://review.opendev.org/c/opendev/system-config/+/88297715:58
opendevreviewMerged openstack/project-config master: Revert "Temporary disable nested-virt labels in vexxhost-ca-ymq-1"  https://review.opendev.org/c/openstack/project-config/+/88278716:05
*** amoralej is now known as amoralej|off16:20
stephenfinfungi: clarkb: I assume you're aware of https://blog.pypi.org/posts/2023-04-23-introducing-pypi-organizations/ (context being management of OpenStack projects on PyPI)16:23
stephenfinExample of org in use https://pypi.org/org/pallets/16:24
stephenfinExample of project belonging to an org (look at Owner in the left column) https://pypi.org/project/Flask/16:25
clarkbit's been mentioned a couple of times. I haven't looked into it yet16:42
fungiokay, i'm back16:44
fungiand yeah, i did register a pypi user called "opendev.org" a while back we can use for the org account16:45
clarkbI don't think 882977 is working unfortunately so even the potentially simple hacky workaround needs work :/16:54
clarkbI suspect the reason is that running skopeo copy doesn't look at aliases since it is a specific copy from X to Y command16:55
clarkbonce the job finishes we should get more logs16:55
opendevreviewClark Boylan proposed opendev/system-config master: DNM testing sideloading container images  https://review.opendev.org/c/opendev/system-config/+/88297717:03
clarkbyay for logs this may have just been pebcak. Hopefully it works but I'm still not getting my hopes up17:03
opendevreviewClark Boylan proposed opendev/system-config master: DNM testing sideloading container images  https://review.opendev.org/c/opendev/system-config/+/88297717:48
clarkbnow it is my turn to step out and enjoy the weather before the heat wave arrives. I'm going to jump on the bike with no particular plan and be back when I'm back. I think this will be good for head clearing after debugging the docker stuff last night17:50
fungiget out there!17:51
clarkbfungi: have you had a chance to look over the etherpad I linked earlier? curious if you've got any feedback for what I've brainstormed so far21:35
clarkbmy hacky change (882977) appears to work actually. So that's one potential workaround (a not good workaround but a workaround)21:35
fungino, sorry i've been trying to dig out from under yesterday's culmination of an exceptional openstack security advisory, the resulting breakage and ripple effects are unfortunately ongoing21:38
clarkback. When you get a chance it would probably be worth looking at. I've got one possible workaround in 882977 mocked up (currently only for jammy but theoretically possible to solve for older systems)21:40
fungithis is the quay speculative container builds issue?21:44
clarkbyes21:44
fungiokay, found the link in scrollback21:44
opendevreviewMerged openstack/project-config master: Add Node Feature Discovery FluxCD app to StarlingX  https://review.opendev.org/c/openstack/project-config/+/88188321:56
ianwmnasiadka: no, not in the mirrors.  but we do actually pull a range of images into each node's on-disk cache via nodepool -> https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/cache-devstack/source-repository-images21:59
ianwyou can look at devstack which sets itself up to use these caches22:00
ianwclarkb: urgh.  yeah it's not a fair comparison because it's nested, but podman in the nodepool container has been ... interesting.  22:08
ianwi will say whenever we have raised a problem, they have been responsive.  but the problem is having to raise the problem :)22:09
clarkbianw: ya I'm actually starting to come around to the workaround in 882977 as a short term thing at least.22:11
ianwuse-buildset-registry already pulls the images from changes ahead of it into the BR ... perhaps we need a "populate-from-br" role?22:11
ianwthat pulls the images from the BR locally first in a generic way?22:11
clarkbianw: the problem with that is there isn't sufficient info in the test system to know if it should pull or not22:11
clarkbspecifically you need to pull the images if they aren't in the BR too22:12
ianwit would have to work the same as use-buildset-registry and ask zuul right?22:12
clarkbianw: zuul doesn't have the info is the problem22:12
clarkbbecause you would only have images in the BR if you were updating the images too22:12
clarkbbut sometimes you update only ansible and want to check that existing images work with new ansible for example22:13
ianwso won't they pull as usual from quay?22:13
clarkbI think this means we need to more tightly couple the image population to each specific service's ansible playbook/roles22:13
clarkbianw: well the issue is if you pull per usual you'll overwrite what is populated by the BR22:13
clarkbyou can only do one or the other22:13
clarkbbut I think we can do that by embedding a thing in each service ansible that checks if we are under testing and then populates the specific images it needs using skopeo then let that come from the BR or quay using the aliases normally22:14
clarkbI suppose we could do what you are suggesting and prevent docker-compose pull from running and feed a list of images into the population thing.22:14
ianwhrm, if you pull from the BR when you know you have to , and tag it as :latest in the local ... won't that satisfy docker-compose?22:15
clarkbwe just have to be very careful that we don't skopeo copy then docker-compose pull undoing what we did with skopeo22:15
clarkbianw: no because when you do a docker-compose pull it will check quay proper and notice that latest is different in quay.io and pull what is on quay.io overwriting what you did with skopeo22:15
ianwhuh, i would have thought it would trust the local first :/22:16
clarkbusing my example of the haproxy-statsd stuff I think what we want is a role that runs after docker-compose pull and accepts a list of images. Then if we are running under CI it installs skopeo however we do that for the distro version, then for each image it skopeo copies the image22:16
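A rough Python sketch of the ordering that role would need, for illustration only. It builds command lines rather than running them, and it assumes skopeo's registries.conf already lists the buildset registry as a mirror for quay.io (as 882977 appears to rely on), so the copy source can stay the upstream name. `docker://` and `docker-daemon:` are real skopeo transports; `populate_plan`, `ensure_tag`, the `under_ci` switch and the compose file name are hypothetical. Installing skopeo is left out.

```python
def ensure_tag(image: str, default: str = "latest") -> str:
    """Append a tag if none is present; docker-daemon: destinations need one.

    A ':' in the last path component is a tag; an earlier ':' is a registry
    port. Digest references (name@sha256:...) are not handled here.
    """
    last = image.rsplit("/", 1)[-1]
    if ":" in last:
        return image
    return f"{image}:{default}"


def populate_plan(images: list[str], compose_file: str, under_ci: bool) -> list[list[str]]:
    """Hypothetical plan: pull first, then overwrite with skopeo under CI.

    The order matters: a docker-compose pull after the skopeo copies would
    replace the speculative images with whatever quay.io serves.
    """
    plan = [["docker-compose", "-f", compose_file, "pull"]]
    if not under_ci:
        return plan  # production behaves exactly as it does today
    for image in images:
        ref = ensure_tag(image)
        # Source resolves through registries.conf mirrors (BR first, then
        # quay.io); destination is the local docker daemon's image store.
        plan.append(["skopeo", "copy", f"docker://{ref}", f"docker-daemon:{ref}"])
    return plan
```

Keeping the skopeo step behind a CI-only switch matches the point that none of this should run in production.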
clarkbianw: its basically what happens today when we update :latest on docker hub22:17
clarkbianw: the local version says it is :latest but then when we pull it sees docker hub has a newer :latest and it fetches that22:17
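A simplified Python model of why a local :latest does not survive a pull. This is not docker's code. It only captures the behaviour described above: the tag is compared by manifest digest against the registry, and a different remote digest wins. The digests in the usage example are made-up placeholder strings, and `LocalStore` and `pull` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class LocalStore:
    # Maps "name:tag" to the manifest digest currently tagged locally.
    tags: dict[str, str] = field(default_factory=dict)


def pull(store: LocalStore, ref: str, remote_digest: str) -> bool:
    """Simplified tag pull; returns True if the local tag was moved."""
    if store.tags.get(ref) == remote_digest:
        return False  # local tag already matches the registry, nothing fetched
    # The registry's :latest replaces whatever was loaded locally (e.g. by skopeo).
    store.tags[ref] = remote_digest
    return True
```

For instance, with `quay.io/foo/bar:latest` pointing at a digest loaded from the BR, `pull(store, "quay.io/foo/bar:latest", "sha256:upstream")` retags it to the upstream digest. This is why the skopeo step has to come after `docker-compose pull`.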
clarkbthis is doable we just have to embed it into the actual service ansible a bit more which is unfortunate but not the end of the world. Particularly if we don't run any of the weirdness in prod22:17
ianwfair enough, when you put it like that :022:18
ianw:)22:18
ianwcould you change quay.io to the BR in /etc/hosts?22:19
clarkband I think we can probably start with that and see how ugly it turns out to be and use it in the short term. Then long term we can kill two birds with one stone and replace docker and docker-compose with their podman et al equivalents22:19
clarkbianw: that doesn't work because in the BR the image path is br:5000/quay.io/foo/bar22:19
clarkbwe might be able to do some fancy templating to make that work though22:19
clarkbwe should add that idea to the etherpad22:20
fungiat one point we talked about making zuul-registry a pullthrough for anything it doesn't have locally... that never happened did it?22:20
clarkbthe two problems with it are the nonstandard port and the quay.io/ path prefix (and possibly the insecure registry settings)22:20
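A sketch of why the /etc/hosts idea falls short, using the Docker Registry HTTP API v2 manifest endpoint (`GET /v2/<name>/manifests/<reference>`). An /etc/hosts entry only changes which IP address `quay.io` resolves to. The client still sends this URL on the default HTTPS port and without the `quay.io/` path prefix. `br:5000` is the example address from the log; the https scheme for it is an assumption.

```python
def manifest_url(host: str, name: str, reference: str = "latest") -> str:
    # Docker Registry HTTP API v2: GET /v2/<name>/manifests/<reference>
    return f"https://{host}/v2/{name}/manifests/{reference}"


# What the client requests when pulling quay.io/foo/bar. An /etc/hosts
# override changes only the IP behind "quay.io", not this URL.
client_request = manifest_url("quay.io", "foo/bar")

# Where the buildset registry serves the same image: different port,
# and the repository name carries the quay.io/ prefix.
br_location = manifest_url("br:5000", "quay.io/foo/bar")
```

Any "fancy templating" would have to close both gaps (port and path prefix), plus whatever trust settings the BR needs.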
clarkbfungi: it did22:20
clarkbwell not pull through22:21
clarkbbut it is happily hosting images for quay.io that we shove into it. The problem is the client also needs to be intelligent enough to fetch it from the registry22:21
fungibut if we ask the br for something it doesn't have, it'll fetch it from the right public registry and serve that to the requester?22:21
clarkbfungi: I think what you are describing is effectively the option on line 3622:21
fungigot it22:21
clarkbfungi: no, today the zuul registry is set up as the first alias for an upstream with the upstream being the second alias. Then clients try them in order until they get a winner22:22
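A small Python model of the alias ordering described here: the buildset registry is listed first, the upstream second, and the client uses the first source that has the image. Docker only honours mirrors like this for docker.io, which is the gap in this whole discussion. `resolve` and the `has_image` callback are hypothetical stand-ins for a real registry lookup; `br:5000/quay.io` mirrors the example address used in this log.

```python
from typing import Callable, Optional


def resolve(image_path: str, sources: list[str],
            has_image: Callable[[str, str], bool]) -> Optional[str]:
    """Return the first source that can serve image_path, trying them in order."""
    for source in sources:
        if has_image(source, image_path):
            return f"{source}/{image_path}"
    return None  # no source had it; a real client would report a pull error


# The BR first (holding images under a quay.io/ prefix), then quay.io itself.
QUAY_SOURCES = ["br:5000/quay.io", "quay.io"]
```

With an image present in the BR, `resolve("foo/bar", QUAY_SOURCES, ...)` yields `br:5000/quay.io/foo/bar`. Otherwise it falls through to `quay.io/foo/bar`, which is what lets an ansible-only change test against existing upstream images.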
clarkbfungi: the problem here is that docker only knows how to do this for containers on docker.io. It is effectively vendor lock in when you consider someone fixed it for them and they refused to merge the patch for 6 years22:22
* fungi puts on his "shocked" face22:23
clarkbfungi: what this means is that docker will always try to talk to quay.io directly when you list an image from quay.io. The only way around this from what I can tell is to run docker with an http_proxy set22:23
clarkbThen you could run an http_proxy that was super smart for our needs and looks at the BR first and falls back to upstream if not in the BR22:23
clarkbthis is the option on line 3622:24
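A sketch of only the routing decision such a proxy might make, for illustration. It assumes the proxy can see full request URLs. Because docker talks HTTPS to quay.io, a plain CONNECT-tunnelling proxy sees only host and port, so this would mean terminating TLS for quay.io. `to_br_url`, `route` and the `in_br` lookup are hypothetical; `br:5000` is the example address from this log.

```python
from typing import Callable
from urllib.parse import urlsplit, urlunsplit


def to_br_url(url: str, br_host: str = "br:5000") -> str:
    """Rewrite an upstream registry v2 URL into the buildset registry layout.

    https://quay.io/v2/foo/bar/manifests/latest
      -> https://br:5000/v2/quay.io/foo/bar/manifests/latest
    """
    parts = urlsplit(url)
    prefix = "/v2/"
    if not parts.path.startswith(prefix) or parts.path == prefix:
        return url  # not a repository request (e.g. the /v2/ version check)
    repo_path = parts.path[len(prefix):]
    new_path = f"{prefix}{parts.hostname}/{repo_path}"
    return urlunsplit((parts.scheme, br_host, new_path, parts.query, parts.fragment))


def route(url: str, in_br: Callable[[str], bool]) -> str:
    """Prefer the buildset registry; fall back to the original upstream URL."""
    candidate = to_br_url(url)
    if candidate != url and in_br(candidate):
        return candidate
    return url
```

The rewrite adds the upstream hostname as a path prefix and swaps the host and port, which are the same two differences that defeat the /etc/hosts approach.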
clarkbya as much as I'm super frustrated by this whole thing it is also really good motivation and indication we are making the right high level choices in moving away from their stuff22:24
ianwjust throwing ideas, but perhaps instead of podman, do something like run minikube and run in a more k8s-ish way?  that might be interesting and something that could keep services flexible for possibly moving to some sort of k8s22:24
clarkbianw: so podman actually suggests that for production workloads and I personally hate the idea :) I think it is ridiculous that you have to run an entire kubernetes to run a single container22:25
clarkbI'd rather go back to puppet deploying deb packages or whatever before I do that22:25
clarkbIt feels like everyone has drunk so much koolaid they never stopped to think if they stepped over into silly territory22:26
ianwi do agree; but i guess the advantage is that in the future, you don't point it at the kubernetes you're running but that is being run elsewhere22:26
clarkbianw: the problem with that is we have nowhere to run a k8s that doesn't involve us doing it ourselves which is a ton of work22:26
clarkbmagnum is not production worthy22:26
clarkbin particular it relies on short lived distros for the base OS so you stop getting security updates almost immediately and even if you did get them you can't upgrade the k8s on magnum anyway22:27
clarkbso ya bespoke minikubes on each host would theoretically work. It just feels like terrible design22:28
clarkbwe'll end up with far more complicated service management than the services we are running22:28
ianwsure; i mean practically i'm thinking in the same way rax and vexxhost provide us openstack resources, someone/something provides us openshift resources, etc. in a way that is practical to consume22:28
ianwanyway, just a thought.  i didn't find minikube that hard to work with when i updated some of the zuul-jobs testing to use it as a test-backend for k8s things22:29
clarkbI'm also firmly against openshift because you cannot test on it22:30
clarkband it needs like 12GB of memory just to run a single container. I mean I get that it is a possibility it just doesn't seem like a good one22:30
clarkb(side note I really think openshift has dropped the ball on making their software consumable from a development standpoint. minikube shows this can be done pretty effectively for the kubernetes world)22:30
* fungi wonders where microk8s fits in that spectrum22:31
clarkbit is unfortunate because v3 made it possible but they broke all that with v422:31
ianwyeah, it was 9gb when i looked into that stuff ^^^ which was a non-starter on our 8gb systems, and why minikube became the best option, plus the ubuntu integration22:31
clarkbfungi: microk8s is basically equivalent to minikube22:31
ianwsorry, microk8s ... yeah minikube wasn't an option for various reasons in the email i sent22:32
fungigot it, so not appreciably bigger or smaller than minikube22:32
ianwi forget but there was some work going on making a smaller openshift22:32
fungithey should call that effort "downshift"22:33
clarkbI think minikube/microk8s both need about 1GB memory22:33
ianwbut i guess the gist is that if you're deploying services with k8s, the backend should at least theoretically be fairly switchable22:33
clarkbthere is also kind but then you are back to docker :)22:33
clarkbThere's a thing going around about how some crypto company paid datadog $65 million for services in one quarter a year ago22:34
ianwhttps://github.com/openshift/microshift was it22:34
clarkbFeels like this sort of "just run an entire kubernetes for something that should be simple" falls under a similar problem space22:35
clarkbyes it is doable and it works, but it is complete overkill and you don't really get any benefits from it22:35
ianwthis idea ... that you run a k8s on the "edge" as we'd effectively do, is a thing22:35
clarkbianw: it is, but again I think it is complete overkill. We'd have to double the size of a number of our servers22:36
clarkband we don't really get any benefits because most of these applications can't run in an active active manner with many redundant load balanced instances22:36
clarkbetherpad can't, irc bots can't, gerrit can't without a whole lot of extra work, etc. Gitea and hound likely could.22:36
ianwi think the load balancing is one thing, but essentially the abstraction about where the service is running is somewhat cool22:37
clarkbyou really only start to see benefits if you can centralize the k8s (theoretically possible but we'd be on our own today running a large service like that) and/or use it for redundancy and load balancing22:37
clarkbIf magnum was a viable option I think this would be far more likely22:38
clarkbbut we don't have a gke equivalent that we can click a button on and run with22:38
clarkbto backup a bit. I don't think "move container image from this registry to that registry" should require us to completely change how we run just about every service we have22:39
ianwright, it gives you the *option* to centralise; by basically having all the work on the deployment side close-to-done.  is that worth it?  dunno, i guess you'd say no, i'd say "maybe"22:39
clarkbIf we decide that is the road we want to take I would argue we undo the registry move and start with the complete rearchitect first. Then worry about image hosting22:39
ianwthat i agree with -- replacing docker.io has really shown how embedded it is22:40
clarkbianw: I think it is really only worth it if you don't have to suddenly run a very complex application alongside everything else. Most people use $cloud to do it for them and its fine22:40
clarkband maybe this should be an indication that we've failed to provide adequate feedback to the magnum team over the years and we should start changing that22:42
clarkband step one in that process would be to deploy a new magnum k8s cluster so that we can sort out the current issues22:42
clarkbhowever I'm 99% sure the k8s nodes are still fedora based which means we'll only get a few months out of each k8s cluster before it needs deletion and recreation22:43
clarkbhopefully I'm not completely crazy thinking we've jumped from "containers are amazing use them everywhere" to "the only way to run a container now is with k8s" :/22:44
ianwno, sorry to derail things.  just if there's a lot of work in switching to podman on ubuntu (that we know is probably not the most well trodden path), "edge kubernetes" might also be worth considering22:52
clarkbI think it is good to throw ideas out there. I'm just really skeptical of anything that requires us to redo completely everything and run more services23:22
