Monday, 2022-11-28

*** swalladge is now known as Guest13801:38
*** yadnesh|away is now known as yadnesh04:48
*** dasm|off is now known as Guest18805:30
opendevreviewCedric Jeanneret proposed openstack/project-config master: Ensure NetworkManager doesn't override /etc/resolv.conf  https://review.opendev.org/c/openstack/project-config/+/86543308:25
*** jpena|off is now known as jpena08:42
*** prometheanfire is now known as Guest21709:11
*** yadnesh is now known as yadnesh|afk09:43
*** yadnesh|afk is now known as yadnesh10:12
*** anbanerj is now known as frenzy_friday|rover10:47
*** dviroel|out is now known as dviroel10:58
*** frenzy_friday|rover is now known as frenzy_friday|rover|food12:22
opendevreviewMerged openstack/openstack-zuul-jobs master: Add py310 master template jobs  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/86228613:06
*** frenzy_friday|rover|food is now known as frenzy_friday|rover13:38
Tengufungi: heya! glad you appreciate my python in-lining :)13:52
fungii do, it's clever13:55
Tengu:)13:55
*** akekane is now known as abhishekk13:57
*** Guest188 is now known as dasm13:58
*** yadnesh is now known as yadnesh|away13:59
Tengufungi: while I'm at it - probing: what would be your thoughts on getting a squid proxy with dedicated CA in order to make proper web content caching, even from TLS sources such as ansible-galaxy?14:49
Tengucontext: every day, we're seeing failures from ansible-galaxy, usually 502 errors from their side, and this breaks TripleO CI jobs, meaning "more recheck" that would otherwise be avoided/not needed.14:50
TenguI'm trying to find a "nice" way out of this situation, and using some caching-proxy, if possible at infra level, seems like a possible way.14:51
Tenguafaik, there are already RPM mirrors available. Not 100% sure about who manages them though.14:52
fungiTengu: we have a content caching proxy in each region already, with valid ssl certs. it's using apache mod_proxy/mod_cache instead of squid, but is the specific proxy software decision important in that case?14:54
*** dviroel is now known as dviroel|afk14:55
fungiright now we use them to cache pypi, npm, dockerhub... i forget what else14:55
fungiwe could add the ansible galaxy site too, i expect, depending on how proxyable they've designed it14:56
Tengufungi: ah, so if we configure our ansible tasks calling ansible-galaxy to use those existing content proxies, that would just work out of the box?14:57
Tengufungi: what would be required to do some tests? 14:57
fungiTengu: here's where we'd add the proxy: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j214:58
Tengufungi: is there some doc?14:58
Tenguah, yeah. ok. got it.14:58
Tenguso I was thinking "squid" because it's "slightly" easier to put in place :).14:59
Tenguno need to make new vhost/others14:59
fungiand yeah, whatever's pulling from ansible-galaxy would need to be told to pull from the local proxy url instead (it's not a transparent proxy, it presents itself as a copy of the site being proxied)14:59
opendevreviewDr. Jens Harbott proposed openstack/project-config master: Use kolla.config for kolla-ansible in gerrit  https://review.opendev.org/c/openstack/project-config/+/86568614:59
Tenguand it generates certificates on-demand, using a provided CA14:59
fungino need to make a new vhost, just use a specific path on the main mirror vhost unless whatever's pulling from galaxy is very particular about the relative path to the content15:00
TenguI'm pretty sure caching ansible-galaxy shouldn't be too hard - though it may take some space.15:00
Tenguyeah - right. /galaxy or the like.15:00
Tenguwe'd need to do some testing I gues.15:00
Tengu*guess15:00
fungimost of our proxied content shares a single vhost on each mirror server15:00
fungibut if you look at the template i linked, you'll see a bunch of examples15:00
fungiat least enough to get some ideas and do some initial manual investigation into feasibility15:01
TenguI see pypi has a rewrite.15:01
Tenguguess galaxy would need the same.15:01
Tengufungi: guess getting URI hit by an "ansible-galaxy collection install foo" would be a good start?15:03
Tenguuho. wait. you're using that same apache thing to cache container layers?  It soulds wrong..15:04
Tengu*sounds15:04
Tengufungi: so for instance:   Downloading https://galaxy.ansible.com/download/tripleo-operator-0.9.0.tar.gz   that's the URI hit by `ansible-galaxy -v collection download tripleo.operator'15:08
fungiTengu: i'm afraid i'm not familiar enough with container distribution to know why caching layers is bad15:12
fungiTengu: does the ansible-galaxy command provide an option to specify a different url, or some other way to configure the url it will go to?15:12
Tengufungi: overall size. using a docker-registry instance as caching-proxy is probably better, and allows a clean management of the registry content (hence cache). but it's a detail.15:13
Tenguas for galaxy - I have to check, but it supports the "http(s)_proxy" environment variable.15:13
fungiTengu: we've looked into registry options, but the last time we dug into it all the ones you could run yourself lacked a safe way to live purge old content15:14
Tenguas far as I can tell, it doesn't support passing tweaked URI. though I have to check a doc15:14
fungiit's been a while though, so maybe they've improved15:14
Tengufungi: I have one running here, it's cleaning things older than 7 days by default (can be configured of course)15:14
fungiTengu: these proxies are not transparent proxies, so http(s)_proxy envvars won't be much help15:14
Tenguah, that's what you call "transparent proxy" - sorry, not same definition on my side ^^'.15:15
fungiTengu: yeah, if memory serves, the ones we looked into at the time had to restart the registry to purge old images/layers, so went offline briefly when doing so15:15
Tenguheh, yeah, bad15:15
fungiTengu: basically we have no way to access control the proxies, so we need to make sure they can't be used to proxy arbitrary content15:16
Tengugimme a moment, have to check a doc about "on-premise ansible-galaxy mirror", it should point to the way to set custom host.15:16
fungii agree, for container images, a pull-through registry backed by other registries with some configurable retention policy would probably work better15:17
TenguI can help on that if needed. I'm running such a registry in a podman pod, alongside redis for index data15:18
Tenguseems to work pretty well15:18
Tenguso - apparently there's a way to pass an API_SERVER to ansible-galaxy collection install15:18
fungiwhat we have was basically state of the art for 2017 or thereabouts, so we should continue to investigate whether the landscape for centralized container image caching has improved15:18
Tengubut I'll have to do some tests.15:18
Tengu:)15:19
Tengufungi: stupid question if I may: why using httpd with mod_proxy/mod_cache instead of an actual caching software?15:19
fungimod_cache is caching software, isn't it?15:19
Tengunot the best afaik.. ?15:20
fungibut if you mean "why not squid" it's that we already have a need for apache on those servers in order to serve the afs caches of our package mirrors15:20
Tenguok..15:20
Tenguand I guess the signed certificate was also a reason15:21
fungiwell, we could configure squid to use the cert when serving connections directly15:21
fungiif we wanted to install both on the server15:22
Tengui.e. with squid as a TLS MitM, a crafted CA would be needed so that it can create certificates on the fly - such CA would need to be added then15:22
fungioh, yes we really don't want to complicate things with a mitm configuration. is that what you consider a "real proxy"?15:22
Tengu:)15:22
Tenguyeah15:22
Tengusomething able to cache TLS content directly.15:23
fungiokay, then no real proxies is one of our basic requirements for this ;)15:23
Tenguby default, squid can't decrypt.15:23
Tenguit just opens a pass-through tunnel and doesn't see anything15:23
fungiagain, we have no way of access controlling these proxies, so don't want them able to be used to proxy arbitrary content. they're reachable from arbitrary hosts on the internet15:23
Tenguno filtering at all? ok15:24
Tenguanyway. I think we should be able to tweak the ansible-galaxy command to use "-s {PROXY}"15:24
fungiwe "filter" by configuring which specific websites they're backed by15:24
Tenguthough I'll need to take some time to test it properly.15:24
fungiso test nodes can't just generally use them to proxy all requests to the web15:25
Tenguthe doc is.. well.15:25
fungisince we don't control the network topology for the cloud resources donated to us, filtering clients based on source ip address or the like isn't really an option15:26
Tenguright. (squid allows authentication)15:26
fungialso we give proposers of untrusted changes root access to the "clients" so they'd be able to read any authentication tokens local to the test nodes15:27
fungihence the need to design this so that it's unlikely to be abused as a clandestine web access anonymizer15:27
Tengunote that squid also has ACL based on backend host names ;)15:28
Tengubut anyway15:28
Tengumod_proxy/mod_cache it is15:28
Tengumakes things a bit more complex for ansible-galaxy apparently.15:28
fungiyes, we could limit access through the proxy to content for specific sites, but then people couldn't just set http(s)_proxy envvars globally and would need some way to switch them for specific tools/sites only15:29
Tenguno_proxy - but yeah. I've worked on that in tripleo, and proxies are always messy to manage.15:29
Tenguespecially when operator doesn't know what they're doing -.-.15:29
fungianyway, not to belabor the point, but we've ruled out operating transparent/mitm web proxies for a number of security and manageability reasons15:30
Tenguhmmmmmm so "-s" seems to be the correct param.15:31
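A minimal sketch of the "-s" idea discussed above, assuming a mirror endpoint exists. The URL `https://mirror.example.org/galaxy/` and the helper name `galaxy_install_argv` are hypothetical; the log only proposes a "/galaxy" path and has not yet confirmed that the Galaxy API works through such a path. The sketch only builds the command line and does not run it.

```python
import shlex

# Hypothetical mirror endpoint; the log only proposes a "/galaxy" path
# on the regional mirror vhost, and no such endpoint is confirmed to exist.
MIRROR_GALAXY_URL = "https://mirror.example.org/galaxy/"


def galaxy_install_argv(collection, server=MIRROR_GALAXY_URL):
    """Build the argv for installing a collection from an alternative server."""
    # -s/--server is the ansible-galaxy option that sets the Galaxy API server URL.
    return ["ansible-galaxy", "collection", "install", collection, "-s", server]


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(shlex.join(galaxy_install_argv("tripleo.operator")))
```

Whether the downloads that follow the API call would also go through the mirror is a separate question that this sketch does not answer.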
fungihence the odd rewrite gymnastics necessary for sites like pypi and dockerhub that like to split indexes and content between different domains15:31
Tengulemme start a dumb container with apache/mod_proxy and see how it goes.15:31
fungiany time we've brought up with maintainers of those sorts of sites the idea of redesigning their content to make it easier for direct proxying, the response has generally been "why are you bothering to proxy? we have a cdn already"15:33
Tenguheh15:33
Tenguppl don't get the actual use of cache.15:34
fungieven after very detailed explanations, no15:34
Tenguand we're wondering why web content delivery is so slow, why website layouts are so terrible and so on..15:35
fungii think people who are old enough to remember metered residential network access or uucp batching understand, but a lot of folks have grown up treating the internet as a limitless utility15:35
Tengu"back then", it was better :)15:35
Tenguyep15:35
* Tengu feels old now15:36
Tenguthank you fungi -.-15:36
fungii feel the aches and pains of old age every morning when i wake up, it's all the reminding i really need ;)15:36
Tengu.. I try to forget about aches and pains, especially in the back -.-15:37
Tengushhhhhh ;)15:37
Tengufungi: my httpd skills are rusted (thank you nginx) - is there a way to ensure "/galaxy/" is removed from the query done in the backend?15:52
fungiTengu: the pypi config does what i think you're asking: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2#L242-L24515:56
fungiTengu: for example https://mirror.dfw.rax.opendev.org/pypi/simple/bindep/ goes to https://pypi.org/simple/bindep/15:58
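The mapping in fungi's example can be sketched roughly as follows. This is an illustration of the idea (a fixed allow-list of path prefixes, each backed by one site), not the actual Apache configuration; the table `BACKENDS` and the function `to_backend_url` are names made up for this sketch, and only the /pypi/ entry comes from the log.

```python
# Prefix -> backend allow-list. Only the /pypi/ entry is taken from the log;
# a real deployment would list every proxied site explicitly.
BACKENDS = {
    "/pypi/": "https://pypi.org/",
}


def to_backend_url(path):
    """Map a request path on the mirror to the upstream URL, or None if not allowed."""
    for prefix, backend in BACKENDS.items():
        if path.startswith(prefix):
            # Drop the local prefix and append the remainder to the backend root.
            return backend + path[len(prefix):]
    # Unknown prefix: nothing is proxied, so the mirror cannot be used
    # to reach arbitrary sites.
    return None
```

For instance, `to_backend_url("/pypi/simple/bindep/")` yields `https://pypi.org/simple/bindep/`, while a path outside the table yields `None`.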
Tenguhmm.15:58
Tenguweird.15:58
Tenguah, there are redirects..16:01
Tenguthough I thought ProxyPassReverse was supposed to take care of them.16:01
Tengufungi: so, "in theory", getting a new ProxyPass /galaxy/ https://galaxy.ansible.com/   would work.16:02
Tengunow there's some tweaking - and since my httpd config skills are so rusted, it will take some time for me to come to something that is actually working.16:03
Tengufungi: is there a place where to push a request for new endpoint?16:03
Tenguso yeah - proxypassreverse should catch the 301 and rewrite it properly. pfff. probably missing something in my local httpd config.16:14
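What ProxyPassReverse does to a redirect can be approximated like this: a Location header that points at the backend is rewritten to point at the proxy, and anything else is left alone. The function name `rewrite_location` and the mirror URL are hypothetical, and this is a simplified model rather than Apache's implementation.

```python
def rewrite_location(location, backend, frontend):
    """Rewrite a redirect target that points at the backend so it points at the proxy."""
    if location.startswith(backend):
        # Swap the backend root for the proxy's public prefix, keep the rest.
        return frontend + location[len(backend):]
    # Redirects to any other host are passed through untouched, so the
    # client would follow them directly and bypass the proxy.
    return location


if __name__ == "__main__":
    backend = "https://galaxy.ansible.com/"
    frontend = "https://mirror.example.org/galaxy/"  # hypothetical mirror URL
    print(rewrite_location(backend + "download/foo.tar.gz", backend, frontend))
```

A redirect to a third host, such as a separate download or storage domain, falls into the second branch, which is one reason sites that split content across domains need extra rewrite rules.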
vishalmanchandaclarkb:hi, could you fix linters on your patch https://review.opendev.org/c/zuul/zuul-jobs/+/865459 once you have time.16:18
clarkbvishalmanchanda: yes I can take a look shortly16:20
vishalmanchandaclarkb: thanks.16:20
clarkbTengu: fungi: correct, we do not use transparent proxies because they can be abused. Instead we reverse proxy specific content. For docker container caching fungi's memory is also correct. We haven't found a registry that can live prune data which is kind of important for caching.16:20
Tenguclarkb: heya :). Thanks for the confirmation - I was more focused on the "TLS impact" of a transparent proxy, actually. Regarding the container layer caching, I'm using this currently: https://paste.openstack.org/show/bI1JecmqTjtFREczs7te/16:22
Tenguthere's ONE issue with the docker-registry: it can only proxy one backend. Meaning: you have to run as many registries as backends. Which is a bit stupid, but understandable.16:23
clarkbTengu: the docker registry cannot be pruned16:23
Tenguand if the mod_proxy is able to manage the layers, well.16:23
clarkb* cannot be pruned while up16:23
Tenguclarkb: well, apparently yes.16:23
clarkbso it is a non starter16:24
Tenguat least there are options allowing to clean things older than UPLOADPURGING_AGE (in my case, 7 days)16:24
Tenguit's in the registry "maintenance" config section16:24
clarkbaiui you cannot do that safely while it is up. It may create failed requests16:24
clarkbwe investigated this a fair bit before we added the apache caching for docker hub16:25
clarkband it was extremely disappointing. Other issues included the swift implementation not working16:25
clarkb(it would return 0 byte layers often)16:25
Tengufor my use-case, it's ok like that. still: my point is more about the ansible-galaxy caching needs :)16:25
clarkbsure, we have tools for that. They aren't perfect but they address the varying demands of the system reasonably well16:26
TenguI just proposed https://review.opendev.org/c/opendev/system-config/+/865869 - not really sure how it can actually be tested, i.e. if Infra has some sandbox/playground.16:26
clarkbTengu: check the CI jobs for that change. There should be a job that deploys a mirror and you can use the testinfra tests to query it16:26
clarkbI'm pretty sure we already have tests that check the pypi (I think pypi) proxy16:26
Tenguhmmm care to point to that "testinfra" repository?16:27
clarkbTengu: opendev/system-config/testinfra16:28
Tenguok16:28
Tengudirectly in. ok.16:28
Tengugood - in "test_mirror.py"16:28
TenguI can add a thing for galaxy.16:28
clarkbseparately, I believe that tripleo had avoided needing a galaxy cache by having zuul cache the git repos for the various ansible roles instead16:29
clarkbit would probably be a good idea to clean that up if it is no longer used16:29
Tenguit's failing because there are actual runs of `ansible-galaxy collection install', usually in the molecule tests16:29
Tengubut yeah, if we can get the cache, that would be useless. Lemme add a note, I have a call with TripleO CI tomorrow about that matter.16:30
clarkbansible-galaxy can install from disk though iirc16:30
clarkbwe do it for a role or two that we use in the infrastructure iirc16:30
clarkbanyway, I don't really care what installation method you use. I just don't want to keep caching git repos we don't need to cache anymore if that is the case16:31
Tengunote added, thanks clarkb for that info!16:32
Tenguclarkb: ah, if you're willing to check on some other change requests, care to have a look at https://review.opendev.org/q/topic:unbound%252Fnetworkmanager ? Note, ianw has an open question, so maybe not fit for a merge right now.16:40
clarkbwhy are there two changes?16:42
clarkbfwiw I think the correct place to fix this is in the simple init element for disk image builder. Not base-jobs or our infra specific elements16:43
Tenguclarkb: well, unbound seems to be configured in another location than the disk image builder - that's why I also edited that "configure-unbound" role.16:44
clarkbit's just updating the resolvers to ip version specific resolver config in the job16:44
clarkbwhether or not NM touches it is completely separate16:44
Tenguwell, it's also configuring unbound actually16:45
Tenguwhat's the point of configuring unbound and setting the resolv.conf content if it's then squashed by NM?16:45
Tengu(that's what we're seeing in tripleo jobs - hence those 2 patches)16:45
clarkbTengu: unbound is expected to be fully configured at that point16:46
Tenguyes. but it won't be used16:46
Tengubecause NM will override it pretty fast16:46
clarkbBut some clouds NAT all ipv4 outbound16:46
clarkbNAT + UDP (eg DNS) can be unreliable. What the base-jobs role is doing is checking for an ipv6 capable instance and flipping its forwarding config over to ipv6 resolvers to avoid nat16:46
clarkbyou should not need to change anything in base-jobs to fix the network manager problem16:47
Tenguwell.... in that case, there's no use to override the /etc/resolv.conf in the first place.16:47
Tengunor to configure unbound actually.16:47
clarkbI don't understand16:47
clarkbthe point is we are using unbound16:47
clarkbsome clouds need unbound configured to use ipv6 forwarders16:47
Tenguwell, it's NOT used during the CI job lifetime.16:48
Tengubecause NM overrides the /etc/resolv.conf at some point16:48
clarkbyes I understand that16:48
Tengu(lease refresh, service restart, whatever)16:48
Tenguso the configuration I inject there is to ensure this doesn't happen16:48
clarkbbut unbound is configured in the base image.16:48
TenguI don't touch unbound config16:48
clarkbConfiguring it in the job is too late16:48
clarkbbasically updating base-jobs is redundant and confusing16:48
clarkb(as this conversation illustrates)16:49
clarkbyou should only do the configuration in the base image16:49
Tenguby not touching the configure-unbound then?16:49
clarkbI guess simple-init doesn't assume unbound so maybe project-config is fine. But base-jobs is just redundant16:49
clarkbTengu: when the instance boots it is using unbound for all DNS by default configured to ipv4 resolvers. Early in the test jobs we check if we have ipv6 and flip the resolvers over to ipv6 resolvers to avoid ipv4 NAT. If you fix network manager at that stage it is already too late as something may have broken DNS16:50
clarkbto properly fix this problem you need to have it fixed at boot time is what I am saying16:50
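The resolver flip clarkb describes can be sketched as below, assuming the only input is whether the node has working IPv6. The addresses are placeholders from the documentation ranges (the real resolver IPs are not given in the log), and the function names are made up for this sketch; it is not the base-jobs role itself.

```python
# Placeholder addresses from the documentation ranges; the real resolver
# IPs used by the role are not given in the log.
IPV4_FORWARDERS = ["192.0.2.1", "192.0.2.2"]
IPV6_FORWARDERS = ["2001:db8::1", "2001:db8::2"]


def choose_forwarders(has_ipv6):
    """Prefer IPv6 resolvers when available, to avoid DNS over IPv4 NAT."""
    return IPV6_FORWARDERS if has_ipv6 else IPV4_FORWARDERS


def forward_zone(forwarders):
    """Render an unbound forward-zone stanza for the root zone."""
    lines = ["forward-zone:", '  name: "."']
    lines += [f"  forward-addr: {addr}" for addr in forwarders]
    return "\n".join(lines)


if __name__ == "__main__":
    print(forward_zone(choose_forwarders(has_ipv6=True)))
```

This step only changes which forwarders unbound uses. Keeping NetworkManager from rewriting /etc/resolv.conf is a separate concern that, per the discussion, has to be handled in the image at boot time.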
Tenguclarkb: so for you, only https://review.opendev.org/c/openstack/project-config/+/865433 is valid - the second one affecting ansible role "configure-unbound" is useless - what about jobs not using the nodepool image? (is it a valid case?)16:50
clarkbTengu: no that is not a valid use case. You cannot run a job in opendev outside of our images16:51
Tenguok. so I can, indeed, discard the ansible version of the enforcement.16:52
clarkb(you actually can through ansible inventory manipulation but if/when you do that you are on your own)16:52
clarkb(and any such inventory manipulation should happen well after base jobs playbooks have run)16:52
Tenguok, abandoned the base-jobs one.16:53
clarkbok ya it is nodepool-base that configures unbound for our images and not an element in dib so project-config is the correct place to add the override16:53
Tenguso we keep only the thing in the disk-image-builder16:54
Tengu\o/16:54
Tenguhalf wrong, half right :)16:54
Tengufungi: for the proxy test, I guess the one I want to check is "system-config-run-mirror-x86" job? if green, means my /galaxy/ endpoint is good?16:56
clarkbTengu: side note, I would suggest against pushing every change as a WIP16:57
clarkbI did not review the NM stuff last week because it was marked WIP16:57
Tenguclarkb: why so?16:57
clarkband because it prevents others from landing your change without your intervention if it is actually ready to go16:58
clarkbbasically use it when you know the change is not ready16:58
Tenguwell, sure, that's the actual advantage of WIP: not bother ppl16:58
clarkbbut if the only question is CI then let it be mergeable16:58
Tenguthat said, I can mark the mirror one as "active", since I was able to add a test16:59
Tenguso it's "just" a matter of getting green CI.16:59
clarkbbasically you should mark it WIP when you know you don't want it to merge. Which is different than asking someone else to help evaluate if it is mergeable16:59
Tengu'k, divergent view on the WIP flag, no issue for me :)17:00
clarkbTengu: left a comment on the cache change17:02
fungipart of the reason for that workflow is also because a lot of the projects' acls don't grant core reviewers the ability to un-set wip on your change, therefore they still need another round of action from you to un-wip before approving17:03
Tenguclarkb: hmm ok. wasn't aware of the need for the name - I'll indeed update to ansible-galaxy17:03
fungi(the ability to delegate that is a more recent addition in gerrit than the wip implementation, so projects are only just now starting to update their acls to grant it)17:03
clarkbI'm looking at ansible galaxy and I'm fairly certain we will never cache the search/index results due to the url parameters17:05
clarkbThe downloads themselves are served by s3 so won't be cached either17:06
clarkbhowever, I suspect something like the pypi cache setup would work17:07
Tengufun - while using the "-vvv" params with ansible-galaxy, it doesn't show anything else than the galaxy.ansible.com/download tree17:07
clarkbbut I'm not sure as there are parameters in the s3 redirect as well17:07
clarkbI think docker does this too? so there are examples you can look at at least17:07
Tenguoh, ok. I get it. nice 302 hidden.17:08
Tenguso now I can use WIP? :)17:08
clarkbTengu: I don't think that is necessary as the change is already -1?17:08
clarkbTengu: but if you are worried about someone merging it with a -1 then sure17:08
Tengu-1 was dropped with the new push.17:08
clarkboh you pushed a new patch.17:09
clarkbI can -1 again :)17:09
Tengujust -W it :)17:10
TenguI'll work on that tomorrow - it's late here (EMEA)17:10
Tenguthanks for the help clarkb :)17:10
*** jpena is now known as jpena|off17:54
*** dviroel|afk is now known as dviroel18:39
*** rlandy is now known as rlandy|afk19:06
*** dviroel is now known as dviroel|afk21:00
*** swalladge is now known as Guest27721:19
*** rlandy|afk is now known as rlandy21:39
*** dasm is now known as dasm|off22:01
*** Guest217 is now known as prometheanfire23:12
