Thursday, 2023-01-19

ianw2ebdabe1-799f-4bb0-9ed6-758d9ee34bbc | test-server | ACTIVE | auto_allocated_network=10.100.0.147 | centos-8-stream-arm64-1674085448 | opendev-no-ephemeral00:14
ianwok ... so the new linaro cloud can boot a raw image uploaded, with no ephemeral storage.  that's a start00:14
fungiprogress!00:17
opendevreviewIan Wienand proposed opendev/system-config master: nodepool config: set linaro cloud to use raw images  https://review.opendev.org/c/opendev/system-config/+/87101000:38
opendevreviewMerged openstack/project-config master: nb04: use linaro region mirror  https://review.opendev.org/c/openstack/project-config/+/87100600:39
*** dasm is now known as dasm|off01:10
opendevreviewMerged opendev/system-config master: Update git in gitea images  https://review.opendev.org/c/opendev/system-config/+/87100902:21
opendevreviewIan Wienand proposed opendev/system-config master: openafs: use consistent name for cache size  https://review.opendev.org/c/opendev/system-config/+/87101402:31
ianw^ i've also manually fixed the linaro mirror to start openafs correctly; but that will do it permanently02:33
clarkbinventory/service/host_vars/mirror01.regionone.linaro.opendev.org.yaml is where the smaller size is set if anyone is wondering02:36
ianwhuh, that went into a recursive error02:57
ianwi guess you can not call a role with openafs_client_cache_size: "{{ openafs_client_cache_size | default(10000000) }}" # 10GiB03:33
opendevreviewIan Wienand proposed opendev/system-config master: linaro mirror: fix afs cache size  https://review.opendev.org/c/opendev/system-config/+/87101403:37
ianwit's confusing but i don't have motivation to do anything more fancy now03:37
fungimakes sense03:45
opendevreviewIan Wienand proposed opendev/system-config master: hound: use updated git packages  https://review.opendev.org/c/opendev/system-config/+/87101603:46
opendevreviewMerged opendev/system-config master: nodepool config: set linaro cloud to use raw images  https://review.opendev.org/c/opendev/system-config/+/87101004:21
opendevreviewMerged opendev/system-config master: linaro mirror: fix afs cache size  https://review.opendev.org/c/opendev/system-config/+/87101404:25
ianwi've recreated the "opendev" flavor on the new linaro cloud to not have ephemeral storage.  however nodepool still isn't sending work there as it doesn't have the right images, yet.  nb04 is building them (after nb03 went missing)04:37
opendevreviewMerged opendev/system-config master: hound: use updated git packages  https://review.opendev.org/c/opendev/system-config/+/87101605:09
*** soniya is now known as soniya29|rover05:41
*** ysandeep is now known as ysandeep|afk06:13
*** ysandeep|afk is now known as ysandeep07:35
*** jpena|off is now known as jpena08:05
*** soniya29|rover is now known as soniya29|rover|brb09:44
*** ysandeep is now known as ysandeep|afk11:00
*** dviroel|afk is now known as dviroel11:13
*** rlandy|out is now known as rlandy11:14
*** soniya29|rover|brb is now known as soniya29|rover11:25
*** ysandeep|afk is now known as ysandeep12:48
*** dasm|off is now known as dasm13:08
*** ysandeep is now known as ysandeep|dinner15:18
*** dasm is now known as Guest184615:29
*** Guest1846 is now known as dasm15:30
*** ysandeep|dinner is now known as ysandeep15:32
Tengufolks, I have a really, really weird behavior with the ansible-galaxy proxy: locally, it works. The vhost has the exact same configuration, though I don't have TLS enabled. But the mirror on opendev infra seems to have a difference making it unreliable.15:32
Tengufor instance, using the proxy, it's impossible to install "community.general" collection, while it does work through my local config.15:33
Tenguand I really don't know why this is failing. Especially since installing (so far) any other collection is working fine.15:33
fungihave you tested more than one mirror server?15:35
fungiwhat is the error you receive from it?15:35
Tengufungi: I don't know the URI for other proxies, so I tested only via https://mirror.iad3.inmotion.opendev.org:444815:35
fungican you replicate it consistently, or is it intermittent?15:36
Tenguconsistent on that one. here's the command: ansible ~/.ansible/galaxy_cache/api.json ; ansible-galaxy collection install -vvvvvvv -s https://mirror.iad3.inmotion.opendev.org:4448 -p ./ansible community.general15:36
Tenguerr... it's missing the beginning.15:37
Tenguhere:     rm -rf ansible ~/.ansible/galaxy_cache/api.json ; ansible-galaxy collection install -vvvvvvv -s https://mirror.iad3.inmotion.opendev.org:4448 -p ./ansible community.general15:37
Tenguthe first part is to clear all local cache. the "-p ansible" ensures we're using a local directory, in order to not pollute the system.15:37
fungihttps://mirror.dfw.rax.opendev.org:4448/ https://mirror.bhs1.ovh.opendev.org:4448/ https://mirror.sjc1.vexxhost.opendev.org:4448/15:38
fungithose are a few in more providers15:38
Tengulet's see.15:38
Tengusame on https://mirror.sjc1.vexxhost.opendev.org:4448/15:38
fungiwhat is the error you receive from it?15:39
Tenguit's an ansible CLI error - and it doesn't really provide data. I tried to compare things with the actual ansible-galaxy server, but didn't find anything. lemme paste the stack.15:39
fungito paste.opendev.org please ;)15:40
Tenguhttps://paste.openstack.org/show/biRnXTrehuHU0b0GNRZ5/15:40
fungithanks15:40
Tengu(no, I won't paste 60+ lines on IRC ;))15:40
fungimuch appreciated15:40
Tengu;)15:40
Tenguand if we point to another collection, say "ansible.utils", it just works fine.15:41
fungihttps://mirror.sjc1.vexxhost.opendev.org:4448/api/v2/collections/community/general/versions/ seems to paginate when i hit it with a browser15:41
Tenguit's expected.15:42
Tengufor instance, ansible-galaxy CLI does it on its own: Calling Galaxy at https://mirror.dfw.rax.opendev.org:4448/api/v2/collections/community/general/versions/?page_size=10015:42
Tenguand then, on the next line: Calling Galaxy at https://mirror01.dfw.rax.opendev.org:4448/api/v2/collections/community/general/versions/?page=2&page_size=10015:42
Tengubasically, ansible-galaxy wants to get the full index15:42
fungiso, there's one problem we observed with mod_substitute when proxying pypi15:43
Tenguwait.... you may have a thing15:43
Tenguansible.utils, a working collection, doesn't paginate15:43
fungiif a json response is all in one line, it may exceed the maximum line length supported by the mod. this can be adjusted with a setting15:43
Tenguhmmm nope, it seems to be fine with the substitute15:43
Tenguansible.posix doesn't paginate either15:44
Tenguhmmmmm.15:44
Tengufungi: are there settings in httpd set outside of playbooks/roles/mirror/templates/mirror.vhost.j2 ?15:46
fungilooking at the pypi proxy we set SubstituteMaxLineLength 20m because of https://github.com/pypi/warehouse/issues/1191915:46
Tenguasking since the same vhost config is working locally - maybe there's something global I don't have, creating the issue.15:46
Tengufungi: I copied the setting in the galaxy vhost15:46
fungilooks like the limit in mod_substitute is 1m characters to a line15:47
Tengufungi: and... well, it would fail on my local env, but it's working fine.15:47
fungijust trying to rule that out real quick15:47
Tengulemme try to paste my httpd.conf from my container somewhere.15:47
Tenguit's pretty ugly, 1-file, but...15:47
fungitotal response size is 12584 bytes, so definitely not that15:48
fungian order of magnitude lower than would be needed to hit that problem15:49
Tengufungi: I also ruled out some internal cache issue within ansible code - I suspected that the "localhost:8080" being far shorter than the "mirror01......" used in the CI job, it may be truncated or something - but apparently it's not the case.15:50
Tenguthe trace is really weird.15:50
Gue___________________________Greetings #opendev. Quick question: is it possible to delete an etherpad that I created on your site or to delete the content including its history? We are hoping to use the pad for a brainstorm but would prefer if the convo did not live on forever. Thank you in advance for considering.15:50
fungiGue___________________________: that's not a supported use for our etherpad server. i think etherpad.org may have a public server which expires pads after a while, you might check there15:52
Tenguaha. yeah. ok. fungi I think you get a thing with the paginate actually.15:52
fungiGue___________________________: our etherpad is intended for public collaboration, and we make every attempt to preserve the history there for posterity15:53
Tenguthe ansible cache is far, far different.15:53
Tenguyessssssss15:53
Tengujm1: I found a workaround!15:53
Tenguand it won't hit too hard: add "--no-cache" to the ansible-galaxy command15:53
fungiTengu: so local caching impacts it?15:54
Tengujm1: that will tell ansible to NOT touch its ~/.ansible/galaxy_cache/api.json15:54
Tengufungi: in a weird way - maybe due to something sent by the proxy, still.15:54
Tengugrumpf... isn't there some CLI one can easily use to paste a long file ?!15:54
Tenguinstead of copy-pasting blocks after blocks..15:55
fungii would probably resort to hacking some debug logging into resolvelib to get more detail about the dict that it's trying to access15:55
fungiTengu: there's the pastebinit tool15:55
Tenguhttp://paste.scsys.co.uk/2216 here15:55
Tengunopaste < httpd.conf15:55
Tengufungi: I checked the file on-disk15:56
Tenguits content is indeed "slightly" different when there's a paginate.15:56
fungifor future reference, pastebinit can paste to paste.opendev.org as well15:56
* Tengu takes note15:56
Tenguah, via -b paste.opendev.org  I guess.15:57
Tenguok.15:57
Gue___________________________@fungi Thank you, understood.16:01
*** dviroel is now known as dviroel|lunch16:01
fungiTengu: yeah, i have an "opaste" alias to that in my shell, for convenience16:11
Tengufungi: it failed to get the generated link. bah. I usually don't have to paste 100+ lines.16:12
Tenguanyway. I have a workaround, but it would still be nice to understand why it fails on the "prod", while dev env is fine :/.16:12
fungii would probably resort to hacking some debug logging into resolvelib to get more detail about the dict that it's trying to access16:13
Tengufungi: so I checked the JSON (yeah, the local cache is plain JSON), and it seems to miss things when it comes to that specific collection.16:28
Tenguit's... weird.16:28
fungiwhat part of the json is missing?16:29
*** ysandeep is now known as ysandeep|out16:38
*** dviroel|lunch is now known as dviroel16:57
*** marios is now known as marios|out16:59
Tengufungi: (sorry, was on some other discussion) the whole part matching the key shown in the trace17:01
Tenguso basically, it's as if it's flushing all of the data related to the versions17:01
Tengui.e. loads page one, injects data in the file, and that entry is dropped at some point when it comes to load the second page and tries to update it.17:02
Tenguand since it's supposed to be there, it crashes instead of re-creating (which is probably better).17:02
Tengubut this happens if and only if we're using the opendev proxies. My local httpd, with the configuration I pasted earlier, doesn't crash ansible-galaxy.17:03
Tenguthis is why I'm wondering if there are some other configurations in httpd, set outside of that mirror thingy.17:03
Tengufungi: I'm running my local proxy like this:    podman run --rm --security-opt label=disable -v ./httpd.conf:/usr/local/apache2/conf/httpd.conf:ro -v ./cache:/var/cache/apache2/proxy:rw -v ./logs:/usr/local/apache2/logs:rw -p 8080:8080 httpd:2.417:04
Tenguand then pointing ansible-galaxy -s http://localhost:8080 17:04
fungiTengu: looking at one of the mirror servers, we have https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/files/apache-connection-tuning added17:07
fungibut aside from that, just the default configuration from ubuntu bionic (18.04 lts)17:08
Tenguhmmmmmm not sure it would really be a thing17:08
Tenguyeah. shouldn't do17:08
Tenguweird...17:08
Tenguanyway... getting late here. I'm a bit puzzled by that behavior, but I don't know what I can do. Yeah, adding some debug, of course - maybe getting the file copied before update, that may help.17:09
Tenguif I have some time... though I doubt.17:09
fungii confirm that i don't find community.general in the json we get back from the proxy, but it's not in the json from galaxy.ansible.com either17:13
fungii guess community.general is a key in the local cache17:17
funginot in the json response17:17
fungii suppose that's the result of the earlier warning line17:17
fungiTengu: i suppose one difference might be that mirror.dfw.rax.opendev.org is a cname to mirror01.dfw.rax.opendev.org and while we're calling the former it's the latter we find being substituted in the response17:19
fungido you see the same error if you use https://mirror01.dfw.rax.opendev.org:4448/ instead of the hostname without the 01 in it?17:19
clarkbnote we also use the internal rax ip address in CI for the rax mirrors specifically (gets better throughput)17:21
clarkbbut I wouldn't expect that to matter too much as they are both CNAMEs so should be able to reproduce using the public name17:21
fungithough it's worth noting that we'll end up using the non-internal interface for subsequent calls recursed from the initial request since mod_substitute is writing the server name in there. maybe we need to substitute the hostname from the request instead17:22
clarkbI half expected that it already did that? I guess not17:23
fungiit would get extra broken if we started doing sni with different hostname-specific vhosts later17:23
fungiif you curl https://mirror.dfw.rax.opendev.org:4448/api/v2/collections/community/general/ you'll see the json says mirror01 instead of mirror17:24
clarkbdoes it do that with pypi?17:25
clarkb(it also substitutes iirc)17:26
fungialso in https://paste.opendev.org/show/biRnXTrehuHU0b0GNRZ5/ you can see the initial requests are to mirror.dfw.rax but then a subsequent request goes to mirror01.dfw.rax17:26
fungiclarkb: we don't embed the hostname in pypi responses, we use relative hrefs17:27
fungirewriting to /pypifiles17:27
fungier, not relative, but local17:27
clarkbaha17:27
*** jpena is now known as jpena|off17:30
fungii guess technically we could try that with the ansible galaxy substitutions too17:35
fungibasically just drop the scheme, servername and port from https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2#L58417:37
fungiSubstitute "s|https://galaxy.ansible.com/|/|ni"17:37
fungialso would allow to get rid of the scheme lookup conditionals17:38
fungioh, though the comment immediately above there states "ansible-galaxy CLI needs a fully qualified URI"17:38
fungiso maybe that was already attempted17:38
fungimaybe we can do it like17:49
fungiSubstitute "s|https://galaxy.ansible.com/|%{REQUEST_SCHEME}://%{HTTP_HOST}:%{SERVER_PORT}/|ni"17:55
fungioh, i see it's a feature of apache 2.5.1: https://httpd.apache.org/docs/trunk/en/mod/mod_substitute.html (see the bit on expr= syntax)18:02
fungisome of our older mirror servers are still on bionic, so not new enough for that18:03
fungioh, even focal's isn't18:04
fungiyeah, nevermind. ubuntu lunar is even still on apache 2.418:05
fungiso yeah, the options are dwindling18:09
fungii guess we could make the preferred mirror hostname a separate ansible var and jinja that into the Substitute directive18:10
fungiassuming this ends up being the problem18:11
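A minimal sketch of the "separate ansible var" idea, written in Python rather than the Jinja template the deployment actually uses. The function name `substitute_directive` and the parameter `preferred_name` are hypothetical; only the upstream URL and the `ni` flags are taken from the directive quoted above.

```python
# Sketch only: renders a static mod_substitute rule from a per-host value,
# standing in for what a Jinja template would do at deploy time.
# `preferred_name` is a hypothetical variable, not an existing opendev var.

def substitute_directive(preferred_name, port, scheme="https"):
    """Return a Substitute line that rewrites upstream Galaxy URLs to the
    hostname clients are told to use, instead of the server's own name."""
    target = f"{scheme}://{preferred_name}:{port}/"
    return f'Substitute "s|https://galaxy.ansible.com/|{target}|ni"'


print(substitute_directive("mirror.dfw.rax.opendev.org", 4448))
```

Because the name is fixed when the config is rendered, this works on apache 2.4.x, at the cost of favouring exactly one hostname per vhost.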
fricklerwow, that's some monster job logs that make my firefox choke when I open the corresponding zuul page https://184e5731741af40c59ec-11b479ab8ac0999ee2009c93a602f83a.ssl.cf1.rackcdn.com/870988/1/check/cross-nova-functional/1cb7591/18:43
clarkbfrickler: definitely worth encouraging the nova team to address that. It will make jobs run faster too18:45
clarkbI usually open large logs with vim and it mostly handles it18:45
fricklerthe problem is that the build page loads the huge job-output.json and then seems to break it18:49
fricklerhttps://zuul.opendev.org/t/openstack/buildset/e8968c8bf1be4caa86d4d6ef0fb23cd9 is the buildset page, the failed job is the one in question18:49
clarkbya I'm not sure what zuul can do about that. I guess show an error?18:49
fricklermaybe it should truncate oversized logs even before uploading them18:51
clarkbthe problem with that is you lose the information necessary to address the problem in many cases18:52
fricklerthen maybe just rename oversized job-output.json so it doesn't get autoloaded. it will still be available for manual inspection18:54
clarkblots of "WARNING [oslo_messaging.rpc.client] Using RPCClient manually to instantiate client. Please use get_rpc_client to obtain an RPC client instance."18:54
clarkb2617487 log lines in the job-output.txt. 2101438 are that line above18:55
clarkbmelwitt: ^ fyi18:55
clarkbwho hacks on oslo these days? stephenfin? Maybe that warning should be emitted once per process?18:56
fricklernot sure if or how that could be related to the eventlet bump though. doesn't seem to happen for other patches18:56
melwittI see stephenfin around oslo occasionally18:57
clarkbfrickler: I don't think it is. I suspect this is a change in oslo_messaging18:57
melwittI'll look at nova, should be "easy" to fix I would think18:57
clarkbI think the january 5 release of oslo.messaging version 14.1.0 added it. Commit 4ead7cb2dcf376032f7bf9532a375256db6d3784 was the change and appears to be after 14.0.019:01
clarkbtobias-urdin: ^ fyi19:02
melwittclarkb: looks like it was fixed two days agoi https://github.com/openstack/nova/commit/c59db128a00477f6163d71ea1454da4286dad70819:26
melwitt*ago19:26
clarkbhrm that log is from about 26 hours ago19:27
clarkboh maybe the change landed more recently than two days ago. The commit time would be earlier19:27
clarkbyup it merged 22 hours ago or so. That explains it19:28
melwittah, yeah19:28
clarkbthank you for looking into it19:29
melwittnp, thanks for the heads up about it19:30
Tengufungi: oh! using the host you pointed (mirror01.dfw.rax.opendev.org) , it seems to work now!19:47
fungiTengu: thanks for testing, that at least narrows down the cause. now to figure out what to do about it19:47
Tenguclarkb: I'd expect it to actually play a role, because the ansible-galaxy cache is using the servername (as passed in -s <servername>)19:47
fungithat also makes a lot more sense as to why skipping the local cache works around the issue19:48
Tenguyup19:48
Tenguso in the CI case, skipping local cache is OK, since it's a one-show.19:48
fungiwould still be nice to figure out how to do that substitution so that you don't have to use that workaround19:49
Tenguand the local cache is a dict, built as {'host': {'module_name': {'path1': ..., 'path2': ...}}}19:49
Tenguor something like that19:49
Tengufungi: iirc httpd itself should know its actual name?19:50
fungiwell, that's the tricky part. there's not necessarily any single name, that vhost supports multiple names19:50
Tengufungi: yeah, so, confirmation: cleaning local cache, running with ansible-galaxy collection install -vvvvvv -s https://mirror.dfw.rax.opendev.org:4448 -p /tmp/foo_test__ community.general  it fails; cleaning cache, re-running with the mirror01, it works.19:51
fungihowever, we can probably specify a preferred name we want to use in the rewrites. for example doing mirror-int for the ones where we want the nodes to use the mirror's internal/private interface for performance reasons19:51
Tengufungi: hmmm.... so there's a mismatch between the ansible variable (don't remember its name) and the actual vhost in apache config?19:51
Tengujm1: we found the root cause apparently :)19:52
Tengujm1: well, actually, fungi pointed the missing piece :)19:52
funginot a mismatch. mirror and mirror01 are both valid names for the server and the vhost will serve (currently the same) content for either19:52
Tengufungi: ~> the ansible_var we get in the zuul job then? 19:53
fungithe challenge is deciding which name we want nodes using in their requests, which may not be the primary hostname for the server19:53
fungibut setting that statically in the vhost configuration, because as your inline comment points out, mod_substitute on apache 2.4.x doesn't support expressions19:53
jm1Tengu, fungi 🥳19:54
jm1I just wanted to give up on this :D19:54
Tengufungi: hmmm..... so mod_substitute doesn't support the httpd internal variables?19:54
funginot until apache 2.5.1 (currently under development)19:54
Tengudang19:54
fungihowever, we can probably add an ansible var in our deployment inventory for each mirror host to contain the name we plan to tell clients to access it as19:55
Tenguthat would be great :)19:55
fungiand then jinja splat that into the substitute rule19:56
fungiother root sysadmins with a better grasp of ansible can tell me if i'm smoking something with that idea19:56
Tenguwe can do whatever you want actually :). host_vars are here for that.19:57
clarkbfungi: the main issue with that is you'd break public access/testing of the name in rax (since we'd have to use the internal name since that is what the jobs get). Elsewhere it is fine and we can probably use elsewhere as testing proxies20:00
fungiclarkb: yes, i think more generally it's impossible to proxy ansible-galaxy completely with apache if you want to serve it from multiple arbitrary hostnames for the same server20:01
fungiso it's already broken in that way20:01
clarkbya20:02
fungijust trying to think of a solution which breaks it in favor of the hostnames we want test nodes using rather than in favor of some other hostname20:02
fungiwhere the latter is what we have at the moment20:02
fungianother way would be to give the rackspace mirrors two vhosts and use sni to route requests to the correct one for internal vs external interface hostnames20:04
Tenguor maybe pass a secondary zuul_site_mirror_fqdn var such as zuul_site_mirror_fqdn_fixed (or the like) that will then match the actual name of the host (i.e. mirror01.dfw.rax.opendev.org) ?20:05
Tenguthat way, actual jobs will have to get the config, but it doesn't really change anything on the mirror config itself?20:05
fungiTengu: the problem (and the reason for the mention of rackspace) is that in rackspace our mirror servers are dual-homed and we'd prefer nodes to connect to their non-public interfaces20:06
Tenguhmm ok.20:07
fungiso we really do want nodes to use urls like https://mirror-int.dfw.rax.opendev.org/... which isn't reachable from outside20:07
Tengusounds legit20:07
fungimainly because we get improved efficiency and stability for connections across their private internal network20:08
fungiso making the apache configuration on that server know the hostname we're telling nodes to connect to would give us something to bake into the substitute rule20:08
clarkbya that might be the best approach but more effort than simply substituting the internal name always20:09
Tengufungi: so for instance http://mirror.iad3.inmotion.opendev.org:8085 would actually be another name ?20:09
fungiand yeah, an apache host_var containing that name would be one way to go about it20:09
fungiTengu: we tell clients to connect to mirror.iad3.inmotion.opendev.org which is currently a cname to mirror02.iad3.inmotion.opendev.org20:10
fungithe server knows itself as mirror02 but considers mirror to be an available alias20:10
Tenguok, so this means the substitute talks about mirror02, while galaxy knows "mirror".... and crashes.20:11
Tenguok.20:11
fungiright20:11
Tenguand it fails once we get to the second page for #reason.20:11
Tengubecause single paging is fine. 20:11
fungiso if we tweak the substitute rule to use mirror.iad3.inmotion.opendev.org like the client requests do, then it should work20:12
Tengugo figure.... they probably messed big time at some point, but that's really a corner case.20:12
Tengufungi: so if we have a way to inject that name "mirror.iad3..." in the ansible generating the config, we're good.20:12
fungithe reason the pagination seems to break it is that the pages include "previous" and "next" fields which use fully qualified urls20:12
Tengudoes it?20:13
fungiand the client is probably following the "next" url from the first page, which then takes it to mirror02 instead of mirror20:13
Tenguoh.... dang.20:13
Tenguyeah20:13
Tenguthat's exactly that20:13
Tenguthat's the trick20:13
Tenguyou got it, fungi !20:14
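The mechanism described above can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the `PAGES` table fakes two upstream responses, and `rewrite` only imitates the effect of the proxy's Substitute rule. It is not ansible-galaxy or Apache code.

```python
from urllib.parse import urlsplit

UPSTREAM = "https://galaxy.ansible.com/"
VERSIONS = "/api/v2/collections/community/general/versions/"

# Fake upstream responses, keyed by path + query. Only "next" is modelled.
PAGES = {
    VERSIONS + "?page_size=100": {
        "next": UPSTREAM.rstrip("/") + VERSIONS + "?page=2&page_size=100",
    },
    VERSIONS + "?page=2&page_size=100": {"next": None},
}


def rewrite(url, substituted_base):
    """Imitate the proxy replacing the upstream base URL in a response."""
    if url is None:
        return None
    return url.replace(UPSTREAM, substituted_base, 1)


def hosts_contacted(start_url, substituted_base):
    """Follow "next" links like a paginating client; list each host hit."""
    hosts = []
    url = start_url
    while url:
        parts = urlsplit(url)
        hosts.append(parts.hostname)
        page = PAGES[parts.path + "?" + parts.query]
        url = rewrite(page["next"], substituted_base)
    return hosts


start = "https://mirror.dfw.rax.opendev.org:4448" + VERSIONS + "?page_size=100"
print(hosts_contacted(start, "https://mirror01.dfw.rax.opendev.org:4448/"))
print(hosts_contacted(start, "https://mirror.dfw.rax.opendev.org:4448/"))
```

With the server's own name substituted in, the second request lands on a different hostname than the one passed via `-s`; with the requested name substituted, the client stays on one hostname throughout. A collection whose version list fits on one page never follows a "next" link, which matches ansible.utils working.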
fungianyway, the solution seems fairly straightforward, we probably just need to get consensus among the sysadmins as to the best way to encode the "preferred" request name for the mirror sites (like do we leverage some mechanism to generate them on the fly in group_vars or something similar which doesn't need us to list them individually)20:16
Tengufungi: *maybe* as a first thing we may just make a simple mapping in the jinja, using {% set %} and some if/elsif/else things...20:16
tobias-urdinclarkb: yeah, I'm working on getting everything moved over to the new API there, started with Nova gonna continue with Neutron but got blocked needing this https://review.opendev.org/c/openstack/oslo.messaging/+/869899 and been struggling some with getting CI green with new tox etc20:17
tobias-urdinI will continue look into it for sure20:17
Tengufungi: though... if I understand correctly, mirror.foo.bar is a cname to, at least, mirror01.foo.bar, but may also be a cname to mirror02.foo.bar - would that mean both are up, and both are answering, meaning galaxy may end on 01 first, then re-request mirror.foo.bar and end on 02?20:18
fungiTengu: we add and remove mirror servers frequently is the reason for the cnames20:19
fungiyou can't have a cname resolve to multiple names though, you're probably thinking of round-robin address records20:20
Tengufungi: err yeah, round-robin address record indeed.20:20
Tenguand yeah, cname can't match multiple names. indeed. so it's more a "service" address that may be attached to any of the used server.20:21
Tengufungi: maybe... why is the proxy able/configured to answer to multiple names?20:21
fungianyway, there's enough churn that setting vars is probably cleaner than doing some sort of name mapping20:21
Tengu:) pretty sure you'll figure something out20:22
Tengulemme know if there's a need for testing or just pushing ideas.20:22
Tenguthough.... not today - it's getting late here.20:22
Tengubut at least, the root cause is known20:22
fungiTengu: mainly because we haven't told the proxies not to, and in some cases (like the multi-homed mirrors in rackspace) it's useful to be able to test them over the internet on externally reachable interfaces20:22
Tengufungi: makes sense.20:22
Tengua pity mod_substitute doesn't support vars yet ;_;. that would solve everything20:23
fungiwe could make the apache vhosts name-specific rather than wildcarded, as i mentioned earlier, it would just mean a lot of duplication in the configs or more apache templating20:23
jrosseris the idea to generally proxy/cache any ansible collection, or are there a subset of them we're more interested in?20:24
fungiso there are several ways we could go about it, mainly just trying to work out the least intrusive20:24
Tengujrosser: no actual idea. I thought it would be 2 or 3, but apparently that's already wrong.20:24
fungijrosser: to cache general ansible-galaxy access. if you're going to want to test with unreleased or un-merged commits from specific collections, still better to use required-projects in zuul20:25
jrosserwell, in projects i'm involved in we rewrite the collection URLs on the fly to use any that happen to be cached on the CI node20:25
clarkbtobias-urdin: my main suggestion would be to look into using the python warnings library and emit the warning once20:25
Tengujrosser: yeah - and if not cached on the ci node? adding more and more and more isn't good either, since they end up being moved around during the node bootstrap20:26
Tenguusing the caching-proxy is more flexible imho. and... we're not that far from a working setup, once we get over that hostname "mismatch". And there's a "clean" workaround (passing --no-cache to ansible-galaxy CLI)20:27
Tengualso, that issue seems to be affecting only ansible >2.9 - because the galaxy cache was implemented in later release (2.11 I think)20:28
clarkbthey ultimately provide different functionality some of which is useful at different times. In particular you should use the Zuul case if you are doing testing against unreleased collections and if you want depends-on support20:29
Tenguyep.20:29
Tenguanyway.... getting really late, I'll check back tomorrow :)20:32
fungihowever, if you're just doing some ansible testing and need something from galaxy, it's nice not to need to wait for someone to add another github repo to the zuul tenant config20:32
fungihave a good night Tengu!20:32
Tenguthanks fungi for the pointers :).20:32
jrosserthe most widespread problem i have seen with galaxy in ci jobs is the API returning 5xx and just bailing out, so server side problem at their end20:33
jrosseras a result our jobs get the collections from github with git rather than galaxy with the API wherever possible20:34
jrosserthough having said that, occurrences of that kind of error have been very infrequent lately, something must be fixed/improved in the galaxy server20:35
fungialso, in theory, proxying and caching requests for galaxy local to our test nodes should reduce the load we impose on their servers by running test jobs, while improving latency, packet loss, and bandwidth availability/speed for the requests if hitting a warmed cache20:41
tobias-urdinclarkb: i guess the problem is that it gets logged every time for example an API worker is spawned which means it will be a new interpreter (at least for mod_wsgi) every time for the lifetime of that worker at least20:57
tobias-urdindid it shrink down a bit when nova was fixed? or is there some other ones causing potential issues20:58
*** dviroel is now known as dviroel|out21:05
clarkbtobias-urdin: https://5ee1e6c6ba7962bf8d90-9f271c6f9270f1e424d49ce4325dabf5.ssl.cf2.rackcdn.com/871001/1/gate/cross-nova-functional/f996c0a/ it shrunk down significantly. It's more that if you are going to warn a user or operator about something repeating the warning in a tight loop is not helpful as it fills disks/logs and irritates them. That is why the python warnings library21:15
clarkballows you to emit such warnings once and move on21:15
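A small sketch of the "emit once" behaviour of the standard `warnings` module. `legacy_client` and `count_emitted` are hypothetical names invented for this example; this is not oslo.messaging code, and as noted in the discussion the "once" only holds per interpreter process.

```python
import warnings


def legacy_client(caller):
    """Hypothetical stand-in for a deprecated constructor that warns."""
    warnings.warn(
        f"{caller}: manual client construction is deprecated",
        DeprecationWarning,
        stacklevel=2,
    )
    return object()


def count_emitted(caller, calls):
    """Call legacy_client `calls` times; return how many warnings surfaced."""
    with warnings.catch_warnings(record=True) as caught:
        # "once" shows each distinct message/category pair a single time
        # per process, no matter where or how often it is raised.
        warnings.simplefilter("once")
        for _ in range(calls):
            legacy_client(caller)
    return len(caught)


print(count_emitted("demo", 1000))
```

The once-registry is global to the process, so a second run with the same message text surfaces nothing further; each freshly spawned worker process would still emit its own single copy.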
tobias-urdinclarkb: yeah i agree, just wondering if using python warning once would actually solve all such issues but hm yea probably some of them at least21:26
opendevreviewClark Boylan proposed opendev/system-config master: Fix Gerrit 3.6 image build  https://review.opendev.org/c/opendev/system-config/+/87011821:28
opendevreviewClark Boylan proposed opendev/system-config master: Build Gerrit on top of our python-base images  https://review.opendev.org/c/opendev/system-config/+/87087421:28
opendevreviewClark Boylan proposed opendev/system-config master: Switch Gerrit to Java 17  https://review.opendev.org/c/opendev/system-config/+/87087721:28
clarkbthis is neat I've got github notifications for a gitea release that doesn't show up in github yet21:32
clarkbtobias-urdin: I would expect it to cut down quite a bit since wsgi should have some process reuse right?21:33
tobias-urdinclarkb: yeah think so, i've proposed patches to all projects now at least, not sure if i should change oslo.m also21:34
*** arxcruz|ruck is now known as arxcruz21:50
opendevreviewIan Wienand proposed openstack/project-config master: nodepool: drop linaro-us  https://review.opendev.org/c/openstack/project-config/+/87119621:55
clarkbianw: ^ re that did the new flavor get things going on the new cloud?21:55
clarkbianw: also left a thought on that new change21:57
clarkbhrm that tag is still not there I wonder if that implies they immediately deleted it22:02
ianwnot really.  for some reason, it hasn't chosen to upload all the image types, i'm not sure why.  there's nothing in the nb04 logs that i can see, it just doesn't seem to try uploading22:02
clarkbhas it built them? if so thats weird22:03
ianwkevinz gave me access to the cloud, but i'm a little worried it's out of disk22:03
ianw  /dev/nvme1n1p2  196G  186G  792M 100% /22:04
clarkbthat is one downside to raw images22:04
clarkbthey are a lot bigger22:04
clarkbI wonder if we shouldn't consider trimming what we support on arm64 way back. Like Jammy and Rocky 922:05
clarkbthats 4 images (2 * 2) at about 20GB each raw we'd be under that limit22:05
ianwLocal Volumes space usage:22:07
ianwglance                    1         122.9GB22:07
ianwit does seem to me that is probably where the images are being stored.  i'm still just trying to understand the layout and kolla deployment22:08
ianwt] Failed to upload image data due to HTTP error: webob.exc.HTTPRequestEntity22:12
ianwTooLarge: Image storage media is full: There is not enough disk space on the image storage media.22:12
ianwyeah, glance is not happy22:12
ianwok, kevinz did explain this, but i see now ... there's 2 1tb disks on this22:16
ianwnvme0n1                                                                  259:0    0 894.3G  0 disk22:16
ianwnvme1n1                                                                  259:1    0 894.3G  0 disk 22:16
ianwnvme1n1 is the boot disk -- it has a 1gb efi partition, and 200gb / and then the rest is in lvm for cinder volumes22:17
ianwnvme0n1 is 100% in the cinder lvm 22:17
ianwi think we probably want to make glance use cinder22:21
opendevreviewMerged opendev/system-config master: Fix Gerrit 3.6 image build  https://review.opendev.org/c/opendev/system-config/+/87011822:21
ianwsince that is where the space is22:21
clarkbif that is possible that seems like a good idea22:23
ianwit doesn't say cinder -> https://docs.openstack.org/kolla-ansible/latest/reference/shared-services/glance-guide.html#glance-backends22:25
ianwbut https://opendev.org/openstack/kolla-ansible/commit/fa49b2692de1b38bfdf47e1468296770d5dfff89 suggests maybe otherwise22:27
*** dasm is now known as dasm|off23:13
*** rlandy is now known as rlandy|out23:29
