Wednesday, 2020-07-29

*** ryohayakawa has joined #openstack-infra00:02
*** rcernin has quit IRC00:05
*** rcernin has joined #openstack-infra00:31
<dansmith> clarkb: this is a drive-by since I'm going afk, but I just got one of those functional timeout jobs and I don't see the testr file anywhere, even though I see that post step that was supposed to grab them: https://zuul.opendev.org/t/openstack/build/87acf04afa5f4bec8618a3802978f714  [00:35]
*** weshay|ruck has quit IRC00:53
*** weshay_ has joined #openstack-infra00:53
*** howell has quit IRC00:57
*** samueldmq has quit IRC00:57
*** donnyd has quit IRC00:57
*** zzzeek has quit IRC00:57
*** thedac has quit IRC00:57
*** jberg-dev has quit IRC00:57
*** dtantsur|afk has quit IRC00:57
*** owalsh_ has joined #openstack-infra01:00
*** samueldmq has joined #openstack-infra01:01
*** donnyd has joined #openstack-infra01:01
*** howell has joined #openstack-infra01:01
*** zzzeek has joined #openstack-infra01:01
*** thedac has joined #openstack-infra01:01
*** jberg-dev has joined #openstack-infra01:01
*** dtantsur|afk has joined #openstack-infra01:01
*** gyee has quit IRC01:02
*** mordred has quit IRC01:03
*** owalsh has quit IRC01:04
*** tkajinam has quit IRC01:06
*** tkajinam has joined #openstack-infra01:06
*** rfolco has quit IRC01:08
*** mordred has joined #openstack-infra01:10
*** rfolco has joined #openstack-infra01:15
*** ociuhandu has joined #openstack-infra01:17
*** rfolco has quit IRC01:19
*** ociuhandu has quit IRC01:21
*** Goneri has quit IRC01:25
*** armax has quit IRC01:27
*** tkajinam has quit IRC01:33
*** tkajinam has joined #openstack-infra01:34
*** ldenny has quit IRC01:40
*** auristor has joined #openstack-infra01:40
*** ldenny has joined #openstack-infra01:40
*** ociuhandu has joined #openstack-infra01:43
*** ociuhandu has quit IRC01:48
*** dchen is now known as dchen|away01:56
*** dchen|away is now known as dchen02:01
*** rfolco has joined #openstack-infra02:08
*** ricolin has quit IRC02:15
*** ricolin has joined #openstack-infra02:21
*** rfolco has quit IRC02:28
*** artom has quit IRC02:35
*** dchen is now known as dchen|away02:37
<zer0c00l> clarkb: Thanks.  [02:44]
*** ociuhandu has joined #openstack-infra02:53
<gouthamr> clarkb fungi: circling back to tell you a new change-id resolved the issue i had earlier - thank you for your help!  [02:57]
*** ociuhandu has quit IRC02:57
*** stevebaker has quit IRC03:18
*** armax has joined #openstack-infra03:28
*** psachin has joined #openstack-infra03:37
*** armax has quit IRC03:43
*** dchen|away is now known as dchen03:58
*** dchen is now known as dchen|away04:08
*** ykarel|away has joined #openstack-infra04:22
*** ykarel|away is now known as ykarel04:29
*** evrardjp has quit IRC04:33
*** evrardjp has joined #openstack-infra04:33
*** dchen|away is now known as dchen04:42
*** Lucas_Gray has quit IRC04:44
*** udesale has joined #openstack-infra04:45
*** dchen is now known as dchen|away04:55
*** marios has joined #openstack-infra05:05
*** ramishra has quit IRC05:11
*** ramishra has joined #openstack-infra05:16
*** stevebaker has joined #openstack-infra05:16
*** tetsuro has joined #openstack-infra05:25
*** tetsuro has quit IRC05:25
*** tetsuro has joined #openstack-infra05:26
*** tetsuro has quit IRC05:27
*** dchen|away is now known as dchen05:27
*** lmiccini has joined #openstack-infra05:30
*** ysandeep|away is now known as ysandeep|rover05:35
*** auristor has quit IRC05:38
*** auristor has joined #openstack-infra05:39
*** slaweq has joined #openstack-infra05:52
*** ociuhandu has joined #openstack-infra05:59
*** slaweq has quit IRC06:02
*** eolivare has joined #openstack-infra06:14
*** brtknr has quit IRC06:17
*** dchen has quit IRC06:22
*** dchen has joined #openstack-infra06:23
*** marios has quit IRC06:29
*** vishalmanchanda has joined #openstack-infra06:39
*** dklyle has quit IRC06:42
*** slaweq has joined #openstack-infra06:57
*** hashar has joined #openstack-infra07:18
*** jcapitao has joined #openstack-infra07:19
*** bhagyashris is now known as bhagyashris|lunc07:22
*** xek_ has joined #openstack-infra07:24
*** zxiiro has quit IRC07:26
*** apetrich has joined #openstack-infra07:27
*** yonglihe has joined #openstack-infra07:31
*** tosky has joined #openstack-infra07:39
*** ralonsoh has joined #openstack-infra07:40
*** yolanda has quit IRC07:40
*** dtantsur|afk is now known as dtantsur07:53
*** jpena|off is now known as jpena07:55
*** evrardjp has quit IRC08:06
*** pkopec has joined #openstack-infra08:06
*** evrardjp has joined #openstack-infra08:08
*** lucasagomes has joined #openstack-infra08:08
*** yolanda has joined #openstack-infra08:17
*** derekh has joined #openstack-infra08:23
*** bhagyashris|lunc is now known as bhagyashris08:40
*** brtknr has joined #openstack-infra08:46
*** ociuhandu has quit IRC08:56
*** dtantsur is now known as dtantsur|brb08:58
*** dchen is now known as dchen|away09:03
*** rcernin has quit IRC09:12
*** Lucas_Gray has joined #openstack-infra09:23
*** Lucas_Gray has quit IRC09:27
*** Lucas_Gray has joined #openstack-infra09:30
*** ociuhandu has joined #openstack-infra09:33
*** ociuhandu has quit IRC09:37
*** ociuhandu has joined #openstack-infra09:42
*** hashar has quit IRC09:42
*** tkajinam has quit IRC10:01
*** hashar has joined #openstack-infra10:06
*** ramishra has quit IRC10:08
*** hashar has quit IRC10:35
*** ramishra has joined #openstack-infra10:49
*** ricolin has quit IRC10:52
*** eolivare has quit IRC11:08
*** hashar has joined #openstack-infra11:14
*** jcapitao is now known as jcapitao_lunch11:15
*** ysandeep|rover is now known as ysandeep|afk11:21
*** dtantsur|brb is now known as dtantsur11:26
*** xek_ has quit IRC11:27
*** hashar has quit IRC11:34
*** hashar has joined #openstack-infra11:34
*** dchen|away is now known as dchen11:34
*** ysandeep|afk is now known as ysandeep|rover11:38
<openstackgerrit> Slawek Kaplonski proposed openstack/project-config master: Move non-voting neutron tempest jobs to separate graph  https://review.opendev.org/743729  [11:40]
*** markvoelker has joined #openstack-infra11:40
*** hashar has quit IRC11:42
*** hashar has joined #openstack-infra11:42
*** hashar has quit IRC11:44
*** hashar has joined #openstack-infra11:45
*** dchen is now known as dchen|away11:46
*** markvoelker has quit IRC11:47
*** rfolco has joined #openstack-infra11:51
*** ryohayakawa has quit IRC11:52
*** jpena is now known as jpena|lunch11:52
*** hashar has quit IRC11:57
*** rlandy has joined #openstack-infra12:02
*** artom has joined #openstack-infra12:03
*** hashar has joined #openstack-infra12:05
*** dciabrin has quit IRC12:06
*** dciabrin has joined #openstack-infra12:07
*** eolivare has joined #openstack-infra12:07
*** udesale_ has joined #openstack-infra12:19
*** udesale has quit IRC12:21
*** hashar has quit IRC12:22
*** jcapitao_lunch is now known as jcapitao12:25
*** xek has joined #openstack-infra12:27
*** lpetrut has joined #openstack-infra12:44
*** dciabrin has quit IRC12:44
*** dciabrin has joined #openstack-infra12:45
*** markvoelker has joined #openstack-infra12:48
*** jpena|lunch is now known as jpena12:54
*** weshay_ is now known as weshay|ruck12:54
*** Goneri has joined #openstack-infra13:01
<openstackgerrit> Tristan Cacqueray proposed zuul/zuul-jobs master: ensure-pip: add instructions for RedHat system  https://review.opendev.org/743750  [13:03]
*** dciabrin has quit IRC13:27
*** dciabrin has joined #openstack-infra13:27
*** dave-mccowan has joined #openstack-infra13:29
*** xek has quit IRC13:30
*** ysandeep|rover is now known as ysandeep13:31
*** andrewbonney has joined #openstack-infra13:32
<mwhahaha> clarkb: so the multiple requests are likely the different jobs requesting the same layer. I checked the logs for one of the jobs and d1dded21abdf1872a3e678bb99614c7728e3d1b381d5721169bedae30cda5c61 was requested 2 times. I'll do some testing today to try and figure out the cache-busting behavior  [13:36]
*** dchen|away is now known as dchen13:44
*** piotrowskim has joined #openstack-infra13:46
<mwhahaha> can anyone point me at where the apache cache config lives for the docker.io mirror? I'd like to recreate it locally to troubleshoot  [13:49]
*** d34dh0r53 has joined #openstack-infra13:49
<clarkb> mwhahaha: I filtered by IP address first, then trimmed to timestamps occurring during that job against your change. All of that should mean I counted for that one host on that one job.  [13:53]
<fungi> mwhahaha: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2#L400-L452  [13:53]
<fungi> assuming you're looking for the v2 protocol proxy  [13:53]
<mwhahaha> clarkb: weird. ok, i'll try and track down the other requests  [13:53]
<fungi> if you're looking for the v1 protocol proxy then it's in a similar macro earlier in that template  [13:53]
<mwhahaha> fungi: yes, that's it, thanks  [13:53]
<fungi> worth noting, dockerhub isn't designed to be easily proxied, they discourage doing so, and we have a few ugly workarounds in there to get it working at all  [13:55]
<fungi> though apparently our workarounds are only sufficient to get it working with the official docker client, sounds like?  [13:56]
<clarkb> the biggest thing is ignoring the URL query parameters to make the URLs cacheable  [13:57]
<openstackgerrit> Aurelien Lourot proposed openstack/project-config master: Add Keystone Kerberos charm to OpenStack charms  https://review.opendev.org/743766  [13:57]
<clarkb> they are sha256 addressed so should never change  [13:57]
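
Since the layers are addressed by their sha256 digest, the query string on the CDN URLs (signed verification parameters) can be ignored when forming cache keys, and the bytes can always be checked against their own name. A minimal Python sketch of that idea; the URLs and digest below are made-up examples, not the actual mirror paths:

    import hashlib
    from urllib.parse import urlsplit, urlunsplit

    def cache_key(url):
        # Drop the query string (signed CDN parameters) so every request for
        # the same blob maps onto the same cache entry.
        parts = urlsplit(url)
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

    def blob_is_intact(expected_digest, body):
        # Content-addressed blobs can be verified against their own name,
        # which is why serving a cached copy is safe: the bytes never change.
        return hashlib.sha256(body).hexdigest() == expected_digest

    # Two requests for the same layer with different signing parameters still
    # share one cache entry (hypothetical URLs).
    url_a = "https://mirror.example.org/cloudflare/blobs/sha256:abc123?verify=tokenA"
    url_b = "https://mirror.example.org/cloudflare/blobs/sha256:abc123?verify=tokenB"
    assert cache_key(url_a) == cache_key(url_b)
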
*** ysandeep is now known as ysandeep|away14:07
*** xek has joined #openstack-infra14:09
<openstackgerrit> Aurelien Lourot proposed openstack/project-config master: Add Keystone Kerberos charm to OpenStack charms  https://review.opendev.org/743766  [14:10]
*** sshnaidm is now known as sshnaidm|bbl14:17
<clarkb> mwhahaha: I'm double-checking, and all of the logs I processed yesterday are for https://zuul.opendev.org/t/openstack/build/61aa12e9474840c8969fc8426541fa41/log/job-output.txt#48 (that host), and the logs run from [2020-07-28 22:06:40.671] to [2020-07-28 22:57:01.165], which is within the time range of that job running  [14:27]
<clarkb> mwhahaha: now I'm spot-checking my request counts to make sure they are accurate  [14:27]
*** dklyle has joined #openstack-infra14:29
<clarkb> mwhahaha: ok, there was a bug in my sed; it was doubling the counts. So we still have multiple requests, but half as many as in the original paste. http://paste.openstack.org/show/796432/ should be accurate  [14:31]
<clarkb> I was using s///p without -n, so it printed an extra match I think. Dropped the p and now the counts should be correct  [14:31]
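
A rough Python equivalent of the per-layer request counting being done here, which avoids the sed auto-print pitfall entirely; it assumes an access log already filtered down to the one client IP, and the file name and regex are illustrative rather than the actual commands used:

    import re
    from collections import Counter

    digest = re.compile(r'"GET \S*?([0-9a-f]{64})')   # layer digests in the request path
    counts = Counter()

    with open("mirror_access_one_ip.log") as logfile:  # hypothetical pre-filtered log
        for line in logfile:
            match = digest.search(line)
            if match:
                counts[match.group(1)] += 1            # one count per request, no doubling

    for layer, hits in counts.most_common(10):
        print(hits, layer)
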
*** dchen is now known as dchen|away14:37
*** ricolin has joined #openstack-infra14:40
*** rcernin has joined #openstack-infra14:41
*** Lucas_Gray has quit IRC14:41
*** lpetrut has quit IRC14:43
*** Lucas_Gray has joined #openstack-infra14:49
*** lbragstad_ has joined #openstack-infra14:51
*** rcernin_ has joined #openstack-infra14:51
*** dklyle has quit IRC14:53
<clarkb> mwhahaha: looking at tcpdumps, the docker client drops the Authorization header bearer token when talking to the CDN  [14:53]
<mwhahaha> wat  [14:53]
<clarkb> give me a few minutes to put this all in a paste  [14:53]
<mwhahaha> thanks, i'm setting up an apache mirror at the moment to play with it a bit more  [14:53]
*** fdegir5 has joined #openstack-infra14:54
*** rcernin_ has quit IRC14:56
*** rcernin has quit IRC14:58
*** fdegir has quit IRC14:58
*** irclogbot_3 has quit IRC14:58
*** zer0c00l has quit IRC14:58
*** lbragstad has quit IRC14:58
*** irclogbot_3 has joined #openstack-infra14:59
<zbr> clarkb: why are we not using a generic proxy approach for the mirror_info.sh implementation?  [15:00]
*** dklyle has joined #openstack-infra15:01
<clarkb> mwhahaha: http://paste.openstack.org/show/796435/ is what I see  [15:03]
<clarkb> zbr: I don't understand the question. What do you mean by a generic proxy approach?  [15:03]
<zbr> one that you would configure as HTTP_PROXY=....  [15:03]
<clarkb> zbr: because we don't want open proxies on the internet, and there isn't a good way for us to restrict access to the proxies if we want our jobs to be able to use them  [15:04]
<clarkb> zbr: instead we reverse proxy so that only specific backends can be targeted, limiting the risk that we'll be used for nefarious reasons  [15:04]
<zbr> to me this seems like creating more maintenance and less flexibility. it is still possible to limit what the proxy would serve or not, or for whom.  [15:05]
<clarkb> we can't IP filter because we use public clouds and get large ranges of IP addrs which can also be used by others. I don't think we can reasonably authenticate the proxies, as anyone can write a job to disclose the auth material (particularly since we expect the proxy to be functional throughout the job and not just during pre or post steps)  [15:05]
<clarkb> zbr: how? I don't think we have a method to limit the who in this case  [15:06]
<zbr> IP ranges?  [15:06]
<clarkb> we could limit the what, but then every time someone adds a new job they'd have to disable the proxy if they talk to things that aren't already enabled  [15:06]
<clarkb> see my earlier note about IP ranges  [15:06]
<clarkb> due to our use of public clouds that isn't really sufficient  [15:06]
<clarkb> BUT also it wouldn't address this problem at all  [15:06]
<zbr> or we could use auth on them  [15:07]
<clarkb> zbr: see my note about auth :)  [15:07]
<clarkb> if someone can construct a method to do that I'd be happy to hear it, but I've been unable to figure out how it would work in a reasonable manner  [15:07]
<clarkb> the problem we have here is a result of pushing requests through a single origin, which is true whether we forward or reverse proxy  [15:08]
<clarkb> it's made worse by not being able to cache for some reason, so all the requests go through. On top of that we have jobs requesting the same resources multiple times.  [15:08]
<clarkb> If we fix the caching issue, all of those multiple requests should avoid hitting the remote. And additionally maybe we can stop making all of those extra requests and use the earlier fetched data within the job  [15:09]
<zbr> i am trying to find your note....  [15:09]
<clarkb> 15:05:54 clarkb | we can't IP filter because we use public clouds and get large ranges of IP addrs which can also be used by others. I don't think we can reasonably authenticate the proxies as anyone can write a job to disclose the auth material (particularly since we expect the proxy to be functional throughout the job and not just during pre or post steps)  [15:10]
<clarkb> zbr: ^  [15:10]
<zbr> i do not see any damage in not having a perfect lockdown of the proxy; what if a few additional IPs were able to use the proxy? Or what if someone exposed the credential and made use of it? If these proxies are only mirrors for specific domains, it should not matter.  [15:12]
<clarkb> zbr: the setup you describe would capture all traffic, or are you suggesting that every job needs to know to enable the proxy for specific requests?  [15:13]
*** lmiccini has quit IRC15:13
<clarkb> generally the suggestion is that we configure it in /etc system-wide so that jobs don't need to be aware of it  [15:14]
<zbr> so now every job needs to know how to load and process 30 variables from https://opendev.org/opendev/base-jobs/src/branch/master/roles/mirror-info/templates/mirror_info.sh.j2  [15:14]
<clarkb> the problem there is we can't limit the traffic much, and that makes it a potential avenue for abuse  [15:14]
<zbr> instead of doing it for a single HTTP_PROXY one  [15:14]
*** zxiiro has joined #openstack-infra15:14
<clarkb> zbr: that's not true, what you link is the legacy pattern  [15:14]
<clarkb> zbr: the expectation now is that you'll run the role to configure e.g. docker and it will know to do it for you  [15:14]
<clarkb> which is how roles like the docker role work  [15:15]
<clarkb> but as I said, if you configure the single http proxy once then you have to allow all traffic through the proxy, and that is the abuse concern  [15:15]
<clarkb> because now anyone on the internet can use us to funnel their traffic  [15:15]
<clarkb> then if we get firewalled, all our jobs stop working  [15:15]
<clarkb> I'm not suggesting it is perfect, but given the constraints we operate under I think it is reasonable to do what we do (reverse proxying specific resources). Also, again, changing to a forward proxy would not change the key details of this problem that tripleo is facing  [15:16]
<zbr> do I need to allow all traffic through it? wouldn't it be enough to allow only select locations?  [15:17]
<clarkb> zbr: if you do that, then every job task needs to know to set that env var when it makes requests  [15:17]
<zbr> in fact I even see a few extra security benefits in having a proxy: you set it up on the machine and discover if jobs try to access data from "random" (unapproved) locations.  [15:18]
<clarkb> I don't want to be the location police  [15:18]
<zbr> we could even have a locked-down mode  [15:19]
<clarkb> jobs can already limit themselves in that manner if they choose, but I don't think it is the CI platform's duty to address that  [15:19]
<clarkb> for example, this is what the remove-sudo role provides to jobs that want to avoid sudo access  [15:19]
<clarkb> but jobs opt into that and it's all job config, not platform setup  [15:19]
<zbr> yeah, that could be used for similar purposes.  [15:20]
<zbr> even such a proxy could be an opt-in feature.  [15:21]
<clarkb> it is, one of the options available to tripleo is to stop using our proxy  [15:21]
<clarkb> then you'd have the background error rate, but due to using different IPs you would likely avoid the rate limits  [15:21]
<clarkb> job runtime shouldn't go up since we weren't caching anything for those jobs anyway  [15:22]
<fungi> also reduced performance due to fetching them from farther away on every request (though that's currently the case anyway, because the requests from tripleo's docker client aren't getting cached for as-of-yet unknown reasons)  [15:23]
<fungi> er, what clarkb just said  [15:23]
<clarkb> ya, I think fixing the problem directly would be an improvement, but dropping the proxies wouldn't be a regression against the current situation  [15:23]
<zbr> clarkb: my question is unrelated to the docker issue, it's quite the opposite, see https://bugs.launchpad.net/tripleo/+bug/1888700  [15:25]
<openstack> Launchpad bug 1888700 in tripleo "image build jobs are not using upstream proxy servers" [Critical,Triaged] - Assigned to Sorin Sbarnea (ssbarnea)  [15:25]
<clarkb> zbr: unfortunately "It would far better to have a proxy enabled and make use of it transparently, so when no proxy is configured it would still work." is a problem for our environment  [15:26]
<clarkb> because if we make it transparent then we're unable to control it to a level where abuse can be limited  [15:26]
<clarkb> yes, that would likely be better in a perfect world  [15:26]
<clarkb> but we don't have the ability to take advantage of that as far as I can see (because it would require us to put open proxies on the internet)  [15:27]
<zbr> so the main issue here is preventing abuse  [15:27]
<clarkb> yes, if we set up a transparent squid proxy in each of our cloud regions and configured our test nodes to use it, how would we also prevent random internet users from using them? IP restrictions are difficult because we use public clouds and share IP pools, and those IP pools change. Authentication is trivially exposed via a push to gerrit, since we'd need the proxy to be usable during the run portion of a job and not just in the trusted pre and post playbooks  [15:28]
<zbr> if abuse prevention is the only issue, i think there are ways to investigate avoiding it.  [15:28]
<clarkb> there may be, but I've spent some time thinking about it (when we initially set up the reverse proxies with pabelanger) and couldn't come up with a reasonable solution  [15:29]
<zbr> zuul could produce a temporary token which gives access to the proxy for a limited amount of time  [15:29]
<clarkb> but now it's too complicated to bother  [15:29]
<clarkb> what we have is simple and it works  [15:29]
<clarkb> (also there are issues with that approach too, which we hit with our first attempt at swift-based log storage)  [15:30]
<zbr> until someone comes to you and says: i need mirrors for foo.org; two weeks later, this becomes bar.org  [15:30]
<fungi> yeah, i was about to point out that we tried the temporary token authentication solution for granting nodes access to write to swift containers, and ultimately abandoned that due to the complexity  [15:30]
<clarkb> fungi: complexity, and it didn't work reliably, as getting the timing right is weird  [15:31]
<zbr> i personally do not find that approach to scale well  [15:31]
<clarkb> zbr: yes, I agree it's not perfect, but again, given the constraints we have it has worked really really well  [15:31]
<fungi> well, i meant getting the timing right between token generation, authorization, and revocation was complicated  [15:32]
<fungi> it looks like the suggested way to do it with squid is kerberos  [15:34]
<fungi> i can only imagine the new and exciting failure modes we'll encounter setting up each job node as a kerberos client  [15:35]
<zbr> clarkb: fungi: tx, time for me to go back and check how big the need is to cache/mirror requests made towards images.rdoproject.org  [15:35]
<fungi> squid can also support bearer tokens supplied in the authentication header  [15:37]
*** armax has joined #openstack-infra15:37
<fungi> but again, key distribution would be the complicated bit  [15:38]
<clarkb> fungi: apache does too fwiw (it's working with the docker hub client: http://paste.openstack.org/show/796435/)  [15:38]
<zbr> what i do not understand is why we cannot limit access to a proxy to requests coming from inside the same cloud tenant, but I also do not know how our networking is set up  [15:38]
*** williampiv has joined #openstack-infra15:38
<fungi> zbr: it's generally "provider" networking  [15:39]
<clarkb> zbr: because we don't get tenant networking in: rax, vexxhost, ovh  [15:39]
<fungi> we don't have a dedicated network for our nodes  [15:39]
<clarkb> I think we do have tenant networking in limestone, openedge, and inap  [15:39]
<clarkb> not sure about linaro  [15:39]
<clarkb> rax + vexxhost + ovh are probably 70% of our resources  [15:40]
<zbr> but all of them support it, so we could provision proxies that respond only to internal requests, even without having to "configure" the proxy itself to be aware of that.  [15:40]
<clarkb> I'm not sure that's a true statement  [15:41]
<clarkb> I'm 99% sure OVH does not  [15:41]
<clarkb> I know rax didn't, but I don't know if that has changed. I think vexxhost may allow us to configure tenant networks if they aren't there by default  [15:41]
<clarkb> OVH networking is particularly interesting. They give you a /32 on ipv4 with a default gateway outside of your subnet (and it works). For ipv6 they don't RA; neutron knows about it, but not config drive or the metadata service. So you have to query neutron and then statically configure it (so we don't do ipv6 there)  [15:43]
*** williampiv has quit IRC15:43
<clarkb> mwhahaha: I notice that Vary: Accept-Encoding is set on the response from cloudflare (wasn't in my earlier paste as I didn't do the last response, whoops). And the docker client has Accept-Encoding: identity while the tripleo client does Accept-Encoding: gzip, deflate. I think that means if we do start caching we will cache disparate objects for the tripleo client and the docker client, as the accept encodings are different.  [15:49]
<clarkb> while I don't think that is a direct cause of not caching at all, it's likely another issue to clean up to avoid duplicate objects?  [15:49]
<clarkb> it seems that maybe the Authorization header is related to not caching; however, I would've expected your change to address that. Local testing is likely the best next step for sorting that out.  [15:50]
<zbr> i wonder if there is a proxy that I can easily reconfigure using REST, so I could tell it to open access to a specific IP when I know about it, cleaning it up after.  [15:51]
<mwhahaha> clarkb: yea, but according to the docs I should be able to set Cache-Control: public with Authorization and caching should take effect  [15:52]
<clarkb> mwhahaha: ya, that is why I expected your change would help  [15:52]
<mwhahaha> per https://httpd.apache.org/docs/2.4/caching.html "What Can be Cached"  [15:52]
<clarkb> mwhahaha: which makes me wonder if it is another issue (or maybe two things, authorization and something else)  [15:52]
<mwhahaha> since i grabbed the httpd conf i'm going to see if i need additional stuff  [15:52]
*** ykarel has quit IRC15:53
<clarkb> zbr: related to that is the reason we stopped using cloud provider dns servers. On rax they block IPs that make too many dns requests, but since we boot lots of nodes, what would happen is they blocked an IP for too many dns requests when that IP was used by someone else. Then when it was our turn to use the IP they didn't stop blocking it and our jobs would fail. This is what prompted us to run local dns forwarding resolvers and bypass the cloud servers entirely. (fwiw I think it could've been done better and this isn't a reason not to do that, just an interesting story related to a similar mechanism)  [15:54]
*** ykarel has joined #openstack-infra15:58
<zbr> afaik, zuul knows the IP of its nodes, so it can (un)lock access to a proxy based on that; the only trick is that we need a proxy that does not need a restart to change its ACL  [15:59]
<zbr> and this could be a generic and optional ensure-proxy role  [15:59]
<zbr> if a job runs on a cloud that does not support a proxy, it may just skip configuring the proxy  [16:00]
<fungi> that seems like a fairly fundamental job behavior we wouldn't want changing at random depending on where the build happened to get scheduled  [16:01]
*** pkopec has quit IRC16:01
*** lucasagomes has quit IRC16:02
<zbr> i would say behavior on this would be an ops decision :D  [16:02]
* fungi has no idea what that means  [16:03]
*** xek has quit IRC16:05
*** jcapitao has quit IRC16:09
<clarkb> something like that may work now that we have a zuul cleanup phase, but we would also need to switch to always gracefully stopping executors  [16:13]
*** markvoelker has quit IRC16:13
<clarkb> otherwise the cleanup may not run and we'll leak connections (or add proxy cleanup to zuul restart procedures)  [16:14]
*** ykarel has quit IRC16:16
*** ociuhandu_ has joined #openstack-infra16:21
*** ociuhandu has quit IRC16:23
*** udesale_ has quit IRC16:24
*** ociuhandu_ has quit IRC16:27
*** sshnaidm|bbl is now known as sshnaidm16:28
*** jrichard has joined #openstack-infra16:28
*** gyee has joined #openstack-infra16:28
*** ociuhandu has joined #openstack-infra16:35
*** ociuhandu has quit IRC16:41
*** hashar has joined #openstack-infra16:42
*** psachin has quit IRC16:53
*** Lucas_Gray has quit IRC16:54
*** pkopec has joined #openstack-infra16:56
*** fdegir5 is now known as fdegir16:57
*** derekh has quit IRC17:00
*** ricolin has quit IRC17:01
*** rlandy is now known as rlandy|mtg17:02
*** jpena is now known as jpena|off17:02
*** sshnaidm is now known as sshnaidm|afk17:15
*** armax has quit IRC17:19
*** armax has joined #openstack-infra17:19
*** dtantsur is now known as dtantsur|afk17:21
*** dchen|away has quit IRC17:21
*** dchen|away has joined #openstack-infra17:24
*** jrichard has quit IRC17:30
*** doggydogworld has joined #openstack-infra17:32
<doggydogworld> hello all, i'm trying to PCI passthrough a NIC, but am running into "Insufficient compute resources: Claim pci failed."; does anyone have some insight as to how to fix this?  [17:33]
<doggydogworld> i'm on train and following this guide  [17:33]
<doggydogworld> https://docs.openstack.org/nova/train/admin/pci-passthrough.html  [17:33]
<doggydogworld> also, using all-in-one RDO packstack  [17:33]
*** rlandy|mtg is now known as rlandy17:43
*** artom has quit IRC17:45
*** artom has joined #openstack-infra17:46
*** dchen|away is now known as dchen17:50
*** artom has quit IRC17:52
<clarkb> doggydogworld: we help run the developer infrastructure for openstack but don't do a ton of openstack operations ourselves  [17:53]
<clarkb> doggydogworld: you might have better luck in #openstack or emailing openstack-discuss@lists.openstack.org  [17:53]
<doggydogworld> okay, thank you clark  [17:54]
*** dchen is now known as dchen|away18:00
*** artom has joined #openstack-infra18:07
<clarkb> mwhahaha: I think I figured it out  [18:16]
<EmilienM> woot  [18:16]
<clarkb> the responses to the tripleo client have Cache-Control: public, max-age=14400 and Age: 512824 headers set  [18:17]
<clarkb> the responses to the docker client have Cache-Control: public, max-age=14400 set but no Age header  [18:17]
<clarkb> I wonder if the difference is the authorization header being sent on the request to cloudflare  [18:17]
<clarkb> also, rereading the apache docs, I think we already do the right thing for Authorization, because it is the response to that which needs to set cache-control, not the request itself, aiui. But then because we are over max-age we don't cache anyway  [18:19]
*** andrewbonney has quit IRC18:21
<clarkb> I don't know what sort of logic is expected around the authorization header by the docker image protocol :/ but I'm guessing that is something the upstream client can give us hints on. Then we also want to update the accept-encoding so that the vary header doesn't force us to cache duplicate data  [18:22]
<mwhahaha> the problem is that it's really docker.io specific :/  [18:23]
<mwhahaha> we require auth for our blob fetches on registry.redhat.io i think  [18:24]
<clarkb> ya, but the docker client must have logic in there for it?  [18:24]
<clarkb> otherwise docker wouldn't work with any other registry?  [18:24]
<mwhahaha> maybe they just don't auth on a 307  [18:24]
<mwhahaha> i'm really uncertain  [18:24]
<mwhahaha> or we're incorrectly setting the scope of our auth tokens  [18:25]
<clarkb> it's also possible something else is tripping the setting of Age by cloudflare  [18:25]
<mwhahaha> anyway, going to start digging into that more  [18:25]
<clarkb> but the authorization header stands out as a big difference  [18:25]
<clarkb> also arguably this is a bug in the cloudflare server setup, as max-age shouldn't matter for sha256-addressed entities  [18:27]
<clarkb> they cannot change  [18:27]
<fungi> they can go away though via, e.g., deletion  [18:29]
*** vishalmanchanda has quit IRC18:29
<fungi> so would switch from 200 to 404 or something like that  [18:29]
<mwhahaha> that being said, shouldn't max-age still cache? just not for a long time?  [18:30]
<fungi> not if the age returned is greater than the max-age  [18:30]
<mwhahaha> hmm  [18:31]
<fungi> age 512824 is something like 6 days  [18:31]
*** artom has quit IRC18:31
<clarkb> ya, normally Age is like 0  [18:31]
<fungi> lots of sites play games with age and max-age to try to make things uncacheable ("cache busters")  [18:31]
<fungi> at one point i remember having to custom-compile a patched squid to ignore some of them and just cache it already, please  [18:32]
<fungi> silly games like setting a negative max-age  [18:32]
<fungi> maybe apache can be configured to strip/ignore age from responses?  [18:33]
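
The arithmetic behind "not if the age returned is greater than the max-age": a cache only treats a stored response as fresh while its current age is below the effective max-age, so an object that already arrives with Age: 512824 against max-age=14400 is stale on arrival and is never served from cache. A simplified sketch of that check (it ignores Date/Expires and per-request overrides; header values are the ones quoted above):

    import re

    def is_fresh(headers):
        cache_control = headers.get("Cache-Control", "")
        if "no-store" in cache_control:
            return False
        match = re.search(r"max-age=(\d+)", cache_control)
        if not match:
            return False
        max_age = int(match.group(1))
        age = int(headers.get("Age", 0))
        return age < max_age

    print(is_fresh({"Cache-Control": "public, max-age=14400", "Age": "512824"}))  # False: stale on arrival
    print(is_fresh({"Cache-Control": "public, max-age=14400"}))                   # True: no Age header
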
<fungi> but still it's surprising that cloudflare is only sometimes returning an age with those depending on how they're requested  [18:34]
<fungi> could even be a custom filter based on the user agent  [18:35]
<clarkb> oh, that's a good point  [18:35]
<fungi> that might be easy to test if the tripleo client can be tweaked to set a user agent  [18:36]
<mwhahaha> it's just python requests so we can change whatever  [18:37]
<fungi> yeah, i think it's an additional string parameter you pass in the connection constructor  [18:38]
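
For the user-agent experiment, with python requests it comes down to overriding the User-Agent header on the session or the call; a minimal sketch, where the registry URL, the docker-style UA string, and the token are all placeholders rather than the real tripleo code:

    import requests

    session = requests.Session()
    session.headers["User-Agent"] = "docker/19.03.12 go/go1.13.10"  # illustrative docker-style UA

    response = session.get(
        "https://registry.example.org/v2/library/centos/blobs/sha256:abc123",  # placeholder URL
        headers={"Authorization": "Bearer <token>"},                            # placeholder token
    )
    print(response.headers.get("Age"), response.headers.get("Cache-Control"))
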
*** hashar has quit IRC18:39
<clarkb> we can use https://httpd.apache.org/docs/current/mod/mod_headers.html#header to unset the Age header  [18:45]
<EmilienM> mwhahaha: we probably want to name it with an obvious name for better tracking in logs (i guess you already thought about it)  [18:45]
<clarkb> I'm not sure that is the best option here, but if we can't sort out what dockerhub/cloudflare expect then unsetting that seems reasonable enough  [18:46]
<clarkb> in particular we have the docker v2 proxy on its own vhost so we can limit the "damage"  [18:46]
<clarkb> we already set a max expiration for our cache at one day, so we'll eventually catch up to something that has been deleted (and when the docker hub client is used the header isn't there anyway)  [18:47]
*** artom has joined #openstack-infra18:48
*** artom has quit IRC18:48
*** artom has joined #openstack-infra18:48
<openstackgerrit> Andrii Ostapenko proposed zuul/zuul-jobs master: Add ability to use (upload|promote)-docker-image roles in periodic jobs  https://review.opendev.org/740560  [18:49]
<openstackgerrit> Andrii Ostapenko proposed zuul/zuul-jobs master: Add ability to use (upload|promote)-docker-image roles in periodic jobs  https://review.opendev.org/740560  [18:52]
*** ralonsoh has quit IRC18:53
*** lbragstad_ is now known as lbragstad19:01
*** zer0c00l has joined #openstack-infra19:05
*** zer0c00l has quit IRC19:06
*** zer0c00l has joined #openstack-infra19:13
<clarkb> mwhahaha: fungi: https://github.com/moby/moby/blob/master/registry/registry.go#L157-L174 it's basically an "am I talking to docker.com or docker.io" check; if not, then drop authorization  [19:24]
<clarkb> I think the logic there is: if we've been redirected away from our actual location then drop Authorization, as the authorization no longer applies  [19:25]
<clarkb> whether or not that is actually correct in all instances (see https://github.com/moby/moby/blob/master/registry/registry.go#L140-L155 where they hardcode their own stuff) I don't know  [19:26]
<clarkb> where that gets weird for us is that we proxy through the same host, so it's by chance that we drop it (because we aren't called docker.io)  [19:26]
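
Roughly what the moby code linked above does, rendered as a Python sketch: credentials are only forwarded when the redirect target looks like Docker's own domains over https, so a mirror with any other hostname loses the Authorization header as a side effect. The hostnames below are illustrative, and this is an approximation of the Go logic, not a verbatim port:

    from urllib.parse import urlsplit

    def trusted_location(url):
        # Approximates the trustedLocation() check in moby's registry code:
        # only https docker.com / docker.io hostnames keep credentials.
        parts = urlsplit(url)
        if parts.scheme != "https":
            return False
        hostname = parts.hostname or ""
        return hostname in ("docker.com", "docker.io") or \
            hostname.endswith(".docker.com") or hostname.endswith(".docker.io")

    def headers_for_redirect(original_headers, redirect_url):
        headers = dict(original_headers)
        if not trusted_location(redirect_url):
            headers.pop("Authorization", None)  # drop the bearer token for third parties
        return headers

    print(trusted_location("https://cdn.docker.com/some/blob"))                    # True
    print(trusted_location("https://mirror.example.org/cloudflare/some/blob"))     # False
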
<mwhahaha> the alternative way is to use docker registry as a passthrough cache  [19:27]
<mwhahaha> have we looked into that (rather than apache)?  [19:27]
<clarkb> mwhahaha: yes, there is no way to prune the registry in that case  [19:27]
<mwhahaha> figures  [19:28]
<mwhahaha> you can query the catalog and write a pruner  [19:28]
<mwhahaha> but yea, i guess that's lame  [19:28]
<clarkb> the thing they have built in requires you to stop the service, aiui  [19:28]
<clarkb> and so you take an outage while you do a bunch of io  [19:29]
<mwhahaha> though you could round-robin them  [19:29]
<mwhahaha> to handle that  [19:29]
<mwhahaha> anyway /me goes back to checking on headers  [19:29]
<clarkb> maybe, it would be nice to not overcomplicate this.  [19:29]
<mwhahaha> too late  [19:29]
<clarkb> after reading the upstream code I think we can probably drop the Age: header  [19:29]
<clarkb> since they are special-casing things poorly in the docker code  [19:29]
<mwhahaha> i mean you should be able to drop the age header in the Dockerv2 config  [19:29]
<clarkb> yup, that is what I mean  [19:30]
<mwhahaha> and the risk should be contained  [19:30]
<mwhahaha> Age: 1270269  [19:30]
<mwhahaha> so i see that  [19:30]
<clarkb> we can also drop the authorization header on the request side for requests to cloudflare  [19:30]
<mwhahaha> let me see if it's like user-agent specific  [19:30]
<clarkb> (maybe do both)  [19:30]
<clarkb> mwhahaha: ++  [19:30]
<clarkb> if it's UA specific we can change the UA header and lie in the proxy instead  [19:31]
<clarkb> or maybe your client can do that  [19:31]
<clarkb> in general I think the rule that docker upstream is trying to encode is "if we've been redirected to a third-party service then remove authorization, as the authorization token is valid only for the origin". Since we proxy everything through the same host, the most accurate way to encode that may be to drop Authorization headers on the cloudflare prefix  [19:34]
<mwhahaha> podman also gets the Age but its user agent is libpod/1.6.4  [19:36]
<mwhahaha> so docker cli is doing something special  [19:36]
<clarkb> oh, this gets more interesting: they redirect to cloudflare using a docker.com name so that would be trusted, but because we don't have that same suffix the docker client drops authorization  [19:40]
<clarkb> mwhahaha: are you able to test easily if dropping authorization drops the age header?  [19:40]
<clarkb> I'm working on a change now to drop the age header on the proxy, as I'm beginning to think that is most accurate for us  [19:41]
<mwhahaha> so i get an Age: but it's less than 14400 when i just switched the user agent  [19:42]
<clarkb> switched to the docker UA?  [19:43]
<mwhahaha> yea  [19:43]
<clarkb> and fetching the same blob with a different UA has a larger Age?  [19:43]
<mwhahaha> yea  [19:43]
<clarkb> wow  [19:43]
<mwhahaha> going to double-check but i got numbers less than the 14400  [19:44]
<clarkb> I've double-checked my tcpdump and it isn't set at all there  [19:44]
<mwhahaha> nm, it's consistent with the UA  [19:49]
<mwhahaha> i had one item ETag: 3290ca17424cdcfe2a49209035d13f8b with an age of 171871  [19:49]
*** xek has joined #openstack-infra19:49
<mwhahaha> then i reran with the current ua and it's 172246  [19:49]
<mwhahaha> let me try dropping the auth header  [19:49]
<fungi> okay, so at least no shenanigans related to the ua string  [19:51]
<clarkb> mwhahaha: I need to pop out for a bit now, but you might try applying https://review.opendev.org/743835 to your test setup and see if that is happier  [19:53]
<mwhahaha> the auth header is required for us to fetch the blobs  [19:53]
<mwhahaha> removing it broke it  [19:53]
<mwhahaha> so there must be some other type of thing causing the difference  [19:54]
<clarkb> hrm, I'll double-check my tcpdumps but I'm fairly certain they weren't there for the docker hub client  [19:54]
<clarkb> mwhahaha: they need to be there on the first pre-redirect request  [19:55]
<clarkb> but ya, I don't see them on the post-redirect requests in my tcpdump capture  [19:56]
<clarkb> they are there for the pre-redirect requests  [19:56]
*** doggydogworld has quit IRC19:57
<mwhahaha> they might be handling the redirect separately, whereas ours is probably getting handled under the covers by python requests  [20:07]
<clarkb> ya, see my github.com/moby links above, they handle it explicitly  [20:08]
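
What "handling the redirect separately" could look like in python requests terms, sketched under the assumption that the blob GET answers with a 307 to the CDN: make the first request with redirects disabled and the bearer token attached, then follow the Location yourself without the Authorization header, as the docker client effectively does. The function and names are illustrative, not the actual tripleo code:

    import requests

    def fetch_blob(session, blob_url, token):
        # First request carries the bearer token but does not auto-follow redirects.
        response = session.get(
            blob_url,
            headers={"Authorization": "Bearer %s" % token},
            allow_redirects=False,
        )
        if response.status_code in (301, 302, 303, 307, 308):
            # Follow the redirect to the CDN/mirror location *without* the token,
            # keeping the second request anonymous and therefore cacheable.
            response = session.get(response.headers["Location"])
        response.raise_for_status()
        return response.content
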
*** eolivare has quit IRC20:23
<clarkb> mwhahaha: why do we need to remove the proxypassreverse line?  [20:35]
<mwhahaha> the </Location> line is bad  [20:35]
<clarkb> oh wait, I see  [20:35]
<clarkb> ya, the quote context rendered weird for me  [20:35]
<mwhahaha> i'm not getting any caching, but i'm sure i've messed something up trying to hack a config out of this  [20:35]
<clarkb> I've updated the change, and if it passes our testing I'll try to apply it manually to a server (mirror.iad.rax.opendev.org is what I've been reading logs on so far)  [20:37]
<mwhahaha> it definitely removes the Age  [20:37]
<mwhahaha> and it "works" to get content  [20:37]
<mwhahaha> my caching just isn't working  [20:37]
<clarkb> it's possible that removing the age happens after apache considers whether it should be cached  [20:37]
<clarkb> that would be unfortunate if so  [20:37]
<clarkb> fungi: ^ do you know about the order of ops there?  [20:37]
*** dchen|away is now known as dchen20:38
<mwhahaha> i'm trying to cheat by running httpd in a container, so it might just be me on the caching thing  [20:38]
*** ociuhandu has joined #openstack-infra20:39
*** ociuhandu has quit IRC20:43
*** dchen is now known as dchen|away20:48
<mwhahaha> i feel like i'm missing a rewrite rule or something, because the cloudflare stuff isn't being redirected to use the proxy  [20:51]
<clarkb> mwhahaha: the proxypassreverse line is important for that, as it will rewrite the 307 location to be the proxy  [20:54]
<mwhahaha> yea, it feels like the location bit might have broken it  [20:55]
* mwhahaha checks  [20:55]
<mwhahaha> cause it was "working" previously  [20:55]
<mwhahaha> it wasn't actually caching but i was seeing transit on the wire  [20:55]
<clarkb> "When used inside a <Location> section, the first argument is omitted and the local directory is obtained from the <Location>." is what the ProxyPassReverse docs say, so it should work  [20:56]
* mwhahaha wonders if the docs are full of lies  [20:56]
<clarkb> https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypassreverse  [20:56]
<clarkb> that could be :)  [20:57]
<mwhahaha> yea, the <Location> block broke the redirect  [20:58]
<mwhahaha> removing the <Location> block worked  [20:59]
<clarkb> fungi: any idea what may be happening there?  [20:59]
<mwhahaha> i think the proxypass bits need to be top level  [20:59]
<mwhahaha> because it would only do that on requests for /cloudflare/  [20:59]
<mwhahaha> instead of matching the reverse for responses from docker.io  [20:59]
* mwhahaha tests  [20:59]
<clarkb> ah yup, I bet that is it  [21:00]
<clarkb> if it wasn't a 307 it would be fine, but the / rules create the Location that we need proxypass to apply to  [21:00]
<clarkb> *proxypassreverse to apply to  [21:00]
<clarkb> mwhahaha: the proxypass stays in the location with the other rules, but the proxypassreverse moves out and gets the prefix back  [21:01]
<clarkb> I think  [21:01]
<mwhahaha> yea  [21:01]
<mwhahaha> i just wrapped the age in a location but you could probably leave the CacheEnable in the location block as well  [21:02]
<clarkb> mwhahaha: updated the change if you want to double-check what I did with it  [21:03]
<mwhahaha> k, let me try that  [21:03]
<mwhahaha> yea, that'll work  [21:04]
* mwhahaha tries to fix his cache  [21:04]
<fungi> sorry, was on a brief taco break, back now  [21:06]
<clarkb> fungi: I think we've got it sorted out. Now just waiting on updated test results before I use mirror.iad.rax as a real-world check  [21:07]
*** slaweq has quit IRC21:07
<fungi> yeah, proxypassreverse won't be the same location  [21:08]
<fungi> i only half know what i'm talking about though, apache mod_proxy is a bit of voodoo and the docs are sometimes opaque  [21:09]
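
For reference, the effect of the ProxyPassReverse placement being debated here is just a rewrite of the registry's 307 Location header back onto the mirror's /cloudflare/ prefix, so it has to apply to responses from the / proxy rules, not only to requests already under /cloudflare/. A tiny Python sketch of that mapping, with made-up hostnames standing in for the real backend and mirror:

    def rewrite_location(location,
                         upstream="https://cdn.example.net/",                 # backend named in ProxyPassReverse
                         mirror="https://mirror.example.org/cloudflare/"):    # the proxy's own prefix
        # Equivalent in spirit to ProxyPassReverse: map the backend URL in the
        # 307 Location header back onto the mirror so the client's next
        # request also goes through the proxy.
        if location.startswith(upstream):
            return mirror + location[len(upstream):]
        return location

    print(rewrite_location("https://cdn.example.net/blobs/sha256:abc123?verify=x"))
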
*** xek has quit IRC21:10
<mwhahaha> so CacheEnable i think has to be outside as well  [21:23]
<mwhahaha> maybe not tho  [21:23]
<clarkb> I've applied it to mirror.iad.rax and I'm not seeing it cache (yet at least)  [21:34]
<clarkb> we haven't regressed the docker client though  [21:34]
<clarkb> redirects seem to be working and going through the proxy  [21:35]
<clarkb> I can try to unset authorization on the cloudflare location  [21:36]
<clarkb> adding RequestHeader unset Authorization to the cloudflare location doesn't seem to change anything? Though I reloaded and didn't restart  [21:41]
<clarkb> ya, I'm wondering if it is something else? or maybe a number of things and we've addressed some of it?  [21:41]
<clarkb> mwhahaha: the other difference that stood out to me was the accept-encoding difference. The response isn't encoded with either accept encoding, and maybe apache won't cache something that has an inappropriate encoding?  [21:42]
<mwhahaha> maybe  [21:42]
<mwhahaha> i just installed docker and a pull cached (or at least had cache responses)  [21:42]
<mwhahaha> podman didn't  [21:42]
<mwhahaha> which was weird  [21:43]
<mwhahaha> so something is going on  [21:43]
<clarkb> I'm setting mirror.iad back to normal now  [21:49]
<clarkb> should be back now. Disabled mod_headers too  [21:50]
<mwhahaha> I think it might be the Accept-Encoding: identity  [21:52]
<mwhahaha> let me test that  [21:52]
*** artom has quit IRC22:04
<mwhahaha> so yes, it's the authentication header  [22:07]
<mwhahaha> and dropping it is a pain  [22:08]
* mwhahaha flips tables  [22:08]
<clarkb> hrm, I tried dropping it in the apache config and it didn't seem to help. maybe I did it wrong  [22:09]
<mwhahaha> yea, i don't think you can drop it in apache  [22:09]
<mwhahaha> it likely does it in the wrong spot  [22:09]
<clarkb> that could be  [22:09]
<mwhahaha> i can add a bunch of hack code in  [22:10]
<mwhahaha> but i think i'll do that tomorrow  [22:10]
<mwhahaha> it means we have to check if the blob response is a redirect, and if it is, don't just follow it but drop the auth, since we're likely switching domains  [22:10]
<mwhahaha> right now it works, it's just not cacheable  [22:10]
<mwhahaha> i hate this code so much  [22:10]
<mwhahaha> but podman's pull is no better  [22:11]
<clarkb> mwhahaha: does dropping authorization drop the Age header on the response?  [22:11]
<mwhahaha> let me see, i have the age-dropping config in place  [22:11]
<clarkb> because the response cache-control header does say public, which should mean the authorization on its own is fine (but maybe not if it adds age)  [22:11]
<mwhahaha> let me take that out  [22:11]
<mwhahaha> we might be hitting: "If the response has a status of 200 (OK), the response must also include at least one of the "Etag", "Last-Modified" or the "Expires" headers, or the max-age or s-maxage directive of the "Cache-Control:" header, unless the CacheIgnoreNoLastMod directive has been used to require otherwise."  [22:12]
<clarkb> we should have Etag, Last-Modified, and cache-control with max-age  [22:12]
<clarkb> I don't think they send an Expires header  [22:13]
<mwhahaha> i wasn't setting max-age in the Cache-Control i was sending  [22:13]
<mwhahaha> so i wonder if i needed to do that  [22:13]
<clarkb> I think apache says it takes the lesser of the two values  [22:13]
* clarkb looks for where it says that  [22:13]
<mwhahaha> ah, so we couldn't fix that then  [22:14]
<clarkb> "At the same time, the origin server defined freshness lifetime can be overridden by a client when the client presents their own Cache-Control header within the request. In this case, the lowest freshness lifetime between request and response wins."  [22:14]
<clarkb> mwhahaha: another option may be to crank up the apache logging verbosity and see if it says why it is making cache decisions  [22:14]
<mwhahaha> they send age  [22:15]
<clarkb> https://httpd.apache.org/docs/2.4/mod/core.html#loglevel you can change that value on your test setup  [22:15]
<clarkb> (if I do it in prod we'll get so many logs all at once, but it is a possibility too)  [22:15]
<mwhahaha> http://paste.openstack.org/show/796443/  [22:16]
<mwhahaha> that's what happens with python, let me check docker again  [22:16]
<mwhahaha> docker gets age too  [22:20]
<mwhahaha> i think it's just the Authorization header  [22:20]
<mwhahaha> http://paste.openstack.org/show/796444/  [22:22]
<fungi> https://httpd.apache.org/docs/2.4/mod/mod_cache.html under the CacheStoreNoStore directive: "Resources requiring authorization will never be cached."  [22:24]
<fungi> also includes the same caveat under CacheIgnoreCacheControl and CacheStorePrivate  [22:26]
<clarkb> fungi: what good is cache-control public then?  [22:26]
<clarkb> (that is set on these, and I thought the intent there was to say "yes, this was requested with authorization but it is public data")  [22:26]
<fungi> i'm guessing apache doesn't care  [22:26]
<mwhahaha> can we add CacheIgnoreHeaders Authorization ? :D  [22:27]
<mwhahaha> nope, that'd be too easy  [22:28]
<clarkb> https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.8  [22:28]
<clarkb> mwhahaha: we can do RequestHeader unset Authorization, but I tested that and it didn't work  [22:29]
<clarkb> fungi: ^ the rfc seems to say that cache-control: public means this is allowed  [22:29]
<clarkb> it would be super annoying if apache said "we don't care" :(  [22:29]
<mwhahaha> eh, i'll just fix the code tomorrow  [22:29]
<clarkb> we may be able to disable CacheQuickHandler and then do RequestHeader unset Authorization. I think the reason it is weird there is that by default the cache is handled very early on  [22:35]
<clarkb> but disabling the quick handler would allow more processing to happen and potentially allow us to disable the extra bits  [22:35]
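
The rule clarkb is pointing at in RFC 2616 §14.8 (carried forward into RFC 7234 §3.2): a shared cache may store a response to a request that carried Authorization only when the response explicitly opts in via public, must-revalidate, or s-maxage. A small sketch of that decision as the RFC describes it, separate from whatever Apache's mod_cache actually implements:

    def shared_cache_may_store(request_header_names, response_cache_control):
        directives = {d.strip().split("=")[0] for d in response_cache_control.lower().split(",")}
        if "authorization" not in {h.lower() for h in request_header_names}:
            return True  # no credentials involved, normal caching rules apply
        # With an Authorization header on the request, the response must
        # explicitly allow shared caching.
        return bool(directives & {"public", "must-revalidate", "s-maxage"})

    # Per the RFC, the dockerhub responses (Cache-Control: public, max-age=14400)
    # should be storable even though the request was authenticated.
    print(shared_cache_may_store(["Authorization"], "public, max-age=14400"))  # True
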
*** rcernin_ has joined #openstack-infra22:35
<mwhahaha> https://review.opendev.org/#/c/743629/  [22:39]
<mwhahaha> if you want to watch logs for that  [22:39]
<mwhahaha> in theory it's what does it  [22:39]
<mwhahaha> it's terrible  [22:39]
<mwhahaha> meh, i broke something  [22:42]
*** rcernin_ has quit IRC22:48
*** rcernin has joined #openstack-infra22:48
*** tkajinam has joined #openstack-infra22:53
*** ociuhandu has joined #openstack-infra23:01
*** ociuhandu has quit IRC23:06
*** piotrowskim has quit IRC23:12
*** rlandy has quit IRC23:16
*** tosky has quit IRC23:21
*** dchen|away is now known as dchen23:24
*** ramishra has quit IRC23:52

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!