Wednesday, 2020-08-19

*** DSpider has quit IRC		00:12
*** markmcclain has quit IRC		00:13
*** markmcclain has joined #opendev		00:15
*** markmcclain has quit IRC		00:24
ianw	i've dug a bit on the manylinux arm page size issue and collated a few things into https://github.com/pypa/manylinux/issues/735	01:33
*** ysandeep\|away is now known as ysandeep		01:46
openstackgerrit	Ian Wienand proposed openstack/project-config master: Move github reporting to checks API https://review.opendev.org/746827	01:59
*** Dmitrii-Sh9 has joined #opendev		02:04
*** cloudnull8 has joined #opendev		02:04
*** cloudnull has quit IRC		02:04
*** cloudnull8 is now known as cloudnull		02:04
*** smcginnis has quit IRC		02:05
*** weshay\|ruck has quit IRC		02:05
*** weshay has joined #opendev		02:06
*** Dmitrii-Sh has quit IRC		02:06
*** prometheanfire has quit IRC		02:06
*** Dmitrii-Sh9 is now known as Dmitrii-Sh		02:06
*** smcginnis has joined #opendev		02:06
*** prometheanfire has joined #opendev		02:07
*** shtepanie has quit IRC		03:16
prometheanfire	ianw: ta	03:31
ianw	lgtm, i hope to release after that and we can have some new things	03:35
prometheanfire	yep	03:43
prometheanfire	I have been using it on a packet host, hope to build an ironic image soon there	03:43
prometheanfire	also, osuosl	03:43
*** raukadah is now known as chkumar\|rover		04:25
*** cmurphy is now known as cmurphy_afk		04:32
openstackgerrit	Merged openstack/diskimage-builder master: update gentoo to allow building arm64 images https://review.opendev.org/746000	04:35
*** cloudnull is now known as kecarter		05:19
*** kecarter is now known as cloudnull		05:19
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: [wip] json edit https://review.opendev.org/746834	05:27
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: [wip] json edit https://review.opendev.org/746834	05:30
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: [wip] json edit https://review.opendev.org/746834	05:40
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: [wip] json edit https://review.opendev.org/746834	05:56
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: [wip] json edit https://review.opendev.org/746834	06:05
*** DSpider has joined #opendev		06:17
*** lpetrut has joined #opendev		06:51
*** moppy has quit IRC		08:01
*** moppy has joined #opendev		08:01
openstackgerrit	Merged opendev/elastic-recheck master: Bumped flake8 https://review.opendev.org/729328	08:10
*** dtantsur\|afk is now known as dtantsur		08:16
openstackgerrit	Merged zuul/zuul-jobs master: terraform: Add parameter for plan file https://review.opendev.org/746657	08:21
openstackgerrit	Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Resolve unsafe yaml.load use https://review.opendev.org/730389	08:24
*** tosky has joined #opendev		08:28
*** hipr_c has quit IRC		08:31
openstackgerrit	Merged opendev/elastic-recheck master: Replace pep8 jobs with linters https://review.opendev.org/731497	08:37
openstackgerrit	Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Resolve unsafe yaml.load use https://review.opendev.org/730389	08:52
*** ryo_hayakawa has joined #opendev		08:58
*** ryohayakawa has quit IRC		09:01
openstackgerrit	Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: WIP: Create elastic-recheck container image https://review.opendev.org/729623	09:02
zbr	something is unclear to me, why https://hub.docker.com/r/opendevorg/python-base and builder seem not to have python installed on them by default.	09:19
zbr	every time I build, it seems to install python3..., twice.	09:21
*** ysandeep is now known as ysandeep\|lunch		09:35
*** ysandeep\|lunch is now known as ysandeep		09:55
*** ysandeep is now known as ysandeep\|brb		10:27
*** Ankita2910-contr has joined #opendev		10:49
*** ysandeep\|brb is now known as ysandeep		10:49
*** Ankita2910-contr has quit IRC		10:50
openstackgerrit	Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Enable configuration via environment variables https://review.opendev.org/746890	11:10
tosky	I see a significant number of failures on rax nodes because they can't find glance-store 2.2.0	11:13
*** ryo_hayakawa has quit IRC		11:28
*** ryohayakawa has joined #opendev		11:29
*** ryohayakawa has quit IRC		11:44
*** hashar has joined #opendev		12:01
openstackgerrit	Merged opendev/elastic-recheck master: Resolve unsafe yaml.load use https://review.opendev.org/730389	12:02
*** Gyuseok_Jung has quit IRC		12:06
*** sshnaidm is now known as sshnaidm\|afk		12:26
tosky	now gra1 mirror doesn't find neutron-lib 2.5.0	12:37
*** weshay is now known as weshay\|interview		12:47
frickler	tosky: we only proxy from pypi, so likely some issue in their cdn	12:59
tosky	frickler: I thought so	13:01
tosky	is there a way to tell them about it? I've been hit by the glance-store issues for the last 2 days, and I'm probably not the only one	13:02
tosky	I guess it may resolve itself, but...	13:02
*** weshay\|interview is now known as weshay		13:10
openstackgerrit	Emilien Macchi proposed openstack/project-config master: Disable E208 for now https://review.opendev.org/746920	13:17
openstackgerrit	Emilien Macchi proposed openstack/project-config master: Re-introduce puppet-tripleo-core group https://review.opendev.org/746759	13:17
*** weshay is now known as weshay\|ruck		13:21
*** priteau has joined #opendev		13:31
frickler	tosky: do you have some link to your failures? seems unlikely that it should always hit the same pkg, so maybe there's some other cause after all	13:32
tosky	https://zuul.opendev.org/t/openstack/build/72943719365a49c0870ca01b8f4990bf	13:38
tosky	https://zuul.opendev.org/t/openstack/build/54153ce8cbe840f5809f8f3df8931680	13:39
tosky	https://zuul.opendev.org/t/openstack/build/7ce8bd56d0204d4da4489f0d18e29cb3	13:39
tosky	https://zuul.opendev.org/t/openstack/build/07676749949d46e08f03cf701874e4bc	13:40
openstackgerrit	Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Enable configuration via environment variables https://review.opendev.org/746890	13:41
tosky	I've also seen a few of them related to other packages (oslo-messaging 12.1.2, and neutron-lib 2.5.0)	13:41
tosky	I may dig into them	13:41
tosky	probably not worth it if tomorrow everything works	13:41
tosky	see also the report on #openstack-infra just now	13:42
*** dulek has joined #opendev		13:44
openstackgerrit	Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Filter bugs shown in graphs based on regex https://review.opendev.org/257633	13:49
Open10K8S	Hi infra team	13:52
openstackgerrit	Emilien Macchi proposed openstack/project-config master: Disable E106 & E208 for now https://review.opendev.org/746920	13:52
Open10K8S	Hi team. The zuul checking is failing now. WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='mirror.ca-ymq-1.vexxhost.opendev.org', port=443): Read timed out. (read timeout=60.0)",)': /pypi/simple/cffi/	13:53
openstackgerrit	Emilien Macchi proposed openstack/project-config master: Re-introduce puppet-tripleo-core group https://review.opendev.org/746759	13:53
Open10K8S	seems mirror.ca-ymq-1.vexxhost.opendev.org is unreachable or the connection is unstable	13:53
frickler	infra-root: the only thing I can note about these pypi issues is that the not-found wheels don't appear in our self-built wheel list, neither in https://mirror.dfw.rax.opendev.org/wheel/ubuntu-20.04-x86_64/glance-store/ nor https://mirror.dfw.rax.opendev.org/wheel/ubuntu-18.04-x86_64/glance-store/ e.g.	13:53
frickler	Open10K8S: yes, our 2001:db8 issue has reappeared, on it	13:54
Open10K8S	What is the reason, and what can I do? :)	13:54
Open10K8S	infra-root: frickler: it has been happening frequently recently	13:55
frickler	mnaser: maybe you can track that mac? http://paste.openstack.org/show/796950/	13:56
frickler	Open10K8S: mnaser seemed to assume some neutron bug, not sure if you have internal access to vexxhost, too?	13:57
mnaser	i think we'll have to chase this down :\	13:57
frickler	Open10K8S: I removed the stray IPs from that mirror host, so it should be working again now until the next incident	13:57
mnaser	that mac address is useful, it will help us start narrowing it down	13:58
Open10K8S	frickler: so you mean you removed extra ipv6 address?	13:59
Open10K8S	ok, let me recheck PSs	13:59
frickler	Open10K8S: yes	14:02
frickler	mnaser: it started a bit earlier with the other prefix http://paste.openstack.org/show/796951/ . I'd not be surprised if that was one of our neutron jobs	14:03
* frickler needs to afk, bbl		14:03
*** hashar has quit IRC		14:06
*** Ankita2910-contr has joined #opendev		14:24
*** ysandeep is now known as ysandeep\|away		14:29
Open10K8S	frickler: ok, good now	14:32
Open10K8S	thank you	14:32
*** tkajinam has quit IRC		14:33
*** hashar has joined #opendev		14:33
*** Ankita2910-contr has quit IRC		14:33
*** sshnaidm\|afk is now known as sshnaidm		14:42
*** mlavalle has joined #opendev		14:44
openstackgerrit	Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: WIP: Enable configuration via environment variables https://review.opendev.org/746890	14:49
*** qchris has quit IRC		14:56
clarkb	zbr: can you expand on what you are seeing with the python images? we base on the python images which have python3 preinstalled, but they aren't installed from the distro so it could be something is pulling it in	15:03
zbr	clarkb: ahh, that explains why it gets installed. Now my question is why not from distro?	15:04
*** larainema has joined #opendev		15:04
clarkb	zbr: because we are interested in using the python official images and that isn't how they distribute python	15:04
zbr	and this means that i cannot benefit from precompiled packages, unless they publish wheels for them.	15:05
clarkb	yes, those images are intent on python source installs not for use with distro packaging	15:06
clarkb	however	15:06
zbr	seems a bit weird but i guess i can live with it.	15:06
clarkb	I think there is an apt override file that is supposed to make this better	15:06
clarkb	basically it tells apt that python is already installed /me is looking for it	15:06
zbr	for development it is a very inconvenient way to produce containers (slow)	15:06
clarkb	zbr: well the whole system is built that way. That is why we have a builder and a base image	15:07
clarkb	if you don't want that you can just FROM debian and go from there	15:07
clarkb	the idea is the builder image is used to create wheels for everything in a throwaway image, then on the base side we install all the wheels, which keeps the images smaller	15:07
clarkb	but the whole thing assumes a from source installation	15:07
zbr	i will try to follow the same practices and diverge only if really needed	15:07
clarkb	zbr: looks like the apt override is for python3-dev only	15:08
zbr	clarkb: mtreinish: i need your help with a very weird unittest failure unique to py27 at https://review.opendev.org/#/c/746890/	15:08
clarkb	I don't see a problem with also adding a python3 if that helps	15:08
zbr	it is a bit mind-blowing because I see an error related to an exception that was supposed to be caught	15:09
zbr	i wonder if there is some magic stestr issue....	15:09
clarkb	zbr: https://opendev.org/opendev/system-config/src/branch/master/docker/python-builder/Dockerfile#L23-L24 that is the override for python3-dev	15:09
*** qchris has joined #opendev		15:09
zbr	ok i will look at this later, when i am back at container building. now i have some preparatory steps to finish.	15:10
clarkb	zbr: it's a namespace problem. It's ConfigParser.NoOptionError but you import NoOptionError from configparser	15:11
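A minimal sketch of the namespace-safe pattern, assuming Python 3, where the stdlib module is `configparser`. On py27 the stdlib `ConfigParser` module and the `configparser` backport presumably define separate `NoOptionError` classes, so an `except` naming one will not catch the other. Referencing the exceptions through the same module that built the parser avoids that. The `data_source`/`es_url` names are hypothetical.

```python
import configparser


def get_option(parser, section, option, default=None):
    """Return an option value, or `default` if the section or option is absent.

    The exceptions are taken from the same module that created `parser`,
    so the except clause matches what `parser.get()` actually raises.
    """
    try:
        return parser.get(section, option)
    except (configparser.NoSectionError, configparser.NoOptionError):
        return default


if __name__ == "__main__":
    cfg = configparser.ConfigParser()
    cfg.read_string("[data_source]\nes_url = http://localhost:9200\n")
    print(get_option(cfg, "data_source", "es_url"))
    print(get_option(cfg, "data_source", "missing", "fallback"))
```

On Python 3 alone, `parser.get(section, option, fallback=default)` achieves the same without an explicit `try`.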
clarkb	frickler: tosky: ya we seem to be seeing that more often. Basically pypi isn't serving the requires python info on the indexes	15:12
zbr	what i used in the past with other python projects was alpine plus all system packages available and wheels for others.	15:12
zbr	haha, lol. thanks	15:12
*** chkumar\|rover is now known as raukadah		15:14
openstackgerrit	Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Enable configuration via environment variables https://review.opendev.org/746890	15:15
zbr	clarkb: when do you think we could remove py27 support? the sooner the better. Can we remove it before we have the container? AFAIK e-r is not redeployed with CD, so we could live for a while without supporting py27 on master anymore?	15:16
zbr	asking because there is a high amount of py27 junk i would like to scrap...	15:17
clarkb	this is why I originally suggested we just change the pip install in the puppet to pip3 ...	15:18
clarkb	I think it's literally a one line change to get on python3.5, then you can drop most python2 stuff	15:18
clarkb	I think a better end goal is the container so didn't argue against it, but the original goal was just to get off python2 and that is much simpler	15:19
zbr	clarkb: i remember trying this but i went into a chain of issues related to puppet. ideally we should switch to py36	15:21
clarkb	zbr: yes a simple linter failure	15:22
openstackgerrit	Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Enable configuration via environment variables https://review.opendev.org/746890	15:22
clarkb	I've rechecked the change so we can get logs again	15:22
zbr	i saw. if we fix this it will ease the other steps.	15:23
openstackgerrit	Merged openstack/project-config master: Disable E106 & E208 for now https://review.opendev.org/746920	15:28
fungi	Open10K8S: mnaser: frickler: catching up on scrollback now, but i see there are still stray v6 global prefixes on the ca-ymq-1 mirror. should i go ahead and delete them?	15:28
mnaser	fungi: i think we're hitting it a lot more often sadly	15:28
clarkb	mwhahaha: weshay\|ruck: fwiw I still see single IPs in roughly the same timeframe (so single job node) requesting the same docker shas multiple times	15:29
clarkb	it's possible caching the layers and reusing them without another network round trip would also help with request limits	15:29
mnaser	fungi: but yes, please do	15:31
fungi	mnaser: looks like we have nexthop via fe80::ce2d:e0ff:fe0f:74af and fe80::ce2d:e0ff:fe5a:d84e currently	15:31
mnaser	fungi: those are the correct next hops though	15:32
mnaser	fungi: the only addrs that should be there are 2604:e100:...	15:32
fungi	okay, so we just need to delete the 2001:db8:0:3::/64 and 2001:db8:1::/64 aliases, got it	15:32
mnaser	yes	15:33
fungi	mnaser: oh! i think i see the issue. we've deleted the interface aliases but not the associated host routes	15:33
mnaser	oh heck, so there's.. 4 default routes?	15:34
fungi	nope, not default routes, interface local routes	15:34
fungi	http://paste.openstack.org/show/796957/	15:35
fungi	so maybe those aren't particularly disruptive as long as the mirror never needs to talk to some address within one of those cidrs	15:35
fungi	which it shouldn't, those are from a reserved aggregate anyway	15:35
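The cleanup here was done by hand. As a hedged sketch, both the stray aliases and the leftover interface-local routes can be picked out by checking them against 2001:db8::/32, the IPv6 documentation prefix reserved by RFC 3849. The function name, the interface name and the input lists are hypothetical; in practice the inputs would come from `ip -6 addr` and `ip -6 route`. The sketch only prints commands and does not run them.

```python
import ipaddress

# RFC 3849 reserves 2001:db8::/32 for documentation; nothing real should live there.
DOCUMENTATION_NET = ipaddress.ip_network("2001:db8::/32")


def stray_cleanup_commands(addresses, routes, dev):
    """Build `ip -6` commands that drop documentation-prefix addresses and routes.

    addresses: interface addresses in CIDR form, e.g. "2001:db8:1::5/64"
    routes:    route destinations in CIDR form, e.g. "2001:db8:1::/64"
    dev:       interface name (hypothetical, e.g. "ens3")
    """
    cmds = []
    for addr in addresses:
        iface = ipaddress.ip_interface(addr)
        if iface.version == 6 and iface.ip in DOCUMENTATION_NET:
            cmds.append(f"ip -6 addr del {iface.with_prefixlen} dev {dev}")
    for dest in routes:
        net = ipaddress.ip_network(dest, strict=False)
        # Check the version first: subnet_of() rejects mixed v4/v6 arguments.
        if net.version == 6 and net.subnet_of(DOCUMENTATION_NET):
            cmds.append(f"ip -6 route del {net} dev {dev}")
    return cmds


if __name__ == "__main__":
    for cmd in stray_cleanup_commands(
        ["2001:db8:1::5/64", "fe80::1/64"], ["2001:db8:0:3::/64", "::/0"], "ens3"
    ):
        print(cmd)
```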
fungi	frickler: on the "missing wheels" those are pure python projects with existing wheels on pypi, so we don't copy them into our pre-built wheels cache	15:38
fungi	we expect them to be fetched from pypi instead	15:38
fungi	the current wheel mirror job basically installs all of openstack/requirements upper-constraints.txt into a virtualenv for the relevant python version and platform, then scrapes the pip log to find out which wheels were not downloaded from pypi and only copies those into our cache	15:40
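The filtering step described here (copy only the wheels pip had to build itself) could look roughly like the sketch below. This is not the actual wheel mirror job. It assumes pip logs a line of the form `Created wheel for <name>: filename=<file>.whl ...` for each wheel it builds locally, while wheels downloaded from PyPI produce no such line.

```python
import re

# Assumed pip log line, e.g.
# "Created wheel for cffi: filename=cffi-1.14.2-cp38-cp38-linux_x86_64.whl size=..."
CREATED_WHEEL = re.compile(r"Created wheel for [^:]+: filename=(\S+\.whl)")


def locally_built_wheels(log_lines):
    """Return wheel filenames pip built itself, i.e. ones not fetched from PyPI.

    Order of first appearance is preserved and duplicates are dropped.
    """
    built = []
    for line in log_lines:
        match = CREATED_WHEEL.search(line)
        if match and match.group(1) not in built:
            built.append(match.group(1))
    return built
```

Only the names returned here would be candidates for the wheel cache. Pure-python projects such as glance-store already ship wheels on PyPI and would never appear.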
openstackgerrit	Sorin Sbarnea (zbr) proposed opendev/puppet-elastic_recheck master: Use py3 with elastic-recheck https://review.opendev.org/729336	15:45
fungi	infra-root: (any anyone else) what do you think about publishing the apache access logs on our mirror servers? would that be reasonably safe? http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016656.html	15:48
*** shtepanie has joined #opendev		15:49
clarkb	fungi: the request was for stats too. Maybe we start with some processed logs to avoid exposing too much? It's also possible that our existing goaccess stuff would be helpful here	15:49
fungi	oh, maybe. can it do mod_proxy cache hit/miss tracking?	15:50
clarkb	I dont know	15:50
clarkb	it's mod_cache in this case, not mod_proxy, but it has a bunch of features and that may be one of them	15:50
fungi	oh, right	15:50
clarkb	there is a cache status panel	15:52
clarkb	and you need to have %C in your log format to enable it (I'm not sure if the default apache log format has %C)	15:52
clarkb	I don't think %C is in the default combined log format	16:01
clarkb	we would have to define our own log format but that should work once done	16:01
clarkb	and if that doesn't work we can probably do similar with some simple scripting to scrape the logs for this specific purpose	16:02
zbr	clarkb: fixing the puppet linter broke the testing job... what a joy	16:17
*** hashar has quit IRC		16:18
clarkb	zbr: I think its unrelated	16:19
zbr	clarkb: it did pass before removing the ::	16:19
clarkb	zbr: in may	16:20
clarkb	https://zuul.opendev.org/t/openstack/build/3af568eeb7ea40739163ee3b9bf6af3d/log/job-output.txt#1913-1916 is the error and it's well before it tries to parse any puppet I think	16:20
clarkb	it's trying to load ssh keys?	16:21
zbr	no idea. when you have time, please also take a look at https://review.opendev.org/#/c/746890/ -- i already know it is not perfect but i think it is a step in the right direction.	16:26
zbr	once we drop py27, i will be able to simplify it even more	16:26
clarkb	I wonder if the issue is related to ssh key format that whole -m pem thing	16:29
clarkb	https://zuul.opendev.org/t/openstack/build/3af568eeb7ea40739163ee3b9bf6af3d/log/job-output.txt#1275 shows there is already a key present	16:30
clarkb	and if that key was generated by zuul it may have the newer format?	16:30
clarkb	anyone know how /home/zuul/.ssh/id_rsa is being generated in jobs?	16:30
clarkb	https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/infra-puppet-beaker-rspec/run.yaml#L8-L25 is where we try to generate a key on the server itself	16:32
clarkb	https://zuul.opendev.org/t/openstack/build/3af568eeb7ea40739163ee3b9bf6af3d/log/job-output.txt#67-69 I think that may be how we get the key in there	16:33
clarkb	ya https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/add-build-sshkey/tasks/create-key-and-replace.yaml#L1-L4 is gonna be with new format key file since we run executors in containers now	16:34
clarkb	zbr: ^ I think that is what changed since may	16:34
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: Generate build ssh keys in PEM format https://review.opendev.org/746994	16:38
clarkb	zbr: ^ I think that will fix it	16:38
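The suspected cause is the key file format: since OpenSSH 7.8, `ssh-keygen` writes the newer `OPENSSH PRIVATE KEY` format by default, and `-m PEM` requests the older PEM encoding. A hedged sketch for telling the two apart by their header line, plus the argv such a key could be generated with. The helper names are hypothetical, and this is not the content of change 746994.

```python
PEM_HEADER = "-----BEGIN RSA PRIVATE KEY-----"
OPENSSH_HEADER = "-----BEGIN OPENSSH PRIVATE KEY-----"


def private_key_format(key_text):
    """Classify a private key by its first line: 'pem', 'openssh' or 'unknown'."""
    lines = key_text.strip().splitlines()
    first = lines[0] if lines else ""
    if first == PEM_HEADER:
        return "pem"
    if first == OPENSSH_HEADER:
        return "openssh"
    return "unknown"


def keygen_argv(path):
    """argv for an unencrypted RSA key in the legacy PEM format (`-m PEM`)."""
    return ["ssh-keygen", "-t", "rsa", "-m", "PEM", "-N", "", "-f", path]
```

Running `private_key_format()` on `/home/zuul/.ssh/id_rsa` in a held node would show which format the job actually received.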
clarkb	fungi: looking at today's proxy_8082 access log on mirror.iad.rax we have about a 79% cache hit rate	16:38
clarkb	not great, but not terrible either	16:39
fungi	any idea if the authenticated requests which are passed through as uncacheable count as cache misses?	16:46
clarkb	fungi: no I'm only counting cacheable things (eg an explicit cache miss or hit is logged)	16:48
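The counting described here (hits over explicit hits plus misses, ignoring everything else) can be sketched as follows. The log format is an assumption: each cacheable request is taken to carry a mod_cache status string containing "cache hit" or "cache miss" somewhere in the line. Lines with neither, such as uncacheable authenticated pass-through requests, are skipped.

```python
def cache_hit_rate(lines):
    """Hit rate over lines that record an explicit cache hit or miss.

    Lines with neither marker (e.g. uncacheable pass-through requests)
    are ignored, so they count as neither hits nor misses.
    """
    hits = misses = 0
    for line in lines:
        lowered = line.lower()
        if "cache miss" in lowered:
            misses += 1
        elif "cache hit" in lowered:
            hits += 1
    total = hits + misses
    return hits / total if total else 0.0
```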
fungi	ahh, okay	16:49
*** hashar has joined #opendev		16:50
*** lpetrut has quit IRC		16:50
clarkb	it seems there are in the basic case 5 requests per blob (I think that number goes down if there are more layers associated with a manifest)	16:52
clarkb	we can cache 1 of them the actual data not metadata transfer	16:52
clarkb	the other 4 are metadata and not cacheable	16:52
clarkb	so if we request the same thing 3 times we go from 5 to 15 requests, 12 of which are to docker hub	16:52
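The arithmetic above as a small worked example. The defaults are the rough observed figures (5 requests per blob, 1 of them the cacheable data transfer), and the function name is illustrative. The 12 upstream requests assume the blob data is already cached; on a cold cache the first data fetch is a miss and adds one more upstream request.

```python
def docker_hub_requests(pulls, per_blob=5, cacheable=1, cache_warm=True):
    """Return (total proxy requests, requests that reach Docker Hub).

    Metadata requests are never cached, so each pull sends them upstream.
    """
    total = pulls * per_blob
    upstream = pulls * (per_blob - cacheable)
    if not cache_warm:
        upstream += cacheable  # the first data fetch is a cache miss
    return total, upstream


if __name__ == "__main__":
    print(docker_hub_requests(3))                    # (15, 12)
    print(docker_hub_requests(3, cache_warm=False))  # (15, 13)
```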
fungi	was this specific to the dockerhub v2 cache, or across all our caches for that server?	16:52
clarkb	just port 8082 which is docker hub v2	16:56
clarkb	looking at this one ip during the time range I was using to find duplicate requests it seems there are ~69 layers 30 of which are requested multiple times	16:58
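A hedged sketch of how such duplicates could be counted for one client IP. It assumes the log has already been reduced to `(ip, request_path)` pairs for the time window of interest (a hypothetical pre-parsing step) and relies on Docker registry v2 blob fetches using paths of the form `/v2/<name>/blobs/<digest>`.

```python
import re
from collections import Counter

# Registry v2 blob fetches look like /v2/<name>/blobs/sha256:<64 hex chars>
BLOB_PATH = re.compile(r"/v2/\S+/blobs/(sha256:[0-9a-f]{64})")


def duplicate_blob_requests(entries, client_ip):
    """Return (distinct blobs, blobs requested more than once) for one client.

    entries: iterable of (ip, request_path) pairs (hypothetical input).
    """
    counts = Counter()
    for ip, path in entries:
        if ip != client_ip:
            continue
        match = BLOB_PATH.search(path)
        if match:
            counts[match.group(1)] += 1
    repeated = sum(1 for n in counts.values() if n > 1)
    return len(counts), repeated
```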
*** dtantsur is now known as dtantsur\|afk		17:01
johnsom	Hi all, I just wanted to mention we have another case where the python package cache seems to have gone wrong: https://zuul.opendev.org/t/openstack/build/cd7e3192f79843c9bb2752037f0b880e/log/job-output.txt#6956	17:10
johnsom	Could not find a version that satisfies the requirement glance-store===2.2.0	17:11
johnsom	Though it was posted on pypi on the 13th (according to pypi).	17:11
johnsom	Just wanted to mention it. We are rechecking, so should be fine, just another oddity case.	17:11
clarkb	johnsom: ya others have been reporting similar. We may also have pypi serving indexes without requires python metadata so sometimes python2 or python3.5 installs a too new package	17:12
johnsom	Thanks!	17:14
fungi	#status log ethercalc service restarted following exportCSV crash at 16:42:59 utc	17:18
openstackstatus	fungi: finished logging	17:18
johnsom	Looks like the same with neutron-lib 2.5.0, which pypi says was out Aug 6th. hmmm	17:18
fungi	infra-root: the backtrace on that looks basically the same as the last one	17:18
johnsom	Here is the link to the new neutron-lib failure: https://78a56d3383964b9a61ff-8075c89a724b2b68e4da309a54349f49.ssl.cf1.rackcdn.com/746418/4/check/octavia-v2-dsvm-scenario/ad79a09/job-output.txt	17:19
johnsom	Different "mirror" instance than the last one.	17:19
clarkb	johnsom: ya its all proxies to pypi	17:20
clarkb	johnsom: it seems that pypi's CDN has problems	17:20
clarkb	fungi: I can't find a way to do a csv export	17:21
fungi	i wish pip freeze recorded the installed version of pip like pip list does	17:21
clarkb	at least not through the ui. I wonder if that is an api call?	17:21
fungi	looks like that job is using distro package pip	17:22
fungi	python3-pip_9.0.1-2.3~ubuntu1.18.04.2_all.deb	17:23
clarkb	I think devstack installs with get-pip.py	17:23
fungi	i don't think pip 9.0 knows how to check that metadata on pypi	17:23
johnsom	I'm not sure which cache package is being used, but does it maybe have a negative-cache setting that is too long?	17:23
clarkb	johnsom: indexes are cached for 10 minutes	17:24
clarkb	fungi: yes that job log shows it installing pip 20.2.2 with get-pip	17:24
fungi	ahh, yeah later it's using get-pip.py after purging the python3-pip distro package	17:24
clarkb	johnsom: you can check all that with your own browser too fwiw	17:26
clarkb	though in this case it didn't give you an index url link which is annoying (depending on how it fails you'll get that)	17:26
fungi	and i don't see the job reinstalling the distro package later, so i guess it is failing that with 20.2.2	17:27
johnsom	It's not these: https://mirror.gra1.ovh.opendev.org/pypi/simple, https://mirror.gra1.ovh.opendev.org/wheel/ubuntu-18.04-x86_64 ?	17:27
clarkb	johnsom: its https://mirror.gra1.ovh.opendev.org/pypi/simple/neutron-lib/	17:28
clarkb	the wheel mirror won't have neutron-lib since we publish wheels to pypi	17:28
clarkb	2.5.0 is there now	17:28
johnsom	Hmm, clicking those links don't get me the file, but this stuff is in my "just enough time between needing to know" purged window. grin	17:29
johnsom	Ah, need to drop the sha	17:30
clarkb	johnsom: you don't get a download clicking on those links? (I do)	17:30
clarkb	but also the issue is with the index not having the requested version not the version itself 404'ing	17:30
clarkb	(you get a different error for that)	17:30
johnsom	Hmm, so I wonder if the first call will "warm" the cache. Maybe we need a pip wrapper that retries a few times. I don't see a native option on pip to do that.	17:31
clarkb	I mean ideally pypi would fix their broken indexes :/	17:31
johnsom	Yep	17:31
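A rough sketch of the retry-wrapper idea, assuming a hypothetical helper rather than anything the jobs use today. `run` and `sleep` are injectable so the retry logic can be exercised without calling pip.

```python
import subprocess
import sys
import time


def pip_install_with_retries(requirements, attempts=3, delay=660,
                             run=subprocess.run, sleep=time.sleep):
    """Run `python -m pip install` up to `attempts` times; return the last exit code."""
    cmd = [sys.executable, "-m", "pip", "install", *requirements]
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = run(cmd).returncode
        if returncode == 0:
            break
        if attempt < attempts:
            sleep(delay)  # wait before re-requesting the (possibly cached) index
    return returncode
```

One design caveat from this discussion: the mirrors cache indexes for 10 minutes, so retries spaced closer than that would most likely be served the same bad index. That is why the default delay here is just over 600 seconds, which makes blind retries an expensive workaround compared with fixing the upstream indexes.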
clarkb	I do think we get a CDN backend identifier in the headers	17:32
clarkb	x-served-by	17:32
johnsom	It's just a bummer for us, the first failure led to recheck that led to another different failure. Wasting zuul resources, etc.	17:32
clarkb	maybe we can request that pip include that info on failed index lookups and then we can take that back to them	17:33
fungi	aha, if you view-source:https://mirror.gra1.ovh.opendev.org/pypi/simple/neutron-lib/ you can see the requires_python metadata in some of the newer urls	17:33
fungi	er, newer package versions	17:33
clarkb	fungi: yup, I double checked that job is using python3.6 where it fails so shouldn't be a python version mismatch	17:34
fungi	right, the most we specify is >=3.6	17:34
clarkb	(because that's the other similar way we see the problem. you have the wrong version of python and pip says the package doesn't exist)	17:34
fungi	or it downloads too new of a package for your version of python and then setuptools throws a fit on install	17:35
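Both failure modes discussed here (a file missing from the index, or a file listed without its `requires_python` metadata) can be checked from the page source. Below is a minimal sketch using the stdlib HTML parser. It assumes the PEP 503 simple-index layout: one `<a>` per file, with the filename as the link text and an optional `data-requires-python` attribute. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class SimpleIndexParser(HTMLParser):
    """Collect (filename, requires_python) pairs from a PEP 503 simple index page."""

    def __init__(self):
        super().__init__()
        self.files = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Attribute values arrive already unescaped (&gt;=3.6 becomes >=3.6).
            self._current = {"requires": dict(attrs).get("data-requires-python"), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.files.append((self._current["text"].strip(), self._current["requires"]))
            self._current = None


def index_files(html):
    """Return [(filename, requires_python or None), ...] for an index page."""
    parser = SimpleIndexParser()
    parser.feed(html)
    parser.close()
    return parser.files
```

For example, checking whether `"neutron_lib-2.5.0-py3-none-any.whl"` appears among the returned names, and whether newer entries still carry a requires-python value, distinguishes a truncated or stale index from one that has lost its metadata.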
clarkb	fungi: maybe we should collect the x-served-by values from mirrors that have hit this and give that to pypa?	17:41
clarkb	I don't know that there is much more debugging we can do other than to tell them something is up and likely with these cdn caches?	17:41
fungi	maybe we could somehow collect the actual page content for those indices?	17:42
fungi	not sure how to go about doing that though	17:42
fungi	unless pip caches them maybe	17:42
clarkb	sometimes if you refresh you get lucky and get the bad content	17:42
clarkb	but that was months ago when this last happened	17:43
johnsom	This error seems to also happen when the internal resolver fails. I wonder if it would be worth adding a -v or two to our pip calls to get more information. Based on this open issue report: https://github.com/pypa/pip/issues/8785	17:43
clarkb	johnsom: but it sees the other packages	17:43
johnsom	That said, I just got a varnish 503 error by digging in their docs, so...	17:44
fungi	ahh, yeah, so suggesting that fastly has a mix of good and problem endpoints in the same pool and is distributing requests across them all	17:44
clarkb	(it lists out all the valid ones it sees in the error)	17:44
clarkb	fungi: yes	17:44
fungi	so in theory if i smash https://pypi.org/simple/neutron-lib/ a bunch from mirror.gra1.ovh.opendev.org i should sometimes get output which is missing neutron_lib-2.5.0-py3-none-any.whl	17:48
clarkb	ya, but it has a 10 minute timeout so it may take a while	17:49
fungi	well, i'm not going through our cache to do that	17:49
clarkb	another option is to wget/curl from that server to pypi?	17:49
clarkb	oh ya that	17:49
fungi	i'm running `wget -SvO- https://pypi.org/simple/neutron-lib/|grep 'neutron_lib-2.5.0-py3-none-any.whl'` from a shell on mirror.gra1.ovh.opendev.org	17:49
fungi	-S gives us the X-Served-By: header	17:50
fungi	(well, all headers, but including that one)	17:50
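The same check can be done as a small polling loop, roughly equivalent to the wget/grep pipeline and to the later "every 10 seconds" loop. This is a hedged Python stand-in, not the script that was actually run. It fetches the index directly (bypassing the mirror's own cache), records whether the expected file is listed, and logs the `X-Served-By` header. The function names and log file name are hypothetical.

```python
import time
import urllib.request


def check_index(body, headers, filename):
    """Return (found, served_by) for one index response."""
    return filename in body, headers.get("X-Served-By", "")


def probe(url, filename, timeout=30):
    """Fetch `url` directly from PyPI and check whether `filename` is listed."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", "replace")
        return check_index(body, resp.headers, filename)


def watch(url, filename, interval=10, rounds=None, logfile="served-by.log"):
    """Probe every `interval` seconds and append one line per result to `logfile`."""
    count = 0
    while rounds is None or count < rounds:
        if count:
            time.sleep(interval)
        found, served_by = probe(url, filename)
        with open(logfile, "a") as log:
            status = "ok" if found else "MISSING"
            log.write(f"{time.strftime('%H:%M:%S')}\t{status}\t{served_by}\n")
        count += 1
```

Usage would be along the lines of `watch("https://pypi.org/simple/neutron-lib/", "neutron_lib-2.5.0-py3-none-any.whl")`, left running in a detached session.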
mnaser	out of curiosity, how has opendev 'forced' image rebuilds when changing the source images inside ci?	17:50
fungi	mnaser: usually with an empty change	17:50
mnaser	we use a similar model, we just made an update in the base image and we want to force a rebuild of an image (we use files: ..)	17:50
johnsom	Oye, another one: https://6c7880f18320551abc31-62593741b623fd737fcd3b17f392bcb8.ssl.cf1.rackcdn.com/746996/3/check/octavia-v2-dsvm-noop-api/24037db/job-output.txt	17:50
mnaser	right, so whats the empty change y'all go for, just to try and mimic :P	17:51
fungi	mnaser: ugly, but we haven't needed to do it often	17:51
clarkb	mnaser: a comment change in dockerfile iirc	17:51
mnaser	like new line in dockerfile? touching the job config in zuul?	17:51
clarkb	as we filter on the files changed	17:51
mnaser	ah, that's reasonable	17:51
fungi	also allows us to date the trigger for the new build	17:52
fungi	since there's basically a record in the git history	17:52
fungi	johnsom: so "ERROR: No matching distribution found for libvirt-python===6.6.0" for that one, and via mirror.dfw.rax.opendev.org	17:54
*** hashar is now known as hasharWineMusic		17:55
fungi	here's an interesting twist though... are these failures all centos-8?	17:57
johnsom	Just that last one	17:57
fungi	last two	17:57
fungi	ahh, no, i was looking at the same log twice	17:57
johnsom	Yeah, the first two were bionic	17:58
fungi	so unrelated, but that most recent one is also looking for the wrong wheel cache url	17:58
fungi	i expect all recent centos-8 builds may be doing that	17:59
fungi	https://mirror.dfw.rax.opendev.org/wheel/centos-8.2-x86_64/	17:59
fungi	that does not exist	17:59
fungi	should be https://mirror.dfw.rax.opendev.org/wheel/centos-8-x86_64/	17:59
fungi	not sure if we need to tweak the pip.conf on those nodes, or add redirects on the mirrors	17:59
johnsom	There is a proposal on the table to switch our main testing to centos as there is no help from Canonical on the project and focal is having networking issues inside our service vms. Most of the main contributors are running centos. But this is an aside and an up coming PTG topic.	18:01
clarkb	fungi: we use an ansible fact for the distro name I think	18:02
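A hedged sketch of the normalization the wheel-cache URL seems to need, based only on the directory names visible in this log: Ubuntu directories use the full release (`ubuntu-18.04-x86_64`), while the CentOS directory uses only the major release (`centos-8-x86_64`), not `centos-8.2-x86_64`. Ansible does expose both `ansible_distribution_version` and `ansible_distribution_major_version`; which fact the real pip.conf template uses is not stated here. The function name is hypothetical.

```python
def wheel_mirror_url(mirror, distro, version, arch):
    """Build https://<mirror>/wheel/<distro>-<release>-<arch>/ (assumed naming scheme).

    CentOS is keyed on the major release only; other distros keep the full version.
    """
    release = version.split(".")[0] if distro == "centos" else version
    return f"https://{mirror}/wheel/{distro}-{release}-{arch}/"
```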
fungi	looks like libvirt-python-6.6.0.tar.gz was uploaded 17 days ago (2020-08-02) so not including it in the simple api means either a very stale response, or truncated/empty response	18:02
clarkb	did it list valid versions?	18:03
clarkb	ifso that implies it wasnt an empty response	18:03
clarkb	sorry I'm just about to sit on the bike and head out. back in a bit	18:03
fungi	go get your bike on	18:04
fungi	i'll keep digging	18:04
johnsom	o/	18:04
fungi	the log does indeed list available versions, but only up to 6.4.0, so missing 6.5.0 and 6.6.0 which are included in the responses i'm seeing	18:05
fungi	so yeah, maybe a truncated response?	18:06
fungi	or else a very, very stale cache	18:06
johnsom	Hmm, ok, something else is going on. I did another recheck on that earlier glance-store 2.2.0 issue, got the same failure again from the same "mirror": https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_5bb/746418/4/check/octavia-v2-dsvm-scenario/5bb9d6f/job-output.txt	18:07
johnsom	So, these two: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_5bb/746418/4/check/octavia-v2-dsvm-scenario/5bb9d6f/ (latest) and https://zuul.opendev.org/t/openstack/build/cd7e3192f79843c9bb2752037f0b880e/log/job-output.txt (earlier today)	18:08
fungi	i'll switch back to trying to repro that one instead of the dfw one	18:09
johnsom	If I click into https://mirror.ca-ymq-1.vexxhost.opendev.org/wheel/ubuntu-18.04-x86_64/glance-store/ it is in fact not there	18:09
*** lpetrut has joined #opendev		18:10
fungi	it wouldn't be	18:10
fungi	we don't copy wheels there if they exist on pypi	18:10
fungi	we only add wheels which aren't on pypi, so generally not anything pure-python and not anything which already publishes appropriate manylinux1 wheels	18:11
johnsom	Ok	18:11
fungi	there are going to be some older glance-store wheels in there from before we started filtering what we copy into it	18:11
fungi	but basically if they exist on pypi we want the pypi cache to be used	18:11
fungi	so far i can't reproduce, but they do seem to have a bunch of caches there in the montreal area	18:14
fungi	though right now all the responses i get there are still chained back to the same cache in baltimore	18:15
fungi	(many second-level caches in montreal serving content from the same cache backend in baltimore)	18:16
*** lpetrut has quit IRC		18:17
fungi	worth noting, the last time this cropped up in the montreal area, we saw similar errors from builds in both ca-ymq-1.vexxhost and bhs1.ovh as they seemed to be hitting the same local fastly endpoints	18:22
fungi	but so far i'm not able to trip it from a shell on the mirrors there	18:23
fungi	i've set up a loop to hit it every 10 seconds and log the cache backends to a file	18:33
fungi	save the wear on my up arrow	18:33
fungi	also this way i can continue trying to trap a failure while also chopping onions for dinner	18:34
fungi	aha, got one!	18:35
fungi	cache-bwi5145-BWI, cache-yyz4537-YYZ	18:35
fungi	so far that's the only hit against cache-yyz4537-YYZ but cache-bwi5145-BWI shows for all the successful copies, so it's likely a problem at the second level	18:36
johnsom	Neat, yeah, I just needed to step away to make lunch. grin	18:36
fungi	i'll hopefully catch a few more and if they're all cache-yyz4537-YYZ then i expect it's the problem there	18:37
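The reasoning here (the first-level node appears on every good response, so the suspect is the second-level node seen only on the bad one) can be sketched as a tally over logged `X-Served-By` values. The function name is hypothetical, and the input is assumed to be `(ok, served_by)` pairs such as the ones the polling loop records.

```python
from collections import Counter


def suspect_backends(results):
    """Find cache nodes seen on failed responses but never on good ones.

    results: iterable of (ok, served_by), where served_by is an X-Served-By
    value such as "cache-bwi5145-BWI, cache-yyz4537-YYZ".
    Returns a Counter of failure counts for nodes that never served a good copy.
    """
    good, bad = set(), Counter()
    for ok, served_by in results:
        nodes = [n.strip() for n in served_by.split(",") if n.strip()]
        if ok:
            good.update(nodes)
        else:
            bad.update(nodes)
    return Counter({node: n for node, n in bad.items() if node not in good})
```

This only narrows things down. A node that later serves good copies (as cache-yyz4537-YYZ eventually did) drops off the list, which suggests an intermittent or per-object problem rather than a consistently broken node.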
fungi	i guess i should set up a similar loop on the dfw.rax mirror server	18:37
fungi	looking for stale libvirt	18:38
fungi	so far it's seen 21 different second-level endpoints reported	18:39
fungi	so that puts the failure rate at being fairly low, i expect	18:39
johnsom	Seems like we are unusually "lucky" then?	18:40
fungi	well, the problem is at our third-level cache. we get unlucky once and then serve that bad copy for 10 minutes before we refresh it	18:41
fungi	i mean, as far as amplifying the error. that'll make it come in spurts of 10 minutes of bad followed by roughly 200 minutes of good	18:42
fungi	as opposed to a random ~5% failure rate	18:43
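A worked calculation can check fungi's estimate of 10 bad minutes followed by roughly 200 good ones. It rests on two simplifying assumptions: exactly one of the N = 21 second-level endpoints seen so far returns a bad index, and each 10-minute refresh of our third-level cache lands on an endpoint uniformly at random. Under those assumptions the average failure rate stays at about 5%. The 10-minute cache only bunches the failures together.

```latex
\begin{aligned}
p &= \frac{1}{N} = \frac{1}{21} \approx 0.048 \\
\mathbb{E}[\text{good refreshes before a bad one}] &= \frac{1-p}{p} = 20
  \;\Rightarrow\; 20 \times 10\ \text{min} = 200\ \text{min good} \\
\text{long-run bad fraction} &= \frac{10}{200 + 10} = \frac{1}{21} \approx 4.8\%
\end{aligned}
```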
fungi	now i've got a similar loop in dfw.rax looking for missing libvirt-python-6.6.0.tar.gz	18:44
fungi	both running in detached screen sessions under my user	18:45
fungi	i'll check back in on them again in a bit	18:45
fungi	johnsom: have you noticed these failures anywhere besides dfw.rax, bhs1.ovh and ca-ymq-1.vexxhost?	18:46
johnsom	No, but I haven't looked very hard for others either	18:46
fungi	no worries, i expect this will at least be a good start	18:46
johnsom	I will poke around our other patches	18:46
frickler	mnaser: infra-root: as a workaround for the vexxhost mirror v6 issue, how about we configure its addr and default route statically and disable its use of RAs? would assume that the two routers are static enough. or can one keep RAs for the default route and just disable address autoconfig? I can do some tests for that tomorrow	18:55
johnsom	fungi: gra1.ovh as well. https://zuul.opendev.org/t/openstack/build/831ab87d3eff48d6b0f96714ad7d5d69/log/job-output.txt	19:15
openstackgerrit	Paul Belanger proposed openstack/diskimage-builder master: Revert "source-repositories: git is a build-only dependency" https://review.opendev.org/747025	19:22
fungi	johnsom: okay, interesting. that one's in france	19:26
fungi	i've added a loop there checking for neutron_lib-2.5.0-py3-none-any.whl	19:29
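Each of these loops asks the same question: does the simple-index page list the expected filename? Under PEP 503 every file appears as an anchor whose text is the filename, so one way to check is to collect the anchor texts instead of grepping the raw HTML. This is a minimal sketch using only the standard library's `html.parser`. The helper names are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class _AnchorTextCollector(HTMLParser):
    """Collect the text of every <a> element (the filenames in a PEP 503 index)."""

    def __init__(self) -> None:
        super().__init__()
        self.files: list[str] = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_anchor and data.strip():
            self.files.append(data.strip())


def listed_files(index_html: str) -> list[str]:
    """Return the filenames listed in a simple-index page, in page order."""
    parser = _AnchorTextCollector()
    parser.feed(index_html)
    parser.close()
    return parser.files


def index_has(index_html: str, filename: str) -> bool:
    """True if `filename` is listed as an anchor in the index page."""
    return filename in listed_files(index_html)
```

Matching whole anchor texts avoids false positives from a filename that shows up only inside an `href` or as a substring of another file's name.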
fungi	revisiting mirror.ca-ymq-1.vexxhost.opendev.org we've gotten a number of good results from cache-yyz4537-YYZ now so i'm really not sure what's going on there	19:31
fungi	and no bad results yet from the loop for mirror.dfw.rax.opendev.org	19:32
*** priteau has quit IRC		19:44
johnsom	Well, I have one job left to finish then I will recheck this quota change patch for the third time and see what happens.	19:49
mnaser	frickler, infra-root: the static routes should be very reliable and extremely stable	19:56
mnaser	infra-root: i am seeing a lot of ceph jobs failing @ vexxhost -- is it possible mirrors are stale there?	19:59
clarkb	mnaser: frickler: that plan seems reasonable, though looking at the server really quickly I'm not sure how ubuntu expects that to be configured these days	19:59
clarkb	mnaser: mirrors should be the same everywhere due to afs	19:59
clarkb	mnaser: can you point to what specifically is breaking?	19:59
mnaser	https://zuul.opendev.org/t/vexxhost/build/05cb819ce0ea485a8229eae309d68c93 was a ceph failure	20:00
clarkb	/etc/netplan/50-cloud-init.yaml seems to be the thing that configures networking on the vexxhost mirror	20:00
clarkb	do we get to learn netplan now :/	20:00
mnaser	it sounds like we're actually pulling from chacra.ceph.com ..	20:00
clarkb	ya you aren't using our mirrors	20:01
mnaser	https://zuul.opendev.org/t/vexxhost/build/c866d0ae94584710a2a2dd60a8e1a4c0 also failed	20:01
mnaser	yeah this seems like a devstack-plugin-ceph issue then	20:01
fungi	Err:19 https://chacra.ceph.com/r/ceph/nautilus/f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9/ubuntu/bionic/flavors/default bionic/main amd64 Packages	20:01
fungi	that's... not us	20:01
mnaser	yeah, i'm just noticing this, i have no idea what that is to be honest	20:01
mnaser	sorry for that noise, i'm checking the plugin	20:03
clarkb	also re the RA thing when limestone had this problem I tried to set up radvd myself and send RAs then catch them across tenants and failed	20:03
clarkb	but that may be worth checking here as well (then we don't have to figure out what nested neutron is doing)	20:03
mnaser	clarkb: i actually think frickler had a tcpdump going that caught them	20:03
mnaser	it's just.. pretty non trivial to figure out :\	20:04
mnaser	also for those who are curious: https://opendev.org/openstack/devstack-plugin-ceph/src/branch/master/devstack/lib/ceph#L882-L910	20:04
clarkb	mnaser: oh ya I mean I'm sure they're happening but if we can send them ourselves when we want to then we (really you) can trace it through the neutron firewalls	20:04
mnaser	clarkb: yeah, at the time the point was finding out the mac address of the system which sent them so we could trace it, so yeah	20:04
fungi	if logstash is up to date and we're indexing the network sanity checks in job logs, the mac address can possibly be found with a logstash search to identify the build/job	20:08
fungi	assuming we were the source of the stray announcements, which i expect we were	20:08
clarkb	ya but those shouldn't be a problem	20:09
clarkb	neutron on the host side should drop them	20:09
* clarkb is reading netplan docs fwiw		20:12
fungi	oh, i completely agree, just hoping if we can get some correlation between ra leakage and specific jobs maybe it'll give us a clue as to how to replicate the problem	20:15
mnaser	fungi: yeah, im hoping that will help me get the host that did it and i can find some useful error message	20:17
clarkb	fwiw I don't think our logstash data is anywhere near current (and ironically due to neutron logs being very large)	20:17
fungi	mnaser: also if you can map it to an instance name (or uuid probably) then i can likely track back through nodepool to zuul and work out which build used it	20:19
mnaser	yeah, i mean i wouldn't be surprised if it was just a normal devstack gate change	20:19
fungi	that's my fear	20:20
fungi	which means the chance of being able to regularly reproduce the problem is low and so a very rare condition	20:20
mnaser	i mean logan- ran into it only once	20:21
mnaser	what can i say though	20:21
mnaser	i'm much more lucky :)	20:22
clarkb	mnaser: well it persisted there	20:22
clarkb	we turned the cloud off for a while and eventually we couldn't reproduce and neutron couldn't do anything so we tried turning it back on again and things were happy	20:22
mnaser	yeah i suspect this won't be fun to find..	20:22
logan-	whoa.. it happened again mnaser?	20:23
mnaser	logan-: like 3 times already :)	20:23
logan-	yikes	20:24
clarkb	2604:e100:1:: is the gateway for the vexxhost mirror?	20:24
mnaser	logan-: and the weird thing is that.. no migrations are happening or anything, so it would only be something that somehow cleared firewall rules and neutron .. readded them	20:24
mnaser	clarkb: the gateways are link local addresses, you should see them right now	20:25
clarkb	mnaser: ok, that's why I was confused	20:25
clarkb	https://etherpad.opendev.org/p/vexxhost-mirror-netplan something like that maybe?	20:25
clarkb	then restart networking with systemd?	20:26
clarkb	I love that we change these things every release	20:26
clarkb	fungi: re reproducing the fastly thing, it's possible that each cdn "endpoint" is really many many servers and we only get lucky if we hit one of them	20:30
clarkb	what I remember from when this has happened before is we don't get the html formatting for the index list. Everything is listed in one line that wraps, none of the requires-python metadata is included, and the list is stale, missing newer releases	20:38
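The symptoms clarkb lists could be turned into a rough check against a fetched index page. This is a heuristic sketch only. It assumes a healthy PyPI simple page separates its entries with `<br/>` tags (so "everything on one line" means those separators are missing), and that it carries `data-requires-python` attributes (PEP 503). Not every project declares requires-python, so a hit on that symptom alone proves little. The function name is hypothetical.

```python
from __future__ import annotations


def stale_index_symptoms(index_html: str, expected_file: str) -> list[str]:
    """Return which of the remembered bad-response symptoms this page shows."""
    symptoms: list[str] = []
    anchors = index_html.count("<a ")
    if anchors > 1 and "<br" not in index_html.lower():
        symptoms.append("entries not separated (renders as one wrapped line)")
    if anchors and "data-requires-python" not in index_html:
        symptoms.append("no data-requires-python metadata")
    if expected_file not in index_html:
        symptoms.append(f"missing {expected_file}")
    return symptoms
```

An empty list means the page looks normal. Requiring several symptoms at once, rather than any single one, is the safer signal of the stale or degraded response described above.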
johnsom	FYI, the third recheck of that quota change patch seems to have got past the pypi cache issue and appears to be running this time.	20:41
clarkb	heh, it's also possible that fungi hitting the CDN all over is ensuring that we keep that data fresh	20:42
johnsom	Hey, if it works! grin	20:42
fungi	yeah, not sure, since that one case i caught from ca-ymq-1.vexxhost i haven't had any more problem responses	20:46
fungi	and i'm testing cases from each of the mirrors where we've seen it every ten seconds for at least a couple hours now	20:46
fungi	also saw non-problematic responses from cache-yyz4537-YYZ after the one bad response	20:48
fungi	possible fastly had some cache propagation issues and they've cleared up since	20:48
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: Generate build ssh keys in PEM format https://review.opendev.org/746994	21:02
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: Add test-add-build-sshkey role https://review.opendev.org/747035	21:02
openstackgerrit	Clark Boylan proposed opendev/base-jobs master: Use test-add-build-sshkey in base-test https://review.opendev.org/747036	21:03
corvus	lists.zuul-ci.org is unhappy for me	21:03
clarkb	corvus: the web ui?	21:04
corvus	y	21:04
clarkb	I'm able to browse it from here	21:04
corvus	sigh. i ended up with an https link, sorry	21:04
clarkb	plenty of free memory on the server so we aren't OOMing again due to bot crawlers	21:04
clarkb	ah ok	21:05
corvus	we should do something about that someday :/	21:05
*** dmsimard7 has joined #opendev		21:14
openstackgerrit	Emilien Macchi proposed openstack/project-config master: Re-introduce puppet-tripleo-core group https://review.opendev.org/746759	21:14
*** dmsimard has quit IRC		21:15
*** dmsimard7 is now known as dmsimard		21:15
fungi	i need to pick back up the mm3 poc, i figured we'd ssl that since it has actual accounts and not just a password it e-mails you	21:28
fungi	also the first time i tried we weren't really deploying containers so i tried it from distro packages, but there are official container images i want to take out for a spin	21:29
* clarkb remembers gerritbot unit was never disabled on review.o.o		21:31
clarkb	doing that now	21:31
fungi	ooh, yep	21:31
fungi	we could just uninstall it	21:31
fungi	nuke it from orbit, the only way to be sure	21:31
clarkb	`sudo systemctl disable gerritbot.service` has been run	21:32
clarkb	I'm not sure we need to do more than that	21:32
fungi	at this point i think we don't have any more obvious/critical problems with the new deployment so i doubt we'd roll back regardless	21:32
fungi	but yeah, as long as nothing ever reenables the unit, should be fine	21:33
*** tosky has quit IRC		21:47
*** hasharWineMusic is now known as hahsar		21:59
*** hahsar is now known as hashar		21:59
ianw	fyi on the arm64 issue, it looks like patchelf, rather than compile-time/build platform : https://github.com/pypa/manylinux/issues/735	22:00
ianw	fungi: catching up ... lmk if i can help with the cache issues. tl;dr seems to be the weird "don't get the right index" but we then cache it, making it worse?	22:00
fungi	yup	22:01
fungi	looks like i've finally caught a couple more hits in montreal, both involved cache-yyz4537-YYZ	22:01
fungi	so far no occurrences in the other regions i'm checking	22:02
ianw	clarkb: https://review.opendev.org/#/c/746994/ the ssh key thing ... i have deja vu	22:12
ianw	https://review.opendev.org/#/c/740841/ was it ... which turned out to be trailing whitespace being stripped, not the key format	22:13
ianw	https://review.opendev.org/#/c/740854/	22:14
clarkb	ianw: wasn't that fixed though?	22:14
ianw	yeah but i found that the format change had been incorporated a long time ago -- where is it failing?	22:15
ianw	iirc, it's fuzzy, but i determined everything we run should understand the new format keys	22:16
clarkb	ianw: in https://review.opendev.org/#/c/729336/ beaker testing	22:16
clarkb	it passed back in may but fails now and I believe we switched from xenial openssh on the executors generating the keys to the newer debian openssh (whatever the zuul docker images are built on) generating them	22:16
clarkb	it may be the ruby lib there not necessarily the system openssh that is failing	22:17
clarkb	ianw: https://zuul.opendev.org/t/openstack/build/a5e7c9c187c748c4a2ae712da56d63e8/log/job-output.txt#1913-1963 is the sort of traceback thing we get from ruby	22:17
ianw	/home/zuul/.bundled_gems/gems/net-ssh-2.9.4/lib/net/ssh/authentication/methods/publickey.rb:19:in `authenticate'	22:17
ianw	yeah ... ok. because yeah, i'm pretty sure i determined xenial even could handle it	22:18
clarkb	and earlier in the job we try to generate a key on xenial using xenial openssh but it fails because we've already written an id_rsa	22:18
clarkb	that's fine if we can use that key though I think	22:18
clarkb	but it seems we can't	22:18
*** shtepanie has quit IRC		22:19
ianw	yeah my only concern is that we'll be using the old format forever because we'll never remove it, to work around a xenial ruby issue	22:19
clarkb	that's fair, but we keep hitting this all over	22:19
clarkb	I had to address it with my gerrit upgrade testing	22:19
clarkb	(gerrit init uses ssh-keygen fork but can only read PEM format so breaks :( )	22:20
clarkb	I guess the common attribute there is it's other ssh tools, not openssh itself	22:20
clarkb	ianw: reading xenial's ssh-keygen manpage I think you are right that it supports the RFC 4716/SSH2 public or private key format	22:22
clarkb	but I think xenial must default to PEM	22:22
clarkb	which is why the ruby stuff worked until we switched to containerized executors	22:22
clarkb	the other place we've seen this is with paramiko	22:23
clarkb	so ya I think it is largely third party ssh tools that trip over it	22:24
ianw	yeah, that sounds about like what i found, i'm not sure if i wrote it down	22:24
ianw	dropped a comment; one idea is that we could have a flag to turn it on? i'm trying to think of somewhere that's a good hook point to enable that ...	22:24
ianw	would it be "if there's a xenial node in the inventory"?	22:25
clarkb	https://github.com/paramiko/paramiko/issues/1313 is the paramiko bug	22:25
clarkb	well I think it has more to do with the tools you use (gerrit's ssh lib, paramiko, ruby net-ssh) than the distro	22:25
clarkb	is there a concern with using PEM if it is more universally understood?	22:26
ianw	not really, other than over time we fall into the other category of "using the less tested path" as everything else moves	22:28
clarkb	ianw: I guess as an alternative we can modify our python jobs to convert the zuul generated key or delete it and regenerate	22:30
clarkb	basically have zuul do the up to date thing and if you need something else push into specific jobs	22:31
clarkb	let me work on a change to do that	22:31
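clarkb's alternative is to leave Zuul generating keys in the current default format and convert them only in jobs whose tools (ruby net-ssh, paramiko, gerrit's ssh library) need PEM. It could look something like the sketch below. It is not the content of 747056. It assumes an RSA key with an empty passphrase, and the helper names are hypothetical. For background, OpenSSH 7.8 changed `ssh-keygen` to write the new OpenSSH private key format by default instead of PEM, and `ssh-keygen -p -m PEM` rewrites an existing key file in PEM format in place.

```python
from __future__ import annotations

import pathlib
import subprocess

OPENSSH_HEADER = "-----BEGIN OPENSSH PRIVATE KEY-----"
PEM_RSA_HEADER = "-----BEGIN RSA PRIVATE KEY-----"


def private_key_format(key_text: str) -> str:
    """Classify a private key by its first line: 'openssh', 'pem' or 'unknown'."""
    lines = key_text.strip().splitlines()
    first = lines[0].strip() if lines else ""
    if first == OPENSSH_HEADER:
        return "openssh"
    if first == PEM_RSA_HEADER:
        return "pem"
    return "unknown"


def pem_conversion_cmd(key_path: str) -> list[str]:
    """ssh-keygen invocation that rewrites key_path in place as PEM.

    -p changes the passphrase (and so rewrites the file), -m PEM selects the
    output format, and -P "" / -N "" assume an empty old and new passphrase.
    """
    return ["ssh-keygen", "-p", "-m", "PEM", "-P", "", "-N", "", "-f", key_path]


def ensure_pem(key_path: str) -> bool:
    """Convert an OpenSSH-format key to PEM; return True if a conversion ran."""
    text = pathlib.Path(key_path).read_text()
    if private_key_format(text) != "openssh":
        return False  # already PEM (or something we don't recognise): leave it
    subprocess.run(pem_conversion_cmd(key_path), check=True, capture_output=True)
    return True
```

This keeps the default path on the format everything else is moving to, which was ianw's concern, and confines the older format to the jobs that actually need it.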
ianw	up to you if you think it's good. i'm happy enough with a comment at least pointing us in the direction of "it's not ssh, but other stuff"	22:31
ianw	This will be out in Paramiko 2.7 (only one more ticket after this one before that's cut!)	22:32
ianw	bitprophet commented on Dec 4, 2019	22:32
ianw	that's not that long ago, unlike the openssh changes	22:33
*** hashar has quit IRC		22:34
ianw	OpenSSH 5.6 was released on 2010-08-23 ... paramiko merged a fix for it in 2019-12 ... so i'll see you back here in ~2030 and we can switch :)	22:36
openstackgerrit	Clark Boylan proposed opendev/system-config master: Convert ssh keys for ruby net-ssh if necessary https://review.opendev.org/747056	22:37
clarkb	ianw: hahaha	22:37
clarkb	ianw: fwiw I'm happy doing ^ instead if that works	22:37
clarkb	ianw: and ya judging by when the paramiko bug shows up I think ssh-keygen on xenial produces a working format by default but then the default output format changes on bionic	22:44
clarkb	xenial and bionic both support both formats, it's more a concern of which they output if you don't specify	22:44
ianw	pabelanger: are you going to need a point release for the git thing?	22:47
openstackgerrit	Clark Boylan proposed opendev/puppet-elastic_recheck master: Use py3 with elastic-recheck https://review.opendev.org/729336	22:48
clarkb	zbr: ianw ^ added a depends on to test the change above	22:49
clarkb	ianw: want to WIP my zuul-jobs change until we get results from ^	22:51
clarkb	I can do it too. Just wondering how concerned we are about that landing early	22:51
ianw	746994?	22:53
clarkb	ianw: the parent https://review.opendev.org/#/c/747035/1	22:53
clarkb	(technically both but gerrit will enforce it if it's on the parent alone)	22:53
*** tkajinam has joined #opendev		22:58
*** DSpider has quit IRC		23:00
*** mlavalle has quit IRC		23:03
openstackgerrit	Ian Wienand proposed openstack/project-config master: Move github reporting to checks API https://review.opendev.org/746827	23:16
openstackgerrit	Clark Boylan proposed opendev/system-config master: Convert ssh keys for ruby net-ssh if necessary https://review.opendev.org/747056	23:28

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!