Friday, 2020-07-31

*** ryohayakawa has joined #opendev		00:05
openstackgerrit	Merged opendev/system-config master: Revert "Cap pytest to <6.0.0 to fix pytest-html" https://review.opendev.org/744070	00:07
*** ryohayakawa has quit IRC		00:11
*** ryohayakawa has joined #opendev		00:27
kevinz	ianw: Hi, I'm online	01:00
kevinz	ianw: What do you mean by "leaked node"?	01:01
ianw	kevinz: hey :) so i can't delete any of the servers at the moment, they all seem stuck in state deleting	01:01
kevinz	ianw: OK, let me check	01:01
ianw	| OS-EXT-STS:task_state | deleting	01:02
ianw	b8dacc2b-36a6-46d2-975e-dc8effc6f12f	01:02
ianw	one example	01:02
kevinz	Aha, yes I see. a lot of	01:03
*** elfenix has quit IRC		01:10
kevinz	ianw: it looks recovered	01:18
kevinz	ianw: The nova compute service can not talk to rabbitmq	01:19
ianw	kevinz: cool, thanks yeah i see it blank now too. let's try pyca/cryptography recheck :)	01:19
kevinz	ianw: np	01:19
kevinz	ianw: I wonder why it could not connect to rabbitmq, and why just restarting nova_compute solves this	01:20
kevinz	ianw: btw I saw that mirror01.regionone.linaro-us.opendev.org has been closed	01:21
ianw	oh ... that should not be? as in shutoff? let me check	01:21
ianw	2020-07-31T01:16:43Z	01:22
ianw	i'm powering it up now ...	01:23
kevinz	ianw: OK	01:23
ianw	kevinz: ^ does that timestamp correlate to anything? i don't think we did anything to shut it down	01:23
kevinz	ianw: I don't think we have some operation at that time...	01:24
kevinz	ianw: let me check the log why it has been closed	01:24
ianw	kevinz: hrm, i'm not sure it's turning on ... the console log is giving me a Unknown Error (HTTP 504)	01:25
ianw	| OS-EXT-STS:task_state | powering-on	01:25
ianw	but ... yeah, it doesn't seem to be	01:25
kevinz	Instance in transitional state powering-off at start-up retrying stop request _init_instance	01:29
kevinz	ianw: looks like this mirror is started now	01:32
ianw	yep; although "console log" on it still shows a 504 error for me, so maybe the console service is unhappy?	01:33
ianw	https://zuul.opendev.org/t/pyca/status	01:34
ianw	pyca-cryptography-centos-8-py36-arm64 node_failure	01:35
kevinz	let me see	01:35
kevinz	and I see that scheduling failed	01:37
ianw	yeah, all of them seem to be going into node_failure	01:41
ianw	do you need the logs from our side? i guess you can see the problem on your side	01:41
kevinz	ianw: I saw that it was already scheduled to one node, and then the compute showed a log like this: Instance spawn was interrupted before instance_claim, setting instance to ERROR state _error_out_instances_whose_build_was_interrupted /var/lib/kolla/venv/lib/python3.7/site-packages/nova/compute/manager.py	01:44
ianw	... hrm ... that message doesn't ring any instant bells for me, sorry	01:45
clarkb	check libvirt log?	01:48
kevinz	clarkb: the process has stopped at nova-compute. it does not kick off the call to libvirt to create the vm	01:50
clarkb	huh not sure then, nova channel may know?	01:54
kevinz	clarkb:ianw: looks recovered now	02:03
kevinz	I see that the problem is due to rabbitmq	02:03
kevinz	all the compute nodes failed to connect to the rabbitmq server, and were stuck creating a connection	02:03
kevinz	ianw: could you help to retry creation? I created 5 instances and they are working fine now	02:04
ianw	kevinz: yep, i've just retriggered some testing for https://zuul.opendev.org/t/pyca/status	02:05
ianw	looks like all the nodes are building, a good sign :)	02:05
kevinz	ianw: yes I see, coo	02:07
kevinz	cool	02:07
ianw	kevinz: not sure if you saw the context for pyca/cryptography testing, mostly described in https://github.com/pyca/cryptography/issues/5339	02:16
kevinz	ianw: thanks for the info. I will take a look at this. It looks like OpenDev can offer external CIs for testing.	02:17
kevinz	This may be a better method to involve more CI jobs outside the OpenStack Foundation	02:19
ianw	yeah, we don't want to become travisci exactly, but contributing where it makes sense on a more "strategic" model	02:20
ianw	https://github.com/pyca/cryptography/pull/5341#issuecomment-666876930 ... yay, results! :)	02:27
*** owalsh has joined #opendev		02:29
*** owalsh_ has quit IRC		02:33
kevinz	\o/	02:46
corvus	"TypeError: 'ellipsis' object is not iterable" neat	02:48
corvus	ianw: have they perhaps dropped py35 support?	02:48
ianw	corvus: yeah, the setup.cfg seems to say it's supported	02:48
ianw	setup.py but yeah	02:49
corvus	yeah, the travis 3.5 builds are passing	02:49
corvus	i wonder why that failed then	02:49
corvus	or, rather, the github actions builds	02:52
corvus	i dunno about travis, i'm not logged in	02:52
ianw	it looks like maybe an issue with the typing library on xenial : https://github.com/python/typing/issues/259	02:53
ianw	hrm, that can't be it ... that must be part of the tox run	02:54
ianw	https://zuul.opendev.org/t/pyca/build/a44b9f3ec9744b8e876a220a3f168961/log/job-output.txt#1071 maybe not, typing isn't listed in pip list	02:55
ianw	oh, hang on, i'm getting confused by the backport package	02:57
ianw	"OK, I see you are using 3.5.2 in CI, then you need to either upgrade to 3.5.3, " xenial is 3.5.3	02:57
ianw	3.5.2 i mean	02:57
ianw	that suggests the travis xenial tests are not using the xenial python	02:58
ianw	right, looks like xenial testing is restricted to python 2.7	03:00
corvus	i'm going to go out on a limb and guess that the pyca folks are gonna be in the "test latest upstream python 3.5" camp and not in the "test what the distros ship" camp	03:01
ianw	ok, well there is no coverage for python3 on xenial afaics anyway. we can switch the xenial test to 2.7 and that would be equivalent of the x86 tests	03:02
ianw	Downloading archive: https://storage.googleapis.com/travis-ci-language-archives/python/binaries/ubuntu/18.04/x86_64/python-3.5.tar.bz2	03:03
ianw	$ python --version	03:03
ianw	Python 3.5.7	03:03
ianw	that's how it gets tested on 3.5 effectively	03:03
*** bhagyashris|away is now known as bhagyashris		03:52
fungi	we have an ensure-python role which installs a built latest minor release using stow, right?	03:59
fungi	or did i imagine that?	03:59
*** DSpider has joined #opendev		05:00
*** Dmitrii-Sh has quit IRC		05:07
*** Dmitrii-Sh has joined #opendev		05:08
*** redrobot has quit IRC		05:17
*** lpetrut has joined #opendev		06:10
ianw	fungi: yeah, but not sure about arm64 support ... and also there's the speed to consider if we do that	06:13
openstackgerrit	OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/744096	06:17
*** Meiyan has joined #opendev		06:32
*** ysandeep|away is now known as ysandeep		06:48
*** ssaemann has joined #opendev		06:50
*** qchris has quit IRC		06:51
*** avass has quit IRC		06:56
*** ssaemann has quit IRC		07:04
*** qchris has joined #opendev		07:05
*** tosky has joined #opendev		07:19
zbr	I managed to get ownership of https://www.reddit.com/r/opendev/ -- we can try to use it to make the community more accessible to others.	07:21
*** ssaemann has joined #opendev		07:28
*** ssaemann has quit IRC		07:36
*** moppy has quit IRC		08:01
*** moppy has joined #opendev		08:01
*** ianw has quit IRC		08:01
*** ianw has joined #opendev		08:02
*** ysandeep is now known as ysandeep|afk		09:07
*** dtantsur|afk is now known as dtantsur		09:19
*** Meiyan has quit IRC		09:46
openstackgerrit	Lajos Katona proposed openstack/project-config master: Import netowrking-l2gw & networking-l2gw-tempest-plugin to x/ https://review.opendev.org/744110	09:53
*** lpetrut has quit IRC		10:17
*** ysandeep|afk is now known as ysandeep		10:37
*** lpetrut has joined #opendev		10:51
*** zbr is now known as zbr|pto		10:56
openstackgerrit	Lajos Katona proposed openstack/project-config master: Import netowrking-l2gw & networking-l2gw-tempest-plugin to x/ https://review.opendev.org/744110	11:03
*** tkajinam has quit IRC		11:36
*** ryohayakawa has quit IRC		11:48
*** ysandeep is now known as ysandeep|brb		12:03
*** ssaemann has joined #opendev		12:10
*** ysandeep|brb is now known as ysandeep		12:16
*** ssaemann has quit IRC		13:58
*** mlavalle has joined #opendev		14:03
*** lpetrut has quit IRC		14:06
mordred	fungi: we have a role for it that uses stow but I think so far only mnaser is taking advantage of it	14:12
mnaser	i don't think we have added it yet sadly :( but i think its .. tested	14:12
mnaser	as in tested inside zuul-jobs, i think	14:12
mordred	nod	14:13
mnaser	is there a way that we can get a nodeset for the vexxhost tenant carved out of our nodepool allocation?	14:39
mnaser	reasoning is the operator stuff is hard to fit into a single 8gb system. the only other option is to deploy over multinode to distribute this stuff but that's a whole another realm of issues	14:40
clarkb	mnaser: vexxhost doesn't currently provide any larger flavors iirc. several other clouds do. You could add the expanded labels to vexxhost then consume them from the global pool?	14:43
clarkb	that said multinode is valuable because it's more like what the operator will do in the real world	14:44
clarkb	more issues yes, but better to address them early?	14:44
mnaser	clarkb: yeah -- i agree on that second statement very much.	14:44
mnaser	we were very stable until we started adding more things and then it wasn't happy, i mean, we run it in multinode right now (operator is how our cloud runs for those services)	14:45
mnaser	but i think this might be something we should do to replicate a more production environment	14:45
mnaser	i think for now we might have to use bigger nodesets just to help us unblock the progress	14:50
mnaser	clarkb: i could swear zuul had a thing where buildset vm needed to be colocated with the jobs that consumed it, is that right?	14:53
mnaser	which means i might need 2 actual vexxhost-specific flavors, as i really don't need a big nodeset holding that..	14:54
clarkb	mnaser: by buildset you mean buildset registry? then yes. This is why I suggest adding to the existing pool of those resources instead	14:56
clarkb	you'll avoid node failures through extra retries and schedule more quickly if there is headroom	14:57
mnaser	clarkb: so using the -expanded thing? i think it looks like openedge and airship are the only ones that maintain it?	14:57
mnaser	and one of those is gone and im not sure about the other	14:57
clarkb	yes and openedge is off for the summer but should be back and airship is still providing those resources if they can be scheduled	14:58
fungi	it's technically citycloud providing them, and yeah the 16gb flavor seems to schedule somewhat reliably there, just not the 32gb one	15:06
openstackgerrit	Mohammed Naser proposed openstack/project-config master: Re-add vexxhost-specific labels https://review.opendev.org/744195	15:11
mnaser	^ we'll unblock ourselves and i'll work on moving those jobs to multi-node over the weekend so we can kill those labels	15:11
*** chkumar|rover is now known as raukadah		15:12
donnyd	Hoping in the next 7 days to have it back online	15:16
openstackgerrit	Mohammed Naser proposed openstack/project-config master: Re-add vexxhost-specific labels https://review.opendev.org/744195	15:17
mnaser	fungi: i missed updating labels	15:17
mnaser	cc clarkb mordred ^	15:17
clarkb	I'm going to pop out shortly and get a bike ride in. Then back to land that gerrit /p/ change and maybe upgrade gitea	15:35
*** redrobot has joined #opendev		15:37
*** auristor has quit IRC		15:43
*** auristor has joined #opendev		15:44
fungi	i'll probably be semi-around	15:45
mordred	same	15:45
openstackgerrit	Merged openstack/project-config master: Re-add vexxhost-specific labels https://review.opendev.org/744195	15:53
mnaser	is it _possible_ that we have bad images uploaded?	17:27
mnaser	http://paste.openstack.org/show/796497/	17:27
mnaser	oddly enough happens _only_ for the new expanded flavor, but the non-expanded one works just fine, and they're both spawning in the same exact hypervisors..	17:28
*** dtantsur is now known as dtantsur|afk		17:33
fungi	anything funny with the way bfv is set for those flavors? (no clue if that can vary, complete shot in the dark, seems to be having trouble identifying the boot partition)	17:45
clarkb	mnaser: we've seen it before because there is no verification of hash sums with glance	17:48
clarkb	but it's been universal failure when that happened	17:48
clarkb	corvus: mordred can I get a second review on https://review.opendev.org/#/c/743324/ I'll approve it shortly if it looks good	17:48
mnaser	fungi: those flavors are not bfv, they actually are specific to opendev with built-in local storage (hence osf- prefix)	17:50
mnaser	i cant imagine why a larger instance fails to boot but a smaller one does just fine. the diskimage seems the same and nodepool won't be uploading another image...	17:51
mnaser	"here are the available partitions:" does not list the actual local drive	17:51
mnaser	did i create the flavor incorrectly in openstack... /me checks	17:51
mnaser	oh my god	17:52
mnaser	i created a flavor with 64mb of memory :-)	17:52
mnaser	that'll do it.	17:52
dmsimard	that's a lot of megabytes	17:57
*** ysandeep is now known as ysandeep|brb		18:00
fungi	bwahahahahaha	18:01
fungi	all the megabytes	18:01
*** ysandeep|brb is now known as ysandeep		18:07
mordred	clarkb: +2 - +W at will	18:15
corvus	ditto clarkb (fungi has a comment)	18:15
clarkb	ya I think I'll address fungi's comment in a followup along with another cleanup	18:18
clarkb	change is approved, I'll keep an eye on it	18:18
openstackgerrit	Clark Boylan proposed opendev/system-config master: Cleanup /p/ further and add reminder comments https://review.opendev.org/744237	18:22
clarkb	^ addresses fungi's comment and adds another cleanup	18:23
openstackgerrit	Merged openstack/project-config master: Remove os_congress gating https://review.opendev.org/742515	19:03
*** ysandeep is now known as ysandeep|away		19:05
openstackgerrit	Merged opendev/system-config master: Deny Gerrit /p/ requests https://review.opendev.org/743324	19:19
clarkb	the job to apply ^ has started	19:22
clarkb	apache has restarted	19:25
clarkb	gerrit itself still works for me so thats all happy looking	19:25
clarkb	fatal: unable to access 'https://review.opendev.org/p/openstack/project-config/': The requested URL returned error: 403	19:26
clarkb	that looks correct too	19:26
mordred	I agree - gerrit still works	19:27
fungi	yep, lgtm generally	19:27
clarkb	I'll give that a bit just to be sure nothing pops up then approve the gitea upgrade	19:28
clarkb	ok no screaming yet, I'll approve the gitea upgrade now	19:55
mordred	clarkb: aren't you so glad we don't actually hear physical screaming through IRC? :)	20:00
clarkb	ya I get enough from my kids	20:01
*** smcginnis has quit IRC		20:12
*** smcginnis has joined #opendev		20:13
openstackgerrit	Merged opendev/system-config master: Cleanup /p/ further and add reminder comments https://review.opendev.org/744237	20:16
clarkb	rhel and centos users: https://access.redhat.com/solutions/5272311	20:20
clarkb	I wonder if that will affect our image builds, it isn't clear to me if secureboot is necessary to trip it	20:21
clarkb	oh it's any uefi setup with secureboot or without but we only use uefi on arm64 so we're probably ok to just roll with it	20:22
clarkb	mnaser: ^ I think you run centos	20:23
mnaser	clarkb: thanks for that, we've actually moved most of our fleet to debian :>	20:24
mnaser	and i think we're already patched and the debian patches are largely ok	20:24
*** owalsh has quit IRC		20:24
clarkb	cool, just wanted to point it out as that could make for a very bad weekend :)	20:25
mnaser	hahaha	20:26
mnaser	indeed	20:26
* prometheanfire uses his own key :D		20:28
clarkb	prometheanfire: this bug doesn't seem to require secureboot, it's an issue with the update for secureboot that breaks all uefi boots	20:28
clarkb	so if you auto update and uefi boot you're still likely to break?	20:28
prometheanfire	maybe, I'm using systemd-boot personally, so not sure if that's impacted	20:31
fungi	yeah, i haven't seen any complaints about the debian patches, and i'm subscribed to all the mailing lists which would be blowing up about now if there were any	20:37
openstackgerrit	Monty Taylor proposed zuul/zuul-jobs master: Add a job for publishing a site to netlify https://review.opendev.org/739047	20:37
fungi	also i have systems which boot uefi and have rebooted them since the patches were applied, without issue	20:38
fungi	(currently testing a bunch of experimental kernel rebuilds on one, so i definitely would have noticed)	20:38
clarkb	ya I think it is a rh specific bug	20:40
*** owalsh has joined #opendev		20:45
fungi	seems so	20:45
clarkb	gitea change should merge in a couple minutes	20:55
clarkb	just waiting on zuul to process its results queue	21:01
openstackgerrit	Merged opendev/system-config master: Upgrade Gitea to v1.12.3 https://review.opendev.org/744079	21:01
clarkb	hrm gitea01 is done but I'm not convinced its gitea process is happy	21:11
clarkb	I think it may be restarting?	21:11
clarkb	yup it just did it again	21:12
clarkb	2020/07/31 21:11:45 ...exer/code/indexer.go:125:func3() [F] Repository Indexer Initialization Timed-Out after: 30s	21:12
clarkb	I think that is the issue	21:12
clarkb	gitea02 seems to not be experiencing this	21:14
clarkb	hrm and now gitea01 has been running for longer than 30 seconds	21:14
clarkb	that makes me wonder if that is simply an incremental process and a timeout that is too short	21:14
clarkb	I'll continue to monitor	21:14
clarkb	other than that the web ui seems to render ok and I haven't seen any other issues	21:15
clarkb	5 minutes ago Up 3 minutes <- is how to identify if it has happened without grepping the logs	21:15
clarkb	thats the docker ps -a output showing container creation time and start time	21:16
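The check clarkb describes (a container created "5 minutes ago" but only "Up 3 minutes") could be automated roughly as below. This is a minimal sketch, not something from the channel: the function names `human_to_seconds` and `looks_restarted` and the 60 second slack are hypothetical, and it assumes the two strings are the human-readable CREATED and STATUS columns of `docker ps -a`. Other states such as "Exited" or "Restarting" are not handled.

```python
import re

# Approximate unit lengths in seconds; months and years are rough on purpose.
_UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 604800,
    "month": 2592000,
    "year": 31536000,
}

_DURATION_RE = re.compile(
    r"\b(\d+|an?)\s+(second|minute|hour|day|week|month|year)s?\b"
)


def human_to_seconds(text):
    """Convert text such as '5 minutes ago' or 'Up About an hour' to seconds."""
    match = _DURATION_RE.search(text.lower())
    if match is None:
        raise ValueError("no duration found in %r" % text)
    count = 1 if match.group(1) in ("a", "an") else int(match.group(1))
    return count * _UNIT_SECONDS[match.group(2)]


def looks_restarted(created, status, slack_seconds=60):
    """Return True when uptime is clearly shorter than the container's age.

    The slack (an assumed value) absorbs the coarse rounding of the
    human-readable columns and a normal startup delay.
    """
    age = human_to_seconds(created)
    uptime = human_to_seconds(status)
    return age - uptime > slack_seconds


if __name__ == "__main__":
    # The values quoted in the channel.
    print(looks_restarted("5 minutes ago", "Up 3 minutes"))
```

Because the columns are rounded, a restart shortly after creation will not be detected; grepping the gitea logs for the indexer timeout remains the more precise check.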
clarkb	I'll check the rest of them but I think we're ok as gitea01 has caught up and is no longer restarting now	21:16
clarkb	however the problem with that is it makes our graceful restarts for gerrit replication less graceful as web can be down with ssh up	21:16
*** owalsh has quit IRC		21:17
clarkb	I think they may all experience it but then they recover I'm rtfsing now to see if that is a configurable timeout	21:17
openstackgerrit	Clark Boylan proposed opendev/system-config master: Increase gitea indexer startup timeout https://review.opendev.org/744255	21:24
clarkb	infra-root ^ I think that should address the problem	21:24
clarkb	https://gitea01.opendev.org:3000/zuul/zuul/src/branch/master/zuul/scheduler.py lgtm I'll check 02-08 render it properly too	21:24
clarkb	if others can do a quick check too that would be great	21:25
mnaser	i'm trying `docker run -it --rm insecure-ci-registry.opendev.org:5000/vexxhost/glance-api:change_744212_latest` which i remembered corvus shared a usage example in the past	21:25
mnaser	but it looks like im getting 404 not found (plain text) back..	21:25
mnaser	is it possible that it just got pruned?	21:25
openstackgerrit	Monty Taylor proposed zuul/zuul-jobs master: Add a job for publishing a site to netlify https://review.opendev.org/739047	21:26
clarkb	pruning is done when you promote the next image and we prune only those that are more than 24 hours old iirc	21:26
mnaser	they were pushed here https://d99bd7feea5bac8bf2c7-371f69b7cb442b1ed4a26cfca1c7e4db.ssl.cf2.rackcdn.com/744212/2/check/openstack-operator:images:build:glance/9f72309/job-output.txt	21:26
mnaser	i don't think we've done a promotion yet	21:26
mnaser	im not even getting a proper 404, it's a cherrypy 404	21:26
mordred	insecure-ci-registry is cherrypy	21:26
clarkb	02-08 also lgtm so I think we're good, double checking still appreciated	21:27
mordred	oh - but you mean it's a text not a json	21:27
mordred	or whatever	21:27
mnaser	yeah, text not json	21:27
mnaser	docker: Error response from daemon: error parsing HTTP 404 response body: invalid character '<' looking for beginning of value: "<!DOCTYPE html PUBLIC\ etc	21:27
mordred	corvus: ^^	21:28
corvus	mnaser: first of all, do you have the zuul build link for that rather than the log?	21:28
mnaser	corvus: https://zuul.opendev.org/t/vexxhost/build/9f723094e6524a7f82c7ac1929881891	21:28
* mnaser needs to break the log link habit		21:28
corvus	mnaser: docker://insecure-ci-registry.opendev.org:5000/vexxhost/glance-api:9f723094e6524a7f82c7ac1929881891_latest is what you want	21:28
corvus	mnaser: it's an artifact link on that page	21:28
corvus	mnaser: (right click / copy url on "vexxhost/glance-api:latest")	21:29
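The difference between mnaser's first attempt and the reference corvus gave is the tag: it is keyed on the Zuul build UUID rather than the change number. A small sketch of composing that reference follows. The helper name `ci_image_ref` is hypothetical, and the `<build uuid>_<tag>` pattern is inferred only from the single example in this log, so the artifact link on the build page remains the authoritative source.

```python
DEFAULT_REGISTRY = "insecure-ci-registry.opendev.org:5000"


def ci_image_ref(repository, build_uuid, tag="latest", registry=DEFAULT_REGISTRY):
    """Compose an image reference for a speculative image built in CI.

    Assumes the tag has the form '<build uuid>_<tag>', as in the example
    corvus pasted; this is an inference, not a documented contract.
    """
    if not repository or not build_uuid:
        raise ValueError("repository and build_uuid are required")
    return "%s/%s:%s_%s" % (registry, repository, build_uuid, tag)


if __name__ == "__main__":
    # Build UUID taken from the Zuul build link above.
    ref = ci_image_ref("vexxhost/glance-api", "9f723094e6524a7f82c7ac1929881891")
    print("docker run -it --rm " + ref)
```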
mnaser	corvus: wow, that's very easy.	21:29
corvus	mnaser: too easy apparently ;)	21:29
mnaser	corvus: `import rbd` and i can see my error why it failed in ci	21:30
mnaser	this is _awesome_	21:30
mnaser	now back to finding the fun of why `ImportError: /usr/local/lib/python3.7/site-packages/rbd.cpython-37m-x86_64-linux-gnu.so: undefined symbol: rbd_aio_write_zeroes`	21:30
corvus	mnaser: yeah, i think this could be revolutionary for the "why did my change fail in ci?" use case	21:30
mnaser	100% -- no more retrying things in ci	21:30
corvus	just put all of the complexity of making a reproducible build aside, and just run the actual build	21:30
corvus	only requirement is do everything with containers. seems we're heading that way anyway	21:31
mordred	corvus: I hear containers are just linux after all	21:31
*** owalsh has joined #opendev		21:32
clarkb	if only we could fetch and cache them in a reliable manner	21:34
* clarkb spent the early part of the week debugging tripleo's request limit woes		21:34
openstackgerrit	Monty Taylor proposed zuul/zuul-jobs master: Add a job for publishing a site to netlify https://review.opendev.org/739047	21:37
corvus	clarkb: i haven't gotten my signet hc yet... i'm starting to wonder if they're walking it down from portland. ;)	21:47
corvus	(still shows as pending on crowdsupply, so they haven't forgot)	21:47
clarkb	huh maybe you're on a second round of prints?	21:47
corvus	maybe; ordered in feb	21:47
corvus	maybe i'll be lucky and they'll have fixed the mmc issue	21:48
clarkb	I also noticed there is a newer client version I need to try	21:48
clarkb	that may make a good weekend project to finally dig into that and see if I can make it reliable	21:49
clarkb	(different usb ports, new client, etc)	21:49
clarkb	last weekend I upgraded my home fileserver	21:50
*** tosky has quit IRC		22:10
*** DSpider has quit IRC		22:24
openstackgerrit	Monty Taylor proposed zuul/zuul-jobs master: Add a job for publishing a site to netlify https://review.opendev.org/739047	22:44
openstackgerrit	Monty Taylor proposed zuul/zuul-jobs master: Pass node_version through to included roles https://review.opendev.org/744266	22:44
clarkb	just confirming no more recent gitea restarts on gitea01	23:17
clarkb	definitely seems to be something that increasing the timeout as in https://www.espn.com/mlb/story/_/id/29572885/sources-mlb-commissioner-warns-shutdown-players-do-better-job-managing-coronavirus would fix for next time	23:18
clarkb	erps wrong link :)	23:18
clarkb	https://review.opendev.org/#/c/744255/ that change	23:18
fungi	but also the mlb	23:45
clarkb	yes	23:46
