Monday, 2018-12-03

*** wolverineav has quit IRC00:04
*** eharney has quit IRC00:07
*** ahosam has quit IRC00:14
*** jamesmcarthur has quit IRC00:25
*** jamesmcarthur has joined #openstack-infra00:29
*** wolverineav has joined #openstack-infra00:34
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles.  https://review.openstack.org/608610  00:40
*** jamesmcarthur has quit IRC00:44
*** jamesmcarthur has joined #openstack-infra00:45
*** jamesmcarthur has quit IRC00:56
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles.  https://review.openstack.org/608610  01:16
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Add support for enabling the ARA callback plugin in install-ansible  https://review.openstack.org/611228  01:19
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  01:19
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Prefix install_openstacksdk variable  https://review.openstack.org/621462  01:19
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  01:19
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  01:21
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  01:21
*** jamesmcarthur has joined #openstack-infra01:27
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  01:32
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  01:32
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  01:45
<ianw> http://logs.openstack.org/28/611228/9/check/system-config-run-base-ansible-devel/a5abdca/job-output.txt.gz#_2018-12-03_01_33_26_430653  01:47
<ianw> this is an interesting traceback in our ansible devel branch job ... that's an exception from inside python's multiprocessing module  01:48
<ianw> it looks like ansible is a pretty sane user of that, so it seems like a fun bug somewhere  01:48
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles.  https://review.openstack.org/608610  01:50
*** hwoarang has quit IRC02:03
*** hwoarang has joined #openstack-infra02:04
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  02:11
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  02:11
*** hongbin has joined #openstack-infra02:13
*** mrsoul has joined #openstack-infra02:16
*** jamesmcarthur has quit IRC02:32
*** psachin has joined #openstack-infra02:42
*** hongbin has quit IRC02:46
*** wolverineav has quit IRC03:04
*** wolverineav has joined #openstack-infra03:04
*** jamesmcarthur has joined #openstack-infra03:07
*** bhavikdbavishi has joined #openstack-infra03:14
*** hongbin has joined #openstack-infra03:21
*** wolverineav has quit IRC03:28
*** armax has quit IRC03:29
*** hongbin has quit IRC03:30
*** ramishra has joined #openstack-infra03:31
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  03:31
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  03:31
*** jamesmcarthur has quit IRC03:35
*** jamesmcarthur has joined #openstack-infra03:35
*** hamzy__ is now known as hamzy03:36
*** jamesmcarthur has quit IRC03:40
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  03:45
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  03:45
*** wolverineav has joined #openstack-infra03:55
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  03:59
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  03:59
*** jamesmcarthur has joined #openstack-infra04:06
<ianw> 2018-11-29 03:43:13.751247 | bridge.openstack.org | ansible 2.8.0.dev0  04:08
<ianw> oh that's quite annoying, ansible doesn't include the git head in its version output when installed from source  04:09
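
(For context: a quick way to see what an installed ansible reports about itself -- ansible_version is a built-in variable, but as ianw notes it only carries the version string, not the git commit it was built from. A minimal sketch, not taken from the job in question:)

    # prints e.g. "2.8.0.dev0" -- no indication of which HEAD is running
    - name: Show inner ansible version
      debug:
        var: ansible_version
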
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  04:27
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  04:27
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  04:48
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  04:48
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] install ansible as editable during devel jobs  https://review.openstack.org/621471  04:48
*** yamamoto has joined #openstack-infra04:55
*** agopi has joined #openstack-infra05:05
*** hwoarang has quit IRC05:10
*** hwoarang has joined #openstack-infra05:11
*** jamesmcarthur has quit IRC05:22
*** wolverineav has quit IRC05:24
*** wolverineav has joined #openstack-infra05:43
*** wolverineav has quit IRC05:48
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [DNM] testing ansible task handler import  https://review.openstack.org/621475  05:56
*** yamamoto has quit IRC05:59
*** yamamoto has joined #openstack-infra06:00
*** hwoarang has quit IRC06:01
*** hwoarang has joined #openstack-infra06:03
*** elbragstad has quit IRC06:03
*** zul has quit IRC06:04
*** ykarel has joined #openstack-infra06:08
<openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: WIP: Add spec for scale out scheduler  https://review.openstack.org/621479  06:23
<openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: WIP: Add spec for scale out scheduler  https://review.openstack.org/621479  06:24
*** apetrich has quit IRC06:25
*** wolverineav has joined #openstack-infra06:34
<openstackgerrit> Surya Prakash (spsurya) proposed openstack-infra/zuul master: dict_object.keys() is not required for *in* operator  https://review.openstack.org/621482  06:35
*** ralonsoh has joined #openstack-infra06:37
*** yamamoto has quit IRC06:37
*** yamamoto has joined #openstack-infra06:38
*** apetrich has joined #openstack-infra06:40
*** kjackal has joined #openstack-infra06:47
*** wolverineav has quit IRC06:55
*** wolverineav has joined #openstack-infra06:56
*** rcernin has quit IRC06:57
*** yamamoto has quit IRC06:58
*** yamamoto has joined #openstack-infra06:59
*** wolverineav has quit IRC07:01
*** quiquell|off is now known as quiquell07:10
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [DNM] testing ansible task handler import  https://review.openstack.org/621475  07:13
*** rkukura has quit IRC07:14
*** dpawlik has joined #openstack-infra07:15
*** dpawlik has quit IRC07:20
*** dpawlik_ has joined #openstack-infra07:20
*** aojea has joined #openstack-infra07:23
*** pcaruana has joined #openstack-infra07:25
*** wolverineav has joined #openstack-infra07:27
*** wolverineav has quit IRC07:28
*** wolverineav has joined #openstack-infra07:28
*** gema has joined #openstack-infra07:37
*** quiquell is now known as quiquell|brb07:40
*** wolverineav has quit IRC07:43
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [DNM] testing ansible task handler import  https://review.openstack.org/621475  07:46
*** ahosam has joined #openstack-infra07:59
*** e0ne has joined #openstack-infra08:04
*** ginopc has joined #openstack-infra08:05
*** slaweq has joined #openstack-infra08:06
*** ahosam has quit IRC08:08
*** shardy has joined #openstack-infra08:11
*** yboaron_ has quit IRC08:12
<ianw> mordred / corvus / clarkb: it seems the iptables role has triggered a real issue somewhere in our ansible devel branch testing job; I've filed https://github.com/ansible/ansible/issues/49430 with details  08:12
<ianw> certainly it seems related to the importing of tasks into the reload handler  08:13
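
(The pattern ianw describes is roughly the following; the file path and imported file name are assumptions for illustration, not the actual role contents:)

    # playbooks/roles/iptables/handlers/main.yaml -- shape assumed
    - name: Reload iptables Debian
      import_tasks: reload-debian.yaml   # a handler that imports a task file;
                                         # this is what ansible's devel branch trips over
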
*** jtomasek has joined #openstack-infra08:15
*** jtomasek has quit IRC08:15
*** jtomasek has joined #openstack-infra08:16
*** quiquell|brb is now known as quiquell08:18
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [to squash] Modifications to ARA installation  https://review.openstack.org/621463  08:23
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  08:23
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] install ansible as editable during devel jobs  https://review.openstack.org/621471  08:23
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [DNM] testing ansible task handler import  https://review.openstack.org/621475  08:23
<ianw> dmsimard: ^ could you review 621463 for me, and if you're happy, we can squash that into the base install-ara change?  personally i think we can get this in to collect the inner ara results in the gate quickly, as that is very useful  08:24
*** rossella_s has quit IRC08:26
*** ykarel is now known as ykarel|lunch08:26
*** jpena|off is now known as jpena08:28
*** rossella_s has joined #openstack-infra08:28
*** kjackal has quit IRC08:38
*** kjackal has joined #openstack-infra08:39
*** ccamacho has joined #openstack-infra08:45
*** rkukura has joined #openstack-infra08:46
*** tosky has joined #openstack-infra08:48
*** yboaron_ has joined #openstack-infra08:51
*** yboaron_ has quit IRC08:56
*** yboaron_ has joined #openstack-infra08:57
*** xek has joined #openstack-infra08:59
*** jpich has joined #openstack-infra09:03
*** aojea has quit IRC09:13
*** aojea has joined #openstack-infra09:14
<openstackgerrit> Merged openstack-infra/infra-manual master: Fix a reST block syntax  https://review.openstack.org/621455  09:37
<ssbarnea|rover> ianw: mordred corvus clarkb: would it be a problem to upload some periodic rdo job logs to logstash? I found some errors there that logstash would be very useful for.  09:37
*** gfidente has joined #openstack-infra09:38
*** ykarel|lunch is now known as ykarel09:41
<ianw> ssbarnea|rover: you should have a chat with tristanC about his log analysis stuff, it could probably import them  09:42
<ianw> to your question, i'm not sure, clarkb is probably the best to talk to.  09:42
<ssbarnea|rover> ianw: thanks. i will ask them.  09:43
*** sshnaidm|off is now known as sshnaidm09:43
*** derekh has joined #openstack-infra09:57
*** yamamoto has quit IRC10:00
*** yamamoto has joined #openstack-infra10:07
*** yamamoto has quit IRC10:10
*** fresta has quit IRC10:13
*** fresta has joined #openstack-infra10:14
*** kjackal has quit IRC10:17
*** kjackal has joined #openstack-infra10:18
*** electrofelix has joined #openstack-infra10:18
*** fresta has quit IRC10:22
*** electrofelix has quit IRC10:22
*** fresta has joined #openstack-infra10:22
*** bhavikdbavishi has quit IRC10:23
*** electrofelix has joined #openstack-infra10:31
*** shardy has quit IRC10:35
*** shardy has joined #openstack-infra10:43
*** adriancz has joined #openstack-infra10:45
*** panda|pto is now known as panda10:47
*** shardy has quit IRC10:55
*** ahosam has joined #openstack-infra10:55
*** priteau has joined #openstack-infra10:56
*** yamamoto has joined #openstack-infra11:04
*** jamesmcarthur has joined #openstack-infra11:11
*** yamamoto has quit IRC11:11
*** yamamoto has joined #openstack-infra11:15
*** jamesmcarthur has quit IRC11:15
*** sshnaidm has quit IRC11:16
*** sshnaidm has joined #openstack-infra11:16
*** sshnaidm has quit IRC11:18
*** rfolco has joined #openstack-infra11:18
*** sshnaidm has joined #openstack-infra11:19
*** quiquell is now known as quiquell|brb11:21
*** owalsh_ has quit IRC11:24
*** owalsh has joined #openstack-infra11:24
*** jpich has quit IRC11:25
*** jpich has joined #openstack-infra11:26
*** hamzy_ has joined #openstack-infra11:42
*** ahosam has quit IRC11:42
*** hamzy has quit IRC11:42
*** dtroyer has quit IRC11:43
*** dtroyer has joined #openstack-infra11:43
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles.  https://review.openstack.org/608610  11:44
*** quiquell|brb is now known as quiquell11:45
*** yamamoto has quit IRC11:48
*** yamamoto has joined #openstack-infra11:49
*** yamamoto has quit IRC11:49
*** yamamoto has joined #openstack-infra11:50
*** yamamoto has quit IRC11:56
<tobias-urdin> tonyb: we got consensus to remove the stable/newton branches for all projects but i think the thread kind of got lost in the openstack-dev list  11:57
<tobias-urdin> who can i talk to to queue up that work?  11:57
*** electrofelix has quit IRC11:58
*** electrofelix has joined #openstack-infra12:03
*** ykarel is now known as ykarel|afk12:03
*** ahosam has joined #openstack-infra12:03
*** owalsh has quit IRC12:03
*** shardy has joined #openstack-infra12:09
*** ykarel|afk is now known as ykarel12:11
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift resource provider  https://review.openstack.org/570667  12:13
*** ykarel is now known as ykarel|afk12:19
*** owalsh has joined #openstack-infra12:21
*** ykarel|afk has quit IRC12:30
*** jpena is now known as jpena|lunch12:34
*** dhill_ has joined #openstack-infra12:40
*** ramishra has quit IRC12:48
*** lpetrut has joined #openstack-infra12:52
*** rlandy has joined #openstack-infra12:54
*** lpetrut has quit IRC12:55
*** lpetrut has joined #openstack-infra12:55
*** dave-mccowan has joined #openstack-infra12:58
*** Douhet has quit IRC12:58
*** ramishra has joined #openstack-infra13:04
*** rh-jelabarre has joined #openstack-infra13:06
*** boden has joined #openstack-infra13:08
*** kjackal has quit IRC13:12
*** kjackal has joined #openstack-infra13:12
*** tpsilva has joined #openstack-infra13:15
*** jamesmcarthur has joined #openstack-infra13:17
*** ykarel|afk has joined #openstack-infra13:18
*** ykarel|afk is now known as ykarel13:19
*** agopi has quit IRC13:24
*** jpena|lunch is now known as jpena13:26
*** agopi has joined #openstack-infra13:30
*** udesale has joined #openstack-infra13:30
*** dave-mccowan has quit IRC13:32
*** jamesmcarthur has quit IRC13:33
<ssbarnea|rover> clarkb: let me know when you are here, i want to ask you about logstash.  13:34
*** ahosam has quit IRC13:35
*** zul has joined #openstack-infra13:36
*** jroll has quit IRC13:38
<fungi> ssbarnea|rover: are these logs from jobs which run in our ci system, or elsewhere? injecting third-party logs into our elasticsearch backend is something we've said in the past we won't support, and instead recommend those third parties operate their own log analysis systems (they're welcome to reuse the same mechanisms we do for running them if they like)  13:38
*** jroll has joined #openstack-infra  13:38
<fungi> ianw: catching up on scrollback, did you come to a conclusion on how to unblock system-config changes (the failing "Install IPv4 rules files" task)?  13:40
<ssbarnea|rover> fungi: so short answer no (no way to have a unified interface to query logs across different CI systems).  13:40
<ssbarnea|rover> i guess there is no need to explain why this would be useful (also related to elastic-recheck), as the same error could easily spread across different CIs  13:41
<fungi> ssbarnea|rover: right, we already struggle for a reasonable amount of retention with just the logs from our ci systems. we've also had other projects ask to reuse our elasticsearch cluster to house performance metrics from jobs in their jenkins simply so they can avoid having to maintain an elasticsearch cluster themselves... not sure where we can sanely draw the line, but previously we've said "only jobs which run in our ci system"  13:43
<fungi> you can also run your own elastic-recheck service. it's published under an open license too  13:44
<frickler> fungi: iiuc we'd have to make system-config-run-base-ansible-devel non-voting if we need to merge something before we find a fix or workaround for that ansible issue  13:45
<fungi> frickler: thanks, i need to go run some errands here shortly, but when i get back i can try to take a look so i can merge the mailing list changes which were scheduled to go in today  13:46
<frickler> fungi: I can prepare a patch for that  13:47
*** jaosorior has joined #openstack-infra  13:47
<fungi> is there a theory as to why ansible isn't finding the "Reload iptables Debian" handler?  13:48
<fungi> i saw ianw say something about exposing a bug in ansible  13:48
<frickler> fungi: https://github.com/ansible/ansible/issues/49430 has some details, but no root cause yet if I read it correctly  13:50
<ssbarnea|rover> fungi: :) i know, i was trying to lower the number of systems I need to check, not increase it. i do understand the reasons behind it. still, kibana supports doing queries on multiple clusters, which means it could be possible to configure it as a single frontend for both clusters.  13:50
<openstackgerrit> Jens Harbott (frickler) proposed openstack-infra/system-config master: Make system-config-run-base-ansible-devel non-voting  https://review.openstack.org/621577  13:51
*** Douhet has joined #openstack-infra13:52
*** jamesmcarthur has joined #openstack-infra13:52
<mordred> frickler: fascinating  13:57
<frickler> ianw: fungi: I think I found the commit that broke ansible for us, added a reference to the issue. still not sure whether that implies that our usage is broken  13:57
<frickler> mordred: ^^  13:57
<mordred> frickler: yah. I was just reading your comment there  13:57
<fungi> neat-o  14:00
*** fried_rice is now known as efried14:00
*** quiquell is now known as quiquell|lunch14:01
*** efried is now known as fried_rice14:01
*** fried_rice is now known as efried14:02
*** jcoufal has joined #openstack-infra14:02
*** kgiusti has joined #openstack-infra14:03
*** yboaron_ has quit IRC14:05
*** yboaron_ has joined #openstack-infra14:05
*** mriedem has joined #openstack-infra14:07
*** jcoufal has quit IRC14:07
<openstackgerrit> Jens Harbott (frickler) proposed openstack-infra/system-config master: Fix iptables handlers  https://review.openstack.org/621580  14:09
<frickler> ianw: fungi: mordred: ^^ I think that this should be the fix, waiting to see job results  14:10
*** jcoufal has joined #openstack-infra  14:11
<fungi> thanks frickler!  14:15
<mordred> neat!  14:16
*** jcoufal has quit IRC14:17
*** jcoufal has joined #openstack-infra14:19
*** psachin has quit IRC14:24
*** SteelyDan is now known as dansmith14:28
*** quiquell|lunch is now known as quiquell14:40
fungiokay, heading out for errands, back in a sec14:42
*** nhicher has joined #openstack-infra14:42
*** jpich has quit IRC14:42
*** lbragstad has joined #openstack-infra14:49
*** gema has left #openstack-infra14:49
*** jpich has joined #openstack-infra14:50
*** dave-mccowan has joined #openstack-infra14:54
*** bobh has joined #openstack-infra14:54
*** lbragstad has quit IRC14:58
*** lbragstad has joined #openstack-infra15:00
*** beekneemech is now known as bnemec15:01
*** jamesmcarthur has quit IRC15:06
*** sthussey has joined #openstack-infra15:16
<hughsaunders> Hey, I've been looking into nodepool again, and it seems there isn't an attempt to route requests to workers that have ready capacity. Also, ready capacity isn't evenly distributed, so once you have more than a few regions that can provide a label, the chances of hitting ready capacity are quite low. Eg if I have 5 regions and min-ready: 3, there will probably only be ready capacity in 2 regions, which gives a request a 2/5 chance of hitting a ready node.  15:30
<hughsaunders> I started digging into the code because I couldn't work out why my requests were waiting for new instance builds when there were ready nodes waiting.  15:31
<hughsaunders> So am I doing something wrong? Or have I come to an accurate summary of the current situation? If so, would you accept some kind of patch to attempt to prioritise regions with ready capacity?  15:32
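
(To illustrate the situation hughsaunders describes: min-ready in nodepool's config is a per-label total, not a per-provider count, so the ready nodes land in whichever regions happen to build them first. A hedged sketch with invented names and abbreviated provider structure:)

    labels:
      - name: example-label
        min-ready: 3        # 3 ready nodes in total across ALL providers
    providers:
      - name: region-1      # imagine five regions like this, all able to
      - name: region-2      # supply example-label; only the two or three
      # ... region-5        # holding ready nodes give a fast assignment
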
<fungi> hughsaunders: nodepool/zuul development discussions likely have a better audience in the #zuul channel, as nodepool technically isn't an openstack-infra project any longer  15:33
<hughsaunders> probably should have remembered that, apologies and thanks.  15:33
<corvus> hughsaunders: if you want to hop over to that channel, i can answer your question there :)  15:33
<dmsimard> I was looking at the state of the gate because it seemed like there was a little bit of a backlog. Is it okay for certain projects to have >25 jobs on a single change?  15:37
<mordred> dmsimard: yeah. there is a set of patches that started to be rolled out friday that are intended to make the backlog a bit fairer  15:40
<mordred> dmsimard: but as of now there haven't been any limits placed on the number of jobs per project  15:40
<dmsimard> I like to think that if they have that many jobs it's because they need them, was just genuinely curious -- I think I saw a set of changes by tobiash to get metrics too.  15:41
<mordred> dmsimard: yah. like - I have a ton of jobs on sdk ... but they're all actually useful (I keep trying to remove some)  15:42
*** jamesmcarthur has joined #openstack-infra  15:42
<tobiash> dmsimard: you mean https://review.openstack.org/616306 ?  15:43
<dmsimard> yeah  15:43
*** aojeagarcia has joined #openstack-infra15:51
*** ykarel has quit IRC15:53
*** ykarel has joined #openstack-infra15:53
*** yboaron_ has quit IRC15:54
*** aojea has quit IRC15:54
*** rtjure has quit IRC15:55
*** lennyb_ has quit IRC15:55
*** jhesketh has quit IRC15:55
*** dayou_ has quit IRC15:56
<AJaeger_> tripleo is running again - or still - non-voting jobs in gate ;( . EmilienM, jaosorior, please see https://review.openstack.org/616872 which is right now at the top of the zuul gate for tripleo and has 4 non-voting jobs  15:56
<EmilienM> AJaeger_: ok  15:56
*** jhesketh has joined #openstack-infra  15:57
*** lennyb has joined #openstack-infra  15:57
*** quiquell is now known as quiquell|off  15:58
*** janki has joined #openstack-infra  15:58
<EmilienM> mwhahaha: ^ didn't we fix that?  15:59
<mwhahaha> there's a patch  16:00
<mwhahaha> https://review.openstack.org/#/c/620705/  16:00
*** dayou_ has joined #openstack-infra  16:00
<mwhahaha> pending approval :/  16:00
<EmilienM> approved  16:00
*** Douhet has quit IRC  16:01
<AJaeger_> thanks, EmilienM and mwhahaha!  16:01
*** woojay has joined #openstack-infra  16:02
*** Douhet has joined #openstack-infra  16:02
<fungi> need us to promote that change to the front so it will take effect sooner?  16:03
<EmilienM> fungi: yes please  16:04
<AJaeger_> fungi: it's only for tripleo-ci and there's only 616872 at the top of the gate, let it finish  16:04
*** gyee has joined #openstack-infra  16:04
<fungi> k  16:05
<fungi> wasn't sure if it was in one of the longer shared queues  16:05
<AJaeger_> it is in the longer shared queue - but only relevant for tripleo-ci, and as there are no other changes for that repo, promoting would harm us IMHO  16:06
<fungi> ahh, figured if it was removing a lot of non-voting jobs then we would stop running them that much sooner  16:07
*** rtjure has joined #openstack-infra16:10
*** dpawlik has joined #openstack-infra16:11
*** adriancz has quit IRC16:14
*** dklyle has joined #openstack-infra16:15
*** dpawlik has quit IRC16:16
<clarkb> amorin: fungi: any word on whether we should reenable bhs1 at this point?  16:17
<clarkb> ssbarnea|rover: I am here if you want to talk logstash, or did fungi answer your questions?  16:17
*** dtantsur is now known as dtantsur|afk  16:17
<clarkb> I agree with fungi. Our elasticsearch and logstash tooling is built for our CI system. It's unfortunately not a great set of tooling to offer to third parties (due to AAA being non-existent and the size of the cluster already being quite large for the few days of logs we get out of it)  16:18
<ssbarnea|rover> clarkb: fungi answered most questions, mainly the only remaining one is whether we can configure kibana to query both elasticsearch clusters.  16:18
<clarkb> ssbarnea|rover: both meaning Infra's and RDO's?  16:18
<clarkb> no I don't think we should do that either  16:18
<ssbarnea|rover> clarkb: yep. rdo cluster could be optional.  16:19
<ssbarnea|rover> clarkb: i will try to see if I can configure the rdo kibana to query both (own and upstream).  16:19
*** dpawlik has joined #openstack-infra  16:20
<ssbarnea|rover> the idea is to have one unified query interface  16:20
<clarkb> the issue is that we aren't one unified system though  16:20
<clarkb> the infra team has zero ability to fix bugs in rdo  16:20
<clarkb> but presenting that data as coming from our CI system would imply otherwise  16:20
<clarkb> and I don't want to create that confusion  16:20
<fungi> we already get enough questions every time systems people incorrectly assume we manage go offline  16:21
<dmsimard> was anyone looking at the issues we had in ovh bhs?  16:21
<ssbarnea|rover> clarkb: never mind, i will try to configure rdo to query both.  16:21
<fungi> dmsimard: amorin said he was going to look into it, yes  16:21
*** bobh has quit IRC16:25
*** yamamoto has joined #openstack-infra16:25
*** janki has quit IRC16:26
<amorin> fungi: yes, I did try to take a look, but I was trapped in another topic  16:27
<dmsimard> amorin: let us know if we can help :)  16:29
<fungi> frickler: your proposed fix seems to be raising an "Unexpected Exception" from the iptables : Reload iptables (Debian) handler now  16:29
<fungi> not quite sure what to make of that  16:29
<fungi> http://logs.openstack.org/80/621580/1/check/system-config-run-base/caf2d3e/job-output.txt.gz#_2018-12-03_15_56_39_095892  16:30
*** yamamoto has quit IRC  16:30
<fungi> i think it's saying that `netfilter-persistent start` exited nonzero?  16:31
<fungi> hrm, though the json mentions an rc of 0  16:32
<fungi> so maybe it's not talking about that task  16:33
<openstackgerrit> James E. Blair proposed openstack-infra/zuul master: Don't calculate priority of non-live items  https://review.openstack.org/621626  16:35
<frickler> fungi: oh, that error is in -base now, not in -devel  16:35
<frickler> maybe the change isn't backwards compatible?  16:35
<fungi> ouch  16:35
<fungi> right, i missed that  16:35
<fungi> can we specify both import_tasks and include_tasks?  16:36
*** lpetrut has quit IRC  16:36
<fungi> or are they mutually exclusive?  16:36
<frickler> I have no idea, I'll leave this to ansible experts now. mordred ianw ^^  16:36
<frickler> we can merge the nv patch in the meantime I'd say  16:37
<clarkb> frickler: tldr is ansible 2.8.0 has broken things in a non-backward-compatible way?  16:39
<clarkb> I guess ianw filed a bug, maybe I should start by reading that  16:39
<frickler> clarkb: the issue and the links in it should have some information. I'm not sure whether it is really backwards incompatible or my fix just needs more knowledge  16:40
<frickler> clarkb: for sure the cited merge broke the way we use ansible currently  16:41
*** trown is now known as trown|lunch16:41
*** dpawlik has quit IRC16:42
*** e0ne has quit IRC16:48
<clarkb> looks like other users have reported similar issues  16:51
<clarkb> so maybe switching to a -nv job for now and waiting to see if ansible fixes it for all of us is the way forward?  16:51
*** dpawlik has joined #openstack-infra  16:52
<pabelanger> clarkb: corvus: I'm around again today if we want to try the nodepool / zuul upgrades again.  I admit, I am not sure if there are any issues preventing us from trying again this morning  16:57
<corvus> pabelanger, clarkb: yeah, we could try now, or we could wait for 621626 to land.  either should work.  16:58
<pabelanger> looking  16:59
<clarkb> If we wait then the end result is releasable assuming it works, right?  17:00
<pabelanger> looks like a few hours of waiting, assuming we don't enqueue  17:00
<openstackgerrit> James E. Blair proposed openstack-infra/system-config master: Don't import in iptables handlers  https://review.openstack.org/621633  17:00
<openstackgerrit> James E. Blair proposed openstack-infra/system-config master: Don't import tasks in iptables reload and use listen  https://review.openstack.org/621634  17:00
<corvus> frickler, clarkb, fungi, ianw, mordred: ^ two more alternatives to consider  17:01
<corvus> clarkb, pabelanger: why don't i direct-enqueue it  17:01
<pabelanger> +1  17:01
<clarkb> corvus: ++  17:02
*** udesale has quit IRC  17:02
*** aojeagarcia has quit IRC  17:07
<mordred> corvus: I think I like 621633 the best in this particular case, just because it's simpler  17:07
<corvus> mordred: yeah.  i kind of like listen, and would lean toward that, if it weren't for the 'when' issue  17:08
<mordred> yeah  17:09
*** graphene has joined #openstack-infra  17:10
<openstackgerrit> James E. Blair proposed openstack-infra/project-config master: Add #openstack-designate to accessbot  https://review.openstack.org/621639  17:15
<corvus> frickler: ^  17:15
*** armax has joined #openstack-infra  17:16
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool master: Extract out common config parsing for ConfigPool  https://review.openstack.org/621642  17:18
*** manjeets has joined #openstack-infra17:19
*** dpawlik has quit IRC17:19
*** bobh has joined #openstack-infra17:20
*** bobh has quit IRC17:21
<clarkb> fungi: gerrit slowness hasn't happened again and we are still blocking the stackalytics user?  17:25
*** jpich has quit IRC  17:25
<clarkb> corvus: frickler: re the -designate channel, I can't seem to list access for that channel with chanserv?  17:27
<corvus> clarkb: yep.  you will when the accessbot change lands  17:31
<clarkb> corvus: was it set up intentionally that way before? seems odd  17:32
<corvus> clarkb: not sure; might be a side effect of some of the modes set on it?  17:32
<corvus> i'm afk for 30m; should be ready to restart zuul when i get back  17:34
<corvus> apparently i jinxed it; py36 failed  17:34
<clarkb> fwiw I +2'd https://review.openstack.org/621633 as I agree with mordred that I prefer it because it is simpler  17:34
<clarkb> frickler: fungi: ^ if others want to maybe review a fix for the ansible thing  17:35
*** dpawlik has joined #openstack-infra  17:35
<corvus> the sql failures again; i'm going to re-enqueue  17:35
<corvus> (also, that's the second time i've seen the sql failures on limestone)  17:36
<mordred> clarkb: I have also +2'd, but have not +A'd so that we can get folks to weigh in  17:36
<mordred> clarkb: I wish there was a condorcet plugin for gerrit that would allow people to rank-vote on a collection of patches. I have no interest in writing such a plugin though  17:37
<corvus> mordred: ++  17:37
<clarkb> mordred: you could probably implement that entirely in prolog  17:37
<mordred> clarkb: yah. DEFINITELY don't want to implement a condorcet voting system in prolog  17:37
<clarkb> :)  17:38
<corvus> ok.  re-enqueued.  back in ~30.  17:38
<mordred> but maybe zaro will get bored one day and write it :)  17:38
*** dpawlik has quit IRC  17:39
<fungi> clarkb: correct, i haven't seen any coordinated reports of slowness (just the occasional ones which only seemed to affect one person and couldn't be reproduced globally)  17:41
<fungi> and i haven't removed the ip6tables rule blocking the address stackalytics-bot-2 was seen coming from  17:42
*** shardy has quit IRC17:46
*** bobh has joined #openstack-infra17:55
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool master: Extract out common config parsing for ConfigPool  https://review.openstack.org/621642  17:59
*** derekh has quit IRC18:02
*** e0ne has joined #openstack-infra18:03
<jonher> Gate never ran on https://review.openstack.org/619216/ - is a normal recheck required, or is there another command to only have it recheck gate?  18:05
<openstackgerrit> Merged openstack-infra/zuul master: Don't calculate priority of non-live items  https://review.openstack.org/621626  18:07
<openstackgerrit> Clark Boylan proposed openstack-infra/zuul master: Handle github delete events  https://review.openstack.org/621665  18:10
*** ralonsoh has quit IRC  18:11
<openstackgerrit> Ed Leafe proposed openstack-infra/project-config master: Add the os-resource-classes project  https://review.openstack.org/621666  18:11
<fungi> jonher: that's really strange. i don't see any indication of maintenance activity around that time  18:11
<clarkb> fungi: jonher: zuul was restarted a couple times on that day  18:12
<clarkb> trying to get the relative priority work deployed  18:12
<fungi> indeed it was, according to http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-11-30.log.html  18:12
<fungi> just never made it into https://wiki.openstack.org/wiki/Infrastructure_Status  18:12
<jonher> OK, so a simple "recheck" should get things going again?  18:13
<clarkb> jonher: yes  18:13
<clarkb> or better yet reapproval  18:13
<clarkb> which I've done  18:13
<clarkb> (then we can skip the check queue)  18:13
<fungi> the approve event in gerrit was at 22:49 and it looks like there was indeed a zuul scheduler restart in progress according to the channel log  18:13
<fungi> so no mystery, just poor timing on my part with the approve button  18:14
<openstackgerrit> Ed Leafe proposed openstack-infra/project-config master: Add the os-resource-classes project  https://review.openstack.org/621666  18:14
<jonher> gr8, thanks clarkb  18:14
*** jpena is now known as jpena|off18:17
*** apetrich has quit IRC18:18
<openstackgerrit> Merged openstack-infra/infra-manual master: Replace mailing list  https://review.openstack.org/619216  18:23
<clarkb> fungi: did the old mailing lists get disabled yet? that is on tap for today, right?  18:24
<jonher> ^ now it merged, thanks again :)  18:24
<openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: WIP: Fix broken setRefs whith missing objects  https://review.openstack.org/621667  18:24
<fungi> clarkb: that is on tap for today, but I need to be able to merge system-config patches to do that, ideally  18:25
<clarkb> fungi: did you see https://review.openstack.org/#/c/621633/ as a fix for that?  18:25
<fungi> yep, and earlier attempts  18:25
<fungi> was waiting to see check results  18:25
*** udesale has joined #openstack-infra18:26
*** apetrich has joined #openstack-infra18:30
*** electrofelix has quit IRC18:31
*** ykarel is now known as ykarel|away18:36
<corvus> clarkb, pabelanger: zuul change is in place; shall we start some restarts now?  18:38
*** eernst has joined #openstack-infra  18:39
<fungi> or restart some starts  18:39
<corvus> perhaps most accurately: restart some restarts  18:39
<clarkb> I'm around and ready  18:41
<fungi> also around and not mired in anything especially sticky  18:42
*** vabada has quit IRC  18:43
<corvus> would someone like to go ahead and restart the nodepool launchers?  18:44
<corvus> and i can restart the zuul scheduler afterwards  18:44
<fungi> i can do that  18:45
<fungi> any special care to take, or just service-restart them?  18:45
<corvus> fungi: maybe start with nl04  18:45
<corvus> we did merge at least one change since the last time we restarted them  18:45
<pabelanger> corvus: clarkb: I am around  18:46
<pabelanger> on standby if needed  18:46
<fungi> pbr freeze says we have nodepool==3.3.2.dev67  # git sha f116826 installed on nl04  18:46
<corvus> looks right  18:47
<fungi> that's what we're expecting, seems to match origin/master  18:47
<clarkb> ya, nl04 is a good choice while bhs1 is disabled  18:47
*** ykarel|away has quit IRC  18:47
<fungi> nodepool-launcher restarted on nl04 now  18:47
<corvus> it's going to be very very chatty for a bit  18:48
<fungi> with `service nodepool-launcher restart`, which seems to have worked. new pid, current time  18:48
<fungi> and yeah, tailing the debug log it is indeed chatty  18:48
<fungi> seems to have reached steady state now?  18:48
<fungi> it's handling requests anyway  18:49
<corvus> yeah, still very chatty.  i'm on the fence about whether we can handle that level long-term.  but it's going to be useful for the next little bit to be able to examine the new behavior.  18:49
<corvus> i think we can proceed to restart the rest  18:49
<fungi> shall i work my way down the list with nl03 next?  18:49
<corvus> ++  18:50
<fungi> f116826 is installed there too  18:50
<fungi> it's restarted on nl03 now  18:50
<fungi> while that's going, i've checked `pbr freeze` on nl02 and 01 and they both look right as well  18:52
<openstackgerrit> James E. Blair proposed openstack-infra/nodepool master: Make launcher debug slightly less chatty  https://review.openstack.org/621675  18:53
<corvus> that's for later ^  18:53
<fungi> i think nl03 is handling requests, the debug log is just such a firehose it never pauses  18:54
<fungi> shall i move on to nl02?  18:54
<corvus> fungi: yep  18:54
<fungi> okay, it's restarted as well  18:55
*** diablo_rojo has joined #openstack-infra  18:55
<mordred> corvus, fungi, clarkb: I'm around-ish .. but a dude is coming over to the house in a few minutes to give us some quotes on some work, so I'm not around-around  18:56
<corvus> hrm.  we're missing a debug line at the start of request processing; it's hard to tell (with grep) when the loop starts again  18:56
<fungi> i do see nl02 seeming to satisfy some requests though, according to the log  18:56
<fungi> if i'm reading correctly  18:56
*** wolverineav has joined #openstack-infra  18:57
<clarkb> yes, it appears to be declining requests for citycloud  18:58
<fungi> okay, moving on to nl01 i guess  18:58
<fungi> and it's restarted  18:59
*** trown|lunch is now known as trown|outtypewww  18:59
<fungi> this one's not so active compared to 02 and 03  18:59
<fungi> openstack.exceptions.HttpException: HttpException: 403: Client Error for url: https://ord.servers.api.rackspacecloud.com/v2/637776/servers, Quota exceeded for ram: Requested 8192, but already used 1638400 of 1641728 ram  19:00
<fungi> whee!  19:00
<clarkb> fungi: the launchers with disabled providers are more active since they just decline things  19:01
<clarkb> "more active"  19:01
*** wolverineav has quit IRC  19:01
<fungi> oh, right, that's what it is  19:01
*** wolverineav has joined #openstack-infra  19:01
<fungi> i hadn't made that connection  19:01
<mordred> that's working-as-designed :)  19:04
*** e0ne has quit IRC  19:04
<clarkb> ya, I think this looks happy  19:05
<clarkb> corvus: are we ready to restart zuul?  19:05
<fungi> seems sane on the launcher end now, at any rate  19:05
<corvus> let's hold the zuul restart for a few minutes; there's a release making its way through right now  19:05
<corvus> it has 1min left in gate; then of course the actual post-merge release activity  19:06
<corvus> https://review.openstack.org/#/c/620919/  19:06
<corvus> after that we should be good (see #openstack-release)  19:06
*** jamesmcarthur has quit IRC  19:06
<fungi> looks like the system-config fix is really, really close to getting node assignments  19:06
<corvus> fungi: it should still be after the restart.  19:08
<fungi> indeed  19:08
<corvus> (possibly closer)  19:08
*** shardy has joined #openstack-infra  19:17
<Shrews> hrm, did we remove a provider pool from nl04?  19:17
<Shrews> WARNING nodepool.driver.openstack.OpenStackProvider: Cannot find provider pool for node  19:17
*** e0ne has joined #openstack-infra  19:17
<clarkb> we disabled bhs1 via max-servers  19:19
<fungi> yeah, didn't remove one afaik  19:19
<clarkb> I don't think we removed any providers though. Does it not log the one it thinks is missing?  19:19
<Shrews> this is for ovh-gra1, which still exists in nodepool.yaml  19:19
<Shrews> something is weird there  19:19
<Shrews> pool and launcher node attributes are empty. maybe this is due to corvus' recent change...  19:20
<clarkb> we seem to have launched new nodes there since the restart  19:20
*** priteau has quit IRC  19:20
*** amotoki has quit IRC  19:21
<Shrews> hrm, not the change i was thinking of...  19:22
<corvus> Shrews: the pool is named "pool"?  19:23
*** amotoki has joined #openstack-infra  19:23
<corvus> oh, no, you said it's None.  sorry.  19:24
<fungi> cruft for something hanging around in zk?  19:24
<corvus> Shrews: could it be that when we create a fake node for deleting a failure, it has no pool entry?  19:25
<Shrews> corvus: seems that way (just a warning that i hadn't noticed). ovh doesn't seem to be able to delete that instance, so it's hanging around  19:26
<Shrews> so a problem with the provider  19:26
<corvus> Shrews: ok, so we're still trying to delete those nodes (ie, it's a non-fatal error)?  19:26
<Shrews> corvus: right  19:26
<fungi> since their upgrade (to newton i think?) ovh has been struggling to satisfy delete requests in a timely fashion  19:26
<tobiash> yes, we only set the provider  19:26
*** wolverineav has quit IRC  19:27
<Shrews> tobiash: is that warning useful?  19:27
<corvus> Shrews: ok.  we're probably seeing it more because of the recent fix to create those stub nodes more often (on launch failures which return an external id)  19:27
<clarkb> unrelated, but https://github.com/kubernetes/kubernetes/issues/71411 probably means we want to redeploy the nodepool k8s cluster when one of those patched versions is available  19:27
<tobiash> Shrews: from where does it come?  19:27
<corvus> Shrews: we could probably copy in the pool from the original request  19:27
<Shrews> tobiash: during quota calculation  19:27
<clarkb> corvus: ++  19:27
*** wolverineav has joined #openstack-infra  19:28
<tobiash> actually then that node isn't taken into account during quota calculation  19:28
<tobiash> so I think the warning was useful  19:28
<tobiash> I think we should add the pool to these nodes too  19:28
<Shrews> tobiash: ++  19:28
<tobiash> but that's something that has already been there for a long time, so nothing fatal  19:29
<corvus> heh -- the fix was to make sure the node was taken into account for quota.  so.. yep.  :)  19:29
<fungi> corvus: unrelated, but the 621633 fix for system-config is failing puppet-beaker-rspec-puppet-4-infra-system-config and system-config-run-base  19:30
<fungi> digging into logs for those now  19:30
<clarkb> is createServer what sets node.pool?  19:30
<fungi> the former is raising "ERROR! The requested handler 'Reload iptables Debian' was not found in either the main handlers list nor in the listening handlers list"  19:31
<fungi> and so is the latter  19:31
<tobiash> Shrews, corvus: as the comment states, the node is in a funny state: http://paste.openstack.org/show/736590/ :)  19:31
<fungi> so i guess that's still being referenced  19:31
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool master: Set pool for error'ed instances  https://review.openstack.org/621681  19:32
*** wolverineav has quit IRC  19:32
<Shrews> i think ^^ fixes it  19:32
*** wolverineav has joined #openstack-infra  19:32
<clarkb> oh right, it's because we make a copy of the node data structure in the bubbled-up exception handler  19:33
<clarkb> we don't use the actual node, instead that is reused  19:33
<clarkb> Shrews: ya I think that should fix it  19:33
*** bobh has quit IRC  19:33
<corvus> fungi: hrm.  i guess that doesn't work.  i don't immediately know why, but i wonder if it's due to the arcane rules for referencing handler tasks by name (referred to vaguely in one of the ansible bug reports)  19:34
<corvus> in other news, the release is done, so we can restart zuul now  19:34
<clarkb> do we need Shrews' fix to make quota crunching work properly?  19:35
<tobiash> clarkb: not immediately, this thing has already been there for a long time  19:35
<clarkb> we merged two related changes around that before. The first attempted to track the nodes properly and the second to track untracked nodes. I think these nodes are currently "tracked" but then fail to be deleted  19:36
<clarkb> tobiash: yes. Mostly wondering if the second related change will change the behavior in a more negative way than what we had before  19:36
<tobiash> which one?  19:36
<clarkb> tobiash: 56164c886a81c5d5c67eaac789a6288dd555189b  19:37
<clarkb> I guess it's the same as it was before, since ^ will see them as untracked and not account for them, and afbf9108d893ede0d147da2afe16c9e6d4bc76d4 will basically treat them as untracked too  19:37
<clarkb> so not a worse regression, just not fixed yet  19:37
<tobiash> clarkb: that doesn't make use of the pool, so shouldn't matter  19:38
<AJaeger_> corvus, clarkb, frickler: #openstack-designate redirects to #openstack-dns, I'll WIP https://review.openstack.org/#/c/621639 since I think it's wrong  19:38
<clarkb> AJaeger_: oh that explains it  19:38
<clarkb> corvus: I'm ready for scheduler restart whenever you are  19:39
<fungi> corvus: your 621634 alternative is actually passing all its jobs  19:39
<clarkb> looks like the tripleo gate just reset too  19:39
<clarkb> so not a bad time for it  19:39
<fungi> so that one might win for being the only one proposed so far which actually works ;)  19:40
<tobiash> clarkb: for the record, this is the change that introduced the 'without pool nodes': https://review.openstack.org/589854  19:40
<corvus> AJaeger_: can you elaborate on why you think the change is wrong?  19:40
<tobiash> so that merged 3 months ago  19:40
<clarkb> tobiash: ya, and afbf9108d893ede0d147da2afe16c9e6d4bc76d4 attempted to rely on it but was incomplete  19:41
<tobiash> ah that makes sense  19:42
<clarkb> fungi: that is weird since 621633 uses the existing handler names in the main handler file. Basically that didn't change. So odd we'd run into import_tasks errors in that file if it can't even find the handlers  19:42
<AJaeger_> corvus: see my comment - joining #openstack-designate, the topic is "This channel is unused, use #openstack-dns"  19:43
*** AJaeger_ is now known as AJaeger  19:43
<AJaeger> corvus: so, why are you adding it? What triggered that change?  19:43
<corvus> AJaeger: yes... i'm not suggesting anyone use it.  i'm just trying to establish basic access.  19:43
<fungi> clarkb: i dunno what to tell you, but aside from infra-puppet-apply-3-centos-7 which only just got a node assignment, every other job has reported success on 621634  19:43
*** gfidente is now known as gfidente|afk  19:43
<corvus> AJaeger: frickler needs access to that channel to be able to set +i to make the forward effective.  accessbot will grant him that access.  19:44
<corvus> AJaeger: see http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-11-28.log.html#t2018-11-28T18:57:50 and http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-11-28.log.html#t2018-11-28T19:04:52  19:44
<AJaeger> corvus: Ah! That explains it - thanks, then all is fine!  19:44
<corvus> AJaeger: but, moreover, i can't see how any change that adds an "#openstack-*" channel to accessbot would be wrong.  19:44
<corvus> all openstack channels should be managed by accessbot  19:44
<clarkb> fungi: ya, mostly just pointing out it's weird that a change which doesn't change the addressing of the handlers would go from failing to run said handlers due to import_tasks to failing to find the handlers at all.  19:45
<clarkb> seems like this should be alarm-bell-worthy for the ansible 2.8 release process if it's going to create havoc in handlers for people  19:45
<AJaeger> corvus: even if unused?  19:45
<corvus> AJaeger: yeah, i don't see why not  19:46
<corvus> AJaeger: otherwise, we won't maintain op access for new global irc ops, etc.  19:46
<AJaeger> Ok, I see...  19:46
<AJaeger> thanks for the explanation, corvus  19:46
<corvus> AJaeger: np :)  19:47
<fungi> AJaeger: consider a future state where we want to start using the channel again and we left it in some old state owned exclusively by accounts we replaced in intervening years  19:47
<fungi> keeping the abandoned channels in our accessbot config preserves our access to them  19:48
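
(For reference, the accessbot change under discussion presumably just adds an entry to the channel list in project-config; the exact schema here is assumed rather than copied from the repo:)

    # accessbot/channels.yaml -- structure assumed
    channels:
      - name: openstack-designate
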
<corvus> i'll restart the zuul scheduler now  19:48
*** shardy has quit IRC  19:48
<AJaeger> fungi: understood now - thanks  19:48
<clarkb> mordred: Shrews: pabelanger: is that something that ansible might find useful as prerelease feedback? do we need to do anything other than just watch the existing bugs for the issue?  19:49
<pabelanger> clarkb: feedback on the iptables issue?  19:51
*** e0ne has quit IRC  19:51
<clarkb> pabelanger: yes. Basically we can't use import_tasks anymore in the handlers. But then if we switch to using normal tasks in the handler (https://review.openstack.org/#/c/621633/1/playbooks/roles/iptables/handlers/main.yaml) then ansible 2.8 says it can't find the handler for Reload iptables Debian  19:51
<clarkb> the fix that does work is 621633, which adds explicit listens to the handlers  19:52
*** e0ne has joined #openstack-infra  19:52
<pabelanger> clarkb: Yah, we could ask in #ansible if it would be useful info  19:53
<fungi> clarkb: the fix which works (or seems to) is 621634, not 621633  19:55
<fungi> though it uses listen as you describe  19:55
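
(The listen approach fungi refers to looks roughly like this; the handler bodies are placeholders, not the real contents of 621634:)

    # handlers/main.yaml -- hedged sketch
    - name: Start netfilter-persistent
      command: netfilter-persistent start   # placeholder action
      listen: Reload iptables Debian
    - name: Reload ip6tables rules
      command: netfilter-persistent reload  # placeholder action
      listen: Reload iptables Debian
    # tasks keep "notify: Reload iptables Debian"; every handler listening
    # on that string fires, with no by-name handler lookup involved
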
<corvus> zuul is restarted  19:56
<clarkb> oh sorry, I copy-pasta'd wrong  19:57
*** wolverineav has quit IRC  19:57
<clarkb> fungi: yup, 621634 is the one I meant  19:57
*** wolverineav has joined #openstack-infra  19:58
<clarkb> corvus: is not being able to get a status related to zuul loading its config on first start?  20:00
<clarkb> oh there it goes  20:00
*** tpsilva has quit IRC  20:01
<corvus> clarkb: related to re-enqueuing (they're both gearman jobs)  20:01
<corvus> i've examined the extra debug logs from nodepool and verified that it's processing priority 0 requests before higher numbers  20:02
*** e0ne has quit IRC  20:02
<corvus> also, the priority column is visible in the 'nodepool request-list' output.  20:03
*** wolverineav has quit IRC  20:03
*** e0ne has joined #openstack-infra  20:03
<pabelanger> Yay  20:04
<clarkb> fungi: do you think we should enqueue 621634 to the gate since it's been shown to work but didn't finish check testing?  20:05
<fungi> clarkb: yes, i think so, as long as everyone prefers that to making the job non-voting  20:05
<fungi> it seemed to be the least preferred of the various attempts at fixing, so i wasn't sure  20:05
<corvus> so i think things are functioning correctly; probably the next step is to see if things behave how we expect with the changes.  that will probably be easier to evaluate after we get past the restart.  20:06
<clarkb> fungi: I think this type of error shows there is value in having the test, and I worry that if you set it non-voting we'll just ignore new failures  20:06
<fungi> me too  20:06
<fungi> corvus: i concur  20:06
<clarkb> corvus: ya, last time it seemed that the restart made it hard to see what was normal behavior  20:06
<clarkb> I've +2'd 621634 and think we can move forward with that while ansible figures out if it's broken things sufficiently for fixing  20:07
<corvus> it's priority 1, btw.  20:07
<corvus> 621634 is  20:07
<fungi> i guess 33 was pri0  20:08
<corvus> so, aside from the fact that the whole system is busy satisfying nodes for the changes which arrived first, it's pretty high on the list for check nodes  20:08
<clarkb> fungi: yup  20:08
<corvus> ooh  20:09
<corvus> i want to dequeue 33 and see if 34 gets bumped  20:09
<fungi> an excellent test!  20:09
<fungi> i say go for it  20:09
<corvus> done!  20:09
*** e0ne has quit IRC  20:10
<fungi> it fell out of the check pipeline at least  20:10
<corvus> 2018-12-03 20:09:39,668 DEBUG zuul.nodepool: Revised relative priority of node request <NodeRequest 300-0000624391 <NodeSet [<Node None ('bridge.openstack.org',):ubuntu-bionic>, <Node None ('trusty',):ubuntu-trusty>, <Node None ('xenial',):ubuntu-xenial>, <Node None ('bionic',):ubuntu-bionic>, <Node None ('centos7',):centos-7>]>> from 1 to 0  20:10
<clarkb> jobs just started  20:10
<clarkb> seems to work as expected  20:10
<corvus> yep -- that log line, plus i checked nodepool request-list and saw it go from 1 to 0  20:11
<fungi> and it's getting nodes already  20:12
<fungi> yeah  20:12
<fungi> slick!  20:12
<pabelanger> ++  20:12
<corvus> \o/  20:12
<fungi> i love it when a plan comes together  20:12
*** jamesmcarthur has joined #openstack-infra  20:12
* corvus lights cigar  20:12
<corvus> i've rechecked 633 (for posterity)  20:15
<corvus> granted, posterity is, what, a few weeks around here, but hey.  20:15
*** david-lyle has joined #openstack-infra  20:16
<corvus> so i think i'll eat some food now, and then come back and make sure that we're actually reporting on changes and don't have any crazy new exceptions, then i'll send that email we drafted friday  20:16
<fungi> thanks! i'll get back to drafting e-mails about mailing list shutdowns  20:17
*** manjeets_ has joined #openstack-infra20:17
*** e0ne has joined #openstack-infra20:17
*** eernst has quit IRC20:18
*** manjeets has quit IRC20:18
*** dklyle has quit IRC20:18
*** munimeha1 has joined #openstack-infra20:19
*** jamesmcarthur has quit IRC20:20
*** e0ne has quit IRC20:21
*** jamesmcarthur has joined #openstack-infra20:22
* mordred is back - looks like the new stuff is working good!  20:26
<clarkb> ssbarnea|rover: fyi http://logs.openstack.org/38/618638/1/gate/tripleo-ci-centos-7-containers-multinode/45126b1/ara-report/file/eb257cab-ab3a-45e8-8d69-f33d118f5916/#line-10 is failing because it needs root  20:27
<ssbarnea|rover> clarkb: ouch... it's almost 9pm here...  20:28
<clarkb> I'm guessing https://review.openstack.org/#/c/616872/ is the cause. No worries, thought I'd point it out to someone in tripleo  20:30
<clarkb> EmilienM: mwhahaha: ^ you may care too and be in more awake timezones right now ;)  20:30
*** udesale has quit IRC  20:30
<mwhahaha> waa?  20:30
<mwhahaha> oh thanks  20:30
<clarkb> actually that change may be unrelated. Now thinking that maybe if a package has no updates and is already installed, that works because you can yum info/list without root  20:31
<clarkb> but if that package has updates in rdo or centos or somewhere, then we'll try to upgrade it and then it breaks  20:31
<clarkb> in any case become: true is likely necessary there  20:32
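
(The fix clarkb points at is privilege escalation on the package task; the task and package names below are illustrative only:)

    # yum can query package state unprivileged, but installing or
    # upgrading needs root -- hence become: true
    - name: Ensure loop device packages are present   # name assumed
      package:
        name: util-linux
        state: present
      become: true
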
<mwhahaha> i shall fix  20:32
*** wolverineav has joined #openstack-infra  20:32
<ssbarnea|rover> clarkb: thanks for reporting, i am creating a bug for it now. become: true is a must there, it's... obvious.  20:32
<mwhahaha> ssbarnea|rover: you want to fix it, since you're creating the bug?  20:33
<ssbarnea|rover> mwhahaha: ok, i will do both. pinging you to review.  20:33
<ssbarnea|rover> in fact creating the bug is harder than creating the CR :D  20:34
<mwhahaha> pretty much  20:34
<ianw> at least it seems the ansible-devel job is working to find issues well before we update and everything explodes :)  20:36
<clarkb> ianw: ya, and the fix for the first issue seems to have found a second issue  20:37
*** graphene has quit IRC  20:38
<ianw> clarkb: so that's 621633 ... where using the block: the handlers also don't seem to be found/triggered?  20:39
<clarkb> ianw: ya, the handler isn't found  20:40
<clarkb> could be the same issue manifesting differently or two different issues, unsure  20:40
<corvus> how did those tripleo changes end up in gate with that error?  20:41
<ianw> ok, my github bug wasn't the best i know, i didn't have a test-case and only noticed it was the imports late in the day.  can work on getting something useful for the bug now that we have some smoking guns  20:41
<ssbarnea|rover> clarkb: https://review.openstack.org/#/c/621696/ -- going out now.  20:41
*** e0ne has joined #openstack-infra  20:42
*** e0ne has quit IRC  20:42
<corvus> we've merged changes since the restart  20:44
<clarkb> corvus: I think it may have to do with local image install state and remote package availability  20:44
<clarkb> corvus: ansible can check whether you have the latest installed without root. And if you do have the latest already installed, it's fine  20:44
<clarkb> corvus: but if the upstream package repo updates, then now you need root to reconcile the delta  20:44
*** hjensas has joined #openstack-infra  20:45
<clarkb> I expect all the changes to fail with those errors until 621696 merges or the upstream package repo reverts the update  20:45
<corvus> i think i see a new exception in the scheduler logs; i'm digging  20:48
<clarkb> and ya, there is a relatively recent package for util-linux in the centos 7 package repo. The timestamp is just over 2 weeks old. Unsure if that timestamp maps to build time or publish or what  20:48
<clarkb> seems like octavia is also having rpm/yum/centos related issues  20:50
<mwhahaha> ugh, so that ceph-loop-device thing is going to completely hose up the gate, any way to get that promoted to the top of the tripleo gate?  20:50
<clarkb> http://logs.openstack.org/38/617838/5/gate/octavia-v2-dsvm-scenario-centos-7/9e669f8/job-output.txt.gz#_2018-12-03_20_09_45_935256  20:50
<clarkb> mwhahaha: ya, if it gets approved I can enqueue and promote it  20:51
<mwhahaha> clarkb: aproved  20:51
<mwhahaha> all approved  20:51
<mwhahaha> er also  20:51
* mwhahaha gives up  20:51
<clarkb> I wonder if that would be a useful ansible-lint rule  20:52
<mwhahaha> yes  20:52
<clarkb> use become for package installs  20:52
<clarkb> promotion is running now  20:53
<clarkb> and done  20:54
<mwhahaha> thanks  20:54
clarkbjohnsom: rm_work hey not sure why yet, but it seems centos7 updates have broken octavia gates, you'll probably want to look into it20:54
clarkbI'm guessing today is the next point release release20:55
johnsomclarkb I saw a failure this morning with a missing package at RAX, just assumed it was a mirror sync issue20:55
clarkbjohnsom: I think its likely due to 7.6 or whatever the number is happening20:56
clarkbjohnsom: and packages being broken there? I'm not sure. Yum called it a non fatal rpm install thing20:56
johnsomThe one I saw was radvd couldn't be downloaded from the mirror20:57
clarkbhttp://logs.openstack.org/38/617838/5/gate/octavia-v2-dsvm-scenario-centos-7/9e669f8/job-output.txt.gz#_2018-12-03_20_09_10_181752 at least I'm not seeing anything else that could be thep roblem20:57
*** apetrich has quit IRC20:58
fungi#status log removed static.openstack.org from the emergency disable list now that ara configuration for logs.o.o site has merged20:58
openstackstatusfungi: finished logging20:58
*** udesale has joined #openstack-infra20:58
clarkbhttps://lwn.net/Articles/773680/ yup its likely 7.620:58
clarkb#status Log CentOS 7.6 appears to have been released. Our mirrors seem to have synced this release. This is creating a variety of fallout in projects such as tripleo and octavia. Considering that 7.5 is now no longer supported we should address this by rolling forward and fixing problems.20:59
openstackstatusclarkb: finished logging20:59
clarkbjohnsom: reading the devstack function for detecting failures, any one of those lines that says 'failure: something' will cause the failure to bubble up in devstack21:04
clarkbthough maybe the no package golang error is the actual issue?21:05
clarkbsure enough there is no golang package21:06
clarkbianw: ^ this is something you probably have the history around to know how to debug21:06
clarkbwell that's curious, 7.5 had golang21:07
clarkb7.6 does not21:07
johnsomThat seems like an issue bigger than a dot release...21:08
clarkbwell it's the dot release not being backward compatible by removing packages21:08
clarkbso yes, but also not much we can do about it? may need to enable epel and use their golang?21:08
*** gema has joined #openstack-infra21:09
ianwhrm, that doesn't look like intended behaviour21:09
ianwit does say non-fatal error ... we do have some extra stuff in there because yum doesn't exit with !0 on missing packages21:10
clarkbianw: ya devstack has a check of itself to look for Failure: and no package lines21:10
clarkbin this case Failure: is not going to match failure:, I don't think, since awk should be case-sensitive. I now believe the lack of a golang package is the issue21:10
clarkbwhich is devstack checking correctly that all packages installed (and they did not)21:11
ianwNo package golang available.21:11
ianwyeah, i think we've come to the same conclusion this is a correct detection of the golang package not being found  :)21:12
ianwwhy this just started happening ...21:12
ianwis another question21:12
mordredianw: because. raisins21:12
clarkbianw: because 7.6 just released21:13
clarkblikely our mirrors just recently finished releasing21:13
clarkbmwhahaha: you'll likely want to keep an eye out for any other fallout now that the become: true fix is in place21:13
clarkbmwhahaha: since there is a non zero chance there are other breaking issues21:13
openstackgerritEd Leafe proposed openstack-infra/project-config master: Add the os-resource-classes project  https://review.openstack.org/62166621:14
ianwyeah, i mean why golang would disappear between releases21:14
toskyor maybe golang is just somewhere else21:14
mwhahahaclarkb: yea21:14
clarkbtosky: regardless it's still a backward-incompatible change for a stable distro21:15
fungiperhaps they renamed the package?21:15
clarkbI don't think putting the package in a different location changes how I feel about that21:15
toskyclarkb: it depends on the place of the repository21:16
toskyon which repository21:16
fungior yeah maybe they moved it to a different rhn channel21:16
toskyI don't know how it works internally with golang, but I see this: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.5_release_notes/chap-red_hat_enterprise_linux-7.5_release_notes-deprecated_functionality21:16
fungior whatever they renamed those in the days since rhn21:16
clarkbtosky: http://mirror.centos.org/centos/7/os/x86_64/Packages/ is where it was and is now missing21:16
ianw"The golang package, available in the Optional channel, will be removed from a future minor release of Red Hat Enterprise Linux 7. Developers are encouraged to use the Go Toolset instead, which is currently available as a Technology Preview through the Red Hat Developer program. "21:17
clarkband ya that explains it21:17
ianwthat sounds likely21:17
fungiwhee!21:17
ianwi think centos has go toolset?21:17
*** diablo_rojo has quit IRC21:18
*** apetrich has joined #openstack-infra21:18
clarkbhttp://mirror.centos.org/centos/7/sclo/x86_64/rh/go-toolset-7/ is that it?21:18
clarkbthose versions are older than what was in 7.5, so it may not fix everything if the go version matters21:18
clarkbwait one is older one is newer21:18
*** diablo_rojo has joined #openstack-infra21:19
*** priteau has joined #openstack-infra21:19
toskyunless you also need to enable the repository with containers-related21:19
toskystuff21:19
toskyhttps://wiki.centos.org/Container/Tools -> it seems to contain golang21:20
*** manjeets_ is now known as manjeets21:20
ianwclarkb: the sclo i think is enabled via software collections, then you put it in your path21:21
clarkbhttps://git.openstack.org/cgit/openstack/octavia/tree/devstack/files/rpms/octavia is where it comes from so seems octavia specific21:21
clarkbdevstack runs aren't all trying to install it21:21
*** kgiusti has left #openstack-infra21:22
clarkbprobably up to octavia to decide what is the most appropriate method for installing golang in this case21:22
toskyit looks like that at least one of the featuresets in tripleo-quickstart enables the virt7-container-common-candidate repository, which provides golang too21:22
*** udesale has quit IRC21:23
EmilienMwe use virt7-container-common-candidate to pull podman mainly and its deps21:24
corvusokay, after much digging, i see that the "new" exception from the scheduler is not new at all; apparently for some time the scheduler has gotten sufficiently busy that there's a significant lag between when a job starts and the scheduler registers it.  if a job is canceled during that window, we can't notify the executor, and so we return the nodes out from under it.  when the job eventually21:25
corvusfails, we try to return the nodes again, but note that we don't have the lock.  in the end, everything works as it should (or, at least, as best it can).  i don't see an immediate fix to correct the underlying race which causes the errors.21:25
corvusso i think i'm happy with the current system state and plan to give the release folks the all-clear and send out that email21:26
corvusclarkb, fungi, pabelanger, mordred: ^ sound good21:26
clarkbcorvus: ++21:26
clarkbjohnsom: hopefully that gives you enough breadcrumbs to go about fixing it. I'm not sure how octavia is using golang so unsure how to best suggest to fix it. However, I think if it were me maybe install from upstream go?21:27
pabelangercorvus: ++21:28
fungicorvus: sounds good!21:29
cmurphyclarkb: https://review.openstack.org/602380 was approved but had a gate failure, I'm now holding it until someone can babysit it, when is a good time for me to release it?21:29
mordredcorvus: ++21:29
cmurphyor mordred ^21:30
clarkbcmurphy: fungi might be willing to help watch it? he has been digging into all the mailing list stuff recently21:30
clarkbI can help too, I just don't have the same level of mailman skills21:30
cmurphythe main thing is just watching the puppet log to see if anything changed, if anything changed we revert21:31
fungiclarkb: cmurphy: sure, happy to take a look, go ahead and un-wip21:31
openstackgerritMerged openstack-infra/system-config master: Don't import tasks in iptables reload and use listen  https://review.openstack.org/62163421:31
cmurphythanks fungi21:31
clarkbfungi: ^ and with that in hopefully we can unblock the list disabling21:34
cmurphyhmm should i recheck or will it make its way into the gate queue on its own?21:36
fungiclarkb: yep, i already rechecked my ml alias changes21:36
fungicmurphy: i've approved it just now21:37
clarkbcmurphy: if all you did is remove the -W then you probably need to recheck (or have someone approve it as fungi did)21:37
cmurphygot it thanks fungi21:37
fungimy pleasure!21:37
*** jcoufal has quit IRC21:37
* fungi goes back to writing a bunch of very redundant-looking e-mail messages21:37
clarkbcorvus: mordred: not sure if you saw https://github.com/kubernetes/kubernetes/issues/71411 during the relevant priority stuff. But any chance we can check if our cluster needs a rebuild and if that is possible? (does magnum give you the version of k8s it deploys or do you select one?)21:40
*** priteau has quit IRC21:41
corvusclarkb: i don't recall seeing a choice or information21:41
mordredI'm not super sure that would affect us anyway21:42
mordredit seems like a violation of network isolation21:42
clarkbmordred: it says in default configs the discovery api exposes it for all requests21:42
mordredright - but "Remove pod exec/attach/portforward permissions from users that should not have full access to the kubelet API"21:43
clarkbmordred: I read that to mean anyone on the internet (because our k8s api is internet facing right?) could exploit this to run pods21:43
mordredis one of the mitigations - and I don't believe we have any such users21:43
mordredclarkb: hrm. maybe?21:44
clarkbI think they listed the two ways you could exploit it and your thing is the second but not only way21:44
clarkbthe first way through the discovery api is what I am worried about21:44
clarkbya the articles on it say that the one you point out can give you admin on the cluster; the one I point out will let you run pods21:45
mordredclarkb: "aggregated API server endpoint" seems to be key21:46
mordredI mean - regardless, we should likely upgrade - or use it as an exercise to figure out how to upgrade even if we don't need to21:46
clarkbya I'm not sure we have the insight necessary to know how magnum is deploying things so erring on the side of caution here is probably a good idea21:47
mordredagree21:47
clarkbreading magnum user docs I don't see a managed upgrade command21:49
clarkbI'm thinking it may need to be a delete, create21:49
*** jmorgan1 has joined #openstack-infra21:53
clarkbor figure out how to do an upgrade in place on the cluster. Not sure if the commands to expand the cluster will work though (as it may end up with mismatched services?)21:55
clarkbhogepodge: ^ you probably know21:55
*** wolverineav has quit IRC21:56
clarkbmordred: thinking about it more I think you can use unauth'd discovery to get a pod, then use that to get admin. Considering that and our not having really used this at all yet, delete, create may be desirable21:59
mordredclarkb: ++22:04
corvusclarkb: yeah, but would be nice to know if/when that would be effective.22:05
corvusalso, i wonder if we can/should use the same keys.22:06
clarkbok reading more22:12
clarkbit seems that you have to have one of the non default aggregate server endpoints running22:12
fungisome sort of race or other nondeterminism in our snmp service test? http://logs.openstack.org/56/619056/2/check/system-config-run-base/30ad771/job-output.txt.gz#_2018-12-03_21_59_38_24768222:12
clarkbthats what the blurb about metrics is about22:13
clarkbmordred: ^ so I think we were both half right22:13
clarkbmordred: basically our api server is likely "vulnerable" but if there isn't the backend service endpoint behind it it can't be exploited22:13
*** jaosorior has quit IRC22:13
*** jaosorior has joined #openstack-infra22:16
*** rcernin has joined #openstack-infra22:18
*** pcaruana has quit IRC22:18
openstackgerritMerged openstack-infra/system-config master: Turn on future parser for lists.katacontainers.io  https://review.openstack.org/60238022:19
corvusfungi: this looks ok.  i'm not sure if it should be that short (compared to preceding/following lines): http://logs.openstack.org/56/619056/2/check/system-config-run-base/30ad771/job-output.txt.gz#_2018-12-03_21_56_52_32249822:21
corvusfungi: same: http://logs.openstack.org/56/619056/2/check/system-config-run-base/30ad771/job-output.txt.gz#_2018-12-03_21_56_56_26160222:22
corvusfungi: if it happens again, it might be good to hold the node and capture the syslog22:22
corvusor, well, actually, we should just do that in the post playbook regardless22:23
clarkbfwiw we do appear to have the default access to some bits of the api as an unauthenticated k8s user22:23
*** udesale has joined #openstack-infra22:23
clarkbbut hard to know if there are aggregated api servers running behind that22:23
corvus(of course, "capture the syslog" across all the systems we use is an impossibly complex task compared to 2 years ago)22:24
clarkbcorvus: for our control plane at least everything should still use rsyslog (journald will forward there)22:24
corvusoh good22:24
clarkbI'm not sure what the context of that is, but ya the way ubuntu and centos have set things up, journald is actually a ring buffer that forwards to rsyslog. And pre-systemd it's just syslog22:25
*** udesale has quit IRC22:25
clarkbso they should all have a consistent interface to permanent logs (which is wherever rsyslog has written them, which differs across distros)22:25
*** wolverineav has joined #openstack-infra22:25
*** udesale has joined #openstack-infra22:26
fungiclarkb: context was getting snmpd's syslogged errors from a test node in our ansible base-test integration jobs22:27
*** priteau has joined #openstack-infra22:27
*** ramishra has quit IRC22:28
clarkbah I would expect that to be in /var/log/messages or /var/log/syslog depending on the platform then22:28
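A hedged sketch of what such a post playbook could look like, not the actual system-config job; it assumes Zuul's standard zuul.executor.log_root variable and the per-distro log paths just mentioned:

    - hosts: all
      tasks:
        - name: Collect the persistent syslog from every node
          fetch:
            src: "{{ '/var/log/syslog' if ansible_os_family == 'Debian' else '/var/log/messages' }}"
            dest: "{{ zuul.executor.log_root }}/{{ inventory_hostname }}/"
          become: true
          failed_when: false   # log collection should never fail the job itself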
*** priteau has quit IRC22:29
fungiyeah, hopefully as this is an attempt at replicating bits of our control plane for an integration test, behavior should be similar22:29
*** boden has quit IRC22:32
dmsimardbtw heads up, CentOS 7.6 is rolling out22:34
dmsimardah, just caught up with backlog :p22:35
clarkbdmsimard: oh we've already discovered it :) broke tripleo and octavia22:35
* dmsimard sighs22:35
mwhahahawell if that's the only issue with tripleo it'll be one of the smoothest transitions22:36
* mwhahaha knocks on wood22:36
clarkbmwhahaha: http://logs.openstack.org/90/614290/2/gate/tripleo-ci-centos-7-standalone/5c77eaf/job-output.txt.gz#_2018-12-03_21_18_15_207420 paunch just ran into that22:36
clarkbI think the error occurred because we don't support nested virt in inap22:37
mwhahahait's ignored22:37
mwhahahait failed cause of another reason22:37
mwhahahahttp://logs.openstack.org/90/614290/2/gate/tripleo-ci-centos-7-standalone/5c77eaf/job-output.txt.gz#_2018-12-03_21_55_50_13660822:37
mwhahahatempest has been hanging for some weird reason22:37
clarkbmwhahaha: maybe only load kvm_intel if vmx is present? (will clean up the logs)22:38
mwhahahayea we can clean up that role. it's our role to check if we should be using qemu or not for nova22:39
clarkbalso it fails later trying to connect to tempest-sendmail.tripleo.org:8080 ?22:39
*** mriedem is now known as mriedem_away22:40
mwhahahayea i don't know the deal with that code, will need to raise a bug (and maybe disable it)22:40
clarkbzuul can be configured to report via email if you'd like to set that up.22:40
mgagne_clarkb: vmx flag exists on our processor. is the issue that it isn't exposed to the VM?22:40
mwhahahano this is the tempest failures being sent out22:40
clarkbmgagne_: ya you have to expose it to the middle VM for the nested virt to work22:41
clarkbmwhahaha: the reports can point to job logs which include the tempest failures?22:41
dmsimardmwhahaha: fwiw the base centos image is 7.5, nodepool hasn't built the 7.6 yet apparently22:41
mgagne_clarkb: right but what's the current status? I don't remember what we did22:41
clarkbmwhahaha: another thing we should look at cleaning up is https://review.openstack.org/#/c/567224/, periodic jobs can be used for that22:41
mwhahahaclarkb: those are basically periodic but < 8 hours (which was previously the periodic limit)22:42
clarkbmgagne_: I think it is enabled on some systems but not others? I've not followed it super closely. johnsom tends to have a good overview of it22:42
mwhahahai think they are every 4, but yes it might make sense to look into a different way of running those22:42
clarkbmwhahaha: ok I'm not sure how circumventing the limit is any better?22:42
mgagne_they all have the same CPU and configs.22:42
clarkbbasically that's a bug and it's wrong, so please can we fix it with the correct tool (periodic jobs)22:42
mwhahahaperiodic is just one job right?22:42
mgagne_hopefully they have the same BIOS settings, that I'm not sure22:42
mwhahahanot *all* jobs for a repo?22:42
clarkbmwhahaha: periodic is a pipeline; you configure which jobs to trigger on the period22:43
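Roughly what that looks like in Zuul configuration; the pipeline name, cadence, and job are illustrative:

    # Pipeline with a timer trigger, defined once centrally:
    - pipeline:
        name: periodic
        manager: independent
        trigger:
          timer:
            - time: '0 */4 * * *'   # e.g. every four hours

    # Each project then opts specific jobs into that cadence:
    - project:
        periodic:
          jobs:
            - tripleo-ci-centos-7-standalone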
mwhahahai'll raise the issue with the appropriate folks, i don't really like those anyway22:43
clarkbmgagne_: it's a hypervisor kvm option, not a bios flag, to pass it through22:43
clarkbmgagne_: let me hop on an instance and double check22:43
johnsommgagne_ Hi, what is your nested virtualization question?22:43
mgagne_clarkb: could be that VT is disabled in the bios22:43
mgagne_johnsom: someone suspects that vmx flag isn't exposed in inap-mtl01. I'm saying our CPU have vmx flag. so I'm wondering what's the actual issue.22:44
clarkbmgagne_: johnsom I've just hopped on an instance and don't see vmx in the VM22:45
johnsommgagne_ Ah, ok. Yeah, so if your hypervisor level sees VMX in the cpuinfo, your hardware virtualization is enabled.22:45
funginova has to be configured to pass that through to the instances, correct?22:45
clarkbsystemd-detect-virt says kvm so the hypervisor is running with virt enabled (it would say qemu otherwise)22:45
mgagne_ok, let me see which CPU model is exposed then22:45
clarkbfungi: I think its kvm actually22:45
fungiahh22:45
johnsommgagne_ However, you then need to enable your hypervisor to expose VMX inside the guests as well.22:45
dmsimardmgagne_: http://paste.openstack.org/show/736600/22:46
*** udesale has quit IRC22:46
clarkbmgagne_: it's not urgent, just pointing out that tripleo seemed to assume nested virt in the testing, which added noise to the logs22:46
mgagne_so I think it has to do with the CPU model used by libvirt which does not include vmx.22:46
clarkbmwhahaha: the other tool to keep in mind there is openstack health22:47
johnsommgagne_ What hypervisor are you using?22:47
clarkbmwhahaha: it uses subunit to track things at a test level and you can rss/atom subscribe to feeds for things like that22:47
mgagne_johnsom: libvirt+kvm22:47
clarkbmwhahaha: but it gives you nice graphing over time and so on22:47
mwhahahayea we use that too22:47
*** irclogbot_1 has quit IRC22:47
clarkbmgagne_: ah interesting22:47
johnsommgagne_ These are the steps for a KVM hypervisor: https://docs.openstack.org/devstack/latest/guides/devstack-with-nested-kvm.html22:47
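The gist of that guide, as a hedged Ansible sketch for Intel hosts (AMD uses kvm_amd and the svm flag instead). This only enables nesting on the hypervisor; the guest CPU model still has to expose vmx, which is the separate nova-side issue discussed below.

    - name: Persist nested KVM across reboots
      copy:
        dest: /etc/modprobe.d/kvm-nested.conf
        content: "options kvm_intel nested=1\n"
      become: true

    - name: Reload the module with nesting on (only safe with no guests running)
      shell: modprobe -r kvm_intel && modprobe kvm_intel
      become: true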
mwhahahathis is specifically to notify the correct people who care about specific test failures22:47
mgagne_I'll see what I can do22:47
clarkbmwhahaha: ya they should be able to subscribe to those failures in openstack health I think22:48
* mwhahaha shrugs22:48
mwhahahathis stuff predates alot of that22:48
mgagne_johnsom: I think that's not the issue atm, the issue is with the CPU model used by libvirt which doesn't include those flags.22:48
mwhahahai thought mail was turned off anyway22:48
clarkbmwhahaha: ya looks like that server isn't responding which leads to the later failure in that job22:48
mwhahahai'm filing bugs22:49
*** lbragstad has quit IRC22:51
*** lbragstad has joined #openstack-infra22:52
clarkbmordred: there was email to the -discuss list recently about how to upgrade existing magnum clusters. Looks like you need access to the host VMs and to run atomic container update commands22:53
clarkbmordred: so ya not exposed by the api as far as I can tell22:53
*** jaosorior has quit IRC22:53
*** rh-jelabarre has quit IRC22:54
*** jamesmcarthur has quit IRC22:55
clarkbI'm guessing we can't ssh into our magnum instances?22:55
*** rh-jelabarre has joined #openstack-infra22:57
clarkbwhat do you know I can ssh into them22:58
clarkbThere were 75084 failed login attempts since the last successful login.22:58
clarkbseems like ssh is keeping the badness out?22:58
fungiargh, can anyone interpret http://logs.openstack.org/58/621258/1/check/system-config-run-base-ansible-devel/3bb59c6/job-output.txt.gz#_2018-12-03_22_31_44_305164 ?22:59
fungilooks like it hit that on trusty, xenial and centos722:59
clarkbfungi: ansible inventory nodes use connections, ssh, windowswhateverpowershell?, etc23:00
fungisame error for all 3 so i don't think it's a coincidence23:00
clarkbseems that ssh is no longer valid?23:00
clarkbwe might not want to keep up with the ansible devel at this rate :P23:00
fungior i may just put lists.o.o in the emergency disable list temporarily and hand-apply 621258 so i can get on with things23:01
clarkbfungi: http://logs.openstack.org/58/621258/1/check/system-config-run-base-ansible-devel/3bb59c6/ansible/hosts/inventory.yaml is where we tell it to use the ansible_connection ssh23:02
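The inventory construct being pointed at is roughly the following (hostname and address here are hypothetical); if devel-branch ansible stopped resolving the connection plugin named this way, every host would fail identically, which matches the trusty/xenial/centos7 pattern above:

    all:
      hosts:
        test-node.example.com:
          ansible_host: 203.0.113.10      # hypothetical address
          ansible_connection: ssh         # the setting the error points at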
*** rh-jelabarre has quit IRC23:02
corvuslet's merge the non-voting change23:03
corvushttps://review.openstack.org/62157723:04
corvussomeone will need to remove frickler's WIP23:05
clarkbcorvus: fungi I removed the WIP and approved the change23:06
fungithanks!23:06
clarkbcorvus: mordred and other infra-root. We can ssh into the k8s nodes via the root user23:06
clarkbseems that the hosts use our aggregate ssh key23:06
clarkbcorvus: mordred: infra-root any reason not to attempt to upgrade the cluster under magnum as described on the -discuss list?23:06
corvusclarkb: ah yes, i knew that (i selected the keypair when creating it).  i didn't make that connection though.23:07
clarkbthere is a non zero chance that this will break the cluster but we aren't using it yet right? and maybe we'll learn things23:07
corvusclarkb: i say go for it yolo23:08
fungii must admit i'm not entirely clear on what or where said magnum cluster is23:08
*** irclogbot_1 has joined #openstack-infra23:08
clarkbfungi: corvus created a magnum k8s cluster in vexxhost sjc1 to point nodepool at23:08
corvusfungi: i made a magnum in vexxhost for nodepool23:08
fungiwas it used to test nodepool kubernetes driver?23:08
fungiahh, okay, good guess ;)23:08
clarkbits not been used yet as there was a bug in the config file23:08
clarkbnot sure if that was fixed23:08
corvusfungi: https://review.openstack.org/62075623:08
clarkb`sudo atomic pull --storage ostree docker.io/openstackmagnum/kubernetes-apiserver:v1.11.5-1` and `sudo atomic containers update --rebase docker.io/openstackmagnum/kubernetes-apiserver:v1.11.5-1 kube-apiserver` are the sorts of commands to run according to the mailing list23:09
clarkbI'll start on the master node and update all of the services to 1.11.5-1 there. Then update the minion services after23:09
clarkband if it breaks we can always rebuild it. But ya I figure it's a good learning opportunity to do this as an in-place upgrade23:09
clarkbcurrent version is 1.11.123:09
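A hedged sketch of that recipe applied across the master-side system containers: the apiserver names come from the commands quoted above, while the controller-manager and scheduler entries are assumptions about how the other magnum system containers are named.

    - hosts: k8s-master
      become: true
      tasks:
        - name: Rebase each kube system container to the patched tag
          shell: |
            atomic pull --storage ostree docker.io/openstackmagnum/kubernetes-{{ item }}:v1.11.5-1
            atomic containers update --rebase docker.io/openstackmagnum/kubernetes-{{ item }}:v1.11.5-1 kube-{{ item }}
          loop:
            - apiserver
            - controller-manager   # assumed to follow the same naming scheme
            - scheduler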
*** jtomasek has quit IRC23:10
fungiso i guess magnum doesn't manage the version of kubernetes in the way that, say, trove manages the version of mysql?23:11
clarkbcorrect23:11
clarkbthere is apparently ongoing work to support this? but the mailing list confirmed my reading of docs that we have to do it under magnum23:11
fungik23:11
clarkbthe other concern I have getting set up to do this is the magnum instances are built on fedora 27 which is no longer supported aiui23:12
clarkbprobably smaller concern since all services run out of containers, but ...23:12
fungiyou can in-place upgrade fedora though, right?23:13
*** yamamoto has joined #openstack-infra23:13
clarkbI think you "can" but its often recommended to do reinstall?23:13
fungior does kubernetes eat its own cloud-native dogfood and recommend that you redeploy your kubernetes control plane daily?23:13
jonherIs there a good reason to why lists.openstack.org does not do https?23:15
fungijonher: no point23:15
jonheralright, fair enough23:15
fungijonher: it sends out account passwords (the only thing https there would possibly protect) via unencrypted smtp on request23:15
fungiand those passwords are only for managing subscription preferences23:16
clarkbheh and now I've run out of disk space as we only have 5GB of disk on this node?23:17
jonherI just found some links to lists.openstack.org that had https, hence the question, I'll submit a MR in that project23:17
clarkbI'm going to see if it just didn't resize the rootfs on boot23:17
clarkbonce I figure out how to figure that out23:17
clarkb(yay learning things)23:17
fungijonher: my poc for upgrading to mailman3 suggests we'll probably switch to https when we do that, but it's a much different system too23:17
*** gema has quit IRC23:18
clarkbok lvm is set up and has ~32GB mounted under /var/lib/docker23:20
clarkb5GB mounted on sysroot23:20
clarkbproblem is we don't seem to use /var/lib/docker with atomic?23:20
corvusclarkb: i wonder if we can do a rolling replace of master/minions?23:22
clarkb/vda1 is /boot, /vda2 is sysroot mapped through lvm, /vdb is an ~80GB device of which ~32GB is exposed to docker-pool via lvm23:23
clarkbdocker-pool isn't actually mounted on anything from what I see23:23
clarkbmaybe the intent was to set docker-pool23:24
clarkber23:24
clarkbset docker-pool in /etc/docker/docker-lvm-plugin? but that wasn't done23:25
clarkbhrm though there is an lv on the docker vg so maybe that is automagic23:25
mgagne_looks like the only way to be able to add vmx flag in Nova is to run Rocky. Or to use host-passthrough cpu_mode. Version prior to Rocky allows you to provide extra CPU flags but there is a whitelist which does not include vmx, only pcid and others related to meltdown/spectre.23:26
fungimgagne_: that option was added to allow passing through the cpu flags for meltdown/spectre23:29
mgagne_yes23:29
mgagne_but won't help for vmx =)23:29
mgagne_unless I patch our version of nova to allow it23:30
fungii have to assume nested-virt support was accomplished some other way as i thought providers had been doing that for a while23:30
mgagne_and in fact, add the feature. still running mitaka.23:30
mgagne_fungi: maybe they are using host-passthrough? or host-model?23:30
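For reference, the host-passthrough route mgagne_ mentions is a one-line nova.conf change; a hedged sketch as an Ansible task (host-passthrough hands guests the full host CPU, including vmx, but effectively ties live migration to identical hardware):

    - name: Expose the host CPU (including vmx) to guests
      ini_file:
        path: /etc/nova/nova.conf
        section: libvirt
        option: cpu_mode
        value: host-passthrough
      become: true
      # nova-compute needs a restart for this to take effect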
fungii don't know enough about nova to know, other than having been privy to the meltdown/spectre discussions and seeing other providers exposing nested-virt acceleration support who weren't running rocky either and who i assumed weren't patching nova to do it23:32
fungibut... maybe they were/23:32
clarkbI freed up disk space with atomic images prune23:32
clarkbit deleted some ociimages data23:32
clarkbI think the docker lv must be used by k8s workload?23:33
clarkbbut atomic isn't running things with docker? or otherwise keeping its images and runtimes off of that lv?23:33
clarkbhrm that wasn't enough to pull the other images23:34
ianwok, so i'm all caught up on the devel branch issues.  the original bug exactly matches the change pointed out by frickler.  the additional issue of using a block: in the handler (621633) is a known problem as i mentioned in a comment there23:36
ianwso while i probably wouldn't agree ansible should break this without deprecation, it's all explained in my head at least now :)23:37
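Roughly what the merged 621634 fix amounts to: rather than one named handler wrapping import_tasks (or a block:), which devel-branch ansible no longer finds by name, flat handler tasks share a listen topic. Task details here are illustrative, not the actual system-config role.

    handlers:
      - name: Save the ruleset
        command: netfilter-persistent save
        listen: reload iptables
      - name: Restart the persistence service
        service:
          name: netfilter-persistent
          state: restarted
        listen: reload iptables

    tasks:
      - name: Install iptables rules
        template:
          src: rules.v4.j2
          dest: /etc/iptables/rules.v4
        notify: reload iptables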
openstackgerritMerged openstack-infra/system-config master: Tighten permissions on zone keys  https://review.openstack.org/61793923:38
openstackgerritMerged openstack-infra/system-config master: Make system-config-run-base-ansible-devel non-voting  https://review.openstack.org/62157723:38
clarkbfedora-atomic itself uses 4.4GB of disk for its ostree23:40
clarkbso Ican't really go deleting anything else23:40
clarkbmnaser: ^ as a heads up you may be interested in this as it feels like the vexxhost magnum deployment is not deployed on partitions large enough to do an in place k8s upgrade23:41
clarkbmnaser: you might want to double the size of vda to 10GB from 5GB?23:41
* mnaser reads backlog23:42
mnaserclarkb: i think for that when you create a magnum cluster you pick the docker volume size23:44
mnasermagnum cluster-show <foo> .. what does that show for docker_volume_size ?23:44
clarkbmnaser: no this is the sysroot that is the issue23:44
clarkbmnaser: I see the docker volume and it is ~80GB which si fine. The problem is that the host os itself uses atomic/ostree to run the system containers and I can't update those as sysroot is only 5GB large and fedora itself is 4.4GB23:44
clarkbbut let me show the cluster23:45
clarkbcoe cluster show Nodepool doesn't show volume sizes. Is that only available with magnumclient?23:47
mnaseri think it might be clarkb23:49
mnaserclarkb: i think this is a case of magnum creating a vm without volumes23:49
mnaserbut in sjc1 we do bfv only23:49
mnaserthat should probably be something we should fix23:50
clarkb| docker_volume_size  | 80                                                         |23:50
clarkbwhich is what I see on the pv/vg/lv side23:50
clarkbso I think that is fine. My understanding of the issue is that atomic runs these system level containers outside of docker. And those containers run k8s23:50
clarkbatomic itself is a 4.4GB "container" according to ostree which uses up almost the entire 5GB sysroot23:51
clarkbbut then I can't update the k8s container images as I run out of disk23:51
clarkbmnaser: are we able to specify the sysroot size somehow when creating the cluster?23:51
mnaserclarkb: unfortunately, i think the very fact that we are able to boot this cluster at all is a result of this bug: https://review.openstack.org/#/c/603910/23:52
*** pbourke has quit IRC23:53
mnaserwhen root_gb=0, it creates a 'disk' that is equal to the size of the image23:53
mnaserwhich really is a security issue to start with23:53
clarkbthat would explain it23:53
mnaserbut anyways, i think thats what is happening23:53
mnaseri wonder if magnum has bfv support, grr23:53
mnaserif not that's a fun exercise for me :)23:54
clarkbon the one hand atomic is supposed to be fairly atomic and maybe the answer here is wait for vexxhost to push new images and then redeploy, but that doesn't help people that have an existing cluster they want to keep using23:54
*** pbourke has joined #openstack-infra23:55
mnaserclarkb: yeah, what sort of issues did you run into? i havent had issues doing something like atomic host upgrade in the past23:55
mnaserbut it was on new clusters so maybe they didnt have a lot of space occupied by logs etc23:55
clarkbmnaser: `atomic pull --storage ostree docker.io/openstackmagnum/kubernetes-kubelet:v1.11.5-1` fails with `FATA[0033] Error committing the finished image: /builddir/build/BUILD/skopeo-7add6fc80b0f33406217e7c3361cb711c814f028/vendor/src/github.com/ostreedev/ostree-go/pkg/otbuiltin/commit.go:407 - Writing content object: fallocate: No space left on device`23:57
mnaserany reason why you were pulling that?23:57
clarkbmnaser: yes major k8s security vulnerability I'd like to patch :)23:57
mnaseroh that's nice to know.23:58
clarkband took this as a learning opportunity. I think for infra its no big deal to make a new cluster23:58
mnaserthat's kinda necessary23:58
mnaseryeah but it's a good exercise23:58
clarkbbut anyone that has a running cluster is likely going to want to upgrade in place rather than redeploy23:58
clarkbso figuring this out is also useful23:58
mnaserlook at that, working with a cloud provider pays off for both infra and provider23:58
mnaserwho knew23:58
mnaser:P23:58
mordredmnaser: ikr?23:58
clarkbmnaser: ya I mean we'll likely just reinstall it at this point, but figuring out the disk situation so that in the future we could just upgrade would be nice23:59
mnaserhttps://github.com/openstack/magnum/blob/c8019ea77f33609452dd1a973e0f421b118c2079/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml#L745-L76123:59
clarkbbut as you said that may depend on whether or not magnum understands bfv23:59
mnaserso it looks like it doesnt support boot from volume grrr23:59
