Monday, 2018-12-03

*** wolverineav has quit IRC00:04
*** eharney has quit IRC00:07
*** ahosam has quit IRC00:14
*** jamesmcarthur has quit IRC00:25
*** jamesmcarthur has joined #openstack-infra00:29
*** wolverineav has joined #openstack-infra00:34
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles.  https://review.openstack.org/608610  00:40
*** jamesmcarthur has quit IRC00:44
*** jamesmcarthur has joined #openstack-infra00:45
*** jamesmcarthur has quit IRC00:56
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles.  https://review.openstack.org/608610  01:16
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Add support for enabling the ARA callback plugin in install-ansible  https://review.openstack.org/611228  01:19
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  01:19
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Prefix install_openstacksdk variable  https://review.openstack.org/621462  01:19
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  01:19
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  01:21
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  01:21
*** jamesmcarthur has joined #openstack-infra01:27
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  01:32
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  01:32
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  01:45
<ianw> http://logs.openstack.org/28/611228/9/check/system-config-run-base-ansible-devel/a5abdca/job-output.txt.gz#_2018-12-03_01_33_26_430653  01:47
<ianw> this is an interesting traceback in our ansible devel branch job ... that's an exception from inside python's multiprocessing module  01:48
<ianw> it looks like ansible is a pretty sane user of that, so it seems like a fun bug somewhere  01:48
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles.  https://review.openstack.org/608610  01:50
*** hwoarang has quit IRC02:03
*** hwoarang has joined #openstack-infra02:04
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  02:11
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  02:11
*** hongbin has joined #openstack-infra02:13
*** mrsoul has joined #openstack-infra02:16
*** jamesmcarthur has quit IRC02:32
*** psachin has joined #openstack-infra02:42
*** hongbin has quit IRC02:46
*** wolverineav has quit IRC03:04
*** wolverineav has joined #openstack-infra03:04
*** jamesmcarthur has joined #openstack-infra03:07
*** bhavikdbavishi has joined #openstack-infra03:14
*** hongbin has joined #openstack-infra03:21
*** wolverineav has quit IRC03:28
*** armax has quit IRC03:29
*** hongbin has quit IRC03:30
*** ramishra has joined #openstack-infra03:31
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  03:31
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  03:31
*** jamesmcarthur has quit IRC03:35
*** jamesmcarthur has joined #openstack-infra03:35
*** hamzy__ is now known as hamzy03:36
*** jamesmcarthur has quit IRC03:40
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  03:45
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  03:45
*** wolverineav has joined #openstack-infra03:55
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  03:59
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  03:59
*** jamesmcarthur has joined #openstack-infra04:06
<ianw> 2018-11-29 03:43:13.751247 | bridge.openstack.org | ansible 2.8.0.dev0  04:08
<ianw> oh that's quite annoying, ansible doesn't include the git head in its version output when installed from source  04:09
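
(For context: a quick way to see what an installed ansible reports about itself -- ansible_version is a built-in variable, but as ianw notes it only carries the version string, not the git commit it was built from. A minimal sketch, not taken from the job in question:)

    # prints e.g. "2.8.0.dev0" -- no indication of which HEAD is running
    - name: Show inner ansible version
      debug:
        var: ansible_version
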
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  04:27
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  04:27
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role  https://review.openstack.org/621463  04:48
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  04:48
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] install ansible as editable during devel jobs  https://review.openstack.org/621471  04:48
*** yamamoto has joined #openstack-infra04:55
*** agopi has joined #openstack-infra05:05
*** hwoarang has quit IRC05:10
*** hwoarang has joined #openstack-infra05:11
*** jamesmcarthur has quit IRC05:22
*** wolverineav has quit IRC05:24
*** wolverineav has joined #openstack-infra05:43
*** wolverineav has quit IRC05:48
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [DNM] testing ansible task handler import  https://review.openstack.org/621475  05:56
*** yamamoto has quit IRC05:59
*** yamamoto has joined #openstack-infra06:00
*** hwoarang has quit IRC06:01
*** hwoarang has joined #openstack-infra06:03
*** elbragstad has quit IRC06:03
*** zul has quit IRC06:04
*** ykarel has joined #openstack-infra06:08
<openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: WIP: Add spec for scale out scheduler  https://review.openstack.org/621479  06:23
<openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: WIP: Add spec for scale out scheduler  https://review.openstack.org/621479  06:24
*** apetrich has quit IRC06:25
*** wolverineav has joined #openstack-infra06:34
<openstackgerrit> Surya Prakash (spsurya) proposed openstack-infra/zuul master: dict_object.keys() is not required for *in* operator  https://review.openstack.org/621482  06:35
*** ralonsoh has joined #openstack-infra06:37
*** yamamoto has quit IRC06:37
*** yamamoto has joined #openstack-infra06:38
*** apetrich has joined #openstack-infra06:40
*** kjackal has joined #openstack-infra06:47
*** wolverineav has quit IRC06:55
*** wolverineav has joined #openstack-infra06:56
*** rcernin has quit IRC06:57
*** yamamoto has quit IRC06:58
*** yamamoto has joined #openstack-infra06:59
*** wolverineav has quit IRC07:01
*** quiquell|off is now known as quiquell07:10
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [DNM] testing ansible task handler import  https://review.openstack.org/621475  07:13
*** rkukura has quit IRC07:14
*** dpawlik has joined #openstack-infra07:15
*** dpawlik has quit IRC07:20
*** dpawlik_ has joined #openstack-infra07:20
*** aojea has joined #openstack-infra07:23
*** pcaruana has joined #openstack-infra07:25
*** wolverineav has joined #openstack-infra07:27
*** wolverineav has quit IRC07:28
*** wolverineav has joined #openstack-infra07:28
*** gema has joined #openstack-infra07:37
*** quiquell is now known as quiquell|brb07:40
*** wolverineav has quit IRC07:43
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [DNM] testing ansible task handler import  https://review.openstack.org/621475  07:46
*** ahosam has joined #openstack-infra07:59
*** e0ne has joined #openstack-infra08:04
*** ginopc has joined #openstack-infra08:05
*** slaweq has joined #openstack-infra08:06
*** ahosam has quit IRC08:08
*** shardy has joined #openstack-infra08:11
*** yboaron_ has quit IRC08:12
<ianw> mordred / corvus / clarkb: it seems the iptables role has triggered a real issue somewhere in our ansible devel branch testing job; I've filed https://github.com/ansible/ansible/issues/49430 with details  08:12
<ianw> certainly it seems related to the importing of tasks into the reload handler  08:13
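
(The pattern ianw describes is roughly the following; the file path and imported file name are assumptions for illustration, not the actual role contents:)

    # playbooks/roles/iptables/handlers/main.yaml -- shape assumed
    - name: Reload iptables Debian
      import_tasks: reload-debian.yaml   # a handler that imports a task file;
                                         # this is what ansible's devel branch trips over
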
*** jtomasek has joined #openstack-infra08:15
*** jtomasek has quit IRC08:15
*** jtomasek has joined #openstack-infra08:16
*** quiquell|brb is now known as quiquell08:18
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [to squash] Modifications to ARA installation  https://review.openstack.org/621463  08:23
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/617216  08:23
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [wip] install ansible as editable during devel jobs  https://review.openstack.org/621471  08:23
<openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: [DNM] testing ansible task handler import  https://review.openstack.org/621475  08:23
<ianw> dmsimard: ^ could you review 621463 for me, and if you're happy, we can squash that into the base install-ara change?  personally i think we can get this in to collect the inner ara results in the gate quickly, as that is very useful  08:24
*** rossella_s has quit IRC08:26
*** ykarel is now known as ykarel|lunch08:26
*** jpena|off is now known as jpena08:28
*** rossella_s has joined #openstack-infra08:28
*** kjackal has quit IRC08:38
*** kjackal has joined #openstack-infra08:39
*** ccamacho has joined #openstack-infra08:45
*** rkukura has joined #openstack-infra08:46
*** tosky has joined #openstack-infra08:48
*** yboaron_ has joined #openstack-infra08:51
*** yboaron_ has quit IRC08:56
*** yboaron_ has joined #openstack-infra08:57
*** xek has joined #openstack-infra08:59
*** jpich has joined #openstack-infra09:03
*** aojea has quit IRC09:13
*** aojea has joined #openstack-infra09:14
<openstackgerrit> Merged openstack-infra/infra-manual master: Fix a reST block syntax  https://review.openstack.org/621455  09:37
<ssbarnea|rover> ianw: mordred corvus clarkb: would it be a problem to upload some periodic rdo job logs to logstash? I found some errors there that logstash would be very useful for.  09:37
*** gfidente has joined #openstack-infra09:38
*** ykarel|lunch is now known as ykarel09:41
<ianw> ssbarnea|rover: you should have a chat with tristanC about his log analysis stuff, it could probably import them  09:42
<ianw> to your question, i'm not sure, clarkb is probably the best to talk to.  09:42
<ssbarnea|rover> ianw: thanks. i will ask them.  09:43
*** sshnaidm|off is now known as sshnaidm09:43
*** derekh has joined #openstack-infra09:57
*** yamamoto has quit IRC10:00
*** yamamoto has joined #openstack-infra10:07
*** yamamoto has quit IRC10:10
*** fresta has quit IRC10:13
*** fresta has joined #openstack-infra10:14
*** kjackal has quit IRC10:17
*** kjackal has joined #openstack-infra10:18
*** electrofelix has joined #openstack-infra10:18
*** fresta has quit IRC10:22
*** electrofelix has quit IRC10:22
*** fresta has joined #openstack-infra10:22
*** bhavikdbavishi has quit IRC10:23
*** electrofelix has joined #openstack-infra10:31
*** shardy has quit IRC10:35
*** shardy has joined #openstack-infra10:43
*** adriancz has joined #openstack-infra10:45
*** panda|pto is now known as panda10:47
*** shardy has quit IRC10:55
*** ahosam has joined #openstack-infra10:55
*** priteau has joined #openstack-infra10:56
*** yamamoto has joined #openstack-infra11:04
*** jamesmcarthur has joined #openstack-infra11:11
*** yamamoto has quit IRC11:11
*** yamamoto has joined #openstack-infra11:15
*** jamesmcarthur has quit IRC11:15
*** sshnaidm has quit IRC11:16
*** sshnaidm has joined #openstack-infra11:16
*** sshnaidm has quit IRC11:18
*** rfolco has joined #openstack-infra11:18
*** sshnaidm has joined #openstack-infra11:19
*** quiquell is now known as quiquell|brb11:21
*** owalsh_ has quit IRC11:24
*** owalsh has joined #openstack-infra11:24
*** jpich has quit IRC11:25
*** jpich has joined #openstack-infra11:26
*** hamzy_ has joined #openstack-infra11:42
*** ahosam has quit IRC11:42
*** hamzy has quit IRC11:42
*** dtroyer has quit IRC11:43
*** dtroyer has joined #openstack-infra11:43
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles.  https://review.openstack.org/608610  11:44
*** quiquell|brb is now known as quiquell11:45
*** yamamoto has quit IRC11:48
*** yamamoto has joined #openstack-infra11:49
*** yamamoto has quit IRC11:49
*** yamamoto has joined #openstack-infra11:50
*** yamamoto has quit IRC11:56
<tobias-urdin> tonyb: we got consensus to remove the stable/newton branches for all projects but i think the thread kind of got lost in the openstack-dev list  11:57
<tobias-urdin> who can i talk to to queue up that work?  11:57
*** electrofelix has quit IRC11:58
*** electrofelix has joined #openstack-infra12:03
*** ykarel is now known as ykarel|afk12:03
*** ahosam has joined #openstack-infra12:03
*** owalsh has quit IRC12:03
*** shardy has joined #openstack-infra12:09
*** ykarel|afk is now known as ykarel12:11
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift resource provider  https://review.openstack.org/570667  12:13
*** ykarel is now known as ykarel|afk12:19
*** owalsh has joined #openstack-infra12:21
*** ykarel|afk has quit IRC12:30
*** jpena is now known as jpena|lunch12:34
*** dhill_ has joined #openstack-infra12:40
*** ramishra has quit IRC12:48
*** lpetrut has joined #openstack-infra12:52
*** rlandy has joined #openstack-infra12:54
*** lpetrut has quit IRC12:55
*** lpetrut has joined #openstack-infra12:55
*** dave-mccowan has joined #openstack-infra12:58
*** Douhet has quit IRC12:58
*** ramishra has joined #openstack-infra13:04
*** rh-jelabarre has joined #openstack-infra13:06
*** boden has joined #openstack-infra13:08
*** kjackal has quit IRC13:12
*** kjackal has joined #openstack-infra13:12
*** tpsilva has joined #openstack-infra13:15
*** jamesmcarthur has joined #openstack-infra13:17
*** ykarel|afk has joined #openstack-infra13:18
*** ykarel|afk is now known as ykarel13:19
*** agopi has quit IRC13:24
*** jpena|lunch is now known as jpena13:26
*** agopi has joined #openstack-infra13:30
*** udesale has joined #openstack-infra13:30
*** dave-mccowan has quit IRC13:32
*** jamesmcarthur has quit IRC13:33
<ssbarnea|rover> clarkb: let me know when you are here, i want to ask you about logstash.  13:34
*** ahosam has quit IRC13:35
*** zul has joined #openstack-infra13:36
*** jroll has quit IRC13:38
<fungi> ssbarnea|rover: are these logs from jobs which run in our ci system, or elsewhere? injecting third-party logs into our elasticsearch backend is something we've said in the past we won't support, and instead recommend those third parties operate their own log analysis systems (they're welcome to reuse the same mechanisms we do for running them if they like)  13:38
*** jroll has joined #openstack-infra  13:38
<fungi> ianw: catching up on scrollback, did you come to a conclusion on how to unblock system-config changes (the failing "Install IPv4 rules files" task)?  13:40
<ssbarnea|rover> fungi: so short answer no (no way to have a unified interface to query logs across different CI systems).  13:40
<ssbarnea|rover> i guess there is no need to explain why this would be useful (also related to elastic-recheck), as the same error could easily spread across different CIs  13:41
<fungi> ssbarnea|rover: right, we already struggle for a reasonable amount of retention with just the logs from our ci systems. we've also had other projects ask to reuse our elasticsearch cluster to house performance metrics from jobs in their jenkins simply so they can avoid having to maintain an elasticsearch cluster themselves... not sure where we can sanely draw the line, but previously we've said "only jobs which run in our ci system"  13:43
<fungi> you can also run your own elastic-recheck service. it's published under an open license too  13:44
<frickler> fungi: iiuc we'd have to make system-config-run-base-ansible-devel non-voting if we need to merge something before we find a fix or workaround for that ansible issue  13:45
<fungi> frickler: thanks, i need to go run some errands here shortly, but when i get back i can try to take a look so i can merge the mailing list changes which were scheduled to go in today  13:46
<frickler> fungi: I can prepare a patch for that  13:47
*** jaosorior has joined #openstack-infra  13:47
<fungi> is there a theory as to why ansible isn't finding the "Reload iptables Debian" handler?  13:48
<fungi> i saw ianw say something about exposing a bug in ansible  13:48
<frickler> fungi: https://github.com/ansible/ansible/issues/49430 has some details, but no root cause yet if I read it correctly  13:50
<ssbarnea|rover> fungi: :) i know, i was trying to lower the number of systems I need to check, not increase it. i do understand the reasons behind it. still, kibana supports doing queries on multiple clusters, which means it could be possible to configure it as a single frontend for both clusters.  13:50
<openstackgerrit> Jens Harbott (frickler) proposed openstack-infra/system-config master: Make system-config-run-base-ansible-devel non-voting  https://review.openstack.org/621577  13:51
*** Douhet has joined #openstack-infra13:52
*** jamesmcarthur has joined #openstack-infra13:52
<mordred> frickler: fascinating  13:57
<frickler> ianw: fungi: I think I found the commit that broke ansible for us, added a reference to the issue. still not sure whether that implies that our usage is broken  13:57
<frickler> mordred: ^^  13:57
<mordred> frickler: yah. I was just reading your comment there  13:57
<fungi> neat-o  14:00
*** fried_rice is now known as efried14:00
*** quiquell is now known as quiquell|lunch14:01
*** efried is now known as fried_rice14:01
*** fried_rice is now known as efried14:02
*** jcoufal has joined #openstack-infra14:02
*** kgiusti has joined #openstack-infra14:03
*** yboaron_ has quit IRC14:05
*** yboaron_ has joined #openstack-infra14:05
*** mriedem has joined #openstack-infra14:07
*** jcoufal has quit IRC14:07
<openstackgerrit> Jens Harbott (frickler) proposed openstack-infra/system-config master: Fix iptables handlers  https://review.openstack.org/621580  14:09
<frickler> ianw: fungi: mordred: ^^ I think that this should be the fix, waiting to see job results  14:10
*** jcoufal has joined #openstack-infra  14:11
<fungi> thanks frickler!  14:15
<mordred> neat!  14:16
*** jcoufal has quit IRC14:17
*** jcoufal has joined #openstack-infra14:19
*** psachin has quit IRC14:24
*** SteelyDan is now known as dansmith14:28
*** quiquell|lunch is now known as quiquell14:40
fungiokay, heading out for errands, back in a sec14:42
*** nhicher has joined #openstack-infra14:42
*** jpich has quit IRC14:42
*** lbragstad has joined #openstack-infra14:49
*** gema has left #openstack-infra14:49
*** jpich has joined #openstack-infra14:50
*** dave-mccowan has joined #openstack-infra14:54
*** bobh has joined #openstack-infra14:54
*** lbragstad has quit IRC14:58
*** lbragstad has joined #openstack-infra15:00
*** beekneemech is now known as bnemec15:01
*** jamesmcarthur has quit IRC15:06
*** sthussey has joined #openstack-infra15:16
<hughsaunders> Hey, I've been looking into nodepool again, and it seems there isn't an attempt to route requests to workers that have ready capacity. Also, ready capacity isn't evenly distributed, so once you have more than a few regions that can provide a label, the chances of hitting ready capacity are quite low. Eg if I have 5 regions and min-ready: 3, there will probably only be ready capacity in 2 regions, which gives a request a 2/5 chance of hitting a ready node.  15:30
<hughsaunders> I started digging into the code because I couldn't work out why my requests were waiting for new instance builds when there were ready nodes waiting.  15:31
<hughsaunders> So am I doing something wrong? Or have I come to an accurate summary of the current situation? If so, would you accept some kind of patch to attempt to prioritise regions with ready capacity?  15:32
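
(To illustrate the situation hughsaunders describes: min-ready in nodepool's config is a per-label total, not a per-provider count, so the ready nodes land in whichever regions happen to build them first. A hedged sketch with invented names and abbreviated provider structure:)

    labels:
      - name: example-label
        min-ready: 3        # 3 ready nodes in total across ALL providers
    providers:
      - name: region-1      # imagine five regions like this, all able to
      - name: region-2      # supply example-label; only the two or three
      # ... region-5        # holding ready nodes give a fast assignment
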
<fungi> hughsaunders: nodepool/zuul development discussions likely have a better audience in the #zuul channel, as nodepool technically isn't an openstack-infra project any longer  15:33
<hughsaunders> probably should have remembered that, apologies and thanks.  15:33
<corvus> hughsaunders: if you want to hop over to that channel, i can answer your question there :)  15:33
<dmsimard> I was looking at the state of the gate because it seemed like there was a little bit of a backlog. Is it okay for certain projects to have >25 jobs on a single change?  15:37
<mordred> dmsimard: yeah. there is a set of patches that started to be rolled out friday that are intended to make the backlog a bit fairer  15:40
<mordred> dmsimard: but as of now there haven't been any limits placed on the number of jobs per project  15:40
<dmsimard> I like to think that if they have that many jobs it's because they need them, was just genuinely curious -- I think I saw a set of changes by tobiash to get metrics too.  15:41
<mordred> dmsimard: yah. like - I have a ton of jobs on sdk ... but they're all actually useful (I keep trying to remove some)  15:42
*** jamesmcarthur has joined #openstack-infra  15:42
<tobiash> dmsimard: you mean https://review.openstack.org/616306 ?  15:43
<dmsimard> yeah  15:43
*** aojeagarcia has joined #openstack-infra15:51
*** ykarel has quit IRC15:53
*** ykarel has joined #openstack-infra15:53
*** yboaron_ has quit IRC15:54
*** aojea has quit IRC15:54
*** rtjure has quit IRC15:55
*** lennyb_ has quit IRC15:55
*** jhesketh has quit IRC15:55
*** dayou_ has quit IRC15:56
<AJaeger_> tripleo is running again - or still - non-voting jobs in gate ;( . EmilienM, jaosorior, please see https://review.openstack.org/616872 which is right now at the top of the zuul gate for tripleo and has 4 non-voting jobs  15:56
<EmilienM> AJaeger_: ok  15:56
*** jhesketh has joined #openstack-infra  15:57
*** lennyb has joined #openstack-infra  15:57
*** quiquell is now known as quiquell|off  15:58
*** janki has joined #openstack-infra  15:58
<EmilienM> mwhahaha: ^ didn't we fix that?  15:59
<mwhahaha> there's a patch  16:00
<mwhahaha> https://review.openstack.org/#/c/620705/  16:00
*** dayou_ has joined #openstack-infra  16:00
<mwhahaha> pending approval :/  16:00
<EmilienM> approved  16:00
*** Douhet has quit IRC  16:01
<AJaeger_> thanks, EmilienM and mwhahaha!  16:01
*** woojay has joined #openstack-infra  16:02
*** Douhet has joined #openstack-infra  16:02
<fungi> need us to promote that change to the front so it will take effect sooner?  16:03
<EmilienM> fungi: yes please  16:04
<AJaeger_> fungi: it's only for tripleo-ci and there's only 616872 at the top of the gate, let it finish  16:04
*** gyee has joined #openstack-infra  16:04
<fungi> k  16:05
<fungi> wasn't sure if it was in one of the longer shared queues  16:05
<AJaeger_> it is in the longer shared queue - but only relevant for tripleo-ci, and as there are no other changes for that repo, promoting would harm us IMHO  16:06
<fungi> ahh, figured if it was removing a lot of non-voting jobs then we would stop running them that much sooner  16:07
*** rtjure has joined #openstack-infra16:10
*** dpawlik has joined #openstack-infra16:11
*** adriancz has quit IRC16:14
*** dklyle has joined #openstack-infra16:15
*** dpawlik has quit IRC16:16
<clarkb> amorin: fungi: any word on whether we should reenable bhs1 at this point?  16:17
<clarkb> ssbarnea|rover: I am here if you want to talk logstash, or did fungi answer your questions?  16:17
*** dtantsur is now known as dtantsur|afk  16:17
<clarkb> I agree with fungi. Our elasticsearch and logstash tooling is built for our CI system. It's unfortunately not a great set of tooling to offer to third parties (due to AAA being non-existent and the size of the cluster already being quite large for the few days of logs we get out of it)  16:18
<ssbarnea|rover> clarkb: fungi answered most questions, mainly the only remaining one is whether we can configure kibana to query both elasticsearch clusters.  16:18
<clarkb> ssbarnea|rover: both meaning Infra's and RDO's?  16:18
<clarkb> no I don't think we should do that either  16:18
<ssbarnea|rover> clarkb: yep. rdo cluster could be optional.  16:19
<ssbarnea|rover> clarkb: i will try to see if I can configure the rdo kibana to query both (own and upstream).  16:19
*** dpawlik has joined #openstack-infra  16:20
<ssbarnea|rover> the idea is to have one unified query interface  16:20
<clarkb> the issue is that we aren't one unified system though  16:20
<clarkb> the infra team has zero ability to fix bugs in rdo  16:20
<clarkb> but presenting that data as coming from our CI system would imply otherwise  16:20
<clarkb> and I don't want to create that confusion  16:20
<fungi> we already get enough questions every time systems people incorrectly assume we manage go offline  16:21
<dmsimard> was anyone looking at the issues we had in ovh bhs?  16:21
<ssbarnea|rover> clarkb: never mind, i will try to configure rdo to query both.  16:21
<fungi> dmsimard: amorin said he was going to look into it, yes  16:21
*** bobh has quit IRC16:25
*** yamamoto has joined #openstack-infra16:25
*** janki has quit IRC16:26
<amorin> fungi: yes, I did try to take a look, but I was trapped in another topic  16:27
<dmsimard> amorin: let us know if we can help :)  16:29
<fungi> frickler: your proposed fix seems to be raising an "Unexpected Exception" from the iptables : Reload iptables (Debian) handler now  16:29
<fungi> not quite sure what to make of that  16:29
<fungi> http://logs.openstack.org/80/621580/1/check/system-config-run-base/caf2d3e/job-output.txt.gz#_2018-12-03_15_56_39_095892  16:30
*** yamamoto has quit IRC  16:30
<fungi> i think it's saying that `netfilter-persistent start` exited nonzero?  16:31
<fungi> hrm, though the json mentions an rc of 0  16:32
<fungi> so maybe it's not talking about that task  16:33
<openstackgerrit> James E. Blair proposed openstack-infra/zuul master: Don't calculate priority of non-live items  https://review.openstack.org/621626  16:35
<frickler> fungi: oh, that error is in -base now, not in -devel  16:35
<frickler> maybe the change isn't backwards compatible?  16:35
<fungi> ouch  16:35
<fungi> right, i missed that  16:35
<fungi> can we specify both import_tasks and include_tasks?  16:36
*** lpetrut has quit IRC  16:36
<fungi> or are they mutually exclusive?  16:36
<frickler> I have no idea, I'll leave this to ansible experts now. mordred ianw ^^  16:36
<frickler> we can merge the nv patch in the meantime I'd say  16:37
<clarkb> frickler: tldr is ansible 2.8.0 has broken things in a non-backward-compatible way?  16:39
<clarkb> I guess ianw filed a bug, maybe I should start by reading that  16:39
<frickler> clarkb: the issue and the links in it should have some information. I'm not sure whether it is really backwards incompatible or my fix just needs more knowledge  16:40
<frickler> clarkb: for sure the cited merge broke the way we use ansible currently  16:41
*** trown is now known as trown|lunch16:41
*** dpawlik has quit IRC16:42
*** e0ne has quit IRC16:48
<clarkb> looks like other users have reported similar issues  16:51
<clarkb> so maybe switching to a -nv job for now and waiting to see if ansible fixes it for all of us is the way forward?  16:51
*** dpawlik has joined #openstack-infra  16:52
<pabelanger> clarkb: corvus: I'm around again today if we want to try the nodepool / zuul upgrades again.  I admit, I am not sure if there are any issues preventing us from trying again this morning  16:57
<corvus> pabelanger, clarkb: yeah, we could try now, or we could wait for 621626 to land.  either should work.  16:58
<pabelanger> looking  16:59
<clarkb> If we wait then the end result is releasable assuming it works, right?  17:00
<pabelanger> looks like a few hours of waiting, assuming we don't enqueue  17:00
<openstackgerrit> James E. Blair proposed openstack-infra/system-config master: Don't import in iptables handlers  https://review.openstack.org/621633  17:00
<openstackgerrit> James E. Blair proposed openstack-infra/system-config master: Don't import tasks in iptables reload and use listen  https://review.openstack.org/621634  17:00
<corvus> frickler, clarkb, fungi, ianw, mordred: ^ two more alternatives to consider  17:01
<corvus> clarkb, pabelanger: why don't i direct-enqueue it  17:01
<pabelanger> +1  17:01
<clarkb> corvus: ++  17:02
*** udesale has quit IRC  17:02
*** aojeagarcia has quit IRC  17:07
<mordred> corvus: I think I like 621633 the best in this particular case, just because it's simpler  17:07
<corvus> mordred: yeah.  i kind of like listen, and would lean toward that, if it weren't for the 'when' issue  17:08
<mordred> yeah  17:09
*** graphene has joined #openstack-infra  17:10
<openstackgerrit> James E. Blair proposed openstack-infra/project-config master: Add #openstack-designate to accessbot  https://review.openstack.org/621639  17:15
<corvus> frickler: ^  17:15
*** armax has joined #openstack-infra  17:16
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool master: Extract out common config parsing for ConfigPool  https://review.openstack.org/621642  17:18
*** manjeets has joined #openstack-infra17:19
*** dpawlik has quit IRC17:19
*** bobh has joined #openstack-infra17:20
*** bobh has quit IRC17:21
<clarkb> fungi: gerrit slowness hasn't happened again and we are still blocking the stackalytics user?  17:25
*** jpich has quit IRC  17:25
<clarkb> corvus: frickler: re the -designate channel, I can't seem to list access for that channel with chanserv?  17:27
<corvus> clarkb: yep.  you will when the accessbot change lands  17:31
<clarkb> corvus: was it set up intentionally that way before? seems odd  17:32
<corvus> clarkb: not sure; might be a side effect of some of the modes set on it?  17:32
<corvus> i'm afk for 30m; should be ready to restart zuul when i get back  17:34
<corvus> apparently i jinxed it; py36 failed  17:34
<clarkb> fwiw I +2'd https://review.openstack.org/621633 as I agree with mordred that I prefer it because it is simpler  17:34
<clarkb> frickler: fungi: ^ if others want to maybe review a fix for the ansible thing  17:35
*** dpawlik has joined #openstack-infra  17:35
<corvus> the sql failures again; i'm going to re-enqueue  17:35
<corvus> (also, that's the second time i've seen the sql failures on limestone)  17:36
<mordred> clarkb: I have also +2'd, but have not +A'd so that we can get folks to weigh in  17:36
<mordred> clarkb: I wish there was a condorcet plugin for gerrit that would allow people to rank-vote on a collection of patches. I have no interest in writing such a plugin though  17:37
<corvus> mordred: ++  17:37
<clarkb> mordred: you could probably implement that entirely in prolog  17:37
<mordred> clarkb: yah. DEFINITELY don't want to implement a condorcet voting system in prolog  17:37
<clarkb> :)  17:38
<corvus> ok.  re-enqueued.  back in ~30.  17:38
<mordred> but maybe zaro will get bored one day and write it :)  17:38
*** dpawlik has quit IRC  17:39
<fungi> clarkb: correct, i haven't seen any coordinated reports of slowness (just the occasional ones which only seemed to affect one person and couldn't be reproduced globally)  17:41
<fungi> and i haven't removed the ip6tables rule blocking the address stackalytics-bot-2 was seen coming from  17:42
*** shardy has quit IRC17:46
*** bobh has joined #openstack-infra17:55
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool master: Extract out common config parsing for ConfigPool  https://review.openstack.org/621642  17:59
*** derekh has quit IRC18:02
*** e0ne has joined #openstack-infra18:03
<jonher> Gate never ran on https://review.openstack.org/619216/ - is a normal recheck required, or is there another command to only have it recheck gate?  18:05
<openstackgerrit> Merged openstack-infra/zuul master: Don't calculate priority of non-live items  https://review.openstack.org/621626  18:07
<openstackgerrit> Clark Boylan proposed openstack-infra/zuul master: Handle github delete events  https://review.openstack.org/621665  18:10
*** ralonsoh has quit IRC  18:11
<openstackgerrit> Ed Leafe proposed openstack-infra/project-config master: Add the os-resource-classes project  https://review.openstack.org/621666  18:11
<fungi> jonher: that's really strange. i don't see any indication of maintenance activity around that time  18:11
<clarkb> fungi: jonher: zuul was restarted a couple times on that day  18:12
<clarkb> trying to get the relative priority work deployed  18:12
<fungi> indeed it was, according to http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-11-30.log.html  18:12
<fungi> just never made it into https://wiki.openstack.org/wiki/Infrastructure_Status  18:12
<jonher> OK, so a simple "recheck" should get things going again?  18:13
<clarkb> jonher: yes  18:13
<clarkb> or better yet reapproval  18:13
<clarkb> which I've done  18:13
<clarkb> (then we can skip the check queue)  18:13
<fungi> the approve event in gerrit was at 22:49 and it looks like there was indeed a zuul scheduler restart in progress according to the channel log  18:13
<fungi> so no mystery, just poor timing on my part with the approve button  18:14
<openstackgerrit> Ed Leafe proposed openstack-infra/project-config master: Add the os-resource-classes project  https://review.openstack.org/621666  18:14
<jonher> gr8, thanks clarkb  18:14
*** jpena is now known as jpena|off18:17
*** apetrich has quit IRC18:18
<openstackgerrit> Merged openstack-infra/infra-manual master: Replace mailing list  https://review.openstack.org/619216  18:23
<clarkb> fungi: did the old mailing lists get disabled yet? that is on tap for today, right?  18:24
<jonher> ^ now it merged, thanks again :)  18:24
<openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: WIP: Fix broken setRefs whith missing objects  https://review.openstack.org/621667  18:24
<fungi> clarkb: that is on tap for today, but I need to be able to merge system-config patches to do that, ideally  18:25
<clarkb> fungi: did you see https://review.openstack.org/#/c/621633/ as a fix for that?  18:25
<fungi> yep, and earlier attempts  18:25
<fungi> was waiting to see check results  18:25
*** udesale has joined #openstack-infra18:26
*** apetrich has joined #openstack-infra18:30
*** electrofelix has quit IRC18:31
*** ykarel is now known as ykarel|away18:36
<corvus> clarkb, pabelanger: zuul change is in place; shall we start some restarts now?  18:38
*** eernst has joined #openstack-infra  18:39
<fungi> or restart some starts  18:39
<corvus> perhaps most accurately: restart some restarts  18:39
<clarkb> I'm around and ready  18:41
<fungi> also around and not mired in anything especially sticky  18:42
*** vabada has quit IRC  18:43
<corvus> would someone like to go ahead and restart the nodepool launchers?  18:44
<corvus> and i can restart the zuul scheduler afterwards  18:44
<fungi> i can do that  18:45
<fungi> any special care to take, or just service-restart them?  18:45
<corvus> fungi: maybe start with nl04  18:45
<corvus> we did merge at least one change since the last time we restarted them  18:45
<pabelanger> corvus: clarkb: I am around  18:46
<pabelanger> on standby if needed  18:46
<fungi> pbr freeze says we have nodepool==3.3.2.dev67  # git sha f116826 installed on nl04  18:46
<corvus> looks right  18:47
<fungi> that's what we're expecting, seems to match origin/master  18:47
<clarkb> ya, nl04 is a good choice while bhs1 is disabled  18:47
*** ykarel|away has quit IRC  18:47
<fungi> nodepool-launcher restarted on nl04 now  18:47
<corvus> it's going to be very very chatty for a bit  18:48
<fungi> with `service nodepool-launcher restart`, which seems to have worked. new pid, current time  18:48
<fungi> and yeah, tailing the debug log it is indeed chatty  18:48
<fungi> seems to have reached steady state now?  18:48
<fungi> it's handling requests anyway  18:49
<corvus> yeah, still very chatty.  i'm on the fence about whether we can handle that level long-term.  but it's going to be useful for the next little bit to be able to examine the new behavior.  18:49
<corvus> i think we can proceed to restart the rest  18:49
<fungi> shall i work my way down the list with nl03 next?  18:49
<corvus> ++  18:50
<fungi> f116826 is installed there too  18:50
<fungi> it's restarted on nl03 now  18:50
<fungi> while that's going, i've checked `pbr freeze` on nl02 and 01 and they both look right as well  18:52
<openstackgerrit> James E. Blair proposed openstack-infra/nodepool master: Make launcher debug slightly less chatty  https://review.openstack.org/621675  18:53
<corvus> that's for later ^  18:53
<fungi> i think nl03 is handling requests, the debug log is just such a firehose it never pauses  18:54
<fungi> shall i move on to nl02?  18:54
<corvus> fungi: yep  18:54
<fungi> okay, it's restarted as well  18:55
*** diablo_rojo has joined #openstack-infra  18:55
<mordred> corvus, fungi, clarkb: I'm around-ish .. but a dude is coming over to the house in a few minutes to give us some quotes on some work, so I'm not around-around  18:56
<corvus> hrm.  we're missing a debug line at the start of request processing; it's hard to tell (with grep) when the loop starts again  18:56
<fungi> i do see nl02 seeming to satisfy some requests though, according to the log  18:56
<fungi> if i'm reading correctly  18:56
*** wolverineav has joined #openstack-infra  18:57
<clarkb> yes, it appears to be declining requests for citycloud  18:58
<fungi> okay, moving on to nl01 i guess  18:58
<fungi> and it's restarted  18:59
*** trown|lunch is now known as trown|outtypewww  18:59
<fungi> this one's not so active compared to 02 and 03  18:59
<fungi> openstack.exceptions.HttpException: HttpException: 403: Client Error for url: https://ord.servers.api.rackspacecloud.com/v2/637776/servers, Quota exceeded for ram: Requested 8192, but already used 1638400 of 1641728 ram  19:00
<fungi> whee!  19:00
<clarkb> fungi: the launchers with disabled providers are more active since they just decline things  19:01
<clarkb> "more active"  19:01
*** wolverineav has quit IRC  19:01
<fungi> oh, right, that's what it is  19:01
*** wolverineav has joined #openstack-infra  19:01
<fungi> i hadn't made that connection  19:01
<mordred> that's working-as-designed :)  19:04
*** e0ne has quit IRC  19:04
<clarkb> ya, I think this looks happy  19:05
<clarkb> corvus: are we ready to restart zuul?  19:05
<fungi> seems sane on the launcher end now, at any rate  19:05
<corvus> let's hold the zuul restart for a few minutes; there's a release making its way through right now  19:05
<corvus> it has 1min left in gate; then of course the actual post-merge release activity  19:06
<corvus> https://review.openstack.org/#/c/620919/  19:06
<corvus> after that we should be good (see #openstack-release)  19:06
*** jamesmcarthur has quit IRC  19:06
<fungi> looks like the system-config fix is really, really close to getting node assignments  19:06
<corvus> fungi: it should still be after the restart.  19:08
<fungi> indeed  19:08
<corvus> (possibly closer)  19:08
*** shardy has joined #openstack-infra  19:17
<Shrews> hrm, did we remove a provider pool from nl04?  19:17
<Shrews> WARNING nodepool.driver.openstack.OpenStackProvider: Cannot find provider pool for node  19:17
*** e0ne has joined #openstack-infra  19:17
<clarkb> we disabled bhs1 via max-servers  19:19
<fungi> yeah, didn't remove one afaik  19:19
<clarkb> I don't think we removed any providers though. Does it not log the one it thinks is missing?  19:19
<Shrews> this is for ovh-gra1, which still exists in nodepool.yaml  19:19
<Shrews> something is weird there  19:19
<Shrews> pool and launcher node attributes are empty. maybe this is due to corvus' recent change...  19:20
<clarkb> we seem to have launched new nodes there since the restart  19:20
*** priteau has quit IRC  19:20
*** amotoki has quit IRC  19:21
<Shrews> hrm, not the change i was thinking of...  19:22
<corvus> Shrews: the pool is named "pool"?  19:23
*** amotoki has joined #openstack-infra  19:23
<corvus> oh, no, you said it's None.  sorry.  19:24
<fungi> cruft for something hanging around in zk?  19:24
<corvus> Shrews: could it be that when we create a fake node for deleting a failure, it has no pool entry?  19:25
<Shrews> corvus: seems that way (just a warning that i hadn't noticed). ovh doesn't seem to be able to delete that instance, so it's hanging around  19:26
<Shrews> so a problem with the provider  19:26
<corvus> Shrews: ok, so we're still trying to delete those nodes (ie, it's a non-fatal error)?  19:26
<Shrews> corvus: right  19:26
<fungi> since their upgrade (to newton i think?) ovh has been struggling to satisfy delete requests in a timely fashion  19:26
<tobiash> yes, we only set the provider  19:26
*** wolverineav has quit IRC  19:27
<Shrews> tobiash: is that warning useful?  19:27
<corvus> Shrews: ok.  we're probably seeing it more because of the recent fix to create those stub nodes more often (on launch failures which return an external id)  19:27
<clarkb> unrelated, but https://github.com/kubernetes/kubernetes/issues/71411 probably means we want to redeploy the nodepool k8s cluster when one of those patched versions is available  19:27
<tobiash> Shrews: from where does it come?  19:27
<corvus> Shrews: we could probably copy in the pool from the original request  19:27
<Shrews> tobiash: during quota calculation  19:27
<clarkb> corvus: ++  19:27
*** wolverineav has joined #openstack-infra  19:28
<tobiash> actually then that node isn't taken into account during quota calculation  19:28
<tobiash> so I think the warning was useful  19:28
<tobiash> I think we should add the pool to these nodes too  19:28
<Shrews> tobiash: ++  19:28
<tobiash> but that's something that has already been there for a long time, so nothing fatal  19:29
<corvus> heh -- the fix was to make sure the node was taken into account for quota.  so.. yep.  :)  19:29
<fungi> corvus: unrelated, but the 621633 fix for system-config is failing puppet-beaker-rspec-puppet-4-infra-system-config and system-config-run-base  19:30
<fungi> digging into logs for those now  19:30
<clarkb> is createServer what sets node.pool?  19:30
<fungi> the former is raising "ERROR! The requested handler 'Reload iptables Debian' was not found in either the main handlers list nor in the listening handlers list"  19:31
<fungi> and so is the latter  19:31
<tobiash> Shrews, corvus: as the comment states, the node is in a funny state: http://paste.openstack.org/show/736590/ :)  19:31
<fungi> so i guess that's still being referenced  19:31
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool master: Set pool for error'ed instances  https://review.openstack.org/621681  19:32
*** wolverineav has quit IRC  19:32
<Shrews> i think ^^ fixes it  19:32
*** wolverineav has joined #openstack-infra  19:32
<clarkb> oh right, it's because we make a copy of the node data structure in the bubbled-up exception handler  19:33
<clarkb> we don't use the actual node, instead that is reused  19:33
<clarkb> Shrews: ya I think that should fix it  19:33
*** bobh has quit IRC  19:33
<corvus> fungi: hrm.  i guess that doesn't work.  i don't immediately know why, but i wonder if it's due to the arcane rules for referencing handler tasks by name (referred to vaguely in one of the ansible bug reports)  19:34
<corvus> in other news, the release is done, so we can restart zuul now  19:34
<clarkb> do we need Shrews' fix to make quota crunching work properly?  19:35
<tobiash> clarkb: not immediately, this thing has already been there for a long time  19:35
<clarkb> we merged two related changes around that before. The first attempted to track the nodes properly and the second to track untracked nodes. I think these nodes are currently "tracked" but then fail to be deleted  19:36
<clarkb> tobiash: yes. Mostly wondering if the second related change will change the behavior in a more negative way than what we had before  19:36
<tobiash> which one?  19:36
<clarkb> tobiash: 56164c886a81c5d5c67eaac789a6288dd555189b  19:37
<clarkb> I guess it's the same as it was before, since ^ will see them as untracked and not account for them, and afbf9108d893ede0d147da2afe16c9e6d4bc76d4 will basically treat them as untracked too  19:37
<clarkb> so not a worse regression, just not fixed yet  19:37
<tobiash> clarkb: that doesn't make use of the pool, so shouldn't matter  19:38
<AJaeger_> corvus, clarkb, frickler: #openstack-designate redirects to #openstack-dns, I'll WIP https://review.openstack.org/#/c/621639 since I think it's wrong  19:38
<clarkb> AJaeger_: oh that explains it  19:38
<clarkb> corvus: I'm ready for scheduler restart whenever you are  19:39
<fungi> corvus: your 621634 alternative is actually passing all its jobs  19:39
<clarkb> looks like the tripleo gate just reset too  19:39
<clarkb> so not a bad time for it  19:39
<fungi> so that one might win for being the only one proposed so far which actually works ;)  19:40
<tobiash> clarkb: for the record, this is the change that introduced the 'without pool nodes': https://review.openstack.org/589854  19:40
<corvus> AJaeger_: can you elaborate on why you think the change is wrong?  19:40
<tobiash> so that merged 3 months ago  19:40
<clarkb> tobiash: ya, and afbf9108d893ede0d147da2afe16c9e6d4bc76d4 attempted to rely on it but was incomplete  19:41
<tobiash> ah that makes sense  19:42
<clarkb> fungi: that is weird since 621633 uses the existing handler names in the main handler file. Basically that didn't change. So odd we'd run into import_tasks errors in that file if it can't even find the handlers  19:42
<AJaeger_> corvus: see my comment - joining #openstack-designate, the topic is "This channel is unused, use #openstack-dns"  19:43
*** AJaeger_ is now known as AJaeger  19:43
<AJaeger> corvus: so, why are you adding it? What triggered that change?  19:43
<corvus> AJaeger: yes... i'm not suggesting anyone use it.  i'm just trying to establish basic access.  19:43
<fungi> clarkb: i dunno what to tell you, but aside from infra-puppet-apply-3-centos-7 which only just got a node assignment, every other job has reported success on 621634  19:43
*** gfidente is now known as gfidente|afk  19:43
<corvus> AJaeger: frickler needs access to that channel to be able to set +i to make the forward effective.  accessbot will grant him that access.  19:44
<corvus> AJaeger: see http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-11-28.log.html#t2018-11-28T18:57:50 and http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-11-28.log.html#t2018-11-28T19:04:52  19:44
<AJaeger> corvus: Ah! That explains it - thanks, then all is fine!  19:44
<corvus> AJaeger: but, moreover, i can't see how any change that adds an "#openstack-*" channel to accessbot would be wrong.  19:44
<corvus> all openstack channels should be managed by accessbot  19:44
<clarkb> fungi: ya, mostly just pointing out it's weird that a change which doesn't change the addressing of the handlers would go from failing to run said handlers due to import_tasks to failing to find the handlers at all.  19:45
<clarkb> seems like this should be alarm-bell-worthy for the ansible 2.8 release process if it's going to create havoc in handlers for people  19:45
<AJaeger> corvus: even if unused?  19:45
<corvus> AJaeger: yeah, i don't see why not  19:46
<corvus> AJaeger: otherwise, we won't maintain op access for new global irc ops, etc.  19:46
<AJaeger> Ok, I see...  19:46
<AJaeger> thanks for the explanation, corvus  19:46
<corvus> AJaeger: np :)  19:47
<fungi> AJaeger: consider a future state where we want to start using the channel again and we left it in some old state owned exclusively by accounts we replaced in intervening years  19:47
<fungi> keeping the abandoned channels in our accessbot config preserves our access to them  19:48
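
(For reference, the accessbot change under discussion presumably just adds an entry to the channel list in project-config; the exact schema here is assumed rather than copied from the repo:)

    # accessbot/channels.yaml -- structure assumed
    channels:
      - name: openstack-designate
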
<corvus> i'll restart the zuul scheduler now  19:48
*** shardy has quit IRC  19:48
<AJaeger> fungi: understood now - thanks  19:48
<clarkb> mordred: Shrews: pabelanger: is that something that ansible might find useful as prerelease feedback? do we need to do anything other than just watch the existing bugs for the issue?  19:49
<pabelanger> clarkb: feedback on the iptables issue?  19:51
*** e0ne has quit IRC  19:51
<clarkb> pabelanger: yes. Basically we can't use import_tasks anymore in the handlers. But then if we switch to using normal tasks in the handler (https://review.openstack.org/#/c/621633/1/playbooks/roles/iptables/handlers/main.yaml) then ansible 2.8 says it can't find the handler for Reload iptables Debian  19:51
<clarkb> the fix that does work is 621633, which adds explicit listens to the handlers  19:52
*** e0ne has joined #openstack-infra  19:52
<pabelanger> clarkb: Yah, we could ask in #ansible if it would be useful info  19:53
<fungi> clarkb: the fix which works (or seems to) is 621634, not 621633  19:55
<fungi> though it uses listen as you describe  19:55
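
(The listen approach fungi refers to looks roughly like this; the handler bodies are placeholders, not the real contents of 621634:)

    # handlers/main.yaml -- hedged sketch
    - name: Start netfilter-persistent
      command: netfilter-persistent start   # placeholder action
      listen: Reload iptables Debian
    - name: Reload ip6tables rules
      command: netfilter-persistent reload  # placeholder action
      listen: Reload iptables Debian
    # tasks keep "notify: Reload iptables Debian"; every handler listening
    # on that string fires, with no by-name handler lookup involved
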
<corvus> zuul is restarted  19:56
<clarkb> oh sorry, I copy-pasta'd wrong  19:57
*** wolverineav has quit IRC  19:57
<clarkb> fungi: yup, 621634 is the one I meant  19:57
*** wolverineav has joined #openstack-infra  19:58
<clarkb> corvus: is not being able to get a status related to zuul loading its config on first start?  20:00
<clarkb> oh there it goes  20:00
*** tpsilva has quit IRC  20:01
<corvus> clarkb: related to re-enqueuing (they're both gearman jobs)  20:01
<corvus> i've examined the extra debug logs from nodepool and verified that it's processing priority 0 requests before higher numbers  20:02
*** e0ne has quit IRC  20:02
<corvus> also, the priority column is visible in the 'nodepool request-list' output.  20:03
*** wolverineav has quit IRC  20:03
*** e0ne has joined #openstack-infra  20:03
<pabelanger> Yay  20:04
<clarkb> fungi: do you think we should enqueue 621634 to the gate since it's been shown to work but didn't finish check testing?  20:05
<fungi> clarkb: yes, i think so, as long as everyone prefers that to making the job non-voting  20:05
<fungi> it seemed to be the least preferred of the various attempts at fixing, so i wasn't sure  20:05
<corvus> so i think things are functioning correctly; probably the next step is to see if things behave how we expect with the changes.  that will probably be easier to evaluate after we get past the restart.  20:06
<clarkb> fungi: I think this type of error shows there is value in having the test, and I worry that if you set it non-voting we'll just ignore new failures  20:06
<fungi> me too  20:06
<fungi> corvus: i concur  20:06
<clarkb> corvus: ya, last time it seemed that the restart made it hard to see what was normal behavior  20:06
<clarkb> I've +2'd 621634 and think we can move forward with that while ansible figures out if it's broken things sufficiently for fixing  20:07
<corvus> it's priority 1, btw.  20:07
<corvus> 621634 is  20:07
<fungi> i guess 33 was pri0  20:08
<corvus> so, aside from the fact that the whole system is busy satisfying nodes for the changes which arrived first, it's pretty high on the list for check nodes  20:08
<clarkb> fungi: yup  20:08
<corvus> ooh  20:09
<corvus> i want to dequeue 33 and see if 34 gets bumped  20:09
<fungi> an excellent test!  20:09
<fungi> i say go for it  20:09
<corvus> done!  20:09
*** e0ne has quit IRC  20:10
<fungi> it fell out of the check pipeline at least  20:10
<corvus> 2018-12-03 20:09:39,668 DEBUG zuul.nodepool: Revised relative priority of node request <NodeRequest 300-0000624391 <NodeSet [<Node None ('bridge.openstack.org',):ubuntu-bionic>, <Node None ('trusty',):ubuntu-trusty>, <Node None ('xenial',):ubuntu-xenial>, <Node None ('bionic',):ubuntu-bionic>, <Node None ('centos7',):centos-7>]>> from 1 to 0  20:10
<clarkb> jobs just started  20:10
<clarkb> seems to work as expected  20:10
<corvus> yep -- that log line, plus i checked nodepool request-list and saw it go from 1 to 0  20:11
<fungi> and it's getting nodes already  20:12
<fungi> yeah  20:12
<fungi> slick!  20:12
<pabelanger> ++  20:12
<corvus> \o/  20:12
<fungi> i love it when a plan comes together  20:12
*** jamesmcarthur has joined #openstack-infra  20:12
* corvus lights cigar  20:12
<corvus> i've rechecked 633 (for posterity)  20:15
<corvus> granted, posterity is, what, a few weeks around here, but hey.  20:15
*** david-lyle has joined #openstack-infra  20:16
<corvus> so i think i'll eat some food now, and then come back and make sure that we're actually reporting on changes and don't have any crazy new exceptions, then i'll send that email we drafted friday  20:16
<fungi> thanks! i'll get back to drafting e-mails about mailing list shutdowns  20:17
*** manjeets_ has joined #openstack-infra20:17
*** e0ne has joined #openstack-infra20:17
*** eernst has quit IRC20:18
*** manjeets has quit IRC20:18
*** dklyle has quit IRC20:18
*** munimeha1 has joined #openstack-infra20:19
*** jamesmcarthur has quit IRC20:20
*** e0ne has quit IRC20:21
*** jamesmcarthur has joined #openstack-infra20:22
* mordred is back - looks like the new stuff is working good!  20:26
<clarkb> ssbarnea|rover: fyi http://logs.openstack.org/38/618638/1/gate/tripleo-ci-centos-7-containers-multinode/45126b1/ara-report/file/eb257cab-ab3a-45e8-8d69-f33d118f5916/#line-10 is failing because it needs root  20:27
<ssbarnea|rover> clarkb: ouch... it's almost 9pm here...  20:28
<clarkb> I'm guessing https://review.openstack.org/#/c/616872/ is the cause. No worries, thought I'd point it out to someone in tripleo  20:30
<clarkb> EmilienM: mwhahaha: ^ you may care too and be in more awake timezones right now ;)  20:30
*** udesale has quit IRC  20:30
<mwhahaha> waa?  20:30
<mwhahaha> oh thanks  20:30
<clarkb> actually that change may be unrelated. Now thinking that maybe if a package has no updates and is already installed, that works because you can yum info/list without root  20:31
<clarkb> but if that package has updates in rdo or centos or somewhere, then we'll try to upgrade it and then it breaks  20:31
<clarkb> in any case become: true is likely necessary there  20:32
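
(The fix clarkb points at is privilege escalation on the package task; the task and package names below are illustrative only:)

    # yum can query package state unprivileged, but installing or
    # upgrading needs root -- hence become: true
    - name: Ensure loop device packages are present   # name assumed
      package:
        name: util-linux
        state: present
      become: true
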
<mwhahaha> i shall fix  20:32
*** wolverineav has joined #openstack-infra  20:32
<ssbarnea|rover> clarkb: thanks for reporting, i am creating a bug for it now. become: true is a must there, it's... obvious.  20:32
<mwhahaha> ssbarnea|rover: you want to fix it, since you're creating the bug?  20:33
<ssbarnea|rover> mwhahaha: ok, i will do both. pinging you to review.  20:33
<ssbarnea|rover> in fact creating the bug is harder than creating the CR :D  20:34
<mwhahaha> pretty much  20:34
<ianw> at least it seems the ansible-devel job is working to find issues well before we update and everything explodes :)  20:36
<clarkb> ianw: ya, and the fix for the first issue seems to have found a second issue  20:37
*** graphene has quit IRC  20:38
<ianw> clarkb: so that's 621633 ... where using the block: the handlers also don't seem to be found/triggered?  20:39
<clarkb> ianw: ya, the handler isn't found  20:40
<clarkb> could be the same issue manifesting differently or two different issues, unsure  20:40
<corvus> how did those tripleo changes end up in gate with that error?  20:41
<ianw> ok, my github bug wasn't the best i know, i didn't have a test-case and only noticed it was the imports late in the day.  can work on getting something useful for the bug now that we have some smoking guns  20:41
<ssbarnea|rover> clarkb: https://review.openstack.org/#/c/621696/ -- going out now.  20:41
*** e0ne has joined #openstack-infra  20:42
*** e0ne has quit IRC  20:42
<corvus> we've merged changes since the restart  20:44
<clarkb> corvus: I think it may have to do with local image install state and remote package availability  20:44
<clarkb> corvus: ansible can check whether you have the latest installed without root. And if you do have the latest already installed, it's fine  20:44
<clarkb> corvus: but if the upstream package repo updates, then now you need root to reconcile the delta  20:44
*** hjensas has joined #openstack-infra  20:45
<clarkb> I expect all the changes to fail with those errors until 621696 merges or the upstream package repo reverts the update  20:45
<corvus> i think i see a new exception in the scheduler logs; i'm digging  20:48
<clarkb> and ya, there is a relatively recent package for util-linux in the centos 7 package repo. The timestamp is just over 2 weeks old. Unsure if that timestamp maps to build time or publish or what  20:48
<clarkb> seems like octavia is also having rpm/yum/centos related issues  20:50
<mwhahaha> ugh, so that ceph-loop-device thing is going to completely hose up the gate, any way to get that promoted to the top of the tripleo gate?  20:50
<clarkb> http://logs.openstack.org/38/617838/5/gate/octavia-v2-dsvm-scenario-centos-7/9e669f8/job-output.txt.gz#_2018-12-03_20_09_45_935256  20:50
<clarkb> mwhahaha: ya, if it gets approved I can enqueue and promote it  20:51
<mwhahaha> clarkb: aproved  20:51
<mwhahaha> all approved  20:51
<mwhahaha> er also  20:51
* mwhahaha gives up  20:51
<clarkb> I wonder if that would be a useful ansible-lint rule  20:52
<mwhahaha> yes  20:52
<clarkb> use become for package installs  20:52
<clarkb> promotion is running now  20:53
<clarkb> and done  20:54
<mwhahaha> thanks  20:54
clarkbjohnsom: rm_work hey not sure why yet, but it seems centos7 updates have broken octavia gates, you'll probably want to look into it20:54
clarkbI'm guessing today is the next point release release20:55
johnsomclarkb I saw a failure this morning with a missing package at RAX, just assumed it was a mirror sync issue20:55
clarkbjohnsom: I think its likely due to 7.6 or whatever the number is happening20:56
clarkbjohnsom: and packages being broken there? I'm not sure. Yum called it a non fatal rpm install thing20:56
johnsomThe one I saw was radvd couldn't be downloaded from the mirror20:57
clarkbhttp://logs.openstack.org/38/617838/5/gate/octavia-v2-dsvm-scenario-centos-7/9e669f8/job-output.txt.gz#_2018-12-03_20_09_10_181752 at least I'm not seeing anything else that could be thep roblem20:57
*** apetrich has quit IRC20:58
fungi#status log removed static.openstack.org from the emergency disable list now that ara configuration for logs.o.o site has merged20:58
openstackstatusfungi: finished logging20:58
*** udesale has joined #openstack-infra20:58
clarkbhttps://lwn.net/Articles/773680/ yup its likely 7.620:58
clarkb#status Log CentOS 7.6 appears to have been released. Our mirrors seem to have synced this release. This is creating a variety of fallout in projects such as tripleo and octavia. Considering that 7.5 is now no longer supported we should address this by rolling forward and fixing problems.20:59
openstackstatusclarkb: finished logging20:59
clarkbjohnsom: reading the devstack function for detecting failures, any one of those lines that says 'failure: something' will cause the failure to bubble up in devstack21:04
clarkbthough maybe the no package golang error is the actual issue?21:05
clarkbsure enough there is no golang package21:06
clarkbianw: ^ this is something you probably have the history around to know how to debug21:06
clarkbwell that's curious, 7.5 had golang21:07
clarkb7.6 does not21:07
johnsomThat seems like an issue bigger than a dot release...21:08
clarkbwell it's the dot release not being backward compatible by removing packages21:08
clarkbso yes, but also not much we can do about it? may need to enable epel and use their golang?21:08
*** gema has joined #openstack-infra21:09
ianwhrm, that doesn't look like intended behaviour21:09
ianwit does say non-fatal error ... we do have some extra stuff in there because yum doesn't exit with !0 on missing packages21:10
clarkbianw: ya devstack has a check of itself to look for Failure: and no package lines21:10
clarkbin this case Failure: is not going to match failure:, I don't think, since awk should be case-sensitive. I now believe the lack of a golang package is the issue21:10
clarkbwhich is devstack checking correctly that all packages installed (and they did not)21:11
ianwNo package golang available.21:11
ianwyeah, i think we've come to the same conclusion this is a correct detection of the golang package not being found  :)21:12
ianwwhy this just started happening ...21:12
ianwis another question21:12
mordredianw: because. raisins21:12
clarkbianw: because 7.6 just released21:13
clarkblikely our mirrors just recently finished releasing21:13
clarkbmwhahaha: you'll likely want to keep an eye out for any other fallout now that the become: true fix is in place21:13
clarkbmwhahaha: since there is a non zero chance there are other breaking issues21:13
openstackgerritEd Leafe proposed openstack-infra/project-config master: Add the os-resource-classes project  https://review.openstack.org/62166621:14
ianwyeah, i mean why golang would disappear between releases21:14
toskyor maybe golang is just somewhere else21:14
mwhahahaclarkb: yea21:14
clarkbtosky: regardless it's still a backward-incompatible change for a stable distro21:15
fungiperhaps they renamed the package?21:15
clarkbI don't think putting the package in a different location changes how I feel about that21:15
toskyclarkb: it depends on the place of the repository21:16
toskyon which repository21:16
fungior yeah maybe they moved it to a different rhn channel21:16
toskyI don't know how it works internally with golang, but I see this: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.5_release_notes/chap-red_hat_enterprise_linux-7.5_release_notes-deprecated_functionality21:16
fungior whatever they renamed those in the days since rhn21:16
clarkbtosky: http://mirror.centos.org/centos/7/os/x86_64/Packages/ is where it was and is now missing21:16
ianw"The golang package, available in the Optional channel, will be removed from a future minor release of Red Hat Enterprise Linux 7. Developers are encouraged to use the Go Toolset instead, which is currently available as a Technology Preview through the Red Hat Developer program. "21:17
clarkband ya that explains it21:17
ianwthat sounds likely21:17
fungiwhee!21:17
ianwi think centos has go toolset?21:17
*** diablo_rojo has quit IRC21:18
*** apetrich has joined #openstack-infra21:18
clarkbhttp://mirror.centos.org/centos/7/sclo/x86_64/rh/go-toolset-7/ is that it?21:18
clarkbthose versions are older than what was in 7.5, so it may not fix everything if the go version matters21:18
clarkbwait one is older one is newer21:18
*** diablo_rojo has joined #openstack-infra21:19
*** priteau has joined #openstack-infra21:19
toskyunless you also need to enable the repository with containers-related21:19
toskystuff21:19
toskyhttps://wiki.centos.org/Container/Tools -> it seems to contain golang21:20
*** manjeets_ is now known as manjeets21:20
ianwclarkb: the sclo i think is enabled via software collections, then you put it in your path21:21
clarkbhttps://git.openstack.org/cgit/openstack/octavia/tree/devstack/files/rpms/octavia is where it comes from so seems octavia specific21:21
clarkbdevstack runs aren't all trying to install it21:21
*** kgiusti has left #openstack-infra21:22
clarkbprobably up to octavia to decide what is the most appropriate method for installing golang in this case21:22
toskyit looks like that at least one of the featuresets in tripleo-quickstart enables the virt7-container-common-candidate repository, which provides golang too21:22
*** udesale has quit IRC21:23
EmilienMwe use virt7-container-common-candidate to pull podman mainly and its deps21:24
corvusokay, after much digging, i see that the "new" exception from the scheduler is not new at all; apparently for some time the scheduler has gotten sufficiently busy that there's a significant lag between when a job starts and the scheduler registers it.  if a job is canceled during that window, we can't notify the executor, and so we return the nodes out from under it.  when the job eventually21:25
corvusfails, we try to return the nodes again, but note that we don't have the lock.  in the end, everything works as it should (or, at least, as best it can).  i don't see an immediate fix to correct the underlying race which causes the errors.21:25
corvusso i think i'm happy with the current system state and plan to give the release folks the all-clear and send out that email21:26
corvusclarkb, fungi, pabelanger, mordred: ^ sound good21:26
clarkbcorvus: ++21:26
clarkbjohnsom: hopefully that gives you enough breadcrumbs to go about fixing it. I'm not sure how octavia is using golang so unsure how to best suggest to fix it. However, I think if it were me maybe install from upstream go?21:27
pabelangercorvus: ++21:28
fungicorvus: sounds good!21:29
cmurphyclarkb: https://review.openstack.org/602380 was approved but had a gate failure, I'm now holding it until someone can babysit it, when is a good time for me to release it?21:29
mordredcorvus: ++21:29
cmurphyor mordred ^21:30
clarkbcmurphy: fungi might be willing to help watch it? he has been digging into all the mailing list stuff recently21:30
clarkbI can help too, I just don't have the same level of mailman skills21:30
cmurphythe main thing is just watching the puppet log to see if anything changed, if anything changed we revert21:31
fungiclarkb: cmurphy: sure, happy to take a look, go ahead and un-wip21:31
openstackgerritMerged openstack-infra/system-config master: Don't import tasks in iptables reload and use listen  https://review.openstack.org/62163421:31
cmurphythanks fungi21:31
clarkbfungi: ^ and with that in hopefully we can unblock the list disabling21:34
cmurphyhmm should i recheck or will it make its way into the gate queue on its own?21:36
fungiclarkb: yep, i already rechecked my ml alias changes21:36
fungicmurphy: i've approved it just now21:37
clarkbcmurphy: if all you did is remove the -W then you probably need to recheck (or have someone approve it as fungi did)21:37
cmurphygot it thanks fungi21:37
fungimy pleasure!21:37
*** jcoufal has quit IRC21:37
* fungi goes back to writing a bunch of very redundant-looking e-mail messages21:37
clarkbcorvus: mordred: not sure if you saw https://github.com/kubernetes/kubernetes/issues/71411 during the relevant priority stuff. But any chance we can check if our cluster needs a rebuild and if that is possible? (does magnum give you the version of k8s it deploys or do you select one?)21:40
*** priteau has quit IRC21:41
corvusclarkb: i don't recall seeing a choice or information21:41
mordredI'm not super sure that would affect us anyway21:42
mordredit seems like a violation of network isolation21:42
clarkbmordred: it says in default configs the discovery api exposes it for all requests21:42
mordredright - but "Remove pod exec/attach/portforward permissions from users that should not have full access to the kubelet API"21:43
clarkbmordred: I read that to mean anyone on the internet (because our k8s api is internet facing right?) could exploit this to run pods21:43
mordredis one of the mitigations - and I don't believe we have any such users21:43
mordredclarkb: hrm. maybe?21:44
clarkbI think they listed the two ways you could exploit it and your thing is the second but not only way21:44
clarkbthe first way through the discovery api is what I am worried about21:44
clarkbya the articles on it say that the one you point out can give you admin on the cluster; the one I point out will let you run pods21:45
mordredclarkb: "aggregated API server endpoint" seems to be key21:46
mordredI mean - regardless, we should likely upgrade - or use it as an exercise to figure out how to upgrade even if we don't need to21:46
clarkbya I'm not sure we have the insight necessary to know how magnum is deploying things so erring on the side of caution here is probably a good idea21:47
mordredagree21:47
clarkbreading magnum user docs I don't see a managed upgrade command21:49
clarkbI'm thinking it may need to be a delete, create21:49
*** jmorgan1 has joined #openstack-infra21:53
clarkbor figure out how to do an upgrade in place on the cluster. Not sure if the commands to expand the cluster will work though (as it may end up with mismatched services?)21:55
clarkbhogepodge: ^ you probably know21:55
*** wolverineav has quit IRC21:56
clarkbmordred: thinking about it more I think you can use unauth'd discovery to get a pod, then use that to get admin. Considering that and our not having really used this at all yet, delete, create may be desirable21:59
mordredclarkb: ++22:04
corvusclarkb: yeah, but would be nice to know if/when that would be effective.22:05
corvusalso, i wonder if we can/should use the same keys.22:06
clarkbok reading more22:12
clarkbit seems that you have to have one of the non default aggregate server endpoints running22:12
fungisome sort of race or other nondeterminism in our snmp service test? http://logs.openstack.org/56/619056/2/check/system-config-run-base/30ad771/job-output.txt.gz#_2018-12-03_21_59_38_24768222:12
clarkbthats what the blurb about metrics is about22:13
clarkbmordred: ^ so I think we were both half right22:13
clarkbmordred: basically our api server is likely "vulnerable" but if there isn't the backend service endpoint behind it it can't be exploited22:13
*** jaosorior has quit IRC22:13
*** jaosorior has joined #openstack-infra22:16
*** rcernin has joined #openstack-infra22:18
*** pcaruana has quit IRC22:18
openstackgerritMerged openstack-infra/system-config master: Turn on future parser for lists.katacontainers.io  https://review.openstack.org/60238022:19
corvusfungi: this looks ok.  i'm not sure if it should be that short (compared to preceding/following lines): http://logs.openstack.org/56/619056/2/check/system-config-run-base/30ad771/job-output.txt.gz#_2018-12-03_21_56_52_32249822:21
corvusfungi: same: http://logs.openstack.org/56/619056/2/check/system-config-run-base/30ad771/job-output.txt.gz#_2018-12-03_21_56_56_26160222:22
corvusfungi: if it happens again, it might be good to hold the node and capture the syslog22:22
corvusor, well, actually, we should just do that in the post playbook regardless22:23
clarkbfwiw we do appear to have the default access to some bits of the api as an unauthenticated k8s user22:23
*** udesale has joined #openstack-infra22:23
clarkbbut hard to know if there are aggregated api servers running behind that22:23
corvus(of course, "capture the syslog" across all the systems we use is an impossibly complex task compared to 2 years ago)22:24
clarkbcorvus: for our control plane at least everything should still use rsyslog (journald will forward there)22:24
corvusoh good22:24
clarkbI'm not sure what the context of that is, but ya the way ubuntu and centos have set things up, journald is actually a ring buffer that forwards to rsyslog. And pre-systemd it's just syslog22:25
*** udesale has quit IRC22:25
clarkbso they should all have a consistent interface to permanent logs (which is wherever rsyslog has written them, which differs across distros)22:25
*** wolverineav has joined #openstack-infra22:25
*** udesale has joined #openstack-infra22:26
fungiclarkb: context was getting snmpd's syslogged errors from a test node in our ansible base-test integration jobs22:27
*** priteau has joined #openstack-infra22:27
*** ramishra has quit IRC22:28
clarkbah I would expect that to be in /var/log/messages or /var/log/syslog depending on the platform then22:28
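A hedged sketch of what such a post playbook could look like, not the actual system-config job; it assumes Zuul's standard zuul.executor.log_root variable and the per-distro log paths just mentioned:

    - hosts: all
      tasks:
        - name: Collect the persistent syslog from every node
          fetch:
            src: "{{ '/var/log/syslog' if ansible_os_family == 'Debian' else '/var/log/messages' }}"
            dest: "{{ zuul.executor.log_root }}/{{ inventory_hostname }}/"
          become: true
          failed_when: false   # log collection should never fail the job itself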
*** priteau has quit IRC22:29
fungiyeah, hopefully as this is an attempt at replicating bits of our control plane for an integration test, behavior should be similar22:29
*** boden has quit IRC22:32
dmsimardbtw heads up, CentOS 7.6 is rolling out22:34
dmsimardah, just caught up with backlog :p22:35
clarkbdmsimard: oh we've already discovered it :) broke tripleo and octavia22:35
* dmsimard sighs22:35
mwhahahawell if that's the only issue with tripleo it'll be one of the smoothest transitions22:36
* mwhahaha knocks on wood22:36
clarkbmwhahaha: http://logs.openstack.org/90/614290/2/gate/tripleo-ci-centos-7-standalone/5c77eaf/job-output.txt.gz#_2018-12-03_21_18_15_207420 paunch just ran into that22:36
clarkbI think the error occurred because we don't support nested virt in inap22:37
mwhahahait's ignored22:37
mwhahahait failed cause of another reason22:37
mwhahahahttp://logs.openstack.org/90/614290/2/gate/tripleo-ci-centos-7-standalone/5c77eaf/job-output.txt.gz#_2018-12-03_21_55_50_13660822:37
mwhahahatempest has been hanging for some weird reason22:37
clarkbmwhahaha: maybe only load kvm_intel if vmx is present? (will clean up the logs)22:38
mwhahahayea we can clean up that role. it's our role to check if we should be using qemu or not for nova22:39
clarkbalso it fails later trying to connect to tempest-sendmail.tripleo.org:8080 ?22:39
*** mriedem is now known as mriedem_away22:40
mwhahahayea i don't know the deal with that code, will need to raise a bug (and maybe disable it)22:40
clarkbzuul can be configured to report via email if you'd like to set that up.22:40
mgagne_clarkb: vmx flag exists on our processor. is the issue that it isn't exposed to the VM?22:40
mwhahahano this is the tempest failures being sent out22:40
clarkbmgagne_: ya you have to expose it to the middle VM for the nested virt to work22:41
clarkbmwhahaha: the reports can point to job logs which include the tempest failures?22:41
dmsimardmwhahaha: fwiw the base centos image is 7.5, nodepool hasn't built the 7.6 yet apparently22:41
mgagne_clarkb: right but what's the current status? I don't remember what we did22:41
clarkbmwhahaha: another thing we should look at cleaning up is https://review.openstack.org/#/c/567224/, periodic jobs can be used for that22:41
mwhahahaclarkb: those are basically periodic but < 8 hours (which was previously the periodic limit)22:42
clarkbmgagne_: I think it is enabled on some systems but not others? I've not followed it super closely. johnsom tends to have a good overview of it22:42
mwhahahai think they are every 4, but yes it might make sense to look into a different way of running those22:42
clarkbmwhahaha: ok I'm not sure how circumventing the limit is any better?22:42
mgagne_they all have the same CPU and configs.22:42
clarkbbasically that's a bug and it's wrong, so please can we fix it with the correct tool (periodic jobs)22:42
mwhahahaperiodic is just one job right?22:42
mgagne_hopefully they have the same BIOS settings, that I'm not sure22:42
mwhahahanot *all* jobs for a repo?22:42
clarkbmwhahaha: periodic is a pipeline; you configure which jobs to trigger on the period22:43
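Roughly what that looks like in Zuul configuration; the pipeline name, cadence, and job are illustrative:

    # Pipeline with a timer trigger, defined once centrally:
    - pipeline:
        name: periodic
        manager: independent
        trigger:
          timer:
            - time: '0 */4 * * *'   # e.g. every four hours

    # Each project then opts specific jobs into that cadence:
    - project:
        periodic:
          jobs:
            - tripleo-ci-centos-7-standalone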
mwhahahai'll raise the issue with the appropriate folks, i don't really like those anyway22:43
clarkbmgagne_: it's a hypervisor kvm option, not a bios flag, to pass it through22:43
clarkbmgagne_: let me hop on an instance and double check22:43
johnsommgagne_ Hi, what is your nested virtualization question?22:43
mgagne_clarkb: could be that VT is disabled in the bios22:43
mgagne_johnsom: someone suspects that vmx flag isn't exposed in inap-mtl01. I'm saying our CPU have vmx flag. so I'm wondering what's the actual issue.22:44
clarkbmgagne_: johnsom I've just hopped on an instance and don't see vmx in the VM22:45
johnsommgagne_ Ah, ok. Yeah, so if your hypervisor level sees VMX in the cpuinfo, your hardware virtualization is enabled.22:45
funginova has to be configured to pass that through to the instances, correct?22:45
clarkbsystemd-detect-virt says kvm so the hypervisor is running with virt enabled (it would say qemu otherwise)22:45
mgagne_ok, let me see which CPU model is exposed then22:45
clarkbfungi: I think its kvm actually22:45
fungiahh22:45
johnsommgagne_ However, you then need to enable your hypervisor to expose VMX inside the guests as well.22:45
dmsimardmgagne_: http://paste.openstack.org/show/736600/22:46
*** udesale has quit IRC22:46
clarkbmgagne_: it's not urgent, just pointing out that tripleo seemed to assume nested virt in the testing, which added noise to the logs22:46
mgagne_so I think it has to do with the CPU model used by libvirt which does not include vmx.22:46
clarkbmwhahaha: the other tool to keep in mind there is openstack health22:47
johnsommgagne_ What hypervisor are you using?22:47
clarkbmwhahaha: it uses subunit to track things at a test level and you can rss/atom subscribe to feeds for things like that22:47
mgagne_johnsom: libvirt+kvm22:47
clarkbmwhahaha: but it gives you nice graphing over time and so on22:47
mwhahahayea we use that too22:47
*** irclogbot_1 has quit IRC22:47
clarkbmgagne_: ah interesting22:47
johnsommgagne_ These are the steps for a KVM hypervisor: https://docs.openstack.org/devstack/latest/guides/devstack-with-nested-kvm.html22:47
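The gist of that guide, as a hedged Ansible sketch for Intel hosts (AMD uses kvm_amd and the svm flag instead). This only enables nesting on the hypervisor; the guest CPU model still has to expose vmx, which is the separate nova-side issue discussed below.

    - name: Persist nested KVM across reboots
      copy:
        dest: /etc/modprobe.d/kvm-nested.conf
        content: "options kvm_intel nested=1\n"
      become: true

    - name: Reload the module with nesting on (only safe with no guests running)
      shell: modprobe -r kvm_intel && modprobe kvm_intel
      become: true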
mwhahahathis is specifically to notify the correct people who care about specific test failures22:47
mgagne_I'll see what I can do22:47
clarkbmwhahaha: ya they should be able to subscribe to those failures in openstack health I think22:48
* mwhahaha shrugs22:48
mwhahahathis stuff predates alot of that22:48
mgagne_johnsom: I think that's not the issue atm, the issue is with the CPU model used by libvirt which doesn't include those flags.22:48
mwhahahai thought mail was turned off anyway22:48
clarkbmwhahaha: ya looks like that server isn't responding which leads to the later failure in that job22:48
mwhahahai'm filing bugs22:49
*** lbragstad has quit IRC22:51
*** lbragstad has joined #openstack-infra22:52
clarkbmordred: there was email to the -discuss list recently about how to upgrade existing magnum clusters. Looks like you need access to the host VMs and to run atomic container update commands22:53
clarkbmordred: so ya not exposed by the api as far as I can tell22:53
*** jaosorior has quit IRC22:53
*** rh-jelabarre has quit IRC22:54
*** jamesmcarthur has quit IRC22:55
clarkbI'm guessing we can't ssh into our magnum instances?22:55
*** rh-jelabarre has joined #openstack-infra22:57
clarkbwhat do you know I can ssh into them22:58
clarkbThere were 75084 failed login attempts since the last successful login.22:58
clarkbseems like ssh is keeping the badness out?22:58
fungiargh, can anyone interpret http://logs.openstack.org/58/621258/1/check/system-config-run-base-ansible-devel/3bb59c6/job-output.txt.gz#_2018-12-03_22_31_44_305164 ?22:59
fungilooks like it hit that on trusty, xenial and centos722:59
clarkbfungi: ansible inventory nodes use connections, ssh, windowswhateverpowershell?, etc23:00
fungisame error for all 3 so i don't think it's a coincidence23:00
clarkbseems that ssh is no longer valid?23:00
clarkbwe might not want to keep up with the ansible devel at this rate :P23:00
fungior i may just put lists.o.o in the emergency disable list temporarily and hand-apply 621258 so i can get on with things23:01
clarkbfungi: http://logs.openstack.org/58/621258/1/check/system-config-run-base-ansible-devel/3bb59c6/ansible/hosts/inventory.yaml is where we tell it to use the ansible_connection ssh23:02
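The inventory construct being pointed at is roughly the following (hostname and address here are hypothetical); if devel-branch ansible stopped resolving the connection plugin named this way, every host would fail identically, which matches the trusty/xenial/centos7 pattern above:

    all:
      hosts:
        test-node.example.com:
          ansible_host: 203.0.113.10      # hypothetical address
          ansible_connection: ssh         # the setting the error points at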
*** rh-jelabarre has quit IRC23:02
corvuslet's merge the non-voting change23:03
corvushttps://review.openstack.org/62157723:04
corvussomeone will need to remove frickler's WIP23:05
clarkbcorvus: fungi I removed the WIP and approved the change23:06
fungithanks!23:06
clarkbcorvus: mordred and other infra-root. We can ssh into the k8s nodes via the root user23:06
clarkbseems that the hosts use our aggregate ssh key23:06
clarkbcorvus: mordred: infra-root any reason not to attempt to upgrade the cluster under magnum as described on the -discuss list?23:06
corvusclarkb: ah yes, i knew that (i selected the keypair when creating it).  i didn't make that connection though.23:07
clarkbthere is a non zero chance that this will break the cluster but we aren't using it yet right? and maybe we'll learn things23:07
corvusclarkb: i say go for it yolo23:08
fungii must admit i'm not entirely clear on what or where said magnum cluster is23:08
*** irclogbot_1 has joined #openstack-infra23:08
clarkbfungi: corvus created a magnum k8s cluster in vexxhost sjc1 to point nodepool at23:08
corvusfungi: i made a magnum in vexxhost for nodepool23:08
fungiwas it used to test nodepool kubernetes driver?23:08
fungiahh, okay, good guess ;)23:08
clarkbits not been used yet as there was a bug in the config file23:08
clarkbnot sure if that was fixed23:08
corvusfungi: https://review.openstack.org/62075623:08
clarkb`sudo atomic pull --storage ostree docker.io/openstackmagnum/kubernetes-apiserver:v1.11.5-1` and `sudo atomic containers update --rebase docker.io/openstackmagnum/kubernetes-apiserver:v1.11.5-1 kube-apiserver` are the sorts of commands to run according to the mailing list23:09
clarkbI'll start on the master node and update all of the services to 1.11.5-1 there. Then update the minion services after23:09
clarkband if it breaks we can always rebuild it. But ya I figure it's a good learning opportunity to do this as an in-place upgrade23:09
clarkbcurrent version is 1.11.123:09
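A hedged sketch of that recipe applied across the master-side system containers: the apiserver names come from the commands quoted above, while the controller-manager and scheduler entries are assumptions about how the other magnum system containers are named.

    - hosts: k8s-master
      become: true
      tasks:
        - name: Rebase each kube system container to the patched tag
          shell: |
            atomic pull --storage ostree docker.io/openstackmagnum/kubernetes-{{ item }}:v1.11.5-1
            atomic containers update --rebase docker.io/openstackmagnum/kubernetes-{{ item }}:v1.11.5-1 kube-{{ item }}
          loop:
            - apiserver
            - controller-manager   # assumed to follow the same naming scheme
            - scheduler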
*** jtomasek has quit IRC23:10
fungiso i guess magnum doesn't manage the version of kubernetes in the way that, say, trove manages the version of mysql?23:11
clarkbcorrect23:11
clarkbthere is apparently ongoing work to support this? but the mailing list confirmed my reading of docs that we have to do it under magnum23:11
fungik23:11
clarkbthe other concern I have getting set up to do this is the magnum instances are built on fedora 27 which is no longer supported aiui23:12
clarkbprobably smaller concern since all services run out of containers, but ...23:12
fungiyou can in-place upgrade fedora though, right?23:13
*** yamamoto has joined #openstack-infra23:13
clarkbI think you "can" but its often recommended to do reinstall?23:13
fungior does kubernetes eat its own cloud-native dogfood and recommend that you redeploy your kubernetes control plane daily?23:13
jonherIs there a good reason to why lists.openstack.org does not do https?23:15
fungijonher: no point23:15
jonheralright, fair enough23:15
fungijonher: it sends out account passwords (the only thing https there would possibly protect) via unencrypted smtp on request23:15
fungiand those passwords are only for managing subscription preferences23:16
clarkbheh and now I've run out of disk space as we only have 5GB of disk on this node?23:17
jonherI just found some links to lists.openstack.org that had https, hence the question, I'll submit a MR in that project23:17
clarkbI'm going to see if it just didn't resize the rootfs on boot23:17
clarkbonce I figure out how to figure that out23:17
clarkb(yay learning things)23:17
fungijonher: my poc for upgrading to mailman3 suggests we'll probably switch to https when we do that, but it's a much different system too23:17
*** gema has quit IRC23:18
clarkbok lvm is set up and has ~32GB mounted under /var/lib/docker23:20
clarkb5GB mounted on sysroot23:20
clarkbproblem is we don't seem to use /var/lib/docker with atomic?23:20
corvusclarkb: i wonder if we can do a rolling replace of master/minions?23:22
clarkb/vda1 is /boot, /vda2 is sysroot mapped through lvm, /vdb is an ~80GB device of which ~32GB is exposed to docker-pool via lvm23:23
clarkbdocker-pool isn't actually mounted on anything from what I see23:23
clarkbmaybe the intent was to set docker-pool23:24
clarkber23:24
clarkbset docker-pool in /etc/docker/docker-lvm-plugin? but that wasn't done23:25
clarkbhrm though there is an lv on the docker vg so maybe that is automagic23:25
mgagne_looks like the only way to be able to add vmx flag in Nova is to run Rocky. Or to use host-passthrough cpu_mode. Version prior to Rocky allows you to provide extra CPU flags but there is a whitelist which does not include vmx, only pcid and others related to meltdown/spectre.23:26
fungimgagne_: that option was added to allow passing through the cpu flags for meltdown/spectre23:29
mgagne_yes23:29
mgagne_but won't help for vmx =)23:29
mgagne_unless I patch our version of nova to allow it23:30
fungii have to assume nested-virt support was accomplished some other way as i thought providers had been doing that for a while23:30
mgagne_and in fact, add the feature. still running mitaka.23:30
mgagne_fungi: maybe they are using host-passthrough? or host-model?23:30
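For reference, the host-passthrough route mgagne_ mentions is a one-line nova.conf change; a hedged sketch as an Ansible task (host-passthrough hands guests the full host CPU, including vmx, but effectively ties live migration to identical hardware):

    - name: Expose the host CPU (including vmx) to guests
      ini_file:
        path: /etc/nova/nova.conf
        section: libvirt
        option: cpu_mode
        value: host-passthrough
      become: true
      # nova-compute needs a restart for this to take effect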
fungii don't know enough about nova to know, other than having been privy to the meltdown/spectre discussions and seeing other providers exposing nested-virt acceleration support who weren't running rocky either and who i assumed weren't patching nova to do it23:32
fungibut... maybe they were/23:32
clarkbI freed up disk space with atomic images prune23:32
clarkbit deleted some ociimages data23:32
clarkbI think the docker lv must be used by k8s workload?23:33
clarkbbut atomic isn't running things with docker? or otherwise keeping its images and runtimes off of that lv?23:33
clarkbhrm that wasn't enough to pull the other images23:34
ianwok, so i'm all caught up on the devel branch issues.  the original bug exactly matches the change pointed out by frickler.  the additional issue of using a block: in the handler (621633) is a known problem as i mentioned in a comment there23:36
ianwso while i probably wouldn't agree ansible should break this without deprecation, it's all explained in my head at least now :)23:37
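Roughly what the merged 621634 fix amounts to: rather than one named handler wrapping import_tasks (or a block:), which devel-branch ansible no longer finds by name, flat handler tasks share a listen topic. Task details here are illustrative, not the actual system-config role.

    handlers:
      - name: Save the ruleset
        command: netfilter-persistent save
        listen: reload iptables
      - name: Restart the persistence service
        service:
          name: netfilter-persistent
          state: restarted
        listen: reload iptables

    tasks:
      - name: Install iptables rules
        template:
          src: rules.v4.j2
          dest: /etc/iptables/rules.v4
        notify: reload iptables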
openstackgerritMerged openstack-infra/system-config master: Tighten permissions on zone keys  https://review.openstack.org/61793923:38
openstackgerritMerged openstack-infra/system-config master: Make system-config-run-base-ansible-devel non-voting  https://review.openstack.org/62157723:38
clarkbfedora-atomic itself uses 4.4GB of disk for its ostree23:40
clarkbso Ican't really go deleting anything else23:40
clarkbmnaser: ^ as a heads up you may be interested in this as it feels like the vexxhost magnum deployment is not deployed on partitions large enough to do an in place k8s upgrade23:41
clarkbmnaser: you might want to double the size of vda to 10GB from 5GB?23:41
* mnaser reads backlog23:42
mnaserclarkb: i think for that when you create a magnum cluster you pick the docker volume size23:44
mnasermagnum cluster-show <foo> .. what does that show for docker_volume_size ?23:44
clarkbmnaser: no this is the sysroot that is the issue23:44
clarkbmnaser: I see the docker volume and it is ~80GB which si fine. The problem is that the host os itself uses atomic/ostree to run the system containers and I can't update those as sysroot is only 5GB large and fedora itself is 4.4GB23:44
clarkbbut let me show the cluster23:45
clarkbcoe cluster show Nodepool doesn't show volume sizes. Is that only available with magnumclient?23:47
mnaseri think it might be clarkb23:49
mnaserclarkb: i think this is a case of magnum creating a vm without volumes23:49
mnaserbut in sjc1 we do bfv only23:49
mnaserthat should probably be something we should fix23:50
clarkb| docker_volume_size  | 80                                                         |23:50
clarkbwhich is what I see on the pv/vg/lv side23:50
clarkbso I think that is fine. My understanding of the issue is that atomic runs these system level containers outside of docker. And those containers run k8s23:50
clarkbatomic itself is a 4.4GB "container" according to ostree which uses up almost the entire 5GB sysroot23:51
clarkbbut then I can't update the k8s container images as I run out of disk23:51
clarkbmnaser: are we able to specify the sysroot size somehow when creating the cluster?23:51
mnaserclarkb: unfortunately, i think the very fact that we are able to boot this cluster at all is a result of this bug: https://review.openstack.org/#/c/603910/23:52
*** pbourke has quit IRC23:53
mnaserwhen root_gb=0, it creates a 'disk' that is equal to the size of the image23:53
mnaserwhich really is a security issue to start with23:53
clarkbthat would explain it23:53
mnaserbut anyways, i think thats what is happening23:53
mnaseri wonder if magnum has bfv support, grr23:53
mnaserif not that's a fun exercise for me :)23:54
clarkbon the one hand atomic is supposed to be fairly atomic and maybe the answer here is wait for vexxhost to push new images and then redeploy, but that doesn't help people that have an existing cluster they want to keep using23:54
*** pbourke has joined #openstack-infra23:55
mnaserclarkb: yeah, what sort of issues did you run into? i havent had issues doing something like atomic host upgrade in the past23:55
mnaserbut it was on new clusters so maybe they didnt have a lot of space occupied by logs etc23:55
clarkbmnaser: `atomic pull --storage ostree docker.io/openstackmagnum/kubernetes-kubelet:v1.11.5-1` fails with `FATA[0033] Error committing the finished image: /builddir/build/BUILD/skopeo-7add6fc80b0f33406217e7c3361cb711c814f028/vendor/src/github.com/ostreedev/ostree-go/pkg/otbuiltin/commit.go:407 - Writing content object: fallocate: No space left on device`23:57
mnaserany reason why you were pulling that?23:57
clarkbmnaser: yes major k8s security vulnerability I'd like to patch :)23:57
mnaseroh that's nice to know.23:58
clarkband took this as a learning opportunity. I think for infra its no big deal to make a new cluster23:58
mnaserthat's kinda necessary23:58
mnaseryeah but it's a good exercise23:58
clarkbbut anyone that has a running cluster is likely going to want to upgrade in place rather than redeploy23:58
clarkbso figuring this out is also useful23:58
mnaserlook at that, working with a cloud provider pays off for both infra and provider23:58
mnaserwho knew23:58
mnaser:P23:58
mordredmnaser: ikr?23:58
clarkbmnaser: ya I mean we'll likely just reinstall it at this point, but figuring out the disk situation so that in the future we could just upgrade would be nice23:59
mnaserhttps://github.com/openstack/magnum/blob/c8019ea77f33609452dd1a973e0f421b118c2079/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml#L745-L76123:59
clarkbbut as you said that may depend on whether or not magnum understands bfv23:59
mnaserso it looks like it doesnt support boot from volume grrr23:59
