Wednesday, 2014-02-19

*** NikitaKonovalov_ has quit IRC00:00
fungiright. i'm deleting nodes associated with each master as i restart it, because i can't be assured they were properly marked offline00:00
jeblairfungi: also, it takes a _really_ long time to retrieve 40k keys00:00
jeblairso this is a memory and performance problem00:00
lifelessjeblair: 40k keys?00:00
lifelessjeblair: unique per image?00:00
lifelessjeblair: erm, node ?00:01
jeblairlifeless: overall in the account00:01
*** tjones has quit IRC00:01
jeblairfor that region00:01
*** NikitaKonovalov_ has joined #openstack-infra00:01
*** NikitaKonovalov_ is now known as NikitaKonovalov00:01
jeblairlifeless: oh, yes, i think that's per node00:01
jeblairlifeless: unique per node00:01
lifelesshmm, if so we should make it per image00:02
*** rfolco has joined #openstack-infra00:02
jeblairlifeless: actually, i think it could be per-provider00:02
jeblairlifeless: it's only used to bootstrap the image creation00:03
lifelessits per image00:03
lifelessupdateImage ... manager.addKeypair00:03
jog0mordred: https://bitbucket.org/hpk42/tox/issue/116/new-pypi-override-breaks-people-who00:03
jeblairlifeless: yeah, that makes sense.  probably got so many due to image creation loops00:03
lifelessjeblair: but making it per provider would avoid running into provider quotas when lots of images are in play00:03
lifelessjeblair: and avoid this issue entirely00:04
jeblairlifeless: yep.  and less work for nodepool overall00:04
fungiokay, jenkins01 is definitely getting lots of nodes now00:04
*** CaptTofu has joined #openstack-infra00:05
jog0mordred: ahh I have tox 1.600:06
jog0mordred:  I am always scared at the bugs you find in python dev workflows00:06
jog0mordred: tox 1.6.1 works \o/00:07
* jeblair deletes keypairs00:07
*** tjones has joined #openstack-infra00:07
*** changbl has quit IRC00:07
*** gokrokve_ has quit IRC00:07
*** gokrokve has joined #openstack-infra00:08
*** jhesketh__ has joined #openstack-infra00:09
*** jhesketh__ has quit IRC00:09
*** jhesketh has joined #openstack-infra00:09
*** jhesketh__ has joined #openstack-infra00:09
fungijenkins03 is up and running again00:09
jeblairdoes anyone know if you can bulk-delete keypairs?00:11
jeblairthe nova api docs don't look promising in this regard...00:11
jog0jeblair: AFAIK I don't think you can00:11
*** dims has quit IRC00:11
*** tjones has quit IRC00:12
*** gokrokve has quit IRC00:12
mordredjeblair: I do forloops00:12
mordredsadly00:12
mordredjeblair: I support changing where keypairs happen, btw00:13
jeblairmordred: yeah, that's probably faster than asking hpcloud for a new account.  but barely.  it could take ~10 hours00:13
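A minimal sketch of the kind of for-loop cleanup being discussed, assuming the usual OS_* credentials are exported for the region and that every keypair in the account is fair game; the awk filter just pulls the name column out of nova's table output, so double-check it against the real format before running:

    # one API call per keypair -- slow, and subject to the provider's rate limits
    nova keypair-list | awk '$2 != "" && $2 != "Name" {print $2}' | \
        while read name; do
            nova keypair-delete "$name"
        done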
stevebakerhey, it looks like the tarballs job is having an issue in my heatclient release https://jenkins06.openstack.org/job/python-heatclient-tarball/11/console00:13
stevebakerConnecting to tarballs.openstack.org00:13
stevebaker2014-02-19 00:08:47.995 | ERROR: Failed to upload files00:13
*** prad has quit IRC00:14
jeblairmordred: any chance of increasing the rate limits for our hpcloud account?00:15
fungijenkins01 finally shows a nodepool node in its webui00:17
fungitwo00:17
fungithey're running jobs00:17
fungithis is a good sign00:17
fungistevebaker: https://jenkins01.openstack.org/job/python-heatclient-tarball/2/console00:18
*** prad has joined #openstack-infra00:18
fungiworked00:18
fungi"00:18
fungi"Offline due to Gearman request"00:19
stevebakerfungi: yay00:19
fungifor the corresponding node which ran it too00:19
fungiso i think we're on the right track now00:19
jeblairfungi: awesome00:19
mordredjeblair: probably - I could also see if they can bulk-delete keypairs behind the scenes00:19
*** sarob has quit IRC00:19
jeblairmordred: both of those would be helpful (the rate limit thing is helpful even aside from this)00:20
*** sarob has joined #openstack-infra00:20
*** rcleere has quit IRC00:22
*** matsuhashi has joined #openstack-infra00:22
jeblairaz2 only has 22k.  az3 has 48k.00:23
*** cadenzajon_ has quit IRC00:23
mordredjeblair: asking00:24
jeblairthat's 13 hours to delete00:24
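(Rough arithmetic behind that estimate: 22k + 48k = 70,000 keypairs, and 13 hours is about 46,800 seconds, so a serial one-call-per-keypair loop would have to sustain roughly 1.5 deletes per second, i.e. one API call every ~0.7 s once rate limiting is factored in.)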
*** sarob has quit IRC00:24
*** yamahata has joined #openstack-infra00:25
jeblairmordred: and in case they can: it's okay to delete all keypairs in all regions from the account00:25
*** miguelzuniga has quit IRC00:26
*** mgagne has quit IRC00:26
*** ryanpetrello has joined #openstack-infra00:26
*** dims has joined #openstack-infra00:27
*** sandywalsh has quit IRC00:27
fungijenkins05 is back up00:28
*** banix has quit IRC00:29
*** talluri has joined #openstack-infra00:30
*** hogepodge has quit IRC00:30
*** nati_ueno has joined #openstack-infra00:32
*** matsuhashi has quit IRC00:32
mordredjeblair: I have put in a few questions - the support team does not have a bulk-delete option, but they pointed me to the nova team, and I'm asking them00:34
*** talluri has quit IRC00:34
*** matsuhas_ has joined #openstack-infra00:34
* clarkb is back00:34
mordredjeblair: I have not yet asked about rate limits - I'll need to file a ticket for that00:34
fungiyay clarkb!00:34
lifeless'phil, please delete ma stuff'!00:34
fungi(so you don't have to read scrollback, just note that we're breaking everything)00:35
*** eharney has quit IRC00:35
*** nati_uen_ has quit IRC00:35
clarkbnow I want to read sb00:36
fungiclarkb: main current issues are dns resolution broken from review.o.o querying rackspace recursive resolvers in dfw (worked around by pointing at iad), nodepool memory leak appears to be related to nearly 100 thousand crufty keypairs in hpcloud, and jenkins 1.511 changed the offline api call00:36
clarkbfungi: wow re keypairs00:37
fungijeblair's deleting keypairs, i'm downgrading jenkinses to lts00:37
jog0fungi jeblair: can you file a bug with nov about bulk keypair00:37
clarkbfungi: are we upgrading zmq plugin when jenkinses are downgraded?00:37
clarkbalso is the bug in jenkins or nodepool?00:37
clarkband why is it only biting us now?00:38
fungiclarkb: i already upgraded the zmq plugin earlier when i upgraded 1.51100:38
jeblairclarkb: jenkins changed something about the internal offline node api that gearman-plugin uses00:38
fungidowngrading now to 1.532.2 (lts) which seems to solve current concerns00:38
anteayaI think crufty keypairs would be a great username00:38
jeblairclarkb: so we need to (later) update gearman-plugin to fix that00:38
anteayalike nifty lettuce00:39
clarkbfungi: jeblair: wait I am confused if lts is 1.532 how does it help to downgrade to it if 1.511 introduced the problem?00:39
fungi1.55100:40
fungii mistyped00:40
clarkbah ok it makes a lot more sense now thanks00:40
fungiearlier i upgraded from 1.525/1.543 to 1.551, now i'm downgrading to 1.532.200:40
*** dangers is now known as dangers_away00:41
fungiwhich supposedly also has the same security fixes backported to it00:41
clarkbnote that that lts version may have a different offline node bug00:41
jeblairjog0: https://bugs.launchpad.net/nova/+bug/128185300:41
clarkbthe one that we are trying to work around with single use nodes00:41
uvirtbotLaunchpad bug 1281853 in nova "Add method to bulk delete keypairs" [Undecided,New]00:41
*** sabari has quit IRC00:41
fungiooh! uvirtbot came back too while i wasn't looking, huh?00:41
*** yamahata has quit IRC00:42
*** yamahata has joined #openstack-infra00:43
jog0jeblair: thanks00:44
fungiokay, jenkins07 is online again00:44
clarkbjeblair: fungi: ok I think I grok the current state of fun. Anything I can jump onto to help?00:44
clarkblooks like DNS is better now courtesy of google00:44
jog0jeblair: do you want to be able to delete all keypairs?00:44
clarkband jenkinses are being downgraded00:44
jeblairclarkb: no we switched to iad dns00:44
clarkbah iad dns00:44
jeblairclarkb: do you think we will have a problem with the lts release?00:45
fungii'm going to start in on the even numbered masters, but more slowly while the odd numbered masters get more nodes assigned00:45
fungisince nodepool is on a go-slow00:45
clarkbjeblair: let me dig into that more, my hunch is single use nodes will mitigate it if so00:45
jeblairjog0: well, at this moment, yes.  but in general being able to provide a list of things to delete would be nice00:45
jog0jeblair: makes sense although listing 10k things in a single request seems excessive00:47
fungijog0: xargs man, xargs00:47
jeblairjog0: everything about openstack-infra is excessive.  haven't you noticed?  ;)00:48
jog0jeblair: :)00:48
fungior being able to go in a for loop and delete 10 keys per call would at least speed up the situation by a factor of 1000:48
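The xargs variant fungi is alluding to can only parallelize, not batch, since the API still takes one keypair per call; a sketch, reusing the same listing assumption as above:

    # up to 10 concurrent keypair-delete calls; helps until the
    # provider's rate limit becomes the bottleneck
    nova keypair-list | awk '$2 != "" && $2 != "Name" {print $2}' | \
        xargs -n1 -P10 nova keypair-delete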
jog0I have00:48
geekinutahso sad, success but still requeued :-( https://jenkins06.openstack.org/job/gate-nova-python27/847/console00:48
*** tjones has joined #openstack-infra00:48
fungigeekinutah: we're continuing to downgrade jenkins masters00:48
geekinutahyeah, I've been watching jobs pass on downgraded guys00:49
fungigeekinutah: i've got about half of them done, and am getting started shutting the other half down, so it should be fixed up soonish00:49
*** rfolco has quit IRC00:49
jeblairgeekinutah: as fungi works through the jenkins downgrade, your chances of completion are going up! :)00:50
clarkbhttps://issues.jenkins-ci.org/browse/JENKINS-19453 is the upstream bug00:50
*** sarob has joined #openstack-infra00:50
*** wenlock has quit IRC00:50
clarkbsorting out if that made it into the lts00:51
clarkblooks like it may have been backported00:51
geekinutahfungi, jeblair: don't mind me, you guys are doing great, really appreciate it00:51
clarkbjeblair: fungi: the fix for 19453 was backported into stable and is in 1.532.2's log00:54
clarkbwe should be fine00:54
fungiclarkb: all's the better. thanks for checking!00:54
*** geekinutah has left #openstack-infra00:54
openstackgerritDerek Higgins proposed a change to openstack-infra/nodepool: Add fedora support  https://review.openstack.org/7452900:56
openstackgerritDerek Higgins proposed a change to openstack-infra/nodepool: Catch key problems in ssh_connect  https://review.openstack.org/7452800:56
*** david-lyle has quit IRC00:57
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Make jenkins get info task synchronous  https://review.openstack.org/7454500:57
clarkbfungi: do any more jenkinses need downgrading?00:58
clarkbI can hand hold some of those if it helps00:58
jeblairclarkb, fungi: ^ maybe let's merge that soon and i think that will reduce the 40 minute main-loop cycle in nodepool00:58
clarkbjeblair: rgr will review00:58
mordredderekh: re: your key problems patch - what happens if the node never comes online/00:59
mordred?00:59
fungiclarkb: not really. it's mostly just stretching the process out so that i don't completely starve us, but i'm shutting down the other evens here momentarily00:59
mordredjeblair: lookin00:59
clarkbfungi: ok00:59
clarkbjeblair: that change is nice and small +201:00
clarkbjeblair: is there any concern that there are mixed async and sync calls?01:00
derekhmordred: it should timeout like it always did, the exception I'm catching get thrown if ssh comes up but the key doesn't work01:00
fungiokay, 2/4/6 are in shutdown now, and 02 is close to me being able to downgrade it. i'm evacuating all its 100+ ready nodes so that the good masters will start to pick up steam01:00
mordredok. the commit message said something about continuing to try - I just wanted to make sure we weren't introducing a possibly endless loop01:01
anteayalike we are in now01:01
mordredderekh: yup. duh. I read it properly now. thanks01:01
jeblairclarkb: i don't think so; it should just be some simple urllib2 calls; i think the jenkins object is thread safe01:02
jeblairclarkb: yeah, it just stores some strings and that's it01:02
clarkbjeblair: great01:03
*** dcramer__ has joined #openstack-infra01:03
*** mdenny has quit IRC01:04
*** russellb has quit IRC01:05
mordredjeblair: while I'm reviewing that one, I'm reviewing the other nodepool changes that are up - there's one from BobBall that looks very safe and has 2 +2s already (it just adds matching for image hex strings)01:05
mordredshould I avoid landing extra things on principle?01:06
jeblairmordred: should be ok01:06
clarkbyeah BobBalls change is pretty safe iirc01:06
mordredjeblair: k. (I've been heads-down in nodepool today, so I also feel fairly competent on what it's doing)01:06
jeblairi have manually installed nodepool with my change on nodepool.o.o01:07
jeblair(because it's going to be forever before it actually merges)01:07
jeblairfungi: what's the current jenkins state?  i'm trying to figure out when it would be best to restart np01:08
jeblairfungi: (not only to pick that up, but also because it's about time to free memory)01:08
fungijeblair: jenkins01,3,5,7 are online but none have nodes assigned (well not entirely true, there are a few dozen in nodepool ready state on 01 but not showing in the webui yet)01:09
fungii'm nodepool deleting ready nodes from the even masters while they finish up their remaining jobs01:09
fungiin hopes nodepool will soon start adding fresh nodes to the active masters01:10
*** markmcclain has quit IRC01:10
jeblairfungi: the evens are in shutdown mode?01:10
fungijeblair: yes01:10
jeblairfungi: now might be the best time to restart then01:10
fungiworks for me01:11
jeblairfungi: i think it may have oomed while we were talking about it01:13
clarkbjeblair: did we identify why keypairs are leaking? and maybe we should switch to using a specific keypair instead?01:14
*** ryanpetrello has quit IRC01:14
*** tjones has quit IRC01:14
*** tjones has joined #openstack-infra01:14
jeblairclarkb: my guess is they leaked during image creation loops.  and yes, i think we should have one keypair per provider.01:14
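Operationally, "one keypair per provider" amounts to something like the following sketch; the key path and the keypair name are illustrative, not what nodepool actually ends up using:

    # generate the key once and keep the private half on the nodepool server
    ssh-keygen -t rsa -b 2048 -N '' -f /var/lib/nodepool/nodepool_id_rsa

    # register the public half with each provider a single time, instead of
    # creating a throwaway keypair for every node or image build
    nova keypair-add --pub-key /var/lib/nodepool/nodepool_id_rsa.pub nodepool-hpcloud-az2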
*** atiwari has quit IRC01:15
*** tjones has quit IRC01:16
openstackgerritA change was merged to openstack-infra/config: Add single-use py3k-precise nodes  https://review.openstack.org/7384601:17
clarkbjeblair: should be ok to merge change to nodepools config yaml too?01:18
jeblaira change merged!01:19
jeblairclarkb: yeah01:19
clarkboh gah, I really need to figure out why gerrit doesn't show my commit message first01:20
jeblairit looks like the nodepool main loop now runs every ~13 seconds01:20
jeblairso it should be much less spiky now01:20
clarkbI think it happens when I jump to different changes via the dependency links01:20
fungiboy howdy01:20
fungiand the good masters are running mucho jobs now01:21
*** jergerber has joined #openstack-infra01:21
*** nati_uen_ has joined #openstack-infra01:21
mordredjeblair, clarkb: I have locally observed keypairs leaking - best I can tell, if an image fails at creation, one is left with a keypair01:21
clarkbmordred: so I think we should just use a single keypair per provider and call it good01:21
openstackgerritA change was merged to openstack-infra/config: Fix Climate jobs  https://review.openstack.org/7131701:21
clarkbwhich jeblair agrees with01:21
*** tjones has joined #openstack-infra01:22
mordredclarkb: yup01:22
jeblairclarkb: i assume that means nodepool will need to create it and stash the private half locally in /var.  shouldn't be a big deal though.01:22
clarkbjeblair: correct01:23
*** sarob has quit IRC01:23
clarkbjeblair: ideally it will store both halves :) you only put one half on zuul-dev which meant I had to dig in DBs for the public half which is no fun :)01:23
*** nati_ueno has quit IRC01:23
mordredclarkb: you can construct a public key from a private one01:23
mordredclarkb: I always have to go re-learn the command though01:23
*** mestery has quit IRC01:24
clarkbmordred: oh are both in the encrypted file?01:24
jeblairclarkb: actually, nodepool really only has to store the public half, come to think of it.01:24
*** talluri has joined #openstack-infra01:24
clarkbjeblair: it sshs which needs the private side right?01:24
*** banix has joined #openstack-infra01:24
*** derekh has quit IRC01:24
jeblairclarkb: right, it only needs the private half.  :)01:24
*** harlowja_away has quit IRC01:25
clarkbmordred: I don't know why I never knew that, I guess I assumed that they were distinct (you can't get one from the other with maths)01:25
*** tjones has quit IRC01:25
fungijenkins02 is downgraded and back online now01:25
mordredclarkb: you can go in one direction, just not the other01:26
clarkbmordred: right but only because the public key is in the private key file01:26
clarkbnot due to maths01:26
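The command mordred keeps having to re-learn is ssh-keygen's -y mode, which prints the public key matching a private key file (path reused from the illustrative sketch above):

    # regenerate the public half from the private key file
    ssh-keygen -y -f /var/lib/nodepool/nodepool_id_rsa > /var/lib/nodepool/nodepool_id_rsa.pub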
*** mrodden has quit IRC01:26
jeblairthere were 138 nodes in the building state while np was stopped; i'm deleting them now.01:26
anteayalook at all that yellow and green in the graph01:26
anteayathat would get rid of the yellow I guess01:27
anteayaI wonder how much of the green is actual usable available nodes01:27
clarkbanteaya: I think very little of it due to jeblair's fire and brimstone approach01:28
*** tjones has joined #openstack-infra01:28
anteayak01:28
jeblairi think nodes that have been ready for >1h are suspicious and should be deleted01:28
anteayaah01:28
anteayagoodbye nodes01:28
anteayatake your crufty keys with you01:28
jeblairthat's another 101 nodes01:28
*** talluri has quit IRC01:28
jeblairthough of course we're running into rate limits with so much going on01:29
fungijeblair: agreed. if they're on jenkins04 or 06 though they're explainable. i'm in the process of deleting them already01:29
anteayaof course01:29
anteayaa fire would be a fire without some throttling01:29
anteayawouldn't01:30
*** mestery has joined #openstack-infra01:30
fungii've nearly got 04 cleared out. 06 still has a couple jobs running but they should be wrapped up by the time i get to it01:31
anteayathey will just loop back round for another go on a different jenkins01:31
*** tjones has quit IRC01:32
clarkbfungi: out of curiousity is there a reason we limited bare-centos to rax? py3k-precise as well01:33
fungiclarkb: i think because that's where they'd previously run01:34
fungiand we maybe hadn't tested puppeting up hpcloud's base centos images?01:34
fungiwe can certainly add a change to spin up images in those too and see how they fare01:35
jeblairyep01:35
*** balar has quit IRC01:35
fungijenkins04 is back online now01:35
clarkbfungi: cool, just checking that there wasn't a specific reason for that01:36
clarkblike image didn't work or some such01:36
*** nosnos has joined #openstack-infra01:36
fungiph33r of the unknown (and a black hat)01:36
anteayawas jenkins 06 the last one to come down?01:36
fungianteaya: yes, i'm clearing it out now01:36
anteayak01:36
anteayasome jobs just finished on 06 and have started up again on other nodes on a patch I am watching01:37
anteayaI hope this is the last round01:37
clarkbzaro: are you about? looks like a bug was fixed for the envinject thing. Did we chase that down?01:37
clarkbzaro: or are we just calling it a derp and moving on?01:37
clarkbzaro: the bug wasn't clear to me01:37
fungiclarkb: for the zmq plugin tarball job? i merely retriggered the job and it worked the second time around01:38
clarkbfungi: right, but zaro marked that bug fixed I think01:39
*** banix has quit IRC01:39
fungiahh01:39
fungi"fixed"01:40
*** banix has joined #openstack-infra01:41
jeblairthe sparklines for both check and gate have a downtick01:41
*** mgagne has joined #openstack-infra01:41
clarkbjeblair: fungi: all jenkins are downgraded and all cruft nodepool nodes are in the process of being deleted?01:42
fungijenkins06 is online again now and downgraded01:42
*** tjones has joined #openstack-infra01:42
clarkbfungi: in other news the logstash build_master data is populated which is pretty awesome01:42
fungiclarkb: i still have a couple nodepool delete loops taking their time, but only a handful of remaining nodes each between them01:42
clarkbdims: ^01:43
fungii'll try to check back in later and mass delete any nodepool nodes which have been in any state at all for >3 hours01:44
*** mgagne1 has joined #openstack-infra01:44
fungijust in case we miss a few01:44
clarkbfungi: ping me then I may be around and can assist01:44
fungik01:44
jeblairfungi: cool.  i just started some for deletes that have been in state for > 1hr01:44
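A sketch of the kind of cleanup loop being run here; it assumes the last column of nodepool list is the node's age in hours and the second is the node id, which matched the table layout at the time as far as I recall -- verify before running anything like it:

    # delete nodes that have been sitting in the same state for over an hour
    nodepool list | awk -F'|' '$(NF-1)+0 > 1.0 {print $2}' | \
        xargs -n1 nodepool delete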
openstackgerritK Jonathan Harker proposed a change to openstack-infra/config: Parameterize the status page urls  https://review.openstack.org/7455701:44
fungiin amusing news, the zuul jph graph topped out at 3000 earlier :/01:45
*** tjones has quit IRC01:45
jeblairfungi: yowza.01:45
fungii guess we road tested the new patches01:45
*** prad has quit IRC01:45
jeblairhttp://graphite.openstack.org/render/?from=-24hours&height=600&until=now&width=800&bgcolor=ffffff&fgcolor=000000&areaMode=stacked&target=color%28alias%28sumSeries%28stats.gauges.nodepool.target.*.*.*.building%29,%20%27Building%27%29,%20%27ffbf52%27%29&target=color%28alias%28sumSeries%28stats.gauges.nodepool.target.*.*.*.ready%29,%20%27Available%27%29,%20%2700c868%27%29&target=color%28alias%28sumSeries%28stats.gauges.nodepool.target.*.*.*.us01:45
fungii think because so many jobs were cycling and getting reset01:46
clarkbfungi: cycling? meaning 72 hour timeout?01:46
*** mgagne has quit IRC01:46
jeblairoops, too long.01:46
dimsclarkb, nice!01:46
*** prad has joined #openstack-infra01:46
fungiclarkb: the jobs were getting nodes offlined out from under them and restarted over and over01:46
clarkbfungi: oh right jenkins bug01:47
jeblairclarkb: the jenkins/gearman-plugin bug manifested as a null-result build to zuul01:47
clarkbor gearman plugin01:47
jeblairclarkb: so the 'restart job on jenkins derp' logic kicked in and zuul has been restarting these jobs for several hours01:47
openstackgerritA change was merged to openstack-infra/nodepool: Make jenkins get info task synchronous  https://review.openstack.org/7454501:47
fungiwhich is the main reason for the current pile-up01:47
openstackgerritA change was merged to openstack-infra/nodepool: Allow useage of server IDs as well as names.  https://review.openstack.org/6942401:47
jeblairclarkb: so the nice thing is that they aren't reporting negative results01:48
*** melwitt1 has quit IRC01:48
clarkbas things settle any chance I can get another core review on https://review.openstack.org/#/c/72509/101:48
clarkbjeblair: ya that would make the fun right now a bit more chaotic01:48
jeblairclarkb: you don't want the random sleep?01:48
clarkbjeblair: I don't think it is necessary as the jobs are being split apart by 20 minutes already01:49
jeblairk.  should be fine as long as other things aren't hitting it at those times01:49
*** yaguang has joined #openstack-infra01:50
fungijenkins06 is getting nodepool nodes and running jobs now01:50
clarkbjeblair: it should be an improvement over the current situation which is 12 large query sets per hour instead of 301:50
clarkb(I think 12)01:50
*** banix has quit IRC01:54
fungiclarkb: are we holding off upgrading the gearman plugin to 0.0.5 in production (i see it's only on -dev)01:55
fungijeblair: ^01:55
fungii don't recall what the situation was with that01:55
clarkbfungi: I don't think we need to, but it does require a zuul restart01:55
clarkbI can do the logstash server during a less hectic time01:56
*** weshay has quit IRC01:57
*** banix has joined #openstack-infra01:57
*** dkehn has quit IRC01:57
anteayayay, jobs are finished and staying finished01:57
*** dkehn has joined #openstack-infra01:57
fungii've gone back through the jenkins masters and confirmed they're all on the correct versions of jenkins and important plugins01:58
funginice and consistent for the first time in a while01:58
anteayawell now we are stable jenkins heading into ff01:59
fungiwe hope, anyway02:00
anteayawe do02:00
*** talluri has joined #openstack-infra02:00
fungiit's not like we've had great luck with running a different jenkins release and not finding new issues02:00
fungiand just like the one i downgraded from, this is also a version we haven't run before02:01
*** yamahata has quit IRC02:01
anteayamore fun coming up02:01
anteayano idea how it will manifest02:02
*** khyati_ has quit IRC02:02
*** pcrews has quit IRC02:02
anteayafungi: have you eaten lately?02:02
anteayathe fire fighting has been going on 3+ hours02:03
mordredwait- I thought fungi didn't get to eat until after FF02:03
fungiheh02:03
anteayajust using up the camel hump I guess02:03
*** mgagne has joined #openstack-infra02:06
*** mgagne1 has quit IRC02:07
*** mrodden has joined #openstack-infra02:07
fungigetting back to rerunning jenkins-jobs update on the jenkins masters. my bare-precise change touched a whole lot of jobs, so the puppet exec timeout was way too short to handle it02:07
*** ryanpetrello has joined #openstack-infra02:07
*** hdd_ has joined #openstack-infra02:08
fungii had already gotten through jenkins.o.o and 01-04, so now it's running on 05-0702:08
jeblairfungi: oh nice catch; that's probably responsible for some stuck jobs too02:08
jeblairfungi: as i imagine that our static nodes may be marked offline at this point...02:08
*** mgagne1 has joined #openstack-infra02:08
fungiwell, they were all on 01 and 02, which is why i got those out of the way early02:09
fungithey were mostly done before we got into any of the real fun02:09
*** nati_ueno has joined #openstack-infra02:09
*** mgagne has quit IRC02:10
*** dstanek has joined #openstack-infra02:10
*** talluri has quit IRC02:12
*** nati_uen_ has quit IRC02:12
*** talluri has joined #openstack-infra02:12
fungii had already confirmed no new jobs were getting assigned to the static precise slaves, then offlined them and removed them from the masters a while later (i can add them back fairly easily if need be, since they're not deleted at the provider yet)02:13
jeblairfungi: there are a lot of ready nodes, and most of them are attached to jenkins02...02:15
jeblairfungi: do you think one of your scripts missed something, or could those be casualties of the nodepool downtime and restart?02:15
fungithe latter. nodepool list counted zero when i deleted them initially with jenkins02 shutdown02:16
clarkbjeblair: I want to say I have seen nodepool do wild swings like that while nodes are down02:16
clarkbthen it settles out again once everything is back up for a full iteration through the single use nodes02:16
fungiit's possible it wasn't the nodepool restart, but that nodepool tried to add nodes to 02 too quickly and most didn't register in jenkins02:16
jeblairclarkb: oh, i'm not thinking that jenkins02 is over loaded, i mean to say that they are not really ready nodes02:17
clarkboh that is different then02:17
jeblairclarkb: the ones i have spot checked were already deleted from jenkins02:17
*** talluri has quit IRC02:17
clarkbjeblair: so they are in nodepool buit not reflected in jenkins02:17
clarkbgotcha02:17
morganfainbergi am guessing some excitement for the day has abated since i see jobs making their way through check/gate02:17
fungithere was an initial glut of 100+ nodes built for 02 and i only saw a fraction of them show up in the webui. i suspect jenkins never added them to its slave list02:18
lifeless'maybe' :P02:18
morganfainberglifeless :)02:18
anteayamorganfainberg: aye02:18
anteayawe hope they continue along the abatement route02:18
anteayaabating?02:18
morganfainbergwell... if they are resolving...and there is a little bandwidth to do something that keystone-core will <3 you (ok ok, i still lie, I will <3 you guys) for https://review.openstack.org/#/c/74472/ - it'll keep us from monopolizing openstack-dev. (just eavesdrop bot stuff)02:20
* morganfainberg thinks if that can be rephrased in a creepy stalker-ish way... 02:20
morganfainberg>.>02:20
morganfainbergnah02:20
*** sarob has joined #openstack-infra02:20
anteayamorganfainberg: please don't do that02:21
jeblairfungi: i think i'll delete ready nodes that are > 0.1 hours02:21
anteayaI have such high regard for you02:21
fungijeblair: sounds safe02:21
morganfainberganteaya, hehe, i don't think i could actually think of a way to rephrase it.02:21
anteayagood02:21
morganfainberganteaya, just doesn't come naturally to me.02:21
anteayaglad to hear it02:21
anteayaagain, so happy to hear that02:21
morganfainbergbesides, i actually genuinely like -infra folks02:22
anteayawell there's that too02:22
anteayaso thanks02:22
anteayaback at 'ya02:22
morganfainberganyway. just relaying keystone desires :) thanks in advance02:22
anteayaalready +1'd02:22
morganfainberganteaya, i know :) you're awesome.02:23
anteayajenkins is +1, 6 other +1 and a +2 on it02:23
jeblairokay there's another 89 nodes i hope02:23
anteayamorganfainberg: nah, I just review the easy patches02:23
morganfainberganteaya, yep. it's why i hopped over.02:23
morganfainberglol02:23
morganfainberganteaya, one of these days i'm going to have time to be really more involved with infra stuff02:24
morganfainberganteaya, one of these days...02:24
dolphmfallacy &02:24
anteayamorganfainberg: one of these days02:24
jeblairmorganfainberg: one of these days i hope i'll have time too.  :)02:24
anteayadoesn't sound like today02:24
morganfainbergjeblair, lol :)02:24
morganfainberganteaya, nah, dolphm  will just find more stuff to be done.02:24
anteayahe is like that, dolphm is02:25
* anteaya gestures pushing that patch out of the gate02:26
*** UtahDave has quit IRC02:26
jeblairoh, the top check change is running its missing job!02:28
clarkband the downtick on the sparklines continue02:29
anteayayay for both02:29
anteayayay that job finished success02:31
*** ryanpetrello has quit IRC02:31
anteayaout out out02:31
anteayalook at the gate shrink02:32
anteaya602:32
anteaya12 in post02:35
*** gokrokve has joined #openstack-infra02:35
funginibalizer: what size vm does this puppetdb need to start out? 2gb ram?02:36
openstackgerritA change was merged to openstack-infra/config: Run fewer es queries with elastic_recheck.  https://review.openstack.org/7250902:39
clarkbfungi: http://docs.puppetlabs.com/puppetdb/latest/scaling_recommendations.html02:41
clarkbfungi: basically we have two major processes, puppetdb itself (which is java and needs heap) and postgresql02:41
jeblairclarkb: puppetdb is java?  not ruby?02:42
clarkbfungi: it doesn't look like we need a very large puppetdb java process because we are using postgresql. Which leaves us with accommodating postgresql02:42
clarkbjeblair: its jvm, it might be jruby or similar02:42
*** jergerber has quit IRC02:42
*** dcramer__ has quit IRC02:42
jeblairclarkb: ah.02:43
clarkblooks like clojure02:43
clarkbhttps://github.com/puppetlabs/puppetdb/tree/master/src/com/puppetlabs02:43
fungithe language schizophrenia of the puppet ecosystem amuses me02:43
clarkbI have a hunch 2GB is plenty02:44
clarkbbut nibalizer should know more02:44
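One detail worth noting for the sizing discussion: puppetdb's JVM heap is configured separately from the VM size, via JAVA_ARGS in its defaults file, so on a 2GB VM most of the memory can go to postgres. A sketch, with the heap value purely illustrative:

    # /etc/default/puppetdb (or /etc/sysconfig/puppetdb on Red Hat)
    # keep the puppetdb JVM small when postgres is doing the heavy lifting
    JAVA_ARGS="-Xmx512m"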
openstackgerritDavanum Srinivas (dims) proposed a change to openstack/requirements: Sync requirements to oslo.vmware  https://review.openstack.org/7456902:45
clarkboslo.vmware02:45
clarkbI think that is my queue for dinner02:45
clarkbcue?02:46
clarkbsilly english02:46
fungioh, fair warning i'll be disappearing around 21:00 utc tomorrow for our monthly local osug02:46
dimsi will probably need help fixing a bad requirements.txt in oslo.vmware :)02:46
fungidims: you're going to need help fixing vmware? i think that's out of my league, sorry ;)02:47
clarkbdims: why do we need vendor specific oslo libs?02:47
fungisounds like the different virtual resources in vmwareland need some common interaction from more than one component of openstack02:48
dimsfungi, just need to add a \n in the requirements.txt02:50
dimsclarkb, fungi - yea, same code in cinder, nova etc02:50
clarkbdims: why wouldn't that live in vmware land?02:50
clarkb(just trying to sort out why this lives in openstack and not the vendor space)02:50
fungivmware python sdk02:51
dimsclarkb, the code is pretty specific to openstack and not usable outside of openstack02:51
anteayais it opensource?02:51
jerryzhi guys, i have a problem with nodepool used by gerrit-triggered jenkins. when a patch is updated, the on-going job will be aborted but the slave hasn't been deleted yet and the new patch is tested on the used slave. later on that slave will be deleted by nodepool and the job fails.02:51
*** sarob_ has joined #openstack-infra02:52
clarkbanteaya: yes02:52
clarkbjerryz: I don't think you can mix the two02:52
dimsanteaya, it's existing code in nova/cinder that's getting moved out so all projects can use the same code base02:52
clarkbjerryz: you need to use the offline slave functionality in gearman plugin with nodepool02:52
anteayayeah, I'm with clarkb not sure why we have to maintain vendor code, regardless of how specific it is to openstack02:52
clarkbanteaya: I think I understand now02:53
clarkbit is openstack specific bits for interacting with vmware02:53
clarkband if it needs to go in multiple projects oslo is the place for it02:53
dimsanteaya, clarkb, fungi  - https://blueprints.launchpad.net/oslo/+spec/vmware-api02:53
clarkbit just feels wrong02:53
anteayait does02:53
anteayaI have vendor prickly radar02:53
jerryzclarkb: i will have a try02:54
jerryzthanks02:54
clarkbjerryz: or if the gerrit plugin can offline nodes when jobs are started that will work too02:54
*** sarob has quit IRC02:55
*** nati_uen_ has joined #openstack-infra02:57
jerryzclarkb: is the gearman plugin with offline slave function a snapshot version?02:58
clarkbjerryz: I think you may need a snapshot for the latest bug fixes02:58
clarkbjerryz: http://tarballs.openstack.org/ci/gearman-plugin/02:59
clarkbno 0.0.6 looks new enough02:59
*** yamahata has joined #openstack-infra02:59
jerryzclarkb: thanks02:59
*** nati_ueno has quit IRC03:00
*** julim has quit IRC03:02
*** simonmcc has quit IRC03:08
*** gokrokve has quit IRC03:08
*** gokrokve has joined #openstack-infra03:09
*** simonmcc has joined #openstack-infra03:10
*** gokrokve has quit IRC03:13
*** gokrokve has joined #openstack-infra03:15
*** sarob_ has quit IRC03:24
*** CaptTofu has quit IRC03:26
openstackgerritCyril Roelandt proposed a change to openstack-infra/pypi-mirror: Do not download wheels when running "pip install"  https://review.openstack.org/7457903:28
*** matsuhas_ has quit IRC03:29
mordredhrm03:30
mordredthat's pretty much the opposite direction we'd like that to go03:30
mordredclarkb: ^^ unless there is a direction or issue we're seeing I don't know about?03:31
clarkb?03:34
clarkb74579?03:35
mordredyeah03:37
morganfainberganteaya, can you explain something to me...03:39
morganfainberganteaya, why are recruiters obnoxious? :P03:40
morganfainbergok ok enough of that03:40
mordredmorganfainberg: because of the reasons03:40
mordredmorganfainberg: also, because they need a bunch of contract java programmers in new jersey apparently03:40
dstufftmordred: ideally you'd download both sdist and Wheel03:41
dstufftbut pip isn't really designed for mirroring :[03:41
morganfainbergmordred, esp when they look at a resume and think "Oh open source developer, he'd like to work on proprietary java internal closed-source insanity"03:41
fungiwow... a java programming gig in new jersey? can't say i'm sure which part is worse03:41
dstufftfungi: I was just thinking that03:42
mordredmorganfainberg: ++03:42
dstufftthere's literally nothing about that which sounds appealing03:42
lifelessmordred: btw, if you ahve 70G available, bandersnatch++03:42
mordredI don't know what a bandersnatch is03:42
lifelessmordred: its the official pypi mirror tool03:42
dstufftit does a full mirror of PyPI03:42
lifelesss/the/a/03:42
dstufftI think there was a reason why openstack didn't want that03:42
dstufftbecause i'm pretty sure i suggested that before03:42
clarkbwhich isnt what we want but is slowly getting there03:42
lifelessmordred: efficiently03:42
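For anyone following along, bandersnatch is driven by a single command plus a config file; roughly (from memory of the tool at the time, so check its docs):

    # install the mirror tool and kick off the initial (full, ~70G) sync;
    # the config (conventionally /etc/bandersnatch.conf) sets the target
    # directory and the PyPI master URL
    pip install bandersnatch
    bandersnatch mirror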
anteayamordred: they don't know any better03:43
morganfainbergfungi, or worse, java + "git architect".  wait... what is a git architect job really? and why does that need to be a full time job. I could help do that and have more fun/wider range of things just by working with -infra03:43
clarkbreview.o.o isnt working on my phone again03:43
clarkb:/03:43
lifelessclarkb: get a new phone? :)03:43
morganfainberglifeless, bandersnatch is cool.  i've been looking at that for some internal stuff03:43
clarkbits a chrome js + caching problem I think03:43
morganfainbergit's the 70G that is a challenge for me to sell, but i like the project03:43
mordredlifeless, dstufft I BELIEVE lasst time we looked at it it didn't work yet03:43
mordred70G is a piece of cake03:44
clarkbmordred its a mirror03:44
lifelessmorganfainberg: 70G - mem.03:44
lifelessmorganfainberg: meh I mean. :)03:44
clarkbwhich we dont want03:44
dstufftoh03:44
mordredlifeless: does it follow external links?03:44
anteayamy sister's niece is a recruiter03:44
dstufftno it doesn't03:44
clarkbbecause external links03:44
morganfainbergoh oh. i meant to ask...is there some strange chrome issue w/ gerrit?03:44
lifelessmordred: no03:44
mordredclarkb: you used to argue the opposed03:44
mordredopposite03:44
mordredlifeless: yea - that's why03:44
anteayapretty and knows zip and doesn't want to know03:44
mordredwe want it to suck down external links03:44
dstufftkill all your external links imo03:44
dstufft:D03:44
morganfainbergmy chrome (desktop) browser jumps around when i click.03:44
mordredbecause those are what kill us03:44
morganfainberglifeless, pshaw, ram is free right?03:44
anteayaone ugly christmas dinner was enough for me03:44
lifelessmordred: we want them to die :)03:44
mordredwe do03:44
lifelessmorganfainberg: it doesn't use much ram03:44
mordredbut they are not yet dead03:44
clarkbmordred well a mirror that pulls external links is what I want :)03:44
mordredclarkb: yes03:45
mordredif it pulled external links, I'd get rid of pypi-mirror and just use it03:45
morganfainberglifeless, also network is free, right?03:45
dstufftit's conceivable that bandersnatch would grow the option to pull in external links03:45
morganfainberglifeless, >.>03:45
lifelessmorganfainberg: I may have exceeded my quota this week :)03:45
morganfainberglifeless, hehehe03:45
dstufftalthough it wouldn't match the output of pip install exactly03:45
dstufftbecause people can update the external links03:45
lifelessmorganfainberg: since I setup an Ubuntu mirror (100Gish) + bandersnatch(70G)03:45
*** nati_ueno has joined #openstack-infra03:45
morganfainberglifeless, oh dear!03:45
dstufftwithout updating the pypi listing at all03:45
dstufftand then bandersnatch won't know to download :[[03:45
lifelessmorganfainberg: I have a 500G quota03:46
lifelessmorganfainberg: so I'm probably ok.03:46
morganfainberglifeless, that would blow out my bandwidth cap. (I only get 250)03:46
* clarkb is unlimited \o/03:46
*** nati_ueno has quit IRC03:46
morganfainbergclarkb, i could use my cellphone i have "unlimited data"*03:46
anteayaI too am unlimited03:46
morganfainberganteaya, :( /jealous03:47
anteayaif you want03:47
anteayaCanadian telco monopoly is pretty bad03:47
anteayaI'm jealous of the dude in SF with 271mb up and down03:48
anteayadon't know what he has for data cap03:48
*** nati_uen_ has quit IRC03:48
dstufftunlimited ftw03:48
clarkbdstufft right which is why we use pip03:48
anteayaokay so that was the last patch we were waiting for03:49
anteayalifeless: when markmcclain shows up again, he can release a neutronclient, the last patch he was waiting on (yours) has merged: https://review.openstack.org/#/c/69110/03:50
anteayahe said he would check in after he had dinner03:50
anteayaand I am off to bed03:50
anteayanighty-night03:51
clarkbmordred: so we don't use latest pip03:51
lifelessanteaya: thanks for the update03:51
anteayanp03:51
mordredclarkb: no?03:51
clarkbmorganfainberg: we use pip 1.4.X because 1.5 is broken03:51
clarkbmordred: ^03:51
mordredclarkb: I thought 1.5.1 was out03:51
clarkbas is latest virtualenv and tox03:51
mordredwhich fixed it03:51
mordredwe just haven't unpinned yet03:51
mordredbecause of FF03:51
clarkbmordred: it might, if it did we didn't unpin03:51
clarkbmordred: that said03:51
morganfainbergclarkb, i know i know i should use a different name in irc, cause i make you type 4 characters instead of 303:51
clarkbmordred: we might want to set no wheels there because we do two passes right?03:51
*** sarob has joined #openstack-infra03:51
mordredwell... yeah. I could see that03:52
clarkbmordred: I think what cyril is saying is that the mirror builder will only find wheels if they are available03:52
mordredwe do the tarball pass, and then we build wheels from the tarballs03:52
clarkbmordred: so the tarball pass needs to not get any wheels then the wheel pass needs to get all wheels03:52
mordredso - ok. I can buy that03:52
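In pip-1.4/1.5-era terms the two passes look roughly like this; the flags and paths are a sketch of the idea, not a quote of the pypi-mirror code:

    # pass 1: download sdists only, so the mirror always carries tarballs
    pip install --no-use-wheel --download /srv/mirror/tarballs -r requirements.txt

    # pass 2: build wheels from what pass 1 fetched
    pip wheel --wheel-dir /srv/mirror/wheels --no-index \
        --find-links /srv/mirror/tarballs -r requirements.txt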
*** rcleere has joined #openstack-infra03:53
clarkbalso pinging holger tomorrow is on my list of things to do03:53
clarkbI would really like working tox03:53
mordredyeah03:53
mordredthat would be awesome03:53
* mordred is so happy that dstufft hangs out in here03:53
mordredit's made pip so much better for us03:53
mordredalso, he's cool and all03:54
clarkbI am going to hop over to #tox tomorrow when I feel patient :)03:54
dstufft;)03:54
dstufftum03:54
dstuffti'm going to be releasing a 1.5.3 either tonight or tomorrow03:55
dstufftI don't know how hard it would be to unpin and try things out03:55
mordredwell, right now we've got tox pinned03:55
dstufftok03:55
mordredbecause of $other_bug03:55
mordredso realistically we won't try for another 2 weeks at best, because of our feature freeze cycle of death03:55
dstufftjust saying, if it's easy to unpin then it would be a good time to sneak something into 1.5.303:56
dstufftif something else came up03:56
dstufft:]03:56
mordrednod03:56
mordredI trust that pip is perfect at this point03:56
dstufftok :)03:58
Steap_clarkb: yep, that was my idea (I'm Cyril), looking at the code, I think we'll get the wheels anyway03:58
Steap_so I wanted to keep the tarballs since we currently cannot do without them03:58
clarkbyup03:59
*** gyee has quit IRC04:00
mordredSteap_: yup. I grok now04:04
mordredand thanks- I believe you're quite right04:04
Steap_mordred: honestly, I don't really get everything :)04:06
Steap_I've just learnt that there was yet another way of installing Python packages04:07
mordred:)04:07
mordredwheels are awesome04:07
mordredwe need to get on them04:07
Steap_and it prevents some packages from being updated :/04:07
mordredbut we're not there yet04:07
Steap_mordred: yeah, probably04:07
Steap_but I've known easy_install, pip, setuptools, distutils... Yesterday I had to install a Ruby package, had to learn about gems... That's sort of a pain in the end :)04:08
Steap_I miss ./configure && make && make install :)04:08
clarkbmordred I am 95% sure we can turn on wheels now04:08
Steap_mordred: do you have a link explaining how wheels are awesome ?04:08
clarkbi did a bunch of local testing against the mirror and it seemed to work04:08
mordredSteap_: one reason - they don't run python setup.py to install04:08
Steap_clarkb: what about the old pip used in the gates ?04:08
mordredSteap_: they are pre-built/binary04:09
Steap_mordred: ok04:09
mordredSteap_: so you don't need dev libs or c compilers or anything04:09
mordredI mean, you still need the c libs04:09
clarkbSteap_ not a problem I used the same version of pip04:09
mordredbut MUCH MUCH more efficient04:09
Steap_clarkb: well, how do you explain the failures in the gates, then ?04:09
mordredSteap_: I miss configure ; make ; make install too04:09
Steap_mordred: things were simpler :)04:09
clarkbSteap_ which failures?04:09
mordredSteap_: on my last project, I was the automake/autoconf person04:09
Steap_mordred: the main issue is that you need to learn a different way of installing packages for every language you might have to use04:10
mordredwe could move to autoconf for our python stuff ...04:10
mordredSteap_: yeah04:10
Steap_and it changes every 5 years04:10
Steap_so in the end, as a user, it's a pain04:10
Steap_if it's not packaged in the distrib, it can keep me busy for a long time04:10
clarkbI think the failures we saw last time wheels were enabled were due to having wheels but telling pip to not use them04:11
Steap_clarkb: well, when only wheels are available for a given package, the gates fail to install it04:11
clarkbneed to go the other way around04:11
Steap_clarkb: oh04:11
Steap_sure04:11
clarkbSteap_ we dont use wheels today04:11
clarkbso that fails04:11
clarkbwheels are not enabled in tests now so a package that is only wheeled wont install04:12
Steap_clarkb: ok04:12
Steap_yes, that's what happens with six04:12
*** david-lyle has joined #openstack-infra04:13
Steap_so, maybe we should discard my patch and just enable wheels in tests04:13
Steap_shouldn't we ?04:13
clarkbno your patch is good04:13
clarkbthen we enable wheels again04:13
Steap_why would we still need the tarballs ? :)04:13
Steap_if wheels are awesome04:13
clarkbbecause not everyone will have wheels enabled04:13
Steap_ok04:13
clarkbtarballs work everywhere04:13
Steap_yes04:14
Steap_that's the good thing about them :)04:14
clarkbwheels are system specific04:14
dolphmcan anyone take a glance at this very short log and tell me it's not normal? http://logs.openstack.org/37/69137/5/check/check-tempest-dsvm-postgres-full/3cf2f41/logs/devstack-gate-setup-host.txt04:14
dolphm"useradd: user 'stack' already exists" etc04:15
fungidolphm: sounds like a host got reused during a test. likely a casualty of jenkins 1.551 (which we just downgraded away from a few hours ago)04:19
dolphmfungi: i just filed a bug report for it -- is it a dupe of something?04:19
clarkbfungi timestamps are newer. does d-g create the user before devstack04:19
dolphmhttps://bugs.launchpad.net/devstack/+bug/128190204:19
uvirtbotLaunchpad bug 1281902 in openstack-ci "/opt/stack/new/devstack/functions-common:1128 Keystone fail to get token" [Undecided,New]04:19
fungioh, i was going off the "user 'stack' already exists"04:20
*** markmcclain has joined #openstack-infra04:20
fungii believe it does, because it needs to chown some things04:20
lifelessSteap_: you need the tarballs to create wheels for platforms04:21
dolphmfungi: host re-use makes sense, given the other failures with git04:21
clarkbstill possible reuse happened04:21
*** markmcclain has quit IRC04:22
*** changbl has joined #openstack-infra04:23
*** lcheng has joined #openstack-infra04:23
*** sarob has quit IRC04:23
clarkbfungi swap didn't need fixing I think you are right node was reused04:24
*** coolsvap has joined #openstack-infra04:25
*** gokrokve has quit IRC04:27
*** gokrokve has joined #openstack-infra04:28
*** jeckersb is now known as jeckersb_gone04:29
*** dkliban has joined #openstack-infra04:32
*** gokrokve has quit IRC04:32
*** ryanpetrello has joined #openstack-infra04:35
*** matsuhashi has joined #openstack-infra04:35
*** tian has quit IRC04:50
*** masayukig has joined #openstack-infra04:51
*** sarob has joined #openstack-infra04:51
*** yamahata_ has quit IRC04:51
*** tian has joined #openstack-infra04:52
*** yamahata_ has joined #openstack-infra04:56
*** jaypipes has quit IRC05:00
zaroclarkb, fungi : was there a bug for jenkins zmq job?05:04
*** rcleere has quit IRC05:05
*** lcheng has quit IRC05:09
*** khyati has joined #openstack-infra05:11
clarkbzaro I thought so but I may have misread05:12
*** vogxn has joined #openstack-infra05:15
*** miqui has quit IRC05:15
zaroclarkb: i don't think so, the only bug i marked fixed today was 127618005:17
zarobug 127618005:17
uvirtbotLaunchpad bug 1276180 in openstack-ci "Gerrit hook scripts failing with IndexError exceptions" [High,Fix committed] https://launchpad.net/bugs/127618005:17
clarkbI mustve misparsed mail on my phone then05:18
zaroclarkb: you have time to take a quick look at change https://review.openstack.org/#/c/60348 ?05:19
zaroreal quick look, i promise.05:20
*** markmcclain has joined #openstack-infra05:23
*** nicedice has quit IRC05:23
*** lcheng has joined #openstack-infra05:24
*** sarob has quit IRC05:24
clarkbbut house of cards05:25
*** chandan_kumar has joined #openstack-infra05:26
*** nati_ueno has joined #openstack-infra05:27
*** markmcclain has quit IRC05:27
*** CaptTofu has joined #openstack-infra05:27
*** nati_ueno has quit IRC05:28
*** nati_ueno has joined #openstack-infra05:29
*** mfisch has quit IRC05:31
*** CaptTofu has quit IRC05:32
zaroclarkb: huh? what does that mean?05:34
clarkbits a show on netflix. quite good.05:35
*** mfisch has joined #openstack-infra05:35
*** mfisch has joined #openstack-infra05:35
openstackgerritShawn Hartsock proposed a change to openstack/requirements: add pyvmomi library  https://review.openstack.org/6996405:36
*** nati_uen_ has joined #openstack-infra05:37
*** DinaBelova_ is now known as DinaBelova05:37
*** amotoki has quit IRC05:39
*** nati_ueno has quit IRC05:41
*** dstanek has quit IRC05:48
*** dstanek has joined #openstack-infra05:49
*** sarob has joined #openstack-infra05:51
*** wenlock has joined #openstack-infra05:57
openstackgerritA change was merged to openstack/requirements: Update hp3parclient low version number  https://review.openstack.org/7372705:58
nibalizerclarkb: fungi sure 2GB sounds fine to start with05:59
nibalizeri'd also want a chunk of disk too, at least 20GB for it to write stuff down in05:59
*** coolsvap1 has joined #openstack-infra05:59
nibalizer(/var)05:59
*** gokrokve has joined #openstack-infra06:01
*** e0ne has joined #openstack-infra06:02
*** coolsvap has quit IRC06:03
*** gokrokve has quit IRC06:06
*** vkozhukalov has joined #openstack-infra06:10
*** hdd_ has quit IRC06:14
funginibalizer: yeah, it has at least that much, but we can attach volumes too06:17
*** cadenzajon has joined #openstack-infra06:20
fungianyway, it puppeted fine and is up and in dns now06:20
*** markmcclain has joined #openstack-infra06:23
*** amotoki has joined #openstack-infra06:23
*** gokrokve has joined #openstack-infra06:24
*** sarob has quit IRC06:24
fungideleting 70 nodes over 3 hours in their current state06:25
*** amotoki has quit IRC06:25
*** yolanda has joined #openstack-infra06:25
*** amotoki has joined #openstack-infra06:26
*** cadenzajon has quit IRC06:26
*** banix has quit IRC06:26
*** markmcclain has quit IRC06:28
*** e0ne has quit IRC06:28
*** markmcclain has joined #openstack-infra06:30
nibalizerfungi: sweeeet!06:30
jeblairfungi: the hpcloud nodepool providers haven't done anything since around 5:3006:33
*** ryanpetrello has quit IRC06:33
fungioh/06:34
fungi?06:34
jeblairfungi: i think they're waiting for network data06:34
*** markmcclain has quit IRC06:34
*** matsuhashi has quit IRC06:35
jeblairfungi: yes, they are all sitting in     return self._sslobj.read(len)06:36
jeblair(inside of ssl, called from urllib)06:36
jeblairfungi: suggest we just restart nodepool06:36
clarkb++06:37
fungiwfm06:37
*** matsuhashi has joined #openstack-infra06:37
jeblairdone06:37
jeblairi'll delete nodes that were 'building' and 'delete' while it was stopped06:38
*** lcheng has quit IRC06:39
jeblairokay, that's started; 560 nodes06:39
jeblair2 of my scripts that were deleting keypairs similarly stopped06:39
jeblairi restarted them06:39
fungithanks... i'm operating at a reduced capacity at this point and may just grab a nap in preparation for whatever fresh challenges await us tomorrow06:39
fungiwondering if hp had network maintenance or something06:40
jeblairfungi: good question.06:40
openstackgerritSpencer Krum proposed a change to openstack-infra/config: Enable puppetdb from puppetmaster  https://review.openstack.org/7461206:40
jeblairi also wonder if there's a way we can protect against that; basically that was a novaclient call that just never returned.06:40
nibalizerfungi: there is your follow up ^^06:41
funginibalizer: thanks!06:41
nibalizerfungi: can you check /var/log/puppetdb/puppetdb.log for any errors or warnings?06:42
*** gokrokve_ has joined #openstack-infra06:45
*** gokrokve has quit IRC06:48
*** khyati has quit IRC06:49
*** gokrokve_ has quit IRC06:49
funginibalizer: info lines filtered out for brevity... http://paste.openstack.org/show/6716606:51
*** sarob has joined #openstack-infra06:51
*** lcheng has joined #openstack-infra06:52
*** lttrl has joined #openstack-infra06:53
fungilooks like it wants ~150g in /var/lib/puppetdb06:53
nibalizerfungi: okay i was more or less expecting that message06:54
nibalizerim not sure what the economics of adding more storage are06:54
fungiso we'll either need to tune it down or add a volume there06:54
fungifree (for us)06:54
nibalizerokay well if its not expensive lets just add disk06:55
fungibut it will need to wait for tomorrow unless someone else wants to take over. i'm officially out of steam (2am here)06:56
nibalizerthats fine with me06:56
clarkbfungi: sleep06:56
nibalizerim just trying to make sure you all aren't blocked on me06:56
funginibalizer: not at all--thanks for the help!06:57
* fungi is just blocked on not enough hours in the day06:57
nibalizerfor context, my company charges something like $8/gig/month for storage between teams, so i wondered if 150 gig would be a problem06:57
fungiwe have very generous sponsors06:58
lifeless8/GBM - wow06:58
*** e0ne has joined #openstack-infra06:59
lifelessthats some fancy pants storage at that rate06:59
*** jhesketh has quit IRC06:59
*** jhesketh__ has quit IRC06:59
fungiengraved on golden platters07:00
*** e0ne has quit IRC07:00
*** e0ne has joined #openstack-infra07:01
*** thomasbiege has joined #openstack-infra07:01
*** lcheng has quit IRC07:04
*** rlandy has joined #openstack-infra07:04
*** morganfainberg is now known as morganfainberg_Z07:05
*** e0ne has quit IRC07:06
*** mgagne has joined #openstack-infra07:07
nibalizeryea... if my company would openstack... that would be great07:10
nibalizer(on commodity hardware)07:10
*** mgagne1 has quit IRC07:10
lifelessnibalizer: so what does it want 150GB for07:10
lifelessthats like 2 copies of PyPI07:10
clarkblogs07:12
nibalizerit doesn't need that much space07:12
nibalizerin my experience07:12
clarkbfor every puppet run done every 10 minutes on every server. now with single use slaves its probably not terrible07:12
nibalizerare the single use slaves the ones that run puppet apply?07:13
nibalizerbecause those will probably never hit puppetdb07:13
nibalizerat my work we have 300+ nodes and 20K resources on a 17GB disk with plenty of space to go, given a 2 week retention policy07:14
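The retention policy nibalizer mentions maps to puppetdb's ttl settings; a sketch of the relevant stanza, with the path per the packaging of the time:

    # /etc/puppetdb/conf.d/database.ini
    [database]
    # drop stored reports after two weeks so the on-disk size stays bounded
    report-ttl = 14d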
*** mrda is now known as mrda_away07:14
clarkbright so no impact from them07:15
nibalizerhow many nodes are there checking in?07:15
*** basha has joined #openstack-infra07:16
*** jcooley_ has quit IRC07:17
*** jcooley_ has joined #openstack-infra07:17
*** chandan_kumar has quit IRC07:18
*** dstanek has quit IRC07:18
*** basha has quit IRC07:20
*** denis_makogon has joined #openstack-infra07:21
*** bhuvan has joined #openstack-infra07:21
*** jcooley_ has quit IRC07:22
*** e0ne has joined #openstack-infra07:24
*** sarob has quit IRC07:25
*** chandan_kumar has joined #openstack-infra07:25
*** markwash has quit IRC07:26
*** vogxn has quit IRC07:27
*** e0ne has quit IRC07:28
*** CaptTofu has joined #openstack-infra07:29
*** dstanek has joined #openstack-infra07:29
*** saju_m has joined #openstack-infra07:30
*** markmcclain has joined #openstack-infra07:31
*** CaptTofu has quit IRC07:33
*** markmcclain has quit IRC07:35
*** vishy has quit IRC07:40
*** cyeoh has quit IRC07:40
*** DinaBelova is now known as DinaBelova_07:40
*** cyeoh has joined #openstack-infra07:41
*** vishy has joined #openstack-infra07:43
*** mrmartin has joined #openstack-infra07:47
*** nati_uen_ has quit IRC07:49
*** nati_ueno has joined #openstack-infra07:49
*** sarob has joined #openstack-infra07:51
*** e0ne has joined #openstack-infra07:56
*** daniil has quit IRC07:56
*** luqas has joined #openstack-infra07:58
*** e0ne has quit IRC08:00
*** skraynev_afk has quit IRC08:01
*** sandywalsh has joined #openstack-infra08:03
*** basha has joined #openstack-infra08:08
*** denis_makogon has quit IRC08:16
*** DinaBelova_ is now known as DinaBelova08:20
*** saju_m has quit IRC08:21
*** sarob has quit IRC08:24
*** jgallard has joined #openstack-infra08:26
*** thomasbiege has quit IRC08:26
*** DinaBelova is now known as DinaBelova_08:28
*** DinaBelova_ is now known as DinaBelova08:28
*** vogxn has joined #openstack-infra08:28
*** markmcclain has joined #openstack-infra08:32
*** vogxn has quit IRC08:33
*** openstack has joined #openstack-infra08:42
*** asadoughi has joined #openstack-infra08:43
*** nati_uen_ has joined #openstack-infra08:43
*** chandan_kumar has joined #openstack-infra08:44
*** jog0 has joined #openstack-infra08:45
*** matrohon has joined #openstack-infra08:46
*** nati_ueno has quit IRC08:47
*** jroovers|afk has joined #openstack-infra08:49
*** koolhead17 has joined #openstack-infra08:50
*** basha has joined #openstack-infra08:50
*** sarob has joined #openstack-infra08:51
*** jcoufal has joined #openstack-infra08:52
*** rossella-s has joined #openstack-infra08:53
*** jpich has joined #openstack-infra08:54
*** lttrl has quit IRC08:54
*** markmcclain has joined #openstack-infra08:59
*** nosnos_ has joined #openstack-infra08:59
*** nosnos has quit IRC08:59
*** markmcclain1 has joined #openstack-infra09:00
*** markmcclain has quit IRC09:01
*** afazekas has joined #openstack-infra09:01
*** e0ne has joined #openstack-infra09:04
*** markmcclain1 has quit IRC09:05
*** coolsvap1 has quit IRC09:06
*** dstanek has quit IRC09:09
*** derekh has joined #openstack-infra09:13
*** yassine has joined #openstack-infra09:14
*** flaper87|afk is now known as flaper8709:16
*** hashar has joined #openstack-infra09:18
*** hashar_ has joined #openstack-infra09:19
*** hashar has quit IRC09:22
*** chandan_kumar has quit IRC09:23
*** sarob has quit IRC09:25
*** NikitaKonovalov is now known as NikitaKonovalov_09:29
*** CaptTofu has joined #openstack-infra09:29
*** CaptTofu has quit IRC09:34
*** coolsvap has joined #openstack-infra09:34
*** DinaBelova is now known as DinaBelova_09:35
*** yaguang has quit IRC09:36
*** fbo_away is now known as fbo09:37
*** dpyzhov has joined #openstack-infra09:37
*** mrmartin has quit IRC09:39
*** luqas has quit IRC09:40
*** dpyzhov has joined #openstack-infra09:41
*** chandan_kumar has joined #openstack-infra09:41
*** jp_at_hp has joined #openstack-infra09:42
*** chandankumar_ has joined #openstack-infra09:42
*** chandan_kumar has quit IRC09:46
*** matsuhashi has quit IRC09:46
*** nosnos_ has quit IRC09:46
*** nosnos has joined #openstack-infra09:47
*** DinaBelova_ is now known as DinaBelova09:47
*** matsuhashi has joined #openstack-infra09:47
*** pblaho has joined #openstack-infra09:50
*** gilliard has quit IRC09:51
*** sarob has joined #openstack-infra09:51
*** saju_m has joined #openstack-infra09:58
*** jdurgin has quit IRC09:59
*** noorul has joined #openstack-infra10:00
noorulhttp://logs.openstack.org/98/69498/31/check/gate-solum-docs/9de63ba/console.html#_2014-02-19_03_44_50_84010:00
noorulAny idea why that is happening?10:00
*** markmcclain has joined #openstack-infra10:01
*** NikitaKonovalov_ is now known as NikitaKonovalov10:03
*** julienvey has joined #openstack-infra10:04
*** markmcclain has quit IRC10:06
*** nati_uen_ has quit IRC10:07
*** pblaho has quit IRC10:07
*** pblaho has joined #openstack-infra10:08
*** jdurgin has joined #openstack-infra10:12
*** luqas has joined #openstack-infra10:20
*** hashar_ is now known as hashar10:20
*** sarob has quit IRC10:24
*** masayukig has quit IRC10:25
*** ociuhandu has joined #openstack-infra10:29
*** chandankumar_ has quit IRC10:32
*** ArxCruz has joined #openstack-infra10:33
*** alexpilotti has joined #openstack-infra10:34
*** dpyzhov has quit IRC10:35
*** dpyzhov has joined #openstack-infra10:37
*** dizquierdo has joined #openstack-infra10:38
*** dpyzhov has quit IRC10:38
*** flaper87 is now known as flaper87|afk10:38
kiallSeems the gerrit bot is MIA10:40
*** flaper87|afk is now known as flaper8710:42
BobBalljeblair: I'm using novaclient 2.15.0 with nodepool - my problem was NOVA_RAX_AUTH was set to 1, which is an environment variable read explicitly by novaclient and used to set auth_system thus introducing the dependency on the RAX authentication method.10:44
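(A minimal sketch of the gotcha BobBall describes, assuming novaclient switches auth_system whenever NOVA_RAX_AUTH is set; the 'rackspace' value below is illustrative, not necessarily novaclient's actual plugin name.)
    import os

    # Illustration only: an exported NOVA_RAX_AUTH can silently change which
    # auth plugin novaclient selects, pulling in the Rackspace auth method.
    auth_system = 'rackspace' if os.environ.get('NOVA_RAX_AUTH') else 'keystone'
    print(auth_system)

    # Clearing it before running nodepool against a non-Rackspace cloud
    # avoids the extra dependency BobBall ran into.
    os.environ.pop('NOVA_RAX_AUTH', None)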
*** chandankumar_ has joined #openstack-infra10:47
*** nati_ueno has joined #openstack-infra10:49
*** flaper87 is now known as flaper87|afk10:50
*** flaper87|afk is now known as flaper8710:50
*** sarob has joined #openstack-infra10:51
*** nati_ueno has quit IRC10:53
*** che-arne has joined #openstack-infra10:55
*** dpyzhov has joined #openstack-infra10:57
*** dpyzhov has quit IRC10:59
*** dpyzhov has joined #openstack-infra11:00
*** markmcclain has joined #openstack-infra11:02
*** markmcclain has quit IRC11:06
*** basha has quit IRC11:11
*** jgallard has quit IRC11:12
*** wenlock has quit IRC11:14
*** chandankumar_ has quit IRC11:14
*** chandan_kumar has joined #openstack-infra11:15
*** NikitaKonovalov is now known as NikitaKonovalov_11:17
*** NikitaKonovalov_ is now known as NikitaKonovalov11:19
*** heyongli has joined #openstack-infra11:20
*** CaptTofu has joined #openstack-infra11:22
*** ociuhandu has quit IRC11:23
*** mrmartin has joined #openstack-infra11:24
*** johnthetubaguy has joined #openstack-infra11:24
*** andreaf has joined #openstack-infra11:25
*** sarob has quit IRC11:25
*** CaptTofu has quit IRC11:26
*** matsuhashi has quit IRC11:28
enikanorov_hi. does anyone know what's up with the check queue? looks like it's stuck11:31
*** matsuhashi has joined #openstack-infra11:31
ilyashakhat_enikanorov_: maybe SergeyLukjanov ?11:32
enikanorov_SergeyLukjanov: ping11:32
SergeyLukjanovenikanorov_, ilyashakhat_, pong11:32
SergeyLukjanovlooking on it11:32
SergeyLukjanovheh, I'm afraid that we have no free devstack-precise slaves11:34
ilyashakhat_is it ok to have such a large 'Deleting' area on Job Stats?11:36
*** CaptTofu has joined #openstack-infra11:36
*** mrmartin has quit IRC11:36
SergeyLukjanovilyashakhat_, nope11:36
SergeyLukjanovilyashakhat_, I don't see that any jobs are running now11:38
SergeyLukjanovand almost all slaves are offline11:38
*** jroovers|afk is now known as jroovers11:39
*** ociuhandu has joined #openstack-infra11:42
*** noorul has left #openstack-infra11:42
SergeyLukjanovsdague, I've checked all jenkins nodes and we have only several online slaves on jenkins and jenkins0111:42
sdagueyeh, it looks like something went all bonkers again11:43
sdagueit looks like, from scrollback, they were working on it last night11:44
sdagueso I think it's just a wait for fungi thing, because this is the class of things where you need root to go fix I think11:45
SergeyLukjanovsdague, yup11:45
SergeyLukjanovI think that we have tons of 'deleting' slaves that are already offline on jenkins11:45
SergeyLukjanovprobably, nodepool is dead :(11:45
*** che-arne has quit IRC11:45
sdagueyeh, probably11:46
*** hashar is now known as hasharAW11:47
*** sarob has joined #openstack-infra11:51
SergeyLukjanovhm, one more idea is that it's related to the fact that gate-noop is now running on single use nodes11:56
*** coolsvap has quit IRC11:57
*** hasharAW has quit IRC11:58
*** lcostantino has joined #openstack-infra12:01
*** markmcclain has joined #openstack-infra12:03
*** rfolco has joined #openstack-infra12:07
*** markmcclain has quit IRC12:07
*** ArxCruz has quit IRC12:08
*** lcostantino has quit IRC12:08
*** max_lobur has joined #openstack-infra12:09
*** lcostantino has joined #openstack-infra12:09
*** mrmartin has joined #openstack-infra12:10
*** ArxCruz has joined #openstack-infra12:10
mrmartinre12:12
*** banix has joined #openstack-infra12:13
*** dpyzhov has quit IRC12:13
*** lcostantino has quit IRC12:14
mrmartinsomething is wrong with the gating jobs, some tasks were started more than 9 hours ago12:14
sdaguemrmartin: yep, no one that can fix it is currently awake12:14
*** ArxCruz has quit IRC12:16
*** matsuhashi has quit IRC12:20
kiallGuess it's time for someone to give the gate a kick ;) Totally and utterly wedged.12:20
*** NikitaKonovalov is now known as NikitaKonovalov_12:20
*** ArxCruz has joined #openstack-infra12:20
*** sarob has quit IRC12:24
*** matsuhashi has joined #openstack-infra12:25
*** lcostantino has joined #openstack-infra12:30
*** luqas has quit IRC12:32
*** yamahata has quit IRC12:36
*** lcostantino has quit IRC12:37
*** Nikolay_St has quit IRC12:38
*** hashar has joined #openstack-infra12:42
*** dpyzhov has joined #openstack-infra12:45
*** jgallard has joined #openstack-infra12:46
*** che-arne has joined #openstack-infra12:49
*** sarob has joined #openstack-infra12:51
*** smarcet has joined #openstack-infra12:52
*** banix has quit IRC12:54
*** CaptTofu has quit IRC12:55
*** CaptTofu has joined #openstack-infra12:56
*** nosnos has quit IRC12:56
*** CaptTofu has quit IRC13:00
*** NikitaKonovalov_ is now known as NikitaKonovalov13:01
*** lcostantino has joined #openstack-infra13:01
*** david-lyle has quit IRC13:01
*** banix has joined #openstack-infra13:01
*** koolhead17 has quit IRC13:01
*** koolhead17 has joined #openstack-infra13:01
*** markmcclain has joined #openstack-infra13:04
*** matsuhashi has quit IRC13:08
*** matsuhashi has joined #openstack-infra13:08
*** matsuhashi has quit IRC13:08
*** markmcclain has quit IRC13:08
*** dkranz has quit IRC13:09
ekarlsodid someone throw a banana peel into the gate or?13:11
SergeyLukjanovfungi, clarkb, jeblair, mordred, gate is dead // just want to be sure that you'll see it :)13:13
*** zhiyan_ is now known as zhiyan13:14
*** luqas has joined #openstack-infra13:17
*** ken1ohmichi has quit IRC13:18
*** dprince has joined #openstack-infra13:19
*** mrmartin has quit IRC13:21
*** thomasbiege has joined #openstack-infra13:21
*** pdmars has joined #openstack-infra13:22
*** sarob has quit IRC13:24
*** pdmars has quit IRC13:25
*** weshay has joined #openstack-infra13:33
*** luqas has quit IRC13:33
*** dolphm has quit IRC13:35
*** dolphm has joined #openstack-infra13:35
*** dcramer__ has joined #openstack-infra13:36
*** lcostantino has quit IRC13:36
*** dolphm has quit IRC13:37
*** dolphm has joined #openstack-infra13:38
*** sandywalsh has quit IRC13:39
*** mrmartin has joined #openstack-infra13:40
*** CaptTofu has joined #openstack-infra13:45
*** dizquierdo has quit IRC13:46
*** hashar has quit IRC13:47
*** sarob has joined #openstack-infra13:51
*** sandywalsh has joined #openstack-infra13:53
*** thomasem has joined #openstack-infra13:53
*** mfer has joined #openstack-infra13:55
*** dkehn_ has quit IRC13:55
*** dolphm has quit IRC13:55
*** CaptTofu has quit IRC13:56
*** dolphm has joined #openstack-infra13:57
*** dpyzhov has quit IRC13:57
*** ryanpetrello has joined #openstack-infra13:57
*** luqas has joined #openstack-infra13:57
*** dpyzhov has joined #openstack-infra13:57
*** salv-orlando has quit IRC13:58
*** CaptTofu has joined #openstack-infra13:58
*** dolphm has quit IRC13:58
*** lcostantino has joined #openstack-infra13:59
*** dolphm has joined #openstack-infra13:59
*** gordc has joined #openstack-infra13:59
*** hashar has joined #openstack-infra14:01
*** sandywalsh_ has joined #openstack-infra14:02
*** markmcclain has joined #openstack-infra14:04
*** prad has quit IRC14:05
*** banix has quit IRC14:05
*** sandywalsh has quit IRC14:06
*** heyongli has quit IRC14:06
*** dcramer__ has quit IRC14:07
*** markmcclain has quit IRC14:08
*** lcostantino has quit IRC14:09
*** salv-orlando has joined #openstack-infra14:09
*** markmc has joined #openstack-infra14:10
*** saju_m has quit IRC14:11
*** dkranz has joined #openstack-infra14:11
*** dpyzhov has quit IRC14:12
*** salv-orlando has quit IRC14:12
*** andreaf has quit IRC14:14
*** smarcet has left #openstack-infra14:17
*** smarcet has joined #openstack-infra14:17
*** pafuent has joined #openstack-infra14:18
*** yamahata has joined #openstack-infra14:19
fungilooking now14:22
fungilooks like we've piled up fake ready nodes again14:22
*** e0ne has quit IRC14:23
*** julim has joined #openstack-infra14:23
sdaguefungi: yeh, it seems to have started happening about the same time the infra team was signing off last night14:23
sdagueso I'm curious if a late change caused issues14:23
*** dolphm has quit IRC14:24
funginot sure. i'll have to read scrollback. we were still stabilizing things coming out of the jenkins upgrade and redowngrade14:24
*** sarob has quit IRC14:24
*** lcostantino has joined #openstack-infra14:24
fungiby the time i passed out14:25
sdagueyep, that's fair14:25
*** oubiwann has joined #openstack-infra14:25
ttxSergeyLukjanov: I thought you'd magically get things fixed while everyone else sleeps14:26
fungihopefully this isn't an issue with jenkins 1.632.2, because if so we've basically ruled out being able to use any of the jenkins releases which aren't riddled with known security holes14:26
*** dpyzhov has joined #openstack-infra14:26
fungier, 1.532.214:26
SergeyLukjanovttx, I have no root access and it looks like the problem is about non-code stuff :(14:26
fungiwhich was what drove me to upgrade us to 1.551 yesterday. huge list of security fixes, some in bits we expose to the public14:27
ttxSergeyLukjanov: just trolling you, ignore me :)14:27
SergeyLukjanovttx :)14:27
SergeyLukjanovfungi, can I help you somehow?14:28
fungii've got 10 loops running in parallel right now deleting any nodepool nodes which are >3 hours in any state14:28
fungithis should get things moving again, i think14:28
fungichecking scrollback to see if they left us any other breadcrumbs14:29
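(A rough sketch of the kind of cleanup loop fungi is running, assuming the `nodepool list` CLI table of that era with the node id in the first column and age in hours in the last; the column positions are a guess, so treat it as illustrative only.)
    import subprocess

    MAX_AGE_HOURS = 3.0  # delete anything stuck in one state for >3 hours

    out = subprocess.check_output(['nodepool', 'list']).decode()
    for line in out.splitlines():
        fields = [f.strip() for f in line.split('|') if f.strip()]
        if not fields or not fields[0].isdigit():
            continue  # skip the header and separator rows
        node_id, age = fields[0], float(fields[-1])
        if age > MAX_AGE_HOURS:
            subprocess.call(['nodepool', 'delete', node_id])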
ttxWhen I saw 400 checks piled up I thought: "we should really have those tripleo checks appear in a separate display".. then I looked again14:30
fungiyah14:30
SergeyLukjanovheh14:31
*** e0ne has joined #openstack-infra14:31
*** dims has quit IRC14:32
*** wenlock has joined #openstack-infra14:33
fungiso as best i can tell, nodepool thinks it piled about 600 nodes onto jenkins04, nearly a couple hundred of them in a ready state, but jenkins04's interface shows only offline nodes14:34
*** jeckersb_gone is now known as jeckersb14:34
fungihard to tell how many, but it *could* be in the hundreds14:34
*** dims has joined #openstack-infra14:35
*** thomasbiege has quit IRC14:35
fungilooks like we're down gerritbot and statusbot too...14:37
fungi2014-02-19 08:37:31     <--     openstackgerrit (~openstack@review.openstack.org) has quit (Ping timeout: 260 seconds)14:37
fungi2014-02-19 08:38:01     <--     openstackstatus (~openstack@eavesdrop.openstack.org) has quit (Ping timeout: 272 seconds)14:37
sdaguefungi: the deletes are already helping, things started to move again in gate queue14:38
fungii'll get the bots restarted and see whether something happened in raxland around 6 hours ago (which could also coincide with when this started to go south)14:39
*** yamahata has quit IRC14:39
*** dstanek has joined #openstack-infra14:39
*** yamahata has joined #openstack-infra14:39
fungisort of odd to see bots from two different servers fall off from a ping timeout at the same moment. and those servers are in the same rax region as the nodepool server and jenkins masters14:40
*** dkliban has quit IRC14:40
* fungi makes the obligatory "clouds" sigh and finds a second cup of coffee14:40
SergeyLukjanovfungi, it sounds like rax network outage could be the reason14:41
*** ildikov_ has joined #openstack-infra14:41
fungi"Our engineers have received reports of a brief network disruption that impacted a portion of our DFW2 data center starting at approximately 02:36 CST. The team engaged has stabilized the issue at approximately 02:41 CST and will continue to monitor for further impact. "14:42
ArxCruzlifeless: hey, can you give me your blessing here https://review.openstack.org/#/c/70152/ ?14:42
ArxCruz:)14:42
fungihttps://status.rackspace.com/14:42
fungi02:36 CST is 08:36 UTC for the timezone-impaired14:43
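(The conversion fungi is doing by hand: US Central Standard Time is UTC-6, so 02:36 CST on 2014-02-19 is 08:36 UTC.)
    from datetime import datetime, timedelta

    cst = datetime(2014, 2, 19, 2, 36)
    print(cst + timedelta(hours=6))  # 2014-02-19 08:36:00, i.e. 08:36 UTC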
fungiwell, we already know that nodepool behaves terribly in the face of provider outages, and thankfully lifeless and derekh have patches proposed which should help that14:44
*** jp_at_hp has quit IRC14:47
fungithat tempest change failing near the head of the gate hit connectivity issues to pypi.python.org trying to download pip around 14:38, just a few minutes ago, from a rax-dfw slave too14:47
SergeyLukjanovfungi, it's good14:47
*** russellb has joined #openstack-infra14:48
SergeyLukjanovfungi, is pip installation the only external op?14:48
fungi(separate note, we still need to neuter get-pip so that it installs from a local cache on these systems)14:48
fungiSergeyLukjanov: nah, jobs also need to look up dns records, retrieve git updates and zuul refs, upload logs/artifacts and stream data back to the jenkins master too14:49
*** dhellmann has quit IRC14:49
*** jp_at_hp has joined #openstack-infra14:49
fungianyway, my pint was that whatever this is happening in dfw, it might be ongoing14:49
*** dhellmann has joined #openstack-infra14:49
SergeyLukjanovfungi, bad wording from my side, I mean outside of our infra14:49
*** dizquierdo has joined #openstack-infra14:49
fungis/pint/point/ (though now i feel like i need a pint too)14:50
SergeyLukjanovfungi, dns and pip14:50
*** dhellmann has quit IRC14:50
fungiSergeyLukjanov: possibly... some less common jobs also download other sorts of things from the internet too14:50
SergeyLukjanovfungi, "whatever this is happening in dfw, it might be ongoing" :(14:50
fungiwell, just noting that was a connectivity issue from a few minutes ago, and when i checked the slave's location, it was in that same region which had the outage earlier14:51
*** sarob has joined #openstack-infra14:51
sdaguehow easy would it be to pull the whole region?14:51
fungibut it could also just be an unfortunate coincidence. i'm still casting my net wide here14:51
*** dkliban has joined #openstack-infra14:51
*** dhellmann has joined #openstack-infra14:51
fungisdague: not easy... for historical reasons we have most of our static infrastructure deployed in rax-dfw... we'd need to rebuild a lot of longer-lived servers14:52
sdagueso, yeh, spot checking additional fails14:52
sdaguethey all look like dfw14:52
sdagueand all because of connectivity14:52
fungi(pretty much any infra service you can think of, aside from nodepool slaves, backups and some experimental systems, is in dfw)14:53
anteayaah14:53
sdagueoof14:53
anteayasounds like some waves of movement might be a good idea14:53
fungiso the *good* news here is that we could recover from complete loss of dfw, but it's not a move to be undertaken on a whim14:54
anteayano not a whim14:54
anteayabut perhaps a slow migration14:54
*** miqui has joined #openstack-infra14:54
*** jnoller has joined #openstack-infra14:54
* anteaya looks for a land bridge over the glacier14:54
anteayawhich server would be the easiest to migrate?14:55
*** luqas has quit IRC14:55
anteayafollow up question, which would be the most important?14:55
fungijust about any of them would be roughly similarly easy to migrate, with a few exceptions, but there's just a lot of systems14:55
*** dkliban has quit IRC14:56
*** mwagner_lap has joined #openstack-infra14:56
anteayaright14:56
fungithink back to the several easels of marker-smeared paper we had diagramming them from a high level at the bootcamp... then mentally add a bunch more we've brought online since then14:56
anteayayes14:56
*** luqas has joined #openstack-infra14:57
anteayaseveral easels worth14:57
anteayaif I started up an etherpad to list them, would this help the conversation/migration?14:57
anteayaeven if the conclusion is not to migrate?14:57
*** prad has joined #openstack-infra14:57
anteayahttps://etherpad.openstack.org/p/migrate-all-the-things14:59
*** mgagne has quit IRC14:59
anteayado joing me14:59
anteayajoin14:59
*** wenlock_ has joined #openstack-infra15:00
*** CaptTofu has quit IRC15:01
fungii think that's premature. what we need is a group discussion about ways to spread systems out to reduce the impact of provider outages, which probably means some redundancy... or we remind ourselves that as we've previously stated we're operating at the mercy of providers donating these resources, and they're up most of the time, and when they're not, we should just go out for a drink and clean up15:01
fungithe mess later15:01
*** CaptTofu has joined #openstack-infra15:01
anteayaokay15:01
fungibut right now i need to focus on stabilizing this and see what else might be left broken from the earlier incident15:02
anteayawell while waiting for the others I don't mind having a place to copy/paste15:02
anteayaand I can abandon the etherpad later if need be15:02
anteayaright15:02
anteayaand I need something to do because I can't help you with that15:02
*** CaptTofu has quit IRC15:02
*** CaptTofu has joined #openstack-infra15:02
fungiif we continue to see any new gate failures (besides the ones there) which are network connectivity problems and are on nodepool nodes in dfw, i'll temporarily scale nodepool off that region to buy us a little more stability15:03
*** e0ne_ has joined #openstack-infra15:03
*** protux has joined #openstack-infra15:04
*** jaypipes has joined #openstack-infra15:05
*** markmcclain has joined #openstack-infra15:05
fungithough at the cost of 132 nodes of capacity15:06
anteaya:(15:07
fungiyeah, it's a balancing act15:07
*** e0ne has quit IRC15:07
*** dpyzhov has quit IRC15:08
*** openstackstatus has joined #openstack-infra15:08
*** wenlock_ has quit IRC15:08
*** openstackgerrit has joined #openstack-infra15:09
*** dpyzhov has joined #openstack-infra15:09
fungiokay, we've got openstackstatus and openstackgerrit back15:10
*** markmcclain has quit IRC15:10
anteayayay15:10
*** Ajaeger has joined #openstack-infra15:10
anteayawhat server are they on?15:10
fungione is on eavesdrop and the other is on review15:10
jeblairfungi: good morning15:10
fungijeblair: i hope so!15:10
dims:)15:11
*** eharney has joined #openstack-infra15:11
fungijeblair: quick summary, "brief" outage in rax-dfw around 08:30 utc (but maybe with lingering effects, jury's still deliberating)15:11
*** NikitaKonovalov is now known as NikitaKonovalov_15:12
jeblairfungi: so... i was reading the scrollback and talk of a mass migration from dfw and imagined something rather serious...15:12
jeblairfungi: anything i should kick or check?15:13
funginot especially serious, no15:13
*** bknudson has quit IRC15:13
fungijeblair: a deeper health check on nodepoold would be great. it does seem to be adding replacement nodes as i delete the stale ones, but curious whether it warrants restarting15:13
jeblairack15:14
sdagueone other oddity15:14
sdagueI can't seem to find a single functioning py26 node in the currently running list15:15
jeblairsdague: ack15:15
sdagueso that might be a parallel thing to look into, because even if the nodepool recovers, that will hold us up15:15
*** jeckersb is now known as jeckersb_gone15:16
*** jergerber has joined #openstack-infra15:16
fungisdague: i'll make a second pass to delete the py26 nodes which aren't in use, to speed that along15:16
*** DinaBelova is now known as DinaBelova_15:16
sdaguefungi: cool15:17
fungithanks for spotting it15:17
*** jeckersb_gone is now known as jeckersb15:17
anteayaI count 63 servers listed in cacti15:19
anteayawhich are now listed in the etherpad15:19
jeblairfungi: hpcloud seems operational; az1 and az3 are idle because they are at capacity, likely with false-ready nodes15:19
*** e0ne_ has quit IRC15:19
jeblairfungi: you have deletes going on that will catch those?15:19
fungijeblair: yes15:19
*** malini has joined #openstack-infra15:20
*** e0ne has joined #openstack-infra15:20
fungi10 loops going in parallel right now15:20
fungithough it looks like nodepoold is adding new ready nodes which aren't picking up jobs either... i'm starting to suspect it's having trouble adding them to jenkins masters successfully15:20
funginone of the nodes currently in a ready state are >1hr there15:21
jeblairanteaya: keep in mind that it's generally better to have single-point-of-failure servers that interact with each other in the same data center; it's likely that if we spread out some of our services to 2 data centers, we would be subject to twice the number of service interruptions15:21
jeblairanteaya: the exception to that is if we can make services truly ha; that is difficult for many of the more important things we run.15:22
*** gokrokve has joined #openstack-infra15:22
fungiand looks like all the centos6 nodes which have been ready for over 0.2 hours are on jenkins04 for some reason15:22
fungiwhich is the same master which had most of the nodes earlier. i'm checking it out now15:22
fungimaybe it's having issues15:22
fungiyeah, jenkins04 currently has a handful of offline devstack slaves from rax-dfw assigned to it, and nothing else15:23
*** sarob has quit IRC15:23
anteayajeblair: ah a good point, I did not know that15:23
fungiso whatever nodepool thinks is going on, it's wrong15:23
jeblairfungi: seeing if the jenkins provider for 04 is stuck...15:25
jeblair  File "/usr/lib/python2.7/ssl.py", line 160, in read15:25
jeblair    return self._sslobj.read(len)15:25
jeblairhey same place novaclient was stuck yesterday15:25
fungiis that via requests, or direct socket?15:26
*** ihrachys has quit IRC15:26
jeblair2014-02-19 08:33:40,851 DEBUG nodepool.JenkinsManager: Manager jenkins04 running task <nodepool.jenkins_manager.CreateNodeTask object at 0x7f3fddc7be10>15:26
*** sandywalsh_ has quit IRC15:26
jeblairstuck since then15:26
fungiyep, that's when all the trouble began15:26
jeblairfungi: it's via   File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen15:27
fungiahh, okay15:27
jeblairfungi: so i think we're going to need a restart15:27
*** dmsimard has joined #openstack-infra15:27
fungiokay, i'll take care of it if you're done debugging its present state15:28
*** atiwari has joined #openstack-infra15:28
jeblairi am; go for it15:28
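(Not the fix that gets landed, just context for why the provider manager thread hangs: without a timeout the urllib2/ssl stack blocks forever on a dead connection. A global socket default, value arbitrary here, would turn the hang into an exception instead.)
    import socket

    # With no timeout, self._sslobj.read() can block indefinitely after a
    # provider outage; a default timeout makes the call raise socket.timeout.
    socket.setdefaulttimeout(60)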
dmsimardHi guys, I think a merge glitched but I wanted to ask you to make sure not to re-try the merge if there's problems right now.. https://review.openstack.org/#/c/74082/15:28
Shrewsjeblair, fungi: not really following the issue too closely, but when I've seen network errors like that (getting stuck on reads after cloud outages), having keepalive enabled on the sockets usually helps prevent the "stuck"15:28
*** zhiyan is now known as zhiyan_15:29
*** bknudson has joined #openstack-infra15:29
*** dkliban has joined #openstack-infra15:29
jeblairShrews: good idea; hopefully we can get that, or something, passed through all the layers we need15:29
jeblair(novaclient, requests, urllib[123456789], ssl, socket)15:30
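(A sketch of Shrews' keepalive suggestion at the lowest of those layers; the idle/interval/count values are arbitrary examples and the TCP_KEEP* constants are Linux-specific.)
    import socket

    def enable_keepalive(sock, idle=60, interval=10, count=6):
        # Probe an idle connection so one that died during a cloud outage
        # gets torn down instead of leaving a read() stuck forever.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)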
openstackgerritRuslan Kamaldinov proposed a change to openstack-infra/storyboard: Update developer documentation  https://review.openstack.org/7471315:30
*** ihrachys has joined #openstack-infra15:30
jeblairdmsimard: thanks for pointing that out; that looks like a new kind of failure we have only seen a couple of times.  it suggests something wrong with the git repos that are cached on the slave images15:32
jeblairdmsimard: unfortunately that slave is gone now; but you should be able to just try again15:32
annegentlewhere's the one true Jenkins to find out if something built? www-01.jenkins.openstack.org?15:32
*** coolsvap has joined #openstack-infra15:33
dmsimardjeblair: I checked the recheck bugs but haven't found anything that seemed to be like that. Should I recheck with a specific bug # ?15:33
fungiblasting out the ready nodes from prior to the restart, so that we get back some momentum. then i'll do the building and delete lists i saved from it15:33
fungiannegentle: what specifically are you looking for?15:33
annegentleJenkins where are you?15:33
jeblairdmsimard: i don't think there is one; could you please file one on openstack-ci, link to that job, and paste the bug here?15:33
annegentlefungi: why the source for this training manuals page http://git.openstack.org/cgit/openstack/openstack-manuals/tree/doc/training-guides/lab001-control-node.xml isn't being published to http://docs.openstack.org/training-guides/content/lab001-control-node.xml.html15:34
dmsimardjeblair: Will do and recheck against that bug. Thanks.15:34
*** mgagne has joined #openstack-infra15:34
annegentlefungi: specifically apt-get dist-update is showing on published, apt-get dist-upgrade is what's in the source15:34
fungiannegentle: logs.openstack.org is going to be your best bet, but you need to know how to build the url. i'll get you an example for that one specifically15:34
annegentlefungi: nice, a worked example15:34
*** bhuvan has quit IRC15:35
fungiannegentle: though before i jump into that, keep in mind that we've had a bit of a setback this morning and jobs are just now starting to catch up... if it's for a merged change listed in the post pipeline at http://status.openstack.org/zuul/ then it possibly hasn't been finished yet15:36
fungii see about a dozen changes for openstack-manuals which haven't finished in post yet15:37
fungistill awaiting worker assignments15:37
Ajaegerannegentle: that's strange, published last on the 1st of February...15:37
fungiyeah, so sounds like it's been broken for longer. i'll pick a commit which merged a few days ago to be assured i can get you a good example15:38
fungirather than one which might still be pending completion15:38
*** david-lyle has joined #openstack-infra15:39
*** esker has joined #openstack-infra15:39
AjaegerFungi, annegentle: Go to http://docs.openstack.org/training-guides/content/ and compare the list of chapters with  http://docs.openstack.org/training-guides/content/lab001-control-node.xml.html15:40
*** jgrimm has joined #openstack-infra15:40
openstackgerritPetr Blaho proposed a change to openstack-infra/config: Adds gate-tuskar-docs job to zuul.  https://review.openstack.org/7475615:40
openstackgerritPetr Blaho proposed a change to openstack-infra/config: Adds gate-python-tuskarclient-docs job to zuul.  https://review.openstack.org/7475715:40
mordredannegentle: there is no one-true-jenkins. we have 8 masters in a pool behind gearman15:40
Ajaegerannegentle: this  one http://docs.openstack.org/training-guides/content/bk001-associate-training-guide.html shows as last section "Architect Training Guide"15:40
*** dpyzhov has quit IRC15:41
AjaegerBut the other one contains "Introduction to OpenStack" and further chapters afterwards15:41
Ajaegerannegentle: so, this looks like a problem in the openstack-manuals side, nothing fungi can help with.15:41
*** sandywalsh_ has joined #openstack-infra15:42
*** guitarzan has joined #openstack-infra15:42
*** afazekas has quit IRC15:42
fungiannegentle: so, as a working example, if you wanted to see the post jobs for http://git.openstack.org/cgit/openstack/openstack-manuals/commit/?id=254befa4824ef2b3f34be2e54eddcfabf082a6d315:42
fungiannegentle: the log url for that is http://logs.openstack.org/25/254befa4824ef2b3f34be2e54eddcfabf082a6d3/15:42
*** esker has quit IRC15:43
annegentlefungi: Ajaeger: helpful! So, figure out what patch would have fixed it, work from there15:43
fungispecifically the training-guide build/publication for 254befa post-merge is http://logs.openstack.org/25/254befa4824ef2b3f34be2e54eddcfabf082a6d3/post/openstack-training-guides/6692343/console.html15:43
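(The URL pattern fungi is walking through, as a tiny helper: post-merge logs live under the first two characters of the merge commit sha, then the full sha, then the pipeline and job name. The build number at the end is not derivable from the sha; you find it by browsing the job directory.)
    def post_log_url(sha, job=None):
        base = 'http://logs.openstack.org/%s/%s/post/' % (sha[:2], sha)
        return base if job is None else base + job + '/'

    print(post_log_url('254befa4824ef2b3f34be2e54eddcfabf082a6d3',
                       'openstack-training-guides'))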
Ajaegerannegentle: see this change: https://review.openstack.org/#/c/70499/15:43
AjaegerThe chapters you are missing are not published anymore...15:44
AjaegerWe really need to remove old files from the server!15:44
annegentleAjaeger: oh yes we do without me doing it manually! Arghhh15:44
*** wenlock_ has joined #openstack-infra15:44
*** markwash has joined #openstack-infra15:44
Ajaegerannegentle: did you talk with clarkb or others to remove regularly old files?15:45
fungiAjaeger: annegentle: unless all your jobs can be fixed to publish into completely separate subdirectories so that they can delete and recreate that entire tree on publication, there's not much which can really be done about having old files15:45
annegentlewe have in the past, never got a good solution (yep what fungi says)15:45
Ajaegerfungi, we publish in complete separate subdirectories.15:46
fungiAjaeger: at least previously it was a "too many cooks in the kitchen" problem (multiple jobs writing to common locations)15:46
AjaegerBut how do you want to do it - upload to subdirectory.new, then mv subdirectory to subdirectory.old etc.15:46
Ajaegerfungi: That problem shouldn't be there anymore at all.15:47
*** esker has joined #openstack-infra15:47
jgriffithfungi: jeblair should we hold off on +2/A patches for now?15:47
fungiif they no longer do, i think there's an option to the ftp publisher to completely remove the target directory when it runs... though you also get a brief outage for that content on every update i think15:47
jgriffithfungi: jeblair or does it not matter15:47
AjaegerBut I might be wrong and be overlooking something ;)15:47
Ajaegerand that brief outage is the problem.15:47
*** esker has quit IRC15:47
anteayathere is a neutron patch failing in the gate15:47
AjaegerUploading takes a minute or more - that's too long IMO.15:47
*** esker has joined #openstack-infra15:47
fungijgriffith: approving stuff won't hurt. we're backlogged, but the systems can queue things up just fine15:47
anteayait appears it will be removed without a gate reset15:47
anteayaor I can remove it15:48
*** markwash has quit IRC15:48
jgriffithfungi: ok, thanks15:48
annegentleyeah the brief outage is a stopper15:48
dmsimardjeblair: https://bugs.launchpad.net/openstack-ci/+bug/128213615:48
uvirtbotLaunchpad bug 1282136 in openstack-ci "Git problem: "Failed to resolve 'HEAD' as a valid ref."" [Undecided,New]15:48
jgriffithjust wanted to make sure I don't compound issues15:48
annegentlefungi: jeblair: mordred: if the docs.openstack.org site goes to Jekyll or some such do you imagine this particular problem would go away15:48
fungiannegentle: Ajaeger: getting a location to publish things without being limited to ftp access (so that we could rsync) would make that easier15:49
jeblairAjaeger, annegentle: after i3 i'd like to switch to scp copying15:49
*** rcleere has joined #openstack-infra15:49
*** jaypipes has quit IRC15:49
annegentlefungi: jeblair: mordred: Todd Morey has ideas for that, with the overall vision being Docbook source > built to html > built with jekyll15:49
Ajaegerfungi, regarding approving - could you put approve https://review.openstack.org/73690 - and remember to delete openstack-api-ref since we go from maven to freestyle.15:49
jeblairor rsync if we can swing it15:49
mordredannegentle: uhm. - I have no idea what a jekyll is15:49
*** fbo is now known as fbo_away15:50
Ajaegerjeblair: can we rsync over ssh?15:50
fungia release name candidate which unfairly lost the poll15:50
annegentlemordred: http://jekyllrb.com/docs/usage/15:50
*** gpocentek has joined #openstack-infra15:50
mordredannegentle: ok. so it's just like doing it with the maven build from what I can see15:50
jeblairannegentle: i don't think the rendering systems affects the underlying problem.15:50
annegentlemordred: yeah15:50
mordredyeah. what jeblair said15:50
annegentlejeblair: that's true15:50
fungiAjaeger: we can rsync over ssh but need a functional shell for that (just scp/sftp access won't help) and the destination needs rsync installed15:50
Ajaegerfungi: Yeah, indeed.15:51
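(A loose sketch of the rsync-over-ssh publication being discussed; --delete is what would make stale chapters disappear from the server. Host, user and paths here are invented examples, not the real publishing targets.)
    import subprocess

    subprocess.check_call([
        'rsync', '-avz', '--delete', '-e', 'ssh',
        'publish-docs/training-guides/',                     # local build output
        'docs@docs.example.org:/srv/docs/training-guides/',  # hypothetical target
    ])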
mordredfrom our side, we don't really care if you use maven or jekyll - other than wanting to make sure that jekyll is installable and not just hipster crap15:51
annegentlefungi: Ajaeger: moving to jekyll would be a good reason to get off Cloud Sites which gives us shell access15:51
jeblairmordred: ++15:51
AjaegerWhat is Jekyll`15:51
*** sarob has joined #openstack-infra15:51
mordredannegentle: if jekyll incentivizes moving off of cloud sites, I'm all for it15:51
annegentleAjaeger:  http://jekyllrb.com/docs/usage/15:51
jeblairannegentle: we can move off of cloud sites and switch to scp or rsync independent of when/if you switch to jekyll15:51
mordredbut also, what jeblair said15:51
annegentlemordred: jeblair: get Todd Morey to get the design done :)15:51
anteayaAjaeger: it is a templating language made up by tom of github15:51
fungirb is an abbreviation for hipster crap in japanese, right?15:52
anteayaAjaeger: used widely in the ruby community15:52
mordredfungi: ++15:52
fungi;)15:52
annegentlejeblair: not to me, the two are directly related because I don't get a marked-enough improvement15:52
jeblairannegentle: no, i'm saying we don't need to wait for that.  we have other reasons we need to change how the docs are published15:52
annegentlejeblair: in other words, I'm not willing to risk all the changes without a killer redesign15:52
annegentlejeblair: Don't wanna. :)15:52
jeblairannegentle: i'm sorry, we need to move and it can't wait for todd.  we need to make the publishing pipeline better.  :)15:53
annegentleAjaeger: jeblair: we can of course revisit at the summit and get a game plan for moving off of Cloud Sites, but right now there's not enough incentive15:53
AjaegerThanks for the explanations about Jekyll - let's see how this integrates with our XML publishing15:53
anteayaAjaeger: I'm betting it won't15:53
anteayaruby doesn't integrate15:53
annegentleAjaeger: to me it lets us stop publishing "webhelp" and publish plain html (or xhtml)15:53
anteayathat is a point of pride for ruby15:53
Ajaegerannegentle: We still have the option of running remotely a job  that deletes old files.15:53
annegentleAjaeger: yeah15:53
*** sandywalsh_ has quit IRC15:53
mordredI think we've confused about four different conversations here15:53
annegentlejeblair: I'm fine with not waiting on todd but need more incentive15:53
Ajaegerannegentle: something to discuss in Atlanta I guess15:54
annegentlemordred: that's four more fun!15:54
mordredthe incentive is that docs publication is a special pony right now15:54
* anteaya would like to focus on the fire fighting in the gate15:54
mordredand cloud sites are a bit of a pita to deal with15:54
annegentlemordred: not enough incentive15:54
*** tjones has joined #openstack-infra15:54
annegentlemordred: not with a month and a week before an rc15:54
annegentlemordred: mostly it's timin15:54
annegentletiming15:54
jeblairannegentle: for starters, we have the problem we're talking about now where you have to delete things; but moreover, we need to stop using jenkins, and the ftp publishing is not really compatible with what we're moving to15:54
jeblairannegentle: we're not doing it now! :)15:54
mordredwhat jeblair said15:55
annegentlejeblair: yes then timing is all I'm concerned with.15:55
mordredgod no. not this instant15:55
Ajaegermordred: documentation will always be special ;) But yeah, let's make it less special :)15:55
annegentlejeblair: when does jenkins go away15:55
mordredok. that makes more sense15:55
*** dcramer_ has joined #openstack-infra15:55
mordredannegentle: as soon as we can make it go away, which means we need to get rid of a few things, like ftp publishing15:55
annegentlea redesign is HIGH priority too because of translation, versioning, old files, all that15:55
mordredbut - when I say "as soon" - I mean without affecting things like FF15:56
*** vrovachev has joined #openstack-infra15:56
Ajaegerjeblair: so, one part in moving of Jenkins - and fixing a bug with image api publishing - is getting  https://review.openstack.org/73690 in ;)15:56
annegentlenot trying to conflate redesign with building, but to me they're tightly tied due to what all a redesign can also fix15:56
jeblairannegentle: don't worry, the process will be working and tested and in use and in production before we move the docs15:56
annegentlejeblair: you know I trust you guys, just trying to make sure you know the importance of a redesign (since you work at the Foundation I tell you these things too)15:56
jeblairannegentle: we should keep these things in mind so that the two projects don't make incompatible decisions, but for the most part, they really are separate and we shouldn't tie one to the other -- it could just slow both down15:57
jeblairannegentle: i wish i could make todd go faster.  :)15:57
*** amotoki_ has joined #openstack-infra15:58
jeblairannegentle: but he's constantly getting sucked into side projects, and so the larger project of "improve how the website (all of it) is published" seems to move slowly :(15:58
jeblairannegentle: believe me, i'm as interested in todd completing this kind of work as you are.15:58
annegentlejeblair: don't we all :) Yes, it's a tough rock/hard place position15:58
*** banix has joined #openstack-infra15:59
annegentlejeblair: and I'm happy to be convinced of a 2-phase approach, pulling too many levers at once is probably folly15:59
annegentlephase 1: un-jenkins phase 2: remove webhelp output15:59
*** salv-orlando has joined #openstack-infra16:00
jeblairannegentle: cool; we're about 2 years into taking baby steps to remove jenkins and getting near the end.  we _try_ to not bite off more than we can chew.16:01
fungiif you can dislocate your jaw like a snake, it helps too16:02
fungino chewing required16:03
persiaDigestion takes longer that way, though16:03
jeblairfungi: the graph suggests we have used nodes!16:03
*** salv-orlando_ has joined #openstack-infra16:03
*** salv-orlando has quit IRC16:04
*** salv-orlando_ is now known as salv-orlando16:04
fungijeblair: yep. i'm still churning through the ready ones, but should start blowing through the old building/delete nodes here shortly16:04
*** dpyzhov has joined #openstack-infra16:05
*** amcrn has joined #openstack-infra16:05
fungithe gate sparkline has dropped sharply and check has at least plateaued now16:05
fungiso we've regained forward momentum16:05
anteayawill that neutron failure cause a gate reset?16:06
anteayaI can remove it if so16:06
anteayaI'm thinking it won't16:06
jeblairanteaya: zuul already did that for you16:06
anteayagreat16:07
fungiit caused a gate reset a while ago, but the benefit of nnfi is that it shields us somewhat from the pain of resets as long as there aren't multiple failing changes causing the broken ones further down to get retried repeatedly16:07
*** sandywalsh_ has joined #openstack-infra16:07
fungithe state of that change is "failed assuming everything in front of it succeeds"16:07
anteayayes16:07
openstackgerritPetr Blaho proposed a change to openstack-infra/config: Adds gate-python-tuskarclient-docs job to zuul.  https://review.openstack.org/7475716:08
openstackgerritPetr Blaho proposed a change to openstack-infra/config: Adds gate-tuskar-docs job to zuul.  https://review.openstack.org/7475616:08
anteayaand the failure was FAIL: process-returncode16:08
jeblaironly about 2k more keypairs to delete from hpcloud16:08
fungijeblair: nearly done!16:08
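(Roughly what the keypair-cleanup for-loop looks like with python-novaclient; `nova` is assumed to be an already-authenticated Client instance, and filtering on a nodepool_ prefix is a guess at how the leaked keys are named.)
    # One API call per key, which is why bulk cleanup takes so long.
    for keypair in nova.keypairs.list():
        if keypair.name.startswith('nodepool_'):   # hypothetical naming scheme
            nova.keypairs.delete(keypair)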
*** roz has joined #openstack-infra16:08
anteayawhich we have seen before and which clarkb has said in an email is due to sys.exit being used in the tests16:08
anteayait is the line between what zuul can do for me and what I need to do manually that I am trying to get better at figuring out16:09
anteayanow the one ahead of it is failing16:09
anteayaso two of them16:09
fungiyep, but it depends on that one, so it won't get retried unless something ahead of those also fails16:10
fungibut that has caused the two cinder changes in the gate to get tests restarted without the neutron changes in line16:10
fungii think i have time today to try and get the py3k-precise nodepool nodes into operation16:11
fungiif things don't get bad again16:12
rozquick question: can just the owner of a change mark it as WIP? I am working on a change where I am not the original author and I'd like to submit a patch as WIP but I can't do it. Options: submit it as DRAFT, or submit the patch and put a note in a comment "This is a WIP"? any other suggestions?16:12
*** salv-orlando has quit IRC16:12
anteayaroz: don't use draft16:12
fungiroz: there's an acl which controls that. for most openstack projects the core reviewer group on that project also has wip control16:12
*** yassine has quit IRC16:13
*** yassine has joined #openstack-infra16:13
fungiroz: but i think the heart of the issue here is that gerrit leaves the change owner as the original patchset submitter rather than the most recent patchset submitter. i'm curious to see whether that's configurable in latest gerrit releases16:14
fungizaro: ^ ? (when you're around)16:14
*** Sukhdev has joined #openstack-infra16:14
*** DinaBelova_ is now known as DinaBelova16:15
rozthanks for the replies. When you say a core reviewer can control the WIP, does that mean they can mark a change as WIP or they can add me as "WIP controller" for that specific change?16:15
anteayathey can mark the change WIP16:16
anteayathey can't change what permissions you have16:16
fungiroz: or un-wip a wip change too16:16
anteayaunless they make you core16:16
rozthanks, now it's clear.16:16
fungiroz: right, it's an acl covering wip control for an entire project--can't be assigned on a per-change basis except by modifying the owner of the change (which i don't think our current gerrit release has a feature to make that easy)16:17
*** pcrews has joined #openstack-infra16:19
fungimost of the old ready nodes are gone, and i've started some processes deleting old building nodes next16:19
fungioh, also, i was wrong about disappearing at 21:00 today for the osug monthly... i also have a tax appointment prior to that, so will actually be mostly offline starting at 19:00 utc16:20
anteayahappy tax appointment16:20
fungiso a little over a couple hours from now16:20
anteayaI hope you exit smiling16:20
*** rossella has joined #openstack-infra16:21
anteayawill clarkb be back today?16:21
*** mrmartin has quit IRC16:21
fungianteaya: i believe he was back in seattle last night16:21
*** tjones has quit IRC16:21
anteayayes16:21
*** rossella has quit IRC16:22
anteayaI'm also gone for the day soon16:22
anteayaanother appointment to fix my back/neck/head16:22
anteayahopefully this should wrap it up16:22
*** rossella-s has quit IRC16:23
vrovachevhi guys, please, review me: https://review.openstack.org/#/c/74342/16:24
*** afazekas has joined #openstack-infra16:24
*** david_lyle_ has joined #openstack-infra16:24
anteayadoes the post job upstream-translation-update require a specific kind of node? lots of post jobs waiting for that job to get a node16:25
*** sarob has quit IRC16:25
anteayathe only specific node I am aware of is centos for python26 jobs16:26
fungianteaya: yes, there is a trusted static node named "proposal" assigned to jenkins.o.o which runs those, in order, one at a time16:26
fungiso they have a tendency to queue up16:26
anteayaah16:26
fungioh! and it got marked offline16:26
anteayaI don't see any of them running16:26
anteayaokay16:27
fungilooks like i need to patch teh regex for the jobs which run on it. checking to see what it ran last16:27
anteayak16:27
*** rossella-s has joined #openstack-infra16:27
*** jcoufal has quit IRC16:28
*** david-lyle has quit IRC16:28
*** yassine has quit IRC16:28
*** yassine has joined #openstack-infra16:29
*** chuck__ has joined #openstack-infra16:29
fungiseems i missed setting propose-requirements-updates to add the reusable_node parameter function. i've re-onlined the slave so it should burn through those fairly quickly unless it hits another propose-requirements-update job before we merge the fix16:29
fungier, propose-requirements-updates16:30
anteayaokay16:30
anteayaI'll watch it16:30
anteayaone down16:31
*** Ajaeger has quit IRC16:31
*** esker has quit IRC16:32
*** esker has joined #openstack-infra16:32
anteayafungi: how long does it need between jobs?16:35
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Check server status in batch  https://review.openstack.org/7477316:35
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Don't offline after propose-requirements-updates  https://review.openstack.org/7477416:35
jeblairfungi, clarkb, mordred: https://review.openstack.org/74773 is another fairly small nodepool change that should make a huge difference16:36
anteayafungi: it is not currently running16:36
fungianteaya: looks like it takes about 1-1.5 minutes per job... https://jenkins.openstack.org/computer/proposal.slave.openstack.org/16:36
anteayato finish the job16:36
anteayabut to move from one to the other?16:36
fungianteaya: a few seconds16:36
*** sandywalsh_ has quit IRC16:36
anteayaokay, can you check it again16:36
anteayait isn't running, been at least 20 seconds16:37
*** esker has quit IRC16:37
fungianteaya: it's running16:37
fungii just watched it complete a glance translation update and start a ceilometer one16:37
*** tjones has joined #openstack-infra16:38
anteayaI don't even see a glance patch in post16:38
anteayabut at least it is running16:38
*** gokrokve has quit IRC16:38
anteayathanks for the link16:38
fungianteaya: it might not have been in the post queue16:38
anteayaoh16:38
anteayaI had just been watching the post queue16:38
*** smarcet has quit IRC16:38
fungiit was in the periodic pipeline... https://jenkins.openstack.org/job/glance-propose-translation-update/325/parameters/16:38
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Check server status in batch  https://review.openstack.org/7477316:39
*** gokrokve has joined #openstack-infra16:39
dmsimardjeblair: Got another merge fail on that same git bug again, is there another affected slave ? https://review.openstack.org/#/c/74082/16:39
anteayaah, okay thanks16:40
SergeyLukjanovjeblair, fwiw #2 lgtm (https://review.openstack.org/74773)16:40
fungidmsimard: looks like it might be a broken repository on one of the git server farm. i'll double-check that16:40
dmsimardfungi: Thanks, I submitted https://bugs.launchpad.net/openstack-ci/+bug/1282136 FYI16:40
uvirtbotLaunchpad bug 1282136 in openstack-ci "Git problem: "Failed to resolve 'HEAD' as a valid ref."" [Undecided,New]16:40
*** ravikumar_hp has joined #openstack-infra16:41
ravikumar_hpquick question - What is Jenkins URL that runs nightly jobs16:42
anteayaravikumar_hp: we have 7 jenkins16:42
anteayathey all run jobs16:42
SergeyLukjanovanteaya, 816:42
SergeyLukjanov:)16:43
*** gyee has joined #openstack-infra16:43
fungiravikumar_hp: http://logs.openstack.org/periodic/16:43
ravikumar_hpok. Jenkins that runs Tempest nightly jobs16:43
anteayayes 816:43
*** gokrokve has quit IRC16:43
fungianteaya: SergeyLukjanov: technically 9 if you also count jenkins-dev16:43
anteayayes16:43
SergeyLukjanovyeah ^)16:43
fungithough for tempest periodic jobs, only 7 of them run those16:44
anteayaI was going to move to figuring out what ravikumar_at_mothership wanted16:44
*** andreaf has joined #openstack-infra16:44
anteayabut you beat me to it16:44
ravikumar_hpanteaya: i am trying to find out if there is Tempest job that runs everyday other than gated test16:45
jeblairdmsimard: thanks!  i was quiet because i was quickly sshing into that slave, which hadn't been deleted16:45
jeblairdmsimard: it does indeed look like the cached git repo for puppet-swift on that node was bad; i saved a copy of it16:46
dmsimardjeblair: We ran into the same issue for puppet-neutron, I did a recheck and it worked - I linked it to the bug16:46
fungijeblair: dmsimard: yes, my eye jumped to the remote update, but the git farm's copies of that repo seem fine16:46
fungiand 'git clone file:///opt/git/stackforge/puppet-swift' was the local source of the issue16:47
fungiwonder if it's bad on the image in that provider region16:47
jeblairfungi: yeah, i'm going to check that next16:47
*** beagles has quit IRC16:47
*** dkliban is now known as dkliban_afk16:47
*** virmitio has joined #openstack-infra16:48
anteayaravikumar_hp: I'm looking here: http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/zuul/layout.yaml16:48
*** tjones has quit IRC16:48
anteayaI don't have the answer yet, but you are welcome to join me16:48
jeblairdmsimard, fungi: both failures were puppet-swift in az2;16:48
jeblairdmsimard: did you say you saw a puppet-neutron as well?16:48
fungijeblair: sounds like a strong correlation16:48
fungisample size 2 ;)16:49
*** sandywalsh_ has joined #openstack-infra16:49
*** oubiwann has quit IRC16:49
dmsimardjeblair: Yeah, I linked it, it's on http://status.openstack.org/rechecks/ - review is https://review.openstack.org/#/c/74709/16:49
jeblairalso az216:49
ravikumar_hpanteaya: ok. Thanks16:49
anteayaravikumar_hp: and in the periodic logs that fungi linked you to: http://logs.openstack.org/periodic/16:50
anteayayou can see all the periodic tempest job logs16:50
ravikumar_hpanteaya: ok16:50
*** jaypipes has joined #openstack-infra16:50
fungijeblair: also, last successful build of that image was 181.65 hours ago16:50
anteayaravikumar_hp: did you have more to your question or does that give you the information you need?16:50
fungihpcloud-az2 really does not like to build images16:51
jeblairno it doesn't16:51
anteayaI need to change tasks and don't want to leave you hanging16:51
*** tjones has joined #openstack-infra16:51
*** b3nt_pin has joined #openstack-infra16:51
ravikumar_hpanteaya: i got the information. That's it .Thanks .16:51
*** sarob has joined #openstack-infra16:51
anteayaravikumar_hp: great16:51
*** b3nt_pin is now known as beagles16:51
*** beagles is now known as beagles_brb16:52
*** hemnafk is now known as hemna_16:53
*** smarcet has joined #openstack-infra16:53
*** sarob_ has joined #openstack-infra16:53
*** tjones has quit IRC16:54
*** markmcclain has joined #openstack-infra16:54
*** tjones has joined #openstack-infra16:55
jeblairfungi: the git repos with the latest timestamps are all bad.  perhaps we didn't call sync three times while spinning in a circle.16:55
*** esker has joined #openstack-infra16:55
dmsimardlol ?16:55
fungithat takes me back16:56
*** sarob has quit IRC16:56
medieval1XYZZY16:56
*** esker has quit IRC16:56
fungisync ; sleep 10 ; sync ; sleep 10; sync; sleep 10 ; shutdown -h now16:56
jeblairfungi: prepare_devstack has a sync, but not prepare_node, which is where the clones are16:56
fungiPLUGH16:56
*** sabari_ has joined #openstack-infra16:57
jeblair(and you can bet the sync is in prepare_devstack because it's needed)16:57
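(The gist of the fix being discussed: flush the freshly cloned git caches to disk before the image snapshot is taken, i.e. end the prepare script with a sync. From Python that would look like:)
    import os

    try:
        os.sync()           # Python 3.3+
    except AttributeError:
        os.system('sync')   # older interpreters fall back to the shell command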
annegentlefungi: so I know you said the merge would look funny, but https://review.openstack.org/#/c/74777/ is the result of my trying to merge openstack/operations-guide feature/edits with master16:57
annegentlefungi: seems to be blank (no changes, just a commit message)16:57
*** sabari_ is now known as sabari16:57
fungiannegentle: sometimes it's funnier. but it's never a laugh riot or anything16:57
annegentlefungi: *snort*16:57
*** e0ne has quit IRC16:58
dmsimardjeblair, fungi: You guys let me know when to try a reverify :)16:58
annegentlefungi: I followed the steps in https://wiki.openstack.org/wiki/GerritJenkinsGit#Merge_Commits with git checkout -b oreilly/71943 remotes/origin/feature/edits as my first step16:58
*** ociuhandu has quit IRC16:58
fungiannegentle: anyway, if your ops guide jobs include draft building, you should be able to preview the result there from the check run before approving16:58
annegentlefungi: it really is supposed to look like that?16:59
* annegentle is freaked out :)16:59
fungiannegentle: usually, yes16:59
annegentlefungi: no way. Okay!16:59
annegentleI'll wait for it to build then! Nice16:59
*** jcooley_ has joined #openstack-infra16:59
fungiannegentle: the critical part is the "parent(s)" field there... you can see it lists the commits you're merging16:59
jeblairdmsimard: at this point, i think the lack of sync is the problem in the image build.  i'll fix it but it'll take a few hours to work through the system; you can play the odds if you like, or come back to it later in the day for better odds.17:00
jeblairdmsimard: considering the state of the backlog, if you can do the latter, that would probably be best17:00
fungiannegentle: and also the "branch" field which tells you which branch you're merging them on17:00
jeblairdmsimard: i'll update the bug in a minute; thank you very much for catching this and pointing me at a live server!17:00
fungiannegentle: presumably the two parents are one from each branch you're trying to merge17:00
annegentlefungi: okay, I see parents now.17:00
*** dkliban_afk is now known as dkliban17:00
annegentlefungi: so I've got a spreadsheet with all the patches I need to go to feature/edits at https://docs.google.com/spreadsheet/ccc?key=0AhXvn1h0dcyYdGtiRXo5ODFMbkhRZkVROGdTY3RjWVE#gid=0 and I'll just go through the list from oldest to newest17:01
*** dpyzhov has quit IRC17:01
annegentlefungi: and I think that helps me figure out parentage17:01
annegentlefungi: woops, gotta get on a call, thanks for the help!17:01
dmsimardjeblair: np, I appreciate you fixing the issue - i'm the one thanking you, here :p17:01
*** jaypipes has quit IRC17:02
BobBallI've managed to break my environment around pip and pbr while playing with nodepool... I may have deleted something I shouldn't have done.  http://paste.openstack.org/show/67315/ Can anyone suggest how I can uninstall / reinstall PBR in a sensible way?17:03
*** thomasbiege has joined #openstack-infra17:03
*** derekh has quit IRC17:03
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Add sync calls to all prepare scripts  https://review.openstack.org/7478017:03
jeblairoh wait let me attach the bug to that17:04
*** jaypipes has joined #openstack-infra17:04
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Add sync calls to all prepare scripts  https://review.openstack.org/7478017:05
*** jcooley_ has quit IRC17:05
*** markmc has quit IRC17:06
jeblairBobBall: delete the virtualenv and start over?17:07
BobBallI wish I had done this in a virtual environment... :P17:07
jeblairoh, i thought /usr/workspace/scratch/openstack/citrix/nodepool/easy-install.pth was in a venv17:07
BobBallIt might have been at one point - I'm still getting used to using venvs by default, and so perhaps the issue is I might have installed it in a venv and then done something else outside the venv that broke it or similar17:08
BobBallthat file doesn't exist17:08
*** zul has quit IRC17:08
*** chuck__ has quit IRC17:08
*** zul has joined #openstack-infra17:09
BobBallthe venv environment I have works great - but I'm trying to fix my system so I don't have to be in a venv to use novaclient :P17:09
jeblairBobBall: then at this point i usually go mucking about and try to remove things manually17:09
jeblairBobBall: it's possible mordred may have better advice17:09
jeblairmordred: btw, do know the relative merits of https://review.openstack.org/#/c/74521/ vs https://review.openstack.org/#/c/74523/ ?17:10
SergeyLukjanovjeblair, all scripts are based on prepare_node, is it 'as designed' to sync twice?17:10
fungijeblair: so... there are jobs running, but not very many. looking at the jenkins masters' webuis, some have no assigned nodes at all, some have nodes but they're all marked offline, some have nodes running jobs but none have a bunch17:10
*** jcooley_ has joined #openstack-infra17:10
jeblairSergeyLukjanov: yes, so that we don't have to think about it.  :)17:11
SergeyLukjanovjeblair, can't disagree ;)17:11
funginodepoold, even after restarting and deleting everything, seems to have 574 nodes on jenkins0417:12
*** sandywalsh_ has quit IRC17:12
jeblairfungi: oh, but those were false-ready nodes17:12
jeblairfungi: and probably need to be deleted17:12
jeblairfungi: i think nodepool marks them ready _before_ adding them to jenkins17:12
fungijeblair: i deleted any nodes which were marked ready at the time of the restart17:12
jeblairoh17:13
fungiso these are all new since the restart17:13
*** sarob_ has quit IRC17:13
*** cadenzajon has joined #openstack-infra17:13
*** sarob has joined #openstack-infra17:13
*** amotoki_ has quit IRC17:14
fungihttp://paste.openstack.org/show/67316/ is the current breakdown for jenkins04 according to nodepool17:14
fungiskimming its webui, i believe the used and delete counts, but not the ready17:15
jeblairfungi: i think the jenkins04 manager is stuck again waiting for a response17:18
jeblairfungi: nodepool reports the connection ESTABLISHED but it doesn't show up on jenkins0417:18
*** sarob has quit IRC17:18
fungimaybe jenkins04 is struggling? i can put it into shutdown and see what happens when i restart nodepoold17:19
jeblairfungi: hrm, i wouldn't expect a half-closed connection as a result of that17:20
fungior maybe this is ongoing network issues in dfw17:20
jeblairfungi: seems more likely that we're losing fin packets17:20
*** mrmartin has joined #openstack-infra17:21
openstackgerritA change was merged to openstack/requirements: Sync requirements to oslo.vmware  https://review.openstack.org/7456917:22
jeblairfungi: nm, jenkins04 is moving17:23
fungithis looks decidedly non-graceful... http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=1411&rra_id=all17:23
*** chandankumar_ has joined #openstack-infra17:24
fungichecking to see whether there's anything obvious from teh javamelody side17:25
jeblairfungi: there are a lot of offline nodes on jenkins0417:25
fungihttps://jenkins04.openstack.org/monitoring?part=graph&graph=fileDescriptors17:25
fungijeblair: yeah, it seems like nodepool may be having trouble adding or deleting nodes from jenkins0417:25
*** khyati has joined #openstack-infra17:26
*** nicedice has joined #openstack-infra17:26
openstackgerritA change was merged to openstack-infra/config: Don't offline after propose-requirements-updates  https://review.openstack.org/7477417:26
fungii wonder if it's hitting an open file descriptors limit17:26
jeblairfungi: nodepool's interactions with it are _very_ slow17:26
jeblairfungi: so maybe it is struggling17:26
*** chandankumar_ has quit IRC17:27
jeblair5-10 seconds between api calls17:27
fungiand the number of open files flatlining at 4k for long periods seems suspiciously like a max17:28
*** sandywalsh_ has joined #openstack-infra17:28
jeblairfungi: jenkins04 has 413 slaves attached to it which is considerably more than our intent17:28
fungiright. just wondering what caused it to get so many new slaves assigned after the nodepool restart17:28
fungiit was similarly the one with most of the slaves before the nodepool restart (though i deleted those). could the predictive assignment in nodepool be misinterpreting that?17:30
fungithinking it wants to run that many jobs?17:30
jeblairfungi: it tries to balance across all providers that are up.  that does mean that if a provider comes up with 0 nodes, it's going to try to catch it up to the others quickly17:31
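    A toy sketch of the catch-up behavior jeblair describes, not nodepool's actual allocator; the names and numbers are invented. A target that starts at zero attached nodes absorbs nearly all new allocations until it reaches parity with the others:

        # Hypothetical illustration only: always grant the next node to the
        # emptiest target, so one that restarts at zero "catches up" fast.
        def allocate(demand, current):
            counts = dict(current)
            grants = {name: 0 for name in counts}
            for _ in range(demand):
                emptiest = min(counts, key=counts.get)
                counts[emptiest] += 1
                grants[emptiest] += 1
            return grants

        print(allocate(40, {'jenkins01': 50, 'jenkins02': 50, 'jenkins04': 0}))
        # {'jenkins01': 0, 'jenkins02': 0, 'jenkins04': 40}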
fungithat sounds like the reverse of what we see here then17:32
*** gokrokve has joined #openstack-infra17:32
fungiso maybe it's not a feedback loop problem17:32
*** gyee has quit IRC17:33
*** oubiwann has joined #openstack-infra17:33
jeblairfungi: which restart are you thinking of?  it never did a mass-allocation to 04 around 15:3017:34
fungino, i think it builds back up on 0417:34
*** beagles_brb is now known as beagles17:35
fungithe established tcp connections and open file descriptor graphs look like they might be proportional to the number of connected slaves17:35
jeblairfungi: it seems that even nodepool thinks 04 has all the slaves17:35
fungithey start to ramp up almost linearly from the time of the nodepool restart17:35
jeblairjust about17:36
*** jcoufal has joined #openstack-infra17:36
markmcclainwith the current jenkins issue is it expected to lose a release job?17:37
markmcclainI pushed this tag: http://git.openstack.org/cgit/openstack/python-neutronclient/tag/?id=2.3.417:37
markmcclainbut the release job disappeared from zuul17:37
jeblairfungi: okay, i'm going to load the db locally and debug the allocator.  in the mean time, why don't you put 04 into shutdown and see if it redistributes after that.17:40
fungimarkmcclain: looks like jenkins04 ate it... http://logs.openstack.org/59/5931316dd7cddd6834eed6bd9665bd5ef7adbffc/release/python-neutronclient-tarball/0f27e48/console.html17:40
fungijeblair: will do. that was going to be my next suggestion17:40
fungimarkmcclain: i'll retrigger it once we get this going again17:41
*** pblaho has quit IRC17:41
markmcclainfungi: thanks17:41
fungijeblair: manually deleting the ready/building nodes assigned to jenkins04 now that it's in shutdown17:42
*** luqas has quit IRC17:43
jeblairfungi: so a lot of those actually still have active threads trying to add them17:43
fungii can hold off if you like17:43
jeblairfungi: it's probably okay.  i think it will cause a lot of errors in the daemon, but it should be ok.17:44
jeblaircarry on17:44
fungiproceeding in that case17:44
*** esker has joined #openstack-infra17:44
*** wenlock has quit IRC17:45
openstackgerritDevananda van der Veen proposed a change to openstack-infra/config: Let infra manage pyghmi releases  https://review.openstack.org/7449917:45
*** sandywalsh_ has quit IRC17:46
*** sarob has joined #openstack-infra17:46
clarkbmorning17:47
*** basha has joined #openstack-infra17:48
*** hashar has quit IRC17:48
*** packet has joined #openstack-infra17:48
*** Ryan_Lane has quit IRC17:49
*** max_lobur is now known as max_lobur_afk17:50
fungiclarkb: welcome to the continuation of "what can possibly break next?"17:51
clarkbjenkins04 is in trouble?17:51
jeblairclarkb: nodepool is being mean to it17:52
fungias software goes, nodepool really can be a bit of a bully17:52
*** markwash has joined #openstack-infra17:58
anteayamorning clarkb17:58
*** mrmartin has quit IRC17:59
*** sandywalsh_ has joined #openstack-infra17:59
*** dangers_away is now known as dangers18:00
*** rossella-s has quit IRC18:00
openstackgerritHenry Gessau proposed a change to openstack-infra/config: Incompatible chrome extension has been fixed  https://review.openstack.org/7479618:00
*** jpich has quit IRC18:02
*** thomasbiege has quit IRC18:05
*** hogepodge has joined #openstack-infra18:05
clarkbfungi: to answer your question, elasticsearch. The cluster fell over around 0842UTC today18:07
clarkbI have restarted elasticsearch6 which was the only node back in the cluster at this point and ES is recovering shards to go back to all green18:08
fungiclarkb: that was not a good time for, well, anything running in dfw i suspect18:08
*** david_lyle_ is now known as david_lyle18:08
clarkboh did dfw have a bad time?18:08
clarkbI am still trying to catch up on everything, but ES is on its way to being happy again so I can move onto the next thing18:09
fungiahh, you probably haven't had time to read scrollback18:09
clarkbnope18:09
anteayadfw had a bad time yesterday18:09
fungiyes, rax-dfw network outage18:09
anteayawhich you came in towards the end of18:09
*** chris_johnson has joined #openstack-infra18:09
fungitoday utc though18:09
anteayathen we went to bed, except Sergey - credit to him for not doing anything drastic18:09
anteayaand then dfw had problems again today18:10
anteayacleanup is underway18:10
fungiapparently rax-dfw problem was just after 08:30 utc18:10
clarkboh that explains why my weechat derped18:10
anteayaand debugging to see what we can do since dfw might have problems some more18:10
clarkbfungi: that lines up perfectly with ES cluster issues, I won't dig into them too deeply then. I may increase the wait for master timeout though18:11
*** ildikov_ has quit IRC18:11
fungiclarkb: yeah, i don't know what the exact duration was, but we can guess from gaps in cacti graphs18:11
*** chandan_kumar has quit IRC18:12
anteayaoh and fungi is afk for a good portion of the afternoon18:13
*** basha has quit IRC18:13
anteayaand so am I18:13
*** Sukhdev has quit IRC18:14
fungiyeah, i need to vaporize in about 45 minutes18:15
lifelessfungi: speaking of said patches, i haven't looked at reviews yet; are either of them acceptable?18:16
fungilifeless: i basically haven't reviewed anything in the past 24 hours which was > 1 line long unless it was addressing an in-progress firefight18:17
lifelessack18:17
*** esker has quit IRC18:18
openstackgerritA change was merged to openstack-infra/config: Add sync calls to all prepare scripts  https://review.openstack.org/7478018:18
fungiokay, retriggered markmcclain's tarball job, only to discover that the authentication error it failed on the first time doesn't seem to be related to jenkins04 issues after all... got the same on jenkins05 now: http://logs.openstack.org/59/5931316dd7cddd6834eed6bd9665bd5ef7adbffc/release/python-neutronclient-tarball/0f27e48,1/console.html18:19
*** jcooley_ has quit IRC18:20
fungichecking logs on static.o.o18:20
*** nati_ueno has joined #openstack-infra18:21
*** johnthetubaguy has quit IRC18:21
clarkbis it trying to use the credential store for scp now as well?18:22
clarkbmight explain oddness in scp'ing to tarballs if the credentials stuff changed there18:22
jeblairlifeless: you and derekh both seemed to propose a patch that does similar things; is that correct?  should we choose one or the other?18:22
funginice! "Feb 19 18:09:10 static sshd[32104]: Invalid user hudson from 162.242.149.179"18:22
fungiapparently the jenkins upgrade/downgrade has mucked with credentials18:23
jeblairlifeless: i haven't reviewed yet, but knowing what to do with those two might help18:23
lifelessjeblair: derekh and I independently approached the problem, now you get to choose18:23
lifelessjeblair: I will review his; I think on his description that both approaches are valid18:24
jeblairlifeless: ok, thanks.  that will help.18:24
lifelessjeblair: we probably can do both at the same time in fact18:24
lifelessjeblair: though I don't know if that would be needed18:24
jeblairbelt and braces and a rope and some duct tape too?  :)18:24
lifelessyes18:25
lifelesssuperglue as well18:25
ArxCruzjeblair: regarding https://review.openstack.org/#/c/69715/ which paramiko version are you guys using? because I've tested in fedora19 and it fails because there's no get_tty argument on sshclient18:25
fungiyep, so it definitely has the username as "hudson" in the scp publisher for tarballs.o.o on at least two of the masters so far, probably more18:25
fungii'll correct them18:26
*** jroovers has quit IRC18:26
*** chris_johnson is now known as wchrisj|away18:26
jeblairfungi: that's very weird.18:26
*** jroovers has joined #openstack-infra18:26
clarkbfungi: jeblair: wouldn't be surprised if older jenkins read files different18:26
fungijeblair: i'm making sure nothing else about that publisher got reverted. i think hudson was the name it used back before we folded it onto static.o.o18:27
openstackgerritA change was merged to openstack-infra/zuul: Log components starts in Zuul.Server  https://review.openstack.org/6693918:28
*** wchrisj|away is now known as chris_johnson18:30
lifelessArxCruz: +1'd - I am not core in -infra in general, only in pbr18:30
ArxCruzlifeless: ;) thanks18:31
*** chris_johnson has quit IRC18:32
*** wchrisj has joined #openstack-infra18:32
fungiit also changed the target directory from /srv/static to /srv (for some reason it didn't alter any of that publisher on jenkins.o.o)18:32
*** krtaylor has quit IRC18:34
*** krtaylor has joined #openstack-infra18:36
*** mriedem has quit IRC18:37
fungiyeah, it seems to have only happened on 04-07 so maybe something to do with the way we copied in the configs for those when we built them?18:37
*** jgallard has quit IRC18:37
clarkbpossible since they were created all at once iirc18:37
fungier, 03-0718:37
*** jgrimm has quit IRC18:38
*** jgrimm has joined #openstack-infra18:38
clarkboh 03 was before 04-07 so maybe?18:38
fungii thought we created 01+02 at one time, 03+04 together and then 05-07 together18:40
clarkbcould be, my memory is fuzzy18:41
clarkbthat was around LCA when a bunch of stuff was happening18:41
*** mriedem has joined #openstack-infra18:41
*** jp_at_hp has quit IRC18:42
clarkbI thought mordred spun up 4 new masters18:42
fungiproposal.slave got offlined by another reqs update job before the layout.yaml change made it onto zuul, so i've brought it back online again18:42
anteayawe had 3 before lca, and 2 more during lca18:42
fungioh, i guess puppet agent is still disabled on zuul anyway?18:42
*** morganfainberg_Z is now known as morganfainberg18:42
anteayathen mordred brought up 3 more after that18:42
*** esker has joined #openstack-infra18:42
clarkbfungi: must be since I thought your change merged18:43
anteayawe had jenkins, 01 and 02 before18:43
anteayaand 03 and 04 during lca18:43
*** dizquierdo has quit IRC18:43
*** esker has quit IRC18:44
anteayaI remember since my graphic was current on the monday and stale on the tuesday18:44
*** e0ne has joined #openstack-infra18:44
*** esker has joined #openstack-infra18:45
*** dcramer_ has quit IRC18:46
fungimarkmcclain: https://pypi.python.org/pypi/python-neutronclient/2.3.418:46
markmcclainfungi: awesome.. thanks18:47
anteayalifeless: ^18:47
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Fix typo in allocation  https://review.openstack.org/7480318:47
jeblairfungi: ^18:47
jeblairfungi, clarkb: we're going to want to restart with that soon.  understanding that bug leads me to believe that the distribution is currently piling up on a different jenkins18:48
fungijeblair: gah!18:48
fungiand yes, i think so18:48
fungii however won't be around for that bit of fun, i suspect18:48
jeblairfungi, clarkb: the behavior change that triggered that is the addition of the py3k nodes, which happened to be the last ones in the loop.  since there are few of them, the distribution is rather skewed.18:48
clarkbreviewing now18:49
*** jroovers has quit IRC18:49
openstackgerritK Jonathan Harker proposed a change to openstack-infra/config: Parameterize the status page urls  https://review.openstack.org/7455718:50
fungifor some reason i seem to be unable to bring proposal.slave back online in the jenkins.o.o webui this time... after i click the button it just sits18:50
clarkbjeblair: that is a fun typo18:51
funginow adding to the fun, i can't even get the login link on jenkins.o.o to work after logging out and trying to log back in18:52
fungidoesn't *seem* to be the dns issue review.o.o was having along those lines yesterday though18:52
jeblairfungi: ok if i restart nodepool?  (i manually installed that)18:52
clarkbelasticsearch is doing a slow recovery :( this is going to be like last week for ES I think18:52
fungijeblair: sure18:53
clarkbfungi: I am giving jenkins.o.o and proposal a shot18:53
clarkbbut logging in seems to be unresponsive for me too18:53
clarkbnothing in the jenkins log about it htough18:53
jeblairrestarting and running deletes for nodes in building/delete state18:54
openstackgerritJustin Lund proposed a change to openstack/requirements: Update neutron-client minimum to 2.3.4  https://review.openstack.org/7480518:55
*** malini has left #openstack-infra18:55
openstackgerritJustin Lund proposed a change to openstack/requirements: Update neutron-client minimum to 2.3.4  https://review.openstack.org/7480518:56
*** melwitt has joined #openstack-infra18:56
*** oubiwann has quit IRC18:56
jeblairall the keypairs are deleted18:58
anteayayay18:58
anteayadid it take 13 hours?18:59
jeblairanteaya: about18:59
anteayawell now we have that datapoint18:59
clarkbfungi: apache is throwing proxy timeout errors when trying to log in18:59
*** dcramer_ has joined #openstack-infra18:59
fungiokay, i'm headed out. i'll get online from when/where i can over the next ~6 hours, and then get some more stuff done later when i'm home again18:59
jeblairfungi: have fun19:00
fungijeblair: thanks19:00
* anteaya leaves too19:00
clarkbtrying to read from jenkins's securityRealm/commenceLogin which I assume does the openid dance19:00
*** e0ne has quit IRC19:00
*** tjones has quit IRC19:04
*** dkehn_ has joined #openstack-infra19:05
clarkbjeblair: any ideas on where else to look for jenkins.o.o login issues? jenkins.log is pretty much empty19:05
clarkbI am tempest to restart the server since it isn't doing anyhting at the moment19:05
lifelessclarkb: lol19:06
clarkbwow19:06
lifelessclarkb: your fingers failed you19:06
clarkb*tempted19:06
clarkbmy print drivers cache common words19:06
lifelessclarkb: have you seen that fax encoding bug ?19:06
jeblairclarkb: not without logging in.  :)  i vote you restart19:06
clarkblifeless: no19:06
lifelessclarkb: so there's a compression driver for some faxes that takes a bitmap from the page - say a 019:07
jeblairjenkins 04 has a lot of nodes attached to it that don't exist. i'm going to stop it and manually remove the configs19:07
lifelessclarkb: and then applies it everywhere there are 0's19:07
clarkbjeblair: ok19:07
*** mriedem has quit IRC19:07
lifelessclarkb: the algorithm is tunable for noise etc19:07
lifelessclarkb: if you don't tune it *just right* you end up with numbers - e.g. payroll data, cheques, bank accounts - totally messed up19:08
jeblairstarting jenkins0419:10
clarkbjenkins.o.o is dead. it can't getRootDir. Investigating now :/19:11
*** wchrisj has quit IRC19:11
*** chris_johnson has joined #openstack-infra19:12
*** dstanek is now known as dstanek_afk19:12
jeblairjenkins04 is up, getting slaves added, and running jobs19:13
clarkbhrm now it is up, maybe that is a false alarm19:13
lifelessclarkb: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning19:14
*** mriedem has joined #openstack-infra19:14
clarkband proposal.slave.o.o is running jobs again19:14
lifelessclarkb: have a read of that and weep19:15
*** julim has quit IRC19:15
clarkbI will :)19:15
lifelessalso, don't buy xerox :)19:15
jeblairxerox laser printers are great.  i print books on them.19:15
lifeless'They indeed implemented a software bug, eight years ago, and indeed, numbers could be mangled across all compression modes. They have to roll out a patch for hundreds of thousands of devices world-wide.'19:16
lifelessjeblair: It was meant in humour; single mistakes don't blacklist a vendor - mistakes happen19:16
lifelessjeblair: I've purchased some pretty large xerox kit at firms in the past19:17
jeblair*nod*19:17
clarkbnow to figure out what fungi needed to run on the proposal slave. to the scrollback19:17
*** protux has quit IRC19:18
jeblairclarkb: i think it was just that it kept going offline because the regex was wrong; i don't think anything needs to be re-run19:18
clarkbjeblair: oh right because zuul needs new functions19:18
jeblairclarkb: the only thing that needed re-running was the tarball job due to the scp thing19:18
jeblairclarkb: i think that change merged so we should be set now wrt proposal19:19
clarkbjeblair: great, I will look at retriggering tarball job now19:19
*** e0ne has joined #openstack-infra19:21
*** thomasbiege has joined #openstack-infra19:21
clarkbjeblair: there are a bunch of offline nodes on 05, not sure if that is just nodepool catching up though19:23
clarkbmarkmcclain: you had tagged a release right?19:24
clarkbmarkmcclain: I will make sure that the whole pipeline happens for that19:24
*** nati_ueno has quit IRC19:24
jeblairclarkb: it could be a similar situation to 04; i'll check it out19:25
clarkblooks like fungi may have retriggered already, I am hunting this down19:25
*** nati_ueno has joined #openstack-infra19:25
markmcclainclarkb: yes and everything looks to have been published now19:25
clarkbmarkmcclain: yup I see it, I think fungi must've triggered everything then the jobs ran once I brought the slave back online19:26
markmcclainah19:26
jeblairclarkb: it's moving; i think i'll leave it be and see if it catches up.19:27
clarkbjeblair: ok19:27
openstackgerritA change was merged to openstack-infra/nodepool: Fix typo in allocation  https://review.openstack.org/7480319:28
*** dstanek_afk has quit IRC19:28
*** salv-orlando has joined #openstack-infra19:29
jeblair02 has that problem too.  the others are ok19:29
clarkbhttps://issues.jenkins-ci.org/browse/JENKINS-16239 is what I saw on jenkins.o.o19:29
clarkbI think an update of the envinject plugin will fix it19:30
clarkbbut it doesn't appear to be as serious as I first thought19:30
*** mrmartin has joined #openstack-infra19:30
openstackgerritK Jonathan Harker proposed a change to openstack-infra/config: Parameterize the status page urls  https://review.openstack.org/7455719:32
*** thomasbiege has quit IRC19:40
*** jcooley_ has joined #openstack-infra19:47
*** afazekas has quit IRC19:50
*** mfisch has quit IRC19:50
*** salv-orlando has quit IRC19:53
*** salv-orlando has joined #openstack-infra19:53
*** mrmartin has quit IRC19:54
*** dstanek_afk has joined #openstack-infra19:54
*** yassine has quit IRC19:55
*** yassine has joined #openstack-infra19:55
*** sandywalsh_ has quit IRC19:57
*** salv-orlando has quit IRC19:57
*** dstanek_afk has quit IRC19:59
*** julim has joined #openstack-infra20:01
clarkbES recovery is really slow, I am going to stop my indexers to give the cluster a chance to finish recovering20:02
jog0343 patches in check?20:04
clarkbwelcome to the jungle20:05
*** dcramer_ has quit IRC20:05
jog0is this the recheck 24 thing?20:05
jeblairjog0: no, this is a rax network outage + ffp load + the check thing20:05
jog0ffp?20:06
jeblairfeature freeze proposal20:06
jeblairer20:06
jeblairfeature proposal freeze?20:06
jeblairsome combination of those words.  :)20:06
jog0ack20:06
clarkbjeblair: indexers are stopped. I think indexing and recovery was slow because it was doing both at the same time which meant everything had to be extremely synchronous20:06
clarkbgoing to watch it now and see if those last 4 shards recover more quickly20:07
jog0wow this is pretty scary20:07
* jog0 finds lunch20:07
jeblairjog0: ha20:07
*** ociuhandu has joined #openstack-infra20:08
*** jcooley_ has quit IRC20:08
*** hashar has joined #openstack-infra20:09
*** jcooley_ has joined #openstack-infra20:09
*** markmcclain has quit IRC20:12
jeblairclarkb: looks like the ready node count is now small (as it should be under load)20:12
openstackgerritZane Bitter proposed a change to openstack-infra/config: Fix ChangeId links  https://review.openstack.org/7482120:13
*** sandywalsh_ has joined #openstack-infra20:13
*** jcooley_ has quit IRC20:13
*** oubiwann has joined #openstack-infra20:14
*** jamespage_ has joined #openstack-infra20:17
*** oubiwann has quit IRC20:18
BobBallnodepool question... deleteNode can sometimes timeout in RAX causing nodepool to bail at http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/nodepool.py#n1112 - but the node is eventually cleaned from RAX.  What would the advice be here?  Extend timeout? ignore all exceptions and carry on with the nodepool stuff?20:20
*** ociuhandu has quit IRC20:20
openstackgerritA change was merged to openstack-infra/git-review: Retrieve remote pushurl independently of user's locale  https://review.openstack.org/6430720:20
openstackgerritDan Prince proposed a change to openstack-infra/nodepool: Retry ssh connections on auth failure.  https://review.openstack.org/7482520:21
jeblairBobBall: the cleanup thread is supposed to cleanup the nodepool db in that case.  i think we should extend the rax timeout so it hits less often.20:21
jeblairBobBall: lifeless was working on a patch series that tackles that from a different perspective, but it's not ready yet20:22
BobBallAh, OK.20:22
BobBallYou hit it in the gate too with rax nodes?20:22
jeblairBobBall: yep20:22
jeblairthey eventually get cleaned up, just slower than they should20:23
BobBallkay.  Wonder why it hits them.  Might have a chat with Ant/John about that.20:23
*** ociuhandu has joined #openstack-infra20:23
zaroclarkb: i think this question is meant for you.. https://review.openstack.org/#/c/6132120:23
BobBallWill increase the timeout, and rely on the cleaning thread ;)20:23
clarkbwe seem to actually be recovering indexes now. I stopped all indexers and cleared the caches on es nodes20:24
jeblairBobBall: if you decide on a good value, let me know20:24
clarkbonce it is green again I will turn on indexers20:24
clarkb(I think we are beginning to get into our nodes are too small for the data thrown at it territory again)20:25
openstackgerritAndreas Jaeger proposed a change to openstack/requirements: Update openstack-doc-tools to 0.7.1  https://review.openstack.org/7482720:25
BobBallJust to understand jeblair - why do you want it to wait for the server to have gone on deletion?  a quota thing?  Could just add to a list of nodes that are being deleted and poll them in the cleanup thread, rather than block?20:25
*** cadenzajon has quit IRC20:26
lifelessjeblair: the delete refactoring stuff ?20:26
clarkbyou don't want to account the node as gone before it is gone20:26
clarkbquota is part of that but more importantly the allocation of nodes across providers20:27
BobBallOK20:27
BobBall10 minutes just seems like a long time to block so I'm hesitant about making it even longer :P20:27
jeblairBobBall: answering in order: nodepool needs to know how many servers there actually are in order to do math about how many it should spin up correctly.  yes.  there's lots of ways you could do it; this one is not that problematic, it just needs tuning; lifeless has another way.20:27
jeblairBobBall: it's not blocking anything20:27
jeblairBobBall: the current nodepool design has lots of threads all fighting to get their work done, mediated by the provider managers (so they don't starve each other or run over rate limits)20:28
jeblairBobBall: so that one thread is blocking, but it isn't slowing anything else down.20:29
jeblairlifeless: yes20:29
BobBallUnderstood.20:29
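    A rough sketch of the blocking wait being discussed, assuming an authenticated novaclient Client (here called nova); the timeout and poll interval are illustrative, not nodepool's real values. The point is that the delete only counts as done once the provider reports the server gone, and a timeout simply leaves the node for the periodic cleanup thread:

        import time
        from novaclient import exceptions as nova_exceptions

        def wait_for_delete(nova, server_id, timeout=1200, poll=10):
            # Ask the provider to delete, then poll until it really reports 404.
            nova.servers.delete(server_id)
            deadline = time.time() + timeout
            while time.time() < deadline:
                try:
                    nova.servers.get(server_id)
                except nova_exceptions.NotFound:
                    return True   # actually gone; safe to free the quota slot
                time.sleep(poll)
            return False  # still listed; leave it to the cleanup thread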
*** markmcclain has joined #openstack-infra20:30
*** mrmartin has joined #openstack-infra20:30
*** markmcclain1 has joined #openstack-infra20:31
BobBallI've got a python script trying to use nodepool - so my script polls for nodes, holds them then deletes them.  This is what's blocking for me, but I can re-work the blocking there so a longer timeout is fine.20:31
*** jgrimm has quit IRC20:32
*** markmcclain1 has quit IRC20:32
*** jamespage_ has quit IRC20:32
mtreinishfungi, clarkb: have you guys seen this failure before/is there a bug for it?: http://logs.openstack.org/57/73457/1/check/check-tempest-dsvm-postgres-full/595b00c/console.html20:32
*** dstanek has joined #openstack-infra20:32
*** mrda_away is now known as mrda20:33
*** denis_makogon_ has joined #openstack-infra20:34
clarkbmtreinish: haven't seen that before. looks like adding a region failed20:34
*** markmcclain has quit IRC20:34
clarkbbut I don't see keystone logs20:34
openstackgerritRyan Petrello proposed a change to openstack/requirements: Update pecan >= 0.4.5 in global requirements.  https://review.openstack.org/7483020:34
*** dprince has quit IRC20:35
jeblair2014-02-19 06:37:40.894 | 2014-02-19 06:37:40 /opt/stack/new/devstack/functions-common: line 997: /opt/stack/new/devstack/stack-screenrc: Permission denied20:35
jeblairis that the actual error?20:35
clarkband syncing requirements failed20:35
clarkbjeblair: looks like permissions trouble in the /opt dirs20:35
*** ryanpetrello has left #openstack-infra20:36
HenryGHi, I am unable to find an existing bug for this gate-neutron-python27 failure:  http://logs.openstack.org/33/68833/3/gate/gate-neutron-python27/42c237020:36
*** ryanpetrello has joined #openstack-infra20:36
HenryGAny clues/hints would be appreciated.20:37
clarkbHenryG: looks like a greenlet failure20:38
clarkbI would ask neutron folks20:38
HenryGclarkb: thanks, will do20:38
mtreinishclarkb, jeblair: ok I was just thrown by what looked like ps output interspersed in the log messages20:39
*** jcooley_ has joined #openstack-infra20:39
mtreinishbut yeah it definitely looks like permissions issue, should I open it against devstack or ci?20:39
clarkbmtreinish: not sure, does that change change permissions in a weird way?20:41
*** smarcet has left #openstack-infra20:41
jeblairright before running devstack, devstack-gate does: "sudo chown -R stack:stack $BASE"20:42
*** yolanda has quit IRC20:42
jeblairso it's hard to say what the problem could be.  did that fail?  or did something in devstack change it?20:42
mtreinishjeblair: it looks like everything was working fine until: http://logs.openstack.org/57/73457/1/check/check-tempest-dsvm-postgres-full/595b00c/console.html#_2014-02-19_06_37_13_73020:43
mtreinishwhen it went to sync the requirements for horizon20:43
*** jgrimm has joined #openstack-infra20:44
clarkbdevstack is also doing safe_chown-ing of its own20:45
clarkbso yeah I think it could be in a number of places20:45
clarkbsudo chown -R jenkins:jenkins /opt/stack/new happens in workspace new setup20:46
clarkbshould it be stack:stack instead?20:46
jeblairclarkb: not unless we want to 'sudo stack' before every command20:46
clarkbdoesn't devstack run as the stack user though?20:47
clarkbI guess it gets root as necessary though20:47
lifelessyou could sudo stack exec :)20:47
jeblairclarkb: yes, which is why devstack-gate does  "sudo chown -R stack:stack $BASE"20:47
jeblairright before running devstack20:47
jeblairlifeless: then we couldn't go back.20:47
clarkboh gotcha20:47
jeblairlifeless: jenkins has sudo, stack drops sudo20:47
lifelessah20:48
*** dcramer_ has joined #openstack-infra20:48
*** jcooley_ has quit IRC20:49
jeblairthat seems to have happened to 2 builds in the last 24h, in dfw and iad.20:51
jeblairaccording to logstash20:51
*** mwagner_lap has quit IRC20:53
mtreinishjeblair: I guess I'm really lucky then :)20:54
*** smarcet has joined #openstack-infra20:54
*** jcooley_ has joined #openstack-infra20:55
jeblairmtreinish: i think we'll either need to catch a live node or add some debugging20:56
mtreinishjeblair: ok, should I open a bug about it then?20:58
mtreinishyeah the logs don't really show what happened20:58
*** DinaBelova is now known as DinaBelova_20:58
jeblairmtreinish: sure; target ci and devstack until we know what's up i guess20:58
*** khyati has quit IRC21:01
*** sabari has quit IRC21:01
*** khyati has joined #openstack-infra21:02
mtreinishjeblair: https://bugs.launchpad.net/devstack/+bug/128226221:02
uvirtbotLaunchpad bug 1282262 in openstack-ci "Permission denied errors on /opt during devstack" [Undecided,New]21:02
*** khyati has quit IRC21:04
clarkbjeblair: I am thinking we may want to add another ES node so that losing one node doesn't cause the others to run into GC trouble (will need to bump the number of shards slightly too though that may be less necessary)21:05
clarkbjeblair: but I think this can happen after FF21:05
*** CaptTofu has quit IRC21:06
jeblairclarkb: whew21:06
*** CaptTofu has joined #openstack-infra21:06
*** jamespage_ has joined #openstack-infra21:07
openstackgerritDan Prince proposed a change to openstack-infra/nodepool: Retry ssh connections on auth failure.  https://review.openstack.org/7482521:07
*** rfolco has quit IRC21:08
ArxCruzjeblair: clarkb are you guys having problems with jenkins and nodepool? I have a few VM's ready, but a lot of jobs in the build queue21:09
*** thomasbiege has joined #openstack-infra21:09
* ArxCruz blame zuul changes :@21:10
*** jamespage_ has quit IRC21:10
openstackgerritIvan Melnikov proposed a change to openstack-infra/config: Add documentation jobs for taskflow  https://review.openstack.org/7483721:10
*** pafuent has left #openstack-infra21:10
openstackgerritMatthew Treinish proposed a change to openstack-infra/devstack-gate: Start compressing config files too  https://review.openstack.org/7483821:10
*** CaptTofu has quit IRC21:11
*** alexpilotti has quit IRC21:11
*** alexpilotti_ has joined #openstack-infra21:11
HenryGclarkb: there does not seem to be a bug tracking this yet, but it looks like trouble may be brewing:  http://logstash.openstack.org/index.html#eyJzZWFyY2giOiJtZXNzYWdlOlwiZ3JlZW5sZXQuR3JlZW5sZXRFeGl0XCIgQU5EIGZpbGVuYW1lOlwiY29uc29sZS5odG1sXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTI4NDQyMDk0OTV921:11
jeblairArxCruz: actually, zuul seems to have dealt rather well with 400 changes in queue....21:12
jeblairArxCruz: the bulk of our problems stem from a rax network outage this morning21:12
*** weshay has quit IRC21:12
ArxCruzjeblair: that's really weird, I have only one zuul and zuul-merger, and nodepool latest version21:12
HenryGAny tips on how to track down the culprit?21:12
mattoliverauMorning!21:12
ArxCruzjeblair: right now I have a lot of vm's idle and a lot of jobs in build queue21:12
jeblairmattoliverau: good morning;  things are busy here.21:13
*** jamespage_ has joined #openstack-infra21:13
jeblairArxCruz: oh, you're talking about your own thing.21:13
*** mrmartin has quit IRC21:13
ArxCruzjeblair: hehe, yup21:13
jeblairArxCruz: you asked about us.21:13
ArxCruzwondering if is something I did or if there's something wrong with yours too21:13
ArxCruzsorry, bad english21:13
*** cadenzajon has joined #openstack-infra21:13
jeblairArxCruz: see what the state of the nodes are in jenkins.  we upgraded jenkins and found that the latest version didn't work with the gearman plugin, so we're currently running the lts version21:14
ArxCruzoh boy...21:15
ArxCruzwhich jenkins version are you guys using ?21:15
ArxCruzand which gearman plugin ?21:15
ArxCruz:/21:15
ArxCruzjeblair: ^21:16
*** jcooley_ has quit IRC21:17
jeblairArxCruz: you can check the version # at the bottom of the page; the gearman plugin is something recent but shouldn't matter too much.21:17
*** jcooley_ has joined #openstack-infra21:18
ArxCruzjeblair: thanks, sorry for the confusion :)21:19
*** tjones has joined #openstack-infra21:21
*** jcooley_ has quit IRC21:22
*** jroovers has joined #openstack-infra21:26
openstackgerritSergey Lukjanov proposed a change to openstack-infra/config: Enable docs for python-savannaclient  https://review.openstack.org/7447021:27
*** markmcclain has joined #openstack-infra21:28
*** sabari has joined #openstack-infra21:28
*** e0ne has quit IRC21:28
*** andreaf has quit IRC21:28
*** jroovers has quit IRC21:30
*** jcooley_ has joined #openstack-infra21:30
dhellmanndstufft: fyi, I'm very close to giving up on namespace packages for oslo libraries :-|21:30
*** thomasbiege has quit IRC21:33
*** fbo_away is now known as fbo21:34
*** jamielennox is now known as jamielennox|away21:36
*** hashar has quit IRC21:38
*** jhesketh_ has joined #openstack-infra21:39
*** protux has joined #openstack-infra21:39
jhesketh_Morning21:39
*** ok_delta has joined #openstack-infra21:40
*** sabari_ has joined #openstack-infra21:40
*** sabari has quit IRC21:41
jeblairjhesketh_: good morning21:42
jhesketh_hey jeblair, how's things?21:43
jeblairjhesketh_: could be better.  :)21:43
jog0jeblair: how much of the check queue is from the outage vs recheck21:43
jeblairjhesketh_: there was a rax network outage this morning; that's the flat line in the nodepool graph21:43
jeblairjog0: i'm not sure how i would determine the answer to that21:43
jhesketh_:-(21:43
jeblairjhesketh_: that set us back a bit21:43
jhesketh_right, let me know if I can help with anything21:44
*** salv-orlando has joined #openstack-infra21:44
jeblairjog0: the trend in queue length has been solidly downward since we got everything unstuck, so at current in/out rates, we're not getting worse.  that suggests that under normal circumstances we can more than handle the current patchset test load.21:46
jeblairjog0: (extrapolating from less than 1 days worth of data which is potentially dangerous)21:46
*** oubiwann has joined #openstack-infra21:47
*** oubiwann has quit IRC21:47
jeblairjhesketh_: i have a puzzle for you if you're interested -- during the network outage, both the jenkins manager in nodepool as well as novaclient itself were stuck in the same ssl read function.21:48
jeblairjhesketh_: Shrews suggested that setting keepalive on the socket might help prevent that sort of situation in the future21:48
*** markmcclain has quit IRC21:48
dimspuzzled...requirements/projects.txt seems to be outdated for a brand new docs run. any ideas? http://logs.openstack.org/74/74474/17/check/gate-oslo.vmware-docs/33f359e/console.html21:48
jeblairjhesketh_: are you interested in seeing if something like that is possible?  it might involve some novaclient, urllib, or ssl library deep diving21:49
*** sabari_ has quit IRC21:50
jeblairdims: that will update during the next image build, which won't be for a while21:50
*** smarcet has quit IRC21:51
jhesketh_jeblair: not sure I have enough knowledge of those systems to actually achieve much there to be honest21:51
dimsjeblair, i see. thx21:51
jeblairmordred: ^ might need to do something about stale requirements repos21:52
jeblairjhesketh_: no prob21:52
jhesketh_jeblair: what was the read function they were stuck in/error they saw21:52
jeblairjhesketh_:21:53
jeblairhttp://paste.openstack.org/show/67382/21:53
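    For reference, a minimal sketch of the keepalive idea Shrews suggested, at the plain socket level; actually reaching the underlying socket from inside novaclient/httplib is the hard part and is not shown here. The tuning constants are Linux-specific, hence the hasattr guards:

        import socket

        def enable_tcp_keepalive(sock, idle=60, interval=10, count=6):
            # Turn on TCP keepalive so a silently-dead peer eventually raises
            # an error on the blocked read instead of hanging forever.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
            if hasattr(socket, 'TCP_KEEPIDLE'):
                sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
            if hasattr(socket, 'TCP_KEEPINTVL'):
                sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
            if hasattr(socket, 'TCP_KEEPCNT'):
                sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)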
*** wenlock_ has quit IRC21:54
openstackgerritDavanum Srinivas (dims) proposed a change to openstack-infra/config: Mark a few oslo.vmware jobs as non-voting  https://review.openstack.org/7466921:54
*** skraynev is now known as skraynev_afk21:55
mordredjeblair: reading scrollback21:55
*** wenlock has joined #openstack-infra21:56
*** prad has quit IRC21:56
jeblairmordred: /opt/requirements is now updated daily at most.  in the case of hpcloud-az2, it was last updated feb 12.21:57
*** julim has quit IRC21:58
openstackgerritDavanum Srinivas (dims) proposed a change to openstack-infra/config: Temporary : Mark a few oslo.vmware jobs as non-voting  https://review.openstack.org/7466921:59
mordredjeblair: so - we might need to "cd /opt/requirements ; git pull --ff-only"  (or something similar)21:59
mordred?21:59
mordredoh STALE requirements. I thought you were saying stable requirements21:59
jeblairmordred: yes, though that may require sudo access unless we change the owner of those repos to jenkins21:59
jeblairmordred: since all the slaves are single use, i think we can do that now22:00
mordredjeblair: shouldn't the repo prep be setting requirements to master?22:00
mordredlike, since requirements is part of the integration set?22:00
jeblairmordred: not devstack22:00
jeblairmordred: unit test, etc, jobs22:00
mordredoh. but why does /opt/requirements matter for unittests - they're all in tox?22:00
jeblairmordred: see the original question from dims and 22:03 < jeblair> mordred: not devstack22:01
jeblair22:03 < jeblair> mordred: unit test, etc, jobs22:01
jeblairgah22:01
jeblairmordred: and http://logs.openstack.org/74/74474/17/check/gate-oslo.vmware-docs/33f359e/console.html22:01
mordredk. reading22:01
mordredjeblair: GOTCHA. thank you22:02
mordredyeah - I think we fetch /opt/requirements as a pre-test sudo operation22:03
jeblairmordred: can't sudo, not yet at least.22:03
mordredor, rather, change it to jenkins owner22:03
mordredsorry - misspoke22:03
jeblairmordred: can sudo after this merges: https://review.openstack.org/#/q/status:open+project:openstack-infra/config+branch:master+topic:sudoers,n,z22:03
annegentlefungi: what's an appropriate gerrit ref to point oreilly to for a pointer to HEAD:openstack/operations-guide/feature/edits (is that right?)22:03
clarkbok back from lunch22:04
annegentlefungi: right now they are pointing at a fork of openstack/operations-guide, but I think jeblair mentioned they could push to a gerrit ref22:05
annegentleclarkb: welcome back!22:05
clarkbI am going to turn indexers back on22:06
clarkbannegentle: fungi is AFk for a while. let me kick ES then I will look at your question22:06
annegentleclarkb: okie22:06
*** amcrn has quit IRC22:07
*** ok_delta has quit IRC22:07
*** virmitio has quit IRC22:08
*** dkliban is now known as dkliban_afk22:09
*** cadenzajon has quit IRC22:09
*** CaptTofu has joined #openstack-infra22:10
clarkbok ES and logstash are "UP" it is relocating shards but indexing is happening at a reasonable speed. I am a bit worried that we might run into memory trouble so will keep an eye on it22:11
*** oubiwann has joined #openstack-infra22:11
clarkbannegentle: now for oreilly. What is it that oreilly needs to do? just push their edits upstream?22:11
*** jamielennox|away is now known as jamielennox22:12
*** vkozhukalov has quit IRC22:12
*** cadenzajon has joined #openstack-infra22:13
*** ArxCruz has quit IRC22:13
annegentleclarkb: so we created a branch so that oreilly's edits are less intrusive on our master22:13
clarkbyup22:13
annegentleclarkb: we can happily keep editing while they make it production ready22:13
*** lcostantino has quit IRC22:13
annegentleclarkb: we're still changing master and then I keep delivering changes to feature/edits22:14
*** khyati has joined #openstack-infra22:14
annegentleclarkb: they just want to know what we want :) very accomodating22:14
*** ArxCruz has joined #openstack-infra22:15
zaroroz: you cannot replace the change owner and that's not configurable in gerrit.  however it looks like there might be a workaround which fungi has powers to do.. https://groups.google.com/forum/#!topic/repo-discuss/aqNgmuiCtyk22:15
clarkbannegentle: what would you like them to do ?22:15
*** sarob has quit IRC22:16
jeblairzaro, roz: we're not going to do that.  what's the problem?22:16
annegentleclarkb: ideally they'll push to feature/edits22:16
annegentleclarkb: so what do I tell them to push to?22:16
*** sarob has joined #openstack-infra22:16
clarkbdims: still around? we need to test the -proposed version of libvirt 1.1.1 on precise before it will end up in cloud archive. I think the easiest way to do that is with a devstack change that enables -proposed for the libvirt package. Is that something you are already testing?22:16
*** thedodd has joined #openstack-infra22:16
clarkbannegentle: and these edits would go into review right?22:17
annegentlepush to the appropriate gerrit ref (HEAD:refs/for/branchname)22:17
annegentleclarkb: jeblair originally had that in an email ^^22:17
annegentleclarkb: so helping their production staff get the pointer right22:17
clarkbgit push ssh://username@review.openstack.org:29418 HEAD:refs/for/feature/edits <- that will push them up for review22:17
*** banix has quit IRC22:18
clarkbcan also just use git review on that branch if the .gitreview branch is set correctly22:18
annegentlerefs/for? really?22:18
clarkbannegentle: refs/for is the magical gerrit reference prefix22:18
annegentleclarkb: (not that I'm doubting)!22:18
annegentleclarkb: do you think it makes sense to give them one username that can push directly? or did we decide that was bad22:19
*** bknudson has quit IRC22:19
annegentleclarkb: I'm okay with walking one of their production staff through cla but wanting to be sure it's required22:20
clarkbannegentle: personally I think that is bad. It isn't how openstack accepts commits. But the relationship here is new and special and may not require review22:20
clarkbjeblair: ^22:20
jeblairi'd like to try having them push things for review22:20
annegentlejeblair: clarkb: okay I'll keep pushing them22:20
*** sarob has quit IRC22:21
*** dolphm has joined #openstack-infra22:21
dolphmis zuul waiting for a check job to complete before moving approved changes into the gate?22:22
clarkbdolphm: if the check results are more than 24 hours old yes22:22
dolphmYAY!22:22
annegentlejeblair: if I give them git push ssh://username@review.openstack.org:29418 HEAD:refs/for/feature/edits and they go through the CLA and all, what will those patches look like to me on review.openstack.org?22:22
*** mfer has quit IRC22:22
dstufftdhellmann: dooo it22:22
dstufftdhellmann: namespace packages are bad for you22:23
clarkbdolphm: and it will recheck if comments happen (not just approvals) and the check tests are more than 72 hours old22:23
annegentlejeblair: right now I'm porting from master to feature/edits22:23
dstufftat least until python 3.whatever is the baseline and you can use the built in form of namepsace packages22:23
dstufftmaybe someone can backport that to 2.x, I dunno22:23
dolphmclarkb: ha, that's awesome22:23
*** esker has quit IRC22:23
clarkbdolphm: idea behind that is test results stay fresh as review happens22:23
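    A small sketch of that freshness policy as described in the chat (the 24-hour and 72-hour thresholds come from the messages above, not from reading zuul's source):

        import time

        DAY = 24 * 60 * 60

        def needs_fresh_check(last_check, event, now=None):
            # Re-run check jobs before gating if results are >24h old on an
            # approval, or >72h old when a new comment arrives.
            age = (now or time.time()) - last_check
            if event == 'approval':
                return age > 1 * DAY
            if event == 'comment':
                return age > 3 * DAY
            return False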
fungiclarkb: the neutronclient release was not related to the offlined proposal slave... just different broken things i was trying to fix22:23
fungibut looks like you figured that out22:24
jeblairannegentle: they'll show up like normal but the branch column will be different22:24
dolphmclarkb: that's great -- it should help catch merge conflicts earlier too, which will be super useful all by itself22:24
*** dstanek has quit IRC22:24
clarkbfungi: yup thanks, go back to being AFK :P22:24
annegentlejeblair: ok like stable/havana.22:24
jeblairannegentle: exactly22:24
zarojeblair: roz wants to make himself the owner of a change so he can set it to WIP Status.22:24
dmsimardjeblair: Sorry to bother you with that again, when did you say https://review.openstack.org/#/c/74780/ was going to be effective ?22:24
jeblairannegentle: so you'll want to watch out for that22:24
dimsclarkb, pong. yes i can help with that22:24
clarkbdims: awesome thanks. let me collect the relevant data really quickly22:25
jeblairzaro, roz: remove the changeid from the commit message and git-review it again to make a new change in gerrit.  abandon the old one.22:25
dimsclarkb, i ended up building the libvirt from their git and running it in our gate22:25
clarkbdims: see https://bugs.launchpad.net/nova/+bug/1228977/ comment from Brian Murray. Lifeless already updated the impact and risk stuff for us22:26
uvirtbotLaunchpad bug 1228977 in nova "n-cpu seems to crash when running with libvirt 1.1.1 from ubuntu cloud archive" [High,Confirmed]22:26
clarkbdims: so now we need to test it22:26
jeblairdmsimard: i'll try kicking off an image build now22:27
clarkbdims: we need a change to devstack that enables ubuntu -proposed https://wiki.ubuntu.com/Testing/EnableProposed and changes the name of the libvirt package to libvirt/precise-proposed in devstack that we can WIP22:27
clarkbdims: that should install libvirt from proposed and test that the patched libvirt works as expected22:27
dimsclarkb, am on it after i wrap up a couple of things in a few22:27
*** jamespage_ has quit IRC22:28
clarkbdims: with that info we should be able to get the package updated in cloud archive and hopefully switch all tests to new libvirt22:28
*** oubiwann has quit IRC22:28
dimsclarkb, yep. sounds good.22:28
clarkbdims: so basically this is a throw away change to show ubuntu that the fix is safe22:28
clarkbdims: awesome thank you22:28
dimsclarkb, yep.22:28
*** jomara has quit IRC22:29
*** prad has joined #openstack-infra22:31
*** jcooley_ has quit IRC22:31
*** jcooley_ has joined #openstack-infra22:31
dmsimardjeblair: Thanks, appreciate it. Let me know what happens :)22:33
*** mrda is now known as mrda_away22:33
*** dcramer_ has quit IRC22:34
jeblairclarkb: http://paste.openstack.org/show/67391/22:34
jeblairclarkb: az2 consistently fails image creation with that22:34
clarkblooking22:35
*** dolphm is now known as dolphm_50322:35
clarkbFYI gearman for logstash is 164k events behind but slowly catching up22:35
clarkbjeblair: is the remote side killing our connection?22:35
*** jcooley_ has quit IRC22:35
jeblairclarkb: i have no idea22:36
jeblairclarkb: i tried it from my workstation at home and it works.  :/22:36
*** jcooley_ has joined #openstack-infra22:36
*** VijayT has joined #openstack-infra22:37
*** mriedem has quit IRC22:37
*** jcooley_ has quit IRC22:37
*** jeckersb is now known as jeckersb_gone22:37
*** jcooley_ has joined #openstack-infra22:38
*** thomasem has quit IRC22:39
*** e0ne has joined #openstack-infra22:39
*** rcleere has quit IRC22:39
clarkbis CONNECT_TIMEOUT being hit?22:40
* clarkb reads more code22:40
clarkbdoesn't look like it22:41
*** miqui has quit IRC22:42
*** jcooley_ has quit IRC22:42
clarkbjeblair: I think the nodeutils ssh_connect may need to catch a wider net of exceptions possibly22:42
clarkbright now it only catches socket.error22:42
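    A hedged sketch of what a wider net might look like, using paramiko directly; this is not nodepool's actual ssh_connect, and the retry interval and timeout are invented:

        import socket
        import time

        import paramiko

        def ssh_connect_retry(host, username, total_timeout=120):
            # Retry transient failures (refused connection, abrupt EOF, SSH
            # negotiation errors) until the node's sshd is actually ready.
            client = paramiko.SSHClient()
            client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            deadline = time.time() + total_timeout
            while True:
                try:
                    client.connect(host, username=username, timeout=10)
                    return client
                except (socket.error, EOFError, paramiko.SSHException):
                    if time.time() > deadline:
                        raise
                    time.sleep(5)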
jeblairclarkb: interesting that this is new and only happens on az222:42
*** e0ne has quit IRC22:42
*** dstanek has joined #openstack-infra22:42
clarkbI agree22:43
jeblairclarkb: i'm trying some manual tests with 'nova boot'22:43
clarkbk22:43
fungiclarkb: thanks. brad topol is giving a great keystone overview to the group. Shrews is here too22:44
clarkbso #tox is the channel for the skype replacement, not the python test tool...22:44
lifelesshahahahaha22:45
lifelessclarkb: #python-testing22:45
jeblairclarkb: so just manually sshing, for a while i got ssh: connect to host 15.185.190.118 port 22: Connection refused22:45
*** ArxCruz has quit IRC22:45
jeblairclarkb: now i get Connection closed by 15.185.190.11822:45
clarkbjeblair: which should cause an EOFError right?22:46
dimsclarkb, i see zul may have updated "precise-proposed/icehouse" to libvirt 1.2.1 with the changes we need (https://launchpad.net/~ubuntu-cloud-archive/+archive/icehouse-staging/+sourcepub/3889570/+listing-archive-extra) - we will have to try that22:46
clarkblifeless: thanks22:46
*** sarob has joined #openstack-infra22:46
clarkbdims: that should work22:47
morganfainbergfungi, give topol a hard time for me ;)22:47
dimsclarkb, will report back tomorrow.22:47
morganfainbergfungi, (or at least wave enthusiastically at him for me)22:47
jeblairclarkb: i've never looked at a console log for an hpcs vm before, but this doesn't look great to me: http://paste.openstack.org/show/67397/22:47
clarkbdims: awesome thank you for the help (I had hoped to get to it eventually but so much other stuff is going on)22:48
zuldims/clarkb: we should be uploading a new version of libvirt next week22:48
clarkbzul: does that mean you don't need us to test it?22:48
clarkbzul: https://bugs.launchpad.net/nova/+bug/1228977/ started the conversation22:48
uvirtbotLaunchpad bug 1228977 in nova "n-cpu seems to crash when running with libvirt 1.1.1 from ubuntu cloud archive" [High,Confirmed]22:48
jaypipesquick question... anybody know which config file the periodic QA jobs are defined in?22:49
clarkbwe need to test it anyways, but it is easier to do that once in cloud archive22:49
*** bknudson has joined #openstack-infra22:49
clarkbhowever getting ahead of it is probably best so that if it doesn't work we can hopefully fix it before the update22:49
fungimorganfainberg: will do. i'm a well-practiced heckler22:49
morganfainbergfungi, ++22:49
morganfainberg:)22:49
clarkbjeblair: that looks like unhappy metadata server which is bad times22:50
lifelesszul: as we understand it you need it tested, so we're aiming to do that :)22:51
clarkbjaypipes: most of them should be templates now and we specify which branch to test in the projects.yaml file for JJB when we instantiate the template22:51
mordredclarkb: its #pylib22:51
*** rlandy has quit IRC22:52
jeblairclarkb: still getting eof on ssh to that host.  spinning up another one in az1 to compare console log.22:52
jaypipesclarkb: yeah, am looking in that file now.. unless I am mistaken, all the periodic jobs are run against "devstack-precise" single use nodes. Is that correct?22:52
*** dkranz has quit IRC22:52
clarkbjaypipes: all of the tempest periodic tests yes22:53
clarkbthe unittest periodic jobs are run on bare-precise and bare-centos now22:53
jeblairclarkb: yeah, the output looks much less error-like on az122:53
jaypipesclarkb: gotcha. thx man.22:54
jeblairclarkb: i think this may be hpcs ticket-worthy22:54
clarkbjeblair: I agree, though we may just be told to stop using az2 which is :(22:54
jeblairclarkb: not much we can do about that, we can't use it now anyway22:55
clarkbyup22:55
*** ryanpetrello has quit IRC22:55
jeblairclarkb: would you please do the honors?22:56
clarkboh you want me to do it :P yes I will file it22:56
zullifeless:  thats for srus22:57
lifelesszul: so, UCA doesn't need as much testing as SRUs ?22:58
lifelesszul: anyhow, we want it in saucy directly too22:58
*** esker has joined #openstack-infra22:59
*** thedodd has quit IRC23:00
*** esker has quit IRC23:00
*** esker has joined #openstack-infra23:00
*** mrda_away is now known as mrda23:01
zullifeless:  to get it saucy it needs an sru, UCA it gets updated when trusty gets updated23:01
lifelesszul: ok, so - tripleo wants it in saucy ;)23:01
zullifeless:  thats nice for tripleo, that takes a bit longer then :)23:02
*** fbo is now known as fbo_away23:02
*** markmcclain has joined #openstack-infra23:03
*** markmcclain1 has joined #openstack-infra23:05
clarkbjeblair: ticket sent, I cc'd you23:05
*** markmcclain has quit IRC23:07
*** julim has joined #openstack-infra23:08
*** ayoung has joined #openstack-infra23:08
*** khyati has quit IRC23:09
*** jnoller has quit IRC23:10
*** sarob has quit IRC23:11
openstackgerritMat Lowery proposed a change to openstack-infra/config: Enable list item bullets in CSS except for Jenkins  https://review.openstack.org/7175223:12
ayoungjeblair, whom do we bug about enabling eavesdrop for #openstack-keystone?  I feel like we are coding without git right now23:15
jeblairayoung: sorry, it's lost in the infra review backlog23:15
ayoungof course23:15
clarkbhas the change been proposed?23:16
jeblairayoung: well, not lost, but it's there.23:16
clarkbI see it23:16
*** yassine has quit IRC23:16
jeblairi can't really prioritize reviewing irc-related changes right now.  sorry.23:16
clarkbI will approve, I don't think there are any meetings for the next 45 minutes23:17
ayoungheh23:17
*** gordc has quit IRC23:17
ayoungsorry to be a noodge23:17
clarkbayoung: out of curiousity why vacate -dev?23:17
dhellmanndstufft: the problem is the amount of pain to rename the packages we already have :-/23:17
jeblairayoung: apparently clarkb is the answer.  he's nicer than i am.  maybe i can convince him to review some of my changes.  ;)23:17
dmsimardjeblair: Leaving the office, i'll let you know if I still see the issue tomorrow23:17
ayoungclarkb, so many people were complaining about the keystone devs crowding out the room23:17
clarkbayoung: thats the point23:18
dstufftdhellmann: sufficient pain to teach you the error of your ways ;)23:18
dstufft(yes it sucks :( )23:18
clarkbayoung: eg that is a good thing23:18
ayoungclarkb, think we should stay in -dev?23:18
clarkboh well23:18
dstufftdhellmann: (true talk, basically this pain is why I'm anti namespaces, because i know this feel)23:19
clarkbayoung: not necessarily. I definitely seem to have a different idea of how irc should work than most23:19
clarkbayoung: I expect folks to use clients that don't suck :)23:19
dhellmanndstufft: well, it's pain on the packagers, not on me23:19
dhellmannthe same pain applies for renaming anything23:19
ayoungclarkb, I preferred being in -dev as it meant I was paying attention there and tended to answer General Purpose questions, too23:19
morganfainbergclarkb, ++ on clients that don't suck23:20
*** oubiwann has joined #openstack-infra23:20
zulwth are we renaming now?23:20
morganfainbergclarkb, and i agree w/ ayoung, but if there is a real push for us to be elsewhere, I'm ok with it.23:20
clarkbzul: everything23:20
zulawesome23:21
* zul goes jump off a cliff23:21
morganfainbergclarkb, *shrug* it's why i hang out here as well, good convos, and sometimes even unrelated to -infra stuffs23:21
*** yamahata has quit IRC23:21
fungimorganfainberg: we fish you in with good conversation and then try to put you to work on infra tasks ;)23:21
morganfainbergfungi, LOL someday when dolphm_503 hasn't swamped us keystone folks w/ work, I'll be contributing more to infra :)23:22
morganfainbergfungi, actually... it is on my "I will be more involved in this" list for Juno23:22
clarkblol logstash gearman backlog isn't falling23:22
*** flaper87 is now known as flaper87|afk23:23
*** dmsimard has quit IRC23:23
*** CaptTofu has quit IRC23:24
jeblairfungi: have the static slaves been deleted and nodepool config adjusted?23:24
jeblairno23:24
jeblairhttps://review.openstack.org/#/q/status:open+project:openstack-infra/config+branch:master+topic:single-use,n,z23:24
*** CaptTofu has joined #openstack-infra23:24
openstackgerritA change was merged to openstack-infra/config: Add Eavesdrop bot to #openstack-keystone  https://review.openstack.org/7447223:27
jeblairfungi, clarkb: i approved the next change in that series; we have a node ready and it has python3 and pypy installed23:27
jeblair(i'm thinking 60 more nodes would be helpful now)23:27
clarkbI am going to temporarily increase the number of logstash workers to 3 per host while I am watching it. Hopefully that drops the backlog23:27
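A minimal sketch of how a gearman backlog like the one clarkb is watching can be checked from a script, using gearman's plain-text admin "status" command; the host and port below (localhost:4730, gearman's default) and the assumption that the logstash gearman server answers this command are for illustration only, not the actual production setup:

    import socket

    def gearman_status(host='localhost', port=4730):
        # Ask the gearman server for per-function queue counts via its
        # plain-text admin protocol; the reply ends with a lone "." line.
        sock = socket.create_connection((host, port))
        try:
            sock.sendall(b'status\n')
            data = b''
            while not data.endswith(b'.\n'):
                chunk = sock.recv(4096)
                if not chunk:
                    break
                data += chunk
        finally:
            sock.close()
        for line in data.decode('utf-8', 'replace').splitlines():
            if line == '.':
                break
            # Each line is: <function>\t<queued>\t<running>\t<workers>
            function, queued, running, workers = line.split('\t')
            print('%s: queued=%s running=%s workers=%s'
                  % (function, queued, running, workers))

    gearman_status()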
morganfainbergclarkb, ++ thanks for approving that23:27
clarkbjeblair: sounds good23:28
SergeyLukjanovjeblair, agreed23:28
*** chris_johnson has quit IRC23:28
*** CaptTofu has quit IRC23:29
SergeyLukjanovheh, just realized that it's already 3:30am in my tz while reading scrollback...23:30
fungijeblair: yeah, that sounds good. i removed the static slaves (except py3k) from jenkins01 and 02 but didn't press forward yet with everything else going on23:30
fungiwe should be safe to delete the static centos6 and precise slaves from rax now23:31
fungii've seen no failures which seem to stem from the precise->bare-precise shift23:31
SergeyLukjanovis there any way to see the gate backlog?23:31
clarkbSergeyLukjanov: zuul status?23:32
*** dstanek has quit IRC23:32
*** CaptTofu has joined #openstack-infra23:32
*** sarob has joined #openstack-infra23:33
*** dkliban_afk has quit IRC23:33
SergeyLukjanovclarkb, it shows only the 20 changes for each queue that are now in progress23:33
*** openstack has joined #openstack-infra23:34
clarkbSergeyLukjanov: right, 20 is the floor and it will grow as long as there aren't failures23:36
clarkbSergeyLukjanov: it adds 2 to the window for each successful merge and halves it, with a floor of 20, for each failed merge23:36
clarkbSergeyLukjanov: the actual value is in the json blob23:36
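A minimal sketch of the window behaviour described above, plus one way to read the live value out of the status JSON rather than the web UI; the status.json URL and the 'pipelines'/'change_queues'/'window' field names are assumptions for illustration, not a documented Zuul API:

    import json
    try:
        from urllib.request import urlopen   # Python 3
    except ImportError:
        from urllib2 import urlopen          # Python 2

    WINDOW_FLOOR = 20

    def next_window(window, merge_succeeded):
        # Grow the active window by 2 on every successful merge; halve it,
        # never dropping below the floor of 20, on every failed merge.
        if merge_succeeded:
            return window + 2
        return max(WINDOW_FLOOR, window // 2)

    def current_windows(url='http://zuul.openstack.org/status.json'):
        # Pull the per-queue window values out of Zuul's status JSON.
        status = json.loads(urlopen(url).read().decode('utf-8'))
        for pipeline in status.get('pipelines', []):
            for queue in pipeline.get('change_queues', []):
                if 'window' in queue:
                    yield pipeline['name'], queue['window']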
SergeyLukjanovclarkb, yup, I know, looks like I should sleep a bit to be able to ask correctly :)23:37
clarkbSergeyLukjanov: you should sleep more23:37
clarkbSergeyLukjanov: compared to you and fungi I think I get more sleep than the both of you combined23:37
clarkb>_>23:37
SergeyLukjanovclarkb, oh, thanks for the tip about json23:37
*** hemna_ is now known as hemnafk23:38
SergeyLukjanovclarkb, :)23:38
*** oubiwann has quit IRC23:38
clarkbjeblair: ok, I think I just need to leave es and logstash be for a while and see how they do over a larger time sample23:39
clarkbjeblair: anything in particular you think needs attention re feature proposal freeze?23:39
clarkbif not I am going to go through review backlogs23:39
clarkbjeblair: we have a response from hpcloud, it happens every time right? and we are booting precise images there?23:41
* clarkb pokes at nodepool for info23:41
morganfainbergout of curiosity who do you tell that the link for the hotel block at the omni is now raising a 404 (ATL summit)?23:42
clarkbmorganfainberg: the foundation23:42
clarkbreed would be a good one but is afk this week23:42
morganfainbergclarkb, hm, ok i'll hunt down some email on that front.23:42
morganfainbergclarkb, k thnks :)23:42
morganfainbergclarkb, Infra, they know everything23:43
morganfainbergyes.. everything23:43
morganfainberg;)23:43
*** jgrimm has quit IRC23:43
* anteaya finishes reading backscroll23:44
*** jerryz has quit IRC23:44
*** protux has quit IRC23:44
*** dstanek has joined #openstack-infra23:44
*** denis_makogon_ has quit IRC23:46
*** jergerber has quit IRC23:46
*** jerryz has joined #openstack-infra23:46
openstackgerritCyril Roelandt proposed a change to openstack-infra/config: python-ceilometerclient: make the py33 gate voting  https://review.openstack.org/7487523:46
*** esker has quit IRC23:46
*** alexpilotti_ has quit IRC23:47
openstackgerritA change was merged to openstack-infra/devstack-gate: Add change in README file according to changes in code  https://review.openstack.org/7434223:56
openstackgerritCyril Roelandt proposed a change to openstack-infra/config: python-ceilometerclient: make the py33 gate voting  https://review.openstack.org/7487523:58
