Thursday, 2014-03-06

anteayamestery: and you can vote on the sandbox repo:
SpamapSclarkb: that is what nodepool thinks. The cloud thinks there are no nodes.00:00
SpamapSclarkb: we were unreachable from the internet for about 16 hours so that would be why they may be out of sync :-P00:00
clarkbSpamapS: gotcha. so I should kick it00:00
fungiSpamapS: fungi is fed and responding once again00:00
jeblairclarkb: kick what?00:00
SpamapSfungi: ci wizard needs food00:00
anteayamestery: best case scenario is you have some commenting history on neutron and voting history on some sandbox patches and then apply at a neutron weekly meeting for voting rights00:00
fungicatching up nowish00:00
SpamapSfungi: tripleo shot the food00:00
clarkbgah we made deletes async00:01
clarkbjeblair: I was going to delete all the nodes so that new ones would build00:01
clarkbjeblair: but that is async so it may not help much00:01
anteayamestery: if markmcclain tells infra change OpenDaylight Jenkins to the voting group, then you are changed in gerrit and you can vote00:01
jeblairclarkb: ok.  yeah, the ones in delete state should be being deleted.00:01
jeblairclarkb: the ones in building will switch over after 8 hours00:01
clarkbjeblair: so leave it as is?00:02
jeblairclarkb: i'd look into why the deleted nodes aren't deleting00:02
jeblair(but i don't think anything needs "kicking")00:03
mordreddhellmann: what's versionutils?00:03
*** blamar has quit IRC00:03
fungiSpamapS: from what i hear, someone shot the food00:04
fungiSpamapS: oh, you beat me to that joke00:04
fungiSpamapS: tripleo warrior is about to die?00:04
SpamapSapparently fungi is async too00:04
marunseen in an experimental jenkins job that attempts to run tox as the tempest user: mkdir('/opt/stack/new/neutron/.tox',)00:04
fungiindeed. scrollback in chronological order00:04
fungicaught up now00:04
marunsorry, permission error on that command00:05
clarkb2014-03-06 00:04:33,671 DEBUG nodepool.NodePool:   Deficit: tripleo-precise: 0 (start: 0 min: 35 ready: 35)00:05
clarkbI think that means that 35 nodes are 'ready' but I don't see them00:05
jeblairclarkb: building counts as ready00:05
clarkbjeblair: gotcha00:06
marunShould tox be targeting a different directory or is there a user that can be both sudo and write to the neutron path?00:06
jeblairclarkb: (otherwise, it would always be building)00:06
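[Editor's sketch] The deficit line quoted above (min: 35, ready: 35, deficit: 0) is easier to read once building nodes are counted as ready. The function below is a simplified illustration of that accounting, not nodepool's actual code; the name `deficit` and its parameters are hypothetical.

```python
def deficit(min_ready, ready, building):
    """Hypothetical sketch: nodes still to be launched for one label.

    Building nodes count toward the ready total, so a label whose
    nodes are all still building reports no deficit. Without that,
    the launcher would keep starting new nodes on every pass.
    """
    return max(min_ready - (ready + building), 0)


# 35 wanted, none usable yet, 35 still building: nothing more to launch.
print(deficit(min_ready=35, ready=0, building=35))
# Same label with nothing in flight: all 35 are missing.
print(deficit(min_ready=35, ready=0, building=0))
```

Under this accounting, a cloud where every node is stuck in building looks fully stocked, which matches what clarkb saw.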
fungiclarkb: SpamapS: still 98 in a delete state. i'll try again to clear them if the cloud is really up this time, for reals00:06
SpamapSI'm fine with "wait 8 hours" ... but I was hoping we'd have some idea if the attempts at new nodes would work in 8 hours00:06
jeblairmarun: are you asking "how to sudo in unit tests?"00:06
fungireattempting to delete --now the nodes in a delete state in the tripleo cloud00:07
marunjeblair: There's already a job that should have sudo privileges.00:07
marunjeblair: I'll find the config, hold on.00:07
SpamapSfungi: there are no non-template nodes in the tripleo cloud that I can see00:07
fungiSpamapS: this does not surprise me00:07
clarkbjeblair: what if we delete the rows in the nodepool db manually?00:08
clarkbjeblair: since SpamapS says there is nothing on his end00:08
SpamapSwe have "monitoring" on the tripleo cloud now... and we think we've solved our hardware and driver issues, so hopefully this will be less of a problem going forward.00:08
jeblairmarun: okay, not unit tests then.  carry on.  :)00:08
fungiSpamapS: nodepool is still mostly written based on the assumption that clouds don't fall offline, or when they do they at least do so briefly00:08
SpamapS16 hours isn't brief?00:09
SpamapSas Ng says, we have nine fives of uptime.00:09
fungi(and don't lose track of what they had in the process either)00:09
jeblairSpamapS: sure it is, and 8 hours to recover is brief too.  :)00:09
marunjeblair: any thoughts as to how this job could be made to run?  It's failing when it tries to run tox because the user running the job doesn't have write access to the neutron dir.00:09
fungiSpamapS: that all depends on where you keep your nines00:09
marunjeblair: line 28 in
SpamapSfungi: read it again. We have _fives_00:09
fungiSpamapS: 50.99999% uptime? ;)00:09
SpamapS55.5555555 :)00:10
fungiNINE FIVES00:10
fungigot it00:10
Ngbest SLA troll ever00:10
jeblairmarun: you may want to look at the devstack-gate script.  briefly; the jenkins user has sudo access.00:10
jeblairmarun: i can't dig deeply into it with you right now, sorry.00:10
marunjeblair: fair enough00:10
*** dims has joined #openstack-infra00:10
*** rpodolyaka has joined #openstack-infra00:11
*** rfolco has joined #openstack-infra00:11
jeblairmarun: i _think_ that means that you should be able to "sudo foo" from any of the hooks and it should work00:11
fungikevinbenton: the quixotiists amongst us would rather tilt at the windmills which might increase the number of changes which pass tests00:11
* fungi is a quixotiist00:12
*** cody-somerville has quit IRC00:12
StevenKAre openstack/melange and openstack/python-melangeclient supposed to be un-clone-able ?00:12
marunjeblair: it's not even getting to the point of invoking anything sudo, though.  tox fails trying to create /opt/stack/new/neutron/.tox00:13
clarkbkevinbenton: wouldn't that ignore the fails00:13
clarkbkevinbenton: it is important to remember that these failures are real bugs00:13
marunjeblair: anyway, i'll figure it out00:13
fungiStevenK: they're supposed to be pretty broken and ancient cruft nobody uses any longer00:13
kevinbentonclarkb: no, it would just detect them faster00:13
kevinbentonfungi: i understand :-)00:13
StevenKfungi: They will disappear from ls-projects at some point, then?00:13
*** Ryan_Lane has quit IRC00:13
fungiStevenK: we do not delete history00:14
clarkbkevinbenton: not necessarily as we would lose intertest interactions00:14
kevinbentonclarkb: oh i see. is that the source of some of the bugs?00:14
anteayakevinbenton: multiple bug sources00:14
fungiStevenK: though if they're unclonable, that might already be having the same effect00:14
fungiStevenK: i'll test00:14
anteayaraces, merge optimizations, host optimizations00:15
clarkbkevinbenton: yes I think large portions of them are nova did this thing then later stuff goes ugh00:15
fungiStevenK: yep, pretty darn broken00:15
kevinbentonclarkb, anteaya: i see. i was imagining a node for each high-level tempest group or something along those lines00:16
anteaya5 just merged00:16
fungiStevenK: checking into why00:16
anteayakevinbenton: how is that different from what we are currently doing00:16
anteayawe have a node for each running test job00:16
kevinbentonanteaya: broken down slightly further. like one that runs compute tests, one runs network tests, one runs volume tests, etc00:17
*** alexpilotti_ has joined #openstack-infra00:17
StevenKfungi: I suppose a better question would have been why can't I clone melange and its client00:17
fungiStevenK: oh... i see why00:18
fungiStevenK: we have gerrit configured to replicate it everywhere, but our git server farm only creates repositories listed in
fungiStevenK: so you can clone from but not from
*** alexpilotti has quit IRC00:19
*** Ryan_Lane has joined #openstack-infra00:20
fungiStevenK: i agree we should fix that in one way or another00:20
*** andreaf has quit IRC00:20
StevenKfungi: Hmmmm. Those two things do seem at odds.00:20
*** sandywalsh has quit IRC00:21
fungiStevenK: please file a bug against
fungiStevenK: there are a couple of possible ways to solve this, but it's worth some debate probably00:21
*** alexpilotti_ has quit IRC00:21
fungiStevenK: the main thing i dislike about the current situation is that we're mirroring repositories to github which we're not serving from our own git server farm00:23
*** markwash has quit IRC00:23
fungiand that sends entirely the wrong message in my opinion00:23
jeblairfungi: github is where the crufty stuff is?  :)00:23
fungijeblair: let's stop mirroring current projects to github and let them host our abandoned refuse ;)00:24
fungiStevenK: thanks!00:24
*** mrodden has joined #openstack-infra00:24
*** markmcclain has quit IRC00:25
jeblairfungi: i'd really like to get gerritbot working..00:25
jeblairfungi: i don't see ServerNotConnectedError in any logs except todays00:25
fungijeblair: i'm happy to work on that now00:25
jeblairfungi: my inclination is to revert the change i made.  at least, if i'm charged with fixing it, that's the first thing i'd do00:26
jeblairfungi: because i just want it to work right now, i'm chasing too many other issues to open a new one...00:26
fungipoll... downgrade irclib, release and upgrade gerritbot, or restart it and wait for it to (probably) fail again?00:26
fungii'm fine with backing down irclib00:26
*** packet has quit IRC00:26
kevinbentonanteaya: the reasoning being that with a non-negligible probability of each job failing, it's better to focus more resources towards the top of the gate to fail faster rather than the current depth where a failure wastes at least the same amount of compute resources00:26
*** mgagne has joined #openstack-infra00:26
jeblairfungi: but if someone else really wants to jump down the rabbithole, i'm not objecting...00:26
jeblairi just think we should do one or the other soon.  :)00:27
fungijeblair: i was fine with it while it sounded like clarkb was volunteering ;)00:27
jeblairi'm volunteering to downgrade irclib since i broke it by upgrading00:27
*** yamahata has joined #openstack-infra00:27
fungijeblair: i have absolutely no objections00:27
jeblairfungi, clarkb: so one of you pre-empt me now if you want to do something else.  :)00:27
fungii had not done a time correlation on the traceback occurrence, and agree that it sounds related to the irclib upgrade00:28
clarkbjeblair: fungi: I am ok with it too00:28
clarkbjeblair: was e-r involved as well? it is having trouble but doesn't seem to be ircbot related00:28
jeblairclarkb: it's online now00:28
fungiclarkb: we probably didn't upgrade irclib on status.o.o? (does e-rbot use irclib?)00:29
clarkbfungi: I think it does00:29
fungihmm... maybe then00:29
jeblairit actually should be pretty new anyway on that host00:30
clarkbyeah it probably isn't related. looks like jogo just found a TypeError somewhere00:30
fungiright now pip freeze claims irc==8.5.4 on both systems00:30
*** vkozhukalov has joined #openstack-infra00:32
dansmithclarkb: do we have any plotting of runtimes of nova unit tests?00:32
pleia2fungi: commented on bug 1288485 - if we can do a "sync once" of old projects that won't need to be updated (since they are old, and not updated anymore) one of my solutions on the git server side should work fine00:32
dansmithclarkb: trying to debug something I'm seeing only in the check queue that is on something threading/event/timeout related and wanted to see if it's regularly taking longer (timing out) than other things00:33
*** sabari3 has quit IRC00:33
clarkbdansmith: no, but sdague has a thing that could be used for that00:33
*** sabari has joined #openstack-infra00:33
fungipleia2: maybe. i did just rebuild all the git servers this week and relied on create-cgitrepos to take care of all that for me (worked great by the way)00:33
fungipleia2: i'd be a little hesitant to add more manual steps00:34
pleia2fungi: ah yeah, true for rebuilding00:34
*** denis_makogon has quit IRC00:34
fungipleia2: for reference, the (simplified) replacement steps look like
pleia2fungi: oy00:34
*** bhuvan has joined #openstack-infra00:34
fungipleia2: it was easier to use a cut-and-paste document so i could do it drunk^Weasily00:35
pleia2fungi: so if we create a review.projects.old.yaml to add old stuff to we can edit create-cgitrepos to create the old config too00:35
pleia2I don't remember how the gerrit side of syncing it all works00:36
fungipleia2: i think the simple solution is to just go ahead and add the couple of missing/old projects to review.projects.yaml and stop worrying00:37
fungiwhether those projects appear in one yaml file or separate files isn't really making a substantive statement about their relative viability00:38
fungiso the least complicated solution is to fix the list and move on00:38
pleia2is there a good reason to remove them from that yaml file?00:38
fungii don't really see leaving them out of that file as accomplishing anything, no00:38
* pleia2 nods00:38
*** bhuvan__ has quit IRC00:39
*** bhuvan_ has quit IRC00:39
*** bhuvan has quit IRC00:39
*** jnoller has joined #openstack-infra00:39
*** MarkAtwood has quit IRC00:40
fungiif we want to make a dead projects boneyard down the road, well, we can actually do something to implement that separation in a sane way00:40
fungibut until we do, it's my opinion that there's a lot less work involved in not caring00:40
*** openstackgerrit has joined #openstack-infra00:40
jeblairirc for openstackgerrit is downgraded00:41
fungijeblair: thanks!00:41
StevenKfungi: jogo votes for openstack-boneyard00:41
jeblairnp, sorry for that.  maybe we should just install the new version on eavesdrop.o.o and leave that one alone00:42
pleia2jeblair: thanks for trying to fix it :)00:42
fungipleia2: and if someone proposes a patch to add them to that file for now (with whatever scary admonishing description seems appropriate), i won't hesitate to +2 that00:42
pleia2any bright ideas on how to get a list of "all of them"?00:42
jeblairpleia2: i'm pretty sure the new version will work, i just don't want to find out what _won't_ work about it right now00:42
fungipleia2: gerrit ls-projects00:42
*** bhuvan has joined #openstack-infra00:43
*** bhuvan_ has joined #openstack-infra00:43
fungipleia2: and if that was vague, i meant 'ssh -p 29418 gerrit ls-projects'00:44
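[Editor's sketch] Finding the projects Gerrit knows about that are missing from the replication config is a set difference between the `gerrit ls-projects` output and the configured project names. The helper below assumes both are already in hand as plain text and a list of names. `missing_from_config` is a hypothetical name, and the sample project list is illustrative only.

```python
def missing_from_config(ls_projects_output, configured):
    """Return projects listed by Gerrit but absent from the config.

    ls_projects_output: raw text, one project name per line.
    configured: iterable of project names from the YAML file.
    """
    listed = {
        line.strip()
        for line in ls_projects_output.splitlines()
        if line.strip()
    }
    return sorted(listed - set(configured))


# Illustrative data only; real input would come from ls-projects.
sample_output = """openstack/nova
openstack/melange
openstack/python-melangeclient
"""
print(missing_from_config(sample_output, ["openstack/nova"]))
```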
dansmithclarkb: do you know if sdague has a dansmith-is-an-idiot detector?00:44
pleia2fungi: no worries, I got that :)00:44
fungidansmith: come closer and we'll find out00:44
*** adrian_otto has quit IRC00:44
*** sarob_ has quit IRC00:45
*** sarob_ has joined #openstack-infra00:45
*** hogepodge has quit IRC00:45
*** amcrn has quit IRC00:46
fungii are mailing list fail. it's now march 6th utc and i've found time to skim and delete threads starting on the -dev ml up to february 27th now00:46
fungii'm weighing declaring ml bankruptcy against knocking off for the evening and hoping tomorrow finds me somehow less busy00:47
*** dprince has quit IRC00:48
fungii'll do some code review first so i don't feel completely useless00:48
*** MarkAtwood has joined #openstack-infra00:48
*** zhiwei has joined #openstack-infra00:48
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Add a script to manage IRC perms
*** derekh has quit IRC00:49
jeblairbtw, i ran that for real on the channels in the config file00:49
fungithat one ^ was on the priority list! ;)00:49
*** wchrisj has quit IRC00:49
*** sarob_ has quit IRC00:49
jeblairfungi: that's a new one -- it's the followup that actually makes changes00:49
fungiokay, both are on the priority list in that case00:50
*** sarob_ has joined #openstack-infra00:50
jeblairnext up is add 50 more channels to the config and then finally normalize everything00:50
*** wchrisj has joined #openstack-infra00:50
jeblairreed: ttx: you two are about to get ops on all openstack channels.  anyone else you think should be a global op (to deal with spammers, etc?)00:50
jeblair(by about to, i mean within the next few days maybe)00:51
*** jcoufal has quit IRC00:52
fungiStevenK: has a point... ttx and reed and infra give us okay coverage over one hemisphere, roughly00:52
fungiat least somebody in apac would probably be a good addition00:53
*** rpodolyaka has quit IRC00:53
openstackgerritA change was merged to openstack-infra/elastic-recheck: Add query for neutron db migration conflicts.
*** MarkAtwood has quit IRC00:54
fungii have no idea if lifeless has time to be (or interest in being) irc police however00:54
StevenKfungi: lifeless already has experience with #ubuntu-* ops00:55
openstackgerritClark Boylan proposed a change to openstack-infra/elastic-recheck: Don't get loggers until after log setup
StevenKWhich I why I put him forward00:55
StevenKs/Which I/Which is/00:55
fungiStevenK: might be all the more reason he'd want to decline the honor ;)00:55
StevenKfungi: Possibly.00:56
StevenKfungi: ITYM 'honour'? :-P00:56
anteayakevinbenton: now I understand the design you are proposing better00:56
* StevenK tries to add a 'u' to responsibility, fails00:56
anteayakevinbenton: I'm stuck on what you mean by non-negligible probability of each job failing00:56
openstackgerritClark Boylan proposed a change to openstack-infra/config: Use the correct qualname for recheckwatchbot
fungiStevenK: just pronounciate it with your best paul hogan impersonation00:57
clarkbjogo: ^ and that will fix the other issue I have found00:57
StevenKfungi: "That's not a knife!"00:57
*** bhuvan___ has joined #openstack-infra00:58
jeblairclarkb: any chance we can do as a revert?00:58
*** vkozhukalov has quit IRC00:58
clarkbjeblair: looking00:58
jeblairclarkb: is the change00:58
clarkbjeblair: maybe? was that inserted in a weird way?00:58
clarkbjeblair: thanks00:58
kevinbentonanteaya: i mean that there is still a decent chance that a job will fail on something like check-tempest-dsvm-neutron-isolated00:58
fungieven just fashioning it as a revert would make the situation a little more clear, i think00:59
clarkbjeblair: sort of, there are places in that that are broken too00:59
jeblairclarkb: oh?00:59
clarkbjeblair: and the LOG object has been cargo culted into other places00:59
lifelessI am ok with being an IRC op00:59
clarkbjeblair: the class members for log load at import time too00:59
*** sarob__ has joined #openstack-infra01:00
jeblairclarkb: i'm fine with it being an instance var, but still, why is that a problem?01:00
clarkbjeblair: because incremental loading of logging configs doesn't work well01:00
clarkbjeblair: so we have to make sure that setup_logging executes before we grab the loggers01:00
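[Editor's sketch] The thread does not settle on the root cause, but one well-documented way that ordering matters in Python's standard library is the `disable_existing_loggers` option, which defaults to True in both `logging.config.fileConfig` and `logging.config.dictConfig`. Loggers created before the config call are disabled unless the config names them or one of their ancestors. The logger names below are hypothetical, and this is not a claim about what elastic-recheck's config actually contained.

```python
import logging
import logging.config

# Created at "import time", before any logging configuration is loaded.
early = logging.getLogger("demo_early")

# Stand-in for setup_logging(): a config that does not mention
# "demo_early" and leaves disable_existing_loggers at True.
logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": True,
    "root": {"level": "DEBUG"},
})

# Created after configuration, so it is left enabled.
late = logging.getLogger("demo_late")

print("early disabled:", early.disabled)
print("late disabled:", late.disabled)
```

Naming the logger (or a parent of it) in the config, or setting `disable_existing_loggers` to False, keeps an early logger working regardless of whether it lives at module or class level.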
clarkbalso I love the commit message on 6656401:01
*** bhuvan_ has quit IRC01:01
*** prad has joined #openstack-infra01:01
jeblairclarkb: ok, i got that... i guess i'm confused as to why we've never seen an issue with class-level vars01:01
*** wenlock has quit IRC01:01
clarkbjeblair: hrm, I may be misreading the python logging docs01:01
clarkbin which case I don't know why the module level LOG isn't working01:01
lifelessfungi: any chance you can peek at nodepool logs?01:02
clarkbbut that may be explained by
*** yamahata_ has joined #openstack-infra01:02
*** bhuvan has quit IRC01:02
jeblairclarkb: basically, yeah, i thought that class level vars were a solution to the problem you are describing01:02
*** zhiwei has quit IRC01:02
*** sarob_ has quit IRC01:02
fungilifeless: which nodepool logs. image logs or operating logs?01:02
clarkbjeblair: oh so it is a problem at module level but not class level?01:02
anteayakevinbenton: isolated jobs for neutron have been removed:
jeblairclarkb: that is a dupe of
lifelessfungi: well there is a template thats been building for some time; and we don't have any tripleo slaves01:02
clarkbjeblair: thanks01:03
*** bhuvan___ has quit IRC01:03
*** bhuvan__ has quit IRC01:03
clarkbjeblair: clearly I should've asked you about e-r logging before I debugged :)01:03
openstackgerritElizabeth Krumbach Joseph proposed a change to openstack-infra/config: Add back old projects to replicate to git.o.o
pleia2fungi: that do? ^01:03
jeblairclarkb: call before you dig?01:03
clarkbjeblair: indeed01:03
clarkbthere could be danger below01:03
pleia2(fortunately it is only those two that are in ls-projects but not on git.o.o)01:04
lifelessfungi: so there are arguably two issues we'd like to fix, SpamapS and I are hanging around in the hope we'll get it fixed01:04
fungilifeless: it'll be faster for me to sift through the logs. looking now01:04
kevinbentonanteaya: whoops. gate-tempest-dsvm-neutron-pg was the one that died on me this morning01:04
*** sarob__ has quit IRC01:05
anteayakevinbenton: on what failure? can I see the logs?01:05
clarkbjeblair: I am still trying to wrap my head around why class level loggers work01:05
*** SumitNaiksatam has quit IRC01:05
clarkbjeblair: those statements are executed at import time just like the module level loggers right?01:05
*** SumitNaiksatam has joined #openstack-infra01:05
jeblairclarkb: i'm curious too, but i think i need to forage for food now01:05
kevinbentonanteaya: looks like the cirros ssh timeout bug01:06
clarkbjeblair: but I think a revert would be cleaner than a second commit to do additional cleanup in the cargo cult areas01:06
jeblairfungi: heads up, i talked to someone at rax today who has a team of folks wanting to build a gerrit replacement on openstack principles and in an open manner like storyboard; he should be posting to the ml soon01:07
jeblairclarkb: ^01:07
clarkbjeblair: I am not sure what to think of that01:07
jeblairi think it's pretty exciting01:07
clarkbon one hand yay sane upstream. on the other this thing over here works01:07
fungijeblair: awesome--they approached me a few days ago as well and i told them "mailing list" too01:07
fungijeblair: but sounded exciting!01:07
clarkbgerrit does not have the same sort of problems that the bug tracker world have01:08
*** atiwari has quit IRC01:08
*** sabari has quit IRC01:08
jeblairclarkb: it has a different set... we've certainly had our issues with it...01:08
fungijeblair: i started to point them at storyboard as an example of a related/parallel effort and was pleased to discover they were already aware and patterning some of their execution plan after that (and wanted to see them integrate well)01:08
clarkbjeblair: right but the set it has don't necessarily scream fork to me01:09
clarkbor even reinvent01:09
* clarkb points at our effort to stop running a fork as evidence01:09
fungiclarkb: i get the impression that one of the reasons it's happening is that a code review system was proposed internally, gerrit was held up as best-of-breed and some one (or ones) panned it because it was written in java01:10
anteayakevinbenton: yes also it ran on hpcloud-az2, which is displaying some very difficult behaviours around image builds01:10
kevinbentonanteaya: so what i'm getting at is if every job has something like a 1/6 chance of dying, a job 6 deep in the queue has like a 66% chance of failing or having one of its parents failing01:10
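[Editor's sketch] The 66% figure follows from treating each of the six changes as an independent trial with failure probability 1/6, so the chance that at least one fails is 1 - (5/6)^6. The independence assumption and the 1/6 rate are kevinbenton's hypotheticals (clarkb calls the rate a large overestimate further down), and the function name is made up for illustration.

```python
def chance_of_any_failure(p_fail, depth):
    """Probability that at least one of `depth` independent jobs fails."""
    return 1 - (1 - p_fail) ** depth


# A change 6 deep in the gate, each job failing 1 time in 6.
print(round(chance_of_any_failure(1 / 6, 6), 3))  # 0.665
```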
jeblairclarkb: i'm sure you'll admit we've had considerable trouble getting our patches in upstream.01:10
clarkbjeblair: yup no argument there01:10
anteayakevinbenton: but why is it dying?01:10
kevinbentonanteaya: ah, maybe i'm overestimating the failures like these timeouts then01:11
clarkbkevinbenton: way overestimating01:11
anteayakevinbenton: the reasons for the test failures are myriad and moving01:11
openstackgerritMonty Taylor proposed a change to openstack-infra/storyboard: Handle yaml files updates
openstackgerritMonty Taylor proposed a change to openstack-infra/storyboard: Make project description longer
clarkbkevinbenton: we are really stable right now oddly01:11
clarkbusually a bunch of broken happens during feature freeze01:11
anteayadue to load01:12
anteayaflushing out bugs we never knew we had, because load01:12
openstackgerritMonty Taylor proposed a change to openstack-infra/config: Load storyboard projects from projects.yaml
jeblairclarkb: anyway, i think the case for this will come out once we start looking at it.  i mostly wanted to convey that i thought the way they want to approach it is really good.01:12
clarkbjeblair: I agree that the approach is good01:12
kevinbentonclarkb, anteaya: is there any way to see stats on the current ratio of rejects to accepts?01:12
clarkbjeblair: and like the idea of an actually responsive upstream for a code review tool01:13
anteayakevinbenton: I'm not saying that talking about improvements isn't a good idea, we welcome those discussions01:13
clarkbespecially when you consider stuff like change screen 201:13
clarkbbut, there are a bunch of tools out there including gerrit and they work supposedly01:13
*** Ryan_Lane has quit IRC01:13
dstufftif killing gerrit means there is a review api that doesn't open a tab for each file I'm for it01:13
dstufftnot api01:13
anteayakevinbenton: you might like
dstufftI'm dumb today01:13
anteayaI wish I knew how to use it better01:14
clarkbdstufft: its ok, so am I01:14
jeblairdstufft: someone wrote a patch for that but it didn't get included because of the gerrit CLA.01:14
clarkbdstufft: we could start a support group01:14
anteayakevinbenton: jeblair sdague fungi clarkb and jogo make the nicest graphs I see01:14
jeblairanyway, food now.01:14
* fungi thinks we're throwing stones in glass houses complaining about a cla ;)01:14
jeblairfungi: i complain about ours more than anyone elses.01:15
dstufftjeblair: a patch that isn't applied is pretty useless to me unless I convince people to apply said patch :D01:15
anteayakevinbenton: maybe on an after ff day like next week one of them might be able to make some suggestions01:15
StevenKjogo: But this does?01:15
fungijeblair: fair point!01:15
jogosilly StevenK01:15
dstufftI never understood why people care about a CLA01:15
jeblairdstufft: it won't apply now.  it's too old.  i mostly wanted you to have another piece of anecdotal evidence that CLAs are bad for free software projects.  i think it's important everyone knows that.  :)01:15
jeblairthis is why i care.01:15
dstufftjeblair: why are they bad, because some people won't sign them? ;P01:15
fungi(especially people working on free software projects)01:16
jeblairdstufft: all for nothing.01:16
fungidstufft: they complicate things. (probably) entirely unnecessarily01:16
fungithough they do make lawyers feel better about themselves01:16
StevenKStuff and other stuff jogo01:17
anteaya13 in the gate and 16 in post01:17
jeblairfungi: true, one thing they certainly do is increase billable hours.01:17
jogoanteaya: it looked like the neutron stuff was the culprit in the end01:17
StevenKSoon to be 10, I think01:18
fungilifeless: unfortunately nodepool is logging when it starts launching a new node in the tripleo cloud, but then basically nothing thereafter... there are 35 nodes it thinks are in a "building" state at this point, but i haven't yet found an explanation for why none of them are transitioning to a ready state. digging deeper01:18
anteayado expand01:18
anteayapersonally I have been very proud of the way neutron has been addressing issues01:18
anteayathey haven't been blocking the gate for others, that I have seen01:18
anteayaand they are responsive when asked to address issues01:18
anteayaI have been away all day01:19
anteayaand have missed the context you are referencing jogo01:19
anteayabut am here now and interested in listening01:19
dstufftjeblair: fungi well I don't know much, but VanL says Python needs one and I trust VanL :)01:21
*** jcooley_ has quit IRC01:23
*** rpodolyaka has joined #openstack-infra01:24
fungiSpamapS: lifeless: so this seems to be the situation... nodepool is configured to keep at least 35 nodes from tripleo on hand, and currently doesn't see a demand for more than that. it *thinks* it's building 35 currently (the oldest has been building for over 7 hours and the youngest for right at 1 hour). it won't try building more until one of those reaches the 8 hour timeout or the demand for nodes01:25
fungigrows past 35. and it doesn't have any way to force building nodes to a delete state other than the 8 hour timeout or hearing back from the provider that the build failed01:25
* anteaya makes tea and hopes jogo returns because she would like to know what he is talking about01:25
fungiSpamapS: lifeless: how recently should it have actually started working?01:26
StevenKfungi: So is it getting 404's, or is it actually talking correctly?01:27
StevenKfungi: Since the tripleo cloud says it has no nodes01:28
fungiStevenK: i see no evidence it's getting a 404. it may not be getting a completion or socket closure from the nova boot call... i'll see how many established sockets we have to the api endpoint01:28
fungihmm... one established socket to but i guess it reuses an open socket so that was no help01:29
mordreddstufft: lawyers tend to say that people need things that make lawyers more money01:30
clarkbjogo: so I have WIP'd my er change because I think we should revert sdague's change so that it is clear that it needs to be one way and not the other01:30
mordredsdague: it's hard to remember sometimes that lawyers work for us and not the other way around01:30
mordreddstufft, not sdague01:30
*** dkliban has quit IRC01:31
fungimordred: you only say this because you're married to a law major01:31
*** jnoller has quit IRC01:31
kevinbentonanteaya: thanks for the link01:31
anteayakevinbenton: np01:31
dstufftmordred: well IANAL but it seems to me the case against CLA hinges on an implicit license of some contribution just because the project that the patch was against has a particular license01:31
dstufftI don't think you can reasonably assert that legally though :/01:32
anteayaand you bring good thoughts, just hard to find fertile ground for gate redesign today01:32
anteayaweary and all01:32
*** jcooley_ has joined #openstack-infra01:32
StevenKfungi: Apparently, there is a template build underway, can you talk to the node that is doing that?01:32
StevenKfungi: ssh, ping, etc01:33
fungiStevenK: nodepool does not believe that it is currently building a template. it knows about two ready images for which the template server is assumed to be long gone01:34
fungiStevenK: the two images it's aware of are from roughly 3 and 7 days ago respectively (it tries to build new images nightly, but the persistent outages have impeded that)01:34
lifelessfungi: I'll delete the template thats building too ?01:35
*** nosnos has joined #openstack-infra01:35
lifelessfungi: uuid was 56fe3f9e-365e-4609-a70e-3c171aba3fba01:36
fungilifeless: okay, good to know. is it possible that the several-day-old image it's trying to boot new nodes from is broken somehow, in ways that are causing it not to get notification that nova boot is failing?01:36
lifelessfungi: I don't think so01:36
*** thuc has quit IRC01:36
fungilifeless: i can try to create an updated image and find out what (if anything) breaks01:36
*** thuc has joined #openstack-infra01:37
*** zhiwei has joined #openstack-infra01:37
*** jcoufal has joined #openstack-infra01:37
lifelessfungi: +101:37
fungistarted new build now01:37
lifelessnova list shows a template started01:38
lifelessI can ping the template01:38
fungithat's good. so far i have no appreciable output, but generally wouldn't until the ssh interaction begins01:38
lifelessok so if nodepool thinks it has 35 nodes01:38
lifelessfungi: I can recheck a couple of things01:38
funginodepool thinks 35 nodes are currently in the process of building01:39
fungisome started slightly over an hour ago01:39
*** jcooley_ has quit IRC01:39
funginodepool has logged into and is puppeting the template in progress now01:39
*** jcooley_ has joined #openstack-infra01:40
lifeless    check-tripleo-seed-precise NOT_REGISTERED01:40
lifeless    check-tripleo-undercloud-precise NOT_REGISTERED01:40
lifeless    check-tripleo-overcloud-precise NOT_REGISTERED01:40
lifelessfungi: I guess that means jenkins doesn't think it can run the job at all ?01:40
fungilifeless: that's because none of the jenkins masters have had any tripleo-precise nodes added to them since we restarted zuul01:40
*** stevebaker has quit IRC01:40
*** stevebaker has joined #openstack-infra01:40
funginormally the jenkins masters register jobs associated with node labels into zuul's gearman server when nodes which can run them are added01:41
*** thuc has quit IRC01:41
zhiweifungi: hi01:41
lifelessfungi: is there a visible log of that template build01:42
zhiweiI saw there is no rename stackforge project in #infra meeting agenda.01:42
fungizhiwei: it's on the agenda but wasn't discussed during the meeting due to time constraints. outside the meeting we did touch base with SergeyLukjanov who is the savanna ptl and he mentioned that they're waiting on foundation legal feedback before they can settle on the final name for theirs, so probably next week or the01:43
fungiweek after would be my guess01:43
*** dprince has joined #openstack-infra01:44
*** Ryan_Lane has joined #openstack-infra01:44
*** vkozhukalov has joined #openstack-infra01:44
fungilifeless: i don't think that client-initiated image updates end up in the image logs, though they might. i'm just watching the console spew from it01:44
zhiweiok, thanks. This process blocked too long.01:45
lifelessfungi: is it progressing?01:45
openstackgerritA change was merged to openstack-infra/config: Log recheckwatchbot messages
fungilifeless: it _was_ but has stopped squawking and gone silent. the last few lines were from the start of the run...
fungiit's been sitting there for several minutes now01:47
fungii'll ssh into it and see what's happening01:47
*** SumitNaiksatam has quit IRC01:47
*** Ryan_Lane has quit IRC01:48
fungilifeless: oh, while i was fiddling with getting the ip address and correct ssh key, it updated01:51
fungiso it *does* seem to be progressing after all01:51
*** ryanpetrello has joined #openstack-infra01:52
*** gokrokve has joined #openstack-infra01:53
lifelessok, so I think we need to go01:54
fungiit's moved on past puppeting to git repo caching01:54
lifelessto get to the group dinner01:54
lifelessits clearly a working cloud01:54
fungilifeless: hop to it, and i'll leave you updates in scrollback01:54
lifelesshopefully nodepool will sort its stuff out overnight.....01:54
clarkbfungi: before I forget how do I check for the nova git thing timing out when merging in zuul?01:54
clarkbfungi: would like to check to see if any happened this afternoon after the restart01:54
*** sabari has joined #openstack-infra01:55
fungiclarkb: sudo grep "did not appear in the git repo" /var/log/zuul/debug.log01:55
fungior debug.log.2014-03-0501:55
fungilast hit was 17:58:4601:55
clarkb2014-03-05 17:58:46,913 is the timestamp for the last one01:55
clarkbit appears to be much happier now01:55
kevinbentonclarkb, antaeya: i made a graph that i think shows the probability of a failure. it compares "pipeline gate total changes" with the "gerrit event change merged" by day01:56
fungilike clams in... clamato01:56
kevinbentoncan't seem to export a link to the graph though01:56
kevinbentonhere is the image01:56
kevinbentonhere is the data string01:57
kevinbentondivideSeries(diffSeries(summarize(stats_counts.zuul.pipeline.gate.total_changes, "1day"), summarize(stats_counts.gerrit.event.change-merged, "1day")), summarize(stats_counts.zuul.pipeline.gate.total_changes, "1day"))01:57
reedjeblair, re: ops, add fifieldt too01:57
anteayakevinbenton: can I get a shortened link?01:57
fungireed: jeblair: good point. he's even in apac!01:58
anteayakevinbenton: weechat can't handle links that linewrap01:58
*** harlowja has quit IRC01:58
* anteaya clicks01:58
fungi"ZeroDivisionError: integer division or modulo by zero"01:59
fungimaybe i pasted the link parts together incorrectly01:59
kevinbentonthe one should work01:59
anteayakevinbenton: what is the vertical axis?01:59
fungiyeah, i must have. the link shortener one works for me01:59
kevinbentonanteya: ratio of failures to successes01:59
fungihigh ratio bad02:00
kevinbentonanteaya: low ratio is good02:00
fungikevinbenton: good to see that we're trending downward!02:00
kevinbentonanteaya: sorry i keep messing up your name. my fingers don't like all of the vowels :-)02:00
anteayait seems to have data for thursday and friday02:01
anteayaI thought it was wednesday today02:01
anteayakevinbenton: np02:01
anteayaI usually do ke tabcomplete for you02:01
fungianteaya: it's been thursday for a couple hours now02:01
anteayaor an tab complete for me02:01
anteayaand anne gentle and I get each others messages a lot02:01
anteayait is thursday02:02
anteayaso it is 2:20 utc02:02
fungi2:02 according to my sun dial02:02
kevinbentonanteaya: whoops, i think i didn't have the end date set right02:02
anteayayes me as well02:02
* anteaya clicks again02:02
clarkbkevinbenton: so that will get you an upper bound02:03
clarkbkevinbenton: but not an exact number because changes can be removed from the pipeline without merging for reasons other than flaky tests02:03
clarkbkevinbenton: if you push a new patchset the one in the gate pipeline is removed, if a reviewer -2's a change it won't merge after testing02:03
kevinbentonclarkb: yeah, i tried to find a stat for gate job failure so it was more explicit02:03
clarkbkevinbenton: but that should give you a reasonable upper bound02:03
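The ratio in kevinbenton's Graphite expression, and clarkb's point that it is only an upper bound, can be sketched in Python roughly as follows. The function name and the sample counts are hypothetical, not real zuul stats; the sketch assumes one pair of daily counters (changes entering the gate, changes merged) and returns None for a day with no gate changes rather than dividing by zero.

```python
from typing import Optional


def gate_failure_upper_bound(total_changes: int, merged: int) -> Optional[float]:
    """Return (total - merged) / total for one day.

    This is only an upper bound on the failure rate: changes also leave
    the gate without merging when a new patchset is pushed or a reviewer
    leaves a -2, neither of which is a test failure.
    """
    if total_changes <= 0:
        # No changes entered the gate that day, so the ratio is undefined.
        return None
    return (total_changes - merged) / total_changes


if __name__ == "__main__":
    # Made-up daily counts: (changes entering the gate, changes merged).
    daily = {"day1": (200, 150), "day2": (0, 0)}
    for day, (total, merged) in daily.items():
        print(day, gate_failure_upper_bound(total, merged))
```

A per-project breakdown would need separate counters per project, which this sketch does not assume exist.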
fungiclarkb: kevinbenton: in fact, we just today realized that openstack/requirements merges and the requirements proposal job are a major culprit there02:04
*** mrodden has quit IRC02:04
*** harlowja has joined #openstack-infra02:04
clarkbfungi: ya I should fix that when I have a minute02:04
clarkbI think I have rewritten that script about 4 times now. I should feel bad02:04
fungiat least when the gate is longer, the chances of requirements sync changes being removed from the gate by new patchsets is higher02:04
kevinbentonclarkb: that's the other issue, this would probably vary a lot from project to project, right?02:05
anteayakevinbenton: yes, depends on the bug du jour02:05
fungikevinbenton: significantly. especially keeping in mind that we host a lot of projects whose jobs are not part of the main integrated gate queue02:06
anteayaand an oslo change broke all of nova for a while yesterday02:06
anteayakevinbenton: until there was a new config.sample merged02:06
fungianteaya: twice in fact ;)02:06
*** thedodd has joined #openstack-infra02:06
anteayathe first was the oslo change with the dependency02:06
anteayathe second was the stale config file02:06
kevinbentonis there a task specifically that i can look in the stats for that marks a job as failed?02:07
anteayayay, starting to catch on02:07
fungikevinbenton: there is, but not all jobs which are run in the gate end up being significant since changes can be retested when other changes ahead of them fail02:07
clarkbkevinbenton: ya notmyname has a thing together02:07
clarkbkevinbenton: he has it hosted somewhere too02:08
fungikevinbenton: further complicated by the fact that when jenkins cancels jobs for a variety of other reasons, those can also often be incorrectly reported as a job failure02:08
kevinbentonfungi: well i would want to include that case02:08
kevinbentoni'm looking for the things that trigger the downstream jobs to have to get rebased and restarted02:09
fungikevinbenton: why? if a change's running jobs are cancelled and retried, that doesn't necessarily imply a separate event. it's usually a symptom of a failure of a change further ahead02:09
*** SumitNaiksatam has joined #openstack-infra02:10
*** Ryan_Lane has joined #openstack-infra02:10
*** jcooley_ has quit IRC02:11
* clarkb AFKs02:12
kevinbentonfungi: right. but resets is what i'm looking for because it supports the notion of focusing more resources at the top of the gate to get to success or failure faster02:12
kevinbentonfungi: because a reset is wasted compute resources for the downstream jobs02:12
fungikevinbenton: so, we've already made some very recent changes which do just that02:13
kevinbentonfungi: cool! details?02:13
fungithere's a scaling heuristic which decides how many or how few of the changes at the front of the gate should be tested, and varies by the recent pass/fail frequency of other changes02:13
kevinbentonanteaya: thanks, that link was exactly what i was trying to make02:14
fungialso, zuul now knows that as soon as a change has at least one failing job, it and its dependent changes (if any) should step aside from the main series and allow changes to shift forward to be tested on top of the other changes which are already succeeding02:14
kevinbentonfungi: so what i was getting at earlier was dividing the tempest tests for a single change across more compute nodes02:15
kevinbentonfungi: so a job can pass/fail within 30 minutes instead of an hour or whatever02:16
fungipart of the challenge is that we have limited compute resources with which to accomplish this, and need to try not to starve check and other pipelines while still prioritizing resources assigned to jobs for changes in the gate pipeline02:16
*** morganfainberg is now known as morganfainberg_Z02:16
fungikevinbenton: but yes, we've approached the potential for distributed testing of long-running jobs. i think sdague and mtreinish may have some details on that front02:17
*** malini is now known as malini_afk02:17
fungii know several ways of accomplishing that were discussed. i think their analysis concluded that overall throughput would most likely diminish because of setup/teardown overhead eating into the overall capacity02:18
*** Ryan_Lane has quit IRC02:18
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard: [WIP] Updated oslo
fungipartly because tempest depends on building a cloud from scratch incorporating the proposed changes, and that part takes a nontrivial amount of time, and would have to be done on each discrete unit where part of the test was being performed02:19
*** david-lyle has joined #openstack-infra02:20
fungiso the more widely we distribute that load, the more overall capacity we lose to setup overhead02:20
*** krotscheck has quit IRC02:21
fungiin a nonlinear envelope02:21
*** chandan_kumar has joined #openstack-infra02:21
*** dprince has quit IRC02:22
fungithere might be a sweet spot where that overhead is balanced by the estimated chance a change would suffer from a reset due to a failing change ahead. the task of working out where that moving point is from one moment to the next seems fairly daunting02:23
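The trade-off fungi describes can be illustrated with a deliberately simple model, sketched here in Python. It assumes every worker pays the same fixed setup cost (building the cloud from scratch) and that the tests split evenly; the function name and the 20/60 minute figures are hypothetical. Real overhead may grow faster than this linear model suggests.

```python
from typing import Tuple


def split_cost(setup_min: float, test_min: float, workers: int) -> Tuple[float, float]:
    """Return (wall_clock_minutes, node_minutes) for one change.

    Each worker repeats the full setup, then runs an equal share of tests.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    wall_clock = setup_min + test_min / workers
    # Total capacity consumed: every worker is busy for the whole wall clock.
    node_minutes = workers * wall_clock
    return wall_clock, node_minutes


if __name__ == "__main__":
    # Hypothetical numbers: 20 minutes of setup, 60 minutes of tests.
    for n in (1, 2, 4):
        print(n, split_cost(20, 60, n))
```

In this toy model, going from one worker to four cuts wall-clock time from 80 to 35 minutes but raises the capacity consumed from 80 to 140 node-minutes, which is the throughput loss being discussed.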
fungi(former math theory major hat on)02:24
kevinbentonfungi: that's where the average daily gate failure rate comes into play :-)02:24
kevinbentonfungi: as that increases, the more resources shift towards the top02:24
*** ryanpetrello has quit IRC02:24
anteayakevinbenton: you are assuming stability where there is not so much02:24
anteayahost dns issues02:25
anteayaimage building issues02:25
anteayamirror issues02:25
fungikevinbenton: i think daily is far too coarse. we get incidents which break all gating for short periods of time while we scramble to solve the underlying cause, and then long periods of relative calm with background noise from nondeterminism in some tests02:25
anteayathere is much we do to address the situation02:25
anteayaand we have to constantly have flak jackets on for stuff that happens that we never even thought of02:25
*** jcoufal has quit IRC02:25
kevinbentonanteaya, fungi: ah, i didn't realize how much firefighting was involved :-)02:26
fungirather a lot02:26
anteayamuch firefighting02:26
anteayatwo weeks ago was fun02:26
anteayajenkins upgrade downgrade02:26
*** dstanek has quit IRC02:26
anteayathat took all of a day02:26
anteayaand the gate was basically at a stand still02:26
anteayaand fungi was doing the heavy lifting02:27
openstackgerritMonty Taylor proposed a change to openstack-infra/storyboard: Make project description longer
fungibasically, as a combined meta-project with more than a hundred outside dependencies, any one of which could spontaneously make a broken release, and relying on network communication over the internet and between cloud hosts which can sometimes be flaky, there's a lot of running from one emergency to the next02:27
anteayawhile the rest of us do what we could to help02:27
*** nati_uen_ has quit IRC02:27
fungimuch of our non-firefighting work goes into finding ways to increase scalability and robustness of these systems to help mitigate the emergencies we can up front, but we can't really predict what the next day will bring with much certainty02:28
*** bada_ has joined #openstack-infra02:28
*** bada has quit IRC02:28
anteayakevinbenton: and optimizing something breaks or causes a bottleneck somewhere else02:28
anteayapretty much every time02:29
fungikevinbenton: that's precisely what the current zuul dependent pipeline windowing was patterned after in fact... tcp slow-start02:29
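A windowing heuristic of the kind fungi mentions might look roughly like the Python sketch below: the number of changes tested at the front of the gate grows slowly while changes pass and shrinks sharply when one fails, in the spirit of TCP congestion control. The function name and the default floor, increase and decrease values are illustrative assumptions, not zuul's actual code or configuration.

```python
def next_window(window: int, passed: bool, floor: int = 3,
                increase: int = 1, decrease_factor: int = 2) -> int:
    """Return the new window size after one change at the head reports.

    Additive increase on success, multiplicative decrease on failure,
    never dropping below the floor.
    """
    if passed:
        return window + increase
    return max(floor, window // decrease_factor)


if __name__ == "__main__":
    window = 20
    # Hypothetical sequence of gate results: True = merged, False = failed.
    for result in (True, True, False, False, False, True):
        window = next_window(window, result)
        print(result, window)
```

With these assumed parameters a run of failures quickly narrows testing to the first few changes, so fewer downstream jobs are wasted on resets.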
anteayaand we have ddosed ourselves more than once with an optimization that did the broken thing very quickly02:29
*** sabari has quit IRC02:30
kevinbentonfungi: so the only missing part then is piling on some extra instances ;-)02:30
anteayakevinbenton: got any to spare?02:30
kevinbentonanteaya: yeah, i can imagine with that many compute nodes ddosing wouldn't be hard02:30
fungikevinbenton: yep. we're in constant talks with our current generous resource donors about quota increases, and always entertaining offers from other interested parties02:31
anteayahappens pretty quick02:31
anteayathen we have to figure out why, very fast02:31
anteayathen how to apply the fix02:31
*** zhiyan_ is now known as zhiyan02:31
kevinbentonanteaya: unfortunately no extra servers here02:31
anteayabringing down the system is not an option, we have to restart pieces and time them correctly02:31
fungikevinbenton: next time you're bored, tell 1000 of your servers to clone the same 80mb git repo simultaneously ;)02:31
anteayajust to see what happens02:32
kevinbentondoes github block IPs speaking of which?02:32
fungikevinbenton: they do have throttles, but we don't use github (partly for that reason, and also because they're not free software)02:32
kevinbentoni noticed our BigSwitch-CI has trouble cloning the repos sometimes02:32
anteayawe use cgit02:32
kevinbentonis that what i should be cloning from?02:33
anteayaalso known as git.o.o02:33
fungiyou should clone from wherever you want to, honestly02:33
*** jnoller has joined #openstack-infra02:33
fungii'm not going to claim that lacks any single points of failure02:33
kevinbentonwell will reset my connections if i'm cloning quite a bit?02:33
fungithough i did just grow and upgrade the server farm there this week02:34
anteayayou did02:34
anteayakevinbenton: only when we ddos ourselves02:34
anteayaso usually, it shouldn't02:34
anteayaand if it does, tell us02:34
kevinbentonk, and shared fate with the regular jenkins test isn't a bad thing02:34
anteayayou might be the canary in the coal mine02:34
fungikevinbenton: we don't throttle it, no, but we recommend you pre-cache within your network when possible. it'll speed things up for you if you can (we do, and then just pull updates since the last nightly cache on our slave images)02:34
fungiway faster than doing a full clone of each project on every job02:35
kevinbentonfungi: it is based on a cache, but it's getting to be a couple weeks old now02:35
kevinbentonfungi: did an initial devstack run to get everything cloned02:35
kevinbentonfungi: then just set the RECLONE=True option so it just pulls updates02:36
*** jnoller has quit IRC02:36
fungikevinbenton: we clone all the projects once each night for each type of server image in each provider and then pull updates during the day when jobs use servers built from those images. it's been working well so far02:36
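The clone-once, update-afterwards pattern described above could be sketched in Python along these lines. This is a minimal illustration, not the infra team's scripts: the function names and the cache layout (one directory per repository under a cache root) are assumptions, and it shells out only to the standard `git clone` and `git fetch` commands.

```python
import os
import subprocess
from typing import List


def cache_command(repo_url: str, cache_root: str) -> List[str]:
    """Pick the git command that brings the local cache of repo_url up to date.

    A full clone is only needed when the cache directory does not exist yet;
    afterwards a fetch transfers just the new objects.
    """
    name = repo_url.rstrip("/").split("/")[-1]
    if name.endswith(".git"):
        name = name[:-len(".git")]
    path = os.path.join(cache_root, name)
    if os.path.isdir(os.path.join(path, ".git")):
        return ["git", "-C", path, "fetch", "--all", "--prune"]
    return ["git", "clone", repo_url, path]


def refresh(repo_url: str, cache_root: str) -> None:
    """Run the chosen command, raising CalledProcessError if git fails."""
    subprocess.check_call(cache_command(repo_url, cache_root))
```

Calling `refresh()` for each project from a nightly job would keep the cache warm, so test jobs only pay for the day's changes instead of a full clone.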
kevinbentonfungi: yeah, that'll be the next step for me to automate02:37
*** Alexandra is now known as alex-lunch02:37
fungikevinbenton: we already have that automated. it's all free software if you want to bend it to your own purposes02:37
kevinbentonfungi: which project is that one?02:38
fungikevinbenton: and the scripts we use with it are
fungiif nothing else, they might serve as good examples for how we solved/are solving these problems for ourselves02:39
kevinbentonit shouldn't be a big step for me to automate the caching at this point02:40
fungianyway, there's some good examples in there you can lift if needed. all apache-licensed02:41
fungiand if you have bug fixes, please feel free to put them up for review. we always welcome the help!02:42
*** sabari has joined #openstack-infra02:42
kevinbentonokay, i'll check these out02:43
kevinbentonthanks! i'm off for the night02:43
fungiyou're welcome, of course! me too i think02:43
* anteaya nods02:45
*** rpodolyaka has quit IRC02:46
*** thuc has joined #openstack-infra02:47
anteayawe have another submitted, merge pending issue:
anteayaanother dependency issue02:49
*** khyati has quit IRC02:50
*** bada has joined #openstack-infra02:50
*** dkliban has joined #openstack-infra02:50
*** thuc has quit IRC02:52
openstackgerritA change was merged to openstack-infra/elastic-recheck: Remove inaccurate docs about wildcards
*** bada_ has quit IRC02:52
fungiyep, looks like another broken rebase02:53
jeblairfungi: oh! we should make sure that case isn't detected by the replication check02:54
*** sarob_ has joined #openstack-infra02:55
fungijeblair: i think my patchset to remove the replication check still checks that gerrit has reported the change in a merged state02:55
*** markmcclain has joined #openstack-infra02:55
fungiunless i'm misreading what that code is meant to accomplish02:56
*** markmcclain has quit IRC02:56
jeblairfungi: i think you are right, except: if status == 'MERGED' or status == 'SUBMITTED':02:57
fungidata = self.gerrit.query(change.number)02:57
fungichange._data = data02:57
fungichange.is_merged = self._isMerged(change)02:57
*** sarob_ has quit IRC02:57
jeblairfungi: so we're actually accepting submitted as merged.  that seems wrongish.02:57
fungiyep, i missed that inside of _isMerged()02:58
*** sarob_ has joined #openstack-infra02:58
fungiand i COMPLETELY agree02:58
jeblairi wonder what i was smoking 1.5 years ago.02:58
*** sweston has quit IRC02:59
fungiwe should fix that, even if we don't remove the replication check, because i'm pretty sure it's wrong even just on principle02:59
*** thomasem has joined #openstack-infra02:59
jeblairfungi: more than that... guess what the commit msg for the commit that adds that is...02:59
*** chandan_kumar has quit IRC02:59
fungii'll submit a separate one-liner for that right now02:59
jeblairfungi: "Initial commit."02:59
fungiat least we didn't add it later for some worse reason03:00
jeblairi'm guessing we were a little more loosy goosy about this gate thing at the time.  :)03:00
fungisalad days03:00
*** SumitNaiksatam has quit IRC03:00
jeblairi think that was also before zuul was doing things like not testing changes whose dependencies weren't approved, so i might have actually intended to handle that case like that.03:01
jeblairbut yeah, pretty wrong now.03:01
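The fix being discussed amounts to treating only the MERGED status as merged. A minimal Python sketch of that check follows; the function name and the dictionary shape of the Gerrit query result are assumptions for illustration and this is not zuul's actual `_isMerged()` implementation.

```python
def is_merged(change_data: dict) -> bool:
    """Return True only when Gerrit reports the change as MERGED.

    SUBMITTED is deliberately not accepted: a submitted change can still be
    waiting on a dependency (the "submitted, merge pending" state).
    """
    return change_data.get("status") == "MERGED"


if __name__ == "__main__":
    for status in ("NEW", "SUBMITTED", "MERGED"):
        print(status, is_merged({"status": status}))
```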
*** stevebaker has quit IRC03:02
*** stevebaker has joined #openstack-infra03:02
*** sarob_ has quit IRC03:02
*** SumitNaiksatam has joined #openstack-infra03:03
*** thomasem has quit IRC03:04
*** prad has quit IRC03:04
*** gokrokve_ has joined #openstack-infra03:04
*** stevebaker has joined #openstack-infra03:05
openstackgerritJeremy Stanley proposed a change to openstack-infra/zuul: Submitted is _not_ necessarily merged in Gerrit
fungiso we don't forget later ^03:05
fungiand with that, off to do eveningish things03:06
openstackgerritKhai Do proposed a change to openstack-infra/jenkins-job-builder: fix setting of default values for missing parameters in jenkins.ini file.
fungioh, also in unrelated news, the debian-security team is strongly considering continuing security support past normal eol for stable...
fungie.g., a "squeeze-lts" suite or similar03:08
*** gokrokve has quit IRC03:08
fungione more data point to toss into the blender03:08
*** Ryan_Lane has joined #openstack-infra03:09
*** alex-lunch is now known as Alexandra03:09
*** sabari has joined #openstack-infra03:15
*** wchrisj has quit IRC03:21
*** SumitNaiksatam has quit IRC03:22
*** SumitNaiksatam has joined #openstack-infra03:23
*** unicell has joined #openstack-infra03:27
*** vkozhukalov has quit IRC03:27
*** wchrisj has joined #openstack-infra03:35
*** jcooley_ has joined #openstack-infra03:37
*** stevebaker has joined #openstack-infra03:38
*** stevebaker has joined #openstack-infra03:42
*** Ryan_Lane has quit IRC03:42
*** stevebaker has quit IRC03:42
*** pcrews has left #openstack-infra03:43
*** jcooley_ has quit IRC03:53
*** sgordon_ has joined #openstack-infra03:55
*** wenlock has joined #openstack-infra03:55
*** fifieldt has joined #openstack-infra03:55
sgordon_so i will probably be like the 500th person to mention this03:55
sgordon_but is * down?03:55
sgordon_nm as i typed that it's back03:56
 looks up for me03:56
*** Sam-I-Am has joined #openstack-infra03:56
*** Sam-I-Am has left #openstack-infra03:56
*** bada has quit IRC04:00
*** sarob_ has joined #openstack-infra04:03
kevinbentonwhat's the appropriate reverify statement for when jenkins died04:09
*** julim has joined #openstack-infra04:11
*** jcooley_ has joined #openstack-infra04:12
*** Ryan_Lane has joined #openstack-infra04:14
*** wchrisj has quit IRC04:15
*** julim has quit IRC04:21
*** stevebaker has quit IRC04:27
*** stevebaker has joined #openstack-infra04:27
*** Ryan_Lane has quit IRC04:30
*** Ryan_Lane1 is now known as Ryan_Lane04:30
*** Ryan_Lane has joined #openstack-infra04:30
*** jcooley_ has quit IRC04:30
*** Ryan_Lane1 has joined #openstack-infra04:30
*** jcooley_ has quit IRC04:32
*** sarob_ has quit IRC04:45
*** sarob_ has joined #openstack-infra04:45
*** stevebaker has joined #openstack-infra04:47
*** harlowja is now known as harlowja_away04:49
*** sarob_ has quit IRC04:50
*** mrodden has joined #openstack-infra04:50
*** Ryan_Lane1 has quit IRC04:53
openstackgerritKhai Do proposed a change to openstack-infra/jenkins-job-builder: fix setting of default values for missing parameters in jenkins.ini file.
*** CaptTofu has quit IRC04:54
*** thuc has joined #openstack-infra04:56
*** thuc_ has joined #openstack-infra04:56
*** esker has joined #openstack-infra04:57
*** thuc has quit IRC05:01
*** sarob_ has joined #openstack-infra05:02
*** harlowja_away is now known as harlowja05:02
*** wchrisj has joined #openstack-infra05:07
*** sabari has quit IRC05:15
*** sabari has joined #openstack-infra05:15
*** sweston has joined #openstack-infra05:16
*** nati_ueno has joined #openstack-infra05:22
wenlockkevinbenton, recheck no bug   .... ?05:23
kevinbentonwenlock: i thought maybe there was a bug for jenkins dying :-)05:24
*** jcooley_ has joined #openstack-infra05:24
*** sarob_ has quit IRC05:30
*** sarob_ has joined #openstack-infra05:30
*** sarob_ has quit IRC05:35
*** jcooley_ has quit IRC05:40
*** wchrisj has quit IRC05:44
*** nicedice has quit IRC05:46
*** jcooley_ has joined #openstack-infra05:49
*** talluri has joined #openstack-infra05:49
*** jcoufal has joined #openstack-infra05:51
*** gyee has quit IRC06:00
*** jhesketh_ has quit IRC06:00
*** jcoufal has quit IRC06:01
*** jhesketh has quit IRC06:03
*** chandan_kumar has joined #openstack-infra06:04
*** gokrokve_ has quit IRC06:04
*** gokrokve has joined #openstack-infra06:04
*** wenlock has quit IRC06:07
*** gokrokve has quit IRC06:08
*** gokrokve has joined #openstack-infra06:15
*** nati_uen_ has joined #openstack-infra06:21
*** thuc has joined #openstack-infra06:23
*** nati_ueno has quit IRC06:24
*** Alexandra is now known as alex-gone06:24
*** thuc_ has quit IRC06:26
*** thuc has quit IRC06:27
*** skraynev_afk is now known as skraynev06:30
*** thedodd has quit IRC06:34
*** CaptTofu has joined #openstack-infra06:34
*** CaptTofu has quit IRC06:39
*** sweston has quit IRC06:39
*** amcrn has joined #openstack-infra06:40
*** sarob_ has joined #openstack-infra06:41
*** sarob_ has quit IRC06:45
clarkbsdague: when you start your morning, instead of checking zuul status can you propose a revert of ? There are reasons that they are instance level variables, we need to load the logging config in setup_logging before any of those loggers are grabbed06:45
clarkbsdague: I think we will need one more change on top of the revert to catch other uses of LOG that were copy pasta'd around06:45
Daisyclarkb: could you help to review and push this patch:
*** Ryan_Lane1 has joined #openstack-infra06:50
clarkbDaisy: I can certainly take a look06:50
DaisyI'm eager to see whether it works. It's time for translation team to start message translation.  I hope it could run as soon as possible.06:51
clarkbDaisy: ok, SergeyLukjanov is usually on in a little bit, I can give it the first +2 and hopefully he can review and approve06:52
*** pblaho has joined #openstack-infra06:53
DaisyThank you !06:53
*** pblaho has quit IRC06:53
*** pblaho has joined #openstack-infra06:54
clarkbI starred the change and will try to remember to look at it in the morning my time06:54
clarkband shepherd it if necessary06:54
*** vogxn has joined #openstack-infra06:54
*** jamielennox is now known as jamielennox|away06:57
*** harlowja is now known as harlowja_away06:59
*** briancurtin has quit IRC06:59
*** denis_makogon has joined #openstack-infra07:01
*** jpich has joined #openstack-infra07:01
*** jlibosva has joined #openstack-infra07:07
openstackgerritafazekas proposed a change to openstack-infra/elastic-recheck: Add fingerprint for bug 1288579
*** oubiwann-ef has quit IRC07:13
*** ildikov_ has quit IRC07:16
SergeyLukjanovDaisy, clarkb, approved07:16
DaisyThanks !07:16
fifieldtall hail infra07:18
* fifieldt bows07:18
openstackgerritA change was merged to openstack-infra/config: Job to push Horizon translation to Transifex
jpichGreat :)07:19
*** harlowja_away has quit IRC07:21
*** saju_m has joined #openstack-infra07:22
*** Alexey has joined #openstack-infra07:23
*** yolanda_ has joined #openstack-infra07:27
*** Alexey has quit IRC07:28
*** alexey has joined #openstack-infra07:28
*** alexey has quit IRC07:28
*** achuprin has joined #openstack-infra07:29
*** sarob_ has joined #openstack-infra07:32
*** thuc has joined #openstack-infra07:34
*** adrian_otto has joined #openstack-infra07:35
*** sarob_ has quit IRC07:36
openstackgerritA change was merged to openstack-infra/config: Adds ! defined() guards around a2mod declarations
*** thuc has quit IRC07:38
*** sweston has joined #openstack-infra07:40
achuprinHi Infra!07:41
achuprinCan someone tell me who can help with the creation of a Service Account for Third Party Testing?07:42
openstackgerritA change was merged to openstack-infra/jenkins-job-builder: Use venv to build documentation
*** oubiwan__ has joined #openstack-infra07:44
achuprinThis is a link to my email request -
clarkbachuprin: we typically process them in batches and do a handful at a time07:44
*** sweston has quit IRC07:44
clarkbusually at least once a week. I will put it on my todo list to do those again. Hopefully I can get to that07:45
achuprinok, thanks!07:46
clarkbfungi: should be abandoned?07:49
*** oubiwan__ has quit IRC07:49
openstackgerritA change was merged to openstack-infra/config: Restricting chef-cookbook-chefspec job to spec dir
*** sabari has quit IRC07:56
*** basha has joined #openstack-infra07:57
*** gokrokve has quit IRC08:00
*** gokrokve has joined #openstack-infra08:00
openstackgerritMehdi Abaakouk proposed a change to openstack-infra/devstack-gate: Set CEILOMETER_PIPELINE_INTERVAL to 15
*** gokrokve has quit IRC08:04
*** ildikov_ has joined #openstack-infra08:05
*** saju_m has quit IRC08:06
*** e0ne has joined #openstack-infra08:10
ttxjeblair: maybe fifieldt, dunno if he spends enough time on enough channels though08:15
ttxalso travels a lot, so not "reliably" apac08:15
rcarrillocruzhey guys, does openstackgerrit user run with code hosted in ?08:17
openstackgerritA change was merged to openstack-infra/config: Watch also havana branch for packstack
clarkbyes it does08:18
clarkbthough on an admittedly old commit08:18
fifieldtttx, mmm?08:18
*** saju_m has joined #openstack-infra08:18
rcarrillocruzah...that would explain, cos in the most recent code I don't see strings used by openstack gerrit!08:19
rcarrillocruzthx clarkb08:19
*** dstanek has quit IRC08:20
rcarrillocruzttx: hi, i'm looking at starting with some infra low-hanging-fruit bugs. I saw . Do you mean having a single yaml containing all events to be pushed by gerrit review and later gerrit syncs up with Google Calendar? Or on the contrary you maybe mean having a folder for holding calendar events, one yaml per event...08:22
*** CaptTofu has joined #openstack-infra08:22
*** flaper87|afk is now known as flaper8708:22
*** saju_m has quit IRC08:24
*** rlandy has joined #openstack-infra08:26
*** CaptTofu has quit IRC08:27
ttxrcarrillocruz: we have a group of students on that project now08:27
ttxrcarrillocruz: so probably a bad idea to duplicate effort08:28
ttxrcarrillocruz: i should have updated the bug, sorry. Doing it now08:28
*** hashar has quit IRC08:28
*** jgallard has joined #openstack-infra08:28
clarkbI need to sleep but one thing I need to write a bug for is hosting kibana 3 off of logstash.o.o08:30
rcarrillocruzgsoc or... ?08:30
clarkbwe are all kibana2 now and need to join the future but I havent had time to put that together in a bug08:31
rcarrillocruzany low hanging fruit that you know is not handled by anyone ?08:31
*** sarob_ has joined #openstack-infra08:32
*** dizquierdo has joined #openstack-infra08:33
*** openstackgerrit has quit IRC08:34
*** openstackgerrit has joined #openstack-infra08:34
*** saju_m has joined #openstack-infra08:36
*** sarob_ has quit IRC08:37
*** hashar has joined #openstack-infra08:40
*** sweston has joined #openstack-infra08:41
*** sarob_ has joined #openstack-infra08:42
*** gokrokve has joined #openstack-infra08:43
*** gokrokve_ has joined #openstack-infra08:45
*** sweston has quit IRC08:45
*** sarob_ has quit IRC08:46
*** basha has joined #openstack-infra08:47
*** gokrokve has quit IRC08:47
*** denis_makogon has quit IRC08:48
*** basha has quit IRC08:52
ttxrcarrillocruz: no it's a group of students at NDSU08:53
ttxworking with lbragstad08:53
*** jpich has joined #openstack-infra08:54
*** basha has joined #openstack-infra08:55
openstackgerritA change was merged to openstack-infra/config: Make sure lvm2 tools are installed
*** Daisy has quit IRC09:04
*** rlandy has quit IRC09:04
*** andreaf has joined #openstack-infra09:08
*** saju_m has quit IRC09:12
*** yassine has joined #openstack-infra09:13
*** bada has joined #openstack-infra09:14
*** fbo_away is now known as fbo09:17
*** mkerrin has quit IRC09:18
*** rossella_s has joined #openstack-infra09:18
*** mkerrin has joined #openstack-infra09:19
*** hashar has quit IRC09:20
*** Ryan_Lane has quit IRC09:20
*** Ryan_Lane1 has quit IRC09:20
*** zhiwei has quit IRC09:22
*** hashar has joined #openstack-infra09:22
ttxsdague: wrote instead of the blanket email09:22
ttxwould have made an email too long, also easier to reuse in the future09:23
*** johnthetubaguy has joined #openstack-infra09:24
*** jooools has joined #openstack-infra09:24
openstackgerritA change was merged to openstack-infra/jenkins-job-builder: Content-Type can now be set for email-ext publisher
rcarrillocruzttx: just got assigned , hope it's not also taken by students to your knowledge?09:30
*** sarob_ has joined #openstack-infra09:33
ttxrcarrillocruz: nope, nobody on it to my knowledge ;)09:34
ttxrcarrillocruz: and thanks so much for helping !09:34
rcarrillocruznp, thx09:35
*** sarob_ has quit IRC09:37
*** SumitNaiksatam has quit IRC09:39
*** ociuhandu has quit IRC09:39
*** SumitNaiksatam has joined #openstack-infra09:39
*** basha has quit IRC09:42
*** gokrokve has joined #openstack-infra09:45
*** zhiwei has joined #openstack-infra09:45
*** gokrokve has quit IRC09:50
*** yamahata has quit IRC09:56
*** lawcen has joined #openstack-infra09:57
*** Ryan_Lane has quit IRC09:58
*** lahoucine has joined #openstack-infra09:58
*** saju_m has joined #openstack-infra09:58
*** lawcen has quit IRC09:58
*** jerryz has quit IRC09:59
*** jp_at_hp has joined #openstack-infra10:00
*** hashar_ has joined #openstack-infra10:04
*** hashar has quit IRC10:05
*** morganfainberg_Z is now known as morganfainberg10:06
*** hashar_ is now known as hashar10:08
SergeyLukjanovttx, thx for the why FF blogpost, great explanation of the process, I'm glad to use it to explain FF to Savanna contributors10:09
ttxSergeyLukjanov: glad you enjoyed it :)10:10
*** CaptTofu has joined #openstack-infra10:11
*** rpodolyaka has joined #openstack-infra10:14
*** malini_afk is now known as malini10:14
*** CaptTofu has quit IRC10:15
*** enikanorov has quit IRC10:18
*** rpodolyaka has quit IRC10:18
*** enikanorov has joined #openstack-infra10:18
*** yolanda_ has joined #openstack-infra10:20
*** yolanda_ has quit IRC10:25
*** yolanda_ has joined #openstack-infra10:27
*** sarob_ has joined #openstack-infra10:34
*** sarob_ has quit IRC10:38
*** sweston has joined #openstack-infra10:41
*** amotoki has joined #openstack-infra10:42
*** gokrokve has joined #openstack-infra10:45
*** sweston has quit IRC10:46
*** saju_m has quit IRC10:46
*** yamahata has joined #openstack-infra10:47
*** adrian_otto1 has joined #openstack-infra10:49
*** adrian_otto has quit IRC10:49
*** gokrokve has quit IRC10:50
*** jgallard has quit IRC11:13
*** hashar has quit IRC11:13
*** ociuhandu has joined #openstack-infra11:14
*** rpodolyaka has joined #openstack-infra11:15
*** adrian_otto1 has quit IRC11:15
*** rossella_s has quit IRC11:18
*** rossella_s has joined #openstack-infra11:19
*** rpodolyaka has quit IRC11:20
*** andre__ has joined #openstack-infra11:22
*** CaptTofu has joined #openstack-infra11:32
*** sarob_ has joined #openstack-infra11:35
*** sarob_ has quit IRC11:39
sdaguettx: well, the blanket email with the pointer would be good the FFE runs fierce11:40
ttxsdague: I posted the link on the ML11:40
sdagueit was deep in another thread though, right?11:41
ttxyes... I feel like I would be abusing to post it twice though11:41
ttxlooks like selfpromotion11:41
*** sweston has joined #openstack-infra11:42
ttxsdague: i'll post a vacation notice later, maybe I can mention it (as part of the "sean will run them" notification) there11:43
ttxsdague: I hope most will be covered this week, and you'll only have to check progress at the Tuesday meeting11:43
ttxbut i try not to be too hopeful :)11:44
ttxlate FFE requests generally come from PTLs though, rather than random devs11:44
sdaguewell, I'll just be mean. Unless it's ZOMG nova won't start without this, I think any FFE showing up late needs to wait11:44
sdagueyeh, the events api in nova is the one that will need to be sorted11:45
sdaguebecause that's finally figuring out why neutron + nova races, and a way to stop doing that11:45
ttxAll tracked at -- still need to sync with markmcclain, jgriffith and markwash11:45
*** gokrokve has joined #openstack-infra11:45
*** sweston has quit IRC11:47
ttxshall have a pretty complete picture by eod11:48
*** gokrokve has quit IRC11:49
*** e0ne_ has joined #openstack-infra11:51
*** e0ne has quit IRC11:51
sdaguesounds good.11:53
sdagueSergeyLukjanov: do you have enough visibility into nodepool to know why it's stalled?11:54
sdaguelooks like python3 nodes are all gone11:55
SergeyLukjanovsdague, I have no access to our infra servers11:56
SergeyLukjanovsdague, probably, we can find smth in
sdaguealso, the number of devstack nodes is pretty low11:56
SergeyLukjanovsdague, yup, graph doesn't look healthy11:57
SergeyLukjanovand 162 CR in check11:58
*** sgordon_ has quit IRC12:00
SergeyLukjanovsdague, looks like we have no py33 nodes (or lack of them) for at least 5h12:11
sdaguethus begins the long wait for fungi to get up12:11
mkodererhi folks... does someone know if the recheck of VMware Mine Sweeper works?12:14
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Auth Token Middleware
*** rpodolyaka has joined #openstack-infra12:16
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Only public user fields in unauthorized requests
*** weshay has joined #openstack-infra12:19
*** rpodolyaka has quit IRC12:20
*** bada has quit IRC12:21
*** jhesketh has joined #openstack-infra12:24
*** jhesketh has quit IRC12:25
*** mwagner_lap has quit IRC12:27
*** jnoller has joined #openstack-infra12:29
*** yassine has quit IRC12:29
*** jnoller has quit IRC12:37
openstackgerritBrad P. Crochet proposed a change to openstack-infra/jenkins-job-builder: Added support for Exclusion plugin
*** sweston has joined #openstack-infra12:42
*** sarob_ has joined #openstack-infra12:45
*** gokrokve has joined #openstack-infra12:45
*** sweston has quit IRC12:47
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Remove empty CONF import and useage
*** mriedem has joined #openstack-infra12:47
*** gokrokve has quit IRC12:49
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Added DELETE method for projects, stories, and tasks.
*** mfink has quit IRC13:00
*** zhiwei has quit IRC13:00
*** yamahata has quit IRC13:00
*** CaptTofu has quit IRC13:09
*** david-lyle has quit IRC13:10
*** yamahata has joined #openstack-infra13:13
*** esker has joined #openstack-infra13:14
*** esker has quit IRC13:14
*** smarcet has joined #openstack-infra13:14
*** esker has joined #openstack-infra13:14
*** dims has quit IRC13:18
*** dims has joined #openstack-infra13:19
*** esker has quit IRC13:19
anteayakevinbenton: in future, looks like a good candidate13:26
*** saju_m has quit IRC13:27
*** pdmars has joined #openstack-infra13:29
*** dcramer_ has quit IRC13:33
*** hashar has joined #openstack-infra13:33
*** mfink has joined #openstack-infra13:34
*** madmike has joined #openstack-infra13:35
*** bknudson has left #openstack-infra13:35
*** andre__ has quit IRC13:35
openstackgerritA change was merged to openstack-infra/config: Support filtering by review id(s)
*** sarob_ has joined #openstack-infra13:36
*** andre__ has joined #openstack-infra13:36
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Added DELETE method for projects, stories, and tasks.
*** eharney has joined #openstack-infra13:37
*** mbacchi has joined #openstack-infra13:38
*** mfink has quit IRC13:38
*** sarob_ has quit IRC13:40
*** rlandy_ has joined #openstack-infra13:41
*** yamahata has quit IRC13:41
anteayamkoderer: have you gone through the logs that vm minesweeper provides?
*** e0ne has joined #openstack-infra13:42
mkodereranteaya: yep and it's not related to my fix.. sdague already told me that I can ignore it13:43
anteayamkoderer: can you email vm minesweeper and let them know that you are ignoring their results and why?13:43
anteayaoh guess you can't no email there yet13:44
*** rlandy has quit IRC13:44
anteayacan you ping salv-orlando about it?13:44
mkodereranteaya: and the recheck doesn't work13:44
mkoderersalv-orlando: ping13:44
anteayaand I had thought salv-orlando had offered a vm minesweeper email to be added to the account13:45
anteayamkoderer: thank you13:45
*** freyes has joined #openstack-infra13:45
*** gokrokve has joined #openstack-infra13:45
mkodereranteaya: ure welcome13:45
sdaguewe really need fungi to wake up :)13:46
fungiwe do?13:49
*** gokrokve has quit IRC13:49
*** thuc has joined #openstack-infra13:50
*** thomasem has joined #openstack-infra13:51
anteayathe world can start to turn, fungi's up13:51
fungilooks like we're low on slaves... particularly ones we only boot in rax13:51
*** dkliban has quit IRC13:51
anteayaif I understand correctly, yesterday you did an initial foray into testing to see if the only rax jobs could work on hpcloud13:52
fungichecking the quotas there13:52
*** malini is now known as malini_afk13:52
fungianteaya: nope, rather we were testing whether we could use the new hp region instead of the old hp region (which needs bigger flavors, so we were testing a means of limiting the available ram on them)13:53
fungiwe've got a ton of rackspace nodes in a delete state for a long time, so i'll clear those while checking other things13:54
sdaguefungi: yeh, basically the whole of check is stalled out13:54
sdagueor down to a trickle13:54
sdaguealso no single use pypy or python313:54
*** rlandy_ is now known as rlandy13:55
fungiright, we generally don't need as many of those so we only keep a few on hand, currently all in rax regions13:55
openstackgerritA change was merged to openstack-infra/devstack-gate: Set CEILOMETER_PIPELINE_INTERVAL to 15
*** ryanpetrello has joined #openstack-infra13:56
anteayafungi: ah thanks for clarifying13:57
anteayamight it be worthwhile to spread out the rax only jobs to hpcloud as well, in future?13:57
anteayaor did I just state the obvious again?13:58
fungiyeah, that's something we've wanted to do13:58
openstackgerritAntoine Musso proposed a change to openstack-infra/zuul: Document the Zuul triggers
fungilooks like we're at max ram in ord, even though we're only a little over 60% of our instance limit14:00
*** zns has joined #openstack-infra14:01
fungiand we maxed out the 5000 instances created per day there14:01
fungiiad and dfw on the other hand have quite a bit of capacity14:02
*** rcarrillocruz1 has joined #openstack-infra14:03
fungii'm going to stop puppet agent on nodepool and manually zero out the quota on ord to calm it down a bit14:03
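The situation fungi describes for ord (RAM quota exhausted while the instance count sits a little over 60% of its limit) comes down to the tightest quota winning. A minimal sketch of that arithmetic follows; the function name, the resource keys, and all figures are hypothetical and are not taken from nodepool or from the provider's real quotas.

```python
def launchable_nodes(quota, used, per_node):
    """Return how many more nodes fit before any single quota runs out."""
    headroom = []
    for resource, limit in quota.items():
        remaining = max(limit - used.get(resource, 0), 0)
        cost = per_node.get(resource, 0)
        if cost > 0:
            # Whole nodes that still fit under this one resource limit.
            headroom.append(remaining // cost)
    # The most constrained resource decides the answer.
    return min(headroom) if headroom else 0


# Hypothetical figures: 60 of 100 instances used, but RAM fully consumed.
quota = {"instances": 100, "ram_mb": 491520}
used = {"instances": 60, "ram_mb": 491520}
per_node = {"instances": 1, "ram_mb": 8192}

print(launchable_nodes(quota, used, per_node))  # 0
```

With these made-up numbers the instance quota alone would allow 40 more nodes, yet nothing can boot because the RAM quota has no headroom left.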
*** hartsocks has joined #openstack-infra14:08
fungii'll also get started trying to add py3k-precise and bare-centos6 images in hpcloud-az1 and az3 (az2 is dead to me now)14:08
*** hartsocks has left #openstack-infra14:08
*** amotoki has quit IRC14:09
*** thuc has quit IRC14:10
*** yamahata has joined #openstack-infra14:10
*** thuc has joined #openstack-infra14:12
*** bknudson has joined #openstack-infra14:12
*** changbl has quit IRC14:14
openstackgerritSean Dague proposed a change to openstack-infra/config: create an integrated-gate template
sdaguefungi: can we get rax to change that instance create param, because that's now hit us multiple times14:16
fungisdague: jeblair has an e-mail thread going with them to sort it out since last week some time. i gather there's progress14:16
openstackgerritFlavio Percoco proposed a change to openstack-infra/devstack-gate: Archive config files along with logs
*** yamahata has quit IRC14:17
fungiwe're painfully aware of the need to get it resolved14:17
*** yamahata has joined #openstack-infra14:17
*** rpodolyaka has joined #openstack-infra14:17
*** dkranz has joined #openstack-infra14:18
*** thomasem has quit IRC14:18
*** yamahata has quit IRC14:19
*** yamahata has joined #openstack-infra14:19
*** thomasem has joined #openstack-infra14:20
mesteryQuestion for any infra folks: I'm working with the Linux Foundation to enable 3rd party testing for the OpenDaylight Neutron integration.14:20
*** julim has joined #openstack-infra14:20
mesteryIs this expected at first?14:20
mesteryI don't even see our results in the reviews either, which is concerning.14:20
mesteryLogs on the Linux Foundation Jenkins server indicate it is voting back.14:20
*** rpodolyaka has quit IRC14:21
anteayamestery: your account can't vote right now14:21
mesteryanteaya: OK, I figured that, thanks for confirming anteaya!14:22
anteayaOpenDaylight Jenkins is in the non-voting group14:22
mesteryWe removed the "starting" post which was going into reviews per suggestion from markmcclain14:22
anteayathat is great14:22
mesteryCool :)14:23
anteayathis group is the non-voting group and includes the voting group as a subset14:23
mesteryMark indicated that wasn't needed and in fact was more annoying than anything14:23
mesteryOK, cool, thanks!14:23
anteayaright now new 3rd party ci accounts start out in the non-voting group14:23
anteayato ensure their systems are stable14:23
mesteryGot it, that makes perfect sense.14:23
anteayathey can comment on patches and vote on the sandbox repo:
anteayathen once they are stable and have some history of being a reliable service and a good community member14:24
fungimestery: if you attempt to post a vote (a "vrif" score) when leaving a comment and the acl doesn't allow it, the api call will fail completely and you'll get no comment added at all. you can however configure it to leave a 0 score when commenting and i believe that should work14:24
mesteryfungi: Bingo, that's the problem I think!14:24
mesteryLinux Foundation told me they were voting +1.14:25
mesteryI'll have them change it to 0 for now.14:25
anteayathey apply at their project's weekly meeting and then if the ptl agrees, the ptl talks to gerrit admin and you get into the voting group14:25
fungimestery: you can vote +1 on openstack-dev/sandbox right now to test out that the vrif score addition is working, just not on any other projects14:25
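A small sketch of what fungi suggests above: leave a comment with a 0 Verified score so the call is not rejected by the ACL. It only builds the argument list for Gerrit's SSH `review` command and does not run anything. The helper name and the change number are hypothetical, and the exact flags a given CI plugin sends may differ; treat this as an illustration, not the Linux Foundation's configuration.

```python
def gerrit_review_command(change, patchset, message, verified=0):
    """Build the argument list for Gerrit's SSH `review` command.

    verified=0 leaves a comment without claiming a vote, which is the
    workaround suggested for accounts not yet in the voting group.
    """
    if verified not in (-1, 0, 1):
        raise ValueError("verified must be -1, 0 or 1")
    score = "+1" if verified > 0 else str(verified)
    return [
        "gerrit", "review",
        "--verified", score,
        "--message", message,
        "%d,%d" % (change, patchset),  # <change number>,<patchset number>
    ]


# Hypothetical change 12345, patchset 2; quoting for the SSH transport
# is left to the caller.
print(gerrit_review_command(12345, 2, "Build succeeded"))
```

Passing `verified=1` would produce the `+1` form, which per the discussion is only accepted on openstack-dev/sandbox until the account moves to the voting group.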
mesteryanteaya: Thank you for clarifying the process, much appreciated!14:25
mesteryfungi: How do I vote on that sandbox? Just change the repository when voting back?14:25
anteayamestery: np, right now people are being advised to ignore the 3rd party output since the group as a whole is not stable14:26
anteayasee the backscroll in -qa14:26
fungimestery: once we move that account from the "third-party ci" group to the "voting third-party ci" group it will be able to vote on any project14:26
mesteryanteaya: OK, thanks!14:26
anteayathis isn't an exercise in creating noise, the point is to create useful information people pay attention to, but right now it is just viewed as noise14:27
*** thuc has quit IRC14:27
mesteryanteaya: 100% agree! My goal with the OpenDaylight Jenkins is to have it run jobs against both OpenStack and OpenDaylight code bases. :)14:27
mesterySo, less noise, more testing from both ends.14:27
mesteryAnd I got things working last night!14:27
mesteryLinux Foundation has some issues with their OpenStack cloud we're working through as well at the moment.14:27
mesterySo good we're non-voting to start with :)14:28
*** thuc has joined #openstack-infra14:28
anteayaso the more reliable all systems are the better for each individual 3rd party testing system14:28
mesteryAgreed anteaya.14:28
anteayalet us know how else we can help14:28
anteayaand also you might enjoy attending jaypipes 3rd party testing workshops in -meeting on mondays at 18:00 utc, I think that is the correct time14:29
mesteryI have that on my calendar, unfortunately I am on vacation next week, getting away from the Minnesota winters for a week with the family. :)14:29
*** yamahata has quit IRC14:29
anteayagood for you14:29
anteayayou, me, ttx14:29
anteayanext week is a popular choice to disappear14:29
mesteryAh, you're gone as well? Where to?14:30
*** yamahata has joined #openstack-infra14:30
mesteryI'm headed to San Diego14:30
mesteryWow, have fun, sounds like a great trip!14:30
anteayagreat, been there before?14:30
mesteryYes, though not with the family. Kids are excited to spend time on the beach. :)14:30
anteayathanks, I am looking forward to it, my host keeps calling me to make sure I have the train schedule14:30
anteayamestery: nice, I hope you have pleasant travels and lots of beach time14:31
mesterySame to you anteaya!14:31
*** briancurtin has joined #openstack-infra14:31
*** thuc has quit IRC14:31
*** HenryG has quit IRC14:34
*** fifieldt has quit IRC14:35
*** wchrisj has joined #openstack-infra14:35
*** dkranz has quit IRC14:36
*** sarob_ has joined #openstack-infra14:36
*** e0ne has quit IRC14:37
*** e0ne has joined #openstack-infra14:37
*** afazekas has joined #openstack-infra14:39
*** sarob_ has quit IRC14:41
*** yamahata has quit IRC14:41
*** rcarrillocruz has joined #openstack-infra14:41
sdaguemestery: I'm excited about the OpenDaylight testing going on an LF in OpenStack style14:42
sdaguevery cool14:42
mesterysdague: Yes, me too!14:43
*** rcarrillocruz1 has quit IRC14:43
mesterysdague: We plan to move them to the full OpenStack setup with zuul, jjb, etc. very soon.14:43
*** dkranz has joined #openstack-infra14:43
mesteryThey are excited about it as well!14:43
*** dcramer_ has joined #openstack-infra14:44
*** gokrokve has joined #openstack-infra14:45
sdagueit would be nice to have a solid open source SDN. once that really hardens up, I'd like to see that as our neutron default cause14:48
sdaguebecause the raw ovs approach... continues to be problematic, as we've seen in the gate.14:48
mesterysdague: Agreed! I think it will get there in the Juno timeframe, which aligns with the next release of OpenDaylight (Helium)14:48
mesteryThe patches I have out now (devstack and Neutron) for ODL lay the groundwork.14:49
*** rlandy has quit IRC14:49
*** zul has quit IRC14:49
*** gokrokve has quit IRC14:49
*** jnoller has joined #openstack-infra14:50
*** jswarren has joined #openstack-infra14:50
*** zul has joined #openstack-infra14:52
*** dkranz has quit IRC14:52
*** mfer has joined #openstack-infra14:53
*** talluri has quit IRC14:54
*** nosnos has quit IRC14:55
sdaguemestery: ok, some quick feedback on
sdaguecouple of questions on it, so let me know if there are answers, then I'm +214:57
mesterysdague: Checking it out, thanks for the review!14:57
*** HenryG has joined #openstack-infra14:57
mesterysdague: I agree on the SERVICE_HOST comment, will default it to that.14:58
*** dkliban has joined #openstack-infra14:58
mesterysdague: what functions in devstack for adding config? Pointer?14:58
anteayaalso noticing this question got missed: How do I vote on that sandbox? Just change the repository when voting back? No, you need to set up your system to listen to the stream from the sandbox repo and then you need to submit a patchset to the sandbox repo to trigger a test run14:59
*** malini_afk is now known as malini14:59
*** jnoller has quit IRC14:59
mesteryanteaya: Got it, thanks! Is that required before moving to full voting?14:59
mesterysdague: Cool! Thanks for the pointer! I'll rework a new patch with your comments addressed ASAP.15:00
sdagueI'm not 100% sure if it will work in your case, but if it will, that would be great15:00
anteayamestery: up to markmcclain and the rest of the project, but we are suggesting it and it is a good demonstration of how your system handles voting15:00
*** eharney has quit IRC15:00
mesterysdague: I'll try it out!15:00
mesteryanteaya: Thanks again for all the help!15:01
dstufftmordred: sdague lifeless fungi clarkb whoever else, you may see some breakage in installs15:01
anteayadstufft: thanks15:01
dstufftif you accidentally upgraded setuptools to 3.0+15:01
fungidstufft: thanks for the heads up!15:01
sdaguedstufft: is this something we should block on our side?15:01
sdagueor is a fix imminent15:02
*** gokrokve has joined #openstack-infra15:02
dstufftsetuptools 1.0 deprecated the "Feature" feature and setuptools 3 removed it, some projects were using it in their and those projects will fail to install if you have setuptools 3.0+ installed15:02
dstufft(there was some issue that the deprecation warning wasn't very visible since it just used a standard DeprecationWarning from Python which are silent by default :/)15:03
*** jnoller has joined #openstack-infra15:03
dstufftbut it won't be fixed upstream because it was a planned deprecation/removal15:03
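dstufft's point about visibility can be illustrated with Python's standard `warnings` module: a `DeprecationWarning` is dropped whenever the active filter ignores it, which has historically been the interpreter default. The sketch below uses a stand-in function, `old_api`, that is hypothetical and is not setuptools code; it sets the filter explicitly so the result does not depend on the interpreter's defaults.

```python
import warnings


def old_api():
    """Stand-in for a deprecated call; not actual setuptools code."""
    warnings.warn("Feature is deprecated", DeprecationWarning, stacklevel=2)
    return "ok"


def count_warnings(action):
    """Call old_api() under the given filter action and count what surfaces."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, DeprecationWarning)
        old_api()
    return len(caught)


print(count_warnings("ignore"))  # 0: the deprecation goes unnoticed
print(count_warnings("always"))  # 1: the deprecation is reported
```

Running a build with `python -W error::DeprecationWarning` is one way to turn such warnings into hard failures ahead of a removal.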
fungii don't think we used feature in our files, but i suppose some of our dependencies may15:03
*** mwagner_lap has joined #openstack-infra15:04
dstufftfungi: I know cffi did15:04
dstufftnot sure if y'all depend on that or not, I think you do15:04
* fungi grumbles15:04
fungiin some places, i think15:04
dstufftthere's a fix for cffi getting pushed out now15:04
lifelessdstufft: sadface15:04
dstufftwell getting patched15:04
dstufftnot sure when they'll do a release15:04
dstufftI know of zope.interface and Markupsafe too15:04
sdagueyeh, pycrypto needs that15:04
sdagueI think that's the only place we hit it though15:05
*** rlandy has joined #openstack-infra15:05
sdaguefungi: I guess an early mirror trigger might be in order once cffi is out ?15:05
fungisdague: possibly, if we end up using setuptools 3.0 inadvertently15:06
sdaguelifeless: at least you know the answer to your question in -dev15:06
dstufftdidn't even notice that15:06
dstufftwelp glad I said something then :)15:06
sdaguefungi: - clarkb has +2ed it15:06
sdagueand he was the last one to touch that code15:06
fungiyeah, i was going to follow suit and then he said something in irc after that which sounded like maybe he was recanting. let me refresh my memory15:08
*** zns has quit IRC15:08
sdagueSergeyLukjanov: so on - I'd like to start smaller15:08
*** mgagne has joined #openstack-infra15:08
sdagueand bring over the other jobs one at a time because I do think we need to revisit if they all actually need to be cogating15:09
SergeyLukjanovsdague, ok, sounds reasonable15:09
*** Hefeweizen has quit IRC15:10
sdagueand grenade, tempest full, and tempest neutron seems reasonably uncontroversial15:10
sdagueit did pick up a few places, like trove-client, that weren't in the mix on these15:10
*** denis_makogon has quit IRC15:10
fungiwe now have a bare-centos6 image in hpcloud-az3 and nodes being launched from it15:11
mtreinishsdague: no love for postgres :)15:11
sdaguemtreinish: lets start small, and sort out the rest as we go :)15:12
fungiother bare-centos6 and py3k-precise images in az1 and az3 are near completion as well15:12
sdaguehonestly, with current clean check, I think postgres could safely live on check only.15:12
openstackgerritA change was merged to openstack-infra/config: remove inline set -e that is preventing explanations
*** jaypipes has joined #openstack-infra15:13
*** rlandy_ has joined #openstack-infra15:14
*** freyes has quit IRC15:14
*** rlandy has quit IRC15:15
*** rlandy_ is now known as rlandy15:15
*** adrian_otto has joined #openstack-infra15:15
ttxfungi: another "submitted" thing:
ttx"Depends on commit 35b513c1b3a0770db00dbf4aed754d9d6d9614e5 which has no change associated with it"15:16
fungittx: yeah, spotted that one last night15:16
fungisomeone screwed up a rebase, looked like15:17
ttxfungi: looks pretty recent15:17
fungittx: it stemmed from your approval about 25 hours ago15:18
fungittx: i saw it last night15:18
ttxYeah, looks funny @,n,z15:18
*** sarob_ has joined #openstack-infra15:20
sdaguefungi: how you feeling about this -
anteayafungi: so did we ever find out why all the rax nodes disappeared? was it quota?15:21
fungittx: right, looks like 78168 was committed on top of 78161 after that commit was modified, but then only 78168 got pushed to gerrit without the modified commit for 7816115:21
fungittx: so since the parent commit didn't exist in gerrit, it didn't set up a dependency relationship between the commits, but then when it tried to merge gerrit realized there was a missing dependency there and refused15:22
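The failure mode fungi describes (a child commit pushed without the rewritten parent it was built on) can be sketched as a simple check over parent pointers. This is only an illustration of the reasoning, not how Gerrit implements it; the function name, the dictionary layout, and the short SHAs are all hypothetical.

```python
def missing_parents(pushed_commits, known_to_gerrit):
    """Return SHAs of pushed commits whose parent is neither part of the
    same push nor already known to the review system."""
    pushed = {c["sha"] for c in pushed_commits}
    return [
        c["sha"]
        for c in pushed_commits
        if c["parent"] not in pushed and c["parent"] not in known_to_gerrit
    ]


# Hypothetical SHAs: "aaa1" is the reviewed version of the parent change,
# "bbb2" is a locally amended version that was never pushed, and "ccc3"
# is the child commit built on top of "bbb2".
known = {"aaa1"}
push = [{"sha": "ccc3", "parent": "bbb2"}]

print(missing_parents(push, known))  # ['ccc3']
```

Pushing the amended parent together with the child would make the list empty, because the parent would then be part of the same push.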
jeblairgood morning15:22
fungimorning jeblair15:23
*** amotoki has joined #openstack-infra15:23
anteayamorning jeblair15:24
fungisdague: i'll have a look in a bit. still trying to unwind whether there's anything else wrong in nodepool land15:24
sdaguefungi: cool, thanks15:24
*** sarob_ has quit IRC15:25
jeblairfungi, clarkb: i don't think it would be enough for zuul to remember the most recent change merged because what if 5 merge in a row (and that would be even faster if we remove the replication check)15:25
fungianteaya: they didn't all disappear. we were getting starved out of most of the less common (in this case py3k-precise and to a lesser extent bare-precise and bare-centos6) nodes because nodepoold was trying too hard to bring them up in rax-ord15:25
jeblairfungi, clarkb: so something like remembering which project-branches were seen in the most recent X time is more correct; or having the merger create refs for all projects in the shared queue is probably most correct (but potentially slow)15:26
fungijeblair: right, almost certainly not just the most recent. more like some time window15:26
mesterysdague: Addressed your main concern and few of the smaller ones.15:26
fungioh, you just said that15:26
mesterysdague: Config file thing was tricky, see my comments on patchset 14 for more details.15:26
jeblairfungi, clarkb: there's that kind of approach, or the other alternative is to try to measure replication completion more correctly.  i don't have good ideas about that other than to tell zuul about all replication targets and have it check all of them (but what if one is intentionally down?).  i'm not as keen on this.15:27
sdagueyeh, I wasn't sure if it would work or not15:27
jeblairfungi, clarkb: or teach zuul to read gerrit's process list.  that seems really wrong.15:28
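One of the options jeblair lists, remembering which project-branches were seen merging within the most recent X amount of time, could look roughly like the sketch below. The class name, its methods, and the 300-second window are assumptions made for illustration; none of this is Zuul's actual code.

```python
import time


class RecentMerges:
    """Track (project, branch) pairs that merged within a time window."""

    def __init__(self, window_seconds, clock=time.time):
        self.window = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._seen = {}     # (project, branch) -> time of last merge

    def record(self, project, branch):
        """Note that a change just merged on this project-branch."""
        self._seen[(project, branch)] = self.clock()

    def recent(self):
        """Return the pairs still inside the window, dropping expired ones."""
        cutoff = self.clock() - self.window
        self._seen = {k: t for k, t in self._seen.items() if t >= cutoff}
        return set(self._seen)


# Usage with a fake clock so the example is deterministic.
now = [0]
tracker = RecentMerges(300, clock=lambda: now[0])
tracker.record("openstack/nova", "master")
now[0] = 100
tracker.record("openstack/neutron", "master")
now[0] = 350
print(tracker.recent())  # {('openstack/neutron', 'master')}
```

Unlike remembering only the single most recent merge, a window of this kind still covers the case jeblair raises where five changes merge in a row.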
jeblairmestery, sdague: you were talking about making odl the default neutron case... is odl testing something that can be done upstream?15:28
sdaguejeblair: yes, it could be15:29
mesteryjeblair: Eventually I'd like to get ODL as the default Neutron driver. The patch above moves us closer.15:29
sdaguenot today15:29
sdaguebut it could get there15:29
mesteryjeblair: My testing has shown ODL with Neutron is vastly more responsive than with the OVS agents.15:30
sdagueonce opendaylight is a bit more tested15:30
mestery+1 to what sdague is saying15:30
mesteryWe're going to be testing OpenStack with each OpenDaylight commit in the OVSDB project soon as well,.15:30
mesterySo it will get a lot of testing, both openstack and opendaylight15:30
sdagueI think the stretch goal of doing that by end of Juno is a good one15:30
sdagueassuming the neutron team agrees as well with that as their default15:31
mesteryagreed sdague, markmcclain is aware of this, but will take discussion in Atlanta I think15:31
*** apevec has joined #openstack-infra15:31
sdaguejeblair: so in summary, it's completely technically doable in upstream15:32
anteayafungi: ah15:32
sdagueand it's a policy decision that will need agreement15:32
* mestery loves it when a plan comes together sdague.15:32
apevecrussellb, vishy - backport should make Nova Grizzly happen, please review15:32
dhellmanngood morning15:33
anteayamorning dhellmann15:33
*** david-lyle has joined #openstack-infra15:35
anteayadhellmann: bring it15:36
dhellmannanteaya: haha, I'll go see about postage15:36
*** markmcclain has joined #openstack-infra15:36
anteayaah that will be a problem15:37
openstackgerritA change was merged to openstack-infra/storyboard: Auth Token Middleware
anteayadhellmann: our postal system is &^%^%^*&ed... ah less efficient than it could be15:37
dhellmannanteaya: UPS then?15:38
anteayadhellmann: fedex15:38
anteayaUPS just keeps your stuff in a warehouse for ever15:38
anteayasending alerts and never able to find/deliver it15:38
dhellmannI'll see if I can squeeze it into one of those little envelopes15:38
*** beagles has left #openstack-infra15:38
*** jnoller has quit IRC15:38
anteayadhellmann: go you, loves me some cold weather15:39
dhellmannnormally I do too, but I've had enough this year15:39
* anteaya nods15:39
*** jnoller has joined #openstack-infra15:39
anteayanow having said all that, I am running away to thailand for a week15:39
*** thedodd has joined #openstack-infra15:39
anteayabut the package will be delayed anyway15:39
anteayaso I will pick it up once I return15:40
*** rpodolyaka has joined #openstack-infra15:40
*** rpodolyaka1 has joined #openstack-infra15:40
*** oubiwan__ has joined #openstack-infra15:41
*** krotscheck has joined #openstack-infra15:42
*** wenlock has joined #openstack-infra15:43
*** sweston has joined #openstack-infra15:43
*** sarob_ has joined #openstack-infra15:45
*** rcleere has quit IRC15:46
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Add bare-centos6 and py3k-precise nodes to hpcloud
*** sweston has quit IRC15:48
fungiokay, we've had jobs run and succeed on py3k-precise and bare-centos6 nodes in hpcloud-az1 and hpcloud-az3 so there's ^ a change for that15:48
*** esker has joined #openstack-infra15:48
anteayashould we do just az-1 and az-3 for now? will the az-2 issues become a problem for these jobs, more so than other jobs?15:50
*** sarob_ has quit IRC15:50
fungino, jobs for these won't run in az2 anyway until the images are able to build successfully there so we can boot nodes from them15:51
fungiand the az2 problem might be cleared up by the time that's reviewed and merged anyway15:51
fungiclarkb said someone in hp was going to have a look at the ticket this morning and try to dig into it15:52
sdaguejeblair: I'd like to start rolling up the integrated-gate so we could turn on heat-slow as gating easier - (I think a reasonably minimal starting point)15:56
lifelessfungi: so, cloud is still up, but no nodepool love15:56
anteayafungi: k, thanks15:56
krotscheckHow would I go about getting my ssh key on so I can go look at logs?15:56
krotscheckServer behavior does not seem to match codebase behavior right now.15:57
fungisdague: is adding grenade to heat going to be problematic?15:57
sdaguefungi: it better not be :)15:57
sdaguehonestly, right now it probably noops15:57
*** sarob_ has joined #openstack-infra15:57
sdaguehowever having an upgrade job is part of the TC approved requirements for integrated projects15:58
*** rpodolyaka1 has quit IRC15:58
fungisdague: trove as well i guess16:01
fungiand ceiloclient16:02
jeblairkrotscheck: is there something we can look up quickly for you?16:02
krotscheckmordred: Thanks.16:04
*** dstufft has quit IRC16:04
*** rcarrillocruz1 has joined #openstack-infra16:04
lahoucineHi everyone, I have deleted my old account "lahoucine <>" but it's still visible in gerrit. My current account "Lahoucine BENLAHMR <>" is shown duplicated and is unselectable when trying to add it as a reviewer to a change. Does anyone know how I can definitely remove my old account "lahoucine <>", and how to make my current account "Lahoucine BENLAHMR <lahoucine@benla16:05>" work (resolve the duplication)? Thank you for your help!16:05
*** pblaho has quit IRC16:05
anteayalahoucine: hi16:05
apevecfungi, speaking of Trove - I've proposed but looks like check job isn't using review branch so it fails16:05
jeblairsdague: why do you want to reduce the integration tests in the gate?16:06
lahoucinehi anteaya16:06
anteayalahoucine: fungi is our gerrit db account clean up person16:06
fungilahoucine: i saw your bug report from earlier (and your private /msg which i hadn't gotten to yet). i'll take a look in a bit16:06
*** rcarrillocruz has quit IRC16:06
sdaguejeblair: so this patch doesn't reduce that16:06
fungilifeless: seeing if i can tell why they're not booting now, but we did get successful completion of that image build last night16:06
sdagueit extracts out common, uncontroversial tests16:07
lahoucinehi fungi, ok thanks16:07
jeblairsdague: i know, but the reason to ditch SergeyLukjanov's work in favor of yours hinges on this16:07
*** juice has joined #openstack-infra16:07
jeblairsdague: so it looks like you don't think neutron-full, large-ops, neutron-large-ops, and cells should be gating16:07
fungiwell, those weren't removed from projects which are currently running them16:08
jeblairfungi: yes, but those _are_ the jobs that are different from sergey's change16:08
sdagueso right now, neutron-full should not be gating16:08
sdagueand I think neutron-large-ops is worth thinking about as whether it's actually a co-gate job16:08
fungii'm in favor of SergeyLukjanov's change too, though it's currently wip16:09
jeblairsdague: neutron-full is gating by virtue of being in check, fwiw.16:09
sdagueit's not actually voting16:09
* SergeyLukjanov reading backlog16:09
jeblairok, well, SergeyLukjanov didn't change that anyway... should neutron-full not be in check?16:10
SergeyLukjanovsdague, jeblair, my CR was just to extract common part of the gate16:10
SergeyLukjanovI'm ok with starting from small pack16:11
jeblairSergeyLukjanov: i know.  sdague has one to extract a smaller set.  but i want to know where this is going.16:11
jeblairam i going to -1 the next 4 patches he submits because they remove gating jobs?  or is he going to submit 4 patches that end up making his change the same as yours...16:11
sdaguejeblair: so SergeyLukjanov's set is a straight extract16:11
jeblairthese are questions that are worth asking beforehand.  :)16:11
sdaguewhich means it doesn't include all the integrated projects16:11
sdaguebecause the integrated projects don't all run these things16:12
SergeyLukjanovIIRC sdague wants to achieve clean set of integr gate  by making small template and add only needed jobs to it and remove all other16:12
sdaguemy approach is start minimal16:12
sdagueand enforce on all integrated projects16:12
sdaguewhich actually means turning on jobs on some of them16:12
jeblairsdague: okay, cool, i just want to know what the end state is16:12
fungilifeless: so the building state nodes were still hanging around, and there were enough of them that nodepool didn't think you needed new ones. i'm deleting them now (it appears after the 8 hour mark nodes in a building state can be deleted, but until then there's a database lock on those rows)16:13
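fungi's diagnosis has two parts: nodes stuck in a building state still count toward the target, so no new ones are requested, and they only become deletable after the 8 hour mark. A rough sketch of both checks follows. The dictionary layout, the function names, and the deficit formula are assumptions for illustration, not nodepool's implementation; only the 8 hour threshold and the figure of 35 nodes come from the discussion.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=8)  # threshold mentioned in the discussion


def stale_building(nodes, now):
    """Nodes that have sat in the 'building' state past the threshold."""
    return [
        n for n in nodes
        if n["state"] == "building" and now - n["state_time"] > STALE_AFTER
    ]


def deficit(min_ready, nodes):
    """Assumed formula: building nodes count as if they will become ready."""
    counted = sum(1 for n in nodes if n["state"] in ("ready", "building"))
    return max(min_ready - counted, 0)


now = datetime(2014, 3, 6, 16, 0)
# Hypothetical: 35 nodes stuck in building since nine hours ago.
nodes = [
    {"id": i, "state": "building", "state_time": now - timedelta(hours=9)}
    for i in range(35)
]

print(deficit(35, nodes))             # 0: nothing new gets launched
stuck = stale_building(nodes, now)
remaining = [n for n in nodes if n not in stuck]
print(deficit(35, remaining))         # 35: replacements can now be requested
```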
sdaguebecause my next patch is to add the heat-slow job to this, because I think we do need to co-gate on that. And otherwise I'd like to discuss at summit if the large ops jobs are actually co-gate or should be on just some projects16:14
jeblairsdague: you think large-ops might be safe to asymmetrically gate?16:14
sdaguebut after resistance to my drop of unit tests in the gate, I'm leaving removes until after summit16:14
sdaguejeblair: yes, because they are basically nova performance tests16:14
fungilifeless: you should (hopefully) see 35 new instances spinning up in nova16:15
sdaguebut, I think that's a summit discussion16:15
jeblairsdague: would we be adding more large-ops jobs if we added it to your template?16:16
sdagueyeh, you'd end up putting it on trove, for instance16:16
jeblairand then remove it when we decide to remove it, rather than decided to freeze it arbitrarily where it is now16:17
sdaguejeblair: we were legitimately out of nodes the last couple of days, so I don't want to burn cycles on uselessness16:17
jeblairsdague: then remove it everywhere.16:17
sdagueI'd rather do some real analysis before making that decision16:18
sdaguethis was the point of doing the small version16:18
sdaguebecause I think we can all agree on that change. And I'd rather not hold that on all the decisions on the stuff that requires some analysis16:19
jeblairsdague: the layout file is way too complicated; i'd much rather have it be comprehensible and represent what we are trying to accomplish and burn the occasional node on a trove test than not be able to understand why large-ops is on these 7 projects but not these other 216:19
*** sarob__ has joined #openstack-infra16:20
fungiand apparently the other reason we've been node starved is that each of rax-dfw and iad had 95 nodes stuck in building for more than 8 hours as well, so i'm deleting those now too16:20
sdaguejeblair: and you don't think it's better to actually have reasons for why each of these things are required for all integrated projects?16:20
jeblairbesides, if node exhaustion is really the most important thing, how's about we not run "gate-noop" on them16:21
*** andreaf has quit IRC16:21
*** rcarrillocruz has joined #openstack-infra16:21
jeblairsdague: i do, but you don't want to examine those reasons until the summit.  so until then you want to maintain the status quo.  afaik the status quo is they run everywhere16:21
sdaguethe status quo is to run where they run16:21
*** sarob___ has joined #openstack-infra16:22
jeblairthey only run not everywhere because we're really bad at updating this file.  we're fixing that.16:22
sdagueI'm not convinced of that. A lot of times they don't run everywhere because when deciding 'does this job make sense here' the answer is no16:23
sdaguesometimes it's misses, and sometimes it's a decision16:23
jeblairsdague: where does it make sense to run large ops then?16:23
fungii think the special snowflake decisions are adverse from a consistency standpoint16:24
*** thuc has joined #openstack-infra16:24
sdaguehonestly, I don't know. And I don't have the time to figure that out right now. So I don't want to add or remove that test to jobs until we do.16:24
fungibecause they make it a lot harder to tell the difference between intentional and accidental exceptions16:24
*** sarob__ has quit IRC16:24
sdaguefungi: I agree with all of this, which is why this was about minimum step forward16:24
*** thuc has quit IRC16:25
*** thuc has joined #openstack-infra16:25
sdagueI have strong justification for the 3 jobs in the template that I added, which I can very much defend16:25
sdaguebut all the rest of those...16:25
sdagueI don't know right now16:25
jeblairsdague: so is there a next step after your patch before the summit?16:29
jeblairsdague: you say: "After this merges we can take other jobs one at a time here." but i'm not seeing what the next step is except wait 3 months16:29
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Auth hotfix
sdagueI think there needs to be analysis of jobs we want to lock into this, especially some documentation that they've managed to catch a cross project issue.16:30
jeblairsdague: okay, so there won't actually be any follow-on patches for 3 months16:32
sdaguewell, first it's 2 months16:32
sdaguesecondly, not without some other analysis16:32
sdagueif we can do that offline, cool16:32
sdagueif not, ensure we discuss at summit16:33
sdaguehonestly, I'm not sure why this is controversial16:33
jeblairsdague: because if we merge your patch i don't know how to review further changes to zuul's configuration.16:34
sdaguethis was minimal for a reason, because I actually want to talk with jogo about the large ops jobs, because I'd like to make sure we understand how they would find issues on a keystone change16:35
fungigiven that, i'm more in favor of whatever we can do now to templatize anything in layout.yaml which will shrink it substantially16:35
jeblairare we in an "integrated job freeze" until the summit?  can we add large ops jobs to missing projects?  can we remove them?  i dunno.16:35
jeblairi know what I think the current status quo is -- we run all the jobs everywhere16:35
*** sabari has joined #openstack-infra16:35
sdaguejeblair: we can add them with justification16:35
jeblairand when we don't do that, it's only because the current structure makes it nearly impossible to catch that in review.  and that's what we were trying to fix, and that's what you're asking us to give up.16:36
sdaguejeblair: we are so far away from running all the jobs everywhere though16:36
sdagueok, so I give up then. I really thought this was a helpful step16:36
*** saper_ has joined #openstack-infra16:38
*** sabari3 has joined #openstack-infra16:38
sdagueI feel like what's currently in the check and gate queues are based on whatever someone managed to get in once, and have a very low level of understanding on what's in there. So I'd rather not massively encode that in the template16:39
sdagueI'd rather encode in the template things we are *really* sure on16:39
sdagueand make it so that's a pretty high standard of *yes* absolutely, it's valuable to run this on all integrated projects for these following reasons16:39
fungifor some reason, ord server boot count has jumped back to 5000 available in the past hour (from 0) so i'm putting it back into rotation too16:39
fungithough i think i'd like to get merged before reenabling puppet on nodepool.o.o16:41
jeblairsdague: right, but our thinking has been "don't create asymmetrical gates, and the more jobs we run, the more nondeterministic bugs we catch"16:41
jeblairsdague: i'm okay revisiting that, but it feels like your change is done with the thought that we should alter that thinking but the completion of that thought is months away16:42
jeblairsdague: whereas there's an alternate approach out there which is that we just "fix" whatever things have slipped through the gap in our current thinking16:42
sdaguejeblair: with clean check required, blocking projects moving code forward because we like to catch non deterministic bugs their code  seems very unfair16:42
sdaguenon deterministic bugs that are unrelated to their code16:43
sdagueif we want more jobs, we should work on getting the other periodic queue running to build that result set16:43
jeblairsdague: sure, but the analysis that it's truly unrelated doesn't exist, and is what you are proposing be postponed for 2.5 months16:43
sdaguewhich gives us the data16:43
*** sweston has joined #openstack-infra16:44
sdaguejeblair: what I'm proposing is I like to actually sleep sometimes16:44
sdagueand not work every weekend16:44
sdagueso I'm being realistic16:44
jeblairwell, i can't argue with that16:44
sdagueif someone else wants to do this analysis earlier, I'm totally happy with that16:45
sdaguebut I think it's part of the process16:45
*** basha has quit IRC16:45
sdagueand with other commitments, it's going to take a while for me to get to it16:45
fungilifeless: SpamapS: did you end up seeing instances appear? i've got 35 more nodepool nodes which claim to have been in a "building" state in your cloud for over half an hour now16:46
*** rpodolyaka has joined #openstack-infra16:46
*** SumitNaiksatam has joined #openstack-infra16:46
sdagueso I'm not saying we can't have that conversation until summit. I'm saying I'm not going to be able to drive it until then16:47
fungilifeless: SpamapS: i think something must still be wrong on your end because i'm getting no instance id or ip address for any of them16:48
fungilifeless: SpamapS: is there perhaps something special which can break when trying to boot from a snapshot, which doesn't come into play when booting from a "normal" glance-uploaded image?16:49
*** yamahata has joined #openstack-infra16:50
jeblairsdague: why not say "everything gates on what everything else gates on" which is more or less what we've been trying to do, and then remove those jobs one at a time if they don't fit?16:52
openstackgerritDoug Hellmann proposed a change to openstack-infra/config: Create ACL groups for oslo.rootwrap
jeblairsdague: i think that leaves the configuration in a cleaner state, and one where it's much easier for us to review changes16:52
sdaguejeblair: because when we are near a milestone we're waiting 1.5 hrs for a devstack node to be allocated on check16:54
sdagueand adding a ton of additional jobs doesn't make that better16:55
sdagueif that's what you want to do, so be it. my opinion is it's the wrong thing because we do actually have finite resources. And it makes me, as QA PTL, actually hold back on adding new parts of this matrix because it's already completely overloaded16:56
sdaguemaybe that's the key tension16:56
lifelessfungi: trying manually with tripleo-precise-1394069856.template.openstack.org16:57
openstackgerritA change was merged to openstack-infra/storyboard: Auth hotfix
sdagueI think we're back in starvation for our crunch times16:57
lifelessfungi: image is in state spawning16:58
sdagueand I'm about to propose another devstack job on everything (the heat one)16:58
lifelessfungi: and running16:58
lifelessfungi: try nova list from the nodepool box?16:58
lifeless| 63e19767-4a6c-42c1-a734-9c179a635730 | live-migration-test2 | ACTIVE | -          | Running     | default-net=
sdagueso I don't think we should be running stuff we aren't sure has value, because our resources are actually finite16:58
jeblairsdague: again, "gate-noop" is a much better target for your ire on that.16:58
*** jungleboyj has joined #openstack-infra16:59
sdagueok, so lets purge that as well16:59
sdaguehow many nodes does that save?16:59
*** saper_ has quit IRC16:59
sdaguenova lost half a day tuesday because of the time it took to get check results in reaction to upstream library release breaks16:59
jeblairsdague: and yeah, there are resource issues and we're trying to fix them.  we could continue to try to fix them, or we could give up and say that our cloud providers are incapable of providing the resources we need.17:00
*** dims has joined #openstack-infra17:00
sdaguejeblair: or we could be a little more careful about what we think we need, to keep some overhead available for when we need it17:01
lifelessfungi: deleted it now17:01
fungisdague: we could stop using oslo libraries so that they don't break nova any more? i'm not quite sure what your point was on that statement17:01
sdaguefungi: my point wasn't the oslo broke us17:01
sdaguemy point was we couldn't figure out if we fixed it in nova17:01
jungleboyjI have a gate run that failed this morning in check-tempest-dsvm-neutron - setting up devstack with a bunch of 'no such file or directory' when trying to get packages.  Any ideas what that might have been?17:01
*** harlowja has joined #openstack-infra17:02
fungijungleboyj: without actual context, no. no idea whatsoever. do you have a link to the log?17:02
*** basha has joined #openstack-infra17:03
*** rpodolyaka has quit IRC17:03
sdaguebut apparently I have a very different view on this one17:03
sdagueso I'm going to abandon that patch, because I don't think it's worth this much fight.17:03
jeblairsdague: i think we should fix the degradation17:04
jeblairand run the tests we want to run17:04
sdaguejeblair: no, I'm actually saying we stop running tests we aren't sure we want to run17:04
*** saper has joined #openstack-infra17:04
sdagueyou are assuming we know we want to run all those tests, and I actually want to have *that* conversation17:04
sdaguebut I don't have time to have it now, or build the data to make the right decisions17:04
jeblairsdague: but you don't want to have it for 2.5 months17:04
sdaguejeblair: I actually want to have it now17:05
fungijungleboyj: that's definitely a new one on me... it looks like apt-get tried to write index files to disk and failed... in hpcloud-az2 as well... checking logstash to see how many jobs may have been impacted on other machines similarly in the past week17:05
jungleboyjfungi: It is like the node's filesystem went bad or something when it was trying to set up the environment.  Don't know what I can reverify against for that though.17:05
sdagueI don't have time to gather enough data to have it be useful now17:05
sdagueI'm totally happy to have it if someone else is willing to go collect that data17:05
jeblairjogo: do you think large-ops should run on changes to all projects or just nova?17:06
jogojeblair: so it doesn't touch cinder, so not cinder for sure17:06
jogobut in *theory* keystone, swift, nova, glance are all tested by it (and neutron for neutron version)17:07
jogoand rootwrap of course17:07
sdaguejogo: for keystone, swift, glance, will it catch issues the other jobs will not?17:07
jeblair"I'm hoping we can get the new rate classes in place by end of day tomorrow (I'm an optimist), but I think worst-case would be next week."17:08
jeblairfungi, sdague: ^ just got an update from rax about the limits17:08
fungijungleboyj: looks like it hit one other job in the past 7 days...
sdaguejeblair: cool17:08
jogosdague: I *think* so. but if you want to run on nova only for resource reasons I am fine with that17:08
jogonova and devstack and tempest that is17:08
fungijungleboyj: within a few minutes of yours17:08
*** sarob_ has quit IRC17:08
jogosdague: so glance i doubt it will catch anything and swift too17:09
sdaguejogo: if you can find any change, ever, where it did find a failure mode we didn't catch in the full job, I'd be fine on running it those places. It's just not clear to me that it will.17:09
jogobut keystone perhaps17:09
jogoactually wait no17:09
fungijungleboyj: and also in hpcloud-az217:09
jeblairi'm really against the idea that we ever chose not to run a test for resource reasons17:09
dimsjogo, pong re: 1.x libvirt  - waiting for UCA team to update 1.2.x in icehouse/uca proposed deb repo17:09
jeblairi'm okay with choosing not to run it because we know it's pointless17:09
jungleboyjfungi: Interesting.  So something environmental?17:09
jogosdague: so nova and neutron are the biggest risks for large-ops17:09
jogojeblair: ^17:09
sdaguejeblair: so that decision gets made all the time17:10
jeblairbut "if you want to run on nova only for resource reasons" elicits from me: "no, not for resource reasons"17:10
jogojeblair: what is the reason?17:10
sdagueand if it's not made at the infra level, so infra is often pegged17:10
fungijungleboyj: i'm willing to bet hpcloud had some sort of issue with some of their storage in az2 around 06:2517:10
sdagueit will get made at lower levels, because developers would rather not be waiting17:10
jeblairsdague: is it never okay to wait for a test result?17:11
SpamapSfungi: good morning.17:11
jogodims: ack, lets move the discussion about this to nova room for a second17:11
sdaguejeblair: I'm not saying it isn't17:11
jeblairsdague: i mean, if it's that important, why not run the test on your workstation.17:11
jungleboyjfungi: Where do you see where it ran?17:11
fungijungleboyj: at the top of the console log. we encode the provider, region and image type into the slave hostname17:11
sdaguebut I'm saying that if people are waiting on infra a lot, they stop coming up with new interesting tests they want to put into the pool17:12
SpamapSfungi: I do not see any instances running on our cloud for your tenant.17:12
fungiSpamapS: yep, i'm trying to whip up a novaclient session on the nodepool server to see what's listed17:12
*** jcoufal has joined #openstack-infra17:12
jeblairsdague: you will convince me that we should not run useless tests.  you will not convince me that we should change our goals in testing based on a temporary performance degradation in the clouds we use.17:12
jungleboyjfungi: Ah, thank you.  So, should I open a bug for this and reverify against that or is there a more appropriate solution?17:12
fungiSpamapS: but for some reason nodepool thinks it launched 35 instances from that image about an hour ago when i cleared the old ones17:13
sdaguejeblair: aren't we running more quota than we ever ran?17:13
fungiSpamapS: and seems to be waiting to hear back what the instance and ip address are17:13
*** e0ne has quit IRC17:13
jeblairsdague: of course, but we're using only a portion of what we could be because of the rax issue17:13
SpamapSfungi: ok I'll look in the logs to see if we have errors17:14
fungijungleboyj: yes, that will work. whatever it was seems to be very brief (and over now) but it'll be useful for tracking purposes in case we see anything related from that timeframe17:14
sdaguejeblair: that was true when we hit the oslo.messaging issue?17:14
sdagueI thought we were just solidly flat out on that17:14
jungleboyjOk.  Thank you fungi !17:14
jeblairsdague: we've been bumping up against the rax servers/day quota daily for about a week17:14
sdaguejeblair: sure, but this was early in the day17:14
jeblairsdague: was there a backlog caused by slow runs late in the day?17:15
sdaguebased on what I saw in the check and gate queues, we wen're hitting quota17:15
sdagueI get there is also a quota issue17:15
jeblairi should have said rate limit17:15
sdaguesure, rate limit17:15
sdagueI do actually understand that issue17:15
jeblairk.  they are both at play so it's probably better to be clear.17:16
*** jcooley_ has joined #openstack-infra17:16
sdaguefrom my recollection on when we were getting boned by this, it looked like we were running as hot as we could (weren't hitting the rate issue)17:16
jeblairsdague: we wasted 220 nodes on gate-noop yesterday.17:16
sdagueover 1 day?17:16
jogodims: do you need help prodding the UCA team?17:17
sdagueso that's 4% of nodes?17:17
jogozul: ^17:17
jeblairsdague: we ran 23863 jobs yesterday.17:17
sdagueok, so 1%?17:17
sdagueso that's not much head room that it gives us17:18
jeblairalso.  omg.  :)17:18
sdagueso, sure, we should get rid of it17:18
jeblairyeah.  not a panacea tho.17:18
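The share discussed above can be checked directly. This is a rough sketch using the two figures quoted in the channel (220 gate-noop runs, 23863 jobs in a day); treating each job as one node is an assumption, and `noop_share` is a hypothetical helper name.

```python
def noop_share(noop_jobs, total_jobs):
    """Fraction of the day's jobs that were gate-noop runs."""
    return noop_jobs / total_jobs


# Figures quoted in the discussion; one node per job is assumed.
share = noop_share(220, 23863)
print(f"gate-noop share of jobs: {share:.2%}")
```

The result is just under one percent, which is consistent with the "1%" estimate and with the conclusion that removing gate-noop frees little headroom.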
*** dkorolev has joined #openstack-infra17:19
jeblairat any rate, if you look at the node graph now, that really high orange bit is not normal.  that's nodepool spinning on trying and failing to create nodes17:19
jeblair(some of them probably tripleo, but a lot of them rax)17:19
sdaguejeblair: yep17:19
sdagueI agree with that, this morning's issue is different17:19
sdagueand not the thing I'm trying to solve17:19
sdagueso what I've seen is we're doing about 5x in check than in gate17:20
jeblairand even when rax is letting us build nodes, if it's come after a sustained period where we could not, we're going to have a backlog to work through17:20
sdagueand gate is merging ~100 a day17:20
sdagueso adding a new job ends up being +600 devstack nodes a day17:20
*** sarob_ has joined #openstack-infra17:20
sdagueif we run it across the integrated projects17:21
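The +600 estimate follows from the two figures just given. A minimal sketch, assuming each run of a job consumes one devstack node and that check sees about five times the gate's volume; `extra_nodes_per_day` is a hypothetical helper name.

```python
def extra_nodes_per_day(gate_runs_per_day, check_to_gate_ratio):
    """Daily node cost of adding one job to both check and gate.

    Assumes one devstack node per job run.
    """
    check_runs = gate_runs_per_day * check_to_gate_ratio
    return gate_runs_per_day + check_runs


# ~100 gate merges a day, check running about 5x that volume.
print(extra_nodes_per_day(100, 5))
```

With those inputs the gate contributes about 100 runs and check about 500, which gives the quoted total.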
SpamapSfungi: nothing in our logs suggests errors. Let me try booting a snapshot.17:22
fungiSpamapS: lifeless said he tried that a few minutes ago and it worked17:22
fungiSpamapS: i wasn't seeing anything obvious in the nodepool debug log containing the novaclient request or response, but i'll look closer17:23
sdagueanyway, decide or not decide on the review. I think the data from jogo says the large ops jobs definitely shouldn't be run everywhere17:24
sdaguewhich is good data17:24
sdaguegot to get to other things17:24
SpamapSfungi: ok. I do see three snapshot images in your tenant...17:24
*** apevec has quit IRC17:24
jogosdague: link?17:24
fungiSpamapS: that sounds right. that's what i have from nodepool too17:24
jeblairjogo: would you be willing to propose a change on top of sdague's change that sets large-ops to run only where you think they should?17:25
*** sarob_ has quit IRC17:25
*** gokrokve has joined #openstack-infra17:26
fungiSpamapS: the most debug-level data i have from nodepool on it looks like (not especially helpful in this case)17:27
*** nati_uen_ has quit IRC17:27
fungiSpamapS: i believe it tried to use snapshot 8799d365-c7e9-49ca-a363-4d0d2a1562ed which we think is named tripleo-precise-1394069856.template.openstack.org17:28
jogojeblair: sure, so one quick question --  what is the motivation for pruning where we run jobs?17:29
jogoright now the job is running in extra places ... so cleaning it sounds good17:30
jeblairjogo: aiui sdague thinks it's unfair to gate a project on a job that can't be affected by it17:30
jogoso in that case cinder shouldn't gate on it17:30
jogowe don't test cinder in the job17:31
jogobut ceilometer could break things17:31
jogonot sure about currently -- but it used to inject code into nova17:31
SpamapSfungi: glance has a different ID for that snapshot17:32
SpamapSfungi: | 117149f6-1bf6-45c4-9b24-149fe0ffe699 | | qcow2       | bare             | 5078908928 | active |17:32
dansmithare we having trouble with the largeops test?17:32
jeblairdansmith: not that i'm aware of17:32
dansmithjeblair: looks like a few of mine are failing on that test, which wouldn't be in that tested path, AFAIK17:33
*** sabari has joined #openstack-infra17:33
fungioh joy... looks like is breaking devstack jobs17:33
jeblairwhat serendipitous timing17:34
jogodansmith: ttp://
fungicould be a mirror update in progress... "Hash Sum mismatch" on some of their indexes17:34
dansmithjogo: ah, thanks17:34
*** jnoller has joined #openstack-infra17:34
*** amcrn has joined #openstack-infra17:34
jogoSpamapS: ^17:35
jeblairsdague: i'm pretty sure we agree.  we've always acknowledged that we're testing combinations that aren't necessary.  but no one has wanted to do the analysis on that to determine which combos are necessary.17:36
openstackgerritDoug Hellmann proposed a change to openstack-infra/config: Add gate jobs for oslo libraries
fungiSpamapS: ahh, you're right, we have image id of 117149f6-1bf6-45c4-9b24-149fe0ffe699 for it... i was looking at the server id which it was built from17:36
jeblairsdague: i think the issue i have is that your patch is the first step in that approach but you're saying you won't be doing that analysis for a while17:36
jeblairsdague: so i'm worried about starting down that road in case we don't actually finish the trip17:36
clarkbjeblair: I was thinking yesterday that the gerrit event stream should include replication event notifications17:37
sdaguejeblair: right, so my opinion is that it leads to 2 failure modes17:37
jeblairclarkb: that would be 3rd way to solve it.  :)17:37
openstackgerritA change was merged to openstack-infra/devstack-gate: Rename tempest.conf so it is gz'ed properly
*** vkozhukalov has joined #openstack-infra17:38
openstackgerritgordon chung proposed a change to openstack-infra/config: enable doc generation for pycadf
sdagueso my feeling is it's better to not default to 'run this job even if we don't know why it's useful'17:38
zaroclarkb: take a look ->
sdaguebecause as much as I know you believe that we should never consider resources as an issue17:39
sdagueI do17:39
jeblairsdague: (fwiw, note the downward trend on building nodes, upward trend on workers and downward trend on waiting jobs which roughly correspond with when fungi put rax back into the config.)17:39
sdagueand I've stopped thinking about new ways we should be testing openstack as a whole because I don't feel we have the headroom for it17:39
fungiwell, and also to when i deleted 190 slaves stuck building for more than 8 hours in iad and dfw17:40
clarkbzaro: woo17:40
jeblairsdague: do you think we need more quota?17:40
clarkbzaro: I guess we deal with it with your patch then17:40
*** gokrokve_ has joined #openstack-infra17:40
fungiand manually added py3k-precise and bare-centos6 to hpcloud az1 and az317:40
jeblairsdague: and are you basing that on the behavior of the system for the past week or other times?17:41
sdagueI'm basing it on the behavior of the system the week of any milestone17:41
jeblair(no matter what our quota and job demands are, if things break we're going to get behind)17:41
sdaguejeblair: right, but I think we need to not be idealistic and realize we've yet to handle a milestone without a break at this point :)17:41
davidlenwellHowdy infra team!  So this review is kicking back pep8 stuff but I'm not seeing it .  should I have them re-review or something ?17:42
*** krotscheck has quit IRC17:42
jeblairsdague: you're really bumming me out.  we're merging like tons of changes with like a couple hours delay in check results despite huge failures from our cloud providers.17:42
clarkbdavidlenwell: ImportError: cannot import name Feature you depend on a thing that uses a feature that was removed from setuptools17:42
clarkbdavidlenwell: which is :(17:43
jeblairsdague: it's not perfect, but it's not nearly as terrible as you're making it out.17:43
sdaguejeblair: I'm not saying it's terrible17:43
davidlenwellclarkb: :( that is sad17:43
clarkbdavidlenwell: markupsafe is the offender17:43
*** gokrokve has quit IRC17:43
jeblairsdague: i'm trying to not tune the system during a period where we know it's not behaving as it should, but you keep insisting that we do, so let's consider it17:44
openstackgerritDavid Lyle proposed a change to openstack/requirements: Adding support for Django 1.6
openstackgerritKhai Do proposed a change to openstack-infra/jenkins-job-builder: fix setting of default values for missing parameters in jenkins.ini file.
jeblairsdague: none of the changes you are talking about would have affected the backlog we've seen due to the rax outage17:44
sdaguejeblair: sure17:44
lifelessfungi: SpamapS: sorry I'm awolish - at the HP office in meetings17:44
jeblairsdague: because that backlog was due to jobs running on nodes that we only run on rackspace17:44
sdagueso you're saying on Tuesday, there was a giant rax outage?17:44
davidlenwellclarkb: where are you finding that?17:44
*** rlandy is now known as rlandy|bbl17:45
SpamapSlifeless: np.. any clues as to what we should do next?17:45
sdaguesorry, I may not have been paying attention to that one.17:45
clarkbdavidlenwell: in the pep8 job log17:45
*** dangers_away is now known as dangers17:45
lifelessStevenK: moffett17:45
sdagueI'm not actually optimizing for what's happening right now17:45
jeblairsdague: nodepool received 58630 OverLimit responses on tuesday17:45
jeblair(utc tuesday)17:46
jogosdague jeblair: wow we did have large-ops in way too many places17:46
sdaguejogo: thank you, this is why I wanted to revisit it17:46
lifelessfungi: SpamapS: do we have a nodepool image-list for the cloud please?17:46
jogosdague: such as heat17:46
lifelessthat should list the external id17:46
fungilifeless: yes, pasting...17:46
jeblairsdague: so yeah, i've been trying to convey that this has been a significant problem for a week.17:47
lifelesswhich should be the cloud uuid for the thing17:47
sdagueso once we are running with headroom again, and it feels like we have it during a milestone like j1, I'll spend time thinking about new ways to test things17:48
openstackgerritMatt Ray proposed a change to openstack-infra/config: Add Ceph support to existing Chef cookbooks
jeblairsdague: you've convinced me that what you want to do is a good thing.  you've also convinced me that you are not going to do it.17:48
jeblairhow can i approve a patch like that?17:48
SpamapSlifeless: I pasted the image UUID earlier17:48
SpamapSfungi: | 117149f6-1bf6-45c4-9b24-149fe0ffe699 | | qcow2       | bare             | 5078908928 | active |17:48
SpamapSlifeless: ^^17:49
fungiSpamapS: lifeless:
*** zhiyan is now known as zhiyan_17:49
fungiSpamapS: that seems to match the newest image i have listed from nodepool too17:49
fungiSpamapS: lifeless: it's the one from the image-update i kicked off last night17:50
clarkbsdague: I don't want to disrupt the other discussion but I think we need to revert there is a reason for not using a static instance and that is so that setup_logging will work as expected17:50
openstackgerritA change was merged to openstack-infra/gerritbot: Some README fixups, including git url
SpamapSfungi: ok so should I see nodepool attempting to boot things?17:52
zaroclarkb: question here,
fungiSpamapS: not at this point because it's still waiting to hear back from the 35 instances it started building17:52
zaroclarkb: not sure what you mean by that.  is that a bad thing?17:52
clarkbzaro: yes, look at the xml diff job log17:52
lifelessSpamapS: so they match up17:52
lifelessfungi: ^17:53
*** sdake_ has joined #openstack-infra17:53
fungii'm testing right now to see if i can make novaclient work sanely on nodepool.o.o for talking manually to your cloud so i can try to emulate the things it's trying to do17:53
lifelessSpamapS: gotta run, another call calls.17:53
*** rcleere has quit IRC17:53
*** reed has joined #openstack-infra17:54
clarkbzaro: no I am saying the way that variable is used doesn't make it a variable17:55
clarkbzaro: the jobs literally get {git-dir} passed to them17:55
*** rcleere has joined #openstack-infra17:55
zaroclarkb: ohh you mean it should be ${git-dir}?17:56
openstackgerritJoe Gordon proposed a change to openstack-infra/config: Don't run large-ops test on repos that it doesn't touch
jogosdague jeblair: ^17:56
clarkbzaro: no, you have to pass git-dir somewhere for interpolation to happen17:57
clarkbzaro: otherwise it doesn't get replaced and the '.' condition in the script is never used17:57
jogothat was a conservative pruning17:58
zaroclarkb: so why would it not interpolate in the macro?  the macro.yaml contains other variables.17:59
clarkbzaro: because you are not passing that as a variable anywhere18:00
clarkbinterpolation only happens if git-dir is set somewhere. Otherwise it remains {git-dir}18:01
*** reed has joined #openstack-infra18:01
fungiSpamapS: i passed the two net-ids we have when running nova boot... default-net=; tripleo-bm-test=
clarkbzaro: at least that is what appears to have happened according to the xml diffs18:01
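The behaviour clarkb describes, where a placeholder that is never supplied survives as literal text, can be illustrated with a small sketch. This is not jenkins-job-builder's implementation; `interpolate` and its regular expression are hypothetical and only mimic the outcome reported from the xml diffs.

```python
import re

# Matches placeholders such as {git-dir}; the allowed characters are an assumption.
PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_-]+)\}")


def interpolate(template, params):
    """Replace {name} only when name is supplied; otherwise leave it untouched."""
    def substitute(match):
        name = match.group(1)
        if name in params:
            return str(params[name])
        return match.group(0)  # unknown placeholder stays literal
    return PLACEHOLDER.sub(substitute, template)


# No git-dir passed anywhere: the job receives the literal text.
print(interpolate("cd {git-dir}", {}))

# git-dir supplied: the placeholder is replaced.
print(interpolate("cd {git-dir}", {"git-dir": "."}))
```

In the first call the script would see `{git-dir}` itself, so a check for `.` in that script would never match.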
fungiSpamapS: those are all rfc-191818:01
SpamapS| 512e73ca-3ec8-405f-88c2-631eacd2a875 | fungible              | ACTIVE | -          | Running     | default-net=; tripleo-bm-test=               |18:03
SpamapSfungi: you should have floating ips to attach18:03
zaroclarkb: ok. i think i get it now.  i'm not sure what the solution would be besides adding git-dir to every project and i don't think that's a viable solution.18:03
clarkbzaro: right, which is why I think we should just use the cd into dir then g-g-p solution that jeblair has suggested18:03
clarkbzaro: see my cover comment18:03
SpamapScloud-init boot finished at Thu, 06 Mar 2014 18:00:28 +0000. Up 40.08 seconds18:03
fungiSpamapS: oh, okay. i think i don't know how to do that. i'll see if i can figure it out. the nova command line is already pretty seriously scary...
zaroclarkb: ohh missed the cover.18:03
zaroclarkb: ok, read the cover.  i agree,  will abandon the change.18:05
SpamapSfungi: nova floating-ip-associate18:06
fungiSpamapS: yeah, i have to floating-ip-create first looks like, according to floating-ip-list18:07
*** chandan_kumar has joined #openstack-infra18:07
SpamapSfungi: yes18:07
fungiSpamapS: just out of curiosity, does tripleo have a development nodepool instance they're testing against that cloud to make sure that it's expected to be working?18:08
fungior is that me?18:08
*** andre__ has quit IRC18:08
*** nati_ueno has joined #openstack-infra18:09
jeblairjogo: cool thanks18:09
davidlenwell"Looks like the node went offline during the build. Check the slave log for the details.FATAL"   on .. is stuff broken or is it me?18:09
*** khyati has joined #openstack-infra18:09
fungiSpamapS: and it's apparently add-floating-ip not floating-ip-associate18:10
jeblairsdague: i think we can proceed with your patch, especially if we can get people to pitch in on reviewing layout changes until we've fully formulated a new policy18:10
sdaguejeblair: sounds good18:10
jogojeblair: didn't think the patch would be so big18:11
anteayadavidlenwell: well you can recheck bug 128437118:12
anteayayou can add a comment to the bug report18:12
anteayayou can read the bug report and see if you have any insight into why that is happening18:12
sdaguejeblair: can't wait for new gerrit with secondary indexes that would make that simpler18:12
*** Ryan_Lane has joined #openstack-infra18:13
jeblairsdague: oh because of file level watches?18:13
jeblairsdague: yeah, i think it works in email just not web18:13
sdagueyeh, it's supposed to, I haven't tried before18:13
sdagueI set one up now18:13
sdaguewe'll see18:13
fungiSpamapS: okay, confirmed i'm able to use novaclient on nodepool.o.o to attach a floating ip to the instance i booted18:14
fungiSpamapS: so whatever nodepool's issue is, we probably need more debug logging to sort it out. i'll generate a stacktrace and see if i can tell whether something's hung in some way. no clue whether that will help but i might spot something18:15
davidlenwellanteaya: alas .. I do not know why that would happen .. I've added my incident to the comments.. how do I make it recheck ..since im sure the problem wasn't on my end.18:16
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Add a script to manage IRC perms
SpamapSfungi: thanks for chasing it. We're very interested in getting CI back up so just ping if you need anything from me.18:16
anteayarecheck bug 128437118:17
fungijeblair: none i've been informed of. just "okay cloud's back up now! where all the nodes at?"18:18
*** hogepodge has joined #openstack-infra18:18
jeblairSpamapS: do you folks have a nodepool pointing at your own cloud to help debug these sorts of things?18:18
* fungi just asked that too... heh18:19
jeblairoh heh18:19
jeblairi mean, when i'm debugging this sort of thing, i just run nodepool on my workstation and point it at rax or hp18:19
fungijeblair: current behavior is that nodepoold has started building nodes (or thinks it has) but has no instance id or ip address in its db for them, and just mentions in its debug log that it's building them but never anything else. confirmed i'm able to nova boot and attach a floating ip and ping an instance from nodepool.o.o using the same credentials and settings listed in the nodepool.yaml18:20
StevenKpleia2: diff from 0.9.8-2ubuntu17 (in Ubuntu) to 1.2.2-0ubuntu1~precise1 (28.8 MiB)18:20
jeblairfungi: when you take a stacktrace, the thread name will have the node id in it so you can see exactly where it's sitting in the process18:21
fungiand nodepool image-update worked fine, built an image, marked it ready, i used that to boot the one i did manually18:21
jeblairyou probably know that18:21
fungiyep, that's what i'm hoping will yield some additional detail18:21
clarkbbtw I haven't heard anything new from hp land this morning18:21
clarkbI will bug people again18:21
fungioh, and nova list doesn't return any of the nodes nodepool thinks it has in a building state either, forgot to mention that18:22
sdaguejeblair: ok, so I'm going to propose the heat-slow add on top of that one then18:22
openstackgerritSean Dague proposed a change to openstack-infra/config: enable heat-slow in the integrated-gate
*** jcooley_ has joined #openstack-infra18:22
jeblairsdague: cool!18:22
clarkbor is that just a name?18:23
openstackgerritA change was merged to openstack-infra/config: Add gate-murano-devstack job
sdagueit's faster than tempest-full18:23
sdagueit includes @slow jobs in tempest18:23
sdaguewhich we normally exclude18:23
jeblairso it's kind of a fast slow18:24
clarkbya so its a different set18:24
sdagueit's not a lot of jobs right now18:24
sdaguebut it does actually bring up real versions of linux18:24
sdaguenot just cirros18:24
clarkboh interesting18:24
openstackgerritElizabeth Krumbach Joseph proposed a change to openstack-infra/config: Add back old projects to replicate to git.o.o
clarkbsince cirros uses dropbear18:24
sdagueclarkb: yeh, that I don't know18:24
clarkbit shouldn't matter but who knows18:24
sdagueI thought the ssh bug was mostly races on network connections18:25
clarkbsdague: but if you read the logs the connection is made18:25
clarkbanyways smarter people than I are debugging that one18:25
sdaguedansmith is working on an event callback api that neutron can call which will make that much better18:25
sdaguebecause it turns out that we are mostly passing tests because cirros dhcps 5 times18:26
sdaguebefore giving up18:26
*** thedodd has quit IRC18:26
sdagueand time 5 usually works (times 1, 2, and 3... not so much)18:26
*** nicedice has joined #openstack-infra18:26
jeblairi don't understand 7805218:27
*** zns has joined #openstack-infra18:28
fungijeblair: clarkb: it's worth pointing out that we may want to scale back our max instances in ord... we're hitting ram quota limit around 63 nodes and nodepool thinks it's allowed to have 92 in there18:28
clarkbnibalizer: help me out with why does puppetdb do anything on the master? aren't they separate hosts? (trying to grok the interaction that happens there)18:28
dansmithjeblair: it's just a stub to get the gate stuff to run with out yet-committed trees across three projects18:28
*** rpodolyaka has joined #openstack-infra18:28
jeblairdansmith: i don't think that will work; the gate only runs what zuul decides18:28
dansmithjeblair: it seems to be working18:29
dansmithjeblair: we've fixed several things it caught that we hadn't reproduced yet18:29
nibalizerclarkb: that class will configure the puppet master to use puppetdb18:29
jeblairsdague: agreed; people have said they would work on that but i haven't seen anything.18:29
jeblairdansmith: if that works then we have a really serious problem18:29
clarkbnibalizer: and restart the puppet master when puppetdb changes are made?18:29
nibalizerif you modify puppet.conf you gotta bounce puppet18:29
nibalizeri think, it might be all smart about that now18:30
clarkbnibalizer: gotcha so that change is just adding the bounce18:30
clarkbeverything else is already in place right?18:30
*** gyee has quit IRC18:30
clarkbcool /me approves18:31
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard: Make token storage configurable
fungion all our systems18:31
clarkbfungi: should that be abandoned / WIP until we kill grizzly?18:31
jeblairdansmith: indeed, devstack did seem to checkout 74832 in
jeblairdansmith: that's _really_ not supposed to happen18:32
jeblairsdague, dtroyer: ^18:32
dansmithjeblair: roger that.. was just trying to find a trace that's only generated in the new code18:32
fungiclarkb: yep, done18:32
dansmithjeblair: you can credit arosen for finding that hole, I just said "oh, nifty" :D18:32
dansmithjeblair: really wish I had --reset-author so I could get the credit :(18:33
dansmithjeblair: okie18:33
lifelessSpamapS: fungi: is the instance fungible fungi ? :)18:33
fungilifeless: so it seems ;)18:33
dansmithjeblair: would really appreciate you not fixing it until we get this stuff in tho :D18:33
jeblairdansmith: (basically, zuul needs to be responsible for setting up repos because setting them up the way zuul wants them to be is very complicated)18:34
openstackgerritVictor Stinner proposed a change to openstack/requirements: Block setuptools 3.0 to workaround a cffi bug
jeblairdansmith: well, devstack-gate in this case18:34
dansmithjeblair: yeah, I get it.. seems like a devstack patch would be able to break that, but nothing else, but IANAIP18:34
*** sabari has quit IRC18:34
fungilifeless: the parts i exercised anyway. at this point i'm hoping the stack trace (once i can find and snip it out of the log full of other noise) will show we're blocking on something where nodepoold got confused at some point while the endpoint was unresponsive18:34
clarkbjeblair: does ERROR_ON_CLONE need to be ERROR_ON_GIT instead?18:35
jeblairdansmith: yes, that should be the case18:35
clarkbhpk hopes to get to pull requests soon. Fingers are crossed we get concrete info on the tox thing.18:37
*** CaptTofu has joined #openstack-infra18:37
*** thuc has quit IRC18:39
jeblairdansmith, sdague, dtroyer: the checkout of refs seems to be fairly self-contained and doesn't look like it indicates a bug that could affect the gate, as long as devstack cores don't actually approve a change that sets the default branches to something with refs18:39
sdaguejeblair: so it looks like the logic in devstack in git_clone is pattern matching for what is probably a zuul ref18:39
*** thuc has joined #openstack-infra18:39
*** dprince has joined #openstack-infra18:39
sdaguejeblair: yeh, I think we're safe there :)18:39
jeblairsdague: well, it would be any kind of ref, zuul or (in this case) raw gerrit18:40
sdaguewe could probably check for it in our just in case18:40
sdagueright ^ref/18:40
sdaguesorry ^refs18:40
*** bhuvan_ has joined #openstack-infra18:40
sdagueit is kind of a useful hack of the system for exactly this case though18:40
jeblairsdague: yeah, i think it's probably safe to leave the facility in there as long as we are careful (and a test for it would be a good way to do that), at least until we can cross-depend in zuul18:41
*** thuc_ has joined #openstack-infra18:42
*** coolsvap has quit IRC18:43
*** thuc has quit IRC18:43
*** thuc__ has joined #openstack-infra18:43
*** chuck__ has joined #openstack-infra18:45
*** thuc_ has quit IRC18:47
*** sweston has joined #openstack-infra18:47
clarkbsdague: with that sorted, can we fix e-r logging?18:50
*** rcarrillocruz1 has joined #openstack-infra18:51
clarkbsdague: want to make sure you think a revert is the right way to tackle that before we go doing that18:51
jogoclarkb: working on a patch to fix the !logging part18:51
*** mwagner_lap has quit IRC18:51
jogoalmost there18:51
clarkbjogo: woot18:51
jogostill getting dropped files18:51
clarkbjogo: did you track that down to gerritlib?18:51
jogoclarkb: worked around it18:51
clarkbthe TypeError18:51
sdaguejogo: great, point me to the patch when you get it up18:51
*** rcarrillocruz has quit IRC18:51
jogosdague: will do18:51
fungijeblair: lifeless: SpamapS: every one of the node launcher threads for all the currently "building" tripleo nodes looks like
sdagueso - is setuptools still in our mirror?18:52
jeblairfungi: so look for the task manager for tripleo18:53
jeblairfungi: they are all waiting for it to complete a task18:53
clarkbsdague: looking18:53
sdaguebecause if so, we should purge it or we should put in the requirements block18:54
sdaguegiven that it's been pulled from pypi18:54
jeblairfungi: possibly it's the old dead connection isn't detected as dead because of lack of keepalive thing18:54
sdaguealso... pycon must be around the corner :)18:54
clarkbsdague: that version is not in our mirror18:54
clarkbsdague: we should be fine18:55
fungijeblair: yep!18:55
sdagueclarkb: ok, when did it drop?18:55
fungijeblair: lifeless: SpamapS:
clarkbsdague: I don't understand the question. I don't think it was ever in our mirror18:55
*** banix has joined #openstack-infra18:55
*** talluri has quit IRC18:55
jeblairfungi: so i've been getting an earful this morning about how unsatisfactory it is that we can't provide test nodes in a timely manner18:56
*** johnthetubaguy has quit IRC18:56
annegentleanyone know the right meaning of OS_TENANT_NAME for HP Cloud? For my credentials file to use nova CLI?18:56
annegentlethe project ID isn't wanted apparently18:56
clarkbannegentle: one sec18:57
annegentleI have a domain id and an account id guess I'll try those and process of elimination18:57
jeblairfungi: so i don't want to restart nodepool to fix that.  i think it can wait until we need to restart for something else, or this weekend or next week.18:57
fungijeblair: makes sense18:57
SpamapSis it not closing dead connections?18:57
*** mrodden has quit IRC18:57
*** chuck__ has quit IRC18:57
fungiSpamapS: it's not closing _a_ connection. one that is still established the last time it heard18:58
fungiSpamapS: if the other end dropped the connection without sending tcp rst or fin, then it's just going to wait forever (or until the next time it's restarted)18:58
clarkbannegentle: looks like it should be the project name18:58
SpamapSfungi: so perhaps we could craft an RST ...18:59
jeblairthat would probably do it18:59
SpamapSis it possible it _is_ still active on this end?19:00
*** dims has quit IRC19:00
sdagueclarkb: ok, never mind, the largeops fails were the other thing19:00
annegentleclarkb: huh. now I get Tenant not accessible.19:00
anteayaso for tomorrow's installment of new project fridays we have: thus far19:01
SpamapSfungi: could you dig out the tcp connection details? we can try sendip on it.19:01
*** jcooley_ has joined #openstack-infra19:01
*** sabari has joined #openstack-infra19:02
* SpamapS has used sendip before but never not as a joke..19:02
*** hogepodge has quit IRC19:02
sdagueman, the fact that nova is broken on vim in cloud archive - ....19:03
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard: Add superuser check
clarkbsdague: is that dims' change to test cloud archive?19:04
openstackgerritA change was merged to openstack-infra/config: Set apache as the puppet service name
*** zns has quit IRC19:04
fungiSpamapS: if you're wanting to work on a patch, there's probably a couple of things we should be doing in nodepool for robustness when reusing client/provider connections... find a way to get tcp keepalives going on the socket (to handle cases where the provider endpoint dies silently) and also periodically recycle connections (to handle things like graceful endpoint moves using dns record changes)19:05
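A minimal sketch of the keepalive half of fungi's suggestion, assuming direct access to the underlying socket; how nodepool would reach the socket inside its provider client is not shown here. The `enable_keepalive` helper and its timing values are hypothetical. The `TCP_KEEP*` options are platform specific, so each is applied only where the `socket` module exposes it.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes so a silently dead peer is eventually detected."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-grained tuning is optional and not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes allowed before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With probes enabled, a blocked read on a connection whose far end vanished without sending RST or FIN fails after roughly `idle + interval * count` seconds instead of waiting until the next restart.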
sdagueclarkb: yeh19:05
*** hogepodge has joined #openstack-infra19:05
clarkbsdague: I was really hoping it would just work :(19:06
sdagueclarkb: actually, no -
sdaguemaybe they just broke the update stream entirely19:06
sdaguehowever, there isn't really any reason we should be explicitly installing vim in devstack19:06
openstackgerritA change was merged to openstack-infra/storyboard: Make token storage configurable
dansmithfungi: should I be rechecking patches after that ubuntu mirror outage thing?19:07
dansmithfungi: I don't want to generate more load if they're just going to fail19:08
fungidansmith: it looked like it was brief... i only saw it hit a few19:08
dansmithfungi: cool, thanks19:08
*** arosen has joined #openstack-infra19:09
fungidansmith: oh, it might still be ongoing? i see it affected some nova changes in the gate about 30 minutes ago19:10
dansmithfungi: okay19:10
dansmithfungi: well, the check queue isn't huge, so maybe it's okay if I do a few?19:10
*** skraynev is now known as skraynev_afk19:11
openstackgerritClark Boylan proposed a change to openstack-infra/elastic-recheck: Revert "move to static LOG"
clarkbsdague: ^19:12
clarkbjogo: ^ I suppose you are probably interested in that too19:13
*** mrodden has joined #openstack-infra19:13
openstackgerritA change was merged to openstack-infra/storyboard-webclient: Removed errant console statements.
sdagueclarkb: so instead of that, can we lazy load it?19:14
SpamapSfungi: any chance that one of those connections shut down?19:14
sdaguesorry, I only just realized what the issue is19:14
clarkbsdague: this is lazy loading the singleton objects19:14
*** alexpilotti has quit IRC19:15
jogoclarkb: so e-r is more broken than I thought :/19:15
clarkbdo you want a LOG object with an internal logger that is filled when logger methods are used on LOG?19:15
fungiSpamapS: doesn't look like it19:15
fungiSpamapS: still all the same port numbers on source and destination19:16
SpamapS19:16:15.563519 IP > ICMP host unreachable - admin prohibited, length 4819:16
SpamapSI don't think it liked my RST19:16
SpamapS19:16:15.524872 IP > Flags [R], seq 819381088, win 65535, length 019:16
sdagueclarkb: ok19:17
SergeyLukjanovfungi, jeblair, clarkb, mordred, we've just selected the name for savanna, so, I'd like to request the repo renaming this weekend (it'll be really awesome, to have more time before the RC)19:18
fungiSpamapS: i can try sending it from the host to itself, or from an adjacent vm if i can find one. what command-line options were you trying?19:18
anteayaI just looked at my itinerary for my trip, i thought I left tomorrow night but I don't, I leave tonight19:18
SpamapSfungi: sendip -v -p ipv4 -p tcp -ts 13774 -td 60307 -tfr 1 -tfs 0 -is
fungiSpamapS: giving it a whirl19:19
annegentleclarkb: ah, figured it out, had to activate services on my project19:19
clarkbannegentle: oh yeah you have to do that for individual things in different regions19:19
*** sweston has quit IRC19:20
fungiSpamapS: i was trying to do something similar with hping3 the other day (to spoof tcp/rst packets in an attempt to close down gerrit client connections) but wasn't having much luck. never knew about sendip19:20
annegentleclarkb: kind of nice for protecting my bill... workshop for sxsw Sunday19:20
annegentleclarkb: :) cross cloud workshop19:20
*** thedodd has joined #openstack-infra19:20
clarkbannegentle: sounds like fun19:21
NobodyCamGood morning infra quick question: is setuptools-3.0.2 newly uploaded this morning?19:21
clarkbNobodyCam: yes and it should be gone now19:21
fungis/deprecated/removed/ (it was already deprecated)19:22
SpamapSfungi: I think we actually need the sequence number.19:22
NobodyCamahh ok ... :) thank you :)19:22
fungiSpamapS: i agree, and ran out of time to figure out whether i could dig it out of the kernel19:22
SpamapSfungi: right thats what I'm trying to determine19:23
anteayaso that means I am not here tomorrow for new project friday19:23
SpamapSfungi: check /proc/net/ip_conntrack19:23
*** e0ne has joined #openstack-infra19:24
fungiSpamapS: eureka!19:24
fungitcp      6 298488 ESTABLISHED src= dst= sport=60307 dport=13774 src= dst= sport=13774 dport=60307 [ASSURED] mark=0 use=219:24
lifelessfungi: can you send it a RST ?19:25
StevenKfungi: My fault19:25
fungii assume 298488 is the ipseq19:25
openstackgerritMichael Krotscheck proposed a change to openstack-infra/config: Added krotscheck as a user to
lifelessom over to the sprint19:25
fungilifeless: that's what we're trying to figure out19:25
SpamapSfungi: so I think the RST needs to be that +119:25
SpamapSfungi: sendip -v -p ipv4 -p tcp -ts 13774 -td 60307 -tfr 1 -tfs 0 -tn 298489 -is
SpamapSI'm still getting ICMP denials from the actual server19:26
*** sarob_ has joined #openstack-infra19:27
mtreinishclarkb: I'm too lazy to do a diff, what's the difference between: and ?19:28
*** gyee has joined #openstack-infra19:29
fungiSpamapS: no luck from this end either (locally on the machine or from another host in the same region). iptables might be doing some sort of anti-spoofing to prevent egress of these packets... i'll fire up tcpdump shortly19:29
clarkbmtreinish: one passes pep8 and is the result of a revert. The other was my first stab at it, I will abandon the one that isn't a revert19:29
clarkbmtreinish: basically better book keeping in one19:29
*** lcheng has joined #openstack-infra19:30
mtreinishclarkb: heh, ok I was reviewing the revert and thought it looked familiar so I got confused19:30
jogoclarkb: I am going to rebase my WIP to fix ER on your patch and we should have a working e-r again19:30
clarkbjogo: great19:31
clarkbmtreinish: sorry for the confusion19:31
clarkbpleia2: speaking of nodepool fedora. is a thing and I couldn't get centos6 nodes running in hpcloud yesterday. I am about to try again to test that change. If that continues to fail any chance you have a node that we can edit /etc/default/grub on then update-grub, reboot, take a snapshot then boot from the snapshot?19:33
mtreinishclarkb: no it's good, it keeps me on my toes.19:33
openstackgerritA change was merged to openstack-infra/elastic-recheck: Revert "move to static LOG"
pleia2clarkb: having a look19:34
*** dhellmann_ is now known as dhellmann19:34
SpamapSfungi: ty.. we're having debates about whether or not we can just do a very high sequence and whether ack number is important19:35
greghaynesWhat happens when you mention tcp innards to a room of nerds19:35
*** hogepodge has quit IRC19:36
fungiSpamapS: yeah, it looks like i can't send it remotely (never arrives) but locally on the interface i get several which look like19:36
fungi19:35:05.757062 IP > Flags [R.], seq 2147916301, ack 999999994, win 65535, length 019:36
clarkbpleia2: ya I still get 2014-03-06 19:36:26,611 -[WARNING]: '' failed [49/120s]: url error [[Errno 113] No route to host] I am going to try a west instead of east19:36
clarkbalso the centos6 image has a 5 second grub timeout19:37
fungiSpamapS: oh, actually, o19:37
* clarkb should find out if we can have images that just boot19:37
pleia2clarkb: doh19:37
fungiSpamapS: actually i'm not receiving the ones i'm sending either. i think it was your attempts i was picking up via tcpdump19:37
*** apevec has joined #openstack-infra19:38
SpamapSfungi: I've sent a few...19:38
fungiSpamapS: yeah, i captured three19:39
*** e0ne has quit IRC19:39
pleia2clarkb: I'll give it a try (my vms are on west at the moment anyway)19:39
fungiSpamapS: which means they are arriving19:39
*** e0ne has joined #openstack-infra19:39
*** mrodden1 has joined #openstack-infra19:40
*** hogepodge has joined #openstack-infra19:41
SpamapSfungi: just sent this one19:41
SpamapS19:41:35.760724 IP > Flags [R], seq 298489, win 65535, length 019:41
fungiyep, saw it19:42
SpamapSfungi: ok.. and when you try to send it, you get the same ICMP denial?19:42
*** mrodden has quit IRC19:42
SpamapSsendip -v -p ipv4 -p tcp -ts 13774 -td 60307 -tn 298489 -tfr 1 -tfs 0 -is
fungiSpamapS: i get no reply packet at all, just silence19:42
SpamapSfungi: conntrack still showing the same thing?19:43
tchaypopleia2: I don't have a review button on that change19:43
fungiSpamapS: seq is 296009 now19:43
fungiSpamapS: wait, wrong socket. 29735019:44
ttxfungi: anything wrong with gate right now ? I kinda need 78670,1 to cut I319:44
ttxand the top looks a bit funny19:45
pleia2clarkb: so the change will be different in centos because /etc/default/grub doesn't exist19:46
fungittx: 77710 failed a while ago on a rackspace ubuntu mirror issue19:46
pleia2tchaypo: should! I don't know why you wouldn't :(19:46
clarkbpleia2: that is what I was afraid of but couldn't verify because EMETADATASERVER19:46
fungittx: 77941 looks like a nondeterministic bug somewhere raised in a tempest test19:46
SpamapSfungi: no that is not sequence19:46
ttxfungi: ok so there may still be hope for 7867019:46
clarkbpleia2: is centos6 grub1?19:46
SpamapSfungi: that is seconds until the entry is deleted19:46
fungiSpamapS: it's timeout19:46
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Unbreak elastic-recheck
fungiSpamapS: what you just said19:46
*** zhiyan_ is now known as zhiyan19:46
jogoclarkb sdague: doing final round of testing on that ^19:47
* fungi is juggling too many irons in separate fires19:47
fungittx: i think so, yes19:47
clarkbttx: yup I think there is hope19:47
jogojust got a live one19:47
jogohopefully all this ER activity won't crash ES19:48
jogoclarkb sdague: it worked19:48
pleia2clarkb: 1:0.97-77.el6 ugh19:48
jogo- gate-tempest-dsvm-full:
clarkbjogo: it looks happyish. I mean we doubled its size so have room to grow19:48
jogoER lives again19:48
clarkbpleia2: woot!19:48
fungiSpamapS: i'm going to see if conntrack-tools will get us some relief19:48
jogoclarkb: well now ER will actually start DOSing it again19:48
jogoonce my fix lands19:48
clarkbjogo: oh good point19:48
clarkbjogo: well whatever19:48
dstufftfungi: setuptools 3 got pulled from PyPI a bit ago19:49
clarkbpleia2: what about fedora?19:49
SpamapSfungi: ok. One thought, the RST only has to be within window-size of the sequence number..19:49
clarkbpleia2: I am guessing that is grub2 and has /etc/default/grub19:49
SpamapSfungi: so if window is 65535 we don't have that many windows to try..19:49
fungiSpamapS: very true19:49
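A rough sketch of the arithmetic behind SpamapS's point, assuming the classic rule that a reset is accepted when its sequence number falls anywhere inside the receiver's window: stepping through the 32-bit sequence space one window at a time puts at least one guess in every window. The helper name is hypothetical, and this only enumerates candidate numbers; it sends nothing.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def candidate_sequence_numbers(window, start=0):
    """Yield one sequence number per window-sized step across the whole space."""
    if window <= 0:
        raise ValueError("window must be positive")
    for offset in range(0, SEQ_SPACE, window):
        yield (start + offset) % SEQ_SPACE


if __name__ == "__main__":
    count = sum(1 for _ in candidate_sequence_numbers(65535))
    print(count)  # 65538 guesses cover the space for a 65535-byte window
```

A smaller advertised window raises the number of guesses proportionally, which is why the window size matters for this approach.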
pleia2clarkb: spinning up a new fedora now to check19:49
jeblairfungi, clarkb: i'm kind of thinking that i don't really want to review channel-level acl changes... so maybe i should rework the accessbot to set perms for global things, and then maybe revoke +F,etc from anyone else, but otherwise leave things be?19:52
openstackgerritMichael Krotscheck proposed a change to openstack-infra/config: Added Authorization Header flag to storyboard module
fungijeblair: that seems like it would be less work in the long run. sounds good to me19:53
clarkbjeblair: wfm19:53
mordredjeblair: ++19:54
fungiSpamapS: conntrack was no help. it allowed me to delete the connection tracking entries from the table, but did not actually close out the established sockets associated with them :(19:55
jeblairfungi: that might have been anti-help19:55
fungijeblair: yeah, i can re-add them though19:55
SpamapSfungi: and that may now prevent me from actually getting an RST through19:55
pleia2clarkb: heh, of course fedora does not really use /etc/default so much (it has a couple things in it, but it doesn't seem to be the way they do things)19:56
*** markwash has joined #openstack-infra19:56
clarkbpleia2: :) so my change as is is safe for precise but nothing else19:56
pleia2clarkb: seems so19:57
*** nicedice_ has joined #openstack-infra19:57
pleia2I'll dig into this and see what we need to do in rh19:57
clarkbI think I managed to get a centos host in west19:57
jeblairclarkb, pleia2: does it break fedora/centos or just not work there?19:57
fungilifeless: no help on a dead socket " needs to sniff the connection and extract the magic Acknowlegment and Sequence numbers from a TCP packet..."19:57
clarkbjeblair: it would fail the image build19:57
clarkbjeblair: because the file and update-grub aren't a thing on centos19:57
lifelessfungi: would it hurt to try?19:58
fungilifeless: um19:58
fungilifeless: i don't have those pieces of data, and have no way to obtain them. try what exactly?19:58
lifelessfungi: it sends a syn to the socket19:58
lifelessfungi: and uses that to figure out what to send to kill it19:58
fungilifeless: i get that. just guess something?19:58
lifelessKillcx works by creating a fake SYN packet with a bogus SeqNum, spoofing the remote client IP/port and sending it to the server. It will fork a child process that will capture the server response, extract the 2 magic values from the ACK packet and use them to send a spoofed RST packet. The connection will then be closed.19:59
fungiit sounds like that part only actually works with windows, suggesting linux doesn't respond to its probe19:59
lifelessfungi: it describes how it works on linux19:59
*** dstanek has quit IRC20:00
*** dstanek_afk is now known as dstanek20:00
fungioh, i see. their instructions sounded like i needed to get those values20:00
jeblairfungi: i thought that too20:00
fungiworth a try, but first i probably need to restore the connection tracking entries20:00
SpamapSfungi: would you mind 2.9 million RST's sent at that machine from ours?20:00
*** openstackgerrit_ is now known as openstackgerrit20:01
SpamapSthat would hit every possible window20:01
*** wchrisj__ has joined #openstack-infra20:02
*** wchrisj_ has quit IRC20:03
*** nicedice has quit IRC20:03
*** wchrisj has quit IRC20:03
*** harlowja has quit IRC20:03
*** mrodden1 has quit IRC20:03
*** mrodden has quit IRC20:04
*** mrodden has joined #openstack-infra20:04
lifelessSpamapS: IF we're not facing NAT20:04
mordredjeblair: I believe we're ready for an ssl cert for - you normally get those, yeah?20:04
mordredjeblair: wow. you're amazing20:04
SpamapSlifeless: facing NAT?20:04
SpamapSlifeless: there's no NAT.20:04
lifelessSpamapS: on the nodepool server?20:06
*** rlandy|bbl is now known as rlandy20:06
SpamapSlifeless: the TCP connection I was shown shows the real IP's on both sides.20:06
fungilifeless: SpamapS: okay, the conntrack entries for those two sockets are restored20:07
*** vkozhukalov has quit IRC20:08
krotscheckjeblair: Could you fix storyboard for us?
SpamapSfungi: I'll try with a big window size first20:09
lifelessSpamapS: ok what are the conn details ?20:11
*** sweston has quit IRC20:11
*** eharney has quit IRC20:12
Ngdo you guys know about cutter?20:13
*** malini is now known as malini_afk20:13
*** mrodden1 has joined #openstack-infra20:14
openstackgerritA change was merged to openstack-infra/config: Added Authorization Header flag to storyboard module
*** mrodden has quit IRC20:15
*** ihrachys is now known as ihrachys|afk20:15
SpamapSfungi: anything?20:16
lifelessfungi: thanks20:17
SpamapSi just hit all 65535 byte windows20:17
fungiSpamapS: no change... still there20:17
lifelessfungi: tried killcx ?20:17
fungii'm about set with the requirements for that lifeless, yes20:17
lifelessfungi: cool. fingers crossed.20:17
fungistill working on it20:17
fungilifeless: SpamapS: that seems to have done it--good find!20:20
fungishort little program too... didn't take too long to audit20:20
StevenKfungi: Can you paste the output, out of interest?20:21
SpamapSfungi: killcx got it?20:21
fungiStevenK: SpamapS: yep!
krotscheckjeblair: Thanks!20:23
fungiSpamapS: lifeless: StevenK: it's not packaged in precise, but worked with libnet-rawip-perl libnet-pcap-perl and libnetpacket-perl on precise20:24
*** mriedem has joined #openstack-infra20:24
fungithat's a useful one to keep up the sleeve for future use. something i've occasionally wanted to be able to do on linux and never found a good tool for20:25
StevenKIt's not packaged at all, as far as I can tell20:25
fungipro'lly not20:25
*** rlandy has quit IRC20:26
StevenKPerl modules are incredibly easy to package up, so I may just accidentally do it.20:27
openstackgerritMonty Taylor proposed a change to openstack-infra/config: Make storyboard run over ssl
pleia2clarkb: so fedora cloud images don't actually have grub2 packages, they have a grubby package which is used to update the grub.cfg which it seems to use, but I can't get it booting in a way that respects the mem=8G20:28
fungiStevenK: it's useful enough i'd get it into sid if i had time, but you know... time20:28
pleia2clarkb: will dig more after lunch20:28
clarkbpleia2: thank you20:29
clarkbpleia2: I think I mostly have centos sorted20:29
*** eharney has joined #openstack-infra20:30
*** zns has quit IRC20:32
jomarafungi: quick git question for you, i have a question pertaining to my 'situation' yesterday - of the 5 patches, #1 was merged, #5 was decoupled, and #2 was found to have a spelling error i need to fix. am i safe to just edit #2 & git-review that, or should i edit #2, then cherry pick #3 & #4, and git-review that?20:32
*** zns has joined #openstack-infra20:34
jomaraim a little gunshy after yesterday :)20:34
krotscheckMonty wants me to share this:
mordredfungi, jeblair: ^^^20:34
jomaraok, got it20:35
jomarathat is the same thing you wanted me to do yesterday (which i ended up aborting, because i had to decouple the 5th patch instead of preserving it)20:35
fungijomara: if #5 is decoupled, then assuming your old topic branch still had it depending on the others, you'll want to git reset --hard to patch #4 before starting the rebase (and git stash first if you had anything else being edited in there you hadn't committed yet)20:38
*** dangers is now known as dangers_away20:38
jomarafungi: my old topic branch ends in #4, so i should be ok20:38
jomaraalso now that ive started this it makes perfect sense, thanks20:38
fungioh, perfect20:38
clarkbthat is a lot of seconds at 20k servers a day20:38
fungijomara: also you can 'git review -d NNNNN' the review number of change #4 if you need a fresh topic branch for it and its dependencies20:39
*** rossella_s has quit IRC20:40
*** rlandy has joined #openstack-infra20:41
*** yolanda_ has quit IRC20:42
*** rossella_s has joined #openstack-infra20:42
openstackgerritClark Boylan proposed a change to openstack-infra/config: Limit nodepool nodes to 8GB of RAM
clarkbpleia2: ^ that covers ubuntu and CentOS20:42
clarkbpleia2: left a blank spot in there for Fedora, feel free to push a patchset that addresses Fedora (and add yourself as a co author in the commit)20:43
openstackgerritClark Boylan proposed a change to openstack-infra/config: Limit nodepool nodes to 8GB of RAM
clarkbpleia2: ^ that will actually edit grub.conf20:47
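The grub.conf edit discussed here amounts to appending a `mem=` argument to the kernel command line. A minimal sketch of that idea, assuming a legacy grub.conf with `kernel` lines; the function name `limit_kernel_memory` is hypothetical and this is not the code from the proposed change:

```python
def limit_kernel_memory(grub_conf, limit="8G"):
    """Append mem=<limit> to every kernel line that has no mem= argument yet."""
    result = []
    for line in grub_conf.split("\n"):
        words = line.split()
        is_kernel = bool(words) and words[0] == "kernel"
        has_mem = any(word.startswith("mem=") for word in words[1:])
        if is_kernel and not has_mem:
            # Keep the original indentation, add the argument at the end.
            line = line.rstrip() + " mem=" + limit
        result.append(line)
    return "\n".join(result)
```

Running it twice leaves the file unchanged, so it is safe to apply on every image build. It only helps when the bootloader that actually runs reads this file.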
*** sweston has joined #openstack-infra20:49
clarkbfungi: I will be around20:49
clarkbso lets do that20:50
*** Ryan_Lane has quit IRC20:50
SergeyLukjanovfungi, clarkb, thank you, I need to clarify that we're ready to do it this weekend, so, will be ready to ack it tomorrow20:52
*** bhuvan_ has quit IRC20:53
*** ildikov_ has joined #openstack-infra20:54
*** sandywalsh has quit IRC20:55
*** eharney has quit IRC20:56
clarkbjogo: I have approved your e-r fixes. I will keep an eye on it20:56
fungiSpamapS: lifeless: i temporarily bumped the min-ready for tripleo-precise nodes on each jenkins master by one, so that they'll all get your jobs registered again, however nova list takes a crazy long time to return and shows a bunch of instances in an error state. is something still wrong there?20:57
*** e0ne has joined #openstack-infra20:57
*** krotscheck has quit IRC20:58
*** bhuvan has joined #openstack-infra20:59
*** bhuvan_ has joined #openstack-infra20:59
*** SumitNaiksatam has quit IRC21:00
*** SumitNaiksatam has joined #openstack-infra21:00
clarkbjeblair: fungi mordred SergeyLukjanov for stuff like do we want to discuss that more than two +2s and a +A? I feel slightly bad approving something like that without more discussion/agreement21:01
*** krotscheck has joined #openstack-infra21:01
mordredclarkb: I agree with more discussion, although I also agree with the patch21:02
SergeyLukjanovclarkb, agree with extended discussion21:02
fungiit probably should go on the -dev ml in a dedicated thread with [3rd-party ci] subject tag or something21:03
clarkbgreat. fungi we await your vote (no rush)21:03
fungibecause input from the people we're imposing this new requirement on is at least some of what we need, as well as those impacted by not imposing the requirement21:03
clarkbfungi: ++ want to suggest that on the change?21:04
jogoclarkb: thanks, I went to lunch so glad someone was tracking it21:04
clarkbjogo: the service should restart here shortly I think21:04
jogowendar: BTW21:05
clarkbif it doesn't I will go digging21:05
*** bhuvan_ has quit IRC21:05
*** bhuvan has quit IRC21:05
wendarjogo: yeah, I was just looking it over, looks good!21:05
*** andreaf has joined #openstack-infra21:06
*** dzimine has joined #openstack-infra21:06
lifelessfungi: it will come good eventually21:06
fungilifeless: funzies21:06
lifelessjogo: ^21:06
dziminefolks, we have a problem on stackforge with Mistral21:07
dziminea few commits that are failing with the exact same error:21:07
dzimineWe looked at the logs and looks like it's CI itself: at least we can't find a problem on our side.21:08
dzimineAny help? Thanks in advance!!21:08
fungidzimine: "ImportError: cannot import name Feature"21:08
fungidzimine: new setuptools was released today which removed "Feature" for setup.py21:08
fungidzimine: looks like MarkupSafe expects it21:08
*** Ryan_Lane has joined #openstack-infra21:08
mordredI agree with fungi21:09
fungidzimine: you should probably convince the MarkupSafe developers to fix that21:09
*** andreaf has quit IRC21:09
mordredfungi: wow - really - the setuptools guys released a breaking change?21:09
mordredthat just removes something?21:09
*** mrodden1 has quit IRC21:09
fungimordred: they deprecated "Feature" in 2.x and removed it in 3.x21:09
*** mriedem has quit IRC21:09
*** andreaf has joined #openstack-infra21:09
*** sweston has joined #openstack-infra21:09
lifelessmordred: yes, see dstufft's note yesterday21:10
*** mriedem has joined #openstack-infra21:10
fungithough admittedly most of the time deprecation warnings are more or less silent, so some packages continued using it unaware they were eventually going to break21:10
fungicffi was also known to be affected, and they were working on a fix as of today21:11
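The failure above comes from a setup.py importing `Feature` unconditionally. A minimal sketch of a guarded import, assuming the package can still build without its optional features; the `HAVE_FEATURE` flag is hypothetical, and this is not necessarily how MarkupSafe or cffi addressed it:

```python
# Guarded import: fall back instead of failing when setuptools
# (or this name within it) is not available.
try:
    from setuptools import Feature
except ImportError:
    Feature = None

# Hypothetical flag a setup.py could check before declaring optional features.
HAVE_FEATURE = Feature is not None
```

A setup.py written this way would skip its optional C extensions on a setuptools release without `Feature` rather than raise "ImportError: cannot import name Feature".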
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard: Fixed name resolution in OAuth token
*** lcostantino has quit IRC21:13
*** mrodden has joined #openstack-infra21:13
clarkbpleia2: I did, please do change it to 8G21:14
*** blamar has joined #openstack-infra21:14
clarkbI was testing on an 8G node so used 2G to see it change21:14
*** mriedem1 has quit IRC21:14
clarkbdstufft: you should be good now I thought they pulled the release21:15
dzimineok, I'll try out. Thanks!21:15
*** lcheng has quit IRC21:16
*** jswarren has joined #openstack-infra21:16
clarkb they did, 3.0 is gone21:16
fungiclarkb: ahh, yep, so they did21:16
fungisimple index is missing it now too21:16
pleia2clarkb: so the centos fix *should* work with fedora, it adds it to the grub.conf fine, but something is weird about fedora, when I reboot it's still got 16G ram :\21:18
clarkbpleia2: the image you are using may not include the bootloader in it or some such21:19
clarkbpleia2: the non pvhvm rax images have this problem too21:19
clarkbpleia2: but as long as it doesn't completely fall over and break I think we decided to go with it21:19
clarkbfungi: ^ you seemed knowledgeable about that stuff21:20
pleia2clarkb: yeah, it doesn't come with grub2, just comes with "grubby" which is a tool that can update the config21:20
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Fix nesting for  required files
clarkbI am admittedly in the dark a bit when it comes to the various ways you can boot via kvm21:20
* SpamapS looks into why novaclient connections don't have a timeout21:20
clarkbSpamapS: ++21:20
* SpamapS finds that it is because none of urllib3, requests, or novaclient, set one.21:20
SpamapSYou must, as a novaclient user, set a timeout, currently.21:21
SpamapSwhich seems.. sily21:21
SpamapSsilly even21:21
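Since no layer sets a timeout by default, the caller has to pass one on every request. A minimal sketch of a wrapper that fills in a default, assuming the wrapped callable accepts a `timeout` keyword; `with_default_timeout` and `DEFAULT_TIMEOUT` are hypothetical names, not part of novaclient or requests:

```python
import functools

DEFAULT_TIMEOUT = 30.0  # seconds; an arbitrary illustrative value


def with_default_timeout(func, timeout=DEFAULT_TIMEOUT):
    """Return func wrapped so that a timeout is always passed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # An explicit timeout from the caller wins over the default.
        kwargs.setdefault("timeout", timeout)
        return func(*args, **kwargs)
    return wrapper
```

With requests installed, `get = with_default_timeout(requests.get)` would make every `get(url)` call give up after 30 seconds instead of hanging indefinitely.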
fungiclarkb: the issue being that some virtualization implementations boot with an external bootloader (some even with an external kernel too) so if your second-stage bootloader isn't being run from within the image or the external bootloader isn't looking within the image for its configuration then you can end up having no guest-level control over kernel command line parameters21:22
*** ivanand has joined #openstack-infra21:22
clarkbfungi: gotcha21:22
fungiclarkb: though in this case, i expect our providers are launching a secondary bootloader from within the image, since that's the most flexible way to support lots of different platforms21:23
pleia2so, it doesn't hurt to do the centos thing to fedora, it just doesn't work21:23
fungirackspace may, however, document what their bootloader sequence and makeup looks like21:24
*** bhuvan has joined #openstack-infra21:24
fungiif so and if someone can find that information, then we don't have to guess21:24
clarkbpleia2: in that case I think we collapse the two elifs into one21:24
clarkbelif [centos] || [fedora]21:25
pleia2clarkb: and just deal with fedora images ending up with too much ram?21:25
annegentleerror: src refspec 0.9 matches more than one.21:25
annegentleI'm trying to do a tagged release of the openstack-doc-tools repo, I've done it a few times now. But for 0.9 I'm getting "21:25
*** dzimine has quit IRC21:25
clarkbpleia2: yeah21:25
pleia2clarkb: wfm21:25
clarkbpleia2: we can't boot them in hpcloud currently which is where we need the restriction21:25
clarkbpleia2: but "handling" fedora keeps the nodepool scripts simple21:26
* pleia2 nods21:26
openstackgerritMonty Taylor proposed a change to openstack-infra/config: Make storyboard run over ssl
annegentleclarkb: oh it's there, locally, 0.9. Hm21:26
pleia2clarkb: you want to take care of this in the review? also commit message typo: nodepoll21:27
clarkbpleia2: sure and thanks21:27
pleia2sure thing, thanks21:27
*** dzimine has joined #openstack-infra21:27
dstufftmordred: Note, the thing that was pulled was deprecated in version 1.021:27
annegentleclarkb: looks like I can git tag -d 0.921:28
dstufftI don't think I agree with the decision to pull it21:28
openstackgerritClark Boylan proposed a change to openstack-infra/config: Limit nodepool nodes to 8GB of RAM
clarkbpleia2: ^21:28
dstufftbut it had some warning at least :/21:28
clarkbdstufft: to be fair version 1.0 is less than a year old right?21:28
*** thuc has quit IRC21:28
clarkbdidn't it happen late summer early fall21:28
clarkbannegentle: yes that should be fine21:29
annegentleclarkb: thanks21:29
*** thuc has joined #openstack-infra21:29
*** ivanand has quit IRC21:29
openstackgerritA change was merged to openstack-infra/storyboard: Fixed name resolution in OAuth token
annegentleclarkb: hm still seeing it21:30
*** bhuvan_ has quit IRC21:30
*** bhuvan has quit IRC21:30
mordreddstufft: I just think if setuptools is going to start aggressively doing stuff - they should replace easy_install with pip21:31
*** jcoufal_ has joined #openstack-infra21:31
mordredsince easy_install is ACTUALLY broken21:31
clarkbannegentle: it shows up in `git tag` after deleting it?21:31
*** jcoufal has quit IRC21:31
mordredand setuptools.Feature is only probably weird and doesn't really hurt many people21:31
dstufftmordred: it actually broke things for some popular projects21:31
dstufftsetuptools.Feature that is21:31
mordreddstufft: the existence? or the removal?21:32
dstufftSQLAlchemy, cffi, Markupsafe21:32
dstufftthe removal21:32
*** dzimine has quit IRC21:32
mordredyah -I'm saying, if they're willing to remove that ...21:32
dstufftmostly for optional C extensions21:32
mordredperhaps let's replace easy_install instead21:32
mordredbecause, you know, ponies and unicorns21:32
pleia2clarkb: it's ok, you're just helping me get my review status up21:32
openstackgerritClark Boylan proposed a change to openstack-infra/config: Limit nodepool nodes to 8GB of RAM
clarkbI swear those comments registered but then totally forgot21:33
*** thuc has quit IRC21:33
*** thuc has joined #openstack-infra21:34
*** thuc has quit IRC21:34
openstackgerritA change was merged to openstack-infra/elastic-recheck: Fix nesting for  required files
*** thuc has joined #openstack-infra21:35
jeblairjhesketh_: good morning21:36
*** madmike has quit IRC21:36
jeblairjhesketh_: i'm sorry i'm behind on your zuul changes; i hope to catch up next week21:36
pleia2clarkb: so my final concern here is that prepare_node is the common script (tripleo uses it too), and we may not want to limit tripleo to 8G always21:37
jhesketh_there's much more important things :-)21:37
openstackgerritA change was merged to openstack-infra/meetbot: manual: fix typo
*** dzimine has joined #openstack-infra21:37
jeblairjhesketh_: btw, there was a bit more brainstorming this morning about the replication check in zuul; you might want to scan scrollback for it21:37
*** jcoufal_ has quit IRC21:37
jeblairjhesketh_: we don't have an answer yet, just some more thoughts21:37
clarkbpleia2: I think it is ok to limit the tripleo slave node21:38
clarkbpleia2: as the actual test envs are behind that node21:38
pleia2clarkb: lifeless isn't thrilled with this idea21:38
jhesketh_jeblair: okay, will do21:38
clarkbwe aren't limiting the tests in any way, just the proxy node21:38
pleia2yeah, aware21:38
jeblairpleia2: not okay limiting the 8g node to 8g?21:38
clarkbright that too21:39
pleia2jeblair: well they're 8G now :)21:39
lifelessjeblair: can't we just put it in ?21:39
jeblairi don't think it's a bad idea for the unit test nodes either21:39
jeblairso if not everywhere, then where pleia2 suggested (bare and devstack)21:40
lifelessjeblair: I'm just concerned that if we want more memory for cache on the slaves, that we don't want to perturbate devstack at that time21:40
*** dzimine has quit IRC21:40
clarkbwhat cache?21:41
clarkbthese nodes are used to proxy gearman right?21:41
clarkbare we using tmpfs on them for something?21:41
mordredso - I think if a node wants to have more memory, then it's a different type of node and should potentially have a different node definition21:41
*** denis_makogon has joined #openstack-infra21:44
clarkbsdague: do you need to reconcile that with the thing you are doing?21:45
*** atiwari has quit IRC21:46
*** yamahata has quit IRC21:47
*** yamahata has joined #openstack-infra21:47
sdagueclarkb: yeh21:48
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Lower rax-ord max-servers in nodepool to 56
pleia2clarkb: disk cache for building all the images on the nodepool node21:48
jeblairzaro: can it wait until next week?21:49
clarkbpleia2: that is a tmpfs?21:49
sdagueclarkb: honestly, we should layer that on top of the one I've got21:49
openstackgerritMonty Taylor proposed a change to openstack-infra/config: Make storyboard run over ssl
jeblairzaro: (i'd like to avoid making anyone's life more difficult this week if possible)21:50
pleia2clarkb: er, no21:50
clarkbpleia2: then why does the ram matter?21:50
*** dkliban has joined #openstack-infra21:51
SpamapSok so we've got 45 active nodes in the tripleo CI cloud.. but still getting "NOT_REGISTERED" for checks21:52
*** jcoufal has joined #openstack-infra21:52
*** lcheng has joined #openstack-infra21:52
*** blamar has quit IRC21:52
lifelessI think there is some confusion - I know I'm confused :)21:53
SpamapSfungi: ^^ ?21:53
*** blamar has joined #openstack-infra21:53
clarkblifeless: we want our slave nodes to boot with 8GB of memory21:53
jeblairSpamapS: OverLimit: Quota exceeded for instances: Requested 1, but already used 100 of 100 instances (HTTP 413) (Request-ID: req-88fd93b8-dd2d-4665-a385-416daa8d157c)21:53
jeblairfungi: ^21:53
clarkbso that we don't let code slip through that requires larger nodes than that21:53
fungijeblair: SpamapS: yep21:53
fungiSpamapS: lifeless: remember i said i saw a ton of instances in "error" state?21:54
clarkblifeless: it is an artificial limit on RAM so that we can boot flavors with more CPU or disk or anything that isn't RAM21:54
SpamapSneutron is choking because of db queue pool things21:54
SpamapSbut some things are working21:54
mordredlifeless: which is a workaround for an HP Cloud 1.1 issue that is being worked but will be the way things are for a while21:54
fungiSpamapS: we're not getting any all the way to a ready state21:54
SpamapSquotas are a bit inaccurate... 95 instances total including the errors.21:55
SpamapSfungi: ahh21:55
*** thuc has joined #openstack-infra21:56
zarojeblair: no hurry.  just throwing it out there.21:56
jeblairzaro: ok, cool.21:57
lifelessclarkb: ok. So I want to be able to give the tripleo-gate slaves more RAM in future without accidentally breaking devstack-gate21:58
clarkblifeless: thats fair, but all I am asking is why?21:58
lifelessclarkb: because, we build disk images in those slaves, and thats too slow today, and we haven't analyzed why its slow yet.21:59
clarkbwhat runs on those machines that needs more than 8GB of ram?21:59
fungiSpamapS: so currently, nodepool isn't aware of any existing nodes in that provider... any it attempts to build immediately meet with over quota errors, and so it deletes them and tries again21:59
openstackgerritA change was merged to openstack-infra/config: Fixed update of env var in manila's job
lifelessclarkb: we build 4-5 disk images, which means LOTS of IO so we want lots of page cache.21:59
SpamapSfungi: quotas fixed21:59
annegentleclarkb: sorry had to step away. So I removed 0.9 tag, then added it back, then on the git push gerrit 0.9 I'm getting the error again21:59
lifelessclarkb: plus we use a tmpfs to store the transient image21:59
lifelessclarkb: upping them to 16G is one of the very first items on my 'and now we try optimising things' list21:59
annegentleclarkb: and I don't see a 0.9 tag on the remote22:00
*** zhiyan is now known as zhiyan_22:00
pleia2clarkb: ah, so we do use a tmpfs22:00
pleia2sorry :)22:00
fungiSpamapS: i see 52 stably building now. crossing fingers again22:01
* fungi needs to go make some dinner... bbiaw22:01
SpamapSfungi: thanks22:01
*** pdmars has quit IRC22:03
clarkbannegentle: can you paste the output?22:03
clarkblifeless: gotcha, but wouldn't you need to change the nodepool configs anyways?22:03
clarkbI see this as making things more complicated for a noop22:04
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Add fingerprint for swift bug 1288918
clarkb(and the nodepool scripts are already unfortunately complicated)22:04
jeblairclarkb: i think he's saying that he would immediately need to move the 8g knob out of the tripleo path because he needs it not to be there22:04
*** dzimine has joined #openstack-infra22:05
fungiSpamapS: you have ready nodes22:06
*** zns has joined #openstack-infra22:06
annegentleclarkb: sure here's more than you want probably :)22:07
*** lcestari_ has quit IRC22:07
SpamapSfungi: is nodepool failing to delete the 52 that are in error?22:07
fungiSpamapS: probably, since it didn't/doesn't seem to know they exist22:07
*** atiwari has joined #openstack-infra22:08
*** khyati has quit IRC22:09
lifelessclarkb: if we change the nodepool config and a script caps our ram on boot, thats going to make the change a no-op right?22:09
fungiSpamapS: spot-checking one of those in an error state, it failed on create and the log says the exception was "OperationalError: (OperationalError) (2006, 'MySQL server has gone away') 'UPDATE node SET external_id=%s WHERE = %s' ('efacd12d-95d7-4079-b4d8-3c0303e24515', 2151728L)"22:10
*** jungleboyj has quit IRC22:10
SpamapSfungi: ow!22:11
fungiSpamapS: that was at 20:56:19 utc22:11
SpamapSfungi: I think nodepool DoS'd us. ;)22:11
SpamapSpoor little cloud :p22:12
fungiSpamapS: i think that was sqlalchemy complaining about the local mysql database on the nodepool server, so maybe it ddos'd itself22:12
*** packet has quit IRC22:12
jeblairit probably exceeded the mysql connection timeout22:13
fungigot it22:13
jeblairit == time nodepool spent waiting for something to happen22:13
fungiand then mysqld said "you're taking too long, it's someone else's turn"22:13
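If the diagnosis above is right, the database connection sat idle past the server's timeout and the next query failed. A minimal sketch of one common remedy, retrying once on a fresh connection; `StaleConnectionError`, `run_with_reconnect`, and the `connect`/`operation` callables are all hypothetical stand-ins, not nodepool code:

```python
class StaleConnectionError(Exception):
    """Stand-in for a driver error such as 'MySQL server has gone away'."""


def run_with_reconnect(connect, operation, retries=1):
    """Run operation(conn); on a stale connection, reconnect and try again."""
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return operation(conn)
        except StaleConnectionError:
            if attempt == retries:
                raise  # out of retries, let the caller see the error
            conn = connect()  # drop the dead connection and open a new one
```

With SQLAlchemy, an alternative is the `pool_recycle` argument to `create_engine`, which replaces pooled connections older than a given number of seconds before they can go stale.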
SpamapSfungi: the instances on our side have all kinds of crazy errors22:14
*** zns has quit IRC22:14
SpamapSwe were seeing lots of stuff fail because neutron was throwing 500's22:14
SpamapSfungi: all software has bugs ;)22:14
*** mbacchi has quit IRC22:14
jeblairsometimes i wonder if there are people just as busy inside of hp and rackspace cleaning up errors caused by nodepool on their side as we are on ours22:14
fungiSpamapS: you *are* aware that neutron is only designed to handle one request in a cloud at a time, right? ;)22:14
openstackgerritClark Boylan proposed a change to openstack-infra/config: Limit non tripleo nodepool nodes to 8GB of RAM
SpamapSjeblair: a cloud of dissonance.22:15
* fungi is really going to cook dinner now, before his gf comes after him with knives or something22:15
jeblairfungi: eyes on the stove22:16
* SpamapS cooks eyes in the microwave22:16
clarkbSpamapS: sounds tasty22:17
pleia2clarkb: thanks :)22:17
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Make IRC bot list which failures were seen in which job.
annegentleclarkb: any thoughts? What am I not seeing?22:19
clarkbannegentle: sorry catching up on that now22:19
annegentleclarkb: no worries22:19
*** bhuvan_ has joined #openstack-infra22:20
*** bhuvan has joined #openstack-infra22:20
*** bhuvan has quit IRC22:20
*** bhuvan has joined #openstack-infra22:20
*** bhuvan_ has joined #openstack-infra22:20
clarkbannegentle: huh22:20
annegentleclarkb: yeah me too, puzzled22:21
*** mkoderer has quit IRC22:21
annegentleclarkb: guess I can try it from another server22:21
annegentleclarkb: to make sure it's not something local22:21
clarkbannegentle: I wonder if 0.9 is too ambiguous for some reason22:22
clarkbjeblair: ^22:22
*** dzimine has quit IRC22:25
jeblairclarkb, annegentle: context switching; hang on a sec22:25
jeblairoh.  interesting.  possibly...22:26
openstackgerritRussell Bryant proposed a change to openstack-infra/config: Trim down gantt check/gate jobs
*** rpodolyaka has joined #openstack-infra22:26
* jeblair pokes around a bit22:26
openstackgerritA change was merged to openstack-infra/elastic-recheck: Add fingerprint for swift bug 1288918
jeblairannegentle: what does "git show 0.9" produce?22:27
jeblair(i get fatal: ambiguous argument '0.9': unknown revision or path not in the working tree. )22:27
*** dims has quit IRC22:28
*** rcarrillocruz1 has quit IRC22:28
*** yassine has joined #openstack-infra22:28
*** jnoller has quit IRC22:29
*** khyati has joined #openstack-infra22:32
*** jhesketh__ has joined #openstack-infra22:34
jeblairclarkb: ping?22:35
jeblairannegentle: ping?22:35
*** bhuvan___ has joined #openstack-infra22:35
*** bhuvan_ has quit IRC22:36
clarkbjeblair: pong22:37
clarkbjeblair: git show 0.9 does the same thing for me22:37
*** bhuvan has quit IRC22:37
jeblairannegentle: i'm still curious about that ^ whenever you are back22:38
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Move 1286963 into queries folder
*** morganfainberg_Z is now known as morganfainberg22:41
*** dims has joined #openstack-infra22:41
*** sarob_ has quit IRC22:42
*** smarcet has left #openstack-infra22:43
*** alex-gone is now known as Alexandra22:46
*** esker has quit IRC22:47
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: If not posting IRC comment for unrecognized error, log something
*** esker has joined #openstack-infra22:48
*** rpodolyaka has quit IRC22:48
*** rpodolyaka has joined #openstack-infra22:49
*** rpodolyaka has quit IRC22:49
*** jungleboyj has joined #openstack-infra22:50
SpamapSmost of the error nodes were caused by scheduling errors presumably because of the races that are inherent :p22:50
*** esker has quit IRC22:52
jeblairSpamapS: wow it's like the errors we see in tempest are real22:53
SpamapSjeblair: whoa whoa whoa ... let's not jump to logic and reason22:54
morganfainbergjeblair, hehe22:54
*** zns has joined #openstack-infra22:55
morganfainbergpleia2, let me know when you have some time to chat (possibly next week) so we can brainstorm some on monitoring stuffs. I have some thoughts but I want to aim for a little more real-time before we start tossing things into "bugs". Make sure I'm not off in left field.22:55
*** rcarrillocruz1 has joined #openstack-infra22:55
*** thomasem has quit IRC22:55
*** lcheng has quit IRC22:56
pleia2morganfainberg: yeah sure, some time on monday work for you?22:56
*** rcarrillocruz has quit IRC22:56
*** bhuvan___ has quit IRC22:56
*** jamielennox|away is now known as jamielennox22:57
*** rcarrillocruz has joined #openstack-infra22:57
morganfainbergpleia2, sounds good. uhm, I'm Pacific time, so I tend to be around a bit later than the east coast folks. other than that, i should be mostly free22:58
JayFI'm working on trying to get a new project imported for ironic, and I can't find any documentation on what modifications are required to openstack-infra/config to execute on it. I've added it to gerritbot_channel_config.yaml, jenkins_job_builder/config/projects.yaml and review.projects.yaml -- What else do I need before pushing the merge request?22:58
*** dstanek_afk has quit IRC22:58
*** jcooley_ has quit IRC22:59
clarkbJayF: is probably the best documentation for the process22:59
clarkbJayF: basically s/stackforge/openstack/ as you do it22:59
openstackgerritA change was merged to openstack-infra/elastic-recheck: Move 1286963 into queries folder
pleia2morganfainberg: yeah I'm pacific too, just give me a ping whenever, I'll be around all day :)23:00
morganfainbergpleia2, awesome23:00
JayFAnd that core group referenced in the gerrit acls is added manually?23:00
*** rcarrillocruz1 has joined #openstack-infra23:01
*** blamar has quit IRC23:01
*** lcheng has joined #openstack-infra23:02
*** rcarrillocruz has quit IRC23:02
clarkbJayF: if it is a new group the magical scriptage to add the project will add the group, but it won't have any initial members. A human needs to add the first member who is then allowed to add the remaining members23:02
JayFgotcha. That works. I can take care of that, tyvm for pointing me at a document23:02
JayFtrying to reverse engineer it was... difficult23:02
clarkbJayF: you can leave a comment on that change or file a bug with us to let us know who the initial member should be23:03
JayFPerfect. Thanks!23:04
sdagueSpamapS: welcome to actually gating :)23:04
*** oubiwan__ has joined #openstack-infra23:05
jeblairclarkb, fungi, mordred:
jeblairclarkb, fungi, mordred: that's what the version of the script i just pushed will do; does that look sane?23:06
jeblairactually, i'm going to change something...23:06
jeblair(i'm adding +f to the operators acl)23:07
*** dstanek_afk is now known as dstanek23:08
*** mugsie has quit IRC23:09
jeblairclarkb, fungi, mordred:
fungijeblair: so that's the differences it would currently apply?23:10
jeblairfungi: yep23:10
clarkbI have to go read channel modes now23:10
clarkbthere are so many23:11
jeblairclarkb: /msg chanserv help flags23:11
jeblairis what i've been working off of23:11
*** oubiwan__ is now known as oubiwann-ef23:13
jeblairokay, i'm going to like go ahead and run that and stuff then.23:13
*** banix has quit IRC23:13
jeblair'openstackinfra' does not show up in that list, so i'm fairly confident it won't hose us.23:13
clarkbjeblair: start with one channel first if you are worried?23:14
jeblairclarkb: off it goes.  :)23:15
jeblairin slow motion; it has a 1 second sleep after each command to avoid flood protection, so it's kinda in slow motion.23:15
jeblairi said that twice23:15
*** reaper has quit IRC23:15
jeblairin my defense, i'm watching the other screen mostly.23:15
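The pacing described above, one command followed by a fixed pause, can be sketched as below. This is an illustration only, not the script that was run; `send_throttled` and its parameters are hypothetical, and `sleep` is injectable so the loop can be exercised without waiting:

```python
import time


def send_throttled(commands, send, delay=1.0, sleep=time.sleep):
    """Send each command, pausing after each one to stay under flood limits."""
    for command in commands:
        send(command)
        sleep(delay)
```

With a 1 second delay, N commands take roughly N seconds, which matches the "slow motion" behaviour mentioned in the channel.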
*** mfink has joined #openstack-infra23:18
*** bhuvan has joined #openstack-infra23:19
*** bhuvan_ has joined #openstack-infra23:19
*** yassine has quit IRC23:20
*** dizquierdo has quit IRC23:22
*** thedodd has quit IRC23:23
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Add statusbot to all known channels
*** bhuvan has quit IRC23:26
*** bhuvan_ has quit IRC23:26
*** e0ne_ has joined #openstack-infra23:26
*** e0ne has quit IRC23:28
sdagueso what's the RPC issue - ?23:30
sdaguegrr, one sec23:30
sdague2014-03-06 22:42:55.806 | + git fetch refs/zuul/master/Z01d5da78d42e4addbab287038e67118623:30
sdague2014-03-06 22:45:14.060 | error: RPC failed; result=7, HTTP code = 023:30
jeblairirc changes are all done, and the script doesn't want to make any more changes on the next pass23:30
jeblairsdague: that machine is very lightly loaded:
sdaguewell, it just failed23:31
sdaguecausing a gate reset on a pep8 job23:32
jeblairsdague: i understand your question.23:32
*** dkranz has quit IRC23:32
*** mriedem has quit IRC23:32
jeblairso this is tricky...23:35
jeblairnodepool logs the v4 address of machines it creates, but not the v623:35
jeblairand that machine was talking to zm01 over v623:35
zaroclarkb: please check PS3 cover comment
jeblairwe should probably have all the jobs output their ip addresses23:35
jeblairi can guess one from the logs though23:35
clarkbjeblair: ++ to dumping ip addresses23:36
clarkbzaro: I was hoping that someone had checked it won't break review.o.o too :)23:36
*** andreaf2 has joined #openstack-infra23:37
zaroclarkb: sorry out of my league, but i think you can do.23:38
openstackgerritA change was merged to openstack-infra/config: add tests for gerrit builds
clarkbjeblair: for jheskeths add footer change to zuul. Would you prefer I allow you to review that before approving?23:39
jeblairclarkb: please23:39
*** rpodolyaka has joined #openstack-infra23:39
*** andreaf has quit IRC23:39
jeblairsdague: i can't track down a log entry for that.  the lack of ip address means i may just be missing it.  but it's also possible that the worker just never successfully connected to zm01.23:40
*** bhuvan_ has joined #openstack-infra23:41
lifelessclarkb: hey, jogo says I should talk to you about getting tripleo seed/undercloud/overcloud logs into the e-r log pipeline23:43
openstackgerritJeremy Stanley proposed a change to openstack-infra/jeepyb: Welcome message hook query result is an int
clarkbif they end up at logs/screen-servicename.txt on the log archive it will be automagic23:44
clarkbjenkins console log is automagic for all jobs23:44
lifelessclarkb: these are synced from different nodes23:45
lifelessclarkb: and we have N logs per job - e.g. 2 nova-compute logs23:45
clarkboh that makes it trickier23:45
clarkbwhat we really need to do is have the http getter walk the tree over http23:45
clarkbbut I haven't had time to do that. Then pattern match filenames instead of full paths23:46
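Matching on filenames rather than full paths would let several copies of the same log, synced from different nodes, all be picked up. A minimal sketch of that selection step, assuming the tree walk has already produced a list of paths; `select_logs` and the default patterns are hypothetical, not the existing log pipeline code:

```python
import fnmatch
import posixpath


def select_logs(paths, patterns=("screen-*.txt", "console.html")):
    """Keep paths whose final component matches any of the patterns."""
    selected = []
    for path in paths:
        name = posixpath.basename(path)
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            selected.append(path)
    return selected
```

Two nova-compute logs stored under different node directories would both match `screen-*.txt`, regardless of where in the tree they sit.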
*** rcleere has quit IRC23:46
*** fbo is now known as fbo_away23:47
*** bhuvan_ has quit IRC23:48
*** bhuvan has quit IRC23:48
openstackgerritBrant Knudson proposed a change to openstack/requirements: Uncap sphinx
openstackgerritJoshua Hesketh proposed a change to openstack-infra/zuul: Add debugging metrics to RPC
openstackgerritJames E. Blair proposed a change to openstack-infra/statusbot: Don't crash on invalid UTF8
jeblairfungi, clarkb: ^ wow.23:54
*** alexpilotti has quit IRC23:54
clarkbjeblair: wow23:56
*** jp_at_hp has joined #openstack-infra23:56
*** changbl has quit IRC23:56
*** Ryan_Lane1 has joined #openstack-infra23:57

