Friday, 2013-09-27

dansmith	sorry if I missed it in the scrollback, but things are wedged right now, yes?	00:00
*** sarob has quit IRC		00:01
sdague	clarkb: awesome	00:02
fungi	dansmith: not that anyone's said until now...	00:04
sdague	dansmith: stable/grizzly is still a problem	00:04
dansmith	fungi: the top thing in the check queue looks to have been there for five hours	00:04
sdague	but master should be fine	00:04
dansmith	my thing queued for master has been sitting in check for 2+ hours	00:05
clarkb	dansmith: I think what is happening there is we have enough jobs in the gate queue that we are starving the check queue	00:06
clarkb	dansmith: as gate queue jobs get dibs on slaves first	00:06
dansmith	clarkb: really? 36 in the gate right?	00:06
clarkb	dansmith: yes	00:06
clarkb	dansmith: but the new NNFI causes a lot more thrashing. Less time in between for check to catch up	00:06
dansmith	it's been much higher than that in the not too distant past	00:06
dansmith	ah	00:06
clarkb	tl;dr we need to fix flakiness	00:07
dansmith	that's some pretty bad starvation.. 5h with no progress..	00:07
*** gyee has quit IRC		00:07
dansmith	okay	00:07
jeblair	also, more cloud servers	00:07
jeblair	but mostly flakiness	00:07
*** sarob_ has quit IRC		00:09
sdague	jeblair: can we burst some more nodes? getting to rc1 is going to be tough if stuff is hanging in check that long	00:10
*** ArxCruz has joined #openstack-infra		00:10
dansmith	cha	00:10
*** sarob has joined #openstack-infra		00:10
sdague	also, we should probably drop large-ops from gate, non voting on the gate just burns time	00:11
fungi	we'd need to get hp to raise our quotas, right?	00:11
sdague	or put the rack nodes back in rotation	00:11
sdague	slow on check wouldn't be that big a deal	00:11
fungi	true	00:11
*** dims has joined #openstack-infra		00:11
*** sarob has quit IRC		00:12
*** sarob has joined #openstack-infra		00:13
jeblair	sdague: that's what i've been working on. :)	00:13
*** dcramer_ has joined #openstack-infra		00:14
jeblair	sdague, fungi: zuul is able to thrash nodes faster than nodepool can keep up, so i'm working on getting nodepool to be able to more or less instantly burst to capacity	00:14
jeblair	we are, however, at the moment pretty close to capacity.	00:14
jeblair	(we've worked up to it over a while)	00:14
* fungi nods		00:15
*** reed_ has quit IRC		00:17
jeblair	node selection by pipeline is possible. we could reserve rackspace nodes for that purpose. we're going to run into unit test node starvation too, which is the next thing i'm going to work on. of course we can spin up more static nodes for now.	00:17
openstackgerrit	Sean Dague proposed a change to openstack-infra/config: drop large-ops from gate (it's non voting) https://review.openstack.org/48545	00:17
jog0	was logs.openstack.org down for a split second for sdague's new patch?	00:17
clarkb	jog0: apache may have been restarted momentarily	00:17
sdague	so that will help a little	00:17
*** adalbas has quit IRC		00:17
jeblair	jog0: are you ready to make large-ops voting or should we consider https://review.openstack.org/48545 ?	00:17
jog0	clarkb: that explains what i saw thanks	00:18
jog0	jeblair: I am ready	00:18
jeblair	jog0: then can you propose a change to do that	00:19
sdague	so the neutron job looks like it has < 50% pass rate right now - https://jenkins01.openstack.org/job/gate-tempest-devstack-vm-neutron/	00:22
clarkb	it would be cool if gearman priority could be weighted so that as things aged in check they would get more priority and could flop positions with gate	00:22
jeblair	sdague: that includes check jobs	00:22
sdague	it does	00:22
sdague	but I watched 2 neutron based resets in the last 4 minutes	00:22
jeblair	i'm going to be busy with the nodepool bursting change, if someone else wants to take making rackspace nodes available for check jobs	00:22
clarkb	jeblair: I can take a quick stab at it. There is a usergroup thing at 6 that I plan on going to though	00:24
jeblair	sdague: that's complex. i'd rather throw more machines at the problem.	00:24
clarkb	jeblair: how would we make it so those nodes are only used for check? new label and new jobs?	00:24
*** UtahDave has quit IRC		00:24
clarkb	or use a zuul function?	00:24
jeblair	clarkb: new label and zuul parameter function that sets the node to that label	00:25
clarkb	got it.	00:25
openstackgerrit	Joe Gordon proposed a change to openstack-infra/config: Make gate-tempest-devstack-vm-large-ops voting https://review.openstack.org/48547	00:25
jog0	jeblair: done	00:25
*** matsuhashi has joined #openstack-infra		00:28
*** colinmcnamara has joined #openstack-infra		00:28
*** MarkAtwood2 has quit IRC		00:29
*** colinmcnamara has quit IRC		00:35
*** rockyg has quit IRC		00:38
*** nosnos has joined #openstack-infra		00:38
openstackgerrit	Clark Boylan proposed a change to openstack-infra/config: Use rackspace for tempest check tests. https://review.openstack.org/48549	00:38
clarkb	jeblair: fungi mordred ^ I am really sure that is wrong as gearman node selection doesn't happen with NODE_LABEL iirc	00:39
clarkb	and I need to head to the user group thing, but that should jumpstart the process, feel free to push better patchsets	00:39
*** jhesketh has joined #openstack-infra		00:39
*** rnirmal has quit IRC		00:39
*** senk has joined #openstack-infra		00:40
*** kong has joined #openstack-infra		00:41
jhesketh	jeblair: What do you think about introducing conditional reporting into zuul. For example, since we'll be running our own zuul to report back to gerrit we don't want it to report on merge failures. In fact, we probably only need it to report in certain cases. For example, when our tests fail we always want to report FAILURE but we only need to report SUCCESS when there is a new migration introduced.	00:41
*** weshay has quit IRC		00:41
*** CaptTofu_ has quit IRC		00:43
jog0	clarkb sdague logstash is only 7 hours behind now!	00:44
sdague	nice	00:44
*** julim has joined #openstack-infra		00:44
jog0	looks promising, hopefully it's not just related to people's workday	00:44
sdague	yeh, we'll find out tomorrow	00:45
jog0	sdague: saw your new patch in action, will make gate on stacktrace easy	00:45
clarkb	it's not. job queue fell by 100k in about an hour	00:46
jog0	\0/	00:47
clarkb	change definitely helped	00:47
*** julim has quit IRC		00:48
*** senk has quit IRC		00:49
jog0	that should have been Obama's catch phrase for his second term	00:51
*** senk has joined #openstack-infra		00:53
*** senk has quit IRC		00:53
*** senk has joined #openstack-infra		00:54
mordred	sdague: I did not see your patch. tell me about it!	00:59
mriedem	sdague: do you have any ideas about this quantumclient issue in the stable/grizzly gate? https://review.openstack.org/#/c/48299/	01:03
*** portante\|afk is now known as portante		01:03
*** xchu has joined #openstack-infra		01:04
Alex_Gaynor	Hmm, so we probably have the ability to compute what %age of gate jobs are passing?	01:04
jog0	Alex_Gaynor: there is a way, but I forget how; it uses graphite.openstack.org	01:05
Alex_Gaynor	jog0: trying to analyze if my feeling that the fail rate has been crazy high for the last 1-2 days is accurate	01:05
jog0	http://graphite.openstack.org/graphlot/?width=586&height=308&_salt=1380244013.092&target=stats.zuul.pipeline.gate.job.gate-tempest-devstack-vm-full.FAILURE&target=stats.zuul.pipeline.gate.job.gate-tempest-devstack-vm-full.SUCCESS&target=stats.zuul.pipeline.gate.job.gate-tempest-devstack-vm-neutron.SUCCESS&target=stats.zuul.pipeline.gate.job.gate-tempest-devstack-vm-neutron.FAILURE&from=00%3A00_20130926&until=23%3A59_20130926	01:07
Alex_Gaynor	So going back two weeks leads me to believe that yes, failure rates are up	01:09
jeblair	jog0: what's the attraction of graphlot?	01:09
*** sodabrew has quit IRC		01:09
jeblair	as opposed to composer	01:10
jeblair	i find composer easier to use for finding metrics, changing time windows, and applying functions...	01:11
jog0	Alex_Gaynor: http://graphite.openstack.org/graphlot/?width=586&from=00%3A00_20130919&_salt=1380244287.508&height=308&target=summarize(stats_counts.zuul.pipeline.gate.job.gate-tempest-devstack-vm-neutron.FAILURE%2C%2224h%22)&target=summarize(stats_counts.zuul.pipeline.gate.job.gate-tempest-devstack-vm-neutron.SUCCESS%2C%2224h%22)&until=23%3A59_20130926&lineMode=staircase	01:11
Alex_Gaynor	jog0: cool, science confirms my intuition!	01:12
*** jrgarciahp has quit IRC		01:12
jog0	jeblair: that was the link that I found first	01:12
jog0	Alex_Gaynor: I can point to the bug too	01:13
jeblair	jog0: please do; i'd like to see who is assigned	01:13
*** senk has quit IRC		01:13
Alex_Gaynor	jog0: my impression was there was a handful of bugs causing this?	01:14
jog0	http://logstash.openstack.org/#eyJzZWFyY2giOiIgQG1lc3NhZ2U6XCJBc3NlcnRpb25FcnJvcjogU3RhdGUgY2hhbmdlIHRpbWVvdXQgZXhjZWVkZWQhXCIgQU5EIEBmaWVsZHMuYnVpbGRfc3RhdHVzOlwiRkFJTFVSRVwiIEFORCBAZmllbGRzLmZpbGVuYW1lOlwiY29uc29sZS5odG1sXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzODAyNDQ0MzM2NzZ9	01:14
jog0	Alex_Gaynor: at least one bug	01:15
jog0	jeblair: no one because I noticed it today	01:15
jog0	I can't even find a stacktrace that caused it	01:15
Alex_Gaynor	jog0: my impression was that it was the boto and the test_volume_boot_pattern ones?	01:15
jeblair	jog0: thank you for that.	01:15
jog0	https://bugs.launchpad.net/tempest/+bug/1230407	01:16
uvirtbot	Launchpad bug 1230407 in neutron "State change timeout exceeded" [Undecided,Confirmed]	01:16
jeblair	also, i'm becoming more and more keen on the idea that we should run the neutron test 10 times for every neutron change	01:16
jog0	jeblair: hahaha	01:16
jog0	by that I mean yes!	01:16
*** thomasm has quit IRC		01:18
jog0	http://logstash.openstack.org/#eyJzZWFyY2giOiJAbWVzc2FnZTpcIk5vdmFFeGNlcHRpb246IGlTQ1NJIGRldmljZSBub3QgZm91bmQgYXRcIiBBTkQgQGZpZWxkcy5idWlsZF9zdGF0dXM6XCJGQUlMVVJFXCIgQU5EIEBmaWVsZHMuZmlsZW5hbWU6XCJsb2dzL3NjcmVlbi1uLWNwdS50eHRcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM4MDI0NDY2ODQ5Nn0=	01:18
jog0	https://bugs.launchpad.net/tempest/+bug/1226337	01:18
jog0	boot pattern	01:18
uvirtbot	Launchpad bug 1226337 in tempest "tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern flake failure" [High,Triaged]	01:18
jog0	anyway you get the idea	01:18
*** wenlock has quit IRC		01:19
jog0	anyone want to send those links to the openstack-dev ML?	01:19
jog0	shaming people for destabilizing during stabilization	01:19
jeblair	jog0: do you not want to? in the past, sdague has started a thread naming specific critical bugs for gate failures and it has helped to focus attention	01:20
jog0	I will go ahead and do it	01:21
jog0	should be fun	01:21
jog0	unless sdague wants to	01:21
*** dkliban has joined #openstack-infra		01:22
*** mriedem has quit IRC		01:23
* jog0 starts drafting a fun email		01:25
*** kong has quit IRC		01:28
*** jerryz has quit IRC		01:28
*** jerryz has joined #openstack-infra		01:29
*** ojacques has quit IRC		01:35
*** melwitt has quit IRC		01:37
*** rfolco has quit IRC		01:39
*** CaptTofu has joined #openstack-infra		01:40
jog0	sent	01:42
jog0	that should be fun	01:42
*** CaptTofu_ has joined #openstack-infra		01:45
*** CaptTofu has quit IRC		01:47
jog0	Alex_Gaynor: I can account for 200 failures in last 24 hours with just two bugs	01:48
Alex_Gaynor	jog0: :/	01:48
*** ArxCruz has quit IRC		01:49
jog0	out of 305	01:49
jog0	or so	01:49
clarkb	wow	01:52
morganfainberg	jog0, that's crazy.	01:54
lifeless	morganfainberg: pretty common	01:59
lifeless	morganfainberg: you get a long tail effect	01:59
morganfainberg	lifeless, aye, still. i know i've had my fair share of rechecks on the bootpattern one	01:59
morganfainberg	lifeless, just didn't realize how _much_ it affected everything	02:00
mordred	I think, as much as I don't like it in theory, that I'd like to skip those two tests in the normal runs	02:01
mordred	but run an extra job for neutron with them on	02:01
mordred	and loop them 10x	02:01
lifeless	morganfainberg: when we first got similar stuff in place for Launchpad, we had something like 80% explained by the first 4 bugs.	02:01
mordred	because those numbers above are crazy	02:01
lifeless	morganfainberg: and then 80% of the remainder from 4 more bugs, and so on.	02:01
morganfainberg	lifeless, lol	02:01
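The long-tail pattern lifeless describes can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each successive group of bugs explains a fixed 80% share of whatever failures remain; the function name and the fixed share are assumptions, not measurements from Launchpad or the gate.

```python
def cumulative_explained(groups: int, share: float = 0.8) -> float:
    """Fraction of all failures explained after fixing `groups` successive
    groups of bugs, when each group explains `share` of what is left.

    `share` = 0.8 mirrors the "80% of the remainder" figure quoted in the
    channel; it is an illustrative assumption, not measured data.
    """
    remaining = 1.0
    for _ in range(groups):
        # Each group removes `share` of the failures still unexplained.
        remaining *= 1.0 - share
    return 1.0 - remaining


if __name__ == "__main__":
    for k in range(1, 4):
        print(k, round(cumulative_explained(k), 4))
```

Under that assumption the first group covers 80% of failures, the first two cover 96%, and the first three cover about 99.2%, which is why a handful of bugs can dominate the reset count.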
mordred	jog0, sdague: it's a little bitchy, but what do you think?	02:02
*** ericw has quit IRC		02:02
*** dkliban has quit IRC		02:02
dims	jog0, which two tests specifically?	02:03
mordred	dims: jog0 just sent a mail to the -dev list with the deets	02:03
*** ericw has joined #openstack-infra		02:05
*** yaguang has joined #openstack-infra		02:05
dims	mordred, thx	02:06
lifeless	mordred: I think it's a decent accommodation if the problem is test-side, not service side.	02:06
lifeless	mordred: if neutron is actually buggy - and I've seen stuff with tripleo these last few days that makes me think it's service side.	02:07
lifeless	mordred: then the gate is doing its job and we need to fix the damn things before release.	02:07
*** dkliban has joined #openstack-infra		02:07
mordred	lifeless: yes. I completely agree that we should fix the damn things before the release. I agree that the gate is doing its job	02:07
*** senk has joined #openstack-infra		02:08
mordred	lifeless: I think I'm more brainstorming on how we can better place the onus to fix near where it could be fixed	02:08
lifeless	mordred: Ah, so that's interesting.	02:08
lifeless	mordred: From one sense, having it widespread gets more folk onboard faster.	02:08
mordred	yah. that's the original theory	02:09
lifeless	mordred: in fact, stopping other things changing while we fix brain damage helps prevent slippage: this is exactly the concern you and jeblair have about 'turn off bare metal gating if it breaks'.	02:09
mordred	yes	02:09
lifeless	mordred: OTOH if slippage is a low risk, you are basically breaking everyone else's brains until the thing is fixed.	02:09
mordred	yeah. especially since the thing that is breaking is flaky, so the gate breakage isn't preventing slippage in this case	02:10
mordred	which is where the "take flaky tests and run a job which runs them 10x" idea comes in	02:10
mordred	if we can cause them to be _more_ breaking - but in a targeted manner	02:11
lifeless	maybe we should just run everything N* where N gets us some confidence interval of 'very reliable'	02:11
lifeless	e.g. 10* -> 90% reliable.	02:12
mordred	yah. I could see that as a general strategy once we get past these	02:12
lifeless	run 10 tempest jobs in parallel for every gate.	02:12
mordred	yup	02:12
mordred	the overall machine cost might still be lower than all the gate resets	02:12
mordred	if it helps us not let flaky things in	02:12
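The "run it N times" idea can be quantified with a rough model. The sketch below assumes each run of a flaky job fails independently with the same probability `p`; real gate runs are unlikely to be fully independent, and both function names are hypothetical helpers rather than anything in Zuul or tempest.

```python
import math


def detection_probability(p: float, n: int) -> float:
    """Probability that at least one of `n` independent runs fails,
    given a per-run failure probability `p`."""
    return 1.0 - (1.0 - p) ** n


def runs_needed(p: float, target: float) -> int:
    """Smallest number of independent runs so that a flake with per-run
    failure probability `p` is caught with probability >= `target`."""
    if not (0.0 < p < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("p and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # A test that fails 10% of the time, run 10 times in parallel.
    print(round(detection_probability(0.10, 10), 4))
    # How many runs to catch that same flake 90% of the time.
    print(runs_needed(0.10, 0.90))
```

Under this model, ten parallel runs catch a 10%-flaky test roughly 65% of the time, and about 22 runs are needed to reach 90%, so the node cost has to be weighed against the cost of gate resets, as discussed above.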
lifeless	jog0: do we have an identified bad commit ?	02:12
lifeless	jog0: like 'never before X' ?	02:13
lifeless	can we revert the thing?	02:13
*** reed_ has joined #openstack-infra		02:15
*** senk has quit IRC		02:18
jeblair	lifeless: according to http://logstash.openstack.org/#eyJzZWFyY2giOiIgQG1lc3NhZ2U6XCJBc3NlcnRpb25FcnJvcjogU3RhdGUgY2hhbmdlIHRpbWVvdXQgZXhjZWVkZWQhXCIgQU5EIEBmaWVsZHMuYnVpbGRfc3RhdHVzOlwiRkFJTFVSRVwiIEFORCBAZmllbGRzLmZpbGVuYW1lOlwiY29uc29sZS5odG1sXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjEwMCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM4MDI0NDQzMzY3Nn0=	02:18
jeblair	lifeless: never before 2013-09-20T23:37:40.000 but the major problem started at 2013-09-25T02:09:44.000	02:19
jeblair	lifeless: you'll see what i mean if you look at the graph	02:20
lifeless	yeah	02:20
lifeless	so a commit before 2013-09-25T02:09:44.000	02:20
lifeless	and not far before	02:20
*** CaptTofu_ has quit IRC		02:24
*** dguitarbite has joined #openstack-infra		02:25
*** CaptTofu has joined #openstack-infra		02:26
lifeless	does openstack have a secure document store	02:27
lifeless	where e.g. I can store a bunch of passwords and give them out to selected tripleo folk ?	02:27
lifeless	for context, I want to make getting access to the machines that will host the proposed baremetal test cluster something we can document and delegate.	02:28
lifeless	one test I'm considering is 'tripleo ptl + delegates'	02:28
jeblair	lifeless: no; anteaya is looking into owncloud for the board of directors; we've considered expanding its use if it works out for that.	02:33
lifeless	ok, I'll do something icky for now, but please consider us interested.	02:34
jeblair	lifeless: related: there are plans forming for a keysigning event at the summit	02:34
lifeless	yeah, I need to do a key migration thing	02:34
anteaya	would we have an owncloud separate from the one the board of directors is using?	02:34
lifeless	my gpg key is long in the tooth	02:34
anteaya	or everyone on one owncloud?	02:34
jerryz	Hi everyone, got a version conflict error from oslo.config on my own devstack while starting nova-api, http://paste.openstack.org/show/47585/ need help , thanks	02:34
mordred	anteaya: unsure. I think we'll have to learn a little more about group permissions, management and users in owncloud	02:34
anteaya	very good	02:35
jeblair	yeah, and no need to get ahead; we can do baby steps.	02:35
anteaya	owncloud is up after puppet-dashboard starts processing reports	02:35
anteaya	up meaning next in line for my attention	02:35
mordred	jerryz: awesome! that's just great	02:35
anteaya	jeblair: k	02:35
jeblair	lifeless: yeah, about a year ago i finally decided that having a 1024 bit key from 1996 was a liability, not a badge of honor. :)	02:36
mordred	how did we manage to land that change?	02:36
jeblair	jerryz: can you link to the change?	02:37
mordred	oh! wait	02:37
jerryz	no change here. Just sync the upstream and trigger a tempest test on my own devstack	02:37
mordred	jerryz: you may need to do something	02:38
mordred	jerryz: cd /opt/stack/new/oslo.config	02:38
mordred	rm -rf *.egg-info	02:39
mordred	git pull --ff-only	02:39
mordred	sudo pip install -e .	02:39
mordred	jeblair: you know the one gotcha in the way we're calculating versions? that a setup.py develop'd install is not going to ever pick up a new version?	02:39
mordred	jeblair: I believe that may be what has happened here	02:40
mordred	sdague, dtroyer ^^ we may want to put something in to restack to clean out egg-info files	02:40
mordred	so that git updates will re-gen versions properly across tag boundaries (where it might be important)	02:41
mordred	clarkb: if you get bored: https://review.openstack.org/#/c/41945/ I think is FINALLY actually ready	02:48
*** anteaya has quit IRC		02:50
jerryz	mordred: thanks. but why the d-g test on o.o does not have this issue? what is the circumstance for it to happen?	02:59
*** dims has quit IRC		02:59
mordred	jerryz: d-g test starts with a completely clean vm each time	02:59
mordred	your vm had some unaccounted for state from previous versions of your git repo	02:59
mordred	jerryz: there is something that could be added to devstack to deal with this, and I'll add that to my todo list	03:00
mordred	but you're lucky enough to have hit a strange corner case	03:00
jerryz	mine is managed by nodepool, i believe it will clean up used ones	03:00
mordred	oh! well that's a whole other thing	03:03
jerryz	mordred: any more info needed to debug this?	03:07
*** dkliban has quit IRC		03:07
mordred	jerryz: honestly, I'm kinda stumped as to how that could happen if that is a completely fresh node	03:08
mordred	jerryz: and it's 11pm here, so I'm probably not going to dig in too much right now	03:08
mordred	jerryz: I'll try to figure out what's going on when I wake up	03:08
jerryz	mordred: ok. thanks. night	03:09
*** sarob has quit IRC		03:11
*** sarob has joined #openstack-infra		03:12
*** matsuhashi has quit IRC		03:15
*** sarob has quit IRC		03:16
*** dkranz has joined #openstack-infra		03:29
*** marun has quit IRC		03:37
*** marun has joined #openstack-infra		03:38
*** nati_ueno has quit IRC		03:38
*** dguitarbite has quit IRC		03:42
*** ryanpetrello has joined #openstack-infra		03:42
clarkb	http://justin.abrah.ms/misc/state_of_githubs_code_review.html	03:45
pleia2	hey, look at that, they link to our gerrit :)	03:47
clarkb	yup :)	03:48
*** Ryan_Lane has quit IRC		03:48
clarkb	those of you that are twittery should twitter the benefits of gerrit	03:49
Alex_Gaynor	grumble, the rate of gate resets is resulting in starving the check pipeline	03:49
clarkb	Alex_Gaynor: yup	03:49
clarkb	Alex_Gaynor: https://review.openstack.org/#/c/48549/ should help	03:50
*** marun has quit IRC		03:50
clarkb	I won't get to fixing it tonight, anyone else is welcome to	03:50
clarkb	(basically run tests in check on the other cloud)	03:50
Alex_Gaynor	clarkb: redundant array of independent clouds!	03:51
*** marun has joined #openstack-infra		03:51
hub_cap	mordred: promise im making progress on the new cli tool. ive got maybe ~2 days of work to go	03:52
*** marun has quit IRC		03:56
*** marun has joined #openstack-infra		03:56
*** matsuhashi has joined #openstack-infra		03:56
*** basha has joined #openstack-infra		04:06
lifeless	clarkb: hey, how do you get uber receipts into HP's system ?	04:13
clarkb	lifeless: I have never had to do it for HP... I use it in seattle for personal things	04:13
pleia2	lifeless: I save the email receipt as pdf	04:13
*** jerryz has quit IRC		04:14
lifeless	pleia2: ah yeah, print-to-pdf	04:15
pleia2	yeah	04:16
*** AlexF has joined #openstack-infra		04:16
*** CaptTofu has quit IRC		04:16
*** CaptTofu has joined #openstack-infra		04:17
*** AlexF has quit IRC		04:21
*** SergeyLukjanov has joined #openstack-infra		04:31
*** AlexF has joined #openstack-infra		04:31
*** basha has quit IRC		04:32
*** reed_ has quit IRC		04:37
*** basha has joined #openstack-infra		04:38
*** sarob has joined #openstack-infra		04:38
*** AlexF has quit IRC		04:42
*** AlexF has joined #openstack-infra		04:43
*** sarob has quit IRC		04:44
*** ericw has quit IRC		04:45
*** jerryz has joined #openstack-infra		04:46
*** ericw has joined #openstack-infra		04:48
*** odyssey4me has joined #openstack-infra		04:50
*** basha has quit IRC		04:52
*** boris-42 has joined #openstack-infra		04:53
*** odyssey4me has quit IRC		04:54
*** odyssey4me has joined #openstack-infra		04:55
mordred	clarkb: nice!	04:55
*** DennyZhang has joined #openstack-infra		04:56
*** sarob has joined #openstack-infra		04:57
Alex_Gaynor	watching the gate today has been so sad	04:59
Alex_Gaynor	Head of the gate was approved 10.5 hours ago :(	04:59
mordred	Alex_Gaynor: yeah. it's been a bad couple of days for that	05:02
Alex_Gaynor	mordred: sadly I can't think of any sane approach to improving it besides "fix the bugs in tempest / <projects>"	05:02
mordred	Alex_Gaynor: yeah. well, did you see my terrible idea earlier (or combo of ideas)	05:02
Alex_Gaynor	mordred: No, I missed it	05:02
mordred	Alex_Gaynor: disable the two bad tests in the normal runs, make a run that does run those tests - and on every neutron change, run 10 copies of that	05:03
mordred	that way, most of the gate is fine, but neutron has to fix the bugs before anything else will land for them	05:03
Alex_Gaynor	mordred: I... I kind of love it (assuming we're sure neutron is at fault)	05:03
*** sarob has quit IRC		05:04
mordred	the bad ones only happen when neutron is enabled	05:04
Alex_Gaynor	mordred: probably the neutron core reviewers should also stop approving other patches	05:04
mordred	then - once we've cleaned up the top reset offenders	05:04
mordred	add a fanout run to every change which runs 5 copies of the neutron tests for everybody	05:05
mordred	it would explode node usage a bit, but I'm _guessing_ not as bad as all the resets	05:05
Alex_Gaynor	Possibly we need to think of a more general approach to dealing with non-determinism in tests.	05:06
mordred	only systemic way I can think of is running tests multiple times	05:06
mordred	to try to increase the odds of tripping non-deterministic things on their way in	05:06
Alex_Gaynor	The other issue is that non-determinism sometimes doesn't look like it's caused by a patch, even if it is, so people just recheck until it manages to land, even though it's exacerbating a problem	05:07
Alex_Gaynor	I don't know how to address that.	05:07
mordred	well, recheck itself is a bandaid	05:07
clarkb	ya that's a big problem I think	05:07
clarkb	push until it goes in just adds more badness	05:08
mordred	that's there to deal with non-deterministic tests	05:08
clarkb	right but it feeds it too	05:08
mordred	yup	05:08
Alex_Gaynor	maybe the system should handle reverifies with exponential backoff, to prevent a patch that really almost never passes. or something.	05:09
mordred	if we could figure out a better way to block flaky tests (such as parallel copies, or something better)	05:09
mordred	then we could make recheck/reverify go away	05:09
Alex_Gaynor	right, making them go away would be ideal	05:09
mordred	and save that feature for only things that infra triggers, such as "the internet exploded"	05:09
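The exponential-backoff idea for reverifies could look something like the following. This is only a sketch of the general technique: the function name, the 10-minute base delay, and the 4-hour cap are all hypothetical, and nothing here reflects an existing Zuul or Gerrit feature.

```python
def reverify_delay_minutes(attempt: int, base: int = 10, cap: int = 240) -> int:
    """Minutes a change must wait before its `attempt`-th reverify is accepted.

    The delay doubles with each attempt (base, 2*base, 4*base, ...) and is
    capped at `cap`. `base` and `cap` are illustrative values only.
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return min(cap, base * 2 ** (attempt - 1))


if __name__ == "__main__":
    for attempt in range(1, 7):
        print(attempt, reverify_delay_minutes(attempt))
```

With these assumed values, a change that keeps failing waits 10, 20, 40, 80, and 160 minutes, then 240 minutes for every later attempt, which makes it progressively more expensive to push a rarely passing patch through by repetition.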
Alex_Gaynor	I wonder if the number of nodes we're spawning and shutting down produce a noticeable blip for people at RS/HP observing. Probably not I guess	05:11
*** afazekas_zz has quit IRC		05:20
*** AlexF has quit IRC		05:21
* mordred likes to think that both clouds have dedicated ops teams who just watch our activity and marvel		05:22
jerryz	mordred: could you tell me how package version number is calculated? i got variations of version numbers for oslo.config when doing pip install -e . locally	05:22
mordred	jerryz: yes, it's very similar to how git describe works	05:23
mordred	if the current commit is tagged, then that is the version	05:23
*** nicedice has quit IRC		05:23
mordred	if the current commit is not tagged, then the version is $next_version.a$number_of_commits_since_last_tag.g$git_short_sha	05:23
mordred	where next_version is the version in setup.cfg	05:24
mordred	this is how the version is calculated for the server repos and for the oslo code	05:24
mordred	for library code, it's different (and slightly easier)	05:24
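The version rule mordred describes for server repos and oslo code can be written out as a small function. This is a sketch of the rule as stated in the channel, not the actual packaging code; the function and parameter names are hypothetical, and the caller is assumed to have already read the tag, commit count, and short SHA from git.

```python
from typing import Optional


def compute_version(
    tag_on_head: Optional[str],
    next_version: str,
    commits_since_tag: int,
    short_sha: str,
) -> str:
    """Compose a version string following the rule described above.

    tag_on_head:       tag pointing at the current commit, or None
    next_version:      the version declared in setup.cfg
    commits_since_tag: number of commits since the last tag
    short_sha:         abbreviated git commit hash
    """
    if tag_on_head is not None:
        # A tagged commit is released as exactly that version.
        return tag_on_head
    # Untagged: $next_version.a$number_of_commits_since_last_tag.g$git_short_sha
    return f"{next_version}.a{commits_since_tag}.g{short_sha}"


if __name__ == "__main__":
    print(compute_version("1.2.1", "1.2.1", 0, "abc1234"))
    print(compute_version(None, "1.2.0", 5, "abc1234"))
```

The second call uses made-up inputs to show the shape of the untagged form. A checkout that is missing its tags would take this branch, since no tag is found on the current commit.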
mordred	jerryz: so _currently_ oslo.config master should be showing you:	05:25
mordred	mordred@camelot:~/src/openstack/oslo.config$ python setup.py --version	05:25
mordred	1.2.1	05:25
mordred	if you're not seeing that, then my guess would be perhaps you're not fetching tags?	05:26
jerryz	if my oslo.config code base is synced from upstream , which is review.o.o or github, the tag 1.2.1 should be already in the code	05:27
jerryz	why i still get 1.2.0.**** if i install from a git clone from my private repo that is synced with upstream	05:27
*** cthulhup has joined #openstack-infra		05:28
*** SergeyLukjanov has quit IRC		05:29
*** cthulhup has quit IRC		05:29
mordred	the only other thing is - if the repo was used before, the version calculation is cached in the egg-info dir	05:31
mordred	when you say "if i install from a git clone from my private repo that is synced with upstream" - how are you syncing your private repo?	05:31
mordred	jerryz: actually, funny story - look at the most recent commit to oslo.config	05:33
mordred	and the commit message	05:33
mordred	it seems this was a problem for us back on Sunday	05:33
*** ericw has quit IRC		05:33
*** odyssey4me has quit IRC		05:36
jerryz	mordred: it seems that when syncing the upstream to private repo, i didn't push tags	05:37
mordred	phew. well, that at least explains it!	05:37
*** afazekas has joined #openstack-infra		05:41
*** SergeyLukjanov has joined #openstack-infra		05:42
*** SergeyLukjanov has quit IRC		05:44
*** ryanpetrello has quit IRC		05:44
*** ryanpetrello has joined #openstack-infra		05:45
*** Ryan_Lane has joined #openstack-infra		05:46
*** Ryan_Lane has quit IRC		05:46
*** nati_ueno has joined #openstack-infra		05:56
*** DennyZhang has quit IRC		06:03
*** marun has quit IRC		06:06
*** davidhadas_ has quit IRC		06:06
*** amotoki has joined #openstack-infra		06:15
*** yolanda has joined #openstack-infra		06:15
*** afazekas_ has joined #openstack-infra		06:16
*** afazekas_ has quit IRC		06:17
*** jhesketh has quit IRC		06:20
*** jhesketh__ has quit IRC		06:20
*** jhesketh_ has joined #openstack-infra		06:20
*** yongli_away is now known as yongli		06:26
*** slong has quit IRC		06:29
*** jhesketh has joined #openstack-infra		06:34
*** shardy_afk is now known as shardy		06:38
*** odyssey4me has joined #openstack-infra		06:55
*** Ryan_Lane has joined #openstack-infra		06:57
*** Ryan_Lane has quit IRC		07:01
openstackgerrit	Rongze Zhu proposed a change to openstack-infra/gitdm: Add two employees to UnitedStack https://review.openstack.org/48597	07:11
*** hashar has joined #openstack-infra		07:20
*** Ryan_Lane has joined #openstack-infra		07:21
ttx	fungi: (to solve exclusionary reqs) if you except pep8 those seem to come from ceilometer and swift, but those two projects weren't in the gate in stable/folsom times, so i'm not sure why we would consider them ?	07:22
*** hashar_ has joined #openstack-infra		07:25
*** hashar has quit IRC		07:25
*** hashar_ is now known as hashar		07:25
*** fbo_away is now known as fbo		07:25
*** hashar has quit IRC		07:25
*** hashar has joined #openstack-infra		07:26
*** Ryan_Lane has quit IRC		07:29
*** flaper87\|afk is now known as flaper87		07:32
*** mrda has quit IRC		07:42
*** tvb\|afk has joined #openstack-infra		07:43
*** jcoufal has joined #openstack-infra		07:45
*** yassine has joined #openstack-infra		07:47
*** basha has joined #openstack-infra		07:47
*** mrda has joined #openstack-infra		07:49
*** basha has quit IRC		07:49
*** jcoufal has quit IRC		07:49
*** boris-42 has quit IRC		07:50
*** SergeyLukjanov has joined #openstack-infra		07:53
*** Ryan_Lane has joined #openstack-infra		07:56
*** Ryan_Lane has quit IRC		08:01
*** mrda has quit IRC		08:07
*** SergeyLukjanov has quit IRC		08:09
*** dizquierdo has joined #openstack-infra		08:10
*** jcoufal has joined #openstack-infra		08:13
*** SergeyLukjanov has joined #openstack-infra		08:13
*** thomasbiege1 has joined #openstack-infra		08:16
*** thomasbiege1 has quit IRC		08:19
*** DinaBelova has joined #openstack-infra		08:22
*** Ryan_Lane has joined #openstack-infra		08:27
*** nati_ueno has quit IRC		08:28
*** Ryan_Lane has quit IRC		08:31
*** johnthetubaguy has joined #openstack-infra		08:31
*** mancdaz has quit IRC		08:33
*** dizquierdo has quit IRC		08:33
*** derekh has joined #openstack-infra		08:34
*** mancdaz has joined #openstack-infra		08:35
*** jerryz has quit IRC		08:41
*** DinaBelova has quit IRC		08:43
*** tvb\|afk has quit IRC		08:44
*** tvb\|afk has joined #openstack-infra		08:44
*** tvb\|afk is now known as tvb		08:44
*** locke105 has quit IRC		08:49
*** locke105 has joined #openstack-infra		08:50
openstackgerrit	Pavel Sedlák proposed a change to openstack-infra/jenkins-job-builder: KeepLongStdio argument for JUnit publisher https://review.openstack.org/48431	08:51
*** samalba has quit IRC		08:52
*** samalba has joined #openstack-infra		08:53
*** jcoufal has quit IRC		08:55
*** Ryan_Lane has joined #openstack-infra		08:57
*** Ryan_Lane has quit IRC		09:02
*** boris-42 has joined #openstack-infra		09:05
*** tvb is now known as Tristan_		09:10
*** Tristan_ is now known as Guest77656		09:11
*** Guest77656 is now known as tvb		09:11
*** dizquierdo has joined #openstack-infra		09:15
*** Ryan_Lane has joined #openstack-infra		09:27
*** Ryan_Lane has quit IRC		09:32
openstackgerrit	Jaroslav Henner proposed a change to openstack-infra/jenkins-job-builder: Add dynamic string and choice params. https://review.openstack.org/48506	09:40
*** Ryan_Lane has joined #openstack-infra		09:58
*** Ryan_Lane has quit IRC		10:02
*** hashar has quit IRC		10:04
*** hashar has joined #openstack-infra		10:10
*** hashar has quit IRC		10:14
*** AlexF has joined #openstack-infra		10:16
*** kmartin has quit IRC		10:17
*** fifieldt has quit IRC		10:28
*** tvb has quit IRC		10:28
*** Ryan_Lane has joined #openstack-infra		10:29
*** DinaBelova has joined #openstack-infra		10:30
*** dkehn_ has joined #openstack-infra		10:31
*** dkehn has quit IRC		10:31
*** hashar has joined #openstack-infra		10:31
*** Ryan_Lane has quit IRC		10:33
*** DinaBelova has quit IRC		10:33
*** hashar has quit IRC		10:36
*** thomasbiege1 has joined #openstack-infra		10:40
*** matsuhashi has quit IRC		10:52
*** yaguang has quit IRC		10:56
*** tvb has joined #openstack-infra		10:59
*** tvb has quit IRC		10:59
*** tvb has joined #openstack-infra		10:59
*** Ryan_Lane has joined #openstack-infra		10:59
*** Ryan_Lane has quit IRC		11:03
*** tvb has quit IRC		11:07
*** thomasbiege1 has quit IRC		11:09
*** johnthetubaguy has quit IRC		11:10
*** AlexF has quit IRC		11:10
openstackgerrit	Jaroslav Henner proposed a change to openstack-infra/jenkins-job-builder: Add dynamic string and choice params. https://review.openstack.org/48506	11:14
BobBall	mordred: when you're around could you let me know? I want to understand what sort of stats you think would be useful to show that smokestack's -1's are stable to feed into the discussion of whether they can be upgraded to -2?	11:14
*** AlexF has joined #openstack-infra		11:14
*** tvb has joined #openstack-infra		11:20
*** tvb has quit IRC		11:20
*** tvb has joined #openstack-infra		11:20
*** thomasbiege1 has joined #openstack-infra		11:24
*** thomasbiege3 has joined #openstack-infra		11:27
*** thomasbiege1 has quit IRC		11:27
*** tvb has quit IRC		11:28
*** Ryan_Lane has joined #openstack-infra		11:30
*** tvb has joined #openstack-infra		11:30
sdague	BobBall: if it's not run by CI team, it really can't be -2	11:30
*** giulivo has joined #openstack-infra		11:31
sdague	we can't have an external entity have the ability to have an infrastructure fail then break the gate for everyone, we've got enough challenges with infrastructure we control doing that	11:31
*** shardy is now known as shardy_afk		11:32
BobBall	I'm referring to the discussion which finished with http://lists.openstack.org/pipermail/openstack-infra/2013-August/000196.html - of course, the infra team needs the ultimate authority and the revocation of -2 privs easily solves that	11:32
BobBall	just like the "ultimate" sanction of moving a job from voting to non-voting	11:33
BobBall	doesn't really need any work from the infra team to fix it, but ensures that the team responsible for the job/etc will fix it before being considered for the privilege again	11:33
*** thomasbiege3 has quit IRC		11:33
sdague	ok, sorry, different thread I was thinking about	11:34
*** Ryan_Lane has quit IRC		11:34
BobBall	I think it's the same thread - but my starting suggestion was unworkable and I completely understand why that was now!	11:34
sdague	so I think the stat mordred actually wants there is how often is someone ignoring a -1 from smokestack	11:35
BobBall	Basically what I think would be useful is for SS to run in parallel to the gate and post a -2 vote if it completes its testing and finds a failure in the tests (we've specifically only included test-failures in voting - so if a packaging failure occurs, it doesn't post)	11:36
BobBall	if the gate finishes first, then tough, SS doesn't get a chance to say whether it thinks a patch works or not	11:36
BobBall	nod - I've got those stats	11:36
BobBall	but I want to get more details because I think there are other useful things	11:36
sdague	https://review.openstack.org/#/q/status:merged+Verified-1+project:openstack/nova,n,z	11:37
BobBall	ohhh useful query	11:37
BobBall	I was doing it through SSH	11:37
*** thomasbiege has joined #openstack-infra		11:37
sdague	so it's only happened twice this year on nova	11:37
BobBall	I'll have to look into those two	11:38
BobBall	but they were way before the automatically posting / packaging fixes that changed the SS workflow	11:38
sdague	the first one, smokestack was broken (January)	11:38
sdague	BobBall: sure	11:38
sdague	but that's even more indication that there is no need for SS to have -2	11:38
BobBall	but I'm also interested in the stats about how regularly SS had posted before jenkins returned	11:38
openstackgerrit	Darragh Bailey proposed a change to openstack-infra/jenkins-job-builder: Add repo scm https://review.openstack.org/45165	11:39
sdague	right, but it will still post a -1 even if we went to merge	11:39
BobBall	think so, yes	11:39
sdague	unless you did something very magical	11:40
BobBall	heh :)	11:40
*** CaptTofu has quit IRC		11:40
sdague	we get jenkins check results after we're in the gate	11:40
sdague	sometimes	11:40
*** CaptTofu has joined #openstack-infra		11:40
sdague	https://review.openstack.org/#/c/42361/ is the only override in the last 6 months	11:40
BobBall	What do you mean by override?	11:41
sdague	the only time we merged a change that SmokeStack had a -1 on	11:41
BobBall	oh, yes	11:41
sdague	so I think you are trying to solve a problem that doesn't exist :)	11:42
openstackgerrit	Darragh Bailey proposed a change to openstack-infra/jenkins-job-builder: Add repo scm https://review.openstack.org/45165	11:42
BobBall	depends on what the problem really is :)	11:42
sdague	ok, what do you think the problem is? maybe I don't understand	11:42
BobBall	From my perspective we've got a system that can be used to gate changes to prevent breakages to the XenAPI driver	11:43
BobBall	that's the criteria for being a "Group A" hypervisor	11:43
sdague	but it's already doing it	11:43
sdague	we only had 1 override in the last 6 months	11:44
BobBall	and while I'm working hard on getting XenServer tested in the gate properly, there have been lots of hiccups along the way	11:44
BobBall	nah, it's already "functional testing provided by an external system that does not gate commits"	11:44
*** matsuhashi has joined #openstack-infra		11:44
sdague	so this is really just about moving from B -> A state?	11:44
*** thomasbiege has quit IRC		11:44
sdague	not actually about keeping the breaks out of the tree?	11:45
BobBall	Group A is about a system that ensures the breaks are kept out - rather than relying on the reviewers	11:45
sdague	from a code perspective, the problem is already solved	11:45
BobBall	agreed	11:45
sdague	we rely on reviewers for all sorts of things, especially as we don't have 400% test coverage	11:46
sdague	and the reviewers aren't failing us here	11:46
sdague	1 override in 6 months is not a real failure rate	11:47
sdague	so you are trying to fix a problem that doesn't exist	11:47
sdague	if the override rate was twice a day, I'd agree with you	11:47
BobBall	So is your view that group A and B should really be considered the same thing because an automated process and manual process-that-works are as good as each other	11:47
sdague	they are different, because group B isn't being run by the project. So if entity X that is running external CI stops, the project can do nothing about it.	11:48
BobBall	So you think that A needs to be integrated with the gate and B is external irrespective of whether it "gates" or not	11:49
sdague	realize "has -2" requires that it is "run by the CI team"	11:49
sdague	I think that's where the definition might not have been clear. To be group A I really think it needs to be run by infrastructure team for OpenStack. I don't see another way we could do that.	11:50
BobBall	I thought the discussion we had last month suggested that the -2 privs could be given to an external system because it's easy enough for the CI team to revoke those privs if they ever break the gate	11:50
sdague	I didn't think that was suggested	11:51
sdague	I'm -2 on the idea of non infra run systems having -2 on integrated projects	11:51
*** dims has joined #openstack-infra		11:51
BobBall	grin That was my suggestion, but I thought mordred's suggestion to talk about it again when SS was proving its stability with automated -1's meant that possibility was open :)	11:53
*** AlexF has quit IRC		11:53
sdague	my reading of that is wanting to see how often the override was a problem was to make it clear there was nothing wrong with being only a -1 job	11:54
sdague	because the -1 has been respected 99.99% of the time	11:54
*** SergeyLukjanov has quit IRC		11:55
sdague	I'll let him speak for himself when he gets up though :)	11:55
sdague	but that's my take	11:55
BobBall	understood	11:55
*** pcm_ has joined #openstack-infra		11:56
sdague	fungi when you get up, I had a question on job definition	11:59
sdague	mostly around neutron jobs	11:59
*** Ryan_Lane has joined #openstack-infra		12:00
*** matsuhashi has quit IRC		12:01
*** afazekas is now known as afazekas_food		12:01
*** AlexF has joined #openstack-infra		12:02
*** SergeyLukjanov has joined #openstack-infra		12:05
*** adalbas has joined #openstack-infra		12:05
dims	hi, looking at zuul page none of the "check" jobs seem to have a progress bar. they are marked "queued" . Do the gate jobs take precedence and check jobs will wait for their turn? or is there some other problem?	12:07
openstackgerrit	Sean Dague proposed a change to openstack-infra/config: add gate-tempest-devstack-vm-neutron-pg job https://review.openstack.org/48635	12:07
sdague	dims: gate takes priority	12:07
sdague	so yes, check queue is starved right now	12:07
*** Ryan_Lane has quit IRC		12:07
dims	sdague, thanks!	12:08
sdague	basically before nnfi the gate would be sitting in a hold until the gate failure was resolved, so the check jobs would run in and grab all the devstack nodes	12:08
sdague	but now because the gate throughput is up, they are grabbing every resource	12:08
dims	makes sense	12:08
sdague	and because of the neutron race which is killing most jobs, that's kind of problematic	12:09
sdague	fungi / jeblair: check queue is now > 150, so bursting would be nice :)	12:09
*** matsuhashi has joined #openstack-infra		12:10
*** matsuhashi has quit IRC		12:11
*** flaper87 is now known as flaper87|afk		12:12
*** AlexF has quit IRC		12:14
*** thomasm has joined #openstack-infra		12:17
*** thomasbiege has joined #openstack-infra		12:20
*** AlexF has joined #openstack-infra		12:20
*** hashar has joined #openstack-infra		12:21
*** ArxCruz has joined #openstack-infra		12:21
*** thomasbiege has quit IRC		12:22
*** flaper87|afk is now known as flaper87		12:22
*** dims has quit IRC		12:22
*** dims has joined #openstack-infra		12:23
*** weshay has joined #openstack-infra		12:28
*** acabrera has joined #openstack-infra		12:29
*** acabrera is now known as alcabrera		12:29
*** tvb has quit IRC		12:30
*** matsuhashi has joined #openstack-infra		12:35
*** tvb has joined #openstack-infra		12:35
*** Ryan_Lane has joined #openstack-infra		12:36
*** dkliban has joined #openstack-infra		12:36
*** jhesketh has quit IRC		12:37
*** jhesketh_ has quit IRC		12:37
*** Ryan_Lane has quit IRC		12:40
ttx	fungi: finally fixed bug 1160277	12:44
uvirtbot	Launchpad bug 1160277 in openstack-ci "Groups have similar names in LP and gerrit but are no longer synced" [Medium,Fix released] https://launchpad.net/bugs/1160277	12:44
ttx	fungi: while looking at the groups list in gerrit though, I found a few groups that are probably useless and should be removed:	12:44
ttx	fungi: empty copy of the LP "heat" group: https://review.openstack.org/#/admin/groups/92,members	12:44
*** afazekas_food has quit IRC		12:45
ttx	hmm, that's all.	12:45
*** johnthetubaguy has joined #openstack-infra		12:47
*** basic` has joined #openstack-infra		12:47
fungi	sdague: what's your job definition question?	12:48
fungi	ttx: yeah, i try to empty and set unused groups non-visible	12:48
fungi	gerrit doesn't have a "delete group" feature	12:48
sdague	fungi: can we specify the same job twice on a zuul run	12:48
ttx	fungi: ah. ah.	12:48
fungi	ttx: eventually i'll get around to determining how to construct a query which identifies an empty group and removes all traces of it from the various tables it might appear in	12:49
fungi	sdague: i don't think we've tried, so not entirely sure	12:49
fungi	back to the "run neutron tempest 10x for neutron jobs" idea presumably	12:50
*** jhesketh has joined #openstack-infra		12:50
fungi	er, for neutron changes	12:50
*** jhesketh_ has joined #openstack-infra		12:50
fungi	lemme see if a duplicate entry horks up the layout.yaml parser at least	12:50
fungi	ttx: i didn't get as far as the nova requirements sync in folsom yesterday, ran into some more corner cases, but did get the patches for openstack/requirements on folsom and grizzly with the capped list including all transitive dependencies for all integrated projects on that branch... https://review.openstack.org/#/q/topic:bug/1172418,n,z	12:52
fungi	ttx: steps i'm following are described at https://etherpad.openstack.org/XpIFEzhkgY along with some details on manual conflict resolution between some of the projects' requirements lists	12:53
fungi	the changes to the requirements project may need some more massaging since i crudely backported a couple changes from master to rename/combine the lists there	12:54
*** rfolco has joined #openstack-infra		12:55
ttx	fungi: did you see my questions above about the need to care about ceilometer in stable/folsom at all ?	12:59
*** crank has quit IRC		13:01
fungi	ttx: haven't hit the scrollback yet, but will look	13:02
ttx	(that was answering your question on how to solve conflicting reqs)	13:03
fungi	looks like removing them will solve the anyjson conflict at least	13:03
*** zul has quit IRC		13:03
ttx	fungi: also was wondering about swift since they were not in the gate in those ancient folsom times	13:04
ttx	ignoring both would solve all conflicts	13:04
ttx	except pep8	13:04
*** dkehn_ is now known as dkehn		13:04
fungi	so it would	13:04
*** julim has joined #openstack-infra		13:04
*** ericw has joined #openstack-infra		13:05
*** tizzo has joined #openstack-infra		13:06
*** Ryan_Lane has joined #openstack-infra		13:06
*** davidhadas_ has joined #openstack-infra		13:06
*** dprince has joined #openstack-infra		13:07
*** zul has joined #openstack-infra		13:07
fungi	though the versions i settled on to resolve those other conflicts are basically still the right ones after factoring swift out of folsom	13:07
ttx	ok then :)	13:08
*** dizquierdo has left #openstack-infra		13:09
*** Ryan_Lane has quit IRC		13:11
*** HenryG has joined #openstack-infra		13:11
*** xchu has quit IRC		13:11
sdague	fungi: well at least run neutron more than once	13:11
sdague	right now it's way too easy for a race to come through	13:12
ekarlso	any of you familiar with disk image builder ?	13:12
sdague	so running 2x neutron and 2x neutron-pg would make it closer to other projects in how easy it is to slip a change through	13:12
sdague	ekarlso: you probably want #tripleo	13:13
*** nosnos has quit IRC		13:15
*** mriedem has joined #openstack-infra		13:18
*** HenryG has quit IRC		13:19
*** HenryG has joined #openstack-infra		13:19
*** salv-orlando has joined #openstack-infra		13:20
*** crank has joined #openstack-infra		13:20
*** prad_ has joined #openstack-infra		13:23
sdague	fungi: so check queue is at 170 and growing because of the gate starvation, which is actually making folks jump the check queue, hence making the gate worse (at least a couple non Jenkins +1ed changes over in there)	13:25
sdague	any idea how we can alleviate this?	13:25
dansmith	yeah, my thing from yesterday still hasn't run check, after 15h	13:25
*** afazekas has joined #openstack-infra		13:26
ttx	sdague: needs a slightly smarter prioritization algorithm, I fear	13:27
sdague	ttx: the reality is we'll just move the pain around	13:27
sdague	ttx: but I agree	13:27
sdague	clarkb and jeblair were working on this last night, but I guess no progress, and I don't think they realized quite how bad it was	13:28
dansmith	yeah, my thing from yesterday is critical, so it just got +A'd since jenkins never voted on it	13:28
ttx	sdague: at some point going faster just makes you go slower. This is a complex system :)	13:28
*** matty_dubs|gone is now known as matty_dubs		13:28
fungi	it looks like we're starved on devstack slaves, so adding more unit test slaves isn't going to help	13:28
sdague	fungi: yeh, this is all devstack starvation	13:29
sdague	also, given that stable/grizzly is bust, that's not helping either	13:30
*** bnemec_ is now known as beekneemech		13:30
sdague	as those are guaranteed resets right now	13:30
sdague	that's how we just lost the gate	13:31
*** yassine has quit IRC		13:31
fungi	someone approved a grizzly change?	13:31
*** yassine has joined #openstack-infra		13:31
sdague	yes	13:31
fungi	the list of people able to do stable branch approvals is small--we should at least tell those people to cut it out until grizzly is fixed	13:31
sdague	well, 8hrs ago they did	13:32
sdague	https://review.openstack.org/#/c/47080/	13:32
sdague	it took 8hrs for that to get to the top of the gate, fwiw	13:32
fungi	https://review.openstack.org/#/admin/groups/120,members plus https://review.openstack.org/#/admin/groups/11,members	13:35
*** Ryan_Lane has joined #openstack-infra		13:36
*** johnthetubaguy1 has joined #openstack-infra		13:39
*** johnthetubaguy has quit IRC		13:40
*** Ryan_Lane has quit IRC		13:41
*** CaptTofu has quit IRC		13:44
sdague	fungi: any idea where the scheduling config is in zuul, so we could at least unstarve check?	13:44
*** CaptTofu has joined #openstack-infra		13:44
*** dcramer_ has quit IRC		13:45
*** guohliu has joined #openstack-infra		13:46
fungi	sdague: in zuul's layout.yaml, within entries in the pipelines section there are precedence parameters	13:46
fungi	we could, for example, put gate and check back on equal footing that way	13:47
fungi	so that the gate will take 2-3x as long to clear as it is now	13:47
fungi	we can't currently set proportional shares or anything though (to say 75% of available resources go to gate jobs and 25% go to check jobs)	13:48
openstackgerrit	Sean Dague proposed a change to openstack-infra/config: make check queue high priority https://review.openstack.org/48657	13:49
sdague	fungi: yeh, equal priority I think would be the right call	13:49
sdague	the gate's really not merging much code right now anyway because of the resets	13:50
sdague	and debug fixes to get to the bottom of those issues, are blocked on check, and not getting feedback	13:50
dansmith	+1	13:51
fungi	as to your earlier question about multiple instances of the same job for a given project+pipeline, i did confirm that doesn't fail the layout parsing check but still no idea what zuul would do with it	13:52
*** CaptTofu has quit IRC		13:52
sdague	fungi: ok, well we can ponder that one later :)	13:52
*** CaptTofu has joined #openstack-infra		13:52
sdague	so what do you think about leveling the queues? per - https://review.openstack.org/48657	13:52
*** yassine has quit IRC		13:53
*** yassine has joined #openstack-infra		13:53
openstackgerrit	Jeremy Stanley proposed a change to openstack-infra/config: Temporarily raise check pipeline precedence https://review.openstack.org/48659	13:55
fungi	oh, you wrote one already	13:56
sdague	fungi: yeh :)	13:56
dhellmann	good morning	13:56
sdague	morning	13:57
dhellmann	sdague: it sounds like there are still issues with stable/grizzly because of the cliff change and quantumclient. I'm thinking of just releasing a cliff that doesn't use pyparsing at all, to remove the conflict.	13:58
soren	Hm. I'm trying to use jenkins-job-builder, but my Jenkins has CSRF enabled and python-jenkins doesn't seem to support that. How have you worked around it for the OpenStack Jenkins?	13:58
*** julim has quit IRC		13:58
fungi	sdague: abandoned mine, +2'd yours. i expect jeblair will be waking up any time so let's get his input on it	13:58
sdague	dhellmann: that would be awesome	13:58
dhellmann	sdague: ok, I'll get back to work on that, then.	13:59
sdague	fungi: ok	13:59
*** julim has joined #openstack-infra		13:59
fungi	soren: good question... where is the csrf option in jenkins? i'll check whether we set it (we don't really use the webui enough to worry about that)	14:00
soren	fungi: I just found http://ci.openstack.org/jenkins.html	14:00
soren	fungi: ...which says not to enable CSRF.	14:00
soren	Scary.	14:00
fungi	i suppose that would do it	14:00
fungi	well, again, if you treat its http interface as an api endpoint only and don't use it for browsery clicky-clicky things, it's not particularly scary	14:01
*** shardy_afk is now known as shardy		14:02
fungi	your api client is not going to be following links from other sites (one would hope)	14:02
fungi	this mostly underscores the need for jenkins to separate its web interface and its api endpoint	14:03
*** anteaya has joined #openstack-infra		14:03
fungi	also, when i do need to connect into any sort of web interface as an admin, i use an entirely separate browser to log into that and only that, but thankfully most of the stuff we administer doesn't require a webui	14:05
*** Ryan_Lane has joined #openstack-infra		14:07
fungi	i wonder if http://javadoc.jenkins-ci.org/hudson/security/csrf/CrumbExclusion.html could be leveraged for that more recently	14:08
openstackgerrit	Felipe Reyes proposed a change to openstack-infra/jenkins-job-builder: Added support for Git shallow clone parameter https://review.openstack.org/48661	14:09
*** mrodden has joined #openstack-infra		14:10
*** Ryan_Lane has quit IRC		14:11
*** rnirmal has joined #openstack-infra		14:13
dhellmann	sdague: I'm trying to think of a plan for testing a new cliff release without actually releasing it and potentially causing more things to break. Any ideas?	14:15
sdague	if we had spare gate time, I would. But as that is all starved... I don't know	14:16
sdague	we could make a requirements proposed change with a tarball link	14:16
dhellmann	I can run tests locally, I'm just trying to reason through what I would need to do	14:16
dhellmann	oh, that's interesting	14:16
sdague	that would at least test master	14:16
*** dizquierdo has joined #openstack-infra		14:17
dhellmann	I'm assuming if I remove the pyparsing requirement from cliff, the one in stable/grizzly will be useless but not have a conflict	14:17
dhellmann	so stable/grizzly will think it needs a version of pyparsing that nothing will import	14:17
sdague	right, so it won't wedge in stable/grizzly	14:17
dhellmann	right	14:17
sdague	I think that's right	14:17
sdague	honestly, I'm only about 1/2 way down the rabbit hole on that one, as I thought others were working it	14:18
dhellmann	can I point the requirements file at a git URL? that would make it easy for me to test locally	14:18
dhellmann	me, too	14:18
dhellmann	I thought it was just a matter of removing that dependency, but apparently it's hard to get to the quantumclient part of the repo and do a release or something	14:18
sdague	dhellmann: yeh, you can change the repos for devstack	14:18
sdague	in localrc	14:18
soren	fungi: csrf isn't about how you use the web ui, after all.	14:18
sdague	either alt url, or alt branch	14:18
dhellmann	sdague: no, I mean have the global requirements point to git for cliff	14:19
soren	fungi: It's about how your browser can be tricked into using it.	14:19
sdague	dhellmann: I don't remember if it can point to a git	14:19
sdague	but it can do a tarball, like oslo does	14:19
dhellmann	ok, I can make a local sdist	14:19
soren	sdague: You can point pip at a git url.	14:19
fungi	soren: yep. not logging authenticating to the jenkins administrative webui with your browser is a great way to thwart that	14:19
soren	sdague: git+https://github.com/blah	14:19
fungi	er, not authenticating	14:20
*** tvb has quit IRC		14:20
sdague	soren: ok, except I'm not sure we propagate those via our global requirements sync	14:20
sdague	I know we do the oslo tar case	14:20
soren	sdague: Sorry, I replied entirely out of context. :)	14:20
*** KennethWilke has joined #openstack-infra		14:20
sdague	yep, no worries :)	14:20
soren	fungi: Jenkins seems less useful if you never look at it :)	14:21
sdague	it's good to know though, probably something worth looking to add to our reqs sync	14:21
fungi	soren: but yeah, having an automation-friendly means of authenticating to the api endpoint entirely separate from browser handling	14:21
fungi	something it lacks	14:21
fungi	soren: probably the other reason we don't need to authenticate to it often is that we have it set up with anonymous read access enabled, so as long as you're not changing things through the webui you don't need to log into it	14:22
jd__	huhu, today ETA for a Ceilometer patch merge seems to be around 8 hours, FWIW	14:23
fungi	jd__: yeah, we're proposing slowing that down further ;)	14:23
*** datsun180b has joined #openstack-infra		14:23
jd__	if that improves quality even further I wouldn't mind	14:23
jd__	I prefer to wait 8 hours for a merge than spending my days doing rechecks :-)	14:24
sdague	jd__: the gate's at about 8 hrs merge time right now because of all the resets	14:24
soren	fungi: Ah, good point. Mine's set up to always require authentication.	14:24
sdague	however, the check queue is currently starved, so nothing's moved there for the last 15 hrs	14:24
jd__	sdague: ah I didn't know there had been a reset, cool then	14:24
*** wchrisj_ has joined #openstack-infra		14:24
sdague	jd__: not a zuul reset	14:24
sdague	fails by stuff in the gate	14:24
jd__	oh I see	14:25
sdague	the gate failure rate is really high	14:25
jd__	the new tree stuff ?	14:25
sdague	no, bugs in openstack	14:25
*** amotoki has quit IRC		14:26
fungi	shush. openstack has no bugs. you're dreaming	14:26
jd__	sdague: bugs in new patchset being tested you mean, or existing bugs (rechecks)?	14:26
dims	lol	14:26
sdague	http://lists.openstack.org/pipermail/openstack-dev/2013-September/015743.html	14:26
sdague	existing bugs	14:26
jd__	ok :)	14:26
*** adalbas has quit IRC		14:28
*** wchrisj_ has quit IRC		14:29
dansmith	http://img819.imageshack.us/img819/3070/6exn.png	14:30
sdague	what is definitely interesting is the Test Nodes graphic at the bottom of the page has a very distinctive look when we are in reset land	14:30
sdague	the peaks going up and down	14:30
dansmith	it's pretty amazing how small gnome-terminal will go, so at least I can see all of the nova stuff block-wise :)	14:30
sdague	heh	14:30
*** dcramer_ has joined #openstack-infra		14:32
*** tvb has joined #openstack-infra		14:32
*** mrodden has quit IRC		14:33
*** markmcclain has joined #openstack-infra		14:33
mordred	morning all	14:34
Alex_Gaynor	morning mordred	14:34
mordred	soren: we kinda think Jenkins is less useful in general, and thus never really look at it :)	14:35
sdague	mordred: how do you feel about rebalancing the queues? :)	14:35
sdague	mordred: https://review.openstack.org/#/c/48657/	14:35
*** senk has joined #openstack-infra		14:36
sdague	we have stuff that entered the check queue yesterday afternoon, and still hasn't gotten access to devstack nodes	14:37
mordred	sdague: done	14:37
sdague	mordred: thank you	14:37
*** Ryan_Lane has joined #openstack-infra		14:37
*** MoXxXoM has quit IRC		14:38
Alex_Gaynor	So, maybe ridiculous question, but could we be launching more devstack nodes?	14:39
*** MoXxXoM has joined #openstack-infra		14:39
sdague	Alex_Gaynor: my understanding is we were basically at quota with HP	14:41
sdague	maybe mordred knows more	14:41
*** Ryan_Lane has quit IRC		14:41
mordred	we are - and we could request a quota increase... but	14:43
mordred	I don't know that I'm convinced that would help, given the resets	14:44
*** adalbas has joined #openstack-infra		14:44
mordred	the gate queue isn't slow due to starvation	14:44
sdague	it would help with the starvation on check	14:44
sdague	correct	14:44
mordred	well, we've also got a change in flight to move the check queue to a separate pool of machines	14:44
mordred	https://review.openstack.org/#/c/48549/	14:45
sdague	sure, it's just going to take until tomorrow afternoon to clear the check queue at this rate	14:45
mordred	yah. I'm just saying, I think that finishing the above patch and landing it will get us _way_ further (and be quicker) than trying to increase quota size	14:46
sdague	yeh, sure	14:46
openstackgerrit	A change was merged to openstack-infra/config: make check queue high priority https://review.openstack.org/48657	14:46
mordred	I'll work on trying to get that patch finished as soon as I've found coffee	14:46
jeblair	i think i would have made them both normal	14:46
jeblair	now post will starve	14:46
jeblair	i apparently missed reviewing that by 2 minutes	14:47
sdague	I thought post was high?	14:47
jeblair	normal	14:47
sdague	ah	14:47
*** alcabrera is now known as gerrit2		14:48
*** gerrit2 is now known as alcabrera		14:48
sdague	is normal a keyword? or just the default?	14:48
openstackgerrit	James E. Blair proposed a change to openstack-infra/config: Make check, high, post normal precedence. https://review.openstack.org/48668	14:48
jeblair	both	14:48
mordred	jeblair: nod. +2	14:49
sdague	I think you want to update commit message :)	14:49
sdague	s/high/gate/	14:49
openstackgerrit	James E. Blair proposed a change to openstack-infra/config: Make check, gate, post normal precedence. https://review.openstack.org/48668	14:49
jeblair	make word word word	14:49
ryanpetrello	so I've just tagged a stackforge project (pecan) for release, and watched it go through on zuul;	14:49
ryanpetrello	I've never done this before now - how long does it take for the sdist to show up on pypi?	14:49
sdague	heh	14:49
ryanpetrello	(not in a rush, just want to make sure I didn't goof it up :D)	14:50
*** kgriffs has joined #openstack-infra		14:51
ryanpetrello	http://logs.openstack.org/bf/bf841be3933fd297b534ca235bcbae0c13bf6202/release/pecan-tarball/0b9ffb8/console.html	14:51
ryanpetrello	looks like it failed?	14:51
*** rcleere has joined #openstack-infra		14:51
*** matsuhashi has quit IRC		14:51
jeblair	mordred, sdague: there are also things we can tune to get nodepool a little more responsive now, i'll work on that while mordred finishes the rax-check stuff	14:51
kgriffs	guys, got a question re paste.openstack.org	14:52
sdague	jeblair: cool	14:52
kgriffs	I noticed it is based on lodgeit, and I found this: https://github.com/openstack-infra/lodgeit	14:52
kgriffs	is that repo independent of the original lodgeit?	14:53
fungi	ryanpetrello: yeah, looks like you're missing a [testenv:venv] section in your tox.ini which run-tarball.sh expects to find	14:53
fungi	kgriffs: it's a fork	14:53
sdague	jeblair: so are queue priorities changed as soon as the config lands?	14:53
fungi	kgriffs: the original lodgeit is abandoned upstream last i checked	14:53
kgriffs	oh, ok	14:53
jeblair	sdague: yes	14:53
sdague	check is still going in the wrong direction, and it's only going to get worse as the PST folks wake up	14:53
kgriffs	so we are sort of keeping it on life support?	14:53
fungi	kgriffs: i think pocoo stopped using it and ceased maintaining it	14:53
jeblair	sdague: for new jobs	14:53
kgriffs	fungi: ok, I suspected as much	14:54
jeblair	sdague: which isn't going to help many of the jobs currently in check	14:54
sdague	jeblair: ok, so the 190 check jobs that are in there won't make any progress?	14:54
fungi	kgriffs: basically, i think. part of the problem is that unauthenticated sites allowing you to post arbitrary text are an attractive nuisance and often abused to the point of being unmaintainable	14:54
mordred	kgriffs: yes. clarkb and I found a pastebin that was more similar to gist a little while ago, but we haven't gotten to the point where working on paste has been important enough :)	14:54
*** marun has joined #openstack-infra		14:55
*** jswarren has joined #openstack-infra		14:55
jeblair	sdague: indeed it seems likely to make it worse	14:55
kgriffs	mordred, fungi: I would like to create a "pastebin" for images to share screenshots and stuff, and was wondering whether it should be a standalone thing or try to integrate with something already out there	14:55
*** jswarren has quit IRC		14:55
jeblair	sdague: perhaps we should _lower_ gate to low until it clears out	14:55
fungi	kgriffs: yikes. i think you don't want to do that	14:55
fungi	kgriffs: it's called 4chan ;)	14:55
kgriffs	heh	14:55
sdague	jeblair: yeh, that seems reasonable, then on the next reset they'll start getting resources	14:56
mordred	kgriffs: we have an open item to have better support for this from the horizon folks too	14:56
*** jswarren has joined #openstack-infra		14:56
mordred	kgriffs: and some preliminary plans, but similarly that hasn't hit high enough on the queue yet	14:56
kgriffs	mordred what is the alternative project you found?	14:56
sdague	jeblair: you want to respin your patch for that? or I can do it	14:56
kgriffs	mordred: (the gist-like thing you mentioned)	14:57
openstackgerrit	James E. Blair proposed a change to openstack-infra/config: Make check, gate, post low precedence https://review.openstack.org/48668	14:57
jeblair	sdague: ^	14:57
*** Ajaeger has joined #openstack-infra		14:57
mordred	kgriffs: https://github.com/justinvh/gitpaste	14:57
sdague	fungi, mordred: ^^^	14:57
mordred	+2	14:57
kgriffs	ah, nice	14:57
kgriffs	thanks - I'll check it out.	14:58
sdague	ok, hopefully that will get things running though	14:58
Alex_Gaynor	kgriffs, fungi: I can confirm that pocoo upstream no longer maintains lodgeit, their install (paste.pocoo) was being used for various illegal and highly offensive stuff so it was too much of a hassle	14:58
kgriffs	gtk	14:59
openstackgerrit	Monty Taylor proposed a change to openstack-infra/config: Use rackspace for tempest check tests. https://review.openstack.org/48549	15:00
jswarren	Hello. Maybe this has already been brought up, but I'm noticing on zuul that python26 jobs are stuck on queued, with evidently none in progress.	15:00
mordred	jeblair: I think that does it	15:00
mordred	jswarren: yup. big-time gate issues right now	15:01
mordred	jswarren: http://lists.openstack.org/pipermail/openstack-dev/2013-September/015743.html	15:01
jeblair	mordred: you didn't split it into 2 changes	15:01
mordred	jeblair: ah. sorry. didn't see that note (still pre-coffee) one sec	15:02
*** mrodden has joined #openstack-infra		15:03
openstackgerrit	A change was merged to openstack-infra/config: Make check, gate, post low precedence https://review.openstack.org/48668	15:03
Alex_Gaynor	So the priority updates, does that require a zuul restart?	15:04
jeblair	Alex_Gaynor: no, nothing to the zuul layout.yaml requires a restart, only a reload (which puppet will do automatically); queue contents don't change	15:04
Alex_Gaynor	jeblair: thank god	15:05
openstackgerrit	Monty Taylor proposed a change to openstack-infra/config: Use rackspace for tempest check tests https://review.openstack.org/48672	15:05
openstackgerrit	Monty Taylor proposed a change to openstack-infra/config: Set up new images on rackspace for check tests https://review.openstack.org/48549	15:05
jeblair	mordred: dfw has 18/60 slots available (the rest are static slaves); ord is pretty much open (i can delete some test servers there), iad only has 8 slots	15:06
jeblair	mordred: i think we need to leave headroom in dfw. i'm not sure we should use it much, if at all.	15:06
mordred	jeblair: agree. lemme modify the first patch	15:07
jeblair	mordred: hang on	15:07
mordred	I'm also going to send pvo and troy an email seeing if we can get IAD to match	15:07
*** Ryan_Lane has joined #openstack-infra		15:08
*** tvb has quit IRC		15:08
Alex_Gaynor	mordred: if you need me to ask people to up our limit, let me know, I can start sending emails	15:08
mordred	Alex_Gaynor: I just emailed troy and pvo, but if you know other folks, what I requested was "Can you up our quota on the openstackjenkins account in IAD to match DFW and ORD?"	15:09
openstackgerrit	Anne Gentle proposed a change to openstack-infra/config: Removes openstack-api-programming doc build https://review.openstack.org/48674	15:09
openstackgerrit	James E. Blair proposed a change to openstack-infra/config: Tune nodepool https://review.openstack.org/48675	15:09
Alex_Gaynor	mordred: k, will start firing emails	15:10
jeblair	mordred: i will modify your patch	15:10
fungi	jeblair: ttx: reed: noticed a small freshness problem with http://git.openstack.org/cgit/openstack-infra/config/tree/tools/atc/email-stats.sh . what's the best way to confirm which repositories should be listed in there to count toward atc? should everything in openstack/ openstack-dev/ and openstack-infra/ get added to it?	15:12
mordred	Alex_Gaynor, jeblair: pvo has acknowledged my email	15:12
*** Ryan_Lane has quit IRC		15:12
openstackgerrit	James E. Blair proposed a change to openstack-infra/config: Set up new images on rackspace for check tests https://review.openstack.org/48549	15:13
jeblair	mordred: ^	15:13
*** DinaBelova has joined #openstack-infra		15:13
ttx	fungi: you shouldn't need ATC right now, just APC	15:13
jeblair	what's an apc?	15:14
ttx	Active pro(ject/gram) Contributor	15:14
mordred	jeblair: yup	15:14
openstackgerrit	David Caro proposed a change to openstack-infra/jenkins-job-builder: Added the possibility to specify source files https://review.openstack.org/48677	15:14
fungi	ttx: that's the list of projects we're building stats on, so for example openstack/django_openstack_auth is not represented (yet)	15:14
openstackgerrit	James E. Blair proposed a change to openstack-infra/config: Use rackspace for tempest check tests https://review.openstack.org/48672	15:15
jeblair	rebase ^	15:15
ttx	fungi: we don't have the precise program/project map yet, but I can go through the list of projects and get that for you	15:15
fungi	ttx: i'll add it since you say it's part of horizon's program, but just trying to figure out what else we may be missing more generally	15:15
jeblair	fungi: aprv https://review.openstack.org/48675 ?	15:15
mordred	jeblair: all three are +2 from me	15:16
ttx	fungi: i'll fix that list for you before we run the ATC voters lists	15:16
fungi	ttx: k, thanks	15:16
jeblair	fungi: and then https://review.openstack.org/48549 as well	15:16
jeblair	i wip'd the 3rd change to keep it from going in prematurely	15:16
ttx	fungi: i added django_openstack_auth because that's arguably part of the horizon program	15:17
*** CaptTofu has quit IRC		15:17
ttx	fungi: but it's a bit of a grey area right now, until programs all submit their lists	15:17
ttx	but i can't get them to publish a mission statement, so projects lists...	15:17
jeblair	ttx: i believe that's the understanding we came to with gabrielhurley	15:17
ttx	jeblair: agreed, but it just won't be completely clear cut until we get the program/projects maps in the governance repo	15:19
* anteaya observes		15:19
ttx	until then we'll continue to use the old "sounds about right" recipe we've been using for ATCs until now :)	15:19
fungi	wfm	15:19
anteaya	okay	15:20
ttx	fungi: everyone will just blame anteaya anyway	15:20
ttx	that's what we need election officials for, after all	15:20
anteaya	blame me	15:20
fungi	i know i do ;)	15:20
anteaya	:D	15:20
* fungi kids		15:20
anteaya	it is the fun that comes with that particular hat	15:20
anteaya	knew it when I volunteered	15:20
*** tvb has joined #openstack-infra		15:21
*** tvb has quit IRC		15:21
*** tvb has joined #openstack-infra		15:21
ttx	anteaya: note that I decided to share the blame for the TC election. Just couldn't for this one :)	15:21
anteaya	understood	15:21
anteaya	and yeah the TC election promises to be a whole lot of fun	15:21
anteaya	get ready for the deluge of +1 emails	15:21
mordred	ttx: you might want to poke the TC folks who still haven't voted on the governance repo - I believe your reminder slipped in at the end of the meeting last time	15:21
mordred	so they may not be noticing that they need to do that	15:22
ttx	mordred: will do	15:22
jgriffith	sdague: ummm... just curious why you think this: https://bugs.launchpad.net/tempest/+bug/1226337 is a tgt issue?	15:22
uvirtbot	Launchpad bug 1226337 in tempest "tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern flake failure" [High,Triaged]	15:22
jgriffith	particularly since the specific example here is that the server never booted?	15:22
*** CaptTofu has joined #openstack-infra		15:22
fungi	ttx: the main reason i was asking as far as updating that list is that it potentially affects the set of qualifying atcs i gave reed for summit passes	15:23
openstackgerrit	David Caro proposed a change to openstack-infra/jenkins-job-builder: Added the possibility to specify source files https://review.openstack.org/48677	15:23
ttx	mordred: actually we have 8 +2s there. Which is enough to pass.	15:23
ttx	mordred: i'll still ping them for a last-minute objection though	15:23
jgriffith	jog0: ping	15:24
*** Ajaeger has quit IRC		15:24
jgriffith	OH... never mind that Nikola	15:24
*** freyes has joined #openstack-infra		15:25
*** reed_ has joined #openstack-infra		15:27
*** CaptTofu_ has joined #openstack-infra		15:27
openstackgerrit	A change was merged to openstack-infra/config: Tune nodepool https://review.openstack.org/48675	15:28
openstackgerrit	A change was merged to openstack-infra/config: Set up new images on rackspace for check tests https://review.openstack.org/48549	15:28
*** CaptTofu_ has quit IRC		15:30
sdague	jgriffith: because the issue looks like the iscsi device can't be found from compute	15:31
*** rpodolyaka has left #openstack-infra		15:31
jgriffith	sdague: Ummmm	15:32
sdague	it's a boot from volume, and on the 3rd time to boot from a volume the iscsi device never shows up on n-cpu	15:32
jgriffith	http://logs.openstack.org/29/45029/6/check/gate-tempest-devstack-vm-full/80dd62e/logs/screen-n-cpu.txt.gz	15:32
openstackgerrit	Jaroslav Henner proposed a change to openstack-infra/jenkins-job-builder: Add dynamic string and choice params. https://review.openstack.org/48506	15:32
jgriffith	sdague: afraid I think these multiple things going on here	15:32
jgriffith	s/these/there's/	15:32
sdague	http://logs.openstack.org/87/47487/4/gate/gate-tempest-devstack-vm-postgres-full/247d81e/logs/screen-n-cpu.txt.gz#_2013-09-27_10_03_40_640	15:33
*** tvb|afk has joined #openstack-infra		15:33
*** tvb|afk has quit IRC		15:33
*** tvb|afk has joined #openstack-infra		15:33
sdague	jgriffith: ok, well more eyes appreciated	15:33
sdague	this is as far as we got on -qa this morning trying to figure things out	15:33
sdague	there's some scrollback there if you are on it	15:34
jgriffith	sdague: I'm looking, If I can find a clean example of the target issue I can dig in on the cinder side	15:34
jgriffith	sdague: checking...	15:34
*** tvb has quit IRC		15:34
openstackgerrit	James E. Blair proposed a change to openstack-infra/zuul: Allow multiple invocations of the same job https://review.openstack.org/48684	15:35
jeblair	sdague, fungi: ^ sadly, I think that answers that question in the negative. but we should be able to have that feature in place over the weekend.	15:35
*** Ryan_Lane has joined #openstack-infra		15:38
mgagne	When Rackspace updates their images, does the image ID change? Does the image disappear for a brief moment or are there 2 images with the same name for a couple of seconds?	15:39
*** AlexF has quit IRC		15:40
*** tvb|afk has quit IRC		15:41
mordred	jeblair: pvo says our IAD quota should be increased	15:42
jeblair	mgagne: i don't know	15:42
*** tvb has joined #openstack-infra		15:42
*** tvb has quit IRC		15:42
*** tvb has joined #openstack-infra		15:42
jeblair	mgagne: it is!	15:42
*** DinaBelova has quit IRC		15:42
jeblair	mordred: it is!	15:42
jeblair	mgagne: sorry	15:42
*** Ryan_Lane has quit IRC		15:43
jeblair	mordred: i'll update nodepool conf	15:43
*** tizzo has quit IRC		15:43
*** DennyZhang has joined #openstack-infra		15:43
*** AlexF has joined #openstack-infra		15:44
*** UtahDave has joined #openstack-infra		15:45
openstackgerrit	James E. Blair proposed a change to openstack-infra/config: Increase IAD nodepool limits https://review.openstack.org/48687	15:45
openstackgerrit	David Caro proposed a change to openstack-infra/jenkins-job-builder: Added globbed parameters to the job specification https://review.openstack.org/48688	15:45
jeblair	mordred: check images are building	15:45
mordred	jeblair: woot	15:46
*** yassine has quit IRC		15:47
jeblair	i'm deleting the old test nodes/images	15:47
giulivo	jgriffith, what I found is that cinder seems to receive an okay from tgt-admin about the update so the volume is moved into available state	15:48
*** DennyZhang has quit IRC		15:48
giulivo	but later iscsiadm on nova can't find the volume	15:48
*** DinaBelova has joined #openstack-infra		15:49
giulivo	so following sdague's suggestion I've this on devstack https://review.openstack.org/#/c/48626/	15:49
*** DennyZhang has joined #openstack-infra		15:49
jgriffith	giulivo: is it iscsiadm can't discover? Cuz it looks like the discover works and it thinks it attached it	15:49
jgriffith	giulivo: but that the actual problem is that the attach was no good	15:49
giulivo	jgriffith, I found three attempts to rediscover	15:49
jgriffith	giulivo: but I'm just trying to catch up so I could be wrong	15:49
giulivo	lasting like secs	15:49
jgriffith	giulivo: what do you mean by that?	15:50
jgriffith	giulivo: ie... can you point to the logs?	15:50
jgriffith	giulivo: You mean sendtargets command?	15:50
giulivo	wait a sec so I can post the relevant log	15:51
jgriffith	giulivo: cool	15:51
giulivo	:P	15:51
jgriffith	giulivo: like I said, be patient with me I'm just catching up with you guys here :)	15:51
jgriffith	giulivo: Hoping I can help	15:51
*** tizzo has joined #openstack-infra		15:51
*** mkerrin has quit IRC		15:52
giulivo	oh c'mon so the logs I was looking at are http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-c-vol.txt.gz for cinder and http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-n-cpu.txt.gz for nova	15:52
giulivo	the problem is with volume 4020e0dd-24a0-453b-985d-e50cb2dd0de1	15:53
giulivo	the nova exception is here http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-n-cpu.txt.gz#_2013-09-24_04_44_35_186	15:54
jeblair	mordred, fungi: https://review.openstack.org/#/c/48687/	15:54
jeblair	all the rax check images are now ready	15:55
jgriffith	giulivo: yeah, so that's what I was wondering....	15:56
fungi	jeblair: does that mean 48672 is safe to un-wip/approve now?	15:57
jgriffith	giulivo: Login was successful indicating the target was there	15:57
jeblair	fungi: not just yet, it's launching the nodes	15:57
jgriffith	giulivo: 2013-09-24 04:44:17.515	15:57
giulivo	login succeeds true, but not the volume?	15:57
fungi	ahh, okay	15:57
jgriffith	BUT	15:57
jgriffith	the attach/mount at /dev/vda is the crux of the issue	15:57
jgriffith	I think	15:57
jgriffith	giulivo: The fact that the login to the target was successful is why I had moved past that point	15:58
jgriffith	giulivo: sadly, no logging in between there :(	15:58
*** thomasbiege has joined #openstack-infra		15:59
*** CaptTofu has quit IRC		15:59
giulivo	so the three attempts to rediscover which are failing are "okay" ?	15:59
giulivo	rediscover the volume, after logging in	15:59
giulivo	like this http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-n-cpu.txt.gz#_2013-09-24_04_44_22_710	16:00
jgriffith	giulivo: well...	16:00
jgriffith	giulivo: so "discover" can mean different things with iscsi	16:00
jgriffith	giulivo: "discover" in terms of iscsi target discovery appears to have succeeded without issue	16:00
jgriffith	giulivo: what you're referring to though is the attachment	16:00
giulivo	yeah it's not the sendtargets sorry, I should say rescan but that is just the argument passed to iscsiadm	16:00
jgriffith	I think	16:01
jgriffith	giulivo: got ya	16:01
jgriffith	giulivo: so what's failing is the attach	16:01
jgriffith	giulivo: the target appears to be vlie	16:01
jgriffith	valid	16:01
*** thomasbiege has quit IRC		16:01
SpamapS	Anybody know a way to specify a different set of things to ignore for flake8 per-directory?	16:01
jgriffith	giulivo: but it's the attach that is hosed	16:01
jgriffith	and whatever's been done with the logging isn't overly helpful IMO	16:02
*** matty_dubs is now known as matty_dubs|lunch		16:03
*** tizzo has quit IRC		16:04
openstackgerrit	David Caro proposed a change to openstack-infra/jenkins-job-builder: Added the possibility to specify source files https://review.openstack.org/48677	16:04
jeblair	mordred, sdague, fungi: our first rax nodes are ready, from IAD, they took 16 minutes to build	16:05
jeblair	(dfw and ord are still building)	16:06
fungi	eek	16:06
fungi	what's build time like for hp?	16:06
jeblair	fungi: 2 mins	16:06
fungi	i guess ~15 minutes is what i recall from standing up puppetish servery things in rackspace previously though	16:07
openstackgerrit	A change was merged to openstack-infra/config: Increase IAD nodepool limits https://review.openstack.org/48687	16:07
guitarzan	giulivo: can you tell if the iscsi device shows up eventually?	16:07
fungi	taking package installs/upgrades and whatnot into account	16:07
jeblair	fungi: that's not necessary for this though -- this is a straight launch from image -- but it's a custom image, which means it may not be local to the compute node	16:07
fungi	oh, ew	16:08
fungi	right, image is already updated and such	16:08
jeblair	i don't know how it works in rax though -- perhaps continued use warms caches on compute nodes.	16:08
fungi	we'll be warming those up really quickly if that's the case ;)	16:08
jeblair	we can somewhat mitigate this by increasing min-ready even more	16:09
giulivo	guitarzan, so I think the problem is exactly that the block device never shows up	16:09
giulivo	there is nothing from the kernel messages about the newer volume (from iscsiadm)	16:10
giulivo	not that I can see at least	16:10
guitarzan	hmm, how is the network between the two machines?	16:11
giulivo	loopback	16:11
guitarzan	ah, and above someone said the discovery was fine	16:11
giulivo	yeah the login on the portal works	16:11
jgriffith	guitarzan: network sucks	16:12
jgriffith	guitarzan: the target is discovered BTW	16:12
jgriffith	guitarzan: it's the iscsiadm attach that doesn't seem to work	16:13
*** flaper87 is now known as flaperboon		16:13
guitarzan	well, he also said the login worked	16:13
guitarzan	so that's definitely confusing	16:14
dims	jgriffith, giulivo - i don't even see iscsiadm commands being run - looking at logstash using query - (@message:"4020e0dd-24a0-453b-985d-e50cb2dd0de1" OR @message:"iscsiadm") AND @fields.build_uuid:"dced339fa65543fd9e752d2581bc5cae"	16:14
jgriffith	guitarzan: http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-n-cpu.txt.gz	16:14
jgriffith	dims: I've given up on logstash for the time being	16:14
guitarzan	jgriffith: yeah, I'm looking at that too	16:14
jgriffith	dims: checkout the link above to the nova log	16:14
jgriffith	2013-09-24 04:44:17.515	16:15
dims	i see it	16:15
dims	looks like we are losing information in logstash sigh.	16:15
jgriffith	dims: that's what I concluded but thought maybe my queries just sucked ;)	16:16
*** alcabrera is now known as alcabrera_afk		16:16
jeblair	jgriffith: http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-n-cpu.txt.gz#_2013-09-24_04_44_17_515	16:17
*** tvb has quit IRC		16:17
jeblair	jgriffith: the timestamps are hyperlinks to per-line targets	16:17
jeblair	jgriffith: (so you can more easily share a link to a line)	16:17
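The per-line anchors can be built by hand from a timestamp. The mapping below is inferred only from the links pasted in this channel (for example `04:44:17.515` becoming `#_2013-09-24_04_44_17_515`); the function names are made up, and this is a sketch rather than how the log server itself generates the anchors.

```python
import re


def line_anchor(timestamp):
    """Turn '2013-09-24 04:44:17.515' into '_2013-09-24_04_44_17_515'."""
    # Space, colon and dot all become underscores; a leading underscore is added
    return "_" + re.sub(r"[ :.]", "_", timestamp)


def line_url(log_url, timestamp):
    """Append the per-line anchor to a log URL."""
    return log_url + "#" + line_anchor(timestamp)


if __name__ == "__main__":
    base = ("http://logs.openstack.org/64/47264/2/gate/"
            "gate-tempest-devstack-vm-full/dced339/logs/screen-n-cpu.txt.gz")
    print(line_url(base, "2013-09-24 04:44:17.515"))
```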
clarkb	morning	16:17
jgriffith	jeblair: Nice!!!	16:17
jeblair	jgriffith, guitarzan: sdague made a change yesterday that removes DEBUG lines from logstash	16:17
jgriffith	jeblair: thank you!	16:17
jeblair	jgriffith: sdague did the line-hyperlink too	16:18
jgriffith	jeblair: Ahhhh, so it's not that i can't write a decent query to save my life ;)	16:18
dims	:)	16:18
jeblair	fungi: some rax nodes are going on 0.43 hours in building state :(	16:19
* clarkb catches up on the state of things		16:19
fungi	wow	16:19
jeblair	clarkb: there's a lot; short version, we're throwing levers to deal with check starvation; nothing needs immediate attention there	16:20
dims	giulivo, jgriffith, 04:44:17.577 first try and exception is at 04:44:31.806 - maybe it just needs more time?	16:20
giulivo	I don't know if there are nova folks around but after the latest iscsiadm --rescan attempt http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-n-cpu.txt.gz#_2013-09-24_04_44_22_710 we have 10 seconds of almost no logging before the stack trace	16:20
*** gyee has joined #openstack-infra		16:20
openstackgerrit	A change was merged to openstack-infra/config: Use rackspace for tempest check tests https://review.openstack.org/48672	16:20
guitarzan	giulivo: 3**2 seconds maybe? :)	16:21
clarkb	jeblair: does gearman honor NODE_LABEL? that was the biggest thing I was fuzzy on last night?	16:21
jeblair	clarkb: zuul translates that into the job_name:label syntax for gearman	16:21
*** odyssey4me has quit IRC		16:21
giulivo	it's 10 seconds after the last attempt	16:21
jeblair	clarkb: we've never used that, so that's going to be exciting!	16:22
clarkb	jgriffith: dims: we are removing DEBUG for a couple reasons the biggest being it adds an order of magnitude to the size of our indexes (2 weeks is ~600GB now but was ~5TB with DEBUG) but also DEBUG is largely useless noise	16:22
clarkb	jgriffith: dims: also if there is information that pinpoints a bug and does not have anything logged at a higher level I would consider that to be a bug as well (if we fail it should be logged at something higher than INFO)	16:23
clarkb	at least WARN imo	16:23
jgriffith	clarkb: sure, don't get me wrong wasn't complaining	16:23
clarkb	jeblair: cool	16:23
jgriffith	clarkb: just pointing out that my queries never worked, and now I know why :)	16:23
clarkb	jgriffith: I know, just trying to point out how we got here. It isn't perfect but is definitely more useable overall	16:24
jgriffith	clarkb: I would agree WRT bumping up some of the log levels	16:24
giulivo	jgriffith, in nova it looks like the iscsiadm --rescan is only attempted three times so I think this just never finds the volume after logging in	16:24
jgriffith	clarkb: agreed	16:24
dims	clarkb, thanks, understood	16:24
jgriffith	giulivo: sorry... I was looking at something else, going back to something here	16:24
guitarzan	giulivo: if it hasn't happened in 14 seconds, maybe it isn't going to happen?	16:25
giulivo	https://github.com/openstack/nova/blob/master/nova/virt/libvirt/volume.py#L275	16:25
guitarzan	giulivo: you say there was never anything in kern.log about a new disk showing up?	16:25
giulivo	guitarzan, ^^ yep	16:25
giulivo	I think logging on the portal works but the volume is never found and as per nova code, after three failed attempts it reports failure	16:25
giulivo	that explains why there isn't anything in the kernel log about the new block device	16:26
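The behaviour giulivo describes (rescan, check for the device path, give up after three tries) can be sketched as a small retry loop. This is not the nova code itself: the function name, the injectable `rescan`/`exists`/`sleep` parameters, and the quadratic wait are assumptions, the last one taken from guitarzan's "3**2" and "14 seconds" remarks (1 + 4 + 9 = 14).

```python
import os
import time


def wait_for_device(path, rescan, num_tries=3,
                    exists=os.path.exists, sleep=time.sleep):
    """Rescan until `path` exists; raise after `num_tries` failed rescans.

    Returns the number of rescans that were needed.
    """
    tries = 0
    while not exists(path):
        if tries >= num_tries:
            raise RuntimeError("device %s never appeared" % path)
        rescan()                  # e.g. an 'iscsiadm ... --rescan' call
        tries += 1
        if not exists(path):
            sleep(tries ** 2)     # waits of 1, 4, then 9 seconds
    return tries
```

With a device that never shows up, the loop spends 14 seconds sleeping before it raises, which lines up with the gap between the first attempt and the exception that dims pointed out.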
dims	giulivo, so trying a few more times may help?	16:27
giulivo	it is either the iscsiadm failing at --rescan	16:27
jeblair	clarkb, fungi, mordred: look at 48423,2 on the status page	16:27
jeblair	clarkb, fungi, mordred: mouseover the red dot	16:27
giulivo	or the tgtd returning an okay to cinder before the lun is actually made available	16:27
mordred	jeblair: yah	16:28
jeblair	clarkb, fungi, mordred: you'll see the 'needed dependency is failing' logic in action	16:28
clarkb	jeblair: awesome	16:28
mordred	++	16:28
fungi	nice, dependency failure	16:28
clarkb	I mean not that it is failing but that the representation of it works :)	16:28
giulivo	so based on that, I think this could help https://review.openstack.org/#/c/48626/ as we get tgtd in debug mode and can try to figure what it is doing when cinder provides it with the new volume	16:28
fungi	i was hoping to eventually spot one of those in the wild with the new visualization	16:28
fungi	also, holy test nodes graph batman	16:29
clarkb	jog0: logstash is all caught up and appears to be keeping up	16:30
pabelanger	fungi, I was about to say that... that is awesome!	16:30
clarkb	jog0: so elastic-recheck probably doesn't need any fancy backoff stuff	16:30
jeblair	here's an embiggened version: http://tinyurl.com/pj3kpj9	16:30
jeblair	(you have to remember to reload that one occasionally)	16:30
jeblair	the orange peak near the end is the rackspace spinup	16:31
jeblair	(and most of the ready nodes are rackspace)	16:32
pabelanger	jeblair, what's the amount of time to actually spin up a node? Is that tracked some place?	16:33
jeblair	pabelanger: it's in graphite (nodepool.launch.*), but i can tell you offhand we're seeing about 2 mins for hp and 16 for rackspace atm.	16:35
dhellmann	sdague: https://github.com/dhellmann/cliff/tree/remove-cmd2 if you want to give it a spin	16:35
*** Ryan_Lane has joined #openstack-infra		16:35
jog0	mordred: it was a little bitchy, I was going for a public shaming.	16:36
jog0	clarkb: woot!	16:37
clarkb	jeblair: chatted with zaro briefly over the wall (shame on us for not doing it here) to better understand the NODE_LABEL stuff and I am not entirely sure it will work as expected	16:40
jeblair	clarkb: we're about to find out?	16:40
clarkb	jeblair: because our project configs don't use the label devstack-precise-check there won't be any jobs for that project:label name in the gearman server	16:40
clarkb	s/jobs/workers/	16:41
jeblair	clarkb: ah, yes, that label needs to be added	16:41
*** boris-42 has quit IRC		16:41
openstackgerrit	James E. Blair proposed a change to openstack-infra/config: Revert "Use rackspace for tempest check tests" https://review.openstack.org/48698	16:42
clarkb	jeblair: but we can't do that safely without another job	16:42
jeblair	clarkb: i think we can. the param func should set the label in all cases	16:42
giulivo	dims, guitarzan, jgriffith, sdague I'm sorry I've to leave but FWIW I'm of the idea that iscsiadm --rescan is failing at finding the volume after it logs in on the portal, the nova code checks for the device path 3 times but it never pops up so it raises , see https://github.com/openstack/nova/blob/master/nova/virt/libvirt/volume.py#L275 so I think putting tgtd on debug on the other side could help figure what is going on (at both creation time and attach) https://review.openstack.org/#/c/48626/	16:43
clarkb	jeblair: so we need to have an else in that function that sets it to devstack-precise? that should work	16:43
jeblair	clarkb: yeah, though to do it safely, i think we need to start by setting it to devstack-precise always, then change the job labels, then add the conditional	16:43
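The "set the label in all cases" idea can be sketched as below. Everything about the shape of this code is an assumption: the real zuul parameter function signature is not shown in this discussion, so the sketch takes a plain pipeline name and a params dict, and the `job_name:label` string simply follows jeblair's description of what zuul hands to gearman. Only the labels `devstack-precise` and `devstack-precise-check` and the `NODE_LABEL` parameter come from the conversation.

```python
def set_node_label(pipeline_name, params):
    """Always set NODE_LABEL so every job resolves to a labelled worker."""
    if pipeline_name == "check":
        params["NODE_LABEL"] = "devstack-precise-check"
    else:
        # The 'else' clarkb asks about: non-check pipelines keep the old label
        params["NODE_LABEL"] = "devstack-precise"
    return params


def gearman_function_name(job_name, params):
    """Hypothetical helper: build the 'job_name:label' form described above."""
    return "%s:%s" % (job_name, params["NODE_LABEL"])


if __name__ == "__main__":
    p = set_node_label("check", {})
    print(gearman_function_name("gate-tempest-devstack-vm-full", p))
```

The staged rollout jeblair outlines would correspond to first shipping only the `else` branch for every pipeline, then relabelling the jobs, then adding the `check` condition.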
jeblair	clarkb: it's getting complicated enough that we should re-evaluate adding jobs...	16:44
clarkb	++	16:44
jeblair	clarkb: the advantage of adding jobs is that we can say check jobs can run on either, which is a little bit of a release valve if rackspace can't keep up.	16:45
*** odyssey4me has joined #openstack-infra		16:45
jeblair	clarkb: the disadvantage, obviously, is that the devstack jobs are a huge mess right now and we'd be making twice as many of them.	16:45
*** dcramer_ has quit IRC		16:45
clarkb	yeah. What if we didn't treat them differently (rackspace runs the jobs in about as much time as hpcloud did them serially)	16:46
clarkb	(just throwing ideas out there)	16:46
jeblair	clarkb: rackspace runs them in about 1.5 the time, so we're looking at 60 minutes instead of 40.	16:47
*** wchrisj_ has joined #openstack-infra		16:47
*** afazekas is now known as afazekas_zz		16:47
*** dcramer_ has joined #openstack-infra		16:48
*** giulivo has quit IRC		16:48
notmyname	clarkb: I'm just getting caught up this morning. status of gates? good to go, or still waiting?	16:48
jeblair	clarkb: that might be the best approach.	16:49
clarkb	notmyname: still in a bit of flux, but we are actively sorting it out	16:49
notmyname	clarkb: kk, thanks	16:49
jeblair	clarkb: what are we sorting out?	16:49
clarkb	jeblair: node starvation?	16:49
clarkb	oh talking about gate in particular	16:49
openstackgerrit	A change was merged to openstack-infra/config: Revert "Use rackspace for tempest check tests" https://review.openstack.org/48698	16:50
*** beekneemech has quit IRC		16:50
jeblair	clarkb: i don't think notmyname needs to take any particular action other than not approving stable/grizzly changes, which he so rarely does anyway. :)	16:50
clarkb	jeblair: gotcha	16:50
clarkb	notmyname: ^	16:50
notmyname	clarkb: jeblair: ok, thanks :-)	16:50
*** gyee has quit IRC		16:51
*** AlexF has quit IRC		16:52
*** dcramer_ has quit IRC		16:52
jog0	clarkb: some files are missing from logstash	16:52
jeblair	mordred, fungi, sdague: so clarkb and i were chatting, and we either need to (a) do like 3 more steps to set up the check jobs to use rackspace, (b) double the number of devstack jobs so the check ones use rackspace, or (c) say screw it and just throw rackspace nodes into the general pool (occasionally jobs will take 60 instead of 40 mins)	16:53
jog0	http://logstash.openstack.org/#eyJzZWFyY2giOiIgQGZpZWxkcy5idWlsZF9jaGFuZ2U6XCIzOTYyMVwiIEFORCBAZmllbGRzLmJ1aWxkX3BhdGNoc2V0OlwiMTJcIiBBTkQgQGZpZWxkcy5idWlsZF9uYW1lOlwiZ2F0ZS10ZW1wZXN0LWRldnN0YWNrLXZtLXBvc3RncmVzLWZ1bGxcIiBBTkQgQGZpZWxkcy5idWlsZF91dWlkOlwiZWUyZjI2OTMyNDVhNGRmYmFjNDA4YmY3YmEyNDZmNmVcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiOTAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJtb2RlIjoidGVybXMiLCJhbmFseXplX2ZpZWx	16:53
jeblair	mordred, fungi, sdague: thoughts?	16:53
*** jerryz has joined #openstack-infra		16:53
jog0	no screen-key isn't there	16:54
clarkb	keystone should be there	16:54
clarkb	we are missing ceilometer and one of the swift files (because the format of the swift file isn't conducive to indexing)	16:54
* clarkb looks closer		16:55
* zaro says option c		16:55
jog0	keystone is only missing sometimes	16:55
mordred	jeblair: damn	16:56
mordred	jeblair: I'm not convinced just more nodes in the pool will help - but you have just made excellent points	16:56
jeblair	mordred: why wouldn't more nodes in the pool help?	16:56
jeblair	mordred: that's pretty much what starvation means....	16:56
jog0	oh and elasticSearch is really caught up, you weren't exaggerating.	16:57
clarkb	jog0: http://logs.openstack.org/93/37893/11/check/gate-tempest-devstack-vm-neutron/303633a/logs/screen-key.txt.gz?level=INFO that is why	16:57
jog0	sdague: thanks!!!	16:57
clarkb	jog0: basically no non DEBUG log lines according to apache	16:57
mordred	jeblair: 2 things - slower nodes in the pool will increase the latency before resets potentially	16:57
clarkb	jog0: but there are INFO lines in there so we have a bug	16:57
jog0	clarkb: oh :(	16:57
jeblair	mordred: yes, slowing resets down mitigates starvation but slows gate throughput	16:57
jog0	turns out I don't need keystone yet so it's not a blocker	16:58
clarkb	jog0: I know what is going on	16:58
clarkb	jog0: I think keystone uses its special snowflake format and we don't handle that properly on the apache side	16:58
clarkb	sdague: ^	16:58
jeblair	mordred: other thing?	16:58
mordred	jeblair: nope. I think that was the thing. I was wrong about there being 2	16:58
*** dmakogon__ has joined #openstack-infra		16:59
jeblair	mordred: the steps in (a) aren't difficult, and (b) is just a lot of typing; (c) needs reconfiguration as well. i think all 3 choices will take about the same amount of time.	16:59
jeblair	we get to choose on merits.	17:00
mordred	jeblair: I like the end state of having check jobs running in rackspace	17:00
mordred	because the slowness doesn't have a pile-on effect there	17:00
*** DinaBelova has quit IRC		17:00
clarkb	jog0: sdague: I suddenly remember why logstash is so slow :) the number of cases you have to account for is a bit ridiculous	17:00
*** matty_dubs|lunch is now known as matty_dubs		17:00
*** gyee has joined #openstack-infra		17:01
*** dstufft has quit IRC		17:01
clarkb	I think right now we only handle oslo format properly so swift and keystone aren't working	17:01
jog0	clarkb: yeah ...	17:01
*** odyssey4me has quit IRC		17:01
clarkb	a quick fix would be to make the level configurable in the workers and only have >DEBUG on oslo formatted things	17:01
clarkb	or sort it out in the wsgi app	17:02
jeblair	okay, so the choice is between (a) run _only_ on rackspace, or (b) run on rackspace and hp, more or less at random according to the proportion of available nodes	17:02
clarkb	sdague: ^ do you have an opinion on that?	17:02
jog0	clarkb: makes sense to me, but that may blow ElasticSearch way back again	17:02
*** hashar has quit IRC		17:02
*** dstufft has joined #openstack-infra		17:02
clarkb	jog0: it shouldn't be too horrible. keystone and swift logs are smaller than the others	17:02
jog0	clarkb: hopefully	17:02
*** MarkAtwood2 has joined #openstack-infra		17:03
jeblair	clarkb: i think it's only a partial regex to get the level anyway, so it may not be too complex to do in the app.	17:03
jog0	clarkb: on a related front I want to go ahead and make the elastic-search gerrit user	17:03
jog0	anything special to do that?	17:03
*** SergeyLukjanov has quit IRC		17:03
jeblair	mordred: what are your feelings on a/b ?	17:03
clarkb	jog0: one of the Gerrit admins (openstack-infra-core) needs to run a command	17:03
markmcclain	jeblair: any update on manually pushing that quantumclient branch pypi?	17:03
jeblair	markmcclain: did you ask us to?	17:03
clarkb	jog0: probably get consensus on the name first (since it will potentially comment on lots of changes)	17:04
jog0	elastic-recheck?	17:04
jeblair	why is it called recheck?	17:04
jog0	ala the recheck page we have	17:04
jog0	so use elasticSearch to make rechecks easier	17:05
clarkb	hmm is it time to test asterisk?	17:05
jeblair	clarkb: yes it is	17:05
mordred	jeblair: I think b sounds richer long term	17:05
jeblair	i was hoping we could at least reach a consensus on which of a/b/c to do about nodes...	17:05
jeblair	mordred: yeah, so that means doubling the number of devstack jobs so there are check and gate versions	17:06
mordred	jeblair: yeah. that's the least appealing part of b	17:06
clarkb	maybe we can template those jobs and it won't be so horrible?	17:07
jeblair	i mean, there may be opportunities for templating	17:07
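[editor's note] A rough sketch of what templating could mean here: one job name pattern expanded once per pipeline, so the check and gate variants do not have to be written out by hand. The pattern string and the expand_jobs helper are hypothetical, and this is plain Python rather than actual Jenkins Job Builder syntax; the job names are modeled on ones mentioned in this log.

```python
from itertools import product

# Hypothetical name pattern, modeled on names such as
# gate-tempest-devstack-vm-neutron seen in this channel.
PATTERN = "{pipeline}-tempest-devstack-vm-{variant}"


def expand_jobs(pipelines, variants, pattern=PATTERN):
    """Return one job name per (pipeline, variant) pair."""
    return [
        pattern.format(pipeline=p, variant=v)
        for p, v in product(pipelines, variants)
    ]


# Two pipelines times two variants gives four job names from one pattern.
jobs = expand_jobs(["check", "gate"], ["full", "neutron"])
```

The point of the design is that adding a pipeline or a variant is a one-word change to a list instead of another copied job definition.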
jeblair	so who wants to work on that? clarkb, zaro, mordred?	17:07
mordred	jeblair: I am on the phone for the next 2 hours.	17:08
clarkb	I can stab at it	17:08
jeblair	mordred: i'm guessing that's a no, but i'm not sure ;)	17:08
jeblair	clarkb: ok, thanks	17:08
* mordred trying to bilk hp out of more headcount for us - so it's at least useful...		17:08
jeblair	russellb, pabelanger: around?	17:08
fungi	eek, more scrollback	17:09
jog0	mtreinish: ping	17:09
markmcclain	jeblair: I thought so, but I might not have made it clear	17:09
*** odyssey4me has joined #openstack-infra		17:09
jeblair	mordred: can you release the quantumclient branch to pypi?	17:10
mordred	jeblair: sure	17:10
pabelanger	jeblair, indeed	17:10
markmcclain	that was the review that is going to require a manual merge first	17:10
markmcclain	because that branch won't clear the gate	17:10
jeblair	mordred: oh, so, er, can you force merge the review markmcclain is about to link for you, and then manually release it? :)	17:10
markmcclain	mordred: https://review.openstack.org/#/c/48364/	17:10
clarkb	sdague: if you get a free moment it would be great if you could stab at making the wsgi app regex more flexible to handle keystone and in the case of swift probably just pass it all through	17:11
mordred	jeblair: yes	17:11
clarkb	since swift doesn't do log levels...	17:11
notmyname	????	17:11
fungi	i think a phased approach with rackspace nodes dumped into the general pool for starters makes sense, then take time to be able to separate pipelines to different providers in the ways which will make jenkins happy longer-term. i'm not super-keen on doubling the devstack job definitions, but maybe that's just unfounded ocd on my part	17:11
clarkb	notmyname: http://logs.openstack.org/83/42283/37/check/gate-tempest-devstack-vm-full/4465ed4/logs/screen-s-account.txt.gz?level=DEBUG we are doing level based filtering of logs	17:11
*** DennyZhang has quit IRC		17:11
clarkb	notmyname: but since swift doesn't have level based logs the filtering derps and removes everything	17:11
mordred	markmcclain: do we want to tag that as a particular version?	17:12
notmyname	clarkb: all swift processes support syslog facilities and log level filters: https://github.com/openstack/swift/blob/master/etc/proxy-server.conf-sample#L24	17:12
*** yolanda has quit IRC		17:12
clarkb	notmyname: but that only works with syslog?	17:13
mordred	markmcclain: like, what version should be released to pypi?	17:13
clarkb	notmyname: syslog doesn't like us when we run devstack it falls over pretty spectacularly	17:13
jeblair	fungi: bummer, loss of consensus. i actually think that (b) is the safest from the pov that it's least likely to break things if rackspace can't keep up (or we decide to reduce its node supply).	17:13
jog0	clarkb: so for the elastic-search gerrit user .. now that ElasticSearch is blazingly fast I want to get the bot up, on my own RAX server	17:14
markmcclain	mordred: 2.2.4	17:14
jeblair	fungi: it sucks that it adds so many jobs, but maybe templating will help	17:14
* fungi is still reading the last 20 minutes of scrollback, which will take about 20 minutes, at which point there will be another 20 minutes of scrollback		17:14
*** odyssey4me has quit IRC		17:14
mordred	markmcclain: can't do that - neutronclient already has that tag :)	17:14
mordred	markmcclain: how about 2.2.4.1 ?	17:14
jeblair	clarkb, fungi, mordred: can we go ahead and merge these changes before clarkb starts? https://review.openstack.org/#/c/48547/ https://review.openstack.org/#/c/48635/	17:14
jeblair	pabelanger: i'm available to dial in	17:15
jeblair	anteaya, zaro, fungi, clarkb: are you available for conferencing?	17:15
jeblair	anyone else?	17:15
anteaya	jeblair: oh yeah	17:15
markmcclain	that will work	17:15
mordred	jeblair: done	17:16
clarkb	jeblair: yes, will be slightly distracted by job config stuff though	17:16
* zaro is available		17:17
jeblair	pabelanger: let us know when you have pbx.o.o configured the way you want	17:17
*** derekh has quit IRC		17:17
*** MarkAtwood has joined #openstack-infra		17:17
mordred	markmcclain: released	17:18
mordred	jeblair: the jobs should fail, but I think I should push the tag back to gerrit anyway, what do you think?	17:18
fungi	jeblair: regarding loss of consensus, i'm still catching up on what the consensus was	17:18
jeblair	mordred: yes	17:18
mordred	done	17:18
markmcclain	mordred: thanks	17:19
jeblair	fungi: (b) the one you didn't like because it adds lots of jobs	17:19
*** kgriffs has left #openstack-infra		17:19
jeblair	fungi: i mean, none of us like it because it adds lots of jobs	17:19
clarkb	sdague: if we set the default starting sev to ERROR that should handle the swift case but will make the screen lines always show up...	17:19
*** bnemec has joined #openstack-infra		17:20
fungi	jeblair: yeah i can switch rooms and jump into the pbx in a bit. just trying to finish reading the discussion in here first	17:20
*** odyssey4me has joined #openstack-infra		17:20
*** reed_ is now known as reed		17:22
*** senk has quit IRC		17:22
openstackgerrit	A change was merged to openstack-infra/config: Make gate-tempest-devstack-vm-large-ops voting https://review.openstack.org/48547	17:22
fungi	jeblair: clarkb: sdague: mordred: if adding duplicate jobs is the safest and most pragmatic solution, then i agree it makes sense to take that route (no need to add features to support that)	17:22
*** johnthetubaguy1 has quit IRC		17:22
*** reed has quit IRC		17:22
*** reed has joined #openstack-infra		17:22
pabelanger	jeblair, sure, give me a minute, trying to fix some errors on the pbx	17:24
*** ryanpetrello has quit IRC		17:25
openstackgerrit	A change was merged to openstack-infra/config: add gate-tempest-devstack-vm-neutron-pg job https://review.openstack.org/48635	17:26
harlowja	qq for ya'll	17:31
harlowja	if anybody has some free secs	17:31
*** alcabrera_afk is now known as alcabrera		17:31
pabelanger	jeblair, are multiple asterisk boxes still up?	17:33
*** wchrisj_ has quit IRC		17:33
pabelanger	okay pbx.o.o is fixes	17:33
pabelanger	fixed*	17:33
jeblair	pabelanger: maybe? i can check, but i think voipms is configured for pbx.o.o	17:34
jeblair	pabelanger: yeah, the others are still around if we need them.	17:34
anteaya	I'm in	17:35
*** MarkAtwood2 has quit IRC		17:35
*** MarkAtwood2 has joined #openstack-infra		17:35
fungi	yeah, they keep e-mailing me about pending updates/needed reboots but since they don't have a domain configured they don't match my cronspam filters and land in my inbox instead	17:36
*** hemnafk is now known as hemna_		17:36
fungi	so pretty sure they're still up	17:36
pabelanger	jeblair, okay, seems to be working now	17:36
* zaro is in conference		17:36
harlowja	so just a question that the taskflow team is having, we'd like to run our tests against a real mysql instance (or maybe even postgres) instead of just sqlite (especially the migration part) and was wondering if there is any standard process to go through to make that happen?	17:37
anteaya	my skype crashed, back now	17:37
anteaya	and I am out again, my skype keeps crashing	17:38
harlowja	:(	17:38
anteaya	new laptop just installed it	17:38
anteaya	sigh	17:38
jeblair	i only hear silence now	17:41
pabelanger	I am tweaking the time while you are talking to see if there is any noticeable impact	17:41
pabelanger	so, there might be some chop	17:42
pabelanger	I increased the threshold	17:42
jeblair	it came back	17:42
pabelanger	yup	17:42
pabelanger	lowering it again	17:42
pabelanger	okay	17:42
pabelanger	back to 1000ms	17:42
pabelanger	(the sweet spot, so far)	17:42
jog0	clarkb: until we sort out the gerrit user for elastic-recheck just using my own user	17:44
pabelanger	wow	17:45
*** reed has quit IRC		17:48
*** SergeyLukjanov has joined #openstack-infra		17:50
*** dizquierdo has left #openstack-infra		17:50
anteaya	my skype died	17:51
pabelanger	anteaya, okay	17:52
*** sarob has joined #openstack-infra		17:52
anteaya	I'm pm'ing fungi for the rest	17:52
*** melwitt has joined #openstack-infra		17:52
*** nati_ueno has joined #openstack-infra		17:52
*** odyssey4me has quit IRC		17:54
*** boris-42 has joined #openstack-infra		17:58
*** Ajaeger has joined #openstack-infra		18:01
pabelanger	back to 1000ms for silence	18:02
pleia2	if it would be helpful to have me join the call too let me know, I got distracted by my baremetal testing strace finally working (hooray)	18:02
pleia2	well, the failure appearing so I could strace it anyway :)	18:02
*** odyssey4me has joined #openstack-infra		18:03
jog0	it's scary watching the elastic-recheck bot in openstack-qa	18:05
* fungi is afraid to look		18:06
jog0	sdague: ping	18:07
jog0	for bug 1230407	18:07
uvirtbot	Launchpad bug 1230407 in neutron "VMs can't progress through state changes because Neutron is deadlocking on it's database queries, and thus leaving networks in inconsistent states" [Critical,Confirmed] https://launchpad.net/bugs/1230407	18:07
jog0	what would be a better query to use for that one	18:07
*** dcramer_ has joined #openstack-infra		18:07
jog0	something like @message:"Lock wait timeout exceeded" AND @fields.filename:"logs/screen-q-svc.txt" AND @fields.build_status:"FAILURE" ?	18:09
*** DinaBelova has joined #openstack-infra		18:09
devananda	wsme seems to be broken?	18:09
*** julim has quit IRC		18:10
devananda	clarkb: what's the interface to do searches on recent jenkins failures?	18:12
jog0	devananda: logstash.openstack.org	18:13
fungi	devananda: comes to us from the distant past of monday	18:14
fungi	with news of wsme issues	18:14
*** dmakogon__ has quit IRC		18:14
*** dmakogon_ has joined #openstack-infra		18:15
jeblair	clarkb, fungi, mordred, jhesketh: https://review.openstack.org/48684	18:16
openstackgerrit	Clark Boylan proposed a change to openstack-infra/config: Make devstack jobs templates and create check jobs https://review.openstack.org/48714	18:16
jeblair	clarkb, fungi, mordred, jhesketh: if we merge that soonish, we can probably manage a zuul restart over the weekend to pick it up	18:16
*** alexpilotti has quit IRC		18:16
devananda	fungi: wait. wsme's been broken since monday?	18:17
*** odyssey4me has quit IRC		18:20
sdague	jog0: i'd actually narrow the message to - "Lock wait timeout exceeded; try restarting transaction"	18:20
anteaya	pleia2: hooray	18:20
*** dcramer_ has quit IRC		18:21
devananda	fungi: logstash suggests that it broke ~4hr ago with the new upload of pecan	18:21
devananda	http://bit.ly/1606Cmj	18:21
* sdague just got back from lunch + bike ride, scrolling back		18:22
openstackgerrit	Clark Boylan proposed a change to openstack-infra/config: Make devstack jobs templates and create check jobs https://review.openstack.org/48714	18:22
jog0	sdague: awesome thanks	18:22
jog0	@message:"Lock wait timeout exceeded; try restarting transaction" AND @fields.filename:"logs/screen-q-svc.txt" AND @fields.build_status:"FAILURE" looks good	18:22
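[editor's note] A minimal sketch of how a query like the one above could be assembled from field/value pairs, assuming the Lucene-style field:"value" syntax shown in this log. The build_query helper is hypothetical and is not part of elastic-recheck or logstash.

```python
def build_query(terms):
    """Join (field, value) pairs into a quoted, AND-ed query string."""
    parts = []
    for field, value in terms:
        # Escape embedded double quotes so the phrase stays one term.
        escaped = value.replace('"', '\\"')
        parts.append('{}:"{}"'.format(field, escaped))
    return " AND ".join(parts)


query = build_query([
    ("@message", "Lock wait timeout exceeded; try restarting transaction"),
    ("@fields.filename", "logs/screen-q-svc.txt"),
    ("@fields.build_status", "FAILURE"),
])
```

Keeping the terms as data makes it easy to narrow or widen one field, as sdague suggests for the message text, without retyping the rest of the query.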
clarkb	sdague: tl;dr is the wsgi log filter doesn't handle swift and keystone	18:22
clarkb	sdague: because they are not oslo format	18:23
clarkb	sdague: for swift I think we just want to let them pass through and for keystone we may need a slightly more forgiving regex	18:24
*** dmakogon_ has quit IRC		18:24
clarkb	but let me know what you think	18:24
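[editor's note] A sketch of the filtering behaviour clarkb describes, under stated assumptions: look for a level word anywhere in the line instead of at a fixed position (the "more forgiving regex"), and pass through any line with no recognizable level (the swift case). The level list, the regex, and the sample lines are all made up for illustration; this is not the code of the actual wsgi app.

```python
import re

# Assumed severity order, lowest to highest.
SEVERITY = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3}

# Forgiving pattern: first level word anywhere in the line.
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR)\b")


def filter_lines(lines, min_level="INFO"):
    """Keep lines at or above min_level; keep lines with no level at all."""
    threshold = SEVERITY[min_level]
    kept = []
    for line in lines:
        match = LEVEL_RE.search(line)
        if match is None:
            kept.append(line)  # no level found: pass through unfiltered
        elif SEVERITY[match.group(1)] >= threshold:
            kept.append(line)
    return kept


# Invented sample lines in three different layouts.
SAMPLE = [
    "2013-09-27 16:57:01.123 1234 DEBUG nova.api [-] request",
    "(keystone.common.wsgi): 2013-09-27 16:57:01,123 INFO wsgi starting",
    "proxy-server 10.0.0.1 GET /v1/AUTH_test 200",
]
kept = filter_lines(SAMPLE, "INFO")
```

A list of sample lines like SAMPLE is also the natural seed for the unit tests discussed next: each new log format becomes one more fixture line with a known expected result.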
*** dmakogon_ has joined #openstack-infra		18:24
sdague	clarkb: sure. So what I should actually do is get some unit testing for this in tree so we can dump in a bunch of sample logs and make sure it works	18:25
clarkb	++	18:25
sdague	clarkb: is there a pattern already for unit testing things in the config tree?	18:25
clarkb	nope	18:25
clarkb	sdague: but we do have a tox.ini	18:26
clarkb	sdague: so you should be able to make use of that	18:26
clarkb	or	18:26
*** nati_ueno has quit IRC		18:26
clarkb	we could split this into a proper project	18:26
sdague	yeh, I'm mixed on that, it seems like more trees end up just being more complexity	18:26
clarkb	ya there are tradeoffs	18:27
*** nati_ueno has joined #openstack-infra		18:27
*** odyssey4me has joined #openstack-infra		18:27
*** zaro is now known as list		18:27
*** list is now known as zaro		18:27
dims	jog0, i added some notes, basically 4 SQL statements hit this	18:28
sdague	dims: they are all basically the same fail though right?	18:28
sdague	I think that neutron fail moves around	18:28
dims	sdague, all 4 SQL's end up with "Lock wait timeout exceeded; try restarting transaction" - yes.	18:28
*** ryanpetrello has joined #openstack-infra		18:29
jog0	fun	18:29
sdague	jog0: well it's a database deadlock	18:29
sdague	so that's kind of expected	18:29
sdague	as whoever gets there last loses, and that's going to change	18:29
*** rockyg has joined #openstack-infra		18:30
dims	sdague, y	18:30
sdague	jeblair: you around? I want to get your opinion of trying to bring unit testing into config vs. breaking out to a separate project	18:30
sdague	clarkb, jeblair: on the rax nodes, I'd say general pool. they should be running in 45min max I think (they were about 40% slower for devstack runs)	18:31
sdague	at least short term	18:31
sdague	check queue down to 20, nice. Much better than 190	18:32
jeblair	sdague: well, we sort of settled on plan (b) which was still to just use rax nodes exclusively for check, but to also allow hp nodes to contribute to check. clarkb just finished the change here: https://review.openstack.org/#/c/48714/	18:33
sdague	ok, that's cool too	18:33
jeblair	sdague: since the hard part is done, we might as well keep going with it, for now at least. can always change later. :)	18:33
sdague	yep	18:33
Ajaeger	clarkb: do you have a few minutes to discuss https://review.openstack.org/#/c/47691/ ? I'd like to know whether and how to rename the manual jenkins jobs	18:34
jeblair	sdague: i'm fine either way on testing, but i feel like by the time something needs a unit test, that's one of the signals that it's probably time for it to be its own project. we have high hopes for this thing anyway. i think splitting is a good idea, but am not opposed to more 'incubation' if you're not quite ready.	18:35
sdague	sure, though I do think all the python in the config tree should have tests anyway :) solving a framework to make that easier would be good at some point.	18:36
sdague	but I expect we'll use some of the log parsing for other things here, so let me split this out	18:37
*** reed has joined #openstack-infra		18:37
dansmith	wow, that monster check queue dumped pretty quick :)	18:37
jeblair	sdague: i think the thing is that mostly we don't think there should be very much python in the config tree. a quick look suggests we're pretty close to that.	18:38
sdague	dansmith: you can thank mtreinish and tempest testr for that. We can actually chew through it pretty quick when not starved :)	18:39
dansmith	sdague: I know why it's faster, I'm just saying I would have expected it to take longer than a couple hours given how huge it was	18:40
jeblair	dansmith: we threw 300 machines at it.	18:40
sdague	nice :)	18:41
dansmith	jeblair: ah	18:41
dansmith	I try not to throw my machines around, personally, but.. thanks anyway :)	18:41
sdague	oh, hey, yeh I didn't see the bottom graph	18:41
sdague	that's pretty awesome	18:41
jeblair	dansmith: that's how we roll here	18:41
clarkb	Ajaeger: yes, actually something similar to what I have done to sort out devstack-gate stuff may help	18:41
dansmith	jeblair: props, yo.	18:41
clarkb	Ajaeger: but basically have a single project entry called openstack-manuals that covers all of the various subsets	18:41
Ajaeger	clarkb, let me check devstack-gate in projects.yaml	18:42
clarkb	Ajaeger: https://review.openstack.org/#/c/48714/2/modules/openstack_project/files/jenkins_job_builder/config/projects.yaml the section starting on line 917 then splits out the subsets	18:42
clarkb	Ajaeger: ^ is where you should look	18:42
Ajaeger	clarkb: thanks for the reference	18:43
*** _david_ has joined #openstack-infra		18:44
openstackgerrit	Clark Boylan proposed a change to openstack-infra/config: Make devstack jobs templates and create check jobs https://review.openstack.org/48714	18:45
clarkb	I am hopeful that ^ will actually compile correctly	18:46
jeblair	sdague: https://jenkins01.openstack.org/job/gate-tempest-devstack-vm-neutron-pg/	18:47
_david_	clarkb, mordred, jeblair i am working on WIP Gerrit-Plugin against Gerrit master (upcoming 2.8 release) and hope to have something working in few days	18:47
clarkb	jeblair: fungi mordred ^ that passes. I think it is ready if you are, but I am going to lunch shortly	18:47
jeblair	sdague: https://jenkins02.openstack.org/job/gate-tempest-devstack-vm-neutron-pg/	18:47
clarkb	_david_: oh	18:47
clarkb	_david_: you should've told us earlier :)	18:47
clarkb	zaro: ^	18:48
_david_	i did	18:48
jeblair	clarkb: i believe _david_ is up to date on our efforts	18:48
*** MarkAtwood has quit IRC		18:48
clarkb	ah cool	18:48
clarkb	it is I who is behind	18:48
jeblair	clarkb: i think _david_ has a different risk profile with respect to working on gerrit and contributing upstream. :)	18:48
_david_	with recent changes it is actually trivial thing to do	18:48
_david_	jeblair, ;-)	18:49
jeblair	_david_: neat. do you think it will be an in-tree plugin, or a separate project?	18:49
Ajaeger	clarkb: so, something like this: http://paste.openstack.org/show/47615/ ?	18:50
_david_	jeblair, what exactly do you mean by in-tree plugin?	18:50
jeblair	_david_: will it be in the gerrit repository, or a different one?	18:51
zaro	_david_: hi, did you comment on https://gerrit-review.googlesource.com/#/c/48254	18:51
jeblair	_david_: (sorry, i haven't ever used a gerrit with plugins, i don't really know how they are maintained)	18:51
_david_	jeblair, that's a good question	18:51
_david_	zaro, yes, it was /me	18:51
*** mrodden has quit IRC		18:51
*** dkliban has quit IRC		18:51
fungi	devananda: ahh, new breakage then. that i think was the latest version trying to get us out of dependency hell in grizzly	18:52
_david_	jeblair, the only problem i see (may be we have more) that Change.State.WORKINPROGRESS and DashboardAccount should be extended in core and can be influenced by plugin,	18:52
_david_	well at least not yet.	18:52
fungi	devananda: dhellmann would probably be interested in your logstash link there	18:53
_david_	So here is my prototype for WorkInProgressAction (against Master):	18:53
_david_	http://pastebin.com/rZ9b8YCZ	18:53
zaro	_david_: ohh ok. looks like difference of opinion going on. hope it gets resolved soon.	18:53
_david_	jeblair, concerning place: we have two options	18:53
_david_	on gerrit-review or on openstack, right?	18:53
dhellmann	fungi, devananda : ryan is working on the problem	18:54
_david_	maybe we would still need a very small core patch to make it work,	18:54
dhellmann	fungi, devananda : but any debugging details you have may help	18:54
fungi	awesome, thanks dhellmann and ryan!	18:54
jog0	sdague: for bug https://bugs.launchpad.net/bugs/1230407	18:54
uvirtbot	Launchpad bug 1230407 in neutron "VMs can't progress through state changes because Neutron is deadlocking on it's database queries, and thus leaving networks in inconsistent states" [Critical,Confirmed]	18:54
_david_	but i hope to convince guys to make it work against upstream gerrit	18:54
jog0	http://logs.openstack.org/70/44670/3/gate/gate-tempest-devstack-vm-neutron/02d68e3/logs/screen-q-svc.txt.gz?level=TRACE	18:54
ryanpetrello	yep, seems to be some sort of issue introduced in the pecan/wsme plugin w/ today's pecan release	18:55
ryanpetrello	debugging...	18:55
*** julim has joined #openstack-infra		18:55
dhellmann	if we're seeing gate blockages, I can propose a change to pin pecan for now	18:55
zaro	_david_: what is the difference between your WIP plugin and my patch to upstream?	18:56
ryanpetrello	+1	18:56
*** sodabrew has joined #openstack-infra		18:56
_david_	zaro, i don't understand that question	18:56
*** sarob_ has joined #openstack-infra		18:56
zaro	_david_: ohh, i implemented that patch so we can create a custom WIP vote and you are creating a WIP plugin. so i'm just asking what would be the difference?	18:57
_david_	zaro, wip plugin is 1 to 1 migration of Shrews's change https://gerrit-review.googlesource.com/36091 against latest master with maybe 10 lines of upstream patch (for now)	18:58
*** dkliban has joined #openstack-infra		18:58
*** dcramer_ has joined #openstack-infra		18:58
zaro	_david_: ahh i see. thx for the clarification.	18:59
sdague	jog0: what's the question?	18:59
sdague	sorry so many pings	18:59
openstackgerrit	Doug Hellmann proposed a change to openstack/requirements: Pin pecan to avoid the latest release https://review.openstack.org/48722	18:59
_david_	zaro, in the handling: you do not vote with a label, you just mark it as in Shrews's change directly on change screen	18:59
dhellmann	fungi, devananda : I opened https://bugs.launchpad.net/pecan/+bug/1232199 for tracking rechecks and the real fix	18:59
uvirtbot	Launchpad bug 1232199 in pecan "release 0.4 breaks some operations with WSME" [Undecided,New]	18:59
Shrews	ugh, don't remind me of that horrific coding experience	19:00
jog0	sdague: I'll move this to the qa room where it's a little less noisy	19:00
_david_	Shrews, why? ,-)	19:00
*** sarob has quit IRC		19:00
clarkb	Ajaeger: yes. would need to check the output to be sure though	19:01
*** sarob_ has quit IRC		19:01
devananda	dhellmann: thanks! judging by logstash, i suspect ceilometer and ironic are blocked on this, but nothing else is showing up yet	19:02
fungi	Shrews: pick a time and i'll join you while you drown your memories of that project in a few pints. olaph too	19:02
dhellmann	devananda: ok, good. see that changeset a few lines back for a requirements pin to work around it for now	19:02
Shrews	fungi: yes, we should make that happen	19:02
fungi	Shrews: olaph: the lynnwood grill next door to me just started a brewery recently, and now have several kinds ready for consumption on premises	19:03
_david_	Shrews, i wonder about that comment in your code: WORKINPROGRESS ... It implies that there is more work to be done, but the change will not show up in any review lists until a new patch set is pushed.	19:03
*** vipul is now known as vipul-away		19:03
Ajaeger	clarkb: sure, this was untested, just wanted to know whether I'm on the right track.	19:04
Shrews	_david_: Where? Which comment?	19:05
_david_	does git push convert it? Is that true? Or does a change owner have to explicitly convert it to Status.NEW?	19:05
_david_	https://gerrit-review.googlesource.com/#/c/36091/1/gerrit-reviewdb/src/main/java/com/google/gerrit/reviewdb/client/Change.java	19:05
_david_	line 285	19:05
*** yolanda has joined #openstack-infra		19:06
Shrews	_david_: The intent is that, along with clarkb's patch, any WIP review would not show up in a reviewer's list. Once a new patchset is pushed to a WIP review, it becomes "Ready for review" again.	19:07
Shrews	does that answer your question? not sure exactly what you're looking for	19:07
jeblair	clarkb: i spot checked the output of your change locally, lgtm	19:08
clarkb	it's basically a public draft	19:08
_david_	Shrews, Can you point me where that conversion takes place?	19:08
_david_	I thought you have two buttons: WIP and Ready for review?	19:08
Shrews	_david_: i don't think it was recorded. it was mainly discussed in this channel	19:08
jeblair	Shrews: 'conversion' not 'conversation'	19:09
jeblair	Shrews: (i did the same thing, finally read it right the 3rd time)	19:09
Shrews	oh, duh	19:09
fungi	conservation	19:09
_david_	1/ git push => Status.NEW	19:09
jeblair	the conversion conversation was not conserved	19:09
_david_	2/ i click on WIP button => Status.WIP	19:09
_david_	my question: how am i supposed to get back to Status.NEW again ?	19:10
_david_	All use cases please ;-)	19:10
jeblair	fungi, mordred: https://review.openstack.org/48714	19:10
fungi	jeblair: yep, almost through reading that one	19:10
Shrews	_david_: case 1) new patchset uploaded, case 2) press R4R button. fin	19:10
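[editor's note] The status transitions described above, sketched as a small table-driven state machine. This is an illustration in Python of the behaviour as discussed in this channel, not Gerrit code (Gerrit is written in Java); the event names and the next_status helper are hypothetical.

```python
from enum import Enum


class Status(Enum):
    NEW = "NEW"
    WIP = "WORKINPROGRESS"


# (current status, event) -> next status. Event names are invented.
TRANSITIONS = {
    (Status.NEW, "mark_wip"): Status.WIP,          # click the WIP button
    (Status.WIP, "push_patchset"): Status.NEW,     # case 1: new patchset
    (Status.WIP, "ready_for_review"): Status.NEW,  # case 2: press R4R
}


def next_status(status, event):
    """Return the status after an event; unknown events change nothing."""
    return TRANSITIONS.get((status, event), status)
```

For example, next_status(Status.WIP, "push_patchset") returns Status.NEW, which is the automatic conversion _david_ is trying to locate in the original change.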
Shrews	_david_: I don't remember the code well enough to point you to specific areas	19:11
notmyname	mordred: FYI https://review.openstack.org/#/c/48724/	19:11
_david_	Shrews, i didn't find where 1case 1) in code. can you pint me?	19:11
_david_	point	19:11
mordred	18:00:36 hub_cap \| one of my beefs is that i scream, fucking SCREAM at people internally	19:11
jeblair	hub_cap: i hear your screams from here	19:12
mordred	notmyname: responded	19:13
mordred	notmyname: swear swear swear grumble grumble swear swear	19:13
mordred	notmyname: I kept my comment short, to keep the swearing out, fwiw	19:13
Shrews	_david_: https://gerrit-review.googlesource.com/#/c/36091/1/gerrit-httpd/src/main/java/com/google/gerrit/httpd/rpc/changedetail/ChangeDetailFactory.java	19:13
*** basha has joined #openstack-infra		19:13
Shrews	_david_: I think. Like I said, I really can't remember the code too well	19:14
notmyname	mordred: and I'm the one who has to play the diplomat standing between the 2 of you ;-)	19:14
mordred	notmyname: lovely	19:14
mordred	well, his patch is completely non-functional	19:14
mordred	like, it's not even close to being functional. it looks like a patch made in anger with absolutely no thought	19:14
jeblair	notmyname: do you know if michael barton is planning on submitting a similar patch to the other 56 openstack projects?	19:15
notmyname	mordred: try not to review in that way ;-)	19:15
_david_	Shrews, i don't think so, there you set whether the button on Views should be enabled or not	19:15
mordred	notmyname: I will not	19:15
*** mrodden has joined #openstack-infra		19:15
mordred	notmyname: I am not, in fact, going to review it further	19:15
jeblair	(because if not, it may not be as well thought out as the patch that added in pbr)	19:15
notmyname	mordred: jeblair: and, like I said, FYI.	19:15
Shrews	_david_: well, i don't remember then	19:15
mordred	notmyname: I believe "all of the openstack projects use it and it plays a key role in release management" should be clear enough	19:15
fungi	clarkb: no need for a check-tempest-devstack-vm-heat-slow since gate-blah is only in the experimental pipeline?	19:15
*** jcoufal has joined #openstack-infra		19:16
notmyname	mordred: yes, but "the way things are" is not a compelling argument for most people. /me being a diplomat	19:16
clarkb	fungi right	19:16
_david_	Shrews, and you are absolutely sure that it is implemented?	19:16
mordred	notmyname: I understand. but sometimes here, with as many projects as we have, I cannot make 56 different long-form arguments to everyone who would just happen to have chosen to solve the problem differently	19:17
Shrews	_david_: If it isn't then I don't know how review.o.o has been working that way for the last umpteen months	19:17
notmyname	mordred: agreed	19:17
*** DinaBelova has quit IRC		19:17
jeblair	notmyname: i, and i'm sure many others agree with you. something about the fact that he chose to propose that patch without even trying to understand why things are the way they are rankles a bit.	19:17
mordred	notmyname: thank you, btw, for diplomating here	19:17
* lifeless is curious about which patch is being discussed; couldn't find the start of the conversation		19:17
mordred	lifeless: https://review.openstack.org/#/c/48724/1	19:18
*** MarkAtwood2 has quit IRC		19:18
notmyname	jeblair: yes, but from the opposite perspective, pbr is making his day-to-day life more difficult without offering any perceived benefit (ie he now has to repackage the library himself instead of using something on pypi, and it includes more dependencies that may also need to be repackaged too)	19:19
notmyname	jeblair: note I'm not arguing against pbr here	19:20
jeblair	notmyname: yep. pbr makes some things more and some things less difficult, no argument there. attempting to delete it is a strange way of learning about what those are and what solutions there may be to his problems.	19:22
notmyname	jeblair: we don't need to rehash long-form arguments about pbr here or now. I'll see what can be done	19:23
lifeless	mordred: notmyname: huh, what I find interesting is the lack of attempt to understand - did he file a bug on pbr and the situation it fails in?	19:24
mordred	notmyname: http://paste.openstack.org/show/47621/	19:25
mordred	that is what I would like to respond	19:25
mordred	notmyname: he does not have to repackage the library himself	19:25
openstackgerrit	A change was merged to openstack-infra/config: Make devstack jobs templates and create check jobs https://review.openstack.org/48714	19:25
notmyname	mordred: thanks. gotta run to a lunch meeting...	19:26
mordred	notmyname: if he would read the documentation put together for packagers, he would see that he has to set an env var	19:26
mordred	notmyname: thank you!	19:26
*** basha has quit IRC		19:27
hub_cap	lol mordred	19:28
hub_cap	jeblair: u might be able to hear those screams	19:28
jeblair	fungi, mordred, clarkb: Make devstack jobs templates and create check jobs just merged; exciting things should be happening soon	19:28
clarkb	fingers are crossed	19:28
* mordred waits		19:28
*** odyssey4me has quit IRC		19:28
jeblair	hub_cap: i have been hearing a lot of sirens recently; do you have something to do with that?	19:28
openstackgerrit	Jeremy Stanley proposed a change to openstack-infra/config: Determine the package name when uploading to PyPI https://review.openstack.org/46805	19:29
hub_cap	nope. it could be the band of gypsies that have set up shop on dwight... a big bus of em, and some sleeping in cars in the area	19:29
openstackgerrit	A change was merged to openstack-infra/config: Determine the package name when uploading to PyPI https://review.openstack.org/46805	19:31
fungi	i'm not sure exciting is what i want out of my evening... here's hoping it's exciting in a good way and not in the usual way	19:31
jeblair	i'm going to run puppet on jenkins masters manually to make that happen a bit faster and smoother	19:33
jeblair	(to minimize the time that the check jobs don't exist before zuul reloads and starts using them)	19:34
*** odyssey4me has joined #openstack-infra		19:36
jswarren	bnemec, if you're not busy fighting neutron or any other component, https://review.openstack.org/#/c/46553 seems to have settled down a bit in case you're up for another look. I seem to have a talent for finding problems to work on that are not straightforward to explain and whose solutions are not easy to justify concisely. Just lucky, I guess.	19:40
jeblair	zuul change is going in now	19:41
jswarren	oops, wrong channel.	19:41
*** wchrisj_ has joined #openstack-infra		19:46
ryanpetrello	FYI, I have a review open for pecan which will resolve the WSME issue	19:46
*** CaptTofu has joined #openstack-infra		19:47
fungi	mordred: is the current thinking that pbr should only be a setup_requires a la https://git.openstack.org/cgit/openstack-infra/git-review/tree/setup.py#n20	19:47
*** jswarren has quit IRC		19:47
fungi	mordred: because basically all of the clients still have it listed in their requirements.txt as if it were a runtime requirement	19:48
fungi	which i can see potentially confusing downstream/distro package maintainers	19:48
jgriffith	jeblair: clarkb interested in changing the settings to nova.conf in the gate..... not sure what repo/where the best place to do that is?	19:51
jgriffith	jeblair: clarkb I'd like to bump CONF.num_iscsi_scan_tries	19:51
mordred	fungi: it depends on whether or not they use it at runtime	19:52
fungi	ahh	19:52
mordred	fungi: for version processing	19:52
fungi	mordred: got it... the bits which are in the process of being moved to oslo	19:52
fungi	jgriffith: for devstack-gate jobs? if it makes sense to be adjusted as a default behavior for devstack, then in devstack. if it's really very specific to how we're testing things and not generally helpful (or potentially harmful) to other devstack use cases, then overriding in devstack-gate would be appropriate	19:53
fungi	but we try to keep devstack-gate from changing devstack defaults if at all possible, so that we don't "test with devstack" using configurations dissimilar to the way other people run devstack in general	19:55
*** rfolco has quit IRC		19:56
jgriffith	fungi: hmm... ok	19:56
jgriffith	fungi: there's an awful lot of "added" changes from devstack in the gate configs which is why I asked but cool by me	19:56
jeblair	jgriffith: we hate all of them	19:57
fungi	we've been moving those out as we can	19:57
jgriffith	jeblair: haha... Ok, now that makes more sense :)	19:57
*** MarkAtwood has joined #openstack-infra		19:58
*** SergeyLukjanov has quit IRC		19:58
*** ryanpetrello has quit IRC		19:59
*** vipul-away is now known as vipul		20:01
mordred	fungi: that's right	20:01
jeblair	zuul is now using the check jobs	20:02
*** _david_ has quit IRC		20:03
fungi	so it should be safe to re-diversify the pipeline precedence settings again?	20:03
*** ryanpetrello has joined #openstack-infra		20:03
jeblair	fungi: yes, if we're okay with the possibility of starving check of the unit test runners. so all told, i'm leaning toward leaving it for now.	20:04
fungi	k	20:04
openstackgerrit	Dirk Mueller proposed a change to openstack/requirements: Raise Babel requirements to >= 1.1 https://review.openstack.org/48739	20:05
openstackgerrit	Andreas Jaeger proposed a change to openstack-infra/config: Use Jenkins templates for old manual jobs https://review.openstack.org/47691	20:06
Ajaeger	clarkb: your suggested change worked fine for me, I've updated the patch, see ^^	20:06
*** alcabrera has quit IRC		20:07
clarkb	Ajaeger: cool, I will take a look	20:07
*** sarob has joined #openstack-infra		20:07
Ajaeger	clarkb: thanks. If you have further ideas, just comment on it and I'll fix in the following days. For now I'm calling it a day.	20:08
* Ajaeger waves good-bye		20:08
clarkb	have a good weekend	20:08
*** alcabrera has joined #openstack-infra		20:08
Ajaeger	clarkb: thanks, same to all of you!	20:09
*** yolanda has quit IRC		20:09
*** Ajaeger has quit IRC		20:09
clarkb	jeblair: which zuul change did you want reviewed?	20:11
jeblair	https://review.openstack.org/#/c/48684/	20:11
*** basha has joined #openstack-infra		20:11
*** sarob has quit IRC		20:13
clarkb	jeblair: we should also get https://review.openstack.org/#/c/46869/ in	20:13
clarkb	jeblair: I didn't approve due to the -1, but figure you can decide if that is worth overriding	20:13
*** prad_ has quit IRC		20:14
clarkb	48684 lgtm	20:14
*** dprince has quit IRC		20:14
mordred	48684 has now been reviewed by all of us	20:14
Alex_Gaynor	jeblair: want to review https://review.openstack.org/#/c/47953/ while you're in that area? (thanks!)	20:14
jeblair	Alex_Gaynor: nice catch, thanks	20:15
*** CaptTofu has quit IRC		20:16
*** basha has quit IRC		20:16
*** prad has joined #openstack-infra		20:16
*** prad has quit IRC		20:16
*** rockyg has quit IRC		20:18
*** rockyg has joined #openstack-infra		20:18
jeblair	mordred: https://jenkins02.openstack.org/computer/precise38/builds	20:20
*** dmakogon_ has quit IRC		20:20
jeblair	that host was producing this error as fast as it could: https://jenkins02.openstack.org/job/gate-glance-pep8/619/console	20:20
jeblair	i disconnected/reconnected it	20:21
jeblair	i hate jenkins	20:21
jeblair	precise10 is doing it as well	20:21
*** CaptTofu has joined #openstack-infra		20:22
clarkb	jeblair: could that be related to the increase in slaves?	20:22
clarkb	jenkins does seem to have an upper bound on the number of slaves it can handle before it starts failing to keep them connected	20:23
jeblair	clarkb: beats me. do you understand that traceback?	20:23
clarkb	I don't	20:24
clarkb	it is trying to run a remote connection	20:24
jeblair	clarkb: want to spin up jenkins03?	20:25
fungi	i'm happy to start firing up a jenkins or two if you want to keep troubleshooting	20:26
clarkb	jeblair: we can try it	20:26
clarkb	I don't have much time to do that though, I need to finish prepping for next week	20:27
fungi	looks like we used a 30gb flavor?	20:27
jeblair	clarkb: to be clear, i wasn't suggesting it as much as asking if that was your suggestion. ;)	20:27
fungi	8x vcpu with load average hovering a little over 5, slightly more than 50% of ram in active use (not buffers/cache). looks like it's sized appropriately--would be struggling a little on the next flavor down	20:29
*** wchrisj_ has quit IRC		20:29
clarkb	jeblair: ah, yes. So in the grizzly cycle with one jenkins we ran into similar problems as we added more and more slaves	20:30
jeblair	clarkb: oh, did we see that error?	20:30
fungi	i was looking at jenkins02, which is interestingly a little more heavily-loaded than jenkins01 for some reason	20:30
clarkb	jeblair: I don't remember if it was this specific error, but it happened in a similar way. Immediately when starting jobs jenkins threw an exception indicating that something in the communication had failed	20:31
clarkb	fungi: oh maybe	20:31
jeblair	fungi: well, that was the jenkins to which those two slaves were attached	20:31
clarkb	fungi: maybe we are running into that issue with the threads hanging around again	20:31
fungi	mmm	20:31
*** flaperboon is now known as flaper87\|afk		20:31
fungi	could be, just catching it in the early stages so symptoms aren't nearly as pronounced yet	20:31
fungi	checking	20:32
jeblair	precise12 just threw the same error	20:33
jeblair	(also jenkins02)	20:33
fungi	1.5m threads	20:34
clarkb	hahahahahaha	20:34
fungi	Threads on jenkins02.openstack.org@166.78.48.99: Number = 1,628, Maximum = 2,152, Total started = 1,512,727	20:34
clarkb	sorry, I probably shouldn't find that so funny	20:34
clarkb	oh	20:34
fungi	oh, wait, wrong counter	20:34
openstackgerrit	David Peraza proposed a change to openstack/requirements: Adding sqlalchemy db2 dialect dependencies https://review.openstack.org/48745	20:34
fungi	so no, not anywhere near as high as that last time	20:34
clarkb	yeah the Number value is what you want and that doesn't look too terrible	20:34
fungi	pulling up 01 for a spot comparison	20:35
*** ryanpetrello has quit IRC		20:35
jeblair	i just checked the rest of the precise nodes on jenkins02, they're not failing jobs with that error (yet)	20:35
fungi	Threads on jenkins01.openstack.org@166.78.188.99: Number = 1,276, Maximum = 1,474, Total started = 862,538	20:36
*** ryanpetrello has joined #openstack-infra		20:36
fungi	so 02 is definitely higher, but only by about 30%	20:36
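As a quick check on the comparison above, the two live thread counts fungi pasted (the "Number" field: 1,628 on jenkins02 and 1,276 on jenkins01) can be compared directly. This is only a minimal sketch; the function name is made up for illustration.

```python
def percent_higher(a: int, b: int) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100.0


# Live thread counts (the "Number" field) quoted in the channel.
jenkins02_threads = 1628
jenkins01_threads = 1276

difference = percent_higher(jenkins02_threads, jenkins01_threads)
print(f"jenkins02 has {difference:.1f}% more live threads than jenkins01")
```

The result is a bit under 28%, which matches the "about 30%" estimate.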
jeblair	btw, the status page, starting with 48516 is interesting -- that's what happens when changes behind a single change fail in succession	20:36
jeblair	(and yeah, the top is broken; i'll fix that next week)	20:36
*** sarob has joined #openstack-infra		20:37
fungi	wow, that's a great indication that the tempest change at 48516 is causing the trouble not for itself but for changes which follow	20:38
fungi	oh, except those failures aren't in tempest tests (yet)	20:39
jeblair	fungi: yeah, that would be the interpretation except that the actual problem is that all of those changes happened to hit our bad jenkins nodes	20:39
fungi	so coincidence	20:39
fungi	should we put jenkins02 in shutdown and restart it to limp through before adding more masters (if we think we're bumping up against an inherent slave tracking limitation)?	20:40
fungi	and also scale down nodepool's per-master max setting?	20:40
jeblair	fungi: i reconnected those slaves, and they seem better at the moment; i think we can leave 02 as is for now; i don't really want to lose its capacity	20:41
fungi	k	20:41
*** jcoufal has quit IRC		20:42
fungi	so back to the earlier question... go ahead and start building more masters? or hold off until we're more certain it's warranted?	20:42
jeblair	i wasn't expecting problems until we had more slaves, but perhaps 200/master is the mark.	20:44
*** odyssey4me has quit IRC		20:44
pleia2	anteaya: gave owncloud a spin in win7 with IE9, all works as expected	20:44
*** flaper87\|afk is now known as flaper87		20:44
anteaya	woohoo	20:44
anteaya	thanks pleia2	20:44
pleia2	sure :)	20:45
* pleia2 logs out of windows before she gets dirty		20:45
jeblair	i think we peaked at around 186 slaves total	20:45
anteaya	no kidding	20:45
anteaya	that's a lot of slaves	20:45
jeblair	per master, including unit test workers	20:45
jeblair	https://bugs.launchpad.net/openstack-ci/+bug/1148900	20:46
uvirtbot	Launchpad bug 1148900 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [High,Fix released]	20:46
jeblair	blast from the past	20:46
fungi	nodepool is reinventing jclouds failure modes ;)	20:47
fungi	except not really, because these are static slaves which have been connected and running jobs just fine	20:48
jeblair	fungi: except these are long running nodes	20:48
* fungi nods		20:48
jeblair	fungi: i am leaning toward not spinning up another master	20:49
fungi	k	20:49
jeblair	i favor: if it happens again, restart that jenkins master, and if it happens again after that, add a new master.	20:49
zaro	pleia2: did you try map drive using webdav protocol?	20:49
pleia2	zaro: no, that's a good idea	20:49
fungi	i like that having multiple masters, we can restart them now without any downtime for other systems, merely temporary loss of capacity	20:50
jog0	are you running jobs on rax yet?	20:50
jog0	I think that may be breaking the large-ops test	20:50
jeblair	jog0: yes	20:50
jog0	:(	20:50
jeblair	jog0: link?	20:50
zaro	pleia2: i had problems with that last time i tried.	20:50
jog0	so large http://logs.openstack.org/27/48727/1/check/check-tempest-devstack-vm-large-ops/a3e7745/	20:51
jeblair	jog0: yeah, it looks like the only successful runs of check-tempest-devstack-vm-large-ops have been on hpcloud	20:51
jeblair	jog0: any ideas?	20:51
fungi	threadcount on jenkins01 and 02 is equalizing a bit now as well	20:51
openstackgerrit	A change was merged to openstack-infra/config: Handle when `id` is null. https://review.openstack.org/47953	20:51
jog0	jeblair: we would have to tweak the large-ops number for rax	20:52
openstackgerrit	A change was merged to openstack-infra/zuul: On null changes serialize the id as null https://review.openstack.org/46869	20:52
openstackgerrit	A change was merged to openstack-infra/zuul: Allow multiple invocations of the same job https://review.openstack.org/48684	20:52
jeblair	jog0: why?	20:52
jog0	because it was tuned to work for hpcloud	20:52
jeblair	tuned?	20:52
jog0	the test checks to see if it can boot x VMs using the fake virt driver, where a common error is something timing out	20:52
fungi	seems a bit inexact	20:53
jeblair	jog0: so why would that need to be different?	20:53
*** MarkAtwood has quit IRC		20:53
jog0	so rax cloud is running slower so timeouts happen with fewer VMs	20:53
jeblair	BuildErrorException: Server %(server_id)s failed to build and is in ERROR status	20:53
fungi	basically it's performance-testing the cloud provider, it seems	20:53
jeblair	jog0: a server being in error state is a result of that?	20:53
jog0	fungi: yeah and our code too	20:53
jog0	jeblair: yup	20:54
jog0	nova-net times out	20:54
openstackgerrit	David Peraza proposed a change to openstack/requirements: Adding sqlalchemy db2 dialect dependencies https://review.openstack.org/48745	20:54
jog0	when all cloud resources were equal, the test just performance tested our code. but with two very different clouds ... :(	20:55
jeblair	jog0: it was an illusion that all cloud resources were equal, i'm afraid	20:55
jeblair	even hpcloud has significant variance	20:55
jog0	some are more equal than others?	20:55
jeblair	especially when we approach release deadlines. :)	20:55
jog0	jeblair: yeah the number I picked before seemed pretty stable	20:55
jog0	across all HP cloud	20:55
jeblair	so these aren't really designed to be performance tests -- ideally these should work on developers' laptops too...	20:56
jog0	never got fails like this with HP cloud, at least extremely rarely (I never found one)	20:56
jog0	jeblair: it does, you just have to pick one param	20:56
jeblair	jog0: ideally the test would be structured to be more tolerant of the environment it's running in. but for our immediate problem, would you like to adjust the parameter or remove the test?	20:57
jog0	jeblair: let's just remove it; due to the nature of the gate right now I think it's safe to say this shouldn't get priority at this juncture	20:58
jog0	and revisit post havana	20:58
*** julim has quit IRC		20:58
jeblair	jog0: shame to lose a test. :(	20:59
jog0	yeah ...	20:59
jog0	I think the answer in the future will be to have two numbers, one for hpcloud and one for rax	20:59
jog0	that will take at least a day of testing and whatnot to get right	21:00
jeblair	jog0: and one for the next provider we get, and one for the one after that?	21:00
jog0	have to run recheck a dozen times or so to be sure I am right	21:00
jog0	we can maybe find a CPU perf metric to correlate with a number	21:00
jog0	once we get two datapoints	21:00
fungi	unfortunately, those will also probably have to be retuned even for existing providers as their performance characteristics change over time	21:01
jog0	so if CPU A is 30% slower than CPU B, number should be 30 percent lower too	21:01
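The proportional rule jog0 describes could be sketched roughly as below. Everything here is hypothetical: the function name, the baseline of 100 VMs, and the idea of expressing speed as a single ratio are illustrative assumptions, not the actual large-ops configuration.

```python
def scaled_vm_count(baseline_count: int, relative_speed: float) -> int:
    """Scale a VM count tuned on a baseline provider to another provider.

    relative_speed is the other provider's measured speed divided by the
    baseline provider's speed (0.7 means "30% slower"). Both inputs are
    illustrative assumptions, not values taken from the gate.
    """
    if relative_speed <= 0:
        raise ValueError("relative_speed must be positive")
    # Never scale below a single VM, or the test would boot nothing.
    return max(1, round(baseline_count * relative_speed))


# Hypothetical baseline of 100 VMs on a provider that is 30% slower.
print(scaled_vm_count(100, 0.7))
```

As fungi notes just above, any such ratio would need re-measuring over time, since provider performance drifts.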
*** freyes has quit IRC		21:02
jog0	fungi: perhaps, the test is there to detect order of magnitude slowdowns	21:02
*** matty_dubs is now known as matty_dubs\|gone		21:02
jog0	and I would hope a cloud wouldn't have that fluctuation	21:03
jeblair	jog0: i used to hope that	21:03
*** sodabrew_ has joined #openstack-infra		21:04
jog0	jeblair: let's talk about a smarter way to do this in Edinburgh	21:04
jog0	or HK	21:04
fungi	we've definitely been in situations where new vms ended up on compute nodes with very resource-hungry neighbors	21:04
jeblair	jog0: we have seen some of the metrics we care about change up to 3x over time; including both cloud providers.	21:04
* fungi needs to disappear and do a bit of cooking... bbl		21:05
jog0	jeblair: ouch	21:05
*** sodabrew has quit IRC		21:06
jog0	well if we collect those numbers today ... we can make something adjust to that	21:06
jeblair	jog0: so i think we can probably live with running the large-ops test only on hp for now, as long as we definitely plan to improve it later.	21:06
jog0	that would be awesome	21:06
jog0	that test came out of the issues with rootwrap	21:07
*** ArxCruz has quit IRC		21:07
*** tjones has joined #openstack-infra		21:07
jog0	jeblair: didn't realize that was an option to put it on one cloud only	21:08
*** julim has joined #openstack-infra		21:08
jeblair	jog0: it's not a good option -- it's working against how we're trying to manage resources. and if we have further problems, it'll be the first thing to go. but we can try it. :)	21:08
*** senk has joined #openstack-infra		21:09
*** jcoufal has joined #openstack-infra		21:10
openstackgerrit	James E. Blair proposed a change to openstack-infra/config: Run large-ops test only on hp nodes https://review.openstack.org/48748	21:10
*** rnirmal has quit IRC		21:11
jog0	fair enough	21:11
jog0	yeah we need to revisit this in the near future	21:11
jeblair	jog0: so while you're around... other than Zhi Kun ZK Liu being on vacation, do you know if work on those 2 bugs is progressing?	21:12
jog0	jeblair: a little; sdague and jgriffith and dims are doing stuff	21:13
jog0	jeblair: see -qa	21:13
jeblair	jog0: thx	21:14
jog0	my call to arms / public shaming worked a little	21:14
jgriffith	jog0: /window 25	21:15
jgriffith	crap	21:15
*** vipul is now known as vipul-away		21:15
*** senk has quit IRC		21:17
*** julim has quit IRC		21:19
*** tjones has quit IRC		21:20
jeblair	i just saw some more of those errors	21:20
jeblair	i've put jenkins02 in shutdown	21:20
*** markmcclain1 has joined #openstack-infra		21:21
*** markmcclain has quit IRC		21:22
*** markmcclain has joined #openstack-infra		21:22
*** markmcclain has quit IRC		21:24
*** markmcclain has joined #openstack-infra		21:24
*** markmcclain1 has quit IRC		21:26
jeblair	clarkb: ping	21:27
jeblair	clarkb: i need https://review.openstack.org/#/c/45348/ to be merged but it depends on https://review.openstack.org/#/c/45347/1	21:27
*** alcabrera has quit IRC		21:27
*** vipul-away is now known as vipul		21:28
*** anteaya has quit IRC		21:28
dims	k i'll be back in a few hours	21:29
*** markmcclain1 has joined #openstack-infra		21:29
jeblair	lacking that, i have manually executed "set global max_connections=1024;" in mysql on nodepool	21:29
*** mriedem has quit IRC		21:30
*** markmcclain has quit IRC		21:30
ryanpetrello	okay, a new version of pecan (0.4.2) has been released that resolved the wsme breakage	21:36
jeblair	oh nevermind, 0.6.1 doesn't have it either	21:36
jeblair	clarkb: ^	21:36
clarkb	jeblair: looking	21:36
dhellmann	jeblair, fungi: we'd like to land https://review.openstack.org/#/c/43145/ so we can set up cross-check jobs to gate pecan and WSME. The change has 2 +2 but isn't approved. Is there something else we need?	21:36
jeblair	clarkb: i'm trying to add max_connections; i don't think it's supported even in 0.6.1. i may have to add a /etc/mysql/conf.d/ file	21:38
clarkb	jeblair: we could potentially go to an even newer version. 0.6.1 was chosen to minimize delta while getting the desired results	21:38
mgagne	jeblair: looks to be only supported in 1.0.0. adding a custom conf file looks to be the solution atm. I have the same problem with my setup.	21:39
jeblair	dhellmann: i think we're afraid to merge that at the moment (if it goes wrong everything breaks), and there's quite a bit of excitement already.	21:39
dhellmann	jeblair: fair enough :-)	21:39
dhellmann	jeblair: we'll work on setting up the tests, and come back when things settle down to configure the gate jobs	21:39
mgagne	jeblair: 0.9.0 supports it https://github.com/puppetlabs/puppetlabs-mysql/blob/0.9.0/manifests/config.pp#L117	21:39
jeblair	dhellmann: ok. feel free to ping us when you think it might be a good time (in case it slips our minds)	21:40
dhellmann	jeblair: count on it! ;-)	21:40
jeblair	mgagne: oh, that might work. it has both config_hash and max_connections.	21:41
dkranz	This recent failure looks like some infra issue but I haven't seen it before http://logs.openstack.org/45/41345/8/check/gate-tempest-devstack-vm-neutron/e91142b/console.html	21:41
mgagne	jeblair: 0.8.0 looks to be the first version to support the parameter.	21:41
jeblair	dkranz: in what way?	21:41
dkranz	jeblair: It seems to just stop during setup of tempest	21:42
*** pabelanger has quit IRC		21:46
jeblair	dkranz: it looks like it stopped while running devstack. but i don't think it's an infra problem -- the node continued to run, including doing all of the cleanup work and copying the log files	21:47
dkranz	jeblair: So what kind of problem do you think it is? Should I just recheck no bug?	21:48
dkranz	jeblair: I've been trying not to do that.	21:48
jeblair	dkranz: i'd start with the idea that it's a bug in devstack. note that lots of services are running and devstack has been doing work to set up images, etc... so it at least got that far.	21:51
dkranz	jeblair: OK, I'll check there and file a bug if I don't turn up anything. Thanks.	21:52
openstackgerrit	James E. Blair proposed a change to openstack-infra/config: Set mysql max_connections to 1024 on nodepool https://review.openstack.org/48755	21:52
*** bnemec_ has joined #openstack-infra		21:53
jog0	dkranz: I have seen things like this before so opening a bug may be a good idea	21:55
dkranz	jog0: I asked Jim and he suggested starting with the idea that it is a devstack bug	21:55
dkranz	jog0: I will file a bug there if there isn't one already	21:55
*** pcm_ has quit IRC		21:56
dkranz	jog0: Because the job does finish but just stops in the middle of devstack running	21:56
dkranz	jog0: presumably returning non-zero exit code	21:56
jog0	sigh yet another racy bug	21:56
*** bnemec has quit IRC		21:57
fungi	we don't have enough of those yet	21:58
openstackgerrit	A change was merged to openstack-infra/config: Run large-ops test only on hp nodes https://review.openstack.org/48748	21:59
*** flaper87 is now known as flaper87\|afk		22:02
*** pabelanger has joined #openstack-infra		22:02
mordred	morning all. I'm back on line - anything I can jump on?	22:03
jeblair	i'm about to restart jenkins02 because of the errors we saw earlier (check scrollback)	22:04
clarkb	mordred: puppet-mysql has come up again	22:04
mordred	clarkb: ugh. what now?	22:04
clarkb	mordred: that's not super urgent though	22:04
clarkb	mordred: jeblair needs to limit the number of connections for nodepool and the version of the module we have doesn't do that	22:04
clarkb	mordred: newer versions do	22:04
jeblair	clarkb: _raise_ the limit	22:04
lifeless	anyone seen	22:05
lifeless	File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/pip/backwardcompat/__init__.py", line 90, in fwrite	22:05
mordred	ah. interesting	22:05
clarkb	jeblair: ah	22:05
lifeless	f.write(s)	22:05
lifeless	ValueError: I/O operation on closed file	22:05
lifeless	before ?	22:05
mordred	jeblair: not doubting - but are you sure that's what you want to do?	22:05
jeblair	mordred: yes. please read the commit message and let me know if you think otherwise.	22:05
mordred	jeblair: increasing max_connections often has less positive effects than you might want (if you are sure, then fine, just checking)	22:05
mordred	jeblair: ok. cool.	22:05
mordred	looking	22:05
jeblair	mordred: i'm not running a php script in apache, which is more or less what the default is tuned for. :)	22:06
mordred	ah. ok. so, each threadconnection should essentially be performing like a quick query	22:06
*** thomasm has quit IRC		22:06
mordred	the patch looks good- I potentially agree with fungi's comment - but I haven't really used conf.d files in anger	22:07
jeblair	yes, except it might be a couple of queries separated by like 10 minutes, but each only looking at one row.	22:07
fungi	jeblair: i think it's evidence nodepool should have been written in php	22:07
mordred	lifeless: yes. but I cannot for the life of me remember why or what it was trying to do wrong	22:07
jeblair	mordred: if you could answer fungi's comment-question, that would be swell.	22:08
*** dcramer_ has quit IRC		22:08
mordred	ah - answer is "yes"	22:08
jog0	clarkb: can you make the elastic-recheck gerrit user	22:08
mordred	it needs to be in [server]	22:08
jog0	so I don't have to keep using my own account	22:08
mordred	or mysqld	22:09
mordred	either will work	22:09
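A conf.d override along the lines discussed above could be rendered as in this minimal sketch. It uses the [mysqld] group that mordred says will work and the 1024 value jeblair set by hand; the function name is hypothetical, and no file path is written because the real file name is not given in the channel.

```python
import configparser
import io


def render_mysql_override(max_connections: int) -> str:
    """Render a small MySQL option-file fragment raising max_connections."""
    parser = configparser.ConfigParser()
    # [mysqld] is one of the two groups named in the channel as acceptable.
    parser["mysqld"] = {"max_connections": str(max_connections)}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# 1024 is the value jeblair applied manually with "set global".
print(render_mysql_override(1024))
```

The rendered text would be dropped into a file under /etc/mysql/conf.d/, which is the approach jeblair mentions earlier.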
openstackgerrit	James E. Blair proposed a change to openstack-infra/config: Set mysql max_connections to 1024 on nodepool https://review.openstack.org/48755	22:09
morganfainberg	jog0, using your own account just makes you look like you're looking at everyone's changes ;)	22:10
*** sodabrew_ has quit IRC		22:10
jog0	morganfainberg: but it sends me too many emails	22:10
jeblair	i'm going to upgrade the gearman plugin on jenkins02 since i'm restarting it anyway	22:10
morganfainberg	jog0, hehe. i bet.	22:10
fungi	jog0: i can do it after i stop cramming food in my mouth hole. need an ssh key and, if possible, a dedicated contact e-mail address (not shared with any other gerrit user since gerrit has issues with duplicate e-mail addresses) and a display name you want it using in comments if different from the ssh username (can include spaces and whatnot)	22:10
*** tjones has joined #openstack-infra		22:11
jeblair	fungi: this is an infra account	22:11
jeblair	it's going to be run on the logstash host	22:11
fungi	oh	22:11
jeblair	fungi: so i think we should create it ourselves and stick it in hiera	22:11
fungi	so we'll want to puppet the keys in and whatnot	22:11
jeblair	https://review.openstack.org/#/c/47497/	22:11
jog0	jeblair: I was hoping at first I could run it on my box for debugging and whatnot	22:11
jog0	if not I can work around that too	22:12
clarkb	why don't I fix my review really fast	22:12
clarkb	then maybe we can just deploy it on logstash.o.o and debug there	22:12
*** sarob has quit IRC		22:12
jog0	clarkb: works for me	22:12
*** AlexF has joined #openstack-infra		22:12
*** sarob has joined #openstack-infra		22:13
*** alexpilotti has joined #openstack-infra		22:14
openstackgerrit	Clark Boylan proposed a change to openstack-infra/config: Deploy elastic-recheck on logstash.openstack.org. https://review.openstack.org/47497	22:15
*** flaper87\|afk is now known as flaper87		22:15
clarkb	jog0: fungi jeblair ^ there we go	22:15
*** sarob has quit IRC		22:17
jog0	clarkb: so I don't think elastic-recheck is wired up to pip yet	22:18
jog0	not really sure what's needed to put on pypi	22:18
clarkb	jog0: we don't need it on pypi	22:19
clarkb	jog0: we will CD it from git	22:19
clarkb	jog0: we just need it to be python setup.py installable	22:19
jog0	even better	22:19
jog0	ohh haven't tried that heh	22:20
clarkb	eventually we may want to pypi it, but for now this is good	22:20
jeblair	restarting jenkins02	22:20
*** datsun180b has quit IRC		22:22
jeblair	the thing i love about the gearman plugin is how it starts running jobs before jenkins webui is even up.	22:24
*** jcoufal has quit IRC		22:25
mordred	jeblair: ++	22:26
jeblair	even before the nodes themselves are ready.	22:27
mordred	well, that's less exciting, but still fun	22:28
*** justinabrahms has joined #openstack-infra		22:28
jeblair	well, after failing 100 jobs or so, it seems to be a bit better now.	22:30
sdague	clarkb: where in the tree are the logstash parsing rules?	22:30
clarkb	sdague: modules/openstack_project/templates/logstash/indexersomething	22:31
*** _david_ has joined #openstack-infra		22:32
clarkb	sdague: http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/templates/logstash/indexer.conf.erb	22:32
_david_	clarkb, jeblair, mordred done ;-)	22:32
_david_	WIP plugin (on top of Gerrit 2.8): https://github.com/davido/gerrit-wip-plugin	22:32
_david_	Even with screen cast, you can see it in action on new and shiny change screen 2	22:33
*** flaper87 is now known as flaper87\|afk		22:33
sdague	clarkb: cool	22:33
_david_	And this is the patch upstream that is still needed for that to work: https://gerrit-review.googlesource.com/50250	22:34
clarkb	_david_: are there any ACLs around it?	22:34
_david_	clarkb, sure ;-)	22:34
_david_	let me point you to that:	22:34
clarkb	_david_: that is where zaro's patch comes in, being able to allow change owners permissions to do things to a change that not everyone else may be able to do	22:34
clarkb	_david_: awesome	22:35
_david_	clarkb, take a look at the pictures	22:35
_david_	in Gerrit 2.8 i introduced so called plugin owned capabilities (old permissions):	22:36
_david_	https://github.com/davido/gerrit-wip-plugin/blob/master/src/main/java/com/googlesource/gerrit/plugins/wip/WorkInProgressCapability.java	22:36
_david_	so you can just annotate REST endpoints:	22:36
_david_	https://github.com/davido/gerrit-wip-plugin/blob/master/src/main/java/com/googlesource/gerrit/plugins/wip/WorkInProgressAction.java#L40	22:36
clarkb	_david_: then in your ACL config you would give that capability to groups?	22:37
* _david_ solved ACL in another patch already:		22:37
*** che-arne has joined #openstack-infra		22:38
jeblair	clarkb, mordred, fungi: i had to disconnect/reconnect some slaves from jenkins02 because they couldn't find their workspace	22:38
mordred	jeblair: k. that's weird	22:38
_david_	clarkb, https://gerrit-review.googlesource.com/#/c/46970/	22:38
jeblair	i think it's because gearman plugin started using them too early	22:38
jeblair	and they don't seem to be able to fix themselves	22:38
*** tjones has quit IRC		22:39
_david_	clarkb, exactly, Capabilities are global permissions (exactly like in Shrews change).	22:39
mordred	jgriffith: just catching up - are you making progress anywhere with the CONF.num_iscsi_scan_tries ?	22:40
clarkb	_david_: perfect	22:41
*** CaptTofu has quit IRC		22:42
jgriffith	mordred: just started running it through gates	22:44
mordred	jgriffith: awesome. here's hoping it helps!	22:44
jgriffith	mordred: https://review.openstack.org/#/c/48752/	22:44
jgriffith	ditto... although at this rate it will take forever to have any good data	22:44
jeblair	i just disconnected all of the precise slaves from jenkins02	22:45
jeblair	that was a lot of clicking	22:45
jeblair	i think the restart process needs to be:	22:45
jeblair	enter shutdown mode; wait; disable gearman plugin; stop; start; wait; enable gearman plugin	22:45
mordred	jeblair: yes. I agree	22:46
jeblair	clarkb, fungi: ^ fyi	22:46
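The restart order jeblair proposes above can be written down as an ordered checklist. This is only a sketch: the step labels are copied from the channel, while the runner function, the callback dictionary, and the comments on what each "wait" is for are assumptions made for illustration and do not call any real Jenkins API.

```python
from typing import Callable, Dict, List

# Step order exactly as stated in the channel; the names are labels only.
RESTART_STEPS: List[str] = [
    "enter shutdown mode",
    "wait",  # assumed: let running jobs drain
    "disable gearman plugin",
    "stop",
    "start",
    "wait",  # assumed: let slaves reconnect before taking jobs
    "enable gearman plugin",
]


def run_restart(actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the caller-supplied action for each step, strictly in order."""
    completed: List[str] = []
    for step in RESTART_STEPS:
        actions[step]()  # a missing step raises KeyError instead of being skipped
        completed.append(step)
    return completed


# Dry run: each hypothetical action just prints its label.
dry_run = {step: (lambda s=step: print("would do:", s)) for step in set(RESTART_STEPS)}
run_restart(dry_run)
```

Keeping the gearman plugin disabled until after the second wait is the point of the ordering, since the earlier restart failed jobs on slaves that were not ready yet.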
*** dcramer_ has joined #openstack-infra		22:48
*** _david_ has quit IRC		22:51
fungi	makes sense to me	22:52
clarkb	we didnt have problems with the last restart	22:52
clarkb	but being defensive can't hurt	22:53
fungi	we probably need something somewhere which can tell whether the slaves are ready and waits for them to settle before jenkins starts accepting jobs on their behalf	22:53
sdague	where is that cookie cutter repo again?	22:53
fungi	or maybe it just waits for us to start connecting slaves directly to the gearman server	22:53
fungi	sdague: openstack-dev/cookiecutter	22:54
sdague	jgriffith: it seems to have hit the same issue again	22:54
jgriffith	anybody else noticed the errors spewing everywhere	22:59
*** nicedice has joined #openstack-infra		22:59
* fungi checks his faucet		23:02
fungi	jgriffith: which errors? and i assume spewing in job failure console logs, but... example?	23:03
jgriffith	fungi: http://logs.openstack.org/52/48752/2/check/check-tempest-devstack-vm-postgres-full/b0e6a41/logs/screen-n-cpu.txt.gz	23:03
jgriffith	fungi: just step through a search on error or trace	23:03
*** rcleere has quit IRC		23:04
jgriffith	fungi: I'm also confused by the xen volumes mounted in this test output	23:04
*** AlexF has quit IRC		23:05
jgriffith	xen-vdb-51744-part1 etc	23:05
fungi	grr. i'm clearly on the wrong evening computer. it's hanging up my browser	23:05
*** sodabrew has joined #openstack-infra		23:05
* jgriffith wants diff computers for diff times of day :)		23:06
jgriffith	jeblair: sdague well that didn't tell us much except that upping the retry count isn't going to help us	23:07
jgriffith	what's bothersome about this is if you look at syslog, it appears that we connected over IET successfully	23:09
*** boris-42 has quit IRC		23:11
fungi	eek, clicking trace on that log oom'd firefox, but took this poor netbook with it for several minutes while it dod so	23:11
fungi	did so	23:11
*** gyee has quit IRC		23:11
fungi	512mb ram used to seem like a lot	23:12
jgriffith	fungi: hehe	23:12
* jgriffith takes back his earlier comment about wanting multiple computers like fungi		23:12
jgriffith	:)	23:12
fungi	yeah, you don't want these	23:12
* fungi has random linux thinnish-clients scattered around the house		23:13
jog0	clarkb: python setup.py install works for elastic-search	23:13
jog0	just doesn't install any binaries	23:13
clarkb	jog0: awesome. I think the puppet is mostly ready then (it is missing an init script, but we can run it manually until we get one)	23:14
jog0	cool	23:14
clarkb	fungi: yes manually running it was the intention until we had time to do it proper like	23:14
clarkb	fungi: did you still want to create the system account and put it into hiera? I am being distracted by Fridayness	23:15
clarkb	eg end of week fried brain	23:15
openstackgerrit	Salvatore Orlando proposed a change to openstack-infra/devstack-gate: Revert "Enable q-vpn service" https://review.openstack.org/48767	23:16
jgriffith	hey wait...	23:17
jgriffith	is it just me or is that SID not correct?	23:17
clarkb	SID?	23:18
jgriffith	SCSI ID	23:19
jgriffith	something's not aligning correctly in the logs	23:19
jgriffith	so notice in the nova logs we try to open/connect around 22:44:17	23:20
jgriffith	and the scsi ID is 6	23:20
jgriffith	then check the syslog, and at that time you see a connection made for a target ID 5	23:21
jgriffith	Ohhhhh	23:22
jgriffith	hmmmm	23:22
sdague	any idea why https://review.openstack.org/#/c/48626/ didn't collect logs after timeout	23:24
clarkb	sdague: it didn't get a chance to run the cleanup function in devstack-gate	23:26
clarkb	that is an annoying problem	23:26
sdague	ok	23:26
mordred	something about this: "jgriffith | hmmmm" terrifies me	23:26
sdague	he didn't say muhahaha	23:26
jgriffith	nahh, was wondering if there's something bad happening with iscsi mixing up targets	23:26
mordred	jgriffith: I blame shuttleworth	23:27
clarkb	sdague: not sure how we can handle that better. a couple of things come to mind, like running a post-build shell action that does the copying, or trapping SIGINT and running cleanup then (assuming that is how jenkins is killing the test)	23:27
jgriffith	mordred: ha! I've been doing that for a year!	23:30
sdague	mordred: that's always your answer, at least on fridays	23:30
mordred	sdague: also on the other days that end in y	23:31
*** ryanpetrello has quit IRC		23:32
*** che-arne has quit IRC		23:34
jeblair	so who wants to restart jenkins01? :)	23:36
jeblair	it's not exhibiting problems, but i think it would be a good idea, possibly as a preventive measure, and also to upgrade the gearman plugin	23:36
jgriffith	K, on a hunch that there's a target collision I'm adding a show targets message to the output	23:37
jeblair	(i've uploaded the plugin, so it will take effect on restart)	23:37
jgriffith	I'm likely not going to be around for a bit but I'll check it out when I get back to a computer	23:37
openstackgerrit	A change was merged to openstack-infra/jenkins-job-builder: Add publisher for Git Publisher support https://review.openstack.org/46417	23:38
jeblair	(i also uploaded it to jenkins.o.o)	23:38
*** alexpilotti has quit IRC		23:42
mordred	jeblair: the process is "put into shutdown; wait; disable gearman plugin; wait; stop; start; enable gearman plugin"	23:43
mordred	jeblair: right?	23:43
jeblair	mordred: yes	23:43
mordred	putting jenkins01 into shutdown mode	23:48
jeblair	i'm heading out	23:49
jeblair	mordred: thanks for taking care of 01	23:49
mordred	k. sure thing! thanks for taking care of all of infra!	23:49
clarkb	++ jeblair is a good keeper of the gate keeper	23:50
*** mgagne has quit IRC		23:51
jeblair	mordred: if you want to do jenkins.o.o at the same time it's ready (and should be easy, can probably do it while you're waiting on 01)	23:52
*** KennethWilke has quit IRC		23:53
*** sodabrew has quit IRC		23:53
*** UtahDave has quit IRC		23:54
mordred	jenkins is in shutdown mode	23:55
*** sodabrew has joined #openstack-infra		23:57
*** sodabrew has quit IRC		23:58
