Friday, 2013-09-27

dansmithsorry if I missed it in the scrollback, but things are wedged right now, yes?00:00
*** sarob has quit IRC00:01
sdagueclarkb: awesome00:02
fungidansmith: not that anyone's said until now...00:04
sdaguedansmith: stable/grizzly is still a problem00:04
dansmithfungi: the top thing in the check queue looks to have been there for five hours00:04
sdaguebut master should be fine00:04
dansmithmy thing queued for master has been sitting in check for 2+ hours00:05
clarkbdansmith: I think what is happening there is we have enough jobs in the gate queue that we are starving the check queue00:06
clarkbdansmith: as gate queue jobs get dibs on slaves first00:06
dansmithclarkb: really? 36 in the gate right?00:06
clarkbdansmith: yes00:06
clarkbdansmith: but the new NNFI causes a lot more thrashing. Less time in between for check to catch up00:06
dansmithit's been much higher than that in the not too distant past00:06
clarkbtl;dr we need to fix flakiness00:07
dansmiththat's some pretty bad starvation.. 5h with no progress..00:07
*** gyee has quit IRC00:07
jeblairalso, more cloud servers00:07
jeblairbut mostly flakiness00:07
*** sarob_ has quit IRC00:09
sdaguejeblair: can we burst some more nodes? getting to rc1 is going to be tough if stuff is hanging in check that long00:10
*** ArxCruz has joined #openstack-infra00:10
*** sarob has joined #openstack-infra00:10
sdaguealso, we should probably drop large-ops from gate, non voting on the gate just burns time00:11
fungiwe'd need to get hp to raise our quotas, right?00:11
sdagueor put the rack nodes back in rotation00:11
sdagueslow on check wouldn't be that big a deal00:11
*** dims has joined #openstack-infra00:11
*** sarob has quit IRC00:12
*** sarob has joined #openstack-infra00:13
jeblairsdague: that's what i've been working on.  :)00:13
*** dcramer_ has joined #openstack-infra00:14
jeblairsdague, fungi: zuul is able to thrash nodes faster than nodepool can keep up, so i'm working on getting nodepool to be able to more or less instantly burst to capacity00:14
jeblairwe are, however, at the moment pretty close to capacity.00:14
jeblair(we've worked up to it over a while)00:14
* fungi nods00:15
*** reed_ has quit IRC00:17
jeblairnode selection by pipeline is possible.  we could reserve rackspace nodes for that purpose.  we're going to run into unit test node starvation too, which is the next thing i'm going to work on.  of course we can spin up more static nodes for now.00:17
openstackgerritSean Dague proposed a change to openstack-infra/config: drop large-ops from gate (it's non voting)
jog0was down for a split second for sdague's new patch?00:17
clarkbjog0: apache may have been restarted momentarily00:17
sdagueso that will help a little00:17
*** adalbas has quit IRC00:17
jeblairjog0: are you ready to make large-ops voting or should we consider ?00:17
jog0clarkb: that explains what i saw thanks00:18
jog0jeblair: I am ready00:18
jeblairjog0: then can you propose a change to do that00:19
sdagueso the neutron job looks like it has < 50% pass rate right now -
clarkbit would be cool if gearman priority could be weighted so that as things aged in check they would get more priority and could flop positions with gate00:22
jeblairsdague: that includes check jobs00:22
sdagueit does00:22
sdaguebut I watched 2 neutron based resets in the last 4 minutes00:22
jeblairi'm going to be busy with the nodepool bursting change, if someone else wants to take making rackspace nodes available for check jobs00:22
clarkbjeblair: I can take a quick stab at it. There is a usergroup thing at 6 that I plan on going to though00:24
jeblairsdague: that's complex.  i'd rather throw more machines at the problem.00:24
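A minimal sketch of the aging idea clarkb floats above: a waiting check job slowly gains priority until it can overtake a newer gate job. This is a standalone illustration, not Gearman's or Zuul's API; the names GATE_PRIORITY, CHECK_PRIORITY, AGE_STEP_SECONDS, effective_priority and next_job, and the job dictionaries, are all hypothetical.

```python
# Hypothetical base priorities: a lower number is served first.
GATE_PRIORITY = 0
CHECK_PRIORITY = 2
# Assumption: every hour spent waiting improves priority by one level.
AGE_STEP_SECONDS = 3600.0


def effective_priority(base, enqueued_at, now, age_step=AGE_STEP_SECONDS):
    """Return the base priority reduced by the time the job has waited."""
    waited = max(0.0, now - enqueued_at)
    return base - waited / age_step


def next_job(jobs, now):
    """Pick the job with the best (lowest) aged priority; oldest wins ties."""
    return min(
        jobs,
        key=lambda job: (
            effective_priority(job["base"], job["enqueued_at"], now),
            job["enqueued_at"],
        ),
    )


if __name__ == "__main__":
    now = 6 * 3600.0
    jobs = [
        # A check job that has been waiting five hours.
        {"name": "check-old", "base": CHECK_PRIORITY, "enqueued_at": now - 5 * 3600},
        # A gate job that was enqueued ten minutes ago.
        {"name": "gate-new", "base": GATE_PRIORITY, "enqueued_at": now - 600},
    ]
    print(next_job(jobs, now)["name"])
```

With these made-up numbers the five-hour-old check job is picked ahead of the fresh gate job, while two jobs enqueued at the same moment still favour the gate. The cost, as jeblair notes, is extra scheduling complexity compared with simply adding machines.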
clarkbjeblair: how would we make it so those nodes are only used for check? new label and new jobs?00:24
*** UtahDave has quit IRC00:24
clarkbor use a zuul function?00:24
jeblairclarkb: new label and zuul parameter function that sets the node to that label00:25
clarkbgot it.00:25
openstackgerritJoe Gordon proposed a change to openstack-infra/config: Make gate-tempest-devstack-vm-large-ops voting
jog0jeblair: done00:25
*** matsuhashi has joined #openstack-infra00:28
*** colinmcnamara has joined #openstack-infra00:28
*** MarkAtwood2 has quit IRC00:29
*** colinmcnamara has quit IRC00:35
*** rockyg has quit IRC00:38
*** nosnos has joined #openstack-infra00:38
openstackgerritClark Boylan proposed a change to openstack-infra/config: Use rackspace for tempest check tests.
clarkbjeblair: fungi mordred ^ I am really sure that is wrong as gearman node selection doesn't happen with NODE_LABEL iirc00:39
clarkband I need to head to the user group thing, but that should jumpstart the process, feel free to push better patchsets00:39
*** jhesketh has joined #openstack-infra00:39
*** rnirmal has quit IRC00:39
*** senk has joined #openstack-infra00:40
*** kong has joined #openstack-infra00:41
jheskethjeblair: What do you think about introducing conditional reporting into zuul. For example, since we'll be running our own zuul to report back to gerrit we don't want it to report on merge failures. In fact, we probably only need it to report in certain cases. For example, when our tests fail we always want to report FAILURE but we only need to report SUCCESS when there is a new migration introduced.00:41
*** weshay has quit IRC00:41
*** CaptTofu_ has quit IRC00:43
jog0clarkb sdague logstash is only 7 hours behind now!00:44
*** julim has joined #openstack-infra00:44
jog0looks promising hopefully its not just related to peoples workday00:44
sdagueyeh, we'll find out tomorrow00:45
jog0sdague: saw your new patch in action, will make gate on stacktrace easy00:45
clarkbits not. job queue fell by 100k in about an hour00:46
clarkbchange definitely helped00:47
*** julim has quit IRC00:48
*** senk has quit IRC00:49
jog0that should have been Obama's catch phrase for his second term00:51
*** senk has joined #openstack-infra00:53
*** senk has quit IRC00:53
*** senk has joined #openstack-infra00:54
mordredsdague: I did not see your patch. tell me about it!00:59
mriedemsdague: do you have any ideas about this quantumclient issue in the stable/grizzly gate?
*** portante|afk is now known as portante01:03
*** xchu has joined #openstack-infra01:04
Alex_GaynorHmm, so we probably have the ability to compute what %age of gate  jobs are passing?01:04
jog0Alex_Gaynor: there is a way but I forget but it uses graphite.openstack.or01:05
Alex_Gaynorjog0: trying to analyze if my feeling that the fail rate has been crazy high for the last 1-2 days is accurate01:05
Alex_GaynorSo going back two weeks leads me to believe that yes, failure rates are up01:09
jeblairjog0: what's the attraction of graphlot?01:09
*** sodabrew has quit IRC01:09
jeblairas opposed to composer01:10
jeblairi find composer easier to use for finding metrics, changing time windows, and applying functions...01:11
Alex_Gaynorjog0: cool, science confirms my intuition!01:12
*** jrgarciahp has quit IRC01:12
jog0jeblair: that was the link that I found first01:12
jog0Alex_Gaynor: I can point to the bug too01:13
jeblairjog0: please do; i'd like to see who is assigned01:13
*** senk has quit IRC01:13
Alex_Gaynorjog0: my impression was there was a handful of bugs causing this?01:14
jog0Alex_Gaynor: at least one bug01:15
jog0jeblair: no one because I noticed it today01:15
jog0I can't even find a stacktrace that caused it01:15
Alex_Gaynorjog0: my impression was that it was the boto and the test_volume_boot_pattern ones?01:15
jeblairjog0: thank you for that.01:15
uvirtbotLaunchpad bug 1230407 in neutron "State change timeout exceeded" [Undecided,Confirmed]01:16
jeblairalso, i'm becoming more and more keen on the idea that we should run the neutron test 10 times for every neutron change01:16
jog0jeblair: hahaha01:16
jog0by that I mean yes!01:16
*** thomasm has quit IRC01:18
jog0boot pattern01:18
uvirtbotLaunchpad bug 1226337 in tempest "tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern flake failure" [High,Triaged]01:18
jog0anyway you get the idea01:18
*** wenlock has quit IRC01:19
jog0anyone want to send those links to the openstack-dev ML?01:19
jog0shaming people for destabilizing during stabilization01:19
jeblairjog0: do you not want to?  in the past, sdague has started a thread naming specific critical bugs for gate failures and it has helped to focus attention01:20
jog0I will go ahead and do it01:21
jog0should be fun01:21
jog0unless sdague wants to01:21
*** dkliban has joined #openstack-infra01:22
*** mriedem has quit IRC01:23
* jog0 starts drafting a fun email01:25
*** kong has quit IRC01:28
*** jerryz has quit IRC01:28
*** jerryz has joined #openstack-infra01:29
*** ojacques has quit IRC01:35
*** melwitt has quit IRC01:37
*** rfolco has quit IRC01:39
*** CaptTofu has joined #openstack-infra01:40
jog0that should be fun01:42
*** CaptTofu_ has joined #openstack-infra01:45
*** CaptTofu has quit IRC01:47
jog0Alex_Gaynor: I can account for 200 failures in last 24 hours with just two bugs01:48
Alex_Gaynorjog0: :/01:48
*** ArxCruz has quit IRC01:49
jog0out of 30501:49
jog0or so01:49
morganfainbergjog0, thats crazy.01:54
lifelessmorganfainberg: pretty common01:59
lifelessmorganfainberg: you get a long tail effect01:59
morganfainberglifeless, aye, still.  i know i've had my fair share of rechecks on the bootpattern one01:59
morganfainberglifeless, just didn't realize how _much_ it affected everything02:00
mordredI think, as much as I don't like it in theory, that I'd like to skip those two tests in the normal runs02:01
mordredbut run an extra job for neutron with them on02:01
mordredand loop them 10x02:01
lifelessmorganfainberg: when we first got similar stuff in place for Launchpad, we had something like 80% explained by the first 4 bugs.02:01
mordredbecause those numbers above are crazy02:01
lifelessmorganfainberg: and then 80% of the remainder from 4 more bugs, and so on.02:01
morganfainberglifeless, lol02:01
mordredjog0, sdague: it's a little bitchy, but what do you think?02:02
*** ericw has quit IRC02:02
*** dkliban has quit IRC02:02
dimsjog0, which two tests specifically?02:03
mordreddims: jog0 just sent a mail to the -dev list with the deets02:03
*** ericw has joined #openstack-infra02:05
*** yaguang has joined #openstack-infra02:05
dimsmordred, thx02:06
lifelessmordred: I think it's a decent accommodation *if* the problem is test-side, not service side.02:06
lifelessmordred: if neutron is actually buggy - and I've seen stuff with tripleo these last few days that makes me think it's service side.02:07
lifelessmordred: then the gate is doing its job and we need to fix the damn things before release.02:07
*** dkliban has joined #openstack-infra02:07
mordredlifeless: yes. I completely agree that we should fix the damn things before the release. I agree that the gate is doing its job02:07
*** senk has joined #openstack-infra02:08
mordredlifeless: I think I'm more brainstorming on how we can better place the onus to fix near where it could be fixed02:08
lifelessmordred: Ah, so thats interesting.02:08
lifelessmordred: From one sense, having it widespread gets more folk onboard faster.02:08
mordredyah. that's the original theory02:09
lifelessmordred: in fact, stopping other things changing while we fix brain damage helps prevent slippage: this is exactly the concern you and jeblair have about 'turn off bare metal gating if it breaks'.02:09
lifelessmordred: OTOH if slippage is a low risk, you are basically breaking everyone elses brains until the thing is fixed.02:09
mordredyeah. especially since the thing that is breaking is flaky, so the gate breakage isn't preventing slippage in this case02:10
mordredwhich is where the "take flaky tests and run a job which runs them 10x" idea comes in02:10
mordredif we can cause them to be _more_ breaking - but in a targeted manner02:11
lifelessmaybe we should just run everything N* where N gets us some confidence interval of 'very reliable'02:11
lifelesse.g. 10* -> 90% reliable.02:12
mordredyah. I could see that as a general strategy once we get past these02:12
lifelessrun 10 tempest jobs in parallel for every gate.02:12
mordredthe overall machine cost might still be lower than all the gate resets02:12
mordredif it helps us not let flaky things in02:12
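The "run it N times" idea can be put into rough numbers. A minimal sketch, assuming each run fails independently with the same probability (real gate runs share infrastructure, so this is only an approximation); the function names and the 20% failure rate are illustrative, not measured values.

```python
import math


def catch_probability(fail_rate, runs):
    """Chance that at least one of `runs` independent runs fails."""
    if not 0.0 <= fail_rate <= 1.0:
        raise ValueError("fail_rate must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return 1.0 - (1.0 - fail_rate) ** runs


def runs_needed(fail_rate, confidence):
    """Smallest number of runs that catches the flake with `confidence`."""
    if not 0.0 < fail_rate < 1.0:
        raise ValueError("fail_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))


if __name__ == "__main__":
    # Hypothetical flake that fails one run in five.
    print(round(catch_probability(0.2, 10), 4))
    print(runs_needed(0.2, 0.9))
```

The rarer the flake, the more runs it takes to catch it on the way in, which is the machine-cost trade-off against gate resets that mordred mentions.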
lifelessjog0: do we have an identified bad commit ?02:12
lifelessjog0: like 'never before X' ?02:13
lifelesscan we revert the thing?02:13
*** reed_ has joined #openstack-infra02:15
*** senk has quit IRC02:18
jeblairlifeless: according to
jeblairlifeless: never before 2013-09-20T23:37:40.000 but the major problem started at 2013-09-25T02:09:44.00002:19
jeblairlifeless: you'll see what i mean if you look at the graph02:20
lifelessso a commit before 2013-09-25T02:09:44.00002:20
lifelessand not far before02:20
*** CaptTofu_ has quit IRC02:24
*** dguitarbite has joined #openstack-infra02:25
*** CaptTofu has joined #openstack-infra02:26
lifelessdoes openstack have a secure document store02:27
lifelesswhere e.g. I can store a bunch of passwords and give them out to selected tripleo folk ?02:27
lifelessfor context, I want to make getting access to the machines that will host the proposed baremetal test cluster something we can document and delegate.02:28
lifelessone test I'm considering is 'tripleo ptl + delegates'02:28
jeblairlifeless: no; anteaya is looking into owncloud for the board of directors; we've considered expanding its use if it works out for that.02:33
lifelessok, I'll do something icky for now, but please consider us interested.02:34
jeblairlifeless: related: there are plans forming for a keysigning event at the summit02:34
lifelessyeah, I need to do a key migration thing02:34
anteayawould we have an owncloud separate from the one the board of directors is using?02:34
lifelessmy gpg key is long in the tooth02:34
anteayaor everyone on one owncloud?02:34
jerryzHi everyone, got a version conflict error from oslo.config on my own devstack while starting nova-api,  need help , thanks02:34
mordredanteaya: unsure. I think we'll have to learn a little more about group permissions, management and users in owncloud02:34
anteayavery good02:35
jeblairyeah, and no need to get ahead; we can do baby steps.02:35
anteayaowncloud is up after puppet-dashboard starts processing reports02:35
anteayaup meaning next in line for my attention02:35
mordredjerryz: awesome! that's just great02:35
anteayajeblair: k02:35
jeblairlifeless: yeah, about a year ago i finally decided that having a 1024 bit key from 1996 was a liability, not a badge of honor.  :)02:36
mordredhow did we manage to land that change?02:36
jeblairjerryz: can you link to the change?02:37
mordredoh! wait02:37
jerryzno change here. Just sync the upstream and trigger a tempest test on my own devstack02:37
mordredjerryz: you may need to do something02:38
mordredjerryz: cd /opt/stack/new/oslo.config02:38
mordredrm -rf *.egg-info02:39
mordredgit pull --ff-only02:39
mordredsudo pip install -e .02:39
mordredjeblair: you know the one gotcha in the way we're calculating versions? that a develop'd install is not going to ever pick up a new version?02:39
mordredjeblair: I believe that may be what has happened here02:40
mordredsdague, dtroyer ^^ we may want to put something in to restack to clean out egg-info files02:40
mordredso that git updates will re-gen versions properly across tag boundaries (where it might be important)02:41
mordredclarkb: if you get bored: I think is FINALLY actually ready02:48
*** anteaya has quit IRC02:50
jerryzmordred: thanks. but why the d-g test on o.o does not have this issue? what is the circumstance for it to happen?02:59
*** dims has quit IRC02:59
mordredjerryz: d-g test starts with a completely clean vm each time02:59
mordredyour vm had some unaccounted for state from previous versions of your git repo02:59
mordredjerryz: there is something that could be added to devstack to deal with this, and I'll add that to my todo list03:00
mordredbut you're lucky enough to have hit a strange corner case03:00
jerryzmine is managed by nodepool, i believe it will clean up used ones03:00
mordredoh! well that's a whole other thing03:03
jerryzmordred: any more info needed to debug this?03:07
*** dkliban has quit IRC03:07
mordredjerryz: honestly, I'm kinda stumped as to how that could happen if that is a completely fresh node03:08
mordredjerryz: and it's 11pm here, so I'm probably not going to dig in too much right now03:08
mordredjerryz: I'll try to figure out what's going on when I wake up03:08
jerryzmordred: ok. thanks. night03:09
*** sarob has quit IRC03:11
*** sarob has joined #openstack-infra03:12
*** matsuhashi has quit IRC03:15
*** sarob has quit IRC03:16
*** dkranz has joined #openstack-infra03:29
*** marun has quit IRC03:37
*** marun has joined #openstack-infra03:38
*** nati_ueno has quit IRC03:38
*** dguitarbite has quit IRC03:42
*** ryanpetrello has joined #openstack-infra03:42
pleia2hey, look at that, they link to our gerrit :)03:47
clarkbyup :)03:48
*** Ryan_Lane has quit IRC03:48
clarkbthose of you that are twittery should twitter the benefits of gerrit03:49
Alex_Gaynorgrumble, the rate of gate resets is resulting in starving the check pipeline03:49
clarkbAlex_Gaynor: yup03:49
clarkbAlex_Gaynor: should help03:50
*** marun has quit IRC03:50
clarkbI won't get to fixing it tonight, anyone else is welcome to03:50
clarkb(basically run tests in check on the other cloud)03:50
Alex_Gaynorclarkb: redundant array of independent clouds!03:51
*** marun has joined #openstack-infra03:51
hub_capmordred: promise im making progress on the new cli tool. ive got maybe ~2 days of work to go03:52
*** marun has quit IRC03:56
*** marun has joined #openstack-infra03:56
*** matsuhashi has joined #openstack-infra03:56
*** basha has joined #openstack-infra04:06
lifelessclarkb: hey, how do you get uber receipts into HP's system ?04:13
clarkblifeless: I have never had to do it for HP... I use it in seattle for personal things04:13
pleia2lifeless: I save the email receipt as pdf04:13
*** jerryz has quit IRC04:14
lifelesspleia2: ah yeah, print-to-pdf04:15
*** AlexF has joined #openstack-infra04:16
*** CaptTofu has quit IRC04:16
*** CaptTofu has joined #openstack-infra04:17
*** AlexF has quit IRC04:21
*** SergeyLukjanov has joined #openstack-infra04:31
*** AlexF has joined #openstack-infra04:31
*** basha has quit IRC04:32
*** reed_ has quit IRC04:37
*** basha has joined #openstack-infra04:38
*** sarob has joined #openstack-infra04:38
*** AlexF has quit IRC04:42
*** AlexF has joined #openstack-infra04:43
*** sarob has quit IRC04:44
*** ericw has quit IRC04:45
*** jerryz has joined #openstack-infra04:46
*** ericw has joined #openstack-infra04:48
*** odyssey4me has joined #openstack-infra04:50
*** basha has quit IRC04:52
*** boris-42 has joined #openstack-infra04:53
*** odyssey4me has quit IRC04:54
*** odyssey4me has joined #openstack-infra04:55
mordredclarkb: nice!04:55
*** DennyZhang has joined #openstack-infra04:56
*** sarob has joined #openstack-infra04:57
Alex_Gaynorwatching the gate today has been so sad04:59
Alex_GaynorHead of the gate was approved 10.5 hours ago :(04:59
mordredAlex_Gaynor: yeah. it's been a bad couple of days for that05:02
Alex_Gaynormordred: sadly I can't think of any sane approach to improving it besides "fix the bugs in tempest / <projects>"05:02
mordredAlex_Gaynor: yeah. well, did you see my terrible idea earlier (or combo of ideas)05:02
Alex_Gaynormordred: No, I missed it05:02
mordredAlex_Gaynor: disable the two bad tests in the normal runs, make a run that does run those tests - and on every neutron change, run 10 copies of that05:03
mordredthat way, most of the gate is fine, but neutron has to fix the bugs before anything else will land for them05:03
Alex_Gaynormordred: I... I kind of love it (assuming we're sure neutron is at fault)05:03
*** sarob has quit IRC05:04
mordredthe bad ones only happen when neutron is enabled05:04
Alex_Gaynormordred: probably the neutron core reviewers should also stop approving other patches05:04
mordredthen - once we've cleaned up the top reset offenders05:04
mordredadd a fanout run to every change which runs 5 copies of the neutron tests for everybody05:05
mordredit would explode node usage a bit, but I'm _guessing_ not as bad as all the resets05:05
Alex_GaynorPossibly we need to think of a more general approach to dealing with non-determinism in tests.05:06
mordredonly systemic way I can think of is running tests multiple times05:06
mordredto try to increase the odds of tripping non-deterministic things on their way in05:06
Alex_GaynorThe other issue is that non-determinism sometimes doesn't look like it's caused by a patch, even if it is, so people just recheck until it manages to land, even though it's exacerbating a problem05:07
Alex_GaynorI don't know how to address that.05:07
mordredwell, recheck itself is a bandaid05:07
clarkbya thats a big problem I think05:07
clarkbpush until it goes in just adds more badness05:08
mordredthat's there to deal with non-deterministic tests05:08
clarkbright but it feeds it too05:08
Alex_Gaynormaybe the system should handle reverifies with exponential backoff, to prevent a patch that really almost never passes. or something.05:09
mordredif we could figure out a better way to block flaky tests (such as parallel copies, or something better)05:09
mordredthen we could make recheck/reverify go away05:09
Alex_Gaynorright, making them go away would be ideal05:09
mordredand save that feature for only things that infra triggers, such as "the internet exploded"05:09
Alex_GaynorI wonder if the number of nodes we're spawning and shutting down produce a noticeable blip for people at RS/HP observing. Probably not I guess05:11
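A minimal sketch of the exponential backoff Alex_Gaynor suggests for reverifies: each repeated attempt on the same patch has to wait twice as long as the previous one, up to a cap. The function name, the 60-second base and the one-hour cap are assumptions for illustration; nothing like this is confirmed to exist in Zuul or Gerrit.

```python
def reverify_delay(attempt, base_seconds=60, cap_seconds=3600):
    """Seconds to wait before the given reverify attempt (1-based).

    The delay doubles with every attempt and never exceeds cap_seconds.
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return min(cap_seconds, base_seconds * 2 ** (attempt - 1))


if __name__ == "__main__":
    for attempt in range(1, 9):
        print(attempt, reverify_delay(attempt))
```

A patch that passes on the first or second try barely notices the delay, while one that almost never passes is quickly pushed to the cap instead of repeatedly resetting the gate.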
*** afazekas_zz has quit IRC05:20
*** AlexF has quit IRC05:21
* mordred likes to think that both clouds have dedicated ops teams who just watch our activity and marvel05:22
jerryzmordred: could you tell me how package version number is calculated? i got variations of version numbers for  oslo.config  when doing pip install -e . locally05:22
mordredjerryz: yes, it's very similar to how git describe works05:23
mordredif the current commit is tagged, then that is the version05:23
*** nicedice has quit IRC05:23
mordredif the current commit is not tagged, then the version is $next_version.a$number_of_commits_since_last_tag.g$git_short_sha05:23
mordredwhere next_version is the version in setup.cfg05:24
mordredthis is how the version is calculated for the server repos and for the oslo code05:24
mordredfor library code, it's different (and slightly easier)05:24
mordredjerryz: so _currently_ oslo.config master should be showing you:05:25
mordredmordred@camelot:~/src/openstack/oslo.config$ python --version05:25
mordredif you're not seeing that, then my guess would be perhaps you're not fetching tags?05:26
jerryzif my oslo.config code base is synced from upstream , which is review.o.o or github, the tag 1.2.1 should be already in the code05:27
jerryzwhy i still get 1.2.0.**** if i install from a git clone from my private repo that is synced with upstream05:27
*** cthulhup has joined #openstack-infra05:28
*** SergeyLukjanov has quit IRC05:29
*** cthulhup has quit IRC05:29
mordredthe only other thing is - if the repo was used before, the version calculation is cached in the egg-info dir05:31
mordredwhen you say "if i install from a git clone from my private repo that is synced with upstream" - how are you syncing your private repo?05:31
mordredjerryz: actually, funny story - look at the most recent commit to oslo.config05:33
mordredand the commit message05:33
mordredit seems this was a problem for us back on Sunday05:33
*** ericw has quit IRC05:33
*** odyssey4me has quit IRC05:36
jerryzmordred: it seems that when syncing the upstream to private repo, i didn't push tags05:37
mordredphew. well, that at least explains it!05:37
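The version scheme mordred describes in this thread can be sketched as a small function. This is not the actual packaging code; it only mirrors the rule as stated in the conversation. The function name, its parameters and the sample values (commit count, short sha) are hypothetical.

```python
def describe_version(next_version, commits_since_tag, short_sha, tag=None):
    """Build a version string following the rule stated in the chat.

    If the current commit carries a tag, the tag is the version.
    Otherwise the version is next_version.a<commits>.g<short sha>,
    where next_version is the value from setup.cfg.
    """
    if tag is not None:
        return tag
    return f"{next_version}.a{commits_since_tag}.g{short_sha}"


if __name__ == "__main__":
    # Tagged commit: the tag wins.
    print(describe_version("1.2.0", 0, "abc1234", tag="1.2.1"))
    # Untagged commit, five commits past the last tag (made-up values).
    print(describe_version("1.2.0", 5, "abc1234"))
```

Because the result depends on which tags the clone can see, a mirror that was synced without pushing tags produces a different version string from the same commit, which matches what jerryz ran into.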
*** afazekas has joined #openstack-infra05:41
*** SergeyLukjanov has joined #openstack-infra05:42
*** SergeyLukjanov has quit IRC05:44
*** ryanpetrello has quit IRC05:44
*** ryanpetrello has joined #openstack-infra05:45
*** Ryan_Lane has joined #openstack-infra05:46
*** Ryan_Lane has quit IRC05:46
*** nati_ueno has joined #openstack-infra05:56
*** DennyZhang has quit IRC06:03
*** marun has quit IRC06:06
*** davidhadas_ has quit IRC06:06
*** amotoki has joined #openstack-infra06:15
*** yolanda has joined #openstack-infra06:15
*** afazekas_ has joined #openstack-infra06:16
*** afazekas_ has quit IRC06:17
*** jhesketh has quit IRC06:20
*** jhesketh__ has quit IRC06:20
*** jhesketh_ has joined #openstack-infra06:20
*** yongli_away is now known as yongli06:26
*** slong has quit IRC06:29
*** jhesketh has joined #openstack-infra06:34
*** shardy_afk is now known as shardy06:38
*** odyssey4me has joined #openstack-infra06:55
*** Ryan_Lane has joined #openstack-infra06:57
*** Ryan_Lane has quit IRC07:01
openstackgerritRongze Zhu proposed a change to openstack-infra/gitdm: Add two employees to UnitedStack
*** hashar has joined #openstack-infra07:20
*** Ryan_Lane has joined #openstack-infra07:21
ttxfungi: (to solve exclusionary reqs) if you except pep8 those seem to come from ceilometer and swift, but those two projects weren't in the gate in stable/folsom times, so i'm not sure why we would consider them ?07:22
*** hashar_ has joined #openstack-infra07:25
*** hashar has quit IRC07:25
*** hashar_ is now known as hashar07:25
*** fbo_away is now known as fbo07:25
*** hashar has quit IRC07:25
*** hashar has joined #openstack-infra07:26
*** Ryan_Lane has quit IRC07:29
*** flaper87|afk is now known as flaper8707:32
*** mrda has quit IRC07:42
*** tvb|afk has joined #openstack-infra07:43
*** jcoufal has joined #openstack-infra07:45
*** yassine has joined #openstack-infra07:47
*** basha has joined #openstack-infra07:47
*** mrda has joined #openstack-infra07:49
*** basha has quit IRC07:49
*** jcoufal has quit IRC07:49
*** boris-42 has quit IRC07:50
*** SergeyLukjanov has joined #openstack-infra07:53
*** Ryan_Lane has joined #openstack-infra07:56
*** Ryan_Lane has quit IRC08:01
*** mrda has quit IRC08:07
*** SergeyLukjanov has quit IRC08:09
*** dizquierdo has joined #openstack-infra08:10
*** jcoufal has joined #openstack-infra08:13
*** SergeyLukjanov has joined #openstack-infra08:13
*** thomasbiege1 has joined #openstack-infra08:16
*** thomasbiege1 has quit IRC08:19
*** DinaBelova has joined #openstack-infra08:22
*** Ryan_Lane has joined #openstack-infra08:27
*** nati_ueno has quit IRC08:28
*** Ryan_Lane has quit IRC08:31
*** johnthetubaguy has joined #openstack-infra08:31
*** mancdaz has quit IRC08:33
*** dizquierdo has quit IRC08:33
*** derekh has joined #openstack-infra08:34
*** mancdaz has joined #openstack-infra08:35
*** jerryz has quit IRC08:41
*** DinaBelova has quit IRC08:43
*** tvb|afk has quit IRC08:44
*** tvb|afk has joined #openstack-infra08:44
*** tvb|afk is now known as tvb08:44
*** locke105 has quit IRC08:49
*** locke105 has joined #openstack-infra08:50
openstackgerritPavel Sedlák proposed a change to openstack-infra/jenkins-job-builder: KeepLongStdio argument for JUnit publisher
*** samalba has quit IRC08:52
*** samalba has joined #openstack-infra08:53
*** jcoufal has quit IRC08:55
*** Ryan_Lane has joined #openstack-infra08:57
*** Ryan_Lane has quit IRC09:02
*** boris-42 has joined #openstack-infra09:05
*** tvb is now known as Tristan_09:10
*** Tristan_ is now known as Guest7765609:11
*** Guest77656 is now known as tvb09:11
*** dizquierdo has joined #openstack-infra09:15
*** Ryan_Lane has joined #openstack-infra09:27
*** Ryan_Lane has quit IRC09:32
openstackgerritJaroslav Henner proposed a change to openstack-infra/jenkins-job-builder: Add dynamic string and choice params.
*** Ryan_Lane has joined #openstack-infra09:58
*** Ryan_Lane has quit IRC10:02
*** hashar has quit IRC10:04
*** hashar has joined #openstack-infra10:10
*** hashar has quit IRC10:14
*** AlexF has joined #openstack-infra10:16
*** kmartin has quit IRC10:17
*** fifieldt has quit IRC10:28
*** tvb has quit IRC10:28
*** Ryan_Lane has joined #openstack-infra10:29
*** DinaBelova has joined #openstack-infra10:30
*** dkehn_ has joined #openstack-infra10:31
*** dkehn has quit IRC10:31
*** hashar has joined #openstack-infra10:31
*** Ryan_Lane has quit IRC10:33
*** DinaBelova has quit IRC10:33
*** hashar has quit IRC10:36
*** thomasbiege1 has joined #openstack-infra10:40
*** matsuhashi has quit IRC10:52
*** yaguang has quit IRC10:56
*** tvb has joined #openstack-infra10:59
*** tvb has quit IRC10:59
*** tvb has joined #openstack-infra10:59
*** Ryan_Lane has joined #openstack-infra10:59
*** Ryan_Lane has quit IRC11:03
*** tvb has quit IRC11:07
*** thomasbiege1 has quit IRC11:09
*** johnthetubaguy has quit IRC11:10
*** AlexF has quit IRC11:10
openstackgerritJaroslav Henner proposed a change to openstack-infra/jenkins-job-builder: Add dynamic string and choice params.
BobBallmordred: when you're around could you let me know?  I want to understand what sort of stats you think would be useful to show that smokestack's -1's are stable to feed into the discussion of whether they can be upgraded to -2?11:14
*** AlexF has joined #openstack-infra11:14
*** tvb has joined #openstack-infra11:20
*** tvb has quit IRC11:20
*** tvb has joined #openstack-infra11:20
*** thomasbiege1 has joined #openstack-infra11:24
*** thomasbiege3 has joined #openstack-infra11:27
*** thomasbiege1 has quit IRC11:27
*** tvb has quit IRC11:28
*** Ryan_Lane has joined #openstack-infra11:30
*** tvb has joined #openstack-infra11:30
sdagueBobBall: if it's not run by CI team, it really can't be -211:30
*** giulivo has joined #openstack-infra11:31
sdaguewe can't have an external entity have the ability to have an infrastructure fail then break the gate for everyone, we've got enough challenges with infrastructure we control doing that11:31
*** shardy is now known as shardy_afk11:32
BobBallI'm referring to the discussion which finished with - of course, the infra team needs the ultimate authority and the revocation of -2 privs easily solves that11:32
BobBalljust like the "ultimate" sanction of moving a job from voting to non-voting11:33
BobBalldoesn't really need any work from the infra team to fix it, but ensures that the team responsible for the job/etc will fix it before being considered for the privilege again11:33
*** thomasbiege3 has quit IRC11:33
sdagueok, sorry, different thread I was thinking about11:34
*** Ryan_Lane has quit IRC11:34
BobBallI think it's the same thread - but my starting suggestion was unworkable and I completely understand why that was now!11:34
sdagueso I think the stat mordred actually wants there is how often is someone ignoring a -1 from smokestack11:35
BobBallBasically what I think would be useful is for SS to run in parallel to the gate and post a -2 vote if it completes its testing and finds a failure in the tests (we've specifically only included test-failures in voting - so if a packaging failure occurs, it doesn't post)11:36
BobBallif the gate finishes first, then tough, SS doesn't get a chance to say whether it thinks a patch works or not11:36
BobBall*nod* - I've got those stats11:36
BobBallbut I want to get more details because I think there are other useful things11:36
BobBallohhh useful query11:37
BobBallI was doing it through SSH11:37
*** thomasbiege has joined #openstack-infra11:37
sdagueso it's only happened twice this year on nova11:37
BobBallI'll have to look into those two11:38
BobBallbut they were way before the automatically posting / packaging fixes that changed the SS workflow11:38
sdaguethe first one, smokestack was broken (January)11:38
sdagueBobBall: sure11:38
sdaguebut that's even more indication that there is no need for SS to have -211:38
BobBallbut I'm also interested in the stats about how regularly SS had posted before jenkins returned11:38
openstackgerritDarragh Bailey proposed a change to openstack-infra/jenkins-job-builder: Add repo scm
sdagueright, but it will still post a -1 even if we went to merge11:39
BobBallthink so, yes11:39
sdagueunless you did something very magical11:40
BobBallheh :)11:40
*** CaptTofu has quit IRC11:40
sdaguewe get jenkins check results after we're in the gate11:40
*** CaptTofu has joined #openstack-infra11:40
sdague is the only override in the last 6 months11:40
BobBallWhat do you mean by override?11:41
sdaguethe only time we merged a change that SmokeStack had a -1 on11:41
BobBalloh, yes11:41
sdagueso I think you are trying to solve a problem that doesn't exist :)11:42
openstackgerritDarragh Bailey proposed a change to openstack-infra/jenkins-job-builder: Add repo scm
BobBalldepends on what the problem really is :)11:42
sdagueok, what do you think the problem is? maybe I don't understand11:42
BobBallFrom my perspective we've got a system that can be used to gate changes to prevent breakages to the XenAPI driver11:43
BobBallthat's the criteria for being a "Group A" hypervisor11:43
sdaguebut it's already doing it11:43
sdaguewe only had 1 override in the last 6 months11:44
BobBalland while I'm working hard on getting XenServer tested in the gate properly, there have been lots of hiccups along the way11:44
BobBallnah, it's already "functional testing provided by an external system that does not gate commits"11:44
*** matsuhashi has joined #openstack-infra11:44
sdagueso this is really just about moving from B -> A state?11:44
*** thomasbiege has quit IRC11:44
sdaguenot actually about keeping the breaks out of the tree?11:45
BobBallGroup A is about a system that ensures the breaks are kept out - rather than relying on the reviewers11:45
sdaguefrom a code perspective, the problem is already solved11:45
sdaguewe rely on reviewers for all sorts of things, especially as we don't have 400% test coverage11:46
sdagueand the reviewers aren't failing us here11:46
sdague1 override in 6 months is not a real failure rate11:47
sdagueso you are trying to fix a problem that doesn't exist11:47
sdagueif the override rate was twice a day, I'd agree with you11:47
BobBallSo is your view that group A and B should really be considered the same thing because an automated process and manual process-that-works are as good as each other11:47
sdaguethey are different, because group B isn't being run by the project. So if entity X that is running external CI stops, the project can do nothing about it.11:48
BobBallSo you think that A needs to be integrated with the gate and B is external irrespective of whether it "gates" or not11:49
sdaguerealize "has -2" requires that it is "run by the CI team"11:49
sdagueI think that's where the definition might not have been clear. To be group A I really think it needs to be run by infrastructure team for OpenStack. I don't see another way we could do that.11:50
BobBallI thought the discussion we had last month suggested that the -2 privs could be given to an external system because it's easy enough for the CI team to revoke those privs if they ever break the gate11:50
sdagueI didn't think that was suggested11:51
sdagueI'm -2 on the idea of non infra run systems having -2 on integrated projects11:51
*** dims has joined #openstack-infra11:51
BobBall*grin* That was my suggestion, but I thought mordred's suggestion to talk about it again when SS was proving its stability with automated -1's meant that possibility was open :)11:53
*** AlexF has quit IRC11:53
sdaguemy reading of that is wanting to see how often the override was a problem was to make it clear there was nothing wrong with being only a -1 job11:54
sdaguebecause the -1 has been respected 99.99% of the time11:54
*** SergeyLukjanov has quit IRC11:55
sdagueI'll let him speak for himself when he gets up though :)11:55
sdaguebut that's my take11:55
*** pcm_ has joined #openstack-infra11:56
sdaguefungi when you get up, I had a question on job definition11:59
sdaguemostly around neutron jobs11:59
*** Ryan_Lane has joined #openstack-infra12:00
*** matsuhashi has quit IRC12:01
*** afazekas is now known as afazekas_food12:01
*** AlexF has joined #openstack-infra12:02
*** SergeyLukjanov has joined #openstack-infra12:05
*** adalbas has joined #openstack-infra12:05
dimshi, looking at zuul page none of the "check" jobs seem to have a progress bar. they are marked "queued" . Do the gate jobs take precedence and check jobs will wait for their turn? or is there some other problem?12:07
openstackgerritSean Dague proposed a change to openstack-infra/config: add gate-tempest-devstack-vm-neutron-pg job
sdaguedims: gate takes priority12:07
sdagueso yes, check queue is starved right now12:07
*** Ryan_Lane has quit IRC12:07
dimssdague, thanks!12:08
sdaguebasically before nnfi the gate would be sitting in a hold until the gate failure was resolved, so the check jobs would run in and grab all the devstack nodes12:08
sdaguebut now because the gate throughput is up, they are grabbing every resource12:08
dimsmakes sense12:08
sdagueand because of the neutron race which is killing most jobs, that's kind of problematic12:09
sdaguefungi / jeblair: check queue is now > 150, so bursting would be nice :)12:09
*** matsuhashi has joined #openstack-infra12:10
*** matsuhashi has quit IRC12:11
*** flaper87 is now known as flaper87|afk12:12
*** AlexF has quit IRC12:14
*** thomasm has joined #openstack-infra12:17
*** thomasbiege has joined #openstack-infra12:20
*** AlexF has joined #openstack-infra12:20
*** hashar has joined #openstack-infra12:21
*** ArxCruz has joined #openstack-infra12:21
*** thomasbiege has quit IRC12:22
*** flaper87|afk is now known as flaper8712:22
*** dims has quit IRC12:22
*** dims has joined #openstack-infra12:23
*** weshay has joined #openstack-infra12:28
*** acabrera has joined #openstack-infra12:29
*** acabrera is now known as alcabrera12:29
*** tvb has quit IRC12:30
*** matsuhashi has joined #openstack-infra12:35
*** tvb has joined #openstack-infra12:35
*** Ryan_Lane has joined #openstack-infra12:36
*** dkliban has joined #openstack-infra12:36
*** jhesketh has quit IRC12:37
*** jhesketh_ has quit IRC12:37
*** Ryan_Lane has quit IRC12:40
ttxfungi: finally fixed bug 116027712:44
uvirtbotLaunchpad bug 1160277 in openstack-ci "Groups have similar names in LP and gerrit but are no longer synced" [Medium,Fix released]
ttxfungi: while looking at the groups list in gerrit though, I found a few groups that are probably useless and should be removed:12:44
ttxfungi: empty copy of the LP "heat" group:,members12:44
*** afazekas_food has quit IRC12:45
ttxhmm, that's all.12:45
*** johnthetubaguy has joined #openstack-infra12:47
*** basic` has joined #openstack-infra12:47
fungisdague: what's your job definition question?12:48
fungittx: yeah, i try to empty and set unused groups non-visible12:48
fungigerrit doesn't have a "delete group" feature12:48
sdaguefungi: can we specify the same job twice on a zuul run12:48
ttxfungi: ah. ah.12:48
fungittx: eventually i'll get around to determining how to construct a query which identifies an empty group and removes all traces of it from the various tables it might appear in12:49
fungisdague: i don't think we've tried, so not entirely sure12:49
fungiback to the "run neutron tempest 10x for neutron jobs" idea presumably12:50
*** jhesketh has joined #openstack-infra12:50
fungier, for neutron changes12:50
*** jhesketh_ has joined #openstack-infra12:50
fungilemme see if a duplicate entry horks up the layout.yaml parser at least12:50
fungittx: i didn't get as far as the nova requirements sync in folsom yesterday, ran into some more corner cases, but did get the patches for openstack/requirements on folsom and grizzly with the capped list including all transitive dependencies for all integrated projects on that branch...,n,z12:52
fungittx: steps i'm following are described at along with some details on manual conflict resolution between some of the projects' requirements lists12:53
fungithe changes to the requirements project may need some more massaging since i crudely backported a couple changes from master to rename/combine the lists there12:54
*** rfolco has joined #openstack-infra12:55
ttxfungi: did you see my questions above about the need to care about ceilometer in stable/folsom at all ?12:59
*** crank has quit IRC13:01
fungittx: haven't hit the scrollback yet, but will look13:02
ttx(that was answering your question on how to solve conflicting reqs)13:03
fungilooks like removing them will solve the anyjson conflict at least13:03
*** zul has quit IRC13:03
ttxfungi: also was wondering about swift since they were not in the gate in those ancient folsom times13:04
ttxignoring both would solve all conflicts13:04
ttxexcept pep813:04
*** dkehn_ is now known as dkehn13:04
fungiso it would13:04
*** julim has joined #openstack-infra13:04
*** ericw has joined #openstack-infra13:05
*** tizzo has joined #openstack-infra13:06
*** Ryan_Lane has joined #openstack-infra13:06
*** davidhadas_ has joined #openstack-infra13:06
*** dprince has joined #openstack-infra13:07
*** zul has joined #openstack-infra13:07
fungithough the versions i settled on to resolve those other conflicts are basically still the right one after factoring swift out of folsom13:07
ttxok then :)13:08
*** dizquierdo has left #openstack-infra13:09
*** Ryan_Lane has quit IRC13:11
*** HenryG has joined #openstack-infra13:11
*** xchu has quit IRC13:11
sdaguefungi: well at least run neutron more than once13:11
sdagueright now it's way too easy for a race to come through13:12
ekarlsoany of you familiar with disk image builder ?13:12
sdagueso running 2x neutron and 2x neutron-pg would make it closer to other projects in how easy it is to slip a change through13:12
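
The reasoning behind running the neutron jobs more than once can be sketched with a little arithmetic. This is only an illustration: it assumes each job run fails independently with a fixed probability, and the 20% failure rate and the function name are hypothetical, not measurements from the gate.

```python
def slip_through_probability(fail_rate: float, runs: int) -> float:
    """Chance that a racy change passes every one of `runs` independent job runs.

    fail_rate is the (assumed) probability that a single run trips the race.
    """
    if not 0.0 <= fail_rate <= 1.0:
        raise ValueError("fail_rate must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must not be negative")
    return (1.0 - fail_rate) ** runs


if __name__ == "__main__":
    # Hypothetical 20% per-run failure rate for a change that introduces a race.
    for runs in (1, 2, 4):
        chance = slip_through_probability(0.2, runs)
        print(f"{runs} run(s): {chance:.0%} chance the race merges unnoticed")
```

Under that assumption, each extra run multiplies the chance of a race slipping through by the single-run pass rate, which is why doubling the jobs narrows the window.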
sdagueekarlso: you probably want #tripleo13:13
*** nosnos has quit IRC13:15
*** mriedem has joined #openstack-infra13:18
*** HenryG has quit IRC13:19
*** HenryG has joined #openstack-infra13:19
*** salv-orlando has joined #openstack-infra13:20
*** crank has joined #openstack-infra13:20
*** prad_ has joined #openstack-infra13:23
sdaguefungi: so check queue is at 170 and growing because of the gate starvation, which is actually making folks jump the check queue, hence making the gate worse (at least a couple non Jenkins +1ed changes over in there)13:25
sdagueany idea how we can alleviate this?13:25
dansmithyeah, my thing from yesterday still hasn't run check, after 15h13:25
*** afazekas has joined #openstack-infra13:26
ttxsdague: needs a slightly smarter prioritization algorithm, I fear13:27
sdaguettx: the reality is we'll just move the pain around13:27
sdaguettx: but I agree13:27
sdagueclarkb and jeblair were working on this last night, but I guess no progress, and I don't think they realized quite how bad it was13:28
dansmithyeah, my thing from yesterday is critical, so it just got +A'd since jenkins never voted on it13:28
ttxsdague: at some point going faster just makes you go slower. This is a complex system :)13:28
*** matty_dubs|gone is now known as matty_dubs13:28
fungiit looks like we're starved on devstack slaves, so adding more unit test slaves isn't going to help13:28
sdaguefungi: yeh, this is all devstack starvation13:29
sdaguealso, given that stable/grizzly is bust, that's not helping either13:30
*** bnemec_ is now known as beekneemech13:30
sdagueas those are guaranteed resets right now13:30
sdaguethat's how we just lost the gate13:31
*** yassine has quit IRC13:31
fungisomeone approved a grizzly change?13:31
*** yassine has joined #openstack-infra13:31
fungithe list of people able to do stable branch approvals is small--we should at least tell those people to cut it out until grizzly is fixed13:31
sdaguewell, 8hrs ago they did13:32
sdagueit took 8hrs for that to get to the top of the gate, fwiw13:32
fungi,members plus,members13:35
*** Ryan_Lane has joined #openstack-infra13:36
*** johnthetubaguy1 has joined #openstack-infra13:39
*** johnthetubaguy has quit IRC13:40
*** Ryan_Lane has quit IRC13:41
*** CaptTofu has quit IRC13:44
sdaguefungi: any idea where the scheduling config is in zuul, so we could at least unstarve check?13:44
*** CaptTofu has joined #openstack-infra13:44
*** dcramer_ has quit IRC13:45
*** guohliu has joined #openstack-infra13:46
fungisdague: in zuul's layout.yaml, within entries in the pipelines section there are precedence parameters13:46
fungiwe could, for example, put gate and check back on equal footing that way13:47
fungiso that the gate will take 2-3x as long to clear as it is now13:47
fungiwe can't currently set proportional shares or anything though (to say 75% of available resources go to gate jobs and 25% go to check jobs)13:48
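
The effect of pipeline precedence on a shared node pool can be sketched with a toy model. This is not zuul's scheduler: the function name, the integer precedence values, and the arrival numbers are all assumptions chosen only to show why strict precedence starves the lower pipeline while equal precedence falls back to arrival order.

```python
import heapq


def run_scheduler(precedence, arrivals, nodes_per_tick):
    """Toy model: hand free nodes to waiting jobs, lowest precedence value first.

    precedence     -- dict of pipeline name -> int (smaller is served first)
    arrivals       -- list with one dict per tick: pipeline name -> new jobs
    nodes_per_tick -- nodes that become free each tick
    Returns a dict of pipeline name -> number of jobs that got a node.
    """
    waiting = []  # heap of (precedence, arrival sequence, pipeline)
    started = {name: 0 for name in precedence}
    seq = 0
    for tick_arrivals in arrivals:
        for pipeline, count in tick_arrivals.items():
            for _ in range(count):
                heapq.heappush(waiting, (precedence[pipeline], seq, pipeline))
                seq += 1
        for _ in range(nodes_per_tick):
            if not waiting:
                break
            _, _, pipeline = heapq.heappop(waiting)
            started[pipeline] += 1
    return started


if __name__ == "__main__":
    # Hypothetical load: both pipelines ask for 4 nodes per tick, only 4 free up.
    load = [{"gate": 4, "check": 4}] * 10
    print(run_scheduler({"gate": 0, "check": 1}, load, 4))  # gate ahead of check
    print(run_scheduler({"gate": 1, "check": 1}, load, 4))  # equal footing
```

With gate ahead of check, the gate consumes every node as long as it has demand. On equal footing the two pipelines split the nodes, which is also why the gate would take roughly twice as long to clear in this model.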
openstackgerritSean Dague proposed a change to openstack-infra/config: make check queue high priority
sdaguefungi: yeh, equal priority I think would be the right call13:49
sdaguethe gate's really not merging much code right now anyway because of the resets13:50
sdagueand debug fixes to get to the bottom of those issues, are blocked on check, and not getting feedback13:50
fungias to your earlier question about multiple instances of the same job for a given project+pipeline, i did confirm that doesn't fail the layout parsing check but still no idea what zuul would do with it13:52
*** CaptTofu has quit IRC13:52
sdaguefungi: ok, well we can ponder that one later :)13:52
*** CaptTofu has joined #openstack-infra13:52
sdagueso what do you think about leveling the queues? per -
*** yassine has quit IRC13:53
*** yassine has joined #openstack-infra13:53
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Temporarily raise check pipeline precedence
fungioh, you wrote one already13:56
sdaguefungi: yeh :)13:56
dhellmanngood morning13:56
dhellmannsdague: it sounds like there are still issues with stable/grizzly because of the cliff change and quantumclient. I'm thinking of just releasing a cliff that doesn't use pyparsing at all, to remove the conflict.13:58
sorenHm. I'm trying to use jenkins-job-builder, but my Jenkins has CSRF enabled and python-jenkins doesn't seem to support that. How have you worked around it for the OpenStack Jenkins?13:58
*** julim has quit IRC13:58
fungisdague: abandoned mine, +2'd yours. i expect jeblair will be waking up any time so let's get his input on it13:58
sdaguedhellmann: that would be awesome13:58
dhellmannsdague: ok, I'll get back to work on that, then.13:59
sdaguefungi: ok13:59
*** julim has joined #openstack-infra13:59
fungisoren: good question... where is the csrf option in jenkins? i'll check whether we set it (we don't really use the webui enough to worry about that)14:00
sorenfungi: I just found
sorenfungi: ...which says not to enable CSRF.14:00
fungii suppose that would do it14:00
fungiwell, again, if you treat its http interface as an api endpoint only and don't use it for browsery clicky-clicky things, it's not particularly scary14:01
*** shardy_afk is now known as shardy14:02
fungiyour api client is not going to be following links from other sites (one would hope)14:02
fungithis mostly underscores the need for jenkins to separate its web interface and its api endpoint14:03
*** anteaya has joined #openstack-infra14:03
fungialso, when i do need to connect into any sort of web interface as an admin, i use an entirely separate browser to log into that and only that, but thankfully most of the stuff we administer doesn't require a webui14:05
*** Ryan_Lane has joined #openstack-infra14:07
fungii wonder if could be leveraged for that more recently14:08
openstackgerritFelipe Reyes proposed a change to openstack-infra/jenkins-job-builder: Added support for Git shallow clone parameter
*** mrodden has joined #openstack-infra14:10
*** Ryan_Lane has quit IRC14:11
*** rnirmal has joined #openstack-infra14:13
dhellmannsdague: I'm trying to think of a plan for testing a new cliff release without actually releasing it and potentially causing more things to break. Any ideas?14:15
sdagueif we had spare gate time, I would. But as that is all starved... I don't know14:16
sdaguewe could make a requirements proposed change with a tarball link14:16
dhellmannI can run tests locally, I'm just trying to reason through what I would need to do14:16
dhellmannoh, that's interesting14:16
sdaguethat would at least test master14:16
*** dizquierdo has joined #openstack-infra14:17
dhellmannI'm assuming if I remove the pyparsing requirement from cliff, the one in stable/grizzly will be useless but not have a conflict14:17
dhellmannso stable/grizzly will think it needs a version of pyparsing that nothing will import14:17
sdagueright, so it won't wedge in stable/grizzly14:17
sdagueI think that's right14:17
sdaguehonestly, I'm only about 1/2 way down the rabbit hole on that one, as I thought others were working it14:18
dhellmanncan I point the requirements file at a git URL? that would make it easy for me to test locally14:18
dhellmannme, too14:18
dhellmannI thought it was just a matter of removing that dependency, but apparently it's hard to get to the quantumclient part of the repo and do a release or something14:18
sdaguedhellmann: yeh, you can change the repos for devstack14:18
sdaguein localrc14:18
sorenfungi: csrf isn't about how *you* use the web ui, after all.14:18
sdagueeither alt url, or alt branch14:18
dhellmannsdague: no,  I mean have the global requirements point to git for cliff14:19
sorenfungi: It's about how your browser can be tricked into using it.14:19
sdaguedhellmann: I don't remember if it can point to a git14:19
sdaguebut it can do a tarball, like oslo does14:19
dhellmannok, I can make a local sdist14:19
sorensdague: You can point pip at a git url.14:19
fungisoren: yep. not logging authenticating to the jenkins administrative webui with your browser is a great way to thwart that14:19
sorensdague: git+
fungier, not authenticating14:20
*** tvb has quit IRC14:20
sdaguesoren: ok, except I'm not sure we propagate those via our global requirements sync14:20
sdagueI know we do the oslo tar case14:20
sorensdague: Sorry, I replied entirely out of context. :)14:20
*** KennethWilke has joined #openstack-infra14:20
sdagueyep, no worries :)14:20
sorenfungi: Jenkins seems less useful if you never look at it :)14:21
sdagueit's good to know though, probably something worth looking to add to our reqs sync14:21
fungisoren: but yeah, having an automation-friendly means of authenticating to the api endpoint entirely separate from browser handling14:21
fungisomething it lacks14:21
fungisoren: probably the other reason we don't need to authenticate to it often is that we have it set up with anonymous read access enabled, so as long as you're not changing things through the webui you don't need to log into it14:22
jd__huhu, today ETA for a Ceilometer patch merge seems to be around 8 hours, FWIW14:23
fungijd__: yeah, we're proposing slowing that down further ;)14:23
*** datsun180b has joined #openstack-infra14:23
jd__if that improves quality even further I wouldn't mind14:23
jd__I prefer to wait 8 hours for a merge than spending my days doing rechecks :-)14:24
sdaguejd__: the gate's at about 8 hrs merge time right now because of all the resets14:24
sorenfungi: Ah, good point. Mine's set up to always require authentication.14:24
sdaguehowever, the check queue is currently starved, so nothings moved there for the last 15 hrs14:24
jd__sdague: ah I didn't know there has been reset, cool then14:24
*** wchrisj_ has joined #openstack-infra14:24
sdaguejd__: not a zuul reset14:24
sdaguefails by stuff in the gate14:24
jd__oh I see14:25
sdaguethe gate failure rate is really high14:25
jd__the new tree stuff ?14:25
sdagueno, bugs in openstack14:25
*** amotoki has quit IRC14:26
fungishush. openstack has no bugs. you're dreaming14:26
jd__sdague: bugs in new patchset being tested you mean, or existing bugs (rechecks)?14:26
sdagueexisting bugs14:26
jd__ok :)14:26
*** adalbas has quit IRC14:28
*** wchrisj_ has quit IRC14:29
sdaguewhat is definitely interesting is the Test Nodes graphic at the bottom of the page has a very distinctive look when we are in reset land14:30
sdaguethe peaks going up and down14:30
dansmithit's pretty amazing how small gnome-terminal will go, so at least I can see all of the nova stuff block-wise :)14:30
*** dcramer_ has joined #openstack-infra14:32
*** tvb has joined #openstack-infra14:32
*** mrodden has quit IRC14:33
*** markmcclain has joined #openstack-infra14:33
mordredmorning all14:34
Alex_Gaynormorning mordred14:34
mordredsoren: we kinda think Jenkins is less useful in general, and thus never really look at it :)14:35
sdaguemordred: how do you feel about rebalancing the queues? :)14:35
*** senk has joined #openstack-infra14:36
sdaguewe have stuff that entered the check queue yesterday afternoon, and still hasn't gotten access to devstack nodes14:37
mordredsdague: done14:37
sdaguemordred: thank you14:37
*** Ryan_Lane has joined #openstack-infra14:37
*** MoXxXoM has quit IRC14:38
Alex_GaynorSo, maybe ridiculous question, but could we be launching more devstack nodes?14:39
*** MoXxXoM has joined #openstack-infra14:39
sdagueAlex_Gaynor: my understanding is we were basically at quota with HP14:41
sdaguemaybe mordred knows more14:41
*** Ryan_Lane has quit IRC14:41
mordredwe are - and we could request a quota increase... but14:43
mordredI don't know that I'm convinced that would help, given the resets14:44
*** adalbas has joined #openstack-infra14:44
mordredthe gate queue isn't slow due to starvation14:44
sdagueit would help with the starvation on check14:44
mordredwell, we've also got a change in flight to move the check queue to a separate pool of machines14:44
sdaguesure, it's just going to take until tomorrow afternoon to clear the check queue at this rate14:45
mordredyah. I'm just saying, I think that finishing the above patch and landing it will get us _way_ further (and be quicker) than trying to increase quota size14:46
sdagueyeh, sure14:46
openstackgerritA change was merged to openstack-infra/config: make check queue high priority
mordredI'll work on trying to get that patch finished as soon as I've found coffee14:46
jeblairi think i would have made them both normal14:46
jeblairnow post will starve14:46
jeblairi apparently missed reviewing that by 2 minutes14:47
sdagueI thought post was high?14:47
*** alcabrera is now known as gerrit214:48
*** gerrit2 is now known as alcabrera14:48
sdagueis normal a keyword? or just the default?14:48
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Make check, high, post normal precedence.
mordredjeblair: nod. +214:49
sdagueI think you want to update commit message :)14:49
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Make check, gate, post normal precedence.
jeblairmake word word word14:49
ryanpetrelloso I've just tagged a stackforge project (pecan) for release, and watched it go through on zuul;14:49
ryanpetrello I've never done this before now - how long does it take for the sdist to show up on pypi?14:49
ryanpetrello(not in a rush, just want to make sure I didn't goof it up :D)14:50
*** kgriffs has joined #openstack-infra14:51
ryanpetrellolooks like it failed?14:51
*** rcleere has joined #openstack-infra14:51
*** matsuhashi has quit IRC14:51
jeblairmordred, sdague: there are also things we can tune to get nodepool a little more responsive now, i'll work on that while mordred finishes the rax-check stuff14:51
kgriffsguys, got a question re paste.openstack.org14:52
sdaguejeblair: cool14:52
kgriffsI noticed it is based on lodgeit, and I found this:
kgriffsis that repo independent of the original lodgeit?14:53
fungiryanpetrello: yeah, looks like you're missing a [testenv:venv] section in your tox.ini which expects to find14:53
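
A quick local check for a missing tox environment section could look like the sketch below. It uses Python's standard configparser; the helper name and the sample tox.ini contents are hypothetical, not pecan's actual file, and the only section name taken from the discussion is [testenv:venv].

```python
import configparser


def missing_tox_sections(tox_ini_text, required=("testenv:venv",)):
    """Return the required section names that are absent from a tox.ini text."""
    # Interpolation is disabled so tox substitutions such as {posargs}
    # or stray % characters are read as plain strings.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(tox_ini_text)
    return [name for name in required if not parser.has_section(name)]


# Hypothetical tox.ini without a venv environment.
SAMPLE_TOX_INI = """\
[tox]
envlist = py26,py27

[testenv]
commands = python setup.py test
"""

if __name__ == "__main__":
    print(missing_tox_sections(SAMPLE_TOX_INI))
    fixed = SAMPLE_TOX_INI + "\n[testenv:venv]\ncommands = {posargs}\n"
    print(missing_tox_sections(fixed))
```

The first call reports the missing section and the second reports none once a [testenv:venv] block is appended; what that block should contain for a given project is outside the scope of this sketch.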
fungikgriffs: it's a fork14:53
sdaguejeblair: so are queue priorities changed as soon as the config lands?14:53
fungikgriffs: the original lodgeit is abandoned upstream last i checked14:53
kgriffsoh, ok14:53
jeblairsdague: yes14:53
sdaguecheck is still going in the wrong direction, and it's only going to get worse as the PST folks wake up14:53
kgriffsso we are sort of keeping it on life support?14:53
fungikgriffs: i think pocoo stopped using it and ceased maintaining it14:53
jeblairsdague: for new jobs14:53
kgriffsfungi: ok, I suspected as much14:54
jeblairsdague: which isn't going to help many of the jobs currently in check14:54
sdaguejeblair: ok, so the 190 check jobs that are in there won't make any progress?14:54
fungikgriffs: basically, i think. part of the problem is that unauthenticated sites allowing you to post arbitrary text are an attractive nuisance and often abused to the point of being unmaintainable14:54
mordredkgriffs: yes. clarkb and I found a pastebin that was more similar to gist a little while ago, but we haven't gotten to the point where working on paste has been important enough :)14:54
*** marun has joined #openstack-infra14:55
*** jswarren has joined #openstack-infra14:55
jeblairsdague: indeed it seems likely to make it worse14:55
kgriffsmordred, fungi: I would like to create a "pastebin" for images to share screenshots and stuff, and was wondering whether it should be a standalone thing or try to integrate with something already out there14:55
*** jswarren has quit IRC14:55
jeblairsdague: perhaps we should _lower_ gate to low until it clears out14:55
fungikgriffs: yikes. i think you don't want to do that14:55
fungikgriffs: it's called 4chan ;)14:55
sdaguejeblair: yeh, that seems reasonable, then on the next reset they'll start getting resources14:56
mordredkgriffs: we have an open item to have better support for this from the horizon folks too14:56
*** jswarren has joined #openstack-infra14:56
mordredkgriffs: and some preliminary plans, but similarly that hasn't hit high enough on the queue yet14:56
kgriffsmordred what is the alternative project you found?14:56
sdaguejeblair: you want to respin your patch for that? or I can do it14:56
kgriffsmordred: (the gist-like thing you mentioned)14:57
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Make check, gate, post low precedence
jeblairsdague: ^14:57
*** Ajaeger has joined #openstack-infra14:57
sdaguefungi, mordred: ^^^14:57
kgriffsah, nice14:57
kgriffsthanks - I'll check it out.14:58
sdagueok, hopefully that will get things running though14:58
Alex_Gaynorkgriffs, fungi: I can confirm that pocoo upstream no longer maintains lodgeit, their install (paste.pocoo) was being used for various illegal and highly offensive stuff so it was too much of a hassle14:58
openstackgerritMonty Taylor proposed a change to openstack-infra/config: Use rackspace for tempest check tests.
jswarrenHello.  Maybe this has already been brought up, but I'm noticing on zuul that python26 jobs are stuck on queued and evidently none are in progress.15:00
mordredjeblair: I think that does it15:00
mordredjswarren: yup. big-time gate issues right now15:01
jeblairmordred: you didn't split it into 2 changes15:01
mordredjeblair: ah. sorry. didn't see that note (still pre-coffee) one sec15:02
*** mrodden has joined #openstack-infra15:03
openstackgerritA change was merged to openstack-infra/config: Make check, gate, post low precedence
Alex_GaynorSo the priority updates, does that require a zuul restart?15:04
jeblairAlex_Gaynor: no, nothing to the zuul layout.yaml requires a restart, only a reload (which puppet will do automatically); queue contents don't change15:04
Alex_Gaynorjeblair: thank god15:05
openstackgerritMonty Taylor proposed a change to openstack-infra/config: Use rackspace for tempest check tests
openstackgerritMonty Taylor proposed a change to openstack-infra/config: Set up new images on rackspace for check tests
jeblairmordred: dfw has 18/60 slots available (the rest are static slaves); ord is pretty much open (i can delete some test servers there), iad only has 8 slots15:06
jeblairmordred: i think we need to leave headroom in dfw.  i'm not sure we should use it much, if at all.15:06
mordredjeblair: agree. lemme modify the first patch15:07
jeblairmordred: hang on15:07
mordredI'm also going to send pvo and troy an email seeing if we can get IAD to match15:07
*** Ryan_Lane has joined #openstack-infra15:08
*** tvb has quit IRC15:08
Alex_Gaynormordred: if you need me to ask people to up our limit, let me know, I can start sending emails15:08
mordredAlex_Gaynor:  I just emailed troy and pvo, but if you know other folks, what I requested was "Can you up our quota on the openstackjenkins account in IAD to match DFW and ORD?"15:09
openstackgerritAnne Gentle proposed a change to openstack-infra/config: Removes openstack-api-programming doc build
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Tune nodepool
Alex_Gaynormordred: k, will start firing emails15:10
jeblairmordred: i will modify your patch15:10
fungijeblair: ttx: reed: noticed a small freshness problem with . what's the best way to confirm which repositories should be listed in there to count toward atc? should everything in openstack/ openstack-dev/ and openstack-infra/ get added to it?15:12
mordredAlex_Gaynor, jeblair: pvo has acknowledged my email15:12
*** Ryan_Lane has quit IRC15:12
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Set up new images on rackspace for check tests
jeblairmordred: ^15:13
*** DinaBelova has joined #openstack-infra15:13
ttxfungi: you shouldn't need ATC right now, just APC15:13
jeblairwhat's an apc?15:14
ttxActive pro(ject/gram) Contributor15:14
mordredjeblair: yup15:14
openstackgerritDavid Caro proposed a change to openstack-infra/jenkins-job-builder: Added the possibility to specify source files
fungittx: that's the list of projects we're building stats on, so for example openstack/django_openstack_auth is not represented (yet)15:14
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Use rackspace for tempest check tests
jeblairrebase ^15:15
ttxfungi: we don't have the precise program/project map yet, but I can go through the list of projects and get that for you15:15
fungittx: i'll add it since you say it's part of horizon's program, but just trying to figure out what else we may be missing more generally15:15
jeblairfungi: aprv  ?15:15
mordredjeblair: all three are +2 from me15:16
ttxfungi: i'll fix that list for you before we run the ATC voters lists15:16
fungittx: k, thanks15:16
jeblairfungi: and then as well15:16
jeblairi wip'd the 3rd change to keep it from going in prematurely15:16
ttxfungi: i added django_openstack_auth because that's arguably part of the horizon program15:17
*** CaptTofu has quit IRC15:17
ttxfungi: but it's a bit of a grey area right now, until programs all submit their lists15:17
ttxbut i can't get them to publish a mission statement, so projects lists...15:17
jeblairttx: i believe that's the understanding we came to with gabrielhurley15:17
ttxjeblair: agreed, but it just won't be completely clear cut until we get the program/projects maps in the governance repo15:19
* anteaya observes15:19
ttxuntil then we'll continue to use the old "sounds about right" recipe we've been using for ATCs until now :)15:19
ttxfungi: everyone will just blame anteaya anyway15:20
ttxthat's what we need election officials for, after all15:20
anteayablame me15:20
fungii know i do ;)15:20
* fungi kids15:20
anteayait is the fun that comes with that particular hat15:20
anteayaknew it when I volunteered15:20
*** tvb has joined #openstack-infra15:21
*** tvb has quit IRC15:21
*** tvb has joined #openstack-infra15:21
ttxanteaya: note that I decided to share the blame for the TC election. Just couldn't for this one :)15:21
anteayaand yeah the TC election promises to be a whole lot of fun15:21
anteayaget ready for the deluge of +1 emails15:21
mordredttx: you might want to poke the TC folks who still haven't voted on the governance repo - I believe your reminder slipped in the end of the meeting last time15:21
mordredso they may not be noticing that they need to do that15:22
ttxmordred: will do15:22
jgriffithsdague: ummm... just curious why you think this: is a tgt issue?15:22
uvirtbotLaunchpad bug 1226337 in tempest "tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern flake failure" [High,Triaged]15:22
jgriffithparticularly since the specific example here is that the server never booted?15:22
*** CaptTofu has joined #openstack-infra15:22
fungittx: the main reason i was asking as far as updating that list is that it potentially affects the set of qualifying atcs i gave reed for summit passes15:23
openstackgerritDavid Caro proposed a change to openstack-infra/jenkins-job-builder: Added the possibility to specify source files
ttxmordred: actually we have 8 +2s there. Which is enough to pass.15:23
ttxmordred: i'll still ping them for a last-minute objection though15:23
jgriffithjog0: ping15:24
*** Ajaeger has quit IRC15:24
jgriffithOH... never mind that Nikola15:24
*** freyes has joined #openstack-infra15:25
*** reed_ has joined #openstack-infra15:27
*** CaptTofu_ has joined #openstack-infra15:27
openstackgerritA change was merged to openstack-infra/config: Tune nodepool
openstackgerritA change was merged to openstack-infra/config: Set up new images on rackspace for check tests
*** CaptTofu_ has quit IRC15:30
sdaguejgriffith: because the issue looks like the iscsi device can't be found from compute15:31
*** rpodolyaka has left #openstack-infra15:31
jgriffithsdague: Ummmm15:32
sdagueit's a boot from volume, and on the 3rd time to boot from a volume the iscsi device never shows up on n-cpu15:32
openstackgerritJaroslav Henner proposed a change to openstack-infra/jenkins-job-builder: Add dynamic string and choice params.
jgriffithsdague: afraid I think there are multiple things going on here15:32
*** tvb|afk has joined #openstack-infra15:33
*** tvb|afk has quit IRC15:33
*** tvb|afk has joined #openstack-infra15:33
sdaguejgriffith: ok, well more eyes appreciated15:33
sdaguethis is as far as we got on -qa this morning trying to figure things out15:33
sdaguethere's some scrollback there if you are on it15:34
jgriffithsdague: I'm looking, If I can find a clean example of the target issue I can dig in on the cinder side15:34
jgriffithsdague: checking...15:34
*** tvb has quit IRC15:34
openstackgerritJames E. Blair proposed a change to openstack-infra/zuul: Allow multiple invocations of the same job
jeblairsdague, fungi: ^ sadly, I think that answers that question in the negative.  but we should be able to have that feature in place over the weekend.15:35
*** Ryan_Lane has joined #openstack-infra15:38
mgagneWhen Rackspace updates their images, does the image ID change? Does the image disappear for a brief moment or are there 2 images with the same name for a couple of seconds?15:39
*** AlexF has quit IRC15:40
*** tvb|afk has quit IRC15:41
mordredjeblair: pvo says our IAD quota should be increased15:42
jeblairmgagne: i don't know15:42
*** tvb has joined #openstack-infra15:42
*** tvb has quit IRC15:42
*** tvb has joined #openstack-infra15:42
jeblairmgagne: it is!15:42
*** DinaBelova has quit IRC15:42
jeblairmordred: it is!15:42
jeblairmgagne: sorry15:42
*** Ryan_Lane has quit IRC15:43
jeblairmordred: i'll update nodepool conf15:43
*** tizzo has quit IRC15:43
*** DennyZhang has joined #openstack-infra15:43
*** AlexF has joined #openstack-infra15:44
*** UtahDave has joined #openstack-infra15:45
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Increase IAD nodepool limits
openstackgerritDavid Caro proposed a change to openstack-infra/jenkins-job-builder: Added globbed parameters to the job specification
jeblairmordred: check images are building15:45
mordredjeblair: woot15:46
*** yassine has quit IRC15:47
jeblairi'm deleting the old test nodes/images15:47
giulivojgriffith, what I found is that cinder seems to receive an okay from tgt-admin about the update so the volume is moved into available state15:48
*** DennyZhang has quit IRC15:48
giulivobut later iscsiadm on nova can't find the volume15:48
*** DinaBelova has joined #openstack-infra15:49
giulivoso following sdague suggestion I've this on devstack
*** DennyZhang has joined #openstack-infra15:49
jgriffithgiulivo: is it iscsiadm can't discover?  Cuz it looks like the discover works and it *thinks* it attached it15:49
jgriffithgiulivo: but that the actual problem is that the attach was no good15:49
giulivojgriffith, I found three attempts to rediscover15:49
jgriffithgiulivo: but I'm just trying to catch up so I could be wrong15:49
giulivolasting like  secs15:49
jgriffithgiulivo: what do you mean by that?15:50
jgriffithgiulivo: ie... can you point to the logs?15:50
jgriffithgiulivo: You mean sendtargets command?15:50
giulivowait a sec so I can post the relevant log15:51
jgriffithgiulivo: cool15:51
jgriffithgiulivo: like I said, be patient with me I'm just catching up with you guys here :)15:51
jgriffithgiulivo: Hoping I can help15:51
*** tizzo has joined #openstack-infra15:51
*** mkerrin has quit IRC15:52
giulivooh c'mon so the logs I was looking at are for cinder and for nova15:52
giulivothe problem is with volume 4020e0dd-24a0-453b-985d-e50cb2dd0de115:53
giulivothe nova exception is here
jeblairmordred, fungi:
jeblairall the rax check images are now ready15:55
jgriffithgiulivo: yeah, so that's what I was wondering....15:56
fungijeblair: does that mean 48672 is safe to un-wip/approve now?15:57
jgriffithgiulivo: Login was successful indicating the target was there15:57
jeblairfungi: not just yet, it's launching the nodes15:57
jgriffithgiulivo: 2013-09-24 04:44:17.51515:57
giulivologin succeeds true, but not the volume?15:57
fungiahh, okay15:57
jgriffiththe attach/mount ad /dev/vda is the crux of the issue15:57
jgriffithI *think*15:57
jgriffithgiulivo: The fact that the login to the target was successful is why I had moved past that point15:58
jgriffithgiulivo: sadly, no logging in between there :(15:58
*** thomasbiege has joined #openstack-infra15:59
*** CaptTofu has quit IRC15:59
giulivoso the three attempts to rediscover which are failing are "okay" ?15:59
giulivorediscover the volume, after logging in15:59
giulivolike this
jgriffithgiulivo: well...16:00
jgriffithgiulivo: so "discover" can mean different things with iscsi16:00
jgriffithgiulivo: "discover" in terms of iscsi target discovery appears to have succeeded without issue16:00
jgriffithgiulivo: what you're referring to though is the attachment16:00
giulivoyeah it's not the sendtargets sorry, I should say rescan but that is just the argument passed to iscsiadm16:00
jgriffith*I think*16:01
jgriffithgiulivo: got ya16:01
jgriffithgiulivo: so what's failing is the attach16:01
jgriffithgiulivo: the target *appears* to be live16:01
*** thomasbiege has quit IRC16:01
SpamapSAnybody know a way to specify a different set of things to ignore for flake8 per-directory?16:01
jgriffithgiulivo: but it's the attach that is hosed16:01
jgriffithand whatever's been done with the logging isn't overly helpful IMO16:02
*** matty_dubs is now known as matty_dubs|lunch16:03
*** tizzo has quit IRC16:04
openstackgerritDavid Caro proposed a change to openstack-infra/jenkins-job-builder: Added the possibility to specify source files
jeblairmordred, sdague, fungi: our first rax nodes are ready, from IAD, they took 16 minutes to build16:05
jeblair(dfw and ord are still building)16:06
fungiwhat's build time like for hp?16:06
jeblairfungi: 2 mins16:06
fungii guess ~15 minutes is what i recall from standing up puppetish servery things in rackspace previously though16:07
openstackgerritA change was merged to openstack-infra/config: Increase IAD nodepool limits
guitarzangiulivo: can you tell if the iscsi device shows up eventually?16:07
fungitaking package installs/upgrades and whatnot into account16:07
jeblairfungi: that's not necessary for this though -- this is a straight launch from image -- but it's a custom image, which means it may not be local to the compute node16:07
fungioh, ew16:08
fungiright, image is already updated and such16:08
jeblairi don't know how it works in rax though -- perhaps continued use warms caches on compute nodes.16:08
fungiwe'll be warming those up really quickly if that's the case ;)16:08
jeblairwe can somewhat mitigate this by increasing min-ready even more16:09
giulivoguitarzan, so I think the problem is exactly that the block device never shows up16:09
giulivothere is nothing from the kernel messages about the newer volume (from iscsiadm)16:10
giulivonot that I can see at least16:10
guitarzanhmm, how is the network between the two machines?16:11
guitarzanah, and above someone said the discovery was fine16:11
giulivoyeah the login on the portal works16:11
jgriffithguitarzan: network sucks16:12
jgriffithguitarzan: the target is discovered BTW16:12
jgriffithguitarzan: it's the iscsiadm attach that doesn't seem to work16:13
*** flaper87 is now known as flaperboon16:13
guitarzanwell, he also said the login worked16:13
guitarzanso that's definitely confusing16:14
dimsjgriffith, giulivo - i don't even see iscsiadm commands being run - looking at logstash using query - (@message:"4020e0dd-24a0-453b-985d-e50cb2dd0de1" OR @message:"iscsiadm") AND @fields.build_uuid:"dced339fa65543fd9e752d2581bc5cae"16:14
jgriffithdims: I've given up on logstash for the time being16:14
guitarzanjgriffith: yeah, I'm looking at that too16:14
jgriffithdims: checkout the link above to the nova log16:14
jgriffith2013-09-24 04:44:17.51516:15
dimsi see it16:15
dimslooks like we are losing information in logstash sigh.16:15
jgriffithdims: that's what I concluded but thought maybe my queries just sucked ;)16:16
*** alcabrera is now known as alcabrera_afk16:16
*** tvb has quit IRC16:17
jeblairjgriffith: the timestamps are hyperlinks to per-line targets16:17
jeblairjgriffith: (so you can more easily share a link to a line)16:17
jgriffithjeblair: Nice!!!16:17
jeblairjgriffith, guitarzan: sdague made a change yesterday that removes DEBUG lines from logstash16:17
jgriffithjeblair: thank you!16:17
jeblairjgriffith: sdague did the line-hyperlink too16:18
jgriffithjeblair: Ahhhh, so it's not that i can't write a decent query to save my life ;)16:18
jeblairfungi: some rax nodes are going on 0.43 hours in building state :(16:19
* clarkb catches up on the state of things16:19
jeblairclarkb: there's a lot; short version, we're throwing levers to deal with check starvation; nothing needs immediate attention there16:20
dimsgiulivo, jgriffith, 04:44:17.577 first try and exception is at 04:44:31.806 - maybe it just needs more time?16:20
giulivoI don't know if there are nova folks around but after the latest iscsiadm --rescan attempt we have 10 seconds of almost no logging before the stack trace16:20
*** gyee has joined #openstack-infra16:20
openstackgerritA change was merged to openstack-infra/config: Use rackspace for tempest check tests
guitarzangiulivo: 3**2 seconds maybe? :)16:21
clarkbjeblair: does gearman honor NODE_LABEL? that was the biggest thing I was fuzzy on last night?16:21
jeblairclarkb: zuul translates that into the job_name:label syntax for gearman16:21
*** odyssey4me has quit IRC16:21
giulivoit's 10 seconds after the last attempt16:21
jeblairclarkb: we've never used that, so that's going to be exciting!16:22
clarkbjgriffith: dims: we are removing DEBUG for a couple reasons the biggest being it adds an order of magnitude to the size of our indexes (2 weeks is ~600GB now but was ~5TB with DEBUG) but also DEBUG is largely useless noise16:22
clarkbjgriffith: dims: also if there is information that pinpoints a bug and does not have anything logged at a higher level I would consider that to be a bug as well (if we fail it should be logged at something higher than INFO)16:23
clarkbat least WARN imo16:23
jgriffithclarkb: sure, don't get me wrong wasn't complaining16:23
clarkbjeblair: cool16:23
jgriffithclarkb: just pointing out that my queries never worked, and now I know why :)16:23
clarkbjgriffith: I know, just trying to point out how we got here. It isn't perfect but is definitely more usable overall16:24
jgriffithclarkb: I would agree WRT bumping up some of the log levels16:24
giulivojgriffith, in nova it looks like the iscsiadm --rescan is only attempted three times so I think this just never finds the volume after logging in16:24
jgriffithclarkb: agreed16:24
dimsclarkb, thanks, understood16:24
jgriffithgiulivo: sorry... I was looking at something else, going back to something here16:24
guitarzangiulivo: if it hasn't happened in 14 seconds, maybe it isn't going to happen?16:25
guitarzangiulivo: you say there was never anything in kern.log about a new disk showing up?16:25
giulivoguitarzan, ^^ yep16:25
giulivoI think logging on the portal works but the volume is never found and as per nova code, after three failed attempts it reports failure16:25
giulivothat explains why there isn't anything in the kernel log about the new block device16:26
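The retry behaviour discussed above (three rescans, then failure after roughly 14 seconds) can be sketched as below. This is only an illustration under assumptions, not nova's actual code: wait_for_device and its exists/rescan/sleep callables are hypothetical names, and the tries**2 backoff is taken from guitarzan's guess in the channel.

```python
import time


def wait_for_device(exists, rescan, max_tries=3, sleep=time.sleep):
    """Rescan until exists() is true, sleeping tries**2 seconds in between.

    exists, rescan and sleep are hypothetical callables, injected so the
    loop can be exercised without iscsiadm or a real block device.
    Returns the total number of seconds slept before the device appeared.
    """
    tries = 0
    waited = 0
    while not exists():
        if tries >= max_tries:
            # Give up once the allowed number of rescans has been used.
            raise RuntimeError("device not found after %d rescans" % max_tries)
        rescan()
        tries += 1
        if not exists():
            delay = tries ** 2  # 1, 4, 9 seconds for three tries
            sleep(delay)
            waited += delay
    return waited
```

With three tries the sleeps add up to 1 + 4 + 9 = 14 seconds, which lines up with the gap between the 04:44:17 and 04:44:31 timestamps quoted in the channel.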
dimsgiulivo, so trying a few more times may help?16:27
giulivoit is either the iscsiadm failing at --rescan16:27
jeblairclarkb, fungi, mordred: look at 48423,2 on the status page16:27
jeblairclarkb, fungi, mordred: mouseover the red dot16:27
giulivoor the tgtd returning an okay to cinder before the lun is actually made available16:27
mordredjeblair: yah16:28
jeblairclarkb, fungi, mordred: you'll see the 'needed dependency is failing' logic in action16:28
clarkbjeblair: awesome16:28
funginice, dependency failure16:28
clarkbI mean not that it is failing but that the representation of it works :)16:28
giulivoso based on that, I think this could help as we get tgtd in debug mode and can try to figure what it is doing when cinder provides it with the new volume16:28
fungii was hoping to eventually spot one of those in the wild with the new visualization16:28
fungialso, holy test nodes graph batman16:29
clarkbjog0: logstash is all caught up and appears to be keeping up16:30
pabelangerfungi, I was about to say that... that is awesome!16:30
clarkbjog0: so elastic-recheck probably doesn't need any fancy backoff stuff16:30
jeblairhere's an embiggened version:
jeblair(you have to remember to reload that one occasionally)16:30
jeblairthe orange peak near the end is the rackspace spinup16:31
jeblair(and most of the ready nodes are rackspace)16:32
pabelangerjeblair, what's the amount of time to actually spin up a node? Is that tracked some place?16:33
jeblairpabelanger: it's in graphite (nodepool.launch.*), but i can tell you offhand we're seeing about 2 mins for hp and 16 for rackspace atm.16:35
dhellmannsdague: if you want to give it a spin16:35
*** Ryan_Lane has joined #openstack-infra16:35
jog0mordred: it was a little bitchy, I was going for a public shaming.16:36
jog0clarkb: woot!16:37
clarkbjeblair: chatted with zaro briefly over the wall (shame on us for not doing it here) to better understand the NODE_LABEL stuff and I am not entirely sure it will work as expected16:40
jeblairclarkb: we're about to find out?16:40
clarkbjeblair: because our project configs don't use the label devstack-precise-check there won't be any jobs for that project:label name in the gearman server16:40
jeblairclarkb: ah, yes, that label needs to be added16:41
*** boris-42 has quit IRC16:41
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Revert "Use rackspace for tempest check tests"
clarkbjeblair: but we can't do that safely without another job16:42
jeblairclarkb: i think we can.  the param func should set the label in all cases16:42
giulivodims, guitarzan, jgriffith, sdague  I'm sorry I've to leave but FWIW I'm of the idea that iscsiadm --rescan is failing at finding the volume after it logs in on the portal, the nova code checks for the device path 3 times but it never pops up so it raises , see so I think putting tgtd on debug on the other side could help figure what is going on (at both creation time and attach)16:43
clarkbjeblair: so we need to have an else in that function that sets it to devstack-precise? that should work16:43
jeblairclarkb: yeah, though to do it safely, i think we need to start by setting it to devstack-precise always, then change the job labels, then add the conditional16:43
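A rough sketch of the end state clarkb and jeblair describe: a parameter function that sets NODE_LABEL in every case, with an else branch for the default. The function name, its (pipeline, params) signature and the "check" pipeline test are assumptions made for illustration and are not Zuul's actual parameter-function interface; only the two label names come from the discussion.

```python
def set_node_label(pipeline, params,
                   check_label="devstack-precise-check",
                   default_label="devstack-precise"):
    """Hypothetical parameter function that always sets NODE_LABEL.

    pipeline is assumed to be the pipeline name as a plain string and
    params a mutable dict of job parameters.
    """
    if pipeline == "check":
        # Check jobs are steered to the separately labelled nodes.
        params["NODE_LABEL"] = check_label
    else:
        # Everything else keeps the existing label, so no job is left
        # without a matching label on the gearman side.
        params["NODE_LABEL"] = default_label
    return params
```

Following jeblair's ordering, the first rollout step would be the same function with both branches returning the default label, with the conditional added only after the job labels have been changed.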
jeblairclarkb: it's getting complicated enough that we should re-evaluate adding jobs...16:44
jeblairclarkb: the advantage of adding jobs is that we can say check jobs can run on either, which is a little bit of a release valve if rackspace can't keep up.16:45
*** odyssey4me has joined #openstack-infra16:45
jeblairclarkb: the disadvantage, obviously, is that the devstack jobs are a huge mess right now and we'd be making twice as many of them.16:45
*** dcramer_ has quit IRC16:45
clarkbyeah. What if we didn't treat them differently (rackspace runs the jobs in about as much time as hpcloud did them serially)16:46
clarkb(just throwing ideas out there)16:46
jeblairclarkb: rackspace runs them in about 1.5 the time, so we're looking at 60 minutes instead of 40.16:47
*** wchrisj_ has joined #openstack-infra16:47
*** afazekas is now known as afazekas_zz16:47
*** dcramer_ has joined #openstack-infra16:48
*** giulivo has quit IRC16:48
notmynameclarkb: I'm just getting caught up this morning. status of gates? good to go, or still waiting?16:48
jeblairclarkb: that might be the best approach.16:49
clarkbnotmyname: still in a bit of flux, but we are actively sorting it out16:49
notmynameclarkb: kk, thanks16:49
jeblairclarkb: what are we sorting out?16:49
clarkbjeblair: node starvation?16:49
clarkboh talking about gate in particular16:49
openstackgerritA change was merged to openstack-infra/config: Revert "Use rackspace for tempest check tests"
*** beekneemech has quit IRC16:50
jeblairclarkb: i don't think notmyname needs to take any particular action other than not approving stable/grizzly changes, which he so rarely does anyway.  :)16:50
clarkbjeblair: gotcha16:50
clarkbnotmyname: ^16:50
notmynameclarkb: jeblair: ok, thanks :-)16:50
*** gyee has quit IRC16:51
*** AlexF has quit IRC16:52
*** dcramer_ has quit IRC16:52
jog0clarkb: some files are missing from logstash16:52
jeblairmordred, fungi, sdague: so clarkb and i were chatting, and we either need to (a) do like 3 more steps to set up the check jobs to use rackspace, (b) double the number of devstack jobs so the check ones use rackspace, or (c) say screw it and just throw rackspace nodes into the general pool (occasionally jobs will take 60 instead of 40 mins)16:53
jeblairmordred, fungi, sdague: thoughts?16:53
*** jerryz has joined #openstack-infra16:53
jog0no screen-key isn't there16:54
clarkbkeystone should be there16:54
clarkbwe are missing ceilometer and one of the swift files (because the format of the swift file isn't conducive to indexing)16:54
* clarkb looks closer16:55
* zaro says option c16:55
jog0keystone is only missing sometimes16:55
mordredjeblair: damn16:56
mordredjeblair: I'm not convinced just more nodes in the pool will help - but you have just made excellent points16:56
jeblairmordred: why wouldn't more nodes in the pool help?16:56
jeblairmordred: that's pretty much what starvation means....16:56
jog0oh and elasticSearch is really caught up, you weren't exaggerating.16:57
clarkbjog0: that is why16:57
jog0sdague: thanks!!!16:57
clarkbjog0: basically no non DEBUG log lines according to apache16:57
mordredjeblair: 2 things - slower nodes in the pool will increase the latency before resets potentially16:57
clarkbjog0: but there are INFO lines in there so we have a bug16:57
jog0clarkb: oh :(16:57
jeblairmordred: yes, slowing resets down mitigates starvation but slows gate throughput16:57
jog0turns out I don't need keystone yet so its not a blocker16:58
clarkbjog0: I know what is going on16:58
clarkbjog0: I think keystone uses its special snowflake format and we don't handle that properly on the apache side16:58
clarkbsdague: ^16:58
jeblairmordred: other thing?16:58
mordredjeblair: nope. I think that was the thing. I was wrong about there being 216:58
*** dmakogon__ has joined #openstack-infra16:59
jeblairmordred: the steps in (a) aren't difficult, (b) is just a lot of typing, and (c) needs reconfiguration as well.  i think all 3 choices will take about the same amount of time.16:59
jeblairwe get to choose on merits.17:00
mordredjeblair: I like the end state of having check jobs running in rackspace17:00
mordredbecause the slowness doesn't have a pile-on effect there17:00
*** DinaBelova has quit IRC17:00
clarkbjog0: sdague: I suddenly remember why logstash is so slow :) the number of cases you have to account for is a bit ridiculous17:00
*** matty_dubs|lunch is now known as matty_dubs17:00
*** gyee has joined #openstack-infra17:01
*** dstufft has quit IRC17:01
clarkbI think right now we only handle oslo format properly so swift and keystone aren't working17:01
jog0clarkb: yeah ...17:01
*** odyssey4me has quit IRC17:01
clarkba quick fix would be to make the level configurable in the workers and only have >DEBUG on oslo formatted things17:01
clarkbor sort it out in the wsgi app17:02
jeblairokay, so the choice is between (a) run _only_ on rackspace, or (b) run on rackspace and hp, more or less at random according to the proportion of available nodes17:02
clarkbsdague: ^ do you have an opinion on that?17:02
jog0clarkb: makes sense to me, but that may blow ElasticSearch way back again17:02
*** hashar has quit IRC17:02
*** dstufft has joined #openstack-infra17:02
clarkbjog0: it shouldn't be too horrible. keystone and swift logs are smaller than the others17:02
jog0clarkb: hopefully17:02
*** MarkAtwood2 has joined #openstack-infra17:03
jeblairclarkb: i think it's only a partial regex to get the level anyway, so it may not be too complex to do in the app.17:03
jog0clarkb: on a related front I want to go ahead and make the elastic-search gerrit user17:03
jog0anything special to do that?17:03
*** SergeyLukjanov has quit IRC17:03
jeblairmordred: what are your feelings on a/b ?17:03
clarkbjog0: one of the Gerrit admins (openstack-infra-core) needs to run a command17:03
markmcclainjeblair: any update on manually pushing that quantumclient branch pypi?17:03
jeblairmarkmcclain: did you ask us to?17:03
clarkbjog0: probably get consensus on the name first (since it will potentially comment on lots of changes)17:04
jeblairwhy is it called recheck?17:04
jog0ala the recheck page we have17:04
jog0so use elasticSearch to make rechecks easier17:05
clarkbhmm is it time to test asterisk?17:05
jeblairclarkb: yes it is17:05
mordredjeblair: I think b sounds richer long term17:05
jeblairi was hoping we could at least reach a consensus on which of a/b/c to do about nodes...17:05
jeblairmordred: yeah, so that means doubling the number of devstack jobs so there are check and gate versions17:06
mordredjeblair: yeah. that's the least appealing part of b17:06
clarkbmaybe we can template those jobs and it won't be so horrible?17:07
jeblairi mean, there may be opportunities for templating17:07
jeblairso who wants to work on that?  clarkb, zaro, mordred?17:07
mordredjeblair: I am on the phone for the next 2 hours.17:08
clarkbI can stab at it17:08
jeblairmordred: i'm guessing that's a no, but i'm not sure ;)17:08
jeblairclarkb: ok, thanks17:08
* mordred trying to bilk hp out of more headcount for us - so it's at least useful...17:08
jeblairrussellb, pabelanger: around?17:08
fungieek, more scrollback17:09
jog0mtreinish: ping17:09
markmcclainjeblair: I thought so, but I might not have made it clear17:09
*** odyssey4me has joined #openstack-infra17:09
jeblairmordred: can you release the quantumclient branch to pypi?17:10
mordredjeblair: sure17:10
pabelangerjeblair, indeed17:10
markmcclainthat was the review is going to require a manual merge first17:10
markmcclainbecause that branch won't clear the gate17:10
jeblairmordred: oh, so, er, can you force merge the review markmcclain is about to link for you, and then manually release it? :)17:10
clarkbsdague: if you get a free moment it would be great if you could stab at making the wsgi app regex more flexible to handle keystone and in the case of swift probably just pass it all through17:11
mordredjeblair: yes17:11
clarkbsince swift doesn't do log levels...17:11
fungii think a phased approach with rackspace nodes dumped into the general pool for starters makes sense, then take time to be able to separate pipelines to different providers in the ways which will make jenkins happy longer-term. i'm not super-keen on doubling the devstack job definitions, but maybe that's just unfounded ocd on my part17:11
clarkbnotmyname: we are doing level based filtering of logs17:11
*** DennyZhang has quit IRC17:11
clarkbnotmyname: but since swift doesn't have level based logs the filtering derps and removes everything17:11
mordredmarkmcclain: do we want to tag that as a particular version?17:12
notmynameclarkb: all swift processes support syslog facilities and log level filters:
*** yolanda has quit IRC17:12
clarkbnotmyname: but that only works with syslog?17:13
mordredmarkmcclain: like, what version should be released to pypi?17:13
clarkbnotmyname: syslog doesn't like us when we run devstack it falls over pretty spectacularly17:13
jeblairfungi: bummer, loss of consensus.  i actually think that (b) is the safest from the pov that it's least likely to break things if rackspace can't keep up (or we decide to reduce its node supply).17:13
jog0clarkb: so for the elastic-search gerrit user .. now that ElasticSearch is blazingly fast I want to get the bot up, on my own RAX server17:14
markmcclainmordred: 2.2.417:14
jeblairfungi: it sucks that it adds so many jobs, but maybe templating will help17:14
* fungi is still reading the last 20 minutes of scrollback, which will take about 20 minutes, at which point there will be another 20 minutes of scrollback17:14
*** odyssey4me has quit IRC17:14
mordredmarkmcclain: can't do that - neutronclient already has that tag :)17:14
mordredmarkmcclain: how about ?17:14
jeblairclarkb, fungi, mordred: can we go ahead and merge these changes before clarkb starts?
jeblairpabelanger: i'm available to dial in17:15
jeblairanteaya, zaro, fungi, clarkb: are you available for conferencing?17:15
jeblairanyone else?17:15
anteayajeblair: oh yeah17:15
markmcclainthat will work17:15
mordredjeblair: done17:16
clarkbjeblair: yes, will be slightly distracted by job config stuff though17:16
* zaro is available17:17
jeblairpabelanger: let us know when you have pbx.o.o configured the way you want17:17
*** derekh has quit IRC17:17
*** MarkAtwood has joined #openstack-infra17:17
mordredmarkmcclain: released17:18
mordredjeblair: the jobs should fail, but I think I should push the tag back to gerrit anyway, what do you think?17:18
fungijeblair: regarding loss of consensus, i'm still catching up on what the consensus was17:18
jeblairmordred: yes17:18
markmcclainmordred: thanks17:19
jeblairfungi: (b) the one you didn't like because it adds lots of jobs17:19
*** kgriffs has left #openstack-infra17:19
jeblairfungi: i mean, none of us like it because it adds lots of jobs17:19
clarkbsdague: if we set the default starting sev to ERROR that should handle the swift case but will make the screen lines always show up...17:19
*** bnemec has joined #openstack-infra17:20
fungijeblair: yeah i can switch rooms and jump into the pbx in a bit. just trying to finish reading the discussion in here first17:20
*** odyssey4me has joined #openstack-infra17:20
*** reed_ is now known as reed17:22
*** senk has quit IRC17:22
openstackgerritA change was merged to openstack-infra/config: Make gate-tempest-devstack-vm-large-ops voting
fungijeblair: clarkb: sdague: mordred: if adding duplicate jobs is the safest and most pragmatic solution, then i agree it makes sense to take that route (no need to add features to support that)17:22
*** johnthetubaguy1 has quit IRC17:22
*** reed has quit IRC17:22
*** reed has joined #openstack-infra17:22
pabelangerjeblair, sure, give me a minute, trying to fix some errors on the pbx17:24
*** ryanpetrello has quit IRC17:25
openstackgerritA change was merged to openstack-infra/config: add gate-tempest-devstack-vm-neutron-pg job
harlowjaqq for ya'll17:31
harlowjaif anybody has some free secs17:31
*** alcabrera_afk is now known as alcabrera17:31
pabelangerjeblair, are multiple asterisk boxes still up?17:33
*** wchrisj_ has quit IRC17:33
pabelangerokay pbx.o.o is fixed17:33
jeblairpabelanger: maybe? i can check, but i think voipms is configured for pbx.o.o17:34
jeblairpabelanger: yeah, the others are still around if we need them.17:34
anteayaI'm in17:35
*** MarkAtwood2 has quit IRC17:35
*** MarkAtwood2 has joined #openstack-infra17:35
fungiyeah, they keep e-mailing me about pending updates/needed reboots but since they don't have a domain configured they don't match my cronspam filters and land in my inbox instead17:36
*** hemnafk is now known as hemna_17:36
fungiso pretty sure they're still up17:36
pabelangerjeblair, okay, seems to be working now17:36
* zaro is in conference17:36
harlowjaso just a question that the taskflow team is having, we'd like to run our tests against a real mysql instance (or maybe even postgres) instead of just sqlite (especially the migration part) and was wondering if there is any standard process to go through to make that happen?17:37
anteayamy skype crashed, back now17:37
anteayaand I am out again, my skype keeps crashing17:38
anteayanew laptop just installed it17:38
jeblairi only hear silence now17:41
pabelangerI am tweaking the time while you are talking to see if there is any noticeable impact17:41
pabelangerso, there might be some chop17:42
pabelangerI increased the threshold17:42
jeblairit came back17:42
pabelangerlowering it again17:42
pabelangerback to 1000ms17:42
pabelanger(the sweet spot, so far)17:42
jog0clarkb: until we sort out the gerrit user for elastic-recheck just  using my own user17:44
*** reed has quit IRC17:48
*** SergeyLukjanov has joined #openstack-infra17:50
*** dizquierdo has left #openstack-infra17:50
anteayamy skype died17:51
pabelangeranteaya, okay17:52
*** sarob has joined #openstack-infra17:52
anteayaI'm pm'ing fungi for the rest17:52
*** melwitt has joined #openstack-infra17:52
*** nati_ueno has joined #openstack-infra17:52
*** odyssey4me has quit IRC17:54
*** boris-42 has joined #openstack-infra17:58
*** Ajaeger has joined #openstack-infra18:01
pabelangerback to 1000ms for silence18:02
pleia2if it would be helpful to have me join the call too let me know, I got distracted by my baremetal testing strace finally working (hooray)18:02
pleia2well, the failure appearing so I could strace it anyway :)18:02
*** odyssey4me has joined #openstack-infra18:03
jog0it's scary watching the elastic-recheck bot in openstack-qa18:05
* fungi is afraid to look18:06
jog0sdague: ping18:07
jog0for bug 123040718:07
uvirtbotLaunchpad bug 1230407 in neutron "VMs can't progress through state changes because Neutron is deadlocking on it's database queries, and thus leaving networks in inconsistent states" [Critical,Confirmed]
jog0what would be a better query to use for that one18:07
*** dcramer_ has joined #openstack-infra18:07
jog0something like @message:"Lock wait timeout exceeded" AND @fields.filename:"logs/screen-q-svc.txt" AND @fields.build_status:"FAILURE" ?18:09
*** DinaBelova has joined #openstack-infra18:09
devanandawsme seems to be broken?18:09
*** julim has quit IRC18:10
devanandaclarkb: what's the interface to do searches on recent jenkins failures?18:12
jog0devananda: logstash.openstack.org18:13
fungidevananda: comes to us from the distant past of monday18:14
fungiwith news of wsme issues18:14
*** dmakogon__ has quit IRC18:14
*** dmakogon_ has joined #openstack-infra18:15
jeblairclarkb, fungi, mordred, jhesketh:
openstackgerritClark Boylan proposed a change to openstack-infra/config: Make devstack jobs templates and create check jobs
jeblairclarkb, fungi, mordred, jhesketh: if we merge that soonish, we can probably manage a zuul restart over the weekend to pick it up18:16
*** alexpilotti has quit IRC18:16
devanandafungi: wait. wsme's been broken since monday?18:17
*** odyssey4me has quit IRC18:20
sdaguejog0: i'd actually narrow the message to - "Lock wait timeout exceeded; try restarting transaction"18:20
anteayapleia2: hooray18:20
*** dcramer_ has quit IRC18:21
devanandafungi: logstash suggests that it broke ~4hr ago with the new upload of pecan18:21
* sdague just got back from lunch + bike ride, scrolling back18:22
openstackgerritClark Boylan proposed a change to openstack-infra/config: Make devstack jobs templates and create check jobs
jog0sdague: awesome thanks18:22
jog0 @message:"Lock wait timeout exceeded; try restarting transaction" AND @fields.filename:"logs/screen-q-svc.txt" AND @fields.build_status:"FAILURE" looks good18:22
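A minimal sketch of how the query jog0 settles on could be assembled programmatically. The helper name `build_query` is hypothetical; only the field names (`@message`, `@fields.filename`, `@fields.build_status`) and their values come from the discussion above. It assumes the values contain no double quotes, since nothing is escaped.

```python
def build_query(message, filename, build_status="FAILURE"):
    """Join exact-phrase clauses with AND, in query-string style.

    Hypothetical helper: values are wrapped in double quotes and are
    assumed not to contain double quotes themselves.
    """
    clauses = [
        ("@message", message),
        ("@fields.filename", filename),
        ("@fields.build_status", build_status),
    ]
    return " AND ".join('%s:"%s"' % (field, value) for field, value in clauses)


# Narrowed message text as suggested by sdague.
query = build_query(
    "Lock wait timeout exceeded; try restarting transaction",
    "logs/screen-q-svc.txt",
)
print(query)
```

Using the full error text rather than the shorter "Lock wait timeout exceeded" keeps the match tied to this specific failure.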
clarkbsdague: tl;dr is the wsgi log filter doesn't handle swift and keystone18:22
clarkbsdague: because they are not oslo format18:23
clarkbsdague: for swift I think we just want to let them pass through and for keystone we may need a slightly more forgiving regex18:24
*** dmakogon_ has quit IRC18:24
clarkbbut let me know what you think18:24
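A rough sketch of the two ideas clarkb describes: lines in an unrecognized format pass through untouched, and the timestamp/level regex is loosened a little. Everything here is an assumption for illustration, not the actual filter: the function name `filter_lines`, the idea of filtering by level, the regex, and the sample log lines in the usage are all invented.

```python
import re

# Assumed oslo-style prefix: "2013-09-27 18:22:01.123 4242 DEBUG nova.api ..."
# The more forgiving variant accepts "." or "," before the milliseconds and
# treats both the milliseconds and the PID as optional.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:[.,]\d{3})?)"
    r"(?: \d+)?"
    r" (?P<level>[A-Z]+) "
)

LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def filter_lines(lines, min_level="INFO"):
    """Yield lines at or above min_level; pass unparseable lines through."""
    threshold = LEVELS[min_level]
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            # Unknown format (for example a non-oslo service): keep as-is.
            yield line
            continue
        level = LEVELS.get(match.group("level"))
        # Unknown level names are kept rather than silently dropped.
        if level is None or level >= threshold:
            yield line


sample = [
    "2013-09-27 18:22:01.123 4242 DEBUG nova.api [-] noisy",
    "2013-09-27 18:22:02.456 4242 ERROR nova.api [-] boom",
    "2013-09-27 18:22:03,789 DEBUG keystone.common noisy too",
    "Sep 27 18:22:04 proxy-server some swift line",
]
kept = list(filter_lines(sample))
```

Feeding a directory of sample logs through a function like this is the sort of unit test sdague mentions wanting.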
*** dmakogon_ has joined #openstack-infra18:24
sdagueclarkb: sure. So what I should actually do is get some unit testing for this in tree so we can dump in a bunch of sample logs and make sure it works18:25
sdagueclarkb: is there a pattern already for unit testing things in the config tree?18:25
clarkbsdague: but we do have a tox.ini18:26
clarkbsdague: so you should be able to make use of that18:26
*** nati_ueno has quit IRC18:26
clarkbwe could split this into a proper project18:26
sdagueyeh, I'm mixed on that, it seems like more trees end up just being more complexity18:26
clarkbya there are tradeoffs18:27
*** nati_ueno has joined #openstack-infra18:27
*** odyssey4me has joined #openstack-infra18:27
*** zaro is now known as list18:27
*** list is now known as zaro18:27
dimsjog0, i added some notes, basically 4 SQL statements hit this18:28
sdaguedims: they are all basically the same fail though right?18:28
sdagueI think that neutron fail moves around18:28
dimssdague, all 4 SQL's end up with "Lock wait timeout exceeded; try restarting transaction" - yes.18:28
*** ryanpetrello has joined #openstack-infra18:29
sdaguejog0: well it's a database deadlock18:29
sdagueso that's kind of expected18:29
sdagueas whoever gets there last loses, and that's going to change18:29
*** rockyg has joined #openstack-infra18:30
dimssdague, y18:30
sdaguejeblair: you around? I want to get your opinion of trying to bring unit testing into config vs. breaking out to a separate project18:30
sdagueclarkb, jeblair: on the rax nodes, I'd say general pool. they should be running in 45min max I think (they were about 40% slower for devstack runs)18:31
sdagueat least short term18:31
sdaguecheck queue down to 20, nice. Much better than 19018:32
jeblairsdague: well, we sort of settled on plan (b) which was still to just use rax nodes exclusively for check, but to also allow hp nodes to contribute to check.  clarkb just finished the change here:
sdagueok, that's cool too18:33
jeblairsdague: since the hard part is done, we might as well keep going with it, for now at least.  can always change later. :)18:33
Ajaegerclarkb: do you have a few minutes to discuss ? I'd like to know whether and how to rename the manual jenkins jobs18:34
jeblairsdague: i'm fine either way on testing, but i feel like by the time something needs a unit test, that's one of the signals that it's probably time for it to be its own project.  we have high hopes for this thing anyway.  i think splitting is a good idea, but am not opposed to more 'incubation' if you're not quite ready.18:35
sdaguesure, though I do think all the python in the config tree should have tests anyway :) solving a framework to make that easier would be good at some point.18:36
sdaguebut I expect we'll use some of the log parsing for other things here, so let me split this out18:37
*** reed has joined #openstack-infra18:37
dansmithwow, that monster check queue dumped pretty quick :)18:37
jeblairsdague: i think the thing is that mostly we don't think there should be very much python in the config tree.  a quick look suggests we're pretty close to that.18:38
sdaguedansmith: you can thank mtreinish and tempest testr for that. We can actually chew through it pretty quick when not starved :)18:39
dansmithsdague: I know why it's faster, I'm just saying I would have expected it to take longer than a couple hours given how huge it was18:40
jeblairdansmith: we threw 300 machines at it.18:40
sdaguenice :)18:41
dansmithjeblair: ah18:41
dansmithI try not to throw my machines around, personally, but.. thanks anyway :)18:41
sdagueoh, hey, yeh I didn't see the bottom graph18:41
sdaguethat's pretty awesome18:41
jeblairdansmith: that's how we roll here18:41
clarkbAjaeger: yes, actually something similar to what I have done to sort out devstack-gate stuff may help18:41
dansmithjeblair: props, yo.18:41
clarkbAjaeger: but basically have a single project entry called openstack-manuals that covers all of the various subsets18:41
Ajaegerclarkb, let me check devstack-gate in projects.yaml18:42
clarkbAjaeger: the section starting on line 917 then splits out the subsets18:42
clarkbAjaeger: ^ is where you should look18:42
Ajaegerclarkb: thanks for the reference18:43
*** _david_ has joined #openstack-infra18:44
openstackgerritClark Boylan proposed a change to openstack-infra/config: Make devstack jobs templates and create check jobs
clarkbI am hopeful that ^ will actually compile correctly18:46
_david_clarkb, mordred, jeblair i am working on WIP Gerrit-Plugin against Gerrit master (upcoming 2.8 release) and hope to have something working in a few days18:47
clarkbjeblair: fungi mordred ^ that passes. I think it is ready if you are, but I am going to lunch shortly18:47
clarkb_david_: oh18:47
clarkb_david_: you should've told us earlier :)18:47
clarkbzaro: ^18:48
_david_i did18:48
jeblairclarkb: i believe _david_ is up to date on our efforts18:48
*** MarkAtwood has quit IRC18:48
clarkbah cool18:48
clarkbit is I who is behind18:48
jeblairclarkb: i think _david_ has a different risk profile with respect to working on gerrit and contributing upstream.  :)18:48
_david_with recent changes it is actually trivial thing to do18:48
_david_jeblair, ;-)18:49
jeblair_david_: neat.  do you think it will be an in-tree plugin, or a separate project?18:49
Ajaegerclarkb: so, something like this: ?18:50
_david_jeblair, what exactly do you mean by in-tree plugin?18:50
jeblair_david_: will it be in the gerrit repository, or a different one?18:51
zaro_david_: hi, did you comment on
jeblair_david_: (sorry, i haven't ever used a gerrit with plugins, i don't really know how they are maintained)18:51
_david_jeblair, that's a good question18:51
_david_zaro, yes, it was /me18:51
*** mrodden has quit IRC18:51
*** dkliban has quit IRC18:51
fungidevananda: ahh, new breakage then. that i think was the latest version trying to get us out of dependency hell in grizzly18:52
_david_jeblair, the only problem i see (may be we have more) that Change.State.WORKINPROGRESS and DashboardAccount should be extended in core and can be influenced by plugin,18:52
_david_well at least not yet.18:52
fungidevananda: dhellmann would probably be interested in your logstash link there18:53
_david_So here is my prototype for WorkInProgressAction (against Master):18:53
zaro_david_: ohh ok.  looks like difference of opinion going on.  hope it gets resolved soon.18:53
_david_jeblair, concerning place: we have two option18:53
_david_on gerrit-review or on openstack, right?18:53
dhellmannfungi, devananda : ryan is working on the problem18:54
_david_may be we still would need very little core patch to make it work,18:54
dhellmannfungi, devananda : but any debugging details you have may help18:54
fungiawesome, thanks dhellmann and ryan!18:54
jog0sdague: for bug
uvirtbotLaunchpad bug 1230407 in neutron "VMs can't progress through state changes because Neutron is deadlocking on it's database queries, and thus leaving networks in inconsistent states" [Critical,Confirmed]18:54
_david_but i hope to convince guys to make it work against upstream gerrit18:54
ryanpetrelloyep, seems to be some sort of issue introduced in the pecan/wsme plugin w/ today's pecan release18:55
*** julim has joined #openstack-infra18:55
dhellmannif we're seeing gate blockages, I can propose a change to pin pecan for now18:55
zaro_david_: what is the difference between your WIP plugin and my patch to upstream?18:56
*** sodabrew has joined #openstack-infra18:56
_david_zaro, i don't understand that question18:56
*** sarob_ has joined #openstack-infra18:56
zaro_david_: ohh, i implemented that patch so we can create a custom WIP vote and you are creating a WIP plugin.  so i'm just asking what would be the difference?18:57
_david_zaro, wip plugin is 1 to 1 migration of Shrews's change against latest master with maybe 10 lines of upstream patch (for now)18:58
*** dkliban has joined #openstack-infra18:58
*** dcramer_ has joined #openstack-infra18:58
zaro_david_: ahh i see.  thx for the clarification.18:59
sdaguejog0: what's the question?18:59
sdaguesorry so many pings18:59
openstackgerritDoug Hellmann proposed a change to openstack/requirements: Pin pecan to avoid the latest release
_david_zaro, in the handling: you don't vote with a label, you just mark it, as in Shrews's change, directly on the change screen18:59
dhellmannfungi, devananda : I opened for tracking rechecks and the real fix18:59
uvirtbotLaunchpad bug 1232199 in pecan "release 0.4 breaks some operations with WSME" [Undecided,New]18:59
Shrewsugh, don't remind me of that horrific coding experience19:00
jog0sdague: I'll move this to the qa room where its a little less noisy19:00
_david_Shrews, why? ,-)19:00
*** sarob has quit IRC19:00
clarkbAjaeger: yes. would need to check the output to be sure though19:01
*** sarob_ has quit IRC19:01
devanandadhellmann: thanks! judging by logstash, i suspect ceilometer and ironic are blocked on this, but nothing else is showing up yet19:02
fungiShrews: pick a time and i'll join you while you drown your memories of that project in a few pints. olaph too19:02
dhellmanndevananda: ok, good. see that changeset a few lines back for a requirements pin to work around it for now19:02
Shrewsfungi: yes, we should make that happen19:02
fungiShrews: olaph: the lynnwood grill next door to me just started a brewery recently, and now have several kinds ready for consumption on premises19:03
_david_Shrews, i wonder about that comment in your code: WORKINPROGRESS ... It implies that there is more work to be done, but the change will not show up in any review lists until a new patch set is pushed.19:03
*** vipul is now known as vipul-away19:03
Ajaegerclarkb: sure, this was untested, just wanted to know whether I'm on the right track.19:04
Shrews_david_: Where? Which comment?19:05
_david_does git push convert it? Is that true? Or does a change owner have to explicitly convert it to Status.NEW?19:05
_david_line 28519:05
*** yolanda has joined #openstack-infra19:06
Shrews_david_: The intent is that, along with clarkb's patch, any WIP review would not show up in a reviewer's list. Once a new patchset is pushed to a WIP review, it becomes "Ready for review" again.19:07
Shrewsdoes that answer your question? not sure exactly what you're looking for19:07
jeblairclarkb: i spot checked the output of your change locally, lgtm19:08
clarkbits basically a public draft19:08
_david_Shrews, Can you point me where that conversion takes place?19:08
_david_I thought you have two buttons: WIP and Ready for review?19:08
Shrews_david_: i don't think it was recorded. it was mainly discussed in this channel19:08
jeblairShrews: 'conversion' not 'conversation'19:09
jeblairShrews: (i did the same thing, finally read it right the 3rd time)19:09
Shrewsoh, duh19:09
_david_1/ git push => Status.NEW19:09
jeblairthe conversion conversation was not conserved19:09
_david_2/ i click on WIP button => Status.WIP19:09
_david_my question: how am i supposed to get back to Status.NEW again ?19:10
_david_All use cases please ;-)19:10
jeblairfungi, mordred:
fungijeblair: yep, almost through reading that one19:10
Shrews_david_: case 1) new patchset uploaded, case 2) press R4R button. fin19:10
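The transitions being discussed can be sketched as a tiny state machine. This is only an illustration of the behavior described in the conversation (push creates NEW, the WIP button sets WIP, and either a new patchset or the "Ready for review" button returns to NEW). The class and method names are hypothetical and are not Gerrit's API.

```python
from enum import Enum


class Status(Enum):
    NEW = "NEW"
    WIP = "WORKINPROGRESS"


class Change:
    """Hypothetical model of a review's WIP lifecycle."""

    def __init__(self):
        # 1/ initial git push => Status.NEW
        self.status = Status.NEW
        self.patchsets = 1

    def mark_wip(self):
        # 2/ owner clicks the WIP button => Status.WIP
        self.status = Status.WIP

    def mark_ready(self):
        # case 2) owner presses the "Ready for review" button
        self.status = Status.NEW

    def push_patchset(self):
        # case 1) a new patchset always makes the change reviewable again
        self.patchsets += 1
        self.status = Status.NEW


change = Change()
change.mark_wip()
change.push_patchset()
print(change.status, change.patchsets)
```

Modeled this way, a change in WIP has exactly two ways back to NEW, matching the two cases Shrews lists.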
Shrews_david_: I don't remember the code well enough to point you to specific areas19:11
notmynamemordred: FYI
_david_Shrews, i didn't find where case 1) is in code. can you point me?19:11
mordred18:00:36 hub_cap | one of my beefs is that i scream, fucking SCREAM at people internally19:11
jeblairhub_cap: i hear your screams from here19:12
mordrednotmyname: responded19:13
mordrednotmyname: swear swear swear grumble grumble swear swear19:13
mordrednotmyname: I kept my comment short, to keep the swearing out, fwiw19:13
*** basha has joined #openstack-infra19:13
Shrews_david_: I *think*. Like I said, I really can't remember the code too well19:14
notmynamemordred: and I'm the one who has to play the diplomat standing between the 2 of you ;-)19:14
mordrednotmyname: lovely19:14
mordredwell, his patch is completely non-functional19:14
mordredlike, it's not even close to being functional. it looks like a patch made in anger with absolutely no thought19:14
jeblairnotmyname: do you know if michael barton is planning on submitting a similar patch to the other 56 openstack projects?19:15
notmynamemordred: try not to review in that way ;-)19:15
_david_Shrews, i don't think so, there you put if the button on Views should be enabled or no19:15
mordrednotmyname: I will not19:15
*** mrodden has joined #openstack-infra19:15
mordrednotmyname: I am not, in fact, going to review it further19:15
jeblair(because if not, it may not be as well thought out as the patch that added in pbr)19:15
notmynamemordred: jeblair: and, like I said, FYI.19:15
Shrews_david_: well, i don't remember then19:15
mordrednotmyname: I believe "all of the openstack projects use it and it plays a key role in release management" should be clear enough19:15
fungiclarkb: no need for a check-tempest-devstack-vm-heat-slow since gate-blah is only in the experimental pipeline?19:15
*** jcoufal has joined #openstack-infra19:16
notmynamemordred: yes, but "the way things are" is not a compelling argument for most people. /me being a diplomat19:16
clarkbfungi right19:16
_david_Shrews, and you are absolutely sure that it is implemented?19:16
mordrednotmyname: I understand. but sometimes here, with as many projects as we have, I cannot make 56 different long-form arguments to everyone who would just happen to have chosen to solve the problem differently19:17
Shrews_david_: If it isn't then I don't know how review.o.o has been working that way for the last umpteen months19:17
notmynamemordred: agreed19:17
*** DinaBelova has quit IRC19:17
jeblairnotmyname: i, and i'm sure many others agree with you.  something about the fact that he chose to propose that patch without even trying to understand why things are the way they are rankles a bit.19:17
mordrednotmyname: thank you, btw, for diplomating here19:17
* lifeless is curious about which patch is being discussed; couldn't find the start of the conversation19:17
*** MarkAtwood2 has quit IRC19:18
notmynamejeblair: yes, but from the opposite perspective, pbr is making his day-to-day life more difficult without offering any perceived benefit (ie he now has to repackage the library himself instead of using something on pypi, and it includes more dependencies that may also need to be repackaged too)19:19
notmynamejeblair: note I'm not arguing against pbr here19:20
jeblairnotmyname: yep.  pbr makes some things more and some things less difficult, no argument there.  attempting to delete it is a strange way of learning about what those are and what solutions there may be to his problems.19:22
notmynamejeblair: we don't need to rehash long-form arguments about pbr here or now. I'll see what can be done19:23
lifelessmordred: notmyname: huh, what I find interesting is the lack of attempt to understand - did he file a bug on pbr and the situation it fails in?19:24
mordredthat is what I would like to respond19:25
mordrednotmyname: he does not have to repackage the library himself19:25
openstackgerritA change was merged to openstack-infra/config: Make devstack jobs templates and create check jobs
notmynamemordred: thanks. gotta run to a lunch meeting...19:26
mordrednotmyname: if he would read the documentation put together for packagers, he would see that he has to set an env var19:26
mordrednotmyname: thank you!19:26
*** basha has quit IRC19:27
hub_caplol mordred19:28
hub_capjeblair: u might be able to hear those screams19:28
jeblairfungi, mordred, clarkb:  Make devstack jobs templates and create check jobs  just merged; exciting things should be happening soon19:28
clarkbfingers are crossed19:28
* mordred waits19:28
*** odyssey4me has quit IRC19:28
jeblairhub_cap: i have been hearing a lot of sirens recently; do you have something to do with that?19:28
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Determine the package name when uploading to PyPI
hub_capnope. it could be the band of gypsies that have set up shop on dwight... a big bus of em, and some sleeping in cars in the area19:29
openstackgerritA change was merged to openstack-infra/config: Determine the package name when uploading to PyPI
fungii'm not sure exciting is what i want out of my evening... here's hoping it's exciting in a good way and not in the usual way19:31
jeblairi'm going to run puppet on jenkins masters manually to make that happen a bit faster and smoother19:33
jeblair(to minimize the time that the check jobs don't exist before zuul reloads and starts using them)19:34
*** odyssey4me has joined #openstack-infra19:36
jswarrenbnemec, if you're not busy fighting neutron or any other component, seems to have settled down a bit in case you're up for another look.  I seem to have a talent for finding problems to work on that are not straightforward to explain and whose solutions are not easy to justify concisely.  Just lucky, I guess.19:40
jeblairzuul change is going in now19:41
jswarrenoops, wrong channel.19:41
*** wchrisj_ has joined #openstack-infra19:46
ryanpetrelloFYI, I have a review open for pecan which will resolve the WSME issue19:46
*** CaptTofu has joined #openstack-infra19:47
fungimordred: is the current thinking that pbr should only be a setup_requires a la
*** jswarren has quit IRC19:47
fungimordred: because basically all of the clients still have it listed in their requirements.txt as if it were a runtime requirement19:48
fungiwhich i can see potentially confusing downstream/distro package maintainers19:48
jgriffithjeblair: clarkb interested in changing the settings to nova.conf in the gate..... not sure what repo/where the best place to do that is?19:51
jgriffithjeblair: clarkb I'd like to bump CONF.num_iscsi_scan_tries19:51
mordredfungi: it depends on whether or not they use it at runtime19:52
mordredfungi: for version processing19:52
fungimordred: got it... the bits which are in the process of being moved to oslo19:52
fungijgriffith: for devstack-gate jobs? if it makes sense to be adjusted as a default behavior for devstack, then in devstack. if it's really very specific to how we're testing things and not generally helpful (or potentially harmful) to other devstack use cases, then overriding in devstack-gate would be appropriate19:53
fungibut we try to keep devstack-gate from changing devstack defaults if at all possible, so that we don't "test with devstack" using configurations dissimilar to the way other people run devstack in general19:55
*** rfolco has quit IRC19:56
jgriffithfungi: hmm... ok19:56
jgriffithfungi: there's an awful lot of "added" changes from devstack in the gate configs which is why I asked but cool by me19:56
jeblairjgriffith: we hate all of them19:57
fungiwe've been moving those out as we can19:57
jgriffithjeblair: haha... Ok, now that makes more sense :)19:57
*** MarkAtwood has joined #openstack-infra19:58
*** SergeyLukjanov has quit IRC19:58
*** ryanpetrello has quit IRC19:59
*** vipul-away is now known as vipul20:01
mordredfungi: that's right20:01
jeblairzuul is now using the check jobs20:02
*** _david_ has quit IRC20:03
fungiso it should be safe to re-diversify the pipeline precedence settings again?20:03
*** ryanpetrello has joined #openstack-infra20:03
jeblairfungi: yes, if we're okay with the possibility of starving check of the unit test runners.  so all told, i'm leaning toward leaving it for now.20:04
openstackgerritDirk Mueller proposed a change to openstack/requirements: Raise Babel requirements to >= 1.1
openstackgerritAndreas Jaeger proposed a change to openstack-infra/config: Use Jenkins templates for old manual jobs
Ajaegerclarkb: your suggested change worked fine for me, I've updated the patch, see ^^20:06
*** alcabrera has quit IRC20:07
clarkbAjaeger: cool, I will take a look20:07
*** sarob has joined #openstack-infra20:07
Ajaegerclarkb: thanks. If you have further ideas, just comment on it and I'll fix in the following days. For now I'm calling it a day.20:08
* Ajaeger waves good-bye20:08
clarkbhave a good weekend20:08
*** alcabrera has joined #openstack-infra20:08
Ajaegerclarkb: thanks, same to all of you!20:09
*** yolanda has quit IRC20:09
*** Ajaeger has quit IRC20:09
clarkbjeblair: which zuul change did you want reviewed?20:11
*** basha has joined #openstack-infra20:11
*** sarob has quit IRC20:13
clarkbjeblair: we should also get in20:13
clarkbjeblair: I didn't approve due to the -1, but figure you can decide if that is worth overriding20:13
*** prad_ has quit IRC20:14
clarkb48684 lgtm20:14
*** dprince has quit IRC20:14
mordred48684 has now been reviewed by all of us20:14
Alex_Gaynorjeblair: want to review while you're in that area? (thanks!)20:14
jeblairAlex_Gaynor: nice catch, thanks20:15
*** CaptTofu has quit IRC20:16
*** basha has quit IRC20:16
*** prad has joined #openstack-infra20:16
*** prad has quit IRC20:16
*** rockyg has quit IRC20:18
*** rockyg has joined #openstack-infra20:18
*** dmakogon_ has quit IRC20:20
jeblairthat host was producing this error as fast as it could:
jeblairi disconnected/reconnected it20:21
jeblairi hate jenkins20:21
jeblairprecise10 is doing it as well20:21
*** CaptTofu has joined #openstack-infra20:22
clarkbjeblair: could that be related to the increase in slaves?20:22
clarkbjenkins does seem to have an upper bound on the number of slaves it can handle before it starts failing to keep them connected20:23
jeblairclarkb: beats me.  do you understand that traceback?20:23
clarkbI don't20:24
clarkbit is trying to run a remote connection20:24
jeblairclarkb: want to spin up jenkins03?20:25
fungii'm happy to start firing up a jenkins or two if you want to keep troubleshooting20:26
clarkbjeblair: we can try it20:26
clarkbI don't have much time to do that though, I need to finish prepping for next week20:27
fungilooks like we used a 30gb flavor?20:27
jeblairclarkb: to be clear, i wasn't suggesting it as much as asking if that was your suggestion.  ;)20:27
fungi8x vcpu with load average hovering a little over 5, slightly more than 50% of ram in active use (not buffers/cache). looks like it's sized appropriately--would be struggling a little on the next flavor down20:29
*** wchrisj_ has quit IRC20:29
clarkbjeblair: ah, yes. So in the grizzly cycle with one jenkins we ran into similar problems as we added more and more slaves20:30
jeblairclarkb: oh, did we see that error?20:30
fungii was looking at jenkins02, which is interestingly a little more heavily-loaded than jenkins01 for some reason20:30
clarkbjeblair: I don't remember if it was this specific error, but it happened in a similar way. Immediately when starting jobs jenkins threw an exception indicating that something in the communication had failed20:31
clarkbfungi: oh maybe20:31
jeblairfungi: well, that was the jenkins to which those two slaves were attached20:31
clarkbfungi: maybe we are running into that issue with the threads hanging around again20:31
*** flaperboon is now known as flaper87|afk20:31
fungicould be, just catching it in the early stages so symptoms aren't nearly as pronounced yet20:31
jeblairprecise12 just threw the same error20:33
jeblair(also jenkins02)20:33
fungi1.5m threads20:34
fungiThreads on Number = 1,628, Maximum = 2,152, Total started = 1,512,72720:34
clarkbsorry, I probably shouldn't find that so funny20:34
fungioh, wait, wrong counter20:34
openstackgerritDavid Peraza proposed a change to openstack/requirements: Adding sqlalchemy db2 dialect dependencies
fungiso no, not anywhere near as high as that last time20:34
clarkbyeah the Number value is what you want and that doesn't look too terrible20:34
fungipulling up 01 for a spot comparison20:35
*** ryanpetrello has quit IRC20:35
jeblairi just checked the rest of the precise nodes on jenkins02, they're not failing jobs with that error (yet)20:35
fungiThreads on Number = 1,276, Maximum = 1,474, Total started = 862,53820:36
*** ryanpetrello has joined #openstack-infra20:36
fungiso 02 is definitely higher, but only by about 30%20:36
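As a quick check of the "about 30%" estimate, using the two live thread counts fungi pasted (the variable names are just for illustration):

```python
# Live thread counts ("Number") reported for each Jenkins master above.
threads_jenkins02 = 1628
threads_jenkins01 = 1276

# Relative difference, with jenkins01 as the baseline.
pct_higher = 100.0 * (threads_jenkins02 - threads_jenkins01) / threads_jenkins01
print("jenkins02 has %.1f%% more live threads than jenkins01" % pct_higher)
```

That comes out a little under 28%, so "about 30%" is a fair rounding.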
jeblairbtw, the status page, starting with 48516 is interesting -- that's what happens when changes behind a single change fail in succession20:36
jeblair(and yeah, the top is broken; i'll fix that next week)20:36
*** sarob has joined #openstack-infra20:37
fungiwow, that's a great indication that the tempest change at 48516 is causing the trouble not for itself but for changes which follow20:38
fungioh, except those failures aren't in tempest tests (yet)20:39
jeblairfungi: yeah, that would be the interpretation except that the actual problem is that all of those changes happened to hit our bad jenkins nodes20:39
fungiso coincidence20:39
fungishould we put jenkins02 in shutdown and restart it to limp through before adding more masters (if we think we're bumping up against an inherent slave tracking limitation)?20:40
fungiand also scale down nodepool's per-master max setting?20:40
jeblairfungi: i reconnected those slaves, and they seem better at the moment; i think we can leave 02 as is for now; i don't really want to lose its capacity20:41
*** jcoufal has quit IRC20:42
fungiso back to the earlier question... go ahead and start building more masters? or hold off until we're more certain it's warranted?20:42
jeblairi wasn't expecting problems until we had more slaves, but perhaps 200/master is the mark.20:44
*** odyssey4me has quit IRC20:44
pleia2anteaya: gave owncloud a spin in win7 with IE9, all works as expected20:44
*** flaper87|afk is now known as flaper8720:44
anteayathanks pleia220:44
pleia2sure :)20:45
* pleia2 logs out of windows before she gets dirty20:45
jeblairi think we peaked at around 186 slaves total20:45
anteayano kidding20:45
anteayathat's a lot of slaves20:45
jeblairper master, including unit test workers20:45
uvirtbotLaunchpad bug 1148900 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [High,Fix released]20:46
jeblairblast from the past20:46
funginodepool is reinventing jclouds failure modes ;)20:47
fungiexcept not really, because these are static slaves which have been connected and running jobs just fine20:48
jeblairfungi: except these are long running nodes20:48
* fungi nods20:48
jeblairfungi: i am leaning toward not spinning up another master20:49
jeblairi favor: if it happens again, restart that jenkins master, and if it happens again after that, add a new master.20:49
zaropleia2: did you try map drive using webdav protocol?20:49
pleia2zaro: no, that's a good idea20:49
fungii like that having multiple masters, we can restart them now without any downtime for other systems, merely temporary loss of capacity20:50
jog0are you running jobs on rax yet?20:50
jog0I think that may be breaking the large-ops test20:50
jeblairjog0: yes20:50
jeblairjog0: link?20:50
zaropleia2: i had problems with that last time i tried.20:50
jog0so large
jeblairjog0: yeah, it looks like the only successful runs of check-tempest-devstack-vm-large-ops have been on hpcloud20:51
jeblairjog0: any ideas?20:51
fungithreadcount on jenkins01 and 02 is equalizing a bit now as well20:51
openstackgerritA change was merged to openstack-infra/config: Handle when `id` is null.
jog0jeblair: we would have to tweak the large-ops number for rax20:52
openstackgerritA change was merged to openstack-infra/zuul: On null changes serialize the id as null
openstackgerritA change was merged to openstack-infra/zuul: Allow multiple invocations of the same job
jeblairjog0: why?20:52
jog0because it was tuned to work for hpcloud20:52
jog0the test checks to see if it can boot x VMs using the fake virt driver, where a common error is something timing out20:52
fungiseems a bit inexact20:53
jeblairjog0: so why would that need to be different?20:53
*** MarkAtwood has quit IRC20:53
jog0so rax cloud is running slower so timeouts happen with less VMs20:53
jeblairBuildErrorException: Server %(server_id)s failed to build and is in ERROR status20:53
fungibasically it's performance-testing the cloud provider, it seems20:53
jeblairjog0: a server being in error state is a result of that?20:53
jog0fungi: yeah and our code too20:53
jog0jeblair: yup20:54
jog0nova-net times out20:54
openstackgerritDavid Peraza proposed a change to openstack/requirements: Adding sqlalchemy db2 dialect dependencies
jog0when all cloud resources were equal, the test just performance tested our code. but with two very different clouds ... :(20:55
jeblairjog0: it was an illusion that all cloud resources were equal, i'm afraid20:55
jeblaireven hpcloud has significant variance20:55
jog0some are more equal than others?20:55
jeblairespecially when we approach release deadlines.  :)20:55
jog0jeblair: yeah the number I picked before seemed pretty stable20:55
jog0across all HP cloud20:55
jeblairso these aren't really designed to be performance tests -- ideally these should work on developers laptops too...20:56
jog0never got fails like this with HP cloud, at least extremely rarely  (I never found one)20:56
jog0jeblair: it does you just have to pick one param20:56
jeblairjog0: ideally the test would be structured to be more tolerant of the environment it's running in.  but for our immediate problem, would you like to adjust the parameter or remove the test?20:57
jog0jeblair: lets just remove it due to the nature of the gate right now I think its safe to say this shouldn't get priority at this juncture20:58
jog0and revisit post havana20:58
*** julim has quit IRC20:58
jeblairjog0: shame to lose a test.  :(20:59
jog0yeah ...20:59
jog0I think the answer in the future will be have two numbers one for hpcloud and one for rax20:59
jog0that will take at least a day of testing and whatnot to get right21:00
jeblairjog0: and one for the next provider we get, and one for the one after that?21:00
jog0have to run recheck a dozen times or so to be sure I am right21:00
jog0we can maybe find a CPU perf metric to correlate with a number21:00
jog0once we get two datapoints21:00
fungiunfortunately, those will also probably have to be retuned even for existing providers as their performance characteristics change over time21:01
jog0so if CPU A is 30% slower than CPU B, number should be 30 percent lower too21:01
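jog0's proportional rule can be written down as a one-line calculation. This is a sketch only: the function name `scale_vm_count` and the baseline of 100 VMs are made up for the example, and the rule assumes a provider's relative slowdown is stable, which the discussion that follows calls into question.

```python
def scale_vm_count(baseline_count, slowdown_percent):
    """Reduce the VM count by the same percentage the provider is slower.

    Integer arithmetic is used so the result is an exact whole number of VMs
    (rounded down).
    """
    if not 0 <= slowdown_percent < 100:
        raise ValueError("slowdown_percent must be in [0, 100)")
    return baseline_count * (100 - slowdown_percent) // 100


# Hypothetical baseline of 100 VMs tuned on the faster provider.
print(scale_vm_count(100, 30))
```

A fixed percentage like this would need re-measuring whenever a provider's performance shifts.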
*** freyes has quit IRC21:02
jog0fungi: perhaps, the test is there to detect order of magnitude slowdowns21:02
*** matty_dubs is now known as matty_dubs|gone21:02
jog0and I would hope a cloud wouldn't have that fluctuation21:03
jeblairjog0: i used to hope that21:03
*** sodabrew_ has joined #openstack-infra21:04
jog0jeblair: lets talk about a smarter way to do this in Edinburgh21:04
jog0or HK21:04
fungiwe've definitely been in situations where new vms ended up on compute nodes with very resource-hungry neighbors21:04
jeblairjog0: we have seen some of the metrics we care about change up to 3x over time; including both cloud providers.21:04
* fungi needs to disappear and do a bit of cooking... bbl21:05
jog0jeblair: ouch21:05
*** sodabrew has quit IRC21:06
jog0well if we collect those numbers today ... we can make something adjust to that21:06
jeblairjog0: so i think we can probably live with running the large-ops test only on hp for now, as long as we definitely plan to improve it later.21:06
jog0that would be awesome21:06
jog0that test came out of the issues with rootwrap21:07
*** ArxCruz has quit IRC21:07
*** tjones has joined #openstack-infra21:07
jog0jeblair: didn't realize that was an option to put it on one cloud only21:08
*** julim has joined #openstack-infra21:08
jeblairjog0: it's not a good option -- it's working against how we're trying to manage resources.  and if we have further problems, it'll be the first thing to go.  but we can try it.  :)21:08
*** senk has joined #openstack-infra21:09
*** jcoufal has joined #openstack-infra21:10
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Run large-ops test only on hp nodes
*** rnirmal has quit IRC21:11
jog0fair enough21:11
jog0yeah we need to revisit this in the near future21:11
jeblairjog0: so while you're around... other than Zhi Kun ZK Liu being on vacation, do you know if work on those 2 bugs is progressing?21:12
jog0jeblair: a little; sdague and jgriffith and dims are doing stuff21:13
jog0jeblair: see -qa21:13
jeblairjog0: thx21:14
jog0my call to arms / public shaming worked a little21:14
jgriffithjog0: /window 2521:15
*** vipul is now known as vipul-away21:15
*** senk has quit IRC21:17
*** julim has quit IRC21:19
*** tjones has quit IRC21:20
jeblairi just saw some more of those errors21:20
jeblairi've put jenkins02 in shutdown21:20
*** markmcclain1 has joined #openstack-infra21:21
*** markmcclain has quit IRC21:22
*** markmcclain has joined #openstack-infra21:22
*** markmcclain has quit IRC21:24
*** markmcclain has joined #openstack-infra21:24
*** markmcclain1 has quit IRC21:26
jeblairclarkb: ping21:27
jeblairclarkb: i need to be merged but it depends on
*** alcabrera has quit IRC21:27
*** vipul-away is now known as vipul21:28
*** anteaya has quit IRC21:28
dimsk i'll be back in a few hours21:29
*** markmcclain1 has joined #openstack-infra21:29
jeblairlacking that, i have manually executed "set global max_connections=1024;" in mysql on nodepool21:29
*** mriedem has quit IRC21:30
*** markmcclain has quit IRC21:30
ryanpetrellookay, a new version of pecan (0.4.2) has been released that resolved the wsme breakage21:36
jeblairoh nevermind, 0.6.1 doesn't have it either21:36
jeblairclarkb: ^21:36
clarkbjeblair: looking21:36
dhellmannjeblair, fungi: we'd like to land so we can set up cross-check jobs to gate pecan and WSME. The change has 2 +2 but isn't approved. Is there something else we need?21:36
jeblairclarkb: i'm trying to add max_connections; i don't think it's supported even in 0.6.1.  i may have to add a /etc/mysql/conf.d/ file21:38
clarkbjeblair: we could potentially go to an even newer version. 0.6.1 was chosen to minimize delta while getting the desired results21:38
mgagnejeblair: looks to be only supported in 1.0.0. adding a custom conf file looks to be the solution atm. I have the same problem with my setup.21:39
jeblairdhellmann: i think we're afraid to merge that at the moment (if it goes wrong everything breaks), and there's quite a bit of excitement already.21:39
dhellmannjeblair: fair enough :-)21:39
dhellmannjeblair: we'll work on setting up the tests, and come back when things settle down to configure the gate jobs21:39
mgagnejeblair: 0.9.0 supports it
jeblairdhellmann: ok.  feel free to ping us when you think it might be a good time (in case it slips our minds)21:40
dhellmannjeblair: count on it! ;-)21:40
jeblairmgagne: oh, that might work.  it has both config_hash and max_connections.21:41
dkranzThis recent failure looks like some infra issue but I haven't seen it before
mgagnejeblair: 0.8.0 looks to be the first version to support the parameter.21:41
jeblairdkranz: in what way?21:41
dkranzjeblair: It seems to just stop during setup of tempest21:42
*** pabelanger has quit IRC21:46
jeblairdkranz: it looks like it stopped while running devstack.  but i don't think it's an infra problem -- the node continued to run, including doing all of the cleanup work and copying the log files21:47
dkranzjeblair: So what kind of problem do you think it is? Should I just recheck no bug?21:48
dkranzjeblair: I've been trying not to do that.21:48
jeblairdkranz: i'd start with the idea that it's a bug in devstack.  note that lots of services are running and devstack has been doing work to set up images, etc... so it at least got that far.21:51
dkranzjeblair: OK, I'll check there and file a bug if I don't turn up anything. Thanks.21:52
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Set mysql max_connections to 1024 on nodepool
*** bnemec_ has joined #openstack-infra21:53
jog0dkranz: I have seen things like this before so opening a bug may be a good idea21:55
dkranzjog0: I asked Jim and he suggested starting with the idea that it is a devstack bug21:55
dkranzjog0: I will file a bug there if there isn't one already21:55
*** pcm_ has quit IRC21:56
dkranzjog0: Because the job does finish but just stops in the middle of devstack running21:56
dkranzjog0: presumably returning non-zero exit code21:56
jog0sigh yet another racy bug21:56
*** bnemec has quit IRC21:57
fungiwe don't have enough of those yet21:58
openstackgerritA change was merged to openstack-infra/config: Run large-ops test only on hp nodes
*** flaper87 is now known as flaper87|afk22:02
*** pabelanger has joined #openstack-infra22:02
mordredmorning all. I'm back online - anything I can jump on?22:03
jeblairi'm about to restart jenkins02 because of the errors we saw earlier (check scrollback)22:04
clarkbmordred: puppet-mysql has come up again22:04
mordredclarkb: ugh. what now?22:04
clarkbmordred: that's not super urgent though22:04
clarkbmordred: jeblair needs to limit the number of connections for nodepool and the version of the module we have doesn't do that22:04
clarkbmordred: newer versions do22:04
jeblairclarkb: _raise_ the limit22:04
lifelessanyone seen22:05
lifeless  File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/pip/backwardcompat/", line 90, in fwrite22:05
mordredah. interesting22:05
clarkbjeblair: ah22:05
lifeless    f.write(s)22:05
lifelessValueError: I/O operation on closed file22:05
lifelessbefore ?22:05
mordredjeblair: not doubting - but are you sure that's what you want to do?22:05
jeblairmordred: yes.  please read the commit message and let me know if you think otherwise.22:05
mordredjeblair: increasing max_connections often has less positive effects than you might want (if you are sure, then fine, just checking)22:05
mordredjeblair: ok. cool.22:05
jeblairmordred: i'm not running a php script in apache, which is more or less what the default is tuned for.  :)22:06
mordredah. ok. so, each threadconnection should essentially be performing like a quick query22:06
*** thomasm has quit IRC22:06
mordredthe patch looks good- I potentially agree with fungi's comment - but I haven't really used conf.d files in anger22:07
jeblairyes, except it might be a couple of queries separated by like 10 minutes, but each only looking at one row.22:07
fungijeblair: i think it's evidence nodepool should have been written in php22:07
mordredlifeless: yes. but I cannot for the life of me remember why or what it was trying to do wrong22:07
jeblairmordred: if you could answer fungi's comment-question, that would be swell.22:08
*** dcramer_ has quit IRC22:08
mordredah - answer is "yes"22:08
jog0clarkb: can you make the elastic-recheck gerrit user22:08
mordredit needs to be in [server[]22:08
mordredit needs to be in [server]22:08
jog0so I don't have to keep using my own account22:08
mordredor mysqld22:09
mordredeither will work22:09
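For illustration, the kind of conf.d fragment being discussed could be rendered like this. It is a rough sketch only: the function name is hypothetical, and it is not the content of the proposed change; the section names and the value 1024 come from the conversation above.

```python
def render_mysql_conf(max_connections=1024, section="mysqld"):
    """Render a minimal MySQL option-file fragment raising max_connections.

    mordred notes above that the setting may live in either the [server] or
    the [mysqld] group, so only those two section names are accepted here.
    """
    if section not in ("mysqld", "server"):
        raise ValueError("section must be 'mysqld' or 'server'")
    if int(max_connections) < 1:
        raise ValueError("max_connections must be a positive integer")
    return "[{0}]\nmax_connections = {1}\n".format(section, int(max_connections))


# The text that would go into a file under /etc/mysql/conf.d/
# (the file name itself is left to the deployer).
print(render_mysql_conf())
```

A running server can also be changed without a restart using the statement jeblair ran manually ("set global max_connections=1024;"), but that setting does not persist across restarts, which is why a config file is wanted.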
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Set mysql max_connections to 1024 on nodepool
morganfainbergjog0, using your own account just makes you look like you're looking at everyone's changes ;)22:10
*** sodabrew_ has quit IRC22:10
jog0morganfainberg: but it sends me too many emails22:10
jeblairi'm going to upgrade the gearman plugin on jenkins02 since i'm restarting it anyway22:10
morganfainbergjog0, hehe.  i bet.22:10
fungijog0: i can do it after i stop cramming food in my mouth hole. need an ssh key and, if possible, a dedicated contact e-mail address (not shared with any other gerrit user since gerrit has issues with duplicate e-mail addresses) and a display name you want it using in comments if different from the ssh username (can include spaces and whatnot)22:10
*** tjones has joined #openstack-infra22:11
jeblairfungi: this is an infra account22:11
jeblairit's going to be run on the logstash host22:11
jeblairfungi: so i think we should create it ourselves and stick it in hiera22:11
fungiso we'll want to puppet the keys in and whatnot22:11
jog0jeblair: I was hoping at first I could run it on my box for debugging and whatnot22:11
jog0if not I can work around that too22:12
clarkbwhy don't I fix my review really fast22:12
clarkbthen maybe we can just deploy it on logstash.o.o and debug there22:12
*** sarob has quit IRC22:12
jog0clarkb:  works for me22:12
*** AlexF has joined #openstack-infra22:12
*** sarob has joined #openstack-infra22:13
*** alexpilotti has joined #openstack-infra22:14
openstackgerritClark Boylan proposed a change to openstack-infra/config: Deploy elastic-recheck on
*** flaper87|afk is now known as flaper8722:15
clarkbjog0: fungi jeblair ^ there we go22:15
*** sarob has quit IRC22:17
jog0clarkb: so I don't think elastic-recheck is wired up to pip yet22:18
jog0not really sure what's needed to put it on pypi22:18
clarkbjog0: we don't need it on pypi22:19
clarkbjog0: we will CD it from git22:19
clarkbjog0: we just need it to be python installable22:19
jog0even better22:19
jog0ohh haven't tried that heh22:20
clarkbeventually we may want to pypi it, but for now this is good22:20
jeblairrestarting jenkins0222:20
*** datsun180b has quit IRC22:22
jeblairthe thing i love about the gearman plugin is how it starts running jobs before jenkins webui is even up.22:24
*** jcoufal has quit IRC22:25
mordredjeblair: ++22:26
jeblaireven before the nodes themselves are ready.22:27
mordredwell, that's less exciting, but still fun22:28
*** justinabrahms has joined #openstack-infra22:28
jeblairwell, after failing 100 jobs or so, it seems to be a bit better now.22:30
sdagueclarkb: where in the tree are the logstash parsing rules?22:30
clarkbsdague: modules/openstack_project/templates/logstash/indexersomething22:31
*** _david_ has joined #openstack-infra22:32
_david_clarkb, jeblair, mordred done ;-)22:32
_david_WIP plugin (on top of Gerrit 2.8):
_david_Even with screen cast, you can see it in action on new and shiny change screen 222:33
*** flaper87 is now known as flaper87|afk22:33
sdagueclarkb: cool22:33
_david_And this is the patch upstream that is still needed for that to work:
clarkb_david_: are there any ACLs around it?22:34
_david_clarkb, sure ;-)22:34
_david_let me point you to that:22:34
clarkb_david_: that is where zaro's patch comes in, being able to allow change owners permissions to do things to a change that not everyone else may be able to do22:34
clarkb_david_: awesome22:35
_david_clarkb, take a look at the pictures22:35
_david_in Gerrit 2.8 i introduced so called plugin owned capabilities (old permissions):22:36
_david_so you can just annotate REST endpoints:22:36
clarkb_david_: then in your ACL config you would give that capability to groups?22:37
* _david_ solved ACL in another patch already:22:37
*** che-arne has joined #openstack-infra22:38
jeblairclarkb, mordred, fungi: i had to disconnect/reconnect some slaves from jenkins02 because they couldn't find their workspace22:38
mordredjeblair: k. that's weird22:38
jeblairi think it's because gearman plugin started using them too early22:38
jeblairand they don't seem to be able to fix themselves22:38
*** tjones has quit IRC22:39
_david_clarkb, exactly, Capabilities are global permissions (exactly like in Shrews change).22:39
mordredjgriffith: just catching up - are you making progress anywhere with the CONF.num_iscsi_retries ?22:40
clarkb_david_: perfect22:41
*** CaptTofu has quit IRC22:42
jgriffithmordred: just started running it through gates22:44
mordredjgriffith: awesome. here's hoping it helps!22:44
jgriffithditto... although at this rate it will take forever to have any good data22:44
jeblairi just disconnected all of the precise slaves from jenkins0222:45
jeblairthat was a lot of clicking22:45
jeblairi think the restart process needs to be:22:45
jeblairenter shutdown mode; wait; disable gearman plugin; stop; start; wait; enable gearman plugin22:45
mordredjeblair: yes. I agree22:46
jeblairclarkb, fungi: ^ fyi22:46
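The restart order jeblair proposes can be written down as an ordered checklist. This is a minimal sketch under stated assumptions: the step names are copied from the message above, while `run_restart` and the `actions` mapping are hypothetical placeholders for whatever an operator actually does at each step; nothing here calls Jenkins.

```python
RESTART_STEPS = [
    "enter shutdown mode",
    "wait",
    "disable gearman plugin",
    "stop",
    "start",
    "wait",
    "enable gearman plugin",
]


def run_restart(actions, steps=RESTART_STEPS):
    """Run one callable per step, strictly in order.

    `actions` maps a step name to a zero-argument callable (hypothetical).
    Every step must have an action before anything runs, and an exception
    from any step aborts the remaining steps.
    """
    missing = [step for step in steps if step not in actions]
    if missing:
        raise KeyError("no action for steps: {0}".format(missing))
    completed = []
    for step in steps:
        actions[step]()
        completed.append(step)
    return completed


# Dry run that only prints each step name.
dry_run = {step: (lambda s=step: print(s)) for step in RESTART_STEPS}
run_restart(dry_run)
```

The point of the ordering is that the gearman plugin stays disabled across the stop/start, so it cannot hand out jobs before the slaves have settled.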
*** dcramer_ has joined #openstack-infra22:48
*** _david_ has quit IRC22:51
fungimakes sense to me22:52
clarkbwe didnt have problems with the last restart22:52
clarkbbut being defensive can't hurt22:53
fungiwe probably need something somewhere which can tell whether the slaves are ready and waits for them to settle before jenkins starts accepting jobs on their behalf22:53
sdaguewhere is that cookie cutter repo again?22:53
fungior maybe it just waits for us to start connecting slaves directly to the gearman server22:53
fungisdague: openstack-dev/cookiecutter22:54
sdaguejgriffith: it seems to have hit the same issue again22:54
jgriffithanybody else noticed the errors spewing everywhere22:59
*** nicedice has joined #openstack-infra22:59
* fungi checks his faucet23:02
fungijgriffith: which errors? and i assume spewing in job failure console logs, but... example?23:03
jgriffithfungi: just step through a search on error or trace23:03
*** rcleere has quit IRC23:04
jgriffithfungi: I'm also confused by the xen volumes mounted in this test output23:04
*** AlexF has quit IRC23:05
jgriffithxen-vdb-51744-part1 etc23:05
fungigrr. i'm clearly on the wrong evening computer. it's hanging up my browser23:05
*** sodabrew has joined #openstack-infra23:05
* jgriffith wants diff computers for diff times of day :)23:06
jgriffithjeblair: sdague well that didn't tell us much except that upping the retry count isn't going to help us23:07
jgriffithwhat's bothersome about this is if you look at syslog, it appears that we connected over IET successfully23:09
*** boris-42 has quit IRC23:11
fungieek, clicking trace on that log oom'd firefox, but took this poor netbook with it for several minutes while it dod so23:11
fungidid so23:11
*** gyee has quit IRC23:11
fungi512mb ram used to seem like a lot23:12
jgriffithfungi: hehe23:12
* jgriffith takes back his earlier comment about wanting multiple computers like fungi23:12
fungiyeah, you don't want these23:12
* fungi has random linux thinnish-clients scattered around the house23:13
jog0clarkb: python install works for elastic-search23:13
jog0just doesn't install any binaries23:13
clarkbjog0: awesome. I think the puppet is mostly ready then (it is missing an init script, but we can run it manually until we get one)23:14
clarkbfungi: yes manually running it was the intention until we had time to do it proper like23:14
clarkbfungi: did you still want to create the system account and put it into hiera? I am being distracted by Fridayness23:15
clarkbeg end of week fried brain23:15
openstackgerritSalvatore Orlando proposed a change to openstack-infra/devstack-gate: Revert "Enable q-vpn service"
jgriffithhey wait...23:17
jgriffithis it just me or is that SID not correct?23:17
jgriffithSCSI ID23:19
jgriffithsomething's not aligning correctly in the logs23:19
jgriffithso notice in the nova logs we try to open/connect around 22:44:1723:20
jgriffithand the scsi ID is 623:20
jgriffiththen check the syslog, and at that time you see a connection made for a target ID 523:21
sdagueany idea why didn't collect logs after timeout23:24
clarkbsdague: it didn't get a chance to run the cleanup function in devstack-gate23:26
clarkbthat is an annoying problem23:26
mordredsomething about this: "jgriffith | hmmmm" terrifies me23:26
sdaguehe didn't say muhahaha23:26
jgriffithnahh, was wondering if there's something bad happening with iscsi mixing up targets23:26
mordredjgriffith: I blame shuttleworth23:27
clarkbsdague: not sure how we can handle that better. couple things come to mind like run a post build shell action that does the copying or trapping SIGINT and running cleanup then (assuming that is how jenkins is killing the test)23:27
jgriffithmordred: ha!  I've been doing that for a year!23:30
sdaguemordred: that's always your answer, at least on fridays23:30
mordredsdague: also on the other days that end in y23:31
*** ryanpetrello has quit IRC23:32
*** che-arne has quit IRC23:34
jeblairso who wants to restart jenkins01? :)23:36
jeblairit's not exhibiting problems, but i think it would be a good idea, possibly as a preventive measure, and also to upgrade the gearman plugin23:36
jgriffithK, on a hunch that there's a target collision I'm adding a show targets message to the output23:37
jeblair(i've uploaded the plugin, so it will take effect on restart)23:37
jgriffithI'm likely not going to be around for a bit but I'll check it out when I get back to a computer23:37
openstackgerritA change was merged to openstack-infra/jenkins-job-builder: Add publisher for Git Publisher support
jeblair(i also uploaded it to jenkins.o.o)23:38
*** alexpilotti has quit IRC23:42
mordredjeblair: the process is "put into shutdown; wait; disable gearman plugin; wait; stop; start; enable gearman plugin"23:43
mordredjeblair: right?23:43
jeblairmordred: yes23:43
mordredputting jenkins01 into shutdown mode23:48
jeblairi'm heading out23:49
jeblairmordred: thanks for taking care of 0123:49
mordredk. sure thing! thanks for taking care of all of infra!23:49
clarkb++ jeblair is a good keeper of the gate keeper23:50
*** mgagne has quit IRC23:51
jeblairmordred: if you want to do jenkins.o.o at the same time it's ready (and should be easy, can probably do it while you're waiting on 01)23:52
*** KennethWilke has quit IRC23:53
*** sodabrew has quit IRC23:53
*** UtahDave has quit IRC23:54
mordredjenkins is in shutdown mode23:55
*** sodabrew has joined #openstack-infra23:57
*** sodabrew has quit IRC23:58
