21:01:47 <mikal> #startmeeting nova
21:01:49 <openstack> Meeting started Thu Sep 11 21:01:47 2014 UTC and is due to finish in 60 minutes.  The chair is mikal. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:53 <openstack> The meeting name has been set to 'nova'
21:01:58 <melwitt> o/
21:02:01 <mikal> Well hello
21:02:09 <dansmith> quick would be awesome
21:02:09 <mikal> The agenda as always is at https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
21:02:11 * mriedem1 sneaks in late
21:02:18 <alaski> o/
21:02:21 <mikal> #topic Feature freeze exceptions
21:02:27 <mikal> #link https://etherpad.openstack.org/p/juno-nova-approved-ffes
21:02:33 <mikal> We still have a few inflight
21:02:44 <mikal> The deadline for them to _enter_ the gate being midnight Friday this week
21:02:54 <dansmith> so
21:02:57 <dansmith> we just had a thing
21:03:02 <dansmith> that kicked out at least one
21:03:06 <mikal> Looking at the list, I think server group quotas is the most off the rails of the four left
21:03:23 <mikal> dansmith: as in verification failed?
21:03:42 <mriedem1> mikal: no gate reset
21:03:42 <dansmith> mikal: as in, 100% of unit tests were failing
21:03:44 <PhilD> Code's there - but it needs a tempest change which is in the gate.  Going to just be a race against the gate at this point
21:04:03 <mriedem1> mikal: bad sqla-migrate release broke unit tests, had to be promoted to top
21:04:06 <mriedem1> which reset the gate
21:04:12 <mikal> mriedem1: ahhh ok
21:04:24 <dansmith> well, and SRIOV got kicked because it "failed"
21:04:24 <mikal> I wouldn't panic yet
21:04:27 <dansmith> because of unit tests
21:04:37 <mikal> If things are approved and we're just fighting the gate then we can work through that
21:04:52 <mikal> Let's go through these four real quick
21:04:57 <mikal> vmware refactor
21:05:02 <mikal> There are two minor patches left?
21:05:17 <mriedem1> https://review.openstack.org/#/c/100927
21:05:18 <mikal> One approved and one not
21:05:18 <PhilD> The tempest change needed for sg quotas has been in the gate for 21hrs.  Once that merges I'm expecting the nova changes to verify -
21:05:22 <mriedem1> that's the last one it looks like
21:05:39 <mikal> https://review.openstack.org/#/c/119696/ is approved but not merged as well
21:05:50 <mriedem1> the latter sounds like cleanup, not functional
21:05:50 <mikal> But yeah, I think vmware refactor is basically done except for that one
21:05:54 <mriedem1> yup
21:06:04 <tjones> hurrah!
21:06:16 <mriedem1> it's not tied to the bp either
21:06:18 <mriedem1> https://review.openstack.org/#/c/100927 that is
21:06:25 <mikal> mriedem1: oh, as in we might just do that last one during stabilisation anyways?
21:06:37 <mikal> Ahhh, well spotted
21:06:43 <mikal> I had trusted the etherpad list
21:06:45 <mikal> Ok
21:06:53 <mikal> So, vmware refactor is safe I'd say
21:07:02 <mikal> virt-numa-driver-placement then
21:07:35 <mikal> Two there need approval
21:07:36 <dansmith> mikal: fully in the gate, I think
21:07:41 <mikal> https://review.openstack.org/#/c/115381/
21:07:49 <mikal> https://review.openstack.org/#/c/115007/
21:08:00 <dansmith> oh, thought jaypipes had done that one already
21:08:17 <mikal> So, it would be cool if someone could be the second +2 on those todayish
21:08:31 <jaypipes> I will.
21:08:31 <dansmith> jaypipes: was going to I think he prolly just got distracted
21:08:33 <dansmith> yeah
21:08:36 <jaypipes> I thought I already had...
21:08:36 <mikal> I will take a look at them after the meeting unless someone beats me to it
21:08:48 <mikal> Cool, so virt-numa is safe then too
21:08:54 <jaypipes> mikal: nah, I'm on it.
21:09:02 <mikal> jaypipes: ta
21:09:04 <mikal> server group quotas
21:09:08 <mikal> PhilD: its your time to shine
21:09:17 <mikal> We're waiting on the tempest change?
21:09:17 <PhilD> First patch is in the gate
21:09:32 <PhilD> second part is +2'd but needs the tempest change
21:10:11 <mikal> https://review.openstack.org/#/c/116079/ needs rescue though?
21:10:11 <PhilD> third part has been reviewed OK but was missing v2.1 work.  That's there now - waiting for Chris and Ken'ichi to come online and do a final pass
21:10:28 <mikal> Ahhh, ok
21:10:34 <mikal> So you think you're covered once those guys wake up?
21:10:42 <PhilD> Needs Chris and Ken'ichi to re-review it yes, but the change is just to 2.1
21:10:44 <mikal> Or do you want to ask if anyone else can review as well?
21:10:46 <PhilD> Yep, I think so
21:10:50 <mikal> Cool
21:10:55 <PhilD> More eyes are always welcome
21:10:56 <mikal> So, server groups is mostly safe then
21:11:11 <mikal> SRIOV is the last one
21:11:29 <dansmith> SRIOV was fully in the gate earlier
21:11:32 <mikal> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/pci-passthrough-sriov,n,z is the correct list for this one, right?
21:11:40 <dansmith> after a check run of the base one, it should all pop back in
21:11:47 <mikal> https://review.openstack.org/#/c/120675/ needs re-approval?
21:12:05 <mikal> https://review.openstack.org/#/c/120423/ depends on an outdated dependency?
21:12:07 <dansmith> mikal: that one is not part of the original thing
21:12:14 <mikal> Ahhh, ok
21:12:16 <dansmith> mikal: something baoli and I are adding on top for more testing
21:12:18 <mikal> Its in the gerrit list is all
21:12:23 <dansmith> yep
21:12:26 <mikal> Ok, cool
21:12:44 <mikal> Can you turn the etherpad list into the canonical list of reviews you need to merge for the bp to be marked completed?
21:12:49 <mikal> That will help me know when to mark it done
21:13:04 <dansmith> okay
21:13:06 <sdague> so, how possibly destabilizing do we think any of these ffes could be?
21:13:31 <mikal> Well, we approved them all
21:13:35 <mikal> So in theory they shouldn't be too bad
21:13:37 <sdague> because back of the envelope... these are probably not really landing until middle of next week if "in gate by friday" is the criteria
21:13:50 <PhilD> The quota change is pretty well contained, so its scope for mayhem is limited
21:14:05 <mikal> Our gate load should be pretty light, I guess that depends on how wild other teams are going
21:14:07 <sdague> yeh, vmware and quotas feel pretty off in their corner
21:14:22 <mikal> SRIOV sounds well tested
21:14:35 <mikal> i.e. someone has actually deployed it to test, right Dan?
21:14:36 <PhilD> Numa and SRIOV look like they have a much bigger surface
21:14:36 <sdague> yeh, my bigger concern is the fact that we went from 1 in 10 tests failing to 1 in 2
21:14:40 <sdague> during FFE
21:14:45 <sdague> for all of openstack
21:14:47 <mikal> Herm
21:14:50 <dansmith> mikal: yeah, it *actually* works :)
21:14:50 <mikal> That's not good
21:14:59 <sdague> that's why there is a 30 hr backup
21:15:07 <sdague> which was a 26 hr backup this morning
21:15:08 <baoli> mikal, it's been tested on cisco's vmfex and mlnx's
21:15:11 <mikal> sdague: do we know what bit is less stable now?
21:15:22 <sdague> mikal: no...
21:15:25 <jogo> another way to look at this is: our record for merges to openstack/openstack in a single day is something like 140
21:15:34 <jogo> yesterday we merged 38
21:15:39 <sdague> yeh, right now we should be merging 120 - 150 a day
21:15:47 <sdague> given the in flow
21:15:58 <sdague> so I get everyone wants their feature in
21:16:16 <jogo> mikal: everything is pretty bad right now
21:16:38 <mikal> Looking at the calendar, we cut rc1 on 25 September (ish)
21:16:40 <PhilD> If someone could keep an eye on the tempest change https://review.openstack.org/#/c/120395/ and re-verify the other two Nova changes if/when it lands overnight (my time) that would help me
21:16:43 <sdague> but ... the trade off with this much energy still on features at this point in the game, seems odd
21:16:48 <mikal> Which is 14 days away
21:17:05 <jogo> horizon, neutron, nova, mordred, glance
21:17:16 <mikal> mordred is unreliable?
21:17:41 <mikal> sdague: I agree we need to move past this to really getting stabilization happening
21:17:44 <sdague> he did push sqlalchemy-migrate package which broke all our unit tests
21:17:49 <jogo> mikal: yup  https://review.openstack.org/gitweb?p=openstack/nova.git;a=commitdiff;h=9c2d6f21854bc79f03095163b8133a9fec68f9f2
21:17:50 <mikal> Oh, nice
21:18:19 <sdague> well, the point is, we're not going to be able to start what we'd normally consider stabilization until the gate is under control
21:18:28 <sdague> and that's probably 10 - 14 days solid effort
21:18:36 <jogo> what about a hard deadline for things by your Monday mikal
21:18:39 <sdague> anyway, something to consider
21:18:56 <mikal> jogo: as in "if it hasn't merged by then you're out?"
21:19:05 <jogo> mikal: yup
21:19:26 <mikal> I think that's fair... Its two days past the original deadline in the UK timezone.
21:19:43 <mikal> It only works for us if the rest of the ecosystem is also dropping things out of the gate too though, right?
21:20:12 <mikal> We need other projects to also be working on stabilization
21:20:39 <jogo> mikal: yeah that would help
21:20:51 <dansmith> so, without a doubt, that deadline is going to leave some of these features half-merged
21:21:15 <mikal> dansmith: that's hard to tell... It depends how broken the gate is
21:21:19 <jogo> mikal: I can send an email out later today about gate issues
21:21:29 <mikal> jogo: that would be good
21:21:34 <dansmith> Based on how it has been going this week, I think it's probably how it will go
21:21:45 <mikal> Oh, I agree
21:21:49 <PhilD> Do they all leave things in a sane state if they part merge?
21:21:53 <dansmith> so would we (a) revert the pieces, (b) leave them, or (c) push the rest to keep the feature whole?
21:21:56 <mikal> I guess my point is do we think its going to be like that for the next two weeks?
21:22:05 <mikal> Are we just going to reverify over and over for a couple of weeks?
21:22:11 <sdague> well, all patches are supposed to be safe for the b camp, right?
21:22:14 <dansmith> I'm not arguing against the deadline, by the way :)
21:22:24 <dansmith> sdague: technically, sure
21:22:24 <sdague> that's our design point, we work at every patch level
21:22:25 <mikal> b camp?
21:22:36 <jogo> dansmith: we can evaluate case by case how to deal with things that won't make it
21:22:37 <dansmith> I'm just curious how we're going to handle it
21:22:38 <sdague> i.e. don't revert work that lands, just move on
21:22:43 <dansmith> jogo: okie
21:23:00 <dansmith> these are some of the larger FFEs we've ever had, I think
21:23:01 <mikal> I think there's always an implied "we will do the least worst thing when the time comes"
21:23:12 <mikal> But encouraging people to be watching their patches over the weekend is a good idea
21:23:15 <dansmith> so I just want to make sure we think about what we're going to do
21:23:28 <dansmith> "nothing" is fine with me :)
21:23:34 <sdague> yeh, nova had a ton of large FFE
21:23:53 <sdague> normally FFE for nova is closed out by the tuesday after
21:24:00 <mikal> This is true, although I think we also worked through most of them better than usual as well
21:24:05 <mikal> Gate tech debt is what is hurting us here
21:24:13 <mikal> (Noting that is our fault as well, just a different thing)
21:24:35 <mikal> I want to move onto bugs if I may
21:24:43 <mikal> Because I think that's where the gate discussion goes
21:25:00 <mikal> #topic Bugs
21:25:06 <mikal> So...
21:25:22 <mikal> sdague / mriedem1 / jogo: if you could cherry pick stabilization bugs for people to work on, what would they be?
21:25:33 <mikal> Do you have a wish list?
21:25:39 <jogo> mikal: go to http://status.openstack.org/elastic-recheck/
21:25:41 <mikal> Would putting a list of important bugs in an etherpad or something help?
21:25:42 <mriedem1> mikal: there was some discussion in nova today already, lots
21:25:45 <jogo> and look for the word nova (or neutron)
21:26:01 <tjones> should we mark those critical for tracking purposes?
21:26:12 <sdague> honestly, I've not been working on gate issues, I've been triaging the nova bug queue
21:26:15 <mriedem1> yeah i'd go through e-r and look for things that are targeted to nova and aren't completely ambiguous, like 'thing timed out waiting for status x'
21:26:18 <tjones> there is just 1 critical affecting gate https://bugs.launchpad.net/nova/+bug/1367941 but i think there are more...
21:26:20 <mikal> I think its a bending of our triage rules, but if something is breaking the gate its a big deal
21:26:21 <uvirtbot> Launchpad bug 1367941 in oslo-incubator "Able to aquire the semaphore used in lockutils.synchronized_with_prefix twice at the same time" [Critical,Confirmed]
21:26:23 <sdague> tjones: yeh, I think marking gate bugs critical is good
21:26:38 <mriedem1> tjones: i thought we had a lockutils sync?
21:26:45 <jogo> tjones: yes that would be great, I don't think we can realistically mark all as critical. so say anything with 5 or more hits in 1 day
21:26:53 <mriedem1> oh nvm
21:26:54 <mriedem1> https://review.openstack.org/#/c/120897/
21:26:55 <sdague> mriedem1: this is after the sync
21:27:20 <tjones> jogo: good idea
21:27:24 <jogo> mriedem1: lockutils was a red herring
21:27:35 <mikal> jogo: the elastic_recheck page is sorted by fail rate, yes?
21:27:42 <sdague> mikal: yes
21:27:45 <jogo> tjones: also some are marked as invalid
21:27:51 <jogo> mikal: fail rate in last 24 hours
21:27:57 <jogo> so just go to the top of the page
21:28:12 <sdague> jogo: did you get any further on - 1367941 ?
21:28:22 <sdague> because I can imagine that might be a culprit in lots of issues
21:28:28 <jogo> sdague: yeah it was a red herring
21:28:33 <jogo> sdague: just misleading logs
21:28:37 <sdague> ok
21:28:42 <jogo> sdague: which is too bad
21:28:47 <sdague> so why is it still confirmed?
21:29:00 <jogo> sdague: because there was a bug, but it was in the logging
21:29:17 <sdague> can you update the log message and make it not critical :P
21:29:30 <jogo> sdague: that is the working theory at least and everything backs that up
21:30:39 <jogo> sdague: re-triaged out of critical
21:30:44 <sdague> jogo: thanks
21:31:15 <jogo> sdague: you looked into bug 1357476 right?
21:31:19 <uvirtbot> Launchpad bug 1357476 in neutron "Timeout waiting for vif plugging callback for instance" [Medium,Confirmed] https://launchpad.net/bugs/1357476
21:31:21 <sdague> on the up side, only 32 bugs in the new state for nova
21:31:30 <sdague> I was debugging with dansmith
21:31:31 <mriedem1> jogo: dansmith has a logging patch up for that
21:31:40 <sdague> yeh, we need the logging patch
21:31:41 <mriedem1> https://review.openstack.org/#/c/120842/
21:31:41 <mikal> I guess I'm hoping for a FFE like workflow for bugs at this point
21:31:44 <sdague> something's not right
21:31:50 <mikal> If we could decide as a group a small number to focus on
21:31:56 <mikal> We might make more progress than we do usually
21:32:04 <mikal> I guess "Critical" is one way of defining that group
21:32:33 <tjones> either that or use a tag
21:32:59 <jogo> lets do critical
21:33:15 <mikal> Well, we should also be emailing that list to -dev a bunch
21:33:22 <mikal> Let's make it hard for people to not know what to look at
21:33:36 <tjones> that will do it!
21:33:37 <mikal> If people were checking the bug tracker, we wouldn't have 1,000 bugs
21:34:17 <tjones> 935 (without incomplete)
21:34:27 <mriedem1> moving on...
21:34:31 <mriedem1> novaclient release?
21:34:31 <mikal> Yeah
21:34:40 <mikal> But if someone could come up with a wishlist email that would be good
21:34:45 <mikal> mriedem1: sure, if you want one
21:34:50 <mikal> mriedem1: what prompts it?
21:35:00 <mriedem1> mikal: 9/18 is the deadline for final client releases before rc1
21:35:12 <mikal> And would it destabilize the gate?
21:35:32 <mriedem1> the gate already uses trunk novaclient
21:35:38 <mikal> Wow, there's a lot of "undecided" in the client fix committed list
21:35:51 <mriedem1> this would be for getting it into global-reqs i think as a min version?
21:35:53 <mriedem1> there is a ML thread
21:35:57 <mikal> Nothing above medium for those things which are triaged
21:36:16 <mriedem1> mikal: http://lists.openstack.org/pipermail/openstack-dev/2014-September/045487.html
21:36:28 * mikal looks
21:36:37 <mikal> Oh that one
21:36:39 <mriedem1> mikal: basically any FFEs that touch the clients should have a release before rc1
21:36:46 <mikal> Ok, so how about I try and triage those bugs so they're less confusing over the weekend
21:36:51 <mikal> And do a release on my Monday?
21:37:02 <mriedem1> sure, well i think you have a week
21:37:25 <mikal> I don't recall seeing any client changes in the FFE review list
21:37:29 <mikal> But I might have missed them
21:37:34 <mikal> I wasn't really looking for them
21:37:35 <mriedem1> they'd be tied to the bp's
21:37:52 <mikal> Ok, I will make a note to do a client release
21:37:58 <mikal> Moving on?
21:38:08 <mriedem1> mikal: https://github.com/openstack/python-novaclient/commit/c59a0c8748ccc5f6a0cf80910c09b9328b4253ac
21:38:16 <mriedem1> that's an example of server bp that is in the client
21:38:20 <mriedem1> but not in a released client version
21:38:22 <mriedem1> for vishy
21:38:38 <mikal> So noted
21:39:02 <PhilD> https://review.openstack.org/#/c/108942/ is the client change for server group quotas
21:39:30 <mikal> Ugh
21:39:37 <mriedem1> PhilD: he needs to tie it to the bp
21:39:38 <mikal> Getting that through the gate before 18 September will be hard
21:39:57 <mikal> Given we need to get three of its friends through first
21:40:11 <sdague> yeh, honestly, just wait on the client bit there and bring it in once we open the tree back up
21:40:22 <sdague> and cut another release of the nova client after the release
21:40:32 <PhilD> Ok - makes sense
21:40:36 <mikal> Yeah, client releases aren't too scary
21:40:37 <jaypipes> mikal: both remaining NUMA patches now reviewed.
21:40:49 <mikal> But I agree that change needs to be tied to the BP
21:40:54 <mikal> Moving on...
21:40:58 <PhilD> It's only the shell change - the client binding itself will work with the new quotas anyway
21:40:59 <mikal> I think we've covered gate already
21:41:14 <mikal> #topic Ironic API proxy
21:41:38 <mikal> The code is out for review for this one right?
21:41:47 <mikal> It seems the consensus on the thread is we should land it?
21:42:47 <mikal> Its not even very big
21:42:54 <mikal> So... Who is going to review it?
21:43:19 <dansmith> I will
21:43:21 <sdague> I will
21:43:22 <dansmith> because I have time for that
21:43:26 <dansmith> (not)
21:43:29 <mikal> I will too, cause I feel left out
21:43:39 <mriedem1> link?
21:43:40 <mikal> Ok, let's just do it then
21:43:42 <sdague> I did the first round, mostly wanted a fix on imports, didn't see if we had a second
21:43:45 <mikal> https://review.openstack.org/#/c/120433/
21:43:56 <mikal> We're at patch set 7
21:44:05 <mikal> This is the last thing blocking ironic graduation as best as I can tell
21:44:32 <sdague> unit test fails are probably migrate's fault
21:44:33 <mriedem1> i'll add to the queue
21:44:35 <sdague> rechecking
21:44:49 <mikal> sdague: It already had a recheck?
21:45:10 <sdague> it just failed out recently I thought
21:45:25 <mikal> It failed at 7:29AM
21:45:30 <mikal> And got rechecked at 7:39AM
21:45:35 <mikal> (My time)
21:45:40 <sdague> in crazy future world
21:45:44 <sdague> ok, my bad :)
21:45:44 <mikal> LOL
21:45:55 <mikal> Ok, I'm done with ironic proxies
21:46:01 <mikal> #topic Open Discussion
21:46:47 <bknudson> I've got a bug fix up for review -- https://review.openstack.org/#/c/119646/ -- for some reason it's -V.
21:46:59 <bknudson> although it's actually passed all the CI
21:47:24 <jogo> bknudson: citrix doesn't like it
21:47:28 <jogo> click toggle CI
21:47:33 <mriedem1> citrix doesn't like much today
21:47:38 <mikal> Sigh
21:47:46 <jogo> although citrix says it passed but -1ed anyway
21:47:51 <bknudson> for some reason the latest citrix CI comment says it passed.
21:48:08 <bknudson> maybe just rebase it and run again?
21:48:17 <mikal> Has anyone pinged Bob about the citrix CI being unreliable?
21:48:20 <mriedem1> just did
21:48:24 <mriedem1> consider him punged
21:48:26 <mikal> Ta
21:48:34 <sdague> the reason citrix CI is unreliable is the memory bloat
21:48:45 <jogo> sdague: ?
21:48:47 <sdague> they are basically always running out of memory on their systems during test
21:48:53 <clarkb> right they have the xenserver overhead on top of everything else
21:49:03 <clarkb> which is non trivial iirc
21:49:05 * dansmith shudders
21:49:09 <jogo> haha
21:49:21 <sdague> BobBall was actually originally the one pushing the reduction of workers in devstack because of *this*
21:49:56 <sdague> so there is a patch, that ianw disagrees with me on, that gives us a global flag to tune this. I suppose I should just override him and land it so we can move forward.
21:50:33 <bknudson> devstack change: https://review.openstack.org/#/c/117517/
21:51:16 <mikal> So, I think we're done here?
21:51:18 <mikal> Nothing else?
21:51:24 <jogo> fix bugs please!
21:51:29 <mikal> I live to obey
21:51:32 <jogo> I marked gate bugs as critical
21:51:36 <mikal> Thanks
21:51:36 <sdague> mikal: definition for release critical bugs?
21:51:37 <mriedem1> why does novaclient have a config generator file?
21:52:19 <mikal> sdague: hmmm, not sure. I haven't given much thought to that yet. Do we have a written definition we've used in the past?
21:53:04 <sdague> don't know, it would just probably be good for guidance on triage
21:53:13 <mikal> Yep
21:53:20 <mikal> This sounds like the sort of thing ttx will have advice on
21:53:24 <mikal> I shall ping him and ask
21:53:43 <mikal> I feel like I don't need to reinvent the wheel here, just work out what we said in the past
21:54:30 <mikal> Sounds like we're done
21:54:35 <mikal> Have a 6 minute break
21:54:38 <mikal> Then go fix bugs
21:54:53 <mikal> #endmeeting