14:00:10 <edleafe> #startmeeting nova_scheduler
14:00:11 <openstack> Meeting started Mon Nov 27 14:00:10 2017 UTC and is due to finish in 60 minutes.  The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:14 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:14 <cdent> o/
14:00:18 <efried> \o
14:00:18 <ttsiouts> o/
14:00:19 <takashin> o/
14:00:25 <edleafe> Good UGT morning! Who's here?
14:00:28 <jaypipes> o/
14:00:54 <alex_xu> o/
14:01:42 <cdent> Over the week I learned that jaypipes is running for the senate
14:01:52 <jaypipes> quoi?
14:01:58 <cdent> “child bride”
14:02:06 <jaypipes> ahahaha
14:02:17 * edleafe missed the reference
14:02:21 * efried too
14:02:31 <jaypipes> over the weekend I learned edleafe got really high.
14:02:39 <edleafe> that I did!
14:02:40 <cdent> https://twitter.com/jaypipes/status/934490883478773762
14:03:04 <edleafe> jaypipes: you little devil
14:03:07 * bauzas waves
14:03:26 <jaypipes> so... on to scheduler things :)
14:03:35 <edleafe> yeah
14:03:48 <edleafe> #topic Tell me what's going on
14:04:01 <cdent> #link what’s going on http://lists.openstack.org/pipermail/openstack-dev/2017-November/124886.html
14:04:10 <edleafe> I got back from 10 days away from the 'puter, and have nothing prepared
14:04:26 <edleafe> So raise your hand if you want to discuss something
14:04:45 <efried> I think the topic accumulated_nits accidentally got sprayed around a whole series by git restack.
14:04:48 <efried> mahbad
14:05:01 <efried> Most of that should have been bp/nested-resource-providers probably.
14:05:13 <efried> uh, no.
14:05:27 <efried> but not accumulated_nits
14:05:57 <efried> Anyway, IMO the most pressing need right now is getting core reviews on the n-r-p series and starting to get that merged.
14:06:20 <efried> It's going to be prerequisite to a couple of other critical things.
14:06:26 <cdent> i agree!
14:06:39 <cdent> (damn, I just missed an “I concur” opportunity)
14:06:40 <edleafe> yeah, these long-running series become a pain when they get stale
14:06:49 <edleafe> I concur!
14:06:57 <jaypipes> that and the refactor series which also has a bunch of func tests and traits handling
14:06:59 <bauzas> well, I'll look
14:07:29 <efried> #link bottom of the n-r-p series: https://review.openstack.org/#/c/377138/
14:07:39 <bauzas> (well, once I'm done with my internal problems :p )
14:07:50 <efried> jaypipes ++
14:08:11 <edleafe> I haven't looked at the alternate host series yet, but I assume it's still being held back while details are being picked over
14:08:32 * edleafe probably has several more weeks of rebase fun with that
14:08:37 <efried> #link current bottom of the refactor series: https://review.openstack.org/#/c/516782/11
14:09:02 <efried> Actually ^ is the last patch in the refactor series proper.
14:09:09 <efried> But it's anchoring a pile of other stuff.
14:09:20 <efried> including tests and traits handling, as jaypipes said.
14:10:29 <efried> I think the biggest to-do on top of the n-r-p series is getting n-r-p affordance for allocation candidates.
14:10:59 * cdent concurs
14:11:01 <efried> jaypipes or alex_xu Do you have that cued up anywhere (locally) yet?
14:11:13 <jaypipes> efried: yes, and then settling on the update_provider_tree() implementation
14:11:27 <edleafe> Are we still aiming for "merge shit early" like we said at PTG? It sure doesn't feel like it
14:11:40 <alex_xu> i don't have anything
14:11:44 <jaypipes> edleafe: we've merged quite a bit so far, actually.
14:11:56 <efried> Agree, though the summit + thanksgiving really put a crimp in things.
14:12:03 <jaypipes> edleafe: personally, I'd like to see the alternate hosts stuff make progress this week.
14:12:14 <cdent> here comes christmas too :(
14:12:17 <efried> yeah
14:12:28 <edleafe> jaypipes: it isn't as critical path as n-r-p, though
14:13:04 <efried> So I'll follow edleafe's "merge shit early" with: The next 2-3 weeks we should have some kind of official focussed review-and-merge-shit push.
14:13:14 <jaypipes> edleafe: it's still priority over n-r-p, though, according to my last recollection of priorities (alternate hosts, traits handling and move operation cleanup)
14:13:43 <efried> jaypipes I didn't think those three were in any particular priority order.
14:13:43 <jaypipes> efried: let's make it official then.
14:14:03 <edleafe> efried: yeah, I didn't get a sense of ordering either
14:14:05 <jaypipes> efried: they're not. they're just the 3 priorities. n-r-p isn't actually a priority for Queens...
14:14:50 <cdent> that’s not my recollection? I thought we said at ptg that nrp was pre-req for traits and RT cleanup
14:15:09 <cdent> or was that just “before shared”?
14:15:15 <jaypipes> cdent: right. before shared.
14:15:38 * jaypipes notes that shared stuff has been the source of most bugs in the allocation candidates code so far...
14:15:47 <jaypipes> in any case, it all is related.
14:15:58 <jaypipes> so let's make this week the official "push this shit" week.
14:16:05 <efried> https://etherpad.openstack.org/p/nova-ptg-queens-placement L47-55
14:16:19 * cdent wonders if we can make every week officially push this shit?
14:16:21 <efried> Right - "shared" was deferred out of Q.
14:16:28 <efried> cdent Baby steps
14:16:53 <alex_xu> should I rebase this patch https://review.openstack.org/480379?
14:17:34 <efried> alex_xu I would say that's probably a lower priority than reviews, if you had to pick one or the other.
14:17:39 <efried> Though you've been pretty well on top of reviews.
14:17:54 <jaypipes> efried: can you check to see if alex_xu's tests from the patch that depends on the above are included in the refactor suite (from you or gibi)?
14:18:02 <efried> Yeah, given how we're not doing shared RPs for Q, we sure have spent a lot of time on it...
14:18:34 <efried> jaypipes You talking about https://review.openstack.org/#/c/480379/22/nova/tests/functional/db/test_resource_provider.py ?
14:18:50 <jaypipes> yeah
14:18:57 <efried> jaypipes Will do.
14:19:37 <alex_xu> For this patch https://review.openstack.org/517119, we still need to ensure the traits work well for the shared case, right?
14:20:14 <jaypipes> gawd, maybe I was confusing the generic device manager with n-r-p w.r.t priorities in Queens. guh, I must be getting old. :(
14:20:36 <jaypipes> either that or everything is running into everything else in my mind at this point.
14:20:48 <cdent> it does all run together
14:20:48 <efried> I feel like we still have some architectural work to do on sharing + traits.
14:20:57 <edleafe> jaypipes: that's the definition of getting old
14:21:01 * edleafe knows
14:21:09 <efried> In general, we should be looking for opportunities to defer work on sharing RPs.
14:21:26 <efried> Which we haven't been doing especially well, tbh.  I'm probably the most guilty.
14:21:26 * cdent concurs
14:21:52 <cdent> when it feature freeze?
14:21:54 <cdent> s/it/is/
14:22:07 <bauzas> Milestone-3
14:22:20 <cdent> what is the date of Milestone-3 ?
14:22:26 <alex_xu> Jan 22
14:22:31 <alex_xu> #link https://releases.openstack.org/queens/schedule.html
14:22:33 <bauzas> https://releases.openstack.org/queens/schedule.html
14:22:37 <bauzas> dammit, burned
14:22:49 <cdent> thanks was nearly there
14:22:50 <jaypipes> so basically 6 weeks, if you count western holiday week a goner.
14:23:07 <bauzas> correct
14:23:12 <bauzas> R-5
14:23:25 <bauzas> and we are R-13
14:23:51 <efried> Jan 25 according to https://wiki.openstack.org/wiki/Nova/Queens_Release_Schedule -- a whole three extra days!
14:24:10 <bauzas> yeah, because we do that by Thursdays
14:24:16 <efried> phew
14:24:20 <bauzas> some projects do that by Tuesdays
14:24:37 <bauzas> but the relmanagement team accepts all the milestones during those 3 days
14:24:54 <bauzas> hope that's clearer
14:25:07 <bauzas> the real crux is : do we still need to discuss design problems now ?
14:25:21 <efried> I really hope not.
14:25:40 <bauzas> tbh, n-r-p is the top prio
14:25:41 <efried> Hopefully the design discussions are minor enough to be contained within reviews.
14:25:48 <bauzas> if we need to discuss problems during the implementation
14:25:53 <efried> E.g. the update_provider_tree stuff
14:25:56 <bauzas> yeah
14:26:02 <bauzas> that's my point
14:26:17 <efried> So far that effort has been doing well on that count.
14:26:18 <bauzas> we had a long series of discussions when we reviewed the scheduler claims
14:26:40 <bauzas> if we need to discuss like we did for that, then huh
14:26:51 <efried> Course it was pretty straightforward when it was just me & jaypipes having the discussions :)
14:28:08 <jaypipes> efried: I've always maintained that there will need to be kinks worked out once real clients were using the n-r-p stuff. and now that XenAPI has a proposed patch up, that's shaking that implementation detail tree a bit. which is a good thing.
14:28:20 <efried> ++
14:28:35 <edleafe> all the more reason to merge and then update in subsequent patches
14:28:41 <efried> +++
14:28:46 * cdent concurs
14:29:06 * efried looks around...
14:29:20 <efried> There's three cores in attendance, and two of them are authors of the patches in question.
14:29:24 <bauzas> jaypipes: yeah, the Xen team is having 3 people working on the VGPU feature, so they can help
14:29:31 <bauzas> (while only me for libvirt :p )
14:29:36 <edleafe> efried: we can nag the others later in -nova
14:29:59 <efried> Yuh, we need to socialize our "official" review push week to the likes of dansmith, stephenfin, etc.
14:30:06 <bauzas> efried: here I definitely want to help but struggle with time
14:30:20 <bauzas> NFV FTW
14:30:20 * gibi can help with some of the reviews in n-r-p
14:30:21 <efried> Right.  It can't all be on the shoulders of one or two cores.
14:30:51 <edleafe> We seem to agree on this. Anything else to discuss?
14:31:05 * edleafe has to dig out from a ton of backlog emails
14:32:10 <cdent> my pending placement stuff is pretty ready for whoever would like to look
14:32:18 <jaypipes> cdent: link pls
14:32:42 <cdent> #link some gabbit clean up https://review.openstack.org/#/c/513057/
14:32:57 <cdent> #link symmetric get and put and post to /allocations: https://review.openstack.org/#/c/510626/
14:33:14 <cdent> #link cache headers and requisite changes to objects: https://review.openstack.org/#/c/521639/
14:33:33 <jaypipes> danke
14:33:35 <cdent> #link fixup to get proper formatted errors: https://review.openstack.org/#/c/518223/
14:34:56 <cdent> in other news I’ve started messing around with a containered placement service to see what breaks when it is isolated and/or used in parallel. one fun bug so far:
14:35:22 <cdent> #link config for middleware bug: https://bugs.launchpad.net/nova/+bug/1734491
14:35:22 <openstack> Launchpad bug 1734491 in OpenStack Compute (nova) "placement keystonemiddleware_authtoken ignores OS_PLACEMENT_CONFIG_DIR" [Undecided,In progress] - Assigned to Chris Dent (cdent)
14:35:51 <cdent> Also there’s a new bug related to request ids, requiring some clean up on nova side:
14:35:59 <cdent> #link request id bug: https://bugs.launchpad.net/nova/+bug/1734625
14:36:00 <openstack> Launchpad bug 1734625 in OpenStack Compute (nova) "placement: Request IDs are not passed to placement service" [Undecided,In progress] - Assigned to Takashi NATSUME (natsume-takashi)
14:36:32 <jaypipes> cool, thanks for investigating that cdent
14:36:59 <edleafe> why would placement care about nova-isms like request id?
14:37:14 <efried> request IDs are openstack-isms now.
14:37:19 <cdent> edleafe: see: https://review.openstack.org/#/c/523007/
14:37:34 <mriedem> this is a bug that's affecting the gate, which only efried and i have looked at so far i think https://bugs.launchpad.net/nova/+bug/1731668
14:37:34 <openstack> Launchpad bug 1731668 in OpenStack Compute (nova) "placement: claim allocations fails with IndexError in _ensure_lookup_table_entry" [High,Confirmed]
14:37:34 <cdent> the incoming request id ought to be respected in the logs
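[Editor's note] The request id point above (honor the caller's id in the service's logs instead of minting a fresh one) can be illustrated with a minimal sketch. This is not the placement code: the header name `X-Openstack-Request-Id`, the logger name, and both helper functions are assumptions made for illustration, and the header lookup is a plain case-sensitive dict access for simplicity.

```python
import logging
import uuid

# Assumed header name, used here only for illustration.
REQUEST_ID_HEADER = "X-Openstack-Request-Id"


def request_id_from_headers(headers):
    """Reuse the caller's request id if one was sent, else mint a new one."""
    incoming = headers.get(REQUEST_ID_HEADER)
    return incoming if incoming else "req-" + str(uuid.uuid4())


def logger_for_request(headers):
    """Return a logger adapter that carries the request id on every record."""
    extra = {"request_id": request_id_from_headers(headers)}
    # "placement-sketch" is a hypothetical logger name.
    return logging.LoggerAdapter(logging.getLogger("placement-sketch"), extra)


# A caller that already has a request id passes it along in the header.
log = logger_for_request({REQUEST_ID_HEADER: "req-1234"})
```

A log format string that includes `%(request_id)s` would then show the same id on both sides of the call, which is what makes cross-service tracing possible.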
14:38:49 <cdent> mriedem, jaypipes : is that code subject to the big refactoring that jay and eric have done?
14:39:20 <efried> cdent Which, the bug?
14:39:22 <mriedem> there was a refactor that hit this code recently
14:39:35 <cdent> yeah, the ensure look up table
14:39:36 <mriedem> note the bug is a few weeks old at this point
14:41:29 <jaypipes> mriedem: no, I don't think that code was touched.
14:43:02 * efried concurs
14:43:29 <jaypipes> mriedem: I will take that bug
14:43:32 <mriedem> it may be a latent bug, idk
14:43:33 <efried> The refactor was all about GET /allocation_candidates.  That bug is in stuff that touches project/user IDs.
14:43:34 * cdent writes efried a cheque
14:43:49 * efried makes fun of cdent's British spelling of cheque.
14:43:55 <jaypipes> mriedem: yeah, I think I know what it is.
14:44:24 <efried> jaypipes Note the "related fix" in the bug report - some discussion of the race condition there (and a possible "fix")
14:44:27 <jaypipes> mriedem: we're not catching the right exception and it's dropping out of the except block without running that fetchall() again.
14:44:37 <efried> Ah, or that.
14:44:59 <jaypipes> efried: almost guarantee we're getting a DBDeadlock error, not a duplicate error.
14:45:05 <jaypipes> efried: and that's causing this.
14:45:06 <efried> cool
14:45:27 <efried> Yeah, cause that race should be way too hard to hit for the frequency with which we're seeing the error.
14:46:33 <efried> mriedem jaypipes I could put up an investigative patch that logs the exception.
14:46:45 <jaypipes> efried: I'll do it.
14:46:48 <efried> ack
14:48:32 <jaypipes> I think we just need to catch sa.exc.IntegrityError instead... but I'll check it out.
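[Editor's note] The pattern under discussion is "select, insert if missing, and re-select if a concurrent writer won the race". The sketch below shows that pattern under stated assumptions: it uses the standard-library `sqlite3` module rather than SQLAlchemy or oslo.db, and the `projects` table and `ensure_lookup_entry` function are hypothetical stand-ins, not the nova code from the bug. The key point is that the `except` clause has to match the exception the driver actually raises; otherwise the re-select never runs.

```python
import sqlite3


def ensure_lookup_entry(conn, external_id):
    """Return the internal id for external_id, creating the row if needed."""
    select = "SELECT id FROM projects WHERE external_id = ?"
    row = conn.execute(select, (external_id,)).fetchone()
    if row is not None:
        return row[0]
    try:
        cur = conn.execute(
            "INSERT INTO projects (external_id) VALUES (?)", (external_id,))
        conn.commit()
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Another writer inserted the same external_id first: roll back
        # our failed insert and read the row that writer created.
        conn.rollback()
        row = conn.execute(select, (external_id,)).fetchone()
        if row is None:
            # Not a duplicate after all; surface the original error
            # instead of indexing into an empty result.
            raise
        return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects ("
    "id INTEGER PRIMARY KEY, external_id TEXT NOT NULL UNIQUE)")
first = ensure_lookup_entry(conn, "project-a")
second = ensure_lookup_entry(conn, "project-a")
```

Both calls return the same id. Checking the re-selected row for `None` before indexing avoids the kind of `IndexError` reported in the bug when the caught exception turns out not to be a duplicate-key error.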
14:48:32 <edleafe> ok, anything else?
14:49:37 <efried> Who's got the #action to socialize the review push?
14:50:09 * edleafe thinks efried just volunteered
14:50:28 * efried thinks it should be someone with more clout.
14:52:46 <efried> I can compose an email for the dev list
14:52:53 * edleafe thinks it should be someone with social skills, leaving himself out
14:54:11 <edleafe> #action efried to socialize review priorities to other nova cores
14:54:23 <efried> ack
14:54:29 <edleafe> Anything else? Or shall we call it a day?
14:54:35 <cdent> call it
14:54:46 * edleafe just confused non-native English speakers
14:55:00 <edleafe> OK, thanks everyone!
14:55:03 <edleafe> #endmeeting