14:00:35 <edleafe> #startmeeting nova_scheduler
14:00:36 <openstack> Meeting started Mon Jul 10 14:00:35 2017 UTC and is due to finish in 60 minutes.  The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:40 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:44 <edleafe> Good UGT morning!
14:00:48 <edleafe> Who's here?
14:01:08 <dtantsur> o/
14:01:58 <cdent> o/
14:02:10 <edleafe> jaypipes, alex_xu, bauzas - around?
14:02:16 <alex_xu> o/
14:02:17 <jaypipes> yuppers.
14:03:14 <edleafe> Guess we'll start
14:03:15 <edleafe> #topic Specs and Reviews
14:03:27 <edleafe> There is a new spec
14:03:33 <edleafe> or rather an amendment to one
14:03:35 <edleafe> #link Amend spec for Custom Resource Classes in Flavors: https://review.openstack.org/#/c/481748/
14:04:01 <edleafe> This was going to be done by jroll
14:04:08 <edleafe> Looks like it's now on me
14:04:33 <cdent> i can probably be your off hours buddy on that?
14:04:36 <jaypipes> edleafe: didn't you already have code for that?
14:04:43 <jaypipes> I thought I remember reviewing that already?
14:04:46 <edleafe> jaypipes: for my half, yes
14:05:02 <edleafe> jroll was going to handle what needed to happen for migration
14:05:16 <edleafe> so that when Pike starts up, the correct resources are allocated
14:05:16 <jaypipes> ah
14:05:31 <dtantsur> FYI I've tried it with a devstack change, and still cannot make the tests pass: https://review.openstack.org/#/c/476968/. It may be my mistake, of course, or it may be this missing migration
14:05:47 <edleafe> dtantsur: tried what?
14:06:00 <dtantsur> edleafe: sorry :) using resource classes for scheduling ironic instances
14:06:29 <edleafe> OK, I haven't looked at that patch.
14:06:36 <edleafe> I'll take a look at it later
14:07:09 <jaypipes> I will as well.
14:07:25 <jaypipes> both the spec and the patch
14:08:18 <edleafe> jaypipes: do we have the code merged to use the custom RC?
14:08:31 <edleafe> I know it was mine, but I thought there was another piece needed
14:08:37 <jaypipes> edleafe: oh yes, since Ocata.
14:08:51 <jaypipes> edleafe: oh, sorry, you're talking about the flavor thing
14:09:10 <jaypipes> edleafe: not sure on the flavor thing... need to check
14:09:12 <edleafe> jaypipes: yeah, the patch I wrote grabbed the custom RC from extra_specs
14:09:22 <edleafe> and added it to the 'resources' dict.
14:09:31 <jaypipes> right
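To make the flavor-to-resources translation above concrete, here is a minimal sketch of the idea edleafe describes, assuming extra_specs keys of the form "resources:<CLASS>" and a plain dict of requested resources; the helper name and key format are illustrative, not the exact Nova code.

    # Illustrative sketch only: turn a flavor's "resources:<CLASS>" overrides
    # from extra_specs into entries in the resources dict sent to placement.
    def resources_from_flavor(extra_specs, resources):
        """Merge resource class overrides from extra_specs into resources.

        extra_specs: dict like {"resources:CUSTOM_BAREMETAL_GOLD": "1"}
        resources:   dict of resource class -> requested amount (mutated).
        """
        prefix = "resources:"
        for key, value in extra_specs.items():
            if not key.startswith(prefix):
                continue
            rclass = key[len(prefix):]
            amount = int(value)
            if amount == 0:
                # An explicit zero drops the standard class (VCPU, etc.) so
                # an ironic flavor requests only its custom baremetal class.
                resources.pop(rclass, None)
            else:
                resources[rclass] = amount
        return resources

    # Example: replace the standard classes with one custom baremetal class.
    resources = {"VCPU": 4, "MEMORY_MB": 8192, "DISK_GB": 100}
    extra_specs = {"resources:CUSTOM_BAREMETAL_GOLD": "1",
                   "resources:VCPU": "0",
                   "resources:MEMORY_MB": "0",
                   "resources:DISK_GB": "0"}
    print(resources_from_flavor(extra_specs, resources))
    # -> {'CUSTOM_BAREMETAL_GOLD': 1}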
14:10:49 <edleafe> Well, I'll be digging into what's needed for the migration. And I'd be happy to have cdent's help (and anyone else's)
14:11:04 <bauzas> oh snap, forgot meeting \o
14:11:07 <cdent> you know where to find me and I’ll look for you
14:11:21 * edleafe waves to bauzas
14:11:21 <jaypipes> stalker alert!
14:11:32 * bauzas bows to edleafe
14:11:32 <edleafe> :)
14:11:44 <edleafe> OK, next up...
14:11:47 <edleafe> #link Claims in the Scheduler: https://review.openstack.org/#/c/476632/
14:12:06 <edleafe> The first part is +W'd, so this is the only active one
14:12:56 <edleafe> jaypipes: anything to note?
14:13:09 <jaypipes> edleafe: I'll respond to mriedem's comments on there.
14:13:16 <edleafe> ok
14:13:17 <jaypipes> edleafe: did you have further comments on it?
14:13:57 <bauzas> technically, we have not yet merged the bottom patch but okay
14:14:36 <edleafe> jaypipes: I haven't looked at it since Friday morning, so when I do I'll respond on the patch
14:14:55 <jaypipes> k
14:14:59 <edleafe> Oh, I almost forgot to note:
14:15:01 <edleafe> #link Devstack to use resource classes by default https://review.openstack.org/#/c/476968/
14:15:22 * edleafe wants to keep the record up-to-date
14:15:45 <edleafe> Moving on...
14:15:46 <edleafe> #link Nested Resource Providers: series starting with https://review.openstack.org/#/c/470575/
14:15:59 <edleafe> This is still pretty much on hold, right?
14:17:08 * edleafe pokes jaypipes
14:17:16 <jaypipes> edleafe: yeah
14:17:27 <jaypipes> edleafe: it will pick up steam once claims are in.
14:17:30 <edleafe> ok, just making sure
14:17:35 * mriedem joins late
14:17:44 <jaypipes> edleafe: and I add some more functional testing around the scheduler -> conductor -> compute interactions.
14:17:44 <edleafe> Finally...
14:17:47 <edleafe> #link Placement api-ref docs https://review.openstack.org/#/q/topic:cd/placement-api-ref+status:open
14:18:18 <edleafe> jaypipes: let us know how we can help (besides reviews, of course)
14:19:24 <edleafe> Anything else for specs/reviews?
14:19:39 <alex_xu> the traits support in the allocation candidates are submitted
14:19:58 <alex_xu> #link the first patch https://review.openstack.org/478464
14:20:10 <alex_xu> #link the last one https://review.openstack.org/#/c/479776/
14:21:08 <jaypipes> mriedem: responded to your comments on ^
14:21:20 <jaypipes> mriedem: sorry, on https://review.openstack.org/#/c/476632/
14:21:33 <edleafe> OK, thanks alex_xu - added to my review list
14:21:45 <alex_xu> edleafe: I also remember there is one patch from you for 'GET /resources' with traits
14:21:53 <alex_xu> edleafe: thanks
14:22:25 <mriedem> jaypipes: ok, i guess i'm missing something then because when originally planning this all out,
14:22:39 <mriedem> i thought we were going for some minimum nova-compute service version check before doing allocations in the scheduler
14:22:48 <mriedem> such that we would no longer do the claim in the compute
14:23:51 <mriedem> once we do the allocation in the scheduler, the claim in the compute is at best redundant but not a problem,
14:24:05 <mriedem> at worst the claim fails because of something like the overhead calculation
14:24:19 <mriedem> or pci or whatever we don't handle yet in the scheduler
14:25:46 <jaypipes> mriedem: we can do the *removal of the claim on the compute node* once we know all computes are upgraded. but that's a different patch to what's up there now, which just does the claim in the scheduler.
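A rough sketch of the upgrade gate mriedem and jaypipes are discussing, under the assumption that each nova-compute reports an integer service version; the constant and helper below are hypothetical, not real Nova symbols.

    # Hypothetical gate: only remove the compute-side claim once every
    # nova-compute service is new enough to rely on scheduler claims.
    MIN_VERSION_FOR_SCHEDULER_CLAIMS = 22  # assumed threshold

    def computes_ready_for_scheduler_claims(service_versions):
        """Return True if all reported nova-compute versions meet the minimum.

        service_versions: iterable of integers, one per compute service.
        """
        versions = list(service_versions)
        return bool(versions) and min(versions) >= MIN_VERSION_FOR_SCHEDULER_CLAIMS

    # During the transition the scheduler claims unconditionally (the patch
    # under review) and the compute claim stays as a redundant safety net;
    # a follow-up patch would drop the compute claim behind a check like this.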
14:27:48 <edleafe> jaypipes: so if the scheduler starts doing claims, will that cause a problem with older computes?
14:27:54 <jaypipes> edleafe: no.
14:28:06 <edleafe> Or will the compute claim just be a duplicate
14:28:19 <mriedem> it's a duplicate
14:28:21 <jaypipes> edleafe: not even duplicate. it just won't be done.
14:28:35 <mriedem> what do you mean it won't be done?
14:28:39 <jaypipes> edleafe: b/c the report client only writes allocations that don't already exist.
14:28:39 <edleafe> jaypipes: even on an old compute?
14:28:53 <jaypipes> edleafe: yes. on ocata computes, we already do this.
14:29:03 <mriedem> writing the allocations is part of the claim process that happens on the compute *today* yes?
14:29:08 <edleafe> jaypipes: ok, I'll have to re-read that code
14:29:15 <jaypipes> mriedem: yes, and the periodic audit job.
14:29:23 <mriedem> but before we have the RT call the report client to write allocations, we're doing pci and overhead calculations
14:29:41 <jaypipes> mriedem: correct.
14:29:41 <mriedem> so we are still going to go through the same old claim process
14:29:47 <mriedem> which may fail, and trigger a retry
14:30:06 <jaypipes> mriedem: correct. if that happens, the allocations are deleted from the placement API.
14:30:12 <mriedem> where?
14:30:18 <jaypipes> in the periodic audit job.
14:30:25 <jaypipes> update_available_resource()
14:30:29 <jaypipes> will pick that up.
14:31:12 <mriedem> when does the alternates stuff for retries come in?
14:31:24 <mriedem> on top of https://review.openstack.org/#/c/476632/ ?
14:31:44 <cdent> even if something writes allocations for the same instance multiple times, it is a replace action
14:31:59 <cdent> PUT /allocations/consumer_uuid is replace
14:32:02 <jaypipes> mriedem: yes, the alternatives stuff needs to come after this.
14:32:58 <jaypipes> cdent: right, but we look up existing allocations first and do nothing if nothing changed: https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L863
14:33:20 <cdent> jaypipes: yeah, I know, I was just saying that it’s safe even if that wasn’t happening
14:33:26 <jaypipes> gotcha
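For the record, a minimal sketch of the idempotency jaypipes and cdent describe, assuming a simple client object; the get/put helpers are stand-ins, not the real report client methods.

    def ensure_allocations(client, consumer_uuid, wanted):
        """Write allocations for consumer_uuid only if they changed.

        wanted: dict of resource provider uuid -> {resource class: amount}
        """
        current = client.get_allocations(consumer_uuid)  # assumed helper
        if current == wanted:
            # e.g. an Ocata compute re-claiming after a Pike scheduler claim
            # becomes a no-op instead of a double allocation.
            return False
        # PUT /allocations/{consumer_uuid} replaces the consumer's allocations
        # wholesale, so even a redundant write converges to the same state.
        client.put_allocations(consumer_uuid, wanted)  # assumed helper
        return True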
14:33:36 <mriedem> ok so if we leave the allocation cleanup to the periodic task,
14:33:59 <mriedem> there is a chance you could "fill up" allocations for a compute node after a couple of failed attempts within a minute or something,
14:34:09 <jaypipes> mriedem: yep.
14:34:14 <mriedem> which if you've got a lot of compute nodes and a busy cloud, should be ok...
14:34:42 <jaypipes> mriedem: and I wrote in that comment that I could try and "undo" successful allocations in the scheduler _claim_resources() method, but meh, eventually it'll get cleaned up by the periodic audit task on the compute
14:35:31 <mriedem> i have a bad feeling about relying on that
14:35:44 <mriedem> especially when someone does nova boot with min-count 100
14:36:04 <mriedem> e.g. you get to 99 and novalidhost, and we don't cleanup the allocations for the first 98
14:37:02 <jaypipes> mriedem: I'm happy to take a go at that cleanup if you'd like.
14:37:06 <dansmith> the retry part of conductor could accelerate that
14:37:08 <jaypipes> mriedem: just say the word.
14:37:10 <mriedem> will needing to undo allocations in the scheduler slow it down for other incoming requests? we're still single worker right?
14:37:34 <mriedem> dansmith: in this case we wouldn't get to conductor,
14:37:35 <dansmith> mriedem: single worker but we yield when making a call to placement
14:37:36 <mriedem> it's novalidhost
14:37:42 <jaypipes> mriedem: there's no reason at all why the scheduler needs to be single process.
14:38:05 <dansmith> mriedem: you mean for a failed boot that never gets retried?
14:38:53 <mriedem> dansmith: yes
14:38:57 <mriedem> scheduler raises NoValidHost
14:39:21 <dansmith> okay I'm confused about why we'd still have stale allocations in that case
14:39:29 <dansmith> but we can discuss outside of the meeting
14:39:31 <jaypipes> dansmith: he's talking about this code:
14:39:38 <mriedem> https://review.openstack.org/#/c/476632/19/nova/scheduler/manager.py@128
14:39:43 <jaypipes> ya
14:39:46 <jaypipes> danke mriedem
14:40:08 <dansmith> oh I see, just in the n-instances case, I gotcha
14:40:30 <jaypipes> mriedem: like I said, I'm happy to give a go at cleaning up already-successful allocations in that block.
14:40:38 <jaypipes> mriedem: just say the word.
14:40:41 <dansmith> cleanup there would be easy I think, yeah
14:40:52 <mriedem> in general i think we should cleanup when we can
14:40:59 <jaypipes> yeah, I'll just keep track of the instance UUIDs that succeeded.
14:41:03 <dansmith> yep
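A sketch of the cleanup jaypipes is signing up for, assuming a hypothetical scheduler client with claim/delete helpers: track the instance UUIDs that claimed successfully and undo them if a later instance in the batch gets NoValidHost, rather than waiting for the compute's periodic audit.

    class NoValidHost(Exception):
        pass

    def claim_batch(client, requests):
        """Claim for each (instance_uuid, allocation_request); roll back on failure.

        requests: list of (instance_uuid, allocation_request-or-None) pairs.
        """
        claimed = []  # instance UUIDs whose allocations were written
        try:
            for instance_uuid, alloc_req in requests:
                if alloc_req is None:
                    # e.g. boot with min-count 100 and only 99 fit
                    raise NoValidHost(instance_uuid)
                client.claim_resources(instance_uuid, alloc_req)
                claimed.append(instance_uuid)
        except NoValidHost:
            for instance_uuid in claimed:
                # DELETE /allocations/{consumer_uuid} undoes the claim
                client.delete_allocations(instance_uuid)
            raise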
14:41:04 <mriedem> including when we retry from the compute to the conductor with the alternates
14:41:26 <jaypipes> mriedem: well, and we'll eventually want to be retrying *within* the scheduler.
14:41:29 <dansmith> mriedem: yeah that's the case I was thinking of and have always described it as "cleanup the old, claim the next alternate"
14:41:36 <jaypipes> but whatevs, I hear ya, I'll fix that section up.
14:41:44 <dansmith> jaypipes: no, we can't retry in the scheduler once we've failed on the compute node
14:41:58 <jaypipes> dansmith: retry on the allocation_request...
14:42:00 <mriedem> i think jay is talking about pre-compute
14:42:05 <mriedem> yeah
14:42:07 <dansmith> that, yes
14:42:07 <jaypipes> right.
14:42:19 <dansmith> figured he meant: [07:41:04] <mriedem> including when we retry from the compute to the conductor with the alternates
14:42:30 <jaypipes> yeah, sorry, no I mean the allocation candidates thing.
14:42:37 <mriedem> retrying within the scheduler is the whole reason we decided to do it in the scheduler and not conductor
14:42:41 <dansmith> ack
14:42:44 <mriedem> so yeah we should do that :)
14:42:45 <edleafe> well, that's not really a retry when the scheduler can't claim
14:42:46 <dansmith> yeah
14:42:52 <edleafe> just validating the host
14:43:06 <jaypipes> anyway, mriedem, besides the cleaning up successful allocations in that failure block, is there anything big you want changed on the patch? if not, I'll go and work on this.
14:43:24 <mriedem> jaypipes: i think you already replied on my other things
14:43:29 <jaypipes> the other little nits I'll get, yep
14:43:39 <edleafe> Let's continue this in -nova
14:43:43 <mriedem> btw, we create the allocations after the filters right?
14:43:44 <edleafe> #topic Bugs
14:43:57 <edleafe> #undo
14:43:58 <openstack> Removing item from minutes: #topic Bugs
14:44:20 <jaypipes> mriedem: yes.
14:44:24 <jaypipes> mriedem: and the weighers.
14:44:26 <bauzas> sorry was a bit afk
14:44:36 <bauzas> but I have a point about the above
14:44:47 <edleafe> Let's keep it quick
14:45:27 <bauzas> given the time we still have for Pike, do folks agree with me about possibly not having the conductor passing alternatives for Pike ?
14:45:45 <dansmith> no I don't agree
14:45:54 <jaypipes> bauzas: no, I think it's absolutely doable for Pike to have the alternatives done.
14:46:14 <edleafe> me too
14:46:17 <bauzas> would it be a problem not having that for Pike ?
14:46:20 <jaypipes> bauzas: I think we can have claims merged and ready by Wednesday and patches up for alternatives by EOW
14:46:38 <bauzas> while I agree with all of us about why it's important, I'm just trying to be pragmatic
14:46:41 <dansmith> bauzas: yes, without that we're toast for the proper cellsv2 arrangement
14:47:01 <cdent> yeah, we pretty much have to do it
14:47:01 <dansmith> bauzas: we can be pragmatic when we're out of time, but we're not there, IMHO
14:47:12 <bauzas> okay
14:47:14 <jaypipes> we need to get alternatives done, flavors for resource classes complete, and claims done.
14:47:19 <dansmith> ack
14:47:21 <jaypipes> those are absolutes for Pike.
14:47:37 <jaypipes> nested stuff is nice to have, and we've made a bit of progress on it already.
14:47:39 <bauzas> and shared-RP, and custom-RP? :)
14:47:47 <bauzas> yeah, that's my point
14:47:48 <mriedem> shared is done
14:48:02 <bauzas> well, agreed
14:48:02 <mriedem> allocation candidates takes care of shared, at least for disk
14:48:16 <edleafe> mriedem: well, not completely done
14:48:29 <edleafe> mriedem: we don't handle complex RPs
14:48:35 <jaypipes> mriedem: well, almost... still need a way to get the compute node to not claim the disk when a shared provider is used...
14:48:37 <bauzas> okay, tbc, I don't disagree with the direction, I'm just trying to see what is left for Pike
14:48:39 <edleafe> mriedem: like a compute with both local and shared
14:48:41 * alex_xu puts the trait's priority low, focus on review the priority stuff
14:48:50 <jaypipes> mriedem: but that is a short patch that all the plumbing is ready for.
14:49:04 <jaypipes> edleafe: we don't *currently* handle that.
14:49:15 <jaypipes> edleafe: so that's not something I'm worried about yet
14:49:24 <edleafe> jaypipes: exactly - which was going to be the subject I wanted to discuss in Opens
14:49:29 <jaypipes> kk
14:49:33 <edleafe> but we are quickly running out of time
14:49:46 <jaypipes> there is always #openstack-nova, ed :)
14:50:08 * edleafe blinks
14:50:12 <edleafe> Really??
14:50:15 <edleafe> :)
14:50:31 <bauzas> anyway
14:50:38 <bauzas> I don't want to confuse people
14:50:47 <edleafe> Let's try to move on again...
14:50:48 <edleafe> #topic Bugs
14:50:48 <edleafe> #link https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:50:58 <edleafe> Only one new bug:
14:50:58 <edleafe> #link The AllocationCandidates.get_by_filters returned wrong combination of AllocationRequests https://bugs.launchpad.net/nova/+bug/1702420
14:51:00 <openstack> Launchpad bug 1702420 in OpenStack Compute (nova) "The AllocationCandidates.get_by_filters returned wrong combination of AllocationRequests" [High,In progress] - Assigned to Alex Xu (xuhj)
14:51:01 <edleafe> alex_xu reported this one, and is working on it.
14:51:08 <edleafe> alex_xu: any problems with that?
14:51:21 <alex_xu> edleafe: no, just waiting for review
14:51:31 <edleafe> great
14:51:39 <edleafe> Anything else on bugs?
14:52:16 <edleafe> #topic Open Discussion
14:52:28 <edleafe> I had one concern: the change to return a list of HostState objects from the scheduler driver to the manager. IMO, we really need the host to be associated with its Allocation object so that a proper claim can be made. The current design just returns hosts, and then picks the first allocation that matches the host's RP id.
14:52:38 <edleafe> In the case of a host that has both local and shared storage, there will be two allocation candidates for that host. The current design will choose one of those more or less at random.
14:52:45 <edleafe> Jay has said that when we begin to support such complex RPs, we will make the change then. Since we are changing the interface between manager and driver now, wouldn't it be best to do it so that when we add complex RPs, we don't have to change it again?
14:53:07 <dansmith> if you haven't requested a trait of shared or not-shared, then at-random is fine right?
14:53:30 <edleafe> dansmith: in that case, yes
14:53:51 <edleafe> but in the case of local vs. public net for PCI, probably not
14:54:09 <jaypipes> to be clear, the code just selects the first allocation request containing the host's RP ID. so yeah, there's no order to it.
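To illustrate the selection jaypipes describes (and the arbitrariness edleafe is raising), a simplified sketch; the data shapes below are assumptions, not the exact allocation candidate format.

    def pick_allocation_request(host_rp_uuid, allocation_requests):
        """Return the first allocation request that includes the host's RP.

        allocation_requests: list of dicts like
            {"allocations": {rp_uuid: {"resources": {...}}, ...}}
        A host with both local and shared disk yields two candidates
        (local-disk and shared-disk); this takes whichever comes first.
        """
        for alloc_req in allocation_requests:
            if host_rp_uuid in alloc_req["allocations"]:
                return alloc_req
        return None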
14:54:41 <mkucia> Hi. I am wondering how the driver will handle ResourceProviders. Will there be a dedicated class (ResourceProviderDriver) for each provider type?
14:54:41 <dansmith> in the case of network, if your flavors say "give me a pci net device but I don't care which kind" then you're asking for at random, no?
14:54:43 <edleafe> jaypipes: you can keep the randomness for now
14:54:44 <dansmith> agree it would be a dumb thing to do, but..
14:55:00 <edleafe> jaypipes: I was concerned about having to change the interface yet again in Queens
14:55:23 <edleafe> dansmith: again, in that particular case, you would be correct
14:55:32 <edleafe> but that's not my point
14:55:40 <jaypipes> edleafe: this is an internal interface. I'm not concerned at all about that.
14:55:48 <mriedem> me neither
14:55:55 <mriedem> and this is no worse than what we have today right?
14:56:09 <jaypipes> edleafe: I mean, we need to change the RPC interface for alternatives support, and that's major surgery. This stuff was just a botox injection compared to that.
14:56:12 <mriedem> i'm more concerned about the <3 weeks to FF
14:56:18 <dansmith> mriedem: ++
14:56:23 <bauzas> botox, heh
14:56:26 <edleafe> ok, fine.
14:56:37 <bauzas> mriedem: me too, hence my previous point
14:56:40 <jaypipes> edleafe: you agree the RPC change is much more involved, yes?
14:56:45 <edleafe> It just wasn't what we had originally discussed, and it raised a flag for me
14:56:52 <edleafe> jaypipes: of course
14:56:54 <jaypipes> understood.
14:57:17 <jaypipes> edleafe: I appreciate your concerns on it. As you saw, I went through a bunch of iterations on thinking about those internal changes
14:57:50 <jaypipes> edleafe: but returning the HostState objects instead of the host,node tuples allowed us to isolate pretty effectively the claims code in the manager without affecting the drivers at all.
14:58:17 <edleafe> As long as we all realize that this will have to change yet again in Queens, sure
14:58:35 <cdent> change is and always will be inevitable
14:58:42 <jaypipes> edleafe: certainly it may. but again, I'm less concerned about internal interfaces than the RPC ones.
14:58:43 <edleafe> how trite
14:58:48 * cdent is trite
14:58:55 <cdent> always has been, always will be
14:59:31 <edleafe> jaypipes: I was more concerned about saying we will do X, and finding Y
14:59:32 <bauzas> 1 min left
14:59:51 <edleafe> As long as we get to X eventually
15:00:05 <edleafe> That's it - thanks everyone!
15:00:07 <edleafe> #endmeeting