14:00:17 <edleafe> #startmeeting nova_scheduler
14:00:17 <openstack> Meeting started Mon Aug 28 14:00:17 2017 UTC and is due to finish in 60 minutes.  The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:21 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:26 * edleafe thinks cdent is antsy
14:00:37 <edleafe> Anyone here today?
14:01:05 <cdent> Does antsy mean “the sun is shining outside”?
14:01:23 <mriedem> o/
14:01:37 <alex_xu> o/
14:01:48 <edleafe> I'm feeling guilty having a bit of sun this morning while nearby Houston drowns
14:02:52 <edleafe> I know bauzas is still on holiday. jaypipes - around?
14:03:06 <mriedem> he's out until mid week i think
14:03:20 <mriedem> well still traveling but might be around right now
14:03:42 <jaypipes> edleafe: yup
14:03:48 <edleafe> kewl
14:03:52 <edleafe> let's start
14:03:56 <edleafe> #link Saner RT agg map updates https://review.openstack.org/#/c/489633/
14:03:59 <edleafe> doh!
14:04:12 * edleafe can't copy/paste
14:04:15 <edleafe> #undo
14:04:16 <openstack> Removing item from minutes: #link https://review.openstack.org/#/c/489633/
14:04:21 <edleafe> #topic Specs & Reviews
14:04:31 <edleafe> #link Spec for returning allocation requests to the scheduler https://review.openstack.org/#/c/471927/
14:04:49 <edleafe> This is an internal spec that we've already pretty much implemented
14:04:57 <edleafe> just leftover from pike
14:05:28 <edleafe> needs some love from nova-specs cores
14:05:36 <edleafe> #link Add alternate hosts https://review.openstack.org/486215/
14:06:03 <edleafe> This is marked WIP because I don't really like how this works
14:06:40 <edleafe> I think we really need to spend some design time before we merge something like this
14:06:49 <mriedem> is it on the ptg etherpad?
14:06:49 <edleafe> #link https://blog.leafe.com/handling-unstructured-data/
14:07:13 <edleafe> mriedem: not yet - I wanted to have discussions sooner
14:07:20 <edleafe> and then continue at PTG
14:07:27 <edleafe> if needed
14:08:27 <edleafe> We made these choices in haste last cycle
14:08:55 <edleafe> And while we have the time I want to make sure we use it to not add more technical debt
14:09:41 <cdent> +1
14:10:05 <edleafe> jaypipes doesn't feel that this is going to be a problem; he says it makes things simpler
14:10:12 <edleafe> I'd like to hear from others
14:10:20 <jaypipes> edleafe: I don't necessarily agree that this is unstructured data, but I agree with you that just returning lists of lists of unnamed tuples or a tuple of lists of lists of HostState objects is not good.
14:10:26 <edleafe> Look at the code for that series, and make sure we want to live with that
14:11:06 <jaypipes> edleafe: what are you talking about I don't think this is going to be a problem? :) I specifically say in the patch that I don't like returning lists of lists of tuples.
14:11:31 <cdent> maybe we should just stick the data in dogpile, and send around a reference uuid instead
14:11:33 <edleafe> jaypipes: I was basing that on your comment on the blog post
14:11:35 <jaypipes> edleafe: and gave a suggestion of solving with a namedtuple.
14:11:40 <cdent> ;)/2
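(For context, the namedtuple idea jaypipes mentions would look roughly like the sketch below; the field names are illustrative assumptions, not taken from the patch under review.)

    import collections

    # One scheduler selection as a named shape instead of an anonymous
    # (host, nodename, allocation_request) tuple buried in nested lists.
    # Field names here are hypothetical.
    Selection = collections.namedtuple(
        'Selection', ['host', 'nodename', 'allocation_request'])

    # select_destinations() would then return, per requested instance, a
    # list of Selection instances: selections[i][0] is the chosen host for
    # instance i and selections[i][1:] are its alternates.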
14:12:04 <edleafe> cdent: I was thinking of oslo.cache this weekend
14:12:11 <edleafe> which is a wrapper around dogpile
14:12:47 <edleafe> the key could be something like request_id + root_provider
14:13:37 <edleafe> Passing around huge globs of data never feels right to me, whether they are in named or unnamed tuples
14:13:38 <mriedem> ew
14:14:02 <mriedem> huge globs of data, like the RequestSpec?
14:14:08 <mriedem> this is quite a bit simpler though, right?
14:14:25 <mriedem> why don't we just abstract this into some versioned object?
14:14:29 <edleafe> mriedem: no, like the dicts of junk we passed before RequestSpec was made into an object
14:14:43 <jaypipes> I fail to see how this is a huge set of data.
14:14:44 <mriedem> filter_properties
14:15:20 <mriedem> so why not just an AlternativeHosts object or something?
14:15:31 <jaypipes> edleafe: isn't this just zero or more allocation request blobs (one per max_attempts per instance in num_instances)?
14:15:33 <edleafe> mriedem: that would be an improvement
14:15:50 <edleafe> jaypipes: one or more
14:15:56 <edleafe> for each instance
14:15:56 <cdent> I don’t think the issue is that there’s some catastrophe afoot
14:16:08 <cdent> but rather that we can do this more cleanly, so may as well get it right, now
14:16:21 <cdent> and the way to get something right is to have a chat about it
14:16:27 <mriedem> so the cache idea is like how reservation_id works in the compute API?
14:16:39 <mriedem> create >1 instances and get a reservation id back, so you can query on that later when listing servers?
14:17:07 <edleafe> mriedem: the exact key isn't important. It could be a UUID
14:17:15 <jaypipes> -1 on using a cache. I see no reason to do that for this amount of data.
14:17:31 * jaypipes surprised cdent hasn't said the same.
14:17:45 <mriedem> i also don't see the need to use a cache for this
14:17:48 <edleafe> jaypipes: once again, it's not the *volume* of data
14:18:00 <edleafe> mriedem: I'm throwing out ideas
14:18:06 <edleafe> I'm not married to any
14:18:06 <mriedem> if it's the structure that's a problem, just create a versioned object
14:18:06 <cdent> the cache idea was a lark, a spitball
14:18:25 <jaypipes> mriedem: ya
14:18:25 <edleafe> mriedem: I proposed an object for this in the blog post
14:18:29 <mriedem> ok
14:18:32 <mriedem> and document the fields
14:18:37 <mriedem> ^ something i wish the reqspec had
14:18:39 <mriedem> documentation
14:18:43 <mriedem> per my recent ML thread
14:19:24 <jaypipes> ++
14:19:29 <cdent> it’s a derivation of how I think we should be managing passing data: not in objects that we put over RPC, but in retrievable cacheable data, only identifiers over the rpc wire. But that’s not really germane for right now. What’s germane for right now is: hey wouldn’t it be great if we had something tidy. It sounds like some kind of object is the current def of “tidy”
14:19:56 <jaypipes> cdent: you mean how k8s works?
14:20:30 <cdent> I don’t watch that show
14:20:53 <jaypipes> or we could just go back to having all the computes read from the DB.
14:20:58 <cdent> I just happen to like global ram
14:21:00 <mriedem> yeah, but at some point the client side has to deal with the structure of the thing it pulls out of the cache, or rpc response
14:21:11 <jaypipes> mriedem: zactly.
14:21:12 <mriedem> and it sounds like that is the main concern
14:21:34 <edleafe> mriedem: not necessarily
14:21:37 <jaypipes> (for the record, I'm totally not serious about having computes read from the DB again)
14:21:57 <edleafe> the allocations are there so that we can unambiguously claim complex resources
14:22:20 <edleafe> whether we post the details, or a link to the details, is not important
14:23:09 <edleafe> right now we're passing around a bunch of these details, most of which will never be needed
14:24:06 <jaypipes> edleafe: well, right now we're not passing around anything :)
14:24:21 <edleafe> jaypipes: right now our current design is
14:24:32 <edleafe> and we are passing between placement and scheduler
14:24:51 <jaypipes> edleafe: right, but that's not going to change w.r.t. the alternate hosts stuff...
14:24:53 <edleafe> the design calls for then passing to super conductor and cell conductor
14:25:12 <edleafe> alternate hosts just multiplies the complexity
14:25:23 <jaypipes> I fail to see that.
14:25:26 <edleafe> making every X a list of X
14:25:49 <edleafe> instead of returning a host per instance, we return a list of hosts per instance
14:26:18 <jaypipes> it *reduces* the complexity of the retry operation as a whole because no longer does the request_spec.filter_properties['retry'] stuff need to be adjusted nor does the scheduler need to be re-consulted on each retry iteration
14:26:23 <edleafe> instead of returning an allocation dict per instance, we return a list of them per instance
14:26:46 <edleafe> jaypipes: of course. I've not said a thing about retry
14:26:48 <jaypipes> edleafe: yep, that is totes true.
14:26:59 <jaypipes> edleafe: this whole thing is about the retry operation.
14:27:19 <edleafe> what I'm saying is it would be cleaner to return a list of objects instead of this complex 2-tuple
14:27:24 <jaypipes> edleafe: attempting to a) reduce the complexity of that operation and b) allow it to work in a cell-no-upcall situation
14:27:40 <jaypipes> edleafe: no disagreement from me on that.
14:27:45 <cdent> are there two different types of complexity being discussed here?
14:28:11 <edleafe> cdent: yes, and it's the multiplication effect that is my concern
14:28:32 <edleafe> The added complexity of passing a list of hosts is necessary
14:28:44 <edleafe> It's much cleaner than going through the retry cycle
14:29:12 <jaypipes> edleafe: I'm really not following your multiplication effect concern.
14:29:17 <edleafe> But passing the corresponding allocation dicts along with that is messy
14:29:39 <jaypipes> edleafe: that's where I'm not following you. why is that messy?
14:29:48 <mriedem> so i think we can agree we don't want the 2-tuple
14:29:57 <mriedem> and use an object
14:30:37 <edleafe> jaypipes: because it's relying on positional matches
14:31:07 <edleafe> the allocation dict for a given host is referenced by having the same nested list position
14:31:41 <jaypipes> edleafe: ok
14:31:48 <edleafe> IOW, if we are using host 2 of instance 3, we get its allocation through allocation_lists[3][2]
14:31:48 <mriedem> key off the host,
14:31:50 <mriedem> don't use tuples
14:31:52 <mriedem> use an object with a dict
14:32:05 <edleafe> mriedem: that would be much, much better
14:32:09 <mriedem> so do it
14:32:22 <edleafe> Sure
14:32:34 <edleafe> I wanted agreement before I did
14:32:42 <edleafe> in case someone had an even better idea
14:32:46 <edleafe> or a reason not to
14:33:09 <mriedem> if we have to identify something in a structure, let's use keys in a dict rather than indexes in a tuple/list/whatever
14:33:13 <mriedem> in general, always
14:33:27 <edleafe> #agreed select_destinations will return a list of objects for each requested instance
14:33:34 <mriedem> otherwise i'll always have to re-learn what the items in the tuple are
14:33:39 <jaypipes> mriedem: ++
14:33:46 <edleafe> mriedem: yeah, that was my fear
14:34:09 <edleafe> mriedem: *I* know what those indexes are, but someone coming in new to the code would be completely confused
14:34:40 <mriedem> i can assure you i'd have to relearn it every time i debug that code
14:35:03 <mriedem> like everything in the scheduler
14:35:06 <edleafe> mriedem: 'zactly
14:35:18 <edleafe> I'll start working on that today
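(A sketch of roughly what the agreement points at; the class and attribute names below are hypothetical, and the real change would presumably be a documented versioned object.)

    # Per-instance result keyed by host rather than by nested list position.
    # All names here are illustrative only.
    class InstanceSelection(object):
        def __init__(self, hosts, allocations_by_host):
            # Candidate hosts in preference order: primary first, then alternates.
            self.hosts = hosts
            # Allocation request blobs keyed by host name, so a retry looks up
            # the allocation for a given host by name instead of by index.
            self.allocations_by_host = allocations_by_host

    # select_destinations() would return one such object per requested instance.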
14:35:23 <edleafe> Let's move on
14:35:26 <edleafe> #link Saner RT agg map updates https://review.openstack.org/#/c/489633/
14:35:48 <edleafe> cdent: comments?
14:36:04 <cdent> nope, there it is, have at
14:36:40 <edleafe> ok then
14:36:47 <edleafe> and for completeness:
14:36:47 <edleafe> #link Nested RP series https://review.openstack.org/#/c/470575/
14:37:02 <edleafe> That is still a ways off before it is resumed
14:37:12 <mriedem> let's re-propose the spec for nested RPs for queens
14:37:20 <mriedem> are there changes in design that need to be updated in the spec?
14:38:02 <edleafe> mriedem: none that I know of. jaypipes?
14:38:39 <mriedem> i just wondered since you said it's a ways off
14:38:44 <mriedem> not sure what it's a ways off from
14:39:28 <cdent> the integration of traits, shared, and nested all in the same place is major hairy
14:39:31 <edleafe> mriedem: well, that was jay's comment. The stuff that has to be done first is mostly in his head
14:39:53 <cdent> I’d love to see us make that more comprehensible and composable before adding more
14:39:57 <mriedem> cdent: yeah i wanted to know if we need to do shared first, plus do moves with a migration uuid
14:40:01 <mriedem> cdent: agree
14:40:13 <mriedem> so, dansmith started the migration uuid stuff here https://review.openstack.org/#/c/496933/
14:40:17 <mriedem> at least the data model and object changes
14:40:31 <dansmith> I figure we want a spec there
14:40:36 <dansmith> I just wanted to work on code instead
14:40:37 <mriedem> dansmith: agree
14:40:42 <edleafe> I think that nested is getting pressure from the NFV folks, since it's needed for that kind of scheduling
14:41:02 <mriedem> edleafe: sure,
14:41:06 <jaypipes> sorry, pulled away
14:41:10 <mriedem> but as cdent noted we need to clean up some of the pike mess first
14:41:14 <jaypipes> no, no changes to nested stuff.
14:41:22 <jaypipes> just need to rebase and fix up conflicts
14:41:25 <jaypipes> still...
14:41:30 <mriedem> so i think "migrating" the move operations to use migration uuid is job 1
14:42:00 <mriedem> because the move operation stuff was a real bear that came in way too late in pike
14:42:03 <dansmith> I don't disagree,but I think reschedules/alternatives need to be high on the list
14:42:08 <mriedem> and was the cause of most of our bugs in rc
14:42:19 <dansmith> mriedem: and will be the cause of most of the bugs not filed yet :)
14:42:29 <cdent> migration uuid and alternates can happen concurrently, yeah?
14:42:35 <mriedem> dansmith: yeah reschedules are important to cells v2 adoption too
14:42:38 <mriedem> cdent: yeah
14:42:40 <dansmith> cdent: yeah
14:42:44 <jaypipes> cdent: thx for taking on the agg update thing. will review that later.
14:42:52 * cdent bows
14:43:04 <mriedem> the traits stuff in the api is also concurrent while it's just work in the placement api
14:43:11 <mriedem> i.e. alex_xu's changes
14:43:19 <cdent> that’s the stuff that is likely to impact nested
14:43:31 <cdent> as the query complexity goes exponential
14:44:09 <mriedem> dansmith: so are you going to write a spec for the migration stuff?
14:44:14 <mriedem> at least some high level spec for the idea
14:44:19 <dansmith> yeah I guess so
14:44:23 <mriedem> that would be tops
14:44:25 <dansmith> I won't enjoy it though, just FYI
14:44:27 <mriedem> i know
14:44:37 <mriedem> i didn't want to ask, just FYI :)
14:44:37 <edleafe> we'll enjoy you not enjoying it
14:44:39 <mriedem> but you brought it up
14:44:41 <dansmith> hah
14:44:42 <jaypipes> I'd like to prioritize shared resources over nested actually
14:44:51 <dansmith> jaypipes: agree
14:44:55 <mriedem> agree too
14:45:05 <mriedem> the migration uuid should help that
14:45:08 <jaypipes> ok, cool. sorry if I missed that as an earlier agreement.
14:45:36 <mriedem> so i think, migration uuid for move consumer -> shared providers -> nested|traits?
14:45:43 <mriedem> plus alternatives happening concurrently
14:45:57 <jaypipes> yes
14:46:12 <edleafe> can't traits also happen concurrently?
14:46:17 <mriedem> #action dansmith to enjoy writing spec for using migration uuid as move operation consumer
14:46:23 <jaypipes> edleafe: yeah, I think they can.
14:46:24 <dansmith> #undo
14:46:31 <jaypipes> lol
14:46:44 * dansmith knows his undo has no power
14:46:46 <mriedem> i don't think we can actually do meeting stuff, edleafe is the chair
14:46:57 <dansmith> yeah
14:47:32 <edleafe> ok then
14:47:33 <edleafe> #action dansmith to enjoy writing spec for using migration uuid as move operation consumer
14:47:36 <edleafe> hehehe
14:47:39 <dansmith> gah
14:47:54 <edleafe> anything else for specs / reviews
14:47:56 <edleafe> ?
14:47:59 <jaypipes> yeah.
14:48:12 <edleafe> jaypipes: go for it
14:49:07 <alex_xu> #link traits in allocation candidates https://review.openstack.org/497713
14:49:13 <alex_xu> ^ the spec added
14:49:20 * alex_xu is faster than jaypipes
14:49:42 <jaypipes> alex_xu's patch here https://review.openstack.org/#/c/480379/
14:49:44 <edleafe> alex_xu: everyone is faster than jaypipes
14:49:44 <jaypipes> lol
14:50:17 <alex_xu> heh
14:50:22 <jaypipes> yep, was going to bring up that I asked alex_xu to split out the test in that patch in the same manner that gibi did for other bugs
14:50:37 <edleafe> #link ensure RP maps to those RPs that share with it https://review.openstack.org/#/c/480379/
14:50:57 <alex_xu> yea, I will do that, probably tomorrow morning
14:51:29 <jaypipes> no worries alex_xu
14:51:58 <edleafe> thanks alex_xu
14:52:07 <edleafe> Let's move ahead
14:52:07 <edleafe> #topic Bugs
14:52:07 <edleafe> Placement bugs
14:52:08 <edleafe> https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:52:32 <edleafe> One new bug this week - migration related (surprise!)
14:52:37 <mriedem> yeah so https://bugs.launchpad.net/nova/+bug/1712411 is one that didn't get fixed for rc2
14:52:37 <openstack> Launchpad bug 1712411 in OpenStack Compute (nova) pike "Allocations may not be removed from dest node during failed migrations" [High,Triaged]
14:53:02 <mriedem> i know of at least one place in conductor where that could be addressed, but the failed migration bug fixes are getting to be whack a mole
14:53:18 <mriedem> and i'm on the fence about whether we should have something general like a periodic in the computes to also remove failed stuff
14:53:32 <edleafe> mriedem: are we waiting on migration uuid for those?
14:53:40 <mriedem> to fix them?
14:53:40 <mriedem> no
14:53:47 <edleafe> ah, ok
14:53:58 <mriedem> i opened ^ after fixing the bug for force live migration not creating allocations
14:54:31 <mriedem> the problem is when you specify a host for live migration, the scheduler will allocate but then we do some other pre-checks which could fail, and we don't delete the allocations on the dest host if those fail
14:55:03 <jaypipes> mriedem: ew.
14:55:06 <mriedem> we could do that cleanup right at the point of failure in the conductor live migration task, and/or with a periodic in the compute
14:55:16 <cdent> meaning we’re in a known state within the conductor, yeah, so seems like we should just fix it there
14:55:25 <mriedem> that's the easiest fix
14:55:25 <edleafe> agreed
14:55:30 <mriedem> but like i said, whack a mole
14:55:41 <cdent> it is whack a mole, but it is explicit
14:55:46 * edleafe inserts his quarter to play
14:55:54 <mriedem> like this guy https://review.openstack.org/#/c/497606/
14:56:05 <cdent> having a clean up job is actually whack a mole: randomly stamping on the playing field, hoping that mole shows up somewhere
14:56:27 <mriedem> yes, and the periodic cleanup in the compute is exactly how the overwrite was happening
14:56:32 <mriedem> which we disabled late in pike
14:56:47 <mriedem> i just,
14:56:49 <mriedem> you know,
14:56:51 <mriedem> :(
14:57:03 <mriedem> cleaning up allocations is now like cleaning up volumes and ports,
14:57:07 <mriedem> it's sprinkled everywhere
14:57:27 <cdent> sure but that says more about how we allocate them in the first place, not about how we clean them up?
14:57:30 <mriedem> but i digress
14:57:43 <cdent> digression is the finest form of progression
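(The "clean up right at the point of failure" option under discussion amounts to something like the sketch below; every name in it is a placeholder, not the actual conductor or report-client API.)

    # Hedged sketch: if a live-migration pre-check fails after the scheduler
    # has already claimed resources on the destination host, undo that claim
    # before re-raising. Both callables are hypothetical placeholders.
    def run_prechecks_with_cleanup(run_prechecks, drop_dest_allocation):
        try:
            run_prechecks()
        except Exception:
            # Point-of-failure cleanup: remove the allocation made against the
            # destination node so placement is not left with a leaked claim.
            drop_dest_allocation()
            raise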
14:58:01 <edleafe> So we have two minutes left for
14:58:03 <edleafe> #topic Open discussion
14:58:14 <mriedem> i think how/where we allocate them is fine, it's in the controller services, which is much better than doing it in the claim in the compute
14:58:15 <edleafe> anything we *haven't* covered yet?
14:58:24 <mriedem> cleanup on failure is just always going to be messy
14:58:40 <mriedem> because we have 100 APIs that can fail randomly anywhere :)
14:59:00 <cdent> right: how we X in the first place…
14:59:28 <mriedem> https://www.youtube.com/watch?v=fGx6K90TmCI ?
14:59:38 <cdent> Before we get to the ptg I wanted to ask: is extracting placement ever going to be possible, or should I just stop worrying about it? If it is possible, I can write something (a spec? an etherpad) up.
15:00:08 <edleafe> Let's continue in -nova
15:00:12 <edleafe> #endmeeting