14:02:52 <cdent> #startmeeting nova_scheduler
14:02:53 <openstack> Meeting started Mon Jun 25 14:02:52 2018 UTC and is due to finish in 60 minutes.  The chair is cdent. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:02:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:02:56 <openstack> The meeting name has been set to 'nova_scheduler'
14:02:59 <tssurya> o/
14:03:00 <takashin> o/
14:03:11 <cdent> #chair efried edleafe tssurya takashin
14:03:12 <openstack> Current chairs: cdent edleafe efried takashin tssurya
14:03:22 <efried> ō/
14:03:39 <cdent> #link agenda https://wiki.openstack.org/wiki/Meetings/NovaScheduler
14:04:15 <cdent> #link last meeting http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-06-18-14.00.html
14:04:23 <bauzas> \o
14:04:28 <edleafe> \o
14:04:31 <cdent> #topic last meeting
14:04:43 <cdent> anything anyone wants to address from the last meeting?
14:05:07 <cdent> #topic specs and review
14:05:07 <cdent> #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-June/131752.html
14:05:19 <cdent> any pending work that people need to discuss in person?
14:05:23 <cdent> "in person"
14:06:45 <cdent> I guess that's a "no"?
14:06:51 <mriedem> o/ (late)
14:07:03 <cdent> you got something mriedem ?
14:07:07 <mriedem> no
14:07:09 <mriedem> just here
14:07:24 <cdent> cool
14:07:24 <cdent> #topic bugs
14:07:25 <cdent> #link placement bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=placement&orderby=-id
14:09:13 <mriedem> https://bugs.launchpad.net/nova/+bug/1777591 is interesting
14:09:13 <openstack> Launchpad bug 1777591 in OpenStack Compute (nova) "'limit' in allocation_candidates sometimes makes force_hosts invalid" [Undecided,In progress] - Assigned to xulei (605423512-j)
14:09:45 <mriedem> and is a similar problem i think to the rebuild + image-defined traits issue
14:10:10 <mriedem> https://review.openstack.org/#/c/569498/
14:13:26 * bauzas raises fist at force_hosts
14:14:11 <efried> Before you put it in that context, my best stab at fixing this (actually I think it was cdent's idea) was to add a queryparam to GET /allocation_candidates letting you restrict to particular provider UUID(s).
14:14:55 <efried> mriedem: Do you have a different idea?
14:15:13 <mriedem> efried: i saw cdent's comment on the bug, and just said the same on the patch
14:15:17 <mriedem> and yes that's what we should do imo
14:15:23 <mriedem> we already have that on GET /resource_providers
14:15:51 <mriedem> i.e. we likely need the same in_tree param in GET /allocation_candidates
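(For illustration, a rough sketch of what the proposed filter could look like from a client's point of view, mirroring the in_tree filter that GET /resource_providers already supports. The parameter name on /allocation_candidates, the endpoint URL, and the microversion header are assumptions here; no such query parameter or spec existed at the time of this discussion.)

    # Hypothetical sketch: restrict allocation candidates to a single
    # provider tree, the way in_tree already works for GET /resource_providers.
    # The endpoint, token handling, and the in_tree parameter on this URL
    # are assumptions, not an existing placement API.
    import requests

    PLACEMENT = 'http://placement.example.org'   # hypothetical endpoint
    HEADERS = {
        'X-Auth-Token': '<token>',
        'OpenStack-API-Version': 'placement latest',  # would need a new microversion
    }

    resp = requests.get(
        PLACEMENT + '/allocation_candidates',
        headers=HEADERS,
        params={
            'resources': 'VCPU:2,MEMORY_MB:4096,DISK_GB:20',
            'in_tree': '<root-provider-uuid-of-the-forced-host>',  # hypothetical param
            'limit': 10,
        },
    )
    candidates = resp.json().get('allocation_requests', [])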
14:16:12 <efried> okey dokey.  Stein, I'm assuming.
14:16:26 <mriedem> maybe, it is a bug fix
14:16:38 <mriedem> for what i think is a pretty common scenario (for admins)
14:16:57 <efried> I was thinking less from a procedural-bp-approval point of view and more of a we've-already-got-a-shit-ton-on-our-plate perspective.
14:16:58 <mriedem> there might be a backportable workaround...
14:17:12 <mriedem> yeah i understand, but we can't backport microversions
14:17:33 <efried> The workaround is "set your limit higher".
14:17:40 <mriedem> so unless we have a workaround we can hack into the scheduler, it's kind of a big deal - alternatively, the workaround is just to set the limit to -1
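(The "limit" being discussed is the scheduler's max_placement_results option in nova.conf. As a sketch of the manual workaround, an admin could raise it as shown below; whether the option accepts a -1 "no limit" sentinel depends on how it is defined, so treat that part of the idea as an assumption.)

    [scheduler]
    # Existing option: how many allocation candidates the scheduler asks
    # placement for on each request. Raising it is the manual workaround
    # discussed here; make it at least as large as the number of candidate
    # compute nodes so a forced host cannot be truncated out of the results.
    max_placement_results = 10000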
14:17:42 <bauzas> MHO is maybe we should just not ask for a limit if operators use force_hosts
14:17:54 <bauzas> that's their responsibility
14:19:54 <mriedem> so for rocky,
14:20:02 <efried> Are we suggesting in Rocky (and backportable) to override limit to -1 if force_hosts is set?
14:20:02 <efried> Or to document as a workaround that the operator should set a higher limit overall?
14:20:23 <mriedem> we could (1) update the max_placement_results option help for the known issue, (2) add a known issues release note, and (3) update the docs for force hosts to mention this as well: https://docs.openstack.org/nova/latest/admin/availability-zones.html
14:20:27 <bauzas> force_hosts is only used by admins
14:20:37 <bauzas> so they know about the number of compute nodes they have
14:20:38 <mriedem> efried: i think it's a docs bug in rocky
14:20:46 <efried> bah, I will *never* get the operator/admin terminology right.
14:20:47 <mriedem> and we could fix in stein with a pre-request placement filter
14:20:58 <mriedem> i left comments on https://review.openstack.org/#/c/576693/
14:21:32 <efried> fwiw, I don't think it's a terrible workaround to disable limit when force_hosts is used.
14:21:53 <edleafe> the two options seem exclusive
14:22:07 <efried> yes
14:22:20 <efried> One is automatic
14:22:32 <efried> And the manual one - isn't the limit done by a conf option?
14:23:04 <mriedem> it's still shitty for performance
14:23:10 <efried> So you would have to set it globally for the sake of a single spawn?
14:23:14 <mriedem> you could pull back 10K candidates just to find the one you want
14:23:21 <edleafe> you could add a host_id argument to limit the number of returned a/cs to either 1 (if it has the resources) or 0 (if it doesn't)
14:23:27 <mriedem> yeah max_placement_results is config
14:23:48 <efried> Yes, you could, but that's better to do for that single call than to have to set it globally so it happens for *every* request.
14:23:56 <mriedem> efried: true
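(A minimal sketch of the automatic variant being discussed: skip the limit only for the one request that forces a host, instead of changing the global config. The helper name and call site are hypothetical; only force_hosts/force_nodes on the RequestSpec and the max_placement_results option come from the discussion.)

    # Hypothetical helper, not actual nova code: decide what limit to pass
    # to GET /allocation_candidates for a single scheduling request.
    def _limit_for_request(request_spec, max_placement_results):
        """Return the candidate limit for this request, or None for no limit.

        If the request forces specific hosts or nodes, drop the limit so
        the forced host cannot be truncated out of placement's results.
        """
        if request_spec.force_hosts or request_spec.force_nodes:
            return None
        return max_placement_results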
14:24:13 <edleafe> Just got a message from cdent: his network is down, and is trying to get back online
14:24:29 <efried> figured something like that, thx edleafe
14:25:22 <mriedem> efried: also left that alternative on https://review.openstack.org/#/c/576693/
14:25:55 <mriedem> best thing probably is to start with a functional recreate
14:26:00 <mriedem> and then we can tease out that idea
14:26:31 <mriedem> 2 hosts, limit=1 and force to one of them
14:26:33 <efried> functional recreate is going to be tough, considering we can't predict the order in which results come back from the db.
14:26:42 <mriedem> default order is based on id isn't it?
14:26:50 <efried> I don't think it's that predictable.
14:27:10 <efried> Especially the way the code is set up now, where we shuffle alloc cands around in dicts all over the place while we're filtering/merging them.
14:27:34 <efried> It's definitely not *documented* as being based on ID, if that's what you're asking.
14:27:40 <mriedem> so i guess randomize_allocation_candidates isn't really useful?
14:27:44 <efried> it is
14:27:45 <mriedem> "If False, allocation candidates
14:27:45 <mriedem> are returned in a deterministic but undefined order."
14:28:03 <efried> Right, meaning for the same env you would get back the same order every time.
14:28:22 <efried> but you can't rely on what that order would be.
14:28:33 <efried> And if one teensy thing about the env changes, the whole thing could be different.
14:29:49 <efried> also keep in mind that 2 hosts doesn't necessarily == 2 candidates IRL - because sharing & nested.
14:30:13 <efried> not that we have to do that in test
14:30:36 <mriedem> i wouldn't use sharing or nested in the functional test
14:30:44 <mriedem> but if it's a worry then i guess we just unit test it, but that kind of sucks
14:31:03 <mriedem> i don't know what else to do though
14:31:06 <efried> Not sure if we need a specific recreate for "NoValidHosts when using force_host".
14:31:31 <efried> We can have a functional test where the conf is set up for a lower limit and just prove we get all results (higher than that limit) when force_hosts is set.
14:32:07 <efried> I think I get what you're saying, though, that that test would then not be reusable for the "real" fix.
14:32:10 <mriedem> meh, that's probably fine for a unit test - if force_hosts, assert get allocation candidates was called w/o a limit
14:32:14 <efried> But I don't see an alternative offhand.
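(And a rough unit-test sketch of the assertion mriedem describes, written against the hypothetical helper sketched above rather than the real scheduler plumbing, which differs.)

    import unittest
    from unittest import mock

    # Hypothetical helper from the earlier sketch, repeated here so the
    # test is self-contained.
    def _limit_for_request(request_spec, max_placement_results):
        if request_spec.force_hosts or request_spec.force_nodes:
            return None
        return max_placement_results

    class TestForceHostsLimit(unittest.TestCase):
        def test_force_hosts_drops_limit(self):
            spec = mock.Mock(force_hosts=['host1'], force_nodes=[])
            self.assertIsNone(_limit_for_request(spec, 1000))

        def test_no_force_keeps_limit(self):
            spec = mock.Mock(force_hosts=[], force_nodes=[])
            self.assertEqual(1000, _limit_for_request(spec, 1000))

    if __name__ == '__main__':
        unittest.main()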
14:32:23 <mriedem> anyway, i think we can move on
14:32:56 <mriedem> since dansmith added max_placement_results he might have opinions too
14:33:11 <mriedem> but can take that to -nova after the meeting
14:33:21 <dansmith> makes sense
14:34:45 <efried> Any other bugs to bring up?
14:34:56 <efried> #topic open discussion
14:35:10 <efried> Bueller? Bueller?
14:35:55 <efried> Okay then.  Keep calm and carry on.
14:35:58 <efried> #endmeeting