14:02:52 #startmeeting nova_scheduler
14:02:53 Meeting started Mon Jun 25 14:02:52 2018 UTC and is due to finish in 60 minutes. The chair is cdent. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:02:54 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:02:56 The meeting name has been set to 'nova_scheduler'
14:02:59 o/
14:03:00 o/
14:03:11 #chair efried edleafe tssurya takashin
14:03:12 Current chairs: cdent edleafe efried takashin tssurya
14:03:22 ō/
14:03:39 #link agenda https://wiki.openstack.org/wiki/Meetings/NovaScheduler
14:04:15 #link last meeting http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-06-18-14.00.html
14:04:23 \o
14:04:28 \o
14:04:31 #topic last meeting
14:04:43 anything anyone want to address from the last meeting?
14:05:07 #topic specs and review
14:05:07 #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-June/131752.html
14:05:19 any pending work that people need to discuss in person?
14:05:23 "in person"
14:06:45 I guess that's a "no"?
14:06:51 o/ (late)
14:07:03 you got something mriedem ?
14:07:07 no
14:07:09 just here
14:07:24 cool
14:07:24 #topic bugs
14:07:25 #link placement bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=placement&orderby=-id
14:09:13 https://bugs.launchpad.net/nova/+bug/1777591 is interesting
14:09:13 Launchpad bug 1777591 in OpenStack Compute (nova) "‘limit’ in allocation_candidates where sometimes make fore_hosts invalid" [Undecided,In progress] - Assigned to xulei (605423512-j)
14:09:45 and is a similar problem i think to the rebuild + image-defined traits issue
14:10:10 https://review.openstack.org/#/c/569498/
14:13:26 * bauzas raises fist at force_hosts
14:14:11 Before you put it in that context, my best stab at fixing this (actually I think it was cdent's idea) was to add a queryparam to GET /allocation_candidates letting you restrict to particular provider UUID(s).
14:14:55 mriedem: Do you have a different idea?
14:15:13 efried: i saw cdent's comment on the bug, and just said the same on the patch
14:15:17 and yes that's what we should do imo
14:15:23 we already have that on GET /resource_providers
14:15:51 i.e. we likely need the same in_tree param in GET /allocation_candidates
14:16:12 okey dokey. Stein, I'm assuming.
14:16:26 maybe, it is a bug fix
14:16:38 for what i think is a pretty common scenario (for admins)
14:16:57 I was thinking less from a procedural-bp-approval point of view and more of a we've-already-got-a-shit-ton-on-our-plate perspective.
14:16:58 there might be a backportable workaround...
14:17:12 yeah i understand, but we can't backport microversions
14:17:33 The workaround is "set your limit higher".
14:17:40 so unless we have a workaround we can hack into the scheduler, it's kind of a big deal - alternatively your workaround is just set the limit to -1
14:17:42 MHO is maybe we should just not ask for a limit if operators use force_hosts
14:17:54 that's their responsibility
14:19:54 so for rocky,
14:20:02 Are we suggesting in Rocky (and backportable) to override limit to -1 if force_hosts is set?
14:20:02 Or to document as a workaround that the operator should set a higher limit overall?
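For context on the in_tree idea above, here is a minimal sketch of the two calls being compared, assuming a reachable placement endpoint and a valid token (both placeholders). The in_tree filter on GET /resource_providers already existed; the same filter on GET /allocation_candidates is only the proposal under discussion, not an API that existed at the time of this meeting.

    # Minimal sketch; endpoint, token, and UUID below are placeholders.
    import requests

    PLACEMENT = 'http://placement.example.com/placement'  # hypothetical endpoint
    HEADERS = {
        'X-Auth-Token': 'REDACTED',                        # placeholder token
        'OpenStack-API-Version': 'placement 1.17',         # any version >= 1.10 serves /allocation_candidates
    }
    ROOT_RP = '00000000-0000-0000-0000-000000000001'       # forced compute node's root provider UUID

    # Exists today: restrict resource providers to a single provider tree.
    requests.get(PLACEMENT + '/resource_providers',
                 headers=HEADERS, params={'in_tree': ROOT_RP})

    # Proposed above: the same filter on allocation candidates, so a forced
    # host no longer depends on squeezing inside the global result limit.
    requests.get(PLACEMENT + '/allocation_candidates',
                 headers=HEADERS,
                 params={'resources': 'VCPU:2,MEMORY_MB:2048', 'in_tree': ROOT_RP})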
14:20:23 we could (1) update the max_placement_results option help for the known issue (2) add a known issues release note and (3) update the docs for force hosts to mention this as well https://docs.openstack.org/nova/latest/admin/availability-zones.html
14:20:27 force_hosts is only used by admins
14:20:37 so they know about the number of compute nodes they have
14:20:38 efried: i think it's a docs bug in rocky
14:20:46 bah, I will *never* get the operator/admin terminology right.
14:20:47 and we could fix in stein with a pre-request placement filter
14:20:58 i left comments on https://review.openstack.org/#/c/576693/
14:21:32 fwiw, I don't think it's a terrible workaround to disable limit when force_hosts is used.
14:21:53 the two options seem exclusive
14:22:07 yes
14:22:20 One is automatic
14:22:32 And the manual one - isn't the limit done by a conf option?
14:23:04 it's still shitty for performance
14:23:10 So you would have to set it globally for the sake of a single spawn?
14:23:14 you could pull back 10K candidates just to find the one you want
14:23:21 you could add a host_id argument to limit the amount of returned a/cs to either 1 (if it has the resources) or 0 (if it doesn't)
14:23:27 yeah max_placement_results is config
14:23:48 Yes, you could, but that's better to do for that single call than to have to set it globally so it happens for *every* request.
14:23:56 efried: true
14:24:13 Just got a message from cdent: his network is down, and is trying to get back online
14:24:29 figured something like that, thx edleafe
14:25:22 efried: also left that alternative on https://review.openstack.org/#/c/576693/
14:25:55 best thing probably is to start with a functional recreate
14:26:00 and then we can tease out that idea
14:26:31 2 hosts, limit=1 and force to one of them
14:26:33 functional recreate is going to be tough, considering we can't predict the order in which results come back from the db.
14:26:42 default order is based on id isn't it?
14:26:50 I don't think it's that predictable.
14:27:10 Especially the way the code is set up now, where we shuffle alloc cands around in dicts all over the place while we're filtering/merging them.
14:27:34 It's definitely not *documented* as being based on ID if that's what you're asknig.
14:27:36 asking
14:27:40 so i guess randomize_allocation_candidates isn't really useful?
14:27:44 it is
14:27:45 "If False, allocation candidates
14:27:45 are returned in a deterministic but undefined order."
14:28:03 Right, meaning for the same env you would get back the same order every time.
14:28:22 but you can't rely on what that order would be.
14:28:33 And if one teensy thing about the env changes, the whole thing could be different.
14:29:49 also keep in mind that 2 hosts doesn't necessarily == 2 candidates IRL - because sharing & nested.
14:30:13 not that we have to do that in test
14:30:36 i wouldn't use sharing or nested in the functional test
14:30:44 but if it's a worry then i guess we just unit test it, but that kind of sucks
14:31:03 i don't know what else to do though
14:31:06 Not sure if we need a specific recreate for "NoValidHosts when using force_host".
14:31:31 We can have a functional test where the conf is set up for a lower limit and just prove we get all results (higher than that limit) when force_hosts is set.
14:32:07 I think I get what you're saying, though, that that test would then not be reusable for the "real" fix.
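A rough sketch of the "disable the limit when force_hosts is used" workaround floated above, assuming a small helper on the scheduler side; the function name and call site are illustrative, not Nova's actual code, and max_placement_results refers to the [scheduler] nova.conf option mentioned in the log.

    # Sketch only: helper name and call site are illustrative, not Nova's code.
    def placement_limit(force_hosts, force_nodes, max_placement_results):
        """Pick the limit to send on GET /allocation_candidates.

        force_hosts/force_nodes come from the request spec;
        max_placement_results is the [scheduler]/max_placement_results option.
        """
        if force_hosts or force_nodes:
            # An admin forced a specific host; a truncated candidate list could
            # randomly omit that host, so ask placement for everything.
            return None
        return max_placement_results

    # Building the query: only send 'limit' when we actually want one.
    params = {'resources': 'VCPU:2,MEMORY_MB:2048'}
    limit = placement_limit(force_hosts=['compute1'], force_nodes=[],
                            max_placement_results=1000)
    if limit is not None:
        params['limit'] = limit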
14:32:10 meh, that's probably fine for a unit test - if force_hosts, assert get allocation candidates was called w/o a limit
14:32:14 But I don't see an alternative offhand.
14:32:23 anyway, i think we can move on
14:32:56 since dansmith added max_placement_results he might have opinions too
14:33:11 but can take that to -nova after the meeting
14:33:21 makes sense
14:34:45 Any other bugs to bring up?
14:34:56 #topic open discussion
14:35:10 Bueller? Bueller?
14:35:55 Okay then. Keep calm and carry on.
14:35:58 #endmeeting
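To flesh out the unit-test idea at 14:32:10, here is a hedged sketch: mock the placement client and assert that no 'limit' is passed when force_hosts is set. The client object and the build_ac_params helper are stand-ins (a toy version of the helper from the earlier sketch), not the real SchedulerReportClient API.

    # Hedged sketch; names below are stand-ins, not Nova's real test or client code.
    import unittest
    from unittest import mock


    def build_ac_params(resources, force_hosts, max_placement_results):
        """Toy stand-in for the scheduler-side helper sketched earlier."""
        params = {'resources': resources}
        if not force_hosts:
            params['limit'] = max_placement_results
        return params


    class TestForceHostsSkipsLimit(unittest.TestCase):
        def test_no_limit_when_forced(self):
            client = mock.Mock()
            params = build_ac_params('VCPU:1', force_hosts=['host1'],
                                     max_placement_results=1000)
            client.get_allocation_candidates(**params)
            _, kwargs = client.get_allocation_candidates.call_args
            self.assertNotIn('limit', kwargs)

        def test_limit_when_not_forced(self):
            client = mock.Mock()
            params = build_ac_params('VCPU:1', force_hosts=[],
                                     max_placement_results=1000)
            client.get_allocation_candidates(**params)
            _, kwargs = client.get_allocation_candidates.call_args
            self.assertEqual(1000, kwargs['limit'])


    if __name__ == '__main__':
        unittest.main()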