14:00:13 #startmeeting nova_scheduler
14:00:14 Meeting started Mon Sep 24 14:00:13 2018 UTC and is due to finish in 60 minutes. The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:15 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:17 The meeting name has been set to 'nova_scheduler'
14:00:33 o/
14:00:35 o/
14:00:37 o/
14:00:43 * bauzas waves
14:01:14 \o
14:01:34 alrightythen
14:01:48 #link agenda https://wiki.openstack.org/wiki/Meetings/NovaScheduler#Agenda_for_next_meeting
14:02:19 #topic last meeting
14:02:19 #link last minutes: http://eavesdrop.openstack.org/meetings/scheduler/2018/scheduler.2018-09-17-13.59.html
14:02:19 Thanks jaypipes for covering ^ while I was having Day From Hell.
14:02:25 Any old business to discuss?
14:02:52 grenade changes are ongoing
14:03:05 https://review.openstack.org/#/c/604454/
14:03:10 it's a mess as most can probably imagine
14:03:19 excellent.
14:03:31 ^ currently depends on dansmith's db migration script,
14:03:43 and a change to the neutron-grenade job so that we can clone openstack/placement into the CI run
14:04:08 i'm not sure yet if grenade will actually have to do the placement install/config stuff that devstack normally does
14:04:29 b/c we've got a weird chicken/egg situation where cdent's devstack change that does the new style placement setup depends on the grenade change
14:04:53 otherwise ideally new devstack would have placement cloned and setup via the separate repo before the upgrade runs
14:05:11 tl;dr i'm whacking the moles as i hit them
14:05:49 also, dan is out the rest of the week after today so if we need updates to his script we'll have to make them
14:06:02 * efried gets ready to swap in getopt
14:06:57 I'd like to see the pg script commonized in some sane way, since it's 90% duplicated.
14:07:01 I commented accordingly.
14:07:45 my script is easily broken out into common bits and the mysql bits.
14:08:00 yup, that'd be one way, lib-ify the common bits
14:08:04 it would be trivial I think to just take --dbtype on one script
14:08:13 and just call the right function I would think
14:08:20 yup, that was what I was thinking
14:08:23 or commonize it, either one
14:08:34 I just wish we wouldn't, but..
14:08:40 whyzat?
14:09:01 because we hardly even test pg and have said we're not making an effort to make sure it works
14:09:35 all this script does is dump a set of tables but with some safeguards.. if you're running pg against community advice, I'm sure you're capable of doing that part yourself
14:09:54 Fair enough, but there's clearly a want for the pg version since someone went through the trouble to propose it; and if we're going to have it at all, it might as well not be duplicated code.
14:10:26 I don't disagree that if we're going to have it it should be commonized
14:10:35 If we want to take a harder line on not supporting pg, we should downvote that patch I guess.
14:11:05 we've already merged a grenade-postgresql job
14:11:08 so we can test it
14:11:23 that's nice
14:11:45 Moving on?
14:12:18 #topic specs and review
14:12:29 #link Consumer generation & nrp use in nova: Series starting at https://review.openstack.org/#/c/591597
14:12:29 In a runway (ends Oct 4)
14:12:42 Any discussion on this?
14:12:58 yep
14:13:01 I'll sponsor it
14:13:09 efried: you have some concerns about the first patch
14:13:14 s/sponsor/shepherd
14:13:16 i had questions in it as well
14:13:40 efried: I tried to answer it inline
14:14:00 Yeah, I haven't processed your response yet. Does what I'm saying at least make sense?
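(A footnote on the data-dump script discussed earlier in the meeting: the "--dbtype on one script" idea would look roughly like the sketch below. The script name, table list, option names, and dump helpers are assumptions for illustration only, not the actual script under review.)

```python
#!/usr/bin/env python
"""Sketch only: one placement-data dump entry point dispatching on --dbtype.

Everything here (table list, defaults, helper names) is hypothetical.
"""
import argparse
import subprocess

# Hypothetical subset of placement tables to dump from the nova_api DB.
TABLES = ['resource_providers', 'inventories', 'allocations', 'consumers']


def dump_mysql(db, outfile):
    # Common "dump these tables" step, MySQL flavor.
    with open(outfile, 'wb') as out:
        subprocess.check_call(['mysqldump', db] + TABLES, stdout=out)


def dump_postgresql(db, outfile):
    # Same step, PostgreSQL flavor.
    cmd = ['pg_dump']
    for table in TABLES:
        cmd.extend(['--table', table])
    cmd.append(db)
    with open(outfile, 'wb') as out:
        subprocess.check_call(cmd, stdout=out)


DUMPERS = {'mysql': dump_mysql, 'postgresql': dump_postgresql}


def main():
    parser = argparse.ArgumentParser(description='Dump placement tables.')
    parser.add_argument('--dbtype', choices=sorted(DUMPERS), default='mysql')
    parser.add_argument('--database', default='nova_api')
    parser.add_argument('--outfile', default='placement-dump.sql')
    args = parser.parse_args()
    # Shared argument handling above; only the dump command differs per dbtype.
    DUMPERS[args.dbtype](args.database, args.outfile)


if __name__ == '__main__':
    main()
```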
14:14:04 mriedem: yours are more like comments I have to fix (and I'm fixing right now)
14:14:32 efried: I'm not sure what you eventually suggest about the patch? shall we not do the delete patch at all?
14:14:41 That was my thought, yes.
14:15:04 Switching from DELETE to PUT {} doesn't buy us anything, and it makes it *look* like we're covering races when we're really really not.
14:15:09 efried: for delete allocation we have DELETE /allocations/ to ignore generation
14:15:16 right
14:15:32 efried: for the rest of the allocation manipulation we don't have such a workaround
14:15:50 Right
14:15:55 But
14:16:03 efried: I really don't like ignoring generation in delete but handling it in every other case
14:16:24 I think the scenarios are different
14:16:42 like for initial allocs in spawn
14:16:56 we ought to be able to count on the generation being None
14:17:09 If it ain't, we should blow up righteously
14:17:14 right
14:17:25 There's no corollary to that in the delete case.
14:18:48 there are places where nova needs to read the allocation from placement and then manipulate it and then PUT it back
14:19:11 For other paths like migrate or resize, I agree we have a similar issue, in that we're basically retrieving the allocations *right* before we mess with them. So we're not closing much of a gap race-wise.
14:19:19 https://review.openstack.org/#/c/583667/24/nova/scheduler/utils.py
14:19:30 yeah
14:19:52 * jaypipes here now, sorry for lateness
14:20:02 efried: what to do with those? there we cannot ignore the generation
14:20:18 Though again for migration, if we attempt to claim for the migration UUID and the generation isn't None, that's the same as spawn, which is definitely a thing to guard against.
14:20:51 idk, I just think the cure is worse than the disease for deletion.
14:21:59 efried: so when we read - manipulate - and put (like in force live migrate and in force evacuate) then we still have to prepare for conflict
14:22:16 yes
14:22:17 even if this is not something that happens frequently
14:22:36 I can drop the delete patch if others like jaypipes and mriedem agree
14:23:04 no comment
14:23:22 but in the rest of the cases we will have conflicts to prepare for
14:23:27 yes
14:23:39 I'm not suggesting getting rid of any of the others
14:23:44 efried: OK
14:23:54 jaypipes: thoughts?
14:23:57 or dansmith?
14:24:02 reading back still
14:24:17 jaypipes: how do you feel about still using DELETE /allocations/{uuid} to delete instance allocations instead of PUTting allocations: {} with generation
14:24:52 if we don't use PUT,
14:24:59 then we don't know if they have changed since we examined them right?
14:25:09 dansmith: right
14:25:10 gibi: I don't get why efried thinks your patch will not avoid races.
14:25:13 like, maybe we shouldn't be deleting them if we thought they should be gone but they changed?
14:25:37 dansmith: Point is that we only know something changed in the teeny window *within* this one report client method between the GET and the PUT.
14:25:40 I thought we discussed in dublin even that it should be a put because delete with a body is weird, for exactly that reason
14:25:41 which is not useful
14:25:50 gibi: if someone changes the allocations in between the initial read and the call to PUT {} then we will fail, which will prevent the race.
14:25:50 it's not?
14:26:05 because the window-of-change we care about starts when the delete is initiated
14:26:15 and the vast majority of that window is *before* we hit this method.
14:26:32 So switching from DELETE to PUT {} in this method makes it *look* like we're closing a race when we're really really not.
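(To make the two options being debated concrete, here is a rough sketch of the placement calls involved, using the microversion 1.28 consumer-generation semantics. The endpoint URL, token, and project/user values are placeholders, and this is not nova's report client code.)

```python
"""Sketch of DELETE vs. PUT {} for removing a consumer's allocations.

PLACEMENT, the token, and the project/user IDs are placeholders; real code
goes through the scheduler report client and a keystoneauth session.
"""
import requests

PLACEMENT = 'http://placement.example.com'       # placeholder endpoint
HEADERS = {
    'X-Auth-Token': 'TOKEN',                     # placeholder token
    'OpenStack-API-Version': 'placement 1.28',   # consumer generations
}


def delete_allocations(consumer_uuid):
    """Old style: DELETE ignores the consumer generation entirely."""
    resp = requests.delete(
        '%s/allocations/%s' % (PLACEMENT, consumer_uuid), headers=HEADERS)
    resp.raise_for_status()  # 204 on success


def clear_allocations(consumer_uuid, project_id, user_id):
    """New style: GET the current generation, then PUT {} back with it.

    The GET and the PUT happen back to back, so this only detects writes
    that land in that tiny window -- the point efried is making above.
    """
    url = '%s/allocations/%s' % (PLACEMENT, consumer_uuid)
    current = requests.get(url, headers=HEADERS)
    current.raise_for_status()
    generation = current.json().get('consumer_generation')

    payload = {
        'allocations': {},                  # empty == remove all allocations
        'consumer_generation': generation,
        'project_id': project_id,
        'user_id': user_id,
    }
    resp = requests.put(url, json=payload, headers=HEADERS)
    if resp.status_code == 409:
        # Someone changed the allocations between our GET and PUT; per the
        # discussion above, this is surfaced as a hard failure, not retried.
        raise RuntimeError(
            'consumer generation conflict for %s' % consumer_uuid)
    resp.raise_for_status()
```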
14:26:36 jaypipes: yes, this is what the patch does, but the window is small as efried pointed out
14:27:13 isn't efried arguing that the window is so small that we shouldn't close it?
14:27:15 efried: we close one really small race. But if nova does not store the allocation then there is no way to close the others
14:27:17 because that does not resonate with me
14:27:36 dansmith: We're not closing it.
14:27:38 it's a small window now, but in the future it might be larger if we change the way the workflow is arranged
14:27:55 We're narrowing it a fraction. It's still wide open.
14:28:00 efried: you say that because why.. because we're able to detect it and we'll just re-do it right?
14:28:29 well, if we do that (retry) then the change is truly pointless. But we agreed we would make 409 a hard fail here.
14:28:36 dansmith: we are not even retrying the delete in this case
14:28:48 I totally don't understand
14:28:48 dansmith: the end user can retry the delete
14:29:03 gibi: ack, just trying to understand what efried is talking about
14:29:31 The race window starts when the conductor receives the delete request.
14:29:33 Lots of stuff happens, lots of stuff happens, then finally down in the compute service we hit the report client method.
14:29:45 dansmith: efried is saying that because the window the race condition exists in is super-tiny, he doesn't feel this patch is important.
14:29:47 Then *within* that method, we do a GET followed by an immediate PUT.
14:30:10 jaypipes: I thought he was arguing that it doesn't solve the race
14:30:16 Right. It doesn't.
14:30:49 If allocations are changed anywhere in that "lots of stuff happens" timeframe, we'll miss it, ignore it, delete anyway.
14:30:49 efried: I totally don't understand what the "conductor does lots of stuff" part has to do with it
14:31:25 okay, but in the grand scheme of things,
14:31:33 So I'm saying this patch gives us a false sense of security that we've actually done something
14:31:36 when we really haven't.
14:31:48 the get followed by the put should include some "does the data I got in the GET make sense? Yes, okay, delete if it hasn't changed"
14:31:54 which is the pattern we should be following,
14:32:03 even if right now we just PUT $(GET ..)
14:32:09 IMHO
14:32:14 jaypipes: agree?
14:32:15 efried: without nova storing the instance allocation there is nothing to compare to, so we cannot detect the race in the big window
14:32:23 You mean e.g. comparing the GET data with some understanding of what we think the allocation should look like?
14:32:27 gibi: Precisely.
14:32:33 efried: correct
14:32:50 just because the code plays fast and loose on the client side right now doesn't mean it will be that way forever
14:32:52 dansmith: agree. but this particular code path doesn't use the reportclient's cached provider tree and so cannot check a known consumer generation.
14:32:53 Okay, but the only thing we have that would allow us to do that is... the allocation in placement, isn't it?
14:33:16 jaypipes: that's a transient and unfortunate state of the code at this moment though right?
14:33:24 dansmith: not sure?
14:33:45 We've discussed and dismissed the idea of caching allocations in the past, I thought, but I suppose we could revisit that.
14:33:53 efried, gibi: where is delete_allocation_for_instance() called from at this point?
14:34:11 efried: it's not allocations we need/want to cache. it's consumer generation.
14:34:27 just another reason why *we should have a separate consumer endpoint*.
14:34:34 * jaypipes goes back into hole.
14:34:51 jaypipes: To answer your question, it's called from all over the place.
14:34:55 jaypipes: normal instance delete, local delete, some rollback cases like failing unshelve
14:35:05 yeah, what gibi said
14:35:47 filter scheduler too
14:35:58 well, then my view is that this patch doesn't make our existing situation any *worse* and solves a micro-race that might happen (very unlikely). I don't think it's a reason to not merge the patch as-is
14:36:05 deleting the allocation held by the migration uuid after move
14:36:23 but I do acknowledge the problem that efried has outlined.
14:36:39 we could solve this long-term by caching consumer generations.
14:37:02 which would be made easier if we had a GET /consumers endpoint.
14:37:07 but I digress.
14:37:17 Okay. I'll buy the change if we can put a prominent NOTE acking the issue. gibi cool?
14:37:25 efried: super cool
14:37:27 I'm fine with that.
14:37:33 Thanks for the discuss y'all.
14:37:39 thank you
14:37:58 #agreed to keep https://review.openstack.org/#/c/591597/ in play with a NOTE acking the race issue
14:38:13 moving on
14:38:15 #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-September/134977.html
14:38:30 If you wouldn't mind clicking through ^ and scrolling down to the specs section
14:38:44 Give the titles a quick skim and see if there's anything you'd like to talk about.
14:39:59 i haven't been reviewing specs
14:40:02 efried: about my open specs on placement, I'm totally OK deferring those while placement is in a frozen state
14:40:04 efried: nothing that I'd like to talk about. just need to do the reviews. :(
14:40:43 gibi: does min bandwidth sched depend on any-traits?
14:41:04 jaypipes: the multisegment use case depends on any-traits but I think that can wait
14:41:17 ack
14:41:18 jaypipes: if a network only maps to a single physnet then we are good
14:41:24 I think alex_xu merged gibi's specs, not sure what the bp tracking process does at this point. mriedem?
14:41:25 right.
14:41:40 I think mriedem approved the bp after the spec was merged
14:42:11 and mriedem also noted that placement is frozen in the bp
14:42:35 but now we're saying we want to defer to Train? Or just let it ride and *probably* defer to Train at the end of the cycle?
14:43:08 efried: I'd prefer the latter
14:43:17 we freeze to Train?
14:43:24 We've got a handful of placement bps in play for this cycle. I was under the impression we were assuming placement would un-freeze at some point, early enough for us to get work done.
14:43:56 it won't unfreeze until grenade is done
14:43:59 so that's my focus
14:44:02 We're not truly frozen at the moment, fwiw. The openstack/placement repo is open for patches; we've been merging stuff that's not purely related to extraction.
14:44:21 what kinds of stuff?
14:44:31 I agree we shouldn't merge features until the upgrade stuff is sorted.
14:44:45 efried: still I'd not start stealing review time from the extraction with any-traits and similar features
14:44:51 can we call Anna for unfreezing ?
14:44:57 * bauzas makes sad jokes
14:45:41 mriedem: Nothing big, mainly refactoring. E.g. for reducing complexity.
14:46:34 actually, looking at it, nothing significant has merged, but there's open patches along those lines.
14:46:56 anyway, my point is, the code could get proposed and reviewed while upgrade stuff is being worked.
14:47:12 We don't have to keep hands off the repo entirely.
14:48:01 okay, any other specs or reviews to bring up specifically?
14:48:42 #topic Extraction
14:48:42 cdent is on PTO. Anyone have any updates beyond what mriedem talked about earlier?
14:49:05 nothing but the fact I just uploaded a new revision for the libvirt reshape
14:49:19 and then I'll try to look whether it works
14:49:27 with my machine
14:49:49 bauzas: link?
14:50:04 #link vgpu reshape https://review.openstack.org/#/c/599208/
14:50:18 danke
14:50:26 jaypipes: https://review.openstack.org/#/c/599208/5
14:50:31 shit too late
14:50:41 * efried holsters six-shooter
14:50:44 #topic bugs
14:50:44 #link Placement bugs https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:50:47 we should also discuss about reshapes
14:51:05 okay
14:51:06 eg. should we have some specific module for reshapes ?
14:51:09 #topic reshapes
14:51:17 bauzas: module to do what?
14:51:30 efried: for example, say I need a reshape for vGPU
14:51:44 efried: then, once we agree on NUMA, we could have yet another reshape
14:51:58 then, say PCPU will need a new reshape
14:51:59 etc.
14:51:59 You think there's some portion of the reshaping algorithm in update_provider_tree that could be common for all reshapes in all virt drivers?
14:52:23 so, I was thinking of having a specific module that upgraders would use (like FFU folks)
14:52:26 or are you saying we should ask virt drivers to keep those things somewhere separate for ease of review/maintenance?
14:52:47 efried: maybe having a pattern, just that
14:53:17 so we could see all the reshapes
14:53:31 then, knowing for example when they were created, and for which cycle
14:53:54 I mean, I dunno
14:53:57 I think the idea has merit, if only for the sake of not having to keep adding and removing virt driver code every cycle. Keep 'em all together, kind of like we do for db upgrades?
14:54:04 yup
14:54:22 if so, I'll provide a new revision for https://review.openstack.org/#/c/599208/
14:54:33 and people will discuss in it ^
14:54:38 bauzas: Maybe you can write up something that demonstrates what you're talking about?
14:54:49 yeah, the above ^
14:55:02 okay.
14:55:19 #topic opens
14:55:38 We're going to run out of time, but I put this one on there
14:55:39 How to handle min_unit (and others) in a forward-looking (i.e. generic NRP) way?
14:55:39 #link IRC discussion with belmoreira http://eavesdrop.openstack.org/irclogs/%23openstack-placement/%23openstack-placement.2018-09-20.log.html#t2018-09-20T14:11:59
14:56:50 My take was that, if we're going to have any hope of allowing operators to configure things like allocation ratios, min/max units, etc. in the future (with nrp/sharing) then we're going to need a generic solution that doesn't get us into the config nightmare we're currently experiencing with alloc ratios.
14:57:58 jaypipes suggested
14:57:58 #link Spec (rocky, abandoned): Standardized provider descriptor file https://review.openstack.org/#/c/550244/
14:57:58 which almost gets us there, but falls a bit short in that it doesn't solve the chicken/egg of being able to identify a provider before you can tweak it.
14:58:00 efried: I agree but right now I have no good solution
14:58:18 yeah
14:58:50 the efforts around device passthrough
14:58:50 #link Spec: Modelling passthrough devices for report to placement ("generic device management") https://review.openstack.org/#/c/591037/
14:58:50 #link Spec: Generic device discovery policy https://review.openstack.org/#/c/603805/
14:58:50 are leading towards defining file formats in a similar spirit
14:59:20 but I think that fundamental problem still exists - how do I identify a provider that's going to be automatically generated by nova (or neutron or cyborg or cinder or...)
14:59:36 ...in order to provide customized inventory settings for it?
14:59:40 mmmm
14:59:49 Since we're out of time, consider ^ food for thought.
14:59:50 maybe we should autogenerate this file if needed ?
14:59:55 like we do for options ?
15:00:11 a getter/setter way
15:00:13 efried: I specifically left out the "identification of the provider before you need it" because the clients of such a descriptor file would undoubtedly have different ideas of how to map local identifiers to RP identifiers.
15:00:15 it would have to be updated frequently. basically kept in sync with placement.
15:00:33 That's time, folks. Let's continue in -nova if desired.
15:00:35 #endmeeting
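(Looking back at the reshape topic above: one way to read bauzas's "specific module for reshapes" idea is a per-driver module that keeps each cycle's reshape as a named, ordered step, loosely analogous to DB migrations, and applies them from update_provider_tree. The sketch below is entirely hypothetical; none of these names or structures exist in the tree.)

```python
"""Hypothetical layout for keeping a virt driver's reshapes together.

Nothing here exists in nova today; it only illustrates the "one module,
ordered steps, labelled by cycle" pattern floated in the meeting.
"""

# Each entry: (cycle it was introduced in, short name, callable).
# The callables receive the driver's provider tree and the allocations
# handed to update_provider_tree() during a reshape.
RESHAPES = []


def reshape(cycle, name):
    """Decorator registering a reshape step in order of definition."""
    def register(func):
        RESHAPES.append((cycle, name, func))
        return func
    return register


@reshape('stein', 'vgpu_to_child_providers')
def vgpu_to_child_providers(provider_tree, allocations):
    # Would move VGPU inventory (and allocations) from the root compute
    # node provider onto per-physical-GPU child providers.
    pass


@reshape('later-cycle', 'pcpu_inventory')
def pcpu_inventory(provider_tree, allocations):
    # A later reshape would slot in here, keeping the history visible.
    pass


def run_reshapes(provider_tree, allocations):
    """Called from the driver's update_provider_tree() when a reshape is
    requested (i.e. when allocations are passed in)."""
    for cycle, name, step in RESHAPES:
        # Ordering and per-cycle bookkeeping live in one place, so the
        # full reshape history of the driver is visible at a glance.
        step(provider_tree, allocations)


if __name__ == '__main__':
    # Stand-in objects just to show the call shape.
    run_reshapes(provider_tree=object(), allocations={})
```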