14:00:13 <efried> #startmeeting nova_scheduler
14:00:14 <openstack> Meeting started Mon Sep 24 14:00:13 2018 UTC and is due to finish in 60 minutes.  The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:15 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:17 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:33 <mriedem> o/
14:00:35 <alex_xu> o/
14:00:37 <gibi> o/
14:00:43 * bauzas waves
14:01:14 <edleafe> \o
14:01:34 <efried> alrightythen
14:01:48 <efried> #link agenda https://wiki.openstack.org/wiki/Meetings/NovaScheduler#Agenda_for_next_meeting
14:02:19 <efried> #topic last meeting
14:02:19 <efried> #link last minutes: http://eavesdrop.openstack.org/meetings/scheduler/2018/scheduler.2018-09-17-13.59.html
14:02:19 <efried> Thanks jaypipes for covering ^ while I was having Day From Hell.
14:02:25 <efried> Any old business to discuss?
14:02:52 <mriedem> grenade changes are ongoing
14:03:05 <mriedem> https://review.openstack.org/#/c/604454/
14:03:10 <mriedem> it's a mess as most can probably imagine
14:03:19 <efried> excellent.
14:03:31 <mriedem> ^ currently depends on dansmith's db migration script,
14:03:43 <mriedem> and a change to the neutron-grenade job so that we can clone openstack/placement into the CI run
14:04:08 <mriedem> i'm not sure yet if grenade will actually have to do the placement install/config stuff that devstack normally does
14:04:29 <mriedem> b/c we've got a weird chicken/egg situation where cdent's devstack change that does the new style placement setup depends on the grenade change
14:04:53 <mriedem> otherwise ideally new devstack would have placement cloned and setup via the separate repo before the upgrade runs
14:05:11 <mriedem> tl;dr i'm whacking the moles as i hit them
14:05:49 <mriedem> also, dan is out the rest of the week after today so if we need updates to his script we'll have to make them
14:06:02 * efried gets ready to swap in getopt
14:06:57 <efried> I'd like to see the pg script commonized in some sane way, since it's 90% duplicated.
14:07:01 <efried> I commented accordingly.
14:07:45 <dansmith> my script is easily broken out into common bits and the mysql bits.
14:08:00 <efried> yup, that'd be one way, lib-ify the common bits
14:08:04 <dansmith> it would be trivial I think to just take --dbtype on one script
14:08:13 <dansmith> and just call the right function I would think
14:08:20 <efried> yup, that was what I was thinking
14:08:23 <dansmith> or commonize it, either one
14:08:34 <dansmith> I just wish we wouldn't, but..
14:08:40 <efried> whyzat?
14:09:01 <dansmith> because we hardly even test pg and have said we're not making an effort to make sure it works
14:09:35 <dansmith> all this script does is dump a set of tables but with some safeguards.. if you're running pg against community advice, I'm sure you're capable of doing that part yourself
14:09:54 <efried> Fair enough, but there's clearly a want for the pg version since someone went through the trouble to propose it; and if we're going to have it at all, it might as well not be duplicated code.
14:10:26 <dansmith> I don't disagree that if we're going to have it it should be commonized
14:10:35 <efried> If we want to take a harder line on not supporting pg, we should downvote that patch I guess.
14:11:05 <mriedem> we've already merged a grenade-postgresql job
14:11:08 <mriedem> so we can test it
14:11:23 <efried> that's nice
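The --dbtype dispatch dansmith describes above would keep a single entry point and swap only the backend-specific dump step. A minimal sketch of that pattern, in Python rather than the shell the actual script under review is written in; both dump function names are invented:

```python
# Illustrative only: the real migration script is shell, and the
# dump_tables_* names are hypothetical stand-ins for the
# backend-specific ~10% of the script.
import argparse
import sys

def dump_tables_mysql(db_url):
    """MySQL-specific table dump (placeholder body)."""
    print('dumping placement tables from %s via mysqldump' % db_url)

def dump_tables_postgresql(db_url):
    """PostgreSQL-specific table dump (placeholder body)."""
    print('dumping placement tables from %s via pg_dump' % db_url)

DUMPERS = {
    'mysql': dump_tables_mysql,
    'postgresql': dump_tables_postgresql,
}

def main(argv):
    parser = argparse.ArgumentParser(
        description='Copy placement tables out to a new database.')
    parser.add_argument('--dbtype', choices=sorted(DUMPERS),
                        default='mysql',
                        help='database backend to dump from')
    parser.add_argument('db_url', help='connection URL of the source DB')
    args = parser.parse_args(argv)
    # One script, one flag; only the dump function differs per backend.
    DUMPERS[args.dbtype](args.db_url)

if __name__ == '__main__':
    main(sys.argv[1:])
```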
14:11:45 <efried> Moving on?
14:12:18 <efried> #topic specs and review
14:12:29 <efried> #link Consumer generation & nrp use in nova: Series starting at https://review.openstack.org/#/c/591597
14:12:29 <efried> In a runway (ends Oct 4)
14:12:42 <efried> Any discussion on this?
14:12:58 <gibi> yep
14:13:01 <bauzas> I'll sponsor it
14:13:09 <gibi> efried: you have some concerns about the first patch
14:13:14 <bauzas> s/sponsor/shepherd
14:13:16 <mriedem> i had questions in it as well
14:13:40 <gibi> efried: I tried to answer it inline
14:14:00 <efried> Yeah, I haven't processed your response yet. Does what I'm saying at least make sense?
14:14:04 <gibi> mriedem: yours are more like comments I have to fix (and I'm fixing right now)
14:14:32 <gibi> efried: I'm not sure what you're ultimately suggesting about the patch. Shall we not do the delete patch at all?
14:14:41 <efried> That was my thought, yes.
14:15:04 <efried> Switching from DELETE to PUT {} doesn't buy us anything, and it makes it *look* like we're covering races when we're really really not.
14:15:09 <gibi> efried: for deleting allocations we have DELETE /allocations/<uuid>, which ignores the generation
14:15:16 <efried> right
14:15:32 <gibi> efried: for the rest of the allocation manipulation we don't have such a workaround
14:15:50 <efried> Right
14:15:55 <efried> But
14:16:03 <gibi> efried: I really don't like ignoring the generation in delete but handling it in every other case
14:16:24 <efried> I think the scenarios are different
14:16:42 <efried> like for initial allocs in spawn
14:16:56 <efried> we ought to be able to count on the generation being None
14:17:09 <efried> If it ain't, we should blow up righteously
14:17:14 <gibi> right
14:17:25 <efried> There's no corollary to that in the delete case.
14:18:48 <gibi> there are places where nova needs to read the allocation from placement, manipulate it, and then PUT it back
14:19:11 <efried> For other paths like migrate or resize, I agree we have a similar issue, in that we're basically retrieving the allocations *right* before we mess with them. So we're not closing much of a gap race-wise.
14:19:19 <gibi> https://review.openstack.org/#/c/583667/24/nova/scheduler/utils.py
14:19:30 <gibi> yeah
14:19:52 * jaypipes here now, sorry for lateness
14:20:02 <gibi> efried: what to do with those? there we cannot ignore the generation
14:20:18 <efried> Though again for migration, if we attempt to claim for the migration UUID and the generation isn't None, that's the same as spawn, which is definitely a thing to guard against.
14:20:51 <efried> idk, I just think the cure is worse than the disease for deletion.
14:21:59 <gibi> efried: so when we read, manipulate, and PUT (like in force live migrate and force evacuate) we still have to be prepared for a conflict
14:22:16 <efried> yes
14:22:17 <gibi> even if this is not something that happens frequently
14:22:36 <gibi> I can drop the delete patch if others like jaypipes and mriedem agree
14:23:04 <mriedem> no comment
14:23:22 <gibi> but in the rest of the cases we will have conflicts to prepare for
14:23:27 <efried> yes
14:23:39 <efried> I'm not suggesting getting rid of any of the others
14:23:44 <gibi> efried: OK
14:23:54 <efried> jaypipes: thoughts?
14:23:57 <efried> or dansmith?
14:24:02 <jaypipes> reading back still
14:24:17 <gibi> jaypipes: how do you feel about still using DELETE /allocations/{uuid} to delete instance allocations instead of PUTting allocations: {} with generation
14:24:52 <dansmith> if we don't use PUT,
14:24:59 <dansmith> then we don't know if they have changed since we examined them right?
14:25:09 <gibi> dansmith: right
14:25:10 <jaypipes> gibi: I don't get why efried thinks your patch will not avoid races.
14:25:13 <dansmith> like, maybe we shouldn't be deleting them if we thought they should be gone but they changed?
14:25:37 <efried> dansmith: Point is that we only know something changed in the teeny window *within* this one report client method between the GET and the PUT.
14:25:40 <dansmith> I thought we discussed in dublin even that it should be a put because delete with a body is weird, for exactly that reason
14:25:41 <efried> which is not useful
14:25:50 <jaypipes> gibi: if someone changes the allocations in between the initial read and the call to PUT {} then we will fail, which will prevent the race.
14:25:50 <dansmith> it's not?
14:26:05 <efried> because the window-of-change we care about starts when the delete is initiated
14:26:15 <efried> and the vast majority of that window is *before* we hit this method.
14:26:32 <efried> So switching from DELETE to PUT {} in this method makes it *look* like we're closing a race when we're really really not.
14:26:36 <gibi> jaypipes: yes, this is what the patch does, but the window is small as efried pointed out
14:27:13 <dansmith> isn't efried arguing that the window is so small that we shouldn't close it?
14:27:15 <gibi> efried: we close one really small race. But if nova does not store the allocation then there is no way to close the others
14:27:17 <dansmith> because that does not resonate with me
14:27:36 <efried> dansmith: We're not closing it.
14:27:38 <dansmith> it's a small window now, but in the future it might be larger if we change the way the workflow is arranged
14:27:55 <efried> We're narrowing it a fraction. It's still wide open.
14:28:00 <dansmith> efried: you say that because why.. because we're able to detect it and we'll just re-do it right?
14:28:29 <efried> well, if we do that (retry) then the change is truly pointless. But we agreed we would make 409 a hard fail here.
14:28:36 <gibi> dansmith: we are not even retrying the delete in this case
14:28:48 <dansmith> I totally don't understand
14:28:48 <gibi> dansmith: the end user can retry the delete
14:29:03 <dansmith> gibi: ack, just trying to understand what efried is talking about
14:29:31 <efried> The race window starts when the conductor receives the delete request.
14:29:33 <efried> Lots of stuff happens, lots of stuff happens, then finally down in the compute service we hit the report client method.
14:29:45 <jaypipes> dansmith: efried is saying that because the window the race condition exists in is super-tiny, he doesn't feel this patch is important.
14:29:47 <efried> Then *within* that method, we do a GET followed by an immediate PUT.
14:30:10 <dansmith> jaypipes: I thought he was arguing that it doesn't solve the race
14:30:16 <efried> Right. It doesn't.
14:30:49 <efried> If allocations are changed anywhere in that "lots of stuff happens" timeframe, we'll miss it, ignore it, delete anyway.
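For reference, the two deletion styles being weighed look roughly like this against the placement REST API. A minimal sketch assuming microversion 1.28 (where consumer generations arrived), a placeholder endpoint, and no keystone auth:

```python
# Sketch only: the endpoint is a placeholder and auth is omitted.
import requests

PLACEMENT = 'http://placement.example.com'
HEADERS = {'OpenStack-API-Version': 'placement 1.28'}

def delete_ignoring_generation(consumer_uuid):
    # DELETE carries no body, so there is nowhere to put a consumer
    # generation: allocations go away regardless of who changed them.
    requests.delete('%s/allocations/%s' % (PLACEMENT, consumer_uuid),
                    headers=HEADERS).raise_for_status()

def delete_with_generation_check(consumer_uuid, project_id, user_id):
    # GET the allocations to learn the current consumer generation...
    resp = requests.get('%s/allocations/%s' % (PLACEMENT, consumer_uuid),
                        headers=HEADERS)
    resp.raise_for_status()
    generation = resp.json()['consumer_generation']
    # ...then PUT an empty allocation set carrying that generation.
    # Placement returns 409 if the consumer changed in between -- but
    # only in this tiny GET-to-PUT window, which is efried's point.
    resp = requests.put('%s/allocations/%s' % (PLACEMENT, consumer_uuid),
                        headers=HEADERS,
                        json={'allocations': {},
                              'consumer_generation': generation,
                              'project_id': project_id,
                              'user_id': user_id})
    if resp.status_code == 409:
        raise RuntimeError('allocations changed since GET; hard fail')
    resp.raise_for_status()
```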
14:30:49 <dansmith> efried: I totally don't understand what the "conductor does lots of stuff" part has to do with it
14:31:25 <dansmith> okay, but in the grand scheme of things,
14:31:33 <efried> So I'm saying this patch gives us a false sense of security that we've actually done something
14:31:36 <efried> when we really haven't.
14:31:48 <dansmith> the get followed by the put should include some "does the data I got in the GET make sense? Yes, okay, delete if it hasn't changed"
14:31:54 <dansmith> which is the pattern we should be following,
14:32:03 <dansmith> even if right now we just PUT $(GET ..)
14:32:09 <dansmith> IMHO
14:32:14 <dansmith> jaypipes: agree?
14:32:15 <gibi> efried: without nova storing the instance allocation there is nothing to compare to, so we cannot detect the race in the big window
14:32:23 <efried> You mean e.g. comparing the GET data with some understanding of what we think the allocation should look like?
14:32:27 <efried> gibi: Precisely.
14:32:33 <dansmith> efried: correct
14:32:50 <dansmith> just because the code plays fast and loose on the client side right now doesn't mean it will be that way forever
14:32:52 <jaypipes> dansmith: agree. but this particular code path doesn't use the reportclient's cached provider tree and so cannot check a known consumer generation.
14:32:53 <efried> Okay, but the only thing we have that would allow us to do that is... the allocation in placement, isn't it?
14:33:16 <dansmith> jaypipes: that's a transient and unfortunate state of the code at this moment though right?
14:33:24 <jaypipes> dansmith: not sure?
14:33:45 <efried> We've discussed and dismissed the idea of caching allocations in the past, I thought, but I suppose we could revisit that.
14:33:53 <jaypipes> efried, gibi: where is delete_allocation_for_instance() called from at this point?
14:34:11 <jaypipes> efried: it's not allocations we need/want to cache. it's consumer generation.
14:34:27 <jaypipes> just another reason why *we should have a separate consumer endpoint*.
14:34:34 * jaypipes goes back into hole.
14:34:51 <efried> jaypipes: To answer your question, it's called from all over the place.
14:34:55 <gibi> jaypipes: normal instance delete, local delete, some rollback cases like failing unshelve
14:35:05 <efried> yeah, what gibi said
14:35:47 <efried> filter scheduler too
14:35:58 <jaypipes> well, then my view is that this patch doesn't make our existing situation any *worse* and solves a micro-race that might happen (very unlikely). I don't think that's a reason not to merge the patch as-is
14:36:05 <gibi> deleting the allocation held by the migration uuid after a move
14:36:23 <jaypipes> but I do acknowledge the problem that efried has outlined.
14:36:39 <jaypipes> we could solve this long-term by caching consumer generations.
14:37:02 <jaypipes> which would be made easier if we had a GET /consumers endpoint.
14:37:07 <jaypipes> but I digress.
14:37:17 <efried> Okay. I'll buy the change if we can put a prominent NOTE acking the issue. gibi cool?
14:37:25 <gibi> efried: super cool
14:37:27 <jaypipes> I'm fine with that.
14:37:33 <efried> Thanks for the discuss y'all.
14:37:39 <gibi> thank you
14:37:58 <efried> #agreed to keep https://review.openstack.org/#/c/591597/ in play with a NOTE acking the race issue
14:38:13 <efried> moving on
14:38:15 <efried> #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-September/134977.html
14:38:30 <efried> If you wouldn't mind clicking through ^ and scrolling down to the specs section
14:38:44 <efried> Give the titles a quick skim and see if there's anything you'd like to talk about.
14:39:59 <mriedem> i haven't been reviewing specs
14:40:02 <gibi> efried: about my specs open on placement, I'm totally OK deferring those while placement is in a frozen state
14:40:04 <jaypipes> efried: nothing that I'd like to talk about. just need to do the reviews. :(
14:40:43 <jaypipes> gibi: does min bandwidth sched depend on any-traits?
14:41:04 <gibi> jaypipes: the multisegment use case depends on any-traits but I think that can wait
14:41:17 <jaypipes> ack
14:41:18 <gibi> jaypipes: if a network only maps to a single physnet then we are good
14:41:24 <efried> I think alex_xu merged gibi's specs, not sure what the bp tracking process does at this point. mriedem?
14:41:25 <jaypipes> right.
14:41:40 <gibi> I think mriedem approved the bp after the spec was merged
14:42:11 <gibi> and mriedem also noted in the bp that placement is frozen
14:42:35 <efried> but now we're saying we want to defer to Train? Or just let it ride and *probably* defer to Train at the end of the cycle?
14:43:08 <gibi> efried: I'd prefer the latter
14:43:17 <alex_xu> we defer to Train?
14:43:24 <efried> We've got a handful of placement bps in play for this cycle. I was under the impression we were assuming placement would un-freeze at some point, early enough for us to get work done.
14:43:56 <mriedem> it won't unfreeze until grenade is done
14:43:59 <mriedem> so that's my focus
14:44:02 <efried> We're not truly frozen at the moment, fwiw. The openstack/placement repo is open for patches; we've been merging stuff that's not purely related to extraction.
14:44:21 <mriedem> what kinds of stuff?
14:44:31 <efried> I agree we shouldn't merge features until the upgrade stuff is sorted.
14:44:45 <gibi> efried: still, I'd not start stealing review time from the extraction for any-traits and similar features
14:44:51 <bauzas> can we call Anna for unfreezing ?
14:44:57 * bauzas makes sad jokes
14:45:41 <efried> mriedem: Nothing big, mainly refactoring. E.g. for reducing complexity.
14:46:34 <efried> actually, looking at it, nothing significant has merged, but there are open patches along those lines.
14:46:56 <efried> anyway, my point is, the code could get proposed and reviewed while upgrade stuff is being worked.
14:47:12 <efried> We don't have to keep hands off the repo entirely.
14:48:01 <efried> okay, any other specs or reviews to bring up specifically?
14:48:42 <efried> #topic Extraction
14:48:42 <efried> cdent is on PTO. Anyone have any updates beyond what mriedem talked about earlier?
14:49:05 <bauzas> nothing but the fact I just uploaded a new revision for the libvirt reshape
14:49:19 <bauzas> and then I'll try to look whether it works
14:49:27 <bauzas> with my machine
14:49:49 <jaypipes> bauzas: link?
14:50:04 <efried> #link vgpu reshape https://review.openstack.org/#/c/599208/
14:50:18 <jaypipes> danke
14:50:26 <bauzas> jaypipes: https://review.openstack.org/#/c/599208/5
14:50:31 <bauzas> shit too late
14:50:41 * efried holsters six-shooter
14:50:44 <efried> #topic bugs
14:50:44 <efried> #link Placement bugs https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:50:47 <bauzas> we should also discuss reshapes
14:51:05 <efried> okay
14:51:06 <bauzas> e.g. should we have a specific module for reshapes?
14:51:09 <efried> #topic reshapes
14:51:17 <efried> bauzas: module to do what?
14:51:30 <bauzas> efried: for example, say I need a reshape for vGPU
14:51:44 <bauzas> efried: then, once we agree on NUMA, we could have yet another reshape
14:51:58 <bauzas> then, say PCPU will need a new reshape
14:51:59 <bauzas> etc.
14:51:59 <efried> You think there's some portion of the reshaping algorithm in update_provider_tree that could be common for all reshapes in all virt drivers?
14:52:23 <bauzas> so, I was thinking of having a specific module that upgraders would use (like FFU folks)
14:52:26 <efried> or are you saying we should ask virt drivers to keep those things somewhere separate for ease of review/maintenance?
14:52:47 <bauzas> efried: maybe having a pattern, just that
14:53:17 <bauzas> so we could see all the reshapes
14:53:31 <bauzas> then, knowing for example when they were created, and for which cycle
14:53:54 <bauzas> I mean, I dunno
14:53:57 <efried> I think the idea has merit, if only for the sake of not having to keep adding and removing virt driver code every cycle. Keep 'em all together, kind of like we do for db upgrades?
14:54:04 <bauzas> yup
14:54:22 <bauzas> if so, I'll provide a new revision for https://review.openstack.org/#/c/599208/
14:54:33 <bauzas> and people will discuss in it ^
14:54:38 <efried> bauzas: Maybe you can write up something that demonstrates what you're talking about?
14:54:49 <bauzas> yeah, the above ^
14:55:02 <efried> okay.
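A sketch of what bauzas's one-place-for-reshapes idea might look like, loosely modeled on how db migrations are kept together. All names here are invented; nova has no such module today:

```python
# Hypothetical registry; every name below is made up. Each entry is a
# callable taking the ProviderTree and allocations that
# update_provider_tree() receives, rearranging inventory in place.

def reshape_vgpu_stein(provider_tree, allocations):
    """Move VGPU inventory from the root provider to child providers."""
    # ... driver-specific reshaping logic ...

def reshape_pcpu_train(provider_tree, allocations):
    """Split dedicated CPU inventory out of VCPU into PCPU."""
    # ... driver-specific reshaping logic ...

# Ordered by cycle, so an FFU-style upgrade can replay every reshape
# it skipped, and reviewers can see when each reshape was introduced.
RESHAPES = (
    ('stein', reshape_vgpu_stein),
    ('train', reshape_pcpu_train),
)

def run_pending_reshapes(provider_tree, allocations, completed):
    """Apply every reshape not already recorded as done."""
    for cycle, reshape in RESHAPES:
        if cycle not in completed:
            reshape(provider_tree, allocations)
```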
14:55:19 <efried> #topic opens
14:55:38 <efried> We're going to run out of time, but I put this one on there
14:55:39 <efried> How to handle min_unit (and others) in a forward-looking (i.e. generic NRP) way?
14:55:39 <efried> #link IRC discussion with belmoreira http://eavesdrop.openstack.org/irclogs/%23openstack-placement/%23openstack-placement.2018-09-20.log.html#t2018-09-20T14:11:59
14:56:50 <efried> My take was that, if we're going to have any hope of allowing operators to configure things like allocation ratios, min/max units, etc. in the future (with nrp/sharing) then we're going to need a generic solution that doesn't get us into the config nightmare we're currently experiencing with alloc ratios.
14:57:58 <efried> jaypipes suggested
14:57:58 <efried> #link Spec (rocky, abandoned): Standardized provider descriptor file https://review.openstack.org/#/c/550244/
14:57:58 <efried> which almost gets us there, but falls a bit short in that it doesn't solve the chicken/egg problem of identifying a provider before you can tweak it.
14:58:00 <gibi> efried: I agree but right now I have no good solution
14:58:18 <efried> yeah
14:58:50 <efried> the efforts around device passthrough
14:58:50 <efried> #link Spec: Modelling passthrough devices for report to placement ("generic device management") https://review.openstack.org/#/c/591037/
14:58:50 <efried> #link Spec: Generic device discovery policy https://review.openstack.org/#/c/603805/
14:58:50 <efried> are leading towards defining file formats in a similar spirit
14:59:20 <efried> but I think that fundamental problem still exists - how do I identify a provider that's going to be automatically generated by nova (or neutron or cyborg or cinder or...)
14:59:36 <efried> ...in order to provide customized inventory settings for it?
14:59:40 <bauzas> mmmm
14:59:49 <efried> Since we're out of time, consider ^ food for thought.
14:59:50 <bauzas> maybe we should autogenerate this file if needed ?
14:59:55 <bauzas> like we do for options ?
15:00:11 <bauzas> a getter/setter way
15:00:13 <jaypipes> efried: I specifically left out the "identification of the provider before you need it" because the clients of such a descriptor file would undoubtedly have different ideas of how to map local identifiers to RP identifiers.
15:00:15 <efried> it would have to be updated frequently. basically kept in sync with placement.
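To make the chicken/egg concrete: the operator's overrides must exist before nova generates the provider, so the descriptor can't be keyed by UUID. One purely hypothetical way out is keying on provider name patterns and overlaying the overrides whenever a matching provider is reported; no such mechanism exists in nova or placement:

```python
# Purely hypothetical descriptor, keyed by name pattern rather than
# UUID, applied each time a matching provider is (re)reported.
import fnmatch

DESCRIPTOR = {
    # name pattern -> per-resource-class inventory field overrides
    'compute*_NUMA_*': {'VCPU': {'allocation_ratio': 4.0}},
    'compute*': {'MEMORY_MB': {'min_unit': 256, 'allocation_ratio': 1.5}},
}

def apply_overrides(provider_name, inventories):
    """Overlay descriptor settings onto auto-generated inventories."""
    for pattern, overrides in DESCRIPTOR.items():
        if not fnmatch.fnmatch(provider_name, pattern):
            continue
        for rc, fields in overrides.items():
            if rc in inventories:
                inventories[rc].update(fields)
        break  # first matching pattern wins; ordering is policy
    return inventories
```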
15:00:33 <efried> That's time, folks. Let's continue in -nova if desired.
15:00:35 <efried> #endmeeting