14:02:06 <cdent> #startmeeting nova scheduler
14:02:07 <openstack> Meeting started Mon Jun 11 14:02:06 2018 UTC and is due to finish in 60 minutes.  The chair is cdent. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:02:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:02:10 <openstack> The meeting name has been set to 'nova_scheduler'
14:02:12 <efried> ō/ again
14:02:15 <efried> whee!
14:02:17 <mriedem> o/
14:02:18 <takashin> o/
14:02:20 <cdent> mriedem: the underscore gets added automagic
14:02:20 <alex_xu> o/
14:02:27 <tetsuro> o/
14:02:30 <cdent> #chair efried mriedem edleafe
14:02:31 <openstack> Current chairs: cdent edleafe efried mriedem
14:02:55 <cdent> today I'd like to blame my keyboard, where none of the keys work any more unless I hit them like nails
14:03:01 <cdent> anyway
14:03:03 <cdent> welcome
14:03:20 <efried> a likely excuse
14:03:25 <cdent> no specific agenda items
14:03:39 <cdent> #topic reviews to discuss
14:03:53 <cdent> any reviews that people think need some sync discussion?
14:04:02 <mriedem> bottom of the consumer generation patch is just waiting for dansmith https://review.openstack.org/#/c/567678/
14:04:07 <mriedem> s/patch/series/
14:04:49 * bauzas forgot to wave
14:04:53 <efried> I'd like to draw attention to the performance issue in the reshape spec.
14:04:55 <efried> #link https://review.openstack.org/#/c/572583/
14:05:02 <efried> #link https://etherpad.openstack.org/p/reshape-provider-tree-performance
14:05:17 <efried> TL;DR we don't want to gather allocations every periodic, so we're looking for a way not to do that.
14:05:27 <efried> Because CERN will point their supercollider at us if we do.
14:05:55 <cdent> will that turn us into superheroes? cuz if so, let's leave all the bugs
14:05:56 <bauzas> it's fine, it's just light
14:06:24 <efried> My favorite right now is: we pass allocations=None to update_provider_tree(), and if the virt driver thinks it needs to do a migration, it raises MigrationNeeded, which we catch, gather allocations, and call again.
14:06:31 <cdent> efried: has anything changed since I expressed an opinion?
14:06:54 <efried> cdent: I added the `except MigrationNeeded` wrinkle to your favorite option.
14:06:57 <efried> cdent: otherwise, no.
14:07:25 <cdent> efried: I saw that, and I'm not sure I like it, because if there are expensive operations before migration needed is raised, we're doing them twice
14:07:28 * gibi joins late
14:07:47 <efried> True, but it saves us adding a weird special-case ComputeDriver method.
14:07:52 <mriedem> so the virt driver thinks it needs to do a migration for vgpus because the vgpu inventory is still on the root compute node provider?
14:07:55 <mriedem> in this example?
14:08:02 <efried> mriedem: Yes, that would be an example.
14:08:13 <bauzas> I was proposing the virt driver to know when to ask for a migration
14:08:24 <bauzas> because of, for example, what mriedem said
14:08:35 <efried> cdent: I think the virt driver can do the check up front in whatever way it would have done in the new method anyway.
14:08:42 <cdent> true
14:08:42 <efried> cdent: ...if expense is an issue.
14:09:08 <efried> bauzas: Yes, it's a given that the virt driver is the only thing that can know if a migration is necessary.  The question is, how do we ask it?
14:09:49 <bauzas> I'm not understanding the relation between the "reshape migration" and the fact we pass all the allocations
14:10:09 <bauzas> we already pass all the allocations to the virt driver when calling update_blah() right?
14:10:27 <efried> bauzas: No.  The allocations param is newly proposed in the spec.
14:10:34 <bauzas> oh right
14:10:49 <efried> bauzas: The problem is that gathering the allocs is expensive, so we don't want to do it on every periodic.
14:11:02 <efried> bauzas: Because that will make the Swiss mad.
14:11:05 <bauzas> gotcha
14:11:20 <bauzas> no, s/swiss/swiss_and_french/
14:11:20 <gibi> efried: if the virt the only thing that knows that it needs the allocations then can the virt ask for that allocations from placement?
14:11:22 <efried> We've never really seen the Swiss break out a can of whup-ass before.
14:11:46 <bauzas> CERN is both in France and Switzerland :p
14:12:02 <efried> gibi: That's also an idea, BUT we've so far wanted to avoid the virt driver talking directly to placement.
14:12:12 <gibi> efried: ahh true
14:12:13 <efried> bauzas: Well, we're really not worried about the FRENCH getting mad.
14:12:16 <efried> :P
14:12:26 <bauzas> French are *always* mad, right
14:12:33 <bauzas> anyway, trying to think about the problem
14:12:41 <efried> So, we don't necessarily need to settle it right here, but if you have strong feelings one way or another, hit the etherpad.
14:12:58 <bauzas> when I started to think about the migration, I was thinking about the virt driver discovering the problem
14:13:14 <gibi> efried: passing down a lazy allocations list to the virt, and if the virt touches it, it indirectly calls placement to load the allocations
14:13:17 <bauzas> like, 'heh, look, this RC is atm on the root RP'
14:13:35 <efried> gibi: That's a neat idea.  A bit heavyweight, perhaps?
14:13:47 <efried> bauzas: Exactly.
14:13:52 <gibi> efried: I know I know. I'm in brainstorming mode I guess :)
14:13:58 <mriedem> it would be cleaner,
14:14:07 <mriedem> and lazy loading would match what we do elsewhere for optional fields on objects
14:14:18 <mriedem> but, it could also be an optimization / cleanup later once we get something that actually works
14:14:20 <bauzas> if so, yes, an exception provided by the virt could be nice
14:14:30 <efried> atm we don't *have* an object for allocations, so we would have to make that, but it's a doable thing.
14:14:44 <mriedem> anything is doable, but that's why i'm saying it could be later
14:14:50 <bauzas> like try: update_blah(); except exception.virtMigration: get_allocations(), update_blah
14:14:50 <cdent> a lazy list (in the same process) seems kind of an artificial way to overcome an arbitrary boundary
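[editor's note: gibi's lazy-load brainstorm, sketched with a hypothetical LazyAllocations wrapper -- the placement call only fires if the virt driver actually touches the allocations. fetch_allocations stands in for the real report-client call; none of these names are an existing nova API:]

    class LazyAllocations(object):
        def __init__(self, fetch):
            self._fetch = fetch
            self._data = None

        def _load(self):
            if self._data is None:
                # First touch triggers the (expensive) placement round trip.
                self._data = self._fetch()
            return self._data

        def __getitem__(self, key):
            return self._load()[key]

        def __iter__(self):
            return iter(self._load())

        def __len__(self):
            return len(self._load())

    def fetch_allocations():
        print('hitting placement...')  # only printed on first touch
        return {'instance-uuid': {'VGPU': 1}}

    allocs = LazyAllocations(fetch_allocations)  # no placement call yet
    print(len(allocs))  # this touch loads; a driver that never looks pays nothing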
14:14:55 <mriedem> we're starting to get crunched for time in this release
14:15:05 * cdent nods
14:15:16 <cdent> especially since we keep adding more stuff
14:15:23 <mriedem> i remember talking about the flag option on friday but couldn't remember where we said we could store that flag in the db
14:15:26 * cdent STATES THE OBVIOUS
14:15:37 <efried> mriedem: Right, because dansmith pointed out that it wasn't per-instance.
14:15:52 <mriedem> etcd!
14:15:53 * dansmith nods
14:15:57 <mriedem> basic service!
14:16:15 <efried> And the db flag also seems like a heavyweight solution to me.
14:16:26 <efried> This is really simple and local:
14:16:26 <efried> try:
14:16:26 <efried>     update_provider_tree(provider_tree, None)
14:16:26 <efried> except MigrationNeeded:
14:16:26 <efried>     update_provider_tree(provider_tree, expensive_allocation_retrieval())
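[editor's note: a minimal, runnable sketch of the catch-and-retry flow efried pastes above. MigrationNeeded and expensive_allocation_retrieval come from the discussion; SketchDriver and the underscore-prefixed helpers are hypothetical stand-ins, not the spec's final API:]

    class MigrationNeeded(Exception):
        """Raised by the virt driver when a reshape needs allocations."""

    class SketchDriver(object):
        def update_provider_tree(self, provider_tree, allocations=None):
            if self._needs_reshape(provider_tree):
                if allocations is None:
                    # The cheap structural check says a reshape is needed,
                    # but we weren't given allocations; ask the caller to retry.
                    raise MigrationNeeded()
                self._reshape(provider_tree, allocations)
            # ...the normal inventory/traits update would continue here...

        def _needs_reshape(self, provider_tree):
            # e.g. "is VGPU inventory still on the root provider?"
            return provider_tree.get('vgpu_on_root', False)

        def _reshape(self, provider_tree, allocations):
            provider_tree['vgpu_on_root'] = False  # conceptually: move to child

    def expensive_allocation_retrieval():
        # Stands in for the costly placement round trips we want to
        # avoid doing on every periodic.
        return {'instance-uuid': {'VGPU': 1}}

    def update_to_placement(driver, provider_tree):
        try:
            driver.update_provider_tree(provider_tree, allocations=None)
        except MigrationNeeded:
            # Only pay for the allocation fetch when the driver asked for it.
            allocs = expensive_allocation_retrieval()
            driver.update_provider_tree(provider_tree, allocations=allocs)

    update_to_placement(SketchDriver(), {'vgpu_on_root': True})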
14:16:32 <mriedem> so for the virt option,
14:16:47 <mriedem> do we just have that as a check in the compute driver temporarily and then we can drop later right?
14:17:04 <mriedem> _check_if_vgpu_migrated() # TODO: remove in T?
14:17:05 <efried> mriedem: We have the check as part of update_provider_tree.  And it can be permanent.
14:17:12 <dansmith> mriedem: only if we provide an offline way to do it remember :)
14:17:23 <bauzas> I need to leave you in 5/10 mins btw.
14:17:46 <efried> mriedem: Because it's not just vgpu (or the short list of specific things we know about now).  These kinds of migrations could happen at any point in the future, for any number of possible scenarios.  This mechanism covers those.
14:18:01 <mriedem> i understand the mechanism does
14:18:13 <mriedem> i'm talking about the specific per-release check in each virt driver
14:18:25 <mriedem> dansmith: we are providing an offline way to do this
14:18:34 <mriedem> which just hooks into this same code i thought - or uses part of it
14:18:36 <efried> oh, does it even need to be a separate method outside of update_provider_tree?
14:18:57 <dansmith> mriedem: yeah, I know
14:19:15 <cdent> efried: I think it ought to be possible to have a cli that calls stuff in update_provider_tree, yeah?
14:19:24 <efried> Anyway, yeah, the specific internal stuff - wherever it lives - could have a "remove in T" TODO on it.
14:19:26 <mriedem> efried: i'm talking about whatever logic exists in the virt driver to say that a migration is needed, gimme allocations
14:19:38 <efried> cdent: That's what we've been figuring.  Swhat's written in the spec.
14:19:45 <efried> mriedem: yuh.
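[editor's note: the offline CLI cdent asks about could be a thin wrapper around the same catch-and-retry helper sketched above; the wiring below (load_virt_driver, build_provider_tree, the command itself) is purely hypothetical and reuses SketchDriver and update_to_placement from the earlier sketch -- it is not what the spec specifies:]

    import sys

    def load_virt_driver():
        return SketchDriver()           # hypothetical: construct the configured driver

    def build_provider_tree(driver):
        return {'vgpu_on_root': True}   # hypothetical: current view from placement

    def main():
        driver = load_virt_driver()
        tree = build_provider_tree(driver)
        update_to_placement(driver, tree)  # no-op if no migration was needed
        return 0

    if __name__ == '__main__':
        sys.exit(main())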
14:20:09 <mriedem> ok so this seems fine - agree to go this direction in the spec, get coding and adjust course if there is an unknown dragon?
14:20:17 <efried> Sounds good to me.
14:20:18 * cdent nods
14:20:27 <mriedem> dansmith: ^?
14:20:53 <efried> I've been waiting for dansmith to have a crack at the spec before I post another rev.  But I'll put this tweak in the queue.
14:21:05 <cdent> The difference I'm seeing is that efried seems to be saying "this can happen magically" and mriedem is saying "each cycle the virt driver may need to know that the way we structure things may be different"?
14:21:32 <dansmith> mriedem: I'm not sure what you're agreeing to that is different than what we discussed, other than just implementation,
14:21:42 <dansmith> but I'm still ramping up my morning and will have a look at the spec in a bit
14:21:43 <mriedem> i think of this as a one-time thing specific to a release, that once you do that one thing we don't need to do that one thing again, e.g. move vgpu inventory from root to child
14:22:02 <dansmith> mriedem: that's one way to look at it,
14:22:02 <mriedem> and this is required to go from rocky to stein
14:22:09 <dansmith> I think efried wanted it to be more of a permanent sort of thing
14:22:13 <efried> mriedem: Does FFU run these offline script thingies on every release in between?
14:22:27 <cdent> we need it to be more permanent for situations where hardware is added
14:22:39 <mriedem> the mechanism is permanent
14:22:39 <dansmith> efried: yes, but so far there has never been one of those that had to run on all the compute nodes
14:22:41 <mriedem> as a framework
14:22:42 <efried> cdent: No, for that it's not a migration
14:22:47 <dansmith> cdent: hardware being added is different than migration
14:22:48 <efried> cdent: Because allocations don't have to move.
14:22:49 <dansmith> efried: right
14:23:04 <cdent> I'm not talking the migration, I'm talking about the mechanism internal to the update_provider_tree "knowing"
14:23:11 <cdent> it needs to always try to "know"
14:23:22 <mriedem> cdent: sure i agree that is permanent
14:23:32 <mriedem> the specific vgpu check is not
14:23:57 <cdent> so if we know there, why does there need to be a special thing for migrations? that's the part I'm missing
14:24:01 <bauzas> well, a number of times people said in the spec they want a generic solution
14:24:14 <bauzas> even if my own point was just for VGPU migrations
14:24:28 <bauzas> so, I'm not sure people'd like to have a specific check
14:24:36 <efried> Basically, if FFU runs its scripty thing against every release's codebase, then yes, we can have the migrationy virt driver stuff removed on every release.  But if we can skip releases, then whatever code needs to know how the Q shape changed in R needs to stick around so it can figure that out if we're going from Q to T or whatever.
14:24:39 <mriedem> cdent: because we don't want to do expensive allocation stuff every periodic for no reason
14:25:04 <mriedem> efried: you can't skip releases
14:25:14 <mriedem> FFU is rolling through each release's codebase
14:25:18 <cdent> which the virtdriver decides if it wants to do at the start of its run, right?
14:25:23 <mriedem> if T needs to know how things worked in newton, we'd be f'ced
14:25:42 <dansmith> mriedem: we probably need to talk to some FFU people about the potential for having to run this on all computes though,
14:25:51 <mriedem> cdent: yes
14:25:54 <dansmith> because that is definitely a difference from what they have to do today
14:26:22 * bauzas disappears
14:26:30 <efried> in order to know whether you have to run it on a compute, you have to .... ask that compute.
14:26:41 <mriedem> dansmith: todo added to https://etherpad.openstack.org/p/reshape-provider-tree-performance, L40
14:26:41 <efried> There's no other reliable way to know.
14:26:56 <mriedem> dansmith: you're likely closer to any FFU knowing people
14:26:56 <dansmith> efried: I know
14:27:05 <dansmith> mriedem: ack
14:27:34 <efried> ...whereupon you might as well just run the script on the compute.  It'll be a no-op if migration wasn't necessary.
14:28:08 <mriedem> i'll plead ignorance on what's different here
14:28:32 <mriedem> we didn't have to run the ironic flavor migration per compute b/c it was just a thing in the DB right
14:28:32 <mriedem> ?
14:28:34 <dansmith> efried: you'll have to,
14:28:47 <dansmith> I'm just saying they currently don't have to run anything on the computes during the process
14:28:48 <mriedem> we just re-used logic from the ironic virt driver
14:28:56 <dansmith> mriedem: yeah
14:28:58 <efried> mm
14:29:04 <mriedem> i guess i thought this was going to be similar
14:29:07 <dansmith> mriedem: but this will depend on the hardware on the compute
14:29:17 <dansmith> mriedem: we won't have enough info in the db about every node to do that
14:29:33 <mriedem> i thought it was just a matter of checking where the vgpu inventory is located in the provider tree?
14:29:39 <dansmith> nope
14:30:04 <mriedem> ok then i'm missing something
14:30:07 <dansmith> it requires looking at libvirt's topology of the system, seeing how sockets and memory are connected, which gpu is on which numa node, same for sriov, etc
14:30:10 <dansmith> efried: right?
14:30:21 <efried> yes.
14:30:42 <mriedem> ok so seeing vgpu on root provider says "MigrationNeeded!" but we need ^ for actually determining WHERE to move that inventory
14:30:43 <mriedem> yes?
14:30:50 <dansmith> right
14:30:52 <mriedem> ok
14:31:23 <efried> mriedem: basically, the virt driver is going to have something it does to check whether a migration is needed, and it may or may not be as simple as "do I have VGPUs in the compute RP", but whatever it is, in order to do the check without asking the virt driver, you would have to duplicate that logic somewhere.
14:32:06 <efried> and then, yes, to actually *do* anything about it, you definitely need to ask the virt driver - because only the virt driver knows which VGPU is allocated to which instance.
14:32:17 <dansmith> and,
14:32:21 <mriedem> the topology would have to be in the db, and we would have had to populate that in an older release
14:32:35 <mriedem> which we didn't, so we can't rely on it
14:32:39 <dansmith> only the virt driver knows which node the pci card is connected to
14:32:46 <dansmith> efried: er, sorry, read yours as VCPU
14:33:10 <mriedem> i don't know what alternative the FFU people have to this
14:33:13 <efried> either way - VCPUs with NUMA-ification, whatever.
14:33:17 <dansmith> yep
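[editor's note: a toy illustration of the split described above -- spotting VGPU inventory on the root provider is the cheap "MigrationNeeded!" trigger, but deciding WHERE it goes, and moving each instance's allocation along with it, needs driver-only topology knowledge (e.g. libvirt's view of which pGPU sits on which NUMA node). Plain dicts stand in for the real ProviderTree and allocation objects:]

    def vgpu_on_root(tree):
        # The cheap trigger: VGPU inventory still lives on the root provider.
        return 'VGPU' in tree['root']['inventory']

    def reshape_vgpus(tree, allocations, instance_to_pgpu):
        # instance_to_pgpu is the driver-only knowledge: which physical GPU
        # (hence which child provider) each instance's VGPU comes from.
        total = tree['root']['inventory'].pop('VGPU')
        for pgpu in set(instance_to_pgpu.values()):
            # A real driver would split capacity per pGPU; the toy just copies it.
            tree[pgpu] = {'inventory': {'VGPU': total}}
        for instance, pgpu in instance_to_pgpu.items():
            amount = allocations[instance].pop(('root', 'VGPU'))
            allocations[instance][(pgpu, 'VGPU')] = amount

    tree = {'root': {'inventory': {'VGPU': 4}}}
    allocations = {'inst-1': {('root', 'VGPU'): 1}}
    if vgpu_on_root(tree):
        reshape_vgpus(tree, allocations, {'inst-1': 'pgpu_0'})
    print(tree, allocations)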
14:33:32 <mriedem> if you don't do an online migration each release, to let the computes heal this themselves, then you're doing FFU and this is how we handle this case
14:33:44 <mriedem> there will be gnashing of teeth
14:33:51 <mriedem> but FFS FFU PPL!
14:33:54 <dansmith> the thing I'm more worried about,
14:34:06 <dansmith> is that it requires laying down code on the computes, even if you don't start the service,
14:34:13 <dansmith> which is what they want to avoid in general
14:34:17 <dansmith> because it makes rollback harder as well
14:34:26 <dansmith> containers can help if you're that sophisticated, but..
14:34:27 <mriedem> run it in a venv?
14:34:39 <cdent> this is why efried was suggesting that we just make it so these kinds of changes can happen at any release
14:34:41 <dansmith> right, container
14:34:47 <dansmith> cdent: yep, I know
14:34:51 <cdent> a virt driver knows, at any release, that it sees new stuff
14:34:54 <cdent> and just takes care of it
14:35:01 <cdent> no ffu hoopla required
14:35:01 <dansmith> cdent: we've been doing online transformation of simpler things for a long time,
14:35:19 <dansmith> and the lessons I think we've learned are that keeping that code around and working for a long time is not as trivial as it sounds
14:35:44 <cdent> why are these (nested rp topology) types of changes release-bound?
14:35:47 <efried> wait, rollback??
14:35:57 <efried> cdent: They're not, I've kept saying.
14:36:05 <dansmith> efried: manual rollback, nothing fancy
14:36:09 <cdent> yeah, I'm trying to agree with you
14:36:30 <efried> cdent: Dammit, keep agreeing with me, wouldja?
14:36:48 <efried> dansmith: I shudder to think of a *manual* rollback of a migration like this.
14:37:01 <mriedem> can someone define release bound here?
14:37:03 <dansmith> efried: um, okay
14:37:24 <dansmith> efried: snapshot the db, if anything goes wrong, you reinstall old packages, restore the db and then call your vendor.. lots of people do that
14:37:34 <mriedem> we aren't blocking you from getting to stein with this i don't think, the computes will do the migration on first startup once you hit your target release
14:37:35 <efried> cdent: But for FFU purposes, you'd basically have to be "accumulating" the changes, because the FFU process is going to run one step per release.
14:38:03 <efried> dansmith: Okay, I guess that works too :)  Long as you didn't do anything after that that you want to preserve.
14:38:09 <mriedem> if we dropped this rocky specific check in stein or T or U, and someone goes from queens to V, then they'd be screwed, but we don't support skip level releases
14:38:21 <dansmith> efried: obviously that's a restriction yes
14:38:45 <efried> dansmith: That makes sense, sorry, I was trying to imagine manually issuing the appropriate API calls to unwind a migration like this.
14:39:15 <cdent> efried: that accumulating is what I don't understand. there's state A (then) and state B (now) from the virt driver's standpoint. if there are states between A and B why do we care when we are planning to send "the state we want" to placement?
14:39:53 <dansmith> mriedem: I think they should roll forward with the online bit here, and if we can't get to the manual force thing in rocky then we might need a blocker migration of some sort in the future where we make sure this happens in the next release before we touch anything, but the longer we go between those two, the harder it will be to make sure we did the right thing and didn't drop it, IMHO
14:39:56 <efried> cdent: If the states are cumulative in some sense, don't we hypothetically need to understand how to go from A->B, B->C and also A->C?
14:40:26 <cdent> that's what I'm getting at: would they ever be cumulative and can we just say "no"?
14:40:53 <mriedem> dansmith: yeah, same for things like dropping old bdm attachment record / flows modeling - it's online now b/c it runs on the compute startup (because it has to)
14:41:13 <mriedem> and i think we've said we'd have to do some kind of blocker migration before we could ever drop that old attach flow code
14:41:41 <dansmith> mriedem: that is technically doable not on the computes though right?
14:41:52 <dansmith> or does it require collecting attachment info only discoverable on the compute?
14:42:23 <mriedem> maybe, we need the host connector from the compute, but that is stored in the bdm record for anything created after i think liberty
14:42:39 <dansmith> anyway
14:42:40 <mriedem> and it requires a rest call to cinder
14:42:41 <mriedem> yeah
14:42:42 <mriedem> so,
14:42:49 <mriedem> we're in agreement to move forward in the spec?
14:43:25 <mriedem> and we agree that data migrations are a giant pain in the ass
14:43:25 <dansmith> I'm going to look at it when we're done here
14:43:42 <efried> cool cool. Let's move on.  Thanks for the talk.
14:45:04 <efried> Other reviews to discuss/highlight?
14:45:31 <cdent> #link placement direct https://review.openstack.org/#/c/572576/
14:45:47 <cdent> I've got a question in there (the FIXME noted in the commit) about where to put the boundary for things
14:45:56 <cdent> it doesn't necessarily need to be resolved this cycle
14:46:06 <cdent> the latest version addressed efried's and mriedem's comments
14:46:22 <efried> cdent: I haven't looked since PS1 - are we at an Adapter interface at this point?
14:46:33 <cdent> it returns a NoAuthReportClient now
14:46:36 <cdent> so yes
14:46:41 <efried> schweet
14:47:13 <cdent> doing that is part of what introduced the boundary confusion
14:47:40 <efried> Other things:
14:47:41 <efried> consumer gens should be ready to start getting final reviews on.
14:47:41 <efried> nrp-in-alloc-cands is moving along nicely.
14:47:41 <efried> placement extraction has several patches ready for review
14:47:41 <efried> Lemme dig up some links for those...
14:47:43 <cdent> (but it is the right thing to do and worked nicely. this was an especially productive review loop)
14:48:48 <efried> #link consumer gens https://review.openstack.org/#/c/567678/
14:48:48 <efried> #link nrp-in-alloc-cands https://review.openstack.org/#/c/559480/
14:48:48 <efried> #link placement extraction https://review.openstack.org/#/c/362766/
14:48:48 <mriedem> at this point i'd like to remind people again to put stuff in runways
14:48:58 <mriedem> because only 1 of those 3 ^ are in runways
14:48:58 <cdent> I intend to fix https://bugs.launchpad.net/nova/+bug/1773225 today, as the warning is getting to me
14:48:59 <openstack> Launchpad bug 1773225 in OpenStack Compute (nova) "placement needs to stop using accept.best_match from webob it is deprecated" [Low,Triaged]
14:49:06 <mriedem> and the one that is, is because i told cdent that's how to get review on it
14:49:24 <mriedem> i feel like the placement stuff is getting a lot of review attention even though it's mostly not in the runways etherpad
14:49:52 <efried> mriedem: Yeah, one philosophy on runways was that we should put stuff there that wouldn't get as much attention otherwise.
14:50:15 <efried> mriedem: I.e. keep urgent top-of-mind placement stuff out of there to keep it clear for lower-pri items.
14:50:19 <mriedem> i think by doing so, the stuff that's in runways is still not getting attention
14:50:24 <mriedem> which was the point of runways, getting attention
14:50:35 <mriedem> things can still be in the runways queue
14:50:37 <efried> I don't disagree, just echoing info.
14:50:42 <mriedem> and getting reviewed
14:50:56 <mriedem> i think the consumer generation stuff would have been in a slot by now if it had been put in the queue weeks ago when it was ready for review
14:51:03 <efried> though placement-extract wasn't getting as much attention as anything else anyway, so it definitely warrants a runway.
14:51:25 <efried> That bottom change is 103 patch sets old.
14:52:11 <mriedem> i'm speaking out of general anxiety over being in the 3rd milestone and feeling unaware of what's actually going on wrt priorities for the release
14:52:17 <mriedem> which is maybe just a personal problem :)
14:53:08 <efried> True, we kinda punted on the usual priorities list with some hand-waving about runways, but the latter doesn't replace the former.
14:53:44 <efried> It's not too late for us to nail (or at least write) down a list of priorities for the release.
14:53:57 <efried> That's probably a topic for the nova meeting tho.
14:54:15 <mriedem> i wouldn't look at the specs repo for priorities, migrator wouldn't have been in there anyway
14:54:22 <cdent> nail is maybe a strong word: given what we added as a priority for placement just last week
14:54:30 <mriedem> cdent: right
14:54:45 <cdent> which is probably just the way it goes, sadly
14:55:10 <mriedem> anyway, i would just like a state of the state or something, but maybe that's just me
14:55:34 <mriedem> i always had an etherpad of all the approved blueprints and their status by the 3rd milestone to help gauge where things were at
14:55:40 <cdent> I'll (finally) be able to get back to the placement update this friday (done with travelling) which will probably provide at least a bit of state
14:56:12 <mriedem> cool.
14:56:16 <mriedem> i guess we're done here?
14:56:20 <cdent> before we all start crying, any last comments before ending the meeting?
14:56:31 * efried waits to see if cdent can spell #endmeeting
14:56:32 <cdent> other than good luck, and such like?
14:56:43 <mriedem> too soon
14:56:51 <cdent> see this is why it is so embarrassing, I can spell just fine. I can't type.
14:56:57 <cdent> #endfried
14:57:00 <cdent> nope, that's not it
14:57:01 <efried> hahahha
14:57:09 <cdent> #endem
14:57:14 <cdent> almost
14:57:17 <cdent> going?
14:57:22 <cdent> #endmeeting