14:02:06 #startmeeting nova scheduler
14:02:07 Meeting started Mon Jun 11 14:02:06 2018 UTC and is due to finish in 60 minutes. The chair is cdent. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:02:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:02:10 The meeting name has been set to 'nova_scheduler'
14:02:12 ō/ again
14:02:15 whee!
14:02:17 o/
14:02:18 o/
14:02:20 mriedem: the underscore gets added automagic
14:02:20 o/
14:02:27 o/
14:02:30 #chair efried mriedem edleafe
14:02:31 Current chairs: cdent edleafe efried mriedem
14:02:55 today I'd like to blame my keyboard, where none of the keys work any more unless I hit them like nails
14:03:01 anyway
14:03:03 welcome
14:03:20 a likely excuse
14:03:25 no specific agenda items
14:03:39 #topic reviews to discuss
14:03:53 any reviews that people think need some sync discussion?
14:04:02 bottom of the consumer generation patch is just waiting for dansmith https://review.openstack.org/#/c/567678/
14:04:07 s/patch/series/
14:04:49 * bauzas forgot to wave
14:04:53 I'd like to draw attention to the performance issue in the reshape spec.
14:04:55 #link https://review.openstack.org/#/c/572583/
14:05:02 #link https://etherpad.openstack.org/p/reshape-provider-tree-performance
14:05:17 TL;DR we don't want to gather allocations every periodic, so we're looking for a way not to do that.
14:05:27 Because CERN will point their supercollider at us if we do.
14:05:55 will that turn us into superheroes? cuz if so, let's leave all the bugs
14:05:56 it's fine, it's just light
14:06:24 My favorite right now is: we pass allocations=None to update_provider_tree(), and if the virt driver thinks it needs to do a migration, it raises MigrationNeeded, which we catch, gather allocations, and call again.
14:06:31 efried: has anything changed since I expressed an opinion?
14:06:54 cdent: I added the `except MigrationNeeded` wrinkle to your favorite option.
14:06:57 cdent: otherwise, no.
14:07:25 efried: I saw that, and I'm not sure I like it, because if there are expensive operations before MigrationNeeded is raised, we're doing them twice
14:07:28 * gibi joins late
14:07:47 True, but it saves us adding a weird special-case ComputeDriver method.
14:07:52 so the virt driver thinks it needs to do a migration for vgpus because the vgpu inventory is still on the root compute node provider?
14:07:55 in this example?
14:08:02 mriedem: Yes, that would be an example.
14:08:13 I was proposing that the virt driver know when to ask for a migration
14:08:24 because of, for example, what mriedem said
14:08:35 cdent: I think the virt driver can do the check up front in whatever way it would have done in the new method anyway.
14:08:42 true
14:08:42 cdent: ...if expense is an issue.
14:09:08 bauzas: Yes, it's a given that the virt driver is the only thing that can know if a migration is necessary. The question is, how do we ask it?
14:09:49 I'm not understanding the relation between the "reshape migration" and the fact that we pass all the allocations
14:10:09 we already pass all the allocations to the virt driver when calling update_blah() right?
14:10:27 bauzas: No. The allocations param is newly proposed in the spec.
14:10:34 oh right
14:10:49 bauzas: The problem is that gathering the allocs is expensive, so we don't want to do it on every periodic.
14:11:02 bauzas: Because that will make the Swiss mad.
14:11:05 gotcha
14:11:20 no, s/swiss/swiss_and_french/
14:11:20 efried: if the virt is the only thing that knows that it needs the allocations, then can the virt ask for those allocations from placement?
14:11:22 We've never really seen the Swiss break out a can of whup-ass before.
14:11:46 CERN is both in France and Switzerland :p
14:12:02 gibi: That's also an idea, BUT we've so far wanted to avoid the virt driver talking directly to placement.
14:12:12 efried: ahh true
14:12:13 bauzas: Well, we're really not worried about the FRENCH getting mad.
14:12:16 :P
14:12:26 French are *always* mad, right
14:12:33 anyway, trying to think about the problem
14:12:41 So, we don't necessarily need to settle it right here, but if you have strong feelings one way or another, hit the etherpad.
14:12:58 when I started to think about the migration, I was thinking about the virt driver discovering the problem
14:13:14 efried: passing down a lazy allocations list to the virt, and if the virt touches it, it indirectly calls placement to load the allocations
14:13:17 like, 'heh, look, this RC is atm on the root RP'
14:13:35 gibi: That's a neat idea. A bit heavyweight, perhaps?
14:13:47 bauzas: Exactly.
14:13:52 efried: I know I know. I'm in brainstorming mode I guess :)
14:13:58 it would be cleaner,
14:14:07 and lazy loading would match what we do elsewhere for optional fields on objects
14:14:18 but, it could also be an optimization / cleanup later once we get something that actually works
14:14:20 if so, yes, an exception provided by the virt could be nice
14:14:30 atm we don't *have* an object for allocations, so we would have to make that, but it's a doable thing.
14:14:44 anything is doable, but that's why i'm saying it could be later
14:14:50 like try: update_blah(); except exception.virtMigration: get_allocations(), update_blah
14:14:50 a lazy list (in the same process) seems kind of an artificial way to overcome an arbitrary boundary
14:14:55 we're starting to get crunched for time in this release
14:15:05 * cdent nods
14:15:16 especially since we keep adding more stuff
14:15:23 i remember talking about the flag option on friday but couldn't remember where we said we could store that flag in the db
14:15:26 * cdent STATES THE OBVIOUS
14:15:37 mriedem: Right, because dansmith pointed out that it wasn't per-instance.
14:15:52 etcd!
14:15:53 * dansmith nods
14:15:57 basic service!
14:16:15 And the db flag also seems like a heavyweight solution to me.
14:16:26 This is really simple and local:
14:16:26 try:
14:16:26     update_provider_tree(provider_tree, None)
14:16:26 except MigrationNeeded:
14:16:26     update_provider_tree(provider_tree, expensive_allocation_retrieval())
14:16:32 so for the virt option,
14:16:47 do we just have that as a check in the compute driver temporarily, and then we can drop it later, right?
14:17:04 _check_if_vgpu_migrated() # TODO: remove in T?
14:17:05 mriedem: We have the check as part of update_provider_tree. And it can be permanent.
14:17:12 mriedem: only if we provide an offline way to do it remember :)
14:17:23 I need to leave you in 5/10 mins btw.
14:17:46 mriedem: Because it's not just vgpu (or the short list of specific things we know about now). These kinds of migrations could happen at any point in the future, for any number of possible scenarios. This mechanism covers those.
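To make the flow concrete, here is a minimal, runnable sketch of the pattern pasted at 14:16:26, assuming a hypothetical MigrationNeeded exception and a hypothetical report-client method for fetching a tree's allocations — neither name is taken from actual nova code; the real design lives in the spec at https://review.openstack.org/#/c/572583/:

```python
# Sketch of the periodic-update flow under discussion. MigrationNeeded and
# get_allocations_for_provider_tree are illustrative names, not nova's API.

class MigrationNeeded(Exception):
    """Raised by a virt driver when inventory/allocations must be reshaped."""


def update_from_provider_tree(driver, report_client, context, provider_tree):
    try:
        # Common case: every periodic skips the expensive allocation query.
        driver.update_provider_tree(provider_tree, allocations=None)
    except MigrationNeeded:
        # Rare case (e.g. first startup on a new release): gather the
        # allocations for the whole tree and call again, so the driver can
        # move inventory and allocations together.
        allocations = report_client.get_allocations_for_provider_tree(
            context, provider_tree)
        driver.update_provider_tree(provider_tree, allocations=allocations)
```

The cost cdent objects to (repeating any pre-raise work on the second call) stays bounded as long as the driver does its cheap "is a migration needed?" check before anything expensive, per efried's 14:08:35 point.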
14:18:01 i understand the mechanism does
14:18:13 i'm talking about the specific per-release check in each virt driver
14:18:25 dansmith: we are providing an offline way to do this
14:18:34 which just hooks into this same code i thought - or uses part of it
14:18:36 oh, does it even need to be a separate method outside of update_provider_tree?
14:18:57 mriedem: yeah, I know
14:19:15 efried: I think it ought to be possible to have a cli that calls stuff in update_provider_tree, yeah?
14:19:24 Anyway, yeah, the specific internal stuff - wherever it lives - could have a "remove in T" TODO on it.
14:19:26 efried: i'm talking about whatever logic exists in the virt driver to say that a migration is needed, gimme allocations
14:19:38 cdent: That's what we've been figuring. 'Swhat's written in the spec.
14:19:45 mriedem: yuh.
14:20:09 ok so this seems fine - agree to go this direction in the spec, get coding and adjust course if there is an unknown dragon?
14:20:17 Sounds good to me.
14:20:18 * cdent nods
14:20:27 dansmith: ^?
14:20:53 I've been waiting for dansmith to have a crack at the spec before I post another rev. But I'll put this tweak in the queue.
14:21:05 The difference I'm seeing is that efried seems to be saying "this can happen magically" and mriedem is saying "each cycle the virt driver may need to know that how we structure things may be different"?
14:21:32 mriedem: I'm not sure what you're agreeing to that is different than what we discussed, other than just implementation,
14:21:42 but I'm still ramping up my morning and will have a look at the spec in a bit
14:21:43 i think of this as a one-time thing specific to a release, that once you do that one thing we don't need to do that one thing again, e.g. move vgpu inventory from root to child
14:22:02 mriedem: that's one way to look at it,
14:22:02 and this is required to go from rocky to stein
14:22:09 I think efried wanted it to be more of a permanent sort of thing
14:22:13 mriedem: Does FFU run these offline script thingies on every release in between?
14:22:27 we need it to be more permanent for situations where hardware is added
14:22:39 the mechanism is permanent
14:22:39 efried: yes, but so far there has never been one of those that had to run on all the compute nodes
14:22:41 as a framework
14:22:42 cdent: No, for that it's not a migration
14:22:47 cdent: hardware being added is different from a migration
14:22:48 cdent: Because allocations don't have to move.
14:22:49 efried: right
14:23:04 I'm not talking about the migration, I'm talking about the mechanism internal to the update_provider_tree "knowing"
14:23:11 it needs to always try to "know"
14:23:22 cdent: sure i agree that is permanent
14:23:32 the specific vgpu check is not
14:23:57 so if we know there, why does there need to be a special thing for migrations? that's the part I'm missing
14:24:01 well, a number of times people said in the spec they want a generic solution
14:24:14 even if my own point was just for VGPU migrations
14:24:28 so, I'm not sure people'd like to have a specific check
14:24:36 Basically, if FFU runs its scripty thing against every release's codebase, then yes, we can have the migrationy virt driver stuff removed on every release. But if we can skip releases, then whatever code needs to know how the Q shape changed in R needs to stick around so it can figure that out if we're going from Q to T or whatever.
14:24:39 cdent: because we don't want to do expensive allocation stuff every periodic for no reason
14:25:04 efried: you can't skip releases
14:25:14 FFU is rolling through each release's codebase
14:25:18 which the virtdriver decides if it wants to do at the start of its run, right?
14:25:23 if T needs to know how things worked in newton, we'd be f'ced
14:25:42 mriedem: we probably need to talk to some FFU people about the potential for having to run this on all computes though,
14:25:51 cdent: yes
14:25:54 because that is definitely a difference from what they have to do today
14:26:22 * bauzas disappears
14:26:30 in order to know whether you have to run it on a compute, you have to .... ask that compute.
14:26:41 dansmith: todo added to https://etherpad.openstack.org/p/reshape-provider-tree-performance, L40
14:26:41 There's no other reliable way to know.
14:26:56 dansmith: you're likely closer to any FFU-knowing people
14:26:56 efried: I know
14:27:05 mriedem: ack
14:27:34 ...whereupon you might as well just run the script on the compute. It'll be a no-op if migration wasn't necessary.
14:28:08 i'll plead ignorance on what's different here
14:28:32 we didn't have to run the ironic flavor migration per compute b/c it was just a thing in the DB right
14:28:32 ?
14:28:34 efried: you'll have to,
14:28:47 I'm just saying they currently don't have to run anything on the computes during the process
14:28:48 we just re-used logic from the ironic virt driver
14:28:56 mriedem: yeah
14:28:58 mm
14:29:04 i guess i thought this was going to be similar
14:29:07 mriedem: but this will depend on the hardware on the compute
14:29:17 mriedem: we won't have enough info in the db about every node to do that
14:29:33 i thought it was just a matter of checking where the vgpu inventory is located in the provider tree?
14:29:39 nope
14:30:04 ok then i'm missing something
14:30:07 it requires looking at libvirt's topology of the system, seeing how sockets and memory are connected, which gpu is on which numa node, same for sriov, etc
14:30:10 efried: right?
14:30:21 yes.
14:30:42 ok so seeing vgpu on root provider says "MigrationNeeded!" but we need ^ for actually determining WHERE to move that inventory
14:30:43 yes?
14:30:50 right
14:30:52 ok
14:31:23 mriedem: basically, the virt driver is going to have something it does to check whether a migration is needed, and it may or may not be as simple as "do I have VGPUs in the compute RP", but whatever it is, in order to do the check without asking the virt driver, you would have to duplicate that logic somewhere.
14:32:06 and then, yes, to actually *do* anything about it, you definitely need to ask the virt driver - because only the virt driver knows which VGPU is allocated to which instance.
14:32:17 and,
14:32:21 the topology would have to be in the db, and we would have had to populate that in an older release
14:32:35 which we didn't, so we can't rely on it
14:32:39 only the virt driver knows which node the pci card is connected to
14:32:46 efried: er, sorry, read yours as VCPU
14:33:10 i don't know what alternative the FFU people have to this
14:33:13 either way - VCPUs with NUMA-ification, whatever.
14:33:17 yep
14:33:32 if you don't do an online migration each release, to let the computes heal this themselves, then you're doing FFU and this is how we handle this case
14:33:44 there will be gnashing of teeth
14:33:51 but FFS FFU PPL!
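As a hedged illustration of the two-part split just agreed at 14:30:42 (cheap detection versus expensive topology work), here is a driver-side sketch; every helper name below is hypothetical, MigrationNeeded is as in the earlier sketch, and the real libvirt logic would be considerably more involved:

```python
def update_provider_tree(self, provider_tree, allocations=None):
    # Hypothetical driver-side sketch; not the real libvirt driver.
    root = provider_tree.data(self._root_rp_name)

    if 'VGPU' in root.inventory:
        # Cheap check: VGPU inventory still on the root compute node
        # provider means the old, pre-nested layout.
        if allocations is None:
            # Ask the caller to gather allocations and call us again.
            raise MigrationNeeded()
        # Expensive part: only the driver can see which physical GPU sits
        # on which NUMA node, and which guest is using which GPU.
        for pgpu in self._enumerate_physical_gpus():          # hypothetical
            child = '%s_pgpu_%s' % (root.name, pgpu.address)
            provider_tree.new_child(child, root.uuid)
            provider_tree.update_inventory(child, pgpu.vgpu_inventory)
        # Move VGPU inventory and allocations from root to the children by
        # editing the passed-in structures, per the spec.
        self._move_vgpu_inventory_and_allocations(            # hypothetical
            provider_tree, allocations)
```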
14:33:54 the thing I'm more worried about,
14:34:06 is that it requires laying down code on the computes, even if you don't start the service,
14:34:13 which is what they want to avoid in general
14:34:17 because it makes rollback harder as well
14:34:26 containers can help if you're that sophisticated, but..
14:34:27 run it in a venv?
14:34:39 this is why efried was suggesting that we just make it so these kinds of changes can happen at any release
14:34:41 right, container
14:34:47 cdent: yep, I know
14:34:51 a virt driver knows, at any release, that it sees new stuff
14:34:54 and just takes care of it
14:35:01 no ffu hoopla required
14:35:01 cdent: we've been doing online transformation of simpler things for a long time,
14:35:19 and the lessons I think we've learned are that keeping that code around and working for a long time is not as trivial as it sounds
14:35:44 why are these (nested rp topology) types of changes release-bound?
14:35:47 wait, rollback??
14:35:57 cdent: They're not, I've kept saying.
14:36:05 efried: manual rollback, nothing fancy
14:36:09 yeah, I'm trying to agree with you
14:36:30 cdent: Dammit, keep agreeing with me, wouldja?
14:36:48 dansmith: I shudder to think of a *manual* rollback of a migration like this.
14:37:01 can someone define release-bound here?
14:37:03 efried: um, okay
14:37:24 efried: snapshot the db, if anything goes wrong, you reinstall old packages, restore the db and then call your vendor.. lots of people do that
14:37:34 we aren't blocking you from getting to stein with this i don't think, the computes will do the migration on first startup once you hit your target release
14:37:35 cdent: But for FFU purposes, you'd basically have to be "accumulating" the changes, because the FFU process is going to run one step per release.
14:38:03 dansmith: Okay, I guess that works too :) Long as you didn't do anything after that that you want to preserve.
14:38:09 if we dropped this rocky-specific check in stein or T or U, and someone goes from queens to V, then they'd be screwed, but we don't support skip-level releases
14:38:21 efried: obviously that's a restriction yes
14:38:45 dansmith: That makes sense, sorry, I was trying to imagine manually issuing the appropriate API calls to unwind a migration like this.
14:39:15 efried: that accumulating is what I don't understand. there's state A (then) and state B (now) from the virt driver's standpoint. if there are states between A and B why do we care when we are planning to send "the state we want" to placement?
14:39:53 mriedem: I think they should roll forward with the online bit here, and if we can't get to the manual force thing in rocky then we might need a blocker migration of some sort in the future where we make sure this happens in the next release before we touch anything, but the longer we go between those two, the harder it will be to make sure we did the right thing and didn't drop it, IMHO
14:39:56 cdent: If the states are cumulative in some sense, don't we hypothetically need to understand how to go from A->B, B->C and also A->C?
14:40:26 that's what I'm getting at: would they ever be cumulative and can we just say "no"?
14:40:53 dansmith: yeah, same for things like dropping old bdm attachment record / flows modeling - it's online now b/c it runs on the compute startup (because it has to)
14:41:13 and i think we've said we'd have to do some kind of blocker migration before we could ever drop that old attach flow code
14:41:41 mriedem: that is technically doable not on the computes though right?
14:41:52 or does it require collecting attachment info only discoverable on the compute?
14:42:23 maybe, we need the host connector from the compute, but that is stored in the bdm record for anything created after i think liberty
14:42:39 anyway
14:42:40 and it requires a rest call to cinder
14:42:41 yeah
14:42:42 so,
14:42:49 we're in agreement to move forward in the spec?
14:43:25 and we agree that data migrations are a giant pain in the ass
14:43:25 I'm going to look at it when we're done here
14:43:42 cool cool. Let's move on. Thanks for the talk.
14:45:04 Other reviews to discuss/highlight?
14:45:31 #link placement direct https://review.openstack.org/#/c/572576/
14:45:47 I've got a question in there (the FIXME noted in the commit) about where to put the boundary for things
14:45:56 it doesn't necessarily need to be resolved this cycle
14:46:06 the latest version addressed efried's and mriedem's comments
14:46:22 cdent: I haven't looked since PS1 - are we at an Adapter interface at this point?
14:46:33 it returns a NoAuthReportClient now
14:46:36 so yes
14:46:41 schweet
14:47:13 doing that is part of what introduced the boundary confusion
14:47:40 Other things:
14:47:41 consumer gens should be ready to start getting final reviews on.
14:47:41 nrp-in-alloc-cands is moving along nicely.
14:47:41 placement extraction has several patches ready for review
14:47:41 Lemme dig up some links for those...
14:47:43 (but it is the right thing to do and worked nicely. this was an especially productive review loop)
14:48:48 #link consumer gens https://review.openstack.org/#/c/567678/
14:48:48 #link nrp-in-alloc-cands https://review.openstack.org/#/c/559480/
14:48:48 #link placement extraction https://review.openstack.org/#/c/362766/
14:48:48 at this point i'd like to remind people again to put stuff in runways
14:48:58 because only 1 of those 3 ^ is in runways
14:48:58 I intend to fix https://bugs.launchpad.net/nova/+bug/1773225 today, as the warning is getting to me
14:48:59 Launchpad bug 1773225 in OpenStack Compute (nova) "placement needs to stop using accept.best_match from webob it is deprecated" [Low,Triaged]
14:49:06 and the one is because i told cdent that's how to get review on it
14:49:24 i feel like the placement stuff is getting a lot of review attention even though it's mostly not in the runways etherpad
14:49:52 mriedem: Yeah, one philosophy on runways was that we should put stuff there that wouldn't get as much attention otherwise.
14:50:15 mriedem: I.e. keep urgent top-of-mind placement stuff out of there to keep it clear for lower-pri items.
14:50:19 i think by doing so, the stuff that's in runways is still not getting attention
14:50:24 which was the point of runways, getting attention
14:50:35 things can still be in the runways queue
14:50:37 I don't disagree, just echoing info.
14:50:42 and getting reviewed
14:50:56 i think the consumer generation stuff would have been in a slot by now if it had been put in the queue weeks ago when it was ready for review
14:51:03 though placement-extract wasn't getting as much attention as anything else anyway, so it definitely warrants a runway.
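For context on "placement direct" (https://review.openstack.org/#/c/572576/): the rough idea, as sketched here with illustrative names that may differ from the actual patch, is to mount placement's WSGI application behind an intercepted endpoint so a report client can call it in-process, with no web server or keystone middleware in the way:

```python
# Rough sketch only; the actual review's interfaces may differ.
# wsgi-intercept routes client-side HTTP calls straight into a WSGI app.
from wsgi_intercept.interceptor import RequestsInterceptor

from nova.api.openstack.placement import deploy  # Rocky-era module path


def placement_direct(conf):
    # Anything requests sends to the given URL is handled by the placement
    # app inside this process -- hence "placement direct".
    return RequestsInterceptor(
        app=lambda: deploy.loadapp(conf), url='http://localhost/placement')
```

Used as a context manager, something of this shape is what would hand back the NoAuthReportClient mentioned at 14:46:33, pointed at the intercepted URL.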
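On bug 1773225: WebOb 1.8 deprecated `accept.best_match()`, and the non-deprecated replacement is `acceptable_offers()`. The fix is roughly of this shape (a sketch, not the actual patch; SUPPORTED_TYPES is illustrative):

```python
# WebOb >= 1.8: request.accept.best_match() emits a DeprecationWarning;
# acceptable_offers() returns (offer, quality) pairs, best match first.
SUPPORTED_TYPES = ['application/json']


def best_content_type(request):
    offers = request.accept.acceptable_offers(SUPPORTED_TYPES)
    return offers[0][0] if offers else None
```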
14:51:25 That bottom change is 103 patch sets old.
14:52:11 i'm speaking out of general anxiety over being in the 3rd milestone and feeling unaware of what's actually going on wrt priorities for the release
14:52:17 which is maybe just a personal problem :)
14:53:08 True, we kinda punted on the usual priorities list with some hand-waving about runways, but the latter doesn't replace the former.
14:53:44 It's not too late for us to nail (or at least write) down a list of priorities for the release.
14:53:57 That's probably a topic for the nova meeting tho.
14:54:15 i wouldn't look at the specs repo for priorities, migrator wouldn't have been in there anyway
14:54:22 nail is maybe a strong word: given what we added as a priority for placement just last week
14:54:30 cdent: right
14:54:45 which is probably just the way it goes, sadly
14:55:10 anyway, i would just like a state of the state or something, but maybe that's just me
14:55:34 i always had an etherpad of all the approved blueprints and their status by the 3rd milestone to help gauge where things were at
14:55:40 I'll (finally) be able to get back to the placement update this friday (done with travelling) which will probably provide at least a bit of state
14:56:12 cool.
14:56:16 i guess we're done here?
14:56:20 before we all start crying, any last comments before ending the meeting?
14:56:31 * efried waits to see if cdent can spell #endmeeting
14:56:32 other than good luck, and such like?
14:56:43 too soon
14:56:51 see this is why it is so embarrassing, I can spell just fine. I can't type.
14:56:57 #endfried
14:57:00 nope, that's not it
14:57:01 hahahha
14:57:09 #endem
14:57:14 almost
14:57:17 going?
14:57:22 #endmeeting