14:00:02 #startmeeting nova_scheduler
14:00:03 Meeting started Mon Jun 18 14:00:02 2018 UTC and is due to finish in 60 minutes. The chair is cdent. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:06 The meeting name has been set to 'nova_scheduler'
14:00:16 #chair edleafe jaypipes bauzas efried
14:00:17 Current chairs: bauzas cdent edleafe efried jaypipes
14:00:19 * bauzas puts one foot on the door
14:00:31 #link agenda https://wiki.openstack.org/wiki/Meetings/NovaScheduler
14:00:33 o/
14:00:35 ō/
14:00:44 #topic specs and review
14:00:46 o/
14:00:50 #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-June/131540.html
14:01:00 hah, pupdate
14:01:06 latest pupdate was long and had many things in it. Anything people would like to bring up?
14:01:12 I guess that's because it's young
14:01:42 and when you forget to tell it, it's an oopsdate
14:01:57 o/
14:02:10 * bauzas trying to make jokes in English, so apologies for everyone
14:02:42 the one thing we probably need to coordinate/talk about is getting started on the rest of the /migrator related work, and resolving the debate on the spec
14:02:49 agree
14:03:01 dansmith and bauzas seem to be on one side of the discussion and efried and cdent on the other. not sure about anyone else?
14:03:34 when I started the thread, I considered an upgrade impact
14:03:52 so I don't see why we should run it idempotently
14:03:54 o/
14:04:04 unless we want an online data migration
14:04:16 I don't disagree with the theory that it should only run on compute startup. My problem is how to handle partial failures.
14:04:23 but then, it's a race, nope ?
14:04:29 i need to go over the spec again
14:04:46 mriedem: tl;dr: the question is about when we should call the new API
14:05:14 i thought there were 2 cases: on compute startup (online migration) and via CLI (FFU/offline migration)?
14:05:19 dansmith and me think just once when starting the service, while cdent and efried think we should do this every 60 secs
14:05:32 mriedem: me too
14:05:33 that's not accurate bauzas
14:05:34 i think the latter is too heavyweight
14:05:49 mriedem: No, realistically it won't run every 60s.
14:05:50 i mean, doing it every minute
14:05:51 cdent: okay, please help me understanding then :)
14:05:54 it's "check to see if maybe we should"
14:05:58 'run until done'?
14:06:06 what does that check consist of?
14:06:22 vgpu inventory on the root provider?
14:06:25 ^ The virt driver determining whether any resources need to be moved.
14:06:31 mriedem: Yes, that would be one example.
14:06:47 we could have a nova-manage status check ?
14:07:00 does that require any additional API calls to placement? or does the virt driver already have all of that information normally every RT update call?
14:07:05 the latter.
14:07:25 i know we talked about 'is_migrated' type flags in the db for this, but there isn't really any place to put those
14:07:34 unless you stashed some random crap in the compute nodes stats field
14:07:46 yep
14:07:55 there are 2 possibilities
14:08:24 * alex_xu waves late
14:08:29 1/ is we say for a race concern we want to atomically modify the inventories/allocations when restarting and then we're done
14:09:02 2/ is saying we could have the service accepting both models, and just moving the inventories/allocations online
14:09:10 but then we would have races, right?
14:09:22 bauzas: No races with the POST /migrator model. That's not the issue.
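For context, a minimal sketch of the "migrate once at startup, or refuse to start" model bauzas and dansmith argue for above. This is not nova code: check_needed and do_migration are made-up stand-ins for the virt driver check and the proposed atomic POST /migrator call.

```python
import logging
import sys

LOG = logging.getLogger(__name__)


def startup_migration(check_needed, do_migration):
    """One-shot provider migration at compute service startup."""
    if not check_needed():
        # Already on the new (nested) model: normal startup, nothing to do.
        return
    try:
        # The proposed POST /migrator call is meant to be atomic: it moves
        # the inventories/allocations in one transaction or fails outright.
        do_migration()
    except Exception:
        LOG.exception('Provider migration failed; refusing to start')
        # Blocking upgrade: kill nova-compute rather than run half-migrated.
        sys.exit(1)


if __name__ == '__main__':
    # Toy usage: pretend a migration is needed and succeeds.
    startup_migration(lambda: True, lambda: None)
```

The property leaned on here is the atomicity of the proposed /migrator call: either everything moves or nothing does, so there is no partially-migrated state to recover from.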
14:09:25 Races are not a problem.
14:09:36 yeah i wasn't following what the race was
14:09:47 if the placement migrator api is all atomic, which we said it would be, then you either migrate or you fail
14:09:51 efried: I know, but the /migrator call will move *all* the allocations at once
14:09:56 and i think efried and cdent are concerned about retrying the migration on failure
14:10:08 efried: so why should we run yet another time ?
14:10:17 I think the thing dansmith is concerned about is that a migration could potentially happen in the middle of a running and otherwise-stable compute service.
14:10:23 so it sounds like if we failed to migrate on startup, options are (1) do it during the regular RT update if needed or (2) wait for the next restart?
14:10:33 either because we just shouldn't do that, or because there's a concern that some part of that is very heavy to do.
14:10:39 mriedem: if the API call returns an exception, then dansmith and me are accepting to not run the service
14:10:45 ie. a blocking upgrade
14:10:46 i was just going to say that
14:10:51 if the migration fails on startup, then kill nova-compute
14:10:53 mriedem: I don't understand how we can continue to start,
14:11:12 i'd be fine with that, then we don't need the RT update logic
14:11:17 because then all the compute, RT and virt code has to support running with things nested the way they should be, and un-nested if for some reason we haven't migrated
14:11:18 that's the whole point of an *atomic* transaction
14:11:18 and this is then all throwaway in stein
14:11:24 mriedem: exactly
14:11:28 do it once forever
14:11:34 not throwaway in stein
14:11:46 *this specific check* is throwaway in stein
14:11:51 for vgpu on root provider i mean
14:11:56 yup
14:12:01 this specific migration would be throwaway in stein
14:12:01 I won't have time to work on it
14:12:04 gotcha.
14:12:09 agree.
14:12:18 the model / framework can be used again in the future if needed
14:12:28 my question was theoretical for the services where the nova-compute is not on the same hardware. In that situation is there a reason why we would want to change the hardware, notice it, and not restart the nova-compute. If we don't want to care about that, then killing the nova-compute (and thus making the entire thing the nova-compute is managing dead) is okay enough
14:12:29 yup, we said at least for NUMA resources
14:12:55 cdent: i don't want to care about that
14:13:05 especially for v1 of the migrator big upgrade thing
14:13:17 mriedem: sure. that's why it is a theoretical question
14:13:24 i.e. let's not conflate the complexity of those now
14:13:26 ack
14:14:03 so (1) online migration on startup, kill nova-compute if it fails and (2) CLI hook for FFU offline migrations using placement direct
14:14:04 yeah?
14:14:26 So then the model is this: On startup only, we retrieve allocations and pass them to update_provider_tree. If update_provider_tree gets allocations, it's allowed to do a migration (if needed). Otherwise it must continue along with the existing model, even if it otherwise thinks a migration is called for. So there's no longer a MigrationNeeded exception. And if update_from_provider_tree raises an exception in the
14:14:26 ignore the exception and go on to the next periodic.
14:14:28 seems so. is a lot of work for a one off :(. I hope we can make it magically useful in the future
14:15:15 efried: why do we need to get all allocations? i liked the MigrationNeeded model.
because otherwise, we're always getting all allocations on startup even though we don't need them after we've migrated
14:15:24 what mriedem said
14:15:28 see my last comments on the spec
14:15:40 if at startup we get the exception, it's a signal for us
14:15:42 cdent: see the flavor online data migration changes in kilo....it is a lot of work yeah
14:15:43 mriedem: Because then we have to have some *other* signal to tell update_provider_tree that we're allowing a migration to be possible.
14:16:10 efried: i'm not sure i'm following
14:16:43 If update_provider_tree doesn't know whether it's been invoked "on startup" or in the regular RT flow, then it's gonna raise MigrationNeeded any time it thinks a migration is needed.
14:16:51 i thought the MigrationNeeded thing was our framework for per-release checks
14:17:04 that's my thoughts at least
14:17:22 that's how the virt driver tells the service at startup 'hey dude, you have things to do'
14:17:26 It's the framework for the virt driver to be able to signal whether a migration is needed. It doesn't know whether we're going through a FFPU or whatever.
14:17:34 bauzas: except without the concept of "at startup".
14:17:45 Again, unless we inject some other signal for that.
14:17:47 so, how are we proposing to indicate we're on startup?
14:18:03 we do have a startup flag in the compute manager when we call update_available_resource
14:18:04 when we discussed on the signal and we agreed on an exception, my mind was set on this being raised once
14:18:07 efried: $
14:18:12 efried: ^
14:18:20 * bauzas stupid AZERTY keyboard
14:18:30 https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L7367
14:18:56 that's called from https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1161
14:19:00 mriedem: I was considering to plug it into __init_host()
14:19:12 and we can die there https://github.com/openstack/nova/blob/master/nova/service.py#L180
14:19:20 on a hill
14:19:29 bauzas: that could work too
14:19:32 oh, the "startup" parameter is nice
14:19:39 I wasn't knowing it
14:19:51 point is, we have a couple of places in compute manager where we know we're starting up the service
14:19:58 mriedem: Point is still that we need a way to tell update_provider_tree a) we're on startup; and/or b) migrating is allowed.
14:20:00 that wouldn't require a new hook on init_host(), so I back up my proposal
14:20:07 Do we add the 'startup' param to update_provider_tree?
14:20:10 we can call the update_provider_tree stuff w/o allocations, get MigrationNeeded, pull allocations and redo right?
14:20:13 w/o it being part of the RT.update
14:20:18 efried: why do we need this ?
14:20:35 efried: we just call update_p_t() which would return us an exception once
14:20:49 i figured there would be some kind of 'check_migrations' flag to upt or something
14:20:50 in other words, what mriedem said
14:20:54 idk what you were thinking about that
14:21:01 bauzas: Because if update_provider_tree decides a migration is necessary in the middle of normal/stable n-cpu operation, it'll raise MigrationNeeded. And then what?
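To make the flag discussion concrete, here is a hedged sketch of the virt-driver side. update_provider_tree, MigrationNeeded, and the check_migrations flag come from the discussion and spec; the FakeVGPUDriver class, the has_root_vgpu helper, and the VGPU check are illustrative only, not the real nova driver interface.

```python
class MigrationNeeded(Exception):
    """Signal from the virt driver that existing resources must be moved."""


class FakeVGPUDriver(object):
    """Illustration only -- not the real nova virt driver interface."""

    def _vgpu_on_root(self, provider_tree, nodename):
        # e.g. VGPU inventory still lives on the root compute node provider
        # instead of on a child provider -- the Rocky-specific check that
        # gets thrown away in Stein once everyone has migrated.
        return provider_tree.has_root_vgpu(nodename)  # hypothetical helper

    def update_provider_tree(self, provider_tree, nodename,
                             allocations=None, check_migrations=False):
        if check_migrations and allocations is None:
            if self._vgpu_on_root(provider_tree, nodename):
                # Tell the resource tracker to fetch allocations and call
                # us again so we can perform the move.
                raise MigrationNeeded()
        if allocations is not None:
            # We were handed allocations: we are allowed to do the migration
            # (move inventory and allocations onto child providers).
            pass
        # Otherwise: keep supporting whatever model we were fed, with a smile.
```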
14:21:13 then the RT.update would bomb out
14:21:19 which efried is trying to avoid
14:21:19 needs to be a flag
14:21:31 only true at startup
14:21:33 yeah i'm fine with passing a 'check_migrations' flag on startup to upt
14:21:37 which sounds pretty easy to pass along
14:21:41 yup
14:22:00 anyway, I need to bail out
14:22:08 if in the distant future we decide we are cool with periodic online migrations, we have a hook to do that
14:22:08 and gives us the chance to change it under whatever bizarre circumstances we come up with later
14:22:11 jinx
14:22:31 I'm going on record as not liking how many teeny cogs we're throwing into this thing. It doesn't have to be this complicated. But okay.
14:22:31 efried: are you ok with that?
14:22:39 qualified yes --^
14:22:55 those who are absent are always wrong, so feel free to agree on any implementation detail and I'll vote accordingly
14:23:03 do we accept normal n-cpu operation before init-host finished?
14:23:03 * bauzas runs
14:23:18 alex_xu: we'd be calling after init_host
14:23:24 https://github.com/openstack/nova/blob/master/nova/service.py#L162
14:23:27 https://github.com/openstack/nova/blob/master/nova/service.py#L180
14:23:28 so we have a tentative plan, which could change, but doesn't really change anything about the rest of the work, right?
14:23:31 the latter is when we migrate
14:23:50 I'm assuming we'll want to start on the /migrator plumbing asap, yes?
14:23:54 to summarize, keep me honest:
14:24:00 upt gets two new params: allocs and check_migration. If (allocs is None and check_migration == True) then upt gets to check whether migration is needed and raise MigrationNeeded if so. Whereupon RT gets allocs and passes them in.
14:24:17 But that's the only time upt is allowed to raise MigrationNeeded.
14:24:23 1. only migrate on startup, using the startup flag from pre_start_hook, and pass a check_migrations flag to upt, handle MigrationNeeded, etc, fail and kill nova-compute on error
14:24:34 2. add a hook for FFU offline to do similar
14:24:35 aye
14:24:46 3. throwaway the specific vgpu root provider inventory check in stein
14:25:25 ok so todo is efried updates the spec for that?
14:25:35 roger wilco.
14:25:36 and then the rest of us go over it again, and start coding
14:25:50 cdent: Yes, the /migrator stuff doesn't change with any of this, and can/should get started asap.
14:25:55 ack
14:26:04 ✔
14:26:19 Hopefully the API for that is clear in the spec.
14:26:27 it was last i read it
14:26:34 yeah
14:26:44 I want a separate error code for each conflict scenario, though.
14:26:48 * jaypipes reading back, sorry for lateness.
14:27:05 the http parts should be straightforward. wiring it up to the db, less so, but mostly in terms of wiring
14:27:13 not actual "wow, that's hard"
14:27:14 I hope
14:27:54 cdent: I would think we could and should reuse existing code.
14:28:11 cdent: We should need to write zero additional sql for this.
14:28:30 cdent: And the schema should be glued-together bits of existing ones.
14:28:41 thus the hope
14:29:32 okay, any other specs and reviews things to talk about ?
14:29:40 I mean, I'm sure there's a fancy schmancy way to do the atomic migration in a single monolithic sql statement. But I'm also saying we should not do that, at least at first, or possibly ever, unless we can show that there's some good reason for it.
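A rough sketch of the resource-tracker-side flow summarized in points 1-3 above, under the same caveats: update_provider_tree, MigrationNeeded, and check_migrations come from the discussion, while fetch_allocations and the rest of the wiring are placeholders rather than real nova code.

```python
class MigrationNeeded(Exception):
    """As in the driver-side sketch above."""


def update_at_startup(driver, provider_tree, nodename, fetch_allocations):
    """Startup-only update: the one place a migration is allowed to happen."""
    try:
        driver.update_provider_tree(provider_tree, nodename,
                                    check_migrations=True)
    except MigrationNeeded:
        # The driver wants to move existing inventories/allocations, so hand
        # it the allocations and let it drive the atomic POST /migrator move.
        allocs = fetch_allocations(nodename)
        driver.update_provider_tree(provider_tree, nodename,
                                    allocations=allocs,
                                    check_migrations=True)
    # Anything that still fails here (including the /migrator call itself)
    # propagates, and the service layer stops nova-compute from starting.
```

Per the summary, this is the only code path where MigrationNeeded is expected; the normal periodic RT update never passes check_migrations=True.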
14:30:37 #topic bugs
14:30:38 #link placement bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=placement&orderby=-id
14:31:14 my read is that we haven't got any recently new bugs, except for those that have already been resolved
14:31:32 but there are 27 there. which is more than 0
14:31:42 * mriedem double checks that math
14:32:05 In [1]: 27 > 0
14:32:05 Out[1]: True
14:32:07 Math checks out.
14:32:08 I'm not sure. I was educated using "new math" so can't do anything with numbers.
14:32:37 anyone else seen Incredibles II yet? Good "new math" jokes in there.
14:32:44 not yet
14:33:21 <3 <3 highly recommended. But back to our regularly scheduled scheduler schedule.
14:33:46 okay then
14:34:15 #topic opens
14:34:15 [put yours here]
14:34:21 anyone?
14:34:27 i've started the sync_aggregates CLI
14:34:37 https://review.openstack.org/#/c/575912/
14:34:47 quite a few TODOs yet, but it's the general idea
14:34:57 but that's the last piece of the mirror aggregates bp
14:35:32 the scheduler report client bits to use aggregate generation need to happen too, but i might just do those in a follow up
14:35:34 sorry for the post-discussion question, but what are we proposing to do when allow_migrate=False and the /migrator endpoint has *already* been POST'd to? i.e. will the virt driver try and give placement a pre-nested model again if that happens?
14:35:41 mriedem: I've got those.
14:35:54 mriedem: https://review.openstack.org/#/c/556669/
14:36:49 jaypipes: You mean if upt detects a migration is in order, but it received check_migration=False?
14:37:02 jaypipes: Then yes, it has to continue supporting the existing model, whatever it is, even if it's wrong.
14:37:52 It can still shuffle providers, traits, and aggregates, and *add/remove* inventories, but it's not allowed to *move* existing resource classes.
14:38:03 efried: no, as in the migration from non-nested to nested happened already, then n-cpu is restarted, the update_available_resources() periodic runs, passing allow_migrate=False (and no allocations) and the virt driver reports a non-nested model of providers.
14:38:25 efried: in other words, how is the virt driver supposed to know that it has already moved to a nested model?
14:38:48 jaypipes: Because the provider_tree it gets from RT matches its world view, instead of looking like the non-nested old view.
14:39:01 efried: ok
14:39:11 check me on this - does it make sense?
14:39:44 efried: yeah, I guess so. lots of cogs in this, but I guess all the cogs are necessary for all the offline/online/don't break anything edge cases.
14:40:04 jaypipes: I went on record as disagreeing that all the cogs are necessary.
14:40:07 I'm just nervous with all the complexity, is all.
14:40:12 Agree.
14:40:33 But outvoted by people with +2 specs privs :P
14:40:54 efried: yes, I read that. I understand the complexity is necessary, though. just want to make sure that the "normal" processes are going to be able to run simply.
14:40:55 * cdent passes around the xanax
14:41:04 don't take it all though, I need some
14:41:16 efried: and by "normal processes" I mean the update_available_resource() periodic.
14:41:28 efried: just want to make sure that code path is alright.
14:41:43 since obviously, it's run all the time and failure in it will break the world.
14:41:44 jaypipes: It's going to be about 2x as complex for upt to deal with.
14:42:02 if we get to check_migrations=False and RT.update (normal) and the virt driver thinks it needs to do the old model, we done f'ed up yeah?
14:42:14 multiply that by however many virt drivers need to implement it. By however many migrations happen between now and the end of time.
14:42:34 mriedem: I would think so, yes. just trying to get agreement on that from folks.
14:42:38 mriedem: uh, you mean like a backwards migration?
14:42:45 no
14:42:47 i mean,
14:42:50 we migrate on startup or fail
14:43:03 after that, the 'normal' get provider tree stuff in the virt driver should assume new omdel
14:43:04 so s/old/new/ in your statement above?
14:43:05 *model
14:43:14 if it for some reason sees/thinks old model, it should raise
14:43:23 should it?
14:43:32 And should we blow up n-cpu at that point?
14:43:32 why wouldn't it?
14:43:56 What we talked about above is that if check_migration is False, virt has to keep supporting whatever model we fed it, and do it with a smile.
14:44:21 "whatever model we fed it" should be the new model
14:44:24 or we done f'ed up
14:44:35 we should only be feeding the new model after startup
14:44:43 right, what mriedem said. I just wanted to make sure we all agreed on that.
14:44:45 otherwise we failed and didn't kill nova-compute like we were supposed to
14:45:05 so if that happens we should kill n-cpu even if it's in non-startup RT cycle?
14:45:10 (this is the "partial failure" scare that I had, btw)
14:45:16 right
14:45:18 you can't kill compute from an RT.update
14:45:22 right
14:45:24 you just fail the periodic
14:45:27 right
14:45:32 And then it raises again next time.
14:45:34 and the time after that
14:45:37 and the time after that
14:45:42 we get it. :)
14:45:44 and is basically running but sick forever.
14:45:50 until....what?
14:45:57 what's the remedy at that point?
14:45:58 well, if there is a legitimate reason we'd be failing there, we'd eventually want to auto-disable that compute
14:46:14 Whereas if we allow "agile" migrations, it's self-healing.
14:46:15 anyway, this shouldn't happen
14:46:18 efried: just keeping logging errors, that's all.
14:46:32 efried: "agile" migrations?
14:46:38 (14 minute warning)
14:46:56 believe it or not, there are likely bugs in the alternatives
14:47:04 so we need to shit or get off this pot
14:47:17 jaypipes: spec as currently written allows the possibility of upt raising MigrationNeeded any time it's called, and RT responding by retrieving allocs and reinvoking
14:47:21 mriedem: agreed. sorry for bringing up the topic.
14:47:33 jaypipes: not directed at your specifically, just all of us
14:47:34 jaypipes: So a migration could *theoretically* happen any time. But *realistically* will only ever happen at startup.
14:47:37 *you
14:48:03 jaypipes: But we wouldn't need the extra "startup" flag percolating around. And the upt logic is simpler.
14:48:20 efried: but the virt driver logic is likely not, right? because then it has to handle both old and new models
14:48:39 mriedem: No, in that case it only has to know whether the model it's fed is out of sync.
14:48:50 mriedem: If yes, raise MigrationNeeded, get allocs, and fix it.
14:48:53 but like i said, if we migrate and fail, we kill the service on startup
14:48:56 efried: and how would it know that?
14:49:00 we don't get to a 'fed a model out of sync'
14:49:32 jaypipes: How does it know the model it gets from RT (e.g. monolithic) is out of sync with the model it thinks it should be using (e.g. nested)?
14:49:47 whole purpose of upt, I thought.
14:49:53 in this case the check is more finicky
14:50:09 because it has to know if existing inventories are moving.
14:50:19 but basically ^ is the check.
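For a sense of what that check might amount to, here is a toy illustration of the "does the tree I was fed match my world view?" comparison described above, using plain dicts rather than the real ProviderTree API. The tree shape, the VGPU example, and the provider naming are assumptions made for the sketch.

```python
VGPU = 'VGPU'


def tree_already_nested(tree, nodename):
    """Return True if VGPU inventory already lives on a child provider."""
    root = tree[nodename]
    if VGPU in root['inventory']:
        # Old monolithic model: VGPU inventory still sits on the root provider.
        return False
    return any(VGPU in child['inventory']
               for child in root['children'].values())


# Toy usage: a migrated tree with VGPU moved onto a child provider.
migrated = {
    'compute1': {
        'inventory': {'VCPU': 8},
        'children': {
            'compute1_gpu0': {'inventory': {VGPU: 4}, 'children': {}},
        },
    },
}
assert tree_already_nested(migrated, 'compute1')
```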
14:51:23 So mriedem jaypipes if we're going to do it this way - where upt raises any time it thinks we're out of sync - then we don't need the extra check_migration flag.
14:51:37 upt gets to act as written in the spec; we're just changing how RT responds to that.
14:52:01 on startup, except MigrationNeeded => get allocs and redrive; on periodic, except MigrationNeeded => log error and move on.
14:52:23 efried: no, I was more thinking of how does the virt driver know that a nested model with allocations on child X would need to move those allocations to child Y...
14:52:28 (8 minutes)
14:53:01 jaypipes: Well, that's a migration we don't need to think about at the moment.
14:53:07 efried: same for sharing providers I guess..
14:53:15 jaypipes: Do you have an example of where that would happen?
14:53:42 efried: virt driver suddenly needs to know (somehow?) that DISK_GB allocs need to be moved from local to a sharing provider.
14:53:59 where they're actually the same allocs?
14:54:02 "on periodic, except MigrationNeeded => log error and move on." - the compute service shouldn't be running if we're hitting this
14:54:28 mriedem: Agree *shouldn't*. Which is I guess why we're okay allowing this anomalous sick-but-alive state.
14:54:52 when did we say we were ok with that?
14:54:55 Basically, that should happen with the same frequency as the theoretical mid-stream migration in the spec as written.
14:54:56 we said, migrate on start or fail
14:55:38 mriedem: Right, but we need a condition, however rare, for the scenario where upt thinks a migration is needed, but didn't get the check_migration flag.
14:55:48 mriedem: Either it has to keep supporting the old model, or it has to blow up.
14:55:58 and if it blows up, RT either has to log-and-ignore, or itself blow up.
14:56:14 "upt thinks a migration is needed"? - you mean the virt driver thinks a migration is needed?
14:56:19 mriedem: yes.
14:56:22 upt is a virt driver method.
14:56:34 Should never happen
14:56:43 it has to blow up
14:56:51 okay, cool. Then we don't need the extra flag.
14:56:55 and the upt logic is back to simple
14:56:57 which is goodness.
14:57:07 mriedem: Does RT also blow up, or just log-and-proceed?
14:57:22 what is there to proceed with?
14:57:33 mriedem: Same way we handle almost every other exception at this level.
14:57:35 upt is the interface between the virt driver and placement right?
14:58:03 Whenever RT gets an exception in _update, it logs it and keeps running, til the next periodic.
14:58:18 mriedem: Yes, pretty much.
14:58:25 brokered by RT.
14:58:46 "okay, cool. Then we don't need the extra flag."
14:58:52 i disagree with the premise
14:59:08 of the virt driver needing to check "oh check_migrations=False but i'm still going to see if i should be migrating anyway"
14:59:14 we've got one minute left, continue on spec or in #openstack-placement ?
14:59:23 the latter.
14:59:26 yeah i need to drop
14:59:32 damn
14:59:41 spec discussions are so choppy.
14:59:45 time-wise.
14:59:46 It's clear that the process of creating and using /migrator is involved, so clear your schedules!
14:59:56 true
15:00:12 we out of time, see you elsewhere
15:00:19 thanks for coming and all your enthusiasms
15:00:24 #endmeeting