13:59:59 #startmeeting nova_scheduler
14:00:00 Meeting started Mon Jul 30 13:59:59 2018 UTC and is due to finish in 60 minutes. The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:04 The meeting name has been set to 'nova_scheduler'
14:00:07 o/
14:00:21 o/
14:00:50 o/
14:01:00 o/
14:01:24 o/
14:01:38 * bauzas waves
14:02:32 Okay, let's get started.
14:02:33 #topic last meeting
14:02:33 #link last minutes: http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-07-23-14.00.html
14:02:33 Any old business?
14:02:36 o/
14:03:11 #topic specs and review
14:03:11 #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-July/132562.html
14:03:45 Now that we're past FF and the
14:03:45 #link reshaper series: https://review.openstack.org/#/q/topic:bp/reshape-provider-tree+status:open
14:03:46 has been deferred, I'm not completely sure what our priorities are supposed to be. Anyone?
14:04:29 Perhaps a focus on
14:04:29 #topic bugs
14:04:29 #link placement bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=placement&orderby=-id
14:05:20 or
14:05:20 #topic opens
14:05:21 Planning/Doing support in nova/report client for:
14:05:21 #link consumer generation handling (gibi): https://review.openstack.org/#/c/583667/
14:05:21 ?
14:05:57 I, for one, would like to see the reshaper series finished and reviewed, ready for when we branch so we can land it in Stein first thing.
14:06:32 I think this week it's all bugs?
14:06:47 okay then
14:06:59 It would be nice to see some
14:06:59 Planning/Doing support in nova/report client for:
14:06:59 nested and shared providers for initial allocations
14:06:59 nested and shared providers when modifying migration (and other?) allocations
14:06:59 as well.
14:07:43 * gibi joins late
14:07:58 We had some discussion last week about the sharing provider support we landed for libvirt. TL;DR: it's broke, so we ripped it out. See https://review.openstack.org/#/c/586614/ and https://bugs.launchpad.net/nova/+bug/1784020
14:07:58 Launchpad bug 1784020 in OpenStack Compute (nova) "Shared storage providers are not supported and will break things if used" [High,Triaged]
14:08:31 so it would be neat if we could try to work through some of that, but a lot of it is going to be Stein (and possibly beyond) blueprint work.
14:09:29 Yeah, I think we need to do some exploring and experimenting to make sure we really know what's going on
14:09:44 Tests and CI jobs would be great.
14:10:39 Anyone have anything else? Specific bugs or topics to bring up? Open discussion?
14:11:05 Just a reminder for anyone to put stuff on https://etherpad.openstack.org/p/nova-ptg-stein
14:11:11 efried: so, the bugs on shared storage providers... are we not able to fix those in Rocky?
14:11:17 no way
14:11:38 I should link the IRC discussion, stand by lemme find...
14:11:51 jaypipes: there was quite a bit of discussion on thursday and friday in irc about slow gateness and features not being quite there
14:12:11 which has resulted in a great deal of stuff being even less there
14:13:00 cdent: ok, yeah if anyone has links for me to read, that would be great. tia
14:13:13 currently reading the bug report
14:13:23 This is probably as good a place as any to start reading.
14:13:23 Goes on for a while:
14:13:23 #link IRC discussion about busted sharing provider support http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-07-27.log.html#t2018-07-27T14:43:33
14:13:31 matt and dan may be able to give you a quick summary of their perspectives
14:13:42 The bug report actually summarizes the issues pretty well.
14:14:01 But I was convinced it was going to be too much to qualify as "bug fixes" in rocky.
14:14:19 and I tend to be the optimistic one.
14:15:44 Do we want to discuss further here? jaypipes cdent?
14:16:23 I've got nothing to add at this time. I think after jaypipes has digested if we want to continue we can do it after the meeting
14:16:38 efried: just disappointing to push it yet another release, but at this point, it's par for the course.
14:17:46 I think good progress was made, but note that we explicitly made it not a priority for Queens and then never brought it back to the forefront for Rocky. So if nothing else, this is good reason for us to make it an explicit priority for Stein.
14:18:09 And to spec out exactly what we need to do to make it work and declare it done.
14:19:01 efried: the way we handle migrations currently, I'm not sure there's ever going to be a solution to this :(
14:20:12 efried: at least, not as long as we can't tolerate doubling up allocations on shared storage for a brief period of time during migration.
14:20:38 jaypipes: Incremental progress. We can start by making sure allocations aren't lost/doubled. Then next step, making sure the same provider is picked on the target when it's available. Then making sure that's done without having doubled allocations. Then doing the migration without re-copying the disk contents.
14:20:39 efried: we go through all sorts of code gymnastics to cater to resize to same host shit.
14:21:07 I think if we spec it out, we'll be able to solve it.
14:21:17 #link Stein PTG etherpad https://etherpad.openstack.org/p/nova-ptg-stein
14:21:39 L82 mriedem started some bullets around this.
14:21:59 efried: doubling up allocations *should* be fine -- and standard practice during migrations. it is the fact that due to resize to same host and concerns about potentially running out of perceived disk space on a shared provider that we *don't* just always double shit up, right?
14:22:19 that's not what's happening today
14:25:04 I imagine there will be discussion about whether we prefer dest hosts sharing the same providers as existing allocations - I imagine the answer is yes - and then what we do if we can't find one of those. So maybe we ask for alloc candidates *excluding* shared resources but which are ?member_of the aggregate(s) of the sharing providers? And if that results in no hits, then we fall back to the regular thing and accept
14:25:06 * jaypipes reading IRC backlog
14:25:22 Sounds like ideal PTG whiteboard fodder.
14:25:59 efried: sounds way too complicated to be safe/clean. which is typically the case with anything migration-related...
14:27:21 denver has all the best whiteboards upstairs, and none of them downstairs
14:27:28 The alternative is not caring that we're asking for more resource than we actually need, and thus not being able to hit full utilization without incorrectly failing migrations...
14:28:33 which may be okay as an incremental first step
14:29:00 efried: that is *precisely* what I just said above :)
14:29:32 efried: in other words, don't try to overthink things. just always create a doubled allocation during migration -- owned/consumed by the migration UUID.
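
[Editor's note: a rough sketch of the "doubled allocation, owned/consumed by the migration UUID" idea described above. It is illustrative only -- the UUIDs and resource amounts are invented, and the dicts merely approximate the shape of placement allocation records.]

    # Sketch: during a migration, capacity is held on both hosts at once by
    # letting the migration record be its own consumer. The migration UUID
    # holds the source-host allocation while the instance UUID holds the
    # destination-host allocation, until the migration confirms or reverts.
    # All UUIDs and amounts below are invented for illustration.

    SRC_CN_RP = "11111111-1111-1111-1111-111111111111"   # source compute node RP
    DST_CN_RP = "22222222-2222-2222-2222-222222222222"   # destination compute node RP
    INSTANCE_UUID = "aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa"
    MIGRATION_UUID = "bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb"

    FLAVOR = {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20}

    # Held by the migration UUID for the duration of the move (source host).
    migration_allocations = {
        "allocations": {SRC_CN_RP: {"resources": dict(FLAVOR)}},
    }

    # Held by the instance UUID (destination host).
    instance_allocations = {
        "allocations": {DST_CN_RP: {"resources": dict(FLAVOR)}},
    }

    # On confirm, the migration's allocation is dropped; on revert, the two
    # are swapped back so the instance again consumes the source host.
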
14:29:46 efried: and don't try to figure out whether there's a sharing provider in use or not.
14:30:08 Yuh, what I'm saying is, once we've implemented that, let's not dust off our hands and quit trying to make it better.
14:30:24 efried: and on the rare occasion where the scheduler/placement returns no valid host due to exceeding capacity, then oh well...
14:31:00 buy more disk
14:32:25 Soooo, is someone going to start writing a spec?
14:32:41 migration isn't the only issue in play
14:32:51 keep in mind upgrades then ;)
14:32:56 ^
14:33:11 or we would have yet another spec like reshape :)
14:33:37 efried: I'm unable to commit to anything like spec writing _now_
14:34:20 later perhaps, but I need more mental headspace to do anything useful, and my backlog is way full
14:34:25 I suspect that is the case for everyone
14:34:36 the reshaper is the thing that is needed for sharing providers to be a completely automated setup.
14:34:41 while that's true, we'll just have to pass our apologies downstream
14:34:46 There was a discussion last week about process for big, multi-release features like this. It was suggested to have a high-level spec to describe things at the conceptual level and enumerate some of the components; and then single-release-contained specs for those components.
14:35:14 we always said that when we added the sharing provider deletes DISK_GB inventory patch that the operator would be responsible for manually moving the DISK_GB allocation records from compute nodes onto the sharing provider. :(
14:35:42 I guess now we want to automate all of that.
14:36:28 Well, the op was responsible for creating the inv on the sharing provider. But the way we implemented the libvirt update_provider_tree, the driver "automatically" deleted it from the compute node.
14:36:34 god forbid an operator needs to run a command to move data when they set up shared storage...
14:36:54 well
14:36:59 idk
14:37:01 oh, allocation records, yeah
14:37:11 efried: yes, and auto-deleting that inventory record from the compute node would work just dandy if the operator had already moved the allocations to the shared storage provider. :(
14:37:15 Do we have osc-placement yet?
14:37:19 for allocs
14:37:24 nope AFAIK
14:37:32 we had a BZ about it
14:37:36 I thought we did?
14:37:37 (internally)
14:37:44 oh, okay
14:37:54 for deleting allocations ?
14:38:24 bauzas: we don't want to delete allocations. the solution agreed upon was using the replace-the-world reshaper for these kinds of maintenance events.
14:38:28 because IIRC, the BZ reporter said he had to call the API directly for deleting allocs
14:38:47 He would want to POST allocs.
14:39:01 jaypipes: oh okay, if that's a design consensus :)
14:39:05 move them from the compute node rp to the shared storage rp.
14:39:07 I'm fine with it
14:39:11 But yeah, reshaper could be used for this also.
14:39:24 tbc, the BZ wasn't about asking to create a osc-placement call for deleting allocs
14:39:26 efried: sharing storage providers was one of the primary use cases for reshaper...
14:39:33 um
14:39:57 I very specifically remember talking excitedly with cdent about using reshaper to "shape" a sharing provider world.
14:40:20 I mean, I can see how that would work, but I don't remember sharing being part of the discussions.
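
[Editor's note: a rough sketch of the "move the DISK_GB allocation records from the compute node onto the sharing provider" step discussed above, i.e. rewriting a consumer's allocation set so DISK_GB is charged to the shared storage provider instead of the compute node. UUIDs and amounts are invented; the dicts only approximate what would be sent to placement.]

    # Sketch: shifting an instance's DISK_GB allocation off the compute node
    # resource provider and onto a sharing DISK_GB provider in the same
    # aggregate. All UUIDs and amounts below are invented for illustration.

    COMPUTE_RP = "11111111-1111-1111-1111-111111111111"     # compute node RP
    SHARED_DISK_RP = "33333333-3333-3333-3333-333333333333" # sharing storage RP

    # Before: everything, DISK_GB included, is charged to the compute node.
    allocations_before = {
        "allocations": {
            COMPUTE_RP: {"resources": {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20}},
        },
    }

    # After: VCPU/MEMORY_MB stay on the compute node; DISK_GB moves to the
    # sharing provider, matching where the DISK_GB inventory now lives.
    allocations_after = {
        "allocations": {
            COMPUTE_RP: {"resources": {"VCPU": 2, "MEMORY_MB": 4096}},
            SHARED_DISK_RP: {"resources": {"DISK_GB": 20}},
        },
    }
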
14:40:20 efried: the two primary use cases were NUMA-fying and sharing storage-ifying
14:41:06 efried: in any case, I also remember speaking about the need to move allocation records for DISK_GB to the sharing provider as a manual operator action
14:41:19 Was that before reshaper was conceived?
14:41:25 yes
14:41:55 but now we're delaying the one thing that would actually be useful for fixing this problem for sharing providers. so, oh well, guess Stein it is.
14:42:21 What are you suggesting as an alternative? That we land reshaper and use it to fix sharing providers?
14:42:23 in Rocky?
14:42:34 efried: yes.
14:42:44 Between now (FF was last week) and The End?
14:43:44 efried: I know FF was last week... I don't really care about FF if I'm being honest. Never have. Probably never will. I don't like rigid release cycles. sorry...
14:43:59 efried: I know I'm in the minority and am just bitching.
14:44:22 Well then, let's get the reshaper series finished up and reviewed, and we can buck the process when we've demonstrated that the code is ready, howzat?
14:44:36 jaypipes: I don't know that you're in the minority, I think it is rather that Dan and Matt and Mel have already stated their opinion and you weren't there
14:44:53 (not there for good reasons)
14:45:01 efried: heh, good luck with that :) but yes, I'm happy to review and fix up the reshaper series as needed. even though I know it doesn't have a chance of landing before October.
14:45:10 _plenty_ of people think the rigid release cycles are cray cray
14:45:13 Code is ready except for the last patch, which needs test. If we want to get aggressive about this, I can have that patch finished up in the next couple of days.
14:45:35 cdent: it's not just dan, matt and melanie (and I don't blame them for anything). it's just the flow of OpenStack. it is what it is.
14:45:52 * cdent blanches
14:45:59 I hate the phrase "it is what it is"
14:46:01 efried: I can commit to reviewing as much as is pushed up to gerrit.
14:46:11 cdent: yeah, I know. sorry :(
14:46:53 jaypipes: Series starts here: https://review.openstack.org/#/c/576927/ -- 8 patches, 7 of which are complete and ready for review.
14:47:49 efried: ack, I'll review all today.
14:48:05 Cool, I'll work on that last patch.
14:48:12 which, of course, will be the most complicated :)
14:48:32 So, anything else for today?
14:49:22 not from me.
14:49:51 ight, laters y'all
14:49:53 #endmeeting