13:59:59 <efried> #startmeeting nova_scheduler
14:00:00 <openstack> Meeting started Mon Jul 30 13:59:59 2018 UTC and is due to finish in 60 minutes.  The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:02 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:04 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:07 <takashin> o/
14:00:21 <tssurya> o/
14:00:50 <alex_xu> o/
14:01:00 <tetsuro_> o/
14:01:24 <jaypipes> o/
14:01:38 * bauzas waves
14:02:32 <efried> Okay, let's get started.
14:02:33 <efried> #topic last meeting
14:02:33 <efried> #link last minutes: http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-07-23-14.00.html
14:02:33 <efried> Any old business?
14:02:36 <cdent> o/
14:03:11 <efried> #topic specs and review
14:03:11 <efried> #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-July/132562.html
14:03:45 <efried> Now that we're past FF and the
14:03:45 <efried> #link reshaper series: https://review.openstack.org/#/q/topic:bp/reshape-provider-tree+status:open
14:03:46 <efried> has been deferred, I'm not completely sure what our priorities are supposed to be. Anyone?
14:04:29 <efried> Perhaps a focus on
14:04:29 <efried> #topic bugs
14:04:29 <efried> #link placement bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=placement&orderby=-id
14:05:20 <efried> or
14:05:20 <efried> #topic opens
14:05:21 <efried> Planning/Doing support in nova/report client for:
14:05:21 <efried> #link consumer generation handling (gibi): https://review.openstack.org/#/c/583667/
14:05:21 <efried> ?
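[For reference, a minimal sketch of what a consumer-generation-aware allocation write looks like against the placement API, which gained consumer generations in microversion 1.28; the endpoint, token, and uuids below are placeholders:]

    import requests

    PLACEMENT = "http://placement.example/placement"  # placeholder endpoint
    HEADERS = {
        "X-Auth-Token": "<token>",                    # placeholder token
        "OpenStack-API-Version": "placement 1.28",    # consumer generations start here
    }

    # PUT replaces the consumer's complete set of allocations. consumer_generation
    # must match what placement last returned for this consumer (None for a brand
    # new consumer); on a mismatch placement returns 409 and the caller must
    # re-read and retry.
    payload = {
        "allocations": {
            "4e8e5957-649f-477b-9e5b-f1f75b21c03c": {  # resource provider uuid
                "resources": {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 20},
            },
        },
        "consumer_generation": None,
        "project_id": "project-uuid",  # placeholder
        "user_id": "user-uuid",        # placeholder
    }
    resp = requests.put(
        PLACEMENT + "/allocations/<consumer-uuid>", json=payload, headers=HEADERS
    )
    resp.raise_for_status()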
14:05:57 <efried> I, for one, would like to see the reshaper series finished and reviewed, ready for when we branch so we can land it in Stein first thing.
14:06:32 <cdent> I think this week it's all bugs?
14:06:47 <efried> okay then
14:06:59 <efried> It would be nice to see some
14:06:59 <efried> Planning/Doing support in nova/report client for:
14:06:59 <efried> nested and shared providers for initial allocations
14:06:59 <efried> nested and shared providers when modifying migration (and other?) allocations
14:06:59 <efried> as well.
14:07:43 * gibi joins late
14:07:58 <efried> We had some discussion last week about the sharing provider support we landed for libvirt. TL;DR: it's broke, so we ripped it out. See https://review.openstack.org/#/c/586614/ and https://bugs.launchpad.net/nova/+bug/1784020
14:07:58 <openstack> Launchpad bug 1784020 in OpenStack Compute (nova) "Shared storage providers are not supported and will break things if used" [High,Triaged]
14:08:31 <efried> so it would be neat if we could try to work through some of that, but a lot of it is going to be Stein (and possibly beyond) blueprint work.
14:09:29 <cdent> Yeah, I think we need to do some exploring and experimenting to make sure we really know what's going on
14:09:44 <efried> Tests and CI jobs would be great.
14:10:39 <efried> Anyone have anything else? Specific bugs or topics to bring up? Open discussion?
14:11:05 <cdent> Just a reminder for anyone to put stuff on https://etherpad.openstack.org/p/nova-ptg-stein
14:11:11 <jaypipes> efried: so, the bugs on shared storage providers... are we not able to fix those in Rocky?
14:11:17 <efried> no way
14:11:38 <efried> I should link the IRC discussion, stand by lemme find...
14:11:51 <cdent> jaypipes: there was quite a bit of discussion on thursday and friday in irc about slow gateness and features not being quite there
14:12:11 <cdent> which has resulted in a great deal of stuff being even less there
14:13:00 <jaypipes> cdent: ok, yeah if anyone has links for me to read, that would be great. tia
14:13:13 <jaypipes> currently reading the bug report
14:13:23 <efried> This is probably as good a place as any to start reading. Goes on for a while:
14:13:23 <efried> #link IRC discussion about busted sharing provider support http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-07-27.log.html#t2018-07-27T14:43:33
14:13:31 <cdent> matt and dan may be able to give you a quick summary of their perspectives
14:13:42 <efried> The bug report actually summarizes the issues pretty well.
14:14:01 <efried> But I was convinced it was going to be too much to qualify as "bug fixes" in rocky.
14:14:19 <efried> and I tend to be the optimistic one.
14:15:44 <efried> Do we want to discuss further here? jaypipes cdent?
14:16:23 <cdent> I've got nothing to add at this time. I think after jaypipes has digested if we want to continue we can do it after the meeting
14:16:38 <jaypipes> efried: just disappointing to push it yet another release, but at this point, it's par for the course.
14:17:46 <efried> I think good progress was made, but note that we explicitly made it not a priority for Queens and then never brought it back to the forefront for Rocky. So if nothing else, this is good reason for us to make it an explicit priority for Stein.
14:18:09 <efried> And to spec out exactly what we need to do to make it work and declare it done.
14:19:01 <jaypipes> efried: the way we handle migrations currently, I'm not sure there's ever going to be a solution to this :(
14:20:12 <jaypipes> efried: at least, not as long as we can't tolerate doubling up allocations on shared storage for a brief period of time during migration.
14:20:38 <efried> jaypipes: Incremental progress. We can start by making sure allocations aren't lost/doubled. Then next step, making sure the same provider is picked on the target when it's available. Then making sure that's done without having doubled allocations. Then doing the migration without re-copying the disk contents.
14:20:39 <jaypipes> efried: we go through all sorts of code gymnastics to cater to resize to same host shit.
14:21:07 <efried> I think if we spec it out, we'll be able to solve it.
14:21:17 <efried> #link Stein PTG etherpad https://etherpad.openstack.org/p/nova-ptg-stein
14:21:39 <efried> L82 mriedem started some bullets around this.
14:21:59 <jaypipes> efried: doubling up allocations *should* be fine -- and standard practice during migrations. it's because of resize-to-same-host and concerns about potentially running out of perceived disk space on a shared provider that we *don't* just always double shit up, right?
14:22:19 <efried> that's not what's happening today
14:25:04 <efried> I imagine there will be discussion about whether we prefer dest hosts sharing the same providers as existing allocations - I imagine the answer is yes - and then what we do if we can't find one of those. So maybe we ask for alloc candidates *excluding* shared resources but which are ?member_of the aggregate(s) of the sharing providers? And if that results in no hits, then we fall back to the regular thing and accept
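[The kind of query efried describes might look like the following sketch; member_of has been accepted on GET /allocation_candidates since placement microversion 1.21, and the endpoint, token, and aggregate uuid are placeholders:]

    import requests

    PLACEMENT = "http://placement.example/placement"  # placeholder endpoint
    HEADERS = {
        "X-Auth-Token": "<token>",                    # placeholder token
        "OpenStack-API-Version": "placement 1.21",    # member_of support starts here
    }

    # Ask only for compute-side resources (no DISK_GB, so no sharing provider is
    # required to satisfy the request), but restrict candidates to providers in
    # the sharing provider's aggregate so the dest host can reach the same
    # shared storage.
    params = {
        "resources": "VCPU:2,MEMORY_MB:2048",
        "member_of": "agg-uuid-of-sharing-provider",  # placeholder aggregate uuid
    }
    resp = requests.get(
        PLACEMENT + "/allocation_candidates", params=params, headers=HEADERS
    )
    resp.raise_for_status()
    candidates = resp.json()["allocation_requests"]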
14:25:06 * jaypipes reading IRC backlog
14:25:22 <efried> Sounds like ideal PTG whiteboard fodder.
14:25:59 <jaypipes> efried: sounds way too complicated to be safe/clean. which is typically the case with anything migration-related...
14:27:21 <cdent> denver has all the best whiteboards upstairs, and none of them downstairs
14:27:28 <efried> The alternative is not caring that we're asking for more resource than we actually need, and thus not being able to hit full utilization without incorrectly failing migrations...
14:28:33 <efried> which may be okay as an incremental first step
14:29:00 <jaypipes> efried: that is *precisely* what I just said above :)
14:29:32 <jaypipes> efried: in other words, don't try to overthink things. just always create a doubled allocation during migration -- owned/consumed by the migration UUID.
14:29:46 <jaypipes> efried: and don't try to figure out whether there's a sharing provider in use or not.
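[A sketch of the doubled-allocation shape jaypipes describes, with all uuids as placeholders: during the move the migration uuid consumes the source host's resources and the instance uuid consumes the destination's, with DISK_GB counted against the sharing provider on both sides until confirm/revert drops one. In microversion 1.13+ this can be written atomically via POST /allocations:]

    # Body for POST /allocations (placement microversion 1.13+), keyed by
    # consumer uuid; placement applies all consumers' allocations atomically.
    doubled = {
        "migration-uuid": {                 # the migration record owns the source
            "allocations": {
                "source-compute-rp-uuid": {
                    "resources": {"VCPU": 2, "MEMORY_MB": 2048},
                },
                "shared-storage-rp-uuid": {
                    "resources": {"DISK_GB": 20},  # disk counted once for source...
                },
            },
            "project_id": "project-uuid",
            "user_id": "user-uuid",
        },
        "instance-uuid": {                  # the instance owns the destination
            "allocations": {
                "dest-compute-rp-uuid": {
                    "resources": {"VCPU": 2, "MEMORY_MB": 2048},
                },
                "shared-storage-rp-uuid": {
                    "resources": {"DISK_GB": 20},  # ...and once again for dest
                },
            },
            "project_id": "project-uuid",
            "user_id": "user-uuid",
        },
    }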
14:30:08 <efried> Yuh, what I'm saying is, once we've implemented that, let's not dust off our hands and quit trying to make it better.
14:30:24 <jaypipes> efried: and on the rare occasion where the scheduler/placement returns no valid host due to exceeding capacity, then oh well...
14:31:00 <cdent> buy more disk
14:32:25 <efried> Soooo, is someone going to start writing a spec?
14:32:41 <efried> migration isn't the only issue in play
14:32:51 <bauzas> keep in mind upgrades then ;)
14:32:56 <efried> ^
14:33:11 <bauzas> or we would have yet another spec like reshape :)
14:33:37 <cdent> efried: I'm unable to commit to anything like spec writing _now_
14:34:20 <cdent> later perhaps, but I need more mental headspace to do anything useful, and my backlog is way full
14:34:25 <cdent> I suspect that is the case for everyone
14:34:36 <jaypipes> the reshaper is the thing that is needed for sharing providers to be a completely automated setup.
14:34:41 <cdent> while that's true, we'll just have to pass our apologies downstream
14:34:46 <efried> There was a discussion last week about process for big, multi-release features like this. It was suggested to have a high-level spec to describe things at the conceptual level and enumerate some of the components; and then single-release-contained specs for those components.
14:35:14 <jaypipes> we always said, when we added the patch that drops DISK_GB inventory from the compute node in favor of the sharing provider, that the operator would be responsible for manually moving the DISK_GB allocation records from compute nodes onto the sharing provider. :(
14:35:42 <jaypipes> I guess now we want to automate all of that.
14:36:28 <efried> Well, the op was responsible for creating the inv on the sharing provider. But the way we implemented the libvirt update_provider_tree, the driver "automatically" deleted it from the compute node.
14:36:34 <jaypipes> god forbid an operator needs to run a command to move data when they set up shared storage...
14:36:54 <bauzas> well
14:36:59 <bauzas> idk
14:37:01 <efried> oh, allocation records, yeah
14:37:11 <jaypipes> efried: yes, and auto-deleting that inventory record from the compute node would work just dandy if the operator had already moved the allocations to the shared storage provider. :(
14:37:15 <efried> Do we have osc-placement yet?
14:37:19 <efried> for allocs
14:37:24 <bauzas> nope AFAIK
14:37:32 <bauzas> we had a BZ about it
14:37:36 <cdent> I thought we did?
14:37:37 <bauzas> (internally)
14:37:44 <bauzas> oh, okay
14:37:54 <bauzas> for deleting allocations ?
14:38:24 <jaypipes> bauzas: we don't want to delete allocations. the solution agreed upon was using the replace-the-world reshaper for these kinds of maintenance events.
14:38:28 <bauzas> because IIRC, the BZ reporter said he had to call the API directly for deleting allocs
14:38:47 <efried> He would want to POST allocs.
14:39:01 <bauzas> jaypipes: oh okay, if that's a design consensus :)
14:39:05 <efried> move them from the compute node rp to the shared storage rp.
14:39:07 <bauzas> I'm fine with it
14:39:11 <efried> But yeah, reshaper could be used for this also.
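[A sketch of the manual move under discussion, with endpoint, token, and uuids as placeholders: POST /allocations (placement microversion 1.13+) rewrites each listed consumer's allocations in one atomic request, so DISK_GB can be shifted from the compute node provider to the sharing provider with no window where it is lost or doubled:]

    import requests

    PLACEMENT = "http://placement.example/placement"  # placeholder endpoint
    HEADERS = {
        "X-Auth-Token": "<token>",                    # placeholder token
        "OpenStack-API-Version": "placement 1.13",    # POST /allocations starts here
    }

    # Replace the instance's allocations wholesale: VCPU/MEMORY_MB stay on the
    # compute node provider, DISK_GB moves onto the sharing provider.
    payload = {
        "instance-uuid": {
            "allocations": {
                "compute-node-rp-uuid": {
                    "resources": {"VCPU": 1, "MEMORY_MB": 512},
                },
                "shared-storage-rp-uuid": {
                    "resources": {"DISK_GB": 20},
                },
            },
            "project_id": "project-uuid",
            "user_id": "user-uuid",
        },
    }
    resp = requests.post(PLACEMENT + "/allocations", json=payload, headers=HEADERS)
    resp.raise_for_status()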
14:39:24 <bauzas> tbc, the BZ wasn't about asking to create an osc-placement call for deleting allocs
14:39:26 <jaypipes> efried: sharing storage providers was one of the primary use cases for reshaper...
14:39:33 <efried> um
14:39:57 <jaypipes> I very specifically remember talking excitedly with cdent about using reshaper to "shape" a sharing provider world.
14:40:20 <efried> I mean, I can see how that would work, but I don't remember sharing being part of the discussions.
14:40:20 <jaypipes> efried: the two primary use cases were NUMA-fying and sharing storage-ifying
14:41:06 <jaypipes> efried: in any case, I also remember speaking about the need to move allocation records for DISK_GB to the sharing provider as a manual operator action
14:41:19 <efried> Was that before reshaper was conceived?
14:41:25 <jaypipes> yes
14:41:55 <jaypipes> but now we're delaying the one thing that would actually be useful for fixing this problem for sharing providers. so, oh well, guess Stein it is.
14:42:21 <efried> What are you suggesting as an alternative? That we land reshaper and use it to fix sharing providers?
14:42:23 <efried> in Rocky?
14:42:34 <jaypipes> efried: yes.
14:42:44 <efried> Between now (FF was last week) and The End?
14:43:44 <jaypipes> efried: I know FF was last week... I don't really care about FF if I'm being honest. Never have. Probably never will. I don't like rigid release cycles. sorry...
14:43:59 <jaypipes> efried: I know I'm in the minority and am just bitching.
14:44:22 <efried> Well then, let's get the reshaper series finished up and reviewed, and we can buck the process when we've demonstrated that the code is ready, howzat?
14:44:36 <cdent> jaypipes: I don't know that you're in the minority, I think it is rather that Dan and Matt and Mel have already stated their opinion and you weren't there
14:44:53 <cdent> (not there for good reasons)
14:45:01 <jaypipes> efried: heh, good luck with that :) but yes, I'm happy to review and fix up the reshaper series as needed. even though I know it doesn't have a chance of landing before October.
14:45:10 <cdent> _plenty_ of people think the rigid release cycles are cray cray
14:45:13 <efried> Code is ready except for the last patch, which needs tests. If we want to get aggressive about this, I can have that patch finished up in the next couple of days.
14:45:35 <jaypipes> cdent: it's not just dan, matt and melanie (and I don't blame them for anything). it's just the flow of OpenStack. it is what it is.
14:45:52 * cdent blanches
14:45:59 <cdent> I hate the phrase "it is what it is"
14:46:01 <jaypipes> efried: I can commit to reviewing as much as is pushed up to gerrit.
14:46:11 <jaypipes> cdent: yeah, I know. sorry :(
14:46:53 <efried> jaypipes: Series starts here: https://review.openstack.org/#/c/576927/ -- 8 patches, 7 of which are complete and ready for review.
14:47:49 <jaypipes> efried: ack, I'll review all today.
14:48:05 <efried> Cool, I'll work on that last patch.
14:48:12 <efried> which, of course, will be the most complicated :)
14:48:32 <efried> So, anything else for today?
14:49:22 <jaypipes> not from me.
14:49:51 <efried> ight, laters y'all
14:49:53 <efried> #endmeeting