#openstack-meeting-alt log

14:00:10 <efried> #startmeeting nova-scheduler
14:00:11 <openstack> Meeting started Mon Oct 29 14:00:10 2018 UTC and is due to finish in 60 minutes.  The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:14 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:18 <takashin> o/
14:00:34 <edleafe> \o
14:01:10 <alex_xu> o/
14:02:17 <jaypipes_> o/
14:02:43 <jaypipes_> efried, edleafe: welcome, Red Hatters.
14:03:04 <efried> Yeah, I heard something yesterday from one of my students, what's that about?
14:03:11 <edleafe> I wonder if we have to start wearing Blue fedoras to tell us apart
14:03:12 <mriedem> o/
14:03:37 <edleafe> efried: https://www.cnbc.com/2018/10/28/ibm-is-reportedly-nearing-deal-to-acquire-red-hat.html
14:03:39 <alex_xu> ha
14:03:50 <gibi> o/
14:03:59 <jaypipes_> edleafe: ++ :)
14:04:11 <efried> Oh, well, as long as we're resetting the cloud landscape.
14:04:24 <efried> that metaphor is... broken.
14:04:44 <jaypipes_> efried: I'm sure the execs have found some synergistic realities.
14:05:03 <efried> They're always aligning something strategic.
14:05:19 <edleafe> We're already maximizing potential interactions
14:05:53 <efried> I keep waiting for someone to say something like "following up collaborative knowledge"
14:06:16 <efried> okay, let's get started.
14:06:28 <efried> #link agenda https://wiki.openstack.org/wiki/Meetings/NovaScheduler#Agenda_for_next_meeting
14:06:42 <efried> #topic last meeting
14:06:42 <efried> #link last minutes: http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-22-14.00.html
14:06:42 <efried> Any old business?
14:07:22 <efried> #topic specs and review
14:07:23 <efried> #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-October/136095.html
14:07:23 <efried> ^^ Nice list of specs, in case you missed last Tuesday (or even if you didn't)
14:07:38 <efried> #link nrp use in nova: Series now starting at https://review.openstack.org/#/c/604125/
14:07:38 <efried> (In a runway)
14:07:47 <efried> Any specs or reviews to discuss here?
14:08:00 <gibi> one thing
14:08:04 <efried> (besides extraction stuff, which now has its own section)
14:08:14 <efried> gibi: shoot
14:08:31 <gibi> this is the last patch of the use-nested-allocation-candidates https://review.openstack.org/#/c/608298/
14:08:44 <gibi> i haven't updated the unit tests so those are failing
14:08:48 <gibi> but the functional tests are passing
14:09:06 <gibi> I'd like to get some initial feedback on my design decision before I dig into the unit tests
14:09:24 <gibi> as I foresee some debate about the proposed technical solution
14:09:45 <gibi> it touches the Selection codepath in the scheduler and that is pretty complex to me
14:10:23 <efried> okay. That's been on my list to look at for... a while now :(
14:10:39 <gibi> thanks. that's all I wanted to add
14:12:06 <efried> anyone else, anything else?
14:12:36 <mriedem> yeah
14:12:38 <mriedem> i've got a bug
14:12:42 <mriedem> do i wait?
14:13:08 <efried> there's a bug segment. So yeah, I guess.
14:13:21 <efried> #topic Extraction
14:13:21 <efried> #link Extraction etherpad https://etherpad.openstack.org/p/placement-extract-stein-4
14:13:42 <efried> cdent, edleafe, mriedem: word?
14:14:14 <mriedem> i put a call out into the dev ML asking for tripleo / osa people to write upgrade poc
14:14:25 <edleafe> I've been working on integrating alembic. Found some issues with the tests I copied from Ironic
14:14:35 <jaypipes_> gibi: ack, will review today.
14:14:37 <mriedem> http://lists.openstack.org/pipermail/openstack-dev/2018-October/136075.html
14:14:50 <mriedem> no responses from tripleo/osa yet
14:15:37 <mriedem> the libvirt/xenapi reshape patches are in a runway now
14:16:47 <mriedem> and the FFU entry point for reshape is still MIA
14:17:15 <mriedem> which i think just uses PlacementDirect and does the same thing that the RT does right?
14:17:24 <mriedem> but runs per compute
14:17:37 <mriedem> unlike all other online data migrations
14:18:15 <efried> mriedem: I thought we had a corollary with something ironic-ish in the past?
14:18:34 <mriedem> that called the ironic driver, but didn't need to be per-compute i don't think
14:18:38 <mriedem> it just re-used the code to hit the db
14:18:48 <mriedem> it was per-cell, not per compute
14:18:55 <efried> Okay. In any case, is dansmith still on the hook to write that thing?
14:19:07 <mriedem> idk, i assume he's busy
14:20:30 <mriedem> anyway, those seem to be the random extraction things
14:20:48 <mriedem> we'll have a forum session to checkpoint in 2 weeks
14:21:21 <mriedem> https://www.openstack.org/summit/berlin-2018/summit-schedule/events/22780/update-on-placement-extraction-from-nova
14:21:46 <cdent> sorry I'm late, DST fail
14:22:33 <efried> cdent: Any extraction news beyond alembic (in progress), OSA/TripleO (crickets), FFU script (??)
14:23:10 <efried> and libvirt/xenapi reshape patches (in runway)
14:23:23 <cdent> I know nothing about FFU script. Is anyone even on the hook for that?
14:23:55 <cdent> the gabbi-based integration tests are basically ready, pending the merge of the devstack change, pending the existence of a good db_sync-like CLI
14:25:07 <cdent> who is it that wants the FFU script?
14:26:33 <efried> Originally dansmith volunteered. I guess I'd like to hear him say he's not doing it before we look for another body.
14:26:44 <dansmith> I'm a little busy at the moment
14:27:23 <efried> Meaning you'll get to it later, or we should find another body?
14:27:31 <cdent> I don't mean who within nova/placement: I mean who/what is the source of the demand for a FFU script?
14:27:34 <jaypipes> efried: probably the latter.
14:28:59 <mriedem> cdent: to do the reshapes during an FFU
14:29:11 <mriedem> b/c otherwise you need nova-compute running to reshape on startup
14:29:32 <cdent> mriedem: I know that, but who (which project, what sig, which person) cares about fast forward upgrades?
14:29:40 <mriedem> upgrades sig?
14:29:41 <mriedem> ops?
14:29:44 <efried> does that mean you don't actually need the script until you FFU from <stein to >rocky?
14:29:49 <efried> sorry
14:30:02 <efried> <stein to >stein, e.g. rocky to train
14:30:37 <mriedem> i think you want it in whatever release adds the thing that needs to be upgraded
14:30:47 <mriedem> like all other data migrations
14:31:07 <efried> okay, when you say "in", you mean the code being part of that release branch.
14:31:21 <mriedem> yeah
14:31:22 <efried> But nobody is actually going to execute it until after train is released
14:31:33 <mriedem> i'm not saying that
14:31:39 <efried> I'm just trying to understand the flow.
14:31:54 <mriedem> should ask lyarwood then probably
14:32:09 <efried> If you FFU *to* stein, and reshape stuff doesn't run, and then you start up your computes, reshape stuff will happen at that point, yah?
14:32:14 <mriedem> you mean if you stop at stein, you don't need the FFU script b/c the compute startup will handle the reshape right?
14:32:17 <efried> correct.
14:32:49 <efried> I'm not advocating, just saying if that's true then technically we could backport the FFU code some time during the train dev cycle.
14:32:51 <mriedem> i think getting the reshape patches for libvirt/xen is more important than the FFU script *in stein* yes, but think we should get both done so it's not dropped
14:33:13 <efried> yeah, I don't disagree, just trying to get an idea of worst case
14:34:39 <efried> To summarize: right now we don't have an owner for the reshaper FFU script. We would still like to get that done in stein. Do we... put out an email asking for a volunteer?
14:34:54 <mriedem> sure
14:35:14 <cdent> that seems like the sole strategy if nobody here has the cycles (and/or desire)
14:35:25 <cdent> it at least publicises the need
14:35:44 <efried> mriedem: You want to compose that, since you probably have some idea what to say?
14:35:52 <mriedem> sure
14:36:03 <mriedem> i love recruiting
14:36:15 <cdent> you get the sublime pleasure of sending requests out into the dark void
14:36:24 <mriedem> yeah that worked so well for me last week
14:36:25 <efried> #action mriedem to send ML call for reshape FFU script author
14:36:31 <efried> yeah, was just thinking that.
14:36:44 <cdent> we'll buy you a kitten to compensate
14:36:58 <efried> okay, let's move on.
14:37:11 <efried> #topic bugs
14:37:11 <efried> #link Placement bugs https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:37:11 <efried> mriedem, you talk more now?
14:37:25 <mriedem> yeah i've got https://review.openstack.org/#/c/613126/ for https://launchpad.net/bugs/1799727
14:37:25 <openstack> Launchpad bug 1799727 in OpenStack Compute (nova) "CPU_Allocation_Ratio from nova.conf doesn't update exisiting providers" [High,In progress] - Assigned to Matt Riedemann (mriedem)
14:37:39 <mriedem> which needs to be sorted out before the initial allocation ratio stuff that yikun is working on can proceed (i think)
14:37:59 <mriedem> bauzas and efried left comments in there, and the patch is already a bit fugly logic-wise
14:38:29 <mriedem> with a more clear head this morning, i read efried's alternative solution again to just have the libvirt driver set the allocation_ratios/reserved host inventory values from config in the upt method, and i think that might be the cleanest thing to do
14:38:48 <mriedem> basically RT._normalize_inventory_from_cn_obj does not work with any virt driver that implements update_provider_tree
14:38:56 <mriedem> which right now is ironic and libvirt
14:39:27 <mriedem> _normalize_inventory_from_cn_obj in master doesn't care about ironic b/c ironic in stein doesn't report standard resource class inventory and _normalize_inventory_from_cn_obj is only about vcpu/ram/disk
14:39:38 <mriedem> it does in rocky though, where i need to backport this
14:40:14 <efried> Yeah, I'm not loving the fact that there will be mostly-duplicated code in all (or most?) drivers' upt methods, but trading that off, I think it's cleaner than trying to factor it out so that inventory data is *partly* managed by upt and *partly* centralized in the rt.
14:40:30 <mriedem> so i'm thinking maybe go with efried's suggestion and just have the libvirt driver's upt method set the allocation ratio/reserved values from config and add something to the upt docstring interface saying if you implement this method you should set those values in the inventory dicts (from config if that's how you get them)
14:40:47 <cdent> yeah, fixing the split of control would be nice
14:40:59 <jaypipes> mriedem: and remove the resource tracker from setting any of those ratios?
14:41:06 <efried> yes ^
14:41:22 <mriedem> jaypipes: well, right now we don't need that for anything besides the libvirt driver
14:41:31 <mriedem> but i guess that could be a follow up
14:41:51 <efried> I have an almost-merged patch for powervm upt
14:41:51 <jaypipes> mriedem: if that's the case, then who will be responsible for reading and using the initial allocation ratios? the virt drivers?
14:41:59 <efried> jaypipes: yes ^
14:42:15 <jaypipes> efried: not the "normal" ratios. the initial ones.
14:42:40 <mriedem> that just means if the allocation_ratios are not already in the provider tree when upt is called, we set the initial ratios right?
14:42:56 <efried> zooming out, the virt driver is responsible for provider inventories, period. (Idealized simplification)
14:43:10 <cdent> efried++
14:43:41 <jaypipes> mriedem: k, cool with me.
14:43:50 <efried> Perhaps there's even some magical future where the virt driver bootstraps the initial compute RP
14:44:03 <mriedem> i'm not working on the initial allocation ratios stuff, yikun is, and it's in a runway slot
14:44:09 <efried> instead of having the rt do it and then the virt driver has to mung it to make it right.
14:44:31 <mriedem> so i'm not sure how much duplication this would all introduce in the virt drivers
14:44:38 <jaypipes> quite a bit.
14:44:49 <mriedem> if it's really dumb duplication, it could be done in the parent ComputeDriver class with a helper method maybe
14:44:57 <cdent> mriedem++
14:44:57 <efried> I *guess* it might be okay to provide base ComputeDriver helper methods that the UPTs can call?
14:45:01 <jaypipes> but cannot be avoided, UIAM
14:45:05 <efried> yeah, jinx
14:45:32 <jaypipes> mriedem: that helper method is basically what normalize_inventory_from_cn was.
14:45:47 <mriedem> jaypipes: yeah i know
14:45:54 <efried> jaypipes: Yes; except now it will be the responsibility of the virt driver to call it as/when appropriate
14:46:01 <jaypipes> mriedem: the content at least.. obviously it wasn't part of an inheritable virt driver API
14:46:06 <efried> rather than having to sweep up the mess after the RT calls it blindly.
14:46:20 <jaypipes> efried: ++
14:46:22 <mriedem> right i'm in a weird "do this unless x or y but not for z" situation
14:46:29 <mriedem> which is hurting my head
14:46:30 <efried> yup
14:46:39 <jaypipes> mriedem: which is why I dropped those specs. :(
14:46:40 <efried> sounds like we have consensus here, cool.
14:46:49 <mriedem> so i'll just do the simple thing for the bug fix to backport,
14:47:01 <mriedem> and we can hash out removing the normalize method and initial ratios in a follow up
14:47:01 <jaypipes> mriedem: though mostly I dropped them because we couldn't agree on upgrade paths.
14:47:03 <efried> mriedem:  what's the simple thing?
14:47:18 <mriedem> the libvirt driver's upt method will set the allocation ratio and reserved host values
14:47:26 <jaypipes> mriedem: +1
14:47:35 <mriedem> and i guess the fake virt driver i'm using in the functional recreate test...
14:47:46 <jaypipes> mriedem: but, but, but what about existing compute nodes with wrong values?
14:47:55 <mriedem> those don't exist
14:48:01 <jaypipes> ha, right..
14:48:44 <mriedem> this is essentially just moving the normalize stuff into the libvirt driver's upt method
14:48:58 <mriedem> so behavior wise it should be the same as before the regression
14:49:18 <mriedem> oh but....
14:49:24 <jaypipes> mriedem: the whole thing with the 0.0 default values and doing stuff in the ComputeNode object was all about "fixing" compute nodes that had wrong allocation ratios, no? For "old compute nodes before we moved CONF.xxx_allocation_ratio out from the scheduler"?
14:49:26 <mriedem> we don't have the CN object in the upt method
14:49:38 <efried> mriedem: We shouldn't need it, should we?
14:49:46 <efried> If we're setting from conf?
14:49:50 <mriedem> the CN object is what currently provides the real default values via the facade
14:49:57 <mriedem> today, CONF.cpu_allocation_ratio defaults to 0.0
14:49:58 <jaypipes> right... thus the problem.
14:50:11 <mriedem> so the driver's upt method would have to do CONF.cpu_allocation_ratio or 16.0
14:50:12 <efried> gross
14:50:30 <mriedem> with a big # TODO(mriedem): Fix this with the initial allocation ratio stuff...
14:50:47 <mriedem> if we're going to use sane conf options to fix this anyway, that todo isn't a big deal is it?
14:50:57 <efried> So those are hardcoded in the bowels of the rt; they'll have to be hardcoded into the upt methods while we transition. Do I understand correctly?
14:51:13 <mriedem> yes
14:51:18 <efried> k
14:51:20 <mriedem> rt/cn
14:51:55 <efried> #agreed It is the responsibility of the virt driver's update_provider_tree to set reserved/allocation_ratio values, initial and otherwise, rocky and master and future. _normalize_inventory_from_cn shall DIAF. Temporarily hardcode initial values while we transition to a cleaner, saner world.
14:52:23 <efried> fair summary?
14:52:39 <mriedem> yeah i think so
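The agreed approach can be illustrated with a rough sketch of the kind of shared helper discussed above, which a virt driver's update_provider_tree could call to stamp allocation_ratio and reserved onto its standard inventories from config. Everything here is hypothetical: apply_conf_to_inventory and FALLBACK_RATIOS are not nova APIs, and config values are passed in as plain dicts rather than read from nova's CONF. Only the "CONF.cpu_allocation_ratio or 16.0" fallback comes from the meeting; the MEMORY_MB and DISK_GB fallbacks (1.5, 1.0) are assumed to mirror the ComputeNode facade defaults of that era.

```python
# Hypothetical fallbacks used while CONF.*_allocation_ratio defaults to 0.0.
# Only the VCPU value (16.0) is quoted in the meeting; MEMORY_MB and DISK_GB
# are assumed to mirror the ComputeNode facade defaults of the time.
FALLBACK_RATIOS = {"VCPU": 16.0, "MEMORY_MB": 1.5, "DISK_GB": 1.0}


def apply_conf_to_inventory(inventory, ratios_from_conf, reserved_from_conf):
    """Hypothetical helper a driver's update_provider_tree() could call.

    inventory: resource class -> inventory dict (mutated in place).
    ratios_from_conf: resource class -> configured ratio (0.0 means unset).
    reserved_from_conf: resource class -> configured reserved amount.
    """
    for rc, inv in inventory.items():
        if rc not in FALLBACK_RATIOS:
            # Only the standard classes the resource tracker used to normalize.
            continue
        # TODO: replace this "or" fallback once initial allocation ratios land.
        inv["allocation_ratio"] = ratios_from_conf.get(rc) or FALLBACK_RATIOS[rc]
        inv["reserved"] = reserved_from_conf.get(rc, 0)
    return inventory


if __name__ == "__main__":
    inv = {"VCPU": {"total": 8}, "MEMORY_MB": {"total": 16384}}
    # VCPU ratio left at 0.0 (unset), so the hardcoded 16.0 fallback applies.
    apply_conf_to_inventory(inv, {"VCPU": 0.0, "MEMORY_MB": 2.0},
                            {"MEMORY_MB": 512})
    print(inv)
```

Putting the helper in one place keeps the duplication across drivers small, while the driver still decides when to call it, rather than the resource tracker applying it blindly.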
14:52:56 <efried> any other bugs to discuss?
14:53:19 <cdent> If you're not already aware of the harden wsgi stuff, it is worth a review:
14:53:22 <cdent> #link harden wsgi https://review.openstack.org/#/c/610034/
14:53:30 <cdent> bug 1799246
14:53:30 <openstack> bug 1799246 in OpenStack Compute (nova) "module level init of db transation contexts cause failure under mod_wsgi on module reload" [Medium,In progress] https://launchpad.net/bugs/1799246 - Assigned to sean mooney (sean-k-mooney)
14:55:37 <efried> Okay, five minutes for
14:55:37 <efried> #topic opens
14:55:43 <efried> anyone?
14:55:56 <cdent> naw
14:56:30 <efried> Thanks all. Productive meeting, in the sense that several action items came out of it, and mriedem got most of them :P
14:56:40 <efried> #endmeeting