14:00:19 <efried> #startmeeting nova_scheduler
14:00:19 <openstack> Meeting started Mon Oct  8 14:00:19 2018 UTC and is due to finish in 60 minutes.  The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:23 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:33 <takashin> o/
14:00:37 <jaypipes> o/
14:01:07 * gibi cannot join this time
14:01:19 * efried strikes gibi from agenda
14:02:21 <efried> Bueller? Bueller?
14:02:41 <efried> guess we'll get started and let people wander in.
14:02:47 <efried> #link agenda https://wiki.openstack.org/wiki/Meetings/NovaScheduler#Agenda_for_next_meeting
14:03:00 <efried> #topic last meeting
14:03:00 <efried> #link last minutes: http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-01-14.00.html
14:03:00 <efried> Any old business?
14:03:08 <mriedem> o/
14:03:48 <efried> #topic specs and review
14:03:48 <efried> #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-October/135475.html
14:03:59 * bauzas waves a bit late
14:04:20 <efried> Anything to call out from the pupdate? (Will talk about extraction a bit later)
14:04:58 <efried> #link Consumer generation & nrp use in nova: Series now starting at https://review.openstack.org/#/c/605785/
14:04:58 <efried> No longer in runway. Was gonna ask gibi the status of the series, but he's not attending today.
14:05:10 <efried> Bottom patch has some minor fixups required
14:05:18 <efried> an interesting issue raised by tetsuro
14:05:54 <mriedem> i need to look at that one,
14:06:02 <mriedem> since i talked with gibi about the design before he wrote it
14:06:13 <efried> which is that we can't tell whether the destination is nested in many cases until after we've already decided to schedule to it.
14:06:31 <bauzas> also some concern about how the filters could check the computes in case allocation candidates are only about nested RPs
14:06:33 <efried> which means we don't know whether we need to run the scheduler until... we've run the scheduler.
14:06:50 <mriedem> i have the same problem with the resize to same host bug
14:07:20 <efried> well, if you know the host already, you can go query that host to see if he's nested. And if so, you have to run the scheduler.
14:07:24 <bauzas> I had a concern on changing the behaviour in https://review.openstack.org/#/c/605785/9/nova/compute/api.py@4375
14:07:56 <bauzas> if we want to call the scheduler anyway, we should have a new microversion IMHO
14:08:19 <mriedem> bauzas: gibi and i had talked about the behavioral changes, but we didn't think a new microversion would be needed here,
14:08:31 <mriedem> but it's messy i agree,
14:08:43 <bauzas> heh
14:08:47 <bauzas> huh* even
14:08:48 <mriedem> we already broke the force behavior in pike when we made sure we could claim allocations for vcpu/disk/ram on force
14:08:59 <mriedem> here we're breaking that if nested
14:09:11 <mriedem> the more we depend on claims in the scheduler, the less we can honor force
14:09:21 <bauzas> if we want to stop forcing a target (which I'm fine with), I just think we should still signal it for operators
14:09:41 <efried> Can we add a column to the hosts table caching whether the host uses nested/sharing?
14:09:52 <bauzas> like, you wanna still not call the scheduler? fair enough, just don't ask for 2.XX microversion
14:10:05 <bauzas> >2.XX even
14:10:11 <jaypipes> why does it matter if we go from a non-nested host to a nested host? I mean, if the nested host supports the original requested resources and traits, who cares?
14:10:17 <mriedem> i don't think we want to allow people to opt into breaking themselves
14:10:19 <efried> bauzas: But if we don't call the scheduler, we literally *can't* schedule to a nested host
14:10:43 <bauzas> efried: how can I target a nested resource provider?
14:10:49 <bauzas> could someone give me examples ?
14:10:52 <efried> jaypipes: a) How would you know if it does? b) if any of the resources are in child providers, you need GET /a_c to give you a proper allocation request.
14:11:03 <bauzas> operators target compute services
14:11:05 <jaypipes> efried: and?
14:11:24 <efried> and that (calling GET /a_c rather than just cloning the alloc onto the dest) is a behavior change.
14:11:35 <mriedem> we should probably table this until gibi is around to talk about it, because i know he and i talked about a bit of this before he started this code
14:11:36 <jaypipes> efried: if the scheduler returns a destination, we use it. who cares if the resources ended up being provided by child providers or not.
14:12:01 <efried> that's the point. The scheduler returns a destination if we call the scheduler.
14:12:10 <efried> We're talking about a code path where previously we *didn't* call the scheduler.
14:12:12 <efried> IIUC.
14:12:15 <mriedem> jaypipes: the question is when you force and bypass the scheduler
14:12:32 <jaypipes> ah... force_host rears its ugly-ass head yet again.
14:12:37 <mriedem> yes
14:12:41 <bauzas> not force_hosts
14:12:48 <mriedem> same idea
14:12:51 <bauzas> force_hosts is only for boot
14:12:53 <mriedem> i think we should table until gibi is around
14:12:57 <bauzas> but it's calling the scheduler
14:12:58 <efried> yeah.
14:13:20 <mriedem> i could try to dig up our irc conversation but it'd be hard probably
14:13:21 <efried> or we could just proceed, and make a big decision that affects his whole world for the next six months.
14:13:25 <bauzas> compared to livemigrate/evacuate where you literally can bypass the scheduler
14:13:27 <jaypipes> I guess I still don't see why we care. If the destination host (forced or not) supports the original request, why do we care?
14:13:49 <efried> chicken/egg. We don't know if it supports the original request unless we call the scheduler algo to find that out.
14:14:04 <mriedem> well, we claim outside of the scheduler
14:14:05 <efried> I'm not sure to what extent ops expect "force" to mean "don't call the scheduler" though.
14:14:06 <mriedem> today
14:14:07 <bauzas> I still don't get why we're concerned by nested resource providers being targets
14:14:08 <jaypipes> efried: why can't we ask the destination host in pre-live-migrate?
14:14:23 <mriedem> like i said, we already sort of broke the live migration 'force' parameter in pike,
14:14:31 <mriedem> when conductor started claiming
14:14:34 <bauzas> efried: since the live-migrate API already exists AFAIK
14:14:54 <bauzas> mriedem: shit, I missed that then
14:14:59 <efried> bauzas: If any of the resources that we need come from nested providers, we must use GET /a_c to come up with a proper allocation request.
14:15:31 <bauzas> efried: isn't that a bit related to the concern I had about candidates being only on nested resource providers?
14:15:44 <bauzas> we somehow need to know which root RP we're talking about
14:16:12 <mriedem> bauzas: see https://review.openstack.org/#/c/605785/9/nova/conductor/tasks/live_migrate.py@132 and scheduler_utils.claim_resources_on_destination for history
14:16:44 <efried> so, tabling until we can involve gibi. Moving on.
14:16:49 <mriedem> +1
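A minimal sketch of the point being tabled above, assuming hypothetical placement-client helpers (get_provider_tree and get_allocation_candidates below are stand-ins, not real nova/placement client signatures): once a forced destination models resources on nested child providers, cloning the source allocations onto the destination root provider no longer yields a valid allocation, so placement has to be consulted even on the "forced" path.

```python
# Illustrative only -- not nova code. Helper names and signatures are assumed.

def allocations_for_forced_dest(source_resources, dest_root_uuid, placement):
    """Decide how to build allocations when the operator forced a destination."""
    tree = placement.get_provider_tree(dest_root_uuid)  # hypothetical helper

    if not tree.has_children():
        # Flat (non-nested) destination: the old shortcut still works --
        # clone the source's resource amounts straight onto the root provider,
        # which is roughly what claim_resources_on_destination does today.
        return {dest_root_uuid: {'resources': source_resources}}

    # Nested destination: only GET /allocation_candidates can say which child
    # provider (NUMA node, PF, ...) should supply each resource class, so the
    # scheduler/placement must be involved even though the host was "forced".
    candidates = placement.get_allocation_candidates(  # hypothetical helper
        resources=source_resources, in_tree=dest_root_uuid)
    return candidates[0] if candidates else None
```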
14:16:59 <efried> Extraction
14:16:59 <efried> Info in the pupdate ---^
14:16:59 <efried> cdent out this week. edleafe, mriedem, status?
14:17:14 <efried> Oh, Ed isn't around either. It's all on you mriedem
14:17:34 <mriedem> umm
14:17:40 * mriedem looks
14:17:59 <mriedem> https://review.openstack.org/#/c/604454/ is the grenade patch which is passing,
14:18:13 <mriedem> cdent updated that with the proper code to create the uwsgi placement-api config
14:18:31 <efried> #link https://review.openstack.org/#/c/604454/ is the grenade patch which is passing
14:18:31 <mriedem> the devstack change that depends on it is still failing though https://review.openstack.org/#/c/600162/
14:18:46 <efried> #link the devstack change that depends on it https://review.openstack.org/#/c/600162/
14:18:55 <efried> This is the $PROJECTS issue?
14:18:59 <mriedem> tl;dr there are other jobs that devstack runs which aren't cloning the placement repo yet,
14:19:09 <mriedem> i have patches up for that, but they aren't passing and i haven't dug into why yet
14:19:23 <mriedem> yeah https://review.openstack.org/#/c/606853/ and https://review.openstack.org/#/c/608266/
14:19:35 <bauzas> I have good news for extraction
14:20:04 <bauzas> https://review.openstack.org/#/c/599208/ has been tested and works on a physical machine with pGPUs
14:20:22 <mriedem> efried: looks like my d-g patch for updating $PROJECTS passed, just failed one test in tempest
14:20:26 <mriedem> so just rechecks it looks like
14:20:27 <bauzas> next step will be to write some functional test mocking this ^
14:20:42 <efried> nice
14:20:58 <jaypipes> bauzas: nice.
14:21:11 <efried> bauzas: That's more of a reshape nugget than extraction, though?
14:21:32 <jaypipes> efried: we agreed that that was a requirement for extraction.
14:21:36 <bauzas> efried: well, I thought we agreed on this being a priority for the extraction :)
14:21:36 <efried> oh, I guess we said we were going to want ... yeah
14:21:53 <efried> I forget why, actually.
14:22:02 <bauzas> anyway
14:22:09 <jaypipes> let's not rehash that.
14:22:12 <efried> oh, right, it was a requirement for the governance split
14:22:22 <efried> not for getting extracted placement working.
14:22:26 <efried> cool cool
14:22:30 <jaypipes> I have a spec topic...
14:22:37 <efried> anything else on extraction?
14:22:44 <mriedem> tl;dr it's close
14:22:49 <efried> sweet
14:22:50 <mriedem> for the grenade/devstack ci/infra bits
14:23:31 <bauzas> I need to disappear, taxi driving my kids from school
14:23:42 <efried> jaypipes: Want to go now or after the other spec/review topics?
14:24:02 <jaypipes> so I have repeatedly stated I am not remotely interested in pursuing either https://review.openstack.org/#/c/544683/ or https://review.openstack.org/#/c/552105/. I was under the impression that someone (Yikun maybe?) who *was* interested in continuing that work was going to get https://review.openstack.org/#/c/552105/ into a state where people agreed on it (good luck with that), but as of now, I've seen little action on it other than
14:24:03 <jaypipes> negative reviews.
14:24:21 * efried click click click
14:24:35 <mriedem> jaypipes: yeah yikun has been busy with some internal stuff after a re-org,
14:24:36 <jaypipes> so my question is should I just abandon both of the specs and force the issue?
14:24:48 <mriedem> i can send an email to see what's going on and if we still care about those
14:24:53 <jaypipes> k, thx
14:25:54 <efried> This could relate to the next-next topic on the agenda actually.
14:26:39 <efried> we were talking about using the file format proposal embedded in the
14:26:39 <efried> #link device passthrough spec https://review.openstack.org/#/c/591037/
14:26:39 <efried> as a mechanism to customize provider attributes (prompted by the belmoreira min_unit discussion)
14:26:59 <efried> jaypipes agreed to review ^ with that in mind
14:27:03 <jaypipes> efried: yes.
14:27:26 <jaypipes> efried: I have found it very difficult to review. will give it another go this morning.
14:27:29 <efried> The "initial defaults" thing is still weird.
14:27:48 <efried> and not addressed in there (yet)
14:28:03 <efried> bauzas suggested to split out the part of the spec that talks about the file format, and do the device passthrough aspect on its own.
14:28:24 <efried> Which sounds like a good idea to me, considering the various ways we've talked about using it.
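For illustration, a hedged sketch of the "file as provider customization" idea under discussion; the file layout and keys below are invented for this example, not the format proposed in the linked spec. The point is that operator-specified attributes such as min_unit would overlay whatever inventory the virt driver reports.

```python
# Hypothetical example only; the real format would come from the spec above.
import yaml

def apply_provider_overrides(driver_inventory, descriptor_path):
    """Overlay operator-supplied inventory fields (e.g. min_unit) onto the
    inventory the virt driver reported for a provider."""
    with open(descriptor_path) as f:
        overrides = yaml.safe_load(f) or {}

    for rc, fields in overrides.get('inventories', {}).items():
        if rc in driver_inventory:
            driver_inventory[rc].update(fields)  # e.g. {'min_unit': 8}
    return driver_inventory
```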
14:30:08 <efried> okay, moving on.
14:30:24 <efried> last week, the
14:30:25 <efried> #link HPET discussion http://lists.openstack.org/pipermail/openstack-dev/2018-October/135446.html
14:30:25 <efried> led to an interesting precedent on using traits for config
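A minimal sketch of the "trait as config" pattern from that thread, with illustrative trait and extra-spec names rather than a statement of what nova implements: the same required trait that steers scheduling toward a capable host is re-read by the driver to turn the guest feature on.

```python
# COMPUTE_TIME_HPET and the trait: extra-spec prefix are used here only to
# illustrate the pattern discussed in the linked thread.

def guest_wants_hpet(flavor_extra_specs, image_properties):
    """True if the request asked for the HPET trait, in which case the driver
    would enable the HPET timer in the guest it is about to build."""
    return (flavor_extra_specs.get('trait:COMPUTE_TIME_HPET') == 'required'
            or image_properties.get('trait:COMPUTE_TIME_HPET') == 'required')
```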
14:30:31 <jaypipes> another spec ... I pushed a new rev on https://review.openstack.org/#/c/555081/
14:30:38 <jaypipes> (cpu resource tracking)
14:30:59 <efried> #link CPU resource tracking spec https://review.openstack.org/#/c/555081/
14:33:03 <efried> any discussion on traits-for-config or CPU resource tracking?
14:33:25 <efried> any other specs or reviews to discuss?
14:33:29 <mriedem> i personally hope that cpu resource tracking is not something we pursue for stein
14:33:35 <mriedem> while we're still trying to land reshaper et al
14:34:13 <mriedem> reshaping all instances on all compute nodes is going to be rough during upgrade
14:34:16 <mriedem> unless we can do that offline
14:35:56 <jaypipes> mriedem: so let's hold off getting new clean functionality so that upgrades can be prolonged even longer until end of 2019?
14:36:21 <mriedem> yes?
14:36:26 <dansmith> I feel like we've been putting off numa topo in placement a while now
14:36:47 <mriedem> i think getting reshaper and bw-aware scheduling and all that stuff has been around long enough that we need to get those done first
14:36:50 <dansmith> so I don't disagree that it's going to be a big reshape, but.. dang, we've been working towards it for a while now and..
14:37:01 <jaypipes> mriedem: I don't get the argument that adding another data migration (reshape action) makes upgrades harder than having just one to do in a release cycle.
14:37:04 <mriedem> i would just like fewer things to worry about
14:37:24 <dansmith> if we end up with something for gpus that requires compute nodes to be online,
14:37:33 <dansmith> it'd be massively better for FFU to have both of those in the same release
14:37:45 <dansmith> vs. two different (especially back-to-back) releases
14:38:21 <mriedem> do we need the computes online for the cpu resource tracking upgrade?
14:38:27 <dansmith> yes
14:38:50 <dansmith> they have to do it themselves, I think, because only they know where and what the topology is
14:39:18 <jaypipes> dansmith: right, unless we go with a real inventory/provider descriptor file format.
14:39:34 <dansmith> jaypipes: well, that just pushes the problem elsewhere.. you still have to collect that info from somewhere
14:39:50 <jaypipes> dansmith: it's already in the vast majority of inventory management systems.
14:39:57 <efried> waitwait, the *admin* is going to be responsible for describing NUMA topology? It's not something the driver can discover?
14:40:14 <dansmith> efried: we should have the driver do it for sure
14:40:22 <jaypipes> efried: the operator is ultimately responsible for *whether* a compute node should expose providers as a tree.
14:40:22 <efried> phew
14:40:28 <dansmith> jaypipes: but we can't just require the operator to have that and build such a mapping, IMHO
14:40:30 <dansmith> but even still,
14:40:54 <mriedem> why would operators care how we model things internally?
14:41:01 <jaypipes> efried: a lot of operators don't want or need to deal with NUMA. they just have needs for dedicated CPU and shared CPU resources and don't care about NUMA.
14:41:14 <efried> Yeah, I can live with a "use numa or not" switch.
14:41:18 <dansmith> the driver is the only one that can decide how existing allocations map to that information, IMHO, so unless you want to run the driver against the DB from a central node... even still, there are numa pinnings that the driver has done already we need to know about
14:41:36 <efried> I was just afraid you were talking about requiring the op to say "and this CPU is in NUMA node 1, and this CPU is in NUMA node 2 and..."
14:41:45 <dansmith> mriedem: they don't, that's why making them write a topo description for each (type of) compute node to do this migration would be mega-suck
14:41:58 <dansmith> efried: I think that's what jaypipes is saying
14:42:03 <dansmith> and I think that's not reasonable
14:42:17 <dansmith> efried: I don't think a "numa or not" switch is reasonable either, FWIW
14:42:20 <jaypipes> dansmith: ops *already* have that. they all have hardware profiles which describe the different types of hardware they provide to guests.
14:42:21 <dansmith> they just want it to work
14:42:45 <dansmith> jaypipes: all ops do not have that
14:42:51 <dansmith> jaypipes: but even still, they don't have the information about what numa allocations we've already done for existing instances
14:43:08 <jaypipes> dansmith: agreed completely with that last statement.
14:43:34 <efried> With a generic inventory/provider descriptor file, you could allow the op to override/customize. But I would think we would want the default to be automatic detection/configuration resulting in at least a sane setup.
14:43:42 <jaypipes> it's a shame the guest NUMA topology and CPU pinning were implemented as such a tightly coupled blobject mess.
14:43:50 <mriedem> while i agree it would be best if we can do all the reshapes we know we need to do in the same release to ease the pain, i just wanted to state that i'm worried about trying to bite this off in stein with everything else that's going on
14:44:11 <dansmith> mriedem: there's risk there for sure, no doubt
14:44:29 <efried> We also still can't do generic affinity without a placement API change, just to bring that up again.
14:44:31 <dansmith> I'm not saying it's critical, I'm just saying writing it off now seems like a bad idea to me
14:45:02 <mriedem> i'll admit the only part of that spec i've read is the upgrade impact
14:45:10 <mriedem> then i had to go change my drawers
14:45:22 <dansmith> mriedem: I guess I'm not sure why that's a surprise at this point,
14:45:27 <mriedem> will artom's stuff depend on this?
14:45:29 <dansmith> but maybe I have just done more thinking about it
14:45:40 <mriedem> artom's stuff = numa aware live migration
14:45:52 <dansmith> mriedem: artom's stuff kinda conflicts with this.. if this was done his stuff would be easier I think
14:45:55 <mriedem> dansmith: yeah i've just avoided thinking about this
14:46:40 <mriedem> ok, i need to get updated on what he plans to do with that as well
14:46:56 <mriedem> anyway, i'll be quiet now
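To make the upgrade concern concrete, a rough sketch (plain dicts with assumed shapes, not the virt driver or ProviderTree API) of the kind of reshape the CPU spec implies: CPU inventory moves from the compute-node root provider to per-NUMA-node child providers, and every existing instance allocation has to be rewritten to point at the right child, which is why the computes themselves need to be online to do it.

```python
# Assumed data shapes: inventories and allocations are plain dicts keyed by
# provider name; numa_cells is a list of (cell_id, cpu_count, pinned_consumers).

def sketch_cpu_reshape(inventories, allocations, compute_rp, numa_cells):
    vcpu_inv = inventories[compute_rp].pop('VCPU', None)
    if vcpu_inv is None:
        return  # nothing to reshape on this node

    for cell_id, cpu_count, pinned_consumers in numa_cells:
        child = '%s_numa_%d' % (compute_rp, cell_id)
        inventories[child] = {'VCPU': dict(vcpu_inv, total=cpu_count)}

        # Only the compute knows which instances are pinned to which cell,
        # so only it can move the existing allocations to the right child.
        for consumer in pinned_consumers:
            alloc = allocations[consumer]
            alloc[child] = {'VCPU': alloc[compute_rp].pop('VCPU')}
            if not alloc[compute_rp]:
                del alloc[compute_rp]
```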
14:48:35 <efried> dansmith, jaypipes: any last words?
14:48:46 <dansmith> no.
14:49:36 <efried> Home stretch
14:49:36 <efried> #topic bugs
14:49:36 <efried> #link Placement bugs https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:49:41 <efried> any bugs to highlight?
14:49:43 <jaypipes> efried: go Browns?
14:50:12 <mriedem> ugliest FG ever
14:51:43 <efried> Horns to 5-1 by a toenail. Khabib face cranks Connor to a tap, then attacks his training team. Derek Lewis's balls are hot. Other weekend sports news?
14:51:56 <efried> I guess we're really in
14:51:56 <efried> #topic opens
14:51:58 <dansmith> is that real sports news?
14:52:02 <mriedem> yes
14:52:09 <dansmith> hah. okay, sounded made up
14:52:12 <mriedem> https://deadspin.com/khabib-nurmagomedov-taps-out-conor-mcgregor-attacks-co-1829580622
14:52:14 <dansmith> shows how much I know
14:52:51 * bauzas waves again
14:53:15 <efried> and https://www.youtube.com/watch?v=F_E6jXHMPs4
14:53:18 <efried> okay, anything else?
14:53:26 * edleafe arrives super-late
14:53:57 <efried> edleafe: Anything to bring up before we close?
14:54:48 <bauzas> we also had https://www.youtube.com/watch?v=KgwmPhAu0tc
14:55:31 <efried> Thanks y'all
14:55:31 <efried> #endmeeting