14:00:19 #startmeeting nova_scheduler
14:00:19 Meeting started Mon Oct 8 14:00:19 2018 UTC and is due to finish in 60 minutes. The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:20 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:23 The meeting name has been set to 'nova_scheduler'
14:00:33 o/
14:00:37 o/
14:01:07 * gibi cannot join this time
14:01:19 * efried strikes gibi from agenda
14:02:21 Bueller? Bueller?
14:02:41 guess we'll get started and let people wander in.
14:02:47 #link agenda https://wiki.openstack.org/wiki/Meetings/NovaScheduler#Agenda_for_next_meeting
14:03:00 #topic last meeting
14:03:00 #link last minutes: http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-01-14.00.html
14:03:00 Any old business?
14:03:08 o/
14:03:48 #topic specs and review
14:03:48 #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-October/135475.html
14:03:59 * bauzas waves a bit late
14:04:20 Anything to call out from the pupdate? (Will talk about extraction a bit later)
14:04:58 #link Consumer generation & nrp use in nova: Series now starting at https://review.openstack.org/#/c/605785/
14:04:58 No longer in runway. Was gonna ask gibi the status of the series, but he's not attending today.
14:05:10 Bottom patch has some minor fixups required
14:05:18 an interesting issue raised by tetsuro
14:05:54 i need to look at that one,
14:06:02 since i talked with gibi about the design before he wrote it
14:06:13 which is that we can't tell whether the destination is nested in many cases until after we've already decided to schedule to it.
14:06:31 also some concern about how the filters could check the computes in case allocation candidates are only about nested RPs
14:06:33 which means we don't know whether we need to run the scheduler until... we've run the scheduler.
14:06:50 i have the same problem with the resize to same host bug
14:07:20 well, if you know the host already, you can go query that host to see if it's nested. And if so, you have to run the scheduler.
14:07:24 I had a concern on changing the behaviour in https://review.openstack.org/#/c/605785/9/nova/compute/api.py@4375
14:07:56 if we want to call the scheduler anyway, we should have a new microversion IMHO
14:08:19 bauzas: gibi and i had talked about the behavioral changes, but we didn't think a new microversion would be needed here,
14:08:31 but it's messy i agree,
14:08:43 heh
14:08:47 huh* even
14:08:48 we already broke the force behavior in pike when we made sure we could claim allocations for vcpu/disk/ram on force
14:08:59 here we're breaking that if nested
14:09:11 the more we depend on claims in the scheduler, the less we can honor force
14:09:21 if we want to stop forcing a target (which I'm fine with), I just think we should still signal it for operators
14:09:41 Can we add a column to the hosts table caching whether the host uses nested/sharing?
14:09:52 like, you wanna still not call the scheduler ? fair enough, just don't ask for 2.XX microversion
14:10:05 >2.XX even
14:10:11 why does it matter if we go from a non-nested host to a nested host? I mean, if the nested host supports the original requested resources and traits, who cares?
14:10:17 i don't think we want to allow people to opt into breaking themselves
14:10:19 bauzas: But if we don't call the scheduler, we literally *can't* schedule to a nested host
14:10:43 efried: how can I target a nested resource provider ?
14:10:49 could someone give me examples ?
14:10:52 jaypipes: a) How would you know if it does? b) if any of the resources are in child providers, you need GET /a_c to give you a proper allocation request.
14:11:03 operators target compute services
14:11:05 efried: and?
14:11:24 and that (calling GET /a_c rather than just cloning the alloc onto the dest) is a behavior change.
14:11:35 we should probably table this until gibi is around to talk about it, because i know he and i talked about a bit of this before he started this code
14:11:36 efried: if the scheduler returns a destination, we use it. who cares if the resources ended up being provided by child providers or not.
14:12:01 that's the point. The scheduler returns a destination if we call the scheduler.
14:12:10 We're talking about a code path where previously we *didn't* call the scheduler.
14:12:12 IIUC.
14:12:15 jaypipes: the question is when you force and bypass the scheduler
14:12:32 ah... force_host rears its ugly-ass head yet again.
14:12:37 yes
14:12:41 not force_hosts
14:12:48 same idea
14:12:51 force_hosts is only for boot
14:12:53 i think we should table until gibi is around
14:12:57 but it's calling the scheduler
14:12:58 yeah.
14:13:20 i could try to dig up our irc conversation but it'd be hard probably
14:13:21 or we could just proceed, and make a big decision that affects his whole world for the next six months.
14:13:25 compared to livemigrate/evacuate where you literally can bypass scheduler
14:13:27 I guess I still don't see why we care. If the destination host (forced or not) supports the original request, why do we care?
14:13:49 chicken/egg. We don't know if it supports the original request unless we call the scheduler algo to find that out.
14:14:04 well, we claim outside of the scheduler
14:14:05 I'm not sure to what extent ops expect "force" to mean "don't call the scheduler" though.
14:14:06 today
14:14:07 I still don't get why we're concerned by nested resource providers being targets
14:14:08 efried: why can't we ask the destination host in pre-live-migrate?
14:14:23 like i said, we already sort of broke the live migration 'force' parameter in pike,
14:14:31 when conductor started claiming
14:14:34 efried: since live-migrate API is existing AFAIK
14:14:54 mriedem: shit, I missed that then
14:14:59 bauzas: If any of the resources that we need come from nested providers, we must use GET /a_c to come up with a proper allocation request.
14:15:31 efried: isn't that a bit related to the concern I had about candidates being only on nested resource providers ?
14:15:44 we somehow need to know which root RP we're talking about
14:16:12 bauzas: see https://review.openstack.org/#/c/605785/9/nova/conductor/tasks/live_migrate.py@132 and scheduler_utils.claim_resources_on_destination for history
14:16:44 so, tabling until we can involve gibi. Moving on.
14:16:49 +1
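A minimal sketch of the chicken/egg problem being tabled above (illustrative only, not nova's code: `placement` stands for a hypothetical placement client and `NeedsScheduler` is invented for the example). The legacy force path clones the source allocations onto the destination's root provider, which only works while nothing is nested; detecting nesting requires asking placement about the destination's provider tree first.

```python
class NeedsScheduler(Exception):
    """Raised when a forced destination can't bypass the scheduler."""


def dest_has_nested_providers(placement, root_rp_uuid):
    # GET /resource_providers?in_tree=<uuid> (placement microversion
    # 1.14+) returns every provider in the tree rooted at the compute
    # node; more than one means nested providers are in play.
    resp = placement.get(
        '/resource_providers?in_tree=%s' % root_rp_uuid, version='1.14')
    return len(resp.json()['resource_providers']) > 1


def forced_dest_allocations(placement, source_allocs, source_rp_uuid,
                            dest_rp_uuid):
    """Build allocations for a forced (no-scheduler) move, if possible."""
    if dest_has_nested_providers(placement, dest_rp_uuid):
        # Some resources may live on child providers, and only GET
        # /allocation_candidates can map resources to the right children,
        # so "force" can't be honored without (part of) the scheduler.
        raise NeedsScheduler()
    # Legacy behavior (cf. scheduler_utils.claim_resources_on_destination):
    # copy the source host's resource amounts onto the destination root.
    return {dest_rp_uuid: {
        'resources': source_allocs[source_rp_uuid]['resources']}}
```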
14:16:59 Extraction
14:16:59 Info in the pupdate ---^
14:16:59 cdent out this week. edleafe, mriedem, status?
14:17:14 Oh, Ed isn't around either.
It's all on you mriedem
14:17:34 umm
14:17:40 * mriedem looks
14:17:59 https://review.openstack.org/#/c/604454/ is the grenade patch which is passing,
14:18:13 cdent updated that with the proper code to create the uwsgi placement-api config
14:18:31 #link https://review.openstack.org/#/c/604454/ is the grenade patch which is passing
14:18:31 the devstack change that depends on it is still failing though https://review.openstack.org/#/c/600162/
14:18:46 #link the devstack change that depends on it https://review.openstack.org/#/c/600162/
14:18:55 This is the $PROJECTS issue?
14:18:59 tl;dr there are other jobs that devstack runs which aren't cloning the placement repo yet,
14:19:09 i have patches up for that, but they aren't passing and i haven't dug into why yet
14:19:23 yeah https://review.openstack.org/#/c/606853/ and https://review.openstack.org/#/c/608266/
14:19:35 I have good news for extraction
14:20:04 https://review.openstack.org/#/c/599208/ has been tested and works on a physical machine with pGPUs
14:20:22 efried: looks like my d-g patch for updating $PROJECTS passed, just failed one test in tempest
14:20:26 so just rechecks it looks like
14:20:27 next step will be to write some functional test mocking this ^
14:20:42 nice
14:20:58 bauzas: nice.
14:21:11 bauzas: That's more of a reshape nugget than extraction, though?
14:21:32 efried: we agreed that that was a requirement for extraction.
14:21:36 efried: well, I thought we agreed on this being a priority for the extraction :)
14:21:36 oh, I guess we said we were going to want ... yeah
14:21:53 I forget why, actually.
14:22:02 anyway
14:22:09 let's not rehash that.
14:22:12 oh, right, it was a requirement for the governance split
14:22:22 not for getting extracted placement working.
14:22:26 cool cool
14:22:30 I have a spec topic...
14:22:37 anything else on extraction?
14:22:44 tl;dr it's close
14:22:49 sweet
14:22:50 for the grenade/devstack ci/infra bits
14:23:31 I need to disappear, taxi driving my kids from school
14:23:42 jaypipes: Want to go now or after the other spec/review topics?
14:24:02 so I have repeatedly stated I am not remotely interested in pursuing either https://review.openstack.org/#/c/544683/ or https://review.openstack.org/#/c/552105/. I was under the impression that someone (Yikun maybe?) who *was* interested in continuing that work was going to get https://review.openstack.org/#/c/552105/ into a state where people agreed on it (good luck with that), but as of now, I've seen little action on it other than
14:24:03 negative reviews.
14:24:21 * efried click click click
14:24:35 jaypipes: yeah yikun has been busy with some internal stuff after a re-org,
14:24:36 so my question is should I just abandon both of the specs and force the issue?
14:24:48 i can send an email to see what's going on and if we still care about those
14:24:53 k, thx
14:25:54 This could relate to the next-next topic on the agenda actually.
14:26:39 we were talking about using the file format proposal embedded in the
14:26:39 #link device passthrough spec https://review.openstack.org/#/c/591037/
14:26:39 as a mechanism to customize provider attributes (prompted by the belmoreira min_unit discussion)
14:26:59 jaypipes agreed to review ^ with that in mind
14:27:03 efried: yes.
14:27:26 efried: I have found it very difficult to review. will give it another go this morning.
14:27:29 The "initial defaults" thing is still weird.
14:27:48 and not addressed in there (yet)
14:28:03 bauzas suggested to split out the part of the spec that talks about the file format, and do the device passthrough aspect on its own.
14:28:24 Which sounds like a good idea to me, considering the various ways we've talked about using it.
14:30:08 okay, moving on.
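As a concrete illustration of the provider-customization idea just discussed: the spec at https://review.openstack.org/#/c/591037/ was still in review, so the structure and field names below (`providers`, `identification`, the override layout) are invented for the example, not the spec's actual schema.

```python
# Purely illustrative: a parsed provider-customization document of the
# general kind being discussed, overlaying operator-supplied inventory
# fields onto driver-reported defaults.
provider_overrides = {
    'providers': [
        {
            # Match a provider by name (a compute node or a child RP).
            'identification': {'name': 'compute-0_NUMA0'},
            'inventories': {
                'MEMORY_MB': {
                    # e.g. the belmoreira min_unit discussion: force a
                    # larger min_unit than the driver's default of 1.
                    'min_unit': 512,
                },
            },
        },
    ],
}


def apply_overrides(inventory, overrides, rp_name):
    """Overlay operator-supplied inventory fields onto driver defaults."""
    for prov in overrides.get('providers', []):
        if prov['identification'].get('name') != rp_name:
            continue
        for rc, fields in prov.get('inventories', {}).items():
            inventory.setdefault(rc, {}).update(fields)
    return inventory


# Usage: the driver-discovered inventory keeps its totals, but the
# operator's min_unit wins.
inv = apply_overrides({'MEMORY_MB': {'total': 65536, 'min_unit': 1}},
                      provider_overrides, 'compute-0_NUMA0')
assert inv['MEMORY_MB']['min_unit'] == 512
```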
14:30:24 last week, the
14:30:25 #link HPET discussion http://lists.openstack.org/pipermail/openstack-dev/2018-October/135446.html
14:30:25 led to an interesting precedent on using traits for config
14:30:31 another spec ... I pushed a new rev on https://review.openstack.org/#/c/555081/
14:30:38 (cpu resource tracking)
14:30:59 #link CPU resource tracking spec https://review.openstack.org/#/c/555081/
14:33:03 any discussion on traits-for-config or CPU resource tracking?
14:33:25 any other specs or reviews to discuss?
14:33:29 i personally hope that cpu resource tracking is not something we pursue for stein
14:33:35 while we're still trying to land reshaper et al
14:34:13 reshaping all instances on all compute nodes is going to be rough during upgrade
14:34:16 unless we can do that offline
14:35:56 mriedem: so let's hold off getting new clean functionality so that upgrades can be prolonged even longer until end of 2019?
14:36:21 yes?
14:36:26 I feel like we've been putting off numa topo in placement a while now
14:36:47 i think getting reshaper and bw-aware scheduling and all that stuff has been around long enough that we need to get those done first
14:36:50 so I don't disagree that it's going to be a big reshape, but.. dang, we've been working towards it for a while now and..
14:37:01 mriedem: I don't get the argument that adding another data migration (reshape action) makes upgrades harder than having one to do in a release cycle.
14:37:04 i would just like fewer things to worry about
14:37:24 if we end up with something for gpus that requires compute nodes to be online,
14:37:33 it'd be massively better for FFU to have both of those in the same release
14:37:45 vs. two different (especially back-to-back) releases
14:38:21 do we need the computes online for the cpu resource tracking upgrade?
14:38:27 yes
14:38:50 they have to do it themselves, I think, because only they know where and what the topology is
14:39:18 dansmith: right, unless we go with a real inventory/provider descriptor file format.
14:39:34 jaypipes: well, that just pushes the problem elsewhere.. you still have to collect that info from somewhere
14:39:50 dansmith: it's already in the vast majority of inventory management systems.
14:39:57 waitwait, the *admin* is going to be responsible for describing NUMA topology? It's not something the driver can discover?
14:40:14 efried: we should have the driver do it for sure
14:40:22 efried: the operator is ultimately responsible for *whether* a compute node should expose providers as a tree.
14:40:22 phew
14:40:28 jaypipes: but we can't just require the operator to have that and build such a mapping, IMHO
14:40:30 but even still,
14:40:54 why would operators care how we model things internally?
14:41:01 efried: a lot of operators don't want or need to deal with NUMA. they just have needs for dedicated CPU and shared CPU resources and don't care about NUMA.
14:41:14 Yeah, I can live with a "use numa or not" switch.
14:41:18 the driver is the only one that can decide how existing allocations map to that information, IMHO, so unless you want to run the driver against the DB from a central node...
even still, there are numa pinnings that the driver has already done that we need to know about
14:41:36 I was just afraid you were talking about requiring the op to say "and this CPU is in NUMA node 1, and this CPU is in NUMA node 2 and..."
14:41:45 mriedem: they don't, that's why making them write a topo description for each (type of) compute node to do this migration would be mega-suck
14:41:58 efried: I think that's what jaypipes is saying
14:42:03 and I think that's not reasonable
14:42:17 efried: I don't think a "numa or not" switch is reasonable either, FWIW
14:42:20 dansmith: ops *already* have that. they all have hardware profiles which describe the different types of hardware they provide to guests.
14:42:21 they just want it to work
14:42:45 jaypipes: all ops do not have that
14:42:51 jaypipes: but even still, they don't have the information about what numa allocations we've already done for existing instances
14:43:08 dansmith: agreed completely with that last statement.
14:43:34 With a generic inventory/provider descriptor file, you could allow the op to override/customize. But I would think we would want the default to be automatic detection/configuration resulting in at least a sane setup.
14:43:42 it's a shame the guest NUMA topology and CPU pinning were implemented as such a tightly coupled blobject mess.
14:43:50 while i agree it would be best if we can do all the reshapes we know we need to do in the same release to ease the pain, i just wanted to state that i'm worried about trying to bite this off in stein with everything else that's going on
14:44:11 mriedem: there's risk there for sure, no doubt
14:44:29 We also still can't do generic affinity without a placement API change, just to bring that up again.
14:44:31 I'm not saying it's critical, I'm just saying writing it off now seems like a bad idea to me
14:45:02 i'll admit the only part of that spec i've read is the upgrade impact
14:45:10 then i had to go change my drawers
14:45:22 mriedem: I guess I'm not sure why that's a surprise at this point,
14:45:27 will artom's stuff depend on this?
14:45:29 but maybe I have just done more thinking about it
14:45:40 artom's stuff = numa aware live migration
14:45:52 mriedem: artom's stuff kinda conflicts with this.. if this was done his stuff would be easier I think
14:45:55 dansmith: yeah i've just avoided thinking about this
14:46:40 ok, i need to get updated on what he plans to do with that as well
14:46:56 anyway, i'll be quiet now
14:48:35 dansmith, jaypipes: any last words?
14:48:46 no.
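To make the upgrade concern above concrete: a NUMA reshape would mean each compute node, on startup, moving inventory and its existing instance allocations from the root provider to new NUMA child providers in one atomic call to the POST /reshaper API, which was still in flight at the time. A rough Python sketch of the kind of payload involved; the overall shape follows the reshaper proposal then under review, but every UUID, generation, and amount below is a placeholder, and new providers would first be created via the compute's provider tree.

```python
import uuid

# Placeholders standing in for real identifiers.
root_rp = 'cn-uuid'
numa0_rp, numa1_rp = str(uuid.uuid4()), str(uuid.uuid4())
instance = 'instance-uuid'

reshape_body = {
    'inventories': {
        # The root provider keeps RAM/disk but drops its VCPU inventory.
        root_rp: {
            'resource_provider_generation': 7,
            'inventories': {
                'MEMORY_MB': {'total': 65536},
                'DISK_GB': {'total': 2000},
            },
        },
        # VCPUs are re-homed onto per-NUMA-node child providers
        # (generation None shown here as illustrative for new providers).
        numa0_rp: {
            'resource_provider_generation': None,
            'inventories': {'VCPU': {'total': 8}},
        },
        numa1_rp: {
            'resource_provider_generation': None,
            'inventories': {'VCPU': {'total': 8}},
        },
    },
    # Existing consumers must move in the same request -- and only the
    # virt driver knows which NUMA node each instance is pinned to,
    # which is why the computes have to be online to do this.
    'allocations': {
        instance: {
            'allocations': {
                root_rp: {'resources': {'MEMORY_MB': 4096}},
                numa0_rp: {'resources': {'VCPU': 4}},
            },
            'project_id': 'project-uuid',
            'user_id': 'user-uuid',
            'consumer_generation': 1,
        },
    },
}
```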
14:49:36 Home stretch
14:49:36 #topic bugs
14:49:36 #link Placement bugs https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:49:41 any bugs to highlight?
14:49:43 efried: go Browns?
14:50:12 ugliest FG ever
14:51:43 Horns to 5-1 by a toenail. Khabib face cranks Conor to a tap, then attacks his training team. Derrick Lewis's balls are hot. Other weekend sports news?
14:51:56 I guess we're really in
14:51:56 #topic opens
14:51:58 is that real sports news?
14:52:02 yes
14:52:09 hah. okay, sounded made up
14:52:12 https://deadspin.com/khabib-nurmagomedov-taps-out-conor-mcgregor-attacks-co-1829580622
14:52:14 shows how much I know
14:52:51 * bauzas waves again
14:53:15 and https://www.youtube.com/watch?v=F_E6jXHMPs4
14:53:18 okay, anything else?
14:53:26 * edleafe arrives super-late
14:53:57 edleafe: Anything to bring up before we close?
14:54:48 we also had https://www.youtube.com/watch?v=KgwmPhAu0tc
14:55:31 Thanks y'all
14:55:31 #endmeeting