14:00:19 #startmeeting nova_scheduler
14:00:20 Meeting started Mon Jan 29 14:00:19 2018 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:22 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:24 The meeting name has been set to 'nova_scheduler'
14:00:46 @/
14:00:49 o/
14:00:53 o/
14:00:56 o/
14:01:04 Looks like efried didn't comb his hair
14:01:24 * efried is a conditioner commercial
14:01:26 \o
14:01:28 i'm only vaguely here
14:02:03 can we please try to make that meeting short? :)
14:02:09 I have a ton of things to catch up on :)
14:02:28 bauzas: sounds good
14:02:31 let's start then
14:02:38 #topic Reviews
14:02:53 #link Provider Tree series, starting with: https://review.openstack.org/#/c/533808/
14:02:56 Most of the bottom patches are +W'd
14:02:59 #link First provider tree patch in progress is https://review.openstack.org/#/c/537648/
14:03:02 Anything to add, efried?
14:03:24 o/
14:03:26 The weekend saw two of eight merge there. The remaining six are In The Gate.
14:03:58 are a bunch failing on functional job timeouts?
14:03:58 ok
14:04:00 I'll be cleaning up the rest of the series over the next couple of days, but of course those pieces are Rocky-bound.
14:04:14 mriedem Infra has declared no rechecks until they've finished sleuthing.
14:04:21 ooo...
14:04:27 it should no longer be called "the gate", rather "the wall"
14:04:33 was there something in the ML?
14:04:39 or just a channel message?
14:04:39 "the pit"
14:04:40 mriedem IRC broadcast
14:04:42 mriedem: see IRC topics
14:04:54 "...of despair"
14:04:54 ah
14:05:08 next up:
14:05:09 #link Nested RP traits selection: https://review.openstack.org/#/c/531899/
14:05:12 Not much recent action on this, as the series is postponed until Rocky
14:05:14 and https://wiki.openstack.org/wiki/Infrastructure_Status
14:05:31 says that we had an infra cloud provider outage, hence the very large delay
14:05:48 #link Singular request group traits, starting with: https://review.openstack.org/#/c/536085/
14:05:51 This series is also +W'd, although a cleanup patch at the end needs work
14:05:55 the traits API thing has been merged, right?
14:06:17 bauzas: which thing?
14:06:32 yes
14:06:45 the API side
14:06:51 bauzas: the traits support in the allocation candidates API is merged
14:07:24 the patch for supporting traits in flavor extra specs is still in the gate pipeline
14:07:24 https://review.openstack.org/#/c/536085/ is left
14:07:29 to tie things together
14:07:40 yes
14:07:43 yeah ok, we're on the same page
14:07:46 alex_xu: did you see my request for a functional test?
14:07:48 I was meaning that
14:08:05 mriedem: the patch to bump the timeout?
14:08:14 alex_xu: no,
14:08:17 on the traits stuff,
14:08:34 i think we should have a functional test with 2 computes and one decorated with a trait that we use via flavor extra spec to see everything works as expected
14:08:35 end to end
14:08:50 +1
14:09:03 mriedem: yea, I will add one, gibi_ wants that also
14:09:16 thanks. i can't remember which patch i mentioned that in.
14:09:27 Moving on...
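The end-to-end flow discussed above relies on flavor extra specs of the form `trait:<TRAIT_NAME>=required` being turned into required traits for the placement allocation candidates query. A minimal sketch of that parsing step, with an illustrative function name (not the exact nova implementation):

```python
# Simplified sketch of deriving required traits from flavor extra specs
# of the form "trait:<TRAIT_NAME>=required", as in the traits-in-flavor
# work discussed above. The function name is illustrative only.

def required_traits_from_extra_specs(extra_specs):
    """Return the set of trait names marked 'required' in a flavor."""
    required = set()
    for key, value in extra_specs.items():
        if key.startswith('trait:') and value == 'required':
            required.add(key[len('trait:'):])
    return required

flavor_extra_specs = {
    'trait:HW_CPU_X86_AVX2': 'required',
    'hw:cpu_policy': 'dedicated',
}
print(required_traits_from_extra_specs(flavor_extra_specs))
# -> {'HW_CPU_X86_AVX2'}
```

The functional test mriedem asks for would then boot a server with such a flavor against two computes, only one of which exposes the trait, and assert the scheduler picks the decorated one.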
14:09:28 #link Granular resource requests, starting with: https://review.openstack.org/#/c/517757/
14:09:31 Still WIP; to be completed in Rocky
14:09:52 Next:
14:09:53 #link Remove microversion fallback: https://review.openstack.org/#/c/528794/
14:09:56 Simple enough; caught in Zuul hell
14:10:03 Needs some +2s, also
14:10:30 Last on the agenda:
14:10:31 #link Use alternate hosts for resize: https://review.openstack.org/#/c/537614/
14:10:34 The main patch finally merged! All that's left is this follow-up unit test
14:10:39 honestly, I think we need to identify what we can still land for Rocky and what can be postponed
14:10:50 I know mriedem made an etherpad for that
14:10:56 s/Rocky/Queens
14:11:05 because I don't know when the gate problems will be fixed
14:11:10 heh, oops, yes
14:11:29 we can't just recheck the entire world
14:11:40 https://etherpad.openstack.org/p/nova-queens-blueprint-status is what i'm still tracking for queens
14:11:40 We're OpenStack - we can do anything!
14:11:52 (and I say that as I also have one important change blocked in the gate)
14:12:30 mriedem: yup, I was talking about that etherpad
14:12:55 Is there anything on that etherpad we need to discuss? Seems straightforward to me
14:13:05 no i don't think so,
14:13:13 i think all the approved NRP stuff is merged,
14:13:16 so the rest goes to rocky
14:13:24 i'll probably put a procedural -2 on the bottom change in the series
14:13:44 oh nvm i see a bunch is approved now
14:13:45 The bottom unapproved one, you mean
14:14:16 https://review.openstack.org/#/c/537648/ is the first unapproved in the series.
14:14:20 yeah
14:14:53 question
14:14:59 is there an 'end' patch in this series yet
14:14:59 ?
14:15:09 like, when do we say a user can use this stuff
14:15:23 https://review.openstack.org/#/c/520246/
14:15:31 if by "user" you mean "virt driver"
14:15:50 And "can use" in the sense that they can use update_provider_tree, but they can't make use of nesting therein.
14:15:58 i mean,
14:16:04 an API user or operator,
14:16:12 modeling server create requests that rely on NRP
14:17:15 Actual hierarchical provider models will require 1) the above (https://review.openstack.org/#/c/520246/), 2) Jay's series on NRP in alloc cands, and 3) a virt driver impl of update_provider_tree.
14:17:16 mriedem: the theory is that a good customer usecase would be the VGPU stuff
14:17:20 None of those things are making Q.
14:17:31 But are very close.
14:17:48 Well, somewhat close. At least already started.
14:17:49 mriedem: ie. xen folks providing a nested inventory so that a server boot asking for VGPU in the flavor would use that
14:18:13 that's the only usecase I'm aware of in the foreseeable future
14:18:23 vgpu and xen reminds me, there are open xen vgpu patches, so what's the support statement for the xen driver wrt gpu in queens?
14:18:25 because PCI things are way behind that in terms of schedule
14:18:33 https://review.openstack.org/#/q/topic:bp/add-support-for-vgpu+status:open
14:19:03 does the xen driver have the same basic support for vgpu as the libvirt driver? and the remaining patches are future work for using NRP?
14:19:12 mriedem: AFAICT, they started to play with nested RPs
14:19:15 1) is a few lines of test code from being ready. 2) is fairly far along. 3) gpu stuff has been written based on WIPs of #1, but is obviously blocked by that, and is still pretty nascent afaik
14:19:26 yes
14:19:40 mriedem: yes, basic feature parity, except they haven't tested all the server actions AFAICS
14:19:58 ok thanks both; bauzas we should have a patch for a feature support matrix entry for vgpu i think
14:20:11 mriedem: yup, I saw your comment and I agree
14:20:12 but now i'm derailing this meeting and will stop
14:20:25 +1
14:20:43 (and I asked for a quick meeting - pretty unfair if I chat too much)
14:21:15 bauzas: heh
14:21:30 Any other reviews to discuss?
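For readers unfamiliar with the update_provider_tree hook discussed above (https://review.openstack.org/#/c/520246/): a virt driver uses it to report its inventory, including nested child providers such as a physical GPU exposing VGPUs in the xen use case. The sketch below uses a hypothetical MiniProviderTree stand-in for illustration; the real nova ProviderTree API differs in detail:

```python
# Hypothetical simplification of the update_provider_tree interface.
# MiniProviderTree is a stand-in, not the real nova class.

class MiniProviderTree:
    def __init__(self, root_name):
        self.providers = {root_name: {'parent': None, 'inventory': {}}}

    def new_child(self, name, parent):
        self.providers[name] = {'parent': parent, 'inventory': {}}

    def update_inventory(self, name, inventory):
        self.providers[name]['inventory'] = inventory


def update_provider_tree(tree, nodename):
    """Virt-driver hook: report the compute node's (possibly nested)
    inventory, e.g. a VGPU child provider as in the xen use case."""
    tree.update_inventory(nodename, {'VCPU': {'total': 16},
                                     'MEMORY_MB': {'total': 32768}})
    # Nested resource provider for a physical GPU exposing VGPUs.
    child = '%s_pGPU_0' % nodename
    tree.new_child(child, parent=nodename)
    tree.update_inventory(child, {'VGPU': {'total': 4}})


tree = MiniProviderTree('compute1')
update_provider_tree(tree, 'compute1')
print(sorted(tree.providers))
# -> ['compute1', 'compute1_pGPU_0']
```

As noted in the meeting, merging this interface alone does not let an API user request nested resources; that also requires NRP support in allocation candidates plus a driver implementation.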
14:22:06 ok then
14:22:06 #topic Bugs
14:22:07 #link Placement bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:22:11 Nothing new this week; anyone have anything to say about bugs?
14:22:38 we got 'em
14:22:49 just keep an eye out for new bugs each day until RC1
14:22:51 just that I'll do some triage
14:23:09 Sounds good
14:23:11 #topic Open Discussion
14:23:16 ooo i have something
14:23:21 go for it
14:23:23 http://lists.openstack.org/pipermail/openstack-dev/2018-January/126653.html
14:23:25 #link http://lists.openstack.org/pipermail/openstack-dev/2018-January/126653.html
14:23:33 just a simple thing i POC'ed over the weekend,
14:23:40 to put the driver.capabilities as traits on the compute node RP
14:24:04 i think this would be useful for scheduling things that depend on driver capabilities, like multiattach and tagged attach,
14:24:14 which are things today that if we pick the wrong compute, we fail and don't reschedule
14:24:31 and also doesn't require a full blown capabilities API
14:24:33 mriedem: I saw your email but I was waiting for you to be awake
14:24:52 mriedem: because it looks like I wasn't thinking of the same thing as you when I was looking at a "capabilities API"
14:25:08 First blush I like the idea.
14:25:12 mriedem: in your email, you mention exposing the CPU features and other nasty bits, right?
14:25:35 IMO that should be done anyway, but is unrelated to driver capabilities.
14:25:46 oh, sorry
14:25:49 misunderstood
14:25:54 efried: yeah, my thought too
14:26:02 bauzas: no
14:26:03 the proposal is to use traits for exposing what you can do for a compute
14:26:13 correct?
14:26:21 i've always wanted a way to take what's in the driver.capabilities dict,
14:26:25 and expose that out of the rest api
14:26:30 this is an easy way to do that
14:26:55 In the sense of "has image cache" or "supports quiesce". Not in the sense of "HW_CPU_X86_AVX".
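The capability-to-trait translation mriedem proposes could look roughly like the sketch below. The COMPUTE_* trait naming and the capability keys are illustrative assumptions, not the exact POC code:

```python
# Hedged sketch of the proposal: translate the virt driver's boolean
# capabilities dict into trait names to set on the compute node
# resource provider. Names here are illustrative, not the POC's.

def capabilities_to_traits(capabilities):
    """Map boolean driver capabilities to trait-style names."""
    traits = set()
    for cap, supported in capabilities.items():
        if supported:
            # e.g. 'supports_multiattach' -> 'COMPUTE_SUPPORTS_MULTIATTACH'
            traits.add('COMPUTE_' + cap.upper())
    return traits

driver_capabilities = {
    'supports_multiattach': True,
    'supports_tagged_attach_interface': True,
    'supports_migrate_to_same_host': False,
}
print(sorted(capabilities_to_traits(driver_capabilities)))
# -> ['COMPUTE_SUPPORTS_MULTIATTACH', 'COMPUTE_SUPPORTS_TAGGED_ATTACH_INTERFACE']
```

A flavor could then require such a trait so the scheduler never picks a compute whose driver lacks the capability, avoiding the late failure with no reschedule described above.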
14:27:01 correct
14:27:09 gotcha
14:27:29 we could have stored these along with the compute_nodes table and written a capabilities subresource API on os-hypervisors,
14:27:43 but a new API just seems like it would get stuck in committee
14:28:09 anyway, it was just an idea, thrown out there for discussion
14:28:10 I mean, I get that there could be some confusion, because we're mixing virt driver traits with hardware traits on the same resource provider.
14:28:28 not all traits on a RP are going to be hw traits, are they?
14:28:33 that's a bit limited in scope
14:28:50 ages ago we talked about hypervisor version as a trait
14:29:26 presumably an rp should expose any traits which are useful in selection
14:29:32 Yeah, I'm not saying we shouldn't do it. I'm just saying muddling the distinction between "the box" and "the virt driver"...
14:29:37 traits are cheap and designed to be cheap
14:30:11 We *could* make a dummy RP with no inventory that represents the virt driver. Put it... really anywhere in the tree.
14:30:21 But to cdent's point, RPs are more expensive than traits.
14:30:30 yeah that seems overly complicated to me
14:30:34 Agree.
14:30:35 efried: yeah, I was just going to say that
14:30:54 reductio ad absurdum and all that.
14:31:06 soon we'll have an SDD for modeling this all
14:31:07 and "supports foo" is better than "version 1.23"
14:31:14 ++
14:31:39 sorry, was on another discussion
14:31:42 * mriedem gets lost down memory lane http://docs.oasis-open.org/sdd/v1.0/os/sdd-spec-v1.0-os.html
14:31:48 but I think it's a doable way
14:32:05 * cdent demerits mriedem for making a reference to oasis
14:32:06 i'll throw it in the PTG etherpad and we can discuss there, and move on here
14:32:23 what's next to talk about? TOSCA support?
14:32:32 mmm https://www.dmtf.org/standards/cim
14:32:46 OK, anything else for opens?
14:33:26 OK, everyone - back to whatever it was you were doing before.
14:33:29 #endmeeting