14:00:14 #startmeeting nova
14:00:14 Meeting started Thu Sep 6 14:00:14 2018 UTC and is due to finish in 60 minutes. The chair is melwitt. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:15 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:18 The meeting name has been set to 'nova'
14:00:18 o/
14:00:27 hello everybody
14:00:28 o.
14:00:32 o/
14:00:40 o/
14:00:46 o/
14:00:49 \o
14:00:56 let's get started
14:01:06 #topic Release News
14:01:15 #link Stein release schedule: https://wiki.openstack.org/wiki/Nova/Stein_Release_Schedule
14:01:40 ō/
14:01:41 final rocky release was last thursday. we're still working on bugs and backporting them to stable/rocky
14:02:00 so now, we kick off the stein cycle with the PTG next week
14:02:39 that's all I have for release news. anyone have anything else?
14:03:00 #topic Bugs (stuck/critical)
14:03:17 we have one bug in the critical link
14:03:26 https://bugs.launchpad.net/nova/+bug/1790701
14:03:26 Launchpad bug 1790701 in OpenStack Compute (nova) "online_data_migrations fail in rocky+" [Critical,In progress] - Assigned to Matt Riedemann (mriedem)
14:03:49 * bauzas waves late
14:03:55 need https://review.openstack.org/#/c/599744/ approved
14:04:04 I have that open now
14:04:07 with that and another fix already merged,
14:04:11 we have nova-status passing in devstack https://review.openstack.org/#/c/599847/
14:04:14 for fresh install
14:04:18 I'm on the patch
14:04:20 something i should have added long ago
14:04:32 i'll start backports after the meeting
14:04:45 ok, coolness
14:05:06 thsnk
14:05:09 *thanks
14:05:14 #link 51 new untriaged bugs (up 1 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
14:05:22 #link 11 untagged untriaged bugs (up 1 since the last meeting): https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
14:05:39 not too big of an increase from last week in bugs, thanks to all who have been helping with triage
14:05:47 #link bug triage how-to: https://wiki.openstack.org/wiki/Nova/BugTriage#Tags
14:05:52 #help need help with bug triage
14:06:09 Gate status
14:06:10 #link check queue gate status http://status.openstack.org/elastic-recheck/index.html
14:06:11 gate has seemed OK
14:06:22 3rd party CI
14:06:27 #link 3rd party CI status http://ci-watch.tintri.com/project?project=nova&time=7+days
14:06:46 anything else for bugs or gate status or 3rd party CI?
14:06:52 3rd party ci needs https://review.openstack.org/#/c/599672/
14:07:05 that was the 0.0 allocation ratio thing killing the non-libvirt ci jobs
14:07:13 right, ok. will review
14:07:22 thanks
14:07:43 #topic Reminders
14:07:51 #link Stein Subteam Patches n Bugs: https://etherpad.openstack.org/p/stein-nova-subteam-tracking
14:07:56 #link Stein PTG planning: https://etherpad.openstack.org/p/nova-ptg-stein
14:08:12 I've updated the etherpad with a schedule ^
14:08:39 the cyborg team is going to talk about placement integration stuff on monday from 2pm - 3pm at the cyborg room
14:08:53 they'd like for interested folks from the nova team to join
14:09:08 there's a blazar one on tuesday at 10am (I think)
14:09:12 yes
14:09:25 ok, will add a note about that on the schedule
14:09:28 and mfing edge at 4pm on tuesday
14:09:53 edge is having an all day thing on tuesday, I
14:09:58 right,
14:10:00 will add a note about 4pm being nova time
14:10:03 but their nova-specific stuff starts around 4
14:10:06 already done
14:10:09 thanks
14:10:49 we have the rocky retro first thing on wednesday
14:10:54 #link Rocky retrospective for the PTG: https://etherpad.openstack.org/p/nova-rocky-retrospective
14:11:09 there's almost nothing on the etherpad, so I expect it to be short
14:11:28 but we'll at least talk about runways and any changes we'd like to make to the spec freeze date this time
14:11:39 and kick off runways for stein accordingly
14:11:46 oo, I just thought of this, when we do the retrospective *next* time, we get to call it the stein whine
14:11:53 * efried crawls back into hole
14:12:00 that's something to look forward to
14:12:17 ok, that's all I have for reminders. anyone else have anything to add for reminders?
14:12:36 #topic Stable branch status
14:12:58 #link stable/rocky: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/rocky,n,z
14:13:10 * melwitt needs to review
14:13:18 #link stable/queens: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/queens,n,z
14:13:23 #link stable/pike: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/pike,n,z
14:13:28 #link stable/ocata: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/ocata,n,z
14:13:53 #help please help with stable reviews
14:14:04 there are a lot of reviews
14:14:22 anything else for stable branch status?
14:14:36 #topic Subteam Highlights
14:14:51 we didn't have a cells v2 meeting yesterday. anything you'd like to mention here dansmith?
14:14:59 not really,
14:15:15 several of us have been out here and there, like surya this week
14:15:38 I think we got the flag for reverting to the old skip behavior all nailed down (not sure if it merged yet or not)
14:15:52 and we've been iterating on the proper down-cell stuff, which has been a little slow with the people outages
14:15:53 i haven't looked at that yet
14:15:56 but otherwise going pretty well
14:16:01 mriedem: yeah, would be good to get your ack on that
14:16:10 given i asked for it...
14:16:11 yeah
14:16:22 cool, thank you
14:16:42 scheduler, efried?
14:16:46 No sched meeting this week due to labor day (though in retrospect it would have been polite of me to send an email to that effect).
14:16:46 But I would like to have a brief update on placement extraction.
14:16:52 As of yesterday we've merged the forty-some patches to get the extracted repository to the point of gating/voting unit/func/pep, which is a great milestone.
14:17:25 it was in honors of efried's 42 birthday
14:17:36 And with a couple of pending patches as deps, I think cdent has gotten devstack working, as proven by placecat etc. cdent, care to unmuddle that?
14:18:03 I got tempest working against https://review.openstack.org/#/c/600162/
14:18:07 but not grenade of course
14:18:29 and placecat is my docker driven test suite for placement, the container now uses openstack/placement instead of openstack/nova as its source
14:19:59 cool, glad things are going well
14:20:35 I think gibi isn't around, no notes left for notifications team
14:20:58 and I think gmann isn't around, no notes left for API team
14:21:09 anything else for subteams before we move on?
14:21:44 #topic Stuck Reviews
14:21:56 no items in the agenda. does anyone in the room have anything for stuck reviews?
14:22:36 #topic Open discussion
14:22:57 if it's not on the agenda,
14:23:03 cern is going to have a specless bp request for https://blueprints.launchpad.net/nova/+spec/extend-in-use-rbd-volumes
14:23:12 to support extending in-use rbd volumes
14:23:23 the os-brick code isn't merged yet
14:23:26 ok, yeah not in the agenda
14:23:40 and i've said on the nova change that i want to see the ceph job passing with the volume extend tempest test on that nova change first
14:23:40 ok
14:24:07 sounds like a good plan to me
14:24:25 Something I'd like to put in folks' noggins:
14:24:25 Do we ultimately see *all* device passthrough eventually going through cyborg, or just accelerators?
14:24:51 Looking at the long-term plan for torching the existing pci passthrough code
14:25:35 efried: no
14:25:40 efried: please
14:25:52 mriedem: looks like a parity thing for that blueprint, so I'm +1 on approving
14:26:07 efried: cyborg is a management API for accelerators, but please don't purge the capabilities that nova has to manage a set of devices out of it
14:26:16 anyone else have opinions about the approval of specless blueprint https://blueprints.launchpad.net/nova/+spec/extend-in-use-rbd-volumes ?
14:26:46 melwitt: it might be premature until the actual brick change is approved and ceph testing is green
14:26:50 i was just bringing it up as an fyi
14:26:54 I was about to say the same
14:26:56 sorry for the cross-talk, lemme know when you're done
14:27:01 it requires a new osbrick version
14:27:10 os-brick even
14:27:30 mriedem: ok. so we'll wait to approve until after that. sorry, I thought you were asking to approve now
14:27:35 but if that's straightforward in nova, I'm not opposed to the specless-y
14:28:21 k. cool. I think we're done with that then
14:28:59 efried: go ahead, sorry about that
14:29:09 thanks.
14:29:37 my opinion is that cyborg isn't far enough along to have enough confidence in it to replace things like basic device attach,
14:29:38 So I know cyborg is going to get involved in doing the discovery and reporting (to placement) of accelerator inventory.
14:29:48 especially with SRIOV type things that need some network attention
14:30:19 I would kindof expect that the PCI attach functionality in nova is how we end up attaching accelerators under the covers anyway, perhaps without the same level of whitelisting nonsense
14:30:21 yeah I think it's premature
14:30:28 well, I agree with that for sure. We're not going to be able to replace the whole pci subsystem all at once.
14:30:33 but until cyborg becomes a much more mature thing, I'm not really in favor of replacing anything with it,
14:30:39 but we can take one of two paths wrt cyborg
14:30:40 and only trying to enable what new things it might bring
14:30:59 I'm still a bit concerned
14:31:11 we can either make the effort to embrace it and thus help it mature, pulling in pieces as they become available/usable
14:31:17 if we say this way, then we should have said to leave vGPUs out of the nova radar
14:31:32 or we can go our own way and then do a second, bigger, more painful integration later when we consider cyborg "mature".
14:31:34 the most crucial thing is not what we have, but how we support it
14:31:53 efried: you mean "if/when"
14:31:56 we've said no to fpga directly in nova for years,
14:32:01 because the if part is the important bit to me
14:32:02 cyborg is the path to fpga in nova
14:32:08 so let's see that happen first
14:32:12 mriedem: exactly
14:32:21 before spending a bunch of time retrofitting what we already have
14:32:28 oh yeah
14:32:36 chicken/egg, self-fulfilling prophecy, and all that.
14:32:50 I.e. if we take path A, cyborg is more likely to be a long-term success.
14:32:51 If I'm understanding efried correctly, the concern here is about architecture over the long term
14:33:05 and we could potentially improve the PCI functionality without really pulling it out of nova
14:33:08 if there's a chance that cyborg will become more generic it needs to start out that way sooner
14:33:14 yes, that ^
14:33:28 I'm not opposed to have the same feature be done in two different ways
14:33:54 Look, it actually makes my life easier if we say we're going to ignore cyborg for a couple of cycles and start rolling our own placement-based device passthrough, per kosamara's spec as written.
14:34:05 after all, it's now 4 cycles that we are wondering how cyborg will interact with nova
14:34:52 yeah, and I think it's not a big deal for placement, right?
14:34:53 well, not really, only since Dublin has it been more than a haze
14:35:14 I heard of cyborg since barcelona
14:35:39 it's just that we had a chat with them since Dublin, yeah
14:35:48 but I'm trying to consider what's best long term, and whether we have a duty^Wresponsibility^Wopportunity to help raise the project and help it mature.
14:36:14 but we can also try to avoid overguessing what the future could be, and leave people engage with us
14:36:30 for example, blazar is way older than placement
14:36:38 but at the end, they will use it
14:36:59 I don't see a problem having both efforts
14:37:19 I guess I'm not sure how cyborg being generic enough is related to which thing we integrate first
14:37:29 so I believe it was dansmith who asked the question on kosamara's spec, lemme find that...
14:37:47 the fpga thing will be the first step and if that works well, we could consider moving other passthrough to it right?
14:37:49 a question standing here is "do we have a chance to collaborate rather than duplicate effort"
14:38:07 melwitt: cyborg is not generic enough today, as defined/planned I think
14:38:14 dansmith: https://review.openstack.org/#/c/591037/ PS5: "I would have expected a lot of the stuff described here to be in scope for cyborg. Not that we should exclude all that from nova necessarily, but I think that it's probably worth calling out how this intersects (or not) cyborg's intended scope."
14:38:15 melwitt: they're asking if we should encourage them to *be* generic enough
14:38:20 dansmith: oh, ok
14:38:46 was just looking at their wiki again, "various types of accelerators such as GPU, FPGA, ASIC, NP, SoCs, NVMe/NOF SSDs, ODP, DPDK/SPDK and so on" so I thought that sounded generic
14:38:47 well, when I asked Sundar this question, his reaction was yes.
14:39:09 this is premature - given how slow things move, they should opt to be generic if possible,
14:39:15 but not at the expense of actually getting shit done
14:39:22 right
14:39:30 i don't think we have any duty to raise that project
14:39:34 we can collaborate, sure
14:39:40 but it's not my top priority by any means
14:39:44 My point being that that affects how we proceed in nova with device passthrough and making existing pci code diaf
14:39:44 neither mine,
14:39:47 and expect it's not the priority for others
14:39:54 it is mine, actually.
14:39:57 but I think that efried is asking because he wants to know whether to push on the nova-centric generic device approach,
14:40:00 or go push in cyborg
14:40:02 correct
14:40:07 thanks dansmith, nail on head
14:40:37 i'm likely not going to be involved in that either way, at least not in stein, so doesn't matter to me personally
14:40:54 obviously decomp is best if possible,
14:40:59 but that might take a couple of years
14:41:09 decomp like "let that corpse rot" ?
14:41:16 vay
14:42:39 Okay, so dansmith if the response in that review is, "this may or may not be in scope for cyborg long-term, but we're going to do it this way until that project matures more"...
14:42:40 punt to ptg?
14:42:41 that wfy?
14:43:06 mriedem: yeah, I've typed out several responses and deleted them all because I can't articulate my feelings on the matter
14:43:11 so maybe ptg
14:43:12 Yeah, definitely going to discuss some at ptg, but wanted to get a couple of gears turning in y'all's heads.
14:43:26 I guess the bottom line is:
14:43:42 I don't have a lot of faith in cyborg becoming a useful generic device service as it is today
14:43:54 so if I cared about generic devices a lot, I probably wouldn't put my eggs in that basket
14:44:23 but, since I don't care so much, putting them over there keeps them out of the way in nova
14:44:24 so..? :)
14:44:37 okay
14:44:38 so
14:45:00 I'm going to be pushing hard for at least a small piece (full GPUs) of generic placement-based device passthrough in Stein.
14:45:25 And obviously will be asking people like those present here to review things in that space.
14:45:41 so wanted to get pre-buy-in for which approach to take short-term (stein)
14:45:44 which I think I have now
14:45:44 I think fleshing out GPUs in nova, which we already have makes sense
14:45:45 so
14:45:46 thanks.
14:45:56 well, distinguishing VGPU from GPU in this case dansmith
14:46:09 Those are going to be very different things.
14:46:09 oh,
14:46:26 you want a GPU-specific PCI passthrough replacement?
14:46:30 The full-GPU passthrough thing is going to actually subsume some of the functionality you can currently do with [pci]*
14:46:35 yes exactly
14:46:43 GPU first
14:46:55 * dansmith looks for his spoon
14:46:56 or possibly any "full card"
14:47:53 I see being able to use either mechanism (legacy [pci]passthrough_whitelist/alias or The New Thing) for multiple releases
14:48:03 until we have full parity and can start ripping out the legacy thing
14:48:09 if we try to do it all at once, fail
14:48:19 we should make a list of the ginormous tasks we think we're going to take on in stein - at the ptg of course
14:48:28 b/c i remember a lot of wailing about not having shared storage support yet
14:48:48 cross-cell cold migrate is going to be my albatross
14:48:56 or a plan for numa
14:49:12 or just being able to upgrade to stein with placement working :)
14:49:20 yeah
14:49:20 yeah, I want to get shared storage squared away. being that it looks like it's close too
14:49:27 or instance ownership transfers
14:49:28 etc
14:49:33 lots of big proposals on the plate right now
14:49:45 we're gonna need to weigh this stuff
14:50:09 should be plenty of scales in colorado
14:50:20 b/c of fat coloradoans?
14:50:27 weeeeeeeed
14:50:28 cows
14:50:30 oh right
14:50:31 heh
14:50:36 *rimshot*
14:50:55 ok, are we done? :)
14:50:57 cdent: post-legalization, it's not that big a deal to make sure the dime bag is no larger than it should be :)
14:51:49 I think I'm done
14:51:53 * cdent avoids going off into too much weed jargon
14:52:13 ok, let's call it. thanks everyone
14:52:17 #endmeeting