21:00:07 <melwitt> #startmeeting nova
21:00:08 <openstack> Meeting started Thu Feb 28 21:00:07 2019 UTC and is due to finish in 60 minutes.  The chair is melwitt. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:09 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:12 <openstack> The meeting name has been set to 'nova'
21:00:14 <mriedem> o/
21:00:15 <melwitt> hi everyone, welcome to the nova meeting
21:00:18 <dansmith> o/
21:00:19 <takashin> o/
21:00:21 <melwitt> agenda https://wiki.openstack.org/wiki/Meetings/Nova
21:00:21 <artom> ~o~
21:00:33 <mriedem> that's my move
21:00:46 <melwitt> let's make a start
21:00:47 <edleafe> \o
21:00:49 <melwitt> #topic Release News
21:00:56 <melwitt> #link Stein release schedule: https://wiki.openstack.org/wiki/Nova/Stein_Release_Schedule
21:01:03 <melwitt> #info non-client library freeze is today, Feb 28. os-vif 1.15.1 was released, os-resource-classes 0.3.0 was released, and os-traits had nothing new to release since its last version.
21:01:19 <melwitt> so all of our non-client library releases are done
21:01:26 <melwitt> #info s-3 feature freeze is March 7
21:01:29 <melwitt> one week away
21:01:37 <melwitt> #link Stein blueprint status tracking: https://etherpad.openstack.org/p/nova-stein-blueprint-status
21:01:53 <melwitt> we're tracking progress here ^
21:02:00 <efried> o/
21:02:01 <melwitt> #link Stein RC potential changes tracking: https://etherpad.openstack.org/p/nova-stein-rc-potential
21:02:25 <melwitt> RC potential blocker bugs and other related RC stuff go here ^
21:02:39 <melwitt> #link Stein runway etherpad: https://etherpad.openstack.org/p/nova-runways-stein
21:02:47 <melwitt> #link runway #1: https://blueprints.launchpad.net/nova/+spec/flavor-extra-spec-image-property-validation (jackding) [END 2019-03-06] https://review.openstack.org/#/c/620706/ Flavor extra spec and image properties validation
21:02:53 <melwitt> #link runway #2: https://blueprints.launchpad.net/nova/+spec/ironic-conductor-groups (jroll) [END 2019-03-06] https://review.openstack.org/#/c/635006/ ironic: partition compute services by conductor group
21:03:00 <melwitt> this merged and the bp was marked complete today ^
21:03:08 <melwitt> #link runway #3: https://blueprints.launchpad.net/nova/+spec/enable-rebuild-for-instances-in-cell0 (ttsiouts) [END 2019-03-07 - feature freeze] https://review.openstack.org/570201
21:03:34 <melwitt> does anyone have anything else to mention for release news? or questions?
21:03:55 <melwitt> ok, moving on
21:03:57 <melwitt> #topic Bugs (stuck/critical)
21:04:02 <melwitt> no critical bugs
21:04:09 <melwitt> #link 69 new untriaged bugs (up 5 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
21:04:16 <melwitt> #link 9 untagged untriaged bugs (up 3 since the last meeting): https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
21:04:25 <melwitt> #link bug triage how-to: https://wiki.openstack.org/wiki/Nova/BugTriage#Tags
21:04:31 <melwitt> #help need help with bug triage
21:05:08 <melwitt> when doing bug triage, use the nova-stein-rc-potential bug tag for potential RC blockers
21:05:47 <melwitt> #link ML post http://lists.openstack.org/pipermail/openstack-discuss/2019-February/003343.html
21:05:59 <melwitt> Gate status
21:06:04 <melwitt> #link check queue gate status http://status.openstack.org/elastic-recheck/index.html
21:06:10 <melwitt> 3rd party CI
21:06:15 <melwitt> #link 3rd party CI status http://ciwatch.mmedvede.net/project?project=nova&time=7+days
21:06:37 <melwitt> anything else to mention for bugs or gate/CI?
21:06:56 <melwitt> ok, continuing
21:07:05 <melwitt> #topic Stable branch status
21:07:13 <melwitt> #link stable/rocky: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/rocky,n,z
21:07:31 <melwitt> very few rocky backports proposed
21:07:38 <melwitt> #link stable/queens: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/queens,n,z
21:07:42 <efried> That must mean rocky was perfect
21:07:56 <melwitt> yeah that's what I assume
21:07:56 * artom prefers Rocky II personally
21:08:08 <melwitt> queens backports could use some review help
21:08:14 <melwitt> lots of backports
21:08:20 <melwitt> #link stable/pike: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/pike,n,z
21:08:31 <efried> something something Stallone can beat up Freddy Mercury
21:08:32 <melwitt> a bunch for pike too, help wanted
21:08:58 <melwitt> I'll propose stable releases next week on s-3 day
21:09:24 <melwitt> since we usually aim to do stable releases at the milestone
21:09:40 <melwitt> maybe we should do that a week later because of FF, actually
21:10:19 <melwitt> so maybe the week after FF we aim to flush stable reviews and do the release
21:10:35 <melwitt> anything else for stable branches before we move on?
21:10:46 <melwitt> ok
21:10:50 <melwitt> #topic Subteam Highlights
21:11:01 <melwitt> efried: any updates for scheduler?
21:11:07 <efried> you know it
21:11:14 <efried> #link n-sch minutes http://eavesdrop.openstack.org/meetings/nova_scheduler/2019/nova_scheduler.2019-02-25-14.00.html
21:11:22 <efried> We discussed
21:11:22 <efried> #link alloc cands in_tree series starting at https://review.openstack.org/#/c/638929/
21:11:22 <efried> ...which has since merged \o/ (microversion 1.31)
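For context, a minimal sketch of the new in_tree-scoped allocation candidates query at placement microversion 1.31 (the endpoint, token, and provider UUID below are placeholders, and a real client would use keystoneauth rather than raw requests):

    import requests

    PLACEMENT = "http://placement.example.com"  # placeholder endpoint
    TOKEN = "<keystone-token>"                  # placeholder auth token
    ROOT_RP = "4e8e5957-649f-477b-9e5b-f1f75b21c03c"  # placeholder provider UUID

    resp = requests.get(
        PLACEMENT + "/allocation_candidates",
        headers={
            "X-Auth-Token": TOKEN,
            # in_tree is only available at placement microversion >= 1.31
            "OpenStack-API-Version": "placement 1.31",
        },
        params={
            "resources": "VCPU:1,MEMORY_MB:512",
            # limit candidates to providers within this provider tree
            "in_tree": ROOT_RP,
        },
    )
    resp.raise_for_status()
    print(resp.json()["allocation_requests"])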
21:11:38 <efried> We discussed
21:11:38 <efried> #link the OVO-ectomy https://review.openstack.org/#/q/topic:cd/less-ovo+(status:open+OR+status:merged)
21:11:38 <efried> ...all of which has since merged. There is a continuation of
21:11:38 <efried> #link refactors and cleanup currently starting at https://review.openstack.org/#/c/637325/
21:11:51 <efried> We discussed
21:11:51 <efried> #link libvirt reshaper (new bottom of series) https://review.openstack.org/#/c/636591/
21:11:51 <efried> That bottom patch has merged, and the rest of the series is mostly green except for one issue noted at
21:11:52 <efried> #link what happens to mdevs on reboot? https://review.openstack.org/#/c/636591/5/nova/virt/libvirt/driver.py@586
21:12:02 <efried> We discussed
21:12:02 <efried> #link ML thread about placement & related bug/bp tracking http://lists.openstack.org/pipermail/openstack-discuss/2019-February/003102.html
21:12:02 <efried> As well as another couple of operational things that should be hashed out on the ML, possibly initiated there by the PTL (old or new):
21:12:02 <efried> - Format/fate of the n-sch meeting
21:12:02 <efried> - Placement team logistics at the PTG
21:12:15 <efried> END
21:12:34 <melwitt> cool, so based on that I will mark the in_tree bp as complete
21:12:42 <efried> there is a spec update pending
21:13:07 <melwitt> I think that's ok. I'll bug people to review the spec update
21:13:14 <efried> #link in-tree alloc candidates spec update https://review.openstack.org/#/c/639033/
21:13:24 <efried> yeah, self.review_that_sucker()
21:13:39 <melwitt> yeah, me too
21:13:41 <melwitt> ok
21:13:48 <melwitt> no updates for API from gmann on the agenda
21:13:59 <melwitt> so we'll move on to...
21:14:05 <melwitt> #topic Stuck Reviews
21:14:25 <melwitt> (mriedem): Decide what to do about attaching volumes with tags to shelved offloaded servers for https://review.openstack.org/#/c/623981
21:14:35 <melwitt> #link ML thread with options: http://lists.openstack.org/pipermail/openstack-discuss/2019-February/003356.html
21:14:50 <mriedem> you want me to just go?
21:14:57 <melwitt> yeah, sure
21:14:57 <mriedem> the options are in the email
21:15:06 <mriedem> the way the root detach/attach code is written today,
21:15:14 <mriedem> when detaching a root volume, the tag is reset to None,
21:15:25 <mriedem> with the idea that when you attach a new root volume, you could specify a new tag,
21:15:33 <mriedem> the problem is, root detach/attach is only allowed on shelved offloaded instances,
21:15:47 <mriedem> but the api does not allow you to attach a volume with a tag to a shelved offloaded instance
21:15:50 <mriedem> the tag part specifically
21:16:03 <mriedem> the original thinking was that when we unshelve, we don't know if the compute will support tags
21:16:05 <mriedem> and honor them
21:16:08 <mriedem> however,
21:16:27 <mriedem> that's already a latent bug because i can create a server with device tags, shelve it and then unshelve it, and if i land on a host that does not support device tags, it passes but my tags aren't exposed to the guest
21:16:46 <mriedem> that's recorded as bug 1817927
21:16:47 <openstack> bug 1817927 in OpenStack Compute (nova) "device tagging support is not checked during move operations" [Undecided,New] https://launchpad.net/bugs/1817927
21:16:51 <mriedem> same is true for any move operation actually,
21:16:59 <mriedem> because we don't consider the user-requested device tags during scheduling at all, not even create
21:17:05 <mriedem> so,
21:17:23 <mriedem> i think we're restricting attaching volumes with tags to shelved offloaded servers for really no good reason
21:17:46 <artom> I guess realistically, how many people are running heterogeneous clouds with the potential to hit bug 1817927? It was reported by mriedem, not an end user/operator...
21:17:46 <mriedem> the question is what to do about it in the context of the root volume detach/attach series
21:17:57 <mriedem> artom: i would say slim
21:18:05 <artom> mriedem, yeah, so I'm partial to 1, with you
21:18:13 <mriedem> also, looking back,
21:18:28 <mriedem> we probably should have put a policy rule in the api for device tags if your deployment doesn't support them
21:18:29 <melwitt> IIUC, when you create an instance, tags are not guaranteed to be supported by the compute host the server lands on?
21:18:32 <mriedem> like we have for trusted certs
21:18:47 <mriedem> melwitt: correct, and if they land on a compute that doesn't support them during create, it aborts
21:18:54 <mriedem> no reschedule, nothing - you're dead
21:19:00 <artom> melwitt, this was more true in the past with the possibility of older versions, but now it's just about running a supported hypervisor
21:19:09 <melwitt> ok. then I guess I don't see why to restrict it for shelve/unshelve
21:19:10 <mriedem> again correct
21:19:13 <artom> hyperv and xen and libvirt have them (for boot time)
21:19:30 <mriedem> yeah so if you're running VIO and your users try to specify tags during server create, kaboom
21:19:46 <mriedem> we could policy rule that out of existence if we wanted, but it hasn't come up
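A hypothetical sketch of that idea using oslo.policy (the trusted-certs rule name is the existing nova one this would be modeled on; the device-tags rule name is made up here, nova has no such rule today):

    from oslo_policy import policy

    rules = [
        # existing nova rule that gates trusted certs at the API
        policy.RuleDefault("os_compute_api:servers:create:trusted_certs",
                           "rule:admin_or_owner"),
        # hypothetical rule name: deny device tags at the API in deployments
        # whose computes don't support them, instead of failing late
        policy.RuleDefault("os_compute_api:servers:create:device_tags", "!"),
    ]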
21:19:58 <melwitt> yeah. it seems like the existing restriction doesn't make sense given that there's not even a restriction for create
21:20:05 <mriedem> also, with the compute-driven capabilities traits stuff that aspiers is working on,
21:20:27 <mriedem> we can modify the request spec in train to say, "the user wants tags, so make sure you give them a compute which supports tags"
21:21:01 <melwitt> yeah, that would be nice
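A rough sketch of that direction (illustrative only, not nova code; the trait name and request spec field are stand-ins for whatever the capabilities-as-traits work actually defines):

    # Illustrative sketch: if the user asked for device tags, add a required
    # trait so the scheduler only picks hosts that can honor them.
    DEVICE_TAGGING_TRAIT = "COMPUTE_DEVICE_TAGGING"  # stand-in trait name

    def require_device_tagging_if_needed(request_spec, block_device_mappings):
        if any(bdm.get("tag") for bdm in block_device_mappings):
            # "required_traits" is a stand-in for however the request spec
            # ends up expressing required traits to placement
            request_spec.setdefault("required_traits", set()).add(
                DEVICE_TAGGING_TRAIT)
        return request_spec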
21:21:12 <mriedem> so if we're leaning to option 1, we would lift that restriction in the same microversion Kevin_Zheng is adding for the root attach/detach support
21:21:17 <mriedem> i assume anyway
21:21:36 <mriedem> we can't really just remove the restriction and say 'oops' for interop reasons
21:21:56 <melwitt> yeah. I can't immediately think of how a separate microversion would help
21:22:08 <mriedem> this does make his change more complicated
21:22:14 <melwitt> yeah :(
21:22:21 <mriedem> but i think it needs to happen this way, i don't want to half ass around with multiple microversions for this
21:22:39 <artom> mriedem, actually, hold up
21:23:04 <artom> IIRC one of the reasons we outright refused tagged attach to shelved is because we had to communicate with the compute manager
21:23:20 <artom> Which we didn't know at the time of attach
21:23:30 <artom> Has this been "solved" by Kevin's work?
21:23:48 <mriedem> when attaching a volume to a not-shelved server, we call down to compute to reserve a device name
21:23:58 <mriedem> when attaching a volume to a shelved offloaded server, we just create the bdm in the api
21:24:21 <mriedem> in the case of your tagged attach code, it will also check the compute capabilities to see if it supports tagged attach and blow up if not
21:24:32 <mriedem> so we wouldn't have ^ in the case of shelved offloaded attach
21:24:43 <mriedem> however, as noted, we're already not honoring device tags on unshelve anyway
21:24:50 <mriedem> so....who cares?
21:25:16 <mriedem> the long-term fix for doing that properly is the scheduling based on required traits stuff
21:25:31 <mriedem> i don't think we can just start exploding unshelve because servers have tags with them now
21:25:46 <mriedem> until the scheduler piece is worked in
21:26:23 <mriedem> artom: you're thinking of this https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L5480
21:26:31 <mriedem> ^ happens during an attach to a non-shelved server
21:27:07 <mriedem> multiattach volumes are kind of broken in the same way wrt unshelve
21:27:22 <mriedem> you can boot from multiattach volume, shelve and then unshelve elsewhere on something that doesn't support it
21:27:59 <mriedem> the api kicks out trying to attach multiattach volumes to shelved servers as well
21:28:26 <mriedem> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L4199
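Paraphrased, self-contained sketch of the API-side restriction in question (not the literal nova code; the exception names only approximate the real ones):

    SHELVED_OFFLOADED = "shelved_offloaded"

    class TaggedAttachToShelvedNotSupported(Exception):
        pass

    class MultiattachToShelvedNotSupported(Exception):
        pass

    def check_volume_attach(vm_state, tag, volume):
        """Reject tagged or multiattach volume attach for shelved offloaded servers."""
        if vm_state == SHELVED_OFFLOADED:
            if tag is not None:
                raise TaggedAttachToShelvedNotSupported()
            if volume.get("multiattach", False):
                raise MultiattachToShelvedNotSupported()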
21:28:48 <melwitt> I assume volume attach is the only time you can add device tags to something
21:29:01 <mriedem> create and attach
21:29:15 <melwitt> otherwise a workaround would be to set them after attaching sans tags
21:29:17 <melwitt> got it
21:29:33 <mriedem> we don't have that today
21:29:38 <artom> mriedem, hah, found it https://review.openstack.org/#/c/391941/50/nova/compute/api.py
21:29:39 <melwitt> right
21:29:47 <artom> And yeah, it was only checking for compute host support
21:30:37 <melwitt> so that means a server create could reschedule to land on a host with support?
21:31:00 <mriedem> server create aborts if it lands on a host that doesn't support tags
21:31:04 <mriedem> it does not reschedule
21:31:18 <melwitt> oh that's in manager
21:31:19 <melwitt> I see. ok
21:31:26 <mriedem> https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1898
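And a paraphrased sketch of what that compute-side check amounts to (names are illustrative, not the literal nova code):

    class BuildAbortError(Exception):
        pass

    def check_device_tagging(requested_networks, block_device_mappings, driver_capabilities):
        # If any requested NIC or BDM carries a tag and the virt driver does
        # not advertise device tagging support, the build is aborted outright:
        # no reschedule, the instance just goes to ERROR.
        wants_tags = (
            any(getattr(net, "tag", None) for net in requested_networks or []) or
            any(bdm.get("tag") for bdm in block_device_mappings or [])
        )
        if wants_tags and not driver_capabilities.get("supports_device_tagging", False):
            raise BuildAbortError("device tags requested but not supported on this host")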
21:31:48 <mriedem> so another option is,
21:31:59 <mriedem> land kevin's code as-is (option 2 in my email),
21:32:28 <mriedem> and when we have smarter scheduling to take device tags and multiattach volumes into account, we could add a microversion to drop the tag/multiattach restriction on attaching volumes to shelved offloaded instances
21:32:52 <mriedem> which is these 2 checks https://github.com/openstack/nova/blob/master/nova/compute/api.py#L4191
21:33:11 <artom> It'd be kinda weird to just disappear a tag without warning
21:33:36 <melwitt> that also sounds reasonable, and would make it easier on Kevin for landing this series. the only potential pitfall is that ^ you could lose your tags and be unable to re-add them
21:33:41 <mriedem> well, we'd probably put something in the api reference saying 'at this microversion you can detach a root volume but note that the tag will be gone with it and you cannot provide a new tag when attaching a new root volume'
21:34:13 <artom> We'll need a big warning regardless
21:34:16 <melwitt> ++
21:34:22 <artom> Hrmm, so actually, the API is user-facing, right?
21:34:36 <mriedem> our api is meant to be used by users yes...
21:34:37 <artom> So if we're going to warn about stuff in the API, it should be about what users can change/control
21:34:48 <artom> Ie, telling them their tag will disappear is fair game
21:34:51 <melwitt> I think ideal is option 1, the restriction seems artificial based on what's been explained. but can we even get option 1 done within a week
21:35:08 <artom> Telling them their unshelve might blow up because behind the scenes the operator is running different HVs isn't fair
21:35:12 <artom> Because they can't do anything about that
21:35:38 <artom> So with that reasoning I'm leaning more 2 now
21:36:05 <mriedem> artom: as in do what we have now, and in the future when we don't suck at scheduling, lift the api restriction
21:36:16 <artom> mriedem, yeah
21:36:19 <artom> (Heh, "when")
21:36:21 <melwitt> I guess I could see how failing an unshelve is worse than rejecting a create in a mixed HV env, because at create time you haven't invested much into your server yet
21:36:42 <mriedem> well, unshelve just fails,
21:36:46 <mriedem> we don't delete your snapshot
21:37:08 <mriedem> actually unshelve doesn't even fail if you have device tags
21:37:12 <mriedem> that's that bug from earlier
21:37:17 <mriedem> bug 1817927
21:37:20 <melwitt> oh yeah, right.
21:37:26 <artom> (That'd be hilarious if the FaultWrapper just deleted a random instance)
21:37:27 <mriedem> but nor does evacuate, resize or live migrate
21:38:00 <artom> So what does actually happen? The tag is just ignored?
21:38:19 <mriedem> yes
21:38:25 <artom> That's harmless
21:38:36 <mriedem> we don't honor the user request
21:38:42 <artom> So really 1 and 2 are the same in that sense
21:38:47 <artom> You end up with a tagless server
21:39:06 <melwitt> ok, this is pretty complicated to reason about but I think the problem has been adequately explained. so we could continue discussing in #openstack-nova and/or the ML
21:39:06 <artom> In 1 it's ignored by the unshelve
21:39:12 <artom> In 2 it's removed by the detach
21:39:41 <mriedem> ok we can move on, people can dump opinions in the ML
21:39:46 <artom> I have to bounce to pick up kids anyways
21:39:48 <artom> o/
21:39:52 <melwitt> ok, cool
21:39:58 <melwitt> last thing, open discussion
21:40:15 <melwitt> #topic Open discussion
21:40:26 <melwitt> anyone have anything for open discussion before we wrap up?
21:40:52 <melwitt> going
21:41:00 <melwitt> going
21:41:18 <melwitt> ok, guess that's it, thank you all
21:41:19 <melwitt> #endmeeting