16:00:24 <gibi> #startmeeting nova
16:00:26 <openstack> Meeting started Thu Apr 15 16:00:24 2021 UTC and is due to finish in 60 minutes.  The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:29 <openstack> The meeting name has been set to 'nova'
16:00:34 <gibi> o/
16:01:13 <ganso> o/
16:01:22 <lyarwood> o/
16:02:05 <elod> o/
16:02:08 <gibi> #topic Bugs (stuck/critical)
16:02:13 <gibi> No Critical bugs
16:02:16 <bauzas> \o
16:02:18 <gibi> #link 18 new untriaged bugs (+4 since the last meeting): #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
16:02:25 <gibi> It seems that the gerrit - launchpad integration started to work, so bug status is expected to be updated automatically from gerrit again.
16:02:29 <gibi> Details: #link http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2021-04-15.log.html#t2021-04-15T14:28:21
16:02:44 <dansmith> o/
16:02:59 <stephenfin> o/
16:03:05 <gibi> is there any specific bug we need to talk about?
16:03:24 <stephenfin> There's that gate bug, but I guess we'll get to it shortly
16:03:45 <gibi> yupp
16:03:57 <gibi> anything else on bug side?
16:04:37 <gibi> #topic Gate status
16:04:42 <gibi> We have a high failure rate in the live migration job due to #link https://bugs.launchpad.net/tempest/+bug/1924258
16:04:44 <openstack> Launchpad bug 1924258 in tempest "test_live_migration_with_trunk fails intermittently" [Undecided,In progress] - Assigned to Lajos Katona (lajos-katona)
16:05:01 <gibi> stephenfin: Is this the one you wanted to add? ^^
16:05:22 <gibi> it just got a WIP patch #link https://review.opendev.org/c/openstack/tempest/+/786465
16:05:23 <stephenfin> that's the one
16:06:06 <gibi> any other gate issue we should know about?
16:06:50 <gmann> o/
16:07:01 <gibi> #topic Release Planning
16:07:06 <gibi> Wallaby has been released
16:07:16 <gibi> Thank you all who made it happen! \o/
16:07:20 <gibi> Wallaby project update call happened today, the recording is available here: #link https://www.youtube.com/watch?v=tZ2bfdF0fOg
16:07:45 <gibi> We need two patches to land to fully open the master to Xena
16:07:48 <gibi> #link https://review.opendev.org/c/openstack/nova/+/782171
16:07:52 <gibi> #link https://review.opendev.org/c/openstack/nova/+/778923
16:08:03 <gibi> needs some eyes from cores ^^
16:08:18 <gibi> any other release info?
16:09:55 <stephenfin> I'm +2 on both of those now
16:10:02 <gibi> stephenfin: thanks!
16:10:05 <stephenfin> needs another +2 for https://review.opendev.org/c/openstack/nova/+/782171
16:10:05 <gibi> #topic PTG planning
16:10:16 <gibi> PTG is next week!
16:10:24 <gibi> topics: #link https://etherpad.opendev.org/p/nova-xena-ptg
16:10:26 <bauzas> I miss you all folks
16:10:32 <gibi> me too
16:11:00 * bauzas needs to prepare the PTG... by filling up his keg
16:11:15 <gibi> recent updates in the ptg schedule:
16:11:15 <gibi> A small neutron - nova cross project is booked for Friday 15:00 UTC, we have only one topic.
16:11:21 <gibi> Also there was a request for an interop session from Arkady and it is booked to Wednesday 14:00 UTC
16:11:35 <gibi> I did a minimal reorg on the nova topics but the basic rule is that we will go from top to bottom and potentially defer topics if the author / expert is not available.
16:12:03 <gibi> if anybody has a topic that needs special timing then let me know and I will note it
16:12:08 <gibi> and try to schedule it
16:12:38 <gibi> ptg bot is up to date #link http://ptg.openstack.org/ptg.html
16:13:29 <gibi> next week we will skip the weekly meeting due to PTG
16:14:29 <gibi> anything else about next week and the PTG?
16:15:40 <gibi> #topic Stable Branches
16:15:45 <gibi> stable/wallaby is open for bug fix backports
16:15:49 <gibi> stable gates should be OK from Wallaby till Pike (stackviz post-task workarounds are merged)
16:15:52 <gibi> EOM
16:15:55 <gibi> thanks elod for the update
16:15:59 <elod> np
16:16:03 <gibi> anything else on stable?
16:16:18 <elod> nothing else from me
16:17:08 <gmann> +1
16:17:28 <bauzas> how are the stable branch jobs ?
16:18:00 <gmann> it should be green now after stackviz workaround and grenade stable/train fixes
16:19:17 <gmann> this on pike merged 5 days ago, so should be all green https://review.opendev.org/c/openstack/nova/+/723055
16:19:24 <elod> yes, as I saw they are OK, thanks for the fixes
16:20:04 <bauzas> thanks
16:20:33 <gibi> #topic Sub/related team Highlights
16:20:37 <gibi> Libvirt (bauzas)
16:21:01 <bauzas> well, mnaser filed a complaint about how badly I write libvirt features
16:21:52 <bauzas> so, now, in order to avoid jail, I have to work on https://bugs.launchpad.net/nova/+bug/1900800 and deliver appropriate backports
16:21:53 <openstack> Launchpad bug 1900800 in OpenStack Compute (nova) "VGPUs is not recreated on host reboot" [Low,Confirmed] - Assigned to Sylvain Bauza (sylvain-bauza)
16:21:54 <bauzas> that's it.
16:21:59 <gibi> I saw you found a way forward about it
16:22:15 <bauzas> (this was a joke, to be 100% clear)
16:22:22 <bauzas> moving on
16:22:59 <gibi> #topic Open discussion
16:23:02 <lassimus87> I'd like to discuss adding support for guests with arch != host arch. I'm tracking @stephenfin's and others' topics in the ptg etherpad. I have a working concept here: https://review.opendev.org/c/openstack/nova/+/772156. I've been afk for the last few weeks waiting out the wallaby release period, so I have some new merge conflicts to resolve. I'm
16:23:03 <lassimus87> bringing it up here because there seem to be differing opinions on the direction of nova regarding emulation support.
16:23:45 <gibi> lassimus87: could you elaborate on what the differing opinions are?
16:24:10 <bauzas> this rings a bell.
16:24:18 <lassimus87> "Only conflict is if we ever wanted to support non-host guests using these architectures but that seems to be a non-goal of nova? (maybe we should rephrase this as dropping support for 32bit hosts) that might be a valid cross project goal." --@stephenfin from the xena ptg notes
16:24:21 <bauzas> belmoreira was interested in this, if I recall correctly
16:24:49 <lassimus> okay I got my nick back :)
16:25:20 <stephenfin> I think this would be worth discussing at the PTG, if possible
16:25:50 <stephenfin> My point there is that we tend to conflate host and guest architectures, since we haven't supported host != guest architecture for some time now
16:26:05 <lassimus> I'm fine waiting to discuss until the PTG. I'm new to the nova dev community, so I didn't want to add discussion directly to etherpad
16:26:28 <gibi> lassimus: feel free to add discussion to the ptg etherpad
16:26:42 <stephenfin> We have checks for things like MIPS, which no new hardware is being made for, which means we could drop the host architecture support, but I'm not sure if we can drop the guest architecture support
16:26:50 <stephenfin> so yeah, a good PTG topic
16:27:08 <gibi> lassimus: will you be able to join us during the PTG next week?
16:27:17 <lassimus> awesome. I'll add some thoughts on etherpad, and I look forward to hashing it out next week
16:27:18 <lassimus> yes
16:27:43 <gibi> lassimus: I will make sure to ping you when we reach this topic next week
16:28:00 <lassimus> perfect
16:28:12 <gibi> I do feel that we had a guest != host request from CERN so you are not alone
16:28:30 <bauzas> right, hence my courtesy ping to belmoreira :)
16:28:36 <gibi> bauzas: ++
16:28:40 <lassimus> my customer is the Georgia Cyber Center, and some other minor interested parties
16:29:05 <gibi> lassimus: cool
16:29:12 <bauzas> that's a reasonable ask, but we need to discuss the design
16:29:12 * artom thought there was a massive emulation performance penalty on that, last time he checked
16:29:19 <artom> Though I guess it depends on the specific arch's
16:29:27 <bauzas> artom: yup, from my recollection
16:29:38 <lassimus> performance isn't always the goal
16:29:39 <bauzas> but there are good reasons now to mix them up
16:29:47 <artom> So I am a bit curious what use cases don't mind the perf hit
16:29:55 <bauzas> spec up !
16:29:56 <bauzas> :p
16:30:14 <bauzas> kidding aside, sounds like a good PTG discussion
16:30:18 <gibi> artom: I can imagine a CI system functional testing arch specific app in a cheap way
16:30:34 <gibi> ppc tends to be expensive
16:30:39 <lassimus> yeah, I'm happy to brain dump here, but it seems like a better fit for the PTG
16:30:51 <gibi> sure, let's do the braindumping next week
16:31:01 <gibi> any other topic for today?
16:31:09 <ganso> gibi: o/ my topic is on the agenda
16:31:26 <gibi> ganso: ohh, I missed that, please tell us
16:31:38 <ganso> topic: update on bug 1821755 (ganso)
16:31:39 <openstack> bug 1821755 in OpenStack Compute (nova) "live migration break the anti-affinity policy of server group simultaneously" [Medium,In progress] https://launchpad.net/bugs/1821755 - Assigned to Boxiang Zhu (bxzhu-5355)
16:31:57 <ganso> so, I've brought this up in a meeting previously about addressing this bug
16:32:17 <ganso> but I leaned towards a redesign of the (anti-)affinity functionality in placement
16:32:56 <ganso> I spent a significant amount of effort on that and hit several struggles. It can be done, but the amount of work and complexity has increased far beyond what I initially estimated
16:33:08 <bauzas> affinity in placement is a can of worms
16:33:26 <ganso> bauzas: yea, looks like I hit some of those worms xD
16:33:29 <ganso> therefore I decided to take a step back and try a simpler alternative
16:33:30 <artom> Angry worms, with teeth and spikes and venomous stingers
16:33:33 <bauzas> I thought we said we should model the affinity between RPs as a distance between them
16:33:39 <ganso> voilà https://review.opendev.org/c/openstack/nova/+/784166
16:34:10 <ganso> basically I took inspiration from a previous attempt on solving the bug (https://review.openstack.org/651969)
16:34:16 <ganso> and did some things differently
16:34:30 <bauzas> ganso: I'm just discovering the bug, what's the problem?
16:34:34 <ganso> in my testing I was not able to reproduce the issue for anti-affinity any longer
16:35:08 <ganso> bauzas: sorry I didn't understand your question?
16:35:10 <bauzas> don't we have the late affinity check ?
16:35:23 <bauzas> on the compute service
16:35:27 <gibi> bauzas: I think this patch now adds the late affinity check for live migration
16:35:35 <bauzas> or have we removed it?
16:35:35 <ganso> bauzas: oh ok, so the problem is that there are race conditions that violate the policy when doing concurrent migrations (live or cold)
16:35:36 <artom> bauzas, yeah, we only have that check for boot, and no other move operation
16:35:52 <bauzas> artom: really? I'm surprised
16:36:01 <artom> I'm not :P
16:36:15 <ganso> bauzas: the existing check works only when creating instances, and it doesn't account for instances that are being migrated to that host
16:36:34 <bauzas> actually reading mriedem's comment
16:36:58 <bauzas> okay, this sounds like a decent review request then
16:37:12 <gibi> yeah, I queued it up to my review list
16:37:12 <artom> I mean, I agree the "correct" way to do it would be placement
16:37:26 <artom> Anything else is hax.
16:37:36 <artom> OTOH, the former is hard, and the latter is much quicker and easier
16:37:38 <gibi> and I agree that affinity and placement is a hard topic so I have no problem having a hax in the meantime
16:37:47 <ganso> there is a lengthy discussion on the gerrit page, I tried to address all comments with as much detail as I can to move this forward. There is also a summary of my work on the redesign in one of the comments (the lengthiest one)
16:38:26 <ganso> gibi, artom: so, the patch I'm proposing only addresses anti-affinity, not affinity
16:38:45 <gibi> ganso: thanks for picking this work up I will try to get to it tomorrow
16:38:56 <ganso> those are 2 different cans of worms, and I found the affinity ones to be the most venomous ones :P
16:38:56 <bauzas> ganso: which is what the late-affinity check is doing
16:39:27 <bauzas> violating the affinity policy isn't a race for a single compute
16:39:42 <ganso> doing both through placement is a huge amount of work. Doing just one leaves things hanging and incomplete, possibly requiring another redesign for implementing the other one
16:39:46 <bauzas> you just don't see the race as both instances are spread
16:40:15 <bauzas> ganso: I totally agree and I fundamentally disagree with sean-k-mooney's objection :)
16:40:30 <bauzas> sad he isn't here :)
16:40:33 <ganso> bauzas: yea I use 5 computes in my lab, makes it much easier to reproduce and visualize the violations
16:41:10 <ganso> that's all I had. Thanks all and looking forward to your reviews :)
16:41:21 <gibi> ganso: thanks for working on the bug
16:41:23 <bauzas> yup
16:41:32 <gibi> anything else for today?
16:42:53 <gibi> if not then I thank all of you for joining today
16:43:14 <gibi> I will have a glass of wine for the wallaby release. thanks again for making that happen
16:43:20 <gibi> not the wine, the release :D
16:43:56 <gibi> #endmeeting