Tuesday, 2019-09-10

*** ociuhandu has joined #openstack-nova00:10
*** factor has joined #openstack-nova00:12
*** ociuhandu has quit IRC00:15
*** ganso has quit IRC00:23
*** avolkov has quit IRC00:29
*** gbarros has joined #openstack-nova00:30
*** sapd1_x has quit IRC00:35
*** macz has quit IRC00:39
*** gbarros has quit IRC00:40
*** gyee has quit IRC00:42
*** gbarros has joined #openstack-nova00:46
*** markvoelker has joined #openstack-nova00:46
*** markvoelker has quit IRC00:48
*** markvoelker has joined #openstack-nova00:49
*** spatel has joined #openstack-nova00:49
*** spatel has quit IRC00:53
*** gbarros has quit IRC00:57
*** markvoelker has quit IRC00:59
*** markvoelker has joined #openstack-nova00:59
*** markvoelker has quit IRC01:04
*** nicolasbock has quit IRC01:04
*** nicolasbock has joined #openstack-nova01:04
*** markvoelker has joined #openstack-nova01:07
*** markvoelker has quit IRC01:17
*** markvoelker has joined #openstack-nova01:18
*** markvoelker has quit IRC01:23
*** mtanino has joined #openstack-nova01:25
*** hongbin has joined #openstack-nova01:35
*** awalende has joined #openstack-nova01:46
*** markvoelker has joined #openstack-nova01:48
*** awalende has quit IRC01:50
openstackgerritMerged openstack/python-novaclient master: Microversion 2.79: Add delete_on_termination to volume-attach API  https://review.opendev.org/67348501:59
*** nicolasbock has quit IRC02:01
*** spsurya has joined #openstack-nova02:17
*** markvoelker has quit IRC02:20
*** markvoelker has joined #openstack-nova02:22
*** icarusfactor has joined #openstack-nova02:25
openstackgerritAkira KAMIO proposed openstack/nova master: VMware: disk_io_limits settings are not reflected when resize  https://review.opendev.org/68029602:27
*** factor has quit IRC02:27
*** ganso has joined #openstack-nova02:33
*** larainema has joined #openstack-nova02:33
*** macz has joined #openstack-nova02:36
*** macz has quit IRC02:41
*** BjoernT has joined #openstack-nova03:01
openstackgerritArtom Lifshitz proposed openstack/nova master: New objects for NUMA live migration  https://review.opendev.org/63482703:01
openstackgerritArtom Lifshitz proposed openstack/nova master: LM: Use Claims to update numa-related XML on the source  https://review.opendev.org/63522903:01
openstackgerritArtom Lifshitz proposed openstack/nova master: NUMA live migration support  https://review.opendev.org/63460603:01
openstackgerritArtom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration  https://review.opendev.org/64002103:01
openstackgerritArtom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration  https://review.opendev.org/67259503:01
*** hamzy_ has quit IRC03:06
*** BjoernT has quit IRC03:24
*** igordc has quit IRC03:31
*** dave-mccowan has quit IRC03:35
*** ash2307 has joined #openstack-nova03:36
*** Izza_ has joined #openstack-nova03:37
Izza_hello... good day, i'm doing tempest testing on an openstack-helm environment but i encountered the error "Got Server Fault"03:38
Izza_ tempest.lib.exceptions.ServerFault: Got server fault03:38
Izza_Failed 1 tests - output below:03:39
Izza_"/usr/lib/python2.7/site-packages/tempest/test.py", line 172, in setUpClass03:39
*** Izza_ has quit IRC03:39
*** Izza_ has joined #openstack-nova03:43
Izza_hi, can you please help me with my tempest testing? scenario: tempest.api.compute.servers.test_create_server.ServersTestJSON, error: tempest.lib.exceptions.ServerFault: Got server fault03:44
Izza_Details: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.03:44
*** hongbin has quit IRC03:46
*** mkrai has joined #openstack-nova03:59
openstackgerritArtom Lifshitz proposed openstack/nova master: NUMA live migration support  https://review.opendev.org/63460603:59
openstackgerritArtom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration  https://review.opendev.org/64002103:59
openstackgerritArtom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration  https://review.opendev.org/67259503:59
*** udesale has joined #openstack-nova04:07
*** etp has joined #openstack-nova04:18
*** mtanino has quit IRC04:19
openstackgerritSundar Nadathur proposed openstack/nova master: ksa auth conf and client for Cyborg access  https://review.opendev.org/63124204:30
openstackgerritSundar Nadathur proposed openstack/nova master: Add Cyborg device profile groups to request spec.  https://review.opendev.org/63124304:30
openstackgerritSundar Nadathur proposed openstack/nova master: Create and bind Cyborg ARQs.  https://review.opendev.org/63124404:30
openstackgerritSundar Nadathur proposed openstack/nova master: Get resolved Cyborg ARQs and add PCI BDFs to VM's domain XML.  https://review.opendev.org/63124504:30
openstackgerritSundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted.  https://review.opendev.org/67373504:30
openstackgerritSundar Nadathur proposed openstack/nova master: [WIP] add cyborg tempest job  https://review.opendev.org/67099904:30
*** ociuhandu has joined #openstack-nova04:30
*** ociuhandu has quit IRC04:34
*** Luzi has joined #openstack-nova04:36
*** macz has joined #openstack-nova04:37
openstackgerritSundar Nadathur proposed openstack/nova master: Add Cyborg device profile groups to request spec.  https://review.opendev.org/63124304:39
openstackgerritSundar Nadathur proposed openstack/nova master: Create and bind Cyborg ARQs.  https://review.opendev.org/63124404:39
openstackgerritSundar Nadathur proposed openstack/nova master: Get resolved Cyborg ARQs and add PCI BDFs to VM's domain XML.  https://review.opendev.org/63124504:39
openstackgerritSundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted.  https://review.opendev.org/67373504:39
openstackgerritSundar Nadathur proposed openstack/nova master: [WIP] add cyborg tempest job  https://review.opendev.org/67099904:39
*** etp_ has joined #openstack-nova04:41
*** macz has quit IRC04:41
*** etp_ has quit IRC04:41
*** etp_ has joined #openstack-nova04:42
*** damien_r has joined #openstack-nova04:42
*** etp_ has quit IRC04:43
*** mkrai has quit IRC04:43
*** igordc has joined #openstack-nova04:44
*** etp_ has joined #openstack-nova04:44
*** etp has quit IRC04:46
*** etp has joined #openstack-nova04:46
*** damien_r has quit IRC04:47
*** etp_ has quit IRC04:47
*** etp_ has joined #openstack-nova04:47
*** markvoelker has quit IRC04:48
*** etp_ has quit IRC04:48
*** etp_ has joined #openstack-nova04:48
*** etp has quit IRC04:50
*** etp_ is now known as etp04:50
*** etp_ has joined #openstack-nova04:50
*** etp has quit IRC04:51
*** ricolin has joined #openstack-nova05:00
*** ratailor has joined #openstack-nova05:04
*** etp_ has quit IRC05:06
*** markvoelker has joined #openstack-nova05:26
*** markvoelker has quit IRC05:30
*** ralonsoh has joined #openstack-nova05:38
*** pots has quit IRC05:42
*** etp has joined #openstack-nova05:47
openstackgerritLuyao Zhong proposed openstack/nova master: Claim resources in resource tracker  https://review.opendev.org/67845206:13
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces  https://review.opendev.org/67845306:13
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845406:13
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845506:13
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845606:13
*** dtantsur|afk is now known as dtantsur06:15
*** arshad777 has joined #openstack-nova06:20
*** dklyle has quit IRC06:20
*** dklyle has joined #openstack-nova06:21
arshad777I have created a sync replicated volume with peer persistence enabled. Attached this volume06:24
*** sapd1_x has joined #openstack-nova06:25
*** N3l1x has quit IRC06:27
*** jawad_axd has joined #openstack-nova06:29
*** slaweq has joined #openstack-nova06:41
*** ileixe has joined #openstack-nova06:45
*** luksky has joined #openstack-nova06:52
*** ileixe has quit IRC06:52
*** igordc has quit IRC07:00
*** avolkov has joined #openstack-nova07:01
*** tesseract has joined #openstack-nova07:05
*** awalende has joined #openstack-nova07:08
*** rcernin has quit IRC07:09
*** damien_r has joined #openstack-nova07:10
*** damien_r has quit IRC07:11
*** damien_r has joined #openstack-nova07:11
*** ociuhandu has joined #openstack-nova07:14
*** maciejjozefczyk has joined #openstack-nova07:15
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964007:18
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847007:18
*** threestrands has quit IRC07:20
*** ociuhandu has quit IRC07:21
*** rpittau|afk is now known as rpittau07:28
*** cdent has joined #openstack-nova07:30
*** tssurya has joined #openstack-nova07:31
openstackgerritTakashi NATSUME proposed openstack/python-novaclient master: doc: Add support microversions for options  https://review.opendev.org/68117407:36
*** macz has joined #openstack-nova07:39
*** macz has quit IRC07:43
*** trident has quit IRC07:50
*** lpetrut has joined #openstack-nova07:55
*** jangutter has joined #openstack-nova07:58
*** trident has joined #openstack-nova08:01
*** priteau has joined #openstack-nova08:07
*** sapd1_x has quit IRC08:18
*** ociuhandu has joined #openstack-nova08:20
bauzasstephenfin: once you're there, see dansmith's comments08:21
bauzasstephenfin: he wants to squash https://review.opendev.org/#/c/680983/08:21
bauzasstephenfin: so please try to provide the new revisions this morning08:21
*** tkajinam has quit IRC08:22
*** panda|rover has quit IRC08:23
*** panda has joined #openstack-nova08:24
*** ociuhandu has quit IRC08:25
*** ociuhandu has joined #openstack-nova08:26
*** ociuhandu has quit IRC08:30
*** ociuhandu has joined #openstack-nova08:30
*** takashin has left #openstack-nova08:32
*** derekh has joined #openstack-nova08:33
*** ociuhandu has quit IRC08:40
*** ociuhandu has joined #openstack-nova08:41
*** ociuhandu has quit IRC08:44
*** yaawang_ has joined #openstack-nova08:49
*** yaawang has quit IRC08:49
stephenfinsure thing08:51
openstackgerritBalazs Gibizer proposed openstack/nova master: Support reverting migration / resize with bandwidth  https://review.opendev.org/67614009:02
*** jaosorior has joined #openstack-nova09:11
openstackgerritweibin proposed openstack/nova master: Add support for using ceph RBD ereasure code  https://review.opendev.org/68118809:11
*** shilpasd has joined #openstack-nova09:17
*** tetsuro has joined #openstack-nova09:18
openstackgerritBalazs Gibizer proposed openstack/nova master: Func test for migrate re-schedule with bandwidth  https://review.opendev.org/67697209:19
openstackgerritBalazs Gibizer proposed openstack/nova master: Support migrating SRIOV port with bandwidth  https://review.opendev.org/67698009:21
openstackgerritBalazs Gibizer proposed openstack/nova master: Allow migrating server with port resource request  https://review.opendev.org/67149709:23
stephenfinbauzas: trivial patch with no conflicts here https://review.opendev.org/#/c/679339/09:24
stephenfin(if you'd be so kind)09:24
bauzasreminder : I'm French, I'm never kind09:25
stephenfinbauzas: I'm going to rebase my cpu-resources series on top of the SEV series to head off the incoming merge conflicts too09:25
stephenfinwhile I rework it, that is09:25
bauzasstephenfin: +W'd FWIW09:26
openstackgerritBalazs Gibizer proposed openstack/nova master: Do not query allocations twice in finish_revert_resize  https://review.opendev.org/67882709:26
bauzasstephenfin: cool, I'll look at gibi's series then09:26
openstackgerritBalazs Gibizer proposed openstack/nova master: Allow resizing server with port resource request  https://review.opendev.org/67901909:28
openstackgerritBalazs Gibizer proposed openstack/nova master: Extract pf$N literals as constants from func test  https://review.opendev.org/68099109:30
openstackgerritBalazs Gibizer proposed openstack/nova master: Improve dest service level func tests  https://review.opendev.org/68099809:30
*** yaawang_ has quit IRC09:33
*** yaawang has joined #openstack-nova09:33
openstackgerritBrin Zhang proposed openstack/python-novaclient master: Microversion 2.80: Add user_id/project_id to migration-list API  https://review.opendev.org/67502309:34
*** boxiang has joined #openstack-nova09:35
*** yaawang has quit IRC09:37
*** yaawang has joined #openstack-nova09:37
openstackgerritBrin Zhang proposed openstack/python-novaclient master: Microversion 2.80: Add user_id/project_id to migration-list API  https://review.opendev.org/67502309:38
*** rcernin has joined #openstack-nova09:38
*** yaawang has quit IRC09:41
*** aarents has quit IRC09:42
*** yaawang has joined #openstack-nova09:43
bauzasgibi: I'm almost done with https://review.opendev.org/#/c/676140 but I'll need to do some checks later today09:48
* bauzas disappears for the gym09:48
openstackgerritBrin Zhang proposed openstack/python-novaclient master: Microversion 2.80: Add user_id/project_id to migration-list API  https://review.opendev.org/67502309:57
*** dklyle has quit IRC10:03
*** dklyle has joined #openstack-nova10:04
*** udesale has quit IRC10:09
aspiersstephenfin: I'm here in case you have any last minute questions about https://review.opendev.org/#/c/644565/10:09
*** udesale has joined #openstack-nova10:10
*** markvoelker has joined #openstack-nova10:16
*** tetsuro has quit IRC10:17
*** markvoelker has quit IRC10:21
gibibauzas: thanks10:21
*** macz has joined #openstack-nova10:25
*** macz has quit IRC10:30
openstackgerritAlexandra Settle proposed openstack/nova master: Fixing broken links  https://review.opendev.org/68120610:31
openstackgerritAlexandra Settle proposed openstack/nova master: Fixing broken links  https://review.opendev.org/68120610:32
*** jawad_axd has quit IRC10:40
*** tbachman has quit IRC10:42
*** markvoelker has joined #openstack-nova10:55
aspiersIs there a way to get a guest's libvirt XML via the nova api?  I'm guessing no11:00
*** ociuhandu has joined #openstack-nova11:00
*** markvoelker has quit IRC11:00
aspiersWould be nice if it was available from https://docs.openstack.org/api-ref/compute/?expanded=show-server-diagnostics-detail#show-server-diagnostics11:01
*** ociuhandu has quit IRC11:01
aspiersWithout the XML I am struggling to see how Tempest can verify that an SEV guest was actually booted with SEV enabled11:01
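For context, the diagnostics call linked above returns hypervisor-specific runtime info but not the domain XML. A minimal sketch of querying it with python-novaclient (session setup elided; assumes admin credentials and microversion 2.48+ for the standardized response keys):

    from novaclient import client

    nova = client.Client('2.48', session=session)  # session setup elided
    resp, diags = nova.servers.diagnostics(server_id)
    print(diags['driver'], diags['hypervisor'], diags['state'])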
*** nicolasbock has joined #openstack-nova11:06
gmannaspiers: same case for NFV use case testing from artom, L180 - https://etherpad.openstack.org/p/qa-train-ptg11:07
aspiershmm11:07
aspiersgmann: which line?11:07
*** ociuhandu has joined #openstack-nova11:07
gmannat the QA PTG, we accepted the idea of adding that tempest plugin into QA but it was an action item for artom to propose that.11:08
gmannL18011:08
aspiersthanks11:08
aspiersoh nice11:08
aspiersyeah "white box plugin" sounds like a good concept11:08
gmannyeah and it can be expanded with more use cases which are out of scope for Tempest11:09
aspiersright11:09
*** rouk has quit IRC11:09
aspiersgmann: although I wonder if adding libvirt XML to the show-server-diagnostics API call might be a simpler solution11:10
aspiersshowing the XML could be an admin-only thing11:11
gmannbut even admin-only, does it expose more info than nova should? (from a security point of view)11:12
gmannI think it was discussed previously also but sean-k-mooney or artom might know more on that.11:13
*** udesale has quit IRC11:14
*** larainema has quit IRC11:15
artomgmann, yeah, I still want to do that, I was meant to propose a spec for Train but only have a very early WIP up11:16
*** dave-mccowan has joined #openstack-nova11:17
artomThe code exists in RDO project's gerrit, and a Red Hat QE and myself wanted to clear outstanding reviews on there and merge its tests for NUMA live migration11:17
artomBut that didn't happen yet11:17
gmanni see.11:17
sean-k-mooneygmann: sorry i was not following chat11:18
sean-k-mooneywhat were we talking about11:18
gmannsean-k-mooney: question from aspiers on exposing the libvirt XML to the show-server-diagnostics API11:18
sean-k-mooneyaspiers: no there is not a way to get the xml from the api11:18
sean-k-mooneyand there never will be11:19
aspiersthat's a bold statement :)11:19
sean-k-mooneyit completely violates the cloud abstraction to expose that level of detail via the api11:19
sean-k-mooneya non-admin is not even meant to know the hypervisor that is in use11:19
aspierssean-k-mooney: as admin-only diagnostics there is no violation11:20
*** jawad_axd has joined #openstack-nova11:20
aspierssean-k-mooney: we're not talking about non-admins11:20
sean-k-mooneyas admin-only, technically they could just ssh into the host and look at the xml11:20
aspierssean-k-mooney: not from tempest they can't11:20
aspiersplus that's a lot less convenient11:20
sean-k-mooneytempest should not be asserting behavior of the xml generation11:20
openstackgerritBrin Zhang proposed openstack/python-novaclient master: Microversion 2.80: Add user_id/project_id to migration-list API  https://review.opendev.org/67502311:20
aspierssean-k-mooney: please first read the use case above to understand the need ^^^11:21
sean-k-mooneythat is what functional or white box testing is for11:21
artomaspiers, yeah, that's quite explicitly out of scope for tempest11:21
sean-k-mooneytempest is for blackbox testing11:21
artomBut... quite explicitly *in* scope for whitebox :)11:21
gmannyeah11:21
artomSo now you've landed yourself on the list of people interested, and will be poked mercilessly once it's ready ;)11:21
aspiersagain, this is repeating discussion from a few minutes ago when we talked about the white box plugin11:21
aspiersif there is a white box plugin for tempest, then that means white box testing *is* in scope for the tempest ecosystem, even if not the core11:22
sean-k-mooneyaspiers: so why can't you boot a vm, ssh in, and detect that sev is configured from within the vm?11:22
sean-k-mooneyor at least available11:22
aspierssean-k-mooney: how would I detect that?11:22
sean-k-mooneylscpu?11:22
sean-k-mooneyis there not an msr or cpu flag for sev11:22
aspierssean-k-mooney: have you tested that?11:22
sean-k-mooneyno i dont have sev hardware11:22
gmannaspiers: within the scope of the QA ecosystem, not the Tempest ecosystem. Tempest is just a tool under QA :)11:23
artomaspiers, yeah, we're not saying "don't do it", we're saying "don't propose patches for it to Tempest"11:23
aspiersartom: OK, it sounded like the former before :)11:23
aspiersgmann: the white box tempest plugin isn't in the tempest ecosystem? ;-)11:24
sean-k-mooneyaspiers: whitebox and the intel nfv test repo use tempest as a framework11:24
sean-k-mooneyto do this type of testing11:24
sean-k-mooneybut its not in tempest as its out of scope fo tempest11:24
sean-k-mooneybut a whitebox style tempest plugin for sev would be fine11:24
sean-k-mooneyor add it to whitebox11:24
artomsean-k-mooney, unrelated, but we had some discussion around saving the new NUMA topology in https://review.opendev.org/#/c/634606/75/nova/compute/manager.py@7223 - and in func tests at least, that instance.refresh() isn't necessary (yes, the tests check the InstanceNUMATopology)11:25
gmannaspiers: It will be in the QA ecosystem. it can be done via a tempest plugin or a separate testing framework like extreme-testing (which never made progress), but it will be a separate project under QA with a separate team.11:25
artomsean-k-mooney, I'll try to get to the office later this morning, to see what's up with my machine, would you have the bandwidth to play around with that in the meantime in your env?11:25
sean-k-mooneyartom: since this is apparently our highest priority i can make time11:26
sean-k-mooneywhat exactly do you want me to test11:26
sean-k-mooneyremove the instance.refresh11:26
artomsean-k-mooney, I did that already in the latest patchset11:26
artomMaking sure that the new instance NUMA topology is saved in the DB11:27
sean-k-mooneyand check both the db and virsh to confirm that the state is updated correctly?11:27
sean-k-mooneyok11:27
sean-k-mooneyya ill do that now11:27
artomThank you (for the ∞th time)11:27
artom:)11:27
aspierssean-k-mooney: BTW lscpu on the guest does not mention sev at all11:28
sean-k-mooneyas i said before, i have exposed the servers im using for testing via port forwarding, so if you continue to have issues then you can ssh into them11:28
sean-k-mooneyaspiers: is there anything in dmidecode/dmesg to indicate sev11:29
sean-k-mooneyi thought the guest had to set bit 48 to 1 to enable the encryption11:29
sean-k-mooneyfor pointers11:29
aspierssean-k-mooney: http://paste.openstack.org/show/774681/11:30
aspiersdoesn't even look right11:30
artomsean-k-mooney, IIRC when I tried connecting last time I couldn't - but yeah, what's the connection info again?11:31
sean-k-mooneyhttps://events.linuxfoundation.org/wp-content/uploads/2017/12/Extending-Secure-Encrypted-Virtualization-with-SEV-ES-Thomas-Lendacky-AMD.pdf looking at slide 12 we might be able to check it via the gurest msr11:32
sean-k-mooneyartom: i have two routers, my isp one and my ubiquiti one. my isp router firewall was blocking it so i turned it off and it started working11:32
aspierssean-k-mooney: I will ask the experts11:33
artomsean-k-mooney, sure, but I still don't have the IP/FQDN in my bash history for some reason11:34
sean-k-mooneyartom: ya i know im looking it up in mine/my router config11:35
*** priteau has quit IRC11:38
*** mtreinish has joined #openstack-nova11:39
kashyapaspiers: Randomly chiming in, but there is an MSR for SEV: https://www.kernel.org/doc/html/latest/x86/amd-memory-encryption.html11:41
kashyapaspiers: And it's reported via `cpuid`11:41
kashyap"Support for SME and SEV can be determined through the CPUID instruction. The CPUID function 0x8000001f reports information related to SME:"11:41
kashyapAnd:11:41
kashyap"If SEV is supported, MSR 0xc0010131 (MSR_AMD64_SEV) can be used to determine if SEV is active: [...]"11:42
aspiersYeah, I've already used that in the past11:42
kashyapAh, then disregard me.11:42
* kashyap goes back to fiddling with what he needs to fiddle with11:42
aspiersNo, the reminder is appreciated11:42
kashyapaspiers: Your goal is to check if the instance (in the upstream CI) was indeed booted with SEV, yeah?11:43
aspiersright11:43
aspiersI guess the JeOS image will need to include cpuid11:44
kashyapIsn't the JeOS (Just Enough OS, I presume) CirrOS in this case?11:44
aspiersIt's whatever tempest is configured with11:44
kashyap(Nod)11:45
sean-k-mooneyaspiers: has one of the upstream ci providers provided you with a label that will run on SEV hardware11:47
sean-k-mooneyaspiers: because 90% of the upstream ci clouds are probably running intel x86_6411:47
sean-k-mooneyalthough rackspace did run power for a while11:47
aspierssean-k-mooney: SUSE has our own SEV boxes which can run 3rd party CI11:48
sean-k-mooneyah so its for the suse thrid party ci not for upstream11:48
aspierswell if upstream has SEV hardware then great, but I was not expecting that any time soon11:48
sean-k-mooneyaspiers: i don't think it currently does11:49
sean-k-mooneyor at least not in a way you can target11:49
sean-k-mooneyim sure some of the clouds probably have at least a small amd epyc inventory, if for nothing else but their own internal validation11:49
aspiersyeah11:50
donnydsean-k-mooney: have the numa jobs been running?12:00
donnydI saw last night it was having some inbound ssh timeout issues.12:02
sean-k-mooneyi have not checked this morning but i think clarkb kicked off a job around 4/5 am so ill see if that passed12:04
sean-k-mooneydonnyd: so that one failed at 04:5412:04
*** markvoelker has joined #openstack-nova12:04
sean-k-mooneydonnyd: was that after your l3 agent restart12:05
donnydi didn't restart till well after you were talking about it on infra last night12:05
sean-k-mooneydonnyd: ya looking at the infra scrollback clarkb kicked that off after you did the restart12:06
openstackgerritChris Dent proposed openstack/os-resource-classes master: Update api-ref link to canonical location  https://review.opendev.org/68123512:06
donnydoh yea i see12:06
donnydIt really just makes no sense though12:07
donnydall the rest of the labels seem to work without issue12:07
donnydmostly without issue... not any more than any other provider at least12:07
*** tbachman has joined #openstack-nova12:08
openstackgerritChris Dent proposed openstack/os-traits master: Update README to be a bit more clear  https://review.opendev.org/68123712:09
donnydSo the old label we have set up for numa, does that still work?12:09
donnydI haven't set it back on my end12:10
openstackgerritBalazs Gibizer proposed openstack/nova master: Shrink the race window in confirm resize func test  https://review.opendev.org/68123812:12
*** etp has quit IRC12:16
kashyapaspiers: Is there a way to detect from _inside_ the guest that it has indeed booted with SEV?12:25
kashyapaspiers: (From the host, we can use the CPUID bit or other ways)12:25
*** macz has joined #openstack-nova12:26
kashyapaspiers: Meanwhile, I learnt that if SEV isn't provided by the host, an SEV-enabled kernel should fail to boot.12:26
openstackgerritAlexandra Settle proposed openstack/nova master: Fixing broken links  https://review.opendev.org/68120612:27
openstackgerritChris Dent proposed openstack/os-traits master: Update README to be a bit more clear  https://review.opendev.org/68123712:29
gibiefried, aspiers: Is there anything I can do regarding the SEV series?12:30
stephenfinaspiers: Random question: why can we not add the iommu attribute for all those device types?12:30
stephenfingibi: I'm reviewing the last patch now. Think you've already hit it though12:30
stephenfinby which I mean the first one12:30
stephenfinsince the other two are +212:30
stephenfin+W12:30
gibistephenfin: cool. Yeah I tried to find somebody who can look at the first12:30
*** macz has quit IRC12:30
gibibut then it is in good hands now12:31
kashyapaspiers: Also, when you're about, see my response to your comment: https://review.opendev.org/#/c/348394/1012:33
kashyapaspiers: If you can confirm that /usr/share/qemu/ovmf-x86_64-suse-code.bin is indeed the binary built with Secure Boot, then we can fix it12:34
luyaoHi, everyone, I have a question: when an instance is in rebuild, how many allocations will it have? Both new and old?12:34
brinzhangefried: Could you please review this patch https://review.opendev.org/#/c/681151/, it changes the novalient version to 15.1.0, needed by https://review.opendev.org/#/c/673725/12:43
*** derekh has quit IRC12:43
*** derekh has joined #openstack-nova12:43
stephenfinaspiers: Is that follow-up for https://review.opendev.org/#/c/666616/ around yet?12:44
gibistephenfin: I did not find the followup either so I guess it isn't12:47
*** tbachman has quit IRC12:47
*** mriedem has joined #openstack-nova12:55
*** udesale has joined #openstack-nova12:57
openstackgerritStephen Finucane proposed openstack/nova master: Apply SEV-specific guest config when SEV is required  https://review.opendev.org/64456512:58
openstackgerritStephen Finucane proposed openstack/nova master: Reject live migration and suspend on SEV guests  https://review.opendev.org/68015812:58
openstackgerritStephen Finucane proposed openstack/nova master: Enable booting of libvirt guests with AMD SEV memory encryption  https://review.opendev.org/66661612:58
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Start reporting PCPU inventory to placement  https://review.opendev.org/67179312:58
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: '_get_(v|p)cpu_total' to '_get_(v|p)cpu_available'  https://review.opendev.org/67269312:58
openstackgerritStephen Finucane proposed openstack/nova master: objects: Add 'InstanceNUMATopology.cpu_pinning' property  https://review.opendev.org/68010612:58
openstackgerritStephen Finucane proposed openstack/nova master: Validate CPU config options against running instances  https://review.opendev.org/68010712:58
openstackgerritStephen Finucane proposed openstack/nova master: trivial: Use sane indent  https://review.opendev.org/68022912:58
openstackgerritStephen Finucane proposed openstack/nova master: objects: Add 'NUMACell.pcpuset' field  https://review.opendev.org/68010812:58
openstackgerritStephen Finucane proposed openstack/nova master: hardware: Differentiate between shared and dedicated CPUs  https://review.opendev.org/67180012:58
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Start reporting 'HW_CPU_HYPERTHREADING' trait  https://review.opendev.org/67557112:58
openstackgerritStephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/67180112:58
openstackgerritStephen Finucane proposed openstack/nova master: fakelibvirt: Make 'Connection.getHostname' unique  https://review.opendev.org/68106012:58
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Mock 'libvirt_utils.file_open' properly  https://review.opendev.org/68106112:58
openstackgerritStephen Finucane proposed openstack/nova master: Add reshaper for PCPU  https://review.opendev.org/67489512:58
bauzasgibi: added a question for you https://review.opendev.org/#/c/676140/12:58
bauzashadù12:58
gibibauzas: looking12:59
*** hamzy_ has joined #openstack-nova12:59
*** nweinber has joined #openstack-nova12:59
*** tbachman has joined #openstack-nova13:00
*** BjoernT has joined #openstack-nova13:02
*** macz has joined #openstack-nova13:03
stephenfinbauzas, alex_xu: I've merged in those follow-ups to https://review.opendev.org/#/c/671793/ and the next two patches, per dansmith's request, if you fancy re +2ing13:04
bauzasack13:05
gibibauzas: replied in https://review.opendev.org/#/c/67614013:07
bauzasgibi: cool thanks13:07
*** macz has quit IRC13:07
luyaoefried: Hi efried,  are you around?13:07
mriedemgibi: do you have a bug reported for https://954d3ddb67e757934983-a9cc155153d08dd30dfffbbf1d71d234.ssl.cf5.rackcdn.com/676138/16/gate/nova-tox-functional-py36/fb5d235/testr_results.html.gz yet?13:08
gibimriedem: not yet. I have a patch that shrinks the window13:08
gibihttps://review.opendev.org/#/c/681238/13:08
*** jdillaman has quit IRC13:09
*** tbachman has quit IRC13:09
*** nweinber_ has joined #openstack-nova13:09
efriedluyao: o/13:09
bauzasgibi: mriedem: just an unrelated thought, should we somehow persist VGPU resource request to the RequestSpec requested_resources field ?13:10
luyaoefried: I would like another patch to refactor this later if necessary. The current approach works well and it will be great for me to change it after it's merged. https://review.opendev.org/#/c/678452/22/nova/compute/resource_tracker.py@116813:10
gibirequested_resources field is intentionally not persistes in the db13:10
gibibauzas: ^^13:10
bauzask13:11
efriedbauzas: imo VGPU should take its lead from what's been done with vpmem13:11
*** arshad777 has quit IRC13:11
efriedmove vgpu into the resources field13:11
bauzasfor the moment, we don't really do anything, we just take the allocation that was done13:11
stephenfindansmith: When you're around, we're going to need to discuss https://review.opendev.org/#/c/671801/40/nova/conf/scheduler.py@208 so I can grasp where you're coming from13:11
*** nweinber has quit IRC13:11
bauzasefried: WDYM? sorry, I don't have a lot of context on VPMEM apart from the spec13:11
*** eharney has joined #openstack-nova13:12
efriedluyao: Yes, I completely overlooked the fact that the report client's provider tree doesn't have the resources in it. I agree we should keep it this way for now, but I think for the refactor we may want to reduce the data structure stored in the RT to just a dict, keyed by rp_uuid, of lists of resources.13:12
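A sketch of the reduced structure efried suggests (purely illustrative names, not the actual resource tracker code):

    # {rp_uuid: [resource, ...]} instead of the current nested structure
    assigned_resources = {
        'rp-uuid-1': [resource_a, resource_b],
        'rp-uuid-2': [resource_c],
    }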
*** gbarros has joined #openstack-nova13:12
efriedbauzas: Unless you were thinking to do this for Train, let's talk about it after FF.13:13
bauzasefried: of course not, just considering the next steps13:13
efriedstephenfin: Are you going to have time today to review the vpmem series?13:13
luyaoefried: Okay, we can discuss the details in the future. :)13:13
bauzassince we plan to do VGPU affinity13:13
stephenfinefried: I am, and I'm going to hold off on the follow-up requested for https://review.opendev.org/#/c/674895/ because reviews seem more important at the moment, right?13:14
efriedbrinzhang: looks like you already got your novaclient release merging, yah?13:14
efriedstephenfin: totally13:15
openstackgerritMerged openstack/nova master: Indent fake libvirt host capabilities fixtures more nicely  https://review.opendev.org/67933913:16
gibimriedem: filed a bug https://bugs.launchpad.net/nova/+bug/184343313:16
openstackLaunchpad bug 1843433 in OpenStack Compute (nova) "functional test test_migrate_server_with_qos_port fails intermittently due to race condition" [Undecided,New]13:16
bauzasgibi: you could triage it as critical since it impacts the gate IMHO13:17
bauzasgibi: any ideas of the failure rate?13:17
gibibauzas: does it block the gate?13:17
mriedemhttp://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22in%20test_migrate_server_with_qos_port%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d13:17
mriedemit's not critical13:17
mriedemyes it hits the gate but doesn't block it13:17
bauzasmriedem: okay, thanks, I was about to ask logstash13:18
mriedemgibi: ack, i left a few comments in the patch to close the race13:18
gibimriedem: thanks. I will respin soon13:18
bauzasgibi: what's the patch up for this ?13:18
gibibauzas: https://review.opendev.org/#/c/681238/113:18
bauzasgibi: tbh, I'm a bit afraid it's at the top of the series13:19
bauzasif you need to respin, I'd appreciate you move it down13:19
luyaoefried: Could you help confirm whether a rebuilt instance will have two groups of allocations? I know a migrating instance will change one allocation's consumer to the migration uuid. https://review.opendev.org/#/c/678452/23/nova/compute/resource_tracker.py@40613:19
gibibauzas: I think it is top of master13:19
bauzasyup13:19
gibibauzas: so it is not connected to the series13:19
gibibauzas: just by topic13:19
mriedemright, it's a race in an already merged functional test,13:20
mriedemso let's just fix the race and merge it13:20
bauzasgibi: my bad, you're right13:20
gibimriedem: good suggestion about instance event, that will fix the race13:20
mriedem\o/13:20
gibiI was afraid we'd need to poll the migration allocation in placement to close the race13:20
gibimriedem: I also fixed the bug grenade found in revert13:21
gibimriedem: unfortunately we still don't have a test run on that patch13:21
aspierskashyap: oh, I thought you were talking about cpuid inside the guest13:21
mriedemgibi: yeah i saw you updated but i haven't looked at your replies or changes,13:22
aspierskashyap: I think there are multiple binaries with SB13:22
mriedemit kind of sucks that we don't store the information needed for revert in the migration context13:22
gibimriedem: I have better plans, described in my reply13:22
*** Sundar has joined #openstack-nova13:22
gibimriedem: we can let neutron remember the mapping13:22
kashyapaspiers: Sorry, I was not clear.  I don't _know_ if it is also reported in the guest `cpuid` -- maybe you can tell from your hardware?13:23
gibimriedem: by using multiple portbinding13:23
aspierskashyap: I don't have a guest with cpuid installed currently13:23
aspierskashyap: I think I misremembered - before I probably ran it on the host13:23
kashyapaspiers: For SB -- IMHO, you don't _need_ all those 4M, 2M variants -- it's beyond overkill.  Your life, and admins' lives, will be far simpler if you used the "two pairs" approach I noted in the comment13:24
aspierskashyap: You're preaching to the choir. You really need to file a bug report on bugzilla.opensuse.org13:24
mriedemgibi: ah yeah13:24
sean-k-mooneyartom: i have confirmed that the migration context correctly contains the new numa topology blob but it does not get saved to the db13:24
*** ratailor has quit IRC13:25
aspierskashyap: or reach the right team in some other way13:25
kashyapaspiers: Aaah, sorry; didn't realize we're on the same line, same word13:25
openstackgerritAdam Spiers proposed openstack/nova master: Improve SEV documentation and other minor tweaks  https://review.opendev.org/68125413:25
aspiersefried, gibi, stephenfin: there's the follow-up ^^^13:25
sean-k-mooneyartom: so doing apply migration context followed by drop to delete it from the db does not work13:25
sean-k-mooneyartom: i would assume that for some reason the field is not flagged as dirty and is not being saved13:26
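For background on the dirty-field tracking sean-k-mooney is referring to, a minimal oslo.versionedobjects illustration (Demo is a made-up class, not a nova object); obj_what_changed() only reports fields set through the object's field descriptors:

    from oslo_versionedobjects import base, fields

    @base.VersionedObjectRegistry.register
    class Demo(base.VersionedObject):
        fields = {'numa_topology': fields.StringField(nullable=True)}

    demo = Demo()
    demo.numa_topology = 'new-blob'
    print(demo.obj_what_changed())  # {'numa_topology'}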
gibiaspiers: ack!13:26
*** BjoernT_ has joined #openstack-nova13:27
SundarHi sean-k-mooney, how are you doing?13:27
sean-k-mooneySundar: hi13:28
efriedluyao: I'm trying to make sense of those scenarios right now.13:28
Sundarsean-k-mooney: The Cyborg patches for nova-integ have merged; https://review.opendev.org/#/q/project:openstack/cyborg+branch:master+topic:nova-integ+owner:Sundar13:28
sean-k-mooneySundar: ok so its just the nova code that is pending13:28
SundarThere is one related patch for Nova notification https://review.opendev.org/674520 that is close13:29
*** BjoernT has quit IRC13:29
aspierskashyap: Where did you hear that an SEV-enabled kernel would fail to boot if SEV is not provided? I have empirically proven that false by booting the same image both with and without SEV13:29
sean-k-mooneySundar: ok yes without that nova will time out waiting and roll back the spawn13:29
mriedemSundar: no testing at all for those cyborg patches?13:30
SundarYes. BTW, I updated the Nova patches too. Notification works with 674520. If you look at the apparent UT failures, they are mostly unrelated to Cyborg13:30
kashyapaspiers: DanPB; but to be fair to him, he used a qualifier "IIUC".  If you've tested it, I'll go with your data13:30
aspierskashyap: gotcha13:30
Sundarmriedem: What do you mean by 'no testing'?13:30
sean-k-mooneySundar: have we had an end-to-end run of the tempest job13:30
mriedemSundar: there are no tests associated with that patch13:31
SundarThe tempest CI is working with the patches, and we are hoping to merge it by this week.13:31
efriedmriedem: do we use "migration instance" for evac?13:31
mriedemefried: no13:31
efriedso what happens to allocations?13:31
mriedemefried: you mean migratoin-based allocations?13:31
efriedyeah13:31
mriedemefried: they sit on the source13:31
efriedthanks13:31
mriedemwhich is why we can't delete resource providers when deleting a compute service that has evacuated instance allocations against it13:31
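Schematically: because evacuate doesn't move the source allocations to a migration consumer, the instance's own consumer ends up holding allocations against both nodes (illustrative dict with made-up provider names and values):

    allocations_for_instance = {
        'source-node-rp': {'resources': {'VCPU': 2, 'MEMORY_MB': 4096}},
        'dest-node-rp':   {'resources': {'VCPU': 2, 'MEMORY_MB': 4096}},
    }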
mriedemthat whole thread i had in the ML13:32
*** Luzi has quit IRC13:32
mriedemefried: related to this https://review.opendev.org/#/c/678100/13:32
sean-k-mooneymriedem: that has been changed recently right. we delete the allocations if they exist if and only if you bring back up the compute agent on the failed host13:32
sean-k-mooneyso that only helps if you repair whatever the issue was13:32
mriedemsean-k-mooney: yes the source allocations are deleted if you bring up the evacuated-from compute service13:33
sean-k-mooneyif you don't you get into the situation in that ml thread13:33
mriedemi think it's probably not uncommon to have a host failure, down the compute service, evacuate it, and then try to delete the compute service before redeploying on that host13:33
mriedemSundar: tempest only covers happy path testing for the most part; unit tests are good for testing error conditions and such - exceptional cases13:34
mriedemanyway, that's up to the cyborg core team for how they want to enforce testing standards13:34
efriedmriedem: also rebuild?13:35
mriedemefried: a rebuild isn't a migration13:35
mriedemand you can't rebuild on a down host13:35
*** bbowen_ has joined #openstack-nova13:35
mriedemso the host stays the same and the flavor stays the same, but the image might change on a rebuild13:35
sean-k-mooneySundar: have the dependencies been fixed in https://review.opendev.org/#/c/670999/13:36
efriedmriedem: right, separate topic, I'm saying: does rebuild also end up with multiple sets of allocations for the same instance uuid?13:36
sean-k-mooneySundar: you rebased it but did not run the new job via "check experimental"13:36
sean-k-mooneyso we still dont have a run that worked13:36
sean-k-mooneythe previous run failed https://c3308e17743765936b80-6c7fec3fffbf24afb7394804bcdecfae.ssl.cf2.rackcdn.com/670999/7/experimental/cyborg-tempest/2fe52ec/testr_results.html.gz13:37
mriedemefried: no13:37
sean-k-mooneythat said we knew the dependencies were wrong so we expected that13:37
*** bbowen__ has joined #openstack-nova13:37
*** bbowen has quit IRC13:37
openstackgerritMatt Riedemann proposed openstack/nova master: Fixing broken links  https://review.opendev.org/68120613:38
luyaomriedem: what about in the process of rebuild13:39
*** bbowen_ has quit IRC13:39
luyaomriedem: rebuild is not done, how many allocations will an instance have?13:40
efriedwe destroy the instance but keep the allocations, then respawn it with the existing allocations?13:40
efried...and maybe a different image?13:40
sean-k-mooneySundar: it looks like there are 3 patches still remaining on the cyborg side. 2 for nova integration and 1 for python3 support13:40
mriedemluyao: i don't understand your question13:41
mriedemefried: correct13:41
efriedmriedem: luyao and I are asking about the same thing. Thanks for the help.13:41
mriedemrebuild is basically re-spawn in place on the same host with maybe a different image but the same ports/volumes/flavor - if you're on shared storage you keep your root disk, if not your root disk is rebuilt from the specified image13:42
Sundarsean-k-mooney: I'll rerun with 'check experimental'. Re. dependencies for  https://review.opendev.org/#/c/670999/, one has already merged, and the other is the tempest code itself, which is working with the patches and should merge soon. There should be only 1 for nova-integ (i.e. Nova notification) -- I fixed the topic now.13:42
sean-k-mooneySundar: wait a minute13:42
*** eharney has quit IRC13:42
mriedemthough i might be thinking of evacuate for that root disk comment13:42
sean-k-mooneyim going to fix it to not waste gate resources and fix the dependencies13:42
mriedemanyway, root disk doesn't matter for what you're asking13:43
mriedemsean-k-mooney: cyborg integration isn't happening in nova in train - where are we with test runs on the numa live migration series?13:43
efriedstephenfin, dansmith: btw, vpmem now has a CI passing. I haven't opened one up yet, but it's taking half an hour, so it's doing... *something* :P13:43
luyaomriedem: I saw an instance have new and old allocations in a functional test when it was evacuating.13:43
sean-k-mooneymriedem: i confirmed that it is not saving the updated numa topology so it looks like we need instance.refresh13:43
*** jawad_axd has quit IRC13:43
sean-k-mooneymriedem: the migration context has the correct numa topology13:44
luyaomriedem: I thought rebuild was similar to evacuate13:44
sean-k-mooneymriedem: so it looks like it's not marking the field as changed for some reason13:44
sean-k-mooneymriedem: perhaps because apply migration context uses setattr13:44
mriedemluyao: evacuate is rebuild to another host when the source host is down13:44
sean-k-mooneymriedem: i -1'd the patch13:44
*** rcernin has quit IRC13:45
mriedemevacuate and rebuild use the same code flows in conductor and compute services with conditionals for any differences13:45
efriedluyao, mriedem: I find all of it so confusing that I don't even try to remember, because that just makes things worse. It goes against the very fiber of my being, but this is one where I'll ask. Every. Time.13:45
*** jawad_axd has joined #openstack-nova13:45
mriedeme.g. evacuate does a claim on the dest host, rebuild does not13:45
*** jawad_axd has quit IRC13:45
mriedemsean-k-mooney: ok i didn't realize artom removed the instance.refresh in post live migration13:45
mriedembut haven't looked at changes since yesterday13:46
mriedemefried: rebuild+evacuate and resize+cold-migrate are confusing in that they share code but don't do the exact same things13:46
mriedembut are like 90% the same13:46
mriedemefried: do we need a "what's the difference between evacuate and rebuild" thing in the contributor docs?13:47
mriedemb/c that's probably pretty easy to write up13:47
efriedthat must be why luyao was conflating rebuild+evacuate wrt allocations. That must fall in the 10% that's different.13:47
mriedempart of it yeah13:48
efriedmriedem: I'll admit I didn't even go looking for a doc on this.13:48
mriedemthere likely isn't one13:48
mriedembut this isn't the first time this has come up,13:48
efriedThere was that one blog post13:48
efried"one"13:48
mriedemso instead of explaining it each time, one could just link to a doc13:48
jangutterefried: http://www.danplanet.com/blog/2016/03/03/evacuate-in-nova-one-command-to-confuse-us-all/ this one?13:48
mriedemnot really the same13:48
efriedthat looks familiar jangutter13:49
*** jawad_axd has joined #openstack-nova13:49
mriedemthat's about nova client commands that live migrate all instances off a host13:49
openstackgerritAdam Spiers proposed openstack/nova master: Improve SEV documentation and other minor tweaks  https://review.opendev.org/68125413:49
luyaoefried: yes, actually I found an evacuating instance has two groups of allocations, and I thought rebuild was the same.13:49
* alex_xu tried to understand luyao's question13:49
mriedem"After a compute host has failed, rebuild my instance from the original  image in another place, keeping my name, uuid, network addresses, and  any other allocated resources that I had before."13:49
efriedanyway, I envision some kind of table with all these related operations on one axis, and things like "same/different destination host", "what happens to allocations", "instance/migration UUIDs", etc on the other.13:50
mriedemthat's a much bigger doc if you're trying to lump in all move operations,13:51
openstackgerritsean mooney proposed openstack/nova master: [WIP] add cyborg tempest job  https://review.opendev.org/67099913:51
mriedembecause you'd have to consider resize on the same host, which is complicated in different ways13:51
efriedmriedem: is there a todo list on which that ^ could be registered so it is not forgotten, but not started until after FF? We need you for more urgent things this week.13:51
sean-k-mooneySundar: that ^ should test with the correct deps. im going back to testing artoms code13:51
mriedemefried: i'm not signing up to write a doc with all of the axis of evil,13:52
mriedemwriting up something quick and easy about "what's the difference between rebuild and evacuate" is pretty easy for me to crank out13:52
efriedeven so13:52
mriedema bug is good enough for that13:52
mriedemdocs bug, link to the irc question that started tihs13:52
efriedack13:52
*** jawad_axd has quit IRC13:53
sean-k-mooneymriedem: im going to add back in the instance.refresh locally and add some logging to see what the numa topology blob looks like before and after.13:53
*** med_ has joined #openstack-nova13:53
sean-k-mooneymriedem: artom is in dad taxi mode currently but he should be back soon ish13:53
*** nweinber__ has joined #openstack-nova13:53
sean-k-mooneyalthough looking at that function im not clear why that helps13:55
SundarThanks, sean-k-mooney13:55
*** eharney has joined #openstack-nova13:55
*** mlavalle has joined #openstack-nova13:56
efriedhttps://bugs.launchpad.net/nova/+bug/184343913:56
openstackLaunchpad bug 1843439 in OpenStack Compute (nova) "doc: Rebuild vs. Evacuate (and other move-ish ops)" [Low,Confirmed] - Assigned to Matt Riedemann (mriedem)13:56
*** nweinber_ has quit IRC13:56
efriedmriedem: there's a 'doc' tag and a 'docs' tag. Which one is real?13:56
efried(I used both)13:56
*** ociuhandu has quit IRC13:57
*** ociuhandu has joined #openstack-nova13:58
*** tbachman has joined #openstack-nova13:58
bauzasefried: IIRC, it was 'docs' 3 years ago13:59
bauzasoops, I meant 'doc'14:00
bauzasholy shit14:00
sean-k-mooneyits doc14:00
*** BjoernT has joined #openstack-nova14:00
sean-k-mooneyhttps://wiki.openstack.org/wiki/Nova/BugTriage#Tag_Owner_List14:00
bauzasefried: yeah, I'm correct : https://bugs.launchpad.net/nova/+bugs?field.tag=doc vs. https://bugs.launchpad.net/nova/+bugs?field.tag=docs14:00
efriedbauzas: are tags autovivified or do they need to be explicitly created somewhere?14:01
*** Izza_ has quit IRC14:01
bauzasefried: nope, you can add any one14:01
efriedLike, could we go through the latter four bugs and remove 'docs' and that tag would disappear?14:01
mriedemthere is an official list14:01
bauzasefried: I'm just updating the 4 'docs'14:01
mriedemanyone can add any tag,14:01
sean-k-mooneyefried: if you go to the wiki i linked14:01
*** BjoernT_ has quit IRC14:01
bauzass/docs/doc14:01
mriedembut the list of official tags that auto-complete is curated14:01
sean-k-mooneythere is a link to manage the offical tags14:01
efriedcool14:01
efriedmriedem: oh, in that case both 'docs' and 'doc' must be in there14:02
sean-k-mooneyhttps://bugs.launchpad.net/nova/+manage-official-tags14:02
efriedbecause both autocompleted for me14:02
mriedemhttps://bugs.launchpad.net/nova/+manage-official-tags14:02
sean-k-mooneyno idea how to use that however14:02
*** tkajinam has joined #openstack-nova14:02
mriedemyou move things in or out of the 'official tags' column14:02
bauzasefried: mriedem: I removed usage for 'docs' https://bugs.launchpad.net/nova/+bugs?field.tag=docs14:02
efriedokay, I'm removing 'docs' from official...14:03
bauzasand yeah, I just removed 'docs' from the official list14:03
bauzasheh, jinxed14:03
*** ociuhandu has quit IRC14:03
efriedsorted14:03
openstackgerritStephen Finucane proposed openstack/nova master: objects: Remove custom comparison methods  https://review.opendev.org/47228514:03
mriedemi just added "documentation"14:03
efriedyou're a bastard14:03
mriedemand "docify"14:03
openstackgerritStephen Finucane proposed openstack/nova master: objects: Remove custom comparison methods  https://review.opendev.org/47228514:04
efriedgibi: Did I see something about a grenade failure - should I wait to recheck stuff?14:04
bauzasefried: mriedem: either way, it's been a long time since I triaged a single bug in Launchpad, floor is yours, folks14:04
gibiefried: don't have to wait14:04
efriedthanks14:04
gibiefried: it was a failure in an open review14:04
efriedo good :)14:04
gibiefried: which I respinned with a fix14:04
efriedI'm going to need to bring in a ringer for this vpmem patch https://review.opendev.org/#/c/678455/14:05
efriedI can handle the rest, but that one is beyond me.14:05
efriedstephenfin is already lined up for it. alex_xu is a co-author.14:05
efriedsean-k-mooney, kashyap, aspiers, artom: I would proxy a +2 from one of you if you're able --^14:06
alex_xualso the vpmem CI is pretty close https://review.opendev.org/#/c/679640/14 \o/14:06
kashyapefried: vPMEM?14:06
* kashyap clicks14:07
efriedkashyap: Yeah, that patch is just "do shit efried doesn't understand to the libvirt xml"14:07
efriedso I don't think you need to understand vpmem14:07
sean-k-mooneyefried: the pmem series. i can take a look if kashyap can't14:07
kashyapefried: It's been forever on my to-read list, but I was pulled into urgent downstream stuff last/this week14:07
bauzasefried: once I'm done with gibi's series and stephenfin's cpu-resources, I could take a look at https://review.opendev.org/#/q/topic:bp/virtual-persistent-memory14:07
openstackgerritMatt Riedemann proposed openstack/nova master: Fixing broken links  https://review.opendev.org/68120614:08
efriedThanks folks. bauzas kashyap sean-k-mooney in the interest of focusing resources, if it's possible to start with https://review.opendev.org/#/c/678455/ rather than going through the whole series, that's the one where we have a known reviewer gap.14:09
alex_xubauzas: kashyap sean-k-mooney thanks also14:09
bauzasefried: honestly, reviewing the series needs me looking from the start, but I can surely just look at the patches without really commenting on them14:09
sean-k-mooneyyes if that is the one that needs to be reviewed i can start there in about 5 mins14:10
kashyapefried: Nod; that's what I clicked on14:10
gibistephenfin: a question about the cpu series. The vcpu_pin_set can be an empty string and nova will use every host cpu for the vcpus. Is this behaviour preserved for the cpu_shared_set and cpu_dedicated_set?14:10
*** brinzhang has quit IRC14:11
efriedgibi: iiuc no, you'll get an error if you try that.14:11
bauzasgibi: good question, we haven't agreed on that in the spec14:12
*** brinzhang has joined #openstack-nova14:12
gibiefried: so I can assume that when the vcpu_pin_set is removed then there will be no way to implicitly offer every host cpu as vcpu? But I have to explicitly list them in one of the new configs14:13
stephenfingibi: It's preserved, yes14:13
bauzasstephenfin: only if you don't use the new options, right?14:14
bauzasstephenfin: because if you start messing with your options, then something will come up :)14:14
stephenfinany(vcpu_pin_set, compute.cpu_shared_set, compute.cpu_dedicated_set) == False --> allocate all host CPUs for PCPU inventory14:14
stephenfinbauzas: correct14:14
sean-k-mooneywhat's the status on the SEV series by the way. stephen did you get a chance to look at the bottom patch? i think that was all that was left to be +W'd and the rest were ready to go?14:14
efriedRight, I guess we don't have to make the decision about what gibi is asking until a future release.14:15
efriedsean-k-mooney: it's gateward14:15
stephenfinsean-k-mooney: It's gone through14:15
stephenfingoing, rather14:15
sean-k-mooneyok so i can unload the SEV context form my brain14:15
sean-k-mooneyat least for a while thats good14:15
stephenfinburn it. burn it all14:15
stephenfinat least until people start using it in two years or so14:15
gibiefried: sure, the part of what happens when we remove the vcpu_pin_set is for the future not for Train14:15
gibistephenfin: thanks14:16
stephenfingibi, efried: fwiw, when we remove 'vcpu_pin_set' I'd like to preserve the same behavior14:16
stephenfinthough sean-k-mooney disagrees14:16
stephenfinin U or V or whatever, the above simply becomes14:16
stephenfinany(compute.cpu_shared_set, compute.cpu_dedicated_set) == False --> allocate all host CPUs for PCPU inventory14:16
gibistephenfin: I have conflicting requirements internally what empty vcpu_pin_set "should" mean14:17
stephenfindamn, sorry, VCPU14:17
stephenfinin both cases14:17
gibistephenfin: yeah, VCPU. even better14:17
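The rule stephenfin spells out above, restated as a minimal sketch (the option names are the real Train ones, the function itself is purely illustrative, not nova code):

    # Illustrative only. In Train, if none of the three options is set,
    # every host CPU is reported as shared VCPU inventory, preserving the
    # old implicit behaviour; in U/V the vcpu_pin_set term simply drops out.
    def report_all_cpus_as_vcpu(vcpu_pin_set, cpu_shared_set, cpu_dedicated_set):
        return not any([vcpu_pin_set, cpu_shared_set, cpu_dedicated_set])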
efriedalex_xu: even if not yet +2able, do you see https://review.opendev.org/#/c/671800/ being generally sane, enough to unblock the bottom and start merging cpu-resources patches?14:17
stephenfinyeah, my mistake14:17
alex_xuefried: yes, I think it is good. the case I tested passed14:18
efriedstephenfin: any reservations about that ^ ?14:18
stephenfinefried: Nope. I need to fix a thing with quotas and discuss whether we need the scheduler option with dansmith ([1]) but neither of those should affect anything [1] https://review.opendev.org/#/c/67180114:19
stephenfin*anything lower in the series14:19
stephenfinsean-k-mooney: Want to skim through https://review.opendev.org/#/c/671793/ again now that the fixes for your issues are merged directly in?14:20
sean-k-mooneyerm, I'll add it to the queue after the vpmem patch, but sure14:20
efriedbauzas: would you please re+2 and +W the bottom cpu-resources https://review.opendev.org/#/c/671793/ once sean-k-mooney has done that ^ ?14:20
*** awalende has quit IRC14:24
*** awalende has joined #openstack-nova14:25
openstackgerritLuyao Zhong proposed openstack/nova master: Claim resources in resource tracker  https://review.opendev.org/67845214:25
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces  https://review.opendev.org/67845314:25
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845414:25
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845514:25
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845614:25
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964014:25
*** awalende has quit IRC14:25
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847014:25
*** dtantsur is now known as dtantsur|afk14:25
*** awalende has joined #openstack-nova14:26
artomsean-k-mooney, thanks for testing, I saw your comments while waiting at the doctor's office. I'm going to try and fix the func tests, and then get a better handle on why instance.refresh() makes it work14:27
sean-k-mooneyartom: well it does not14:27
*** brinzhang_ has joined #openstack-nova14:27
artomsean-k-mooney, hah, so it never worked?14:27
sean-k-mooneyat least adding it before apply does not work14:27
sean-k-mooneyI'm going to try adding it after the migration context is dropped14:27
artomsean-k-mooney, no, it was on the source, before calling driver.cleanup()14:27
sean-k-mooneyoh i thought it was in post live migrate14:28
artomThe theory was that driver.cleanup(), at least in libvirt, calls instance.save() and clobbers what the dest saved14:28
sean-k-mooneyoh ok14:28
kashyapefried: Ah, nice - just noticed that Luyao also wrote the upstream libvirt-proper part of the 'pmem' / related NVDIMM support bits :-).  (/me needs more time, it's a massive context-switch for my slow brain.)14:28
sean-k-mooneyI'll look at the old code and try and figure out where to add it14:28
sean-k-mooneyif you know, then I can test it14:28
* kashyap bbiab; neighbour called for help14:29
sean-k-mooneyor feel free to update it on the host I gave you access to14:29
efriedkashyap: that sounds impressive14:29
sean-k-mooneywhichever works14:29
artommriedem, ^^ ... would we be OK using instance.refresh() without a clear handle on why it's needed? I'm assuming not14:29
*** awalende has quit IRC14:30
*** maciejjozefczyk has quit IRC14:30
sean-k-mooneyefried: https://review.opendev.org/#/c/678455/24/nova/virt/libvirt/config.py is the primary file you want our input on yes14:31
efriedsean-k-mooney: yeah, I can at least somewhat understand the others.14:32
sean-k-mooneyI have looked at the others and, assuming this file looks sane, those do too14:32
*** ociuhandu has joined #openstack-nova14:32
sean-k-mooneyI'm almost done, but so far nothing jumps out at me as obviously wrong14:32
*** ociuhandu has quit IRC14:33
efriedluyao: https://review.opendev.org/#/c/678456/ quick fix here please14:33
*** ociuhandu has joined #openstack-nova14:33
luyao efried : ok14:34
mriedemartom: I thought sean-k-mooney said it was needed14:35
sean-k-mooneymriedem: I said it does not work without it. I have not tested re-adding instance.refresh in the correct place14:35
mriedemartom: like I said in review, my guess is the copy of the instance on the source has a dirty migration_context field; the dest applies its copy and drops it, and then the source does an instance.save and saves the dirty copy14:35
sean-k-mooneythe current code is setting the correct numa topology in the migration context but it's not making it into the numa_topology field in the instance_extra table14:36
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845614:36
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964014:36
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847014:36
sean-k-mooneyluyao: efried alex_xu where are ye subtracting the alignment from the requested size?14:36
sean-k-mooneyself.target_size = kwargs.get("size_kb", 0) - self.align_size14:37
artommriedem, it *is* needed, sean-k-mooney confirmed this morning14:37
artomBut I don't have a good understanding of why14:37
sean-k-mooneyluyao: efried alex_xu https://review.opendev.org/#/c/678455/24/nova/virt/libvirt/config.py@318014:37
artomErr, brb, my client's all wonky14:37
*** ociuhandu has quit IRC14:37
*** artom has quit IRC14:38
*** artom has joined #openstack-nova14:38
*** ociuhandu has joined #openstack-nova14:39
luyaoefried, sean-k-mooney: because the label will occupy some space, and the size must be aligned14:39
sean-k-mooneyluyao: sure, but just subtracting the alignment is not correct14:40
sean-k-mooneyit will result in a smaller allocation than you asked for. well14:40
sean-k-mooneythere are two ways to handle it14:40
sean-k-mooneyif we know the label size14:40
artommriedem, btw, I removed all the allocation-style pinning/non-NUMA stuff that we talked about last night. You and sean-k-mooney convinced me it was 1. tangential (a long-standing issue not related to instance NUMA topology, fixed with a hard reboot) 2. scope creep that's too risky at this point. So I pushed a PS last night without it14:40
artommriedem, so, in my mind, all that's left is the instance.refresh() thing14:40
sean-k-mooneywe can add that and then round up to the next size, or we do what you're doing14:40
luyao sean-k-mooney : Yes it is, it will be smaller14:41
mriedemartom: ok. the instance.refresh() certainly doesn't hurt so i have no issues with leaving that in14:41
dansmithsame,14:41
sean-k-mooneyluyao: I normally would have assumed we would handle this by adding the overhead size and rounding up to the next alignment boundary, like we do for block devices, but I guess we can't do that as we need to fit the size into the placement allocation14:42
dansmithI had said I wasn't totally sure why, but wasn't surprised it was needed14:42
sean-k-mooneyluyao: so you chose to subtract the alignment14:42
luyaosean-k-mooney: the size must be aligned by alignsize14:42
mriedemi think it would be pretty simple to debug after the fact if we have a working job - just dump the source instance.migration_context to logs, call post at dest, then refresh and dump it again to the logs14:42
sean-k-mooneyluyao: yes i know14:42
sean-k-mooneyim not arguing that point14:42
dansmithmriedem: yeah14:42
sean-k-mooneybut that code does not actually guarantee that14:42
artommriedem, dansmith, ack, so first thing I'll do is put it back in, and while you guys check whether I've addressed the rest of the feedback satisfactorily, I can fix the func tests for not picking that up, and get a better handle on why it's needed in the first place14:43
luyaosean-k-mooney: let me check it again14:43
dansmithartom: ack14:43
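A toy model of the clobbering theory discussed above (plain Python, not nova code): two in-memory copies of the same DB row exist during the migration, and without a refresh the source's stale copy wins the last save.

    # Runnable illustration; the dict stands in for the instance_extra row.
    db = {'numa_topology': 'old'}

    class FakeInstance:
        def __init__(self):
            self.numa_topology = db['numa_topology']
        def save(self):
            db['numa_topology'] = self.numa_topology
        def refresh(self):
            self.numa_topology = db['numa_topology']

    source = FakeInstance()        # stale copy held by the source host
    db['numa_topology'] = 'new'    # dest applies its context and saves
    source.refresh()               # the fix: re-read before the final save
    source.save()
    assert db['numa_topology'] == 'new'   # without the refresh, 'old' wins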
sean-k-mooneyluyao: I'm referring to https://review.opendev.org/#/c/678455/24/nova/virt/libvirt/config.py@3180 by the way14:44
sean-k-mooneyluyao: oh, you are relying on the placement step size to prevent non-aligned allocations14:45
luyaosean-k-mooney: sorry, I don't understand14:46
luyaosean-k-mooney: the target size is aligned by alignsize; this is guaranteed when creating the namespace. then I subtract the alignment, so it will still be aligned by alignsize.14:47
luyaosean-k-mooney: what is the placement step size?14:48
sean-k-mooneyyes, but this code is relying on other code making sure that size_kb is a multiple of the alignment size14:48
sean-k-mooneyit is not actually checking for that here, it's just assuming it14:48
sean-k-mooneywhich is why this normally would not guarantee it's still aligned, but in this case it will be14:49
sean-k-mooneyluyao: ignore the placement step size, we decided to model it differently14:50
luyaosean-k-mooney: the size is returned from the ndctl utility, and ndctl will guarantee the size is aligned. the alignsize is also from ndctl14:51
sean-k-mooneywe had two proposals, which is why it would have been better for me to review all of the series rather than just this patch14:51
sean-k-mooneyok14:51
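A hedged sketch of the point being argued here, built around the line quoted from the patch (the function and its checks are illustrative, not the actual config.py code):

    def target_size_kb(size_kb, align_kb):
        # The patch relies on ndctl having already aligned size_kb; making
        # that assumption explicit would catch bad input early, which is
        # sean-k-mooney's concern about the bare subtraction.
        if align_kb <= 0 or size_kb % align_kb != 0:
            raise ValueError('namespace size must be a multiple of the alignment')
        # Subtract one alignment unit to leave room for the namespace label
        # while keeping the result aligned (luyao's explanation above).
        return size_kb - align_kb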
efriedluyao: FYI: step_size is the granularity with which you're allowed to make an allocation for a resource.14:51
efriedSo if you have 64 units of something, and your step_size is 8, you're allowed to allocate 8, 16, 24... but not 9 or 3 or 14.14:51
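efried's step_size rule, as a tiny runnable check (illustrative, not placement code):

    def valid_allocation(amount, total=64, step_size=8):
        return 0 < amount <= total and amount % step_size == 0

    assert valid_allocation(16)       # a multiple of 8: allowed
    assert not valid_allocation(14)   # not a multiple of 8: rejected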
sean-k-mooneyefried: ya, so in the current proposal we are tracking inventories of namespaces by size14:52
sean-k-mooneyso I think it's 114:52
sean-k-mooneybut we had debated modeling it differently at one point14:52
sean-k-mooneyin its current form it's not relevant, but I need to remind myself of some of the other decisions14:53
luyaoefried,sean-k-mooney: thanks, I see. it has no step_size, it is not splittable14:54
sean-k-mooneyluyao: well, it's set to 114:54
efriedYes, I remember this now. 1 is appropriate, because they're single namespaces14:55
sean-k-mooneyso you get pmem in units of 1 namespace14:55
luyaosean-k-mooney: yes14:55
efried^14:55
efriedand the size/align shouldn't actually leak into any kind of placement, scheduling, etc. code.14:55
efriedIf it's anywhere at all, it would be deep in the libvirt-isms.14:55
efriedbecause as far as nova+placement are concerned, a namespace is a single unit.14:56
sean-k-mooneyefried: at one point we had discussed the idea of dynamically creating the namespaces, but we rejected that14:56
sean-k-mooneywhen we were considering it, the alignment was one of the possible values for the step_size I believe14:57
sean-k-mooneyanyway, that is not relevant14:57
efriedright14:57
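For reference, the modeling efried describes might look like this as a placement inventory record (the custom resource class name follows the vpmem spec's naming pattern, but the name and the numbers here are only an example):

    inventory = {
        'CUSTOM_PMEM_NAMESPACE_4GB': {
            'total': 4,       # e.g. four pre-created 4G namespaces on this host
            'min_unit': 1,
            'max_unit': 4,
            'step_size': 1,   # pmem is handed out in units of one namespace
        },
    }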
* efried ==> meetings...14:57
openstackgerritBalazs Gibizer proposed openstack/nova master: Fix the race in confirm resize func test  https://review.opendev.org/68123814:57
gibimriedem: ^^14:57
*** nweinber_ has joined #openstack-nova14:58
efrieddansmith, stephenfin: FYI on the pmem CI, the reason it's passing on patches outside of the pmem series is because currently the job is configured to pull down the top patch (https://review.opendev.org/#/c/678470) before running.14:59
efriedI'm still digging into what the tests are actually doing.14:59
gibimriedem: it seems that the top of the bw series hits the race, so I'm wondering what to do. Rebase the series to include the fix at the bottom, or wait for the grenade test result first, as the gate is slow14:59
dansmithefried: are you talking about vpmem or PCPU?14:59
*** udesale has quit IRC15:00
*** udesale has joined #openstack-nova15:00
efrieddansmith: vpmem15:00
*** nweinber__ has quit IRC15:01
efriedbbiab15:01
*** efried is now known as efried_afk15:01
*** gbarros has quit IRC15:02
mriedemgibi: a nit inline on that race test fix15:03
mriedemgibi: i think we'll just get the race fix merged today and then it's the same as you rebasing your series on it15:03
gibimriedem: OK. will put back the lower() call.15:04
gibimriedem: ahh good point, the gate will rebase the series for the test anyhow15:05
*** spsurya has quit IRC15:05
*** jawad_axd has joined #openstack-nova15:06
*** brinzhang_ has quit IRC15:07
mriedemgibi: left another comment, don't know if you want to fup again or not,15:07
mriedembut since you changed all of these copy/paste blocks of confirmResize/wait for status/wait for event, those could go into a single helper method15:08
gibimriedem: OK, I will put that in a helper. Do you suggest only changing the new confirm resize tests?15:09
mriedemi'm not sure what you mean by 'new confirm resize tests'15:09
mriedemlike the new bw provider migration tests?15:10
gibimriedem: the current patch adds the instance action event waiting to every confirm resize test in test_servers15:10
gibimriedem: not just to the bw related confirm resize tests15:10
mriedemyeah, I didn't expect to see all of those other changes in tests that aren't doing exotic allocations15:10
*** Sundar has quit IRC15:10
mriedemi mean, it's fine, and we could just put the helper in ServerMovingTests15:11
gibithis race does not depend on any exotic allocation.15:11
gibitrue we only saw the issue in the bw related tests15:12
mriedemputting it in ServerMovingTests is probably not good enough then, since the new tests don't use that15:12
mriedemin PortResourceRequestBasedSchedulingTestBase15:12
mriedemcould go in ProviderUsageBaseTestCase though...15:13
*** tkajinam has quit IRC15:13
mriedemup to you - i just don't love the copy/paste so we should use a helper at some point, fup or whatever - up to you15:13
*** tssurya has quit IRC15:13
gibimriedem: OK. I will respin the patch15:13
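A rough sketch of the shared helper mriedem is asking for; the base-class method names are illustrative and nova's actual functional-test helpers may differ:

    def _confirm_resize_and_wait(self, server):
        # Collapse the repeated confirmResize / wait-for-status /
        # wait-for-instance-action-event blocks into one place.
        self.api.post_server_action(server['id'], {'confirmResize': None})
        self._wait_for_state_change(server, 'ACTIVE')
        self._wait_for_instance_action_event(server, 'confirmResize')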
openstackgerritArtom Lifshitz proposed openstack/nova master: NUMA live migration support  https://review.opendev.org/63460615:14
openstackgerritArtom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration  https://review.opendev.org/64002115:14
openstackgerritArtom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration  https://review.opendev.org/67259515:14
bauzasgibi: could you please give me again the race fix change ?15:14
bauzasgibi: can't easily find it15:14
gibibauzas: https://review.opendev.org/#/c/68123815:15
bauzasgibi: thanks15:15
bauzasah, that's because the topic name changed :)15:15
sean-k-mooneyalex_xu: luyao, have you tested the vpmem code with an instance that requests hugepages or pinning but doesn't specify a numa topology15:16
artomdansmith, mriedem, ^^15:17
*** gbarros has joined #openstack-nova15:17
*** shilpasd has quit IRC15:17
bauzasgibi: so you'll respin https://review.opendev.org/#/c/681238 ?15:18
alex_xusean-k-mooney: I guess luyao has tested that. But that will be a case like a normal instance with numa, since when we specify pinning or hugepages, we will create a one-node numa topology for the instance, right?15:18
* bauzas context switches to cpu-resources then 15:18
sean-k-mooneyalex_xu: we should, but I need to make sure https://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py@4664 does not break that15:19
*** ociuhandu has quit IRC15:19
sean-k-mooneye.g. an instance with hw:cpu_policy=dedicated + vpmem needs to be pinned15:19
*** ociuhandu has joined #openstack-nova15:19
sean-k-mooneyand I need to ensure adding "not need_pin" won't break that15:19
alex_xusean-k-mooney: https://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py@5454, for that case, the need_pin should be True, since the instance has numa_topology15:20
sean-k-mooneyinstance.numa_topology should be set in the claim, but I need to triple check15:20
alex_xuyea15:20
gibibauzas: yeah, I will respin soon.15:20
bauzascool15:21
alex_xusean-k-mooney: actually we set instance.numa_topology in the API layer I think15:21
sean-k-mooneyalex_xu: I think the code is correct, but I'm not sure I'm happy with this being in a different location than the other places where we create a numa topology implicitly15:21
alex_xusean-k-mooney: yea, that was the agreement at the PTG, people said they want vpmem on non-numa instances, but I totally understand your point15:22
*** ociuhandu has quit IRC15:22
alex_xuI can't remember who was asking, but it should be Red Hat people15:23
*** ociuhandu has joined #openstack-nova15:23
sean-k-mooneywell, as a Red Hat person I did not, but I think maybe dansmith preferred supporting this. I know we said we would do that at the PTG, however15:23
alex_xuyea, actually I was on your side in the beginning15:24
sean-k-mooneyalex_xu: do you have an xml I can inspect15:24
dansmithsean-k-mooney: um, what?15:24
sean-k-mooneye.g. an example xml that was generated using this code15:24
* dansmith feels like sean-k-mooney has been blaming a lot of stuff on him lately15:24
*** tbachman has quit IRC15:25
alex_xusean-k-mooney: luyao has, but she is on the way home~15:25
sean-k-mooneydansmith: you preferred to add pmem support without requiring numa in the initial version, correct?15:25
sean-k-mooneydansmith: I could be misremembering, if so my apologies15:25
sean-k-mooneydansmith: I just remember you and I were the most active Red Hatters in that discussion at the PTG.15:26
alex_xusean-k-mooney: here is one http://52.27.155.124/95/674895/35/check/pmem-tempest-plugin-filtered/af5fdef/controller/logs/screen-n-cpu.txt.gz15:26
alex_xusean-k-mooney: search for 'pmem', you will see the xml output in the nova-compute log15:27
dansmithsean-k-mooney: I seriously doubt I said that15:27
*** ociuhandu has quit IRC15:28
sean-k-mooneydansmith: in that case my apologies for invoking your name :)15:29
sean-k-mooneyalex_xu: so this is the generated domain15:29
sean-k-mooneyhttp://paste.openstack.org/show/774840/15:29
openstackgerritBalazs Gibizer proposed openstack/nova master: Fix the race in confirm resize func test  https://review.opendev.org/68123815:30
gibimriedem, bauzas: ^^15:30
sean-k-mooneyalex_xu: it is constraining both the ram and the cores of the guest to float over host numa node 015:30
alex_xusean-k-mooney: yes, but I think that is based on an old patchset, since that ci job always pulls an old version of luyao's patch15:31
*** ociuhandu has joined #openstack-nova15:31
alex_xuin new version, it shouldn't have the pinning15:31
sean-k-mooneyalex_xu: but it does not specify a constraint on the pmem device as far as I can see15:31
alex_xusean-k-mooney: yes, that is the output for patchset13, sorry15:33
sean-k-mooneyalex_xu: I'm not sure that is a good thing, but ok. do you have an updated run15:33
sean-k-mooneyI'll check the ci logs I guess15:33
alex_xusean-k-mooney: luyao is trying to get one for you15:33
sean-k-mooneyhttp://52.27.155.124/93/671793/23/check/pmem-tempest-plugin-filtered/23b32fa/ should have them, right15:34
sean-k-mooneythat's from PS 2315:34
sean-k-mooneyI found one15:34
alex_xusean-k-mooney: no, it won't, the CI always tries to apply PS13; Rui is trying to remove that. That is why you can see other patches also passing the pmem ci test15:35
*** macz has joined #openstack-nova15:35
*** gyee has joined #openstack-nova15:35
sean-k-mooneyin the new code, will both the numatune and cputune elements be removed if only a pmem device is requested in the flavor?15:37
*** ociuhandu has quit IRC15:38
kashyapsean-k-mooney: To my eyes this whole NUMA and PMEM interaction looks subtle enough that some more functional testing is required...15:39
*** tbachman has joined #openstack-nova15:41
sean-k-mooneykashyap: well there are functional tests in  https://review.opendev.org/#/c/678470/2715:41
sean-k-mooneybut i have not looked at them yet15:41
kashyapsean-k-mooney: Ah, missed it; thx15:42
sean-k-mooney... it's locking up my firefox window trying to open it for some reason15:43
bauzasgibi: https://review.opendev.org/#/c/681238/315:44
bauzasgibi: can I update the commit msg ?15:44
sean-k-mooneykashyap: the functional tests will not assert anything about the xml generation15:44
gibibauzas: sure go ahead15:44
*** factor has joined #openstack-nova15:45
kashyapsean-k-mooney: Sure, I see those are already in the main change.15:45
*** icarusfactor has quit IRC15:45
sean-k-mooneythere are no assertions made about the numa element in the main change as far as I can tell15:46
sean-k-mooneyrather, only about numatune and cputune15:46
*** ociuhandu has joined #openstack-nova15:47
bauzasgibi: cool, will do and +215:48
gibibauzas: thanks a lot15:48
mriedemgibi: nits in https://review.opendev.org/#/c/676140/19 for a follow up15:48
openstackgerritSylvain Bauza proposed openstack/nova master: Fix the race in confirm resize func test  https://review.opendev.org/68123815:48
sean-k-mooneyefried_afk: alex_xu I'll try to come back to the pmem stuff in an hour or so. I need to review stephen's patch then sync with artom15:49
gibimriedem: ack. thanks15:49
sean-k-mooneystephenfin: you wanted me to review https://review.opendev.org/#/c/671793/ specifically, right? should I also review the rest15:50
stephenfinsean-k-mooney: would be helpful, aye15:50
alex_xusean-k-mooney: i got one http://paste.openstack.org/show/774842/15:50
stephenfinI'm marching through the pmem stuff15:50
* bauzas is done for the day, sorry stephenfin :(15:50
bauzasbut I'll look at your series tomorrow morning15:51
sean-k-mooneyalex_xu: looking15:51
artomsean-k-mooney, for once, I think I'm good, and do not require your excellent services :)15:51
sean-k-mooneyalex_xu: I think that that is correct. it is creating a virtual numa topology but it is not tying it to the host in any way15:52
sean-k-mooneyartom: the pmem device is also associated with the virtual guest numa node 015:53
*** gbarros has quit IRC15:53
sean-k-mooneyartom: are you working on fixing the persistence issue15:53
sean-k-mooneyartom: I just wanted to circle back and see if you had anything for me to test or if I should start looking into where to fix the issue15:54
artomsean-k-mooney, I added instance.refresh() back in, so that's settled15:54
alex_xusean-k-mooney: but I think this is wrong https://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py@545815:54
sean-k-mooneyartom: ok did you push that?15:54
artomThe func tests weren't hitting it because driver.cleanup() is called conditionally, and the func test env isn't meeting those conditions15:54
artomsean-k-mooney, I did push15:54
alex_xusean-k-mooney: I checked the nova show, and I saw "hw:numa_nodes" being added, so I guess that is persisted in the db15:55
sean-k-mooneyok, then I'll check it locally unless you have already15:55
artomIf I just change the code to always call it, I can reproduce15:55
sean-k-mooneyartom: that is not correct15:55
artomAnd instance.refresh() does indeed fix it15:55
artomsean-k-mooney, I know :) It was just to test15:55
mriedemstephenfin: did you ever re-post your PCPU upgrade ML thread with [nova] tagged on it to actually get operator visibility?15:55
artomNext step is to trigger driver.cleanup in the "real" way in func tests15:55
sean-k-mooneywe should not see hw:numa_nodes in nova show15:55
alex_xusean-k-mooney: yes, we shouldn't15:55
sean-k-mooneyartom: sorry that was for allex15:56
sean-k-mooneyartom: i have not looked at your change15:56
stephenfinmriedem: No, it didn't seem necessary since we'd solved the upgrade issue in a way that didn't require anything special from the operator15:56
*** damien_r has quit IRC15:56
stephenfinoutside of bog standard config options15:56
sean-k-mooneyalex_xu: we should not see hw:numa_nodes=1 if it's not in the flavor15:56
sean-k-mooneyalex_xu: we do not see that when you get implicit numa topologies in other cases15:56
sean-k-mooneyalex_xu: so if you are seeing it then the code is incorrect15:57
artomsean-k-mooney, we're good, don't worry :)15:57
mriedemstephenfin: isn't dansmith's comment all about a nasty upgrade problem?15:57
alex_xusean-k-mooney: I think the problem is the patch changes instance.flavor directly; after driver.spawn, nova-compute updates the instance object, then persists it into the db.15:57
mriedemto which operators, like mnaser, might want to weigh in?15:57
sean-k-mooneyalex_xu: if we were to cold migrate the instance, the behavior would change if we save it to the db15:57
mnaserhm15:57
* mnaser can read context now while munching food15:57
dansmithmriedem: yeah, the "plan" doesn't seem super great to me as currently laid out :/15:57
sean-k-mooneyalex_xu: ya, we should not be changing the flavor at all15:57
*** ociuhandu has quit IRC15:58
stephenfinmriedem: an intractable one though. Even if operators don't like the little dance we're doing, I fail to see how there's an alternative15:58
mriedemmnaser: you'd need someone to tl;dr it (i would also)15:58
sean-k-mooneyalex_xu: the other cases don't change the flavor, they just create a numa topology15:58
mriedemsince there are 5 conversations going on at once in here right now15:58
mnasersounds like an ML post?15:58
mnaserthat i can read15:58
mriedemmnaser: there was one which no operators read :)15:58
mriedemb/c it wasn't tagged for [nova] or [ops]15:58
sean-k-mooneyso the pmem code is taking a shortcut by updating the flavor. on a hard reboot that instance would be pinned15:58
*** ociuhandu has joined #openstack-nova15:58
stephenfinsean-k-mooney: I'm looking at that at the moment. I don't like it.15:59
stephenfinNot at all15:59
stephenfinAssuming you're referring to https://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py@545815:59
mriedemstephenfin: for the sake of everyone's clarity, could you post a new ML thread with the proposed upgrade path for PCPU and tag with [nova] and [ops]?15:59
sean-k-mooneystephenfin: if we need to create a numa topology we should move it to where we do it for hugepages, right?15:59
sean-k-mooneystephenfin: yes15:59
sean-k-mooneystephenfin: that is the hack that I don't like16:00
stephenfinsean-k-mooney: exactly what I'm writing in a comment as we speak16:00
stephenfinwhat is it with people trying to hack flavors :D16:00
stephenfinmriedem: sure16:00
stephenfinthough I really don't see the point16:00
dansmithstephenfin: "intractable" is a bit of a silly characterization :)16:00
alex_xusean-k-mooney: yea, agree with you16:00
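A hedged sketch of the pattern sean-k-mooney and alex_xu are arguing for (hypothetical names, not the actual driver code): derive the implicit guest NUMA topology where the other implicit cases do, instead of mutating instance.flavor, which gets persisted and would change behaviour on hard reboot or cold migrate.

    def flavor_requests_vpmem(flavor):
        # Illustrative check: any vpmem-style resources extra spec.
        return any(k.startswith('resources:CUSTOM_PMEM_NAMESPACE')
                   for k in flavor.get('extra_specs', {}))

    def effective_numa_topology(instance, flavor):
        # Hypothetical: compute the topology on the fly rather than
        # writing hw:numa_nodes back into the flavor.
        if instance.get('numa_topology') is not None:
            return instance['numa_topology']
        if flavor_requests_vpmem(flavor):
            return {'nodes': 1}   # stand-in for a one-node topology object
        return None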
stephenfinbecause the only people that can solve this are in this channel/on the review already16:00
sean-k-mooneystephenfin: the issue is that we want a numa topology in the xml, but not in the numa topology filter16:01
stephenfindansmith: Possibly :) I have been thinking about this for quite some time though and we've gone through a lot of options, so it starts looking like that to me, heh16:01
*** ociuhandu has quit IRC16:02
*** ociuhandu has joined #openstack-nova16:03
sean-k-mooneyalex_xu: since stephenfin is looking at it im gong to review his cpu code then ill come back to this after i test artoms code16:03
mriedemstephenfin: saying "because the only people that can solve this are in this channel/on the review already" is not true imo - if you've got a hard upgrade thing coming for operators, you likely should get some feedback from them before pushing forward16:03
alex_xusean-k-mooney: thanks a lot16:03
sean-k-mooneymriedem: the upgrade will be significantly harder if we also have to deal with numa in placement in the same release16:04
mriedemsean-k-mooney: i don't know what that has to do with this at all16:04
sean-k-mooneymriedem: if we defer pcpus in placement to U we will have to deal with both in one go16:04
mriedemi didn't say anything about deferring16:04
mriedemi said, does anyone outside of the 3 people reviewing this that will actually have to deal with the upgrade know what the plan is16:05
mriedemand are they ok with it16:05
*** tbachman has quit IRC16:05
sean-k-mooneyright, but the current upgrade approach is the best we could come up with, and we went to the MLs and asked if the toggle was ok16:05
mriedemand no operators even saw that thread,16:05
mriedemwhich is why i asked (again) if it could be posed with a [nova][ops] tag16:06
mriedemto get visibility16:06
mriedemlack of feedback from operators is not agreement16:06
*** tesseract has quit IRC16:06
sean-k-mooneywell, we did ask CERN in irc16:06
*** rpittau is now known as rpittau|afk16:06
dansmithI think it's probably good to get feedback not just from ops,.16:06
sean-k-mooneybut I would have liked others to comment too16:07
dansmithbut from people that have to do this in the deployment tools16:07
mriedemit would be a lot better to know before releasing train that "this sucks but it's not terrible" rather than "this is a no-go for me"16:07
dansmithas this adds at least one more atomic reconfigure/restart of the deployment16:07
mriedemsure, i lump mnaser into the ops and tooling (OSA) camps16:07
dansmithyup16:08
stephenfinI don't see what the actual issue is though16:08
sean-k-mooneydansmith: for what it's worth, we talked about this internally with our tripleo folks who will be implementing it, and they were ok and actually preferred the separate config flip step16:08
mriedemsean-k-mooney: and cern (surya? belmiro?) said what?16:08
sean-k-mooneymriedem: we asked belmiro16:08
dansmithsean-k-mooney: preferred to what/16:08
sean-k-mooneyand he was ok with the config16:08
stephenfinYou do your upgrade and nothing changes. At some point after the upgrade, you go tweak knobs on the compute nodes followed by a knob on the scheduler16:08
stephenfinand you're done16:08
sean-k-mooneyI'll see if I can find the irc logs16:08
dansmithstephenfin: and restart the whole deployment atomically :)16:08
stephenfinno, you don't need to do that16:09
dansmithno?16:09
dansmithyou say "immediately" in your comment16:09
stephenfinI said we'd have to do that immediately if I wasn't doing the things I was doing to prevent that16:09
sean-k-mooneydansmith: the alternative was to do the double reporting of resources as both vcpu and pcpu, by the way. and that was not done for a specific reason I can't remember16:10
*** tbachman has joined #openstack-nova16:10
dansmithsean-k-mooney: yeah, that's a terrible alternative, agreed :)16:10
dansmith"Would you prefer an extra config step or a kick in the nuts?"16:10
dansmithI could get most people to agree to the first16:11
sean-k-mooneyit did not require the config and it was self-healing, but ok16:11
dansmithseems to me that with the current plan,16:11
dansmithafter they've upgraded,16:12
sean-k-mooneyi think we did not do it because of an issue with reshapes16:12
stephenfinsean-k-mooney: Not reshapes, no16:12
stephenfinhttp://lists.openstack.org/pipermail/openstack-discuss/2019-August/008501.html16:12
dansmiththey have to change all (or some fraction) of their computes to the new config to expose the new resources,16:12
dansmithrestart them,16:12
dansmiththen tweak the scheduler config to ask for the new thing, then restart those,16:12
dansmiththen fix the rest of the computes before running out of capacity, and then restart those16:12
dansmithright?16:12
*** lpetrut has quit IRC16:12
sean-k-mooneydansmith more or less16:12
dansmiththat's the most graceful thing16:12
stephenfindansmith: exactly, yeah16:13
dansmithwhich sounds like (a) hard to automate and (b) laborious16:13
dansmithotherwise you're looking for full atomic downtime while you do all that in one go16:13
dansmithI can't imagine OSA is going to decide what the sufficient fraction for conversion is,16:13
sean-k-mooneywell, for FFU we take down the whole cloud control plane, so that's not unprecedented16:13
dansmithconvert that set, reconfig/restart control services, etc16:14
stephenfinthe scheduler option exists to prevent the need for that atomic upgrade16:14
dansmithsean-k-mooney: this is for rolling one release16:14
stephenfin*exists solely16:14
*** bbowen__ has quit IRC16:14
mriedemsean-k-mooney: vexxhost doesn't need to FFU because they actually don't suck at CD16:14
dansmithstephenfin: which likely only works for the case where the humans decide when to throw that switch16:14
sean-k-mooneymriedem: :) yes, but telcos don't upgrade until the last second.16:15
stephenfindansmith: Yeah, I've told mschuppert et al internally to not even try automating this in TripleO16:15
dansmithstephenfin: exactly16:15
stephenfindansmith: But manual human intervention is going to be necessary anyway16:15
mriedemoh right, openstack's only consumer, telco's16:15
dansmithstephenfin: I don't think that's a given16:15
stephenfinyeah, it is16:15
dansmithalright, well, end of discussion then huh?16:16
sean-k-mooneystephenfin: well, they will be automating it as a separate step that you run, but the full details are tbd16:16
stephenfinwait, I'm preparing my longer answer :)16:16
stephenfinwe can't tell if a host is intended for pinned workloads, unpinned workloads or (bad!) both16:16
dansmithif converting from the current cpuset config to the new one is not something a computer can automate, then this is all unreasonable, IMHO16:17
stephenfinso we can't therefore tell whether we should be mapping 'vcpu_pin_set' to '[compute] cpu_dedicated_set' or '[compute] cpu_shared_set'16:17
stephenfinassuming 'vcpu_pin_set' is even set, which it doesn't have to be16:17
mriedemstephenfin: couldn't we detect that if the compute was reporting a trait saying what it's configured for?16:17
dansmithmriedem: he's saying the config doesn't currently include the intended behavior16:17
dansmithwhich is fine, the operator may have to tell the tool what they're using their pinning for,16:18
stephenfinmriedem: To paraphrase a kid with a spoon, there is no trait16:18
dansmithbut the actual conversion of the formats does not need to be hand-edited everywhere16:18
mriedemstephenfin: i don't know what that means (the spoon kid thing) but sure there is never a trait unless we add one16:18
dansmithmatrix16:18
dansmithcan't believe you didn't get a 90s movie reference dude16:18
mriedemmy point was, if the control plane needs to know things about how the compute is configured/supported, then we use traits for that now16:18
mriedemi'm not a matrix fanboy16:19
* dansmith glares16:19
stephenfinmriedem: We don't need a trait though - we have resources16:19
dansmithright, that's not really the problem16:19
stephenfinthe problem is that we're going from a world where two different types of resource have been munged together, and we're trying to unmunge them as cleanly as possible16:20
dansmiththe controller side of this seems easy to make flexible enough to handle the rolling config of computes to me16:20
stephenfinusing service versions?16:21
dansmithI would say that the scheduler should ask for the new format by default. If placement returns some options, then we filter and schedule to those if possible. Basically, prefer the upgraded machines16:21
dansmithif we get back no candidates or filter them all out,16:21
sean-k-mooneywe are going from a world where the VCPU resource was the number of virtual cpus available, meaning the number of shared cpus16:21
dansmithwe check the service version to determine if there are old computes in the deployment. If so, we query again for the older format to see if there's any room that way16:21
artomSo... maybe we need entirely new resource names then?16:21
dansmithpotentially cache that determination for ten minutes or something, but...16:22
mriedemartom: moving away from VCPU is a non starter to me16:22
dansmiththen you can upgrade and convert computes in one step, which is what OSA and other tools are going to want to do..16:22
mriedemit's baked into *everything*16:22
dansmiththey want to upgrade and fix the config as one step generally16:22
artommriedem, I'm not saying remove it, I'm saying leave it "legacy CPU resource thing", and come up with new resources to mean shared CPU and dedicated CPU16:23
dansmithartom: -316:23
* artom shuts up in a corner16:23
artomAnyways, I have 0 context and func tests to fix16:23
* artom tries to stop bike-shed-astinating16:24
* dansmith whips artom like a mule16:24
sean-k-mooneyartom: we talked about SCPU and PCPU in the past, but we said no, we want to keep VCPU resources16:24
artomdansmith, hey man, dinner first16:24
sean-k-mooneyso way too late to go back down that route16:24
stephenfindansmith: so tl;dr: kill the static scheduler-only config option and instead do "give me PCPU, but if you can't give me PCPU then search for VCPU" instead16:24
dansmithstephenfin: yes, but only the last part of there are old computes around. once you do that one time and find everything is upgraded, stop even doing that check16:25
dansmithstephenfin: remove that compat step in U, no deprecation cycle needed for that16:25
stephenfinWhat do you mean by old computes?16:25
sean-k-mooneydansmith: if we did that we would need a trait to make the second query safe16:25
mriedemstephenfin: older than train computes16:26
mriedembased on the nova-compute service rpc api version16:26
dansmithstephenfin: service version will tell you if all computes have been upgraded.. so actually, maybe just always do that in T if no candidates, because they could do the upgrade and the config tweak separately16:26
sean-k-mooneywe would need a support_PCPU capability trait and would need to add it as a forbidden trait for the second query16:26
*** bbowen__ has joined #openstack-nova16:26
stephenfinthat won't work though16:26
dansmithstephenfin: so yeah, what you said16:26
stephenfinbecause by default we don't report PCPU on Train16:26
stephenfindoing so would force people to set new config options as soon as they upgrade16:27
stephenfinwhich we can't do16:27
sean-k-mooneystephenfin: if you look at artom's migration code, we ignore the "can live migrate with numa" config if everything is upgraded16:27
dansmithstephenfin: that's why I backed away from the version check there16:27
sean-k-mooneythat would still require everything to be restarted, however.16:28
stephenfinsean-k-mooney: right, but that's because the "can live migrate with numa" check only needs to know that code is new enough. it doesn't need the operator to tweak some config first16:28
stephenfinwhich this does16:28
stephenfindansmith: Ah, yeah, I missed the "so actually"16:28
dansmithstephenfin: the reason I was heading in that direction, is because:16:29
sean-k-mooneyI think if we were to do the double query, as I said, we'd need the compute capability trait to protect against landing on a new host when old hosts are available and we ask for vcpus16:29
mriedemyou'd only do that fallback for VCPU if the flavor in the request spec (in the scheduler) has resources:PCPU=x right?16:29
stephenfinsean-k-mooney: not gonna happen - the NUMATopologyFilter or libvirt driver protect us16:30
dansmithstephenfin: I was going to suggest we also always expose the inventory, even if we have to synthesize it from the older config, but I'm guessing you're going to say we'd potentially expose as shared or dedicated and be wrong for the intention of the operator right?16:30
sean-k-mooneymriedem: no we are translating hw:cpu_policy=dedicated into PCPU requests16:30
dansmithstephenfin: require them to convert their configs before U, but expose the new inventory right away16:30
mriedemok but my point is, just make sure we're not unconditionally doing that fallback re-query16:30
stephenfinsean-k-mooney: because if you land on a Train compute node, that'll have explicitly set 'NUMATopology.pcpuset' to None16:30
dansmithmriedem: yes, only do the fallback query for PCPU things16:31
sean-k-mooneyok, we could have issues with the limit parameter on placement16:31
stephenfinand if that's None, we've got nothing to pin to16:31
stephenfinso placement will pass but the filter will fail16:31
sean-k-mooneybut if we do the second query without a limit then ya, the numa topology filter would prevent that16:31
sean-k-mooneyso we don't need the trait16:31
stephenfindansmith: Yeah, exactly16:32
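The safety net stephenfin describes, as an illustrative check (not the actual NUMATopologyFilter code):

    def host_passes(host_pcpuset, wants_pinning):
        # A Train compute that has not been reconfigured with
        # cpu_dedicated_set reports pcpuset=None, so a pinned request can
        # never fit there, even if the VCPU fallback query returned the
        # host from placement.
        if wants_pinning and not host_pcpuset:
            return False
        return True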
dansmithI don't think the limit is a problem16:32
*** N3l1x has joined #openstack-nova16:32
dansmithstephenfin: so they express their intent right now how?16:32
sean-k-mooneydansmith: doesn't CERN set it to like 1516:32
stephenfindansmith: My previous solution had been to expose CPU inventory on hosts without the new configuration as both PCPU and VCPU16:32
dansmithsean-k-mooney: we're asking placement for a query that will return things with PCPU resources.. if placement returns nothing then there's nothing that will fit16:33
sean-k-mooneythe default limit is 1000 so that should not be a problem, but if it's really low then I expect it could be16:33
dansmithsean-k-mooney: their problem is different16:33
sean-k-mooneydansmith: I'm not talking about the first query for pcpus16:33
dansmithstephenfin: yeah, I don't think that's a good plan16:33
sean-k-mooneydansmith: I meant the second query for VCPUs16:33
*** gbarros has joined #openstack-nova16:33
stephenfinYeah, it's really not. I detailed why in that ML post16:33
dansmithsean-k-mooney: I don't see what the problem is16:34
sean-k-mooneythe new hosts can have inventories of both16:34
sean-k-mooneywe could get 15 new hosts and no old hosts16:34
dansmithstephenfin: I'm asking about today.. they use this ambiguous config thing.. how do they control which type land where?16:34
sean-k-mooneythen the numa topology filter would eliminate all hosts16:34
stephenfinthe scheduler option and the NUMATopologyFilter16:35
*** mdbooth has joined #openstack-nova16:35
dansmithstephenfin: meaning static config to determine how to treat pinned cpu requests?16:35
stephenfinIf the scheduler option is set to False (so it's not translating 'hw:cpu_policy=dedicated' to 'resources:PCPU') then we'll keep requesting VCPU from placement16:36
dansmithno,16:36
dansmithI'm asking about TODAY16:36
stephenfinoh, today16:36
dansmithnot the mythical future where your patches are landed16:36
stephenfingotcha16:36
stephenfinhost aggregates16:36
dansmithright, okay16:36
stephenfinand metadata16:36
dansmiththat's what I thought16:36
stephenfinthat's what you're _supposed_ to use, of course. We never enforced it16:36
dansmithso the problem is that computes literally don't have access to the information they need to know what kind of thing to expose, because they have config, but no access to the aggregate info16:37
sean-k-mooneyand Wind River developed a host agent to allow you to mix16:37
sean-k-mooneyso they exploited the fact that we did not enforce it16:37
stephenfincorrect16:37
dansmithstephenfin: so, if we defaulted to the looser interpretation of the inventory,16:37
stephenfinthe aggregate metadata isn't standardized16:37
stephenfinand it doesn't even need to be set16:37
dansmiththen the aggregate configs would still be in place and we could fall back appropriately maybe?16:37
efried_afkstephenfin: Not having looked thoroughly, if I addressed your -1s on the pmem series, a) would I still be able to +2 in your opinion; b) would it do any good over waiting for luyao to hit it overnight, in terms of you being able to get back to it today?16:37
dansmithanyway, we're getting off on a tangent a bit I think, so let me summarize:16:38
dansmithI think we can do away with the scheduler knob by doing the query-nay-requery approach to prioritize upgraded and converted computes16:39
*** igordc has joined #openstack-nova16:39
dansmithit would be nice if we could make the compute side smarter and/or default to a closer runtime scenario,16:39
stephenfinefried_afk: So if the comments were addressed could you +2 in my absence? I guess, but I'm happy to review again tomorrow too16:39
dansmithbut I'm less concerned about that if they can be reconfigured and restarted in isolation to be picked up by the scheduler by default16:39
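dansmith's summary, sketched as pseudocode (the helper names are hypothetical, not nova's scheduler internals):

    def get_allocation_candidates(request_spec):
        # Ask placement in the new format first (pinned requests are
        # translated to PCPU), preferring upgraded and reconfigured hosts.
        candidates = query_placement(request_spec, use_pcpu=True)
        if candidates:
            return candidates
        # Fall back to the old VCPU form only for pinned requests and only
        # while pre-Train computes may exist; the service-version result
        # could be cached, and the whole re-query dropped in U.
        if wants_pcpu(request_spec) and old_computes_exist():
            return query_placement(request_spec, use_pcpu=False)
        return []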
*** efried_afk is now known as efried16:39
efriedstephenfin: I meant me +2ing on myself, not assuming your +2.16:39
stephenfinohh16:40
stephenfinYeah, sure. There's no major rework necessary in anything I've reviewed so far16:40
stephenfinthis will need a good bit of modification though, I suspect https://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py16:41
*** jawad_axd has quit IRC16:41
stephenfindansmith: Yup, that all makes sense to me16:41
efriedI ain't touching that patch16:43
stephenfingood call :)16:43
gibimriedem: replied in https://review.opendev.org/#/c/676140 with some questions regarding the private helper you suggested16:43
*** igordc has quit IRC16:44
*** derekh has quit IRC16:44
* gibi needs to drop for today16:45
*** maciejjozefczyk has joined #openstack-nova16:47
mriedemgibi: replied16:48
*** awalende has joined #openstack-nova16:55
*** maciejjozefczyk has quit IRC16:56
*** awalende has quit IRC17:00
*** ociuhandu_ has joined #openstack-nova17:00
*** ociuhandu has quit IRC17:03
*** brault has joined #openstack-nova17:03
*** ociuhandu_ has quit IRC17:05
*** brault has quit IRC17:08
sean-k-mooneyI need to take a break for a bit to clear my head, so I'm going to have something to eat.17:11
sean-k-mooneyI just kicked off stacking with the latest version of artom's code17:11
sean-k-mooneyI'll start testing it when I get back17:11
*** udesale has quit IRC17:12
*** tbachman has quit IRC17:14
openstackgerritMerged openstack/os-resource-classes master: Update api-ref link to canonical location  https://review.opendev.org/68123517:21
*** brault has joined #openstack-nova17:22
efrieddansmith: bottom two vpmems on your radar for today? https://review.opendev.org/#/c/678447/17:23
efriedhopefully easy, minor updates per your prior comments17:24
dansmithefried: no, I gotta give a talk in a bit and I spent too much time this morning on reviews already17:24
dansmithif comments were addressed it's probably easy for other people to confirm,17:24
efriedif you're okay with that, then sure.17:25
dansmithbut for the record, I'm scared of both pcpu and vpmems at this point17:25
efriedimo the changes are simple and in line with what you requested17:25
efriedI don't blame you. But hey, it's just FF. We have *weeks* to fix bugs :P17:25
* efried feeds face17:26
*** brault has quit IRC17:27
*** ralonsoh has quit IRC17:32
*** cdent has quit IRC17:33
*** nicolasbock has quit IRC17:35
*** tbachman has joined #openstack-nova17:37
*** ralonsoh has joined #openstack-nova17:39
*** nicolasbock has joined #openstack-nova17:40
mriedemi'll call it, the first major bugs from pmem and pcpu regressions will be after vexxhost upgrades to train, and then in about 18-24 months when other deployments start upgrading to train :)17:43
mriedemmaybe cern in a year17:43
dansmithheh, yeah, we won't find any of the bugs in the FF->release window17:43
*** igordc has joined #openstack-nova17:44
mriedemis there any way to test pcpu in a gate job if we ran tempest smoke tests in serial or something? like wouldn't that just be a matter of creating a flavor with PCPU, single node devstack?17:45
mriedemand configuring n-cpu for the dedicated CPUs on the host?17:45
mriedembasically all of them17:45
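For what mriedem sketches here, the minimal single-node setup would presumably look something like this (the option and extra-spec names are the real Train ones; the values are only an example):

    # /etc/nova/nova.conf on the devstack compute (example values)
    [compute]
    cpu_dedicated_set = 0-7   # host CPUs to expose as PCPU inventory

    # a flavor whose instances consume PCPU instead of VCPU
    $ openstack flavor create --vcpus 2 --ram 1024 --disk 1 pinned
    $ openstack flavor set pinned --property hw:cpu_policy=dedicated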
*** trident has quit IRC17:46
dansmithI'm less concerned about if it actually works in a contrived scenario, and more about an existing deployment trying to get through the upgrade and/or being able to use it without regressions or other issues17:46
mriedemsure, and functional tests are good, they just aren't a replacement for the real thing17:47
dansmithpresumably vpmem testing is pretty much impossible17:47
mriedemi assume so, hence the 3rd party ci17:47
*** igordc has quit IRC17:50
*** gbarros has quit IRC17:51
artomWhich, btw, came up super fast (though I dunno if it had been in the works for a long time before)17:57
*** igordc has joined #openstack-nova17:57
artomRH can learn a thing or 217:57
*** gbarros has joined #openstack-nova17:58
*** trident has joined #openstack-nova17:59
dansmithmriedem: so numa LM should be ready for you, if I understand the current state right?18:02
dansmithI hit the object patch again a bit ago18:02
dansmithI think the third patch should be good  too, but you wanted some verification from the NUMA boyz which sent it off into that cpu pinning tangent, so not sure if you're still waiting for something there18:03
*** igordc has quit IRC18:06
*** igordc has joined #openstack-nova18:06
mriedemartom: came up super fast? you mean the 3rd party CI did?18:10
artommriedem, yeah18:10
artomIntel_Zuul appeared a few days ago18:10
mriedemdansmith: yeah it's sitting in a tab, was going through gibi's series until i hit a stopping point, which i just did18:10
*** jdillaman has joined #openstack-nova18:10
mriedemgibi: https://review.opendev.org/#/c/676972/ re-introduces the race you just fixed, so i'll rebase the series and fix that and +W once your other change is merged18:10
artomI talked with efried a few days ago, mentioned the apparent disconnect between the "burden of proof" on me for NUMA LM, vs the VPMEM stuff that was apparently ready to go in without any public demonstrations of it working18:11
mriedemand "Intel NFV CI" comments on everything immediately just to say it was skipped, which is annoying18:11
mriedemefried: any way you can get "Intel NFV CI" to shut up if it's not going to do anything?18:11
mriedemit hasn't done anything for years18:11
artomDunno if Intel_Zuul popping up was related, but the coincidence was interesting :)18:11
mriedemartom: fwiw i'm not ready for vpmem to go in18:12
efriedmriedem, artom: The Intel_Zuul is actually running. It's only running pmem, three tests. Unfortunately at the moment it's hardcoded to pull down an old version of the pmem series.18:12
dansmithartom: well, one difference is that yours has the potential to break existing functionality, whereas vpmem hopefully won't break anything existing, only people that try to use it18:12
efriedright ^18:12
artomdansmith, yeah, efried made the same argument - and it's true18:12
dansmithartom: not that it's no risk at all, but it is a teensy bit different18:12
artomdansmith, yep, I get you18:13
mriedemexcept all of the resource tracker weird side bugs refactoring that code will probably introduce on everyone18:13
dansmithwell, that's true18:13
efriedmriedem: retrieving the allocations earlier than we were before. That's the only difference.18:13
efriedAll of it runs under COMPUTE_RESOURCE_SEMAPHORE18:13
artomAnyways, my point was: Intel spun up a CI for their RFE in a matter of days (apparently). RH sucks if we can't do the same18:13
efrieddid before, does now.18:13
mriedemartom: the guy was working on that since denver i think18:14
mriedemefried: btw i was harassing you about "Intel NFV CI" not Intel_Zuul18:14
artommriedem, on the CI?18:14
efriedyeah, I can go ask, but it wasn't like days18:14
mriedem"Intel NFV CI" is an old thing that no longer does anything except comment that it's not doing anything18:14
mriedemartom: yeah18:14
mriedemwe were talking about 3rd party CI in denver18:14
mriedemfor vpmem18:14
artommriedem, OK, I'll eat my words then18:14
mriedemyou and the other red hat bros might have been hungover still from the bar the night before :P18:15
efriedmriedem: fwiw Intel NFV CI they're trying to resurrect to do the thing it was originally intended for.18:15
*** igordc has quit IRC18:15
mriedemefried: i've been hearing that for months18:15
efriedfirst step was turning it back on to be a no-op18:15
mriedemi'd like it to shut up until it actually delivers18:15
efriedyeah I know.18:15
artommriedem, I have Russian roots, calling me an alcoholic is a noop ;)18:15
sean-k-mooneymriedem: we should be able to test the pcpu stuff in the gate, yes18:16
mriedemhttps://www.youtube.com/watch?v=soNcOfRvOtg18:16
dansmith<318:17
dansmithI love me some early 'priest18:17
sean-k-mooneyI was talking to a few people on the infra channel and I think we might be able to replace the intel nfv ci with first party ci.18:17
*** eharney has quit IRC18:17
sean-k-mooneywe might be able to get multi-numa nested virt labels from limestone and vexxhost in the future18:18
sean-k-mooneywe can do non-numa testing in the gate already as we have 3 providers with nested virt capability18:18
sean-k-mooneyor rather single numa18:19
sean-k-mooneythe testing I set up for the numa live migration was/is testing with cpu pinning, multiple numa nodes and hugepages18:21
mriedemhmm, it seems weird to me that the intel pmem job is running and passing on patches that don't have anything to do with pmem when pmem isn't merged18:23
mriedemlike, how does that even work?18:23
sean-k-mooneymriedem: they have hardcoded a specific version of the patch to be merged in18:24
sean-k-mooneythey were trying to undo that earlier18:25
mriedem2019-09-10 03:27:01.729430 | TASK [upgrade-libvirt-qemu : apply vpmem patch]18:25
mriedemaha18:25
efriedyeah18:25
mriedemrefs/changes/70/678470/13 -> FETCH_HEAD18:26
efriedunfortunately it's applying PS13, not the latest18:26
mriedemthat patch is up to PS27 now18:26
sean-k-mooneyyep18:26
efriedknown, Rui is supposed to unwind that asap.18:26
mriedemi'm not sure why you wouldn't just only run it on patches in that series and skip for anything else18:26
efriedthat would have been another way to do it.18:26
efriedthough it wouldn't have made sense to do it for patches at the bottom either18:27
mriedemartom: sean-k-mooney: so this is the numa lm patch/job we care about right? https://review.opendev.org/#/c/680739/18:27
sean-k-mooneyyes more or less18:27
mriedemwhich hasn't run on the latest series of patches18:27
sean-k-mooneyyes so if you recheck it18:28
sean-k-mooneyit will just run the singel job we want18:28
sean-k-mooneyshall i do it18:29
mriedemye shall18:29
artommriedem, it hasn't - what's FN's status, are we back up?18:29
artomdonnyd ^^?18:29
donnydyea18:29
donnydit was back up yesterday18:29
artom👍18:29
sean-k-mooneyya, there is occasionally a network issue18:29
sean-k-mooneybecause we currently can't fail over to another cloud, it's more obvious with this job18:30
donnydI'm not sure why this particular job keeps doing that too... it seems to be failing more often than not18:31
donnydevery now and again i get something that pukes... but by and large it works18:32
sean-k-mooneywell, this job can't retry on another provider and I'm not sure if zuul will retry on the same one18:32
sean-k-mooneyso I think it's more a case of it works or you're out of luck18:33
openstackgerritmelanie witt proposed openstack/nova master: WIP Include error 'details' in dynamic vendordata log  https://review.opendev.org/68132918:34
donnydWell if FN was having this kind of failure rate we are seeing with this job... I am pretty sure the other projects would have already kilt me18:35
sean-k-mooneywell, I admit it does look like there is something else going on, but I'm at a loss as to what18:36
donnydI don't think we were having this much of an issue when it was in a separate pool on the other label18:36
sean-k-mooneywell, we have hit a few different issues. one was quotas when we changed pools18:37
sean-k-mooneythen the other issue is the ssh connection18:37
sean-k-mooneyI think the quota issue has gone away since you are no longer managing that on your end and are controlling it via nodepool18:38
donnydwell there is no more quota (within reason) max-servers is set to 70 and quota for instances set to 10018:38
donnydeverything else on quota is -118:38
sean-k-mooneyya18:38
sean-k-mooneythe pools were both the same project on the same cloud right18:39
*** ociuhandu has joined #openstack-nova18:39
sean-k-mooneyso it should not affect this behavior18:39
donnydcorrect18:39
donnydshould is the opportune word18:39
sean-k-mooneyyes this also should work :)18:39
donnydLOL sean-k-mooney18:39
mriedemhttps://review.opendev.org/#/c/634606/ went from PS75 to PS83 in a hurry18:39
donnydSo the ssh thing I have some theories on and a fix in flight18:40
donnydmy edge router could be a little (lot) better18:40
donnydso its possible that its struggling with all of the connections... or i have a knob i need to turn18:40
donnydload is not high according to cpu / mem / network... but that doesn't mean something else isn't borked... so I am digging18:41
sean-k-mooneyya, perhaps. you said you're using bgp to advertise the block, right? but I assume you don't need to advertise each /64 you are delegating to the vms, and have a /48 or something from your isp18:42
donnydeach vm is on the same /64 that is advertised (i know... not cloudy) at the edge18:43
*** ociuhandu has quit IRC18:44
donnydthe zuul tenant has one /64 and it's already advertised... so if we had a routing issue it would be pretty big18:44
sean-k-mooneydonnyd: oh you have a /64 for the site and the VMs are getting a /128? from that subnet18:44
donnydit's also possible my state table is too small18:44
donnydI have tweaked some knobs to start18:44
donnydso hopefully the issue goes away18:44
sean-k-mooneyI know that a lot of hardware routers assume that endpoints are /64, so I know using /128 or more strict routes can cause issues sometimes18:45
donnydyea, i tried to get the functionality to work pretty much like v4 so people don't get too confused18:45
donnydThe way it's set up is exactly as if your ISP provided you a /6418:46
sean-k-mooneyright18:46
sean-k-mooneyok makes sense18:46
donnydall the systems on your network could get addresses off that /64 and the isp routes it18:46
*** ralonsoh has quit IRC18:46
donnydthat is exactly how I have FN setup18:46
donnydnot great for billing (don't have billing at FN), but great for actual networky things18:47
donnydso if two instances needed to talk in the same tenant... they could just start talking..18:47
sean-k-mooneyI'm currently trying to get IPv6 working via an HE.net IPv4->IPv6 tunnel but I have only got as far as my router having IPv618:47
sean-k-mooneywhen I had my clients get IPv6 I had MTU issues18:47
sean-k-mooneyso you're doing better than me at getting IPv6 to work18:48
donnydi have my /48 routed to me via HE and then I am able to pass out /64's to tenants18:48
sean-k-mooneyya that is what I set up on my router too but couldn't get the MTU clamping to work18:49
*** gyee has quit IRC18:49
sean-k-mooneyso I could get to IPv6 sites but packets over about 1350-1400 bytes were dropped18:49
donnydthat is strange18:50
donnydwhat are you using at your edge?18:50
*** gyee has joined #openstack-nova18:50
donnydcould be ISP not doing you any favors18:50
donnydi have a business connection, so they pretty much leave me alone18:50
sean-k-mooneya Ubiquiti EdgeRouter X, well actually that is not my edge router18:50
sean-k-mooneymy ISP router is in front of it18:50
sean-k-mooneyso my ISP router could be messing it up18:51
* dansmith is thankful to have native ipv6 these days18:51
dansmithalso a business connection here and they'll give me multiple /64s for my subnets18:51
sean-k-mooneyI pay extra for a static IP but they won't let me pay for a business connection and that is the only way to get native IPv618:52
dansmithour residential connections here all have native ipv6,18:52
dansmithbut not sure those people can get multiple /64s like I can18:52
sean-k-mooneymost are on cable broadband here but not VDSL18:52
donnyddansmith: verizon fios business doesn't even have v6 to offer18:53
dansmithdonnyd: sucks18:53
sean-k-mooneyall the fiber to home stuff is ipv6 enabled18:53
dansmithmine is cable18:53
donnydyea.. its pretty frustrating18:53
dansmithI don't like to say nice things about comcast, but they do have the ipv6 stuff pretty well sorted at this point (and have for a couple years)18:53
donnydthe network is pretty quick... so the HE overlay doesn't really seem to be hurting for performance18:53
dansmithyeah, I loved my fios business for speed when I had it18:54
donnydyea I had them when i was in CoSprings and their business class was pretty good18:54
dansmithbut then moved outside their area and they mostly stopped expanding it18:54
*** brinzhang has quit IRC18:54
*** brinzhang has joined #openstack-nova18:54
*** panda is now known as panda|rover|off18:54
sean-k-mooneyartom: sorry this took so long but the latest version is updating the numa topology blob in the db correctly18:55
sean-k-mooneyso putting back in the instance.refresh() or whatever you did fixed that issue18:56
sean-k-mooneyI'm going to try testing a bunch of different cases but are there any in particular people want me to test?18:58
*** ricolin has quit IRC18:58
mriedemartom: so don't respin now since we need to get a result from that ci job, but queue this locally https://review.opendev.org/#/c/634606/8319:05
artommriedem, ack19:06
artomsean-k-mooney, cool, thank you :)19:06
artomI managed to hit it in func tests, as I said. But... that was by cheating and setting do_cleanup = True in the code itself, to trigger the driver.cleanup() call19:07
mriedemsmells like a networking orgy in here19:08
artomSo I'm trying to do it correctly by forcing is_shared_instance_path to be false19:08
artomWhich leads to a whole other rabbit hole...19:08
dansmithmriedem: efried: not sure if you saw this, but alex_xu confirmed that we just stop caring about quotas on pcpu instances (IIUC) per my question about how it works: https://review.opendev.org/#/c/674895/32/nova/virt/libvirt/driver.py19:08
mriedemstephenfin said something about dealing with quota still needing to happen19:09
dansmithmriedem: efried: I can see the solution being a new quota class (ick) or lumping them together (which may be confusing) but just leaving them ignored doesn't seem like a good plan to me, especially since it differs for the user based on whether or not the deployment is configured for placement19:09
sean-k-mooneydansmith: for what it's worth I pointed out the PCPU quota issue in the unified limits spec19:11
*** eharney has joined #openstack-nova19:11
sean-k-mooneyif we implement it next cycle we will either have 2 quotas or have to use both the vcpu and pcpu counts when looking at the cpu quota19:12
dansmithwe have to do something other than just pretend they're not there regardless of when we land it19:13
sean-k-mooneywell we did not intend to pretend they are not there19:13
dansmithsince it's all about consuming whole cpus, I'm pretty sure that operators won't be okay with just allowing anyone to boot enough pcpu guests to exhaust the whole deployment19:13
sean-k-mooneybut yes19:13
dansmithI know, but...19:13
sean-k-mooneypersonally I prefer having two separate quotas, one per RC, but I can see why some won't want to distinguish19:14
dansmithyeah, I mean,19:15
dansmithyou'd think that the operators would want to quota those separately19:15
sean-k-mooneyso they can bill for them separately19:15
dansmithno,19:15
sean-k-mooneybecause one cost them a lot more19:15
dansmiththey will bill separately, they just need to make sure one tenant doesn't eat them all up19:15
sean-k-mooneywell ya that too19:16
sean-k-mooneyif we treat them separately we don't need to special-case them19:16
sean-k-mooneyit's just 1 limit per resource class19:16
sean-k-mooneyso it's simpler to reason about19:16
sean-k-mooneythe downside is we need to teach people that there would now be two limits on cpus19:17
dansmithwell, only if they configure flavors with those things19:18
dansmithI can't imagine anyone that enables this functionality will be fine with treating them as the same.. one costing like 100x the other :)19:18
sean-k-mooneyby those things you mean cpu pinning19:18
dansmithno, I mean allowing people to ask for dedicated cpus19:19
dansmithisn't that the point of this?19:19
sean-k-mooneyof pcpu in placement or unified limits?19:19
*** pcaruana has quit IRC19:20
sean-k-mooneymy understanding is we are still going to request dedicated cpus the same way we always did with hw:cpu_policy=dedicated19:20
dansmithI think I've lost grasp of this conversation19:21
sean-k-mooneyme too19:21
dansmithheh19:21
sean-k-mooneyI was going to say I don't think the way we request things is going to change19:21
mriedembut but but i could override my flavor vcpu with resources:VCPU=0 and define resources:PCPU=119:21
sean-k-mooneyi am 99% sure stephen added a check in his code to block that19:22
sean-k-mooneywe definitely discussed adding one19:22
sean-k-mooneybut yes, today without his patches you could19:22
mriedemit would be confusing anyway since flavor.vcpus would be...what?19:22
sean-k-mooneyflavor.vcpus would be 119:22
mriedemeven though you're not getting vcpu, you're getting pcpu19:23
sean-k-mooneyyes19:23
dansmithmriedem: right, I'm hoping we go in the direction of a resource override in the flavor and not the hw:$foo stuff19:23
mriedemyeah you can't even create a flavor with vcpu=019:23
sean-k-mooneythe placement resource class should never have been vcpu. flavor.vcpus means the virtual cpu count exposed to the guest19:24
sean-k-mooneydansmith: we are currently not going in that direction19:24
sean-k-mooneydansmith: we are currently explicitly planning to block resource class overrides19:24
dansmithmriedem: yeah, but the "rewriting" patch will cause you to get an allocation with vcpu=0, which is weirdish19:24
dansmithsean-k-mooney: huh?19:24
dansmithI dunno who the "we" is in that scenario19:25
mriedemwe = stephen and the people approving the changes19:25
dansmithdo you mean specifically for PCPU things?19:25
dansmithmriedem: heh, yeah19:25
sean-k-mooneywell yes to both19:25
dansmithI definitely don't agree with blocking resource based overrides in the future :)19:25
sean-k-mooneyeric, stephen, alex, gibi and I were talking about this about a month ago19:26
sean-k-mooneydansmith: the reason is if we don't, you need to modify your flavor if the topology of resources changes in placement19:26
sean-k-mooneye.g. your flavor would break if we moved cpus under numa nodes or cache nodes19:26
mriedemi get the reason in the short term19:26
dansmithsean-k-mooney: I don't see what that has to do with it at all19:27
sean-k-mooneyyou would have to change the resources: syntax to the numbered group form19:27
efriedI'm not fully swapped into this conversation, but last time I looked you get an error if flavor.vcpu != PCPU19:27
efried(when PCPU is specified)19:27
sean-k-mooneyam that is not always correct19:28
dansmithwhich is also really weird and confusing19:28
efriedexcept for something something hyperthread19:28
efriedyes, it's confusing, but it's the compromise that was agreed on in the spec.19:28
sean-k-mooneyexcept if you have hw:emulator_threads_policy=isolate19:28
efriedthen you add 119:28
efriedright?19:28
sean-k-mooneyin that case we allocate 1 additional pcpu for the emulator thread19:29
sean-k-mooneyyes19:29
efriedthat was in there too.19:29
dansmithwell, I don't see how we can be landing any of this without the quota bit in place at the very least19:29
mriedemwhich is because of this right? https://github.com/openstack/nova/blob/stable/stein/nova/virt/libvirt/driver.py#L85219:29
mriedemis hw:emulator_threads_policy and that +1 thing for pcpu restricted to the libvirt driver or does that logic creep into the api?19:30
*** nweinber_ has quit IRC19:30
sean-k-mooneyam well that is hard to answer19:30
mriedemGREAT19:31
sean-k-mooneyonly the libvirt driver supports pinning19:31
sean-k-mooneyand this only works with pinning19:31
sean-k-mooneyso this only works with the libvirt driver19:31
dansmiththat is not the right answer19:31
dansmithletting stuff like that creep into the API because only one driver supports it is how we have a ton of xen-specific warts on the api19:31
sean-k-mooneythe right answer is we hope to remove that in U19:31
sean-k-mooneybecause we hope to remove it in U19:32
mriedemdelicious agent builds19:32
sean-k-mooneynow that we have support for share19:32
sean-k-mooneywhich maps the emulator threads to the same cpu pool as the floating vms, we don't think we need isolate anymore19:33
mriedemsean-k-mooney: so by remove "that" you mean hw:emulator_threads_policy ?19:33
sean-k-mooneyyes19:33
sean-k-mooneywhat stephen and I would like to propose in U is this: if you have cpu_shared_set defined the emulator threads run there. if not they run on the same cores as your pinned vm19:34
sean-k-mooneythat is what you get if you do hw:emulator_threads_policy=share today19:34
sean-k-mooneyand if we only support 1 value for the option we can remove it entirely and just do that by default19:35
mriedemefried: so you might want to not drop the -2 on https://review.opendev.org/#/c/671793/23 until the quota issue is sorted out19:36
mriedemthis quota issue https://review.opendev.org/#/c/674895/32/nova/virt/libvirt/driver.py@745719:36
efrieddone19:36
sean-k-mooneydo we have a quota issue with this?19:36
sean-k-mooneyi thought quota was being counted on flavor.vcpu19:36
dansmithsean-k-mooney: did you read the comments?19:36
sean-k-mooneynot on the resource class in train19:36
sean-k-mooneydansmith: no not yet19:37
*** panda|rover|off has quit IRC19:37
dansmithsean-k-mooney: maybe do that :)19:37
efriedgood, now everyone can focus on vpmem19:37
sean-k-mooneyok but since we only support vms with all pinned or all shared cpus with this series, it does not change things in train as far as I was aware19:37
*** panda has joined #openstack-nova19:38
dansmithbefore this, we'd limit you to at least your vcpu quota19:38
dansmithafter this, no limits AFAICT19:38
dansmithhonestly, I'm not sure how you're going to fix it so it works the same with placement quotas and nova quotas19:39
sean-k-mooneywe should still be limiting on flavor.vcpu19:39
dansmithnot with placement19:39
dansmithdid you read the comments?19:39
sean-k-mooneyI'm trying to find which patch you put it on19:39
mriedem(2:36:21 PM) mriedem: this quota issue https://review.opendev.org/#/c/674895/32/nova/virt/libvirt/driver.py@745719:39
sean-k-mooneyoh but I think I know what you're saying19:40
sean-k-mooneywe have the option to track cpu quota in placement already19:40
mriedemby default we don't use placement for quota usage counting19:40
sean-k-mooneywe enabled it by default in stein, right?19:40
mriedemf naw19:40
sean-k-mooneyoh we dont19:40
mriedemwhere is melwitt19:40
dansmithwe should be in train I think right?19:40
dansmithmeaning,19:40
dansmithwe should be turning that on but haven't19:41
melwittmriedem: hiding19:41
sean-k-mooneydansmith: so if we turn that on then yes we have a problem19:41
sean-k-mooneyif we don't we should not19:41
dansmithno19:41
dansmithwe have a problem because people could have that on19:41
sean-k-mooneytrue19:41
dansmithyou know config knobs can be set to not the default right? :)19:41
mriedemcern uses it19:41
sean-k-mooneyok I see the problem. I was aware that was a thing but ya I had not considered the impact on this19:43
mriedemquota is generally the last thing anyone thinks about19:43
dansmithif CPUs weren't the most expensive and constrained resource in the cloud, then maybe less of an issue, but.... :D19:44
mriedem[quota]/injected_file_content_bytes is a major concern of mine19:45
sean-k-mooneyI'm not sure if that's sarcasm or if I should feel really bad for you19:45
mriedemheh, sarcasm19:45
sean-k-mooneydansmith: I always ran out of ram first but ya. I think the only thing we could do would be to count both inventories, at least short term19:46
sean-k-mooneywhere does that code live?19:47
sean-k-mooneye.g. where we check the quota19:47
melwittenforcing quota on other resource classes (DISK_GB and more) is part of the proposal in the unified limits spec, as I think sean-k-mooney mentioned earlier19:47
dansmithmelwitt: right, but this set is taking dedicated cpus out of the equation for the cpus quota,19:48
dansmitheffectively making them unconstrained (and unconstrainable)19:48
melwittsean-k-mooney: nova/quota.py is the main file19:48
dansmithbut they're the most expensive things19:48
mriedemsean-k-mooney: start here https://github.com/openstack/nova/blob/f4ca3e70852c0a7ed7904a9f2d7177c9118d3d1c/nova/quota.py#L128119:48
sean-k-mooneymelwitt: thanks19:48
dansmithand combining them together as a hack is pretty dumb, because the whole point of this effort is to make dedicated cpus be first class citizens instead of a hack19:49
dansmiththe other problem is,19:49
dansmithwe wouldn't want the quota behavior to be different depending on whether or not you're using placement for quota19:50
melwittdansmith: yeah, sorry, was trying to say that if enforcing quota for PCPU resource class is what's needed then that's a big amount of work19:50
dansmithsomewhat unrelated,19:50
dansmithmelwitt: weren't we going to enable quota in placement by default soon?19:50
sean-k-mooneydansmith: well if we combined them together it would maintain the previous behavior19:50
dansmithmelwitt: yep, and I'm saying that's the right solution19:50
dansmithsean-k-mooney: the point of this work is to make pcpus not work like vcpus right? :)19:50
melwittdansmith: I don't remember a specific timeline, just that we wanted to let it bake with cern for awhile before making it default19:51
mriedemidk that counting PCPU for cores quota from placement is that hard, you'd just sum VCPU and PCPU here wouldn't you? https://github.com/openstack/nova/blob/f4ca3e70852c0a7ed7904a9f2d7177c9118d3d1c/nova/scheduler/client/report.py#L234419:51
sean-k-mooneyand the two main goals of this work were coexistence of pinned and floating vms on the same host and removing the race on claiming pinned cpus19:51
mriedemand here https://github.com/openstack/nova/blob/f4ca3e70852c0a7ed7904a9f2d7177c9118d3d1c/nova/scheduler/client/report.py#L235619:51
dansmithmelwitt: ack19:51
sean-k-mooneydansmith: not directly, that is a side effect19:51
dansmithmriedem: if you combine them yes19:51
dansmithmriedem: that's a hack to make it work, not the right solution19:52
mriedemi know we don't want to combine them externally, but the non-placement counting method is going to be based on instances.vcpus which comes from the embedded instance.flavor.vcpus19:52
sean-k-mooneydansmith: the follow-up spec that wants to have a vm with some pinned cores and some floating cores needs them to be different things19:52
dansmithmriedem: right but we should be moving to placement quotas anyway19:52
dansmithsean-k-mooney: AFAIK, this work is aimed at making pcpus not just a per-host special case of pretending that they're like vcpus19:53
dansmithsean-k-mooney: so, yeah, obviously separate quotas is not the goal of this,19:53
dansmithbut all of this is important to get right19:53
sean-k-mooneyyes19:53
sean-k-mooneyI'm not disputing that at all19:53
dansmithotherwise people really need to segregate these hosts still, which means the rest of it doesn't get us anything19:53
sean-k-mooneyI'm not sure I agree on the last point but I'm not going to rathole on it either19:54
sean-k-mooneyi think this will change in U again with unified limits19:54
sean-k-mooneyso I would prefer to group them in Train to keep the pre-Train behavior19:54
sean-k-mooneyand then only change the behavior once19:54
sean-k-mooneyin U when everything goes to unified limits19:55
dansmithlol19:55
dansmithriiiight19:55
mriedemi want to say making count_usage_from_placement=True by default was dependent on consumer types to smooth out some of the inconsistencies with the legacy counting method today in edge cases19:55
sean-k-mooneythe lol being ever getting to unified limits19:55
mriedemlike during a resize19:55
melwittunified limits is going to go through the same bake time that counting usage from placement is. it will begin defaulted to False19:56
dansmithmriedem: I didn't think that's why we didn't make it default, because we'd still do the other thing for the non-placement-able resources19:56
dansmithsean-k-mooney: I'm saying don't count your eggs before they hatch19:56
sean-k-mooneyok19:56
dansmithunified limits has been a long time coming, so..19:56
*** gbarros has quit IRC19:57
sean-k-mooneyya I know. so are you opposed to nova internally combining them for Train19:57
mriedemdansmith: without digging up old review comments and ML thread conversations, i want to say my recollection was (1) counting usage from placement has at least 3 differences in behavior from legacy counting - which are documented in the config option help text and (2) the main benefit is for multi-cell deployments, of which there are few,19:57
mriedemand (3) it landed late in stein,19:57
sean-k-mooneywhen counting using placement19:57
mriedemso to reduce the risk on non-multi-cell deployments with the behavior change, default to legacy counting19:57
mriedemmelwitt: ^ is that what you remember?19:58
melwittit landed early in train, the data migration landed late in stein19:58
dansmithoh, I thought it was stein19:58
mriedemi thought it was stein too...19:58
dansmithin that case, can't default it in train19:58
* dansmith has to go to a thing19:58
mriedemhttps://review.opendev.org/#/c/638073/ it was train19:59
mriedemtime flies19:59
sean-k-mooneywell if we can't default it in train then we still need to support it in train with pcpu in placement, right?19:59
melwittmriedem: but yeah I think the combo of all those reasons were why we default to false19:59
melwittbig delta from legacy counting and only multi-cell ppl likely to "need" it (for the down cell resilience). so we chose not to impose the big delta in behavior on the majority who don't need down cell resilience20:00
sean-k-mooney so do we just want to add another line here https://github.com/openstack/nova/blob/f4ca3e70852c0a7ed7904a9f2d7177c9118d3d1c/nova/scheduler/client/report.py#L2356 to make it work by combining them?20:01
sean-k-mooney*them being PCPUs20:01
melwittmriedem pointed that out earlier20:01
mriedemto be fair though, some of those differences in behavior in counting also changed since i think ocata or pike20:01
mriedemwhen we moved to counting in general and dropped reservations20:01
mriedemand no one apparently noticed until we noticed in stein :)20:01
sean-k-mooneyyes, I'm wondering if we should comment that on stephen's patch as the way forward so he can do it tomorrow20:01
melwittmriedem: I think there was only one change in pike, doesn't leave room for a revert resize, IIRC (I documented all of it on the review comment)20:02
mriedemsean-k-mooney: i think one could say, "one way forward is ...."20:02
melwittthe new stuff, there's way more deltas than that, a big laundry list20:02
mriedemwhat is a laundry list anyway? 1. put stuff in washer and wash, 2. put stuff in dryer, 3. fold. wouldn't a more accurate list be a grocery list?20:03
sean-k-mooneymriedem: ok but stephen isn't here and I wanted to provide a summary for him, but I'll just tell him to read scrollback in the morning20:03
melwittmriedem: hah20:03
melwittI dunno where that saying comes from actually20:03
mriedemsean-k-mooney: whatever, that's better than nothing20:03
mriedemhttps://english.stackexchange.com/questions/437507/what-is-the-origin-of-the-phrase-laundry-list20:04
melwittI still haven't wrapped my head around what "combining" the resource classes means and whether/how it's different from legacy counting20:04
mriedemit wouldn't be different from legacy counting20:04
melwittI mean, I know it means VCPU + PCPU20:04
melwittok20:04
mriedemso i think the summary is,20:04
mriedem1. with legacy counting, pcpu is counted the same since we use instance.vcpus which comes from the flavor20:05
mriedem2. with placement counting, we'd have to combine VCPU and PCPU usage to match ^20:05
dansmith3. neither are actually right20:05
mriedem3. long-term combining them goes against the goal of separate them as countable trackable resources20:05
mriedemright20:05
mriedemso the immediate question is, is #2 good enough for train with punting sorting out #3 to the future20:06
sean-k-mooneyso this is a bit of nova I don't really fully understand. melwitt: is there anything we can do before unified limits to support not combining them, if that is preferable?20:07
sean-k-mooneyor do we just wait for unified limits to do #320:07
melwittI guess if it's not worse than what's possible with legacy counting (inability to separate) then the hack doesn't sound like a huge issue to me. it would obviously need a TODO on it to get rid of it when unified limit support exists and is enabled20:07
melwittsean-k-mooney: nothing reasonable, really. you'd have to add a new quota config and resource for "pcpu_whatever" and use that to let people set the limit separately. because until unified limits, there would be no way for nova to consume the unified limit for PCPU that a person sets in keystone20:09
sean-k-mooneyright and we don't really want to do that because of tech debt20:10
efriedFrom where I sit (outside the swirling maelstrom of actual understanding) it sounds like we've had a pretty big imbalance "forever" where we've been counting dedicated and virtual cpus "the same" from a quota perspective. If so, we're not making it worse by continuing to do that, just now we're doing it with a separate resource class for the former.20:10
melwittyeah, that would be a big tech debt20:10
melwittwhereas the hack of combining VCPU and PCPU doesn't sound like big debt to me, unless I've missed something more complex about it20:11
melwitt"small debt"20:11
sean-k-mooneyit should be small to do I think. I assume it's just the line mriedem linked but this code is new to me20:13
mriedemplus functional testing20:13
sean-k-mooneywe need to handle this branch too https://github.com/openstack/nova/blob/f4ca3e70852c0a7ed7904a9f2d7177c9118d3d1c/nova/scheduler/client/report.py#L234420:13
mriedemi'd want to see a functional test that creates a server that allocates PCPU inventory and then assert that cores usage for that tenant is incremented20:13
* melwitt nods20:14
sean-k-mooneyi think stephen has a functional test for the first half so the quota check should not be too hard to add20:14
sean-k-mooneyok so he has a functional test in the reshape patch at a minimum20:17
sean-k-mooneyok yes he added more here https://review.opendev.org/#/c/671801/43/nova/tests/functional/libvirt/test_numa_servers.py20:18
sean-k-mooneyso that's probably where it makes sense to add the test20:18
*** nweinber_ has joined #openstack-nova20:28
*** spatel has joined #openstack-nova20:31
*** spatel has quit IRC20:35
*** ociuhandu has joined #openstack-nova20:37
mriedemartom: https://review.opendev.org/#/c/640021/4820:41
mriedemi probably won't get to the functional test patch tonight20:45
sean-k-mooneymriedem: for what it's worth I did test the [upgrade_levels]/compute=stein case previously20:46
sean-k-mooneyi can test it again tomorrow, I'm more or less done for the day20:47
sean-k-mooneybut if either node has [upgrade_levels]/compute=stein we end up with the stein behavior20:47
sean-k-mooneythe migration succeeds as long as the cores used on the source host exist on the dest20:47
sean-k-mooneybut no xml is updated20:47
mriedemso it goes back to the bug behavior20:47
mriedemright?20:47
sean-k-mooneyyep20:48
sean-k-mooneyit goes back to the current master/stein behavior20:48
mriedemyeah i don't know that we really need to bend over backward to try and detect that from conductor20:48
mriedemif your computes are fully upgraded and restarted and reporting as train service versions you should have also removed any manual pins20:49
sean-k-mooneyif you don't have can_live_migrate_numa set we will block it anyway20:49
mriedemnot with the new checks20:49
mriedemwe would totally ignore can_live_migrate_numa20:49
mriedemas long as all computes are reporting train20:49
sean-k-mooneyi think we have a min compute service check too before we ignore it20:50
sean-k-mooneyoh that's the compute service20:50
sean-k-mooneynot the RPC20:50
mriedemif we pass that check we ignore the config20:50
mriedemright20:50
sean-k-mooneywell ya I agree we likely don't need to bend over backwards to check this in the conductor20:51
sean-k-mooneyanyway my concentration is totally gone so I'll call it a day.20:52
*** ociuhandu has quit IRC20:52
sean-k-mooneyartom: I'll test v49 tomorrow when you have addressed mriedem's comments20:52
*** ociuhandu has joined #openstack-nova20:53
*** ociuhandu has quit IRC20:57
*** luksky has quit IRC21:00
mriedemmelwitt: brinzhang has been asking me to review this api change https://review.opendev.org/#/c/674243/ but i haven't had time with the bw provider migrate and numa lm stuff - maybe you can peruse it?21:10
artommriedem, ack, thanks!21:11
artom(The func test can wait until after FF if that's what it comes to, right?)21:11
artomI mean, it's not plan A, but...21:11
mriedemmy priority is sean's manual testing and the gate job21:13
mriedembrinzhang: a follow up is not the correct answer to all issues https://review.opendev.org/#/c/679413/421:13
melwittmriedem: will do21:19
*** BjoernT has quit IRC21:20
openstackgerritDustin Cowles proposed openstack/nova master: Use SDK for add/remove instance info from node  https://review.opendev.org/65969121:25
openstackgerritDustin Cowles proposed openstack/nova master: Use SDK for getting network metadata from node  https://review.opendev.org/67021321:25
*** hemna has joined #openstack-nova21:29
*** henriqueof1 has joined #openstack-nova21:30
*** henriqueof has quit IRC21:31
*** hemna has quit IRC21:34
*** mriedem has quit IRC21:37
*** nweinber_ has quit IRC21:43
*** aloga has quit IRC21:45
openstackgerritDustin Cowles proposed openstack/nova master: Use SDK for add/remove instance info from node  https://review.opendev.org/65969121:50
openstackgerritDustin Cowles proposed openstack/nova master: Use SDK for getting network metadata from node  https://review.opendev.org/67021321:50
*** N3l1x has quit IRC21:50
*** hemna has joined #openstack-nova21:57
*** adriant has quit IRC21:59
*** mriedem has joined #openstack-nova22:01
*** slaweq has quit IRC22:21
*** panda has quit IRC22:26
*** panda has joined #openstack-nova22:28
*** aloga has joined #openstack-nova22:29
*** ociuhandu has joined #openstack-nova22:30
*** ociuhandu has quit IRC22:35
*** mriedem has quit IRC22:41
*** brault has joined #openstack-nova22:43
*** avolkov has quit IRC22:47
*** hemna has quit IRC22:50
*** TxGirlGeek has joined #openstack-nova22:57
*** brault has quit IRC22:59
*** rcernin has joined #openstack-nova22:59
*** henriqueof1 has quit IRC23:02
*** tkajinam has joined #openstack-nova23:03
*** macz has quit IRC23:05
*** slaweq has joined #openstack-nova23:11
*** slaweq has quit IRC23:15
*** spatel has joined #openstack-nova23:29
*** TxGirlGeek has quit IRC23:30
*** spatel has quit IRC23:34
openstackgerritMerged openstack/nova master: api-ref: fix server topology "host_numa_node" field param name  https://review.opendev.org/68077523:34
*** mlavalle has quit IRC23:37
*** adriant has joined #openstack-nova23:39
*** mtreinish has quit IRC23:43
