Tuesday, 2018-10-16

melwittalex_xu: ack, thanks00:08
*** Dinesh_Bhor has joined #openstack-nova00:34
*** macza has joined #openstack-nova00:37
*** macza has quit IRC00:42
*** tetsuro has joined #openstack-nova00:43
openstackgerritArtom Lifshitz proposed openstack/nova master: Ensure attachment cleanup on failure in driver.pre_live_migration  https://review.openstack.org/58743900:48
*** tbachman has joined #openstack-nova00:56
*** moshele has joined #openstack-nova01:04
*** slaweq has joined #openstack-nova01:11
*** TuanDA has joined #openstack-nova01:14
*** slaweq has quit IRC01:16
*** takashin has joined #openstack-nova01:17
*** mrsoul has joined #openstack-nova01:26
*** Dinesh_Bhor has quit IRC01:27
*** Dinesh_Bhor has joined #openstack-nova01:35
*** openstackgerrit has quit IRC01:35
*** dillaman has joined #openstack-nova01:52
*** mhen has quit IRC01:52
*** READ10 has joined #openstack-nova01:52
*** jdillaman has quit IRC01:52
*** moshele has quit IRC01:58
*** Dinesh_Bhor has quit IRC02:02
*** hongbin has joined #openstack-nova02:09
*** annp has joined #openstack-nova02:26
*** READ10 has quit IRC02:32
*** openstackgerrit has joined #openstack-nova02:45
openstackgerritTakashi NATSUME proposed openstack/nova master: Remove mox in virt/test_block_device.py  https://review.openstack.org/56615302:45
*** macza has joined #openstack-nova02:55
*** Dinesh_Bhor has joined #openstack-nova02:55
*** macza has quit IRC02:59
*** hshiina has joined #openstack-nova03:02
*** hshiina has joined #openstack-nova03:03
*** hshiina has quit IRC03:03
*** hshiina has joined #openstack-nova03:04
*** hshiina has quit IRC03:05
*** hshiina has joined #openstack-nova03:06
*** hshiina has quit IRC03:06
*** hshiina has joined #openstack-nova03:08
*** hshiina has quit IRC03:08
*** hshiina has joined #openstack-nova03:09
*** hshiina has quit IRC03:09
*** hshiina has joined #openstack-nova03:11
*** slaweq has joined #openstack-nova03:11
*** slaweq has quit IRC03:16
*** Dinesh_Bhor has quit IRC03:36
openstackgerritMerged openstack/python-novaclient master: Add support for microversion 2.67: BDMv2 volume_type  https://review.openstack.org/60974303:37
*** hongbin has quit IRC03:48
*** Bhujay has joined #openstack-nova03:51
*** Bhujay has quit IRC03:52
*** lpetrut has joined #openstack-nova03:52
*** Bhujay has joined #openstack-nova03:52
*** Bhujay has quit IRC04:06
*** slaweq has joined #openstack-nova04:11
*** macza has joined #openstack-nova04:12
*** slaweq has quit IRC04:16
*** Dinesh_Bhor has joined #openstack-nova04:30
*** lpetrut has quit IRC04:34
*** liuyulong has quit IRC04:58
*** tetsuro has quit IRC05:04
*** macza_ has joined #openstack-nova05:08
*** slaweq has joined #openstack-nova05:11
*** macza has quit IRC05:12
*** macza_ has quit IRC05:12
*** macza has joined #openstack-nova05:13
*** slaweq has quit IRC05:15
*** macza has quit IRC05:20
*** macza has joined #openstack-nova05:20
*** macza has quit IRC05:24
*** ratailor has joined #openstack-nova05:30
*** janki has joined #openstack-nova05:47
*** dpawlik has joined #openstack-nova05:55
*** dpawlik has quit IRC06:00
*** Luzi has joined #openstack-nova06:01
*** dpawlik has joined #openstack-nova06:11
*** slaweq has joined #openstack-nova06:11
*** dpawlik has quit IRC06:12
*** dpawlik has joined #openstack-nova06:12
*** Dinesh_Bhor has quit IRC06:16
*** slaweq has quit IRC06:16
*** adrianc has joined #openstack-nova06:28
*** moshele has joined #openstack-nova06:30
*** edisonxiang has joined #openstack-nova06:44
*** tommylikehu has joined #openstack-nova06:54
*** slaweq has joined #openstack-nova06:56
*** mrjk has quit IRC07:00
*** rcernin has quit IRC07:01
*** mrjk has joined #openstack-nova07:02
*** helenafm has joined #openstack-nova07:18
*** jpena|off is now known as jpena07:20
*** lpetrut has joined #openstack-nova07:20
*** lpetrut has quit IRC07:25
*** markvoelker has quit IRC07:29
*** markvoelker has joined #openstack-nova07:29
*** ralonsoh has joined #openstack-nova07:29
*** jangutter has quit IRC07:30
*** jangutter has joined #openstack-nova07:30
*** ttsiouts has joined #openstack-nova07:30
*** Dinesh_Bhor has joined #openstack-nova07:33
*** mvkr has quit IRC07:34
*** markvoelker has quit IRC07:34
bauzasgood morning nova07:37
*** edisonxiang has quit IRC07:39
openstackgerritJan Gutter proposed openstack/os-vif master: Extend host_info to cover port profiles  https://review.openstack.org/61063607:40
*** icey has quit IRC07:52
*** icey has joined #openstack-nova07:52
dpawlikmorning08:01
dpawlikquick question: why does nova service-list show me an ID instead of a UUID?08:01
*** dtantsur|afk is now known as dtantsur08:03
*** moshele has quit IRC08:05
*** gnuoy has quit IRC08:10
*** k_mouza has joined #openstack-nova08:12
openstackgerritMerged openstack/nova stable/rocky: Handle missing marker during online data migration  https://review.openstack.org/60857208:13
*** mvkr has joined #openstack-nova08:14
*** tetsuro has joined #openstack-nova08:21
*** yikun has quit IRC08:22
*** hshiina has quit IRC08:23
*** fghaas has joined #openstack-nova08:27
fghaaskashyap: taking the liberty to follow up on https://review.openstack.org/#/c/609788 which you asked me to take a stab on. If you could give that a read to make sure that I didn't chuck in anything stupid, I'd be grateful. Thanks!08:29
*** markvoelker has joined #openstack-nova08:30
kashyapfghaas: Hey08:30
kashyapfghaas: I did see the email, and even partly reviewed it.  But was buried in preparing a conference talk08:31
kashyap(Also incidentally related to CPU models)08:31
*** priteau has joined #openstack-nova08:31
kashyapfghaas: I'll definitely look at it by EOD, I have it open.  Sorry for the delay08:31
fghaasI'm so surprised that your conf talk would be about *that*, of all things. ;)08:31
kashyapHeh08:31
kashyapWill you be in Edinburgh?08:32
kashyap(Open Source Summit & KVM Forum)08:32
fghaasNo I won't. They waitlisted my talk, but it didn't get in.08:33
fghaasBut I will be in Berlin assuming you're headed there.08:33
dpawliknvm about my question. I didn't see that there are nova and nova_legacy service types08:34
kashyapfghaas: Yeah, will be there08:34
fghaasPerfect. I still owe you dinner or at least a drink for your help with nesting and migration.08:34
*** mvkr has quit IRC08:35
*** gnuoy has joined #openstack-nova08:41
*** takashin has left #openstack-nova08:42
kashyapfghaas: No, don't be crazy08:47
kashyap:-)08:47
kashyapfghaas: Did a quick review, posted a couple of comments.  Looks largely good.08:47
* kashyap crawls back into LaTeX cave08:47
*** mvkr has joined #openstack-nova08:48
kashyap(And yes, a drink in Berlin does sound good.)08:48
fghaasoh, beamer. That's something my little brain is incapable of grasping. :)08:50
kashyapfghaas: I hate myself for the amount of time I spend on it.  But I love its typography08:51
kashyapAnd the TikZ diagramming is so clean, nothing comes close to it.08:51
kashyapBut I'm just not super fast with it.  Maybe in a few years08:51
*** derekh has joined #openstack-nova08:52
*** cdent has joined #openstack-nova08:59
fghaaskashyap: reveal.js plus draw.io is my drug of choice.09:02
fghaasReplied to your comments. Sorry for putting on my difficult tech writer hat. :)09:03
*** markvoelker has quit IRC09:03
kashyapThe day I get fed up w/ TeX+TikZ, I'll consider the alternatives :-)09:04
kashyapAnd nit-picky tech writer hat is good, I appreciate it.09:05
kashyapfghaas: Aah, some of the text was already pre-existing.  I should've realized it.  Will respond in a few09:06
fghaasNo rush at all. If we can get this sorted within the week, I'm happy.09:06
*** panda|off has quit IRC09:15
*** panda has joined #openstack-nova09:18
*** helenafm has quit IRC09:25
openstackgerritJan Gutter proposed openstack/os-vif master: Do not call linux_net.delete_net_dev on Windows  https://review.openstack.org/61091609:30
jangutterIs anyone familiar enough with Hyper-V functionality to check for a better way of doing ^ ?09:31
openstackgerritJan Gutter proposed openstack/os-vif master: Extend host_info to cover port profiles  https://review.openstack.org/61063609:33
*** tommylikehu has quit IRC09:33
*** k_mouza has quit IRC09:36
*** k_mouza has joined #openstack-nova09:37
*** mvkr has quit IRC09:40
*** mvkr has joined #openstack-nova09:41
openstackgerritStephen Finucane proposed openstack/nova master: Modify PciDevice.uuid generation code  https://review.openstack.org/53048709:44
openstackgerritStephen Finucane proposed openstack/nova master: Add an online migration for PciDevice.uuid  https://review.openstack.org/53090509:44
*** ShilpaSD has joined #openstack-nova09:45
ShilpaSDmriedem: Hi09:47
ShilpaSDmriedem: Guide me how to reproduce the cold migrate issue mentioned at https://bugs.launchpad.net/nova/+bug/1784020 in section 2. b)09:48
openstackLaunchpad bug 1784020 in OpenStack Compute (nova) "Shared storage providers are not supported and will break things if used" [Medium,Fix released]09:48
*** imacdonn has quit IRC09:54
*** imacdonn has joined #openstack-nova09:55
*** mrch has joined #openstack-nova09:56
openstackgerritFlorian Haas proposed openstack/nova master: Explain cpu_model_extra_flags and nested guest support  https://review.openstack.org/60978809:56
openstackgerritFlorian Haas proposed openstack/nova stable/queens: Explain cpu_model_extra_flags and nested guest support  https://review.openstack.org/60979009:57
*** k_mouza has quit IRC09:58
openstackgerritFlorian Haas proposed openstack/nova stable/rocky: Explain cpu_model_extra_flags and nested guest support  https://review.openstack.org/60978909:59
*** TuanDA has quit IRC10:00
*** markvoelker has joined #openstack-nova10:00
openstackgerritJan Gutter proposed openstack/os-vif master: Do not call linux_net.delete_net_dev on Windows  https://review.openstack.org/61091610:01
openstackgerritFlorian Haas proposed openstack/nova stable/queens: Explain cpu_model_extra_flags and nested guest support  https://review.openstack.org/60979010:02
openstackgerritJan Gutter proposed openstack/os-vif master: Extend host_info to cover port profiles  https://review.openstack.org/61063610:02
*** fghaas has quit IRC10:02
*** tetsuro has quit IRC10:03
*** erlon has joined #openstack-nova10:07
stephenfingibi: Are you allowed to review these? https://review.openstack.org/#/c/482629/10:07
stephenfinThis one too https://review.openstack.org/#/c/580345/10:08
*** ttsiouts has quit IRC10:08
* stephenfin has had http://burndown.peermore.com/nova-notification/ open for weeks now and would really like to close it :)10:08
*** tetsuro has joined #openstack-nova10:14
*** tetsuro has quit IRC10:16
*** Dinesh_Bhor has quit IRC10:20
*** k_mouza has joined #openstack-nova10:25
*** ttsiouts has joined #openstack-nova10:29
openstackgerritMerged openstack/nova stable/rocky: hyperv: Cleans up live migration Planned VM  https://review.openstack.org/60269810:29
openstackgerritMerged openstack/nova master: doc: update metadata service doc  https://review.openstack.org/60259310:30
*** markvoelker has quit IRC10:34
*** moshele has joined #openstack-nova10:34
gibistephenfin: what do you mean by allowed? :)10:34
* gibi opening the links10:34
stephenfingibi: I wasn't sure if you'd authored any of them but it seems you haven't so all good10:35
gibistephenfin: yeah, I did not write those so I'm going to finish them off this afternoon10:35
* gibi was lazy in the past weeks about notification patches10:35
*** helenafm has joined #openstack-nova10:39
*** tbachman has quit IRC10:40
*** macza has joined #openstack-nova10:43
*** fghaas has joined #openstack-nova10:46
*** adrianc has quit IRC10:46
*** adrianc has joined #openstack-nova10:46
*** macza has quit IRC10:47
*** brinzhang has joined #openstack-nova10:47
openstackgerritGhanshyam Mann proposed openstack/nova master: Remove duplicate legacy-tempest-dsvm-multinode-full job  https://review.openstack.org/61093110:50
jangutterralonsoh: do you know anyone knowledgeable on hyper-v networking?10:51
*** Dinesh_Bhor has joined #openstack-nova10:54
ralonsohjangutter: no, sorry10:55
openstackgerritMerged openstack/nova stable/rocky: Replace usage of get_legacy_facade() with get_engine()  https://review.openstack.org/60857411:00
*** ttsiouts has quit IRC11:02
*** ttsiouts has joined #openstack-nova11:03
*** macza has joined #openstack-nova11:04
*** Dinesh_Bhor has quit IRC11:06
*** ttsiouts has quit IRC11:07
*** dave-mccowan has joined #openstack-nova11:09
*** macza has quit IRC11:09
*** ttsiouts has joined #openstack-nova11:17
*** READ10 has joined #openstack-nova11:25
*** jpena is now known as jpena|lunch11:29
*** k_mouza_ has joined #openstack-nova11:31
*** markvoelker has joined #openstack-nova11:31
*** ttsiouts has quit IRC11:31
*** ttsiouts has joined #openstack-nova11:32
gibistephenfin: I've sent the volume notification patch to the gate but I have concerns about the compute_task one11:32
*** k_mouza has quit IRC11:34
*** ttsiouts has quit IRC11:34
*** ttsiouts has joined #openstack-nova11:35
*** ttsiouts has quit IRC11:37
*** brinzhang has quit IRC11:38
*** ttsiouts has joined #openstack-nova11:38
*** k_mouza_ has quit IRC11:43
*** k_mouza has joined #openstack-nova11:45
*** k_mouza has quit IRC11:46
*** ttsiouts has quit IRC11:46
openstackgerritJan Gutter proposed openstack/os-vif master: Do not call linux_net.delete_net_dev on Windows  https://review.openstack.org/61091611:47
*** k_mouza has joined #openstack-nova11:47
*** ttsiouts has joined #openstack-nova11:47
*** ttsiouts has quit IRC11:47
*** jistr is now known as jistr|afk11:47
*** ttsiouts has joined #openstack-nova11:47
openstackgerritJan Gutter proposed openstack/os-vif master: Extend host_info to cover port profiles  https://review.openstack.org/61063611:50
*** ttsiouts has quit IRC11:50
*** ttsiouts has joined #openstack-nova11:51
*** READ10 has quit IRC11:51
*** ttsiouts has quit IRC11:55
*** fghaas has quit IRC11:57
*** ratailor has quit IRC12:00
*** ttsiouts has joined #openstack-nova12:00
*** markvoelker has quit IRC12:04
*** tbachman has joined #openstack-nova12:04
*** amorin has joined #openstack-nova12:08
amorinhey all12:08
amorinI am facing an issue on my openstack deployment when I try to live migrate an instance12:08
amorinif the original image has been deleted from glance12:09
amorinlive-migration fails12:09
*** tbachman has quit IRC12:09
amorinI am using openstack newton12:09
amorindo you know if this bug is already declared somewhere ?12:09
amorinEventually fixed?12:09
*** tbachman has joined #openstack-nova12:10
*** tbachman has quit IRC12:22
*** k_mouza has quit IRC12:22
*** ttsiouts has quit IRC12:32
*** priteau has quit IRC12:39
*** k_mouza has joined #openstack-nova12:40
*** priteau has joined #openstack-nova12:40
*** k_mouza_ has joined #openstack-nova12:41
*** tbachman has joined #openstack-nova12:42
*** k_mouza has quit IRC12:45
*** macza has joined #openstack-nova12:53
*** macza has quit IRC12:58
bauzasamorin: do you have some stacktrace to share ?13:09
*** efried has joined #openstack-nova13:10
amorinwell, it's not a stacktrace but just an image not found13:10
amorinI will show you13:10
amorinI have that on source host:13:10
amorin[instance: e4c16231-5c3e-49d3-b985-c963bfa52437] Live Migration failure: internal error: info migration reply was missing return status13:10
bauzasamorin: we had a bug like this https://bugs.launchpad.net/nova/+bug/127082513:11
openstackLaunchpad bug 1270825 in OpenStack Compute (nova) "Live block migration fails for instances whose glance images have been deleted" [High,Fix released] - Assigned to melanie witt (melwitt)13:11
*** tbachman has quit IRC13:11
amorinsounds like my issue!13:11
bauzaswe also had https://bugs.launchpad.net/nova/+bug/154677813:12
openstackLaunchpad bug 1546778 in OpenStack Compute (nova) liberty "libvirt: resize with deleted backing image fails" [Medium,Fix released] - Assigned to Matthew Booth (mbooth-9)13:12
bauzasbut that's for resize13:12
amorinfirst bug look good, but I am running openstack newton...13:12
amorinseems that the bug is supposed to be fixed since juno13:13
*** mchlumsky has joined #openstack-nova13:13
bauzasyup, very old bug, hence the needed stacktrace13:13
*** dpawlik has quit IRC13:13
bauzaswe need to understand more why it fails13:13
*** tbachman has joined #openstack-nova13:15
amorinI'll try to restart nova compute in debug and find a trace13:17
amorinon both source and dest host13:17
amorinI will come back to you asap13:17
*** dpawlik has joined #openstack-nova13:18
*** jistr|afk is now known as jistr13:20
bauzasok, please highlight me when you reply so I can see it13:20
bauzasamorin: like this :)13:20
amorinyup13:20
openstackgerritAdam Spiers proposed openstack/nova-specs master: Add spec for libvirt driver launching AMD SEV-encrypted instances  https://review.openstack.org/60977913:22
*** mriedem has joined #openstack-nova13:23
*** k_mouza has joined #openstack-nova13:24
*** tbachman has quit IRC13:27
*** k_mouza_ has quit IRC13:27
*** dpawlik has quit IRC13:30
*** tbachman has joined #openstack-nova13:34
*** awaugama has joined #openstack-nova13:37
*** ociuhandu has joined #openstack-nova13:37
*** jpena|lunch is now known as jpena13:37
*** ociuhandu has quit IRC13:38
*** dpawlik has joined #openstack-nova13:40
openstackgerritMatt Riedemann proposed openstack/nova stable/queens: Handle missing marker during online data migration  https://review.openstack.org/61097413:42
mnaserasking here because i think nova would probably be a project that does this but13:45
mnaseris it possible to have multiple rootwrap daemons?13:45
mnaseri'm running into problems with neutron vpnagent doing a lot of commands that take a long time, which blocks everything else from running (in the rootwrap daemon)13:45
*** eharney has joined #openstack-nova13:45
dansmithmultiple in a load-balance situation? I kinda doubt it13:46
mnasereven as in dedicated situation13:46
dansmithI think that's one of the many performance limitations of rootwrap13:46
mnaserit also made me wonder if privsep has the same issues13:46
mnaserwhich i think it might not?13:46
dansmithI think privsep is the same13:46
mnaserbecause i frequently see privsep spawn processes for a specific module or so13:47
dansmithISTR the cinder people being concerned about that13:47
mnaseri'd much rather have vpnaas (in this case) fight for resources with itself rather than with things that are important and need to happen in a few seconds13:47
mnaseri guess the advantage this would give would be the ability to replace a shell-out with code/a library13:49
smcginnisThere is a performance issue with privsep right now that it serializes anything it runs.13:52
smcginnisSo only one "privileged" thing can happen at a time.13:53
smcginnisBut there's a patch up to fix that.13:53
mnasersmcginnis: thats the case with rootwrap daemon too, no?13:53
mnaserat least, that's what the behavior im seeing anyways13:53
smcginnismnaser: I didn't think so. That just calls out to run commands, so I thought it didn't have the same issue.13:53
mnaserwell, rootwrap yes, it just calls out to run commands, but rootwrap daemon seems to do the whole serialize thing13:54
mnaser(i think)13:54
smcginnisHere's the privsep patch if anyone is interested - https://review.openstack.org/#/c/593556/13:54
smcginnisIt must not be quite as bad. There was push back on moving fully to privsep because there was a noticeable performance impact in doing so.13:55
openstackgerritStephen Finucane proposed openstack/nova master: Transform scheduler.select_destinations notification  https://review.openstack.org/50850613:55
dansmithsmcginnis: to be clear, he's talking about rootwrap *daemon*13:57
*** janki has quit IRC13:58
smcginnisYeah14:00
*** Luzi has quit IRC14:00
mnaseri wonder if i can spawn an independent/second rootwrap daemon14:01
*** READ10 has joined #openstack-nova14:04
*** mlavalle has joined #openstack-nova14:07
*** jcosmao has joined #openstack-nova14:08
*** mrch has quit IRC14:10
*** dpawlik has quit IRC14:11
bauzasmnaser: I guess the problem is how nova.rootwrap would know which daemon to pick14:19
*** munimeha1 has joined #openstack-nova14:19
bauzasmnaser: oh wait, you can spawn multiple daemons, one per service, no?14:21
mnaserbauzas: i think so, thats what im attempting14:21
mnaserjust launch another client..14:22
openstackgerritMerged openstack/nova master: Use tempest-pg-full  https://review.openstack.org/60995414:22
bauzasanyway, taxi time14:22
*** k_mouza has quit IRC14:24
*** moshele has quit IRC14:29
*** eharney has quit IRC14:33
*** k_mouza has joined #openstack-nova14:40
*** dpawlik has joined #openstack-nova14:46
*** eharney has joined #openstack-nova14:48
*** dpawlik has quit IRC14:50
bauzasmnaser: interesting to read https://specs.openstack.org/openstack/oslo-specs/specs/juno/rootwrap-daemon-mode.html#client-api14:57
*** ccamacho has quit IRC14:57
*** ccamacho has joined #openstack-nova15:00
*** READ10 has quit IRC15:01
bauzasmnaser: more interesting https://github.com/openstack/nova/blob/master/nova/utils.py#L12615:05
bauzasmnaser: we allow one client per rootwrap config15:05
bauzasif two configs, two clients15:06
bauzasand since clients lazily load daemons if needed...15:06
bauzasand I guess https://github.com/openstack/nova/blob/master/nova/utils.py#L123 is your PITA15:07
mnaserbauzas: I’m investigating on how to pull this out and seeing what breaks terribly with multiple clients15:09
bauzasmnaser: could you test something? what if you have two distinct services running different config files, with each of them differing in the rootwrap_config option value15:13
bauzasmnaser: in this case, I guess we would automatically create two clients and two daemons15:14
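A minimal sketch of the per-config caching bauzas is pointing at in nova/utils.py (paraphrased from memory, not the verbatim nova code; the helper name is hypothetical): oslo.rootwrap starts one daemon per client, and the cache is keyed on the rootwrap config path, so two services configured with distinct rootwrap_config values end up with two independent daemons.

```python
from oslo_rootwrap import client

_CLIENTS = {}  # one client (and therefore one daemon) per config path


def get_rootwrap_daemon_client(rootwrap_config):
    """Return a cached oslo.rootwrap daemon client for this config path.

    Hypothetical helper: the point is only that the cache key is the
    config path, so pointing two services at different rootwrap config
    files yields two separate daemons rather than one shared, serialized
    one.
    """
    if rootwrap_config not in _CLIENTS:
        _CLIENTS[rootwrap_config] = client.Client(
            ["sudo", "nova-rootwrap-daemon", rootwrap_config])
    return _CLIENTS[rootwrap_config]
```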
openstackgerritSundar Nadathur proposed openstack/nova-specs master: Nova Cyborg interaction specification.  https://review.openstack.org/60395515:27
kashyapmriedem: Hi, I saw a ping fly by last night on 'hpet' and libvirt.  I wonder if it's resolved15:29
* kashyap has been heads-down preparing for a conf; so been a bit less active here15:30
mriedemhttps://review.openstack.org/#/c/607989/15:34
mriedemlots of chatter yesterday about whether or not libvirt exposed hpet capability as a clock source for the host caps15:35
mriedemit doesn't look like it does15:35
openstackgerritJan Gutter proposed openstack/os-vif master: Do not call linux_net.delete_net_dev on Windows  https://review.openstack.org/61091615:36
openstackgerritJan Gutter proposed openstack/os-vif master: Fix random test_unplug_ovs failures  https://review.openstack.org/61101715:36
openstackgerritIvaylo Mitev proposed openstack/nova master: VMware: OVA and StrOpt images as VM templates  https://review.openstack.org/60973615:36
*** macza has joined #openstack-nova15:39
*** hamzy has quit IRC15:42
* kashyap goes to read the review15:44
*** macza has quit IRC15:44
*** helenafm has quit IRC15:46
mriedemgerritbot must be dead15:50
mriedemimacdonn: +W on https://review.openstack.org/#/c/608091/15:50
imacdonnmriedem: OK, thanks .... I need to get a new rev in to fix a typo in the release note15:51
mriedemi fixed it15:51
imacdonnmriedem: ok, cool, thanks .. just got email from review too15:51
*** macza has joined #openstack-nova15:57
*** macza has quit IRC15:58
*** macza has joined #openstack-nova16:00
kashyapmriedem: So, from a quick chat w/ the QEMU & libvirt folks --16:00
kashyapThey say: "I would not do that" (configuring 'hpet')16:01
kashyapAs it's super expensive compared to other timer sources16:01
kashyapAnd even worse for virtual machines16:01
kashyap"HPET access involves a context switch to QEMU userspace, where as TSC is handled by KVM natively"16:01
kashyapmriedem: But to your original question -- no I don't see it in libvirt's host capabilities either.16:02
kashyapProbably because "no one has asked for it before"16:04
* kashyap goes to comment on the review, about the value of enabling HPET16:04
kashyapmriedem: Responded on the review with details.16:12
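For reference, a hedged way to check the point mriedem and kashyap settle on, namely that libvirt's host capabilities document does not advertise clock/timer sources. libvirt-python's openReadOnly() and getCapabilities() are real calls; the empty result of the search below is exactly what the discussion hinges on, so treat this as an illustration rather than a supported capability probe.

```python
import xml.etree.ElementTree as ET

import libvirt

# Connect read-only and dump the host capabilities document.
conn = libvirt.openReadOnly('qemu:///system')
caps = ET.fromstring(conn.getCapabilities())

# Host capabilities describe the host CPU, NUMA topology and supported
# guest archs; they do not enumerate timer sources such as hpet, so
# this search is expected to come back empty.
print(caps.findall('.//timer'))
conn.close()
```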
*** macza has quit IRC16:13
*** macza has joined #openstack-nova16:13
efriedhah, so after all of that, we may wind up not doing this thing at all?16:14
cdentrad16:15
mriedemwell it was explicitly disabled in libvirt guests originally for a reason16:15
mriedemb/c apparently at least for windows images it can skew the clock in the guest16:16
mriedemkashyap: thanks for investigating16:16
mriedemartom: +2 on your live migration cleanup thing https://review.openstack.org/#/c/609517/16:16
artommriedem, thanks for the thorough reviewing :)16:16
artomdansmith, feel like hitting up ^^ ?16:17
openstackgerritRodolfo Alonso Hernandez proposed openstack/os-vif master: Add native implementation OVSDB API  https://review.openstack.org/48222616:19
kashyapmriedem: Yeah, also RHEL disables it completely even now16:19
*** tssurya has joined #openstack-nova16:31
dansmithartom: man that's a lot of derping16:35
artomdansmith, herp16:35
*** moshele has joined #openstack-nova16:35
kashyapHey folks, a random question -- has anyone come across upstream bugs asking for CPU hotplug in Nova?16:36
* kashyap goes to do a look up16:36
kashyapOkay, I see a few blueprints, old and new16:37
mnaserbefore i start diving16:41
mnaserreally old environment: juno-era, upgraded all the way up to rocky (no ffus) .. i'm seeing exceptions in placement once i hit rocky (around _create_incomplete_consumers_for_provider)16:42
mnaserwit DBDuplicateEntry exceptions for unique consumer uuid16:42
mnaseri'm guessing that it's trying to create incomplete consumers but they're there, or something.16:43
openstackgerritArtom Lifshitz proposed openstack/nova stable/rocky: Handle volume API failure in _post_live_migration  https://review.openstack.org/61108316:43
mriedemmnaser: traceback in a paste?16:44
*** adrianc has quit IRC16:44
tssuryadansmith: had a question about the cell templating stuff,16:44
openstackgerritArtom Lifshitz proposed openstack/nova stable/queens: Handle volume API failure in _post_live_migration  https://review.openstack.org/61108416:44
dansmithtssurya: yah?16:44
mnasermriedem: http://paste.openstack.org/show/732260/16:45
tssuryashouldn't we consider the cell0's transport_url here : https://github.com/openstack/nova/blob/a53e46a75936b55c93face840764a67f2186cb11/nova/objects/cell_mapping.py#L144 ?16:45
mnaserOH also fun little thing i found out about today16:45
openstackgerritStephen Finucane proposed openstack/nova master: Fail to live migration if instance has a NUMA topology  https://review.openstack.org/61108816:45
stephenfinartom: Thoughts? https://review.openstack.org/61108816:45
mnaserQ=>R upgrades require you to run api_db sync first, then db sync after (i don't think this is documented)16:45
tssuryaright now running db sync without local_cell parameter gives out errors16:45
*** jpena is now known as jpena|off16:45
mnaserbecause cell disabled field is missing from api database so the db sync fails16:45
stephenfinartom: I'd personally like to backport that as far as we can go. I'm kind of sick of explaining how broken this is to people16:45
mnaserapi_db sync first adds that field, which then lets db sync do it after16:46
mriedemmnaser: i think it's ordered that way in the upgrade docs16:46
melwitt16:46
mnaserreally, let me double check16:46
dansmithtssurya: not sure what you mean.. pastebin an error?16:46
mriedemhttps://docs.openstack.org/nova/latest/user/upgrade.html#rolling-upgrade-process16:46
tssuryamnaser, mriedem: yea someone ran into the same issue and we changed the order16:46
mriedem"Using the newly installed nova code, run the DB sync. (nova-manage api_db sync; nova-manage db sync). These schema change operations should have minimal or no effect on performance, and should not cause any operations to fail."16:46
tssuryadansmith: ok16:46
artomstephenfin, I don't know the full history, but I feel like it's opening a can of worms16:46
mnasermriedem: serves me right for looking at the queens docs thinking it hasn't changed because it "looks" the same16:47
artomstephenfin, also, I could imagine a scenario where an operator really pinky swears the destination host is fine, and wants to live migrate regardless16:47
mnaseryou're right, the order was swapped in rocky, my bad16:47
artomstephenfin, so I'm not sure I'm comfortable with such a heavy handed approach16:47
mnaserbut anyways, back to that gigantic traceback16:47
mriedemmnaser: i think grenade was doing it the right way before that docs change,16:47
mriedemour docs were just old16:47
tssuryadansmith: https://pastebin.com/7cQKv0fz16:47
artomstephenfin, totally get where you're coming from though :)16:47
stephenfinartom: Fair point. Wanna stick your thoughts in that review?16:47
* stephenfin has done that and is now off home16:47
dansmithtssurya: oh because cell0's transport_url column can be NULL ?16:48
artomstephenfin, yep, will do16:48
*** panda is now known as panda|off16:48
stephenfinta16:48
tssuryayea16:48
dansmithtssurya: add "and val" here: https://github.com/openstack/nova/blob/a53e46a75936b55c93face840764a67f2186cb11/nova/objects/cell_mapping.py#L16216:48
dansmithtssurya: you gonna cook up a patch or do you want me to?16:48
dansmithtssurya: I wonder why/how we're not hitting that in the gate?16:49
dansmithdo we set it to something bogus?16:49
tssuryadansmith: would be nice if you do it..16:49
dansmithtssurya: sure16:49
tssuryawe were just doing the upgrade checks for rocky and saw this16:49
mriedemmnaser: so this is being triggered when listing resource providers, what is doing that?16:49
mnasermriedem: i'm assuming nova-scheduler?16:50
mnaserhttps://github.com/openstack/nova/blob/377921103121bc62a3f7fce60c63e30815406851/nova/api/openstack/placement/objects/resource_provider.py#L1923-L196516:50
mnaserthis is interesting16:50
dansmithtssurya: hmm, actually, it's not nullable on the object, so I'm not sure how it'd be doing the right thing if you have NULL in the db16:50
tssuryadansmith: but we can have NULL on the conf file16:51
tssuryatechnically there is no requirement to produce a transport_url for db sync right ?16:51
tssuryas/to produce/to supply16:52
dansmithtssurya: ohh, I see, I thought you were saying it was NULL in the database16:52
*** k_mouza_ has joined #openstack-nova16:52
mnaser[placement] incomplete_consumer_project_id and incomplete_consumer_user_id are a thing, i guess16:52
tssuryano not in the database. but maybe we hit it here ? https://github.com/openstack/nova/blob/a53e46a75936b55c93face840764a67f2186cb11/nova/objects/cell_mapping.py#L15016:52
mriedemmnaser: oh i guess adding/removing host aggregates in rocky would do it b/c we have to find the provider by name which we use GET /resource_providers?name=foo for that16:52
mriedemand when building the "provider tree"16:52
mriedemwhen reporting inventory16:53
mriedemfrom the compute16:53
mnasermriedem: the 500 is coming on /resource_providers/foo/allocations"16:53
dansmithtssurya: so there is already a check for None-ness, but I guess if the value in the db isn't a template we'll fail to exit16:53
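Roughly the guard dansmith is describing for the cell URL templating helper, as a sketch of the idea rather than the actual nova/objects/cell_mapping.py code (the pattern and function names below are made up for illustration): bail out early when the stored value is empty, which is legal for cell0's transport_url, instead of trying to match or format None.

```python
import re

# Hypothetical names; only the "and val" early-return idea is taken
# from the discussion above.
_TEMPLATE_RE = re.compile(r'\{\w+\}')


def formatted(val, subs):
    if not val:
        # cell0 may legitimately have no transport_url configured, so
        # there is nothing to template -- hand it back untouched rather
        # than exploding the way the pastebin shows.
        return val
    if not _TEMPLATE_RE.search(val):
        # A literal (non-templated) URL from the database: leave as-is.
        return val
    return val.format(**subs)
```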
mnaserand what it seems like almost any time its requesting allocations16:54
mriedemmnaser: yeah that happens in the resource tracker on the compute16:54
tssuryadansmith: ah yea was just wondering why that check didn't catch the Noneness16:54
mriedem_remove_deleted_instances_allocations method16:54
mnaseryep, i see compute ips sending in that request16:54
mriedemso on startup of the compute, it's going to list allocations for the given compute ndoe provider16:54
mriedemand try to create consumers table records for that provider and any allocations against it16:55
mriedemi'm not sure how we race to hit _create_incomplete_consumers_for_provider though16:55
mriedembecause that should be idempotent16:55
mriedemand jaypipes isn't around16:55
*** k_mouza has quit IRC16:55
dansmithtssurya: I'm shocked we haven't seen this in the regular tests16:55
tssuryadansmith: yea we should have seen this somewhere16:56
*** k_mouza_ has quit IRC16:56
tssuryanot sure if people didn't hit this when moving to rocky ?16:56
dansmithwell, I'm not sure why devstack doesn't hit it16:57
dansmithI guess because we always have those defined in config16:57
tssuryaprobably yea, but its easily reproducible16:57
dansmithyep16:58
openstackgerritArtom Lifshitz proposed openstack/nova stable/pike: Handle volume API failure in _post_live_migration  https://review.openstack.org/61109316:58
tssuryawill you also backport this please ? we might need this in queens16:58
mriedemour base test case uses the rpc fixture https://github.com/openstack/nova/blob/377921103121bc62a3f7fce60c63e30815406851/nova/test.py#L23816:58
mriedemso that's probably why we'd never hit it?16:58
mriedemtssurya: the template stuff wasn't in queens16:58
mriedemhttps://github.com/openstack/nova/blob/396156eb13521a0e7af4488a8cd4693aa65a0da2/nova/tests/fixtures.py#L72816:59
mriedemall of our tests at least configure this: transport_url = 'fake:/'16:59
tssuryamriedem: oh yea sorry rocky then16:59
mriedemtssurya: have you reported a bug?16:59
dansmithmriedem: I got one already17:00
dansmithcoming17:00
tssuryamriedem: no17:00
openstackgerritDan Smith proposed openstack/nova master: Fix formatting non-templated cell URLs with no config  https://review.openstack.org/61109417:00
tssuryadansmith: ah thanks :)17:00
dansmithmriedem: tssurya ^17:00
mriedemmnaser: well i'm not sure how it's happening, but clearly we could race if two requests are listing resource providers at the same time,17:00
mriedemmnaser: so likely just bug report it and we can add a try/except for the duplicate entry error17:00
dansmithmriedem: I meant devstack-based tests, but it's because we always have a transport_url set I think..17:00
mnasermriedem: but i mean this is constantly happening and i think i can't provision any more vms on the cloud, with a hostnotfound type of thing17:00
dansmithand yeah, this'll be a rocky backport17:01
*** moshele has quit IRC17:01
mriedemmnaser: so i wonder if https://github.com/openstack/nova/blob/377921103121bc62a3f7fce60c63e30815406851/nova/api/openstack/placement/objects/resource_provider.py#L1940 is always True?17:01
mnasermriedem: thats what im trying to decipher17:02
openstackgerritDan Smith proposed openstack/nova master: Fix formatting non-templated cell URLs with no config  https://review.openstack.org/61109417:02
*** dtantsur is now known as dtantsur|afk17:03
mnasermriedem: i think this is a cloud with some brokenness that is being exposed17:04
mnaserright now, there are records in allocations for that specific uuid under resource_provider_id=417:04
mnaserbut in that traceback, it tries to add allocations for resource_provider_id=517:04
mnasersorry, no, i lied17:05
mnaserit actually tries to add for 417:05
mnaserlet me see if there is anything in consumers17:05
*** hamzy has joined #openstack-nova17:05
tssuryadansmith: thanks a lot17:06
mnaseri think this sql server is f'd17:07
mnaserselect * from consumers where uuid='fbee657b-6a60-4525-b7c8-b070643404ec'; returns nothing17:07
mriedemthere aren't consumer records until rocky17:07
mriedemand the data migration function that is blowing up is trying to populate that table from existing allocations records17:08
dansmithtssurya: np17:08
mnasermriedem: right, but the traceback seems to say: Duplicate entry 'fbee657b-6a60-4525-b7c8-b070643404ec' for key 'uniq_consumers0uuid'17:08
mnaseryet -- select * from consumers where uuid='fbee657b-6a60-4525-b7c8-b070643404ec'; -- returns nothing17:09
mriedemhmm it's doing an insert from select,17:09
mriedemso the select results probably have duplicates17:09
mriedemand those aren't being trimmed17:09
mriedemand i bet the test for this only had 1 allocation against 1 provider17:09
mriedemor something like that17:10
mnaserlets test that out17:10
*** munimeha1 has quit IRC17:10
mriedemwould be nice to see what the select query results are17:10
mriedemthe sql-fu in here is hard for me to grok17:10
mnasermriedem: you're right17:11
mnaser9 rows returned from that17:11
mriedemwhat's the select query?17:11
mnasermriedem: http://paste.openstack.org/show/732263/17:11
mnaserstole this from the traceback17:12
mnaserit was right next to the error17:12
*** whoami-rajat has quit IRC17:13
*** mvkr has quit IRC17:13
mnasermriedem: yup.. i see 9 records but really 3 unique ones17:14
mriedemok, so i bet 3 instances with allocations against a single provider, and each instance has 3 resource class allocations (VCPU, MEMORY_MB and DISK_GB)17:15
mriedemand we're not collapsing those 3 allocations for the same consumer into a single consumer entry17:15
mriedemlet me see if i can dig up what is supposed to be testing this17:16
mnasermriedem: thats exactly the case17:17
mriedem\o/17:17
mnaseri can confirm same resource provider each, with 3 resource classes17:17
mnaserhow come the others didnt break when migrating17:17
mnaseri mean this isn't exactly an outlier17:17
mriedemdon't know17:17
mriedemhttps://review.openstack.org/#/c/565405/26/nova/tests/functional/api/openstack/placement/db/test_consumer.py is only testing with 3 unique allocations each with a single resource class17:18
mriedemso that's why i guess tests didn't catch it17:18
*** k_mouza has joined #openstack-nova17:18
mnaserpoop17:19
mnaserwell17:19
mnaseri guess i gotta find a fix17:19
* mnaser doesnt wanna db hack it17:19
mriedemi'm having a hard f'ing time understanding these test17:19
mriedem*tests17:19
mnaseryeah :\17:19
mnaserand the whole logic too17:19
mriedemwell for the select query, i'd think we need to group the allocations records results by consumer_id17:20
mriedemreally need a recreate in a test to see how to fix this17:21
mnasermriedem: or maybe just even a select distinct?17:21
mnaserbut yes, i agree17:21
mriedemyeah true,17:22
mriedemagain, wish jay was here17:22
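What "group the allocations records by consumer_id" would look like for the back-fill query, as a hedged SQLAlchemy 1.x-style sketch (the table objects are assumed; this is not the actual placement code, which the fix linked later adjusts): each consumer uuid should appear only once in the INSERT ... FROM SELECT, even when the instance holds VCPU, MEMORY_MB and DISK_GB allocations against the same provider.

```python
import sqlalchemy as sa


def missing_consumer_ids_select(allocations, consumers):
    """Select each allocation consumer uuid that has no consumers row.

    allocations/consumers are assumed to be SQLAlchemy Table objects.
    Without the group_by (or an equivalent DISTINCT), a consumer with
    three resource-class allocations shows up three times and the
    insert trips the uniq_consumers0uuid constraint.
    """
    alloc_to_consumer = sa.outerjoin(
        allocations, consumers,
        allocations.c.consumer_id == consumers.c.uuid)
    return sa.select([allocations.c.consumer_id]).select_from(
        alloc_to_consumer).where(
        consumers.c.id.is_(None)).group_by(allocations.c.consumer_id)
```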
mriedemalso, this is a placement bug so maybe i can just kick you over to that channel and let those guys fix it :P17:22
mnaserlolll17:23
mnaseri mean you're not wrong17:23
*** k_mouza has quit IRC17:23
mnaseri'll take this to #openstack-placement17:23
*** panda|off has quit IRC17:24
mnasermriedem: on a nova note im thinking maybe live migrate those machines and cheat17:24
mnaserlol17:24
*** panda has joined #openstack-nova17:27
*** ralonsoh has quit IRC17:28
openstackgerritmelanie witt proposed openstack/nova master: Bump os-brick version to 2.6.1  https://review.openstack.org/61110917:35
openstackgerritMatt Riedemann proposed openstack/nova master: Add recreate test for bug 1798163  https://review.openstack.org/61111317:49
openstackbug 1798163 in OpenStack Compute (nova) "Placement incomplete consumers online migration fails" [Undecided,New] https://launchpad.net/bugs/179816317:49
mriedemmnaser: ^ fugly but works17:49
mnasermriedem: i have a fix17:50
mnaserdo you want me to squash into yours or rather two patch?17:50
openstackgerritMohammed Naser proposed openstack/nova master: Use unique consumer_id when doing online data migration  https://review.openstack.org/61111517:55
mnaseroops missed an uncomment17:57
mnasertesting functional tests again locally17:57
melwittmriedem: should we wait for the tempest tests to merge before marking https://blueprints.launchpad.net/nova/+spec/boot-instance-specific-storage-backend as complete?17:59
mnaserbleh18:00
mnaseranother thing broke18:00
mnaserok yay18:05
mnaseri was missing something18:05
openstackgerritMohammed Naser proposed openstack/nova master: Use unique consumer_id when doing online data migration  https://review.openstack.org/61111518:06
mriedemmnaser: i think i might clean mine up to be a simple new test, and then you can stack on top18:09
mriedemi was rushing b/c my pizza was getting cold18:10
mnasermriedem: feel free to check out mine locally if you want18:10
mnasermriedem: very valid reason tbh18:10
mriedemmelwitt: no, i was going to mark it today but forgot18:10
mnaseri was rushing because i have a broken cloud :-)18:10
melwittok, I can mark it18:10
mnaserbut i'm comfortable pushing that one liner now18:10
mriedemi'm glad i deleted my vexxhost vm the other day then18:10
mriedem:)18:10
mnaserits not our public cloud18:11
mnaseri wouldnt trust those nova people and the code they ship18:11
mnaser:p18:11
mriedemgood plan18:11
tssuryamriedem: question about https://review.openstack.org/#/c/571535/ . How are we able to set the compute_node.uuid which is a read-only attribute ? Am I missing something here ?18:13
mriedemplease hold dear caller18:14
tssuryaack :)18:14
*** mvkr has joined #openstack-nova18:14
mriedemwe create compute node records with uuids in tests all the time18:14
tssuryatrue.. I am somehow hitting https://pastebin.com/kqyu66wB18:15
tssuryawill look closer18:16
openstackgerritMatt Riedemann proposed openstack/nova master: Add recreate test for bug 1798163  https://review.openstack.org/61111318:16
openstackbug 1798163 in OpenStack Compute (nova) "Placement incomplete consumers online migration fails" [Undecided,In progress] https://launchpad.net/bugs/1798163 - Assigned to Mohammed Naser (mnaser)18:16
mnasermriedem: woo, fix worked here18:17
mnaserconfirmed with the LOG.info message showed up as expected18:17
mnaserwith 3 consumer records for rp #4 as the one that was broken which we were looking at18:17
mriedemrebasing18:18
mnaserlol18:19
mnaseri'm getting a KeyError now18:19
*** gyee has joined #openstack-nova18:19
mnaser(in placement)18:19
mnaserhttp://paste.openstack.org/show/732269/ this time18:19
mnaserhttps://github.com/openstack/nova/blob/master/nova/api/openstack/placement/objects/resource_provider.py#L352218:20
openstackgerritMatt Riedemann proposed openstack/nova master: Use unique consumer_id when doing online data migration  https://review.openstack.org/61111518:20
mriedemtssurya: hmm, maybe that's hitting after an upgrade?18:21
mriedembut,18:21
mriedemif that were the case, ironic's grenade job should be broken18:21
mriedemis the value changing?18:22
tssuryaso at the moment the only info I have is that we cherry-picked this to queens18:22
tssuryamaybe something is wrong because of that18:23
*** hamzy has quit IRC18:23
mriedemtssurya: ok it's called from here in the RT https://review.openstack.org/#/c/571535/2/nova/compute/resource_tracker.py@61718:24
mriedemmy guess is the compute node record already existed with a random uuid, and on restart of the compute service with the new code, it's trying to update the uuid in the compute node record using the ironic node uuid18:24
mriedemi don't know why ironic's grenade job wouldn't fail for the same reason, but it seems like an obvious oversight in that patch18:24
tssuryamriedem: yea18:24
tssuryathat's exactly what's happening18:25
tssuryaso this is for the existing nodes..18:25
mriedem\o/18:25
mriedemmnaser: as for https://github.com/openstack/nova/blob/stable/rocky/nova/api/openstack/placement/objects/resource_provider.py#L348618:25
mriedemand the KeyError18:25
mriedemi don't understand any of that code18:26
mriedemthat's all efried, tetsuro and jaypipes18:26
mriedemhttps://review.openstack.org/#/c/559480/18:27
* mriedem celebrates all the things being on fire at once18:28
tssuryaheh18:28
mriedemtssurya: report a bug for the ironic node uuid thing i guess18:28
mriedemjroll: do you know anything about the ironic grenade job?18:28
tssuryamriedem: yea I will18:28
mriedemdoes it restart n-cpu across releases?18:28
mnasermriedem: i think this is a weird env-related corner case where those havent done much reporting to placement18:29
mnasergoing to delete those stale rps18:30
mnasermriedem: every time i use osc-placement i have to say thank you18:32
mnaserits such a life saver18:32
mriedemdon't thank me, thank avolkov18:34
*** bjolo has joined #openstack-nova18:34
mriedemand rpodolykia18:34
mriedemi know i butchered that irc18:34
mriedemtssurya: yeah so you can set a readonly field while it's never been set18:35
mriedemonce it's set though, it's stuck18:35
tssuryamriedem: ah right, thanks18:35
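The write-once behaviour mriedem describes, as a small illustrative sketch in plain Python (not the oslo.versionedobjects implementation, just the semantics): assigning a value while the field has never been set works, and re-assigning it afterwards fails.

```python
class WriteOnce:
    """Descriptor that allows an attribute to be set exactly once."""

    def __init__(self, name):
        self.name = '_' + name

    def __get__(self, obj, owner):
        return getattr(obj, self.name, None)

    def __set__(self, obj, value):
        if getattr(obj, self.name, None) is not None:
            raise ValueError('%s is read-only once set' % self.name)
        setattr(obj, self.name, value)


class ComputeNode:
    uuid = WriteOnce('uuid')


node = ComputeNode()
node.uuid = 'abc'    # fine: it has never been set
# node.uuid = 'def'  # would raise: the value is already set
```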
tssuryaI am filing a bug now18:35
mriedemcdent: efried: https://review.openstack.org/#/q/topic:bug/1798163+(status:open+OR+status:merged)18:35
mriedemdansmith: melwitt: ^18:36
mriedemstarting on the placement repo cherry picks18:39
melwitt👀18:39
tssuryamriedem: https://bugs.launchpad.net/nova/+bug/179817218:39
openstackLaunchpad bug 1798172 in OpenStack Compute (nova) "Ironic driver tries to update the compute_node's UUID which of course fails in case of existing compute_nodes" [Undecided,New]18:39
mriedemi'll assume that lego block is a pile-o-poo18:39
mriedem"of course"18:40
mriedemnice18:40
mriedem:(18:40
melwittit's eyes, looking at the linked patches18:40
melwittbut pile-o-poo could have been good18:40
dansmithmriedem: so this has to be broken for anyone right?18:41
mriedemdansmith: which thing?18:41
mriedemplacement? yes.18:42
dansmithyeah18:42
mriedemi guess mnaser is first to rocky ever18:42
mnaserworld first18:42
mnaserbut now18:42
mnaserwe have MORE18:42
mnaserException during message handling: RPCVersionCapError: Requested message version, 5.0 is incompatible.  It needs to be equal in major version and less than or equal in minor version as the specified version cap 4.17.18:42
mriedemhonestly i'm not sure how it wouldn't have been a problem caught in grenade18:42
dansmithlooks straightforward18:42
mnasernova-conductor when scheduling new instances18:42
mriedemb/c in grenade we have existing instances with allocations18:42
melwittmnaser: need to get a windshield sticker for your car WORLD FIRST TO ROCKY18:42
mnasermore like a military medal for the stuff i have to go through y'all18:43
mnaserlol18:43
melwitt:***(18:43
mnaserall my packages are up to date, control plane is all rocky with queen computes18:43
melwitt(those are tears)18:43
mnaserits ok, i do it for the people18:43
mnaseropenstack upgrades are easy, i promise, i already helped fixed most of it18:43
mnaserwhere are the versions listed again? in the rpc api file?18:45
mriedemdansmith: so on upgrade to rocky we would have listed allocations for a provider here https://github.com/openstack/nova/blob/237bfcfd82fc28a955574b588fbce1d2392c9e45/nova/compute/resource_tracker.py#L1298 which should have hit the unique constraint18:46
mriedemso idk18:46
dansmithmriedem: yeah18:46
mriedemi mean in a grenade run18:46
dansmithmnaser: that rpc error looks like you have something old18:46
dansmithlike an old conductor still running maybe?18:46
mriedemmaybe we didn't hit it in grenade because the queens allocations already have project_id/user_id created?18:46
mriedems/created/set/18:46
mriedemor.... because we ran the online data migrations?18:47
mnaserdansmith: oh shit, i missed a compute.18:47
mnaserin pike=>queens18:47
melwittmriedem: well, normally allocations are required to be created with project/user and that's what creates a consumer. this bug is about the online data migration for allocations with missing consumers, prior to the microversion where we required project/user, right?18:47
dansmithmnaser: mm, I dunno, if that's coming from a conductor node I think it's probably an old conductor, but I'd have to see more about where exactly18:47
melwittso grenade probably doesn't cover that, I wouldn't think18:48
dansmithmnaser: or are you saying the auto stuff is calculating a 4.x when everything has moved past 5.0 because of a very old compute?18:48
dansmithmelwitt: but you don't create consumers directly and before some version, no consumers were created for you18:48
mnaserdansmith: there is a compute that is active running at version 22 (service table), rest of services are 30 (for queens computes) and 35 (control plaen)18:49
dansmithmelwitt: so grenade should hit this at some point, unless we run online migrations and fix them up before we run or whatever18:49
mnaserso i guess the auto stuff is calculating based on the oldest compute (which is active)18:49
dansmithmnaser: ah okay yeah18:49
mnaserso18:49
mnaserworking as intended18:49
mnaseri think the upgrade check would have probably warned me if i used it oops18:49
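A hedged way to spot the straggler that pins the cap (the nova.objects calls below do exist in this era, but treat the snippet as illustrative rather than a supported admin tool, and it assumes nova's config is already initialized): the compute RPC API auto-pins itself to the minimum nova-compute service version it can find.

```python
from nova import context
from nova import objects

objects.register_all()
ctxt = context.get_admin_context()

minimum = objects.Service.get_minimum_version(ctxt, 'nova-compute')
for svc in objects.ServiceList.get_by_binary(ctxt, 'nova-compute'):
    if svc.version == minimum:
        # This is the host holding everyone back at the old RPC pin.
        print('oldest nova-compute: %s (service version %d)'
              % (svc.host, svc.version))
```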
melwittdansmith: true, but I'm pretty sure the online migration to create missing consumers was created later on, i.e. not in the same release where we started created consumers with allocations. so maybe that's how it missed it. by the time the online data migration existed, grenade was no longer testing the old way that didn't create consumers18:50
mriedemmelwitt: different code path18:50
mriedemhttps://review.openstack.org/#/c/611115/3/nova/api/openstack/placement/objects/consumer.py@2918:50
dansmithmnaser: yeah, can't really blame us for this one :)18:51
mnaserdansmith: ill take that one :P18:51
mnaserbut yeah, i think the issue is upgrading across releases18:51
mriedemthe upgrade check CLI doesn't look to see if your minimum compute version is > N-118:51
mriedemfwiw18:51
* mnaser takes out pitchforks again18:52
melwittso the microversion that started creating consumers was 1.8, pike https://docs.openstack.org/nova/latest/user/placement.html#require-placement-project-id-user-id-in-put-allocations18:52
mriedemno one has requested it check for that18:52
*** tssurya has quit IRC18:52
melwittnow when was the online data migration added...18:52
mnaserfwiw18:52
mnaserthis cloud has existed since juno, as far as i know18:52
mnaserit's seen some shit18:52
melwittcreate_incomplete_consumers was added in rocky18:52
melwittso allocations without consumers would be from before pike18:53
mriedemhmm, i do seem to recall online data migrations for some placement stuff not working18:53
mriedemb/c we were hitting the wrong db config18:53
mriedemmaking it think nothing needed to be migrated18:54
melwittso any grenade that covered create_incomplete_consumers would be testing queens => rocky and never see any consumerless allocations18:54
mriedemi suppose we were using at least 1.8 when creating allocations in queens18:55
mriedemb/c of dansmith's migration allocation stuff18:55
melwittI thought we started using 1.8 in pike, that's when it was added18:55
mriedemyeah i guess https://review.openstack.org/#/c/469634/18:56
mriedemok i guess that solves the grenade mystery18:58
mriedemgeez when is someone going to add an FFU job that runs from ocata-em to master?!18:58
mnaserissues like this is why ffu upgrades terrify me19:01
mnaserlol19:01
mnaser"when did this break? here's 2 years worth of code to go through!"19:01
dansmithmnaser: it's way easier than the alternative, IMHO19:01
dansmithof not knowing if the data set has been transformed since juno or not19:01
mnaserdansmith: i'll agree on that statement19:02
mriedemwell, i was right about one thing19:03
mriedemhttp://logs.openstack.org/00/607600/1/check/ironic-grenade-dsvm/4d493b1/logs/screen-n-cpu.txt.gz#_Oct_03_18_33_59_07234119:03
melwitthm, I just realized, we're going to need to dupe these patches and use the same change-id to propose them to placement as well19:04
mriedemyes19:04
mriedemthat's what we've been doing19:04
melwittok19:04
mnaserforward porting19:04
mnaseris that what we call it19:04
melwitton the second patch, it looks like there's at least one additional place we need to add the group_by, right?19:05
mriedemyes19:05
melwittand should correspondingly test it too. the recreate test patch is already approved though19:05
mriedemnot for long19:05
melwittk19:05
efriedmriedem: That KeyError. Is that part of the existing bugs you've been talking about, or has it not yet been investigated?19:05
mnaserthe keyerror is not related19:06
openstackgerritMatt Riedemann proposed openstack/nova master: Add recreate test for bug 1798163  https://review.openstack.org/61111319:06
openstackbug 1798163 in OpenStack Compute (nova) "Placement incomplete consumers online migration fails" [Critical,In progress] https://launchpad.net/bugs/1798163 - Assigned to Mohammed Naser (mnaser)19:06
mnaserit was just some weird leftovers19:06
mriedemmnaser: i'll rev my functional test patch and yours on top19:06
* mriedem makes coffee19:06
efriedIs it a bug that needs to be fixed, or was it a user error?19:06
mnaserthe keyerror? i dunno, but i dont think it should have been an issue because said user doesnt touch placement19:07
efriedI wouldn't have thought it should be possible no matter what abuse you lavish on the placement db.19:07
*** dave-mccowan has quit IRC19:07
mnaserfwiw the resource provider had nothing allocated19:07
mnaserno usage that is19:07
*** spatel has joined #openstack-nova19:07
*** dave-mccowan has joined #openstack-nova19:08
efriedmnaser: Any sharing providers in this mess?19:08
mnaserefried: sorry, not sure what you mean by that19:08
efriedProviders with the MISC_SHARES_VIA_AGGREGATE trait19:08
mnaserefried: i am not sure honestly, i didn't dig in that much19:10
*** jding1_ has joined #openstack-nova19:12
mriedemthen no19:12
mriedemb/c you'd have to create them yourself19:12
*** jding1_ has quit IRC19:12
mnaseryeah besides nova19:13
mnaserno api interaction19:13
*** jackding has quit IRC19:15
efriedIf you see a repro, lmk. Otherwise I'm going to pretend it didn't happen.19:15
*** tbachman has quit IRC19:15
*** jackding has joined #openstack-nova19:17
mnaserefried: i can get you a stacktrace if you want, but i dont think id be able to reproduce it given i deleted stuff19:17
efriedmnaser: The stack trace won't tell me much. Logs up to that point might help a bit.19:18
efriedespecially if they've got our fun new debug messages19:19
openstackgerritMatthew Edmonds proposed openstack/nova master: Use tempfile for powervm config drive  https://review.openstack.org/61017419:20
edmondswefried ^ this should address the fd open issue19:20
mriedemfudge,19:23
mriedemthis unique constraint error is in 3 f'ing places19:23
melwittI wondered if there were more. and I had thought they'd call through the same method to create missing consumers but I guess all of the queries are different19:24
mriedemwell maybe not19:24
mriedemhttps://review.openstack.org/#/c/611115/3/nova/api/openstack/placement/objects/resource_provider.py@197319:24
mriedemthat's not doing the insert-from-select19:25
mriedemlike the others19:25
mnaserum19:26
mnaserin rocky we moved to console auth tokens stored in db, right?19:26
melwittyes, in addition to nova-consoleauth until this lands https://review.openstack.org/61067319:27
mnasermelwitt: what service creates the auth tokens?19:28
melwittmnaser: nova-compute creates them for the database, nova-consoleauth creates them for nova-consoleauth19:28
mnaserso if your nova-compute is not on rocky19:29
mnaser..does that mean no console?19:29
melwittthen you get nova-consoleauth tokens19:29
mnaserok i see19:29
melwittno, you get console19:29
* cdent sighs and cries about 179816319:29
mnaserso just an extra indirection right now19:29
mnasertill nova-compute creates to db directly in the future19:30
melwittnova-compute creates directly to db in rocky. just obviously your older computes will not and those instances will be supported by nova-consoleauth19:30
openstackgerritMatt Riedemann proposed openstack/nova master: Add recreate test for bug 1798163  https://review.openstack.org/61111319:31
openstackbug 1798163 in OpenStack Compute (nova) "Placement incomplete consumers online migration fails" [Critical,In progress] https://launchpad.net/bugs/1798163 - Assigned to Mohammed Naser (mnaser)19:31
openstackgerritMatt Riedemann proposed openstack/nova master: Use unique consumer_id when doing online data migration  https://review.openstack.org/61111519:31
mriedemmnaser: melwitt: dansmith: cdent: efried: ^ should be good now, covers both cases19:31
melwittonce all your computes are on rocky, then you wouldn't need nova-consoleauth once this backport lands https://review.openstack.org/61067319:31
mnasermelwitt: im seeing consoleauth get a token, but the traceback that says token validation failed is resulted from a method that does db.console_auth_token_get19:32
mnaserugh19:33
mnaser[workarounds] enable_consoleauth=True19:33
melwittmnaser: and you have a mix of rocky computes and older than rocky computes? in that case, you'll need to set [workarounds]enable_consoleauth = True on your console proxy host19:33
mnasermelwitt: that was it, thank you19:36
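The workaround melwitt points at, as a nova.conf excerpt for the console proxy host. Hedged example based only on the option named above; it is needed while pre-Rocky computes remain, and can be dropped (along with nova-consoleauth itself, eventually) once every compute is on Rocky.

```ini
[workarounds]
# Keep validating console tokens against nova-consoleauth as well as
# the database while older computes still register tokens the old way.
enable_consoleauth = True
```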
mnaserpart of me wants to document all this in some "heads up" way, but i also worry about "look how bad it is" messaging :\19:37
cdentthanks mriedem, is the expectation on that stuff that, since that code is "remove in stein" or "called from online migrations", we can remove it in openstack/placement instead of using those changes (I've not caught up fully on the irc log)19:37
melwittmnaser: it defaults to False, but for a rolling upgrade you would need it to be True. yet more information I missed in the upgrade release notes :(19:37
mriedemcdent: i don't expect it will be removed in stein,19:37
mriedemwe have to have a blocker migration first19:37
mriedembut haven't thought through it all yet19:37
mnasermelwitt: np, want me to push up a bug or something or you'll write it down?19:38
mriedemopenstack/placement doesn't have a "placement-manage db online_data_migrations" yet19:38
mriedemor does it?19:38
mnasermaybe good to add to the upgrade doc19:38
mriedemcdent: ^19:38
cdentit doesn't have _any_ placement-manage yet19:38
melwittmnaser: alternatively, could switch to defaulting to True and then let operators turn it off and decommission nova-consoleauth intentionally once they've rolled everything to rocky19:38
cdentthat's part of that message I sent earlier today19:38
cdentmriedem: but my thinking was: we don't need to do any online db migrations, yet, either19:39
cdentthat is: Isn't the database in the correct state when someone gets to using openstack/placement?19:39
mnasermelwitt: that feels like a better user experience19:40
mnasera lot of users probably will do rolling upgrades19:40
mriedemcdent: nope19:41
mnaserand you can keep it =True for rocky only anyways and remove it after19:41
mriedemcdent: not if you're upgrading from <pike to stein19:41
mriedemlike mnaser is doing with going from juno to rocky19:41
dansmithmelwitt: workarounds are supposed to default to off19:41
melwittmnaser: you can create a bug, that would be most visible I think19:41
dansmithmelwitt: so really we should have landed it where false meant what we wanted19:41
cdentmriedem: I'm confused about how that's supposed to work: don't you stop on the rocky _code_ when doing a FFU (fast-forward upgrade)?19:42
dansmithchanging it again is kindof the suck too, IMHO19:42
melwittdansmith: ok :( I see19:42
mnasermelwitt: dansmith: i guess i'm pretty busy these days, but i'll put up a bug and leave it for the team to decide what's best :>19:42
mriedemcdent: you mean upgrade to rocky where the create_incomplete_consumers online migration runs as part of the rocky nova-manage db online_data_migrations, and then extract placement while upgrading to stein?19:43
mriedemand assume create_incomplete_consumers is done already?19:43
mriedemthat might happen19:43
mriedemit's just nice to have a blocker migration to prevent you from upgrading if you didn't do the homework19:44
mriedemwe don't always have those though b/c sometimes they span multiple DBs19:44
cdentthe term "blocker migration" has never been sufficiently defined for me19:44
mriedemwe've used nova-status upgrade check for that though19:44
mriedemas in db sync fails19:44
mriedemdb sync in N fails b/c you didn't complete the online migrations in N-119:44
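For readers who, like cdent, find "blocker migration" under-defined: it is a schema migration whose only job is to make "db sync" in release N fail if the online data migrations from release N-1 were skipped. A rough sketch follows, with assumed table names and the sqlalchemy-migrate style upgrade() signature nova used at the time; it is not an actual nova migration.

```python
# Illustrative blocker migration: refuse to run while unmigrated data remains.
import sqlalchemy as sa


def upgrade(migrate_engine):
    meta = sa.MetaData(bind=migrate_engine)
    allocations = sa.Table('allocations', meta, autoload=True)
    consumers = sa.Table('consumers', meta, autoload=True)
    # Count allocations whose consumer_id has no consumers record, i.e. rows
    # the previous release's online data migration should have taken care of.
    query = sa.select([sa.func.count()]).select_from(
        allocations.outerjoin(
            consumers, allocations.c.consumer_id == consumers.c.uuid)
    ).where(consumers.c.uuid.is_(None))
    missing = migrate_engine.execute(query).scalar()
    if missing:
        raise Exception(
            '%d allocations still have no consumer record; run the previous '
            'release\'s "nova-manage db online_data_migrations" before this '
            'db sync.' % missing)
```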
melwittmnaser: thanks. at the very least, we can add more info to the upgrade reno, I think. but it sounds like I messed up the [workarounds] option too much to fix19:45
mnasermelwitt: nah, it's fine, i think the messaging needs to be more clear as in like19:45
mnaseryo shit will be broken if you havent completed the upgrade19:45
mnaserbecause to me i saw some stuff related to it but it didnt really feel like "that was my issue"19:45
dansmithmelwitt: just MHO of course. We've deviated from my initial proposal of workarounds in the past, but I do think that adding another possible "how it is if you didn't change it" case at this point is probably less helpful19:46
melwittwell, I was thinking "if you're doing a rolling upgrade, set [workarounds]enable_consoleauth = True"19:46
melwittmnaser: ^ rather than, yo shit will be broken19:46
mnaseryeah i'd be in favour of enabling it given that it cant really do much19:46
mriedemcdent: i couldn't do a blocker migration for this request spec thing, so added it to nova-status upgrade check https://review.openstack.org/#/c/581813/19:47
mnasermelwitt: much better delivered.19:47
mnaser:p19:47
mnasermelwitt: and my sh.... rolling upgrades were affected by https://bugs.launchpad.net/nova/+bug/179818819:47
openstackLaunchpad bug 1798188 in OpenStack Compute (nova) "VNC stops working in rolling upgrade by default" [Undecided,New]19:47
melwittlol, no I mean it _won't_ be broken if you set it19:47
mnaseroh19:47
mnaserah my brain has potato'd19:47
cdentmriedem: perhaps it would be good/ideal if, instead of a suite of N blocker migrations, we had a sanity check of some kind in a placement status check, which effectively does the same kind of "Your database isn't ready" thing.19:48
mnaseri did ocata => pike => queens => rocky in 2 days19:48
mriedemcdent: that's an option19:48
melwittdansmith: I didn't understand the "how it is if you didn't change it" part. are you saying you think changing the default would be less helpful than adding more words to the upgrade reno?19:48
mnaserthis one was by far the toughest but as expected i guess19:48
mriedemfor dropping create_incomplete_consumers19:48
dansmithmelwitt: I'm saying if you change it now, then people reading the renos will see "This was deprecated. Oops, undeprecated, set this workaround. Oops, oops, never mind, it's set by default now"19:49
dansmithmelwitt: and it just seems like we're piling on the confusion if we keep making that a moving target19:49
dansmithmelwitt: if anything, add something to nova-status and backport it to help make sure people are warned to pay attention to this19:49
dansmithand get that released before people have a chance to stumble over this19:49
melwittdansmith: I see, yeah. that's true, if we change the default we have to change all of the words related to how the workaround works, and that would be confusing if someone's seen it before19:50
mnaser(but also how many people went through this document already given the issue i ran into today :p)19:51
dansmithmelwitt: and I don't think we get to alter the older renos, if I'm not mistaken, but even still it's out there so if someone is looking at X.1 docs and then they're similar but different in X.2...19:51
melwittargh, yeah.19:52
dansmithmnaser: I'm talking about in a year when most people are deploying rocky and trying to figure out what the story is now19:52
mnaserdansmith: makes sense19:52
*** icey has quit IRC19:52
melwittI have cursed nova-consoleauth :(19:53
melwittok, so add something to nova-status. I hope everyone uses nova-status19:53
*** hamzy has joined #openstack-nova19:54
dansmithof course not everyone does.. OSA does I think, and hopefully all the buzz around making this a generic thing will mean in a year people are looking at it19:55
dansmithI thought you were also suggesting clarifying words in renos to help understand19:55
dansmithI was just saying flipping the default behavior now and trying to document _that_ is the confusing part19:55
melwittyeah but IIUC that doesn't help someone upgrading to rocky if I can't backport those words19:55
melwittoh19:56
dansmithyou can backport the words,19:56
dansmithI just don't think you should change the behavior and backport more words explaining how it's changed for the third time19:56
melwittgot it, ok19:56
mriedemwhat would the nova-status upgrade check look for? that [workarounds]/enable_consoleauth is False and return a warning?19:57
*** angiewang has joined #openstack-nova19:57
dansmithyeah, and maybe check the services table or current tokens to see if you even use that stuff19:58
dansmithif you don't use console, then you don't need to warn,19:58
dansmithbut if you do and you're rolling, pretty much should have that set right?19:58
melwittyeah19:58
*** angiewang has quit IRC19:58
mriedemyou can also tell if there are no console auth entries in the db right?19:58
dansmithI said that19:58
mriedemi said it with an accent19:59
dansmithfancy19:59
mriedemdansmith: btw, https://review.openstack.org/#/c/611094/ needs an assertion on it19:59
melwittthere wouldn't be, before rocky though. they'd be in the nova-consoleauth service19:59
mriedemmelwitt: well, that's the point right?19:59
mriedemif you're using nova-consoleauth in queens, and upgrading to rocky, you want the workaround enabled19:59
dansmithmriedem: ack, I have to run off for a bit but will hit that when I get back20:00
mriedemand nova-consoleauth would show up in the services table in....one of the dbs20:00
melwittyeah, I mean, if you are checking a queens deployment for whether they use consoles at all, you'd have to check the nova-consoleauth service20:00
mriedemwe don't want to make an rpc call from the status check20:00
mriedembut we could check the services table to see if it's been started20:00
melwittoh, you're thinking if they don't use consoles they won't run the service at all. that makes sense too20:00
melwittyeah20:00
mriedemi just don't know which db that'd be in20:00
mriedemapi?20:00
mriedemno,20:01
mriedemwrong schema20:01
mriedemi guess just iterate the cell dbs20:01
mriedemif you find a non-deleted nova-consoleauth service record in that db, but no console auth tokens in the db, and workarounds is false, then fail20:01
melwittyeah, or warn like dansmith said. only matters if you're rolling20:02
melwitti.e. it will only mess you up if you're rolling20:03
mriedemi left a comment on the bug with the status upgrade check idea20:04
melwittthanks20:05
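A rough sketch of the nova-status check discussed above: warn when nova-consoleauth is still deployed, the database-backed token store is empty, and the workaround is left at its False default. The conf object, the cell_engines list, and the exact table/column names are assumptions standing in for the real nova-status plumbing, not the eventual implementation.

```python
# Illustrative upgrade-check logic; not actual nova-status code.
import sqlalchemy as sa


def check_console_rolling_upgrade(conf, cell_engines):
    """Return a warning string, or None if console setup looks fine."""
    if conf.workarounds.enable_consoleauth:
        return None
    for engine in cell_engines:
        with engine.connect() as conn:
            consoleauth_services = conn.execute(sa.text(
                "SELECT COUNT(*) FROM services "
                "WHERE binary = 'nova-consoleauth' AND deleted = 0"
            )).scalar()
            tokens = conn.execute(sa.text(
                "SELECT COUNT(*) FROM console_auth_tokens")).scalar()
        # nova-consoleauth is in use but nothing has been written to the
        # database-backed token store yet: a rolling upgrade will break
        # console access unless the workaround is enabled.
        if consoleauth_services and not tokens:
            return ('nova-consoleauth is deployed but no console auth '
                    'tokens exist in the cell database; set '
                    '[workarounds]enable_consoleauth = True while rolling.')
    return None
```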
*** angiewang has joined #openstack-nova20:05
openstackgerritMatthew Edmonds proposed openstack/nova master: Use tempfile for powervm config drive  https://review.openstack.org/61017420:05
*** rmart04 has joined #openstack-nova20:05
*** rmart04 has quit IRC20:06
*** angiewang has left #openstack-nova20:08
mriedemdansmith: np i got it, it was 1 line20:11
openstackgerritMatt Riedemann proposed openstack/nova master: Fix formatting non-templated cell URLs with no config  https://review.openstack.org/61109420:11
mriedemeasy fix for another core ^20:12
melwitt+W20:17
*** cdent has quit IRC20:24
*** macza has quit IRC20:25
*** macza_ has joined #openstack-nova20:25
openstackgerritMatt Riedemann proposed openstack/nova master: Ignore uuid if already set in ComputeNode.update_from_virt_driver  https://review.openstack.org/61116220:27
mriedemefried: jroll: ^20:27
*** macza_ has quit IRC20:28
*** macza has joined #openstack-nova20:29
openstackgerritmelanie witt proposed openstack/nova master: Bump os-brick version to 2.6.1  https://review.openstack.org/61110920:30
efriedI don't get it.20:30
*** macza has quit IRC20:31
*** macza has joined #openstack-nova20:32
openstackgerritSundar Nadathur proposed openstack/nova-specs master: Nova Cyborg interaction specification.  https://review.openstack.org/60395520:32
*** slaweq has quit IRC20:33
spatelFolks! i have 64G compute node20:33
dansmithspatel: it's not nice to brag20:34
dansmithmriedem: thanks home skillet20:34
spatelShould i go with 1G hugepages or 2M?20:34
spateldansmith: i was going to write a question but hit enter in the middle of it20:34
dansmithspatel: I know, I'm just joking :P20:34
spatel:)20:35
spatelwhat do you recommend if that is the case20:35
spatelProblem is if i launch the application then it will be hard to adjust those values20:36
spatelcurrently i have "hugepagesz=2M hugepages=27000 transparent_hugepage=never"20:36
dansmithspatel: you probably want cfriesen20:36
spatelcfriesen: ^^20:37
*** moshele has joined #openstack-nova20:37
spatelHe may be not around20:38
mriedemefried: you were on the original regression patch of mine so figured you'd have context20:38
mriedemthis https://review.openstack.org/#/c/571535/20:38
efriedmriedem: Yeah, I think I get it now.20:38
efriedSee if my review comment makes sense.20:39
mriedemefried: yup20:40
efriedmriedem: ...and another update20:40
cfriesenspatel: in our testing 2M gave a noticeable benefit.  1G gave some additional benefit but only for specific testcases20:40
spatelThere you go!! thank you20:40
mriedemefried: yup20:41
efriedcool20:41
cfriesenspatel: are you using dedicated CPUs?20:41
spatelDo you think 27000 is a good number on a 64G compute node?20:41
spatelyes I am pinning CPU20:41
spateli am going to run riak cluster application on this compute node20:41
spatelriak love memory20:42
cfriesenyou don't need to allocate hugepages at boot. you can allocate them at runtime20:42
spateli heard sometimes it causes issues during runtime20:42
cfriesenspatel: If you allocate them early during startup the memory hasn't gotten fragmented yet20:42
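A minimal sketch of the runtime allocation cfriesen mentions, assuming a Linux host with 2M hugepages and root access; the sysfs path is the standard per-size knob, and requesting the pages early after boot (before memory fragments) makes it more likely the kernel can satisfy the full count.

```python
# Path for 2M hugepages on a typical x86 Linux host; writing requires root.
HUGEPAGES_2M = '/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages'


def allocate_hugepages(count, path=HUGEPAGES_2M):
    """Ask the kernel for `count` hugepages and return how many it granted."""
    with open(path, 'w') as f:
        f.write(str(count))
    # The kernel may allocate fewer than requested if memory is already
    # fragmented, which is why doing this early after boot works better.
    with open(path) as f:
        return int(f.read().strip())
```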
spatelhmm!20:43
spateli will keep that in mind then..20:43
cfriesenit's a bit of a tradeoff, since any memory you reserve for hugepages can't be allocated to small-page instances.  Also, you need to keep some 4K memory around for the host itself.20:43
spatelbut you have to reboot your flavor also right after adjust hugepage20:43
*** hamzy has quit IRC20:44
cfriesennot sure what you mean by "reboot your flavor" :)20:44
spatelI kept 8G memory for the host, that is why i picked 27000 pages20:44
spateli meant reboot your VM20:44
cfriesenspatel: that should be fine as long as most of your guests are using hugepages20:44
spatelcfriesen: thanks! in that case i will go with  2700020:45
cfriesenspatel: changing a flavor and then rebooting your vm won't do anything.  the flavor information was cached in the VM at creation time.20:46
cfriesenin the instance object, rather20:47
spateloh!!20:47
spatelcool20:47
*** erlon has quit IRC20:48
*** awaugama has quit IRC20:51
*** READ10 has joined #openstack-nova20:55
mriedemi still don't understand how this forbidden aggregate placement API thing is going to be used via nova for blazar, which is the use case in this spec https://review.openstack.org/#/c/603352/20:56
efriedmriedem: You kind of had to be in the room, unfortunately.20:56
mriedemit seems we're gung ho about adding a placement api without any details on how to use it20:56
mriedemwell, the place to document that is in the spec for those not in the room right?20:56
mriedemif the answer is, "we're going to fork nova and make pre-request filters an extension point" then say that20:56
efriedYes, I agree. If it's not clear to someone who wasn't in the room, it needs a rewrite.20:57
*** priteau has quit IRC20:57
mriedemi particularly want Kevin_Zheng on board with this b/c he had a use case for the dedicated host stuff as well20:58
mriedembut he needs to speak up on the spec review too20:58
*** priteau has joined #openstack-nova21:01
*** mriedem is now known as mriedem_away21:02
mriedem_awaytime for parent/teacher conferences21:02
*** eharney has quit IRC21:03
*** munimeha1 has joined #openstack-nova21:05
efriedmnaser, dansmith: Would DISTINCT have done the same thing? And possibly be more efficient?21:18
efriedsorry, I'm talking about https://review.openstack.org/#/c/611115/321:18
mnaserefried: i dunno, not an sql expert, i didnt try it and it seemed like the.. easier way21:19
*** mchlumsky has quit IRC21:25
*** moshele has quit IRC21:35
*** lbragstad is now known as lbragstad-50321:42
openstackgerritMerged openstack/nova master: Transform volume.usage notification  https://review.openstack.org/58034521:42
*** munimeha1 has quit IRC21:45
*** slaweq has joined #openstack-nova21:53
*** mriedem_away has quit IRC21:55
*** smcginnis is now known as smcginnis_vaca21:55
*** spatel has quit IRC21:56
*** priteau has quit IRC22:03
*** tbachman has joined #openstack-nova22:04
*** slaweq has quit IRC22:09
*** tbachman has quit IRC22:10
*** slaweq has joined #openstack-nova22:11
*** tbachman has joined #openstack-nova22:14
*** slaweq has quit IRC22:44
*** rcernin has joined #openstack-nova22:49
*** macza has quit IRC23:01
*** slaweq has joined #openstack-nova23:11
*** dave-mccowan has quit IRC23:24
*** hamzy has joined #openstack-nova23:24
*** mlavalle has quit IRC23:31
*** takashin has joined #openstack-nova23:41
*** slaweq has quit IRC23:44
*** k_mouza has joined #openstack-nova23:53
*** erlon has joined #openstack-nova23:55
*** k_mouza has quit IRC23:57
