Wednesday, 2019-09-11

*** slaweq has joined #openstack-nova00:11
*** slaweq has quit IRC00:16
*** mtreinish has joined #openstack-nova00:18
melwittsean-k-mooney: fyi I've proposed a stable/stein release of os-vif. I didn't find one already proposed https://review.opendev.org/68135800:20
*** TxGirlGeek has joined #openstack-nova00:24
*** tetsuro has joined #openstack-nova00:26
*** threestrands has joined #openstack-nova00:27
sean-k-mooneyrelease or branch. there is a a stable branch but i dont think we have done a stabel release in a while00:31
sean-k-mooneymelwitt: in either case a release proably  should be done as i dont think we have release since we fixed the cve with the linux bridge flooding00:33
alex_xusean-k-mooney: so we are still good at current upgrade plan for pcpu? I saw there a looong discussion on that00:33
sean-k-mooneyalex_xu: am i think dan wants to make the config outhion not required00:33
alex_xusean-k-mooney: ok, i see now.00:34
sean-k-mooneyby trying to claim with PCPUs and if that fails fall back to claiming with VCPUs and rely on the numa toplogy filter to validate it00:34
melwittsean-k-mooney: release, for stable branch. and correct, we've not had a release of the stable/stein branch since the cve fix00:34
sean-k-mooneymelwitt: ya so i proably should have proposed one when that merged. thanks for doing it00:35
alex_xusean-k-mooney: thanks, that is interesting...00:35
alex_xuhow can I know whether I need to stop fallback after upgrade00:35
melwittsean-k-mooney: np. I just wanted to make sure I didn't duplicate a release proposal in case I hadn't found it in my gerrit search00:35
sean-k-mooneyi think the release are ment to be on a topic branch with the project name but i dont see one looking quickly either00:37
sean-k-mooneyya looking at all the open patches agains the release repo yours is the only os-vif one i see00:38
sean-k-mooneyalex_xu: basicailly how it would work is it would ould fallback if the first placmeent query with PCPUs failed then it would try with VCPUs00:39
sean-k-mooneythe numa toplogy filter could tell if the compute node was an train node or a pre train node based on the field in the host numa toplogy object00:40
sean-k-mooneyso on train nodes all either there ill be no pinned cpus or the will all be used00:40
melwittsean-k-mooney: lol, looks like my beloved auto topic from git-review removed my good branch name00:40
sean-k-mooneyfor a stine node we would usee the old logic00:40
sean-k-mooneymelwitt: i think that is why it was removed00:41
melwittreally???00:41
melwitt:P00:41
melwittI still like it most of the time00:41
sean-k-mooneyi think. that if you name you r branches bug/whatever and do git review it will keep it00:41
*** eharney has quit IRC00:42
sean-k-mooneyat least i have still been relying on haveing the correct local brnach name00:42
alex_xusean-k-mooney: ah...00:42
*** gyee has quit IRC00:42
sean-k-mooneyalex_xu: im still not 100% sure there are no issue with that but it sound like it might work00:42
sean-k-mooneyi think stephen is going to try and code it up tommorow/today wehn he is online in 8 hours00:43
alex_xuyea, I'm trying to build up how that works in my mind also00:43
alex_xuanyway I need moving to office, then continue to look at, see if anything I can help00:44
alex_xusean-k-mooney: thanks00:44
sean-k-mooneyi think it should work in theroy. my concern would be the use of the limit parmater on the second query but in principal i dont imedialy see why it would not work00:45
sean-k-mooneyi had assumed somthing like a fallback mechaniums woudl be deamed too complicated. but i guess its for just one cycle00:46
sean-k-mooneywell proably more with our tenency not to delete things right away00:47
*** TxGirlGeek has quit IRC00:48
alex_xuyea, but that fallback should be existing in the whole U release00:49
*** tetsuro has quit IRC00:54
openstackgerritweibin proposed openstack/nova master: Add support for using ceph RBD ereasure code  https://review.opendev.org/68118800:56
*** tetsuro has joined #openstack-nova01:02
sean-k-mooneyalex_xu: ya we might have to wait till V to drop it but in V it could be removed01:04
sean-k-mooneywe want to support people moving from stine to train without needed to modify anything but train to ussuri was when we were thingin of chaning the default to in the config and removing the confiv in V01:06
sean-k-mooneyin anycase before you upgrade form U to V we want people to have moved to useing the new way of tracking01:06
*** mriedem has joined #openstack-nova01:06
sean-k-mooneymriedem: the numa job passed by the way https://review.opendev.org/#/c/680739/ result are here https://zuul.opendev.org/t/openstack/build/2ad3676591e440b6a76d30e966833a4901:10
mriedem2am in ireland, go to bed01:12
mriedembut thanks :)01:12
sean-k-mooneyi will soon im just finishing playing a game to unwind. irc is on a different monitor01:14
*** tetsuro has quit IRC01:15
*** Tianhao_Hu has joined #openstack-nova01:20
*** Tianhao_Hu has quit IRC01:20
alex_xuls01:23
alex_xuoops01:23
*** spatel has joined #openstack-nova01:24
*** liuyulong has joined #openstack-nova01:25
*** tkajinam has quit IRC01:25
*** lbragstad has quit IRC01:26
*** tkajinam has joined #openstack-nova01:26
*** lbragstad has joined #openstack-nova01:26
*** tetsuro has joined #openstack-nova01:28
*** Tianhao_Hu has joined #openstack-nova01:34
*** Tianhao_Hu has left #openstack-nova01:34
openstackgerritArtom Lifshitz proposed openstack/nova master: NUMA live migration support  https://review.opendev.org/63460601:35
openstackgerritArtom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration  https://review.opendev.org/64002101:35
openstackgerritArtom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration  https://review.opendev.org/67259501:35
*** tetsuro has quit IRC01:40
*** tetsuro has joined #openstack-nova01:40
*** nicolasbock has quit IRC01:44
*** nicolasbock has joined #openstack-nova01:45
*** liuyulong has quit IRC01:55
*** nicolasbock has quit IRC01:58
brinzhangmriedem:https://review.opendev.org/#/c/679413/4/nova/objects/migration.py@17202:02
brinzhangin self._context are not contain user_id and project_id02:02
brinzhangmriedem:IMO, when the operator post the request, the user_id (self._context._user_id) need to record, isn't it?02:04
*** slaweq has joined #openstack-nova02:11
*** FlorianFa has quit IRC02:15
*** slaweq has quit IRC02:15
*** FlorianFa has joined #openstack-nova02:35
*** tetsuro has quit IRC02:47
*** mriedem has quit IRC02:48
*** tetsuro has joined #openstack-nova03:00
*** spatel has quit IRC03:03
openstackgerritBrin Zhang proposed openstack/nova master: Add operator user_id/project_id to the migrations  https://review.opendev.org/67941303:04
*** tetsuro has quit IRC03:05
openstackgerritMerged openstack/nova master: Fixing broken links  https://review.opendev.org/68120603:09
*** ccamacho has quit IRC03:11
*** PrinzElvis has quit IRC03:39
*** larainema has joined #openstack-nova03:39
*** knikolla has quit IRC03:42
*** ildikov has quit IRC03:42
*** csatari has quit IRC03:42
*** pas-ha has quit IRC03:44
*** ildikov has joined #openstack-nova03:44
*** knikolla has joined #openstack-nova03:45
*** csatari has joined #openstack-nova03:45
*** pas-ha has joined #openstack-nova03:45
*** samc-bbc has quit IRC03:45
*** PrinzElvis has joined #openstack-nova03:45
*** samc-bbc has joined #openstack-nova03:47
*** udesale has joined #openstack-nova04:00
*** etp has joined #openstack-nova04:12
*** ash2307 has quit IRC04:25
*** ash2307 has joined #openstack-nova04:26
*** dave-mccowan has quit IRC04:29
*** pcaruana has joined #openstack-nova04:42
openstackgerritMerged openstack/nova master: Fix the race in confirm resize func test  https://review.opendev.org/68123804:49
*** brault has joined #openstack-nova05:00
openstackgerritAlex Xu proposed openstack/nova master: Counting both of VCPU and PCPU as core quota  https://review.opendev.org/68137405:02
alex_xustephenfin: efried sean-k-mooney ^ hope this is what you need05:02
openstackgerritAlex Xu proposed openstack/nova master: Counting both of VCPU and PCPU as core quota  https://review.opendev.org/68137405:04
*** brault has quit IRC05:05
*** ratailor has joined #openstack-nova05:06
*** Luzi has joined #openstack-nova05:07
*** slaweq has joined #openstack-nova05:11
*** pcaruana has quit IRC05:12
*** slaweq has quit IRC05:16
*** ash2307 has quit IRC05:18
*** ash2307 has joined #openstack-nova05:20
*** rcernin has quit IRC05:22
*** redrobot has quit IRC05:25
*** damien_r has joined #openstack-nova05:31
openstackgerritAlex Xu proposed openstack/nova master: Counting both of VCPU and PCPU as core quota  https://review.opendev.org/68137405:34
*** damien_r has quit IRC05:36
*** rcernin has joined #openstack-nova05:38
alex_xustephenfin: https://review.opendev.org/#/c/681374/ according this http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-09-10.log.html#t2019-09-10T20:04:42, in case you don't know05:43
*** TxGirlGeek has joined #openstack-nova05:45
*** brinzhang_ has joined #openstack-nova05:50
*** rcernin has quit IRC05:51
*** brinzhang has quit IRC05:54
*** rpittau|afk is now known as rpittau06:01
*** TxGirlGeek has quit IRC06:03
*** Tianhao_Hu has joined #openstack-nova06:05
*** takashin has joined #openstack-nova06:05
*** rcernin has joined #openstack-nova06:09
*** slaweq has joined #openstack-nova06:11
*** slaweq has quit IRC06:15
*** ash2307 has quit IRC06:18
*** luksky has joined #openstack-nova06:19
*** slaweq has joined #openstack-nova06:21
*** pcaruana has joined #openstack-nova06:21
openstackgerritAlex Xu proposed openstack/nova master: DNM: Try to fallback to PCPU request when VCPU failed  https://review.opendev.org/68138306:25
*** tetsuro has joined #openstack-nova06:26
*** etp has quit IRC06:32
*** etp has joined #openstack-nova06:33
*** ash2307 has joined #openstack-nova06:35
*** ricolin has joined #openstack-nova06:38
*** ricolin has quit IRC06:39
*** ociuhandu has joined #openstack-nova06:42
*** tetsuro_ has joined #openstack-nova06:46
*** damien_r has joined #openstack-nova06:46
*** ociuhandu has quit IRC06:46
*** tetsuro has quit IRC06:48
*** maciejjozefczyk has joined #openstack-nova06:51
*** lpetrut has joined #openstack-nova06:52
*** brault has joined #openstack-nova06:57
*** ociuhandu has joined #openstack-nova07:04
*** ociuhandu has quit IRC07:04
*** ociuhandu has joined #openstack-nova07:07
*** awalende has joined #openstack-nova07:08
*** ociuhandu has quit IRC07:08
*** trident has quit IRC07:08
*** ivve has joined #openstack-nova07:09
*** tetsuro_ has quit IRC07:09
*** luksky has quit IRC07:10
*** tesseract has joined #openstack-nova07:11
*** trident has joined #openstack-nova07:17
*** ralonsoh has joined #openstack-nova07:19
*** tetsuro has joined #openstack-nova07:21
*** trident has quit IRC07:22
*** luksky has joined #openstack-nova07:23
*** ociuhandu has joined #openstack-nova07:24
openstackgerritAlex Xu proposed openstack/nova master: DNM: Try to fallback to PCPU request when VCPU failed  https://review.opendev.org/68138307:29
*** trident has joined #openstack-nova07:31
alex_xustephenfin: efried sean-k-mooney ^ also tested the dansmith idea07:31
*** ccamacho has joined #openstack-nova07:38
*** rcernin has quit IRC07:38
*** tetsuro has quit IRC07:45
openstackgerritBalazs Gibizer proposed openstack/nova master: Make _revert_allocation nested allocation aware  https://review.opendev.org/67613807:47
gibibauzas: I needed to recheck anyhow and later part of the series needs a rebase to pick up the race fix. Could you re +W ^^07:49
gibi?07:49
bauzasgibi: sure thing07:49
gibithanks07:49
bauzasand morning07:49
gibibauzas: yeah, good morning07:49
bauzasgibi: done07:50
alex_xuyou guys wakeup early today, or you guy just quiet before~07:51
bauzasnah it's 9:51am07:51
bauzasbut in general I do a lot of internal stuff on mornings :(07:51
*** luksky has quit IRC07:51
alex_xui see now~07:51
gibialex_xu: I wake up earlier today but not due to work.07:52
alex_xunice07:53
gibiI hit the gym early as I have to flight to Ireland today (internal meeting)07:53
gibiwhat a better time to have an internal conf than on FF day :)07:54
alex_xuhah07:54
alex_xuor review on the flight :)07:54
openstackgerritBalazs Gibizer proposed openstack/nova master: Support reverting migration / resize with bandwidth  https://review.opendev.org/67614007:55
gibiyeah, I'm planning to work whole day regardless of the flights07:55
gibibauzas: rebase on https://review.opendev.org/676140 as well07:55
gibibauzas: could you please re+w?07:56
bauzasgibi: done07:56
gibithanks a lot07:56
gibibauzas: the next in the series will change to avoid re-introducing the race. So that will take a bit of time to propose07:56
*** luksky has joined #openstack-nova07:57
bauzasgibi: ok ping me then when you're done07:59
*** threestrands has quit IRC07:59
openstackgerritLuyao Zhong proposed openstack/nova master: db: Add resources column in instance_extra table  https://review.opendev.org/67844708:01
openstackgerritLuyao Zhong proposed openstack/nova master: object: Introduce Resource and ResourceList objs  https://review.opendev.org/67844808:01
openstackgerritLuyao Zhong proposed openstack/nova master: Add resources dict into _Provider  https://review.opendev.org/67844908:01
gibibauzas: sure08:01
openstackgerritLuyao Zhong proposed openstack/nova master: Retrieve the allocations early  https://review.opendev.org/67845008:01
openstackgerritLuyao Zhong proposed openstack/nova master: Claim resources in resource tracker  https://review.opendev.org/67845208:01
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces  https://review.opendev.org/67845308:01
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845408:01
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845508:01
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845608:01
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964008:01
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847008:01
openstackgerritLuyao Zhong proposed openstack/nova master: objects: use all_things_equal from objects.base  https://review.opendev.org/68139708:01
*** tssurya has joined #openstack-nova08:02
openstackgerritLuyao Zhong proposed openstack/nova master: Add resources dict into _Provider  https://review.opendev.org/67844908:05
openstackgerritLuyao Zhong proposed openstack/nova master: Retrieve the allocations early  https://review.opendev.org/67845008:05
openstackgerritLuyao Zhong proposed openstack/nova master: Claim resources in resource tracker  https://review.opendev.org/67845208:05
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces  https://review.opendev.org/67845308:05
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845408:05
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845508:05
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845608:05
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964008:05
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847008:05
openstackgerritLuyao Zhong proposed openstack/nova master: objects: use all_things_equal from objects.base  https://review.opendev.org/68139708:05
efriedo/ nova08:09
efriedalex_xu, luyao: Is that ---^  ready for another look?08:09
alex_xuefried: good morning...so early08:10
efriedalex_xu: I thought it might be a good idea for me to get up early and overlap more with you & stephenfin08:10
alex_xuefried: hah08:10
efriedjust today (and maybe tomorrow), not permanently.08:10
efriedbut yeah, it's closer to "all-nighter with a nap" than "get up early" :P08:11
alex_xuefried: no response from luyao, I guess she just get a break after submit patch :)08:12
efriedalex_xu: I saw you worked the quota solution - thank you for that.08:12
efriedhow's the CI thing coming? Does Rui have it set to pull latest yet?08:12
alex_xuefried: np, one more interesting thing https://review.opendev.org/68138308:12
*** ociuhandu has quit IRC08:12
alex_xuefried: no, I talk with Dolpher that depends on CI team, they are in US08:13
alex_xuefried: we will know after few hours i guess08:13
efriedargh! If only we had known, we were looking at this yesterday during the day.08:13
*** ratailor_ has joined #openstack-nova08:14
*** ociuhandu has joined #openstack-nova08:15
efriedalex_xu: Did https://review.opendev.org/#/c/678455/27 wind up sane?08:15
alex_xuefried: let me check now08:16
efriedIf so, perhaps we can ask gibi/bauzas to have a look at it and give the other +2 (expecting the first from stephenfin)08:16
*** ratailor has quit IRC08:16
bauzasefried: which one ?08:16
efriedbauzas: That one piece of vpmem we talked about yesterday - the part that does the libvirt dom xml business.08:17
efriedlink above08:17
efriedbauzas: but maybe let alex_xu vet it first to make sure it's worth your time.08:17
bauzasefried: okay, honestly, it's prio #3 for me08:17
efriedthat should work out nicely :)08:17
bauzasbut I can try to look while I wait for stephenfin and gibi08:17
openstackgerritAlexandra Settle proposed openstack/nova stable/stein: Fixing broken links  https://review.opendev.org/68140108:17
alex_xuefried: bauzas I guess the only one I want to check is this https://review.opendev.org/#/c/678455/27/nova/virt/hardware.py@2065, I don't know why she can't just assign a cpu_policy and cpu_thread_plicy directly instead of using the flavor and image_metadata08:18
*** derekh has joined #openstack-nova08:20
*** luksky has quit IRC08:21
openstackgerritBrin Zhang proposed openstack/nova master: Filter migrations by user_id/project_id  https://review.opendev.org/67424308:21
*** tssurya has quit IRC08:22
luyaoefried: code ready to be reviewed, welcome :)08:23
openstackgerritBrin Zhang proposed openstack/nova master: Filter migrations by user_id/project_id  https://review.opendev.org/67424308:23
efriedluyao: on it now08:23
alex_xuluyao: only thing I have https://review.opendev.org/#/c/678455/27/nova/virt/hardware.py@207208:25
luyaoalex_xu: Okay, I'll test it first.08:26
*** tkajinam has quit IRC08:27
alex_xuluyao: thanks a lot08:27
*** jawad_axd has joined #openstack-nova08:29
stephenfinalex_xu: Nicely done :) Looking08:29
alex_xustephenfin: \o/08:29
*** takashin has left #openstack-nova08:30
luyaoalex_xu: works well08:30
alex_xuluyao: \o/08:31
openstackgerritMerged openstack/nova master: Apply SEV-specific guest config when SEV is required  https://review.opendev.org/64456508:34
openstackgerritMerged openstack/nova master: Reject live migration and suspend on SEV guests  https://review.opendev.org/68015808:34
openstackgerritMerged openstack/nova master: Enable booting of libvirt guests with AMD SEV memory encryption  https://review.opendev.org/66661608:34
openstackgerritBalazs Gibizer proposed openstack/nova master: Func test for migrate re-schedule with bandwidth  https://review.opendev.org/67697208:34
alex_xustephenfin: https://review.opendev.org/#/c/671800/30/nova/virt/hardware.py@2018, I'm not sure we should take crea of numa_usages_from_instance_numa in ResoourceTracker, since the RT only process the instance below to this host, if this host is upgraded, then it will host_cell.pcpuset. If this host is old, then it doesn't know about any pcpuset. But I can help you to test that08:35
*** tssurya has joined #openstack-nova08:36
*** avolkov has joined #openstack-nova08:37
gibibauzas: the next patch is up: https://review.opendev.org/67697208:37
bauzasgibi: ack, looking08:37
gibithanks08:37
* stephenfin looks08:49
stephenfinalex_xu: Hmm, that's a good point actually08:50
stephenfinI keep forgetting the resource tracker runs on the host08:50
openstackgerritBalazs Gibizer proposed openstack/nova master: Support migrating SRIOV port with bandwidth  https://review.opendev.org/67698008:53
alex_xustephenfin: yea, we only take care of host_cell, not the instance cell, it should be ok for any migration also08:53
aspiersstephenfin: I just removed SEV from the runway since it merged \o/08:53
aspiersthanks everyone for all your help08:53
*** priteau has joined #openstack-nova08:54
alex_xuaspiers: congrats, destory the laptop and begin to vacation08:54
aspiersalex_xu: exactly! X-D ;-)08:54
aspiersI'm taking vacation next week08:55
alex_xuaspiers: enjoy :)08:55
aspiersthanks :)08:55
openstackgerritBalazs Gibizer proposed openstack/nova master: Allow migrating server with port resource request  https://review.opendev.org/67149708:55
openstackgerritBalazs Gibizer proposed openstack/nova master: Do not query allocations twice in finish_revert_resize  https://review.opendev.org/67882708:55
*** ttsiouts has joined #openstack-nova09:00
openstackgerritBalazs Gibizer proposed openstack/nova master: Allow resizing server with port resource request  https://review.opendev.org/67901909:00
*** shilpasd has joined #openstack-nova09:01
openstackgerritBalazs Gibizer proposed openstack/nova master: Extract pf$N literals as constants from func test  https://review.opendev.org/68099109:03
stephenfinaspiers: \o/09:04
stephenfinefried: You're mad, btw09:04
stephenfin(the getting up early thing :D)09:04
*** lpetrut has quit IRC09:04
* gibi going to the airport, back online in ~45 mins09:05
openstackgerritBalazs Gibizer proposed openstack/nova master: Improve dest service level func tests  https://review.opendev.org/68099809:05
efriedfly safe gibi09:05
efried\o/ aspiers!09:06
alex_xustephenfin: sean-k-mooney efried found an intersting case for this https://review.opendev.org/#/c/681383/209:07
alex_xubut I guess in the end, it will become why choice fallback to PCPU or fallback to VCPU.09:08
aspiersefried: \o/ indeed, can't thank you enough for all your help! it's been a long road (basically a whole year IIRC)09:08
stephenfinalex_xu: I just left comments on that09:08
stephenfintl;dr: I don't think we should be doing the service version check09:09
efriedaspiers: sadly, your experience was very smooth and quick compared to most of a similar size.09:09
aspiershaha wow really :)09:09
efriedask anyone09:09
stephenfinaspiers: yup, he's not lying09:09
aspiersI honestly didn't think it would be anywhere near as hard to complete, or as long09:09
aspiersclassic case of engineering underestimation09:10
alex_xustephenfin: except we define a rule, the operator must use cpu_dedicated_set for pinning host after upgrade to T09:10
stephenfincpu-resources is on its...fourth cycle?09:10
efriedit really shouldn't be, aspiers09:10
aspiersstephenfin: :-o09:10
efried...as hard/long to complete09:10
*** rcernin has joined #openstack-nova09:10
stephenfinalex_xu: I've been told we can't. We'll break upgrades. You can't require nova.conf changes simply to upgrade to a new major version09:10
aspiersefried: what are the lessons learnt? other than we should all chip in to help fix CI more :)09:11
efriedaspiers: We keep learning the same lessons over and over again but not reacting09:11
stephenfinaspiers: bug people and be as responsive as possible, pretty much09:11
alex_xustephenfin: emm...I guess because we have no way to force that, the upgrade-check can't check all the hosts' configuration file09:12
aspiersstephenfin: pretty sure I did a lot of bugging - sorry about that ;-)09:12
stephenfinalex_xu: Indeed. Tbh, I don't think it's an issue anyway09:13
efriedaspiers: I see the main problems being 1) we commit to too much stuff per release, and 2) we *somehow* need more consistent core engagement throughout the cycle (and it would be nice if less "bugging" was required to achieve that)09:13
stephenfinThe NUMATopologyFilter will save us09:13
alex_xustephenfin: what do you mean "you can't require nova.conf changes simply to ugprade to a new major version"?09:14
efriedfwiw my plan for ussuri is to be strict about 1. This was the first release where I had an employer with real business commitments based on the assumption that "approved spec" actually means a thing is expected to get done. (And that's the cycle I had to run for PTL, shoot-self-in-foot)09:14
aspiersefried: yeah. It feels like more resources are required for reviewing in general and the cores are being pulled in too many directions at once09:15
efried...and therefore the first cycle I really felt the pain of stuff slipping09:15
aspiersouch09:15
stephenfinalex_xu: I mean if you have Stein-based deployment that's not emitting deprecation warnings, you should be able to upgrade it to Train without changing nova.conf at all and it'll just work09:15
aspiersI can relate - I felt a lot of pressure to land SEV in Train (original plan with AMD was Stein), although it turns out a lot of that pressure was just from me09:16
stephenfinIt might start issuing new warnings saying "you will need to do X before you upgrade to U" but that's it09:16
kashyapaspiers: Nice on SEV completion, that's one fo the first things I was looking if it was all merged this morning.09:16
stephenfinefried: Right? ^09:16
aspierskashyap: thanks a lot for all your help!09:16
efriedstephenfin: that is my understanding, yes09:17
stephenfinSweet09:17
alex_xustephenfin: emm...right, we have rule about the configuration file should works with same as before09:17
stephenfinalex_xu: So yeah, tl;dr: we can insist '[compute] cpu_dedicated_set' is configured before U09:18
stephenfinand add an upgrade check to ensure that09:18
stephenfinbut not T09:18
kashyapaspiers: NP; glad to be useful.  Now I can bug you for Secure Boot stuff -- turning out to be more work than I anticipated :/09:18
stephenfinbecause we gave them no warning and no migration path in Stein09:18
stephenfinalex_xu: However, yeah, I don't think this is in an issue. Consider this09:18
aspierskashyap: bug soon as I'm on vacation next week ;-)09:19
stephenfinStein-based compute node: reports VCPU, uses NUMATopology.cpuset to determine what cores can be pinned to (with the NUMATopologyFilter simply copying 'cpuset' to 'pcpuset' to keep that passing)09:20
kashyapaspiers: Heh, noted; well-deserved09:20
stephenfinTrain-based compute node without new config options: reports VCPU, uses NUMATopology.pcpuset (p!) to determine cores can be pinned to but this == NUMATopology.cpuset (so no need for NUMATopologyFilter to copy 'cpuset' to 'pcpuset')09:21
alex_xuyup09:21
stephenfinTrain-base compute node with new config options: reports PCPU, uses NUMATopology.pcpuset (p!) to determine cores can be pinned to and this is unique (again, no need for NUMATopologyFilter to copy 'cpuset' to 'pcpuset')09:22
stephenfinsooo09:22
stephenfinfrom a placement perspective, things have changed09:22
luyaoefried: do you need a unittest for this? https://review.opendev.org/#/c/678450/11/nova/compute/manager.py@222409:23
stephenfinbut from a NUMATopologyFilter (nova-scheduler) and nova-compute perspective, all three will work perfectly fine together09:23
alex_xuyup, agree09:23
efriedluyao: was there a unit test for the exception case before? If so, it should be retained. If not, it wouldn't hurt to add one, but I'm not sure it's necessary. stephenfin, what do you think?09:24
luyaoefried: there was one before09:24
stephenfinalex_xu: the _only_ thing that can hurt us is Train-based compute nodes that have no CPU configuration or have new config options but only for shared CPUs ('[compute] cpu_shared_set')09:24
efriedluyao: okay, so if it needs to be adjusted based on the code having been moved, do that, but don't remove the unit test.09:25
stephenfinIn that case, the second request to placement will return these hosts since they're reporting VCPU. However, they're _really_ reporting VCPU and don't have any PCPU inventory09:25
alex_xustephenfin: why09:25
stephenfinbut, like I said, the NUMATopologyFilter will protect us from that09:25
stephenfinsince NUMATopology.pcpuset will be set to None and therefore it will fail to pin09:26
*** zhubx has joined #openstack-nova09:26
luyaoefried: okay09:26
alex_xustephenfin: ok09:26
stephenfinand obviously if NUMATopologyFilter is disabled for some reason, the instance will fail to schedule when it lands on the host09:26
alex_xustephenfin: that make senses I think09:27
alex_xustephenfin: have you see the resize case I wrote at https://review.opendev.org/#/c/681383/209:27
stephenfinalex_xu: Phew, glad you understood. It's really hard to put this stuff into words without waffling or going down ratholes :-D09:27
stephenfinLooking09:27
alex_xustephenfin: hah09:27
stephenfinalex_xu: Could we call that out as a known issue?09:28
*** boxiang has quit IRC09:29
alex_xustephenfin: maybe, I'm thinking whether we request PCPU, and fallback to VCPU, can make that better.09:29
alex_xubut I feel tired, my brain works very slow...09:29
*** dtantsur|afk is now known as dtantsur09:30
stephenfinYeah, I guess we'd need to implement the retry logic at a higher level09:30
alex_xustephenfin: ah, I guess that doesn't work. The problem is placement always can get you same host allocation candidates, but the request is reject by the scheduler filters09:30
stephenfinYup. So if we wanted to solve that, we'd have to wrap both the call to placement and the call to the scheduler filters09:31
stephenfinand the attempt to schedule to the guest too, in case the user doesn't have NUMATopologyFilter enabled09:31
stephenfinthat sounds like a lot of effort for corner case that will only happen while an operator is in the process of upgrading their configuration09:31
*** ratailor_ has quit IRC09:32
*** ratailor_ has joined #openstack-nova09:32
alex_xuI feel I just tired09:33
stephenfinalex_xu: If it gets too late for you, I can take over https://review.opendev.org/#/c/681383/09:33
alex_xustephenfin: sorry, I need to take a rest little bit. you can free to take over those two patches. I probably online after one or two hours.09:34
stephenfinSounds good (y)09:34
efriedluyao: btw, here's what I did to that provider tree unit test when I was messing around with it locally: http://paste.openstack.org/show/775098/09:35
efriedthat last assertTrue fails with copy.copy but succeeds with deepcopy.09:35
luyaoefried: get it, thanks.09:38
efriedluyao: note that https://review.opendev.org/#/c/678453/26 is in merge conflict (probably aspiers' fault :P ) so you'll need to rebase again.09:39
luyaoefried: haha :D09:40
*** rcernin has quit IRC09:41
*** ociuhandu has quit IRC09:43
*** brinzhang_ has quit IRC09:45
*** brinzhang_ has joined #openstack-nova09:46
*** brinzhang_ has quit IRC09:46
luyaoefried: I check the unittests for get_allocations, it tests the function that we move the get_allocations_xx out of09:49
*** ociuhandu has joined #openstack-nova09:49
luyaoefried: so I have to remove it09:49
efriedluyao: then I say just leave that part of the patch as is09:50
efriedluyao: stephenfin and I have already reviewed it and accepted it this way, so it's not worth the trouble of changing.09:50
*** Tianhao_Hu has quit IRC09:52
luyaoefried: Okay09:52
*** ociuhandu has quit IRC09:54
luyaoefried: sorry, confirm again,  need try/except or drop?09:54
efriedluyao: Sorry, this has gotten confusing.09:56
efriedRemove the try/except. I.e. keep this part of the patch as it appears in PS22 https://review.opendev.org/#/c/678450/22/nova/compute/manager.py@221509:56
luyaoefried: Okay, thanks.09:57
efriedluyao: Okay, I've finished re-reviewing the series (except for https://review.opendev.org/#/c/678455/, which I'm still not going to touch)09:58
efriedstephenfin: Do you need my re-look at any of the cpu-resources patches at this point?09:59
stephenfinefried: In about a half hour, yeah09:59
stephenfinI think the quota problem is solved09:59
efriedsweet09:59
stephenfinSo it's just replacing the scheduler conf option with retry logic for placement left09:59
stephenfinalmost done with tests10:00
*** cdent has joined #openstack-nova10:02
efriedluyao: note response on the defaultdict thing -- I still don't think you need it https://review.opendev.org/#/c/678449/21/nova/compute/provider_tree.py@7010:03
luyaoefried:  you're right. :)10:06
efriedphew. I like being right10:07
bauzasstephenfin: I wonder something10:08
bauzasstephenfin: if we ask for an o.vo field not existing, can we just do something like 'if field not in my_object:' ?10:08
bauzasIIRC, we need to use a specific o.vo method10:09
bauzasunless the field is set to None10:09
stephenfinbauzas: You can do that, yeah10:09
stephenfinFor some time now10:09
stephenfinyou used to need to use is_attr_set or something like that10:09
bauzasok, I had concerns about the virt.hardware module in https://review.opendev.org/#/c/671800/10:09
stephenfinbut __contains__ was implemented many moons ago10:09
stephenfinanywhere is particular?10:09
*** tssurya has quit IRC10:10
*** tbachman has quit IRC10:12
bauzasnah it's okay then10:17
bauzasmy o.vo skills are a bit rusty10:18
bauzasbut I could have tested it10:18
sean-k-mooneyefried: after 6 year of working on nova the main lesson i have learns is unlesss you have 2 cores from different compaines lined up to review the think you want to do before you start it will take 2 cycles to complete10:19
sean-k-mooneythe first cycle you get it working and people beging to become familar with and the second you finall land the thing10:19
*** brinzhang has joined #openstack-nova10:19
efriedsean-k-mooney: I would love to be able to commit cores to review blueprints before/as they are approved. I just a) doubt cores would be willing to commit, and b) don't want to do that much PM type work10:20
sean-k-mooneyits partly a social thing. you need to explain why this thing you care about is imporant so that people will be interested enogh to review and demonstrate it works and is sane.10:21
sean-k-mooneythe non craze telco things i have worked on are less hard to enabel but im normlally in craze telco land10:22
efriedIt sucks that "be interested" is a criterion10:22
*** ttsiouts has quit IRC10:23
sean-k-mooneywell it only is in terms of making it higher on list of other things you have to review10:23
*** ttsiouts has joined #openstack-nova10:23
sean-k-mooneythere are only a limit number of sme fore each system and they are overloaded frequently10:24
*** maciejjozefczyk has quit IRC10:24
*** maciejjozefczyk has joined #openstack-nova10:25
sean-k-mooneyif we had more subsystem maintainer or more reviewres it would help but we work with what we got.10:25
sean-k-mooneywe still get a lot done each cycle10:25
openstackgerritStephen Finucane proposed openstack/nova master: DNM: Try to fallback to PCPU request when VCPU failed  https://review.opendev.org/68138310:25
sean-k-mooneythe latency can just be a bit high sometimes10:25
stephenfinefried, bauzas: Can you sanity check before I waste more time fixing up tests ^10:25
stephenfinthe approach in general, that is10:26
bauzasstephenfin: I left a comment on https://review.opendev.org/#/c/671800/10:26
bauzasstephenfin: cool, I'll do it a bit later (needs to refuel my stomach)10:28
*** maciejjozefczyk has quit IRC10:28
*** ttsiouts has quit IRC10:28
*** maciejjozefczyk has joined #openstack-nova10:29
efriedstephenfin: sorry, looking at this patch for the first time, and... is this really the right approach? If we bounce with PCPUs, try VCPUs?? If they *have* cut over, and we're just *actually* out of PCPUs in the cloud, they're going to get *not* desired behavior, nah?10:32
openstackgerritLuyao Zhong proposed openstack/nova master: db: Add resources column in instance_extra table  https://review.opendev.org/67844710:33
openstackgerritLuyao Zhong proposed openstack/nova master: object: Introduce Resource and ResourceList objs  https://review.opendev.org/67844810:33
openstackgerritLuyao Zhong proposed openstack/nova master: Add resources dict into _Provider  https://review.opendev.org/67844910:33
openstackgerritLuyao Zhong proposed openstack/nova master: Retrieve the allocations early  https://review.opendev.org/67845010:33
*** HagunKim has quit IRC10:39
* efried caffeinates10:44
*** belmoreira has joined #openstack-nova10:45
*** cdent has quit IRC10:50
*** ttsiouts has joined #openstack-nova11:03
sean-k-mooneyefried: the numa toplogy filter will handel the out of PCPU case when they have swapped over to using cpu_dedicted_set11:04
* bauzas is back11:04
efriedeh?11:05
sean-k-mooneythe numa toplogy filter will still run11:05
sean-k-mooneyit can tell the difference between a pre train host and a train host with non pCPUs free11:06
efriedIf I'm out of PCPU resource per placement, this code will try the alloc request again with VCPUs. All of that happens before we get to the filter, yah?11:06
sean-k-mooneyyep11:06
efriedso... that's bad11:06
sean-k-mooneyand the filter will reject all host that are train but have not pcpus left when we are given vcpu allocations11:06
efriedthat's not the scenario I'm concerned with.11:07
sean-k-mooneywhat is?11:07
*** ttsiouts has quit IRC11:07
*** shilpasd has quit IRC11:07
efriedLet's say I'm fully upgraded, with my confs properly cut over to cpu_{shared|dedicated}_set.11:07
sean-k-mooneyyep11:07
efriedthis means my cloud has both PCPU and VCPU resources scattered around it.11:08
sean-k-mooneyyep11:08
efriedsometimes on the same host, maybe sometimes not, whatever.11:08
sean-k-mooneysure11:08
efriedI request a dedicated VM.11:08
efriedBut I'm genuinely out of PCPU resources in my cloud.11:08
openstackgerritStephen Finucane proposed openstack/nova master: fixup! Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/68144511:08
sean-k-mooneyyep then we fall back11:08
efriedThis code will bounce in GET /a_c and then *retry* with VCPUs instead.11:08
efriedSo I'll wind up with my instance on VCPUs11:08
sean-k-mooneyno11:08
stephenfinefried: nope11:09
stephenfinscroll up and look at the conversation between alex_xu and I about two hours ago11:09
sean-k-mooneyyou get allocation from placmenet that are for VCPUs11:09
sean-k-mooneybut then the numa toplogy filter will reject all the hosts11:09
sean-k-mooneystephenfin: do you want to explain why11:09
efried...because the host knows the VCPUs aren't dedicated.11:09
efried?11:09
stephenfinefried: Yeah11:10
efriedwhich it doesn't know pre-Train, but in Train it does11:10
stephenfinPlacement will try for PCPU, not get it, and then try for VCPU11:10
efriednow, this is the version of the filter that runs on the compute host, not the scheduler, right?11:10
stephenfinFor Stein hosts, that's fine11:10
sean-k-mooneyefried: not its in the schduler11:10
efriedoh11:10
stephenfinFor Train hosts, the NUMATopologyFilter will then reject the host11:10
sean-k-mooneystepehn modifled the host numa toplogy object11:10
stephenfinand if that's disabled, it'll late fail on the compute node11:11
efried...so in a partially-upgraded scenario, how does the sched know the diff between stein & train hosts?11:11
stephenfinNUMATopology.pcpuset11:11
efriedack11:11
efriedit all comes together11:11
stephenfinFor Stein nodes, that isn't set11:11
openstackgerritLuyao Zhong proposed openstack/nova master: Claim resources in resource tracker  https://review.opendev.org/67845211:11
donnydsean-k-mooney: I made some adjustments yesterday and I am hoping that fixed the issue we are hainvg getting the numa job to run11:11
alex_xuactually scheduler needn't know the diff of stein and trait11:11
stephenfinFor Train nodes with vcpu_pin_set, it's set to the same value as NUMATopology.cpuset11:12
sean-k-mooneyi only kicked off one run last night and it passed11:12
*** ttsiouts has joined #openstack-nova11:12
stephenfinFor Train nodes with cpu_dedicated_set, it's set to those cores11:12
sean-k-mooneydonnyd: there is a new version of the numa migration scripts so i can kick it off again to test tehm11:12
stephenfinFor Train nodes with _only_ cpu_shared_set, it's set to None11:12
donnydok11:12
sean-k-mooneyalex_xu: we want the filter to know the difference so it can do the right thing in a mixed case11:13
stephenfinWe check to see if it's unset (meaning this is Stein compute node) and set it to the same value as NUMATopology.cpuset manually in the scheduler11:13
donnydlmk if there are more issues, I would like to get them ironed out so the label is actually usable11:13
stephenfinHowever, if it's set to None, we do nothing, which means we'll fail to pin11:13
openstackgerritArtom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration  https://review.opendev.org/64002111:13
openstackgerritArtom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration  https://review.opendev.org/67259511:13
stephenfinMy functional tests prove that out. See here11:13
alex_xusean-k-mooney: the numa filter doesn't know. if stephenfin is going to update the patch, we will ensure we have value in host.pcpuset and host.cpuset. Then you asking pcpu, we get it from host.pcpuset, if you asking vcpu, then get it from host.cpuset11:14
sean-k-mooneydonnyd: sure. i vexhost and limestone might also be willing to provide the lables so we could maybe make this a voting job in the future11:14
stephenfinhttps://review.opendev.org/#/c/681445/1/nova/tests/functional/libvirt/test_numa_servers.py@24911:14
alex_xuensure we have value in host.pcpuset and host.cpuset whatever it is stein node or train node.11:15
stephenfinIt's now the NUMATopologyFilter failing the request due to lack of PCPUs, not placement11:15
* stephenfin goes to roll that back into alex_xu's patch11:15
stephenfinTIL 'git commit --amend --fixup=???' resets the author commit message (losing the change-ID). The more you know11:15
efriedstephenfin, sean-k-mooney: I re-commented. Good to move forward. Thanks for patiently explaining :)11:17
sean-k-mooneywe still need the numa toplogy filter for a release or two anyway since plament know nothing about numa yet so it will be enabled if you are using this feature anyway11:18
efriedyah11:18
efriedI was going to complain about that aspect, but yah.11:18
sean-k-mooneyonce we fix placment of tis john snow syntrom re numa  that is also proably the releas we want to drop this fallback and the numa toplogy filter11:19
sean-k-mooneyim guessing that will be V11:19
openstackgerritStephen Finucane proposed openstack/nova master: fixup! Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/68138311:19
*** maciejjozefczyk has quit IRC11:19
sean-k-mooneyassuming we make good progress in Ussuri on using the new placment features11:20
sean-k-mooneythat might be a bit aggressive but i dont think a short time line will be possible11:20
*** maciejjozefczyk has joined #openstack-nova11:21
sean-k-mooneywe will likely want at least 1 release where the numa toplogy filter is optional11:21
sean-k-mooneywhich hopefully could be ussuri?11:21
*** gibi is now known as gibi_fly11:22
sean-k-mooneywe woudl need to get mempages,cpus,vgpus and pci devices reported in placment under numa nodes first11:22
*** cdent has joined #openstack-nova11:24
stephenfinefried: If that reworded comment looks good to you, I'll go ahead and squash that commit back11:26
openstackgerritStephen Finucane proposed openstack/nova master: fixup! Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/68138311:26
stephenfinin there ^, that is11:26
* stephenfin goes for lunch11:26
bauzasdang11:28
bauzasstephenfin: got questions for you https://review.opendev.org/#/c/675571/2111:28
*** udesale has quit IRC11:31
openstackgerritsean mooney proposed openstack/nova master: [DNM] testing with new lable.  https://review.opendev.org/68073811:34
openstackgerritsean mooney proposed openstack/nova master: [DNM] testing with old lable  https://review.opendev.org/68073911:34
sean-k-mooneyartom: you patches just went into merge conflict11:37
sean-k-mooneyartom: looks like its only the functional tests11:37
sean-k-mooneyis it ok to rebase them11:38
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces  https://review.opendev.org/67845311:38
sean-k-mooneyi want to test the different nodepool lables11:38
*** nicolasbock has joined #openstack-nova11:39
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845411:39
artomsean-k-mooney, I know - I don't want to do it just yet11:41
sean-k-mooneyits such a dumb confilct11:41
artomsean-k-mooney, make it Depends-on on the patch below it11:41
sean-k-mooney<<<<<< HEAD11:41
sean-k-mooneyimport fixtures11:41
sean-k-mooney=======11:41
artomI'd rather make mriedem's and dansmith's job easier at this point11:41
sean-k-mooneyimport itertools11:41
sean-k-mooney>>>>>>> Functional tests for NUMA live migration11:41
sean-k-mooneyya ok i can do that11:41
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845511:42
sean-k-mooneyboth fixure and itertools are need by the way and its in alphabetical order so we jsut need to remove the conflict markers.11:42
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845611:43
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964011:46
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847011:47
* bauzas disappears for 1.5h11:48
efriedstephenfin: commented. Sorry for being dense here.11:49
openstackgerritLuyao Zhong proposed openstack/nova master: objects: use all_things_equal from objects.base  https://review.opendev.org/68139711:51
*** brault has quit IRC11:54
*** luksky has joined #openstack-nova11:55
*** brault has joined #openstack-nova11:55
*** brault has quit IRC11:55
*** brault has joined #openstack-nova11:56
*** tbachman has joined #openstack-nova11:57
*** spsurya has joined #openstack-nova11:58
luyaosean-k-mooney: I gave some wrong infomation about vpmem yestoday, so sorry to misleading you. I clarified at here https://review.opendev.org/#/c/678455/24/nova/virt/libvirt/config.py@318011:58
sean-k-mooney ok so we dont need to subtract and it should be fine to just have 2KB lables since they are just markers12:00
sean-k-mooneyto denote the start and end of the namespace right?12:01
luyaosean-k-mooney: yes, I think so12:02
*** dave-mccowan has joined #openstack-nova12:05
sean-k-mooneycool12:05
efriedluyao: Any update on the vpmem CI?12:07
*** lpetrut has joined #openstack-nova12:13
yaawangHi, could anyone review this patch, it's part of vCPU models selection. https://review.opendev.org/#/c/67029812:13
*** ociuhandu has joined #openstack-nova12:13
*** ociuhandu has quit IRC12:14
*** derekh has quit IRC12:16
*** ociuhandu has joined #openstack-nova12:16
efriedyaawang: I've been staying clear of that one because Alex is representing. bauzas or gibi_fly, might you have room for that ^12:16
*** ociuhandu has quit IRC12:16
efriedkashyap: Could you please clarify your opinion on https://review.opendev.org/#/c/678453/27/nova/virt/libvirt/driver.py@447 -- should we use InternalError to be consistent, or go rogue and use InvalidConfig?12:17
efriedstephenfin: also ^ when you get back12:19
luyaoefried: I know Dolpher and Rui is trying to apply the comunity patches then trigger the tests.  Besides, I'm not very clear about that.12:19
efriedluyao: okay, thanks.12:20
efriedhttp://52.27.155.124/55/678455/28/check/pmem-tempest-plugin-filtered/7336479/job-output.txt.gz @ 2019-09-11 12:14:29.162146 looks like we're still getting PS1312:22
efriedthat completed less than 10 minutes ago.12:22
*** awalende has quit IRC12:22
*** awalende has joined #openstack-nova12:23
sean-k-mooneyluyao: efried how is the intel ci running? if its using zuul v3 i dont really see how it difficult to remote the presetp that is hardcoding the patch.12:25
sean-k-mooneyit would not be  too hard to make it skip it the nova repo did not have teh changeid of the pmem patches in the git log either.12:25
efriedsean-k-mooney: agreed, I don't think it's hard, I just think the owners aren't up yet.12:25
sean-k-mooneythe repos will be set up before your main playbook runs12:25
kashyapefried: Looking12:26
sean-k-mooneyok so there are no public repos with the job definitions?12:26
kashyapefried: I'd say InternalError for now, for consistency's sake.12:26
sean-k-mooneyits managed internally?12:26
efriedkashyap: ack, thanks. luyao ---^12:26
efriedsean-k-mooney: I have no idea.12:26
sean-k-mooneyi mean that fine but if there is we could porbaly fix it12:26
kashyapefried: And I agree with your point there, in a future commit, it can be changed12:26
efriedsean-k-mooney: https://github.com/intel/Intel-OpenStack-CI-jobs/12:27
*** awalende has quit IRC12:27
luyaokashyap: okay, I'll change it cake to InternalError. :)12:27
sean-k-mooneyok its not mnaged via gerrit so we cant use depens on to test job changes12:28
sean-k-mooneyunless they are using gerrithub12:28
sean-k-mooneythat woudl be a no12:29
kashyapluyao: Thanks12:29
*** etp has quit IRC12:31
luyaoefried, sean-k-mooney: Rui told me just now, them have a patch, but they have no permission to merge it into CI , waiting for matt to merge, then we can appy the latest patches and do the tests12:32
efriedluyao: Matt who? Merge where?12:32
sean-k-mooneyok i was just looking at the repo to see if i could write one12:32
sean-k-mooneyi did not see it in that repo12:32
luyaoefried: Merge ti CI-jobs I think? I don't know about CI12:33
sean-k-mooneyany way we coudl get efried or alex_xu added to the people that can merge to it if its going to run against nova12:33
*** awalende has joined #openstack-nova12:34
luyaoefried: Matt is helping setup CI, I saw he is the intel ci job repo owner12:34
efriedMatt who? (Last name?)12:35
sean-k-mooneyefried: https://github.com/matt-welch12:35
efriedokay.12:35
efriedsean-k-mooney: how does one find out who's got merge perms on a github repo?12:35
sean-k-mooneyi dont see the hardcoding in that repo12:36
sean-k-mooneyno idea buyt that is the mat that has merged things12:36
luyaoefried: Matt Welch12:36
sean-k-mooneyyou can see it in the org12:36
sean-k-mooneybut im not a memebr of the intel org anymore12:36
efriedlooks like he's the only one who has merged things.12:36
efriedugh, and he's west coast, which probably means he won't be up for another several hours.12:37
sean-k-mooneyim not sure if this hardcoding is in that repo or in the tempest plugin repo12:37
*** ociuhandu has joined #openstack-nova12:38
efriedsean-k-mooney: https://github.com/intel/Intel-OpenStack-CI-jobs/blob/master/roles/upgrade-libvirt-qemu/tasks/main.yaml#L105-L10812:38
efriedseems like a weird place for it, but whatevs.12:38
sean-k-mooneyah that is why i have not found it it does not have pmem in the name12:38
*** awalende has quit IRC12:38
sean-k-mooneywell the task has12:39
sean-k-mooneyya there are btter ways to do that12:40
sean-k-mooneyoh12:40
sean-k-mooneyi wonder if we can use a pull request in the depens on12:40
sean-k-mooneythat should work if they have zuul set up correctlyly12:40
efriedbrb12:41
*** jmlowe has quit IRC12:42
stephenfinefried: sean-k-mooney or mriedem are you people RE: InternalError or InvalidConfig. It was the former's comment but the latter agreed12:42
stephenfin*your12:43
*** maciejjozefczyk has quit IRC12:43
efriedstephenfin: Sorry, I'm failing to parse that12:43
efriedoh, got it.12:44
efriedsean-k-mooney: Could you please clarify your opinion on https://review.opendev.org/#/c/678453/27/nova/virt/libvirt/driver.py@447 -- should we use InternalError to be consistent, or go rogue and use InvalidConfig?12:44
sean-k-mooneywell its not an internal error but i dont really care12:45
stephenfinbauzas: replied on https://review.opendev.org/#/c/675571 (tl;dr: I spotted it and removed it - should have been a different patch/called out in the commit message but it's done now)12:45
sean-k-mooneywe proably shoudl change them all to invalid config12:45
sean-k-mooneywe can do that in U12:45
stephenfinefried: I vote do the new thing and fix the old things later12:45
efriedsigh12:45
sean-k-mooneyiternal error implies the something in nova broke and it did not. this is user error12:45
stephenfinfoolish consistency and all that12:46
efriedluyao: Looks like kashyap is outvoted ^ so please just fix the docstring and we'll move on.12:47
sean-k-mooneyefried: from a time perspecitve i dont think this is important but i also dont think internal error is correct for config errors12:47
sean-k-mooneyso ill be happy with whatever ye choose12:47
efriedsean-k-mooney: no argument that InvalidConfig is better. The only argument to use InternalError is consistency.12:47
openstackgerritChris Dent proposed openstack/os-resource-classes master: WIP: Build pdf docs  https://review.opendev.org/68146312:48
kashyapYeah, maybe in this case being 'consistent' isn't worth it.  I don't know - whoever has super strong opinion on it (seems like stephenfin has)12:49
sean-k-mooneydoing something consitenly wrong does not make it correct12:49
efriedno, it just makes it predictable for the operator12:49
*** mriedem has joined #openstack-nova12:49
sean-k-mooneybut two days before freatre freeze i dont really want to block on that12:49
sean-k-mooneytrue12:50
kashyapsean-k-mooney: Sure :-).  I do see the good argument for doing the right thing with InternalConfig12:50
sean-k-mooneyif i was an operator i would want to see invalide config12:50
sean-k-mooneybecause then i know it shte config thats broken not nova12:50
*** eharney has joined #openstack-nova12:51
openstackgerritChris Dent proposed openstack/os-resource-classes master: Update bug link in docs to point to storyboard  https://review.opendev.org/68146412:51
openstackgerritChris Dent proposed openstack/os-traits master: WIP: Build pdf docs  https://review.opendev.org/68146512:51
efriedSo all we have left is https://review.opendev.org/#/c/678455/ and for stephenfin to re+2 the whole pile.12:53
* efried bbiab12:53
*** efried is now known as efried_afk12:53
mriedemmight want to keep an eye on this https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c0b/676972/18/check/nova-tox-functional/c0b03d1/testr_results.html.gz12:56
mriedemhttp://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22DestinationDiskExists%3A%20The%20supplied%20disk%20path%20(path)%20already%20exists%2C%20it%20is%20expected%20not%20to%20exist.%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d12:56
*** jmlowe has joined #openstack-nova13:01
*** cdent has quit IRC13:01
*** derekh has joined #openstack-nova13:01
stephenfinsean-k-mooney: What do you think of https://review.opendev.org/#/c/678455/28/nova/virt/libvirt/driver.py@5468 ?13:02
*** ratailor_ has quit IRC13:02
sean-k-mooney i dont like it but its what we said we would do a the PTG13:04
sean-k-mooneyi would prefer to actully give the guest a numa toplogy like we do with hugepages13:04
sean-k-mooneybut just not requrie the pmem namespace to be numa affined13:04
stephenfinMe too13:04
sean-k-mooneybut i know some people want to supprot non numa guest with pmem13:04
*** nweinber_ has joined #openstack-nova13:05
stephenfinBut the spec says this "Persistent memory is by nature NUMA sensitive."13:05
sean-k-mooneyyes it is13:05
stephenfinand also "Libvirt domain specification requires each virtual persistent memory to be associated with one guest NUMA node."13:05
sean-k-mooneyits litrally memory dimms that happen to not loose ther content on host reboot13:05
sean-k-mooneyyes so we are working around a qemu limitation by giving the guest a virtual numa toplogy of 1 numa node13:06
stephenfinhmm, yeah, this should definitely have NUMA affinity enforced13:06
sean-k-mooneythen not affinig it to a host numa node to make it act like a floating vm13:06
stephenfinvia the NUMATopologyFilter, of course, until we have NUMA in placement13:06
sean-k-mooneythats what i argued but again we chose not too13:07
*** tbachman has quit IRC13:07
sean-k-mooneywithout reopentin the spec i think we have to just live with this in train13:07
sean-k-mooneyand kill it with fire in U13:08
stephenfinidk. I'd like to see how hard it would be to do the NUMA affinity thing13:08
stephenfinIt seems easy13:08
stephenfinjust another conditional in numa_get_constraints13:08
sean-k-mooneyit would be13:08
sean-k-mooneyyep13:08
sean-k-mooneythe proable with that is it mean that you cant have geust with more cores then fit on one host numa node13:09
sean-k-mooneywithout have a multi numa guest13:09
sean-k-mooneyand that is want people dont want to have to care about13:09
stephenfinthat's no different to huge pages and PCI devices13:09
sean-k-mooneypci device does not creat a numa toplogy13:09
sean-k-mooneyhugepage does13:09
stephenfinsorry, not PCI devices13:09
stephenfinyeah13:09
sean-k-mooneyhugepages and cpu pinning13:10
sean-k-mooneyi would prefer not to special case pmem13:10
sean-k-mooneyif we were to special case pmem i would prefer to relax hugepages and cpu pinning to be the same13:10
stephenfinSo would I. We have way too many special corner cases as-is. We should be reducing that, not increasing it13:10
stephenfinYeah, that13:10
sean-k-mooneye.g i have always wanted to decouple pinning/hugepages form numa13:11
*** ociuhandu has quit IRC13:12
stephenfinoh, I'm well aware :)13:12
sean-k-mooneyso ya that is where i stand. im not going to block it but i consider it tech debth13:13
*** ociuhandu has joined #openstack-nova13:13
sean-k-mooneystephenfin: at least im consitent on the topic :)13:14
*** cdent has joined #openstack-nova13:14
sean-k-mooneystephenfin: alex_xu luyao https://libvirt.org/formatdomain.html#elementsMemory13:15
sean-k-mooneyThe node subelement configures the guest NUMA node to attach the memory to. The element shall be used only if the guest has NUMA nodes configured.13:15
sean-k-mooneyso it looks like not is optional in the target field13:16
sean-k-mooneyso lest just not set it13:16
sean-k-mooneyand get rid fo the virtual numa node.13:16
stephenfinI see this above in the opening paragraph though: "Some hypervisors may require NUMA configured for the guest."13:16
stephenfinIf kvm/qemu one such hypervisor?13:17
sean-k-mooneyif the guest has a numa toplogy we could set it to 0 but i think that would be the default13:17
sean-k-mooneyno13:17
sean-k-mooneyhttps://github.com/qemu/qemu/blob/master/docs/nvdimm.txt13:17
sean-k-mooney-machine pc,nvdimm13:18
sean-k-mooney -m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE13:18
*** ociuhandu has quit IRC13:18
sean-k-mooney -object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE13:18
sean-k-mooney-device nvdimm,id=nvdimm1,memdev=mem113:18
sean-k-mooneybasic usage create an nvdim device with no numa node13:18
sean-k-mooneyefried_afk: alex_xu luyao  ^13:18
stephenfinokay, if that's true I can definitely live with it13:19
stephenfinremind me to go and convert the QEMU docs to rST sometime13:19
*** efried_afk is now known as efried13:19
stephenfinor Markdown or something13:19
luyaosean-k-mooney: the vm can't run if no numa node13:19
sean-k-mooneyplain text is fine13:19
sean-k-mooneyluyao: have you tested it13:19
luyaosean-k-mooney: yes13:19
sean-k-mooneywell qemu said it can13:20
sean-k-mooneywhat error did you get13:20
stephenfinBut rendered HTML is better (for me, anyway) ;-)13:20
sean-k-mooneystephenfin: and much worse for me  :)13:20
stephenfinIs dansmith around yet?13:20
* stephenfin taps fingers on desk13:20
luyaosean-k-mooney: I don't remember it clearly, something like you must specify an numa node13:22
sean-k-mooneycan you test it again13:22
luyaosean-k-mooney: actually, the libvirt part you mentioned is my updating before13:22
sean-k-mooneyill try a nested env to test it13:22
sean-k-mooneywhich libvirt part13:22
luyaosean-k-mooney: About the nvdimm config13:23
sean-k-mooneyright the libvirt doc say the node is optional in the target13:23
sean-k-mooneyso libvirt hsould not require a numa toplogy13:24
sean-k-mooneyand the qemu examle does not create a numa toplogy13:24
openstackgerritChris Dent proposed openstack/os-traits master: Build pdf docs  https://review.opendev.org/68146513:24
openstackgerritChris Dent proposed openstack/os-resource-classes master: Build pdf docs  https://review.opendev.org/68146313:25
luyaosean-k-mooney: that node description is for all memory devices, for  nvdimm , we do need a numa13:26
*** jaosorior has quit IRC13:26
luyaosean-k-mooney: I'll try to do a test13:26
*** ociuhandu has joined #openstack-nova13:26
sean-k-mooneyi think we may only need the node to be set if there is a numa node in the guest13:27
sean-k-mooneyif its not present i think we wont need it13:27
*** nweinber__ has joined #openstack-nova13:27
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces  https://review.opendev.org/67845313:27
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845413:27
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845513:27
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845613:27
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964013:27
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847013:27
openstackgerritLuyao Zhong proposed openstack/nova master: objects: use all_things_equal from objects.base  https://review.opendev.org/68139713:27
sean-k-mooneyso we could hard code it ot 0 if there is a numa toplogy and not set it if there is not13:27
*** nweinber_ has quit IRC13:29
mriedemwhy is it that Intel_Zuul comments can't be hidden with the 3rd party toggle CI button?13:30
sean-k-mooneyit does not have CI in the name13:30
mriedemblarg13:30
sean-k-mooneywhich it should have13:30
mriedemalex_xu: ^ can we get that fixed?13:30
*** ociuhandu has quit IRC13:31
*** eharney has quit IRC13:32
openstackgerritsean mooney proposed openstack/nova master: [DNM] pmem pull request test.  https://review.opendev.org/68147413:32
*** eharney has joined #openstack-nova13:32
*** gbarros has joined #openstack-nova13:33
sean-k-mooneyefried: alex_xu luyao in theory ^ should cause the intel ci to run with my pull request that removes the hardcoding13:33
sean-k-mooneyit also might make the first party ci unhappy but i have turned off all the check test anyway13:33
mriedemartom: you still have a busted test in https://review.opendev.org/#/c/634606/84 but if you want to tack on a follow up patch to the end of the series i wouldn't be opposed to just fixing it that way13:34
artommriedem, what's the risk in pushing a new PS?13:35
mriedemis dan doing reviews on the other patches later up the stack?13:35
mriedemjust wait for me to hit https://review.opendev.org/#/c/640021/50 again quick13:35
artomAck13:36
efriedstephenfin, sean-k-mooney, luyao: here's spec discussion about numa stuff https://review.opendev.org/#/c/601596/16/specs/train/approved/virtual-persistent-memory.rst@27013:37
openstackgerritsean mooney proposed openstack/nova master: [DNM] pmem pull request test.  https://review.opendev.org/68147413:37
efriedand the one below it13:38
sean-k-mooneyefried: yes they have implemented what the spec said but its still problemenatic13:39
efriedbut you agreed it was okay to be consistent with the other problematic things that do the same?13:39
sean-k-mooneyit is not consitent413:39
stephenfinyeah, it's not consistent13:39
stephenfinthat's the bit I don't like13:39
stephenfinat all at all13:39
sean-k-mooneythe other thing actully pin the cores and memoyr to a host numa node13:40
*** panda is now known as panda|ruck13:40
efriedoh, so it's bad in a different way than the other things are bad13:40
efriedand we'd rather have it be consistently bad.13:40
sean-k-mooneythe other things imporve performance13:40
efriedbut we're agreed it's too hard to have it be good (and inconsistent) until "later".13:40
sean-k-mooneythis does not and it make the other code more complex13:40
*** ociuhandu has joined #openstack-nova13:41
sean-k-mooneyefried: it hink we can get the same beahvior if we jsut dont set the node elemnt in the xml13:41
stephenfinefried: I'm not sure what you mean13:41
sean-k-mooneyand dont create a numa toplogy at all13:41
stephenfinwe're saying it should either not have a NUMA topology, if that's possible (which sean-k-mooney thinks it is)13:41
stephenfinand if it has a NUMA topology, it should behave like everything else with NUMA topologies13:42
stephenfinnamely, each guest NUMA node is mapped to a unique host NUMA node13:43
sean-k-mooneyim going to quickly update my multi numa test and start stacking the latest version of atroms code then manully try to test if a numa toplogy is need for pmem13:43
stephenfinif we want to loosen that constraint in the future, grand, but it should be done across the board (so hugepages too)13:43
luyaosean-k-mooney: http://paste.openstack.org/show/775121/13:44
efriedI don't understand the issue, so I'm just operating at a very abstract level. My understanding:13:44
efried- The way other resources "do it" is bad13:44
efried- The way pmem is doing it in the current PS is bad, but different bad from ^13:44
efried- Doing it "good" is hard13:44
efried- So do it the same bad as the "other resources" for now13:44
efried- And some time in the future, make all the things do it good.13:44
efried- ...unless we can do it good now, easily.13:44
sean-k-mooneyluyao: remove the numa element  on line 7-913:44
*** tbachman has joined #openstack-nova13:45
*** panda|ruck is now known as panda|rover13:45
sean-k-mooneyif you have numa element in the cpu element i think you need node to be se13:45
sean-k-mooney*set13:45
sean-k-mooneyif you dont you should not13:45
sean-k-mooneyefried: by they i dont think doing it "good" was hard but let not dwell on that13:46
stephenfinluyao: Yeah, you're hitting this https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_domain.c#L11525-L1153213:46
sean-k-mooneyefried: also the way the other numa toplogy stuff works is not bad13:46
stephenfin- The way other resources "do it" is bad13:46
stephenfinit's not bad13:46
sean-k-mooneyi just wish it was less copled13:47
stephenfinthere's a very good reason we do that13:47
efriedokay, this reinforces that I should not be involved with this part of the series.13:47
* efried backs away slowly13:47
stephenfinnot at all13:47
stephenfinjust that we have strong opinions about this stuff ;)13:47
sean-k-mooneyefried: this is why we do it https://software.intel.com/en-us/articles/openstack-epa-feature-breakdown-and-analysis#inpage-nav-6-313:47
efriedsean-k-mooney: ^ meaning that even if there's only one NUMA node, turning NUMA "on" is better perf than leaving it "off"?13:48
sean-k-mooneythe numa affintiy on old systems gave you like 7-10% performacne improvement on its own13:48
sean-k-mooneyefried: yes because if the host has multiple numa nodes the vm wont use remote memory form the other numa node or float across them13:49
sean-k-mooneyif the host has one numa nod its has no effect13:49
efriedyahbut...13:50
*** redrobot has joined #openstack-nova13:50
efriedwhat if my VM needs more CPUs than are available in one NUMA node?13:50
sean-k-mooneycreate a vm with 2 numa nodes13:50
efriedno, I mean...13:50
sean-k-mooneyit will perfome better13:50
efriedIIUC we're doing this thing to create an implicit single-NUMA guest if no NUMA topo was specified? To realize this perf gain.13:51
sean-k-mooneyi get the argument  that people dont care13:51
sean-k-mooneyand i want ot support that13:51
sean-k-mooneybut i think we can support that by just no setteing lines 7-9 in http://paste.openstack.org/show/775121/13:51
sean-k-mooneyefried: no we are not13:51
luyaosean-k-mooney: got a error if i remove numa , Total size of memory devices exceeds the total memory size13:52
sean-k-mooneythe current code does not have the performace gain13:52
stephenfinluyao: Can you paste?13:52
sean-k-mooneyi thikn we still need to reserv dim slots13:52
luyaoefried: Rui told me dean trove also has the permission to merge13:52
sean-k-mooney <maxMemory slots='16' unit='KiB'>1524288</maxMemory>13:53
sean-k-mooneymaybe not13:53
*** rpittau is now known as rpittau|afk13:54
*** jawad_axd has quit IRC13:55
luyaosean-k-mooney, stephenfin: http://paste.openstack.org/show/775126/13:56
mriedemartom: https://review.opendev.org/#/c/640021/5013:57
mriedemdansmith: i think a couple of questions for you to opine on in there as well ^13:57
*** dtroyer has joined #openstack-nova13:58
efriedo/ dtroyer13:58
efriedsean-k-mooney, would you please explain to dtroyer what needs to happen in that CI repo to make it right?13:58
dtroyero/13:58
sean-k-mooneycan you try one last thing can you sett <maxMemory  slots="16" unit='GiB'>8</maxMemory> to inclnde the pmem but set   <memory unit='KiB'>524288</memory>13:59
sean-k-mooney  <currentMemory unit='KiB'>524288</currentMemory>13:59
sean-k-mooneyto a lower value13:59
*** Luzi has quit IRC13:59
sean-k-mooneyif that does not work then in the sake of time i think we have to live with this for Train14:00
artommriedem, ack, looking14:00
sean-k-mooneyand in Ussuri we need to find a way to converge this somehow14:00
stephenfinluyao: I'm assuming that's because you're setting <maxMemory> and not <memory>14:02
sean-k-mooneyyep14:02
stephenfinYou were able to omit that before because "In case NUMA is configured for the guest the memory element can be omitted"14:02
stephenfinfrom https://libvirt.org/formatdomain.html#elementsMemoryAllocation14:02
stephenfinsean-k-mooney: Cool. Thought that's what you were saying but I wasn't sure :)14:02
efrieddtroyer: the simple solution is removing this thing https://github.com/intel/Intel-OpenStack-CI-jobs/pull/1/commits/5d4a998a890fef3a2b0a46f73d26aa1deeb3cd8b14:02
artommriedem, about the qemu/kvm stuff - it's not "lower" that's the problem, is that the code used to check for "kvm", but then we realized that it should be checking for "qemu", and not all of the tests were changed14:02
sean-k-mooneythe max memory is the maium memory the geust can boot with or have hotplugged as addtional memeory14:03
artommriedem, the first one is harmless, but second one actually needs to change, because it was triggering a false positive by passing the hypervisor check before the thing we were actually testing14:03
sean-k-mooneyso we shoudl be setting that to guest ram + nvdim size14:03
sean-k-mooneyand the guest memory shoudl be set to the ram size in the flavor14:03
dtroyerefried: would getting rid of the hard-coded ps13 work?14:04
efrieddtroyer: I think that's what we want long term. For now it would be nice if it only ran the pmem tests against patches that have at least https://review.opendev.org/#/c/679640/ involved -- otherwise it's running on everything and will therefore fail handily.14:04
efrieddtroyer: idk, does refs/changes/$gerritnum work without /$psnum ?14:04
sean-k-mooneyit would fail until its merged but it would work after that14:04
mriedemartom: ok i guess i can buy that second one14:04
mriedemartom: so might as well leave them14:04
dtroyeryeah, the filter needs cleaning up, its pretty open now14:04
sean-k-mooneydtroyer: i dont know if the ci is using the git driver or github dirver14:05
efrieddtroyer: I always use git review -d, never git fetch14:05
* alex_xu is back without headache14:05
sean-k-mooneydtroyer: if its using the github dirver then https://review.opendev.org/#/c/681474/2 should work14:05
sean-k-mooneyefried: same14:05
efriedsean-k-mooney: yeah, unless the CI env doesn't have git-review installed :P14:06
dtroyermee too… I approved that PR, we can iterate from there14:06
sean-k-mooneyefried: no it will work14:06
sean-k-mooneythis shoudl all be done via the zuul cloner14:06
sean-k-mooneye.g. zuul should be preparing all the repos before the job starts14:06
sean-k-mooneyand then the jobs should use those prepreapred repos14:07
sean-k-mooneyso we shoudl not use git fetach or git review in the scripts14:07
sean-k-mooneydtroyer: oh i actully didnt mean for it to be aproved since i think there was an internal one coming but i guess thats fine too14:08
luyaosean-k-mooney: http://paste.openstack.org/show/775129/14:08
dansmithmriedem: just got off a call so I'll look at that deprecation patch in a minute14:09
*** maciejjozefczyk has joined #openstack-nova14:09
dtroyersean-k-mooney: I think Rui is looking at this in the internal env, yes, I'm not sure where he is on that...14:09
dansmithmriedem: considering how long things are taking to get through the gate, are you comfortable going ahead and letting the bottom three patches there go in or do you want to wait?14:09
*** nicolasbock has quit IRC14:10
alex_xudtroyer: the mroning, Rui and Dolpher talk with me we can remove the hardcode patch checkout and add a filter. Dolpher will do that when he wakeup I think. Dolpher in the US now14:10
stephenfinluyao, sean-k-mooney: so that's triggering this https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_domain.c#L11604-L1161514:11
*** nicolasbock has joined #openstack-nova14:11
dtroyeralex_xu: ok, thanks, I'll check with him14:11
*** nicolasbock has quit IRC14:11
alex_xudtroyer: thanks14:11
luyaosean-k-mooney: that's it!14:12
alex_xuyea...that's it14:13
sean-k-mooney this really isnt a hotplug but i guess they build  pmem on that backend14:13
*** nicolasbock has joined #openstack-nova14:13
luyaostephenfin: thanks14:13
stephenfinso it looks like we need a guest NUMA topology, meaning we're back to "do we have to treat VPMEM different from hugepages/other things that imply a NUMA topology in nova"14:13
openstackgerritBalazs Gibizer proposed openstack/nova master: Follow up for Ib50b6b02208f5bd2972de8a6f8f685c19745514c  https://review.opendev.org/68149014:14
mriedemdansmith: i've already started +2ing14:14
mriedemjust cross referencing logs with the latest passing CI job run14:14
stephenfindansmith: Morning. Can I get you to weigh in on https://review.opendev.org/#/c/681383 before I merge it back?14:15
*** gbarros has quit IRC14:15
dansmithmriedem: but the bottom one? /me looks14:15
stephenfin(this is the "try for PCPUs, and if that fails try for VCPUs" change)14:15
dansmithmriedem: oh I see okay14:15
dansmithstephenfin: okay, in a bit14:15
luyaostephenfin: yes, we create one guest numa if no numa specified in Nova, if numa is set, just put vpmems on numa 014:15
stephenfinta14:15
*** nicolasbock has quit IRC14:15
stephenfinluyao: Okay, but why don't we do the pinning to a host NUMA node like we do normally?14:16
sean-k-mooneystephenfin: it did in a previous version14:16
sean-k-mooneyit was removed to allow vms that are bigger then a singel numa node to work14:16
stephenfincan we not do that?14:17
alex_xustephenfin: just because in PTG someone said we should support the floating VM with vpmem14:17
stephenfinand fix it later, for huge pages too14:17
*** BjoernT has joined #openstack-nova14:17
alex_xuthen that use-case write into the spec, then that is why we support that14:17
sean-k-mooneyalex_xu: i can pretty much guranette that this wont be supported with floating instance downstream.14:17
sean-k-mooneyit might but we have no customer askign for that as far as im aware14:18
*** ricolin has joined #openstack-nova14:18
stephenfinI don't think that's a valid request, and I don't think we should be catering to it this cycle14:18
stephenfinIf we do want to cater to it, we should do the same for hugepages too14:18
sean-k-mooneyand cpu pinning14:19
stephenfinand CPU pinning, yeah14:19
sean-k-mooneythere was a propal for hyperv to allow this more or less years ago14:19
alex_xusean-k-mooney: yea, I have to say the initiial proposal doesn't support that also, it added due to the ptg discussion.14:19
alex_xuemm....14:19
alex_xuI think how much effort we can remove it14:19
stephenfin(apologies, btw, for brining this up so late in the day /o\ This week was the first time I made my way that far up the patch list)14:19
mriedemartom: a couple more nits in https://review.opendev.org/#/c/634606/84 if you're respinning or add them in a follow up14:20
alex_xudo we need to change hw:pmem parse too much14:20
artomhttp://giphygifs.s3.amazonaws.com/media/rl0FOxdz7CcxO/giphy.gif14:20
bauzasstephenfin: do you want to continue reviewing your series ?14:20
sean-k-mooneyalex_xu: i think we dont need to chagne that at all14:20
artommriedem, yeah, respinning at this point14:20
stephenfinbauzas: Yup, go for it. There's only two changes to be aware of14:20
bauzasor maybe I should look at gibi_fly's one14:20
alex_xusean-k-mooney: stephenfin how about just like cpu_policy, when the cpu_policy specific, then we create numa_topology for the instance in the api layer14:21
sean-k-mooneyalex_xu: the only thing that would change is we create a numa toplogy in the schduler and we ingore pmem devices in the numa toplogy filter14:21
sean-k-mooneyalex_xu: that is what we want to do14:21
alex_xuluyao: ^ sorry, is that easy for your14:21
sean-k-mooneywe want to add anothger condtional to get_instance_numa_constiratis ?14:22
stephenfinbauzas: Your call :) I would ask that you hit the bottom three though since you hit them already and all I've done if merge back the follow-ups14:22
stephenfinThey start here https://review.opendev.org/#/c/671793/14:22
sean-k-mooney stephenfin hand the sepcific funtion that need 1 more if14:22
bauzasstephenfin: cool14:22
*** cdent has quit IRC14:22
stephenfinsean-k-mooney: um, what? :)14:22
alex_xusean-k-mooney: stephenfin luyao I guess we just fall into this branch https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L174014:22
stephenfinoh, as if that needed another if statement14:23
sean-k-mooneystephenfin: what was the name of the function in hardware.py that we need to add the if tooo14:23
stephenfinohhh14:23
mriedemartom: ack, you'll have to rebase your functional test patch as well14:23
stephenfinnuma_get_constraints, iirdc14:23
stephenfin*iirc14:23
mriedemand https://review.opendev.org/#/c/680739/2 i guess?14:23
artommriedem, yep, I intentionally left it in merge conflict to facilitate checking changes on the lower patches14:23
stephenfinalex_xu: yup, that's what I think14:23
alex_xustephenfin: cool14:23
alex_xuluyao: so sorry14:24
alex_xuluyao: are you still at office, I can take over that14:24
sean-k-mooney basically we need to update this right https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L1684-L170814:24
artommriedem, and I think at this point that's how it'll stay until the things merge? I don't want to have you wade through a rebased PS just to see that I addressed the test/wording feedback14:24
*** tbachman has quit IRC14:24
sean-k-mooneyso that if the flavor has pmen we go into that if and call _get_numa_topology_auto14:25
sean-k-mooneywhich creates the numa toplogy of 114:25
stephenfinexactly14:25
luyaoalex_xu: sean-k-mooney stephenfin : I prefer current solution first, when we support numa affinity for vpmem, then generate numa in scheduler14:25
sean-k-mooneyactully on just make it hit https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L174114:26
sean-k-mooneyeither will work14:26
mriedemartom: ? you can address changes on https://review.opendev.org/#/c/634606/84 and https://review.opendev.org/#/c/640021/50 without touching the already approved patches14:26
mriedemjust start at https://review.opendev.org/#/c/640021/50, git rebase -i HEAD~2,14:26
mriedemand then git review -R -y14:26
artommriedem, right, I was saying I'd have to rebase those as well if I want to get the func test out of merge conflict14:26
mriedemoh yeah i care less about that right now14:26
artomSo instead of seeing simple one-line changes, you'd see the rebase stuff as well14:26
sean-k-mooneyluyao: yes but the current solution is techdebt you know that right and it does nto work the same as other case that create a implcit numa toplogy14:27
mriedemyeah i hate expecting a simple diff from the previous patchset and finding the person rebased on master14:27
alex_xusean-k-mooney: is there any case that doesn't work14:27
alex_xuI didn't know the previous discussion14:27
sean-k-mooneyalex_xu: yes one. you request more cpus or ram then fit on 1 numa node14:28
mriedemnothing like last minute redesigns for vpmems14:28
sean-k-mooneyif we leave the code as it is14:28
dansmithartom: mriedem: I replied on the deprecation patch.. I'll take another scan through the last functional patch and try to give my go/no-go which I think is what mriedem is looking for14:28
mriedemi should get a nickel for every time sean-k-mooney says "we'll do x in U"14:28
sean-k-mooneyplease remvoe any reference to creating an implcit numa toplogy in the docs because it does not act the same14:29
alex_xusean-k-mooney: I don't get that. there is any limit on the number of cpus and ram in 1 numa node?14:29
stephenfinluyao: Why can't we do NUMA affinity for VPMEM? Just return the needed info in the NUMATopology object and let the NUMATopologyFilter do the work for you?14:29
stephenfinJust like we do for hugepages14:29
sean-k-mooneyalex_xu: we do not allow guest to oversubsribe against it self14:29
sean-k-mooneyif you have a numa toplogy of 1 then you cant have more cpus or ram then is availabel on that numa node14:30
sean-k-mooneythis is check both in the driver and the numa toplogy filter14:30
luyaostephenfin: we can, but the  first design not support that, alex_xu had discuss on PTG before14:30
alex_xusean-k-mooney: the floating VM can14:30
sean-k-mooneyonly within the host numa cell14:31
sean-k-mooneybut yes14:31
efrieddansmith: if you can ack the approach for the quota (even if minor tweaks are needed) I can unblock cpu-resources so we have time for the zillion rechecks that will surely be needed.14:31
alex_xusean-k-mooney: so it still works :)14:31
sean-k-mooneyif the host has 16 core and 64GB of ram in the numa cell you cant have a vm wiith 20 core or 96GB of ram14:32
alex_xustephenfin: yes, we said no NUMA affinity in the first step14:32
sean-k-mooneyi think in the sake fo time we jsut have to follow what the spec says14:32
alex_xustephenfin: I guess the reason the people doesn't want us put anything more in NUMATopology object anymore14:33
sean-k-mooneyits sucks we have to special case and we shoudl document this does not work like other implict numa toplogies14:33
sean-k-mooneyalex_xu: ya that was why we did not do numa this cycle14:33
alex_xusean-k-mooney: qemu will refuse to startup that VM?14:33
*** pierreprinetti has joined #openstack-nova14:33
sean-k-mooneyalex_xu: no the numa toplogy filter will prevent it14:33
sean-k-mooneynova does not allow that14:34
*** priteau has quit IRC14:34
alex_xusean-k-mooney: no, the numa topology filter won't stop that, since the instance has no numa_toplogy obj14:34
sean-k-mooneyrigh with the special case code it wont14:34
alex_xusean-k-mooney: we only implicit to create one in libvirt driver14:34
sean-k-mooneyso that is why it works14:34
mlozaIs there an option in nova for instance timeout to ERROR state when a compute node fails?14:34
sean-k-mooneybut was awsering your question about what does hw:numa_nodes=1 do14:35
sean-k-mooneyall other implcit numa toploges act exactly like hw:numa_nodes=114:35
alex_xuyes14:35
sean-k-mooneyhw:pmem is not the same as hw:numa_nodes=114:35
stephenfinokay, I hate the idea of the implicit NUMA affinity when VPMEM is specified working differently from implicit VPMEM for other things (hugepages, CPU pinning), but I agree it doesn't make sense to work that way if we can't provide VPMEM NUMA affinity14:36
sean-k-mooneyand we should not imply it is in any docs, comment or release notes14:36
stephenfinI'm going to have a look to see how hard tacking on proper NUMA affinity is14:36
sean-k-mooneystephenfin: its not hard it has been blocked because its not plamcent native.14:37
stephenfinI don't buy the argument that we shouldn't add more things to the NUMATopology object. NUMA in placement isn't ready so we've to use what we got14:37
stephenfinput politely, that's nonsense :)14:37
*** ttsiouts has quit IRC14:37
*** zhubx has quit IRC14:37
*** zhubx has joined #openstack-nova14:37
*** ttsiouts has joined #openstack-nova14:37
sean-k-mooneyits been nonsense for 3 years but that is not going to change. so we should enbrace doing it the plamcnet way. in U to give mriedem a nickel14:38
stephenfingimme a half hour to check out how tough this would be to do. If it's terrible, I'll hold my tongue14:38
alex_xustephenfin: yes, it will be very easy to support numa affinity once we have numa in placement14:39
sean-k-mooneyits 1 field in the host numa toplogy blob and a tweek to the numa toplogy filter to match on it in the numa_fit_instace_to_host funciton14:39
alex_xustephenfin: just like vgpu, the first step no numa affinity14:39
sean-k-mooneyalex_xu: its very easy to support it without placmeent but not polically viable14:39
stephenfinvGPU didn't imply an implicit guest NUMA topology though, that's the key difference here14:39
bauzaslooks like we side-tracked the discussion14:40
bauzasif that's about providing a NUMA support in placement, that's something we'll do in Unicorn14:40
bauzasbecause we understand the concerns14:40
alex_xustephenfin: the imply an implicit guest NUMA and NUMA affinity are different thing14:41
bauzasmy PMs can pay for a nickel if that's the problem14:41
sean-k-mooneyalex_xu: not currently14:41
*** zhubx has quit IRC14:41
sean-k-mooneyform a nova point of view14:41
stephenfinalex_xu: only for VPMEM though14:41
sean-k-mooneybut they woudl be with vpmem14:41
*** zhubx has joined #openstack-nova14:41
sean-k-mooneyso that break operators understand of what we men when we say there is an implcit numa toplogy14:42
*** ttsiouts has quit IRC14:42
sean-k-mooneyalex_xu: we can crate a numa toplogy as a impmeenation detail14:42
sean-k-mooneybut we should never telll people we are because they will assume it works the same way as the other cases we do that14:42
*** belmoreira has quit IRC14:43
*** gbarros has joined #openstack-nova14:43
*** mlavalle has joined #openstack-nova14:43
*** zhubx has quit IRC14:43
*** zhubx has joined #openstack-nova14:44
sean-k-mooneyalex_xu: just to clarify we use teh terms implcit and explcit numa topoyg in downstream docs and in bug reports and customer expect an implict numa toplogy to be the same as and expclti one. that is the only reason i bring up docs/release notes14:44
mriedemhere is an idea, defer vpmems since you're still arguing about designs 1 day before FF and just sort it out in Ussuri :)14:45
mriedemrather than defer and promise to fix things later14:45
alex_xumriedem: come on...14:45
sean-k-mooneyas i said lest just follow the spec and use the code as it is14:46
alex_xumriedem: so that is works a later fix?14:46
sean-k-mooneyalex_xu: we would need to change qemu/libvirt to not do what the current code does14:47
mriedemalex_xu: i'm not reviewing the series nor am i following the argument here, but i know when we say "we'll hack it in x and fix it for real in x+1" that x+1 rarely happens if ever (x+5)14:47
sean-k-mooneyor require a realy numa toplogy with affity14:47
mriedembauzas: you know you already have head count to work on numa in placement support for nova in ussuri?14:49
bauzasmriedem: if I don't have other things like internal crisises, then yes14:50
bauzasor escalations if you prefer14:50
sean-k-mooneyat this point not having it block so much of our roadmap since we cant do it the non plamcent way  i dont see we have any choice but to dedicate headcount to it14:51
mriedembauzas: so that means no14:51
bauzassean-k-mooney: yeah and I went out of steam trying to find some other way for vGPU affinity14:51
*** panda|rover is now known as panda|ruck14:52
mriedemlet's just not bullshit ourselves here is all i'm saying14:52
bauzasmriedem: nah, it just means I'm telling the truth : we can't continue implementing NUMA related features without having NUMA support in placement14:52
mriedemsure you can!14:52
bauzasand that's something I'll make it clea to who it could be14:52
mriedemstephenfin doesn't see why ont14:52
mriedem*not14:52
stephenfinjaypipes14:52
stephenfinthat was the main reason why, in the past :)14:52
sean-k-mooneyno not jaypies14:52
mriedemgod rest his soul14:53
stephenfinplacement will solve <X>14:53
stephenfinRIP14:53
sean-k-mooneywe can implemntate we jsut cant get it merged14:53
*** cdent has joined #openstack-nova14:53
sean-k-mooneyanyone here if his house/partner/dogs were safe after the storm14:53
mriedemso what's the actual decision that's blocking the pmems stuff at this point? whether numa is implied or not?14:53
stephenfinby specifying VPMEM, you get an implicit NUMA topology but that implicit NUMA topology works differently to every other implicit NUMA topology14:54
mriedemdifferently how14:54
dansmiththat's BS14:54
dansmithdon't do that14:54
alex_xudansmith: you said yes in PTG :)14:54
sean-k-mooneyit does not give numa affinity fo the cpus and memory14:54
stephenfinnormally, each guest NUMA node is mapped to a unique host NUMA node14:55
stephenfinwith this, they'll float14:55
sean-k-mooneyso its not the same as hw:numa_nodes=114:55
dansmithalex_xu: sean-k-mooney said I said that too, but I don't agree that I did14:55
stephenfinso the guest has a NUMA topology but it's literally there to work around a quirk in libvirt14:55
dansmithalex_xu: I know that because I'm quite sure I didn't say anything positive about vpmems14:55
sean-k-mooneydansmith: it proably was not you but apprently it was form redhat an i know stephn and i relly hate that idea14:55
alex_xudansmith: haha14:55
stephenfinI really don't want to blow something out of the water, but I feel by doing this we're really just masking a problem with libvirt (or QEMU)14:56
stephenfinand we're also creating yet another special case that we'll have to clean up later14:56
mriedemso with pmem do you have to have an implicit numa topology?14:57
stephenfinyup, see here14:57
stephenfinhttps://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_domain.c#L11604-L1161514:57
mriedemi mean, will libvirt fail to boot the guest w/o numa defined in the xml?14:57
stephenfinyup14:57
mriedemah ok14:57
alex_xusean-k-mooney: stephenfin ok, let me try to remove that, maybe it is super fast and simple than our discussion :)14:57
mriedemalex_xu: remove what? qemu code?14:57
stephenfinalex_xu: Remove the implicit NUMA topology? Doesn't look like that's an option. See the above link14:57
mriedemthat is likely years before it hits LTS14:58
stephenfinlibvirt is telling us we need a NUMA topology, so we must give it a NUMA topology14:58
alex_xustephenfin: add numa topology explicityly14:58
mriedemok so if qemu requires implicit numa, then our options are hack it and create a unicorn (bad for nova), change qemu and defer the feature (bad for intel), or do numa affinity as usual in nova,14:58
*** TxGirlGeek has joined #openstack-nova14:58
mriedemright?14:58
bauzasoh gosh14:58
stephenfinmriedem: Right. But14:59
sean-k-mooneymriedem: yes14:59
openstackgerritEric Fried proposed openstack/nova master: WIP Include error 'details' in dynamic vendordata log  https://review.opendev.org/68132914:59
stephenfinthe reason they avoided doing it that way was because they wanted to avoid doing more stuff in NUMATopologyFilter14:59
dansmithartom: commented on the top numa functional patch14:59
bauzas... and all Hatters technically need to bail out for attending some meeting :)14:59
dansmithbauzas: not all of them :)14:59
artomdansmith, ack - I'm still trying to write words that make sense for https://review.opendev.org/#/c/640021/50/nova/conductor/tasks/live_migrate.py@17915:00
sean-k-mooneydansmith: can i join your team to not do numa stuff. it seam simpler15:00
mriedemstephenfin: would the numa topo filter need to look at pmems or something? or just the hardware module that the filter calls?15:00
alex_xustephenfin: emm...no, that isn't the reason15:00
bauzasdansmith: heh, so true15:00
artomIt's tricky because... just because we found some old computes in the cell, doesn't mean the source and dest are old15:00
stephenfinto be able to do NUMA affinity properly, we'll need to provide information about the VPMEM devices available in each node as part of the NUMATopology/NUMACell objects15:00
dansmithsean-k-mooney: heh15:00
openstackgerritBalazs Gibizer proposed openstack/nova master: Skip querying resource request if no qos port  https://review.opendev.org/68151315:00
artomSo I want to word around that15:00
dansmithartom: ack15:00
*** TxGirlGeek has quit IRC15:00
mriedemartom: the code isn't granular to just the source and selected dest - where you're checking we don't have a dest yet in some cases,15:01
mriedemso i'm not sure why that would matter for the reno15:01
mriedemoh this is the log15:01
*** spatel has joined #openstack-nova15:01
artommriedem,  yeah - I don't want to say "WARNING LEGACY BEHAVIOR" unconditionally15:01
stephenfinYeah, the NUMA fitting code in the hardware module only gets InstanceNUMATopology, NUMATopology, PCIRequest and whatever the host PCI tracking object is15:01
mriedemartom: i think it's fine to say computes aren't new enough15:02
mriedemthat's generic enough15:02
artomYeah, except they *might* be new enough15:02
mriedemthat doesn't matter15:02
artomIf that specific source and dest are, even if others in the cell aren't15:02
mriedemb/c that's not how the code works15:02
alex_xustephenfin: ah, you complain the part of explicltly numa but no affinity15:02
mriedemiow i don't see a problem with the paraphrase that dan gave15:02
alex_xustephenfin: but you are ok with implicity numa?15:03
artomSo, if we got passed the service check, it means there are some old computes in the cell15:03
alex_xuto be host, I'm confused what we discussion for now....15:03
mriedemartom: correct15:03
artomBut... the actual source and dest might both be new15:03
artomIn which case, assuming the workaround is enabled, NUMA-LM will happen15:03
stephenfinalex_xu: That was the reason we didn't do the implicit NUMA affinity, right?15:03
mriedemsure, but we don't necessarily have a dest at the point of that check15:03
mriedemyes15:03
artomSo I don't want to say "you enabled the workaround, things will explode"15:04
mriedemartom: i mentioned this here https://review.opendev.org/#/c/640021/48/nova/conductor/tasks/live_migrate.py@19615:04
*** jmlowe has quit IRC15:04
artomMore like "you enabled the workaround, things will explode if the source and dest are old"15:04
artom(explode == eggagerating)15:04
luyaostephenfin, stephenfin : i'm confused about  'implicit' numa15:04
spatelHelp!! - I want to evacuate compute nodes because it has hardware issue, but all running vm using local-disk ( NOT shared disk ) can i do evacuate?15:04
alex_xustephenfin: no, that isn't. The reason do implicit NUMA topology for instance just due to someone asking that usecase in the PTG15:04
mriedemartom: i don't think that's what dan suggested15:04
mriedem"Computes are too old to do the smart thing, workaround is enabled, doing that"15:04
dansmithspatel: see topic15:04
alex_xustephenfin: the explicity NUMA topology for instance and without affinity is clear plan in the PTG15:05
mriedemyou adjust the warning to say that computes are too old to do the supported method or whatever15:05
dansmithspatel: you might notice that we're a day before FF and there are multiple dev-related discussions going on in here15:05
spateldansmith: sorry about that, you guys continue.15:05
luyaostephenfin,alex_xu: what is implicit NUMA affinity15:05
stephenfinalex_xu: It's sounding like the implications of that plan were not properly understood though. Certainly I didn't grasp them15:05
alex_xuluyao: I guess that is just stephenfin typing wrong word, actually mean implicity NUMA topology15:07
alex_xustephenfin: ^ is that right? or you pointed to something I may not get15:07
mriedemartom: i left a suggestion on wording15:07
stephenfinluyao: Implicit NUMA affinity is what we do for hugepages and CPU pinning. If you use either of those features, your guest will have a one-node NUMA topology and it will be pinned to a host NUMA node15:07
stephenfinalex_xu: Yeah, an implicit NUMA topology15:07
luyaostephenfin: I thought implicit Alex said is the guest numa not bind to host numa15:08
mriedemstephenfin: oh i have a solution - drop the numa part but only support vpmems on ppc64 https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_domain.c#L11604-L11615 :)15:08
mriedempower to the rescue15:08
stephenfinfight the power15:08
luyaostephenfin, alex_xu: so you are discuss different things ?15:08
*** jmlowe has joined #openstack-nova15:09
mriedemdoes tonyb have openshift working on power yet?15:09
stephenfinluyao: no, the disagreement is on the lack of pinning15:09
artommriedem, https://review.opendev.org/#/c/640021/50/nova/conductor/tasks/live_migrate.py@20015:09
*** TxGirlGeek has joined #openstack-nova15:10
stephenfinthe fact that for hugepages, the single guest NUMA cell is pinned to a host NUMA cell, but for VPMEM, the single guest NUMA cell floats across all host NUMA cells15:10
stephenfin*for hugepages and CPU pinning15:10
alex_xustephenfin: yes, so you want the pinning, but is it ok without affinity for now?15:11
stephenfinbauzas: Think you could hit these two too? https://review.opendev.org/#/c/681060/ https://review.opendev.org/#/c/681061/15:11
stephenfinalex_xu: I'll take that as a middle ground, yes15:11
bauzasstephenfin: looking15:12
alex_xustephenfin: so the only I need is fall into this branch https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L1740, right15:12
stephenfinyup15:12
alex_xucool15:12
mriedemartom: replied15:12
mriedemwith anger15:13
luyaostephenfin: Does it make sense to do the numa pinning now without affinity.15:13
sean-k-mooneyluyao: yes15:13
stephenfinluyao: It's definitely sub-optimal, I agree, but it's better than breaking our long-standing one guest NUMA node == one host NUMA node policy15:13
sean-k-mooneyits not ideal but its not a bad thing15:14
artommriedem, ok, there's something I genuinely don't get then15:14
luyaosean-k-mooney: Okay15:14
stephenfinyou'll get inferior performance but from a user perspective, things behavior as expected15:14
artomNot trying to make you angry on purpose, for serious15:14
stephenfin*behave15:14
artomIf the conductor's train, and the source is train, and the dest is train, but there are other stein computes in the cell15:14
artomWe'll get past the min sevice version check15:15
artomget past = not return15:15
artomMeaning we check the value of the workaround15:15
artomIf the workaround is disabled, that's that15:15
artomIf the workaround is enabled, in *this specific case of dest and source being new*15:16
artomWe'll get NUMA LM15:16
mriedemartom: even if the source is new and you checked that, the dest - which we might not have yet if the user didn't force the dest from the api and bypass the scheduler - might not be new, and we won't know in that code with the warning log message because it comes before we ask the scheduler for a dest15:17
mriedemso if we get lucky and the source host and selected dest host are new enough, yeah it might be fine15:18
mriedembut,15:18
dansmithmriedem: I don't think that's whathe's saying15:18
mriedemit probably won't be because if you have old computes,15:18
mriedemrpc is going to be pinned and we'll backlevel the new numa migrate_data object stuff15:18
dansmiththis ^ is what he's saying15:18
dansmithso he just doesn't want to specifically say that the src or dst is too old.. right?15:19
mriedemwe never asked him to say that15:19
mriedemwe just said "say something about computes not being upgraded yet"15:19
dansmithI don't think he claimed we did15:19
mriedemwho's on first?15:19
dansmithI think he just said he's trying to wordsmith a suitably generic message right?15:19
mriedemi provided one15:20
artomWell there's the logging on L196 as well - it all fits together15:20
artomOr I'm just overcomplicating everything?15:20
dansmithI don't see any logging on L 19615:20
artomYou asked for it :)15:21
mriedemartom: i think dan just meant adjusting the msg object15:21
dansmithyou mean the logging that would be on L196 if you updated?15:21
mriedemif the config is enabled, we log a warning15:21
mriedemelse we raise, but with the same message15:21
dansmithmriedem: I was suggesting one more log message inside the service version check15:22
*** gyee has joined #openstack-nova15:23
mriedemok i'm going to move onto reviewing gibi's stuff now15:23
artomHaha15:23
dansmithhow about artom just fixes the test issue, adds logs like he thinks he needs, and then we can sort out the log minutia in a later patch if necessary?15:23
bauzasstephenfin: good news, I now start looking at your reshaper change :)15:24
mriedemthat's fine15:24
stephenfinhuzzah!15:24
gibi_flymriedem: hit me with any questions15:26
artomdansmith, btw, the compute RPC version is per-cell, right? So, if we're in a cell with mixed stein/train computes, none of the will be able to send 5.3?15:27
artomThere's no backporting like what's done for objects - if a node can send 5.3, then all others must be able to receive it?15:27
dansmithartom: not really, but I definitely do not think that level of detail needs to be in there15:27
efriedsean-k-mooney: https://review.opendev.org/#/c/681474/2 appears not to have worked --15:27
efriedhttp://52.27.155.124/74/681474/2/check/pmem-tempest-plugin-filtered/134b243/job-output.txt.gz @ 2019-09-11 14:17:14.914120 is still applying PS1315:27
artomdansmith, yeah, maybe I'm overthinking it15:28
*** macz has joined #openstack-nova15:29
*** ivve has quit IRC15:29
efriedsean-k-mooney, dtroyer: I can't tell by looking at https://github.com/intel/Intel-OpenStack-CI-jobs/pull/1/commits/5d4a998a890fef3a2b0a46f73d26aa1deeb3cd8b -- is this merged?15:29
sean-k-mooneyefried in that case they are likely using the git plug not the gitub plugin15:29
*** gbarros has quit IRC15:30
sean-k-mooneyefried: the zuul git driver does not support pull request but works with any git repo via a timer15:30
sean-k-mooneythe gitub one require the zuul app to be installed in the github org15:30
sean-k-mooneybut once that is done pull requeist work15:30
sean-k-mooneyi suspect intel it dont allow the zuul app15:30
efriedsean-k-mooney, dtroyer: I wanted to just recheck one of the top vpmem patches to get new results15:30
efriedbut no point unless that pr merged15:31
*** igordc has joined #openstack-nova15:31
mriedemgibi_fly:15:32
mriedemhttps://review.opendev.org/#/c/676980/1915:32
gibi_flymriedem: looking15:32
bauzasI believe I can gibi_fly15:33
bauzas(sorry)15:33
mriedemget out15:33
gibi_flymriedem: good points. I will fix it up today.15:35
*** shilpasd has joined #openstack-nova15:35
sean-k-mooney efried its not merged by the way15:36
gibi_flybauzas: I will change to a bus soon so you were on time with that joke :)15:36
bauzasgibi_fly: do you need help for providing a new revision ?15:36
sean-k-mooneyefried: the PR woudl be closed if it was merged15:36
bauzasgibi_fly: since you're a bit on-off15:36
bauzasFWIW, I starbucks'd last Friday and it wasn't a great experience15:37
gibi_flybauzas: thanks. I will manage15:37
gibi_flybauzas: I will ping you for review :)15:37
bauzasgibi_fly: cool15:37
*** eharney has quit IRC15:37
gibi_flybauzas: I don't know which net is better, plane or starbucks15:37
efriedsean-k-mooney: okay, that's what I thought, but dtroyer said he merged it so I wasn't sure if I just didn't know how to read the thing (which would be totally likely)15:38
luyaostephenfin: could you review rest of the patch series about vpmems?15:38
sean-k-mooneyhe approved it and i guess that normllay allows a ci to merge it but in this case it might need to be manually submitedd15:38
stephenfinluyao: Already on it :)15:38
luyaostephenfin: Thanks!  alex_xu  will address the numa stuff. :)15:39
bauzasgibi_fly: I'm just surprised you getting internet connection while on flight15:41
bauzasgibi_fly: I never saw this with BA or Swiss or whatever else15:41
alex_xuluyao: go home, it is too late to be office right now15:41
gibi_flybauzas: Lufthansa has some paidd service15:41
gibi_flybauzas: 7 EUR to push patches and read reviews sounded like a bargain15:42
luyaoalex_xu: I prepared to fight overnight. :D15:43
* bauzas reconsiders Lufthansa then15:43
efriedlufthansa is a lovely airline15:43
bauzasoh gosh it's not15:43
*** jaosorior has joined #openstack-nova15:44
efriedbut maybe only compared to the shit that's standard in the US15:44
bauzasbut that's a sidetrack15:44
dtroyerefried, sean-k-mooney: it is probably my github fail… I hit approve… but I've also been multi-tasking two meetings this whole time...15:45
stephenfinwhen you've flown Ryanair, anything is a lovely airline15:45
melwittI find lufthansa's gate boarding "system" a total mess (in that there seems to be no system) but other than that, I've been happy enough with it15:45
sean-k-mooneydtroyer: no worries. you have too also manually submit it if there is not a ci set up to do it15:46
efriedaspiers: fyi you can knock a patch out of the gate by rebasing it. I can re+W if you want to do that rather than waiting for the gate to fail.15:48
aspiersefried: ah, nice idea thanks!15:49
*** eharney has joined #openstack-nova15:50
bauzaswell, what I can just say about Lufthansa is how they're aggresive with carry-on luggage15:50
openstackgerritsean mooney proposed openstack/nova master: multi numa nfv testing job  https://review.opendev.org/67965615:51
openstackgerritsean mooney proposed openstack/nova master: [DNM] testing with new lable.  https://review.opendev.org/68073815:51
bauzasevery time I flew with them, my carry-on was taken on checked luggage, just because they felt they were running out of space in the cabin15:51
*** damien_r has quit IRC15:51
dtroyerefried, sean-k-mooney: ok, for the record, just talked with Dolpher and his final change is ready to be merged into the Intel-OpenStack-CI-jobs repo, I should get that done after this meeting15:51
bauzasI admit this was in Geneva, but come on, even on August when all people wear light stuff ?15:52
efrieddtroyer: nice15:52
sean-k-mooneydtroyer: ok ill close my PR then15:52
sean-k-mooneylets just go with the proper one15:52
sean-k-mooneyalso i forgot to chnage the rul in the numa tests new patch incoming15:53
*** Garyx has quit IRC15:53
*** Garyx has joined #openstack-nova15:54
efriedmelwitt: I wonder if you're hitting https://github.com/openstack/keystoneauth/blob/38cd5fc6c39c38a51c11683884caf9696ce5f367/keystoneauth1/exceptions/http.py#L411 which afaict will result in your exception containing bugger-all.15:54
openstackgerritsean mooney proposed openstack/nova master: multi numa nfv testing job  https://review.opendev.org/67965615:56
openstackgerritsean mooney proposed openstack/nova master: [DNM] testing with new lable.  https://review.opendev.org/68073815:56
efriedmelwitt: ...which may be possible if the encoding is off or something.15:56
*** TxGirlGeek has quit IRC15:57
*** TxGirlGeek has joined #openstack-nova15:58
openstackgerritsean mooney proposed openstack/nova master: [DNM] test migration with pinning  https://review.opendev.org/67975415:59
melwittefried: thanks for the idea15:59
*** shilpasd has quit IRC16:00
artomStupid sphinx errors line numbers that have nothing to do with anything16:01
openstackgerritsean mooney proposed openstack/nova master: [DNM] test non overlaping vcpu pin sets.  https://review.opendev.org/67980516:01
*** spatel has quit IRC16:01
sean-k-mooneyartom: ok all the numa jobs should be using the old nodepool lable now and running agaist the patch before the functionl tests16:02
*** spatel has joined #openstack-nova16:02
sean-k-mooneymaybe i missed one16:05
mriedemartom: if it was reno, i use an online sphinx editor to validate those rather than wait to build them locally16:06
aspiersefried: hrm, bit confused - now the failure has disappeared from the gate but there's another job for it still running in the gate's "integrated" queue which has no failures16:07
artommriedem, thanks for the suggestion, I think I'll do that16:07
*** spatel has quit IRC16:07
artomsean-k-mooney, cheers, thanks!16:07
aspiersefried: might it have failed in a different queue in the gate? which gate queue(s) actually count for the V+2?16:07
sean-k-mooneyfixing the last one16:07
efriedaspiers: is it possible you were looking at your patch as it existed as part of some other patch's chain?16:08
efriedwhat's the change #?16:08
aspiersefried: 68125416:08
openstackgerritsean mooney proposed openstack/nova master: [DNM] test non overlaping vcpu pin sets.  https://review.opendev.org/67980516:08
aspiersefried: the failure I saw was at the tip of a chain16:08
* aspiers looks in the builds tab16:08
aspiersweird, can't see it16:09
aspiersthe only failure was from the first time around16:09
aspiersi.e. 2019-09-11T05:56:59 which was before my recheck16:09
aspiersmaybe I was hallucinating16:10
efriedyeah, it looks okay to me16:10
*** dtantsur is now known as dtantsur|afk16:10
sean-k-mooneydonnyd: fyi i have 4 jobs queue 3 using the old lable and 1 using the new one so ill let you know how it goes. hopeful your router tweaks resolved the issue16:10
*** jmlowe has quit IRC16:13
sean-k-mooneystephenfin: efried is there anything i can do to help with either the PCPU in plamcenet seriese or pmem?16:14
efriedsean-k-mooney: I'm at a point where I'm willing to proxy your vote on https://review.opendev.org/#/c/678455/ -- once we've agreed whether it needs to be changed or not.16:15
efriedI'm not clear whether y'all came to an agreement above?16:15
stephenfinsean-k-mooney: I'm on the vpmem one, so if you fancy looking at either https://review.opendev.org/#/c/674895/16:16
stephenfinefried: alex_xu is on it16:16
mriedemTestInstanceNotificationSampleWithMultipleCompute.test_multiple_compute_actions just reset the gate again,16:16
mriedemi think we have a regression16:16
stephenfinWe agreed it was better to keep the one guest NUMA node to one host NUMA node model, even if we don't do proper affinity for the VPMEM devices16:16
efriedand that's because qemu has some checks in place that don't allow us to do the slightly better thing?16:17
sean-k-mooneystephenfin: so like pci passthoug with affinity disabled16:17
stephenfinefried: Yup16:17
efriedokay cool.16:17
stephenfinsean-k-mooney: Yeah, good analogy16:17
stephenfinefried: I did have a comment on https://review.opendev.org/#/c/678448/20 though16:18
stephenfinI'm not sure if it's worth respinning for but if we don't, we're stuck with that object name forever16:18
sean-k-mooneywhats SlugField16:19
sean-k-mooneyoh somting that is alpha numeric or has _16:20
stephenfinsean-k-mooney: That's the terminology I'm used to from Django. I thought it was more widespread than that, tbh16:20
stephenfinyeah, exactly16:20
sean-k-mooneynot really but i have seen you use it before16:20
*** luksky has quit IRC16:21
stephenfinhere we go https://en.wikipedia.org/wiki/Clean_URL#Slug16:21
efriedstephenfin: I thought slug was with hyphens, not underscores16:21
stephenfinGood point. It is16:21
efriedoh, according to that wp article it's either.16:21
efriedbut also lowercased16:22
*** ricolin has quit IRC16:22
efriedand also specific to URLs16:22
sean-k-mooneyya this is specifcly only uppercase16:22
efriedI was beaten up about this by jaypipes at some point when I tried to name a method slugify16:22
efriedin fact16:22
sean-k-mooneyits basically what i know as CONST_CASE16:22
efriedI think it was the method now known in os-traits and os-resource-classes as normalize_name :)16:22
efriedso no16:22
efriedyou can't call it a slug16:22
efriedbecause Jay will kick your ass16:22
*** tbachman has joined #openstack-nova16:22
efriedIMO it's definitely not worth respinning for this.16:23
efriedeven if this object winds up single use16:23
stephenfinFair point. I was on the fence as it was16:23
sean-k-mooneywe can alwas do it later16:24
sean-k-mooneyif we have a common felid type this can jsut be an alias or inherit16:24
stephenfinvery true16:24
stephenfinI switched to +2 anyway16:24
mriedemhttps://bugs.launchpad.net/nova/+bug/184361516:24
openstackLaunchpad bug 1843615 in OpenStack Compute (nova) "TestInstanceNotificationSampleWithMultipleCompute.test_multiple_compute_actions intermittently failing since Sept 10, 2019" [Undecided,New]16:24
stephenfinon account of it not being a slug and therefore probably not as reusable as I thought it would be16:25
sean-k-mooneystephenfin: once question does the qutotat code need to be move down in the series16:25
stephenfinsean-k-mooney: Yup, see my comment on same. I'll move it when I squash that fixup patch back16:25
stephenfinexcept for the functional test. That has to wait til the end, of course16:25
sean-k-mooneyah cool16:25
stephenfinthe squash back which will be happening any minute now16:26
openstackgerritAlex Xu proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845516:29
openstackgerritAlex Xu proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845616:29
openstackgerritAlex Xu proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964016:29
openstackgerritAlex Xu proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847016:29
openstackgerritAlex Xu proposed openstack/nova master: objects: use all_things_equal from objects.base  https://review.opendev.org/68139716:29
*** ociuhandu has quit IRC16:30
alex_xustephenfin: sean-k-mooney ^ done \o/16:30
alex_xureally small change actually16:31
alex_xuand tested16:31
efriedsean-k-mooney: Can I abandon https://review.opendev.org/#/c/681474/ ?16:31
sean-k-mooneyoh yes16:31
alex_xustephenfin: sean-k-mooney removed the workaround code at https://review.opendev.org/678455, then a super simple check at https://review.opendev.org/67845616:33
sean-k-mooneyalex_xu: i expect the code is simpler overall16:34
alex_xusean-k-mooney: yea16:34
alex_xuoops, I forget to remove that useless need_pin parameters16:35
sean-k-mooneyim not sure get_guest_numa is correct it proablyt should be get_guest_singel_numa16:36
*** ralonsoh has quit IRC16:37
efriedthese sound like fupables16:37
alex_xusean-k-mooney: emm?16:37
sean-k-mooneyhttps://review.opendev.org/#/c/678455/30/nova/virt/hardware.py16:38
sean-k-mooneyit ignores the flavor and always returns a single numa node16:38
sean-k-mooneyso get_guest_numa is missleading16:38
alex_xusean-k-mooney: sorry, that one should be removed, noone use that method anymore16:39
sean-k-mooneyah ok16:39
sean-k-mooneyi was looking for where it was used16:39
sean-k-mooneythat explain why i could not find it16:39
* bauzas is done for the day unfortunately16:40
bauzassee you folks tomorrow morning16:40
*** lpetrut has quit IRC16:40
* bauzas goes daddy16:40
openstackgerritAlex Xu proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845516:41
openstackgerritAlex Xu proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845616:41
openstackgerritAlex Xu proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964016:41
openstackgerritAlex Xu proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847016:41
openstackgerritAlex Xu proposed openstack/nova master: objects: use all_things_equal from objects.base  https://review.opendev.org/68139716:41
alex_xusean-k-mooney: actually, it is all about this little bit https://review.opendev.org/#/c/678456/32/nova/virt/hardware.py@168816:42
*** ivve has joined #openstack-nova16:42
sean-k-mooneyalex_xu: ya that shoudl cause nodes to be 116:43
sean-k-mooneyand i think we will take this brance when creating the numa toplgoy right https://review.opendev.org/#/c/678456/32/nova/virt/hardware.py@174216:44
sean-k-mooneywell we will also call  numa_topology = _get_numa_topology_auto(...)16:45
*** jaosorior has quit IRC16:46
*** ociuhandu has joined #openstack-nova16:46
sean-k-mooneyya16:47
alex_xusean-k-mooney: yes, it is go to the _get_nuam_toplogy_auto16:47
sean-k-mooneythe singel numa toplogy is created by https://review.opendev.org/#/c/678456/32/nova/virt/hardware.py@151416:47
sean-k-mooneywhich is exactly what we want16:47
alex_xuyea16:47
*** TxGirlGeek has quit IRC16:51
stephenfinefried: What do you think about https://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py@1299 ?16:53
openstackgerritArtom Lifshitz proposed openstack/nova master: NUMA live migration support  https://review.opendev.org/63460616:54
openstackgerritArtom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration  https://review.opendev.org/64002116:54
openstackgerritArtom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration  https://review.opendev.org/67259516:54
artomdansmith, mriedem ^^16:54
efriedstephenfin: It is critical that a dirty pmem not be allowed to be assigned to another guest16:54
efriedsorry, you're asking whether we should continue cleaning up wait until the end to fail out?16:55
*** TxGirlGeek has joined #openstack-nova16:55
stephenfinyup16:55
efriedthis is by no means the only way we can fail out of a destroy() early16:55
efriedis it?16:55
stephenfinThat's what I'm trying to figure out16:55
stephenfin_unplug_vifs is called with ignore_errors=True16:56
* artom is reminded he should get food...16:56
efriedlooking at L1207-3016:56
efriedanything that's not LibvirtException will cause immediate failure.16:56
efriedand even some LibvirtErrorZ16:56
efriedso this is far from unprecedented it would seem.16:56
stephenfingood point. If it's broken, it was already broken16:57
efriedstephenfin: likewise below when cleaning up bdms16:57
openstackgerritMatt Riedemann proposed openstack/nova master: Fix race in _test_live_migration_force_complete  https://review.opendev.org/68154016:57
mriedemefried: dansmith: ^ this fixes a gate bug16:57
mriedemwhich reset us earlier16:57
sean-k-mooneystephenfin: its called where16:57
*** tesseract has quit IRC16:57
*** gibi_fly is now known as gibi_bus16:58
sean-k-mooneyoh in distroy16:58
stephenfinhttps://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py@119616:58
stephenfinsean-k-mooney: ^16:58
*** mriedem is now known as gibi_zeppelin16:58
* stephenfin notes gibi_bus is on quite the little roadtrip today16:58
stephenfinheh16:58
*** gibi_zeppelin is now known as gibi_submarine16:58
*** gibi_submarine is now known as mriedem16:58
*** spatel has joined #openstack-nova16:59
gibi_busstephenfin: yepp, try to get to Athlone16:59
*** derekh has quit IRC17:00
sean-k-mooneyso there is a diffreenc between pmens and vifs17:00
efriedmriedem: qualified +217:00
* artom prepares for gibi_sheep17:00
sean-k-mooneypmem namespace contain user data which we want to be very careful with ensuring it is earased17:00
sean-k-mooneywe shoudl clean up vifs but if noting is plug into it it has little effect17:01
sean-k-mooneyideal neighter would fail17:01
sean-k-mooneybut we shoudl make sure we dont skip the rest of the cleanup if they do17:01
dansmithartom: -117:01
artomFor the sheep joke or the code?17:01
dansmithartom: for the plethora of typos17:02
sean-k-mooneystephenfin: efried so i think we shoudl catch the VPMEMCleanupFailed excetion or put the rest of the cleanup into a finally block17:02
dansmithI assumed you were spending lots of time wordsmithing and checking all that stuff17:02
sean-k-mooneystephenfin: efried or do the pmem cleanup last17:02
sean-k-mooneyactullly no we want to always try it17:03
*** awalende has joined #openstack-nova17:03
stephenfinsean-k-mooney: You see what efried said about us already failing hard on other things though?17:03
sean-k-mooneyyes and that has bit us over and over again17:03
sean-k-mooneythis is one of the places we leak resouces on failed migrations17:03
efriedseems like yet another thing we shouldn't start fixing as part of this series, but should do later when we can be more complete17:04
sean-k-mooneyi guess17:04
artomdansmith, clearly not enough :(17:04
efriedf man, if we try to fix everything ugly we notice or touch or get near, we'll never get anything done anywhere ever.17:05
sean-k-mooneywhat is the behavior when we raise we go to error then we clean it up later?17:05
sean-k-mooneyi assume that woudl jsut fail again17:05
dansmithartom: fix those quick so I can apply my +2 and we can move on17:05
mriedemartom: https://review.opendev.org/#/c/634606/8517:05
mriedemFUP needed but won't block17:05
artomAck, thanks17:06
*** awalende has quit IRC17:07
sean-k-mooneyefried: ok so we will just leave it raise.17:07
efried++17:08
*** ociuhandu has quit IRC17:08
sean-k-mooneyefried: its not that i want to fix every ugly thing we see. i just dont want use adding more17:08
efriedYeah, I understand, but in this case it would entail reengineering the whole exception flow.17:08
sean-k-mooneyas long as we dont return the namesapce to the pool of alocatable ones then its fine17:08
sean-k-mooneyif we did it woudl be a CVE17:08
efriedright, IIUC we're careful enough about that.17:09
sean-k-mooneyeven if you delete the vm17:09
efriedI think in all cases where we could have leaked a pmem, we scrub it.17:09
* mriedem sets timer for pmem cve17:09
mriedemnow that sean jinxed it17:09
sean-k-mooneyif only the hardware did secure erase....17:10
efriedwhat was that unix util called?17:10
efriedshred17:10
sean-k-mooneydban?17:10
sean-k-mooneyshred is proably a thing too17:10
efriedshred exists on my ubuntu. dban doesn't.17:11
mriedemshred is used in the lvm image backend code17:11
artomdansmith, you made me enable spell check for vim17:11
sean-k-mooneydban is not a utility its a thing you boot https://dban.org/17:11
efriedanyway, I would think a VM wouldn't be able to go looking at a low enough level to recover regular-deleted data.17:12
sean-k-mooneyefried: of couse it can17:12
efriedthought you needed specialized hardware for that17:12
sean-k-mooneyits directly mapped into the guests adress speace17:12
sean-k-mooneyit is byte adressable17:13
efriedright, but if it's deleted17:13
sean-k-mooneythat is why we need to 0 it out17:13
efriedisn't that what daxio -z does??17:13
sean-k-mooneyefried: there is no filesystem so to delete it you have to write over it17:13
sean-k-mooneyand yes17:14
dansmithartom: no, you did.17:14
artomdansmith, that was a good thing :)17:14
dansmithefried: uh what?17:14
efriedsean-k-mooney: yeah, so what I'm saying is, having daxio -z'd the thing, you would need specialized hardware to try to uncover the ghost data17:14
mriedemartom: i've piled on17:14
efrieddansmith: what what?17:14
sean-k-mooneyefried: you would be surpriesed17:15
dansmithefried: unless you scrub every byte in what you hand to the guest, it can find it17:15
efriedscrub meaning overwrite multiple times?17:15
sean-k-mooneyefried: yes17:15
efriedand... is that not a problem for regular ol ram and disk too?17:15
sean-k-mooneyat least with magnetic media there was a bias that a singel over write was not enough17:15
efrieddo we srsly shred every byte of those between VMs?17:16
dansmithefried: certainly not for ram because it's paged in17:16
sean-k-mooneyram is not persetend so no power no data17:16
dansmithefried: the kernel can zero a page before it lets you read from it if you haven't written to it17:16
dansmithbut with media that's not the case17:17
dansmithI dunno what the semantics of the daxio thing are, but in general you have to be careful about that stuff17:17
efried-z, --zero Zero the output device for len size, or the entire device if no length was provided. The output device must be a Device DAX device.17:17
dansmithand there have been exploits where even dropping the map could be circumvented by tricking the thing into granting you access to a region again without it being zeroed17:17
efriedI mean, it doesn't say "Really really zero the output device"17:18
efriedbut still17:18
efriedyou would think it would be set up to prevent contamination17:18
dansmithefried: and for silicon, it almost never does because it costs money (i.e. wears the media)17:18
efriedotherwise what good is it?17:18
sean-k-mooneyefried: anyway were are side tracking17:18
efriedagreed17:18
dansmithefried: you said "regular deleted data" above, which is what I took exception to17:19
efriedack17:19
sean-k-mooneyright its specal hadware that needs special handeling17:19
dansmithbut I will say, all the spectre stuff is about sussing out data via sidechannel which is not reading it direclty17:19
dansmithso let's not pretend that's not a thing :)17:19
efriedYeah, of course if you just trash the inode and don't actually zero the content17:19
efriedI'm talking about: if you overwrite with zeros, but just once, don't you need something special (that VMs don't have) to read the ghost data?17:20
dansmithnot necessarily17:20
efriedokay17:20
dansmithyou're thinking about low-level hardware detection of things,17:20
dansmithbut that's not the only way to get access to that "ghost" data17:21
dansmiththat's the whole point of sidechannel attacks17:21
alex_xuefried: do we have something to ignore a host for placement allocation_candidates call now?17:22
efriedalex_xu, sean-k-mooney: looks like the CI fixups have merged now https://github.com/intel/Intel-OpenStack-CI-jobs/pull/217:22
efriedWe still have some more minor deltas to do right?17:22
efriedalex_xu: you mean something like in_tree=!$uuid  ?17:22
alex_xuyes17:22
* efried looks17:22
sean-k-mooneyi havent done a full review but the inital change i have looked at seam correct17:23
efriedalex_xu: I don't think so. cdent, we haven't done !in_tree yet, right?17:23
efriedalex_xu: why?17:23
efriedoh17:24
alex_xuefried: for his https://review.opendev.org/#/c/681383/5/nova/scheduler/manager.py@15517:24
alex_xuif we have that, then we can ignore the src host for the same host resize17:24
openstackgerritArtom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration  https://review.opendev.org/64002117:24
openstackgerritArtom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration  https://review.opendev.org/67259517:24
alex_xuefried: emm...or we do a manually filter after that17:25
sean-k-mooneyall the numa jobs just got killed17:25
artommriedem, dansmith ^^17:25
efriedalex_xu: Isn't this the famous "doubled allocations" bug?17:26
*** luksky has joined #openstack-nova17:26
alex_xuefried: no, it isn't17:26
alex_xuefried: when a host enable same host resize, and that host is the only host has available PCPU now.17:27
sean-k-mooneyalex_xu: cant ever ignore the source host for same host reszie or rebuild17:27
*** ociuhandu has joined #openstack-nova17:27
efriedyeah, I would think for same-host resize we would want to restrict to *only* the current host17:27
efried...having calculated the positive difference from our current allocations17:27
alex_xuefried: we will try to allocate PCPU in placement, and yes, placement tell you the source host has PCPU, but later the scheduler will check ignore host, the source host is in the ignore host. so the scheduling failed.17:27
dansmithefried: you don't know you're doing a same-host resize17:28
efriedoh, bummer17:28
alex_xusean-k-mooney: sorry, I mean the enable same host resize17:28
alex_xuoops17:28
alex_xudisalbe the same host resize17:28
sean-k-mooneysame host resize with pcpus is way more involved17:28
efriedso what we should really do is two GET /a_c calls17:28
sean-k-mooneywell not pcpu numa in general17:28
efriedone with in_tree=!$cur_host with the full set of resources17:29
efriedone with in_tree=$cur_host with the diff'd set of resources17:29
efriedand then combine the results.17:29
mriedemumm, you're not talking about adding that into the series for train now are you?17:29
sean-k-mooneythat still has the double allocation proablem17:30
alex_xudansmith: I know the ignored_host from request spec, so I can check that, if the allocation_candidates return the host in the ignore_hosts, then fallback to VCPU17:30
efriedsean-k-mooney: how would it?17:30
mriedemare you essentially trying to say, avoid the same host for a resize if the instance has PCPU resources?17:30
dansmithalex_xu: you want to fall back to vcpu allocation if the only host that gets returned is the one you're on?17:30
dansmithalex_xu: that's....crazy17:30
sean-k-mooneythe in_tree=$cur_host call would require double the resouces17:30
sean-k-mooneythe current resouce and the resize to resouces17:31
dansmithalex_xu: or are you saying fall back to vcpu allocation but with the current host excluded?17:31
sean-k-mooneyanyway the numa toplogy filter requires double the resouces so its not a new thing17:31
efriedsean-k-mooney: no, that's why the in_tree=$cur_host is made with the *diff* of the current and new resources (only the positive ones)17:31
*** ociuhandu has quit IRC17:31
efried(this is all theoretical btw)17:31
alex_xudansmith: fallback to vcpu if the only host get returned is the one you are on. After fallback vcpu, I won't get VCPu availalbe on the source host again17:31
sean-k-mooneyefried: i mention this on irc last week17:32
sean-k-mooneyin placment channel17:32
sean-k-mooneyefried: i have a customer request to fix inplemce rebuild with numa toplogy17:32
*** spsurya has quit IRC17:32
efriedsean-k-mooney: Yeah, I thought it sounded familiar.17:32
sean-k-mooneyso i or someone on my team needs to actully make this work in U17:32
cdenti confirm that in_tree=! hasn't happened yet (sorry was away from computer for a few minutes doing network management)17:32
sean-k-mooneyand backport somthi esle to queens17:32
mriedemsean-k-mooney: there is another nickel17:33
*** brault has quit IRC17:33
mriedemsean-k-mooney: remember when you said you'd make sriov pci device live migration claims part of the actual claims code in U too/17:33
mriedem?17:33
sean-k-mooneymriedem: yes but this time we have the customer wanting use to ship them the backported code  in october17:33
dansmithU is going to be a big cycle.. HUGE.  YUGE.17:33
dtroyerefried, sean-k-mooney, alex_xu: status update: we've re-triggered the pmem job on Intel CI with the changes on PS22, should have some results in about 45 min…17:33
mriedemsean-k-mooney: backported code for PCPU?17:33
efrieddansmith: no it's not.17:33
mriedemincluding all of the upgrade shit that entails?17:33
alex_xudtroyer: thanks17:33
dtroyerhttps://review.opendev.org/#/c/679640/2217:33
sean-k-mooneymriedem: no inplace numa vma rebuild17:34
efriedthanks dtroyer17:34
mriedeminplace numa vma rebuild?17:34
efriedit appears that 'pleaselookagain' is the magic invocation for pmem CI17:34
mriedemi'm lost - i don't know what that has to do with what we're talking about17:34
alex_xumriedem: https://review.opendev.org/#/c/681383/5/nova/scheduler/manager.py@15517:34
alex_xumriedem: or you can see the PS2 comment I give17:34
*** brault has joined #openstack-nova17:35
*** brault has quit IRC17:35
*** brault has joined #openstack-nova17:35
sean-k-mooneymriedem: https://bugzilla.redhat.com/show_bug.cgi?id=170041217:35
efrieddansmith: If the PTL has anything to say about it, we're going to accept only blueprints that were already approved or proposed-but-didn't-make-spec-freeze17:35
openstackbugzilla.redhat.com bug 1700412 in openstack-nova "[RFE] Rebuild with a different image fails due to NUMATopologyFilter" [High,New] - Assigned to nova-maint17:35
mriedem"if same-host resize is disabled and the only host available is the host you're on there are resource problems" is not a new issue17:35
*** jmlowe has joined #openstack-nova17:36
mriedem^ is also a problem for servers in a strict affinity group17:36
mriedemsince you can't resize out of that17:36
dansmithefried: good luck with that17:36
sean-k-mooneymriedem: no its not a new proablem at all17:36
efriedyou can be part of the solution17:36
mriedemi think i tried that strategy back in like ocata17:36
dansmithefried: don't want to17:36
alex_xumriedem: okay...17:37
dansmithefried: we've approved things I don't think we should do almost every cycle, so...limiting to doing things we've "approved" is counterproductive, IMHO17:37
alex_xudansmith: yes, you can be :)17:37
mriedemsean-k-mooney: so https://bugs.launchpad.net/nova/+bug/1763766 ?17:38
openstackLaunchpad bug 1763766 in OpenStack Compute (nova) "nova needs to disallow resource consumption changes on image rebuild" [Medium,Triaged]17:38
artommriedem, fwiw, the SRIOV live migration thing can be me17:38
alex_xudansmith: back to preivous, why I check ignore_host is crazy...17:38
sean-k-mooneymriedem: its related17:38
dansmithalex_xu: what I said is crazy is not what you are actually suggesting I think17:39
mriedemartom: i've resigned that it's never going to happen so was just using it as an example of another "we'll do it in U" thing17:39
alex_xudansmith: ok...i'm safe17:39
sean-k-mooneyfor older release i was planing to check for that and allow skiping the numa topology filter if there were no changes in resouce usaage or numa toplogy on rebuild only.17:39
artommriedem, I mean, I followed through on device tagging attachments, back in the day, didn't I?17:39
alex_xustephenfin: so...check that https://review.opendev.org/#/c/681383/5/nova/scheduler/manager.py@155, if that is solution for you17:39
artomThough I did drop the ball in placement for osc live migration17:39
mriedemartom: you had a business justification for tagged attachments right?17:40
artommriedem, -17:40
dansmithartom: you gonna wait for the last three weeks of the U cycle for that one too?17:40
artomErr, 017:40
mriedemcustomers aren't asking that you work on sriov pci device claims architecture17:40
sean-k-mooneymriedem: of corse we will only do dthat if we fix it upstream and block it upstream first17:40
artomIt was purely a "let's be good community citizens" thing17:40
mriedemi'm not sure what "placement for osc live migration" is17:41
stephenfinalex_xu: I'm not sure you get you17:41
artommriedem, err, I meant the osc live migration refactor17:41
mriedemstephenfin: obligatory smiley face please17:41
mriedemartom: i got that17:41
mriedemand the --boot-from-volume sugar17:41
artomI didn't even do the device tagging for it17:41
artomWhich we *did* have an ask for17:41
mriedemwell the good news is we have a whole big ass etherpad of osc gaps for nova17:42
mriedemhttps://etherpad.openstack.org/p/compute-api-microversion-gap-in-osc17:42
luyaostephenfin: could I just catch all exceptions, because we will print it later  https://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py@129717:42
mriedemthe bad news (for openstack) is my university mentoring project idea to work on that didn't get people17:42
alex_xustephenfin: the issue, or the solution?17:42
artommriedem, huh, that reminds me, we're apparently getting interns - or have signed up to get some, at any rate17:43
stephenfinso if there are no allocation candidates or if allocations_candidates == [current_host], fallback to VCPUs?17:43
artomWe were wondering what to set them lose on17:43
artomOSC could be a good idea17:43
alex_xustephenfin: if allocation_candidates == ignored_hosts, fallback to VCPUs17:43
artomThe other idea was test automation stuff with tempest/whitebox, since the learning curbe isn't as steep as nova17:43
artom*curve17:43
stephenfingotcha17:43
stephenfinI can do that (y)17:43
stephenfinalex_xu: Can I do it as a follow-up though?17:44
stephenfinbecause that patch is already huge and it's a corner case17:44
stephenfinI'll have it done first thing in the morning17:44
alex_xustephenfin: sure, or I can help that in my morning17:45
stephenfinsweet17:45
*** cdent has quit IRC17:45
stephenfinI'm just moving your quotas patch further down the series17:46
stephenfinshould be done shortly then I'll push it up17:46
alex_xucool, I thought it should be adjust the order also17:46
alex_xuanyway, see you guys tomorrow17:46
stephenfino/17:46
efriedstephenfin: did you see luyao's question about https://github.com/intel/Intel-OpenStack-CI-jobs/pull/2 ?17:47
efrieds,that link,https://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py@1297,17:47
mriedemhttps://review.opendev.org/#/q/topic:bp/numa-aware-live-migration+status:open+label:Code-Review=2+project:openstack/nova17:47
mriedemartom: dansmith: sean-k-mooney: ^17:47
mriedemdonnyd: ^17:48
efried\o/17:48
artomzomgies17:48
mriedemsuccess bot?17:48
efriedffs wait until they merge17:49
mriedempsh17:49
dansmithmriedem: praise be17:49
sean-k-mooney:)17:49
mriedemgibi_bus: are you going to get https://review.opendev.org/#/c/676980/ updated today or should i work on that?17:49
gibi_busmriedem: I'm about to finish the rework17:50
gibi_busmriedem: I will ping you when it is up17:50
* gibi_bus has a long day today17:50
mriedemok, then i'm going to get lunch17:51
gibi_busmriedem: have a nice one!17:52
* mriedem goes to look for leftovers in the fridge17:53
* gibi_bus has a bit of pastry for dinner 17:53
*** dolpher has joined #openstack-nova17:55
efriedo/ dolpher :)18:03
* efried feeds face18:04
* donnyd waves to dolpher 18:07
dolpherefried: Hello, seems the filter is not working, no one is triggering the CI job now18:07
dolpherHi donnyd :)18:07
*** Garyx has quit IRC18:12
*** mdbooth has quit IRC18:16
*** Garyx has joined #openstack-nova18:16
donnydsean-k-mooney: I didn't hear back on the labels.. but i hope this https://review.opendev.org/#/c/634827/ isn't waiting on jobs for FN18:19
donnydI submitted a bit more for resources, kinda need it right now18:19
openstackgerritBalazs Gibizer proposed openstack/nova master: Support migrating SRIOV port with bandwidth  https://review.opendev.org/67698018:20
gibi_busmriedem: ^^18:20
gibi_busbauzas: for tomorrow's coffee ^^18:21
*** panda|ruck is now known as panda|ruck|off18:21
sean-k-mooneydonnyd: no but the jobs got canceld18:22
sean-k-mooneythe 4 build i had in flight got killed because of a rebase/fix of the patches18:22
sean-k-mooneyi can start them again and see what happens18:22
openstackgerritsean mooney proposed openstack/nova master: multi numa nfv testing job  https://review.opendev.org/67965618:24
openstackgerritsean mooney proposed openstack/nova master: [DNM] testing with new lable.  https://review.opendev.org/68073818:24
sean-k-mooneydonnyd: just rebased those via the gerrit ui that shoudl run with the old and new lable18:25
donnydsean-k-mooney: ok cool18:26
*** maciejjozefczyk has quit IRC18:29
efrieddolpher: Can we kill the filter for now?18:39
openstackgerritBalazs Gibizer proposed openstack/nova master: Allow migrating server with port resource request  https://review.opendev.org/67149718:39
dolpherhttps://github.com/intel/Intel-OpenStack-CI-jobs/pull/418:40
dolpherefried: send the pull request to remove the filter, and I'll ping Matt to merge it.18:41
efriedokay18:41
openstackgerritBalazs Gibizer proposed openstack/nova master: Do not query allocations twice in finish_revert_resize  https://review.opendev.org/67882718:42
openstackgerritBalazs Gibizer proposed openstack/nova master: Allow resizing server with port resource request  https://review.opendev.org/67901918:44
openstackgerritBalazs Gibizer proposed openstack/nova master: Extract pf$N literals as constants from func test  https://review.opendev.org/68099118:52
openstackgerritLuyao Zhong proposed openstack/nova master: Retrieve the allocations early  https://review.opendev.org/67845018:52
openstackgerritLuyao Zhong proposed openstack/nova master: Claim resources in resource tracker  https://review.opendev.org/67845218:52
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces  https://review.opendev.org/67845318:52
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845418:52
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845518:52
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845618:52
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964018:52
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847018:52
openstackgerritLuyao Zhong proposed openstack/nova master: objects: use all_things_equal from objects.base  https://review.opendev.org/68139718:52
openstackgerritBalazs Gibizer proposed openstack/nova master: Improve dest service level func tests  https://review.opendev.org/68099818:54
*** eharney has quit IRC18:55
eanderssonHaving an odd bug - a VM can't be rebuilt with a new image. Pretty sure placement is filtering out all the potential matches.18:56
eanderssonRebuilding it with the same image works fine.18:57
openstackgerritBalazs Gibizer proposed openstack/nova master: Follow up for Ib50b6b02208f5bd2972de8a6f8f685c19745514c  https://review.opendev.org/68149018:57
mriedemeandersson: which release?18:57
eanderssonRunning Rocky18:57
eanderssonVery latest stable/rocky18:57
eanderssonI see that the scheduler gets a single result, but it is failing on NUMA Topology.18:58
mriedemwe shouldn't be calling placement for rebuild https://github.com/openstack/nova/blob/stable/rocky/nova/scheduler/manager.py#L12618:58
eandersson>  'NUMATopologyFilter: (start: 1, end: 0)'18:58
eanderssonAh bad assumption18:58
mriedemhttps://bugs.launchpad.net/nova/+bug/1763766 ?18:58
openstackLaunchpad bug 1763766 in OpenStack Compute (nova) "nova needs to disallow resource consumption changes on image rebuild" [Medium,Triaged]18:58
mriedemor https://bugzilla.redhat.com/show_bug.cgi?id=170041218:59
openstackbugzilla.redhat.com bug 1700412 in openstack-nova "[RFE] Rebuild with a different image fails due to NUMATopologyFilter" [High,New] - Assigned to nova-maint18:59
openstackgerritBalazs Gibizer proposed openstack/nova master: Skip querying resource request if no qos port  https://review.opendev.org/68151318:59
eanderssonYep looks like it18:59
dansmithyeah if the image is what causes it trouble,18:59
dansmithit's likely the new numa topo filtering out the one host you're on19:00
mriedemsean-k-mooney has to fix it and get that backported to queens19:00
* dansmith resists making an unhelpful comment about numa as a feature19:00
artommriedem, it's not getting backported, but one of the things we discussed was making that RUN_ON_REBUILD flag we added way back when for the rebuild CVE configurable in some way19:02
artomWe kinda dreaded coming to upstream with that19:02
*** brault has quit IRC19:02
sean-k-mooneydansmith: for the downstream issue yes its just the numa toploy filter filtering it out beacue tehre is no space for the rebuild19:02
dansmithsean-k-mooney: I'm talking about eandersson's case19:03
mriedemartom: we don't claim on rebuild so if the numa topo changes in the new image we won't claim for it properly,19:03
mriedemso you could just be kicking the error down to compute19:03
sean-k-mooneyill read scollback19:03
sean-k-mooneymriedem: correct19:03
artommriedem, yeah, it wouldn't be a correct fix19:03
sean-k-mooneywe rebuild with the new image but old toplogy19:04
artomMore of a candy we throw our customers on, like, 10 to placate them until they an upgrade to whatever release the un-backportable fix lands in19:04
mriedemsean-k-mooney: you mean in some downstream-only patch?19:04
sean-k-mooneymriedem: no updstream if you trun off the numa toplogy filter to get passed it19:04
sean-k-mooneywhen we rebuild but the toplogy change via the image19:05
mriedemoh19:05
sean-k-mooneywe end up with the old numa toplogy and new image19:05
sean-k-mooneyso i want to put an upstream check to prevent that19:05
mriedemok so hear me out,19:05
sean-k-mooneyand backport a config option to opt out19:05
mriedemthe rebuild api takes new metadata for the server,19:05
dansmithno19:06
dansmithno19:06
mriedemwhat if we have a skip_numa_topology_filter metadata that the user can pass in?!19:06
mriedemand then the filter checks for that19:06
mriedemand it's great19:06
dansmithnoooooooooo19:06
mriedemproblem solved19:06
* mriedem exits the community with a great idea19:06
mriedemok so the image takes random metadata,19:06
mriedemso hear me out19:06
mriedemos_skip_numa_filter_but_only_for_rebuild=true19:06
mriedemwe'll fix it in U19:07
dansmithWell, officer, you should have heard what he was suggesting before I shot him... Hear me out...19:07
mriedemthat's justifiable homicide in some states19:07
dansmithyeah19:07
artomWe fix it? No, U fix it19:07
sean-k-mooneywell the custromer has validated that tehy never us the image to change numa toplogy19:07
mriedemget in line with bauzas and the bad jokes19:07
sean-k-mooneyso for them that skip on rebuild is valid19:08
sean-k-mooneybut i have aded https://bugs.launchpad.net/nova/+bug/1763766 to the downstrema bug19:08
openstackLaunchpad bug 1763766 in OpenStack Compute (nova) "nova needs to disallow resource consumption changes on image rebuild" [Medium,Triaged]19:08
sean-k-mooneyso what i am going to propose is i implemnt a check for that19:08
*** eharney has joined #openstack-nova19:08
sean-k-mooneyand if and only if that passes we skip the numa toplogy filter19:08
sean-k-mooneybecasue not numa things changed19:08
sean-k-mooneyon rebuild only19:09
sean-k-mooneyand like the check we have to prevent numa migratio nwe can backport the saftey check for no resouce change on rebuild19:09
sean-k-mooneyand give peole a workaofund config option to disable it if they want to shot them selves in the foot19:10
sean-k-mooneyi have aslo said that if we dont fix it upstream im not fixing it downstream only19:11
sean-k-mooneymriedem: dansmith does ^ sound fair.  fixing https://bugs.launchpad.net/nova/+bug/1763766 is a prequisite for fixing https://bugs.launchpad.net/nova/+bug/1804502 safely19:13
openstackLaunchpad bug 1763766 in OpenStack Compute (nova) "nova needs to disallow resource consumption changes on image rebuild" [Medium,Triaged]19:13
openstackLaunchpad bug 1804502 in OpenStack Compute (nova) "Rebuild server with NUMATopologyFilter enabled fails (in some cases)" [Undecided,In progress] - Assigned to David Hill (david-hill-ubisoft)19:13
eanderssonWould it be possible to add an option to allow rebuilds to re-schedule the VM?19:14
eanderssonsomething like openstack server rebuild <id> --migrate/reschedule/bla19:14
sean-k-mooneylike evauate19:14
eanderssonI can't reproduce this as well19:14
sean-k-mooneyrebuild is ment to be posiblein virt driver that dont support migration19:15
sean-k-mooneythere is noting in principal that would disallow a driver form having rebiuld be to a different host like a cold migration but it is not how its currelty impelmeted19:16
sean-k-mooneywe would have to change the noop claim to a move claim if the inplcem rebuild did not succesed19:16
sean-k-mooneyor expose it at the api  like you suggest and make the resuling to a different host user contolerale19:17
sean-k-mooneyingoring numa for a second since image can have tratis we need to check with placmenet in anycase to see it the traits on the new image are valid on the current host19:18
sean-k-mooneyeandersson: what is the behavior you would like to see19:20
eanderssonBetter error message at least :p19:21
eanderssonBut not sure what the ideal would be.19:21
efriedstephenfin, sean-k-mooney, luyao, alex_xu: I'm +2 on everything vpmem except:19:21
efried- https://review.opendev.org/#/c/678455/ where I'll proxy a vote from sean-k-mooney19:21
efried- the bottom where I'll flip from -2 to +2 when we're ready19:21
eanderssonMaybe just documentation on how to avoid this on image builds.19:21
efriedI'm now going to sleep.19:22
eanderssonIt would be nice as an admin to have a way to fix it at least.19:22
sean-k-mooneyeandersson: well you cant avoid it if the instance has a numa toplogy19:22
sean-k-mooneyyou can work around it via cold/live migation19:22
sean-k-mooneybut that it19:22
*** efried is now known as efried_zzz19:22
eanderssonMaybe always force reschedule on rebuild when numa is a thing19:23
sean-k-mooneyrebuild are not allowed to change host today19:23
sean-k-mooneyso there is no rescudler19:23
sean-k-mooneywhats actully happening is the numa toplogy filter is treating the rebuild like a new isntance spawn19:24
sean-k-mooneyso you need double the resouces19:24
eanderssonoh so that is why some of them worked19:24
sean-k-mooneyyep19:24
sean-k-mooneyif the instance does not have a numa toplogy or there is enough space it works19:24
sean-k-mooneywe can actully now tell if its a rebuild in the filter code but we have not added special handeling for that in the numa toplogy filter19:25
sean-k-mooneythe correct thign to do is to check if the numa topldoy woudl change. reject the rebuild if its true or skip the numa toplogy filter if not19:26
sean-k-mooneythat is more or less want i want to code up19:26
sean-k-mooneyif posible i would like to do the "rebuild chage numa toplogy" check at the api and return an expcti error19:28
sean-k-mooneybut i have not figured out all the pices yet19:28
*** artom has quit IRC19:28
mriedemsean-k-mooney: if rebuild + new image + numa sometimes works if the new image numa topo fits on the same host, why wouldn't you just let the scheduler still try to fit it?19:31
mriedemand only outright skip if the image is new + rebuild + none of the numa properties changed19:31
sean-k-mooneymriedem: we dont update the instance numa toplogy in the db19:31
sean-k-mooneybecause that is dont in the claim19:31
mriedemand we don't claim19:31
sean-k-mooneyand we do a noop claim19:31
mriedemyeah ok19:31
sean-k-mooneyso if we fixed that too then sure19:31
mriedemso like new image + volume-backed server we just fail fast in the api today19:31
sean-k-mooneythats becasue we dont supprot volume backed rebuild in general?19:32
mriedemno,19:32
mriedemit's because we don't replace the root volume with the new image19:32
mriedemthere are approved cinder/nova specs to support all of that in the past and cinder did a bunch of work to plumb a new api for us, but the nova side changes all stalled19:33
mriedemgibi_bus: did you go to bed yet?19:33
sean-k-mooneythis liekly will become a spec in anycase. but what i wanted to do was first do the api/filer check to block it. then explore either updating the topology on rebuld and validatin git or otherwise having a more ocmplet fix19:34
sean-k-mooneyif it could fix then i would liek to allow it19:35
sean-k-mooneybut only if the toploy you ask for in the image is what you actlly end up with19:35
sean-k-mooneywhich is not what would happen without updating the laim19:35
sean-k-mooney*claim19:35
*** artom has joined #openstack-nova19:37
sean-k-mooneythe otherthing with allowing the toplogy to change on a rebuild would be keeping placmenet in sysnc if/when we model numa in placmenet19:37
sean-k-mooneyi dont want to complicate numa in plamcent more by having to deal with this at the same time19:38
mriedemi could use some opinions on alternatives i've proposed for some new error handling during prep_resize in gibi's bottom unapproved change https://review.opendev.org/#/c/676980/20/nova/compute/manager.py@4529 which is the only thing holding it up right now19:40
mriedemsince i'll probably be the one implementing them today so we can still meet FF for the bw provider migration stuff19:40
*** CeeMac has quit IRC19:41
mriedemsean-k-mooney: yeah i'd focus on whatever you can that is backportable first19:42
sean-k-mooney yep thats the plan. its in our team backlog for U and ill likely volenterr to be the person to fix it so im going to fix what can be backport first then consider how to imporve it later.19:43
sean-k-mooneythanks for the link to the other bug by the way19:43
sean-k-mooneyi forgot that was still open19:44
openstackgerritMerged openstack/nova master: Make _revert_allocation nested allocation aware  https://review.opendev.org/67613819:45
sean-k-mooneyim jsut looking at https://review.opendev.org/#/c/676980/20/nova/compute/manager.py@452919:45
sean-k-mooneyi dont really have context right now but what alternitve were you tinking of19:45
*** eharney has quit IRC19:45
mriedemthey are listed below19:49
sean-k-mooneyok ill keep reading19:49
mriedemgibi added code to the prep_resize flow that can result in a BuildAbortException getting raised and in that case we shouldn't try to reschedule19:50
mriedemhe added handling for BuildAbortException,19:50
sean-k-mooneythe finally block will be run by the way when we catch the BuildAbortException19:50
sean-k-mooneyi was just comming that in gerrit19:50
mriedembut was missing some things we do when we don't reschedule from a prep_resize failure, like recording a fault and sending an error notification19:50
*** henriqueof has joined #openstack-nova19:51
*** nweinber__ has quit IRC19:53
*** panda|ruck|off has quit IRC19:55
*** gibi_bus is now known as gibi19:55
* gibi reached the hotel19:55
gibimriedem: I'm not sleeping yet but I'm already short on brainpower19:56
mriedemgibi: ok if you can look at https://review.opendev.org/#/c/676980/20 before going to bed i can try to update it19:56
mriedembut it's looking non-trivial19:56
mriedemotherwise i'll just move up the series since those changes look simple19:57
*** panda has joined #openstack-nova19:57
gibinow I read your comments. it seems non trivial indeed. Would it be too bad letting the re-schedule happen as a first solution. And optimizing out the unnecessary rechedule later?19:59
gibimriedem: I cannot implement the functional test and the proper cleanup today. I can do that tomorrow morning. Or If you able to implement it then I can try to review it20:01
sean-k-mooneydonnyd: the new lable job passed https://review.opendev.org/#/c/680738/ so im going to call it a day20:02
gibiI will be back in 15 minutes20:03
mriedemgibi: i could let the BuildAbortException handling slide to a follow up but the fact that i can comment out the new code and none of the functional tests fail is kind of a blocker for me20:05
gibimriedem: ack, that was a surprise for me too.20:05
gibimriedem: I suggest you move forward with the reviews and I will respin this in the morning with a proper functional test20:06
gibimriedem: as I run out of time today and I don't want to ask you to troubleshoot those functional tests20:07
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847020:07
*** BjoernT has quit IRC20:08
mriedemgibi: yup sounds like a plan20:08
openstackgerritMerged openstack/nova master: Nova object changes for forbidden aggregates request filter  https://review.opendev.org/67107220:11
*** spatel has quit IRC20:18
gibishover thoughts: the computes in those tests are symmetrical. they have the same PF setup. PFs are named the same on host1 and on host2 and we are "updating" the PF name in the PCI request.20:21
gibiwhen we migrate from host1 to host220:21
gibiso I can imagine that without the update the PF names remains the same20:22
gibiand everything works20:22
donnydsean-k-mooney: woot woot20:22
*** mvkr has quit IRC20:27
*** markvoelker has quit IRC20:27
*** mvkr has joined #openstack-nova20:32
*** markvoelker has joined #openstack-nova20:36
*** factor has quit IRC20:40
*** markvoelker has quit IRC20:41
*** factor has joined #openstack-nova20:42
openstackgerritMerged openstack/nova master: Improve SEV documentation and other minor tweaks  https://review.opendev.org/68125420:43
openstackgerritMerged openstack/nova master: Introduce live_migration_claim()  https://review.opendev.org/63566920:43
*** BjoernT has joined #openstack-nova20:47
gibiI was able to disprove my shover thoughts.20:48
gibiI'm going to sleep now. Talk to you tomorrow20:48
gibiand I disproved my disproval20:49
*** ociuhandu has joined #openstack-nova20:50
* gibi kick himself off the net20:50
openstackgerritMerged openstack/python-novaclient master: doc: Add support microversions for options  https://review.opendev.org/68117420:51
*** BjoernT has quit IRC20:51
*** BjoernT_ has joined #openstack-nova20:51
openstackgerritMerged openstack/os-traits master: Build pdf docs  https://review.opendev.org/68146520:53
openstackgerritMerged openstack/os-traits master: Update README to be a bit more clear  https://review.opendev.org/68123720:53
eanderssonthanks for the help sean-k-mooney mriedem20:56
eanderssonsorry had to walk away earlier for a meeting20:56
*** ociuhandu has quit IRC20:57
aspiers#success AMD SEV support finally landed in nova after a year https://docs.openstack.org/nova/latest/admin/configuration/hypervisor-kvm.html#amd-sev-secure-encrypted-virtualization20:59
openstackstatusaspiers: Added success to Success page (https://wiki.openstack.org/wiki/Successes)20:59
mriedemeandersson: yw, though i'm not sure how helpful it was20:59
mriedemat least you know what the problem is21:00
openstackgerritStephen Finucane proposed openstack/nova master: Counting both of VCPU and PCPU as core quota  https://review.opendev.org/68137421:01
openstackgerritStephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/67180121:01
openstackgerritStephen Finucane proposed openstack/nova master: fakelibvirt: Make 'Connection.getHostname' unique  https://review.opendev.org/68106021:02
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Mock 'libvirt_utils.file_open' properly  https://review.opendev.org/68106121:02
openstackgerritStephen Finucane proposed openstack/nova master: Add reshaper for PCPU  https://review.opendev.org/67489521:02
aspiers#thanks efried, sean-k-mooney, stephenfin, kashyap, bbobrov, and many others for helping land AMD SEV support in Nova!21:03
openstackstatusaspiers: Added your thanks to Thanks page (https://wiki.openstack.org/wiki/Thanks)21:03
*** ivve has quit IRC21:05
*** ivve has joined #openstack-nova21:05
*** slaweq has quit IRC21:10
*** slaweq has joined #openstack-nova21:11
mriedemi'm all the way up through the bw migration series and barring some catastrophe i think it'll probably all be +2ed from me tomorrow21:15
*** rcernin has joined #openstack-nova21:15
mriedemooo the thanks page21:15
openstackgerritStephen Finucane proposed openstack/nova master: Include both VCPU and PCPU in core quota count  https://review.opendev.org/68137421:17
openstackgerritStephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/67180121:17
openstackgerritStephen Finucane proposed openstack/nova master: fakelibvirt: Make 'Connection.getHostname' unique  https://review.opendev.org/68106021:17
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Mock 'libvirt_utils.file_open' properly  https://review.opendev.org/68106121:17
openstackgerritStephen Finucane proposed openstack/nova master: Add reshaper for PCPU  https://review.opendev.org/67489521:17
mriedem#thanks sean-k-mooney and donnyd for getting numa-aware live migration CI testing going on FortNebula nodes21:17
openstackstatusmriedem: Added your thanks to Thanks page (https://wiki.openstack.org/wiki/Thanks)21:17
donnydnp mriedem21:17
*** slaweq has quit IRC21:18
donnydmriedem: anything else that can be done on the provider end you can think of... lmk and i will do my best21:19
*** boxiang has joined #openstack-nova21:19
*** zhubx has quit IRC21:20
mriedemdansmith: remember this multi-cell floating IP re-associate bug fix https://review.opendev.org/#/c/656594/ ? can we get that re-approved?21:26
mriedemi think the only thing since it was last approved was fixing some words in the test and removing the handling for CantStartEngineError21:27
*** slaweq has joined #openstack-nova21:27
*** slaweq has quit IRC21:32
*** zer0c00l has joined #openstack-nova21:36
openstackgerritMatt Riedemann proposed openstack/nova master: Use multiple attachments in test_list_volume_attachments  https://review.opendev.org/68161821:43
donnydcan someone help me understand how the api-paste.ini works?21:44
openstackgerritMatt Riedemann proposed openstack/nova master: Use multiple attachments in test_list_volume_attachments  https://review.opendev.org/68161821:44
donnydfor reasons out of my control i need to run an api to include an additional url21:44
dansmithmriedem: jes21:44
donnydthis isn't really for nova... but i am hoping the api-paste works pretty much the same across projects21:45
*** takashin has joined #openstack-nova21:46
mriedemdonnyd: the rest all likely copied from nova originally so it should probably work similarly21:46
donnydso i need to add /image in at the front of the request for glance21:47
donnydand with the other service i just use apache and that works like a boss21:48
mriedemwould that be api-paste.ini or the endpoint url in the service catalog?21:48
donnydwell both21:48
mriedemprobably best to ask in #openstack-glance21:48
mriedemoh right glance is yet to support wsgi21:48
mriedemit's all mtreinish's fault21:48
donnydi did21:48
mriedemmaybe #openstack-operators?21:49
mriedemglance room is pretty dead usually21:49
donnydoh i will give that a swing21:49
mriedemor hunt down rosmaita21:49
mriedemhe's in -cinder21:49
mriedemhiding as the cinder ptl21:49
donnydadding /image in the api-paste works right now21:50
donnydbut the ref links don't come back with the url in them21:50
mriedemref links from glance or from nova?21:51
mriedemb/c https://docs.openstack.org/nova/latest/configuration/config.html#api.glance_link_prefix :(21:51
*** henriqueof1 has joined #openstack-nova21:51
mriedemi think ^ might only matter if you're using the compute api as a proxy to glance,21:52
mriedemwhich tools shouldn't be doing anymore21:52
*** henriqueof has quit IRC21:52
*** TxGirlGeek has quit IRC21:53
* mriedem has to run21:53
*** mriedem is now known as mriedem_afk21:53
*** hemna has joined #openstack-nova21:58
*** markvoelker has joined #openstack-nova21:59
*** panda has quit IRC22:00
*** panda has joined #openstack-nova22:03
*** slaweq has joined #openstack-nova22:11
stephenfinmelwitt: If you're still around, I've addressed your comments on https://review.opendev.org/#/c/681374/22:12
stephenfinYou can find the functional tests in the following patch that turns everything on https://review.opendev.org/#/c/671801/45/nova/tests/functional/libvirt/test_numa_servers.py@241 and https://review.opendev.org/#/c/671801/45/nova/tests/functional/libvirt/test_numa_servers.py@32922:13
melwittstephenfin: ack, thanks22:14
*** slaweq has quit IRC22:16
stephenfinand for dansmith, I've got the snazzy "try and try again" scheduler thing here itching for some red pen https://review.opendev.org/#/c/67180122:16
*** mlavalle has quit IRC22:17
openstackgerritTakashi NATSUME proposed openstack/nova master: Add TODO note for mox removal  https://review.opendev.org/57675822:25
*** avolkov has quit IRC22:31
*** TxGirlGeek has joined #openstack-nova22:32
*** TxGirlGeek has quit IRC22:33
*** threestrands has joined #openstack-nova22:37
*** luksky has quit IRC22:38
*** TxGirlGeek has joined #openstack-nova22:40
stephenfinI haven't seen anything stream by from Gerrit in quite some time :-\22:47
stephenfinGuess that's as good a sign as any to call it22:48
sean-k-mooneythis is several hours past when you normally do22:48
*** mtreinish has quit IRC22:54
openstackgerritStephen Finucane proposed openstack/nova master: trivial: Re-add some useful checks  https://review.opendev.org/68162622:54
*** dtruong has quit IRC22:55
*** mtreinish has joined #openstack-nova22:55
*** dtruong has joined #openstack-nova22:56
openstackgerritMerged openstack/nova master: vCPU model selection  https://review.opendev.org/67029822:59
*** macz has quit IRC23:01
*** tkajinam has joined #openstack-nova23:04
*** hemna has quit IRC23:04
*** slaweq has joined #openstack-nova23:11
openstackgerritMerged openstack/nova master: Add compatibility checks for CPU mode and CPU models and extra flags  https://review.opendev.org/67029923:15
*** slaweq has quit IRC23:15
openstackgerritMerged openstack/nova master: Support reporting multi CPU model traits  https://review.opendev.org/67030023:16
*** igordc has quit IRC23:18
*** BjoernT_ has quit IRC23:18
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Start reporting PCPU inventory to placement  https://review.opendev.org/67179323:31
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: '_get_(v|p)cpu_total' to '_get_(v|p)cpu_available'  https://review.opendev.org/67269323:31
openstackgerritStephen Finucane proposed openstack/nova master: objects: Add 'InstanceNUMATopology.cpu_pinning' property  https://review.opendev.org/68010623:31
openstackgerritStephen Finucane proposed openstack/nova master: Validate CPU config options against running instances  https://review.opendev.org/68010723:31
openstackgerritStephen Finucane proposed openstack/nova master: trivial: Use sane indent  https://review.opendev.org/68022923:31
openstackgerritStephen Finucane proposed openstack/nova master: objects: Add 'NUMACell.pcpuset' field  https://review.opendev.org/68010823:31
openstackgerritStephen Finucane proposed openstack/nova master: hardware: Differentiate between shared and dedicated CPUs  https://review.opendev.org/67180023:31
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Start reporting 'HW_CPU_HYPERTHREADING' trait  https://review.opendev.org/67557123:32
openstackgerritStephen Finucane proposed openstack/nova master: Include both VCPU and PCPU in core quota count  https://review.opendev.org/68137423:32
openstackgerritStephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/67180123:32
openstackgerritStephen Finucane proposed openstack/nova master: fakelibvirt: Make 'Connection.getHostname' unique  https://review.opendev.org/68106023:32
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Mock 'libvirt_utils.file_open' properly  https://review.opendev.org/68106123:32
openstackgerritStephen Finucane proposed openstack/nova master: Add reshaper for PCPU  https://review.opendev.org/67489523:32
*** igordc has joined #openstack-nova23:33
alex_xugood morning23:35
*** BjoernT has joined #openstack-nova23:39
*** igordc has quit IRC23:40
*** igordc has joined #openstack-nova23:42
*** hemna has joined #openstack-nova23:42
*** BjoernT has quit IRC23:48
*** mriedem_afk has quit IRC23:49
openstackgerritMerged openstack/nova master: Fix the incorrect powershell command  https://review.opendev.org/67958823:57

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!