Thursday, 2022-06-30

*** hemna9 is now known as hemna06:36
gibigood morning07:32
* kashyap waves07:57
bauzasgood morning08:10
bauzasgibi: actually, yesterday I was organizing a birthday party for my daughter with 8 of her friends :)08:10
bauzasnot sure I "enjoyed" it :D08:11
Ugglabauzas, still alive ? :)08:31
gibibauzas, Uggla: https://www.youtube.com/watch?v=qdrs3gr_GAs ? :)09:57
*** bhagyashris_ is now known as bhagyashris10:25
sean-k-mooneygibi: portal is a great game11:36
gibiindeed11:36
sean-k-mooneyby the way i watched your summit video11:37
sean-k-mooneylive demos are fun11:37
sean-k-mooneydid you restack live on stage11:37
sean-k-mooneyyou said you rebooted the vm but devstack does not survive reboots anymore11:37
sean-k-mooneysince there are ordering issues with the systemd service files11:37
gibiI restarted the devstack VM, but did not restack it. It worked:D11:54
sean-k-mooneymaybe the issues have been fixed11:59
sean-k-mooneythe wsgi service used to race with apache i think and there was an issue with the /run dirs being created12:00
sean-k-mooneyyou could get lucky 12:00
sean-k-mooneybut often the service would not start cleanly12:00
gibiI have never seen that issue locally12:05
gibibut meh12:05
sean-k-mooneyits been a while since i tried it honestly12:06
sean-k-mooneylike 18.04 era12:06
*** dasm|off is now known as dasm13:03
sean-k-mooneybauzas: was i imagining it or did you have an implementation of removing the ssh key pair generation13:44
sean-k-mooneybauzas: i was going to go review it today but i cant find it13:44
sean-k-mooneybauzas: i was hoping we could get that closed out to remove it from the list of things we need to finish this cycle13:45
sean-k-mooneybauzas: speaking of things i would like to close out can you review https://review.opendev.org/c/openstack/nova/+/84700113:46
sean-k-mooneygibi: it would be nice if you could review this too https://review.opendev.org/c/openstack/nova/+/833411 i would like to keep that backport series moving along13:49
* sean-k-mooney is actually logged into a terminal today and looking at patches13:49
gibisean-k-mooney: on it13:50
sean-k-mooneythanks13:51
sean-k-mooneyim also going to rebase https://review.opendev.org/c/openstack/nova/+/830829 shortly once i run and fix the functional tests13:57
sean-k-mooneythat will be a nice easy win we deferred it last cycle because it was too close to FF13:58
sean-k-mooneyso i would like to try and land it by m213:58
sean-k-mooneymainly to catch any ci fallout if there is any13:58
sean-k-mooneythe benefit of me running tests locally on my laptop instead of my home server is it takes a while and i do code reviews while im waiting14:01
* sean-k-mooney is still tempted to swap to my home server to run these instead however :P14:01
mfodansmith, hey! sorry, i couldn't get back to you yesterday ("context dropped out"). most certainly.. the long delay/lost context is on me :/  14:02
mfoso, briefly:  in victoria/ussuri one can set `hw_firmware_type=uefi` and the VM won't boot if the UEFI ovmf firmware/loader picked is the secureboot one (*.secboot.fd), as the default `hw_machine_type=pc` doesn't support it (that needs `q35`).   14:02
mfoso we should *try* other firmware/loader files, or use .secboot.fd anyway if it's the only one that exists (not to break/change from what is done in this old/stable branch).   14:02
mfoThis was a suggestion from sean-k-mooney, and also to have unit tests. 14:02
mfoYour suggestions were 1) to convert the list of firmware/loader paths to constants (so to intentionally "break" any downstream stable consumers that possibly patched that, so they'd be aware of the change and review their version), and 2) change the patch approach to not even iterate the list of firmware/loader files  with .secboot.fd  if we're on PC machine type.    (I just combined it with, "ok, use it if it's the only one" per Sean's 14:02
mfosuggestion too).14:02
sean-k-mooneyi think i basically said sort so that its last in the list and if its the only one then use it and maybe it will work14:03
mfoI guess that's what this is about and where we are. :)14:03
sean-k-mooneyrather than deleting it from the list14:03
mfosean-k-mooney, yup. this is currently how it is, but the for-loop doesn't have a `break` in case it finds one, so the last file is used. 14:04
sean-k-mooneybut yes secure boot needs uefi14:04
mfoi could just add a break in there, but this would change things / file end up chosen in the general case, i guess.14:05
sean-k-mooneyso it wont work with `hw_machine_type=pc`14:05
mfoand for older stable branches, well, i didnt want to risk such changes. (i'm not too familiar w/ openstack yet, actually).14:05
sean-k-mooneyi think we always want to break on the first file we find that exists as long as we make the secure boot one last if you did not ask for secure boot14:07
sean-k-mooneywe are only going to boot the vm once anyway so there is no point in continuing to check14:07
sean-k-mooneybut i dont have your patch open currently so that depends on how you wrote the loop14:08
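[editor's note] The selection order discussed above (try the non-secure-boot loaders first, stop at the first file that exists, and fall back to a *.secboot.fd file only when nothing else is present) can be sketched as follows. This is a minimal illustration and not the code from the patch under review: the function name pick_loader, the injectable exists check, and the example paths are all hypothetical.

```python
import os
from typing import Callable, Iterable, Optional


def pick_loader(
    candidates: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the first loader file that exists, trying secure-boot builds last.

    Hypothetical helper: the name, signature and ordering rule are
    assumptions made for illustration, not nova's implementation.
    """
    # sorted() is stable and False sorts before True, so the non-secboot
    # files keep their relative order and every *.secboot.fd moves to the end.
    ordered = sorted(candidates, key=lambda path: path.endswith(".secboot.fd"))
    for path in ordered:
        if exists(path):
            # Stop at the first hit: the VM is only booted once, so there
            # is no point in checking the remaining candidates.
            return path
    return None


if __name__ == "__main__":
    # Illustrative paths only; real locations differ per distribution.
    loaders = ["/fw/OVMF_CODE.secboot.fd", "/fw/OVMF_CODE.fd"]
    print(pick_loader(loaders, exists=lambda path: True))
```

Passing the exists check as a parameter keeps the sketch testable without touching the filesystem; with only the secure-boot file present, that file is still returned, which matches the "use it if it's the only one" fallback.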
opendevreviewribaudr proposed openstack/nova master: Allow unshelve to a specific host (Compute API part)  https://review.opendev.org/c/openstack/nova/+/83150714:09
opendevreviewribaudr proposed openstack/nova master: Allow unshelve to a specific host (REST API part)  https://review.opendev.org/c/openstack/nova/+/84589714:09
mforight, i wondered why the (existing) loop didn't have a 'break' statement in the first place when it was introduced, but it was long ago, and didn't seem like a thing to change for old stable branches.  also, in this version one can't ask for secure boot yet (this is victoria/ussuri, and secureboot came in wallaby), it just happened that a patch adding the OVMF paths included a .secboot.fd file, before proper SB support came in.14:10
sean-k-mooneyack14:11
sean-k-mooneythe ovmf firmware images can be built so that secure boot is available but not required14:12
sean-k-mooneyso i think the old logic was intended to allow it to be used if it was supported14:12
sean-k-mooneyby the ovmf image by manually configuring it via the boot menu14:12
sean-k-mooneybut that was not really supported by nova14:13
mfosean-k-mooney, ah, i see.14:13
mfoi think that works for the SB feature alone, but i think the SMM feature support once it's built in then it isn't opt-in, right? than that requires support in the emulator (ie, qemu's q35).14:14
mfos/than/then/14:14
sean-k-mooneyi think that is correct but dont know all the details14:14
sean-k-mooneySMM does require qemu to be configured to enable it14:15
sean-k-mooneybut i dont know if you need that enabled if its compiled into the ovmf image14:15
sean-k-mooneyits all a bit of a mess14:15
mfoyes, there are many details in this.  i added some docs/links to the patch for the research i had done, but it's indeed at the lower-level details.14:15
bauzassean-k-mooney! sorry, haven't yet done the implementation for the keypair deprecation14:17
sean-k-mooneybauzas: ack no worries i was just expecting that to be relatively small so if it was off your backlog you would have more time to review14:19
mfosean-k-mooney, if you're ever curious, i doc'ed the research details of OVMF/SB/SMM/QEMU in bug 1960758 comment #6,  the takeaway is, you can build OVMF with SMM_ENABLE, but then it's not really _secure_ Secure Boot; for that you need SMM_REQUIRE, and then platform support is not optional, it's required as well.14:19
sean-k-mooneybauzas: i was just trying to see if we could reduce context switching by merging some small easy wins before m214:19
sean-k-mooneyto have less to context switch between coming up to FF14:20
sean-k-mooneymfo: SMM is system management mode support which is actually unrelated to secure boot14:20
sean-k-mooneySMM is used for some other security feature and ring -2 hypervisor feature at run time14:21
mfosean-k-mooney, IIUIC indeed, it's orthogonal, but it's involved in the implementation of not allowing the OS to tamper w/ the SB things in memory,  otherwise the OS could bypass things.14:21
mfothe link to pbonzini's presentation/video about it is really clarifying.14:22
sean-k-mooneyyep so in general i would expect the default uefi image to be built with SMM_ENABLE14:22
sean-k-mooneyand the secure boot one to have require14:22
bauzassean-k-mooney: https://review.opendev.org/c/openstack/nova/+/847001 +W with comments14:24
bauzassean-k-mooney: yeah but I had a lot of other stuff to do before :)14:24
bauzaslike the vGPU bugs 14:25
bauzasand then reviews and bug triage14:25
bauzas(and now, back to be a mentor :) )14:25
bauzaseventually, will work on this feature next week 14:25
bauzasmaybe tomorrow if I have time14:25
sean-k-mooneyno rush since i actually had time to do upstream stuff today i was just looking to see what was close to merging14:26
sean-k-mooneyand the open review priorities14:26
sean-k-mooneystephenfin: are you around today? if so got time to look at this os-vif patch its pretty small https://review.opendev.org/c/openstack/os-vif/+/83910214:28
stephenfinsean-k-mooney: sure, I'll look now14:28
sean-k-mooneyta14:29
stephenfindone14:34
sean-k-mooneymany thanks14:36
opendevreviewMerged openstack/osc-placement master: Support microversion 1.39  https://review.opendev.org/c/openstack/osc-placement/+/82854514:37
*** diablo_rojo__ is now known as diablo_rojo14:39
stephenfinsean-k-mooney: Seen this before? https://zuul.opendev.org/t/openstack/build/b5c09ce1dbdd42228f5f2928d9df6178/log/controller/logs/screen-n-cpu.txt#1006014:56
stephenfinnova.exception.InternalError: Unexpected vif_type=unbound14:56
stephenfinIt rings a bell, but I thought we'd fixed this years ago14:56
sean-k-mooneywe had older bugs related to unbound14:57
sean-k-mooneythat is the state when the host-id is not set on the port14:57
sean-k-mooney i think i looked at this14:57
sean-k-mooneyNo conversion for VIF type unbound yet {{(pid=97953) nova_to_osvif_vif /opt/stack/nova/nova/network/os_vif_util.py:530}}14:58
sean-k-mooneyis really just a side effect of the port not being bound properly in neutron14:58
sean-k-mooneyah right15:00
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/b5c09ce1dbdd42228f5f2928d9df6178/log/controller/logs/screen-q-svc.txt#1089115:00
sean-k-mooneyso this is an issue with slow neutron i think15:00
opendevreviewMerged openstack/nova stable/wallaby: fake: Ensure need_legacy_block_device_info returns False  https://review.opendev.org/c/openstack/nova/+/84367815:00
opendevreviewMerged openstack/nova stable/wallaby: Add a regression test for bug 1939545  https://review.opendev.org/c/openstack/nova/+/84367915:00
sean-k-mooneystephenfin: basically when i was looking at this i was assuming this happened because we retried the port binding because neutron was slow to respond15:01
sean-k-mooneyand that caused a concurrent bind attempt that left it in unbound or something like that15:01
stephenfinHmm, that sounds reasonable. We saw it bubble up in the OSC tests because the server create failed. That sounds like a likely root cause though15:02
sean-k-mooneyif that is what is happening i would expect to see the retry logged somewhere or see two bind attempts in the neutron log15:02
sean-k-mooneyill check if i can see that15:02
sean-k-mooneywe do see the port had just finished binding when we got the concurrent error15:03
sean-k-mooney] Bound port: a2fb8af2-d4df-4b29-bd3f-5591aa8819d2, host: ubuntu-focal-rax-dfw-0030231262, vif_type: ovs, vif_details: {"connectivity": "l2", "port_filter": true, "ovs_hybrid_plug": false, "datapath_type": "system", "bridge_name": "br-int"}, binding_levels: [{'bound_driver': 'openvswitch', 'bound_segment': {'id15:03
sean-k-mooneyya it  looks like there are 3 attempts to bind the port15:05
fricklerstephenfin: I mentioned that yesterday, it also seemed related to neutron retrying binds15:05
sean-k-mooneyin the neutron side the last two of which had the concurrent bind exception15:05
sean-k-mooneyfrickler: well its actually the neutronclient retrying the bind15:05
sean-k-mooneyfrickler: that was entirely broken in nova until somewhat recently15:06
melwittbauzas: I wanted to get your thoughts on this proposed patch to change logic in the placement audit nova-manage command, since you worked on it https://review.opendev.org/c/openstack/nova/+/844418 it seems like there is a bug in the current logic but it's not clear to me what the logic should be15:06
sean-k-mooneyfrickler: we fixed retries about a year or so ago15:06
bauzasmelwitt: okay, I'll look15:06
melwittthanks15:06
fricklersean-k-mooney: iiuc there is an internal retry in neutron happening now15:06
sean-k-mooneyfrickler: there is also likely one in the db decorator15:07
sean-k-mooneywe can see transaction error in the log15:07
frickleris this with neutron-segment enabled? we have some issue with segment ID reuse in OSC15:07
sean-k-mooneyi think segments were enabled yes15:08
frickleroh, that's the osc job, yes15:08
sean-k-mooneyits using vxlan however i think15:09
sean-k-mooneyrather than routed provider networks15:09
sean-k-mooney Bound port: a2fb8af2-d4df-4b29-bd3f-5591aa8819d2, host: ubuntu-focal-rax-dfw-0030231262, vif_type: ovs, vif_details: {"connectivity": "l2", "port_filter": true, "ovs_hybrid_plug": false, "datapath_type": "system", "bridge_name": "br-int"}, binding_levels: [{'bound_driver': 'openvswitch', 'bound_segment': {'id': 'cd5c5c6b-1027-4fc7-bbc7-b8204df12e32', 'network_type': 'vxlan',15:10
sean-k-mooney'physical_network': None, 'segmentation_id': 1, 'network_id': 'ff960d9f-3b68-4b9b-8d69-78fe6441f27b'}}] {{(pid=90353) _bind_port_level /opt/stack/neutron/neutron/plugins/ml2/managers.py:948}}15:10
fricklerah. maybe it is side effect of the segments test that runs in parallel. breaking other random tests15:10
sean-k-mooneyno right after ^ where the mech driver is able to bind15:10
sean-k-mooneywe get a concurrent bind exception15:10
sean-k-mooneythen neutron retries15:11
fricklerhttps://zuul.opendev.org/t/openstack/build/b5c09ce1dbdd42228f5f2928d9df6178/log/job-output.txt#2229915:11
sean-k-mooney7.556330 ubuntu-focal-rax-dfw-0030231262 neutron-server[90353]: WARNING neutron.plugins.ml2.plugin [req-f9a5c6a8-ab26-4f1f-ab63-dd518edf32f3 req-c372ca6e-78a4-4f09-976b-c74d5f169c66 service neutron] Concurrent port binding operations failed on port a2fb8af2-d4df-4b29-bd3f-5591aa8819d215:11
sean-k-mooneyJun 30 11:36:27.557557 ubuntu-focal-rax-dfw-0030231262 neutron-server[90353]: INFO neutron.plugins.ml2.plugin [req-f9a5c6a8-ab26-4f1f-ab63-dd518edf32f3 req-c372ca6e-78a4-4f09-976b-c74d5f169c66 service neutron] Attempt 2 to bind port a2fb8af2-d4df-4b29-bd3f-5591aa8819d215:11
fricklerlook at ^^ the passing segment test right after the failure15:11
fricklerI'm pretty sure this is related15:11
sean-k-mooney unless you are using the same port in both tests i dont see how it would be15:12
fricklerthe segment test locking the DB leading to retries in other actions15:12
sean-k-mooneyoh so you think this trace https://zuul.opendev.org/t/openstack/build/b5c09ce1dbdd42228f5f2928d9df6178/log/controller/logs/screen-q-svc.txt#1072215:12
sean-k-mooneyis caused by the segment test15:13
sean-k-mooneyORM session: SQL execution without transaction in progress, traceback15:13
sean-k-mooneylets check the nova logs and see if there is a retry on our side15:14
sean-k-mooneyif not then its an internal neutron issue15:14
stephenfinsean-k-mooney: I'm not sure if that's related to this issue or not15:14
stephenfinactually no, maybe it is. It's an update_port call that's causing the issue15:16
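[editor's note] Counting bind attempts per port, as done by eye in the exchange above, could be scripted roughly like this. It is a sketch under stated assumptions: the two message formats are copied from the neutron-server lines pasted in this log ("Attempt N to bind port <uuid>" and "Concurrent port binding operations failed on port <uuid>"), and the function name summarize_binds and its return shape are hypothetical.

```python
import re
from collections import defaultdict

UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
# Message formats taken from the log lines quoted in the channel; other
# neutron versions may word these messages differently.
ATTEMPT_RE = re.compile(rf"Attempt (\d+) to bind port ({UUID})")
CONFLICT_RE = re.compile(
    rf"Concurrent port binding operations failed on port ({UUID})"
)


def summarize_binds(lines):
    """Map port id -> highest bind attempt number and concurrent-bind failures."""
    summary = defaultdict(lambda: {"max_attempt": 0, "conflicts": 0})
    for line in lines:
        attempt = ATTEMPT_RE.search(line)
        if attempt:
            port = attempt.group(2)
            number = int(attempt.group(1))
            summary[port]["max_attempt"] = max(
                summary[port]["max_attempt"], number
            )
        conflict = CONFLICT_RE.search(line)
        if conflict:
            summary[conflict.group(1)]["conflicts"] += 1
    return dict(summary)


if __name__ == "__main__":
    port = "a2fb8af2-d4df-4b29-bd3f-5591aa8819d2"
    sample = [
        f"WARNING neutron.plugins.ml2.plugin Concurrent port binding operations failed on port {port}",
        f"INFO neutron.plugins.ml2.plugin Attempt 2 to bind port {port}",
    ]
    print(summarize_binds(sample))
```

Running it over screen-q-svc.txt shows which ports were bound more than once. It does not show whether the retry came from nova, neutronclient, or neutron itself, so the nova logs still need checking.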
opendevreviewMerged openstack/nova stable/xena: reenable greendns in nova.  https://review.opendev.org/c/openstack/nova/+/83341115:31
opendevreviewMerged openstack/os-vif master: Check for hybrid plugging in OVS  https://review.opendev.org/c/openstack/os-vif/+/83910216:04
opendevreviewMerged openstack/nova master: ignore deleted server groups in validation  https://review.opendev.org/c/openstack/nova/+/84700116:37
opendevreviewMerged openstack/nova stable/wallaby: compute: Ensure updates to bdms during pre_live_migration are saved  https://review.opendev.org/c/openstack/nova/+/84368016:54
opendevreviewMerged openstack/nova stable/wallaby: fup: Make connection_info returned by CinderFixture unique per attachment  https://review.opendev.org/c/openstack/nova/+/84459416:54
opendevreviewMerged openstack/nova stable/wallaby: fup: Assert state of connection_info during LM rollback in func tests  https://review.opendev.org/c/openstack/nova/+/84459516:54
opendevreviewJay Faulkner proposed openstack/nova stable/victoria: [ironic] Minimize window for a resource provider to be lost  https://review.opendev.org/c/openstack/nova/+/80087318:44
*** dasm is now known as dasm|off20:42
*** diablo_rojo is now known as Guest386521:17
