Thursday, 2020-06-04

*** maciejjozefczyk_ has joined #openstack-nova07:17
bauzasgood morning Nova08:41
gibibauzas: good morning08:47
* bauzas wraps his head to understand the routed networks story08:52
xiaolinhello guys, I'm trying to run openstack on a host with MIPS architecture, but got an error "Instance failed to spawn: libvirtError: internal error: unexpected host-model CPU for mips64el architecture" while creating an instance.09:04
xiaolinAnyone can guide me how I can contribute a fix to upstream?  need I report a bug first?09:04
gibixiaolin: is this similar to what belmoreira requested here?
openstackLaunchpad bug 1863728 in OpenStack Compute (nova) "Nova can't create instances for a different arch" [Wishlist,Invalid]09:18
gibixiaolin: we will discuss ^^ today during the PTG
*** Liang__ has joined #openstack-nova09:25
xiaolingibi: The problem I encountered is not similar to  what belmoreira requested09:27
xiaolingibi: the host and instance are both MIPS architecture09:28
alex_xugibi: sean-k-mooney does neutron already report physical network interface resource provider? according to the spec
* alex_xu sorry for inject the discussion09:29
gibixiaolin: I see. Then I don't have a direct answer. The libvirt error suggests that something is missing from the libvirt side. Are you able to start a VM purely with libvirt / virsh on your host?09:30
belmoreiragibi xiaolin unfortunately I won't be able to join09:30
gibibelmoreira: thanks for the headsup, we will discuss it anyhow09:31
gibialex_xu: neutron reports SRIOV PF and OVS bridge as RP in placement with physnet and vnic type trait09:31
alex_xugibi: got it09:31
gibialex_xu: but only if bandwidth is defined in the agent conf for these devices09:32
alex_xugibi: just read the cyborg sriov doc, begin to think about..cyborg will report any RP for PF...09:32
alex_xus/report any/report another/09:32
xiaolingibi: Yes, I can start a VM purely with libvirt / virsh on my host09:32
gibialex_xu: I guess we have to decide if neutron or cyborg owns the PF09:33
alex_xugibi: ok, if they are totally two feature, should be ok. Since those cyborg PF's RP should be managed by cyborg...not sure how cyborg and neutron merge them...09:33
alex_xugibi: yea, that is something to decide09:33
gibixiaolin: please look at the host-model documentation; you might need to configure nova a bit differently to use your hardware.09:36
gibixiaolin: also I hope kashyap could help from libvirt perspective09:37
* kashyap blinks and reads the scroll10:11
kashyapxiaolin: Oh, MIPS ... you are the first I've come across here :)10:12
kashyapgibi: xiaolin: So the error "unexpected host-model" simply means, MIPS in libvirt doesn't have the notion of 'host-model'10:12
kashyapSo, let me check what's possible there...10:13
kashyapxiaolin: What you could try is: launch your `qemu-system-mips -cpu help` (or whatever the MIPS QEMU binary is called)10:15
kashyapxiaolin: On the Compute host, and configure that as an explicit CPU model in Nova:10:15
gibikashyap: thanks for the help :)10:16
kashyap    [libvirt]10:16
kashyap    cpu_mode = custom10:16
kashyap    cpu_model = $My_MIPS_CPU_FIXME10:16
kashyapgibi: Happy to be useful :)10:17
* kashyap goes back to kitchen to finish up the meal10:17
* gibi realizes that it is lunchtime10:19
gibisean-k-mooney: today neutron creates the PF RPs and reports bandwidth inventory on them. when we start reporting VF inventory from nova, we need to agree with neutron on who creates the PF RP, or we have to create two RPs for a PF, one from the neutron perspective and one from the nova perspective, and then connect them somehow10:43
sean-k-mooneygibi: nova needs to do it because it needs to be consistent with non-NIC PCI devices10:44
*** ociuhandu has joined #openstack-nova10:44
sean-k-mooneyunless we are going to special-case when the device has a physnet tag on the pci whitelist10:44
sean-k-mooneywhich we could do but then we need two different code paths10:44
sean-k-mooneygibi: ideally it would end up being one RP10:45
sean-k-mooneyunless we extend placement's data model with a way to link RPs10:45
sean-k-mooneygibi: quick question do you want a blueprint or a spec to track the machine type recording work?10:54
sean-k-mooneyas part of that i would like to start reporting available machine types as traits to placement and then scheduling on that10:54
sean-k-mooneye.g. if we do a move operation on an existing instance i want to ensure we land it on a host that supports that machine type10:55
sean-k-mooneyand if we are creating a new instance and you request a machine type in the image i want to ensure we land on a host that supports that too10:56
sean-k-mooneyboth of which would be done by a prefilter10:56
sean-k-mooneyjust adding the required trait based on the instance system metadata or image metadata10:57
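The prefilter idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the trait prefix `CUSTOM_MACHINE_TYPE_`, the helper names, and the choice of metadata keys (`hw_machine_type` for the image, `image_hw_machine_type` for instance system metadata) are hypothetical here and are not taken from the discussion.

```python
import re

# Hypothetical prefix: the discussion does not say how the traits would be named.
TRAIT_PREFIX = "CUSTOM_MACHINE_TYPE_"


def machine_type_to_trait(machine_type):
    """Map a machine type such as 'pc-q35-5.0' to a custom trait name."""
    # Custom traits may only contain upper-case letters, digits and
    # underscores, so every other run of characters becomes one underscore.
    normalized = re.sub(r"[^A-Z0-9]+", "_", machine_type.upper()).strip("_")
    return TRAIT_PREFIX + normalized


def required_traits(image_props, system_metadata):
    """Return the extra traits a prefilter could add to the request.

    For an existing instance (a move operation) the recorded machine type
    in system metadata wins; for a new instance the image property is used.
    Both key names are assumptions for this sketch.
    """
    machine_type = (system_metadata.get("image_hw_machine_type")
                    or image_props.get("hw_machine_type"))
    if not machine_type:
        # Nothing requested or recorded: do not constrain scheduling.
        return set()
    return {machine_type_to_trait(machine_type)}
```

Returning an empty set when no machine type is known keeps the sketch a no-op for instances that never asked for one.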
gibisean-k-mooney: sorry I was lunching.11:00
sean-k-mooneyno worries11:00
sean-k-mooneygibi: are you ok with also doing the traits reporting and prefilter in the same specless blueprint?11:05
sean-k-mooneyill write it up and i guess we can decide then11:06
*** mgariepy has quit IRC11:06
sean-k-mooneyi assume a specless blueprint is also fine for the doc change to add the common profiles (nfv, realtime, modern) that we talked about too11:07
sean-k-mooneyim going to try and get those up before we resume today11:07
*** udesale_ has joined #openstack-nova11:08
gibisean-k-mooney: the prefilter work is a bit of a grey area as I guess that needs a new config flag to enable since old computes will not report the traits11:11
*** udesale has quit IRC11:11
gibiso you might only want to enable the prefilter after all your computes are on V11:11
sean-k-mooneyyes although we could auto enable based on the min compute service version too.11:12
gibisean-k-mooney: specless bp for the hw:profile doc is totally OK11:12
gibisean-k-mooney: OK, note this upgrade wrinkle in the specless bp then I'm OK11:12
sean-k-mooneysure will do. if i end up having to write a small spec its not the end of the world either but we will see what others think once its up11:13
gibisean-k-mooney: we will bring this bp up in the next nova meeting for approval so the rest of the team can complain if they need more words about it11:13
sean-k-mooneysure works for me11:13
openstackgerritBalazs Gibizer proposed openstack/nova-specs master: Make Feature Liaison optional
gibinova will continue the PTG session in 30 minutes in the Juno Zoom room.12:30
*** slaweq_ is now known as slaweq13:28
sean-k-mooneyartom: you should not13:29
sean-k-mooneyi can change mine and i dont have one13:29
artomsean-k-mooney, in the client?13:29
sean-k-mooneyin both the client and the browser13:29
artomsean-k-mooney, 'cuz I didn't find any way to do it. Even their official doc states you have to be logged in13:29
dansmithadvanced settings need an app I think13:29
dansmithincluding advanced network13:29
sean-k-mooneyartom: what are you trying to change13:30
artomsean-k-mooney, audio output to go to my BT headphones13:30
sean-k-mooneyif you click the arror on mic13:30
artomIt was outputting to stereo out, where nothing is currently plugged in13:30
sean-k-mooneythen you can change the output13:30
artomsean-k-mooney, oh christ, yes, it's there, you're right13:30
*** Liang__ has joined #openstack-nova13:32
artomstephenfin, but my gainz are way too low :( Stupid gyms closing13:41
stephenfinartom: you know where the door is13:41
bauzasgibi: patches from dansmith about the compute 5.0 API bump
dansmithbauzas: yeah13:55
gmanngibi: 1 min joinin13:55
bauzasgibi: the proxy dansmith told
gibigmann: we will discuss the policy stuff after the break at 5 minute past the hour13:55
gibigmann: so no rush13:55
bauzasanyway, less a deal to write the 6.0 proxy than the 5.013:55
gibibauzas: thanks!13:55
gmanngibi: ok13:55
bauzaserr, the 5.x proxy I mean13:56
*** ratailor has quit IRC13:58
*** ralonsoh_ is now known as ralonsoh14:00
*** rpittau|brb is now known as rpittau14:01
kashyapgibi: stephenfin: Has the topic flown by?  Which line is this whole ARM topic on?14:56
stephenfinthe previous one (line 427)14:56
kashyapAh, we're on line-427 currently14:56
gibikashyap: 427 yes14:56
kashyapstephenfin: Yep, just see it, "virtualized architectures"14:56
dansmithsean-k-mooney: so you could just float one DNM patch to some thing to add that queue to get one run of nova on arm yeah?14:58
dansmithI mean, I assume we've never seen that work?14:58
CeeMachi sean-k-mooney and melwitt ; regarding our chat yesterday around shelving, we've done some testing today and it would appear that even in shelved_offloaded state an instance is accumulating usage and the resources still appear allocated in the project quota15:56
CeeMacim back to being perplexed15:57
melwittCeeMac: oh... this is sounding familiar :(15:57
CeeMacwell, that doesnt sound good :/15:57
CeeMaci'm running rocky btw15:58
melwittlong ago, when the shelve API was added I am recalling that we don't release quota while instance is shelved. but I didn't realize that would cause it to count as the simple tenant usage. but it makes sense that it would15:58
melwittthe reasoning behind that decision was, not to let the user ever be in a position where they cannot unshelve for lack of quota15:59
*** factor has quit IRC15:59
*** jsuchome has quit IRC15:59
CeeMacyeah, i was actually having that conversation with my colleagues around implications to consider when shelving16:00
CeeMaci guess, at least in the way I'm ingesting the data, it depends on how horizon is building its usage information for Admin | Overview | Usage summary16:01
sean-k-mooneyso that would be the init_host check17:24
sean-k-mooneythe per vm check is if i add up the MEMORY_MB for all vms on this host is it larger than total ram17:24
sean-k-mooneyif so warn17:24
sean-k-mooneythe less aggressive warning would be to warn on total_ram + swap17:25
sean-k-mooneyinstead of > total ram but once you are over total ram you are in OOM killer territory17:25
melwittsean-k-mooney: ah I see, thanks17:26
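The per-VM check described above amounts to a simple sum and comparison. A minimal sketch, assuming hypothetical function and parameter names (the real check would read these values from the compute host and its instances):

```python
def check_memory_overcommit(vm_memory_mb, total_ram_mb, swap_mb=0,
                            include_swap=False):
    """Return a warning string if the VMs' memory exceeds the host limit.

    vm_memory_mb: iterable with the MEMORY_MB of every VM on the host.
    include_swap: False compares against total RAM only (the stricter
    check); True compares against total RAM plus swap (the less
    aggressive variant).
    """
    allocated = sum(vm_memory_mb)
    limit = total_ram_mb + (swap_mb if include_swap else 0)
    if allocated > limit:
        return ("VM memory (%d MB) exceeds host limit (%d MB); "
                "the host may hit the OOM killer" % (allocated, limit))
    return None
```

With two 4096 MB guests on a 6144 MB host the stricter check warns, while the swap-inclusive variant stays quiet if the host also has 4096 MB of swap.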
sean-k-mooneymelwitt: do you remember the nova bug for that transport thing i brought up at the end17:27
sean-k-mooneyin fact that is an anti-pattern i think17:39
melwittsean-k-mooney: could it have been this?
openstackLaunchpad bug 1854992 in OpenStack Compute (nova) "Frequent instances stuck in BUILD with no apparent failure" [Undecided,Incomplete]17:40
sean-k-mooneymelwitt: yep that was the nova bug17:40
sean-k-mooneywell i think it was the second one17:41
melwittok, can you write a note on it so we can have more hope to find it next time xD17:41
melwittthe only reason I found that was because I thought I remembered erik reporting it17:41
*** ociuhandu has joined #openstack-nova17:42
sean-k-mooneyyep erik is from blizzard right17:42
sean-k-mooneyoptions = oslo_messaging.TransportOptions(at_least_once=True)17:42
sean-k-mooneyclient = oslo_messaging.RPCClient(transport,17:42
sean-k-mooney                                  target,17:42
sean-k-mooney                                  transport_options=options)17:42
sean-k-mooneyso i think that is all we have to do17:42
melwittI wonder if there's a way we could test that it does something17:43
sean-k-mooneywhere we do this
sean-k-mooneyand well the other RPC clients17:43
sean-k-mooneythat is what is trying to do17:44
sean-k-mooneywhen it fails to deliver a message to the queue17:45
sean-k-mooneywe will get a oslo_messaging.exceptions.MessageUndeliverable17:45
melwittyeah I mean, how do you create that env where it would have raised MessageUndeliverable?17:45
melwittjust stop rabbitmq process or?17:45
melwittjust curious how to repro and see it fix the problem17:46
sean-k-mooneyno you need to send a message to a queue that does not exist17:46
melwittok, I don't know how to do that I guess17:47
sean-k-mooneywe could create an exchange without any queue and just send a message to a random queue name17:47
melwittor is that the 'foo' in the example17:47
sean-k-mooneyya so we can send to foo17:48
sean-k-mooneybut not create foo17:48
melwittok. shows how much I know about rabbitmq :P17:48
sean-k-mooneywell i think that is how this works17:48
melwittcool. I'm just thinking about a one-off local test just so we know we set the option in the right place and get the improved behavior. since the unit tests that go with the patch will only be checking "did you pass mandatory at this call site"17:49
sean-k-mooneyi'm wondering if we could do an opportunistic functional test or something17:50
sean-k-mooneyso based on the fact that skips if its not rabbit://17:57
sean-k-mooneyi would guess that it only works if its available17:57
melwittok. besides that, that makes it look like we need to do more than just set the flag right? we have to do something to handle MessageUndeliverable17:59
sean-k-mooneywhich would depend on the rpc call17:59
sean-k-mooneyfor spawn i guess put the vm to error state?17:59
sean-k-mooneynot really sure about what we would do for anything else18:00
melwittah so this is more complicated than I thought18:01
melwittI thought all we'd have to do is set the flag and then oslo.messaging would reconnect us and retry or something like that18:01
sean-k-mooneythe issue is that the queue that the compute agent created no longer exists but the compute agent does not know that. so unless we had the compute agent call itself on that queue in a heartbeat we would not know we had to recreate the queue18:01
sean-k-mooneymelwitt: im not sure if we can create the queue or if it has to be the compute agent18:02
sean-k-mooneythe compute agent normally creates the queue when it connects to rabbitmq18:02
sean-k-mooneywhich is why restarting the compute agent fixes the problem18:02
sean-k-mooneyso at a minimum we can log that the agent is unreachable. we could even mark the agent as down but that leaves the question of who will make it up again once its restarted18:03
melwittyeah. well, then I'd wonder if we could put try-except in the nova/rpc layer that will create the queue if we get MessageUndeliverable? that way whoever gets the "no queue" state will recreate it?18:03
sean-k-mooneyif we had a periodic task in the compute agent that sent to its own queue then that would work18:04
sean-k-mooneyit might work if the conductor or api created it too18:04
sean-k-mooneyi just dont know if the compute agent will start listening to the queue when its created by someone else18:05
melwittyeah I guess I'm thinking it wouldn't matter who creates the queue. like if nova-conductor gets MessageUndeliverable trying to talk to compute, assume the queue is gone and recreate it, and resend18:05
melwittoh, right18:05
sean-k-mooneythat is why i was suggesting having the compute agent send to its own queue18:06
sean-k-mooneyjust a simple ping/heartbeat18:06
melwittperiodic does sound like it would work but that doesn't help someone trying to boot an instance and it fails due to this18:06
melwittbut of course better than what we have today18:07
sean-k-mooneywe could retry after a short interval?18:07
sean-k-mooneyhoping the agent fixes its self?18:07
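The "retry after a short interval" option floated above could look roughly like this. It is a sketch only: `MessageUndeliverable` below is a local stand-in for `oslo_messaging.exceptions.MessageUndeliverable` so the example runs without a message bus, and `call_with_retry`, its parameters, and the retry policy are hypothetical rather than anything nova or oslo.messaging provides.

```python
import time


class MessageUndeliverable(Exception):
    """Local stand-in for oslo_messaging.exceptions.MessageUndeliverable."""


def call_with_retry(send, attempts=3, interval=1.0, sleep=time.sleep):
    """Call send(), retrying when the message cannot be routed to a queue.

    send: zero-argument callable that performs the RPC cast or call.
    attempts: total number of tries before giving up.
    interval: seconds to wait between tries, giving the compute agent a
        chance to recreate its queue.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except MessageUndeliverable:
            if attempt == attempts:
                # Out of retries: let the caller decide what to do,
                # e.g. put the instance into an error state.
                raise
            sleep(interval)
```

A retry only helps if something recreates the queue in the meantime, which is exactly the open question in the discussion; otherwise every attempt fails the same way and the exception still reaches the caller.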
*** ralonsoh has quit IRC18:07
sean-k-mooneymaybe if we have time tomorrow and ben or some of the oslo folks are around we could ask them18:08
sean-k-mooneyor i guess i could try it locally18:08
melwittyeah... I don't love that idea but yeah, would take some thought on some options18:08
sean-k-mooneyi could use the rabbitmq gui to delete the queue manually18:08
sean-k-mooneyi dont think the management ui is installed by devstack by default but i dont think its that hard to enable18:09
sean-k-mooneyif i could reproduce it manually then we could test the recreate behavior18:10
sean-k-mooneye.g. if i make the conductor recreate it and i still can't boot a vm i know it needs to be the agent18:10
melwittyeah doing that would demystify a lot. but yeah gonna be a pain/not ideal if nova-compute would need to re-init to pick up the new queue18:11
sean-k-mooneyim going to go figure out what im doing for dinner and then i might give it a try although i kind of want to look into it some other time18:14
sean-k-mooneycan you bug me about this if you remember tomorrow or next week18:14
bnemecIf you have messaging stuff to talk about I would suggest pinging kgiusti to make sure he's available.18:14
melwittsure. I'll try to write something on the launchpad bug so I don't forget about this a sixth time or whatever we're at now18:14
melwittin addition18:15
sean-k-mooneybnemec: well we are just trying to figure out how to use the mandatory flag that ye enabled via the transport options correctly18:15
sean-k-mooneybut my rabbitmq knowledge is really not good enough to have an intuition about this18:16
melwittuse it + handle the exception that will raise as a result of setting it18:16
melwitt*set it18:16
sean-k-mooneymelwitt: hehe yep 6th might be a bit much but at least third or fourth :)18:17
sean-k-mooneywe finally merged the patch to silence the amqp heartbeat error by the way18:19
openstackstatusmelwitt: Added success to Success page (
melwittlol oops18:20
sean-k-mooneyok time for food o/18:21
melwittI forgot about the hashtag success being a thing 😬18:22
