13:01:23 <baoli> #startmeeting PCI Passthrough
13:01:24 <openstack> Meeting started Tue Nov 18 13:01:23 2014 UTC and is due to finish in 60 minutes.  The chair is baoli. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:27 <openstack> The meeting name has been set to 'pci_passthrough'
13:01:30 <baoli> Hi there
13:01:33 <beagles> hi
13:01:36 <pczesno> hi
13:01:38 <sadasu> hello
13:01:38 <riwinters> hi
13:01:57 <yonglihe> hi
13:02:24 <irenab> hi
13:02:44 <baoli> Looks like everyone is here. Let's get started
13:02:57 <baoli> #topic bugs
13:03:14 <baoli> https://wiki.openstack.org/wiki/Meetings/Passthrough#Agenda_on_Nov._18th.2C_2014
13:03:40 <baoli> So no new bugs
13:04:33 <yonglihe> sounds good
13:05:02 <baoli> yonglihe: any progress on the 'resize' bug?
13:05:09 <yonglihe> yes
13:05:13 <irenab> I guess we need to be sure that proposed patches are tested on real HW
13:05:49 <yonglihe> for now, the flow design is ok, and resize to the same host is working fine.
13:06:01 <yonglihe> resize to another machine is still being debugged.
13:06:38 <yonglihe> and just to raise a factor that impacts the design:
13:06:57 <yonglihe> should we ensure that reverting a resize gets the original pci devices back?
13:07:28 <yonglihe> and, should we ensure that reverting can always succeed?
13:07:43 <yonglihe> the 2nd is like a must.
13:07:58 <yonglihe> but about the 1st, i'm not very sure.
13:08:11 <yonglihe> currently, i ensure both.
13:08:19 <irenab> yonglihe: I think it should work in a sort of 'make before break' fashion
13:08:51 <yonglihe> irenab, i don't get it, what is it?
13:09:14 <irenab> I mean if resize is not working on the other node, you should stay with what was working before
13:09:54 <irenab> I think it is similar to migration flow
13:09:57 <baoli> yonglihe: I see, can you share your preliminary patch if possible? I wonder if there are common areas between resize and live migration
13:10:20 <irenab> I think it may be also relevant to interface-attach
13:10:41 <yonglihe> baoli: share most of code
13:11:11 <baoli> yonglihe, are you working on 'resize' with generic PCI device, or SR-IOV as well
13:11:13 <sadasu> for resize within the same host, we should be able to use the same pci devices
13:11:37 <yonglihe> sadasu, ok, currently it is.
13:12:11 <sadasu> resize on a diff host, should be similar to bringing up a whole new VM with new criteria, once that is up, free up old VM
13:12:45 <irenab> and if it failed, keep the initial VM
13:12:58 <yonglihe> baoli, it's time consuming to debug. to ensure both constraints 1 & 2, 2 resource 'update' points were missing; i added them, but they need testing and debugging.
13:13:42 <irenab> yonglihe: can the fix be split into 2 parts: resize on the same node and resize to another node?
13:13:44 <yonglihe> irenab, keeping the initial pci devices is ok, but not the initial vm, because if it's not a live migration, the vm is destroyed.
13:14:13 <yonglihe> irenab: it can, but i have not split it, and there are no unit tests.
13:14:23 <baoli> yonglihe, I understand.
13:14:42 <irenab> yonglihe: ok, I was not aware that there is no option to choose the fallback
13:14:45 <yonglihe> i can throw a 'WIP' single host version, if you guys want it.
13:15:09 <irenab> yonglihe: great, we can grab and test it as well
13:15:11 <baoli> yonglihe, sure a WIP patch is welcome so that we can see where you are at and going
13:15:41 <baoli> Anything else on bugs?
13:15:53 <yonglihe> ok, single host is working, including device reserve,  release, resize, reverting and aborting.
13:16:32 <yonglihe> resize from 0-1, 1-2, 2-1 also tested, but not enough.
13:17:07 <irenab> baoli: neutron bug is fixed
13:17:14 <baoli> yonglihe, that is very good progress.
13:17:20 <yonglihe> a heads up: pci devices are kind of new for the nova resource tracker, and do not fit well with the 'resetting to free and collecting from instances and migrations' style
13:17:49 <baoli> irenab, cool
13:18:48 <baoli> yonglihe, so looking forward to your WIP patch
13:18:57 <yonglihe> so, resource reservation, release and all of those depend on the state of the operation and on some key 'resource updating' points.
13:19:01 <irenab> I have related question
13:19:21 <yonglihe> such a patch will be kind of a 'mess'.
13:20:09 <yonglihe> another bug is 'sync with the db'.
13:21:03 <yonglihe> there are 2 patches there, but i had a new idea: an approach to the 'collection from instance' resolution, but it's kind of a long term refinement task.
13:21:30 <yonglihe> s/resolution/solution/
13:21:54 <baoli> yonglihe: what's "collection from instance"?
13:21:59 <irenab> yonglihe: what bug number/link?
13:23:12 <baoli> irenab: 1383465, 1333498
13:23:32 <yonglihe> baoli: think about the vcpu/memory
13:24:14 <yonglihe> on every round of resource updates, the resource tracker resets them to free, then subtracts what is used by this node's instances and migrations.
13:25:00 <yonglihe> this approach will make resize/migration clearer, but requires more changes to db operations.
13:25:37 <baoli> yonglihe, I will take a look at it offline. But right now, we need something that is working with the existing design, I think
13:25:56 <yonglihe> currently the pci db 'forbids' operating on a device from another node, which makes the resize code messy.
13:26:00 <yonglihe> baoli, agree
13:26:10 <irenab> yonglihe: I think it is worth adding your concerns to the bug
13:26:28 <yonglihe> irenab, i will update
13:27:04 <irenab> it will be easier to understand why it was resolved in a certain way
13:27:16 <sadasu> irenab: what neutron bug were you referring to earlier? link pls?
13:27:35 <irenab> sadasu: it is on the meeting agenda
13:27:41 <sadasu> ok....thanks
13:28:00 <baoli> yonglihe, you are welcome to write up your thoughts so that we can understand what you want to do. Also your WIP patch would help us know where you are at.
13:28:03 <irenab> https://bugs.launchpad.net/neutron/+bugs?field.tag=pci-passthrough
13:28:19 <yonglihe> baoli, sure.
13:28:46 <baoli> yonglihe, thanks
13:28:53 <baoli> Can we move on?
13:28:55 <yonglihe> another thing, i am not testing sriov. it requires a few more patches to do that.
13:29:14 <baoli> yonglihe, that's fine
13:29:18 <irenab> I do not want to be a troublemaker, but I think we should think about appropriate testing for the fixes
13:29:44 <irenab> tempest?
13:29:51 <yonglihe> can't
13:29:56 <yonglihe> need third-party CI
13:30:09 <yonglihe> for a formal way to ensure pci is ok.
13:30:30 <irenab> yonglihe: agree. But we need CI to run the test
13:31:06 <yonglihe> there is no method to fake 'pci passthrough' devices. if there is one, tell me, and we can then set up official tempest cases.
13:31:14 <irenab> we may probably discuss it as part of the CI topic, so let's move on
13:31:19 <baoli> irenab, CI is on the agenda
13:31:28 <baoli> #topic Reviews
13:32:05 <baoli> On the new whitelist format, some cores prefer a different file approach
13:32:35 <baoli> But Ian's patch is still under review
13:33:25 <sadasu> my 2 cents, if these cores are very active in the NFV area, we should consider this approach
13:34:02 <baoli> Haven't seen Ian responding to the email thread. So we'll see
13:34:44 <baoli> a few reviews require +2s. So let's ask the cores to take a look at them
13:34:45 <sadasu> Ian as in ijw was off last week
13:35:15 <sadasu> or is that a diff Ian
13:35:16 <baoli> sadasu, it is Ian Wienand
13:35:22 <sadasu> ok :-)
13:39:53 <baoli> #topic Blueprints
13:41:45 <baoli> On live migration, I was thinking that the VM should move to a host in which PCI devices are from the same vendor/product_id as the source host. Do you think this makes sense as a requirement?
13:42:40 <irenab> baoli: interesting...Not sure I agree...
13:43:08 <jchapman> Hi all, SRIOV scheduling with stateless offloads spec will be uploaded today
13:43:36 <irenab> I mean with the general approach to support multi vendors
13:44:08 <irenab> baoli: is it possible to make it an optional requirement?
13:44:09 <yonglihe> jchapman, add me
13:44:37 <baoli> irenab, are you thinking that you may live migrate a VM with sr-iov from one vendor to a different one? How do we make sure these devices are compatible?
13:44:37 <yonglihe> is live migration to a different pci device possible?
13:44:46 <sadasu> baoli: don't think that should be a requirement
13:44:49 <jchapman> yonglihe: sure thing
13:44:58 <beagles> baoli, I'm curious about that myself
13:45:04 <yonglihe> if the device changes, does the driver know that? and does the hypervisor support such a thing?
13:45:41 <baoli> jchapman, that's cool.
13:45:54 <yonglihe> even if the pci address changes, does the hypervisor support such live migration?
13:46:34 <baoli> irenab, for example, if a VM uses cisco's vmfex, it can't be live migrated to a host that uses non vmfex devices.
13:47:24 <irenab> baoli: I think that here some devices can coexist and some cannot. Intel and Mellanox can probably be replaced, but Cisco has a unique technology
13:47:34 <sadasu> baoli: I think the decision should still be based on physical network
13:48:12 <baoli> sadasu, then how do you make sure the devices are compatible?
13:48:19 <irenab> but I think baoli is right, since image should contain the appropriate vendor drivers
13:48:28 <yonglihe> the live migration constraints should be thought through carefully
13:48:53 <yonglihe> i noticed there is a way to achieve this through suspend to ram, but it may still be limited.
13:49:44 <irenab> I think we may start with sticking to the same vendor and evolve later, and see what will be raised during spec review
13:49:54 <baoli> yonglihe, we're talking about live migration with macvtap, not direct passthrough
13:50:12 <yonglihe> baoli, sorry, forgot it again
13:50:21 <irenab> I forgot too :-)
13:50:58 <irenab> so actually we do not have a problem with the vendor driver inside the VM, so why do we need the limitation?
13:50:59 <baoli> irenab, I agree.
13:51:13 <yonglihe> agree
13:51:43 <sadasu> i feel we should only worry about physical network connectivity
13:51:50 <baoli> irenab, I don't think we'd have a problem with cold migration to different vendors, as long as they are on the same network
13:52:29 <sadasu> baoli: what is the difference between cold migration and live migration?
13:52:54 <beagles> time is running a little short, so I figure I'll just interrupt - I'm putting together a blueprint for supporting bonds interfaces in guests... on the face of it, it doesn't seem like a "big deal" but I may be missing some critical bits. I'll be uploading a WIP tomorrow
13:53:06 <beagles> (lots of meetings today so I don't expect it to be today)
13:53:12 <irenab> baoli: I think the best will be if you push the spec as WIP so we can start to get an idea of what is your plan
13:53:24 <irenab> beagles: great!
13:53:36 <baoli> irenab, let's try that.
13:53:43 <baoli> beagles, looking forward to it
13:54:37 <baoli> sadasu, with live migration the VM will look like it has no downtime
13:55:25 <baoli> beagles, please update the meeting wiki with your WIP
13:55:36 * beagles nods
13:55:41 <sadasu> beagles: thanks..no idea what "bonds interfaces in guests" means but will look for your spec
13:56:13 <irenab> beagles: do you plan for multi vendor support as well? Is it vendor agnostic?
13:56:49 <pczesno> i will try to restart the discussion about nova api changes to support booting with sriov ports
13:57:10 <sadasu> baoli: going back to live/cold migration..can we have diff criteria for each as you are suggesting in the earlier comment?
13:57:32 <baoli> sadasu, that's possible.
13:57:37 <pczesno> i understand that there were earlier approaches to do that, can you guys provide links to old reviews, discussions etc.?
13:57:47 <beagles> irenab, considering the use case I'm focussing on, I suspect that it will be vendor agnostic but I would not expect users to select from different devices to create a bond in the guest... although...
13:58:02 <sadasu> baoli: ok
13:58:09 <beagles> now that I think of it, I don't know why they wouldn't want to do that if the guest OS supported it
13:58:11 <baoli> pczesno: one sec.
13:58:22 <irenab> pczesno: great, I just saw a proposed nova spec for multi-queue proposing an api change like the one we planned for vnic_type
13:58:46 <beagles> .. thinking that if HA is the objective the different upstream links might be across diverse hardware
13:59:10 <irenab> beagles: agree with every word
13:59:21 <beagles> irenab, in any case, I wasn't thinking of anything that would be restrictive or exclusionary
14:00:01 <irenab> beagles: this was my understanding as well
14:00:18 <pczesno> irenab, baoli can you add the old links to the agenda pls? i gotta go
14:00:19 <pczesno> bye
14:00:30 <baoli> pczesno, sure
14:00:42 <irenab> pczesno: what irc are you usually on? I can try to find the relevant links
14:01:20 <irenab> let's start with CI at our next meeting, ok?
14:01:24 <sadasu> time is up
14:01:38 <baoli> ok, next time
14:01:42 <yonglihe> time to say you again.
14:01:42 <baoli> thanks everyone
14:01:44 <yonglihe> see
14:01:48 <yonglihe> thanks
14:01:50 <irenab> bye
14:01:53 <riwinters> bye
14:01:55 <sadasu> thanks again
14:01:56 <baoli> #endmeeting