13:04:42 #startmeeting PCI passthrough
13:04:43 Meeting started Wed Jan 8 13:04:42 2014 UTC and is due to finish in 60 minutes. The chair is baoli. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:04:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:04:46 The meeting name has been set to 'pci_passthrough'
13:05:01 hi, do we have a rough agenda for today?
13:05:13 Hi John
13:05:27 I posted this on the wiki yesterday: https://wiki.openstack.org/wiki/Meetings/Passthrough
13:06:11 baoli: how do you suggest we proceed?
13:06:18 Yesterday, we were discussing predefined PCI groups. Yongli doesn't seem to like the idea
13:06:37 Let's continue from where we left off yesterday
13:06:41 baoli: got it, thanks
13:07:27 baoli: will the meeting change back to Tuesdays once agreement is reached? I haven't updated the meeting calendar given it's very temporary...
13:07:46 ttx, yes.
13:07:58 ok, let's skip the calendar update then :)
13:08:03 we will be doing daily meetings for this week.
13:08:19 except for Friday/Saturday
13:08:33 baoli: I can't promise to make all of those, but let's see how it goes
13:09:04 Have we agreed on the list of use cases we want to support yet?
13:09:29 like a short-term list (for Icehouse) and longer-term aims too?
13:09:58 john, we didn't go through those cases yet, but got stuck on the first part
13:10:17 I thought use cases would be the first part; which bit are we stuck on?
13:10:18 But I guess that we should go through them first?
13:10:34 baoli: there is a list of use cases you put on the wiki
13:11:03 Shall we start from use cases today, then?
13:11:03 I just miss one more case there, for mixed VIFs: both SRIOV and virtual NICs for the same VM
13:11:19 #topic use cases
13:11:22 i think john means the use cases in the nova blueprint.
13:11:33 irenab, yes, I should put that in
13:11:53 heyongli: I think we probably want both, but let's start with this wiki first
13:12:02 sure
13:12:29 #topic SRIOV-based cloud
13:12:37 Any thoughts on this?
13:12:51 Can we start with GPU passthrough?
13:13:00 just to keep things simple
13:13:04 Ok
13:13:27 how do we want that to look?
13:13:39 nova boot --flavor bigGPU
13:14:04 nova boot --flavor smallGPU_4GBRAM_2_vCPUs
13:14:06 John, our discussion so far is based on PCI groups
13:14:31 a group is almost identical to a pci-flavor
13:14:39 I think there will be more agreement if we work from what the user wants, then look at how to deliver that
13:15:01 i.e. agree on the problem we are solving, then look at how to implement it
13:15:08 then apply that to networking
13:15:26 johnthetubaguy: I think we mostly talked about PCI for networking, and this is quite different from the GPU case
13:15:31 #agreed
13:16:02 I agree it's different, but we need the object model to work for both, right?
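To make the flavor-based GPU request above concrete, here is a minimal sketch using nova's existing alias extra spec (recapped below in the meeting); the flavor sizing, the alias name "bigGPU", and the <image> placeholder are all illustrative, not agreed values:

    # hypothetical flavor; the RAM/disk/vCPU values are made up
    nova flavor-create bigGPU auto 16384 80 8
    # request one device matching the "bigGPU" PCI alias via the extra spec
    nova flavor-key bigGPU set "pci_passthrough:alias"="bigGPU:1"
    # the user then only picks the flavor
    nova boot --flavor bigGPU --image <image> gpu-vm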
13:16:12 We have been working on this for a while, and we certainly think about it from the user's point of view
13:16:34 Also taking into account the existing APIs we have in nova/neutron
13:16:59 johnthetubaguy: not sure it will be the same from a request point of view
13:17:26 I have a strong objection to putting the SRIOV NIC request in the flavor
13:17:46 John, in any case, a PCI group/pci flavor can be used in the nova server flavor
13:17:49 which is fine for the device passthrough case
13:17:58 yes, and I think I agree, but I would just like to see both SRIOV and GPU side by side
13:18:30 if we agree how to set up a GPU, for example, it should be very similar for SRIOV; agreed, the user bit is probably different
13:18:42 I think the GPU case should be mostly as today, with an extra_spec for the PCI device
13:19:09 the proposal is to change the terminology from pci_alias to pci_group
13:20:02 irena: no, i think the alias is different; we can drop it, but it's not the same thing as a group
13:20:06 OK, can we recap what we have in the code today, if only for my benefit?
13:20:22 then agree how GPU looks in the new(er) world?
13:20:52 Yongli, can you go ahead and describe that for John?
13:20:55 what we have now is: alias: defines how you choose a device
13:21:11 server flavor: uses an alias to request your device
13:21:37 whitelist: selects devices from a host, picking which can be assigned to VMs
13:22:00 Just want to add that the extra_specs/whitelist is based on the PCI device's vendor_id and product_id
13:22:22 and how do the whitelist and alias relate again?
13:22:41 by vendor_id and product_id
13:22:55 the alias chooses a device from the available pool
13:23:10 how does the device id come into things? only via the whitelist?
13:23:13 the whitelist chooses devices from all those on a specific host
13:24:13 Ok, so if I want a flavor that says pick either GPUv3 or GPUv4, can I do that?
13:24:36 the alias supports this
13:24:57 define an alias, say, the GPUv3 or GPUv4
13:25:05 OK, so an alias is a list of possible vendor_ids and product_ids?
13:25:12 yeah
13:25:16 does it include device ids?
13:25:29 which id do you mean?
13:25:43 the PCI device id, where does that come into the model?
13:26:11 no, the alias does not include the device id (the DB main key)
13:26:50 so where does the device id come from? it gets selected out of the whitelist on the device when attaching it to the VM?
13:26:51 John, by id, do you mean the PCI slot?
13:26:59 possibly
13:27:21 that information is stored in the pci device model
13:27:47 I think I mean address, sorry
13:28:03 the alias should not include the address
13:28:05 domain:bus:slot:func
13:28:16 right, that's the thing
13:28:26 the whitelist does not either, but I already added it in the current patch I released
13:28:28 is that in the whitelist?
13:28:40 ah, OK
13:28:57 and it supports * and [1-5]
13:29:01 so the big step between GPU and SRIOV is grouping different addresses?
13:29:33 yes, they belong to different groups
13:29:52 so that should go in the alias now? for SRIOV?
13:30:05 the alias doesn't need the address
13:30:16 Are we talking about SRIOV for networking or in general?
13:30:18 adding the group to the alias is sufficient
13:30:37 OK, so we are adding an extra thing called group?
13:30:43 this is also in the patches I released
13:30:46 that deals with grouping addresses?
13:30:46 yeah
13:30:59 why is this not just part of the alias? that is just a grouping, right?
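As a reference for the recap above, a minimal sketch of the Havana-era configuration, with hypothetical NVIDIA vendor/product ids; the "pick either GPUv3 or GPUv4" case is expressed here as alias entries sharing one name, which is how the discussion above suggests the alias handles alternatives:

    # on each compute node (nova.conf): which local devices may be assigned to VMs
    pci_passthrough_whitelist = {"vendor_id": "10de", "product_id": "0ff2"}
    pci_passthrough_whitelist = {"vendor_id": "10de", "product_id": "11bf"}
    # on the controller (nova.conf): a named request, referenced from flavor extra specs
    pci_alias = {"vendor_id": "10de", "product_id": "0ff2", "name": "GPUv3or4"}
    pci_alias = {"vendor_id": "10de", "product_id": "11bf", "name": "GPUv3or4"}

A flavor would then ask for one such device with the extra spec "pci_passthrough:alias"="GPUv3or4:1".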
13:31:12 yeah, just in the group
13:31:16 the alias is global
13:31:23 (sorry, lots of dumb questions, but I just don't think I get where you are coming from now)
13:31:30 so a group is going to be local to each server?
13:31:39 you should not say "I want the device whose BDF is a:b:c"; that is meaningless
13:31:59 a PCI group is global
13:32:02 kind of local, like the pci vendor id
13:32:20 if we keep the alias as is, this is local
13:32:29 if we kill the alias, this is going to be global
13:32:35 Yongli, the alias is defined on the controller node
13:32:41 yeah
13:32:48 hmm, but it's a local thing that gets referenced in a global concept (flavor)
13:32:55 I think this is where it gets very confusing
13:33:11 kind of confusing; we might have a better solution
13:33:34 So, from my outsider view, this seems:
13:33:46 but a group is very like the vendor id
13:34:15 (a) roughly complete, but (b) a bit confusing, and (c) re-inventing groupings we already have in other bits of nova
13:34:17 we can say the vendor id is global, because it's allocated by the pci world
13:34:40 I think we can agree on this though...
13:35:08 a PCI device has: vendor_id, product_id, address
13:35:27 and we want to group them
13:35:44 well, vendor_id is a hardware-specific thing
13:35:50 types of GPU (don't care about address), types of VIF (do care about specific groups of addresses)
13:36:25 by default we should not expose any of these devices, unless we configure nova to allow such a device on a particular host to be exposed
13:36:26 VIFs should not care about the address, I think; they just need partitioning by address, am I right?
13:36:47 well, they are grouped by an address range, right?
13:36:59 yeah, I think so
13:37:14 john: it may be the PF
13:37:24 the parent of all Virtual Functions
13:37:44 ah, OK, so we have virtual functions from a specific address too?
13:37:55 or is the function just part of the address?
13:38:18 John, in SRIOV, we have PFs and VFs
13:38:40 PF: physical function, VF: virtual function. The function is part of the address
13:38:56 that's cool, just checking we are still grouping by address
13:38:58 a Virtual Function is a PCI device of an SRIOV NIC that has a parent Physical Function representing the SRIOV NIC itself
13:39:21 that's all cool, just trying to work out what we are grouping
13:39:54 A PCI group is a collection of PCI devices that share the same function or belong to the same subsystem in a cloud.
13:40:27 actually, what we need for the basic networking case is grouping by network connectivity
13:40:57 irenab, that's what I mean by subsystem
13:41:02 OK, so we need some way to link the address to the neutron network-uuid?
13:41:42 in the case of SRIOV, new --nic options will achieve that
13:42:15 john: yes, but we need to make sure that the VM is scheduled on an appropriate host
13:42:38 well, I am not sure it always can; the user doesn't know which host that request will land on, right? it just hints at some mappings
13:42:51 +1
13:42:54 anyway, I think we are moving forward here
13:43:24 a pci group is a logical abstraction
13:43:53 john: that's the idea. Based on the VM boot request, it should be scheduled on a host that is capable of providing SRIOV NICs and connecting to the correct physical network
13:43:59 it doesn't care where it lands, as long as it's using a device in a particular pci group
13:44:25 agree
13:44:45 right, so what is the user requesting here
13:45:05 the neutron network, and the type of connection?
13:45:27 so passthrough or virtual, and also which type of passthrough, 1Gb or 10Gb, etc?
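The address-based whitelist and the group tag discussed above only exist in Yongli's in-review patches at this point, so the syntax below is a hypothetical sketch rather than merged code; the address and group name are made up:

    # whitelist by PCI address (wildcards * and ranges like [1-5] were said
    # to be supported), tagging the matched VFs with a PCI group
    pci_passthrough_whitelist = {"address": "0000:0a:00.[1-5]", "group": "fabric-A-10G"}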
13:45:29 a neutron network with a NIC that is in a particular PCI group 13:45:39 on wiki: nova boot --flavor m1.large --image --nic net-id=,vnic-type=macvtap,pci-group= 13:45:42 I am trying to ignore our terms here, and thing of the user 13:45:47 think^ 13:46:02 John, 1Gb or 10 GB is a qos thing 13:46:34 It's not related to what we are discussing here. But conceptually, you can have a PCI group with 1GB nics 13:46:36 depends, it could be different cards right? 13:46:54 john: on --nic there is waht we think is needed 13:47:28 to deal with 1G, 10G thing, add the pci device_id to alias is a good solution 13:47:37 i think 13:48:05 AGain, you can use PCI groups to group NICs that are on different kind of cards 13:48:16 also work 13:48:33 OK 13:48:47 I have written up what I think we said here: 13:48:48 https://wiki.openstack.org/wiki/Meetings/Passthrough#Definitions 13:48:59 Do we all agree with those statements? 13:49:40 sorry, I missed a bit, please refresh 13:49:48 extra bullet on SRIOV 13:50:23 i post my +1 13:50:29 john: I think the last SRIOV bullet is not accurate. 13:50:54 irenab__: yeah, I don't like it, what is a better statement? 13:51:15 Its not specific to neutron network , its specific to provider_network that many neutron networks can be defined for 13:51:39 John, can we go throught the original post and see if they make sense? 13:51:43 OK, so it could be specific to a group of netron networks? 13:51:52 john: yes 13:52:06 irenab__: awesome, got you, thanks 13:53:00 irenab__: can you check my update please, is that better? 13:53:32 baoli: we can do that next, I just wanted to agree some basics of what we have, and what we need 13:54:04 ok 13:54:26 john: its OK. Not sure what you mean by specific configuration 13:54:45 I was meaning neutron might specify settings like VLAN id 13:55:19 john: correct 13:55:30 cool, thanks, let me add an e.g. 13:55:55 so I guess in basic cases we pass the full nic though 13:56:08 and its straight to the provider network 13:56:11 each device can be configured differently, but the common part is that it has same network connectivity (to the same Fabric) 13:56:22 but if we have virtual devices, we can do some fun stuff 13:56:25 right 13:57:10 john: with full NIC passthrough , I think there is nothing for neutron to do 13:57:38 irenab__: yeah, it probably gives the guest IP addresses, and things, but yes, there is little connection info I guess 13:58:30 In full passthrough it can be only configured from inside the VM 13:58:39 server flavor can still be used for generic PCI passthrought 13:58:50 at least for cases I need, we talk only of SRIOV VFs 13:58:51 I don't get why that is, neutron DHCP can still setup, if its given the mac address? 13:59:28 jogh: agree. I mean that you need VM to actually do something to get the config, like send DHCP request 13:59:30 they might mean pass through regular PCI 13:59:51 irenab__: ah, yep, sorry, thats true 14:00:09 cool, so I think we can agree the GPU passthrough case then... 14:00:30 user requests a flavor extra specs *imply* which possible PCI devices can be connected 14:00:33 still base on alias, right ? 14:00:51 john: would you be available tomorrow for this meeting to dig into SRIOV net details? 14:00:55 I am leaving that out for now.. we can add that later 14:01:06 what time is tomorrow? 14:01:15 same time 14:01:23 13.00 UTC? 14:01:27 that should be OK 14:01:31 Yes 14:01:40 great. thanks 14:01:45 Do we want to end this meeting now? 
14:01:57 we might have to soon
14:02:02 I can do another 10 mins
14:02:14 I can too
14:02:20 cool
14:02:23 fine
14:03:07 Do we want to start on the SRIOV NIC case?
14:03:32 well, just thinking about making a statement like
14:03:33 the user requests a flavor whose extra specs *imply* which possible PCI devices can be connected
14:03:42 as in, that's the GPU case
14:03:54 what do we say for the SRIOV case?
14:04:13 I think a flavor extra spec is not a good solution for the networking case
14:04:37 a VM needs NICs from one or more PCI groups
14:04:51 the user requests neutron nics, on specific neutron networks, but connected in a specific way (i.e. high-speed SRIOV vs virtual)
14:05:01 does that make sense?
14:05:15 and a VM can be attached to different virtual networks
14:05:39 and an interface can be attached/detached later on
14:05:51 I should say, a VM needs NICs on some networks from some PCI groups
14:05:55 some of the nics may be virtual, some may be passthrough, and some might be a different type of passthrough
14:06:03 yes
14:06:08 john: correct
14:06:15 I am trying to exclude any of the admin terms from the user description
14:06:26 so we have a clear vision we can agree on, that's all
14:06:49 #agreed
14:06:56 john: vision yes, implementation details no
14:07:00 OK, I updated the wiki page
14:07:08 https://wiki.openstack.org/wiki/Meetings/Passthrough#The_user_view_of_requesting_things
14:07:14 do we agree on that?
14:07:47 john: yes
14:08:14 sorry to take up the whole meeting on this, but really happy to get a set of aims we all agree on now
14:08:14 #agreed
14:08:19 sweet
14:08:30 +1
14:08:35 so I think the question now is how we get the admin to set this up and configure it
14:08:40 and what we call everything
14:09:01 agree
14:09:06 that sounds like something for tomorrow, but maybe spend 5 mins discussing one point...
14:09:06 #agreed
14:09:15 ok
14:09:20 at the summit we raised an issue with the current config
14:09:44 john: can you recap
14:10:10 basically we are trying to keep more of the config API-driven, to stop the need for reloading nova.conf, etc., and for general ease of configuration
14:10:18 now clearly not everything should be an API
14:11:01 also, in other sessions, we have pushed back on ideas that introduce new groupings that are already covered by existing generic groupings (i.e. use host aggregates, don't just add a new grouping)
14:11:06 John, we have discussed configuration versus API for the past couple of meetings. Would you be able to look at the logs? I can send you the logs
14:11:31 yeah, if you can mail me the logs that would be awesome, or are they on the usual web address?
14:11:43 did we have nova-core review any outcomes of that yet?
14:11:47 john: we try to define auto-discovery of PCI devices in order to minimize the items needed for config
14:11:55 right, that sounds good
14:12:11 I should read up on those logs
14:12:16 A couple of them are in the daily logs, but not in the meeting logs
14:12:29 ah...
14:12:49 I need to find a way to link them back here. I'll try to do that
14:12:59 baoli: I think it's better you send them, since there was a meeting name change and one meeting without starting...
14:13:12 cool, we should probably end this meeting, then add those pointers to the wiki page?
14:13:18 I'll send them again.
14:13:28 Sure
14:13:32 cool, could we just add it to that meeting wiki page?
14:13:37 cool
14:13:49 thanks, I think the meeting was productive. see you tomorrow
14:13:55 I'll do both. See you guys tomorrow
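On attaching/detaching interfaces later, the existing commands are sketched below; whether they would grow the same vnic-type/pci-group options for SRIOV was still open at this point, and the ids are placeholders:

    # attach another NIC to a running server
    nova interface-attach --net-id <net-uuid> <server>
    # detach it again by port
    nova interface-detach <server> <port-uuid>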
14:13:55 thanks, baoli
14:14:00 so to be upfront, I think we can do the whole grouping with host aggregates and an API to list all PCI devices
14:14:09 but yep, let's chat tomorrow!
14:14:21 #endmeeting
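For reference, the closing host-aggregate suggestion maps onto commands nova already has; how aggregate metadata would actually be linked to PCI devices was not worked out in this meeting, so the metadata key below is purely hypothetical:

    # group the hosts that carry the devices using the existing mechanism
    nova aggregate-create gpu-hosts
    nova aggregate-add-host gpu-hosts compute-01
    # hypothetical metadata key; the real linkage to PCI scheduling was undecided
    nova aggregate-set-metadata gpu-hosts pci_devices=bigGPU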