13:00:23 <baoli> #startmeeting PCI Passthrough
13:00:24 <openstack> Meeting started Tue Jan 14 13:00:23 2014 UTC and is due to finish in 60 minutes.  The chair is baoli. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:27 <openstack> The meeting name has been set to 'pci_passthrough'
13:01:32 <ijw_> peeps
13:02:15 <irenab_> hi
13:02:42 <baoli> Hi everyone
13:02:51 <itzikb1> hi
13:03:18 <baoli> Is the proposed agenda ok?
13:03:33 <heyongli> hi
13:03:39 <irenab_> baoli:yes
13:03:52 <baoli> So let's start with pci flavor
13:03:56 <baoli> #topic pci flavor
13:04:58 <yjiang5> baoli: I have a question on PCI flavor. It seems you and ijw care a lot about PCI flavors overlapping; what's your concern?
13:04:59 <baoli> it's similar to pci alias, but can be created with an API, and updated with an API to set keys. Keys: vendor_id, product_id, bdf, host_id.
13:05:32 <ijw_> yjiang5: there's that, and the fact that you can't optimise communications with the compute nodes if flavor is essentially arbitrary; those would be the two points.
13:05:36 <heyongli> suggest no host_id, keep it global
13:05:56 <baoli> Yongli, that's good
13:06:08 <baoli> Concerns:
13:06:24 <ijw_> Also, given baoli's definition of it just there all identical devices would end up in the same flavor which wouldn't be acceptable, but I think that's a more limited description than we discussed previously.
13:07:10 <baoli> with overlapping, how do you do counts/stats based scheduling
13:07:20 <ijw_> If you can annotate devices (preferably on the compute node as you add them) I think that problem goes away
13:07:53 <yjiang5> baoli,  pci_stats served for the scheduling.
13:08:04 <heyongli> stats can be implemented by longest match: if a device has a,b,c,d it will only be in the stats (a,b,c,d) and not in (a,b)
13:08:15 <ijw_> baoli: the proposal I had is you pre-group devices into groups where the devices are user-indistinguishable so that you get stats, then your flavors are based on the stat groups.  And you do that by grouping on unique combinations of a set of attrs and only allowing flavors to filter on those attrs.
13:08:34 <ijw_> heyongli: that implies an ordering of stats
13:08:43 <ijw_> Of attributes rather
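[Editor's sketch] The pre-grouping ijw describes at 13:08:15 can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the actual pci_stats implementation in Nova: the names GROUPING_ATTRS, build_pools and flavor_capacity are hypothetical, and the device dictionaries are made-up sample data.

```python
from collections import Counter

# Hypothetical: the attributes chosen (e.g. in config) for grouping devices.
GROUPING_ATTRS = ("vendor_id", "product_id")


def build_pools(devices, attrs=GROUPING_ATTRS):
    """Count devices per unique combination of the grouping attributes."""
    return Counter(tuple(dev.get(a) for a in attrs) for dev in devices)


def flavor_capacity(pools, flavor, attrs=GROUPING_ATTRS):
    """Sum the counts of every pool whose attributes satisfy the flavor.

    A flavor maps an attribute name to the set of accepted values.
    Filtering on an attribute that was not used for grouping is rejected,
    since the pools cannot answer that question.
    """
    unknown = set(flavor) - set(attrs)
    if unknown:
        raise ValueError(f"flavor filters on non-grouping attrs: {sorted(unknown)}")
    total = 0
    for key, count in pools.items():
        pool = dict(zip(attrs, key))
        if all(pool[a] in accepted for a, accepted in flavor.items()):
            total += count
    return total


# Made-up sample devices as a compute node might report them.
devices = [
    {"vendor_id": "8086", "product_id": "10fb", "address": "0000:01:00.0"},
    {"vendor_id": "8086", "product_id": "10fb", "address": "0000:01:00.1"},
    {"vendor_id": "15b3", "product_id": "1004", "address": "0000:02:00.0"},
]
pools = build_pools(devices)
intel_nic = {"vendor_id": {"8086"}, "product_id": {"10fb"}}
print(flavor_capacity(pools, intel_nic))
```

The point of the sketch is that the scheduler only ever sees per-pool counts, so a flavor restricted to the grouping attributes can be evaluated without looking at individual devices.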
13:09:29 <heyongli> ijw: this will eliminate one device being in many pci stats and keep the scheduler working correctly
13:09:41 <ijw> Which 'this' are you referring to
13:09:45 <heyongli> and make pci flavor what it should be
13:10:12 <heyongli> my suggested algorithm
13:10:13 <baoli> yjiang5, pci_stats has keys, right? The keys correspond to the keys assigned to a flavor, right?
13:10:27 <yjiang5> ijw: yes, exactly.
13:10:30 <baoli> Let's have one conversation at a time
13:11:11 <ijw> OK, so we're starting with flavors based on only the 4 stats you listed and vague matching.
13:11:52 <baoli> So, yes, the scheduler needs to be smart enough to handle it.
13:11:53 <heyongli> ijw: if you say based on 4 stats, what does this mean?
13:12:11 <ijw> baoli listed vendor product bdf host
13:12:23 <baoli> Another concern is: pci slot (or bdf) info is compute node specific, isn't it?
13:12:44 <irenab_> ijw: why host is part of the flavor?
13:12:58 <heyongli> yeah, seems bdf should be in the global flavor
13:13:04 <heyongli> should not
13:13:06 <ijw> Even if it isn't the same problems come up
13:13:14 <ijw> heyongli's system won't work because he needs to order the stats for longest match and therefore you can't do an efficient match when the first stat is vague.  And you can't distinguish between identical devices which you want to be in different flavors
13:13:38 <baoli> Yongli, then you can only include vendor_id and product_id in the pci flavor, am I right?
13:14:00 <heyongli> might add extra info into flavor
13:14:15 <baoli> what are those extra info?
13:14:39 <irenab_> heyongli: will this extra info be used for scheduling?
13:14:40 <heyongli> that's what neutron wants,
13:14:55 <yjiang5> ijw: seems the pci flavor baoli is talking about is different from our discussion on the mailing list.
13:15:03 <ijw> Yes
13:15:15 <ijw> I think if the extra info is just arbitrary this works better
13:15:25 <irenab_> can we start with a definition of pci flavor and then discuss it?
13:15:39 <heyongli> irenab_ , +1
13:16:02 <baoli> Sounds like we have multiple definitions of PCI flavor
13:16:14 <baoli> Let's start with yjiang5's and ijw's
13:16:21 <yjiang5> irenab_: ok
13:16:22 <ijw> The way I would define it is: it's user facing, it can be changed by API, and two flavors can both include the same device
13:16:41 <irenab_> it is global ?
13:16:43 <baoli> ijw, we want something concrete
13:16:51 <ijw> (versus group, which, when we were discussing it, was not changeable by API and one device, one group)
13:17:01 <heyongli> ijw, fine, what exactly defines it?
13:17:44 <baoli> ijw, something very concrete
13:17:45 <ijw> I would expect to define it by matching attributes of the PCI device - and that might include configuration-introduced attributes, not just the ones baoli listed - with a vague match (that is, more than one value would hit the match).
13:18:17 <heyongli> give an example of vague math
13:18:28 <heyongli> s/math/match
13:18:38 <ijw> For instance, vendor = 0x8086|0x8087
13:19:10 <heyongli> we can do this by introducing a simple regular expression, so it's ok
13:19:11 <irenab_> ijw: should it represent the provider network connectivity concern?
13:19:14 <ijw> Or vendor, device = (a, b) or (c, d)
13:19:31 <ijw> irenab_: I think so.  Also multiple GPUs that are OK
13:19:31 <baoli> First of all, you need to define what attributes can be used to classify the devices
13:20:06 <heyongli> baoli: i think everything except host-local attributes is fine for this
13:20:07 <yjiang5> ijw: I'd define vague match as: a device can be matched by multiple flavors.
13:20:11 <ijw> baoli: as I said - ones which are concrete properties of the device (vendor, device) and ones which can be config-supplied and are arbitrary (e.g. connection)
13:20:43 <irenab_> how config-supplied will be managed by scheduler?
13:20:59 <ijw> yjiang5: I would define that as 'flavors can overlap' because it's not the same thing - I can have a flavor that matches multiple device types and still insist that flavors don't overlap
13:21:13 <yjiang5> ijw: ok :)
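[Editor's sketch] The distinction ijw draws between a "vague match" (one flavor accepting several values, as in vendor = 0x8086|0x8087 or (vendor, device) = (a, b) or (c, d)) and "overlapping flavors" (two flavors matching the same device) could be expressed like this. It is an illustrative sketch only: representing a flavor as a list of alternative attribute dictionaries is an assumption made here, and matches and overlapping are hypothetical helper names, not an existing Nova API.

```python
def matches(flavor, device):
    """True if the device satisfies at least one of the flavor's alternatives.

    Each alternative is a dict of attribute -> required value; listing
    several alternatives is what makes the match "vague".
    """
    return any(
        all(device.get(key) == value for key, value in alt.items())
        for alt in flavor
    )


def overlapping(flavor_a, flavor_b, devices):
    """Return the devices that both flavors would match."""
    return [d for d in devices if matches(flavor_a, d) and matches(flavor_b, d)]


# Made-up sample devices.
devices = [
    {"vendor_id": "8086", "product_id": "10fb"},
    {"vendor_id": "8087", "product_id": "0a2a"},
    {"vendor_id": "15b3", "product_id": "1004"},
]

# Vague match: vendor = 0x8086|0x8087.
vague = [{"vendor_id": "8086"}, {"vendor_id": "8087"}]
# A second flavor that is vague-free and disjoint from the first.
mellanox = [{"vendor_id": "15b3"}]
# A third flavor that overlaps the first on one device.
intel_10fb = [{"vendor_id": "8086", "product_id": "10fb"}]

print(overlapping(vague, mellanox, devices))    # vague, yet no overlap
print(overlapping(vague, intel_10fb, devices))  # these two flavors overlap
```

A flavor can therefore be vague without any two flavors overlapping, which is the point made at 13:20:59.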
13:21:37 <heyongli> so flavor to device is m to m, that's fine
13:21:48 <ijw> irenab_: the proposal on the mailing list suggests that we have a configuration item that lists the properties that can be used in a flavor.  (Or an API item, I'm not choosy, but it would be difficult to change it because you'd need to check all the old flavors were still valid)
13:22:22 <baoli> ijw, any use case for what you have talked about?
13:22:26 <ijw> Yes.
13:22:30 <baoli> Why do you need to overlap?
13:23:14 <ijw> I've added that because people have talked about overlapping in the past, to be honest.  I'm not so certain an overlap is essential.  But you would find it hard to redefine flavors if you couldn't overlap them, I think
13:23:42 <heyongli> ijw; good point
13:23:45 <yjiang5> irenab_: do you have more questions on ijw's definition?
13:23:46 <ijw> For instance, I've created a machine that needs device type X, and now I change the flavor to X|Y - but the running machine, if I migrate it, still needs X
13:24:25 <baoli> that's why we make it autonomous
13:24:32 <baoli> And it's a simple solution
13:24:43 <ijw> autonomous meaning what?
13:25:20 <irenab_> ijw: so pci flavor is a new resource (object) that can be managed via APIs, no config?
13:25:31 <baoli> ijw, autonomy means the host can elect to join which flavor, rather than defined by some criteria that doesn't meet future requirements
13:25:41 <heyongli> alias should be extended to pci flavor in the config file
13:25:48 <ijw> irenab_: yes.  That was very clearly a popular use case.  John had a good argument, that as you improve your cloud you may want to change your resource types.
13:26:26 <irenab_> heyongli: so now we have alias, pci flavor and white-list?
13:26:36 <heyongli> irenab_: no
13:26:41 <ijw> baoli: that implies that every identical device in a given host group is interchangeable, but that's not necessarily true.  I think we have to accept that if you have two devices that discover the same on the PCI bus they may have different things attached to them and therefore belong in different flavors.
13:26:51 <heyongli> alias should maybe be phased out
13:26:56 <ijw> That's not just network, it applies to storage, for instance, or peripherals
13:27:33 <irenab_> heyongli: so pci flavor and white list?
13:27:48 <heyongli> irenab_: i think so
13:28:00 <baoli> ijw, I am lost. that's why the host can say that a device with the same bdf can belong to a different flavor
13:28:14 <ijw> baoli: how does it say that?
13:28:20 <ijw> You've skipped that bit
13:28:30 <heyongli> irenab_: and might be a pci_infomation config to add info to device, let's talk it later
13:28:43 <baoli> ijw, in the case of pci group, we define that in the pci passthrough list, right?
13:29:10 <heyongli> baoli: what is the pci passthrough list?
13:29:20 <ijw> Ok, and if we go back to that we lose the flexibility of being able to redefine flavors because the host defines them (and host state is hard if not impossible to control via API)
13:29:51 <baoli> ijw, why do you need to change the host once it's deployed?
13:30:20 <baoli> heyongli, it's the whitelist
13:30:36 <ijw> John's use case was that you wanted to change the GPUs you were offering to customers, maybe to add a new device type because the new GPU in new machines was as powerful as the old one, maybe to remove a device type because that GPU type had a bug
13:31:09 <ijw> The add is not so bad, providing you know you're going to do it when you set the machine up, but the remove is hard to do if your state is on the hosts.
13:31:19 <ijw> The add is hard if it's deferred.
13:31:26 <baoli> ijw, we can create PCI groups through APIs, then it's not a problem to do so.
13:31:29 <irenab_> guys, just to validate and understand better what is going on with all the definitions, can we cover what admin should define in few use cases?
13:31:53 <ijw> But if you create PCI group via APIs you then have to propagate that PCI group to all the hosts when it changes
13:32:07 <ijw> The idea I had was to try and split the problem down the middle.
13:32:10 <baoli> Well, no.
13:32:37 <ijw> Some of the configuration is static, to simplify the scheduling problem.  Some is dynamic, because you want to be able to modify flavors after the event.
13:32:39 <heyongli> ijw: good example to demonstrate that the group is not good
13:32:44 <baoli> When you remove a group, you invalidate all the hosts that use the group, effectively not scheduling any instance on them
13:32:57 <ijw> Potentially, that's all the hosts.
13:33:21 <ijw> Though I see where you're going with this and it does work
13:33:27 <ijw> It's more complex but more flexible
13:34:21 <irenab_> simple networking use case, need cisco vNIC, have machines with single cisco NIC. what should be predefined by admin?
13:34:30 <baoli> ijw, I would suggest that you wrote down the proposal in very much detail, including use cases. Then we can further discuss it.
13:34:46 <ijw> I did, how many times would you like me to write it down again?
13:35:19 <baoli> I'd like to see the APIs, the attributes that can be included, the exact use cases that it would cover.
13:36:06 <baoli> If you check the Jan. 8th agenda, that's something we want
13:36:22 <irenab_> ijw: can we go over the items on today's agenda? I cannot connect between proposed nova boot api and pci flavor/group.
13:36:23 <baoli> So that we know what exactly we are proposing, and the pros and cons of it
13:36:33 <ijw> I have not defined the APIs (which are merely of flavor specification and I've made it clear what is acceptable in a flavor specification), the attributes are explicitly not specified because any attribute can be used providing it's declared in config in advance.  I grant you I could list the use cases.
13:37:22 <baoli> Ok, can we continue to talk about Yongli's PCI flavor, then?
13:38:00 <heyongli> baoli: my flavor is ijw's flavor
13:38:22 <baoli> It's the same as you have put down in your wiki?
13:38:32 <heyongli> baoli: no
13:38:33 <yjiang5> irenab_: do you want the use case on the meeting or after the meeting?
13:38:47 <irenab_> if possible, now
13:38:49 <heyongli> baoli , my wiki page is kind of old,
13:39:15 <irenab_> yjiang5:
13:39:17 <baoli> yongli, all right. then let's refine the PCI flavor
13:39:23 <irenab_> just to see we are on the same page
13:39:26 <yjiang5> ijw: so will you give an example, or do I need to do that?
13:39:49 <yjiang5> ijw: I'd prefer you, you describe things very clearly :)
13:40:04 <ijw> Sorry, lost track.  Example of what, and where, on here or in mail or in a document?
13:40:33 <yjiang5> (5:31:29 AM) irenab_: guys, just to validate and understand better what is going on with all the definitions, can we cover what admin should define in few use cases?
13:40:33 <yjiang5> (5:34:21 AM) irenab_: simple networking use case, need cisco vNIC, have machines with single cisco NIC. what should be predefined by admin?
13:40:44 <ijw> Ah, sorry
13:41:35 <ijw> Yes, so in my case if it's that simple then I think we'd configure that the two attributes we cared about were device and vendor.  We could whitelist the Cisco NICs on the backend and not add any extra_info to them.  The flavor would match a specific device and vendor ID and nothing else.
13:41:47 <ijw> The flavor would be created with pci-flavor-create ...
13:42:08 <ijw> (or however heyongli defined it, I'm not going to argue with his API definition)
13:42:44 <irenab_> ijw: and when calling nova boot, what will be the command?
13:42:55 <ijw> Depends.
13:43:04 <heyongli> ijw: any way it's GPUD
13:43:17 <ijw> If you're using the NIC as a PCI passthrough device without nova, (like a GPU would be) then:
13:43:29 <irenab_> ijw: SRIOV
13:43:42 <ijw> nova flavor-create with extra specs specifying cisco-nic:1 - then nova boot with the flavor
13:43:45 <ijw> If it's Neutron then
13:44:09 <ijw> nova boot --nic net-id=xxx,vnic-mode=passthrough,pci-flavor=cisco-nic ...
13:44:21 <baoli> ijw, this is how it works today
13:44:31 <ijw> Without the flavor-create
13:44:36 <ijw> And the ability to change flavors
13:44:45 <baoli> that's right, but it's the same functionality
13:45:00 <irenab_> ijw: so it is not according to what is stated in today's agenda, right?
13:45:13 <ijw> I didn't write today's agenda, I wrote on the mailing list
13:45:42 <irenab_> ijw: I thought the agenda represents the points to go with...
13:45:56 <ijw> It's largely the same as we were planning.  What the proposal does is try and avoid the scheduler being a nightmare because we can still use the pci_stats table to group PCI devices by attribute, reducing the amount of information the scheduler has to consider.
13:46:14 <ijw> I was assuming it represented what Robert wanted to discuss
13:46:18 <baoli> ijw, what if on a machine, you want a PCI device with the same vendor id and product id to be excluded from the flavor?
13:46:40 <irenab_> baoli: I think it won't be on the white-list
13:46:46 <ijw> baoli: two options
13:46:54 <ijw> we already have a whitelist that would let us exclude it
13:47:25 <ijw> Alternatively, you can tag up your devices when you whitelist them with extra_specs, broadly similar to what heyongli's patch does now (though I don't like his config format much)
13:47:26 <baoli> ok
13:47:51 <ijw> Then you say that that extra_info should be included as a parameter you can use in flavors in the config item, and you can use it as part of the matching expression when you flavor-create
13:48:08 <heyongli> ijw: my patch should be cleaned up
13:48:22 <irenab_> how all this is related to host aggregates?
13:48:43 <ijw> It isn't, I wasn't going anywhere near host aggregates - honestly, I think that's misusing host aggregates
13:48:44 <irenab_> it is on today's agenda
13:48:49 <baoli> ijw, the extra_specs can't be arbitrary, right? it has to be defined
13:48:56 <ijw> I would recommend that we don't use host aggregates, in fact
13:49:05 <ijw> There's no reason it can't be a dict
13:49:33 <baoli> ijw, what if on a host, I want a device with the same vendor id and product id to be classified into a different flavor
13:49:33 <yjiang5> ijw: +1 for host aggregate
13:49:52 <ijw> baoli: give me a reason why you would do that
13:50:06 <ijw> There are certainly other options than host aggregates.
13:50:08 <baoli> ijw, I agree that we shouldn't add host aggregate in the discussion
13:50:37 <irenab_> baoli: please remove it from the list on wiki, it is confusing
13:50:38 <baoli> ijw, I thought that we have discussed it over and over again
13:50:52 <baoli> irenab, which one?
13:50:56 <heyongli> baoli: if you really want to , add a tag info for that, in ijw's proposal
13:51:10 <ijw> baoli: No, I accept that you would want to do it, but you haven't given a use case there, you've given a requirement.  What drives the requirement?
13:51:31 <ijw> heyongli: absolutely, and I'm trying to understand if baoli's thinking is compatible with that approach
13:51:58 <baoli> I could have a host with multiple physical connections
13:51:58 <ijw> So, baoli - why would you want to do that - what is it that would make you choose to do that, precisely?  An example case would really help here
13:52:11 <irenab_> baoli: on new proposal for admin api: "configured using host aggregates AP" on meeting wiki
13:52:30 <baoli> irenab, I didn't add that
13:52:40 <ijw> Yep - and that's a case for tagging, definitely, because I can't choose between those connections by host aggregate
13:52:42 <baoli> Everyone is trying to update that wiki
13:53:10 <baoli> ijw, please write down how exactly that's done
13:53:13 <ijw> Alternatively, you're saying that you have some hosts with more NICs than others, or some with NICs connected?
13:53:37 <baoli> Right, I could have multiple NICs that are connected to different subsystems
13:53:48 <irenab_> baoli: +1
13:54:00 <baoli> That's why we want something such as PCI groups.
13:54:11 <baoli> We want to use them as a tool to design the cloud
13:54:13 <ijw> Yes, if they're differently connected I don't see how host aggregates help at all, because the NICs are in the same host aggregate and you can't choose between them.
13:54:42 <heyongli> pci_infomation =  match: {vendor,} , extra: {connection=xxx}
13:54:45 <ijw> Can I suggest we write *one* document with *absolutely no* implementation details in that lists *just* the use cases that we can then check against our solutions?
13:54:55 <baoli> ijw, host aggregate can be used on top of flavor or group, I have no issue with that
13:54:56 <irenab_> ijw: actually same host can be in many host aggregates
13:55:20 <ijw> irenab_: yeah, but the two NICs are on the same host so they're in all its host aggregates
13:55:33 <heyongli> host aggregate might be another thing, we can skip this for later discussion, i think
13:55:45 <yjiang5> irenab_: so you want host aggregate, or you simply want resolve baoli's case, i.e. different connection for nic on same host?
13:56:14 <ijw> So yes - standalone use case document, please - *why* you're trying to do this makes a lot of difference - you might be using a host aggregate here to mark hosts with different hardware or you might be using them for any number of reasons.
13:56:14 <irenab_> yjiang5: just to resolve baoli's case, I didn't add host aggregate either
13:56:41 <yjiang5> irenab_: I think ijw has answer to baoli's question :)
13:56:52 <ijw> If it's hardware or connectivity I would go back to arguing that config is more appropriate than host aggregate, because the hardware and connectivity can't be changed by someone on the outside of the API
13:57:14 <ijw> can't easily be changed, I should say
13:57:17 <irenab_> ijw: agree
13:57:23 <yjiang5> ijw: I don't think anyone here wants host aggregate, john is on vacation :)
13:57:26 <heyongli> i think what you want is this: define connection: pci_infomation =  match: {vendor, bdf , } , extra: {connection=xxx}  and select what you want in flavor
13:57:45 <ijw> OK - apparently you're all playing devil's advocate then, but hopefully I've given you good reasons
13:58:13 <baoli> yongli, I thought that you were saying that bdf shouldn't be included in the flavor because it's host specific stuff
13:58:15 <ijw> And I think we should add the chosen list of attrs at the outset before we end up with an impossible scheduler
13:58:26 <heyongli> baoli: this is not flavor
13:58:51 <baoli> ijw, how do you manage tags? You need another set of APIs to do that, don't you?
13:59:00 <irenab_> we have 2 minutes left
13:59:10 <heyongli> baoli: what i wrote is the tag, i think
13:59:18 <yjiang5> irenab_: for baoli's use case, the solution is: when we define the white_list config (or any other name), we add the connection information, and that connection information becomes a pci_device attribute. The flavor can then be created with that connection information as a key to select devices.
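[Editor's sketch] The tagging approach yjiang5 summarises here, following heyongli's "match: {...}, extra: {connection=xxx}" shape, might look like the following. This is a hedged illustration, not the real whitelist parser: tag_devices and select are hypothetical names, the entry layout is an assumption based on the lines above, and "physnet1"/"physnet2" are made-up connection labels.

```python
def tag_devices(devices, whitelist):
    """Keep only whitelisted devices and copy each entry's extra info onto them.

    The first whitelist entry whose "match" dict fits the device wins;
    devices that fit no entry are not exposed at all.
    """
    tagged = []
    for dev in devices:
        for entry in whitelist:
            if all(dev.get(k) == v for k, v in entry["match"].items()):
                tagged.append({**dev, **entry.get("extra", {})})
                break
    return tagged


def select(devices, flavor):
    """Return the devices whose attributes equal every key in the flavor."""
    return [d for d in devices if all(d.get(k) == v for k, v in flavor.items())]


# Two identical NICs on one host, wired to different networks, plus one
# device that is not whitelisted (all sample data).
devices = [
    {"vendor_id": "8086", "product_id": "10fb", "address": "0000:01:00.0"},
    {"vendor_id": "8086", "product_id": "10fb", "address": "0000:02:00.0"},
    {"vendor_id": "10de", "product_id": "0ff2", "address": "0000:03:00.0"},
]
whitelist = [
    {"match": {"vendor_id": "8086", "address": "0000:01:00.0"},
     "extra": {"connection": "physnet1"}},
    {"match": {"vendor_id": "8086", "address": "0000:02:00.0"},
     "extra": {"connection": "physnet2"}},
]

tagged = tag_devices(devices, whitelist)
flavor = {"vendor_id": "8086", "connection": "physnet1"}
print(select(tagged, flavor))
```

The host-specific address appears only in the per-host whitelist, while the flavor itself refers to the host-independent connection tag, which is consistent with keeping bdf out of the global flavor.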
13:59:27 <baoli> yongli, understood
13:59:36 <ijw> No, per what heyongli's documented the values are in the information config, and for now at least (and until it proves necessary to change it) I would have the available list in config on the server
13:59:48 <ijw> We should be able to hit 99% of use cases with that
14:00:13 <heyongli> ijw: i'm not talking about which key should be used, just how to add an available tag
14:00:20 <ijw> yjiang5: I see it as a useful place to put arbitrary other information that irenab_'s plugin might want too
14:00:26 <baoli> Ok, i would suggest this given that the time is running out. Write down the exact APIs/concepts for your PCI flavor and we continue tomorrow
14:00:34 <yjiang5> ijw: yes
14:00:41 <baoli> An example would be the Jan. 8th agenda on the wiki
14:00:41 <ijw> baoli - use cases.  Please.
14:00:48 <irenab_> baoli: fine with me
14:00:57 <ijw> Start a document.  We'll add stuff.  We can check it tomorrow.
14:01:08 <baoli> #endmeeting