13:00:01 <baoli> #startmeeting PCI Passthrough
13:00:01 <openstack> Meeting started Wed Jan 15 13:00:01 2014 UTC and is due to finish in 60 minutes.  The chair is baoli. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:02 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:04 <openstack> The meeting name has been set to 'pci_passthrough'
13:00:20 <baoli> Hi
13:00:29 <ijw> o/
13:00:31 <heyongli> hi
13:00:46 <irenab> hi
13:01:12 <ijw> https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU is the very late proposal from yesterday, plus a list of use cases at the bottom based on what we were talking about.  I've not tried to justify the proposal against the use cases in there.
13:01:53 <baoli> Let's continue with yesterday's use case, then.
13:02:54 <baoli> a couple of minutes to look at your proposal
13:03:38 <ijw> It's supposed to be what I was describing from yesterday, though it wouldn't kill me to add a diagram...
13:03:54 <ijw> I think the questions to answer are:
13:04:05 <ijw> - do we have more use cases than I've described (or are there some to go away)
13:04:14 <ijw> - is the proposal going to satisfy them
13:04:30 <ijw> - is there something else we could do that would be simpler / do better?
13:05:23 <baoli> ijw, not much time to digest it. But I see it's very complicated.
13:05:54 <baoli> First of all, about the attribute.
13:06:05 <irenab> may I raise concern not related to the document?
13:06:29 <ijw> Go ahead
13:07:02 <irenab> We have about a 2-month window to push something to icehouse and get it accepted. What are we going to do?
13:07:34 <irenab> Do we want full list of use cases with all apis or we can define basic cases and have e2e flow?
13:08:12 <irenab> My feeling is we discuss a lot, but progress is very slow ...
13:08:15 <ijw> The proposal on the table basically uses code we almost have ready - heyongli has a patch out there that's about 90% of the backend work.  This can be implemented in stages, so if we can't write a flavor API in that time we would write a config item to do the same job temporarily.  The question is whether we're going to accept it or argue about it again
13:08:48 <heyongli> ijw, agree.
13:08:58 <baoli> irena, I agree with you
13:09:00 <ijw> baoli has the code for the --nic changes we need, you're on the PCI backend, the whole thing is pretty close if we'd just sit down and write it rather than debating
13:09:09 <ijw> s/PCI/NIC/
13:09:34 <yjiang51> irenab: in nova side, we have to wait for nova core to approve BPs and code, so  on nova side, it's important to get john's ack.
13:09:37 <baoli> ijw, what are the use cases that can't be taken care of by pci group but can be by pci flavor?
13:10:31 <irenab> so there is also baoli's code that require blueprint approval
13:10:53 <yjiang51> baoli: or, vice versa, any use case that can't be taken care of by pci flavor but can be by pci group?
13:10:56 <irenab> and neutron work that I need translate to blueprints
13:11:12 <baoli> Can you answer my question first?
13:11:26 <ijw> You can't easily do the online modifications - some of it is making things easier - 6-a-2 is a bugger in PCI group - some is because administrators might well have reasons to change the content of a flavor as in 6-b, 6-c
13:11:47 <heyongli> irenab, i remember john said he'd like to review and approve if we get the work split done
13:11:53 <ijw> PCI group is mainly much simpler to implement.  *Much* simpler.  But a lot of flexibility goes away with it
13:12:08 <baoli> How do you justify the flexibility
13:12:16 <baoli> How do you define attributes
13:12:23 <ijw> People asked for it.  John and Alan K from Ericsson
13:12:33 <ijw> Attribute definition is in there.
13:13:07 <ijw> In the backend arbitrary attributes are allowed for extra_info, there's no restriction.  In the scheduler there's a list of available attrs.
13:13:18 <baoli> ijw, I don't see any difference between the pci flavor and the pci group
13:13:22 <baoli> if you do that
13:13:23 <irenab> heyongli: it's good, we also need neutron blueprints approved by Mark
13:13:39 <ijw> baoli: if you do what?
13:13:50 <baoli> If you define the extra attributes
13:14:17 <baoli> Do we agree that PCI address and host can't be used in the pci flavor?
13:14:35 <irenab> baoli: agree
13:14:38 <ijw> The distinction is there's a mapping from combinations of attribute (which, absolutely, you can select to be 'pci_group' and you get the groups proposal, that's intentional) to flavor.  That extra level of mapping gives you the requested API flexibility.
13:15:01 <ijw> baoli: I'm happy to accept that
13:15:02 <baoli> I think that we've talked about that you can add "meta-data" to a pci group down the road if needed
13:15:02 <heyongli> baoli, pci flavor can contain your group as an extra attr, but it doesn't push the group into being the dominant factor of the design
13:15:13 <baoli> the meta-data is the attribute
13:15:38 <ijw> baoli: The difference between this and group is tiny in that respect but still significant.
13:16:10 <baoli> Are you saying that a pci group is a special attribute?
13:16:26 <ijw> In group, you put all devices into a group, presumably on the backend, and add attributes to the group.  In this, you put attributes on the device and then group by the attribute (which conveniently can reuse the matching code we already have)
13:16:55 <ijw> That gets you two things.  Firstly, if you're of a mind like yjiang5 or heyongli, you can still define your flavors in terms of device and vendor ID and it will work
13:17:14 <ijw> Second, you can put attributes on individual devices which don't apply to every device in the group, which we need
13:18:01 <irenab> ijw: I think group is more compute specific definition (only name is global) and flavor is global, correct?
13:18:06 <ijw> No extra_info attribute is special, I'm saying, but if you want to call one pci_group and use it to group your devices that is entirely your choice, and you get pretty much the grouping behaviour you're looking for (while not denying other people the behaviour they'd like)
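A minimal sketch of the distinction ijw draws here, assuming hypothetical device dicts and flavor match specs (the names and shapes are illustrative, not the actual Nova API): devices carry attributes, a flavor is a list of match expressions, and 'group' behaviour falls out of matching on an attribute that happens to be called pci_group.

```python
# Hypothetical sketch: a device is a dict of attributes (vendor_id,
# device_id, plus free-form extra_info keys); a flavor is a list of
# match specs, each a dict of required key/value pairs.
def device_matches(device, spec):
    """True if every key/value in the spec matches the device."""
    return all(device.get(k) == v for k, v in spec.items())

def devices_for_flavor(devices, flavor_specs):
    """A device satisfies the flavor if it matches any one spec."""
    return [d for d in devices
            if any(device_matches(d, spec) for spec in flavor_specs)]

devices = [
    {"vendor_id": "8086", "device_id": "10fb", "pci_group": "fastnic"},
    {"vendor_id": "1137", "device_id": "0071", "pci_group": "fastnic"},
    {"vendor_id": "8086", "device_id": "1520", "pci_group": "slownic"},
]

# 'Group' behaviour is just matching on an attribute named pci_group ...
by_group = devices_for_flavor(devices, [{"pci_group": "fastnic"}])
# ... and device/vendor-ID flavors reuse exactly the same matching code.
by_vendor = devices_for_flavor(devices, [{"vendor_id": "8086"}])
```

The point being debated: the same matcher serves both styles, so neither grouping nor ID-based flavors needs special-case code.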
13:18:14 <baoli> ijw, do you still need to add the attributes in the pci whitelist that is defined on the compute nodes?
13:18:20 <ijw> yup
13:18:29 <baoli> in that case, how could it be flexible?
13:18:45 <ijw> baoli: yup.  irenab: I think group was always driven from the compute node, it doesn't really work well done at the frontend
13:19:05 <baoli> you can modify your flavor whatever way you want, but do you have to modify your compute node?
13:19:19 <baoli> I see complexity, but not flexibility
13:19:29 <ijw> It's inflexible in that you're configuring values there (though if you really want to change them you change them and restart, and as I said yesterday those values are largely dependent on the physical configuration and situation of the server so config's a good place for them)
13:19:48 <baoli> And what benefits we get from it?
13:19:50 <ijw> It's flexible in that you get to interpret those values in light of how you define the flavors, which means that you can change your mind with an API call later
13:20:22 <baoli> Can you give an example on that?
13:20:33 <ijw> You are defining the wiring of the host at the backend, where it belongs, and the features offered to the user in the front end.  I would say that's why it works well
13:20:48 <baoli> give an example
13:20:56 <baoli> not something in general
13:21:06 <ijw> So if I want to say 'this PCI device is attached to switch X port 4' then I would do that on the compute node, because that's very situational and nothing will apply globally to all compute nodes.
13:21:42 <ijw> If I want to say 'Cisco and Intel NICs are sold as the flavor 'fastnic'' then I do that at the front end, because that's a user facing decision and one I might want to re-evaluate in the future
13:22:23 <irenab> ijw: if you need to specify compute node network interface, like neutron (eth2), not phy switch?
13:23:05 <ijw> If I'm running a small cloud and the flavor functionality's too much bother then I can define a flavor as 'flavor X is all devices in group X' and label up my devices at the backend.  If I don't like grouping at all, I can say 'flavor X is all Intel NICs (by device ID)' and that works too.  And your specific case of 'all SRIOV NICs should be used for passthrough' works as well, bonus
13:23:24 <baoli> group?
13:23:29 <ijw> irenab: if we want to do that we're going to have to add the network interface as a device property.
13:23:51 <ijw> You can't do it now, and that's basically why.  That information's pretty easy to discover though so that wouldn't be a huge change.
13:23:59 <baoli> group X
13:24:02 <baoli> group X ?
13:24:03 <irenab> ijw: not sure, this is the network interface of PF
13:24:30 <baoli> ijw, you can do it with PCI group as well
13:24:35 <ijw> irenab: I think it's the same - either we autodiscover it and it's a device property or we have to manually find it and it's extra_info
13:24:54 <ijw> baoli: yes, indeed - my point is that this does precisely what groups do and a number of other things besides
13:25:08 <baoli> a number of other things besides?
13:25:09 <ijw> 'group' here would be an extra_info property that I just happened to call 'group'
13:25:12 <irenab> ijw: you need it to choose the device that is child of this PF
13:25:22 <ijw> irenab: we'll have to add that to the pci_device then.
13:25:30 <ijw> Different problem but not too hard to solve
13:25:32 <yjiang51> irenab: for SR-IOV device, we record the corresponding PF even in current code already
13:25:37 <ijw> Ah, ok
13:25:50 <yjiang51> irenab: as a PCI device property.
13:26:17 <heyongli> irenab: define PF as an attr in the flavor in ijw's proposal and you get what you want
13:26:21 <irenab> yjiang51: it's fine, but I think we need some correlation with it in the pci flavor
13:26:22 <baoli> So you are saying that we need a special attribute, something called say sriov_group?
13:26:26 <ijw> baoli: As I said before, and as the document says, groups can offer you backend configuration (if you set them up in config) or front-end flexibility (if you have an API to set them up) but they can't offer you both at the same time.
13:26:45 <ijw> irenab: how'd you mean?
13:26:49 <irenab> heyongli: cannot, since PF can differ between compute nodes
13:27:36 <heyongli> irenab: then you can add an attr 'connectivity' and define it differently on every host, based on its PF
13:27:38 <yjiang51> irenab: so you mean you want to select SR-IOV device based on the attribute of the corresponding PF device's atrribute?
13:27:39 <irenab> ijw: I want to specify a flavor that suits the connectivity case
13:27:39 <ijw> irenab: that's where you'd use the backend tagging, I think: match PF = whatever, add an attr (not quite sure what you have in mind for there though)
13:28:15 <ijw> match PF = whatever, add provider_network='outside' or something
13:28:29 <baoli> guys, are we trying to define something so complicated when you have a simple solution to solve all the practical cases we have right now?
13:28:30 <heyongli> ijw: +1
13:28:46 <irenab> ijw: as an analogy, for the neutron agent you specify where the provider net is going outside (i.e. eth2)
13:28:52 <heyongli> baoli: this is not complicated for me to implement it
13:28:52 <ijw> baoli: because we don't.  Whatever you think groups are they are either defined in backend config or via a frontend API
13:29:12 <baoli> you need all the APIs to support attributes, wouldn't you?
13:29:15 <ijw> baoli: and yes, I don't think this *is* complicated, it's designed to build largely on what we have
13:29:31 <ijw> baoli: no, you wouldn't - there's no API there that takes a bare attribute
13:29:55 <ijw> Flavors take a matching expression, PCI flavors in flavors and NICs don't use attributes at all, only flavor names
13:30:03 <baoli> how do you report stats?
13:30:51 <ijw> There's no use case for stats there - if you have one in mind we can add it and then I'll answer that question, because I can think of a number of stats that might be useful.
13:31:05 <irenab> ijw: where do you define the (match PF = whatever, add provider_network='outside')? Flavor?
13:31:08 <ijw> Also, I would question whether stats are a priority for Icehouse
13:31:25 <baoli> ijw, how do you do scheduling?
13:31:28 <yjiang51> irenab: I remember even in neutron, you have to specify the provider net through config, right? Then in nova, you have to specify the provider net by providing that attribute on the compute node, and then create a flavor for it.
13:32:01 <ijw> irenab: I define that in the compute node config (because the compute node's physical connectivity determines the attribute value to use) then I make my flavor up by matching on e.provider_network
13:32:21 <irenab> yjiang51: so need to associate list of devices on compute node with this provider net
13:32:48 <irenab> ijw: can you, maybe later, write down how the configs will look?
13:33:12 <ijw> baoli: same as I described to you before - pci_stats buckets can be created by doing a SELECT COUNT(*) GROUP BY attr, attr, attr - giving you a limited number of buckets - then each PCI flavor corresponds to a number of buckets and you find a set of choices out of those buckets with availability that satisfies your demand
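The bucketing ijw describes can be sketched in a few lines: one pci_stats bucket per distinct combination of the grouping attributes, exactly what a SELECT COUNT(*) ... GROUP BY would produce. This is a rough illustration with hypothetical device dicts, not the actual scheduler code; the attribute names are assumptions.

```python
from collections import Counter

def stats_buckets(devices, group_attrs):
    """Equivalent of SELECT COUNT(*) ... GROUP BY attr, attr:
    count devices per distinct value-combination of group_attrs."""
    return Counter(tuple(d.get(a) for a in group_attrs) for d in devices)

devices = [
    {"vendor_id": "8086", "provider_network": "outside"},
    {"vendor_id": "8086", "provider_network": "outside"},
    {"vendor_id": "1137", "provider_network": None},
]

buckets = stats_buckets(devices, ("vendor_id", "provider_network"))
# Two devices land in the ('8086', 'outside') bucket, one elsewhere,
# so the scheduler sees a small, fixed number of buckets per host.
```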
13:33:40 <ijw> irenab: it's in there in the abstract, but there's no worked example - 'pci_information'
13:33:45 <baoli> ok, so a compute node can use whatever attributes you have in mind to tag an entry?
13:33:49 <ijw> yup
13:34:17 <yjiang51> irenab: just wondering if nova has a mechanism to get that provider network automatically?
13:34:40 <irenab> ijw: I feel I need to take it a little bit more down to earth (example)
13:34:44 <heyongli> as an implementation, stats can be calculated based on flavor properties
13:35:08 <ijw> yjiang51: a provider network here is 'an external network attached to a compute node port' so it's not usually automatically discoverable - it's also a Neutron concept so Nova would find it hard to get hold of, I think
13:35:31 <irenab> yjiang51: not as far as I know, nova gets it at the allocate_network stage
13:35:37 <ijw> heyongli: indeed, which is pretty much the same as 'group' that I was using in the worked examples above
13:35:55 <ijw> irenab: the thing I don't like about that pci_information at the moment is it's ugly, but I think you'd end up with
13:36:16 <heyongli> ijw: yeah, anyway stats and scheduler is fine for this design
13:36:27 <ijw> pci_information=[{pf => '0000:01:01.00'}, { provider_network => 'outside'}]
13:36:48 <heyongli> ijw: +2 for this
13:36:53 <BrianB_> irenab: do you have a simple use case? seems it's a good time to move on and see what an e2e solution will look like.
13:37:52 <irenab> BrianB: the use case is very simple. each compute node is connected to the provider network via a specific network interface (SRIOV NIC)
13:37:52 <ijw> I think that PCI information thing wants a DSL really but for the minute we'll stick with JSON or Python
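A worked version of the pci_information pairs ijw sketches above: each entry couples a match expression (which devices) with extra attributes to attach (how they're tagged). The field names follow his sketch and irenab's provider-network case; none of this is a settled format, just an illustration of how the backend tagging would apply.

```python
# Hypothetical compute-node configuration, following ijw's sketch:
# each entry is a (match expression, extra attributes) pair.
PCI_INFORMATION = [
    ({"pf": "0000:01:01.00"}, {"provider_network": "outside"}),
    ({"vendor_id": "8086", "device_id": "10fb"}, {"group": "fastnic"}),
]

def tag_devices(devices, pci_information):
    """Attach the extra attributes to every device a match expression hits."""
    for device in devices:
        for match, extra in pci_information:
            if all(device.get(k) == v for k, v in match.items()):
                device.update(extra)
    return devices

# An SR-IOV VF whose parent PF is 0000:01:01.00 gets tagged with the
# provider network that PF is physically wired to on this host.
devices = tag_devices(
    [{"pf": "0000:01:01.00", "vendor_id": "8086", "device_id": "1520"}],
    PCI_INFORMATION,
)
```

A flavor defined at the front end would then match on provider_network='outside' without knowing any host's wiring.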
13:38:41 <baoli> ijw, we need to support live migration soon.
13:38:48 <baoli> how do you take care of this?
13:39:30 <heyongli> baoli: live migration or migration ?
13:39:36 <baoli> live migration
13:39:50 <baoli> in the case of SRIOV
13:40:15 <irenab> baoli: we need libvirt network  then
13:40:23 <baoli> yes
13:40:31 <yjiang51> irenab: how is that provider network for each SRIOV NIC defined in neutron? In a config file or automatically? If in a config file, will it be ok to define the same in nova also, as ijw suggested? (i.e. for all devices with PF=xxx, define provider_network as xxx)
13:41:41 <irenab> yjiang51: I think it will be sufficient
13:41:43 <baoli> Also, does the compute node know about flavors?
13:42:01 <yjiang51> irenab: great.
13:42:08 <heyongli> irenab: cool
13:42:45 <irenab> yjiang51: but it should be considered by scheduler too
13:43:28 <ijw> baoli: live migration shouldn't be a problem from a scheduling perspective, but very very few devices support it and none that I know of with KVM support
13:43:39 <ijw> baoli: irenab's macvtap connection will work though.
13:43:50 <heyongli> irenab: sure and ijw and I give a rough description above
13:43:51 <yjiang51> irenab: yes, if we have attribute 'provider_network' defined as attributes supported for pci_stat/pci_flavor.
13:44:12 <ijw> yjiang51: provider nets are in a config file
13:44:39 <ijw> baoli: compute nodes don't know about flavors, only the results of flavor, I *think*
13:45:01 <heyongli> ijw: i also think so
13:45:04 <ijw> As in, by that point you have the allocations in terms of RAM, CPU and now PCI device, so you don't need the connection back to flavor
13:45:05 <baoli> Again, we come back to the question on how pci stats are reported by the compute nodes?
13:45:58 <heyongli> baoli: i think the compute node reports the stats by the available pci flavor properties
13:46:20 <ijw> baoli: right now, they're reported individually by device.  This allows you to report them grouped by the grouping attributes, but since we'd quite like something to work I was expecting to do what we do now - report all the PCI devices and group in the DB - and move to a different form later.  Also, since the scheduler's being hacked to bits right now there's not much point in prejudging the results of that
13:46:23 <heyongli> pci_flavor_attrs=attr,attr,attr on control node
13:46:41 <baoli> ijw, no. they are not individually reported
13:47:41 <ijw> OK, so why do we have a pci_devices table?
13:47:47 <ijw> How does it get filled?
13:47:55 <yjiang51> irenab: so are you ok with ijw's suggestion ?
13:48:12 <baoli> ijw, they maintain status information
13:48:40 <irenab> yjiang51: seems Ok, need to go over the details to validate
13:48:50 <heyongli> ijw: not so far, but we can report by pci_flavor attrs; even directly from the DB would be good (we tried to push that in the current implementation in havana)
13:49:24 <baoli> do you report by flavor? or do you report by individual attribute?
13:49:45 <ijw> You'll notice I didn't say, because it's not terribly important
13:49:55 <baoli> ijw, it's important
13:50:22 <ijw> From a use case perspective it isn't - it doesn't matter which you're doing because it doesn't affect behaviour
13:50:28 <ijw> As long as you get it right...
13:50:35 <baoli> You can't conceptually bring up something without considering its complexity and practicality
13:51:02 <ijw> Actually, I considered it both ways
13:51:04 <heyongli> baoli: not by flavor; it can be based on pci_flavor_attrs, which is practical, or just by the DB
13:51:26 <baoli> what do you mean by based on pci_flavor_attrs?
13:51:33 <baoli> or just by DB?
13:51:53 <ijw> So, if you group on the control node, then that's a lot of network traffic, but otherwise fine.  If you group on the compute node, the only thing you need to consider is that you need the grouping attributes available, so the one visible change would be whether you need that list of grouping attrs on the compute node (presumably in its configuration)
13:52:35 <ijw> I would prefer that we group on the compute node, but I was also assuming that we could do it with control node grouping and fix the problem later, particularly since there's no user visible change there
13:52:48 <baoli> ok, you define an API for PCI flavor, you need to notify all the compute nodes about that, so that they can do grouping properly, is that what you're saying?
13:52:56 <ijw> Personally I would choose - for now - the way that's closest to what we already have in the code to minimise work
13:53:01 <ijw> No.
13:53:10 <baoli> ok, then how
13:53:10 <ijw> Not at all.
13:53:24 <ijw> Flavors are defined in terms of a list of attributes that can be used in the flavor.
13:53:27 <ijw> Which you define in config.
13:53:35 <ijw> Setting aside for a moment where that config is.
13:53:50 <ijw> So, your pci_stats table has a row for every value combination of those attributes.
13:54:31 <ijw> Now, if you're grouping on compute nodes, you don't need to tell them about flavor.  They do need to know what the grouping attributes are.  Either they have the same attrs in their config and confirm early on that they have the right set, or they have to ask the control node for the list.  Either works.
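The scheduling step ijw lays out over the last few lines can be sketched end to end: pci_stats holds one row per value-combination of the configured grouping attributes (heyongli's pci_flavor_attrs), a flavor's match spec selects a subset of rows, and a host can satisfy a request if those rows hold enough devices. All names and data here are illustrative assumptions, not the actual Nova tables.

```python
# pci_flavor_attrs from config determine the pci_stats row shape;
# each row is one value-combination of those attrs plus a free count.
GROUP_ATTRS = ("vendor_id", "provider_network")

stats_rows = [
    {"vendor_id": "8086", "provider_network": "outside", "count": 2},
    {"vendor_id": "8086", "provider_network": None, "count": 5},
    {"vendor_id": "1137", "provider_network": "outside", "count": 1},
]

def can_satisfy(rows, flavor_spec, wanted):
    """Sum the free counts of every row the flavor's match spec selects,
    and check the host can cover the requested number of devices."""
    free = sum(row["count"] for row in rows
               if all(row.get(k) == v for k, v in flavor_spec.items()))
    return free >= wanted

# A flavor asking for three devices on the 'outside' provider network
# draws on the two matching rows (2 + 1 free devices).
ok = can_satisfy(stats_rows, {"provider_network": "outside"}, wanted=3)
```

Note the compute node only needs GROUP_ATTRS to build the rows; the flavor definitions stay on the control node, which is the separation ijw is arguing for.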
13:54:34 <baoli> you are saying the compute node will report every device?
13:54:43 <ijw> No, I'm not.
13:56:30 <yjiang51> baoli: I'm not sure if anyone else has more questions. If you have more, possibly you can discuss with ijw after the meeting since you are at the same company and it's easy to discuss?
13:56:33 <baoli> remember that pci flavors are defined on the controller node with arbitrary criteria
13:56:59 <irenab> ijw: baoli: do we want to set a plan/dates we want to progress?
13:57:03 <baoli> yjang51, sure
13:57:30 <irenab> baoli: any chance you can share a git fork with your changes?
13:57:32 <baoli> irena, I wish we could
13:58:00 <baoli> Irena, I can provide a full patch if you need one
13:58:18 <ijw> baoli: flavors are defined on a controller node but the criteria are not arbitrary, they're validated against the list of grouping attributes.
13:58:36 <ijw> And I think that that is what you're missing
13:59:04 <baoli> ijw, let's discuss it offline then
13:59:15 <baoli> #endmeeting