13:00:28 <baoli> #startmeeting PCI Passthrough
13:00:29 <openstack> Meeting started Wed Feb  5 13:00:28 2014 UTC and is due to finish in 60 minutes.  The chair is baoli. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:30 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:32 <openstack> The meeting name has been set to 'pci_passthrough'
13:00:55 <irenab> hi
13:01:31 <baoli> Hi
13:02:16 <irenab> short update with current status?
13:02:42 <baoli> irenab, shall we wait for the others?
13:03:04 <irenab> sure
13:05:48 <rkukura> hi
13:06:14 <irenab> rkukura: hi
13:06:36 <rkukura> snow day here - had to shovel a path to take the trash out
13:06:54 <baoli> Hi rkukura
13:07:14 <irenab> rkukura: we are more than a month without rain...
13:08:01 <baoli> we'll have 8-12 inches of snow today
13:08:01 <irenab> baoli: shall we wait for Sandhya?
13:08:05 <rkukura> expecting 10-12" of snow today - looks like about 8" so far
13:08:27 <rkukura> baoli: Where are you located?
13:08:28 <baoli> irenab, yes. a couple of more minutes
13:08:45 <baoli> rkukura, westford/boston
13:09:09 <baoli> rkukura, is your office in westford?
13:09:12 <rkukura> baoli: I'm in Sudbury, office is in Westford
13:09:47 <irenab> guys, while we wait for Sandhya, how do we make progress with vnic_type?
13:09:56 <baoli> rkukura, we are neighbors
13:10:27 <rkukura> At least the couple days a week I work from the office
13:10:43 <baoli> yep
13:13:44 <baoli> irenab, regarding vnic_type, I'd like to see normal users be able to choose between sriov and virtio.
13:13:49 <irenab> rkukura: did you have any chance to discuss it with other core team members? Shall I send an email to the mailing list?
13:15:09 <irenab> baoli: I think I'll be able to push the code for it as draft either later today or tomorrow
13:15:10 <rkukura> irenab: I think you should send an email to openstack-dev.
13:15:27 <rkukura> irenab: That's probably best - makes it concrete.
13:15:35 <baoli> irenab, that sounds great
13:15:40 <irenab> rkukura: Ok, thanks
13:16:33 <irenab> rkukura: It will be hard to present without having the nova API defined..
13:17:14 <baoli> Not sure if you guys are aware of the change proposed for ipv6. Two new keywords for an ipv6 subnet: ipv6_ra_mode and ipv6_address_mode. You'd have to expect a normal user to fully understand ipv6 before using them
13:17:58 <baoli> intuitively, a normal user should just say "I want ipv6" for my network
13:18:10 <rkukura> baoli: I'm no IPv6 expert, but don't those affect what happens inside the VM?
13:18:22 <irenab> baoli: does it have regular-user permissions in policy.py?
13:18:41 <baoli> rkukura, it just provides ipv6 connectivity
13:19:10 <baoli> irenab, no restrictions imposed in policy.py
13:19:14 <baoli> on subnet
13:19:18 <rkukura> With IPv4, the VM needs to know whether DHCP is being used. Aren't these similar?
13:19:48 <irenab> baoli: only admin or network owner can create subnet
13:19:59 <baoli> rkukura, in addition to that, a user needs to say dhcpv6 stateless or dhcpv6 stateful, or slaac, etc
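For reference, the IPv6 change baoli describes adds two subnet attributes, ipv6_ra_mode and ipv6_address_mode, each taking slaac, dhcpv6-stateful, or dhcpv6-stateless. A rough sketch of the intended usage, assuming the client flags follow the attribute names (the change was still under review at the time, so names and values may differ):

    # Sketch only: flags assume the attribute names in the proposed change
    neutron subnet-create --ip-version 6 \
        --ipv6-ra-mode slaac --ipv6-address-mode slaac \
        mynet 2001:db8::/64

baoli's point above is that a normal tenant would need to understand RA/SLAAC/DHCPv6 semantics just to ask for an IPv6 subnet.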
13:20:33 <irenab> baoli: can you please send the link to review?
13:21:19 <rkukura> baoli: Someone should be looking out for the usability of this for normal tenants!
13:21:41 <irenab> baoli: I am not sure it's supposed to be managed by a regular user; it should be admin or network owner
13:21:43 <baoli> irenab, you mean the ipv6 change?
13:21:52 <irenab> baoli: yes
13:22:28 <baoli> irenab, if it's ok for a network owner, then it's ok for a network owner to say that I need an sriov port
13:22:44 <rkukura> If this precedent helps with the case that exposing vnic_type to normal users is the way to go, fine.
13:23:10 <rkukura> Don't forget that tenants can use networks owned by someone else, as long as the shared attribute is set.
13:23:21 <irenab> so normal_user =network_owner for the vnic_type case?
13:23:37 <rkukura> There are also proposals being discussed for hierarchies of tenants
13:24:00 <irenab> I need to state it in the neutron policy.py
13:24:04 <rkukura> I do not think we should restrict SR-IOV to the case where the tenant is the owner of the network.
13:24:30 <irenab> it's either admin or owner
13:24:41 <irenab> rkukura: would it be ok?
13:24:43 <rkukura> Networks (and their subnets) are often shared, especially provider networks.
13:25:05 <rkukura> I don't think admin or owner is correct for attaching to a network
13:25:42 <baoli> rkukura, in the current proposal, sriov ports are shared among tenants!
13:26:50 <rkukura> Why not just let normal users request SR-IOV via vnic_type?
13:27:06 <irenab> according to policy.py  "create_port:mac_learning_enabled": "rule:admin_or_network_owner"
13:27:28 <irenab> I think we should have same for vnic_type
13:28:35 <rkukura> This whole API is getting way too complicated!
13:29:36 <rkukura> mac_learning_enabled is a nicira-specific extension, it looks like.
13:29:55 <baoli> irenab, I think that --binding:vnic_type should at least have the same restriction as the port-create command.
13:30:14 <baoli> So I think it should be admin_or_network_owner
13:30:26 <rkukura> I have no idea what the use case is for mac_learning_enabled, so maybe it is something normal users would never use unless they own the network
13:31:39 <rkukura> baoli: Wouldn't using admin_or_network_owner prevent normal tenants from requesting SR-IOV on a port attached to a shared network? Maybe I'm wrong and this policy rule takes --shared into account?
13:31:55 <irenab> rkukura: I do not see any other example that fits... MAC and IP do indeed seem like something only admin_or_network_owner should manage
13:32:25 <irenab> rkukura: there is also rule "shared", not sure how it works
13:32:50 <irenab> rkukura: so we can mix
13:33:27 <rkukura> Interesting - I do see now that specifying mac_address or fixed_ips is admin_or_network_owner. I'm getting convinced that that rule must take sharing into account.
13:33:58 <baoli> rkukura, good point. I think that I need to study policy.py a bit more
13:34:24 <irenab> anyway it seems that the current discussion proves there is a good chance vnic_type may have different policy rules than other items that may land in binding:profile
13:34:45 <rkukura> I apologize if my misinformation on this has been leading the team astray!
13:35:11 <rkukura> irenab: Agreed.
13:35:16 <irenab> rkukura: I think your questions are in place and push us to provide good answers
13:36:07 <irenab> so it seems there is no reason to block the vnic_type bp, what do you think?
13:38:09 <rkukura> irenab: I'm fine with going forward with it.
13:38:11 <baoli> rkukura, a shared network can be used by any tenant, is that right?
13:41:09 <irenab> baoli: there was a question sadasu sent to the mailing list on neutron SRIOV ports and MD. After vnic_type I wanted to start with SRIOVPortMDBase. Do you know if sadasu started some work on this?
13:41:33 <rkukura> baoli: That is my understanding. The mailing list discussion regarding hierarchies of tenants/projects may eventually make that more useful.
13:42:03 <baoli> irenab, not sure if she would do something.
13:42:21 <irenab> baoli: ok..
13:43:04 <baoli> so for vnic_type, we'd go with binding:vnic_type, and set the rule as admin_or_network_owner? is that agreed?
13:43:26 <irenab> baoli: I agree
13:44:21 <baoli> cool
13:44:32 <baoli> let's move on to SRIOVPortMDBase
13:44:38 <irenab> rkukura: do you agree?
13:45:11 <rkukura> irenab: I agree as long as admin_or_network_owner really does work for shared networks
13:46:12 <irenab> rkukura: I'll do some test on some existing attribute to verify
13:46:30 <baoli> rkukura, that's the catch we have to experiment with. But I think it should be the same for someone to create a port on a shared network regardless of the vnic_type. I may be wrong, though
13:46:42 <rkukura> irenab: Great. Very interested in what you find out.
13:47:01 <irenab> the rule could also be something like "rule:admin_or_owner or rule:shared",
13:47:29 <baoli> Irenab, let us know once you find out
13:47:35 <irenab> baoli: sure
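For reference, the rules being compared live in neutron's etc/policy.json; the mac_address and fixed_ips entries rkukura points to are already admin_or_network_owner there. The binding:vnic_type line below is hypothetical, sketching the rule agreed above (the "rule:admin_or_owner or rule:shared" composition irenab mentions is the alternative to test against shared networks):

    "create_port:mac_address": "rule:admin_or_network_owner",
    "create_port:fixed_ips": "rule:admin_or_network_owner",
    "create_port:mac_learning_enabled": "rule:admin_or_network_owner",
    "create_port:binding:vnic_type": "rule:admin_or_network_owner",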
13:47:57 <irenab> baoli: can you please give a short update on the nova side?
13:48:32 <baoli> irenab, sure
13:48:58 <baoli> basically, yunhong would like to go with a simpler version for Icehouse
13:49:35 <baoli> The major enhancements would be: a) add attribute support in the whitelist, b) support multiple aliases
13:49:55 <baoli> and c) support stats based on aliases
13:50:15 <baoli> he is asking for approval
13:50:48 <irenab> baoli: this will work without the need to create a VM flavor with a PCI alias, right?
13:51:11 <baoli> If that can be done, then we should be ok. I also asked for the API to support correlation between the allocated device and the requested networks.
13:51:26 <baoli> irenab, no vm flavor for network.
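As a rough illustration of (a)-(c): nova already has pci_passthrough_whitelist on compute nodes and pci_alias for flavor-based requests; the proposal discussed here is to let the whitelist carry extra attributes such as a physical network tag, so network-related requests do not need a flavor alias. Everything below is a sketch under that assumption (vendor/product IDs and names are made up, and the physical_network attribute is the proposed extension, not an existing option at this point):

    # Compute node nova.conf: tag matching devices with their physical network
    pci_passthrough_whitelist = {"vendor_id": "8086", "product_id": "10ed", "physical_network": "physnet1"}
    # Controller nova.conf: aliases would still exist for non-network PCI requests
    pci_alias = {"vendor_id": "8086", "product_id": "10ed", "name": "intel-vf"}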
13:51:27 <irenab> baoli: and for your bp, is it on the way?
13:52:03 <baoli> irenab, I put John's name as approver. I need to send an email as well
13:52:15 <rkukura> Is requesting the physical network still implicit in requesting the PCI device? I saw some mention in baoli's wiki of nova asking neutron for the physical_network, but don't think that is possible.
13:53:16 <irenab> baoli: thanks
13:54:00 <irenab> baoli: so we need to clearly define the nova-neutron API for setting PCI details and returning VIF details/VIF_TYPE
13:54:02 <baoli> rkukura, what is not possible? Can you clarify it?
13:55:08 <sadasu> rkukura: yes, the wiki says that, why not possible?
13:55:24 <sadasu> I thought phy_net can be part of the vif_details
13:55:33 <rkukura> baoli: With ML2, a virtual network can be made up of multiple segments, which may have different values for provider:physical_network. I don't think either nova or neutron could know which segment's physical_network to use.
13:56:21 <rkukura> sadasu: Once port binding occurs, a segment has been picked, and the MD can put that segment's physical_network into binding:vif_details.
13:56:48 <baoli> rkukura, do you have an example on how that's used and provisioned?
13:57:07 <baoli> rkukura, I mean multi-segments
13:57:11 <rkukura> It seemed to me that the physical_network was needed in nova before the VM is scheduled. Port binding can't be done until after the VM is scheduled.
13:57:51 <sadasu> rkukura: yes. didn't know of this sequencing problem
13:57:51 <baoli> rkukura, I assumed that a neutron net is associated with a physical net
13:58:15 <baoli> rkukura, and a port is created from a neutron net
13:58:58 <irenab> rkukura: it should be possible to support a single-segment network, right?
13:59:16 <baoli> time is up. can we switch to a different channel?
13:59:28 <rkukura> Single segment networks are certainly possible, and most common right now, but the ML2 model allows multi-segment networks.
13:59:29 <irenab> I can for ~10 mins more
14:00:17 <baoli> Ok, rkukura, could you please send me an email on that? I'd like to know more details
14:00:31 <irenab> me too
14:00:51 <rkukura> What happened to the plan for the admin to create flavors or host aggregates with SR-IOV connectivity to specific physical networks?
14:01:33 <irenab> rkukura: Host aggregates can be done today
14:02:12 <rkukura> We should #endmeeting
14:02:43 <baoli> if no one is using it, we can continue a bit more
14:03:27 <rkukura> OK, but if someone else is scheduled and ready they should speak up
14:04:01 <baoli> checked, it's open
14:04:09 <irenab> rkukura: with multi-segment + provider network, should the network be created as with a regular plugin?
14:05:44 <baoli> I think that we only care about the first segment, which is the one that the compute node immediately connects with.
14:06:05 <rkukura> irenab: I'm not sure what you mean. The ML2 plugin supports both the providernet and multiprovidernet extensions. Either API extension can be used to create single-segment provider networks. Only the multiprovidernet extension can be used to create multi-segment provider networks. Creating normal tenant networks doesn't use either extension.
14:06:49 <rkukura> baoli: Port binding can pick any of the segments. First is not special. In fact there is a bug right now that the order of the segments isn't even deterministic.
14:07:31 <baoli> rkukura, is the multi-segment support documented anywhere?
14:08:53 <irenab> so now I don't understand how vnic_type helps us ...
14:08:58 <rkukura> baoli: I'm not sure, but multiprovidernet should be covered in the API guide.
14:09:19 <rkukura> Here's what I thought was supposed to happen:
14:09:24 <irenab> baoli: I think that provider net is documented in neutron admin guide
14:09:40 <irenab> but not with ML2, if it differs
14:10:00 <baoli> rkukura, please go ahead to describe it. I'll take a look at it offline as well.
14:10:07 <rkukura> 1) The admin creates a flavor or host aggregate with SR-IOV connectivity to a specific physical network
14:11:03 <rkukura> 2) Probably the admin, or someone else, creates a virtual VLAN network on that physical network, and gives the tenant access (as owner or shared)
14:11:19 <rkukura> actually I guess 2) needs to be the admin
14:12:05 <baoli> rkukura, in 2), a neutron net is associated with a physical net, right?
14:12:16 <rkukura> 3) The tenant creates a neutron port on that network specifying --binding:vnic_type <something other than virtio>
14:13:05 <rkukura> 4) The tenant boots a VM specifying the flavor or host aggregate with SR-IOV connectivity to that same network
14:14:04 <rkukura> In 4) the user specifies --nic with the port ID from 3)
14:14:33 <irenab> rkukura: I don't see a problem here. The neutron net is associated with a physical network and nova has admin access, right?
14:14:43 <rkukura> 5) Nova takes care of reserving the PCI slot and PF when scheduling the VM, and stores these details in the binding:profile attribute of the port created in 3)
14:14:59 <baoli> rkukura, also in 1), we need to tag the pci devices with a specific physical network on each compute node.
14:15:41 <rkukura> 6) Either in 5) or after, nova sets binding:host_id to specify the chosen host, which triggers ML2 port binding
14:17:02 <baoli> rkukura, if by port binding, you mean to bind a port to a host, then we don't have a problem here. The association of a port with a physical net is determined when the port is created
14:17:55 <rkukura> 7) ML2 port binding tries the registered mechanism drivers. Ones that don't support the binding:vnic_type refuse to bind. The SR-IOV mechanism drivers do try to bind, and look for a segment for which the PCI device has connectivity to the segment's physical network
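Concretely, steps 2)-4) of the flow above would look roughly like the following from the CLI. The --binding:vnic_type attribute is the one proposed in the blueprint (not merged at this point), and all names and IDs are illustrative:

    # 2) admin creates a VLAN provider network on physnet1 and shares it
    neutron net-create sriov-net --shared \
        --provider:network_type vlan \
        --provider:physical_network physnet1 \
        --provider:segmentation_id 100
    # 3) tenant creates a port on it, requesting an SR-IOV vnic (proposed attribute)
    neutron port-create sriov-net --binding:vnic_type direct
    # 4) tenant boots with the SR-IOV flavor/aggregate, passing the port from 3)
    nova boot --flavor sriov-flavor --image my-image \
        --nic port-id=<port-uuid> sriov-vm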
14:18:50 <baoli> that information is then used for scheduling a host. We don't want to do a trial-and-error approach in selecting a host to 'bind'
14:19:07 <rkukura> baoli: Port binding is what determines which network segment is being used, what the binding:vif_type is, and (soon) what is in binding:vif_details.
14:19:36 <baoli> but in 2), you said that a vlan net (a neutron net) is associated with a physical net, right?
14:19:52 <irenab> rkukura: it's the first time in the flow you mention a network segment. When is it created?
14:20:37 <rkukura> baoli: My understanding was that nova's scheduler would schedule on a host from the aggregate or flavor with the needed connectivity, so as long as the user gets this right, it's not trial and error.
14:20:53 <baoli> rkukura, no. that's not the plan for now
14:21:21 <rkukura> irenab: The network segment(s) is/are created in 2
14:22:02 <baoli> rkukura, regardless of sriov, can you provide a workflow for multi-segment?
14:22:07 <irenab> rkukura: 2) says the network is created
14:22:28 <rkukura> baoli: What is the current plan for making nova schedule on a host with an available SR-IOV VF with connectivity to the needed physical network then?
14:22:46 <baoli> rkukura, it's described in my wiki
14:23:05 <irenab> I have to go, please do end meeting by the end to make logs available.
14:23:05 <baoli> 1) you tag each pci device with the attached physical net
14:23:37 <baoli> 2) compute node reports pci stats as "host:net-group:phynet1:count" to the controller/scheduler
14:23:55 <rkukura> baoli: Right now multi-segment networks can only be created by passing in a list of maps describing each segment to create network or update network, using the multiprovider extension.
14:24:40 <baoli> rkukura, can you provide how it's done in terms of neutron config and work flow (cli commands, etc)?
14:25:16 <baoli> 3) create a neutron net that is associated with a physical net
14:25:35 <baoli> 4) create a neutron port on this net with --vnic-type=direct
14:25:52 <baoli> 5) nova boot --nic port-id=<port-uuid-from-above>
14:26:13 <rkukura> baoli: I can track down the exact syntax. They aren't too commonly used right now, but the capability is there. The bridging between the segments is not managed by neutron and must be set up administratively right now.
14:26:43 <baoli> rkukura, are all the segments bridged together?
14:26:57 <baoli> administratively?
14:27:28 <rkukura> baoli: Yes, that's what makes them the same virtual L2 network, right?
14:27:54 <baoli> rkukura, so on each compute node, the starting segment may be different?
14:28:34 <rkukura> Longer term plans are for some of this to be automated - maybe creating a vlan segment shared within a specific rack, but a vxlan segment connecting the various top of rack switches
14:29:00 <rkukura> baoli: Not sure what you mean by "starting segment"?
14:29:45 <baoli> rkukura, I think that we need to do some study on that. Thanks for bringing that up.
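For reference, the multiprovidernet extension rkukura describes takes a list of segment maps when the network is created; a rough sketch of the API request body only (attribute names are those of the extension as described above; the exact client syntax is what rkukura offers to track down, so only the shape is shown here):

    POST /v2.0/networks
    {
        "network": {
            "name": "multi-seg-net",
            "segments": [
                {"provider:network_type": "vlan",
                 "provider:physical_network": "physnet1",
                 "provider:segmentation_id": 100},
                {"provider:network_type": "vxlan",
                 "provider:segmentation_id": 5000}
            ]
        }
    }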
14:29:51 <rkukura> baoli: In your steps above, what forces nova to schedule the VM on a node with an available VF for an SR-IOV device with connectivity to the needed physical network?
14:30:27 <baoli> we have a pci filter scheduler that has to be loaded into the nova scheduler
14:31:00 <baoli> that pci filter scheduler works off the stats: host:net-group:phynet:count
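A rough illustration of the stats grouping baoli describes, using the host:net-group:phynet:count key format quoted above (hosts and counts are made up; this reflects the proposal in his wiki, not an existing nova data structure):

    host1:net-group:physnet1:7     # 7 free VFs on devices tagged physnet1
    host1:net-group:physnet2:0     # none left on physnet2
    host2:net-group:physnet1:16
    # The PCI filter scheduler would pass only hosts with a non-zero count
    # for the physical network of the requested neutron port.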
14:31:02 <sgordon> speaking of nova...which bps are still active on the nova side for icehouse?
14:31:08 <sgordon> do we need to seek core reviewers?
14:31:17 <baoli> #endmeeting