13:01:26 <ijw> #startmeeting pci_passthrough
13:01:27 <openstack> Meeting started Tue Jan 21 13:01:26 2014 UTC and is due to finish in 60 minutes.  The chair is ijw. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:28 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:31 <openstack> The meeting name has been set to 'pci_passthrough'
13:01:40 <ijw> Welcome one and all.
13:01:43 <heyongli> #link https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support#API_interface
13:01:48 <heyongli> bp update
13:01:53 <ijw> Marvellous
13:02:00 <ijw> Is it any good?
13:02:00 <ijw> ;)
13:02:15 <heyongli> it may be missing something, or something may be wrong; please check it
13:02:19 <johnthetubaguy> so are the blueprints ready to review, and inline with the google doc now?
13:03:06 <heyongli> johnthetubaguy, I updated the bp based on these days' discussion; I didn't update any google docs.
13:03:29 <ijw> OK, I see a couple of typos but it's basically good.  Seems like it's pretty much the same as the google doc
13:03:49 <heyongli> ijw, it should be.
13:03:56 <irenab> heyongli: I'll take a look after the meeting
13:04:28 <heyongli> irenab, sure, I suggest editing the wiki directly, tagged with your name, maybe
13:04:33 <ijw> How are we transitioning from the old config items to the new ones?
13:04:41 <irenab> johnthetubaguy: there is Ian's google doc, that may be worth to look at
13:04:59 <johnthetubaguy> yeah, had a quick read
13:05:01 <ijw> Simply remove the old ones and deny everything?  I don't know if that will fly.
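For reference, the "old config items" being discussed are the per-compute-node options in nova.conf of the time; their shape was roughly the following, with the values here purely illustrative:

    # nova.conf on each compute node (pre-blueprint style; values illustrative)
    pci_passthrough_whitelist = {"vendor_id": "8086", "product_id": "10fb"}
    # alias referenced from instance-type extra_specs when requesting a device
    pci_alias = {"name": "bigNIC", "vendor_id": "8086", "product_id": "10fb"}

The blueprint's pci_information and API-managed PCI flavors are intended to replace these, hence the transition question above.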
13:05:03 <johnthetubaguy> just looking at the blueprint
13:05:04 <heyongli> I linked it to the bp already.
13:06:02 <irenab> heyongli: will do
13:06:18 <ijw> heyongli: OK, we're in the lull between I2 and I3 practically (more so cos the gate is down) so I suggest we line up our patches ready to get in when I3 opens
13:06:44 <baoli> Hi
13:06:56 <irenab> baoli: hi
13:06:57 <johnthetubaguy> we can do the review now, I would just get your code up right away
13:07:09 <johnthetubaguy> it's going to be hard getting into I3, it's quite full already
13:07:13 <ijw> baoli for arguments, heyongli for libvirt data gathering, bobball (who's avoiding us again) for xenapi and irenab for Neutron.  I guess I'll take the scheduler?
13:07:31 <johnthetubaguy> I can cover the xenapi side a little bit
13:07:32 <ijw> johnthetubaguy: if we had code that would work better.
13:08:02 <ijw> And we managed H3 last time as I recall
13:08:04 <baoli> sorry for joining late. Are we talking about work division?
13:08:08 <johnthetubaguy> ijw: what are the XenAPI worries, I chair the subteam meeting, so I can bring those up there if you like
13:08:29 <heyongli> what's the libvirt data gathering?
13:08:35 <ijw> johnthetubaguy: mainly that bob was saying they'd started on implementation of current functionality without regard to the new plans
13:08:48 <ijw> heyongli: pci_information basically
13:09:00 <johnthetubaguy> ijw: right, I don't think it's a big deal, it's mostly low-level plumbing
13:09:20 <sadasu> baoli: yes
13:09:24 <heyongli> ijw, sure, and we have one more person for the API
13:09:28 <johnthetubaguy> ijw: I did warn them about the new plans, but not sure I got the full picture across
13:09:34 <irenab> for neutron, rkukura is also working on a patch that we may use for extending port attributes
13:09:52 <ijw> OK - well the BP is there and I think subject to some minor edits it looks like I expected.
13:09:58 <ijw> We all need to review that for tomorrow
13:10:14 <irenab> ijw: action item?
13:10:28 <ijw> #action All to review the BP by tomorrow
13:10:38 <ijw> Not that the meetingbot loves me
13:10:47 <baoli> Sorry, which BP?
13:10:58 <ijw> https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support#API_interface
13:11:06 <ijw> Well, all of it, not just that section
13:11:30 <baoli> sure.
13:11:54 <baoli> What's the plan for today?
13:12:12 <ijw> We don't have one as such
13:12:47 <johnthetubaguy> I have some initial feedback on that blueprint, it seems unclear to me where pci_flavor is stored?
13:12:52 <ijw> Unless we want to discuss the network scheduling (I put a section in the google doc but I think we all agreed that we wouldn't do it yet)
13:12:59 <ijw> johnthetubaguy: DB
13:13:07 <irenab> baoli: ijw: let's try to see that all parts have an owner and a relevant bp
13:13:25 <ijw> johnthetubaguy: Section at the bottom referring to DB
13:13:34 <irenab> ijw: let's try to focus on plan A, to make some of it into I-3?
13:13:40 <ijw> irenab: indeed
13:13:46 <johnthetubaguy> ijw: but in what format in the DB? it seemed to suggest it was in configuration earlier in the doc, I may have misread
13:14:01 <ijw> The section at the bottom sketches out the table format
13:14:09 <heyongli> johnthetubaguy, store them in the database, I think; there's a DB section later on.
13:14:36 <johnthetubaguy> Ah, my quick reading failed me again
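To make the persistence question concrete, here is a minimal sketch of the kind of dedicated table the blueprint's DB section implies, written as a SQLAlchemy model since that is what Nova uses; the column names are illustrative assumptions, not the blueprint's exact schema:

    # Illustrative sketch only -- not the blueprint's actual schema.
    from sqlalchemy import Column, Integer, String, Text
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class PciFlavor(Base):
        __tablename__ = 'pci_flavors'
        id = Column(Integer, primary_key=True)
        # user-visible name, e.g. '10G-NIC'
        name = Column(String(255), nullable=False, unique=True)
        # JSON match expression evaluated against reported device properties,
        # e.g. '{"vendor_id": "8086", "product_id": "10fb"}'
        match_spec = Column(Text, nullable=False)

The debate that follows is whether a new table is needed at all, or whether host aggregate metadata could carry this instead.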
13:14:57 <johnthetubaguy> seems like a duplication of host aggregates to me, but that might not be a bad thing
13:15:59 <irenab> johnthetubaguy: we may need host aggregates in addition
13:16:22 <johnthetubaguy> I don't see why you need both, but I could be missing something
13:16:30 <sdague> there isn't any existing metadata store you could use instead? seems odd to create a whole new table for this.
13:17:08 <johnthetubaguy> yeah, host aggregates does seem to fit what you need quite well
13:17:40 <ijw> johnthetubaguy: It's not host aggregates, not least because two network cards in the same machine can be connected to different things and therefore be in different flavors
13:18:06 <johnthetubaguy> ijw: that's just a question of mapping to host aggregates correctly
13:18:11 <irenab> ijw: till we have neutron aware scheduler
13:18:27 <johnthetubaguy> ijw: a host can be in many aggregates
13:18:31 <ijw> johnthetubaguy: No, it's a question of being able to separate one card from another when they have identical devices and vendors, which is what the PCI flavor does
13:18:58 <johnthetubaguy> ijw: that was still possible in the host aggregate approach, using a host aggregate for each PCI flavor
13:19:18 <ijw> Er, host aggregates presumably refer to hosts?
13:19:53 <ijw> Whereas what we're talking about has nothing to do with hosts
13:20:25 <ijw> I mean, I can create two flavors that don't make any attempt to distinguish the host that a card is in
13:20:45 <ijw> And need additional information that isn't in host aggregates like device ID, vendor ID, connection, group...
13:20:49 <johnthetubaguy> ijw: the PCI flavor is described in host aggregate metadata; the hosts you want to allow requests on just get added into that aggregate
13:20:56 <irenab> johnthetubaguy: do you suggest defining host aggregates by physical connectivity?
13:21:21 <johnthetubaguy> irenab: nope, just that you persist the PCI flavor in host aggregate metadata
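A sketch of the approach johnthetubaguy is suggesting, using the standard nova aggregate CLI of the time; the metadata keys and values are made up for illustration, not an agreed format:

    nova aggregate-create intel-10g-passthrough
    nova aggregate-add-host intel-10g-passthrough compute-01
    nova aggregate-set-metadata intel-10g-passthrough \
        pci_flavor=10G-NIC \
        'pci_match={"vendor_id": "8086", "product_id": "10fb"}'

ijw's objection below is that this keys everything off groups of hosts, while a PCI flavor describes groups of devices, of which one host may hold several.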
13:21:41 <ijw> johnthetubaguy: I'm confused.  The primary point of host aggregates is to define groups of hosts.  The primary point of PCI flavors is to define groups of PCI devices of which there are almost always many per host.  how would I put that into a host aggregate?
13:21:50 <ijw> eww
13:21:52 <johnthetubaguy> it might not be quite right, but it's what you are replacing, so it needs some thought
13:22:12 <ijw> In what sense is it what we're replacing?  The flavor doesn't have a host in it at all
13:23:01 <heyongli> the properties of a pci device in a pci flavor should be kind of global, not bound to a host, I think.
13:23:10 <johnthetubaguy> ijw: the question is more, should PCI flavor be an extra DB table or use existing data storage
13:23:22 <ijw> (or, at least, I suppose you could filter by flavor, but in fact that's pretty much the one thing no-one has suggested, because I think we all agree that filtering by location in the cloud isn't a great administrative idea)
13:24:16 <ijw> johnthetubaguy: understood - but I think you're saying 'use host aggregates to add an aggregate with additional information that you actually want to store, then use host aggregates where PCI flavors are'
13:24:29 <johnthetubaguy> if there is a feature you want to restrict to a subset of hosts, we currently do that using host aggregates
13:24:35 <ijw> Other than 'host aggregates have metadata' I really don't see the overlap
13:24:43 <ijw> I could equally use glance images if that were the argument
13:25:04 <ijw> There isn't a feature you restrict to a subset of hosts
13:25:25 <johnthetubaguy> there is, I just can't remember what it is now, it's how pClouds got implemented
13:25:31 <ijw> There's a constraint that limits you, ultimately, to a subset of hosts based on resource availability, but only in the same way that CPU and memory do
13:25:37 <johnthetubaguy> so availability zone, thats a grouping of hosts
13:25:45 <johnthetubaguy> it is stored in host aggregates today
13:25:54 <ijw> Yes, I understand what you mean by host grouping, I'm missing how that's relevant to PCI
13:26:37 <ijw> Or, more accurately, I think I see where you're going with this and it's not really relevant to the use cases
13:27:23 <irenab> I think the issue being discussed is whether the PCI flavor is a new stand-alone resource or something that should be bound to an existing construct?
13:27:34 <johnthetubaguy> so PCI whitelist + PCI flavor could = host aggregates grouping available PCI devices
13:27:52 <johnthetubaguy> the question we ask when adding DB tables is, do we really need this extra concept
13:28:12 <johnthetubaguy> the answer may be yes, I just need to think through why it couldn't be host aggregates
13:28:18 <ijw> So, to be clear, PCI flavors aren't intended to restrict a feature to a subset of hosts.  They're intended to find a device matching criteria.  And the device criteria are specified on the compute node, largely because - like provider networks and quantity of RAM - they're very much a property of the individual compute node and not something that changes.
13:29:30 <ijw> So there's a logical reason why the information is divided between the compute config (which is, I think, really what you're arguing against) and the PCI flavor.  The config defines what hardware is in the cloud.  The flavor defines what the user can use.
13:29:51 <irenab> ijw: if we were doing plan B, I think it could be much easier to manage
13:29:53 <ijw> In much the same way as Neutron config defines a provider network and the Neutron API defines a network that lives on that provider network, for instance
13:30:32 <ijw> irenab: I'm not sure it really helps - you could still quite easily end up scheduling to an aggregate that doesn't have the network you want
13:30:38 <johnthetubaguy> ijw: I am just worried about the extra DB tables, and wonder if we really need them
13:31:04 <irenab> ijw: thats why I think host aggregate is not a way here
13:31:08 <johnthetubaguy> ijw: staying with compute node config just makes me worry about a global concept being cut and pasted across every node in the cloud
13:31:14 <ijw> johnthetubaguy: well, I'm not sure if we need it or not, but I don't think host aggregate is the right substitute
13:31:28 <ijw> Why is it a crime to add a DB table?  I'm missing the concern here
13:32:15 <johnthetubaguy> well, don't want to re-invent an existing abstraction by accident, but just need to work out how different it is from what we already have
13:32:18 <ijw> johnthetubaguy: It's specific to a host and if you have multiple host types then actually your average admin will use a deployment system that determines that config.  From my perspective (and I've installed Openstack plenty of times using puppet) it's the right approach
13:33:00 <ijw> And you need the whitelist on the backend anyway, at least - otherwise using the API you can unplug hosts from their control interfaces by assigning them to passthrough - which even for an administrator is probably not what we're looking for
13:33:11 <johnthetubaguy> in this case it might be the right approach, in which case, don't add extra DB stuff we will never use I guess
13:33:20 <baoli> john, with a host aggregate with metadata, does it maintain a count per host aggregate?
13:34:25 <johnthetubaguy> yeah, the count per host should be in the current host stats, using existing scheduler update paths, for my money
13:34:33 <johnthetubaguy> but maybe I am missing something
13:34:52 <ijw> johnthetubaguy: maybe you could review the document to see what you think?  My take here is that PCI flavor is a publicly exposed object and having a table to back it is probably the right approach, and having been around and around I think the backend/frontend split we have here is the one we all like most, so the question is probably if another nova-exposed object suits the task
13:35:31 <ijw> Host aggregate just seems such an odd choice, because we've almost religiously avoided categorising by host to date
13:35:49 <ijw> Not explicitly, it's just not come up, and I think because it just doesn't fit the model
13:36:16 <johnthetubaguy> right, and I think host aggregates + current scheduler host node stats, plus an extra filter, does what you need, I just need to know why that doesn't work
13:36:33 <ijw> Well, we already have pci stats in any case
13:37:07 <ijw> But that aside I think I need to talk this over with you a bit more, I'm just not seeing it (but I certainly wouldn't claim to be an expert on host aggregates)
13:37:35 <johnthetubaguy> yeah, there is a blueprint for making the current node stats extensible already, which should deal with the replacement for that
13:37:50 <johnthetubaguy> anyways, it needs more thought from me too
13:37:54 <baoli> In the case of a host aggregate, a host joins an aggregate by means of an API. In the case of a flavor, a host joins by configuration, and stats are maintained per PCI flavor. However, the host aggregate metadata and the PCI flavor definition seem to be the same.
13:38:38 <ijw> What I would say is that, again from experience, I absolutely wouldn't want to say 'a machine of type X has *these* PCI devices and these machines are of type X'.  Independently of that, it still doesn't actually get rid of the need for a flavor.
13:39:01 <johnthetubaguy> host aggregates is much simpler...
13:39:05 <johnthetubaguy> you define a new aggreage
13:39:13 <johnthetubaguy> add something that makes it a PCI flavor
13:39:21 <johnthetubaguy> something=metadata
13:39:25 <ijw> But (and I'll say it again) a PCI flavor doesn't apply to a host
13:39:32 <johnthetubaguy> add extra metadata to describe it
13:39:41 <ijw> No, I mean it's not a concept you apply to hosts
13:40:01 <johnthetubaguy> then add the hosts that have the possibility of reporting a present PCI device
13:40:14 <ijw> What if they have two?
13:40:20 <ijw> Or (in the case of SRIOV) 64?
13:40:24 <johnthetubaguy> OK, but you are applying it to hosts, its just you do that in the config file on the host, right?
13:40:38 <ijw> No, that's the PCI information, not the flavor
13:40:45 <ijw> And it doesn't have a DB table
13:40:48 <johnthetubaguy> host aggregates say they have "N" devices, where N could be 0
13:40:55 <baoli> John, with a host aggregate, it just says a host POTENTIALLY has the resource, is that understanding correct?
13:41:26 <johnthetubaguy> yeah, the scheduler needs to report the in-use devices, per flavor, and you can work out what happens later
13:41:37 <johnthetubaguy> its just an approach
13:41:57 <ijw> Actually it's not reporting in-use devices per flavor in the case of overlaps either, but that's a whole other can of worms...
13:42:29 <baoli> if the host aggregate can take hosts that join during boot up, and maintains counts per aggregate, then I see the pci flavor can definitely use it.
13:42:39 <johnthetubaguy> overlaps just mean more scheduler races, which we have semi-solutions for anyways
13:42:50 <ijw> The thing I'm confused about here is that this doesn't actually make the PCI flavor go away
13:43:01 <johnthetubaguy> the host aggregate would not store any counts
13:43:25 <baoli> that probably makes the pci flavor necessary
13:43:36 <ijw> It could potentially be used to hold the PCI information and make that go away, but there are open questions as to whether that's a good idea independently
13:44:05 <ijw> johnthetubaguy: maybe you could explain what you think a PCI flavor is?
13:44:33 <johnthetubaguy> I guess it's changed again, just trying to re-read the doc
13:44:57 <johnthetubaguy> I assume PCI flavor is a description of a set of PCI devices the user is requesting
13:45:06 <johnthetubaguy> if not, we shouldn't call it a flavor
13:45:26 <ijw> No, it's a description of a type of device that the user is requesting
13:45:40 <johnthetubaguy> well, that's what I was trying to say
13:45:51 <johnthetubaguy> its the description of the device the user is requesting
13:46:26 <ijw> More accurately, it's a match expression for one of the types of device that a user is requesting for an instance
13:46:43 <johnthetubaguy> host aggregates also maps in a secondary whitelist, effectively, which is possibly annoying, possibly useful
13:46:58 <irenab> ijw: but type may have some labels, not physical properties
13:47:40 <ijw> yup, though that's because we specify them on the backend at the moment and the labels are a convenience for scheduling (though host aggregates doesn't really solve the Neutron information issue)
13:48:15 <johnthetubaguy> either way, I reckon you need a way to say this group of devices is attached to this group of neutron networks
13:48:24 <johnthetubaguy> I would add that to metadata in the host aggregate
13:48:41 <ijw> johnthetubaguy: we do and at the moment it's in the pci_information extra_info
13:49:34 <ijw> Which is just combining it with the whitelist on the compute node rather than bringing it to the DB (and again, this is intimately bound with the physical setup of your cloud so I think it *is* config, inasmuch as we've drawn a line between config and DB content)
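A sketch of the compute-node configuration ijw is describing: a whitelist-style match plus extra_info that records the Neutron attachment. The option name pci_information and the extra_info idea come from the blueprint, but the exact syntax and key names here are assumptions for illustration:

    # nova.conf on the compute node (illustrative syntax and key names)
    # "match": which local devices to expose; "extra_info": tags such as the
    # provider network the cards are physically wired to
    pci_information = { "match": {"vendor_id": "8086", "product_id": "10fb"},
                        "extra_info": {"physical_network": "physnet1"} }

As ijw argues, this stays in config because, like provider network wiring, it is a fixed property of the individual compute node.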
13:51:02 <johnthetubaguy> right, well I guess that would get joined at the scheduler if it's not in the PCI flavor, which sucks, but still works I guess
13:51:34 <ijw> Seems only logical to me - in much the same way that hosts report their CPU type, instances demand CPU type and the scheduler finds a host that meets demand
13:52:04 <ijw> Here, hosts report PCI device summaries, instances demand a certain number and type of PCI devices and the scheduler attempts to meet demand
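A minimal sketch of that matching step, assuming hosts report their devices as summary pools with counts and each requested PCI flavor contributes a (spec, count) request; this is illustrative Python pseudologic, not Nova's actual scheduler code:

    # Illustrative only: can this host's reported PCI summary meet the request?
    # pci_pools: e.g. [{"vendor_id": "8086", "product_id": "10fb", "count": 4}]
    # requests:  e.g. [{"spec": {"vendor_id": "8086"}, "count": 2}]
    def host_satisfies(pci_pools, requests):
        available = [dict(pool) for pool in pci_pools]      # don't mutate input
        for request in requests:
            needed = request["count"]
            for pool in available:
                if all(pool.get(k) == v for k, v in request["spec"].items()):
                    take = min(needed, pool["count"])
                    pool["count"] -= take
                    needed -= take
                    if needed == 0:
                        break
            if needed > 0:
                return False   # this flavor's demand cannot be met on this host
        return True

The scheduler would run something like this per candidate host, in the same way CPU and memory demands are matched today.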
13:52:17 <johnthetubaguy> Anyways, I am afraid I have spent too long reviewing this; it's going to need a strong argument around this persistence story to get the blueprint approved, we can agree on that
13:53:00 <ijw> I think I'd like to talk to you after the meeting about this - I don't think we're really seeing eye to eye on what PCI flavors are for.
13:53:32 <irenab> ijw: please share the summary on ml
13:53:37 <ijw> irenab: will do
13:53:38 <johnthetubaguy> Yeah, I am being fast and loose with everything to make it fit what we have in nova
13:53:53 <heyongli> ijw, johnthetubaguy, if you guys talk, share that with us.
13:54:01 <sgordon_> so what is the canonical location for the up to date design document?
13:54:13 <baoli> The sticking point is the count, which a host aggregate doesn't maintain; therefore it can only tell that a host can potentially meet the demand, but exact selection of a host based on exact resource demand cannot be done by a host aggregate in its current state.
13:54:27 <ijw> johnthetubaguy: if host aggregates seemed like the right concept then I would be entirely with you, absolutely, but that doesn't seem to be what we're defining with a flavor and that's why I'd like to go over it a bit more
13:54:50 <johnthetubaguy> afraid that will have to be something later in the week, I have to do some coding this afternoon, really need to get some code pushed up before a meeting at 15:00 UTC
13:55:03 <ijw> baoli: not really - the sticking point is more that host aggregates group hosts and nothing in the current design groups hosts at all
13:55:23 <johnthetubaguy> I can't clearly see all the concepts described in this doc:
13:55:29 <johnthetubaguy> https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support#API_interface
13:55:38 <johnthetubaguy> they are all there, it's just a bit mixed in with other things
13:55:50 <baoli> ijw, you mean pci doesn't group hosts?
13:55:55 <ijw> no
13:56:06 <ijw> not as it defines resources and not as it defines PCI flavors, no
13:56:18 <johnthetubaguy> describing the use cases, the user CLI calls, then defining all the concepts and where they are persisted, then describing how the admin sets stuff up, would really help
13:56:22 <baoli> that's interesting
13:56:48 <ijw> We end up in the scheduler with a set of machines meeting the resources, but that's the only machine set involved and we didn't create that
13:57:00 <sgordon_> johnthetubaguy, +1
13:57:07 <johnthetubaguy> for me, PCI flavor is the user concept of what resources they can request
13:57:32 <johnthetubaguy> the admin has to map that to reality, and the rest is implementation details
13:57:40 <irenab> johnthetubaguy: user tenant or user admin?
13:57:48 <ijw> It is, but they can request multiple PCI flavors per instance and multiple devices per flavor so it's not too close to anything else we're using
13:58:07 <ijw> Though I do see how you might see that being an intersection of host aggregates
13:58:26 <johnthetubaguy> its a user's concept of a PCI device they are requesting
13:58:30 <johnthetubaguy> I see that as the key bit
13:58:45 <johnthetubaguy> anyways, I think splitting that wiki into those sections I described should help
13:58:47 <ijw> Indeed, and an administrator's offering
13:59:02 <johnthetubaguy> where admin is the guy defining flavors and setting up the hosts
13:59:13 <johnthetubaguy> user is the person who only calls nova boot
13:59:18 <ijw> yup - though the events can be separated
14:00:10 <heyongli> time is up..
14:00:15 <ijw> Yup
14:00:31 <ijw> #action johnthetubaguy and ijw to discuss host aggregates, alternatives to a flavor table
14:00:35 <ijw> #endmeeting