14:00:15 <edleafe> #startmeeting nova_scheduler
14:00:16 <openstack> Meeting started Mon Jun  6 14:00:15 2016 UTC and is due to finish in 60 minutes.  The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:20 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:22 <cdent> o/
14:00:23 <rlrossit> o/
14:00:23 <mriedem> o/
14:00:24 <doffm> o/
14:00:25 <mlavalle> o/
14:00:26 <takashin> o/
14:00:28 <Yingxin> o/
14:00:42 <edleafe> Good crowd today
14:00:47 <johnthetubaguy> o/
14:01:00 <edleafe> I'm a little unprepared, as I was out most of last week
14:01:14 <edleafe> No one updated the agenda, so we'll have to wing it
14:01:20 <alex_xu> o/
14:01:27 <edleafe> #topic Specs
14:01:39 <edleafe> Is there anything to discuss regarding specs?
14:01:57 <johnthetubaguy> so spec freeze happened last week
14:02:13 <johnthetubaguy> do we have a good list of the priority spec we still want to happen?
14:02:23 <johnthetubaguy> I guess the etherpad covers some of that
14:02:40 * bauzas waves
14:03:11 <bauzas> I think we possibly need to amend some already approved specs
14:03:20 <bauzas> plus the allocations spec
14:03:27 <jaypipes> o/
14:03:29 <bauzas> given what jaypipes wrote
14:03:39 <bauzas> to clarify how the RT is updating its stats
14:03:52 <bauzas> well, I should abandon the wording "stats"
14:03:52 <jaypipes> bauzas: I'm almost done with the amend to the generic-resource-pools spec.
14:03:59 <bauzas> jaypipes: coolness ++
14:04:13 <bauzas> jaypipes: thanks for the nice catch-up email btw., nice to dig into it
14:04:51 <edleafe> jaypipes: anything you need to discuss/clarify here?
14:04:53 <bauzas> so, I was saying there is a major implementation change (not really a design modification) about how the RT is sending its inventories
14:05:30 <bauzas> see the tl;dr in jaypipes's email of this morning/afternoon/<pick your TZ>
14:05:52 <bauzas> plus some cdent's open question about which interface we should use for that
14:06:03 <bauzas> (I just gave MHO to that)
14:07:08 <cdent> Yeah, my comments there are trying to draw out people's opinions, get things clarified, etc
14:07:12 <jaypipes> edleafe: no, just that the resource-providers-allocations spec will basically be overhauled.
14:07:22 <alaski> o/
14:07:43 <mlavalle> jaypipes: is it going to be a new gerrit patchset?
14:07:49 <jaypipes> edleafe: since we will not try to do the migration in the Instance objects themselves but instead rely on a duplicate call to the placement API from the resource tracker to add allocation information via the placement API.
14:07:58 <jaypipes> mlavalle: talking about the spec :)
14:08:13 <jaypipes> mlavalle: but yes, it will likely be a new patchset.
14:08:29 <mlavalle> jaypipes: thanks, that is what I meant
14:08:47 <edleafe> OK, thanks. We can continue discussion on the ML
14:09:16 <edleafe> #topic Reviews
14:09:34 <edleafe> Anyone have anything to bring up about code reviews?
14:09:36 <bauzas> well, reviews, I'm a bit on-hold now :)
14:10:03 <bauzas> jaypipes: cdent: so AFAICS, there is a patch series starting with the Allocation object that ends up with us having a new placement endpoint, correct?
14:10:30 <cdent> bauzas: s/endpoint/service/
14:10:41 <cdent> but yes
14:10:55 <cdent> and much of it needs to be -W, because of the stuff we've talked about earlier today and late last week
14:10:59 <cdent> but some of it is still stable
14:11:09 <bauzas> well, I don't think that would be a lot impacted
14:11:10 <cdent> later today I'm going to extract the stable bits
14:11:18 <bauzas> we can still merge https://review.openstack.org/#/c/282442/
14:11:32 <cdent> bauzas: no, that one is wrong too
14:11:37 <cdent> will mark it -W now
14:11:38 <bauzas> and then, I could see how we would end-up with us having that new endpoint
14:11:41 <cdent> actually I can't, will -1 it
14:11:43 <bauzas> cdent: why so ?
14:12:06 <bauzas> oh
14:12:07 <jaypipes> bauzas: well, what we've come up with is a plan to put most methods on the ResourceProvider object itself.
14:12:37 <jaypipes> bauzas, edleafe: for instance, have a ResourceProvider.update_inventory() method and a ResourceProvider.associate_aggregate() call, etc.
14:12:37 <bauzas> jaypipes: okay, I guess it's what I haven't read yet in your summary email :)
14:12:51 <bauzas> gotcha
14:12:59 <jaypipes> bauzas: no, that was a conversation I had with cdent *after* I sent my email. :)
14:13:07 <bauzas> graaaah
14:13:23 <bauzas> I officially state here that I have free time for helping you :)
14:13:28 <jaypipes> bauzas: will send another ML post after I finalize that decision with dansmith
14:13:36 <jaypipes> bauzas: understood.
14:13:48 <bauzas> so, just lemme know so I could bite a bit of that big cake :)
14:13:49 <jaypipes> bauzas: will need your review help this week more than anything else.
14:14:01 <jaypipes> bauzas: also...
14:14:45 <bauzas> (still waiting about what the supreme secret of the universe being...)
14:14:50 <jaypipes> edleafe, bauzas: I have changed the REST API from /resource_pools to /resource_providers to standardize the terminology used. from last week's discussion with the Ironic team folks, it was clear the terminology was inconsistent and confusing.
14:15:05 <bauzas> jaypipes: k, wfm
14:15:14 <edleafe> yeah, makes sense
14:16:03 <bauzas> jaypipes: cdent: one thing is confusing me, do we need a new endpoint or a new *service* ?
14:16:08 <bauzas> ie. a new port ?
14:16:28 <cdent> bauzas: it's been described as a new port from the start, to enable later extraction
14:16:43 <bauzas> I mean, the more can do the less, but that mostly impacts a lot of ops
14:17:10 <cdent> and at midcycle when we decided new port, we also decided "use less of nova wsgi architecture"
14:17:15 <bauzas> we can still have a totally separate branch that would be behind a single endpoint, without requiring a new service
14:17:45 <bauzas> cdent: sure, but you know that every new service we create is just a clear PITA for packagers and ops running our infra ?
14:18:18 <cdent> That may be, but we decided this back in January.
14:18:31 <jaypipes> bauzas: it's a new service.
14:18:35 <cdent> That was one of the few things we agreed on.
14:18:43 <bauzas> cdent: in Bristol ? I should have been sleeping by then :)
14:19:03 <cdent> You snooze you lose :)
14:19:11 <dansmith> jaypipes: by service you mean a new thing in the catalog and a new port?
14:19:24 <dansmith> long term that's certainly the expectation, but not sure we need to do that _now_
14:19:32 <dansmith> we can, of course, but..
14:19:36 <bauzas> dansmith: my point, thanks for clarifying
14:20:27 <johnthetubaguy> the alternative is not use the service catalog for now and a CONF value instead?
14:20:41 <cdent> At the time, the fear was that if we didn't do it from the outset, then we would have to maintain the halfway-way forever.
14:20:53 <johnthetubaguy> or find it relative to nova?
14:20:55 <cdent> So it would be better to do a clean bit of newness.
14:21:12 <bauzas> johnthetubaguy: I'm fine with having the placement API behind a separate endpoint, that's what we agreed
14:21:14 <dansmith> I guess I'm a bit afraid to commit to a new thing in the catalog and a new port at the moment
14:21:28 <bauzas> I'm only concerned by having a new n-something with a dedicated port
14:21:33 <dansmith> the nice thing is it means we don't have to add rpc from the api to that service if we go straight for a new port,
14:21:55 <dansmith> but that's a pretty weak decision, vs what hedging gives us in terms of being able to evolve the course
14:22:14 <cdent> \b
14:22:18 <dansmith> IMHO, if it's a separate endpoint we're not committed to anything long-term
14:22:18 <cdent> Ooops.
14:22:43 <cdent> Before last week's re-ordering, the hedging was being done by not using the API at all, initially
14:23:12 <dansmith> and as I said last week, we don't *have* to use the API in newton
14:23:19 <dansmith> jaypipes kinda codified the decision in his summary,
14:23:40 <dansmith> but I was just explaining that I had expected we'd get there in newton and avoid the rpc upcall
14:24:42 <cdent> It sounds like we still have quite a bit of getting-on-the-same-page to be doing.
14:24:46 <dansmith> I think the real sticking point for the api comes in where neutron needs to report their resources right?
14:25:25 <dansmith> seems like maybe jaypipes has gotten pulled away.. not sure we're still making progress here...
14:25:41 <jaypipes> I
14:26:04 <jaypipes> sorry, by service I mean a new thing in the catalog and a new port, yes.
14:26:35 <dansmith> heh
14:26:44 <bauzas> um
14:27:10 <bauzas> I wish things would have been clearer before, because that's really concerning me :/
14:27:20 <jaypipes> bauzas: why?
14:27:48 <bauzas> jaypipes: because there are 2 possibilities with that
14:27:57 <jaypipes> bauzas: we're going to need a separate scheduler service in the catalog when the REST API for placement exists.
14:28:28 <bauzas> #1 either we keep the current wsgi stack and just add a new service that would use the existing stack for running the new namespace
14:28:58 <bauzas> but that sounds like a huge operator impact for something that could be running on the same workers
14:29:57 <bauzas> #2 or we assume the existing wsgi stack has kind of a tech debt and we deploy a new framework, but that would be terrible because we would have 2 ways of writing REST resources within one single repo
14:30:54 <bauzas> creating a new service means more than allocating a new port, operational-wise :)
14:31:03 <bauzas> in particular for packagers and deployers :)
14:31:23 <jaypipes> bauzas: we discussed a lot of that a while ago with sdague and determined it would actually be good to *not* have the placement API code inherit the Nova baggage and instead be a totally separate API service endpoint, just housed in /nova/api/openstack/placement instead of nova/api/openstack/compute
14:31:43 <bauzas> jaypipes: I agree with having its own endpoint
14:31:51 <bauzas> even, 100% to that
14:32:07 <jaypipes> bauzas: I don't really see any benefit to keeping the placement API within the same nova-os-compute-api endpoint.
14:32:18 <bauzas> jaypipes: don't get me wrong
14:32:26 <bauzas> jaypipes: not talking of not having /placement
14:32:47 <bauzas> jaypipes: just talking of it running behind our single n-api service or not
14:33:05 <jaypipes> bauzas: we don't currently have a "single n-api" service, though.
14:33:15 <bauzas> and AFAIR, we agreed on having it a separate endpoint, I'm fine with that :)
14:33:19 <jaypipes> we have nova-os-api-compute and nova-api-metadata services.
14:33:30 <bauzas> right
14:34:42 <sdague> bauzas: this definitely should start as a 3rd API on the network
14:34:53 <sdague> otherwise, the split requires a proxy service
14:35:33 <johnthetubaguy> yeah, it has to be as separate as nova-api and metadata
14:35:39 <johnthetubaguy> at a minimum
14:36:02 <sdague> johnthetubaguy: right
14:36:02 <bauzas> sdague: I see, because our endpoint is /v2.1/<resource>, not /v2.1/os-api-compute/<resource> ?
14:36:25 <sdague> bauzas: well, for a lot of reasons, but yes, that's a symptom
14:36:39 <sdague> os-api (compute) has one router
14:36:41 <bauzas> okay, thanks for clarifying
14:36:45 <sdague> you can't really mix routers
14:37:05 <bauzas> k, I see it now
14:37:18 <jaypipes> bauzas: to be clear, this would be an API service running on a totally different port or top-level directory.
14:37:20 <sdague> and I think we've been pretty clear on this approach since ... at least bristol
14:37:42 <johnthetubaguy> it seems like we need to expose this extra service to deployers eventually, seems like doing it now is the easiest path long term
14:37:59 <edleafe> I wasn't at Bristol, but I do remember this being one of the outcomes
14:38:02 <johnthetubaguy> agreed there is significant short term pain
14:38:10 <dansmith> I was getting lost in the terminology here.. I agree it needs to be a thing peer to compute and metadata
14:38:21 <bauzas> sdague: I think I mixed endpoint and service
14:38:35 <bauzas> I clearly remember us talking about the placement API being a separate endpoint
14:38:39 <johnthetubaguy> now that doesn't mean we have to get it in the service catalog right away, I guess?
14:38:43 <sdague> bauzas: right, which is what this means
14:38:46 <sdague> johnthetubaguy: yes
14:38:50 <sdague> type=placement
14:38:51 <bauzas> sdague: okay
14:39:40 <bauzas> can we re-use the existing WSGI stack we have or do we need to somehow run a totally different router ?
14:40:11 <sdague> that is up to whoever is doing it
14:40:16 <sdague> you could do either
14:40:22 <cdent> bauzas: In the POC at https://review.openstack.org/#/c/293104/ I'm using a different router because it is much simpler than the Nova Routes mode
14:40:30 <bauzas> okay, because the latter is a bit worrying me
14:40:35 <dansmith> my concern stems really from putting the build-a-new-api-from-new-parts in the critical path
14:41:16 <johnthetubaguy> dansmith: yeah, that's a good concern, it seems like we could create the API in parallel, if we cheat for the short term
14:41:21 <cdent> dansmith: I agree that's a valid concern, but one of the things I've taken great pains to do is make sure that the code being used is very straightforward and small. _much_ more so than the compute-api
14:41:49 <cdent> As the API is currently defined, there's no need for it to be super complicated.
14:41:53 <sdague> for reference, this is where the service split between os-api and md happens today - https://github.com/openstack/nova/blob/903731e7a145eb3cd27e16461de83fdbab1baf03/nova/cmd/api.py#L52-L61 it's a super early split in the workers
14:42:12 <jaypipes> johnthetubaguy: it's those "cheats" that involve using the ovo indirection_api that I believe add too much cruft to the solution that long-term will need to be undone.
14:42:47 <johnthetubaguy> well that is the trade off here
14:42:49 <dansmith> jaypipes: I totally don't understand what you just said
14:43:30 <johnthetubaguy> well, not sure I understand the cost of the "cruft", it doesn't seem too big, as it uses existing infrastructure, but I assume I am missing something?
14:43:42 <bauzas> cdent: well, the problem with using a new router can be seen with things like https://review.openstack.org/#/c/293104/50/nova/api/openstack/placement/handlers/aggregate.py
14:43:51 <jaypipes> dansmith: the "cheats" that johnthetubaguy is referring to is not using the HTTP API at all in Newton and instead directly using InventoryList objects et al from the Nova resource tracker and using in-object-trickery to send inventory/allocation data to one place or the other.
14:44:13 <bauzas> cdent: that's a different way of coding that needs me to having 2 different mindsets for reviewing depending on which namespace I'm looking at
14:44:39 <bauzas> (plus the fact that we're duplicating the aggregates REST resource, which is a bit worrying me, but out of that convo now)
14:44:49 <dansmith> jaypipes: I don't get the "in-object-trickery" part.. you mean sending inventory updates to the api db?
14:45:23 <cdent> (bauzas let's talk about that outside this meeting, because I think I can change your mind, at least a little bit)
14:45:32 <bauzas> k
14:45:35 <jaypipes> dansmith: yes. via the ComputeNode object instead of having totally separate objects for the placement service and for Nova.
14:46:06 * dansmith is still confused
14:46:30 <johnthetubaguy> can't we just call the placement service code, pre split?
14:46:31 <jaypipes> dansmith: in other words, the stuff I pushed a revert up for, I don't think that way of doing things is good. I believe we need totally different sets of objects in Nova and in the placement service (the split-out scheduler).
14:46:55 <johnthetubaguy> at least, that's what I expected to happen in that scheduler client seam
14:48:04 <jaypipes> johnthetubaguy: well, that's my point... we don't *have* a placement service until we get the REST API done and we can't get that REST API done until the object interfaces are done. And we can't get the object interfaces done until we decide how to handle data in the old ComputeNode object vs. the new ResourceProvider object..
14:49:15 <sdague> jaypipes: why can't you have a REST API until the object interfaces are done? There should be some decoupling right
14:49:20 <bauzas> ++
14:49:32 <johnthetubaguy> sdague: that's my thinking too
14:49:48 <jaypipes> johnthetubaguy: so my proposal to the ML was to work on the object interfaces for the ResourceProvider object (InventoryList, AllocationList, etc), get those defined to the point where they make sense for the split-out placement service, then get the REST API for placement finalized to use those object definitions and the API database in Nova, and then update the resource tracker to call
14:49:54 <jaypipes> that placement REST API in addition to its existing call to ComputeNode.save() whenever inventory changes.
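[Note on the proposal above] The ordering jaypipes describes ends with the resource tracker writing inventory twice: once through the existing ComputeNode.save() path and once through the placement REST API. A minimal sketch of that dual write might look like the following. Every class and function name here is hypothetical and stands in for real Nova code, no HTTP request is made, and the "/inventories" path suffix is an assumption; only the /resource_providers prefix comes from the discussion.

```python
# Hypothetical sketch of the "dual write" idea; none of this is Nova code.


class FakeComputeNode:
    """Stand-in for the existing ComputeNode object."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.inventory = {}
        self.saved = []

    def save(self):
        # Stand-in for ComputeNode.save(): record a snapshot of the inventory.
        self.saved.append(dict(self.inventory))


class FakePlacementClient:
    """Stand-in for a placement REST API client (URL layout is assumed)."""

    def __init__(self):
        self.requests = []

    def put_inventory(self, provider_uuid, inventory):
        # A real client would issue an HTTP request here; we only record it.
        path = "/resource_providers/%s/inventories" % provider_uuid
        self.requests.append((path, dict(inventory)))


def update_inventory(node, placement, new_inventory):
    """Write inventory through the old path and, in addition, to placement."""
    if new_inventory == node.inventory:
        return False  # nothing changed, so neither path is called
    node.inventory = dict(new_inventory)
    node.save()  # existing behaviour is kept
    placement.put_inventory(node.uuid, new_inventory)  # additional call
    return True
```

In this sketch the old write path stays untouched, so the placement call could be removed or reworked later without changing what ComputeNode.save() does.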
14:50:39 <jaypipes> sdague: we could have the placement REST API just directly operate against the API DB instead of using Nova objects. Is that your suggestion?
14:51:06 <sdague> well, we know from history that what we need in the data layer and the REST layer changes over time
14:51:14 <bauzas> given Ironic in particular is actively waiting for the generic-rp to be implemented (plus dynamic-rc), I'd suggest to not wait for a potential split-out that's blocking us, and rather iterate on things we can do quickly
14:51:25 <sdague> and the REST layer needs certain guarantees to users and has to change more slowly
14:51:40 <jaypipes> bauzas: Sylvain, that is precisely what I am trying to do: make some progress.
14:51:44 <bauzas> because I really want us to not have the Ironic host/node relationship to be kept again a couple of cycles
14:52:01 <cdent> From my standpoint, there are a lot of good ideas here, but _none_ of them are being communicated clearly and completely. Can everybody be good and join the ML thread with more complete dumps of their concerns and ideas?
14:52:14 <bauzas> jaypipes: sure, I'm just trying to say that generic-rp is what we want to achieve, not the split-out (yet)
14:52:16 <jaypipes> bauzas: I don't either
14:52:41 <bauzas> it's certainly a nice side effect that we could do that in the same time, but we shouldn't block us because of that
14:53:26 <jaypipes> bauzas: I am at a loss right now.
14:53:50 <jaypipes> bauzas: all I want to do is make forward progress but every time I suggest something you say something like "but that would be too much for operators to deal with".
14:54:15 <jaypipes> bauzas: how do you propose to make progress here exactly?
14:54:35 <bauzas> jaypipes: I totally apologize if you feel that, I was confused by the reason behind a new service, but sdague clarified that
14:55:27 <edleafe> We only have 5 minutes left. We should continue in -nova and/or the ML
14:55:36 <edleafe> #topic Opens
14:55:38 <bauzas> jaypipes: my point is, can't we just add those Inventory and Allocation objects and then work on having the REST API implemented
14:55:50 <edleafe> Anything (different) to discuss?
14:57:14 * edleafe hears crickets
14:57:36 <cdent> silence means security
14:57:45 <edleafe> OK, so let's continue this on the ML, where we can probably express our ideas more clearly
14:57:51 <edleafe> #endmeeting