14:00:18 <efried> #startmeeting nova_scheduler
14:00:19 <openstack> Meeting started Mon Aug 13 14:00:18 2018 UTC and is due to finish in 60 minutes.  The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:22 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:29 <takashin> o/
14:00:30 * efried waves smelling salts at cdent
14:00:32 <alex_xu> o/
14:00:38 <tssurya> o/
14:00:45 <tetsuro__> o/
14:01:39 <efried> #link Agenda https://wiki.openstack.org/wiki/Meetings/NovaScheduler#Agenda_for_next_meeting
14:01:52 <efried> jaypipes said he wouldn't make it today. So I gave him all the reviews.
14:01:59 <edleafe> \o
14:02:25 <efried> #topic last meeting
14:02:25 <efried> #link last minutes: http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-08-06-14.00.html
14:02:25 <efried> Any old business to discuss that's not elsewhere on the agenda?
14:02:48 <gibi> o/
14:03:29 <efried> #topic specs and review
14:03:29 <efried> #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-August/132852.html
14:03:29 <efried> Nobody picked this up last week. I feel the loss... maybe even enough to pick it up next week. We shall see.
14:05:19 <efried> #link reshaper series: https://review.openstack.org/#/q/topic:bp/reshape-provider-tree+status:open
14:05:19 <efried> Stein is open, so let's get this merged ASAP.
14:05:19 <efried> I believe jaypipes and gibi are on the hook here. Other eyeballs from those who were involved in the planning stages and haven't looked lately (dansmith mriedem bauzas edleafe) would be appreciated.
14:06:29 <efried> #link Gigantor SQL split and debug logging: https://review.openstack.org/#/c/590041/
14:06:44 <efried> This has undergone rudimentary scale/perf testing and has so far been shown not to be significantly slower than the original (yay)
14:06:44 <efried> Needs scrutiny wrt content and abundance of log messages, please review.
14:07:19 <jaypipes> hey, I'm here now...
14:07:21 <cdent> when I left on friday review had identified some edge cases. did those bugs get additional test coverage?
14:07:34 <jaypipes> thought I'd have to miss the meeting, but was able to catch a bit.
14:08:19 <alex_xu> efried: where can I see the perf testing result?
14:08:27 <efried> cdent: There was a logic bug with multiple forbidden traits: the new algo was using ^ instead of -. A patch to expose the bug was merged and the code has been fixed.
14:08:58 <efried> #link Test case for multiple forbidden traits https://review.openstack.org/#/c/591070/
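For context on the `^`-vs-`-` mix-up mentioned above, here is a minimal sketch (hypothetical trait names, not the actual placement code) of how the two set operators diverge once more than one forbidden trait is involved:

```python
# Hypothetical trait sets illustrating the operator mix-up described above
# (not the actual placement code or real trait data).
provider_traits = {"HW_CPU_X86_AVX", "CUSTOM_FOO"}
forbidden = {"CUSTOM_FOO", "CUSTOM_BAR"}

# Intended: set difference drops exactly the forbidden traits.
remaining = provider_traits - forbidden   # {"HW_CPU_X86_AVX"}

# Buggy: symmetric difference also *adds* forbidden traits the provider
# never had -- the divergence only shows up with multiple forbidden traits.
wrong = provider_traits ^ forbidden       # {"HW_CPU_X86_AVX", "CUSTOM_BAR"}
```

With a single forbidden trait that the provider happens to have, the two operators can coincide, which is why the test case above was needed to expose the bug.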
14:09:14 <efried> alex_xu: There are numbers in some of the patches, stand by...
14:09:39 <cdent> I wonder if we can programmatically figure out (perhaps just with coverage?) that we're getting the corners
14:10:01 <jaypipes> alex_xu: short version: not doing nested means performance is much better.
14:10:11 <efried> alex_xu: https://review.openstack.org/#/c/590041/ see around PS4
14:10:16 <alex_xu> jaypipes: ah...right
14:10:23 <efried> alex_xu: Oh, jaypipes is talking about something different. Stand by...
14:10:25 <alex_xu> efried: thanks, checking
14:11:23 <efried> alex_xu: https://review.openstack.org/#/c/590388/ around PS3 commentary there's an eavesdrop link.
14:11:50 <efried> alex_xu: There's a topic in open discussion to talk more about scale/perf in general.
14:12:05 <alex_xu> efried: got it, thanks
14:12:48 <efried> Those two patches ^ ought to be backported to rocky once we're past the (bizarre and arbitrary) RC/release phase.
14:13:29 <efried> Planning/Doing support in nova/report client for:
14:13:30 <efried> #link consumer generation handling (gibi): https://review.openstack.org/#/c/583667/
14:13:30 <efried> Still WIP (code almost done, gibi is splitting into multiple patches)
14:14:03 <gibi> there will be questionable api behaviors there
14:14:13 <efried> gibi: Nova API or placement API?
14:14:34 <gibi> when the report client gets consumer conflict it retries but at some point it can run out of retry and needs to fail
14:14:38 <gibi> efried: nova API
14:14:51 <gibi> I will write up a list of questions in a mail
14:14:51 <efried> yes, I imagine that will look like the non-retry failure would have looked before, right?
14:15:12 <efried> that's what I would expect anyway
14:15:30 <gibi> efried: we don't handle today when a PUT allocations (that used to delete allocation) fails
14:15:41 <gibi> efried: and many other failure cases
14:16:00 <efried> "handle" in what sense? You mean today we let the exception bubble up through the resource tracker, where it gets ignored?
14:16:03 <gibi> efried: there are plenty of TODO in the above patch in the func test detailing them
14:16:15 <gibi> efried: today we are silent in most cases
14:16:20 <efried> yup
14:16:39 <gibi> efried: or go to error in some migration cases like revert_resize
14:16:54 <gibi> efried: but if revert_resize fails then the end user might want to retry
14:17:17 <gibi> anyhow, plenty of edges I have to list and ask for opinion
14:18:21 <efried> This is great to have you digging and auditing and finding these holes. Pepper the code with TODOs. Call out behavioral changes in commit messages. Do the ML thing.
14:18:37 <gibi> efried: will do
14:18:38 <efried> Thanks gibi
14:18:55 <cdent> yeah, even if we don't fix everything now, todo-ing what you notice is +many
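The retry-then-fail pattern gibi describes (retry on consumer generation conflict, then surface the failure instead of staying silent) can be sketched generically; the exception class, callable, and retry limit here are hypothetical stand-ins, not the actual report client code:

```python
class ConsumerConflict(Exception):
    """Stand-in for placement's consumer generation conflict (hypothetical)."""


def put_allocations_with_retry(put_allocations, max_retries=3):
    """Retry a conflicting PUT a bounded number of times, then give up.

    `put_allocations` stands in for one PUT /allocations attempt; it
    should raise ConsumerConflict on a generation mismatch.
    """
    for _ in range(max_retries):
        try:
            return put_allocations()
        except ConsumerConflict:
            # A real client would refetch the consumer generation here.
            continue
    # Out of retries: surface the failure rather than silently ignoring it.
    raise ConsumerConflict("gave up after %d attempts" % max_retries)
```

The open question in the discussion above is what the caller (and ultimately the nova API) should do when that final exception is raised.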
14:19:17 <efried> nested and shared providers for initial allocations
14:19:17 <efried> nested and shared providers when modifying migration (and other?) allocations
14:19:17 <efried> gibi, has your work above expanded to cover any of ^^ this stuff?
14:19:45 <gibi> efried: unfortunately no
14:19:52 <efried> okay.
14:20:03 <gibi> tetsuro started with nested here https://review.openstack.org/#/c/585672
14:20:25 <gibi> there are two sets of functional test cases above it
14:20:30 <gibi> one for initial and one for move
14:20:47 <efried> oh, nice.
14:20:52 <tetsuro> will rebase it once gibi's one is split.
14:21:13 <efried> Yeah, I have those on my radar but haven't looked at them recently. But let me add them to the agenda so I don't forget (again) next week...
14:21:13 <gibi> yeah, I have to move forward as lot of good work depend on the 1.28 patch now
14:22:30 <efried> #link nested and shared providers for initial & migration (and other?) allocations, series starting at: https://review.openstack.org/#/c/585672
14:23:05 <efried> Okay, any other reviews to discuss? Or specs?
14:24:37 <efried> Oh, there's a spec I didn't put on the agenda but I'd like to mention.
14:24:37 <efried> Disclaimer: I haven't read it yet.
14:24:37 <efried> #link Placement model for PCI devices https://review.openstack.org/#/c/591037/
14:25:01 <cdent> I glanced at that, it's basically a dup of your powervm spec
14:25:03 <efried> This is a next step on the road to the "generic device management" via placement.
14:25:21 <efried> okay, that was essentially what kosamara said he was going to do.
14:26:25 <efried> For reference, cdent is referring to:
14:26:25 <efried> #link nova-powervm spec for generic device passthrough via placement: https://review.openstack.org/#/c/579359/
14:27:40 <efried> Anything else for specs/reviews?
14:27:49 <efried> #topic bugs
14:27:49 <efried> #link placement bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=placement&orderby=-id
14:28:01 <cdent> I added a new one over the weekend
14:28:20 <cdent> https://bugs.launchpad.net/nova/+bug/1786703
14:28:20 <openstack> Launchpad bug 1786703 in OpenStack Compute (nova) "Placement duplicate aggregate uuid handling during concurrent aggregate create insufficiently robust" [Medium,New]
14:28:22 <cdent> oops
14:28:24 <cdent> #link https://bugs.launchpad.net/nova/+bug/1786703
14:28:41 <cdent> not sure if it's a postgres problem, an oslo_db problem, or a placement problem
14:29:35 <efried> neat
14:29:59 <cdent> that was my reaction too
14:30:14 <efried> I can almost see that happening IRL.
14:30:21 <cdent> in a kinda "oh lookie, I made it broked!!!"
14:30:59 <efried> Okay, any other bug topics?
14:31:03 <jaypipes> efried: ack, thx for the prod on that GDM-ish spec.
14:31:06 <jaypipes> will review later.
14:31:41 <efried> #topic opens
14:31:41 <efried> Where do we want to be on that extraction thing?
14:31:41 <efried> cdent, assume this is your microphone
14:32:14 <cdent> it's just a reminder/open question. stein is open but we have a fair chunk of code pending
14:32:28 <cdent> so we can't go _now_ but do we want to go _soon_?
14:32:39 <cdent> or are we going to delay even more?
14:33:01 <efried> cdent: There's an etherpad somewhere that lists the pending steps before we can pull the trigger, right?
14:33:37 <cdent> there are several etherpads that had that kind of stuff on them, but I'm not sure we made such a list?
14:34:02 <cdent> or maybe you're remembering something I've forgotten
14:34:09 <efried> umm
14:34:34 <cdent> I'm not very good at remembering things on etherpads because they are impossible to find
14:34:50 <efried> #action cdent to find/make an etherpad (or other doc) with a burndown of remaining work items for extraction
14:34:50 <efried> ?
14:35:19 <efried> This will be useful when we get to that agenda item in denver.
14:35:30 <cdent> so, actually, no. I don't want this to be solely on me, essentially because there are many factors that I don't care about. If it was just on me we would have done it long ago.
14:35:39 <cdent> but
14:35:51 <efried> #undo
14:35:51 <efried> :)
14:35:52 <openstack> Removing item from minutes: #action cdent to find/make an etherpad (or other doc) with a burndown of remaining work items for extraction
14:36:08 <cdent> I can start an etherpad of some kind that reflects the several times I've written all this up in a blog post or on a ptg planning etherpad
14:36:47 <efried> any single point of reference would be great.
14:37:01 <cdent> but the root of my question/agenda item is more a query about our combined fortitude and commitment to doing it
14:37:32 <efried> I for one am very much supportive of extraction, but definitely don't have my head wrapped around what else needs to be done around it.
14:37:49 <cdent> have you (re)read my several blog posts and emails on the topic?
14:38:04 <cdent> would assembling those links somewhere be a useful starting point?
14:38:19 <jaypipes> cdent: didn't you have a blog post on extraction (a followup post to your original one)?
14:38:21 <efried> I read them when they come out (assuming I'm linked on them) but don't go back and reread them later unless prompted.
14:38:35 <cdent> jaypipes: yes
14:38:57 <cdent> #link placement extraction 2: https://anticdent.org/placement-extraction-2.html
14:39:01 <jaypipes> danke
14:39:16 <cdent> several of the things listed there are done
14:39:18 <efried> On one hand, yes, such a collection of links would be useful; but on the other, I don't want to be daunted by a massive volume of reading if much of it is obsolete (already completed).
14:39:37 <cdent> but again: the technical barriers are not my major concern at this point
14:39:50 <cdent> in fact not even really my minor concern
14:40:26 <jaypipes> cdent: the intestinal fortitude of the nova core team to allow the complete segregation of the placement service from nova itself is your primary question/concern.
14:40:44 <cdent> Yes, jaypipes. At some point we're going to fork, and I'd like that to be a comfortable thing to do, if at all possible.
14:41:04 <jaypipes> ack
14:41:32 <efried> Well, knowing which work items are mandatory and which are optional; knowing if any need to be done in a particular order; maybe even having some sense of how big each item is; getting names next to some of them...
14:41:44 <efried> All of that would be useful to break down barriers
14:42:03 <jaypipes> cdent: I think one thing that might make that process easier is having a placementclient library that can be easily imported into nova and replace much of the code currently in scheduler/client/report.py
14:42:15 <cdent> efried: are you asking me to do something, or are you just listing things that may need to be done?
14:42:41 <jaypipes> cdent: I think efried is pining for the list, that's all :)
14:42:58 <cdent> jaypipes: I don't see what the report client has to do with placement (the api service) being in its own repo?
14:43:14 <efried> cdent: I'm describing a thing (burndown etherpad or similar) that I think would be useful. I think you're the best person to compose it, at least initially, at least in skeletal form.
14:43:42 <cdent> k
14:43:54 <jaypipes> cdent: because having a separate client library indicates the maturity of the API service for which the client library provides an interface.
14:44:00 <efried> I agree this has nothing to do with a client library, which is a topic all of its own, and a contentious one. The mordred faction would likely argue that we shouldn't have a placementclient lib at all, ever.
14:44:36 <mordred> aroo?
14:44:42 <mordred> yup
14:44:46 <mordred> I would strongly argue that
14:44:47 <efried> The report client does stuff that's nova-specific. The vast majority of it wouldn't be appropriate for a generic lib.
14:44:55 <cdent> the api service is as mature (or at least temporarily static) as we declare it to be. I agree that if we are going to do this move we should freeze the placement api
14:44:56 <jaypipes> cdent: put another way: the fact that the reportclient is its own bastardized client library to placement is an indicator to nova core contributors that the placement API service is not yet fully distinguishable from nova itself.
14:45:21 <efried> oh, I disagree with this ^
14:45:23 <jaypipes> mordred: oh, fello.
14:45:27 <jaypipes> hello...
14:45:32 <mordred> or fello
14:45:33 <cdent> jaypipes: I agree with efried disagreeing
14:45:34 <cdent> :)
14:45:41 <mordred> but yeah - we should be deleting python-*client libraries, not making new ones
14:46:46 <jaypipes> mordred: are openstacksdk pieces separately importable into a service?
14:46:50 <mordred> yes
14:46:59 <jaypipes> without bringing in the entire collection of other clients?
14:47:09 <mordred> sdk does not touch other clients
14:47:15 <mordred> as should nothing
14:47:28 <mordred> since they make everything harder
14:47:36 <cdent> I suspect that "placementclient" lib ought to be more to do with things like ProviderTree than with api calls. But I still think that "client stuff" is orthogonal from maturity and repo separation
14:47:45 <jaypipes> ok, so since there is a placement openstacksdk client plugin, we should be able to import that without having to import any other client plugin? is that correct?
14:48:12 <jaypipes> mordred: ^
14:48:23 <mordred> well, there is no such thing as openstacksdk plugins - but placement is more than welcome to put as much placement support in sdk as desired, and we can make some people core on that area if needed
14:48:28 <efried> is that a wrapper around a ksa adapter to the placement API?
14:49:27 <mordred> and I'd be more than happy to work with y'all on figuring out the right/easiest way to make the progress you need to make
14:49:34 <jaypipes> mordred: right now, we have a straight requests+ksa raw HTTP client embedded in the scheduler/client/report.py module. what I'm looking for is the ability to do: from openstacksdk import placement as place_client (or something like that)
14:49:48 <mordred> yup. that would basically just work today
14:49:50 <efried> jaypipes: And then place_client.what()?
14:50:05 <jaypipes> efried: place_client.get_providers(), blah blah...
14:50:05 <efried> .get() .put() .post() .delete() ?
14:50:08 <mordred> conn = openstack.connect() ; conn.placement.get('/some/placement/path')
14:50:13 <mordred> no patches needed
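mordred's two-liner above can be fleshed out into a small sketch of the raw passthrough he says "would basically just work today". The cloud name and microversion are assumptions, and the import is deferred only so the sketch stays loadable without openstacksdk installed:

```python
def list_resource_providers(cloud_name="mycloud"):
    """Sketch: fetch resource providers through openstacksdk's placement proxy.

    `cloud_name` refers to an entry in clouds.yaml (hypothetical name).
    """
    import openstack  # deferred: optional dependency for this sketch

    conn = openstack.connect(cloud=cloud_name)
    # The proxy wraps a keystoneauth adapter, so plain HTTP verbs work
    # and a microversion can be requested per call via the standard header.
    resp = conn.placement.get(
        "/resource_providers",
        headers={"OpenStack-API-Version": "placement 1.17"},
    )
    resp.raise_for_status()
    return resp.json()["resource_providers"]
```

The "smarter" methods jaypipes asks for below (e.g. a `get_providers()` wrapper) would live on that same proxy object rather than in a separate python-*client library.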
14:50:48 <jaypipes> mordred: cool. maybe I can sit down with you in Denver and learn more about that.
14:50:49 <efried> bingo, there's the disconnect (so to speak). jaypipes is asking for .get_providers() etc. - basically method wrappers around particular URIs.
14:50:51 <mordred> BUT - if you want to add more smarter things, we can add those methods to the proxy/adapter object that gets attached to conn as placement
14:50:55 <mordred> jaypipes: would LOVE to do that
14:51:04 <mordred> efried: yup - we can add one of those really easily
14:51:13 <efried> Okay.
14:51:15 <efried> That's neat.
14:51:17 <efried> But
14:51:31 <mordred> and then I've been meaning to circle back around with efried about getting nova to use sdk instead of python-*client or raw ksa anyway
14:51:34 <efried> Still orthogonal to where the placement API itself sprouts from.
14:51:35 <mordred> so - yay denver!
14:51:53 <efried> mordred: Would you be able to hang out in the nova room for that topic if we had it on the agenda/
14:51:54 <efried> ?
14:52:05 <mordred> efried: absolutely
14:52:46 <cdent> we're running out of time
14:52:46 * efried adds to etherpad
14:53:18 <efried> Yeah, so back to placement API extraction itself. cdent, do you "volunteer" to take an action to create a master burndown list of action items toward extraction?
14:53:50 <jaypipes> efried: to repeat... cdent's concern is not actually with that list of things.
14:53:56 <jaypipes> efried: his concern is a metaphysical one.
14:54:07 <jaypipes> efried: and no amount of list-making will answer those concerns.
14:54:17 <efried> #link Stein PTG etherpad https://etherpad.openstack.org/p/nova-ptg-stein
14:54:20 <cdent> volunteer is a bit strong, but yes I will create that list, and yes, jay is right that I'm concerned about metaphysics, but we presumably need the list anyway
14:54:21 <efried> mordred: ^ very bottom
14:54:55 <efried> cdent: Right, because we don't need to waste time with the first wave of objections ("well, what about X, what about Y, ...")
14:55:12 <efried> because we already have X and Y listed with a plan to address.
14:55:21 <efried> okay, moving on, 5 minutes left.
14:55:34 <efried> Placement scale/perf testing in the gate (cdent)
14:55:34 <efried> #link github repo to set up a bunch of providers: https://github.com/cdent/placeload
14:55:34 <efried> #link A start on adding some kind of perf reporting to the gate: https://review.openstack.org/#/c/591367/
14:56:00 <efried> This is to follow on with earlier mentions of cdent having experimented with some scale testing
14:56:03 <jaypipes> and I was trying (terribly, apparently) to explain that one source of tension that nova core contributors have w.r.t. a placement extraction is the continued belief that nova == placement in so much as there is as yet no real consumer of placement outside of nova. it's a chicken and egg problem in a lot of ways, but the mess of the reportclient's calling into the placement service (and the very nova-centrism of that reportclient) makes it *seem* like nova == placement for all intents and purposes.
14:56:22 <efried> He identified some serious bottlenecks, which have since been fixed.
14:56:33 <cdent> neutron and blazar have significant bits of code in-progress
14:56:36 <efried> But it would be neat if this kind of testing could be more regular
14:56:43 <cdent> so yeah, perf scale stuff
14:56:45 <cdent> I've started:
14:56:55 <cdent> #link perf info in the gate: https://review.openstack.org/#/c/591367/
14:56:57 <cdent> doesn't work yet
14:57:33 <cdent> but will allow the nova-next job to report simple timing info each run (adding only seconds to the job run time)
14:57:43 <jaypipes> nice, good work cdent.
14:57:46 <cdent> I put it on nova-next because that one uses the optional placement db
14:59:16 <efried> jaypipes: Even if nova == placement for all intents and purposes, that's *no* reason to avoid extracting it. We have a demonstrated imbalance of codebase to reviewers/developers. Extracting pieces into their own components where feasible is a good way to combat that. And placement is a significant piece with some pretty clear borders that's ripe for such a move.
14:59:19 <cdent> with time it could become a bit more exciting, but at the moment I'm trying to make it as dumb as possible
15:00:10 <efried> As I mentioned earlier, that's way better than what we had before, which is nothing.
15:00:12 <jaypipes> efried: again, I'm trying to tell you what the perceptions are.
15:00:17 <efried> Okay, we're out of time.
15:00:25 <efried> jaypipes: Would love to continue the discussion in -placement
15:00:27 <efried> #endmeeting