14:00:18 #startmeeting nova_scheduler
14:00:19 Meeting started Mon Aug 13 14:00:18 2018 UTC and is due to finish in 60 minutes. The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:20 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:22 The meeting name has been set to 'nova_scheduler'
14:00:29 o/
14:00:30 * efried waves smelling salts at cdent
14:00:32 o/
14:00:38 o/
14:00:45 o/
14:01:39 #link Agenda https://wiki.openstack.org/wiki/Meetings/NovaScheduler#Agenda_for_next_meeting
14:01:52 jaypipes said he wouldn't make it today. So I gave him all the reviews.
14:01:59 \o
14:02:25 #topic last meeting
14:02:25 #link last minutes: http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-08-06-14.00.html
14:02:25 Any old business to discuss that's not elsewhere on the agenda?
14:02:48 o/
14:03:29 #topic specs and review
14:03:29 #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-August/132852.html
14:03:29 Nobody picked this up last week. I feel the loss... maybe even enough to pick it up next week. We shall see.
14:05:19 #link reshaper series: https://review.openstack.org/#/q/topic:bp/reshape-provider-tree+status:open
14:05:19 Stein is open, so let's get this merged ASAP.
14:05:19 I believe jaypipes and gibi are on the hook here. Other eyeballs from those who were involved in the planning stages and haven't looked lately (dansmith mriedem bauzas edleafe) would be appreciated.
14:06:29 #link Gigantor SQL split and debug logging: https://review.openstack.org/#/c/590041/
14:06:44 This has undergone rudimentary scale/perf testing and has so far been shown not to be significantly slower than the original (yay).
14:06:44 Needs scrutiny wrt content and abundance of log messages; please review.
14:07:19 hey, I'm here now...
14:07:21 when I left on Friday, review had identified some edge cases. did those bugs get additional test coverage?
14:07:34 thought I'd have to miss the meeting, but was able to catch a bit.
14:08:19 efried: where can I see the perf testing results?
14:08:27 cdent: There was a logic bug with multiple forbidden traits: the new algo was using ^ instead of -. A patch to expose the bug was merged and the code has been fixed.
14:08:58 #link Test case for multiple forbidden traits https://review.openstack.org/#/c/591070/
14:09:14 alex_xu: There are numbers in some of the patches, stand by...
14:09:39 I wonder if we can programmatically figure out (perhaps just with coverage?) that we're getting the corners
14:10:01 alex_xu: short version: not doing nested means performance is much better.
14:10:11 alex_xu: https://review.openstack.org/#/c/590041/ see around PS4
14:10:16 jaypipes: ah...right
14:10:23 alex_xu: Oh, jaypipes is talking about something different. Stand by...
14:10:25 efried: thanks, checking
14:11:23 alex_xu: https://review.openstack.org/#/c/590388/ around PS3 commentary there's an eavesdrop link.
14:11:50 alex_xu: There's a topic in open discussion to talk more about scale/perf in general.
14:12:05 efried: got it, thanks
14:12:48 Those two patches ^ ought to be backported to rocky once we're past the (bizarre and arbitrary) RC/release phase.
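A minimal sketch, in plain Python sets, of the bug described above: ^ (symmetric difference) toggles membership, so a provider matching more than one forbidden trait can flip back into the result, while - (set difference) only ever removes. Illustrative only, not the actual placement code; the trait names and provider IDs are made up:

```python
# Providers 1-3; provider 2 has *both* forbidden traits (hypothetical names).
all_providers = {1, 2, 3}
providers_with_trait = {
    "CUSTOM_FORBIDDEN_A": {2},
    "CUSTOM_FORBIDDEN_B": {2},
}

# Buggy: ^ toggles membership, so a provider matching an even number
# of forbidden traits ends up back in the result set.
buggy = set(all_providers)
for matching in providers_with_trait.values():
    buggy ^= matching           # {1, 3}, then back to {1, 2, 3}

# Fixed: - only ever removes.
fixed = set(all_providers)
for matching in providers_with_trait.values():
    fixed -= matching           # {1, 3} both times

assert buggy == {1, 2, 3}       # provider 2 wrongly survives
assert fixed == {1, 3}
```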
14:13:29 Planning/Doing support in nova/report client for:
14:13:30 #link consumer generation handling (gibi): https://review.openstack.org/#/c/583667/
14:13:30 Still WIP (code almost done, gibi is splitting it into multiple patches)
14:14:03 there will be questionable api behaviors there
14:14:13 gibi: Nova API or placement API?
14:14:34 when the report client gets a consumer conflict it retries, but at some point it can run out of retries and needs to fail
14:14:38 efried: nova API
14:14:51 I will write up a list of questions in a mail
14:14:51 yes, I imagine that will look like the non-retry failure would have looked before, right?
14:15:12 that's what I would expect anyway
14:15:30 efried: we don't handle today when a PUT allocations (that used to delete allocations) fails
14:15:41 efried: and many other failure cases
14:16:00 "handle" in what sense? You mean today we let the exception bubble up through the resource tracker, where it gets ignored?
14:16:03 efried: there are plenty of TODOs in the above patch in the func tests detailing them
14:16:15 efried: today we are silent in most cases
14:16:20 yup
14:16:39 efried: or go to error in some migration cases like revert_resize
14:16:54 efried: but if revert_resize fails then the end user might want to retry
14:17:17 anyhow, plenty of edges I have to list and ask for opinions on
14:18:21 It's great to have you digging and auditing and finding these holes. Pepper the code with TODOs. Call out behavioral changes in commit messages. Do the ML thing.
14:18:37 efried: will do
14:18:38 Thanks gibi
14:18:55 yeah, even if we don't fix everything now, todo-ing what you notice is +many
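A rough sketch of the retry-then-fail flow gibi describes: retry on a consumer-generation conflict (HTTP 409) by re-reading the generation, and surface a failure once retries are exhausted rather than staying silent. The client object, helper names, payload shape, and retry count are all hypothetical, not nova's actual report client code:

```python
class AllocationUpdateFailed(Exception):
    """Raised when we run out of retries on a consumer conflict."""


def put_allocations(placement, consumer_uuid, allocations, max_retries=3):
    """PUT allocations, refreshing the consumer generation on conflict.

    ``placement`` is a hypothetical REST client; the real payload also
    carries project/user IDs (trimmed here for illustration).
    """
    for _ in range(max_retries):
        # Re-read the generation on each attempt; another writer may
        # have bumped it since the last try.
        generation = placement.get_consumer_generation(consumer_uuid)
        resp = placement.put(
            '/allocations/%s' % consumer_uuid,
            {'allocations': allocations,
             'consumer_generation': generation})
        if resp.status_code != 409:  # not a generation conflict
            return resp
    # Out of retries: fail loudly instead of silently ignoring it.
    raise AllocationUpdateFailed(consumer_uuid)
```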
14:19:17 nested and shared providers for initial allocations
14:19:17 nested and shared providers when modifying migration (and other?) allocations
14:19:17 gibi, has your work above expanded to cover any of ^^ this stuff?
14:19:45 efried: unfortunately no
14:19:52 okay.
14:20:03 tetsuro started with nested here https://review.openstack.org/#/c/585672
14:20:25 there are two sets of functional test cases above it
14:20:30 one for initial and one for move
14:20:47 oh, nice.
14:20:52 will rebase it once gibi's one is split.
14:21:13 Yeah, I have those on my radar but haven't looked at them recently. But let me add them to the agenda so I don't forget (again) next week...
14:21:13 yeah, I have to move forward as a lot of good work depends on the 1.28 patch now
14:22:30 #link nested and shared providers for initial & migration (and other?) allocations, series starting at: https://review.openstack.org/#/c/585672
14:23:05 Okay, any other reviews to discuss? Or specs?
14:24:37 Oh, there's a spec I didn't put on the agenda but I'd like to mention.
14:24:37 Disclaimer: I haven't read it yet.
14:24:37 #link Placement model for PCI devices https://review.openstack.org/#/c/591037/
14:25:01 I glanced at that, it's basically a dup of your powervm spec
14:25:03 This is a next step on the road to "generic device management" via placement.
14:25:21 okay, that was essentially what kosamara said he was going to do.
14:26:25 For reference, cdent is referring to:
14:26:25 #link nova-powervm spec for generic device passthrough via placement: https://review.openstack.org/#/c/579359/
14:27:40 Anything else for specs/reviews?
14:27:49 #topic bugs
14:27:49 #link placement bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=placement&orderby=-id
14:28:01 I added a new one over the weekend
14:28:20 https://bugs.launchpad.net/nova/+bug/1786703
14:28:20 Launchpad bug 1786703 in OpenStack Compute (nova) "Placement duplicate aggregate uuid handling during concurrent aggregate create insufficiently robust" [Medium,New]
14:28:22 oops
14:28:24 #link https://bugs.launchpad.net/nova/+bug/1786703
14:28:41 not sure if it's a postgres problem, an oslo_db problem or a placement problem
14:29:35 neat
14:29:59 that was my reaction too
14:30:14 I can almost see that happening IRL.
14:30:21 in a kinda "oh lookie, I made it broked!!!" way
14:30:59 Okay, any other bug topics?
14:31:03 efried: ack, thx for the prod on that GDM-ish spec.
14:31:06 will review later.
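For context on that bug: the usual hardening for a create-vs-create race like this is to attempt the INSERT, catch the duplicate-key error the database raises (oslo.db maps it to DBDuplicateEntry), and fall back to reading the row the concurrent request created. A sketch of that pattern under those assumptions; the helper functions are hypothetical stand-ins, not placement's actual DB code:

```python
from oslo_db import exception as db_exc


def _create_aggregate(context, agg_uuid):
    """Hypothetical helper: INSERT the aggregate row; raises
    DBDuplicateEntry if the unique uuid constraint is violated."""
    raise NotImplementedError


def _get_aggregate_by_uuid(context, agg_uuid):
    """Hypothetical helper: SELECT the existing aggregate row."""
    raise NotImplementedError


def ensure_aggregate(context, agg_uuid):
    # Try to create first; if a concurrent request won the race and
    # inserted the same uuid, use its row instead of erroring out.
    try:
        return _create_aggregate(context, agg_uuid)
    except db_exc.DBDuplicateEntry:
        return _get_aggregate_by_uuid(context, agg_uuid)
```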
14:31:41 #topic opens
14:31:41 Where do we want to be on that extraction thing?
14:31:41 cdent, assume this is your microphone
14:32:14 it's just a reminder/open question. stein is open but we have a fair chunk of code pending
14:32:28 so we can't go _now_ but do we want to go _soon_?
14:32:39 or are we going to delay even more?
14:33:01 cdent: There's an etherpad somewhere that lists the pending steps before we can pull the trigger, right?
14:33:37 there are several etherpads that had that kind of stuff on them, but I'm not sure we made such a list?
14:34:02 or maybe you're remembering something I've forgotten
14:34:09 umm
14:34:34 I'm not very good at remembering things on etherpads because they are impossible to find
14:34:50 #action cdent to find/make an etherpad (or other doc) with a burndown of remaining work items for extraction
14:34:50 ?
14:35:19 This will be useful when we get to that agenda item in denver.
14:35:30 so, actually, no. I don't want this to be solely on me, essentially because there are many factors that I don't care about. If it was just on me we would have done it long ago.
14:35:39 but
14:35:51 #undo
14:35:51 :)
14:35:52 Removing item from minutes: #action cdent to find/make an etherpad (or other doc) with a burndown of remaining work items for extraction
14:36:08 I can start an etherpad of some kind that reflects the several times I've written all this up in a blog post or on a ptg planning etherpad
14:36:47 any single point of reference would be great.
14:37:01 but the root of my question/agenda item is more a query about our combined fortitude and commitment to doing it
14:37:32 I for one am very much supportive of extraction, but definitely don't have my head wrapped around what else needs to be done around it.
14:37:49 have you (re)read my several blog posts and emails on the topic?
14:38:04 would assembling those links somewhere be a useful starting point?
14:38:19 cdent: didn't you have a blog post on extraction (a followup post to your original one)?
14:38:21 I read them when they come out (assuming I'm linked on them) but don't go back and reread them later unless prompted.
14:38:35 jaypipes: yes
14:38:57 #link placement extraction 2: https://anticdent.org/placement-extraction-2.html
14:39:01 danke
14:39:16 several of the things listed there are done
14:39:18 On one hand, yes, such a collection of links would be useful; but on the other, I don't want to be daunted by a massive volume of reading if much of it is obsolete (already completed).
14:39:37 but again: the technical barriers are not my major concern at this point
14:39:50 in fact not even really my minor concern
14:40:26 cdent: the intestinal fortitude of the nova core team to allow the complete segregation of the placement service from nova itself is your primary question/concern.
14:40:44 Yes, jaypipes. At some point we're going to fork, and I'd like that to be a comfortable thing to do, if at all possible.
14:41:04 ack
14:41:32 Well, knowing which work items are mandatory and which are optional; knowing if any need to be done in a particular order; maybe even having some sense of how big each item is; getting names next to some of them...
14:41:44 All of that would be useful to break down barriers
14:42:03 cdent: I think one thing that might make that process easier is having a placementclient library that can be easily imported into nova and replace much of the code currently in scheduler/client/report.py
14:42:15 efried: are you asking me to do something, or are you just listing things that may need to be done?
14:42:41 cdent: I think efried is pining for the list, that's all :)
14:42:58 jaypipes: I don't see what the report client has to do with placement the api service being in its own repo?
14:43:14 cdent: I'm describing a thing (burndown etherpad or similar) that I think would be useful. I think you're the best person to compose it, at least initially, at least in skeletal form.
14:43:42 k
14:43:54 cdent: because having a separate client library indicates the maturity of the API service for which the client library provides an interface.
14:44:00 I agree this has nothing to do with a client library, which is a topic all of its own, and a contentious one. The mordred faction would likely argue that we shouldn't have a placementclient lib at all, ever.
14:44:36 aroo?
14:44:42 yup
14:44:46 I would strongly argue that
14:44:47 The report client does stuff that's nova-specific. The vast majority of it wouldn't be appropriate for a generic lib.
14:44:55 the api service is as mature (or at least temporarily static) as we declare it to be. I agree that if we are going to do this move we should freeze the placement api
14:44:56 cdent: put another way: the fact that the reportclient is its own bastardized client library to placement is an indicator to nova core contributors that the placement API service is not yet fully distinguishable from nova itself.
14:45:21 oh, I disagree with this ^
14:45:23 mordred: oh, fello.
14:45:27 hello...
14:45:32 or fello
14:45:33 jaypipes: I agree with efried disagreeing
14:45:34 :)
14:45:41 but yeah - we should be deleting python-*client libraries, not making new ones
14:46:46 mordred: are openstacksdk pieces separately importable into a service?
14:46:50 yes
14:46:59 without bringing in the entire collection of other clients?
14:47:09 sdk does not touch other clients
14:47:15 as should nothing
14:47:28 since they make everything harder
14:47:36 I suspect that "placementclient" lib ought to have more to do with things like ProviderTree than with api calls. But I still think that "client stuff" is orthogonal to maturity and repo separation
14:47:45 ok, so since there is a placement openstacksdk client plugin, we should be able to import that without having to import any other client plugin? is that correct?
14:48:12 mordred: ^
14:48:23 well, there is no such thing as openstacksdk plugins - but placement is more than welcome to put as much placement support in sdk as desired, and we can make some people core on that area if needed
14:48:28 is that a wrapper around a ksa adapter to the placement API?
14:49:27 and I'd be more than happy to work with y'all on figuring out the right/easiest way to make the progress you need to make
14:49:34 mordred: right now, we have a straight requests+ksa raw HTTP client embedded in the scheduler/client/report.py module. what I'm looking for is the ability to do: from openstacksdk import placement as place_client (or something like that)
14:49:48 yup. that would basically just work today
14:49:50 jaypipes: And then place_client.what()?
14:50:05 efried: place_client.get_providers(), blah blah...
14:50:05 .get() .put() .post() .delete() ?
14:50:08 conn = openstack.connect() ; conn.placement.get('/some/placement/path')
14:50:13 no patches needed
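Spelled out a bit, the zero-patch usage mordred points at looks roughly like this. A sketch assuming clouds.yaml credentials; the cloud name, resource path, and pinned microversion are illustrative:

```python
import openstack

# Credentials come from clouds.yaml; 'mycloud' is a made-up name.
conn = openstack.connect(cloud='mycloud')

# conn.placement behaves like a keystoneauth adapter, so the plain REST
# verbs work directly; pin a microversion with the standard header.
resp = conn.placement.get(
    '/resource_providers',
    headers={'OpenStack-API-Version': 'placement 1.28'})

for rp in resp.json()['resource_providers']:
    print(rp['uuid'], rp['name'])
```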
14:50:48 mordred: cool. maybe I can sit down with you in Denver and learn more about that.
14:50:51 bingo, there's the disconnect (so to speak). jaypipes is asking for .get_providers() etc. - basically method wrappers around particular URIs.
14:50:55 BUT - if you want to add more/smarter things, we can add those methods to the proxy/adapter object that gets attached to conn as placement
14:50:55 jaypipes: would LOVE to do that
14:51:04 efried: yup - we can add one of those really easily
14:51:13 Okay.
14:51:15 That's neat.
14:51:17 But
14:51:31 and then I've been meaning to circle back around with efried about getting nova to use sdk instead of python-*client or raw ksa anyway
14:51:34 Still orthogonal to where the placement API itself sprouts from.
14:51:35 so - yay denver!
14:51:53 mordred: Would you be able to hang out in the nova room for that topic if we had it on the agenda?
14:52:05 efried: absolutely
14:52:46 we're running out of time
14:52:46 * efried adds to etherpad
14:53:18 Yeah, so back to placement API extraction itself. cdent, do you "volunteer" to take an action to create a master burndown list of action items toward extraction?
14:53:50 efried: to repeat... cdent's concern is not actually with that list of things.
14:53:56 efried: his concern is a metaphysical one.
14:54:07 efried: and no amount of list-making will answer those concerns.
14:54:17 #link Stein PTG etherpad https://etherpad.openstack.org/p/nova-ptg-stein
14:54:20 volunteer is a bit strong, but yes I will create that list, and yes, jay is right that I'm concerned about metaphysics, but we presumably need the list anyway
14:54:21 mordred: ^ very bottom
14:54:55 cdent: Right, because we don't need to waste time with the first wave of objections ("well, what about X, what about Y, ...")
14:55:12 because we already have X and Y listed with a plan to address them.
14:55:21 okay, moving on, 5 minutes left.
14:55:34 Placement scale/perf testing in the gate (cdent)
14:55:34 #link github repo to set up a bunch of providers: https://github.com/cdent/placeload
14:55:34 #link A start on adding some kind of perf reporting to the gate: https://review.openstack.org/#/c/591367/
14:56:00 This follows on from earlier mentions of cdent having experimented with some scale testing
14:56:03 and I was trying (terribly, apparently) to explain that one source of tension that nova core contributors have w.r.t. a placement extraction is the continued belief that nova == placement, in so much as there is as yet no real consumer of placement outside of nova. it's a chicken-and-egg problem in a lot of ways, but the mess of the reportclient's calling into the placement service (and the very nova-centrism of that reportclient) makes it *seem* like nova == placement for all intents and purposes.
14:56:22 He identified some serious bottlenecks, which have since been fixed.
14:56:33 neutron and blazar have significant bits of code in progress
14:56:36 But it would be neat if this kind of testing could be more regular
14:56:43 so yeah, perf scale stuff
14:56:45 I've started:
14:56:55 #link perf info in the gate: https://review.openstack.org/#/c/591367/
14:56:57 doesn't work yet
14:57:33 but it will allow the nova-next job to report simple timing info each run (adding only seconds to the job run time)
14:57:43 nice, good work cdent.
14:57:46 I put it on nova-next because that one uses the optional placement db
14:59:16 jaypipes: Even if nova == placement for all intents and purposes, that's *no* reason to avoid extracting it. We have a demonstrated imbalance of codebase to reviewers/developers. Extracting pieces into their own components where feasible is a good way to combat that. And placement is a significant piece with some pretty clear borders that's ripe for such a move.
14:59:19 with time it could become a bit more exciting, but at the moment I'm trying to make it as dumb as possible
15:00:10 As I mentioned earlier, that's way better than what we had before, which was nothing.
15:00:12 efried: again, I'm trying to tell you what the perceptions are.
15:00:17 Okay, we're out of time.
15:00:25 jaypipes: Would love to continue the discussion in -placement
15:00:27 #endmeeting