#openstack-meeting-3 log

21:00:25 <dansmith> #startmeeting nova_cells
21:00:26 <openstack> Meeting started Wed Feb 21 21:00:25 2018 UTC and is due to finish in 60 minutes.  The chair is dansmith. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:28 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:30 <openstack> The meeting name has been set to 'nova_cells'
21:00:33 <tssurya> o/
21:00:36 <dansmith> I got distracted talking to belmoreira
21:00:41 <mriedem> o/
21:00:43 <dansmith> hence my 26 second tardiness
21:01:32 <dansmith> #topic bugs
21:01:53 <dansmith> we've got a pretty good set on the agenda, cultivated by tssurya: https://wiki.openstack.org/wiki/Meetings/NovaCellsv2
21:02:16 <dansmith> tssurya: I'm not sure how I feel about continuing to work on that first one, for cellsv1, to be honest
21:02:33 <melwitt> o/
21:02:46 <tssurya> dansmith : well we have that patch in production now
21:03:01 <tssurya> and we would be moving away from cellsv1 soon :)
21:03:08 <dansmith> tssurya: yeah, it just doesn't work for our test environment, hence my concern
21:03:35 <tssurya> dansmith : so maybe we just keep it as WIP ?
21:03:38 <dansmith> ack, so I think I'll just leave it up in case people need it, but not really push on it
21:03:38 <dansmith> yeah
21:03:42 <dansmith> I'll make a note on it
21:04:16 <dansmith> the rest of the bugs up there look straightforward and almost all have reviews, which melwitt just added to the priorities list, so ... review those
21:04:23 <dansmith> tssurya: any of those you want to highlight?
21:04:26 <tssurya> I would appreciate some pointers on trying to write a test case for this : https://review.openstack.org/#/c/546660/ with
21:04:36 <tssurya> respect to deleting RPs
21:04:54 <dansmith> tssurya: okay cool
21:05:37 <tssurya> dansmith : thanks,
21:05:46 <tssurya> will wait for your comments in the review then
21:06:00 <dansmith> tssurya: sure, or mriedem.. he's good with that stuff
21:06:00 <tssurya> I don't have anything else to highlight
21:06:03 <dansmith> okay
21:06:15 <tssurya> dansmith : okay
21:06:18 <dansmith> #topic open reviews
21:06:23 <dansmith> I have this set up: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/placement-req-filter
21:06:35 <dansmith> which is about a pre-filtering mechanism for the scheduler, which isn't cells-specific,
21:06:52 <dansmith> but came up because of the concerns tssurya and belmoreira had about the scheduler choking on the full result set from placement
21:07:04 <dansmith> this will let us fine-tune what we ask of placement for lots of cases
21:07:18 <dansmith> this being one solution for the cells case: https://review.openstack.org/#/c/545002/
21:07:26 <tssurya> dansmith : thanks again for doing this
21:07:29 <dansmith> specifically over tenant cell assignment
21:07:46 * melwitt adds to priorities etherpad
21:07:50 <dansmith> here's the start of another one that isn't cells-specific: https://review.openstack.org/546282
21:08:01 <dansmith> which would let us do AZs without a post-scheduler filter like we do today,
21:08:09 <dansmith> which will be way more efficient when users ask for a specific AZ
21:08:35 <dansmith> there is some placement API work that has to be done first in order for both of these to work, but it's just a parity thing and not too major
21:08:50 <tssurya> dansmith : so the placement aggregates would be modelled to accommodate the avz ?
21:09:05 <dansmith> tssurya: for the AZ thing yeah
21:09:15 <tssurya> cool
21:09:26 <dansmith> jay is working on a spec to allow mirroring of aggregate operations up to placement,
21:09:36 <dansmith> so when you add an aggregate and add hosts to it, nova will tell placement about those things
21:09:40 <dansmith> so you don't have to do everything twice
21:09:51 <belmoreira> dansmith by not cell specific is because it uses aggregates?
21:10:00 <dansmith> however, until that happens, you'd just have to make sure placement knows about the links
21:10:05 <melwitt> I need to read up on that. placement will do some aggregate stuff but not all, like metadata I assume?
21:10:15 <dansmith> belmoreira: not cells-specific because people that just use AZs today would still want this
21:10:39 <dansmith> melwitt: right, placement already has aggregates for things like knowing which computes are connected to which networks, shared storage, etc
21:10:48 <dansmith> but it's not as heavy as nova's implementation
21:10:55 <melwitt> k, cool
21:11:09 <melwitt> not going to have all of the key=value stuff in it
21:11:13 <dansmith> correct
21:11:21 <melwitt> got it
21:11:51 <belmoreira> dansmith do you think that then we can also have the cell abstraction?
21:12:15 <dansmith> belmoreira: what do you mean?
21:12:23 <mriedem> model cells in placement i assume
21:12:33 <mriedem> like ed's idea about nested providers
21:12:37 <belmoreira> for large sites aggregates are fine grained. We organize things with cells
21:12:38 <mriedem> even though cells don't provide inventory
21:13:13 <dansmith> yeah, cells don't provide inventory, which is why I think it's a bad idea to model cells as parent providers
21:13:21 <dansmith> not to mention it makes the entire deployment in one tree
21:13:24 <belmoreira> meaning that we will need to duplicate the host-cell mapping that we already have per cell for the aggregates
21:14:03 <dansmith> belmoreira: placement is definitely not going to get a cell notion
21:14:19 <dansmith> belmoreira: the closest would be nova maintaining an aggregate per cell when hosts are mapped or something
21:14:42 <dansmith> which I guess we could do, but it doesn't excite me :)
21:14:56 <belmoreira> dansmith ack :)
21:15:08 <melwitt> nested aggregates anybody?
21:15:46 <dansmith> melwitt: that's what I'll tell people when they ask why I'm applying for my next job, yeah
21:15:46 <melwitt> or wait, we can already do that
21:15:59 <belmoreira> but if not done by nova, operators will need to keep them in sync (aggregate/cell). Not easy...
21:16:00 <dansmith> no, we don't have nested aggregates, we have overlapping aggregates
21:16:05 <melwitt> but maybe AZs messes that up. anyway
21:16:21 <melwitt> overlapping is what I was thinking of
21:16:23 <melwitt> okay
21:16:25 <dansmith> belmoreira: well, operators that need cell based scheduling
21:17:01 <dansmith> belmoreira: so far you're the only one I know of like that, and the other large operators I've talked to want *more* management-via-aggregate, like the allocation ratios thing
21:17:31 <melwitt> fwiw I predict other large operators wanting it
21:17:49 <melwitt> like, if they had cells, they'd want to manage ratios per cell if they could
21:17:52 <dansmith> so, I get that your case would require some manual syncing of those concepts, and I get why that sucks, I just need to kinda get my head around what we can do about it
21:18:08 <dansmith> melwitt: they can, by defining aggregates
21:18:27 <melwitt> but what if you have multi aggregates in one cell? then can't, right?
21:18:37 <dansmith> sure
21:18:54 <dansmith> that's why I think forcing people into one per cell is wrong anyway
21:19:06 <dansmith> because some people may have smallish cells and deal with things only on the cell level,
21:19:19 <dansmith> others may have giant cells, for which no one rule applies to all things in that cell
21:19:35 <dansmith> that's why aggregates can overlap and why they have metadata and not fixed attributes
21:20:05 <dansmith> and you define aggregates around the things with similar characteristics, and assign meaning to those as appropriate
21:20:12 <belmoreira> dansmith true
21:20:34 <belmoreira> but also because we lack metadata in cells
21:21:27 <dansmith> that's intentional though, so we don't have to apply all the things we can do with aggregates to cells in a different way
21:22:06 <melwitt> yeah ... that makes sense
21:22:19 <dansmith> there are a ton of things you can do with aggregates, and replicating that onto cells is just a terribly complex undertaking
21:22:27 <mriedem> isn't a lot of this trying to shoe-horn the old multi-level cells scheduler stuff into the new flat world rather than just doing things the way we can with what we have in flat scheduling?
21:22:40 <dansmith> mriedem: yes
21:22:41 <melwitt> it might, yeah
21:22:41 <mriedem> like, in cells v1 we have 2 level scheduling and can optimize the cell that's picked,
21:22:51 <mriedem> ok
21:23:03 <dansmith> it's, IMHO, more about "tenants are fixed into these silos", which is valid
21:23:21 <dansmith> those silos used to be naturally cellsv1 cells, but I don't want to tie more meaning into a cell than we have to,
21:23:29 <mriedem> because i don't think we want to add a bunch of new complexity to maintain how things were done the cells v1 way
21:23:34 <dansmith> which is why I'm resistant to giving them more meaning than just a group of computes that share a db/mq
21:23:39 <dansmith> right
21:23:41 <mriedem> ack
21:23:52 * melwitt nods
21:23:53 <mriedem> i realize that makes the transition harder
21:24:26 <dansmith> mirroring cells into aggregates (i.e. when we discover a new cell mapping, we add an aggregate, and when we map a new host, we add it to the aggregate) is an option, I just don't want to make that too easy :)
21:24:56 <belmoreira> mriedem the transition yes, but I'm most worried about the operations
21:25:51 <belmoreira> I need to set up something to keep the aggregates in sync and we will have a few of them
21:25:55 <melwitt> this talk is making me think it might be useful to brainstorm a few reference deployment layouts to include in our docs
21:26:07 <belmoreira> for example: aggregate-cell; aggregate-avz
21:26:29 <melwitt> complete with how you could draw your aggregates and cells
21:26:57 <melwitt> "if you currently do this with cells v1, this is how you would do that in cells v2"
21:27:29 <melwitt> going from the multi-level stuff to the flat. anyway, just an idea
21:28:01 <dansmith> melwitt: as long as we're not making a direct mapping, but describing how you achieve things in the new system
21:28:18 <dansmith> belmoreira: so, since this is going on and to wrap up a bit:
21:28:33 * melwitt nods
21:29:03 <dansmith> belmoreira: are you willing to do the mapping with aggregates and this pre-filter thing for a first-go, and report back with how heavy it is in reality for maintenance?
21:29:39 <dansmith> presumably the worst case here is "yes, this is very hard, we have aggregates out of sync sometimes, because $reasons, etc"
21:30:03 <belmoreira> dansmith sure. let's give it a go
21:30:36 <dansmith> belmoreira: okay cool.. this aggregate idea is the result of me transitioning from "hell no" to this, which gets us close,
21:31:12 <dansmith> so I think we'll end up with something workable given more soak time and learning more about the pain points, as we have already
21:31:42 <dansmith> and we have next week to smash our brains together on ideas for refining things
21:31:57 <dansmith> okay, so.. any other open reviews to highlight? :)
21:32:01 <dansmith> other than tssurya's bug reviews
21:32:32 <melwitt> not yet, consoles stuff is still up but spec not re-approved yet. fyi
21:32:36 <belmoreira> dansmith having some filtering in the placement is already something great. thanks for that. don't get me wrong :)
21:32:43 <dansmith> belmoreira: okay :)
21:32:59 <dansmith> melwitt: ack
21:33:12 <dansmith> #topic open discussion
21:33:20 <dansmith> we've already done a lot of discussing openly
21:33:25 <dansmith> no meeting next week because obvious.
21:33:32 <dansmith> anything else to bring up?
21:33:38 <tssurya> nope
21:33:51 * dansmith 's fingers are already tired
21:34:00 <melwitt> nay
21:34:06 <dansmith> tssurya: looking forward to meeting you next week!
21:34:16 <tssurya> dansmith : same here!
21:34:17 <melwitt> ++
21:34:28 <dansmith> aight, cells team out
21:34:30 <dansmith> #endmeeting