21:01:21 <alaski> #startmeeting nova_cells
21:01:21 <openstack> Meeting started Wed Feb 10 21:01:21 2016 UTC and is due to finish in 60 minutes.  The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:25 <openstack> The meeting name has been set to 'nova_cells'
21:01:31 * bauzas waves to alaski
21:01:37 <doffm> Hi.
21:01:38 <alaski> o/
21:01:43 <mriedem> o/
21:01:48 <ccarmack> o/
21:02:00 <bauzas> and... \o (because I'm a gentleman)
21:02:04 <alaski> great, let's get going
21:02:05 <ctrath> o/
21:02:11 <alaski> #topic Cells testing/bugs
21:02:34 <ccarmack> I updated https://review.openstack.org/#/c/225199/ to change server_basic_ops to test ssh
21:02:46 <ccarmack> I could use some reviews on it
21:03:03 <alaski> great
21:03:08 <alaski> #link https://review.openstack.org/#/c/225199/
21:03:28 <ccarmack> One thing I need to do is change project-config to set run_validation = true for cells
21:04:21 <ccarmack> but I already have a three-patch dependency chain, so I'd like to change project-config after this is approved
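(For reference, the tempest option ccarmack mentions is the one below; how project-config actually wires it up for the cells job is not shown here and may differ:)

```ini
# tempest.conf -- enables the SSH validation path in tests like
# server_basic_ops; ccarmack wants this set per-job for cells.
[validation]
run_validation = true
```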
21:04:44 <alaski> sounds reasonable
21:04:54 <ccarmack> cool
21:05:24 <alaski> there was a tempest test added recently which failed the cells job
21:05:35 <alaski> so it was added to the exclusion list in https://review.openstack.org/#/c/277536/
21:05:50 <mriedem> alaski: that test broke all shared storage jobs too
21:05:51 <mriedem> so it was reverted
21:05:56 <alaski> oh
21:06:34 <alaski> when/if it's added back it would be good to have the security group addition behind a flag
21:06:53 <alaski> but otherwise I'm not aware of any cells failures
21:07:11 <alaski> #topic Open reviews
21:07:20 <alaski> as always, just calling out https://etherpad.openstack.org/p/mitaka-nova-priorities-tracking
21:07:31 <alaski> I have some reviews to add there, but there's a lot of stuff up now
21:07:38 <alaski> please take a look
21:07:50 <alaski> #topic Open Discussion
21:08:12 <doffm> I created a script to look at possible foreign key issues when doing the database split.
21:08:28 <doffm> https://etherpad.openstack.org/p/CellsV2-database-split
21:08:54 <doffm> I also proposed a doc change to add all missing databases to the split info.
21:09:07 <doffm> https://review.openstack.org/#/c/277543/
21:09:22 <doffm> If people could review that to discuss any database split issues, I'd be grateful.
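(A minimal sketch of what a foreign-key audit like this might look like — this is not doffm's actual script, and the API-side table list and DB URL are placeholder assumptions:)

```python
# Flag foreign keys that would cross an assumed API-DB/cell-DB split.
from sqlalchemy import create_engine, inspect

# Placeholder split; the real candidate list is what the etherpad tracks.
API_TABLES = {'flavors', 'aggregates', 'key_pairs', 'instance_mappings'}

def cross_split_fks(db_url):
    insp = inspect(create_engine(db_url))
    for table in insp.get_table_names():
        for fk in insp.get_foreign_keys(table):
            ref = fk['referred_table']
            # A problem FK is one whose parent and child land on
            # opposite sides of the split.
            if (table in API_TABLES) != (ref in API_TABLES):
                print('%s.%s -> %s' % (table,
                                       ','.join(fk['constrained_columns']),
                                       ref))

cross_split_fks('mysql+pymysql://root:secret@localhost/nova')
```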
21:09:37 <bauzas> lovely
21:09:52 <alaski> yeah, really nice
21:10:06 <bauzas> that said, I just wonder how we could draw the line in the sand
21:10:06 <alaski> do we want to discuss the big one now, aggregates?
21:10:20 <doffm> We could. I don't have any really good ideas though.
21:10:39 <bauzas> the 'how' being: how can we be sure that the boundaries are good?
21:10:44 <alaski> bauzas: we may need to draw a temporary line that we know we might revise later
21:11:26 <doffm> We won't know for sure until we try. So make a plan for the best split we can.
21:11:28 <bauzas> alaski: yeah, I just think that migrating a table to the API DB can be okay, but reverting it to the cell DB could be difficult
21:11:41 <bauzas> so, I would be a bit conservative first
21:12:01 <bauzas> and just make sure we migrate the tables we really need to move
21:12:07 <alaski> that's fair
21:12:16 <doffm> Yep.
21:12:45 <alaski> the list of api tables is really just a list of tables we want to look at migrating
21:12:53 <bauzas> then cool :)
21:13:01 <alaski> in the process we may find that something shouldn't actually be moved
21:13:35 <alaski> but it's good to have the list of things to look at
21:13:45 <bauzas> yeah for sure
21:13:53 <alaski> getting back to aggregates, I started wondering if we actually do want them to span cells
21:14:09 <bauzas> I think yeah
21:14:32 <alaski> I think so too, but why?
21:14:33 <bauzas> I was seeing the aggregates differently from cells
21:14:42 <bauzas> orthogonally even
21:14:58 <doffm> I always presumed that we did. Some things might be rack / cell specific. Others global.
21:15:01 <bauzas> so, aggregates are there for 2 reasons, right?
21:15:13 <bauzas> #1 for a placement decision
21:15:23 <bauzas> #2 for a global item
21:15:32 <bauzas> #1 and/or #2
21:15:46 <alaski> what do you mean by a global item?
21:15:51 <bauzas> like a ratio
21:16:12 <bauzas> I mean a metadata var for all the computes in there
21:16:20 <doffm> uses_ssh, has_gpu, funky_network_gear, slow_disks.
21:16:27 <bauzas> hah
21:16:42 <bauzas> so, that's why I see cells being different
21:17:17 <alaski> I agree with all of that, but will also point out that at rackspace those tags applied at the cell level
21:17:40 <bauzas> so, MHO is that cells are failure domains
21:17:46 <alaski> I'm also just playing devil's advocate here
21:17:48 <bauzas> while aggregates are the above
21:18:06 <bauzas> an aggregate (and an AZ) is not a failure domain
21:18:11 <alaski> if a cell is a failure domain, should there be resources that span it?
21:18:30 <bauzas> I can give an example
21:18:34 <alaski> because you're not isolated against failure like that
21:18:43 <doffm> Sure, we could add aggregates to each cell: funky_network_gear_1, funky_network_gear_2. That's what people do in cellsv1, right?
21:18:48 <doffm> Adds a load on the operator though.
21:19:05 <doffm> If there are globalish concepts that span cells.
21:19:21 <bauzas> yeah, you could want to place like this
21:19:49 <bauzas> (cell_A OR cell_B) AND NOT cell_C
21:19:59 <doffm> cell affinity.
21:20:05 <bauzas> aggregates could be one way
21:20:06 <alaski> doffm: aggregates weren't used, but yes that's essentially what happened in v1
21:20:07 <bauzas> also
21:20:49 <alaski> doffm: it was basically cells 3, 4, 5 can handle flavors a, b and cells 6, 7 take flavors c, d
21:21:01 <bauzas> consider for example that you have 2 cells, each with the same 2 types of hardware
21:21:28 <bauzas> what if as a user I care about a specific type of hardware but I don't care about where it will be placed
21:22:32 <alaski> let me ask something real quick, but I want to be clear
21:23:00 <alaski> does it matter that an aggregate spans cells, or that two cells each have an aggregate with the same properties?
21:23:36 <alaski> for affinity it does seem to matter
21:24:00 <doffm> Yes, if we are using aggregates for affinity. Also for operator load. (Having to create them in N cells.)
21:24:19 <doffm> Also possibly for performance in a global scheduler. (Multiple DB lookups for aggregates?)
21:24:41 <alaski> but since it's really the scheduler that cares about aggregates, and it will have a global view, I wonder if the aggregates could be merged by it instead of having them global in nova
21:24:52 <bauzas> well, I just don't want to lock up a specific implementation detail that could lead to a huge design difference :)
21:25:42 <bauzas> the thing is, you can create as many aggregates per host as you wish
21:25:46 <alaski> here's my struggle
21:26:01 <alaski> I think Jay's work depends on aggregates being in a cell
21:26:04 <bauzas> so, I know that lots of operators have dedicated aggregates, one per use case they have
21:26:13 <alaski> but I agree that they should be global
21:26:45 <doffm> alaski: We could think of other ways around aggregates with regard to the resource pools framework.
21:26:49 <alaski> and long term I think we want to move a lot of this into a unified scheduler and not store it in nova dbs
21:27:07 <doffm> We will have to have a mapping between resource pools and cells anyway.
21:27:19 <alaski> doffm: yeah, might be the way we end up needing to go
21:27:26 <doffm> So we will have a table with resource pools ids in them.
21:27:40 <doffm> We could move the resource-pool <-> aggregate mapping to the api db.
21:27:56 <doffm> And link it to the resource pool cell-id mapping instead of directly to the resource pool table.
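(A hedged sketch of the indirection doffm describes, in the SQLAlchemy declarative style Nova's models use; all table and column names here are illustrative assumptions, not the real schema:)

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class ResourcePoolMapping(Base):
    """API-DB record locating a resource pool in a cell."""
    __tablename__ = 'resource_pool_mappings'
    id = Column(Integer, primary_key=True)
    resource_pool_uuid = Column(String(36), nullable=False)
    cell_id = Column(Integer, ForeignKey('cell_mappings.id'))

class AggregateResourcePool(Base):
    """API-DB link from an aggregate to the pool *mapping*, rather than
    directly to the cell-local resource pool table -- the extra level
    of indirection bauzas mentions."""
    __tablename__ = 'aggregate_resource_pools'
    id = Column(Integer, primary_key=True)
    # Where aggregates themselves live is the open question, so no FK here.
    aggregate_id = Column(Integer, nullable=False)
    pool_mapping_id = Column(Integer,
                             ForeignKey('resource_pool_mappings.id'))
```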
21:28:37 <bauzas> that could work
21:28:47 <bauzas> just another level of indirection
21:29:32 <alaski> I think I need to draw this all out at some point
21:29:45 <alaski> but that does seem to work
21:29:50 <doffm> I could write something up for us all to discuss.
21:30:00 <alaski> that would be great
21:30:04 <doffm> OK.
21:30:31 <mriedem> don't draw it out in ascii art
21:30:40 <doffm> I actually was going to.
21:30:56 <mriedem> doffm: that ML thread was before you were working on openstack...
21:30:59 <mriedem> but i digress
21:31:00 <alaski> oh man, not this discussion
21:31:13 <bauzas> heh
21:31:20 <alaski> it was my ascii art that spawned that discussion
21:31:32 <alaski> doffm: ascii art is wonderful
21:31:36 <alaski> lascii art, even better
21:31:42 <doffm> mriedem: Will inform me of what I'm supposed to do offline. :)
21:32:02 <mriedem> whatever jogo used for the arch diagram in the nova devref
21:32:04 <mriedem> use that
21:32:33 <mriedem> mtreinish: will tell you to use, oh what's it called again
21:32:41 <alaski> heh, latex :)
21:32:47 <mriedem> yeah
21:32:54 <doffm> There are two other foreign key issues I found. Fixed-ips -> instance ids. SecurityGroups -> instance ids. I guess we can discuss those down the line.
21:33:03 <doffm> I only had a chance to look at security groups a little bit.
21:33:31 <alaski> for those I was thinking we could change the foreign key to point to the instance_mapping
21:33:42 <doffm> Makes sense.
21:34:14 <alaski> there could still be some gotchas in there, but it's something to try
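(A rough sketch of that idea as a sqlalchemy-migrate step, the tool Nova used for schema migrations at the time; the table and column names are assumptions for illustration, and enforcing a constraint across two databases is exactly the kind of gotcha alaski means:)

```python
from migrate.changeset.constraint import ForeignKeyConstraint
from sqlalchemy import MetaData, Table

def upgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    fixed_ips = Table('fixed_ips', meta, autoload=True)
    instances = Table('instances', meta, autoload=True)
    instance_mappings = Table('instance_mappings', meta, autoload=True)

    # Drop the FK pointing at the cell-local instances table...
    ForeignKeyConstraint(
        columns=[fixed_ips.c.instance_uuid],
        refcolumns=[instances.c.uuid]).drop()
    # ...and repoint it at instance_mappings. Note a real cross-database
    # FK may not be enforceable by the engine at all.
    ForeignKeyConstraint(
        columns=[fixed_ips.c.instance_uuid],
        refcolumns=[instance_mappings.c.instance_uuid]).create()
```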
21:34:54 <doffm> Although the security-group <-> instance mapping is many-to-many. It's most often used in the cells for accessing security groups, I think.
21:35:05 <doffm> So there is some argument for keeping it in the cell db.
21:35:23 <alaski> the only other thing on the agenda is a reminder about summit proposals and newton specs; we may want to start an etherpad to track those
21:35:37 <alaski> doffm: ahh, okay
21:36:19 <alaski> although I think security_groups should really be in the api db
21:36:34 <doffm> alaski: For sure, the mapping could stay though.
21:36:47 <alaski> ahh, I see
21:37:33 <doffm> For newton specs... I guess we should start writing some. Should we table a discussion of what we want to get in for newton and go from there?
21:38:02 <alaski> yep, that's a good plan
21:38:07 <ccarmack> alaski: are you still looking for volunteers?
21:38:21 <alaski> definitely
21:38:30 <ccarmack> anything I can work on?
21:38:44 <ccarmack> I saw something about grenade updates
21:39:14 <alaski> that's one thing
21:39:28 <alaski> some slightly more invasive testing to check things that functional tests may not catch
21:39:37 <alaski> if that's possible
21:39:40 <doffm> alaski: When would you like to have the newton plans discussion, next week's meeting? Hangouts?
21:40:08 <doffm> alaski: ccarmack: Will we want to do a multi-cell grenade eventually? I mean adding a new cell and checking everything works.
21:40:46 <doffm> As well as a multi-cell test in general. :/
21:40:55 <alaski> doffm: let's start with doing it in the meetings, and maybe have a hangout after FF when we see what progress was made in M
21:41:05 <ccarmack> maybe I should write a grenade spec
21:41:06 <doffm> Ok.
21:41:56 <alaski> yes, multi-cell testing will be necessary
21:42:15 <alaski> I'm trying to find some code that has no testing but should
21:42:44 <bauzas> so, a newton high-level objective would be to allow a 2nd cell? :)
21:43:00 <alaski> this nova-manage command is untested http://git.openstack.org/cgit/openstack/nova/tree/nova/cmd/manage.py#n1269
21:43:24 <alaski> a great grenade test would be to boot some instances and then run that command after the upgrade
21:43:35 <alaski> and ensure the migration succeeded
21:43:54 <alaski> it's going to depend on https://review.openstack.org/#/c/270565/ as well
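(A hedged sketch of what such a grenade check might look like, following grenade's create/verify resource-script convention; the nova-manage subcommand shown is a placeholder standing in for the untested command linked above, not a confirmed CLI:)

```bash
#!/bin/bash
# Hypothetical grenade resource script: create data before the upgrade,
# verify the data migration after it.

function create {
    # Boot a server on the old side so there is something to migrate.
    nova boot --flavor m1.tiny --image cirros pre-upgrade-server
}

function verify {
    # Post-upgrade: run the migration command, then check the instance
    # survived and is still usable.
    nova-manage cell_v2 map_instances || exit 1   # placeholder subcommand
    nova show pre-upgrade-server | grep -q ACTIVE || exit 1
}

"$1"  # grenade invokes the script with the phase name (create/verify/...)
```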
21:44:40 <alaski> bauzas: heh
21:44:59 <doffm> Did anyone put in summit proposals?
21:45:26 <alaski> I proposed one talk on cellsv2
21:45:57 <alaski> doffm: I'm not sure if you're familiar with the format, but there will be proposals for the design summit much closer to the event
21:46:18 <alaski> those will be the technical discussions that are more helpful
21:46:33 <doffm> I'm not familiar at all. Thats good to know.
21:47:12 <alaski> the proposals that just happened are basically presentation format, the design summit is much like the midcycle
21:47:25 <alaski> except with many more people
21:47:29 <mriedem> and time boxed
21:47:31 <bauzas> alaski: I guess you're planning the cellsv2 talk to give a high-level view?
21:47:36 <mriedem> it's basically worse in every way :)
21:47:39 <bauzas> to ops
21:47:57 <alaski> mriedem: +1
21:48:08 <alaski> bauzas: high level view, and progress report
21:48:13 <bauzas> ack
21:48:56 <alaski> anything else for today?
21:49:29 <alaski> that was a good discussion, and I look forward to your writeup doffm
21:49:51 <doffm> Thanks.
21:50:16 <alaski> ccarmack: if any other cells work comes up this cycle I'll ping you, but next cycle should be chock full of it
21:50:21 <alaski> thanks all!
21:50:31 <alaski> #endmeeting