17:00:23 <alaski> #startmeeting nova_cells
17:00:23 <openstack> Meeting started Wed Jun  8 17:00:23 2016 UTC and is due to finish in 60 minutes.  The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:27 <openstack> The meeting name has been set to 'nova_cells'
17:00:43 <mriedem> o/
17:00:44 <doffm> o/
17:00:44 <dansmith> ohai
17:00:45 <auggy> o/
17:00:48 <melwitt> o/
17:00:55 <alaski> hello everyone
17:00:59 <alaski> #topic Testing
17:01:14 <alaski> no breaks as far as I know
17:01:24 <alaski> auggy: any update on grenade testing?
17:01:34 <auggy> i've got a WIP change up - https://review.openstack.org/#/c/326105/
17:01:50 <alaski> awesome
17:01:51 <auggy> it's just copying the multinode test
17:01:54 <auggy> nothing special
17:01:56 * alaski opens a tab
17:01:59 <auggy> i'm still trying to get devstack-gate to work
17:02:15 <auggy> so i can get the grenade target devstack settings we need
17:02:24 <alaski> okay
17:02:46 <auggy> well, whenever the simple cells setup stuff is ready
17:02:47 <auggy> :)
17:03:07 <alaski> great. I'll check out the review in a bit
17:03:08 <auggy> right now i'm just troubleshooting adding that test and making jenkins pass
17:03:22 <alaski> for the simple cells setup I have https://review.openstack.org/#/c/322311/
17:03:24 <auggy> yeah it's failing right now because i'm not putting a thing somewhere it needs to go
17:04:02 <alaski> okay
17:04:07 <alaski> thanks for working on that
17:04:24 <auggy> alaski: oh great! as soon as i can get devstack-gate working to create a devstack i'll check out that change and see what happens
17:04:32 <auggy> yeah and feel free to pipe in if it looks like i'm going down a rabbit hole i shouldn't be
17:04:45 <alaski> sure
17:04:52 <alaski> #topic Open Reviews
17:05:01 <alaski> https://etherpad.openstack.org/p/newton-nova-priorities-tracking
17:05:20 <alaski> I have not been keeping my stuff up to date
17:05:30 <alaski> I will get on that in a bit
17:05:31 * dansmith wags his finger
17:05:42 * alaski hangs his head
17:05:43 <woodster_> o/
17:05:52 <melwitt> should WIP things go in there or no?
17:05:56 <alaski> please don't follow my lead, keep it up to date
17:06:02 <alaski> melwitt: I would say yes
17:06:10 <alaski> just mark it as such in there
17:06:21 <melwitt> okay
17:06:48 <alaski> #topic Open Discussion
17:06:55 <alaski> I have a few prepared items here today
17:07:10 <alaski> first, I want to mention an issue with instance.name
17:07:24 <alaski> by default it relies on using instance.id
17:07:30 <alaski> which is assigned by the db
17:07:41 <alaski> so returning that before writing to the cell db is problematic
17:07:54 <alaski> my planned solution is to make it blank until it's in a cell db
17:07:57 <dansmith> alaski: just want to be clear: we expose instance.name via the external API?
17:08:02 <alaski> yes
17:08:08 <dansmith> I'm not sure why we would do that, but.. heh, okay
17:08:10 <alaski> I'm pretty sure we do
17:08:13 <mriedem> external attribute i think
17:08:50 <mriedem> i'm not sure why we do a lot of the things we do
17:08:50 <alaski> yep, just another instance of oversharing implementation details
17:08:57 <mriedem> which is why sean is taking a flamethrower to the api :)
17:08:57 <dansmith> "OS-EXT-SRV-ATTR:instance_name": "instance-00000001",
17:09:16 <alaski> mriedem: I'm trying to start small fires here and there as well
17:09:19 <doffm> Why do we need to base it on the db id?
17:09:25 <doffm> Cant we give it a uuid instead?
17:09:28 <alaski> doffm: we don't
17:09:29 <doffm> For new servers?
17:09:34 <alaski> but in the past we did
17:09:44 <dansmith> doffm: we don't, it's just configurable and used in some scary places
17:09:46 <mriedem> doffm: see the instance_name_template config option
17:09:52 <doffm> Ok.
17:09:54 <alaski> doffm: the more complex answer is that it's generated on each access
17:10:07 <doffm> Ouch.
17:10:10 <alaski> if we persisted it for older instances we could update this
17:10:21 <alaski> but I didn't want to go down that rabbit hole atm
17:10:22 <dansmith> we can work around this, but it's far easier to either not return it until we know it, or have it appear to change
17:10:23 <mriedem> https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L252
17:10:43 <dansmith> this is a relic from long ago
17:10:58 <doffm> LOL. That function.
17:11:00 <dansmith> the last time we tried to remove it, we realized all people with lvm-backed instances would be screwed
17:11:01 <alaski> dansmith: yeah, I want to run an object abuse past you for this. but I would like to keep it blank at first
17:11:04 <dansmith> and there are a couple other places
17:11:27 <dansmith> alaski: this is not remotable so we can abuse it at will, but I'd rather start with it blank or missing yeah
17:11:40 <alaski> cool
17:11:54 <mriedem> how are we going to namespace it per cell?
17:11:59 <mriedem> to avoid collisions?
17:12:03 <alaski> we aren't
17:12:19 <alaski> there's no guarantee of uniqueness here
17:12:34 <dansmith> that's the thing
17:12:34 <alaski> right now someone could make a static template
17:12:40 <dansmith> it will overlap.. a lot
17:12:44 <dansmith> right
17:12:52 <mriedem> it eventually just turns into the uuid if it can't fit the template
17:13:05 <mriedem> https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L268-L271
17:13:41 <alaski> yeah, but a template like 'instance' should work
17:13:43 <mriedem> anyway, yeah, you could goof that template up
17:13:49 <dansmith> it doesn't matter,
17:13:56 <dansmith> there are resources named from that template right now
17:14:03 <dansmith> if we change it, then those resources are lost
17:14:09 <alaski> I would like to address this eventually, but not as a blocker for what I'm working on now
17:14:09 <dansmith> like lvms for instance backing disks
17:14:30 <alaski> eventually I want to persist the name, and snapshot every current instance name
17:14:37 <alaski> but that's harder than it seems at first
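
For context, here is a minimal sketch of the template-based name generation discussed above: the name is rendered from a configurable template on every access, defaults to the database id (which is why an instance not yet written to a cell db has no name), and falls back to the uuid if the template can't be rendered. The helper function is illustrative, not Nova's exact code from objects/instance.py; the 'instance-%08x' default matches the "instance-00000001" example quoted above.

    # Illustrative sketch of instance.name behaviour, assuming the
    # default instance_name_template of 'instance-%08x'.
    instance_name_template = 'instance-%08x'

    def instance_name(instance_id, instance_uuid,
                      template=instance_name_template):
        # No cell db record yet means no id to render; return blank,
        # per the plan discussed above.
        if instance_id is None:
            return ''
        try:
            return template % instance_id
        except (TypeError, KeyError, ValueError):
            # A template that can't be rendered from the available
            # data falls back to the uuid, as in the linked code.
            return instance_uuid
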
17:15:08 <alaski> next up
17:15:16 <alaski> adding a new compute to a cell
17:15:35 <alaski> I realized yesterday that we have plans for migrating what exists to a cell
17:15:42 <alaski> but no plans for how things should be added to one
17:16:06 <alaski> so when a compute is added it needs to get mapped properly
17:16:18 <alaski> I would like it to look up the info it needs, and have it do it itself
17:16:27 <dansmith> yes please
17:16:30 <alaski> and all it needs is the cell mapping uuid
17:16:47 <alaski> so we could require that in a config, or put it in the cell db
17:17:25 <alaski> I favor putting it in the db, but a config is the simpler start I think
17:17:43 <alaski> any thoughts?
17:18:11 <melwitt> so every compute nova.conf would contain the cell uuid?
17:18:20 <alaski> yeah
17:18:34 <dansmith> I don't love it, but...
17:18:42 <melwitt> okay, just making sure I understand
17:18:44 <mriedem> how would the db one work?
17:18:58 <alaski> a new cell table that just stored that uuid
17:19:07 <alaski> since every compute is configured to point at a db
17:19:16 <dansmith> alaski: so there is another option maybe:
17:19:23 <dansmith> alaski: instead of the upcall which kinda sucks anyway,
17:19:38 <dansmith> alaski: what if we had a "discover" command either via api or nova-manage,
17:19:59 <dansmith> which would merely list all computes in each cell in sequence, find new compute nodes that aren't yet mapped, and add a mapping for them?
17:20:05 <dansmith> that would avoid the upcall, not require a config,
17:20:23 <dansmith> and would have the knowledge of the cell already so it doesn't need a cell uuid persisted anywhere other than where we have it already
17:20:34 <dansmith> and then we could let that run periodically or just say "put that in cron if you want it"
17:20:48 <dansmith> presumably people only want to do that when they know they're adding new computes for the first time
17:20:56 <dansmith> and it wouldn't be "register this one compute node" kind of sucky
17:21:08 <alaski> fair point
17:21:34 <dansmith> and it could even be "discover --all-cells" or "discover --cell=$uuid"
17:21:44 <dansmith> to be lightweight when you just lit up a new rack
17:21:55 <melwitt> interesting idea
17:22:00 <dansmith> the computes already check in to their own cell by queue, so no reason to make them upcall I think
17:22:06 <alaski> I don't love the extra deployer step, but it does simplify it
17:22:23 <dansmith> because people that don't allow upcalls by policy (which hopefully will be most people eventually) would have an issue registering new computes
17:22:45 <melwitt> that's a good point
17:22:58 <mriedem> alaski: there is an extra deployer step with the config option too
17:23:04 <dansmith> alaski: well, we could make a periodic task at the top at some point that just does "SELECT count(id) FROM compute_nodes" every few minutes on each cell db
17:23:04 <alaski> yeah
17:23:14 <dansmith> mriedem: yeah and that is more painful, IMHO
17:23:18 <alaski> mriedem: that's why I'm in favor of the db option. but it still requires the upcall
17:23:33 <dansmith> alaski: anyway, one more step right now that we can automate is not a huge deal I think
17:23:50 <alaski> dansmith: right. so I think this sounds like a good first step, and then it can be refined later
17:24:28 <alaski> I'll try that, and we can debate further on a review
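
A rough sketch of the "discover" flow proposed above: walk each cell, find compute hosts that have no host mapping yet, and create one. The api_db interface and the CellMapping/HostMapping method names here are assumptions for illustration, not the eventual nova-manage command.

    # Sketch of the proposed discovery command, assuming a hypothetical
    # api_db object exposing cell and host mapping helpers.
    def discover_hosts(api_db, cell_uuid=None):
        cells = (api_db.get_cell_mappings() if cell_uuid is None
                 else [api_db.get_cell_mapping(cell_uuid)])
        added = []
        for cell in cells:
            # Hosts already mapped for this cell.
            mapped = {hm.host for hm in api_db.get_host_mappings(cell)}
            # Compute nodes recorded in the cell's own database.
            for node in cell.db.get_compute_nodes():
                if node.host not in mapped:
                    api_db.create_host_mapping(cell=cell, host=node.host)
                    added.append((cell.uuid, node.host))
        return added

Run on demand (or from cron) after adding new computes, this avoids both the upcall and a per-compute cell uuid in nova.conf.
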
17:25:01 <alaski> my final agenda item: I'm going to be heading to the airport during next weeks meeting, so does someone want to run it, or skip?
17:25:03 <dansmith> alaski: I can think of lots of lightweight ways the scheduler could detect that we have more compute nodes than mappings, and trigger a discovery
17:25:25 <dansmith> and by "lots" I mean "at least one"
17:25:31 <melwitt> :)
17:25:35 <alaski> dansmith: yes, until the scheduler splits (I'm still hopeful on that)
17:25:46 <dansmith> I hate meetings, I suggest we skip
17:25:54 <doffm> We can probably miss a week.
17:26:02 <dansmith> alaski: I'm not sure where all this kind of stuff goes in that case anyway, but yeah
17:26:07 <melwitt> yeah, a skip is cool with me too
17:26:19 <alaski> cool
17:26:24 <alaski> #note no meeting next week
17:26:40 <alaski> dangit
17:26:45 <alaski> #info no meeting next week
17:26:54 <alaski> okay, any other topics for today?
17:27:27 <melwitt> I wanted to mention I put up a WIP for people to have a look at for querying cell service version for compute RPC calls
17:27:57 <melwitt> https://review.openstack.org/#/c/326906/ so feel free to comment
17:28:17 <mriedem> hmmm,
17:28:29 <mriedem> that makes me think of the check i have in the get me a network rest api change
17:28:41 <mriedem> it's checking that all computes in the deployment are at least newton
17:28:57 <mriedem> with cells it would have to aggregate that all up
17:29:11 <mriedem> but you could be smarter and schedule the instance to a particular cell that is fully newton
17:29:20 <alaski> yeah
17:29:29 <alaski> I was thinking we could treat cells individually
17:30:07 <mriedem> where does the service table live in cells v2?
17:30:14 <dansmith> in the cell
17:30:41 <mriedem> so when asking for min nova-compute service, that will just be checking all computes as it does today,
17:30:49 <mriedem> but those computes would be grouped by cells in some mapping table
17:31:03 <dansmith> you can't do a single query of all compute services
17:31:18 <dansmith> you can do N for N cells and then pick the cells that are >=X
17:31:35 <mriedem> sure
17:31:48 <mriedem> that check in the api won't work unless the scheduler does the right thing though
17:32:33 <mriedem> easy out is just require all computes in all cells to be >=x
17:32:36 <dansmith> right, the scheduler would have to consider it
17:32:54 <dansmith> for single feature adds,
17:33:05 <dansmith> not allowing it until everything is upgraded is totally fine, IMHO
17:33:24 <mriedem> yeah i'm fine with that
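
A sketch of the per-cell minimum-version check being discussed: rather than one global "are all computes >= N" check, compute the minimum nova-compute service version in each cell and treat only fully upgraded cells as candidates. get_min_service_version() is an assumed per-cell-db helper, not an existing Nova API.

    # Sketch: keep only the cells whose oldest nova-compute service is
    # at or above the version that adds the new feature.
    REQUIRED_VERSION = 15  # placeholder value for the required service version

    def cells_supporting_feature(cell_dbs, required=REQUIRED_VERSION):
        eligible = []
        for cell in cell_dbs:
            min_version = cell.get_min_service_version('nova-compute')
            if min_version is not None and min_version >= required:
                eligible.append(cell)
        return eligible

This is the "N queries for N cells, then pick the cells that are >= X" approach mentioned above; the scheduler would then have to restrict placement to the eligible cells.
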
17:33:52 <mriedem> do we have a stance on mixed cell deployments?
17:34:03 <mriedem> i.e. i can have a newton cell and a mitaka cell?
17:34:06 <alaski> they're going to need to be possible
17:34:07 <doffm> I think so.
17:34:14 <dansmith> yeah, not optional
17:34:18 <doffm> I mean you will want to roll cells.
17:34:24 <alaski> right
17:34:25 <mriedem> roll computes within your rolling cells
17:34:27 <mriedem> mfer
17:34:28 <dansmith> it'd be a regression to atomic upgrades
17:35:00 <alaski> always be upgrading
17:35:05 <dansmith> lol
17:35:08 <mriedem> god
17:35:14 <dansmith> need .. the .. tshirt
17:35:26 <mriedem> doffm: not it for the ansible changes to handle this
17:35:27 <alaski> hah
17:36:01 <doffm> mriedem: It. :(
17:36:03 <mriedem> ok i'm done with random questions
17:36:25 <melwitt> I was thinking of starting work on the server groups migrations if that's cool with everyone
17:36:44 <dansmith> I OBJECT
17:36:49 <dansmith> (I don't object)
17:36:50 <doffm> YES
17:36:53 * melwitt goes back in cave
17:36:54 <alaski> I'd rather we just got rid of them... but that's cool with me
17:37:07 <melwitt> heh
17:37:10 <mriedem> yeah i was going to say quotas is probably higher priority?
17:37:17 <mriedem> but shittier
17:37:28 <alaski> quotas is in progress right?
17:37:30 <melwitt> I think doffm is doing quotas right?
17:37:32 <mriedem> is it?
17:37:35 <mriedem> ha
17:37:36 <doffm> I will start on quotas next week.
17:37:40 <mriedem> seriously?
17:37:43 <doffm> I'll add it to our backlog.
17:37:46 <doffm> Or do it in the evening.
17:37:55 <mriedem> doffm: i need to know how this benefits ibm public cloud
17:38:14 <doffm> mriedem: Shhhhh. Nothing happening here.
17:38:14 <mriedem> anywho
17:38:19 <melwitt> I can do quotas if that's more needed, I just thought it was already taken
17:38:37 <mriedem> mark wrote the spec
17:38:40 <dansmith> melwitt: run away .. fast
17:38:52 <mriedem> but i'm pretty sure mark is overcommitted, but i'll let him hang himself if he wants
17:39:08 <melwitt> :)
17:39:55 <mriedem> let's end meeting before dan freaks out
17:40:02 <dansmith> yes please
17:40:14 <alaski> alright, anything else?
17:40:19 <alaski> better speak up quick
17:40:34 <alaski> thanks everyone!
17:40:36 <alaski> #endmeeting