17:00:07 <alaski> #startmeeting nova_cells
17:00:08 <openstack> Meeting started Wed Feb 11 17:00:07 2015 UTC and is due to finish in 60 minutes.  The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:09 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:11 <openstack> The meeting name has been set to 'nova_cells'
17:00:22 <alaski> Anyone around?
17:00:26 <vineetmenon> /o
17:00:27 <melwitt> o/
17:00:31 <bauzas> \o
17:00:31 <dansmith> o/
17:00:33 <edleafe> o/
17:00:36 <belmoreira> o/
17:00:37 <dheeraj-gupta-4> o/
17:00:44 <alaski> excellent!
17:00:49 <alaski> #topic Testing
17:01:11 <alaski> mriedem pointed out https://bugs.launchpad.net/nova/+bug/1420322
17:01:11 <openstack> Launchpad bug 1420322 in OpenStack Compute (nova) "gate-devstack-dsvm-cells fails in volumes exercise with "Server ex-vol-inst not deleted"" [Medium,In progress] - Assigned to Matt Riedemann (mriedem)
17:01:26 <alaski> which seems to be related to what melwitt has been chasing
17:02:19 <alaski> melwitt: any updates since we discussed this yesterday?
17:03:08 <melwitt> ah, yeah. I'm chasing around all of the DetachedInstance errors. not really unfortunately, I tried your suggestion with the obj_to_primitive and will be combing through results this morning
17:03:40 <alaski> cool, thanks for chasing that
17:04:01 <vineetmenon> latest count with the regex (by melwitt) is 126
17:04:18 <bauzas> so we had regressions ?
17:04:31 <bauzas> I was only counting 74 exceptions
17:04:54 <vineetmenon> etherpad has lot more than that
17:05:07 <melwitt> I thought it's still around 74 (when you look at the CI job)
17:05:12 <alaski> I'm seeing 2 failures on a recent run
17:05:19 <vineetmenon> 9th feb
17:05:23 <bauzas> #link https://etherpad.openstack.org/p/nova-cells-testing
17:06:26 <vineetmenon> I have appended "possibly rectified" to the failures which no longer occur
17:06:28 <alaski> looking at http://logs.openstack.org/67/154567/2/check/check-tempest-dsvm-cells/b9eecd8/console.html from the 10th
17:06:31 <alaski> shows 2 failures
17:06:55 <alaski> tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest.test_run_idempotent_instances is pretty consistent in failing
17:07:13 <alaski> I've tried reproducing locally and have failed thus far
17:07:46 <bauzas> right
17:08:14 <alaski> I'd hate to just exclude it without understanding the reasoning there, since it was passing before
17:08:58 <alaski> other failures seem to be intermittent and I haven't had a chance to dig deeper
17:09:18 <bauzas> alaski: I can't even find the boto one on http://logs.openstack.org/03/127203/5/check/check-tempest-dsvm-cells/7b641b4/console.html
17:09:46 <bauzas> that's probably transient errors :(
17:10:12 <alaski> yep
17:10:43 <alaski> I ran tempest locally with --until-failure for about half a day and didn't get any failures, but they keep coming up in the gate
17:11:19 <alaski> it could use some digging from anyone interested
17:11:21 <dheeraj-gupta-4> The boto one is here too http://logs.openstack.org/81/150381/2/check/check-tempest-dsvm-cells/9f37833/console.html but the end error is different
17:11:29 <alaski> and I'll keep digging as well
17:11:40 <vineetmenon> alaski, my error count is from a local run too
17:11:46 <melwitt> in general, I've been having trouble reproducing the gate locally
17:11:59 <dheeraj-gupta-4> ahh no scratch that...its the same
17:12:31 <alaski> dheeraj-gupta-4: yep, just a different id
17:13:02 <alaski> melwitt: same here, which makes this all the more interesting
17:13:14 <alaski> but we're really close to having a passing job
17:13:48 <alaski> anything else around testing?
17:14:12 <alaski> #topic Database migrations
17:14:23 <alaski> https://review.openstack.org/#/c/153666/
17:14:48 <alaski> I've added basic support for alembic as a PoC
17:15:14 <alaski> the autogenerate feature for migrations is nice
17:15:19 <bauzas> agreed
17:15:37 <alaski> but I'm currently working on testing which requires reworking a lot of things that are geared towards the current setup
17:15:59 <bauzas> I was thinking that jerfleldt was working on shipping alembic for his expand/contract BP ?
17:16:13 <bauzas> so maybe you could just rebase on top of his patch ?
17:16:26 <alaski> bauzas: he is, but it's not a full alembic setup from what I understand
17:16:36 <bauzas> alaski: oh ok
17:16:41 <alaski> I need to touch base with him again on this
17:17:15 <bauzas> alaski: yeah, my only concern was that you could maybe leverage his work, that's it
17:17:38 <alaski> but right now the options seem to be: use alembic like I have here, depend on the expand contract work, use sqlalchemy-migrate
17:17:53 <alaski> I would like to leverage his work, but I'm not sure it's in a state to use it quite yet
17:18:32 <alaski> so then the question is, alembic or sqlalchemy-migrate for the short term
17:19:01 <bauzas> sqlalchemy-migrate seems to be the easiest path
17:19:20 <bauzas> I mean, we're focusing on having 2 connection strings for the DB API
17:19:32 <bauzas> that would directly benefit for sqlalchemy-migrate
17:20:17 <dheeraj-gupta-4> bauzas: Why only 2?
17:20:42 <alaski> sqlalchemy-migrate is easier, but I feel there are benefits to alembic that make it worth pursuing
17:21:17 <bauzas> dheeraj-gupta-4: forget the number, I was just wanting to say that we're working on providing a facade for any connection string
17:21:19 <vineetmenon> IMHO, we are limiting to two tier (one top-level, one child)
17:21:36 <alaski> let me chat with johannes to see how we can converge in a solid way
17:21:44 <bauzas> alaski: makes sense
17:22:15 <bauzas> alaski: that's only a matter of priorities... :)
17:22:38 <alaski> yep
17:22:46 <alaski> anything else on the database for now?
17:23:17 <alaski> #topic Neutron integration
17:23:39 <alaski> I spoke with armax a bit yesterday about the Neutron side of things
17:24:23 <alaski> we agreed that there are a lot of open questions, and that this would make a good summit discussion on the neutron side
17:24:56 <alaski> I'm trying to get the right people together for this discussion
17:25:06 <alaski> and not limit it to the summit, but get it going before then
17:26:01 <alaski> there seem to be two main issues we face, scheduling and api contracts
17:26:01 <belmoreira> alaski: but what is the real goal on this? having a neutron for cell?
17:26:35 <alaski> belmoreira: having nova/neutron work together when Nova is using cells
17:26:54 <alaski> Neutron would like to look at cells for themselves, but I'm not sure it's clear what that means
17:27:09 <bauzas> alaski: different nets and subnets for cells then ?
17:27:18 <belmoreira> ok, but what is the problem today?
17:27:34 <belmoreira> bauzas: got it.
17:28:04 <alaski> bauzas: perhaps.  Right now I think it's unclear whether that's a requirement
17:28:11 <bauzas> alaski: I mean, if we only consider a global network for the cloud, then Neutron doesn't necessarily need to talk cells ?
17:28:18 <alaski> bauzas: right
17:28:37 <alaski> but there are at least two deployments that don't work that way
17:28:51 <bauzas> alaski: ack, so that's not a design problem, but rather a feature
17:29:18 <bauzas> mmm
17:29:34 <alaski> ideally Neutron would work for both cases, but how that's accomplished is an open problem atm
17:29:47 <bauzas> you can have multiple subnets in Neutron, you only need a global L2
17:30:02 <belmoreira> alaski: we have the same problem. subnets per cells. However we are looking into solving it in a different way
17:30:38 <bauzas> so maybe the question is : is Neutron able to scale its L2 network with cells ?
17:31:02 <alaski> belmoreira: good to know.  I'd like to ensure you or someone from CERN is involved, as well as people from Rackspace, and then whoever is interested
17:31:36 <alaski> bauzas: that may be a good way to frame it
17:32:59 <bauzas> alaski: so then, Neutron folks should be interested in that
17:33:17 <bauzas> alaski: because that's a scale problem
17:33:25 <alaski> I don't have the knowledge of Neutron/networking that I'd like to have, so I'm going to be relying on others to know the details while I'm learning
17:34:00 <bauzas> alaski: maybe inviting Neutron folks to our meeting is worth it ?
17:34:22 <alaski> bauzas: I was told that they are interested.  And this is a good opportunity for them to learn about what cells is
17:34:33 <bauzas> alaski: totally agree with you
17:35:11 <alaski> bauzas: +1.  It would help if there were some people more knowledgeable on networking than I to help represent the Nova side
17:35:43 <alaski> I'm trying to pull in some rackspace networking folks to help
17:36:03 <alaski> and belmoreira when he's available
17:36:31 <belmoreira> alaski: yes, I'm interested on this
17:37:00 <alaski> great
17:37:13 <alaski> so my plan at the moment is primarily to get the right people talking
17:37:24 <bauzas> makes sense
17:37:51 <alaski> #topic Open Discussion
17:38:22 <alaski> anything on anyone's mind?
17:38:30 <dheeraj-gupta-4> I'd like to get some feedback on https://review.openstack.org/#/c/150381/ . Like is it in line with what plans for cells are
17:38:37 <dheeraj-gupta-4> or is it totally orthogonal
17:39:27 <alaski> It looks like a really good start, and in line with the plans
17:39:48 <dheeraj-gupta-4> The two/one configuration thing is still fuzzy
17:39:56 <bauzas> dheeraj-gupta-4: I didn't have time to review your latest PS
17:40:40 <bauzas> dheeraj-gupta-4: but very quickly glancing at it, I'm +1 on it
17:40:42 <vineetmenon> any clarification about how to pass connection_url?
17:40:48 <dheeraj-gupta-4> bauzas: no worries...I tried fixing it
17:41:01 <alaski> dheeraj-gupta-4: it could use an answer to dansmith's comment about how this will be used in practice
17:41:27 <dheeraj-gupta-4> and it will probably also cross paths with what alaski is doing with DB
17:41:31 <alaski> dheeraj-gupta-4: i.e. how to make a db call to one of these engines
17:42:10 <bauzas> alaski: I just think we just need a new opt for the cells V2 DB
17:42:15 <dheeraj-gupta-4> Well.... to my mind the function calling the DBAPI method will supply the connection_url
17:42:41 <bauzas> alaski: so when calling the engine, we're just importing the opt
17:43:13 <dheeraj-gupta-4> Like say nova-api on cell show will first call a dbapi.get_cells() and then for each connection_url it will do a dbapi.whatever(connection_url)
17:43:17 <dheeraj-gupta-4> something like that
17:43:24 <alaski> bauzas: I'm thinking about when calling into different cells, the same query could go to multiple dbs
17:43:55 <bauzas> alaski: oh you mean a global engine for multiple DBs ?
17:43:58 <dheeraj-gupta-4> bauzas: Within oslo.db it is not possible
17:43:59 <alaski> dheeraj-gupta-4: it would be good to see an example of that in a patch
17:44:23 <alaski> dheeraj-gupta-4: and you might consider putting that info into the context so that each api method doesn't need to be updated
17:44:28 <bauzas> alaski: I think we need to have one engine per child cell
17:44:31 <dheeraj-gupta-4> but with the changes you asked me to do (moving the register_opts out of oslo.db and into nova) we may be able to get two sections in the conf
17:44:40 <dheeraj-gupta-4> bauzas: ^
17:44:57 <bauzas> dheeraj-gupta-4: I was not thinking of an oslo.db opt
17:45:02 <alaski> bauzas: right
17:45:18 <alaski> bauzas: and then a way to pick an engine per db call
17:45:34 <dheeraj-gupta-4> alaski: context is a security thing as per my limited understanding no?
17:45:41 <bauzas> alaski: well, that's partially done by dheeraj-gupta-4 in his patch
17:45:52 <bauzas> alaski: because there is a dict keyed on the connection string
17:46:09 <alaski> dheeraj-gupta-4: not entirely.  it's used for policy but it does more than that, and seems like a good fit for this
17:46:38 <alaski> it holds information about the environment a request is being made in
17:46:40 <dheeraj-gupta-4> alaski: ok, I'll look into that
17:46:43 <bauzas> alaski: by speaking about context, you mean a nova context, or a python context manager ?
17:46:52 <alaski> bauzas: nova context
17:47:21 <bauzas> alaski: well, I don't think we should have one context per cell
17:47:51 <alaski> bauzas: agreed.  but the context could hold information about which db a request should go to
17:47:53 <vineetmenon> bauzas, +1
17:48:36 <bauzas> alaski: oh I see, you mean caching the connection information into the context object ?
17:48:36 <vineetmenon> so, you could possibly have as many contexts as you have cells
17:49:04 <alaski> I'm going to steal from dansmith again.  He gave an example of "target_cell(context, cell) as targetted_context: do_thing(targetted_context, ...)" in a discussion I had with him a while back
17:49:10 <bauzas> alaski: IIUC, that's only nova-api which needs to handle multiple connections right ?
17:49:12 <dheeraj-gupta-4> alaski: but when API makes calls to different cells, it changes the connection info in the context or creates a new context?
17:49:24 <alaski> dheeraj-gupta-4: changes it
17:49:38 <alaski> bauzas: right
17:49:59 <bauzas> alaski: so then the local context is kinda transient ?
17:50:02 <dheeraj-gupta-4> alaski: makes sense (though I'm still a bit hazy about it all)
17:50:13 <alaski> a context manager could set something in the context on a per request basis
17:50:29 <bauzas> alaski: got your idea
17:50:44 <alaski> bauzas: the connection data would be, similar to is_admin on the context
17:51:03 <bauzas> alaski: yeah understood, just need to think about the benefits
17:51:16 <bauzas> alaski: as I said, that's very local to n-api
17:51:51 <alaski> bauzas: sure.  The thing I really like is this means not modifying every db.api method, and using a context manager means the state is set only when needed
17:51:53 <dheeraj-gupta-4> alaski: the dansmith example is really helpful.... so using that we basically don't need to change API signatures, only the methods themselves and the callers
17:52:13 <alaski> dheeraj-gupta-4: exactly
17:52:14 <bauzas> alaski: yeah, as I said, a context manager used for caching it
17:52:57 <dheeraj-gupta-4> alaski: thanks for clearing that up
17:53:11 <alaski> dheeraj-gupta-4: np
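
The target_cell idea discussed above can be sketched roughly as follows. This is a minimal illustration, not nova code: RequestContext, the db_connection attribute, the cell dict layout, and instance_count are all hypothetical names chosen for the example. It assumes, as alaski describes, that the context manager changes the connection info on the existing context for the duration of the block rather than creating a new context, and that DB API methods keep their signatures and read the target from the context.

```python
import contextlib


class RequestContext:
    """Hypothetical stand-in for a nova request context."""

    def __init__(self, user_id, is_admin=False):
        self.user_id = user_id
        self.is_admin = is_admin
        # None means "use the default database from the config file".
        self.db_connection = None


@contextlib.contextmanager
def target_cell(context, cell):
    """Point the context at one cell's database for the duration of the block."""
    previous = context.db_connection
    context.db_connection = cell["connection_url"]
    try:
        yield context
    finally:
        # Restore the earlier target so the state is set only when needed.
        context.db_connection = previous


def instance_count(context):
    """Hypothetical DB API method: the signature is unchanged and the
    target database is read from the context."""
    return "querying %s" % (context.db_connection or "default")


def query_all_cells(context, cells):
    """Run the same query against every cell, as nova-api might."""
    results = []
    for cell in cells:
        with target_cell(context, cell) as targeted_context:
            results.append(instance_count(targeted_context))
    return results
```

The caller wraps each per-cell call in target_cell, so only the callers and the method bodies change, which matches the point made at 17:51:53.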
17:53:21 <dheeraj-gupta-4> alaski: Another thing..... the two databases still bug me
17:53:36 <dheeraj-gupta-4> shouldn't there be only one DB the service *really* cares about
17:53:36 <alaski> in what way?
17:53:44 <dheeraj-gupta-4> for n-api that is nova_api
17:53:54 <dheeraj-gupta-4> for n-cpu the standard nova
17:54:20 <alaski> for everything except nova-api that will be true
17:54:36 <dheeraj-gupta-4> so while putting it into code, we don't explicitly look to handle two DBs
17:54:40 <alaski> but nova-api will need to be able to return data that lives in the cells
17:54:57 <dheeraj-gupta-4> yes but that can be done through mechanism we discussed
17:55:50 <dheeraj-gupta-4> The point I am trying to make is that do we really need two explicit engines in the DB API  - Say nova_engine and api_engine?
17:56:09 <dheeraj-gupta-4> the default engine will always point to the correct DB as per the config file in use
17:56:15 <bauzas> dheeraj-gupta-4: from an API PoV, there are N+1 engines to manage
17:56:17 <dheeraj-gupta-4> sorry if it is all confusing
17:56:27 <bauzas> dheeraj-gupta-4: N being the number of cells
17:57:09 <alaski> I see what you've worked on as being nova_engine
17:57:14 <alaski> and api_engine doesn't exist yet
17:57:27 <dheeraj-gupta-4> for n-api the nova_engine _is_ the api_engine
17:57:30 <dheeraj-gupta-4> that is my point
17:58:00 <alaski> gotcha
17:58:08 <dheeraj-gupta-4> because the n-api configuration will point it to the correct database
17:58:27 <alaski> I think we do want to be explicit, and I've had reservations about overloading the config option that way
17:58:54 <dheeraj-gupta-4> ok...well that was my line of thought during the WIP patch
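
The "dict keyed on the connection string" and the N+1 engines mentioned above could look something like the sketch below. It is only an assumption about the shape of such a cache: EngineCache, fake_create_engine, and the example URLs are hypothetical, and the real patch under review may be organized differently. The engine factory is passed in so the example does not depend on any particular database library.

```python
class EngineCache:
    """Hypothetical cache holding one engine per connection string."""

    def __init__(self, create_engine):
        # create_engine is any callable that turns a URL into an engine.
        self._create_engine = create_engine
        self._engines = {}

    def get(self, connection_url):
        """Return the engine for this URL, creating it on first use."""
        if connection_url not in self._engines:
            self._engines[connection_url] = self._create_engine(connection_url)
        return self._engines[connection_url]

    def __len__(self):
        return len(self._engines)


def fake_create_engine(connection_url):
    """Stand-in factory for the example; a real one would open a connection."""
    return {"url": connection_url}


# nova-api view: one API database plus N cell databases gives N+1 engines.
api_db_url = "mysql://host/nova_api"          # illustrative URL
cell_db_urls = ["mysql://cell1/nova", "mysql://cell2/nova"]  # illustrative URLs

cache = EngineCache(fake_create_engine)
for url in [api_db_url] + cell_db_urls:
    cache.get(url)
```

Keeping the API database as its own explicit entry, rather than reusing the default engine, is one way to stay explicit about which database a call targets.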
17:59:08 <bauzas> 1 min to go :)
17:59:21 <dheeraj-gupta-4> alaski: Would you mind commenting on the WIP patch with your PoV on this
17:59:26 <dheeraj-gupta-4> I don't want to hold the meeting up
17:59:28 <alaski> dheeraj-gupta-4: sure
17:59:43 <alaski> it's a good point, and we should all be on the same page there
17:59:48 <dheeraj-gupta-4> yep
18:00:02 <dheeraj-gupta-4> I tried making the same point to dan there but....
18:00:09 <alaski> I think that's it for today
18:00:15 <alaski> thanks everyone!
18:00:20 <alaski> #endmeeting