20:03:15 <n0ano> #startmeeting
20:03:16 <openstack> Meeting started Thu May 10 20:03:15 2012 UTC.  The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:03:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:03:35 <n0ano> no, we're good for orchestration
20:04:12 <maoy> cool
20:04:45 <n0ano> now if sriram appears we'll have full quorum
20:05:13 <maoy> my WIP feature branch is at github
20:05:27 <maoy> that seems like the way to do it for openstack for now.
20:06:23 <n0ano> I think that follows the BKM (Best Known Method), in fact I think you're the first one to do that.
20:06:46 <maoy> glad to be the lab rat..
20:07:31 <n0ano> we like to call it `bleeding edge` :-)
20:07:49 <n0ano> have you had anyone look at your feature branch yet?
20:08:26 <maoy> Yes. Mark McLoughlin and Jay Pipes
20:08:44 <maoy> i'm also in contact with some folks from IBM and NTT
20:09:00 <n0ano> excellent, any feedback so far?
20:09:15 <maoy> yes, some inline comments at github.
20:09:32 <maoy> will have a much better update next week
20:09:50 <n0ano> sounds good
20:09:53 <maoy> i'm not entirely sure if I should rebase or merge the new update though..
20:10:16 <n0ano> I would think rebasing would be the way to go, is there a problem?
20:10:20 <maoy> perhaps i should just use a different branch every time..
20:10:53 <maoy> and rebase
20:11:37 <n0ano> branches are very cheap in git, I use them extensively
20:11:56 <n0ano> pretty much, when in doubt I create a new branch
20:13:54 <maoy> about the blueprint, i'm inclined to update the blueprint in place rather than create a new one
20:14:27 <n0ano> works for me, that should actually create a history which is good
20:14:35 <vishy> I have some comments about orchestration stuff
20:14:47 <vishy> esp. regarding maoy's proposed code
20:15:07 <maoy> great
20:15:40 <maoy> i was hoping to hear from you vishy..
20:16:02 <vishy> maoy: should i mention now?
20:16:06 <n0ano> #topic proposed code
20:16:35 <vishy> so first the major concern: we are trying to get rid of all db access in nova-compute
20:17:22 <maoy> yes please.
20:17:39 <maoy> that should work when the zookeeper backend is in.
20:18:31 <maoy> without database access, i'm assuming there is a place to write persistent state, such as for the health monitor or capability reporting
20:20:27 <vishy> maoy: so there are two other things
20:20:48 <vishy> a) if compute isn't hitting the db, I don't think we need distributed state management in compute
20:21:22 <vishy> b) it is possible that distributed state isn't needed at all.  Some people have suggested that there are lock-free approaches which might save us a lot of extra work
20:21:40 <vishy> the scheduler could be a different story
20:22:08 <vishy> but for individual vm state management i think an in-memory state machine is probably fine on the compute node
20:22:35 <vishy> here is the general principle that I'm going to suggest
20:23:06 <vishy> user requests come into api and they are performed by simply making a call to compute and succeeding or failing
20:23:27 <vishy> state is propagated back up from compute to api periodically
20:23:52 <vishy> the api node doesn't need to make decisions about state because it lets the owning node do it
20:24:13 <vishy> there are a few special cases which need to be considered but this can be solved in a lock-free way as well.
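A minimal sketch of the flow vishy describes above, purely illustrative (ComputeNode, ApiNode, and report_state are made-up names, not nova code): the api side simply calls the owning compute node, which succeeds or fails and periodically reports state back up; the api side never decides state itself.

    # Minimal sketch of the flow described above; not actual nova code.
    class ComputeNode(object):
        """Owns its instances' state; nothing else writes it."""

        def __init__(self):
            self.instances = {}  # instance_id -> state string

        def reboot(self, instance_id):
            # Perform the request locally and simply succeed or fail.
            self.instances[instance_id] = 'rebooting'
            try:
                self._do_reboot(instance_id)
                self.instances[instance_id] = 'active'
            except Exception:
                self.instances[instance_id] = 'error'
                raise

        def _do_reboot(self, instance_id):
            pass  # hypervisor driver call would go here

        def report_state(self, api):
            # Called periodically: push state up; the api side only records it.
            for instance_id, state in self.instances.items():
                api.record_state(instance_id, state)


    class ApiNode(object):
        """Keeps only the last state reported by the owning node."""

        def __init__(self):
            self.view = {}

        def record_state(self, instance_id, state):
            self.view[instance_id] = state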
20:24:28 <maoy> this should work if the state is local. e.g. the compute node owns the VM
20:24:30 <vishy> thoughts?
20:24:43 <maoy> but my concerns are mostly non local state:
20:24:48 <vishy> maoy: such as?
20:25:06 <maoy> a) volume + vm needs to work together, also network
20:25:08 <maoy> b) vm migration
20:25:55 <vishy> i think a) makes sense and so there may be a need for that kind of state management at a higher layer
20:26:21 <vishy> although I'm not totally sure we are doing anything complicated enough there to warrant distributed locking
20:26:51 <vishy> b) what kind of state is important in this case? and does it need to be managed on multiple nodes?
20:27:25 <maoy> for b) which node owns the VM? the source or the target?
20:29:34 <vishy> maoy: the source until the migration is complete
20:30:10 <vishy> maoy: the two nodes already need to communicate directly to perform the migration so having a higher level lock arbiter seems like a bit of overkill in this case
20:30:25 <vishy> maoy: but perhaps there is a complicated case where it would be necessary
20:33:21 <maoy> vishy: there might be tricky crash cases where it's not clear who owns what..
20:33:42 <vishy> maoy: I think in general i would prefer if we are doing distributed locking that it does not happen in the compute worker
20:33:58 <vishy> maoy: i want the compute worker to be as dumb as possible and have access to as little as possible
20:34:15 <maoy> vishy: regardless of how it's implemented, the task abstraction still holds.
20:34:24 <vishy> maoy: however it probably needs an internal state machine
20:35:08 <vishy> maoy: to handle some of the transitions required.
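A sketch of the kind of in-memory state machine mentioned here; the transition table is invented for illustration and is not nova's actual set of vm states.

    # Illustrative in-memory state machine for a compute worker; the
    # transition table is made up for this sketch, not nova's real states.
    ALLOWED = {
        ('active', 'reboot'): 'rebooting',
        ('rebooting', 'reboot_done'): 'active',
        ('active', 'migrate'): 'migrating',
        ('migrating', 'migrate_done'): 'active',
    }


    class InstanceStateMachine(object):
        def __init__(self, initial='active'):
            self.state = initial

        def transition(self, event):
            key = (self.state, event)
            if key not in ALLOWED:
                raise ValueError('illegal event %r in state %r'
                                 % (event, self.state))
            self.state = ALLOWED[key]
            return self.state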
20:35:58 <maoy> vishy: ok. points taken. but i don't think the locking mechanism i have in mind is more complicated than local locks.
20:36:20 <vishy> maoy: otherwise i like the idea of tracking actions and steps via something like what you proposed. In fact I tried to make a generalized task system for python here https://github.com/vishvananda/task
20:36:37 <vishy> maoy: before i discovered that celery does essentially the same thing only better :)
20:37:48 <maoy> vishy: i need to look into celery. does celery allow you to kill tasks and recycle locks/resources?
20:38:05 <vishy> maoy: not sure I never got into it that deeply
20:38:55 <maoy> vishy: so even within the compute node, the task tracking and kill-task functions are still necessary..
20:38:56 <vishy> maoy: doesn't look like it has it out of the box: http://loose-bits.com/2010/10/distributed-task-locking-in-celery.html
20:39:22 <vishy> maoy: I agree, I just don't want it to have to talk to a centralized db/zookeeper if possible
20:39:50 <vishy> maoy: and I wonder how much of it is already implemented in the drivers
20:39:53 <maoy> vishy: i see your point. that's one backend change. right? from a centralized db to an in-memory one..
20:40:02 <vishy> maoy: as in xen and libvirt already have to handle state management
20:40:21 <vishy> maoy: so we may get a lot of it for free
20:40:47 <maoy> vishy: i saw those. i was actually planning to use the state management code as well.
20:40:48 <vishy> maoy: by just going try: reboot() except: libvirtError rollback()
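A hedged sketch of "getting it for free" from the driver as suggested above, assuming the libvirt python bindings and an already-looked-up domain handle; libvirt.libvirtError is the real exception class, while rollback() stands in for whatever local cleanup would be needed.

    # libvirt.libvirtError is the real exception class; rollback() is a
    # hypothetical local cleanup, and dom is a looked-up libvirt domain.
    import libvirt


    def safe_reboot(dom, rollback):
        try:
            dom.reboot(0)          # libvirt raises libvirtError on failure
        except libvirt.libvirtError:
            rollback()             # undo any local bookkeeping
            raise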
20:41:42 <vishy> maoy: true, but I wonder if using the db layer is necessary at all.
20:42:15 <vishy> maoy: you could use in memory sqlite but that is going to do table locking and nastiness
20:42:42 <vishy> maoy: so maybe something specifically designed to handle that kind of stuff would be better.
20:42:44 <maoy> vishy: an in-memory hash table is enough. actually that's how i started.
20:43:12 <vishy> maoy: That seems like a great place to start, do a simple in memory one
20:43:19 <vishy> maoy: we may find that is all we need.
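A minimal dict-based, in-memory task tracker along the lines maoy describes; the names and structure are illustrative assumptions, not nova code.

    # Minimal in-memory task tracker; illustrative only.
    import threading
    import uuid

    _tasks = {}
    _lock = threading.Lock()


    def start_task(instance_id, action):
        task_id = str(uuid.uuid4())
        with _lock:
            _tasks[task_id] = {'instance': instance_id,
                               'action': action,
                               'steps': []}
        return task_id


    def log_step(task_id, step):
        with _lock:
            _tasks[task_id]['steps'].append(step)


    def tasks_for_instance(instance_id):
        with _lock:
            return [t for t in _tasks.values()
                    if t['instance'] == instance_id]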
20:43:24 <maoy> vishy: but I felt that the information is useful for ops to gain insight into the system in general, so keeping the log in the db is not a bad idea.
20:44:03 <vishy> maoy: hmm i guess that is a good point.  There is a review in to store running greenlets, have you seen it?
20:44:03 <maoy> vishy: the thing is, once the task traverses the node boundary, e.g. from compute to network, you lose the context
20:44:50 <maoy> vishy: not yet.. link plz..
20:44:55 <vishy> maoy: https://review.openstack.org/#/c/6694/
20:45:22 <vishy> maoy: so this seems like it is solving a very similar problem
20:46:01 <vishy> maoy: especially if we add subtasks/logging to the idea
20:46:32 <vishy> maoy: persistence is also a possibility but I feel like we could add that later if needed.
20:46:33 <maoy> vishy: ok. will take a look. it's local task tracking or cross node? i can't tell from the title..
20:47:42 <maoy> vishy: i can't connect the blueprint with the patch title. perhaps i should ping JE and read the code for more details.
20:47:59 <vishy> maoy: yeah do that
20:48:03 <vishy> maoy: it is just local
20:48:22 <vishy> maoy: and it is specific to greenthreads (no further granularity)
20:48:39 <maoy> vishy: i'd also have a function where the ops can just say, find all running tasks against that VM, kill them if necessary
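A sketch of the "find all running tasks against that VM and kill them" operation maoy asks for, assuming tasks run as eventlet greenthreads (as in the review linked above); the registry and function names are hypothetical.

    # Hypothetical registry of running greenthreads keyed by instance.
    import eventlet

    _running = {}  # instance_uuid -> set of greenthreads


    def spawn_task(instance_uuid, func, *args):
        gt = eventlet.spawn(func, *args)
        _running.setdefault(instance_uuid, set()).add(gt)
        # Drop the greenthread from the registry once it finishes.
        gt.link(lambda done_gt: _running[instance_uuid].discard(done_gt))
        return gt


    def kill_tasks(instance_uuid):
        for gt in list(_running.get(instance_uuid, ())):
            gt.kill()  # raises GreenletExit inside the task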
20:48:56 <vishy> maoy: yes i think that is where the patch tries to get
20:49:04 <vishy> maoy: you should probably sync up with him
20:49:30 <maoy> vishy: then i thought migration might make this tricky so a centralized version is dead simple to get started.
20:49:33 <maoy> vishy: yeah sure.
20:50:14 <maoy> vishy: i have some VM+EBS race conditions in my amazon cloud so I'd like to get that right in openstack. :)
20:50:35 <vishy> maoy: i think we can see how far we get without centralizing.  I agree that we will need it for higher-level orchestration
20:50:40 <maoy> vishy: but local task tracking is definitely composable with a global/distributed one
20:51:00 <vishy> maoy: but that could be something that lives above nova / quantum / cinder
20:51:37 <maoy> vishy: that's indeed what's in my mind but i have to start from somewhere.. so nova..
20:52:12 <vishy> maoy: also check out this one https://review.openstack.org/#/c/7323/
20:52:34 <vishy> maoy: it looks like johannes is trying to solve the same problems as you, so you should probably communicate :)
20:53:07 <maoy> vishy: ok. that means i'm solving the right problems at least. :)
20:54:23 <maoy> vishy: are there more docs on how to get rid of the db?
20:54:32 <maoy> vishy: at compute.
20:56:25 <maoy> vishy: I'm afraid we might have to abuse rabbitmq more to extract state from compute nodes.
20:57:47 <n0ano> compute nodes are already sending state info to the scheduler, can you ride on top of that?
20:58:23 <vishy> maoy: i don't know if there are docs yet
20:58:39 <vishy> maoy: but the idea is to just allow computes to report state about their vms
20:58:52 <vishy> maoy: and all relevant info will be passed in through the queue
20:59:46 <vishy> maoy: my initial version was going to make the api nodes listen and just throw data back in a periodic task
21:00:22 <vishy> maoy: and update the state of the instance on the other end
21:00:58 <vishy> if we keep the user requested state as a separate field, then we don't run into weird timing collisions
21:02:05 <maoy> vishy: i'm not sure i follow this. but it seems like the api nodes, besides translating api calls to compute/network apis, also monitor the task execution status?
21:02:34 <vishy> maoy: no not task execution status, just vm state
21:03:33 <vishy> maoy: nova-api is just an easy place to put the receiving end of the call, it could also be a separate worker: nova-instance-state-writer or some such
21:04:07 <maoy> vishy: got you
21:05:09 <maoy> vishy: so the vm state change in the db now happens in n-cpu, but will be rpc-ed to nova-state-writer, which does the db ops
21:05:28 <vishy> maoy: correct
21:05:57 <vishy> maoy: and the calls from api -> compute will pass in all the relevant info so it doesn't need to read from the db either
21:06:08 <vishy> i.e. the entire instance object instead of just an id
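A rough sketch of the direction described here: compute never reads or writes the db, the api passes the full instance dict into the call, and state changes are cast to a separate writer service. The rpc helper below is a stand-in, not nova's actual rpc API.

    # The rpc object here is a stand-in, not nova's actual rpc API.
    def reboot_instance(rpc, instance):
        # 'instance' is the entire instance dict passed in by the api,
        # so no db read is needed on the compute side.
        do_hypervisor_reboot(instance)
        rpc.cast('nova-instance-state-writer',
                 {'method': 'update_state',
                  'args': {'instance_uuid': instance['uuid'],
                           'vm_state': 'active'}})


    def do_hypervisor_reboot(instance):
        pass  # driver call would go here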
21:06:35 <maoy> vishy: great. that makes sense.
21:09:02 <maoy> vishy: i will take a closer look at the code in review and see how that fits the task management i have.
21:09:49 <maoy> vishy: will make the backend pluggable to fit both the local in-memory and distributed cases.
21:11:12 <maoy> vishy: I wish I saw Johannes's patch earlier..
21:11:26 <vishy> maoy: hard to keep track of this stuff, I know :)
21:11:26 <maoy> vishy: is there any attempt on utilizing celery by anyone else?
21:11:36 <vishy> maoy: not that i know of
21:13:14 <maoy> vishy: ok. so i'll ignore it for now. :)
21:13:47 <maoy> vishy: where would the compute node health status update go without db?
21:14:20 <maoy> i know the IBM folks are working on a zookeeper backend for that.
21:14:40 <vishy> maoy: passed through the queue most likely
21:15:15 <maoy> is this going to happen in folsom or later release?
21:16:19 <vishy> maoy: we are going to try and get all db access out in folsom
21:16:29 <vishy> maoy: but we will see how it goes
21:17:20 <maoy> vishy: what about the list of which VMs should be running on the node -- the one used periodically to compare against libvirt/xenapi
21:17:38 <maoy> vishy: does that mean the compute node need to maintain a local copy?
21:17:59 <vishy> maoy: I don't think so, I think the periodic task could be initiated by api/external worker
21:18:31 <vishy> maoy: it could glob the instances directory periodically or something
21:18:54 <vishy> maoy: but having a separate data store I don't think would be needed
21:19:30 <vishy> maoy: alternatively it could keep a list in memory, and make a request out to api/scheduler/nova-db-reader or something and get a list when it starts up
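A sketch of the two options mentioned: glob the instances directory, or ask a db-reading service once at startup and keep the list in memory. The instances path and the rpc call name are assumptions, not actual nova code.

    # Both the instances path and the rpc call name are assumptions.
    import glob
    import os


    def instances_from_disk(instances_path='/var/lib/nova/instances'):
        # Option 1: glob the instances directory to see what should exist.
        return [os.path.basename(d)
                for d in glob.glob(os.path.join(instances_path, '*'))
                if os.path.isdir(d)]


    def instances_from_api(rpc):
        # Option 2: ask an api/scheduler/db-reader service once at startup
        # and keep the returned list in memory afterwards.
        return rpc.call('nova-db-reader', {'method': 'instances_on_host'})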
21:20:07 <maoy> vishy: ok. sounds like a lot of changes. will this happen gradually in trunk or on a feature branch?
21:20:18 <vishy> maoy: feature branch i think
21:20:37 <vishy> we are trying to pull staged changes out of trunk
21:22:19 <maoy> vishy: ok. will keep an eye on it. thanks!
21:25:19 <maoy> vishy: i would imagine there are some tricky cases to get the periodic tasks right on n-cpu. but in general i think making n-cpu dumb is the right direction.
21:28:57 <maoy> n0ano: i think we are done in the discussion.
21:29:05 <maoy> vishy: thanks so much for jumping in. :)
21:29:50 <n0ano> sounds good
21:30:25 <n0ano> is there a resolution that needs tp be documented?
21:31:57 <vishy> maoy: yw
21:33:09 <maoy> n0ano: tp?
21:33:36 <n0ano> tp - sorry, don't know the abbreviation
21:39:05 <maoy> n0ano: oh i think you mean "needs to be documented". right?
21:39:24 <n0ano> oops
21:39:28 <n0ano> s/tp/to
21:39:34 <maoy> we have the meeting log for everything, right?
21:39:51 <maoy> not sure about resolution..
21:39:59 <n0ano> yep, if you don't have a succinct summary, that is sufficient.
21:40:58 <n0ano> let's go with the full log and we'll talk again next week
21:41:04 <n0ano> #endmeeting