20:00:10 <harlowja> #startmeeting state-management
20:00:10 <openstack> Meeting started Thu May 30 20:00:10 2013 UTC.  The chair is harlowja. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:14 <openstack> The meeting name has been set to 'state_management'
20:00:29 <harlowja> Hi all, rollcall if anyone is interested :)
20:00:50 <changbl> harlowja, hello there
20:00:52 <harlowja> howdy!
20:01:17 <alexpilotti> Ho
* harlowja gives others a few minutes
20:02:14 <harlowja> #link https://wiki.openstack.org/wiki/Meetings/StateManagement#Agenda_for_next_meeting
20:02:45 <adrian_otto> hi
20:03:05 <harlowja> hey
20:03:31 <harlowja> so seems like we have enough people
20:03:48 <harlowja> can start off with a little status of the library (from what i know)
20:03:56 <harlowja> #topic status
20:04:12 <harlowja> anyone that has done anything with the library feel free to let others know :)
20:04:43 <changbl> Looks like you guys have some good progress, I am still trying to catch up with you folks
20:04:49 <harlowja> i've been adding more decorators in, helping rackspace folks get their changes in for the db 'backend'
20:04:53 <harlowja> changbl thx!
20:04:58 <changbl> harlowja, the code is mainly in TaskFlow right?
20:05:04 <changbl> not NovaOrc?
20:05:10 <harlowja> correct
20:05:28 <harlowja> #link https://github.com/Yahoo/TaskFlow
20:05:34 <changbl> ok, I will check taskflow
20:05:35 <harlowja> thats the current location until the stackforge move finishes
20:05:56 <changbl> so we are moving taskflow to stackforge?
20:06:08 <harlowja> ya, so that its not just a yahoo (aka me) thing :)
20:06:37 <harlowja> and we can use the review system that everyone involved in openstack knows
20:06:39 <harlowja> and all that
20:06:40 <kebray> hello.
20:06:43 <changbl> sure
20:06:44 <harlowja> hi kebray
20:07:06 <harlowja> kebray any updates from your side? just going through a little status of 'the world'
20:07:28 <changbl> harlowja, do I need to check anything on NovaOrc?
20:07:59 <harlowja> changbl if u just want to see how some of the nttdata folks and i started shifting code around in nova, then its useful
20:07:59 <kebray> No updates from me other than I know Jessica was reworking her code…
20:08:08 <harlowja> kebray thx
20:08:19 <changbl> harlowja, ok
20:08:26 <kebray> she has an idea for distributed task flow management, but the first path she went down didn't pan out.. but, she's reworking it.
20:08:32 <harlowja> all good :)
20:08:43 <kebray> She's out of office today.
20:09:06 <harlowja> np
20:09:14 <changbl> kebray, is Jessica working on ZK backend now?
20:09:39 <kebray> changbl: not at the moment… last I heard from her I think she said someone else is working on that.  maybe you?
20:10:09 <kebray> She is working on distributed state management, using celery I believe.
20:10:21 <harlowja> #topic whose-working-on-what
20:10:23 <kebray> as opposed to linear/sequential state management.
20:10:26 <changbl> kebray, I had a very busy week... but I plan to work on ZK backend
20:10:34 <harlowja> changbl i think thats still an open item
20:10:45 <kebray> changbl: excellent.  We have an interest. so happy to hear that.
20:11:07 <changbl> kebray, sounds good
20:11:10 <harlowja> btw i'm working on plugging cinder in (and just the general library goodness)
20:11:37 <harlowja> i think kevin and jessica (from rackspace) are doing the distributed states (with celery) + db backend
20:11:37 <kebray> harlowja:  where is the code living now?  Is it on stackforge?
20:11:54 <kebray> harlowja:  correct about kevin and Jessica.
20:12:08 <harlowja> kebray so its almost to stackforge, i think jessica has to do one more review commit :)
20:12:18 <harlowja> #link https://review.openstack.org/#/c/30789/
20:12:26 <harlowja> someone complained about whitespace :(
20:12:40 <changbl> lol
20:12:56 <changbl> quite some effort to put a project on stackforge
20:12:59 <harlowja> def
20:13:40 <alexheneveld> howdy -- sorry i'm late
20:13:45 <changbl> one question, what is the diff between zk backend and celery?
20:13:45 <harlowja> i believe another guy who is involved might also be looking into the ZK stuff as well changbl, haven't heard much though, can connect u 2
20:14:02 <changbl> harlowja, sure
20:14:05 <changbl> let me know
20:14:16 <harlowja> cool
20:14:20 <harlowja> hi alexheneveld
20:14:28 <harlowja> *we are just going over who is doing what as of right now :)
20:14:41 <adrian_otto> alexheneveld: hi
20:14:43 <harlowja> which does bring up a good point about how putting stuff on say launchpad might help it become more clear
20:14:52 <harlowja> but maybe after the stackforge move we can do that?
20:15:00 <harlowja> like launchpad/~taskflow or something
20:15:13 <changbl> harlowja, one question, what is the diff between ZK backend and celery? I see a celery folder in backends/ in Taskflow
20:15:38 <harlowja> its a good question, and comes down to what we want the ZK backend to do :)
20:15:57 <harlowja> celery is more of a way to run tasks, ZK wouldn't necessarily be a 'running mechanism'
20:16:14 <harlowja> does that make sense?
20:16:30 <harlowja> #link http://www.celeryproject.org/
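(A minimal sketch of what "a way to run tasks" means in celery: a plain function becomes a task that worker processes pull from a broker and execute. The broker URL and task name here are placeholders.)

    # minimal celery app; any running worker picks queued calls up
    from celery import Celery

    app = Celery('tasks', broker='amqp://guest@localhost//')

    @app.task
    def create_volume(volume_id):
        # the actual work a worker would perform for this task
        print('creating volume %s' % volume_id)

    # create_volume.delay('vol-1') queues the call instead of running it inline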
20:16:30 <changbl> I once used ZK to implement a distributed queue
20:16:31 <alexheneveld> thx harlowja hi adrian_otto - we've been spiking it (samc)
20:16:36 <changbl> yes, checking their website
20:16:38 <alexheneveld> celery supports ZK as a transport
20:16:42 <alexheneveld> not used it tho
20:16:52 <alexheneveld> TBH i think since we have a DB that is a logical choice
20:17:01 <alexheneveld> cool as ZK is of course :)
20:17:12 <harlowja> agreed, so ZK can be a 'storage' backend
20:17:25 <harlowja> but it also provides the other more interesting part of 'job transfer'
20:17:32 <harlowja> example
20:17:34 <alexheneveld> not sure if celery supports all the mutexes/alarms we need but that should be an abstract service
20:17:56 <changbl> harlowja, what do we use ZK to store?
20:17:58 <adrian_otto> we'll likely need a combination of both
20:18:21 <harlowja> conductor A gets job to do Y, conductor A claims job Y via ZK, conductor A does 1/2 of the tasks required for job Y, conductor A dies
20:18:42 <harlowja> now with ZK u can transfer job Y from A -> conductor B when u notice it dies
20:19:04 <adrian_otto> assuming it rolls back and restarts from the beginning?
20:19:21 <harlowja> #topic details
20:19:26 <changbl> harlowja, you mean leader election via ZK?
20:20:17 <harlowja> adrian_otto well rolling back may be appropriate in some of the workflows, but not all, some u may just be able to resume where conductor A left off
20:20:46 <adrian_otto> ok
20:20:51 <harlowja> changbl so if u think of each job as having an owner, it maps to the concept of ZK locks and ZK watches
20:21:09 <changbl> harlowja, yes, I used that before
20:21:34 <harlowja> when owner (conductor A) dies, ZK releases lock, then conductor B gets a notification via its watch on said lock, and then conductor B can attempt to acquire it
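(A rough sketch of that takeover pattern using the kazoo library; the lock path and the resume helper are hypothetical.)

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()

    # kazoo's Lock recipe creates an ephemeral node; if conductor A's
    # session dies, ZK deletes that node and waiting watchers are notified
    lock = zk.Lock('/taskflow/jobs/job-Y/lock', identifier='conductor-B')
    with lock:  # blocks until the lock is free (or its holder dies)
        resume_or_rollback('job-Y')  # hypothetical helper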
20:22:04 <harlowja> so then that brings up the question of what is stored to be able to allow conductor B to resume (or rollback) job Y
20:22:51 <changbl> harlowja, makes sense, that means ZK stores that info?
20:23:03 <changbl> i mean from where to resume
20:23:17 <harlowja> right, each task/flow that composes job Y creates some result, that result is stored 'somewhere' and that result set can be referenced later for rollback or resuming
20:23:25 <harlowja> so ZK can be one such place to store, or a DB can
20:23:44 <harlowja> kevin from rackspace is working on the DB place to store that
20:24:01 <harlowja> which started to show up this week (in the db backend)
20:24:32 <harlowja> an example of what said storage might look like (when printed); i was showing this to john griffith (the cinder PTL)
20:24:35 <harlowja> #link http://paste.openstack.org/show/37929/
20:24:40 <harlowja> *just an example*
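(One way to picture how stored results enable resumption; the storage calls below are invented for illustration, the real interface being taskflow's logbook.)

    def resume(flow, storage, job_id):
        # results of tasks that finished before the previous conductor died
        done = storage.fetch_results(job_id)  # hypothetical call
        for task in flow:
            if task.name in done:
                continue  # already ran; reuse the stored result
            done[task.name] = task.execute(done)
            storage.save_result(job_id, task.name, done[task.name])  # hypothetical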
20:24:55 <jgriffith> o/
20:25:10 <harlowja> said information could be exposed as 'what did my job do' API
20:25:22 <harlowja> *if desired*
20:25:48 <harlowja> changbl does that all make sense? :)
20:26:06 <changbl> harlowja, yes. Just wonder, do you have any code on this that I can check out?
20:26:22 <harlowja> for the ZK backend?
20:26:57 <harlowja> or the code that produced the above paste?
20:27:22 <changbl> The code which defines what to store, and how they call APIs
20:27:43 <harlowja> so right now that is
20:27:49 <harlowja> #link https://github.com/yahoo/TaskFlow/blob/master/taskflow/logbook.py
20:27:55 <ehudtr> so besides locking, you may want ZK to also keep the workflow state? why both db and ZK?
20:28:40 <kebray> some folks are opposed to zookeeper… also, it'll be good to have a lightweight db implementation for devstack me thinks.
20:29:14 <harlowja> yes, but if someone wants to work on a ZK 'storage' backend, then that seems fine no?
20:29:22 <kebray> Yes!
20:29:25 <kebray> agreed
20:29:50 <kebray> I'm not opposed to Zookeeper.   I'm pro modularity and plugable backend implementations.
20:29:55 <harlowja> which does bring up the good question of ZK ;)
20:30:01 <harlowja> #topic locks
20:30:05 <harlowja> #link https://wiki.openstack.org/wiki/StructuredWorkflowLocks
20:30:24 <harlowja> so if u guys want to check that out, its my idea of something that taskflow could provide
20:30:37 <harlowja> although maybe not at stage 1.0
20:31:37 <harlowja> when i was reading the google chubby paper, they say that not even google's own developers do locking right, so thats why they made chubby, so it'd be nice to offer an api that can help get it *somewhat* right
20:31:45 <harlowja> #link http://research.google.com/archive/chubby-osdi06.pdf
20:32:33 <harlowja> i just think that resource level locking will come up pretty quick, especially after talking to devananda  about ironic
20:32:39 <harlowja> #link https://wiki.openstack.org/wiki/StructuredWorkflowLocks#Ironic_.28WIP.29
20:33:11 <devananda> o/
20:33:25 <harlowja> ha, just talking about locking :-P
20:33:43 <harlowja> i think its something we should try to offer, but offer it 'very carefully'
20:33:46 <harlowja> thoughts?
20:33:47 <devananda> #link https://github.com/openstack/ironic/blob/master/ironic/manager/task_manager.py#L20
20:33:55 <devananda> is what i put together for ironic
20:34:23 <harlowja> thx devananda
20:34:40 <devananda> nova already does this in a fairly poor way, IMHO
20:34:43 <devananda> with 'task_state'
20:34:53 <devananda> very easy to get out of sync // lost
20:35:16 <harlowja> yup, nova confuses a task_state with a lock
20:35:25 <harlowja> when task_state should just be that, a task state :-P
20:35:34 <devananda> just look at "nova reset-state"
20:35:43 <devananda> that that even exists says bad things
20:35:45 <harlowja> ya :-/
20:36:33 <harlowja> do others think it'd be useful for taskflow to provide something like that locking API in the above wiki, or at least something like it
20:36:53 <harlowja> *backed by 1 or more implementations
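(A sketch of what a provider-backed locking API might look like; the names are illustrative, not an actual taskflow interface.)

    import abc

    class Lock(abc.ABC):
        """One lock, backed by some provider (filesystem, redis, ZK, ...)."""

        @abc.abstractmethod
        def acquire(self, blocking=True, timeout=None):
            """Try to take the lock; return True if it was taken."""

        @abc.abstractmethod
        def release(self):
            """Give the lock back."""

        def __enter__(self):
            self.acquire()
            return self

        def __exit__(self, *exc_info):
            self.release()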
20:37:29 <kebray> harlowja  will have to wade through some of those links before I have feedback.
20:37:32 <harlowja> np
20:37:34 <adrian_otto> regarding the lock wiki reference from above, in section "Filesystem" under Drawbacks, it reads "Does not release automatically on lock holder failure (but timeouts on locks are possible)."
20:37:43 <alexheneveld> yes, a generic locking service would be good
20:37:54 <harlowja> adrian_otto does that make sense, i could word it better?
20:38:00 <adrian_otto> that's not true, depending on what you define as failure
20:38:19 <adrian_otto> if the caller holding an advisory lock goes away, then the lock is automatically released by the kernel.
20:38:33 <harlowja> adrian_otto true, good point
20:38:52 <adrian_otto> go away = process terminates for any reason
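(adrian_otto's point in code form: with an advisory flock the kernel drops the lock whenever the holding process terminates; the lock file path and do_work are placeholders.)

    import fcntl

    lock_file = open('/var/lock/job-Y.lock', 'w')
    fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
    try:
        do_work()  # placeholder
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)
    # even without the finally, the kernel releases the lock when the
    # process exits or crashes -- but only on this one machine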
20:39:01 <harlowja> alexheneveld its just when we give a locking api thingy, then people have to be pretty aware of how to acquire those locks 'sanely'
20:39:13 <harlowja> *ordering issues become a big problem, lol
20:39:14 <alexheneveld> is there a good one we could reuse however?
20:39:22 <harlowja> alexheneveld none afaik
20:39:46 <devananda> filesystem locks should, IMO, call out a drawback that it's not distributed
20:39:53 <harlowja> devananda sure
20:39:54 <alexheneveld> shame.  it's hard to get right!  people will just keep asking for new use cases otherwise.
20:40:19 <harlowja> i was reading about stuff like http://linux.die.net/man/3/dlm_lock, ibms vms and such
20:40:45 <alexheneveld> it might be an idea to aim for a higher-level model
20:40:46 <adrian_otto> devananda: you still could use a multiplicity of filesystems as a backing store for locks if you were inclined to make a networked lock service built that way.
20:40:50 <harlowja> dlm_lock and such stuff i didn't find much info on though, except that glusterfs i think uses it :-p
20:41:20 <alexheneveld> as in you are locking the lifecycle of a server
20:41:43 <harlowja> alexheneveld agreed, i think this is why chubby (the google zookeeper) provides only coarse grained locking
20:42:04 <harlowja> and i think if we can be pretty coarse, we will also be fine, its just defining 'coarse' isn't so easy, ha
20:42:08 <alexheneveld> people should be discouraged from low-level locking operations tho i accept they may need to be available
20:42:09 <devananda> adrian_otto: true, but with a SPoF on the network FS. or a consistency problem if using a distributed FS
20:42:41 <alexheneveld> harlowja: makes a lot of sense (re chubby)
20:42:50 <changbl> harlowja, one question, why do we have so many (6) providers for locking? plan to implement all of them?
20:43:08 <harlowja> changbl it was more of just a survey of ones i could think of
20:43:18 <changbl> oh, ok
20:43:26 <harlowja> i think redis, filesystem, ZK might be a good 3
20:44:04 <harlowja> or just redis + filesystem to start
20:44:37 <harlowja> *so as not to scare people with ZK
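(A coarse redis lock can be as small as this sketch; the key name and 30s TTL are arbitrary, the TTL being one form of the time-based release discussed further down.)

    import redis

    r = redis.StrictRedis(host='localhost')
    # SET NX only succeeds if nobody holds the key; EX auto-expires it
    # after 30s so a dead holder can't wedge the lock forever
    if r.set('lock:job-Y', 'conductor-A', nx=True, ex=30):
        try:
            do_work()  # placeholder
        finally:
            r.delete('lock:job-Y')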
20:44:46 <changbl> I can take ZK
20:44:59 <changbl> for both storage and locking, i guess here?
20:45:13 <harlowja> that'd be cool
20:45:27 <harlowja> #link http://openreplica.org/faq/ was another interesting one, that has something like ZK in python
20:45:31 <harlowja> didn't investigate much more though
20:45:56 <devananda> though i don't usually recommend it, i suspect innodb+galera would actually be a good fit here
20:46:06 <devananda> NDB would be another, but very complex to set up
20:46:08 <changbl> never used OpenReplica before
20:46:22 <changbl> ZK + Kazoo seems to work nicely
20:47:02 <harlowja> changbl ya, it shouldn't be that hard to connect it to the taskflow api
20:47:21 <devananda> AIUI, galera, with the proper settings, will do distributed lock before committing a write. min 3 servers, so partitioning is less of an issue
20:47:34 <harlowja> devananda interesting, didn't know that :)
20:47:59 <adrian_otto> OpenReplica looks interesting.
20:48:19 <harlowja> adrian_otto ya, i'm not sure how mature it is though, didn't mess around with it that much
20:48:44 <devananda> from the little reading on openreplica i did, it's aimed at geographic distribution
20:49:14 <harlowja> with some underlying paxos concoord thing
20:49:24 <adrian_otto> devananda: do you have a documentation pointer to galera re: locking? Where did you learn about that behavior?
20:49:46 <devananda> lemme see if i can find it
20:49:56 <devananda> adrian_otto: learned about it while working at percona ... :)
20:50:45 <harlowja> changbl if u want to work on the ZK stuff that'd be really neat, i think we can keep in the back of our mind how 'job/workflow' transfer can occur with ZK
20:51:02 <devananda> adrian_otto: http://www.codership.com/wiki/doku.php?id=mysql_options_0.8 -- wsrep_causal_reads, IIRC
20:51:06 <adrian_otto> most of what I found when I looked for that is actually about table locking limitations in Master:Master setups
20:51:46 <adrian_otto> aha
20:52:06 <changbl> harlowja, sure
20:52:12 <adrian_otto> read committed isolation level might actually work well enough for what we need.
20:52:46 <adrian_otto> and assuming that works equally well in a 3 node arrangement, that might actually be one of the best ZK alternatives.
20:53:12 <devananda> better link / explanation
20:53:12 <adrian_otto> we could deploy/configure it using a HOT in a well automated way too, I expect.
20:53:17 <devananda> #link http://www.percona.com/doc/percona-xtradb-cluster/wsrep-system-index.html#wsrep_causal_reads
20:53:36 <devananda> adrian_otto: and yes, galera is designed to require min 3 nodes. it wont start with 2
20:53:50 <devananda> but will continue to run if a 3-node degrades to 2
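(A sketch of the DB-backed variant devananda describes, claiming a lock row with SELECT ... FOR UPDATE; the table and connection details are invented, and on galera the write is certified across the cluster at commit time.)

    from sqlalchemy import create_engine, text

    engine = create_engine('mysql://taskflow:secret@galera-node/taskflow')
    with engine.begin() as conn:  # commits on success, rolls back on error
        conn.execute(text(
            'SELECT owner FROM job_locks WHERE job_id = :j FOR UPDATE'),
            {'j': 'job-Y'})
        conn.execute(text(
            'UPDATE job_locks SET owner = :o WHERE job_id = :j'),
            {'o': 'conductor-B', 'j': 'job-Y'})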
20:53:56 <harlowja> devananda for ironic, do u have any thoughts on what u'd like to happen when the thing using your context manager crashes (without releasing)?
20:54:15 <harlowja> manual recovery at that point to release? time based release?
20:54:28 <devananda> harlowja: eventually, time-based release-wipe-set-to-error or something
20:54:42 <harlowja> k
20:55:13 <adrian_otto> using a Galera cluster also solves a multi-tenancy concern as well, which many of the other options don't address.
20:55:18 <adrian_otto> (if any)
20:55:45 <harlowja> ZK has namespaces afaik ;)
20:55:57 <adrian_otto> with related auth?
20:56:05 <harlowja> *unsure*
20:56:29 <harlowja> alright, 4 minutes
20:57:32 <harlowja> adrian_otto it might be interesting to see if the hortonworks people (that i think are doing hadoop) have thought about how ZK and openstack fit together
20:57:51 <harlowja> and especially the tenant issue
20:58:14 <adrian_otto> Savanna is actually Mirantis, not Hortonworks
20:58:18 <harlowja> ah
20:58:27 <adrian_otto> we know them, so I could ask
20:58:32 <harlowja> ah, either way
20:58:43 <harlowja> i thought hortonworks, guess i was mistaken
20:58:50 <harlowja> *which we know :-P
20:58:59 <harlowja> anyways, that'd be neat to see what they think
20:59:28 <harlowja> cool, well time's up folks!
20:59:38 <harlowja> email, or irc, or more email for anything else :)
20:59:45 <harlowja> #endmeeting