17:00:35 #startmeeting
17:00:36 Meeting started Tue Nov 8 17:00:35 2011 UTC. The chair is sandywalsh. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:37 Useful Commands: #action #agreed #help #info #idea #link #topic.
17:00:51 Who is here for the orchestration meeting?
17:01:02 o/
17:01:05 me
17:01:19 hi sandy
17:01:27 hey! ... may be a short meeting
17:01:52 #topic pacemaker and zookeeper
17:02:14 is Andrew Beekhof here?
17:02:22 I don't believe so
17:02:49 I'm not very familiar with pacemaker.
17:02:50 he's in charge of Pacemaker, which is a core part of Red Hat's clustering strategy
17:03:05 nor am I, it seems very capable
17:03:24 the biggest issue I see is the conflict with the nova architecture
17:03:27 I haven't had a chance to review pacemaker at all, got through maoy's ppt, but not the full Tropic paper
17:03:32 (workers vs. master-slave)
17:04:01 mikeyp, the tropic paper is in the log from last meeting
17:04:22 just need to read it :-)
17:04:27 maoy, thanks again for the discussion on the row-locking issues
17:04:57 for me the next step is to mess with zookeeper (and the python bindings) to see what we can make it do
17:04:59 the tropic paper requires a fairly big change to the nova architecture. so I made some changes in the ppt to simplify things
17:05:15 you are welcome
17:05:33 I think we're in general agreement on the approach. I think your strategy fits in well with my proposal
17:05:43 cool.
17:05:45 mikeyp, the workflow summary was great
17:06:15 I haven't heard of pyutilib.workflow ... what's the recent status of it?
17:06:22 is it maintained actively?
17:06:38 the zk python binding works fine for me although I never tried with eventlet
17:06:41 (spiff workflow isn't actively maintained, and the author suggested we fork)
17:06:50 It looks like it's actively maintained - last checkin was a couple of weeks ago.
17:06:55 nice
17:07:24 does it make any assumptions about persistence layer or require a web interface, etc?
Or is it just an engine?
17:07:58 Yah, I do wonder abt the ZK interface + eventlet
17:08:07 I don't know yet - I'm going to kick the tires today
17:08:10 It uses threading, + a C module
17:08:24 It does seem to be primarily an engine, though
17:08:25 dragondm, good point
17:08:47 mikeyp, can we put you down to give us a report on it?
17:09:03 sure, no problem.
17:09:17 #action mikeyp to give us a report on pyutilib.workflow (dependencies ideally)
17:09:33 there is a non-threading version, at least for the C API. Not sure if there is a python binding as well.
17:10:05 maoy, which did you use previously?
17:10:11 dragondm, did you look at zookeeper before?
17:10:25 I used the multithread python binding
17:10:39 I've looked at it briefly, I haven't played w/ it much
17:10:47 maoy, but you weren't doing your project against nova, correct?
17:10:56 correct
17:11:06 that's for tropic
17:11:10 which doesn't use evently
17:11:12 eventlet
17:11:50 andrew mentioned the licensing of the python binding for pacemaker wouldn't be an issue, I do have a question for him on the engine portion.
17:12:02 or if he makes his money from professional services of the product
17:12:10 i c.
17:12:16 the main concern would be something blocking in a C module that would prevent eventlet from task switching
17:12:23 correct
17:13:03 so, as an action item, maoy, can we put you down to investigate zookeeper/eventlet integration?
17:13:04 BTW: sandywalsh: my workorder concept was posted here: https://lists.launchpad.net/openstack/msg03767.html
17:13:08 i'll take a look at it
17:13:37 #link https://lists.launchpad.net/openstack/msg03767.html dragondm's proposal
17:13:44 thanks
17:14:01 I can expand that out if needed.
17:14:19 #action maoy to investigate zookeeper/eventlet integration. Is the threading model with the C library going to be an issue?
17:14:35 so the concept of tasks and the analogy to process in OS makes sense?
17:14:47 Is it possible to run redundant Zookeepers?
17:14:54 dragondm, we'll give it a re-read and give you some feedback
17:15:39 for zookeeper, you can run 2f+1 nodes to tolerate f node failures
17:15:40 #action give dragondm feedback on his proposal
17:15:47 The workorder proposal seems really compatible with orchestration.
17:16:09 sandywalsh: and s/scheduler/orchestrator/ in that :>
17:16:49 A question re: zookeeper - are there any concerns about adding a dependency on Zookeeper?
17:16:56 dragondm, right ... I still think the two are synonymous
17:17:09 ya, pretty much
17:17:19 mikeyp, I thought about that ... I sort of view it the same as rabbit, mysql or apache
17:17:31 so long as the license works.
17:17:40 have you guys thought about the retry logic?
17:17:50 however, there are replacements for apache, rabbit and mysql ... not so with zookeeper
17:18:11 maoy, not in depth yet ... until the workflow engine is in place
17:19:04 maoy did you think of it being handled in a different manner than the workflow?
17:19:24 retry and rollback might become a next-release item - I've been thinking a little about it.
17:19:39 it's just my opinion, but workflow is mostly studied in computer programming to capture and develop human-to-machine interaction.
17:20:04 there is not much human interaction in nova. everything is a computer program..
17:20:15 when I say workflow I mean petri-net
17:20:24 (state machine)
17:20:27 sorry
17:20:53 i like petri-net at the design phase
17:21:13 I'd like to see what the python code is going to look like to model these petri-nets
17:21:25 ya'
17:21:30 maoy, do you see something more formal for later stage?
17:21:32 Maoy, that's true for a lot of cases. There's also a whole world of production scheduling, ETL, and APP integration with little or no human interaction.
17:21:33 when implemented, it's still going to be python programs with greenthreads and rpc calls, right?
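As a rough illustration of what a Petri-net model could look like in plain Python, here is a minimal sketch. It is not the pyutilib.workflow API and not Nova code: the `PetriNet` class, `build_provision_net`, and every place and transition name are hypothetical, chosen only to show a fork (two steps that can run concurrently) followed by a join.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """Tiny Petri net: places hold token counts, transitions move tokens."""
    marking: dict = field(default_factory=dict)      # place name -> token count
    transitions: dict = field(default_factory=dict)  # name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (tuple(inputs), tuple(outputs))

    def enabled(self, name):
        # A transition may fire only when every input place holds a token.
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) > 0 for place in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] = self.marking.get(place, 0) + 1


def build_provision_net():
    """Hypothetical 'provision an instance' net with a fork and a join."""
    net = PetriNet(marking={"requested": 1})
    # Fork: one request enables two independent steps.
    net.add_transition("start", ["requested"], ["need_image", "need_network"])
    net.add_transition("load_image", ["need_image"], ["image_ready"])
    net.add_transition("change_networking", ["need_network"], ["network_ready"])
    # Join: booting needs both branches to have finished.
    net.add_transition("boot", ["image_ready", "network_ready"], ["running"])
    return net
```

After firing `start`, both `load_image` and `change_networking` are enabled in either order, while `boot` stays disabled until both have fired. That is the concurrency a single-state state machine cannot express directly.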
17:21:48 maoy, yes
17:22:09 I was planning on trying to implement a couple of workflows in pyutilib.workflow, to see what they look like.
17:22:24 mikeyp, that would be a big help
17:22:51 are the calls blocking? that is, can a number of events take place at once?
17:22:55 sandy, nothing more formal..
17:22:56 #task examples of what the petri-net models would look like in python
17:23:12 i'm trying to figure out the exact benefit after we have the petri-net
17:23:26 garyk_, good question.
17:23:52 maoy, of petri-net over single-state state machine?
17:24:23 no.
17:24:36 I think the benefit of a workflow / petri-net is that there can be many pre-defined workflows, so the service could expand to uses we haven't yet considered.
17:24:37 garyk_, there will likely be some blocking in the orchestration layer, but it should be on a per-job basis ... not per-service
17:24:45 petri-net can model concurrent stuff. that i buy.
17:24:52 but i'm wondering after we have the models, how to take advantage of it
17:25:01 ok
17:25:29 maoy, well, as you mentioned before, I think defining a single "retry" operation would be useful
17:25:40 and reusing that model in various places
17:25:52 (for example)
17:26:23 workflows can also be nested - pretty powerful way of combining primitives for reuse.
17:26:29 yes
17:26:57 how does that compare to a decorator @retry(max=3) then automatically catch exceptions and retry?
17:27:06 Perhaps even the whole "provision an instance" process would be comprised of sub workflows (such as load-image, move image, change networking, etc)
17:27:16 silly question - in the event that the host running the orchestration reboots, is there a way in which the orchestration can be resumed from the same point
17:27:18 maoy, what happens if that service dies?
17:27:28 maoy, the decorator has no persistence
17:27:55 but I see your point ... there may be places where code-level retries are better than workflow-modeled retries
17:28:02 hey andrew!
17:28:06 hey!
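The code-level retry raised above (`@retry(max=3)`) could look roughly like the sketch below. This is an assumption about what was meant, not existing Nova code: the parameter is named `max_attempts` rather than `max` to avoid shadowing the Python builtin, and `flaky_rpc` is a made-up stand-in for an RPC call.

```python
import functools
import time


def retry(max_attempts=3, delay=0.0, exceptions=(Exception,)):
    """Retry the wrapped call up to max_attempts times (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Out of attempts: let the last exception propagate.
                    if attempt == max_attempts:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {"count": 0}


@retry(max_attempts=3)
def flaky_rpc():
    """Stand-in for an RPC call that times out twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"
```

The attempt counter lives only in the running process, which is the "no persistence" objection: if the service dies mid-retry, nothing records how far it got. A workflow-modeled retry would have to store that state somewhere durable.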
17:28:14 jetlagged, so i happened to be awake :)
17:28:24 beekhof, you may want to read the scrollback
17:28:51 could someone paste it somewhere? i only just got my internet connection back
17:28:57 I'm thinking that if a compute node dies, then the scheduler should receive a timeoutException if engineered correctly and retry there.
17:29:03 beekhof, there'll be a log when we stop the meeting
17:29:09 k
17:29:13 hi beekhof!
17:29:15 i read last week's too
17:29:23 hi maoy :)
17:29:31 garyk, I think that's one reason to consider ZooKeeper - a way of storing state reliably.
17:29:45 * heckj wonders which meeting he walked into
17:29:53 heckj, orchestration
17:30:10 garyk_: yah, that was the reason I thought of the workorder idea. That way the orchestration service is basically stateless. Doesn't matter if one falls over.
17:30:12 #topic Orchestration - pacemaker & zookeeper
17:30:16 sandywalsh: cool, thank you
17:30:53 ok - sounds good. does the zookeeper keep some kind of configuration id to track the states?
17:30:54 dragondm, does your proposal use zookeeper?
17:31:09 I didn't specify storage.
17:31:16 k
17:31:31 well, I think we have a good list of to-do's for this week
17:31:36 The design I had could be persisted with a db
17:31:39 for those that came in late, what type of state are we specifically talking about?
17:31:43 dragondm, i'll read your link after the meeting
17:31:59 beekhof, state machine
17:32:15 if you have many "clients" wanting to all agree on state, zookeeper is an excellent way of doing it.
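On the earlier redundancy question (2f+1 nodes to tolerate f node failures), the arithmetic behind majority quorums can be written out as a small sketch. The function names are made up for illustration and nothing here calls ZooKeeper itself.

```python
def ensemble_size(failures_to_tolerate):
    """Nodes needed so a majority survives the given number of failures."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be non-negative")
    return 2 * failures_to_tolerate + 1


def quorum_size(nodes):
    """Smallest strict majority of an ensemble."""
    if nodes < 1:
        raise ValueError("nodes must be positive")
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """Failures an ensemble can absorb while a majority remains."""
    return nodes - quorum_size(nodes)
```

Under these formulas a 3-node ensemble needs 2 nodes up and survives 1 failure, and a 4-node ensemble still survives only 1, which is why odd ensemble sizes are the usual choice.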
17:32:25 zookeeper is used for 3 reasons: persistent storage, lock management, and leader election
17:32:27 beekhof, and that could contain things like "VM state"
17:32:50 a bit of extra complexity, but does a lot of the hard work of distributed locks to enable that sort of thing
17:32:50 it could be for the queue as well but since we're using rabbit, no need for zk at the moment
17:32:56 beekhof, or rollback status, etc
17:33:27 will it require support for authentication?
17:33:35 yes, the trickiest thing about zk is what is its core competency
17:33:41 so would this be analogous to writing "guest X is starting" to a db?
17:34:00 beekhof, I think so, yes
17:34:26 beekhof, yes for storage purpose.
17:34:27 beekhof, the concern I brought up on the ML was zk vs. row-level locking
17:34:54 and it sort of sounds like zk is an abstraction over those differences
17:35:06 likely doing its own row-level locking under the hood
17:35:48 so a random scheduler would grab guestX off the queue, say "i got this", and then go about the steps involved in starting it up, updating the state as it went?
17:35:48 ok ... I'd like to push the topic ahead for now
17:36:02 sorry, go ahead beekhof ...
17:36:02 zookeeper uses a quorum protocol to reach consensus.
17:36:06 :)
17:36:45 beekhof, it would do one step in the process, when the event came in that the step finished another worker could handle the next step.
17:36:58 ok, i can see the advantage there
17:37:32 beekhof, I think that's the fundamental difference between PM and ZK ... master/slave vs. workers
17:37:39 yep
17:37:46 (well, and your resource manager)
17:37:46 it can also grab a lock on the instance so that no one else is touching the VM
17:38:02 (which zk doesn't do)
17:38:24 let's continue this one on the ML
17:38:28 sure
17:38:34 #topic Orchestration - Meeting time
17:38:48 what UTC are most of you in?
17:38:57 -4
17:38:58 ah, this one's my fault :)
17:39:05 -5
17:39:20 well, also, is Tuesday best?
17:39:37 right now, I'm +10
17:39:40 http://www.worldtimebuddy.com/
17:39:53 really handy for this sort of thing
17:39:54 UTC -8 / Pacific
17:40:10 beekhof, heh, both sides
17:40:23 I mean beekhof & mikeyp
17:40:26 same as mikeyp
17:40:42 i am sorry i need to go and feed the animal in my zoo. thanks for the great ideas.
17:40:53 round the clock, follow the sun development :-)
17:40:56 bye garyk
17:40:58 garyk_, thanks for the input
17:41:21 k, so unless there are any objections ... keep meeting time the same?
17:41:33 works for me
17:41:34 actually
17:42:19 did we get any europeans?
17:42:35 not active ... perhaps lurkers
17:43:14 what about 2:15 from now?
17:43:19 is that too late for anyone?
17:43:26 2:15:00
17:43:35 on tues it will be a conflict with other openstack teams
17:43:44 we'd have to move days
17:43:53 because that's 7am, which is easily doable
17:44:00 7am here i mean
17:44:29 * sandywalsh tries to figure out what that would be for him
17:44:30 4am is harder and i'm less coherent
17:44:34 :)
17:44:54 for PDT it should be about lunch time
17:44:57 7pm Eastern?
17:45:43 is boston eastern?
17:45:47 yup
17:45:50 yes
17:46:17 that website is claiming my 7am is your 3pm
17:46:56 this room is booked until 2300 UTC on tues
17:47:40 different day? i'd really like to join on a regular basis
17:47:57 See an opening? http://wiki.openstack.org/Meetings
17:48:28 it would have to be Thurs for me
17:48:55 thurs is fine by me
17:49:20 and there appears to be only one other meeting on that day
17:49:23 thurs works for me.
17:49:25 thursday, when?
17:49:33 Thursday 3pm EST, 20 UTC?
17:49:58 that would be ideal for me
17:49:59 20:00 UTC
17:50:02 done
17:50:09 sweet :)
17:50:11 cool
17:50:20 #action meeting moved to Thursdays 3pm EST, 2000 UTC
17:50:31 beekhof, what's the best intro reading for pacemaker?
17:50:31 thanks guys ... keep active on the ML!
17:50:52 there's a detailed PDF called "pacemaker explained" which does a good job.
17:50:57 warning: very complex critter...
17:51:08 maoy: that one and "clusters from scratch"
17:51:20 heckj: yeah, pretty dry
17:51:37 beekhof: yeah, but the best detail short of "playing with it" incessantly
17:51:38 maoy: http://www.clusterlabs.org/doc <-- look for the 1.1 version
17:52:21 ok
17:52:26 anything quick before we end?
17:52:31 nod. its job is to detail all the options and possibilities, but doesn't give the first clue how to put it together sanely :)
17:52:43 i'll read the notes
17:52:43 #endmeeting