13:01:19 #startmeeting senlin
13:01:20 Meeting started Tue Oct 6 13:01:19 2015 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:21 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:23 The meeting name has been set to 'senlin'
13:01:38 hello
13:01:45 hi
13:01:45 hi
13:01:45 Hi
13:01:51 hello
13:02:02 wow, you are all here, ;)
13:02:12 yep :)
13:02:16 :)
13:02:20 just came back from hometown
13:02:22 pls check agenda and see if you have things to add
13:02:28 https://wiki.openstack.org/wiki/Meetings/SenlinAgenda#Weekly_Senlin_.28Clustering.29_meeting
13:02:39 nice holiday
13:03:01 I was expecting to be alone here, since it is a holiday for most of us
13:03:32 ;)
13:03:37 #topic liberty work items
13:04:04 the etherpad page loads pretty slowly
13:04:49 we still have some items left there, have to postpone to next cycle
13:05:12 for the container work, will get an update from SUR team tonight -- 1 hour later
13:05:31 about the unit test in senlinclient, have we done it?
13:05:37 yes, i sent out a request qi ming. haven't heard anything back
13:05:44 yanyanhu_, that is something we need to postpone
13:05:50 ok
13:06:11 sorry about that, I should do that
13:06:15 jruano, you are in that meeting, I haven't heard anything last week
13:06:16 there is some interesting client code, but all the specs for profile and policy are not checked into the repo
13:06:21 yep
13:06:44 I have invited Liam to join that discussion
13:07:06 ton and i have defined the use cases, and i want to start understanding how/where senlin fits
13:07:20 haiwei and I should spend some time on client test cases anyway
13:07:37 yep, jruano, need to touch base with the eam
13:07:40 yes, Qiming
13:07:44 s/eam/team
13:07:47 I think we can work on it together in the coming cycle.
13:08:07 I will spend some time on it as well
13:08:21 okay, we have just got a +2 on the patch to propose senlinclient into global requirements
13:08:34 good news
13:08:38 cool
13:08:38 awesome
13:08:39 saw it
13:08:41 hope it will be approved soon, so it won't block senlin-dashboard progress
13:08:48 this is helpful for elynn I think
13:09:09 yanyanhu_, do we have more to add regarding functional tests?
13:09:11 for the senlin support in heat
13:09:20 yes
13:09:24 Qiming, I guess not for now
13:09:37 great
13:09:48 great job yanyanhu
13:09:51 oh, maybe still a little more work on node
13:10:10 elynn, we need to sit together for a discussion on how to get heat resource types to work
13:10:11 but I think it won't take much time
13:10:37 Qiming: yes, when I get back from holiday.
13:10:58 yanyanhu_, alright, let's get it done then we switch to mitaka work items
13:11:08 ok
13:11:17 #topic placement policy
13:11:40 lixinhui, anything to share from your side?
13:12:03 I am working on vSphereDRS_policy
13:12:07 and unit test
13:12:28 will submit patch around this Thursday
13:12:35 okay, cool
13:12:53 Need your help to review then :)
13:12:59 in fact I am not familiar with vSphereDRS_policy
13:13:04 Liuwei's patch regarding az placement policy needs a patch
13:13:13 oh?
13:13:28 Maybe I can help
13:13:32 what is the relationship between vSphereDRS_policy and placement policy
13:13:40 seems we need a doc for every policy?
13:13:41 I think he has done that part
13:13:52 Qiming, agree.
13:14:04 and I noticed you have worked on docstring
13:14:11 it will help me at least
13:14:26 Okay
13:14:33 reasonable
13:14:44 em.
for all builtin policies, we need some docs explaining how they work
13:14:50 I could add some when the patch is done
13:15:01 sounds great
13:15:06 maybe we should always add docstrings when adding new features
13:15:09 lixinhui, Liuwei's patch: https://review.openstack.org/221684
13:15:17 thanks lixinhui
13:15:29 Okay, Qiming
13:15:38 I will read it
13:15:53 yanyanhu_, by docs, I am referring to some design level things, not just function level comments
13:16:08 lixinhui, it is in pretty good shape now
13:16:09 yes i think that will help
13:16:42 yes, this is nice since it can help people understand the implementation better
13:17:02 the placement policy patch can be tweaked a little bit to support cross-region placement
13:17:16 Okay, Qiming
13:17:49 yanyanhu_, can you add a TODO item in the TODO.rst file? don't want this ball dropped, :)
13:18:17 sure
13:18:33 #topic deletion policy for RESIZE operation
13:18:50 haiwei has helped start this thread
13:19:10 so far the implementation is not correct in my opinion
13:19:19 I thought it was not difficult, but it seems it is
13:19:31 yep
13:20:08 there are many easier paths to get this done, but we have to look at the big picture
13:20:09 I think the problem is that resize action can delete more than one node at a time
13:20:35 yes
13:21:22 maybe we can extract the parsing of RESIZE parameters into a utility function
13:21:52 we then call that function directly if no policy is attached to the cluster
13:22:20 that will be done inside the action execution?
13:22:34 if we do have certain policies that want to handle RESIZE action, we invoke this parser as well
13:23:00 sounds like a good idea
13:23:16 in action execution, we check if there are policy outputs and skip the parsing if it seems unnecessary
13:23:25 I think this should be done in the engine/service layer
13:23:47 there will be concurrency problems if you do maths there
13:23:52 the cluster is not locked
13:24:06 any other actions can change the cluster at the same time
13:24:42 but for deletion policy the pre_op in which candidates are chosen is done before resize action execution
13:24:53 haiwei, I think you can refer to the implementation of do_scale_in/out
13:25:09 they have a similar problem
13:25:18 agreed, scale_in/scale_out is a good reference
13:25:30 yanyanhu, the problem is do_scale_in/out only deletes one node at a time
13:25:42 but resize action is different
13:26:01 scale_in/out can carry a 'count' parameter
13:26:27 how to initialize 'count' in pre_op is difficult for resize action
13:26:36 in the case of resize, there are more parameters to handle
13:26:43 yes, haiwei, just as Qiming said, those two actions can also accept 'count' input
13:27:30 haiwei, yes, the logic is more complicated since the constraint might be changed at the same time
13:27:34 haiwei, I was suggesting to extract the "count" computation into a utility function
13:27:34 i know that, that 'count' defaults to 1, but resize action is different, we have to give it a value
13:27:45 but I think you can split the logic out
13:28:10 ok, I think I got you
13:28:17 great
13:28:34 #topic policy for node create/delete
13:28:56 so far we have been focusing on CLUSTER_XYZ actions when dealing with policies
13:29:09 however, we do have NODE_CREATE/DELETE/JOIN/LEAVE actions
13:29:43 a NODE_CREATE action, with cluster_id provided, needs to be considered by the LB policy, for example
13:29:53 yes
13:30:19 ideally, those node_xxx actions should also be the target of some policies
13:30:39 I have
been looking at this over the past few days
13:31:36 I'm hoping this won't be a disruptive change to the current policy implementation
13:32:06 umm, we need to think through this...
13:32:33 anyway, I'll keep working on this
13:32:33 will think about it
13:32:50 okay, feel free to ping me for a discussion
13:33:00 sure :)
13:33:08 #topic batch policy
13:33:13 Qiming, you mean you will focus on the LB policy only?
13:33:21 haiwei, no
13:33:39 it is more about how to weave NODE_xxx actions into policy checking
13:33:49 not just for LB policy
13:34:44 ok
13:35:03 placement policy, for example, is another case where NODE_CREATE action should be checked
13:35:05 and also placement policy
13:35:14 yes
13:35:17 :)
13:36:07 #topic batching policy
13:36:48 batching policy is really about throttling
13:37:06 when creating/updating/deleting objects, senlin makes calls to other services
13:37:49 we have to impose some constraints on the number of requests sent to other services during any given period
13:38:15 this is not as easy a job as it seems
13:38:33 take CLUSTER_CREATE as an example
13:39:06 we want to control how many NODE_CREATE (thus nova boot requests after translation) we will trigger
13:39:51 however, the cluster is just being created, no policy has a chance to get attached to it yet
13:40:07 it becomes a chicken-and-egg problem
13:41:00 so ... the question becomes: how can we do throttling without a policy
13:41:40 Define it in senlin.conf?
13:41:48 Qiming, I think maybe we can split a cluster creation into multiple action sets, and there are dependencies between them?
13:41:51 maybe we only attach batch policy after the cluster is created?
13:42:18 I just proposed a configuration option, max_actions_per_batch
13:42:34 but the existing dependency logic may not be able to support it
13:42:41 it can be overridden later by a batching policy
13:43:52 right, the current batched creation logic has to be revised to support this option
13:44:04 hmm, my idea is not good.
It is too far from our policy framework
13:45:04 I am starting to change my mind now
13:45:34 since we are executing all 'actions' asynchronously
13:46:04 (at least we wanted to do things that way)
13:46:27 all actions are first persisted into the database
13:46:41 then retrieved for execution
13:47:11 so that throttling problem seems more like a scheduler problem
13:47:17 yes
13:47:19 it is
13:47:28 say if we create a cluster of 1000 nodes
13:47:32 or it can be
13:47:45 sounds like it
13:48:10 the 1000 NODE_CREATE actions are supposed to be executed in a controlled way
13:49:09 so ... I am inclined to look at it from a different angle now
13:49:33 the problem is: how do we define the 'batching' policy then?
13:49:45 by the way, this policy is triggered by hand?
13:50:08 haiwei, all policies are supposed to be triggered by certain actions
13:50:32 set of rules to a scheduler
13:50:35 so we need a new action for it?
13:50:42 haiwei, I think we just manually provide the rule
13:50:44 for creation/deletion, it sounds more like a scheduler parameter
13:51:05 So is it still needed?
13:51:15 for update, it may carry some other QoS related semantics
13:51:42 yes.
maybe we can start to consider the refactoring of engine scheduler
13:51:47 during batched update, users may want to keep a certain number of service nodes running at any time
13:52:40 yanyanhu_, yes, that is why I wasn't proposing a lot of patches recently, :)
13:53:00 really blocked by this problem
13:53:17 maybe we need to refactor the engine scheduler first
13:53:22 ok, let's think about it :)
13:53:30 make it a 'real' scheduler
13:53:40 yep
13:53:55 then we do 'scheduler.reschedule()' whenever necessary, just like a tickless Linux kernel
13:54:38 maybe we need to change 'batching' policy to just an 'update' policy
13:54:39 right
13:55:07 will keep thinking of this
13:55:13 #topic open discussions
13:56:12 regarding big tent proposal, I'm drafting it in the coming days
13:56:15 I will have a small session about senlin in my company's booth during the summit
13:56:27 will send out to everyone for review
13:56:42 thx, haiwei, ping us if any help needed
13:56:43 haiwei, cool :)
13:56:51 nice
13:57:07 great!
13:57:15 about the meetup during summit, the room allocation is pretty tight
13:57:21 will visit your booth ;p
13:57:22 I will prepare for the presentation, and want your advice
13:57:23 http://lists.openstack.org/pipermail/openstack-dev/2015-October/076054.html
13:57:33 sure
13:57:59 need to find out how to get everyone together for a f2f discussion
13:58:40 anything else
13:58:44 ?
13:58:49 no
13:58:51 hoho, 1 min left
13:58:53 nope
13:58:55 nope
13:58:57 nope
13:59:04 thanks for joining, during your vacation
13:59:14 talk to you later
13:59:17 #endmeeting
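
Note on the batching policy topic: the idea floated at 13:42 (a max_actions_per_batch option that a batching policy can later override) might look roughly like the sketch below. This is only an illustration of the throttling idea, not Senlin code. The option name max_actions_per_batch comes from the meeting; the helper names effective_batch_size and plan_batches, and the plain list of action IDs, are hypothetical.

```python
from collections import deque


def effective_batch_size(conf_limit, policy_limit=None):
    """Pick the batch size limit to apply.

    Assumption: a limit from an attached batching policy, if any,
    overrides the service-wide max_actions_per_batch setting.
    """
    limit = conf_limit if policy_limit is None else policy_limit
    if limit < 1:
        raise ValueError("batch size limit must be >= 1")
    return limit


def plan_batches(action_ids, max_actions_per_batch):
    """Split pending action IDs into ordered batches.

    Each batch holds at most max_actions_per_batch IDs. A real
    scheduler would dispatch one batch, wait, then reschedule.
    """
    if max_actions_per_batch < 1:
        raise ValueError("max_actions_per_batch must be >= 1")
    pending = deque(action_ids)
    batches = []
    while pending:
        size = min(max_actions_per_batch, len(pending))
        batches.append([pending.popleft() for _ in range(size)])
    return batches
```

With the 1000-node example from the discussion and a limit of 100, plan_batches would produce ten batches of 100 NODE_CREATE actions each. How long to wait between batches, and where the limit is stored, are left open here, as they were in the meeting.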