13:02:03 #startmeeting senlin
13:02:04 Meeting started Tue Jun 14 13:02:03 2016 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:02:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:02:07 The meeting name has been set to 'senlin'
13:02:25 hi
13:02:25 seems working
13:02:27 cool
13:02:38 hi
13:02:39 morning/evening
13:02:42 good
13:02:44 Hi
13:02:45 hi
13:02:57 pls feel free to add items to the agenda if you have topics
13:03:06 #link https://wiki.openstack.org/wiki/Meetings/SenlinAgenda#Weekly_Senlin_.28Clustering.29_meeting
13:03:29 hi, everyone
13:03:44 let's start with the etherpad
13:03:53 #link https://etherpad.openstack.org/p/senlin-newton-workitems
13:04:34 tempest testing side, we have made some really good progress on api tests, especially those negative ones
13:04:58 yes
13:05:11 yes, almost done I think. just need negative test cases for cluster actions
13:05:16 How many negative tests are left?
13:05:26 about 10 I think
13:05:36 great!
13:05:37 one for each action
13:05:38 Great!
13:06:11 then we may need to migrate functional tests to use tempest?
13:06:32 Qiming, yes, I think so
13:06:45 or, do we want to do the migration at all?
13:06:46 maybe we can put them into the scenario dir?
13:06:47 :)
13:07:03 Qiming, if possible, we should :)
13:07:10 okay
13:07:17 then we do it
13:07:45 then we don't need to maintain a functional gate job
13:07:45 Or we can create a functional dir?
13:07:46 all these tests will be done using tempest
13:07:59 although we may need a tempest scenario test job?
13:08:17 yes
13:08:43 functional is also ok I think
13:09:04 functional tests will still use fake drivers
13:09:13 right
13:09:16 how about scenario tests?
13:09:27 that is the same as the API tests
13:09:51 actually in my mind, our current functional test is more like a 'scenario' test :)
13:10:10 just the backend driver is fake
13:10:26 scenario will use a real driver, right?
13:10:38 um, that would be integration :)
13:10:43 integration test
13:10:54 right
13:11:04 seems there is no strict definition for these test types...
13:11:07 oh, I thought they were the same...
13:11:22 elynn, that's confusing :)
13:11:23 api tests focus on the api surface
13:11:49 functional tests are more about exercising the senlin-engine
13:12:29 then I'm doubting whether we should treat scenario tests and integration tests as the same thing
13:13:49 I'm ok with both functional and scenario. We just need to differentiate the two cases by whether they use a real driver or not.
13:14:07 Speaking of that, it seems defcore tends to use the tests in the tempest tree.
13:14:21 Do we need to put some tests there?
13:14:33 if needed, we can copy the code there
13:14:40 I think we don't need to test complicated engine logic in integration tests
13:14:53 defcore only cares about the api surface, right?
13:15:03 just need to ensure senlin works well with other backend services I guess
13:15:16 Qiming: yes
13:15:23 yanyanhu, agree, but we need to test some tricky corner cases as well
13:15:37 yes, like the lb policy
13:15:42 Agree
13:15:50 like the lb policy and health management
13:15:53 integration tests are more about exercising profiles and policies
13:15:56 right
13:16:04 yep
13:16:26 so we can skip scenario tests?
13:16:45 ok
13:16:52 have functional tests focus on engine testing, integration tests focus on profiles/policies
13:17:05 move existing functional tests to the tempest dir
13:17:10 the former still use the fake driver, the latter use real drivers
13:17:11 Anyway, these tests are only tools to help us make sure our services work as expected. Naming is not important :)
13:17:14 s/move/re-implement
13:17:27 We can add any tests we want.
13:17:31 elynn, +1
13:17:36 elynn, yep, but we have to speak the same language
13:18:05 we agree we won't talk about scenario tests
13:18:10 true?
13:18:50 agree
13:19:04 ok
13:19:22 okay, let's keep things simple
13:19:28 stress testing
13:19:44 noticed your patch about rally testing, yanyan
13:19:57 quite a few nits found when reviewing it
13:20:01 pls check
13:20:08 yes, have read your comments
13:20:12 will fix them tomorrow
13:20:30 also, we have got some comments from the rally team about that plugin testing
13:20:56 Qiming, yes, definitely
13:21:00 pls help keep the ball rolling
13:21:15 I noticed roman just left some comments on my patch for adding the cluster plugin
13:21:22 will reply and update the patch
13:21:31 we may want to check if the cmcc guys want to help on rally test cases
13:21:55 finally, all those plugins will stay in the rally repo and we can remove our local copy
13:22:11 Qiming, yes, just didn't get a msg from eldon zhao
13:22:17 you mean the rally_jobs subdir?
13:22:24 will contact him to see whether there is anything we can help them with
13:22:37 Qiming, yes, we just need to keep the job description files
13:22:52 no need to keep local plugins if they have been merged into rally
13:23:00 okay
13:23:11 moving on
13:23:34 last week we talked about the health threshold problem
13:23:50 have you guys got some new ideas? thoughts?
13:24:30 hmm, still not very sure about it
13:25:08 ...
13:25:29 in my mind, there could be a property like a percentage to describe the threshold of health status
13:25:36 which is based on desired_capacity
13:25:37 :)
13:26:03 yes, I know what you mean
13:26:19 which means if the current size of the cluster is less than that percentage of the user's desired size, the cluster will be marked as unhealthy
13:26:23 yes
13:26:37 then we have four numbers: min_size, health_threshold, desired_capacity, max_size
13:26:49 yes, that's my original idea
13:27:28 how would you map the different node counts to cluster status?
13:28:06 node count? you mean the count of nodes in healthy status?
13:28:22 each time the cluster is resized, a user is supposed to adjust the health_threshold?
13:28:56 no, I think users should expect a stable/fixed health_threshold no matter how large the cluster is
13:29:01 suppose we have the 4 numbers as these: 1, 3, 5, 7
13:29:18 the cluster now has 2 nodes active
13:29:28 what would the cluster status be?
13:29:38 that means they always want at least 80% of the nodes in their cluster to be healthy, e.g.
13:29:38 even when they have a very large cluster
13:29:59 then the health_threshold can only be a percentage?
13:30:05 3 is the health threshold?
13:30:08 yes
13:30:16 60% in this case
13:30:45 the cluster has 2 nodes active, what should the cluster's status be?
13:30:58 but this is just my thought. I think we really need to find out how users think about this issue
13:31:24 if the threshold is 60% and desired_capacity is 5, then I think the status should be warning/unhealthy
13:31:58 if there are 3 healthy nodes, we can mark the cluster as healthy
13:32:00 the cluster's desired_capacity is 5
13:32:09 2 nodes is less than that
13:32:14 it is no good
13:32:38 so in my idea, desired_capacity actually becomes the 'IDEAL' size
13:32:45 then what is 'desired_capacity' used for?
13:32:52 actually it's not 'desired_capacity'
13:33:11 you can't always get what you want, haha
13:33:13 health_threshold becomes the new 'desired'?
13:33:23 that's why I had that idea :)
13:33:31 Qiming, seems so
13:33:48 so I said I'm not so sure about it after all this thinking
13:33:49 then we can completely abandon 'desired_capacity'
13:34:05 that depends on how users understand those concepts, like desired_capacity
13:34:35 Qiming, maybe, or we use it as the real 'desired' capacity
13:34:36 I was having problems explaining these four numbers to a user
13:35:00 when you do recovery
13:35:18 do you recover to health_threshold (3) nodes or desired_capacity (5) nodes?
13:35:31 if the cluster has 4 nodes active now
13:35:38 do you need to recover it?
13:35:54 it is not ideal (one less than the desired)
13:36:08 for recovery, I think we should try to recover to desired_capacity
13:36:36 recovering to health_threshold would already make the cluster healthy
13:36:37 hmm, I think you're right, we should only maintain three concepts here
13:36:58 I've been struggling with this for a long time
13:37:37 was trying to get some more brains on this
13:37:53 then we need to explain the difference between 'ideal' and 'what you want'
13:38:08 that's too confusing
13:38:19 yes, that was my thought a long time ago
13:38:32 what you want == ideal
13:38:41 because the reality always changes
13:38:45 so if we only have desired_capacity, that is the threshold of healthy and also 'what the user wants'
13:38:58 we always try to make the cluster size match it
13:39:07 that is the *desired* capacity
13:39:23 yes
13:39:30 it could be an over-simplification
13:39:50 desired means what the user wants
13:40:08 and we try to meet their requirement when creating/recovering a cluster
13:40:09 yep, what else could it be ...
13:40:35 if we fail to achieve that goal, the cluster is unhealthy since it doesn't meet the user's expectation
13:40:40 yes, the engine will always try its best to "kind of converge" the cluster to that size
13:41:15 ok, I'm much clearer now
13:41:20 about this
13:42:15 okay, if we all agree on this "over" simplification
13:42:22 we can start closing the loop
13:42:30 yes
13:42:37 sounds good!
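The percentage-threshold idea debated above (with the numbers 1, 3, 5, 7: a cluster with desired_capacity 5 and a 60% threshold is unhealthy when only 2 nodes are active, healthy at 3 or more) can be sketched as a small helper. This is an illustrative sketch only, not Senlin code; the function name, status strings, and default threshold are all hypothetical, and the group ultimately simplified the model to use desired_capacity alone.

```python
# Hypothetical sketch of the percentage-based health threshold discussed
# above. NOT actual Senlin engine code.

def cluster_health_status(active_nodes, desired_capacity, threshold=0.6):
    """Return 'HEALTHY' or 'WARNING' based on the active-node ratio.

    With desired_capacity=5 and threshold=0.6 (i.e. 3 nodes):
    2 active nodes -> WARNING, 3 or more -> HEALTHY.
    """
    if desired_capacity <= 0:
        # An empty cluster has nothing to be unhealthy about.
        return 'HEALTHY'
    ratio = active_nodes / desired_capacity
    return 'HEALTHY' if ratio >= threshold else 'WARNING'
```

Setting threshold=1.0 collapses this back to the simplified model agreed on above, where desired_capacity is both the target and the healthy/unhealthy boundary.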
13:42:46 we can hear node (VM) failure events now
13:43:12 the basic health management will do some recovery with and without guidance from a policy
13:43:37 kind of a convergence to the desired_capacity
13:44:02 yes, so the policy just makes it "automatic" and "smarter"
13:44:19 in the future, when needed, we can add an option: the engine can optionally converge the cluster size to a number you want, not necessarily the 'desired_capacity'
13:44:31 that is sort of equivalent to health_threshold
13:44:54 maybe a policy option or something
13:45:16 yes, defining that property in a policy makes more sense IMHO
13:45:21 moving on
13:45:38 no update on documentation, though I did fix some links on the senlin wiki
13:46:11 reinstalled ceilometer along with aodh, will try to work out some tutorial docs on manual/auto scaling
13:46:23 nice
13:46:25 haiwei_, progress on container support?
13:46:45 no progress this week
13:47:09 waiting for your review on the initial patch
13:47:15 okay, let us know if you need someone for a discussion
13:47:44 ok
13:47:49 I haven't reviewed that? ...
13:47:54 my fault
13:47:55 sorry
13:48:10 it's ok
13:48:18 no progress on event/notification yet
13:48:38 though I have a half-baked patch on generalizing the backend
13:48:52 oops, we only have 12 mins left
13:49:11 #topic cluster-collect call
13:49:33 let me walk you quickly thru the cluster-collect call I'm adding
13:49:50 the basic requirement is that when you have created a cluster of nova servers, for example
13:50:02 you want to get a list of the IP addresses of all nodes
13:50:28 this is a command being added, which necessitates a new engine version and a new api version
13:50:36 you will be able to do things like this:
13:50:55 senlin cluster-collect -p details.addresses.private[0]
13:51:05 this function is useful for container clusters, I think
13:51:23 the "details.addresses.private[0]" is modeled as a json path
13:51:46 I have some local patches to be committed to senlin/sdk/senlinclient
13:52:18 still trying to iron out some issues
13:52:32 but basically, it works pretty well
13:52:59 in the simple case you can do: senlin cluster-collect -p name
13:53:13 that will give you a list of names on the command line
13:53:27 you can also do: senlin cluster-collect -p details.addresses.private[0] -L
13:53:49 the "-L" switch will print the output as a two-column table
13:54:03 the first column is the node id, the second is the attribute value
13:54:32 nice. Have you decided which type of data the collect operation will return? list or dict
13:54:33 the engine patch and rpc patches are ready for review, I'm working on the api layer
13:54:49 if you review the code
13:55:44 you will see it is returning something like this: {'cluster_attributes': [{'id': 'NODE1', 'value': 'V1'}, {'id': 'NODE2', 'value': 'V2'}]}
13:56:13 I see
13:56:24 that's a quick update on the cluster-collect call
13:56:31 #topic open discussions
13:56:54 open for questions/comments/suggestions
13:57:33 nothing from my side
13:57:36 btw, I have got a company-internal call for presentations for the Barcelona summit
13:57:46 time flies
13:57:57 yes...
13:58:07 the end of oct.
13:58:07 yes...
13:58:35 will think about it
13:58:56 lao si ji, dai dai wo. ("veteran driver, take me along")
13:59:05 :P
13:59:10 I can read it
13:59:27 :)
13:59:28 maybe we can have a brainstorm about it
13:59:34 time's up
13:59:36 during coffee time
13:59:38 yes
13:59:42 thanks for joining guys
13:59:52 have a good night/day
13:59:56 thanks
14:00:00 #endmeeting
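As a footnote to the cluster-collect discussion in the meeting, the response shape quoted there ({'cluster_attributes': [{'id': ..., 'value': ...}]}) can be illustrated with a minimal path evaluator. This is a hypothetical sketch, not the actual Senlin engine code: the collect function and the node dicts are made up for illustration, and where Senlin models the parameter as a full JSON path, this sketch handles only dotted keys with optional [n] indexing.

```python
# Illustrative sketch (NOT Senlin code) of evaluating a cluster-collect
# style attribute path, e.g. "name" or "details.addresses.private[0]",
# against each node and returning {'cluster_attributes': [...]}.
import re


def collect(nodes, path):
    """Evaluate a dotted path (with optional [n] indexing) per node."""
    # Parse "details.addresses.private[0]" into
    # [('details', None), ('addresses', None), ('private', '0')].
    steps = []
    for part in path.split('.'):
        m = re.match(r'(\w+)(?:\[(\d+)\])?$', part)
        steps.append((m.group(1), m.group(2)))

    results = []
    for node in nodes:
        value = node
        for key, idx in steps:
            value = value.get(key) if isinstance(value, dict) else None
            if idx is not None and value is not None:
                value = value[int(idx)]
        results.append({'id': node['id'], 'value': value})
    return {'cluster_attributes': results}
```

With two nodes whose details carry a private address list, collect(nodes, 'details.addresses.private[0]') yields one {id, value} pair per node, matching the two-column table the "-L" switch prints.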