13:00:05 #startmeeting senlin
13:00:06 Meeting started Tue Mar 15 13:00:05 2016 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:07 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:10 The meeting name has been set to 'senlin'
13:00:21 good evening
13:00:30 evening!
13:00:40 evening!
13:00:49 hi, elynn, lixinhui_
13:01:02 Hi
13:01:08 hi, chuck
13:01:15 hi
13:01:18 meeting agenda posted here: https://wiki.openstack.org/wiki/Meetings/SenlinAgenda
13:01:19 hi
13:01:31 pls check if there are things to add
13:01:51 #topic austin summit planning
13:02:14 so we have three talks accepted
13:02:41 elynn, yanyan and I had a discussion this noon, about the 'senlin deep dive' session
13:03:02 we are going to prepare the talk based on the deck I presented during the mitaka mid-cycle meetup
13:03:21 this session will be mainly focused on the why's and how's
13:03:58 is the midcycle meetup deck posted somewhere?
13:04:08 it will walk the audience through the project's origin, its architecture, its use case and roadmap, possibly a comparison to aws scaling and heat-based scaling
13:04:25 no, chuck, I can send you a copy if you are interested
13:04:55 Yes, thanks.
13:05:13 other than that 'deep dive' kind of overview, we won't have a lot time to dive into a particular theme, e.g. HA, AutoScaling, policies, profiles, drivers ...
13:05:43 we will need to start prepare the talk as early as possible
13:05:57 the second one is mostly about autoscaling
13:06:43 it will talk about the requirements, the projects/services involved, the various usage scenarios and the senlin's solution
13:07:15 hopefully, we can combine all the load-balancing, auto-scaling and high-availability things into a single Heat template
13:07:43 lixinhui_ has been driving this work
13:07:58 we'll help her get this talk carefully prepared as well
13:08:40 the third one is more of an open discussion, creating and managing containers using senlin, thus making containers a first-class citizen on openstack
13:09:08 we are supposed to do a good survey of all existing proposals/projects and articulate why we are doing this
13:09:32 what are the specific problems we need to solve, and how are we planning to address them
13:10:01 yes, Qiming
13:10:19 I have updated the spec, hope you can review it
13:10:21 this one is the most visionary one, and we are really hoping we can get some constructive feedbacks from the community
13:10:31 and also ask some magnum guys to review it
13:10:42 even if there are opponents, we will listen to what they say
13:10:53 that would be good haiwei
13:11:02 but one notion
13:11:26 last time we collaborated with magnum developers on autoscaling the VM clusters used to run containers
13:11:44 there was a proposal in magnum to do their own auto-scaling engine
13:12:26 we are open to do all communication, but we are not planning to do stupid things
13:12:39 any comments on the talk preparation?
13:12:44 suggestions?
13:13:30 moving on
13:13:40 #topic mitaka work items
13:13:50 #link https://etherpad.openstack.org/p/senlin-mitaka-workitems
13:14:28 stress testing side, contacted xujun, they don't have bandwidth at the moment
13:14:39 but anyway, we are introducing tempest into senlin
13:14:39 ok
13:14:48 Bran has gotten some data
13:14:51 we have a topic on this in summit
13:14:54 oh, really?
13:14:57 lixinhui_, great
13:15:02 I will send the document to all of you
13:15:12 thank you madam
13:15:15 need your help to improve
13:15:26 very crude one
13:15:26 that would be a good starting point
13:15:31 sure, this will be a very good start point
13:15:42 but we can do better with your suggestions and help
13:15:53 yanyan and I talked about tempest, and elynn joined the discussion as well
13:16:04 I have noticed that a heat stack will indicate create complete even though the senlin cluster create fails.
13:16:05 also, I will start to investigate rally to see whether there are something we can leverage
13:16:14 Is someone looking into this?
13:16:47 the plan is to introduce tempest test into senlin asap, since we are not planning to add new features until the newton development opens
13:16:54 cschulz, bug filed?
13:17:03 Good. thanks.
13:17:11 Hi cschulz, is there any bug you open for it?
13:17:52 the api/functional/scenario tests will be driven by tempest, and we will decide later whether using rally to do stress test is viable
13:17:55 No I didn't since it wasn't causing any issues, but thought I'd bring it up in the area of testing.
13:18:32 cschulz, it sounds like a bug to me, :)
13:18:46 oh... if so, I think it should be fixed in senlin...
13:18:52 Yes will open bug
13:19:03 thanks, cschulz
13:19:12 health management/policy
13:19:31 I believe API side check/recover has done
13:19:37 node status polling is labeled with 'need tests'
13:19:52 functional tests?
13:20:01 oh
13:20:19 seems need to add some functional tests
13:20:27 yep
13:20:41 some functional test for the health-manager
13:20:50 hi, xinhui, there has been a functional test case for node check/recover
13:20:53 will refer to Yanyan's work and do it
13:21:00 lixinhui_, still stuck by the lb errors?
13:21:00 but just for basic workflow
13:21:18 yes, Qiming...
13:21:38 yanyanhu_, we will read your work and add some tests for health-manager
13:21:40 can we switch back to haproxy instead of octavia for the moment?
13:21:48 haven't figured out how to test node/cluster check/recover with more complicated cases
13:22:03 lixinhui_, any question, please just ping me
13:22:16 thanks, yanyanhu.
13:22:21 seems we need to emulate some node failures anyway
13:22:44 yes, Qiming
13:22:44 for the robustness of all other apis, besides the health checking ones
13:23:23 Qiming, yes, a feasible way is providing another test driver for this purpose
13:23:41 insert random failures?
13:23:45 but it's not a good way I think
13:24:03 Qiming, yes, that's what we want. But not sure how to support it
13:24:18 "random" I mean, if it is really random, you can not verify you testing results
13:24:37 should be a quoted version, :)
13:25:04 maybe we can improve the fake driver to do things in a controlled way
13:25:13 yes
13:25:19 this is what I'm thinking
13:25:32 definitely not another set of drivers
13:26:22 I also think so. That's really not a graceful way to solve this issue
13:26:23 we still need an etherpad to discuss the use cases
13:26:27 for HA
13:26:56 lixinhui_, you want to drive that?
13:27:06 https://etherpad.openstack.org/p/senlin-ha-recover
13:27:07 sure
13:27:42 collect some discussion last week into the etherpad
13:27:42 great, you already created one, :)
13:27:55 cool, will read and comment offline
13:28:15 documentation side, just committed some more documents for review
13:28:16 will extend it for use case discussion purpose
13:28:27 still need to add docs for the placement policy
13:28:38 no progress on wiki site revision yet
13:29:37 the autoscaling sample can be based on our current work for the summit talk, right?
13:29:53 yes
13:29:59 I think so
13:30:13 okay, just leave it there, it belongs to mitaka
13:30:25 the next one is container profile
13:30:34 as a contrib it is okay
13:30:38 Is that autoscale sample similar to the work Ethan has been doing there?
13:30:50 cschulz, yes
13:31:15 cschulz, https://www.openstack.org/summit/austin-2016/summit-schedule/events/7469
13:31:31 or, you can say, a more comprehensive one
13:31:52 it is a combination of auto-scaling, auto-healing and load-balancing
13:32:28 still no progress on the NODE_CREATE/DELETE action thing
13:33:17 I'm adding an API microversioning item
13:33:26 hopefully can be done by mitaka release
13:33:42 the idea will be borrowed from nova and cross-project specs
13:34:01 it is a prerequisite for any further revision to senlin api
13:34:09 we are locking the version 1.0 api
13:34:38 any (user visible) changes to it will necessitate a 1.1 version bump
13:34:53 before doing that, we will need a microversioning infra in place
13:35:15 that's pretty a heavy workload before final release
13:35:17 comments?
13:35:41 is this micro version similar to nova micro version?
13:35:54 [21:33] the idea will be borrowed from nova and cross-project specs
13:36:05 ok
13:36:21 okay, moving on
13:36:29 #topic stricter policy checking
13:36:56 so this problem come to my mind when I was reviewing the policy implementations and trying to document them down
13:37:25 currently, a policy checks its TARGET against the actions to be (or already) executed
13:37:51 the problem is that we don't have a strict definition/realization of TARGET
13:38:25 you mean the naming of TARGET?
13:38:38 an action not listed in the TARGET may not trigger the policy checking, even if it may arouse some problems
13:38:42 I think we have defined TARGET action list in some policies
13:38:57 take affinity policy as an example
13:39:17 supposed I have created a cluster and attached an affinity policy
13:39:26 now I want to do cluster_node_add
13:39:52 and the node I want to add doesn't belong to the servergroup, it is from some other hosts
13:39:57 can I do it?
13:40:25 nope I think
13:40:30 if we allow the policy check to succeed, the cluster violates the affinity setting
13:40:40 the affinity policy should reject this request
13:40:47 if we don't allow it, we should say NO in the affinity checking
13:40:58 I think that the only way to be totally sure is to trigger all Policies for every action and have the policy decide if it is appropriate.
13:41:02 right, that is the problem I'm talking about
13:41:12 But that is kind of heavy weight.
13:41:42 I'm creating a table with all actions as the columns and all policies as the rows
13:41:45 is cluster_node_add now in the target list of affinity policy?
13:42:16 in each cell, we should fill in "CHECK", "IGNORE"
13:42:35 yanyanhu_, no, it is not listed there
13:42:45 ok, so we should add it
13:42:48 I think this is more about managing policy target in a better way
13:42:56 We will also need to consider how to amend this table when a new policy plugin is added.
13:43:05 yep. cschulz
13:43:24 the table should be documented
13:44:01 can this table be revised dynamically?
13:44:15 And probably a method in a plugin which amends the table, similar to how the plugin mapping occurs
13:44:31 when filling in "IGNORE", we can save the action from the TARGET list, but we have to bear in mind, an "IGNORE" is equal to say "YES"
13:44:59 cschulz, that checking was done in the policy base class, IIRC
13:45:13 we can check if it can be made more flexible
13:45:52 'IGNORE' means not check, right?
13:46:05 haiwei, exactly
13:46:58 when you are checking, you can say "yes" or "no", when you are silent, it means a "yes", that is the difference we need to clarify
13:47:16 Qiming, a question is when those cell will be filled in
13:47:28 when a policy plugin is loaded?
13:47:43 I think when the action is executed
13:47:51 we can start with a static one, and then try make it more flexible
13:48:18 there should be no check before some actions are executed
13:48:26 for example, we can load a policy and check that policies definition and determine when it should be checked ...
13:48:50 yes, this makes sense
13:49:02 just a little concerned about changing this target relationship dynamically
13:49:19 yes, yanyanhu_ , we have been there
13:49:30 since it will influence all policy instances and action instances
13:49:35 so, for builtin policies, we will use a static definition
13:49:55 too much flexibility will lead the whole scenario unmanageable
13:50:02 yes
13:50:14 that is exactly what I mean
13:50:20 let's fill the holes one by one
13:50:28 currently the target relationship is changed dynamically?
13:50:55 for example, zone placement and region placement cannot handle cluster resize actions, but they should
13:51:09 haiwei, no, it is implemented by hard code in definition of each policy
13:51:38 e.g.
13:51:38 deletion_policy.py: TARGET = [
13:51:38 deletion_policy.py- ('BEFORE', consts.CLUSTER_SCALE_IN),
13:51:38 deletion_policy.py- ('BEFORE', consts.CLUSTER_DEL_NODES),
13:51:38 deletion_policy.py- ('BEFORE', consts.CLUSTER_RESIZE),
13:51:38 deletion_policy.py- ]
13:51:38 ok, by giving them some integers, yanyanhu_?
13:52:00 haiwei, actually some consts
13:52:10 yes
13:52:13 just like what Qiming showed
13:52:16 these are the relationships between actions and policies
13:52:33 and ... among policies, there is a priority checking
13:53:12 so most of these can be treated as bugs
13:53:34 and we can strive to fix them starting from now
13:54:03 yanyanhu_, since you are looking into rally
13:54:13 Qiming, you mean remove priority attr from policy?
13:54:23 maybe you can give team an assessment next week?
13:54:37 Can the table be posted somewhere so we can all look at it and comment.
13:54:42 yanyanhu_, no, I was pointing out that we are talking about different things
13:54:51 yes, cschulz
13:55:08 Qiming, sure, will do some investigation in coming week
13:55:08 #action Qiming to draft a table (action, policy) and post it online
13:55:14 Qiming, I see
13:55:28 #topic open discussions
13:56:09 can we arrange a discussion sometime this week to talk about containers?
13:56:20 sure
13:56:31 maybe a phone call?
13:56:36 maybe Tuesday morning?
13:56:39 I was wondering if there is a 'proof reading' policy for our web pages, wiki pages etc. I've noticed quite a lot of typos and misspellings and it looks unprofessional.
13:56:48 I am ok for phone call meeting
13:57:19 cschulz, all docs, except for the wiki pages are generated from senlin source tree
13:57:34 any helps are welcomed on polishing the docs
13:57:44 I think maybe we can start to post some items in the following etherpad about what we are doing for newton cycle
13:57:54 https://etherpad.openstack.org/p/senlin-newton-workitems
13:58:05 maybe just some draft
13:58:14 yes
13:58:15 cschulz, there is no native English speaker in the team
13:58:28 Wrong, there is me.
13:58:39 s/is/was
13:58:41 :)
13:58:45 cschulz cool :)
13:59:00 cschulz great!
13:59:19 I thought you were a Chinese too
13:59:35 time's up, guys, thanks for joining, until next week!
13:59:46 see u
13:59:46 thanks
13:59:47 #endmeeting
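
Illustration of the (action, policy) table from the "stricter policy checking" topic. This is only a rough sketch of how a static CHECK/IGNORE table for built-in policies might look, and it is not Senlin code. The deletion row mirrors the TARGET list quoted in the log; the affinity row and the "CLUSTER_NODE_ADD" action name are assumptions made up for the example, as are the names POLICY_TARGETS, cell and policies_to_check.

```python
# Hypothetical sketch only: none of these names are Senlin's actual API.
CHECK = "CHECK"
IGNORE = "IGNORE"

# Columns of the table. The first three come from the TARGET list quoted
# in the log; "CLUSTER_NODE_ADD" is an assumed name for cluster_node_add.
ACTIONS = [
    "CLUSTER_SCALE_IN",
    "CLUSTER_DEL_NODES",
    "CLUSTER_RESIZE",
    "CLUSTER_NODE_ADD",
]

# Rows of the table: for each built-in policy, the actions it must be
# consulted for. Any action not listed is an IGNORE cell.
POLICY_TARGETS = {
    "deletion": {"CLUSTER_SCALE_IN", "CLUSTER_DEL_NODES", "CLUSTER_RESIZE"},
    "affinity": {"CLUSTER_NODE_ADD"},  # assumed row, per the discussion
}


def cell(policy, action):
    """Return the table cell for one (policy, action) pair."""
    targets = POLICY_TARGETS.get(policy, set())
    return CHECK if action in targets else IGNORE


def policies_to_check(action, attached):
    """Keep only the attached policies that must be checked for an action.

    A policy that is skipped here has implicitly said "yes", which is the
    CHECK versus IGNORE distinction raised in the meeting.
    """
    return [policy for policy in attached if cell(policy, action) == CHECK]


def render_table():
    """Render the whole table as text, policies as rows, actions as columns."""
    lines = ["policy | " + " | ".join(ACTIONS)]
    for policy in sorted(POLICY_TARGETS):
        cells = [cell(policy, action) for action in ACTIONS]
        lines.append(policy + " | " + " | ".join(cells))
    return "\n".join(lines)
```

Calling policies_to_check("CLUSTER_NODE_ADD", ["deletion", "affinity"]) would consult only the affinity policy in this sketch. Keeping the table as plain static data matches the "start with a static one" suggestion, and a plugin could later amend it by adding its own row.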