13:03:12 #startmeeting senlin
13:03:12 Meeting started Tue Mar 22 13:03:12 2016 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:03:14 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:03:16 The meeting name has been set to 'senlin'
13:03:24 hello
13:03:28 hi
13:03:30 o/
13:03:31 Hi
13:03:33 hi
13:03:35 Hi
13:03:42 sorry for being late, trapped in some microversion patches
13:03:51 pls review agenda and see if you have things to add
13:04:09 #link https://wiki.openstack.org/wiki/Meetings/SenlinAgenda
13:04:37 first thing I'd like to bring up is about summit prep
13:05:48 for the deep dive session, we'll need a high-level structure so every presenter can start prepare his/her own section
13:06:16 elynn brought up this last week?
13:06:34 this week :)
13:06:42 okay, :)
13:06:42 Hi
13:06:54 chenying__, wlc
13:07:26 so my suggestion is like this: elynn to start with why we started senlin
13:07:52 based on our understanding of the requirement, the discussions with Heat team etc
13:08:31 then I follow with an intro to the senlin architecture and some design decisions
13:09:02 yanyan can close the session with some status update, roadmap and pointers, examples etc
13:09:14 ok
13:09:30 great!
13:09:44 okay, next, the autoscaling one
13:09:48 will start the paper work in coming week :)
13:10:14 mark will start talking about auto-scaling requirements
13:10:38 and I can follow with a generic intro of senlin
13:10:59 then xinhui can close this with a real deployment example and lessons learnt
13:11:16 okay
13:11:24 lixinhui, is the other presenter 'mark'?
13:11:37 just some rough ideas
13:11:51 I think it would be good
13:11:54 any comments are welcomed
13:12:05 We need to start seeding the autoscaling discussions with topics beyond the basic Reactive autoscaling
13:12:08 will involve you to discuss with Mark
13:12:21 Predictive autoscaling
13:12:37 okay, great, cschulz, will incorporate those
13:12:57 Also the requirement to bring things like Business Process Management into the scaling decisions.
13:13:00 and in some cases, semi-auto-scaling
13:13:08 Yes
13:13:23 that is one way to look at it.
13:13:59 The scaling driven by events from outside the Cloud
13:14:01 cschulz, we can dive into that requirement later, when the newton development cycle is open
13:14:41 yes, that is why we decided to provide 'receiver' instead of 'trigger' abstraction
13:15:18 in real deployments, people have various monitoring/alerting solutions to hook into the scaling workflow
13:15:19 Has the receiver security issue been solved?
13:15:39 DOS ...
13:15:48 cschulz, not yet
13:16:18 we will need to explore other flavors of receivers, beyond the basic webhook one
13:17:11 one promising alternative is to bring zaqar into the picture, so the triggering end doesn't have to know where the receiving end resides
13:17:39 another related topic is to add a rate-limit middleware in front of senlin-api
13:17:45 I will look into zaqar. That is new to me.
13:18:08 it is basically the openstack version of AWS SQS
13:18:43 last one is about containers, I saw haiwei has submitted a new patchset on that spec
13:18:53 I haven't got time to review it
13:19:27 almost the same with what we discussed last week
13:19:32 still evaluating whether senlin + ansible is a good solution for users to create/manage containers
13:20:16 have no idea how to introduce ansible to container cluster
13:20:16 I've been learning myself some ansible basics, so far, it is pretty useful
13:20:35 haiwei_, it would be something like this:
13:21:00 senlin cluster-create => a vm cluster, with list of IPs
13:21:30 ansible -i -m
13:22:20 you can ssh into those VMs in batch and perform operations to do container environment preparation and even operate them
13:22:40 the ansible command should be typed by hand by users?
13:22:50 good question
13:23:03 for operators, it is not a difficult task
13:23:33 for end users, we can invoke ansible by calling its API from inside senlin, do whatever necessary jobs on those VMs created
13:24:47 that means we still need to create a scheduler and something else?
13:24:50 maybe we should allow this ansible deployment triggered by some events
13:24:56 I have tried writing some playbooks (ansible term) for this kind of a task, it is pretty neat
13:25:21 just like other actions
13:25:22 maybe a custom action
13:25:25 yes
13:25:36 just a wrapper of ansible
13:26:05 previously, I tried paramiko, the underlying module used by ansible
13:26:10 it works as well
13:26:32 but 'it works' <> 'it is an elegant design'
13:26:45 do we need to provide high level abstraction for deployment and support ansible as one of the deployment type?
13:26:55 haiwei_, we can have a phone call tomorrow?
13:27:10 yanyanhu_, that is a good idea
13:27:13 I am ok tomorrow afternoon
13:27:14 maybe, as the default type
13:27:32 the high level abstraction maybe just a container image
13:28:17 by default it would be a docker image, for example
13:28:29 exactly
13:29:06 okay, this is something we need to dive into in this week
13:29:19 #topic newton work items
13:29:32 what 14:00~15:00 beijing time
13:29:39 for the phone meeting?
13:29:42 just moved etherpad items here: https://etherpad.openstack.org/p/senlin-newton-workitems
13:30:10 haiwei_, fine with me
13:30:42 yanyanhu_ ?
13:30:54 I'm fine with it
13:30:59 anyone interested in this can join
13:31:14 I will dial in
13:31:28 o/
13:31:38 okay, I'll send an invitation to you all
13:32:05 last Friday, we have cut the RC1 branch
13:32:19 now we have a stable/mitaka branch on git
13:32:31 cool!
13:32:38 that branch is meant to be the base for the coming stable release
13:32:38 I saw the release tag in senlin branch.
13:33:03 if you have found any critical bugs, please bring it up asap
13:33:54 back to newton work items
13:34:28 tempest tests are coming in
13:34:40 saw elynn's patch
13:34:50 I create a BP for it.
13:35:07 And start to look into it
13:35:17 maybe we should retarget the test cases about failure scenarios and stress tests on it?
13:35:29 yes, I think so
13:35:52 thanks, elynn, that would be very important a QA infra
13:36:10 those tests should be included in tempest test framework
13:36:24 oh...seems I need to speed up to not let you blame me ;)
13:36:57 yanyanhu_ will blame you
13:37:05 sorry. really trapped by some other works in recent weeks
13:37:12 yes...
13:37:32 just kidding
13:37:37 After I rebuild my env, I will start to migrate exist tests on tempest framework.
13:37:40 will investigate rally ASAP to decide how we should support stress test
13:37:54 we need such an infra to enable more thorough tests
13:38:08 cool, yanyanhu_
13:38:15 Qiming, no problem, really my fault :(
13:38:34 oh .... no one's fault
13:38:49 we are all heroes, ;)
13:39:00 health management side
13:39:48 where are we about lb based health detection?
13:40:01 lixinhui, asleep?
13:40:04 I just
13:40:18 added lb-node-polling
13:40:25 into the prototype
13:40:48 underlying, it will rely on neutron
13:40:50 lixinhui, so you're using haproxy as the service provider now?
13:40:55 ayes
13:40:56 yes
13:41:06 ok, at least it works
13:41:26 I am thinking
13:41:36 about the demo
13:41:44 actually the prototype
13:42:10 maybe we could attach the health_polling policy onto lb cluster
13:42:21 then simulate member failure
13:42:42 then watch the auto-healing of the failed member
13:42:43 yes?
13:43:06 by list the member of pool and senlin cluster node status before and after auto-healing
13:43:09 I believe there are many details to be figured out?
13:43:22 yes
13:43:35 right, a cluster with scaling policy, health policy and lb policy
13:44:01 let's see how they dance/fight together
13:44:17 yes
13:44:21 Xujun
13:44:22 you mean the auto healing will be triggered by simulated failure which is actually not a failure detected by lb monitoring
13:44:38 one intern is trying to write the heat template
13:44:49 including everything into a template
13:45:05 yanyanhu_
13:45:34 I add lb member status polling logic
13:45:37 for this demo
13:45:55 suppose the status of member should be changed by health monitor
13:46:11 ah, I see
13:46:12 lixinhui, please help bring xujun into the IRC channel, so when he has got questions, he know where to seek help
13:46:24 sure
13:46:45 one problem in my mind
13:46:50 we don't have octavia in the picture now?
13:46:56 no
13:46:58 now
13:47:03 great
13:47:19 I am deploying nsx based lbaas
13:47:34 to see any problems there and ask mark to help if does
13:47:45 cool
13:48:16 for the quick demo, we can focus on haproxy, but it would be great if someone helps on validating it using nsx
13:48:31 yes
13:48:49 what is nsx?
13:48:53 hope it would add cents to senlin marketing
13:49:06 NSX is vmware neutron engine
13:49:21 hardware loadbalancer from citrix?
13:49:25 almost all customers bought its license
13:49:27 oh
13:49:44 HW loadbalancer?
13:49:52 SDDN
13:49:55 sorry, I was wrong :)
13:50:05 pure
13:50:07 software
13:50:17 Makes sense
13:50:29 one problem in my mind
13:50:37 but maybe not important for this demo
13:50:56 is how to differentiate the normal node stop and abnormal down
13:51:02 my guess it is a question, not a problem, :P
13:51:10 yes :)
13:51:13 Maintenance mode?
13:51:19 exactly
13:51:23 yes
13:51:39 the HA policy has to be put on hold in some scenarios
13:51:58 Or at least for a subset of the cluster
13:51:59 disable the HA policy temporarily?
13:52:00 that is why we have cluster-policy binding and there is a 'enabled' field there
13:52:31 if it is always effective, you won't be able to do a thing to your cluster, :)
13:52:43 ok
13:52:53 when to disable it
13:53:22 So should we add a 'semi-auto scaling' topic to Newton?
13:53:35 :)
13:53:37 sure, cschulz
13:53:50 OK will add some stuff
13:54:11 lixinhui, sounds to me more like an interaction between policy and action again
13:54:28 yes, Qiming
13:54:43 this will be more complex
13:54:55 before some actions are performed, we will have to disable ha, after that action is done, we re-enable it
13:55:52 will think more about it
13:55:53 yes, maybe we should throw some warning to user when they perform 'maintenance operations' with HA policy enabled
13:56:18 :)
13:56:24 okay, that's something deserve a whole hour for discussion
13:56:31 let's continue this on senlin tomorrow
13:56:38 okay
13:56:45 last update from me
13:56:57 policy docs are almost all out there
13:57:19 just started working on micro-versioning support for senlin-api
13:57:37 before the versioning thing has landed, we won't accept any changes to senlin-api
13:57:56 #topic open discussion
13:58:09 2 minutes
13:58:19 no more topics from me
13:58:31 no from me
13:58:38 We've run into Action suspend not working. Has that ever been tested?
13:58:42 ok for me
13:58:59 cschulz, that .... is not thoroughly tested
13:59:02 cschulz, it is not supported I think
13:59:22 really haven't test it yet...
13:59:30 there are some db level support
13:59:31 OK, That's OK since we will be following a different approach
13:59:37 but the whole thing is half-baked
13:59:55 thanks for joining, guys, have a good night/nice day
14:00:00 u2
14:00:01 #endmeeting
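
The maintenance pattern raised around 13:51 to 13:55 (put the HA policy on hold before an action, then re-enable it afterwards) could be sketched roughly as below. This is only an illustration under stated assumptions, not senlin code: `ClusterPolicyBinding`, `maintenance`, and `should_heal` are hypothetical names, and the "ERROR" status string is assumed. Only the idea of an 'enabled' field on the cluster-policy binding comes from the discussion.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class ClusterPolicyBinding:
    """Hypothetical stand-in for a cluster-policy binding with an 'enabled' flag."""
    cluster: str
    policy: str
    enabled: bool = True


@contextmanager
def maintenance(binding):
    """Put a policy on hold for the duration of an action, then restore it."""
    previous = binding.enabled
    binding.enabled = False
    try:
        yield binding
    finally:
        # Restore the earlier state even if the action raised.
        binding.enabled = previous


def should_heal(binding, node_status):
    """Recover a node only when the policy is enabled and the node is down."""
    return binding.enabled and node_status == "ERROR"


ha = ClusterPolicyBinding(cluster="web", policy="health")
with maintenance(ha):
    # A deliberate node stop inside this block is not treated as a failure.
    healed_during = should_heal(ha, "ERROR")
healed_after = should_heal(ha, "ERROR")
```

Restoring the previous value, rather than always setting the flag to true, means a policy that an operator had already disabled stays disabled after the maintenance action finishes.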