13:00:23 #startmeeting senlin
13:00:23 Meeting started Tue Jul 26 13:00:23 2016 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:24 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:27 The meeting name has been set to 'senlin'
13:00:33 Hello.
13:00:36 hi
13:00:38 o/
13:00:38 #topic roll call
13:01:07 hello
13:01:30 o/
13:01:47 Evening!
13:01:51 xinhui or haiwei online?
13:01:59 Yes
13:02:06 Just jumped in
13:02:24 you jumped beautifully
13:02:30 :)
13:02:36 #topic newton work items
13:02:47 #link https://etherpad.openstack.org/p/senlin-newton-workitems
13:03:13 any progress on stress testing last week?
13:03:57 I saw yanyan's rally work blocked by the release-cut window
13:04:07 Qiming, yes
13:04:27 still waiting, guess we need to wait for a while longer
13:04:42 is that patch the last one we will "beg" rally to merge in?
13:05:03 :)
13:05:11 Qiming, it is not necessary to add all of them into the rally repo
13:05:22 yep
13:05:25 but letting them stay on the rally side is better than keeping them inside senlin
13:05:35 benefit?
13:05:44 so we will first add plugins into our repo and migrate them into rally gradually
13:05:54 we don't need to hold them ourselves
13:06:12 but it will still be the senlin team maintaining them
13:06:20 sure
13:06:33 then what's the benefit?
13:07:02 just that once there is some structural refactoring inside rally, we will know at once if it breaks the senlin plugin, I think
13:07:03 hopefully, we won't forget adding/modifying rally jobs when we change things ...
13:07:11 sure
13:07:21 that testing can be done at the senlin gate as well
13:08:09 a little bit upset by the slow reviews there
13:08:20 Qiming, yes, me too...
13:08:47 looks like the team doesn't have enough bandwidth for all these reviews...
13:09:15 but we do get lots of important comments :)
13:09:33 they help improve my patches and give me a better understanding of rally
13:09:39 anyway, I agree we should do this in the senlin repo first, then migrate to rally step by step
13:09:51 Qiming, yes
13:09:53 this is my plan
13:10:00 great
13:10:23 any other updates about benchmarking/performance testing?
13:10:56 guess no
13:10:57 moving on
13:10:59 no other progress I think
13:11:02 health management
13:11:40 I spent some time reading the oslo.messaging code
13:11:47 2 findings
13:12:27 1. the transport used for listeners is supposed to be different from the one used for RPC; that one has been fixed, although we were still getting a working listener there somehow
13:13:21 2. when invoking 'get_notification_listener', we have an opportunity to specify the 'executor'
13:13:28 which defaults to 'blocking' today
13:13:44 other choices are 'threading' and 'eventlet'
13:14:13 oh?
13:14:14 I tried them both but had to revert to 'blocking' for the listeners to work properly
13:14:48 what do the other two mean?
13:14:58 the only pitfall is that we will get a warning from oslo.messaging saying that our listener may hang forever listening to events
13:15:04 that is acceptable
13:15:23 they are imported from the 'futurist' package directly
13:15:35 that package provides options to execute tasks in different flavors
13:15:47 I don't have a lot of bandwidth to dig into that
13:15:56 ok
13:16:07 if anyone is interested in this, here is the doc: http://docs.openstack.org/developer/futurist/api.html#executors
13:16:36 that is how oslo.messaging dispatches events
13:16:52 ok
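For reference, a minimal sketch of the listener setup being discussed, with the 'executor' argument as the knob mentioned above. The endpoint class and the topic name are illustrative assumptions, not Senlin's actual code:

    import oslo_messaging as messaging
    from oslo_config import cfg

    class HealthEndpoint(object):
        # oslo.messaging dispatches notifications to methods named after
        # the message priority ('info', 'error', ...)
        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            pass  # react to the event here

    # a dedicated transport for notifications, separate from the RPC one
    transport = messaging.get_notification_transport(cfg.CONF)
    targets = [messaging.Target(topic='notifications')]  # assumed topic
    listener = messaging.get_notification_listener(
        transport, targets, [HealthEndpoint()],
        executor='blocking')  # 'threading' and 'eventlet' come from futurist
    listener.start()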
13:16:54 LB bug fix, any news there?
13:17:26 Two of three patches have been accepted
13:17:35 still this one https://review.openstack.org/325624
13:17:43 btw, someone stopped by on the senlin channel asking for a working version of the health policy
13:17:56 he said he watched our presentation at the austin summit
13:18:15 oh
13:18:16 I can provide one
13:18:18 that is ringing a loud alarm for me
13:18:25 if he or she needs one
13:18:46 we should be very very very careful when delivering presentations/demos
13:18:47 Adam has some concerns
13:19:09 unless we can ensure users can reproduce the demo easily using the public code base
13:19:42 I think you have revised the health policy from WIP
13:19:44 right?
13:20:03 or else, we will have difficulties attracting them to come back
13:20:17 that health policy is still not working
13:20:31 the loop is not closed
13:20:37 and fencing is not there yet
13:21:13 people will git clone and try it and see that it doesn't work
13:21:16 then they leave
13:22:05 so ... for the coming barcelona presentations, no matter which one(s) are accepted
13:22:16 the demos used in those talks must work
13:22:49 the code/profile/policy has to show up in the main tree
13:23:39 I'll spend time on health management this week
13:23:51 try to close the loop asap
13:24:07 let's move on?
13:24:17 one question
13:24:24 shoot
13:24:28 does https://review.openstack.org/345916 fix the issues xinhui mentioned?
13:24:43 about wait after the listener is started
13:24:43 https://review.openstack.org/346390
13:24:45 this one
13:25:06 pls check the bug report
13:25:17 https://launchpad.net/bugs/1605869
13:25:17 Launchpad bug 1605869 in senlin "hang: wait is waiting for stop to complete" [Undecided,In progress] - Assigned to Cindia-blue (miaoxinhuili)
13:25:36 it is not an error reported by oslo.messaging
13:25:50 oslo.messaging is too smart in this respect
13:26:14 when it detects we didn't set a timer when calling wait()
13:26:29 it will warn us that the listener may listen forever
13:26:30 I see
13:26:33 thus a 'hang'
13:26:49 actually, that is what we wanted in a listener thread
13:27:05 a dedicated listener thread
13:27:23 okay, moving on
13:27:26 just need to ensure stop is explicitly called before stopping the health manager
13:27:39 yep, that will be desirable
13:27:58 however, in a multi-engine setup, we don't have a way to gracefully shut down all threads
13:28:24 if we start a single engine, we can see that all threads are gracefully killed
13:28:33 that is a broader problem to solve
13:28:43 yes, it is
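A minimal sketch of the shutdown ordering being described, assuming 'listener' is the notification listener from the snippet above:

    def shutdown_listener(listener):
        # stop consuming first; calling wait() before stop() has completed
        # is what produces the "wait is waiting for stop to complete" warning
        listener.stop()
        listener.wait()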
13:28:56 moving on, documentation
13:29:24 I'm working on the tutorial documentation for autoscaling today
13:29:44 to make auto-scaling work, I am using ceilometer + aodh + senlin
13:29:59 many interesting/annoying findings
13:30:16 but finally, I got auto-scaling with cpu_util working now
13:30:26 though I knew it should work in theory
13:30:35 exit
13:30:40 share some findings with you:
13:31:19 1. aodh alarm-update cannot process --query parameters properly, so we have to get --query specified correctly when doing 'aodh alarm create'
13:32:04 2. recent modifications to python-openstacksdk are breaking server details retrieval
13:32:05 sounds like a bug?
13:32:24 we cannot get the 'image' and 'flavor' properties if we are using the latest master
13:32:33 means senlin node-show -D will break as well?
13:32:56 yes, that one was broken as well
13:33:06 I see...
13:33:10 I have rebased senlin resources to resource2/proxy2
13:33:37 great
13:33:38 https://review.openstack.org/#/c/344662/
13:34:12 to make that work, I spent a lot of time discussing with the sdk team about the 'to_dict()' method, which was removed from resource2.Resource
13:34:23 it will break all senlinclient resource show commands
13:34:36 yes, I think so
13:34:46 we use [''] now
13:35:01 if you are interested in this, you can check the review history: https://review.openstack.org/#/c/331518/
13:35:12 not a backward compatible change
13:35:23 it took about one month to get that accepted
13:36:11 back to the auto-scaling experiment
13:36:23 yes, noticed the discussion between you and brian
13:36:27 will check it :)
13:36:52 this is how I created an alarm:
13:36:53 aodh alarm create -t threshold --name c1a1 -m cpu_util --threshold 50 --comparison-operator gt --evaluation-periods 1 --period 60 --alarm-action http://node1:8778/v1/webhooks/518fc9b7-01e8-410a-ac34-59fb33cb398f/trigger?V=1 --repeat-actions True --query metadata.user_metadata.cluster=113707a0-8fdc-434f-b824-98fd706a5e0d
13:37:25 the tricky part is the --query parameter; it is not well documented, and it uses 'pyparsing'
13:37:36 the docs say that '==' can be used, but it won't work
13:38:00 no one tells you that you should use 'metadata.user_metadata.cluster' for filtering
13:38:08 well, inconsistency in the documentation again...
13:38:18 I had to read the source code to get it to work
13:38:38 sure, I did that two and a half years ago
13:38:47 when I tried filtering in ceilometer for the first time
13:38:48 after this step, you still won't get an alarm
13:38:55 still happening :)
13:39:27 because in all the cpu_util samples, you won't see the nova metadata included
13:39:48 so ceilometer cannot evaluate the samples, and aodh cannot fire an alarm
13:40:10 looks weird
13:40:12 after reading the source code, I figured out that I have to add one line to the ceilometer.conf file:
13:40:23 reserved_metadata_keys = cluster
13:40:39 what does that mean?
13:40:40 after that, restart the ceilometer compute agent
13:41:17 the ceilometer compute pollster will then know that the 'cluster' key in the nova metadata should be kept
13:41:49 otherwise, ceilometer drops all metadata key-values, unless the keys are prefixed with 'metering.'
13:41:58 I don't think this is documented anywhere
13:42:06 I see
13:42:21 I recall I met a similar problem before
13:42:27 at the end of 2014
13:42:46 I'll document the process in the tutorial doc, so users will know how to make the whole thing work
13:42:57 needed some hacks to address it
13:43:29 since this condition was not always satisfied
13:43:36 yep
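For reference, the ceilometer.conf change described above, assuming the option lives in the default section:

    [DEFAULT]
    # keep the 'cluster' key from nova metadata in the samples, so the
    # alarm query on metadata.user_metadata.cluster has something to match
    reserved_metadata_keys = cluster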
13:44:09 since haiwei is not online and no one is working on container support, we can skip the container profile item
13:44:17 not a pleasant experience :)
13:44:25 engine, NODE_CREATE, NODE_DELETE
13:44:35 I think the problem is solved now
13:44:48 yes
13:44:54 saw those patches
13:44:59 I was thinking of deriving cluster actions from node actions so that policies would be respected
13:45:09 but it turned out to be too complicated
13:45:24 the current solution is good I think
13:45:34 I did a workaround, making policies aware of NODE_xxx actions
13:45:47 that makes things much clearer
13:45:54 yes, and differentiating node actions derived from different sources
13:46:00 so .. deleting that work item
13:46:26 yep, we had that design/impl in place; these patches were just leveraging them
13:46:40 yea
13:46:48 em ... need to add some release notes about this
13:46:56 right :)
13:47:07 zaqar receiver thing
13:47:12 where are we?
13:47:15 no progress this week...
13:47:22 still pending on sdk support
13:47:30 and also document updating
13:47:50 I have done some local testing on the 'message' resource
13:47:52 if sdk support is in, will we get a working version soon?
13:47:56 but there are still some problems to fix
13:47:59 to figure out
13:48:08 then grab wangfl
13:48:11 nope, it is just for the queue
13:48:16 yea
13:48:23 he is working on that I think
13:48:27 saw his patch
13:48:29 okay
13:48:45 then continue grabbing him when necessary, :)
13:49:03 no update about event/notification since last week
13:49:08 sure :) we owe him a beer
13:49:17 ok
13:49:30 #topic newton deliverables
13:49:49 guys, if you take a look at the newton release schedule
13:49:51 #link http://releases.openstack.org/newton/schedule.html
13:50:04 you will see that we are at week R-10
13:50:36 that means we still have 10 weeks before the final 2.0.0 release
13:50:37 a month left
13:51:04 if we consider the newton-3 milestone, we only have 1 month
13:51:21 yes, for feature freeze
13:51:33 hopefully, we can deliver what we planned at the beginning of this cycle
13:51:59 e.g. profile-validate, policy-validate, cluster-collect, cluster-do, health policy, notification, container profile
13:52:20 also the message type of receiver
13:52:20 I might get some spare time next week; hope we can finish that.
13:52:35 elynn, great :)
13:52:53 yep, time to step up and claim the items that interest you most
13:52:56 know you are really stuck on some annoying stuff :)
13:53:07 that is life
13:53:13 :)
13:53:17 yea
13:53:22 always :)
13:53:26 it was never meant to be easy for anybody
13:53:37 :)
13:54:07 glad we can still get things moving forward and even accomplish something we feel good about
13:54:34 let's see what we can complete during the coming month
13:54:40 #topic open discussions
13:54:49 oh, BTW, about the mascot
13:55:01 right, I replied to their email
13:55:01 I guess forest?
13:55:04 :P
13:55:07 maybe just forest
13:55:13 it's an obvious choice for us
13:55:22 that is what senlin means
13:55:29 agree
13:55:32 we still have choices
13:55:41 if you have some favorite animal
13:55:47 yes! that's what senlin is :)
13:56:02 forest is straightforward :)
13:56:20 easy to understand; I think the picture we always use in the slides is ok
13:56:40 email from Heidi:
13:56:40 Thank you so much for the reply! Of course I won't mock you. Actually, I'm thrilled to know you already have a great mascot that works with this project. Senlin will have the first right of refusal on a forest since that's already your logo. You might want to discuss with your team whether you intend the trees in your forest to look deciduous, evergreen, or a specific variety (stands of Aspen, for example). That can help guide our illustrator
13:56:41 to make a forest that reflects what you like.
13:56:41 Cheers,
13:56:41 hope no conflict with other projects :P
13:56:42 Heidi Joy
13:57:08 haha
13:57:17 deciduous, evergreen, or ...
13:57:48 evergreen sounds good, haha
13:57:50 for HA
13:57:57 good point
13:58:50 2 minutes left
13:59:35 thanks for joining, boys and girls
13:59:37 no other topic from me
13:59:42 thanks
13:59:42 wish you all a happy night
13:59:45 a pleasant one
13:59:51 take good care of your baby :)
13:59:54 :)
13:59:54 #endmeeting