13:00:45 <Qiming> #startmeeting senlin
13:00:46 <openstack> Meeting started Tue May 10 13:00:45 2016 UTC and is due to finish in 60 minutes.  The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:47 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:49 <openstack> The meeting name has been set to 'senlin'
13:01:35 <Qiming> hello
13:01:39 <zzxwill> Good evening.
13:01:42 <Qiming> welcome back
13:01:49 <Qiming> o/, zzxwill
13:01:53 <haiwei> hi
13:02:08 <zzxwill> Thanks. Crazy with my work recently:(
13:02:10 <yanyanhu> hi
13:02:13 <Qiming> hi, haiwei, in taipei?
13:02:19 <elynn> Evening!
13:02:25 <haiwei> back now
13:02:44 <Qiming> actually, I only managed to restore my work env 1 hour ago
13:02:57 <Qiming> lost my laptop at austin airport
13:03:05 <Qiming> anyway
13:03:06 <lixinhui_> hi
13:03:07 <haiwei> :(
13:03:10 <zzxwill> Oh, that's a pity.
13:03:18 <elynn> Got a new one?
13:03:23 <Qiming> the only item I have in mind is about newton work items
13:03:53 <Qiming> feel free to add topics to meeting agenda: https://wiki.openstack.org/wiki/Meetings/SenlinAgenda
13:04:28 <yanyanhu> Just added a topic about adding new work items based on our discussion at the summit
13:04:55 <Qiming> yes, that is also about newton work items
13:05:23 <yanyanhu> ok
13:06:06 <cschulz_> Hi
13:06:08 <Qiming_> let's quickly go thru the current list
13:06:10 <yanyanhu> hi, cschulz_
13:06:22 <yanyanhu> ok
13:06:22 <Qiming_> #link https://etherpad.openstack.org/p/senlin-newton-workitems
13:06:43 <Qiming_> scalability improvement, need to sync with xujun/junwei
13:06:49 <yanyanhu> some of them have been done and some should be marked obsolete
13:07:06 <Qiming_> tempest test
13:07:12 <Qiming_> we are on track
13:07:16 <yanyanhu> Ethan and I are working on it
13:07:18 <yanyanhu> yep
13:07:18 <elynn> yes
13:07:20 <Qiming_> basic support is there
13:07:34 <elynn> We are working on API tests
13:07:50 <elynn> still need policy type list and profile type list and negative tests
13:07:59 <elynn> Do we need to add a gate job for it?
13:08:07 <elynn> in experimental
13:08:16 <Qiming_> sure, that would be nice
13:08:40 <elynn> ok, I will work on it then.
13:08:57 <yanyanhu> may also need to rework the client to enable negative tests. Or we can use exceptions, not resp status, to verify the result
13:08:59 <Qiming_> recorded
13:09:33 <elynn> About the negative tests for API
13:09:36 <yanyanhu> I mean the clusteringclient of tempest test
13:09:42 <Qiming_> need to test the status code at least, imo
13:09:46 <elynn> do we only check the response code?
13:09:52 <yanyanhu> Qiming_, yes
13:10:21 <yanyanhu> agree with this. So we may need to invoke raw_request directly
13:10:33 <Qiming_> yes
13:10:59 <elynn> I think checking only the status code and response body is enough for API negative tests.
13:11:06 <yanyanhu> elynn, yes
13:11:06 <Qiming_> if we are bringing in senlinclient into this, it looks then more like a functional test of senlinclient, instead of an API test of the server
13:11:30 <Qiming_> so.. benchmarking
13:11:31 <elynn> The existing client cannot return a bad status code?
13:11:42 <yanyanhu> basic support has been done
13:11:50 <yanyanhu> in rally side
13:12:03 <Qiming_> lixinhui_, any update?
13:12:18 <yanyanhu> will work on some simplest test case based on it
13:12:24 <yanyanhu> but maybe not now
13:12:49 <lixinhui_> Qiming
13:13:07 <lixinhui_> is it about benchmarking?
13:13:21 <yanyanhu> elynn, nope, the failure will be caught by the rest client of the tempest lib and converted to an exception
13:13:22 <Qiming_> I'm wondering if bran and xinhui has done some experiments on engine/api stress test
13:13:31 <lixinhui_> we are
13:13:39 <elynn> yanyanhu, okay, I got your point...
13:13:44 <yanyanhu> :)
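(The negative-test approach discussed above — calling the raw request path and asserting directly on the HTTP status code, instead of letting the tempest client wrapper convert 4xx responses into exceptions — can be sketched roughly as follows. The FakeRestClient and URLs here are illustrative stand-ins, not tempest's actual API.)

```python
# Sketch of the negative-test idea: assert on the raw HTTP status code
# rather than relying on a client that raises exceptions on 4xx.
# FakeRestClient stands in for a tempest-style rest client; all names
# and paths are illustrative.
class FakeRestClient:
    """Stand-in for a tempest-style REST client exposing raw requests."""

    def raw_request(self, url, method):
        # A request for a non-existent cluster returns 404 in this stub.
        if 'no-such-cluster' in url:
            return 404, {'error': 'cluster not found'}
        return 200, {'cluster': {}}


def check_cluster_not_found(client):
    """Negative test: fetching a missing cluster must yield 404."""
    status, body = client.raw_request('/v1/clusters/no-such-cluster', 'GET')
    assert status == 404, 'negative test expects a 404 status'
    return status
```

(The point of going through `raw_request` is that the status code and body stay visible to the test, which matches the "check the status code at least" suggestion above.)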
13:13:46 <lixinhui_> but
13:14:03 <lixinhui_> we are bottlenecked by nova
13:14:13 <Qiming_> still need to overcome the scalability issue of oslo.messaging?
13:14:28 <lixinhui_> not really about oslo
13:14:32 <lixinhui_> but nova
13:14:36 <Qiming_> nova api rate-limit?
13:14:41 <yanyanhu> oh, about this topic, I think there should be some performance improvement benefiting from the latest scheduler rework
13:15:01 <lixinhui_> something like that
13:15:03 <yanyanhu> I mean the performance of senlin engine
13:15:14 <lixinhui_> we may try to resolve it at driver layer
13:15:27 <lixinhui_> from product env
13:15:38 <lixinhui_> we have rally and heat based
13:15:44 <Qiming_> okay, we need some rough numbers using both the fake driver and the real one
13:15:45 <lixinhui_> stress tests
13:15:58 <yanyanhu> Qiming_, agree
13:16:29 <lixinhui_> but that will depend on whether we need to bring senlin into this test env
13:17:04 <Qiming_> yes, it would be nice to know whether senlin has scalability issues or not
13:17:13 <Qiming_> the earlier the better
13:17:24 <lixinhui_> Bran has tried with a simulated one
13:17:54 <Qiming_> maybe we can paste the numbers on senlin wiki?
13:17:58 <lixinhui_> and found no upper limit in the single-engine test
13:18:19 <lixinhui_> but parallel tests will need more time
13:18:26 <yanyanhu> maybe I should implement a basic rally plugin for senlin cluster and node operations to support this test
13:18:32 <Qiming_> okay
13:18:40 <yanyanhu> lixinhui_, if you guys need it, please just tell me
13:18:50 <lixinhui_> not really now
13:18:53 <Qiming_> I see, so there is a dependency
13:18:54 <lixinhui_> thanks yanyanhu
13:19:00 <yanyanhu> no problem
13:19:20 <lixinhui_> we will keep working on the multiple-engine simulated driver test
13:19:25 <Qiming_> or these two threads can go in parallel
13:19:37 <Qiming_> cool
13:20:09 <Qiming_> please check if we can record these "baseline" numbers into senlin wiki: https://wiki.openstack.org/wiki/Senlin
13:20:16 <lixinhui_> sure
13:20:23 <Qiming_> Rally side
13:20:49 <yanyanhu> basic support for senlin in rally has been done. Will start to work on plugin
13:21:00 <Qiming_> are we still planning to commit rally test cases to the rally project?
13:21:06 <yanyanhu> will start from basic cluster operations
13:21:39 <Qiming_> by plug-in, you mean we will be hosting the rally test cases?
13:21:56 <yanyanhu> Qiming_, we can if we want to I think
13:22:10 <yanyanhu> to hold the test jobs
13:22:11 <Qiming_> what's the suggestion from rally team?
13:22:23 <yanyanhu> they suggested we contribute the plugin to the rally repo
13:22:28 <yanyanhu> which I think makes sense
13:22:47 <yanyanhu> for those jobs, we can hold it in senlin repo I think
13:23:28 <Qiming_> ... jobs are not modelled as plugins?
13:23:48 <yanyanhu> no, jobs means those job description file :)
13:23:49 <Qiming_> what is this then? https://review.openstack.org/#/c/301522/
13:23:54 <yanyanhu> those yaml or json file
13:24:23 <yanyanhu> Qiming_, those jobs are used as example to verify the plugin :)
13:24:34 <Qiming_> okay, makes sense
13:24:35 <yanyanhu> more jobs should be defined per our test requirement
13:24:44 <yanyanhu> which I guess should be held by ourselves
13:24:57 <Qiming_> that is fine
13:25:59 <Qiming_> pls help make that plugin work so others may help contribute job definitions etc.
13:26:10 <yanyanhu> sure
13:26:16 <yanyanhu> will work on it
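(As a rough illustration of the job description files mentioned above — the yaml/json task files that would live in the senlin repo — a rally task for a basic cluster scenario might look like the following. The scenario name, context, and arguments are assumptions about a plugin that was still under review at the time, not a finalized interface.)

```yaml
# Hypothetical rally task file; scenario and context names are illustrative.
SenlinClusters.create_and_delete_cluster:
  - args:
      desired_capacity: 3
      min_size: 0
      max_size: 5
    runner:
      type: constant
      times: 3
      concurrency: 2
    context:
      # A context plugin would pre-create a profile for the cluster nodes.
      profiles:
        type: os.nova.server
```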
13:26:26 <Qiming_> health management
13:26:42 <Qiming_> em, a huge topic indeed
13:26:44 <lixinhui_> is trying the linux HA
13:26:56 <Qiming_> for health detection?
13:27:06 <Qiming_> or recovery, or both?
13:27:09 <lixinhui_> want Qiming_ to share more of the picture in your mind
13:27:27 <Qiming_> you mean photo from San Antonio?
13:27:29 <lixinhui_> based on discussion with adam and DD
13:27:36 <lixinhui_> fencing nowadays with CentOS VM
13:28:05 <Qiming_> got it
13:28:05 <lixinhui_> but you know
13:28:20 <lixinhui_> just want to know more of the picture
13:28:31 <Qiming_> need to spend some time on the specs and the etherpad
13:28:32 <lixinhui_> about the HA story
13:28:49 <lixinhui_> yes
13:28:58 <Qiming_> we cannot cover all HA requirements in our very first step
13:29:12 <Qiming_> we may not be able to cover them all in future
13:29:14 <lixinhui_> from presentation of Adam and DD
13:29:22 <Qiming_> need to focus on some typical usage scenarios
13:29:38 <lixinhui_> They hope to leverage Senlin for recovery and fencing
13:29:59 <Qiming_> right
13:30:15 <lixinhui_> but is that a design assumed by ourselves?
13:30:27 <Qiming_> so ... let's focus on the user story then
13:30:29 <cschulz_> What is your thought, that HM will create events that may trigger cluster actions based on cluster policies?
13:30:55 <lixinhui_> yes
13:31:02 <Qiming_> we will build the story step by step
13:31:07 <lixinhui_> that is the recover part
13:31:23 <Qiming_> first step is check/recover mechanism, the very basic ones
13:31:39 <Qiming_> and fencing may become part of the recover process
13:32:01 <cschulz_> So there probably also needs to be policy-like things in HM that define how the health of a cluster is assessed?
13:32:12 <Qiming_> second step is to try to introduce some intelligence on failure detection
13:32:38 <yanyanhu> health check and failure recovery can be two workitems in parallel I guess?
13:32:48 <Qiming_> third step is to link the pieces together using some sample health policies
13:33:01 <lixinhui_> actually I do not think we should do many check things
13:33:03 <Qiming_> yes, guess so
13:33:21 <cschulz_> agreed, health checking is independent of what actions you take when you've made an assessment
13:33:42 <Qiming_> if users don't like the health policy, we still provide some basic APIs for them to do cluster-check, cluster-recover [--with-fence], etc.
13:34:12 <cschulz_> Actually that is where I'd start
13:34:30 <Qiming_> users may not like the way we do health checking; still, they can do cluster-recover by triggering that operation from their software/service
13:34:31 <cschulz_> Then add some basic mechanisms for those who just want simple
13:34:40 <Qiming_> right
13:35:14 <Qiming_> I cannot assume we understand all usage scenarios
13:36:06 <Qiming_> :) I was challenged by the linux-ha author during my presentation --- how do you detect application failure?
13:36:27 <cschulz_> And your answer was?
13:36:38 <Qiming_> it is a huge space, we cannot assume we know all the answers
13:36:52 <Qiming_> application failure detection is currently out of senlin's scope
13:37:03 <cschulz_> Agreed!
13:37:03 <lixinhui_> yes
13:37:05 <haiwei> I think so
13:37:08 <lixinhui_> that is his answer
13:37:15 <lixinhui_> from this
13:37:23 <Qiming_> there are plenty of software doing application monitoring, use them
13:37:27 <lixinhui_> I do not think we can understand the use case today
13:37:42 <Qiming_> but we can start from the basics
13:37:57 <lixinhui_> or the design on the loop of check and recover
13:38:15 <yanyanhu> so the key is how to leverage those monitoring tools/services
13:38:32 <lixinhui_> but trying to provide some basic investment
13:38:35 <yanyanhu> to detect failure of node/app happened in senlin cluster
13:38:38 <lixinhui_> on the choice of failure processing
13:38:40 <Qiming_> we leave choices to users, though we do provide some basic support to simple cases
13:38:45 <lixinhui_> processing
13:39:00 <lixinhui_> even today, masakari's evacuate
13:39:15 <Qiming_> recover a heat stack is completely different from recovering a nova server
13:39:22 <lixinhui_> cannot work well with all guest OSes and hypervisors
13:39:33 <Qiming_> you are already onto masakari?
13:39:53 <lixinhui_> tried that function of masakari
13:40:07 <lixinhui_> need to investigate more
13:40:17 <Qiming_> ... big thanks!
13:40:27 <lixinhui_> :)
13:40:31 <cschulz_> masakari is new to me.  Will investigate
13:41:04 <lixinhui_> it has a vagrant and chef deployer, cschulz
13:41:05 <Qiming_> for HA support, let's focus on planning
13:41:20 <lixinhui_> yes
13:41:24 <yanyanhu> https://github.com/ntt-sic/masakari
13:41:30 <yanyanhu> this one?
13:41:31 <Qiming_> build stories on the etherpad: https://etherpad.openstack.org/p/senlin-ha-recover
13:41:45 <Qiming_> yanyanhu, yes
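(The check/recover steps outlined above — a basic detection mechanism first, fencing as part of the recovery process, then sample health policies linking the pieces — could eventually be expressed as a policy spec. The sketch below is purely hypothetical: the type name, detection mode, and recovery actions were all still to be designed at this point.)

```yaml
# Hypothetical health policy sketch; all names are illustrative.
type: senlin.policy.health
version: 1.0
properties:
  detection:
    # Step 1: a basic check mechanism, e.g. polling node status.
    type: NODE_STATUS_POLLING
    options:
      interval: 60
  recovery:
    # Fencing may become part of the recovery process.
    fencing:
      - COMPUTE
    actions:
      - RECREATE
```

(Users who don't like such a policy could still call the basic cluster-check / cluster-recover operations directly, as noted above.)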
13:41:50 <Qiming_> moving on
13:42:17 <Qiming_> documentation side
13:42:31 <Qiming_> I'm working on API documentation in RST
13:43:00 <Qiming_> hopefully, it can be done soon, then I can switch to tutorial/wiki docs
13:43:06 <yanyanhu> will provide some help on it
13:43:14 <Qiming_> great, yanyanhu
13:43:24 <Qiming_> container support
13:43:45 <Qiming_> haiwei, maybe we can check in the container profile as an experimental one
13:44:22 <haiwei> you mean just create one first?
13:44:33 <Qiming_> yes, very simple one is okay
13:44:46 <Qiming_> it has to work, it has to be clean
13:44:59 <Qiming_> we can improve it gradually
13:45:32 <haiwei> ok, I will submit some patches for it
13:45:56 <Qiming_> then we can start looking into the specific issues when CLUSTERING containers together
13:46:24 <Qiming_> at the same time, we will watch the progress of the Higgins project: https://review.openstack.org/#/c/313935/
13:46:47 <haiwei> yes, I noticed it recently
13:47:00 <Qiming_> if that one grows fast, we can spend less and less energy at this layer
13:47:14 <Qiming_> just focusing on the clustering aspect of the problem
13:47:25 <yanyanhu> agree :)
13:47:41 <Qiming_> that's why I think a simple profile suffices
13:47:50 <haiwei> ok
13:47:57 <Qiming_> for us to think about the next layer
13:48:30 <Qiming_> "tickless" scheduler is out
13:48:33 <Qiming_> that is great!!!
13:49:09 <yanyanhu> :)
13:49:22 <yanyanhu> it does improve the efficiency of our scheduler
13:49:27 <Qiming_> any news from zaqar investigation?
13:49:38 <yanyanhu> really appreciated your suggestion at the summit :P
13:49:54 <lixinhui_> about event and notice mechanism
13:50:14 <cschulz_> I've been very distracted since week of Austin summit, so not much progress.
13:50:19 <lixinhui_> I do not know if that is related to the scenario discussion on summit
13:50:22 <lixinhui_> but vmware PM
13:50:37 <lixinhui_> on customisable reaction
13:50:45 <Qiming_> okay
13:51:12 <Qiming_> lixinhui_, I was thinking of this scenario
13:51:12 <lixinhui_> or just related to the processing of action
13:51:14 <cschulz_> Can someone give me a brief on the scenario discussion?
13:51:20 <Qiming_> for vmware vm monitoring
13:51:53 <Qiming_> senlin can emit events for vmware to listen to
13:52:06 <Qiming_> so that it will know which node belongs to which cluster
13:52:09 <lixinhui_> that will be great
13:52:29 <Qiming_> it will have some knowledge to filter out irrelevant vms when doing maths on metrics
13:52:43 <lixinhui_> yes
13:53:01 <lixinhui_> that is desired by mixed deployment envs
13:53:16 <Qiming_> okay, we can work on a design first
13:53:43 <Qiming_> a multi-string configuration option for event backend
13:53:52 <Qiming_> we only have database backend implemented
13:54:19 <Qiming_> we can add http, message queue as backends
13:55:01 <Qiming_> detailed design is still needed
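(The multi-backend event dispatch idea above — a multi-string config option selecting database, http, or message-queue backends, with each event fanned out to all of them — could be sketched in plain Python roughly as follows. This is not Senlin's actual code; all class and option names are assumptions for illustration.)

```python
# Minimal sketch of pluggable event dispatch: every configured backend
# receives each event. Names are illustrative, not Senlin's real code.
class DatabaseDispatcher:
    """Stands in for the existing database event backend."""
    def __init__(self):
        self.records = []

    def dispatch(self, event):
        self.records.append(event)


class MessageQueueDispatcher:
    """Stands in for a future message-queue backend (e.g. zaqar)."""
    def __init__(self):
        self.queue = []

    def dispatch(self, event):
        self.queue.append(event)


# A multi-string config option would select which backends are active.
BACKENDS = {
    'database': DatabaseDispatcher,
    'message_queue': MessageQueueDispatcher,
}


def build_dispatchers(names):
    """Instantiate one dispatcher per configured backend name."""
    return [BACKENDS[name]() for name in names]


def emit(dispatchers, event):
    """Fan one event (e.g. node X joined cluster Y) out to all backends."""
    for d in dispatchers:
        d.dispatch(event)
```

(An external consumer such as the vmware monitoring scenario above would then subscribe to one of the non-database backends to learn which node belongs to which cluster.)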
13:55:12 <Qiming_> em... only 5 mins left
13:55:17 <lixinhui_> Okay
13:55:24 <cschulz_> Are events predefined?  Or can a stack/cluster define events it wants?
13:55:33 <Qiming_> yes, cschulz
13:55:39 <yanyanhu> Qiming_, maybe we postpone the second topic to next meeting
13:55:54 <yanyanhu> about adding new workitems based on discussion in summit
13:55:58 <Qiming_> ok
13:56:21 <Qiming_> there are followups wrt the design summit sessions
13:56:34 <Qiming_> need to dump them into TODO items
13:56:49 <Qiming_> and those items will be migrated to this etherpad for progress checking
13:57:01 <yanyanhu> yes
13:57:03 <Qiming_> for example, profile/policy validation
13:57:18 <Qiming_> that means one or two apis to be added
13:57:34 <Qiming_> when someone has cycles to work on it, we can add it to the etherpad
13:57:51 <Qiming_> the same applies to all other topics we have discussed during the summit
13:58:33 <lixinhui_> cool
13:58:40 <Qiming_> that's all from my side
13:58:51 <Qiming_> two mins left for free discussions
13:58:59 <Qiming_> #topic open topics
13:58:59 <lixinhui_> that was good discussion there in Austin
13:59:15 <yanyanhu> yep :)
13:59:33 <cschulz_> Anyone can send me anything they would like proofread for English.
13:59:41 <Qiming_> okay, we successfully used up the 1 hour slot, :)
13:59:50 <cschulz_> bye
13:59:53 <Qiming_> thanks, cschulz_
14:00:08 <Qiming_> thanks everyone for joining
14:00:12 <Qiming_> #endmeeting
14:00:15 <haiwei> thanks
14:00:18 <haiwei> bye
14:00:35 <yanyanhu> bye
14:00:39 <Qiming_> cannot end meeting
14:00:49 <lixinhui_> ..
14:01:16 <Qiming_> nickname occupied I think
14:02:04 <Qiming_> #endmeeting