13:00:36 #startmeeting senlin
13:00:37 Meeting started Tue Aug 2 13:00:36 2016 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:38 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:41 The meeting name has been set to 'senlin'
13:00:59 hello
13:01:06 hi
13:01:13 hi
13:01:31 evening, xinhui, haiwei
13:01:35 hi
13:01:39 sorry I'm late
13:01:41 hi, yanyan
13:01:51 np
13:02:10 pls review the agenda and see if you have items to add
13:02:12 #link https://wiki.openstack.org/wiki/Meetings/SenlinAgenda#Weekly_Senlin_.28Clustering.29_meeting
13:02:40 first item, newton work items
13:02:48 #topic newton work items
13:02:56 #link https://etherpad.openstack.org/p/senlin-newton-workitems
13:03:37 updates ?
13:03:58 rally plugin
13:04:23 the patch for the rally side is still in progress, will check and fix the latest issues tomorrow morning
13:04:32 hope we can finish it soon
13:04:43 okay, just got some comments from Roman ...
13:04:48 for the senlin repo, the plugin for cluster scaling has been proposed
13:04:50 Qiming, yes
13:05:00 needs a quick fix and also some explanation
13:05:08 okay
13:05:29 https://review.openstack.org/346656
13:05:32 interesting ... we are still exposing 'parent' to the client?
13:05:32 the one for cluster scaling
13:05:56 I guess there is still some out-of-date msg in the doc?
13:06:07 okay, it has been hanging there for some days
13:06:07 will check it and reply to roman
13:06:15 sounds great
13:06:24 yes, hope we can complete it soon
13:06:38 maybe we can ask for some help from cmcc
13:06:51 don't know if eldon can offer a hand
13:07:01 Qiming, yes, I think we can ask them for some use case reference
13:07:20 for coding, it's ok for me since
13:07:28 okay
13:07:34 there is no critical issue left I feel
13:07:45 anyway, will keep working on it
13:07:53 anything else in this space?
13:08:02 nope I think
13:08:06 moving on
13:08:18 About fencing
13:08:22 health management, no progress from my side last week
13:08:50 I added some points on Qiming's HA etherpad
13:09:13 The first step is to target fencing the nova compute service
13:09:35 The second step is fencing of the VM
13:09:49 for compute service fencing
13:10:01 which should happen when a host failure occurs
13:10:11 that is actually about fencing a nova compute node, correct?
13:10:19 yep
13:10:23 Yes, Qiming
13:10:31 I don't have a multi-node setup at hand
13:10:48 cannot produce a compute node failure to observe the host failure events
13:10:53 So I wonder if it is proper to add this into the health monitor
13:11:06 Qiming
13:11:11 have you got any hints on that? either by digging into the source or docs or thru experimentation?
13:11:37 compute node failure can only be known by polling service status
13:11:45 observing host failure could only be done thru events
13:12:07 Actually
13:12:13 I'm a little reluctant to poll nova compute services
13:12:28 Nova today uses heartbeats to know if a host is alive or not
13:12:37 that is their internals
13:12:55 we are not supposed to peek into that
13:13:01 There are only two types of events that nova notifies about
13:13:08 one is node.update
13:13:16 the other is service.update
13:13:17 IIRC, nova has event reports when a host fails
13:14:01 okay, then we can listen to those events
13:14:07 service.update is only sent when a change is made to the nova services via the nova service* commands
13:14:28 don't understand
13:15:08 you can read the code of nova/objects/service.py
13:15:36 can you pls just explain your last sentence?
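Since the point above is that a compute node failure can only be detected by polling service status, here is a minimal sketch of what such a poll could look like with openstacksdk; the cloud name, poll interval, and print-based handling are placeholders for illustration, not Senlin's actual health manager.

    # Minimal polling sketch, assuming an openstacksdk connection; this is NOT
    # Senlin's health manager, just an illustration of reading service state.
    import time

    import openstack

    conn = openstack.connect(cloud='mycloud')  # 'mycloud' is a placeholder clouds.yaml entry

    def down_compute_hosts():
        """Return hosts whose nova-compute service is reported as down."""
        return [svc.host for svc in conn.compute.services()
                if svc.binary == 'nova-compute' and svc.state == 'down']

    while True:
        for host in down_compute_hosts():
            print('compute host %s looks down, a fencing action may be needed' % host)
        time.sleep(60)  # nova only marks a service down after missed heartbeat cycles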
13:15:38 my experiments prove this
13:16:28 that means the up/down state of nova services changes based on heartbeats, without any notification
13:16:50 but service.update will be sent when I enable or disable some service
13:16:57 okay, that is nova's internal state maintenance, we cannot check it from outside
13:17:15 so
13:17:24 if nova-compute is down, no event notification is sent?
13:17:31 no
13:17:36 after two cycles
13:17:44 the service becomes down
13:17:47 that is all
13:18:20 after detecting that, we can fence the compute
13:18:21 good/bad to know ...
13:18:49 sounds like the only way for failure detection is polling?
13:18:59 need to double check that
13:19:04 that was my understanding
13:19:06 or reading the status of the nova service
13:19:28 but last time in a mailing list discussion, I raised this question
13:19:30 it would be good if you can double check
13:19:51 someone told me that nova is already capable of sending out notifications when a compute service is down
13:20:04 I wish I had ten heads
13:20:18 need to dig up that email
13:20:25 or the source code
13:21:00 #action Qiming to double check nova's capability of notifying host down
13:21:10 they indeed added some notifications
13:21:10 moving on
13:21:13 nova/nova/objects/service.py
13:21:29 documentation side
13:21:42 added some user reference docs last week
13:21:57 mainly a reorg around auto-scaling, receivers ... etc
13:22:44 I was thinking of adding a tutorial about auto-scaling, but later I realized that is a huge topic, not suitable for a tutorial, which is supposed to be pretty short
13:23:22 I have also moved the heat-based autoscaling under a scenarios subdirectory
13:23:35 where in the future we can add more scenarios for reference
13:24:03 will check if the tutorial doc can be left there ...
13:24:15 next ...
13:24:34 yanyan just started adding version control to profile and policy specs
13:24:47 this is necessary, pls help review
13:24:58 yes, just proposed the first patch https://review.openstack.org/348709
13:25:06 thanks
13:25:06 to add version support to schema and spec
13:25:28 in parallel, I'm looking into oslo.versionedobjects for a more holistic solution
13:25:32 will update later
13:25:43 moving on ...
13:25:43 my pleasure. Really need some discussion about this topic
13:25:51 container profile support
13:26:24 haiwei just pushed a commit: https://review.openstack.org/#/c/349906
13:27:11 yes, Qiming
13:27:22 I haven't had time to review
13:27:28 I have only tested it partially
13:27:28 just a quick glance
13:27:59 team, please take a look at it and help polish it when you get cycles
13:28:19 sure, will check it
13:28:28 thx
13:28:30 moving on
13:28:32 I think the key question for that patch is where we should store the 'host_node' uuid. In that patch I stored it in the metadata of the profile
13:29:00 maybe node.data ?
13:29:33 if you check other policy decisions such as zone placement, region placement ...
13:29:54 ok, I will think about it
13:29:59 we are injecting data into the 'data' field of the node (the abstract one)
13:30:23 then when we are about to create the physical resource, we extract those policy decisions
13:30:36 profile metadata was designed for users to use
13:30:49 in the service layer we have the host_node, but it is not the server's id, so we need to pass the server's id to the profile
13:31:08 e.g. {'author': 'haiwei', 'last-updated': '2016-08-02', ...} etc
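To make the profile-metadata vs node.data question above more concrete, a purely hypothetical sketch is shown below: a placement-style policy records its host selection in node.data, and the container profile reads it back when creating the physical resource. The 'placement'/'host_node' keys and the helper functions are invented for this example and are not Senlin's actual interfaces.

    # Hypothetical sketch of the node.data flow discussed above; key names and
    # helpers are invented for illustration, not taken from the Senlin code base.

    def pre_op(cluster, node):
        # Policy side: select a host node in the hosting cluster and record
        # the decision on the (abstract) node object instead of in profile metadata.
        host = pick_host_node(cluster)                 # hypothetical selection helper
        node.data['placement'] = {'host_node': host.physical_id}
        node.store()

    def do_create(node):
        # Profile side: when creating the physical container, extract the
        # policy decision from node.data.
        server_id = node.data.get('placement', {}).get('host_node')
        return create_container_on(server_id)         # hypothetical driver call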
13:31:35 we can pass that information in the node.data field
13:31:50 I will check it later
13:31:52 the node.data field was designed to carry that data around
13:31:55 great
13:32:33 pls also think about whether we can move the policy decision out into a policy type
13:33:04 1. that will make the engine code cleaner; 2. we could later improve/replace that policy type implementation easily
13:33:19 what policy decision?
13:33:33 currently I am thinking about node_create
13:33:37 by "policy decision" I mean the selection of a node in a hosting cluster
13:34:26 just something to keep in mind, I'm not sure how feasible it is without digging into the code
13:34:39 ok
13:34:45 great, moving on
13:34:52 zaqar-based receiver
13:35:13 yanyan has been busy working on that ...
13:35:31 yes, I confirmed with the zaqar team again about the usage of "project_id" and "client_id" today
13:35:53 just as you said, we should expose them to the invoker of the sdk proxy call
13:36:20 have posted the latest results in the following patches: * https://review.openstack.org/349369
13:36:21 * https://review.openstack.org/338041
13:36:37 okay, the thing to bear in mind is ...
13:37:03 if you put 'client_id = Header('Client-ID')' in that Message class
13:37:21 the header still won't appear in the final request ...
13:37:41 that is something I missed when reviewing your last patch
13:38:23 overriding resource calls will make it take effect I think?
13:38:23 so ... your way of overriding those methods is still valid, though there is room for improvement
13:38:29 yes
13:38:30 like the latest patch does
13:38:36 yes
13:38:49 really not a graceful way
13:38:51 it is ugly, but ... you know, people need time to understand the issue we are facing
13:38:58 yea
13:39:17 we should allow custom headers in all those create, get, list calls
13:39:18 hope brian can figure it out when building resource2
13:39:24 using a better way
13:39:29 yes
13:39:30 resource2 is already there ...
13:39:36 if so, that will be much better
13:39:49 Qiming, it still needs some improvement I think
13:39:50 but he doesn't seem to buy into the idea of adding more parameters
13:40:00 for those "corner" use cases
13:40:12 thanks for keeping the ball rolling
13:40:24 will review your new patchset tomorrow
13:40:32 thanks a lot
13:40:38 moving on
13:40:47 events/notifications
13:40:55 no update from me in this space
13:41:17 actually I was trapped by a more general issue .... versioning of things
13:41:43 okay, next topic
13:41:49 #topic newton deliverables
13:42:29 though I've been digging into the issue of versioning of things, I don't think we can get it done by this cycle
13:43:04 on the other hand, the new features about cluster-collect and cluster-do will have to be based on micro-version support
13:43:22 which is also blocked here: https://review.openstack.org/#/c/343992/
13:43:56 still need time to convince brian that the current patch is already okay
13:44:07 this part is really complicated...
13:44:37 the overall design and impl are good, there are just some trivial coding style things to communicate about
13:44:49 health policy implementation ...
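For the micro-version dependency mentioned a few lines above, the sketch below shows roughly how a caller might opt in to a newer clustering API micro-version once cluster-collect lands behind one; the endpoint, token, version number, and request path are placeholders following the common OpenStack header convention, not a confirmed Senlin API.

    # Rough illustration of requesting a clustering API micro-version;
    # the endpoint, token, version and path below are placeholders.
    import requests

    SENLIN_ENDPOINT = 'http://senlin.example.com:8777/v1'    # placeholder
    HEADERS = {
        'X-Auth-Token': '<token>',                           # placeholder
        'OpenStack-API-Version': 'clustering 1.2',           # opt in to a newer micro-version
    }

    # Hypothetical cluster-collect style call: gather one attribute across nodes.
    resp = requests.get(
        SENLIN_ENDPOINT + '/clusters/my-cluster/attrs/details.addresses',
        headers=HEADERS)
    print(resp.status_code, resp.json())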
13:45:08 I do hope we can deliver a basic, working version by this cycle
13:45:17 sure
13:45:18 as for the container cluster
13:45:33 really need to achieve that goal I feel
13:45:35 it would be GREAT if we can have a basic, working version
13:45:51 yes, people are asking questions about that
13:46:01 yes
13:46:18 let's keep working hard on this
13:46:24 s/this/these
13:46:39 the next topic I added is about versioned objects
13:46:56 when adding new properties to a policy (e.g. the recent lb policy revision)
13:47:03 we need to bump the policy version
13:47:19 so ... we have a lot of things to be versioned
13:47:35 1. API micro-version
13:47:43 2. API request body version
13:47:51 3. API response version
13:47:57 4. RPC version
13:48:07 5. DB object version
13:48:14 6. Event/Notification version
13:48:21 7. Policy type version
13:48:26 8. Profile type version
13:48:39 9. Receiver version
13:49:05 without proper versioning infra at hand, we will quickly lose control of things
13:49:16 Qiming, yes, almost every element that could vary over time
13:49:17 and things will break in a thousand ways
13:49:44 so... I'm investigating oslo.versionedobjects, every single line of code there
13:49:54 and also the jsonschema doc/implementation
13:50:21 I think I have got a rough idea on how to unify all object versioning into the same framework
13:50:42 but that warrants a lot of experimentation and code churn
13:51:14 will leave that as long-term work, maybe by the end of Ocata we will have this framework completely landed
13:51:41 ideally, after that, when you want to add a new property to an existing resource
13:52:09 you won't need to modify a few hundred lines of code while still worrying about breaking existing users
13:52:22 great, we can add version support for different elements gradually I think
13:52:42 some preliminary code has proved the feasibility of this
13:53:05 we can even have the api-ref documentation generated out of these objects
13:53:10 start from the most basic part and keep it in mind when making changes to the "unversioned" stuff
13:53:24 yep
13:53:56 so that is my update in this thread
13:54:15 I didn't leave any time for questions/comments
13:54:25 #topic open discussion
13:54:31 no problem, will check the code :)
13:54:44 voting is open now :)
13:54:50 good luck for senlin's topic
13:54:59 topic(s)
13:55:07 yup
13:55:09 blessing
13:55:33 this is a very strange document to read ... http://json-schema.org/latest/json-schema-validation.html
13:55:59 yes...
13:56:16 and that is their most comprehensive one I guess .... :)
13:56:23 Hi
13:56:23 really wish we were native eng speakers :)
13:56:38 hi, namnh
13:57:46 anything else?
13:57:55 nope from me
13:58:42 seems lixinhui has dropped
13:58:50 yea
13:58:57 we will be over time soon
13:58:59 but anyway I will look into the nova code
13:59:15 thanks, guys
13:59:21 let's meet next week
13:59:23 thanks, have a good night
13:59:28 #endmeeting
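As a rough illustration of the oslo.versionedobjects direction discussed in the versioning topic above, a versioned spec object could look like the sketch below; the class name, fields, and version numbers are made up for this example and do not represent the framework Qiming described, only the library's basic mechanics.

    # Illustrative oslo.versionedobjects sketch; names, fields and versions are
    # invented, only the mechanics (VERSION bump + obj_make_compatible) are real.
    from oslo_utils import versionutils
    from oslo_versionedobjects import base
    from oslo_versionedobjects import fields


    @base.VersionedObjectRegistry.register
    class HealthPolicySpec(base.VersionedObject):
        # 1.0: detection + recovery; 1.1: added 'interval'
        VERSION = '1.1'

        fields = {
            'detection': fields.StringField(),
            'recovery': fields.StringField(),
            'interval': fields.IntegerField(default=60),
        }

        def obj_make_compatible(self, primitive, target_version):
            # Downgrade the serialized form when talking to an older consumer,
            # so adding a property does not break existing users.
            super(HealthPolicySpec, self).obj_make_compatible(
                primitive, target_version)
            target = versionutils.convert_version_to_tuple(target_version)
            if target < (1, 1):
                primitive.pop('interval', None)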