13:00:06 #startmeeting senlin
13:00:07 Meeting started Tue Dec 5 13:00:06 2017 UTC and is due to finish in 60 minutes. The chair is ruijie_. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:10 The meeting name has been set to 'senlin'
13:00:27 hi ruijie_
13:00:38 hi all, this is the agenda, please feel free to add topics: https://wiki.openstack.org/wiki/Meetings/SenlinAgenda#Agenda_.282017-12-05_1300_UTC.29
13:00:42 hi XueFeng
13:02:39 let's wait for a while
13:02:48 hi ruijie_
13:03:00 hi
13:03:01 hi elynn :)
13:03:03 hi
13:03:05 Qiming
13:03:15 hi, all
13:03:19 woo
13:03:44 let's get started :)
13:03:57 https://review.openstack.org/#/c/523965/
13:04:23 the first one is the lifecycle hook for the scale-in action
13:05:19 this means we will set a temporary status for the scale-in action
13:06:21 and a greenthread will wait for the change of the status
13:06:40 so that the server or user has time to terminate their services on the server before we destroy it
13:09:09 spawn_after means creating a new greenthread after a given time interval?
13:09:46 emm, I think he wants to create a thread in the policy to wait for the TIMEOUT to happen
13:10:09 he meant to invoke spawn_after
13:10:33 this thread will go ahead and destroy the server if we receive a message/request directly, or we wait for the TIMEOUT and then destroy it
13:10:39 I'm not sure a thread is actually created, but ... anyway
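[Editor's note] The "wait for confirmation or timeout, then destroy" behavior discussed above can be sketched with Python's stdlib `threading` as an analog for eventlet's `spawn_after`. This is a minimal illustration, not Senlin code: the class name `LifecycleHook` and the callback wiring are hypothetical.

```python
import threading

class LifecycleHook:
    """Hypothetical sketch: postpone a node destroy until either a
    lifecycle-completion confirmation arrives or a grace period expires."""

    def __init__(self, timeout, destroy_cb):
        self._confirmed = threading.Event()
        self._timeout = timeout
        self._destroy_cb = destroy_cb
        self.destroyed_by = None  # 'confirmation' or 'timeout'

    def start(self):
        # Analogous to eventlet's spawn_after: a background worker that
        # fires the destroy after the grace period (or earlier, on confirm).
        self._worker = threading.Thread(target=self._wait_then_destroy)
        self._worker.start()

    def _wait_then_destroy(self):
        confirmed = self._confirmed.wait(self._timeout)
        self.destroyed_by = 'confirmation' if confirmed else 'timeout'
        self._destroy_cb()

    def confirm(self):
        # Called when the completion message/request is received.
        self._confirmed.set()

    def join(self):
        self._worker.join()
```

Either path ends in the node being destroyed; the confirmation only shortens the wait, which matches the two DESTROY triggers mentioned later in the meeting.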
13:10:54 it is a confirmation
13:11:02 yes Qiming
13:11:09 WAITING_LIFECYCLE_COMPLETION
13:11:23 it makes sense
13:11:31 but there are some corner cases to check
13:11:53 suppose we have a cluster-scale-in request arriving, with timeout set to 180 seconds
13:12:17 then we cannot wait for the new confirmation for more than 30 seconds
13:12:55 also, in the current implementation, policies are checked DURING action execution, not BEFORE action execution
13:13:33 when the CLUSTER_SCALE_IN action is executed, senlin will check the policies attached to the cluster
13:13:44 that action will lock the cluster for operation
13:14:21 for the cluster status to be consistent, we cannot release that lock
13:14:44 we have to wait for the WAITING_LIFECYCLE_COMPLETION signal to arrive, while holding the lock
13:15:11 yes Qiming, the cluster should be locked during this period
13:15:50 That might block the whole cluster operation?
13:15:54 maybe pointing at node actions makes sense: cluster scale-in --> node-delete (pre-check the policy, create a threadGroup/thread to process, and release the current thread)
13:16:01 yes, elynn
13:17:09 for this particular use case, I'm wondering if we can extend the deletion policy
13:17:30 once we receive a message/request, we only update the node status
13:17:49 say we add a property to the deletion policy:
13:17:50 the action's status, actually
13:17:54 notify:
13:18:00 type: zaqar
13:18:18 sink: wherever zaqar is
13:18:44 need_confirm: true
13:18:51 timeout: 120
13:19:24 we continue the cluster-scale-in or cluster-resize or node-delete operation
13:20:05 just the node will become an orphan node
13:20:38 or reject the deletion?
13:20:50 actually node DESTROY happens either 1) on timeout waiting for confirmation, or 2) when a confirm message is received via the proposed new action
13:21:00 So this node was actually removed from the cluster instead of deleted?
13:21:22 yes ...
13:21:41 How can senlin receive a confirm message?
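[Editor's note] The deletion-policy properties proposed above can be collected into one sketch, expressed as the dict a YAML policy file would load into. The field names (`notify`, `need_confirm`, `timeout`) come from the discussion; the surrounding schema (type string, version) is an assumption, not Senlin's actual policy format.

```python
# Hypothetical extended deletion policy spec, per the meeting discussion.
proposed_spec = {
    'type': 'senlin.policy.deletion',   # assumed type name
    'version': '1.1',                   # hypothetical version bump
    'properties': {
        'notify': {
            'type': 'zaqar',            # where the pre-delete notification goes
            'sink': 'zaqar-queue-endpoint',  # placeholder value
        },
        'need_confirm': True,           # wait for confirmation before destroying
        'timeout': 120,                 # grace period in seconds
    },
}

def grace_period(spec):
    """How long a node lingers before it is finally destroyed."""
    props = spec['properties']
    return props['timeout'] if props.get('need_confirm') else 0
```

With `need_confirm: true`, the scale-in completes immediately while the node waits out its grace period; with it false, the policy degenerates to today's behavior.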
13:21:41 no matter whether we get a notification or not, the node will be removed
13:21:45 from a receiver?
13:22:07 the use case is actually only about "postponing the node deletion operation"
13:22:27 kind of a new action, as proposed in the spec
13:23:34 Also we might need a new node status.
13:23:49 yes, that is fine
13:24:07 this is a very real use case
13:24:30 just trying to identify some mismatches between the proposed approach and senlin's current workflow
13:25:05 we cannot do policy checks before locking a cluster
13:25:11 there are race conditions
13:25:37 the advantage of a new policy is that we can extend it to support cluster-scale-out, to ask permission from supervisors?
13:26:05 say senlin noticed a "lifecycle" policy attached, so it decides to do something, then before senlin actually starts doing the job, the policy may get detached ...
13:26:36 that is true, ruijie_
13:26:55 still, we have to check policies with the cluster locked
13:28:06 and it will introduce more concurrency issues if we lock a cluster twice: one lock for policy checking, another for the actual cluster operation
13:29:04 emm, how about: cluster-scale-in --> lock the cluster --> node actions --> check the policy --> hold here .. we only focus on the node level
13:29:38 yes, for the scale-in operation, that is what I was talking about
13:30:05 to be more specific, the policy is a new deletion policy
13:30:44 that policy will grant users sufficient time to drain a node
13:31:17 the question is ... when are we releasing the cluster lock
13:32:05 once the required number of nodes are removed from the cluster, we treat the cluster scale-in operation as completed
13:32:34 however, the deletion policy leaves a thread waiting for further confirmation before destroying the node
13:32:55 and it is just an extension of the grace period
13:33:02 make sense?
13:34:07 looks so .. rejection might be reasonable?
13:34:22 rejection?
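[Editor's note] The node status flow implied above (removed from the cluster, then lingering until confirmation or timeout finally destroys it) can be sketched as a small transition table. Only `WAITING_LIFECYCLE_COMPLETION` is named in the meeting; the other status names here are assumptions for illustration.

```python
# Hypothetical node status transitions under the proposed deletion policy.
ALLOWED_TRANSITIONS = {
    'ACTIVE': {'WAITING_LIFECYCLE_COMPLETION'},
    # node is already out of the cluster; destruction is merely postponed
    'WAITING_LIFECYCLE_COMPLETION': {'DELETING'},
    'DELETING': {'DELETED'},
    'DELETED': set(),
}

def transition(current, new):
    """Move a node to a new status, rejecting skips like ACTIVE -> DELETED."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f'invalid transition {current} -> {new}')
    return new
```

The key point from the discussion is that the cluster lock can be released as soon as the node reaches the waiting status; only the node itself stays busy afterward.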
13:34:41 you are not able to destroy the node currently, please try again
13:35:04 no, the cluster-scale-in operation was successful
13:35:21 the number of active/working nodes was reduced
13:35:35 just there are some nodes not yet cleanly destroyed
13:36:08 leaving nodes there untouched has been treated as a valid situation before
13:36:26 we have a "destroy_after_deletion" property in deletion policy 1.0
13:37:09 what we really need is some new fields that prolong the waiting period, waiting for a confirmation before finally destroying the node
13:38:03 I'm also a little bit cautious about introducing new policies
13:38:43 one principle we have in senlin is that all policies are independent from each other
13:39:08 they can be used separately, and they can be combined freely
13:39:31 all policies have builtin priorities to make sure all possible combinations make sense
13:40:19 the new lifecycle policy, if permitted, will conflict with the deletion policy, I'm afraid
13:40:49 although, I do see how careful the author has been when drafting the spec
13:41:30 he/she has considered actions to handle, alternatives to the proposed approach, etc.
13:41:36 exactly, it looks like an extension of the deletion policy, but useful for some cases
13:41:38 it is a great spec
13:42:49 yet another possibility ...
if I failed to convince you that an extension to the deletion policy is okay for this use case
13:43:01 we don't call this a policy
13:43:13 a lifecycle hook is a hook
13:43:38 it is a precheck and postcheck for cluster-level operations
13:43:55 we don't lock the cluster for these hooks
13:44:28 we clearly document what can happen when a user application receives a notification from senlin
13:46:43 but other actions could be triggered during this period if we do not lock the cluster/node
13:47:02 right, that is the dilemma
13:47:42 in fact, we don't have a lot of choices
13:48:15 one option is a synchronous call: senlin doesn't move forward until we get a response
13:48:34 another option is an asynchronous notification
13:49:05 sync calls to the outside world, with cluster locks held, are dangerous, imo
13:49:51 but it fits our action flow ..
13:51:15 exactly
13:52:15 A little time for another two topics :)
13:52:28 I see, that is the problem .. sync == hold the cluster lock, async == release the lock, in the current implementation
13:53:06 ya, don't do async calls with locks held ... ;)
13:53:49 ha, sorry, we do not have enough time for the rest of the topics, so please leave comments on this patch :)
13:54:18 yep. please jump in and leave comments
13:54:38 btw, ruijie_, we are about to release q-2 this week
13:54:42 let's move to the senlin channel? we only have 6 minutes left
13:54:46 yes Qiming
13:54:51 I am working on it today
13:54:54 the release team is not sending notifications?
13:55:00 great!
13:55:51 great man, ruijie_
13:55:58 I am going to release the channel, we have another 2 topics to discuss in #senlin
13:56:09 ok
13:56:17 thanks for joining :)
13:56:28 #endmeeting