04:02:20 <samP> #startmeeting masakari
04:02:21 <openstack> Meeting started Tue Dec 13 04:02:20 2016 UTC and is due to finish in 60 minutes.  The chair is samP. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:02:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:02:24 <openstack> The meeting name has been set to 'masakari'
04:02:30 <samP> Hi all
04:03:02 <samP> Thank you all for attending our first masakari IRC meeting
04:03:52 <samP> First, I would like a quick intro from everyone
04:04:03 <tpatil> I think we should add weekly agenda on masakari wiki page
04:04:12 <tpatil> #link https://wiki.openstack.org/wiki/Masakari
04:04:32 <abhishekk> #link https://wiki.openstack.org/wiki/Meetings/Masakari#Agenda_for_next_meeting
04:04:37 <samP> can we use the rollcall?
04:05:23 <samP> tpatil: sure, I will do that
04:05:43 <samP> #action samP add weekly agenda to masakari wiki page
04:05:50 <abhishekk> samP: I have added some points for today's discussion
04:06:00 <samP> abhishekk: thank you
04:08:22 <abhishekk> shall we start with the discussion?
04:08:30 <samP> sure,
04:09:15 <tpatil> Let's start the discussion as per agenda
04:09:27 <samP> we don't have open bugs, right? can we jump to the next item?
04:09:32 <abhishekk> yes
04:09:36 <samP> sure
04:10:03 <tpatil> Only one issue is open
04:10:26 <samP> #topic Discussion new features
04:10:34 <tpatil> in fact there are 3 open issues but none is critical
04:11:20 <samP> tpatil: OK, lets do it after this discussion
04:11:35 <tpatil> samP: ok
04:11:36 <samP> First, evacuate_all config option
04:12:38 <samP> currently we don't evacuate VMs without the HA flag; this option enables/disables the evacuation of all VMs
04:12:49 <abhishekk> yes, we are using this option for host_failure flow
04:13:18 <abhishekk> if this option is True then we should evacuate all the instances else only ha_enabled instances should be evacuated
04:13:34 <tpatil> #link https://review.openstack.org/#/c/407538/
04:13:55 <abhishekk> IMO we should rename this option so that we can use it for the instance_failure flow as well
04:14:38 <samP> abhishekk: agree, we have the same kind of issue there
04:15:02 <abhishekk> as of now in instance_failure we are only processing HA-Enabled instances
04:15:14 <samP> how about rescue_all?
04:15:59 <rkmrhj> rescue is the name of another nova API.
04:16:12 <abhishekk> right, we can decide on the config name in an internal discussion
04:16:31 <samP> rkmrhj: ah, thank you
04:17:48 <samP> In the future, we are going to implement customizable rescue patterns,
04:19:05 <samP> I think we need to define separate options for evacuate and instance_failure
04:19:27 <tpatil> samP: Let's add a new blueprint to describe the new feature
04:19:52 <samP> such as evacuate for all, but instance_failure only for HA-enabled VMs
04:19:57 <tpatil> also, we should add a litespec to describe how we are going to implement it
04:20:10 <samP> tpatil: sure
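A minimal sketch, assuming hypothetical option and group names (evacuate_all_instances under [host_failure], process_all_instances under [instance_failure]), of how the two separate options discussed above could be defined with oslo.config; the actual names were left for a later internal discussion:

    from oslo_config import cfg

    # Hypothetical option names -- the team agreed to settle the final names separately.
    host_failure_opts = [
        cfg.BoolOpt('evacuate_all_instances',
                    default=False,
                    help='If True, evacuate every instance from a failed host; '
                         'otherwise only HA_Enabled instances are evacuated.'),
    ]

    instance_failure_opts = [
        cfg.BoolOpt('process_all_instances',
                    default=False,
                    help='If True, the instance_failure workflow recovers any '
                         'failed instance, not only HA_Enabled ones.'),
    ]

    CONF = cfg.CONF
    CONF.register_opts(host_failure_opts, group='host_failure')
    CONF.register_opts(instance_failure_opts, group='instance_failure')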
04:20:42 <samP> do we need a spec repo? or should we just document it somewhere?
04:20:54 <tpatil> repo is better
04:21:39 <samP> OK, a spec repo will be more useful in the future. I'll try to get one
04:21:51 <samP> #action create spec repo for masakari
04:23:09 <samP> are we going to create BP for "evacuate_all config option"?
04:23:41 <tpatil> I think blueprint should be enough for this change as design wise it's not a big change
04:24:02 <samP> tpatil: agree
04:25:00 <tpatil> can we move to the next item
04:25:11 <samP> any volunteer for that BP?
04:25:46 <abhishekk> I will do that
04:26:04 <samP> #action abhishekk create evacuate_all config option BP
04:26:09 <samP> abhishekk: thanks
04:26:20 <samP> OK lets go to next item
04:26:25 <abhishekk> samP: no problem
04:26:46 <samP> item 2: can we have one periodic task?
04:26:54 <abhishekk> Ok, I will explain about this
04:26:56 <abhishekk> Earlier we were planning to have two periodic tasks,
04:27:06 <abhishekk> process_error_notifications, for processing notifications which are in error state
04:27:13 <abhishekk> process_queued_notifications, for processing notifications which are in new state for long time due to ignored/missed by messaging server.
04:27:42 <abhishekk> but we can combine these into one, as in both tasks we are going to execute the workflow again
04:27:54 <abhishekk> this way we can eliminate the duplicate code
04:29:05 <tpatil> the question is whether we can process both of these in a single periodic task
04:29:56 <tpatil> the only difference is when the notification status is new: if the periodic task fails to execute the workflow, should the status be set to "failed" or "error"?
04:30:45 <tpatil> abhishekk: can you please explain the status transitions that take place while processing notifications
04:30:53 <abhishekk> ok
04:31:04 <abhishekk> process_error_notifications:
04:31:14 <abhishekk> Error flow, error >> running >> error >> failed
04:31:22 <abhishekk> Success flow, error >> running >> finished
04:31:33 <abhishekk> for process_queued_notifications:
04:31:40 <abhishekk> Error flow, new >> running >> error
04:31:47 <abhishekk> Success flow, new >> running >> finished
04:32:35 <abhishekk> In the case of the second periodic task, if we set the status to error then it will again be picked up for execution by process_error_notifications
04:32:51 <abhishekk> so we can combine these and have a common flow like,
04:32:59 <abhishekk> Error flow, new/error >> running >> error >> failed
04:33:06 <abhishekk> Success flow, new/error >> running >> finished
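A minimal sketch of the merged periodic task described above, assuming hypothetical helper names (self.db.notifications_get_by_status, self.driver.execute_workflow): a single task picks up notifications in either 'new' or 'error' status, and only a notification that was already in 'error' ends up 'failed' after another failure:

    from oslo_service import periodic_task


    class NotificationPeriodicTasks(periodic_task.PeriodicTasks):

        @periodic_task.periodic_task(spacing=120)
        def _process_unfinished_notifications(self, context):
            # One task handles both the formerly separate queues.
            for notification in self.db.notifications_get_by_status(
                    context, statuses=['new', 'error']):
                previous_status = notification.status
                notification.status = 'running'
                notification.save()
                try:
                    self.driver.execute_workflow(context, notification)
                    notification.status = 'finished'
                except Exception:
                    # new >> running >> error (retried later by this same task);
                    # error >> running >> error >> failed (no further retries).
                    notification.status = (
                        'failed' if previous_status == 'error' else 'error')
                notification.save()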
04:33:31 <samP> Is there any flag to stop it at some point?
04:34:07 <tpatil> abhishekk: Let's add a litespec to explain all these possible cases
04:34:08 <abhishekk> no, these periodic tasks will run at regular interval
04:34:38 <abhishekk> ok
04:35:01 <samP> OK, lets discuss this further on the spec
04:35:28 <samP> abhishekk: can I assign this spec to you?
04:35:45 <abhishekk> samP: yes
04:36:20 <samP> #action abhishekk create spec for merging periodic tasks
04:36:25 <samP> abhishekk: thank you
04:36:34 <samP> shall we move to the next item then?
04:37:02 <samP> item3: configurable workflow
04:37:37 <abhishekk> this is a new requirement
04:37:45 <samP> is this configurable recovery patterns or something else?
04:37:52 <abhishekk> yes
04:37:58 <samP> abhishekk: ok
04:38:16 <tpatil> samP: configurable recovery patterns
04:38:34 <samP> tpatil: thanks
04:38:43 <tpatil> I think Kajinami explained to you the problems we are having in the current design
04:39:26 <samP> tpatil: actually, not yet. We are going to meet tomorrow
04:39:44 <tpatil> samP: Ok
04:40:14 <tpatil> Post that discussion, let's finalize on the new requirement before we go ahead and add a new blueprint for it
04:41:12 <samP> tpatil: sure, I will discuss this with kajinami on the ML
04:41:34 <samP> tpatil: we can have a more detailed discussion at the next meeting
04:41:54 <tpatil> samP: Sure
04:42:28 <abhishekk> samP: we have one more item for discussion
04:42:40 <samP> abhishekk: sure
04:42:44 <abhishekk> Dinesh_Bhor will explain you about that
04:42:58 <Dinesh_Bhor> ok, I have a question: should the workflow be executed synchronously or asynchronously?
04:43:31 <samP> a specific workflow or all of them?
04:44:31 <abhishekk> Particularly host_failure
04:44:31 <Dinesh_Bhor> The problem is that we want to mark the used reserved_hosts as reserved=False after the execution of the host_failure workflow
04:45:17 <Dinesh_Bhor> For this we are passing the reserved_host_list dictionary to the workflow for further execution.
04:45:21 <samP> ah.. got it
04:45:33 <Dinesh_Bhor> When the reserved_host is taken for evacuation, it is set to reserved_host['reserved'] = False. As the dictionary is mutable we get the updated dictionary after the execution of workflow.
04:45:59 <Dinesh_Bhor> After the execution of the whole workflow we are looping through the reserved_host_list in manager.py, and if the reserved_host is marked as False then we are getting the related object and marking it as reserved=False.
04:46:25 <Dinesh_Bhor> The above solution is based on the assumption that we are executing the workflow synchronously.
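A minimal sketch of the synchronous pattern described above, assuming hypothetical driver and object names (execute_host_failure, objects.Host): the manager passes a mutable reserved_host_list into the host_failure workflow and persists the reserved=False changes only after the driver call returns, which is exactly what breaks if the workflow runs asynchronously:

    def process_host_failure(self, context, host_name, reserved_host_list):
        # reserved_host_list: list of dicts, e.g. {'name': ..., 'reserved': True};
        # the workflow flips 'reserved' to False for any host it evacuates to.
        self.driver.execute_host_failure(
            context, host_name, reserved_host_list=reserved_host_list)

        # This loop only sees the updated dicts because execute_host_failure()
        # runs synchronously in-process; an asynchronous driver (e.g. Mistral)
        # would return before the dicts were mutated.
        for reserved_host in reserved_host_list:
            if not reserved_host['reserved']:
                host_obj = objects.Host.get_by_name(
                    context, reserved_host['name'])
                host_obj.reserved = False
                host_obj.save()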
04:47:01 <tpatil> In the future, if someone wants to contribute another driver, say Mistral, then the workflow might execute asynchronously and you might not get the results from the workflow execution in the engine, right?
04:47:33 <Dinesh_Bhor> tpatil: yes, correct
04:47:33 <tpatil> where you will call db apis to update the reserved flag to False
04:49:38 <tpatil> the currently supported driver runs on the local machine where the engine is running, but in the future anyone can contribute a new driver and we don't know whether it will return a result or not.
04:49:39 <samP> as tpatil said, if someone brings another driver to call this workflow, we cannot do this synchronously
04:50:31 <tpatil> so the main question is how to set the reserved flag to False after the instances are evacuated from the failover segment.
04:51:55 <tpatil> let's discuss the design offline, but one thing is sure: we cannot assume the workflow will return results
04:52:04 <tpatil> samP: Do you agree?
04:52:38 <samP> tpatil: yes, I am thinking about some kind of locking or an intermediate state for it
04:53:19 <samP> tpatil: agree, shall we raise a spec for this?
04:53:29 <tpatil> samP: yes
04:53:38 <samP> tpatil: thanks
04:53:54 <samP> Dinesh_Bhor: may I assign this spec to you?
04:54:04 <Dinesh_Bhor> samP: yes
04:55:01 <samP> #action Dinesh_Bhor spec for synchronous/asynchronous workflows
04:56:06 <samP> any other discussion topics? if not, let's move to AOB
04:56:45 <samP> #topic AOB
04:57:20 <samP> I will update masakari wiki with our release schedule.
04:58:30 <samP> In our initial plan, we had milestone b1 on 12/9
04:59:33 <samP> since we have new topics to discuss, I would like to extend this to 12/16
05:00:35 <rkmrhj> Sure.
05:00:44 <samP> ok then.
05:00:51 <tpatil> I think we should use LP milestone feature to figure out details of each milestone
05:01:07 <samP> tpatil: sure
05:01:15 <abhishekk> thank you all
05:01:29 <tpatil> samP: Thank you
05:01:29 <samP> OK then, it's almost time
05:01:40 <Dinesh_Bhor> yes, thanks all
05:02:00 <samP> please use the openstack-dev ML with the [masakari] tag for further discussions
05:02:10 <samP> Thank you all
05:02:13 <tpatil> Sure
05:02:15 <tpatil> bye
05:02:16 <samP> #endmeeting