04:02:28 <samP> #startmeeting masakari
04:02:29 <openstack> Meeting started Tue Jan 24 04:02:28 2017 UTC and is due to finish in 60 minutes.  The chair is samP. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:02:30 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:02:32 <openstack> The meeting name has been set to 'masakari'
04:02:39 <samP> Hi all
04:02:43 <takashi> samP: hi
04:02:49 <abhishekk> o/
04:03:02 <rkmrHonjo> hi
04:03:13 <samP> since we do not have critical bugs, let's jump into discussion
04:03:35 <rkmrHonjo> ok.
04:03:49 <tpatil> In the last meeting we discussed the signal handler issue
04:03:58 <samP> tpatil: yes,
04:04:13 <tpatil> signal issue is fixed in patch https://review.openstack.org/#/c/421767/
04:04:38 <tpatil> Just saw you have approved that patch
04:04:46 <samP> tpatil: LGTM
04:05:07 <rkmrHonjo> tpatil: thanks.
04:05:13 <samP> rkmrHonjo: if it's ok, then +1 the workflow
04:05:15 <abhishekk> needs to set w+ on that patch
04:05:46 <rkmrHonjo> samP, abhishekk:sure.
04:06:35 <samP> it does not affect the other patches, so no need to rebase them, right?
04:07:03 <tpatil> rkmrHonjo: We will submit a separate patch to exit child process gracefully
04:07:33 <rkmrHonjo> tpatil: Does the patch use ServiceLauncher?
04:08:05 <tpatil> rkmrHonjo: No, it uses same code as previous i.e ProcessLauncher
04:09:04 <rkmrHonjo> tpatil: OK, thanks.
04:10:20 <samP> tpatil: thanks..
04:10:28 <takashi> sorry for short disconnection. just come back from wifi trouble.
04:11:30 <samP> takashi: np
04:11:58 <samP> tpatil: when will you plan to submit the new patch?
04:13:29 <abhishekk> samP: tushar san is facing an internet issue
04:13:30 <abhishekk> we
04:13:33 <tpatil_> sorry, I lost my internet connection
04:13:40 <samP> tpatil_: np
04:13:42 <samP> tpatil: when will you plan to submit the new patch?
04:13:57 <tpatil_> samP: can you please suggest what should be the default interval for the new periodic task?
04:14:34 <samP> tpatil_: are you referring to https://review.openstack.org/#/c/423059/1
04:14:45 <tpatil_> yes
04:16:58 <tpatil_> the higher the probability of notification failure, the shorter the interval time should be
04:18:06 <tpatil_> Do you think 2 mins interval is appropriate to run the new periodic task?
04:18:39 <samP> tpatil_: current default is 300, right? feels a bit long, but 120 would be nice
04:18:55 <tpatil_> samP: ok
04:19:17 <abhishekk> I will update the lite-specs accordingly
04:19:50 <samP> tpatil_: in nova it is 60, but we are not just polling things; in that case 120 would be OK
04:20:31 <samP> however, I'm wondering, can we recommend a minimum value for retry_notification_new_status_interval?
04:21:36 <samP> tpatil_: thanks, I put a minor comment on minimum value for retry_notification_new_status_interval
04:22:08 <tpatil_> samP: We can keep the default value to 60 for retry_notification_new_status_interval
04:23:06 <samP> tpatil_: that means, at generated_time+60s it will retry the new notifications
04:23:27 <tpatil_> correct
04:24:37 <samP> tpatil_: in my understanding, it won't take long before the recovery flow gets the new notifications, which is way less than 60s.
04:25:27 <samP> so, 60s would be fine. I will add these comments to the spec review.
04:25:57 <tpatil_> samP: execution time of each flow cannot be predicted as it goes through different services
04:27:59 <samP> tpatil_: ah, right. I will check other config values and comment to the spec
04:28:09 <tpatil_> samP: ok
04:28:28 <samP> one other question, how does the operator know which flow is not being retried?
04:29:04 <tpatil_> Abhishek: Please explain the notification status flow
04:29:30 <samP> it seems both notification cases end in the "failed" state
04:29:37 <abhishekk> yes
04:29:57 <abhishekk> so the 1st case is if a notification ends in the error state
04:30:25 <abhishekk> then periodic task will pick that notification and states will be error > running > success or failed
04:30:34 <samP> ah, got it
04:30:53 <abhishekk> the 2nd case: if a notification is ignored, then it will go new > running > failed or success
04:30:54 <samP> if failed, the operator must look in to it
04:30:59 <abhishekk> yes
04:31:07 <tpatil_> samP: correct
04:31:08 <samP> abhishekk: thanks, got it.
04:31:38 <samP> tpatil_: thanks
04:31:39 <tpatil_> samP: All notifications whose status is Failed should be resolved by the operator
04:31:59 <samP> tpatil_: clear.. thanks
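The retry behavior discussed above (notifications that ended in "error" are re-run by the periodic task, and "new" notifications that were never picked up are retried once they are older than retry_notification_new_status_interval) can be sketched roughly as follows. This is an illustrative model only, not masakari's actual code; the function and field names are hypothetical.

```python
# Illustrative sketch of the periodic retry task discussed above.
# Not masakari's actual implementation; all names are hypothetical.
import time

RETRY_NEW_STATUS_INTERVAL = 60  # seconds; the default value agreed in the meeting

def process_failed_and_stale_notifications(notifications, execute_flow, now=None):
    """Re-run error'd notifications and pick up stale 'new' ones.

    State transitions modeled: error -> running -> success/failed, and
    (stale) new -> running -> success/failed. A notification left in
    'failed' must be resolved by the operator.
    """
    now = now if now is not None else time.time()
    for n in notifications:
        stale_new = (n["status"] == "new" and
                     now - n["generated_time"] >= RETRY_NEW_STATUS_INTERVAL)
        if n["status"] == "error" or stale_new:
            n["status"] = "running"
            try:
                execute_flow(n)
                n["status"] = "success"
            except Exception:
                n["status"] = "failed"
    return notifications
```

A 'new' notification younger than the interval is left alone, so the normal recovery flow still gets the first chance to process it.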
04:33:02 <samP> OK then, overall flow is LGTM except minor comments about default values. I will add my comment on them.
04:34:01 <samP> if no other comments or questions about the periodic task, shall we move to "RESERVED_HOST recovery action"?
04:34:09 <tpatil_> Sure
04:34:21 <samP> here is the spec.
04:34:37 <samP> humm... link is not working
04:34:40 <samP> https://review.openstack.org/#/c/423072/1/specs/ocata/approved/implement-reserved-host-action.rst
04:35:33 <samP> are we going to set reserved=False at the end of the execution?
04:35:40 <tpatil_> #link: https://review.openstack.org/#/c/423072
04:36:24 <abhishekk> samP: as per current approach yes
04:36:55 <samP> it seems to me that there is a possibility that multiple workflows could take the same reserved host...
04:37:58 <tpatil_> samP: you are correct, the reserved host should be set to False immediately even before evacuating vms from the failed compute node.
04:38:51 <tpatil_> samP: We will incorporate this use case in the current specs
04:38:51 <takashi> Or do we need a lock about reserved host?
04:39:40 <tpatil_> as per the current design, we don't want to call the masakari db api from the workflow
04:39:40 <samP> tpatil_: thanks.. we may introduce a new flag/lock, but making reserved=false & nova-compute=disable has the same effect, right?
04:40:38 <samP> ah..no..it wont
04:40:45 <abhishekk> IMO setting reserved=false means we need to enable the compute service for that host
04:41:47 <samP> abhishekk: yes, but we can't set reserved=false at the start and set nova-compute=enable at the end, in the same flow.
04:42:22 <samP> abhishekk: in the middle, we can set up the error handling
04:43:35 <tpatil_> samP: flow should be: loop through the available reserved host list->set reserved to False, enable compute node of reserved host->evacuate vms
04:43:36 <abhishekk> we need to enable compute service at the start only
04:44:42 <takashi> we need to enable nova-compute service before evacuation, right?
04:44:58 <tpatil_> takashi: correct
04:45:09 <samP> tpatil_: before evacuate, correct
04:45:34 <tpatil_> samP: yes
04:46:48 <samP> tpatil_: from start to end of the recovery flow, how do we prevent nova-scheduler from assigning new VM to that host?
04:47:11 <tpatil_> samP: That's not possible
04:47:53 <samP> tpatil_: can't we use on_maintenance? does it block the evacuation too?
04:48:15 <tpatil_> samP: Evacuate api will fail if the compute host is out of resources and then it will get the new reserved host from the list and continue evacuation on new compute node
04:49:43 <abhishekk> samP: nova does not know about on_maintenance, once compute service is enabled on reserved host then nova can use that host to schedule new instances
04:51:04 <tpatil_> samP: Abhishek is correct
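The flow tpatil_ outlined (loop through the reserved host list, flip reserved=False before evacuation so a parallel workflow cannot grab the same host, enable its compute service, then evacuate; on an evacuate failure, continue with the next reserved host) can be sketched like this. A hedged illustration only, not the actual masakari taskflow code; the nova interactions are injected as stand-in callables.

```python
# Hedged sketch of the RESERVED_HOST recovery flow discussed above.
# Not masakari's actual code; names and structure are illustrative.

class NoMoreReservedHosts(Exception):
    """Raised when every reserved host has been tried or taken."""

class OutOfResources(Exception):
    """Stand-in for an Evacuate API failure due to lack of resources."""

def recover_with_reserved_host(reserved_hosts, enable_compute, evacuate):
    """reserved_hosts: list of dicts with 'name' and 'reserved' keys.

    enable_compute / evacuate are injected callables standing in for
    the nova service-enable and evacuate calls.
    """
    for host in reserved_hosts:
        if not host["reserved"]:
            continue
        # Release the reservation *before* evacuation starts, so a
        # concurrent workflow cannot pick the same reserved host.
        host["reserved"] = False
        enable_compute(host["name"])
        try:
            evacuate(host["name"])
            return host["name"]
        except OutOfResources:
            # Evacuate failed on this host; try the next reserved one.
            continue
    raise NoMoreReservedHosts()
```

Note this sketch does not address the separate problem raised next in the meeting: once nova-compute is enabled, nova-scheduler may also place new instances on the reserved host.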
04:51:31 <samP> sorry, probably my misunderstanding of nova service-disable --reason maintenance
04:52:59 <samP> however, it seems that migrate can be performed even in service-disabled mode; maybe that is what confused me
04:53:10 <samP> http://docs.openstack.org/ops-guide/ops-maintenance-compute.html
04:55:50 <rkmrHonjo> samP: Just for information: the software that masakari is based on didn't handle this scheduling problem. (Of course, improving it would be nice.)
04:56:12 <tpatil_> IMO, the nova compute service should be enabled even in the live migration case, otherwise RPC messages won't work at all
04:57:01 <samP> rkmrHonjo: correct.
04:58:38 <samP> tpatil_: agree... I will check the current nova code and update myself...
04:59:11 <samP> ok then, its almost time...
04:59:34 <rkmrHonjo> In my understanding, http://docs.openstack.org/ops-guide/ops-maintenance-compute.html says that the source compute node can be disabled, but the dest compute node should be enabled.
04:59:53 <takashi> Let's keep the discussion on the specs. Would be glad if we can get some feedback from the previous masakari implementation
05:00:06 <samP> rkmrHonjo: right
05:00:26 <samP> Please put your comments and questions on the spec..
05:00:37 <samP> #topic AOB
05:00:44 <takashi> I'll have a look at the specs and some remaining patches
05:00:52 <samP> takashi: sure, thanks
05:01:05 <takashi> Should we move to #openstack-masakari?
05:01:45 <samP> takashi: sure, but I was thinking to link gerrit with it
05:02:22 <samP> Since infra said they don't have enough bots, I left it as a TODO
05:03:47 <samP> takashi: since we are out of time, let's discuss it at the next meeting..
05:04:42 <samP> any other questions or comments?
05:04:58 <rkmrHonjo> Can I talk about other topic?(Sorry, I forgot to write topic on wiki.)
05:05:08 <samP> rkmrHonjo: sure
05:05:15 <rkmrHonjo> samP: Thanks.
05:05:45 <rkmrHonjo> Masakari-monitors: Takahara & I re-thought about "ProcessLauncher or ServiceLauncher" after last week meeting.
05:06:07 <rkmrHonjo> As a result, we thought that service launcher is better than process launcher. There are 2 reasons.
05:06:53 <rkmrHonjo> 1: Using service launcher for non-http server is a general way.(This reason was already said in last week meeting.)
05:07:59 <rkmrHonjo> 2: Launching 3 monitors as workers of one parent is not useful. Some users won't wish to use all monitors. And restarting/shutting off one monitor is not easy in this way.
05:08:19 <rkmrHonjo> samP: How do you think about this?
05:09:24 <samP> rkmrHonjo: Are you recommending the service launcher instead of the current process launcher?
05:09:36 <rkmrHonjo> samP: Yes.
05:10:52 <samP> rkmrHonjo: From my POV, (1) is not that important, however (2) is important
05:10:59 <samP> rkmrHonjo: agree with you on (2)
05:12:15 <rkmrHonjo> samP: Thanks. We'll change the launcher after tpatil's patch is merged.
05:12:32 <samP> rkmrHonjo: if we are going to change this, then we have to discuss with tpatil and abhishekk too. because they are working on the new patch
05:13:18 <samP> rkmrHonjo: can you create a doc on etherpad or a PoC for that patch?
05:14:40 <rkmrHonjo> samP: "That patch" means changing the launcher, right?
05:14:46 <samP> rkmrHonjo: yes
05:15:35 <rkmrHonjo> samP: OK. I'll create a doc or PoC and notify you, tpatil and abhishekk.
05:16:10 <samP> rkmrHonjo: thank you very much..
05:16:19 <rkmrHonjo> samP: thanks.
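rkmrHonjo's point (2) above can be illustrated with a small sketch. This is purely a design illustration with hypothetical names, not the masakari-monitors or oslo.service API: with one ProcessLauncher-style parent owning all three monitors, an operator gets all of them or none, whereas per-monitor ServiceLauncher-style entry points let the operator pick and restart monitors individually.

```python
# Illustration of the launcher design point; all names are hypothetical.
# Each "monitor" is modeled as a factory so the example stays runnable
# without oslo.service.
MONITOR_REGISTRY = {
    "process": lambda: "processmonitor running",
    "instance": lambda: "instancemonitor running",
    "host": lambda: "hostmonitor running",
}

def launch_all_as_workers(registry):
    # ProcessLauncher style: one parent starts every monitor as a
    # worker; you cannot start or restart just one of them.
    return [start() for start in registry.values()]

def launch_one(registry, name):
    # ServiceLauncher style: each monitor is its own service entry
    # point, so it can be started, stopped, or restarted on its own.
    return registry[name]()
```

In practice this maps to shipping one console script (and one init/systemd unit) per monitor instead of a single parent process forking three workers.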
05:17:09 <samP> OK then, we are 16 mins over the schedule.. let's wrap up
05:17:25 <samP> are there any other things to discuss?
05:18:07 <samP> if not, then let's finish the meeting. use #openstack-masakari for further discussions
05:18:13 <samP> Thank you all
05:18:35 <samP> #endmeeting