03:00:27 <samP> #startmeeting masakari
03:00:28 <openstack> Meeting started Tue Jun  5 03:00:27 2018 UTC and is due to finish in 60 minutes.  The chair is samP. Information about MeetBot at http://wiki.debian.org/MeetBot.
03:00:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
03:00:31 <openstack> The meeting name has been set to 'masakari'
03:00:37 <samP> Hi all for masakari
03:00:37 <sagara> hi
03:00:40 <tpatil> Hi
03:00:42 <Dinesh__Bhor> Hi
03:00:50 <samP> Hi all,
03:01:11 <samP> From today, the meeting will start at 0300 UTC
03:01:25 <samP> Let's start
03:01:38 <samP> #topic High priority items
03:02:04 <samP> Just pushed a patch to release python-masakariclient
03:02:08 <samP> #link https://review.openstack.org/#/c/572244/
03:02:26 <samP> It has some validation errors; I will fix them.
03:02:34 <samP> This will release 5.1.0
03:03:53 <samP> Any other high priority items?
03:04:10 <samP> if not, let's move to bugs and patches
03:04:26 <samP> #topic Bug/Patches
03:04:45 <samP> Any critical bugs or patches to discuss?
03:05:05 <tpatil> https://bugs.launchpad.net/masakari/+bug/1773132
03:05:06 <openstack> Launchpad bug 1773132 in masakari "masakari-engine runs recovery twice for one notification when disconnection with rabbitmq" [Undecided,Confirmed]
03:06:25 <tpatil> Looking at the code, I have marked this bug as confirmed
03:06:38 <tpatil> I will fix this issue
03:07:07 <samP> This problem could happen.
03:07:27 <tpatil> basically we will need to get the notification from the db, compare the previous status with the current one, and decide whether to skip processing
03:09:51 <samP> tpatil: thanks. Do we have any exceptions for rabbit mq delivery failures?
03:10:34 <tpatil> on masakari-api side, yes
03:10:57 <tpatil> but after publishing, masakari-api returns a success status to the caller
03:11:33 <tpatil> and if masakari-engine doesn't get the message for a long time, then this situation could occur
03:11:45 <samP> tpatil: correct
03:11:54 <tpatil> this situation is rare though
03:12:35 <samP> for Host failure, this would not be a critical issue.
03:13:36 <samP> ah..can't say for sure. I take my statement back
03:13:38 <samP> sorry
03:16:02 <samP> tpatil: thanks for fixing this. Let's discuss once we have patches
03:16:12 <tpatil> samP: Sure
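The status check tpatil describes for bug 1773132 could look roughly like the sketch below. It is a minimal illustration, not masakari's actual code: the status names, the `Notification` class, the `InMemoryStore` stand-in for the database, and the `handle_message` helper are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical status values; the real masakari names may differ.
NEW, RUNNING, FINISHED = "new", "running", "finished"


@dataclass
class Notification:
    uuid: str
    status: str


class InMemoryStore:
    """Stand-in for the notifications table (an assumption for this sketch)."""

    def __init__(self):
        self._rows = {}

    def add(self, notification):
        self._rows[notification.uuid] = notification

    def get(self, uuid):
        return self._rows[uuid]


def should_process(store, uuid):
    """Return True only if the stored notification has not been picked up yet."""
    return store.get(uuid).status == NEW


def handle_message(store, uuid, recover):
    """Run recovery once per notification, even if the message arrives twice."""
    if not should_process(store, uuid):
        return False  # duplicate delivery: skip processing
    notification = store.get(uuid)
    notification.status = RUNNING
    recover(notification)
    notification.status = FINISHED
    return True
```

In this sketch a second delivery of the same message finds the status already changed and returns without running recovery. The check and the status update are not atomic here; a real fix would presumably need to make that step safe against concurrent engines.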
03:16:16 <samP> Any other bugs?
03:16:19 <samP> or patches
03:17:17 <samP> if not let's move to discussion
03:17:26 <samP> #topic discussion
03:17:28 <tpatil> https://bugs.launchpad.net/masakari/+bug/1773765
03:17:29 <openstack> Launchpad bug 1773765 in masakari "There is a possibility that 'running' notification will remain" [Undecided,New]
03:17:43 <tpatil> The issue is almost same as the previous one
03:18:49 <tpatil> since host evacuation is a lengthy process, it could fail partway through and the status would remain "running" instead of "failed".
03:18:50 <samP> almost same, but not duplicate.
03:19:19 <tpatil> But in a production environment, this situation could only occur in case of power failure
03:19:42 <samP> or the network failure
03:20:01 <tpatil> yes
03:20:37 <samP> is it possible to check evacuation status from the nova side?
03:20:53 <tpatil> since we cannot predict how much time it's going to take to finish processing host failure notifications, it's difficult to rerun based on running status in the periodic tasks
03:21:24 <samP> tpatil: agree
03:21:57 <samP> Need to check about evacuation, but live migration has a cancel feature.
03:23:41 <samP> On second thought, even if we have a cancel feature, it's better to wait until it finishes or errors out
03:25:24 <tpatil> In that case, masakari will need to store the request id and query based on it. But does nova support this feature? need to check
03:26:01 <tpatil> need to check instance actions and figure out the status based on the request id
03:26:02 <samP> well we could listen to nova notifications
03:26:27 <tpatil> yes, that's another option too
03:26:40 <tpatil> but in case of power/network failure, this won't happen, correct
03:26:59 <samP> tpatil: correct
03:27:01 <tpatil> I mean we wouldn't get notifications from nova
03:27:43 <samP> In such a failure, we can't say for sure that we can get those notifications
03:28:30 <samP> Best option is to leave this to the operator to handle
03:29:11 <tpatil> one thing is for sure, we will need to query nova to find out whether evacuation succeeded or failed and then maybe we can take some decision to process notifications with running status in periodic tasks
03:29:32 <samP> tpatil: agree
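Deciding whether an evacuation succeeded or failed from nova's instance actions, keyed by the stored request id, might be sketched as follows. This assumes masakari has already fetched the action records from nova as plain dicts. The field names (`request_id`, `action`, `events`, `result`) and the `"Success"`/`"Error"` values are assumptions about what those records contain and should be checked against the nova API. `evacuation_outcome` is a hypothetical helper.

```python
def evacuation_outcome(actions, request_id):
    """Classify an evacuation as 'succeeded', 'failed', 'in_progress' or 'unknown'.

    `actions` is a list of dicts assumed to resemble nova instance-action
    records; the exact field names are an assumption of this sketch.
    """
    for action in actions:
        if action.get("request_id") != request_id:
            continue
        if action.get("action") != "evacuate":
            continue
        events = action.get("events", [])
        results = [event.get("result") for event in events]
        if any(result == "Error" for result in results):
            return "failed"
        if events and all(result == "Success" for result in results):
            return "succeeded"
        # Action recorded, but some event has no final result yet.
        return "in_progress"
    # No matching action: nova never recorded the request, or it is unreachable.
    return "unknown"
```

An "unknown" result covers the power or network failure case raised above, where nova may have nothing to report, so the periodic task would still need a fallback such as alerting the operator.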
03:31:24 <samP> I will update bug report with my findings
03:32:37 <tpatil> we can certainly add some code in the periodic task to notify operators that some notifications in running status are taking longer to process than expected (configurable using a new config option)
03:33:45 <samP> + warning log is useful too
03:33:47 <tpatil> it will log warning messages and then operators will need to figure out the issue themselves
03:34:10 <samP> tpatil: got it
03:35:10 <samP> I think that would be fine for immediate fix
03:35:36 <tpatil> Ok, We will fix this issue
03:36:20 <samP> It is better if we can load this config without restarting masakari-api
03:37:19 <samP> tpatil: thanks
03:38:00 <tpatil> config option will be needed in masakari-engine
03:38:30 <samP> tpatil: sorry, you are correct.
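The immediate fix agreed for bug 1773765, a periodic check that warns about notifications stuck in running status, could be sketched as below. The option name `RUNNING_NOTIFICATION_WARN_AFTER_SECONDS`, its default, the dict layout of a notification, and both helper functions are hypothetical. In masakari-engine the threshold would come from the new config option rather than a module constant.

```python
import logging
from datetime import datetime, timedelta, timezone

LOG = logging.getLogger(__name__)

# Hypothetical option name and default; not an existing masakari option.
RUNNING_NOTIFICATION_WARN_AFTER_SECONDS = 3600


def find_stuck_notifications(notifications, now, warn_after_seconds):
    """Return notifications that have been 'running' longer than the threshold."""
    limit = timedelta(seconds=warn_after_seconds)
    return [
        n for n in notifications
        if n["status"] == "running" and now - n["updated_at"] > limit
    ]


def warn_about_stuck_notifications(
        notifications, now=None,
        warn_after_seconds=RUNNING_NOTIFICATION_WARN_AFTER_SECONDS):
    """Log one warning per stuck notification and return the stuck ones."""
    now = now or datetime.now(timezone.utc)
    stuck = find_stuck_notifications(notifications, now, warn_after_seconds)
    for n in stuck:
        LOG.warning(
            "Notification %s has been in 'running' status since %s; "
            "operator action may be required",
            n["uuid"], n["updated_at"].isoformat())
    return stuck
```

The sketch only logs. It does not rerun or fail the notification, which matches the decision to leave the final handling to the operator.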
03:40:09 <samP> Any other bugs or patches?
03:40:23 <samP> Please bring them up any time.
03:40:32 <samP> (1) Horizon Plugin
03:40:47 <samP> Need to review...
03:41:13 <tpatil> Niraj will upload a new PS today
03:41:24 <tpatil> then we should start reviewing Add segment panel patch
03:41:40 <samP> I have to check the release deadline for the horizon plug-in for Rocky
03:41:48 <samP> tpatil: sure, thanks
03:42:30 <samP> Anyway, we had better merge the main patches before Rocky-3
03:42:46 <samP> I will let you know the exact dates for this.
03:43:08 <samP> (2) Ansible support for Masakari
03:43:32 <tpatil> Rocky 3 milestone release date is Jul 23 - Jul 27
03:43:48 <samP> tpatil: yep.
03:44:03 <tpatil> Niraj is working on fixing functional tests
03:44:24 <tpatil> then he will add documentation to install masakari-api and masakari-engine
03:44:38 <samP> tpatil: Thanks
03:44:59 <tpatil> next action is to write masakari-monitor role
03:45:35 <tpatil> fixing functional tests and adding documentation will be done this week
03:45:43 <samP> tpatil: yep, which might cause some troubles.
03:45:55 <tpatil> and then we will focus on masakari-monitor role
03:45:58 <samP> tpatil: got it. thanks for fixing tests and adding docs
03:46:04 <samP> tpatil: got it
03:47:05 <samP> #topic AOB
03:47:41 <samP> I'm working on rpm packaging
03:48:02 <samP> Here is the test packaging which is failing currently..
03:48:09 <samP> #link https://copr.fedorainfracloud.org/coprs/sampntt/masakari/
03:48:36 <samP> I will fix this soon, and propose this to RDO.
03:48:56 <samP> Then we can use dnf or yum to install masakari packages...
03:49:20 <samP> That's all from my side
03:49:31 <samP> any updates?
03:49:43 <tpatil> recovery workflow customization
03:49:56 <tpatil> need to review specs
03:50:03 <samP> tpatil: ah, sorry I missed that
03:50:11 <samP> tpatil: I will review that
03:50:19 <tpatil> Thank you
03:51:02 <samP> Need to renew the agenda on wiki...
03:51:48 <tpatil> I will add my agenda items before the next meeting
03:51:57 <samP> And need to update masakari wiki too..
03:52:13 <samP> tpatil: thanks that would be helpful
03:52:26 <tpatil> Any updates about project mascot?
03:52:35 <samP> tpatil: not yet.
03:52:39 <samP> I will ping them
03:52:44 <tpatil> Ok, thanks
03:55:11 <samP> Any updates?
03:55:25 <samP> or we could finish today's meeting
03:55:35 <tpatil> Nothing from my end for now
03:55:39 <samP> tpatil: thanks
03:56:31 <samP> Please use openstack-dev ML with [masakari] or IRC #openstack-masakari @freenode for further discussion
03:56:40 <samP> Thank you all...
03:56:45 <Dinesh__Bhor> Thank you all
03:56:47 <samP> #endmeeting