#openstack-meeting-alt log

09:08:32 <aspiers> #startmeeting ha
09:08:32 <openstack> Meeting started Wed Dec 14 09:08:32 2016 UTC and is due to finish in 60 minutes.  The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:08:33 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:08:35 <openstack> The meeting name has been set to 'ha'
09:08:37 <aspiers> alright then
09:08:55 <aspiers> just two of us today
09:09:01 <aspiers> I guess the day change confused people
09:09:05 <ddeja> maybe
09:09:17 <ddeja> but we haven't seen many people since Barcelona
09:09:45 <aspiers> #topic specs
09:10:04 <aspiers> so I see sampath updated his VM recovery spec
09:10:33 <ddeja> that's good
09:10:42 <ddeja> today I finally have some time to review them
09:10:48 <aspiers> oh great
09:11:03 <ddeja> I don't know how it will go, however
09:11:06 <aspiers> yeah, it is still missing the important part though
09:11:19 * ddeja spent 11 hours on a train yesterday instead of 6...
09:11:24 <aspiers> ouch :-(
09:11:38 <aspiers> the most important part of these specs is the interface points
09:11:44 <ddeja> aspiers: agree
09:11:45 <aspiers> to ensure compatibility between the components
09:12:50 <ddeja> OK
09:13:10 <aspiers> so that bit should be covered for compute node monitoring and recovery now
09:13:29 <aspiers> since my spec and your spec agree on the format of the message to pass
09:14:02 <aspiers> I have been thinking about the libvirt and nova-compute OCF RA specs
09:14:17 <ddeja> OK
09:14:29 <aspiers> in Barcelona I agreed with sampath that these should simply send failures to an external component
09:14:38 <aspiers> which decides what to do
09:14:56 <aspiers> but the challenge is how that decision should be handled
09:15:06 <aspiers> because Pacemaker needs to handle it
09:16:21 <ddeja> aspiers: BTW, I found this last week https://review.openstack.org/#/c/389103/2
09:17:05 <aspiers> ddeja: yes I talked to Michele and Andrew about that in Barcelona
09:17:11 <ddeja> oh
09:17:14 <ddeja> OK
09:17:21 <ddeja> can you give a little update?
09:17:35 <aspiers> they said don't worry, it's just covering what we are already doing, from a triple-o PoV
09:17:48 <aspiers> so not a new solution
09:18:08 <aspiers> just about getting triple-o to automatically set up instance HA
09:18:25 <ddeja> oh, OK
09:18:37 <ddeja> so it would be using resource agents?
09:19:24 <aspiers> yes I guess so
09:19:26 <aspiers> for now
09:19:34 <aspiers> or maybe mistral
09:19:39 <aspiers> I don't know
09:19:44 <ddeja> OK
09:19:46 <aspiers> really RH are driving that
09:20:19 <ddeja> but if it's not about implementing something new, then it is OK
09:20:42 <aspiers> yeah
09:20:46 <aspiers> well that's what they said
09:21:02 <aspiers> but I would suggest pinging them and asking about mistral
09:21:18 <ddeja> sure
09:21:22 <ddeja> thanks
09:21:40 <aspiers> we should also review their spec ...
09:22:02 <ddeja> I've just added myself as a reviewer
09:22:13 <ddeja> it seems like today is the review day
09:22:29 <aspiers> cool
09:22:41 <aspiers> I'll aim for that too
09:22:46 <ddeja> OK
09:23:12 <aspiers> for the last month I've been forced to focus much more on customers :(
09:23:31 <aspiers> but closer to Xmas it should hopefully get quieter, leaving more time for this
09:24:10 <aspiers> if you have any ideas about the process monitoring specs then please let me know
09:24:21 <ddeja> good (for openstack community, maybe not for the customers)
09:24:25 <ddeja> aspiers: of course
09:24:25 <aspiers> hehe
09:24:39 <aspiers> I don't think it can work with HTTP messages like with host recovery
09:25:00 <aspiers> since Pacemaker has to initiate the monitoring and also receive the results, synchronously
09:25:42 <ddeja> aspiers: yes, but it can also then send an HTTP message
09:26:24 <aspiers> the question is whether it should send an HTTP message every time
09:26:31 <aspiers> and when it should attempt recovery
09:26:45 <aspiers> only based on the HTTP response, or whenever it normally would recover?
09:27:14 <aspiers> in the latter case, the recovery workflow engine could perform additional recovery
09:27:21 <ddeja> aspiers: please remind me - process recovery is about restarting libvirt/nova-compute?
09:27:30 <aspiers> that's the first part
09:27:39 <aspiers> but it could take stronger action after several failures
09:27:50 <aspiers> like service-disable
09:28:05 <ddeja> but, hm
09:28:19 <ddeja> all of those could be performed inside the resource agent I guess
09:28:39 <aspiers> yes but the idea was that the spec for monitoring should leave that totally flexible
09:28:47 <aspiers> so that each cloud can decide its own policy
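The escalation idea above (restart first, then stronger action such as service-disable after several failures, with the policy left to each cloud) could be expressed as a small configurable policy object. This is a minimal sketch, not anything from the specs: the class name `EscalationPolicy`, the sliding-window rule and the default thresholds are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EscalationPolicy:
    """Hypothetical per-cloud policy: restart on each failure, and escalate
    to service-disable once failures inside a time window exceed a limit."""
    max_restarts: int = 3          # assumption: tunable per cloud
    window_seconds: float = 600.0  # assumption: tunable per cloud
    failures: List[float] = field(default_factory=list)

    def decide(self, now: float) -> str:
        # Forget failures that have fallen out of the sliding window.
        self.failures = [t for t in self.failures if now - t < self.window_seconds]
        self.failures.append(now)
        if len(self.failures) > self.max_restarts:
            return "service-disable"
        return "restart"
```

Keeping the thresholds as data rather than hard-coding them is one way to leave the policy "totally flexible", as suggested above.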
09:29:07 <ddeja> OK
09:29:20 <aspiers> so I am not sure exactly how it should work
09:29:34 <ddeja> but hm, I still don't see where we would need to make any HTTP calls?
09:29:47 <ddeja> it can still be done from the resource agent IMO
09:30:02 <aspiers> I think maybe the RA should do normal process monitoring *and* recovery, but also send an HTTP message when monitor fails and when starting/ending recovery
09:30:23 <aspiers> then the external engine can do any additional steps
09:30:28 <aspiers> if necessary
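The approach proposed above (the RA keeps doing normal Pacemaker-driven monitoring and recovery, but also notifies an external engine) might look roughly like this. It is a sketch under stated assumptions: `NotifyingAgent`, the event names and the JSON payload are hypothetical and do not reflect the message format agreed in the specs; only the OCF exit codes 0, 1 and 7 are standard. For simplicity it treats every start as a recovery.

```python
import json
import urllib.request
from typing import Callable, Dict

# Standard OCF exit codes returned by Pacemaker resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7

Notifier = Callable[[Dict[str, str]], None]


def http_notifier(url: str) -> Notifier:
    """Build a notifier that POSTs each event as JSON to a (hypothetical) engine URL."""
    def send(event: Dict[str, str]) -> None:
        req = urllib.request.Request(
            url,
            data=json.dumps(event).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # Best effort: a failed notification must not change the RA's result.
        try:
            urllib.request.urlopen(req, timeout=5).close()
        except OSError:
            pass
    return send


class NotifyingAgent:
    """Hypothetical RA logic: Pacemaker still drives monitor/start as usual;
    the agent additionally reports events so an engine can take extra steps."""

    def __init__(self, service: str, is_running: Callable[[], bool],
                 start_service: Callable[[], bool], notify: Notifier):
        self.service = service
        self.is_running = is_running
        self.start_service = start_service
        self.notify = notify

    def _emit(self, event: str) -> None:
        self.notify({"service": self.service, "event": event})

    def monitor(self) -> int:
        if self.is_running():
            return OCF_SUCCESS
        self._emit("monitor_failed")
        return OCF_NOT_RUNNING  # Pacemaker then performs its normal recovery

    def start(self) -> int:
        # Report the beginning and end of (re)starting the service.
        self._emit("recovery_started")
        ok = self.start_service()
        self._emit("recovery_finished" if ok else "recovery_failed")
        return OCF_SUCCESS if ok else OCF_ERR_GENERIC
```

In a real deployment `notify` could be `http_notifier(...)`; passing a plain callable keeps the RA logic testable without a live endpoint.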
09:30:39 <ddeja> that should work
09:30:53 <aspiers> beekhof also suggested that every RA stop also does service-disable, and start does service-enable
09:31:31 <aspiers> the problem is that the RA doesn't know if the stop is due to failed monitor or just a clean stop
09:31:38 <aspiers> maybe it can track that internally
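One way the RA could "track that internally" is sketched below. Because each OCF action runs as a separate process invocation, in-memory state does not survive between monitor and stop, so this sketch persists a marker file instead. `StopReasonTracker`, the marker file name and the `disable`/`enable` hooks (which might wrap nova's service-disable/service-enable) are hypothetical assumptions, not part of any spec.

```python
import os
from typing import Callable


class StopReasonTracker:
    """Hypothetical helper: remembers whether the last monitor failed, so that
    stop can tell a failure-driven stop from a clean one."""

    def __init__(self, state_dir: str, disable: Callable[[], None],
                 enable: Callable[[], None]):
        self.marker = os.path.join(state_dir, "monitor_failed")
        self.disable = disable  # e.g. a wrapper around service-disable
        self.enable = enable    # e.g. a wrapper around service-enable

    def _clear(self) -> None:
        if os.path.exists(self.marker):
            os.remove(self.marker)

    def record_monitor(self, healthy: bool) -> None:
        if healthy:
            self._clear()
        else:
            # Create an empty marker file that later RA invocations can see.
            with open(self.marker, "w"):
                pass

    def on_stop(self) -> bool:
        """Disable the service only if this stop followed a failed monitor."""
        failed = os.path.exists(self.marker)
        if failed:
            self.disable()
        self._clear()
        return failed

    def on_start(self) -> None:
        self._clear()
        self.enable()
```

Compared with beekhof's simpler suggestion (every stop disables), this variant avoids disabling the service on a clean, administrator-initiated stop.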
09:31:57 <aspiers> well, I'll try to write the spec and in the process hopefully we can figure out the best option
09:32:04 <ddeja> OK
09:32:05 <aspiers> but if you have ideas meanwhile, please tell me
09:32:10 <ddeja> sure
09:32:11 <aspiers> thanks
09:32:20 <aspiers> OK, I have to go soon - anything else from you?
09:32:24 <aspiers> e.g. mistral updates?
09:32:44 <ddeja> not really
09:32:56 <aspiers> OK no problem
09:32:58 <ddeja> last week I was focused on a presentation for a meetup
09:33:09 <aspiers> oh yeah, I heard it went well
09:33:16 <ddeja> and in mistral I'm working mostly on the gate
09:33:24 <aspiers> ok
09:33:38 <ddeja> aspiers: yes, it went OK
09:33:57 <ddeja> and the fact that the demo failed at a middle step didn't break everything ;)
09:34:02 <aspiers> cool
09:34:04 <aspiers> lol
09:34:13 <aspiers> it's not a real demo if it works ;-)
09:34:31 <ddeja> Roman told me to get a printed chicken
09:34:37 <aspiers> and rip it up before?
09:34:41 <aspiers> I have done that in the past
09:34:41 <ddeja> yes
09:34:43 <aspiers> it works ;-)
09:34:45 <ddeja> but I didn't
09:34:48 <aspiers> well
09:34:52 <aspiers> so it is your fault ;-)
09:34:53 <ddeja> so that must be the case
09:34:56 <aspiers> haha
09:34:58 <aspiers> proof!
09:35:14 <ddeja> I'll do it next time
09:35:26 <aspiers> good ;-)
09:35:34 <ddeja> also, next time I'll be jumping through commits instead of live coding ;)
09:35:41 <aspiers> lol
09:35:49 <aspiers> wow, live coding - nice :)
09:35:51 <aspiers> I did that once
09:35:55 <aspiers> it was fun
09:36:08 <aspiers> BTW when you review my host monitoring spec please let me know what you think about the section structure
09:36:13 <ddeja> OK
09:36:15 <aspiers> it is different to the others
09:36:30 <aspiers> I need to update the template to match, so that tests pass
09:36:38 <aspiers> OK I have to go now
09:36:42 <ddeja> no problem
09:36:44 <aspiers> thanks a lot!
09:36:49 <ddeja> thank you
09:37:00 <aspiers> see you on #openstack-ha, bye :)
09:37:16 <aspiers> #endmeeting