09:08:32 #startmeeting ha
09:08:32 Meeting started Wed Dec 14 09:08:32 2016 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:08:33 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:08:35 The meeting name has been set to 'ha'
09:08:37 alright then
09:08:55 just two of us today
09:09:01 I guess the day change confused people
09:09:05 maybe
09:09:17 but we don't see many people from Barcelona
09:09:45 #topic specs
09:10:04 so I see sampath updated his VM recovery spec
09:10:33 that's good
09:10:42 today I finally have some time to review them
09:10:48 oh great
09:11:03 I don't know how it would go however
09:11:06 yeah, it is still missing the important part though
09:11:19 * ddeja spent 11 hours on the train yesterday instead of 6...
09:11:24 ouch :-(
09:11:38 the most important parts of these specs are the interface points
09:11:44 aspiers: agree
09:11:45 to ensure compatibility between the components
09:12:50 OK
09:13:10 so that bit should be covered for compute node monitoring/recovery now
09:13:29 since my spec and your spec agree on the format of the message to pass
09:14:02 I have been thinking about the libvirt and nova-compute OCF RA specs
09:14:17 OK
09:14:29 in Barcelona I agreed with sampath that these should simply send failures to an external component
09:14:38 which decides what to do
09:14:56 but the challenge is how that decision should be handled
09:15:06 because Pacemaker needs to handle it
09:16:21 aspiers: BTW, I found this last week https://review.openstack.org/#/c/389103/2
09:17:05 ddeja: yes I talked to Michele and Andrew about that in Barcelona
09:17:11 oh
09:17:14 OK
09:17:21 can you give a little update?
09:17:35 they said don't worry, it's just covering what we are already doing, from a triple-o PoV
09:17:48 so not a new solution
09:18:08 just about getting triple-o to automatically set up instance HA
09:18:25 oh, OK
09:18:37 so it would be using resource agents?
09:19:24 yes I guess so
09:19:26 for now
09:19:34 or maybe mistral
09:19:39 I don't know
09:19:44 OK
09:19:46 really RH are driving that
09:20:19 but if it's not about implementing something new, then it is OK
09:20:42 yeah
09:20:46 well that's what they said
09:21:02 but I would suggest pinging them and asking about mistral
09:21:18 sure
09:21:22 thanks
09:21:40 we should also review their spec ...
09:22:02 I've just added myself as a reviewer
09:22:13 it seems like today is the review day
09:22:29 cool
09:22:41 I'll aim for that too
09:22:46 OK
09:23:12 last month I was forced to focus much more on customers :(
09:23:31 but closer to Xmas it should hopefully get quieter, leaving more time for this
09:24:10 if you have any ideas about the process monitoring specs then please let me know
09:24:21 good (for the OpenStack community, maybe not for the customers)
09:24:25 aspiers: of course
09:24:25 hehe
09:24:39 I don't think it can work with HTTP messages like with host recovery
09:25:00 since Pacemaker has to initiate the monitoring and also receive the results, synchronously
09:25:42 aspiers: yes, but it can also then send an HTTP message
09:26:24 the question is whether it should send an HTTP message every time
09:26:31 and when it should attempt recovery
09:26:45 only based on the HTTP response, or whenever it normally would recover?
09:27:14 in the latter case, the recovery workflow engine could perform additional recovery
09:27:21 aspiers: please remind me, process recovery is about restarting libvirt/nova-compute?
09:27:30 that's the first part
09:27:39 but it could take stronger action after several failures
09:27:50 like service-disable
09:28:05 but, hm
09:28:19 all of those could be performed inside the resource agent I guess
09:28:39 yes but the idea was that the spec for monitoring should leave that totally flexible
09:28:47 so that each cloud can decide its own policy
09:29:07 OK
09:29:20 so I am not sure exactly how it should work
09:29:34 but hm, I still don't see a place where we should perform any HTTP calls?
09:29:47 it can still be done from the resource agent IMO
09:30:02 I think maybe the RA should do normal process monitoring *and* recovery, but also send an HTTP message when the monitor fails and when starting/ending recovery
09:30:23 then the external engine can take any additional steps
09:30:28 if necessary
09:30:39 that should work
09:30:53 beekhof also suggested that every RA stop also does service-disable, and start does service-enable
09:31:31 the problem is that the RA doesn't know if the stop is due to a failed monitor or just a clean stop
09:31:38 maybe it can track that internally
09:31:57 well, I'll try to write the spec and in the process hopefully we can figure out the best option
09:32:04 OK
09:32:05 but if you have ideas meanwhile, please tell me
09:32:10 sure
09:32:11 thanks
09:32:20 OK, I have to go soon; anything else from you?
09:32:24 e.g. mistral updates?
09:32:44 not really
09:32:56 OK no problem
09:32:58 last week I was focused on a presentation for the meetup
09:33:09 oh yeah, I heard it went well
09:33:16 and in mistral I'm working mostly on the gate
09:33:24 ok
09:33:38 aspiers: yes, it went OK
09:33:57 and the fact that the demo failed at a middle step didn't break everything ;)
09:34:02 cool
09:34:04 lol
09:34:13 it's not a real demo if it works ;-)
09:34:31 Roman told me to get a printed chicken
09:34:37 and rip it up before?
09:34:41 I have done that in the past
09:34:41 yes
09:34:43 it works ;-)
09:34:45 but I didn't
09:34:48 well
09:34:52 so it is your fault ;-)
09:34:53 so that must be the case
09:34:56 haha
09:34:58 proof!
09:35:14 I'll do it next time
09:35:26 good ;-)
09:35:34 also, next time I'll be jumping through commits instead of live coding ;)
09:35:41 lol
09:35:49 wow, live coding, nice :)
09:35:51 I did that oncde
09:35:53 once
09:35:55 it was fun
09:36:08 BTW when you review my host monitoring spec please let me know what you think about the section structure
09:36:13 OK
09:36:15 it is different from the others
09:36:30 I need to update the template to match, so that tests pass
09:36:38 OK I have to go now
09:36:42 no problem
09:36:44 thanks a lot!
09:36:49 thank you
09:37:00 see you on #openstack-ha, bye :)
09:37:16 #endmeeting
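The process-monitoring idea floated at 09:30:02 to 09:31:38 can be sketched as follows. The RA keeps doing Pacemaker's normal monitor/stop/start cycle but also notifies an external recovery engine, and, following beekhof's suggestion, every stop does service-disable and every start does service-enable. This is a minimal Python illustration under stated assumptions, not an RA from any spec or from the resource-agents project. The class, the `is_running`/`notify` callbacks and the event names are hypothetical. The in-memory `monitor_failed` flag stands in for the "track that internally" idea; a real RA is invoked as a separate process per action, so it would have to persist that state, e.g. in a state file. Only the OCF exit codes 0 (`OCF_SUCCESS`) and 7 (`OCF_NOT_RUNNING`) are standard.

```python
from typing import Callable, List

# Standard OCF resource agent exit codes used by Pacemaker.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


class ProcessRecoveryAgent:
    """Hypothetical decision logic for a libvirt / nova-compute RA.

    Pacemaker still drives monitor/stop/start synchronously; the agent
    additionally reports events to an external recovery engine.
    """

    def __init__(self, is_running: Callable[[], bool],
                 notify: Callable[[str], None]) -> None:
        self.is_running = is_running  # assumed process check (pid, systemd, ...)
        self.notify = notify          # assumed HTTP notifier to the engine
        self.actions: List[str] = []  # records service-disable/enable instead of calling nova
        # Tracks whether the next stop is part of recovery or a clean stop.
        # A real RA runs as a new process per action, so this would live in a state file.
        self.monitor_failed = False

    def monitor(self) -> int:
        if self.is_running():
            return OCF_SUCCESS
        self.monitor_failed = True
        self.notify("monitor-failed")
        return OCF_NOT_RUNNING

    def stop(self) -> int:
        self.actions.append("service-disable")  # every stop disables the service
        if self.monitor_failed:
            self.notify("recovery-started")     # only failure-driven stops are reported
        return OCF_SUCCESS

    def start(self) -> int:
        self.actions.append("service-enable")   # every start re-enables it
        if self.monitor_failed:
            self.notify("recovery-finished")
            self.monitor_failed = False
        return OCF_SUCCESS


if __name__ == "__main__":
    events: List[str] = []
    state = {"up": False}
    agent = ProcessRecoveryAgent(lambda: state["up"], events.append)
    agent.monitor()       # process is down: the failure is reported
    agent.stop()          # Pacemaker recovery: stop ...
    state["up"] = True
    agent.start()         # ... then start
    print(events)         # ['monitor-failed', 'recovery-started', 'recovery-finished']
    print(agent.actions)  # ['service-disable', 'service-enable']
```

With this split, a clean stop still disables the service but sends nothing to the engine, so the engine only sees failure-driven recovery and stays free to apply whatever per-cloud policy it likes.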