09:00:26 #startmeeting HA: automated recovery from hypervisor failure
09:00:27 Meeting started Mon Nov 16 09:00:26 2015 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:28 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:30 The meeting name has been set to 'ha__automated_recovery_from_hypervisor_failure'
09:00:58 hi
09:01:04 Hi all, and welcome to the (first ever?) official IRC meeting about OpenStack HA!
09:01:11 o/
09:01:11 <_gryf> :)
09:01:16 hi o/
09:01:20 hi \o
09:01:22 here is the etherpad we will be referring to: https://etherpad.openstack.org/p/automatic-evacuation
09:01:22 hello
09:02:13 this meeting was arranged off the back of discussions in Tokyo on hypervisor HA
09:02:37 we have a few existing approaches within the community, and we were interested in trying to converge on one
09:03:01 the etherpad lists all known existing approaches, plus some ideas on new approaches
09:03:11 and probably put it in the HA guide as recommended :)
09:03:17 exactly :)
09:03:38 I think today we only have 30 mins or so, so I don't expect us to reach a complete plan for convergence :)
09:03:44 although if that happens I will not complain ;)
09:04:07 but it's more about setting a general direction for how work should continue
09:04:39 should I give a very brief history for those who are new to this area?
09:05:08 <_gryf> aspiers, please, do :)
09:05:46 OK. The first approach on the etherpad (masakari) is from NTT and has some very nice features.
09:06:07 The second was born out of conversations between Red Hat and Intel, and has a similar design.
09:06:48 The third was originally by Red Hat, and also used by SUSE. This one exposed some weaknesses in the design which the other two are trying to address.
09:06:53 ok, i'm here too
09:07:00 Mainly the unreliability of evacuation
09:07:09 too many channels :)
09:07:15 what are we talking about?
09:07:17 hey beekhof :) you didn't miss much but I guess you can catch up via the logs on the web
09:07:26 I'm just summarising the status quo for the benefit of newcomers
09:07:34 k
09:08:08 then we have an entirely different approach from AWcloud and ChinaMobile which was presented in Tokyo
09:08:26 it has some good ideas which a best-of-breed solution should probably incorporate
09:08:52 but it is fundamentally different (Pacemaker not used)
09:08:55 ChinaMobile was tristack?
09:08:59 could we define the evacuation term more precisely? in terms of fencing as well. Is it just STONITHing the host hypervisor node and relaunching the instances in another place, which would be like a reboot from the instances' PoV?
09:09:09 so I guess it is impossible to converge with that codebase
09:09:10 or is it live migration?
09:09:18 it is not live migration
09:09:28 so we have a problem with terminology unfortunately
09:09:42 a long time ago the nova project chose the misnomer "evacuate"
09:09:45 <_gryf> bogdando, evacuation is basically rebuilding the vm on another host. it's a post-mortem process
09:09:53 at the point a compute node is dead - there is nowhere to live migrate from
09:09:59 exactly. it's really "resurrect" not "evacuate"
09:10:01 okay, so it is like a reboot for its apps
09:10:22 in Vancouver it was proposed to fix this naming by renaming to resurrect
09:10:45 well in the guide that would be just a note, so everything would be clear
09:10:47 but since then nothing has happened, and I spoke to Dan Smith about it who said he was -1.99 on the rename due to the impact
09:10:57 so it's unlikely to happen any time soon
09:11:15 so yes, we'll just have to be clear what we mean, especially in docs
09:11:17 not a problem, just to be on the same page here
09:11:37 bogdando: thanks for the clarification, I had forgotten to mention that :)
09:12:20 it seems all methods are about pacemaker
09:12:42 <_gryf> bogdando, not really
09:12:50 and I like it personally. But do we want to do R&D around its fancy alternatives?
09:13:12 that's a good question
09:13:30 what alternatives are there?
09:13:52 BTW I suspect most people in this meeting want to stick with Pacemaker but we can put it to the vote for sure
09:13:57 note, we probably may want to sync with the DLM initiative
09:14:05 keepalived is about all (from a high-level view, not the instance HA part)
09:14:16 if they decide to stick with zookeeper or etcd or consul
09:14:30 the AWcloud/ChinaMobile approach uses consul/raft/gossip
09:14:51 so to keep ops less frustrated it would be nice for us to use the same solution
09:14:52 andreaf: it was easy to rename BaseComputeTest to BaseV2ComputeTest due to number of lines
09:14:57 frankly, moving away from Pacemaker would be too much rework for SUSE
09:15:09 fwiw, i spent most of the day starting to pull in a lot of the content from https://github.com/beekhof/osp-ha-deploy into the ha-guide
09:15:17 so it's pretty unlikely we'd do that
09:15:19 just to not introduce yet another control plane
09:15:40 * oomichi sorry for interrupting.
09:15:47 redhat is staying with pacemaker too
09:16:08 also the upstream HA guide is already Pacemaker-based
09:16:21 and so is Masakari
09:16:24 NTT also plans to use pacemaker.
09:16:26 yes, pacemaker is already part of the reference arch
09:17:24 of course anyone is free to do R&D around other technologies at any time
09:17:53 everyone is entitled to their opinions, as long as they understand they are wrong :)
09:17:57 lol :)
09:18:21 and make sure it would not diverge from the control plane / distributed consensus solution picked by the DLM initiative, I'd say
09:18:46 given that 1) we already have a very good platform based on Pacemaker and 2) we are pretty confident we can implement solid hypervisor HA on top of it, I'm not sure it makes much sense for us to research other options
09:18:54 unless it's to steal ideas :)
09:18:58 I mean ops would hardly be happy to see pacemaker and consul and zookeeper in one deploy
09:19:18 bogdando: agreed
09:19:28 one cluster manager is enough :)
09:19:43 let's push pacemaker to the DLM topic! :)
09:19:49 sorry for the offtopic
09:19:59 :)
09:20:24 so are we agreed to stick with Pacemaker for now?
09:20:40 It seems so
09:20:43 <_gryf> yup
09:21:02 well i'm not going to say no
09:21:03 agree
09:21:12 #agreed hypervisor HA solution will be based on Pacemaker
09:21:34 so, the most obvious path to me is to converge masakari with evacuationd
09:21:58 <_gryf> aspiers, how about the mistral solution?
09:21:58 Nova{Compute,Evacuate} RAs work fine with known limitations, but are kind of a dead end, right?
09:22:12 _gryf: oh yes, sorry I forgot about that
09:22:18 note, there are kilometers of bash; would it be a well-maintainable solution?
09:22:28 I mean masakari
09:22:34 bogdando: IIRC both masakari and evacuationd are Python?
09:22:42 we may want the bats tests at least, maybe
09:23:04 well, as I understood from the repo, it is pure bash?
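The fence-then-rebuild flow defined earlier (STONITH the dead hypervisor, then relaunch its instances elsewhere, which looks like a reboot from the instances' point of view) reduces to a small control loop. A minimal sketch, where `fence_host` and `rebuild_instance` are hypothetical stand-ins for STONITH and Nova's evacuate call:

```python
# Sketch of the "evacuate"/"resurrect" flow discussed above. The two
# callables are hypothetical stand-ins: fence_host for STONITH, and
# rebuild_instance for the Nova evacuate (rebuild-on-another-host) API.
def recover_host(host, instances, fence_host, rebuild_instance):
    """Fence a dead hypervisor, then resurrect its instances elsewhere."""
    if not fence_host(host):
        # Never resurrect instances from an unfenced host: it may still
        # be alive and writing to shared storage.
        raise RuntimeError("fencing of %s failed, aborting recovery" % host)
    # Rebuild each instance on another host (a "reboot" from its PoV).
    return [rebuild_instance(vm) for vm in instances]
```

The ordering is the whole point: fencing must succeed before any rebuild starts, which is why this is not live migration (there is nowhere to migrate from once the node is dead).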
09:23:09 I am strongly in favour of the solution being mainly in Python
09:23:20 <_gryf> evacuationd is in pure python
09:23:20 bogdando: ah, I didn't look at the code yet
09:23:41 no, masakari has a bunch of Python
09:23:46 73.5% according to github
09:23:52 I like bash, but we should take care of unit tests as well
09:23:59 hm, ok then
09:24:24 example https://github.com/ntt-sic/masakari/blob/master/masakari-hostmonitor/hostmonitor/hostmonitor.sh
09:24:24 Only masakari's RAs are based on bash. Masakari's controller is written in Python.
09:24:46 I think both Masakari and evacuationd do a pretty good job of using standard OpenStack tech
09:25:15 bogdando: most pacemaker agents are written in bash. they can be in python but miss out on the common library functions
09:25:32 true, although most RAs are very simple and quite short
09:25:35 so bash is tolerable
09:25:43 that's why you'd be seeing bash at all, i imagine
09:25:55 right
09:26:35 <_gryf> otoh openstack is mainly in python, so interacting with nova would make much more sense if we go with python
09:26:39 I'm not sure if it is limited to the RAs for masakari
09:26:59 _gryf: agreed
09:27:30 bash for RAs is accepted practice, agree
09:27:35 but for the rest?
09:27:44 question for the Intel guys: how attached are you to evacuationd? e.g. would you be ok with the idea of switching to masakari and then porting over any features from evacuationd which it is missing?
09:28:16 <_gryf> we have no problem with deprecating evacuationd
09:29:09 IIRC, masakari already persists to a database which is nice. Does it use SQLAlchemy? I can't remember
09:29:14 is one clearly better than the other?
09:29:32 there are pros and cons to both
09:29:39 Masakari uses SQLAlchemy to access the DB.
09:29:46 as listed in the etherpad
09:29:59 I think the main thing which would need fixing with masakari is to port it to use pacemaker_remote
09:30:02 so that it can scale
09:30:09 currently compute nodes are grouped into 16s
09:30:13 regarding masakari "corosync's scaling limits, compute nodes are grouped into 16-node clusters" - why not use pacemaker-remote?
09:30:19 but I doubt that would be hard
09:30:32 i think that would be the first thing RH would want to change :)
09:30:38 and SUSE :)
09:30:53 masahito: any problem there?
09:31:26 aspiers, oops, you were first
09:31:39 I don't have any problem with changing to pacemaker remote.
09:31:43 <_gryf> other than that, there would be a need for selecting certain vms for resurrecting - we already have that in evacuationd
09:32:00 _gryf: good point
09:32:04 _gryf: sounds nice, how would that work?
09:32:25 aspiers: attribute on the instance in nova iirc
09:32:32 <_gryf> we had it implemented through the flavor extra specs and vm metadata
09:32:57 _gryf: did you get around to using a db for persistence in evacuationd?
09:32:59 oh, you mean something a bit like availability zones?
09:33:08 _gryf: e.g. selecting which VMs are pets vs. cattle?
09:33:44 what about the MQ? Does masakari use it?
09:33:46 <_gryf> beekhof, we didn't make any changes due to the discussion we had
09:33:53 <_gryf> aspiers, right
09:33:57 bogdando: I think they use HTTP requests
09:34:19 should we also keep in mind the Mistral alternative in the list? Looks like a high-level coordination for masakari as well
09:34:27 yes
09:34:31 +1
09:34:49 what's the first action we could take regarding Mistral?
09:34:57 how would we want to proceed with a PoC?
09:35:02 I guess it's still in the R&D phase regarding hypervisor HA
09:35:20 right now I'm working on a PoC using mistral that will auto-evacuate VMs
09:35:30 cool!
09:35:31 _gryf: how far off is that project from conceptually being able to support what we want?
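The pets-vs-cattle selection _gryf describes (choosing which VMs to resurrect via flavor extra specs and VM metadata) can be sketched as a simple filter. The `evacuation` key name here is a hypothetical illustration; the log does not state the actual keys evacuationd uses:

```python
# Sketch of selecting "pet" instances for resurrection, assuming a
# hypothetical "evacuation" flag in either the instance metadata or the
# flavor extra specs (the real key names are not given in the log).
def select_for_resurrection(instances):
    """Return IDs of instances explicitly tagged for HA recovery."""
    pets = []
    for inst in instances:
        tagged = (inst.get("metadata", {}).get("evacuation") == "true"
                  or inst.get("flavor_extra_specs", {}).get("evacuation") == "true")
        if tagged:
            pets.append(inst["id"])
    return pets  # untagged "cattle" instances are simply left dead
```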
09:35:32 <_gryf> we have this almost working :)
09:35:37 wow
09:35:39 ddeja, would be nice to join your efforts
09:35:40 awesome
09:35:43 if possible
09:36:04 ddeja: perhaps you can give a quick summary of how that works and where you are with it?
09:36:05 so do we park this thread until we hear how the mistral PoC went?
09:36:19 any action item then?
09:36:26 bogdando: no problem, I can share the code on github
09:36:32 beekhof: I suggest we hear some details first before deciding
09:36:33 great!
09:36:42 yes please, github would be great
09:37:23 ddeja / _gryf: are you able to quickly summarise now?
09:37:28 aspiers, action item please?
09:37:31 <_gryf> aspiers, yup
09:37:36 <_gryf> so the idea is simple
09:37:52 <_gryf> prepare a small action class (in python)
09:37:58 ok, i need to head out guys (it's my wedding anniversary and it's getting late here)
09:38:04 #action ddeja will share mistral PoC code on github
09:38:05 <_gryf> plus the workflow (which is basically a yaml)
09:38:12 i'll read up on the weblog though
09:38:27 beekhof: whoa, run before you get in trouble ;-) congrats and thanks for attending!
09:38:32 <_gryf> and then proceed like with the other solution - trigger it from pacemaker
09:38:50 beekhof, congrats!
09:38:53 <_gryf> mistral would take care of the evacuation
09:38:53 beekhof: congrats!
09:38:58 _gryf: this action class is needed only for selecting which VM should be evacuated (so resolving the pet vs cattle problem)
09:39:26 <_gryf> ddeja, right, but it's one of our assumptions
09:40:15 could this also help us to build the solution with solid logging for events? http://blog.clusterlabs.org/blog/2015/reliable-notifications/
09:40:22 <_gryf> and the main point regarding mistral is that it's quite a stable project now
09:40:51 that sounds cool to me, looking forward to seeing more details
09:41:46 <_gryf> as soon as we have the poc working, I'll add a more detailed description to the etherpad.
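The Mistral design _gryf outlines above (a small Python action class plus a YAML workflow, triggered from Pacemaker) might look roughly like the fragment below. This is a sketch only: `ha.select_instances` is a hypothetical custom action implementing the pet-vs-cattle filter, and the `nova.servers_evacuate` name assumes Mistral's auto-generated `nova.*` action bindings.

```yaml
version: '2.0'

resurrect_from_host:
  description: Rebuild instances away from a fenced compute host (sketch).
  input:
    - host
  tasks:
    select_instances:
      # hypothetical custom action: returns the "pet" VMs on the dead host
      action: ha.select_instances host=<% $.host %>
      publish:
        instances: <% task(select_instances).result %>
      on-success:
        - evacuate
    evacuate:
      # assumed auto-generated binding to Nova's evacuate call
      with-items: vm in <% $.instances %>
      action: nova.servers_evacuate server=<% $.vm %>
```

Pacemaker would only need to trigger this workflow when it detects a host failure; Mistral then takes care of the evacuation itself, as _gryf says.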
09:42:06 #action _gryf will update the etherpad with more details of the mistral PoC
09:42:14 <_gryf> +1
09:42:17 <_gryf> :)
09:42:19 :)
09:43:07 so do we want to work in parallel on masakari / mistral, or hold off for a short while?
09:43:30 My vote is for the Mistral PoC as well
09:43:43 masakari has three levels of monitoring which is really nice - could mistral do that too?
09:44:17 and can mistral itself be made HA?
09:44:25 if not, how much work is that?
09:44:38 obviously there's no point designing an HA system around a component which is a SPoF :)
09:44:59 just trying to figure out what's likely in the short vs. long term
09:45:19 <_gryf> aspiers, we can make it monitor all 3 levels - actually, we can do it through pacemaker
09:45:35 I believe the pacemaker-based HA solution for Mistral would be the same as for the rest of the OpenStack projects
09:45:52 bogdando: so it's basically stateless?
09:45:54 A/P or A/A for multiple API instances, what else?
09:46:05 all state in the DB I guess?
09:46:08 depends on whether it is stateless, yes
09:46:11 <_gryf> regarding ha - there is a priority to make mistral HA in the mitaka cycle
09:46:21 ok cool
09:47:05 _gryf, OCF_CHECK_LEVEL?
09:47:17 are we talking about those levels?
09:47:35 we could make as many of them as we want
09:47:57 <_gryf> bogdando, look at the masakari project - there are 3 levels of failure check
09:48:11 <_gryf> vm, process (libvirt/compute) and host
09:48:15 I'd like to know how much potential there is for convergence between the Mistral PoC and Masakari, but maybe it's too early to discuss that and we should wait for more details of the Mistral PoC first?
09:48:28 perhaps
09:48:43 like I said, I didn't expect us to solve everything in the first meeting ;-)
09:48:48 <_gryf> aspiers, I would wait
09:49:04 anyway, we didn't do badly already :)
09:49:15 right :)
09:49:43 I think we can continue the architectural conversations on #openstack-ha and the openstack-dev mailing list, right?
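For reference, the three failure-detection levels _gryf lists can be summarised as a small mapping. The recovery descriptions are an interpretation of the discussion, not masakari's actual code:

```python
# Masakari's three failure-check levels as listed above (vm, process,
# host), mapped to the recovery each one would plausibly trigger.
# The action strings are interpretive, not taken from masakari itself.
FAILURE_LEVELS = {
    "vm":      "restart the instance in place",
    "process": "restart nova-compute / libvirtd via the cluster manager",
    "host":    "fence the node, then resurrect its instances elsewhere",
}

def recovery_for(level):
    """Look up the recovery action for a detected failure level."""
    return FAILURE_LEVELS[level]
```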
09:49:53 <_gryf> aspiers, sure
09:50:06 #openstack-ha is logged, so no one has to miss discussions
09:50:32 also, please make sure to include the "[HA]" Subject: prefix for discussions on openstack-dev
09:50:58 ttx configured mailman for this prefix so you can even do server-side filtering for that topic now
09:51:47 masahito: once more details of the Mistral PoC are released, would you be able to look at it and assess the possibility of convergence?
09:52:02 aspiers: yes.
09:52:17 great, thanks!
09:52:22 I think we would converge both as a long-term goal
09:52:28 agreed :)
09:52:52 btw, which do you all find easier for pushing code to Masakari - stackforge or github?
09:52:55 #action masahito will investigate possibility of converging masakari with the Mistral PoC, once details of the latter are published
09:53:20 masahito: I guess gerrit/stackforge is preferred
09:53:36 <_gryf> for a poc level I think gh is alright
09:53:47 true, either works
09:54:09 if we need to set anything up on gerrit, I can help with that
09:54:15 <_gryf> then, if we decide to go either way, gerrit/stackforge will be the way :)
09:54:28 I already went through that process for openstack-resource-agents so I know how to do it
09:54:35 <_gryf> great
09:54:41 aspiers: great
09:55:09 Should I start moving the repo to stackforge?
09:55:17 masahito: also, would you be able to investigate what work would be needed to switch masakari to pacemaker_remote?
09:55:44 aspiers: not yet, because we haven't tried it.
09:55:59 masahito: I would guess no rush to move to stackforge yet, but of course you can if you want
09:56:17 aspiers: meaning we've just thought about the idea, but not implemented it.
09:56:17 ok, we are approaching the end of the 60 minutes
09:56:38 aspiers: ok. I'll wait for a suitable time :)
09:56:49 masahito: right. if you have ideas on a pacemaker_remote implementation then please share them on IRC or the mailing list
09:57:05 anybody want to raise anything else before we close?
09:57:32 otherwise let's continue on #openstack-ha and openstack-dev (with "[HA]" in the Subject: line :)
09:57:38 <_gryf> it was a fruitful meeting :) thx everyone :)
09:58:00 yeah great first meeting, thanks a lot everyone!
09:58:46 bye!
09:58:56 #agreed we'll continue discussion on #openstack-ha and openstack-dev (with "[HA]" in the Subject: line :)
09:59:28 thanks everyone, see you same time/place next week!
09:59:37 #endmeeting