09:00:52 <aspiers> #startmeeting ha
09:00:54 <openstack> Meeting started Mon Mar  7 09:00:52 2016 UTC and is due to finish in 60 minutes.  The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:55 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:58 <openstack> The meeting name has been set to 'ha'
09:01:15 <aspiers> OK let's get started
09:01:28 <haukebruno> hi everyone :)
09:01:29 <aspiers> we've had apologies from beekhof who is not feeling well
09:01:40 <aspiers> hi haukebruno
09:02:02 <aspiers> #topic introductions
09:02:14 <aspiers> haukebruno: since we are a small group, would you like to introduce yourself?
09:03:14 <aspiers> since I don't think we saw you in these meetings before
09:03:46 <haukebruno> sure. my name is hauke, 27 years old and from germany. I work in a small private cloud startup near frankfurt and we have been using openstack for 2.5 years now. before that I used a lot of the HA stuff around pacemaker/corosync and haproxy, that's the main reason why I joined this meeting/channel
09:04:01 <aspiers> great! really glad to have you join :)
09:04:08 <haukebruno> ah, + I am poorly a 100% ops guy
09:04:12 <aspiers> haha
09:04:18 <masahito> haukebruno: welcome
09:04:24 <haukebruno> thanks masahito
09:04:25 <aspiers> I'm from SUSE, masahito is from NTT, and ddeja / _gryf are from Intel
09:04:37 <haukebruno> nice to meet you guys
09:04:50 <aspiers> we are all working on HA for our companies, and currently focusing quite a bit on compute node HA
09:05:04 <aspiers> #topic Current status (progress, issues, roadblocks, further plans)
09:05:13 <haukebruno> very nice, I guess compute node HA is the most wanted kind of HA these days
09:05:13 <aspiers> OK, quick update from me:
09:05:24 <aspiers> yes it seems in demand :)
09:05:53 <aspiers> I did some reviewing of changes from Norbert Illes on openstack-resource-agents
09:06:10 <aspiers> we have been tidying up the code and now it passes bashate 100%
09:06:17 <aspiers> next step will be to add some basic CI
09:06:52 <aspiers> #info https://bugs.launchpad.net/openstack-resource-agents/+bug/1550203 (bashate violations in OCF scripts) is now fixed
09:06:52 <openstack> Launchpad bug 1550203 in openstack-resource-agents "Bashate violations in OCF scripts" [Medium,Fix committed] - Assigned to Norbert Illes (nilles)
09:07:21 <aspiers> I think that's the only interesting news I have this week
09:08:07 * _gryf passes the baton to ddeja :)
09:08:11 <aspiers> :)
09:08:25 <ddeja> ok
09:08:59 <ddeja> so I was on medical leave - today is my first day at the office in March
09:09:09 <Qiming> so the basic assumption when we are talking about HA is still about pacemaker/corosync?
09:09:19 <aspiers> hi Qiming
09:09:33 <Qiming> hi, everyone
09:09:39 <Qiming> just sneaked in
09:09:40 <aspiers> Qiming: not necessarily, but at the moment all topics usually involve pacemaker
09:09:57 <Qiming> got it
09:09:57 <ddeja> I was thinking a little about a demo of Tenants HA using Mistral, and I'm about to prepare a short film to present how it works
09:10:05 <ddeja> that's all from my side
09:10:12 <aspiers> ddeja: great, looking forward to seeing that :)
09:10:17 <aspiers> a short film would be really useful
09:10:19 <masahito> ddeja: sounds nice.
09:10:33 <ddeja> thanks guys :)
09:10:41 <aspiers> masahito: same for masakari ;-)
09:10:58 <aspiers> masahito: any updates from your side?
09:11:24 <masahito> I don't have any update to report to the team.
09:11:28 <aspiers> ok
09:11:51 <aspiers> haukebruno / Qiming: anything you want to share under the current #topic ?
09:12:03 <aspiers> any info on current work or future plans?
09:12:09 <aspiers> no problem if not
09:12:11 * Qiming is wondering who pushed him into the room ...
09:12:17 <haukebruno> not from my side sadly
09:12:17 <aspiers> haha
09:13:02 <aspiers> ok
09:13:03 <Qiming> aspiers, seriously, I'm from Heat and Senlin team, we have some thoughts, design, prototype on VM/App HA
09:13:41 <aspiers> Qiming: oh, well we definitely want to hear about that then!
09:13:45 <haukebruno> maybe important for you: I am pretty new to the community side of openstack, I want to contribute as well as I can, but have no idea about the 'how'
09:13:49 <Qiming> we are working with some NFV guys, soliciting their requirements on workload HA
09:14:19 <aspiers> #topic VM/App HA work within Heat and Senlin teams
09:14:50 <Qiming> previously we tried to inject some HA mechanisms into Heat
09:14:57 <aspiers> Qiming: ok, hopefully we can work together with you on this? one of the big goals of these meetings is to try to converge efforts in the long term
09:15:13 <Qiming> but the proposal was rejected because it doesn't align well with Heat's mission, which is a pure orchestrator
09:15:24 <Qiming> aspiers, definitely
09:15:45 <Qiming> then later when we started the Senlin project (a clustering service), now an official project
09:16:02 <aspiers> Qiming: have you seen http://specs.openstack.org/openstack/openstack-user-stories/user-stories/draft/ha_vm.html ?
09:16:03 <Qiming> we tried to get HA designed into the service
09:16:22 <Qiming> no, aspiers, will read that offline
09:16:39 <aspiers> Qiming: also https://etherpad.openstack.org/p/automatic-evacuation
09:17:01 <Qiming> in senlin, our understanding is that behind any HA solution, you need redundancy, which is a cluster
09:17:16 <aspiers> right
09:17:34 <Qiming> and to do HA, you will need to think about three aspects (at least): detection, signaling and recovery
09:17:58 <Qiming> we have prototyped some policies that can be enforced on VM clusters or Heat stack clusters
09:18:02 <aspiers> and recovery requires fencing
09:18:28 <Qiming> aspiers, exactly, we were working with some IBMers from Haifa research lab on this
09:18:29 <aspiers> as well as election if the cluster is decentralized
09:18:55 <aspiers> Qiming: how far along is your prototype?
09:18:57 <Qiming> exactly, it is never a simple solution
09:19:12 <aspiers> Qiming: could you describe the architecture and/or what it achieves, or point us to a URL with docs?
09:20:05 <Qiming> so, back to senlin's prototype, we plan to do failure detection in three ways: 1) periodically polling the VM states from Nova 2) listening to VM lifecycle events 3) querying the load-balancer (health monitor) if the cluster has a load-balancer
09:20:43 <Qiming> http://git.openstack.org/cgit/openstack/senlin/tree/senlin/policies/health_policy.py
09:21:20 <Qiming> that is a skeleton, team is still debating on the details, as always, :)
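The three detection paths Qiming lists (polling, lifecycle events, load-balancer health) can be pictured as independent sources feeding one set of suspect nodes. The following is only a minimal sketch under that assumption; every name in it (`poll_detector`, `event_detector`, `lb_detector`, `detect_failures`, the state and event strings) is hypothetical and is not taken from Senlin's code or API.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical illustration only; none of these names are Senlin APIs.
Detector = Callable[[], Set[str]]  # returns IDs of nodes that look unhealthy


def poll_detector(states: Dict[str, str]) -> Detector:
    """1) Periodic polling: flag nodes whose reported state is not ACTIVE."""
    return lambda: {node for node, state in states.items() if state != "ACTIVE"}


def event_detector(events: List[dict]) -> Detector:
    """2) Lifecycle events: flag nodes named in failure-type events."""
    bad_types = {"shutdown", "delete", "error"}  # assumed event names
    return lambda: {e["node"] for e in events if e["type"] in bad_types}


def lb_detector(member_health: Dict[str, bool]) -> Detector:
    """3) Load-balancer health monitor: flag members reported as down."""
    return lambda: {node for node, ok in member_health.items() if not ok}


def detect_failures(detectors: Iterable[Detector]) -> Set[str]:
    """Union the suspects reported by every configured detector."""
    failed: Set[str] = set()
    for detector in detectors:
        failed |= detector()
    return failed
```

Taking the union is the simplest possible combination rule; a real service might instead require agreement between sources before acting, which the log does not specify.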
09:21:47 <aspiers> ok
09:22:03 <aspiers> I was not aware of Senlin before, so it's great that you joined this meeting to tell us
09:22:11 <Qiming> my pleasure
09:22:26 <Qiming> but I'm not gonna hijack this meeting for a senlin tutorial
09:22:40 <aspiers> Qiming: well, I'm not sure we have much else to discuss today
09:22:45 <aspiers> Qiming: it's a smaller group than usual
09:22:55 <Qiming> okay
09:22:58 <aspiers> Qiming: so I think it's a good use of the time
09:23:08 <Qiming> good to know that
09:23:10 <aspiers> although if anyone else has urgent issues to discuss, please let me know :)
09:23:27 <aspiers> Qiming: how would you describe the main differences to the existing approaches to HA?
09:23:52 <Qiming> it is more customizable, it is not tied to pacemaker/corosync
09:23:55 <aspiers> it seems that this is some kind of "HA as a service"
09:24:22 <Qiming> HA was treated as one of the policies that you can attach to a group of things
09:24:34 <Qiming> a pretty bold simplification
09:24:37 <aspiers> what are the "homogeneous objects" referred to?
09:25:07 <aspiers> the OpenStack infrastructure services, e.g. API endpoints?
09:25:09 <Qiming> a cluster can be a group of nova servers, a group of heat stacks, for instance
09:25:25 <Qiming> but you are not supposed to have a cluster mixed of nova servers and heat stacks
09:25:49 <aspiers> is the idea that Senlin is only used by other OpenStack services? or also by OpenStack end users?
09:26:05 <Qiming> main target is end users
09:26:21 <aspiers> oh
09:26:26 <Qiming> it can be used by other projects as well because we have a REST API
09:26:40 <aspiers> can you give us an example use case?
09:26:46 <Qiming> some friends have helped implement Heat resource types for Senlin
09:27:10 <Qiming> create a cluster of Nova servers, get it load-balanced, make it auto-scale, and ensure HA for the instances
09:27:12 <aspiers> e.g. what would a cluster of nova servers (I assume you mean VMs) look like?
09:28:01 <aspiers> so it would need to have access inside each VM, e.g. to install/configure clustering software?
09:28:22 <aspiers> or would you only monitor from outside the VMs?
09:28:35 <Qiming> when adding new nodes (e.g. VMs), you can decide where those VMs will be created (i.e. affinity or anti-affinity); when deleting existing nodes, you can have a say in which nodes are preferred
09:29:05 <Qiming> senlin is not yet installing any other clustering software into the VMs
09:29:10 <aspiers> ok
09:29:22 <Qiming> but that has been considered as a usage scenario
09:29:27 <aspiers> so decisions on cluster management would be made centrally by Senlin server?
09:29:33 <Qiming> yep
09:29:43 <aspiers> ok, I understand now
09:29:46 <haukebruno> what will happen if one instance fails? spawning another one with the same kind of 'metadata' (you pointed out the affinity thing)?
09:30:21 <Qiming> in a health_policy attached to a cluster, you can specify the recover actions you want to try
09:30:32 <aspiers> #info Qiming gave an introduction to Senlin (clustering service for OpenStack end users)
09:30:37 <aspiers> #link https://github.com/openstack/senlin
09:30:39 <Qiming> for nova servers, it could be 'reboot', 'rebuild', 'evacuate', ... 'recreate'
09:30:48 <_gryf> Qiming, you said that you are polling the state of the vm from nova
09:30:56 <haukebruno> ah, now I understand too, thanks ;)
09:31:31 <Qiming> all nova server clusters that have a health policy attached will be registered
09:31:43 <Qiming> then checked periodically (i.e. http://git.openstack.org/cgit/openstack/senlin/tree/senlin/engine/health_manager.py#n76)
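As a rough picture of "specify the recover actions you want to try", the sketch below models a health policy as an ordered list of action names that are attempted one after another until one succeeds. The ordering behaviour, the `HealthPolicy` fields and the `recover` helper are all assumptions made for illustration; they are not Senlin's actual policy schema or engine logic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class HealthPolicy:
    """Hypothetical policy: check interval plus actions to try, in order."""
    interval_seconds: int = 60
    recover_actions: List[str] = field(
        default_factory=lambda: ["reboot", "rebuild", "evacuate", "recreate"]
    )


# A handler takes a node ID and returns True if the recovery worked.
Handler = Callable[[str], bool]


def recover(node_id: str, policy: HealthPolicy,
            handlers: Dict[str, Handler]) -> Optional[str]:
    """Try each configured action in order; return the first that succeeds."""
    for action in policy.recover_actions:
        handler = handlers.get(action)
        if handler is not None and handler(node_id):
            return action
    return None  # nothing worked; the caller decides how to escalate
```

For example, with handlers where 'reboot' fails and 'rebuild' succeeds, `recover("vm1", HealthPolicy(), handlers)` would return `"rebuild"`.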
09:32:19 <Qiming> we haven't yet decided whether auto-recover is a good thing
09:32:30 <aspiers> Qiming: how will you do fencing?
09:32:42 <Qiming> there are other details to the cluster_recover function to be figured out
09:33:11 <haukebruno> sorry for kind of offtopic, but I am curious if there is anything inside openstack that someone could use as a fencing device for instances
09:33:12 <Qiming> aspiers, our friends from the Israel lab helped develop those components back in 2014
09:33:27 <aspiers> haukebruno: there isn't, that's why we use Pacemaker
09:33:34 <_gryf> haukebruno, there is no such thing
09:33:36 <Qiming> IIRC, they remotely operate the gateway
09:33:38 <aspiers> haukebruno: well, it is one of many reasons why we use Pacemaker
09:34:24 <haukebruno> i see, thanks
09:34:43 <Qiming> I heard some different opinions regarding using pacemaker to do HA for OpenStack controllers
09:35:05 <Qiming> our friends in the New York lab are doing OpenStack controller HA without using pacemaker
09:35:17 <Qiming> they are using consul for monitoring, I believe
09:35:17 <haukebruno> Qiming, we also
09:35:36 <aspiers> Qiming: how does NY lab do fencing?
09:35:46 <Qiming> aspiers, have to check out
09:36:25 <Qiming> there is no fencing API as far as I know, so our prototype only works on certain types of network switch
09:37:31 <aspiers> Qiming: I heard different opinions on Pacemaker too, although I never heard any convincing arguments against Pacemaker
09:38:05 <aspiers> most (not all) of the arguments I heard against it were based on misunderstanding
09:38:26 <Qiming> okay, I asked my colleagues there when I heard this, their opinion is they hate switching between nova commands and pcs
09:38:28 <_gryf> Qiming, so your solution actually depends on specific hardware, like a certain type of switch, otherwise it wouldn't be able to fence nodes, right?
09:39:11 <Qiming> and their resource agents are not always yielding a reliable result regarding whether glance-api is still alive
09:39:18 <aspiers> clustering is really difficult, and the Pacemaker code is based on 15-20 years of experience of writing clustering software
09:39:31 <Qiming> the only thing pacemaker knows for sure is that the PID is still there, :)
09:39:45 <Qiming> _gryf, correct
09:40:05 <aspiers> Qiming: Pacemaker knows about a lot more than the PID if you use the openstack-resource-agents project :)
09:40:34 <Qiming> aspiers, that is beyond my knowledge, :) Haven't been following that for a long time
09:40:51 <Qiming> good to know that things are improving
09:41:31 <aspiers> Qiming: I maintain that project. For a long time the OpenStack OCF RAs have been capable of monitoring the actual service, not just the pids
09:42:06 <Qiming> that's great
09:42:31 <aspiers> and the monitoring is direct.  IIUC nova-server <-> nova-compute relies on the message bus
09:43:12 <aspiers> alright
09:43:19 <aspiers> that was a really useful intro to Senlin, thanks!
09:43:30 <Qiming> my pleasure
09:43:38 <aspiers> Qiming: please take a look at those links so you can understand what the rest of the community is doing
09:43:50 <Qiming> really very happy there are finally more people looking into this area
09:43:51 <aspiers> Qiming: also http://www.slideshare.net/adamspiers/compute-node-ha-current-upstream-development
09:43:54 <Qiming> \o/
09:44:07 <Qiming> will do, aspiers
09:44:19 <aspiers> Qiming: there is a cross-project IRC meeting 35 hours from now which aims to cover this topic
09:44:27 <aspiers> it would be great if you could join
09:44:41 <aspiers> https://wiki.openstack.org/wiki/Meetings/CrossProjectMeeting
09:45:11 <Qiming> it is 2100 UTC?
09:45:14 <aspiers> yes
09:45:24 <aspiers> it's a difficult time for some of us
09:45:24 <ddeja> aspiers: but it's not 100% that it would take place this week
09:45:28 <Qiming> 5 am here, :(
09:45:31 <aspiers> :(
09:45:42 <ddeja> like, it was canceled last week
09:45:57 <aspiers> ddeja: thingee sent an email in the last few days asking for someone to chair this week
09:46:11 <aspiers> ddeja: so I think it's probably 80% likely
09:46:28 <aspiers> I am not sure though
09:46:32 <ddeja> ok, but on the other hand there was this mail asking if some of us can cover the topic
09:46:42 <aspiers> #topic AOB (Any Other Business)
09:47:09 <aspiers> yeah, let's see what thingee says
09:47:27 <ddeja> I can talk with Renat from the Mistral team to see if he can contact Timofey (the guy who originally put the spec in review)
09:47:30 <aspiers> I think it's really important that at least 1 or 2 of us are there
09:47:44 <ddeja> they work with each other AFAICT
09:47:49 <aspiers> ok, thanks
09:48:17 <aspiers> ddeja: is that time OK for you? or do you think we should try to push for a different time?
09:48:29 <aspiers> masahito: I guess 2100 UTC is a bad time for you too?
09:48:36 <ddeja> aspiers: it's 10 P.M for me, but I can make it
09:48:45 <aspiers> ok
09:48:51 <masahito> aspiers: yap, it doesn't work for me.
09:48:53 <_gryf> aspiers, I'll try to participate
09:48:59 <masahito> 6am X(
09:49:01 <aspiers> _gryf: me too
09:49:04 <aspiers> masahito: :(
09:49:12 <ddeja> watching evening movie or joining meeting... same fun! ;)
09:49:16 <aspiers> I guess the challenge is that the meeting also needs to cover non-HA topics
09:49:19 <aspiers> haha
09:51:03 <aspiers> any other topics people want to discuss?
09:51:19 <ddeja> aspiers: only short question
09:51:40 <ddeja> did you get any mail about the presentation in Austin?
09:51:47 <aspiers> ddeja: not yet
09:51:56 <ddeja> aspiers: ok
09:52:52 <aspiers> ok 1 minute to raise any other topics
09:53:07 <aspiers> otherwise we can end the meeting slightly early
09:53:25 <haukebruno> is the meeting normally about 1 hour?
09:53:31 <_gryf> haukebruno, yup
09:53:54 <_gryf> haukebruno, but you can join #openstack-ha for further discussion anytime
09:53:59 <aspiers> yes
09:54:14 <haukebruno> _gryf, thanks, I am also in #openstack-ha ;)
09:54:20 <_gryf> haukebruno, k :)
09:54:26 <aspiers> Qiming: please join #openstack-ha too :)
09:54:33 <masahito> haukebruno: or use openstack-dev ML to send a mail with [HA] in title
09:54:34 <aspiers> Qiming: and encourage your colleagues to also join
09:54:45 <haukebruno> masahito, good to know, thanks
09:55:11 <aspiers> ok great, thanks everyone!
09:55:16 <aspiers> see you next week :)
09:55:22 <haukebruno> thanks too, was interesting :)
09:55:34 <ddeja> bye
09:55:43 <masahito> thanks, bye
09:55:44 <_gryf> cu
09:55:52 <aspiers> bye :)
09:55:54 <aspiers> #endmeeting