08:03:55 <aspiers> #startmeeting ha
08:03:56 <openstack> Meeting started Mon Jun 27 08:03:55 2016 UTC and is due to finish in 60 minutes.  The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:03:57 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:03:59 <openstack> The meeting name has been set to 'ha'
08:04:15 <aspiers> so let's start, maybe people will join in a bit
08:04:18 <ddeja> hello
08:04:34 <aspiers> hi :)
08:04:34 <rsjethani> o/
08:04:40 <aspiers> #topic Current status (progress, issues, roadblocks, further plans)
08:04:49 <aspiers> alright, we haven't done any status reports for a while
08:04:54 <aspiers> I'll go first
08:05:11 <aspiers> I'm finishing up preparation of openstack-resource-agents-specs
08:05:30 <samP> Hi, sorry for the delay
08:05:30 <aspiers> there's a weird Sphinx problem building the docs, but I now know how to pin it down
08:05:35 <aspiers> samP: hi, np :)
08:05:49 <aspiers> so I am hoping to submit this today
08:05:59 <aspiers> at which point it should be easy for other people to submit specs for review
08:06:26 <aspiers> also, the long discussion with beekhof and kgaillot on the pacemaker-users list finally reached consensus :)
08:06:52 <samP> yeh, it took some time for me to follow up...lol
08:06:57 <aspiers> #info openstack-resource-agents-specs should be ready for submissions later today
08:07:16 <aspiers> samP, ddeja: not sure if you are following that thread?
08:07:47 <ddeja> aspiers: I read it last week, but I haven't caught up since then
08:07:52 <aspiers> #info conclusion reached on how to stop nova scheduling VMs to failing nova-compute
08:07:55 <samP> I read the ML discussion
08:08:00 <aspiers> the conclusion was basically:
08:08:10 <aspiers> 1. use force_host_down not service-disable
08:08:23 <aspiers> 2. do it on every RA stop, and the opposite on every RA start
08:08:48 <aspiers> I think that's it :)
08:09:03 <aspiers> no Pacemaker enhancements required
08:09:03 <ddeja> aspiers: thanks for the update!
08:09:19 <aspiers> probably migration-threshold=1 and start-failure-is-fatal=False
08:09:38 <aspiers> ok, I think that's all from my side
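The consensus summarized above is to force the nova-compute service down on every resource agent stop and to clear that flag again on every start. The Python sketch below shows that symmetry under stated assumptions. `FakeComputeServices`, `ra_stop` and `ra_start` are hypothetical names used only for illustration. A real agent would call Nova's force-down API (added in compute API microversion 2.11) instead of writing to a dict, and would also stop or start the nova-compute process where the comments indicate.

```python
from typing import Dict, Tuple

OCF_SUCCESS = 0  # standard OCF resource agent return code for success


class FakeComputeServices:
    """Hypothetical in-memory stand-in for Nova's os-services API."""

    def __init__(self) -> None:
        # (host, binary) -> forced_down flag
        self.forced_down: Dict[Tuple[str, str], bool] = {}

    def force_down(self, host: str, binary: str, forced: bool) -> None:
        self.forced_down[(host, binary)] = forced


def ra_stop(services: FakeComputeServices, host: str) -> int:
    # Force the service down first, so the scheduler stops placing new VMs
    # on this host as soon as the stop begins (assumed ordering).
    services.force_down(host, "nova-compute", True)
    # ... stopping the nova-compute process itself is omitted here ...
    return OCF_SUCCESS


def ra_start(services: FakeComputeServices, host: str) -> int:
    # ... starting the nova-compute process itself is omitted here ...
    # Clear the flag so the host becomes schedulable again.
    services.force_down(host, "nova-compute", False)
    return OCF_SUCCESS
```

Doing the flag flip unconditionally on every stop and start keeps Nova's view of the host aligned with Pacemaker's, which is why no Pacemaker enhancements are needed. Settings such as migration-threshold=1 and start-failure-is-fatal=false only control how Pacemaker reacts to the failures.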
08:09:54 <aspiers> samP: any news from your side?
08:10:04 <samP> aspiers: Thank you. I'm working on masakari to try that out, but I couldn't finish yet
08:10:17 <aspiers> oh ok, cool
08:10:37 <samP> I will give some details on ML or next meeting
08:10:40 <aspiers> nice
08:10:58 <aspiers> oh I remembered more things
08:11:20 <aspiers> as ddeja already saw, Intel is asking if we want to use the OSIC for testing implementations of the user story
08:12:08 <aspiers> I gave them a quick summary of where we are
08:12:11 <aspiers> and pointers to more info
08:12:28 <aspiers> but I think it's still a bit too early to test an upstream implementation :)
08:13:17 <aspiers> also I talked to the chairman of the OpenStack board and he has raised the idea of a dedicated HA track in Barcelona
08:13:24 <samP> aspiers: agree, but nice if we can use it to analyse the technical gaps, if possible
08:13:28 <aspiers> I don't know if it will happen, but at least people are thinking about it now
08:13:37 <ddeja> aspiers: cool
08:13:43 <aspiers> samP: true, although I think OSIC is probably more useful for scale testing
08:14:00 <aspiers> samP: but I'm not really sure. I guess we'll find out when they've looked at the info
08:14:38 <samP> aspiers: sure
08:14:53 <aspiers> ddeja: any news on the mistral side?
08:14:58 <ddeja> my status:
08:15:00 <ddeja> -Working on alternative RPC layer for mistral, so that we can have ACK then process pattern
08:15:02 <ddeja> -Lot of internal work last week
08:15:21 <ddeja> aspiers: RPC layer is working more or less ;)
08:15:28 <aspiers> :-O :-)
08:15:43 <ddeja> I'm hoping to have it merged in ~2 weeks
08:16:17 <ddeja> and I'm starting work on letting the user decide which mode should be used to process their messages
08:16:33 <ddeja> that's it
08:16:35 <aspiers> remind me what the aim of that is? is it so that we can track which worker is handling a workflow?
08:16:49 <ddeja> aspiers: maybe in a long term
08:16:50 <aspiers> and then make sure the workflow is reliable?
08:17:07 <ddeja> right now it is to make sure that if a given task is idempotent
08:17:17 <ddeja> it would be done at some point
08:17:28 <ddeja> that is enough for the evacuate workflow
08:18:06 <aspiers> ok
08:18:39 <aspiers> so is it supporting retry of task on different worker?
08:18:51 <ddeja> yup
08:18:53 <aspiers> or something else, I can't remember the main point of doing it
08:19:11 <ddeja> yeah, it is basically re-sending a message in case a worker dies
08:19:19 <ddeja> so another would do the job
08:19:26 <aspiers> wouldn't we need to also fence the worker?
08:19:44 <ddeja> not really
08:19:52 <aspiers> (in general, maybe not for idempotent tasks)
08:20:01 <aspiers> I mean if a task cannot be repeated safely
08:20:14 <ddeja> aspiers: your concern is valid, but
08:20:23 <ddeja> in case a worker dies
08:20:32 <ddeja> we don't know what he already did
08:20:59 <samP> ddeja: is that what we discussed about the state of the job?
08:21:04 <ddeja> maybe it sent nova_boot to nova and died just before letting the mistral engine know that the action succeeded?
08:21:29 <ddeja> so in case an action is not idempotent, we just cannot do it twice, we need to fail the whole workflow
08:21:44 <ddeja> and therefore, there is no need to fence the worker
08:21:45 <aspiers> hmm yeah, difficult
08:22:00 <aspiers> so how does ACK then process help?
08:22:11 <ddeja> aspiers: process then ACK
08:22:24 <aspiers> you said the opposite earlier ;-)
08:22:25 <ddeja> oh, I wrote it wrong...
08:22:28 <aspiers> lol
08:22:38 <ddeja> so, process then ACK helps
08:22:44 <aspiers> ok, process then ACK makes more sense ;-)
08:22:49 <ddeja> yup, sorry
08:23:01 <ddeja> samP: yes, that's the story
08:23:31 <aspiers> but if process fails and there is no ACK, how do you know whether to retry or fail the whole workflow?
08:24:09 <ddeja> soo
08:24:14 <aspiers> ddeja: BTW is there a spec for this? if so please just provide the URL so I can stop asking stupid questions :)
08:24:34 <ddeja> aspiers: there is no spec right now...
08:24:38 <aspiers> if there isn't a spec, maybe there should be?
08:24:52 <aspiers> presumably you already discussed this with Renat etc.?
08:24:55 <ddeja> aspiers: well, really it is not very complicated
08:25:07 <ddeja> let me explain
08:25:30 <ddeja> so: in mistral we would support both ACK then process and process then ACK
08:25:37 <aspiers> ah
08:25:48 <ddeja> for idempotent messages we use process then ACK
08:26:04 <ddeja> and for not idempotent ACK then process (model that is used now)
08:26:17 <ddeja> + timeout for such tasks
08:26:19 <aspiers> ok
08:27:13 <aspiers> and idempotency property is set as metadata on the workflow or task, or similar?
08:27:34 <ddeja> aspiers: it would be set per task
08:27:41 <ddeja> in a workflow definition
08:27:58 <aspiers> ok, makes sense
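The per-task choice described above can be illustrated with a small Python simulation. This is a hypothetical sketch and not Mistral's actual RPC code: `Task`, `WorkerDied` and `deliver` are illustrative names. The `idempotent` field stands in for the per-task flag in the workflow definition. An exception stands in for a worker crash, which in ACK-then-process mode would in practice be detected by the task timeout.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    idempotent: bool  # hypothetical per-task flag from the workflow definition


class WorkerDied(Exception):
    """Simulates a worker crashing part-way through a task."""


def deliver(task: Task, workers: List[Callable[[Task], str]]) -> str:
    """Hypothetical dispatcher showing the two acknowledgement modes."""
    for worker in workers:
        # ACK then process (non-idempotent tasks): the message is acknowledged
        # before any work starts, so the broker will never redeliver it.
        acked = not task.idempotent
        try:
            result = worker(task)
        except WorkerDied:
            if acked:
                # We cannot tell whether the side effect (e.g. nova_boot)
                # already happened, so repeating is unsafe: fail the workflow.
                return "workflow failed"
            # Process then ACK (idempotent tasks): no ACK was sent, so the
            # message is redelivered to the next worker.
            continue
        # In process-then-ACK mode, the ACK would be sent only at this point.
        return result
    return "workflow failed"


def dying_worker(task: Task) -> str:
    raise WorkerDied()


def healthy_worker(task: Task) -> str:
    return f"{task.name} done"
```

With a dying worker followed by a healthy one, an idempotent task such as a step of the evacuate workflow is retried and completes. A non-idempotent task instead fails the workflow. In neither case does the dead worker need to be fenced.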
08:29:03 <aspiers> ddeja: are there any mailing list discussions on this which I might have missed?
08:29:59 <ddeja> aspiers: there was only discussion about adding such feature to oslo
08:30:09 <ddeja> and there is a blueprint
08:30:21 <aspiers> an oslo blueprint?
08:30:53 <ddeja> nope, oslo rejected it
08:31:13 <ddeja> https://blueprints.launchpad.net/mistral/+spec/mistral-alternative-rpc
08:31:16 <aspiers> thanks
08:31:23 <ddeja> blueprint for alternative rpc layer
08:31:54 <aspiers> http://thread.gmane.org/gmane.comp.cloud.openstack.devel/83394/focus=86142
08:32:00 <aspiers> that was the oslo discussion
08:32:22 <ddeja> yup, that's that
08:32:42 <aspiers> ddeja: so would it be a fair summary to say you are working on making mistral more reliable?
08:32:54 <aspiers> I guess that's the one-line description :)
08:32:58 <ddeja> yes :)
08:33:09 <aspiers> #info ddeja is working on making mistral more reliable
08:33:11 <aspiers> :)
08:33:12 <samP> great :)
08:33:22 <aspiers> alright
08:33:29 <aspiers> any other topics to discuss?
08:33:55 <samP> I think we should use our ML more
08:34:03 <aspiers> openstack-dev?
08:34:07 <samP> yep
08:34:10 <aspiers> I agree
08:34:30 <aspiers> #action everyone should use openstack-dev more often for discussing HA topics
08:34:49 <radek__> I have one question: I am using TripleO to deploy my openstack, and in the TripleO world there is something called "light HA"
08:34:51 <samP> If you need to share something or need our team's attention, then just post it to the ML
08:34:54 <radek__> any idea what that could be ?
08:35:14 <aspiers> radek__: I've not heard of that
08:35:38 <radek__> it was also new to me :)
08:35:43 <radek__> ok thanks Adam
08:35:52 <aspiers> radek__: based on https://github.com/openstack/tripleo-heat-templates/blob/master/environments/puppet-pacemaker.yaml
08:36:02 <aspiers> I would guess that it is Red Hat's move from Pacemaker to systemd
08:36:11 <aspiers> as per http://blog.clusterlabs.org/blog/2016/next-openstack-ha-arch
08:36:16 <aspiers> but I could be totally wrong
08:36:21 <aspiers> you'd have to ask beekhof :)
08:36:25 <radek__> ahhh maybe
08:36:58 <radek__> want me to make him angry :)
08:37:11 <aspiers> haha
08:37:27 <aspiers> beekhof's always angry ;-) just kidding ;-)
08:37:45 <radek__> anyway going to ask on TripleO meeting next time
08:37:49 <aspiers> good idea
08:38:04 <aspiers> #topic AOB (Any Other Business)
08:38:34 <aspiers> #info aspiers is going on holiday shortly, should be back Monday July 18th
08:38:58 <aspiers> please could someone volunteer to chair HA meetings for the next 2 weeks?
08:39:04 <ddeja> aspiers: sure
08:39:07 <aspiers> thanks!
08:39:14 <samP> BTW, have we decided on the new time slot we discussed?
08:39:28 <aspiers> samP: I have collected all the input and need to compare it + suggest a new time
08:39:43 <samP> OK, thank you
08:40:41 <aspiers> alright, anything else? otherwise we can finish early
08:41:31 <ddeja> I'm done
08:41:44 <samP> nothing from my side, have a nice week ahead...
08:41:54 <aspiers> ok thanks, you too!
08:41:58 <aspiers> bye for now :)
08:42:30 <samP> bye then... thank you all
08:42:39 <aspiers> #endmeeting