#openstack-meeting-alt log

09:20:22 <aspiers> #startmeeting ha
09:20:22 <openstack> Meeting started Wed Feb  1 09:20:22 2017 UTC and is due to finish in 60 minutes.  The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:20:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:20:25 <openstack> The meeting name has been set to 'ha'
09:20:34 <aspiers> let's just have a quick chat
09:20:51 <samP> aspiers: sure, in that case, I have to let go of one of my talks, but it's ok
09:20:56 <aspiers> #topic Boston
09:21:19 <aspiers> samP: I guess you would suggest another talk on compute HA?
09:21:20 <samP> aspiers: talk submission deadline is 2/6 (I think)
09:21:23 <aspiers> yes, it's soon
09:21:26 <beekhof> aspiers: howdy
09:21:31 <aspiers> oh hey beekhof
09:21:37 <aspiers> beekhof: you gonna come to Boston?
09:21:38 <samP> beekhof: hi
09:21:51 <beekhof> sorry, the US is on my No-Fly-To list
09:21:59 <aspiers> ah, you too
09:22:23 <aspiers> Tim and Florian too
09:22:24 <samP> aspiers: I was thinking of submitting one about compute HA, but more masakari-related
09:22:46 <samP> aspiers: but I can join with you..
09:22:52 <beekhof> organising a conference in the US right now seems borderline dumb and more than a little insensitive
09:23:09 <aspiers> beekhof: agreed but there is a blog explaining why
09:23:16 <beekhof> -ENOCARE
09:23:48 <aspiers> I'm definitely not gonna argue with you on that :)
09:23:53 <beekhof> :)
09:24:53 <aspiers> samP: if we could write a simple host monitor which notifies masakari for host recovery then I think doing a joint talk is justified
09:25:10 <aspiers> since that demonstrates the idea of a componentized architecture
09:25:11 <samP> aspiers: sure
09:25:20 <aspiers> and it should be easy to do I think
09:25:48 <samP> aspiers: I can do that
09:26:30 <aspiers> we could follow the driver idea, so that it can have a driver to notify masakari via its native mechanism, and also a driver to notify any other service via standard https
09:26:53 <aspiers> or if it's easy, to enhance masakari to accept the standard message
09:26:57 <aspiers> or just make them the same format
09:27:06 <aspiers> I need to learn more about how masakari currently works
09:27:44 <samP> aspiers: It depends on what is "standard message"
09:28:12 <aspiers> samP: well we can define it to be whatever we want :) but something which is suitable for any recovery workflow controller to handle
09:28:50 <aspiers> it's ok if it supports masakari extensions, but not ok if it requires masakari-specific things
09:29:18 <samP> aspiers: correct, I thought you were referring to something that already exists
09:29:47 <aspiers> ah, no :)
09:30:41 <aspiers> are there any docs for masakari yet?
09:30:42 <samP> aspiers: current masakari-monitors does not require any masakari-specific info, however there is a format for everything
09:31:00 <samP> aspiers: sorry, docs are on the way..
09:31:07 <aspiers> ok
09:31:34 <aspiers> we need the host monitor to support different notifications anyway, so that it can notify mistral too
09:33:06 <samP> aspiers: agreed, in both cases we have to prepare the data in the format required by the engine (mistral/masakari etc.)
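[Editor's note] The driver idea discussed above (one host monitor, pluggable drivers for notifying masakari, mistral, or any other service over https) could be sketched roughly as follows. This is a minimal illustration only, not code from masakari or any existing monitor: the names `build_event`, `NotificationDriver`, and `JsonHttpsDriver`, the message fields, and the example URL are all hypothetical, since the "standard message" had not been defined at the time of the meeting.

```python
import json
from abc import ABC, abstractmethod
from urllib import request


def build_event(hostname, event_type="host-failure"):
    """Build a generic notification. Field names are assumptions, not a real format."""
    return {"event": event_type, "host": hostname}


class NotificationDriver(ABC):
    """Hypothetical driver interface: one subclass per recovery controller type."""

    @abstractmethod
    def notify(self, event):
        """Deliver the event; return True only if the controller accepted it."""


class JsonHttpsDriver(NotificationDriver):
    """Hypothetical driver that POSTs the event as JSON to a controller URL."""

    def __init__(self, url, opener=request.urlopen):
        self.url = url
        self.opener = opener  # injectable so the driver can be exercised offline

    def build_request(self, event):
        body = json.dumps(event).encode("utf-8")
        return request.Request(
            self.url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    def notify(self, event):
        try:
            with self.opener(self.build_request(event)) as resp:
                return 200 <= resp.status < 300
        except OSError:
            # URLError and HTTPError are OSError subclasses: treat as "not delivered".
            return False
```

A masakari- or mistral-specific driver would be another `NotificationDriver` subclass that reshapes the same event into whatever format that engine requires, which is the point samP makes above.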
09:33:52 <aspiers> samP: could you do a very quick rough doc somewhere (etherpad/wiki) describing how masakari monitors send notification to the controller?
09:34:34 <samP> aspiers: sure, I will do it on wiki
09:34:43 <aspiers> beekhof: from our side I think the action would be to split fence_compute into two decoupled parts: the monitoring code and the recovery code
09:34:44 <samP> and will send you the link
09:35:10 <aspiers> beekhof: or at least to conceptually split it, if not into separate files
09:35:20 <aspiers> samP: great thanks!
09:35:40 <aspiers> #action samP to document how masakari monitors send notification to the controller
09:36:06 <samP> aspiers: sure, thanks for the action item
09:36:39 <beekhof> i'm not sure i follow
09:36:44 <aspiers> samP: so maybe I should draft a synopsis for a talk proposal
09:36:58 <beekhof> fence_compute only performs evacuations
09:37:14 <beekhof> it doesn't look for failed vms
09:37:26 <samP> aspiers: That would be great..thank you
09:37:28 <aspiers> beekhof: no but it sets the attribute
09:37:44 <aspiers> beekhof: which is akin to sending a notification to the recovery workflow controller (NovaEvacuate)
09:38:04 <aspiers> #action aspiers to draft a proposal for a joint talk with samP on next gen compute HA
09:38:08 <beekhof> in that case we agree
09:38:25 <beekhof> i didn't really like that it was called twice in two completely different modes
09:38:56 <aspiers> beekhof: ok cool, me neither.
09:39:07 <aspiers> beekhof: Since we have the underlying goal of componentising everything to support a modular approach, this decoupling would be a key part of that
09:39:29 <aspiers> beekhof: since currently the monitoring part of fence_compute only works with the recovery part of fence_compute
09:39:42 <aspiers> since both are dependent on attrd_updater
09:40:26 <aspiers> if we convert it to send / receive generic JSON notifications via https then it can work with any other approach to recovery
09:40:55 <aspiers> I guess we need some thought on how to do that reliably
09:41:34 <aspiers> e.g. what if the recovery workflow controller which receives notifications is offline when a compute host dies and gets fenced
09:42:03 <beekhof> sounds like fun
09:42:04 <aspiers> obviously https is not a stateful queue
09:42:17 <aspiers> whereas at least attrd can act as one
09:43:55 <aspiers> hmm, actually we could keep attrd acting as a queue, and then have a separate cluster process which takes items off the "queue" as soon as it successfully notifies a controller about them
09:44:02 <aspiers> that would probably work
09:44:22 <ddeja> Hi guys, I had problems with my PC...
09:44:23 <samP> aspiers: IMO, that is why the HA of the recovery controller itself is important; https is stateless, but we could retry
09:44:49 <samP> ddeja: hi
09:44:54 <aspiers> samP: we could retry, but we wouldn't be allowed to ever give up
09:45:03 <aspiers> and if the process running the retries died, then the notification would get lost
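[Editor's note] The "attrd as a queue" idea above (keep the failure record in cluster state, and remove it only once a controller has been successfully notified) can be illustrated with a small sketch. It is an assumption-laden toy: a plain Python set stands in for the node attributes that attrd would hold, and `drain_pending` and the `notify` callback are hypothetical names, not part of Pacemaker or any OCF agent.

```python
def drain_pending(pending, notify):
    """Try to notify a controller about each pending failed host.

    `pending` is a set of hostnames standing in for attributes stored in attrd
    (an assumption for illustration). A host is removed only after `notify`
    reports success, so a failed or interrupted attempt leaves it queued.
    """
    delivered = []
    for host in sorted(pending):  # sorted() copies, so mutating the set is safe
        if notify(host):
            pending.discard(host)
            delivered.append(host)
    return delivered
```

Because the record outlives the process doing the retries, a replacement process started by the cluster could call `drain_pending` again and pick up whatever was left, which addresses the concern that a dead retry loop would lose the notification.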
09:45:16 <aspiers> hey ddeja :)
09:46:33 <aspiers> beekhof: so actually, the change would be more to NovaEvacuate
09:46:36 <samP> aspiers: true
09:46:54 <aspiers> beekhof: that NovaEvacuate would be responsible not for recovery, but instead for notifying the recovery workflow controller
09:47:04 <beekhof> that could work
09:47:18 <aspiers> of course we'd probably want to change the name
09:47:25 <aspiers> but architecturally that's my idea
09:48:12 <aspiers> cool! that sounds like a potential strategy for starting to unify all three approaches
09:48:44 <aspiers> which would make migration to masakari/mistral easier
09:49:20 <aspiers> #topic architecture
09:50:23 <aspiers> #info aspiers proposed an extension to ddeja's driver-based approach which would work with not only masakari/mistral but also potentially the OCF agents
09:50:42 <aspiers> I'll try to capture it in the specs
09:51:01 <aspiers> #action aspiers to capture proposal in the specs
09:51:14 <aspiers> of course we can consider other approaches too
09:51:17 <beekhof> aspiers: what if it was a different agent
09:51:44 <aspiers> beekhof: what do you mean?
09:51:52 <beekhof> NovaEvacuate for the existing pacemaker way, and mistral-evacuate for the new one
09:52:09 <aspiers> absolutely
09:52:33 <aspiers> well, it wouldn't necessarily be mistral-evacuate. I think it should aim to support both mistral and masakari
09:52:46 <aspiers> then later masakari can potentially use mistral to do its work
09:52:52 <beekhof> json-evacuate then :)
09:52:58 <aspiers> right :)
09:53:26 <beekhof> don't much care what it's called, but not trying to support json and attrd in the same agent seems like a good idea
09:53:34 <aspiers> +100
09:54:17 <aspiers> beekhof: and the cool thing about having two RAs is that you could trivially migrate a cluster from one to the other by having the two resources co-existing during the switch-over
09:54:29 <aspiers> you would simply turn the old one off and the new one on
09:54:36 <aspiers> boom - done
09:54:45 <beekhof> yep
09:54:47 <aspiers> since they could still use the same attributes
09:54:55 <aspiers> hooray
09:55:14 <aspiers> this sounds suspiciously easy
09:55:21 <beekhof> let's not jump the gun on upgrades though, i believe it's tradition to wait until the new way is in production
09:55:28 <aspiers> true
09:55:56 <aspiers> samP: I made some updates to your spec based on ddeja's feedback
09:56:57 <samP> aspiers: thanks..
09:57:05 <aspiers> samP: please review when you get a chance
09:57:52 <samP> aspiers: sure I will
09:58:23 <prometheanfire> tonyb: ?
09:58:28 <aspiers> ddeja: I'll update the host monitor spec to suggest supporting driver plugins, with one for mistral, one for JSON, and another for masakari if it needs something different to the standard JSON one
09:58:36 <aspiers> ok, looks like we're out of time
09:58:48 <aspiers> but that was a pleasingly productive discussion
09:59:03 <aspiers> if you have anything else, let's just continue on #openstack-ha
09:59:09 <aspiers> thanks guys!
09:59:19 <samP> thank you all!
09:59:29 <ddeja> thanks
09:59:29 <aspiers> bye for now :)
09:59:54 <aspiers> #endmeeting