00:01:52 <thinrichs> #startmeeting CongressTeamMeeting
00:01:53 <openstack> Meeting started Thu May 12 00:01:52 2016 UTC and is due to finish in 60 minutes.  The chair is thinrichs. Information about MeetBot at http://wiki.debian.org/MeetBot.
00:01:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
00:01:56 <openstack> The meeting name has been set to 'congressteammeeting'
00:01:57 <masahito> hi
00:02:24 <thinrichs> Today I just have status updates for the agenda.
00:02:32 <thinrichs> Anything else to discuss?
00:03:27 <thinrichs> #topic status
00:03:31 <thinrichs> ramineni: want to start?
00:03:48 <ramineni_> thinrichs: sure,
00:04:55 <ramineni_> thinrichs: i just started with migration of test-congress  this week
00:05:38 <ramineni_> thinrichs: and fixed some gate failures
00:05:51 <ramineni_> thinrichs: thats it from my side for this week
00:06:26 <thinrichs> Great!  Getting good unit test coverage for at least the in-memory version of the new arch is important, I'd say.
00:06:35 <thinrichs> And the gate is always important to get fixed
00:06:43 <thinrichs> masahito: want to go next?
00:07:06 <ekcs> hi all sorry I’m late. just got in from a conference.
00:07:26 <masahito> sure
00:07:40 <thinrichs> ekcs: no problem.  Just doing status updates.
00:08:48 <masahito> I updated the Congress code with the new architecture and it works now!
00:08:55 <masahito> https://review.openstack.org/#/c/314873/
00:09:28 <ekcs> From link?
00:09:32 <ekcs> wrong link?
00:09:52 <masahito> And I added code which loads configured datasources.
00:10:43 <masahito> from this patch, it works https://review.openstack.org/#/c/280793/11
00:11:39 <masahito> ekcs: sorry, I wrote the wrong link.
00:12:46 <thinrichs> Hi all.  Still here?
00:12:54 <ramineni_> yes
00:12:56 <masahito> yes
00:13:18 <ekcs> yup
00:13:26 <thinrichs> My irc client crashed, and then wouldn't reconnect.  I couldn't even ping chat.freenode.net
00:13:35 <ramineni_> masahito: the gate job for the new arch still fails; i think it is because of datasource_id vs name?
00:13:53 <masahito> ramineni_: I think so
00:14:27 <masahito> ramineni_: or other bugs in new_architecture
00:15:13 <thinrichs> The name vs. id would definitely cause tests to fail
00:15:20 <thinrichs> Could be other problems, of course.
00:15:40 <thinrichs> How did we decide to fix that?
00:16:00 <thinrichs> Have the API translate from ID to name?
00:16:16 <thinrichs> by asking the DB?
00:16:26 <ekcs> I think the consensus was to do that. same way it’s done now.
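A minimal sketch of the approach discussed above, where the API layer translates a datasource ID to its name by asking the DB. This is not Congress code: the `datasources` table layout, the sample row, and the function names `make_demo_db` and `datasource_name_for_id` are all assumptions made for illustration, and an in-memory SQLite database stands in for the real datastore.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory DB with a hypothetical datasources table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datasources (id TEXT PRIMARY KEY, name TEXT UNIQUE)"
    )
    # Sample row; the ID and name are made up for this example.
    conn.execute("INSERT INTO datasources VALUES (?, ?)", ("3f1c", "nova"))
    conn.commit()
    return conn


def datasource_name_for_id(conn, ds_id):
    """Translate a datasource ID to its name with a single DB lookup."""
    row = conn.execute(
        "SELECT name FROM datasources WHERE id = ?", (ds_id,)
    ).fetchone()
    if row is None:
        raise KeyError("no datasource with id %r" % ds_id)
    return row[0]


conn = make_demo_db()
print(datasource_name_for_id(conn, "3f1c"))
```

The point of the design is that callers keep using IDs at the API boundary while everything behind it can address the datasource by name.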
00:18:40 <ramineni_> i think a patch got merged for that, right?
00:18:54 <thinrichs> masahito: Did I miss any important discussion about your status update?
00:19:32 <masahito> I think nothing.
00:20:17 <masahito> thinrichs: my update is just Congress with new architecture works now!
00:20:27 <thinrichs> ramineni: the patch for translating from ID to name?
00:22:32 <ramineni_> thinrichs: not getting now, i thought you have kept a patch for it already?
00:23:28 <thinrichs> masahito: the whole new architecture works?  Even the datasource name/id bug is fixed?
00:24:26 <thinrichs> ramineni: I abandoned that patch thinking we were just trying something out.
00:24:45 <thinrichs> ramineni: masahito reviewed it and pointed out some correctness problems
00:24:46 <ramineni_> thinrichs: https://review.openstack.org/#/c/310597/
00:25:06 <ramineni_> thinrichs: its abandoned
00:25:08 <ramineni_> ?
00:25:25 <thinrichs> I was just confused.  I'll resurrect it and work out the bugs.
00:25:43 <masahito> thinrichs: oh, 'works' means Congress can launch with DseNode, not pass tempest tests.
00:26:10 <thinrichs> masahito: understood
00:26:30 <ramineni_> thinrichs: oh ok
00:26:57 <thinrichs> ekcs: want to do your status update?
00:27:09 <ekcs> sure.
00:27:35 <ekcs> Thinking through and writing out the details of HA proposal based on our discussions. working with Tim and Andrew from Redhat on that.
00:27:35 <ekcs> Design summarized in a diagram in the etherpad. Comments welcome. https://etherpad.openstack.org/p/newton-congress-availability
00:27:39 <ekcs> Will be putting it together into a spec proposal.
00:27:44 <ekcs> Also will revise the update sequencing patch once we decide whether to do sequencing logic in DseNode or DataService.
00:31:05 <thinrichs> ekcs: for the reactive enforcement policy engine, a hot+cold would add quite a bit of complexity b/c we'd need leader election
00:31:39 <thinrichs> ekcs: originally I had thought we'd want that to ensure we don't miss executing any actions if the engine crashes and we need to restart it.
00:32:00 <thinrichs> But do we really need a hot+cold?
00:33:14 <ekcs> I’m thinking for first step we can just restart action execution policy engine on failure. that gives us maybe 30 seconds or less of downtime?
00:33:40 <ekcs> but if better is needed, then standby would be necessary.
00:33:55 <ramineni_> thinrichs, ekcs: restarting the policy engine on the same node might not work every time, right?
00:34:16 <thinrichs1> Here's my question: are we actually ever going to miss executing an action, even if the policy engine is down for minutes or hours?
00:34:43 <thinrichs1> All the messages are sent over oslo-messaging with the receiver set to the execution policy engine
00:35:02 <thinrichs1> Those messages will stay there on the bus for however long it takes to resurrect the policy engine.
00:35:34 <thinrichs1> When it comes back up, it should read the messages off the bus, and as long as it processes them 1 by 1, execute all the correct actions.
00:36:01 <thinrichs1> (That's assuming, I suppose, that it doesn't fall so far behind that it never catches up.  But that'd be a whole different problem anyway.)
00:36:17 <ekcs> thinrichs1: The way things are done right now, executions will be missed. because a policy engine only looks at latest snapshot on restart. but I can think through whether we can do it differently and leverage the message queue to make sure we don’t miss execution.
00:37:05 <thinrichs1> Leader-election is hard, so if there's some way we can leverage oslo-messaging to avoid it, that'd be good.
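A rough sketch of the difference raised above between replaying queued messages one by one after a restart and looking only at the latest snapshot. Everything here is hypothetical: a plain `deque` stands in for the message bus (no oslo.messaging calls are used), and the "pause a VM when CPU exceeds a threshold" rule and both function names are invented for the example.

```python
from collections import deque


def actions_from_replay(messages, threshold):
    """Process every queued update in order, acting whenever the rule fires."""
    actions = []
    pending = deque(messages)  # stand-in for messages waiting on the bus
    while pending:
        vm, cpu = pending.popleft()
        if cpu > threshold:
            actions.append(("pause", vm))
    return actions


def actions_from_snapshot(messages, threshold):
    """Look only at the latest value per VM, as a snapshot on restart would."""
    latest = {}
    for vm, cpu in messages:
        latest[vm] = cpu  # later updates overwrite earlier ones
    return [("pause", vm) for vm, cpu in latest.items() if cpu > threshold]


# Updates that arrived while the (hypothetical) engine was down.
queued = [("vm1", 95), ("vm1", 10), ("vm2", 50)]

print(actions_from_replay(queued, threshold=90))
print(actions_from_snapshot(queued, threshold=90))
```

In this toy case the replay fires the action for `vm1` because it sees the intermediate update, while the snapshot view sees only the final value and fires nothing.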
00:37:32 <thinrichs1> Does anyone know if there are tools in OpenStack to make hot+cold easy to implement?
00:37:35 <ekcs> so you’re saying not missing any execution action is an important requirement. downtime is less important.
00:38:34 <thinrichs1> I think so—for the execution policy engine at least.  The only reason downtime would matter is if the user wanted actions to be executed as soon as the conditions were satisfied.
00:39:05 <thinrichs1> Hmmm…that was what masahito needed though—sub-second response times
00:39:43 <masahito> thinrichs1: usually hot+cold is implemented by pacemaker in OpenStack
00:40:15 <thinrichs1> So pacemaker is responsible for picking a hot and a cold
00:40:26 <ekcs> The other thing is this: my understanding is that we’ll make the necessary changes in congress to do an HA deployment. And we’ll document and test a reference HA deployment. but it’d be up to the operator to actually configure the deployment with all the different tools. is that correct? So to me it’s okay to document how to do a hot standby deployment using say pacemaker. and let that be an optional thing based on user need.
00:41:22 <ekcs> is that your understanding as well?
00:42:02 <thinrichs1> ekcs: I'm not sure what other projects do.  Based on the discussion with the Redhat/Suse guys at the summit, though, that sounds right.
00:42:28 <masahito> ekcs: yes, but IMO using pacemaker is the de facto standard now.
00:42:45 <ekcs> the end user would need a bunch of external tools to do it. corosync, pacemaker, haproxy, etc.
00:42:53 <ekcs> masahito: yes.
00:43:50 <thinrichs1> Ideally we'd want to have system tests that test our HA—deploy the HA configuration, shoot some instances, and check that we don't lose functionality.
00:44:05 <thinrichs1> Not sure how other projects do that though.
00:45:13 <thinrichs1> Thanks for the update ekcs.
00:45:15 <ekcs> thinrichs1: you mean an automated test?
00:45:21 <thinrichs1> ekcs: yes
00:45:46 <thinrichs1> Let's all try to take time to read/comment on ekcs's etherpad proposal.
00:46:04 <thinrichs1> ekcs: or should we wait, if you're getting a spec ready?
00:46:25 <ekcs> I think quick scan and comment/questions would be helpful.
00:46:41 <thinrichs1> #action Everyone takes a quick look at the HA proposal in the etherpad
00:47:09 <ekcs> #link https://etherpad.openstack.org/p/newton-congress-availability
00:48:00 <thinrichs1> For my status update, I've been looking at the vm-placement policy engine
00:48:28 <thinrichs1> trying to get it working again for person who mailed us wanting to use it.
00:48:54 <thinrichs1> Trying to enable it to be spun up like a datasource
00:49:27 <thinrichs1> just so that it can be used on mitaka/liberty/etc.
00:50:01 <thinrichs1> I'll also make sure to resurrect and fix up the patch for the datasource name/id bug
00:50:12 <thinrichs1> That's about it
00:51:17 <thinrichs1> Anything else today?
00:51:51 <ekcs> on vmplacement,
00:52:36 <ekcs> I’m a little confused as to whether the person wants to use vmplacement policy engine we have, or use his/her own vmplacement code.
00:52:48 <thinrichs1> That I don't know for sure either.
00:53:14 <thinrichs1> The vm-placement code wasn't easy to write, so I was assuming they'd want to start with that and tweak it.
00:53:17 <ekcs> ok
00:53:20 <thinrichs1> But it's hard to say
00:55:34 <thinrichs1> Seems like that's it for the day.
00:55:37 <thinrichs1> Thanks all!
00:55:43 <ekcs> thanks!
00:56:20 <masahito> thanks
00:57:15 <thinrichs1> #endmeeting
16:01:01 <Sam-I-Am> hello
16:01:20 <johndperkins> o/
16:01:20 <Sam-I-Am> emagana: moo.
16:01:57 <Sam-I-Am> anyone here for the net guide meeting?
16:02:02 <johndperkins> I am
16:02:07 <Sam-I-Am> cool
16:02:57 <openstack> Sam-I-Am: Error: Can't start another meeting, one is in progress.  Use #endmeeting first.
16:03:03 <Sam-I-Am> #endmeeting