#openstack-meeting log

00:05:50 <thinrichs> #startmeeting CongressTeamMeeting
00:05:51 <openstack> Meeting started Thu Aug 11 00:05:50 2016 UTC and is due to finish in 60 minutes.  The chair is thinrichs. Information about MeetBot at http://wiki.debian.org/MeetBot.
00:05:52 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
00:05:54 <openstack> The meeting name has been set to 'congressteammeeting'
00:06:00 <thinrichs> ekcs: hi!
00:06:05 <aimeeu> Greetings all!
00:06:20 <thinrichs> ekcs: thanks for handling last week's meeting.  I looked over the logs—sounded like a good one.
00:06:38 <ramineni_> hi
00:06:49 <ekcs> thanks!
00:07:07 <thinrichs> masahito is out of town this week
00:07:12 <thinrichs> So we can get started.
00:07:17 <thinrichs> On my agenda...
00:07:26 <thinrichs> 1. Extra ATC nominations
00:07:32 <thinrichs> 2. Status updates
00:07:53 <thinrichs> 3. Synchronization during API calls discussion
00:07:57 <thinrichs> Anything else?
00:08:22 <ekcs> i’d like to talk about
00:08:38 <ekcs> how to test congress HA in tempest.
00:08:53 <ekcs> but maybe only if masahito is here.
00:09:05 <thinrichs> 4. testing Congress HA in tempest (possibly)
00:09:31 <thinrichs> #topic Extra ATC nominations
00:10:04 <thinrichs> Typically Active Technical Contributors are people who have submitted at least 1 patch
00:10:23 <thinrichs> but now it's possible to nominate someone who has been an active contributor but who has not submitted any patches
00:10:57 <thinrichs> Is there anyone like that we should consider nominating?
00:11:05 * thinrichs digging up link to description
00:11:35 <thinrichs> ATCs get free admission to the summit
00:11:42 <thinrichs> which is I think the main value
00:11:52 <thinrichs> besides acknowledging their contributions publicly
00:12:36 <ekcs> Most people have patches. Only person I can think of who wouldn’t have a Congress patch may be bryan_att.
00:13:23 <thinrichs> #link http://www.openstack.org/legal/technical-committee-member-policy/
00:14:16 <aimeeu> ekcs: That's true. I don't think he's submitted any patches to Congress, but he is actively interested and integrated Congress into OPNFV Copper.
00:14:30 <thinrichs> " Contributions
00:14:30 <thinrichs> might include, but aren't limited to, bug triage, design work, and
00:14:30 <thinrichs> documentation -- there is a lot of leeway in how teams define
00:14:30 <thinrichs> contribution for ATC status."
00:14:58 <thinrichs> Nova submits people who have Co-authored a patch but not been primary committer
00:15:36 <thinrichs> Integrating Congress into OPNFV Copper probably wouldn't do it.  That's a great open-source contribution but not a contribution to OpenStack
00:15:54 <ekcs> question: is it a project-by-project thing? like basically if someone has other openstack patches but not congress patches, does it still make any difference whatsoever to nominate someone?
00:15:57 <thinrichs> He worked on installers though and was trying to get them committed
00:16:07 <thinrichs> ekcs: I don't think so
00:16:17 <thinrichs> ekcs: I think it's binary: either you're an ATC or you're not.
00:16:36 <thinrichs> ekcs: whether you contributed to 1000 patches to 10 different projects or 1 patch
00:16:37 <aimeeu> thinrichs: Yes, something with Juju but he may not have been the primary author.
00:17:25 <thinrichs> bryan_att is the only person I can think of too.  I'll follow up with him to see what he's done and whether he's already an ATC.
00:17:42 <thinrichs> Oh, and I guess whether he's an OpenStack Foundation member, which is also required
00:18:26 <thinrichs> Moving on…
00:18:28 <aimeeu> thinrichs: I'm checking Stackalytics
00:18:39 <thinrichs> aimeeu: great!
00:18:57 <aimeeu> (well I do work with him sort of)
00:19:04 <thinrichs> #topic status
00:19:16 <thinrichs> ramineni_: want to start?
00:19:41 <ramineni_> thinrichs: ok
00:21:00 <ramineni_> not much update, worked on a couple of gate failures and tempest tests
00:21:32 <ramineni_> that's it from my side
00:21:45 <thinrichs> The gate seems to be green now.  Thanks!
00:21:50 <thinrichs> Sounded like a challenge to get it working.
00:22:24 <ramineni_> actually, in between it was because of a tempest failure, not related to our code
00:22:45 <ramineni_> the other one, ekcs actually worked on and disabled the test, as it is redundant
00:23:19 <thinrichs> Right—saw the disabled one.  Sounds like there may be some deeper issues lurking there.
00:23:30 <thinrichs> Anything to worry about?
00:24:04 <ramineni_> ya, couldn't figure out what the issue is, i thought a locking mechanism in place should fix that, masahito already has a patch for that
00:24:30 <ekcs> it seems like masahito’s patch fixes the issue.
00:24:38 <ekcs> *would* fix the issue.
00:25:44 <thinrichs> I imagine we'll find a number of locking issues as we move forward with the new arch
00:26:09 <thinrichs> ekcs: want to go next?
00:26:25 <ekcs> lossless DSD failover patch in review. lossless PE failover patch is coming.
00:26:34 <ekcs> Thought about how to do integration test of replicated Congress deployment. Maybe we'll discuss it later.
00:26:47 <ekcs> Identified a couple more issues with datasource add/delete/sync when it comes to distributed/replicated deployment. About to document in bugs.
00:26:53 <ekcs> Worked with ramineni_ on resolving gate issues.
00:27:15 <ekcs> that’s all.
00:27:56 <thinrichs> ekcs: it is lossless in the sense that it retries to exec an action?
00:28:25 <ekcs> thinrichs: yea. loose terminology.
00:28:41 <thinrichs> ekcs: just making sure I knew the patch you meant
00:29:00 <thinrichs> Maybe I missed this when looking at the patch....
00:29:16 <thinrichs> do all the PEs retry?  Or how do they know when to retry?
00:29:55 <ekcs> all PEs retry.
00:30:07 <thinrichs> on every action execution?
00:30:08 <ekcs> the DSD (whenever it comes back online) would acknowledge all “followers” and exec action from “leader”
00:30:28 <ekcs> yes.
00:31:02 <thinrichs> Got it.
00:31:57 <ekcs> if we’re not careful,
00:32:16 <ekcs> we could run out of greenthreads if lots and lots of actions take place within the retry time.
00:32:26 <ekcs> not an issue if retry is handled by oslo-messaging.
00:32:55 <thinrichs> Does oslo-messaging have retry functionality?
00:33:59 <ekcs> it does, but it’s not totally clear how it behaves and whether it fits our need.
00:34:14 <ekcs> response to your comment: The reason why I implemented our own retry logic is the following:
00:34:15 <ekcs> 1. Using built-in retry, it doesn't seem that we can separately specify how long to wait on each try and how long before we give up retrying. From reading the docs, in some situations oslo-messaging may wait the full length of the timeout on the first try, instead of retrying. The exact behavior probably depends on the transport used and other specific conditions.
00:34:16 <ekcs> 2. oslo-messaging does not support unbounded retry without timeout. I made it an option with our retry logic. But maybe we don't want that anyway. (Like maybe if the ExecutionDriver takes a full day to come back, maybe it never makes sense to retry executing the action after that long.)
00:34:59 <thinrichs> Understood.  Thanks for copying in your response here.
00:35:03 <ekcs> I also asked this on openstack-dev, but haven’t gotten anything helpful.
00:35:10 <ekcs> Hi all, I have a question about oslo-messaging RPC retry. Thanks so much!
00:35:11 <ekcs> Say I set timeout=600 and retry=None (unbounded retry within time), would an rpc call wait 600s on the first try or would it do multiple retries until 600s total time has elapsed? What factors does the answer depend on?
00:35:11 <ekcs> Alternatively, if I set timeout=600 and retry=100, would it be 600s total time before failing? or would it be 600s timeout for each try?
00:35:22 <ekcs> May need to just read the code and test it myself to figure out.
00:36:00 <ekcs> not clear how to test how many failed retries take place.
00:36:27 <thinrichs> Right—retry logic seems hard to test.  Maybe better to look at code/comments for that one.
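
A minimal sketch of the kind of retry wrapper ekcs describes above, where the wait on each try and the time before giving up are set separately and unbounded retry is an option. This is not the code from the patch under review: `call_with_retry`, `RetryGaveUp`, and every parameter are hypothetical names, and the built-in `TimeoutError` stands in for whatever the RPC layer actually raises. The clock and sleep functions are injectable so that failed retries can be counted in a unit test without really waiting.

```python
import time


class RetryGaveUp(Exception):
    """Raised when the overall deadline passes without a successful call."""


def call_with_retry(func, per_try_timeout, total_timeout=None, pause=0.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call func(per_try_timeout) until it succeeds or total_timeout elapses.

    total_timeout=None means unbounded retry (assumption: the caller really
    wants that). Returns a (result, attempts) tuple so tests can count tries.
    """
    deadline = None if total_timeout is None else clock() + total_timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            return func(per_try_timeout), attempts
        except TimeoutError:
            # Give up only once the overall deadline has passed.
            if deadline is not None and clock() >= deadline:
                raise RetryGaveUp("gave up after %d attempts" % attempts)
            sleep(pause)
```

With a fake clock passed in for `clock` and `sleep`, a test can assert the exact number of attempts made before the wrapper gives up, which addresses the "how many failed retries" question for our own logic (though not for oslo-messaging's built-in retry).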
00:37:17 <thinrichs> aimeeu: want to give a status update?
00:37:26 <aimeeu> Sure. Unfortunately not much from me this past week due to lots of last-minute meetings. Code reviews, and thanks ekcs for the feedback on the HA Overview guide. #link https://review.openstack.org/#/c/350731/
00:38:02 <aimeeu> That's it from me
00:38:56 <thinrichs> aimeeu: anything you need from us?  Last week you were looking for a feature to add and test.  Did you find something?
00:39:16 <aimeeu> ekcs gave me some ideas to ponder
00:40:07 <thinrichs> This isn't a feature, but we talked about replacing exception.message with the str(exception).
00:40:56 <aimeeu> Yes, I started on that but the majority of changes are in the horizon plug-in, so I was waiting until ramineni's patch was merged
00:41:04 <thinrichs> Got it.
00:41:30 <thinrichs> I'll keep my eye out then, as I'm (slowly) getting back into things after vacation
00:41:55 <aimeeu> I can probably finish up the exception patch tomorrow
00:42:07 <thinrichs> aimeeu: great!
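
For context on the exception.message change discussed above: on Python 3, exceptions no longer have a `.message` attribute, so `str(exception)` is the portable way to get the text. A small illustration follows; `error_text` is a hypothetical helper name, not something from the Congress code base.

```python
def error_text(exc):
    """Return the human-readable text of an exception.

    str(exc) works on both Python 2 and Python 3, whereas exc.message
    raises AttributeError on Python 3.
    """
    return str(exc)


try:
    raise ValueError("policy not found")
except ValueError as err:
    text = error_text(err)
```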
00:42:22 <thinrichs> status update from me…
00:42:32 <thinrichs> Trying to get that large patch in that disables Dse1.
00:42:51 <thinrichs> Rebased today and addressed lingering comments.  Think it's ready to go.
00:43:03 <thinrichs> Other than that, trying to get back up to speed from vacation
00:43:17 <thinrichs> That's it.
00:43:27 <thinrichs> 2 items left on the agenda..
00:43:41 <thinrichs> synchronization on APIs and HA testing with tempest
00:44:10 <thinrichs> ekcs: since you've spent time thinking about it, I'd suggest HA testing with tempest
00:44:15 <thinrichs> unless you want to wait for masahito
00:44:32 <ekcs> i’ll put out what I have now
00:44:38 <thinrichs> great
00:44:46 <ekcs> and hear your comments and see what masahito thinks later.
00:44:49 <ekcs> may be quick.
00:44:54 <ekcs> Last meeting we discussed whether to add new HA tests to existing tempest job or create a new one.
00:45:02 <ekcs> It's not clear how to test replicated Congress in tempest because:
00:45:03 <ekcs> 1. Not sure we can (and how complex it is to) deploy a load balancer in the test env.
00:45:03 <ekcs> 2. Not clear how easy it is to kill and restart individual processes in the test env.
00:45:30 <ekcs> My current thinking is this:
00:45:31 <ekcs> For basic functionality testing: deploy 1 API + n PE + 1 DSDs-node, run all the current tempest tests (except the replica-HA test). Some tests would need to be modified to wait/retry because different instances get out of sync.
00:46:13 <thinrichs> Agree with basic functionality
00:46:16 <ekcs> Notice that this is NOT how we recommend deploying. We recommend n API+PE nodes behind a load balancer. But we can test it this way to avoid having to setup a load balancer. Not ideal, but I think it works.
00:46:57 <thinrichs> Hang on… missed the n PEs
00:47:16 <thinrichs> Does that work?
00:47:37 <thinrichs> I thought the API was hard-coded to send to 1 engine
00:47:40 <thinrichs> by name
00:48:10 <ramineni_> thinrichs: i think it sends if engine exists locally otherwise rpc
00:48:19 <ramineni_> so it should work
00:48:29 <ramineni_> ekcs: right?
00:48:32 <thinrichs> how does the API pick which engine to send it to?
00:48:35 <ekcs> Yes, but when we run multiple PEs with the same service name (different node name), oslo-messaging actually just arbitrarily picks one.
00:49:00 <thinrichs> Interesting behavior from oslo-messaging
00:49:07 <ekcs> or send to multiple and take response from first.
00:49:26 <ekcs> either way should work for tempest purpose.
00:49:36 <thinrichs> Another option is to deploy n API+PEs and 1 DSD, like we recommend...
00:49:48 <thinrichs> and then write tests that implement the load-balancer
00:50:13 <thinrichs> which we probably want to do anyway
00:50:34 <thinrichs> Such as write policy to API+PE 1; then run query on API+PE 2
00:51:03 <ekcs> that’s an option too.
00:51:31 <thinrichs> I could imagine having a LB class that simulates the behavior of a LB, for when we want to do that.
00:51:32 <ekcs> but then we get less reuse out of our current tempest tests.
00:51:50 <thinrichs> I see
00:52:21 <ekcs> yes, we can do that too. it’s a bit complicated though. I was thinking we would deploy N API+PEs on different ports. and load balance between them.
00:52:27 <ekcs> as an option.
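
A rough sketch of what an "LB class" for tests might look like, together with a wait/retry helper for the case where replicas are briefly out of sync. Everything here is an assumption for illustration: `RoundRobinBalancer`, `wait_until`, and the backend names are hypothetical, and round-robin is only one plausible policy for a simulated load balancer.

```python
import itertools
import time


class RoundRobinBalancer:
    """Hands out backends in turn, imitating a simple load balancer."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        return next(self._cycle)


def wait_until(predicate, timeout=10.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it is true or timeout seconds have passed.

    Returns True on success and False on timeout, so a test can assert on it.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical backend names; a real test would use the replicas' endpoints.
lb = RoundRobinBalancer(["api-pe-1", "api-pe-2"])
order = [lb.next_backend() for _ in range(4)]
```

A test in this style could write a policy through the backend returned by one `next_backend()` call, then wrap the query against the next backend in `wait_until` so that a short synchronization delay does not fail the test.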
00:52:41 <ekcs> For testing different failover scenarios, I'm thinking maybe the best place to start is using the unittest environment (similar to test_congress). Again it's not ideal because it's not how things actually get deployed, but it makes it much easier to manipulate and create failure scenarios.
00:52:41 <ekcs> But that's where I want to hear from you guys and masahito to see if I'm mis-estimating the complexity of different things. Like maybe it's easy to manipulate processes in tempest. And maybe it's easy to deploy load balancer in tempest.
00:53:15 <ekcs> and that’s where we can also do tests simulating LB behavior.
00:53:51 <ekcs> that’s all from me.
00:54:05 <thinrichs> I'd say it makes sense to use test_congress tests for more comprehensive testing.
00:54:39 <thinrichs> I usually think of tempest tests as checking that deployment and devstack are set up correctly.
00:54:58 <thinrichs> And tempest tests parallelism.
00:55:00 <ramineni_> thinrichs: spawning new processes and testing with current tempest tests should be doable
00:55:32 <thinrichs> I guess we're already spinning up a new Congress instance inside the tempest tests.
00:55:36 <ekcs> ramineni_: right that’s an option as well.
00:55:38 <thinrichs> So there's the functionality to start new processes
00:55:53 <ramineni_> yes
00:55:55 <thinrichs> which makes spinning up LBs inside of tempest a possibility
00:56:01 <ekcs> it’s just not clear that it offers any benefit over test_congress style.
00:56:21 <thinrichs> is test_congress running with true parallelism?
00:56:46 <thinrichs> Can we effectively test race conditions?
00:57:07 <ekcs> I assume that if you start new processes then it’s parallel.
00:57:18 <ekcs> even if only one core is allocated, you still get to test race conditions.
00:57:38 <thinrichs> I'm always in favor of testing inside of test_congress since then we get feedback quicker when we've broken something.
00:57:50 <thinrichs> And I think it ends up being more comprehensive.
00:58:03 <thinrichs> because the tests are easier to write
00:58:14 <ekcs> yup that’s my thought too. but want to hear/consider if there are advantages to doing it in tempest.
00:58:16 <ramineni_> I agree, they would be more comprehensive
00:58:32 <ramineni_> for tempest, maybe we can go for basic testing
00:59:24 <thinrichs> We definitely want to test the real deployment with LBs, etc. using tempest.  Using test_congress for comprehensiveness makes sense to me too.
00:59:43 <thinrichs> 1 minute left.  Any last thoughts?
01:00:00 <ramineni_> i thought of adding unit tests for synchronizer too
01:00:13 <ramineni_> https://bugs.launchpad.net/congress/+bug/1609223
01:00:13 <openstack> ramineni_: Error: Could not gather data from Launchpad for bug #1609223 (https://launchpad.net/bugs/1609223). The error has been logged
01:00:15 <thinrichs> ramineni_: that would be cool!
01:00:26 <thinrichs> out of time
01:00:34 <thinrichs> Thanks all!
01:00:56 <ekcs> thanks!
01:01:17 <thinrichs> #endmeeting