20:00:09 <johnsom> #startmeeting Octavia
20:00:10 <openstack> Meeting started Wed Aug 29 20:00:09 2018 UTC and is due to finish in 60 minutes.  The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:14 <openstack> The meeting name has been set to 'octavia'
20:00:18 <cgoncalves> o/
20:00:19 <nmagnezi> o/
20:00:27 <johnsom> Hi folks
20:00:42 <johnsom> #topic Announcements
20:00:55 <johnsom> Same reminder about the PTG etherpad:
20:01:00 <johnsom> #link https://etherpad.openstack.org/p/octavia-stein-ptg
20:01:20 <johnsom> The PTG is coming up fast, so I will try to put a rough schedule together soon
20:01:43 <johnsom> Also a note, the TC election nominations start tomorrow.
20:01:50 <johnsom> #link https://governance.openstack.org/election/
20:02:01 <johnsom> In case you are interested in running for the TC
20:02:38 <johnsom> The official Rocky release patch is up for review. I have already checked the SHAs and our stuff looks good.
20:03:06 <nmagnezi> Nice
20:03:48 <johnsom> Any other announcements today?
20:04:14 <johnsom> #link https://review.openstack.org/597529
20:04:24 <johnsom> Finally found the Rocky release link...
20:04:28 <cgoncalves> Octavia Queens 2.0.2 tagging is blocked until the stable team reviews a commit that caught their attention
20:04:39 <johnsom> Yeah, I saw that.
20:05:02 <cgoncalves> #link https://review.openstack.org/#/c/593954/
20:05:03 <johnsom> FYI, I have already bumped the OpenStack Ansible SHA to use that 2.0.2 version
20:05:20 <johnsom> I think it will be ok since it includes a default.
20:06:08 <johnsom> That is all I have so, moving on down the agenda
20:06:18 <johnsom> #topic Brief progress reports / bugs needing review
20:06:41 <johnsom> I have wrapped up most of my internal work so should have a bit more time to work on things upstream.
20:07:23 <johnsom> I recently wrote a tool to stress test the health manager process. It can also be used to populate a DB with load balancers.
20:07:41 <johnsom> I will probably post that in my github space sometime in the near future.
20:08:00 <cgoncalves> nice, thanks!
20:08:10 <cgoncalves> do share the link later :)
20:08:11 <johnsom> Testing has shown that we have a bit of a performance regression in the HM and I have a patch for that in the works.
20:08:53 <nmagnezi> johnsom, if only we had such tools for nlbaas :D
20:08:56 <johnsom> I also noticed the UDP code is not reporting listener health correctly, and the workaround code is in the way of my HM fix, so I will also be posting a fix for that soonish
20:09:34 <johnsom> nmagnezi HA, well, you can run my existing API stress tool against neutron-lbaas....  But the results aren't pretty
20:09:54 <nmagnezi> yeah.. we know.. :)
20:10:00 <johnsom> It does pass well on Octavia however
20:10:35 <johnsom> That is mostly what I have been up to over the week.  Any other updates?
20:10:48 <johnsom> Oh, I forgot one item
20:10:59 <johnsom> #link https://review.openstack.org/#/c/594786/
20:11:20 <johnsom> I have posted an alternate option for how we can handle API versioning in the tempest plugin.
20:11:34 <johnsom> rm_work also posted an idea for it.
20:11:46 <johnsom> We should try to review those and merge one soonish.
20:12:08 <johnsom> Without this, the tempest plugin fails when run against queens Octavia
20:12:21 <nmagnezi> Yes, we saw that in-house
20:12:30 <nmagnezi> Thanks so much for submitting that one
20:12:33 <johnsom> It tries to test features that were added in Rocky on the Queens cloud
20:12:46 <nmagnezi> Yeah stuff like listener timeout and such
20:13:00 <johnsom> Yep
20:13:13 <johnsom> This patch tests that patch with stable/queens:
20:13:17 <johnsom> #link https://review.openstack.org/#/c/595257/
20:13:28 <johnsom> Reviews would be appreciated.
20:14:34 <nmagnezi> Ack
20:14:35 <cgoncalves> ah, forgot to re-review. you made jobs voting, thanks
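
The tempest plugin failure discussed above comes from running Rocky-era feature tests against a Queens cloud. A minimal sketch of the general idea, gating a test on the API version the cloud reports, is shown below. The helper names and the version strings are hypothetical and are not taken from either proposed patch.

```python
def parse_version(text):
    """Turn a 'major.minor' string such as '2.1' into a comparable tuple."""
    major, minor = text.split(".")
    return int(major), int(minor)


def is_version_supported(cloud_version, min_version):
    """Return True if the cloud's API version is at least min_version.

    Tuples of ints are compared rather than raw strings, so that
    '2.10' correctly sorts after '2.9'.
    """
    return parse_version(cloud_version) >= parse_version(min_version)


def should_run(feature_min_version, cloud_version):
    """Hypothetical gate a test could call before exercising a feature."""
    return is_version_supported(cloud_version, feature_min_version)


# Illustrative values only: assume a feature that needs API 2.1.
print(should_run("2.1", "2.0"))  # older cloud: skip the test
print(should_run("2.1", "2.1"))  # newer cloud: run the test
```

A test would call such a gate first and skip itself when it returns False, so the same plugin can run against both older and newer clouds.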
20:14:44 <johnsom> Any other updates?
20:14:54 <nmagnezi> Speaking of reviews, I have a question about another patch you posted
20:14:56 <cgoncalves> lol, I already re-reviewed :)
20:15:09 <nmagnezi> #link https://review.openstack.org/#/c/585031/
20:15:25 <johnsom> Sure, what is the question?
20:15:44 <nmagnezi> I also posted this in gerrit, what would happen if we have a new controller (with this patch) with an older amp
20:15:51 <nmagnezi> And we try to ask for stats
20:16:26 <johnsom> You should get an answer as this doesn't connect to the AMP, it just pulls data we already have in the database in a different way.
20:17:16 <nmagnezi> Oh, alright. didn't get to test it just yet sadly, just a thought that I had
20:18:14 <johnsom> Yeah, as of now, there is never a connection from the API process directly to an Amphora.  Only the other three processes
20:19:08 <johnsom> The background on this patch and the two in tempest-plugin is I needed a gate test that tests out VRRP failover for an internal request.
20:19:32 <johnsom> The best way I could come up with to figure out which amp is passing traffic was to look at the per-amphora stats.
20:20:16 <johnsom> The gate test works, but I still need to add some tempest API tests to the middle patch in that chain, so I have it WIP right now
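
The approach described above, using per-amphora stats to work out which amphora is passing traffic, could be sketched roughly as follows. This is an illustration under assumptions, not the code in the patch: the function name and the shape of the input (a mapping from amphora ID to a connection counter) are hypothetical.

```python
def find_active_amphora(before, after):
    """Return the ID of the amphora whose connection counter grew.

    before, after: dicts mapping amphora ID -> total connection count,
    sampled before and after sending test traffic (hypothetical shape).
    """
    growth = {amp: count - before.get(amp, 0) for amp, count in after.items()}
    active = [amp for amp, delta in growth.items() if delta > 0]
    if len(active) != 1:
        # Either no traffic was seen or more than one amphora served it.
        raise ValueError("expected exactly one active amphora, got %r" % active)
    return active[0]


# Illustrative counters: only amp-b handled the new connections.
before = {"amp-a": 5, "amp-b": 0}
after = {"amp-a": 5, "amp-b": 12}
print(find_active_amphora(before, after))
```

Comparing two samples rather than reading a single snapshot avoids mistaking connections from before a failover for current traffic.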
20:21:32 <johnsom> Ok, if we don't have any more updates, I will move on.
20:21:42 <johnsom> #topic Upgrade-checker community goal
20:22:07 <johnsom> One of the two community goals for Stein is to have an "upgrade-checker" script like nova has.
20:22:33 <johnsom> Frankly I think this is low hanging fruit if someone is looking for a small project.
20:22:40 <johnsom> There are details here:
20:22:47 <johnsom> #link http://lists.openstack.org/pipermail/openstack-dev/2018-August/133888.html
20:23:36 <johnsom> Just wanted to raise awareness in case someone is looking for a little project.
20:23:57 <johnsom> I don't think we have much that would need to go in there at the moment.
20:24:58 <johnsom> #topic Open Discussion
20:25:08 <johnsom> Any other topics today?
20:26:20 <cgoncalves> we've received a couple of reports these past days that octavia can bring down all loadbalancers due to a DB outage
20:26:26 <cgoncalves> #link https://storyboard.openstack.org/#!/story/2003575
20:26:50 <cgoncalves> I plan to work on this as soon as I can
20:27:09 <cgoncalves> a traceback can be read here: https://bugzilla.redhat.com/show_bug.cgi?id=1603138#c4
20:27:09 <openstack> bugzilla.redhat.com bug 1603138 in openstack-octavia "Controller replacement with Octavia- After controller replacement amphora is in ERROR state" [Urgent,Closed: duplicate] - Assigned to cgoncalves
20:27:40 <johnsom> Yeah, bummer.  I have added a comment to that story with an idea of how to approach it
20:28:07 <cgoncalves> examples of operations that can trigger this issue are: node reboot, db service restart, cloud upgrade (causing DB downtime)
20:28:29 <cgoncalves> so, whatever causes DB downtime...
20:28:56 <johnsom> Well, ideally you are running a DB cluster so minor things like a DB restart should not impact Octavia
20:29:42 <nmagnezi> a temporary network disruption can also block the connection to the DB
20:29:47 <cgoncalves> yeah, I need to check why it still caused that on 3-controller HA deployments
20:30:56 <johnsom> There are built-in DB retries in the oslo DB layer, so short network blips should not trigger it either.  Plus, with the default settings it would have to be unreachable for 60 seconds or more across all of the HMs
20:31:49 <johnsom> Maybe there is a bug in galera (if that is what you are using for clustering) or oslo DB.  We should do some testing.
20:32:03 <johnsom> Either way, I think we should improve how the HMs handle the situation
20:32:26 <cgoncalves> agreed. we should catch db_exc.DBConnectionError exceptions in health_check()
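
The suggestion above, catching DB connection errors in health_check(), might look something like the sketch below. It is a simplified illustration, not the Octavia implementation or the eventual fix: the DBConnectionError class here is a local stand-in for oslo_db.exception.DBConnectionError so the example runs on its own, and health_check_pass and its callback parameters are hypothetical.

```python
class DBConnectionError(Exception):
    """Local stand-in for oslo_db.exception.DBConnectionError."""


def health_check_pass(find_stale_amphorae, failover):
    """Run one health check pass.

    find_stale_amphorae: callable returning stale amphora IDs (queries the DB).
    failover: callable invoked once per stale amphora.
    Returns the number of failovers triggered, or None if the DB was
    unreachable and the pass was skipped.
    """
    try:
        stale = find_stale_amphorae()
    except DBConnectionError:
        # The DB is unreachable: skip this pass instead of crashing or
        # acting on incomplete data; the caller can retry later.
        return None
    for amp_id in stale:
        failover(amp_id)
    return len(stale)


def db_is_down():
    raise DBConnectionError("database unreachable")


triggered = []
print(health_check_pass(db_is_down, triggered.append))
print(health_check_pass(lambda: ["amp-1"], triggered.append))
```

Skipping the pass only covers the crash side of the problem. Deciding what counts as stale once the DB comes back is a separate question that this sketch does not address.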
20:32:42 <xgerman_> sorry, had some prod issue to attend...
20:33:04 <cgoncalves> xgerman_, let me guess. octavia brought down all LBs due to DB outage? xd
20:33:17 <xgerman_> no, unrelated to Octavia
20:33:38 <nmagnezi> He had to look around for zombies
20:33:52 <cgoncalves> ha! :)
20:34:59 <johnsom> cgoncalves Feel free to ping me if you want to bounce ideas around....
20:35:33 <cgoncalves> johnsom, I most certainly will, thanks :)
20:35:42 <johnsom> Other topics for today?
20:36:44 <xgerman_> PTG? Did you plug the etherpad?
20:37:09 <xgerman_> ah saw it
20:38:02 <johnsom> Yeah, I posted the normal reminder at the start of the meeting
20:38:39 <xgerman_> +1
20:38:41 <johnsom> xgerman_ lol, oh, we will be talking about Octavia.....  (topic 18 in the etherpad)
20:38:52 <cgoncalves> I like item "18. Octavia". let's discuss Octavia at the PTG xD
20:39:20 <johnsom> Ok, if we don't have any more topics, I will close out the meeting.
20:40:28 <johnsom> Ok, thanks folks!
20:40:34 <johnsom> #endmeeting