20:00:03 <johnsom> #startmeeting Octavia
20:00:04 <openstack> Meeting started Wed May  2 20:00:03 2018 UTC and is due to finish in 60 minutes.  The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:08 <openstack> The meeting name has been set to 'octavia'
20:00:16 <johnsom> Hi folks!
20:00:49 <cgoncalves> hi
20:00:52 <xgerman_> o/
20:01:00 <johnsom> #topic Announcements
20:01:12 <johnsom> The only announcement I have this week is that we have a new TC elected:
20:01:17 <xgerman_> +1
20:01:18 <johnsom> #link https://governance.openstack.org/election/results/rocky/tc.html
20:02:06 <johnsom> Oh, and there is now an Octavia ingress controller for Kubernetes
20:02:13 <johnsom> #link https://github.com/kubernetes/cloud-provider-openstack/tree/master/pkg/ingress
20:02:35 <johnsom> Any other announcements this week?
20:03:14 <johnsom> #topic Brief progress reports / bugs needing review
20:03:49 <johnsom> I have been busy working on the provider driver.  The Load Balancer part is now complete and up for review comments.
20:03:56 <johnsom> #link https://review.openstack.org/#/c/563795/
20:04:16 <johnsom> It got a bit big due to single-call-create being part of load balancer.
20:04:30 <rm_work> o/
20:04:32 <johnsom> So, I'm going to split it across a few patches (and update the commit to reflect that)
20:05:01 <nmagnezi> johnsom, thank you for taking the lead on this. I will review it.
20:05:06 <johnsom> Ha, I guess there is that announcement as well
20:05:33 <rm_work> I have been working on the octavia tempest plugin. Two patches ready for review (although I need to address johnsom's comments)
20:05:36 <johnsom> I think the listener one will be a good example for what needs to happen with the rest of the API.  It's up next for me
20:05:53 <johnsom> +1 on tempest plugin work
20:07:06 <johnsom> Any updates on Rally or grenade tests?
20:07:53 <cgoncalves> sorry, I still need to resume the grenade patch
20:08:24 <johnsom> Ok, NP.  Just curious for an update.
20:08:32 <nmagnezi> johnsom, the rally scenario now works. I have some other internal fires to put out and then I'll iterate back to run it and report the numbers. It had a bug with the load balancer cleanup which is fixed now, so we are in good shape there overall.
20:08:47 <johnsom> Cool!
20:09:11 <johnsom> Any other updates this week or should we move on to our next agenda topic?
20:09:24 <nmagnezi> yeah :) it took quite a few tries but it was worth the effort, I think.
20:09:36 <johnsom> #topic Discuss health monitors of type PING
20:09:44 <johnsom> #link https://review.openstack.org/#/c/528439/
20:09:53 <johnsom> nmagnezi This is your topic.
20:10:04 <nmagnezi> open it ^^ while gerrit still works :)
20:10:13 <rm_work> PING is dumb and should be burned with fire
20:10:17 <nmagnezi> so, rm_work submitted a patch to allow operators to block it
20:10:26 <johnsom> I can give a little background on why I added this feature.
20:10:39 <cgoncalves> rm_work: wait for it. I think you will like it ;)
20:10:46 <johnsom> 1. Most load balancers offer it.
20:10:49 <rm_work> johnsom: because you want users to suffer?
20:10:52 <nmagnezi> i commented that I understand rm_work's point, but I don't know if adding a config option is a good idea here
20:11:02 <nmagnezi> rm_work, lol
20:11:33 <rm_work> we're handing them a gun and pointing it at their foot for them
20:11:34 <nmagnezi> anyhow, the discussion I think we should have is whether or not we want to deprecate and later remove this option from our API
20:11:47 <rm_work> cgoncalves: you're right :)
20:11:57 <johnsom> 2. I was doing some API load testing with members and wanted them online, but not getting HTTP hits to skew metrics.
20:12:53 <rm_work> you could also just ... not use HMs in a load test... they'll also be "online"
20:13:02 <rm_work> or use an alternate port
20:13:10 <johnsom> Well, they would be "no monitor"
20:13:35 <rm_work> does TCP Connect actually count for stats?
20:13:36 <johnsom> It was basically, ping localhost so they all go online no matter what.
20:14:17 <johnsom> So, I'm just saying there was a reason I went to the trouble to fix that (beyond the old broken docs that listed it)
20:15:11 <rm_work> we could rename it to "DO_NOT_USE_PING"
20:15:16 <nmagnezi> johnsom, your opinion is that we should keep ping hm as is?
20:15:38 <johnsom> Now, I fully understand that joe-I-don't-know-jack-but-am-a-load-balancer-expert will use PING for all of the wrong reasons....  I have seen it with my own eyes.
20:16:18 <rm_work> in *most openstack clouds* the default SG setup is to block ICMP
20:16:29 <rm_work> though I guess I can't back that up with actual survey data
20:16:47 <johnsom> Nice, so they instantly fail and they don't get too burned by being dumb
20:16:54 <johnsom> grin
20:16:56 <rm_work> so people are like "all my stuff is down, your thing is broken"
20:17:41 <xgerman_> I dislike "most openstack clouds" — there are some wacky clouds out there
20:17:46 <rm_work> lol
20:18:02 <johnsom> My stance is, most, if not all of our load balancers support it. There was at least one use case for adding it. It's there and works (except on centos amps). Do we really need to remove it?
20:18:05 <nmagnezi> johnsom, in your eyes, what are the right reasons for using ping hm?
20:18:16 * xgerman_ read about people using k8s to loadbalance since they don’t want to upgrade from Mitaka
20:18:27 <johnsom> Testing purposes only...  Ha
20:18:34 <nmagnezi> lol
20:19:19 <nmagnezi> I'm not asking if we should or shouldn't remove this because of the centos amps. I'm asking because it seems that everyone agrees with rm_work's gentle statements about ping :)
20:19:38 * rm_work is so gentle and PC
20:20:17 <rm_work> tremendously gentle, everyone says so. anyone who doesn't is fake news
20:20:27 <johnsom> #link http://andrewkandels.com/easy-icmp-health-checking-for-front-end-load-balanced-web-servers
20:20:29 <johnsom> lol
20:20:34 <cgoncalves> +1. Unless there's a compelling use case for keeping ping, I'm for removing it
20:20:48 <rm_work> we SHOULD probably check with some vendors
20:20:54 <rm_work> I wish we had more participation from them
20:20:58 <nmagnezi> the point i'm trying to make here is that if ping is something we would want to keep, i don't think we need a config option to block it.
20:21:06 <xgerman_> +1
20:21:12 <rm_work> I don't even see most of our vendor contacts in-channel anymore
20:21:20 <nmagnezi> if we agree that it should be removed, we don't need that config option as well :)
20:21:26 <xgerman_> that’s why we are doing providers
20:21:38 <rm_work> nmagnezi: yeah, this was supposed to be a compromise
20:21:53 <rm_work> you could argue that all compromise is bad and we should just pick a direction
20:21:54 <xgerman_> anyhow, I think ping has value — not everybody runs HTTP or TCP
20:22:00 <xgerman_> we have UDP coming up
20:22:06 <johnsom> Yeah, from what I see, all of our vendors support ICMP
20:22:15 <rm_work> alright
20:22:16 <rm_work> well
20:22:37 <xgerman_> just trying to think through a UDP healthmonitor
20:22:39 <johnsom> This is true, UDP is harder to check
20:22:42 <rm_work> yes
20:22:57 <johnsom> Maybe someone will want us to load balance ICMP....
20:22:58 <johnsom> grin
20:23:03 <rm_work> but that's why there's TCP_CONNECT and alternate ports
20:23:03 <nmagnezi> HAHA
20:23:33 <rm_work> any reason a UDP member wouldn't allow a TCP_CONNECT HM with the monitor_port?
20:23:53 <johnsom> Yes, if they don't have any TCP code....
20:24:09 <nmagnezi> rm_work, that might depend on the app you run on the members
20:24:51 <rm_work> i mean
20:24:54 <johnsom> Yeah, so F5, A10, radware, and netscaler all have ICMP health check options
20:24:56 <rm_work> you would run another app
20:25:04 <rm_work> that is a health check for the UDP app
20:25:08 <rm_work> to make sure it is up, etc
20:25:33 <rm_work> so combo of connectable + 200OK response == good
20:25:43 <rm_work> I was pretty sure that was the standard for healthchecking stuff and why we added the monitor_port thing to begin with
20:25:48 <johnsom> Well, some of this UDP stuff is for very dumb/simple devices. That was the use case discussed at the PTG around the need for UDP
20:26:02 <nmagnezi> rm_work, sounds a little bit redundant. If you want to check the health of your ACTUAL app, why have another one just to answer the lb?
20:26:06 <xgerman_> probably not too dumb for ICMP
20:26:25 <nmagnezi> (but you could argue the same for ICMP, but at least it checks networking.. ha)
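(For reference, a minimal sketch of the pattern rm_work describes: a TCP connect health monitor, API type TCP, aimed at a separate monitor port on each member. The pool/member names and addresses below are placeholders, and it assumes the --monitor-port option in python-octaviaclient.)

    # Plain TCP connect check on the pool, no HTTP or ICMP involved
    openstack loadbalancer healthmonitor create --name hm-tcp \
        --type TCP --delay 5 --timeout 3 --max-retries 3 pool1

    # Member serving the real app on port 53, with health checks directed
    # at a small TCP "am I healthy" endpoint listening on port 8080
    openstack loadbalancer member create --name member1 \
        --address 192.0.2.10 --protocol-port 53 \
        --monitor-port 8080 pool1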
20:26:28 <johnsom> So, if the concern is for users mis-using ICMP, should we maybe just add a warning print to the client and dashboard?
20:26:37 <nmagnezi> johnsom, +!
20:26:40 <nmagnezi> johnsom, +1
20:26:48 <xgerman_> +1
20:26:58 <rm_work> k T_T
20:27:01 <rm_work> I am ok with this
20:27:03 <nmagnezi> johnsom, i would add another warning to the logs as well
20:27:25 <cgoncalves> +1, plus a warning msg on the server side?
20:27:32 <rm_work> eh, logs just go to ops, and they can see it in the DB
20:27:33 <rm_work> which is easier to check
20:27:34 <rm_work> and they already know it's dumb
20:27:38 <rm_work> i wouldn't bother with the server side
20:27:45 <johnsom> Eh, not sure operators would care that much what health monitors the users are setting.  Does that cross the "INFO" log level?????
20:27:46 <rm_work> its users we need to reach
20:28:28 <nmagnezi> johnsom, a user being dump sounds like a warning to me :)
20:28:34 <nmagnezi> dumb*
20:29:12 <johnsom> Yeah, I just want us to strike a balance between filling up log files with noise and having actionable info in there.
20:29:38 <nmagnezi> well, you only print it once, when it's created
20:29:50 <nmagnezi> so it's not spamming the logs that bad
20:30:02 <johnsom> Ha, I have seen projects with 250 LBs in it.  Click-deploy....
20:30:27 <johnsom> I am ok with logging it, no higher than INFO, if you folks think it is useful
20:30:41 <nmagnezi> fair enough.
20:30:51 <rm_work> wait, isn't info the one that always prints?
20:31:01 <rm_work> or, i guess that was your point
20:31:02 <rm_work> k
20:31:08 <johnsom> It would be some "fanatical support" to have agents call the user that just did that....  Grin
20:31:40 <rm_work> I would set up an automated email job
20:31:46 <nmagnezi> lol
20:32:02 <johnsom> That was flux...
20:32:21 <johnsom> Ha, ok, so where are we at with the config patch?
20:32:22 <rm_work> "We noticed you just created a PING Health Monitor for LB #UUID#. We recommend you reconsider, and use a different method for the following reasons: ...."
20:33:04 <rm_work> I mean... I would still like to be able to disable it, personally, but I grant that it should probably remain an option at large (however reluctantly)
20:33:07 <johnsom> I can open a story to add warnings to the client and dashboard
20:33:32 <rm_work> I can put WIP on this one or DNM or whatever, and just continue to pull it in downstream I guess <_<
20:33:48 <rm_work> I just figured a config couldn't hurt
20:34:11 <rm_work> the way I designed it, it would explain to the user when it blocks the creation
20:34:17 <nmagnezi> rm_work, if everyone else agrees on that, I will not be the one to block it. Just wanted to raise discussion around this topic
20:34:22 <johnsom> I am ok with empowering operators myself
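(A purely hypothetical sketch of the kind of operator knob being discussed here; the option name below is made up for illustration and is not necessarily what the patch under review proposes.)

    [api_settings]
    # Hypothetical option, illustration only: health monitor types the
    # operator refuses to create, with an explanatory error returned to
    # the user at creation time.
    # disallowed_health_monitor_types = PING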
20:34:55 <rm_work> can we get CentOS to 1.8? :P
20:35:05 <rm_work> I'd have a much weaker case then
20:35:13 * xgerman_ wrong person to ask
20:35:14 <cgoncalves> +1, still knowing nmagnezi is not a fan of adding config options like this
20:35:15 <nmagnezi> cgoncalves and I are working on it. It's not easy but we are doing our best :)
20:35:25 <cgoncalves> rm_work: soon! ;)
20:35:33 <rm_work> k
20:35:33 <nmagnezi> rm_work, we'll keep you posted
20:35:34 <rm_work> I mean
20:35:35 <rm_work> if we got a more official repo
20:35:40 <rm_work> we don't even need it in the main repo
20:35:49 <rm_work> we could merge my patch to the amp agent element
20:35:54 <rm_work> err, amp element
20:36:10 <rm_work> (which I already pull in downstream)
20:36:21 <cgoncalves> rm_work: short answer is: likely to have 1.8 in OSP14 (Rocky)
20:36:32 <rm_work> in what way?
20:36:41 <rm_work> CentOS amps based on CentOS8?
20:36:51 <rm_work> Official repo for OpenStack HAProxy?
20:37:00 <rm_work> HAProxy 1.8 backported into CentOS7?
20:37:34 <cgoncalves> cross tag: haproxy rpm in the OSP repo, same rpm as from the openshift/PaaS repo
20:37:43 <rm_work> ok
20:37:53 <rm_work> so we would update and merge my patch
20:38:06 <cgoncalves> we will keep haproxy 1.5 but add a 'haproxy18' package
20:38:11 <rm_work> yeah
20:38:18 <johnsom> #link https://storyboard.openstack.org/#!/story/2001957
20:38:54 <cgoncalves> rm_work: you could then delete the repo add part from your patch
20:39:01 <rm_work> ok
20:39:08 <rm_work> i wish i could look up that CR now >_>
20:39:14 <rm_work> great timing on gerrit outage for us, lol
20:40:55 <johnsom> So, I guess to close out the PING topic, vote on the open patch. (once gerrit is back)
20:41:16 <johnsom> #topic Open Discussion
20:41:23 <johnsom> Any topics today?
20:41:33 <rm_work> Multi-AZ?
20:41:42 <rm_work> I have a patch, it is actually reasonable to review
20:42:01 <rm_work> the question is... since it will only work if every AZ is routable on the same L2... is this reasonable to merge?
20:42:26 <rm_work> At least one other operator was doing the same thing and even had some similar patches started
20:42:28 <johnsom> We have a bionic gate and it is passing, but I'm not sure how, given the networking changes they made. It must have a backward compatibility feature.  It's on my list to go update the amphora-agent for bionic's new networking.
20:43:28 <johnsom> I have not looked at the AZ patch, so can't really comment at the moment
20:43:32 <rm_work> (or if they're using an L3 networking driver)
20:43:44 <rm_work> k, it's more about whether the concept is a -2 or not
20:45:27 <johnsom> In general multi-AZ seems great to me.  However the details really get deep
20:47:06 <rm_work> yeah
20:47:33 <rm_work> though if you have a routable L2 for all AZs, or you use an L3 net driver... then my patch will *just work*
20:47:37 <xgerman_> +1
20:47:39 <rm_work> and the best part is that the only required config change is ... adding the additional AZs to the az config
20:47:51 <rm_work> :)
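(For context, octavia.conf already carries a single availability zone setting for amphora scheduling under [nova]; the multi-AZ form below is only a hypothetical sketch of what the patch under discussion might extend it to, not the actual option from that change.)

    [nova]
    # Existing single-AZ setting used when booting amphorae:
    # availability_zone = az1
    # Hypothetical multi-AZ list as rm_work's patch might allow
    # (illustrative only):
    # availability_zone = az1,az2,az3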
20:48:21 <xgerman_> Would love nova to do something reasonable but in the interim…
20:49:19 <johnsom> Yeah, so I think it's down to review
20:49:39 <johnsom> Which brings me to a gentle nag....
20:49:39 <xgerman_> +1
20:49:49 <johnsom> #link ttps://review.openstack.org/#/q/(project:openstack/octavia+OR+project:openstack/octavia-dashboard+OR+project:openstack/python-octaviaclient+OR+project:openstack/octavia-tempest-plugin)+AND+status:open+AND+NOT+label:Code-Review%253C0+AND+NOT+label:Verified%253C%253D0+AND+NOT+label:Workflow%253C0
20:50:09 <johnsom> Well, when gerrit is back up.
20:50:10 <nmagnezi> johnsom, forgot an  'h'
20:50:28 <rm_work> ono
20:50:29 <johnsom> There are a ton of open un-reviewed patches....
20:50:38 <johnsom> #undo
20:50:39 <openstack> Removing item from minutes: #link ttps://review.openstack.org/#/q/(project:openstack/octavia+OR+project:openstack/octavia-dashboard+OR+project:openstack/python-octaviaclient+OR+project:openstack/octavia-tempest-plugin)+AND+status:open+AND+NOT+label:Code-Review%253C0+AND+NOT+label:Verified%253C%253D0+AND+NOT+label:Workflow%253C0
20:50:42 <rm_work> so many
20:50:50 <rm_work> I need to go review too, but
20:50:53 <johnsom> #link https://review.openstack.org/#/q/(project:openstack/octavia+OR+project:openstack/octavia-dashboard+OR+project:openstack/python-octaviaclient+OR+project:openstack/octavia-tempest-plugin)+AND+status:open+AND+NOT+label:Code-Review%253C0+AND+NOT+label:Verified%253C%253D0+AND+NOT+label:Workflow%253C0
20:50:55 <rm_work> not just me :P
20:51:15 <johnsom> Yeah, please take a few minutes and help us with reviews.
20:51:41 <johnsom> Any other topics today?
20:52:30 <johnsom> Ok then. Thanks everyone!
20:52:35 <johnsom> #endmeeting