20:00:11 #startmeeting Octavia
20:00:12 Meeting started Wed Aug 1 20:00:11 2018 UTC and is due to finish in 60 minutes. The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:15 The meeting name has been set to 'octavia'
20:00:20 Hi folks!
20:00:35 o/
20:00:47 #topic Announcements
20:00:54 o/
20:01:09 O/
20:01:12 We are still tracking priority bugs for Rocky. We are in feature freeze, but we can still be fixing bugs.....
20:01:18 #link https://etherpad.openstack.org/p/octavia-priority-reviews
20:01:37 o/
20:02:06 As an FYI, Rocky RC1 is next week. This is where we will cut a stable branch for Rocky. We should strive to have as many bug fixes in as we can.
20:02:19 It would be super nice to only do one RC and start work on Stein
20:02:34 I do have some sad news for you however....
20:03:02 Since no one ran against me, you are stuck with me as PTL for another release.....
20:03:11 #link https://governance.openstack.org/election/
20:03:15 * xgerman_ raises the “4 more years” sign
20:03:18 4 more years \o/ !!
20:03:28 * nmagnezi joins xgerman_
20:03:45 You all are trying to make me crazy aren't you....
20:03:59 crazier
20:03:59 just showing our appreciation…
20:04:01 johnsom, you scared me for a sec
20:04:10 johnsom, not cool :)
20:04:37 Towards the end of the year it will be three years for me. I would really like to see a change in management around here, so.... Start planning your campaign.
20:04:38 this is not where we elect PTLs for a year?
20:05:23 Not yet. Maybe the cycle after Stein will be longer than six months....
20:05:42 Actually Stein is going to be slightly longer than a normal release to sync back with the summits
20:05:53 :-)
20:05:56 #link https://releases.openstack.org/stein/schedule.html
20:06:05 If you are interested in the Stein schedule
20:06:57 Also an FYI, all of the OpenStack IRC channels now require you to be signed in with a freenode account to join.
20:07:05 #link http://lists.openstack.org/pipermail/openstack-dev/2018-August/132719.html
20:07:37 There have been bad IRC spam storms recently. I blocked our channel yesterday, but infra has done the rest today.
20:07:49 I see the longer Stein release as a good thing this time around
20:08:21 It doesn't mean we can procrastinate though.... grin
20:08:42 I think my early Stein goal is going to be to implement flavors
20:09:05 That is all I have for announcements, anything I missed?
20:09:45 #topic Brief progress reports / bugs needing review
20:10:23 I have been pretty distracted with internal stuff over the week, but most of that is clear now (some docs to create which I hope to upstream).
20:10:40 I have also been focused on the UDP patch and helping there.
20:10:54 did two bug fixes: one for when nova doesn't release the port for failover + one for the zombie amps
20:12:04 Yeah, the nova thing was interesting. Someone turned off a compute host for eight hours. Nova just sits on the instance delete evidently and doesn't do it, nor release the attached ports.
20:12:38 If someone has a multi-node lab where they can power off compute hosts, that patch could use some testing assistance.
20:12:47 +10
20:13:21 my multinode lab has customers — so I can't chaos monkey
20:13:24 nothing special from my side: some octavia and neutron-lbaas backporting, devstack plugin fixing and CI jobs changes + housekeeping, then some tripleo-octavia bits
20:13:46 interesting, we're having that happen here, as we're patching servers on a rolling thing, and some hosts end up down for a while sometimes <_<
20:13:46 Any other updates? nmagnezi cgoncalves ?
20:14:04 xgerman_, could your patch (which I haven't looked at yet) improve scenarios like https://bugzilla.redhat.com/show_bug.cgi?id=1609064 ? it sounds like it
20:14:04 bugzilla.redhat.com bug 1609064 in openstack-octavia "Rebooting the cluster causes the loadbalancers are not working anymore" [High,New] - Assigned to amuller
20:14:10 On my end: have been looking deeply into active standby. Will report a bunch of stories (and submit patches) soon
20:14:30 Some of the issues were already known; some look new (at least to me)
20:14:38 But nothing drastic
20:14:39 rm_work the neat thing we saw once, but couldn't confirm, was nova status said "DELETED" but there is a second status in the EXT that said "deleting"
20:15:09 O_o
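For anyone who wants to check for the same situation, the mismatch johnsom describes (the top-level nova status saying DELETED while the extended task_state still says "deleting", with the attached ports never released) can be inspected from the client with something along these lines; the UUID is a placeholder for the stuck amphora instance:

    # Compare the top-level status with the extended task_state reported by nova
    $ openstack server show <instance-uuid> -c status -c OS-EXT-STS:task_state

    # List any neutron ports still attached to the supposedly deleted instance
    $ openstack port list --device-id <instance-uuid>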
20:15:31 johnsom, I actually have a question related to active standby, but that can wait for open discussion
20:15:34 well ... we wouldn't get bugs like cgoncalves linked, as our ACTIVE_STANDBY amps are split across AZs
20:15:38 with AZ Anti-affinity ;P
20:16:05 which I still wish we could merge as an experimental feature, as I have seen at least two other operators HERE that use similar code
20:16:23 Stein…
20:16:32 Looks like it went to failover those and there were no compute hosts left: Failed to build compute instance due to: {u'message': u'No valid host was found. There are not enough hosts available.'
20:17:46 Yeah, so too short of a timeout before we start failing over or too small of a cloud?
20:17:55 johnsom, right. and I think after that the LBs/amps couldn't failover manually because they were in ERROR. I need to look deeper and need-info the reporter. anyway
20:18:14 hmmm. Ok, thanks for the updates
20:18:22 Our main event today:
20:18:30 #topic Make the FFE call on UDP support
20:19:09 Hmm, wonder if meeting bot is broken
20:19:16 mmh
20:19:16 Well, we will see at the end
20:19:22 I *swear* I've been wanting to test this :( I even restacked this afternoon with latest patch sets
20:19:35 Current status from my perspective:
20:19:51 1. the client patch is merged and was in Rocky. This is good and it seems to work great for me.
20:20:13 2. Two out of the three patches I have +2'd as I think they are fine.
20:20:40 3. I have started some stories for issues I see, but I don't consider them show stoppers: https://storyboard.openstack.org/#!/story/list?status=active&tags=UDP
20:20:58 4. I have successfully built working UDP LBs with this code.
20:21:38 5. The gates show it doesn't break existing stuff. (the one gate failure today was a "connection reset" while devstack was downloading a package)
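For reference, the kind of UDP load balancer johnsom describes building in item 4 can be created end to end with the merged client support from item 1, roughly like this; the subnet name, member address, and port numbers here are only illustrative:

    # VIP and a UDP listener on port 53
    $ openstack loadbalancer create --name udp-lb1 --vip-subnet-id private-subnet
    $ openstack loadbalancer listener create --name udp-listener1 --protocol UDP --protocol-port 53 udp-lb1

    # UDP pool and a member on the same subnet
    $ openstack loadbalancer pool create --name udp-pool1 --listener udp-listener1 --protocol UDP --lb-algorithm ROUND_ROBIN
    $ openstack loadbalancer member create --subnet-id private-subnet --address 10.0.0.10 --protocol-port 53 udp-pool1

    # Optional health monitoring for UDP pools uses the UDP-CONNECT type
    $ openstack loadbalancer healthmonitor create --type UDP-CONNECT --delay 5 --timeout 3 --max-retries 2 udp-pool1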
20:21:50 Yeah I think this also falls into "merge, fix bugs as we find them" territory
20:21:59 as with any big feature
20:22:12 so long as it doesn't interfere with existing code paths (which I believe it does not)
20:22:38 Yeah, I'm leaning that way too. I need to take another pass across the middle patch to see if anything recent jumps out at me, but I expect we can turn and burn on that if needed.
20:22:54 I am not entirely in love with the additional code path for UDP LB health
20:23:13 would it make sense to somehow flag it as experimental? there's not a single tempest test for it IIRC
20:23:16 but we can streamline that later
20:23:16 That is in the middle patch I haven't looked at for a bit
20:23:59 cgoncalves we have shipped stuff in worse shape tempest wise... sigh
20:24:01 yeah, my other beef is with having a special UDP listener on the amphora REST API…
20:24:27 xgerman_ What do you mean? It's just another protocol on the listener.....
20:25:04 Oh, the amphora-agent API?
20:26:19 yep: https://review.openstack.org/#/c/529651/57/octavia/amphorae/backends/agent/api_server/server.py
20:26:23 cgoncalves I can probably whip up some tempest tests for this before Rocky ships if you are that concerned. We will need them
20:26:38 Will be a heck of a lot easier than the dump migration tool tests
20:26:57 wouldn't bet on it ;-)
20:27:27 Well, I have been doing manual testing on this a lot so I have a pretty good idea how I would do it
20:27:35 johnsom, I'd prefer having at least a basic UDP test but I won't ask you for that. too much already on your plate
20:29:32 Any more discussion or should we vote?
20:30:04 well, how do others feel about the architecture ?
20:30:26 or, let's vote ;-)
20:30:45 #startvote Should we merge-and-fix the UDP patches? Yes, No
20:30:46 Begin voting on: Should we merge-and-fix the UDP patches? Valid vote options are Yes, No.
20:30:47 Vote using '#vote OPTION'. Only your last vote counts.
20:31:06 #vote abstain
20:31:07 xgerman_: abstain is not a valid option. Valid options are Yes, No.
20:31:12 No maybe options for you wimps.... Grin
20:31:22 #vote yes
20:31:41 (do I get to vote?!)
20:31:41 #vote yes
20:31:52 Yes, everyone gets a vote
20:32:01 I was not involved in this, but johnsom's reasoning makes sense to me
20:32:04 #vote yes
20:32:53 xgerman_ rm_work Have a vote? Anyone else lurking?
20:32:59 ah
20:33:29 I thought sitting it out would be like abstain ;-)
20:33:50 #vote yes
20:34:06 * johnsom needs a buzzer for "abstain" votes
20:34:08 though I should at least try to get through the patches
20:34:20 to make sure there's nothing that'd be hard to fix later
20:34:39 Yeah, I think 1 and 3 are good. I would like some time on 2 today, so maybe push to merge later today or early tomorrow
20:35:11 Going once....
20:35:16 Going twice.....
20:35:24 #endvote
20:35:25 Voted on "Should we merge-and-fix the UDP patches?" Results are
20:35:26 Yes (4): rm_work, nmagnezi, cgoncalves, johnsom
20:35:42 Sold, you are now the proud owners of a UDP protocol load balancer
20:36:57 dougwig: will be proud ;-)
20:37:09 So, cores, if you could give your approval votes on 1 and 3, that would be great. Give us some time on 2. I will ping in the channel if it's ready for the final review pass
20:37:20 we now *really* need to fix it for centos amps. I just tried creating a LB and it failed
20:37:33 what's the patch I have to pull for a complete install? 1 or 3?
20:37:43 Ah bummer. cgoncalves can you help with that or are you too busy?
20:38:04 johnsom, I will prioritize that for tomorrow
20:38:06 xgerman_ 3 or https://review.openstack.org/539391
20:38:22 k
20:38:25 I also added a follow up patch with the API-ref and release notes and some minor cleanup
20:38:38 https://review.openstack.org/587690
20:38:43 Which is also at the end of the chain.
20:39:04 k
20:39:18 cgoncalves If you have changes, can you create a patch at the end of the chain? That way we can still make progress on review/merge but get it fixed
20:39:30 johnsom, sure
20:39:37 UDP, damn straight.
20:39:58 If I get done early with my review on 2 I might poke at centos, but no guarantees I will get there.
20:40:23 dougwig o/ Sorry you missed the vote. Now you can load balance your DNS servers... grin
20:40:45 :-)
20:40:47 next up, rewrite in ruby
20:41:05 You had better sign up for PTL if you want to do that....
20:41:08 grin
20:41:20 #topic Open Discussion
20:41:38 nmagnezi I think you had an act/stdby question
20:41:49 johnsom, yup :)
20:42:25 johnsom, so I did a basic test of spawning a highly available load balancer, and captured the traffic on both amps
20:42:37 Specifically, on the namespace that we run there
20:42:49 MASTER: https://www.cloudshark.org/captures/1d0a1028c402
20:42:59 BACKUP: https://www.cloudshark.org/captures/8a4ee5b38e18
20:43:21 First question, mm.. I was not expecting to see tenant traffic in the backup one
20:43:54 Even if I manually "if down" the VIP interface (which does not send GARPs - I verified that) -> I still see that traffic
20:44:17 And that happens specifically when I send traffic towards the VIP
20:44:52 btw in this example -> 10.0.0.1 is the qrouter NIC and 10.0.0.3 is the VIP
20:45:25 It's likely the promiscuous capture on the port.
20:46:19 johnsom, you mean that it is because I use 'tcpdump -i any' in the namespace?
20:46:22 Oh, I know what it is. It's the health monitor set on the LB. It's outgoing tests for the member I bet
20:46:40 +1
20:46:48 IIRC I didn't set any health monitor
20:46:56 Lemme double check that real quick
20:46:56 mmh
20:47:02 Your member is located outside the VIP subnet (you didn't specify a subnet at member create)
20:47:47 Because on that backup, those HTTP packets are all outbound from the VIP
20:48:21 Checked.. no health monitor set
20:48:42 The members reside on the same subnet as the VIP
20:49:00 All in the private-subnet that is created by default in devstack
20:49:37 Hmm, they do look a bit odd. Yeah, my bet is the promiscuous setting on the port is picking up the response traffic from the master, let's look at the MAC addresses.
20:50:29 That is probably why only half the conversation is seen on the backup.
20:50:57 Yeah that looked very strange.. no SYN packets
20:51:04 If you check, the backup's haproxy counters will not be going up
20:51:47 Will check that
20:52:10 But honestly I was not expecting to see that traffic on the backup amp
20:53:01 It looks right to me in general. Yeah, generally I wouldn't either, but I'm just guessing it's how the network is set up underneath and the point of capture.
20:53:26 I still don't get why it's there actually. I know the two amps communicate for other stuff (e.g. keepalived)
20:53:50 okay
20:53:56 The key to understanding that is to look at the MAC addresses of your ports and the packets. The 0.3 packets will likely have the MAC of the base port on the master
20:54:47 If you switch it over, it should be the base port of the backup in those packets.
20:55:19 Does that help?
20:55:34 I was inspecting the qrouter arp table while doing those tests. It remained consistent with the MASTER MAC address
20:55:39 It does, thank you
20:55:44 Will keep looking into this
20:55:51 Ok, cool.
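A concrete way to run the checks johnsom suggests is to capture with link-layer headers inside the amphora network namespace and to watch the haproxy counters on each amphora; this is only a sketch, and the haproxy stats socket path shown is an assumption that can differ between amphora images:

    # Capture with -e so the source MAC is shown for each packet; compare it
    # against the MAC of the base (VRRP) port on the MASTER and BACKUP amphorae
    $ sudo ip netns exec amphora-haproxy tcpdump -i eth1 -e -n host 10.0.0.3

    # Watch the haproxy session counters; on the real BACKUP they should stay flat
    # (the socket path below is assumed; check /var/lib/octavia/ on your amphora)
    $ echo "show stat" | sudo socat stdio unix-connect:/var/lib/octavia/<listener-id>.sock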
20:55:58 Any other items today?
20:56:01 If there's time I have another question
20:56:08 But let other folks talk first
20:56:09 :)
20:56:09 Sure, 5 minutes
20:56:29 Going once..
20:56:36 Just take it
20:56:39 ha
20:56:40 ok
20:56:50 So if we look at the capture from master
20:57:00 MASTER: https://www.cloudshark.org/captures/1d0a1028c402
20:57:19 Some connections end with RST, ACK and RST
20:57:20 Some not
20:57:35 Is that an HAPROXY thing to close connections with pool members?
20:58:21 It does not happen with all the sessions
20:58:22 If it is a flow with the pool member, yes, that is the connection between HAProxy and the member server.
20:59:41 Okay, no more questions here
20:59:45 If the client on the front end closes the connection to the LB, haproxy will RST the backend.
20:59:54 Thank you!
20:59:59 Let me see if I can find that part of the docs.
21:00:07 I will send a link after the meeting
21:00:13 Np
21:00:35 Thanks folks!
21:00:39 #endmeeting