16:00:02 <rm_work> #startmeeting Octavia
16:00:03 <openstack> Meeting started Wed Jan 29 16:00:02 2020 UTC and is due to finish in 60 minutes.  The chair is rm_work. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:06 <openstack> The meeting name has been set to 'octavia'
16:00:10 <rm_work> #chair johnsom
16:00:11 <openstack> Current chairs: johnsom rm_work
16:00:14 <rm_work> #chair cgoncalves
16:00:15 <openstack> Current chairs: cgoncalves johnsom rm_work
16:00:17 <cgoncalves> hi
16:00:20 <johnsom> o/
16:00:22 <rm_work> o/
16:00:30 <haleyb> hi
16:00:45 <ataraday_> hi
16:02:08 <rm_work> looks like just a few of us today
16:02:13 <rm_work> or maybe this is the norm? anywho
16:02:15 <rm_work> #topic Announcements
16:02:44 <johnsom> The NDSU students working on TLS ciphers and protocols start today!
16:02:50 <rm_work> I was about to say that :D
16:02:58 <johnsom> We are doing an introduction meeting for them later today.
16:03:11 <cgoncalves> awesome! if you are reading this, welcome!!
16:03:26 <johnsom> #link https://www.openstack.org/foundation/2019-openstack-foundation-annual-report
16:03:41 <johnsom> We are also mentioned in the annual report for this effort.
16:04:23 <cgoncalves> spotlight on the project!
16:04:45 <rm_work> So that's all I have... Anyone else have anything to announce?
16:07:49 <johnsom> Nothing else from me
16:07:51 <rm_work> going once... going once... going twice... going three times.... going five times...
16:07:59 <rm_work> #topic Brief progress reports / bugs needing review
16:09:45 <haleyb> a talkative crowd today
16:09:54 <rm_work> apparently
16:10:26 <cgoncalves> I spent some time reviewing stuff, particularly jobboard patches. many merged
16:10:33 <rm_work> So, I've resurrected the patch that allows UDP pool HMs to use other protocols
16:10:35 <johnsom> So, failover flow....  It is working in the lab, I am at the "clean it up" and testing phase.
16:10:36 <rm_work> #link https://review.opendev.org/#/c/589180/
16:10:59 <johnsom> Recently I have been looking at the SINGLE topology testing.
16:11:23 <ataraday_> cgoncalves, rm_work thanks a lot for reviews! Just one change left :)
16:11:23 <rm_work> We ran into issues with the UDP healthcheck in our environment (it's ... not a great design, but I guess it's the best we can do generically) so we need to be able to use other types on a UDP LB
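For reference, if the patch above lands, pointing a UDP pool at a TCP health monitor would look roughly like the sketch below ("my-udp-pool" is a placeholder name, and the exact set of allowed monitor types is whatever the patch ends up permitting):

    openstack loadbalancer healthmonitor create --name udp-pool-hm \
        --delay 5 --timeout 3 --max-retries 2 --type TCP my-udp-pool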
16:11:37 <johnsom> For SINGLE LBs, I have dropped the outage time down to a second or two for manual failovers. This is a huge improvement.
16:11:39 <rm_work> ataraday_: so close! :D
16:12:05 <rm_work> johnsom: o/ does it create an amp in parallel before the delete of the old one?
16:12:09 <johnsom> Right now I am working to debug an IPv6 DAD issue triggered by my new code to speed up the SINGLE topology failover.
16:12:24 <rm_work> which is what we were avoiding previously due to "possible resource constraints" but i kinda thought was a BS reason
16:12:37 <ataraday_> rm_work, yeah, just the main change :D
16:13:13 <johnsom> rm_work, it does build prior to failover, so yes, if there is a quota/capacity constraint it will now fail. This is what also raised this DAD issue.
16:13:30 <johnsom> Duplicate Address Detection (DAD)
16:13:51 <rm_work> I just figured your kids came back home and kept interrupting you :D
16:14:00 <cgoncalves> ataraday_, great work on your patches! you asked a question today on Gerrit if amphorav2 should be default in Ussuri. we could discuss it here today if you'd like
16:14:53 <rm_work> I have a couple of patches that I worked on that are good to go I think, could just use more reviews and a push :D
16:14:59 <rm_work> #link https://review.opendev.org/#/c/699521/
16:15:05 <rm_work> ^^ to add more functionality to AZs
16:15:17 <rm_work> #link https://review.opendev.org/#/c/702535/
16:15:21 <johnsom> The rebase is going to be a nightmare I think....
16:15:35 <haleyb> johnsom: ping me if you need help with DAD, if it's failing is there a loop?
16:15:36 <rm_work> ^^ allow configuring whether you want to force one-armed
16:15:48 <johnsom> I have also done some significant refactoring around the amphorae driver and backend to clean up some "issues".
16:16:45 <johnsom> haleyb I have fixed these issues before. It's a sequencing issue with the new accelerated failover.  I just found it in testing last night, so will look at it and fix it today.
16:16:57 <haleyb> ack
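For anyone following the DAD discussion: inside the amphora, an address still going through Duplicate Address Detection carries the "tentative" flag (or "dadfailed" once DAD gives up), so a quick check looks something like this ("eth1" is just a placeholder interface name):

    ip -6 addr show dev eth1               # look for "tentative" / "dadfailed" flags
    sysctl net.ipv6.conf.eth1.accept_dad   # 0 disables DAD on that interface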
16:17:59 <johnsom> Last time I tested, SINGLE completely rebuilds the amphora in around 30 seconds, Act/Stdby in around 70 seconds. Outage time is a second or less for both.  Switching to VRRP version 3 will drop it even more. I have a followup patch for that started.
16:18:42 <cgoncalves> you're on fire!
16:19:24 <johnsom> Still fully backport-able. No image roll needed, but would help bring down the SINGLE outage time.
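For context on the VRRP follow-up: keepalived takes the protocol version from global_defs, so the amphora keepalived configuration would presumably only need something along these lines (a sketch, not the actual Octavia template):

    global_defs {
        vrrp_version 3
    }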
16:20:14 <johnsom> Anyway, that and reviews have been my focus over the last week.
16:21:21 <haleyb> i'd like to ask for some of my py2 removal patches to get some reviews, we're seeing other repos randomly get bitten as third party library support goes away, would be good to get ahead of it
16:21:33 <haleyb> except for the six removal they're all pretty small
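For a sense of what the six removal looks like, the changes are mostly mechanical one-liners along these lines (illustrative only, not taken from the actual patches):

    import abc
    import urllib.parse  # was: from six.moves.urllib import parse

    # six.text_type(x) simply becomes str(x) on Python 3
    name = str(42)

    # @six.add_metaclass(abc.ABCMeta) becomes the metaclass keyword
    class Driver(metaclass=abc.ABCMeta):
        pass

    print(urllib.parse.urlparse("http://example.com/path").path)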
16:22:02 <haleyb> johnsom: should i put some on your list?
16:22:25 <johnsom> haleyb I think I have already been bugging you about some of those... grin
16:22:40 <johnsom> haleyb But, yes, please make sure they are on the priority list.
16:23:16 <ataraday_> cgoncalves, It can be discussed on gerrit :) I put the question there to highlight this point: should 'amphorav2' be among enabled_provider_drivers by default or not?
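For anyone not familiar with the option: operators control this in octavia.conf under [api_settings], so "enabled by default" would mean shipping a default roughly like the sketch below (driver descriptions are free-form text; this is not the exact default string):

    [api_settings]
    default_provider_driver = amphora
    enabled_provider_drivers = amphora:The Octavia Amphora driver,amphorav2:Amphora driver using the v2 (jobboard) task flows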
16:23:17 <haleyb> johnsom: yes, and i think i've re-spun most, i'll verify and add to list
16:23:39 <johnsom> #link https://etherpad.openstack.org/p/octavia-priority-reviews
16:23:45 <johnsom> Just in case someone doesn't have it
16:25:48 <rm_work> also this one I just rebased:
16:25:50 <openstackgerrit> Adam Harwell proposed openstack/octavia master: Update the lb_id on an amp earlier if we know it  https://review.opendev.org/698082
16:26:26 <rm_work> which was a combination of what ataraday_ and I did independently (though she did it first and I was just blind, lol)
16:30:34 <rm_work> ok so I guess it's time for:
16:30:35 <rm_work> #topic Open Discussion
16:32:02 <cgoncalves> I'd like to get input from the team on enabling KVM instead of QEMU, when possible, in the CI jobs
16:32:21 <rm_work> we've bounced around on this a lot
16:32:22 <johnsom> I am planning to finish up the basic cleanup stuffs and dev testing, then I will probably post failover with broken tests for v1 only. Followup will be with fixed tests.
16:32:33 <rm_work> we do it, and then it works for a bit, and then jobs start breaking, and then we have to disable it
16:32:40 <rm_work> we can try again, but just be aware
16:32:45 <rm_work> this'll be like the third time
16:32:53 <cgoncalves> context is that there are some nodepool providers that offer nested virtualization but we are not leveraging it because of bugs in the Ubuntu kernel
16:33:05 <johnsom> I am good with turning it on again if we seem to have passing tests across the nodepool providers.
16:33:31 <johnsom> We can hope that the kernel bug is now fixed and deployed across the nodepool fleet
16:34:03 <cgoncalves> although, I think I root-caused it to one particular provider (vexxhost) whose exposed CPU model doesn't exactly/best match the actual physical CPU
16:34:08 <johnsom> We ran for a year and a half with it on without any issues, so I'm not worried about it in *general*
16:34:22 <cgoncalves> #link https://review.opendev.org/#/c/702921/
16:34:53 <johnsom> Last root cause I found in partnership with OVH was a kernel KVM bug with certain guest and host kernel versions.
16:34:54 <cgoncalves> note there's a depends-on for a devstack patch
16:35:16 <cgoncalves> in testing, seems to work fine at OVH
16:35:27 <cgoncalves> the problematic one was vexxhost because of the CPU model
16:35:42 <johnsom> Yeah, it's been a long time since we tried it again to see if there is a fix out.
16:35:46 <cgoncalves> setting cpu model to host-passthrough in libvirt helped
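For reference, the end result on the compute node is the standard nova libvirt settings; the devstack patch is presumably just wiring up something equivalent to this in nova.conf (a sketch, not the exact patch contents):

    [libvirt]
    virt_type = kvm
    cpu_mode = host-passthrough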
16:37:18 <cgoncalves> there's more information on the commit message that may better explain the context and proposal
16:38:09 <cgoncalves> maybe I should give folks some time to digest it. we can talk about it again next week or in Gerrit
16:38:26 <rm_work> it's prolly fine to try again
16:39:38 <cgoncalves> works for me. I'll make sure the devstack patch merges
16:43:32 <rm_work> anything else or should we call it for today?
16:50:22 <rm_work> ok, calling it, thanks folks
16:50:26 <rm_work> #endmeeting