15:01:40 <lamt> #startmeeting openstack-helm
15:01:44 <openstack> Meeting started Tue Nov 17 15:01:40 2020 UTC and is due to finish in 60 minutes.  The chair is lamt. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:47 <openstack> The meeting name has been set to 'openstack_helm'
15:02:14 <lamt> #link https://etherpad.opendev.org/p/openstack-helm-weekly-meeting agenda
15:02:20 <stevthedev> Hello
15:02:31 <lamt> o/
15:02:41 <miniroy> \o
15:02:42 <cliffparsons> o/
15:03:02 <lamt> Our fearless leader is feeling unwell. He asked me to host this week's meeting
15:03:31 <miniroy> I am glad he/she decided to take a day off
15:04:12 <stevthedev> I hope they are getting some rest
15:04:32 <lamt> I hope *he* is too.
15:04:59 <lamt> I think we can get started.
15:05:29 <lamt> Quick reminder, there is no meeting next week
15:05:41 <lamt> #topic OSH compute gate failure
15:06:05 <lamt> This is a follow up from last week, but the gate is still broken
15:06:32 <miniroy> X)
15:06:35 <lamt> so no OSH patches can merge
15:07:05 <sangeet> oh no!
15:07:59 <stevthedev> Do we know how it is broken?
15:08:26 <lamt> Not exactly, the conjecture is that some system library in Bionic
15:08:37 <lamt> started to act up
15:09:01 <miniroy> to be precise, causing openvswitch to gag when it can't create threads
15:09:32 <stevthedev> Sounds tricky :/
15:09:40 <miniroy> let me try that tweak today
15:09:52 <lamt> I tried to update all the old stuff in the gate (minikube version, helm version, k8s version)
15:10:14 <miniroy> all signs point to a system issue though
15:10:22 <lamt> (I think they should be updated anyway) - here was my last attempt that sort of worked
15:10:41 <miniroy> oh really... did that actually help?
15:10:45 <lamt> https://review.opendev.org/#/c/762361/
15:10:55 <lamt> Updating the host to focal worked
15:11:02 <lamt> "worked" - but
15:11:09 <lamt> it introduced new problems
15:11:33 <lamt> it crashes ceph, and some other python/C library errored with Rocky
15:12:35 <lamt> I assume some system library difference between focal and bionic was the cause, but I wasn't able to pinpoint it
15:12:59 <miniroy> hmm.... so at least it confirms our suspicion
15:13:00 <lamt> if anyone wants to take a look, I'd appreciate that
15:13:21 <miniroy> I will take another crack at it today
15:14:00 <lamt> Thanks - if upgrading to focal isn't the correct path, we have to find the fix for ovs
15:14:07 <miniroy> so any other host we can try on besides vexx and focal?
15:14:36 <lamt> I believe Andrii tried to address the max task parameter, but from what I can tell it was never limited
15:14:43 <lamt> so setting it to infinity doesn't do anything
15:15:15 <lamt> I tried the 32gb node, and same error - doesn't look like it is constrained by system memory.
15:17:07 <lamt> Thanks miniroy for taking a look
15:17:29 <miniroy> I wonder if something in the image changed.....
15:17:37 <miniroy> but these are great data points to have
15:17:58 <lamt> I tried that too - I reverted the ovs image to the date the last gate passed
15:18:04 <lamt> same failure :/
15:18:31 <miniroy> =(
15:19:19 <lamt> others can chime in as well - but I have exhausted all the possibilities I can think of for why ovs keeps failing
15:20:18 <miniroy> we don't have local access to any of these hosts right?
15:20:30 <lamt> I do not think so
15:22:20 <lamt> if it is not a system library - it is some constraint preventing pthread creation
15:22:51 <miniroy> any idea what kernel is running on focal?
15:24:00 <lamt> ansible_kernel: 5.4.0-53-generic
15:24:39 <miniroy> oh and that works actually... wow
15:24:58 <lamt> it breaks other things
15:25:03 <lamt> like ceph and rocky
15:25:09 <lamt> with cffi
15:26:10 <miniroy> so should we move ahead and try to get it working with focal? or try to get it working with vexx?
15:26:52 <lamt> I will let others chime in - both require a bit of work
15:27:33 <lamt> it is like asking whether you want to fix errors A or errors B.
15:28:44 <miniroy> well let me poke at that ovs/vexx issue today more then
15:28:55 <lamt> thanks miniroy
15:28:58 <miniroy> sounds like the upgrade to focal is probably more work at this point
15:29:11 <miniroy> looks like an onion to me I am afraid
15:29:25 <lamt> the alternative is to fix ovs
15:29:38 <miniroy> ceph is probably just the first layer I am afraid
15:29:56 <lamt> yup
15:30:14 <miniroy> in the absence of our fearless leader, I say let's focus on fixing ovs for now
15:30:30 <lamt> ++
15:31:16 <lamt> we can take this offline to troubleshoot
15:31:41 <miniroy> +
15:31:43 <lamt> let's move on
15:31:50 <lamt> #topic Open Discussion
15:32:28 <lamt> there is no other agenda item - so opening the floor for discussion
15:34:12 <lamt> if nothing else, we can end the meeting. Everyone have a great rest of the week.
15:34:19 <lamt> #endmeeting