09:00:34 #startmeeting magnum
09:00:34 Meeting started Wed Feb 15 09:00:34 2023 UTC and is due to finish in 60 minutes. The chair is jakeyip. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:34 The meeting name has been set to 'magnum'
09:00:45 #topic Roll Call
09:00:51 o/
09:01:09 o/
09:01:30 o/
09:02:13 #link https://etherpad.opendev.org/p/magnum-weekly-meeting
09:02:26 Please feel free to populate the agenda
09:03:21 hi travissoto
09:03:32 thanks everyone for coming to the meeting. feel free to join in at any time.
09:03:39 #topic PTG
09:04:00 hi all
09:04:24 There are two PTGs this time: (1) Virtual PTG (March 27-31), (2) OpenInfra Summit + PTG (June 13-15)
09:05:04 does everyone have a preference for PTG?
09:05:47 unfortunately I probably will not be able to make it to OpenInfra
09:06:28 if there is no hard preference I may book something for March and see if there will be interest closer to the date
09:06:29 likewise, virtual is preferred this time for me.
09:06:45 +1
09:07:17 ok
09:07:27 #action jakeyip to book Virtual PTG
09:07:48 #topic Antelope supported versions
09:08:30 Thanks everyone for the patches to make FCOS 36-37 work
09:08:39 and also k8s v1.24
09:09:26 A common issue for users is that they are unsure which versions of FCOS / k8s are supported. I have recently fixed up the docs to reflect that
09:09:40 #link https://docs.openstack.org/magnum/latest/user/#supported-versions
09:10:28 I would like to propose that we say FCOS 36/37 + k8s v1.24 is supported for this cycle.
09:10:35 what does everyone think?
09:10:39 I've also passed conformance for 1.25, and had 1.26 mostly running (but I have not reviewed kube-system pod versions yet).
09:10:54 sounds good, 1.24 is still a supported k8s version.
09:11:24 dalees: do the default labels work?
09:11:59 jakeyip: unlikely, that was the other topic I'd like to discuss. We bump a large number of things and the defaults are way out of date now (calico etc)
09:12:18 dalees: yeah that is a problem. lots of discussion needed there :)
09:12:40 ok, if we are in agreement let's just target 1.24, with 1.25 as a stretch goal ;)
09:13:01 #info Antelope supported versions: FCOS 36/37 and Kubernetes v1.24
09:13:03 maybe we can share (or update) working template labels once we have 1.24 locked in. it's hard to know how to update the defaults without the version (1.24) set in place.
09:13:48 we can target tests for these versions, and we can have labels that possibly work out of the box (later discussion)
09:13:59 great :)
09:14:12 #topic Deprecation
09:15:12 As Antelope is about to come to an end, I am in a bit of a hurry to mark things as deprecated this cycle, to allow them to be removed in 2 cycles' time
09:16:04 #topic Deprecate Fedora Atomic for Kubernetes
09:16:33 #link https://review.opendev.org/c/openstack/magnum/+/833949 I see a few +1s, I think we can get this in this cycle. Thanks dalees :)
09:16:45 #topic Deprecate Swarm
09:16:57 is anybody still using swarm? or not?
09:17:08 not us.
09:17:09 we are not using Swarm at all, and I'm not even sure it works.
09:18:16 OK, if there is someone using Swarm please feel free to email me. If not I will drop a mail on the ML to see who is using it and wants to take up maintenance
09:18:35 #action jakeyip Propose deprecation of Swarm to ML
09:19:00 +1, needs a mailing list post, and then propose removal if it's not relevant anymore.
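
For reference, the supported combination agreed above (FCOS 36/37 + Kubernetes v1.24) could be pinned in a cluster template along the lines of the sketch below. This is not a command from the meeting: the template, image, flavor and network names are placeholders for whatever exists in a given cloud, and the kube_tag patch release is illustrative and should be checked against tags that actually exist under the chosen hyperkube_prefix registry.

    # Sketch: a cluster template pinning the Antelope-supported versions.
    # "fedora-coreos-37", "m1.small" and "public" are placeholders;
    # the kube_tag value is illustrative only.
    openstack coe cluster template create k8s-v1-24 \
        --coe kubernetes \
        --image fedora-coreos-37 \
        --external-network public \
        --master-flavor m1.small \
        --flavor m1.small \
        --network-driver calico \
        --labels kube_tag=v1.24.9-rancher1,hyperkube_prefix=docker.io/rancher/,container_runtime=containerd
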
09:19:23 :)
09:19:50 if we can get Fedora Atomic out there is lots of code we can remove
09:20:46 #topic python-magnumclient intermittent failures after tox4
09:21:24 So I tried updating to the tox4 format, but weirdly I am getting intermittent failures running `tox -e py38` with those changes.
09:21:39 it fails in check too
09:22:13 we have patches stuck because of this. if someone can help it'll be great
09:23:51 alright, we went through the previous items pretty quickly. are there questions on those items, dalees / travissoto?
09:24:27 i'll see if i can look into those failures with magnumclient, the "intermittent" part is concerning.
09:24:40 no, not from me at this stage :)
09:24:56 thanks dalees
09:25:27 dalees: do you want to discuss the prometheus helm chart now?
09:26:10 yeah, sure.
09:26:13 #topic Prometheus helm charts
09:26:50 so the prometheus/grafana stack is installed into the kube-system namespace with helm if monitoring_enabled is set.
09:27:00 and this breaks in 1.22
09:27:39 we (actually, travissoto) replaced the helm charts with the newer, completely different ones from kube-prometheus-stack.
09:28:03 do others want or use these, or should we keep this patch local and remove the complexity from Magnum?
09:29:53 I think I tried the default one and gave up :)
09:30:18 dalees: do you install it for all your users?
09:30:41 our templates enable it by default, yes, but many turn it off and install their own monitoring stacks.
09:31:39 yeah ok
09:32:11 we don't install it by default IIRC and I suspect users might prefer their own
09:32:29 as we refactor to CAPI, we're going to consider if and how we keep it. It's a big job to keep it maintained in the Magnum codebase.
09:33:03 of course having more things work out of the box is great for users, but keeping them up to date is an issue
09:33:30 and the more that is in the codebase, the more we are responsible for testing
09:34:05 we don't have good tests for that now (?) so merging changes will be difficult without tests
09:35:19 yeah, with k8s 1.24 as supported i suspect that won't work out of the box.
09:35:29 I propose that we remove it if it is broken. What does everyone think?
09:35:59 agree, better to remove it
09:36:44 ok by me
09:36:56 let's confirm it's broken first.
09:37:24 (it is for us, but i want to be sure it's not our local patches)
09:37:36 OK. can you help to confirm in devstack and send up a change to remove it if it is?
09:37:57 I may take a look later too
09:38:13 ok
09:39:35 #action dalees / travissoto to confirm the prometheus helm chart is broken, and propose a patch to remove it if it is
09:39:46 #topic Container Labels
09:39:53 big topic ;)
09:40:31 I guess the executive summary is: default labels may be broken, but updating them may break existing cluster templates that do not set them
09:40:39 what are our options?
09:42:42 oh oh
09:43:50 yep, that's pretty difficult. making a template that relies on defaults that may change isn't a great experience.
09:44:46 we've a similar issue with manifests: if someone updates the calico manifests for version v1.23 and we're still allowing users to create calico v1.13, their new clusters all break.
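
For the intermittent python-magnumclient failures discussed above, one low-tech way to characterise them is to run the same tox environment in a loop and count failures. A minimal sketch, assuming a checkout of python-magnumclient with the tox4-format patch applied; the iteration count and log file names are arbitrary:

    # Run the py38 env repeatedly to see how often the failure reproduces.
    fails=0
    for i in $(seq 1 10); do
        tox -e py38 > "tox-run-$i.log" 2>&1 || fails=$((fails + 1))
    done
    echo "$fails of 10 runs failed"
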
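
For the action item on confirming the monitoring stack, a devstack check might look like the sketch below. The cluster and template names are illustrative (the template is assumed to be something like the k8s-v1-24 sketch earlier); it simply creates a cluster with monitoring_enabled set and checks whether the prometheus/grafana pods in kube-system ever become healthy:

    # Sketch: create a small cluster with the built-in monitoring stack.
    openstack coe cluster create mon-test \
        --cluster-template k8s-v1-24 \
        --master-count 1 \
        --node-count 1 \
        --labels monitoring_enabled=true

    # Once the cluster is CREATE_COMPLETE, fetch a kubeconfig and check
    # whether the prometheus/grafana pods come up in kube-system.
    mkdir -p /tmp/mon-test && cd /tmp/mon-test
    openstack coe cluster config mon-test
    export KUBECONFIG=/tmp/mon-test/config
    kubectl -n kube-system get pods | grep -Ei 'prometheus|grafana'
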
09:46:33 I've resolved this by copying all templates that change and picking them up with version matches (as seen in https://github.com/openstack/magnum/blob/master/magnum/drivers/k8s_fedora_coreos_v1/templates/kubecluster.yaml#L1424 )
09:47:37 look on the bright side, new clusters breaking is better than current clusters breaking
09:47:41 back to the labels topic though - i think defining as many labels as possible in a template helps, which is what we end up doing. it means the defaults don't apply.
09:48:16 that's what we do too, because the defaults are just too old
09:49:46 so who has the problem of user templates breaking? do we just run with it and update the defaults to match the current k8s (1.24 right now)? and produce example templates that can be published with little chance of breaking?
09:50:29 I did have our organisation's templates break before, that's why I learnt to pin as many labels as possible
09:51:04 you mentioned breaking existing clusters - how are these labels ever re-applied to a running cluster? the upgrade or scaling process doesn't do it (if we're talking kube-system container images). Existing Heat stacks stay the same.
09:51:52 yeah sorry, I meant existing cluster _templates_, that was a typo
09:52:14 ah ok; just checking I understood properly
09:53:14 this type of problem may not go away with CAPI. we still need some concept of cluster templates.
09:55:00 yeah, I feel the least disruptive option is to leave the defaults alone and document what works for the current versions
09:55:55 but then you sacrifice the "works out of the box" experience, if that is the goal.
09:56:17 updating the labels in code is an impossible task. we can push them to latest in Antelope, but by the time an organisation installs / upgrades to Antelope they will be out of date already
09:56:23 you could remove all defaults and force them to be specified in template labels :)
09:56:30 :)
09:56:44 for CAPI? :)
09:57:16 it's worth considering, yeah.
09:57:33 then you only maintain versions in one place, and they match the k8s version
09:58:08 yeah, agree, much nicer
09:58:47 I guess what we can do better is document it. I've heard complaints :)
09:58:59 we are almost out of time. any other topics?
09:59:10 keen to hear ideas on this topic from others who aren't in the meeting but are involved in Magnum.
09:59:43 I've got some for another week, but they can wait. thanks for the discussion
10:00:07 me too. let's hold this regularly and more may join
10:00:38 Thanks dalees and travissoto for coming
10:01:03 #endmeeting
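
Following the labels discussion, the "pin as many labels as possible" approach mentioned above might look like the sketch below. Every tag value here is illustrative only, not a recommendation from the meeting: the correct addon versions depend on the kube_tag actually deployed, and should be verified against the images published upstream (calico, cloud-provider-openstack, coredns) before use.

    # Sketch: a template that pins addon versions instead of relying on
    # Magnum's (possibly stale) defaults. All tag values are illustrative;
    # verify each against the images available for your kube_tag.
    openstack coe cluster template create k8s-v1-24-pinned \
        --coe kubernetes \
        --image fedora-coreos-37 \
        --external-network public \
        --master-flavor m1.small \
        --flavor m1.small \
        --network-driver calico \
        --labels kube_tag=v1.24.9-rancher1,hyperkube_prefix=docker.io/rancher/,container_runtime=containerd,calico_tag=v3.24.2,cloud_provider_tag=v1.24.6,cinder_csi_plugin_tag=v1.24.6,coredns_tag=1.9.3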