09:01:24 <jakeyip> #startmeeting magnum
09:01:24 <opendevmeet> Meeting started Wed Nov  1 09:01:24 2023 UTC and is due to finish in 60 minutes.  The chair is jakeyip. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:24 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:24 <opendevmeet> The meeting name has been set to 'magnum'
09:01:25 <jakeyip> Agenda:
09:01:27 <jakeyip> #link https://etherpad.opendev.org/p/magnum-weekly-meeting
09:01:42 <jakeyip> #topic Roll Call
09:05:19 <lpetrut> o/
09:08:16 <jakeyip> hi lpetrut
09:08:22 <lpetrut> hi
09:08:45 <jakeyip> looks like there isn't anyone else here today :) is there anything you want to talk about?
09:09:17 <lpetrut> I work for Cloudbase Solutions and we've been trying out the CAPI drivers
09:09:23 <lpetrut> there's something that I wanted to bring up
09:09:34 <lpetrut> the need for a management cluster
09:09:41 <jakeyip> cool, let's start then
09:09:47 <jakeyip> #topic clusterapi
09:10:10 <jakeyip> go on
09:10:16 <lpetrut> we had a few concerns about the management cluster, not sure if it was already discussed
09:10:37 <jakeyip> what's the concern?
09:10:50 <lpetrut> for example, the need of keeping it around for the lifetime of the workload cluster
09:11:54 <lpetrut> and having to provide an existing cluster can be inconvenient in multi-tenant environments
09:13:05 <jakeyip> just to clarify, by 'workload cluster' do you mean the magnum clusters created by capi?
09:13:10 <lpetrut> yes
09:14:02 <jakeyip> can you go more in detail how having a cluster in multi-tenant env is an issue?
09:14:12 <lpetrut> other projects tried a different approach: spinning up a cluster from scratch using kubeadm and then deploying CAPI and having it manage itself, without the need of a separate management cluster. that's something that we'd like to experiment with and I was wondering if it was already considered
09:14:53 <jakeyip> by other projects you mean?
09:15:50 <lpetrut> this one specifically wasn't public but I find their approach interesting
09:16:17 <johnthetubaguy> lpetrut: I would like to understand your worry more on that
09:16:33 <jakeyip> hi johnthetubaguy
09:16:38 <lpetrut> about the multi-tenant env, we weren't sure if it's safe for multiple tenants to use the same management cluster, would probably need one for each tenant, which would then have to be managed for the lifetime of the magnum clusters
09:16:42 <johnthetubaguy> do you mean you want each tenant cluster to have a separate management cluster?
09:17:34 <johnthetubaguy> lpetrut: I think that is where magnum's API and quota come in, that gives you a bunch of protection
09:18:01 <jakeyip> lpetrut: just to confirm, you are testing StackHPC's contributed driver that's in review?
09:18:05 <johnthetubaguy> each cluster gets their own app creds, so there is little crossover, except calling openstack APIs
09:18:17 <johnthetubaguy> jakeyip: both drivers do the same thing, AFAIK
09:18:46 <lpetrut> yes, I'm working with Stefan Chivu (schivu), who tried out both CAPI drivers and proposed the Flatcar patches
09:18:52 <johnthetubaguy> jakeyip: sorry, my calendar dealt with the time change perfectly, my head totally didn't :)
09:20:00 <lpetrut> and we had a few ideas about the management clusters, wanted to get some feedback
09:20:45 <lpetrut> one of those ideas was the one that I mentioned: completely avoiding a management cluster by having CAPI manage itself. I know people have already tried this, was wondering if it's something safe and worth considering
09:21:31 <lpetrut> if that's not feasible, another idea was to use Magnum (e.g. a different, possibly simplified driver) to deploy the management cluster
09:21:36 <johnthetubaguy> lpetrut: but then magnum has to reach into every cluster directly to manage it? That seems worse (although it does have the creds for that)
09:21:59 <lpetrut> yes
09:22:21 <johnthetubaguy> lpetrut: FWIW, we use helm wrapped by ansible to deploy the management cluster, using the same helm we use from inside the magnum driver
09:22:55 <johnthetubaguy> lpetrut: it's interesting, I hadn't really considered that approach before now
09:23:18 <lpetrut> just curious, why would it be a bad idea for magnum to reach the managed cluster directly?
09:23:26 <johnthetubaguy> I think you could do that with the helm charts still, and "just" change the kubectl
09:24:15 <johnthetubaguy> lpetrut: I like the idea of magnum not getting broken by what users do within their clusters, and the management is separately managed outside, but it's a gut reaction, needs more thought.
09:24:32 <lpetrut> I see
09:25:03 <jakeyip> I'm afraid I don't understand how CAPI works without a management cluster
09:25:15 <johnthetubaguy> it's a trade off of course, there is something nice about only bootstrapping from the central cluster, and the long running management is inside each cluster
09:25:19 <jakeyip> might need some links if you have them handy?
09:25:45 <lpetrut> right, so the idea was to deploy CAPI directly against the managed cluster and have it manage itself
09:25:47 <johnthetubaguy> jakeyip: it's really about the CAPI controllers being moved inside the workload cluster after the initial bootstrap, at least we have spoken about that for the management cluster itself
09:26:19 <johnthetubaguy> lpetrut: you still need a central management cluster to do the initial bootstrap, but then it has less responsibility longer term
09:26:40 <johnthetubaguy> (in my head at least, which is probably not the same thing as reality)
09:27:30 <lpetrut> right, we'd no longer have to keep the management cluster around
09:27:55 <johnthetubaguy> well that isn't quite true right
09:28:02 <johnthetubaguy> ah, wait a second...
09:28:20 <johnthetubaguy> ah, you mean a transient cluster for each bootstrap
09:28:27 <lpetrut> yes
09:28:33 <jakeyip> johnthetubaguy: do you mean, 1. initial management cluster 2. create a workload cluster (not created in Magnum) 3. move it into this workload cluster 4. point Magnum to this cluster ?
09:28:42 <lpetrut> exactly
09:29:30 <johnthetubaguy> honestly that bit sounds like an operational headache, debugging wise, I prefer a persistent management cluster for the bootstrap, but transfer control into the cluster once it's up
09:29:43 <johnthetubaguy> ... but this goes back to what problem we are trying to solve I guess
09:29:46 <lpetrut> and I think we might be able to take this even further, avoiding the initial management cluster altogether, using userdata scripts to deploy a minimal cluster using kubeadm, then deploy CAPI
09:29:52 <johnthetubaguy> I guess you want magnum to manage all k8s clusters?
09:30:41 <johnthetubaguy> lpetrut: I mean you can use k3s for that, which I think we do for our "seed" cluster today: https://github.com/stackhpc/ansible-collection-azimuth-ops/blob/main/playbooks/provision_capi_mgmt.yml
09:30:43 <lpetrut> yeah, we were hoping to avoid the need of an external management cluster
09:31:32 <johnthetubaguy> ... yeah, there is something nice about that for sure.
09:31:34 <lpetrut> right now, we were hoping to get some feedback, see if it makes sense and if there's anyone interested, then we might prepare a POC
09:32:27 <johnthetubaguy> lpetrut: in my head I would love to see helm being used to manage all the resources, to keep them consistent, and keep the manifests out of the magnum code base, so it's not linked to your openstack upgrade cycle so strongly (but I would say that!)
09:33:23 <lpetrut> one approach would be to extend one of the CAPI drivers and customize the bootstrap phase
09:33:35 <johnthetubaguy> I guess my main worry is that it is a lot more complicated in magnum, but hopefully the POC would prove me wrong on that
09:33:54 <jakeyip> I think it's an interesting concept
09:34:02 <johnthetubaguy> jakeyip: +1
09:34:14 <johnthetubaguy> lpetrut: can you describe more what that would look like please?
09:35:36 <lpetrut> sure, so the idea would be to deploy a Nova instance, spin up a cluster using kubeadm or k3s, deploy CAPI on top so that it can manage itself and from then on we could use the standard CAPI driver workflow
09:35:43 <johnthetubaguy> at the moment the capi-helm driver "just" does a helm install, after having injected app creds and certs into the management cluster, I think in your case you would first wait to create a bootstrap cluster, then do all that injecting, then bring the cluster up, then wait for that to finish, then migrate into the deployed clusters, including injecting all the secrets into that, etc.
09:36:04 <lpetrut> exactly
09:36:21 <johnthetubaguy> lpetrut: FWIW, that could "wrap" the existing capi-helm driver, I think, with the correct set of util functions, there is a lot of shared code
09:36:33 <lpetrut> exactly, I'd just inherit it
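(A minimal sketch of the flow lpetrut and johnthetubaguy are describing here, assuming the capi-helm driver can be subclassed roughly as below; the class name, helper methods and overall shape are hypothetical, and the real driver's create/poll split is glossed over.)

    # Hypothetical sketch only: the helper methods and class name are
    # illustrative, not the real magnum-capi-helm API.
    from magnum_capi_helm import driver as capi_helm_driver  # module path from the repo linked above


    class SelfManagedCapiDriver(capi_helm_driver.Driver):  # "Driver" class name assumed
        """Bootstrap each workload cluster from a transient seed, then pivot."""

        def create_cluster(self, context, cluster, cluster_create_timeout):
            # 1. Boot a Nova VM and stand up a minimal kubeadm/k3s "seed"
            #    cluster via userdata (hypothetical helper).
            seed_kubeconfig = self._create_seed_cluster(context, cluster)

            # 2. Install the Cluster API controllers and the OpenStack
            #    provider on the seed, then reuse the normal helm-based
            #    create path, pointed at the seed instead of a shared
            #    management cluster.
            self._install_capi(seed_kubeconfig)
            super().create_cluster(context, cluster, cluster_create_timeout)

            # 3. Once the workload cluster is up, pivot the CAPI resources
            #    into it so it manages itself, then tear down the seed VM.
            self._pivot_into_workload_cluster(seed_kubeconfig, cluster)
            self._delete_seed_cluster(context, cluster)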
09:36:59 <johnthetubaguy> now supporting both I like, let me describe...
09:37:15 <johnthetubaguy> if we get the standalone management cluster in first, from a magnum point of view that is simpler
09:37:49 <johnthetubaguy> second, I could see replacement of the shared management cluster, with a VM with k3s on for each cluster
09:38:15 <johnthetubaguy> then third, you move from VM into the main cluster, after the cluster is up, then tear down the VM
09:38:45 <johnthetubaguy> then we get feedback from operators on which proves nicer in production, possibly its both, possibly we pick a winner and deprecate the others
09:38:52 <johnthetubaguy> ... you can see a path to migrate between those
09:39:02 <johnthetubaguy> lpetrut: is that sort of what you are thinking?
09:39:08 <lpetrut> sounds good
09:39:26 <johnthetubaguy> one of the things that was said at the PTG is relevant here
09:39:41 <johnthetubaguy> I think it was jonathon from the BBC/openstack-ansible
09:39:51 <lpetrut> yes, although I was wondering if we could avoid the initial bootstrap vm altogether
09:39:57 <johnthetubaguy> magnum can help openstack people not have to understand so much of k8s
09:40:10 <johnthetubaguy> lpetrut: your idea here sure helps with that
09:40:51 <johnthetubaguy> lpetrut: sounds like magic, but I would love to see it, although I am keen we make things possible with vanilla/mainstream cluster api approaches
09:40:52 <lpetrut> before moving further, I'd like to check with the CAPI maintainers to see if there's anything wrong with CAPI managing the cluster that it runs on
09:41:11 <johnthetubaguy> lpetrut: I believe that is a supported use case
09:41:26 <lpetrut> that would be great
09:41:38 <jakeyip> lpetrut: I think StackHPC implementation is just _one_ CAPI implementation. Magnum can and should support multiple drivers
09:41:55 <jakeyip> as long as we can get maintainers haha
09:42:01 <lpetrut> thanks a lot for feedback, we'll probably come back in a few weeks with a PoC :)
09:42:04 <johnthetubaguy> you can transfer from k3s into your created ha cluster, so it manages itself... we have a plan to do that for our shared management cluster, but have not got around to it yet (too busy doing magnum code)
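(For reference, the "transfer" step mentioned here is what clusterctl calls a move; a rough sketch follows, with placeholder kubeconfig paths and namespace, assuming the destination cluster already has the same CAPI providers installed via clusterctl init.)

    # Rough sketch of the pivot step: move the Cluster API objects from the
    # seed/management cluster into the newly created cluster so it can
    # manage itself afterwards. Paths and namespace are placeholders.
    import subprocess


    def pivot_capi_resources(seed_kubeconfig: str,
                             workload_kubeconfig: str,
                             namespace: str = "default") -> None:
        subprocess.run(
            [
                "clusterctl", "move",
                "--kubeconfig", seed_kubeconfig,         # source (seed) cluster
                "--to-kubeconfig", workload_kubeconfig,  # destination cluster
                "--namespace", namespace,                # namespace holding the Cluster objects
            ],
            check=True,
        )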
09:42:34 <jakeyip> I think what's important from my POV is that all the drivers people want to implement don't clash with one another.
09:42:36 <johnthetubaguy> jakeyip: that is my main concern, diluting an already shrinking community
09:42:41 <jakeyip> about that I need to chat with you johnthetubaguy
09:42:57 <johnthetubaguy> jakeyip: sure thing
09:43:00 <jakeyip> about the hardest problem in computer science - naming :D
09:43:05 <johnthetubaguy> lol
09:43:09 <johnthetubaguy> foobar?
09:43:17 <jakeyip> everyone loves foobar :)
09:43:41 <johnthetubaguy> which name are you thinking about?
09:44:01 <jakeyip> johnthetubaguy: mainly the use of 'os' tag and config section
09:44:12 <jakeyip> if we can have 1 name for all of them, different drivers won't clash
09:44:22 <johnthetubaguy> so about the os tag, the ones magnum use don't work with nova anyways
09:44:36 <johnthetubaguy> i.e. "ubuntu" isn't a valid os_distro tag, if my memory is correct on that
09:45:17 <johnthetubaguy> in config you can always turn off any in tree "clashing" driver anyways, but granted it's probably better not to clash out of the box
09:45:36 <jakeyip> yeah, is it possible to change them all to 'k8s_capi_helm_v1' ? so driver name, config section, os_distro tag is the same
09:45:47 <johnthetubaguy> jakeyip: I think I went for capi_helm in the config?
09:45:56 <jakeyip> yeah I want to set rules
09:46:02 <johnthetubaguy> jakeyip: I thought I did that all already?
09:46:20 <johnthetubaguy> I don't 100% remember though, let me check
09:46:58 <jakeyip> now is driver=k8s_capi_helm_v1, config=capi_helm, os_distro=capi-kubeadm-cloudinit
09:47:26 <johnthetubaguy> ah, right
09:47:47 <johnthetubaguy> so capi-kubeadm-cloudinit was chosen so as to match what is in the image
09:48:00 <johnthetubaguy> and flatcar will be different (it's not cloudinit)
09:48:08 <jakeyip> just thinking if lpetrut wants to develop something they can choose a name for driver and use that for config section and os_distro and it won't clash
09:48:46 <johnthetubaguy> it could well be configuration options in a single driver, to start with
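(To illustrate the naming point being discussed: with oslo.config, each driver can register its options under its own group, so an out-of-tree and an in-tree driver never fight over the same config section. The group name below follows the capi_helm section mentioned above; the option itself is hypothetical.)

    # Sketch only: the option name is made up, but the registration pattern
    # is standard oslo.config and keeps each driver's section separate.
    from oslo_config import cfg

    capi_helm_group = cfg.OptGroup(name="capi_helm",
                                   title="Options for the capi-helm driver")
    capi_helm_opts = [
        cfg.StrOpt("kubeconfig_file",
                   default="",
                   help="Path to the kubeconfig of the CAPI management "
                        "cluster (hypothetical option)."),
    ]


    def register_opts(conf=cfg.CONF):
        conf.register_group(capi_helm_group)
        conf.register_opts(capi_helm_opts, group=capi_helm_group)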
09:49:11 <schivu> hi, I will submit the flatcar patch on your github repo soon and for the moment I used capi-kubeadm-ignition
09:49:42 <johnthetubaguy> schivu: sounds good, I think dalees was looking at flatcar too
09:49:52 <jakeyip> yeah I wasn't sure how flatcar will work with this proposal
09:50:14 <johnthetubaguy> a different image will trigger a different bootstrap driver being selected in the helm chart
09:50:25 <johnthetubaguy> at least that is the bit I know about :) there might be more?
09:50:49 <schivu> yep, mainly with CAPI the OS itself is irrelevant, what matters is which bootstrapping format the image uses
09:51:18 <johnthetubaguy> schivu: +1
09:51:51 <johnthetubaguy> I was trying to capture that in the os-distro value I chose, and operator config can turn the in tree implementation off if they want a different out of tree one?
09:51:59 <johnthetubaguy> (i.e. that config already exists today, I believe)
09:52:37 <johnthetubaguy> FWIW, different drivers can probably use the same image, so it seems correct they share the same flags
09:53:06 <johnthetubaguy> (I wish we didn't use os_distro though!)
09:54:10 <johnthetubaguy> jakeyip: I am not sure if that helped?
09:54:37 <johnthetubaguy> what were you thinking for the config and the driver, I was trying to copy the pattern with the heat driver and the [heat] config
09:55:28 <johnthetubaguy> to be honest, I am happy with whatever on the naming of the driver and the config, happy to go with what seems normal for Magnum
09:55:58 <jakeyip> hm, ok am I right that for stackhpc, the os_distro tag in glance will be e.g. ubuntu=capi-kubeadm-cloudinit and flatcar=capi-kubeadm-ignition (as schivu said)
09:56:21 <johnthetubaguy> I am open to ideas, that is what we seem to be going for right now
09:56:33 <johnthetubaguy> it seems semantically useful like that
09:56:58 <johnthetubaguy> (we also look for a k8s version property)
09:57:37 <johnthetubaguy> https://github.com/stackhpc/magnum-capi-helm/blob/6726c7c46d3cac44990bc66bbad7b3dd44f72c2b/magnum_capi_helm/driver.py#L492
09:58:19 <johnthetubaguy> kube_version in the image properties is what we currently look for
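(A small illustrative helper for the image-property convention being described: os_distro selects the bootstrap format and kube_version gives the Kubernetes version. The property names come from the discussion above; the function itself is hypothetical, not code from the driver.)

    # Maps the discussed os_distro values to a bootstrap format; anything
    # else is rejected so a different driver can claim it.
    BOOTSTRAP_BY_OS_DISTRO = {
        "capi-kubeadm-cloudinit": "cloud-init",  # e.g. Ubuntu images
        "capi-kubeadm-ignition": "ignition",     # e.g. Flatcar images
    }


    def bootstrap_settings(image_properties: dict) -> tuple:
        """Return (bootstrap_format, kube_version) for a Glance image."""
        os_distro = image_properties.get("os_distro", "")
        if os_distro not in BOOTSTRAP_BY_OS_DISTRO:
            raise ValueError("unsupported os_distro for this driver: %r" % os_distro)
        kube_version = image_properties.get("kube_version")
        if not kube_version:
            raise ValueError("image is missing the kube_version property")
        return BOOTSTRAP_BY_OS_DISTRO[os_distro], kube_version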
10:00:29 <johnthetubaguy> jakeyip: what was your preference for os_distro?
10:00:30 <jakeyip> I was under the impression the glance os_distro tag needs to fit the 'os' part of the driver tuple
10:00:54 <johnthetubaguy> so as I mentioned "ubuntu" is badly formatted for that tag anyways
10:01:10 <johnthetubaguy> I would rather not use os_distro at all
10:01:43 <johnthetubaguy> "ubuntu22.04" would be the correct value, for the nova spec: https://github.com/openstack/nova/blob/master/nova/virt/osinfo.py
10:02:07 <jakeyip> see https://opendev.org/openstack/magnum/src/branch/master/magnum/api/controllers/v1/cluster_template.py#L428
10:02:42 <jakeyip> which gets used by https://opendev.org/openstack/magnum/src/branch/master/magnum/drivers/common/driver.py#L142
10:03:25 <johnthetubaguy> yep, understood
10:04:03 <johnthetubaguy> I am tempted to register the driver as None, which might work
10:04:55 <jakeyip> when your driver declares `{"os": "capi-kubeadm-cloudinit"}`, it will only be invoked if glance os_distro tag is `capi-kubeadm-cloudinit` ? it won't load for flatcar `capi-kubeadm-ignition` ?
10:05:25 <johnthetubaguy> yeah, agreed
10:05:48 <jakeyip> I thought the decision was based on values passed to your driver
10:05:54 <johnthetubaguy> I think there will be an extra driver entry added for flatcar, that just tweaks the helm values, but I haven't seen a patch for that yet
10:07:14 <jakeyip> that's what I gathered from over https://github.com/stackhpc/capi-helm-charts/blob/main/charts/openstack-cluster/values.yaml#L124
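(For context on the matching being referenced here: a Magnum driver advertises (server_type, os, coe) tuples via its provides property, and the cluster template's image os_distro has to match the "os" entry for that driver to be picked. Sketch below with an illustrative class name; a real driver also has to implement the create/update/delete methods, omitted here.)

    # Illustrative class, not the real driver: it only shows the provides
    # tuple that drives the os_distro matching in
    # magnum/drivers/common/driver.py linked above.
    from magnum.drivers.common import driver as common_driver


    class ExampleCapiHelmDriver(common_driver.Driver):
        @property
        def provides(self):
            # Selected only when the template's image has
            # os_distro=capi-kubeadm-cloudinit; a Flatcar image tagged
            # capi-kubeadm-ignition needs a driver entry advertising that
            # value instead.
            return [
                {"server_type": "vm",
                 "os": "capi-kubeadm-cloudinit",
                 "coe": "kubernetes"},
            ]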
10:08:25 <jrosser> johnthetubaguy: regarding the earlier discussion about deploying the capi management k8s cluster - for openstack-ansible I have a POC doing that using an ansible collection, so one management cluster for * workload clusters
10:08:25 <johnthetubaguy> capi_bootstrap="cloudinit|ignition" would probably be better, but yeah, I was just trying hard not to clash with the out of tree driver
10:10:11 <johnthetubaguy> jrosser: cool, that is what we have done in here too I guess, reusing the helm charts we use inside the driver: https://github.com/stackhpc/ansible-collection-azimuth-ops/blob/main/playbooks/provision_capi_mgmt.yml and https://github.com/stackhpc/azimuth-config/tree/main/environments/capi-mgmt
10:10:53 <jakeyip> will the flatcar patch be something that reads CT label and sets osDistro=ubuntu / flatcar? is this a question for schivu ?
10:10:57 <johnthetubaguy> jrosser: the interesting thing about lpetrut's idea is that magnum could manage the management cluster(s) too, which would be a neat trick
10:11:20 <jrosser> johnthetubaguy: i used this https://github.com/vexxhost/ansible-collection-kubernetes
10:12:38 <johnthetubaguy> jrosser: ah, cool, part of the atmosphere stuff, makes good sense. I haven't looked at atmosphere (yet).
10:13:15 <jrosser> yeah, though it doesn't need any atmosphere stuff to use the collection standalone, I've used the roles directly in OSA
10:14:04 <jakeyip> jrosser: curious how do you maintain the lifecycle of the cluster deployed with ansible ?
10:14:14 <johnthetubaguy> jrosser: I don't know what kolla-ansible/kayobe are planning yet, right now we just add in the kubeconfig and kept the CD pipelines separate
10:14:30 <jrosser> I guess I would be worried about making deployment of the management cluster using magnum itself much much better than the heat driver
10:15:14 <schivu> jakeyip: the flatcar patch adds a new driver entry; the ignition driver inherits the cloudinit one and provides "capi-kubeadm-ignition" as the os-distro value within the tuple
10:15:18 <jrosser> jakeyip: I would have to get some input from mnaser about that
10:15:51 <johnthetubaguy> I wish we were working on this together at the PTG to design a common approach, that was my hope for this effort, but it hasn't worked out that way I guess :(
10:17:21 <jakeyip> schivu: thanks. in that case can you use the driver name for os_distro? which is the question I asked johnthetubaguy initially
10:17:27 <johnthetubaguy> jrosser: the key difference with the heat driver is most of the work is in cluster API, with all of these approaches. In the helm world, we try to keep the manifests in a single place, helm charts, so the test suite for the helm charts helps across all the different ways we stamp out k8s, be it via magnum, or ansible, etc.
10:17:51 <johnthetubaguy> jakeyip: sorry, I misunderstood / misread your question
10:19:19 <johnthetubaguy> jakeyip: we need the image properties to tell us cloudinit vs ignition, however that happens is mostly fine with me
10:19:37 <johnthetubaguy> having this conversation in gerrit would be my preference
10:21:02 <jakeyip> sure I will also reply to it there
10:22:08 <johnthetubaguy> I was more meaning on the flatcar one I guess, it's easier when we see what the code looks like I think
10:22:26 <johnthetubaguy> there are a few ways we could do it
10:24:22 <johnthetubaguy> jakeyip: I need to discuss how much time I have left now to push this upstream, I am happy for people to run with the patches and update them, I don't want us to be a blocker for what the community wants to do here.
10:24:42 <jakeyip> yeah I guess what I wanted to do was quickly check if using os_distro this way is a "possible" or a "hard no"
10:24:47 <jakeyip> so I can make sure drivers don't clash
10:25:23 <johnthetubaguy> well I think the current proposed code doesn't clash right? and you can configure any drivers to be disabled as needed if any out of tree driver changes to match?
10:25:39 <jakeyip> johnthetubaguy: I am happy to take over your patches too, I have it running now in my dev
10:25:45 <johnthetubaguy> open to change the value to something that feels better
10:26:01 <jakeyip> cool, great to have that understanding sorted
10:26:12 <johnthetubaguy> jakeyip: that would be cool, although granted that means it's harder for you to +2 them, so swings and roundabouts there :)
10:27:38 <jakeyip> johnthetubaguy: well... one step at a time :)
10:27:51 * johnthetubaguy nods
10:28:19 <jakeyip> I need to officially end this cos it's over time, but feel free to continue if people have questions
10:28:27 <johnthetubaguy> jakeyip: are you aware of the tooling we have for the management cluster, that reuses the helm charts?
10:28:29 <johnthetubaguy> jakeyip: +1
10:28:31 <jakeyip> #endmeeting