09:03:56 <mnasiadka> #startmeeting magnum
09:03:56 <opendevmeet> Meeting started Wed Sep 20 09:03:56 2023 UTC and is due to finish in 60 minutes.  The chair is mnasiadka. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:03:56 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:03:56 <opendevmeet> The meeting name has been set to 'magnum'
09:04:03 <mnasiadka> jakeyip: couldn't resist ;)
09:04:16 <mnasiadka> #topic rollcall
09:04:18 <mnasiadka> o/
09:04:22 <johnthetubaguy> o/
09:04:22 <bbezak> o\
09:04:24 <dalees> o/
09:04:26 <mkjpryor> o/
09:04:30 <gbialas> o/
09:04:31 <travisholton> o/
09:04:34 <jakeyip> o/
09:05:10 <jakeyip> what a crowd :)
09:05:31 <mnasiadka> #topic agenda
09:05:40 <mnasiadka> BU Mentorship
09:05:47 <mnasiadka> ClusterAPI
09:05:54 <mnasiadka> Open discussion
09:05:59 <mnasiadka> #topic BU Mentorship
09:06:09 <mnasiadka> So, it was linked last time in the etherpad
09:06:22 <jakeyip> #link https://etherpad.opendev.org/p/magnum-weekly-meeting
09:06:23 <dalees> etherpad link: https://etherpad.opendev.org/p/magnum-weekly-meeting
09:06:40 <mnasiadka> I reached out to diablo_rojo - it seems there are no students that signed up, but it would be good if we could have a list of potential mentors from Magnum side
09:06:49 <mnasiadka> #link https://etherpad.opendev.org/p/2023-BU-Magnum
09:07:21 <mnasiadka> It would be nice if people interested would add their names on that etherpad to Mentors section
09:07:27 <opendevreview> John Garbutt proposed openstack/magnum master: WIP: ClusterAPI: add initial driver implementation  https://review.opendev.org/c/openstack/magnum/+/851076
09:07:29 <mnasiadka> #topic ClusterAPI
09:07:34 <mnasiadka> jakeyip: giving the meeting back to you ;-)
09:08:28 <opendevreview> John Garbutt proposed openstack/magnum master: WIP: ClusterAPI: add initial driver implementation  https://review.opendev.org/c/openstack/magnum/+/851076
09:08:29 <jakeyip> oh man the difficult topic
09:08:49 <jakeyip> johnthetubaguy are you around? (I guess you are online from ^)
09:08:57 <johnthetubaguy> I can update from our side what we are doing?
09:09:05 <jakeyip> yes thanks
09:09:38 <johnthetubaguy> So given we didn't get merged this cycle, and we are testing this with a few customers, we have created this repo for now: https://github.com/stackhpc/magnum-capi-helm
09:10:26 <johnthetubaguy> I have been rebasing the upstream patches so they could be kept in sync with the above
09:10:42 <johnthetubaguy> I certainly would still like a cluster api driver, that uses helm charts, in tree, if that is possible
09:11:05 <johnthetubaguy> This is the tip of the updated patch set: https://review.opendev.org/c/openstack/magnum/+/851076
09:11:49 <dalees> thanks for creating that repo btw; it's much easier to install from a repo than carry a stack of gerrit patches to test against.
09:12:05 <johnthetubaguy> in the process of retesting the devstack bits there, to see what I broke in all the refactoring (facepalm!)
09:12:15 <jakeyip> thanks. for background and comparison, VEXXHOST driver is out of tree, I don't think there's inclination for them to contribute to in-tree.
09:12:19 <johnthetubaguy> dalees: cool, glad that helps, certainly seemed easier going
09:13:08 <johnthetubaguy> so we spoke about things at the last PTG right, and I think the core difference is we want to use helm charts that can be shared with k8s on OpenStack outside of magnum
09:13:34 <johnthetubaguy> in particular I would like ArgoCD directly as an option (alongside Azimuth that has been using these for 18 months or more)
09:13:59 <johnthetubaguy> and really I would love for that to be a community effort, with a tested reference starting point people can use
09:14:37 <johnthetubaguy> ... so I don't think that vision has changed from when we approved the spec, granted we only got our funding approved about two weeks ago, hence the re-animation on our side
09:14:41 <jakeyip> yeap I understand. both approaches have their pros and cons, but let's not get too deep into that now, in the interest of time as that can take a while
09:15:03 <johnthetubaguy> So there is a new patch in the series
09:15:16 <johnthetubaguy> https://review.opendev.org/c/openstack/magnum/+/895828
09:15:17 <jakeyip> so we paused the merge because we realised the patches in tree, as it stands, would conflict with VEXXHOST and existing implementations using that driver
09:15:29 <johnthetubaguy> It starts some common utils to share between the two drivers
09:15:47 <johnthetubaguy> jakeyip: yes, I noticed that late last week in the comments, that is certainly bad
09:15:58 <johnthetubaguy> for the moment I went for changing our use of os-distro
09:16:19 <johnthetubaguy> "os": "capi-kubeadm-cloudinit"
09:16:40 <johnthetubaguy> now that is pretty crazy, but it represents that the current chart defaults depend on the kubeadm cloudinit bootstrapper
09:16:56 <johnthetubaguy> (we can add the flatcar one, with the appropriate values tweak too!)
09:17:20 <johnthetubaguy> with my Nova hat on, os_distro=ubuntu is technically malformed and not useful to nova
09:17:45 <dalees> I think, given both drivers require the same capi built images (ubuntu for now, flatcar soon), we need some other way of differentiating besides (vm, ubuntu, kubernetes) tuple. At the moment I prefer the idea of having the Magnum template define the driver preference (and it could default otherwise).
09:17:46 <johnthetubaguy> from a more human sense, it's a bit odd though, but it stops us conflicting for now
09:17:52 <jakeyip> cool. glad we all agree we shouldn't break existing implementations. certainly it caught us (me) by surprise. It would have helped if we had had a VEXXHOST representative reviewing those patches.
09:18:11 <johnthetubaguy> dalees: yeah, that would be ideal, of course you could only enable one of the drivers via config
09:18:32 <mnasiadka> jakeyip: I think they have hard time attending this meeting, due to time difference - but let's see if we can resolve it in long term :)
09:18:48 <johnthetubaguy> jakeyip: 100% we shouldn't break that driver, I am glad that got spotted, I certainly didn't notice that till it was pointed out
09:19:21 <jakeyip> mnasiadka: yeah we can put one of them as a core to review patches, offline from meeting TZ.
09:19:21 <dalees> johnthetubaguy: yes, there are config flags to disable certain drivers - this solves one part. jakeyip brought up the point that if someone was migrating between drivers, they'd want both running at once for some duration.
09:19:26 <johnthetubaguy> mnasiadka: +1 the overlap is hard
09:19:38 <mnasiadka> Let's not get deep into technical stuff here, Gerrit is for code reviews (at least that's my opinion) - situation is that we have Bobcat RC1 right now, so if we merge any in-tree driver then it's going to be earliest Caracal
09:19:50 <johnthetubaguy> dalees: jakeyip yeah, good point, you would want side by side
09:20:31 <johnthetubaguy> FWIW, I kinda like the idea of the template specifying a driver more directly, and the driver validating the image itself
09:21:53 <johnthetubaguy> we could look at writing up a spec for that? but general guidance on the best way to implement that is very welcome
09:22:04 <johnthetubaguy> a label is tempting, but also nasty
09:22:11 <jakeyip> in addition to the config approach: the current config option is 'disabled_drivers'; a new driver in a new cycle will be enabled by default, if the operator hasn't updated their config
09:22:36 <johnthetubaguy> top level template param to select the driver? if empty falls back to the legacy image based selection?
09:23:07 <johnthetubaguy> jakeyip: I have your beta driver change in my series to make sure its opt in till we are happy
09:23:49 <dalees> yeah, top level template param seems suitable i think, with empty fallback to the tuple match (or *also* do the tuple match, as well as the driver selection?).
09:24:06 <jakeyip> the problem really is the tuple design, it will hamstring us if we keep trying to work with it. plus if we introduce new tuples, we run the risk of it breaking an installation out there
09:24:54 <johnthetubaguy> I was thinking a top level template param means we just ignore the tuple, and we call that legacy for when the top level param is empty?
09:25:29 <johnthetubaguy> i.e. you opt into the new system, for new templates.
09:25:54 <jakeyip> johnthetubaguy: yeap, that is sort of the design we came up with last week after our brainstorming session.
09:26:00 <johnthetubaguy> and you just say, driver=k8s_capi_helm_v1 or whatever in the template. That does mean we need an API to list the available drivers.
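The selection rule discussed above (an explicit driver named in the template wins, an empty value falls back to the legacy tuple match) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Magnum's actual code: the `Driver` and `ClusterTemplate` classes, the `driver` field, the `select_driver` function and the driver name `example_out_of_tree_capi` are all hypothetical; only `k8s_capi_helm_v1`, `disabled_drivers` and the (vm, ubuntu, kubernetes) tuple come from the discussion.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Driver:
    name: str
    # Legacy matching key: (server_type, os_distro, coe).
    provides: Tuple[str, str, str]


@dataclass(frozen=True)
class ClusterTemplate:
    server_type: str
    os_distro: str
    coe: str
    # Hypothetical new top-level parameter; None means "legacy behaviour".
    driver: Optional[str] = None


def select_driver(template: ClusterTemplate,
                  drivers: Sequence[Driver],
                  disabled_drivers: Sequence[str] = ()) -> Driver:
    """Pick a driver by explicit name, else by the legacy tuple match."""
    enabled = [d for d in drivers if d.name not in disabled_drivers]

    if template.driver:
        # New path: the tuple is ignored entirely.
        for candidate in enabled:
            if candidate.name == template.driver:
                return candidate
        raise LookupError(f"driver {template.driver!r} is not enabled")

    # Legacy path: exactly one enabled driver must claim the tuple.
    key = (template.server_type, template.os_distro, template.coe)
    matches = [d for d in enabled if d.provides == key]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} drivers match tuple {key}")
    return matches[0]


# Two drivers that both accept the same CAPI-built ubuntu image.
DRIVERS = [
    Driver("k8s_capi_helm_v1", ("vm", "ubuntu", "kubernetes")),
    Driver("example_out_of_tree_capi", ("vm", "ubuntu", "kubernetes")),
]
```

With both drivers enabled, a legacy template (no `driver` set) is ambiguous and raises, while a template that sets `driver="k8s_capi_helm_v1"` resolves cleanly. That is the side-by-side case raised for operators migrating between drivers.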
09:26:37 <mnasiadka> yes, but that might be easier to do when we drop all other drivers
09:26:47 <johnthetubaguy> seems cleaner, I honestly hate the tuple thing, I have spent hours debugging that with image permissions, etc.
09:27:05 <dalees> true; there's a CLI tool for listing drivers, but not an API.
09:27:43 <mnasiadka> seems like a nice priority for C cyle
09:27:46 <johnthetubaguy> mnasiadka: well we can't drop the old approach without killing the API right, so I don't see why that needs to wait? this is all about something for C release anyways?
09:27:47 <mnasiadka> *cycle
09:28:49 <johnthetubaguy> is there someone who wants to take this one and write up a spec I guess?
09:28:50 <jakeyip> good point about API, the old discovery is useful in certain circumstances where a user creates a template, possibly off the information on magnum user docs
09:29:08 <mnasiadka> johnthetubaguy: just saying all other drivers are already deprecated, and it might be just easier to get that implemented when we drop them for simplicity
09:29:24 <johnthetubaguy> so the API would exclude disabled drivers, and include any out of tree things you happen to have installed, I presume.
09:29:27 <jakeyip> I am not sure if that is a thing anymore when we are talking about out of tree drivers nowadays. how does the user know what os_distro to use if they didn't know about the driver(s)
09:30:08 <johnthetubaguy> So personally, I think we should disable users from creating templates, by default, and leave that to the admin... but I might be on my own there.
09:30:21 <johnthetubaguy> (but that is a whole other RBAC debate really)
09:30:27 <jakeyip> what, more api changes?! :)
09:31:04 <dalees> johnthetubaguy: +1, we create these for users. self-serve is a minefield of labels :)
09:31:24 <johnthetubaguy> so many of our customers do that, as the users get themselves in a mess when they create crazy templates, and they just don't have the time to help them with that
09:31:25 <jakeyip> FWIW we create templates for user. Users do create their templates off ours if they want something special.
09:31:32 <mnasiadka> I think let's not get into the policy battle for now
09:31:41 <mnasiadka> people can change magnum policy today and I think that's fine
09:31:48 <jakeyip> but let's shelve that, a whole other discussion
09:31:49 <mnasiadka> instead of enforcing our thinking on users :)
09:32:24 <johnthetubaguy> the problem is relevant to who is expected to create a template, and needing to know the available driver list right? but happy to ignore that for now
09:32:30 <johnthetubaguy> this feels like a PTG like discussion to me
09:32:32 <jakeyip> but I think what this leads to is that the API to discover drivers is not strictly necessary?
09:33:03 <jakeyip> at least, not initially to support the CAPI driver. It can come later, if someone wants to.
09:33:40 <mnasiadka> I don't think it's strictly necessary for anything, it's nice to have - and given the size of the Magnum community, we might find better use for our time
09:34:10 <johnthetubaguy> I don't personally see this as blocking the driver, it only blocks side by side having two cluster api drivers, assuming we have the same os_distro flag, which now we don't
09:34:29 <johnthetubaguy> ... it does however seem useful and worth adding, either way
09:34:33 <jakeyip> +1 I agree. we will accept patches if someone does the work. needs a spec. let's leave it at that
09:34:57 <johnthetubaguy> does anyone want to implement that?
09:35:07 <mnasiadka> Let's leave that question for PTG
09:35:09 <johnthetubaguy> (I mean does anyone have time, really)
09:35:22 <jakeyip> johnthetubaguy: if we continue on the new approach to use CT to define driver, it's not a blocker for two CAPI driver side by side
09:35:23 <johnthetubaguy> Is magnum signed up to the PTG?
09:35:25 <jrosser> from an operator pov you need a k8s running capi somewhere - is this completely independent/decoupled from the particular magnum capi driver in use?
09:36:10 <jakeyip> jrosser: good point. there's a config option to point to the management option. I think we have to make sure config options don't clash
09:36:35 <johnthetubaguy> not quite, as the CRDs the drivers use might depend on specific operator versions, but in theory they should overlap a lot
09:36:38 <jakeyip> at some point, Magnum still needs to assign config sections to prevent conflicts
09:37:01 <jakeyip> e.g. each driver will have their section named after driver name
09:37:22 <johnthetubaguy> so the heat driver does that now, so I went for capi_helm in my current patch series
09:37:38 <jakeyip> edit: there's a config option to point to the management cluster
09:38:52 <johnthetubaguy> I think the two drivers are doing that today?
09:39:03 <johnthetubaguy> as in they both have separate config, or did you mean they should share?
09:39:07 <jakeyip> johnthetubaguy: I don't really understand what you mean by 'heat driver does that now' ?
09:39:40 <johnthetubaguy> jakeyip: I guess it's not really true, I mean there is a [heat] config section that is specific to the heat driver, but I guess it's not really that clean
09:39:59 <jakeyip> johnthetubaguy: I mean, currently for both drivers, and also for future drivers, we sort of should have a standard for people implementing out of tree drivers to prevent conflicts
09:40:23 <jrosser> imho there needs to be some good thought to operator experience, i foresee a large attraction in general of the capi driver approach is to relieve the operator from having extremely deep k8s expertise but still be able to run the magnum service
09:42:06 <jakeyip> there are a few point of conflicts. (1) tuple (we solve by CT specifying driver) (2) config section (3) driver name
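The three points of conflict listed above (tuple, config section, driver name) lend themselves to a simple check over whatever drivers are installed. The sketch below is hypothetical and not part of Magnum: `DriverInfo`, `find_conflicts`, the `package` field and the `example-capi-driver` entries are invented for illustration; `capi_helm`, `k8s_capi_helm_v1` and the `capi-kubeadm-cloudinit` os_distro value are taken from the discussion.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence, Tuple


class DriverInfo(NamedTuple):
    package: str          # hypothetical: Python package shipping the driver
    name: str             # driver name
    config_section: str   # config file section the driver reads
    match_tuple: Tuple[str, str, str]  # (server_type, os_distro, coe)


def find_conflicts(drivers: Sequence[DriverInfo]) -> Dict[str, Dict[object, List[str]]]:
    """Report every tuple, config section or name claimed by more than one package."""
    claims = {
        "tuple": defaultdict(list),
        "config_section": defaultdict(list),
        "name": defaultdict(list),
    }
    for d in drivers:
        claims["tuple"][d.match_tuple].append(d.package)
        claims["config_section"][d.config_section].append(d.package)
        claims["name"][d.name].append(d.package)
    # Keep only the values that have more than one claimant.
    return {
        kind: {value: pkgs for value, pkgs in by_value.items() if len(pkgs) > 1}
        for kind, by_value in claims.items()
    }


# The helm driver after its os_distro change, next to a made-up second driver.
installed = [
    DriverInfo("magnum", "k8s_capi_helm_v1", "capi_helm",
               ("vm", "capi-kubeadm-cloudinit", "kubernetes")),
    DriverInfo("example-capi-driver", "example_capi", "example_capi",
               ("vm", "ubuntu", "kubernetes")),
]

for kind, clashes in find_conflicts(installed).items():
    print(kind, dict(clashes))
```

With the distinct os_distro values shown here nothing clashes; if both entries used ("vm", "ubuntu", "kubernetes") the "tuple" entry would list both packages, which is the situation that paused the merge.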
09:42:43 <jakeyip> I think understanding this is enough so we can help advise future patches / reviews.
09:43:00 <jakeyip> jrosser: can you elaborate? is there something we can do better?
09:43:40 <jrosser> i am very interested in the new direction of travel for magnum, it looks great
09:44:20 <jrosser> as an operator we have been unable to deploy the existing approach as it is too much burden
09:45:03 <johnthetubaguy> +1 Cluster API is a strong base, regardless of all this, and I am excited by the further traction its getting inside and outside of OpenStack
09:45:20 <jakeyip> ah I see, understand now.
09:46:26 <jakeyip> ok to bring the driver discussion back, would like to poll the room on our approach and if there's anything we've missed
09:48:40 <johnthetubaguy> jakeyip: my main question is what is needed to help get the capi_helm driver merged in the C cycle? And I suspect that is probably best discussed at the PTG if we can find a slot when lots of us can attend, including vexxhost driver representatives. I know I can't usually make this meeting time either.
09:50:06 <jakeyip> johnthetubaguy: will you, or someone from StackHPC, be able to work with VEXXHOST to align both your drivers?
09:50:10 <johnthetubaguy> Or maybe I should rephrase, that, to getting a clear yes we aim to merge (as we said in B) or we decide to not have any drivers in tree.
09:50:34 <johnthetubaguy> jakeyip: we have wanted to do that from the start, but helm is the deal breaker from both sides, based on our previous discussions
09:51:32 <johnthetubaguy> I have started some driver common utils where we have a bit of shared code already, it would be great to expand that, so the out of tree driver can consume that when it makes sense for it, or not, as it may choose.
09:52:17 <jakeyip> johnthetubaguy: I think VEXXHOST mainly wants to continue on the ClusterClass route.
09:52:51 <johnthetubaguy> I want to use clusterclass too, its on the roadmap, just spending our effort on magnum integration right now
09:53:08 <johnthetubaguy> it wasn't working when we started the helm charts, so we didn't use it from the start
09:53:26 <jakeyip> cool. definitely it can be a good collab.
09:53:30 <johnthetubaguy> similar to the add ons, we have our own helm add on installer, to get around the life-cycle issues on the updates there
09:54:44 <jakeyip> right now, our focus is more on how to get both drivers to work together. Once we figure that out, the way to getting your CAPI driver merged will be clear
09:54:50 <johnthetubaguy> jakeyip: I would like to collab, but I am not seeing the opportunity right now, beyond some shared utils, which is a shame, my strong preference is for a single in tree solution we all support :'(
09:55:14 <jakeyip> yeap I am keen for that.
09:56:44 <dalees> I think Magnum needs to ensure both drivers can co-exist (discussed above), and then I'm keen to see one or both merge if there are maintainers for them. Staying out of tree is okay too. We're actively picking up the helm driver and using it now.
09:57:14 <johnthetubaguy> I would be cool with both merging in tree too, that would be cool too right?
09:57:35 <jakeyip> the single in tree solution doesn't have to be there from day one. as long as we allow for multiple drivers, we can iterate
09:57:41 <johnthetubaguy> (much easier to share code and logic when we are both in tree)
09:58:28 <johnthetubaguy> jakeyip: I believe the current patches actively allow for both drivers today? at least that was always the intention on my part, although it wasn't the reality due to the problems we found in review.
09:58:56 <jakeyip> similar to how linux can work with different FS. Allow multiple to exist, see which one wins out over time.
09:59:02 <johnthetubaguy> maybe a different question, based on the current patches, where is the tention?
09:59:18 <johnthetubaguy> s/tention/conflict/
09:59:35 <jakeyip> we need maintainers for any driver, which is the _hard_ problem.
09:59:53 <johnthetubaguy> jakeyip: that is a good comparison here, we use different package managers... I mean add on providers
10:00:59 <jakeyip> johnthetubaguy: I believe we have covered the conflicts so far?
10:01:27 <johnthetubaguy> jakeyip: I mean, I think they are all addressed in the current patches, I would love comments on the patches highlighting any that are left please
10:01:31 <dalees> johnthetubaguy: the actual image supported by both is identical. it's true they don't conflict now in the tuple, but only because of the changed `os_distro` (as well reasoned as it is, to differ from `ubuntu`) - there's no reason the vexxhost driver shouldn't launch using an identical image.
10:03:02 <johnthetubaguy> dalees: agreed the same images should work with both
10:03:34 <johnthetubaguy> but I don't know of a use case we are blocking that would actively want to support that
10:04:20 <johnthetubaguy> I do like the driver selection in the template, it would be a good add, and avoid this problem
10:05:25 <jakeyip> johnthetubaguy: I am not really sure of the question and basis you are asking from actually. may need clarification.
10:05:50 <jakeyip> are you asking for conflicts, on the basis that (1) os_distro has been changed (2) CT approach is not implemented ?
10:05:55 <dalees> well that is a point; if I were moving drivers I'd not need both drivers to launch the same image; I'd use a new k8s version.
10:06:41 <johnthetubaguy> jakeyip: I am meaning, now we use different os_distro flags, I think they, for most people, don't conflict
10:07:10 <johnthetubaguy> dalees: you wouldn't need to wait long to get a new release at least :)
10:07:46 <jakeyip> johnthetubaguy: ok I understand. yes things don't conflict now, with the os_distro change, but there are more about using the os_distro that I need to clarify.
10:08:06 <jakeyip> it is outside the scope of meeting though, if you would like to hang around a bit after?
10:08:19 <jakeyip> I am just conscious we are over time
10:08:27 <mnasiadka> ok, do we need some summary?
10:08:41 <jakeyip> OK I will summarise this topic
10:08:49 <johnthetubaguy> afraid I really should run, I was meant to be with my customer at 9am, and its gone 11 now.
10:09:13 <jakeyip> johnthetubaguy: sure I may put comments in your PS then. or we find a better time
10:10:59 <jakeyip> #agreed we have stopped the CAPI driver in Bobcat due to conflict with VEXXHOST's driver. We will explore solutions that allow multiple drivers to exist in C cycle.
10:11:09 <jakeyip> anyone wants to add on?
10:11:21 <johnthetubaguy> jakeyip: yes to both, lets discuss in the patch
10:11:32 <mnasiadka> yes, we need a timeframe for vPTG and some etherpad to add all those ideas in there :)
10:12:14 <johnthetubaguy> So I don't really agree there is a conflict, but agree we need to work that out during the C cycle.
10:12:40 <jakeyip> mnasiadka: OK, vPTG will be separate discussion.
10:12:53 <jakeyip> let's close ClusterAPI
10:12:55 <johnthetubaguy> (I should say, I still don't really understand the conflict right now, lets work that out ASAP)
10:13:06 <jakeyip> #topic vPTG
10:13:44 <jakeyip> I didn't register for a vPTG this cycle because for the previous cycle the timing doesn't suit us, and we had our own vPTG discussion at this timeslot on the vPTG week
10:14:16 <jakeyip> I am not sure what's the best way to go, considering that it'll be valuable to have VEXXHOST attend too.
10:14:34 * dalees is willing to make some ungodly NZ hour, if it means more can attend.
10:14:55 <mnasiadka> I'm willing to make the same as dalees, but it will be 12 hours later for me ;)
10:15:15 <jakeyip> (this is all new to me/us, happened after vPTG registration has closed)
10:15:27 <mnasiadka> I think generally 19UTC is 7AM NZ time and 9PM CEST
10:16:06 <dalees> 7am is quite acceptable, for both myself and travisholton
10:16:09 <jakeyip> it is 5AM AEST but sure :)
10:16:19 <dalees> hah, sorry jakeyip  :)
10:16:23 <mnasiadka> ah right
10:16:24 <mnasiadka> lol
10:16:32 <jakeyip> it's ok
10:17:24 <johnthetubaguy> are there two different slots that would work, where many people can make both? (granted its all relative to massive jet lag)
10:17:29 <travisholton> we have daylight savings from next week as well
10:18:16 <johnthetubaguy> we have that soon too, that is a good point, is this where it gets worse again or better? I forget...
10:18:31 <travisholton> https://shorturl.at/fuvAD
10:18:34 <mnasiadka> travisholton: that might make things better, because our daylight savings time change is 29th Oct
10:18:53 <jakeyip> mnasiadka, johnthetubaguy, dalees - given you all may have different vPTGs to attend to, how is Wed 25/10 1900 UTC ?
10:19:28 <mnasiadka> it's ok for me
10:19:45 <johnthetubaguy> sorry, checking
10:19:55 <jakeyip> that is also a slot NOT on the vPTG https://ptg.opendev.org/ptg.html
10:20:40 <johnthetubaguy> I think that can work (its half term here)
10:20:46 <jakeyip> they may as well put 24 hour slots for the vPTG
10:21:14 <dalees> 8am NZT, yes - I can make that work.
10:21:20 <jakeyip> ptg-bot doesn't need toilet break
10:21:24 <travisholton> me too
10:22:05 <johnthetubaguy> sounds like its worth trying that time, and asking on the ML for folks that can't make that time
10:22:25 <jakeyip> ok let's pencil that in for now.
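For anyone checking the proposed slot against their own clock, the arithmetic can be sketched as below. The fixed UTC offsets are assumptions about which daylight-saving rules apply to each attendee on 25 Oct 2023 (NZDT and AEDT already in effect, CEST and BST ending on 29 Oct); they are not taken from a timezone database, and the labels are only illustrative.

```python
from datetime import datetime, timedelta, timezone

# Proposed slot from the discussion: Wed 25 Oct 2023, 19:00 UTC.
proposed = datetime(2023, 10, 25, 19, 0, tzinfo=timezone.utc)

# Assumed offsets (hours ahead of UTC) on that date.
OFFSETS = {
    "NZDT": 13,  # New Zealand, daylight time
    "AEDT": 11,  # eastern Australia, daylight time (where observed)
    "CEST": 2,   # central Europe, summer time
    "BST": 1,    # UK, summer time
}


def local_times(moment, offsets):
    """Convert one UTC moment into each labelled fixed-offset zone."""
    return {
        label: moment.astimezone(timezone(timedelta(hours=hours), label))
        for label, hours in offsets.items()
    }


for label, local in local_times(proposed, OFFSETS).items():
    print(f"{label}: {local:%Y-%m-%d %H:%M}")
```

Under these assumptions the slot lands at 08:00 on 26 Oct for NZDT, which agrees with the "8am NZT" figure given above, and at 21:00 on 25 Oct for CEST.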
10:22:37 <opendevreview> Michal Nasiadka proposed openstack/magnum-tempest-plugin master: WIP: k8s driver CI tests  https://review.opendev.org/c/openstack/magnum-tempest-plugin/+/893131
10:23:49 <jakeyip> johnthetubaguy: ok I will make a post on ML to notify others. I am still holding out for mnaser too
10:24:02 <jakeyip> #topic Open Discussion
10:24:37 <jakeyip> free for all to post
10:25:35 <jakeyip> oops forgot for previous topic
10:26:03 <jakeyip> #agreed vPTG on 25 Oct 1900 UTC
10:26:30 <jakeyip> if there's nothing I would like to end the meeting.
10:26:56 <jakeyip> thanks everyone for coming!
10:27:23 <jakeyip> #endmeeting