15:00:16 <iurygregory> #startmeeting ironic
15:00:16 <opendevmeet> Meeting started Mon Aug 22 15:00:16 2022 UTC and is due to finish in 60 minutes.  The chair is iurygregory. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:16 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:16 <opendevmeet> The meeting name has been set to 'ironic'
15:00:30 <iurygregory> Hello everyone, welcome to our weekly meeting
15:00:34 <JayF> o/
15:00:39 <erbarr> o/
15:00:40 <matfechner> o/
15:00:43 <kubajj> o/
15:00:52 <TheJulia> o/
15:00:52 <rloo> o/
15:01:11 <dtantsur> o/
15:01:14 <iurygregory> you can find the agenda for the meeting in the wiki
15:01:17 <iurygregory> #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting
15:01:18 <ajya> o/
15:01:29 <iurygregory> #topic Announcements / Reminder
15:01:31 <kamlesh6808c> o/
15:02:43 <rpioso> o/
15:02:52 <iurygregory> the first announcement is a sad one
15:03:13 <iurygregory> #info Ilya Etingof Passed Away. Goodbye, etingof!
15:03:21 <iurygregory> #link https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030062.html
15:03:38 <ajya> Sorry to hear that, my condolences.
15:04:05 <iurygregory> I would like to share with our community, some of us knew him well
15:04:07 <arne_wiebalck> o/
15:05:45 <rloo> OH no, I'm so sorry.
15:07:03 <rpioso> I am very saddened to learn of etingof's passing. My condolences to the ironic community, Red Hat, and Ilya's family and friends.
15:07:03 <iurygregory> #info This week we will release our non-client libraries
15:07:55 <ajya> that's sushy also?
15:08:05 <iurygregory> ajya, correct
15:08:06 <dtantsur> sushy, ironic-lib, metalsmith
15:08:10 <iurygregory> yeah
15:08:26 <dtantsur> we need to check for outstanding patches (I have one, has some comments)
15:08:36 <iurygregory> so we will focus on reviewing these 3 to make sure we have included what we want in Zed =)
15:08:48 <iurygregory> dtantsur, yeah
15:09:00 <ajya> can I get 2nd reviewer for this https://review.opendev.org/c/openstack/sushy/+/850899 and include that in release?
15:09:27 <iurygregory> ajya, sure we will try to include the open patches =)
15:09:27 <JayF> I'll look; I haven't traditionally worked much on sushy but should probably ramp it up.
15:09:33 <ajya> thanks
15:09:52 <iurygregory> #info Antelope PTG etherpad
15:10:07 <iurygregory> #link https://etherpad.opendev.org/p/ironic-antelope-ptg
15:10:21 <iurygregory> just a reminder that our etherpad for the PTG is this one =)
15:11:39 <iurygregory> #info PTG registration
15:11:49 <iurygregory> #link https://openinfra-ptg.eventbrite.com/
15:11:56 <iurygregory> don't forget to register for the PTG
15:12:33 <TheJulia> Please register for the PTG so the foundation knows how many attendees plan to actively engage. This allows them to have information for future planning as well, so everyone attending registering would help them a lot.
15:13:34 <iurygregory> #info ironic-ui is fixed =)
15:13:51 * TheJulia suspects we all need to dance now
15:14:28 <dtantsur> yay!
15:14:31 <iurygregory> I don't know the irc handle of Vishal, tks for the help!
15:14:39 <iurygregory> #link https://review.opendev.org/c/openstack/ironic-ui/+/852702
15:15:21 <iurygregory> no action items from previous meeting, skipping
15:15:31 <iurygregory> #topic Review subteam status reports
15:15:40 <iurygregory> #link https://etherpad.openstack.org/p/IronicWhiteBoard
15:15:59 <iurygregory> starting around L90
15:18:25 <JayF> Are there even meaningful updates there to review?
15:18:43 <TheJulia> I guess the one w/r/t anaconda
15:18:46 <iurygregory> Anaconda CI
15:18:53 <iurygregory> yup =)
15:19:32 <TheJulia> Uhh... so... tl;dr is I cannot use opendev's mirror system without hacking in another feature (maybe) into the interface to explicitly delineate package repositories versus all the other artifacts
15:19:55 <iurygregory> =(
15:20:11 <TheJulia> in essence, the mirror can't take on more stuff without there being an increasingly negative impact, and the guidance is to just use public mirrors for folks doing Rocky linux
15:20:23 <JayF> Have we looked at if we can make the install lighter/faster in any way to help get past timeouts?
15:20:23 <TheJulia> so.. I *think* the net effect is I just need to get the timing right
15:20:39 <JayF> I'm mainly curious if we can pass flags to anaconda, disable some of the setup steps to get a thinner test
15:20:40 <TheJulia> it is fairly minimal, downloads can just take a ton of time
15:20:48 <JayF> but I haven't looked at a breakdown of what it's spending most of the time in
15:20:59 <JayF> except that one where we saw it was taking ~5 minutes for all the packages
15:21:00 <TheJulia> possibly, although again, I'm thinking we're talking borderline featurey things
15:21:09 <TheJulia> maybe those are okay since they would just be jinja2 parsing
15:21:11 <JayF> but really ... 5 minutes is small in context of an hour+ job
15:21:42 <TheJulia> yeah, latest run got to configuring the kernel post-install at 1hr. I think we're also having the lack of paravirt kill us too
15:21:56 <TheJulia> since it is a lot of CPU overhead on uncompressing
15:22:14 <TheJulia> I'm going to push forward, already working to make it its own single job
15:22:25 <TheJulia> just wanted folks to generally be aware
15:22:40 <JayF> if you want to pair on this at any point, or just get a second set of eyes, feel free to ping me and/or we can even set aside some time
15:22:51 <TheJulia> The "evil" option in my brain is just look to see if anaconda checked in with ironic, and then abort the deployu
15:22:56 <TheJulia> which could be valid too...
15:23:04 <TheJulia> Maybe we should discuss that instead
15:23:07 <JayF> that's exactly the kind of short circuiting I'm looking for
15:23:26 <TheJulia> Ultimately, it all comes down to the template and its contents, it is hyper customizable
15:23:28 <JayF> ironic does all the orchestration up front; do we want the CI to "catch" anaconda breakages, or just Ironic's ability to set the table for anaconda?
15:24:02 <TheJulia> It wouldn't catch the close out of the deploy... but that should be fairly clear if it breaks based upon reports/issues
15:24:16 <TheJulia> but starting/template processing/data handling, it would catch any issues there
15:24:33 <JayF> we could even intentionally have a deploy break and return an error to ironic
15:24:41 <JayF> testing the unhappy path is arguably more important than the happy one
15:24:56 <TheJulia> depending on what, yeah
15:25:07 <JayF> get anaconda going far enough for it to be able to do the err callback to ironic; then we know at least we setup anaconda as expected
15:25:08 <TheJulia> Anyway, I could use opinions here, I'm a bit tired of fighting this :)
15:25:21 <JayF> yeah, use me as the help for that, I'm somewhat guilty for helping upstream that sans-CI
15:25:25 <TheJulia> That might be a really good separate test, fwiw
15:25:43 <JayF> I am proposing it potentially as the only test, as we could probably skip all the package installs
15:25:44 <TheJulia> do one that aborts, do one that errors, call it a day
15:25:53 <JayF> then you get something working sooner, then worry about the happy/aborted path
15:25:59 <TheJulia> oh, error would get called before package installs I believe
15:26:04 <JayF> that's what I'm saying
15:26:25 <TheJulia> I think you just need to feed it invalid stuffs and it calls %onerror
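For context, the short-circuit being discussed maps onto kickstart's %onerror section. A purely illustrative fragment, not ironic's actual template (which is jinja2-rendered); the callback URL here is a hypothetical placeholder:

```
%onerror
# Placeholder callback: report failure back to ironic so a CI job can
# assert that anaconda started and reached its error handler, without
# waiting for a full package install. $CALLBACK_URL is a stand-in for
# whatever callback mechanism the real template renders in.
curl -X POST "$CALLBACK_URL" --data 'deploy_failed'
%end
```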
15:26:34 <rloo> what exactly are we trying to test wrt anaconda interface?
15:26:45 <rloo> minimum test I guess...
15:26:55 <JayF> Heh. You could go an even step further. Write test code in anaconda template, run it, use the returned err as an indication that things were setup as expected (or not)
15:26:58 <TheJulia> I'd like to know end to end it works
15:27:12 <TheJulia> *but* there is a lot of overhead to make it happy in our CI
15:27:13 <rloo> end-to-end == actually install an OS image?
15:27:24 <TheJulia> well, in this case, install from a repository
15:27:30 <JayF> rloo: yeah TheJulia has it working with an actual install, it just times out in the last few %
15:27:42 <TheJulia> like last 10% it looks like
15:27:52 <JayF> using the support added for repo-based deploys instead of liveimg based deploys
15:27:55 <TheJulia> but it is quite variable based upon the mirrors
15:28:41 <rloo> gad. hmm... and it needs everything from the mirrors?
15:29:11 <rloo> and where does the time out come from? can we increase it?
15:29:43 <TheJulia> not *everything* only like 320 rpms
15:29:50 <TheJulia> infra cannot carry the stage2/install image
15:29:56 <TheJulia> so... no local mirrors
15:29:57 <iurygregory> only...
15:30:03 <TheJulia> we should move on
15:30:06 <TheJulia> and continue in open discussion
15:30:24 <iurygregory> I was about to say this =)
15:30:36 <iurygregory> #topic Deciding on priorities for the coming week
15:30:43 <iurygregory> #link https://review.opendev.org/q/status:open+hashtag:ironic-week-prio
15:31:32 <JayF> I'm working on trying to get several Nova-Ironic driver patches backported, probably wouldn't hurt to get more Ironic +1s on them if folks want to add them to their review list as well. I'm tracking it here: https://etherpad.opendev.org/p/NovaPatchesFromJay
15:31:48 <JayF> (I don't think we can put weekly-prio tag on nova patches)
15:32:16 <kamlesh6808c> Can you please help add this patch to the weekly priority list : https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/853621
15:32:23 <TheJulia> I think we can hashtag it...
15:32:42 <iurygregory> TheJulia, normally only the owner of the patch can do that if I recall
15:32:50 <TheJulia> ahh, yeah!
15:32:52 <iurygregory> also depends on the config for the project
15:33:18 <iurygregory> for example https://review.opendev.org/c/openstack/nova/+/813897 has the hashtag
15:33:55 <iurygregory> so JayF you can probably try to add the hashtag (I think it should work...)
15:34:31 <iurygregory> kamlesh6808c, added
15:34:39 <kamlesh6808c> thanks !
15:34:56 <iurygregory> I'm adding dtantsur's patch https://review.opendev.org/c/openstack/sushy/+/851023 also
15:35:43 <JayF> iurygregory, all: I updated the hashtag on those nova stable patches owned by me (many are owned by others and I'm just playing frontman to get them merged lol)
15:35:54 <iurygregory> JayF, no worries!
15:35:58 <iurygregory> tks!
15:36:29 <iurygregory> not sure if Eric Lei is around to update https://review.opendev.org/c/openstack/ironic-lib/+/844666
15:36:48 <iurygregory> I'll push an edit later today so we can merge =)
15:38:20 <iurygregory> metalsmith doesn't seem to have patches we would need to review
15:38:52 <iurygregory> moving on o/
15:38:55 <iurygregory> #topic Baremetal SIG
15:38:56 <TheJulia> given the time, I think one of us should just make the change
15:39:18 <arne_wiebalck> NTR for the SIG
15:39:27 <iurygregory> tks arne_wiebalck =)
15:39:45 <iurygregory> #topic RFE review
15:40:00 <iurygregory> I'm a bit puzzled if the topic from open discussion would be rfe review... =)
15:40:26 <arne_wiebalck> yep, could be
15:40:48 <iurygregory> #info Discussion of the software RAID story
15:40:55 <iurygregory> #link https://storyboard.openstack.org/#!/story/2010233
15:41:21 <iurygregory> kubajj, o/
15:41:29 <arne_wiebalck> kubajj has been working on extending the disk protection to s/w RAID devices
15:41:55 <arne_wiebalck> one question we ran into is what to do with create_configuration
15:42:10 <arne_wiebalck> i.e. when the devices are re-created
15:42:18 <JayF> Can I ask a question a step behind that?
15:42:27 <arne_wiebalck> sure
15:42:27 <kubajj> Sure
15:42:31 <JayF> Why do we need the ability to explicitly skip disks that hold RAID partitions
15:42:45 <JayF> if the operator already has the (thanks to kubajj) ability to skip disks based on device hints?
15:43:10 <kubajj> Because RAIDs are skipped by default anyway. They are handled in a different function
15:43:22 <JayF> Ah, and you just want to add software raids to those that are skipped.
15:43:49 <JayF> We have to be careful how we implement this to prevent a malicious actor from putting something that looks like a raid superblock on a disk to prevent being cleaning
15:43:50 <kubajj> Yeah, the goal is just to extend the functionality.
15:43:51 <JayF> **cleaned
15:44:56 <arne_wiebalck> JayF: everything but the partitions which form the RAID are cleaned
15:45:09 <arne_wiebalck> well, almost everything :-D
15:45:34 <JayF> Yeah, I'm not saying we shouldn't do it, I'm saying we should be careful and make sure there's an opt-out for anyone with a higher security bar
15:46:13 <arne_wiebalck> JayF: sure, unless you explicit say on the node that you would like to skip sth, all will be cleaned
15:46:24 <arne_wiebalck> like before
15:46:29 <JayF> awesome
15:46:40 <arne_wiebalck> this is about the special case where you have multiple s/w RAID devices
15:46:48 <JayF> I apologize, some of this stuff, I don't know what happened when I wasn't looking so I appreciate you filling in the context
15:46:50 <arne_wiebalck> and you would like to skip cleaning *some*
15:47:22 <kubajj> exactly, the plan is to use the volume_name as mentioned in the story and include it in the skip_block_devices list in the properties section like with normal disks
15:48:00 <arne_wiebalck> the volume name is the name you give the device itself (not the block device file), it is md device metadata
15:49:02 <kubajj> https://review.opendev.org/c/openstack/ironic-python-agent/+/853182 enables actually creating logical disks with volume name enabled
15:49:17 <kubajj> I have tested it out on our testing node and it works
15:49:21 <arne_wiebalck> and the question was if there is an obvious problem with this ... I think dtantsur mentioned the inspector as one potential source of problems
15:50:23 <arne_wiebalck> otherwise we go ahead and see where it gets us :)
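For readers of the log, the scheme kubajj and arne_wiebalck describe above can be sketched roughly like this. This is not the actual ironic-python-agent implementation, just an illustration of the idea: the node's properties carry a skip_block_devices list, and a software RAID device whose md volume name matches an entry is left alone during cleaning.

```python
# Sketch only: illustrates matching md volume names against the node's
# skip_block_devices property, as discussed in the story. The function
# name and data layout are assumptions, not IPA's real API.

def should_skip_raid_device(volume_name, skip_list):
    """Return True if a software RAID device with this md volume name
    appears in the node's skip_block_devices list."""
    for entry in skip_list:
        if entry.get("volume_name") == volume_name:
            return True
    return False

# Example node properties in the shape described in the discussion:
properties = {
    "skip_block_devices": [
        {"volume_name": "large"},   # keep this md array's data on cleaning
    ]
}

skip = properties["skip_block_devices"]
print(should_skip_raid_device("large", skip))   # True  -> skipped
print(should_skip_raid_device("small", skip))   # False -> cleaned as before
```

As arne_wiebalck notes, this fails safe: a device only survives cleaning if the operator explicitly listed its volume name on the node.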
15:52:13 <iurygregory> I liked the idea, just trying to understand the inspector problem ...
15:52:51 <arne_wiebalck> the inspector adds the root device to the inspector data (I think)
15:53:11 <JayF> Are we not worried at all about the ability for whoever got that device provisioned to them being able to change that volume name?
15:53:25 <JayF> Like, if that's not a case we're worried about; awesome... but it's trivial for that volume name to change
15:54:13 <JayF> I was going to suggest PARTUUID but pretty much any unique identifier is changable from the system :(
15:54:18 * TheJulia wonders if we're scope creeping to cover all possibilities as opposed to trying to cover 90%
15:54:27 <arne_wiebalck> if you change the volume name, the next cleaning would erase your data
15:54:30 <JayF> TheJulia: that's why I asked if we were worried about it :D
15:54:36 <TheJulia> (not to say everything is good!, but obviously we need to start somewhere)
15:54:37 <JayF> arne_wiebalck: of course, so it fails safe
15:54:50 <JayF> aight, sounds like fun :) I look forward to reviewing it
15:54:55 <arne_wiebalck> heh
15:55:10 <arne_wiebalck> ok, we can check with dtantsur directly once more, seems he is not here atm
15:55:37 <dtantsur> I'm kinda here, not following the discussion tho
15:55:50 <JayF> I have a small item for open discussion, if we've talked this one through
15:56:06 <iurygregory> JayF, go ahead
15:56:25 <JayF> So, most of you know I've been working a new job, with 20% time dedicated generally to openstack and 80% to the other project I work on, Armada
15:56:33 <dtantsur> iurygregory: the problem with inspector is that it does not have access to any symlinks on the node
15:56:35 <JayF> over the next few weeks, those percentages will be swapping and you'll be seeing more of me around
15:57:07 <JayF> I'm going to focus, until I get into a good cycle of my own work, on upstreaming some stuff from downstream here, stable maintenance, and reviews
15:57:19 <JayF> but feel free to nerd snipe me in for bigger stuff as I will have the time to give
15:57:23 <iurygregory> JayF, nice
15:57:59 <arne_wiebalck> dtantsur: I fail to see how that is a problem
15:58:21 <dtantsur> the discussion was around resolving /dev/md/<something>
15:58:25 <JayF> I know we don't have a formal role for it anymore, but generally I am going to try to be the grand-poo-bah of stable branches, getting stuff backported (and starting with heralding our long-outstanding patches to our nova colleagues)
15:58:28 <dtantsur> I still haven't read the full scrollback
15:58:37 <arne_wiebalck> dtantsur: we can do this tomorrow or so
15:58:49 <dtantsur> JayF: I feel bad asking someone to look at our bug list.. but someone needs to look at our bug list
15:58:50 <JayF> That /dev/md/<something> is going away in newer mdadm aiui
15:59:02 <JayF> when you said volume name, I assumed that was shorthand for the partition label
15:59:02 <TheJulia> cool!
15:59:32 <JayF> dtantsur: when I get ramped up, i'll specifically cut aside some time to work on bugs
15:59:37 <TheJulia> JayF: is it hiding in under the lvm interface?
15:59:38 <arne_wiebalck> JayF:  no, what we mean is the name of the md device (which is sth you can set on an md device)
15:59:50 <JayF> arne_wiebalck: that is going away in newer mdadm/kernel combinations
15:59:58 <JayF> arne_wiebalck: people were complaining about it being gone in gentoo channels the other day
16:00:04 <dtantsur> arne_wiebalck: without /dev/md/<something>, volume name seems useless
16:00:04 <TheJulia> oh noes
16:00:12 <kubajj> JayF: you mean the --name option?
16:00:21 <arne_wiebalck> dtantsur: why?
16:00:28 <dtantsur> arne_wiebalck: how else do you access it?
16:00:31 <JayF> kubajj:  I *think* so? I'd have to look in depth to remember exactly
16:00:37 <arne_wiebalck> mdadm --detail
16:00:51 <dtantsur> arne_wiebalck: okay, here is the trick: how are you going to do it on the inspector side? :)
16:00:57 <dtantsur> assuming you want it as part of the root device hints
16:01:08 <arne_wiebalck> dtantsur: why do I need to do it on the inspector side?
16:01:14 <dtantsur> arne_wiebalck: because we already do
16:01:16 <iurygregory> we are at the top of the hour for the meeting , going to close and we can continue talking :D
16:01:22 <dtantsur> we parse and process root device hints in inspector
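For reference after the meeting: the volume name arne_wiebalck points at lives in the md metadata and is reported by `mdadm --detail`, so it can be read without relying on a /dev/md/&lt;name&gt; symlink (which, per JayF, may be absent on newer mdadm/kernel combinations). A sketch of extracting it (hypothetical helper, not IPA or inspector code; the sample output is abridged):

```python
# Sketch: pull the md volume name out of `mdadm --detail /dev/mdX`
# output. mdadm prints it as a "Name : ..." line, usually prefixed with
# the homehost as "host:name" when the name was set via --name.
import re

def parse_md_volume_name(detail_output):
    """Return the volume name from mdadm --detail output, or None."""
    match = re.search(r"^\s*Name\s*:\s*(\S+)", detail_output, re.MULTILINE)
    if not match:
        return None
    # Strip the "homehost:" prefix if present.
    return match.group(1).split(":", 1)[-1]

sample = """/dev/md0:
           Version : 1.2
        Raid Level : raid1
              Name : testhost:large
"""

print(parse_md_volume_name(sample))  # large
```

This sidesteps the symlink question on the IPA side; dtantsur's point that inspector also parses root device hints, and has no access to such output from outside the node, remains the open issue.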
16:01:24 <iurygregory> #endmeeting