18:00:22 <JayF> #startmeeting tc
18:00:22 <opendevmeet> Meeting started Tue Jun 27 18:00:22 2023 UTC and is due to finish in 60 minutes.  The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:22 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:00:22 <opendevmeet> The meeting name has been set to 'tc'
18:00:33 <JayF> #topic Roll Call
18:00:39 <rosmaita> o/
18:00:39 <gmann> o/
18:00:39 <jamespage> o/
18:00:40 <spotz[m]> o/
18:00:47 <noonedeadpunk> \o semi around
18:00:58 <JayF> Hello everyone, welcome to the June 27th meeting of the TC. I'm your substitute chair for the meeting as knikolla is not feeling well.
18:00:58 <dansmith> o/
18:01:10 <slaweq> o/
18:01:21 <JayF> A reminder that this meeting is held under the OpenInfra Code of Conduct available at https://openinfra.dev/legal/code-of-conduct
18:01:53 <JayF> Thanks everyone for coming, it's obviously quorum so I'm going to get going
18:02:04 <JayF> #topic Follow up on past action items
18:02:35 <JayF> #note JayF followed up on action item to capture discussion related to SQLAlchemy2.0 migration
18:02:42 <JayF> #link https://review.opendev.org/c/openstack/governance/+/887083
18:03:03 <JayF> I'd love to get feedback on that, and thanks gmann already for participating so we got a revision done before the meeting :)
18:03:14 <gmann> perfect, thanks, I will check
18:03:25 <JayF> If there's any comments on that in IRC now I'll hear them, otherwise I'll be responsive to reviews in gerrit.
18:03:58 <JayF> the other action item was for Kristi, who isn't here today
18:04:11 <JayF> #action knikolla will write alternative proposals for Extended Maintenance
18:04:25 <spotz[m]> A few grammar things I see off the top of my head, I'll comment in gerrit
18:04:29 <JayF> I'm carrying that action over to next meeting, but maybe we'll have some input for him, as that's the next topic
18:04:30 <rosmaita> i haven't seen any feedback about EM on the mailing list ... did i miss it?
18:04:39 <JayF> #topic Extended Maintenance
18:04:49 <dansmith> rosmaita: I don't think so, I'm also looking for it
18:05:03 <gmann> yeah, no discussion on ML yet
18:05:08 <rosmaita> ok, thanks for the confirmation
18:06:04 <JayF> Extended maintenance is on the agenda as a topic, but I'm unsure what we are looking for.
18:06:39 <gmann> in the last meeting, we talked about discussing it on the ML before the TC decides/takes any next step
18:06:59 <gmann> and the forum sessions also ended with a plan to continue the discussion on the ML
18:07:03 <JayF> I'll take an action item to create a discussion thread on the ML about it then, if folks are amenable to that?
18:07:19 <JayF> I know I'm one of the people who thought longer form comms would be helpful for this, so I can take the action to try and kick it off.
18:07:25 <JayF> Any objection or other volunteers? :)
18:07:36 <spotz[m]> go for it:)
18:07:36 <dansmith> I thought knikolla was going to do that?
18:07:39 <dansmith> isn't that the action item?
18:07:49 <JayF> His action was listed as writing alternate proposals for Extended Maintenance
18:08:02 <JayF> I was under the impression that'd be in the form of a gerrit review to governance, not a mailing list discussion
18:08:12 <dansmith> I was assuming the opposite
18:08:37 <JayF> You are correct, looking at the logs.
18:08:40 <gmann> I think email is what he also agreed to, instead of writing a governance change for now?
18:08:53 <JayF> I'll reach out to knikolla tomorrow; if he wants to continue the action he can and I'll support in whatever way I can.
18:09:03 <gmann> ++
18:09:08 <rosmaita> email first makes sense, to find out what form the governance proposal should take
18:09:23 <gmann> yeah
18:09:46 <JayF> Anything else on Extended Maintenance topic before we move along?
18:10:28 <gmann> nothing else from me
18:10:50 <JayF> #topic Gate Health Check
18:11:10 <gmann> it's been very unstable since last week, and still is, I'd say
18:11:15 <dansmith> yeah
18:11:24 <slaweq> I saw many timed-out jobs in the last few days/weeks
18:11:29 <dansmith> that new test and 100% fail on ceph caught us off guard
18:11:29 <gmann> I am seeing a lot of timeouts in tempest slow and other jobs too
18:11:30 <JayF> Ironic has been fighting an issue for at least two weeks now where our DB migration unit tests hang in the gate. We've been working to fix it, and the incidence is somewhat reduced, but it's been extremely painful.
18:11:38 <slaweq> is it like that in other projects too, or is it something neutron-related?
18:11:47 <gmann> yeah, the rebuild test was failing 100% on ceph; it's skipped now to unblock the gate
18:12:16 <gmann> but timeouts are a lot more common nowadays, not sure what changed
18:12:32 <JayF> There was also a breaking change in Ironic tests due to a tempest image naming change (I'm not sure if we were misusing it or the API changed; but it's fixed now, ty gmann and others who worked on it)
18:12:40 <dansmith> JayF: that sounds like maybe a good opportunity for a bisect to find out what changed
18:12:57 <JayF> dansmith: that's what's difficult: we've had extreme difficulty reproducing, even one-off, locally
18:13:07 <gmann> JayF: yeah DEFAULT_IMAGE_NAME change broke ironic and octavia and may be more. but it is reverted now
18:13:21 <JayF> dansmith: we've had two contributors get it reproduced, made a fix, stopped reproducing it locally, but it still breaks in the gate
18:13:23 <dansmith> JayF: well, you can bisect in the gate, depending on how reliable it is there, but okay
18:13:29 <JayF> it's not reliable there
18:13:34 <dansmith> JayF: any chance it's in opportunistic tests that use actual mysql?
18:13:43 <JayF> It is 100% in those tests, in our DB migration tests
18:14:00 <JayF> and the failures seem to correlate, roughly, with when more jobs are running
18:14:16 <dansmith> yeah, but they're not sharing a mysql so that shouldn't really matter
18:14:20 <JayF> so we've been bamboozled a couple of times where we thought we had them fixed b/c rechecks passed over a weekend, but then the failures restarted
18:14:22 <dansmith> unless it's a noisy neighbor race or something
18:14:32 <JayF> I'm thinking exactly ^ that, noisy neighbor race
18:14:35 <dansmith> anyway, need not debug it here
18:14:49 <JayF> yeah, but it's been painful and if someone wants to help us debug it anywhere, I think we'd take any help we can get :D
18:15:00 <JayF> (although I'm looking away from it today to retain sanity)
18:15:21 <JayF> so to summarize, sounds like gate health is probably a bit lower than usual, but we are all working together to restore more stability (and maybe are arriving there?)
18:15:55 <JayF> Any other comments on the Gate, and Gate Health, before moving on?
18:16:17 <gmann> due to working on other issues, I have not debugged the timeouts or why they've increased since last week
18:16:30 <rosmaita> if it's a noisy neighbor thing, maybe move the DB tests to a periodic job?
18:16:35 <gmann> but this is something more critical as we move closer to release
18:17:00 <JayF> rosmaita: that's a good idea if we finally give up on it, but I don't think we're quite at that point (perhaps)
18:17:02 <dansmith> rosmaita: not a good idea for migration tests
18:17:13 <dansmith> because you want to know if those are bad definitely before they land
18:17:20 <gmann> ++
18:17:29 <rosmaita> well, how many patches have migration test changes?
18:17:35 <dansmith> especially if they work on sqlite but not mysql due to typing differences
18:17:37 <fungi> periodic jobs all kick off at the same exact moment, so are more susceptible to that, not less
18:17:47 <rosmaita> ok, i withdraw my suggestion
18:18:10 <fungi> you should look at our node usage graphs some time, you can see the periodic daily and weekly spikes dominate
18:18:12 <clarkb> but also it looks like a locking issue? yes, noisy neighbors may make it worse, but if you have a locking issue you don't want to paper over that
18:18:18 <JayF> I'll be looking at this again tomorrow morning, PDT, in #openstack-ironic; if folks wanna take a look, please do :) I might document the path we've been down in an etherpad
18:18:25 <clarkb> you want to reorder operations so that you cannot deadlock regardless of cpu/io/etc demand
18:18:51 <dansmith> clarkb: right agree
18:18:57 <dansmith> and being in real mysql,
18:19:10 <dansmith> I would think you have more tools for examining the deadlock if it's something on the transaction side
18:19:17 <dansmith> maybe we need more logs or log level or something if that's the case
18:19:34 <dansmith> and if it's not (i.e. it's something in ironic code itself) examining what all has changed lately with a critical eye
18:20:04 <JayF> I am incredibly appreciative of the suggestions and all, but we should probably keep the deep discussion of how to fix it out of the TC meeting
18:20:18 <JayF> Even if I'm tempted to use all the brains assembled :D
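[editor's note: clarkb's point above — reorder lock acquisition so that no CPU/IO demand pattern can deadlock — can be sketched as a hypothetical Python illustration. The names and the threading-based setup are assumptions for illustration only, not Ironic or test-suite code:]

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def do_work(first, second, results, label):
    # Acquire locks in one global total order (here, sorted by id();
    # any stable ordering works). Because every thread acquires in the
    # same order, no interleaving can deadlock, regardless of timing.
    ordered = sorted((first, second), key=id)
    for lock in ordered:
        lock.acquire()
    try:
        results.append(label)
    finally:
        # Release in reverse acquisition order.
        for lock in reversed(ordered):
            lock.release()

results = []
# The two threads pass the locks in *opposite* orders -- the classic
# deadlock setup -- but the sort normalizes the acquisition order.
t1 = threading.Thread(target=do_work, args=(lock_a, lock_b, results, "t1"))
t2 = threading.Thread(target=do_work, args=(lock_b, lock_a, results, "t2"))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(results))  # both threads complete: ['t1', 't2']
```

Without the sort, t1 holding lock_a while t2 holds lock_b can wait on each other forever, and the hang only appears under load (the "noisy neighbor" symptom) even though the bug is in the ordering itself.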
18:20:36 <JayF> #topic Open Discussion & Reviews
18:20:54 <JayF> We have a couple of active open reviews at https://review.opendev.org/q/projects:openstack/governance+is:open
18:21:02 <JayF> Please review, vote, and provide feedback as appropriate.
18:21:12 <JayF> Is there anything for Open Discussion?
18:23:07 <JayF> Last call?
18:23:28 <gmann> nothing else from me
18:23:41 <JayF> Thank you all for attending the TC meeting, our next meeting is on July 4th, which is a US holiday.
18:23:56 <gmann> oh, we want to cancel that
18:24:02 <JayF> Yeah, that's what I was thinking.
18:24:08 <dansmith> cancel or don't but I shan't be here :)
18:24:12 <JayF> same :)
18:24:25 <gmann> I will be here but ok to cancel
18:24:32 <slaweq> +1 for cancelling next week's meeting
18:24:47 <rosmaita> +1 from me too
18:24:56 <JayF> Barring any objection, I consider this unanimous consent to cancel the July 4 meeting.
18:25:08 <JayF> In which case, our next meeting is July 11 and is scheduled to be chaired by knikolla
18:25:11 <JayF> thank you all for attending
18:25:12 <JayF> #endmeeting