18:00:22 #startmeeting tc
18:00:22 Meeting started Tue Jun 27 18:00:22 2023 UTC and is due to finish in 60 minutes. The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:22 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:00:22 The meeting name has been set to 'tc'
18:00:33 #topic Roll Call
18:00:39 o/
18:00:39 o/
18:00:39 o/
18:00:40 o/
18:00:47 \o semi around
18:00:58 Hello everyone, welcome to the June 27th meeting of the TC. I'm your substitute chair for the meeting as knikolla is not feeling well.
18:00:58 o/
18:01:10 o/
18:01:21 A reminder that this meeting is held under the OpenInfra Code of Conduct, available at https://openinfra.dev/legal/code-of-conduct
18:01:53 Thanks everyone for coming. We clearly have quorum, so I'm going to get going.
18:02:04 #topic Follow up on past action items
18:02:35 #note JayF followed up on the action item to capture discussion related to the SQLAlchemy 2.0 migration
18:02:42 #link https://review.opendev.org/c/openstack/governance/+/887083
18:03:03 I'd love to get feedback on that, and thanks to gmann for already participating so we got a revision done before the meeting :)
18:03:14 perfect, thanks, I will check
18:03:25 If there are any comments on that in IRC now I'll hear them; otherwise I'll be responsive to reviews in gerrit.
18:03:58 the other action item was for Kristi, who isn't here today
18:04:11 #action knikolla will write alternative proposals for Extended Maintenance
18:04:25 A few grammar things I see off the top of my head; I'll comment in gerrit
18:04:29 I'm carrying that action over to the next meeting, but maybe we'll have some input for him, given what the next topic is
18:04:30 i haven't seen any feedback about EM on the mailing list ... did i miss it?
18:04:39 #topic Extended Maintenance
18:04:49 rosmaita: I don't think so, I'm also looking for it
18:05:03 yeah, no discussion on the ML yet
18:05:08 ok, thanks for the confirmation
18:06:04 Extended Maintenance is on the agenda as a topic, but I'm unsure what we are looking for.
18:06:39 in the last meeting, we talked about discussing it on the ML before the TC decides/takes any next step
18:06:59 and that is also where the forum sessions ended up: continue the discussion on the ML
18:07:03 I'll take an action item to create a discussion thread on the ML about it then, if folks are amenable to that?
18:07:19 I know I'm one of the people who thought longer-form comms would be helpful for this, so I can take the action to try and kick it off.
18:07:25 Any objection or other volunteers? :)
18:07:36 go for it :)
18:07:36 I thought knikolla was going to do that?
18:07:39 isn't that the action item?
18:07:49 His action was listed as writing alternate proposals for Extended Maintenance
18:08:02 I was under the impression that'd be in the form of a gerrit review to governance, not a mailing list discussion
18:08:12 I was assuming the opposite
18:08:37 You are correct, looking at the logs.
18:08:40 I think email is also what he agreed to, instead of writing a governance change for now?
18:08:53 I'll reach out to knikolla tomorrow; if he wants to continue the action he can, and I'll support in whatever way I can.
18:09:03 ++
18:09:08 email first makes sense, to find out what form the governance proposal should take
18:09:23 yeah
18:09:46 Anything else on the Extended Maintenance topic before we move along?
18:10:28 nothing else from me
18:10:50 #topic Gate Health Check
18:11:10 it's been very unstable over the last week and up to now, I will say
18:11:15 yeah
18:11:24 I saw many timed-out jobs in the last few days/weeks
18:11:29 that new test and its 100% failure rate on ceph caught us off guard
18:11:29 I am seeing a lot of timeouts in the tempest slow job and other jobs too
18:11:30 Ironic has been fighting, for at least two weeks now, an issue where our DB migration unit tests hang in the gate. We've been working to fix it, and the incidence is somewhat reduced, but it's been extremely painful.
18:11:38 is it like that in other projects too, or is it something neutron-related?
18:11:47 yeah, the rebuild test was failing 100% on ceph; it is skipped now to unblock the gate
18:12:16 but timeouts are a lot more common nowadays; not sure what changed
18:12:32 There was also a breaking change in Ironic tests due to a tempest image naming change (I'm not sure if we were misusing it or the API changed, but it's fixed now; ty gmann and others who worked on it)
18:12:40 JayF: that sounds like maybe a good opportunity for a bisect to find out what changed
18:12:57 dansmith: that's what's difficult: we've had extreme difficulty reproducing it, even one-off, locally
18:13:07 JayF: yeah, the DEFAULT_IMAGE_NAME change broke ironic and octavia, and maybe more, but it is reverted now
18:13:21 dansmith: we've had two contributors reproduce it, make a fix, and stop reproducing it locally, but it still breaks in the gate
18:13:23 JayF: well, you can bisect in the gate, depending on how reliable it is there, but okay
18:13:29 it's not reliable there
18:13:34 JayF: any chance it's in the opportunistic tests that use actual mysql?
18:13:43 It is 100% in those tests, in our DB migration tests
18:14:00 and the failures seem to correlate, roughly, with when more jobs are running
18:14:16 yeah, but they're not sharing a mysql, so that shouldn't really matter
18:14:20 so we've been bamboozled a couple of times where we thought we had them fixed b/c rechecks passed over a weekend, but then the failures restarted
18:14:22 unless it's a noisy neighbor race or something
18:14:32 I'm thinking exactly ^ that, a noisy neighbor race
18:14:35 anyway, no need to debug it here
18:14:49 yeah, but it's been painful, and if someone wants to help us debug it anywhere, I think we'd take any help we can get :D
18:15:00 (although I'm looking away from it today to retain sanity)
18:15:21 so to summarize, it sounds like gate health is probably a bit lower than usual, but we are all working together to restore more stability (and maybe we are getting there?)
18:15:55 Any other comments on the gate and gate health before moving on?
18:16:17 due to working on other issues, i have not debugged the timeouts or why they have increased since last week
18:16:30 if it's a noisy neighbor thing, maybe move the DB tests to a periodic job?
18:16:35 but this is more critical as we are moving close to release
18:17:00 rosmaita: that's an incredibly good idea for if we finally give up on it, but I don't think we're quite to that point (perhaps)
18:17:02 rosmaita: not a good idea for migration tests
18:17:13 because you definitely want to know if those are bad before they land
18:17:20 ++
18:17:29 well, how many patches have migration test changes?
18:17:35 especially if they work on sqlite but not mysql due to typing differences
18:17:37 periodic jobs all kick off at the same exact moment, so they are more susceptible to that, not less
18:17:47 ok, i withdraw my suggestion
18:18:10 you should look at our node usage graphs sometime; you can see the periodic daily and weekly spikes dominate
18:18:12 but it also looks like a locking issue? yes, noisy neighbors may make it worse, but if you have a locking issue you don't want to paper over that
18:18:18 I'll be looking at this again tomorrow morning, PDT, in #openstack-ironic; if folks wanna take a look, please do :) I might document the path we've been down on an etherpad
18:18:25 you want to reorder operations so that you cannot deadlock regardless of cpu/io/etc demand
18:18:51 clarkb: right, agreed
18:18:57 and being in real mysql,
18:19:10 I would think you have more tools for examining the deadlock if it's something on the transaction side
18:19:17 maybe we need more logs or a higher log level or something if that's the case
18:19:34 and if it's not (i.e. it's something in ironic code itself), examine everything that has changed lately with a critical eye
18:20:04 I am incredibly appreciative of the suggestions and all, but we should probably keep the deep discussion of how to fix it out of the TC meeting
18:20:18 Even if I'm tempted to use all the brains assembled :D
18:20:36 #topic Open Discussion & Reviews
18:20:54 We have a couple of active open reviews at https://review.opendev.org/q/projects:openstack/governance+is:open
18:21:02 Please review, vote, and provide feedback as appropriate.
18:21:12 Is there anything for Open Discussion?
18:23:07 Last call?
18:23:28 nothing else from me
18:23:41 Thank you all for attending the TC meeting; our next meeting is on July 4th, which is a US holiday.
18:23:56 oh, we want to cancel that
18:24:02 Yeah, that's what I was thinking.
18:24:08 cancel or don't, but I shan't be here :)
18:24:12 same :)
18:24:25 I will be here, but OK to cancel
18:24:32 +1 for cancelling next week's meeting
18:24:47 +1 from me too
18:24:56 Barring any objection, I consider this unanimous consent to cancel the July 4 meeting.
18:25:08 In which case, our next meeting is July 11 and is scheduled to be chaired by knikolla
18:25:11 thank you all for attending
18:25:12 #endmeeting