Tuesday, 2023-06-27

10:43 *** tobias-urdin is now known as tobias-urdin-pto
13:26 *** blarnath is now known as d34dh0r53
15:54 <knikolla> tc-members: I’m not feeling well and will be missing today’s meeting. JayF agreed to run the meeting in my place. Thank you!
15:54 <JayF> knikolla: that also means I can tell them that an hour before so you can rest :)
15:55 <jamespage> knikolla: sorry to hear that, hope you feel better soon
16:28 <JayF> I'm writing up a draft resolution around sqlalchemy; the timeline for the migration needs to be during/early in the D (2024.2) cycle, right? Because we can't bump the SQLA version for C due to SLURP?
16:36 <clarkb> I don't think SLURP means you cannot update dependencies
16:36 <clarkb> it means that you can upgrade openstack and its dependencies while skipping an intermediate update
16:38 <JayF> yeah, I found https://etherpad.opendev.org/p/tc-leaders-interaction-2023-vancouver and we put the details in
16:38 <JayF> thank goodness I wrote some of this down; etherpad has a better memory than I do :D
16:38 <JayF> s/I/we/
16:53 <opendevreview> Jay Faulkner proposed openstack/governance master: TC resolution: migrate to sqlalchemy2  https://review.opendev.org/c/openstack/governance/+/887083
16:56 <JayF> tc-members: reminder, meeting in 64 minutes
17:08 <noonedeadpunk> sorry, I have quite a spontaneous and valuable errand, so will be semi-around (from phone)
17:08 <JayF> just text me where you struck gold, I'll bring a few pickaxes ;)
17:08 <JayF> sorry to miss you but thanks for the heads up
17:57 <opendevreview> Jay Faulkner proposed openstack/governance master: TC resolution: migrate to sqlalchemy2  https://review.opendev.org/c/openstack/governance/+/887083
17:59 <JayF> tc-members: almost meeting time!
18:00 <JayF> #startmeeting tc
18:00 <opendevmeet> Meeting started Tue Jun 27 18:00:22 2023 UTC and is due to finish in 60 minutes.  The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:00 <opendevmeet> The meeting name has been set to 'tc'
18:00 <JayF> #topic Roll Call
18:00 <rosmaita> o/
18:00 <gmann> o/
18:00 <jamespage> o/
18:00 <spotz[m]> o/
18:00 <noonedeadpunk> \o semi around
18:00 <JayF> Hello everyone, welcome to the June 27th meeting of the TC. I'm your substitute chair for the meeting as knikolla is not feeling well.
18:00 <dansmith> o/
18:01 <slaweq> o/
18:01 <JayF> A reminder that this meeting is held under the OpenInfra Code of Conduct available at https://openinfra.dev/legal/code-of-conduct
18:01 <JayF> Thanks everyone for coming, we obviously have quorum so I'm going to get going
18:02 <JayF> #topic Follow up on past action items
18:02 <JayF> #note JayF followed up on action item to capture discussion related to SQLAlchemy2.0 migration
18:02 <JayF> #link https://review.opendev.org/c/openstack/governance/+/887083
18:03 <JayF> I'd love to get feedback on that, and thanks to gmann for already participating so we got a revision done before the meeting :)
18:03 <gmann> perfect, thanks, I will check
18:03 <JayF> If there are any comments on that in IRC now I'll hear them, otherwise I'll be responsive to reviews in gerrit.
18:03 <JayF> the other action item was for Kristi, who isn't here today
18:04 <JayF> #action knikolla will write alternative proposals for Extended Maintenance
18:04 <spotz[m]> A few grammar things I see off the top of my head, I'll comment in gerrit
18:04 <JayF> I'm carrying that action over to next meeting, but maybe we'll have some input for him, as that is the next topic
18:04 <rosmaita> i haven't seen any feedback about EM on the mailing list ... did i miss it?
18:04 <JayF> #topic Extended Maintenance
18:04 <dansmith> rosmaita: I don't think so, I'm also looking for it
18:05 <gmann> yeah, no discussion on ML yet
18:05 <rosmaita> ok, thanks for the confirmation
18:06 <JayF> Extended maintenance is on the agenda as a topic, but I'm unsure what we are looking for.
18:06 <gmann> in the last meeting, we talked about discussing it on the ML before the TC decides/takes any next step
18:06 <gmann> and that is also where the forum sessions ended up: continue the discussion on the ML
18:07 <JayF> I'll take an action item to create a discussion thread on the ML about it then, if folks are amenable to that?
18:07 <JayF> I know I'm one of the people who thought longer form comms would be helpful for this, so I can take the action to try and kick it off.
18:07 <JayF> Any objection or other volunteers? :)
18:07 <spotz[m]> go for it :)
18:07 <dansmith> I thought knikolla was going to do that?
18:07 <dansmith> isn't that the action item?
18:07 <JayF> His action was listed as writing alternate proposals for Extended Maintenance
18:08 <JayF> I was under the impression that'd be in the form of a gerrit review to governance, not a mailing list discussion
18:08 <dansmith> I was assuming the opposite
18:08 <JayF> You are correct, looking at the logs.
18:08 <gmann> I think email is what he also agreed to instead of writing a governance change fore now?
18:08 <gmann> *for
18:08 <JayF> I'll reach out to knikolla tomorrow; if he wants to continue the action he can and I'll support in whatever way I can.
18:09 <gmann> ++
18:09 <rosmaita> email first makes sense, to find out what form the governance proposal should take
18:09 <gmann> yeah
18:09 <JayF> Anything else on the Extended Maintenance topic before we move along?
18:10 <gmann> nothing else from me
18:10 <JayF> #topic Gate Health Check
18:11 <gmann> it's been very unstable since last week and up to now, I would say
18:11 <dansmith> yeah
18:11 <slaweq> I saw many timed-out jobs in the last few days/weeks
18:11 <dansmith> that new test and its 100% failure rate on ceph caught us off guard
18:11 <gmann> I am seeing a lot of timeouts in tempest slow and other jobs too
18:11 <JayF> Ironic has been fighting, for at least two weeks now, an issue where our DB migration unit tests hang in the gate. We've been working to fix it, and the incidence is somewhat reduced, but it's been extremely painful.
18:11 <slaweq> is it like that in other projects too or is it something neutron related?
18:11 <gmann> yeah, the rebuild test is failing 100% on ceph; it is skipped now to unblock the gate
18:12 <gmann> but timeouts are a lot more frequent nowadays, not sure what changed
18:12 <JayF> There was also a breaking change in Ironic tests due to a tempest image naming change (I'm not sure if we were misusing it or the API changed; but it's fixed now, ty gmann and others who worked on it)
18:12 <dansmith> JayF: that sounds like maybe a good opportunity for a bisect to find out what changed
18:12 <JayF> dansmith: that's what's difficult: we've had extreme difficulty reproducing it, even one-off, locally
18:13 <gmann> JayF: yeah, the DEFAULT_IMAGE_NAME change broke ironic and octavia and maybe more, but it is reverted now
18:13 <JayF> dansmith: we've had two contributors get it reproduced and make a fix; it stopped reproducing locally, but it still breaks in the gate
18:13 <dansmith> JayF: well, you can bisect in the gate, depending on how reliable it is there, but okay
18:13 <JayF> it's not reliable there
18:13 <dansmith> JayF: any chance it's in opportunistic tests that use actual mysql?
18:13 <JayF> It is 100% in those tests, in our DB migration tests
18:14 <JayF> and the failures seem to correlate, roughly, with when more jobs are running
18:14 <dansmith> yeah, but they're not sharing a mysql so that shouldn't really matter
18:14 <JayF> so we've been bamboozled a couple of times where we thought we had them fixed b/c rechecks passed over a weekend, but then the failures restarted
18:14 <dansmith> unless it's a noisy neighbor race or something
18:14 <JayF> I'm thinking exactly ^ that, noisy neighbor race
18:14 <dansmith> anyway, need not debug it here
18:14 <JayF> yeah, but it's been painful and if someone wants to help us debug it anywhere, I think we'd take any help we can get :D
18:15 <JayF> (although I'm looking away from it today to retain sanity)
18:15 <JayF> so to summarize, it sounds like gate health is probably a bit lower than usual, but we are all working together to restore more stability (and maybe are arriving there?)
18:15 <JayF> Any other comments on the Gate, and Gate Health, before moving on?
18:16 <gmann> due to working on other issues, I have not debugged the timeouts or why they have been more frequent since last week
18:16 <rosmaita> if it's a noisy neighbor thing, maybe move the DB tests to a periodic job?
18:16 <gmann> but this is something more critical as we are moving close to release
18:17 <JayF> rosmaita: that's an incredibly good idea for if we finally give up on it, but I don't think we're quite at that point (perhaps)
18:17 <dansmith> rosmaita: not a good idea for migration tests
18:17 <dansmith> because you want to know definitively whether those are bad before they land
18:17 <gmann> ++
18:17 <rosmaita> well, how many patches have migration test changes?
18:17 <dansmith> especially if they work on sqlite but not mysql due to typing differences
18:17 <fungi> periodic jobs all kick off at the same exact moment, so are more susceptible to that, not less
18:17 <rosmaita> ok, i withdraw my suggestion
18:18 <fungi> you should look at our node usage graphs some time, you can see the periodic daily and weekly spikes dominate
18:18 <clarkb> but also it looks like a locking issue? yes, noisy neighbors may make it worse, but if you have a locking issue you don't want to paper over that
18:18 <JayF> I'll be looking at this again tomorrow morning, PDT, in #openstack-ironic; if folks wanna take a look please do :) I might document the path we've been down on an etherpad
18:18 <clarkb> you want to reorder operations so that you cannot deadlock regardless of cpu/io/etc demand
18:18 <dansmith> clarkb: right, agree
18:18 <dansmith> and being in real mysql,
18:19 <dansmith> I would think you have more tools for examining the deadlock if it's something on the transaction side
18:19 <dansmith> maybe we need more logs or a higher log level or something if that's the case
18:19 <dansmith> and if it's not (i.e. it's something in ironic code itself) examining what all has changed lately with a critical eye
18:20 <JayF> I am incredibly appreciative of the suggestions and all, but we should probably keep the deep discussion of how to fix it out of the TC meeting
18:20 <JayF> Even if I'm tempted to use all the brains assembled :D
18:20 <JayF> #topic Open Discussion & Reviews
18:20 <JayF> We have a couple of active open reviews at https://review.opendev.org/q/projects:openstack/governance+is:open
18:21 <JayF> Please review, vote, and provide feedback as appropriate.
18:21 <JayF> Is there anything for Open Discussion?
18:23 <JayF> Last call?
18:23 <gmann> nothing else from me
18:23 <JayF> Thank you all for attending the TC meeting, our next meeting is on July 4th, which is a US holiday.
18:23 <gmann> oh, we want to cancel that
18:24 <JayF> Yeah, that's what I was thinking.
18:24 <dansmith> cancel or don't but I shan't be here :)
18:24 <JayF> same :)
18:24 <gmann> I will be here but ok to cancel
18:24 <slaweq> +1 for cancelling next week's meeting
18:24 <rosmaita> +1 from me too
18:24 <JayF> Barring any objection, I consider this unanimous consent to cancel the July 4 meeting.
18:25 <JayF> In which case, our next meeting is July 11 and is scheduled to be chaired by knikolla
18:25 <JayF> thank you all for attending
18:25 <JayF> #endmeeting
18:25 <opendevmeet> Meeting ended Tue Jun 27 18:25:12 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
18:25 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/tc/2023/tc.2023-06-27-18.00.html
18:25 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/tc/2023/tc.2023-06-27-18.00.txt
18:25 <opendevmeet> Log:            https://meetings.opendev.org/meetings/tc/2023/tc.2023-06-27-18.00.log.html
18:25 <slaweq> o/
18:25 <gmann> thanks
18:25 <spotz[m]> Hey all, don't forget the election patch https://review.opendev.org/c/openstack/election/+/886653
18:26 <JayF> ack; I opened it, will look now
18:27 <gmann> spotz[m]: just checking again in case you missed it, do you want to be an election official too, or TC liaison only?
18:27 <gmann> if election official, it will be good to add you to the group to speed things up
19:12 <spotz[m]> Let me double check the dates as I'll be on PTO in August
