16:00:17 <gmann> #startmeeting tc
16:00:17 <opendevmeet> Meeting started Wed Feb  8 16:00:17 2023 UTC and is due to finish in 60 minutes.  The chair is gmann. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:17 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:17 <opendevmeet> The meeting name has been set to 'tc'
16:00:21 <gmann> #topic Roll call
16:00:24 <dansmith> o/
16:00:25 <gmann> o/
16:00:46 <slaweq> o/
16:01:02 <knikolla[m]> o/
16:01:15 <noonedeadpunk> o/
16:02:17 <rosmaita> o/
16:02:19 <gmann> let's wait for a couple of minutes; meanwhile, this is today's agenda #link https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting
16:02:30 <spotz> o/
16:04:03 <gmann> let's start
16:04:05 <gmann> #topic Follow up on past action items
16:04:11 <gmann> JayF to write document on how to remove pypi maintainer/owner access and a draft email for PTLs to use when asking former-contributors to give up their access.
16:04:37 <gmann> JayF has started working on this, we will talk about it in a separate topic coming up next
16:04:49 <gmann> #topic Gate health check
16:05:03 <gmann> any better news with gate :) ?
16:05:23 <gmann> we are seeing a lot of timeouts in various tempest jobs nowadays
16:05:30 <dansmith> lots of timeouts :(
16:05:32 <gmann> which is blocking things from getting merged
16:05:47 <dansmith> *still* trying to get my chunked get patch merged, but it keeps getting nailed
16:05:49 <slaweq> I noticed at the beginning of the week some failures due to failed downloads of ubuntu images
16:05:54 <noonedeadpunk> and timeouts seem to be related just to some very slow/overloaded provider
16:05:54 <gmann> yeah
16:06:18 <gmann> noonedeadpunk: not sure if that is the only reason
16:06:22 <slaweq> yeah, timeouts are also something I have seen often recently
16:06:23 <dansmith> we're also seeing an occasional failure with the glance test that tries an external http download of cirros, so I have a patch up to make that retry a few times
16:06:45 <gmann> yes, I need to review that. will do today
16:06:55 <noonedeadpunk> Well, for our jobs, they do time out in the middle of execution, on tasks that were not hanging for too long
16:07:01 <dansmith> it hasn't gotten a clean test yet, but it is passing the tests it's supposed to of course
16:07:12 <noonedeadpunk> So it looked like execution is slow overall
16:07:12 <dansmith> noonedeadpunk: same
16:07:24 <gmann> noonedeadpunk: ohk
16:08:03 <dansmith> I've also seen timeouts in older release jobs,
16:08:16 <dansmith> so I don't think it's anything like keystone or nova suddenly do a sleep(1) in every request or anything
16:08:37 <dansmith> i.e. a performance regression in one of the release-specific services that is causing it to go slower
16:08:53 <noonedeadpunk> no-no, the jobs that are passing do have proper execution time
16:09:19 <gmann> yes, I also did not see any increase in time for passing jobs
16:09:23 <noonedeadpunk> and timeouts are quite random across scenario/OS
16:09:49 <dansmith> are the passing jobs just under the job timeout? or do the timed out jobs end up running way longer for some reason?
16:10:17 <gmann> what I observed is that passing jobs are taking the time they used to in previous releases or so
16:10:36 <gmann> at least no noticeable increase in their time
16:10:42 <noonedeadpunk> so my bet would be on some overloaded provider. I think it should be possible to identify which provider that is, but I'm not sure we have spare capacity for nodepool to just disable it...
16:11:18 <dansmith> gmann: okay but it seems to me like they're fairly close to the timeout right?
16:11:45 <dansmith> like a full-xena run that did not time out ran in 1:55, but a full-zed run that did time out was at 2:05
16:12:25 <dansmith> noonedeadpunk: if nothing is wrong and it's just a slow provider, then perhaps a different job timeout for that provider, or we split up some jobs so they run in less time
16:12:29 <gmann> but if you look at other passing runs, tempest-full-py3 takes ~1:35 many times
16:12:56 <dansmith> gmann: on that same patch I was looking at full-py3 took 1:57
16:12:58 <dansmith> but okay
16:13:01 <noonedeadpunk> dansmith: well, we have the job timeout set to 3h, and a job that takes on average 1:40-2h does time out
16:13:08 <gmann> yeah not always.
16:13:28 <slaweq> noonedeadpunk I think it's the same in neutron check queue as well
16:13:29 <dansmith> is it set at 3h or 2h? because it seems like 2h is the threshold from what I'm seeing
16:13:40 <slaweq> but I don't have links to any jobs now
16:14:04 <fungi> noonedeadpunk: zuul logs the provider id if the builds have logs available. if not, i should be able to look it up from the executor's service logs
16:14:05 <gmann> dansmith: tempest multinode/slow jobs are 3 hr and the rest are 2
16:14:07 <noonedeadpunk> dansmith: in osa we have 3h
16:14:20 <dansmith> gmann: okay
16:14:39 <noonedeadpunk> fungi: yup, logs are available - they don't reach the post timeout
16:14:58 <fungi> in that case, look in the zuul info log files
16:15:17 <fungi> you may also be able to correlate it with dpawlik's opensearch service
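A minimal sketch of where that provider id shows up in a build's uploaded logs, assuming the usual zuul-info/inventory.yaml artifact; the node name and values below are illustrative and the exact fields can vary per deployment:

    # zuul-info/inventory.yaml (excerpt, illustrative values only)
    all:
      hosts:
        controller:
          nodepool:
            label: ubuntu-jammy
            provider: rax-iad   # nodepool provider the node came from
            region: IAD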
16:15:26 <gmann> rax-iad is a recent provider where I am seeing the tempest-slow job take 3 hr when it used to take 2 hr https://zuul.opendev.org/t/openstack/build/5f5f48160b0049d99153d0d60909cee7/log/job-output.txt#49
16:15:41 <dansmith> gmann: I guess my point is.. it seems like some legit full runs are dangerously close to the 2h mark, so either a slight performance regression, or a few extra tests could be causing all the jobs on our slowest provider to tip over that mark lately
16:16:38 <gmann> dansmith: yeah, agree with that. I have started the test split here: #link https://review.opendev.org/c/openstack/tempest/+/873055
16:17:04 <gmann> it needs some more work; I am checking some projects' gate coverage of those extra tests
16:17:10 <dansmith> gmann: ack
16:17:34 <gmann> anyway, let's monitor the providers and see if we can identify the one making things slow
16:18:27 <gmann> anything else on gate ?
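For context on the 2h/3h limits discussed above, a minimal sketch of how a Zuul job timeout is set or overridden, assuming a hypothetical project-side variant of tempest-full-py3; the variant name and value are illustrative only, not a proposed change:

    # .zuul.yaml (sketch; the variant name is hypothetical)
    - job:
        name: tempest-full-py3-longer-timeout
        parent: tempest-full-py3
        timeout: 10800   # seconds; 3h instead of the usual 2h (7200)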
16:18:59 <gmann> #topic Cleanup of PyPI maintainer list for OpenStack Projects
16:19:07 <gmann> Etherpad for audit and cleanup of additional PyPi maintainers
16:19:09 <gmann> #link https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup
16:19:14 <gmann> ML discussion
16:19:22 <gmann> #link https://lists.openstack.org/pipermail/openstack-discuss/2023-January/031848.html
16:19:48 <gmann> as discussed in the last meeting, JayF is working on the email template and the steps to remove access / give access to openstackci
16:20:10 <gmann> he has prepared the email template for PTLs to ask additional maintainers to do the required steps #link https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup-email-template
16:20:28 <gmann> adding PyPi steps also here #link https://review.opendev.org/c/opendev/infra-manual/+/873033
16:20:40 <gmann> please check and add any feedback in the etherpad
16:20:57 <JayF> I will respond to feedback on that PR/etherpad this afternoon.
16:21:10 <gmann> +1
16:21:43 <gmann> another thing is to explicitly add the PyPi process to governance as policy
16:22:04 <gmann> noonedeadpunk: if I remember correctly you wanted to add it in the documentation?
16:22:24 <gmann> that will help to clearly document the process for PyPi access
16:22:44 <noonedeadpunk> Yes, I didn't manage to push the patch yet as I took a week of vacation
16:23:07 <noonedeadpunk> But I was planning to do that during this week
16:23:27 <gmann> perfect, let me add an action item just to track it
16:23:47 <knikolla[m]> great!
16:23:48 <gmann> #action noonedeadpunk to add PyPi access policy in governance documentation
16:24:03 <noonedeadpunk> +1
16:24:30 <gmann> JayF: as you are on top of your email template and steps documentation, do you want me to continue your action item for next meeting?
16:24:45 <JayF> It'd be ideal for me if we can action those before next meeting
16:24:59 <JayF> e.g. get general agreement on email and if the change merges, we can update actions for PTLs
16:25:03 <gmann> JayF: also, please add this email template and the step of sending it to the main etherpad, in step 1 #link https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup
16:25:18 <JayF> I'm waiting to do that until folks here give a general agreement to the etherpad :)
16:25:23 <gmann> sure
16:25:26 <JayF> if it's OK for me to do that; I'm going to do it :D
16:25:27 <gmann> JayF to write document on how to remove pypi maintainer/owner access and a draft email for PTLs to use when asking former-contributors to give up their access.
16:25:41 <gmann> thanks JayF for working on that
16:25:57 <gmann> anything else on PyPi things?
16:27:10 <gmann> #topic Recurring tasks check
16:27:12 <gmann> Bare 'recheck' state
16:27:22 <gmann> #link https://etherpad.opendev.org/p/recheck-weekly-summary
16:27:30 <gmann> slaweq please go ahead
16:28:13 <slaweq> nothing really new there
16:28:28 <slaweq> I updated stats, generally gates aren't very stable as we discussed already
16:28:36 <gmann> yeah
16:28:42 <slaweq> and it is visible in the number of rechecks needed to merge patches
16:28:54 <slaweq> nothing else except that
16:29:02 <gmann> ok. thanks for updating
16:29:14 <gmann> #topic Open Reviews
16:29:16 <gmann> #link https://review.opendev.org/q/projects:openstack/governance+is:open
16:29:26 <gmann> 6 open reviews out of which 3 are ready to review
16:29:34 <gmann> #link https://review.opendev.org/c/openstack/governance/+/872232
16:29:42 <gmann> #link https://review.opendev.org/c/openstack/governance/+/872233
16:29:53 <gmann> this is a new one #link https://review.opendev.org/c/openstack/governance/+/872769
16:30:04 <gmann> slaweq I will check it today, I have it open in a tab
16:30:24 <gmann> others are waiting for their dependency to merge first
16:30:26 <slaweq> thx
16:30:47 <gmann> that is all on open reviews
16:30:48 <gmann> two more things from my side
16:31:32 <gmann> 1. elections: nominations are open. TC members whose terms are completing in this election and who are thinking of re-running, please check the nomination deadline
16:32:24 <gmann> also encourage other members to run, whether for TC or for PTL of projects you know. the deadline for nomination is Feb 15, 2023 23:45 UTC
16:32:56 <slaweq> ++
16:33:09 <knikolla[m]> time flies
16:33:15 <gmann> yeah :)
16:33:18 <gmann> 2. Open Infra Board sync-up call today at 20 UTC; details are mentioned in #link https://etherpad.opendev.org/p/2023-02-board-openstack-sync
16:33:49 <gmann> which is a zoom call 3 hr 30 min from now
16:34:03 <gmann> please plan to attend, or feel free to add topics to be discussed in the etherpad
16:34:37 <gmann> that is all from the agenda and me for today. we have ~26 min left if anyone has anything else to discuss?
16:34:47 <slaweq> it's pretty late for me but I will try to be there
16:34:54 <gmann> ack
16:35:08 <noonedeadpunk> btw, should we somehow start tracking how projects are progressing with adding upgrade jobs for N-2?
16:36:08 <gmann> noonedeadpunk: good point, we do have the grenade-skip job which we will update on release, and if I am not wrong it will be added to the integrated gate template
16:36:49 <gmann> but that covers only 4-5 projects, and manila does it too. but I agree it will be good to ask other projects to do so and to track that
16:37:42 <gmann> ~16 projects have the grenade plugin and should add the skip-level testing #link https://docs.openstack.org/grenade/latest/plugin-registry.html
16:37:53 <noonedeadpunk> As while it's a matter of practice for SLURP upgrades, I think we'd better have things and the mindset in place sooner rather than later
16:38:10 <gmann> yeah
16:38:30 <dansmith> gmann: how many have it voting?
16:39:00 <gmann> dansmith: good question. I have not checked that, but yes, many of them might be non-voting
16:39:08 <dansmith> AFAIK, pretty much everyone should have it voting in current master right?
16:39:15 <gmann> yes
16:39:39 <slaweq> You mean skip-level job voting?
16:39:44 <dansmith> yes
16:39:44 <noonedeadpunk> But does everyone have grenade-skip in some default template?
16:40:03 <gmann> noonedeadpunk: not for this cycle as this is non-SLURP
16:40:14 <dansmith> wait what?
16:40:15 <gmann> those run only for SLURP releases.
16:40:20 <slaweq> I thought we agreed to have them non-voting for now as it will be just the first SLURP release
16:40:26 <dansmith> 2023.1 is a slurp no?
16:40:43 <slaweq> 2023.1 will be first slurp
16:40:45 <slaweq> yes
16:40:46 <rosmaita> yes, but in the forward direction
16:40:48 <noonedeadpunk> Well, it's the first SLURP, so it indeed can be non-voting
16:40:51 <gmann> dansmith: I mean the first immediate SLURP after 2023.1
16:40:55 <noonedeadpunk> But already running at least?
16:40:56 <dansmith> oh you mean because it's the first, right right
16:41:11 <gmann> in 2023.1 it is non voting yes as we discussed, but in 2023.3 it should be voting
16:41:30 <dansmith> ack, but opt-in to voting for 2023.1 would be good if the project can swing it
16:41:31 <gmann> 2024.1
16:41:32 <slaweq> so in neutron we have neutron-ovs-grenade-multinode-skip-level and neutron-ovn-grenade-multinode-skip-level jobs but those are non-voting for now
16:41:36 <gmann> in 2023.2 it will not run. in 2024.1 it should be voting
16:42:10 <dansmith> yep, I keep thinking this is the first upgradeable-to but it's really the first upgradeable-from...I'm on the page now :)
16:42:10 <noonedeadpunk> I don't see them in heat at least https://opendev.org/openstack/heat/src/branch/master/.zuul.yaml#L185-L202
16:42:18 <gmann> I think we have updated our PTI document for that too, but not every project knows that
16:42:20 <noonedeadpunk> Unless it's part of some template
16:42:52 <noonedeadpunk> dansmith: I think it's unofficially upgradeable-to as well, as a matter of practice?
16:43:02 <gmann> noonedeadpunk: if I remember, except manila and the grenade in-tree projects, I do not think anyone else has started
16:43:25 <dansmith> noonedeadpunk: just not required yeah
16:43:53 <gmann> so this is for 2024.1 and something we can start in the next cycle: asking projects to prepare those jobs and run them
16:43:58 <noonedeadpunk> ok, maybe it's worth writing an ML post to encourage projects to do so?
16:44:06 <gmann> +1
16:44:07 <noonedeadpunk> ok
16:44:17 <gmann> no harm to do that
16:44:24 <dansmith> yeah, running it for everyone, encouraging projects to opt-in to voting now would be best, IMHO
16:44:32 <gmann> sure
16:44:37 <dansmith> meaning non-voting in the template
16:45:13 <gmann> let me put the data in an etherpad about what all projects need to add, and then I can send it on the ML to ask projects
16:45:28 <slaweq> maybe we should talk with the release team so they can add info about it in their weekly release emails? To remind projects e.g. to remove those jobs from the gate in the 2023.2 cycle
16:46:03 <gmann> #action gmann to prepare etherpad for grenade skip upgrade job data and send email asking required projects to add job
16:46:45 <gmann> slaweq: that also works, and I can add it in my weekly summary too
16:46:59 <gmann> once 2023.2 starts
16:47:01 <slaweq> gmann++ thx
16:47:19 <gmann> noonedeadpunk: thanks, good reminder.
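A minimal sketch of what opting a project into the skip-level upgrade testing discussed above could look like in its .zuul.yaml, assuming the grenade-skip-level job from the grenade repo (projects may instead carry their own variant, like the neutron jobs mentioned earlier), with voting left off for now per the discussion:

    # .zuul.yaml (sketch)
    - project:
        check:
          jobs:
            - grenade-skip-level:
                voting: false   # non-voting for 2023.1; switch to voting for 2024.1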
16:47:33 <gmann> anything else for today?
16:47:45 <spotz> not from me
16:48:32 <gmann> ok, let's close then. thanks everyone for joining.
16:48:36 <gmann> #endmeeting