16:00:17 #startmeeting tc
16:00:17 Meeting started Wed Feb 8 16:00:17 2023 UTC and is due to finish in 60 minutes. The chair is gmann. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:17 The meeting name has been set to 'tc'
16:00:21 #topic Roll call
16:00:24 o/
16:00:25 o/
16:00:46 o/
16:01:02 o/
16:01:15 o/
16:02:17 o/
16:02:19 let's wait for a couple of minutes; meanwhile, this is today's agenda #link https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting
16:02:30 o/
16:04:03 let's start
16:04:05 #topic Follow up on past action items
16:04:11 JayF to write document on how to remove pypi maintainer/owner access and a draft email for PTLs to use when asking former-contributors to give up their access.
16:04:37 JayF has started working on this; we will talk about it in a separate topic we have next
16:04:49 #topic Gate health check
16:05:03 any better news on the gate :) ?
16:05:23 we are seeing a lot of timeouts in various tempest jobs nowadays
16:05:30 lots of timeouts :(
16:05:32 which is blocking things from getting merged
16:05:47 *still* trying to get my chunked get patch merged, but it keeps getting nailed
16:05:49 I noticed at the beginning of the week some failures due to failed downloads of ubuntu images
16:05:54 and the timeouts seem to be related just to some very slow/overloaded provider
16:05:54 yeah
16:06:18 noonedeadpunk: not sure if that is the only reason
16:06:22 yeah, timeouts are also something I have seen often recently
16:06:23 we're also seeing an occasional failure with the glance test that tries an external http download of cirros, so I have a patch up to make that retry a few times
16:06:45 yes, I need to review that. will do today
16:06:55 Well, for our jobs, they do time out in the middle of execution, on tasks that were not hanging for too long
16:07:01 it hasn't gotten a clean test yet, but it is passing the tests it's supposed to, of course
16:07:12 So it looked like execution is slow overall
16:07:12 noonedeadpunk: same
16:07:24 noonedeadpunk: ohk
16:08:03 I've also seen timeouts in older release jobs,
16:08:16 so I don't think it's anything like keystone or nova suddenly doing a sleep(1) in every request or anything
16:08:37 i.e. a performance regression in one of the release-specific services that is causing it to go slower
16:08:53 no-no, the jobs that are passing do have proper execution times
16:09:19 yes, I also did not see any increase in time for passing jobs
16:09:23 and timeouts are quite random across scenario/OS
16:09:49 are the passing jobs just under the job timeout? or do the timed-out jobs end up running way longer for some reason?
16:10:17 what I observed is that passing jobs are taking the time they used to in previous releases or so
16:10:36 at least no noticeable increase in their time
16:10:42 so my bet would be on some overloaded provider. I think it should be possible to identify which provider that is, but I'm not sure we have the spare capacity in nodepool to just disable it...
16:11:18 gmann: okay, but it seems to me like they're fairly close to the timeout, right?
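For reference, a minimal sketch of how a job timeout is raised in Zuul, assuming the standard .zuul.yaml syntax; the job name is hypothetical and the values simply mirror the 2h/3h limits discussed here:

    # hypothetical .zuul.yaml snippet; "my-project-tempest-full" is an
    # illustrative job name, not one of the jobs discussed in this meeting
    - job:
        name: my-project-tempest-full
        parent: tempest-full-py3
        # Zuul's job timeout is in seconds: 7200 = 2h, 10800 = 3h.
        # Raising it only papers over slow providers, so splitting the
        # job up is often the better fix.
        timeout: 10800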
16:11:45 like a full-xena run that did not time out ran in 1:55, but a full-zed run that did was at 2:05
16:12:25 noonedeadpunk: if nothing is wrong and it's just a slow provider, then perhaps a different job timeout for that provider, or we split up some jobs so they run in less time
16:12:29 but if you look at other passing runs, tempest-full-py3 takes ~1:35 much of the time
16:12:56 gmann: on that same patch I was looking at, full-py3 took 1:57
16:12:58 but okay
16:13:01 dansmith: well, we have the job timeout set to 3h, and a job that takes 1:40-2h on average does time out
16:13:08 yeah, not always.
16:13:28 noonedeadpunk: I think it's the same in the neutron check queue as well
16:13:29 is it set at 3h or 2h? because it seems like 2h is the threshold from what I'm seeing
16:13:40 but I don't have links to any jobs now
16:14:04 noonedeadpunk: zuul logs the provider id if the builds have logs available. if not, i should be able to look it up from the executor's service logs
16:14:05 dansmith: tempest multinode/slow jobs are 3 hr and the rest are 2
16:14:07 dansmith: in osa we have 3h
16:14:20 gmann: okay
16:14:39 fungi: yup, logs are available - they don't reach the post timeout
16:14:58 in that case, look in the zuul info log files
16:15:17 you may also be able to correlate it with dpawlik's opensearch service
16:15:26 rax-iad is a recent provider I am seeing take 3 hr for the tempest-slow job, which used to take 2 hr https://zuul.opendev.org/t/openstack/build/5f5f48160b0049d99153d0d60909cee7/log/job-output.txt#49
16:15:41 gmann: I guess my point is.. it seems like some legit full runs are dangerously close to the 2h mark, so either a slight performance regression or a few extra tests could be causing all the jobs on our slowest provider to tip over that mark lately
16:16:38 dansmith: yeah, agree with that. i have started the test split in this, #link https://review.opendev.org/c/openstack/tempest/+/873055
16:17:04 it needs some more work; I'm checking some project gate coverage for those extra tests
16:17:10 gmann: ack
16:17:34 anyway, let's monitor the providers and see if we can identify the one making things slow
16:18:27 anything else on the gate?
16:18:59 #topic Cleanup of PyPI maintainer list for OpenStack Projects
16:19:07 Etherpad for audit and cleanup of additional PyPI maintainers
16:19:09 #link https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup
16:19:14 ML discussion
16:19:22 #link https://lists.openstack.org/pipermail/openstack-discuss/2023-January/031848.html
16:19:48 as discussed in the last meeting, JayF is working on the email template and the steps to remove/give access to openstackci
16:20:10 he has prepared the email template for PTLs to ask additional maintainers to do the required steps #link https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup-email-template
16:20:28 he is adding the PyPI steps here as well #link https://review.opendev.org/c/opendev/infra-manual/+/873033
16:20:40 please check it and add any feedback in the etherpad
16:20:57 I will respond to feedback on that PR/etherpad this afternoon.
16:21:10 +1
16:21:43 another thing is to explicitly add the PyPI handling in governance as policy
16:22:04 noonedeadpunk: if I remember correctly, you wanted to add it in the documentation?
16:22:24 that will help to clearly document the process for PyPI access
16:22:44 Yes, I didn't manage to push the patch yet as I took a week of vacation
16:23:07 But I was planning to do that during this week
16:23:27 perfect, let me add an action item just to track it
16:23:47 great!
16:23:48 #action noonedeadpunk to add PyPI access policy in governance documentation
16:24:03 +1
16:24:30 JayF: as you are on top of your email template and steps documentation, do you want me to carry over your action item to the next meeting?
16:24:45 It'd be ideal for me if we can action those before the next meeting
16:24:59 e.g. get general agreement on the email and, if the change merges, we can update the actions for PTLs
16:25:03 JayF: also, please add this email template/sending to the main etherpad as well, in step 1 #link https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup
16:25:18 I'm waiting to do that until folks here give a general agreement to the etherpad :)
16:25:23 sure
16:25:26 if it's OK for me to do that; I'm going to do it :D
16:25:27 JayF to write document on how to remove pypi maintainer/owner access and a draft email for PTLs to use when asking former-contributors to give up their access.
16:25:41 thanks JayF for working on that
16:25:57 anything else on the PyPI topic?
16:27:10 #topic Recurring tasks check
16:27:12 Bare 'recheck' state
16:27:22 #link https://etherpad.opendev.org/p/recheck-weekly-summary
16:27:30 slaweq: please go ahead
16:28:13 nothing really new there
16:28:28 I updated the stats; generally the gates aren't very stable, as we discussed already
16:28:36 yeah
16:28:42 and it is visible in the number of rechecks needed to merge patches
16:28:54 nothing else except that
16:29:02 ok. thanks for updating
16:29:14 #topic Open Reviews
16:29:16 #link https://review.opendev.org/q/projects:openstack/governance+is:open
16:29:26 6 open reviews, out of which 3 are ready to review
16:29:34 #link https://review.opendev.org/c/openstack/governance/+/872232
16:29:42 #link https://review.opendev.org/c/openstack/governance/+/872233
16:29:53 this is a new one #link https://review.opendev.org/c/openstack/governance/+/872769
16:30:04 slaweq: I will check it today, I have it open in a tab
16:30:24 the others are waiting for their dependencies to merge first
16:30:26 thx
16:30:47 that is all on open reviews
16:30:48 two more things from my side
16:31:32 1. elections: nominations are open. TC members whose terms are completing in this election and who are thinking of re-running, please check the nomination deadline
16:32:24 also, encourage other members to run, for either TC or PTL of the projects you know. The deadline for nominations is Feb 15, 2023 23:45 UTC
16:32:56 ++
16:33:09 time flies
16:33:15 yeah :)
16:33:18 2. the OpenInfra Board sync-up call is today at 20 UTC; details are in #link https://etherpad.opendev.org/p/2023-02-board-openstack-sync
16:33:49 which is a zoom call 3 hr 30 min from now
16:34:03 please plan to attend, or feel free to add topics to be discussed in the etherpad
16:34:37 that is all from the agenda and me for today. we have ~26 min left if anyone has anything else to discuss?
16:34:46 ot
16:34:47 it's pretty late for me but I will try to be there
16:34:54 ack
16:35:08 btw, should we somehow start tracking how projects are progressing with adding upgrade jobs for N-2?
16:36:08 noonedeadpunk: good point, we do have the grenade-skip job which we will update on release, and if I am not wrong it will be added to the integrated gate template
16:36:49 but that is only for 4-5 projects, and manila does it too. but I agree it will be good to ask other projects to do so and track that
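For reference, a minimal sketch of what opting a project into skip-level upgrade testing could look like in its .zuul.yaml, assuming the generic grenade-skip-level job; the job name is illustrative and a project would substitute its own skip-level grenade job, such as the neutron jobs mentioned later in the meeting:

    # illustrative .zuul.yaml project stanza, not taken from any specific
    # project; "grenade-skip-level" stands in for the project's own
    # skip-level upgrade job
    - project:
        check:
          jobs:
            - grenade-skip-level:
                # non-voting for 2023.1, expected to become voting for 2024.1
                voting: false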
16:37:42 ~16 projects have the grenade plugin and should add the skip-level testing #link https://docs.openstack.org/grenade/latest/plugin-registry.html
16:37:53 As while it's a matter of practice for SLURP upgrades, I think we'd better have things and the mindset in place sooner rather than later
16:38:10 yeah
16:38:30 gmann: how many have it voting?
16:39:00 dansmith: good question. I have not checked that, but yes, many of them might be non-voting
16:39:08 AFAIK, pretty much everyone should have it voting in current master, right?
16:39:15 yes
16:39:39 You mean the skip-level job voting?
16:39:44 yes
16:39:44 But does everyone have grenade-skip in some default template?
16:40:03 noonedeadpunk: not for this cycle as this is non-SLURP
16:40:14 wait, what?
16:40:15 those run only for SLURP releases.
16:40:20 I thought we agreed to have them non-voting for now as it will be just the first SLURP release now
16:40:26 2023.1 is a SLURP, no?
16:40:43 2023.1 will be the first SLURP
16:40:45 yes
16:40:46 yes, but in the forward direction
16:40:48 Well, it's the first SLURP, so it indeed can be non-voting
16:40:51 dansmith: I mean the first immediate SLURP after 2023.1
16:40:55 But it is already running at least?
16:40:56 oh, you mean because it's the first, right right
16:41:11 in 2023.1 it is non-voting, yes, as we discussed, but in 2023.3 it should be voting
16:41:30 ack, but opting in to voting for 2023.1 would be good if the project can swing it
16:41:31 2024.1
16:41:32 so in neutron we have neutron-ovs-grenade-multinode-skip-level and neutron-ovn-grenade-multinode-skip-level jobs, but those are non-voting for now
16:41:36 in 2023.2 it will not run. in 2024.1 it should be voting
16:42:10 yep, I keep thinking this is the first upgradeable-to but it's really the first upgradeable-from... I'm on the same page now :)
16:42:10 I don't see them in heat at least https://opendev.org/openstack/heat/src/branch/master/.zuul.yaml#L185-L202
16:42:18 I think we have updated our PTI document for that as well, but not every project knows that
16:42:20 Unless it's part of some template
16:42:52 dansmith: I think it's unofficially upgradeable-to as well, as a matter of practice?
16:43:02 noonedeadpunk: if I remember correctly, except for manila and the grenade in-tree projects, I do not think anyone else has started
16:43:25 noonedeadpunk: just not required, yeah
16:43:53 so this is for 2024.1, and something we can start in the next cycle: asking projects to prepare and run those jobs
16:43:58 ok, maybe it's worth writing something to the ML to encourage projects to do so?
16:44:06 +1
16:44:07 ok
16:44:17 no harm in doing that
16:44:24 yeah, running it for everyone and encouraging projects to opt in to voting now would be best, IMHO
16:44:32 sure
16:44:37 meaning non-voting in the template
16:45:13 let me put the data in an etherpad about which projects need to add it, and then I can send it to the ML to ask those projects
16:45:28 maybe we should talk with the release team so they can add info about it in their weekly emails about releases? To remind projects e.g. to remove those jobs from the gate in the 2023.2 cycle
16:46:03 #action gmann to prepare etherpad for grenade skip upgrade job data and send email asking required projects to add job
16:46:45 slaweq: that also works, and I can even add it to my weekly summary too
16:46:59 once 2023.2 starts
16:47:01 gmann++ thx
16:47:19 noonedeadpunk: thanks, good reminder.
16:47:33 anything else for today?
16:47:45 not from me
16:48:32 ok, let's close then. thanks everyone for joining.
16:48:36 #endmeeting