Thursday, 2021-07-29

*** Guest2352 is now known as prometheanfire  [00:34]
*** prometheanfire is now known as Guest2640  [00:35]
*** Guest2640 is now known as prometheanfire  [00:36]
*** ykarel_ is now known as ykarel  [05:15]
*** rpittau|afk is now known as rpittau  [07:10]
*** ykarel is now known as ykarel|lunch  [08:24]
*** ykarel|lunch is now known as ykarel  [09:51]
*** jcapitao is now known as jcapitao_lunch  [10:29]
*** jcapitao_lunch is now known as jcapitao  [12:18]
*** rlandy is now known as rlandy|ruck  [12:19]
<elodilles> hi, i'm planning to run the 'stale eol branch delete' script to remove another batch of eol'd branches (this time we have around a dozen branches to delete)  [14:50]
<elodilles> as far as I see gerrit is fully operational, so I guess it shouldn't be a problem to start now  [14:51]
<elodilles> clarkb fungi : fyi, as usual ^^^  [14:51]
<clarkb> elodilles: yup, should be fine. We're planning a downtime tomorrow around this time though  [14:52]
<elodilles> ack, thanks for the info :)  [14:53]
<fungi> agreed, go for it  [14:54]
*** ykarel is now known as ykarel|away  [15:50]
<elodilles> script finished without errors (list of deleted branches: https://paste.opendev.org/show/807797/ )  [15:52]
<fungi> great! thanks for running that  [15:53]
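The 'stale eol branch delete' script itself is not part of this log. As a rough sketch only (not the actual OpenStack release tooling), removing an already-EOL'd stable branch through the Gerrit REST API could look something like the Python below; the credentials and the project/branch pair are placeholder assumptions:

    # Illustrative sketch only -- not the actual 'stale eol branch delete' script.
    # Assumes a Gerrit account allowed to delete references, with an HTTP
    # password generated in the Gerrit settings page.
    import requests
    from urllib.parse import quote

    GERRIT = "https://review.opendev.org"
    AUTH = ("my-gerrit-user", "my-http-password")  # assumed credentials

    # Hypothetical list of (project, branch) pairs already tagged *-eol.
    STALE_EOL_BRANCHES = [
        ("openstack/example-project", "stable/ocata"),
    ]

    for project, branch in STALE_EOL_BRANCHES:
        # Gerrit deletes a branch with DELETE /a/projects/{project}/branches/{branch};
        # both path components must be URL-encoded, including the slashes.
        url = "{}/a/projects/{}/branches/{}".format(
            GERRIT, quote(project, safe=""), quote(branch, safe=""))
        resp = requests.delete(url, auth=AUTH)
        # Gerrit answers 204 No Content on success.
        print(project, branch, resp.status_code)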
<fungi> elodilles: do you expect these eols to reduce the volume of job failure reports to the stable-maint ml any time soon? we're up to something like 200 messages a day on that list and i can't imagine anyone is reading them all  [15:54]
<elodilles> well, when most of the ocata branches were deleted we went down from ~150 mails to ~120 mails, so now it's around that  [15:57]
<elodilles> to tell you the truth, I don't think the number will be reduced that much  [15:58]
<elodilles> I mean, for now...  [15:58]
<fungi> i wonder if we should be using a different mechanism for tracking stable job failures. sending an e-mail message about each and every one when there are hundreds a day doesn't sound efficient  [15:58]
<fungi> zuul already has a dashboard we can query to see which jobs fail by branch, by project, in specific pipelines, et cetera  [15:59]
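For reference, the dashboard fungi mentions is backed by Zuul's REST API, so the same data can be pulled programmatically. A minimal sketch, assuming the standard builds endpoint and query parameters; the project and branch values are just placeholders:

    # Minimal sketch: list recent failed builds for one project/branch via the
    # Zuul REST API that backs the dashboard at https://zuul.opendev.org/.
    import requests

    API = "https://zuul.opendev.org/api/tenant/openstack/builds"
    params = {
        "project": "openstack/nova",   # placeholder project
        "branch": "stable/train",      # placeholder branch
        "result": "FAILURE",
        "limit": 50,
    }
    for build in requests.get(API, params=params).json():
        print(build["end_time"], build["pipeline"], build["job_name"], build["log_url"])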
<elodilles> actually i don't know who else follows the mails beyond me o:)  [16:00]
<elodilles> are there a lot of people subscribed to stable-maint?  [16:00]
<fungi> i'll check, but just because they're subscribed doesn't mean their address is still valid or that they're looking at the messages  [16:01]
<fungi> mailman has 85 subscribers for the openstack-stable-maint ml, and a bunch of these i know to be people who are no longer active in openstack... i suspect that's the case for almost all of them, in fact  [16:02]
*** rpittau is now known as rpittau|afk  [16:03]
<fungi> but at this point, the openstack-stable-maint list traffic accounts for roughly half of the overall legitimate volume of messages to the mailing lists we're maintaining  [16:04]
<clarkb> Similarly, if those jobs fail consistently and no one is working to fix them, should we stop running the jobs?  [16:06]
<fungi> it would definitely free up ci resources  [16:07]
<elodilles> yes, that's true. the main issue is that teams are not fixing their stable branch gate failures (and not even following their state)  [16:09]
<clarkb> I know for EM we said we'd just turn things off when that happened. I suspect this is happening on the more supported stable branches too though?  [16:09]
<elodilles> clarkb: actually not  [16:10]
<elodilles> we don't really turn off failing tests  [16:10]
<elodilles> I mean, where maintenance is active, mostly all jobs are still working  [16:10]
<elodilles> (nova, neutron, cinder, etc.)  [16:11]
<elodilles> however, for example, lower-constraints jobs were deleted after some TC discussions,  [16:11]
<elodilles> when the new pip resolver revealed version conflicts  [16:12]
<elodilles> and grenade jobs were disabled for some of the projects  [16:12]
<elodilles> otherwise, the less active projects just have broken gates, and that's it :/  [16:13]
<elodilles> currently, most of the periodic-job failures are due to the easy_install issue in xenial based jobs  [16:14]
<fungi> so to reiterate, are those likely to ever be fixed? and if not, can we stop running them (at least for the projects where we expect they'll never be fixed)?  [16:16]
<elodilles> I have a TODO to fix it in some general way (strangely it is not a problem for some repos, and for some other repos it was enough to install pbr in advance, but a general fix is still on my TODO list)  [16:17]
<fungi> the opendev sysadmins reserve the right to bypass zuul gating and gerrit acls in order to merge job configuration changes to projects where necessary, so if the problem is just a lack of reviewers on those projects approving changes, we have ways around it  [16:18]
<elodilles> fungi: that could help  [16:18]
<fungi> though here we're talking specifically about official openstack projects, so we would probably clear it with the openstack tc  [16:18]
<fungi> or at least give them a heads up when we do it  [16:18]
<elodilles> since some of the issues are due to inactive projects, i think a heads up would be good  [16:20]
<fungi> right. in many cases it may be that the projects are active but nobody's maintaining their older stable branches or paying attention to job failures on them, so they just need to be encouraged to eol those branches  [16:29]
<clarkb> right, I think the indication here may be that those branches are now dead and should be treated that way  [16:31]
<elodilles> yes, that's true. I'm thinking now of collecting such repos and sending a mail to teams about either fixing or preparing to eol their broken branches  [16:32]
<elodilles> for inactive projects, I could try to propose patches like this: https://review.opendev.org/c/openstack/nova-powervm/+/802913  [16:33]
<elodilles> though I guess these need some extra rights, as the gate won't allow them to merge (since the gate is broken)  [16:34]
<fungi> yeah, i or someone else on the tact sig who is also a gerrit sysadmin could bypass gating to merge that directly, with the acknowledgement of the tc  [16:35]
<elodilles> anyway, let me send some mails in the next days to teams about their broken stable gates  [16:36]
<fungi> though if that project can't merge any changes because its check jobs are also failing on master, then maybe we need to talk to the tc about retiring the project entirely  [16:36]
<fungi> or possibly retiring the entire team if all of their deliverables are in this state  [16:37]
<fungi> it may also be worth the tc bringing this particular case up with the multi-arch sig  [16:37]
<elodilles> is there a process now that monitors team/project health?  [16:39]
<fungi> elodilles: not really. the tc mostly responds to evidence like this to make determinations  [16:40]
<fungi> "health" is a really hard thing to define, much less measure, so it makes more sense to base decisions on whether teams are keeping up with general expectations for maintaining their deliverables (responding to vulnerability reports, participating in scheduled releases, keeping jobs running)  [16:41]
<elodilles> hmmm, I see. I remembered something like that existing before, but maybe it was long ago or my memory is tricking me :)  [16:41]
<fungi> there have been multiple attempts, but they required more work to track project health than it would have been to just take over those project duties  [16:42]
<elodilles> i see  [16:42]
<fungi> there are so many more projects than there are tc members that assigning tc members to individually evaluate teams on an ongoing basis has always been untenable  [16:44]
<elodilles> that's understandable  [16:44]
<fungi> right now there are nearly 10 official project teams for every tc member  [16:44]
<elodilles> ok, so I think this is what I can do now:  [16:45]
<elodilles> since I'm checking the periodic-stable job failures anyway,  [16:45]
<elodilles> I will send mails to teams about the state of their stable branches (those that have been failing for a longer period),  [16:46]
<elodilles> and if we don't get replies, then I can forward these cases to the TC  [16:47]
<elodilles> what do you think? is this a good approach?  [16:48]
<clarkb> that seems reasonable to me  [16:51]
<clarkb> give people a chance to volunteer and help fix it; if that doesn't happen we can proceed to cleanup  [16:51]
<fungi> yep, makes sense  [16:52]
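A minimal sketch of the collection step elodilles describes above: grouping recent periodic-stable failures by project so each team can be mailed a summary. The pipeline name matches the one discussed in the log; the 500-build sample size and the plain-text output are assumptions:

    # Sketch: summarize recent failures from the periodic-stable pipeline per
    # project, as input for the "your stable gate is broken" mails discussed above.
    from collections import defaultdict
    import requests

    API = "https://zuul.opendev.org/api/tenant/openstack/builds"
    params = {"pipeline": "periodic-stable", "result": "FAILURE", "limit": 500}

    failures = defaultdict(set)
    for build in requests.get(API, params=params).json():
        failures[build["project"]].add((build["branch"], build["job_name"]))

    for project, broken in sorted(failures.items()):
        print(project)
        for branch, job in sorted(broken):
            print("  {} : {}".format(branch, job))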
<elodilles> ok, thanks, then I will start looking into this tomorrow and then we will see  [16:54]
<elodilles> and hopefully this will soon eliminate the majority of the daily ~120 mails :)  [16:55]
<elodilles> and free up some CI resources, of course  [16:56]
<fungi> thanks!  [16:57]
<elodilles> no problem :)  [17:00]
<elodilles> and thanks too!  [17:01]
*** sshnaidm is now known as sshnaidm|afk  [18:30]
<opendevreview> James E. Blair proposed openstack/project-config master: Remove report-build-page from zuul tenant config  https://review.opendev.org/c/openstack/project-config/+/802973  [23:42]
