Thursday, 2021-07-08

opendevreviewGhanshyam proposed openstack/governance master: Add link to Yoga announcement  https://review.opendev.org/c/openstack/governance/+/799926 00:15
opendevreviewGhanshyam proposed openstack/governance master: Define Yoga release testing runtime  https://review.opendev.org/c/openstack/governance/+/799927 00:22
opendevreviewMerged openstack/governance master: Add DPL model also in 'Appointing leaders' section  https://review.opendev.org/c/openstack/governance/+/797985 00:38
*** rpittau|afk is now known as rpittau06:59
*** slaweq_ is now known as slaweq11:42
* jungleboyj sighs .... Yoga ... didn't think it would go that far down the list.12:13
*** rpittau is now known as rpittau|afk12:45
fungii was holding out hope for yog-sothoth12:48
* jungleboyj laughs12:50
jungleboyjI am surprised that Yoga went through as it is one of the laptop lines we have at Lenovo.12:50
toskyI know it's late, but maybe using https://spaceballs.fandom.com/wiki/Yogurt instead of Yoghurt may have changed the final result 12:51
jungleboyj:-)  Oh Spaceballs.12:52
gmannjungleboyj: I had a Yoga laptop but it stopped working after a year or so:)12:54
gmannbut I think we should find some way to do some pre-sanity trademark check before vote12:54
jungleboyjgmann:  :-(  That is no good.  I have had several Yogas and they all have worked well except for one that it took them a while to figure out it had a bad battery.12:55
jungleboyjgmann:  Which model?12:55
gmannYoga 91012:55
jungleboyjgmann:  Oh, that is a nice one.  I think that is the one my son has.  Surprised it didn't last longer.12:56
gmannit seems like a motherboard issue but I need to send that to the service center. maybe the motherboard costs as much as a new laptop, and it is not in warranty 12:56
jungleboyjI always get the 3 year warranty though.  That has been my luck with Laptops.  :-)13:06
*** poojajadhav is now known as pojadhav13:16
gmanntc-members: meeting time.  15:00
gmann#startmeeting tc15:00
opendevmeetMeeting started Thu Jul  8 15:00:17 2021 UTC and is due to finish in 60 minutes.  The chair is gmann. Information about MeetBot at http://wiki.debian.org/MeetBot.15:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:00
opendevmeetThe meeting name has been set to 'tc'15:00
gmann#topic Roll call15:00
diablo_rojo_phoneo/15:00
gmanno/15:00
jungleboyjo/15:00
ricolino/15:00
belmoreirao/15:00
dansmitho/15:00
clarkbhello15:00
gmannclarkb: hi15:00
gmannyoctozepto is on PTO so he will not be able to join today's meeting15:00
dansmithdid we approve that time off?15:01
gmann:)15:01
fungii told him it was okay15:01
jungleboyj:-)15:01
gmannlet's start15:01
gmann#topic Follow up on past action items15:01
gmanngmann to remove Governance non-active repos cleanup topic from agenda15:02
gmanndone15:02
gmanngmann to remove election assignments topic form agenda15:02
gmannthis too15:02
gmannricolin to ask for collecting the ops pain points on openstack-discuss ML15:02
gmannricolin: any update on this15:02
ricolinalready added it to the community-goals backlog and Y-cycle pre-selected list, but have not yet sent the ML out15:03
*** frickler is now known as frickler_pto15:03
gmann+1. i think that is good15:03
spotzo/15:03
ricolinwill send it out this week15:03
ricolinon ML15:03
gmannok, thanks15:03
gmanngmann to propose the RBAC goal 15:04
gmannI proposed that #link https://review.opendev.org/c/openstack/governance/+/799705 15:04
gmannplease review15:04
gmann#topic Gate health check (dansmith/yoctozepto)15:04
gmanndansmith: any news15:05
dansmithI really have nothing to report, but mostly because I've been too busy with other stuff to be submitting many patches in the last week or so15:05
gmannok15:05
gmannone thing to share is about the log warnings, especially from oslo policy15:05
fungiwe've had a bit of job configuration upheaval from the zuul 4.6.0 security release15:05
gmannmelwitt and clarkb pointed that out in the infra channel, and many projects have a lot of such warnings due to policy rules15:06
gmannI am fixing those in #link https://review.opendev.org/q/topic:%22fix-oslo-policy-warnings%22+(status:open%20OR%20status:merged) 15:06
fungihad to make non-backward-compatible changes to how some kinds of variables are accessed, particularly with regard to secrets, so that's been disrupting some post/promote jobs (should be under control now), as well as making some projects' overall zuul configuration insta-buggy, causing some of their jobs to not run15:06
fungii think kolla was hardest hit by that15:07
gmannok15:07
gmannfungi: any affected project without an ack, or needing help on this ?15:08
gmannI saw on the ML that a few projects acked that and are working on it15:08
clarkbgmann: it might be good to update those warnings to only fire once per process15:08
fungii haven't checked in the past few days, but click the bell icon at the top-right of the zuul status page for a list of some which may need help15:08
clarkbI can't imagine those warnings help operators any more than they help CI15:08
gmannfungi: ok, thanks for update. let us know if any project did not notice or need help15:09
gmannback to policy rule warning15:10
gmannclarkb: yes, that seems very noisy now15:10
gmannwhen we added it initially we thought it would help operators move to the new rbac, but in the new rbac work every policy rule changed its default, so warnings are emitted15:11
gmannwhich does not seem to help much15:11
gmannOne approach I sent on the ML is about disabling those by default and making it configurable, so that operators can enable them to see what all they need to update15:11
gmann#link http://lists.openstack.org/pipermail/openstack-discuss/2021-July/023484.html 15:11
gmannand this is patch #link https://review.opendev.org/c/openstack/oslo.policy/+/799539 15:12
gmannfeel free to respond on the ML or in gerrit with your opinion 15:12
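For context, the oslo.policy change linked above is about making these default-change warnings configurable. Until something like that lands, a minimal sketch of quieting them from a service's own logging setup might look like the following; the 'oslo_policy.policy' logger name and the match on the word 'default' in the message are assumptions for illustration, not part of the actual patch.

    import logging


    class DropDefaultChangeWarnings(logging.Filter):
        """Drop the repetitive warnings about policy defaults changing."""

        def filter(self, record):
            msg = record.getMessage()
            # Keep every record except WARNINGs that mention changed defaults.
            return not (record.levelno == logging.WARNING and 'default' in msg)


    # Attach where the service configures logging. A coarser alternative is
    # logging.getLogger('oslo_policy').setLevel(logging.ERROR).
    logging.getLogger('oslo_policy.policy').addFilter(DropDefaultChangeWarnings())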
gmannanything else to discuss related to gate health?15:13
gmann#topic Migration from 'Freenode' to 'OFTC' (gmann)15:14
gmann#link https://etherpad.opendev.org/p/openstack-irc-migration-to-oftc 15:14
gmannI started pushing the patches for remaining projects #link https://review.opendev.org/q/topic:%22oftc%22+(status:open%20OR%20status:merged) 15:14
gmannfew are still left15:14
gmannnothing else to share on this15:15
fungitoday we landed an update to the opendev infra manual as well, so if you refer anyone there it should now properly reference oftc and not freenode15:15
gmann+115:15
gmann#topic Xena Tracker15:16
spotz+115:16
gmann#link https://etherpad.opendev.org/p/tc-xena-tracker 15:16
gmannI think we can close 'election promotion' now as we have three new election officials 15:17
gmannspotz: belmoreira diablo_rojo_phone ? what do you say?15:17
gmannL63 in etherpad15:17
fungii'm very excited by that, and happy to answer questions anyone has15:17
jungleboyj\o/15:17
spotzYeah and we now have a name for that patch15:17
gmannand the email opt-in process or solution can be discussed by you guys in the election channel15:18
belmoreiralgtm15:18
gmannthanks again for volunteering 15:18
gmannCharter revision also done so marked as completed15:19
diablo_rojo_phoneYes we can close it. 15:19
gmannany other update on Xena tracker?15:20
gmannjungleboyj: mnaser any update you want to share for 'stable policy process change' ?15:20
jungleboyjNo, didn't get to that with the holiday week. 15:21
gmannok15:21
gmannwe have 8 items in etherpad to finish in Xena, let's start working on those which should not take much time15:22
gmannmoving next..15:22
gmann#topic ELK services plan and help status15:22
gmannfirst is Board meeting updates15:23
gmannI presented this slide in the 30th June Board meeting #link https://docs.google.com/presentation/u/1/d/1ugdwMI2ZM2L8z1sobzHJwDpbvlyWKH02PH7Fi4tkyVc/edit#slide=id.ge1bdf71dac_0_0 15:23
gmannI was expecting some actionable items from the Board but that did not happen. 15:23
gmannthe Board acked this help-needed item and said they would broadcast it in their organizations/local communities etc15:24
gmannwhich I think everyone has been doing since 2018, when we re-defined the upstream investment opportunity 15:25
gmannhonestly speaking, I am not so happy with the lack of actionable items from that meeting15:26
gmannand I do not know how we can get help here ?15:26
spotzI took it as folks were going back to their own companies15:26
spotzIt was a bit late for me though15:26
gmannyeah, their own companies also15:26
gmannbut that is no different from what we all, including the Board, have been trying since 201815:27
funginot to apologize for them, but i don't expect the board members to come to those meetings expecting to make commitments on behalf of their employers, and they probably don't control the budget that assistance would be provided out of in most cases (they're often in entirely separate business units), so they have to lobby internally for that sort of thing15:27
jungleboyjfungi:  True.15:28
fungii'm more disappointed by the years of inaction than in their inability to make any immediate promises15:28
gmanna few of the suggestions are listed in slide #5 #link https://docs.google.com/presentation/d/1ugdwMI2ZM2L8z1sobzHJwDpbvlyWKH02PH7Fi4tkyVc/edit#slide=id.ge1bdf71dac_0_24 15:28
gmannthat was my expectation and hope. I know those are not easy but in the current situation we need such support15:28
gmannanyways that is update from Board meeting. moving next..15:30
gmannCreating a timeline for shutting the service down if help isn't found15:30
gmannclarkb please go ahead15:31
clarkbThis is mostly a request that we start thinking about what the timeline looks like if we don't end up with help to update the system or host it somewhere else15:31
clarkbI'm not currently in a rush to shut it down, but there is a risk that external circumstances could force that to be done (security concerns or similar)15:32
clarkbHowever, I think it would be good to have some agreement on what not a rush means :)15:32
jungleboyj:-(15:32
clarkbpart of the reason this came up was after a week or two it was noticed that the cluster had completely crashed and I had to go resurrect it15:32
clarkbI don't want to do that indefinitely if there isn't proper care and feeding happening15:33
clarkbThere are also a few problems with indexing currently, including the massive log files generated by unittests due to warnings, and for some reason logstash is emitting events for centuries in the future which floods the elasticsearch cluster with indexes for the future15:33
clarkbI think the massive log files led to the cluster crashing. The future events problem is more annoying than anything else15:34
gmannyeah, we should start fixing those warnings. maybe we can ask all projects on the ML. I can fix oslo policy but do not have bandwidth to fix the others15:34
gmannback to shutdown thing15:35
gmannso if we shut down, the big question is how we are going to debug failures, or how much extra load it will add on the gate in terms of rechecks ..15:35
clarkbgmann: to be fair I think most people just recheck anyway and don't do debugging15:35
gmannyeah but not all15:35
gmannafter shutdown there will be many rechecks we will have to do15:35
clarkbwhere elastic-recheck has been particularly useful is when you have an sdague, jogo, mtreinish, melwitt, or dansmith digging into broader failures and trying to address them15:35
dansmithyeah, I try to shame people that just blindly recheck,15:36
dansmithbut it's a bit of a losing battle15:36
dansmithstill, removing the *ability* to do real checking sucks :/15:36
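For readers unfamiliar with the tooling referenced here: elastic-recheck works by running saved Lucene-style queries against the logstash Elasticsearch cluster and matching failed builds against known bug signatures. A rough sketch of that kind of lookup follows; the endpoint, index pattern, field names, and signature string are placeholders rather than the production values.

    import json
    import urllib.request

    # Placeholder endpoint; the real cluster location and index pattern differ.
    ES_SEARCH = 'http://elasticsearch.example.org:9200/logstash-*/_search'

    query = {
        'query': {
            'query_string': {
                # The kind of signature an elastic-recheck query file holds.
                'query': ('message:"Timed out waiting for a reply" '
                          'AND tags:"job-output.txt"'),
            }
        },
        'size': 0,  # only the hit count matters here
    }

    req = urllib.request.Request(
        ES_SEARCH,
        data=json.dumps(query).encode(),
        headers={'Content-Type': 'application/json'},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
        # The shape of hits['total'] varies between Elasticsearch versions.
        print('matching log lines:', result['hits']['total'])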
clarkbI suspect the biggest impact will not be recheck problems but the once a cycle or so fix a very unstable gate15:36
dansmith...or a more continuously unstable gate15:36
gmannyeah15:36
clarkbya15:36
gmannwhich will directly impact our release 15:37
gmannor feature implementation15:37
fungithough it sounds like the entire cluster was broken for a couple weeks there before anyone noticed it wasn't returning results to their queries15:37
clarkbI think that also is part of why it has been so hard to find help for this. When it is a tool you use every 6 months it is less in your mind continuously for care and feeding15:37
clarkbfungi: yes, but no one notices if the gate is stable15:37
dansmithyeah15:37
clarkbwhich is a big underlying issue here imo15:38
clarkbpeople do notice when there are systemic problems in the gate that need addressing15:38
clarkbanother reason to have a rough timeline is it may help light a fire under people willing to help15:39
clarkbwhen I brought this up last week gmann suggested the end of the Yoga cycle as a potential deadline15:39
dansmithyeah, "no rush" is not as motivating15:39
gmannyeah, I am thinking end of Yoga will be more than 6 months after we make this last critical call for help15:40
clarkbThat ensures that Xena (hopefully) doesn't have any major changes to the stabilization process. Then in Yoga we can start planning for replacement/shutdown/etc (though that can start earlier too)15:40
gmannso if there is anyone who wants to help, they should be raising their hand by then 15:40
clarkbThat timeline seems reasonable to me15:40
gmannany objection on above deadline ?15:41
fungi"no rush" has also been tempered by "but might be tomorrow, depending on outside factors"15:41
clarkbfungi: yes and I think that is still the message from me15:41
dansmithI'm not happy about the timeline, but accept the need15:41
dansmith"happy with, not happy about" you might say :)15:42
gmanndansmith: meaning? is it too late or too early ?15:42
clarkbif we notice abuse of elasticsearch or logstash that requires upgrades to address we'll be in a situation where we don't have much choice15:42
jungleboyjI think that sounds like a reasonable timeline ... even though we don't want one.15:42
dansmithgmann: it's me being intentionally vague. I'm good with it, just not happy about it.. necessary, but I worry about the inevitable end where nobody has actually stepped up15:42
gmannclarkb: yeah, for outside factors we would not be able to do anything and would have to shut down early ?15:42
clarkbgmann: correct15:43
gmannk15:43
gmanndansmith: correct. my last hope was the Board providing paid resources but anyway that did not happen  15:43
clarkbAnother concern is the sheer size of the system. I've temporarily shut down 50% of the indexing pipeline and have been monitoring our indexing queue https://grafana.opendev.org/d/5Imot6EMk/zuul-status?viewPanel=17&orgId=1&from=now-24h&to=now 15:43
clarkbcompared to elasticsearch the logstash workers aren't huge but it is still something. I think I may turn on 10% again and leave it at 40% shutdown for another week then turn off the extra servers if that looks stable.15:44
clarkbcurrently we seem to be just barely keeping up with demand15:44
fungiyeah, that's just half the indexing workers, not half the system15:44
clarkb(and then having some headroom for feature freeze is a good idea hence only reducing by 40% total)15:45
gmannhow about keeping only check pipeline logs ? 15:45
clarkbgmann: I would probably do the opposite and only keep gate15:45
clarkbcheck is too noisy15:45
clarkbpeople push a lot of broken into check :)15:45
jungleboyjclarkb:  That makes sense to me.15:45
gmannclarkb: yeah but in check we do most of the debugging and make it more stable before it reaches the gate15:46
fungithe check pipeline results are full of noise failures from bad changes, while the gate pipeline should in theory be things which at least got through check and code review to approval15:46
clarkbbut that is another option and reducing the total amount of logs indexed would potentially allow us to remove an elasticsearch server or two (since the major factor there is total storage size)15:46
clarkbgmann: yes, but it is very hard to see anything useful in check because you can't really tell if things are just broken because someone didn't run tox locally or if they are really broken15:46
gmannyeah15:47
clarkbit is still useful to have check, often you want to go and see where something may have been introduced and you can trace that back to check15:47
clarkbbut if we start trimming logs check is what I would drop first15:47
clarkbas far as elasticsearch disk consumption goes we should have a pretty good indication of the current db size for 7 days of indexes at the beginning of next week15:47
clarkbthe data is currently a bit off since we had the cluster crash recently15:48
clarkbthat info is available in our cacti instance if you want to see what usage looks like. We have 6TB storage available but 5TB usable because we need to be tolerant of losing one server and its 1TB of disk15:48
clarkbIf we want to start pruning logs out then maybe we start that conversation next week when we have a good baseline of data to look at first15:49
gmannor truncate the log storage time? to 2-3 days15:49
clarkbyes that is another option15:49
fungithough that doesn't give you much history to be able to identify when a particular failure started15:50
fungia week is already fairly short in that regard15:50
clarkbyup, but may be enough to identify the source of problems and then work backward in code15:50
gmannyeah, we are going to lose that anyway15:50
clarkbas well as track what issues are still occurring15:50
gmannyes15:50
clarkbanyway I think discussion for pruning elasticsearch size is better next week when we have better data to look at. I'm happy to help collect some of that info together and discuss it further next week if we like15:51
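For concreteness, a minimal sketch of the shorter-retention option being discussed: delete daily logstash indexes older than a cutoff. The endpoint is a placeholder, the logstash-YYYY.MM.DD naming is the usual daily-index convention, and in practice a tool such as curator would normally handle this.

    import datetime
    import urllib.error
    import urllib.request

    ES_URL = 'http://elasticsearch.example.org:9200'  # placeholder endpoint
    KEEP_DAYS = 3

    today = datetime.date.today()
    for age in range(KEEP_DAYS, 15):  # look back a couple of weeks
        day = today - datetime.timedelta(days=age)
        index = day.strftime('logstash-%Y.%m.%d')
        req = urllib.request.Request(f'{ES_URL}/{index}', method='DELETE')
        try:
            urllib.request.urlopen(req)
            print('deleted', index)
        except urllib.error.HTTPError as err:
            if err.code != 404:  # 404 just means that day's index is gone
                raise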
fungii wonder if we could change the indexing threshold to >info instead of >debug15:51
clarkb(this is about all I had on this agenda item. I'll go ahead and make note of the Yoga deadline on the mailing list in a response to the thread I started a while back if I can find it)15:51
gmannclarkb: +1 that will be even better15:51
clarkbfungi: the issue with that is a good chunk of logs are the job-output.txt files now with no log level15:52
gmannclarkb: +1  and thanks for publishing deadline on ML15:52
clarkbfungi: this is why the warnings hurt so much15:52
fungiahh, yeah good point15:52
gmannon warnings, I will start a thread to fix them and start converting them to errors on the openstack lib side so that projects have to fix them15:52
gmann#action clarkb to convey the ELK service shutdown deadline on ML15:53
gmann#action gmann to send ML to fix warning and oslo side changes to convert them to error15:53
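As an aside on that second action item: the generic Python pattern for escalating a library's deprecation warnings into hard test failures is sketched below. It assumes the warnings go through the warnings module; the oslo.policy default-change messages currently go through the logger instead, which is why a change on the oslo side is needed first. The module pattern is illustrative only.

    import warnings

    # Turn DeprecationWarnings originating from the named library into errors
    # during a test run so the gate fails until callers are fixed.
    warnings.filterwarnings(
        'error',
        category=DeprecationWarning,
        module=r'oslo_policy(\.|$)',
    )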
gmannand we will continue discussing it next week 15:54
gmannthanks clarkb fungi for the updates and maintaining these services 15:54
jungleboyj++15:54
gmann#topic Open Reviews15:54
gmann#link https://review.opendev.org/q/projects:openstack/governance+is:open 15:54
gmannI added the link for Yoga release name announcement 15:55
gmannplease review that15:55
gmannalso Yoga testing runtime #link https://review.opendev.org/c/openstack/governance/+/799927 15:55
gmannwith no change from what we have in Xena15:55
gmannand this one about rbac goal proposal #link https://review.opendev.org/c/openstack/governance/+/799705 15:56
gmannand need one more vote in this project-update #link https://review.opendev.org/c/openstack/governance/+/799817 15:56
clarkbas a note on the python version available in focal I think 3.9 is available now15:57
spotz3.9 is also what will be in Stream 915:57
clarkboh I guess it is in universe though15:57
clarkbprobably good to test it but not make it the default15:57
gmannclarkb: I think 3.815:57
gmannclarkb:  we have unit test job as non voting for 3.915:57
clarkbgmann: it has both. But 3.8 is the default and not in universe :)15:58
gmannyeah, default15:58
gmannthat's all for me today, anything else to discuss ?15:59
gmannthough 1 min left15:59
jungleboyjNothing here.16:00
gmannif nothing, let's close meeting. 16:00
gmannk16:00
gmannthanks all for joining. 16:00
gmann#endmeeting16:00
opendevmeetMeeting ended Thu Jul  8 16:00:16 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/tc/2021/tc.2021-07-08-15.00.html 16:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/tc/2021/tc.2021-07-08-15.00.txt 16:00
opendevmeetLog:            https://meetings.opendev.org/meetings/tc/2021/tc.2021-07-08-15.00.log.html 16:00
spotzThanks gmann16:00
ricolinthanks gmann 16:00
jungleboyjThank you!16:00
*** pojadhav is now known as pojadhav|away16:52
opendevreviewGhanshyam proposed openstack/governance-sigs master: Moving IRC network reference to OFTC  https://review.opendev.org/c/openstack/governance-sigs/+/800135 23:39
