Thursday, 2021-07-08

opendevreviewGhanshyam proposed openstack/governance master: Add link to Yoga announcement  https://review.opendev.org/c/openstack/governance/+/799926 00:15
opendevreviewGhanshyam proposed openstack/governance master: Define Yoga release testing runtime  https://review.opendev.org/c/openstack/governance/+/799927 00:22
opendevreviewMerged openstack/governance master: Add DPL model also in 'Appointing leaders' section  https://review.opendev.org/c/openstack/governance/+/797985 00:38
*** rpittau|afk is now known as rpittau06:59
*** slaweq_ is now known as slaweq11:42
* jungleboyj sighs .... Yoga ... didn't think it would go that far down the list.12:13
*** rpittau is now known as rpittau|afk12:45
fungii was holding out hope for yog-sothoth12:48
* jungleboyj laughs12:50
jungleboyjI am surprised that Yoga went through as it is one of the laptop lines we have at Lenovo.12:50
toskyI know it's late, but maybe using https://spaceballs.fandom.com/wiki/Yogurt instead of Yoghurt may have changed the final result 12:51
jungleboyj:-)  Oh Spaceballs.12:52
gmannjungleboyj: I had a Yoga laptop but it stopped working after a year or so:)12:54
gmannbut I think we should find some way to do some pre-sanity trademark check before vote12:54
jungleboyjgmann:  :-(  That is no good.  I have had several Yogas and they all have worked well except for one that it took them a while to figure out it had a bad battery.12:55
jungleboyjgmann:  Which model?12:55
gmannYoga 91012:55
jungleboyjgmann:  Oh, that is a nice one.  I think that is the one my son has.  Surprised it didn't last longer.12:56
gmannit seems like a motherboard issue but I need to send that to the service center. maybe the motherboard costs as much as a new laptop, and it is not in warranty 12:56
jungleboyjI always get the 3 year warranty though.  That has been my luck with Laptops.  :-)13:06
*** poojajadhav is now known as pojadhav13:16
gmanntc-members: meeting time.  15:00
gmann#startmeeting tc15:00
opendevmeetMeeting started Thu Jul  8 15:00:17 2021 UTC and is due to finish in 60 minutes.  The chair is gmann. Information about MeetBot at http://wiki.debian.org/MeetBot.15:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:00
opendevmeetThe meeting name has been set to 'tc'15:00
gmann#topic Roll call15:00
diablo_rojo_phoneo/15:00
gmanno/15:00
jungleboyjo/15:00
ricolino/15:00
belmoreirao/15:00
dansmitho/15:00
clarkbhello15:00
gmannclarkb: hi15:00
gmannyoctozepto is on PTO so he will not be able to join today's meeting15:00
dansmithdid we approve that time off?15:01
gmann:)15:01
fungii told him it was okay15:01
jungleboyj:-)15:01
gmannlet's start15:01
gmann#topic Follow up on past action items15:01
gmanngmann to remove Governance non-active repos cleanup topic from agenda15:02
gmanndone15:02
gmanngmann to remove election assignments topic form agenda15:02
gmannthis too15:02
gmannricolin to ask for collecting the ops pain points on openstack-discuss ML15:02
gmannricolin: any update on this15:02
ricolinalready added it to the community-goals backlog and Y-cycle pre-selected list, but have not yet sent the ML out15:03
*** frickler is now known as frickler_pto15:03
gmann+1. i think that is good15:03
spotzo/15:03
ricolinwill send it out this week15:03
ricolinon ML15:03
gmannok, thanks15:03
gmanngmann to propose the RBAC goal 15:04
gmannI proposed that #link https://review.opendev.org/c/openstack/governance/+/799705 15:04
gmannplease review15:04
gmann#topic Gate health check (dansmith/yoctozepto)15:04
gmanndansmith: any news15:05
dansmithI really have nothing to report, but mostly because I've been too busy with other stuff to be submitting many patches in the last week or so15:05
gmannok15:05
gmannone thing to share is about the log warnings, especially from oslo policy15:05
fungiwe've had a bit of job configuration upheaval from the zuul 4.6.0 security release15:05
gmannmelwitt and clarkb pointed that out in the infra channel, and many projects have a lot of such warnings due to policy rules15:06
gmannI am fixing those in #link https://review.opendev.org/q/topic:%22fix-oslo-policy-warnings%22+(status:open%20OR%20status:merged) 15:06
fungihad to make non-backward-compatible changes to how some kinds of variables are accessed, particularly with regard to secrets, so that's been disrupting some post/promote jobs (should be under control now), as well as making some projects' overall zuul configuration insta-buggy, causing some of their jobs to not run15:06
fungii think kolla was hardest hit by that15:07
gmannok15:07
gmannfungi: any affected project without an ack, or needing help on this ?15:08
gmannI saw on the ML that a few projects acked that and are working on it15:08
clarkbgmann: it might be good to update those warnings to only fire once per process15:08
fungii haven't checked in the past few days, but click the bell icon at the top-right of the zuul status page for a list of some which may need help15:08
clarkbI can't imagine those warnings help operators any more than they help CI15:08
gmannfungi: ok, thanks for update. let us know if any project did not notice or need help15:09
gmannback to policy rule warning15:10
gmannclarkb: yes, that seems very noisy now15:10
gmannwhen we added it initially we thought it would help operators move to the new rbac, but in the new rbac work every policy rule changed its default, so warnings are emitted15:11
gmannwhich does not seem to help much15:11
gmannOne approach I sent on the ML is about disabling those by default and making it configurable, so that operators can enable them to see what all they need to update15:11
gmann#link http://lists.openstack.org/pipermail/openstack-discuss/2021-July/023484.html 15:11
gmannand this is patch #link https://review.opendev.org/c/openstack/oslo.policy/+/799539 15:12
gmannfeel free to respond on the ML or in gerrit with your opinion 15:12
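For context, the oslo.policy change linked above is about making these default-change warnings configurable. Until something like that lands, a minimal sketch of quieting them from a service's own logging setup might look like the following; the 'oslo_policy.policy' logger name and the match on the word 'default' in the message are assumptions for illustration, not part of the actual patch.

    import logging


    class DropDefaultChangeWarnings(logging.Filter):
        """Drop the repetitive warnings about policy defaults changing."""

        def filter(self, record):
            msg = record.getMessage()
            # Keep every record except WARNINGs that mention changed defaults.
            return not (record.levelno == logging.WARNING and 'default' in msg)


    # Attach where the service configures logging. A coarser alternative is
    # logging.getLogger('oslo_policy').setLevel(logging.ERROR).
    logging.getLogger('oslo_policy.policy').addFilter(DropDefaultChangeWarnings())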
gmannanything else to discuss related to gate health?15:13
gmann#topic Migration from 'Freenode' to 'OFTC' (gmann)15:14
gmann#link https://etherpad.opendev.org/p/openstack-irc-migration-to-oftc 15:14
gmannI started pushing the patches for remaining projects #link https://review.opendev.org/q/topic:%22oftc%22+(status:open%20OR%20status:merged) 15:14
gmannfew are still left15:14
gmannnothing else to share on this15:15
fungitoday we landed an update to the opendev infra manual as well, so if you refer anyone there it should now properly reference oftc and not freenode15:15
gmann+115:15
gmann#topic Xena Tracker15:16
spotz+115:16
gmann#link https://etherpad.opendev.org/p/tc-xena-tracker 15:16
gmannI think we can close 'election promotion' now as we have three new election officials 15:17
gmannspotz: belmoreira diablo_rojo_phone ? what do you say?15:17
gmannL63 in etherpad15:17
fungii'm very excited by that, and happy to answer questions anyone has15:17
jungleboyj\o/15:17
spotzYeah and we now have a name for that patch15:17
gmannand the email opt-in process or solution can be discussed by you guys in the election channel15:18
belmoreiralgtm15:18
gmannthanks again for volunteering 15:18
gmannCharter revision also done so marked as completed15:19
diablo_rojo_phoneYes we can close it. 15:19
gmannany other update on Xena tracker?15:20
gmannjungleboyj: mnaser any update you want to share for 'stable policy process change' ?15:20
jungleboyjNo, didn't get to that with the holiday week. 15:21
gmannok15:21
gmannwe have 8 items in etherpad to finish in Xena, let's start working on those which should not take much time15:22
gmannmoving next..15:22
gmann#topic ELK services plan and help status15:22
gmannfirst is Board meeting updates15:23
gmannI presented this slide in the 30th June Board meeting #link https://docs.google.com/presentation/u/1/d/1ugdwMI2ZM2L8z1sobzHJwDpbvlyWKH02PH7Fi4tkyVc/edit#slide=id.ge1bdf71dac_0_0 15:23
gmannI was expecting some actionable items from the Board but that did not happen. 15:23
gmannthe Board acked this help-needed item and said they would broadcast it in their organizations/local communities etc15:24
gmannwhich I think everyone has been doing since 2018, when we re-defined the upstream investment opportunity 15:25
gmannhonestly speaking, I am not so happy with the lack of actionable items from that meeting15:26
gmannand I do not know how we can get help here ?15:26
spotzI took it as folks were going back to their own companies15:26
spotzIt was a bit late for me though15:26
gmannyeah, their own companies also15:26
gmannbut that is no different from what we all, including the Board, have been trying since 201815:27
funginot to apologize for them, but i don't expect the board members to come to those meetings expecting to make commitments on behalf of their employers, and they probably don't control the budget that assistance would be provided out of in most cases (they're often in entirely separate business units), so they have to lobby internally for that sort of thing15:27
jungleboyjfungi:  True.15:28
fungii'm more disappointed by the years of inaction than in their inability to make any immediate promises15:28
gmanna few of the suggestions are listed in slide #5 #link https://docs.google.com/presentation/d/1ugdwMI2ZM2L8z1sobzHJwDpbvlyWKH02PH7Fi4tkyVc/edit#slide=id.ge1bdf71dac_0_24 15:28
gmannthat was my expectation and hope. I know those are not easy but in the current situation we need such support15:28
gmannanyways that is update from Board meeting. moving next..15:30
gmannCreating a timeline for shutting the service down if help isn't found15:30
gmannclarkb please go ahead15:31
clarkbThis is mostly a request that we start thinking about what the timeline looks like if we don't end up with help to update the system or host it somewhere else15:31
clarkbI'm not currently in a rush to shut it down, but there is a risk that external circumstances could force that to be done (security concerns or similar)15:32
clarkbHowever, I think it would be good to have some agreement on what not a rush means :)15:32
jungleboyj:-(15:32
clarkbpart of the reason this came up was after a week or two it was noticed that the cluster had completely crashed and I had to go resurrect it15:32
clarkbI don't want to do that indefinitely if there isn't proper care and feeding happening15:33
clarkbThere are also a few problems with indexing currently, including the massive log files generated by unittests due to warnings, and for some reason logstash is emitting events for centuries in the future which floods the elasticsearch cluster with indexes for the future15:33
clarkbI think the massive log files led to the cluster crashing. The future events problem is more annoying than anything else15:34
gmannyeah, we should start fixing those warnings. maybe we can ask all projects on the ML. I can fix oslo policy but do not have bandwidth to fix the others15:34
gmannback to shutdown thing15:35
gmannso if we shut down, the big question is how we are going to debug failures, or how much extra load it will add on the gate in terms of rechecks ..15:35
clarkbgmann: to be fair I think most people just recheck anyway and don't do debugging15:35
gmannyeah but not all15:35
gmannafter shutdown there will be many rechecks we will have to do15:35
clarkbwhere elastic-recheck has been particularly useful is when you have an sdague, jogo, mtreinish, melwitt, or dansmith digging into broader failures and trying to address them15:35
dansmithyeah, I try to shame people that just blindly recheck,15:36
dansmithbut it's a bit of a losing battle15:36
dansmithstill, removing the *ability* to do real checking sucks :/15:36
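For readers unfamiliar with the tooling referenced here: elastic-recheck works by running saved Lucene-style queries against the logstash Elasticsearch cluster and matching failed builds against known bug signatures. A rough sketch of that kind of lookup follows; the endpoint, index pattern, field names, and signature string are placeholders rather than the production values.

    import json
    import urllib.request

    # Placeholder endpoint; the real cluster location and index pattern differ.
    ES_SEARCH = 'http://elasticsearch.example.org:9200/logstash-*/_search'

    query = {
        'query': {
            'query_string': {
                # The kind of signature an elastic-recheck query file holds.
                'query': ('message:"Timed out waiting for a reply" '
                          'AND tags:"job-output.txt"'),
            }
        },
        'size': 0,  # only the hit count matters here
    }

    req = urllib.request.Request(
        ES_SEARCH,
        data=json.dumps(query).encode(),
        headers={'Content-Type': 'application/json'},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
        # The shape of hits['total'] varies between Elasticsearch versions.
        print('matching log lines:', result['hits']['total'])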
clarkbI suspect the biggest impact will not be recheck problems but the once a cycle or so fix a very unstable gate15:36
dansmith...or a more continuously unstable gate15:36
gmannyeah15:36
clarkbya15:36
gmannwhich will directly impact our release 15:37
gmannor feature implementation15:37
fungithough it sounds like the entire cluster was broken for a couple weeks there before anyone noticed it wasn't returning results to their queries15:37
clarkbI think that also is part of why it has been so hard to find help for this. When it is a tool you use every 6 months it is less in your mind continuously for care and feeding15:37
clarkbfungi: yes, but no one notices if the gate is stable15:37
dansmithyeah15:37
clarkbwhich is a big underlying issue here imo15:38
clarkbpeople do notice when there are systemic problems in the gate that need addressing15:38
clarkbanother reason to have a rough timeline is it may help light a fire under people willing to help15:39
clarkbwhen I brought this up last week gmann suggested the end of the Yoga cycle as a potential deadline15:39
dansmithyeah, "no rush" is not as motivating15:39
gmannyeah, I am thinking end of Yoga will be more than 6 months after we make this last critical call for help15:40
clarkbThat ensures that Xena (hopefully) doesn't have any major changes to the stabilization process. Then in Yoga we can start planning for replacement/shutdown/etc (though that can start earlier too)15:40
gmannso if there is anyone who wants to help, they should be raising their hand by then 15:40
clarkbThat timeline seems reasonable to me15:40
gmannany objection on above deadline ?15:41
fungi"no rush" has also been tempered by "but might be tomorrow, depending on outside factors"15:41
clarkbfungi: yes and I think that is still the message from me15:41
dansmithI'm not happy about the timeline, but accept the need15:41
dansmith"happy with, not happy about" you might say :)15:42
gmanndansmith: meaning? is it too late or too early ?15:42
clarkbif we notice abuse of elasticsearch or logstash that requires upgrades to address we'll be in a situation where we don't have much choice15:42
jungleboyjI think that sounds like a reasonable timeline ... even though we don't want one.15:42
dansmithgmann: it's me being intentionally vague. I'm good with it, just not happy about it.. necessary, but I worry about the inevitable end where nobody has actually stepped up15:42
gmannclarkb: yeah, for outside factors we would not be able to do anything and would have to shut down early ?15:42
clarkbgmann: correct15:43
gmannk15:43
gmanndansmith: correct. my last hope was the Board providing paid resources but anyway that did not happen  15:43
clarkbAnother concern is the sheer size of the system. I've temporarily shut down 50% of the indexing pipeline and have been monitoring our indexing queue https://grafana.opendev.org/d/5Imot6EMk/zuul-status?viewPanel=17&orgId=1&from=now-24h&to=now 15:43
clarkbcompared to elasticsearch the logstash workers aren't huge but it is still something. I think I may turn on 10% again and leave it at 40% shutdown for another week then turn off the extra servers if that looks stable.15:44
clarkbcurrently we seem to be just barely keeping up with demand15:44
fungiyeah, that's just half the indexing workers, not half the system15:44
clarkb(and then having some headroom for feature freeze is a good idea hence only reducing by 40% total)15:45
gmannhow about keeping only check pipeline logs ? 15:45
clarkbgmann: I would probably do the opposite and only keep gate15:45
clarkbcheck is too noisy15:45
clarkbpeople push a lot of broken into check :)15:45
jungleboyjclarkb:  That makes sense to me.15:45
gmannclarkb: yeah but in check we do most of the debugging and make it more stable before it reaches the gate15:46
fungithe check pipeline results are full of noise failures from bad changes, while the gate pipeline should in theory be things which at least got through check and code review to approval15:46
clarkbbut that is another option and reducing the total amount of logs indexed would potentially allow us to remove an elasticsearch server or two (since the major factor there is total storage size)15:46
clarkbgmann: yes, but it is very hard to see anything useful in check because you can't really tell if things are just broken because someone didn't run tox locally or if they are really broken15:46
gmannyeah15:47
clarkbit is still useful to have check, often you want to go and see where something may have been introduced and you can trace that back to check15:47
clarkbbut if we start trimming logs check is what I would drop first15:47
clarkbas far as elasticsearch disk consumption goes we should have a pretty good indication of the current db size for 7 days of indexes at the beginning of next week15:47
clarkbthe data is currently a bit off since we had the cluster crash recently15:48
clarkbthat info is available in our cacti instance if you want to see what usage looks like. We have 6TB storage available but 5TB usable because we need to be tolerant of losing one server and its 1TB of disk15:48
clarkbIf we want to start pruning logs out then maybe we start that conversation next week when we have a good baseline of data to look at first15:49
gmannor truncate the log storage time? to 2-3 days15:49
clarkbyes that is another option15:49
fungithough that doesn't give you much history to be able to identify when a particular failure started15:50
fungia week is already fairly short in that regard15:50
clarkbyup, but may be enough to identify the source of problems and then work backward in code15:50
gmannyeah, we are going to lose that anyway15:50
clarkbas well as track what issues are still occurring15:50
gmannyes15:50
clarkbanyway I think discussion for pruning elasticsearch size is better next week when we have better data to look at. I'm happy to help collect some of that info together and discuss it further next week if we like15:51
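For concreteness, a minimal sketch of the shorter-retention option being discussed: delete daily logstash indexes older than a cutoff. The endpoint is a placeholder, the logstash-YYYY.MM.DD naming is the usual daily-index convention, and in practice a tool such as curator would normally handle this.

    import datetime
    import urllib.error
    import urllib.request

    ES_URL = 'http://elasticsearch.example.org:9200'  # placeholder endpoint
    KEEP_DAYS = 3

    today = datetime.date.today()
    for age in range(KEEP_DAYS, 15):  # look back a couple of weeks
        day = today - datetime.timedelta(days=age)
        index = day.strftime('logstash-%Y.%m.%d')
        req = urllib.request.Request(f'{ES_URL}/{index}', method='DELETE')
        try:
            urllib.request.urlopen(req)
            print('deleted', index)
        except urllib.error.HTTPError as err:
            if err.code != 404:  # 404 just means that day's index is gone
                raise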
fungii wonder if we could change the indexing threshold to >info instead of >debug15:51
clarkb(this is about all I had on this agenda item. I'll go ahead and make note of the Yoga deadline on the mailing list in a response to the thread I started a while back if I can find it)15:51
gmannclarkb: +1 that will be even better15:51
clarkbfungi: the issue with that is a good chunk of logs are the job-output.txt files now with no log level15:52
gmannclarkb: +1  and thanks for publishing deadline on ML15:52
clarkbfungi: this is why the warnings hurt so much15:52
fungiahh, yeah good point15:52
gmannon warnings, I will start a thread to fix them and start converting them to errors on the openstack lib side so that projects have to fix them15:52
gmann#action clarkb to convey the ELK service shutdown deadline on ML15:53
gmann#action gmann to send ML to fix warning and oslo side changes to convert them to error15:53
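As an aside on that second action item: the generic Python pattern for escalating a library's deprecation warnings into hard test failures is sketched below. It assumes the warnings go through the warnings module; the oslo.policy default-change messages currently go through the logger instead, which is why a change on the oslo side is needed first. The module pattern is illustrative only.

    import warnings

    # Turn DeprecationWarnings originating from the named library into errors
    # during a test run so the gate fails until callers are fixed.
    warnings.filterwarnings(
        'error',
        category=DeprecationWarning,
        module=r'oslo_policy(\.|$)',
    )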
gmannand we will continue discussing it next week 15:54
gmannthanks clarkb fungi for the updates and maintaining these services 15:54
jungleboyj++15:54
gmann#topic Open Reviews15:54
gmann#link https://review.opendev.org/q/projects:openstack/governance+is:open 15:54
gmannI added the link for Yoga release name announcement 15:55
gmannplease review that15:55
gmannalso Yoga testing runtime #link https://review.opendev.org/c/openstack/governance/+/799927 15:55
gmannwith no change from what we have in Xena15:55
gmannand this one about rbac goal proposal #link https://review.opendev.org/c/openstack/governance/+/799705 15:56
gmannand need one more vote in this project-update #link https://review.opendev.org/c/openstack/governance/+/799817 15:56
clarkbas a note on the python version available in focal I think 3.9 is available now15:57
spotz3.9 is also what will be in Stream 915:57
clarkboh I guess it is in universe though15:57
clarkbprobably good to test it but not make it the default15:57
gmannclarkb: I think 3.815:57
gmannclarkb:  we have unit test job as non voting for 3.915:57
clarkbgmann: it has both. But 3.8 is the default and not in universe :)15:58
gmannyeah, default15:58
gmannthat's all for me today, anything else to discuss ?15:59
gmannthough 1 min left15:59
jungleboyjNothing here.16:00
gmannif nothing, let's close meeting. 16:00
gmannk16:00
gmannthanks all for joining. 16:00
gmann#endmeeting16:00
opendevmeetMeeting ended Thu Jul  8 16:00:16 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/tc/2021/tc.2021-07-08-15.00.html 16:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/tc/2021/tc.2021-07-08-15.00.txt 16:00
opendevmeetLog:            https://meetings.opendev.org/meetings/tc/2021/tc.2021-07-08-15.00.log.html 16:00
spotzThanks gmann16:00
ricolinthanks gmann 16:00
jungleboyjThank you!16:00
*** pojadhav is now known as pojadhav|away16:52
opendevreviewGhanshyam proposed openstack/governance-sigs master: Moving IRC network reference to OFTC  https://review.opendev.org/c/openstack/governance-sigs/+/800135 23:39
