Tuesday, 2023-03-14

clarkbmeeting time19:00
clarkbI'm a bit behind due to the docker stuff. Please excuse my lack of organization today19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Mar 14 19:01:16 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/YZXXWZ7LB3KEF3AMJV3WIPFKCGH2IA2O/ Our Agenda19:01
clarkb#topic Announcements19:02
clarkbDaving saving time has gone into effect for some of us. Heads up that it will go into effect for others in 2-3 weeks as well19:02
clarkbheh I can't type either. *Daylight saving time19:02
fungii favor switching to daving savelights time19:03
clarkbOur meeting doesn't change the time it occurs at. It remains at 19:00 UTC but this time may have shifted relative to your local timezone due to the time change19:03
clarkbOpenStack is making its 2023.1/Antelope release next week. That should occur on a wednesday so roughly 8 days from now19:03
fungiyeah, "festivities" will likely start around 09:00 utc19:04
fungimaybe a bit later19:04
fungirelease notes jobs in the tag pipeline will need about 8-10 hours due to serialization19:04
fungiwould love to work out a better option than that semaphore at some point19:05
clarkbits only there to prevent errors that aren't actually fatal in the docs jobs right?19:05
clarkbI mean you could just remove the semaphore and tell them to validate docs publication?19:05
clarkbor maybe I'm confusing issues and there is a more important reason to have the semaphore19:05
fungiwell, it's there to solve when someone approves release requests for several branches of the same project and they race uploads of the release notes and one regresses the others19:06
fungibecause all branches share the same tree in afs19:06
fungiso they need a per-project semaphore, which doesn't really exist (without defining a separate one for each of hundreds of repos)19:06
clarkbaha, could possibly remove the semaphore temporarily for the release since only that one branch should be getting releases on that day?19:07
fungipossible, i'll bring it up with them19:07
clarkbThe week after next the virtual PTG will be taking place19:08
clarkbAnd that was it for announcements19:09
clarkb#topic Bastion Host Changes19:09
clarkbianw: are you here? I was hoping we'd be able to decide on whether or not we are proceeding with the backups stack.19:09
clarkb#link https://review.opendev.org/q/topic:bridge-backups19:09
ianwyes :)19:09
clarkbIt looks like you may need a second reviewer? Something we should probably do in this case since we need multiple people to stash keys?19:10
clarkbAny volunteers for second reviews?19:10
ianwyes, and probably a few people to say they're on board with holding a secret for it, otherwise it's not going to work19:10
fungii can try to take a look, and am happy to safeguard a piece of the key19:11
clarkbI'm happy to stash the bits into my keepassxc db19:11
ianwok, well if fungi can take a look we can move one way or the other19:12
clarkbfungi: thanks! I think thats the next step then. Get a second review and assuming review is happy make a plan to distribute the right key bits19:12
clarkbanything else bridge related?19:12
ianwnope, not for now19:13
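
For context on the key-holding discussion above: the backup secret is divided among several roots so that no single person can restore it alone. Below is a minimal sketch of that general idea in Python, assuming a simple all-shares-required XOR split; the actual bridge-backups changes under review may well use a different scheme, and the names here are illustrative only.

# Minimal sketch of an all-shares-required split: XOR the secret with
# random pads so that no single holder can recover it alone. This is an
# assumption for illustration, not the scheme used by the reviewed changes.
import secrets
from functools import reduce

def split_secret(secret: bytes, holders: int) -> list[bytes]:
    """Split `secret` into `holders` shares; all shares are needed to rebuild."""
    pads = [secrets.token_bytes(len(secret)) for _ in range(holders - 1)]
    last = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), pads, secret)
    return pads + [last]

def join_secret(shares: list[bytes]) -> bytes:
    """Recombine all shares by XOR to recover the original secret."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), shares)

if __name__ == "__main__":
    key = b"example backup passphrase"
    shares = split_secret(key, 3)
    assert join_secret(shares) == key
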
clarkb#topic Mailman 319:13
clarkbfungi: I haven't seen anything new here, but want to make sure I didn't miss anything19:14
corvusi'm on board for being a (partial) keymaster19:15
fungiyeah, i got very close to picking it back up today, before things started to get exciting again19:16
fungiso nothing new to share yet19:16
clarkbyes boring would be nice occasionally19:16
fungivinz clortho, keymaster of gozer19:16
clarkb#topic Gerrit Updates19:16
clarkbianw's stack of copyCondition and submit requirements changes has landed as has the manual update to All-Projects for submit requirements19:17
clarkbWe did run into some problems with the All-Projects update because 'and' and 'AND' are different in Gerrit 3.6 query expressions19:17
clarkbBut that got sorted out and I think things have been happy since (at least no new complaints since then)19:17
fungibut not in 3.7. that seems like an unfortunate choice of fix not to backport19:17
clarkbianw: from your work on this are there other ACL updates you think we need to make or are we all up to date for modern Gerrit 3.7 expectations?19:18
ianwnope, i think we're ready for the 3.7 transition from that POV now19:18
ianwi will spend a little time updating https://etherpad.opendev.org/p/gerrit-upgrade-3.7 today19:19
clarkbgreat!19:19
ianwa couple of things to check, but i think all known knowns and known unknowns are dealt with :)19:19
clarkbianw: as far as ensuring we don't slide backwards goes can we update the little checker tool to only allow function = NoBlock and require copyCondition not the old thing?19:19
clarkbI think if we do those two things it will prevent any cargo culting of old info accidentally19:20
ianwoh yes, sorry that's on my todo list.  the snag i hit was that the normalizer isn't really a linter in the way of a normal linter, but a transformer, and then if there's a diff it stops19:20
clarkbya in that case maybe just delete the lines we don't want which will produce a diff19:20
clarkband hopefully that diff is clear that we don't want those lines because they are removed (don't need to replace them with an equivalent as that would be more effort)19:20
ianwi guess the problem is that that then creates a diff that is wrong19:21
ianwi wasn't sure if the point was that you could apply the diff19:21
ianwif so, it kind of implies writing a complete function -> s-r transformer19:21
clarkbI think the idea was the diff would help people correct their changes and bonus points if you could directly apply it19:21
clarkbin this case I think it is ok if we have a diff that isn't going to completely fix changes for people and simply force an error and pull the eye to where the problem is19:22
ianwi could do something like add a comment line "# the following line is deprecated, work around it"?19:22
clarkb++19:22
ianwok, i'll do that then19:22
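
A rough illustration of the checker behaviour discussed above: flag old-style label options (function values other than NoBlock, and the copy* settings replaced by copyCondition) by inserting a comment line so the resulting diff draws the eye to them. The option names in the sample are intended to match Gerrit 3.6+ project.config syntax, but the helper itself is hypothetical and is not the normalizer used in project-config.

# Hypothetical sketch of the check ianw describes: walk project.config
# label lines and insert a comment above any deprecated option so the
# resulting diff points reviewers at the problem.
DEPRECATED_PREFIXES = (
    "copyAllScores",      # old copy* options replaced by copyCondition
    "copyMinScore",
    "copyMaxScore",
)

def annotate_deprecated(lines: list[str]) -> list[str]:
    out = []
    for line in lines:
        stripped = line.strip()
        is_old_function = (
            stripped.startswith("function =") and "NoBlock" not in stripped
        )
        is_old_copy = stripped.startswith(DEPRECATED_PREFIXES)
        if is_old_function or is_old_copy:
            out.append("# the following line is deprecated, use submit "
                       "requirements / copyCondition instead")
        out.append(line)
    return out

sample = [
    '[label "Verified"]',
    "    function = MaxWithBlock",
    "    copyAllScoresOnTrivialRebase = true",
    "    copyCondition = changekind:TRIVIAL_REBASE",
]
print("\n".join(annotate_deprecated(sample)))
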
clarkb#topic Project Renames and Gerrit Upgrade19:24
clarkbQuick check if we think we are still on track for an April 7th upgrade of Gerrit and project renames19:24
clarkbI think the only concern that has come up is the docker org deletion on the 14th19:25
clarkbmostly worried that will demand our time and we won't be able to prep for gerrit things appropriately. But it is probably too early to cancel or move the date for that. Mostly bringing it up as a risk19:25
clarkbAnd then I wanted to talk about the coordination of that. Do we want to do the renames and upgrade in one window or two separate windows? And what sorts of times are we looking at?19:26
clarkbianw: I think you were thinking of doing the Gerrit upgrade late April 6 UTC or early April 7 UTC? Then maybe fungi and I do the renames during our working hours April 7 if we do two different windows19:27
clarkbIf we do one window I can be around to do it all late April 6 early APril 7 but I think that gets more difficult for fungi19:28
ianwi guess question 1 is do we want renames or upgrade first?19:28
corvusi agree it's worth keeping an eye on, and if anyone feels overburdened, raise a flag and we can slow down or deal with it.  but from right now at least, i think we can work on both.19:28
fungii can swing it19:29
fungii just can't do the week after as i'll be offline19:29
clarkbianw: I think one reason to do renames first would be if we had previously done renames under that gerrit version. But we have never renamed anything under 3.6 so order doesn't matter much19:29
clarkbfungi: ianw  ok in that case maybe aim for ~2200-2300 UTC April 6 and do both of them?19:30
clarkband we can sort out the order another time if we're committing to a single block like that19:30
ianwok, if that's a bit late we could move it forward a few hours too19:31
fungiwfm19:31
clarkbOk with that decided (lets say 2200 UTC to make it a bit easier for fungi) should we send email about that now?19:31
clarkbfor some value of now approximately equal to soon19:32
ianw++19:32
clarkbI can do that I just want to make sure we're reasonably confident first19:32
clarkbcool I'll add that to my todo list19:32
fungithanks!19:32
ianwi am happy to drive, and we'll have checklists, so hopefully it's really just don't be drunk at that time in case the worst happens :)19:32
clarkbhaha19:32
ianwor maybe, get drunk, in case the worst happens.  either way :)19:32
clarkb#topic Old Server Upgrades19:33
clarkbMuch progress has been made with the giteas.19:33
clarkbAs of Friday we're entirely jammy for the gitea cluster in production behind the load balancer19:33
clarkbI have changes up to clean up gitea01-04 but have WIP'd them because I think the openstack release tends to be a high load scenario for the giteas and that is a good sanity check we won't need those servers before deleting them19:34
clarkbI'll basically aim to keep replicating to the gitea01-04 backends until after the openstack release and if all looks well after that clean them up19:34
fungiyeah, especially when some of the deployment projects update and all their users start pulling the new release at the same time19:35
clarkbthere are two reasons for the caution here. The first is that we've changed the flavor type for the new servers and we've seen some high cpu steal at times. But those flavors are bigger on more modern cpus so in theory will be quicker anyway so I've reduced the gitea backend count from 6 to 819:35
clarkb* 8 to 619:35
clarkbso far though those new servers have looked ok19:35
clarkbjust want to keep an eye out through the release before making the cleanup more permanent19:35
clarkbianw has also started looking at nameserver replacements19:36
clarkb#link https://etherpad.opendev.org/p/2023-opendev-dns19:36
clarkb#link https://review.opendev.org/q/topic:jammy-dns19:36
clarkbgood news the docker stuff doesn't affect dns :)19:36
ianwyep sorry got totally distracted on that, but will update all that now that we've got consensus on the names19:36
fungithanks for working on it19:36
clarkbYa this is all good progress. Still more work to do including etherpad which I had previously planned to do after the PTG19:37
clarkbits possible to get it done quickly pre ptg but the ptg relies on etherpad so much I'd kinda prefer changing things after19:37
clarkbjitsi meet as well19:37
corvusclarkb: the gitea graphs look good.  qq (i hope it's quick, if not, nevermind and we can take it offline) -- what happened between march 7-9 -- maybe we had fewer new servers and then added more?19:37
clarkbcorvus: yes, we had 4 new servers and we also got hit by a bot crawler that was acting like a 2014 samsung phone19:38
clarkbcorvus: we addressed that by updating our user agent block list to block the ancient phone and added two more servers for a total of 619:38
clarkbI thought we might get away with 4 servers instead of 8 but that incident showed that was probably too small19:39
fungiso the issue was twofold: a bad actor and fewer backends19:39
corvuscool; thanks19:39
fungiit noticeably slowed down response times for clients too19:39
fungiwhile that was going on19:39
clarkbIf I get time this week or next I'll probably try to do a server or two that the ptg doesn't interact with (mirror nodes maybe?)19:40
clarkbanyway measurable progress here. Thanks for all the help19:40
clarkb#topic AFS volume quotas and utilization19:40
clarkbLast week I bumped AFS quotas for the volumes that were very close to the limit19:41
clarkbThat avoided breaking any of those distro repo mirrors which is great. But doesn't address the ever-growing disk utilization problem19:41
clarkbalso it looks like deleting fedora 35 and adding fedora 37 resulted in a net increase of disk utilization19:41
ianwi should be able to remove 36 fairly quickly19:42
clarkbI did poke around looking for some easy wins deleting things (something that has worked well in the past) and didn't really come up with any other than: Maybe we can drop the openeuler mirror and force them to pull from upstream like we do with rocky?19:42
clarkbianw: oh thats good to know19:42
fungithere's also a debian release coming up which we'll probably need at least a temporary bump in capacity for before we can drop old-oldstable19:42
clarkbMaybe lets get that done before making any afs decisions. The other idea I had was we should maybe consider adding a new backing volume to the two dfw fileservers19:42
clarkbI don't think this is urgent as long as we are not adding new stuff (debian will force the issue when that happens)19:44
clarkbI guess start with fedora 36 cleanup then evaluate what is necessary to add new debian content19:44
fungiworth trying to find out if debian-buster images are still heavily used, or transition them off our mirroring if they are infrequently used but unlikely to get dropped from use soon19:44
fungiin order to free up room for debian-bookworm in a few months19:44
clarkbfungi: ya thats an option. Can also make buster talk to upstream if infrequently used19:45
ianwyou can't have one *volume* > 2tb right (that was pypi's issue?)19:45
clarkbbut keep the images19:45
clarkbianw: correct19:45
clarkbianw: we can add up to 12 cinder volumes each a max of 1TB (these are cloud limitations) to the lvm on the fileservers so we are wll under total afs disk potential19:45
fungiyeah, that's what i meant by transition off our mirroring19:45
clarkbbut then an individual afs volume can't be more than 2TB19:46
fungibut also the more cinder devices we attach, the more precarious the server becomes19:46
ianwi guess the only problem is if those screw up, it becomes increasingly difficult to recover19:47
ianwheh, jinx19:47
corvuswe can add more servers19:47
clarkbya and also just general risk of an outage19:47
fungiit basically multiplies the chances of the server suffering a catastrophic failure from an iscsi incident19:47
fungiright, more afs servers with different rw volumes may be more robust than adding more storage to one server19:48
corvus(doesn't affect our overall chances of being hit by an iscsi incident, but may contain the fallout and make it easier to recover)19:48
fungithe risk of *an* outage doesn't decrease, but the impact of an outage for a single device or server decreases to just the volumes served from it19:48
clarkbcorvus: does growing vicepa require services be stopped?19:49
ianwwe also add everything under vicepa -- we could use other partitions?19:49
clarkbif so that may be another good reason to use new servers19:49
clarkbianw: heh jinx. I'm not sure what the mechanics of the underlying data are like and whether or not one approach should be preferred19:50
fungialso, vos release performance may improve, since we effectively serialize those today with the assumption that otherwise we'll overwhelm the one server with the rw volumes19:50
clarkbwe've only got 10 minutes left and there are a couple of other things I wanted to discuss. Lets keep afs in mind and we can brainstorm ideas going forward but it isn't urgent today19:50
clarkbmore of a mid term thing19:51
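
For reference, a sketch of the kind of quota check behind the numbers discussed in this topic, assuming the OpenAFS fs listquota output layout; the paths are only examples and the actual opendev tooling may gather this differently (e.g. via grafana or the existing afs scripts).

# Rough sketch of spotting AFS volumes near quota by parsing `fs listquota`
# output. Mirror paths here are examples, not a complete list.
import subprocess

MIRROR_PATHS = [
    "/afs/openstack.org/mirror/ubuntu",
    "/afs/openstack.org/mirror/fedora",
]

def usage_percent(path: str) -> float:
    """Return used/quota (as a percentage) for the volume backing `path`."""
    out = subprocess.run(
        ["fs", "listquota", path], capture_output=True, text=True, check=True
    ).stdout.splitlines()
    # Expected layout (may vary by OpenAFS version):
    # Volume Name   Quota   Used   %Used   Partition
    _name, quota, used = out[1].split()[:3]
    return 100.0 * int(used) / int(quota)

for p in MIRROR_PATHS:
    pct = usage_percent(p)
    if pct > 90:
        print(f"{p}: {pct:.0f}% of quota used, consider a bump or cleanup")
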
clarkb#topic Quo vadis Storyboard19:51
corvuswe're not using raw partitions, we're using ext filesystems, so i don't think anything needs to be stopped to grow it, but i'm not positive on that.19:51
clarkbcorvus: ack19:51
clarkbfrickler won't be able to attend this meeting today but made a good point that with the PTG coming up there may be discussions from projects about not using storyboard anymore19:51
clarkbI mentioned in #opendev that I think we should continue to encourage those groups to work together and coordinate any tooling they might produce so that we don't have duplicated efforts19:52
clarkbBut does leave open the question for what we should do. I also mentioned in #opendev that if I was a lone person making a decision I think I'd look at sunsetting storyboard since we haven't been able to effectively operate/upgrade/maintain it19:52
clarkbwith an ideal sunset involving more than 30 days notice and if we can make some sort of read-only archive that is easier to manage19:53
clarkbThat said I don't think I should make decisions like that alone so am open to feedback and other ideas19:53
clarkbI'm also happy to jump into ptg sessions that involve storyboard to try and help where I can during the ptg19:53
clarkbMaybe y'all can digest those ideas and let me know if they make sense or are terrible or have better ones :)19:54
clarkbDefinitely not something we have time for today or in this meeting. But the feedback would be helpful19:54
ianwperhaps sunsetting it would be the push someone needs to dedicate resources on it?19:54
clarkbianw: its possible19:54
clarkbI think that is unlikely but it is a theoretical outcome19:55
ianweither way something happens then, i guess19:55
clarkbok running out of time and one more item remains19:55
clarkbthis is not on the agenda but worth bringing up19:55
clarkb#topic Docker ending free team organizations19:55
fungibecause people will ask about it anyway ;)19:55
clarkbDocker is ending their free team organization setup which we use for opendevorg and zuul on docker hub19:56
clarkb(there are actually two other orgs openstackinfra and stackforge which are unused and empty)19:56
clarkbThis will affect us one way or another and we are very likely going to need to make changes19:56
clarkbIt isn't clear yet which changes we will need to make and of the options which we should take but I started an etherpad to collect info and try to make that decision making easier19:56
clarkb#link https://etherpad.opendev.org/p/MJTzrNTDMFyEUxi1ReSo19:57
clarkbI think we should continue to gather information and collect ideas there for the next day or two without trying to attribute too much value to any of them. Then once we have a good clear picture make some decisions19:57
corvusone point it would be useful to clarify is whether it's possible, and if so how, we can have an unpaid organization on quay.io to host our public images.  quay.io says that's possible, but i only see a $15/mo developer option on the pricing page, and account signup requires a phone number.19:57
clarkbIf you sign up with a phone number and get what you need I'm happy to sacrifice mine19:58
clarkbianw: ^ maybe that is something you can ask about at red hat?19:58
clarkbbasically clarify what account setup requirements are and if public open source projects need to pay for public image hosting19:58
corvus(i'd be happy to sign up to find out too, except that i wear a lot of hats, and if i only get one phone number, i don't know if i should burn it for "opendev" "zuul" or "acme gating"...)19:59
ianwi can certainly look into it -- off the top of my head i don't know anyone directly involved for instant answers but i'll see what i can find19:59
clarkbianw: thanks!19:59
corvus(or maybe it's okay to have two accounts with the same phone number.. <shrug>)19:59
clarkbAlso NeilHanlon (Rocky Linux) and Ramereth (OSUOSL) have similar issues/concerns with this and we may be able to learn from each other. They have both applied for docker's open source program which is apparently one way around this20:00
clarkbI asked them to provide us with info on how that goes just so that we've got it and can weigh that option20:00
fungior at least someone on twitter with the same name as a c-level exec at docker claimed that they won't delete teams who apply for the open source tier20:01
* fungi takes nothing for granted these days20:01
clarkbyes, they also say they won't allow names to be reused which means if/when we get our orgs deleted others shouldn't be able to impersonate us20:01
clarkbthis is important because docker clients default to dockerhub if you don't qualify the image names with a location20:02
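
To illustrate that default-registry point: an image reference without a registry host implicitly means Docker Hub, so any move to another registry means rewriting references explicitly. A small sketch of that rule follows; quay.io is used here only as an assumed example destination, not a decision from this meeting.

# Sketch of the "unqualified names mean Docker Hub" behaviour: add an
# explicit registry host to references that lack one. The quay.io default
# and the image names in the asserts are assumptions for illustration.
def qualify(image: str, registry: str = "quay.io") -> str:
    """Prefix `image` with a registry host if it does not already have one."""
    first, sep, _rest = image.partition("/")
    # The leading component counts as a registry host only if it looks like
    # one (contains a dot or port, or is "localhost") and a path follows it.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return image
    return f"{registry}/{image}"

assert qualify("opendevorg/python-builder:3.11") == "quay.io/opendevorg/python-builder:3.11"
assert qualify("docker.io/opendevorg/python-builder:3.11") == "docker.io/opendevorg/python-builder:3.11"
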
clarkband we are at time. I can smell lunch too so we'll end here :)20:02
clarkbThank you everyone!20:02
clarkb#endmeeting20:02
opendevmeetMeeting ended Tue Mar 14 20:02:53 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:02
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-03-14-19.01.html20:02
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-03-14-19.01.txt20:02
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-03-14-19.01.log.html20:02
corvusfungi: good point; technically someone on github who says they're at docker said there won't be namespace takeover issues... but they don't actually have any flair that would confirm their position...20:03
fungioof20:04
