Wednesday, 2023-06-21

paladoxok it's finally started "104/191 (54.45%) partitions replicated in 899.48s (0.12/sec, 12m remaining)"00:02
paladoxalthough that time seems quick00:02
timburkeyeah, the time estimates are notoriously bad00:19
opendevreviewTim Burke proposed openstack/swift master: Green GreenDBConnection.execute  https://review.opendev.org/c/openstack/swift/+/86605101:16
opendevreviewTim Burke proposed openstack/swift master: tests: Fix replicator test for py311  https://review.opendev.org/c/openstack/swift/+/88653801:16
opendevreviewTim Burke proposed openstack/swift master: tests: Stop trying to mutate instantiated EntryPoints  https://review.opendev.org/c/openstack/swift/+/88653901:16
opendevreviewTim Burke proposed openstack/swift master: CI: test under py311  https://review.opendev.org/c/openstack/swift/+/88654101:16
paladoxtimburke: would you know why, even with the fallocate thing, it didn't stop swift filling up to 100%?10:05
opendevreviewPhilippe SERAPHIN proposed openstack/swift master: In the case where we can't stat the device, an error search in the Kernel logs must also be carried out, and the device unmounted if necessary  https://review.opendev.org/c/openstack/swift/+/88663312:37
opendevreviewJianjian Huo proposed openstack/swift master: proxy: add new metrics to account/container_info cache for skip/miss  https://review.opendev.org/c/openstack/swift/+/88579814:22
opendevreviewASHWIN A NAIR proposed openstack/swift master: combine duplicated code in replication and EC GET paths  https://review.opendev.org/c/openstack/swift/+/88664515:37
timburkepaladox, i think there are two main issues. first is that fallocate_reserve only works for data passing through the object-server; rsync traffic can fill a disk completely. even if you were using ssync for replication, though, since swift data and logs are all on the same drive, once fallocate_reserve trips and swift starts returning 507s you can find yourself filling up the disk with logs about the 507s :-(15:47
paladoxoh15:48
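A rough sketch of the fallocate_reserve knob timburke is describing, assuming the stock object-server.conf layout (the 2% value is only illustrative):

    [DEFAULT]
    # refuse new object writes once free space on a drive drops below this
    # amount; accepts an absolute byte count or a percentage of the disk
    fallocate_reserve = 2%

As noted above, this only guards writes that go through the object-server, so rsync replication traffic and local log files can still fill the drive.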
timburkeoh, that reminds me though! you might want to go looking for rsync tempfiles -- those could also be deleted to help free space16:17
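A hedged way to hunt for those rsync temp files, assuming drives are mounted under /srv/node (rsync names its partial files ".<name>.<random suffix>"):

    # list dot-prefixed rsync temp files older than a day; review before removing
    find /srv/node -name '.*.??????' -mtime +1 -ls

The -mtime filter is just a precaution so you don't touch a file rsync is still actively writing.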
opendevreviewTim Burke proposed openstack/swift master: CI: test under py311  https://review.opendev.org/c/openstack/swift/+/88654116:21
opendevreviewASHWIN A NAIR proposed openstack/swift master: combine duplicated code in replication and EC GET paths  https://review.opendev.org/c/openstack/swift/+/88664516:29
opendevreviewASHWIN A NAIR proposed openstack/swift master: combine duplicated code in replication and EC GET paths  https://review.opendev.org/c/openstack/swift/+/88664516:41
opendevreviewTim Burke proposed openstack/swift master: proxy: Bring back logging/metrics for get_*_info requests  https://review.opendev.org/c/openstack/swift/+/88493117:11
opendevreviewTim Burke proposed openstack/swift master: CI: Move py3 probe tests to centos 9 stream  https://review.opendev.org/c/openstack/swift/+/88665417:22
paladoxtimburke: would you know, for the following, how i would balance it correctly? 3 of the servers have 600g disks, 1 has 900 and 1 has 500. One of the disks has like 200g free but somehow all the other disks are full and it keeps sending requests there (uploading):19:40
paladoxhttps://www.irccloud.com/pastebin/OxBSaUf5/19:40
paladoxi thought 100 would work but it didn't, hence why i tried 4000/8000 like i saw someone else do (that didn't work properly for us either, so 4000/6000, but that didn't work either)19:42
paladoxoh there's a section on preventing disk-full scenarios at https://docs.openstack.org/swift/latest/admin_guide.html19:53
opendevreviewJianjian Huo proposed openstack/swift master: proxy: add new metrics to account/container_info cache for skip/miss  https://review.opendev.org/c/openstack/swift/+/88579820:32
opendevreviewTim Burke proposed openstack/swift master: Add a swift-reload command  https://review.opendev.org/c/openstack/swift/+/83317420:47
opendevreviewTim Burke proposed openstack/swift master: systemd: Send STOPPING/RELOADING notifications  https://review.opendev.org/c/openstack/swift/+/83763320:47
opendevreviewTim Burke proposed openstack/swift master: Add abstract sockets for process notifications  https://review.opendev.org/c/openstack/swift/+/83764120:47
kotagood morning20:53
timburke#startmeeting swift21:00
opendevmeetMeeting started Wed Jun 21 21:00:18 2023 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.21:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.21:00
opendevmeetThe meeting name has been set to 'swift'21:00
timburkewho's here for the swift meeting?21:00
kotahi21:00
acoleso/21:01
timburkei *did* remember to update the agenda! (though only just now)21:01
timburke#link https://wiki.openstack.org/wiki/Meetings/Swift21:01
timburkefirst up21:01
timburke#topic ssync metadata corruption bug21:02
timburke#link https://review.opendev.org/c/openstack/swift/+/88424021:02
timburkehas the fix21:02
timburkebut it could use reviews21:02
timburke#link https://review.opendev.org/c/openstack/swift/+/88495421:04
timburkeis a follow-on to update a probe test, but i'd like a little more consensus about switching to direct client instead of going through the proxy. if we have that consensus, i'm fine with squashing it into the fix21:04
acolesIIUC that is to avoid a swiftclient bug?21:06
timburkeyep -- py3 stdlib won't parse non-ascii header names correctly21:07
timburkei think i looked into working around it, but eventually gave up -- too many layers to cut through, and it's especially difficult to do it in a way that doesn't involve monkeypatching stdlib for *everything*21:09
timburke(which seems risky for client code)21:10
acolesok21:10
mattoliverSorry I'm late21:10
timburkecan anyone volunteer to review the fix?21:12
acolesI will21:14
timburkethanks, acoles. and maybe i can hunt down the bug reporter, have him try the fix and report back 😁21:15
mattoliverI'm still catching up on things, and stuck down a rabbithole at work, but can add it to my todo.21:15
timburke#topic get info backend request logging/metrics21:15
timburke#link https://review.opendev.org/c/openstack/swift/+/88493121:15
timburkeacoles and jian have done some reviews, thanks guys!21:16
timburkethere was a point at which i had it in a state where it didn't actually fix things, but i think it's in a good place again now21:17
timburkeso if you get a chance, i'd appreciate some fresh eyes on it. and thanks again for the new tests, acoles!21:17
acolesNP21:18
timburke#topic py311 support21:18
acolesyes I will take another look21:18
timburkei took a bit of time the last week or so to get to the point of having tests pass on py31121:19
timburkeculminating in having a passing gate job!21:19
timburke#link https://review.opendev.org/c/openstack/swift/+/88654121:19
timburkethere are a few pre-req patches to fix up some tests, but zaitcev has been quick to review & approve (thanks!)21:20
zaitcevSure.21:20
mattoliverOh nice21:21
timburkei maybe should have proposed them as separate changes, with a Depends-On in the CI change to bring them all together21:21
zaitcevI may not know how sharding works, but I know what a subclass is in Python.21:21
timburke'cause the base of the chain could probably use a bit of work (better commit message, bug report, probably even an upstream python bug)21:22
timburkenote that the gate job is still using jammy, which has got a 3.11.0 RC, so it still needed the __slots__ workaround for the segfault21:23
timburkenext up21:24
timburke#topic tagged metrics21:24
timburkei forget if i'd mentioned it before, but i finally got a patch up to try out some statsd extensions for labeled metrics21:25
timburke#link https://review.opendev.org/c/openstack/swift/+/88532121:25
mattoliverI'm really interested in checking it out! I was off last week, so will try and get around to poking around it this week21:26
timburkei'd really appreciate it if people could take a look at how it affects the calling code in something like proxy-logging, say, before i get too far into fixing up tests and such21:27
indianwhocodessorry i'm late21:27
timburkeand i still need to get some docs together about how to try it out in a SAIO21:27
mattoliverThe docs would be good21:28
timburkeall right, that's all i've got21:28
timburke#topic open discussion21:29
timburkeanything else we should bring up?21:29
acolesexciting!21:29
acoleslabeled metrics I mean :)21:29
mattoliverI got nothing this week21:30
timburkeit's been like 3 months since our last release, i should put another one together21:31
timburkeif anyone has patches they feel should be sure to get into the next release, please let me know!21:31
mattoliverKk21:32
timburkeall right, i think i'll call it then21:33
timburkethank you all for coming, and thank you for working on swift!21:33
timburke#endmeeting21:33
opendevmeetMeeting ended Wed Jun 21 21:33:24 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)21:33
opendevmeetMinutes:        https://meetings.opendev.org/meetings/swift/2023/swift.2023-06-21-21.00.html21:33
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/swift/2023/swift.2023-06-21-21.00.txt21:33
opendevmeetLog:            https://meetings.opendev.org/meetings/swift/2023/swift.2023-06-21-21.00.log.html21:33
timburkeoh, and there were messages from paladox! so the standard way to assign weights is to have them match the size of the drive -- given 3x600GB, 1x900GB, and 1x500GB servers, i'd expect the ring to have three disks with weight 600, and one each with 900 and 500. the exact value doesn't really matter, but the *ratio* between weights really does (so it could be 6, 6, 6, 9, 5, say)21:41
paladoxohhhhhhhhhhhhh21:41
paladoxthank you so much! going to do that now21:42
paladoxtimburke:  if the disk size is like 525g, the weight would be 500?21:43
paladoxAlso do you know the best way to repair orphaned data/objects?21:43
paladox(and 1tb i guess is 1000)21:45
timburkeit's kinda up to you what values you want -- if it were me, i'd probably look at the output of df or something and truncate a few decimal places21:46
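A hedged sketch of what that looks like with swift-ring-builder; the builder file name and the device IDs d0..d4 are placeholders for this example:

    swift-ring-builder object.builder set_weight d0 600
    swift-ring-builder object.builder set_weight d1 600
    swift-ring-builder object.builder set_weight d2 600
    swift-ring-builder object.builder set_weight d3 900
    swift-ring-builder object.builder set_weight d4 500
    swift-ring-builder object.builder rebalance
    # then copy the rebuilt object.ring.gz out to every node

Only the ratio between weights matters, so for the 525GB disk a weight of 525 or 500 is equally workable.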
timburkewhat do you mean by "orphaned"? like, old SLO/DLO segments whose manifests have been deleted?21:47
timburkethere are several complications with trying to clean up segments, but they all really stem from the same central problem: segment data is just another object that can be uploaded and referenced21:51
timburkeso problem 1: users may have uploaded data as part of a large object that they *also* want to be able to reference directly. for example, you might have daily log files getting uploaded with some naming convention that makes it easy to *also* have DLOs to roll them up monthly21:54
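A hedged illustration of that roll-up pattern using the python-swiftclient CLI (the container and object names are made up for this example):

    # daily log files uploaded as ordinary, directly-addressable objects
    swift upload logs 2023/06/21.log
    # a zero-byte DLO manifest whose X-Object-Manifest prefix stitches the month together
    touch empty && swift upload --object-name 2023/06-rollup \
        --header 'X-Object-Manifest: logs/2023/06/' logs empty

Deleting the manifest leaves the daily objects behind, which is exactly why "orphaned" segments aren't necessarily safe to clean up.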
VoidwalkerPicking up from paladox here, our problem with orphaned data/files is more to do with the fact that we have files that exist on the server that don't exist in the container's listing, and we are trying to figure out how to get the files listed again21:56
timburkeah -- have you checked async pendings? how are the object-updater logs looking?21:57
timburkeand are the container DBs on the same handful of full disks?21:57
VoidwalkerIt's the result of a crash on our account server -- many of the db files there were corrupted and needed to be replaced21:57
timburkei think i've got a script somewhere that could re-send the container update... i'd have to dig for a bit22:03
timburkeif you wanted to go the other way, though, and delete data not in listings, we've got a dark data watcher; see https://github.com/openstack/swift/blob/2.31.1/etc/object-server.conf-sample#L596-L61322:03
timburke(we could probably add a re-send-the-update mode to that...)22:03
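A minimal sketch of turning that watcher on, following the layout of the linked conf sample (the grace_age value here is illustrative, roughly one week in seconds):

    [object-auditor]
    watchers = swift#dark_data

    [object-auditor:watcher:swift#dark_data]
    # start with "log" to see what it would flag; "delete" actually removes dark data
    action = log
    # skip recently-written objects so container updates have time to land
    grace_age = 604800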
timburkebut i'd start by getting disks less full (probably ideally by adding another server or two with fresh disks), then checking on the state of async pendings, then start figuring out how to get listings back into shape22:05
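A hedged way to check on those async pendings, assuming swift-recon is set up and drives live under /srv/node:

    # cluster-wide async_pending counts via recon
    swift-recon --async
    # or count them directly on a single object node
    find /srv/node/*/async_pending* -type f | wc -l

If the counts aren't draining once the disks have headroom, the object-updater logs are the next place to look, as mentioned above.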
VoidwalkerI've already got a script in place to delete the files we're not repairing from the disk, but it might be a good idea to wait on expanding the available storage22:08
*** Voidwalker is now known as Guest376522:14
timburkethe trouble is that full disks will complicate a lot of things -- you usually want to issue a real DELETE through the swift API when cleaning up dark data, that way you aren't fighting with replication when directly rm'ing files. but DELETEs create tombstones, and tombstones need disk space...22:17
timburkemeanwhile, if the container disks are *also* full, the object-server and updater won't be able to write new rows, so async pendings can't clear22:18
kotaum, just FYI: I'll be at SFO next week for a business trip.23:57
