Wednesday, 2023-06-21

paladoxok it's finally started "104/191 (54.45%) partitions replicated in 899.48s (0.12/sec, 12m remaining)"00:02
paladoxalthough that time seems quick00:02
timburkeyeah, the time estimates are notoriously bad00:19
opendevreviewTim Burke proposed openstack/swift master: Green GreenDBConnection.execute  https://review.opendev.org/c/openstack/swift/+/86605101:16
opendevreviewTim Burke proposed openstack/swift master: tests: Fix replicator test for py311  https://review.opendev.org/c/openstack/swift/+/88653801:16
opendevreviewTim Burke proposed openstack/swift master: tests: Stop trying to mutate instantiated EntryPoints  https://review.opendev.org/c/openstack/swift/+/88653901:16
opendevreviewTim Burke proposed openstack/swift master: CI: test under py311  https://review.opendev.org/c/openstack/swift/+/88654101:16
paladoxtimburke: would you know why, even with the fallocate thing, it didn't stop swift filling up to 100%?10:05
opendevreviewPhilippe SERAPHIN proposed openstack/swift master: In the case where we can't stat the device, an error search in the Kernel logs must also be carried out, and the device unmounted if necessary  https://review.opendev.org/c/openstack/swift/+/88663312:37
opendevreviewJianjian Huo proposed openstack/swift master: proxy: add new metrics to account/container_info cache for skip/miss  https://review.opendev.org/c/openstack/swift/+/88579814:22
opendevreviewASHWIN A NAIR proposed openstack/swift master: combine duplicated code in replication and EC GET paths  https://review.opendev.org/c/openstack/swift/+/88664515:37
timburkepaladox, i think there are two main issues. first is that fallocate_reserve only works for data passing through the object-server; rsync traffic can fill a disk completely. even if you were using ssync for replication, though, since swift data and logs are all on the same drive, once fallocate_reserve trips and swift starts returning 507s you can find yourself filling up the disk with logs about the 507s :-(15:47
paladoxoh15:48
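A rough sketch of the fallocate_reserve knob timburke is describing, assuming the stock object-server.conf layout (the 2% value is only illustrative):

    [DEFAULT]
    # refuse new object writes once free space on a drive drops below this
    # amount; accepts an absolute byte count or a percentage of the disk
    fallocate_reserve = 2%

As noted above, this only guards writes that go through the object-server, so rsync replication traffic and local log files can still fill the drive.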
timburkeoh, that reminds me though! you might want to go looking for rsync tempfiles -- those could also be deleted to help free space16:17
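A hedged way to hunt for those rsync temp files, assuming drives are mounted under /srv/node (rsync names its partial files ".<name>.<random suffix>"):

    # list dot-prefixed rsync temp files older than a day; review before removing
    find /srv/node -name '.*.??????' -mtime +1 -ls

The -mtime filter is just a precaution so you don't touch a file rsync is still actively writing.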
opendevreviewTim Burke proposed openstack/swift master: CI: test under py311  https://review.opendev.org/c/openstack/swift/+/88654116:21
opendevreviewASHWIN A NAIR proposed openstack/swift master: combine duplicated code in replication and EC GET paths  https://review.opendev.org/c/openstack/swift/+/88664516:29
opendevreviewASHWIN A NAIR proposed openstack/swift master: combine duplicated code in replication and EC GET paths  https://review.opendev.org/c/openstack/swift/+/88664516:41
opendevreviewTim Burke proposed openstack/swift master: proxy: Bring back logging/metrics for get_*_info requests  https://review.opendev.org/c/openstack/swift/+/88493117:11
opendevreviewTim Burke proposed openstack/swift master: CI: Move py3 probe tests to centos 9 stream  https://review.opendev.org/c/openstack/swift/+/88665417:22
paladoxtimburke: would you know, for the following, how i would balance it correctly? 3 of the servers have 600g disks, 1 has 900 and 1 has 500. One of the disks has like 200g free but somehow all the other disks are full and it keeps sending requests there (uploading):19:40
paladoxhttps://www.irccloud.com/pastebin/OxBSaUf5/19:40
paladoxi thought 100 would work but it didn't, hence why i tried 4000/8000 like i saw someone else do (that didn't work properly for us either, so 4000/6000, but that didn't work either)19:42
paladoxoh there's a section on preventing disk-full scenarios at https://docs.openstack.org/swift/latest/admin_guide.html19:53
opendevreviewJianjian Huo proposed openstack/swift master: proxy: add new metrics to account/container_info cache for skip/miss  https://review.opendev.org/c/openstack/swift/+/88579820:32
opendevreviewTim Burke proposed openstack/swift master: Add a swift-reload command  https://review.opendev.org/c/openstack/swift/+/83317420:47
opendevreviewTim Burke proposed openstack/swift master: systemd: Send STOPPING/RELOADING notifications  https://review.opendev.org/c/openstack/swift/+/83763320:47
opendevreviewTim Burke proposed openstack/swift master: Add abstract sockets for process notifications  https://review.opendev.org/c/openstack/swift/+/83764120:47
kotagood morning20:53
timburke#startmeeting swift21:00
opendevmeetMeeting started Wed Jun 21 21:00:18 2023 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.21:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.21:00
opendevmeetThe meeting name has been set to 'swift'21:00
timburkewho's here for the swift meeting?21:00
kotahi21:00
acoleso/21:01
timburkei *did* remember to update the agenda! (though only just now)21:01
timburke#link https://wiki.openstack.org/wiki/Meetings/Swift21:01
timburkefirst up21:01
timburke#topic ssync metadata corruption bug21:02
timburke#link https://review.opendev.org/c/openstack/swift/+/88424021:02
timburkehas the fix21:02
timburkebut it could use reviews21:02
timburke#link https://review.opendev.org/c/openstack/swift/+/88495421:04
timburkeis a follow-on to update a probe test, but i'd like a little more consensus about switching to direct client instead of going through the proxy. if we have that consensus, i'm fine with squashing it into the fix21:04
acolesIIUC that is to avoid a swiftclient bug?21:06
timburkeyep -- py3 stdlib won't parse non-ascii header names correctly21:07
timburkei think i looked into working around it, but eventually gave up -- too many layers to cut through, and it's especially difficult to do it in a way that doesn't involve monkeypatching stdlib for *everything*21:09
timburke(which seems risky for client code)21:10
acolesok21:10
mattoliverSorry I'm late21:10
timburkecan anyone volunteer to review the fix?21:12
acolesI will21:14
timburkethanks, acoles. and maybe i can hunt down the bug reporter, have him try the fix and report back 😁21:15
mattoliverI'm still catching up on things, and stuck down a rabbithole at work, but can add it to my todo.21:15
timburke#topic get info backend request logging/metrics21:15
timburke#link https://review.opendev.org/c/openstack/swift/+/88493121:15
timburkeacoles and jian have done some reviews, thanks guys!21:16
timburkethere was a point at which i had it in a state where it didn't actually fix things, but i think it's in a good place again now21:17
timburkeso if you get a chance, i'd appreciate some fresh eyes on it. and thanks again for the new tests, acoles!21:17
acolesNP21:18
timburke#topic py311 support21:18
acolesyes I will take another look21:18
timburkei took a bit of time the last week or so to get to the point of having tests pass on py31121:19
timburkeculminating in having a passing gate job!21:19
timburke#link https://review.opendev.org/c/openstack/swift/+/88654121:19
timburkethere are a few pre-req patches to fix up some tests, but zaitcev has been quick to review & approve (thanks!)21:20
zaitcevSure.21:20
mattoliverOh nice21:21
timburkei maybe should have proposed them as separate changes, with a Depends-On in the CI change to bring them all together21:21
zaitcevI may not know how sharding works, but I know what a subclass is in Python.21:21
timburke'cause the base of the chain could probably use a bit of work (better commit message, bug report, probably even an upstream python bug)21:22
timburkenote that the gate job is still using jammy, which has got a 3.11.0 RC, so it still needed the __slots__ workaround for the segfault21:23
timburkenext up21:24
timburke#topic tagged metrics21:24
timburkei forget if i'd mentioned it before, but i finally got a patch up to try out some statsd extensions for labeled metrics21:25
timburke#link https://review.opendev.org/c/openstack/swift/+/88532121:25
mattoliverI'm really interested in checking it out! I was off last week, so will try and get around to poking around it this week21:26
timburkei'd really appreciate it if people could take a look at how it affects the calling code in something like proxy-logging, say, before i get too far into fixing up tests and such21:27
indianwhocodessorry i'm late21:27
timburkeand i still need to get some docs together about how to try it out in a SAIO21:27
mattoliverThe docs would be good21:28
timburkeall right, that's all i've got21:28
timburke#topic open discussion21:29
timburkeanything else we should bring up?21:29
acolesexciting!21:29
acoleslabeled metrics I mean :)21:29
mattoliverI got nothing this week21:30
timburkeit's been like 3 months since our last release, i should put another one together21:31
timburkeif anyone has patches they feel should be sure to get into the next release, please let me know!21:31
mattoliverKk21:32
timburkeall right, i think i'll call it then21:33
timburkethank you all for coming, and thank you for working on swift!21:33
timburke#endmeeting21:33
opendevmeetMeeting ended Wed Jun 21 21:33:24 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)21:33
opendevmeetMinutes:        https://meetings.opendev.org/meetings/swift/2023/swift.2023-06-21-21.00.html21:33
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/swift/2023/swift.2023-06-21-21.00.txt21:33
opendevmeetLog:            https://meetings.opendev.org/meetings/swift/2023/swift.2023-06-21-21.00.log.html21:33
timburkeoh, and there were messages from paladox! so the standard way to assign weights is to have them match the size of the drive -- given 3x600GB, 1x900GB, and 1x500GB servers, i'd expect the ring to have three disks with weight 600, and one each with 900 and 500. the exact value doesn't really matter, but the *ratio* between weights really does (so it could be 6, 6, 6, 9, 5, say)21:41
paladoxohhhhhhhhhhhhh21:41
paladoxthank you so much! going to do that now21:42
paladoxtimburke:  if the disk size is like 525g, the weight would be 500?21:43
paladoxAlso do you know the best way to repair orphaned data/objects?21:43
paladox(and 1tb i guess is 1000)21:45
timburkeit's kinda up to you what values you want -- if it were me, i'd probably look at the output of df or something and truncate a few decimal places21:46
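A hedged sketch of what that looks like with swift-ring-builder; the builder file name and the device IDs d0..d4 are placeholders for this example:

    swift-ring-builder object.builder set_weight d0 600
    swift-ring-builder object.builder set_weight d1 600
    swift-ring-builder object.builder set_weight d2 600
    swift-ring-builder object.builder set_weight d3 900
    swift-ring-builder object.builder set_weight d4 500
    swift-ring-builder object.builder rebalance
    # then copy the rebuilt object.ring.gz out to every node

Only the ratio between weights matters, so for the 525GB disk a weight of 525 or 500 is equally workable.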
timburkewhat do you mean by "orphaned"? like, old SLO/DLO segments whose manifests have been deleted?21:47
timburkethere are several complications with trying to clean up segments, but they all really stem from the same central problem: segment data is just another object that can be uploaded and referenced21:51
timburkeso problem 1: users may have uploaded data as part of a large object that they *also* want to be able to reference directly. for example, you might have daily log files getting uploaded with some naming convention that makes it easy to *also* have DLOs to roll them up monthly21:54
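A hedged illustration of that roll-up pattern using the python-swiftclient CLI (the container and object names are made up for this example):

    # daily log files uploaded as ordinary, directly-addressable objects
    swift upload logs 2023/06/21.log
    # a zero-byte DLO manifest whose X-Object-Manifest prefix stitches the month together
    touch empty && swift upload --object-name 2023/06-rollup \
        --header 'X-Object-Manifest: logs/2023/06/' logs empty

Deleting the manifest leaves the daily objects behind, which is exactly why "orphaned" segments aren't necessarily safe to clean up.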
VoidwalkerPicking up from paladox here, our problem with orphaned data/files is more to do with the fact that we have files that exist on the server that don't exist in the container's listing, and we are trying to figure out how to get the files listed again21:56
timburkeah -- have you checked async pendings? how are the object-updater logs looking?21:57
timburkeand are the container DBs on the same handful of full disks?21:57
VoidwalkerIt's the result of a crash on our account server -- many of the db files there were corrupted and needed to be replaced21:57
timburkei think i've got a script somewhere that could re-send the container update... i'd have to dig for a bit22:03
timburkeif you wanted to go the other way, though, and delete data not in listings, we've got a dark data watcher; see https://github.com/openstack/swift/blob/2.31.1/etc/object-server.conf-sample#L596-L61322:03
timburke(we could probably add a re-send-the-update mode to that...)22:03
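A minimal sketch of turning that watcher on, following the layout of the linked conf sample (the grace_age value here is illustrative, roughly one week in seconds):

    [object-auditor]
    watchers = swift#dark_data

    [object-auditor:watcher:swift#dark_data]
    # start with "log" to see what it would flag; "delete" actually removes dark data
    action = log
    # skip recently-written objects so container updates have time to land
    grace_age = 604800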
timburkebut i'd start by getting disks less full (probably ideally by adding another server or two with fresh disks), then checking on the state of async pendings, then start figuring out how to get listings back into shape22:05
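A hedged way to check on those async pendings, assuming swift-recon is set up and drives live under /srv/node:

    # cluster-wide async_pending counts via recon
    swift-recon --async
    # or count them directly on a single object node
    find /srv/node/*/async_pending* -type f | wc -l

If the counts aren't draining once the disks have headroom, the object-updater logs are the next place to look, as mentioned above.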
VoidwalkerI've already got a script in place to delete the files we're not repairing from the disk, but it might be a good idea to wait on expanding the available storage22:08
*** Voidwalker is now known as Guest376522:14
timburkethe trouble is that full disks will complicate a lot of things -- you usually want to issue a real DELETE through the swift API when cleaning up dark data, that way you aren't fighting with replication when directly rm'ing files. but DELETEs create tombstones, and tombstones need disk space...22:17
timburkemeanwhile, if the container disks are *also* full, the object-server and updater won't be able to write new rows, so async pendings can't clear22:18
kotaum, just FYI: I'll be at SFO next week for a business trip.23:57
