Monday, 2023-07-10

paladox: hmm, maybe 4 x 500 (one is 490g) and 1 x 1000 (it's 925g) [12:15]
opendevreview: Alistair Coles proposed openstack/swift master: Encode header in latin-1 with wsgi_to_bytes  https://review.opendev.org/c/openstack/swift/+/884240 [14:20]
opendevreview: Alistair Coles proposed openstack/swift master: proxy: remove client_chunk_size and skip_bytes from GetOrHeadHandler  https://review.opendev.org/c/openstack/swift/+/886823 [15:25]
opendevreview: Alistair Coles proposed openstack/swift master: proxy: encapsulate Getter resp, node and parts_iter  https://review.opendev.org/c/openstack/swift/+/886994 [15:25]
paladox: hmm, not sure how to get the perfect balance. so we have 3 x 525, 1 x 490 and 1 x 916. But it seems the 3 x 525 ran out of storage whilst there was like 80g left on the 1 x 916 one (this was with the 3 x 600, 1 x 500 and 1 x 900 for weight). [16:32]
timburke: paladox, it's hard to get a *perfect* balance, especially in a small cluster -- the distribution of object sizes tends to be lumpy, some partitions are more (or less) densely filled than average, etc. [16:57]
timburke: you can keep fiddling with weights to try to get through the current pain -- but whether a cluster is "healthy" or not should never come down to whether we've got a handful of partitions assigned to one disk vs another. when it seems like it does, we're already to "unhealthy." ultimately you kinda *need* to get some more hardware in (or delete some data) [17:00]
opendevreview: ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id  https://review.opendev.org/c/openstack/swift/+/887904 [17:04]
opendevreview: ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id  https://review.opendev.org/c/openstack/swift/+/887904 [17:19]
opendevreview: ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id  https://review.opendev.org/c/openstack/swift/+/887904 [17:22]
paladox: timburke: ah ok. i'm going to see if i can get more storage. But setting the weights to like 3 x 525, 1 x 490 and 1 x 916 is fine? [17:22]
opendevreview: ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id  https://review.opendev.org/c/openstack/swift/+/887904 [17:33]
timburke: paladox, if it seems to be working for you, go for it. if you're worried about the 525 disks still being too much fuller than the others, maybe bring that down some, or bring the 916 up a bit. i'd do it fairly slowly, though -- maybe 1-3% weight change, rebalance, let things settle, see how fullness has changed across the cluster, re-evaluate and decide whether to continue. how much any change will help may largely depend upon [17:37]
timburke: how much space is required for the individual partitions that get reassigned at this point [17:37]
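The incremental approach timburke describes could be sketched roughly as follows. This is only an illustration: the builder file name, the device ids (`d0`, `d4`) and the hard 3% cap are assumptions made for the example, not values from the discussion. `set_weight` and `rebalance` are the usual `swift-ring-builder` subcommands; the script only prints the command lines and does not run them.

```python
# Sketch: plan small weight nudges and print the matching ring-builder commands.
MAX_STEP_PCT = 3.0  # assumed cap, mirroring the "1-3%" advice above


def nudge_weight(weight, pct):
    """Return `weight` changed by `pct` percent, refusing large jumps."""
    if abs(pct) > MAX_STEP_PCT:
        raise ValueError("step of %s%% exceeds the %s%% cap" % (pct, MAX_STEP_PCT))
    return round(weight * (1 + pct / 100.0), 2)


def plan_commands(builder, changes):
    """Build swift-ring-builder command lines for {device: (weight, pct)}."""
    cmds = []
    for dev, (weight, pct) in changes.items():
        cmds.append("swift-ring-builder %s set_weight %s %s"
                    % (builder, dev, nudge_weight(weight, pct)))
    # One rebalance after all weight changes in this round.
    cmds.append("swift-ring-builder %s rebalance" % builder)
    return cmds


if __name__ == "__main__":
    # Hypothetical devices: d0 is one of the 525 disks, d4 is the 916 disk.
    for cmd in plan_commands("object.builder", {"d0": (525, -2), "d4": (916, 2)}):
        print(cmd)
```

After each round, the idea from the chat is to let replication settle and check disk fullness before deciding on another step.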
paladox: ah ok [17:37]
opendevreview: Merged openstack/swift master: Encode header in latin-1 with wsgi_to_bytes  https://review.opendev.org/c/openstack/swift/+/884240 [17:44]
reid_g: I just saw the note about the upgrade order: O>C>A>P. Is that documented someplace? [18:43]
DHE: I upgraded 1 proxy (nothing else) and it's definitely gone sideways on my EC containers on that proxy.. [18:55]
timburke: reid_g, unfortunately, i think it's been largely tribal knowledge -- i should make sure that gets written down somewhere. notmyname had a blog post about it a while back, but it's since gone MIA [19:35]
timburke: DHE, are there tracebacks with those 500s? [19:35]
DHE: https://pastebin.com/raw/ETNsWqQj [19:40]
DHE: the version is 2.26.0-10+deb11u1   (yes, debian's managed package) [19:40]
timburke: well, the good news is it's fixed on master... https://github.com/openstack/swift/commit/a5fa3cfc [19:50]
opendevreview: Merged openstack/swift master: Object-server: keep SLO manifest files in page cache.  https://review.opendev.org/c/openstack/swift/+/885302 [19:50]
timburke: and backported it to victoria: https://github.com/openstack/swift/commit/acb742ac [19:53]
timburke: i never did a 2.26.1 tho :-( [19:56]
timburke: oh! DHE, you also mentioned proxies sometimes jamming up, right? i think that may have been fixed in a more-recent version, too: https://opendev.org/openstack/swift/src/tag/2.27.0/CHANGELOG#L184-L186 [20:01]
DHE: umm, yes. I believe I'm the one who suggested the fix [20:02]
DHE: the original problem from the changelog jammed up the whole proxy process, deadlocking it forever. this problem seems specific to EC containers, and the proxy process remains otherwise serviceable [20:04]
timburke: oh, right [20:04]
DHE: this py3 issue looks like it's related to my other issue. debian 11 carries swift 2.26 which sounds about right [20:05]
DHE: hmm... maybe I can just cherry-pick it over the debian code... [20:06]
timburke: i'd recommend it -- the change should apply cleanly. just need to remember to do it again if you ever need to re-image a node or something [20:09]
reid_g: Do you have a link to the write up by notmyname? [20:09]
DHE: with the recent release of debian 12, I hope I don't... :) [20:10]
reid_g: I was about to upgrade our clusters and it would be interesting to read it. We're running OCAP on all of our hosts and I did not separate the backend upgrades in my testing/staging clusters. [20:11]
timburke: reid_g, i've been searching, but haven't had any luck. it was an old post, though; i'm sure there's a bunch more that could be said about how to do upgrades without your clients noticing these days [20:11]
timburke: fwiw, we've definitely done upgrades like that (node at a time, all services on a node at once) for a while; it should still work pretty well [20:12]
reid_g: Fair enough. We did that in the past and didn't have any noticeable problems. I was curious since you mentioned an order. [20:13]
reid_g: Also, while you are here. I'm not sure if you noticed my comment the other day about the commit you posted relating to the unable to bind ports. I tested it out by manually patching a host and it did not fix the issue. [20:15]
timburke: good to know, thanks for testing it! i'm still scratching my head, then :-/ [20:18]
reid_g: The only way I found is to stop the object reconstructor/replicator/server, wait 60 sec for the timewaits to die down and then I can start the object server again. [20:19]
timburke: reid_g, what version are you on? i'm wondering if the "seamless reload" from https://github.com/openstack/swift/commit/1107f241 would work any better for you, or if the re-exec'ed process would also fail to bind.... [20:21]
reid_g: ussuri / 2.25.2 [20:22]
reid_g: Oh. Actually it still happens in Yoga / 2.29.2 [20:25]
timburke: reid_g, try doing a `kill -USR1 $MAINPID`! should be supported since 2.24.0. the idea is that we fork an extra child that's responsible for shutting down the old servers, re-exec the main guy to spawn a new batch of workers with new code/configs, then signal the extra child that we're ready so it can actually do the shutdown [20:28]
reid_g: I will try to test that out. [20:30]
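For reference, the signal from timburke's suggestion can also be sent from a script. A minimal sketch, assuming the main server process id is already known (for example from the service manager); `seamless_reload` is a hypothetical helper name for this example, and only standard-library calls are used.

```python
import os
import signal
import sys


def seamless_reload(main_pid):
    """Send SIGUSR1 to the main server process, like `kill -USR1 $MAINPID`."""
    os.kill(main_pid, signal.SIGUSR1)


if __name__ == "__main__":
    # Usage (hypothetical script name): python reload.py <main pid>
    seamless_reload(int(sys.argv[1]))
```

Sending the signal only requests the reload; whether the re-exec'ed process manages to bind its port is exactly the open question in the conversation.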
DHE: with SO_REUSEPORT (or SO_REUSEADDR ? I always get them mixed up) it might not even need that. in theory you can start the new service immediately alongside the old service. connections are assigned randomly. is there a "clean shutdown" command? [20:36]
DHE: actually that probably causes problems with the service manager [20:37]
timburke: DHE, send a `HUP` to the main process and we'll close the listen sockets while still completing any in-flight requests [20:38]
timburke: and iirc, we let eventlet take care of setting both REUSEPORT and REUSEADDR for us: https://github.com/eventlet/eventlet/blob/master/eventlet/convenience.py#L34 [20:39]
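To illustrate the two options DHE mentions: on Linux, `SO_REUSEADDR` lets a socket bind to an address that still has connections in TIME_WAIT, while `SO_REUSEPORT` lets several sockets listen on the same address and port at once. A minimal standard-library sketch, assuming Linux and a loopback address; the helper name `reuseport_listener` is made up for this example and is not Swift or eventlet code.

```python
import socket


def reuseport_listener(host, port):
    """Return a listening TCP socket with SO_REUSEADDR and SO_REUSEPORT set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock


if __name__ == "__main__":
    old = reuseport_listener("127.0.0.1", 0)     # port 0: let the kernel pick
    port = old.getsockname()[1]
    new = reuseport_listener("127.0.0.1", port)  # second listener, same port
    print("both listening on port", port)
    old.close()
    new.close()
```

Both sockets must set `SO_REUSEPORT` before binding for the second bind to succeed.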
reid_g: Sounds like the systemd service unit should have a few more options configured. [20:48]
reid_g: I'm just using the default unit files from Ubuntu package [20:49]
opendevreview: Tim Burke proposed openstack/swift-bench master: Fix SyntaxWarning  https://review.opendev.org/c/openstack/swift-bench/+/888069 [21:35]
opendevreview: Merged openstack/swift-bench master: Fix SyntaxWarning  https://review.opendev.org/c/openstack/swift-bench/+/888069 [22:22]
opendevreview: Tim Burke proposed openstack/swift-bench master: refactor bin/bench into swiftbench/cli for testing  https://review.opendev.org/c/openstack/swift-bench/+/866826 [22:42]
opendevreview: Tim Burke proposed openstack/swift-bench master: Switch from optparse to argparse  https://review.opendev.org/c/openstack/swift-bench/+/874341 [22:42]
opendevreview: Tim Burke proposed openstack/swift-bench master: support container_name from cli  https://review.opendev.org/c/openstack/swift-bench/+/865369 [22:42]
opendevreview: ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id  https://review.opendev.org/c/openstack/swift/+/887904 [23:15]
opendevreview: ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id  https://review.opendev.org/c/openstack/swift/+/887904 [23:32]
