21:00:35 #startmeeting swift
21:00:35 Meeting started Wed Feb 22 21:00:35 2023 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:35 The meeting name has been set to 'swift'
21:00:43 who's here for the swift team meeting?
21:00:55 o/
21:00:58 o/
21:01:47 o/
21:01:54 sorry for the last-minute cancellation last week -- i've been having some computer troubles, so everything just feels harder than it should be :-(
21:02:06 Nps
21:02:09 as usual, the agenda's at
21:02:14 #link https://wiki.openstack.org/wiki/Meetings/Swift
21:02:21 first up
21:02:31 #topic recovering expired objects
21:03:35 my understanding is that we will have two headers, for the obj-server and proxy-server separately?
21:03:50 we've had some users that accidentally let some data expire that they didn't mean to -- and it seemed like it'd be nice if there were a way to help them recover it
21:05:04 Makes sense
21:05:28 today, you can't really do that, at least not easily. the object-server will start responding 404 as soon as the expiration time has passed. you could do something with x-backend-replication:true and internal clients... but it's a little tricky and operator-intensive
21:06:55 noted.
21:07:03 indianwhocodes, yeah -- the way i'd imagine this working would be to have one client-facing header, then translate it to something else (possibly even just piggy-backing off the existing x-backend-replication header) when talking to the backend
21:07:54 we've got a couple pieces of work started: first, allowing an intentional delay in the expirer processing queue entries
21:07:56 #link https://review.opendev.org/c/openstack/swift/+/874806
21:08:17 and second, a client-facing API for retrieving expired data
21:08:20 #link https://review.opendev.org/c/openstack/swift/+/874710
21:09:54 imo, grace_period sounds a bit weird
21:10:33 oh yeah, x-backend-replication: true basically allows you to get the data, so translating a user header into this makes sense
21:11:23 ic
21:11:48 i think there are two main (possibly related) questions: (1) does this seem like a reasonable feature? i.e., can we see this merging upstream as opposed to just being something we (nvidia) carry?
21:12:59 and (2) what level of access would we want to require for this feature? reseller admin? swift owner? probably not all authed users; certainly not anonymous users
21:14:20 i don't think we're likely to answer either of those this week, but i wanted to put them out there and draw attention to the patches so we can talk more about them next week or at the vPTG
21:14:23 good perspective
21:14:41 I think it could be a useful tool to have. I mean, in an ideal world you'd just set a better X-Delete-At if you want to access the data after then.
21:15:04 and so long as people then don't expect a grace period on other objects that they could undelete.
21:15:37 But the fact is we have a user who has the need, and that goes a long way
21:16:43 I feel like it could be a good enhancement to expired objects, and could be used to determine if it's been reclaimed (rather than looking at the timestamps of the request) -- a 404 with x-open-expired would mean it's gone
21:16:52 and something a client could deal with
21:17:16 still from an operator perspective: could it be developed as a middleware? if it were a pluggable feature, like a debugging tool, then either way (maintained upstream or not) may be fine. my feeling.
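To make the middleware idea concrete, here is a minimal sketch of what such a filter could look like -- purely illustrative, not the patch under review (874710); the X-Open-Expired header name, the admin_only option, and the reseller_request gate are all assumptions:

```python
# Hypothetical sketch only: the header names, the admin_only option, and the
# reseller_request check are assumptions, not the API of patch 874710.
class OpenExpiredMiddleware(object):
    """Translate a client-facing X-Open-Expired header into the existing
    X-Backend-Replication header, so the object-server will still serve data
    whose X-Delete-At has passed but that has not yet been reclaimed."""

    def __init__(self, app, conf):
        self.app = app
        # Let operators choose whether only privileged requests may opt in.
        self.admin_only = conf.get('admin_only', 'true').lower() == 'true'

    def __call__(self, environ, start_response):
        if environ.get('HTTP_X_OPEN_EXPIRED', '').lower() == 'true':
            # Swift auth middlewares set 'reseller_request' for admin callers;
            # treating that as the gate here is just one possible policy.
            if not self.admin_only or environ.get('reseller_request'):
                environ['HTTP_X_BACKEND_REPLICATION'] = 'true'
        return self.app(environ, start_response)


def filter_factory(global_conf, **local_conf):
    conf = dict(global_conf, **local_conf)

    def open_expired_filter(app):
        return OpenExpiredMiddleware(app, conf)
    return open_expired_filter
```

Kept this small, question (2) above becomes an operator choice (something like an admin_only option) rather than behaviour baked into the object-server.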
21:18:04 fwiw, i feel like the delay is not *so* far off from an operator deciding to just turn off expirers for a while -- but better in some significant ways
21:18:44 kota, yes, absolutely -- at least for the retrieval side of things
21:19:09 interesting, kota. if it was a small middleware we could have an option to make it admin-only or any authed user, etc.
21:19:42 (maybe this *is* going to be answerable this week :-)
21:19:57 basically all it would do is look for x-open-expired (or whatever) and convert it to x-backend-replication: true
21:20:42 and we can check to see if it was an admin request or not (if we enable that option).
21:21:07 I think exposing it as a client api is enough, but of course my wsgi middleware knowledge is not on par
21:21:20 it would be a small middleware, but allows opt-in.. hmm
21:21:34 i've got a few more topics, so i think i'll keep us moving
21:21:40 Also something indianwhocodes can sink his teeth into.
21:21:57 indianwhocodes: most of our apis are middlewares
21:21:58 #topic ssyncing data with offsets and meta without offsets
21:23:10 oh this was an interesting bug
21:23:15 we recently discovered an issue with syncing .data files that have offsets (specifically, because of object versioning, though we expect this would happen with reconciled data, too) that *also* have .meta on them that *do not*
21:23:28 #link https://launchpad.net/bugs/2007643
21:23:50 acoles did a great job diagnosing the issue, writing up the bug, and coming up with a fix
21:23:57 #link https://review.opendev.org/c/openstack/swift/+/874122
21:24:06 As always.
21:24:16 clayg even wrote up a probe test
21:24:21 #link https://review.opendev.org/c/openstack/swift/+/874330
21:24:56 and i've got this itch to make sure we can also ssync .metas that have offsets
21:25:01 #link https://review.opendev.org/c/openstack/swift/+/874184
21:25:52 i don't think there's too much to discuss on the issue, but wanted to call it out as a recent body of work
21:26:48 #topic http keepalive
21:26:58 yeah, interesting bug that needed just the right combination of ssync, posts, and versioned objects. Great work all of you!
21:27:16 I'll go give a review to what needs it, seeing as I wasn't involved
21:27:56 some recent experiments showed that constraining max_clients could have a decent benefit to time-to-first-byte latencies (and maybe overall performance? i forget)
21:28:25 but it uncovered an annoying issue with our clients
21:29:38 sometimes, a client would hold on to an idle connection for a while, just in case it decided to make another request. usually, this isn't much of a problem -- with high max_clients, we can keep some greenthreads watching the idle connections, no trouble
21:30:21 yeah, we were trampolining on too many eventlet coroutines when we had too large a max_clients.. at least with our clients, workflows and hardware SKUs, tuning it down helped us... it was an interesting deep dive.. but thankfully our PTL is also an eventlet maintainer :)
21:31:08 but with the constrained max_clients, we could find ourselves with all available greenthreads waiting on idle connections while new connections stayed in accept queues or waited to be handed off
21:32:02 one of the frustrating things was that from swift's perspective, our TTFB still looked good -- but not from a client perspective :-(
21:32:28 Ugh.
21:32:45 yeah, that damn accept queue in the listening socket meant we hadn't accepted, so our timers didn't start.. but the clients' had
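As a rough illustration of the failure mode just described (a toy sketch, not Swift's actual wsgi setup), here is an eventlet server in which max_size stands in for max_clients: a couple of clients that finish a request but keep their connections open and idle will pin every worker greenthread, leaving any new connection waiting in the kernel's accept queue, where the server never even starts a timer for it.

```python
# Toy demonstration only -- not Swift code.  eventlet's max_size plays the
# role of Swift's max_clients here.
import eventlet
from eventlet import wsgi


def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello\n']


sock = eventlet.listen(('127.0.0.1', 8080))
# With only two worker greenthreads, two keep-alive clients that go idle
# after a request occupy both workers.  A third client's connection sits in
# the listen backlog: its own clock is running, but the server hasn't
# accepted it yet, so server-side TTFB metrics still look fine.
wsgi.server(sock, app, max_size=2)
```

This is why the discussion below focuses on how long the server should sit on an already-accepted, idle keep-alive connection before handing the greenthread back to the pool.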
21:34:05 one option was to turn down client_timeout -- iirc we were running with the default 60s, which meant the idle connection would linger a pretty long time
21:34:59 but turning it down too much would lead to increased 499s/408s, as it looks like the client timed out during request processing
21:35:47 i've been working on an eventlet patch to add a separate timeout for reading the start of the next request, and temoto seems on board
21:35:49 #link https://github.com/eventlet/eventlet/pull/788
21:36:11 but i've also been trying to do a similar thing purely in swift
21:36:14 #link https://review.opendev.org/c/openstack/swift/+/873744
21:36:52 I managed to recreate the problem in a VSAIO with a bad client, and was able to "fix" it by limiting the wsgi server to HTTP/1.0, basically disconnecting after each request.. which isn't ideal but worked. your fix does this but better: break an idle connection (after a request, while waiting for a new one) after a given amount of time, allowing these bad clients to get disconnected if they're hogging a connection and not using it.
21:37:30 i wanted to offer that background on the patch (since otherwise it seems a little crazy)
21:38:26 I think for those patches background is the key :)
21:39:05 and solicit some feedback on one point: if/when the eventlet patch merges, should i change the other one to just plumb in the timeout from config and call out that you need a fairly new eventlet? or continue setting the timeout so it works with old eventlet?
21:40:23 oh, interesting. I guess it depends on how many others are seeing this bad client behaviour. If people aren't noticing, maybe the former, as it's less code for us to carry?
21:40:42 or is it something we want to backport.
21:41:23 if it is, then the latter, maybe at least until the next release is EOLed?
21:41:30 (just thinking out loud)
21:42:35 mattoliver, former was kind of my feeling, too -- it'll require that we (nvidia) remember to upgrade eventlet before the patch stops setting timeouts, but that should be fine
21:42:49 yeah
21:42:53 i don't expect this to be something to backport
21:43:39 #topic per-policy quotas
21:43:42 #link https://review.opendev.org/c/openstack/swift/+/861282
21:43:48 the swift patch does do a little bit of timeout swapping that's harder to grok.. so great workaround, but I vote for newer eventlet long term
21:44:46 i'm pretty sure i mentioned this patch not *so* long ago, but wanted to call out that the pre-reqs have seen some updates recently
21:45:39 i'd still love to be able to use my nice all-flash policy without worrying about it filling up and causing problems :-)
21:46:08 #topic vPTG
21:46:26 so... i forgot to actually book rooms 😱
21:46:34 lol
21:46:40 wow
21:46:53 i'll follow up with appropriate parties to make sure we have something, though -- don't expect it'll be a problem
21:47:18 i'll stick with the same slots we've been using the past few vPTGs
21:47:35 yeah, get what you can, I can always just drink a lot of coffee :P
21:47:49 ok. nps
21:47:51 if worse comes to worst
21:48:09 I've put some stuff in the topics etherpad
21:48:11 remember how i said "everything just feels harder than it should be"? this is part of "everything" :P
21:48:21 thanks mattoliver!
21:48:21 #link https://etherpad.opendev.org/p/swift-ptg-bobcat
21:48:32 lol
21:49:15 Still probably missing a bunch there. all the topics we talked about today, if they haven't landed, might be interesting discussions.
21:49:30 that's all i've got
21:49:35 #topic open discussion
21:49:36 other things people are working on, or got languishing
21:49:45 what else should we discuss this week?
21:50:20 We've had more stuck shards, but we've already merged the patch that should stop the edge case.
21:50:51 I've since also been opening the lid on sharding container-update and async pending stuff
21:51:20 Done some brainstorming about what we can improve when a root is under too high a load..
21:51:54 But instead of putting it in here, I just wrote it up in the PTG etherpad... so if anyone is interested you can look / comment there
21:52:16 or wait until the PTG and I might have some more POC and benchmarking done.
21:53:35 basically object-updater memcache support, and maybe a container-update header to the object-server to not do a container-update but just write directly to an async pending, as a hold-off procedure.
21:54:45 ooh, good thought...
21:56:07 Also built a beefy SAIO on an object-server-SKU'd dev box, and I plan on seeing how my sharding-pending-files patch is doing, to see if it too will help in this area from the other side.. then I'll put it under load... but let's see how far I can get before the PTG.
21:56:13 (probably need to use an A/C-SKU'd box; real h/w is a good next step).
21:56:45 nice
21:57:04 all right, i think i'll call it
21:57:15 thank you all for coming, and thank you for working on swift!
21:57:19 #endmeeting