Wednesday, 2023-02-22

opendevreview ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects  https://review.opendev.org/c/openstack/swift/+/874710 00:20
opendevreview Clay Gerrard proposed openstack/swift master: test for ssync meta offset bug  https://review.opendev.org/c/openstack/swift/+/874330 00:21
opendevreview ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects  https://review.opendev.org/c/openstack/swift/+/874710 00:33
opendevreview ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects  https://review.opendev.org/c/openstack/swift/+/874710 00:35
opendevreview ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects  https://review.opendev.org/c/openstack/swift/+/874710 00:38
opendevreview ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects  https://review.opendev.org/c/openstack/swift/+/874710 00:48
opendevreview ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects  https://review.opendev.org/c/openstack/swift/+/874710 00:55
opendevreview ASHWIN A NAIR proposed openstack/swift master: Add x-backend-open-expired to recover expired objects  https://review.opendev.org/c/openstack/swift/+/874710 00:55
opendevreview Matthew Oliver proposed openstack/swift master: docs: Add memcache.conf config doc  https://review.opendev.org/c/openstack/swift/+/874720 05:20
opendevreview Matthew Oliver proposed openstack/swift master: updater: add memcache shard update lookup support  https://review.opendev.org/c/openstack/swift/+/874721 05:20
opendevreview Tim Burke proposed openstack/swift master: container: Add delimiter-depth query param  https://review.opendev.org/c/openstack/swift/+/829605 05:34
opendevreview Tim Burke proposed openstack/swift master: staticweb: Work with prefix-based tempurls  https://review.opendev.org/c/openstack/swift/+/810754 05:35
opendevreview Tim Burke proposed openstack/swift master: replicator: Add sync_batches_per_revert option  https://review.opendev.org/c/openstack/swift/+/839649 05:45
opendevreview Matthew Oliver proposed openstack/swift master: db: shard up the DatabaseBroker pending files  https://review.opendev.org/c/openstack/swift/+/830551 06:23
mattoliver Just a rebase ^ 06:23
mku11 Hi I have a weird situation that I can maybe get some pointers to investigate further. We have 2 regions where container sync is enabled. Some customers upload objects with an expiration date. When I look at the sync reports some containers can sync 130+ puts per container_time(60) but the number of deletes never exceeds 4. I can't find anything in the code that restricts 10:31
mku11 deletes. Unfortunately the containers where objects have an expiration date run out of sync, I guess because there are not enough deletes synced 10:31
opendevreview Alistair Coles proposed openstack/swift master: sharder: make misplaced objects lookup faster  https://review.opendev.org/c/openstack/swift/+/871843 13:45
opendevreview Alistair Coles proposed openstack/swift master: sharder: yield fewer rows that have no destination  https://review.opendev.org/c/openstack/swift/+/874781 14:43
opendevreview Alistair Coles proposed openstack/swift master: ssync: Round-trip offsets in meta/ctype Timestamps  https://review.opendev.org/c/openstack/swift/+/874184 15:40
opendevreview Alistair Coles proposed openstack/swift master: sharder: make misplaced objects lookup faster  https://review.opendev.org/c/openstack/swift/+/871843 16:02
opendevreview Alistair Coles proposed openstack/swift master: sharder: yield fewer rows that have no destination  https://review.opendev.org/c/openstack/swift/+/874781 16:02
mku11 I found the problem with container sync and object expiration. In sync.py object_delete is called without the retries parameter. Since the object in the other region has in most cases already been deleted by the object expirer process, the object_delete gets a 404 not found and retries (default 5 times) for a total of around 17 seconds before continuing on. This gives a maximum of 4 17:47
mku11 deletes per run and very slow advance of the sync pointer 17:47
mku11 Perhaps object_delete in sync.py would be better called with retries=0 17:47
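A minimal sketch of the behaviour mku11 describes, not the actual code in sync.py: if a remote delete that returns 404 is treated as already done, the sync pointer can advance without burning the retry budget (roughly 60 s / 17 s per delete gives the observed ceiling of about 4). The names `NotFound`, `sync_delete` and `delete_fn` are hypothetical and stand in for whatever client call the container sync daemon really makes.

```python
import time


class NotFound(Exception):
    """Hypothetical stand-in for a 404 from the remote cluster."""


def sync_delete(delete_fn, retries=5, backoff=1.0, sleep=time.sleep):
    """Call delete_fn, retrying only on transient I/O errors.

    A NotFound means the object is already gone on the remote side
    (for example, removed by its own expirer), so it counts as success.
    Returns the number of attempts that were made.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            delete_fn()
            return attempts
        except NotFound:
            # Nothing left to delete; do not retry.
            return attempts
        except IOError:
            if attempts > retries:
                raise
            # Exponential backoff between retries.
            sleep(backoff * 2 ** (attempts - 1))
```

With this shape, a delete of an already-expired object costs one request instead of the full retry sequence.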
timburke mku11, what version of swift is this? sounds a lot like https://bugs.launchpad.net/swift/+bug/1849841 which should've been fixed in 2.25.0 (so, ussuri) by https://github.com/openstack/swift/commit/f68e22d4 18:30
mku11 ah sorry apparently I work with an old version (just got thrown into swift) newton. I had no luck digging up this bug with google. 18:55
timburke mku11, no worries! just wanted to make sure there wasn't something else affecting later versions :-) 18:56
timburke i'd definitely recommend upgrading when you get a chance, though -- there are so many *other* bugs we've fixed since then 18:57
mku11 I will surely give that attention but my first assignment is to move our swift from vm's to iron. When that is done I will look into upgrading. Thanks for the pointer to the patch 18:59
opendevreview Mandell proposed openstack/swift master: WIP Add grace period to object expirer  https://review.opendev.org/c/openstack/swift/+/874806 20:15
opendevreview ASHWIN A NAIR proposed openstack/swift master: Add x-backend-open-expired to recover expired objects  https://review.opendev.org/c/openstack/swift/+/874710 20:16
opendevreview ASHWIN A NAIR proposed openstack/swift master: Add x-backend-open-expired to recover expired objects  https://review.opendev.org/c/openstack/swift/+/874710 20:16
opendevreview Mandell proposed openstack/swift master: WIP Add grace period to object expirer  https://review.opendev.org/c/openstack/swift/+/874806 20:38
indianwhocodes howdy! 21:00
timburke #startmeeting swift 21:00
opendevmeet Meeting started Wed Feb 22 21:00:35 2023 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. 21:00
opendevmeet Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 21:00
opendevmeet The meeting name has been set to 'swift' 21:00
timburke who's here for the swift team meeting? 21:00
kota o/ 21:00
indianwhocodes o/ 21:00
mattoliver o/ 21:01
timburke sorry for the last-minute cancellation last week -- i've been having some computer troubles, so everything just feels harder than it should be :-( 21:01
mattoliver Nps 21:02
timburke as usual, the agenda's at 21:02
timburke #link https://wiki.openstack.org/wiki/Meetings/Swift 21:02
timburke first up 21:02
timburke #topic recovering expired objects 21:02
indianwhocodes my understanding is that we will have two headers for obj-server and proxy-server separately ? 21:03
timburke we've had some users that accidentally let some data expire that they didn't mean to -- and it seemed like it'd be nice if there were a way to help them recover it 21:03
mattoliver Makes sense 21:05
timburke today, you can't really do that, at least not easily. the object-server will start responding 404 as soon as the expiration time has passed. you could do something with x-backend-replication:true and internal clients... but it's a little tricky and operator-intensive 21:05
indianwhocodes noted. 21:06
timburke indianwhocodes, yeah -- the way i'd imagine this working would be to have one client-facing header, then translate it to something else (possibly even just piggy-backing off the existing x-backend-replication header) when talking to the backend 21:07
timburke we've got a couple pieces of work started: first, allowing an intentional delay in the expirer processing queue entries 21:07
timburke #link https://review.opendev.org/c/openstack/swift/+/874806 21:07
timburke and second, a client-facing API for retrieving expired data 21:08
timburke #link https://review.opendev.org/c/openstack/swift/+/874710 21:08
indianwhocodes imo, grace_period sounds a bit weird 21:09
mattoliver oh yeah, x-backend-replication: true basically allows you to get the data so a user header -> into this makes sense 21:10
kota ic 21:11
timburke i think there are two main (possibly related) questions: (1) does this seem like a reasonable feature? i.e., can we see this merging upstream as opposed to just being something we (nvidia) carry? 21:11
timburke and (2) what level of access would we want to require for this feature? reseller admin? swift owner? probably not all authed users; certainly not anonymous users 21:12
timburke i don't think we're likely to answer either of those this week, but i wanted to put them out there and draw attention to the patches so we can talk more about them next week or at the vPTG 21:14
kota good perspective 21:14
mattoliver I think it could be a useful tool to have. I mean in an ideal world just set a better X-Delete-At if you want to access it after then. 21:14
mattoliver and so long as people then don't expect a grace period of other objects that they could undelete. 21:15
mattoliver But fact is we have a user who has the need and that goes a long way 21:15
mattoliver I feel like it could be a good enhancement to expired objects and could be used to determine if it's been reclaimed (rather than look at the timestamps of the request) a 404 with x-open-expired would mean it's gone 21:16
mattoliver and something a client could deal with 21:16
kota it would still be from an operator perspective; could it be developed as a middleware? if it were a pluggable feature, like a debugging tool, either (maintain upstream or not) may be fine. my feeling. 21:17
timburke fwiw, i feel like the delay is not *so* far off from an operator deciding to just turn off expirers for a while -- but better in some significant ways 21:18
timburke kota, yes, absolutely -- at least for the retrieval side of things 21:18
mattoliver interesting kota, if it was a small middleware we could have an option to make it an admin or authed user etc. 21:19
timburke (maybe this *is* going to be answerable this week :-) 21:19
mattoliver basically all it would do is look for x-open-expired (or whatever) and convert it to x-backend-replication: true 21:19
mattoliver and we can check to see if it was an admin request or not (if we enable that option). 21:20
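A rough sketch of the small middleware mattoliver outlines, written as plain WSGI with no Swift imports. It is an illustration of the idea only, not the proposed patch: the header names come from the discussion, while the class name `OpenExpiredMiddleware` and the `is_allowed` callable (the "admin or authed user" option) are hypothetical, and whether the proxy would forward the translated header to the object server is not settled here.

```python
class OpenExpiredMiddleware:
    """Translate a client-facing header into a backend header.

    is_allowed is a hypothetical hook: a callable that takes the WSGI
    environ and returns True if this request may read expired objects.
    It defaults to denying everyone, so the feature is opt-in.
    """

    def __init__(self, app, is_allowed=lambda environ: False):
        self.app = app
        self.is_allowed = is_allowed

    def __call__(self, environ, start_response):
        # Always consume the client header so it never leaks further
        # down the pipeline, whether or not the request is allowed.
        value = environ.pop('HTTP_X_OPEN_EXPIRED', '')
        wants_expired = value.strip().lower() in ('true', '1', 'yes')
        if wants_expired and self.is_allowed(environ):
            environ['HTTP_X_BACKEND_REPLICATION'] = 'true'
        return self.app(environ, start_response)
```

Keeping the access decision in a separate callable leaves the question of who may use the header (reseller admin, swift owner, any authed user) as a deployment choice.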
indianwhocodes I think exposing it as a client api is enough but of course my wsgi middleware knowledge is not on par 21:21
mattoliver it would be a small middleware, but allows opt in.. hmm 21:21
timburke i've got a few more topics, so i think i'll keep us moving 21:21
mattoliver Also something indianwhocodes can sink his teeth into. 21:21
mattoliver indianwhocodes: most of our apis are middlewares 21:21
timburke #topic ssyncing data with offsets and meta without 21:21
mattoliver oh this was an interesting bug 21:23
timburke we recently discovered an issue with syncing .data files that have offsets (specifically, because of object versioning, though we expect this would happen with reconciled data, too) that *also* have .meta on them that *do not* 21:23
timburke #link https://launchpad.net/bugs/2007643 21:23
timburke acoles did a great job diagnosing the issue, writing up the bug, and coming up with a fix 21:23
timburke #link https://review.opendev.org/c/openstack/swift/+/874122 21:23
zaitcev As always. 21:24
timburke clayg even wrote up a probe test 21:24
timburke #link https://review.opendev.org/c/openstack/swift/+/874330 21:24
timburke and i've got this itch to make sure we can also ssync .metas that have offsets 21:24
timburke #link https://review.opendev.org/c/openstack/swift/+/874184 21:25
timburke i don't think there's too much to discuss on the issue, but wanted to call it out as a recent body of work 21:25
timburke #topic http keepalive 21:26
mattoliver yeah, interesting bug that needed just the right combination of ssync, posts, versioned objects. Great work all of you! 21:26
mattoliver I'll go give a review to what needs it seeing as I wasn't involved 21:27
timburke some recent experiments showed that constraining max_clients could have a decent benefit to time to first byte latencies (and maybe overall performance? i forget) 21:27
timburke but it uncovered an annoying issue with our clients 21:28
timburke sometimes, a client would hold on to an idle connection for a while, just in case it decided to make another request. usually, this isn't much of a problem -- with high max_clients, we can keep some greenthreads watching the idle connection, no trouble 21:29
mattoliver yeah, we were trampolining on too many eventlet coroutines when we had too large a max_clients.. at least with our clients, workflows and hardware sku's tuning them down helped us... it was an interesting deep dive.. but thankfully our PTL is also an eventlet maintainer :) 21:30
timburke but with the constrained max_clients, we could find ourselves with all available greenthreads waiting on idle connections while new connections stayed in accept queues or waited to be handed off 21:31
timburke one of the frustrating things was that from swift's perspective, our TTFB still looked good -- but not from a client perspective :-( 21:32
zaitcev Ugh. 21:32
mattoliver yeah, that damn accept queue in the listening socket, meant we hadn't accepted so our timers didn't start.. but the clients had 21:32
timburke one option was to turn down client_timeout -- iirc we were running with the default 60s, which meant the idle connection would linger a pretty long time 21:34
timburke but turning it down too much would lead to increased 499s/408s as it looks like the client timed out during request processing 21:34
timburke i've been working on an eventlet patch to add a separate timeout for reading the start of the next request, and temoto seems on board 21:35
timburke #link https://github.com/eventlet/eventlet/pull/788 21:35
timburke but i've also been trying to do a similar thing purely in swift 21:36
timburke #link https://review.opendev.org/c/openstack/swift/+/873744 21:36
mattoliver I managed to recreate the problem in a VSAIO with a bad client, and was able to "fix" the problem by limiting the wsgi server to HTTP/1.0, basically disconnecting after each request.. which isn't ideal but worked. your fix does this but better: break an idle connection (after a request and waiting for a new one) after a given amount of time. Allowing these bad clients to get disconnected if they're hogging a connection and not using it. 21:36
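The idea of a separate timeout for the start of the next request can be sketched with plain blocking sockets. This is only an illustration under stated assumptions, not the eventlet or swift patch: `wait_for_next_request` and the `idle_timeout` parameter are hypothetical names, while `client_timeout` borrows the name of the option mentioned above.

```python
import socket


def wait_for_next_request(conn, idle_timeout, client_timeout):
    """Wait for the first byte of the next request on a kept-alive socket.

    While the connection is idle between requests, only idle_timeout
    applies. Once the client starts sending, the longer client_timeout
    governs the rest of the request. Returns the first byte, or None if
    the connection should be closed (idle too long, or peer went away).
    """
    conn.settimeout(idle_timeout)
    try:
        first = conn.recv(1)
    except socket.timeout:
        return None  # idle for too long; free the worker
    if not first:
        return None  # client closed its side
    conn.settimeout(client_timeout)
    return first
```

A short idle timeout frees a worker from a client that is merely parking a connection, without shortening the timeout that applies while a request is actually in flight.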
timburke i wanted to offer that background on the patch (since otherwise it seems a little crazy) 21:37
mattoliver I think for those patches background is the key :) 21:38
timburke and solicit some feedback on one point: if/when the eventlet patch merges, should i change the other one to just plumb in the timeout from config and call out that you need a fairly new eventlet? or continue setting the timeout so it works with old eventlet? 21:39
mattoliver oh, interesting. I guess it depends on how many others are seeing this bad client behaviour. If people aren't noticing maybe the former as it's less code for us to carry? 21:40
mattoliver or is it something we want to backport. 21:40
mattoliver if it is, then the latter for maybe at least until the next release is EOLed? 21:41
mattoliver (just thinking out loud) 21:41
timburke mattoliver, former was kind of my feeling, too -- it'll require that we (nvidia) remember to upgrade eventlet before the patch stops setting timeouts, but that should be fine 21:42
mattoliver yeah 21:42
timburke i don't expect this to be something to backport 21:42
timburke #topic per-policy quotas 21:43
timburke #link https://review.opendev.org/c/openstack/swift/+/861282 21:43
mattoliver the swift patch does do a little bit of timeout swapping that's harder to grok.. so great workaround, but I vote for newer eventlet long term 21:43
timburke i'm pretty sure i mentioned this patch not *so* long ago, but wanted to call out that the pre-reqs have seen some updates recently 21:44
timburke i'd still love to be able to use my nice all-flash policy without worrying about it filling up and causing problems :-) 21:45
timburke #topic vPTG 21:46
timburke so... i forgot to actually book rooms 😱 21:46
mattoliver lol 21:46
kota wow 21:46
timburke i'll follow up with appropriate parties to make sure we have something, though -- don't expect it'll be a problem 21:46
timburke i'll stick with the same slots we've been using the past few vPTGs 21:47
mattoliver yeah, get what you can, I can always just drink a lot of coffee :P 21:47
kota ok. nps 21:47
mattoliver if worse comes to worst 21:47
mattoliver I've put some stuff in the topics etherpad 21:48
timburke remember how i said "everything just feels harder than it should be"? this is part of "everything" :P 21:48
timburke thanks mattoliver! 21:48
mattoliver #link https://etherpad.opendev.org/p/swift-ptg-bobcat 21:48
mattoliver lol 21:48
mattoliver Still probably missing a bunch there. all the topics we talked about today, if they're not landed might be interesting discussions. 21:49
timburke that's all i've got 21:49
timburke #topic open discussion 21:49
mattoliver other things people are working on, or got languishing 21:49
timburke what else should we discuss this week? 21:49
mattoliver We've had more stuck shards, but we've already merged the patch that should stop the edgecase. 21:50
mattoliver I've since also been opening the lids on sharding container-update and async pending stuff 21:50
mattoliver Done some brain storming about what we can improve when a root is under too high a load.. 21:51
mattoliver But instead of putting it in here, I just wrote it up in the PTG etherpad... so if anyone is interested you can look / comment there 21:51
mattoliver or wait until the PTG and I might have some more POC and benchmarking done. 21:52
mattoliver basically object-updater memcache support, maybe a container-update header to the object-server to ot do a container-update just write directly to async as a hold off procedure. 21:53
mattoliver *not 21:53
timburke ooh, good thought... 21:54
mattoliver Also build a beefy SAIO on an object server skued dev box, and I plan on seeing how my sharding-pending files patch is doing, to see if it too will help in this area from the other side.. then I'll put it under load... but let's see how far I can get before PTG. 21:56
mattoliver (probably need to use an A/C skued box, real h/w is a good next step). 21:56
timburke nice 21:56
timburke all right, i think i'll call it 21:57
timburke thank you all for coming, and thank you for working on swift! 21:57
timburke #endmeeting 21:57
opendevmeet Meeting ended Wed Feb 22 21:57:19 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) 21:57
opendevmeet Minutes:        https://meetings.opendev.org/meetings/swift/2023/swift.2023-02-22-21.00.html 21:57
opendevmeet Minutes (text): https://meetings.opendev.org/meetings/swift/2023/swift.2023-02-22-21.00.txt 21:57
opendevmeet Log:            https://meetings.opendev.org/meetings/swift/2023/swift.2023-02-22-21.00.log.html 21:57
opendevreview Tim Burke proposed openstack/swift master: Fix docstring regarding private method  https://review.opendev.org/c/openstack/swift/+/874816 23:38