Wednesday, 2021-09-29

01:04 <opendevreview> Timur Alperovich proposed openstack/swift master: Bug: fix s3api multipart parts listings  https://review.opendev.org/c/openstack/swift/+/811247
03:59 <opendevreview> Timur Alperovich proposed openstack/swift master: Bug: fix s3api multipart parts listings  https://review.opendev.org/c/openstack/swift/+/811247
19:17 <timburke__> clayg, any ideas on how to make my changes at https://review.opendev.org/c/openstack/swift/+/732996/2/swift/container/backend.py more readable?
19:22 <clayg> maybe the readability matters less if my question about "is it for some reason more common in this specific call site" is answered "yes"
19:23 <clayg> you found the other get_brokers()[0] call site and called it "probably safer" - and I guess we only ever blow up when trying to get brokers[0]? cause shortly after we reap brokers[0] (the old db) we're back to only one db?
19:25 <clayg> I just assumed if we had a broker.get_old_db() or something we could encapsulate the race - i'm not sure what it does when it finds it? refresh db state and return the new broker? raise a more specific exception as part of the contract?
19:31 <timburke__> hmm... maybe that could work... refresh and return new broker, then get state...
19:33 <timburke__> but yeah, i suspect it's way more common at that particular call site -- because in the window where the race is tightest, we should mainly be looking at the old DB for the sake of stats
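[Editor's aside: the "broker.get_old_db() or something" idea clayg floats above could look roughly like the sketch below: wrap the racy get_brokers()[0] access so that, if the old DB is reaped between the scan and the access, state is refreshed once before giving up with a specific exception. Every name here (FakeBroker, get_old_broker, OldDbVanished, reload_db_files) is hypothetical; this is not Swift's real container backend API.]

```python
class OldDbVanished(Exception):
    """Raised when no pre-epoch DB exists any more (hypothetical contract)."""

class FakeBroker:
    """Toy stand-in for a container broker with possibly two DB files."""
    def __init__(self, dbs):
        self._dbs = dbs  # db names, oldest first

    def reload_db_files(self):
        # real code would re-scan the filesystem for db files here
        pass

    def get_brokers(self):
        return list(self._dbs)

    def get_old_broker(self):
        """Return the pre-epoch (oldest) DB, retrying once after a refresh."""
        for attempt in (1, 2):
            brokers = self.get_brokers()
            if len(brokers) > 1:
                return brokers[0]  # the old DB
            if attempt == 1:
                self.reload_db_files()  # maybe we raced a reap; refresh once
        raise OldDbVanished('only one DB remains; old DB was reaped')

broker = FakeBroker(['old.db', 'fresh.db'])
print(broker.get_old_broker())  # prints old.db
```

This matches the "raise a more specific exception as part of the contract" option; the "refresh and return the new broker" option would return `brokers[0]` of the refreshed single-DB list instead of raising.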
19:38 <reid_g> Hello, Is this IRC channel a good place to ask questions about swift usage/troubleshooting or is it more for development?
19:39 <timburke__> reid_g, it's for both! what's your question?
19:39 *** timburke__ is now known as timburke
19:40 <reid_g> Cool! I've emailed before and had clayg respond a few times. Thought it would be fun to join IRC
19:41 <clayg> reid_g: IRC is SO MUCH FUN 😉
19:41 <reid_g> We are using EC and recently added new nodes. We kicked off a rebalance and it looks to have mostly gone smoothly.
19:42 <reid_g> Went from ~120K handoffs in 4 rings to ~500 but it has been stuck there for a day.
19:42 <reid_g> wondering how to tell what is causing the stragglers
19:43 <reid_g> Tried running the reconstructor with a copy of the conf (logging set to debug) and -o -v -p against one of these handoff partitions but it doesn't push the fragment to the correct device.
19:43 <reid_g> Tried going to what I thought was a neighbor and running the reconstructor on the partition but the missing fragment didn't get created on the missing primary.
19:43 <reid_g> My understanding is that the reconstructor should push a handoff to the correct location or recreate the missing fragment when run from a neighbor.
19:44 <reid_g> We identify handoffs as being partitions that don't belong to the host/device. Not sure if this is the correct terminology for data that needs to be moved after a rebalance.
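[Editor's aside: reid_g's definition is essentially right — a handoff is a partition found on a device that is not one of that partition's primaries in the ring. A toy illustration, with a plain dict standing in for a real ring (real code would load the ring and consult its primary assignments, e.g. via swift.common.ring):]

```python
# part -> primary device ids, as a stand-in for a 3-replica ring
ring = {
    0: [1, 2, 3],
    1: [2, 3, 4],
    2: [3, 4, 1],
}

def handoff_parts(device_id, parts_on_disk):
    """Partitions present on this device that it is not a primary for."""
    return [p for p in parts_on_disk if device_id not in ring[p]]

# device 4 holds parts 0, 1 and 2 on disk, but is only a primary for 1 and 2:
print(handoff_parts(4, [0, 1, 2]))  # -> [0]
```

After a rebalance, parts like 0 here are exactly the ones the reconstructor should drain to their new primaries.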
19:52 <clayg> off the cuff: probably bugs - we had to fix more than one EC handoff draining bug
19:52 <clayg> one of the really bad ones that only got closed recently had to do with expired fragments I think @acoles fixed it
19:53 <clayg> had to extend the SSYNC protocol and everything!  he's a mad scientist
19:54 <clayg> oh, not expired - non-durable https://review.opendev.org/c/openstack/swift/+/770047
19:54 <clayg> but I'm pretty sure that's only the most recent example
19:56 <clayg> same shit different metadata https://review.opendev.org/c/openstack/swift/+/456921
19:56 <clayg> @acoles probably had a whole 'nother career as a network scientist in between fixing those bugs 🙄
19:57 <reid_g> This particular object I'm looking at isn't deleted/expired. No meta when I salt out an `ls` to all the nodes
19:57 <reid_g> What is a durable vs non-durable fragment?
20:00 <reid_g> We did upgrade all of these clusters to Ussuri but that is the latest available for 18.04.
20:01 <timburke> when we write EC data, there's a two (or, really, three) phase commit -- in one phase we write and fsync the newly-written data to all nodes but don't mark it "durable" (so it won't be considered authoritative, we won't clean up whatever old data may have been at that name, and given enough time, the non-durable data will get cleaned up similar to a tombstone)
20:01 <timburke> provided enough backend nodes ack that phase, we tell them all to switch it over to durable
20:03 <timburke> you can tell whether a frag is durable or not just by looking at the name -- durable data will end in #<frag number>#d.data, while non-durable will just be #<frag number>.data (or, on pretty old swift, there'd be a separate .durable file)
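[Editor's aside: the naming convention timburke describes is easy to check mechanically. A minimal parser for the modern on-disk names only — it ignores the legacy separate .durable files he mentions:]

```python
def parse_frag_name(filename):
    """Split an EC .data filename into (timestamp, frag index, durable?)."""
    stem, ext = filename.rsplit('.', 1)
    if ext != 'data':
        raise ValueError('not a .data file: %r' % filename)
    parts = stem.split('#')          # [timestamp, frag index] or [..., 'd']
    durable = parts[-1] == 'd'       # durable names end in "#d"
    return parts[0], int(parts[1]), durable

print(parse_frag_name('1632511385.76093#8.data'))    # ('1632511385.76093', 8, False)
print(parse_frag_name('1632511385.76093#8#d.data'))  # ('1632511385.76093', 8, True)
```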
20:04 <reid_g> So it looks like this isn't durable 1632511385.76093#8.data
20:05 <reid_g> This ^ fragment is one that is in the pre-rebalance location.
20:06 <reid_g> So may be related to 770047 above
20:06 <reid_g> If I go to a neighbor and run object-reconstructor, shouldn't that create the missing fragment on the device that is missing it?
20:13 <reid_g> How many nodes need to say OK to turn an EC object durable? The above example is from a 10+4 EC ring
20:17 <reid_g> Reading docs
20:43 <reid_g> Seems like it should be close to realtime looking at the high level example
20:46 <timburke> reid_g, what do the other nodes say? do they have durable data?
20:46 <timburke> (sorry, suddenly had to drop off for childcare)
20:46 <reid_g> No. the other nodes do not show durable (no #d in the file name)
20:48 <timburke> for a 10+4 policy, we want 11 acks before marking durable. as long as at least one node marks it durable, it should propagate pretty quickly, even if other nodes missed the second phase. if we don't get enough acks, nobody gets marked durable, the client gets back a 503, and the data on-disk never *will* get marked durable
20:50 <timburke> it's a little curious, tho -- i would've expected the client to retry the upload after getting back the 503, so the other nodes would hopefully have durable data with a later timestamp
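[Editor's aside: the ack math timburke gives — 11 acks for a 10+4 policy — corresponds to ndata + 1. A sketch of the commit-phase decision, as an illustration rather than Swift's actual quorum code:]

```python
def durable_quorum(ndata):
    """Acks needed in the commit phase before marking a write durable."""
    return ndata + 1

def can_mark_durable(ndata, nparity, acks):
    """Decide the second phase outcome given acks from backend nodes."""
    assert acks <= ndata + nparity, 'more acks than nodes'
    return acks >= durable_quorum(ndata)

print(durable_quorum(10))           # -> 11, matching "11 acks for 10+4"
print(can_mark_durable(10, 4, 11))  # -> True: tell all nodes to go durable
print(can_mark_durable(10, 4, 7))   # -> False: client gets a 503; frags
                                    #    on disk stay non-durable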
20:50 <reid_g> O.O
20:50 <reid_g> Going to check with the application team if they expect this object to be functioning.
20:51 <timburke> maybe also double check whether it shows up in listings (given the description thus far, i expect it doesn't)
20:51 <reid_g> What do you mean in "listings"
20:52 <timburke> when the client does a GET at the container level
20:54 <timburke> you can also go grepping for the object name in logs, confirm the response status was sent back to the client
20:55 <timburke> er... status *that* was sent...
20:59 <kota> morning
21:00 <reid_g> Good news is that the application storing the data doesn't know about that object so it probably handled a failure.
21:00 <timburke> \o/
21:00 <timburke> #startmeeting swift
21:00 <opendevmeet> Meeting started Wed Sep 29 21:00:39 2021 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00 <opendevmeet> The meeting name has been set to 'swift'
21:00 <timburke> who's here for the swift meeting?
21:01 <kota> hi
21:01 <mattoliver> o/
21:02 <acoles> o/
21:02 <timburke> as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02 <timburke> #topic gate
21:03 <timburke> the tempest and grenade failures should be resolved now! i seem to recall clayg and acoles noticing them being a problem
21:03 <timburke> #link http://lists.openstack.org/pipermail/openstack-discuss/2021-September/025128.html
21:04 <timburke> #topic xena release
21:04 <timburke> the stable branch has been cut!
21:04 <clayg> Woot!
21:04 <mattoliver> Nice
21:05 <timburke> i'd actually kinda meant to do another release before xena so we wouldn't be shipping code from June, but oh well -- i dropped the ball a bit
21:06 <timburke> not the end of the world, of course. 2.28.0 is a great release ;-)
21:06 <timburke> #topic ring v2
21:07 <timburke> thanks for the review mattoliver! i haven't had a chance to look through them much, but i'll plan on responding this week
21:07 <mattoliver> Nps, still have more of the chain to work through, will do that today.
21:08 <timburke> #topic root epoch reset
21:09 <timburke> it looked like there was a good bit of progress on this, kinda split between...
21:09 <timburke> #link https://review.opendev.org/c/openstack/swift/+/807824
21:09 <timburke> #link https://review.opendev.org/c/openstack/swift/+/809969
21:10 <mattoliver> Yeah, so the situation is we have had an epoch reset in the cluster a few times
21:11 <mattoliver> But still can't reproduce without physically breaking it (in a probe test)
21:12 <mattoliver> But we know once an own_shard_range is shared it should ALWAYS have an epoch set
21:13 <mattoliver> The first patch is one that stops merging a remote own_shard_range with a local one if the local has an epoch and the remote doesn't, down in the container broker.
21:14 <mattoliver> This "should" fix it.. but hard to know because we still don't know the cause
21:15 <mattoliver> The second is one that doesn't go down in the guts of shard merging, only on replication. And will block a node without an epoch from replicating to its neighbours
21:15 <mattoliver> Not as universal as the first, but will give us better logging and a pause to help diagnose the bug.
21:16 <mattoliver> We're thinking about rolling out the second temporarily to catch when/if it happens again so we can track the bugger down.
21:17 <timburke> sounds good
21:17 <timburke> #topic staticweb + tempurl-with-prefix
21:18 <timburke> so this was an idea i had while thinking about how to share something out of my home cluster
21:19 <timburke> i wanted to share a set of files, but not require that i send a separate tempurl for each or provide swift creds
21:19 <timburke> so i came up with
21:19 <timburke> #link https://review.opendev.org/c/openstack/swift/+/810754
21:21 <timburke> the core of the change is in staticweb -- basically, do staticweb listings when auth'ed via tempurl and carry the prefix-based tempurl to the links that we build
21:22 <timburke> i wanted to get people's thoughts on it, and see how we feel about the increase in privileges (prefix-based tempurl can now do listings -- but only if staticweb is enabled and only within the prefix)
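[Editor's aside: the patch builds on Swift's existing prefix-scoped tempurls. As a refresher on how those are signed: the HMAC covers "<method>\n<expires>\n<path>", with the path written as "prefix:<path-prefix>" for prefix-scoped URLs, and the client adds a temp_url_prefix query parameter. The account, container, key, and host below are made up; the signing scheme follows the tempurl middleware docs.]

```python
import hmac
from hashlib import sha1
from time import time

key = b'secret-tempurl-key'      # the account's X-Account-Meta-Temp-URL-Key
expires = int(time()) + 86400    # link valid for one day
path = 'prefix:/v1/AUTH_test/photos/vacation/'

body = '\n'.join(['GET', str(expires), path]).encode('ascii')
sig = hmac.new(key, body, sha1).hexdigest()

url = ('https://saio/v1/AUTH_test/photos/vacation/'
       '?temp_url_sig=%s&temp_url_expires=%d'
       '&temp_url_prefix=vacation/' % (sig, expires))
print(url)
```

One such URL then covers every object under the prefix, which is exactly why the staticweb change above only needs to carry the existing query parameters onto the listing links it generates.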
21:23 <mattoliver> Oh that's kinda cool. You can share a list of links and not make the container publicly readable.
21:23 <mattoliver> Need to have a better look and play first though
21:23 <timburke> yup :-)
21:24 <timburke> and it needs tests -- skimped there, figuring i ought to get a bit more buy-in first
21:25 <timburke> anyway, just wanted to raise a bit of attention for it
21:25 <timburke> that's all i've got
21:25 <timburke> #topic open discussion
21:25 <timburke> what else should we bring up this week?
21:27 <mattoliver> PTG not far away, get topics in, or just plan to come and hang with us virtually for a bit
21:28 <kota> is the schedule fixed?
21:29 <timburke> i believe so, but will double check for next week
21:29 <kota> okay
21:32 <timburke> all right, i think i'll call it
21:32 <timburke> thank you all for coming, and thank you for working on swift!
21:32 <timburke> #endmeeting
21:32 <opendevmeet> Meeting ended Wed Sep 29 21:32:48 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
21:32 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/swift/2021/swift.2021-09-29-21.00.html
21:32 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/swift/2021/swift.2021-09-29-21.00.txt
21:32 <opendevmeet> Log:            https://meetings.opendev.org/meetings/swift/2021/swift.2021-09-29-21.00.log.html
21:33 <mattoliver> Thanks timburke__
21:33 <mattoliver> Time for breakfast 😀
21:36 <zaitcev> To hang in virtually is my plan.
21:44 <reid_g> timburke - checked the logs and it looks like this object got 7/14 timeouts on the commit status of the PUT, so that would be why it wasn't durable.
21:44 <reid_g> This object should be cleaned up at reclaim_age since it didn't get the durable flag?
21:44 <acoles> reid_g: you may be suffering from bug https://bugs.launchpad.net/swift/+bug/1778002 - during rebalance, EC fragments should move from what has become a handoff node to their new primary node. But *non-durable* EC frags wouldn't be moved. A durable frag is identified by a filename with #d in it such as 1234567890.00000#1#d.data whereas 1234567890.00000#1.data is non-durable.
21:45 <acoles> reid_g: yep, I was about to say, the non-durable will eventually be removed after reclaim age has passed.
21:45 <timburke> makes sense. yup! it'll get cleaned up after a reclaim age -- or you could just delete it manually
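[Editor's aside: the cleanup rule being described — a non-durable frag is reclaimable once its timestamp is more than reclaim_age seconds in the past — can be sketched as below. This is an illustration only; the real cleanup lives in Swift's diskfile/auditor code, and reclaim_age is a cluster config option (the common default is one week).]

```python
import time

RECLAIM_AGE = 7 * 24 * 3600  # one week, a common default

def reclaimable(filename, now=None, reclaim_age=RECLAIM_AGE):
    """True if a non-durable .data frag is old enough to be removed."""
    now = time.time() if now is None else now
    stem = filename.rsplit('.data', 1)[0]
    parts = stem.split('#')
    durable = parts[-1] == 'd'       # "#d" frags are kept
    ts = float(parts[0])             # frag timestamp
    return (not durable) and (now - ts > reclaim_age)

# the frag from reid_g's example, checked ten days after it was written:
later = 1632511385.76 + 10 * 24 * 3600
print(reclaimable('1632511385.76093#8.data', now=later))    # -> True
print(reclaimable('1632511385.76093#8#d.data', now=later))  # -> False
```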
21:46 <acoles> or, when you upgrade to wallaby :) the bug was fixed in wallaby i.e. the non-durable gets moved to the primary, but it still isn't made durable unless there's another durable frag.
21:46 <reid_g> Nice, at least we know why we have some things sitting around. I am guessing that the cluster got a bit busy during the rebalance and caused some timeouts.
21:47 <acoles> in the meantime the handoff nags around annoyingly
21:47 <acoles> s/nags/hangs/
21:47 <reid_g> Yeah. We use them as the metric to know when the rebalance is done.
21:49 * acoles heading to bed
21:50 <reid_g> Thanks for the help today! Time to go take care of kid
22:17 <timburke> reid_g, glad to help! pop by again any time you need some help
22:25 <opendevreview> Tim Burke proposed openstack/swift master: sharding: Raise fewer errors when the on-disk files change out from under us  https://review.opendev.org/c/openstack/swift/+/732996

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!