Thursday, 2020-08-13

*** openstackgerrit has joined #openstack-swift00:21
openstackgerritMerged openstack/swift master: s3api: Use swift.backend_path to proxy-log s3api requests  https://review.opendev.org/735221 00:21
*** gyee has quit IRC00:21
mattoliverauI've been doing some more audit thinking, and had a bit of a brainwave, or at least another way to look at the problem. Don't have answers yet, but if you need some bedtime reading to help put you to sleep then here is a mattoliverau braindump, with pictures (because it helps me visualise): https://docs.google.com/document/d/1y6NAcwbNKG0zWMGedfWm-ul3hhI_XWv63MjM5ecrIm8/edit?usp=sharing 00:25
*** rcernin has quit IRC01:11
*** rcernin has joined #openstack-swift01:11
*** rcernin has quit IRC03:06
zaitcevInteresting.03:17
*** psachin has joined #openstack-swift03:31
*** rcernin has joined #openstack-swift03:52
*** evrardjp has joined #openstack-swift04:33
*** TViernion has quit IRC05:14
*** TViernion has joined #openstack-swift05:17
*** dsariel has joined #openstack-swift06:17
*** gtema has joined #openstack-swift06:22
*** gtema_ has joined #openstack-swift06:24
*** gtema has quit IRC06:28
mattoliverauHere's a dodgy graphviz hack to add a -g option to manage shard ranges:  http://paste.openstack.org/show/796804/ 07:13
*** mikecmpbll has joined #openstack-swift07:39
*** zaitcev has quit IRC09:01
*** rcernin has quit IRC09:05
*** zaitcev has joined #openstack-swift09:15
*** ChanServ sets mode: +v zaitcev09:15
*** rcernin has joined #openstack-swift09:57
*** rcernin has quit IRC10:40
*** rcernin has joined #openstack-swift10:47
*** rcernin has quit IRC11:00
*** gtema_ has quit IRC11:11
*** gtema has joined #openstack-swift11:37
*** tkajinam has quit IRC13:32
claygmattoliverau: that's a GREAT way to think about auditing shard ranges!  As long as "get rid of shard point/range" means "shrink it somewhere" I LOVE this15:56
claygI don't know much about graphviz, but I'm guessing if I run `render` with `view=True` in my vsaio it's not going to open a picture on my host15:57
claygI also haven't written enough graph theory code to make stuff that is "obvious when visualized" also be "obvious when written as code" 🤔15:59
timburkegood morning16:08
*** manuvakery has joined #openstack-swift16:11
mattoliveraunah, but it will leave a .svg in the same folder you ran it from. I don't know about it either, before today :P (re:graphviz)16:20
mattoliverauyeah, I mean we dump the good ranges into the bad shard's shard range table (once we know) and it should cleave and delete itself into the right path, so no, not just getting rid of them16:21
mattoliverauclayg: also thanks. it just struck me, so needed to write it down, glad it makes some kinda sense :P16:22
mattoliveraufor a graphviz in your saio, you'd need to install graphviz and pip install graphviz16:23
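As a rough illustration of the graphviz discussion above, here is a minimal sketch that only builds DOT source text, so it needs neither the graphviz binaries nor the Python bindings. The shard range names and bounds are made up, and modelling boundaries as nodes and ranges as edges is just one possible choice, not necessarily what the pasted hack does.

```python
# Hypothetical shard ranges as (name, lower, upper); '' means unbounded.
SHARD_RANGES = [
    ("shard-a", "", "m"),
    ("shard-b", "m", "t"),
    ("shard-c", "t", ""),
]


def shard_ranges_to_dot(ranges):
    """Return DOT source: one node per boundary, one labelled edge per range."""
    lines = ["digraph shard_ranges {", "  rankdir=LR;"]
    for name, lower, upper in ranges:
        src = lower or "MIN"  # placeholder node for the unbounded lower end
        dst = upper or "MAX"  # placeholder node for the unbounded upper end
        lines.append('  "%s" -> "%s" [label="%s"];' % (src, dst, name))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(shard_ranges_to_dot(SHARD_RANGES))
```

Saving the output to a file and running `dot -Tsvg ranges.dot -o ranges.svg` on a machine with graphviz installed would produce an image without trying to open a viewer. A gap or overlap between ranges would show up as a broken or forked path.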
*** psachin has quit IRC16:25
*** gyee has joined #openstack-swift16:40
timburkemattoliverau, what are you still doing up!?16:52
*** gtema has left #openstack-swift16:53
timburke(or, *were*, hopefully. sleep well!)16:53
*** aluria has quit IRC16:54
timburkeclayg, thinking about https://review.opendev.org/#/c/744942/ -- what would we expect for something like a 5 replica policy? should [Timeout, Timeout, Timeout, 404, 404] return 503, or 404?16:55
patchbotpatch 744942 - swift - WIP: Client should retry primary quroum of errors - 1 patch set16:55
mattoliverauI'm attending a virtual conference (netdev 0x14).. because it's virtual so I can attend for the hell of it and be a fly on the wall :)16:55
mattoliverauso many late nights over the next week or so16:55
*** aluria has joined #openstack-swift16:56
timburkeah -- enjoy! hopefully the kids let you sleep later ;-)16:57
timburkeclayg, or more likely to come up: what about EC? i feel like [Timeout] * ndata + [404] * nparity is still enough 404s to respond 404...17:00
timburkemaybe a config option like `ignore_rebalance_404s = 1` (default), and if you're trying to move parts more aggressively, you can bump it up?17:02
timburke🤔 that also resolves issues with even numbers of replicas -- if you've got a 4-replica policy and [Timeout, Timeout, 404, 404], you've got *both* a quorum of errors and a quorum of 404s17:09
timburkei'm not sure "quorum" is really the right way to think about the problem -- "expected 404s with no timestamps because of a rebalance" is17:09
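To make the even-replica point concrete, here is a small sketch that counts errors and 404s against a quorum. It assumes a quorum of `(n + 1) // 2`, which is how I understand Swift's replica quorum to be computed, but treat that formula and the `classify` helper as assumptions for illustration only.

```python
def quorum_size(n):
    # Assumption: half of the replicas, rounded up.
    return (n + 1) // 2


def classify(responses):
    """Report whether errors and/or 404s reach a quorum.

    Each response is the string "Timeout" or an integer HTTP status.
    """
    quorum = quorum_size(len(responses))
    errors = sum(
        1 for r in responses
        if r == "Timeout" or (isinstance(r, int) and r >= 500)
    )
    not_found = sum(1 for r in responses if r == 404)
    return {
        "quorum": quorum,
        "errors_quorum": errors >= quorum,
        "404_quorum": not_found >= quorum,
    }


if __name__ == "__main__":
    print(classify(["Timeout", "Timeout", 404, 404]))
    print(classify(["Timeout", "Timeout", "Timeout", 404, 404]))
```

Under that assumption the 4-replica case satisfies both quorums at once, which is why counting quorums alone cannot pick between 404 and 503 there.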
claygTIL https://aiosqlite.omnilib.dev/en/latest/?badge=latest 17:11
claygtimburke: [timeout, timeout, 404, 404] is a good one to think about17:14
claygare you able to successfully differentiate a "primary 404 with no timestamp" from a "primary has a tombstone"?17:14
claygthe tombstone is definitely authoritative -  if we don't have timestamps and "more primaries error'd than didn't" ... maybe that's a good enough reason to ask the client to try again!17:15
claygI certainly believe it to be the case an object could be "out there somewhere" even if we have more than one "404 w/o timestamp" because of over-aggressive rebalance - a config option with a default of 1 might be useful if you could bump to 2 when shit hits the fan17:17
timburkeyeah, we should be able to look at headers to figure out if it was a tombstone or no record17:25
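One way to read the idea so far, as a toy sketch rather than what the patch actually does: a 404 carrying a timestamp header is treated as an authoritative tombstone, while timestamp-less 404s only count once there are more of them than `ignore_rebalance_404s`. The `Resp` tuple, the `best_status` helper and the use of `X-Backend-Timestamp` as the tombstone marker are assumptions for illustration. Only 404s and errors are modelled; successful responses are out of scope.

```python
from collections import namedtuple

# Hypothetical stand-in for a backend response; status None means a timeout.
Resp = namedtuple("Resp", "status headers")
TIMEOUT = Resp(None, {})


def is_tombstone_404(resp):
    # Assumption: a tombstone 404 carries a backend timestamp header.
    return resp.status == 404 and "X-Backend-Timestamp" in resp.headers


def best_status(responses, ignore_rebalance_404s=1):
    """Pick 404 or 503 for a set of primary responses (404s and errors only)."""
    if any(is_tombstone_404(r) for r in responses):
        return 404  # a tombstone is authoritative
    errors = sum(
        1 for r in responses if r.status is None or r.status >= 500
    )
    if errors == 0:
        return 404  # every primary answered and nobody has the object
    bare_404s = sum(1 for r in responses if r.status == 404)
    if bare_404s > ignore_rebalance_404s:
        return 404  # more 404s than a rebalance is expected to explain
    return 503  # too few trustworthy 404s; ask the client to retry


if __name__ == "__main__":
    print(best_status([TIMEOUT, TIMEOUT, Resp(404, {})]))
    print(best_status([TIMEOUT] * 3 + [Resp(404, {})] * 2))
```

With the default of 1, [Timeout, Timeout, 404] yields 503 and [Timeout, Timeout, Timeout, 404, 404] yields 404; raising the option to 2 turns the latter into a 503 as well.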
timburkeso far, we've mostly been thinking about GET/HEAD -- what about writes? i'm thinking the behavior may need to be different between DELETE (where we'll write something down) and DELETE (where we won't)17:26
timburke[Timeout, Timeout, 404] on DELETE makes me think of S3's behavior...17:27
claygthere's only one kind of DELETE we always write something down - no one has complained about DELETE response codes (yet?)17:36
*** manuvakery has quit IRC18:21
timburkebah, i'd meant DELETE vs. POST18:25
*** openstackgerrit has quit IRC19:02
claygso with a post a [timeout, timeout, 404] would currently... 503 probably, and that's... i don't think anyone has complained about POST either19:33
*** mikecmpbll has quit IRC21:26
*** rcernin has joined #openstack-swift22:00
*** patchbot has quit IRC22:02
*** tkajinam has joined #openstack-swift22:58
*** openstackgerrit has joined #openstack-swift23:25
openstackgerritTim Burke proposed openstack/swift master: Client should retry when there's just one 404 and a bunch of errors  https://review.opendev.org/744942 23:25
timburkepretty sure [Timeout, Timeout, 404] on POST will 404 on master. [500, 500, 404] would (probably?) 503. i think i like how ^^^ is shaping up; either of those would 503 now, which seems to send the right signal to the client23:30
