Wednesday, 2021-08-04

03:10 <manuvakery1> timburke_: we are running the following version of swift, and it is consistent across all nodes.
03:10 <manuvakery1> openstack-swift-account.noarch    2.23.1-1.el7    @centos-openstack-train
03:10 <manuvakery1> openstack-swift-container.noarch  2.23.1-1.el7    @centos-openstack-train
03:10 <manuvakery1> openstack-swift-object.noarch     2.23.1-1.el7    @centos-openstack-train
03:22 <manuvakery1> timburke_: sorry, my mistake. we are running swift==2.25.0
04:33 <opendevreview> Pete Zaitcev proposed openstack/swift master: Make the dark data watcher work with sharded containers  https://review.opendev.org/c/openstack/swift/+/787656
05:25 <manuvakery1> timburke_: you are right, one of the storage nodes was running an older version even though we upgraded to swift==2.25.0 via pip; cleanup and reinstall fixed the issue, thanks for the help
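For anyone chasing a similar mixed-version node, a quick cross-check is to ask each node's Python which swift it would actually import; a minimal sketch using the standard pkg_resources API (running it on every node is left to the operator):

    # print the swift version and install location this interpreter sees,
    # to spot nodes that missed the pip upgrade
    import pkg_resources

    dist = pkg_resources.get_distribution('swift')
    print('swift %s from %s' % (dist.version, dist.location))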
07:21 <opendevreview> Matthew Oliver proposed openstack/swift master: conatiner-server: return objects of a given policy  https://review.opendev.org/c/openstack/swift/+/803423
07:23 <mattoliver> ^ that still needs unit tests... but clayg's probe test looks better :P
08:58 *** mabrams is now known as Guest3279
08:58 *** mabrams1 is now known as mabrams
09:49 *** diablo_rojo is now known as Guest3281
10:09 *** Guest3281 is now known as diablo_rojo
18:34 *** diablo_rojo is now known as Guest3316
18:35 *** Guest3316 is now known as diablo_rojo
20:57 <timburke_> almost meeting time!
20:57 *** timburke_ is now known as timburke
20:59 <clayg> 🥳
20:59 <zaitcev> Sounds good.
20:59 <kota> good morning
21:00 <mattoliver> Morning
21:00 <acoles> good evening :)
21:00 <timburke> #startmeeting swift
21:00 <opendevmeet> Meeting started Wed Aug  4 21:00:31 2021 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00 <opendevmeet> The meeting name has been set to 'swift'
21:00 <timburke> who's here for the swift meeting?
21:00 <kota> o/
21:00 <acoles> o/
21:01 <mattoliver> o/
21:01 <timburke> as usual, the agenda's at
21:01 <timburke> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:01 <timburke> sorry, i only *just* got around to updating it
21:01 <timburke> #topic PTG
21:02 <timburke> just a reminder to start populating the etherpad with topics
21:02 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-yoga
21:03 <timburke> i did get around to booking rooms, but i still need to add them to the etherpad, too
21:03 <timburke> i decided to split times again, to try to make sure everyone has some time where they're likely to be well-rested ;-)
21:04 <timburke> any questions on the PTG?
21:05 <mattoliver> Not yet, let's fill out the etherpad and have some good discussions :)
21:05 <timburke> agreed :-)
21:06 <timburke> #topic expirer can delete data with inconsistent x-delete-at values
21:07 <timburke> so i've got some users that are using the expirer pretty heavily, and i *think* i've seen an old bug
21:07 <timburke> #link https://bugs.launchpad.net/swift/+bug/1182628
21:09 <timburke> basically, there's a POST at t1 to mark an object to expire at t5, then another POST at t2 to have it expire at t10. if replication doesn't get rid of all the t1 .metas, and the t5 expirer queue entry is still hanging around, we'll delete data despite getting back 412s
21:10 <timburke> on top of that, since the data got deleted, the t10 expiration fails with 412s and hangs around until a reclaim_age passes
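A toy model of the failure timburke describes (illustrative Python, not Swift's actual expirer code; the node states and status codes are made up to match the bug report):

    # three object servers; the t2 POST (x-delete-at=10) missed one node,
    # which still holds the t1 .meta (x-delete-at=5)
    nodes = [{'x-delete-at': 10}, {'x-delete-at': 10}, {'x-delete-at': 5}]

    def delete_if_match(node, if_delete_at):
        # the object server honors the DELETE only when X-If-Delete-At matches
        if node['x-delete-at'] == if_delete_at:
            node['tombstone'] = True   # data is gone on this node
            return 204
        return 412

    print([delete_if_match(n, 5) for n in nodes])  # -> [412, 412, 204]
    # the lone 204 put the newest timestamp on disk, so replication spreads
    # the tombstone cluster-wide; meanwhile the t10 queue entry keeps
    # getting 412s until a reclaim_age passes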
21:10 <clayg> timburke: do you have any evidence we've hit expiry times getting [412, 412, 204] because it expired before the updated POST was fully replicated?
21:11 <zaitcev> Ugh
21:11 <clayg> any chance we could reap the t10 delete row based on the x-timestamp coming off the 404/412 response?
21:11 <zaitcev> I can see telling them not to count on POST doing the job, but the internal inconsistency is just bad no matter what.
21:11 <clayg> also I think there's an inline attempt (maybe even going to async) to clean up the t5 if the POST at t2 happens to notice it
21:12 <timburke> i've seen the .ts as of somewhere around the start of July, and the expirer kicking back 412s starting around mid-July. i haven't dug into the logs enough to see *exactly* what happened when, but knowing my users it seems likely that they wanted the later expiration time
21:14 <timburke> clayg, there is, but only if the t1 .meta is present on whichever servers get the t2 POST
21:14 <clayg> 👍
21:16 <timburke> i *think* it'd be reasonable to reap the t10 queue entry based on the t5 tombstone being newer than the t2 enqueue-time. but it also seems preferable to avoid the delete until we actually know that we want to delete it
21:17 <timburke> 'cause i *also* think it'd be reasonable to reap the t5 queue entry based on a 412 that indicates the presence of a t2 .meta (since it's greater than the t1 enqueue-time)
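Both reap rules fit one predicate; a sketch with made-up names (this is not the expirer's real API), where the common idea is that a response carrying a timestamp newer than the row's enqueue-time supersedes the row:

    def should_reap_queue_entry(enqueue_ts, status, response_ts):
        # reap when the object servers report something newer than the
        # write that created this queue row: a tombstone (404) or a
        # conflicting .meta (412) with a later timestamp supersedes it
        if status in (404, 412):
            return response_ts > enqueue_ts
        return False

    # stale t5 row: the 412 carries the t2 .meta's timestamp, and t2 > t1
    assert should_reap_queue_entry(enqueue_ts=1, status=412, response_ts=2)
    # orphaned t10 row: the t5 tombstone is newer than the t2 enqueue-time
    assert should_reap_queue_entry(enqueue_ts=2, status=404, response_ts=5)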
21:18 <timburke> anyway, i've got a failing probe test at
21:18 <timburke> #link https://review.opendev.org/c/openstack/swift/+/803406
21:19 <mattoliver> Great start
21:20 <timburke> it gets a pretty good split-brain going, with 4 replicas of an object, two with one delete-at time, two with another, and queue entries for both
21:21 <clayg> noice
21:21 <clayg> love the idea of getting those queue entries cleaned up if we can do it in a way that makes sense 👍
21:21 <timburke> i'm also starting to work on a fix for it that makes DELETE with X-If-Delete-At look a lot like a PUT with If-None-Match -- but it seems like it may get hairy. will keep y'all updated
21:22 <clayg> timburke: just do the HEAD and DELETE to start - then make it fancy
21:22 <clayg> "someday"
21:22 <clayg> (also consider maybe not making it fancy if we can avoid it)
21:23 <timburke> it'd have to be a HEAD with X-Newest, though -- which seems like a pretty sizable request amplification for the expirer :-(
21:24 <clayg> it doesn't *have* to be x-newest - you could just use direct client and get all the primaries in concert
21:24 <clayg> the idea is you can't make the delete unless everyone already has a matching x-delete-at
21:26 <timburke> i'll think about it -- seems like i'd have to reinvent a decent bit of best_response, though
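For reference, a rough sketch of clayg's suggestion using swift's direct_client helpers (the function name is invented; passing X-Backend-Replication so the HEAD can still see an object already past its x-delete-at is an assumption here; real code would also need the best_response-style error handling timburke mentions):

    from swift.common.ring import Ring
    from swift.common.direct_client import (
        direct_head_object, direct_delete_object)
    from swift.common.exceptions import ClientException

    def delete_if_all_primaries_match(account, container, obj, delete_at,
                                      swift_dir='/etc/swift'):
        ring = Ring(swift_dir, ring_name='object')
        part, nodes = ring.get_nodes(account, container, obj)
        for node in nodes:
            try:
                headers = direct_head_object(
                    node, part, account, container, obj,
                    headers={'X-Backend-Replication': 'True'})
            except ClientException:
                return False  # node down or object missing: retry later
            if headers.get('X-Delete-At') != str(delete_at):
                return False  # some primary knows about a different expiry
        # only once every primary agrees is it safe to remove the data
        for node in nodes:
            direct_delete_object(node, part, account, container, obj)
        return True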
21:26 <timburke> next up
21:26 <timburke> #topic last primary table
21:27 <timburke> this came up at the last PTG, and i already see it's a topic for the next one (thanks mattoliver!)
21:27 <timburke> mattoliver even already put a patch together
21:27 <timburke> #link https://review.opendev.org/c/openstack/swift/+/790550
21:28 <mattoliver> Thanks for the reviews lately timburke
21:28 <timburke> i just wanted to say that i'm excited about this idea -- it seems like the sort of thing that can improve both client-observable behaviors and replication
21:29 <zaitcev> Very interesting. That for_read, is it something the proxy can use too?
21:29 <zaitcev> Right, I see. Both.
21:30 <zaitcev> So, where's the catch? How big is that array for an 18-bit ring with 260,000 partitions?
21:30 <mattoliver> In my limited testing, and as you can see in its follow-ups, it makes post-rebalance reconstruction faster and less CPU-bound.
21:30 <timburke> basically, it's an extra replica's worth of storage/ram
21:30 <mattoliver> Yeah, so your ring grows an extra replica, basically.
21:31 <kota> interesting
21:32 <timburke> and with proxy plumbing, you can rebalance 2-replica (or even 1-replica!) policies without so much risk of unavailability
21:34 <mattoliver> Also means post-rebalance we can take last primaries into account and get built-in handoffs first (or at least for last primaries). Which is why I'm playing with the reconstructor as a follow-up.
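To put a number on zaitcev's question, assuming the table is laid out like the ring's internal replica-to-partition-to-device rows (one array('H') device id per partition, which is what the ring itself uses):

    from array import array

    part_power = 18
    partitions = 2 ** part_power          # 262,144 partitions
    row = array('H', [0] * partitions)    # one "last primary" id per replica
    print(len(row) * row.itemsize)        # 524288 bytes = 512 KiB per replica
    # a 3-replica ring carries three such rows, so roughly 1.5 MiB more:
    # "an extra replica's worth" of ring RAM, as timburke says above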
21:35 <timburke> 👍
21:35 <timburke> #topic open discussion
21:35 <timburke> those were the main things i wanted to bring up; what else should we talk about this week?
21:36 <clayg> There was that ML thread about x-delete-at values in the past breaking EC rebalance because of old swift bugs.
21:37 <clayg> Moral: upgrade!
21:41 <timburke> zaitcev, looks like you took the DNM off https://review.opendev.org/c/openstack/swift/+/802138 -- want me to find some time to review it?
21:42 <mattoliver> Just an update: I had a meeting with our SREs re the request-tracing patches, and they suggested some improvements they'd find useful. next I plan to do some benchmarks to see if it affects anything before I move forward on it.
21:42 <timburke> mattoliver and acoles, how are we feeling about the shard/storage-policy-index patch? https://review.opendev.org/c/openstack/swift/+/800748
21:43 <timburke> nice!
21:44 <zaitcev> timburke: I think it should be okay.
21:45 <acoles> timburke: I need to look at the policy index patch again since mattoliver last updated it
21:45 <zaitcev> timburke: Some of the "zero" tests belong in system reader. I was thinking about factoring them out, but haven't done that. It's not hurting anything except my sense of symmetry.
21:45 <mattoliver> I moved the migration into the replicator, so there is a bit of new code there, but it means we can migrate shard SPIs before the enqueued reconciler (though it still happens in the sharder too). So take a look and we can decide where to do it.. or maybe both.
21:46 <zaitcev> I wish people looked at this though... it's the first step of the mitigation for stuck updates: https://review.opendev.org/c/openstack/swift/+/743797
21:48 <zaitcev> I'll take a look at 790550 and 803406.
21:49 <mattoliver> There is then a follow-up to the shard SPI migration, and that's to get shard containers to respond to GETs with the policy supplied to them (if one is supplied), so a longer-tail SPI migration doesn't affect root container GETs. A shard is an extension of its root, and happily takes objects with a different SPI (supplied by the root), so it makes sense that it should return them on GET too.
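A toy illustration of that follow-up (plain Python, not the container server's actual code; the rows and policy indexes are invented): the shard filters its listing on the storage policy index supplied with the request, defaulting to its own:

    def shard_listing(rows, own_spi, requested_spi=None):
        # rows are (object_name, storage_policy_index) pairs
        spi = own_spi if requested_spi is None else requested_spi
        return [name for name, row_spi in rows if row_spi == spi]

    rows = [('a', 0), ('b', 1), ('c', 0)]
    print(shard_listing(rows, own_spi=0))                   # ['a', 'c']
    print(shard_listing(rows, own_spi=0, requested_spi=1))  # ['b'] mid-migration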
21:50 <acoles> mattoliver: does that need to be a follow-up? does it depend on the other change?
21:50 <mattoliver> No it doesn't.. but I wanted to test it with the probe test clayg wrote :)
21:51 <acoles> oic
21:51 <mattoliver> So I can move it off there :) the follow-up still needs tests, so I will do that today. I could always steal clayg's probe test and change it for this case :)
21:51 <acoles> I'll try to catch up on those patches tomorrow
21:51 <mattoliver> It was just useful while writing :)
21:52 <acoles> clayg has a habit of being useful :)
21:53 <clayg> 😊
21:53 <timburke> zaitcev, i'll take a look at https://review.opendev.org/c/openstack/swift/+/743797 -- you're probably right that we're in no worse a situation than we already were. i might push up a follow-up to quarantine dbs with hash mismatches
21:55 <timburke> all right, we're about at time
21:55 <timburke> thank you all for coming, and thank you for working on swift!
21:55 <timburke> #endmeeting
21:55 <opendevmeet> Meeting ended Wed Aug  4 21:55:57 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
21:55 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/swift/2021/swift.2021-08-04-21.00.html
21:55 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/swift/2021/swift.2021-08-04-21.00.txt
21:55 <opendevmeet> Log:            https://meetings.opendev.org/meetings/swift/2021/swift.2021-08-04-21.00.log.html

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!