Friday, 2021-10-08

12:17 <reid_g> Tried out the handoffs_only setting... very nice
16:00 <reid_g> In handoffs_only mode, if a disk fails, will handoffs for missing fragments be created?
16:02 <timburke__> reid_g, new writes may land on handoffs, but the data that was on the failed disk *will not* get rebuilt elsewhere (whether it was a primary or handoff location)
16:03 <timburke__> so it's the sort of thing you turn on during the massive rebalances that happen during an expansion, then turn off again as quickly as you can so you can ensure full durability
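For reference, the setting being discussed is a reconstructor option; a minimal ini sketch of where it typically lives in object-server.conf (assuming a stock deployment layout):

```ini
# object-server.conf -- sketch of the setting under discussion
[object-reconstructor]
# only revert fragments from handoffs back to their primaries; normal
# rebuild work is skipped, so leave this off outside of big rebalances
handoffs_only = True
```

As tim notes above, this trades durability work for rebalance speed, so it is meant to be temporary.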
16:07 <reid_g> Alright. So in normal operation, a failed disk will cause handoffs to be created for EC, for both new writes and stored data.
16:10 <timburke__> yup -- though the rebuilding-to-a-handoff is a fairly recent addition. for a while, the assumption was that the failed disk would get removed from the ring fairly quickly, and we'd only rebuild to the new primary. i should double-check when we added that...
16:11 <reid_g> Is that recent for EC only, or REP also?
16:13 <reid_g> Yeah, would be nice to know
16:37 <timburke__> got the rebuild-to-handoffs behavior back in stein: https://github.com/openstack/swift/blob/master/CHANGELOG#L863-L871
16:37 <timburke__> had it for replicated policies for a good long while
16:39 <reid_g> So when a swift drive fails: replication policy = create handoff copies of the missing data? EC policy (stein+) = reconstruct the missing fragments on handoffs?
16:40 <reid_g> And when the drive is replaced: replication / EC (stein+) = move the data back to the primary; EC otherwise = reconstruct the missing fragments on the new primary.
16:41 <timburke__> yup -- you'll want to unmount the failed drive as soon as you notice the failure. that'll cause the object server to start responding 507 to REPLICATE requests, and the replicator/reconstructor will use handoffs to ensure full durability
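The 507 behavior follows from the object server's mount check: once the drive directory is no longer a mount point, requests for that drive fail. A rough Python sketch of the idea (the function name and layout here are mine, not swift's actual code):

```python
import os
import tempfile

def drive_is_usable(devices_root, drive, mount_check=True):
    """Loose sketch of the per-request check swift's object server applies:
    with mount_check enabled, the drive directory must be a real mount
    point, otherwise the server answers 507 Insufficient Storage."""
    path = os.path.join(devices_root, drive)
    if mount_check:
        return os.path.ismount(path)
    return os.path.isdir(path)

devices_root = tempfile.mkdtemp()           # stand-in for /srv/node
os.mkdir(os.path.join(devices_root, 'd1'))  # plain dir, not a mount point

status = 200 if drive_is_usable(devices_root, 'd1') else 507
print(status)  # 507 -- an unmounted drive directory fails the mount check
```

Those 507s are what tell the replicator/reconstructor to stop counting that drive and push data to handoffs instead.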
16:43 <timburke__> hands in the DC go swap out drives. SRE gets a new FS on the replacement, mounts it in the old location, then swift works to fill it back up
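The swap workflow tim describes can be sketched as a few shell steps; device and mount-point names here are hypothetical, adjust for your layout:

```shell
# hypothetical names: failed drive mounted at /srv/node/d42, new disk /dev/sdx
umount /srv/node/d42            # object server now 507s for d42; handoffs kick in
# ...hands in the DC swap the physical drive...
mkfs.xfs /dev/sdx               # fresh filesystem on the replacement
mount -o noatime /dev/sdx /srv/node/d42
chown swift:swift /srv/node/d42
# no ring changes needed: the replicator/reconstructor refills the disk
```

Because the drive keeps its place in the ring, no rebalance is required; swift repopulates it on subsequent replication/reconstruction passes.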
16:48 <timburke__> note that sometimes drives fail in subtle ways -- they start to get just a *little* FS corruption, maybe you see ENODATA tracebacks in logs. it's not enough for SMART to indicate anything too fishy, and you can still do a healthcheck with a full PUT/POST/GET/DELETE of some well-known object through the object-server
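The direct-to-object-server healthcheck mentioned here can be sketched roughly as below; the drive, partition, account/container/object names, and port are all illustrative, and the object server speaks swift's internal HTTP API, so an X-Timestamp header is required:

```shell
# illustrative names: drive d1, partition 0, object-server on :6200
curl -i -X PUT "http://127.0.0.1:6200/d1/0/AUTH_test/health/canary" \
     -H "X-Timestamp: $(date +%s.%N)" -H "Content-Type: text/plain" \
     --data-binary 'healthcheck'
curl -i "http://127.0.0.1:6200/d1/0/AUTH_test/health/canary"
curl -i -X DELETE "http://127.0.0.1:6200/d1/0/AUTH_test/health/canary" \
     -H "X-Timestamp: $(date +%s.%N)"
```

The point of the check is that a subtly failing drive may still pass this round trip even while throwing ENODATA on older data.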
16:50 <timburke__> i'm ambivalent about how to handle those cases -- on the one hand, it's tempting to keep the disk in the ring but drop its weight to zero and drain off whatever data you can still read. on the other, the drive seems to be having trouble; how much should we really trust *anything* that's still on there?
16:52 <timburke__> if the cluster's generally fairly healthy, i'd lean toward just unmounting the disk and letting the other replicas/frags get it back to full durability. if it's *not* doing great, i start to worry about whether that failing drive is my last backstop against data loss
17:14 <reid_g> Yeah, they are normally healthy
17:24 <reid_g> Our normal process now is to get the list of failed disks in the morning and have them replaced by end of day, usually.

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!