Tuesday, 2019-01-22

*** tkajinam has joined #openstack-swift  [00:02]
*** psachin has joined #openstack-swift  [02:43]
*** gkadam has joined #openstack-swift  [03:19]
*** rcernin has quit IRC  [03:35]
*** baojg has joined #openstack-swift  [03:48]
*** rcernin has joined #openstack-swift  [03:50]
*** rcernin has quit IRC  [03:57]
*** rcernin has joined #openstack-swift  [03:58]
*** baojg has quit IRC  [04:12]
*** baojg has joined #openstack-swift  [04:18]
*** gkadam has quit IRC  [04:21]
*** baojg has quit IRC  [04:23]
*** godog has quit IRC  [04:26]
*** baojg has joined #openstack-swift  [05:11]
*** tkajinam_ has joined #openstack-swift  [05:16]
*** tkajinam has quit IRC  [05:18]
*** baojg has quit IRC  [05:22]
*** baojg has joined #openstack-swift  [05:23]
*** tkajinam__ has joined #openstack-swift  [05:25]
*** tkajinam_ has quit IRC  [05:27]
*** e0ne has joined #openstack-swift  [05:52]
*** e0ne has quit IRC  [05:53]
*** tkajinam_ has joined #openstack-swift  [06:09]
*** tkajinam__ has quit IRC  [06:12]
*** spsurya has joined #openstack-swift  [06:21]
*** ccamacho has joined #openstack-swift  [07:12]
*** baojg has quit IRC  [07:36]
*** pcaruana has joined #openstack-swift  [07:41]
*** gkadam has joined #openstack-swift  [07:51]
*** baojg has joined #openstack-swift  [08:03]
*** tkajinam_ has quit IRC  [08:12]
*** godog has joined #openstack-swift  [08:18]
*** rcernin has quit IRC  [08:56]
*** mikecmpbll has quit IRC  [09:04]
*** hseipp has joined #openstack-swift  [09:05]
*** mikecmpbll has joined #openstack-swift  [09:18]
*** baojg has quit IRC  [09:57]
*** baojg has joined #openstack-swift  [09:58]
*** baojg has quit IRC  [10:09]
*** e0ne has joined #openstack-swift  [10:24]
*** mahatic has joined #openstack-swift  [10:51]
*** ChanServ sets mode: +v mahatic  [10:51]
*** mvkr has quit IRC  [11:34]
*** mvkr has joined #openstack-swift  [12:06]
*** e0ne has quit IRC  [12:25]
*** e0ne has joined #openstack-swift  [12:30]
*** baojg has joined #openstack-swift  [13:00]
*** psachin has quit IRC  [13:16]
*** e0ne has quit IRC  [14:05]
*** e0ne has joined #openstack-swift  [14:08]
*** openstackgerrit has joined #openstack-swift  [15:24]
<openstackgerrit> Thiago da Silva proposed openstack/swift master: Remove duplicate statement  https://review.openstack.org/632486  [15:24]
*** ccamacho has quit IRC  [15:37]
*** ccamacho has joined #openstack-swift  [15:37]
*** ybunker has joined #openstack-swift  [15:46]
<ybunker> hi all, quick question.. I have a swift cluster and some of the obj drives are getting more used than others; for example, some are at 95% used space and others at 82%, on the same node and also on different nodes.. the weights in the object ring are the same for those drives.. any ideas what could be going on here? also, is there a way to stop "storing" data on those 95%-used disks?  [15:48]
*** openstackgerrit has quit IRC  [15:51]
*** ianychoi has joined #openstack-swift  [16:26]
*** ccamacho has quit IRC  [16:33]
*** e0ne has quit IRC  [16:38]
*** e0ne has joined #openstack-swift  [16:39]
*** hseipp has quit IRC  [16:42]
*** pcaruana has quit IRC  [17:02]
*** e0ne has quit IRC  [17:02]
<DHE> do you have a sane number of placement groups? sounds like you may have too few  [17:06]
<DHE> and no, you can't just stop using certain drives. swift needs to be able to consistently predict where an object is located by name alone  [17:06]
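DHE's point about name-based placement is the core reason you cannot blacklist individual disks. A minimal sketch of ring-style lookup follows (this is not Swift's actual implementation; the salt and part power are made-up illustration values), showing why every node derives the same partition from nothing but the object's path:

```python
# Minimal sketch of ring-style, name-based placement (NOT Swift's actual code;
# HASH_PATH_SUFFIX and PART_POWER are made-up illustration values). Every node
# hashes the object path the same way, so placement is fully determined by the
# name and cannot be overridden per disk.
from hashlib import md5

HASH_PATH_SUFFIX = b'changeme'   # per-cluster salt (assumption for the demo)
PART_POWER = 13                  # 2**13 = 8192 partitions, as in this cluster

def get_partition(account, container, obj):
    path = ('/%s/%s/%s' % (account, container, obj)).encode('utf-8')
    digest = md5(path + HASH_PATH_SUFFIX).digest()
    # keep only the top PART_POWER bits of the 128-bit digest
    return int.from_bytes(digest, 'big') >> (128 - PART_POWER)

print(get_partition('AUTH_test', 'images', 'cat.jpg'))  # always the same answer
```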
*** ccamacho has joined #openstack-swift  [17:16]
<ybunker> DHE: I have 8192 partitions, with a replica count of 3, 1 region, 8 zones and 72 devices  [17:21]
<DHE> seems okay...  [17:25]
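For what it's worth, the quoted geometry does look sane on paper; a quick back-of-the-envelope check (assuming all 72 devices carry equal weight):

```python
# Rough sanity check of the quoted ring geometry (assumes equal device weights).
partitions = 8192   # partition count from the ring
replicas = 3
devices = 72

parts_per_device = partitions * replicas / devices
print(round(parts_per_device))  # ~341 partition-replicas per device, plenty for an even spread
```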
<ybunker> DHE: don't know where else to look, and every day the used space keeps growing  [17:26]
<DHE> are the overused devices (disks) consistently on the same nodes? or would one node have a mixture of high and low usage disks  [17:28]
<tdasilva> it's also a good idea to double-check that none of your nodes have old rings... swift-recon --md5 will check all rings for you  [17:30]
<DHE> that's a good point...  [17:30]
<ybunker> thanks a lot guys, will check on that  [17:31]
<DHE> I'm assuming 72 disks here, not 72 nodes/hosts/servers  [17:31]
*** mikecmpbll has quit IRC  [17:43]
*** ccamacho has quit IRC  [17:51]
<ybunker> 72 disks yes, with 8 data nodes  [17:52]
<ybunker> the rings are the same on all the nodes, so that's not the problem :(  [17:55]
<timburke> ybunker: sounds like you're getting into a cluster-full situation, which typically sucks :-( even those ~80% full drives aren't going to be super happy; fwiw my usual recommendation is to try to keep drives under ~75% full  [18:00]
<ybunker> we are planning to add 4x more data nodes.. but it will take at least a month... :S  [18:01]
<timburke> ...can we delete some data?  [18:01]
*** pcaruana has joined #openstack-swift  [18:01]
<ybunker> two data nodes were at 60~70% used, so i changed the weights of those nodes so more data balances there.. but instead of the other nodes freeing up a little space, they just grow and grow :(  [18:02]
<timburke> the core trouble is that those full drives are going to start responding 507, even for a lot of replication requests, which means that the remaining drives will fill up *even more quickly*, and you'll probably get some super-replicated data  [18:03]
<timburke> if you're confident that your drives are healthy and unlikely to fail in the next couple months (to give you time to not only get the new hardware in place but also get replication to settle), you might want to look at the handoffs_first and handoff_delete options.... i'd feel much more comfortable recommending them if you already had the new hardware in place, though, and just needed to make replication go faster  [18:05]
<ybunker> the thing is that the cluster has millions and millions of images.. hmm, at some point is it possible to delete some of the replicas?  [18:05]
<timburke> see https://github.com/openstack/swift/blob/2.20.0/etc/object-server.conf-sample#L279-L296 for the config options  [18:06]
<ybunker> timburke: thanks a lot, let me take a look at that  [18:06]
<ybunker> timburke: are those options available in the juno release? 2.2.0?  [18:07]
<timburke> you could reduce the replica count for the ring... but it'll come at a cost to durability, and probably wouldn't be a quick fix. i wouldn't recommend it unless you already know you want a two-replica policy or something  [18:08]
<timburke> should go back fairly far... but then, juno's pretty old... lemme see...  [18:08]
<timburke> looks like you're good: https://github.com/openstack/swift/commit/e078dc3da05ce9e7c2b36e05686d28101381eec8  [18:09]
<timburke> (missing sample config got added in 1.13.0)  [18:10]
<ybunker> thanks :), so handoffs_first should be changed to True and handoff_delete left at auto  [18:11]
<ybunker> oh sorry, to 2  [18:12]
<timburke> probably? you'll definitely want to have handoffs_first=true when rebalancing... and yeah, handoff_delete=2 seems not-crazy  [18:12]
<timburke> once the new hardware's in place and you've had a few good replication cycles, you'll want to take those back to the defaults  [18:13]
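For reference, the two options being discussed live in the [object-replicator] section of object-server.conf. A minimal sketch of the temporary settings suggested above, run through configparser just to show the keys (these are emergency values, not defaults, and per timburke should be reverted once the new hardware is in and replication has settled):

```python
# Hedged sketch of the temporary replicator settings discussed above.
import configparser

SNIPPET = """
[object-replicator]
# work handoff partitions first while the cluster is critically full
handoffs_first = True
# let a handoff copy be deleted once 2 of the 3 replicas are confirmed
handoff_delete = 2
"""

conf = configparser.ConfigParser()
conf.read_string(SNIPPET)
print(dict(conf['object-replicator']))  # {'handoffs_first': 'True', 'handoff_delete': '2'}
```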
<ybunker> another problem we've got is that we can't let the object-replicator process run all day; we start the process in a specific window and then stop it, because latency goes through the roof  [18:13]
<ybunker> so the obj-replicator process runs for about 4 hours a day  [18:14]
<timburke> good news is, more disks should definitely help with that  [18:15]
<timburke> are your auditors on the same schedule?  [18:16]
<ybunker> yes  [18:16]
<timburke> makes sense. anything you can do to avoid the disk-thrashing, i'd imagine...  [18:17]
<ybunker> do I need any special configuration on the object-auditor? i just have concurrency = 1, files_per_second = 1, zero_byte_files_per_second = 5 and bytes_per_second = 1000  [18:19]
*** pcaruana has quit IRC  [18:23]
<timburke> seems... ok-ish, i guess? how far does it get in that 4hr window? i feel like with that tuning, you should be able to have them running continuously without really impacting client traffic...  [18:23]
<timburke> i'd be inclined to increase concurrency to # of disks on the node, but that's me  [18:25]
*** gkadam has quit IRC  [18:26]
<timburke> how long is that cycle time? i feel like it must take a while...  [18:26]
<ybunker> on the object-replicator i had concurrency = 2 and replicator_workers = 6  [18:27]
<timburke> yeah, i think i like that one better. the auditor doesn't have the same concurrency/workers split iirc  [18:28]
<timburke> i think they might not even mean the same thing :-(  [18:28]
<timburke> ugh, yeah: https://review.openstack.org/#/c/572571/  [18:29]
<patchbot> patch 572571 - swift - object-auditor: change "concurrency" to "auditor_w... - 1 patch set  [18:29]
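And the auditor knobs mentioned above sit in the [object-auditor] section of the same file. A hedged sketch combining ybunker's current throttles with timburke's suggestion of one auditor process per disk (9 disks per node is an assumption, derived from 72 disks across 8 nodes; the "concurrency" key here is the one patch 572571 proposes renaming to auditor_workers):

```python
# Hedged sketch of the object-auditor tuning under discussion; values are
# illustrative (one process per assumed 9 disks/node), not documented defaults.
import configparser

SNIPPET = """
[object-auditor]
concurrency = 9
files_per_second = 1
bytes_per_second = 1000
zero_byte_files_per_second = 5
"""

conf = configparser.ConfigParser()
conf.read_string(SNIPPET)
print(dict(conf['object-auditor']))
```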
*** ybunker has quit IRC  [18:34]
*** ybunker has joined #openstack-swift  [18:34]
*** ybunker has quit IRC  [18:45]
*** ybunker has joined #openstack-swift  [18:45]
*** openstackgerrit has joined #openstack-swift  [18:45]
<openstackgerrit> Tim Burke proposed openstack/swift master: object-auditor: change "concurrency" to "auditor_workers" in configs  https://review.openstack.org/572571  [18:45]
*** mikecmpbll has joined #openstack-swift  [18:47]
<openstackgerrit> Tim Burke proposed openstack/swift master: object-auditor: change "concurrency" to "auditor_workers" in configs  https://review.openstack.org/572571  [18:53]
*** e0ne has joined #openstack-swift  [19:07]
*** takamatsu has quit IRC  [19:27]
<ybunker> ok so I disable the object-replicator on the nodes that have more capacity  [19:38]
<ybunker> then I flip on handoffs_first and drop handoff_delete to 2 in the object-replicator config on all the nodes, or do i have to change that just on the most-full nodes?  [19:38]
*** e0ne has quit IRC  [19:42]
*** ybunker has quit IRC  [19:43]
<timburke> that first bit sounds a little terrifying. why are we turning the replicator off entirely? as for the config changes, i always find it easier to reason about a cluster when i have configs as uniform as possible across the nodes... i think i'd do that on all of them  [19:46]
*** pcaruana has joined #openstack-swift  [19:50]
*** pcaruana has quit IRC  [20:11]
<DHE> also remember that replication is push based. It seems to me running the replicator is more likely to allow a host to delete objects once it realizes that it is a handoff node and the primaries are all healthy. (is that something a replicator does?)  [20:14]
*** pcaruana has joined #openstack-swift  [20:24]
*** portante has left #openstack-swift  [20:32]
<zaitcev> I tried to make everything less complicated for the container server, and it went very poorly.  [21:01]
<zaitcev> I mean less complicated than my previous patch, which had ShardRange(row[0].decode('utf-8'), row[1:])  [21:02]
<zaitcev> The biggest problem is the code that insists on using the nul character for SQL markers.  [21:03]
<zaitcev> Like... m = x + b'\x00', then sql("SELECT FROM table WHERE name < ?", m)  [21:05]
<zaitcev> There's NO WAY that I can see to use unicode there  [21:05]
<clayg> timburke: thanks for pointing me at p 437523 and p 609843 - those are both good to keep on the radar  [21:13]
<patchbot> https://review.openstack.org/#/c/437523/ - swift - Store version id when copying object to archive - 9 patch sets  [21:13]
<patchbot> https://review.openstack.org/#/c/609843/ - swift - Allow arbitrary UTF-8 strings as delimiters in con... - 2 patch sets  [21:13]
<zaitcev> Does anyone remember what that zero actually does?  [21:14]
* zaitcev pokes mattoliverau  [21:17]
<zaitcev> https://github.com/openstack/swift/blob/master/swift/container/sharder.py#L237  [21:17]
<zaitcev> https://github.com/openstack/swift/blob/master/swift/common/utils.py#L4799  [21:18]
<zaitcev> (the latter is actually bogus in py3, but never mind)  [21:18]
*** pcaruana has quit IRC  [21:19]
*** baojg has joined #openstack-swift  [21:21]
*** baojg has quit IRC  [21:27]
<openstackgerrit> Tim Burke proposed openstack/swift master: Fix socket leak on object-server death  https://review.openstack.org/575254  [21:39]
<timburke> zaitcev: we can't use u'\x00'? the idea is that `name == x` should be included, but no other valid object name after that. though i can't remember now why we didn't use `name <= ?`...  [21:45]
<timburke> why is that last one bogus on py3?  [21:46]
<zaitcev> wait, what  [21:49]
<zaitcev> oh, so a nul is a valid unicode character  [21:49]
<zaitcev> timburke: thanks a lot, I have something to re-think here.  [21:52]
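A toy illustration of the nul-marker trick being discussed (the table and names below are invented; the real code is in the sharder.py and utils.py links above): appending '\x00' to x makes `name < marker` include x itself but exclude every valid name that sorts after it, and a py3 unicode '\x00' behaves the same way as the old b'\x00'.

```python
# Toy demo of the "name < x + '\x00'" marker trick; schema and rows are made up.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE object (name TEXT)')
conn.executemany('INSERT INTO object VALUES (?)',
                 [('aardvark',), ('cat',), ('cat!',), ('catalog',), ('dog',)])

x = 'cat'
marker = x + '\x00'   # nul is a perfectly valid character in a py3 str
rows = conn.execute('SELECT name FROM object WHERE name < ? ORDER BY name',
                    (marker,))
print([r[0] for r in rows])  # ['aardvark', 'cat']: includes x, nothing after it
```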
*** rcernin has joined #openstack-swift  [22:10]
*** rcernin has quit IRC  [22:17]
*** rcernin has joined #openstack-swift  [22:19]
*** lifeless_ is now known as lifeless  [22:34]
*** baojg has joined #openstack-swift  [22:52]
*** tkajinam has joined #openstack-swift  [22:54]
<openstackgerrit> Merged openstack/swift master: Remove duplicate statement  https://review.openstack.org/632486  [23:12]
<timburke> so this is weird. func testing https://review.openstack.org/#/c/575254/ (which needs another patchset; i was dumb in my last one), i put some russian-roulette middleware in my object server pipelines, try to pull down something sizeable, and one of two things happens  [23:14]
<patchbot> patch 575254 - swift - Fix socket leak on object-server death - 3 patch sets  [23:14]
<timburke> either i see three object server deaths and a traceback in the proxy that ends with ShortReadError  [23:14]
<timburke> (which is good, that's the behavior i want)  [23:14]
<timburke> or i see *one* object server death and a traceback coming out of catch_errors that ends with BadResponseLength  [23:15]
<timburke> and i can't seem to figure out where i'm getting a response body file that wouldn't have my ByteCountEnforcer :-(  [23:16]
<timburke> i even tried pushing the wrapping up into utils/request_helpers...  [23:18]
<zaitcev> ugh  [23:59]

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!