Tuesday, 2019-03-05

00:09 *** itlinux has joined #openstack-swift
00:13 *** itlinux_ has joined #openstack-swift
00:15 *** itlinux has quit IRC
00:54 *** rchurch has joined #openstack-swift
00:55 *** rchurch_ has quit IRC
01:01 *** openstackgerrit has joined #openstack-swift
01:01 <openstackgerrit> Merged openstack/swift master: py3: port proxy container controller  https://review.openstack.org/638329
01:01 <openstackgerrit> Merged openstack/swift master: py3: port proxy account controller  https://review.openstack.org/637653
01:10 *** rchurch has quit IRC
01:11 *** rchurch has joined #openstack-swift
01:17 <clayg> i made this - in case you need to know what your async_pendings are up to -> https://gist.github.com/clayg/249c5d3ff580032a0d40751fc3f9f24b
01:19 <supamatt> clayg: forked!
01:37 <openstackgerrit> Tim Burke proposed openstack/swift master: docs: clean up SAIO formatting  https://review.openstack.org/640914
01:39 <openstackgerrit> Merged openstack/python-swiftclient master: Update release to 3.7.0  https://review.openstack.org/640819
02:20 <kota_> good morning
02:27 <openstackgerrit> Merged openstack/swift master: Clean up func tests ahead of py3  https://review.openstack.org/640519
02:39 *** psachin has joined #openstack-swift
02:42 *** chocolate-elvis has joined #openstack-swift
03:06 *** itlinux_ has quit IRC
03:12 *** itlinux has joined #openstack-swift
03:21 *** itlinux has quit IRC
03:25 *** itlinux has joined #openstack-swift
03:54 *** gyee has quit IRC
04:03 *** _david_sohonet has quit IRC
04:53 <kota_> rledisez: I'll be at the office until tonight and online as much as possible, so if you're ready to push a merge commit to the feature/losf branch I'll be able to help at any time in your morning
04:59 <notmyname> tdasilva: thanks for taking care of the changelog. it landed, so I updated https://review.openstack.org/#/c/640549/
04:59 <patchbot> patch 640549 - releases - swiftclient 3.7.0 release - 2 patch sets
04:59 *** chocolate-elvis has quit IRC
05:52 *** pcaruana has joined #openstack-swift
06:07 *** pcaruana has quit IRC
06:21 *** ccamacho has quit IRC
06:24 *** e0ne has joined #openstack-swift
06:33 *** e0ne has quit IRC
06:34 *** itlinux has quit IRC
06:46 *** e0ne has joined #openstack-swift
07:07 *** e0ne has quit IRC
07:31 *** thyrst has left #openstack-swift
07:43 *** hseipp has joined #openstack-swift
07:57 <admin6> Hi team, I'd welcome some help dealing with a full server in my erasure-coding ring. I'm a bit puzzled by the behavior of the object-reconstructor, which only seems to rebalance data onto the server that is already full. I've sharply reduced the weight of all disks on this server, but the rebalance still tries to put most of the data on them.
08:17 *** e0ne has joined #openstack-swift
08:19 *** pcaruana has joined #openstack-swift
08:19 <kota_> admin6: could you try handoffs_only = True mode in object-server.conf? https://github.com/openstack/swift/blob/master/etc/object-server.conf-sample#L353
08:19 <kota_> that will help you move the handoffs from the weight-reduced nodes to the new nodes.
08:20 <kota_> and multiple workers might also help to speed up the transfer, https://github.com/openstack/swift/blob/master/etc/object-server.conf-sample#L333
08:21 <kota_> if your Swift is new enough to have those options.
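(A minimal sketch of what that would look like in /etc/swift/object-server.conf on the full node, assuming the option names from the sample config linked above; handoffs_only should exist in 2.17, while reconstructor_workers is newer and may not be available there:)

    [object-reconstructor]
    # Only revert handoff partitions; turn this back off once the full
    # node has drained, or normal reconstruction will stop running.
    handoffs_only = True
    # Optional, newer Swift only: run several reconstructor worker processes.
    reconstructor_workers = 4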
08:22 *** tkajinam has quit IRC
08:25 *** pcaruana has quit IRC
08:29 *** ccamacho has joined #openstack-swift
08:31 <admin6> kota: thanks a lot for your suggestion. I'm on swift v2.17. I think it should be supported in this version (I'll check). I'll try it and see if it helps.
08:37 *** pcaruana has joined #openstack-swift
08:44 *** pcaruana has quit IRC
08:45 <rledisez> hi kota_. i just arrived at work
08:46 <kota_> rledisez: good morning!
08:47 <kota_> rledisez: did you see my gist for the procedure to get master merged?
08:47 <rledisez> yes, i'll try it right now
08:50 <kota_> b
08:52 <openstackgerrit> Romain LE DISEZ proposed openstack/swift feature/losf: Merge remote-tracking branch 'remotes/origin/master' into merge-master  https://review.openstack.org/640955
08:53 <rledisez> kota_: ^ I had no conflicts
08:53 <kota_> good
08:53 <rledisez> if tests are passing, I'll +2 / +A it
08:54 <kota_> perfect
08:57 *** cschwede has joined #openstack-swift
08:57 *** ChanServ sets mode: +v cschwede
08:58 *** e0ne has quit IRC
09:00 *** e0ne has joined #openstack-swift
09:01 *** pcaruana has joined #openstack-swift
09:03 *** cschwede has quit IRC
09:07 *** cschwede has joined #openstack-swift
09:07 *** ChanServ sets mode: +v cschwede
09:21 <admin6> kota: I'm getting plenty of messages about .lock files, like: object-reconstructor: 10.10.2.55:7000/s05z1ecd04/105712 Unexpected response: ":ERROR: 0 '8 seconds: /srv/node/s05z1ecd04/.lock'"
09:23 <admin6> kota: in the object-server logs, 99+% of the PUTs concern the full server, with the ssync subrequests failing with 507 errors
09:23 <kota_> perhaps you hit https://github.com/openstack/swift/blob/master/etc/object-server.conf-sample#L170 ?
09:24 <kota_> oh my.... it sounds like the destination really is full?
09:24 <admin6> kota: and in the object-reconstructor logs, very few of the "Removing partition" messages concern the full server
09:25 *** cschwede has quit IRC
09:30 <kota_> admin6: no idea yet. To know what's happening, we should check the ring stats and the device usage, then check whether the ring has been managed correctly.
09:31 <admin6> kota: yes, the destination is full. and the strange behavior is that the more I reduced the weight of the disks on that server (which is now full), the more data was written to it
09:33 <admin6> kota: what do you mean by ring stats? a verbose ring dispersion report?
09:37 * kota_ means `swift-ring-builder <builder file>`
09:40 <kota_> if the ring has been reconfigured (device weight reduced and assigned partitions decreased), the reconstructor on that node should be moving partitions away very actively.
09:53 <admin6> kota: that's what I did, but the behavior was the opposite: the more I reduced the weight of the disks, the more this server filled up :-(
09:55 <admin6> kota: would you be kind enough to have a look at my ring config? https://pastebin.com/iTHxebmH
09:55 <kota_> admin6: one possible reason: there are already tons of handoffs for the full node sitting on the other nodes, and those handoffs get pushed back to the full node whenever it has free space.
09:57 <kota_> which disk is full?
09:57 <admin6> kota: all disks of server 10.10.1.53
09:59 <kota_> admin6: looking at line 322, 10.10.1.53 still holds 41.19% of the partitions. is that intended?
10:00 <kota_> my feeling is that's still enough partitions to fill it up completely, maybe?
10:02 <kota_> and looking at the dispersion report from line 224, it seems like we still have room to rebalance?
10:02 <admin6> kota: no, it is not intended at all. and I'm not sure exactly what that percentage means or how to reduce it.
10:05 <kota_> I can't tell what the weight on each disk is based on, but the weight of 3700 doesn't look enough smaller than the other nodes'; some devices have smaller weights...
10:09 <kota_> and the balance doesn't look like it matches the weights either, so my bet is to call `swift-ring-builder <builder file> rebalance` so the assignment follows the weights.
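(A rough sketch of that workflow; the builder file name and the d12 device id are hypothetical, so substitute the values shown by your own builder listing:)

    # inspect current per-device weights, partition counts and balance
    swift-ring-builder object-1.builder
    # reduce the weight of one of the full node's devices, then rebalance
    swift-ring-builder object-1.builder set_weight d12 3000
    swift-ring-builder object-1.builder rebalance
    # copy the resulting object-1.ring.gz out to every node afterwards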
10:09 <admin6> kota: my weights roughly follow disk size: 4000 for a 4TB disk. the disks on server 10.10.1.53 are also 4TB and I've reduced them to 3700, thinking that would be enough to reduce their usage.
10:10 <admin6> kota: the smaller weights are disks that are currently being gradually integrated into the ring
10:10 <kota_> i see
10:12 <kota_> admin6: could you calculate how much data is in a partition?
10:13 <admin6> kota: not sure if I'm doing it right, but each time I do a rebalance I try not to change more than 8% of the total ring weight, then wait for the dispersion report to return at least 99% of object copies found.
10:14 <kota_> the 4TB disks with weight 4000 have around 14504 partitions and the 3700 ones have 14127, so the difference from the other 4TB disks is only about 380 partitions
10:16 <admin6> I currently have 700TB stored in this ring divided by 262144 partitions, which means 2.7GB per partition
10:25 <kota_> hmmm
10:29 <admin6> kota: server 10.10.1.54 has disks filled with 3520GB for 14504 partitions. server 10.10.1.53 has disks filled with 3866GB for 14127 partitions
10:31 <kota_> that sounds like the sizes of the objects stored in Swift are quite spread out :/
10:34 <kota_> and if a partition can hold 2.7GB, 14504 partitions mean the server could store 39TB, which seems different from your intention.
10:34 <kota_> at most.
10:39 <admin6> kota: I'm not sure what you mean by "object size stored is in dispersion", unless it was sarcasm ;-)  as for the partition size, isn't there something related to the number of fragments a file is divided into? because on server 10.10.1.53, for example, 3866GB divided by 14127 partitions means 274MB per partition, not 2.7GB
10:44 <kota_> I don't think it comes from the actual stored data; it comes from how the operator designs the cluster. when you design a 700TB logical cluster with 262144 partitions, a partition effectively has a capacity of 2.7GB on average.
10:46 <kota_> but that per-partition size is only an average, so if the real object sizes coming in from users are spread out, some partitions may end up larger than average.
10:48 <kota_> anyway, the partition size comes from your cluster design and the total real disk volume, so I can't say how big each partition is in your cluster.
10:49 <kota_> but if the average partition size * the number of assigned partitions is bigger than the real disk size, the disk will stay full unless you decrease the assigned partitions below what the real volume can hold.
10:50 <kota_> it's quite strategic.
10:50 <admin6> kota: but if I sum the number of partitions across all disks, I get 3145728, which is 12 times 262144, the number of partitions in the ring, and that seems somewhat logical since my erasure-coding ring is a 9+3
10:51 <admin6> so a partition at the disk level doesn't mean the same thing as at the ring level?
10:51 <kota_> 9 + 3 means 12 fragments, so 12 times is correct, i think.
10:53 <kota_> so assuming 700TB of actual (not logical) disk volume in your cluster, that's 227MB per partition.
10:55 <kota_> it seems like around 3.2TB, mathematically.
10:55 <kota_> with 14127 partitions.
10:58 <admin6> so 227MB * 14127 partitions = 3207GB; even with 10% overload I should not be over 3527GB per disk, but they are full at 3866GB and the reconstructor still wants to write more data only onto these disks :(
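(A small Python sketch of the arithmetic in this exchange, using the figures quoted above as illustrative inputs and, as kota_ did, treating the 700TB as the actual on-disk footprint:)

    # Rough per-partition and per-disk sizing for a 9+3 EC ring.
    ring_partitions = 262144            # partitions in the ring
    ec_fragments = 9 + 3                # 9 data + 3 parity fragments per object

    # Every ring partition is materialised once per fragment index, so the
    # disks collectively hold 12x as many "disk-level" partitions.
    disk_level_partitions = ring_partitions * ec_fragments   # 3,145,728

    stored_gb = 700 * 1024              # ~700TB of actual on-disk data
    avg_partition_gb = stored_gb / disk_level_partitions      # ~0.23GB each

    parts_on_one_disk = 14127           # partitions assigned to a 3700-weight disk
    expected_gb = avg_partition_gb * parts_on_one_disk        # ~3.2TB
    print(round(avg_partition_gb * 1024), "MB per partition,",
          round(expected_gb), "GB expected on that disk")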
11:00 <kota_> admin6: that can happen if you've been running with the full disk for a long time. I think the reconstructor on the full node .53 is actively pushing data out, but the incoming handoffs from the other nodes might be faster.
11:01 <kota_> so if my thinking is correct, my suggestion is: run handoffs_only = True mode only on the full-disk node
11:02 <admin6> kota: ok, I can try that option.
11:04 <admin6> kota: could you tell me whether I'm right in thinking that it may not be safe to push a new rebalance when the dispersion report is at 95/96%?
11:05 <admin6> kota: lastly, thanks a lot for the time you've spent with me, really appreciated :)
11:06 <kota_> admin6: no worries. what does 95/96% mean?
11:08 <admin6> each time I do a rebalance, I try not to change more than 8% of the total ring weight, then wait for the dispersion report to return at least 99% of object copies found. currently my dispersion report is: Queried 2621 objects for dispersion reporting, 45s, 0 retries - There were 1417 partitions missing 0 copies. - There were 1074 partitions missing 1 copy. - There were 127 partitions missing 2 copies. - There were 3 partitions missing 3 copies. - 95.75% of object copies found (30115 of 31452). that's the 95.75% I'm talking about.
11:13 <admin6> I set the 8% maximum because it corresponds to about one fragment per object on my 9+3 EC.
11:13 <kota_> ring weight and dispersion are related but not completely tied together, because Swift doesn't move more than one replica of a partition in a single rebalance.
11:14 <kota_> basically
11:14 <kota_> but I can't say 'yes, it's safe' because you already have 3 partitions missing 3 copies
11:15 <kota_> that means the rebalance call might leave a partition missing 4 copies in its primary locations.
11:16 <kota_> that would unfortunately cause 503 (or 404?) errors temporarily
11:17 <admin6> kota: that's very clear. and that means swift is safer and smarter than I thought. (or than me ;-) )
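(A short sketch of the fragment arithmetic behind that warning, assuming a 9+3 EC policy where any 9 of the 12 fragments are enough to serve a read; this is reasoning only, not Swift code:)

    # With 9 data + 3 parity fragments, a read needs any 9 of the 12.
    ndata, nparity = 9, 3
    fragments = ndata + nparity                    # 12

    # Up to 3 fragments can be missing from the primaries and the object
    # is still readable; a 4th missing fragment means errors until the
    # handoff copies are reverted or the fragments are rebuilt.
    tolerable_missing = fragments - ndata          # 3

    # admin6's 8% rule of thumb: move at most about one fragment's share
    # of the ring per rebalance.
    print(f"tolerate {tolerable_missing} missing; 1/{fragments} = {1/fragments:.1%}")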
11:32 *** e0ne has quit IRC
11:34 <openstackgerrit> Merged openstack/swift feature/losf: Merge remote-tracking branch 'remotes/origin/master' into merge-master  https://review.openstack.org/640955
11:42 *** e0ne has joined #openstack-swift
12:17 *** ccamacho has quit IRC
12:26 *** early` has quit IRC
12:29 *** early` has joined #openstack-swift
14:21 *** admin6_ has joined #openstack-swift
14:24 *** admin6 has quit IRC
14:24 *** admin6_ is now known as admin6
14:28 *** cschwede has joined #openstack-swift
14:28 *** ChanServ sets mode: +v cschwede
14:44 <clayg> timburke: do you think a rebase would help https://review.openstack.org/#/c/637662/ or does it just need a recheck?
14:44 <patchbot> patch 637662 - swift - Simplify empty suffix handling - 2 patch sets
14:57 *** pcaruana has quit IRC
15:02 *** e0ne has quit IRC
15:03 *** ccamacho has joined #openstack-swift
15:13 *** e0ne has joined #openstack-swift
15:28 *** openstackgerrit has quit IRC
15:42 *** pcaruana has joined #openstack-swift
16:54 *** psachin has quit IRC
16:55 *** ccamacho has quit IRC
17:11 -openstackstatus- NOTICE: Gerrit is being restarted for a configuration change, it will be briefly offline.
17:14 *** jistr|sick is now known as jistr
17:16 <notmyname> python-swiftclient 3.7.0 has been tagged. this is our release in the stein cycle
17:18 *** e0ne has quit IRC
17:25 *** hseipp has quit IRC
17:35 *** itlinux has joined #openstack-swift
17:43 <clayg> rledisez: have you ever experimented with pulling out the post-revert-replicate request?  lp bug #1818709
17:44 <openstack> Launchpad bug 1818709 in OpenStack Object Storage (swift) "object replicator update_deleted post ssync REPLICATE request considered harmful" [Undecided,New] https://launchpad.net/bugs/1818709
17:44 <clayg> rledisez: IIRC you also use SSYNC on your replicated clusters - and there as well it seems more and more like a bad IO trade off
17:45 <clayg> N.B. we mainly use rsync replication for replicated storage policies and for me there's no obvious way to get away from REPLICATE requests - but with SSYNC it seems like a cheap win?
17:45 <clayg> ... unless I'm missing something?
17:54 *** gyee has joined #openstack-swift
18:19 *** e0ne has joined #openstack-swift
18:42 *** itlinux has quit IRC
18:47 *** itlinux has joined #openstack-swift
19:11 *** SkyRocknRoll has joined #openstack-swift
19:17 *** e0ne has quit IRC
19:19 *** SkyRocknRoll has quit IRC
19:37 *** e0ne has joined #openstack-swift
19:56 *** itlinux has quit IRC
20:09 *** itlinux has joined #openstack-swift
20:15 *** itlinux has quit IRC
21:10 *** pcaruana has quit IRC
21:13 *** e0ne has quit IRC
21:26 *** e0ne has joined #openstack-swift
21:30 *** e0ne has quit IRC
21:45 *** rchurch_ has joined #openstack-swift
21:47 *** rchurch has quit IRC
22:25 *** mvkr has quit IRC
22:51 *** openstackgerrit has joined #openstack-swift
22:51 <openstackgerrit> Tim Burke proposed openstack/swift master: s3token: Add note about config change when upgrading from swift3  https://review.openstack.org/641153
22:54 *** tkajinam has joined #openstack-swift
23:26 <tdasilva> timburke: why the check for v < 2: https://review.openstack.org/#/c/636748/3/setup.py@109 ?
23:26 <patchbot> patch 636748 - pyeclib - Use liberasurecode_get_version() - 3 patch sets
23:27 <timburke> in case we ever make a v2
23:30 <timburke> plus, that's what the old code seemed to be *trying* to do, so... may as well keep at it?
23:37 <tdasilva> timburke: got it, sounds good! thanks
23:40 <openstackgerrit> Merged openstack/pyeclib master: Use liberasurecode_get_version()  https://review.openstack.org/636748
23:44 <mattoliverau> morning
23:46 <timburke> o/ mattoliverau
23:57 <openstackgerrit> Merged openstack/swift master: docs: clean up SAIO formatting  https://review.openstack.org/640914

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!