Tuesday, 2021-01-19

*** baojg has joined #openstack-swift01:34
*** rcernin has quit IRC02:26
*** rcernin has joined #openstack-swift02:42
*** rcernin has quit IRC02:44
*** rcernin has joined #openstack-swift02:44
openstackgerritMatthew Oliver proposed openstack/swift master: Add root aceptor as root if root has been deleted  https://review.opendev.org/c/openstack/swift/+/77134305:15
*** evrardjp has quit IRC05:33
*** evrardjp has joined #openstack-swift05:33
*** m75abrams has joined #openstack-swift05:36
*** dsariel has joined #openstack-swift06:12
*** rcernin has quit IRC07:19
*** rpittau|afk is now known as rpittau08:07
*** hoonetorg has quit IRC08:51
*** hoonetorg has joined #openstack-swift08:53
*** dsariel has quit IRC09:15
*** dsariel has joined #openstack-swift09:16
*** dsariel has quit IRC09:17
*** dsariel has joined #openstack-swift09:17
*** rcernin has joined #openstack-swift09:43
*** dsariel has quit IRC10:19
*** dsariel has joined #openstack-swift10:20
*** rcernin has quit IRC10:28
*** rcernin has joined #openstack-swift11:13
*** baojg has quit IRC11:28
*** baojg has joined #openstack-swift11:29
*** baojg has quit IRC11:29
*** baojg has joined #openstack-swift11:29
*** baojg has quit IRC11:30
*** baojg has joined #openstack-swift11:30
*** baojg has quit IRC11:31
*** baojg has joined #openstack-swift11:31
*** baojg has quit IRC11:32
*** baojg has joined #openstack-swift11:32
*** baojg has quit IRC11:32
*** baojg has joined #openstack-swift11:33
*** baojg has quit IRC11:33
*** baojg has joined #openstack-swift11:34
*** baojg has quit IRC11:34
*** baojg has joined #openstack-swift11:34
openstackgerritAlistair Coles proposed openstack/swift master: Fix 503s from EC GETs of objects with POST metadata  https://review.opendev.org/c/openstack/swift/+/77108911:34
*** baojg has quit IRC11:35
*** baojg has joined #openstack-swift11:35
*** baojg has quit IRC11:36
*** baojg has joined #openstack-swift11:36
*** baojg has quit IRC11:36
*** baojg has joined #openstack-swift11:37
*** rcernin has quit IRC11:37
*** baojg has quit IRC11:37
*** baojg has joined #openstack-swift11:38
*** baojg has quit IRC11:43
*** paladox has quit IRC11:57
*** lifeless has quit IRC11:57
*** DHE has quit IRC11:57
*** paladox has joined #openstack-swift11:57
*** lifeless has joined #openstack-swift11:58
*** DHE has joined #openstack-swift11:58
openstackgerritMerged openstack/swift master: s3api: Get rid of slo_enabled flag  https://review.opendev.org/c/openstack/swift/+/77068512:02
*** rcernin has joined #openstack-swift14:42
openstackgerritAlistair Coles proposed openstack/swift master: s3api: actually execute check_pipeline in real world  https://review.opendev.org/c/openstack/swift/+/77146714:45
*** m75abrams has quit IRC14:53
*** rcernin has quit IRC15:01
*** m75abrams has joined #openstack-swift15:24
*** klamath_atx has joined #openstack-swift15:45
timburke_good morning15:51
timburke_DHE, good to know on the proxy hang -- i'll try to get something approaching a repro with pre https://review.opendev.org/c/openstack/swift/+/752593, then test again with something more like master. assuming that doesn't fly, i'll try moving the fix to *before* monkey-patching (which seems like it should do it for both original and patched versions)15:56
*** m75abrams has quit IRC16:00
*** klamath_atx has quit IRC16:01
*** diablo_rojo has joined #openstack-swift16:01
*** m75abrams has joined #openstack-swift16:03
*** jv has quit IRC16:21
*** jv has joined #openstack-swift16:34
*** hoonetorg has quit IRC16:49
*** hoonetorg has joined #openstack-swift16:49
*** jv has quit IRC17:10
*** m75abrams has quit IRC17:19
*** jv has joined #openstack-swift17:25
seongsoochoHi. When just one disk is unmounted on an object-server, all object-servers' REPLICATE responses become slow. As a result, the object-replicator runs very slowly, and I see the message 'Nothing replicated for 2400.06150103 seconds' in the log file. Is this normal operation?17:33
*** gyee has joined #openstack-swift18:16
*** rpittau is now known as rpittau|afk18:41
openstackgerritTim Burke proposed openstack/swift master: obj: Include timeout value when logging long-running rsyncs  https://review.opendev.org/c/openstack/swift/+/77150418:44
timburke_seongsoocho, that sounds like normal operation. when a disk responds as unmounted, the replicator will assume that the disk has "failed in place" and work to ensure full durability by replicating to the first-available handoff. the "Nothing replicated ..." messages are usually because it's waiting on a long-running rsync (which makes sense if it needs to copy the whole partition)19:02
timburke_note that if the server *never responds*, otoh, the replicator assumes that it's a transient failure and will *not* replicate to handoffs19:03
seongsoochotimburke_:    About 24 hours have passed since it was unmounted. And looking at the replication network traffic, it seems that replication to the first handoff node is over.   Will replicating to the first handoff node affect the slow response speed of the object-server's REPLICATE api?19:09
timburke_every partition dir on every disk has a hashes.pkl file that has a kind of a checksum of the files present in that partition on that disk. after rsyncing a whole partition, the receiver will need to recalculate that checksum, which can be pretty io intensive. this can cause slow responses for requests to that disk (REPLICATE or otherwise, and within that partition or otherwise)19:18
timburke_still, i'm surprised that it's still impacting things *that much* a day on...19:20
timburke_it's probably worth looking at iostat/iotop on a slow server19:23
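A rough sketch of why the re-hash mentioned above is I/O intensive: producing a per-partition checksum means listing every object directory under every suffix directory. The code below is a simplified illustration only, not Swift's actual hashing code; the function names `hash_suffix_dir` and `hash_partition` and the on-disk layout are assumptions made for the example.

```python
import hashlib
import os


def hash_suffix_dir(suffix_path):
    """Return an md5 hex digest over the sorted file names found in
    every object directory under one suffix directory.

    Simplified assumption: only file names feed the hash, so adding or
    removing a file changes the digest.
    """
    md5 = hashlib.md5()
    for obj_dir in sorted(os.listdir(suffix_path)):
        obj_path = os.path.join(suffix_path, obj_dir)
        if not os.path.isdir(obj_path):
            continue
        for name in sorted(os.listdir(obj_path)):
            md5.update(name.encode("utf-8"))
    return md5.hexdigest()


def hash_partition(partition_path):
    """Map each suffix directory name to its digest.

    Every call walks the whole partition, which is the random-read
    directory traffic that makes a full re-hash expensive.
    """
    hashes = {}
    for suffix in sorted(os.listdir(partition_path)):
        suffix_path = os.path.join(partition_path, suffix)
        if os.path.isdir(suffix_path):
            hashes[suffix] = hash_suffix_dir(suffix_path)
    return hashes
```

Two servers holding the same partition can compare these per-suffix digests and only sync the suffixes that differ, which is the general idea behind keeping such a checksum file per partition.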
*** lifeless has quit IRC19:27
*** lifeless has joined #openstack-swift19:27
seongsoochodisk io is not that high on a slow server.  (It's weird.. )   I will wait a little longer.  thanks!19:30
openstackgerritClay Gerrard proposed openstack/swift master: Do not reclaim sharded roots until they shrink  https://review.opendev.org/c/openstack/swift/+/77108619:44
claygoh hrm... looks like I may have missed an opportunity to address some review comments - i'll mark it WIP19:45
claygseongsoocho: did you say you already know *why* cycle time is slow, and it's the REPLICATE requests?  i don't see why a REPLICATE request to a mounted disk would be slow because another disk is unmounted *on a different server* - maybe bouncing services could clean up some tar pit object server?19:50
claygIME REPLICATE requests that have to do a re-hash have always been kinda slow - maybe you just didn't notice and that's not actually what *changed* between when your cycle times were ok, and now?19:51
timburke_fwiw, i know i often see low %util but high %iowait at home -- not sure if that's mostly a result of running with SMR disks or what, though19:52
claygwhat Tim said about "unmounting a disk causes replication" is 100% true - if that's all that's going on that's normal19:52
claygtimburke_: for sure, iowait can get tanked by random reads - which is basically what a re-hash is doing 🤮19:52
timburke_the funny thing is that %util for the disk isn't just *low*, it's *0*. ditto *all* the per-disk stats19:53
timburke_like, in the middle of a `iostat -x 5` i get back a set of stats like http://paste.openstack.org/show/801743/19:57
seongsoochohttps://www.irccloud.com/pastebin/5s8sEwPp/20:01
seongsoochooh..20:01
seongsoochothis is the current object-server log.20:02
seongsoocho[19/Jan/2021:18:41:32 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 30793" 0.0217 "-" 2148 020:02
seongsoocho[19/Jan/2021:18:59:20 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 7361" 0.7141 "-" 2166 020:02
seongsoocho[19/Jan/2021:19:03:54 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 30793" 0.0194 "-" 2166 020:02
seongsoocho[19/Jan/2021:19:23:43 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 7361" 0.5397 "-" 2164 020:02
seongsoochoBefore the replication to the handoff node, the response time was about 0.0x seconds.20:02
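The four REPLICATE log lines above can be compared more easily by pulling out the request time and grouping it by the requesting replicator. This is a minimal sketch, assuming the float that follows the quoted user-agent field is the request time in seconds; the regular expression and the name `request_times_by_agent` are illustrative, not part of Swift.

```python
import re

# The four object-server log lines quoted above, without the IRC nick
# and timestamp that surround them in this log.
LOG_LINES = [
    '[19/Jan/2021:18:41:32 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 30793" 0.0217 "-" 2148 0',
    '[19/Jan/2021:18:59:20 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 7361" 0.7141 "-" 2166 0',
    '[19/Jan/2021:19:03:54 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 30793" 0.0194 "-" 2166 0',
    '[19/Jan/2021:19:23:43 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 7361" 0.5397 "-" 2164 0',
]

# Assumed layout: [timestamp] "METHOD path" status length "referer"
# "txn id" "user agent" request_time ...
PATTERN = re.compile(
    r'\[(?P<when>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "[^"]*" "(?P<agent>[^"]*)" '
    r'(?P<seconds>\d+\.\d+)'
)


def request_times_by_agent(lines):
    """Group request times (seconds) by user agent, keeping log order."""
    times = {}
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        times.setdefault(match.group("agent"), []).append(
            float(match.group("seconds"))
        )
    return times


for agent, seconds in request_times_by_agent(LOG_LINES).items():
    print(agent, seconds)
```

Grouping by user agent shows which replicator process issued each request, which may help narrow down whether the slower responses are tied to a particular sender.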
*** Jeffrey4l has quit IRC20:04
seongsoochoI don't know why the response time has changed from before. But I think this is why the replicator slows down.20:08
*** openstackgerrit has quit IRC20:12
*** Jeffrey4l has joined #openstack-swift20:13
*** openstackgerrit has joined #openstack-swift20:23
openstackgerritAlistair Coles proposed openstack/swift master: s3api: actually execute check_pipeline in real world  https://review.opendev.org/c/openstack/swift/+/77146720:23
timburke_man, trying to repro https://bugs.launchpad.net/swift/+bug/1895739 is making me notice other weird things, too... somehow, with a 3x replicated policy and only 5 disks in my cluster, i'm getting 7x "Client disconnected on read of ..." messages with the same txn id??20:31
openstackLaunchpad bug 1895739 in OpenStack Object Storage (swift) "Proxy server sometimes deadlocks while logging client disconnect" [Undecided,In progress]20:31
timburke_client_ip is sometimes present, sometimes not... and some log lines are missing txn id entirely... maybe i should try applying https://review.opendev.org/c/openstack/swift/+/761475 and see if i can get any more insight? though that was mostly targeting EC...20:34
timburke_i should probably also up-rev swift -- it's not even on 2.26.0 yet (though it's only a couple commits or so behind there)20:36
timburke_:-/ that didn't help much; still see error logs with no txn id, no client ip...20:41
*** Jeffrey4l has quit IRC20:50
*** Jeffrey4l has joined #openstack-swift20:51
openstackgerritClay Gerrard proposed openstack/swift master: Debug EC multipart/byteranges responses  https://review.opendev.org/c/openstack/swift/+/76147520:57
openstackgerritClay Gerrard proposed openstack/swift master: WIP: s3api: Make multi-deletes async  https://review.opendev.org/c/openstack/swift/+/64826321:06
clayglittle rebase action pre-package 🥳21:07
openstackgerritTim Burke proposed openstack/swift master: relinker: Track part_power/next_part_power in state file  https://review.opendev.org/c/openstack/swift/+/76985521:18
*** priteau has quit IRC21:35
mattoliveraumorning22:00
*** rcernin has joined #openstack-swift22:09
*** dsariel has quit IRC22:18
*** openstackgerrit has quit IRC22:59
*** openstackgerrit has joined #openstack-swift23:00
openstackgerritTim Burke proposed openstack/swift master: s3api: Break S3Request.__init__ signature less  https://review.opendev.org/c/openstack/swift/+/77152623:00
timburke_clayg, give ^^^ a try23:00
*** klamath_atx has joined #openstack-swift23:05
