Wednesday, 2021-11-17

opendevreview: Timur Alperovich proposed openstack/swift master: Fix multipart upload listings https://review.opendev.org/c/openstack/swift/+/813715 [00:21]
reid_g: Hey guys, what would be the path to diagnosing 'ERROR with Object server 10.40.100.92:6001/d1 re: Trying to get final status of PUT to' messages? We started receiving a lot of them with some increased traffic. [13:42]
DHE: timeouts? [13:42]
reid_g: Yeah Timeout (10.0s) [13:42]
reid_g: Are you saying to tweak timeouts? [13:46]
reid_g: Which have you changed? [13:46]
DHE: umm, none. 10 second timeout seems downright generous, unless you have some crazy fast networking and large objects? [13:46]
DHE: if it's consistently the same server and device (d1) then I'd look at this specific disk's health [13:47]
reid_g: We have 10G on that cluster but the items are probably small. [14:05]
reid_g: It's pretty much every server showing these errors. Should we be doing rolling restart of swift services occasionally? [14:06]
reid_g: 10s is the default for that Timeout btw [14:11]
DHE: I'm not a dev, but my guess would be large files uploaded very rapidly... 10G could upload a full size 5G object in 5 seconds at peak, but a spinning disk can't possibly flush the data within 10 seconds... depending on write cache sizes of course [14:17]
DHE: actually it's 512 MB limit of dirty data by default... [14:18]
DHE: that's my guess... busy cluster, large objects, dirty data sync-out... [14:19]
reid_g: We are currently copying from an old hodgepodge cluster to this newer cluster. Fixed an issue on the old cluster and the GET 200 doubled and HEAD 200 x4. So I think the transfer sped up and is causing more strain on the new cluster. [14:49]
DHE: multiple simultaneous uploads at high speeds? yeah I'm thinking disks might just be getting too busy to fsync() in reasonable amounts of time... [15:10]
DHE: raising timeouts is probably a good idea [15:10]
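
As a rough back-of-envelope check of the reasoning above, the sketch below compares how quickly a 10 Gbit/s link can deliver data with how long a single disk might need to flush it, set against the 10 s timeout from the error message. The 150 MB/s sustained write rate and the count of four concurrent uploads per disk are assumed figures chosen for illustration, not values from the discussion, and the function names are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
TIMEOUT_S = 10.0  # the "Timeout (10.0s)" reported in the error


def transfer_seconds(size_bytes, link_gbit_per_s):
    """Time to move size_bytes at line rate, ignoring protocol overhead."""
    return size_bytes / (link_gbit_per_s * 1e9 / 8)


def flush_seconds(size_bytes, disk_mb_per_s, concurrent_writers=1):
    """Time for one disk to flush size_bytes for each of N concurrent writers."""
    return size_bytes * concurrent_writers / (disk_mb_per_s * 1e6)


DISK_MB_PER_S = 150  # assumption: sustained write rate of one spinning disk

upload = transfer_seconds(5 * GIB, 10)
full_flush = flush_seconds(5 * GIB, DISK_MB_PER_S)
dirty_flush = flush_seconds(512 * MIB, DISK_MB_PER_S)
busy_flush = flush_seconds(512 * MIB, DISK_MB_PER_S, concurrent_writers=4)

print(f"5 GiB over 10 Gbit/s:          {upload:.1f} s")
print(f"5 GiB flushed by one disk:     {full_flush:.1f} s")
print(f"512 MiB flushed, 1 writer:     {dirty_flush:.1f} s")
print(f"512 MiB flushed, 4 writers:    {busy_flush:.1f} s (timeout {TIMEOUT_S} s)")
```

Under these assumptions a single 512 MiB flush fits inside the timeout, but a handful of simultaneous writers on the same disk pushes it past 10 s, which matches the "busy cluster" guess.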
reid_g: Is/was it possible for just the SLO manifest to get written and none of the segments made it? [17:35]
reid_g: What is "512 MB limit of dirty data by default"? [18:19]
DHE: [app:object-server]    mb_per_sync = 512 [18:56]
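
To make the idea behind that setting concrete, here is a minimal sketch of the general technique of syncing a file every N bytes while it is being written, so that unflushed ("dirty") data never grows past that bound. This is not Swift's implementation; `write_with_periodic_sync` and its parameters are hypothetical names used only for illustration, and the demo uses a 1 MiB interval rather than 512 MB to keep it quick.

```python
import os
import tempfile


def write_with_periodic_sync(path, chunks, bytes_per_sync):
    """Write chunks to path, forcing data to disk every bytes_per_sync bytes.

    Returns the number of intermediate syncs (the final sync is not counted).
    """
    syncs = 0
    unsynced = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            unsynced += len(chunk)
            if unsynced >= bytes_per_sync:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to push it to the disk
                syncs += 1
                unsynced = 0
        # Final sync so the whole file is durable before reporting success.
        f.flush()
        os.fsync(f.fileno())
    return syncs


# Demo: 64 chunks of 64 KiB (4 MiB total), syncing every 1 MiB.
chunk = b"x" * 65536
with tempfile.TemporaryDirectory() as tmp:
    sync_count = write_with_periodic_sync(
        os.path.join(tmp, "obj"), [chunk] * 64, 1024 * 1024
    )
print("intermediate syncs:", sync_count)
```

A smaller interval spreads the flush cost across the upload instead of leaving one large flush for the end, at the price of more frequent sync calls.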
opendevreview: Clay Gerrard proposed openstack/swift master: wip: testing gate fix https://review.opendev.org/c/openstack/swift/+/818107 [19:14]
reid_g: Not sure if it is only our version of Swift (2.2.0 Juno), but if you try to GET an object where only the SLO manifest exists, the proxy will return no data (404 exists in logs for the fragment). [19:23]
zaitcev: I see timburke is not online today. [21:12]
acoles: zaitcev: timburke is out this week, IIRC he cancelled the meeting for this week and next (thanksgiving) [21:15]
opendevreview: Clay Gerrard proposed openstack/swift master: wip: testing gate fix https://review.opendev.org/c/openstack/swift/+/818107 [21:58]
opendevreview: Clay Gerrard proposed openstack/swift master: DNM: playing with ssync EAGAIN https://review.opendev.org/c/openstack/swift/+/818296 [22:02]