Wednesday, 2023-09-06

opendevreview Jianjian Huo proposed openstack/swift master: Container-server: add container namespaces GET  https://review.opendev.org/c/openstack/swift/+/890470 00:57
opendevreview Merged openstack/swift master: slo: 500 if we can't load the manifest  https://review.opendev.org/c/openstack/swift/+/887230 01:11
zigo timburke: Thanks, let me try that patch. 09:16
zigo timburke: The patch looks like it fixes that StopIteration issue for me, though I still get this unit test failure: 09:21
zigo https://paste.opendev.org/show/bB2hp2iu7RaVGsQFzTdd/ 09:21
zigo Maybe I can ignore that one? 09:21
opendevreview Matthew Vernon proposed openstack/swift master: swift-expired-status CLI tool to inspect the expiry queue  https://review.opendev.org/c/openstack/swift/+/893861 10:54
opendevreview Matthew Vernon proposed openstack/swift master: swift-expired-status: tool to inspect the expiry queue  https://review.opendev.org/c/openstack/swift/+/893861 10:55
opendevreview Alistair Coles proposed openstack/swift master: Improve FakeSwift Backend-Ignore-Range support  https://review.opendev.org/c/openstack/swift/+/893577 12:08
opendevreview ASHWIN A NAIR proposed openstack/swift master: wip: refactor slo  https://review.opendev.org/c/openstack/swift/+/893578 18:55
kota good morning 20:58
zaitcev ossu 20:58
seongsoocho good morning 20:59
mattoliver Morning 21:02
seongsoocho Hi~! 21:03
* zaitcev taps his foot 21:04
* zaitcev pokes timburke 21:04
zaitcev I have a question for an open discussion slot... What happens if one back-end node goes down? Yeah, really. 21:04
kota At which point? During user requests? Or from a stable cluster management perspective? 21:06
zaitcev I'm looking at our code for error state in the proxy. 21:09
kota In the write context, a single backend failure doesn't matter because the other two nodes can still respond with a correct 201, write? (and also, if the failure can be detected at the beginning of the request, we can use handoff nodes) 21:12
mattoliver It gets error limited, new puts are put on handoffs. 21:12
kota not write, right 21:12
mattoliver Well what kota said 😀 21:13
kota and, perhaps no meeting? 21:16
zaitcev Apparently the issue is, an operator has a cluster that is very slow, and they narrowed it to a particular node being slow enough to cause connect timeouts. However, this does not result in the node getting error limited. 21:17
zaitcev Maybe it would be limited if it either reset connections or returned HTTP errors, but connect timeouts aren't doing it. 21:17
mattoliver Are they using servers per port? One slow disk can slow a whole ide down. 21:18
mattoliver *node down 21:18
mattoliver Oh really, I would've thought they would, need to look at the code 21:19
kota got it. slow down node handling. hum... 21:19
zaitcev I'll look into the servers per port thing. We do not enable it by default, I think. 21:19
opendevreview Merged openstack/swift master: Improve FakeSwift Backend-Ignore-Range support  https://review.opendev.org/c/openstack/swift/+/893577 21:26
mattoliver Hmm, in base on ConnectionTimeout we do call the proxy's exception_occurred, which is supposed to increment the error limit value for the node. 21:28
zaitcev Right. 21:28
mattoliver Well it looks like it's supposed to. Is it intermittent enough that the node's error limit suppression count is too high to ever actually trigger in the time frame? 21:30
zaitcev Unfortunately, I don't know. 21:31
zaitcev I don't even have the ring layout yet. 21:31
mattoliver Yeah we need more info on this, there is no way to get ready error_limit date without logs in debug. Tracing pulls this data out. So you can see the state of the error_limit dictionary from the proxy you visit. 21:32
mattoliver *Get "real" error limit data 21:33
mattoliver Sorry on phone and it keeps auto correcting 21:33
zaitcev np 21:35
mattoliver Yeah, I'd get the suppression count and suppression interval, if there aren't enough counts happening within the interval it won't get limited 21:35
mattoliver So maybe it's too intermittent. Could adjust those values. But also look at servers per port (or whatever) as that means a single slow disk doesn't slow down the whole back end service. Although I guess it could just as easily be other latency issues 21:36
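[Editor's sketch] The "too intermittent" hypothesis above can be illustrated with a toy model of count-within-interval error limiting. This is not Swift's actual code: the class `ToyErrorLimiter`, its methods, and `ever_limited` are hypothetical names made up for illustration. The two parameters are modeled on the proxy's `error_suppression_limit` and `error_suppression_interval` settings, and the values 10 and 60 seconds are assumed here rather than taken from the discussion.

```python
class ToyErrorLimiter:
    """Simplified model (an assumption, not Swift's implementation):
    a node is limited once more than `limit` errors pile up without a
    quiet gap longer than `interval` seconds between them."""

    def __init__(self, limit=10, interval=60.0):
        self.limit = limit
        self.interval = interval
        self.stats = {}  # node key -> {"errors": int, "last_error": float}

    def record_error(self, node, now):
        entry = self.stats.get(node)
        # A gap longer than the interval resets the running count.
        if entry is None or now - entry["last_error"] > self.interval:
            entry = {"errors": 0, "last_error": now}
        entry["errors"] += 1
        entry["last_error"] = now
        self.stats[node] = entry

    def is_limited(self, node, now):
        entry = self.stats.get(node)
        if entry is None:
            return False
        if now - entry["last_error"] > self.interval:
            # The suppression window has passed; forget the node's errors.
            del self.stats[node]
            return False
        return entry["errors"] > self.limit


def ever_limited(gap, count, limit=10, interval=60.0):
    """Feed `count` errors spaced `gap` seconds apart and report whether
    the node was limited right after any of them."""
    limiter = ToyErrorLimiter(limit, interval)
    limited = False
    for i in range(count):
        now = i * gap
        limiter.record_error("node-a", now)
        limited = limited or limiter.is_limited("node-a", now)
    return limited


if __name__ == "__main__":
    print("timeouts every 90s:", ever_limited(gap=90, count=20))
    print("timeouts every 1s: ", ever_limited(gap=1, count=11))
```

In this model, timeouts that arrive more slowly than the interval never accumulate, which is one way a consistently slow node could avoid being limited; lowering the limit or lengthening the interval changes that outcome.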
mattoliver Also guessing there will be no meeting today 😜 21:37
zaitcev The report says "we can easily reproduce it with iptables with -j DROP". Okay duh. 21:37
mattoliver Lol 21:37
mattoliver I can play with a saio and see what happens when it times out (post breakfast). 21:40
mattoliver zaitcev once you get more info and if you want/need to chat more just ping me, esp if you just need another sounding board. 21:40
zaitcev mattoliver: thanks! 21:40
opendevreview Tim Burke proposed openstack/swift master: Clean up watchdog threads  https://review.opendev.org/c/openstack/swift/+/885812 22:18
opendevreview Tim Burke proposed openstack/swift master: Add our own sys.unraisablehook  https://review.opendev.org/c/openstack/swift/+/893991 22:19
