Thursday, 2020-09-17

00:22 *** tonyb has joined #openstack-swift
03:24 *** psachin has joined #openstack-swift
03:33 <openstackgerrit> Merged openstack/swift master: proxy: Include thread_locals when spawning _fragment_GET_request  https://review.opendev.org/749376
04:18 *** m75abrams has joined #openstack-swift
04:33 *** evrardjp has quit IRC
04:33 *** evrardjp has joined #openstack-swift
05:38 *** gyee has quit IRC
05:47 *** manuvakery has joined #openstack-swift
06:50 *** rcernin has quit IRC
07:08 *** rcernin has joined #openstack-swift
07:28 *** rcernin has quit IRC
08:03 *** mikecmpbll has joined #openstack-swift
08:11 *** rcernin has joined #openstack-swift
08:17 *** rcernin has quit IRC
08:50 *** rcernin has joined #openstack-swift
09:03 *** rcernin has quit IRC
09:32 *** lxkong has joined #openstack-swift
09:46 *** m75abrams has quit IRC
12:49 *** zaitcev has joined #openstack-swift
12:49 *** ChanServ sets mode: +v zaitcev
13:54 *** mikecmpbll has quit IRC
15:15 *** gyee has joined #openstack-swift
15:28 <DHE> anyone seen deadlocks in the proxy server? I've had a few times now where the proxy service hangs and requires a full restart, the workers apparently hung in a futex() call of sorts
15:28 <DHE> I've brought this up before without resolution, but that was months ago
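One way to confirm this sort of hang from the outside is to attach strace to a suspect worker. A minimal sketch, assuming you have the worker's PID (the PID below is illustrative):

    # Attach to a suspected-hung proxy worker and follow its threads
    strace -f -p 12345
    # A worker deadlocked this way typically sits in a single futex()
    # call with no timeout, and never returns from it.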
15:29 <zaitcev> Not me, but my cluster is small.
15:32 <DHE> mine isn't all that big either. but it might need updating. could be an eventlet bug maybe, etc.
15:37 <DHE> Something like this https://github.com/eventlet/eventlet/issues/508 except I'm running python 3.6 instead and this specifically mentions 3.7.
15:39 <zaitcev> I was hitting https://github.com/eventlet/eventlet/issues/526 and I noticed it on 3.6 first. This does not have a direct relation to your deadlock, but it tells us that the lowest broken and highest working versions are not always reliable.
15:39 <DHE> fair
16:03 *** manuvakery has quit IRC
16:11 *** lxkong has quit IRC
16:14 *** manuvakery has joined #openstack-swift
16:15 *** psachin has quit IRC
16:26 <ormandj> timburke: you'll be happy to know we're rolling out servers_per_port as we speak
16:26 <ormandj> will let you know how it goes
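For reference, servers_per_port is an object-server setting: instead of one pool of workers shared across all disks, the object-server forks a dedicated group of workers per unique port in the object ring, so a slow disk only ties up its own workers. A minimal sketch of the relevant config (the value 4 is illustrative):

    # /etc/swift/object-server.conf
    [DEFAULT]
    # Fork this many object-server workers for each unique port in the
    # object ring (requires one port per disk in the ring).
    servers_per_port = 4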
16:27 <ormandj> on an unrelated point - why would we be seeing lots of timeouts/errors getting responses from the object-updater connecting to the container-server port/disk? the disks are basically idle (and are SSD) for container dbs
16:56 <timburke> DHE, fwiw i've been running into https://bugs.launchpad.net/swift/+bug/1895739 at home
16:56 <openstack> Launchpad bug 1895739 in OpenStack Object Storage (swift) "Proxy server sometimes deadlocks while logging client disconnect" [Undecided,New]
16:58 <timburke> for the moment, i'm running with something like http://paste.openstack.org/show/798024/ to see if the problem goes away, or if any *new* problems creep up -- so far, so good 🤞
16:59 <DHE> stack xray? I like it already...
17:00 <timburke> super-handy! torgomatic's awesome
17:05 <DHE> so I'm currently on 2.23.1 but it looks like it will install cleanly
17:24 <timburke> oh yeah, zaitcev, have you tried latest eventlet recently? i *think* the SSLContext thing should be fixed now...
17:24 <zaitcev> timburke: on my list. I'm just back from the desert today.
17:25 <timburke> oh, no worries! just figured i'd check :-)
17:25 <timburke> good trip? i know i really needed a trip to the mountains a few weeks ago
17:26 <zaitcev> It was okay. Lost 16 pounds.
17:26 <zaitcev> While eating mostly chicken and sausage.
17:33 <DHE> oh I found a hung process! one moment please!
17:37 <DHE> http://paste.openstack.org/show/IRDAu64Piux39lZIRbkJ/
17:58 *** irclogbot_1 has quit IRC
18:02 *** irclogbot_2 has joined #openstack-swift
18:31 <timburke> what happened to the poor guys that we spawned up at https://github.com/openstack/swift/blob/2.23.1/swift/proxy/controllers/base.py#L1335-L1336 ? :-/ they were supposed to be making the backend requests
18:31 <timburke> DHE, try grepping logs for "STDERR" or "_make_node_request"
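A concrete form of that search, assuming syslog-style logging with proxy logs under /var/log/swift (the path is an assumption; adjust for your setup):

    # Tracebacks from crashed greenthreads get logged with an STDERR tag;
    # _make_node_request frames point at the backend-request helpers.
    grep -E 'STDERR|_make_node_request' /var/log/swift/proxy-server.log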
18:33 <DHE> STDERR shows a lot of memcached errors
18:34 <timburke> (the biggest downside to the xray thing is that it can only tell you where things are *now*, after things have broken -- it's like how i always want a "rewind" option in debuggers)
18:34 <timburke> fwiw, the memcache logging will improve somewhat with https://github.com/openstack/swift/commit/e4586fdcd
18:35 <DHE> memcached is limiting itself to 1024 connections, I suspect that is the reason...
18:38 <timburke> i could see that causing issues, yeah. fwiw, i think we run with like 16k by default
18:40 <timburke> i don't think it's the root cause for *this* issue, of course. by the time we get down to making and waiting on backend connections, we shouldn't be touching memcache
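For reference, 1024 is memcached's default cap on simultaneous connections; it is raised with the -c flag at startup. A minimal sketch (16384 mirrors the "like 16k" figure above):

    # -m: cache memory in MB; -c: max simultaneous connections (default 1024)
    memcached -m 1024 -c 16384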
18:56 <openstackgerrit> Tim Burke proposed openstack/swift master: Authors/ChangeLog for 2.26.0  https://review.opendev.org/750537
18:57 *** ccamel has quit IRC
19:15 *** camelCaser has joined #openstack-swift
19:16 <DHE> is it possible to print the active thread's stack? I can run "bt" in gdb but have no idea how to interpret that
19:16 <DHE> (I know enough to be dangerous but not enough to be useful)
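For reference, the CPython gdb helpers can translate those raw C frames into Python-level stacks. A minimal sketch, assuming the python-gdb.py extension and matching python debug symbols are installed (the PID is illustrative):

    gdb -p 12345                    # attach to the hung worker
    (gdb) py-bt                     # Python-level backtrace, current thread
    (gdb) thread apply all py-bt    # the same, for every pthread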
19:21 *** openstackgerrit has quit IRC
19:30 <DHE> so I've dumped 3 processes so far. 1 has your nested current_thread call deadlock. the other 2 are nigh-identical and one is the paste I gave above. the only significant difference is the top stack.
20:03 *** manuvakery has quit IRC
20:21 *** openstackgerrit has joined #openstack-swift
20:21 <openstackgerrit> Tim Burke proposed openstack/swift master: Add a new URL parameter to allow for async cleanup of SLO segments  https://review.opendev.org/733026
20:33 <timburke> DHE, so basically any stack that ends with `self.greenlet.switch()` should be inactive - it's registered what it's waiting on with eventlet and put itself to sleep. generally when you observe a deadlock, the only active thread will be the eventlet hub, and it'll be in a sleep-a-little-then-check-what's-ready loop
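A tiny, self-contained illustration of that parked-greenthread pattern, using plain eventlet (nothing Swift-specific):

    import eventlet

    def worker():
        # sleep() registers a timer with the eventlet hub and switches away;
        # a stack dump taken while this is pending shows the greenthread
        # suspended at the hub's self.greenlet.switch() call.
        eventlet.sleep(60)

    gt = eventlet.spawn(worker)
    eventlet.sleep(0.1)  # yield to the hub so worker runs and parks itself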
20:35 <timburke> good to know that you've seen this nested-current_thread-call issue, too! i'm not sure why it hasn't come up before -- it really seems like it should've been an issue with py2 as well...
20:35 <DHE> yes, but 1 out of 3 isn't reassuring. and when I strace'd a hung thread it was in a futex that wasn't time-limited
20:36 *** lxkong has joined #openstack-swift
20:36 *** lxkong has quit IRC
20:36 <DHE> proxy server isn't actually multi-threaded and doesn't share memory with other instances, right? so there isn't anything that could cause a wake-up
20:37 <DHE> unless somehow the bug manifests in other ways
20:39 <DHE> sorry, I've been spending most of the day trying to figure this one out. bit us pretty hard this morning
20:40 <DHE> and this is definitely not my area of expertise
21:17 <timburke> might try picking up the changes in https://github.com/swiftstack/python-stack-xray/pull/2/files so you can see pthread stacks, too -- i think those should all be "active"
21:34 <DHE> okay that shows a new stack with your current_thread double-call that wasn't in the previous output
22:06 <DHE> yeah this looks right
22:37 *** mgagne has joined #openstack-swift
22:39 <openstackgerrit> Tim Burke proposed openstack/swift master: Add a new URL parameter to allow for async cleanup of SLO segments  https://review.opendev.org/733026
22:39 <timburke> reeeally... mind sharing? i'd only ever seen it in a greenthread stack iirc
22:47 <openstackgerrit> Tim Burke proposed openstack/swift master: Run swift-tox-func-encryption-py37 job in the gate  https://review.opendev.org/752580
23:05 *** rcernin has joined #openstack-swift
23:49 <DHE> http://paste.openstack.org/show/798034/  oh sorry, yeah I should share that
