Friday, 2020-08-28

mattoliveraumorning00:03
*** jv__ has joined #openstack-swift00:52
*** jv__ has quit IRC01:08
*** xiaolin has joined #openstack-swift01:31
*** baojg has joined #openstack-swift01:47
*** baojg has quit IRC01:50
*** gyee has quit IRC01:58
*** rcernin has quit IRC02:56
*** rcernin has joined #openstack-swift03:04
*** mahatic has quit IRC03:20
*** baojg has joined #openstack-swift03:31
*** baojg has quit IRC03:35
*** rcernin has quit IRC03:35
*** rcernin has joined #openstack-swift03:39
*** psachin has joined #openstack-swift03:51
*** rcernin has quit IRC03:54
*** rcernin has joined #openstack-swift04:08
*** rcernin has quit IRC04:18
*** rcernin has joined #openstack-swift04:19
*** evrardjp has quit IRC04:33
*** evrardjp has joined #openstack-swift04:33
*** xiaolin has quit IRC05:27
*** mahatic has joined #openstack-swift11:01
*** ChanServ sets mode: +v mahatic11:01
*** jv__ has joined #openstack-swift11:22
*** dsariel has joined #openstack-swift11:27
*** jv__ has quit IRC12:38
*** rcernin has quit IRC12:56
*** jv__ has joined #openstack-swift13:09
*** jv__ has quit IRC13:48
*** djhankb has joined #openstack-swift15:23
timburkegood morning15:24
*** manuvakery has joined #openstack-swift16:11
*** psachin has quit IRC16:24
openstackgerritTim Burke proposed openstack/swift master: wsgi: stop closing listen sockets when workers die  https://review.opendev.org/74872116:48
timburkeclayg, i'm kinda tempted to squash ^^^ and its two parents into one patch -- it wasn't until i started digging into the graceful exit for workers that i could really see the strategy i wanted for socket-per-worker16:49
timburkeso a decent bit of that last patch feels like it's winding back changes from the first one :-/16:50
claygidk man, we don't really know what's going on with these workers when we need them to shut down16:51
claygI was looking at the rss killer and thinking about HUP/TERM - and I'm not sure we won't need a "hard stop after timeout" sort of situation16:51
clayghaving options is ideal16:52
claygit looks like that change might also be adding workers sharing sockets again?  i probably can't make an honest assessment about doing a squash w/o spending more time with it16:54
timburkeclayg, so fwiw, i've been testing with killing workers via HUP/USR1 for a graceful exit, TERM for a harder stop, and KILL for a "right now, i *mean it*" and the parent's been good about bringing back a fresh worker in its place17:40
timburkeworkers will share sockets only in so far as one worker replaces another17:41
timburkeso if you've got workers=4, we bind four sockets in do_bind_ports, spin up four workers, and if one of those workers dies, we move its socket over to the "orphan" column so we can spin up a fresh worker to start accepting on it again17:43
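[editor's sketch] The bookkeeping timburke describes (one listen socket per worker, with a dead worker's socket moved to an "orphan" column until a replacement picks it up) might look roughly like the following. This is a minimal illustration, not the code in the patch under review: the class name `SocketPerWorkerBook`, its methods, and the use of one ephemeral loopback port per socket are all assumptions made for the example, and no real forking is done.

```python
import socket


class SocketPerWorkerBook:
    """Hypothetical bookkeeping for a socket-per-worker strategy.

    The parent binds every socket up front. A socket is either owned by
    a live worker pid or parked in the orphan list awaiting a new worker.
    """

    def __init__(self, workers):
        self.orphans = []   # bound sockets with no live worker
        self.by_pid = {}    # worker pid -> the socket it accepts on
        for _ in range(workers):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            # Assumption for the sketch: each socket gets its own
            # ephemeral loopback port so the example runs anywhere.
            sock.bind(("127.0.0.1", 0))
            sock.listen(5)
            self.orphans.append(sock)

    def assign(self, pid):
        """Hand an orphaned socket to a freshly started worker."""
        sock = self.orphans.pop()
        self.by_pid[pid] = sock
        return sock

    def worker_died(self, pid):
        """Keep the dead worker's socket open and mark it orphaned."""
        sock = self.by_pid.pop(pid)
        self.orphans.append(sock)
        return sock

    def close_all(self):
        for sock in self.orphans + list(self.by_pid.values()):
            sock.close()
```

With workers=4, the parent would call `assign(pid)` once per forked child, and `worker_died(pid)` followed by another `assign(new_pid)` whenever a child exits, so the listen socket itself is never closed.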
*** manuvakery has quit IRC19:20
openstackgerritClay Gerrard proposed openstack/swift master: add swift-manage-shard-ranges shrink command  https://review.opendev.org/74172119:39
*** ormandj has quit IRC19:39
claygtimburke: is there a signal you can send to a worker that closes its socket as well?  maybe useful to distinguish HUP/USR1 in this regard19:40
*** ormandj has joined #openstack-swift19:41
timburkenot at the moment. got a preference on which one should do the close?19:41
timburkeor rather, the shutdown...19:41
timburkeit's gonna complicate the parent a bit since it'll need to check whether the socket it's got in hand is shutdown or not, but the flexibility does seem useful19:42
claygI'm almost positive the reason we're killing workers is to get the socket to close20:01
claygand i've also become skeptical that our current rss killer can "just" use HUP - I think it ends up sending a TERM after close doesn't work20:02
openstackgerritTim Burke proposed openstack/swift master: Client should retry when there's just one 404 and a bunch of errors  https://review.opendev.org/74494220:11
timburkeclayg, which socket, though? the listen socket or the client connection socket? if the rss killer is working *today*, without a listen socket per-worker, it sure seems like if anything, it *must* be the client connection socket that needs to get nuked20:16
timburkei'm still concerned by the stack xrays we've seen recently for orphans following a USR1 that seem to show workers in a deadlock down in logging. if those start piling up, and a bunch of them are loading some large-ish SLO manifest in their head before locking up, that seems likely to cause ballooning memory...20:18
timburkeof course, if those two things *are* related, the graceful stop won't actually stop -- but by sending a "stop accepting new connections" signal and then waiting 0.5-5 mins, we'll have more confidence that any connections still associated with that worker were *never* going to receive a response, so a TERM is "safe"20:22
timburkei think it'll absolutely be worth us getting the new code running on a canary in prod then manually watching for when rss gets "too high" and fixing it. if a simple HUP is insufficient, we should be ready to run an xray and look for whether we've still got the main thread in the accept loop or not. if it is, that'd indicate the HUP was ineffective and we should leave the rss-killer going straight to TERM20:28
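[editor's sketch] The "graceful stop, wait, then TERM" escalation discussed above can be sketched as a small helper. This is only an illustration under stated assumptions: `stop_worker`, its parameters, and the injected `send_signal` / `has_exited` callables are hypothetical names, not part of Swift or its rss killer. In real use `send_signal` could be `os.kill`, and the grace period would be somewhere in the 0.5-5 minute range mentioned in the conversation.

```python
import signal
import time


def stop_worker(pid, send_signal, has_exited, grace=30.0, interval=0.5,
                sleep=time.sleep):
    """Ask a worker to stop gracefully, then escalate after a timeout.

    send_signal(pid, signum) delivers a signal (for example os.kill).
    has_exited(pid) returns True once the worker is gone.
    Returns "graceful" if HUP was enough, "terminated" if TERM was sent.
    """
    # Step 1: request a graceful exit (stop accepting, finish requests).
    send_signal(pid, signal.SIGHUP)

    # Step 2: give in-flight requests up to `grace` seconds to finish.
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if has_exited(pid):
            return "graceful"
        sleep(interval)

    # Step 3: anything still attached to this worker is assumed stuck,
    # so a harder stop is considered safe at this point.
    send_signal(pid, signal.SIGTERM)
    return "terminated"
```

Injecting the two callables keeps the escalation logic separate from process management, so it can be exercised without spawning real workers.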
*** djhankb has quit IRC22:06
*** djhankb has joined #openstack-swift22:07
-openstackstatus- NOTICE: A zuul server ended up with read only filesystems which caused many jobs to hit retry_limit. The server has been rebooted and appears happy. Jobs can be rechecked.22:14
*** djhankb has quit IRC22:37
*** djhankb has joined #openstack-swift22:37
timburke:-/ *is* there a way to check whether a listen socket has been shut down short of trying to accept and catching the EINVAL if it's been shut down? i still want the parent to do the binding, but only the children should do any accepting...22:57
DHEwhat would happen if you polled it?23:01
*** rcernin has joined #openstack-swift23:10
*** rcernin has quit IRC23:15
*** djhankb has quit IRC23:30
*** djhankb has joined #openstack-swift23:31
timburkeDHE, good call! looks like i can check for POLLHUP flags being set. and here i was just about ready to go parsing the result of `lsof -a -p {os.getpid()} -d {sock.fileno()} -FtT`...23:48
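[editor's sketch] The polling check DHE suggested and timburke confirmed might look like this. It is a minimal sketch, not the code that ended up in the patch: the function name `listen_socket_is_shutdown` is hypothetical, `select.poll` is unavailable on Windows, and the assumption that a shut-down listen socket reports POLLHUP reflects Linux behavior and may not hold on other platforms.

```python
import select
import socket


def listen_socket_is_shutdown(sock):
    """Return True if polling the socket reports POLLHUP.

    Uses a zero timeout so the check never blocks and never calls
    accept(), leaving all accepting to the worker processes.
    """
    poller = select.poll()
    # POLLHUP is always reported when set; POLLIN is registered only
    # because some event mask has to be given.
    poller.register(sock.fileno(), select.POLLIN)
    for _fd, events in poller.poll(0):
        if events & select.POLLHUP:
            return True
    return False
```

A parent holding a worker's listen socket could call this after the worker exits to decide whether the socket can be handed to a replacement or has to be re-bound.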
DHEhooray I'm useful! :)23:58
