Wednesday, 2022-05-04

opendevreviewMatthew Oliver proposed openstack/swift master: sharding: add can_transition_to_cleaved helper method  https://review.opendev.org/c/openstack/swift/+/84040804:59
*** diablo_rojo is now known as Guest8313:36
opendevreviewAlistair Coles proposed openstack/swift master: Refactor rate-limiting helper into a class  https://review.opendev.org/c/openstack/swift/+/83496013:48
opendevreviewAlistair Coles proposed openstack/swift master: AbstractRateLimiter: add option to burst on start-up  https://review.opendev.org/c/openstack/swift/+/83512213:48
opendevreviewAlistair Coles proposed openstack/swift master: Add backend rate limiting middleware  https://review.opendev.org/c/openstack/swift/+/83604613:48
opendevreviewAlistair Coles proposed openstack/swift master: proxy: don't error limit ratelimiting backends  https://review.opendev.org/c/openstack/swift/+/83908813:48
opendevreviewAlistair Coles proposed openstack/swift master: backend ratelimit middleware: support separate config file  https://review.opendev.org/c/openstack/swift/+/84053118:35
timburke_#startmeeting swift21:01
opendevmeetMeeting started Wed May  4 21:01:17 2022 UTC and is due to finish in 60 minutes.  The chair is timburke_. Information about MeetBot at http://wiki.debian.org/MeetBot.21:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.21:01
opendevmeetThe meeting name has been set to 'swift'21:01
acoleso/21:01
timburke_who's here for the swift meeting?21:01
mattolivero/21:01
timburke_as usual, the agenda's at21:03
timburke_#link https://wiki.openstack.org/wiki/Meetings/Swift21:03
timburke_but the overall thing i've noticed is: we've got three or four major threads we've been pulling on, and i feel like we need to actually commit to getting some landed or otherwise resolved21:04
timburke_i think i'll do things a little out of order21:05
timburke_#topic eventlet and locking21:06
timburke_a quick overview of why we're even looking at this, though i think both acoles and mattoliver are in the loop already21:06
timburke_it all started with wanting to remove some import side-effects with https://review.opendev.org/c/openstack/swift/+/457110 -- specifically, to stop monkey-patching as part of import21:07
timburke_the more we dug into it, the more we started to freak out at how broken eventlet monkey-patching of locks is on py3 (see https://github.com/eventlet/eventlet/issues/546)21:08
timburke_which led to much hand-wringing and staring at code for a while, and a realization that all object replicators (for example) serialize on logging since the PipeMutex works across processes21:09
zaitcevHah!21:09
timburke_until we decided that maybe we *don't* need those logging locks *at all*! so we're going to see how https://review.opendev.org/c/openstack/swift/+/840232 behaves and whether it seems to garble our logs21:10
acoles🤞21:11
timburke_note that the fact that the PipeMutex works across processes *also* presents some risk of deadlocks -- if a worker dies or is killed while holding the ThreadSafeSysLogHandler's lock, everybody in the process group deadlocks21:12
timburke_so, hopefully that all works out, we drop the locking *entirely* in ThreadSafeSysLogHandler, and hope that there aren't any other CPython RLocks that might actually matter. sounds like mattoliver is planning on doing some load testing to verify our assumptions about how UDP logging (whether via the network or UDS) works21:16
mattoliverYup, will load test it some today. See if I can break it21:16
acolesnice - try your hardest @mattoliver :)21:18
timburke_next up21:18
timburke_#topic ring v221:18
zaitcevI assume one log message is one large UDP datagram. The protocol does the segmentation into packets for you, and it will not deliver a partially assembled datagram. I think.21:18
opendevreviewAlistair Coles proposed openstack/swift master: backend ratelimit: support per-method rate limits  https://review.opendev.org/c/openstack/swift/+/84054221:18
timburke_zaitcev, yeah, that was my understanding too21:19
timburke_and if it's via a domain socket, even better21:19
timburke_we've talked about ring v2 for a couple PTGs now, we've tried implementing it in a couple different patch chains -- but we haven't actually landed much (any?) code for it21:20
timburke_so my two questions are: does the current implementation seem like the right direction? and if so, can we land some of it before *all* of the work's ready?21:22
mattoliverI think the current index approach is correct. And I've been able to extend it to include the builder in my WIP chain. And it's what we discussed at PTGs. So I think it's a win21:23
mattoliverThe new RingReader/RingWriter approach encapsulates it really well. Kudos timburke_ 21:24
mattoliverSo for me I'm +1 on thinking it's the right direction, but I might be biased because I've been looking at it and involved closer with it then most. 21:26
timburke_my feeling, too :-)21:27
timburke_so should we commit to reviewing and landing https://review.opendev.org/c/openstack/swift/+/834261? mattoliver, do you have any concerns that some of the follow-ups should get squashed in before landing?21:28
mattoliverNah I think it's a great start. And so long as well double check before next release, I'm fine with it. 21:32
mattoliverie if any follow ups are required. 21:32
mattoliverThere was an issue in the RingWriter I think but you fixed that, so cool. 21:33
timburke_all right! sounds good21:34
timburke_next up21:34
timburke_#topic backend rate-limiting21:34
timburke_acoles, i feel like your chain's getting a little long :-)21:35
acolesyeah I need to rate limit myself21:35
acolessome may get squashed but I am working little by little because we want to get *something* into prod soon21:36
mattoliverlol21:36
acolesthe first couple of patches are refactor and minor fixup21:36
acolesthe goal is to provide a 'backstop' ratelimiter that will allow backend servers to shed load rather than have huge queues build up21:37
timburke_anything we should be aware of as we review it? it looks like there may be some subtle interactions with proxy-server error limiting21:39
acolesyes, so this patch could use some careful review https://review.opendev.org/c/openstack/swift/+/839088 - on master any 5xx response will cause the proxy to error limit (or increment the error limit counter), but we don't want that to happen when the backend is rate limiting (that would be an unfortunate amplification of the rate limiting)21:43
acolesso the patch adds special handling for 529 response codes (at PTG IIRC we decided 529 could be used as 'too many backend requests')21:43
mattoliveroh yeah, that could be bad, don't want to rate limite into error limitted21:44
acolesI chose to NOT log these in the proxy since there could be a lot and they won't stop because the proxy won't error limit the node21:44
timburke_seems reasonable -- the 529s will still show up in log lines like `Object returning 503 for [...]`, though, yeah?21:46
acolesyes I checked that21:47
timburke_https://review.opendev.org/c/openstack/swift/+/840531 seems like a great opportunity to allow for config reloading similar to what we do with rings21:48
acoleshaha, yes that's exactly where I am heading with that! actually, clayg's idea21:50
mattoliveroh yeah, that'll be cool.21:50
acolesbut, one thing at a time, I think the finer grained per-method ratelimiting may be higher priority?? we'll see21:51
timburke_sounds good21:51
timburke_ok, we're getting toward the end of our time -- maybe i'll skip the memcache stuff for now, though i'd encourage people to take a look21:51
timburke_#topic open discussion21:51
timburke_what else should we bring up this week?21:52
mattolivernot really, for those who use it, I've been updating VSAIO To use jammy because I got sick of deprecation warnings and older version of things: https://github.com/NVIDIA/vagrant-swift-all-in-one/pull/126 21:55
mattoliver*nothing really21:55
mattolivertimburke_: you were interested in getting people to look at signal handling, so https://review.opendev.org/c/openstack/swift/+/840154 but I'll take a better look today. 21:56
timburke_🎉 edge of technology, man!21:56
clarkbwe've got jammy ci nodes coming up now. There have been a few minor bumps, but they should mostly be working21:56
zaitcevPersonally I don't believe in that limiting thing unless it's global.21:56
zaitcevElection, master, and algorithm that considers a snapshot of load.21:56
zaitcevBuuuut21:56
zaitcevBut it's Alistair so it obviously good, so I dunno.21:57
mattolivercool thanks clarkb ! 21:57
zaitcevMaybe it will be some kind of thing that can dampen itself and never amplify the congestion.21:58
acoleszaitcev: it is admittedly a fairly blunt tool21:58
timburke_mostly depends on the client response to being told to back off, i suppose -- i have a hard time believing it'd be worse than what we've got now though22:00
timburke_(which is to say, backend servers backing up so much that proxies have already timed out by the time they've read the request that was sent)22:00
timburke_all right -- i've kept you all long enough22:01
timburke_thank you all for coming, and thank you for working on swift!22:01
timburke_#endmeeting22:01
opendevmeetMeeting ended Wed May  4 22:01:42 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)22:01
opendevmeetMinutes:        https://meetings.opendev.org/meetings/swift/2022/swift.2022-05-04-21.01.html22:01
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/swift/2022/swift.2022-05-04-21.01.txt22:01
opendevmeetLog:            https://meetings.opendev.org/meetings/swift/2022/swift.2022-05-04-21.01.log.html22:01

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!