21:00:03 #startmeeting swift
21:00:03 Meeting started Wed Jun 17 21:00:03 2020 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:06 The meeting name has been set to 'swift'
21:00:10 who's here for the swift meeting?
21:00:26 hi
21:01:12 half here
21:01:30 o/
21:01:34 o/
21:02:18 o/
21:02:43 as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:53 first up
21:02:59 #topic gate
21:03:19 you may have noticed that nothing was passing the last couple days
21:03:41 i think it's all resolved now, but i wanted to give an overview of the issues
21:03:53 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015432.html
21:04:19 there was an issue with uwsgi that broke our grenade job (along with *everyone else*)
21:04:47 the qa team's been all over it, and the resolution merged last night
21:05:35 then there was another issue with our probe tests (most visibly; it also affected the ceph s3 tests and rolling upgrade tests)
21:06:05 pretty sure it was the result of pip no longer being available in the base images
21:06:06 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015425.html
21:06:08 o/
21:06:43 the fix there did require a change to our tooling, but that merged this morning
21:06:44 https://review.opendev.org/#/c/735992
21:06:44 patch 735992 - swift - Use ensure-pip role (MERGED) - 5 patch sets
21:08:29 i rechecked a bunch of changes about three hours ago, but everything's all backed up so none of those have actually posted new results yet
21:08:31 thanks for fixing the gate timburke !
21:08:51 if anyone sees more issues, holler!
21:09:14 #topic memcache and container failures
21:10:38 so last week i had all replicas of a container get overloaded
21:10:57 yeah that was pretty cool
21:11:22 actually I wasn't there - it SOUNDED cool (after the fact)
21:11:42 which led me to notice that when the proxy hands back a 503 (because we got timeout, timeout, timeout, 404, 404, 404), we go evict memcache
21:11:55 #link https://bugs.launchpad.net/swift/+bug/1883211
21:11:55 Launchpad bug 1883211 in OpenStack Object Storage (swift) "get_container_info 503s shouldn't try to clear memcache" [Undecided,In progress]
21:13:40 which meant that once info fell out of cache while there were hundreds of concurrent requests trying to do things in the container, it couldn't *stay in cache* even when some of those HEADs trying to repopulate it managed to get back to the proxy
21:14:47 i proposed https://review.opendev.org/#/c/735359/ to fix it (basically, follow what the docstring said to do in set_info_cache), but i was wondering if anyone else has seen similar behavior
21:14:48 patch 735359 - swift - proxy: Stop killing memcache entries on 5xx responses - 4 patch sets
21:15:07 moral of the story: don't let your primaries get overloaded - but when you do! you know... be better swift
21:15:08 I haven't but it sounds persuasive.
21:16:14 note that prior to https://review.opendev.org/#/c/667411/ (from about a year ago), we would've been caching a 404
21:16:19 patch 667411 - swift - Return 503 when primary containers can't respond (MERGED) - 2 patch sets
21:16:48 I was reluctant to go mucking with such old code; but once I realized we're a few iterations away from untangling all the things that could possibly lead to clients+sharder overwhelming a root db...
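[A rough sketch of the 503-vs-404 decision discussed above (the behavior patch 667411 introduced): when every primary errors out or times out and only handoffs answer 404, the proxy answers 503 rather than propagating (and caching) a bogus 404. Function and variable names here are illustrative stand-ins, not Swift's actual internals.]

```python
# Hedged sketch of the response-resolution rule from patch 667411:
# a 404 that only came from handoffs means "not here", not "doesn't
# exist" -- the primaries might still hold the container DB.

def resolve_container_response(primary_statuses, handoff_statuses):
    """Pick a client-facing status from backend statuses.

    Timeouts are represented as None; everything else is an int
    HTTP status. Names are hypothetical, for illustration only.
    """
    good = [s for s in primary_statuses if s is not None and s < 500]
    if good:
        # At least one primary actually answered; trust the best answer.
        return min(good)
    # No primary could respond: don't trust handoff 404s -- the only
    # honest answer is "service unavailable, try again".
    return 503

print(resolve_container_response([None, None, None], [404, 404, 404]))  # -> 503
print(resolve_container_response([200, None, None], [404]))             # -> 200
```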
I loaded it in my head and it makes sense to me
21:17:25 (funny enough, it was definitely the same cluster and quite possibly the same container that prompted that change, too)
21:17:29 I'm not even sure we really *intended* to clear the cache on error - the history reads more like it just happened by accident as the code evolved
21:18:12 certainly all the primaries being overloaded isn't something that comes up often - it's possible it was just never bad enough (or when it got that bad there were OTHER things that were ALSO bad - like... idk... not enough ratelimiting)
21:18:36 yeah, it sure *seemed like* https://review.opendev.org/#/c/30481/ didn't mean to change behavior like that
21:18:37 patch 30481 - swift - get_info - removes duplicate code (Take 3) (MERGED) - 17 patch sets
21:18:41 anyway - even if I'm wrong and someone thought they had a good reason to flush cache on error... I can't convince myself anymore that it's a good idea
21:19:11 when the backend service is saying "please back off", GO HARDER is rarely going to be the BEST plan 😁
21:20:01 anyway; we're shipping it - and at least two cores like the change - so it'll probably merge eventually, but it's fairly fresh and we're open to better ideas!
21:20:41 The problem is usually the cache being stale. If the error indicates the backing storage changed without the cache being flushed, then the cache needs to be flushed. Not sure if a 503 is such an error. A 409 seems like a candidate for suspicion.
21:22:04 *nod* i'm not sure that the container server can send back a 409 on GET or HEAD, but good thinking. will check
21:22:34 which 409? timburke the 404 cache is so weird... to think of that as a "remediation" I mean... maybe a client does a PUT and ends up on handoffs!? I don't think that behavior was any more desirable really.
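[The fix under discussion (patch 735359) boils down to a caching rule: populate the cache on definitive answers (2xx/404) and leave any existing entry alone on 5xx. A minimal illustrative sketch, assuming a dict-backed cache rather than Swift's real memcache plumbing; class and method names are hypothetical.]

```python
# Illustrative sketch (hypothetical names, not Swift's real API) of
# the rule in patch 735359: only definitive backend answers update
# the cache; a 5xx leaves any existing entry alone, so an overloaded
# container DB isn't re-HEADed by every concurrent request.

class InfoCache:
    def __init__(self):
        self._cache = {}

    def set_info_cache(self, path, info):
        status = info['status']
        if status // 100 == 2 or status == 404:
            # Definitive answer: (re)populate the cache.
            self._cache[path] = info
        # On 5xx (timeouts, overload) do nothing: evicting here just
        # guarantees more traffic to the already-struggling primaries.

    def get_info_cache(self, path):
        return self._cache.get(path)

cache = InfoCache()
cache.set_info_cache('/a/c', {'status': 200, 'object_count': 42})
cache.set_info_cache('/a/c', {'status': 503})  # ignored; 200 entry survives
print(cache.get_info_cache('/a/c')['status'])  # -> 200
```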
21:22:49 I'm most happy about the tests - it's now defined behavior - we're saying on 503 we don't want to flush the cache
21:23:14 if we change our minds later at least we have tests that can express what we want - and we won't accidentally forget to think about it next time we're working in there
21:23:33 i'm gunna go +A it right now - I'm totally talking myself into it!!! 😁
21:23:44 lol
21:24:25 so as clayg mentioned, the trouble seemed to come from the shard stat reporting. fortunately, we've already landed a latch for that
21:24:49 timburke: so you're saying before p 30481 you think we'd leave the cache alone on 503? Or just that it was so long ago who KNOWS what would have happened?
21:24:50 unfortunately, we hadn't gotten that fix out to our cluster yet
21:24:50 https://review.opendev.org/#/c/30481/ - swift - get_info - removes duplicate code (Take 3) (MERGED) - 17 patch sets
21:25:13 clayg, yeah, pretty sure it would've been left alone
21:26:23 ok, so ... mostly just a heads up for folks I guess - the patch is new, but good. If anyone else has noticed the behavior before that'd be cool - but it's ok if not either.
21:26:35 while we were trying to stop those shard stats from reporting, we were sad to see that we couldn't just stop the replication servers to stop the background traffic
21:26:55 #topic replication network and background daemons
21:27:23 i wrote up https://launchpad.net/bugs/1883302 and https://review.opendev.org/#/c/735751/ for that particular issue
21:27:23 Launchpad bug 1883302 in OpenStack Object Storage (swift) "container-sharder should send stat updates using replication network" [Undecided,In progress]
21:27:23 patch 735751 - swift - sharder: Use replication network to send shard ranges - 1 patch set
21:27:23 oh yeah, this one's heavy - timburke wants to go full on
21:28:13 clayg: (sorry, lagging), we have not seen it, but I can't say it hasn't happened either
21:28:22 hrm...
I know that p 735751 is slightly more targeted to the bug - but really the issue and the fix are much more pervasive than we originally realized
21:28:22 https://review.opendev.org/#/c/735751/ - swift - sharder: Use replication network to send shard ranges - 1 patch set
21:29:15 timburke: I'd argue we reword the bug to at least "sharder and reconciler don't always use replication" and attempt to move forward with p 735991, which is bigger but WAY better
21:29:16 https://review.opendev.org/#/c/735991/ - swift - Add X-Backend-Use-Replication-Network header - 1 patch set
21:29:54 yeah -- so the writes go over replication, but the sharder still does reads over the client-traffic interface -- that was harder to fix since it uses internal_client for those
21:30:07 it's got me wondering: which interface should our background daemons be using?
21:30:09 it's like a unified way to make all our different little client interfaces use replication networks like they probably all should have been doing forever; we just never had an interface for 'em before
21:31:06 oh yeah interesting.
21:31:18 the way i've got that second patch at the moment, callers have to opt in to using the replication network. but i wonder if we could/should use it by default
21:32:26 timburke: I think i'd be willing to say anything besides the proxy connecting to node[ip] when a [replication_ip] is available is a bug? like not a design choice, or operator choice - a bug
21:32:30 if a direct client or internal client is ever used inline from a customer request then client traffic, else replication network.
21:32:44 clayg says we (nvidia, nee swiftstack) have at least one utility we've written that *would* want the client-traffic network; i wonder what other people have written and which interface they'd prefer
21:33:11 that's a fairly strong stance, but personally having a separate storage server for background work (that I can turn off when needed) has been a HUGE QOL improvement for me over the years
21:33:57 mattoliverau: I don't think internally we ever use direct/internal client from inside the proxy (i.e. related to a user request)
21:34:07 timburke: do some of the new UPDATE requests use direct client?
21:34:30 "new" - i'm not sure there's anything landed that does that... and IIRC they just call req.get_resp(app)?
21:34:32 yeah, trying to decide if we use it anywhere
21:34:39 nope, it's plumbed through the proxy-server app
21:35:36 well, maybe i keep it opt-in on that patch and propose another to change the default while people think through what they've got and what the upgrade impact would be like
21:35:57 so internal-client and "the proxy-server app" are VERY similar - but Tim found a place between InternalClient and the app itself where we can plumb this header through (and then way down near where we make_connection(node, ...) we get to look at headers to pick node[ip] or node[replication_ip])
21:36:43 it's really sort of slick - and sexy, because it works uniformly across both interfaces (both already take headers and can set backend defaults)
21:36:53 Direct client going straight to the replication network sounds unexpected to me. I thought proxies might not even have that network.
21:37:36 zaitcev: that's good feedback! proxies don't use direct client - but anything "defaulting" to the backend network might be "surprising" to some
21:38:37 a quick grep confirms it - no direct client in proxy or middlewares.
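[The header plumbing described above can be sketched roughly as follows. The header name X-Backend-Use-Replication-Network comes from patch 735991; the selection helper and its truth-value parsing are simplified stand-ins for Swift's actual code.]

```python
# Sketch (not Swift's real implementation) of header-driven node
# selection: down where the backend connection is made, the headers
# decide whether to dial node['ip'] or node['replication_ip'].

USE_REPLICATION_NETWORK_HEADER = 'X-Backend-Use-Replication-Network'

def select_ip_port(node, headers):
    """Return (ip, port) for a backend connection.

    node is a ring entry with 'ip'/'port' and optional
    'replication_ip'/'replication_port'; headers are the backend
    request headers. Truth-value parsing here is deliberately simple.
    """
    use_repl = str(headers.get(USE_REPLICATION_NETWORK_HEADER, '')).lower()
    if use_repl in ('true', '1', 'yes'):
        # Caller opted in: prefer the replication interface, falling
        # back to the client-traffic interface if none is configured.
        return (node.get('replication_ip', node['ip']),
                node.get('replication_port', node['port']))
    return node['ip'], node['port']

node = {'ip': '10.0.0.1', 'port': 6201,
        'replication_ip': '10.1.0.1', 'replication_port': 6201}
print(select_ip_port(node, {}))  # -> ('10.0.0.1', 6201)
print(select_ip_port(node, {USE_REPLICATION_NETWORK_HEADER: 'true'}))  # -> ('10.1.0.1', 6201)
```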
21:38:42 and I hadn't considered access/topology - if someone deploys anything that uses either of these interfaces ON a node that can use the replication network, that could be a big surprise 😞
21:40:25 Well for things like the reconciler and sharder, they are part of the consistency engine; the sharder is just a type of replicator (in a way). so yeah it totally should do its work over the replication network.
21:40:45 timburke: I would encourage you to drop p 735751 now that p 735991 is on everyone's radar - to me, it's not so much about "fixing ALL THE THINGS" as "fixing it RIGHT"
21:40:45 https://review.opendev.org/#/c/735751/ - swift - sharder: Use replication network to send shard ranges - 1 patch set
21:40:47 https://review.opendev.org/#/c/735991/ - swift - Add X-Backend-Use-Replication-Network header - 1 patch set
21:41:11 👍 thanks for the feedback, everyone!
21:41:24 on to updates
21:41:31 #topic waterfall EC
21:41:37 clayg, how's it going?
21:41:50 mattoliverau: I'm glad to hear you say that! I think having internal and direct client grow these new interfaces will make it much easier to get it right out of the gate for new daemons
21:41:58 timburke: a little better-ish, or maybe?
21:42:01 I like the feeder!
21:42:35 https://review.opendev.org/#/c/711342/8 phew - too many links open
21:42:35 patch 711342 - swift - wip: asyc concurrent ecfragfetcher - 8 patch sets
21:42:47 I'm still waffling about the code duplication
21:44:02 i don't know exactly how to describe the experience of pulling them apart - it's like I'm starting to see the tear lines and I can't help but try and imagine a few abstractions that could MAYBE cut through them 😞
21:44:12 I mostly try not to think about it while I make waterfall-ec awesome
21:44:17 which it *totally* is
21:45:33 or at least I can see how it will be - once I add a follow-up to configure the feeder with per-policy settings and the stair-step configuration that alecuyer talked about at the PTG
21:45:50 nice!
21:46:01 I'm much more excited about working on that code than wading through the mess of cutting up GETorHEADHandler and ECFragGetter
21:46:33 at some level I want to just leave the messy turd there, finish the stuff I care about, and then try to re-evaluate when I feel less pressure to FIX THE DAMN BUG
21:47:17 but I sort of know a new priority will come along, and even though I'll probably get a patch up out of pure guilt - it's not obvious to me "here's a 1000 line diff that doesn't change anything" is gunna get merged if I'm not complaining about it
21:47:52 ALSO! I need to chat with folks about extra requests for non-durables - or at least... the existing behavior is obviously wrong and the correct behavior is not obvious
21:48:23 I picked something... and it's... better - but what if Y'ALL have an even BETTER idea!!!
21:48:58 Little hope of that I'm afraid.
21:49:10 Also
21:49:13 i dunno if we can wait til the next PTG to go over it...
21:49:49 should we read what you've done so far to try to get our heads around the problem, or should we sum it up now?
21:50:23 I think it's a complex enough change (I'm really trying to SIMPLIFY) that it's worth a read by anyone who can handle it
21:50:34 I've been trying to drop comments around the interesting bits
21:50:41 we could schedule a video chat if you think something closer to "in person" would be best
21:51:05 etherpad braindump of the current problem, then video chat to talk through it?
21:51:14 plus time to look at code :)
21:51:21 yes, waiting for the next ptg is too far off if clay is working on it *now*
21:51:34 for the non-durable extra requests - yeah I would like high-bandwidth discussion (that was very helpful at the PTG); I would definitely try and prepare if there was something scheduled.
21:52:48 let's do something then :) We could zoom or jitsi and announce it in channel so anyone can attend. (keeping it open).
21:52:48 ok, well no one is screaming about the code duplication - that gives me some confidence that I've built it up enough that no one is going to come to review and be like "WTF is this!? you can't do this!"
21:53:12 so I'll leave the turd there and move on down the line to the follow-on configuration stuff (which will be SUPER sexy)
21:53:47 then we're just left with non-durable extra requests - which I can write up ASAP, and Tim will help me with a zoom thing
21:54:02 👍
21:54:08 clayg: like you said at the PTG, code dup is ok, so long as we all know, it's documented, and it makes it easier to grok and understand ;)
21:54:24 mattoliverau: ❤️ you guys are the best
21:54:35 all right
21:54:38 thanks for polishing the turd :)
21:54:59 sorry rledisez, alecuyer: i forgot to drop losf from the agenda like i'd promised to last week
21:55:06 so
21:55:11 #topic open discussion
21:55:24 anything else we should talk about in the last five minutes?
21:55:27 well I'll just post a link for clay ;) wrt a PTG question
21:55:27 https://review.opendev.org/#/c/733919/
21:55:28 patch 733919 - swift - s3api: Allow CompleteMultipartUpload requests to b... - 3 patch sets
21:55:28 https://docs.python.org/3/library/multiprocessing.shared_memory.html
21:55:50 alecuyer: YAS!!
21:55:55 3.8 only tho - but it's a nice interface for using shared memory; switch the ring to use numpy?
21:56:05 timburke: my complete multi-part retry has been going for 3.5 hours - and it's still working
21:56:07 alecuyer, that also makes me think of something DHE mentioned earlier today...
21:56:19 didn't think about it but thought i'd share the link, and sorry if you're all already aware of it
21:56:34 clayg, wow! 5 min seems *way* too short then -- maybe it should work indefinitely
21:56:45 dunno 😞
21:56:58 also i haven't tried abort - or... what were the other calls you were interested in?
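[For reference, the multiprocessing.shared_memory interface alecuyer linked works roughly like this toy example (Python >= 3.8): one process creates a named segment holding a ring-like array, and another attaches by name and reads it without making its own full copy. A real ring holds much more than a small device-assignment table; this only demonstrates the mechanism.]

```python
# Toy sketch of the idea floated above: put a ring-like array in
# POSIX shared memory so every worker maps one shared copy instead
# of each unpickling its own.
from multiprocessing import shared_memory
import array

replica2part2dev = array.array('H', [0, 1, 2, 1, 2, 0])  # toy assignment table
nbytes = len(replica2part2dev) * replica2part2dev.itemsize

# "Writer": create a named segment and copy the table in.
shm = shared_memory.SharedMemory(create=True, size=nbytes)
shm.buf[:nbytes] = replica2part2dev.tobytes()

# "Reader" (could be another process): attach by name and view the data.
reader = shared_memory.SharedMemory(name=shm.name)
view = array.array('H')
view.frombytes(bytes(reader.buf[:nbytes]))
print(list(view))  # -> [0, 1, 2, 1, 2, 0]

# Cleanup: close both handles, then unlink the segment once.
reader.close()
shm.close()
shm.unlink()
```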
21:57:34 abort after complete and complete after abort are the two sequences i'm a little worried about
21:57:54 alecuyer: I remember thinking "oh it's only arrays? pfhfhfhfh" - but now that you mention it - what is the ring except a big array!? 😁
21:58:02 there is the error limiting stuff 🤔
21:58:33 zaitcev, thanks for the review on https://review.opendev.org/#/c/734721/ !
21:58:33 patch 734721 - swift - py3: (Better) fix percentages in configs - 4 patch sets
21:58:35 abort after complete - so i'm in that state now... but if that works I could try to complete it again too! 🤔
21:59:00 error limiting stuff in shared memory seems like a good idea
21:59:01 So, are we trying to load rings into SysV shm?
21:59:31 I'd be more comfortable with an mmap() of some temp file into which the json or pickle is dumped first.
21:59:44 kota__: yes! alecuyer will figure out how to make it work :P
22:00:11 it seems it's not py3.8 only, but greater than or equal to 3.8?
22:00:20 3.9 isn't released yet
22:00:44 kota__: right
22:00:48 good
22:01:01 all right, we're about out of time
22:01:04 kota__: yeah, and like zaitcev said, it's maybe not even a full solution on its own even if we did want to do it >= 3.8 only (which by the time it's done might seem reasonable)
22:01:33 thank you all for coming! i feel like we had some really good discussions today :-)
22:01:35 clayg: true, got it.
22:01:55 thank you all for coming, and thank you for working on swift!
22:01:59 #endmeeting