21:00:22 #startmeeting swift
21:00:24 Meeting started Wed Apr 22 21:00:22 2020 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:25 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:27 The meeting name has been set to 'swift'
21:00:30 who's here for the swift meeting?
21:00:37 o/
21:00:45 o/
21:00:46 o/
21:01:07 o/
21:01:45 o/
21:02:13 as usual, agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:23 #topic PTG
21:02:45 another reminder that we've got an etherpad to collect things to talk about
21:02:47 #link https://etherpad.openstack.org/p/swift-ptg-victoria
21:03:10 hi
21:03:13 i also recently tried to get together a list of all our *prior* etherpads
21:03:16 #link https://wiki.openstack.org/wiki/Swift/Etherpads
21:03:50 thought it might be worth revisiting some of them :-)
21:03:57 Oh nice
21:03:57 wow, nice!
21:04:06 awesome!
21:04:15 good
21:04:30 i'm here too 😁
21:04:35 #topic swift release
21:04:38 we had one!
21:04:54 🥳
21:04:57 Yay
21:04:58 2.25.0 is out and is expected to be our ussuri release
21:05:19 ...which leads to...
21:05:27 #topic rolling upgrade gate is broken
21:06:01 i burned a day or so figuring out what went wrong, but our upgrade job broke following 2.25.0
21:06:46 the key was that i wrote a func test for the etag-quoter that left the thing on for the account
21:07:09 which broke a whole bunch of tests after that one
21:07:29 https://review.opendev.org/#/c/721518/ should fix the test to not leave the account dirty
21:07:29 patch 721518 - swift - func tests: Allow test_etag_quoter to be run multi...
- 1 patch set
21:08:08 but the multi node upgrade job is still failing 🤔
21:09:04 https://review.opendev.org/#/c/695131/ both fixes the func tests to accept either quoted or unquoted etags *and* makes an in-process job run with etag-quoter in the pipeline
21:09:05 patch 695131 - swift - func tests: work with etag-quoter on by default - 7 patch sets
21:09:27 clayg, 2.25.0 is out, and the func tests for that job come from that tag :-(
21:09:37 plus, the fix isn't merged yet :P
21:09:58 which was why i had to make it non-voting in https://review.opendev.org/#/c/721519/
21:09:59 patch 721519 - swift - Make rolling-upgrade job non-voting (MERGED) - 1 patch set
21:10:22 ic, so the rolling upgrade does setup using code from a tag - so once the latest tag has fixed code, future changes will work! Brilliant!
21:10:38 so basically "tim fixed everything and needs people to click buttons"???
21:10:50 or just "tim fixed everything" and we can just 👏
21:11:48 maybe? there's going to be more work coming, too -- in particular, i really want to have a "kitchen sink" func test job where we have *no* (or exceedingly few) skips
21:12:17 i was thinking the dsvm jobs might be a good place for it, so we see all features passing against both keystone and tempauth
21:12:54 (plus there are a bunch of tests that skip if you're not running with some extra keystone users defined)
21:13:14 so i dusted off https://review.opendev.org/#/c/620189/
21:13:14 patch 620189 - swift - WIP: swift-dsvm: Create more Keystone users so we ... - 11 patch sets
21:13:24 and started poking at https://review.opendev.org/#/c/722120/
21:13:25 patch 722120 - swift - dsvm: Enable more middlewares - 1 patch set
21:14:15 and i'm thinking about ways to make sure that we never have a release break that job again -- but i'm not entirely sure how to get that 🤔
21:15:30 maybe a rolling-upgrade job that uses origin/master?
could add it to the experimental checks that i always run when putting the authors/changelog patch together
21:15:56 if anybody has ideas on that, feel free to reach out!
21:16:34 on to updates!
21:16:45 #topic waterfall ec
21:17:09 clayg, still trying to find time to think more about it?
21:17:54 i mean i guess kinda... i want to put it behind working on better tracing support
21:18:05 👍
21:18:14 should i take it off the agenda for next meeting?
21:18:15 but if I was sure where the problem was and that I could do something to make it better, I guess we could skip that part
21:18:23 yeah you don't need to carry it forward
21:18:36 #topic lots of small files
21:18:48 alecuyer, rledisez how's it going?
21:19:25 Still working on it, but I've had to spend some time on other things too. I think I can post a first patch for the new key format this week
21:20:34 sounds good
21:20:51 #topic CORS
21:21:13 i still need to clean up https://review.opendev.org/#/c/720098/
21:21:13 patch 720098 - swift - WIP: s3api: Allow MPUs via CORS requests - 6 patch sets
21:21:55 but i think the other three in the chain are good to go (p 533028, p 710330, p 710355)
21:21:55 https://review.opendev.org/#/c/710330/ - swift - s3api: Pass through CORS headers - 13 patch sets
21:21:57 https://review.opendev.org/#/c/710355/ - swift - s3api: Allow CORS preflight requests - 16 patch sets
21:22:16 any chance someone would have an opportunity to look at them?
21:22:43 patchbot, p 533028
21:22:43 https://review.opendev.org/#/c/533028/ - swift - Add some functional CORS tests - 17 patch sets
21:22:48 better :D
21:24:17 well, worth asking ;-)
21:24:25 #topic sharding split brain
21:24:31 * mattoliverau not good at js. But I can look..
not sure it'll be useful :P
21:25:00 uh-oh
21:25:03 so we had a customer that accidentally got two sets of shard ranges inserted from different nodes
21:25:05 oops
21:25:47 this was through swift-manage-shard-ranges (not the auto-sharding stuff), it just got run twice for the same container
21:26:27 i got as far as writing a couple probe tests in p 721376, one of them gets it into the bad state
21:26:28 https://review.opendev.org/#/c/721376/ - swift - sharding: Add probe test that exercises swift-mana... - 2 patch sets
21:26:53 next up i need to figure out how to get back to a *good* state :P
21:27:05 I'll definitely look at that test.
21:27:15 thanks!
21:27:36 maybe we need to make sure we can id a set of ranges inserted by a tool.
21:28:05 on that topic, I saw some activity about auto sharding mattoliverau, anything we can do to help?
21:28:13 well, they'll all have the same epoch & timestamp as i recall, so *that's* good
21:28:23 I did have some code somewhere, in the old POC, that scanned ranges in a table and found the most recent contiguous full set of ranges (no gaps)
21:28:58 i'm thinking i probably want to mark one set's shard ranges as collapsed, so any already-cleaved DBs move their rows to the other set's shards?
21:30:00 fortunately, i'm pretty sure each set is complete -- it was using find_and_replace to do the whole set at once
21:30:31 oh yeah I'm sure, just if we want to pick one set then collapse or whatever the rest.
21:31:20 if we can't as easily define a set by metadata. though surely we can. I'll need to look at the code again
21:31:48 yeah, i'll try that out, see how far i get. definitely more code coming in the next few weeks
21:31:59 #topic open discussion
21:32:13 rledisez: thanks, just started blowing the dust off it, rebasing it.
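A rough sketch of the contiguous-set scan described above: group candidate shard ranges by the timestamp they were inserted with, and keep only the groups that tile the whole namespace with no gaps. The `(timestamp, lower, upper)` tuple layout and the grouping key are illustrative assumptions, not Swift's actual broker schema.

```python
from collections import defaultdict


def complete_sets(ranges):
    """ranges: iterable of (timestamp, lower, upper) tuples, where ''
    means "start of namespace" for lower and "end of namespace" for upper.

    Return the timestamps whose ranges form a complete, gap-free cover.
    """
    by_ts = defaultdict(list)
    for ts, lower, upper in ranges:
        by_ts[ts].append((lower, upper))
    good = []
    for ts, bounds in by_ts.items():
        bounds.sort()  # order by lower bound
        if bounds[0][0] != '':
            continue  # set doesn't start at the beginning of the namespace
        contiguous = all(bounds[i][1] == bounds[i + 1][0]
                         for i in range(len(bounds) - 1))
        if contiguous and bounds[-1][1] == '':
            good.append(ts)  # runs gap-free to the end of the namespace
    return sorted(good)
```

Given two complete sets (as in the split-brain case above), you would pick one, say the most recent, and mark the other's ranges for collapse.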
Wrote up a braindump: https://docs.google.com/document/d/17NllKQmH6tfTsKm5nAx3KCKUvs0zs_qamXtkreOQDWg/edit#
21:32:18 ^ auto-sharding
21:32:58 next step was to maybe create a probetest(s) to attempt to find other edgecases I can fix.
21:34:04 oh, i realized etag-quoter isn't happy on py3: https://review.opendev.org/#/c/721714/
21:34:05 patch 721714 - swift - py3: Make etag-quoter work - 1 patch set
21:34:06 Thinking about an exclusive UPDATE, so shards only get added to other primary nodes if they're not already there. Fail if there are nodes, to fix a potential edge case I could see.
21:34:16 I'll add it to the doc when I get a chance.
21:34:56 *fail if there are shards..
21:35:21 but anyway, all just brainstorming atm, getting ready for the PTG ;)
21:35:30 mattoliverau: I'll carefully read your document. I'm really interested in this feature
21:35:40 still related to container sharding: say I have a few (tens of) thousands of containers to shard. would you recommend that I shard them all at once and let the sharder do its job, or shard some of them, wait for the sharder, shard more, etc…
21:35:41 yeah, i really need to catch up on emails for that :-(
21:36:32 rledisez, at first, i'd say do one at a time, just to see how it goes, check logs, etc.
21:37:15 +1
21:37:17 yeah, we already did about 15 containers, it went well so now I want to go further :)
21:37:25 after doing that a couple times, though, i think it should be pretty safe to do a bunch at once -- we recently started offering that, i can ask around a bit to see how it went
21:38:14 i've come to rather appreciate the cleave limit -- makes sure the sharder isn't stuck on any single container for too long
21:38:17 what I'm afraid of if we do a lot at once is maybe the disk space usage on the container server. It could use a lot more space until all are done
21:38:36 true -- how full are disks?
21:39:13 well filled, so I should do small steps probably :)
21:39:40 time to buy more disks! damn supply-chain issues...
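A back-of-the-envelope way to size a sharding batch, given the disk-space worry above: assume each container DB transiently needs roughly double its size while both the root DB and its freshly cleaved shards exist. The function name, the 2x factor, and the headroom default are all illustrative assumptions, not Swift tooling.

```python
def batch_that_fits(db_sizes_bytes, free_bytes, headroom=0.25, factor=2.0):
    """Greedily pick containers (biggest first) whose transient sharding
    footprint (size * factor) fits within free space minus a safety margin.

    Returns the list of DB sizes chosen for this batch.
    """
    budget = free_bytes * (1 - headroom)
    batch, used = [], 0.0
    for size in sorted(db_sizes_bytes, reverse=True):
        need = size * factor
        if used + need <= budget:
            batch.append(size)
            used += need
    return batch
```

Anything that doesn't fit just waits for the next batch, which matches the "shard some, wait, shard more" approach discussed above.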
21:39:58 i don't think the process really requires a bunch of scratch space
21:40:04 to be exact, as you probably know, some of them are almost full while others are quite empty, big containers, uh… :D
21:40:53 clayg: I thought it deleted the original sqlite only when all is done, so for each container it would consume double the space in the cluster until it's done. am I wrong?
21:40:58 i guess the root being split won't be removed until after all the shards are populated - so yeah probably one at a time is better
21:41:01 ah, yeah -- so actually, it might be fine. sharding's *great* at evening out that lumpiness a bit
21:41:46 it's probably something to keep in mind for the auto-sharding. a limit on the number of containers being sharded at a time
21:41:52 mattoliverau: ^
21:42:01 rledisez: good idea!
21:43:17 could probably even have an optimization where it goes to check recon dumps *first* to see what's currently sharding, then go straight to the DBs... skip the treewalk. hmm...
21:43:18 the best would be to estimate the size of each shard and check there's enough space on the devices holding these shards
21:44:07 timburke: totally, that's what we do now: for db in $(cat | jq …); do
21:46:50 all right, i think i'm going to call it
21:47:01 thank you all for coming, and thank you for working on swift!
21:47:06 #endmeeting
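The recon-dump shortcut floated near the end of the meeting might look roughly like this: read the sharder's recon cache to find containers that are mid-shard, instead of walking the whole tree of DBs. The cache path and the `sharding_in_progress` key are assumptions for illustration; check your deployment's actual recon output before relying on any of this.

```python
import json


def currently_sharding(recon_path='/var/cache/swift/container.recon'):
    """Return the list of containers the sharder reports as in progress.

    Assumed layout: a JSON object with a 'sharding_in_progress' list of
    container paths. Real recon dumps may nest this differently.
    """
    with open(recon_path) as fh:
        recon = json.load(fh)
    return recon.get('sharding_in_progress', [])
```

A driver script could then go straight to those DBs and skip the treewalk entirely.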