21:00:06 <timburke> #startmeeting swift
21:00:07 <openstack> Meeting started Wed Nov  4 21:00:06 2020 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:10 <openstack> The meeting name has been set to 'swift'
21:00:15 <timburke> who's here for the swift meeting?
21:00:26 <kota_> hi
21:00:34 <seongsoocho> o/  Hi !
21:00:44 <mattoliverau> o/
21:00:53 <rledisez> hi o/
21:01:05 <timburke> rledisez! we missed you last week
21:01:15 <timburke> good to see you again :-)
21:02:04 <rledisez> yeah, sorry I missed that :( i hope you had great talks
21:02:29 <zaitcev> o/
21:03:05 <zaitcev> rledisez: I thought it was going to be all ex-OpenIO at OVH from now on, with Swift and Ceph commiserating in the dustbin.
21:03:39 <clayg> ugh, *already*
21:03:39 <zaitcev> rledisez: Because both you and Alex missed it, so it was a bit more than just one person not making it.
21:03:41 <acoles> o/
21:04:03 <timburke> all right -- first up
21:04:07 <timburke> #topic PTG
21:05:07 <timburke> thanks to those who came to the ptg last week, and to those who didn't, we all missed you
21:05:28 <timburke> notmyname even made a cameo appearance or two :-)
21:05:53 <clayg> #throwback
21:06:12 <timburke> we covered a bunch of great topics, and tried to take notes in the etherpad as we did so
21:06:18 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-wallaby
21:06:50 <timburke> a quick overview:
21:07:19 <timburke> clayg was interested in having a new handoff table in the ring, used only on reads
21:08:00 <timburke> people generally seemed enthusiastic about the idea, and we'll see what code ends up looking like
21:08:29 <clayg> "or if it gets written" 🤷‍♂️
21:08:59 <timburke> i was looking at memcached error limiting, particularly when you only have a single memcached server configured in the proxy
21:09:35 <timburke> i wound up writing a new patch to make the existing, hard-coded error limiting more tunable
21:09:37 <timburke> #link https://review.opendev.org/#/c/761029/
21:09:38 <patchbot> patch 761029 - swift - memcache: Make error-limiting values configurable - 1 patch set
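For readers unfamiliar with the mechanism: a minimal Python sketch of count-within-a-window error limiting with all three values tunable. It assumes the limiter tracks errors per server and ignores a server for a fixed duration once too many errors land inside a sliding window. The class and parameter names (ErrorLimiter, error_limit_count, error_limit_time, error_limit_duration) are hypothetical and are not the option names introduced by patch 761029.

```python
import time


class ErrorLimiter:
    """Hypothetical sketch of per-server error limiting.

    Names are illustrative only; they are not Swift's actual config options.
    """

    def __init__(self, error_limit_count=10, error_limit_time=60.0,
                 error_limit_duration=60.0, clock=time.time):
        self.error_limit_count = error_limit_count        # errors tolerated per window
        self.error_limit_time = error_limit_time          # window length, seconds
        self.error_limit_duration = error_limit_duration  # how long to skip a server
        self.clock = clock
        self._errors = {}         # server -> timestamps of recent errors
        self._limited_until = {}  # server -> time it may be used again

    def record_error(self, server):
        now = self.clock()
        # Keep only errors that still fall inside the sliding window.
        recent = [t for t in self._errors.get(server, [])
                  if now - t <= self.error_limit_time]
        recent.append(now)
        if len(recent) > self.error_limit_count:
            # Too many errors: stop using this server for a while.
            self._limited_until[server] = now + self.error_limit_duration
            recent = []
        self._errors[server] = recent

    def is_limited(self, server):
        return self.clock() < self._limited_until.get(server, 0.0)
```

With a single memcached server, whatever values are chosen here decide how long the proxy goes without any cache at all, which is why making them tunable matters for that setup.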
21:10:32 <timburke> ... which i think will satisfy my needs without going the more-complicated route in the chain starting at https://review.opendev.org/759183
21:10:32 <patchbot> patch 759183 - swift - memcache: Refuse to error limit the last available... - 2 patch sets
21:11:11 <timburke> zaitcev and dsariel continue working on audit watchers
21:11:17 <timburke> #link https://review.opendev.org/#/c/706653/
21:11:18 <patchbot> patch 706653 - swift - Let developers/operators add watchers to object au... - 38 patch sets
21:11:35 <timburke> i think it's getting pretty darn close to merging
21:11:54 <zaitcev> No, actually, it's done. I addressed the comments that came up at PTG like moving to a separate directory.
21:12:28 <zaitcev> Although leaving a debugging logger.info in there was pretty embarrassing.
21:12:30 <timburke> yeah, i suppose i should say, "i need to go have another look and merge it" :-)
21:12:56 <timburke> we talked a bit about the default and recommended configs, and came away with a few different things we wanted to do
21:13:55 <timburke> cschwede will look at improving our recommendations
21:13:58 <acoles> zaitcev: not as embarrassing as me leaving a debugging print !
21:14:48 <timburke> mattoliverau will look at trimming our manpages (to mostly just point to online docs iirc)
21:15:28 <timburke> and acoles will pull the long tables in the deployment guide out to separate pages (hopefully making the whole thing a bit more readable)
21:16:56 <mattoliverau> zaitcev: I'll try and get back to audit watchers this week. Just been distracted.
21:17:18 <timburke> clayg is excited about some of the recent proxy-logging changes and how they let you slice your metrics; he'll likely propose a change to the sample configs to have two separately-named proxy-logging middlewares, each with their own namespacing
21:18:08 <clayg> yeah, i really should do that - and also fix whatever is wrong with the byte-enforcing code
21:18:11 <timburke> and there was a marathon ops feedback session with a lot of good commentary from ormandj
21:19:28 <timburke> (i feel a little bad that that took up so much of our time, but i also feel like it's always one of the most-valuable things we can do when we're all together)
21:19:44 <timburke> for more on that, see
21:19:47 <timburke> #link https://etherpad.opendev.org/p/swift-wallaby-ops-feedback
21:19:54 <mattoliverau> +1 ormandj and the marathon ops feedback was awesome.
21:20:18 <clayg> for sure all good swift ❤️ ops
21:20:39 <timburke> that's my quick recap of the ptg; did i miss (or misrepresent) anything major?
21:21:27 <zaitcev> ALO?
21:22:07 <acoles> mattoliverau: talked us through all his great work on eliminating overlapping shard ranges
21:22:58 <timburke> oh yeah -- i keep writing it off since i haven't actually written any code for it yet ;-) but hopefully people have a better feel for the problems we've seen with trying to use SLOs for s3 MPUs, and why a new type of large object might be useful/necessary
21:24:28 <timburke> all right, moving on
21:24:34 <timburke> #topic gate failures
21:25:07 <timburke> so lately, i've had this feeling like our gate has been in particularly bad shape
21:25:16 <mattoliverau> And acoles came up with a great alternative shard audit with gaps algorithm I want to now write into code.
21:26:16 <timburke> i think the one that finally pushed me over the edge was https://review.opendev.org/#/c/759790/ -- 10 rechecks for a one-line change to drop an unused package from lower-constraints
21:26:16 <patchbot> patch 759790 - swift - Remove the unused coding style modules - 1 patch set
21:27:16 <mattoliverau> Only 10 rechecks :p
21:27:46 <timburke> so i started writing some tooling to get build info from zuul, pull down logs or subunit results, and parse out failures, looking for which jobs (and which individual *tests*) fail most often
21:28:34 <zaitcev> Suspense intensifies
21:28:51 <timburke> i don't have parsing for all job types yet, but it's already been able to help me find some particularly bad/annoying patterns
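A minimal sketch of just the tallying step of the tooling described above, assuming build results have already been fetched from Zuul and parsed (from logs or subunit streams) into (job, test, status) tuples. The function name tally_failures and the tuple shape are hypothetical; this is not timburke's actual tool, and fetching from Zuul is out of scope.

```python
from collections import Counter


def tally_failures(results):
    """Count failures per job and per (job, test) pair.

    `results` is assumed to be an iterable of (job_name, test_id, status)
    tuples; only entries whose status is "fail" are counted.
    """
    by_job = Counter()
    by_test = Counter()
    for job, test, status in results:
        if status != "fail":
            continue
        by_job[job] += 1
        by_test[(job, test)] += 1
    return by_job, by_test
```

Calling `by_test.most_common(10)` on the result then lists the ten (job, test) pairs that failed most often.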
21:29:44 <mattoliverau> Nice
21:30:06 <timburke> for instance, of 258 individual probe test failures, 149 of them were the result of resetswift failing because the loopback device was busy
21:30:30 <acoles> :'(
21:30:35 <timburke> hopefully that failure mode will go away with https://review.opendev.org/#/c/761439/
21:30:35 <patchbot> patch 761439 - swift - saio: Stop processes more forcefully in resetswift - 1 patch set
21:31:35 <timburke> i also found that the func tests were pretty nice to deal with, since they emit testrepository.subunit files
21:32:37 <timburke> so i figured i'd try switching probe tests to use ostestr, too: https://review.opendev.org/#/c/761459/ (and we'll just see whether the file shows up; it's all magic to me :-/)
21:32:37 <patchbot> patch 761459 - swift - probe: Use ostestr as test runner - 1 patch set
21:33:23 <mattoliverau> Nice work timburke
21:33:39 <clayg> mattoliverau: +1 timburke is a gate hero!
21:33:42 <timburke> there will probably be more information i glean from all of this (and more patches i write as a result), but wanted to share what i've been working on so far with it
21:34:21 <timburke> because i'm *so* tired of starting (and often ending) my day with a slew of rechecks :-(
21:34:44 <timburke> any questions or comments?
21:34:52 <acoles> good stuff timburke
21:35:37 <tosky> timburke: if I may - at this point you may try stestr
21:36:05 <tosky> ostestr is meant to be deprecated
21:36:30 <tosky> not sure it was already considered and tried in the past, maybe it was, so feel free to ignore me on this :)
21:36:43 <timburke> so i *did* try that originally! but for some unknown reason it caused the partition numbers used in the probe tests' rings to come out different -- no idea why
21:37:24 <tosky> uhm
21:38:29 <timburke> i'm certainly interested in making sure we have maintained software for the test runner, but i figured i'd start with just seeing whether using the same runner that we do in func tests gets me the test artifacts i'm looking for
21:39:19 <timburke> it's very strange, though. i'll keep digging, see if i can get some kind of repro/explanation/bug report for it
21:40:07 <timburke> #topic replication lock
21:40:10 <timburke> #link https://review.opendev.org/#/c/754242/
21:40:10 <patchbot> patch 754242 - swift - Fix a race condition in case of cross-replication - 6 patch sets
21:40:22 <timburke> rledisez, i'm sorry to say, i still haven't reviewed it :-(
21:40:34 <timburke> i even promised i would, too. sorry
21:41:22 <rledisez> that's ok, everybody is busy. it will be reviewed eventually ;)
21:43:07 <timburke> i'd still like to get a test env up such that i actually repro the problem and see the patch fix it, but at the same time, (1) you're already running it in prod, (2) it's definitely making your clusters better, and (3) i want *everybody's* clusters to run as well as rledisez's
21:43:48 <timburke> so maybe i should just run it through my mental python parser (clayg always tells me it's a pretty good one)
21:44:35 <mattoliverau> So that's your secret :)
21:44:37 <clayg> timburke: your brain is amazing
21:44:57 <clayg> mattoliverau: and he can do py2 and py3 at the same time!!!
21:45:43 <clayg> rledisez: did the rsync fix get squashed into the ssync change?  I was pretty happy with the strategy for locking the REPLICATION requests; but never quite followed what you were thinking for rsync?
21:45:44 <timburke> i'll try again this week; we've got a pretty big rebalance coming up, it'd probably be good for us to have that patch
21:46:03 <timburke> i don't think we have an rsync fix yet
21:46:44 <rledisez> clayg: no rsync fix. i have the idea pretty clear but I lack time
21:47:19 <timburke> all right, that's all i've got for the agenda
21:47:26 <timburke> #topic open discussion
21:48:23 <timburke> speaking of replication, rledisez, i'd be curious about your take on https://review.opendev.org/#/c/758636/
21:48:23 <patchbot> patch 758636 - swift - Add option to REPLICATE to just invalidate hashes - 5 patch sets
21:49:24 <timburke> (though you might be more interested in just ripping out post-(s)sync replicate calls; see https://bugs.launchpad.net/swift/+bug/1818709)
21:49:25 <openstack> Launchpad bug 1818709 in OpenStack Object Storage (swift) "object replicator update_deleted post ssync REPLICATE request considered harmful" [Undecided,New]
21:50:17 <rledisez> yeah, I sometimes disable it (like after a relink, it helps a lot). i'll have a look at the patch
21:50:51 <timburke> thanks
21:51:02 <tosky> I have a quick note about some legacy jobs (I'm the coordinator for the "no legacy jobs" community goal)
21:51:55 <tosky> I've just noticed a few legacy jobs I originally missed (thanks for porting basically all of them long time ago!)
21:52:09 <tosky> they don't use devstack-gate, so they are not so problematic, but still: they are in pyeclib
21:52:46 <tosky> so if you could convert those as well, and backport them to stable/victoria (and if you want also to older branches), that would be nice!
21:53:23 <timburke> tosky, thanks for the reminder! i'll look into it (hopefully this week?)
21:53:26 <tosky> oh, I see now, it doesn't have the openstack stable branches, so I guess master is fine
21:53:44 <tosky> thank you!
21:53:52 <mattoliverau> Yay, already easier :)
21:53:56 <timburke> i suspect they could largely be switched to the openstack-tox-... jobs
21:54:07 <tosky> most likely
21:54:45 <tosky> the official gerrit topic is "native-zuulv3-migration"
21:55:19 <timburke> 👍
21:55:22 <tosky> sorry for the late ping, I totally missed them originally; the devstack-gate jobs had (and have, the few left) a higher priority
21:56:08 <timburke> makes sense -- and pyeclib sees few enough patches, it likely wouldn't show up on any list of recently-run jobs
21:57:08 <timburke> all right
21:57:15 <timburke> thank you all for coming, and thank you for working on swift!
21:57:20 <timburke> #endmeeting