21:00:02 #startmeeting swift
21:00:02 Meeting started Wed May 1 21:00:02 2024 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:02 The meeting name has been set to 'swift'
21:00:11 who's here for the swift team meeting?
21:00:50 o/
21:01:13 huzzah! i was worried i'd be left talking to myself ;-)
21:01:25 not this time :)
21:01:41 i tried to do a better job of prepping this week
21:01:57 so the agenda's pretty full at
21:02:00 #link https://wiki.openstack.org/wiki/Meetings/Swift
21:02:06 first up
21:02:14 #topic utils refactor
21:02:21 #link https://review.opendev.org/c/openstack/swift/+/914029
21:02:22 patch 914029 - swift - Refactor utils - 20 patch sets
21:02:52 clayg, acoles, and i all like where this has landed
21:03:36 unfortunately it looks like there was a probe test failure in the gate (test_reconciler_move_object_twice), so it'll need a recheck
21:03:38 yeah, I love the idea of further refactor, utils is getting big.. but not looking forward to the rebase fallout, esp in tracing :P
21:03:52 but it'll be coming in the next day or so
21:04:18 and yeah, expect a decent number of merge conflicts to fall out of it (sorry in advance)
21:04:47 kk
21:05:13 i'll try to get a merge down to feature/mpu up asap once it's landed so acoles can have a ready-to-go branch in his morning
21:05:33 #topic probe test timeouts
21:05:35 oh yeah great idea
21:06:02 while i was reviewing that patch, i noticed that we get a fair bit of probe test timeouts
21:06:11 not a *ton*, but more than i'd like
21:07:10 some of them more or less make sense -- a patchset breaks every probe test, then the retry-failed-tests logic kicks in and retries them *all*...
21:07:19 yeah, that's reasonably likely to cause a timeout
21:07:45 others seem to just hang, though, and that's more worrying
21:08:15 #link https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_aeb/913949/3/check/swift-probetests-centos-9-stream/aebbd31/job-output.txt
21:08:55 for example, gets 8% of the way through the tests, then hangs until the timeout pops 1h51m later
21:09:21 wow
21:09:23 the test that hangs isn't consistent, fwiw
21:09:26 #link https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_67c/909800/7/check/swift-probetests-centos-9-stream/67cfe7b/job-output.txt
21:09:39 #link https://9b6014e80e764b848f3d-c29773bdeee4530a738751d9e026e2a7.ssl.cf1.rackcdn.com/874806/23/check/swift-probetests-centos-9-stream/ddc315e/job-output.txt
21:09:53 been able to reproduce when running probe tests locally?
21:10:08 nope -- so honestly i'm not quite sure how to debug it
21:10:32 but i figured i'd bring it up in case anyone else had ideas
21:11:01 i should probably write up a bug about it, and try to track job failures more closely
21:11:42 if anyone else wants to take a look, i found this helpful
21:11:45 #link https://zuul.opendev.org/t/openstack/builds?job_name=swift-probetests-centos-9-stream&job_name=swift-probetests-centos-8-stream&project=openstack%2Fswift&result=TIMED_OUT&skip=0&limit=100
21:11:45 yeah bug might be a good start. I'll run some probe tests locally in the meantime and see what happens
21:12:04 oh nice
21:12:55 it does seem like things got worse around March -- prior to that, it was mostly ~1/month
21:13:28 but of course, the older runs don't still have logs attached to verify the hang
21:13:54 next up
21:14:03 #topic liberasurecode release
21:14:12 it's been like a couple years!
21:14:24 so i put together authors/changelog
21:14:31 #link https://review.opendev.org/c/openstack/liberasurecode/+/917784
21:14:32 patch 917784 - liberasurecode - Release 1.6.4 - 1 patch set
21:15:06 yeah probably due for a release :P
21:15:30 there's nothing too major -- there's a bounds-check that callers might appreciate, but otherwise it's mostly code cleanup and build fixes
21:15:47 kk, will review it today
21:15:49 probably half the reason is just to make sure i remember how to do one of these ;-)
21:15:51 thanks
21:16:16 speaking of ec...
21:16:30 #topic manylinux wheels for pyeclib
21:17:03 so i've been playing with this for a bit, and created a Dockerfile to help build these a while back
21:17:09 #link https://review.opendev.org/c/openstack/pyeclib/+/817498
21:17:09 patch 817498 - pyeclib - Add Dockerfile to build manylinux wheels - 11 patch sets
21:17:31 but i finally got around to trying to get them building in CI!
21:17:37 #link https://review.opendev.org/c/openstack/pyeclib/+/917857
21:17:37 patch 917857 - pyeclib - Add job to build wheels - 5 patch sets
21:17:45 oh yeah, I remember you playing with this
21:17:57 nice
21:18:24 it even has them showing up as artifacts on the zuul build page: https://zuul.opendev.org/t/openstack/build/a8e195bfe57b4d2c928d1a52a0523e4e/artifacts
21:19:36 next up i want to beg some help from someone who knows zuul and the release process better than me to figure out how to actually build & upload that when we tag a release
21:20:14 you might have to visit infra for that
21:20:35 i also realize it might be nice to provide a little more context on manylinux wheels and why i want this
21:20:54 true
21:22:28 so any of us can build a binary wheel already -- setup.py bdist_wheel and away you go
21:23:27 but that would create a wheel tied to your specific version of system libraries (including not just glibc but also liberasurecode)
21:24:22 meaning that you couldn't just publish it and expect other people to be able to use it.
pypi will actually reject such a wheel if you even try
21:26:36 manylinux wheels are designed so you *can* distribute them, because they target a really old version of glibc and glibc won't break backwards compat
21:27:15 oh ok, making a lot more sense now
21:28:31 that actually only solves half the problem, though -- great, glibc's OK, and we can probably expect other people to have *some* version of that installed
21:28:45 but what about liberasurecode? or isa-l?
21:30:01 there's a way to have those baked into the wheel, too! and since *those* will only depend on some widely-installed libraries, now you've got a wheel that can actually be used in a lot of places
21:31:12 *and* you don't need a C build chain to install pyeclib
21:31:37 oh wow, ok. I never considered putting more into a wheel. I guess why not. The point is to save compiling etc.
21:31:52 my end goal is to be able to run `pip install swift` on a pretty bare-bones system and have it Just Work
21:32:50 at least now you can say `pip isntall https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_a8e/917857/5/check/pyeclib-build-wheels/a8e195b/artifacts/pyeclib-1.6.1-cp35-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl swift` and i *think* that'll work ;-)
21:33:02 (until the build results expire)
21:33:31 (and assuming you fix my isntall typo :P)
21:33:40 that would be cool. I actually did just that yesterday (pip install swift) and then needed to get python.py and a compiler installed.
So maybe good timing for this discussion :)
21:33:52 *python.h
21:35:03 there's more stuff that could be done (aarch64 wheels, musl wheels) but this seemed like a pretty good starting point
21:35:16 next up
21:35:24 #topic expirer work
21:35:28 +1
21:35:42 there are a few patches we've been looking at lately
21:36:24 one adds some more info to the expirer queue entries -- specifically, the content-length of items that are marked to expire
21:36:28 #link https://review.opendev.org/c/openstack/swift/+/912496
21:36:28 patch 912496 - swift - add bytes of expiring objects to queue entry - 13 patch sets
21:38:12 the other body of work is trying to deal with the large number of expirers and large number of queue entries we've got in prod -- every object node is participating, and that can result in a lot of account/container db load when they all restart
21:39:10 the fact that we've got a bunch of deferred work in the queue that should be skipped for now just adds to the frustration
21:39:27 so clayg has a couple patches
21:39:29 #link https://review.opendev.org/c/openstack/swift/+/914713
21:39:30 patch 914713 - swift - expirer: new options to control task iteration - 14 patch sets
21:39:35 #link https://review.opendev.org/c/openstack/swift/+/916026
21:39:35 patch 916026 - swift - distributed parallel task container iteration - 6 patch sets
21:40:14 they were stacked previously, but that second one hasn't been updated in a little bit
21:40:40 fwiw, though, i wonder how much we'd need the first one if we had the second one already
21:41:35 would finally moving to the new task queue (that divides up the queues amongst all the partitions (or whatever)) making it more distributed, be an option?
21:42:21 I haven't really looked into these patches yet. I'll try and get to that to get a better understanding
21:43:36 potentially?
p 517389 hasn't seen real activity since 2019, though, and we'll still need to deal with the 1B+ queue entries in the old layout
21:43:36 https://review.opendev.org/c/openstack/swift/+/517389 - swift - Add object-expirer new mode to execute tasks from ... - 47 patch sets
21:44:50 next up...
21:45:02 oh yeah, just thinking out loud
21:45:07 #topic py2/py3 behavior difference in brokers
21:45:30 acoles and i noticed a funny thing while reviewing a patch on feature/mpu
21:46:13 when we bulk-load all the rows from the pending file into a db, py2 shuffles the rows!
21:46:14 yeah, I've noticed this. And skipped on py2 tests because the row insert order isn't known between the 2
21:46:22 this was a bit of a surprise to both of us
21:47:01 oh! which test, do you remember? i want to fix it so py2 behaves like py3
21:47:35 didn't py2's dict not strictly ordered. maybe it's used as a datatype down in the sqlite module or something
21:48:00 it comes down to dict iteration order -- i think we just need to use an OrderedDict around https://github.com/openstack/swift/blob/2.33.0/swift/container/backend.py#L1365
21:48:30 I'll have to find it.. it was a while ago
21:48:42 and maybe https://github.com/openstack/swift/blob/2.33.0/swift/container/backend.py#L341
21:49:21 where was working on brokers. maybe in the shard-range sync point patch, or maybe something that's landed. I'll have to go digging. I'll ping you when I find it.
21:49:27 that'd be great if you can. i might be able to find it on my own, too, now that i know it's somewhere out there
21:49:38 last up
21:49:48 #topic unreleased swiftclient bug
21:50:29 there are a couple bugs caused by a recent-ish swiftclient patch, but Yan's got a fix up for them!
21:50:32 #link https://review.opendev.org/c/openstack/python-swiftclient/+/916135
21:50:32 patch 916135 - python-swiftclient - Fix swiftclient output regression - 5 patch sets
21:50:41 oh nice
21:51:02 we probably want to get that reviewed & merged fairly soon
21:51:19 kk, I'll put it on my list
21:51:31 all right, that's all i've got
21:51:35 #topic open discussion
21:51:42 anything else we want to bring up?
21:52:22 We do have some students from a university in Qatar who want to work on swift as a project at Uni, their teacher/lecturer has reached out.
21:52:46 I was trying to think of some swift related project for them to work on.
21:53:18 oh yeah, i think i saw you forwarded something to me... sorry, i'm bad at keeping up with outreach
21:53:19 So any thoughts would be greatly appreciated. Not sure on the size or complexity though.
21:53:47 Looking at our old ideas page maybe one of these?
21:54:00 account quotas for number of files
21:54:12 #link https://wiki.openstack.org/wiki/Swift/ideas/account-quota-files
21:54:49 task queue (though maybe too complex)
21:55:04 probably same with tiering.
21:55:20 we could try and give them pipeline automation
21:56:09 I think the reconciler and sharder daemons need better scaling (ie added concurrency with workers etc).
21:56:23 oh yeah, i should revisit p 635040 ...
21:56:23 https://review.opendev.org/c/openstack/swift/+/635040 - swift - Include some pipeline validation during proxy-serv... - 5 patch sets
21:56:40 Or maybe just something something interesting audit-watcher or custom middleware.
21:57:40 i'll have a think on it
21:58:08 Thanks, me too. And jianjian too now that he's joined the room :P
21:58:37 I think we're basically out of time.. so that'll do from me :)
21:59:53 all right. thank you for coming, and thank you for working on swift!
21:59:57 #endmeeting
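As a rough illustration of the manylinux wheels topic in this meeting: a wheel's filename carries its compatibility tags, and each `manylinux_X_Y` platform tag (PEP 600) promises the wheel needs only glibc X.Y or newer, while the legacy `manylinux1`, `manylinux2010`, and `manylinux2014` names correspond to glibc 2.5, 2.12, and 2.17. The sketch below reads the tags off the pyeclib CI artifact filename from the log and checks them against a given glibc version and architecture. The helper names (`parse_wheel_filename`, `glibc_requirement`, `min_glibc`, `installable_on`) are hypothetical, not pip's or pyeclib's API, and the logic is simplified: it ignores project-name normalization, musllinux tags, and tag priority. Real tooling should rely on the `packaging` library instead. Also note that the bundled liberasurecode/isa-l libraries discussed in the log do not appear in the filename at all; they live inside the wheel.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Filename of the pyeclib CI artifact mentioned in the meeting.
WHEEL = ("pyeclib-1.6.1-cp35-abi3-manylinux_2_5_x86_64.manylinux1_x86_64."
         "manylinux_2_12_x86_64.manylinux2010_x86_64.whl")

# Legacy manylinux names and the glibc baselines they stand for.
LEGACY_ALIASES = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}


class WheelInfo(NamedTuple):
    name: str
    version: str
    python: List[str]    # e.g. ["cp35"]
    abi: List[str]       # e.g. ["abi3"]
    platform: List[str]  # compressed tag set, split on "."


def parse_wheel_filename(filename: str) -> WheelInfo:
    """Split a wheel filename (name-version[-build]-py-abi-plat.whl)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 6:  # optional build tag present; we don't need it
        del parts[2]
    if len(parts) != 5:
        raise ValueError("unexpected wheel filename: %r" % filename)
    name, version, py_tags, abi_tags, plat_tags = parts
    return WheelInfo(name, version, py_tags.split("."),
                     abi_tags.split("."), plat_tags.split("."))


def glibc_requirement(tag: str) -> Optional[Tuple[int, int, str]]:
    """Return (major, minor, arch) for a manylinux tag, else None."""
    m = re.fullmatch(r"manylinux_(\d+)_(\d+)_(\w+)", tag)
    if m:
        return int(m.group(1)), int(m.group(2)), m.group(3)
    m = re.fullmatch(r"(manylinux1|manylinux2010|manylinux2014)_(\w+)", tag)
    if m:
        major, minor = LEGACY_ALIASES[m.group(1)]
        return major, minor, m.group(2)
    # e.g. a plain linux_x86_64 tag from a local bdist_wheel build
    return None


def min_glibc(filename: str) -> Optional[Tuple[int, int]]:
    """Lowest glibc version accepted by any of the wheel's manylinux tags."""
    reqs = [glibc_requirement(t) for t in parse_wheel_filename(filename).platform]
    versions = [r[:2] for r in reqs if r is not None]
    return min(versions) if versions else None


def installable_on(filename: str, glibc: Tuple[int, int], arch: str) -> bool:
    """True if some tag accepts a host with this glibc version and arch."""
    for tag in parse_wheel_filename(filename).platform:
        req = glibc_requirement(tag)
        if req is not None and req[2] == arch and glibc >= req[:2]:
            return True
    return False


if __name__ == "__main__":
    info = parse_wheel_filename(WHEEL)
    print(info.name, info.version, info.python, info.abi)
    print("needs glibc >=", min_glibc(WHEEL))
```

For example, `installable_on(WHEEL, (2, 17), "x86_64")` is true, while a glibc 2.4 host or any aarch64 host matches none of the four tags. That is consistent with aarch64 wheels being listed as possible follow-up work.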