Tuesday, 2022-09-27

[00:47] <opendevreview> Tim Burke proposed openstack/swift master: replicator: Use last-primary table https://review.opendev.org/c/openstack/swift/+/859349
[04:49] <opendevreview> Jianjian Huo proposed openstack/swift master: Sharder: warn when sharding appears to have stalled. https://review.opendev.org/c/openstack/swift/+/859373
[14:57] <opendevreview> Alistair Coles proposed openstack/swift master: proxy: refactor error limiter to a class https://review.opendev.org/c/openstack/swift/+/858790
[14:57] <opendevreview> Alistair Coles proposed openstack/swift master: Refactor memcache config and MemcacheRing loading https://review.opendev.org/c/openstack/swift/+/820648
[14:57] <opendevreview> Alistair Coles proposed openstack/swift master: Global error limiter using memcache https://review.opendev.org/c/openstack/swift/+/820313
[16:00] <acoles> reid_g: is it possible you have run into https://bugs.launchpad.net/liberasurecode/+bug/1886088? i.e. do you have different liberasurecode versions?
[16:04] <reid_g> Hmm. That does look possible. The 20.04 nodes have 1.6.1-4 and the 18.04 nodes have 1.5.0-1.
[16:05] <reid_g> How do I know whether "Note that this is only a problem *if your servers were using libec's alternative CRC*" applies to us?
[16:06] <timburke> reid_g, the second comment on the bug has a check_libec_crc.py script that you can run against some fragments
[16:12] <reid_g> When I run this script against one of the quarantined fragments, it says the CRC is stored as zlib.
[16:12] <reid_g> When I check a fragment that isn't quarantined, it says legacy.
[16:12] <reid_g> (both on the 18.04 box)
[16:12] <timburke> sounds like exactly the problem, then :-(
[16:13] <reid_g> hmm
[16:15] <timburke> i wonder how hard it would be to drop in the jammy package on the focal boxes, so you could have them writing legacy CRCs again... unfortunately, we didn't catch & fix the bug until 1.6.2
[16:18] <reid_g> So it sounds like we would need to install 1.6.2 on focal, set LIBERASURECODE_WRITE_LEGACY_CRC=1, finish upgrades on all nodes, remove LIBERASURECODE_WRITE_LEGACY_CRC=1, and move quarantined objects back to where they belong?
[16:23] <timburke> i think that'd be the recommended route, yeah. the fact that there's already-quarantined data makes it a little hairy -- it's hard to tell whether you've already got an availability issue. depending on how far along the upgrade you are, you might prefer to get 1.6.0+ on the remaining bionic nodes
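The version thresholds in this exchange can be captured in a small planning helper. This is a minimal sketch, not part of swift or liberasurecode: `parse_upstream_version` and `classify_libec` are hypothetical names, the parser assumes Debian-style package versions such as `1.6.1-4`, 1.6.2 is taken as the first release that honors LIBERASURECODE_WRITE_LEGACY_CRC (per timburke), and treating 1.6.0+ as writing and reading the newer zlib-style CRC is an assumption drawn from this conversation rather than from the libec docs.

```python
import re
from typing import Dict, Tuple

# Assumptions drawn from this conversation, not from liberasurecode documentation:
NEW_CRC_SINCE: Tuple[int, int, int] = (1, 6, 0)     # writes (and reads) zlib-style CRCs
LEGACY_ENV_SINCE: Tuple[int, int, int] = (1, 6, 2)  # honors LIBERASURECODE_WRITE_LEGACY_CRC


def parse_upstream_version(pkg_version: str) -> Tuple[int, int, int]:
    """Hypothetical helper: turn a Debian version like '1:1.6.1-4' into (1, 6, 1)."""
    upstream = pkg_version.split(":", 1)[-1]  # drop an optional epoch
    upstream = upstream.rsplit("-", 1)[0]     # drop the Debian revision
    match = re.match(r"\d+(?:\.\d+)*", upstream)
    if match is None:
        raise ValueError(f"unrecognized version: {pkg_version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))           # pad '1.6' out to (1, 6, 0)
    return (parts[0], parts[1], parts[2])


def classify_libec(pkg_version: str) -> Dict[str, bool]:
    """Hypothetical helper: what a node running this libec version can do."""
    version = parse_upstream_version(pkg_version)
    return {
        "handles_new_crc": version >= NEW_CRC_SINCE,
        "honors_legacy_env": version >= LEGACY_ENV_SINCE,
    }


if __name__ == "__main__":
    # The first two versions are the ones reported above; 1.6.2-1 is an example string.
    for ver in ["1.6.1-4", "1.5.0-1", "1.6.2-1"]:
        print(ver, classify_libec(ver))
```

Against the versions reid_g reported, the focal nodes' 1.6.1-4 handles the new CRC but cannot be told to write legacy ones, while the bionic nodes' 1.5.0-1 does neither, which is why the plan starts by moving focal to 1.6.2.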
[16:25] <reid_g> I wonder if this package is in the cloud archive
[16:25] <timburke> best way to stop the bleeding is to pull the bionic nodes out of your load balancer -- as long as the proxies are all writing frags with the legacy CRC and rebuilds are infrequent, you should generally get legacy crcs everywhere
[16:32] <timburke> this also reminds me that i should keep pushing on https://review.opendev.org/c/openstack/pyeclib/+/817498 -- ideally we'd even have zuul building binary pyeclib wheels that include libec and isa-l and publish them to pypi / https://tarballs.opendev.org/openstack/pyeclib/
[16:43] <reid_g> I am going to meet with my team about this. Maybe we can backport the liberasurecode1 package from jammy.
[17:46] <reid_g> The note 'This issue was fixed in the openstack/swift 2.27.0 release.' just means that you can set LIBERASURECODE_WRITE_LEGACY_CRC=1 via swift, right? If you are using 2.25 & liberasurecode1>=1.6.2, you can just set the env var manually to write legacy CRCs?
[18:25] <reid_g> timburke
[18:35] <timburke> reid_g, yeah -- starting in swift 2.27.0, you could set `write_legacy_ec_crc = true` in your proxies, reconstructors, and internal clients and have swift set the env var for you. if you can manage the env vars on your own, any swift can take advantage of the legacy-crc mode with libec 1.6.2+
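timburke describes two routes to legacy CRCs: swift 2.27.0+ sets the env var from `write_legacy_ec_crc`, or the operator exports LIBERASURECODE_WRITE_LEGACY_CRC=1 directly. The sketch below illustrates both under stated assumptions; `apply_write_legacy_ec_crc`, `parse_proc_environ` and `process_has_legacy_crc_env` are hypothetical names, not swift's actual code. It assumes the variable has to be in the environment before any fragments are encoded. The `/proc/<pid>/environ` check is Linux only, needs permission to read the target process, and only reflects a daemon's startup environment (changes made inside the process after exec are not shown), so it confirms the manual route rather than swift's in-process one.

```python
import os
from typing import Dict, Mapping, MutableMapping

LEGACY_CRC_ENV = "LIBERASURECODE_WRITE_LEGACY_CRC"
_TRUE_VALUES = {"1", "true", "yes", "on", "t", "y"}


def apply_write_legacy_ec_crc(conf: Mapping[str, str],
                              environ: MutableMapping[str, str] = os.environ) -> bool:
    """Hypothetical: export the libec env var when the config flag is truthy.

    Assumption: this must run before any erasure-coded fragments are encoded.
    """
    enabled = str(conf.get("write_legacy_ec_crc", "")).strip().lower() in _TRUE_VALUES
    if enabled:
        environ[LEGACY_CRC_ENV] = "1"  # the value used in this conversation
    return enabled


def parse_proc_environ(raw: bytes) -> Dict[str, str]:
    """Split the NUL-separated KEY=VALUE records of /proc/<pid>/environ."""
    env: Dict[str, str] = {}
    for record in raw.split(b"\0"):
        key, sep, value = record.partition(b"=")
        if sep:  # skips the empty trailing record and malformed entries
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def process_has_legacy_crc_env(pid: int) -> bool:
    """Linux only: did this process start with LIBERASURECODE_WRITE_LEGACY_CRC=1?"""
    with open(f"/proc/{pid}/environ", "rb") as f:
        return parse_proc_environ(f.read()).get(LEGACY_CRC_ENV) == "1"
```

For example, after restarting the proxy-server and object-reconstructor daemons with the variable set, their PIDs can be passed to `process_has_legacy_crc_env` to confirm the setting actually reached them.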
[18:42] <reid_g> Ok. Thank you!

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!