Wednesday, 2021-06-02

*** seongsoocho has joined #openstack-swift00:29
seongsoochoHi! Back from a long vacation and just joined the new irc!00:29
mattoliverauseongsoocho: hey o/ welcome back!00:50
seongsoochoyay !00:50
*** aolivo1 has quit IRC01:23
*** timburke has quit IRC01:38
*** timburke has joined #openstack-swift01:54
zaitcevAren't we having some odd failures with py39?02:40
zaitcevhttps://zuul.opendev.org/t/openstack/build/109346d22ef04da99e0d22c4e10f1595/log/job-output.txt02:40
zaitcev- Swift realm="account-name%0A%0A%3Cb%3Efoo%3Cbr%3E"02:41
zaitcev+ Swift realm="account-name%3Cb%3Efoo%3Cbr%3E"02:41
zaitcevThe \n\n is added02:41
zaitcevAnd another test:02:41
zaitcev- Invalid path: on%20e02:41
zaitcev+ Invalid path: o%0An%20e02:41
timburkezaitcev, should be fixed now that https://review.opendev.org/c/openstack/swift/+/793495 merged02:42
zaitcevOne \n is added02:42
timburkei wouldn't mind having another set of eyes look at it before i backport it to wallaby as https://review.opendev.org/c/openstack/swift/+/793439, though02:42
zaitcevtimburke: I do not understand why you dropped the test. Sure, the Request.blank now always unquotes. But that is a priori knowledge, isn't it? And you want to make sure that the result is right... I don't see that test as value-less.02:49
zaitcevBut maybe I'm missing something here... There's no point in having tests that are always guaranteed to pass, we're just wasting CPU time.02:50
timburkezaitcev, part of why i made sure to add the assertion that the unquoted string wound up in PATH_INFO at https://review.opendev.org/c/openstack/swift/+/793495/1/test/unit/common/test_swob.py@79302:52
zaitcevI'd use something like  self.assertTrue('Swift realm="%s"' % quoted_hacker == resp.headers['Www-Authenticate'] or ('Swift realm="%s"' % quoted_hacker).strip('\n') == resp.headers['Www-Authenticate']) ... maybe.02:52
zaitcevOh, okay.02:53
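The quoting behaviour at issue is easy to reproduce with stock urllib.parse (an illustrative sketch of the %0A round trip, not the actual swob test):

```python
from urllib.parse import quote, unquote

# A name containing newlines and HTML, similar to the test fixture above
hacker = 'account-name\n\n<b>foo<br>'

quoted = quote(hacker, safe='')
# the newlines survive quoting as %0A%0A -- this matches the realm value
# in the failing-test diff above
assert quoted == 'account-name%0A%0A%3Cb%3Efoo%3Cbr%3E'

# a single unquote restores the original; mixing once-quoted and
# once-unquoted values is where %0A discrepancies like these come from
assert unquote(quoted) == hacker
```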
*** mattoliver has joined #openstack-swift03:17
opendevreviewPete Zaitcev proposed openstack/swift master: Band-aid and test the crash of the account server  https://review.opendev.org/c/openstack/swift/+/74379703:22
zaitcevSo I don't need to do anything, you're doing all my work for me.03:22
mattoliverFYI am testing out the matrix irc bridge, thus mattoliver and mattoliverau at the same time. I figure a move to a new IRC server is a good time to see how matrix is going.. seeing as it means a free bouncer, and if it goes well I can decommission my quassel core + gcloud host :)04:07
timburkehuh. https://github.com/openstack/liberasurecode/blob/master/include/xor_codes/xor_hd_code_defs.h#L63 seems like something must be wrong. shouldn't having two entries for 56 mean that two parities are identical?04:45
timburkewith similar issues for 7+6 (hd 4) and 8+6 (hd 4) :-(04:51
timburkemaybe that's OK for the xor codes? the assumption for the 6+6 (hd 4) is that you'll have *9* frags in hand when you're trying to rebuild, right? hmm...04:53
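For context: each entry in a flat XOR code's table is a bitmask selecting which data fragments get XORed into one parity, so a repeated entry would make two parity fragments byte-for-byte identical. A hypothetical sketch of the duplicate check (the mask values below are illustrative, not the real table from xor_hd_code_defs.h):

```python
from collections import Counter

# Illustrative parity bitmask table in the spirit of xor_hd_code_defs.h;
# the value 56 appearing twice mirrors the suspect entries above.
parity_masks = [7, 25, 42, 56, 56, 13]

# Any mask occurring more than once means two identical parities
duplicates = [mask for mask, count in Counter(parity_masks).items()
              if count > 1]
assert duplicates == [56]
```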
opendevreviewMatthew Oliver proposed openstack/swift master: sharder: Track scan progress to fix small tails  https://review.opendev.org/c/openstack/swift/+/79354305:25
opendevreviewTim Burke proposed openstack/liberasurecode master: Fix underflow in flat_xor_hd code  https://review.opendev.org/c/openstack/liberasurecode/+/79413706:09
*** timburke has quit IRC06:20
acolesseongsoocho: welcome back and welcome to oftc :)08:10
opendevreviewLin PeiWen proposed openstack/swift master: Delete unavailable py2 package  https://review.opendev.org/c/openstack/swift/+/79416708:56
opendevreviewMerged openstack/swift master: Switch IRC references from freenode to OFTC  https://review.opendev.org/c/openstack/swift/+/79398310:52
*** aolivo1 has joined #openstack-swift13:51
*** tdasilva_ has joined #openstack-swift14:09
*** tdasilva has quit IRC14:09
*** opendevreview has quit IRC14:38
*** timburke has joined #openstack-swift15:32
*** timburke has quit IRC16:32
*** thiago__ has joined #openstack-swift16:51
*** tdasilva_ has quit IRC16:57
*** erbarr has joined #openstack-swift17:11
*** edausq has joined #openstack-swift17:21
*** timburke has joined #openstack-swift17:24
*** erlon has joined #openstack-swift17:44
*** opendevreview has joined #openstack-swift20:09
opendevreviewMerged openstack/swift master: relinker: Remove replication locks for empty parts  https://review.opendev.org/c/openstack/swift/+/79030520:09
timburkealmost meeting time! reminder that we're going to try doing it *here* 🤞20:54
*** kota_ has joined #openstack-swift20:58
kota_morning20:58
timburke#startmeeting swift21:00
opendevmeetMeeting started Wed Jun  2 21:00:08 2021 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.21:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.21:00
opendevmeetThe meeting name has been set to 'swift'21:00
timburkewho's here for the swift meeting?21:00
*** thiago__ is now known as tdasilva21:00
kota_hi21:00
acoleso/21:01
mattolivero/21:02
timburkei'm glad to see most everybody's migrated over to OFTC, and that the meeting bot's working well for us here :-)21:03
timburkeas usual, the agenda's at https://wiki.openstack.org/wiki/Swift21:03
timburkeer, not that. https://wiki.openstack.org/wiki/Meetings/Swift21:03
timburkethat's the one21:03
timburkefirst up21:03
timburke#topic testing on ARM21:04
timburkei wanted to see what opinions we might have about ARM jobs now that we've (1) got more jobs proposed (thanks mattoliver!) and (2) we've had a bit more time to think about it21:04
timburkethe good news, by the way, is that everything seems to Just Work -- libec, pyeclib, swift all have passing ARM jobs proposed21:05
timburkethey're taking a bit longer than the other jobs (~2x or so?) but at least for swift, they aren't the limiting factor21:06
mattoliveryeah, and I added func, func encryption and a probe. So pretty good coverage I think21:06
timburkei've got two main questions, and i'm not sure whether they're connected or not21:07
mattoliver#link https://review.opendev.org/c/openstack/swift/+/79328021:07
mattoliver#link https://review.opendev.org/c/openstack/pyeclib/+/79328121:07
timburke#link https://review.opendev.org/c/openstack/swift/+/79286721:08
timburke#link https://review.opendev.org/c/openstack/liberasurecode/+/79351121:08
timburkefirst, should we have them in the main check queue or a separate check-arm64 queue? ricolin proposed it as a separate queue, but trying it out on the libec patch, a single queue seems to work fine21:09
timburkesecond, should they be voting or not? they all seem to pass and if i saw one fail (*especially* if it was on a patch touching ctypes or something) i'd be inclined to figure out the failure before approving, personally21:10
mattoliverwell now that we know they seem to pass, I'm happy to have them voting, we can always turn them off again.21:11
acoles+121:11
mattoliverthe extra check pipeline, I'm not sure.. I thought I'd read somewhere it might have something to do with managing the arm64 resources.. but can't seem to figure out where I read it.. so might have been dreaming :P21:12
timburkei seem to remember seeing something about that, too -- an ML thread, maybe?21:14
timburkethat also brings me to why i'm not sure whether the questions are connected or not: with two queues, we get two zuul responses -- if the arm jobs are voting, can the second response change the vote from the first? i can ask in -infra, i suppose...21:16
zaitcevIf the CI machine set for ARM is reliable enough, then I think we want them voting. We don't want to get stuck just because something keeps crashing. That balances against the upside of guarding against a breakage that is specific to ARM.21:17
timburkei'm inclined to merge them non-voting to start, then revisit later (maybe a topic for the next PTG?)21:18
mattoliversure, sounds reasonable. point is we get to test on arm, which is pretty cool.21:19
timburkefor sure!21:20
timburke#topic train-em21:20
timburkeso at the end of this week, openstack as a whole is moving train to extended maintenance. i'm going to work on getting a release tagged before then. just a heads-up21:21
zaitcevSo... What is here to discuss?21:21
timburkethat was all :-)21:21
timburkeon to updates!21:21
timburke#topic sharding and shrinking21:22
timburkehow's it going?21:22
timburkewe merged https://review.opendev.org/c/openstack/swift/+/792182 - Add absolute values for shard shrinking config options21:23
acoleswe noticed some intermittent gappy listings from sharded containers last week, turned out we had some shard range data stuck in memcache21:23
acolesthe root problem is memcache  related, but it caused us to realise that perhaps we should not be so tolerant of bad listing responses from shard containers21:24
timburkeleading to https://review.opendev.org/c/openstack/swift/+/793492 - Return 503 for container listings when shards are deleted21:25
acolesso https://review.opendev.org/c/openstack/swift/+/793492 proposes to 503 if a shard listing does not succeed21:25
acolesIIRC we originally thought a gappy listing was equivalent to eventual consistency, but with hindsight they are more like 'something isn't working'21:26
mattoliverAnd acoles has a patch for invalidating the shard listing cache which will hopefully make things much better21:27
acolesmattoliver: actually I abandoned that :)21:27
mattoliveroh, then I take that back.. he hasn't got one :P21:27
acolesI decided that if the cause of the bad response was backend server workload then flushing the cache could just escalate the problem21:28
acolesso not worth the risk21:28
mattoliveroh fair enough, it was hard enough to find as it was21:28
acolesgiven that memcache should expire entries, we just had an anomaly21:28
timburkei think we need some more investigation into why the entry didn't expire properly, anyway21:29
acolesI prefer the idea of including expiry time with the cached data, but I expect that's a bigger piece of work21:29
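A minimal sketch of that idea, assuming a generic memcached-style client with get/set (the payload layout and function names are assumptions for illustration, not Swift's actual cache format):

```python
import json
import time

# Store the intended expiry alongside the cached shard ranges, so a reader
# can detect an entry that memcache kept past its TTL and treat it as a
# miss instead of serving stale data.
def cache_shard_ranges(client, key, shard_ranges, ttl=600):
    payload = json.dumps({'expires': time.time() + ttl,
                          'shard_ranges': shard_ranges})
    client.set(key, payload, time=ttl)

def get_shard_ranges(client, key):
    raw = client.get(key)
    if raw is None:
        return None
    data = json.loads(raw)
    if time.time() > data['expires']:
        return None  # stale-but-present entry: act as if it expired
    return data['shard_ranges']
```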
acolesanyway, that was the background to https://review.opendev.org/c/openstack/swift/+/79349221:30
timburkeonce landed, do we think it's the sort of thing we ought to backport?21:30
mattoliverI've learnt a lot about memcache (and mcrouter, what we use at NVIDIA). memcache should be able to supply the TTL with a 'me <key>' or something like that. I'll investigate that21:31
timburke(in light of all the sharding backports zaitcev has already done)21:31
acolestimburke: maybe. shall we see how it goes in production first (just in case we uncover a can of worms)21:32
acolesalthough, we've no reason to expect a an of worms :)21:32
mattoliver We're slowly making progress on small shard tails. Have a new simpler approach where we actually track the scan progress of the scanner to make it more reliable, and from that can make smarter decisions. And not delve into efficient db queries or adding rows_per_shard = auto21:32
acoless/an/can/21:33
mattoliver#link https://review.opendev.org/c/openstack/swift/+/79354321:33
timburkenice21:33
acolesmattoliver: I like the context idea in https://review.opendev.org/c/openstack/swift/+/79354321:34
mattoliverI see acoles reviewed it, thanks! Will look at that again today.. yeah storing the upper might actually simplify the method.. that and/or the index.21:34
acolesI was just a bit unsure about where we do the 'tiny-shard-squashing'21:34
timburkeanything else we ought to bring up for sharding? i'll be sure to add those two patches to the priority reviews page21:36
acolesI also wondered if having per-db-replica context for scanning might help avoid split brain scanning??? but that's *another topic*21:36
mattoliverit's the progress + shard_size + minimum > object_count line. Because that returns the end upper. but maybe I misunderstand.21:36
mattoliveryeah! interesting, maybe it could.. but yeah, need to think about it more before we discuss that :P21:36
timburkeall right, i'll assume those are the two main tracks right now :-)21:38
timburke#topic dark data watcher21:38
timburkezaitcev, i saw some more updates on https://review.opendev.org/c/openstack/swift/+/788398 -- how's it going?21:38
zaitcevtimburke: I'm addressing comments by acoles21:39
zaitcevGive me a day or two21:39
timburke👍21:40
zaitcevCould we get this landed instead? https://review.opendev.org/c/openstack/swift/+/79271321:40
zaitcevI mean in the meanwhile21:40
zaitcevNot instead.21:40
timburkei'll take a look, see about writing a test for it to demonstrate the difference21:41
zaitcevAlthough ironically enough I was going to slip it through with no change in testing coverage.21:41
timburke:P21:41
timburke#topic open discussion21:41
timburkeanything else we ought to bring up this week?21:41
zaitcevI was just about to type that the other change has better tests. However, it only emulates listings that miss the objects, not errors.21:42
acoleswe successfully quarantined a large number of isolated durable EC fragments in the last week using https://review.opendev.org/c/openstack/swift/+/78883321:44
timburkeour log-ingest pipeline seems much happier for it :-)21:44
acolesand as a consequence eliminated a large number of error log messages :)21:44
zaitcevNote that it's not the dark data plugin but the built-in replicator code that does that.21:44
timburkeoh -- i noticed that unlike with the object-updater and container-updater (which can use the request path), the container-sharder doesn't give any indication what shard an update came from in container server logs -- so i proposed https://review.opendev.org/c/openstack/swift/+/793485 to stick the shard account/container in Referer21:45
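A toy sketch of that approach (the helper name and shard path below are illustrative, not the patch's actual code):

```python
# Tag backend container update requests with the sending shard's path in
# the Referer header, so the receiving container server's logs show which
# shard an update came from.
def shard_update_headers(shard_account, shard_container):
    return {'Referer': '/%s/%s' % (shard_account, shard_container)}

# e.g. an update from a shard of AUTH_test/c
headers = shard_update_headers('.shards_AUTH_test', 'c-range-0')
```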
zaitcevWhy are you guys quarantine them instead of deleting?21:45
zaitcevs/ are / do /21:45
mattolivertimburke: nice21:46
zaitcevIs there any doubt about the decision-making in that code? Looked pretty watertight to me. Just a general caution?21:46
acolestimburke: I'll review that again21:46
timburkethanks21:47
acoleszaitcev: yes, caution21:47
mattoliverjust seemed better to quarantine than to just delete21:47
acolesI'm averse to deleting things21:47
zaitcevThis contrasts with Alistair wanting to run object watcher with action=delete, which clearly has more avenues to fail and start deleting everything.21:47
timburkezaitcev, yeah, general caution. our ops team will still need some tooling to wade through quarantines, though :-(21:47
acoleszaitcev: I don't want to run dark data watcher ! I'm worried for anyone that does (before these fixes get merged)21:48
mattoliverI have been playing with some potential reconstructor improvements, more interesting chain ends: https://review.opendev.org/c/openstack/swift/+/793888 which if it finds it on the last known primary will leave it for the handoff to push.. kinda a built in handoffs_first if we're talking post rebalance.21:49
mattoliverthe last patch (that's linked) in the chain is skipping a partition if a bunch have already been found on said partition. In some basic testing in my SAIO it sped up the post-rebalance reconstructor cycle quite a bit.21:51
mattoliverbut just playing around, scratching an itch.21:51
timburkevery cool -- it'd be interesting to play with that in a lab environment (and for that matter, to have some notion of "rebalance scenarios" for labs...)21:52
timburkeall right, i think we're about done then21:53
timburkethank you all for coming, and thank you for working on swift!21:54
timburke#endmeeting21:54
opendevmeetMeeting ended Wed Jun  2 21:54:07 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)21:54
opendevmeetMinutes:        http://eavesdrop.openstack.org/meetings/swift/2021/swift.2021-06-02-21.00.html21:54
opendevmeetMinutes (text): http://eavesdrop.openstack.org/meetings/swift/2021/swift.2021-06-02-21.00.txt21:54
opendevmeetLog:            http://eavesdrop.openstack.org/meetings/swift/2021/swift.2021-06-02-21.00.log.html21:54
*** kota_ has quit IRC21:55
*** kota_ has joined #openstack-swift21:56
*** kota_ has quit IRC22:04
*** kota_ has joined #openstack-swift22:43
*** kota_ has quit IRC22:51

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!