Monday, 2020-10-05

*** psachin has joined #openstack-swift 03:39
*** evrardjp has quit IRC 04:33
*** evrardjp has joined #openstack-swift 04:33
*** m75abrams has joined #openstack-swift 05:06
*** tkajinam has quit IRC 06:13
*** tkajinam has joined #openstack-swift 06:13
*** manuvakery has joined #openstack-swift 06:45
*** rpittau|afk is now known as rpittau 07:29
*** mikecmpbll has joined #openstack-swift 07:49
*** mikecmpbll has quit IRC 09:50
*** mikecmpbll has joined #openstack-swift 09:53
*** dsariel has joined #openstack-swift 10:24
*** tkajinam has quit IRC 13:49
*** tkajinam has joined #openstack-swift 13:49
*** tonyb has quit IRC 14:50
*** m75abrams has quit IRC 15:06
*** gyee has joined #openstack-swift 15:34
*** rpittau is now known as rpittau|afk 16:06
<timburke> good morning 16:16
<zaitcev> Good Morning! 16:46
<clayg> timburke: so trying to test p 735271 - I don't know that "timing since" is really the right way to emit these stats 16:47
<patchbot> https://review.opendev.org/#/c/735271/ - swift - metrics: Add lag metric to expirer - 1 patch set 16:48
<clayg> like only get metrics for count, mean and upper - do we only care about upper? 16:48
<clayg> there's an interesting line for error-404.lag.upper until it hits a reclaim age - most everything else is just spotty. 16:49
<clayg> the real values are NUTS too - like i might have a 30s lag on a successful - but I approach a reclaim age on 404's 🤷‍♂️ 16:50
<clayg> Maybe I'm still stuck thinking about the expirer queue in terms of "how big is the volume of work that's past its deadline" instead of "how late was I on this particular item" 🤔 16:50
<timburke> clayg, i could see some value in count -- if you've got something like your script to do some GETs/HEADs in the expiring-object account and see that you've got a bit of a backlog, the counts can give you an idea of how long it'll take you to get through them all 16:51
<clayg> successful.lag.count then is a proxy for processing rate - and you can just sum that up across nodes? 16:52
<clayg> is that significantly better than extrapolating from the graph of "how fast is the backlog going down" - I mean i guess so, it's more real time (no container-updates needed) if you're adjusting concurrency tuning etc that's useful 16:53
<timburke> don't see why not -- though we've already got an "objects" counter that'd do the same thing... 16:54
<timburke> the tendency of upper to approach reclaim age makes it seem like mean (or maybe 90th percentile?) would be useful, too -- sure, i've got some stale queue entries that are going to need to get reaped after a reclaim age -- but how are the *rest* of the entries doing? 16:55
<openstackgerrit> Tim Burke proposed openstack/swift master: squash: More logging refactoring; stop needing an extra catch_errors https://review.opendev.org/755906 16:56
<clayg> Have you played with these metrics in a dev environment?  do you have graphs you LIKE?  I'm probably failing to conceptualize them... 16:56
<clayg> I have a value on a graph of "542K" for object-expirer.successful.lag.upper - it's part of a series that went up then down then back up... what... does... that... mean? 16:57
<clayg> did I really have my expirer slowed down so much it successfully deleted an expired object six days late?  I only had it running over the weekend!  🤔 16:58
<timburke> i've not actually seen it graphed yet, no. it was largely speculative that it'd be useful, and drawn somewhat by analogy from p 715580 -- way i see it, we've got an item of work, it became valid to perform it at such-and-such time, it's worth tracking how long it takes for us to actually get it done 17:08
<patchbot> https://review.opendev.org/#/c/715580/ - swift - obj-updater: add metric on lag of containers listing - 1 patch set 17:08
<timburke> there will be times when a whole bunch of stuff is all available to be done at once, and it's going to lead to spikes on the graph -- if it stays under reclaim age, i know i'm still good 17:10
<timburke> if upper *and* median keep trending toward reclaim age, i probably need to think about upping my reclaim age; otherwise there's gonna be some data that never gets cleaned up properly 17:12
<timburke> if upper trends toward reclaim age but median seems "healthy", i guess i won't worry too much -- there's some stale work that's taking a while to process and i should maybe look at the health of my object-updaters? 17:13
<clayg> but, like if we have multiple nodes you can't just "sum" the upper - so you take an "average" - but then what does that really mean... I mean... I don't know what it means already maybe a mean of the mean would be mean 🤮 17:34
<timburke> no -- you can look at the upper of the upper, and you might even be able to make sense of the upper of the mean, but my gut says trying to average across nodes is unlikely to give you anything useful 18:11
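
[Editor's note] The rule of thumb timburke describes (compare both the upper and the median lag against reclaim age) could be sketched roughly as below. This is only an illustration: `assess_lag`, its return labels and the `warn_fraction` threshold are hypothetical names invented here, not part of Swift or of patch 735271. The 604800 second value is Swift's default `reclaim_age` of one week.

```python
RECLAIM_AGE = 604800  # seconds; Swift's default reclaim_age (one week)


def assess_lag(upper, median, reclaim_age=RECLAIM_AGE, warn_fraction=0.8):
    """Classify expirer lag (hypothetical helper, not a Swift API).

    warn_fraction is an assumed cut-off for "trending toward reclaim age".
    """
    threshold = warn_fraction * reclaim_age
    if upper < threshold:
        # Even the slowest item was handled well inside reclaim age.
        return "ok"
    if median >= threshold:
        # Typical items are late too: consider raising reclaim age.
        return "backlog"
    # Only the tail is late: some stale entries, the rest look healthy.
    return "stale-tail"
```

With clayg's numbers from above (an upper of about 542K seconds alongside a roughly 30 second lag on typical successes), `assess_lag(542000, 30)` falls into the "stale-tail" case.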
*** psachin has quit IRC 18:16
*** manuvakery has quit IRC 18:45
<openstackgerrit> Tim Burke proposed openstack/swift master: squash: More logging refactoring; stop needing an extra catch_errors https://review.opendev.org/755906 18:46
<clayg> oh right, max of the upper across nodes 🤔 18:53
<clayg> I might have to try again to set up an experiment, right now i've just got https://gist.github.com/clayg/e6ddc8396b72683d162ce56c52b5b390 and i made some to expire @ 30, 300, 3000, 30000, etc. 18:55
*** viks____ has quit IRC 19:02
*** edausq has quit IRC 20:25
*** tonyb has joined #openstack-swift 21:22
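
[Editor's note] A minimal sketch of the cross-node aggregation discussed above: counts are additive, while the upper values are combined with a max rather than an average. The function name `aggregate_lag` and the shape of the input dict are assumptions made for illustration; real data would come from whatever statsd backend collects the `lag.count`, `lag.mean` and `lag.upper` series.

```python
def aggregate_lag(per_node):
    """Combine per-node lag stats (hypothetical helper).

    per_node maps a node name to a dict with 'count', 'mean' and 'upper'.
    """
    stats = list(per_node.values())
    return {
        # Counts add up, so the sum is a cluster-wide processing rate proxy.
        "count": sum(s["count"] for s in stats),
        # "Upper of the upper": the single worst lag seen on any node.
        "upper": max(s["upper"] for s in stats),
        # "Upper of the mean": the node whose typical lag is worst.
        "worst_mean": max(s["mean"] for s in stats),
    }


nodes = {
    "node1": {"count": 120, "mean": 35.0, "upper": 300.0},
    "node2": {"count": 80, "mean": 50.0, "upper": 542000.0},
}
summary = aggregate_lag(nodes)
```

Averaging the means across nodes is deliberately left out, following timburke's hunch that it is unlikely to yield anything useful.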
<openstackgerrit> Pete Zaitcev proposed openstack/swift master: DNM: follow-up for Dark Data #35 https://review.opendev.org/756163 21:48
<zaitcev> What a can of worms 21:48
<zaitcev> timburke: I wanted to consult with you about this real quick though. Suppose we have plugins X and Y. X runs first and quarantines the object. Should Y get called to the already-quarantined object? 21:56
<zaitcev> timburke: There's also a Solomon solution: prohibit quarantining by plugins and just not do it. 21:57
<timburke> my gut says no -- but i also don't want us stat'ing the file to see whether it still exists between every plugin invocation... 21:58
<timburke> maybe we could make it part of the api that plugins aren't allowed to quarantine themselves, but they can instead signal that the auditor should do it? 21:59
<timburke> or maybe, *if* a plugin quarantines a file itself, it has to let a DiskFileQuarantined bubble out... 22:00
<zaitcev> https://review.opendev.org/#/c/756163/1/swift/obj/audit_dark_data.py does just that 22:01
<patchbot> patch 756163 - swift - DNM: follow-up for Dark Data #35 - 1 patch set 22:01
<zaitcev> However, that exception causes the execution of chain of plugins to stop. 22:01
<zaitcev> As you see, David omitted the raise, so other plugins were called anyway. I thought it was asking for trouble. 22:02
<timburke> aborting the chain seems reasonable to me -- it's not in the objects tree, so no more watching 22:04
<zaitcev> Thanks a lot 22:05
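
[Editor's note] The behaviour agreed on here (a plugin that quarantines lets `DiskFileQuarantined` bubble out, and the auditor stops calling the remaining plugins) might look roughly like the sketch below. It is not the code from patch 756163: `run_watchers`, `watcher_x` and `watcher_y` are hypothetical, and the exception class is a local stand-in for `swift.common.exceptions.DiskFileQuarantined` so that the example runs on its own.

```python
class DiskFileQuarantined(Exception):
    """Local stand-in for swift.common.exceptions.DiskFileQuarantined."""


def run_watchers(watchers, obj_path):
    """Call each (name, callable) watcher in order (hypothetical helper).

    Returns the names of the watchers that were invoked. The chain stops
    at the first watcher that raises DiskFileQuarantined, because the
    object is no longer in the objects tree.
    """
    invoked = []
    for name, watcher in watchers:
        invoked.append(name)
        try:
            watcher(obj_path)
        except DiskFileQuarantined:
            break  # abort the chain: nothing left to watch
    return invoked


def watcher_x(obj_path):
    # Pretend this plugin quarantined the object itself.
    raise DiskFileQuarantined(obj_path)


def watcher_y(obj_path):
    # A plugin that only inspects the object.
    pass
```

With X ahead of Y, `run_watchers([("X", watcher_x), ("Y", watcher_y)], path)` invokes only X, which avoids a stat between every plugin call.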
*** rcernin has joined #openstack-swift 22:19
<openstackgerrit> Tim Burke proposed openstack/swift stable/ussuri: Authors/ChangeLog for 2.25.1 https://review.opendev.org/756166 22:21
*** samueldmq has quit IRC 22:22
*** samueldmq has joined #openstack-swift 22:26
<mattoliverau> morning 22:30
<openstackgerrit> Tim Burke proposed openstack/swift stable/train: ChangeLog for 2.23.2 https://review.opendev.org/756167 22:47
*** mikecmpbll has quit IRC 23:12
*** mikecmpbll has joined #openstack-swift 23:15
*** dsariel has quit IRC 23:53
