21:00:50 <timburke> #startmeeting swift
21:00:50 <opendevmeet> Meeting started Wed Mar  1 21:00:50 2023 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:50 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:50 <opendevmeet> The meeting name has been set to 'swift'
21:01:00 <timburke> who's here for the swift meeting?
21:01:08 <zaitcev> o/
21:01:12 <indianwhocodes> o/
21:01:45 <mattoliver> i'm kinda here; i have the day off today, so that means I'm getting the kids ready for school (however that works) :P
21:02:51 <timburke> i didn't get around to updating the agenda, but i think it's mostly going to be a couple updates from last week, maybe one interesting new thing i'm working on
21:03:20 <timburke> #topic ssync, data with offsets, and meta
21:03:34 <acoles> o/
21:03:51 <timburke> clayg's probe test got squashed into acoles's fix
21:03:59 <timburke> #link https://review.opendev.org/c/openstack/swift/+/874122
21:04:41 <timburke> we're upgrading our cluster now to include that fix; we should be sure to include feedback about how that went on the review
21:05:37 <timburke> being able to deal with metas with timestamps is still a separate review, but acoles seems to like the direction
21:05:40 <timburke> #link https://review.opendev.org/c/openstack/swift/+/874184
21:06:24 <acoles> timburke persuaded me that we should fix a future bug while we had this all in our heads
21:06:29 <timburke> the timestamp-offset delimiter business still seems a little strange, but i didn't immediately see a better way to deal with it
21:07:56 <timburke> #topic http keepalive timeout
21:08:28 <timburke> so my eventlet patch merged! gotta admit, seemed easier to get merged than expected :-)
21:08:30 <timburke> #link https://github.com/eventlet/eventlet/pull/788
21:09:24 <timburke> which means i ought to revisit the swift patch to add config plumbing
21:09:28 <timburke> #link https://review.opendev.org/c/openstack/swift/+/873744
21:10:28 <timburke> are we all ok with turning it into a pure-plumbing patch, provided i make it clear in the sample config that the new option kinda requires new eventlet?
21:12:03 <acoles> what happens if the option is set without new eventlet?
21:13:12 <timburke> largely, existing behavior: keepalive stays turned on, governed by the general socket timeout (i.e., client_timeout)
21:13:41 <timburke> it would also give the option of setting keepalive_timeout to 0 to turn off keepalive behavior
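Just for illustration, the plumbing might end up looking something like this in proxy-server.conf; the option name comes from the discussion above, but the section, default value, and comments here are assumptions, not the patch:

```ini
[DEFAULT]
# ... existing settings ...
client_timeout = 60

# Hypothetical placement: idle timeout for keepalive connections.
# With an eventlet that includes PR 788, an idle keepalive connection
# is dropped after this many seconds; with older eventlet, keepalive
# stays on and idle connections are governed by client_timeout instead.
# Setting this to 0 turns keepalive off entirely.
keepalive_timeout = 5
```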
21:13:50 <mattoliver> Yup, do it
21:14:36 <acoles> ok
21:15:32 <timburke> all right then
21:15:34 <timburke> #topic per-policy quotas
21:15:45 <timburke> thanks for the reviews, mattoliver!
21:16:11 <timburke> test refactor is now landed, and there's a +2 on the code refactor
21:16:18 <timburke> #link https://review.opendev.org/c/openstack/swift/+/861487
21:16:27 <timburke> any reason not to just merge it?
21:17:44 <timburke> i suppose mattoliver's busy ;-) i can poke him more later
21:18:09 <timburke> the actual feature patch needs some docs -- i'll try to get that up this week
21:18:12 <timburke> #link https://review.opendev.org/c/openstack/swift/+/861282
21:19:22 <timburke> other interesting thing i've been working on (and i should be sure to add it to the PTG etherpad)
21:19:24 <acoles> I just glanced (not reviewed) and the refactor looks nicer than the original
21:20:16 <timburke> thanks -- there were a couple sneaky spots, but the existing tests certainly helped
21:20:24 <timburke> #topic statsd labeling extensions
21:21:18 <mattoliver> Yeah it can probably just land
21:21:20 <timburke> when swift came out, statsd was the basis for a pretty solid monitoring stack
21:22:03 <timburke> these days, though, people generally seem to be coalescing around prometheus, or at least its data model
21:23:23 <timburke> we at nvidia, for example, are running https://github.com/prometheus/statsd_exporter on every node to turn swift's stats into something that can be periodically scraped
21:24:29 <mattoliver> I've been playing with otel metrics, put it as a topic on the ptg etherpad. Got a basic client to test some infrastructure here at work. Maybe I could at least write up some doc on how that works for extra discussions at the ptg?
21:25:00 <mattoliver> By that i mean how open telemetry works
21:25:08 <timburke> that'd be great, thanks!
21:26:33 <timburke> as it works for us today, there's a bunch of parsing that's required -- a stat like `proxy-server.object.HEAD.200.timing:56.9911003112793|ms` doesn't have all the context we really want in a prometheus metric (like, 200 is the status, HEAD is the request method, etc.)
21:27:55 <timburke> which means that whenever we add a new metric, there's a handoff between dev and ops about what the new metric is, then ops need to go update some yaml file so the new metric gets parsed properly, and *then* they can start using it in new dashboards
21:28:12 <timburke> which all seems like some unnecessary friction
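To make that friction concrete: the yaml file in question is a statsd_exporter mapping config, and an entry for a stat like the one above would look roughly like this (the prometheus metric name is made up):

```yaml
mappings:
  - match: "proxy-server.*.*.*.timing"
    name: "swift_proxy_request_timing"  # hypothetical name
    labels:
      layer: "$1"    # e.g. object
      method: "$2"   # e.g. HEAD
      status: "$3"   # e.g. 200
```

Every new stat shape needs a new entry like this before the metric shows up usefully in prometheus.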
21:29:33 <timburke> fortunately, there are already some extensions to add the missing labels for components, and the statsd_exporter even already knows how to eat several of them: https://github.com/prometheus/statsd_exporter#tagging-extensions
21:30:08 <timburke> so i'm currently playing around with emitting metrics like `proxy-server.timing,layer=account,method=HEAD,status=204:41.67628288269043|ms`
21:30:22 <timburke> or `proxy-server.timing:34.14654731750488|ms|#layer:account,method:HEAD,status:204`
21:30:35 <timburke> or `proxy-server.timing#layer=account,method=HEAD,status=204:5.418539047241211|ms`
21:30:44 <timburke> or `proxy-server.timing;layer=account;method=HEAD;status=204:34.639835357666016|ms`
21:31:33 <timburke> (really, "proxy-server" should probably get labeled as something like "service"...)
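As a rough sketch of what emitting the second variant above might involve, assuming nothing about the eventual patch (the function name, host, and port are all illustrative):

```python
import socket

def emit_labeled_timing(name, value_ms, labels,
                        host="127.0.0.1", port=8125):
    """Send a dogstatsd-style timing metric with labels over UDP."""
    tags = ",".join(f"{k}:{v}" for k, v in labels.items())
    payload = f"{name}:{value_ms}|ms|#{tags}"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("utf-8"), (host, port))

# produces: proxy-server.timing:34.147|ms|#layer:account,method:HEAD,status:204
emit_labeled_timing("proxy-server.timing", 34.147,
                    {"layer": "account", "method": "HEAD", "status": 204})
```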
21:31:58 <timburke> my hope is to have a patch up ahead of the PTG, so... look forward to that!
21:32:05 <acoles> nice!
21:32:37 <acoles> "layer" is a new term to me?
21:32:56 <timburke> idk, feel free to offer alternative suggestions :-)
21:33:10 <acoles> vs tier or resource (I guess tier isn't clear)
21:33:22 <acoles> haha it took us < 1second to get into a naming debate :D
21:33:40 <acoles> let's save that for the PTG
21:34:53 <mattoliver> Oh cool, I look forward to seeing it!
21:34:54 <timburke> if it doesn't mesh well with an operator's existing metrics stack, (1) it's opt-in and they can definitely still do the old-school vanilla statsd metrics, and (2) most collection endpoints (i believe) offer some translation mechanism
21:34:55 <acoles> I'm hoping we might eventually converge this "structured" stats with structured logging
21:35:14 <mattoliver> +1
21:35:31 <timburke> yes! there's a lot of context that seems like it'd be smart to share between stats and logging
21:35:40 <acoles> e.g. build a "context" data structure and squirt it at a logger and/or a stats client and you're done
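A toy sketch of that idea (every name here is hypothetical): one context object carries the labels, and both the logger and a labeled-stats emitter consume them:

```python
import logging

logging.basicConfig(level=logging.INFO)

class Context:
    """Hypothetical shared context for logs and stats."""
    def __init__(self, **labels):
        self.labels = labels

    def _flat(self, sep, joiner):
        return joiner.join(f"{k}{sep}{v}" for k, v in self.labels.items())

    def log(self, logger, msg):
        # structured-ish logging: append the same labels to the message
        logger.info("%s, %s", msg, self._flat("=", ", "))

    def metric(self, name):
        # labeled statsd: the same labels ride along with the metric
        return f"{name},{self._flat('=', ',')}"

ctx = Context(layer="account", method="HEAD", status=204)
ctx.log(logging.getLogger("proxy-server"), "request handled")
print(ctx.metric("proxy-server.timing"))
# proxy-server.timing,layer=account,method=HEAD,status=204
```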
21:36:15 <timburke> that's all i've got
21:36:19 <timburke> #topic open discussion
21:36:25 <timburke> what else should we bring up this week?
21:36:40 <acoles> on that theme, I wanted to draw attention to a change i have proposed to sharder logging
21:37:18 <timburke> #link https://review.opendev.org/c/openstack/swift/+/875220
21:37:21 <timburke> #link https://review.opendev.org/c/openstack/swift/+/875221
21:37:27 <acoles> 2 patches currently: https://review.opendev.org/c/openstack/swift/+/875220 and https://review.opendev.org/c/openstack/swift/+/875221
21:37:35 <acoles> timburke is so quick!
21:38:13 <mattoliver> Oh yeah, I've been meaning to get to that.. but off for the rest of the week, so won't happen now until next week.
21:38:19 <acoles> I recently had to debug a sharder issue and found the inconsistent log formats very frustrating
21:38:58 <acoles> e.g. sometimes we include the DB path, sometimes the resource path, sometimes both... but worst of all, sometimes neither
21:39:59 <acoles> So the patches ensure that every log message associated with a container DB (which is almost all of them) will consistently get both the db file path and the resource path (i.e. 'a/c') appended to the message
21:40:31 <acoles> I wanted to flag it up because that includes WARNING and ERROR level messages, which I'm aware some ops may parse for alerts
21:41:07 <acoles> so this change may break some parsing, but on the whole I believe we'll be better for having consistency
21:41:11 <mattoliver> Sounds good, and as we eventually worker up the sharder it gets all the more important.
21:42:07 <acoles> IDK if we have precedent for flagging up such a change, or if I am worrying too much (I tend to!)
21:43:26 <mattoliver> You're making debugging via log messages easier... and that's a win in my book
21:43:43 <timburke> there's some precedent (e.g., https://review.opendev.org/c/openstack/swift/+/863446) but in general i'm not worried
21:44:19 <acoles> ok so I could add an UpgradeImpact to the commit message
21:45:02 <timburke> if we got to the point of actually emitting structured logs, and then *took that away*, i'd worry. but this, *shrug*
21:46:01 <timburke> fwiw, i did *not* call it out in the changelog
21:46:14 <acoles> well if there's no concerns re. the warnings then I will squash the two patches
21:47:11 <acoles> and then I can look forward to the next sharder debugging session 😜
21:47:21 <timburke> sounds good
21:49:06 <timburke> all right, i think i'll call it
21:49:17 <timburke> thank you all for coming, and thank you for working on swift!
21:49:23 <timburke> #endmeeting