21:00:50 #startmeeting swift
21:00:50 Meeting started Wed Mar 1 21:00:50 2023 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:50 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:50 The meeting name has been set to 'swift'
21:01:00 who's here for the swift meeting?
21:01:08 o/
21:01:12 o/
21:01:45 i'm kinda here, have the day off today so that means I'm getting kids ready for school (however that works) :P
21:02:51 i didn't get around to updating the agenda, but i think it's mostly going to be a couple updates from last week, maybe one interesting new thing i'm working on
21:03:20 #topic ssync, data with offsets, and meta
21:03:34 o/
21:03:51 clayg's probe test got squashed into acoles's fix
21:03:59 #link https://review.opendev.org/c/openstack/swift/+/874122
21:04:41 we're upgrading our cluster now to include that fix; we should be sure to include feedback about how that went on the review
21:05:37 being able to deal with metas with timestamps is still a separate review, but acoles seems to like the direction
21:05:40 #link https://review.opendev.org/c/openstack/swift/+/874184
21:06:24 timburke: persuaded me that we should fix a future bug while we had this all in our heads
21:06:29 the timestamp-offset delimiter business still seems a little strange, but i didn't immediately see a better way to deal with it
21:07:56 #topic http keepalive timeout
21:08:28 so my eventlet patch merged! gotta admit, seemed easier to get merged than expected :-)
21:08:30 #link https://github.com/eventlet/eventlet/pull/788
21:09:24 which means i ought to revisit the swift patch to add config plumbing
21:09:28 #link https://review.opendev.org/c/openstack/swift/+/873744
21:10:28 are we all ok with turning it into a pure-plumbing patch, provided i make it clear in the sample config that the new option kinda requires new eventlet?
21:12:03 what happens if the option is set without new eventlet?
21:13:12 largely, existing behavior: keepalive is turned on, and with the general socket timeout (i.e., client_timeout)
21:13:41 it would also give the option of setting keepalive_timeout to 0 to turn off keepalive behavior
21:13:50 Yup, do it
21:14:36 ok
21:15:32 all right then
21:15:34 #topic per-policy quotas
21:15:45 thanks for the reviews, mattoliver!
21:16:11 test refactor is now landed, and there's a +2 on the code refactor
21:16:18 #link https://review.opendev.org/c/openstack/swift/+/861487
21:16:27 any reason not to just merge it?
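
A minimal sketch of how the plumbing discussed under "http keepalive timeout" might look in the sample proxy-server.conf; the option name and semantics are taken from the discussion above, but the section placement and the numeric values shown are illustrative assumptions, not actual defaults:

    [DEFAULT]
    # existing general socket timeout; with older eventlet, keepalive
    # connections keep falling back to this even if keepalive_timeout is set
    client_timeout = 60
    # new option; only honored with an eventlet that includes
    # https://github.com/eventlet/eventlet/pull/788
    # set to 0 to turn off keepalive behavior entirely
    keepalive_timeout = 10
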
21:17:44 i suppose mattoliver's busy ;-) i can poke him more later
21:18:09 the actual feature patch needs some docs -- i'll try to get that up this week
21:18:12 #link https://review.opendev.org/c/openstack/swift/+/861282
21:19:22 other interesting thing i've been working on (and i should be sure to add it to the PTG etherpad)
21:19:24 I just glanced (not reviewed) and the refactor looks nicer than the original
21:20:16 thanks -- there were a couple sneaky spots, but the existing tests certainly helped
21:20:24 #topic statsd labeling extensions
21:21:18 Yeah it can probably just land
21:21:20 when swift came out, statsd was the basis for a pretty solid monitoring stack
21:22:03 these days, though, people generally seem to be coalescing around prometheus, or at least its data model
21:23:23 we at nvidia, for example, are running https://github.com/prometheus/statsd_exporter on every node to turn swift's stats into something that can be periodically scraped
21:24:29 I've been playing with otel metrics, put it as a topic on the ptg etherpad. Got a basic client to test some infrastructure here at work. Maybe I could at least write up some doc on how that works for extra discussions at the ptg?
21:25:00 By that i mean how open telemetry works
21:25:08 that'd be great, thanks!
21:26:33 as it works for us today, there's a bunch of parsing that's required -- a stat like `proxy-server.object.HEAD.200.timing:56.9911003112793|ms` doesn't have all the context we really want in a prometheus metric (like, 200 is the status, HEAD is the request method, etc.)
21:27:55 which means that whenever we add a new metric, there's a handoff between dev and ops about what the new metric is, then ops need to go update some yaml file so the new metric gets parsed properly, and *then* they can start using it in new dashboards
21:28:12 which all seems like some unnecessary friction
21:29:33 fortunately, there are already some extensions to add the missing labels for components, and the statsd_exporter even already knows how to eat several of them: https://github.com/prometheus/statsd_exporter#tagging-extensions
21:30:08 so i'm currently playing around with emitting metrics like `proxy-server.timing,layer=account,method=HEAD,status=204:41.67628288269043|ms`
21:30:22 or `proxy-server.timing:34.14654731750488|ms|#layer:account,method:HEAD,status:204`
21:30:35 or `proxy-server.timing#layer=account,method=HEAD,status=204:5.418539047241211|ms`
21:30:44 or `proxy-server.timing;layer=account;method=HEAD;status=204:34.639835357666016|ms`
21:31:33 (really, "proxy-server" should probably get labeled as something like "service"...)
21:31:58 my hope is to have a patch up ahead of the PTG, so... look forward to that!
21:32:05 nice!
21:32:37 "layer" is a new term to me?
21:32:56 idk, feel free to offer alternative suggestions :-)
21:33:10 vs tier or resource (I guess tier isn't clear)
21:33:22 haha it took us < 1 second to get into a naming debate :D
21:33:40 let's save that for the PTG
21:34:53 Oh cool, I look forward to seeing it!
21:34:54 if it doesn't mesh well with an operator's existing metrics stack, (1) it's opt-in and they can definitely still do the old-school vanilla statsd metrics, and (2) most collection endpoints (i believe) offer some translation mechanism
21:34:55 I'm hoping we might eventually converge this "structured" stats with structured logging
21:35:14 +1
21:35:31 yes! there's a lot of context that seems like it'd be smart to share between stats and logging
21:35:40 e.g. build a "context" data structure and squirt it at a logger and/or a stats client and you're done
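
For illustration, a toy Python sketch (not Swift's actual statsd client) of emitting the DogStatsD-style variant shown above over UDP, in a form statsd_exporter's tagging extensions can parse; the function name and defaults are made up for the example:

    import socket

    def emit_timing(name, ms, host='127.0.0.1', port=8125, **labels):
        # renders e.g.
        # proxy-server.timing:34.14654731750488|ms|#layer:account,method:HEAD,status:204
        tags = ','.join('%s:%s' % kv for kv in sorted(labels.items()))
        payload = '%s:%s|ms|#%s' % (name, ms, tags)
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(payload.encode('utf-8'), (host, port))
        sock.close()

    emit_timing('proxy-server.timing', 34.14654731750488,
                layer='account', method='HEAD', status=204)
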
21:36:15 that's all i've got
21:36:19 #topic open discussion
21:36:25 what else should we bring up this week?
21:36:40 on that theme, I wanted to draw attention to a change i have proposed to sharder logging
21:37:18 #link https://review.opendev.org/c/openstack/swift/+/875220
21:37:21 #link https://review.opendev.org/c/openstack/swift/+/875221
21:37:27 2 patches currently: https://review.opendev.org/c/openstack/swift/+/875220 and https://review.opendev.org/c/openstack/swift/+/875221
21:37:35 timburke: is so quick!
21:38:13 Oh yeah, I've been meaning to get to that.. but off for the rest of the week, so won't happen now until next week.
21:38:19 I recently had to debug some sharder issue and found the inconsistent log formats very frustrating
21:38:58 e.g. sometimes we include the DB path, sometimes the resource path, sometimes both... but worst, sometimes neither
21:39:59 So the patches ensure that every log message associated with a container DB (which is almost all) will consistently get both the db file path and the resource path (i.e. 'a/c') appended to the message
21:40:31 I wanted to flag it up because that includes WARNING and ERROR level messages that I am aware some ops may parse for alerts
21:41:07 so this change may break some parsing, but on the whole I believe we'll be better for having consistency
21:41:11 Sounds good, and as we eventually worker up the sharper it gets all the more important.
21:41:37 *sharder
21:42:07 IDK if we have precedent for flagging up such a change, or if I am worrying too much (I tend to!)
21:43:26 You're making debugging via log messages easier.. and that's a win in my book
21:43:43 there's some precedent (e.g., https://review.opendev.org/c/openstack/swift/+/863446) but in general i'm not worried
21:44:19 ok so I could add an UpgradeImpact to the commit message
21:45:02 if we got to the point of actually emitting structured logs, and then *took that away*, i'd worry. but this, *shrug*
21:46:01 fwiw, i did *not* call it out in the changelog
21:46:14 well if there's no concerns re. the warnings then I will squash the two patches
21:47:11 and then I can look forward to the next sharder debugging session 😜
21:47:21 sounds good
21:49:06 all right, i think i'll call it
21:49:17 thank you all for coming, and thank you for working on swift!
21:49:23 #endmeeting
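
As a hypothetical illustration of the sharder-logging idea discussed above (not the actual implementation in the two patches), a logging.LoggerAdapter could append both the db file path and the resource path to every message; the class name, suffix format, and paths here are invented for the example:

    import logging

    class ContainerLogAdapter(logging.LoggerAdapter):
        """Append the resource path and db file to every log message."""
        def process(self, msg, kwargs):
            return ('%s, path: %s, db: %s' % (
                msg, self.extra['path'], self.extra['db_file']), kwargs)

    logging.basicConfig(level=logging.INFO)
    logger = ContainerLogAdapter(
        logging.getLogger('container-sharder'),
        {'path': 'AUTH_test/shards',
         'db_file': '/srv/node/sdb1/containers/123/abc/deadbeef/deadbeef.db'})
    logger.warning('error fetching shard ranges')
    # logs: error fetching shard ranges, path: AUTH_test/shards, db: /srv/node/...
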