Wednesday, 2023-03-22

zaitcev	timburke: Are you saying that I don't need to be concerned with keystonemiddleware because we use our fork anyway?	00:23
opendevreview	Matthew Oliver proposed openstack/swift master: WIP: internal_client: Add iter_shard_ranges interface https://review.opendev.org/c/openstack/swift/+/877584	02:45
timburke	zaitcev, well... i know my clusters have used the swift-tree middleware for years now -- idk what your users are using -- but at least there's a migration path	03:56
timburke	presumably, keystone will be happy to be rid of the thing at some point	03:57
zaitcev	Of what thing? s3_token?	04:00
zaitcev	Obviously Nova uses ec2_token, so it's not going anywhere.	04:00
opendevreview	Matthew Oliver proposed openstack/swift master: WIP: Internalclient gatekeeper restore header shim https://review.opendev.org/c/openstack/swift/+/878188	04:37
mcape	hey all! the controller on one of our three servers failed, and all its drives are inaccessible now. the problem is that the cluster was at 95% of capacity, and now it's trying to rebalance, as I understand it. How can I stop the partition movement to handoff nodes (which will quickly overfill the whole cluster)?	07:46
mcape	it has 2 regions, with 1 zone in the first and 2 zones in the second	08:05
mcape	the failed server is in 1 region... and two other servers in 1 region are heading to 98-99% of disk utilization... while the norm is 94-95% (as in second zone)	08:06
opendevreview	Merged openstack/python-swiftclient master: Use SLO by default for segmented uploads if the cluster supports it https://review.opendev.org/c/openstack/python-swiftclient/+/864444	16:00
edausq	hello, I have opened a bug report https://bugs.launchpad.net/swift/+bug/2012531	16:57
edausq	it is both impacting and tricky, I hope my report is clear enough, so a coredev can take a look	16:58
timburke	mcape, i think you've got two options: stop all replicators until you can get hardware replaced (which exacerbates your current durability troubles), or reduce the ring to 2 replicas and remove all devices from the failed node in the ring (which may cause some further shuffling of partitions and/or complicate bringing the disks back into the cluster)	16:59
timburke	edausq, looking at it now -- will try to keep you updated. just to double-check: it's the same version of swift under both py2 and py3, yeah?	17:06
opendevreview	ASHWIN A NAIR proposed openstack/swift master: allow x-open-expired on POST requests https://review.opendev.org/c/openstack/swift/+/877434	17:16
edausq	timburke: yes, same version of swift. thank you!	17:23
timburke	edausq, i haven't been able to repro yet -- i've definitely spent some time thinking about exactly this sort of a problem, though, and thought we had all our bases covered :-/ just to make sure i've got my environment right: which 2.29.x release is this? which version of python? eventlet?	18:19
timburke	is there anything special i should know about how the object-server's deployed? (for example, i know some people have tried getting it running using mod_wsgi or uwsgi instead of eventlet's wsgi server)	18:19
timburke	oh! and do you have encryption enabled? i just realized: i do, and that's probably throwing off my testing so far...	18:26
timburke	yup, that'll do it. sigh	18:28
edausq	we don't have encryption enabled. I am so glad to read you were able to reproduce! And you have a traceback too. I don't understand how come we don't, but that's another topic	19:52
edausq	timburke: since you can reproduce, I am guessing you don't need our details about python/eventlet and swift version.	19:55
timburke	edausq, yeah, i'm good -- thanks	20:03
kota	good morning	20:57
mattoliver	morning	21:02
acoles	kota: mattoliver good morning!	21:02
kota	acoles: mattoliver o/	21:02
indianwhocodes	o/	21:05
mattoliver	timburke: you around?	21:07
timburke	oh, right!	21:07
timburke	#startmeeting swift	21:07
opendevmeet	Meeting started Wed Mar 22 21:07:49 2023 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.	21:07
opendevmeet	Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.	21:07
opendevmeet	The meeting name has been set to 'swift'	21:07
timburke	main things this week	21:08
timburke	#topic vPTG	21:08
timburke	it's next week!	21:08
mattoliver	already!	21:08
timburke	also, i accidentally scheduled a vacation at the same time 😳	21:09
kota	wow	21:09
mattoliver	sure sure :P	21:09
mattoliver	yeah, no stress	21:09
timburke	but it sounds like mattoliver is happy to lead discussions	21:09
mattoliver	yeah, I ain't no timburke but I can talk, so happy to lead. But I need people to help me discuss stuff :)	21:10
mattoliver	So put your topics down!	21:10
mattoliver	timburke: do we have rooms scheduled etc?	21:11
timburke	no, not yet -- i'd suggest going for this timeslot, M-Th	21:11
timburke	sorry acoles. there isn't really a good time :-(	21:12
timburke	take good notes! i'll read through the etherpad when i get back :-)	21:12
mattoliver	kk	21:13
mattoliver	is there a place I'm supposed to suggest/register the rooms or just register them via the bot like I did for the ops feedback last time?	21:13
timburke	via the bot, like last time. anyone should be able to book rooms over in #openinfra-events by messaging "#swift book <slot ref>"	21:14
mattoliver	cool, I'll come up with something	21:15
timburke	#topic py3 metadata bug	21:15
timburke	#link https://bugs.launchpad.net/swift/+bug/2012531	21:15
mattoliver	So long as acoles is ok with it. Or maybe we have an earlier one for ops feedback.. I'll come up with something	21:15
mattoliver	oh this seems like an interesting bug	21:15
timburke	so... it looks like i may have done too much testing with encryption enabled	21:15
timburke	(encryption horribly mangles metadata anyway, then base64s it so it's safer -- which also prevented me from bumping into this earlier)	21:17
timburke	but the TLDR is that py3-only clusters would write down object metadata as WSGI strings (that crazy str.encode('utf8').decode('latin1') dance). they'd be able to round-trip them back out just fine, but if you had data on-disk already that was written under py2, that data would cause the object-server to bomb out	21:19
acoles	sorry guys I need to drop off, I'll do my best to make the PTG - mattoliver let me know what you work out with times	21:20
mattoliver	acoles: kk	21:21
timburke	my thinking is that the solution should be to ensure that diskfile only reads & writes proper strings, not WSGI ones -- but it will be interesting trying to deal with data that was written in a py3-only cluster	21:21
mattoliver	timburke: oh bummer	21:21
mattoliver	so diskfile will need to know how to return potential utf8 strings as wsgi ones, so another wsgi str dance.	21:22
mattoliver	but I guess it's only for the metadata?	21:22
timburke	yeah, should only be metadata. and (i think) only metadata from headers -- at the very least, metadata['name'] comes out right already	21:23
timburke	hopefully it's a reasonable assumption that no one would actually want to write metadata that's mis-encoded like that, so my plan is to try the wsgi_to_str transformation as we read meta -- if it doesn't succeed, assume it was written correctly (either under py2 or py3-with-new-swift)	21:24
mattoliver	yeah, kk	21:24
mattoliver	let me know how you go or if you need me to poke at anything, esp while you're away	21:25
timburke	thanks mattoliver, i'll try to get a patch up for that later today	21:25
mattoliver	and thanks for digging into it. that's a bugger of a bug.	21:25
timburke	makes me wish i'd had the time/patience to get func tests running against a cluster with mixed python versions years ago...	21:27
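[Editor's sketch] The read-side fallback timburke outlines at 21:24 could look roughly like the following. This is a minimal illustration under stated assumptions, not the patch that was later proposed: the two conversion helpers are local stand-ins written for this example (they are not imported from Swift and may differ from Swift's own helpers), and `normalize_meta_value` is a hypothetical name.

```python
def str_to_wsgi(native):
    """Local stand-in: the 'dance' from 21:19 (UTF-8 bytes viewed as Latin-1)."""
    return native.encode('utf8').decode('latin1')


def wsgi_to_str(wsgi):
    """Local stand-in: strict inverse of str_to_wsgi.

    Raises a UnicodeError subclass when `wsgi` cannot be a WSGI string:
    either it holds characters outside Latin-1, or its bytes are not
    valid UTF-8.
    """
    return wsgi.encode('latin1').decode('utf8')


def normalize_meta_value(value):
    """Hypothetical read-side fallback for one metadata value.

    Try to undo the WSGI encoding; if that fails, assume the value was
    already stored as a proper string and return it unchanged.
    """
    try:
        return wsgi_to_str(value)
    except UnicodeError:
        return value


if __name__ == '__main__':
    mangled = str_to_wsgi('caf\u00e9')          # how a py3-only cluster stored it
    print(normalize_meta_value(mangled))        # recovered proper string
    print(normalize_meta_value('caf\u00e9'))    # already proper, left alone
```

The sketch relies on the same assumption stated in the discussion: nobody intentionally stores a value that happens to look like mis-encoded UTF-8. A properly written value that is itself valid as a WSGI string would be converted, so a real fix would need to weigh that edge case.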
timburke	anyway	21:27
timburke	#topic swiftclient release	21:27
timburke	we've had some interesting bug fixes in swiftclient since our last release!	21:27
timburke	#link https://review.opendev.org/c/openstack/python-swiftclient/+/874032 Retry with fresh socket on 499	21:29
timburke	#link https://review.opendev.org/c/openstack/python-swiftclient/+/877110 service: Check content-length before etag	21:29
timburke	#link https://review.opendev.org/c/openstack/python-swiftclient/+/877424 Include transaction ID on content-check failures	21:29
timburke	#link https://review.opendev.org/c/openstack/python-swiftclient/+/864444 Use SLO by default for segmented uploads if the cluster supports it	21:30
timburke	so i'm planning to get a release out soon (ideally this week)	21:30
mattoliver	ok cool	21:30
timburke	thanks clayg in particular for the reviews!	21:30
timburke	that's most everything i wanted to cover for this week	21:32
mattoliver	nice. If there is anything else anyone wants to cover, put it in the PTG etherpad ;)	21:33
timburke	other initiatives seem to be making steady progress (recovering expired objects, per-policy quotas, ssync timestamp-with-offset fix)	21:34
timburke	#topic open discussion	21:34
timburke	anything else we should talk about this week?	21:34
mattoliver	We did have some proxies with very large memory usage > 10G	21:34
mattoliver	so not sure if there is a bug there. maybe some memory leak with connections.. but it's too early to tell. I'm attempting to dig in. but just a heads up.	21:35
timburke	right! this was part of our testing with py3, right?	21:35
mattoliver	may or may not turn into anything	21:35
mattoliver	yup	21:35
timburke	i'm anxious to see a repro; haven't had a chance to dig into it more yet, myself	21:35
mattoliver	there seem to be a lot of CLOSE_WAIT connections, so I wonder if it's a socket leak or sockets not closing properly or something.	21:36
mattoliver	I'll try and dig in some more today	21:36
kota	nice	21:37
mattoliver	I am also working on an internalclient interface for getting shard ranges, as more and more things may need to become shard aware.	21:38
mattoliver	#link https://review.opendev.org/c/openstack/swift/+/877584	21:38
mattoliver	but it's still a WIP, like other things, let's see how we go.	21:38
mattoliver	if there is a gatekeeper added to the internal client it'll break the function though. Al has suggested one possible fix, I came up with a middleware shim in internal client, clayg seems to think we should just error hard.	21:39
mattoliver	break the interface I mean.	21:40
mattoliver	So discussions are happening about that.. might start with the simplest and error loud I guess, but let's see where it goes.	21:40
mattoliver	That's all I have	21:41
timburke	i'm surprised there'd be any internal clients that would want a gatekeeper... huh	21:42
mattoliver	well there aren't	21:43
mattoliver	but if someone creates one with allow_modify_pipeline=True (or whatever it's called), one will be added	21:43
mattoliver	and this would break sharding.. in fact it might already, as the sharder uses internal client to get shards already, the interface just wants unifying	21:44
mattoliver	or a mis configuration from an op.	21:44
timburke	i'll blame it on clayg ;-) https://review.opendev.org/c/openstack/swift/+/77042/1/swift/common/internal_client.py	21:49
mattoliver	So yeah, I could just be going down an edge case that doesn't really matter. But it is still a shoot-yourself-in-the-foot edge case, so do we attempt to avoid it, or assume people will do the right thing?	21:49
mattoliver	lol	21:49
timburke	well, i think i'll call it	21:49
mattoliver	kk	21:49
mattoliver	that's all I have anyway :)	21:49
timburke	thank you all for coming, and thank you for working on swift!	21:49
timburke	#endmeeting	21:49
opendevmeet	Meeting ended Wed Mar 22 21:49:31 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)	21:49
opendevmeet	Minutes: https://meetings.opendev.org/meetings/swift/2023/swift.2023-03-22-21.07.html	21:49
opendevmeet	Minutes (text): https://meetings.opendev.org/meetings/swift/2023/swift.2023-03-22-21.07.txt	21:49
opendevmeet	Log: https://meetings.opendev.org/meetings/swift/2023/swift.2023-03-22-21.07.log.html	21:49
opendevreview	ASHWIN A NAIR proposed openstack/swift master: allow x-open-expired on POST requests https://review.opendev.org/c/openstack/swift/+/877434	21:49
timburke	huh. longer than normal meeting-end delay	21:49
