Wednesday, 2023-03-22

zaitcevtimburke: Are you saying that I don't need to be concerned with keystonemiddleware because we use our fork anyway?00:23
opendevreviewMatthew Oliver proposed openstack/swift master: WIP: internal_client: Add iter_shard_ranges interface  https://review.opendev.org/c/openstack/swift/+/87758402:45
timburkezaitcev, well... i know *my* clusters have used the swift-tree middleware for years now -- idk what *your* users are using -- but at least there's a migration path03:56
timburkepresumably, keystone will be happy to be rid of the thing at some point03:57
zaitcevOf what thing? s3_token?04:00
zaitcevObviously Nova uses ec2_token, so it's not going anywhere.04:00
opendevreviewMatthew Oliver proposed openstack/swift master: WIP: Internalclient gatekeeper restore header shim  https://review.opendev.org/c/openstack/swift/+/87818804:37
mcapehey all! one of three servers' controller failed, and all drives are inaccessible now. the problem is that cluster was at 95% of capacity, and is now trying to rebalance, as I understand it. How can I stop the partition movement to handoff nodes (which will quickly overfill the whole cluster)?07:46
mcapeit has 2 regions, with 1 zone in the first, and 2 zones in the second08:05
mcapethe failed server is in 1 region... and two other servers in 1 region are heading to 98-99% of disk utilization... while the norm is 94-95% (as in second zone)08:06
opendevreviewMerged openstack/python-swiftclient master: Use SLO by default for segmented uploads if the cluster supports it  https://review.opendev.org/c/openstack/python-swiftclient/+/86444416:00
edausqhello, I have opened a bug report https://bugs.launchpad.net/swift/+bug/201253116:57
edausqit is both impacting and tricky, I hope my report is clear enough, so a coredev can give it a look16:58
timburkemcape, i think you've got two options: stop all replicators until you can get hardware replaced (which exacerbates your current durability troubles), or reduce the ring to 2 replicas and remove all devices from the failed node in the ring (which may cause some further shuffling of partitions and/or complicate bringing the disks back into the cluster)16:59
timburkeedausq, looking at it now -- will try to keep you updated. just to double-check: it's the same version of swift under both py2 and py3, yeah?17:06
opendevreviewASHWIN A NAIR proposed openstack/swift master: allow x-open-expired on POST requests  https://review.opendev.org/c/openstack/swift/+/87743417:16
edausqtimburke: yes, same version of swift. thank you!17:23
timburkeedausq, i haven't been able to repro yet -- i've definitely spent some time thinking about exactly this sort of a problem, though, and thought we had all our bases covered :-/ just to make sure i've got my environment right: which 2.29.x release is this? which version of python? eventlet?18:19
timburkeis there anything special i should know about how the object-server's deployed? (for example, i know some people have tried getting it running using mod_wsgi or uwsgi instead of eventlet's wsgi server)18:19
timburkeoh! and do you have encryption enabled? i just realized: i *do*, and that's probably throwing off my testing so far...18:26
timburkeyup, that'll do it. *sigh*18:28
edausqwe don't have encryption enabled. I am so glad to read you were able to reproduce! And you have a traceback too. I don't understand how come we don't, but that's another topic19:52
edausqtimburke: since you can reproduce, I am guessing you don't need our details about python/eventlet and swift version.19:55
timburkeedausq, yeah, i'm good -- thanks20:03
kotagood morning20:57
mattolivermorning21:02
acoleskota: mattoliver good morning!21:02
kotaacoles: mattoliver o/21:02
indianwhocodeso/21:05
mattolivertimburke: you around?21:07
timburkeoh, right!21:07
timburke#startmeeting swift21:07
opendevmeetMeeting started Wed Mar 22 21:07:49 2023 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.21:07
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.21:07
opendevmeetThe meeting name has been set to 'swift'21:07
timburkemain things this week21:08
timburke#topic vPTG21:08
timburkeit's next week!21:08
mattoliveralready!21:08
timburkealso, i accidentally scheduled a vacation at the same time 😳21:09
kotawow21:09
mattoliversure sure :P 21:09
mattoliveryeah, no stress21:09
timburkebut it sounds like mattoliver is happy to lead discussions21:09
mattoliveryeah, I aint no timburke but I can talk, so happy to lead. But need people to help me discuss stuff :) 21:10
mattoliverSo put your topics down!21:10
mattolivertimburke: do we have rooms scheduled etc?21:11
timburkeno, not yet -- i'd suggest going for this timeslot, M-Th21:11
timburkesorry acoles. there isn't really a good time :-(21:12
timburketake good notes! i'll read through the etherpad when i get back :-)21:12
mattoliverkk21:13
mattoliveris there a place I'm suppose to suggest/register the rooms or just register them via the bot like I did for the ops feed back last time?21:13
timburkevia the bot, like last time. anyone should be able to book rooms over in #openinfra-events by messaging "#swift book <slot ref>"21:14
mattolivercool, I'll come up with something21:15
timburke#topic py3 metadata bug21:15
timburke#link https://bugs.launchpad.net/swift/+bug/201253121:15
mattoliverSo long as acoles is ok with it. Or maybe we have an earlier one for ops feedback.. I'll come up with something21:15
mattoliveroh this seems like an interesting bug21:15
timburkeso... it looks like i may have done too much testing with encryption enabled21:15
timburke(encryption horribly mangles metadata anyway, then base64s it so it's safer -- which also prevented me from bumping into this earlier)21:17
timburkebut the TLDR is that py3-only clusters would write down object metadata as WSGI strings (that crazy str.encode('utf8').decode('latin1') dance). they'd be able to round-trip them back out just fine, but if you had data on-disk already that was written under py2, *that* data would cause the object-server to bomb out21:19
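For readers unfamiliar with the "dance" mentioned in the message above, here is a minimal Python 3 sketch. The helpers `str_to_wsgi` and `wsgi_to_str` are defined locally for illustration only; they follow the transformation named in the discussion and are not copied from Swift's source. The sketch shows a string round-tripping through the WSGI form, and one way a properly decoded string can fail when code expects the WSGI form instead.

```python
def str_to_wsgi(native):
    """Native text -> WSGI string: UTF-8 bytes viewed as Latin-1 chars."""
    return native.encode('utf8').decode('latin1')


def wsgi_to_str(wsgi):
    """WSGI string -> native text: the reverse of str_to_wsgi."""
    return wsgi.encode('latin1').decode('utf8')


value = 'caf\u00e9 \u2603'  # a proper (non-WSGI) string with non-ASCII chars

# Round trip: every char in the WSGI form stands for exactly one byte.
as_wsgi = str_to_wsgi(value)
round_trip_ok = wsgi_to_str(as_wsgi) == value
fits_in_bytes = all(ord(ch) < 256 for ch in as_wsgi)

# A proper string holding chars above U+00FF has no Latin-1 encoding,
# so treating it as a WSGI string raises instead of decoding.
try:
    wsgi_to_str(value)
    failure = None
except UnicodeEncodeError as err:
    failure = type(err).__name__
```

Whether this is the exact failure seen in the bug report is not stated in the log; it only illustrates why the two string forms cannot be mixed freely.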
acolessorry guys I need to drop off, I'll do my best to make the PTG - mattoliver let me know what you work out with times21:20
mattoliveracoles: kk21:21
timburkemy thinking is that the solution should be to ensure that diskfile only reads & writes proper strings, not WSGI ones -- but it will be interesting trying to deal with data that was written in a py3-only cluster21:21
mattolivertimburke: oh bummer21:21
mattoliverso diskfile will need to know how to return potential utf8 strings as wsgi ones, so another wsgi str dance.  21:22
mattoliverbut I guess it's only for the metadata?21:22
timburkeyeah, should only be metadata. and (i think) only metadata from headers -- at the very least, metadata['name'] comes out right already21:23
timburkehopefully it's a reasonable assumption that no one would actually *want* to write metadata that's mis-encoded like that, so my plan is to try the wsgi_to_str transformation as we read meta -- if it doesn't succeed, assume it was written correctly (either under py2 or py3-with-new-swift)21:24
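The read-side plan described in the message above could be sketched roughly as follows. This is an illustration under stated assumptions, not the eventual patch: `read_meta_value` is a hypothetical name, and the real change would live wherever diskfile reads metadata.

```python
def read_meta_value(stored):
    """Return metadata as a proper string, tolerating both on-disk forms.

    Hypothetical helper: first assume the value was stored as a WSGI
    string and undo that; if the conversion fails, assume the value was
    already stored correctly and return it unchanged.
    """
    try:
        return stored.encode('latin1').decode('utf8')
    except UnicodeError:
        # Covers both UnicodeEncodeError (chars above U+00FF) and
        # UnicodeDecodeError (bytes that are not valid UTF-8).
        return stored


# A value as a py3-only cluster would have written it (WSGI form) ...
written_as_wsgi = 'caf\u00e9'.encode('utf8').decode('latin1')
# ... and the same value written as a proper string.
written_properly = 'caf\u00e9'
```

Both inputs come back as the same proper string. The heuristic is ambiguous for a proper string whose characters happen to form valid UTF-8 when viewed as Latin-1 bytes; that is the case the message assumes nobody would deliberately store.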
mattoliveryeah, kk21:24
mattoliverlet me know how you go or if you need me to poke at anything, esp while you're away21:25
timburkethanks mattoliver, i'll try to get a patch up for that later today21:25
mattoliverand thanks for digging into it. thats a bugger of a bug. 21:25
timburkemakes me wish i'd had the time/patience to get func tests running against a cluster with mixed python versions years ago...21:27
timburkeanyway21:27
timburke#topic swiftclient release21:27
timburkewe've had some interesting bug fixes in swiftclient since our last release!21:27
timburke#link https://review.opendev.org/c/openstack/python-swiftclient/+/874032 Retry with fresh socket on 49921:29
timburke#link https://review.opendev.org/c/openstack/python-swiftclient/+/877110 service: Check content-length before etag21:29
timburke#link https://review.opendev.org/c/openstack/python-swiftclient/+/877424 Include transaction ID on content-check failures21:29
timburke#link https://review.opendev.org/c/openstack/python-swiftclient/+/864444 Use SLO by default for segmented uploads if the cluster supports it21:30
timburkeso i'm planning to get a release out soon (ideally this week)21:30
mattoliverok cool21:30
timburkethanks clayg in particular for the reviews!21:30
timburkethat's most everything i wanted to cover for this week21:32
mattolivernice. If there is anything else anyone wants to cover, put it in the PTG etherpad ;)21:33
timburkeother initiatives seem to be making steady progress (recovering expired objects, per-policy quotas, ssync timestamp-with-offset fix)21:34
timburke#topic open discussion21:34
timburkeanything else we should talk about this week?21:34
mattoliverWe did have some proxies with very large memory usage > 10G 21:34
mattoliverso not sure if there is a bug there. maybe some memory leak with connections.. but it's too early to tell. I'm attempting to dig in. but just a heads up.21:35
timburkeright! this was part of our testing with py3, right?21:35
mattolivermay or may not turn into anything 21:35
mattoliveryup21:35
timburkei'm anxious to see a repro; haven't had a chance to dig into it more yet, myself21:35
mattoliverthere seem to be a lot of CLOSE_WAIT connections, so wonder if it's a socket leak or not closing properly or something. 21:36
mattoliverI'll try and dig in some more today21:36
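As a rough aid for the kind of digging mentioned above, the sketch below tallies TCP socket states from the text of Linux's `/proc/net/tcp`, where the fourth column is a hex state code and `08` means CLOSE_WAIT. The function names are hypothetical, and this is only one possible approach; tools such as `ss` or `netstat` report the same information. Note that the file is system-wide and IPv4 only (`/proc/net/tcp6` covers IPv6), so it does not by itself attribute sockets to a proxy process.

```python
from collections import Counter

# Subset of the kernel's TCP state codes as shown in /proc/net/tcp.
TCP_STATES = {
    '01': 'ESTABLISHED',
    '06': 'TIME_WAIT',
    '08': 'CLOSE_WAIT',
    '0A': 'LISTEN',
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per state from the contents of /proc/net/tcp."""
    counts = Counter()
    # The first line is a column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3]
        # Unknown codes are counted under their raw hex value.
        counts[TCP_STATES.get(code, code)] += 1
    return counts


def read_tcp_state_counts(path='/proc/net/tcp'):
    """Convenience wrapper; only meaningful on Linux."""
    with open(path) as f:
        return count_tcp_states(f.read())
```

Sampling the CLOSE_WAIT count over time would show whether it keeps growing, which is what a leak of unclosed sockets would look like.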
kotanice21:37
mattoliverI am also working on an internalclient interface for getting shard ranges, as more and more things may need to become shard aware. 21:38
mattoliver#link https://review.opendev.org/c/openstack/swift/+/87758421:38
mattoliverbut it's still a WIP, like other things, let's see how we go.21:38
mattoliverif there is a gatekeeper added to the internal client it'll break the function though. Al has suggested one possible fix, I came up with a middleware shim in internal client, clayg seems to think we should just error hard. 21:39
mattoliverbreak the interface I mean. 21:40
mattoliverSo discussions are happening about that.. might start with the simplest and error loud I guess, but let's see where it goes. 21:40
mattoliverThat's all I have21:41
timburkei'm surprised there'd be any internal clients that would want a gatekeeper... huh21:42
mattoliverwell there aren't21:43
mattoliverbut if someone creates one with allow_modify_pipeline=True (or whatever it's called), one will be added21:43
mattoliverand this would break sharding.. in fact it might already, as the sharder uses internal client to get shards already, the interface just wants unified21:44
mattoliveror a mis configuration from an op. 21:44
timburkei'll blame it on clayg ;-) https://review.opendev.org/c/openstack/swift/+/77042/1/swift/common/internal_client.py21:49
mattoliverSo yeah, I could just be doing down an edgecase that doesn't really matter. But it is still a shoot foot edgecase, and do we attempt to avoid it, or assume people will do the right thing.21:49
mattoliverlol21:49
timburkewell, i think i'll call it21:49
mattoliverkk21:49
mattoliverthats all I have anyway :) 21:49
timburkethank you all for coming, and thank you for working on swift!21:49
timburke#endmeeting21:49
opendevmeetMeeting ended Wed Mar 22 21:49:31 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)21:49
opendevmeetMinutes:        https://meetings.opendev.org/meetings/swift/2023/swift.2023-03-22-21.07.html21:49
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/swift/2023/swift.2023-03-22-21.07.txt21:49
opendevmeetLog:            https://meetings.opendev.org/meetings/swift/2023/swift.2023-03-22-21.07.log.html21:49
opendevreviewASHWIN A NAIR proposed openstack/swift master: allow x-open-expired on POST requests  https://review.opendev.org/c/openstack/swift/+/87743421:49
timburkehuh. longer than normal meeting-end delay21:49
