21:00:17 <timburke> #startmeeting swift
21:00:18 <openstack> Meeting started Wed Mar 18 21:00:17 2020 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:21 <openstack> The meeting name has been set to 'swift'
21:00:25 <timburke> who's here for the swift meeting?
21:00:30 <seongsoocho> o/
21:00:39 <alecuyer> o/
21:01:02 <kota_> hello
21:01:26 <tdasilva> o/
21:02:01 <clayg> more like *party* time
21:02:12 <timburke> agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:25 <timburke> #topic covid-19 / Vancouver
21:02:34 <timburke> so i'd meant to mention this thread last week but forgot (things have been a little hectic with my recent job transition)
21:02:38 <timburke> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-March/013127.html
21:02:43 <timburke> but it looks like i've got a new message to reference now, anyway!
21:02:49 <timburke> #link http://lists.openstack.org/pipermail/foundation/2020-March/002854.html
21:02:59 <timburke> looks like the PTG is going virtual!
21:03:23 <timburke> i know just this week the bay area (where i live) has recommended people shelter-in-place for the next three weeks
21:03:33 <timburke> so i guess this isn't entirely surprising
21:04:08 <timburke> (also, apologies in advance -- i'm probably going to be less available than usual if i've got two small kids at home full time)
21:04:29 <alecuyer> yup.. I'm trying that now (and with just one :) )
21:04:39 <rledisez> hi /
21:04:41 <rledisez> o/
21:04:47 <clayg> *virtual* PTG 🤔
21:04:56 <clayg> I hadn't heard that - thanks @timburke
21:05:13 <kota_> no worries, a lot of people are in the same situation. I'm also taking care of my kids at home.
21:05:33 <alecuyer> clayg: sounds as good as a "virtual" beer no ? but I guess it's for the best
21:06:06 <kota_> +1 for the virtual drinking.
21:06:08 <timburke> i'm sure there will be more organizing and planning going on over the next few months
21:07:16 <timburke> and i'm hopeful about us finding a way to have some dedicated time together to think hard about swift :-)
21:07:39 <timburke> stay safe everyone!
21:07:54 <clayg> perhaps even in person post apocalypse!
21:08:23 <timburke> we'll all meet up at matt's ~~beach house~~ bunker!
21:09:00 <timburke> #topic jerasure support in liberasurecode
21:09:25 <timburke> so i noticed recently that the liberasurecode gate is currently broken
21:09:31 <timburke> all jobs fail with something like `fatal: repository 'http://lab.jerasure.org/jerasure/gf-complete.git/' not found`
21:09:38 <timburke> at first, i was inclined to just replace the repo with a working mirror (such as ceph's fork on github, done in https://review.opendev.org/#/c/712842/)
21:09:39 <patchbot> patch 712842 - liberasurecode - Use ceph's GitHub mirrors for gf-complete/jerasure - 1 patch set
21:09:49 <timburke> but investigating further, i found
21:09:51 <timburke> #link http://web.eecs.utk.edu/~jplank/plank/www/software.html
21:09:59 <timburke> the notice toward the top indicates that jerasure is no longer supported and the source has been taken down
21:10:06 <timburke> (all-in-all, it sounds like part of a patent-suit settlement)
21:10:23 <timburke> so i guess my main question is: do we go chasing forks/mirrors (which may share a similar fate), or stop supporting jerasure? or maybe just stop *testing* jerasure? but then it'll be difficult to tell when/whether we've broken support
21:10:46 <timburke> i suppose that last one is the closest to our current support model for shss and libphazr... but i don't know that we'd even get reports of breakage, much less any assistance in resolving issues :-/
21:11:29 <kota_> true
21:12:17 <rledisez> stopping tests does not seem good. I would vote in favor of deprecating it, but still supporting it through a mirror for some time (1 year?)
21:12:32 <timburke> does anyone have clusters running with jerasure? i know swiftstack would always go with isa-l...
21:12:33 <rledisez> what are the other options instead of ISA-L to support the same EC scheme?
21:12:39 <clayg> rledisez: that's pretty reasonable!
21:12:49 <rledisez> we run jerasure but i've been considering moving to isa-l recently
21:13:04 <kota_> AFAIK, isa-l or shss for NTT groups
21:13:31 <clayg> rledisez: oh ouch - i remember when we looked at jerasure the lawsuit stuff turned us off 😬
21:13:36 <rledisez> but I'm wondering about people running swift on non-x86 processors (does anybody do that?). can they run isa-l as a replacement for jerasure?
21:14:03 <clayg> rledisez: I don't think isa-l is going to be "compatible" so much as it'd just be a different ec policy with a different scheme - you'd want to "support" jerasure forever (or re-encode all your data!)
21:14:32 <kota_> I'd like to make sure liberasurecode_rs_vand is not affected by the GF-complete problem?
21:14:37 <rledisez> clayg: i did a basic test and it was working, but it's on my todo to run extensive testing on that
21:15:00 <clayg> rledisez: oh WOW - it'd be *amazing* if I was wrong about that
21:15:24 <rledisez> clayg: just a basic test running pyeclib manually, still needs a lot of confirmation
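(For reference, a minimal sketch of the kind of manual pyeclib check rledisez describes: encode with the jerasure backend, then try to decode the fragments with the isa-l backend. The 10+4 scheme is just an example, and nothing below guarantees the two backends' fragment formats actually line up; it only shows what "a basic test running pyeclib manually" might look like.)

    # Hypothetical sketch of a manual cross-backend check; assumes both the
    # jerasure_rs_vand and isa_l_rs_vand backends are installed. Passing this
    # on one sample does not prove the backends are interchangeable.
    from pyeclib.ec_iface import ECDriver

    data = b"some test object payload" * 1024

    jerasure = ECDriver(k=10, m=4, ec_type='jerasure_rs_vand')
    isal = ECDriver(k=10, m=4, ec_type='isa_l_rs_vand')

    frags = jerasure.encode(data)

    # Straight decode of jerasure-encoded fragments with the isa-l driver,
    # once with all fragments and once with only k of them.
    assert isal.decode(frags) == data
    assert isal.decode(frags[:10]) == data
    print("cross-backend decode worked for this sample")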
21:15:26 <clayg> kota_: was the GF-complete the thing where decode would return bad data if you gave it specific combinations of frags?
21:15:53 <timburke> kota_, i know libec's built-in algo doesn't link against gf-complete -- though whether it would run into patent trouble is a separate issue...
21:16:21 <clayg> timburke: ok, well 1) awesome find, i'm sure no one else was paying attention to gate tests for pyeclib and 2) does rledisez's suggestion work? i.e. "support" through the ceph mirror, with a big WARNING WILL REMOVE somewhere in the changelog ASAP?
21:16:28 <timburke> clayg, no, the bad data thing was an isa-l bug
21:16:41 <kota_> clayg: I don't think so. that problem was in isa-l rs_vand.
21:16:46 <clayg> timburke: then hopefully rledisez can drive putting together a "how to not with the jerasure" guide that we can publish when we pull the plug
21:17:07 <timburke> that all sounds like a great plan :-)
21:17:48 <clayg> rledisez: and godspeed on getting off jerasure 👍
21:17:49 <timburke> (this, and the quadiron patches, reminds me how i rather wish we had some alternate plugin model that more-explicitly pushed the glue-code responsibility down to each driver...)
21:17:57 <rledisez> well, I hope my plan is gonna work then :D
21:18:20 <clayg> rledisez: well you can be like "look upstream is removing support for jerasure" :P
21:18:32 <clayg> timburke: yes plugins are so hard to do right 😞
21:18:48 <timburke> especially in a language you're not super-familiar with
21:19:41 <timburke> all right, i think i've got what i need out of that -- on to updates!
21:19:47 <timburke> #topic waterfall EC
21:20:00 <timburke> clayg, how's it looking?
21:20:32 <clayg> so I think my last update was two weeks ago - at that time I was like "yeah we can't just extend replicated concurrent gets; because the control is in the wrong place"
21:20:56 <clayg> so then I thought I'd just decouple EC get from database & replicated GETs then I'd be able to "make it so much simpler!!!"
21:21:00 <clayg> yeah that didn't work
21:21:15 <clayg> the first thing I wanted to "rip out" was the "resuming stream feature"
21:21:52 <clayg> basically I never liked it and don't have a clear picture of how often a chunkreadtimeout actually turns into a resumed GET - and even less so how often that WORKS - even for replicated!
21:22:54 <clayg> then I started looking at how it fails in the EC case and was all like  https://media1.tenor.com/images/dcb66efa26bc6d58becc3581e5f41e38/tenor.gif
21:23:25 <clayg> So i decided EC GETs don't NEED resuming behavior and THEN I can "make it so much simpler!!!"
21:23:31 <clayg> but yeah that didn't work
21:24:16 <clayg> I removed a couple hundred lines of resume code - but there's still like 400 lines of "multi-byte range" response handling code that is ALSO buried in the GETorHEADHandlerBase/ResumingGetter mess
21:24:46 <clayg> and I'm not sure I can convince myself EC GET's don't NEED multi-byte-range responses
21:25:09 <clayg> I mean... they probably don't - I think Sam just added it because he wanted to and no one stopped him... but I could be wrong, maybe someone wants it
21:25:31 <clayg> and since I don't really have a good reason to pull it off of replicated objects it seems like we're probably stuck with it on EC
21:25:38 <clayg> ^ that's actually up for debate I guess?
21:25:47 <clayg> tdasilva: seemed to think "well maybe we CAN drop it!?"
21:27:01 <rledisez> if it was broken I would say drop it, but I think it's working, and I can tell for sure that somebody somewhere in the world is using it. so changing the API, mmm…
21:27:21 <clayg> anyways - aside from maybe a little forward progress on the core EC GET request handling code and related tests I'm kinda back to square one 😞
21:27:37 <clayg> yup, that's my gut as well
21:27:57 <timburke> i'm still wondering whether it might make things easier to reason about if we at least pulled the multi-range support out to middleware -- though i think SLO uses it, so ordering may be a little annoying...
21:28:04 <kota_> IIRC, the multi-range support for EC is needed because a segment may span 2 fragments
21:28:39 <kota_> due to the user's range GET request.
21:29:11 <clayg> kota_: there IS some byte range translation for client requests - and you need that even for SINGLE range requests - but the ability for bytes=0-4,8-12 to turn into a MIME document isn't really dependent on the storage policy
21:29:37 <clayg> in FACT - we could *BUILD* multi-byte-range responses (the MIME responses) in middleware using ONLY single byte-range requests to the proxy
21:30:01 <clayg> start a MIME response, fetch bytes 0-4 and send those, then fetch 8-12 and send those
21:30:14 <kota_> ah, it should follow the storage policy. I don't think the translation is needed for the replicated one.
21:30:31 <clayg> that actually seems like a MUCH better way to do multi-byte-range responses than what we have now (that threads mime handling all through the proxy and storage layer)
21:31:03 <clayg> right for multi-byte-range request to replicated data we just return the object server's MIME response (which is... idk, gross to me for some reason)
21:31:15 <timburke> there's going to be some corner cases we'd have to consider if we moved it to middleware -- a 416 on the first range may or may not mean we should 416 the whole request, for example
21:31:38 <clayg> like I don't WANT my object servers to know how to make MIME responses - I think Sam just got a little crazy with multipart messages once he did that thing for EC PUT 🤷
21:31:44 <timburke> and *definitely* need to make sure we plumb in an If-Match header on subsequent requests
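(A rough sketch of what clayg describes, building a multipart/byteranges body out of single-range fetches, with the etag pinned after the first sub-request per timburke's point. fetch_range is a hypothetical stand-in for an authenticated single-range sub-request to the proxy; a real middleware would also have to handle 416s, suffix/open-ended ranges, and set the enclosing Content-Type to multipart/byteranges with the chosen boundary.)

    # Illustrative only: assemble a multipart/byteranges body from single-range
    # GETs. fetch_range(start, end, if_match=...) is a made-up helper returning
    # (status, headers, body_bytes) for one "bytes=start-end" sub-request.
    def multi_range_body(fetch_range, ranges, content_type, boundary='b0undary'):
        etag = None
        for start, end in ranges:
            status, headers, chunk = fetch_range(start, end, if_match=etag)
            # Pin the etag after the first sub-request so a concurrent
            # overwrite can't splice two different objects into one response.
            if etag is None:
                etag = headers.get('ETag')
            total = headers.get('Content-Range', '').rsplit('/', 1)[-1]
            yield ('--%s\r\n'
                   'Content-Type: %s\r\n'
                   'Content-Range: bytes %d-%d/%s\r\n\r\n'
                   % (boundary, content_type, start, end, total)).encode('latin-1')
            yield chunk
            yield b'\r\n'
        yield ('--%s--\r\n' % boundary).encode('latin-1')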
21:32:44 <clayg> timburke: yeah... if we decided to stop and say "ok, you can't have better backend EC request handling until you pull multi-part-byte requests to middleware" it'd be LONG haul
21:33:50 <timburke> fwiw, AWS only supports a single range per request
21:34:27 <clayg> so realistically I guess I'll probably take another stab at pulling apart GETorHEADHandler somehow
21:35:13 <clayg> leave the resuming and multi-byte-range handling in place and extract the connection logic so it's either like dependency injection, or just subclasses
21:37:10 <clayg> maybe ResumingGetter becomes BaseMultiRangeResumingGetter and GETorHEADHandler becomes ReplicatedGETorHEADHandler and some of ECObjectController._get_or_head_response goes into a new ECGETorHEADHandler that does all the Response Bucket stuff
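(Purely to illustrate the split named in that last message; the class names come from clayg's wording, but the shape below is a guess, not a design.)

    # Hypothetical skeleton only; none of this is merged code and the method
    # name is a placeholder.
    class BaseMultiRangeResumingGetter(object):
        """Shared resume + multi-byte-range response handling."""
        def response_parts_iter(self, req):
            raise NotImplementedError

    class ReplicatedGETorHEADHandler(BaseMultiRangeResumingGetter):
        """Roughly today's GETorHEADHandler: pick a good primary, stream it."""
        def response_parts_iter(self, req):
            ...

    class ECGETorHEADHandler(BaseMultiRangeResumingGetter):
        """Would absorb the response-bucket logic currently living in
        ECObjectController._get_or_head_response: feed concurrent backend
        GETs into buckets until enough fragments arrive to decode."""
        def response_parts_iter(self, req):
            ...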
21:37:23 <clayg> so, I guess that's the plan
21:37:25 <tdasilva> just to add a bit more about my idea of just dropping it. my reasoning was: 1. s3 doesn't support it (hence my assumption that very few (if any) people care about it). 2. we can build it in middleware. So my idea was "drop it" and if someone complains, add it to middleware
21:37:27 <clayg> 3rd time's the charm!
21:38:11 <clayg> tdasilva: I didn't mean to throw you under the bus - FWIW I totally understood that line of reasoning and find it compelling
21:38:15 <tdasilva> if no one complains, less code for us to support.
21:39:09 <tdasilva> clayg: I gotcha, just wanted to provide some thoughts behind it, cause I honestly don't think it's a bad idea. but that's just my opinion...
21:39:24 <tdasilva> we could have the middleware ready
21:39:29 <clayg> also having investigated how much work it'll be to make "waterfall-ec" mergeable - it's entirely possible priorities may shift and this will be a slow burn rather than a hard push
21:40:53 <clayg> rledisez: straw man - if we had a change that made EC demonstrably better, plus simpler code - but dropped multi-byte-range responses BUT in a followup patch we reimplemented multi-byte-range as middleware ...
21:41:08 <clayg> could we merge the first one w/o merging the second one until we need it? 😁
21:42:06 <rledisez> clayg: well, that's a tough position for me. like I have to wait for a customer to complain, then we merge it. in the meantime, my customer says he will move to OTHER-CLOUD-PROVIDER because it didn't break his workflow
21:42:32 <rledisez> maybe I should add a timeseries to monitor if somebody uses it
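(One way such a check could look: a tiny hypothetical WSGI filter that bumps a counter whenever a GET asks for more than one range. The statsd_client is an assumption; any metrics or logging hook would do.)

    # Hypothetical sketch: count multi-range GETs so there is real usage data
    # before deciding anything. Not an existing swift middleware.
    class MultiRangeCounter(object):
        def __init__(self, app, statsd_client):
            self.app = app
            self.statsd = statsd_client

        def __call__(self, env, start_response):
            rng = env.get('HTTP_RANGE', '')
            # "bytes=0-4,8-12": a comma means more than one range was requested
            if env.get('REQUEST_METHOD') == 'GET' and ',' in rng:
                self.statsd.increment('multi_byte_range_get')
            return self.app(env, start_response)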
21:43:17 <clayg> rledisez: i guess it depends how much you want it out
21:43:45 <clayg> rledisez: and it sounds like you're probably justified in saying "it's not causing ME any pain; please don't make pain for me" and that seems reasonable
21:43:49 <tdasilva> I think it's reasonable to think that over time we add cruft to the code base that is no longer used/needed. It's really hard (almost impossible) to find it, but I think we should make attempts, as it would simplify the code and make it better
21:43:53 <clayg> let me take one more stab at this with less code churn
21:44:31 <clayg> if I fail again I may come back and beg you to do some more qualification on multi-range responses
21:44:32 <timburke> tdasilva or i could start poking at multi-range-as-middleware if we get serious about going that route, anyway
21:44:33 <rledisez> I guess we have some time to decide on this (if we need the middleware). I'll try to find out if somebody uses multi-byte ranges on my clusters
21:44:54 <timburke> sounds good. we oughta keep moving
21:44:59 <clayg> 👍
21:45:02 <timburke> #topic lots of small files
21:45:08 <timburke> rledisez, i saw a merge from master!
21:45:23 <rledisez> yep, I'll let alecuyer explain where he is now on losf
21:45:27 <alecuyer> I've posted a list of the main changes planned so far, here
21:45:32 <alecuyer> #link https://wiki.openstack.org/wiki/Swift/ideas/small_files/implementation#LOSF_v2
21:45:57 <alecuyer> If you have questions, go ahead, or I can put it on an etherpad if that's better
21:46:20 <alecuyer> Otherwise, I haven't posted code yet, for lack of time these past few days, but also because of going back and forth and changing my mind about some things
21:46:58 <timburke> so does hashes.pkl get written in the volume, or somewhere else?
21:47:20 <alecuyer> it's written in the same place as it is in the regular diskfile, currently
21:47:34 <alecuyer> object-X/partition - but that could change to be below the "losf" directory
21:48:07 <timburke> cool - i couldn't remember where we wrote it currently ;-)
21:48:52 <timburke> i look forward to seeing the next few patches!
21:48:56 <rledisez> right now the development is happening in our internal branch. how do you see the reconciliation with feature/losf?
21:49:01 <rledisez> alecuyer: ^
21:50:02 <alecuyer> well I think I still need to do some testing, and once I get something that I think works, I'll try to split it into proper patches
21:51:03 <rledisez> great. i guess we will try to have future dev happen directly on the feature branch :)
21:51:39 <timburke> +1
21:51:49 <timburke> #topic CORS
21:51:54 <timburke> p 712585 adds a cors gate job, and it even passes!
21:51:55 <patchbot> https://review.opendev.org/#/c/712585/ - swift - Add gate job for CORS func tests - 11 patch sets
21:52:02 <timburke> next up i'll work on stacking the s3api changes on top of that, and getting the s3 tests in p 710354 distributed across the s3api patches so you can see what gets enabled by each patch
21:52:02 <patchbot> https://review.opendev.org/#/c/710354/ - swift - Add CORS func tests for s3api - 3 patch sets
21:52:22 <timburke> has anyone tried running the new tests in p 533028? or even looked at them? i want to figure out whether this is even a palatable way to have func tests with an actual browser, or if i need to sort out something different
21:52:22 <patchbot> https://review.opendev.org/#/c/533028/ - swift - Add some functional CORS tests - 8 patch sets
21:52:35 <timburke> i saw that clayg has opinions :-)
21:52:57 <clayg> so on p 533028 - should all of the tests PASS?
21:52:57 <patchbot> https://review.opendev.org/#/c/533028/ - swift - Add some functional CORS tests - 8 patch sets
21:53:27 <timburke> yes
21:53:51 <timburke> (with the two patches that it's stacked on top of)
21:54:12 <timburke> well, pass or skip, anyway
21:55:14 <timburke> actually, maybe it's better to follow-up in -swift -- i wanted to leave time for
21:55:22 <timburke> #topic open discussion
21:55:47 <timburke> anything else for us to bring up?
21:56:02 <alecuyer> I'm curious to know the proportion of HEAD requests you all get on your clusters. Do share if you can!
21:56:09 <alecuyer> (I think I asked that once already actually ;) )
21:56:34 <rledisez> so for us, 54% HEAD, 23% GET
21:57:01 <rledisez> we've been trying to evaluate the cost of HEAD (cost in I/O)
21:57:06 <rledisez> it's not that easy
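(For anyone who wants to pull the same numbers, a rough sketch that tallies methods from proxy access logs on stdin. The field position is an assumption about the default proxy-logging line format and may need adjusting for local log templates.)

    # Rough sketch: tally request methods from proxy-server access logs.
    # Assumes the HTTP method is the fourth whitespace-separated field;
    # adjust the index for custom log_msg_template settings.
    import sys
    from collections import Counter

    counts = Counter()
    for line in sys.stdin:
        fields = line.split()
        if len(fields) > 3:
            counts[fields[3]] += 1

    total = sum(counts.values()) or 1
    for method, n in counts.most_common():
        print('%-8s %7d  %5.1f%%' % (method, n, 100.0 * n / total))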
21:58:41 <clayg> I don't have that metric in aggregate offhand - I'll drop a note to try and sample some clusters we can look at
21:59:27 <timburke> alecuyer, rledisez do you also have stats on user agents? i know python-swiftclient tends to be noisy with the HEADs...
21:59:42 <rledisez> just a note on the drop-md5 work, i uploaded a "working" patch (some tests still need to be fixed). if you're interested you can look at it. on a replication policy it increases the download speed like x3. let me find the link
21:59:55 <alecuyer> rledisez: if you don't, I will look at it (user-agent); I don't have it now
21:59:59 <rledisez> timburke: I can check that
22:00:07 <rledisez> or alecuyer will :)
22:00:24 <rledisez> drop-md5: https://review.opendev.org/#/c/713059/
22:00:24 <patchbot> patch 713059 - swift - WIP: Make the hashing algorithm configurable - 2 patch sets
22:00:30 <timburke> i've got a snippet of logs from one of our clusters that's got like 300:9:1 for GET:HEAD:PUT, but it's a pretty short timespan iirc
22:00:31 <zaitcev> holy cow, where do all these HEAD come from?
22:00:45 <alecuyer> zaitcev:  my thoughts exactly
22:01:14 <seongsoocho> 80/15/5 for GET/HEAD/PUT
22:01:32 <alecuyer> seongsoocho:  thanks
22:02:01 <timburke> all right, we're at time
22:02:03 <timburke> thank you all for coming, and thank you for working on swift!
22:02:07 <timburke> #endmeeting