21:00:01 <notmyname> #startmeeting swift
21:00:02 <openstack> Meeting started Wed Jun 13 21:00:01 2018 UTC and is due to finish in 60 minutes.  The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:06 <openstack> The meeting name has been set to 'swift'
21:00:08 <notmyname> who's here for the swift team meeting?
21:00:40 <rledisez> hi o/
21:01:58 <acoles> hi
21:02:00 <notmyname> rledisez: hmm... think anyone else is joining us today?
21:02:12 <torgomatic_> o/
21:02:27 <notmyname> acoles: torgomatic_: hello
21:04:25 <notmyname> clayg is out this week. kota_ is out too
21:05:03 <notmyname> not sure where everyone else is
21:05:34 <notmyname> torgomatic_: do you want to go over your patch question, or should we wait until more people are online?
21:05:43 <torgomatic_> eh, I can go now if you like
21:05:48 <torgomatic_> I don't see much else on the agenda this meeting
21:06:09 <torgomatic_> I've written a change that makes the object updater have both multiple processes and per-process concurrency. The goal is to be faster at handling large numbers of pending updates.
21:06:15 <torgomatic_> I tested it on one (real-hardware) node with a few million async_pendings; the graph is here: https://storage101.dfw1.clouddrive.com/v1/MossoCloudFS_3b02a08f-d96b-49fe-b8a5-b88551fad855/graphs/parallel-expirer-graph.png?temp_url_sig=4127cb9b472e3222bc865e936c9cb023d12a64a0&temp_url_expires=1615318360&inline
21:06:23 <torgomatic_> The only weird thing here is the config parameters "concurrency" and "updater_workers". On master, "concurrency" means "number of processes", and each process handles async_pendings serially. With the patch, "updater_workers" means "number of processes", and "concurrency" means "number of concurrent updates in each process".
21:06:28 <torgomatic_> Most other Swift daemons have similar configs where "(type_)?workers" controls process count and "concurrency" controls greenthread count, like the object reconstructor, object replicator, object server, object expirer, container replicator, container server, account replicator, account server, and account reaper. The only other daemons using "concurrency" to control process count are the object auditor and container updater.
21:06:35 <torgomatic_> On upgrade, this patch would result in a small change for existing clusters that have set concurrency=N. Before, they'd have N processes spread across (up to) N disks, each processing one async_pending at a time. After, they'd have 1 process operating on each disk in turn, processing N async_pendings at a time.
21:06:41 <torgomatic_> I think this is fine, but others may disagree, and that's why it's a meeting topic.
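For reference, the two option schemes described above, sketched as object-server.conf snippets following the semantics torgomatic_ lays out (the values are illustrative, not recommendations):

    # on master: "concurrency" = number of updater processes,
    # each handling async_pendings serially
    [object-updater]
    concurrency = 4

    # with the patch: "updater_workers" = number of processes,
    # "concurrency" = concurrent updates (greenthreads) per process
    [object-updater]
    updater_workers = 1
    concurrency = 4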
21:07:12 * notmyname reading
21:07:56 <notmyname> I am very glad rledisez is here :-)
21:08:07 <rledisez> :)
21:08:16 <rledisez> let me read it all once again ;)
21:09:21 <mattoliverau> Sorry I'm late o/
21:09:22 <acoles> I raised this when reviewing the patch. The patch is great, the move towards consistent option naming is laudable, but I felt the implications of the option name changes warranted wider discussion (in our brave new world of single +2)
21:10:04 <notmyname> it's a good question to raise. my gut reaction is the same as acoles's.
21:10:06 <rledisez> torgomatic_: it's not really the scope of the patch but i'm curious. how did you end up with millions of async pendings?
21:10:12 <notmyname> torgomatic_: what's the upgrade path?
21:10:51 <torgomatic_> rledisez: lots and lots of puts of small objects to a single container
21:11:26 <acoles> we need a better solution for those small objects ! ;)
21:11:27 <torgomatic_> notmyname: the upgrade path is pretty much just upgrade Swift as normal
21:11:40 <rledisez> torgomatic_: that's what i thought ;)
21:11:40 <rledisez> i pushed this last week exactly for that situation: https://review.openstack.org/#/c/571917/
21:11:40 <patchbot> patch 571917 - swift - WIP: Manage async_pendings priority per containers
21:11:50 <notmyname> current master only has the "concurrency" setting?
21:11:55 <torgomatic_> notmyname: correct
21:12:11 <rledisez> related to that patch, i'm a bit afraid of the I/O pressure it would put on disks if you have, let's say, concurrency=60
21:12:25 <rledisez> operators should be warned to take care of that value before upgrading
21:12:30 <notmyname> torgomatic_: and the default value of the new "updater_workers" setting?
21:12:31 <torgomatic_> so you go from N-at-a-time before to N-at-a-time after, but the distribution is worse
21:12:41 <torgomatic_> rledisez: agreed; it's definitely worth a mention in the release notes
21:12:52 <torgomatic_> or a mailing list message to openstack-operators, or both
21:13:12 <torgomatic_> notmyname: default is 1, I think... let me look
21:13:20 <notmyname> 1 or 1 per drive?
21:13:32 <acoles> so what torgomatic_ says ^^ was my concern, that if all you do is upgrade swift, you might end up with updater performance
21:13:53 <torgomatic_> defaults are concurrency=128 (which may be a bit high; I just made it up) and updater_workers=1
21:13:54 <acoles> degradation
21:14:05 <torgomatic_> acoles: true; if you've got only a few async_pendings, then you're okay with the upgrade
21:14:22 <notmyname> torgomatic_: so that means on upgrade, you would *not* have one per drive. you'd have only one
21:14:25 <torgomatic_> if you have a lot, then this upgrade could slow you down until you notice the new settings and then you get to process them faster than before
21:14:26 <rledisez> and if you've got plenty of them, you're already in trouble :D
21:14:41 <torgomatic_> notmyname: one what, exactly? process? concurrent update?
21:14:45 <notmyname> ah! "one...operating on each disk in turn". got it. /me learns to read
21:15:01 <notmyname> I misread it as "after upgrade...you'd have one per disk"
21:17:05 <notmyname> so what's the alternative? invent new config variables and keep inconsistency across all settings?
21:17:11 <acoles> rledisez: can I check, when you say concurrency=60 might be a bad idea, do you mean the new concurrency i.e. 60 threads, or do you mean 60 processes?
21:18:18 <torgomatic_> notmyname: yeah, pretty much. We could have "concurrency" for processes and "concurrency_per_process" for greenthreads or something, and then it's a special case forever but there's no upgrade issues
21:18:37 <torgomatic_> I prefer moving towards consistency; there are too many daemons not to IMO
21:19:08 <notmyname> so the question is "do we choose new names with inconsistency" or "go for config consistency at the risk of an upgrade slowing down existing clusters until settings are changed"
21:19:53 <torgomatic_> right, and it would only really harm existing clusters with lots of async_pendings
21:20:05 <rledisez> acoles: if i got it right: now, concurrency=60 means 60 processes (one per disk)
21:20:05 <rledisez> after: concurrency=60 means one process per disk, 60 threads each. i think 60 threads per disk is a bad idea (if the value is just 2, it should be fine for everybody)
21:20:12 <notmyname> I'm going to assume all clusters have times with lots of async pendings
21:20:48 <torgomatic_> rledisez: fair enough; so one process per disk and 2 greenthreads per process is something you think is reasonable?
21:20:54 <acoles> rledisez: ok, that makes sense.
21:21:05 <torgomatic_> for reference, the graph above was done with 1 process per disk and 32 greenthreads per process
21:21:22 <notmyname> IMO, seeing as 100% of clusters have to update configs in order to take advantage of the new feature, going for consistency is worth it
21:21:50 <rledisez> yes. i can't tell the right number per disk without running some impact tests, but up to 5-10 it should be fine. over 10 is probably too much (think unlinking, inode updates, journal updates: too much I/O)
21:21:51 <notmyname> hmm... and changing the default as rledisez suggests would further minimize impact
21:22:25 <torgomatic_> rledisez: 8 it is... that's a power of 2, so people are less likely to wonder about it ;)
21:22:30 <rledisez> :D
21:22:36 * torgomatic_ will change the default to 8 after the meeting
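Putting the settled numbers together in config form (a sketch; 4 below just stands in for whatever an operator's previous concurrency setting was):

    # new defaults agreed above: one process, 8 greenthreads
    [object-updater]
    updater_workers = 1
    concurrency = 8

    # an operator who previously ran concurrency=4 (4 serial processes)
    # and wants to retain that exact behaviour after upgrading would set:
    # updater_workers = 4
    # concurrency = 1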
21:22:55 <mattoliverau> Lol
21:23:03 <notmyname> everyone happy with this conclusion?
* rledisez is wondering how far we are from dynamically adapting all of this and not worrying anymore about default values
21:23:17 <acoles> notmyname: if you have concurrency >1 now then you do not need to change config to take advantage of the new feature, but you do need to change config to retain the old feature
21:23:59 <mattoliverau> +1 for consistency, +1 for 8, +1 for upgrade docs. +100 for some good work
21:24:17 <torgomatic_> rledisez: do you think "auto" as a value for updater_workers would work, where "auto" means "1 process per disk, however many that is?"
21:24:23 <torgomatic_> (future patch, not this one)
21:24:46 <torgomatic_> but that'd let you set greenthreads-per-disk and then forget about the rest
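Nothing like this exists yet; purely as a hypothetical sketch of how an "auto" value for updater_workers might resolve (the function name, default path, and disk-counting heuristic are all invented for illustration):

    import os

    def resolve_updater_workers(conf_value, devices_dir='/srv/node'):
        # hypothetical: 'auto' means one worker process per local disk,
        # approximated here by counting device directories
        if conf_value == 'auto':
            return max(1, len(os.listdir(devices_dir)))
        return int(conf_value)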
21:25:13 <rledisez> torgomatic_: no, i'm just dreaming that one day, processes will know the amount of available I/O they can use without impacting the TTFB
21:25:23 <notmyname> rledisez: +1000!!!
21:25:51 <torgomatic_> yeah, if we have a single process responsible for each disk's IO, that gets easier
21:26:27 <torgomatic_> kind of tough to do that with all these uncoordinated WSGI servers, replicators, auditors, and other things all running around at the same time
21:26:33 <notmyname> that's definitely a goal that I want us all to work towards over the next year or so
21:27:52 <notmyname> ok, anything else major to discuss this week?
21:27:57 * notmyname has some time pressure today
21:28:09 <torgomatic_> nothing else here
21:28:14 <notmyname> torgomatic_: thanks for bringing that up
21:28:17 <rledisez> just one (quick) question about "tsync" work
21:28:20 <notmyname> ok
21:28:28 <rledisez> do you think it would be in python or golang?
21:28:32 <notmyname> yes
21:28:33 <acoles> mattoliverau: has been asking about the ring composer CLI, whether we should merge it?
21:28:44 <notmyname> also yes
21:29:11 <acoles> #link https://review.openstack.org/451507
21:29:12 <patchbot> patch 451507 - swift - Experimental swift-ring-composer CLI to build comp...
21:29:43 <notmyname> rledisez: in reality, I think we will (and should) do tsync before any object server rewrite. which means we should implement tsync such that there aren't any language-specific serialization issues. yes, that means we'll write it twice
21:30:18 <rledisez> ok. because my fear is that a tsync in golang means re-doing all the work on LOSF to write the diskfile in golang
21:30:30 <rledisez> but I guess either way, at some point everything will be rewritten
21:31:13 <notmyname> acoles: mattoliverau: having something is better than having nothing. maybe it should have a stderr line on invocation that says "this is experimental" or something like that
21:31:33 <acoles> notmyname: IIRC it has that, maybe on stdout
21:31:39 <notmyname> oh ok
21:32:31 <acoles> notmyname: no, stderr https://review.openstack.org/#/c/451507/18/swift/cli/ringcomposer.py@166
21:32:32 <patchbot> patch 451507 - swift - Experimental swift-ring-composer CLI to build comp...
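The warning acoles links to is essentially a notice printed to stderr on invocation; a minimal sketch of the pattern (the wording below is illustrative, not the patch's actual text):

    import sys

    def print_experimental_warning():
        # stderr keeps the notice out of any piped or captured stdout
        sys.stderr.write('WARNING: swift-ring-composer is experimental. '
                         'Its interface and behaviour may change.\n')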
21:32:33 <notmyname> rledisez: I'm not sure why a golang tsync means rewriting the diskfile in golang
21:32:55 <torgomatic_> I agree that something is better than nothing re: ring-composer. Get *something* in now, then we can iterate later.
21:33:09 <notmyname> torgomatic_: acoles: also, it's got a warning! :-)
21:33:19 <torgomatic_> I always find it easier to fix something than to start with an empty .py file and come up with something good.
21:33:42 <notmyname> torgomatic_: alternatively, I've never had issues starting with an empty .py file and coming up with something bad
21:33:49 <torgomatic_> :)
21:34:17 <acoles> I can start and end with an empty .py file :P
21:34:18 <mattoliverau> Great my thoughts exactly.
21:34:33 <timburke> leave *nowhere* for the bugs to hide!
21:34:46 <rledisez> notmyname: well, I guess it depends a lot on the design. If the server is the normal object-server, no issues. if the server is a tsync server, it needs to use the diskfile. so i'm especially thinking of the LOSF diskfile
21:35:49 <notmyname> rledisez: I guess the good news is that nobody has started writing anything "tsync" yet, so it can be whatever we want it to be
21:35:55 <mattoliverau> Re: composter. I've been talking to people about a cool feature called composition rings so it would be nice to have something people could at least play with. I know the classes are there, but some client is needed.
21:36:05 <mattoliverau> *composite
21:36:12 <mattoliverau> Don autocomplete
21:36:19 <notmyname> but IMO tsync should definitely account for the LOSF work from the beginning
21:36:46 <mattoliverau> DHE also bought this up this week, which reminded me that we never landed it
21:37:06 <notmyname> I've got two final thoughts (then I need to close the meeting and go offline for a bit)
21:37:16 <DHE> *brought. (not to be confused with bribes to get what I want :)
21:37:37 * mattoliverau gives up typing on his phone, autocomplete sucks and y'all just have to learn to guess what I'm saying :p
21:37:43 <DHE> oh. I'm sorry.
21:37:47 <notmyname> (1) it's really great to see all the review activity and code landing. my subjective feel is that the single +2 policy is so far successful (time will tell if it's sustainable)
21:38:02 <mattoliverau> Lol
21:38:36 <acoles> mattoliverau: I'd love it to be renamed as the 'composter' :)
21:38:49 <mattoliverau> DHE: that wasn't directed at you, sometimes my spelling is bad.. I am Australian
21:38:56 <mattoliverau> acoles: exactly!
21:39:09 <notmyname> (2) please remember to be kind on reviews. there was one review comment I saw recently that was perhaps a little more accusatory than it needed to be (not from anyone in here). seemed like a good time to thank everyone for their normally very kind reviews
21:39:22 <DHE> mattoliverau: I know, I'm just trying to be funny...
21:40:13 <mattoliverau> DHE: lol, I appreciate that ;) keep it up
21:40:41 <notmyname> thank you, everyone, for your contributions to swift. I'm glad you're part of the community
21:40:44 <notmyname> #endmeeting