21:00:03 #startmeeting swift 21:00:04 Meeting started Wed Oct 30 21:00:03 2019 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. 21:00:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 21:00:07 The meeting name has been set to 'swift' 21:00:13 who's here for the swift meeting? 21:00:19 o/ 21:00:37 o/ 21:00:59 o/ 21:01:36 agenda's at https://wiki.openstack.org/wiki/Meetings/Swift 21:01:47 #topic Shanghai 21:01:59 it's almost here! 21:02:14 yey 21:02:22 in two days, i'll be on the plane! i'm excited (and a little freaking out) 21:02:35 H-36 before my flight :) 21:02:55 i've been adding what events i know about to the etherpad 21:02:58 #link https://etherpad.openstack.org/p/swift-ptg-shanghai 21:03:02 it'll be a long flight, please be safe. 21:03:25 in particular, i saw there was a game night, like there have been the last few PTGs 21:03:48 they tend to be pretty fun, and a good opportunity to get to know some of the other openstackers better 21:04:08 i also saw that cschwede is going to be there! 21:04:08 good to know 21:04:16 when? 21:04:22 (at the ptg that is; i don't know about game night ;-) 21:04:57 ah, you were talking about the past ones. 21:05:03 got it. 21:05:18 game night's thursday, 8:00 PM, City Center Marriott Lobby 21:05:30 ah ok. thx. 21:06:00 * kota_ should go to the etherpad link 21:07:05 oh, there was also a flyer the foundation put together, where'd i put that... 21:07:18 #link https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/summits/shanghai/Shanghai-Travel-Tips.pdf 21:07:41 with some travel tips 21:07:50 travel tips! 21:08:37 "Some restrooms do not supply toilet paper. Suggested to carry some with you." is a little disconcerting... 21:08:46 but good to know! 21:08:51 oic. 
"Download ALL apps needed on your phone / desktop (including the Summit Mobile App!)" 21:09:04 `including the Summit App!` 21:09:11 oh wow so soon 21:09:40 yeah, i still need to get my phone situated... laptop's prepped; phone, not yet 21:09:56 at least y'all will be close to my timezone soon (assuming you'll have irc access) 21:10:38 strictly speaking, my timezone is closer ;-) 21:10:55 mattoliverau, i *think* irc will be ok? ptgbot isn't going to be so useful otherwise, anyway ;-) 21:11:04 ahh good point :) 21:11:18 kota_: lol 21:11:22 that is true 21:12:29 all right, that's all i've got for summit/ptg -- i can't wait to see kota_ clayg rledisez alecuyer and cschwede there! 21:12:37 on to updates! 21:12:46 #topic versioning 21:13:46 clayg and tdasilva have got the three patches stacked up now, and they've been iterating on it 21:13:50 cschwede there too, that's awesome! 21:14:44 i haven't been able to follow along quite as closely as before, as i've gotten a bit distracted with summit/ptg/general-travel prep 21:15:58 timburke: you do a great job of following as much as you do. 21:16:25 I guess they're not here to do an update. maybe we just link the patches and move on then? 21:16:25 but i got the impression they've been adding tests and fixing up rough edges, with the idea that clayg will have a pretty solid picture of what's involved so we can talk about it at the ptg 21:16:35 cool 21:16:58 so what comes first, the null containers? 
21:17:09 or namespace 21:17:20 whatever terminology I'm supposed to use :P 21:17:29 yep, null namespace first -- https://review.opendev.org/#/c/682138/ 21:17:48 then a new versioning api for swift -- https://review.opendev.org/#/c/682382/ 21:18:26 and finally hooking up s3api to use the new api -- https://review.opendev.org/#/c/682382/ 21:18:49 sweet 21:18:54 i think that ought to cover it 21:19:00 #topic lots of small files 21:19:03 those look like big changes 21:20:28 kota_, fortunately, almost 4k of the 5k lines in that middle patch are just new test files :-) 21:20:39 :-) 21:20:54 alecuyer is not here, but I think he told me there is nothing new on losf this week 21:21:29 rledisez, is there anything we should be trying to do or look at to be more prepared at the ptg? 21:22:18 I think what we have in mind right now is to stop evolving it for a while until it can be merged (fix bugs, tests, nothing new before merge) 21:22:54 I think alecuyer should be answering your question, i can't answer honestly 21:24:10 that's ok, no worries. something of a freeze sounds reasonable; and i guess the rest of us ought to be thinking about what needs to happen next for us to feel comfortable merging it to master 21:25:04 #topic profiling 21:25:13 #link https://etherpad.openstack.org/p/swift-profiling 21:25:33 rledisez, take it away :-) 21:25:38 thx :) 21:26:13 so, the full story is in the etherpad, but in short, we are CPU bound on proxy-servers, and it does not seem right that a decent proxy-server cannot handle more than 3 or 4 Gbps of traffic 21:26:42 so I did some profiling, I played with the conf options timburke suggested last week and I put the results there 21:27:11 first of all, I'm interested if you see any issue in the bench I did (wrong methodology etc…) 21:27:33 after that, I propose some ideas at the bottom to improve the situation that i'd like to discuss 21:27:52 basically, object-server is fine. 
proxy/GET is fine, proxy/PUT is damn slow 21:28:22 you said it's got 10Gb NICs -- are there two of them? one client-facing, one cluster-facing? 21:29:05 timburke: on our production yes, but for the benchmark, all was local. on production we are far from 10Gbps on either interface 21:29:18 i mean, it was local, for sure :) 21:30:08 note: I still need to bench with EC policy 21:30:37 and are we measuring bandwidth on the client-facing traffic, or cluster-facing? 21:30:47 (just to sanity check ;-) 21:30:48 Is it possible to refuse in-kernel MD5 and try some local libraries? 21:30:58 Maybe the kernel overhead is too great or something. 21:31:18 timburke: client facing (and I understand that cluster facing we are expecting N*bandwidth for a PUT) 21:32:05 rledisez: the benchmark ran under py3 or still py2? 21:32:16 zaitcev: are you talking about the splice option? 21:32:41 kota_: I did both for some measures, I didn't see any major difference, but mostly py2 21:32:54 ok 21:34:08 so with the 1MB chunk size... the client's seeing 5Gbps, so we must be generating 15Gbps on the cluster interface -- which seems about in line with the upper-bounds you were seeing in the object-server... 21:34:09 No, I am saying that all of our MD5s are calculated by the kernel nowadays, right? 
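The N*bandwidth point above is easy to make concrete: on a PUT, every client byte is sent out once per replica on the cluster-facing side. A back-of-the-envelope sketch (the 4+2 EC figures are illustrative, not from this benchmark):

```python
# Back-of-the-envelope check of the proxy's cluster-facing fan-out for a PUT.
# With R replicas, each client byte is written R times cluster-side; with an
# EC scheme of k data + m parity fragments the multiplier is (k + m) / k.

def cluster_bandwidth_gbps(client_gbps, replicas=None, ec_k=None, ec_m=None):
    """Cluster-facing bandwidth a proxy generates while handling a PUT."""
    if replicas is not None:
        return client_gbps * replicas
    return client_gbps * (ec_k + ec_m) / ec_k

# 5 Gbps from the client under 3-replica: 15 Gbps toward the object servers,
# matching the numbers discussed above.
print(cluster_bandwidth_gbps(5, replicas=3))      # 15
# A hypothetical 4+2 EC policy only multiplies by 1.5:
print(cluster_bandwidth_gbps(5, ec_k=4, ec_m=2))  # 7.5
```

This is also why benching with an EC policy (as noted above) is interesting: the cluster-side multiplier, and hence where the proxy saturates, differs per policy.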
21:34:22 Every time you invoke md5 it's a syscall 21:34:32 Using the AF_LINK or what's its name 21:34:33 zaitcev, nope -- rledisez already pointed out that i had the wrong idea about that ;-) 21:34:59 ok 21:35:57 timburke: In my bench I had 3 object-servers that could reach about 14Gbps, so in the best case the proxy should handle 15Gbps (because it's all localhost traffic, writing on /dev/shm) 21:36:19 when we're writing we just use the normal python hashlib: https://github.com/openstack/swift/blob/2.23.0/swift/obj/diskfile.py#L1669 21:37:03 and it's quite good given that object-server is "only" 20% slower than simple md5sum 21:37:25 so I'm not expecting major improvement on object-server 21:37:52 (well +17% bw / -17% cpu is still something :)) 21:38:43 just to be clear, i'm not suggesting at all to remove md5 calculation :) I just did it to get the best of proxy-server 21:40:11 eh, i know notmyname's talked about the idea of using something other than md5 before... that's not *such* a crazy idea... 21:40:46 but yeah, i'm not sure how best to further investigate ATM 21:41:09 Makes me wonder if some simple daily or weekly perf check in zuul might be useful. Obviously with a grain of salt because of shared tenants. But might catch major degradations. 21:41:42 well, with just the no-timeout/no-queue on proxy we already could get a significant perf improvement 21:42:46 I wonder why timeouts cause such a slow down, is it their implementation, because I wonder why a watchdog thread works so well 21:43:08 I would've thought a timeout would be a timed thread or something somewhat similar 21:43:22 * mattoliverau has never looked under the hood though 21:43:53 is it something eventlet monkey patches (me is just thinking out loud). 
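For reference, the hashing pattern at the diskfile link above boils down to feeding each chunk to a plain hashlib digest as it arrives. A simplified sketch (not the actual diskfile code; `put_object` and the chunk sizes are illustrative) showing that a bigger chunk size means fewer loop iterations for the same body and the same etag:

```python
import hashlib
import io

def put_object(wsgi_input, chunk_size=65536):
    """Read an upload chunk by chunk, hashing as we go -- the same
    shape as swift's diskfile write path.  A larger chunk_size means
    fewer iterations (and fewer per-chunk timeout/queue operations)
    for the same body; the digest itself is unaffected."""
    hasher = hashlib.md5()
    chunks = 0
    while True:
        chunk = wsgi_input.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
        chunks += 1
    return hasher.hexdigest(), chunks

body = b'x' * (1024 * 1024)  # a 1 MiB upload
etag_small, n_small = put_object(io.BytesIO(body), chunk_size=65536)
etag_big, n_big = put_object(io.BytesIO(body), chunk_size=1024 * 1024)
assert etag_small == etag_big   # digest is chunk-size independent
print(n_small, n_big)           # 16 1
```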
21:43:55 mattoliverau: it is quite good, except we call it for each chunk (so each piece of 64KB), so it is called thousands of times for an upload 21:44:05 ahh 21:44:08 ok 21:44:18 while a watchdog will be initialized once and just a variable is updated then 21:44:25 yeah, eventlet basically schedules an event for later to raise the Timeout in the appropriate thread 21:45:19 and i think it's also part of why the increased chunk size captures a lot of the no-timeout gain 21:45:48 and the queue, well, the same * N, and it needs a synchronisation each time (so lock etc…) 21:46:06 timburke: right, bigger chunk == fewer calls to Timeout/queue 21:46:23 rledisez, did you happen to measure RAM consumption differences between the different chunk sizes? 21:47:26 i wonder if we should just up the default chunk size... 21:47:29 nope, I didn't. but it's quite easy to calculate the worst case scenario I think. the max queue size is 10 IIRC, there are N replicas, so chunk_size * N * 10 ? 21:47:58 per PUT 21:48:33 and it was always a single PUT at a time, right? 21:49:39 yeah, I'm planning to do more on concurrency later to see if there is something to optimize there 21:49:47 right now the focus is on single-connection performance 21:51:51 rledisez: great job 21:51:52 out of curiosity, what kinds of speeds can you get with netcat? testing locally just now, i can get ~24GB/s with dd piping straight to /dev/null, but only 1.3-1.4GB/s (so ~11Gbps) if i send it through a socket that's piping to /dev/null... 21:52:25 timburke: do you have the exact command so I can copy/paste? 21:53:10 in one terminal, `nc -l 8081 > /dev/null` -- in another, `dd if=/dev/zero bs=1M count=10000 | nc localhost 8081` 21:53:43 15.5 GB/s 21:53:52 i tried twiddling bs/count to do even larger chunk sizes, but it didn't have much difference 21:54:08 why's my laptop so slow!? boo! 
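The trade-off described above -- arming a fresh timeout around every 64KB chunk versus one watchdog that only checks a "last activity" timestamp -- can be sketched with stdlib threading (eventlet schedules its Timeouts differently under the hood; this `Watchdog` class is illustrative, not swift's actual code):

```python
import threading
import time

def per_chunk_read(chunks, timeout=0.5):
    """Per-chunk style: a timer is armed and torn down around every
    chunk -- thousands of scheduler operations for a large upload."""
    for chunk in chunks:
        t = threading.Timer(timeout, lambda: None)
        t.start()
        try:
            pass  # ...process the chunk...
        finally:
            t.cancel()

class Watchdog:
    """Watchdog style: one background thread polls a timestamp; the
    per-chunk cost is a single attribute store."""
    def __init__(self, timeout=0.5):
        self.timeout = timeout
        self.last_activity = time.monotonic()
        self.timed_out = False
        self._stop = threading.Event()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while not self._stop.wait(self.timeout / 10):
            if time.monotonic() - self.last_activity > self.timeout:
                self.timed_out = True
                return

    def ping(self):
        self.last_activity = time.monotonic()

    def stop(self):
        self._stop.set()

per_chunk_read(range(5))         # 5 timers created and cancelled
wd = Watchdog(timeout=0.5)
for _ in range(1000):            # 1000 "chunks": just 1000 cheap pings
    wd.ping()
wd.stop()
print(wd.timed_out)              # False
```

The bigger-chunk observation above follows directly: doubling the chunk size halves the number of timer (or ping) operations per upload, whichever style is used.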
21:54:28 good to know though, to keep in mind as an upper bound :-) 21:54:56 I can provide you a server to work on, but you're going to have trouble at China customs ;) 21:55:16 all right, well... i guess we'll keep thinking about it. willing to bet we'll talk about this more next week 21:55:24 lol 21:55:33 got just a few more minutes 21:55:38 #topic open discussion 21:55:47 anything else anyone would like to bring up? 21:56:29 I have a mate who might be convincing his work to do some upstream time. They're interested in tiering. So if it goes ahead I might point him at those stalled patches. 21:56:59 \o/ i love new contributors! 21:57:05 if so, a discussion that should be had (maybe at ptg) is: is it still the right design? 21:57:27 good 21:57:31 or maybe should it use the new null namespace and hide tiering containers? 21:58:11 excellent question 21:58:15 I had a chat with him about some of it already while giving him a Swift intro the other day online. 21:59:01 out of curiosity, who's his employer, if you can say? 21:59:13 if you can add that to the list of discussions it would be good. It'll depend on if he can swing it. But maybe as a friday thing after other null namespace discussions 21:59:25 Can't say yet 21:59:37 👍 21:59:54 timburke: but you might know them because they may or may not use swiftstack ;) 22:00:05 all right, we're about at time 22:00:18 thank you all for coming, and thank you for working on swift! 22:00:22 #endmeeting