21:00:32 #startmeeting swift
21:00:33 Meeting started Wed Oct 25 21:00:32 2017 UTC and is due to finish in 60 minutes. The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:36 The meeting name has been set to 'swift'
21:00:40 who's here for the swift team meeting?
21:00:42 o/
21:00:47 o/
21:00:48 hi o/
21:01:53 hmm... I'm sure there's more than 4 people out there :-)
21:01:59 hi
21:02:07 yo!
21:02:50 o/
21:02:51 hi, sorry to be late
21:03:13 now that acoles is here, we can finally begin
21:03:14 ;-)
21:03:18 sheesh
21:03:36 acoles: no worries. we haven't started yet :-)
21:03:39 I was here I just didn't say hello
21:03:47 ah. of course
21:04:28 ok, let's get started then. most of the people I expected to discuss today's topic are here
21:04:34 #link https://wiki.openstack.org/wiki/Meetings/Swift
21:04:41 #topic next release
21:04:52 it's time to tag another release
21:04:58 #link https://wiki.openstack.org/wiki/Swift/PriorityReviews
21:05:06 priority reviews page is updated with 2.16 stuff
21:05:17 it's in rough order that I'd prioritize it
21:05:26 please review stuff as you are able
21:05:54 ideally, I'd like to release next week (earlier than later, since i'll be flying to the openstack summit late next week)
21:06:14 are there any questions about any of these patches?
21:06:47 ok
21:06:52 Nope, let's aim for next week
21:07:02 #topic bug triage
21:07:08 #link https://etherpad.openstack.org/p/swift-bug-triage-list
21:07:18 last week we said we'd have a bug triage day next week
21:07:32 ideally, that would be right after a release is tagged :-)
21:07:43 which is great
21:08:10 so put the bug-bash day on your calendar and let's clean up that list
21:08:24 (this is a reminder topic instead of one that needs a lot of discussion, I think)
21:08:44 ok, now for the meaty topics :-)
21:08:52 #topic SPDK and swift
21:09:07 today peluse is back with us (yay) to talk about something he's been working on
21:09:12 rock n roll
21:09:14 peluse: take it away
21:09:27 I'm thinking I should have typed some shit up in advance to avoid all the typos I'm about to introduce :)
21:09:31 anyways...
21:09:34 http://spdk.io
21:09:45 #link http://spdk.io
21:09:54 is the URL as I mentioned before. Quick high level overview then I'll bring up a proposal someone in our community has made
21:10:02 that we haven't spent a whole lot of time thinking about TBH
21:10:06 ok
21:10:25 Also, here's a SNIA talk I did last month about SPDK in general and one relevant component called blobstore https://www.snia.org/sites/default/files/SDC/2017/presentations/Solid_State_Stor_NVM_PM_NVDIMM/Luse_Paul_Verma_Vishal_SPDK_Blobstore_A_Look_Inside_the_NVM_Optimized_Allocator.pdf
21:10:37 So SPDK is a set of user space components that is all BSD licensed
21:11:09 it's used in a whole bunch of ways but mainly by storage appliances to optimize SSD performance in what swift would call the storage node
21:11:19 FYI it's in Ceph already but not the default driver
21:11:41 and when I say "it" I mean whatever component the system has chosen to take on, in Ceph it's the user space polled mode NVMe driver
21:11:56 there are some basic perf marketing type hype slides in that deck I pasted in for anyone interested
21:12:19 pretty huge gains when you consider latency and CPU sensitive apps running with latest SSDs
21:12:35 so the basic idea is a fast/efficient way to talk to fast storage media that might potentially be useful in swift's object server?
21:12:46 anyway, the real trick is that it's all user space, direct access to HW, no INTs and no locking
21:12:51 yup
21:13:08 but there are a ton of components, well not a ton, but a bunch that would not be relevant
21:13:15 could it be useful for the account/container servers, too, or are we just looking at object servers (and diskfile in particular)?
21:13:17 what are the integration points? I doubt it's as simple as mmapping a file and you're done
21:13:19 and some are libraries and some are applications.
21:13:44 I think since it's SSD only (well not technically but it wouldn't make sense to use on spinning media) most likely container
21:13:46 so we are talking of object servers on SSD. is it a real use case? (i would think it's the target of ceph, very low latency)
21:14:06 if you used object servers there are probably some limitations wrt what we call blobstore
21:14:21 I'll get to the integration question in a sec
21:14:44 so, assuming a node takes on the user space NVMe driver and the driver talks directly to HW you can see there's no kernel and no FS
21:15:00 so... unless the storage application talks in blocks it doesn't make much sense
21:15:11 ok
21:15:16 blobstore is SPDK's answer to this but it's not a FS
21:15:46 it's a super simple way for apps that don't talk blocks to use a really simple file-ish, object-ish interface to take advantage of SPDK
21:15:54 so for example, RocksDB
21:16:07 in that slide deck I mention some work we did there to bolt blobstore up to RocksDB as a back end
21:16:13 so ... as you know swift likes to be HW and driver agnostic. what does this tie in to? is it possible to write stuff in a way that works if you have fast media or not?
21:16:15 it's that kind of idea that might make sense for Swift
21:16:21 * jungleboyj looks in late
21:16:32 or is the idea that swift would engage spdk mode if it detects flash?
21:16:50 so there are lots of things that can be done there
21:17:09 but yeah I think anything more aggressive than NVMe only would not be worth it
21:17:18 SPDK doesn't automatically do any of that kind of detection
21:17:33 so that would have to be considered
21:17:33 that makes sense
21:17:41 I could imagine swift detecting that
21:18:07 and blobstore itself is pretty immature, need to point that out. We just now added code to recover from a dirty shutdown if that gives you an idea
21:18:10 ok, so tell me (us) more about the blobstore. would that be a diskfile thing?
21:18:16 so this whole thing would be a proof of concept type activity for sure
21:18:24 how does this make rledisez's LOSF work awesomer?
21:18:26 so yeah, I think diskfile would make sense
21:18:40 but I don't remember the details there of course. my brain is pretty small :)
21:18:53 In that slide deck you can see a super simple example of the interface
21:19:34 blobstore basically takes over an entire disk, writes its own private metadata and then the app creates "blobs" and does basic LBA sized reads and writes to them
21:19:46 ah, ok
21:19:47 it can't handle sub-LBA access (by design)
21:20:17 well, we call them pages in blobstore but they're 4K
21:20:17 that sounds like a haystack-in-a-library thing. or something similar to what you're working on rledisez
21:20:59 yes, blobstore would be what we call volume. and I guess it embeds its own k/v indexing. so it looks similar in some ways
21:21:20 yeah, I think the integration effort w/Swift for production would be a decent sized lift but for a POC may be worth it provided, maybe for container SSDs, the latency and CPU usage benefit made sense
21:21:23 peluse: is there any spdk component that could replace sqlite? eg some kv store that does transactions?
21:21:34 eg to replace the container layer
21:21:54 rocksDB would be the closest match, using blobstore as a backing component
21:22:07 but that's really what Wewe's proposal was - to add a k/v interface on blobstore
21:22:14 ah ok. so a 3rd party db that works with spdk
21:22:27 yeah, maybe that's the best first step
21:23:04 any questions from anyone, so far?
21:23:11 I can't remember what sqlite guts look like, can you easily replace the storage engine as it's called in like MariaDB, anyone know?
21:23:25 no
21:23:33 yeah, OK didn't think so
21:23:36 sqlite is "just" a DB library
21:23:37 dumb question from me, but can you explain the difference between spdk and the intel cas tech?
21:23:49 ^ not a dumb question
21:23:54 sure, good question
21:24:00 they are totally different for one thing
21:24:21 CAS is a caching project/product that works between an app and the FS.
21:24:52 SPDK is a whole bunch of stuff, but not caching layers. It has to be integrated with an application unless you use one of the things like the compiled iSCSI target
21:25:48 dunno if that's enough explanation - block cache vs library of stuff for integration, mainly polled mode device driver for NVMe
21:26:25 so Q for you guys, is there any urgency with container SSDs and latency and/or using a bunch of CPU?
21:26:29 peluse, so spdk provides performance improvements by substituting the FS and writing directly to block storage
21:26:37 do you handle caching in bdev or blobstore? or do you assume the underlying device is fast enough
21:26:41 tdasilva, yup
21:26:55 rledisez, there's no data caching at all right now
21:27:11 peluse: very similar to bluestore?
21:27:26 bdev is a layer for abstracting different types of block devices. For example we can have an NVMe at the bottom of the stack or a RAM disk and for layers above bdev they don't care. it's super lightweight
21:27:54 tdasilva, yeah, bluestore and blobstore are a lot alike but bluestore was done of course just for Ceph and I think is more mature/feature rich right now
21:28:26 but Sage mentioned in his keynote at SNIA SDC about looking at maybe using rocksdb w/blobstore at some point in the future (don't quote me though)
21:28:44 that would be in addition to bluestore as backing FS though, not instead of
21:28:54 peluse: what questions do you have for us?
21:29:02 peluse: ack, thanks
21:29:19 just the one above about pain points wrt latency and or CPU utilization around SSDs
21:30:11 well, and if anyone is interested enough to work with someone from the SPDK community to try and see if there's some sort of proof of concept worth messing with here
21:30:13 only pain points I've seen recently with the container layer are drive fullness and the container replicator not having all the goodness we've added to the object replicator for when drives fill up
21:30:52 rledisez: how about you? any latency or cpu issues on containers or accounts?
21:31:11 peluse: from my experience, there is not really a pain point about storage speed on containers. having a lot of containers slows down some processes (like replicator) as they need to scan all dbs. not sure yet if blobstore would help here
21:31:50 when I say CPU util, there's more in that deck I referenced, using SPDK (nvme + blobstore) greatly reduces CPU utilization while at the same time greatly improving perf
21:31:57 so you get kinda a two fer one thing
21:32:22 so for containers you'll get more CPU for other things happening on the storage node, and the IOs will be faster and more responsive
21:32:38 (or your money back)
21:32:44 heh
21:33:02 how can you measure the CPU usage related to kernel/fs? i don't think i see any, but i would like to check
21:33:12 most of the cpu usage comes from replicator or container-server
21:33:29 There's a perf blog on spdk.io that may have some good info in it, honestly I haven't read it :(
21:33:49 but we have some folks in our comm that live for that kinda stuff so I can ask there and get back to y'all
21:34:30 rledisez, yeah unless used for object storage wouldn't help w/replicator
21:34:48 if you have a magic command to get the cpu usage i would be interested (i guess it would be something related to perf command)
21:35:33 honestly, spdk sounds really cool. it seems like something that would be great for an all-flash future. (but I'm not sure if anyone deploying swift is there yet)
21:35:48 rledisez, yeah I dunno the details of the various measurements but the team has looked at every metric known to man using a variety of tools
21:36:25 peluse: do you have people in the spdk community who are interested in swift? if so, are they interested because they just want to integrate spdk everywhere or because they are using swift already?
21:37:00 Wewe is the only person I know that's brought it up and he wasn't able to get connected today due to network issues
21:37:33 right now there's more demand on features/integration than there is anything else so I don't think the former is driving anyone
21:37:52 ok
21:38:06 which is one of the reasons I wanted to chat w/you guys about this - if it doesn't make a lot of sense to investigate from your perspective we certainly have enough work on our plate :)
21:38:51 that's all I got for ya, other questions?
21:39:00 I think it makes sense when looking a few years into the future and preparing for that. it doesn't make sense in the sense that all of our current employers have a huge amount of stuff we need to do in swift way before we get to needing spdk
21:39:14 yup yup
21:39:14 (my opinion)
21:39:34 what is the current split of SSD usage, still mostly containers?
21:39:35 definitely something I want to keep an eye on
21:39:38 yeah
21:39:48 cool
21:39:57 flash still too expensive for interesting-sized object server deployments
21:40:08 makes sense
21:40:19 people these days are going for bigger nodes. 80 10TB in a single chassis
21:40:28 (and getting all the eww that implies)
21:40:51 well, that's not to say nobody on this end will work on a proof of concept anyways and if so I'll encourage them to check in with the Swift comm frequently of course...
21:40:53 i like the idea, and we can surely share some stuff between LOSF/blobstore but i think that people looking for really low latency object store will check ceph as by its design/implem, it looks more suited
21:40:54 let's move on so we can give m_kazuhiro appropriate time :-)
21:40:58 peluse: that's great!
21:41:08 thanks for the time guys!!
21:41:09 and thanks for stopping by to give an update
21:41:27 my pleasure... ping me later if anyone has followup questions. take care!
21:41:29 rledisez: I can get you in contact with peluse if you can't find him on IRC later
21:41:43 #topic symlinks
21:41:49 Yeah thanks peluse, sounds like cool tech :)
21:41:54 m_kazuhiro: looks like the discussions and code have been going well!
21:42:05 only one more big question, and that's for CORS, right?
21:42:12 #link https://etherpad.openstack.org/p/swift_symlink_remaining_discussion_points
21:42:34 notmyname: Yes. There is only one discussion point for symlink. It's about CORS.
21:42:59 Details are in #4 of the etherpad page.
21:42:59 Overview is that...
21:43:03 timburke and I talked about it as soon as I walked in the office this morning. he didn't even let me put down my bag! ;-)
21:44:00 When the symlink and the target are in different containers and these containers have different CORS settings...
21:45:19 clients will receive an error response to GET/HEAD on the symlink even if the request follows the CORS setting of the symlink container.
21:46:04 The discussion points are "Do we accept this behavior?" and "If we change the behavior, how should we change it?"
21:46:28 timburke: can you give a summary of what we talked about earlier? (you understand the context better than me)
21:48:23 m_kazuhiro: so is the error you mention because of ACLs, or because of CORS allowed-origin settings? 401/403 because of container ACL is definitely fine -- and we can currently return such (kinda curious) responses
21:49:09 (ie, 200 on the preflight OPTIONS request, but then a 401/403 on the subsequent GET/POST/whatever)
21:50:04 the behavior i'd expect, given two containers both publicly readable, one with a permissive allowed-origin one without
21:51:41 timburke: Because of ACLs. The case that concerns me is that the CORS setting is the same as the ACLs but clients will receive an ACL error even when following the CORS settings.
21:52:07 would be that a symlink from the permissive container into the "normal" one would work -- we'd 200 the OPTIONS request (because of the settings on the container that's actually in the HTTP request path), then allow the subsequent GET (because the ACLs on both containers allow it)
21:52:40 while the *other* way wouldn't work because we fast-fail the OPTIONS request and the ACLs have no bearing
21:54:25 m_kazuhiro: do we know the corresponding behavior for DLO/SLO?
21:54:51 sorry, I slept too much
21:55:05 * kota_ just woke up.
21:55:14 like, if i have a DLO in one container, which has one set of ACLs and one particular CORS setting, but all of its segments are in another container where everything's different...
21:55:25 timburke: I'm not sure about DLO/SLO.
21:57:30 it seems like a similar situation could arise -- following whatever precedent that gives us would at least have the advantage of consistency
21:57:55 +1
21:58:14 Sounds like a nice way of answering the question
21:58:31 :-)
21:58:51 timburke: So, the conclusion is that we should accept and keep current behavior. correct?
21:59:47 pretty sure. it's worth double checking (and probably having a func test or two that include OPTIONS requests)
21:59:56 yeah
22:00:01 +1
22:00:36 I was just thinking that with questions like this, a functional test for each and the question "which one do we want to get passing" would be a great way to do a discussion
22:00:49 ...and we're out of our time
22:00:58 generally, though, it seems like we rarely think about OPTIONS in middleware, so symlinks probably behaves like slo/dlo :-)
22:01:07 m_kazuhiro: I think you've got enough to go on today, right?
22:01:20 notmyname: Yes!
22:01:23 great!
22:01:30 thanks everyone for coming!
22:01:33 thank you for your work on swift
22:01:37 #endmeeting
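
Note on the symlink/CORS discussion above: the expected behavior described in the meeting (the preflight OPTIONS is judged only against the container in the request path, the follow-up GET against the read ACLs of both containers) can be modeled roughly as below. This is only a sketch of the reasoning, not Swift code: Container, preflight_allowed, and symlink_get_outcome are hypothetical names, and the real middleware may differ in details such as status codes.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical stand-in for a container's CORS and read-ACL settings."""
    allowed_origins: set = field(default_factory=set)  # "*" means any origin
    public_read: bool = False


def preflight_allowed(request_container, origin):
    # Only the container named in the HTTP request path is consulted.
    origins = request_container.allowed_origins
    return "*" in origins or origin in origins


def symlink_get_outcome(symlink_container, target_container, origin):
    # Step 1: the browser's preflight OPTIONS hits the symlink's container.
    if not preflight_allowed(symlink_container, origin):
        return "preflight rejected"
    # Step 2: the actual GET must pass the read ACLs of both containers.
    if not (symlink_container.public_read and target_container.public_read):
        return "preflight ok, GET denied by ACL"
    return "GET allowed"


if __name__ == "__main__":
    permissive = Container({"*"}, public_read=True)
    normal = Container(set(), public_read=True)
    origin = "http://example.com"
    print(symlink_get_outcome(permissive, normal, origin))
    print(symlink_get_outcome(normal, permissive, origin))
```

The two calls cover the two directions discussed: a symlink in the permissive container pointing into the "normal" one, and the reverse. A functional test per direction, including the OPTIONS request, would confirm whether the real behavior matches.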