21:00:45 #startmeeting swift
21:00:46 Meeting started Wed May 24 21:00:45 2017 UTC and is due to finish in 60 minutes. The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:47 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:49 The meeting name has been set to 'swift'
21:00:53 who's here for the swift team meeting?
21:01:00 here
21:01:04 hello
21:01:04 here!
21:01:08 o/
21:01:09 o/
21:01:15 o/
21:01:22 o/
21:01:23 here!
21:02:01 hi
21:02:03 o/
21:02:07 it's torgomatic and jrichli!
21:02:19 yay!
21:02:20 is cschwede_ going to be here?
21:02:26 acoles: ?
21:02:38 hello
21:02:50 * acoles was in wrong channel
21:03:19 ok, I'm told cschwede_ is out this week
21:03:19 me too
21:03:21 let's get started
21:03:26 #link https://wiki.openstack.org/wiki/Meetings/Swift
21:03:36 #topic previous meeting follow-up
21:04:04 * TC goals. there's a mailing list thread. I wrote some words on it (but not the governance patches)
21:04:17 not much more to discuss there. just stuff to track and get done
21:04:36 let's skip to some good stuff :-)
21:04:43 * global EC patches
21:04:53 composite rings landed! yay!
21:04:58 notmyname: do you want me to draft a patch that documents when/how swift complies with the wsgi goal?
21:05:08 acoles: sure, that would be great
21:05:17 and per-policy config should likely land today, I think
21:05:44 yey
21:05:44 acoles: timburke: kota_: clayg: anything more to say on the global ec patches?
21:06:06 that will give us something else to scratch off on https://wiki.openstack.org/wiki/Swift/PriorityReviews
21:07:01 speaking of, we should totally go merge https://review.openstack.org/#/c/302494/, too
21:07:02 patch 302494 - swift - Sync metadata in 'rsync_then_merge' in db_replicator
21:07:15 er
21:07:35 i think some follow-ups are missing in the list (e.g. https://review.openstack.org/#/c/465878/)
21:07:36 patch 465878 - swift - Small minor fixes for composite ring functionality
21:07:54 kota_: true. I haven't added the follow-ups
21:08:03 looking now...
21:08:04 i suppose timburke also has a follow-up doc change for composite rings
21:08:32 patch 465878 should be an easy review
21:08:33 https://review.openstack.org/#/c/465878/ - swift - Small minor fixes for composite ring functionality
21:08:36 IIRC
21:08:46 huh?
21:10:06 in fact maybe I should have just +A'd it
21:10:07 I'll update the priority reviews wiki to include the follow-up patches
21:10:27 ...at least those that acoles hasn't already +A'd ;-)
21:10:50 notmyname: thx, and I'll help with that if I find something missing
21:11:00 kota_: thank you
21:11:14 i didn't notice we have a global ec patch list in the priority reviews
21:11:38 whoot global EC!
21:11:43 are tdasilva and cschwede_ around?
21:11:51 kota_: you've just got me bugging you about it :-)
21:11:59 clayg: scroll up. yes tdasilva. no cschwede_
21:12:09 o/
21:12:09 * cschwede_
21:12:35 #topic LOSF follow-up
21:12:40 rledisez: jeffli: hello!
21:12:54 #link https://etherpad.openstack.org/p/swift-losf-meta-storage
21:12:54 thx jeffli for joining :)
21:12:54 hello, john
21:13:20 first of all, thx to everybody for the feedback on the etherpad
21:13:21 what's going on with your competing implementations this week?
21:13:42 so, correct me if i'm wrong jeffli, but we've been discussing a lot
21:13:44 notmyname: it's more like collaborating!
21:14:00 and i think we settled on a volume format
21:14:07 rledisez: great!
21:14:08 that's the first good news
21:14:09 whooat!
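The composite ring work referenced above merged as swift.common.ring.composite_builder. A minimal sketch of how two per-region builders might be combined for a global EC policy follows, assuming the compose_rings() helper from that change; the builder file names and output path are invented for illustration, and the exact API should be checked against the merged code.

```python
# Sketch only: combining two regional ring builders into one composite
# ring for a global EC policy. Builder file names are hypothetical.
from swift.common.ring import RingBuilder
from swift.common.ring.composite_builder import compose_rings

# Each region maintains its own builder (its own devices and part power).
builders = [
    RingBuilder.load('region1-ec.builder'),
    RingBuilder.load('region2-ec.builder'),
]

# compose_rings() concatenates the component replicas into a single
# RingData; the result is what the proxy and object servers load.
composite = compose_rings(builders)
composite.save('/etc/swift/object-1.ring.gz')
```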
21:14:25 🍻
21:14:26 now, what we need to focus on is the K/V format
21:14:47 key format, value format, and also which DB (rocksdb, leveldb, something else)
21:14:50 meaning rocks vs leveldb? or schema and file format?
21:14:51 ah ok
21:14:55 so all of that :-)
21:14:57 both :)
21:15:01 just write a file in xfs whose contents point to the volume
21:15:28 also, related to some raised questions about one big K/V
21:15:37 we made some benchmarks on memory usage
21:15:50 nice. do you have the results to share?
21:15:57 it turns out splitting the K/V into multiple small K/Vs would mean a significant memory increase
21:16:20 no numbers right now, it was a very quick test, but i'll ask alecuyer to write that down somewhere
21:16:25 ok, thanks
21:16:41 link them on the ideas page (best) or in the existing etherpad (not great, but ok)
21:16:48 sure
21:17:07 i think that's all right now for me
21:17:12 rledisez: jeffli: is there anything you two need from the rest of us during this next week?
21:17:58 i can't find anything right now. we are perfecting the flat namespace (to make it easier to increase part power)
21:18:10 i want to raise another question here
21:18:28 jeffli: go ahead
21:19:38 it is about how to map volumes to partitions. Currently Romain uses multiple volumes per partition. We use one volume per partition.
21:19:56 that's a good question
21:20:14 rledisez: didn't you use more small volumes to avoid some compaction/gc issues?
21:20:35 the reason to have multiple volumes is first for concurrency: as volumes are append only, a volume is locked during an upload
21:21:07 second reason as notmyname said is for limiting the compaction effect when it happens (no need to move 40GB of data at once)
21:21:36 so yeah, that makes lots of smaller volumes sound better (concurrency, so num_volumes should be roughly the same as the concurrency the drive supports)
21:21:46 jeffli: what's your take on that?
21:22:26 first, mapping one volume per partition is simple. And we can increase the partitions to reduce the contention
21:23:00 is # of volumes/journals essentially orthogonal to # of k-v/indexes
21:23:22 or are they related in your design (jeffli, rledisez)
21:23:23 clayg: no, right: one K/V, multiple volumes
21:23:31 in both designs
21:23:32 if you take the limit as files-per-partition goes to infinity, you wind up in the current situation where there's one volume per object... I don't know if that helps anything, but it's sort of interesting
21:23:39 one K/V for... each device? node?
21:23:45 er, volumes per partition, not files
21:23:49 clayg: each device
21:24:13 torgomatic: ohai!
21:24:27 torgomatic: in our design, the idea is to have a limit of volumes per partition, so it's a kind of limit on concurrency per partition
21:24:33 rledisez: ack - jeffli ?
21:24:34 hmm... we should probably stop using "volume" to refer to the big flat file containing objects (since "volume" often means drive)
21:24:47 how's "slab file"?
21:24:53 or "slab"
21:24:58 journal
21:25:05 notmyname: I was just about to suggest that exact word
21:25:13 by default we limit to 10, which in our experience should be enough, but we will send a metric each time this number isn't enough so that the operator can adjust
21:25:27 it is called 'log' in Ambry. So journal could be ok.
21:25:38 rledisez: is it something the operator has to set up, or does it change over time based on load/cardinality?
21:26:11 * jungleboyj sneaks in late.
21:26:12 it would probably have to be set up, with a "reasonable" default
21:26:14 clayg: the journal/log is the kv store, right? the slab is the big chunk of disk where the objects are
21:26:17 slab makes me think of idk... like I allocate blocks and fill them in and stuff - but AIUI no one is doing that - you just append to the end - I don't think that is a slab - but w/e
21:26:19 rledisez: hmm
21:26:34 naming things is hard ;-)
21:26:35 just don't call them kittens - cats make me sneeze so I wouldn't be comfortable with that name
21:26:42 clayg: done
21:26:55 or cantaloupe - i don't care for it - too sweet
21:26:59 clayg: i'll write that down in case we forget :)
21:27:05 don't call the volume/journal/slab cantaloupe
21:27:10 ok, so 2 outstanding questions to answer this week (we can discuss them outside the meeting)
21:27:34 (1) a better name for "volume". perhaps slab, journal, or log
21:27:52 (2) how many of them per hard drive
21:28:02 is that accurate?
21:28:19 lgtm
21:28:21 jeffli: rledisez makes a good case for contention if you *don't* have multiple volumes per device (at some point it's better to queue the request and possibly let the proxy timeout than it is to throw another i/o into the kernel)
21:28:47 jrichli: are we done talking about object-metadata *in* the K/V then?
21:28:53 jeffli: rledisez: if it's operator defined, then maybe some people will start with 1 and others will start with 10. should be able to support both, no?
21:29:08 gah... sorry - jrichli hi - I meant jeffli RE: *object* metadata
21:29:15 np
21:29:31 sure, but i think jeffli wants to make the request wait until the "volume" is available, right jeffli?
21:29:40 tdasilva: that sounds ... like something we might should avoid? ie let's not repeat part power type questions that ops really have no business deciding
21:29:48 while we prefer to fail quickly
21:30:03 Alex confirms that the number of volumes per partition is configurable. But I am trying to make it simple so we don't care about how to select a volume in a partition
21:30:04 rledisez: ah, yes now we're getting to something
21:30:13 notmyname: interesting point, decide for the operator, we never thought of that
21:30:28 rledisez: that's cause YOU are the operator! ;-)
21:30:40 i like to decide by myself :D
21:30:43 lol
21:30:47 i like to decide too
21:30:59 every power user likes to decide
21:31:04 step #1 think I know something and write code I think is smart step #2 realize I didn't know *shit*
21:31:09 yes, since we implement that in Go, if a volume is locked, the write attempt will block.
21:31:38 ok, 3 more minutes on this topic, then we need to move on...
21:32:21 jeffli: rledisez: (clayg?) where do you want to continue this conversation? IRC? email? phone call? etherpad?
21:32:21 we'll try to think of a way it could work magically
21:32:28 I'm not sure why - but my gut is let it block? how *do* you decide which of N volumes on a device ultimately gets the key allocated to them?
21:32:28 rledisez: perfect!
21:33:03 we can continue by mail, i think alex proposed a webex to jeffli also
21:33:11 yes
21:33:18 oh.. i'm just interested - there's some off chance I might know something useful about swift that is helpful to contribute but most likely I'll just be asking questions, playing devil's advocate and rubber ducking - not sure I need to be involved if i'm not writing code
21:33:30 :-)
21:33:39 jeffli: just note that me and alex will be off until monday, so probably no answers to mail or whatever :)
21:33:49 IRC is too difficult for me. Haha
21:33:52 would be happy to set up a bluejeans session if folks are interested
21:33:54 Ok.
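To make the design discussed above concrete: objects are appended into a bounded set of append-only volume files per partition, each volume is locked while an upload is in progress, and a single per-device K/V store maps each object key to its location inside a volume. The sketch below is hypothetical (none of the names come from the actual patches); it only illustrates the volume-selection question clayg raised, contrasting the fail-fast limit of 10 volumes per partition with the block-until-available approach.

```python
# Hypothetical sketch of the LOSF write path discussed above:
# one K/V index per device, up to a fixed number of append-only
# volume files per partition, each locked for the length of an upload.
import threading

MAX_VOLUMES_PER_PARTITION = 10    # default discussed above; operator-tunable


class Volume(object):
    """One append-only volume file within a partition (name is made up)."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()  # append-only: one writer at a time
        self.offset = 0               # next append position in the file


def pick_volume(volumes, block=False):
    """Return a locked volume for writing, or None to fail fast."""
    for vol in volumes:
        if vol.lock.acquire(False):
            return vol                # first unlocked volume wins
    if len(volumes) < MAX_VOLUMES_PER_PARTITION:
        vol = Volume('vol_%d' % len(volumes))
        vol.lock.acquire()
        volumes.append(vol)
        return vol                    # grow up to the per-partition limit
    if block:                         # block until a volume frees up
        volumes[0].lock.acquire()
        return volumes[0]
    return None                       # fail quickly (e.g. 503 to the proxy)


def put_object(kv_index, volumes, key, data):
    """Append data to a volume and record its location in the K/V index."""
    vol = pick_volume(volumes)
    if vol is None:
        raise IOError('all volumes in this partition are busy')
    try:
        offset, length = vol.offset, len(data)
        vol.offset += length          # the actual write/fsync is not shown
        # K/V value: just enough to find the object inside its volume
        kv_index[key] = (vol.path, offset, length)
    finally:
        vol.lock.release()
```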
21:33:56 jeffli: that's ok
21:33:59 i like to see conversations evolve on etherpads
21:34:04 clayg: playing devil's advocate is very useful!
21:34:10 they get messy and you have to throw them out
21:34:27 rledisez: jeffli: would it be possible to get the webex link in IRC so others can join?
21:34:36 notmyname: sure
21:34:40 but like specifically for "how to select which volume and what to do when that one is locked... or the next"
21:34:53 and if there's email, then keeping them on the os-dev mailing list (with [swift] in the subject) will help others stay involved
21:35:14 I think someone could write 20 sentences about what they did and I could spin questions out
21:35:18 email is fine if that's best
21:35:25 notmyname: sure, we'll move the discussion to a public place :)
21:35:30 Got it. Then I think we can discuss that in email.
21:35:30 i also agree doing it on os-dev would be *very* good
21:35:33 rledisez: jeffli: ok, thanks. and please let me know how I can help coordinate that, if needed. and don't forget tdasilva's offer of video chat :-)
21:35:35 like *very* good
21:35:48 I agree (about the mailing list being a good place)
21:35:48 Email to os-dev
21:36:01 clayg: will do ;)
21:36:08 zaitcev: look! using the ML!?
21:36:17 ok, two more topics
21:36:23 #topic swift on containers
21:36:28 tdasilva: ?
21:36:30 zaitcev: never thought he would see the day - having a positive influence on us after all these years after all ;)
21:36:36 patch 466255
21:36:37 https://review.openstack.org/#/c/466255/ - swift - Make mount_check option usable in containerized en...
21:36:39 tdasilva: yeah! what's going on with containers and stuff!
21:36:47 clayg, yes, I can die with regrets tomorrow
21:36:57 s/with /without /
21:37:17 zaitcev: please don't - i mean... just finish the PUT/POST patch first ;)
21:37:34 * jungleboyj hears containers are the next big thing. ;-)
21:37:56 clayg: I think your comments summarized it well there. basically this patch helps us support deploying swift on containers on tripleo
21:38:22 so, about that extra lstat()
21:38:42 the issue is that we are trying to run each swift service in its own container, and we can't mount the disk in each of them, so we needed another way to mount_check
21:38:43 is it a real issue or not? Peter Portante used to think it was.
21:39:43 tdasilva: notmyname: my big idea on patch 466255 and others that might "add cruft in the name of containers!" was maybe we should do a feature branch and encourage those interested in working on/progressing the topic of swift on containers to merge merge merge!
21:39:43 https://review.openstack.org/#/c/466255/ - swift - Make mount_check option usable in containerized en...
21:39:58 I think he ran a download or something and a significant number of syscalls were those endless stat() calls. But then they probably aren't that expensive.
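For context on the patch being discussed: mount_check today is essentially an os.path.ismount() on /srv/node/<device>, and inside a container the bind-mounted device directory is not itself a mount point, so the check fails even though the disk is there. The sketch below only illustrates that problem and the general shape of a workaround; the marker file name is invented and is not necessarily what patch 466255 actually does.

```python
# Rough illustration of the containerized mount_check problem, not
# Swift's real code. The '.swift-device' marker name is hypothetical.
import os


def device_ok(devices_root, device, mount_check=True):
    path = os.path.join(devices_root, device)      # e.g. /srv/node/sdb1
    if not mount_check:
        return os.path.isdir(path)                 # mount_check=false case
    if os.path.ismount(path):
        return True                                # normal bare-metal case
    # In a container, the device directory is bind-mounted but is not a
    # mount point itself, so allow an operator-provided marker instead.
    return os.path.exists(os.path.join(path, '.swift-device'))
```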
21:40:50 then we can go back at the end with - here is the whole kube/quartermaster story - there's some non-trivial value here and we've done all the leg work - so let's rally folks together to maybe clean some stuff up and make better docs and voila, patch series merge awesomesauce new world order
21:41:23 clayg: I think you have a valid point, OTOH, I think that's a small enough patch and it doesn't necessarily point to there being a ton more patches like it to support swift on containers
21:41:42 so I'd hold off on moving this to a feature branch
21:41:46 or we can eat it piecemeal as we go - I care more about the visibility/community-velocity/contributions than I do about adequate docs and some random lstat call
21:42:28 You know what the next step is going to be
21:42:48 tdasilva: if you really think there's not that many then I think cschwede_ should go ahead and do this patch right - but like... we haven't managed to get part-power-increase merged - or symlinks - or any other number of things
21:42:57 about this patch, it reminds me of something i wanted to do, like having a flag-file saying this device should not be accessed anymore, like before i know i'll take the drive off because it needs replacement. is there something to do with this idea and the patch?
21:43:22 Like TripleO, Swift on undercloud, containing the ring. It has all the right IPs there and devices. And then little containers in the overcloud downloading ring.gz from that undercloud Swift. Voila, cluster map. Just like Ceph!
21:43:28 rledisez: hummingbird had something like that - I also wanted to flag drives as READ_ONLY
21:43:46 clayg: yeah, i like that too
21:44:36 so, do open() instead of stat() and maybe look at contents...? If we don't care about overhead anymore suddenly.
21:44:50 rledisez: no that's not related to this patch - but it would be a huge improvement over this undocumented magic file pull-string to make things that aren't mount points tell mount_check they are (?)
21:45:25 zaitcev: I mean... rledisez's idea could be cached - check mtime every 30 seconds w/e
21:45:43 ok, there's a lot going on in here
21:45:47 or, have an explicit device list somewhere in /srv/node/DEVICES. Read that once, not for every drive.
21:45:52 zaitcev: you know the history on mount_check - and how it's inherently racy - https://launchpad.net/bugs/1693005
21:45:53 Launchpad bug 1693005 in OpenStack Object Storage (swift) "mount check is racy" [Undecided,New]
21:45:56 helps if you have 30 drives
21:46:00 what do we need to do next to make progress on https://review.openstack.org/#/c/466255/1
21:46:05 zaitcev: *totally*
21:46:08 patch 466255 - swift - Make mount_check option usable in containerized en...
21:46:10 xattr on the mountpoint? (is that even possible?) the inode would be in cache every time so it could be fast
21:46:43 (1) it seems like there's a ton of stuff to talk about with drives
21:46:45 notmyname: merge it! it's not the worst cruft we've ever taken on and all the "right ways to fix it" aren't really the author's problems. PLUS *containers*
21:47:00 (2) tdasilva doesn't think we need a feature branch
21:47:04 notmyname: I'd suggest waiting for cschwede_ to be back next week and comment on the direction he would like to take. I also would prefer a less hacky approach, and maybe we can live with mount_check=false for now
21:47:10 It always was racy, yes. The idea is to prevent Swift from doing a ton of work only to find that an OSError pops
21:47:11 yes
21:47:36 tdasilva: good. can you cork it (a -2) until he gets back?
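The flag-file idea above could be sketched roughly like this: a small per-device state file that servers re-read at most every 30 seconds, so the extra open()/stat() cost is paid rarely. Everything here (the file name, the states, the interval) is hypothetical and only illustrates the "cache it and check mtime every 30 seconds" suggestion, not an existing Swift feature.

```python
# Hypothetical sketch of a cached per-device state flag, as discussed
# above (mark a drive READ_ONLY / do-not-use before swapping it out).
import os
import time

CHECK_INTERVAL = 30          # seconds between re-reads of the flag file
_state_cache = {}            # device path -> (last_checked, state)


def device_state(device_path):
    """Return 'ok', 'read_only', or 'disabled' for a device directory."""
    now = time.time()
    cached = _state_cache.get(device_path)
    if cached and now - cached[0] < CHECK_INTERVAL:
        return cached[1]     # avoid a syscall on every request
    state = 'ok'
    flag = os.path.join(device_path, 'device_state')  # hypothetical file
    try:
        with open(flag) as f:
            state = f.read().strip() or 'ok'
    except IOError:
        pass                 # no flag file means the device is usable
    _state_cache[device_path] = (now, state)
    return state
```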
21:47:56 okay, so the decision is to delay and kick the mount check can down the road a bit? I can rubber-stamp that.
21:48:01 and we need to get resolution on the lstat() or not
21:48:08 right?
21:48:42 notmyname: done
21:48:45 thanks
21:49:04 ok, moving on. one more big topic
21:49:09 tdasilva: with correct permissions you can avoid ever filling up the root partition - if you plumb check_dir into the account/container server (there's an existing patch from PavelK) you can have a fast-507 response - I see no *technical* reason to need to make unmounted paths pretend to be mount points
21:49:20 thanks for being patient, especially for those who live where it is very early or very late
21:49:23 this next topic is for you
21:49:34 #topic new, alternate meeting time
21:50:15 in Boston, we talked about the fact that many of us cannot be at this meeting (hello mahatic) or it's very difficult (hello kota_, matt, cschwede_, etc)
21:50:16 clayg: totally makes sense. thanks for the detailed explanation on bug https://bugs.launchpad.net/swift/+bug/1693005
21:50:17 Launchpad bug 1693005 in OpenStack Object Storage (swift) "mount check is racy" [Undecided,New]
21:50:29 \o/
21:50:37 so we proposed having a new bi-weekly meeting at a better time
21:50:50 here's the proposal...
21:51:03 Wednesdays at 0700, every two weeks, starting next week
21:51:21 (this is at 12:00am wednesday in california)
21:51:39 this time slot is pretty good for just about everyone who doesn't live between Boston and SF ;-)
21:52:09 mahatic has agreed to chair the first meeting. I'll probably try to be there, too, for the first one
21:52:31 but mahatic has also asked to rotate the chair. that will be one of the first meeting topics
21:52:46 it is 3pm in China. That would be great.
21:53:03 jeffli: yes. you're amazing for being here at this meeting now :-)
21:53:16 the plan is to have a community sync point, like this meeting, where those who don't live in North America can more easily attend
21:53:36 Darn timezones, 2 am in Minnesota.
21:53:46 we'll still have this weekly meeting
21:53:52 like all the things we've done in the past, we'll try it out and adjust as needed
21:54:03 the point is that our global community can communicate
21:54:07 and please *talk* about stuff! Whatever you're working on - just get together and make sure you say which bugs are annoying you and which patches you wish you had time to review (or don't understand why they're not merged)
21:54:14 would the topics be the same or a different topic list?
21:54:20 clayg: well said
21:54:26 rledisez: different topic list
21:54:26 I may not be awake/online at 12:00pm - but I *will* read the logs
21:54:34 every week
21:54:49 notmyname: 0700 UTC right?
21:54:53 unless they get boring - so make sure you throw in a few jokes at my expense to keep me on my toes
21:54:54 this week, I will work with mahatic on getting the meetings wiki page updated with an agenda, and I'll also send a ML announcement about it
21:54:58 acoles: correct
21:55:06 rledisez: I think it should have its own agenda - but I think we should keep it on the same page
21:55:21 let me make sure about the bi-weekly setting
21:55:39 if nothing specific comes up - you can discuss what was discussed in the past two weeks of regularly scheduled meetings
21:55:51 do we have a couple of meetings per 2 weeks?
21:55:55 the challenge for all of us is to ensure that we don't use it to have two separate communities. we've still got to all be communicating (not the meeting 1 group and the meeting 2 group doing different things)
21:56:02 or switch the time?
21:56:12 "does anyone who wasn't there have any thoughts on that stat call zaitcev was going on about last week?"
21:56:16 the first meeting will be in 7 days. the second will be 21 days from now
21:56:24 ie 2 weeks after the first meeting
21:56:42 May 31 and June 14
21:56:45 clayg, you've already shot that question down as the root of all evil, by Knuth
21:57:05 notmyname: does that mean that for folks that attend both meetings, they might have two meetings on the same day that are either very late or very early? i'm thinking of someone like kota_
21:57:24 tdasilva: correct (however 0700 isn't very late for him. 4pm)
21:57:32 or maybe not, i think it's already thursday for him
21:57:50 kota_: I think this meeting isn't going away or changing
21:57:50 I DO NOT expect that anyone should feel required to attend both meetings
21:57:51 tdasilva: strictly speaking, it's not both in a day (Wed and Thu morning)
21:57:57 Japan rises right about this time, I can tell by looking at Pawoo.
21:58:04 clayg: ok, thx
21:58:30 kota_: half as often - in a different timezone some of the same people that come here will come to a different meeting with perhaps other people that struggle to make this meeting and try to remember that swift is happening *everywhere*
21:58:42 and when people who use swift talk - ~magic~ happens
21:58:49 tdasilva: the Europeans will get to have two meetings on the same day :)
21:58:52 I'll also admit that the 2-meeting coordination will be hard. but we've got a great community, and we'll work it out
21:59:17 tdasilva: acoles *loves* the swift meeting tho - he wishes he could have three a day every day
21:59:17 yikes, we're at full time for this meeting
21:59:20 acoles: oh yeah, it's the Europeans that have a sucky time, like 7am and 22
21:59:35 clayg: I have the log on continuous replay ;)
21:59:45 it will be a lot better for onovy and seznam
21:59:52 right
22:00:05 it's 8/9 am for us
22:00:11 oops, run out of time?
22:00:12 onovy: make PavelK show up too!
22:00:21 clayg: ok
22:00:23 :D
22:00:36 ok, we're at time. gotta close this meeting. thank you, all, for your work on swift!
22:00:39 this is really cool, i'm excited about this second meeting and making it easy for a huge part of the community to participate
22:00:39 #endmeeting