21:00:35 <notmyname> #startmeeting swift
21:00:36 <openstack> Meeting started Wed Aug 19 21:00:35 2015 UTC and is due to finish in 60 minutes.  The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:40 <openstack> The meeting name has been set to 'swift'
21:00:54 <notmyname> who's here for the swift meeting?
21:00:56 <jrichli> hello
21:00:57 <mattoliverau> o/
21:00:57 <cschwede> o/
21:00:59 <minwoob_> o/
21:01:00 <redbo> hi
21:01:00 <tdasilva> hi
21:01:01 <hurricanerix> o/
21:01:01 <blmartin> Here by phone
21:01:11 <mahatic> ^
21:01:13 <mattoliverau> So this is what it feels like to have a meeting at a normal hour :P
21:01:14 <ho> hi
21:01:33 <torgomatic> <--
21:01:37 <notmyname> mattoliverau: heh
21:01:39 <aerwin> o/o/
21:01:43 <nadeem> hi
21:01:44 <notmyname> mahatic: are you awake at some crazy hour?
21:01:45 <mahatic> mattoliverau: so this is what it feels like at a very odd hour :P
21:01:47 <acoles> here
21:01:58 <mattoliverau> mahatic: lol
21:02:01 <notmyname> mahatic: 2:30am for you?
21:02:17 <cutforth_> hello
21:02:22 <mahatic> notmyname: yes! weekly meeting cancelled for 6 am tomorrow! so I thought I'd pull off a couple more hours :P
21:02:30 <notmyname> wow
21:02:37 <notmyname> ok, then let's get started
21:02:41 <notmyname> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:02:54 <notmyname> #topic hackathon summary
21:03:06 <notmyname> last week we had a great week (IMO) in austin at the hackathon
21:03:20 <notmyname> I'd like to again thank jrichli for hosting and coordinating everything on the ground
21:03:20 <mattoliverau> +100
21:03:26 <mattoliverau> +100
21:03:29 <notmyname> also for the cookies ;-)
21:03:33 <mattoliverau> +100
21:03:37 <clayg> yup jrichli cookies were great
21:03:48 <acoles> great job jrichli
21:03:50 <jrichli> You all are welcome!  it was fun
21:04:07 <minwoob_> Thanks for the cookies.
21:04:15 <clayg> nom nom nom
21:04:22 <notmyname> I'm currently writing up a summary of the week in a blog post. should be published later this week (that's the plan)
21:04:22 <jrichli> :-)
21:04:29 <notmyname> but for a summary of the summary...
21:04:42 <clayg> heh
21:04:44 <mahatic> :D
21:04:51 <notmyname> landing patches faster: need better tools and communication
21:05:04 <notmyname> for tools, there's a couple of things to look in to
21:05:04 <redbo> it was great, jrichli!
21:05:18 <jrichli> glad you liked it redbo!
21:05:19 <notmyname> I've been told that gerrit doesn't have sticky comments
21:05:41 <notmyname> and I'm still working on a "group prioritization" thing to find who's starred what patches
21:05:56 <notmyname> but the basic thing I took away there was to communicate more on patches
21:06:15 <notmyname> if there's already a +2, don't push a patch set over it unless necessary
21:06:23 <notmyname> if you rebase, make a note saying that's all you did
21:06:41 <notmyname> if you push a patch set up, give a quick sentence explaining the difference
21:06:59 <notmyname> that will help us all use the tool better (until such time as we have better tools)
21:07:18 <notmyname> also, lots of stuff on EC issues and getting it to "done"
21:07:32 <notmyname> several things identified and being worked through now
21:07:50 <notmyname> it means there are some good patches up to review now, and take a look at the bugs tagged "ec"
21:08:07 <notmyname> I think that we'll be able to clear up these known issues and have something reasonable by tokyo
21:08:18 <zaitcev> I have to admit I started skipping EC recently.
21:08:34 <notmyname> that's like the opposite of the right thing! ;-)
21:08:39 <clayg> lol
21:08:43 <zaitcev> I know we have Clay, Kota, Sam et al on the job, so it's easy to get lazy.
21:08:46 <notmyname> zaitcev: we missed you last week
21:08:55 <notmyname> zaitcev: oh, I can sympathize with that :-)
21:09:33 <notmyname> we also talked about encryption and what's going on there
21:10:00 <notmyname> jrichli has organized https://trello.com/b/63l5zQhq/swift-encryption so that it's clear for the big stuff outstanding
21:10:11 <notmyname> in the "to do for reduced scope" column
21:10:42 <notmyname> I think we'll be able to make some great progress on this before tokyo, but I'd be surprised if we get it fully done by then
21:11:02 <notmyname> (unless someone tells me "oh yeah it's simple and we're totally already done")
21:11:15 <jrichli> :-)
21:11:18 <acoles> we'll get the PUT path done :)
21:11:50 <notmyname> also at the hackathon we talked about global cluster improvements. not a ton of stuff there except maybe some handoff affinity. most of the rest is deployment management
21:12:22 * cschwede is curious about handoff affinity
21:12:28 <clayg> notmyname: is there a bug on handoff affinity?  I still don't think I understand exactly how the sorting is going wrong for us?
21:12:53 <mattoliverau> are we talking write affinity or handoff affinity
21:13:01 <clayg> cschwede: right?!  like I think we're supposed to already do this in the proxy, I think something is wrong - but I don't really understand what's breaking down
21:13:03 <notmyname> handoff, I thought
21:13:40 <notmyname> choose a handoff in the same region as the failed primary
21:13:42 <clayg> mattoliverau: it must be the affinity of a write when write affinity is turned on and some amount of primaries is down
21:13:47 <notmyname> maybe I misremembered
21:14:31 <mattoliverau> clayg: yup ok, I've also got a patch up there to add handoff affinity to the replicators
21:14:35 <notmyname> yeah, we need to actually have a bug written down
21:14:41 <cschwede> well, even if primaries are not down but use write affinity - you might need handoffs_first? but maybe needs to be discussed outside the meeting. i’m really curious!
21:14:44 <mattoliverau> so we're talking from the proxy side, got it
21:15:21 <cschwede> sometimes proxy is not enough? ie prevent new data written to a node, but also move data asap somewhere else?
21:15:27 <clayg> handoff affinity in the replicators?  like when a primary gets a 507 for a partner in the same region? I don't think I have that starred
21:15:56 <clayg> we need to open a launchpad tag for global-cluster and makes some issues
21:16:00 <mattoliverau> clayg: yeah, it was a request from an OP during the ops session in Vancouver
21:16:01 <notmyname> ok, so it seems like there's something to with handoffs. and they might (or might not) have affinity.
21:16:04 <notmyname> clayg: good idea
21:16:16 <mattoliverau> lol
21:16:21 <mattoliverau> righ moving on :P
21:16:25 <notmyname> heh, ok
21:16:26 <notmyname> the full clusters discussion was good, I thought. got a list of stuff to write down in some "what to do when swift asplodes" doc. also some ideas as to new tools to write to help in cluster full scenarios
21:16:33 <clayg> mattoliverau: everyone keeps saying someone in vancouver said something about affinity - but I'm not sure that anyone knows who - or wrote down what they actually wanted
21:16:54 <mattoliverau> yeah, it's on the priority reviews wiki
21:17:03 <notmyname> mattoliverau: not in any detail, though
21:17:10 <mattoliverau> that's true
21:17:25 <notmyname> mattoliverau led a good discussion on container sharding
21:17:29 <clayg> mattoliverau: "handoff affinity (handoffs in the same region)" <= i don't know what that means or what's broken
21:17:35 <mattoliverau> Well I have something up, and hopefully the op in question will review :P
21:17:39 <notmyname> one thing I liked about it is that there are a lot more people looking/thinking about it
21:17:56 <clayg> yay container sharding!  redbo will know what to do!
21:17:59 <mattoliverau> yeah, I'll write up a new version of the SPEC to include what was discussed (maybe on the plane ride home)
21:18:04 <notmyname> and it looks like there might be a reasonable reduced-scope plan of what to do
21:18:05 <mattoliverau> ^ sharding
21:19:15 <notmyname> redbo and dfg and nadeem and allen told us about hummingbird. I like it a lot and they did a great job. I have a much better idea of what the goal actually is and the plans to solve the problem
21:19:25 <mattoliverau> +1
21:19:39 * clayg wonders if he'll ever get back to his hummingbird patch
21:19:44 <zaitcev> I don't have a hard requirement for c sharding, but I'm very enthusiastic for Matt actually doing it. Swift needed it for years, but we got by with application-controlled split-up for containers.
21:19:45 <notmyname> for those not there, the problem being solved is summed up as "object replication is slow in big clusters" (somewhat simplified)
21:20:12 <notmyname> so the golang object server + object replication processes help with speeding up replication
21:20:18 <notmyname> and that right there is a really good thing
21:20:51 <notmyname> so the initial numbers they presented were pretty encouraging, and they shared some tools on how we can all duplicate the tests in our own environment
21:21:28 <cschwede> can you give a rough estimate? like 10% faster, or more like 10x?
21:21:37 * cschwede needs faster object replication
21:21:49 <clayg> notmyname: oh yeah!  I want to try that with keep_cache_private = true
21:21:51 <notmyname> but also, there was the common understanding that the hummingbird branch is intentionally simplified (eg no policy support) for now, and it's a long-term thing
21:22:49 <clayg> cschwede: I think it was more like "can rebalance as fast as we're adding capacity" vs. "can not rebalance as fast as we're adding capacity"
21:22:54 <clayg> #10PB+problems
21:23:03 <notmyname> cschwede: the numbers for now are more about asyncs and .... yeah, what clayg said
21:23:32 <notmyname> right. how many partitions can get rebalanced when adding significant capacity
21:23:33 <ho> cschwede: put 1.5X, get 4.6X, delete 1.4X in my memo.
21:23:43 <notmyname> ho: thanks!
21:23:51 <notmyname> redbo: is that right? ^
21:24:03 <clayg> cschwede: oh, I thought you were talking about replication not the object server
21:24:14 <acoles> cschwede : same as ho in my notes, ho looked it up faster!
21:24:25 <clayg> I thought reads form the page cache were like 6x faster
21:24:44 <ho> acoles: hehe
21:24:54 <clayg> and I thought the os threads for blocking syscalls got them like near 2x on writes & deletes
21:24:58 <cschwede> clayg: ho: notmyname: acoles thx! sounds great. yeah, i had replication in mind primarily, but put/get as well
21:25:19 * cschwede needs to have a closer look to hummingbird
21:25:38 <redbo> Sorry, wasn't following too close.  Replication speed is one of the big goals.
21:26:16 <acoles> clayg: yeah i noted 2x on write/delete for 1k objs, numbers above were 1M
21:26:22 <clayg> I really do wanna do server per port with ~4 workers per port and keep_cache_private = true
21:26:38 <redbo> Policies is one simplification.  I don't think it'll be too difficult to add replicated policies, we just don't use them.
21:27:07 <notmyname> redbo: right. wasn't saying that as a ding. just that the goals right now are more around replication times in your clusters than "full replacement"
21:27:17 <notmyname> which is totally reasonable IMO
21:28:08 <notmyname> so aside from the specific things going on, the larger thing is just that there is a *ton* of stuff going on in swift right now
21:28:22 <notmyname> we had 27 people at the hackathon. and over 40 things listed as topics to talk about
21:28:26 <clayg> redbo: and POST!  but feature parity may be a distraction; enumeration of differences is a) more useful and b) required to quantify parity
21:28:41 <notmyname> so if you feel like it's hard to keep track of all the things and a lot is happening, you're right!
21:28:44 <clayg> notmyname: did we get to everything when it was all said and done?
21:28:58 <notmyname> clayg: we did not do anything about metadata searching
21:29:12 <notmyname> but yeah, we got almost everything
21:29:15 <clayg> sigh, or part power adjustment - sorry cschwede
21:29:23 <clayg> well - that's pretty good i 'spose
21:29:34 <clayg> notmyname: I feel like it's hard to keep track of all the things and a lot is happening
21:29:43 <notmyname> clayg: you're right! ;-)
21:29:58 <cschwede> good that we meet again in ~ 9 weeks :)
21:30:17 <clayg> omg - i can't believe it
21:30:26 <notmyname> so with all the stuff going on, it's pretty important that we prioritize and realize what is reasonable to do before tokyo, what we can make progress on in the short-term, and what we defer to "conversations over beer"
21:31:22 <notmyname> so unless there's other things to say about the hackathon, that's a good segue to talking about releases
21:31:41 <clayg> yay release!
21:32:14 <torgomatic> probably good to release code sometimes :)
21:32:15 <notmyname> #topic releases
21:32:18 <notmyname> yeah
21:32:30 <notmyname> so I want to do a release next week
21:32:35 <zaitcev> okay
21:32:39 <clayg> although at this point I want to get in peluse's etag_buckets fix for EC overwrite and probably other stuff too
21:32:39 <notmyname> I'll be working on release notes, etc
21:32:46 <notmyname> it will be 2.4
21:33:08 <notmyname> and that should give us time to have one more release before tokyo (albeit with a fast turnaround)
21:34:01 <notmyname> so here's a "rule" I'd like to try to follow: if there is any open bug that is marked as critical, it blocks a release
21:34:17 <notmyname> (rule in quotes because I don't know a better word off the top of my head)
21:34:45 <mahatic> guideline ;)
21:34:59 <notmyname> so if you're looking at a swift bug in launchpad.....rather, "when" you're looking at swift bugs in launchpad, if there's something that must be in the next release, it should be marked critical
21:35:03 <notmyname> mahatic: thanks :-)
21:35:16 <notmyname> does that sound ok with everyone?
21:35:19 <zaitcev> https://bugs.launchpad.net/swift shows 5 critical ones.
21:35:23 <mattoliverau> yup, great plan
21:35:48 <clayg> zaitcev: nice!  so... are they in progress?
21:35:54 <notmyname> ya
21:36:17 <notmyname> the 2 open ones are the 2 we aren't supposed to talk about publicly yet ;-)
21:36:17 <mattoliverau> 3 are fix commited
21:36:25 <notmyname> shhhh
21:36:38 <mattoliverau> ninja bugs
21:36:50 <clayg> phthththth - dorks
21:37:03 <notmyname> anything else about releases?
21:37:29 <acoles> notmyname: sounds ok
21:37:41 <notmyname> #topic keystone related patches
21:37:46 <notmyname> #link http://paste.openstack.org/show/421542/
21:37:50 <notmyname> ho: this is your topic
21:37:56 <clayg> ho: drop some knowledge
21:37:59 <ho> yeah, I would like to introduce keystone related patches in swift to get attention from you :-)
21:38:07 <torgomatic> one of these years, I should figure out how to spin up a Keystone instance
21:38:19 <ho> I summarized the current status in the paste url.
21:38:37 <ho> If you have a time, please read it and feel free to ask me.
21:38:51 <ho> that's i want to say here. :)
21:38:56 <mattoliverau> ho: thanks for the paste, that's useful :)
21:38:59 <clayg> torgomatic: you can't - your business hours overlap with North American timezone - for some reason it's impossible to know anything about swift+keystone if you work in North America - no one knows why
21:39:25 <acoles> clayg: i'm moving to north america then :P
21:39:27 * clayg goes to read the paste
21:39:28 <mattoliverau> lol
21:39:42 <ho> acoles: great idea! lol
21:39:56 <notmyname> ho: thanks for working on this
21:40:18 <clayg> ho: oh I think i misunderstood - you don't want more attention to swift+keystone *in general* - you just have some keystone patches that need more immediate attention
21:40:18 <notmyname> I just had someone today mention something about swift+keystone v3 and wanting something better
21:40:31 <zaitcev> torgomatic: it's very easy. just edit and run this - http://www.zaitcev.us/things/keystone-setup.sh
21:40:51 <notmyname> zaitcev: that's great!
21:41:01 <acoles> so ho has done a lot of work on a comprehensive test suite for checking all role based access scenarios. they were taking a looong time to execute but we got that done some in austin
21:41:02 <cschwede> zaitcev: nice! saved
21:41:24 <acoles> s/done/down/
21:41:26 <ho> clayg: yeah
21:42:08 <clayg> zaitcev: I don't think `systemctl start openstack-keystone.service
21:42:14 <clayg> ^ will work on my system
21:42:45 <ho> i'm also trying to make vagrant-swift-all-in-one work with keystone
21:43:00 <clayg> zaitcev: that's like "given a pre-existing keystone deployment" here's 100 lines of bash that can "set it up"
21:43:45 <ho> I will push it to the repository but i have to learn more about chef :-)
21:43:48 <tdasilva> i think mattoliverau ansible scripts sets up keystone, right?
21:43:50 <notmyname> ho: thanks for bringing these up
21:44:01 <zaitcev> clayg: My point is, you don't need to bring up whole OpenStack, just do what the script does and it makes Keystone with stable IDs and SQLite backend. Fairly painless when compared with the OOO-based installer...
21:44:27 <mattoliverau> tdasilva: yeah, I have something, I'll get that up one of these days
21:44:47 <clayg> hugokuo: has keystone setup scripts too!
21:45:12 <notmyname> sounds like there's a few things. ok, everyone go race to see who can write docs and scripts the fastest! go!
21:45:33 <notmyname> #topic open discussion
21:45:40 <notmyname> what else is on your mind?
21:46:12 <clayg> well, everyone has scripts!  I don't think that really solves "a group is just a role that assigns a user to project under the domain, what's the problem!"
21:46:35 <torgomatic> eh, the scripts still help (me)
21:46:54 <torgomatic> last time I tried was by spinning up a full devstack, and it took two hours and didn't work
21:46:55 <clayg> yeah, i suppose you're right
21:47:10 <cschwede> sounds familiar
21:47:18 <notmyname> I'm at the openstack ops meetup right now. there's love for swift here. "it just works." "it's very stable" "I don't have to worry about it". etc
21:47:22 <clayg> torgomatic: heh, i seem to be able to get my devstack turned on most of the time
21:47:25 <hurricanerix> notmyname: I have that testr patch currently marked as a WIP.  If anybody cares to look and give feedback on it.
21:47:26 <notmyname> be proud of what you've done with swift :-)
21:47:26 <minwoob_> I think if we had more time at the hackathon, we might have gotten to a discussion regarding small file optimizations for EC.
21:47:40 <notmyname> hurricanerix: right! that's a good one
21:47:48 <notmyname> hurricanerix: I'm planning on looking at that asap
21:47:56 <clayg> torgomatic: but using keystone client is a real PITA - there's some discrepancy between what the api says to do and what you can do with the keystone client?  I think the openstack client is better supported but less documented or something?
21:47:58 <notmyname> (which might be a day or so)
21:48:22 <minwoob_> Specifically, whether we'd want to go forward with a dual policy for EC for small vs. large objects.
21:48:32 <hurricanerix> notmyname: cool, thanks.
21:48:38 <minwoob_> (replication and EC).
21:48:44 <zaitcev> On the other hand, I met an IT guy from New York who said he attended some of our sessions in Vancouver, and found "Swift people" "disorganized".
21:48:53 <notmyname> minwoob_: yeah, that's a big deal. definitely needs to happen (small file optimization). post tokyo, though
21:49:05 <notmyname> in the few times I opened my laptop, I saw a ton of pep8 patches (and zaitcev saying stuff on twitter). what's going on?
21:49:18 <acoles> torgomatic: clayg keystoneclient CLI doesn't support v3, so openstack cli is the way
21:49:18 <clayg> zaitcev: we *are* disorganized - there's too much going on :'(
21:49:42 <clayg> acoles: I think zaitcev's script gets by with 'keystone'
21:49:51 <mattoliverau> zaitcev: and we didn't have monitors in the meeting rooms so that made showing things harder
21:49:53 <zaitcev> notmyname: The gentleman with strange nickname was trying to reduce the Hxxx/Fxxx exclusion list. One patch per exclusion.
21:50:09 <notmyname> the exclusion list is not a TODO list
21:50:17 <zaitcev> oh
21:50:26 <clayg> notmyname: heh
21:50:26 <minwoob_> notmyname: Right. Do you think it's something good for a spec?
21:50:28 <torgomatic> notmyname: yes, and we allowed the ones that made sense and rejected the ones that didn't
21:50:36 <notmyname> torgomatic: ok, great
21:51:10 <clayg> minwoob_: it would help to have more words - what you're thinking and how you're thinking about doing it - a spec is a place you can write words - so it might work
21:51:18 <notmyname> I spoke with someone here, and it seems like it would be possible to adjust some dependencies so that we can get back to a whitelist of style changes. it's something I'll look in to at some point
21:51:32 <notmyname> clayg: minwoob_: we had some good ideas back at the boston hackathon
21:51:43 <acoles> clayg: yeah but i don't see it adding all the test accounts we now have - there's one in a non-default domain
21:51:44 <notmyname> not sure what (if anything) was written, though
21:52:03 <minwoob_> I'll try to coordinate with someone on that, then.
21:52:19 <clayg> the *boston* hackathon?  about directing to ec/replicated "policy" based on size?
21:52:51 <minwoob_> It probably goes deeper than just "if an object in the EC policy is < certain size, store it via replication".
21:52:56 <notmyname> clayg: something about just storing the whole data for small objects, but in every fragment (ie 14x replication).
21:53:04 <notmyname> minwoob_: yeah, probably
21:53:15 <minwoob_> But a good start.
21:53:19 <clayg> minwoob_: redbo commented that basically the way the ec policies were exposed makes them very difficult to deploy for service providers because... well because users make unexpected choices :\
21:53:33 <notmyname> minwoob_: so, yes, it would be good to work on and it's definitely needed. one reason is because of what clayg just said
21:53:55 <notmyname> minwoob_: so I'd love to see something there, but don't expect it to get much attention for a few months at least
21:54:08 <minwoob_> Yeah, definitely something post-Liberty.
21:54:25 <notmyname> depending on what we get done in the next few weeks, it might be a decent session in tokyo
21:54:27 <redbo> if we can do a small file optimization for EC policies, could we do a large file EC optimization for replicated policies?  :)
21:54:32 <clayg> ho: anyway - if you *could* keep working on keystone support for vagrant-swift-all-in-one that'd be great
21:54:38 <notmyname> redbo: one policy for everything!
21:55:05 <notmyname> anything else to bring up from anyone this week?
21:55:09 <ho> clayg: thanks! i will brush it up more
21:55:39 <zaitcev> Just one more step and it'll turn out that policies were unnecessary to begin with. Except for e.g. geographic policies.
21:55:55 <notmyname> zaitcev: just one policy: do-the-right-thing
21:56:06 <mattoliverau> lol
21:56:09 <notmyname> ok, I'm calling it
21:56:17 <clayg> I'm really not sure if it's quite as simple as a single storage policy that can do both kinds of data :\  It feels more like per policy constraints or automagic re-routing of requests + symlinks
21:56:17 <notmyname> thanks for coming today. and thanks for working on swift
21:56:20 <torgomatic> what are you calling it?
21:56:27 <mattoliverau> Bob
21:56:31 <torgomatic> Good choice
21:56:32 <notmyname> torgomatic: done (the meeting)
21:56:32 <clayg> Steve
21:56:34 <jrichli> lol
21:56:37 <redbo> the other thing was if a customer sets EC policy, we could reject uploads < some size
21:56:47 <notmyname> it's always "steve" for me
21:56:53 <minwoob_> lol
21:56:56 <notmyname> redbo: ya. 409 conflict
21:57:03 <notmyname> redbo: or 412. whatever
21:57:15 <notmyname> redbo: and no chunked encoding
21:57:29 <notmyname> or copies. or *LOs
21:57:34 <cschwede> simplifies a lot
21:58:01 <notmyname> :-)
21:58:03 <notmyname> #endmeeting