21:00:51 #startmeeting swift
21:00:52 Meeting started Wed Aug 16 21:00:51 2017 UTC and is due to finish in 60 minutes. The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:54 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:56 The meeting name has been set to 'swift'
21:01:00 who's here for the swift team meeting?
21:01:05 o/
21:01:12 hello o/
21:01:15 hi o/
21:01:16 o/
21:01:55 hi
21:01:56 hi
21:02:06 clayg: ping
21:02:06 hello
21:02:19 werd
21:02:56 all right, let's get started
21:02:58 i'm here, i'm here!
21:02:59 #link https://wiki.openstack.org/wiki/Meetings/Swift
21:03:02 timburke: :-)
21:03:13 a couple of topics to go over this week
21:03:40 #topic pre/post amble patch
21:03:49 let me set some context here
21:04:19 we have https://review.openstack.org/#/c/365371/ from joeljwright that's been open a while
21:04:32 hmm.. patchbot isn't joining...
21:05:00 "Add Preamble and Postamble to SLO and SegmentedIterable"
21:05:25 i've seen more patchsets than that
21:05:32 i'm only mildly impressed
21:05:32 ok, so joeljwright is asking for a reasonable thing: is this patch (and therefore the idea behind it) something that we will actually land, or should he investigate other options
21:05:44 a historic IRC conversation was at
21:05:46 #link http://eavesdrop.openstack.org/irclogs/%23openstack-swift/%23openstack-swift.2017-06-19.log.html#t2017-06-19T15:48:47
21:06:31 in my opinion, it seems like a relatively novel extension to the SLO manifest format that is not breaking anything existing and that adds new functionality
21:07:02 but to quote the earlier conversation "there's such a huge difference between "this patch is good enough to land" and "this patch should land" " and "I'm a little worried about inventing something to solve a bunch of use cases that we make up"
21:07:19 so... joeljwright, where does that leave us?
21:07:33 share more about your use of this functionality at sohonet
21:07:37 hello
21:07:51 * kota_ is late
21:08:00 kota_: welcome. we're just getting started (you didn't miss much)
21:08:08 well, this all came about when we wanted to share multiple files using a tempurl and preserve some structure (so initially a tarball)
21:08:25 but when we started the work we realised it could be so much more
21:08:31 joeljwright: notmyname was highlighting that you're past "this is one way I can see to solve this" and all the way into "we're using this and it is working"
21:08:49 for like ... business needs - not just like "i tested it and it does what it says on the tin"
21:08:54 hence the generic extension to SLO
21:09:11 yes, we're not using exactly this (because it's not released)
21:09:18 yeah, what clayg said is what i'm interested in
21:09:29 but it makes a big difference to our use case of building tarballs using existing segments
21:09:49 we don't want to store thousands of tiny objects
21:09:58 because download performance is bad
21:10:03 and we're not interested in that data
21:10:21 the tiny objects being the *ambles for tarfile segments?
21:10:26 yes, sorry
21:10:45 in order to build tarballs we have to generate 2 very small objects for every stored object
21:10:53 a tar header and tar padding
21:11:02 both in the 0-1024 byte size range
21:11:46 ok. and you're using this today in prod?
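[Editor's note: to make the overhead joeljwright just described concrete, here is a minimal sketch of the two tiny blobs, a tar header and NUL padding, that today have to be stored as separate objects (and listed as separate SLO segments) for every large object included in a tar-SLO. It uses only the standard-library tarfile module; it is not code from the patch under review.]

```python
# Sketch of the per-object overhead of building a tar from existing Swift objects:
# each member needs a 512-byte tar header before it and NUL padding after it
# out to the next 512-byte boundary.
import tarfile

def tar_amble_blocks(name, size):
    """Return the (preamble, postamble) byte strings for one archive member."""
    info = tarfile.TarInfo(name=name)
    info.size = size
    header = info.tobuf(format=tarfile.GNU_FORMAT)   # the 512-byte header block
    padding = b"\0" * ((512 - size % 512) % 512)     # pad data to a 512-byte boundary
    return header, padding

# Without pre/postamble support, each of these tiny blobs becomes its own object
# and its own segment in the manifest.
header, padding = tar_amble_blocks("reports/2017-08.csv", 1048576 + 100)
print(len(header), len(padding))   # 512, 412
```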
21:11:49 in exploring how to make it better we realised that with a fairly simple addition to SLO we could explore not only tar, but also building other container formats
21:12:12 we're using a related version in testing
21:12:21 yeah, I think doing the tar SLO client side is weird - I could see just *not* bothering with the tar and just making the client do GET for the objects it wants but *shrug* in the tempurl case (before prefix tempurls)... there's some flexibility here to be really prescriptive
21:12:31 we haven't deployed it to prod because of maintaining a patch
21:12:33 ok. did you find other non-tar formats that work here?
21:12:35 you can download these 10K things, in a tarball, with this single tempurl
21:13:16 clayg: to extend that "with these names, and fail if anything on disk changes"
21:13:21 i do believe that is a thing that will be useful from time to time - if they can figure out how to solve that with this they probably could have figured out a way to solve it w/o this
21:13:29 joeljwright: good point!
21:13:40 so... where are you at?
21:13:58 I think a lot of people have got past "yeah that is probably a thing we could do"
21:14:07 currently I have a test system that uses this patch and builds the preambles/postambles manually
21:14:31 I also have an experimental middleware that's just a pre-SLO manifest transform to make the headers/padding
21:14:48 like stub data - or using your data model? Like your business use case? with the existing segments and stuff?
21:15:21 clayg: can you elaborate on what you're after
21:15:23 ?
21:15:24 joeljwright: is there any reason not to dynamically generate the tar headers on a request (eg: POST req containing the list of objects to include in the archive)? i don't know the tar format, but i used to do that for zip. maybe it just does not match your use case
21:15:27 the manifests are big right? a few MB? Can you share a real-world example from a production-like use case?
21:16:04 yeah, our main product that uses Swift is a file transfer application
21:16:11 rledisez: that cuts out the tempurl use case - or at least you still have to distribute a manifest thing that enumerates the things
21:16:20 we build packages to be downloaded by recipients who do not have swift accounts
21:16:34 we use SLO + tempurl to achieve this
21:16:41 rledisez: I think it's related but distinct from joeljwright's use-case - but there is probably a use-case to make for more dynamic stuff too.
21:17:23 rledisez: the reason not to do it dynamically is to get the SLO data validation check
21:17:30 one thing I like about joeljwright's patch is that it has the hooks for future stuff like dynamic *ambles, but it does not add that complexity now
21:17:39 it downloads/validates that the data is what the sender wanted to share
21:17:46 joeljwright: specific elaboration on what I'm after (not sure if this helps anyone else): a real-world example of a for-realzy-solving-a-customer-use-case segment with *ambles
21:18:30 Like...
in python stdlib folks always say "wouldn't it be great if this was in stdlib" and python core be like "yeah, probably, you should put it on pypi and after a bunch of people use it and you flesh out some of the edges where the rubber meets the road we'll bring it in"
21:18:31 the only real-world I-need-it-now use case is making tar-SLOs that don't cripple performance with loads of reads for tiny objects on the object servers
21:18:33 i'm still a little up in the air about whether the inline data should be attached to a segment or to have a new "segment type" -- but i guess since we already have the *amble patch, may as well go with it? what i definitely *don't* want to do is change our minds on it after-the-fact
21:18:49 I have other things I want to explore - zip, ISO, even mov
21:19:02 are you *waiting* on upstream to enter "maintenance" mode before you *use* it? Don't wait on us. Show us how stupid we are for not having it work *exactly like this* already
21:19:34 I have been waiting for upstream to validate that I'll get it eventually before I use it in production
21:19:59 otherwise I have to maintain many different code bases
21:20:02 i think that's created some of the tension we're experiencing
21:20:07 +1
21:20:27 i mean middleware is middleware - i have middleware - lots of people that run swift have middleware - you're supposed to?
21:20:37 rledisez: you have middleware?
21:20:41 joeljwright: ok, so to clarify, you've got a real problem, you've got a pretty cool idea on how it could be solved and a patch for that, but you have *not* actually used this functionality yet to solve your problem in prod
21:20:48 the problem with this approach is it requires patching SLO and helpers
21:20:48 clayg: sure…
21:21:19 notmyname: that's a fair assessment, I've held off pushing to get this in prod
21:21:26 * timburke has definitely *never* released a spike of a swift-provided middleware...
21:21:29 I would deploy it tomorrow if I could though :)
21:21:46 lol
21:22:04 yeah, I can sympathize with joeljwright's position, though, because he's not only maintaining swift in prod, AFAIK he's the only one doing it, and adding the differences from upstream (even just patching slo middleware locally) could be daunting
21:22:45 ok, well I'm not saying that should change, and i'm *definitely* not saying that running hacks on upstream code is ideal (it definitely is crappy)
21:23:09 but it does create the tension of committing to support new SLO manifest formats forever in the hopes that we got the new stuff right before seeing it used
21:23:13 it's a tricky situation
21:23:18 clayg: thinking of use-cases, imagine researchers wanting to share large data sets together with the scripts used to analyze them
21:23:26 the datasets are large, they don't want to waste space by uploading a separate tar copy. the scripts are small, so even if we solve the de-dupe problem with small objects for headers/padding, we still have the ratelimiting problem
21:24:06 timburke: we can imagine a lot of places where new code could be useful... that's never been the problem, right?
21:24:26 this all feels a bit catch-22
21:24:34 heh yes! totally unfair to joeljwright
21:24:49 if I *had* such a use-case I would just see if joeljwright's existing solution works and then +2 (literally *works* for me!)
21:24:50 notmyname: clayg was looking for "a real world example of a for-realzy-solving-a-customer-use-case segment with *ambles" -- i gave him something!
21:25:00 timburke: +1
21:25:06 :)
21:25:08 timburke: nope - i don't need an *idea*
21:25:54 so that's the thing here. as a big community (with different employers, different use cases, etc), to some extent we can't only approve the stuff that we personally see use cases for
21:26:26 joeljwright actually has a for-realzy use case here. the question is more about whether the given patch is maintainable by the community, *not* if it's actually useful functionality
21:26:31 clayg: isn't that what joeljwright's users have? like, he's got SLOs with big segments and tiny segments, and they make a tarball, but performance sucks due to all the tiny segments
21:27:54 torgomatic: not to mention all the extra time making 3x the number of SLO manifests I really need :)
21:28:32 :)
21:28:38 so we get back to the basic question of "who's gonna review it?"
21:28:43 right? is that where we're at?
21:29:16 clayg: but I want to know your thoughts too. is joeljwright's existing use case ok? even though he admits he hasn't run this code in his prod yet?
21:29:34 you can't say someone's use case is "not ok"
21:29:38 lol
21:29:41 it'd be like saying "you're not mad at me"
21:29:50 :)
21:30:20 It's just every time I think about joel's use-case I write a TLO middleware - not add preamble to SLO
21:30:21 right. sorry. I didn't want to trap your answer there.
21:31:00 ok, yeah. so if one wanted to solve downloading .tar files, one would likely do something a lot simpler than *ambles in SLOs
21:31:01 and how much of slo does that reinvent? what about when you realize you want cpio archives instead?
21:31:14 timburke: you beat me to it
21:31:32 and the *amble idea is interesting because we see that it might be useful, but it's not actually been asked for, even by joeljwright's users
21:31:52 which makes it hard to justify the more complex solution
21:31:56 clayg: is that about right?
21:32:24 i *am* a bit nervous about the number of variations we're adding to slo segments. that's one of the reasons i've been thinking about having "object" segments and "inline" segments -- i feel like it might make things easier to reason about
21:32:26 idk, I think it's fine - if we have folks that "might" review it I don't think that really answers joeljwright's problem very well
21:32:57 timburke: that's a fair suggestion
21:33:22 but we'd probably want to stop inline-only manifests
21:33:31 which is why I shied away from that
21:33:47 timburke: I have no strong intuition that I should prefer one solution over the other - it depends on the % of use-cases that need to envelop individual segments vs totally independent in-between and around datas
21:33:48 I wanted to make it obvious that this is for making the data you store more useful
21:33:51 both are complex and workable
21:33:52 not storing data in manifests
21:34:03 and i've certainly contributed! https://github.com/openstack/swift/commit/25d5e68 ... https://github.com/openstack/swift/commit/7fb102d ...
21:34:09 joeljwright: so what do you want to see? review the current patch with a goal of landing it? it sounds to me that timburke has some interesting alternative suggestions
21:34:17 like - SLOs are almost Turing complete now
21:34:25 clayg: :D
21:34:44 joeljwright: on inline-only manifests... maybe just write the damn data?
21:35:17 "look, you wanted this data written, i wrote it! what else do you want?"
21:35:27 (as a non-slo, i mean)
21:35:29 I'm really after some review that says (a) this is too complex or (b) this looks useful or (c) this would be better if you changed...
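[Editor's note: to ground the design debate above (inline data attached to each segment vs. a separate "inline" segment type), here is a purely illustrative sketch of what a tar-building manifest entry carrying pre/postamble data might look like. The key names `preamble_b64` and `postamble_b64` are invented for this example and are not necessarily the format used by the patch at https://review.openstack.org/#/c/365371/.]

```python
# Hypothetical illustration only -- field names invented, not the patch's actual format.
import base64
import json

segment = {
    "path": "/segments/reports/2017-08.csv",          # the existing large object
    "etag": "aabbccddeeff00112233445566778899",        # placeholder; still validated on GET as with plain SLO
    "size_bytes": 1048676,
    # tiny per-segment data carried in the manifest instead of as extra stored objects:
    "preamble_b64": base64.b64encode(b"\0" * 512).decode("ascii"),   # e.g. a tar header
    "postamble_b64": base64.b64encode(b"\0" * 412).decode("ascii"),  # e.g. tar padding
}

manifest = json.dumps([segment])
# The manifest would be PUT as usual; on GET the proxy would stream
# preamble + object data + postamble for each segment.
```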
21:35:48 definitely b, maybe a
21:35:58 I'd like to know if this has any likelihood of landing
21:36:08 because I need to solve this problem
21:36:14 acoles: kota_: tdasilva: you all are being quiet. what are your thoughts? what do you see as the best path forward?
21:36:20 okay, I'll volunteer to take an in-depth look at it in the next few days
21:36:35 torgomatic: thanks!
21:36:36 torgomatic: that's very helpful. thanks
21:38:19 I'll be around on the swift channel if anyone wants to talk about this separately
21:38:29 Now that I've heard about the problems joeljwright's having and how he solves them, it has me sold... well, that we need a solution. I haven't looked at the patch so can't say it's it, but willing to go look at it in the mindset of "we should solve it"
21:38:31 notmyname: still catching up on the history but...
21:39:26 I'll find time to have an initial look today
21:39:41 mattoliverau: kota_: thanks for looking!
21:39:47 I'm torn - I'd prefer not to make SLO more complex but I can see the desire to re-use SLO by adding ambles
21:40:31 primary question: i didn't find any reason why it is needed for the tar archive case (it might be discussed before i joined here)
21:40:47 kota_: yeah, that's in some of the earlier links
21:41:03 kota_: the pre/postambles avoid the need to store tiny objects for tar header/padding
21:41:19 it looks like just a feature to add binary data to each segment via the manifest
21:41:26 but preserve the SLO features of validating that the data you wanted in the tar is what you're downloading
21:41:41 tick tock
21:41:43 ok, I want to move on with the meeting, but thank you for your comments here. let's see if I can summarize where we are
21:41:51 I don't feel we're in a drastically different place than we've been before
21:42:07 yeah, but 40 minutes is enough!
21:42:07 joeljwright: oic
21:42:19 it is complex, it is worth solving, no one is *sure* what should change because we have no external pressure driving the design
21:42:40 heh, clayg just summarized it very well
21:42:52 clayg: maybe we can catch up some time tomorrow to talk about alternatives?
21:43:04 joeljwright: no, this code is already written
21:43:07 and torgomatic and mattoliverau offered to look at the patch
21:43:19 kk
21:43:26 I'll stop stressing now :)
21:43:30 :-)
21:43:33 (until the reviews arrive!)
21:43:33 we just have to have enough of a WAG at how to "qualify" it - and bandwidth to review
21:43:47 ok, let's move on. thank you joeljwright
21:43:56 #topic deadlock patch
21:43:59 https://review.openstack.org/#/c/493636/
21:43:59 patch 493636 - swift - Fix deadlock when logging from a tpool thread.
21:44:12 torgomatic did a great job finding and solving this issue
21:44:41 +100
21:44:45 we worried earlier this week that it would add some linux-only things, but that's been removed (so we still only test linux, but it might work somewhere else maybe)
21:45:16 clayg: you said we'd likely land this right after the meeting today :-)
21:45:33 oh yeah, I can answer any questions?
21:45:41 there's two I can NOT answer
21:46:02 #1 how do we prevent this kind of lockup bug in future (no idea, it's a huge space, probably ripe)
21:46:38 #2 is torgomatic some sort of super genius mutant (probably, we have some evidence of that, but one can not be sure - he may just be a cyborg)
21:47:20 That would confirm my theory that swiftstack devs don't sleep
21:47:46 or are bots
21:47:49 torgomatic: you just pushed a new patch set during this meeting. is it all good from your perspective?
timburke, clayg, acoles: are you planning on or expecting to +A it shortly?
21:48:01 what? new patch set!?
21:48:14 notmyname: I was planning to sleep shortly :)
21:48:16 notmyname: I think it's good, but then, I would ;)
21:48:33 acoles: proving mattoliverau and tdasilva wrong ;-)
21:48:41 clayg: added a check to make sure you can't unlock someone else's mutex, like threading._RLock has
21:48:48 Lol
21:49:22 slry, I will +A it so hard
21:49:38 so... probably +A soon? might be worth having someone *outside* of swiftstack understand the problem and how this fixes it, though...
21:49:51 I honestly need to package it today or in the am.. so.. my life is non-trivially better if we +A now - is that... workable for folks?
21:50:10 I hate it when things happen too quickly - it's so foreign to how we normally work it makes me uneasy.
21:50:24 ok. mostly for this topic I just wanted to make sure people understand it and are aware of it
21:50:24 lol
21:50:25 Are we *supposed* to be this agile and responsive?! Is this ok!?
21:50:46 clayg: if a patch doesn't take 6 months then something feels wrong?
21:51:00 tdasilva: :D
21:51:05 is there anyone who has *not* read the bug report and understood why this needs to be fixed ASAP?
21:51:59 yeah I think understanding is useful-ish - because... and this came up this morning - this is so awesome we should probably backport it - it could be the RC on lp bug #1575277 or lp bug #1659712
21:52:01 Launchpad bug 1575277 in OpenStack Object Storage (swift) "object-replicator goes into bad state when lockup timeout < rsync timeout" [Medium,Confirmed] https://launchpad.net/bugs/1575277
21:52:02 Launchpad bug 1659712 in OpenStack Object Storage (swift) "Object replicator hang" [Undecided,New] https://launchpad.net/bugs/1659712
21:52:15 if so, then read https://launchpad.net/bugs/1710328 soon
21:52:16 Launchpad bug 1710328 in OpenStack Object Storage (swift) "object server deadlocks when a worker thread logs something" [High,In progress] - Assigned to Samuel Merritt (torgomatic)
21:52:35 not very helpful, but i need to say this patch is amazing. what used to never work (logging in diskfile) now works perfectly!
21:52:35 ok, I'll probably kick it in ~1.5 hrs
21:52:54 wooo!
21:52:56 we stressed it today with success
21:53:01 nice!!
21:53:02 wooo!
21:53:11 rledisez: can you leave a short comment in gerrit saying that?
21:53:14 Awesome
21:53:17 sure
21:53:19 thanks
21:53:21 rledisez: zomg rledisez doing *reviews*!?
21:53:36 ok. last thing (and related to this patch)...
21:53:40 tdasilva: i won't even be able to keep up, we move so fast
21:53:41 rledisez: did that come up much for you before? i thought part of why it took so long to pin down was the fact that we *don't* do much logging in diskfile...
21:53:43 #topic swift 2.15.1 for pike
21:53:59 2.15.1 - best swift release YET
21:54:15 timing for the pike release is coming up quickly. what's on master at the end of the week is what I'll be tagging for pike as 2.15.1
21:54:19 i have a hard stop at 3pm, i might leave 2 mins early if we're not done
21:54:21 timburke: we figured it was because of this bug. alecuyer has been working around it for so long, when we saw the patch coming, it was like "omg, we are so excited" :D
21:54:36 great!
21:54:42 which includes torgomatic's patch for the deadlock
21:54:49 rledisez: that's great :-)
21:54:56 timburke: IDK but I have a distant memory that whenever I tried to add debug logging to diskfile something broke ... and maybe this was it
21:55:11 any other questions/comments from anyone?
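[Editor's note: on the 21:48:41 comment about not being able to "unlock someone else's mutex" - the sketch below is a simplified illustration, not the code under review, of a pipe-based mutex that can be shared between eventlet greenthreads and native tpool threads, with an ownership check like threading.RLock's. The class name and details are invented, and a real implementation also has to avoid blocking the eventlet hub, which this sketch ignores.]

```python
# Simplified sketch of a mutex built on an OS pipe: the single token byte in the
# pipe means "unlocked"; acquire() consumes it, release() puts it back, and a
# release by a thread that does not hold the lock raises instead of silently
# corrupting state.
import os
import threading

class PipeMutexSketch(object):
    def __init__(self):
        self.rfd, self.wfd = os.pipe()
        os.write(self.wfd, b'-')        # one token in the pipe == unlocked
        self.owner = None

    def acquire(self):
        os.read(self.rfd, 1)            # blocks until the token is available
        self.owner = threading.get_ident()

    def release(self):
        if self.owner != threading.get_ident():
            raise RuntimeError('cannot release un-acquired lock')
        self.owner = None
        os.write(self.wfd, b'-')        # put the token back

mutex = PipeMutexSketch()
mutex.acquire()
mutex.release()
```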
21:55:17 three cheers for torgomatic
21:55:20 hip hip!
21:55:23 hip hip
21:55:26 awww!
21:55:31 hoooooRAY!
21:55:46 \o/
21:55:49 "hip hip hip hip awww!" is an appropriate set of cheers for a concurrency bug :)
21:55:51 notmyname: sorry, you said tag on Friday yeah!?
21:55:57 lolasd;lfjkasdf;lkjadsf
21:56:06 Lol
21:56:16 land on master by friday. I'll tag this weekend or monday
21:56:29 thank you, everyone, for coming
21:56:35 do we care about patch 472659 being in the release?
21:56:36 https://review.openstack.org/#/c/472659/ - swift - Allow to rebuild a fragment of an expired object
21:56:38 for swift 2.15.1, i'd love to see this merged, i think it's ready now, last comment was about a header name, waiting for a +A ;) : https://review.openstack.org/#/c/472659/
21:56:38 patch 472659 - swift - Allow to rebuild a fragment of an expired object
21:56:41 https://review.openstack.org/#/c/472659/ <- review
21:56:41 patch 472659 - swift - Allow to rebuild a fragment of an expired object
21:56:42 :)
21:56:47 thanks especially for going over the *amble patch with joeljwright and joeljwright for sticking with us on it ;-)
21:56:58 thanks everyone
21:57:00 ok that's three requests for the same thing, it must happen!
21:57:06 yes merge that one
21:57:12 :)
21:57:13 it must happen
21:57:20 ok, i was reviewing that today, will continue tomorrow
21:57:24 tdasilva: thanks
21:57:26 tdasilva: thank you
21:57:40 thanks for your work on swift
21:57:44 #endmeeting