21:00:31 #startmeeting swift
21:00:32 Meeting started Wed May 16 21:00:31 2018 UTC and is due to finish in 60 minutes. The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:33 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:35 The meeting name has been set to 'swift'
21:00:39 who's here for the swift team meeting?
21:00:42 o/
21:00:53 hi
21:00:56 hi o/
21:01:05 o/
21:01:30 acoles: clayg: timburke: ?
21:01:32 tdasilva: ?
21:01:40 hi
21:02:13 mattoliverau is out this week, right?
21:02:55 o/
21:03:10 i have no idea about mattoliverau
21:03:12 clayg: are you here?
21:03:25 I think I heard that from acoles
21:03:40 IIRC mattoliverau is on vacation
21:04:06 woooo!
21:04:10 all right, let's get started
21:04:17 #link https://wiki.openstack.org/wiki/Meetings/Swift
21:04:31 agenda this week is feature/deep-review (no surprise to anyone)
21:05:02 first question: where are we? acoles and timburke and clayg, can you give us an update of the state of the branch, the reviews, and the testing?
21:06:09 testing is going great - timburke says I need to do more stuff from the API - but this cluster was so jacked - i'm just happy to see container api requests not like 404ing and some indication that it might eventually have a chance at accurate, up-to-date listings
21:06:54 large and giant containers have been sharded. child DBs have been sharded. idle DBs have been sharded. active DBs have been sharded (right, clayg?)
21:06:57 starting... Friday I think we can re-package and start some testing in a different cluster we've been loading up that is running on SSDs - so it's not in quite such a terrible state - I'm thinking about how to further quantify correctness validation
21:07:05 notmyname: yeah all that!
21:07:26 you name it we shard it - if it's big and on HDDs we know what to do - works a treat!
21:07:42 timburke: any extra color-commentary to add to that?
21:07:43 there's been a couple of issues we had to square last week - but it's all on the review branch now
21:07:44 i'm pretty happy with my review of 2 of the 8; i've at least gotten through almost all of the third patch
21:08:01 clayg: nice (all the updates on the review branch)
21:08:02 the only stuff we're hitting now has to do with the SwiftStack management interface and monitoring - boring
21:08:07 there are a ton of tests -- which is great! but means reviewing can take a while
21:08:12 acoles: how are you feeling about it?
21:08:49 as timburke says the first 2 of 8 patches seem to be ok, with +2's
21:09:10 there were some definite troubles with the client api, but i think acoles has started working on it, so i'm sure it'll be great Real Soon Now
21:09:36 timburke: tell me more. do we obviously break the API with the current state of the feature?
21:09:40 I have a backlog of comments to address on other patches, and I'm sure there will be more
21:09:54 as of yesterday, delimiter queries could trip a 500
21:10:10 could? or would?
21:10:26 I hope this will help with that https://review.openstack.org/568930
21:10:27 patch 568930 - swift (feature/deep-review) - SQUASH: handle delimiter listings in proxy
21:10:31 could, iirc. if the last item was a subdir
21:10:43 that ^^
21:10:49 ok
21:11:03 there are also some efficiency problems with prefix or path queries
21:11:06 so something more than a particular set of failures at the right time
21:11:28 (*and* delimiter, but that's not so surprising, given the 500s)
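For context, here is a minimal sketch of the kind of delimiter listing under discussion, using the standard Swift listing API. The endpoint, token, and container name are placeholders, and this is not a test from the review branch:

```python
# Minimal reproduction of the query type that could trip 500s on sharded
# containers. STORAGE_URL and TOKEN are hypothetical placeholders.
import requests

STORAGE_URL = 'http://saio:8080/v1/AUTH_test'   # assumed SAIO-style endpoint
TOKEN = '<auth token>'

# A plain delimiter listing; per the discussion, this could 500 if the
# last item in a page was a subdir rollup.
resp = requests.get(
    f'{STORAGE_URL}/big-container',
    params={'format': 'json', 'delimiter': '/'},
    headers={'X-Auth-Token': TOKEN},
)
print(resp.status_code)   # 500 before the delimiter fix, 200 after
for entry in resp.json():
    # 'subdir' entries are delimiter rollups; others are real objects
    print(entry.get('subdir') or entry['name'])
```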
21:12:22 notmyname: yeah, and also the sort of thing where a plain old /v1/a/c?delimiter=/ request would almost certainly trip it
21:12:29 yeah, that's not good
21:13:01 I'm inclined to downplay efficiency issues with listing billion-row containers 10k at a time
21:13:14 but I want to make sure that's not simply because I want the feature to be on master asap
21:14:17 I think that makes sense. 500s are obviously bad, but if something is kind of slow (only on sharded containers), then that can be sped up later
21:15:05 idk, one of the next queries after that could be /v1/a/c?delimiter=/&prefix=zzz/ and i don't want to hit a couple hundred containers to find the *one* that actually has a chance at giving me data
21:15:32 maybe it could be a fast-follow, though
21:16:01 timburke: acoles: is "there are some api issues that need to be squashed in now and some possible performance issues that could be addressed after" an accurate characterization of what current testing has revealed?
21:16:38 sure
21:16:47 kota_: m_kazuhiro: rledisez: does that make sense to you so far?
21:17:38 ok
21:18:01 +1
21:18:12 notmyname: yes. just to clear up a doubt. the performance issue is only related to sharded containers, there is no impact at all on "normal" containers, right?
21:19:02 I wouldn't see how it could impact non-sharded containers, but acoles/timburke, please confirm
21:19:41 yup. my particular concern is limited to sharded containers
21:19:48 and more generally, acoles has done a great job of trying to make sure that non-sharded containers shouldn't get any *worse*
21:20:08 rledisez: I'm sure you have users with huge container DBs, right?
21:20:17 IIUC yes. but I'd hope we're all looking hard at that in review i.e. that unsharded is not impacted
21:20:47 yes, our record holder reached 1 billion objects in one container before understanding it couldn't work anymore ;)
21:20:54 rledisez: !!!
21:21:25 ok, here's my pre-typed thoughts/summary question...
21:21:30 we've been talking about this feature for a very long time. We've discussed it and designed it as a group. It's had focused attention from many people for the past six months. during the last month, it's been continuously updated by 4 devs. The feature has been well-tested in lab environments at different scales. At what point do we say we want to have this on master, and available to users, instead of in gerrit as a proposal?
21:21:55 notmyname: I'm at that point
21:22:03 in other words, what else do we need to do before we merge it?
21:22:05 having it in gerrit sucks - i want to ship it
21:22:17 oh... that's a good question
21:22:26 from what I've heard, merging the delimiter fix is pretty important
21:22:28 I think we just need to click something on a web interface?
21:22:33 clayg: lol
21:23:06 clayg says, "I don't understand. just click the buttons in gerrit. why is this taking so long?" ;-)
21:24:10 kota_: what are your thoughts?
21:24:14 i'd like to get a full look at all the patches like i'd done with encryption, but that may take longer than anyone would like
21:24:21 it's a *lot* of code
21:25:08 there's some issues raised by reviewers that have not been addressed yet, I have them tracked on trello. it'd be useful to know if any are considered blockers. I do not want it to seem they are ignored.
21:25:16 i'm with timburke's opinion, but it's taking longer than we expected when the review branch was proposed.
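To illustrate the efficiency concern: a sharded container's namespace is split across ordered shard ranges, and a prefix query should only need the few ranges that can contain matching names. Below is a toy sketch under that assumption, not the proxy code on the review branch; the ShardRange tuple and the bounds are hypothetical:

```python
# Illustrative only. A prefix query like ?prefix=zzz/ need only consult the
# shard ranges whose [lower, upper] namespace can overlap names starting
# with the prefix -- not "a couple hundred containers".
from collections import namedtuple

ShardRange = namedtuple('ShardRange', 'name lower upper')  # hypothetical

def shards_for_prefix(shard_ranges, prefix):
    """Yield only the shard ranges whose namespace can contain `prefix`."""
    for sr in shard_ranges:
        # skip ranges that end before the prefix starts
        if sr.upper != '' and sr.upper < prefix:
            continue
        # stop once ranges begin past anything starting with the prefix
        if prefix < sr.lower and not sr.lower.startswith(prefix):
            break
        yield sr

ranges = [
    ShardRange('.shard-0', '', 'cat'),
    ShardRange('.shard-1', 'cat', 'zzz/a'),
    ShardRange('.shard-2', 'zzz/a', ''),   # '' upper == end of namespace
]
print([sr.name for sr in shards_for_prefix(ranges, 'zzz/')])
# -> ['.shard-1', '.shard-2']: only two shards to contact, not all of them
```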
21:25:55 #link https://trello.com/b/z6oKKI4Q/container-sharding
21:28:12 longer is ok - I know notmyname was hoping to have it done before Vancouver - but if we think there's value in another week of pre-merge review that's ok
21:28:24 another way to look at it is we merge it and do an `rc` kind of thing
21:28:31 let me slightly rephrase then. kota_ and timburke, what will you be looking for (if you had infinite time) in the rest of the review branch that would make you think we shouldn't merge it?
21:28:51 continue doing the "review" + test + fixups without a big freeze and gerrit train wreck
21:29:13 or maybe you're already in a "shouldn't merge yet" state, and you're looking to be convinced of why to merge it?
21:29:25 I'm going to be doing testing right up until we ship it - if I find something this week vs. next week it's not much difference to me
21:29:55 notmyname: on that point, I didn't find any significant problems on the review branch, so it might be ok to land it as an -rc as clayg said.
21:30:41 whatever is easier to manage - i'm already packaging off the feature-deep so I can do testing beyond what i've already done on my laptop and saio in dev
21:30:43 yes, my hope and dream is that we'll be able to send it to the gerrit merge queue by the end of this week (ie my friday evening). that way I can tag a release on monday and talk about that at the summit next week
21:32:38 instead of an -rc release, I'd prefer to do a 2.18.0 and follow-up with .1 (etc) as needed
21:32:42 it's easy to make releases
21:32:53 * clayg shrugs
21:33:24 not 2.17.95, 2.17.99?
21:33:25 acoles: what are your thoughts?
21:33:47 zaitcev: yeah my packages are still 2.17.XXX
21:33:51 something like saying "beta", as we did in the past for the EC work?
21:34:19 my only real concern is that while it's proposed, there's a real and pressing need to review it, and that review catches a lot of bugs; once it lands, i know i'm unlikely to go back over whatever is actually merged with the same care -- which means we're punting to our users to discover and write up bugs for it
21:35:21 timburke: valid concern - only i'm actually still going to be testing it ... but that holds true merge or no, so I'm not advocating for anything specifically beyond "let's please merge this eventually"
21:36:20 ah, but testing and review are separate :-)
21:37:15 we'll *all* be testing it to some degree, i hope
21:37:33 IMO, we've tested this feature more pre-release than any other major feature we've written, and personally I can't imagine finding something to quibble about in the code that isn't a minor nit or hasn't already been found by the combined acoles clayg timburke and mattoliverau
21:38:04 notmyname: feels like we are light on review coverage, but it's such a big change that we're not going to get it all polished in review. We need review to focus on 1. does this break unsharded? 2. things that will be a pain to change once on master 3. blockers.
21:39:05 notmyname: well the delimiter bug is a little scary, and it's due to functional testing not covering sharded containers, plus human incompetence!
21:39:06 clayg always makes me think 2 is a very large set :-(
21:40:01 2 what?
21:40:03 acoles: what's your opinion on how much each of these three issues has been reviewed?
21:40:23 "but can we just go changing a 404 to a more-correct 405?? think of the clients!"
21:40:50 oh. 2 == the second thing in acoles's list
21:41:15 notmyname: gotcha!
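A sketch of the functional-test coverage described above as missing, i.e. delimiter listings against an already-sharded container. It assumes python-swiftclient, SAIO-style credentials, and a container that was sharded out of band; the names and setup are hypothetical:

```python
# Sketch of the missing functional test: a delimiter listing against a
# container that has already been sharded. Assumes python-swiftclient and
# a hypothetical SAIO with a pre-sharded 'sharded-container' holding
# objects named like 'dir/obj', so the listing ends on a subdir rollup.
from swiftclient.client import Connection

conn = Connection(authurl='http://saio:8080/auth/v1.0',  # assumed dev auth
                  user='test:tester', key='testing')

def test_delimiter_listing_on_sharded_container():
    # before the fix this raised a ClientException on the 500
    headers, listing = conn.get_container('sharded-container', delimiter='/')
    assert listing, 'expected a non-empty listing'
    for entry in listing:
        # every entry is either a subdir rollup or a real object
        assert 'subdir' in entry or 'name' in entry

test_delimiter_listing_on_sharded_container()
```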
21:41:20 timburke says that clayg thinks "things that are a pain to change on master" is a big set
21:41:20 notmyname: my opinion is guided by votes on gerrit
21:41:26 clayg: (I was struggling too)
21:41:42 acoles: fair enough
21:41:42 timburke: and it's a 400 => 405 if you're talking about the auth thing... and I'm not even advocating either way - i was just asking if we'd considered it
21:42:13 ok, back to sharding instead of auth response codes ;-)
21:42:23 thank you for talking so long on this topic so far this meeting :-)
21:42:24 would the delim bug have been harder to fix after we merged - or just embarrassing if we hadn't caught it this early?
21:42:50 did it affect unsharded containers?
21:43:05 notmyname: I think we're ok on 1 (unsharded impact), 2. root container sysmeta still bugs me, internal APIs I hope are ok. on 3. it's the unknown unknowns ;)
21:43:41 I think like EC we can probably have a release that's like "sharding is new, new things take some time to harden, if there's some nasty bugs we might have to break things and we'll try to have a plan for data that might already be out there"
21:44:13 kota_ suggested some kind of "beta" like EC - again I think not having auto-shard helps us here?
21:44:43 clayg: if https://review.openstack.org/568930 is sufficient then it wouldn't have been a big deal. I was a little worried we needed the proxy to tell the backend to treat markers as inclusive, but I think I convinced myself not.
21:44:44 patch 568930 - swift (feature/deep-review) - SQUASH: handle delimiter listings in proxy
21:44:49 definitely helps
21:44:59 clayg: yep
21:45:18 and I'm not sure I agree with timburke about the hard line difference between review + test - I think at some point validating the design against the real world has equal if not greater value to code reading (granted timburke is one of THE BEST code readers, and I'm not that great at it)
21:45:32 personally, since there's a patch for the delimiter fix, I am 100% ok with merging the next patch chain (with the delimiter fix and addressing comments). but I'm not going to click those buttons at this point until we're all ok with maintaining this on master
21:46:37 ☝️
21:46:40 notmyname: I am conscious that I am now rate-limiting, to a certain extent, fixes that could be handled faster once we are on master
21:47:49 tdasilva: what do you think?
21:47:53 I like kota_'s idea about some sort of "preview" to caution people. and clayg has good comments on that (we don't have auto-shard yet, and we'll try to have a plan for anything else)
21:48:10 did tdasilva make it here today?
21:48:16 notmyname: i'll take a look at the rest of the patches briefly, and I'm probably ok with maintaining it on master if no significant issue is found.
21:48:24 and like with encryption, i've got a running list of things that i've commented on (so at least it's written down *somewhere*), but agree with acoles that it doesn't need to block a merge to master
21:48:37 I talked to tdasilva on the phone yesterday at length about this topic. he is comfortable with merging soon
21:48:40 notmyname: I think we need to have a decision about "are we going to decide as a community when we're going to merge this and all the caveats around that" - or are we going to NOT make that decision and just see where the review branch is at NEXT wednesday?
21:48:54 NO to the second one
21:49:06 (mostly because I don't think we'll have a meeting next week)
21:49:12 heh
21:49:19 'cause I'll be at the summit, with several other people in here
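On the marker question above: Swift listing markers are exclusive, i.e. a page returns names strictly greater than the marker. A toy model, with hypothetical shard contents and bounds, of why carrying an exclusive marker across a shard boundary loses nothing:

```python
# Toy model of stitching a listing across two adjacent shards with Swift's
# exclusive marker semantics. The shard contents and namespace bounds are
# made up for illustration; this is not the proxy's actual code.
shard_1 = ['a/1', 'a/2', 'b/1']   # hypothetically covers ('', 'b/zz']
shard_2 = ['c/1', 'c/2']          # hypothetically covers ('b/zz', '']

def listing(shard, marker, limit):
    # exclusive marker: only names strictly greater than `marker`
    return [name for name in shard if name > marker][:limit]

# page through both shards with limit=2, carrying the marker forward
marker, results = '', []
for shard in (shard_1, shard_2):
    while True:
        page = listing(shard, marker, limit=2)
        if not page:
            break
        results.extend(page)
        marker = page[-1]   # next page starts strictly after this name

print(results)  # ['a/1', 'a/2', 'b/1', 'c/1', 'c/2'] -- nothing lost, no dupes
```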
21:49:26 "try to have a plan for anything else" is what bothers me. i'd be happy to convince customers with big containers to let me enable sharding for them. but only if i'm sure I won't be in a dead end a few weeks/months later
21:50:33 notmyname: if we merge on Friday when does the 2.18 release happen?
21:50:49 have we ever tried to disable sharding for a reasonably large container, and have the system fix itself? seems a lot like shrinking, but with horrible cardinalities...
21:50:50 or maybe that second part isn't conditional on the first?
21:51:17 rledisez: worst case (and this would be bad) would be some tool that migrates from a dead-end sharded schema into a clean, new DB with new goodness. yes, that's bad, but my point is that worst case still has the possibility for maintaining data integrity
21:51:36 clayg: 2.18.0 release happens ASAP after sharding lands
21:51:55 (assuming there isn't a lurking data-loss bug in the current patches)
21:52:01 rledisez: you can wait to hear from us - we'll be doing sharding at customer sites like in late June
21:52:12 which i'm *pretty* sure isn't there... i think...
21:52:16 also early june, most likely ;-)
21:52:31 that's very soon, i won't even be ready to upgrade to 2.18 ;)
21:52:34 notmyname: meh, well that does make "just merge it" a little heavier...
21:53:00 perhaps. but "are we going to decide as a community when we're going to merge this and all the caveats around that"
21:53:15 that's what I want
21:53:46 as a community, we say "yes, let's merge soon (ie this week) and release" or "no, let's wait at least another week for further review"
21:53:58 notmyname: right - i'm getting there - kota_ (I think somewhat reasonably) advocated for an "rc window" - basically sometime after it's on master before we ship the official 2.18 - you're saying it doesn't look like such a window exists - so I was just confirming
21:54:12 ah, got it
21:54:35 timburke: if we suspended new updates until sharding was complete we could simply abort sharding at any time. but with new updates going to shards, we have to shrink them back, which is likely to be more painful than sharding them?
21:54:41 TBH, I'm not sure if the openstack mechanics still exist for an rc release. I think it's just a number, at that point.
21:55:38 2.18.0 will also include the great improvements to multi-process replication and the s3api integration
21:55:43 notmyname: +1 I don't understand OpenStack RC - but for me having master be "it's awesome plus sharding" be a thing for a little while (couple weeks?) is ok by me
21:56:29 acoles: yeah, i was thinking about PUTing a new set of shard ranges such that everyone would/should eventually shrink back into the root. which would suck, but would be an escape hatch
21:56:35 notmyname: like I think it'd be cool if we say "hey feature-deep-review has been going great, thanks everyone for the reviews + test - keep it up! we'll merge on Friday and I'll go to Vancouver, when I get back the week after we'll double confirm we want to cut 2.18 or if there's a show stopper"
21:56:43 I'd be happy to leave the meeting with that ^
21:57:20 but... dunno how it works out timeline wise
21:57:53 If we DON'T merge on Friday I wonder if we might as well cut 2.18 before we merge? Which is also ok with me
21:58:04 timburke: yep, do-able, and a mass shrink event had occurred to me as a means to finally drop all shards when a container is empty... but then there'd be no rows to move.
21:58:07 my next swift release is going to include container-sharding somewhat regardless
21:58:12 that's a totally valid plan. the only "cost" is any social credibility or marketing benefit for saying "this is available now" vs "this will be available soon"
21:58:38 notmyname: I think that mostly affects you - and if you knew how to make an rc release you could say it's available NOW as an RC
21:58:56 you can say whatever you want - doesn't make the code any better - that's what we're working on - regardless of what you say ;)
21:59:02 clayg: 2.18.0 is an "rc" release of 2.18.1 ;-)
21:59:13 *not complaining* just pointing out differences
21:59:31 valid point. nothing I say next week makes the code any better or worse :-)
21:59:50 kota_: rledisez: timburke: acoles: what are your thoughts?
22:00:04 (I really want to finish this line of thinking before we call the meeting over)
22:00:08 notmyname: that's fine with me! I think some people care about openstack release numbers more than I do personally! no one thinks we shouldn't do more backports
22:00:14 IIRC nobody is in this room right now
22:00:39 it's not even an openstack cycle release. we'll have at least one more swift release before we do the next openstack cycle release
22:01:05 we're only just now in the middle of the "rocky" release cycle
22:01:09 i could probably live with that -- but i know i'm unlikely to get to the sharder at this rate before merging then
22:01:11 * kota_ feels the release is fair enough from a marketing perspective
22:01:25 timburke: live with what?
22:01:39 merge by friday
22:02:53 Releasing is ok for me because I know/trust you made sure there would be no regressions for non-sharded containers. sharding will have bugs, but they will get fixed :)
22:03:22 note, if someone objects to a merge or release schedule because they think it's too early, raising that objection doesn't cast you in a bad light at all. please feel free to disagree or object
22:03:40 The stability for existing clusters was a source of strength for us. It allowed EC to phase in gradually.
22:04:19 zaitcev: the interesting thing about this feature is that there's no "phase in" when it's used. it will be immediately used on the biggest containers out there (good news, though, it's not something that happens automatically)
22:04:56 I think the question is either merge friday or wait another week for more review
22:06:07 whenever the merge happens, I'd like to cut a release asap. I have a slight preference for doing all that before vancouver next week. but I am comfortable with holding off, if that's what people agree to
22:06:17 so if you have half a day, shard a container and exercise the API, look for bugs in the API. If you have a whole day, look through the *existing* modules for regressions to the unsharded path. If you have 2 days... IDK, dig into the sharder daemon.
22:06:34 whoot!
22:06:42 realistically, i probably can't make time to review much in the next week due to the summit attendance.
22:07:09 so just another week isn't worth so much to me to get it better (just to me tho).
22:07:31 ok. that's a good point
22:07:34 so i don't think there's much difference between getting it merged this weekend or next weekend.
22:07:44 is mattoliverau also travelling to the summit? if so i guess he's not going to have much time to review either
22:07:53 mattoliverau will be at the summit
22:07:58 as will cschwede
22:08:08 move the summit - that's the answer! ;)
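One possible shape for the half-day exercise above, and for "quantify correctness validation" mentioned earlier: walk a sharded container's listing page by page and assert strict ordering with no duplicates, then compare against the reported object count. python-swiftclient, the credentials, and the container name are assumptions:

```python
# Sketch of a correctness walk over a (possibly sharded) container listing.
# Assumes python-swiftclient and hypothetical SAIO-style credentials; the
# reported object count may lag while sharding is in progress.
from swiftclient.client import Connection

conn = Connection(authurl='http://saio:8080/auth/v1.0',  # assumed dev auth
                  user='test:tester', key='testing')

def walk_listing(container, page_size=10000):
    """Page through the full listing; fail on out-of-order or duplicate names."""
    marker, total, last = '', 0, ''
    while True:
        _headers, page = conn.get_container(container, marker=marker,
                                            limit=page_size)
        if not page:
            break
        for entry in page:
            assert entry['name'] > last, 'out of order or duplicate: %r' % entry
            last = entry['name']
        total += len(page)
        marker = page[-1]['name']   # markers are exclusive
    return total

headers = conn.head_container('big-container')
count = walk_listing('big-container')
print(count, 'listed;', headers['x-container-object-count'], 'reported')
```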
22:08:12 :-)
22:08:17 lol
22:09:31 acoles: timburke: clayg: are you ok with planning to merge on friday? or do you want to wait another week?
22:10:29 clayg: timburke: I want to make sure your previous comments hold (and I didn't misunderstand them)
22:10:46 acoles: I don't think I saw an answer either way from you yet
22:11:09 if you're not ok with merging friday, that's ok
22:11:20 notmyname: yeah, if there are +2's on every patch
22:11:37 i'd just ask m_kazuhiro - i think he looked at the review branch
22:11:49 m_kazuhiro: did you find any problems so far?
22:11:56 kota_: he did a great review !
22:12:18 acoles: :D
22:12:22 I think there are no problems
22:12:48 m_kazuhiro: ok
22:14:26 I'm still waiting on seeing something from clayg and timburke. sorry if I missed it
22:15:02 I'll go +2 everything Friday morning
22:15:08 then yeah - let's ship it
22:15:15 lol
22:15:59 did timburke go afk?
22:16:35 notmyname: he's offline on slack
22:16:39 hard to type, sleeping baby on my chest. let's do it!
22:16:43 ah :-)
22:16:51 timburke: awwww :D
22:16:54 i'll get as much review in as i can
22:17:02 ok, we're way over time (thanks for sticking with it)
22:17:16 here's a summary, as I've understood the conversation so far
22:17:31 (with some commentary)
22:17:41 mostly we're all ok with what we see so far
22:18:03 rledisez really doesn't want us to break prod for him, but since we haven't so far, he's pretty sure we won't break it this time either
22:18:44 timburke is doing a great job reviewing, but it will be hard/impossible for him to review all of the patches thoroughly by friday. it gets hard to review with babies on your chest
22:18:54 clayg is ready to merge it yesterday
22:19:34 kota_ and mattoliverau have contributed and reviewed so far and are mostly ok, but will be in canada next week. therefore another week of review time doesn't matter there
22:19:40 🤞
22:20:00 m_kazuhiro has looked at some of the chain and done a great review; he's mostly happy with the way things look
22:20:20 and he will be in canada too
22:20:26 ok
22:21:06 this meeting has gone hella long - I need to bounce out - but I agree with notmyname's recap and submit to the will of the greater good! Also... I'm going to ship container-sharding in a few weeks regardless, so y'all do you. ;)
22:21:08 acoles is pretty nervous, because he's the primary author here and there haven't been extensive code-reading reviews on all patch sets of all patches in the chain
22:21:29 /end summary :-)
22:21:33 I get the blame :/
22:21:39 acoles: no!
22:21:43 no blame at all
22:21:48 :)
22:22:00 you've got a totally reasonable position
22:22:01 acoles gets THE GLORY!!!
22:22:07 clayg: +100
22:22:25 acoles: but importantly, am I wrong? :-)
22:23:35 let's have a target of friday for a merge. timburke will review. we'll revisit later this week, especially in light of the next revision that includes the delimiter fix
22:23:37 notmyname: no, you're accurate, and I meant nervous because I get the git blame. but sorry, move on, we need to progress
22:24:22 `git praise` works instead of blame, if you want a positive connotation
22:24:37 and yes, we've gone way over time. thanks again
22:24:38 I'll also look at some of the latter ones today to check them for releasing on friday.
22:24:44 kota_: thank you
22:25:00 thank you for all of your work on this feature
22:25:32 two final notes: (1) May 17 is swift's 8th birthday! (2) register for the PTG. the tickets go up in price in a few hours
22:25:43 ugh
22:25:45 okay
22:25:56 #endmeeting