16:00:00 #startmeeting cinder
16:00:01 Meeting started Wed Mar 7 16:00:00 2018 UTC and is due to finish in 60 minutes. The chair is jungleboyj. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:03 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:06 The meeting name has been set to 'cinder'
16:00:11 courtesy ping: jungleboyj DuncanT diablo_rojo, diablo_rojo_phon, rajinir tbarron xyang xyang1 e0ne gouthamr thingee erlontpsilva patrickeast tommylikehu eharney geguileo smcginnis lhx_ lhx__ aspiers jgriffith moshele hwalsh felipemonteiro lpetrut
16:00:18 hi
16:00:20 hi
16:00:29 hi! o/
16:00:35 hi
16:00:43 Hey
16:01:28 Give people another minute to collect up.
16:01:49 Hope everyone eventually made it home safely from the PTG.
16:02:01 Hey
16:02:19 Still a bit of snow around :-)
16:02:22 DuncanT:
16:02:31 Has your country returned to normal?
16:02:56 We had a storm here on Monday that made the one there look like a flurry. Hardly paused us.
16:03:05 Yeah. Got rid of nearly all of the angry foreign types who were complaining about their extended stay now, too
16:03:20 Though it took 4 attempts to get Rachel's car out of my neighborhood including a beautiful 180 I engineered.
16:03:29 DuncanT: :-)
16:03:29 Video?
16:03:42 DuncanT: Nope sorry, was too busy driving the car.
16:03:46 #action jungleboyj to get a dashcam
16:03:48 hi
16:03:54 :-)
16:04:11 Ok, I know smcginnis is in Tokyo so he is probably not going to be here.
16:04:18 So, let's get started.
16:04:25 #topic announcements
16:04:47 First, my apologies for everyone who got caught up in the Cinder Dinner debacle.
16:05:01 Can't believe an Irish pub would run out of food and beer. :-(
16:05:06 Hi
16:05:17 Thanks to those of you who came and who tried to come.
16:05:31 I apparently was too optimistic that night.
16:05:47 I blame being a Minnesotan.
16:06:32 Also, I have started working on getting the Cinder Rocky priorities and spec review list put together. I have linked it at the top of the meeting agendas etherpad.
16:06:42 #link https://etherpad.openstack.org/p/cinder-spec-review-tracking
16:06:49 I will be talking about that a bit more later in the meeting.
16:07:00 jungleboyj: thanks
16:07:01 Also a note that we have moved to the Rocky etherpad for meeting agendas.
16:07:11 #link https://etherpad.openstack.org/p/cinder-rocky-meeting-agendas
16:07:48 Last, I have gotten a recap of all the decisions and action items put together from the PTG:
16:07:51 #link https://wiki.openstack.org/wiki/CinderRockyPTGSummary
16:08:12 If anyone has concerns or questions with the content there, please let me know.
16:08:43 I think that is all I had for announcements.
16:08:52 #topic PTG Wrap Up
16:09:10 Thank you to all of you that were able to make it there. I feel like it was a productive week.
16:09:27 Thank you all for being flexible and working around the issues that the Beast From the East caused.
16:09:47 Kind-of feel like we all shared living through a disaster movie. ;-)
16:10:13 As you can see from the links above I have worked to get all the info we discussed organized for this release already.
16:10:39 Anyone have anything else they want to share with regards to the PTG?
16:11:36 For those who couldn't be there, I will share that the Foundation is having trouble getting funding for the PTGs from sponsors and is, therefore, losing a lot of money on them.
16:11:59 Sounds like they will be having another PTG in the fall, location TBD, but not sure about after that.
16:12:07 :(
16:12:18 :(
16:12:20 I think if they stop doing PTGs this team would need to go back to doing mid-cycles.
16:12:35 Do people agree with that statement?
16:12:52 jungleboyj: +1 for mid-cycles in this case
16:13:24 jungleboyj: I agree
16:13:42 Anyone disagree?
16:14:46 +1 midcycles
16:15:02 Ok, so I am going to start putting a bug in my management's ear about possibly doing something in RTP February or so of next year.
16:15:28 That way we have at least started considering an option if the PTGs come to an end.
16:16:12 What are the odds of snow stopping play there?
16:16:27 #action jungleboyj to start back-up planning for a mid-cycle in RTP for spring 2019.
16:16:35 DuncanT: :-) Pretty low.
16:16:51 Could happen in January but now they are at the point where the temps are in the 50's.
16:17:30 #topic New processes for Rocky release.
16:17:44 So, I have taken some of the discussion from the PTG and decided to try something new.
16:17:58 #link https://etherpad.openstack.org/p/cinder-spec-review-tracking
16:18:15 If you look at the spec review tracking page you will see that I have extended the content somewhat.
16:18:42 Instead of just including specs and associated reviews I have also put sections in for each of the changes that we agreed upon at the PTG.
16:18:55 I often use that page as a starting point for my review priorities.
16:19:05 I am hoping the rest of the team can do the same.
16:19:43 If you have work you are doing that is related to one of the priority items, please add it to the review list there and then this will help the team to focus on getting the work we have agreed upon reviewed and merged.
16:20:03 I will be taking some time in the weekly meeting to quickly review it each week and to follow up on progress.
16:20:18 I have the list roughly in priority order.
16:20:33 I am leaving those that still need specs merged first in the 'Spec Review' list.
16:21:25 I hope this is helpful to the team. Think it is worth a try and will help avoid getting to the end of the release, me going through the list of stuff that we talked about at the PTG and then realizing we missed stuff like happened with Queens.
16:21:55 Any questions on that?
16:22:40 ooh mid cycles...
16:22:42 i'll take a look at it
16:22:49 * hemna_ catches up
16:23:39 eharney: Thanks.
16:23:52 Welcome hemna_ Glad you are still alive.
16:24:31 So, obviously this takes the community to help make successful. I will add reviews in there as I see them. Just feel like we need to add a little more guidance behind our work.
16:24:41 Don't want to kill everyone with process either though.
16:25:29 That was all I had there.
16:25:38 #topic cinder coresec list
16:25:49 Thank you to eharney for noting the list is out of date.
16:26:03 I have gotten admin control from jgriffith for our security reviewer list.
16:26:37 So, I think that leaving hemna_ eharney DuncanT on the list makes sense.
16:26:46 I added smcginnis as he was, hilariously, missing.
16:26:50 hehe
16:26:57 well he is kinda shifty....
16:27:01 hemna_: Yeah.
16:27:31 So, I am also thinking that thingee can maybe be removed as well as winston-d
16:27:50 heh winston-d
16:27:52 miss that guy
16:27:59 Add in e0ne, geguileo and tommylikehu ?
16:28:05 +1
16:28:05 hemna_: Yeah, me too.
16:28:43 Any objections to that plan?
16:28:55 eharney: Any concerns that that is too large a group to have?
16:29:06 no
16:29:18 Ok. So I am going to make those changes.
16:30:02 Thank you for the input there.
16:30:10 hemna sighting.
16:30:25 barely survived Dublin
16:30:33 hemna_: Will be here more often now.
16:30:36 Righ?
16:30:40 Right?
16:30:41 that's the plan :)
16:30:44 * jungleboyj glares at hemna_
16:30:58 You'll be in Dublin more often? Awesome
16:31:01 #topic deferred volume removal
16:31:16 arnewiebalck_: The floor is yours.
16:31:20 :)
16:31:50 I’d like to propose to support rbd’s deferred volume deletion in Cinder.
16:31:56 As an option.
16:32:14 it would be good to have feedback from eharney and jbernard on this
16:32:27 the proposal is that this is leveraged by the existing volume delete API?
16:32:35 e0ne: ++
16:32:39 Why I brought it here.
16:32:59 Can somebody explain the way it works, please?
16:33:12 the name alone gives me pause
16:33:23 from the docs: "Initial support for deferred image deletion via new rbd trash CLI commands. Images, even ones actively in-use by clones, can be moved to the trash and deleted at a later time."
16:33:53 when does that later time happen?
16:33:54 this sounds like something we should be investigating for the RBD driver, yes
16:33:59 after all outstanding operations are complete?
16:34:03 hemna: not by itself
16:34:19 arnewiebalck_: So what's the point?
16:34:23 the real deletion needs to be triggered
16:34:33 so delete.......no..now really delete...
16:34:47 this has the potential to help with some issues we've had in the driver when deleting volumes with dependencies, i think, but i'm not too familiar with it
16:34:48 DuncanT: The point is that “deletion” would be instantaneous from Cinder’s PoV.
16:34:57 move to trash is very fast
16:35:05 deletion happens in the background or later
16:35:06 So, Cinder thinks it is gone but later RBD finishes the process.
16:35:14 wiat
16:35:16 err
16:35:17 eharney: as I understood, it can help us to avoid 'rbd_flatten_volume_from_snapshot' param usage
16:35:18 right, it's lazy
16:35:37 deletion time becomes independent from volume size
16:35:39 wait....the real deletions happen automatically? or have to be triggered later (by cinder somehow?)
16:35:47 arnewiebalck_: What's the advantage? delete is async, and until it is really gone you can't reuse the storage, so all you're doing is lying to the user....
16:36:34 DuncanT: we’ve seen cases where heavy deletion of several large volumes can block other operations
16:36:38 if Cinder has to trigger the real delete later....how can it do that because the volume is marked deleted in the cinder DB.
16:36:38 DuncanT: that kind of happens already when data is shared between volumes and images, might be useful, we'll have to go look at design for this
16:36:49 the user can delete and recreate immediately
16:36:52 DuncanT: in case of Ceph, large volumes removal could be very long operation. in this case, cinder will mark such volume as deleted
16:36:55 at any rate this would just be a change in the driver, i believe
16:36:57 I’m not lying to the user :)
16:37:14 eharney: yes, this is driver specific
16:37:24 how/when is the trashed volume deleted?
16:37:33 arnewiebalck_: Until the volume is gone, they can't recreate if there is not enough space....
16:37:37 hemna_: At that point RBD handles the deletion from a separate command.
16:37:43 hemna: sth needs to trigger a purge
16:37:45 rbd trash purge
16:37:55 in other words, cinder has to issue that
16:38:03 :(
16:38:06 purge comes with Mimic, in Luminous it would be remove
16:38:08 * jungleboyj laughs ... it is like Window's recycle bin.
16:38:09 that's bad mmmkay
16:38:15 guys
16:38:29 eharney: Bring reason.
16:38:30 we don't need to try to hash out the whole implementation here right now
16:38:44 ++
16:38:59 eharney: jbernard is there customer demand for this?
16:39:00 I'm just trying to understand the basics of it
16:39:32 sounds like there would be if it gives a perf benefit
16:39:38 We don't need to hash out the whole implementation, but unless we can discuss the theory of operation, there's no point it being on the agenda....
16:40:10 :-) So, this feature is limited to RBD.
16:40:14 all i can say is it sounds like something interesting to look into
16:40:23 eharney: +1
16:40:30 eharney: but while on trash, is it still affecting the used space? because then it's affecting the scheduler over provisioning?
16:40:34 eharney: +1
16:40:42 eharney: +1
16:40:51 so the only perf benefit is that cinder marks it as deleted in its db and then ignores it forever, instead of waiting for a 'real' rbd delete image to finish?
16:41:08 Are the RBD deletes slow?
16:41:28 jungleboyj: for a 50TB volume they can be
16:41:29 we can easily synchronize this in the driver, would it be helpful if i send a mail to the list with more details?
16:41:30 RBD *** slow....
16:42:28 arnewiebalck_, is it more efficient to have 10 50TB volumes in the trash being purged? or 10 50TB volumes being deleted the original way?
16:42:48 arnewiebalck_: Does it also allow the user to control the load on the filesystem caused by deletes?
16:42:50 arnewiebalck_: so you tell Cinder to delete 50TB, cinder tells you is deleted, and you still don't have that space available, that would be odd for any Cinder user
16:42:55 hemna_: it’s more efficient from Cinder’s and the user’s point of view I’d say
16:42:55 I.E. delete at off peak times?
16:43:02 there is an expiration limit, which purge adheres to
16:43:09 I see no advantage to speeding up the transition from 'deleting' to 'deleted' if that is the only difference... If the transition to 'deleted' has finished, the space should be immediately available for reuse IMO
16:43:17 hemna_: I don’t think it’s faster on the backend
16:43:55 arnewiebalck_, ok, I guess that's what I'm trying to understand. the purge just happens async at some other time, and it's still as inefficient as a normal delete.
16:44:22 kicking the can down the road is all this does
16:44:24 DuncanT: if a user deletes a 50TB volume, he has to wait the hour it takes to delete it before recreating a new volume
16:44:51 DuncanT: Also, Cinder may run out of worker threads as they are all busy with deletions. We’ve seen this here.
16:45:04 hemna_: yes
16:45:14 arnewiebalck_: Are you telling me you are running out of 1000 threads in Cinder volume?
16:45:17 arnewiebalck_: But until the delete is finished, they're still using the storage, so they should still hold the quota
16:45:34 DuncanT: +1
16:45:44 DuncanT: why?
16:46:01 arnewiebalck_: The point of the quota is to limit how much storage a user can tie up... Seems like there's a trivial DoS attack with deferred delete....
16:46:09 DuncanT: if the space is needed, the trash can be purged any time
16:46:26 Ok ... team ... I think we are missing the point here.
16:46:40 jungleboyj: +1
16:46:49 this requires a spec
16:46:54 We have another topic to cover yet and I don't want to go spinning out of control.
16:46:56 and we can discuss it there
16:47:00 geguileo: ++
16:47:20 For those that know RBD this sounds like something that is somewhat sensible.
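[Background for the `rbd trash` workflow debated above: a minimal sketch using the python-rbd trash bindings shipped with Luminous and later (`trash_move`, `trash_list`, `trash_remove`). The pool name `volumes`, the image name, and the config path are illustrative only; this shows the Ceph-side mechanics, not the Cinder driver change being proposed.]

```python
# Sketch of Ceph's deferred deletion ("trash") mechanics, assuming a
# Luminous-or-newer cluster. Names below are made up for illustration.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('volumes')  # pool name is an assumption
    try:
        # "Delete" is near-instant from the caller's point of view: the image
        # is only moved to the trash, regardless of its size. `delay` sets the
        # deferment period (in seconds) that a later purge must respect.
        rbd.RBD().trash_move(ioctx, 'volume-1234', delay=0)

        # Space is only reclaimed when something later removes the trashed
        # images -- the `rbd trash purge` / `rbd trash rm` step from the
        # discussion. That trigger is the part Cinder (or an operator cron
        # job) would still have to own.
        for img in rbd.RBD().trash_list(ioctx):
            rbd.RBD().trash_remove(ioctx, img['id'])
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```

[The sketch is also why the quota and free-space concerns come up: until the second step runs, the data still occupies the pool even though the volume row in Cinder would already be marked deleted.]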
16:47:29 Specs are painful to have discussions on about something like this :-(
16:47:37 Way too slow round-trip time
16:48:02 Let's get a spec on this that explains better the semantics and then continue discussion here when we all understand the proposal.
16:48:22 DuncanT: Understood, but it is a vehicle for jbernard, eharney and arnewiebalck_ to educate us.
16:48:53 arnewiebalck_: You ok with getting that started and we can talk more in a week or two?
16:49:00 jungleboyj: sure
16:49:00 <_alastor_> Does this sound useful for any backend other than Ceph?
16:49:08 _alastor_: Good question.
16:49:13 it's a ceph driver/backend feature...
16:49:24 We've discussed deferred deletes before
16:49:29 As a general feature
16:49:44 I think the last person who proposed it wanted an un-delete feature
16:49:52 (Might even have been HP that proposed it)
16:50:00 Quota ended up being the issue
16:50:07 proposing a new feature/API sounds like a whole different discussion to me
16:50:21 eharney: ++
16:50:27 eharney: +1
16:50:28 Ok, so let's table this for today.
16:50:41 #action arnewiebalck_ to put together spec with better description.
16:50:54 #action jungleboyj to add to spec review list.
16:51:15 #action team to review the spec and we will discuss further in the meeting a week or two from now.
16:51:29 arnewiebalck_: Sound like a plan for you?
16:51:41 jungleboyj: perfect
16:51:51 thx
16:52:09 arnewiebalck_: Welcome. Thanks for bringing it up.
16:52:20 Ok, last topic:
16:52:35 #topic Reexamine policy rule of OWNER.
16:52:40 tommylikehu: ....
16:52:43 https://bugs.launchpad.net/cinder/+bug/1714858
16:52:43 Launchpad bug 1714858 in Cinder "Some APIs doesn't check the owner policy" [Medium,In progress] - Assigned to TommyLike (hu-husheng)
16:52:47 jungleboyj: thanks
16:52:57 hope this would be quick and easy one
16:53:03 #link https://bugs.launchpad.net/cinder/+bug/1714858
16:53:09 this is not a big issue, but will affect a lot of APIs.
16:53:27 the point is what the response code we want the end user to get when non-administrator try to query/update/delete resource that is out of his project scope.
16:53:50 and whether we want to make the policy of OWNER really works in cinder.
16:54:33 i'm not too clear on the status of this bug and its fix as of now
16:55:06 aren't there still concerns buried in there somewhere about force delete operations being allowed when they shouldn't be, etc? (if not other operations)
16:55:41 eharney: this bug is saying the policy of OWNER will always succeed
16:55:57 tommylikehu: but it doesn't spell out what that actually means
16:56:03 tommylikehu: So, the problem is that if it is admin or owner sometimes the owner isn't able to use the command with that policy?
16:56:11 which is one of the things that we need to provide so we can understand the impact
16:56:23 tommylikehu: ?
16:56:36 it isn't clear when reading that bug if there are a bunch of different vulnerable APIs or what
16:57:04 jungleboyj: the policy check will always think you are the owner
16:57:15 that sounds fairly severe to me
16:57:19 geguileo: +1
16:57:20 geguileo: Oy!
16:57:22 eharney: Yup
16:57:33 Yeah, that is a problem.
16:57:39 so how is this not a security bug?
16:58:00 eharney: Now I am agreeing with you.
16:58:06 but most operations won't succeed because we limit our retrieval by the context's project_id
16:58:16 so the policy passes
16:58:17 eharney: because we still can not handle the resource that out of my project scope cause it will raise 404
16:58:27 tommylikehu: for what operation?
16:58:27 and the get of the resource would raise a NotFound
16:58:39 this change was made in numerous places, we don't even have a clear list of which are broken
16:58:51 ^ that is the problem
16:59:16 I checked some and we were protected by the DB retrieval mechanism, but I didn't check them all
16:59:18 so it still sounds like it should be tracked as a security bug to me until it's proven that there is no vulnerability etc
16:59:26 the problem is policy OWNER doesn't work at all
16:59:43 that's an implementation problem, the question is, what vulnerabilities are deployers exposed to?
17:00:14 So, we are out of time.
17:00:21 Can we take this to the channel?
17:00:32 sure
17:00:39 Ok. Cool.
17:00:48 Thanks everyone for a good meeting.
17:00:52 Talk to you next week.
17:00:56 #endmeeting
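[To illustrate the failure mode from the last topic: an owner rule such as `project_id:%(project_id)s` only does its job if the target handed to the enforcer carries the project_id of the resource being acted on. If the caller's own project_id ends up in the target (or the field is missing), the owner check trivially passes and only the project-scoped DB lookup, the 404/NotFound mentioned above, protects the API. A minimal sketch with oslo.policy, the library Cinder's checks are built on; the rule names, tenant IDs, and the whole wiring below are hypothetical and are not Cinder's actual policy code.]

```python
# Minimal sketch of the OWNER-policy pitfall discussed above.
# Rule names and tenant IDs are hypothetical, for illustration only.
from oslo_config import cfg
from oslo_policy import policy

enforcer = policy.Enforcer(cfg.CONF)
enforcer.register_defaults([
    policy.RuleDefault('admin_or_owner',
                       'is_admin:True or project_id:%(project_id)s'),
    policy.RuleDefault('volume:delete', 'rule:admin_or_owner'),
])

creds = {'project_id': 'tenant-a'}  # the caller: a plain member of tenant-a

# Correct usage: the target carries the *resource's* project, so acting on
# another tenant's volume fails the owner check.
print(enforcer.enforce('volume:delete', {'project_id': 'tenant-b'}, creds))
# -> False

# Broken usage: the caller's own project is passed as the target, so the
# owner check always succeeds and only the project-scoped DB retrieval
# (raising NotFound) stands between the caller and the foreign resource.
print(enforcer.enforce('volume:delete', {'project_id': creds['project_id']},
                       creds))
# -> True
```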