14:00:38 #startmeeting cinder
14:00:39 Meeting started Wed Feb 5 14:00:38 2020 UTC and is due to finish in 60 minutes. The chair is whoami-rajat. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:40 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:42 The meeting name has been set to 'cinder'
14:00:42 hi :)
14:00:43 hi
14:00:52 #topic roll call
14:00:57 hi
14:01:01 hi
14:01:17 o/
14:01:28 hi
14:01:30 #link https://etherpad.openstack.org/p/cinder-ussuri-meetings
14:02:29 o/
14:03:10 Hi
14:03:26 hi
14:03:34 will wait for 2 more minutes before the announcements
14:04:29 i think we can move to announcements now
14:04:33 #topic Announcements
14:04:49 Ussuri milestone-2 is next week, Feb 10 - Feb 14 (specifically 13 February 2020 (23:59 UTC))
14:04:49 that is the deadline for a new driver or a target driver
14:04:50 the requirements for a driver to be complete are working code and unit tests merged into the cinder repo + a working third-party CI
14:04:50 additional info is in the mail
14:04:50 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-January/012055.html
14:05:38 any additional comments regarding this are welcome :)
14:06:43 okay, so moving on to the next announcement
14:06:59 code review policy for the py2->py3 transition
14:07:00 #link https://review.opendev.org/#/c/703709/
14:07:00 the main concern here is regarding backports
14:07:00 we need to have certain guidelines for code to work when backported to stable branches
14:07:26 this includes guidelines for features as well as bug fixes
14:09:56 let me know if i'm going too fast, but it seems like no discussion is needed around this either
14:10:56 so moving on
14:11:17 update to the driver removal policy
14:11:17 #link https://review.opendev.org/#/c/704906/
14:11:18 after some discussions around this topic over the past few weeks, we've finally decided to keep
14:11:18 unsupported drivers in-tree until they cause a major disturbance in the cinder gate, at which point they will be removed (given they have completed the deprecation cycle)
14:11:18 additional info is again mentioned in the patch
14:11:18 Think you are doing fine whoami-rajat :-)
14:11:47 o/
14:11:50 jungleboyj, thanks :D
14:12:01 rosmaita, yay!
14:12:17 thanks for getting the meeting going
14:12:22 Looks like I need to look at the driver removal patch again.
14:12:28 Will do that.
14:12:44 so we have some review requests for the final announcement
14:12:51 rosmaita, np
14:12:57 https://review.opendev.org/#/c/704425/ - fix a unit test blocking the sqlalchemy upgrade to 1.3.13
14:13:13 you want to elaborate on this, rosmaita?
14:13:42 no, just that we need to merge it soon, it's blocking all of openstack from upgrading sqlalchemy
14:13:59 looks like the problem was with one of our tests, not a real problem
14:14:27 ok
14:14:48 so, one more review request
14:14:49 https://review.opendev.org/#/c/705362/ - open the specs repo for Victoria
14:14:58 but this is approved
14:15:00 so never mind
14:15:13 ok, thanks
14:15:28 just read through the scrollback, you covered everything i wanted to say
14:15:31 thanks whoami-rajat
14:15:47 so i think with the last announcement i can hand over to rosmaita
14:15:51 Spec freeze exception granted to "support volume-local-cache"
14:15:59 o/
14:16:18 rosmaita, i was afraid you had more elaborate notes and i may have missed some things, but glad to hear that, phew
14:16:40 some questions came up on the spec last week before the freeze deadline
14:16:55 so i wanted to carry it over before we said yes or no
14:17:12 which brings us to our next topic
14:17:23 #topic volume-local-cache spec
14:17:56 one Nova engineer thinks they are moving toward mounting volumes directly via qemu
14:18:12 not mounting them to the host OS first
14:18:19 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-January/012279.html
14:18:44 also, here's a link to the latest draft of the spec:
14:18:52 #link https://review.opendev.org/#/c/684556/
14:19:36 so the issue is, if nova is planning to mount volumes directly via qemu, then that would completely bypass the cache in this spec, is that correct?
14:19:36 as i understand it, consuming cinder volumes via qemu instead of attaching them to the host is still blocked by the fact that qemu doesn't support multipath
14:19:45 but it has been a goal for a bit
14:19:51 rosmaita: yes
14:20:29 I know they've been pushing for that for a long time, but last I looked, there were multiple reasons NOT to go with direct QEMU mounting of volumes.
14:21:21 so it may not be as much of a done deal as is implied in that email?
14:21:44 i think everyone agrees it would be a better way to do things
14:21:54 multipath support is kind of a big deal, though
14:21:59 right
14:22:01 currently only rbd and sheepdog are mounted directly by qemu
14:22:10 and sheepdog is no more
14:22:56 I don't think it supported FC either.
14:23:07 "The storage protocol that won't die"
14:23:10 ok, so my reason for putting this on the agenda is to ask: is this a reason to hold up Liang's spec?
14:23:14 either way, i noted a handful of concerns on this spec that are generally around the theme that there are a lot of things you have to account for when getting into the data path of cinder volumes that don't seem to be sorted out thoroughly yet
14:23:41 the encryption layering one being one of the biggest issues in my mind
14:24:06 yes, it is definitely important to get that right
14:24:25 eharney: did you see the update yet? does it address your concerns?
14:24:32 i haven't looked at it yet
14:24:39 ok
14:25:55 i think other than encryption, the major concern is migration?
14:26:42 yes, it's not been clear to me whether migration actually works; one of the nova folks pointed out that you have to have a mechanism to flush the cache during migration
14:26:59 Seems important.
14:27:19 otherwise you leave a bunch of data in the cache on the source host during migration that doesn't show up on the destination host
14:27:36 which would not work at all
14:27:43 which would be a bummer indeed
14:28:22 write-back mode works like this
14:28:35 my feeling is that these major issues need to be understood at the spec stage, we probably shouldn't try to work them out in the implementation phase
14:28:39 write-through mode will not :)
14:29:10 so the "safe modes" should migrate OK?
14:29:17 yes
14:29:36 no dirty data in the cache in write-through mode
14:29:40 ok, and the current proposal is that we would only support "safe modes"
14:29:40 the same issue crops up with consistency groups
14:29:45 flushing shouldn't be necessary in write-through, right?
14:29:56 yes
14:30:14 no need to flush for write-through
14:30:19 and snapshots "work" but may surprise users
14:30:57 (all for the same reason)
14:31:13 just to make sure i understand
14:31:21 in write-through/safe mode, every write IO will go to the backend
14:31:24 in the "safe modes", snapshots should be OK, right?
14:31:41 yes, all of this would work normally in write-through mode
14:31:45 in safe modes, the cache is just like a read-only cache
14:31:53 ok, just wanted to be clear
14:32:13 so the problem is that to get the best benefit from the cache, all sorts of stuff could break
14:32:22 migrations, snapshots
14:32:39 is restricting to safe caching modes too big a restriction?
14:32:43 what i mean is
14:32:52 migrations would break, snapshots would succeed but have older data in them than you expected
14:33:15 right, and across a group, wouldn't necessarily be consistent any more
14:33:18 right
14:33:35 good thing we just call them "groups" now :)
14:33:56 well that's a separate thing
14:34:24 yeah, that was a bad joke
14:34:24 This is sounding concerning.
14:34:27 eharney: I still don't understand why older data than expected
14:34:49 there is no newer data in the cache at any moment
14:35:22 because snapshots are performed on the backend, and a snapshot will be created for the volume with the data that was written there, but data in the cache (which the user of the instance thinks has been written there) won't be in the snap (when in write-back mode)
14:36:16 in write-back mode yes, safe mode will not
14:36:27 right
14:36:33 :)
14:36:34 also, i couldn't find any use case regarding the write-back cache in the spec
14:36:52 we will not support write-back mode
14:36:58 i asked LiangFang to remove it because we aren't supporting it any more
14:37:15 ceph supports a client-side read-only cache
14:37:25 oh ok
14:37:26 but it is using DRAM as the cache
14:37:56 volume local cache is just something like a read-only cache
14:38:11 but using persistent memory or a fast ssd
14:38:39 I know the ceph read-only cache is a newly developed feature
14:39:19 here's a question: could we support the "unsafe" modes later on by, for example, making sure the cache is flushed before a snapshot, or would that start to get too complicated? what i mean is, is there a theoretical reason other than complexity why this couldn't be done reliably?
14:39:48 it definitely could be done, we just need hooks to request cache flushes in the right places
14:40:06 so it could be possible to implement this in phases
14:40:20 Is there a reason for the urgency here if there is a safer way to get this done?
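
To make the write-through vs. write-back distinction above concrete, here is a minimal, purely illustrative Python sketch; the class and function names are hypothetical and not from the spec or any real cache implementation.

    # Toy model of a local cache sitting in front of a cinder backend
    # (hypothetical; for illustration of the discussion above only).
    class LocalVolumeCache:
        def __init__(self, write_through=True):
            self.write_through = write_through
            self.cache = {}     # block -> data on the local fast device
            self.dirty = set()  # blocks in cache not yet on the backend
            self.backend = {}   # block -> data on the cinder backend

        def write(self, block, data):
            self.cache[block] = data
            if self.write_through:
                # "safe" mode: every write IO also goes to the backend,
                # so the cache never holds data the backend lacks
                self.backend[block] = data
            else:
                # write-back: data sits dirty in the cache until a flush
                self.dirty.add(block)

        def flush(self):
            # the "hook" that would have to run before a snapshot or a
            # migration to make write-back mode safe
            for block in self.dirty:
                self.backend[block] = self.cache[block]
            self.dirty.clear()

    def backend_snapshot(cache):
        # snapshots are taken on the backend, not on the local cache
        return dict(cache.backend)

    wb = LocalVolumeCache(write_through=False)
    wb.write(0, "new data")
    print(backend_snapshot(wb))  # {} -- older data than the user expects
    wb.flush()
    print(backend_snapshot(wb))  # {0: 'new data'} -- consistent after flush

    wt = LocalVolumeCache(write_through=True)
    wt.write(0, "new data")
    print(backend_snapshot(wt))  # {0: 'new data'} -- always consistent

This is why the snapshot/migration/consistency-group concerns all collapse to the same point: in write-through mode the backend is never behind the cache, while write-back mode is only safe if a flush hook runs at the right places.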
14:40:36 because i think operators and users are going to want to use the unsafe caching modes
14:40:57 would be interesting to know how widely used write-back caching is in Nova now
14:41:06 jungleboyj: mainly that we have to coordinate with nova to get it to actually work
14:41:10 Ok. Then we need to make sure to implement it so that they don't shoot themselves in the foot in the process. :-)
14:41:26 so nova doesn't want to approve changes unless we have approved it on our side
14:41:50 Sounds like we may need some cross-team meetings to work through all the intricacies like we had to do with multiattach.
14:42:07 smcginnis: ++
14:42:11 smcginnis: ++
14:42:22 ok
14:42:25 ++
14:43:17 ok, i think this is worth pursuing during this cycle? anyone disagree?
14:44:08 seems worthwhile
14:44:18 what i mean is, having the discussions, not waiting for the PTG
14:44:37 Seems worthwhile if we can do it in a safe manner.
14:44:46 i would be much happier if we also had another reference implementation like dm-cache to test along with it, but that's probably dreaming too much :)
14:44:47 ok, i think the next move is to have a bluejeans conference
14:45:31 eharney: dan from Nova mentioned dm-crypt not working
14:45:34 could i get names of people who definitely would want to attend, and their time zones?
14:45:53 that will help me offer some choices on a poll
14:46:00 for the meeting day/time
14:46:13 i can (EST)
14:46:14 eharney: the main issue is: the backend volume should not contain any metadata
14:46:30 IST +05:30 UTC
14:46:55 EST
14:46:58 UTC+8
14:47:44 i think encrypted volumes already don't follow that, but we can figure it out later
14:48:02 ok
14:48:25 ok, i'll look at the nova spec and see who's commented
14:48:41 thanks
14:48:54 i'll get a poll out later today or early tomorrow
14:49:29 #topic resource_filters response is inaccurate
14:49:54 this was implemented before i started working on cinder
14:50:00 context for this is whoami-rajat's patch fixing a problem in the volume-transfers API
14:50:18 what i noticed is this
14:50:35 #link https://docs.openstack.org/api-ref/block-storage/v3/?expanded=list-resource-filters-detail#resource-filters
14:50:42 #link https://review.opendev.org/#/c/703658/
14:50:51 that's what our resource_filters response gives you
14:51:05 and actually, what most people would see is really this:
14:51:17 #link https://opendev.org/openstack/cinder/src/branch/master/etc/cinder/resource_filters.json
14:51:27 because it was designed to be operator-configurable
14:51:51 according to the api-ref, the value of the "resource" element is supposed to be "Resource which the filters will be applied to"
14:52:06 the problem is that all resources mentioned in our API URIs are plural ("volumes", "snapshots", "backups") whereas all the resources in the file are singular ("volume", "snapshot", "backup")
14:52:24 in some ways, this is a minor point
14:52:58 but i also noticed that the volume-transfers API doesn't implement the resource_filters framework that (most of) the other list-resource calls do
14:53:18 so, we should get volume-transfers into that resource_filters response
14:53:43 which brings up the question: "volume-transfer" or "volume-transfers"?
14:53:55 (yes, it's a hyphen '-', not an underscore
14:53:59 )
14:54:37 one issue is, how can we change the resource_filters response?
14:54:57 but, my take is that since it was designed to be configurable, the response doesn't have to be microversioned
14:55:15 that is, whether it's available or not can be microversioned (i think it may be)
14:55:25 but we can correct the response without a new mv
14:55:45 i feel we should remove the plural 's', that makes more sense, volume name vs volumes name
14:55:48 (sorry, i kind of obsess over API issues)
14:56:17 well, the URL paths are all plural
14:56:24 I'm not sure what value we would get out of a change like that
14:56:28 and the question is, what does the filter apply to?
14:56:34 it's been that way forever
14:56:38 for better or worse
14:57:09 it's just kind of weird that we're giving a list of filters you can use, but there's no actual resource with that name
14:57:21 but i can live with it if it doesn't bother anyone else
14:57:43 i would agree it doesn't need to be microversioned, but tempest definitely disagreed with me the last time i went down a similar path
14:57:44 it is odd
14:57:45 wouldn't that change break a lot of clients expecting volumes vs volume?
14:58:09 wouldn't those clients already break if you just removed the config for volume now?
14:58:12 so the change is to what shows up in the resource_filters response, not the API path
14:58:39 i think programmatically, what you would want to do is match the resource in the URL to the resource listed in the response
14:58:51 i don't think the API says it must contain any particular field like "volume"
14:58:51 right now, you have to know to remove the 's' to find it
14:58:59 right, but they would be expecting the key volumes in the response and if we changed it to volume, that would break their app/client/call/expectation
14:59:09 1 minute
14:59:20 hemna: the other way around, but i get your point
14:59:46 ok, we can continue this later, looks like some pro, some con
14:59:51 will have to be worked out
15:00:00 and i have prevented open discussion again
15:00:08 Thanks rosmaita
15:00:19 anyone with other issues, please move over to the cinder channel
15:00:31 whoami-rajat: i think you need to end the meeting
15:00:37 oh ok
15:00:49 #endmeeting
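
As a postscript to the resource_filters discussion: a hypothetical Python sketch of the plural/singular mismatch rosmaita describes. The response shape follows the api-ref linked above, but the filter lists are illustrative only, since the real values come from the operator-configurable etc/cinder/resource_filters.json.

    # Hypothetical sketch of the mismatch discussed above; filter names
    # are illustrative, not taken from the real resource_filters.json.
    RESOURCE_FILTERS_RESPONSE = {
        "resource_filters": [
            {"resource": "volume", "filters": ["name", "status", "metadata"]},
            {"resource": "snapshot", "filters": ["name", "status", "volume_id"]},
            {"resource": "backup", "filters": ["name", "status", "volume_id"]},
        ]
    }

    def filters_for(url_path, response=RESOURCE_FILTERS_RESPONSE):
        """Find the usable filters for a list call like GET .../volumes.

        The resource in the URL is plural ("volumes") but the "resource"
        values in the response are singular ("volume"), so a client has
        to know to strip the trailing 's' to find a match.
        """
        resource = url_path.rstrip("/").rsplit("/", 1)[-1]  # e.g. "volumes"
        singular = resource[:-1] if resource.endswith("s") else resource
        for entry in response["resource_filters"]:
            if entry["resource"] == singular:  # the "remove the 's'" step
                return entry["filters"]
        return []

    print(filters_for("/v3/{project_id}/volumes"))  # ['name', 'status', 'metadata']

Note that the same stripping rule would map a "volume-transfers" URL to a "volume-transfer" entry, which is exactly the naming question ("volume-transfer" or "volume-transfers") left open at the end of the meeting.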