#openstack-meeting-4 log

14:00:38 <whoami-rajat> #startmeeting cinder
14:00:39 <openstack> Meeting started Wed Feb  5 14:00:38 2020 UTC and is due to finish in 60 minutes.  The chair is whoami-rajat. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:42 <openstack> The meeting name has been set to 'cinder'
14:00:42 <m5z> hi :)
14:00:43 <e0ne> hi
14:00:52 <whoami-rajat> #topic roll call
14:00:57 <enriquetaso> hi
14:01:01 <LiangFang> hi
14:01:17 <smcginnis> o/
14:01:28 <eharney> hi
14:01:30 <whoami-rajat> #link https://etherpad.openstack.org/p/cinder-ussuri-meetings
14:02:29 <jungleboyj> o/
14:03:10 <anastzhyr> Hi
14:03:26 <tosky> hi
14:03:34 <whoami-rajat> will wait for 2 more minutes before the announcements
14:04:29 <whoami-rajat> i think we can move to announcements now
14:04:33 <whoami-rajat> #topic Announcements
14:04:49 <whoami-rajat> Ussuri milestone-2 is next week Feb 10 - Feb 14 (specifically 13 February 2020 (23:59 UTC))
14:04:49 <whoami-rajat> that implies deadline for a new driver or a target driver
14:04:50 <whoami-rajat> requirements for a driver to be complete are working code and unit tests merged into cinder repo + working third party CI
14:04:50 <whoami-rajat> additional info in the mail
14:04:50 <whoami-rajat> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-January/012055.html
14:05:38 <whoami-rajat> any additional comments regarding this are welcome :)
14:06:43 <whoami-rajat> okay, so moving on to the next announcement
14:06:59 <whoami-rajat> code review policy for py2->py3 transition
14:07:00 <whoami-rajat> #link https://review.opendev.org/#/c/703709/
14:07:00 <whoami-rajat> The main concern here is regarding backports
14:07:00 <whoami-rajat> we need to have certain guidelines for code to work when backported to stable branches
14:07:26 <whoami-rajat> this includes guidelines for features as well as bug fixes
14:09:56 <whoami-rajat> let me know if i'm going too fast but seems like no discussion is needed around this too
14:10:56 <whoami-rajat> so moving on
14:11:17 <whoami-rajat> update to driver removal policy
14:11:17 <whoami-rajat> #link https://review.opendev.org/#/c/704906/
14:11:18 <whoami-rajat> with some discussions around this topic from past few weeks, we've finally decided to keep
14:11:18 <whoami-rajat> unsupported drivers in-tree until they cause major disturbance in cinder gate then they will be removed (given they have completed the deprecation cycle)
14:11:18 <whoami-rajat> additional info is again mentioned in the patch
14:11:18 <jungleboyj> Think you are doing fine whoami-rajat  :-)
14:11:47 <rosmaita> o/
14:11:50 <whoami-rajat> jungleboyj, thanks :D
14:12:01 <whoami-rajat> rosmaita, yay!
14:12:17 <rosmaita> thanks for getting the meeting going
14:12:22 <jungleboyj> Looks like I need to look at the driver removal patch again.
14:12:28 <jungleboyj> Will do that.
14:12:44 <whoami-rajat> so we've some review requests for the final announcement
14:12:51 <whoami-rajat> rosmaita, np
14:12:57 <whoami-rajat> https://review.opendev.org/#/c/704425/ - fix a unit test blocking sqlalchemy upgrade to 1.3.13
14:13:13 <whoami-rajat> you want to elaborate on this rosmaita ?
14:13:42 <rosmaita> no, just that we need to merge it soon, it's blocking all of openstack from upgrading sqlalchemy
14:13:59 <rosmaita> looks like the problem was with one of our tests, not a real problem
14:14:27 <whoami-rajat> ok
14:14:48 <whoami-rajat> so one more review request
14:14:49 <whoami-rajat> https://review.opendev.org/#/c/705362/ - open specs repo for Victoria
14:14:58 <whoami-rajat> but this is approved
14:15:00 <whoami-rajat> so nevermind
14:15:13 <rosmaita> ok, thanks
14:15:28 <rosmaita> just read through the scrollback, you covered everything i wanted to say
14:15:31 <rosmaita> thanks whoami-rajat
14:15:47 <whoami-rajat> so i think with the last announcement i can hand over to rosmaita
14:15:51 <whoami-rajat> Spec freeze exception granted to "support volume-local-cache"
14:15:59 <rajinir> o/
14:16:18 <whoami-rajat> rosmaita, i was afraid you had more elaborate notes and that i might have missed some things, but glad to hear that, phew
14:16:40 <rosmaita> some questions came up on the spec last week before the freeze deadline
14:16:55 <rosmaita> so i wanted to carry it over before we said yes or no
14:17:12 <rosmaita> which brings us to our next topic
14:17:23 <rosmaita> #topic volume-local-cache spec
14:17:56 <LiangFang> one Nova engineer thinks they are moving volumes to mount directly by qemu
14:18:12 <LiangFang> not mount to host os first
14:18:19 <rosmaita> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-January/012279.html
14:18:44 <rosmaita> also, here's a link to the latest draft of the spec:
14:18:52 <rosmaita> #link https://review.opendev.org/#/c/684556/
14:19:36 <rosmaita> so the issue is, if nova is planning to mount volumes directly by qemu, then that would completely bypass the cache in this spec, is that correct?
14:19:36 <eharney> as i understand it, consuming cinder volumes via qemu instead of attaching them to the host is still blocked by the fact that qemu doesn't support multipath
14:19:45 <eharney> but it has been a goal for a bit
14:19:51 <LiangFang> rosmaita: yes
14:20:29 <smcginnis> I know they've been pushing for that for a long time, but last I looked, there were multiple reasons NOT to go with direct QEMU mounting of volumes.
14:21:21 <rosmaita> so it may not be as much of a done deal as is implied in that email?
14:21:44 <eharney> i think everyone agrees it would be a better way to do things
14:21:54 <rosmaita> multipath support is kind of a big deal, though
14:21:59 <eharney> right
14:22:01 <LiangFang> currently only rbd and sheepdog are mounted directly by qemu
14:22:10 <rosmaita> and sheepdog is no more
14:22:56 <smcginnis> I don't think it supported FC either.
14:23:07 <smcginnis> "The storage protocol that won't die"
14:23:10 <rosmaita> ok, so my reason in putting this on the agenda is to ask: is this a reason to hold up Liang's spec?
14:23:14 <eharney> either way, i noted a handful of concerns on this spec that are generally around the theme that there are lot of things you have to account for when getting into the data path of cinder volumes that don't seem to be sorted out thoroughly yet
14:23:41 <eharney> the encryption layering one being one of the biggest issues in my mind
14:24:06 <rosmaita> yes, it is definitely important to get that right
14:24:25 <rosmaita> eharney: did you see the update yet? does it address your concerns?
14:24:32 <eharney> i haven't looked at it yet
14:24:39 <rosmaita> ok
14:25:55 <rosmaita> i think other than encryption, the major concern is migration?
14:26:42 <eharney> yes, it's not been clear to me whether migration actually works, one of the nova folks pointed out that you have to have a mechanism to flush the cache during migration
14:26:59 <smcginnis> Seems important.
14:27:19 <eharney> otherwise you leave a bunch of data in the cache on the source host during migration that doesn't show up on the destination host
14:27:36 <eharney> which would not work at all
14:27:43 <rosmaita> which would be a bummer indeed
14:28:22 <LiangFang> write-back mode is working like this
14:28:35 <rosmaita> my feeling is that these major issues need to be understood at the spec stage, we probably shouldn't try to work them out in the implementation phase
14:28:39 <LiangFang> write-through mode will not :)
14:29:10 <rosmaita> so the "safe modes" should migrate OK?
14:29:17 <LiangFang> yes
14:29:36 <LiangFang> no dirty data in cache in write-through mode
14:29:40 <rosmaita> ok, and the current proposal is that we would only support "safe modes"
14:29:40 <eharney> the same issue crops up with consistency groups
14:29:45 <whoami-rajat> flushing shouldn't be necessary in write-through right?
14:29:56 <LiangFang> yes
14:30:14 <LiangFang> no need flush for write-through
14:30:19 <eharney> and snapshots "work" but may surprise users
14:30:57 <eharney> (all for the same reason)
14:31:13 <rosmaita> just to make sure i understand
14:31:21 <LiangFang> in write-through/safe mode, every write io will go to backend
14:31:24 <rosmaita> in the "safe modes", snapshots should be OK, right?
14:31:41 <eharney> yes all of this would work normally in write-through mode
14:31:45 <LiangFang> in safe modes, cache just like read only cache
14:31:53 <rosmaita> ok, just wanted to be clear
14:32:13 <rosmaita> so the problem is that to get the best benefit from the cache, all sorts of stuff could break
14:32:22 <rosmaita> migrations, snapshots
14:32:39 <rosmaita> is restricting to safe caching modes too big a restriction?
14:32:43 <rosmaita> what i mean is
14:32:52 <eharney> migrations would break, snapshots would succeed but have older data in them than you expected
14:33:15 <rosmaita> right, and across a group, wouldn't necessarily be consistent any more
14:33:18 <eharney> right
14:33:35 <rosmaita> good thing we just call them "groups" now :)
14:33:56 <eharney> well that's a separate thing
14:34:24 <rosmaita> yeah, that was a bad joke
14:34:24 <jungleboyj> This is sounding concerning.
14:34:27 <LiangFang> eharney: I still don't understand why older data than expected
14:34:49 <LiangFang> no newer data in cache in any moment
14:35:22 <eharney> because snapshots are performed on the backend, and a snapshot will be created for the volume with the data that was written there but data in the cache (which the user of the instance thinks has been written there) won't be in the snap  (when in write-back mode)
14:36:16 <LiangFang> write-back mode yes, safe-mode will not
14:36:27 <eharney> right
14:36:33 <LiangFang> :)
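
The difference between the two cache modes discussed above can be illustrated with a toy model. This is only a sketch for readers of the log: the `Backend` and `CachedVolume` classes are hypothetical names invented for illustration and are not part of cinder, nova, or the spec. It shows why a backend snapshot taken in write-back mode can miss data the guest believes is written, and why write-through (the "safe mode") does not have that problem.

```python
class Backend:
    """Hypothetical stand-in for the storage backend holding the volume."""

    def __init__(self):
        self.blocks = {}

    def snapshot(self):
        # Snapshots are taken on the backend, so they only see backend data.
        return dict(self.blocks)


class CachedVolume:
    """Hypothetical host-local cache sitting in front of a backend volume."""

    def __init__(self, backend, mode):
        if mode not in ("write-through", "write-back"):
            raise ValueError("unknown cache mode: %s" % mode)
        self.backend = backend
        self.mode = mode
        self.cache = {}
        self.dirty = set()

    def write(self, lba, data):
        self.cache[lba] = data
        if self.mode == "write-through":
            # Every write also goes to the backend; the cache never holds
            # data that the backend lacks.
            self.backend.blocks[lba] = data
        else:
            # Write-back: the backend is updated only on flush.
            self.dirty.add(lba)

    def read(self, lba):
        if lba in self.cache:
            return self.cache[lba]
        data = self.backend.blocks.get(lba)
        if data is not None:
            self.cache[lba] = data
        return data

    def flush(self):
        # Push all dirty blocks to the backend.
        for lba in sorted(self.dirty):
            self.backend.blocks[lba] = self.cache[lba]
        self.dirty.clear()
```

In this model a snapshot or migration is only correct in write-back mode if `flush()` runs first, which is the kind of hook the discussion goes on to mention.
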
14:36:34 <whoami-rajat> also i couldn't find any usecase regarding the write-back cache in the spec
14:36:52 <LiangFang> will not support write-back mode
14:36:58 <rosmaita> i asked LiangFang to remove it because we aren't supporting it any more
14:37:15 <LiangFang> ceph support client side read only cache
14:37:25 <whoami-rajat> oh ok
14:37:26 <LiangFang> but it is using DRAM as cache
14:37:56 <LiangFang> volume local cache is just something like read-only cache
14:38:11 <LiangFang> but using persistent memory or fast ssd
14:38:39 <LiangFang> I know ceph read only cache is a new feature just developed
14:39:19 <rosmaita> here's a question: could we support the "unsafe" modes later on by, for example, making sure cache is flushed before a snapshot, or would that start to get too complicated?  what i mean is, is there a theoretical reason other than complexity for why this couldn't be done reliably?
14:39:48 <eharney> it definitely could be done, we just need hooks to request cache flushes in the right places
14:40:06 <rosmaita> so it could be possible to implement this in phases
14:40:20 <jungleboyj> Is there a reason for the urgency here if there is a safer way to get this done?
14:40:36 <rosmaita> because i think operators and users are going to want to use the unsafe caching modes
14:40:57 <eharney> would be interesting to know how widely used writeback caching is in Nova now
14:41:06 <rosmaita> jungleboyj: mainly that we have to coordinate with nova to get it to actually work
14:41:10 <jungleboyj> Ok.  Then we need to make sure to implement it so that they don't shoot themselves in the foot in the process.  :-)
14:41:26 <rosmaita> so nova doesn't want to approve changes unless we have approved it on our side
14:41:50 <smcginnis> Sounds like we may need some cross-team meetings to work through all the intricacies like we had to do with multiattach.
14:42:07 <rosmaita> smcginnis: ++
14:42:11 <jungleboyj> smcginnis:  ++
14:42:22 <LiangFang> ok
14:42:25 <LiangFang> ++
14:43:17 <rosmaita> ok, i think this is worth pursuing during this cycle?  anyone disagree?
14:44:08 <eharney> seems worthwhile
14:44:18 <rosmaita> what i mean is, having the discussions, not waiting for the PTG
14:44:37 <jungleboyj> Seems worthwhile if we can do it in a safe manner.
14:44:46 <eharney> i would be much happier if we also had another reference implementation like dm-cache to test along with it, but that's probably dreaming too much :)
14:44:47 <rosmaita> ok, i think the next move is to have a bluejeans conference
14:45:31 <LiangFang> eharney: dan from Nova mentioned dm-crypt not working
14:45:34 <rosmaita> could i get names of people who definitely would want to attend and their time zones
14:45:53 <rosmaita> will help me offer some choices on a poll
14:46:00 <rosmaita> for meeting day/time
14:46:13 <eharney> i can (EST)
14:46:14 <LiangFang> eharney: main issue is: the backend volume should not contain any metadata
14:46:30 <whoami-rajat> IST +05:30 UTC
14:46:55 <rosmaita> EST
14:46:58 <LiangFang> UTC+8
14:47:44 <eharney> i think encrypted volumes already don't follow that, but we can figure it out later
14:48:02 <LiangFang> ok
14:48:25 <rosmaita> ok, i'll look on the nova spec and see who's commented
14:48:41 <LiangFang> thanks
14:48:54 <rosmaita> i'll get a poll out later today or early tomorrow
14:49:29 <rosmaita> #topic resource_filters response is inaccurate
14:49:54 <rosmaita> this was implemented before i started working on cinder
14:50:00 <rosmaita> context for this is whoami-rajat's patch fixing a problem in the volume-transfers API
14:50:18 <rosmaita> what i noticed is this
14:50:35 <rosmaita> #link https://docs.openstack.org/api-ref/block-storage/v3/?expanded=list-resource-filters-detail#resource-filters
14:50:42 <whoami-rajat> #link https://review.opendev.org/#/c/703658/
14:50:51 <rosmaita> that's what our resource_filters response gives you
14:51:05 <rosmaita> and actually, what most people would see is really this:
14:51:17 <rosmaita> #link https://opendev.org/openstack/cinder/src/branch/master/etc/cinder/resource_filters.json
14:51:27 <rosmaita> because it was designed to be operator-configurable
14:51:51 <rosmaita> according to the api-ref, the value of the "resource" element is supposed to be "Resource which the filters will be applied to"
14:52:06 <rosmaita> the problem is that all resources mentioned in our API URIs are plural ("volumes", "snapshots", "backups") whereas all the resources in the file are singular ("volume", "snapshot", "backup")
14:52:24 <rosmaita> in some ways, this is a minor point
14:52:58 <rosmaita> but i also noticed that the volume-transfers API doesn't implement the resource_filters framework that (most) of the other list-resource calls do
14:53:18 <rosmaita> so, we should get the volume_transfers into that resource_filters response
14:53:43 <rosmaita> which brings up the question: "volume-transfer" or "volume-transfers"
14:53:55 <rosmaita> (yes, it's a hyphen '-', not an underscore
14:53:59 <rosmaita> )
14:54:37 <rosmaita> one issue is, how can we change the resource_filters response?
14:54:57 <rosmaita> but, my take is that since it was designed to be configurable, the response doesn't have to be microversioned
14:55:15 <rosmaita> that is, whether it's available or not can be microversioned (i think it may be)
14:55:25 <rosmaita> but we can correct the response without a new mv
14:55:45 <whoami-rajat> i feel we should remove the plural 's', that makes more sense, volume name vs volumes name
14:55:48 <rosmaita> (sorry, i kind of obsess over API issues)
14:56:17 <rosmaita> well, the URL paths are all plural
14:56:24 <hemna> I'm not sure what value we would get out of a change like that
14:56:28 <rosmaita> and the question is, what does the filter apply to?
14:56:34 <hemna> it's been that way forever
14:56:38 <hemna> for better or worse
14:57:09 <rosmaita> it's just kind of weird that we're giving a list of filters you can use, but there's no actual resource with that name
14:57:21 <rosmaita> but i can live with it if it doesn't bother anyone else
14:57:43 <eharney> i would agree it doesn't need to be microversioned, but tempest definitely disagreed with me the last time i went down a similar path
14:57:44 <eharney> it is odd
14:57:45 <hemna> wouldn't that change break a lot of clients expecting volumes vs volume ?
14:58:09 <eharney> wouldn't those clients already break if you just removed the config for volume now?
14:58:12 <rosmaita> so the change is to what shows up in the resource_filters response, not the API path
14:58:39 <rosmaita> i think programmatically, what you would want to do is match the resource in the URL to the resource listed in the response
14:58:51 <eharney> i don't think the API says it must contain any particular field like "volume"
14:58:51 <rosmaita> right now, you have to know to remove the 's' to find it
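
The mismatch described here can be made concrete with a small sketch. It assumes the resource_filters response is a list of entries with "resource" and "filters" keys, as in the api-ref linked above; the sample entries and the helper name `filters_for_url_resource` are made up for illustration and are not cinder or client code.

```python
# Illustrative sample only; a real deployment's response depends on the
# operator's resource_filters.json.
SAMPLE_RESPONSE = {
    "resource_filters": [
        {"resource": "volume", "filters": ["name", "status"]},
        {"resource": "snapshot", "filters": ["name", "status", "volume_id"]},
    ]
}


def filters_for_url_resource(response, url_resource):
    """Look up the filters for a plural URL resource such as "volumes".

    The caller has to know that the response uses the singular form, so the
    trailing "s" is stripped before matching. This naive rule is the
    workaround under discussion, not a general pluralization scheme.
    """
    if url_resource.endswith("s"):
        singular = url_resource[:-1]
    else:
        singular = url_resource
    for entry in response.get("resource_filters", []):
        if entry.get("resource") == singular:
            return list(entry.get("filters", []))
    # No entry: the resource is not configured (or not covered at all).
    return None
```

With the sample above, looking up "volumes" finds the "volume" entry, while a resource that has no entry simply returns None. If the response used the same plural names as the URL paths, the stripping step would be unnecessary.
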
14:58:59 <hemna> right, but they would be expecting the key volumes in the response and if we changed it to volume, that would break their app/client/call/expectation
14:59:09 <smcginnis> 1 minute
14:59:20 <rosmaita> hemna: the other way around, but i get your point
14:59:46 <rosmaita> ok, we can continue this later, looks like some pro, some con
14:59:51 <rosmaita> will have to be worked out
15:00:00 <rosmaita> and i have prevented open discussion again
15:00:08 <whoami-rajat> Thanks rosmaita
15:00:19 <rosmaita> anyone with other issues, please move over to cinder channel
15:00:31 <rosmaita> whoami-rajat: i think you need to end the meeting
15:00:37 <whoami-rajat> oh ok
15:00:49 <whoami-rajat> #endmeeting