14:00:35 <rosmaita> #startmeeting cinder
14:00:36 <openstack> Meeting started Wed Jul  8 14:00:35 2020 UTC and is due to finish in 60 minutes.  The chair is rosmaita. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:39 <openstack> The meeting name has been set to 'cinder'
14:00:47 <jungleboyj> o/
14:00:55 <rosmaita> #topic roll call
14:00:55 <e0ne> hi
14:00:57 <whoami-rajat> Hi
14:01:00 <geguileo> hi! o/
14:01:27 <eharney> hi
14:01:29 <tosky> hi
14:02:10 <rosmaita> ok, looks like we have some people
14:02:13 <rosmaita> hello everyone
14:02:31 <rosmaita> #link https://etherpad.openstack.org/p/cinder-victoria-meetings
14:03:05 <rosmaita> i'm at a coffee shop due to a power outage
14:03:15 <smcginnis> o/
14:03:19 <rosmaita> so not using my usual keyboard, as you will notice
14:03:36 <rosmaita> ok, let's get started
14:03:45 <rosmaita> #topic updates
14:03:50 <jungleboyj> rosmaita,  You can go to coffee shops?
14:03:51 <jungleboyj> :-)
14:04:08 <rosmaita> i am sitting outside, 15 feet from anyone else
14:04:15 <jungleboyj> ++
14:04:16 <rosmaita> inside is closed, you can only get coffee and leave
14:04:22 <rosmaita> but the wifi is working!
14:04:32 <tosky> what else is needed then
14:04:39 <rosmaita> a better keyboard!
14:05:08 <rosmaita> the function and control keys are mashed together and i am having cutting & pasting problems
14:05:12 <rosmaita> but enough about that
14:05:24 <rosmaita> ok, the video meeting poll closes tomorrow
14:05:38 <rosmaita> #link https://rosmaita.wufoo.com/forms/monthly-video-meeting-proposal/
14:05:57 <rosmaita> it even has an option for "don't care", so even if you don't care, you can still fill it out
14:06:11 <rosmaita> this week is R-minus-14
14:06:17 <rosmaita> milestone 2 is at R-11
14:06:17 <enriquetaso> o/
14:06:24 <rosmaita> hello sofia
14:06:26 <rosmaita> that is, really soon
14:06:33 <rosmaita> it is also the new driver merge deadline
14:06:53 <rosmaita> i think we have 2 new drivers proposed?
14:06:58 <rosmaita> hitachi is mostly together
14:07:03 <rosmaita> thanks to lseki and smcginnis for reviewing that closely
14:07:09 <rosmaita> and i think dell/emc is proposing a new driver?
14:07:41 <rosmaita> i don't think i've seen any patches, just the launchpad blueprint so far
14:07:52 <rosmaita> and a special note for geguileo
14:07:59 <rosmaita> ussuri cinderlib must be released by R-9
14:08:00 <smcginnis> rosmaita: I don't think we will get that Dell one for Victoria.
14:08:17 <LiangFang> o/
14:08:19 <rosmaita> smcginnis: ok
14:08:24 <geguileo> rosmaita: as soon as we review the patches that are in gerrit (with the exception of the one with the -W) we can release
14:08:43 <rosmaita> "we" meaning "me", at least partially ... OK, will do
14:09:17 <rosmaita> ok, that's all the announcements
14:09:40 <rosmaita> i thought for a minute i deleted lseki's topic by mistake
14:09:53 <rosmaita> but i see that he has moved it lower due to connection problems
14:10:11 <rosmaita> #topic Moving stable/ocata and stable/pike to quick EOL
14:10:20 <sfernand> Lucio is having some issues joining the meeting, he is asking if we could postpone that
14:10:23 <sfernand> ahh ok
14:10:33 <TusharTgite> hi
14:10:48 <rosmaita> ok, so you may have seen on the ML that nova is proposing to put pike and ocata into 'unmaintained'
14:10:59 <rosmaita> hang on while i paste links
14:11:10 <rosmaita> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html
14:11:20 <rosmaita> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015798.html
14:11:46 <rosmaita> you may remember that there was a proposal to do this for ocata before the PTG
14:12:13 <rosmaita> and smcginnis pointed out in that thread that if one of the major projects EOLs a branch, we pretty much all have to do it
14:12:15 <rosmaita> anyway
14:12:24 <rosmaita> i looked at our cinder ocata and pike branches
14:12:36 <rosmaita> and they haven't been committed to in over 6 months
14:13:07 <rosmaita> i mention that because lyarwood was proposing to back-date the nova 'unmaintained' phase to the last commit, which would mean a 3 month head start
14:13:18 <rosmaita> i am not being clear
14:13:35 <rosmaita> the issue is that a branch is supposed to be 'unmaintained' for 6 months, and then can go EOL
14:13:53 <rosmaita> so, if it's ok for nova to back-date the 'unmaintained' period, i think we can too
14:14:03 <rosmaita> just so happens that our back-dating can be 6 months
14:14:15 <rosmaita> so my proposal is to put out a notice on the ML
14:14:43 <rosmaita> that we are putting cinder pike and ocata into 'unmaintained' for 2 weeks, and if no one adopts them, we will EOL them
14:14:52 <smcginnis> ++
14:15:02 <rosmaita> that's what i was waiting for!
14:15:06 <rosmaita> thanks smcginnis
14:15:11 <smcginnis> ;)
14:15:33 <rosmaita> ok, so i will do that this afternoon ... 2 weeks from today is 22 July
14:15:43 <rosmaita> (just to have that on the record)
14:15:58 <LiangFang> ++
14:16:07 <tosky> removing them will simplify the job handling a lot; most "modern" jobs start from pike, if not rocky
14:16:37 <rosmaita> yeah, ocata has been dead to me for a month now
14:16:42 <rosmaita> and pike is not much better
14:16:54 <rosmaita> hooray for modernization
14:17:14 <rosmaita> that's all, if anyone has second thoughts, we may have some open discussion later, and there is always the ML
14:17:31 <rosmaita> #topic rethink the visibility of __DEFAULT__ type
14:17:36 <rosmaita> whoami-rajat: that's you
14:17:42 <whoami-rajat> rosmaita, thanks!
14:18:16 <rosmaita> #link https://bugs.launchpad.net/cinder/+bug/1886632
14:18:16 <openstack> Launchpad bug 1886632 in Cinder "Cannot delete __DEFAULT__ volume type" [Undecided,New] - Assigned to Rajat Dhasmana (whoami-rajat)
14:18:17 <whoami-rajat> So we've had a recent bug in which the author states that their users are being confused by the __DEFAULT__ name
14:18:19 * lseki sneaks in
14:18:33 <rosmaita> i was skeptical at first, but the last comment on the bug is very revealing
14:18:34 <whoami-rajat> s/name/type
14:19:27 <whoami-rajat> they say they don't want their users to see the __DEFAULT__ type since they've already configured CONF.default_volume_type
14:19:47 <eharney> they don't want to see it when listing types, that is?
14:20:12 <rosmaita> eharney: yes, but maybe even stronger than that
14:20:25 <rosmaita> i think the way to go here is to not display __DEFAULT__ in the GET /types response if there is a default-type configured in cinder.conf
14:20:35 <jungleboyj> *Sigh*
14:20:38 <whoami-rajat> eharney, yes, they say the users get confused about whether they should use this one or the other one their admin has configured as default
14:20:44 <rosmaita> for type-show, you need to know the UUID of the type, is that right?
14:21:00 <whoami-rajat> rosmaita, id or name
14:21:17 <rosmaita> we take the name in the path?
14:21:28 <whoami-rajat> names are unique for volume types
14:21:28 <jungleboyj> The concern does make sense.
14:21:49 <whoami-rajat> I'm not really sure if this is a problem for a large mass or just this particular case
14:22:08 <rosmaita> i think we will see it more and more
14:22:28 <rosmaita> the problem i see, is that __DEFAULT__ shows up in the api ref
14:22:40 <geguileo> rosmaita: hiding the __DEFAULT__ vol type if we have a default in .conf could lead to a deployment having some volumes with __DEFAULT__ type but not getting it listed
14:22:41 <rosmaita> and if you can do GET /types/__DEFAULT__
14:22:47 <geguileo> if they changed it after creating some volumes
14:22:50 <whoami-rajat> rosmaita, but we allow the __DEFAULT__ to be configurable, that's why it is visible
14:23:07 <rosmaita> yes, but if it is not used at all. what does that matter?
14:23:22 <whoami-rajat> rosmaita, also if a volume gets created with the __DEFAULT__ type, it would confuse users more that their volume is using a type which isn't visible
14:23:24 <geguileo> we could add a config option to hide it?
14:23:37 <rosmaita> no
14:23:59 <rosmaita> you just said they can do GET on the __DEFAULT__, so they can still see it
14:24:25 <rosmaita> i mean, at the time you do a GET /types call, if the operator has one configured, that is what you will get
14:24:34 <rosmaita> so we don't need to display the __DEFAULT__ in that case
14:24:43 <rosmaita> and if the operator removes the config
14:24:46 <rosmaita> then we will
14:24:51 <rosmaita> which makes sense
14:24:53 <geguileo> but if someone created a volume and it used __DEFAULT__
14:24:58 <geguileo> then the .conf was changed
14:25:06 <geguileo> listing types would not return it
14:25:15 <geguileo> and it would be weird not to have the type that some volume has
14:25:35 <geguileo> when listing, I mean
14:25:40 <geguileo> (the type would be there)
14:25:52 <rosmaita> i think it depends on what the types list is supposed to display
14:26:01 <rosmaita> i think the types that are currently available to you
14:26:11 <geguileo> and __DEFAULT__ is available
14:26:13 <eharney> can't a user still manually create a volume w/ type __DEFAULT__ even if we don't list it?
14:26:24 <geguileo> eharney: yup
14:26:33 <eharney> so i'm not sure it's just about visibility in the list
14:26:36 <rosmaita> that seems like a bug
14:26:48 <rosmaita> i mean, __DEFAULT__ is supposed to be for lazy operators
14:26:50 <geguileo> I don't see that as a bug...
14:27:03 <eharney> i think it probably is a bug
14:27:11 <rosmaita> sure, the operator has configured a default type, that's what the default should be
14:27:32 <rosmaita> so, looks like a can of worms has been opened
14:27:53 <eharney> presumably if the operator made a default volume type, they don't want __DEFAULT__ to be used
14:28:17 <rosmaita> yes, that's exactly this bug-filer's issue
14:28:49 <rosmaita> whoami-rajat: i forget, what are the restrictions on modifying __DEFAULT__ type?
14:29:09 <rosmaita> i mean the actual system default
14:29:35 <whoami-rajat> rosmaita, their issue is they don't want their users to see it, they don't use it but it doesn't cause them any problem other than confusion
14:29:41 <whoami-rajat> rosmaita, we can update it, but can't delete it
14:30:04 <rosmaita> so they could update __DEFAULT__ to have exactly the same properties as their preferred default?
14:30:04 <whoami-rajat> what i suggested was, i will document this clearly
14:30:21 <whoami-rajat> rosmaita, yes they can
14:30:54 <rosmaita> but they can't do it while there are any volumes of __DEFAULT__, right?
14:31:21 <whoami-rajat> rosmaita, yep, it shouldn't be in use by any volume
14:31:39 <rosmaita> well, except as eharney pointed out, a user could explicitly ask for it
14:31:58 <rosmaita> given that it's all over the api-ref responses
14:32:42 <whoami-rajat> I've no issues in improving the documentation but what they're suggesting is to remove it which will again allow creating of untyped volumes which i don't prefer
14:33:19 <whoami-rajat> and we also discussed the visibility scenario, that doesn't seem to work either
14:33:31 <rosmaita> i think we have 2 bugs:
14:33:55 <rosmaita> 1) if an operator has configured a default type, users should not be able to create a volume of __DEFAULT__ type
14:34:27 <rosmaita> 2) if an operator has configured a default type, the __DEFAULT__ should not be displayed in the GET /types response (this one is controversial right now)
14:34:57 <rosmaita> i think this is a real problem, because even though it's kind of silly, customer calls are a PITA
14:35:01 <smcginnis> Since __DEFAULT__ was created because we can't handle things right in our code because too many places expected to have a type, I think it should be hidden from end users.
14:35:11 <rosmaita> smcginnis: ++
14:35:35 <whoami-rajat> the configured one already has a priority over the __DEFAULT__ type
14:35:52 <rosmaita> yes, but there's no way for end users to know that
14:36:12 <geguileo> and the problem is that horizon would present __DEFAULT__
14:36:29 <rosmaita> yeah, and DEFAULT looks more important than default
14:36:37 <smcginnis> Yep.
14:37:13 <jungleboyj> Yeah.  I do think that the complaint is relevant.
14:37:28 <geguileo> yeah, it's a reasonable complaint
14:37:46 <rosmaita> ok, let's think about this some more and revisit next week
14:38:02 <whoami-rajat> thanks everyone for their feedback
14:38:09 <jungleboyj> rosmaita, ++
14:38:13 <geguileo> I think we can hide the __DEFAULT__ type from the list if there are no volumes that use them and cinder.conf has a different default
14:38:24 <geguileo> s/them/it
14:38:47 <rosmaita> geguileo: problem is, any deployment since train will definitely have them
14:38:58 <geguileo> rosmaita: not necessarily
14:39:01 <rosmaita> yes
14:39:06 <rosmaita> there was a regression
14:39:10 <geguileo> rosmaita: they could have a default already defined
14:39:14 <geguileo> in the conf
14:39:16 <whoami-rajat> geguileo, but if they comment out the default part in cinder.conf, we should show it ?
14:39:20 <geguileo> and the __DEFAULT__ would not be used
14:39:33 <geguileo> whoami-rajat: that's what I would do
14:39:44 <geguileo> that, or having a config option
14:40:19 <rosmaita> i don't like the config option
14:40:36 <rosmaita> but we can discuss next week, let's move on
14:40:39 <geguileo> rosmaita: but it's the cleanest way, since we pass the responsibility to the admin
14:40:43 <geguileo> rosmaita: ok
14:41:17 <rosmaita> #topic CI issues
14:41:25 <rosmaita> tosky: hopefully this is quick
14:41:32 <tosky> I can just copy the content of the etherpad here
14:41:38 <tosky> or do a summary:
14:42:10 <tosky> - you can see many failures on cinder-tempest-plugin-lvm-lio-barbican, especially one test, I don't know why
14:42:37 <tosky> - https://review.opendev.org/#/c/733161/ should temporarily unblock cinder-tempest-plugins gate broken by the ceph updates (but we need to fix them)
14:43:01 <tosky> - please merge https://review.opendev.org/#/c/738978/ and its future train backport to make the ceph job pass again
14:43:23 <tosky> - devstack-plugin-nfs-tempest-full is superbroken for unknown reasons (see https://review.opendev.org/#/c/735959/)
14:43:30 <tosky> that's it - suggestions and help more than welcome
14:43:46 <eharney> the lio-barbican job has been a little flaky for a while, and i occasionally look at it, but the failures are never very actionable/interesting to me
14:44:03 <eharney> (that is, it probably needs a more thorough look)
14:44:30 <rosmaita> superbroken is even worse than usual
14:44:41 <tosky> I suspect resource issues, the test which fails most for lio-barbican is tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern
14:44:55 <tosky> and it usually fails to connect to the spawned instance
14:44:59 <rosmaita> tosky: i think you are onto something there
14:45:01 <eharney> right
14:45:20 <eharney> are we still chasing any of these things with elastic recheck?
14:45:39 <rosmaita> i personally am not
14:46:03 <tosky> I admit not being too much into that; I was told no need to add recheck <foo> because it should be caught by elasticsearch (maybe after adding some rules)
14:47:02 <smcginnis> I think since Riedemann left, we lost our last elastic recheck champion. :)
14:47:12 <smcginnis> I think we should use it though. It does help.
14:47:24 <rosmaita> ok, let's address that next week too
14:47:31 <rosmaita> thanks, tosky
14:47:50 <rosmaita> #topic Fix for Fail to extend attached volume using generic NFS driver
14:47:58 <rosmaita> lseki: that's you
14:48:03 <lseki> hi
14:48:15 <lseki> I think kaisers is ooo but he can read the logs later
14:48:17 <rosmaita> hopefully your connection will hold for the next 10 min
14:48:32 <lseki> hopefully
14:48:47 <lseki> I talked to openstack-nova folks
14:49:15 <lseki> about https://bugs.launchpad.net/cinder/+bug/1870367
14:49:15 <openstack> Launchpad bug 1870367 in Cinder "Fail to extend attached volume using generic NFS driver" [High,In progress] - Assigned to Lucio Seki (lseki)
14:49:27 <rosmaita> i like the idea of nova doing everything
14:49:39 <lseki> in short, generic nfs driver is failing because it's trying to do an unnecessary `qemu-img resize` operation
14:50:01 <enriquetaso> :o
14:50:11 <lseki> so the fix is to prevent the generic nfs driver from doing that
14:50:17 <lseki> and let nova do everything needed
14:50:46 <lseki> I submitted 3 draft patches for nova, cinder, and devstack
14:51:08 <lseki> nova patch to implement a trivial method called upon extend_volume
14:51:30 <lseki> cinder patch to make nfs driver skip the qemu-img resize when volume is attached
14:51:46 <lseki> devstack patch to enable the online extend test for generic nfs driver
14:52:19 <lseki> reviews are welcome!
14:52:36 <eharney> does the volume manager submit a nova event etc for extend after the driver's extend_volume call?
14:53:05 <lseki> soon, I'll submit a similar patch for ONTAP NFS driver; it works on my machine
14:53:23 <geguileo> eharney: we do
14:53:40 <eharney> i suspect this means the extend method may need a lock against create_snapshot and other snapshot calls in the nfs driver
14:53:55 <eharney> this also needs to be tested thoroughly with encrypted volumes
14:54:07 <eharney> but many thanks for working on this
14:54:39 <lseki> :-)
14:54:42 <rosmaita> lseki: looks like your request for corner cases has been satisfied
14:54:59 <rosmaita> lseki: thanks for the comprehensive report
14:55:02 <eharney> to be more clear: performing resize and snapshot operations concurrently may break with your current patch, but i haven't looked too closely
14:55:20 <lseki> kaisers may do something similar to quobyte nfs driver, putting a depends-on to nova patch
14:56:07 <lseki> eharney: hmm we should check that
14:56:34 <rosmaita> four minutes left ...
14:56:38 <lseki> I have another concern: what if nova fails to extend the volume for some reason?
14:57:11 <lseki> cinder will update the DB with the new size, but the actual volume file will remain with the original size
14:57:15 <eharney> hmm
14:57:19 <geguileo> eharney: if the driver needs a lock to prevent snapshots while Nova does the resize we have a problem
14:57:35 <eharney> geguileo: how so?
14:57:38 <geguileo> because the call is async
14:57:45 <smcginnis> We just send an event.
14:57:52 <smcginnis> We don't ever even know if it happens.
14:57:54 <geguileo> exactly :-(
14:58:02 <geguileo> which brings us to lseki's concern
14:58:05 <smcginnis> "Hey nova, if you're listening, you can extend this volume if you feel like it."
14:58:09 <geguileo> what if it fails
14:58:23 <geguileo> so we need to find a way to make it synchronous
14:58:45 <eharney> i suspect there's an issue if you extend the root file while halfway through a create_snapshot operation which is shuffling files around
14:58:59 <geguileo> or implement a similar external events mechanism like Nova so they can let us know the result
14:59:19 <smcginnis> For other drivers it is not an issue since they extend the volume first, then send an event.
14:59:35 <smcginnis> It could be to nova, or it could be to someone else using Cinder for volume services.
14:59:45 <smcginnis> We definitely should not have a hard dependency on a nova API.
14:59:59 <rosmaita> ok, looks like this needs some more thought
15:00:04 <rosmaita> and we are out of time
15:00:08 <rosmaita> #endmeeting