14:00:35 #startmeeting cinder
14:00:36 Meeting started Wed Jul 8 14:00:35 2020 UTC and is due to finish in 60 minutes. The chair is rosmaita. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:37 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:39 The meeting name has been set to 'cinder'
14:00:47 o/
14:00:55 #topic roll call
14:00:55 hi
14:00:57 Hi
14:01:00 hi! o/
14:01:27 hi
14:01:29 hi
14:02:10 ok, looks like we have some people
14:02:13 hello everyone
14:02:31 #link https://etherpad.openstack.org/p/cinder-victoria-meetings
14:03:05 i'm at a coffee shop due to a power outage
14:03:15 o/
14:03:19 so not using my usual keyboard, as you will notice
14:03:36 ok, let's get started
14:03:45 #topic updates
14:03:50 rosmaita, You can go to coffee shops?
14:03:51 :-)
14:04:08 i am sitting outside, 15 feet from anyone else
14:04:15 ++
14:04:16 inside is closed, you can only get coffee and leave
14:04:22 but the wifi is working!
14:04:32 what else is needed then
14:04:39 a better keyboard!
14:05:08 the function and control keys are mashed together and i am having cutting & pasting problems
14:05:12 but enough about that
14:05:24 ok, the video meeting poll closes tomorrow
14:05:38 #link https://rosmaita.wufoo.com/forms/monthly-video-meeting-proposal/
14:05:57 it even has an option for "don't care", so even if you don't care, you can still fill it out
14:06:11 this week is R-minus-14
14:06:17 milestone 2 is at R-11
14:06:17 o/
14:06:24 hello sofia
14:06:26 that is, really soon
14:06:33 it is also the new driver merge deadline
14:06:53 i think we have 2 new drivers proposed?
14:06:58 hitachi is mostly together
14:07:03 thanks to lseki and smcginnis for reviewing that closely
14:07:09 and i think dell/emc is proposing a new driver?
14:07:41 i don't think i've seen any patches, just the launchpad blueprint so far
14:07:52 and a special note for geguileo
14:07:59 ussuri cinderlib must be released by R-9
14:08:00 rosmaita: I don't think we will get that Dell one for Victoria.
14:08:17 o/
14:08:19 smcginnis: ok
14:08:24 rosmaita: as soon as we review the patches that are in gerrit (with the exception of the one with the -W) we can release
14:08:43 "we" meaning "me", at least partially ...
OK, will do
14:09:17 ok, that's all the announcements
14:09:40 i thought for a minute i deleted lseki's topic by mistake
14:09:53 but i see that he has moved it lower due to connection problems
14:10:11 #topic Moving stable/ocata and stable/pike to quick EOL
14:10:20 Lucio is having some issues joining the meeting; he is asking if we could postpone that
14:10:23 ahh ok
14:10:33 hi
14:10:48 ok, so you may have seen on the ML that nova is proposing to put pike and ocata into 'unmaintained'
14:10:59 hang on while i paste links
14:11:10 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html
14:11:20 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015798.html
14:11:46 you may remember that there was a proposal to do this for ocata before the PTG
14:12:13 and smcginnis pointed out in that thread that if one of the major projects EOLs a branch, we pretty much all have to do it
14:12:15 anyway
14:12:24 i looked at our cinder ocata and pike branches
14:12:36 and they haven't been committed to in over 6 months
14:13:07 i mention that because lyarwood was proposing to back-date the nova 'unmaintained' phase to the last commit, which would mean a 3 month head start
14:13:18 i am not being clear
14:13:35 the issue is that a branch is supposed to be 'unmaintained' for 6 months, and then can go EOL
14:13:53 so, if it's ok for nova to back-date the 'unmaintained' period, i think we can too
14:14:03 it just so happens that our back-dating can be 6 months
14:14:15 so my proposal is to put out a notice on the ML
14:14:43 that we are putting cinder pike and ocata into 'unmaintained' for 2 weeks, and if no one adopts them, we will EOL them
14:14:52 ++
14:15:02 that's what i was waiting for!
14:15:06 thanks smcginnis
14:15:11 ;)
14:15:33 ok, so i will do that this afternoon ... 2 weeks from today is 22 July
14:15:43 (just to have that on the record)
14:15:58 ++
14:16:07 removing them will simplify the job handling a lot; most "modern" jobs start from pike, if not rocky
14:16:37 yeah, ocata has been dead to me for a month now
14:16:42 and pike is not much better
14:16:54 hooray for modernization
14:17:14 that's all; if anyone has second thoughts, we may have some open discussion later, and there is always the ML
14:17:31 #topic rethink the visibility of __DEFAULT__ type
14:17:36 whoami-rajat: that's you
14:17:42 rosmaita, thanks!
14:18:16 #link https://bugs.launchpad.net/cinder/+bug/1886632
14:18:16 Launchpad bug 1886632 in Cinder "Cannot delete __DEFAULT__ volume type" [Undecided,New] - Assigned to Rajat Dhasmana (whoami-rajat)
14:18:17 So we've had a recent bug in which the author states that their users are being confused by the __DEFAULT__ name
14:18:19 * lseki sneaks in
14:18:33 i was skeptical at first, but the last comment on the bug is very revealing
14:18:34 s/name/type
14:19:27 they say they don't want their users to see the __DEFAULT__ type since they've already configured CONF.default_volume_type
14:19:47 they don't want to see it when listing types, that is?
14:20:12 eharney: yes, but maybe even stronger than that
14:20:25 i think the way to go here is to not display __DEFAULT__ in the GET /types response if there is a default type configured in cinder.conf
14:20:35 *Sigh*
14:20:38 eharney, yes, they say the users get confused about whether they should use this one or the other one their admin has configured as default
14:20:44 for type-show, you need to know the UUID of the type, is that right?
14:21:00 rosmaita, id or name
14:21:17 we take the name in the path?
14:21:28 names are unique for volume types
14:21:28 The concern does make sense.
14:21:49 I'm not really sure if this is a problem for a large mass or just this particular case
14:22:08 i think we will see it more and more
14:22:28 the problem i see is that __DEFAULT__ shows up in the api-ref
14:22:40 rosmaita: hiding the __DEFAULT__ vol type if we have a default in .conf could lead to a deployment having some volumes with the __DEFAULT__ type but not getting it listed
14:22:41 and if you can do GET /types/__DEFAULT__
14:22:47 if they changed it after creating some volumes
14:22:50 rosmaita, but we allow the __DEFAULT__ to be configurable, that's why it is visible
14:23:07 yes, but if it is not used at all, what does that matter?
14:23:22 rosmaita, also if a volume gets created with the __DEFAULT__ type, it would confuse users more that their volume is using a type which isn't visible
14:23:24 we could add a config option to hide it?
14:23:37 no
14:23:59 you just said they can do a GET on the __DEFAULT__, so they can still see it
14:24:25 i mean, at the time you do a GET /types call, if the operator has one configured, that is what you will get
14:24:34 so we don't need to display the __DEFAULT__ in that case
14:24:43 and if the operator removes the config
14:24:46 then we will
14:24:51 which makes sense
14:24:53 but if someone created a volume and it used __DEFAULT__
14:24:58 then the .conf was changed
14:25:06 listing types would not return it
14:25:15 and it would be weird not to have the type that some volume has
14:25:35 when listing, I mean
14:25:40 (the type would be there)
14:25:52 i think it depends on what the types list is supposed to display
14:26:01 i think the types that are currently available to you
14:26:11 and __DEFAULT__ is available
14:26:13 can't a user still manually create a volume w/ type __DEFAULT__ even if we don't list it?
14:26:24 eharney: yup
14:26:33 so i'm not sure it's just about visibility in the list
14:26:36 that seems like a bug
14:26:48 i mean, __DEFAULT__ is supposed to be for lazy operators
14:26:50 I don't see that as a bug...
14:27:03 i think it probably is a bug
14:27:11 sure, the operator has configured a default type, that's what the default should be
14:27:32 so, looks like a can of worms has been opened
14:27:53 presumably if the operator made a default volume type, they don't want __DEFAULT__ to be used
14:28:17 yes, that's exactly this bug-filer's issue
14:28:49 whoami-rajat: i forget, what are the restrictions on modifying the __DEFAULT__ type?
14:29:09 i mean the actual system default
14:29:35 rosmaita, their issue is that they don't want their users to see it; they don't use it, but it doesn't cause them any problem other than confusion
14:29:41 rosmaita, we can update it, but can't delete it
14:30:04 so they could update __DEFAULT__ to have exactly the same properties as their preferred default?
14:30:04 what i suggested was, i will document this clearly
14:30:21 rosmaita, yes they can
14:30:54 but they can't do it while there are any volumes of __DEFAULT__, right?
14:31:21 rosmaita, yep, it shouldn't be in use by any volume
14:31:39 well, except as eharney pointed out, a user could explicitly ask for it
14:31:58 given that it's all over the api-ref responses
14:32:42 I've no issues with improving the documentation, but what they're suggesting is to remove it, which will again allow creation of untyped volumes, which i don't prefer
14:33:19 and we also discussed the visibility scenario, that doesn't seem to work either
14:33:31 i think we have 2 bugs:
14:33:55 1) if an operator has configured a default type, users should not be able to create a volume of __DEFAULT__ type
14:34:27 2) if an operator has configured a default type, the __DEFAULT__ should not be displayed in the GET /types response (this one is controversial right now)
14:34:57 i think this is a real problem, because even though it's kind of silly, customer calls are a PITA
14:35:01 Since __DEFAULT__ was created because we can't handle things right in our code because too many places expected to have a type, I think it should be hidden from end users.
14:35:11 smcginnis: ++
14:35:35 the configured one already has priority over the __DEFAULT__ type
14:35:52 yes, but there's no way for end users to know that
14:36:12 and the problem is that horizon would present __DEFAULT__
14:36:29 yeah, and DEFAULT looks more important than default
14:36:37 Yep.
14:37:13 Yeah. I do think that the complaint is relevant.
14:37:28 yeah, it's a reasonable complaint
14:37:46 ok, let's think about this some more and revisit next week
14:38:02 thanks everyone for the feedback
14:38:09 rosmaita, ++
14:38:13 I think we can hide the __DEFAULT__ type from the list if there are no volumes that use them and cinder.conf has a different default
14:38:24 s/them/it
14:38:47 geguileo: problem is, any deployment since train will definitely have them
14:38:58 rosmaita: not necessarily
14:39:01 yes
14:39:06 there was a regression
14:39:10 rosmaita: they could have a default already defined
14:39:14 in the conf
14:39:16 geguileo, but if they comment out the default part in cinder.conf, we should show it?
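
[To make the two bugs enumerated above concrete, here is a minimal sketch of the proposed behavior. It is hedged: the helper names and structure are hypothetical illustrations, not actual Cinder internals; only CONF.default_volume_type and the __DEFAULT__ name come from the discussion.]

    # Hypothetical sketch of the two behaviors proposed above; the helper
    # names are illustrative, not the actual Cinder implementation.

    DEFAULT_TYPE_NAME = '__DEFAULT__'

    def operator_default_configured(conf_default):
        # CONF.default_volume_type, when set, takes priority over __DEFAULT__.
        return bool(conf_default) and conf_default != DEFAULT_TYPE_NAME

    def list_volume_types(all_types, conf_default):
        # Bug 2: hide __DEFAULT__ from the GET /types response when an
        # operator-configured default exists.
        if operator_default_configured(conf_default):
            return [t for t in all_types if t['name'] != DEFAULT_TYPE_NAME]
        return all_types

    def check_create_with_type(requested_type_name, conf_default):
        # Bug 1: reject an explicit request for __DEFAULT__ when an
        # operator-configured default exists.
        if (requested_type_name == DEFAULT_TYPE_NAME
                and operator_default_configured(conf_default)):
            raise ValueError('volume type %s is not available'
                             % requested_type_name)

[Note that geguileo's caveat still applies: volumes created with __DEFAULT__ before the configuration change would then carry a type that no longer appears in listings, which is the unresolved part of this discussion.]
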
14:39:20 and the __DEFAULT__ would not be used
14:39:33 whoami-rajat: that's what I would do
14:39:44 that, or having a config option
14:40:19 i don't like the config option
14:40:36 but we can discuss next week, let's move on
14:40:39 rosmaita: but it's the cleanest way, since we pass the responsibility to the admin
14:40:43 rosmaita: ok
14:41:17 #topic CI issues
14:41:25 tosky: hopefully this is quick
14:41:32 I can just copy the content of the etherpad here
14:41:38 or do a summary:
14:42:10 - you can see many failures on cinder-tempest-plugin-lvm-lio-barbican, especially one test, I don't know why
14:42:37 - https://review.opendev.org/#/c/733161/ should temporarily unblock the cinder-tempest-plugin gate broken by the ceph updates (but we need to fix them)
14:43:01 - please merge https://review.opendev.org/#/c/738978/ and its future train backport to make the ceph job pass again
14:43:23 - devstack-plugin-nfs-tempest-full is superbroken for unknown reasons (see https://review.opendev.org/#/c/735959/)
14:43:30 that's it - suggestions and help more than welcome
14:43:46 the lio-barbican job has been a little flaky for a while, and i occasionally look at it, but the failures are never very actionable/interesting to me
14:44:03 (that is, it probably needs a more thorough look)
14:44:30 superbroken is even worse than usual
14:44:41 I suspect resource issues; the test which fails most often for lio-barbican is tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern
14:44:55 and it usually fails to connect to the spawned instance
14:44:59 tosky: i think you are onto something there
14:45:01 right
14:45:20 are we still chasing any of these things with elastic recheck?
14:45:39 i personally am not
14:46:03 I admit not being too much into that; I was told there was no need to add a recheck because it should be caught by elasticsearch (maybe after adding some rules)
14:47:02 I think since Riedemann left, we lost our last elastic recheck champion. :)
14:47:12 I think we should use it though. It does help.
14:47:24 ok, let's address that next week too
14:47:31 thanks, tosky
14:47:50 #topic Fix for Fail to extend attached volume using generic NFS driver
14:47:58 lseki: that's you
14:48:03 hi
14:48:15 I think kaisers is ooo but he can read the logs later
14:48:17 hopefully your connection will hold for the next 10 min
14:48:32 hopefully
14:48:47 I talked to openstack-nova folks
14:49:15 about https://bugs.launchpad.net/cinder/+bug/1870367
14:49:15 Launchpad bug 1870367 in Cinder "Fail to extend attached volume using generic NFS driver" [High,In progress] - Assigned to Lucio Seki (lseki)
14:49:27 i like the idea of nova doing everything
14:49:39 in short, the generic nfs driver is failing because it's trying to do an unnecessary `qemu-img resize` operation
14:50:01 :o
14:50:11 so the fix is to keep the generic nfs driver from doing that
14:50:17 and let nova do everything needed
14:50:46 I submitted 3 draft patches for nova, cinder, and devstack
14:51:08 the nova patch implements a trivial method called upon extend_volume
14:51:30 the cinder patch makes the nfs driver skip the qemu-img resize when the volume is attached
14:51:46 the devstack patch enables the online extend test for the generic nfs driver
14:52:19 reviews are welcome!
14:52:36 does the volume manager submit a nova event etc for extend after the driver's extend_volume call?
14:53:05 soon I'll submit a similar patch for the ONTAP NFS driver; it works on my machine
14:53:23 eharney: we do
14:53:40 i suspect this means the extend method may need a lock against create_snapshot and other snapshot calls in the nfs driver
14:53:55 this also needs to be tested thoroughly with encrypted volumes
14:54:07 but many thanks for working on this
14:54:39 :-)
14:54:42 lseki: looks like your request for corner cases has been satisfied
14:54:59 lseki: thanks for the comprehensive report
14:55:02 to be more clear: performing resize and snapshot operations concurrently may break with your current patch, but i haven't looked too closely
14:55:20 kaisers may do something similar for the quobyte nfs driver, putting a depends-on on the nova patch
14:56:07 eharney: hmm, we should check that
14:56:34 four minutes left ...
14:56:38 I have another concern: what if nova fails to extend the volume for some reason?
14:57:11 cinder will update the DB with the new size, but the actual volume file will remain at the original size
14:57:15 hmm
14:57:19 eharney: if the driver needs a lock to prevent snapshots while Nova does the resize, we have a problem
14:57:35 geguileo: how so?
14:57:38 because the call is async
14:57:45 We just send an event.
14:57:52 We don't ever even know if it happens.
14:57:54 exactly :-(
14:58:02 which brings us to lseki's concern
14:58:05 "Hey nova, if you're listening, you can extend this volume if you feel like it."
14:58:09 what if it fails
14:58:23 so we need to find a way to make it synchronous
14:58:45 i suspect there's an issue if you extend the root file while halfway through a create_snapshot operation which is shuffling files around
14:58:59 or implement an external events mechanism like Nova's, so they can let us know the result
14:59:19 For other drivers it is not an issue since they extend the volume first, then send an event.
14:59:35 It could be to nova, or it could be to someone else using Cinder for volume services.
14:59:45 We definitely should not have a hard dependency on a nova API.
14:59:59 ok, looks like this needs some more thought
15:00:04 and we are out of time
15:00:08 #endmeeting
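
[For reference, a rough sketch of the shape of the cinder-side change lseki describes: skipping the local resize while the volume is attached. The class and method bodies are assumptions for illustration, not the actual patch, and it deliberately ignores the open questions raised above (snapshot locking, encrypted volumes, and the unacknowledged nova event).]

    # Illustrative sketch only -- not the actual patch.  The idea: the
    # generic NFS driver skips 'qemu-img resize' while the volume is
    # attached and leaves the file resize to Nova on the compute host.

    import subprocess

    class GenericNFSDriverSketch(object):
        """Hypothetical stand-in for cinder's generic NFS driver."""

        def local_path(self, volume):
            # Placeholder: the real driver maps the volume to a file on
            # the mounted NFS share.
            return '/mnt/nfs/volume-%s' % volume['id']

        def _execute(self, *cmd):
            # Placeholder for cinder's rootwrap-aware process runner.
            subprocess.check_call(cmd)

        def extend_volume(self, volume, new_size):
            if volume['status'] == 'in-use':
                # Attached: the image file is open on the compute host, so
                # resizing it here is racy and unnecessary -- Nova grows the
                # file when it handles the volume-extended event.  As noted
                # above, that event is async and unacknowledged, so Cinder
                # never learns whether the resize actually happened.
                return
            # Detached: safe to resize the file directly.
            self._execute('qemu-img', 'resize', self.local_path(volume),
                          '%sG' % new_size)

[As discussed above, other drivers avoid the failure-reporting problem by extending the volume first and sending the event afterward; tying the NFS path to a nova-side acknowledgement would add the hard nova API dependency smcginnis warns against.]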