14:00:34 <rosmaita> #startmeeting cinder
14:00:35 <openstack> Meeting started Wed Jun 17 14:00:34 2020 UTC and is due to finish in 60 minutes.  The chair is rosmaita. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:38 <openstack> The meeting name has been set to 'cinder'
14:00:44 <enriquetaso> o/
14:00:44 <rosmaita> #topic roll call
14:00:48 <whoami-rajat> Hi
14:00:49 <walshh_> Hi
14:00:50 <eharney> hi
14:00:50 <enriquetaso> o/
14:00:58 <rajinir> hi
14:01:06 <jungleboyj> o/
14:01:15 <tosky> o/
14:01:24 <geguileo> hi! o/
14:01:45 <rosmaita> nice turnout!
14:01:46 <rosmaita> #link https://etherpad.openstack.org/p/cinder-victoria-meetings
14:02:06 <lseki> hi
14:02:20 <rosmaita> #topic announcements
14:02:28 <smcginnis> o/
14:02:45 <rosmaita> thanks to everyone who filled out the poll to determine a good time for the virtual midcycle
14:02:50 <rosmaita> which is next week!
14:03:00 <rosmaita> the winner is:\
14:03:11 <rosmaita> Wednesday 24 June 1400-1600 UTC
14:03:26 <rosmaita> so the first hour overlaps with the time of the usual cinder meeting
14:03:38 <rosmaita> so everyone should be able to attend at least the first part
14:03:52 <rosmaita> and we will need topics:
14:04:02 <rosmaita> #link https://etherpad.opendev.org/p/cinder-victoria-mid-cycles
14:04:03 <e0ne_> hi
14:04:42 <rosmaita> since it's the week before spec freeze, anyone working on a spec who has questions or needs some guidance, the virtual midcycle would be a good place to get answers
14:05:02 <rosmaita> anyone working on a spec should check out the virtual PTG etherpad
14:05:19 <rosmaita> each spec was discussed, and there are comments on the etherpad
14:05:34 <rosmaita> #link https://etherpad.opendev.org/p/victoria-ptg-cinder
14:05:50 <rosmaita> and, of course, we can discuss other issues as well
14:05:57 <hemna_> mep
14:06:16 <rosmaita> i know it's coming up fast, but this is a short cycle, and spec freeze is in 2 weeks
14:06:58 <whoami-rajat> rosmaita, we will have only 1 meeting right ? (of 2 hours)
14:07:01 <jungleboyj> Crazy.
14:07:16 <rosmaita> whoami-rajat: well, there will be part 2, but at week R-9
14:07:24 <rosmaita> so yes, just one meeting before spec freeze
14:07:31 <whoami-rajat> rosmaita, yep. ack.
14:07:32 <whoami-rajat> thanks
14:07:47 <rosmaita> #action rosmaita announcement to ML about virtual midcycle
14:08:05 <rosmaita> we'll do bluejeans again as a videoconf technology
14:08:46 <rosmaita> the other topic is that last time i looked, all devstack-based zuul jobs are broken
14:09:00 <rosmaita> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015432.html
14:09:13 <rosmaita> so recheck is futile until that's fixed
14:09:59 <whoami-rajat> they've proposed a patch which is merged in master but when i looked at grenade failure, it shows error on previous code, not sure why
14:10:45 <whoami-rajat> https://review.opendev.org/#/c/577955/27/lib/apache
14:10:55 <tosky> the new uwsgi brought havoc
14:11:16 <rosmaita> but the stable branches don't seem to be doing so well, either
14:11:17 <tosky> whoami-rajat: grenade install the previous branch
14:11:27 <rosmaita> #link https://zuul.opendev.org/t/openstack/builds?job_name=tempest-full#
14:11:34 <smcginnis> I thought at least train was fixed as of yesterday.
14:11:41 <tosky> so they sequence is: make grenade non-voting, make it non-voting also in the backport, fix the issue in the previous branch, reenable it
14:12:01 <tosky> but yeah, until stein things should work, but not rocky
14:12:04 <tosky> and queens
14:12:15 <whoami-rajat> tosky, shouldn't it use the previous uwsgi version then? isn't it constrained for every release?
14:12:29 <rosmaita> ok, so we think stein through master should work now?
14:13:10 <whoami-rajat> i still see failure in my master patch i updated yesterday https://review.opendev.org/#/c/735513/
14:14:35 <tosky> uhm, but that was yesterday
14:14:58 <tosky> whoami-rajat: this patch has recent results https://review.opendev.org/#/c/712832/
14:15:00 <rosmaita> hopefully ajaeger will send an update to the ML when everything is back to normal
14:15:29 <rosmaita> #topic python2.7 compatibility issue
14:15:36 <whoami-rajat> oh, thanks tosky
14:16:21 <rosmaita> ok, so we've been talking about backports being more difficult now that master and ussuri are python3-only, and most of the stable branches are not
14:16:41 <rosmaita> so, here's a case in point so we can see what we're up against
14:16:52 <rosmaita> #link https://review.opendev.org/#/c/733100/
14:17:10 <rosmaita> open that link and take a look at the zuul results
14:17:49 <rosmaita> also, since this is a driver change, there was third party ci run on it manually for stable/train
14:18:04 <rosmaita> but, take alook at this:
14:18:17 <rosmaita> #link https://review.opendev.org/#/c/733100/1/os_brick/privileged/scaleio.py
14:18:36 <rosmaita> look at line 99 and think for a minute before opening the next link
14:18:53 <rosmaita> #link https://launchpad.net/bugs/1883654
14:18:53 <openstack> Launchpad bug 1883654 in os-brick train "os-brick fail to get password from config file" [Undecided,In progress] - Assigned to hamza (alqtaishat)
14:19:27 <rosmaita> so, you may remember hearing me say that we need to have good tests that will break on python3 only features
14:19:41 <hemna_> :(
14:19:42 <rosmaita> and you may also notice that i +2'd the above patch
14:20:02 <rosmaita> anyway, here's a test:
14:20:05 <rosmaita> #link https://review.opendev.org/#/c/735989/6/os_brick/tests/privileged/test_scaleio.py
14:20:27 <rosmaita> which is a 68 line test file to test a function that contains 4 statements
14:20:41 <rosmaita> and i'm not even saying that's a great test
14:20:56 <rosmaita> for instance, there's a pretty good chance that it doesn't work on windows
14:21:34 <rosmaita> but, the point is that we need to up our test game
14:21:40 <rosmaita> or at least i do :D
14:21:48 <hemna_> fwiw, there is no scaleio connector for windows
14:22:02 <rosmaita> cool
14:22:21 <rosmaita> but i guess this unit test would run anyway
14:22:37 <whoami-rajat> i think i left a comment on master version of this patch regarding testing that method
14:22:37 <rosmaita> the problem is using the tempfile
14:22:51 <rosmaita> i believe you did
14:23:09 <rosmaita> so bonus points for whoami-rajat
14:23:28 <whoami-rajat> :D
14:23:48 <whoami-rajat> we were just testing if the method was called but not going inside it
14:24:08 <whoami-rajat> i'm not sure if it would've helped with this situation
14:24:24 <enriquetaso> whoami-rajat++
14:24:41 <rosmaita> yeah, so the code wasn't completely un-tested
14:24:47 <rosmaita> just not tested thoroughly enough
14:25:11 <rosmaita> my main point is that reviewers should feel empowered to demand more tests
14:25:28 <rosmaita> and reviewees need to be patient when more tests are demanded
14:26:04 <whoami-rajat> rosmaita++
14:26:08 <rosmaita> any comments or observations before we move on?
14:26:40 <rosmaita> i guess i have one more -- anyone guess why the third-party CI passed?
14:26:49 <enriquetaso> I agree
14:26:56 <rosmaita> because we made everyone move to Python 3!!!
14:27:24 <eharney> :)
14:27:26 <whoami-rajat> rosmaita, on stable branches also the CI running is py3?
14:27:36 <whoami-rajat> i guess it's the same CI
14:27:37 <rosmaita> so even if third-party CI is testing stable branches, we can't count on that to detect problems
14:28:23 <rosmaita> ok, so in the next topic we can discuss why this is a particularly painful bug
14:28:36 <rosmaita> #topic updating OSSN-0086
14:28:54 <rosmaita> ok, so the patch that introduced that bug is one of the patches addressing a security issue
14:29:06 <rosmaita> the security issue had to be fixed in both cinder and os-brick
14:29:17 <rosmaita> luckily, the cinder side of the fix doesn't contain this problem
14:29:28 <rosmaita> (and hopefully doesn't contain a different one)
14:29:58 <rosmaita> so, ussuri and master are python3 only, so we don't need to worry about those
14:30:23 <rosmaita> but we do have T, S, R, and Q (plus some joker has proposed a backport from Q -> P)
14:30:39 <rosmaita> at this point, ocata is dead to me, and pike is getting there
14:30:52 <rosmaita> but we can discuss pike later
14:31:04 <rosmaita> in the ossn, we promised to fix back to queens
14:31:14 <rosmaita> so that's all i'm worried about now
14:31:36 <rosmaita> here's the train patch, which still needs some work:
14:31:41 <rosmaita> #link https://review.opendev.org/#/c/735989/
14:32:07 <rosmaita> in particular, smcginnis and jungleboyj please look at the release note on that patch
14:32:30 <rosmaita> i tested locally, it will be a clean backport from T to S
14:32:52 <rosmaita> so once we get train fixed, that will be big progress
14:33:05 <jungleboyj> Ok.
14:33:21 <rosmaita> so, the OSSN lists the patches. i will have to update it to include the new patch
14:33:49 <rosmaita> we just released cinder and os-brick from train, i think we will have to do new releases?
14:34:12 <rosmaita> so what we're looking at is: merge when gate is fixed, release new os-brick, wait for u-c change to merge, update cinder requirements, release new cinder
14:34:29 <smcginnis> ++
14:34:44 <rosmaita> ok, thanks, just wanted to make sure i wasn't being overly dramatic
14:34:45 <jungleboyj> ++
14:35:30 <rosmaita> i'll include a similar release note on the new cinder release, just saying you only need to upgrade if you are using that particular backend and using python 2.7
14:35:59 <rosmaita> ok, so now when we get to Rocky and Queens, we don't release any more becuase they are in Extended Maintenance mode
14:36:52 <rosmaita> the commit message for the cinder patch refers to the os-brick patch (which is now broken)
14:37:06 <rosmaita> but it also mentions the OSSN, which i will update with the second brick patch
14:37:39 <rosmaita> and i think if the commit message on the second patch says "Fix regression", anyone grabbing the first patch will notice the second in the git log?
14:38:15 <rosmaita> plus, i can send out an embarassing email to the ML giving operators a heads up
14:38:27 <rosmaita> i guess that will take care of Rocky and Queens?
14:38:44 <rosmaita> anyway, if anyone has a better suggestion, let me know
14:38:53 <rosmaita> i am just thinking out loud here
14:39:30 <rosmaita> so basically, what i really need is for stable cores to look for these patches
14:39:39 <rosmaita> i will liberally ping people in IRC
14:39:51 <jungleboyj> ++
14:39:58 <rosmaita> #topic expected behavior when the default volume type config option specifies a non-existent volume_type
14:40:48 <rosmaita> ok, the context is the work in Train to make sure that every volume has a volume_type
14:41:00 <rosmaita> because we used to allow no type, and that caused problems
14:41:27 <rosmaita> so now we have __DEFAULT__ if an operator doesn't set a default type in the cinder conf
14:42:00 <rosmaita> and we have a migration (i think in ussuri?) that makes any untyped volume __DEFAULT__
14:42:05 <rosmaita> (is that correct whoami-rajat ?)
14:42:28 <whoami-rajat> rosmaita, yes
14:42:31 <rosmaita> cool
14:42:46 <whoami-rajat> basically it modifies the volumes,snapshots,encryption tables but yes
14:43:22 <jungleboyj> Sounds familiar.  :-)
14:43:29 <rosmaita> ok, so there is a slight regression at the REST API layer (nothing to do with the database)
14:43:58 <rosmaita> and in fixing that, i am having an argument with whoami-rajat about expected behavior
14:44:28 <rosmaita> so the issue is that before train
14:44:40 <rosmaita> if the operator mis-configured the default volume type
14:44:44 <rosmaita> with a non-existent type
14:45:02 <rosmaita> we would log a message, and let the build succeed with no volume type
14:45:13 <hemna_> the build ?
14:45:22 <rosmaita> building the volume
14:46:05 <rosmaita> the code in Train changes this so that instead of continuing the volume creation
14:46:14 <rosmaita> an exception is raised, and no volume
14:46:37 <enriquetaso> oh
14:46:38 <rosmaita> i was thinking that the analog to the old behavior would be to allow the create to succeed, but with __DEFAULT__ as the volume type
14:46:46 <eharney> failing the create is the right thing to do
14:46:58 <whoami-rajat> #link https://review.opendev.org/#/c/639180/41/cinder/volume/volume_types.py
14:47:01 <whoami-rajat> L#186
14:47:08 <hemna_> I think failing is the right thing to do
14:47:11 <rosmaita> well, except you get a 202 and only find out the volume create failed later
14:47:12 <hemna_> it's a misconfigured cinder
14:47:43 <eharney> we want to fail the create because it also enables the case where an admin wants to make users select a volume type instead of defaulting to one
14:48:14 <rosmaita> ok, what about this case
14:48:16 <hemna_> so the intention is to set an invalid default on purpose ?
14:48:29 <rosmaita> no, but it's logged for the operator
14:48:46 <rosmaita> all i'm saying is that we have __DEFAULT__, why not use it?
14:48:55 <hemna_> in eharney's case an invalid default is specified with no intention of fixing it ?
14:48:57 <jungleboyj> The problem with failing is that it is an API change.  Right?
14:49:05 <eharney> do we check for an invalid volume type in the config at service start up?
14:49:13 <rosmaita> not sure
14:49:18 <rosmaita> the other place this will occur
14:49:18 <whoami-rajat> i honestly don't remember exactly but this made sense as if we configure a wrong default type (the right one points to a specific backend) then the volumes will end up in a totally unexpected backend (with the __DEFAULT__ type)
14:49:29 <hemna_> failing should happen already for a volume type passed in that's invalid
14:49:53 <jungleboyj> At one point in history we were checking bad defaults at start-up.
14:49:54 <smcginnis> eharney: We probably should have a check on start up.
14:50:04 <hemna_> smcginnis +1
14:50:15 <rosmaita> ok, we can follow up with that
14:50:25 <e0ne_> smcginnis: +1. it's a good suggestion
14:50:27 <jungleboyj> ++
14:50:39 <rosmaita> the other case is if a user creates a volume from an image
14:50:43 <hemna_> the only downside to that is that they have to start cinder w/o a default set, then create the volume type, then change the default in cinder.conf, then bounce cinder
14:50:52 <rosmaita> you can put metadata on the image saying what volume type you want
14:51:11 <hemna_> vs w/o the start check, they can specify the missing type, then start cinder and create it.
14:51:13 <rosmaita> and since that's done by end-users, they are bound to make mistakes
14:51:47 <rosmaita> do we want to fail the build, or give them the default?
14:52:06 <rosmaita> they used to get a successful build with no volume type
14:52:08 <hemna_> create should fail with invalid volume type
14:52:17 <hemna_> as it does today if you pass in an invalid type
14:52:35 <eharney> yeah, i don't think we want to quietly use the default in the image case either
14:52:50 <rosmaita> ok
14:53:17 <hemna_> we could do a check at startup and throw a warning
14:53:22 <smcginnis> I agree. We either need to fail on startup, or we should fail the volume creation.
14:53:37 <rosmaita> well, for the image case, we can't check at startup
14:53:54 <jungleboyj> smcginnis: ++
14:54:08 <jungleboyj> Ok, then it sounds like we need to fail at creation.
14:54:19 <rosmaita> ok, so the conclusion is: if a volume-type is mis-specified, we do not give you a default, we fail the create
14:54:28 <eharney> yes
14:54:29 <rosmaita> which actually makes sense
14:54:31 <jungleboyj> Yes.
14:54:36 <rosmaita> it's just different from pre-Train
14:55:05 <whoami-rajat> but we already do it so we're fine :)
14:55:27 <rosmaita> well, apparently we don't do it too much, or someone would have caught the bug earlier :)
14:55:44 <eharney> for more justification: defaulting to a different type would be quite bad for someone who thinks they're using encrypted volumes but accidentally isn't
14:56:12 <rosmaita> yeah, silently doing something that sort of works is never a good strategy
14:56:19 <whoami-rajat> rosmaita, yep, that too!
14:56:38 <rosmaita> i'll revise my patch to preserve the train behavior, then
14:56:54 <rosmaita> ok, thanks for the discussion
14:57:02 <rosmaita> #topic open discussion
14:57:12 <rosmaita> LiangFang is having connection problems
14:57:28 <rosmaita> but is asking people to please look at his patches for volume-local-cache
14:57:47 <rosmaita> the brick patch is using six, which i think we don't want any more
14:58:08 <rosmaita> but i forgot to actually leave that comment on the patch
14:58:14 <jungleboyj> ++
14:58:18 <hemna_> ok I'll take a look
14:58:39 <rosmaita> hemna_ had some questions or observations in the cinder channel just before the meeting started
14:58:49 <hemna_> rosmaita thanks
14:58:53 <enriquetaso> Hi, same request here for the NFS encryption patch https://review.opendev.org/#/c/597148/
14:59:18 <rosmaita> i had really good intentions to review both, but this unending security bug ate my time
14:59:28 <hemna_> so I was wondering why the volume's existing AZ isn't specified in the request spec during retype by default?
14:59:53 <rosmaita> we will have to continue this in the cinder channel, horizon meets in 20 seconds
14:59:55 <hemna_> this ends up resulting in the scheduler picking a backend existing in a different AZ, which fails
14:59:59 <hemna_> heh ok
15:00:09 <rosmaita> sorry about that
15:00:12 <rosmaita> thanks everyone
15:00:15 <rosmaita> #endmeeting