14:00:03 #startmeeting cinder
14:00:04 Meeting started Wed Jan 29 14:00:03 2020 UTC and is due to finish in 60 minutes. The chair is rosmaita. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:07 The meeting name has been set to 'cinder'
14:00:12 #topic roll call
14:00:14 hi
14:00:19 hey
14:00:27 hi
14:00:43 greetings thierry
14:00:59 #link https://etherpad.openstack.org/p/cinder-ussuri-meetings
14:01:03 hi! just lurking :)
14:01:09 Hi
14:01:12 hi
14:01:12 hi
14:01:33 o/
14:01:42 looks like a good turnout
14:01:51 #topic announcements
14:01:53 o/
14:02:29 i've been meaning to mention that you may have noticed, that i'm not as good as jay was about keeping notes in the agenda etherpad
14:02:37 so if you miss a meeting and want to know what went on
14:02:43 you need to look at the meeting log
14:02:53 otherwise, you may think nothing happened!
14:03:00 ok, first real announcement
14:03:09 :-) I can try to get back to doing notes.
14:03:12 #link https://etherpad.openstack.org/p/cinder-ussuri-meetings
14:03:20 that wasn't what i meant
14:03:23 o/
14:03:33 rocky goes to "extended maintenance" status next month
14:03:40 I think the meeting logs are the best. Especially with the use of #action, #info, etc.
14:03:52 smcginnis_: :-)
14:03:52 jay for notes ++
14:04:01 yeah, jungleboyj i'd kind of like to push people to using the meeting logs
14:04:11 ok but about rocky going to EM ...
14:04:13 final release must happen before 24 February
14:04:19 rosmaita: Ok. Sounds good.
14:04:20 you are doing great rosmaita
14:04:27 doesn't look like there are any/many outstanding patches for rocky
14:04:28 :P
14:04:50 rosmaita, i think one is mine
14:04:53 so this is really a notice that if there *is* something that looks like it should be backported, please propose it soon
14:05:22 whoami-rajat: right, i will keep an eye on that one
14:05:52 rosmaita, thanks
14:06:07 so we'll do the final rocky release 20 Feb
14:06:42 second announcement:
14:06:43 spec freeze on Friday 31 January (must be merged by 23:59 UTC)
14:06:48 that's this friday
14:06:54 hi
14:06:57 looks like we have 3 specs still in play for ussuri
14:07:06 they are on the agenda later
14:07:28 #topic Continued discussion about 3rd Party CI
14:07:37 thanks to jungleboyj for keeping on top of this
14:07:44 jungleboyj: you have the floor
14:07:51 :-)
14:08:14 Thanks. So, we started this topic last week and it seemed we needed to continue the discussion this week.
14:08:30 Or actually I guess it was during the virtual mid-cycle.
14:08:44 last week as well
14:08:55 :)
14:09:01 Anyway, I sent an e-mail to the mailing list and also targeted the CI e-mails for failing vendors.
14:09:13 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-January/012151.html
14:09:30 We got some responses as you can see in the etherpad.
14:09:31 hi, i am from HPE. we are trying to bring up our CI
14:09:45 raghavendrat: that is good to hear
14:09:46 its in progress
14:09:47 raghavendrat: Awesome.
14:09:52 Thank you for being here.
14:10:26 Thanks to ttx for working with the OSF to reach out to vendors as well.
14:10:53 So, the additional drivers to be unsupported has shrunk.
14:11:12 The question that is left, however, is what do we do now?
14:11:33 Do we need to re-address what we are doing with 3rd Party CI?
14:11:40 we had floated the idea last week about maybe just unsupporting but not removing drivers
14:11:53 i think smcginnis had a good point that you can't do that for very long
14:12:03 as libraries get updated, you will start to get failures
14:12:41 True. We are at the point that we have unsupported/removed nearly half the drivers over the last couple of releases.
14:12:50 Hi, I am from HPE, we are trying to set up our CI .. but facing some difficulties. is there any documentation available or automated scripts to bring the setup up in a single shot?
14:12:51 i am hoping the Software Factory project may help with CI
14:13:06 An alternative being that we could move them to a different repo with a noop CI job.
14:14:22 yeah, only keep CI-tested ones in mainline, and use a separate repo for everything else
14:14:25 rosmaita: Do we have someone working on setting up an example of how to use that?
14:14:38 rishabhhpe: take a look at https://softwarefactory-project.io/docs/index.html
14:14:38 The current doc is certainly lacking
14:15:00 smcginnis_: It seems keeping them somewhere is somewhat better than totally removing.
14:15:03 hi
14:15:06 jungleboyj: tosky was speaking with someone in the cinder channel the other day about it
14:15:32 i forget who though, but they were setting up a cinder CI
14:15:33 jungleboyj: Then if a distro wants to include them: "apt install openstack-cinder openstack-cinder-unsupported-drivers"
14:16:01 Ok. So, that is an option.
14:16:18 I just jumped in a discussion started by rosmaita :)
14:16:47 smcginnis_: you don't need to move them into a separate repository for distributions to split the packages
14:16:54 basically, for the Software Factory situation, we need someone to actually set it up for cinder and then report back
14:17:05 it was Hitachi i guess rosmaita tosky
14:17:11 tosky: Effect, not cause. ;)
14:17:21 there is a community around Software Factory, and RDO is using it for CI, so it is pretty solid
14:17:35 ok
14:17:36 whoami-rajat: ty, that's right, it was Hitachi
14:17:45 smcginnis_: moving code around complicates the usage of the history; my suggestion would be to keep them in-tree and mark them somehow with some annotation
14:18:15 tosky: That's what we have today.
14:18:21 smcginnis_: ++
14:18:22 The issue raised is that will eventually break.
14:18:27 i guess we could blacklist them from tests?
14:18:29 smcginnis_ but with removals part
14:18:42 So the options are either to remove them completely, or move them somewhere out of the way.
14:18:42 putting drivers in a separate repo also means you have to figure out how to keep dependencies in sync, or nobody will actually be able to install the unsupported drivers
14:18:58 eharney: Yeah, it just moves the problem really.
14:18:59 :-(
14:19:23 And since the vendors aren't maintaining them then it is unlikely anyone is going to do that work.
14:20:00 jungleboyj: +1
14:20:10 maybe we could move to unsupported list and when any dependency fails remove it?
14:20:23 smcginnis_: wouldn't it be possible to disable the setuptools entry points (if they are used; at least for sahara we used them)
14:20:23 IMHO, and from the past experience with sahara, either everything should stay in-tree as it is, or each driver should have its own repository from the start
14:20:23 any other solution is looking for troubles :)
14:20:24 m5z: Was just thinking that.
14:20:32 That might be a good compromise.
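A minimal sketch of the "blacklist them from tests" idea floated above, assuming a hypothetical unsupported driver's test module; the module, class, and reason text are illustrative, not actual Cinder code:

    # Hypothetical test module for an unsupported driver whose unit tests
    # have started failing after a library update.  Skipping the whole test
    # class keeps the gate green until the driver is fixed or removed.
    import unittest

    UNSUPPORTED_REASON = (
        "Acme driver is unsupported and its tests fail with newer library "
        "versions; skipped pending a vendor fix or removal from the tree.")

    @unittest.skip(UNSUPPORTED_REASON)
    class AcmeISCSIDriverTestCase(unittest.TestCase):
        """Unit tests for the hypothetical Acme iSCSI driver."""

        def test_create_volume(self):
            self.fail("would fail against the updated library")

Cinder's real unit tests use testtools-based base classes rather than bare unittest, but the same skip decorator applies there.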
14:21:02 m5z: that is a good idea
14:21:14 i'd prefer to just have one repo
14:21:20 rosmaita: ++
14:21:21 But then it's a fire and we can't wait to see if they get an update to any dependencies.
14:21:32 But probably better than just nuking them right away.
14:21:47 :-)
14:22:36 maybe we could have unsupported -> unit test failures -> removal before next release
14:22:46 ++
14:22:52 we would blacklist as soon as we hit unit test failures
14:22:54 We couldn't do removal before next release.
14:22:55 ++
14:23:08 It would have to be removal before we can merge anything else because suddenly the gate is borked.
14:23:18 Yeah.
14:23:43 if we blacklisted the tests, wouldn't that unblock the gate?
14:23:44 So, it goes away in that release, but that is ok because it was already unsupported.
14:24:25 rosmaita: So add a SkipTest to get around it right away, then remove by ~milestone-3 if not fixed?
14:24:37 I think I'd rather just remove it at that point.
14:24:59 Yeah, not sure the value of delaying the removal.
14:24:59 well, the skip test would give them a final few weeks to get it done
14:25:00 They can always propose a revert if dependencies are fixed, but considering it is already unsupported, that's not likely.
14:25:09 Fair enough.
14:25:58 smcginnis_: +1
14:26:52 ok, so we would remove an unsupported driver from the tree immediately upon it causing test failures in the gate
14:26:55 smcginnis_: That is like what we are currently doing.
14:27:17 jungleboyj: We wouldn't remove it the cycle after marking unsupported though.
14:27:25 Only as soon as it starts causing failures.
14:27:30 i think in a lot of cases we can opt to just fix the failing tests ourselves -- this is part of why it's useful to keep them in the tree
14:27:46 That might be an option if it's something trivial.
14:28:07 we could keep that as an unadvertised option
14:28:18 smcginnis_: eharney ++
14:28:18 yeah
14:28:56 alright, this sounds good ... i will write up something for us to look at before we announce this
14:29:02 but i think it's a good direction
14:29:09 Soooo... if we adopt this policy, are we going to revert some of the removals we've already done?
14:29:15 I see a lot of value in the CI we run ourselves (for "open source software" drivers). I'm unsure of the real value of 3rd-party CI for us. It's really a service for the vendors, to help them check they are not broken by changes
14:29:34 smcginnis_: foos uwarion
14:29:45 So i'm unsure we should support or unsupport them based on availability of CI
14:29:49 did not mean to say that
14:29:56 ttx: It's also good for the project as a whole as it prevents cases where someone installs cinder and has a lot of trouble getting it to run.
14:30:07 That looks just as bad for cinder as it does for the vendor.
14:30:20 smcginnis_: assuming that the 3rd-party CI actually tests the driver
14:30:22 Sometimes more so, because they think it's cinder's problem, not the vendor's problem.
14:30:36 ttx: Yes, but that's what I'm saying.
14:30:38 yeah, i would prefer to keep 3rd party CI
14:31:00 We need 3rd party CI, or we need to remove non-open drivers from tree.
14:31:09 rosmaita: It is at least an indication that the vendor is engaged.
14:31:11 yeah
14:31:34 smcginnis_: i guess we should consider re-instating the drivers removed during this cycle
14:31:43 And I think that there should be some incentive to stay engaged.
14:31:54 those are the two options. But I'd say the more difficult we make 3rd-party CI, the less likely it is to report useful results
14:32:21 It's been a constant headache, but as a whole, I think our 3rd party CI has been useful.
14:32:22 ttx: that is why we are pushing Software Factory
14:32:26 So the two options really are... simplify 3rd-party CI setup, or remove drivers that require special hardware from the tree
14:32:33 Well, that is the thing being worked on in parallel: making 3rd Party CI easier.
14:32:44 rosmaita: I agree, just trying to reframe why :)
14:33:11 It certainly can be simple: https://github.com/j-griffith/sos-ci
14:33:22 Just everyone wants to duplicate how infra works.
14:33:23 :-)
14:33:49 I thought at some point infra was pushing people to do that?
14:34:00 I don't think so.
14:34:08 This has been a headache for them too.
14:34:25 Ok. Yeah, I was surprised when they came back with that. I was unaware.
14:34:36 ok, we need to wrap this up for today
14:34:45 Yeah, let's move along.
14:34:53 rosmaita: Want to summarize the plan?
14:34:59 i think we made some progress
14:35:00 rosmaita: Please.
14:35:06 one query: whats the end date ... when would drivers be marked as unsupported/removed?
14:35:19 unsupported would be same as now
14:35:33 removal would be when first failure in our gate occurs
14:35:38 rosmaita: ++
14:35:51 i will write something up for us to review
14:36:04 #link https://wiki.openstack.org/wiki/Cinder/tested-3rdParty-drivers#Non-Compliance_Policy
14:36:21 #action rosmaita write up summary of what we decided or edit ^^
14:36:31 ok. will have a look and also keep close watch
14:36:59 you may want to reach out to the hitachi people and combine efforts on Software Factory
14:37:06 Sounds good. Should I revert the removals that I pushed up this cycle?
14:37:11 check the openstack-cinder channel log for yesterday
14:37:18 ok
14:37:39 jungleboyj: i would hold off until after we are absolutely sure about this
14:37:52 (just in case someone thinks of a major objection we haven't considered)
14:37:52 Upgrade checkers too.
14:37:58 right
14:38:06 thanks jungleboyj and ttx
14:38:08 Ok. So, continue discussion.
14:38:24 #topic Spec: Volume local cache
14:38:30 hi
14:38:32 Thank you guys.
14:38:34 #link https://review.opendev.org/#/c/684556/
14:38:55 should we do a microversion change for this?
14:39:00 my questions have been met except for the microversion one
14:39:09 https://review.opendev.org/#/c/684556/12/specs/ussuri/support-volume-local-cache.rst@180
14:39:45 i'm not sure "volume details" is the right place for that information unless i'm misunderstanding what that refers to
14:39:57 it should be part of the connection info etc, not the volume metadata?
14:40:09 it is in connection info
14:40:22 well, the volume-type extra specs will have the cacheable property
14:40:25 cinder fills in the fields in that
14:40:38 "volume details" sounds like it would appear on "cinder show" etc
14:41:06 yes, that's how it sounded to me
14:41:19 sorry for misleading
14:42:39 should I change the wording "volume details", then keep the microversion unchanged?
14:42:59 yes
14:43:10 ok, thanks
14:43:14 no microversion impact if the API response doesn't change
14:43:44 ok, other than that, i think eharney and geguileo had a bunch of comments on earlier versions of the spec
14:43:45 ok
14:44:07 would be good if you could make sure the current version addresses your concerns
14:44:52 LiangFang: did you have any questions?
14:45:10 no more questions now :) thanks
14:45:20 ok, great
14:45:32 #topic src_backup_id
14:45:41 #link https://review.opendev.org/#/c/700977/
14:45:59 this is close to being done
14:46:16 we talked last week about could it be a bug instead of a spec
14:46:28 Yeah, I still think this should just be dropped as a spec. Just add it.
14:46:30 but eric brought up a point about us using volume metadata for the field
14:46:42 i think that needs to be documented
14:46:57 mainly, that operators can't rely on it being there or accurate
14:47:13 but otherwise, i think the proposal is fine
14:47:27 also there was an issue about which id is used for incrementals
14:47:33 it's addressed in the spec
14:47:58 so, this will just need quick reviews once it's revised
14:48:07 but i don't think there's anything controversial
14:48:27 #topic Spec: 'fault' info in volume-show response
14:48:37 #link https://review.opendev.org/#/c/689977/
14:48:56 this is probably not ready
14:49:15 it's still not clear why the user messages won't work
14:49:33 and i don't like the idea of adding another DB table until we are sure it's necessary
14:49:39 yeah, i still don't have a sense of why we want to add this when we already have a system that attempts to mostly do the same thing
14:49:54 ++
14:50:09 there are probably some subtle differences but i suspect the answer is to just improve what we have rather than creating a new API for this
14:50:15 eharney: ++
14:50:25 i will keep an eye on it for revisions
14:50:41 seems like it's inspired by nova instances having a 'fault' property
14:51:19 yes, it's just not clear to me that it's going to provide the info the proposer is looking for
14:51:19 we currently have a scheme that ties faults to operations rather than the object being acted on
14:51:26 it's different, but seems to work well
14:51:49 if you want something like nova faults you can query our user messages by volume id already
14:52:29 well, i left enough comments asking for specific answers for what exactly can't be done
14:52:34 so we'll see what happens
14:52:37 yep, agreed. it's different but works
14:52:45 #topic sqlalchemy update to 1.3.13 breaks cinder
14:52:54 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-January/012210.html
14:53:06 ok, so the situation is that one of our unit tests fails
14:53:33 i took a look, but it turns out what we're doing in the test *only* happens in that test
14:53:41 so we could fix this by just changing the test
14:53:56 or by slightly modifying the db.sqlalchemy.api
14:54:31 i am inclined to just change the test at this point
14:54:53 because the db api change loads the glance metadata into each volume object
14:54:55 geguileo fixed some DetachedInstanceError problems a while ago, i wonder if this is a similar bug in our objects code that is just being revealed in tests now
14:55:13 that could be
14:55:42 most of the time when we want the glance info, we just make a call to get it, we don't expect it in the volume object
14:56:47 i'll grep the logs for geguileo's fix and see whether it's the same kind of thing
14:56:58 because i guess we'd do the same fix now to be consistent
14:57:15 ok, i'll take a look and then update my patch
14:57:37 the issue is usually us trying to do a lazy load when we no longer have the transaction in place...
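A minimal sketch of the lazy-load problem described here, using generic SQLAlchemy models rather than Cinder's actual schema (table, column, and relationship names are illustrative):

    # Generic SQLAlchemy example: touching a lazy relationship after the
    # session is gone raises DetachedInstanceError; loading it eagerly while
    # the session is still open avoids the problem.
    from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import joinedload, relationship, sessionmaker

    Base = declarative_base()

    class Volume(Base):
        __tablename__ = 'volumes'
        id = Column(Integer, primary_key=True)
        # Lazy by default: rows are fetched only when the attribute is read.
        glance_metadata = relationship('VolumeGlanceMetadata')

    class VolumeGlanceMetadata(Base):
        __tablename__ = 'volume_glance_metadata'
        id = Column(Integer, primary_key=True)
        volume_id = Column(Integer, ForeignKey('volumes.id'))
        key = Column(String(255))
        value = Column(String(255))

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)

    session = Session()
    session.add(Volume(id=1, glance_metadata=[
        VolumeGlanceMetadata(key='image_name', value='cirros')]))
    session.commit()
    session.close()

    # Broken pattern: read the object, close the session, then touch a
    # relationship that was never loaded -- there is no transaction left
    # for the lazy load to run in.
    session = Session()
    volume = session.query(Volume).filter_by(id=1).one()
    session.close()
    try:
        volume.glance_metadata
    except Exception as exc:
        print(type(exc).__name__)   # DetachedInstanceError

    # Fix: load everything that will be needed while the session is open,
    # for example with an eager load.
    session = Session()
    volume = (session.query(Volume)
              .options(joinedload(Volume.glance_metadata))
              .filter_by(id=1).one())
    session.close()
    print([(m.key, m.value) for m in volume.glance_metadata])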
14:57:41 i'm not sure how anxious the requirements team is to get sqlalchemy 1.3.13 into u-c
14:57:56 it works if it happens fast enough, but that's not usually the case iirc
14:58:12 maybe that's why it's suddenly broken
14:58:21 they may have optimized some code
14:58:30 and now it can't happen fast enough
14:58:36 in other words, it's usually bad code in cinder, something that could happen in a production env
14:59:04 as far as i can tell, this particular pattern is only used in that one unit test
14:59:19 i think the bot automatically updates u-c when a lib is released.
14:59:25 i mean puts up a patch for it
14:59:48 looks like we are out of time
15:00:01 thanks everyone! will try to have some open discussion next week
15:00:04 Thanks!
15:00:05 thanks!
15:00:07 but the CI discussion was helpful
15:00:16 thanks
15:00:27 #endmeeting