14:00:02 #startmeeting cinder
14:00:02 #link https://etherpad.openstack.org/p/cinder-ussuri-meetings
14:00:02 #topic roll call
14:00:03 Meeting started Wed Apr 15 14:00:02 2020 UTC and is due to finish in 60 minutes. The chair is rosmaita. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:07 The meeting name has been set to 'cinder'
14:00:16 hi
14:00:22 hi
14:00:28 o/
14:00:29 hi
14:00:32 hi
14:00:45 o/
14:01:00 o/
14:01:01 Hi
14:01:12 o/
14:01:16 o/
14:01:36 lots on the agenda today so i'll get started
14:01:44 #topic announcements
14:01:59 {mitaka,newton}-driverfixes branches have been tagged eol and deleted
14:02:09 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-April/014055.html
14:02:27 we agreed to do this a few months ago, but it required some manual intervention to make it happen
14:02:51 if you haven't noticed, there's been an etherpad migration and domain change
14:03:02 #link http://lists.opendev.org/pipermail/service-announce/2020-April/000003.html
14:03:31 hopefully, you won't see anything, though monday the meeting agenda page was causing an internal server error
14:03:34 but that was fixed
14:03:55 #topic announcements - Victoria Virtual PTG
14:04:03 ok, so the dates have been set
14:04:13 one week earlier than the physical event
14:04:19 not sure why
14:04:21 Yay!?!
14:04:27 new dates: June 1 through June 5
14:04:36 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-April/014126.html
14:04:48 that email outlines the ground rules for the virtual PTG
14:05:02 key things impacting us are:
14:05:09 No team can sign up for more than 4 hours per UTC day
14:05:09 o/
14:05:18 No team can sign up for more than 16 hours across all time slots
14:05:19 Which is good.
14:05:40 yeah, our 2-hour blocks have worked well for the midcycle sessions
14:05:53 anyway, here are the time slots:
14:06:04 #link https://ethercalc.openstack.org/126u8ek25noy
14:06:20 * lseki sneaks in
14:06:31 and here is a meeting time planner for the first day, covering the TZs usually represented at our cinder meeting
14:06:43 #link https://www.timeanddate.com/worldclock/meetingtime.html?month=6&day=1&year=2020&p1=159&p2=881&p3=141&p4=367&p5=176&p6=237&iv=0
14:06:55 i don't want to spend a lot of time on this today
14:07:19 i guess the thing to do is, please put time suggestions on the etherpad
14:07:29 #link https://etherpad.opendev.org/p/cinder-victoria-ptg-planning
14:07:35 or something
14:07:45 i really don't know a good way to organize this
14:07:53 so feel free to suggest ideas
14:08:02 :-)
14:08:16 but, do put time slot suggestions on the etherpad
14:08:33 or else i will schedule everything to be convenient for Roanoke time :)
14:08:41 hello world friends
14:08:51 Looks like 7 am Monday is the most likely time where things start to work.
14:08:54 ok, moving on
14:09:01 jungleboyj: :)
14:09:04 #topic announcements - FFEs
14:09:25 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-April/014035.html
14:09:33 that was the announcement to the ML
14:09:48 no other requests came in before yesterday's deadline, so that's all
14:10:00 i've been tracking progress here:
14:10:10 #link https://etherpad.opendev.org/p/cinder-ussuri-FFE-progress
14:10:16 looks like stuff is happening
14:10:40 i am beginning to wonder whether the nfs-volume-encryption is going to have to wait until victoria
14:10:51 enriquetaso: eharney: opinions?
14:11:04 i have also wondered the same
14:11:21 yep, I think it is going to victoria
14:11:28 Something like that doesn't sound like something that we want to rush in place.
14:11:32 ok, i will re-target that bp
14:11:33 which is to say i wouldn't be upset about moving it out
14:11:51 and let's try to focus on it very early in victoria
14:12:10 ok, the other FFE is macrosan, but they are on the agenda for later
14:12:18 ok final item
14:12:30 #topic announcements - end-of-cycle driver stuff
14:12:40 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-April/014124.html
14:12:50 there are some outstanding items
14:13:11 the link above is something i sent to the ML and also forwarded directly to the driver maintainers of record
14:13:35 also, tracking that on this etherpad:
14:13:46 #topic https://etherpad.opendev.org/p/cinder-ussuri-driverstuff
14:13:54 oops
14:13:58 :P
14:14:05 #link https://etherpad.opendev.org/p/cinder-ussuri-driverstuff
14:14:23 anyway, mostly driver maintainers need to check that list
14:14:40 i'll raise the priority flag on reviews as they come in
14:14:57 also, I'm beginning the 3rd party compliance check
14:15:08 ++
14:15:28 but, like i said last week, driver maintainers can check proactively to make sure their CIs are functioning reliably
14:15:45 because RC-1 is next week, and that's the deadline for having everything working
14:16:06 though, it was brought to my attention that the covid-19 situation is keeping people out of datacenters
14:16:07 Also a reminder that it is good to point new driver submitters to https://docs.openstack.org/cinder/latest/contributor/new_driver_checklist.html to make sure they are covering everything that we should be checking.
14:16:12 and not everyone can do stuff remotely
14:16:19 smcginnis: ty, good reminder
14:16:36 ok, that's all for announcements
14:16:47 #topic MacroSAN driver discussion
14:16:53 ruffian_sheep: that's you
14:17:27 Regarding the tempest test case tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach, there is a problem: it cannot pass.
14:18:26 Without any changes to the driver, if I additionally create an instance and a volume and perform the mount operation, this test case can pass.
14:18:57 you mean you can manually do what the test is doing, and you succeed?
14:19:05 http://120.77.149.115/88/711388/6/check/cinder-isicsi-driver/848d283/tempest_log/tox.log
14:19:09 http://120.77.149.115/88/711388/6/check/cinder-isicsi-driver/4a616b6/tempest_log/tox.log
14:19:19 ruffian_sheep, did you find any other error log except from the n-cpu one?
14:19:29 This is the log result of two condition changes
14:20:33 whoami-rajat | rosmaita: This test can be performed, but an error will occur when the resource is finally cleaned up. Moreover, in the logs of related test cases, errors are only found in the n-cpu.
14:21:40 In fact, the same error report exists in n-cpu, but after changing the conditions, the use case can be executed without error.
14:22:17 when you say "changing the conditions", what do you mean exactly?
14:22:51 the driver seems to work correctly if the test passes in the local run
14:22:57 Create a new instance and volume, and perform the attach_volume operation.
14:24:01 I don't know what the specific reason is, but when I do this and then execute the tempest use case, it can be executed completely correctly.
14:24:22 ruffian_sheep: Unrelated, but just want to note that the tempest run output should really be in the root log file. So job-output.log would be where most would expect to go to find that. Don't want to divert this discussion, but just letting you know.
14:24:34 there are some strange cinder api errors associated with that test failure in screen-c-api.log
14:25:31 smcginnis: got it, i will change it
14:26:32 eharney: I also saw it, but from the execution results, it seems to have no effect? And it is not directly related to the use case of multiattach?
14:26:43 c-vol shows a lock held for 51 seconds, maybe something is taking longer than tempest expects there
14:27:11 look at req-6f364876-aafd lines in c-api and c-vol logs... probably don't have time to debug it all here
14:28:21 ruffian_sheep: so, multiattach is a new feature you are adding
14:28:40 i wonder whether you should hold off on that until V
14:28:58 so you can get everything set to mark the driver 'supported' in U
14:29:10 Yes, I wanted to add this feature to the ussuri version, but given the deadline, it is a bit unrealistic.
14:29:28 it looks like everything else is working though?
14:30:17 Yes, and I initially passed the test case on the s version.
14:31:21 Because it was unclear at the beginning how to run the CI environment regularly and use the latest openstack version, I carried out the tempest test on the s version.
14:31:23 i think the thing to do is revise your patch without multiattach and address the comments on the review
14:31:54 anything else?
14:32:05 rosmaita++
14:32:07 rosmaita: ++
14:32:17 Yes, I confirmed this problem with whoami-rajat in the afternoon. For now, I will not add the related new features.
14:32:52 ok, thanks ... we will keep an eye on your patch
14:32:59 This way the CI could be marked supported and other features (apart from multiattach) could make it as well (probably)
14:33:14 into ussuri
14:33:29 #topic Continued discussion of: Cinder throws error creating incremental backup from parent in another project
14:33:35 ganso: that's you
14:33:47 we started this 2 weeks ago
14:33:55 link on the etherpad
14:34:37 i forget where we were on this, though
14:35:15 #link https://bugs.launchpad.net/cinder/+bug/1869746
14:35:15 Launchpad bug 1869746 in Cinder "Cinder throws error creating incremental backup from parent in another project" [Undecided,Confirmed]
14:36:12 ganso: comments?
14:36:25 oh sorry I missed the ping
14:36:30 ok
14:36:51 so, last time we discussed several different ways to tackle the problem
14:37:03 I summarized them in this etherpad
14:37:18 https://etherpad.opendev.org/p/cinder-backup-bug
14:38:02 basically I came up with 3 approaches from what we discussed, each with its pros and cons
14:38:18 ganso: how big a deal is this bug?
14:38:25 i'm getting server errors trying to load the etherpad
14:38:41 ganso: Having gone through all of those, is there one approach that makes the most sense to you?
14:38:57 eharney: Try a hard reload (ctrl+shift+r).
14:39:12 They upgraded etherpad, so it could be a bad cached js lib.
14:39:29 rosmaita: it is workaroundable, and IMO it is the consequence of the customer doing it not in the most appropriate way; my main concern is actually fixing it to avoid anybody else from hitting it in the future.
14:39:43 ganso: excellent
14:39:58 i think we should fix it, just looks like it could wait for V?
14:40:10 smcginnis: I haven't gone through them at the implementation level, I didn't have many cycles, I was mostly gathering info to see if I could map all the concerns and spot a dealbreaker
14:40:32 rosmaita: I was actually expecting a backportable fix
14:40:35 ganso: I mean just conceptually, what would be the most expected behavior of the service?
14:41:06 rosmaita: however, as you can see in the etherpad, the fix that looks more semantically correct is (c), but that doesn't look like it can be backported
14:41:20 In other words, how _should_ cinder handle something like this. What is the most correct and expected behavior under this scenario.
14:41:34 ok, let's all take an item to look over the etherpad and return to this next week
14:41:52 and ganso maybe you can answer smcginnis's question on the etherpad
14:41:55 Based on the discussion of whether an admin should be able to backup a tenant's volumes, (c) did sound like the more correct path to me.
14:42:16 smcginnis, can't load the etherpad with ctrl+shift+r :/
14:42:21 smcginnis: exactly, so the expected behavior, we tackled the situations where someone creates a backup on behalf of someone else, and accounts against that someone else's quota, this looks slightly unusual to me. What we see today makes sense semantically, but causes the bug, and it becomes a bit unusual because backups have parent relationships
14:42:37 more continued discussion: tosky are you around?
14:42:38 whoami-rajat: yeah, looks broken on the server side :/
14:42:40 In order to add more info: after debugging this a bit more, and thanks to Rajat's investigation, there is a difference between the API using elevated (admin role) and the manager using the user role. Looks like there is no reason why we are using an elevated context in the API code (checked into the DB), but removing the elevated context may break some other functionality that worked previously (in general this could affect other cases).
14:42:44 rosmaita: yep
14:42:54 * tosky waits for green light
14:42:55 #topic continuation of: cinder-tempest-plugin
14:43:33 apart from the reminder ("please go over the open cinder-tempest-plugin reviews"), I have a question about https://review.opendev.org/#/c/639205/
14:44:07 as you can see it is an interesting experiment for a more complex scenario test, which could deserve its own job
14:44:40 but it requires iSCSI multipath, and I'm not sure how to set that up on the gates
14:45:11 you would need the multipath daemon running
14:45:28 from some past discussions with some people, as I wrote in a comment on the review, I may have (incorrectly) understood that it's possible to use LVM for that?
14:45:30 the theory (per geguileo) is that we could do this with the lvm driver by setting some additional options -- but not sure what all the steps are
14:45:59 yeah, it's easy to do
14:46:01 i don't know if it requires configuring additional IPs or anything like that
14:46:10 yes! So the questions are a) if you would like to have this complex realistic scenario in the gates and b) if someone could please provide the instructions or guidelines for that
14:46:12 that's it
14:46:32 I think 639205 needs a rebase in order to run the new job "cinder-tempest-plugin-cbak-ceph"
14:46:48 it would definitely be cool to get this running
14:46:48 oh, sure, and also for the ddt thing
14:47:10 what are we testing here with that review? that multipath daemon can handle failover, or that cinder/os-brick can do a multipath attach?
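The additional options geguileo describes just below come down to exporting a second iSCSI portal from the LVM backend (iscsi_secondary_ip_addresses, with 127.0.0.1 being enough on a single-node deployment) and telling nova to attach through multipathd. A rough, untested sketch of how that might be wired into a devstack-based Zuul job follows; the job name is made up, lvmdriver-1 is just the devstack default backend section, and the job would still need the multipath daemon installed and running on the node.

    - job:
        name: cinder-tempest-plugin-lvm-multipath   # hypothetical job name
        parent: devstack-tempest
        description: |
          Single-node LVM/iSCSI scenario job exporting a second portal IP
          so volumes are attached over two paths (sketch only, untested).
        vars:
          devstack_local_conf:
            post-config:
              $CINDER_CONF:
                lvmdriver-1:
                  # second portal address; 127.0.0.1 works on a single node
                  iscsi_secondary_ip_addresses: 127.0.0.1
              $NOVA_CONF:
                libvirt:
                  # have nova/os-brick go through multipathd for attaches
                  volume_use_multipath: true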
14:47:10 next up: enriquetaso
14:47:10 one needs to set iscsi_secondary_ip_addresses with other IP addresses
14:47:21 #topic continuation of: Allow removing NFS snapshots in error status is stuck
14:47:30 if we are using a single node deployment, one can use 127.0.0.1 as the secondary IP
14:47:49 geguileo: please comment on the review :)
14:48:25 geguileo: doesn't have to be exact, if you can just point out what to look for
14:48:25 ok.. so after discussing with eharney, I guess the patch isn't so bad
14:48:43 #link https://review.opendev.org/#/c/679138/
14:48:53 About NFS snapshots in error... the model is always: try to delete the snapshot on the backend, and if it doesn't exist, then succeed. All drivers do that, and this should just follow the same model.
14:49:20 so, I should update the patch with this comment and see what happens
14:49:33 that sounds sensible
14:49:36 ++
14:49:48 that was quick, thank you enriquetaso
14:49:54 \o/
14:50:02 #topic Cinder master compatibility status with Ceph Nautilus and beyond
14:50:06 vkmc: that's you
14:50:13 o/
14:50:17 hey folks
14:50:36 I'm working on updating the devstack-ceph-plugin script
14:50:50 something we use on the ci for manila, cinder, glance and nova
14:51:13 right now we are testing the master branch for openstack with old versions of ceph
14:51:37 luminous is the latest version we have in there
14:51:59 I had looked at rewriting the plugin using ceph-ansible since it seems to handle scenarios and versioning better than the old plugin code
14:52:12 so... I don't want to break your ci, and therefore I wanted to ask before moving forward
14:52:17 ceph-ansible can also handle ceph iscsi too
14:52:21 have you been testing cinder with nautilus?
14:52:22 so if we change the default release of ceph in there, it will change it for our jobs on stable branches too, right?
14:53:00 it depends on how it is set on that job, I guess
14:53:12 my idea was to submit an experimental job first
14:53:17 keep what we have now, continue testing with luminous
14:53:28 only the master (and maybe ussuri) variant of the job could be changed to use nautilus
14:53:39 and then, with time, drop the experimental job and promote it
14:53:44 i don't recall that we specify versions in our jobs... probably need to decide what the correct ceph version is to use for older stable branches, and if it matters
14:54:02 btw, this is the patch for this update I'm talking about, if you want to take a look https://review.opendev.org/#/c/676722/
14:54:31 eharney, we don't have that option on the plugin yet, we just pull whatever version is hardcoded there
14:54:46 and that's what I want to implement :)
14:55:19 I see you have two gates in cinder (the ones I could see)... one in the check pipeline and one for third party IBM, not sure if you have another one using the plugin
14:55:29 oh... that CEPH_RELEASE var is misleading currently
14:55:30 vkmc: you would need to set CEPH_RELEASE in the vars section of the branch-specific variant of the job
14:55:37 tosky, yes
14:55:38 uh
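To make that suggestion concrete: a branch-specific Zuul variant can pin the newer Ceph release for master while stable branches keep the plugin's current default (luminous). The sketch below reuses the cinder-tempest-plugin-cbak-ceph job name mentioned earlier only as an example; whether the value is consumed directly as CEPH_RELEASE or passed through devstack_localrc depends on how https://review.opendev.org/#/c/676722/ lands, so treat it as illustrative only. Starting this as an experimental-pipeline job, as vkmc suggests, keeps it from breaking existing gates while the new default is proven.

    - job:
        name: cinder-tempest-plugin-cbak-ceph
        branches: master
        vars:
          # master (and possibly ussuri) tests against Nautilus;
          # stable branch variants omit the override and keep luminous
          CEPH_RELEASE: nautilus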
<- yes
14:57:14 so, experimental gate for master, continue using whatever version we were using for stable branches
14:58:23 we just need to pick what version(s) we want to run for stable
14:58:26 we have 2 more minutes, so we can continue the discussion on the cinder channel if that sounds good for you
14:58:49 ok, experimental sounds good, won't break anything
14:58:56 we can figure out the details on reviews
14:59:01 ok, 1 minute
14:59:07 thanks, vkmc
14:59:07 thanks folks
14:59:15 #topic open discussion
14:59:18 30 seconds
14:59:21 thanks
14:59:27 Thanks!
14:59:30 anything?
14:59:34 cinder is awesome
14:59:42 ok, can't top that
14:59:42 :)
14:59:45 Just wanted to point out that we will need this before victoria when py38 becomes voting: https://review.opendev.org/#/c/720008/
14:59:45 :D
14:59:45 #endmeeting