14:00:04 #startmeeting cinder
14:00:04 Meeting started Wed Jun 28 14:00:04 2023 UTC and is due to finish in 60 minutes. The chair is whoami-rajat. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:04 The meeting name has been set to 'cinder'
14:00:08 #topic roll call
14:00:11 Hello all
14:00:22 hi
14:00:38 o/
14:00:40 hi
14:00:54 o/
14:01:18 #link https://etherpad.opendev.org/p/cinder-bobcat-meetings
14:01:35 guess who forgot this was the last wednesday of the month?
14:01:37 o/
14:01:38 * whoami-rajat facepalms
14:01:47 o/
14:02:15 o/
14:02:34 o/
14:02:57 o/
14:03:48 o/
14:04:48 good turnout today
14:04:51 let's get started
14:04:54 #topic announcements
14:05:06 Milestone-2 (06 July)
14:05:17 we have the volume and target driver freeze coming up next week
14:05:23 #link https://etherpad.opendev.org/p/cinder-2023-2-bobcat-drivers
14:05:34 I have created this etherpad to track the open drivers
14:05:54 currently there are only two, Yadro and Lustre
14:06:10 if there are any other drivers which i missed, please add them to the list
14:06:22 o/
14:06:45 others, please review the drivers
14:06:48 o/
14:07:06 next, SQLAlchemy 2.0 resolution
14:07:12 #link https://review.opendev.org/c/openstack/governance/+/887083
14:07:29 JayF proposed a governance patch to provide details about the migration to SQLAlchemy 2.0
14:07:54 basically they are planning to bump the requirements to SQLAlchemy 2.0 early in the 2024.1 cycle
14:08:05 to get an idea of which projects are compatible with it and which are not
14:08:26 it is also mentioned that projects need to move away from sqlalchemy-migrate and implement alembic
14:08:45 that work has already been done in Cinder, thanks to Stephen
14:08:57 hopefully the requirement bump to SQLAlchemy 2.0 won't affect cinder, but let's see
14:09:04 we will have a sufficient amount of time to fix the gate
14:09:23 also the resolution isn't final yet so there might be changes, but the above points should stay constant, as they have for the past few cycles
14:09:43 next, EM Discussion
14:09:50 there is no update on this
14:10:13 was just checking the TC meeting discussion and I guess Kristi or JayF is going to start a ML thread for continuing the discussion from the PTG
14:10:32 let's see if we get an update this week
14:10:47 not sure if we want to hold off on our EOL patches till then
14:10:55 rosmaita, what are your thoughts? ^
14:11:24 not sure, really
14:12:10 maybe we could do one more announcement to give packagers a heads-up
14:12:26 sounds good
14:12:30 i think we can separate out the overall EM discussion from what's best for cinder right now
14:12:34 do we want to use the old thread or create a new one?
14:12:55 i think create a new one
14:12:57 yeah true, not sure how long this discussion is going to go
14:13:12 so we can get something like "FINAL CALL FOR COMMENTS: " in the subject line
14:13:41 sure, i will take an action item for this
14:13:49 and good idea to separate it out from the EM discussion
14:14:14 #action whoami-rajat to send a reminder mail about cinder branches going EOL
14:14:30 sounds good
14:14:39 thanks!
14:14:54 that's all the announcements we had for today
14:15:00 does anyone have anything else?
14:15:32 rosmaita, i did do a summary of your summary of the PTG, not sure if you got time to go through it, but apart from that, anything from the PTG you would like to highlight?
14:16:08 i can't think of anything
14:16:22 i hope the poor turnout for the PTG was because of bad timing, not lack of interest
14:16:38 that is, we already had the 2023.2 PTG virtually a few months ago
14:17:05 yeah, that could be the same reason why we weren't able to gather many topics for the cinder sessions as well
14:17:07 so maybe companies didn't think it was a priority to go to this one
14:18:14 agreed, it doesn't seem the project is losing interest, it was just an event at an odd time
14:19:02 ok, guess that's all for announcements
14:19:05 let's move to topics
14:19:13 #topic Bug: https://bugs.launchpad.net/cinder/+bug/2002535
14:19:24 not sure who added it, there is no IRC nick
14:19:47 my bad, it's me
14:20:30 so it seems that there is a conversion from qcow2 to raw during volume creation from an image
14:20:31 oh ok, please go ahead
14:20:53 and the metadata of the volume is not updated
14:20:59 so it remains qcow2
14:21:06 while the block dev is raw
14:21:34 as a result, when creating an instance the attachment shows qcow2, but it works fine
14:21:48 until you want to resize it, where an error is thrown
14:21:57 saying that the formats mismatch
14:22:21 So I have proposed a fix: https://review.opendev.org/c/openstack/cinder/+/881549
14:22:27 are you using the generic nfs driver or vendor nfs?
14:22:52 is that the right gerrit link?
14:23:13 I'm using powerstore but it happens in the nfs generic space
14:23:15 weird
14:23:40 https://review.opendev.org/c/openstack/cinder/+/887081
14:23:46 that's the right one
14:24:11 I reproduced part of the problem from the launchpad bug..
14:24:33 i think that patch has some problems but i'll review it more closely and comment there
14:24:44 sure, I'm still learning
14:24:53 When creating a volume from a glance image, cinder created a raw volume. This happens because the nfs driver first fetches the glance image and keeps using the same format.
14:25:01 (it probably autodetects the wrong format when using raw images)
14:25:12 fetches the image to raw*
14:25:14 happystacker, is this config option enabled in your deployment? nfs_qcow2_volumes
14:25:44 and are you using a master deployment or an older version?
14:25:47 Yes, but it seems not to change anything
14:26:17 master deployment
14:27:52 hmm, i think we have a problem but need to test it
14:28:06 let me know if I can help more
14:28:16 we set the format during driver initialization to 'raw' https://github.com/openstack/cinder/blob/e673bcc368d3a24ec21713adbd83b4ab6cbcae18/cinder/volume/drivers/remotefs.py#L180
14:29:03 but if we enable the qcow2 option, we switch the format to 'qcow2' https://github.com/openstack/cinder/blob/e673bcc368d3a24ec21713adbd83b4ab6cbcae18/cinder/volume/drivers/remotefs.py#L326
14:29:27 but we will restart the service while doing the config change
14:29:30 so i guess it should be fine
14:30:02 anyway, as i said, it needs to be tested, but it is possible that the format value is not assigned correctly in some cases
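
To make the reported behaviour concrete: the on-disk format can be checked with qemu-img and compared against what the volume metadata records. This is a minimal sketch with hypothetical helper names, not the actual Cinder code paths (those live in cinder/volume/drivers/remotefs.py and cinder/image/image_utils.py):

    import json
    import subprocess

    def detect_disk_format(path):
        # Ask qemu-img for the actual on-disk format of the volume file.
        out = subprocess.check_output(
            ['qemu-img', 'info', '--output=json', path])
        return json.loads(out)['format']  # e.g. 'raw' or 'qcow2'

    def has_format_mismatch(volume_metadata, path):
        # The bug described above: the image was converted to raw during
        # create-from-image, but the metadata still says qcow2, so a
        # later resize fails with a format mismatch.
        recorded = volume_metadata.get('format', 'raw')
        return recorded != detect_disk_format(path)
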
14:30:12 whoami-rajat, in L530 `image_utils.fetch_to_raw`
14:30:54 problem happens here: https://github.com/openstack/cinder/blob/e673bcc368d3a24ec21713adbd83b4ab6cbcae18/cinder/volume/drivers/remotefs.py#L530
14:31:52 so maybe we can just add a validation step which prevents the conversion if the qcow2 option is enabled?
14:32:01 i think we always convert the qcow2 image to raw when we want to create a bootable volume from an image
14:32:23 do we support writing a qcow2 image to a qcow2 volume?
14:32:26 As for now, there is a check which verifies that both formats are the same; if that's not the case, it'll update the metadata
14:32:50 it's always converted, yes
14:32:51 we could support qcow2->qcow2, not sure if it tries to currently
14:33:25 so if the conversion happens, then the metadata should be updated as well, right?
14:33:32 i remember disabling qcow2->qcow2 in glance but not sure what the reason was
14:33:36 it looks like the key is checking the format when copying the image and doing the right thing when the volume is supposed to be a qcow2 format
14:35:24 yep, we don't support it in the glance cinder store, let me see if i can find the reason https://github.com/openstack/glance_store/blob/5a81f77bd48e46eac6ab0636f0f52dbceec4e8d3/glance_store/_drivers/cinder/nfs.py#L72
14:35:55 because the glance store is expecting to deal with cinder volumes that are raw, it doesn't support nfs snaps either
14:36:04 presumably because that restriction is much easier than adding support for all of that
14:36:10 ok
14:36:37 I'm not sure if it's possible, but maybe we replace fetch_to_raw with `fetch_to_volume_format` https://github.com/openstack/cinder/blob/e673bcc368d3a24ec21713adbd83b4ab6cbcae18/cinder/image/image_utils.py#L822
14:36:51 which means we need to keep the conversion
14:37:04 but update the metadata to reflect the change?
14:37:26 so that the attachment will be raw based and not qcow2
14:37:32 https://github.com/openstack/glance_store/commit/85c7a06687291eba30510d63d3ee8b9e9cb33c5f
14:38:04 looks like there is some problem with extend volume in the qcow2->qcow2 case, as i see written in the commit message
14:38:25 but don't remember anything else ...
14:40:01 also, we need to support writing images to qcow2 volumes to support nfs encrypted volumes
14:40:01 in that case.. i think updating the metadata looks good
14:40:28 yes, nfs encryption works with qcow2
14:40:51 eharney, ack, i think your comments make sense: we are lacking a lot of support around it, so better to just block it
14:42:58 senrique, i don't exactly know the reason why we call fetch_to_raw here, but we do that in all other reference drivers like lvm, ceph etc
14:43:47 but again, they don't have to deal with qcow2 volumes
14:43:52 not really sure
14:46:43 let's continue this discussion in the next event, maybe good for midcycle-2? let's see
14:46:48 we have another topic to discuss
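
The direction discussed above (keep the conversion but record the resulting format, or keep qcow2 when nfs_qcow2_volumes is enabled) could look roughly like this. The helper is illustrative and simplified; the real driver goes through image_utils.fetch_to_raw / fetch_to_volume_format rather than calling qemu-img directly:

    import subprocess

    def copy_image_to_volume_sketch(image_path, volume_path, qcow2_volumes):
        # If nfs_qcow2_volumes is enabled, keep the volume in qcow2
        # format (needed e.g. for nfs encrypted volumes); otherwise
        # convert the fetched glance image to raw.
        target = 'qcow2' if qcow2_volumes else 'raw'
        subprocess.check_call(
            ['qemu-img', 'convert', '-O', target, image_path, volume_path])
        # Whichever format we end up with, the volume metadata must
        # record it, or a later resize sees a format mismatch.
        return target
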
14:47:15 #topic test_rebuild_volume_backed_server failing 100% on ceph job
14:47:19 senrique, that's you
14:47:32 CI was failing because of that
14:47:42 #link https://bugs.launchpad.net/cinder/+bug/2025096
14:47:56 but now the "test_rebuild_volume_backed_server" test is skipped for the Ceph job until the problem is fixed.
14:48:14 #link https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/887003
14:48:24 that's all :)
14:48:28 senrique, i analyzed the job results yesterday and found the issue
14:48:41 let me see if it's still in my notes
14:48:50 i think the nfs job is failing with the same issue though
14:49:21 currently we have a timeout of 20 seconds per GB on the nova side during reimage
14:49:38 which we optimistically set for a good deployment
14:49:50 but our gate jobs are pretty slow with IO and hence take more time
14:50:02 the cinder logs tell the truth about the time taken for the operation
14:50:05 first log: 01:02:49.782746
14:50:11 last log: 01:03:23.661182
14:50:24 so it's around 34 seconds, which is greater than the timeout
14:50:34 hence we fail on the nova side
14:50:35 Jun 26 01:03:11.721921 np0034441113 nova-compute[97944]: ERROR oslo_messaging.rpc.server eventlet.timeout.Timeout: 20 seconds
14:51:07 I didn't know there was a bug report, i will add my analysis to it, though dansmith has already proposed patches to fix it
14:51:13 thanks for linking the bug, senrique
14:51:36 thanks whoami-rajat, good report
14:52:03 that's a good catch whoami-rajat
14:52:21 i'm currently looking into cinder-tempest-plugin-cbak-ceph failures as well, unclear if those are related
14:52:23 ++
14:57:26 sorry, got busy updating the bug, will do it later
14:57:31 let's move to open discussion for 3 minutes
14:57:34 #topic open discussion
14:57:47 there is a big list of review requests today
14:58:13 please take a look when you get some time
14:58:25 thank you!
14:58:40 Hi all. I just would like to have some eyes on https://review.opendev.org/c/openstack/cinder/+/868485, it's been a while. Thanks
15:00:15 drencrom: ack
15:00:20 we are out of time
15:00:24 thanks everyone for attending
15:00:26 #endmeeting
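
The timeout arithmetic from the reimage topic can be checked directly from the quoted log lines; the 1 GB volume size is an assumption (it is not stated in the log):

    from datetime import datetime

    # Nova allows 20 seconds per GB for reimage, so 20 s for a 1 GB
    # volume (the volume size is an assumption, not stated in the log).
    timeout = 20 * 1

    # Span of the cinder logs quoted above for the reimage operation.
    first = datetime.strptime('01:02:49.782746', '%H:%M:%S.%f')
    last = datetime.strptime('01:03:23.661182', '%H:%M:%S.%f')
    elapsed = (last - first).total_seconds()  # ~33.9 seconds

    # ~34 s > 20 s, hence the eventlet.timeout.Timeout: 20 seconds
    # raised on the nova side.
    assert elapsed > timeout
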