Wednesday, 2018-12-12

openstackgerritYikun Jiang proposed openstack/cinder master: Add x_project_id, accepted to transfers
openstackgerritYikun Jiang proposed openstack/cinder master: Improve volume transfer records
openstackgerritBrin Zhang proposed openstack/cinder master: Add user_id in backup list and show API
*** lpetrut has joined #openstack-cinder06:30
openstackgerritYikun Jiang proposed openstack/cinder master: Add x_project_id, accepted to transfers
openstackgerritYikun Jiang proposed openstack/cinder master: Improve volume transfer records
*** Dinesh_Bhor has joined #openstack-cinder06:47
openstackgerritYikun Jiang proposed openstack/cinder master: Add x_project_id, accepted to transfers
openstackgerritYikun Jiang proposed openstack/cinder master: Improve volume transfer records
openstackgerritGorka Eguileor proposed openstack/cinder stable/rocky: Ensure image utils don't block greenthreads
openstackgerritwang yong proposed openstack/cinder master: Cinder volume driver for Inspur AS13000 series
openstackgerritGorka Eguileor proposed openstack/cinder master: Add cinderlib functional tests
openstackgerritSofia Enriquez proposed openstack/cinder stable/rocky: Ensure image utils don't block greenthreads
mszwed /q lennyb14:33
mszwedoops :)14:33
noonedeadpunkHi everyone. It seems that read/write_iops_sec_per_gb_min do not work on Rocky. So I have a disk type with the following limits
noonedeadpunkAnd have 2 disks, which are connected to the instance - 10Gb VDC and 200GB VDD:
noonedeadpunkAnd min value is always applied, not depending on size of the disk15:11
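For comparison, this is the arithmetic one would expect from a per-GB limit with a floor. It is a minimal sketch, not Cinder's code: `effective_iops` is a hypothetical helper, the numbers are made up (the actual limits are not shown above), and it assumes the `_min` value acts as a lower bound on size times the per-GB value.

```python
def effective_iops(size_gb, per_gb, minimum=None, maximum=None):
    """Scale a per-GB IOPS limit by volume size, then clamp it.

    Hypothetical helper: assumes `minimum` is a floor and `maximum` a ceiling.
    """
    iops = per_gb * size_gb
    if minimum is not None:
        iops = max(iops, minimum)
    if maximum is not None:
        iops = min(iops, maximum)
    return iops


# Made-up limits, not the ones from the disk type described above.
PER_GB, MINIMUM = 10, 500

small = effective_iops(10, PER_GB, minimum=MINIMUM)   # 10 GB: 100 is below the floor
large = effective_iops(200, PER_GB, minimum=MINIMUM)  # 200 GB: 2000 is above the floor
print(small, large)
```

Under these assumptions only the 10 GB disk should end up at the minimum. If the 200 GB disk gets the minimum as well, the size scaling is not being applied.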
whoami-rajatHi smcginnis  jungleboyj , can you please take a look at . Thanks!15:58
openstackgerritWoojay Poynter proposed openstack/cinder master: Update DRBD (drbdmanagedrv) Driver to LINSTOR (linstordrv)
markguz_jungleboyj: i got it mostly working as you described. can't seem to actually attach the volume tho. nova throws an exception complaining about "KeyError: 'target_wwn'"18:23
openstackgerritMerged openstack/cinder master: An alternate way to fix retype notifier test case
jungleboyjmarkguz_:  Hmmm, that is interesting.  So something isn't totally working obviously.18:35
markguz_jungleboyj: yeah. i think it can't locate the FC device...19:00
jungleboyjOdd that it is coming through as a KeyError.  Seems like that is a bug.19:03
jungleboyjAre there any errors in the volume log?19:04
mriedemthis is a pretty old gate bug at this point
openstackLaunchpad bug 1763712 in Cinder "Unable to update the attachment.: MessagingTimeout" [Medium,Confirmed]19:39
mriedemwould be good to get cinder people to take a look at that again19:39
mriedemnot sure if that's just a problem on slow nodes or what, or if something is inefficient during volume attachment update in cinder-volume19:40
mriedemthe rpc call from c-api to c-vol definitely doesn't help19:40
mriedemalso, this was marked as fixed but something must have regressed it
smcginniscanori01: No, thick or thin. See the lvm_type config option:
canori01smcginnis: Ah, I was just looking at that. So default=thick? Because it says on there that auto defaults to thin20:01
smcginniscanori01: Yeah, if you want thick then set lvm_type=thick20:02
canori01smcginnis: Thanks. That's not one of the keywords listed on the doc, but I'll give that a shot20:03
smcginnis"keywords listed on the doc"?20:04
canori01yeah, it says "(String(choices=[‘default’, ‘thin’, ‘auto’]))" "thick" is not listed there as a choice20:04
canori01but if "thick" works, I'll go with that20:05
smcginnisOh! Sorry, thick is indeed not an option there.20:06
canori01yeah, that's why I was wondering if "default" was thick. Or if thick is supported at all20:07
smcginnisI'd have to look closer (which I'm not able to do right now), but IIRC, default or auto will give you thick if the VG has not been set up as thinly provisioned.20:08
smcginniseharney: You around? I know you probably know all that off the top of your head. ^20:08
*** kmalloc is now known as notmorgan20:10
mriedemjungleboyj: dansmith: on that attachment update rpc timeout,20:39
mriedem99% of the 60 seconds spent are on this call:20:40
mriedemDec 10 22:03:24.340740 ubuntu-xenial-inap-mtl01-0001107482 cinder-volume[31110]: DEBUG oslo_concurrency.processutils [req-2627d823-75bc-4092-bb8e-f079761d732f req-fec531fa-7200-4e15-a820-18ea7e5e6214 tempest-ServerActionsTestJSON-1162184781 None] CMD "sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-b62c0c8d-b3cd-46d0-b009-cabc13115f07" returned: 0 in 58.435s {{(pid=312020:40
mriedemxecute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/}}20:40
dansmiththat seems really odd20:40
dansmithlvchange should be pretty easy/fast,20:40
jungleboyjSheesh.  Why would an LVChange take that long?20:41
dansmithalthough I think it does require doing a block device refresh20:41
eharneywhich job is that on?20:41
mriedemeharney: nova-multiattach job, but not a multiattach volume or test20:41
smcginnisThere are some retries around lvchange since every 6 months or so there's some update that causes the call to fail the first few times.20:41
mriedemthis isn't a retry, unless it's not logging it20:42
dansmithbut I think mriedem is saying this one call blocks for that long20:42
mriedemreturned: 0 in 58.435s20:42
dansmithmriedem: also, that's 96%.. come on.. FAKE NEWS20:42
mriedemhey i'll shut down this whole gd government20:43
dansmithis this a manhood thing for you?20:43
jungleboyjIf that is even possible for him.20:43
dansmithokay that's enough before I barf.20:44
smcginnisSo if lvchange is taking its sweet time, not sure how we can improve that situation in the cinder code.20:44
mriedemfwiw a couple of seconds before that another volume is hitting lvchange20:44
dansmithmriedem: and how long does that one take?20:45
dansmithsmcginnis: do you guys @synchronized your calls to lvm stuff?20:45
mriedemDec 10 22:03:24.317836 ubuntu-xenial-inap-mtl01-0001107482 cinder-volume[31110]: DEBUG oslo_concurrency.processutils [req-9e16b85d-5662-4c70-aa13-88cd9c6a03fc req-ba539360-fea6-4538-88dd-afa948120e1c tempest-ServerRescueNegativeTestJSON-1204517625 None] CMD "sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-c48c90f2-bf42-473b-b9a4-48f7e6f97f24" returned: 0 in 61.526s {{(p20:45
mriedem1207) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/}}20:45
mriedemthat one takes over 61 seconds20:45
smcginnisdansmith: I seem to remember it being synchronized.20:45
dansmithif not, it would be interesting to see if serializing all lvm ops make things overall go faster20:46
dansmithI thought they had some of their own internal locking, but if they're fighting and thrashing, that might really go slow on an io-limited instance20:46
jungleboyjdansmith:  That would be my guess as to what is happening.20:47
dansmithjungleboyj: could you throw up a patch we could hammer on a bit?20:47
jungleboyjEspecially given that we have started seeing more issues like this since the size of our infra cloud has decreased.20:47
mriedemsmcginnis: this code right?
smcginnisHmm, yeah. activate_lv and deactivate.20:48
smcginnisDoes NOT appear to have synchronization.20:48
smcginnisWe could try that and see if it makes a difference.20:48
dansmithno lockutils in there at all20:48
dansmithwe'd want to sync all ops, not just all lvchange, so something more than just a decorator would be needed20:49
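A rough sketch of the idea being discussed: put every LVM helper behind one shared lock so that no two operations run at once. This is not the patch proposed in the channel. It uses a plain `threading.Lock` as a stand-in for oslo.concurrency's lockutils, `serialized_lvm` is a hypothetical decorator, and `activate_lv` / `deactivate_lv` here are dummies that only sleep.

```python
import functools
import threading
import time

_LVM_LOCK = threading.Lock()  # hypothetical: one lock shared by all LVM helpers


def serialized_lvm(func):
    """Run the wrapped helper while holding the shared LVM lock."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _LVM_LOCK:
            return func(*args, **kwargs)
    return wrapper


# Bookkeeping to observe how many fake LVM calls run at the same time.
_active = 0
max_active = 0
_counter_lock = threading.Lock()


def _fake_lvm_call():
    """Stand-in for shelling out to lvchange; just sleeps briefly."""
    global _active, max_active
    with _counter_lock:
        _active += 1
        max_active = max(max_active, _active)
    time.sleep(0.01)
    with _counter_lock:
        _active -= 1


@serialized_lvm
def activate_lv(name):
    _fake_lvm_call()
    return name


@serialized_lvm
def deactivate_lv(name):
    _fake_lvm_call()
    return name


threads = [threading.Thread(target=op, args=("volume-%d" % i,))
           for i, op in enumerate([activate_lv, deactivate_lv] * 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_active)  # never more than one fake LVM call in flight
```

Because the lock is shared rather than per-function, an activate and a deactivate cannot overlap either, which is the "all ops, not just all lvchange" point. A real change would also have to cover other processes on the same host, which an in-process lock does not.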
eharneydo we know if this happens on lvm jobs that don't use the scsi-target-utils target?20:49
smcginnisI think when I've seen these before there weren't any signs of other operations going on at the same time, but shouldn't hurt to try.20:49
dansmithmriedem: so that code is in brick.. can I change it in their tree and make them use the change, or do I have to do complicated library things?20:51
jungleboyjdansmith:  What do you mean as far as a patch?20:51
dansmithjungleboyj: I'll start20:51
jungleboyjOk.  Cool.20:51
smcginnisThe LVM bits aren't in os-brick. That's still local in o/cinder20:51
mriedemdansmith: it's in cinder20:51
eharneythis message in syslog is interesting:   Dec 10 22:02:31 ubuntu-xenial-inap-mtl01-0001107482 kernel: device-mapper: thin: 252:2: reached low water mark for data device: sending event.20:52
jungleboyjsmcginnis:  Are you thinking that the activate and deactivate might be blocking each other?20:52
openstackgerritDhinesh Balasubramaniam proposed openstack/cinder master: Hedvig Cinder driver implementation
*** hedvig_01 has joined #openstack-cinder20:53
smcginnisjungleboyj: Or multiple lv ops slowing each other down.20:53
smcginnismriedem: It does appear there were two there in the logs you linked to above.20:53
jungleboyjOk.  That makes sense.20:53
smcginnisTwo overlapping calls that is.20:53
mriedemDec 10 22:02:22 ubuntu-xenial-inap-mtl01-0001107482 sudo[5460]:    stack : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/bin/cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-c48c90f2-bf42-473b-b9a4-48f7e6f97f2420:55
mriedemDec 10 22:02:25 ubuntu-xenial-inap-mtl01-0001107482 sudo[5504]:    stack : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/bin/cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-b62c0c8d-b3cd-46d0-b009-cabc13115f0720:55
dansmithI'll push a patch in a sec20:55
mriedem3 seconds apart20:55
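The overlap can be checked from the numbers already pasted: subtract each reported duration from the timestamp of its "returned: 0 in ...s" line and compare with the sudo entries. A small sketch using only values from the log lines above. The year is assumed from the log date, and `start_time` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def start_time(returned_at, duration_s):
    """Estimate when a command started from its completion time and duration."""
    return returned_at - timedelta(seconds=duration_s)


# Completion timestamps and durations from the two DEBUG lines (year assumed).
c48c_end = datetime(2018, 12, 10, 22, 3, 24, 317836)
b62c_end = datetime(2018, 12, 10, 22, 3, 24, 340740)
c48c_start = start_time(c48c_end, 61.526)
b62c_start = start_time(b62c_end, 58.435)

# Time during which both lvchange calls were running.
overlap = min(c48c_end, b62c_end) - max(c48c_start, b62c_start)

print(c48c_start.time(), b62c_start.time())
print("gap between starts:", (b62c_start - c48c_start).total_seconds())
print("overlap:", overlap.total_seconds())
```

The estimated starts land at 22:02:22 and 22:02:25, matching the sudo entries, so the two calls ran side by side for nearly their whole duration and finished within a fraction of a second of each other.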
mriedemnote also that cinder-backup calls initialize_connection in cinder-volume which is the same thing that attachment_update does, so likely the same source of the cinder backup messaging timeouts20:56
jungleboyjHmm, interesting.20:56
hedvig_01@eharney: Thank you for the comments. We have pushed the code to openstack repository.
openstackgerritDan Smith proposed openstack/cinder master: WIP: Synchronize all lvm operations
dansmithdoes that look like it'll do it? ^20:57
smcginnisdansmith: Yeah, I think that looks right. Thanks20:58
mriedemdansmith: the theory being that 2 lvchange -a happening at the same time are causing both to be extra slow?20:58
dansmithmriedem: yeah, so a couple possibilities:20:59
jungleboyjhedvig_01:  Have you gotten the boot pattern test running yet?20:59
dansmith1. access to dm is one at a time, IIRC, which means they'll fight to setup dm devices to some degree, and20:59
dansmith2. If they're doing anything that causes buffer cache to flush, two running at once on an io-limited system will slow things down21:00
dansmithIIRC, there is a long history of issues with lvm concurrency21:00
smcginnisTotally plausible it could cause some sort of internal deadlock/thrashing to happen.21:00
jungleboyjdansmith:  I know you are right on the concurrency issues.21:02
dansmithmriedem: see here, and look at "locking_type" and below:
hedvig_01@jungleboyj: No we did not.  As mentioned earlier, we do not take responsibility for quiescing. If IOs are done on a volume and immediately a snapshot is invoked without quiescing, data will be inconsistent.21:02
hedvig_01I was waiting for a response from you guys for the  same.21:03
mriedemdansmith: aha21:03
mriedemand wait_for_locks defaults to 121:03
mriedemso yeah by default we'll lock on /var/lock/lvm for concurrent operations21:04
jungleboyjeharney:  Wasn't that the big discussion earlier in the week or last week that quiescing isn't a requirement for any other drivers?21:04
jungleboyjYou just get a crash-consistent snapshot without quiescing ?21:04
dansmithwhich means it is using its own locking and should be providing protection, but we could be having dining philosopher type interaction21:04
eharneyjungleboyj: yes, that test not working due to concerns about quiescing doesn't make sense IMO, i think something else is going on there21:05
jungleboyjeharney:  Ok.  I agree.21:05
*** rcernin has joined #openstack-cinder21:06
jungleboyjhedvig_01:  Does the test not run correctly or are you disabling it because you are concerned with the results of doing that operation?21:06
redcavalierHi, we made a bug report yesterday and we were wondering if somebody could have a look at it, to at least confirm that it's valid.21:12
redcavalierHere's the report :
openstackLaunchpad bug 1808083 in Cinder "DellEMC SC Storage Center API "Unable to find result where folderPath is" - Duplicate folders" [Undecided,New]21:12
hedvig_01@jungleboyj , @eharney: I  can get a crash-consistent snapshot without quiescing. Does openstack follow this . Send IOs, Wait for current IOs to succeed and  then issue snapshot?21:18
jungleboyjhedvig_01:  I can't speak as to how this is handled on the Nova side, but I know other drivers are able to create snapshots without any type of quiescing happening.21:22
eharneyhedvig_01: what protocol does your driver attach with?21:22
*** ganso has quit IRC21:25
hedvig_01@eharney: block protocol21:28
jungleboyjhedvig_01:  So, lets make this simple.  The boot pattern test is required.  Other storage backends using iSCSI work fine as a sync is done and if the iSCSI cache flush is done properly all the data should be there.21:28
jungleboyjIf your driver can't handle that, then it isn't ready to be merged.21:28
markguz_jungleboyj: digging around. it seems that ubuntu 16.04 creates a /dev/disk/by-path/pci-0000:01:00.1-fc-0x500507605ebff681-lun-121:38
markguz_but nova on mount is looking for /dev/disk/by-path/fc-0x500507605ebff681-lun-121:39
markguz_i guess i need to ask the nova folks about this21:39
markguz_seems weird21:39
jungleboyjThat does seem weird.21:39
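To make the mismatch concrete, here are the two by-path names side by side. This is only an illustration of the naming difference, not os-brick code: `fc_by_path_candidates` is a hypothetical helper, and the PCI address, WWN and LUN are the ones quoted above.

```python
def fc_by_path_candidates(pci_num, wwn, lun):
    """Return the by-path names with and without the PCI prefix.

    Hypothetical helper for illustration; not part of os-brick.
    """
    with_pci = "/dev/disk/by-path/pci-%s-fc-%s-lun-%s" % (pci_num, wwn, lun)
    bare = "/dev/disk/by-path/fc-%s-lun-%s" % (wwn, lun)
    return [with_pci, bare]


paths = fc_by_path_candidates("0000:01:00.1", "0x500507605ebff681", 1)
for path in paths:
    print(path)
```

The first name is what Ubuntu 16.04 creates on this host, and the second is what the attach code looks for. A lookup that tried both forms would find the device in either case.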
jungleboyjhemna:  You around?21:40
imacdonnthat's actually a os_brick thing (which nova and cinder both utilise)21:40
jungleboyjDid something change in ubuntu that that isn't working?21:41
hedvig_01@jungleboyj , @eharney :  I see most of them skipping this test too. I had sent a link earlier and also some other vendor also raised the same concern last week while we were discussing this .  Let me try running this testcase in loop and check if a sync is performed before we receive a snapshot request. If that has happened, and we acknowledged all IOs before snapshot came in, then definitely it's something we are missing on our end.21:41
markguz_jungleboyj: the only diff here is that this is ppc64el21:41
markguz_or le if you're not ubuntu21:41
hedvig_01I'll keep you  guys posted21:41
jungleboyjmarkguz_:  Oh, interesting.  I have seen differences like that before when working with PPC64 back in my IBM days.21:43
markguz_jungleboyj: i might need to modify udev rules somehow21:43
*** lpetrut has joined #openstack-cinder21:44
*** _alastor_ has joined #openstack-cinder21:46
hedvig_01@eharney: Thank you for the comments. check_for_export , get_Export etc are not deleted because it throws an error if we delete these functions. I guess it's a part of the interface. Log level are changed to info because we got a few comments saying we do not have info logs and it's difficult to trace in production env. So I made a few logs info, so that the code flow can be traced21:56
markguz_jungleboyj: so it seems that os-brick thinks ppc64 doesn't need the prepended pci-path. there's separate where _get_host_devices does not include pci_num22:19
markguz_i'm running pike, so maybe this is fixed already22:19
markguz_pike is getting old i guess22:19
jungleboyjmarkguz_:  Ok.22:19
markguz_jungleboyj: i hacked the file and the volume attaches no problems22:20
jungleboyjWhat OS are you running?22:22
markguz_ubuntu 16.04 LTS22:22
jungleboyjOk, you said that earlier.22:22
markguz_little endian22:22
jungleboyjI am guessing that the IBMers made an assumption of RedHat and RedHat may handle things differently?22:23
jungleboyjIt would seem to me that it would be valid to try update os-brick to try and handle both cases.22:23
jungleboyjMay want to check with Chhavi Agarawal and get her input though.22:24
markguz_well it could be that os-brick rocky has the changes. i haven't checked22:24
markguz_k. will do22:24
jungleboyjWell, I just looked at master and I don't see an update that obviously handles that.22:25
markguz_i'll drop her a line. I'm an ibmer too so no problems22:25
jungleboyjmarkguz_:  Ah, that is why you are using ppc64.22:25
*** obre_ has joined #openstack-cinder22:25
markguz_jungleboyj: lol because noone else does?22:25
jungleboyjWell, I remember from being there ... not as many as I hoped.22:26
*** _alastor_ has quit IRC22:26
markguz_jungleboyj: the openpower systems are nice.  not had a chance to play with Power9 yet, but the P8 systems are sweet22:26
jungleboyjWere you looking at  You looking at _get_host_devices ?22:26
markguz_def _get_host_devices22:26
jungleboyjYeah, it is still just looking at fc-%s-lun%s22:27
markguz_had to change the host_device filter and add the pci_num param to get it to work22:27
jungleboyjOk.  So, it hasn't been fixed.22:27
markguz_essentially making it the same as intel22:27
jungleboyjI am guessing it is RedHat specific.22:27
jungleboyjmarkguz_:  Where you based?22:28
markguz_Houston TX22:28
markguz_jungleboyj: Houston TX22:28
jungleboyjCool.  I was and still am in Rochester, MN.  Work for Lenovo now though.22:28
*** lpetrut has quit IRC22:32
jungleboyjWhen they moved me to SoftLayer I voluntarily went to Lenovo to help get their cloud initiatives going.22:33
jungleboyjWas originally in Research but I am now in DCG which was the xSeries business from IBM.22:33
jungleboyjWe are doing some great stuff.22:33
markguz_yeah the lenovo systemXs are nice as are the thinkSystems22:34
*** trident has joined #openstack-cinder22:34
jungleboyjYeah, I have an old 3650-m3 at home now.  Would love an sr65022:34
markguz_we just got a bunch in.22:35
markguz_i use ironic to provision the labs systems here22:35
jungleboyjSweet.  Don't tell the SoftLayer people.  ;-)22:35
markguz_hehe. well we also use bluemix for stuff. but for running the local lab infra openstack is perfect. lets me set stuff up self service for most folks.22:36
jungleboyjYep.  That is the way to go for the future.22:39
markguz_yep :-)22:40
woojayjungleboyj: Would you have a moment to help me understand some of the failed tests on  ?22:46
jungleboyjwoojay:  Looking.22:48
woojayThank you.  I goofed on pylint for sure.  Just trying to see how to fix tox-docs, and how grenade-py3 and nova-multiattach are failing.22:49
jungleboyjwoojay: So, this test case is failing:  tempest.api.image.v2.test_images.BasicOperationsImagesTest.test_update_image[id-f66891a7-a35c-41a8-b590-a065c2a1caa6,smoke]22:50
jungleboyjtempest.lib.exceptions.UnexpectedContentType: Unexpected content type provided22:51
jungleboyjnova-multiattach also failed but we have been seeing issues there.22:54
jungleboyjElastic recheck seems to think it knows why the grenade-py3 failure happened.22:55
woojayI figured on multiattach, since it passed on the first time.22:55
woojayI did see that notice about elastic recheck.  I do see it's not just me now
woojayI saw a potential fix for that grenade bug merged to devstack a few days ago so I wasn't sure if that would still affect mine.22:56
jungleboyjAh, and there is a problem with docs.22:56
jungleboyjYou did or did not know what that was about.22:56
woojayI only just read about the potential fix regarding grenade just moments ago.22:57
woojayAnd I definitely am not sure how to fix the failure on the docs.22:58
jungleboyjOk.  So you need a recheck on the grenade failure but lets figure out the docs failure.22:58
woojayyes, that would be great.22:58
*** smcginnis has joined #openstack-cinder23:07
*** ChanServ sets mode: +o smcginnis23:07
jungleboyjI have recreated the failure locally.  Will take me a little bit to figure out what is wrong.  It is a problem with the way your docstrings are.  Have seen it before but can't remember what it is.23:09
woojayI see.  I can also try to removing them and convert to regular comments, if that may help.23:10
jungleboyjEh, give me a little time to play.23:11
woojayThank you so much.23:11
jungleboyjOh man.23:17
jungleboyjFound the problem.23:18
jungleboyjI will update the review.23:18
woojayI must have been super silly.23:18
jungleboyjTypo in the support-matrix.ini file.23:19
woojayI see the comment.  Thank you so much for that.23:19
jungleboyjWelcome.  I knew I had seen it somewhere before.23:20
*** dklyle has joined #openstack-cinder23:35
