Thursday, 2015-04-16

patrickeastasselin: i wonder how hard it would be to set up something like logstash internally for a 3rd party ci system00:19
asselinpatrickeast, shouldn't be too hard:
asselinit's on my list of things to do eventually00:20
thingeemtreinish: there is definitely a delete happening before the detach.00:57
*** markvoelker has joined #openstack-cinder01:07
thingeemtreinish: 352125fa-5920-418d-aa7c-db64669c828c01:46
thingeenotice how that volume is detached, exported, deleted, detached01:46
*** annashen has joined #openstack-cinder03:17
*** thingee has joined #openstack-cinder03:36
*** vilobhmm1 has joined #openstack-cinder05:05
openstackgerritMike Perez proposed openstack/cinder-specs: Introduce use case section
*** rongze_ has joined #openstack-cinder05:21
*** rongze has quit IRC05:21
*** zerda has joined #openstack-cinder05:22
openstackgerritJeegn Chen proposed openstack/cinder: Remove redundant code from VNX Cinder Driver
*** rongze has joined #openstack-cinder05:28
*** rongze_ has quit IRC05:29
*** rongze has quit IRC05:29
openstackgerritJeegn Chen proposed openstack/cinder: Remove redundant code from VNX Cinder Driver
*** vilobhmm1 has quit IRC05:29
openstackgerritMike Perez proposed openstack/cinder-specs: Open Liberty for specs
*** thingee has quit IRC06:19
*** _cjones_ has joined #openstack-cinder06:21
openstackgerritVincent Hou proposed openstack/cinder: Select one instance to swap the volume for migration
openstackgerritDeepak C Shetty proposed openstack/cinder: WIP: Cinder part of volume snapshot improvements
*** rongze has quit IRC07:56
*** rongze has joined #openstack-cinder07:56
*** rongze has quit IRC08:01
*** Mandell has quit IRC08:01
*** esker has joined #openstack-cinder08:01
winston-djbernard, jdurgin:
openstackLaunchpad bug 1444855 in Cinder "RBD driver doesn't support customized ceph cluster name" [Undecided,New]
winston-djbernard, jdurgin: rbd driver should allow user to specify cluster name.
openstackgerritDeepak C Shetty proposed openstack/cinder-specs: Assisted snapshot improvements
openstackgerritDaisuke Fujita proposed openstack/cinder: Fix a wrong argument of delete_keys method
openstackgerritFabien Boucher proposed openstack/cinder: Complete the doc/README.rst instructions to build docs
*** jaypipes has quit IRC09:10
*** e0ne has quit IRC09:10
*** rongze_ has joined #openstack-cinder09:10
*** rongze has quit IRC09:11
*** rongze_ has quit IRC09:12
*** skraynev has joined #openstack-cinder09:12
openstackgerritRick Chen proposed openstack/cinder: Add patch for consistency group update in ProphetStor driver
*** annashen has quit IRC09:49
*** asselin has quit IRC10:19
*** deepakcs has quit IRC10:23
DuncanTdivya: Usually means your volume manager didn't start properly, grep ERROR in the cinder-volume log10:30
openstackgerritPetrut Lucian proposed openstack/cinder: SMBFS: Fix retrieving total allocated size
openstackgerritPetrut Lucian proposed openstack/cinder: Windows SMBFS: fix volume extend
openstackgerritPetrut Lucian proposed openstack/cinder: Windows SMBFS: Fix image resize errors during volume creation
openstackgerritPetrut Lucian proposed openstack/cinder: SMBFS: Add minimum qemu-img version requirement
*** sgotliv has joined #openstack-cinder10:42
*** rongze has joined #openstack-cinder10:42
openstackgerritPetrut Lucian proposed openstack/cinder: SMBFS: Lock on a per-volume basis
*** esker has quit IRC11:27
*** rongze has joined #openstack-cinder11:43
DuncanTdivya: If you've not enabled persistent logs in your local.conf, then you can just look at the c-vol screen tab11:44
openstackgerrityogeshprasad proposed openstack/cinder: Add chap support to CloudByte cinder driver
*** julim has quit IRC12:22
divyaDuncanT-Logs are there in
divyathe logs show HP3par array is uninitialised12:31
divyacan you please help12:31
*** mriedem has joined #openstack-cinder12:56
*** annashen has quit IRC12:57
*** avishay has joined #openstack-cinder12:58
dulekDuncanT: Hi! I couldn't make it to the meeting yesterday, but I see guys mentioned me a few times.13:12
DuncanTdivya: Sorry, got pulled out into an incident, back now, just looking at your log13:12
dulekDuncanT: I was looking into some issues and right now a fellow engineer is writing a PoC of moving locks in c-vol to tooz,13:13
DuncanTdivya: If you restart the volume manager (just ctrl+c it then repeat the last command in the shell history) you should be able to find the first error, which will tell you *why* the manager is uninitialised13:14
dulekDuncanT: This may solve issues with drivers locks. Only requirement - redis installed.13:14
dulekDuncanT: This is how Ceilometer is using tooz. :)13:14
DuncanTdulek: I've been doing some more analysis today. One issue that isn't solved by tooz or similar is how to get rid of a stuck lock when a node crashes/reboots/etc.13:15
dulekDuncanT: I don't think this issue is valid. Let me ask my colleague and get back to you.13:15
DuncanTdulek: We might be better using a state change in the API to prevent concurrent operations at source...13:16
DuncanTdulek: No locks needed then13:16
dulekDuncanT: I do think that this is a better approach, just trying to explore other possibilities.13:20
*** nlevinki has joined #openstack-cinder13:20
dulekDuncanT: So tooz won't deadlock on service failure because it also requires services to send heartbeats.13:21
dulekDuncanT: Lock is released if there are no heartbeats sent.13:21
dulekDuncanT: Also - lock is released if service creates a new coordinator.13:22
dulekDuncanT: I'll add that to Etherpad. And definitely I want to be a part of this effort in L to make c-vol A/A-ready. :)13:23
geguileoDuncanT: There is timeout on locks13:23
geguileoDuncanT: So if service dies lock is released13:23
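[Editor's note] The heartbeat and timeout behaviour dulek and geguileo describe can be sketched as a lease: ownership lapses unless the holder keeps renewing it. The class below is a toy model written for this note, not tooz's API; `LeaseLock`, `ttl`, and the explicit `now` argument are all assumptions chosen to keep the example deterministic.

```python
class LeaseLock:
    """Toy lock whose ownership expires unless the owner keeps heartbeating."""

    def __init__(self, ttl):
        self.ttl = ttl            # seconds a lease stays valid without a heartbeat
        self.owner = None
        self.expires_at = 0.0

    def _expired(self, now):
        return self.owner is not None and now >= self.expires_at

    def acquire(self, member, now):
        # An expired lease counts as free; this is what unblocks waiters
        # after the previous holder crashes and stops heartbeating.
        if self.owner is None or self._expired(now):
            self.owner = member
            self.expires_at = now + self.ttl
            return True
        return False

    def heartbeat(self, member, now):
        # Only the current, still-valid owner can extend the lease.
        if self.owner == member and not self._expired(now):
            self.expires_at = now + self.ttl
            return True
        return False
```

With `ttl=5`, a holder that heartbeats at t=4 keeps the lock until t=9; if it then crashes, a second service can acquire at t=10. The same model also shows the risk raised later in the discussion: a holder that is alive but misses its heartbeat loses the lock while it may still be working.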
geguileodulek: DuncanT Like I mentioned yesterday Tooz with Redis doesn't solve all issues for atomic state changes13:24
geguileodulek: DuncanT Although it reduces to almost none, there are still corner cases where errors may happen13:25
dulekgeguileo: Can you describe an example? It will be better to be on the same page. :)13:26
geguileodulek: Ok13:27
openstackgerritTina Tang proposed openstack/cinder: Attach/detach batch processing in VNX driver
dulekgeguileo: And if some drivers *require* locking we can create a way to do that - just before setting the driver - configure tooz. :)13:28
geguileodulek: On that part I agree  :)13:28
*** zhipeng has joined #openstack-cinder13:29
geguileodulek: Case is when server A gets lock, then B waits on lock, A changes from available to X in galera node 1 and releases the lock, B acquires lock and looks at galera node 2 where resource is still set as available because propagation hasn't happened13:29
*** markvoelker has joined #openstack-cinder13:30
geguileodulek: Unless you lock the resource for the whole duration of the operation and not just the state change13:30
geguileodulek: But then you have to be careful with the heartbeat13:30
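[Editor's note] geguileo's corner case can be replayed with a toy two-node store in which replication is an explicit step. Nothing here talks to Galera; the `Cluster` class, the volume name, and the "deleting" state are hypothetical stand-ins for "node 1", "node 2", and "X" in the description above.

```python
class Cluster:
    """Two database nodes with explicit, delayed replication."""

    def __init__(self):
        self.nodes = [{"vol-1": "available"}, {"vol-1": "available"}]
        self.pending = []  # writes not yet applied on the other node

    def write(self, node, key, value):
        self.nodes[node][key] = value
        self.pending.append((1 - node, key, value))

    def read(self, node, key):
        return self.nodes[node][key]

    def replicate(self):
        for node, key, value in self.pending:
            self.nodes[node][key] = value
        self.pending.clear()


def stale_read_scenario():
    cluster = Cluster()
    # Server A holds the lock, changes the state on node 0, releases the lock.
    cluster.write(0, "vol-1", "deleting")
    # Server B now holds the lock but reads node 1 before propagation.
    seen_by_b = cluster.read(1, "vol-1")
    cluster.replicate()
    after_replication = cluster.read(1, "vol-1")
    return seen_by_b, after_replication
```

In this model B sees the old state even though the lock was handed over correctly, which is why a lock that only covers the state change is not enough on its own.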
dulekgeguileo: Oh, right now Cinder is locking for the whole operation I think.13:31
dulekgeguileo: And I was thinking of that approach.13:31
geguileodulek: But if I'm not mistaken the state change happens elsewhere13:32
dulekgeguileo: I was never thinking of using tooz to simulate check-and-set (or select for update). That would be silly. ;)13:32
geguileodulek: Because locks are being used for example so that snapshots don't get deleted when creating volume from snapshot13:32
geguileodulek: Ok, so you were looking at tooz for resource locking and not for state changing?13:33
dulekgeguileo: Exactly.13:34
dulekgeguileo: I was thinking of these.13:36
geguileodulek: Ok, but you know that state changes happen on API13:37
geguileodulek: and then calls rpcapi13:38
dulekgeguileo: Yes, the more you explain the more I think check-and-set is definitely required.13:38
dulekgeguileo: In the first place. :)13:38
geguileodulek: Exactly, that's the first issue to solve: atomic state changes13:38
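[Editor's note] A minimal sketch of the check-and-set idea, assuming a single SQL table with a `status` column. It uses an in-memory SQLite database purely for illustration; the table layout and the `conditional_update` helper are hypothetical and are not Cinder's schema or code.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE volumes (id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO volumes VALUES ('vol-1', 'available')")
    conn.commit()
    return conn


def conditional_update(conn, volume_id, expected, new):
    """Move volume_id from `expected` to `new`; True only if the row matched."""
    cur = conn.execute(
        # The expected status is part of the WHERE clause, so the test and
        # the write happen in one statement rather than as read-then-write.
        "UPDATE volumes SET status = ? WHERE id = ? AND status = ?",
        (new, volume_id, expected),
    )
    conn.commit()
    return cur.rowcount == 1
```

Two callers racing from 'available' to 'attaching' cannot both win here: the second UPDATE matches zero rows and the helper returns False.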
*** esker has joined #openstack-cinder13:39
geguileodulek: Then for operations that need to lock resources we could use Tooz13:39
dulekgeguileo: Okay, I'm glad we agreed. :)13:39
geguileodulek: Me too  :)13:39
geguileodulek: I've been playing a little bit with Tooz and it's very convenient  :)13:40
*** rushil has joined #openstack-cinder13:42
*** divya has quit IRC13:42
*** xyang1 has joined #openstack-cinder14:02
DuncanTdulek: geguileo: Sorry I keep dropping out of the conversation. The approach I'm trying to take for this now is to list the problems before looking at a solution. An example problem is that some drivers (e.g. the fs based ones) need mutual exclusion between delete, snap, create from snap etc. This is currently provided by the lock decorators in the drivers14:04
*** kaisers1 has quit IRC14:04
DuncanTdulek: geguileo: the nova->cinder API has races, including trying to attach the volume to two VMs at once. The current protection against that is the reserve call which sets the attaching state, and a call to check the state - this is still racy14:05
geguileoDuncanT: Yes, I know we have multiple issues: state changes, locking using files, drivers that apparently are not using synchronization...14:05
DuncanTdulek: geguileo: There are some locks in cinder-volume manager. I'm not sure what they are there to protect against, I'd like to know14:06
dulekDuncanT: Is there any way to achieve driver needs without distributed locking?14:06
DuncanTOnce we have a list of all of the issues we are looking to solve, we can look at the various solutions14:06
geguileoDuncanT: The problem about the state races is the one I'm looking to fix now, that's what the tests I'm doing are for14:06
DuncanTdulek: FS drivers need mutual exclusion of certain operations14:06
DuncanTgeguileo: So some of the state races go away entirely if you fix the nova api14:07
DuncanTgeguileo: That is an example of why I intend to nak any state patches for a few days until we clearly understand the problems14:07
geguileogeguileo: But wouldn't it make more sense to remove race conditions inhouse?14:07
DuncanTgeguileo: With the current API, you *can't* fix the race in-house14:07
geguileoDuncanT: That was for you, I don't know why I said it to myself  ;)14:07
geguileoDuncanT: Really?14:08
dulekgeguileo: It was in the Etherpad actually14:14
DuncanTdulek: Nova checking the state is a problem, yeah14:14
dulekgeguileo: I guess this introduces races because state can change. But to be honest this one can probably be taken care of in cinder14:14
DuncanTdulek: Nova expects once it has checked the state that that state can't change under it14:15
*** haomaiwang has quit IRC14:15
DuncanTdulek: e.g. if nova does check then reserve, and the check says the reserve should work, then nova has no code path to handle the reserve failing14:15
DuncanTdulek: The checks should be gotten rid of, and make reserve fail if the state isn't right, and make nova deal with that14:16
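[Editor's note] The check-then-reserve problem, and the alternative of letting reserve itself fail, can be shown with a small in-memory model. `Volume`, `VolumeBusy`, and `attach` are hypothetical names for this sketch; they are not the Nova or Cinder APIs.

```python
import threading


class VolumeBusy(Exception):
    """Hypothetical error raised when reserve finds the wrong state."""


class Volume:
    def __init__(self):
        self.status = "available"
        self._lock = threading.Lock()

    def check_attach(self):
        # Advisory only: the answer can be out of date by the time it is used.
        return self.status == "available"

    def reserve(self):
        # Test and transition happen together, so a caller either gets the
        # reservation or gets an exception it has to handle.
        with self._lock:
            if self.status != "available":
                raise VolumeBusy(self.status)
            self.status = "attaching"


def attach(volume):
    """Caller that skips the separate check and handles a failed reserve."""
    try:
        volume.reserve()
    except VolumeBusy:
        return False
    return True


def race_two_callers():
    volume = Volume()
    # Both callers check first; both are told the reserve should work.
    checks = [volume.check_attach(), volume.check_attach()]
    # Only one reserve can actually succeed.
    results = [attach(volume), attach(volume)]
    return checks, results
```

Both checks pass, yet only the first reserve succeeds, so a caller with no code path for a failing reserve is exactly the caller that breaks.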
*** markvoelker_ has joined #openstack-cinder14:16
geguileoDuncanT: dulek But that's Nova's problem14:16
DuncanTdulek: I'm not sure we even need a reserve, just have initialise connection do the work of reserve14:16
DuncanTdulek: No problem. I just wanted to be clear why I was resisting merging any of the state change fixes now - it is unclear exactly which of the many 'state problems' and 'state races' we have the patches are trying to solve14:20
dulekDuncanT: To be honest I'm aware of two approaches - micro-states and work with removing the file locks and get status changes atomic.14:21
dulekDuncanT: Are there more?14:21
*** avishay has quit IRC14:21
*** emagana has joined #openstack-cinder14:21
DuncanTdulek: Micro-states is one. Great big states is two. DLM is a third. Changing the cinder model to be more tolerant of failure and eventual consistency is another14:22
DuncanTdulek: I'm still at the 'what are the bugs, in detail' stage14:23
dulekDuncanT: That's fine. What's DLM?14:23
*** markvoelker has joined #openstack-cinder14:23
DuncanTDistributed lock manager, e.g. tooz14:23
*** sgotliv_ has joined #openstack-cinder14:25
sdagueis there anyone that might be up for helping look at the following bug from a cinder perspective -
openstackLaunchpad bug 1445021 in OpenStack Compute (nova) "nova-compute does not start after upgrade from juno->kilo if there are boot from volume servers running" [Critical,New]
*** e0ne has quit IRC14:46
openstackgerritHuang Zhiteng proposed openstack/cinder: Add support for customized cluster name
*** annegentle has quit IRC15:04
sdaguemtanino: I can trigger a sync, you think we're just in the fscache?15:18
*** esker has joined #openstack-cinder15:39
openstackgerritRajini Ram proposed openstack/cinder: Fixes the retries on Network Connection Error
*** thingee has joined #openstack-cinder16:02
thingeemtreinish: there is definitely a delete happening before the detach.16:02
thingeemtreinish: 352125fa-5920-418d-aa7c-db64669c828c16:02
thingeenotice how that volume is detached, exported, deleted, detached16:02
bswartzthingee: ping16:03
thingeebswartz: hi16:03
*** rongze has joined #openstack-cinder16:04
*** emagana has quit IRC16:05
*** jungleboyj has quit IRC16:06
*** rongze has quit IRC16:06
*** annashen has quit IRC16:06
*** rongze has joined #openstack-cinder16:07
mtreinishthingee: does it do a detach right before the delete?16:07
mtreinishalthough it does look like we do a second detach after the delete (which fails)16:07
thingeedetach, export, deleted, detach16:08
*** markvoelker has joined #openstack-cinder16:08
thingeeI'm only running the volume boot pattern test16:08
*** jistr has quit IRC16:08
*** ociuhandu has quit IRC16:08
*** rongze has quit IRC16:09
*** adurbin_ has joined #openstack-cinder16:10
*** afazekas has quit IRC16:11
*** emagana has joined #openstack-cinder16:13
*** nshaikh has quit IRC16:13
*** Liu has quit IRC16:13
xyang1jgriffith: Oh, that one. Sort of16:14
jgriffithxyang1: LOL :)16:14
xyang1jgriffith: Any concerns?16:15
jgriffithxyang1: there's an awful lot going on there16:15
jgriffithxyang1: I'm trying to *find* the responses to the queued calls16:15
*** rushiagr_away is now known as rushiagr16:15
jgriffithxyang1: also wondering if she meant to remove the rev history on purpose?16:15
xyang1jgriffith: Yes, the purpose is to improve performance16:15
*** annegentle has quit IRC16:15
jgriffithxyang1: right, and it should "fix" the Ericcson case no?16:15
xyang1jgriffith: yes:)16:16
mtreinishthingee: although tbh stack tracing on a detach because the volume can't be found is weird16:16
jgriffithxyang1: so it looks like it just batches and processes the attach/detach commands in their own queue16:16
xyang1jgriffith: They do lots of stress test16:16
jgriffithxyang1: so we still won't get false responses over the RPC layer16:16
jgriffithxyang1: in other words we don't change status until it's processed in the batch/queue16:16
xyang1jgriffith: Right, this is to avoid time out16:17
jgriffithxyang1: That was my only concern, but I *think* I see how the code works16:17
jgriffithxyang1: wanted to check with you :)16:17
jgriffithxyang1: and the version history stuff?16:17
xyang1jgriffith: Which version history?16:17
jgriffithxyang1: well...  that's the thing I am kinda confused about16:17
xyang1Not following16:17
jgriffithxyang1: oh... version history:
xyang1jgriffith: It is a very big patch16:23
jgriffithxyang1: interesting... yeah16:29
jgriffithxyang1: that is useful I think, but I hadn't thought of it16:29
jgriffithxyang1: but wouldn't that just be up to the driver?16:29
thingeemtreinish: yeah I'm thinking it's coming from nova16:29
xyang1jgriffith: It seems that we need to add a few more new CG APIs to support both16:29
jgriffithxyang1: so you say "replicate volume 'foo'"16:30
thingeemtreinish: I've glanced through logs and didn't see anything new that would be obvious16:30
*** sgotliv_ has quit IRC16:30
jgriffithxyang1: driver looks and says "oh... this is in a CG, so replicate all of them"16:30
thingeemtreinish: I'll file a bug and keep looking.16:30
*** mriedem has quit IRC16:30
xyang1jgriffith: Then you need to put them in group earlier16:30
thingeeagainst nova and tempest until I can verify what's doing it16:31
xyang1jgriffith: Otherwise how do you tell16:31
xyang1jgriffith: I mean when we create a CG, we choose 1 backend that supports all volume types, so this should apply to cg for replication?16:32
*** annegentle has joined #openstack-cinder16:32
*** mriedem has joined #openstack-cinder16:32
*** markus_z has quit IRC16:38
thingeemtreinish: instead of stacktrace, are you saying it should just move along?16:44
*** patrickeast has joined #openstack-cinder16:44
mtreinishthingee: yeah, log that it wasn't found, but that's not something unexpected16:45
mtreinishit just should be caught because a not found is expected16:46
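[Editor's note] What mtreinish suggests, logging an expected not-found instead of emitting a stack trace, might look like the sketch below. `VolumeNotFound`, `get_volume`, and `detach_volume` are hypothetical names operating on a plain dict; they only illustrate the shape of the handling.

```python
import logging

LOG = logging.getLogger(__name__)


class VolumeNotFound(Exception):
    """Hypothetical stand-in for the service's not-found error."""


def get_volume(volumes, volume_id):
    # The lookup still raises, so other callers can map it to a status code.
    try:
        return volumes[volume_id]
    except KeyError:
        raise VolumeNotFound(volume_id) from None


def detach_volume(volumes, volume_id):
    """Detach volume_id; a volume that is already gone is logged, not raised."""
    try:
        volume = get_volume(volumes, volume_id)
    except VolumeNotFound:
        LOG.info("Volume %s not found during detach; nothing to do", volume_id)
        return False
    volume["attach_status"] = "detached"
    return True
```

Because only the detach path swallows the exception, the concern about API code that relies on the raise for its status code would not apply to this sketch; whether that holds in the real code is a separate question.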
thingeemtreinish: wonder if that would have any bad side effects from the api that uses that raise for status code return.16:46
thingeemtreinish: but I agree16:46
mtreinishmarcusvrn1: so the tempest config files are different16:53
*** IanGovett1 has quit IRC17:00
*** Yogi1 has quit IRC17:02
jgriffithxyang1: sorry... got pulled away17:05
jgriffithxyang1: so yes, that would mean you have to know it's a CG prior to replication17:05
jgriffithxyang1: honestly it's been my opinion that requiring CG and Replication knowledge at create time is "ok"17:05
jgriffithxyang1: but I know that's not going to fly17:06
*** emagana has joined #openstack-cinder17:06
*** dannywilson has joined #openstack-cinder17:06
jgriffithxyang1: but keep in mind there's retype17:06
jgriffithxyang1: which solves at least "some" of that17:06
xyang1jgriffith: You are saying only let it happen at volume create time? Not create a CG ahead of time?17:07
*** dannywilson has quit IRC17:08
jgriffithxyang1: I'm saying I'm not saying anything yet :)17:08
xyang1jgriffith: Ok:)17:08
jgriffithxyang1: LOL17:08
asselinpatrickeast, marcusvrn we disabled those tests...haven't gotten them to work yet17:08
jgriffithxyang1: IMO I think types work well for these sorts of things17:09
jgriffithxyang1: I'm not necessarily convinced that new API's and moving away from those was a good idea17:09
*** dannywilson has joined #openstack-cinder17:09
*** annegentle has quit IRC17:09
jgriffithxyang1: at the same time though, thingee has run on the platform of putting more backend management in Cinder17:09
*** dannywilson has quit IRC17:09
jgriffithxyang1: so he obviously has very different vision than I do17:09
*** dannywilson has joined #openstack-cinder17:10
jgriffithxyang1: and I think you, hemna and others did as well17:10
*** rongze has joined #openstack-cinder17:10
jgriffith^^ have a different view than me17:10
*** dannywilson has quit IRC17:10
xyang1jgriffith: :)17:10
*** gcivitella has quit IRC17:10
*** dannywilson has joined #openstack-cinder17:10
*** dannywilson has quit IRC17:10
jgriffithxyang1: honestly, based on all the notes in the review I have up I'm not sure there's any way to make everybody happy here17:10
jgriffithxyang1: we've become a pretty divisive group17:11
xyang1jgriffith: The spec looks good, I just want to make sure we think about cg as well17:11
jgriffithxyang1: Yeah, that's an important piece and you're right I didn't talk about it at all17:12
jgriffithxyang1: I'm just thinking.....17:12
*** markvoelker has joined #openstack-cinder17:12
jgriffithxyang1: Trying to think if doing something like retypes to get what you want would work17:12
jgriffithxyang1: Because of the whole knowing if it's a CG at creation or not etc17:13
jgriffithxyang1: I kinda like pushing the "how" back to the driver17:13
xyang1jgriffith: You mean retype to make all volumes in one type?17:13
*** tshefi has quit IRC17:13
xyang1jgriffith: You are saying hiding that from cinder?17:14
jgriffithxyang1: so you have Type=CG with N-volumes17:14
jgriffithxyang1: and you want to replicate the CG...17:14
jgriffithxyang1: then you'd need to retype to "Type=CG-Replicated"17:14
*** annegentle has joined #openstack-cinder17:14
jgriffithxyang1: I'd like to hide more of it from Cinder than we do currently yes17:15
jgriffithxyang1: because the deltas between the platforms makes things a bit difficult IMO17:15
jgriffithxyang1: well... not "difficult" but kind of impossible17:15
jgriffithxyang1: and very difficult to maintain/read17:15
xyang1jgriffith: Then it won't be integrated with the existing CG implementation17:15
jgriffithxyang1: That's the problem17:16
jgriffithxyang1: I'd like to figure out how to keep the CG integration that's there and make this idea work with it17:16
jgriffithI just haven't come up with that yet :)17:16
xyang1jgriffith: If we take cg id at create volume time, that should solve the problem17:17
jgriffithxyang1: Oh... duhhh, yeah17:17
*** ronis_ has joined #openstack-cinder17:32
*** bkopilov has joined #openstack-cinder17:32
*** ronis has quit IRC17:35
eharneythingee: looking17:35
*** bkopilov has quit IRC17:37
*** bkopilov has joined #openstack-cinder17:40
*** markvoelker has quit IRC17:43
openstackgerritWalter A. Boring IV (hemna) proposed openstack/os-brick: Mask passwords with iscsiadm commands
*** EmilienM|afk is now known as EmilienM17:43
openstackgerritWalter A. Boring IV (hemna) proposed openstack/cinder: Mask passwords with iscsiadm commands
*** ociuhandu has quit IRC17:45
*** bkopilov has quit IRC17:45
openstackgerritWalter A. Boring IV (hemna) proposed openstack/os-brick: Brick: Fix race in removing iSCSI device
*** rongze has joined #openstack-cinder17:52
*** rongze has joined #openstack-cinder18:00
*** Yogi1 has joined #openstack-cinder18:00
*** rongze_ has quit IRC18:00
*** Apoorva_ has joined #openstack-cinder18:06
*** Apoorva_ has quit IRC18:08
*** Apoorva has quit IRC18:09
*** Apoorva_ has joined #openstack-cinder18:09
*** rongze has quit IRC18:09
*** dulek has joined #openstack-cinder18:12
openstackgerritDoug Hellmann proposed openstack/python-cinderclient: Uncap library requirements for liberty
*** lpabon has joined #openstack-cinder18:16
*** lpabon has quit IRC18:16
*** crose has quit IRC18:21
thingeemtreinish, jgriffith: continuing from yesterday ... so terminate connection is deleting the volume and then later the volumebootpattern is detaching the volume that was deleted.18:21
*** Maike has joined #openstack-cinder18:22
hemnanew star wars trailer18:22
hemnaso epic!!18:22
mtreinishthingee: I couldn't find anywhere in the test was calling attach or detach (there was a method to do detach, but it didn't look like it was ever called)18:22
mtreinishthingee: do you have the api logs and tempest logs showing tempest making the detach?18:25
thingeemtreinish: pretty sure it's on the nova side at this point18:27
*** vilobhmm1 has joined #openstack-cinder18:27
*** vilobhmm11 has joined #openstack-cinder18:29
*** bswartz has joined #openstack-cinder18:30
vilobhmm11thingee, DuncanT : added previously seen issues and proposed patches that i know of to the etherpad, please have a look
*** vilobhmm1 has quit IRC18:32
openstackgerritAngela Smith proposed openstack/cinder-specs: Spec for Brocade Zone Driver Virtual Fabrics support
openstackgerritAngela Smith proposed openstack/cinder-specs: Spec for Brocade Zone Driver Friendly Zone names
mtreinishjgriffith: so I'm going to bug you again how goes the cli test migration?18:51
jgriffithmtreinish: :)18:58
jgriffithmtreinish: thanks for the reminder18:58
jgriffithmtreinish: I need to get that back up on my list18:58
jgriffithmtreinish: I honestly haven't done anything more than submit the base change to cinderclient18:59
mtreinishwell at least you're doing it. I can't get anyone from glance, heat, or swift to even start the process18:59
jgriffithmtreinish: and that has been sitting in review for *ever*18:59
jgriffithmtreinish: well... half of it18:59
jgriffithmtreinish: :(19:00
jgriffithmtreinish: so I'll carve out some time here in the next week or so and take another hard run at it19:00
jgriffithI'd like to have some stuff in place for Vancouver19:00
jgriffithmtreinish: so we can chat in person :)19:00
mtreinishthingee: ^^^ can you put your elected powers to work and grease the wheels?19:00
mtreinishjgriffith: cool thanks19:00
mtreinishthingee: lol, the review wheels, not the jgriffith wheel :)19:01
mtreinishjgriffith: I probably wouldn't snub you in Vancouver if we didn't get this in place :)19:01
openstackgerritAngela Smith proposed openstack/cinder-specs: Spec for Brocade Zone Driver Friendly Zone names
*** cbader has joined #openstack-cinder19:04
*** rongze has joined #openstack-cinder19:06
*** rushiagr is now known as rushiagr_away19:08
*** rongze has quit IRC19:10
*** markvoelker has quit IRC19:14
*** Mandell has quit IRC19:19
*** Yogi11 has joined #openstack-cinder19:23
*** jungleboyj has quit IRC19:24
*** Mandell has joined #openstack-cinder19:25
thingeemtreinish: ok not delete on termination. Nova never issues the delete volume it seems. I definitely see tempest log with a delete request to cinder19:25
*** Yogi1 has quit IRC19:26
mtreinishthingee: yeah that's consistent with my reading of the tempest code. When tempest calls its create_volume() it registers a cleanup for the volume delete19:27
mtreinishfwiw, it also does the same with server create, so it'll call server delete and wait for that before calling volume delete (because cleanup is a lifo)19:28
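[Editor's note] The last-in, first-out cleanup order mtreinish describes can be reproduced with the standard library's `unittest`, whose `addCleanup` has the same ordering. The test class and the `calls` list are invented for this sketch and do not reproduce tempest's scenario code.

```python
import unittest

calls = []


class BootFromVolumeSketch(unittest.TestCase):
    def create_volume(self):
        calls.append("create volume")
        # Registered first, so it runs last.
        self.addCleanup(calls.append, "delete volume")

    def create_server(self):
        calls.append("create server")
        # Registered last, so it runs first.
        self.addCleanup(calls.append, "delete server")

    def test_pattern(self):
        self.create_volume()
        self.create_server()


def run_sketch():
    calls.clear()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BootFromVolumeSketch)
    suite.run(unittest.TestResult())
    return list(calls)
```

Running it records the server delete before the volume delete, matching the order described above: the server goes away first, then the volume.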
mtreinishwhich is why tempest doesn't ever issue a detach: the servers are all boot from volume for that test19:29
thingeepretty sure there is something out of order with tempest doing the delete and then a detach call happening :(19:30
thingeebut I'm not sure why it's not happening in gate, but just to some drivers.19:30
*** annashen has quit IRC19:32
tbarronIs there a way to tell taskflow to only attempt a volume create one time when a single backend is configured?19:32
tbarronI'm not seeing a cinder.conf setting for this right off.19:33
mtreinishthingee: well how does nova handle a delete with a boot from volume server?19:33
thingeeonly if the block device mapping has a delete on terminate. I have a breakpoint on the nova.volume.cinder.delete and it never gets hit. also tempest is explicitly setting delete on terminate to false19:34
thingeeI specifically see from tempest_lib a delete call to cinder19:34
mtreinishthingee: also is this on the first volume or the volume from the snapshot?19:35
tbarronI ask in part because we have automated tests that are apparently triggering a race condition between taskflow volume create and volume delete, where the latter is triggered by polling for the volume to go to error or available.19:35
*** dustins has quit IRC19:35
mtreinishthingee: ok, does nova not issue a detach before deleting the server from volume. (sorry I'm ignorant of that code path)19:35
tbarrontaskflow appears to set volume state from creating to error to creating to error etc. on negative tests (where the create is supposed to fail).19:36
mtreinishthingee: i get that tempest is calling the volume delete after the server is deleted, I'm just trying to figure out where that detach is coming from19:36
thingeemtreinish: correct detach is after.19:36
tbarronour test cleanup code can see the error state and attempt to delete the volume while taskflow is still doing retries.19:36
*** annegentle has quit IRC19:36
mtreinishthingee: so nova deletes the vm and then detaches the volume? That seems backwards to me19:37
thingeewhoops sorry before19:38
tbarronat 2015-04-16 12:54:33.550 we see taskflow setting state to error19:38
*** dustins has joined #openstack-cinder19:39
thingeemtreinish: and that comes from
tbarronAt 2015-04-16 12:54:33.638 we see the manager trying to do a delete, before taskflow has kicked off the next create attempt19:39
mtreinishthingee: ah ok that makes sense. So the volume isn't attached when tempest goes to delete the volume. So I wonder where that later detach call is coming from?19:40
mtreinishthingee: do you have a req-id from the second detach (after the volume is created) that you can search for in the tempest logs? Or the timestamp to check nova's logs19:41
mtreinish(since nova doesn't log the req-id from cinder requests)19:41
tbarronThe delete is running inside an oslo concurrency lock, but that seems to not exclude taskflow create from doing DB updates.19:42
thingeemtreinish: hang on...19:42
tbarronharlowja: ^^^^^^ taskflow cinder create questions above19:44
thingeemtreinish: pretty sure they're all coming from nova19:44
thingeedetaches that is19:45
*** timcl has quit IRC19:45
tbarronharlowja: can you read the last 5 minutes of buffer here, questions about an apparent race between taskflow volume create and volume delete on the same volume?19:45
mtreinishok, that's what I was expecting. I guess now that just means we need to trace through nova to figure out why it's sending a second detach after the server and volume have already been deleted19:46
mtreinishand why we never hit that in the gate (or with other backends)19:46
tbarronharlowja: here's a better pastebin, the first one didn't show manager delete_volume gaining the oslo concurrency lock:19:47
thingeemtreinish: maybe because the instance we're deleting has a bdm to that volume still?19:47
thingeethat was deleted19:47
*** julim has quit IRC19:48
thingeemtreinish: just tracing through
harlowjatbarron ok, so request comes in to delete, seems like it gets the lock 'lvo_inner2', create is still running, and reverting right, i don't see delete ever entering the RUNNING state though, guess that's cause delete isn't using taskflow19:52
tbarronharlowja: right, delete isn't using taskflow.19:53
tbarronthis delete is crapping out right away b/c it finds the volume in an unexpected state (no host string, but19:53
*** bkopilov has joined #openstack-cinder19:53
tbarronit's trying to parse it).19:53
mtreinishthingee: yeah that should return a 404 right? because the volume was deleted19:54
tbarronI left that part out of the log and description because the more fundamental issue to me is that we allow the delete in while taskflow is still retrying19:54
harlowjatbarron what lock is the create getting, seems like it could get a few @
mtreinishthingee: err or just a NotFound19:54
mtreinishthingee: I think I'm missing something19:55
harlowjatbarron and locked_volume_operation seems to get some other kind of locks19:55
harlowjasooo how's that work :-/19:55
tbarronnot sure19:55
harlowjaseems like they use different locks, so maybe thats the problem19:56
tbarronharlowja: does the create in taskflow hold the same lock through all three attempts to do the volume create?19:56
harlowjatbarron the lock isn't in the taskflow code, so yes19:57
*** dustins has quit IRC19:57
tbarroneven though it is setting the DB state back to error at the end of the create?19:57
harlowjautils.synchronized(locked_action, external=True) is doing all that19:57
harlowjathe lock is around the running of the taskflow 'flow'19:57
harlowjaoutside of taskflow19:57
tbarronharlowja: I'm a complete newbie relative to taskflow.19:58
harlowjahere to help!19:58
harlowjaso that code is locking, then letting taskflow run the workflow, and then release, so some lock is held during that entire process19:59
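[Editor's note] harlowja's two observations, that one lock is held around the whole flow and that create and delete may be using different locks, can be combined in a small in-process sketch. It uses `threading.Lock` objects keyed by name; the real code uses external locks, and every name below is hypothetical.

```python
import threading
from collections import defaultdict

_locks = defaultdict(threading.Lock)


def try_lock(name):
    """Non-blocking acquire of the process-wide lock called `name`."""
    return _locks[name].acquire(blocking=False)


def unlock(name):
    _locks[name].release()


def create_with_retries(lock_name, attempts, concurrent_lock_name):
    """Hold lock_name across every attempt and record, per attempt, whether
    an operation guarded by concurrent_lock_name could have started."""
    if not try_lock(lock_name):
        raise RuntimeError("lock %s is already held" % lock_name)
    intrusions = []
    try:
        for _ in range(attempts):
            # The volume status would flip creating -> error -> creating here.
            got_in = try_lock(concurrent_lock_name)
            if got_in:
                unlock(concurrent_lock_name)
            intrusions.append(got_in)
    finally:
        unlock(lock_name)
    return intrusions
```

When the concurrent operation uses the same lock name it is kept out for all attempts; when it uses a different name it gets in every time, which would be consistent with the delete running between create retries.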
tbarronharlowja: but you are saying if I understand you that we should be setting up mutual exclusion between create and delete using locks instead of, e.g. checks on DB state, through the entire 3-attempt create lifecycle.19:59
harlowjathat seems to be intention, not sure the reality :)20:00
tbarronWe appear instead - though I can't say I know every corner of the code - to rely on the volume state being available or error to see if it's safe to do a delete.20:00
tbarronAnd you set it back to error.20:01
*** annegentle has joined #openstack-cinder20:01
harlowjawell it makes me wonder why delete and create are running at the same time (like that pastebin shows)20:02
harlowjathey shouldn't be able to20:02
tbarron # NOTE(harlowja): this is awkward to be done here, shouldn't20:02
tbarron            # this happen at the scheduler itself20:02
tbarronharlowja: you do note: ^^^20:02
harlowjaya, thats the general rescheduling loop junk20:02
* harlowja can't fix all the things, so i just leave comments about weirdness i saw when refactoring that, lol20:03
tbarronharlowja: is there a way to configure taskflow to only do one attempt?  That will give us a workaround for our automated testing while the more general locking/exclusion issue is sorted out.20:03
harlowjaso taskflow isn't really involved in that either20:03
harlowjaits the cinder rescheduling loop20:04
harlowjathere is a way afaik20:04
harlowjatbarron ^ is really what is doing this, taskflow isn't really involved (another weirdness that i can't change, haha)20:05
tbarronharlowja: I see!  thanks.20:05
harlowjasomeone needs to write a blog post about that stuff, haha20:05
tbarronharlowja: thanks for your help.  I'm going to set CONF.max_scheduler_attempts = 1 for now.20:09
tbarronThat said, I think I'm not the only one to have confusion about the mix of taskflow (micro) state transitions and cinder20:10
tbarronobject state transitions, esp. exclusion relations when doing updates on the latter.20:10
*** annegentle has quit IRC20:10
harlowjai know20:11
harlowjatis life :-P20:11
*** annegentle has joined #openstack-cinder20:11
harlowjathe magic loop though pre-exists me, but i understand the misunderstanding20:11
harlowjaits common20:11
*** rongze has quit IRC20:11
*** markvoelker has joined #openstack-cinder20:12
openstackgerritxing-yang proposed openstack/cinder: Fixed issue with mismatched config in VMAX driver
*** annegentle has quit IRC20:46
thingeemtreinish: here's the trace of the last detach
thingeehemna, mtreinish: so I think nova isn't expecting InvalidVolume to be raised from cinder in detach.21:01
*** dannywil_ has joined #openstack-cinder21:01
thingeehemna, mtreinish: this changed with the multi-attach stuff21:01
mtreinishthingee: oh, i could see that21:02
hemnathat's a trace ?21:06
mtreinishthingee: what does InvalidVolume get converted into in cinderclient?21:06
* hemna is confused21:06
hemnais there a defect somewhere?21:07
thingeehemna: yes, multi-attach change is raising something nova isn't expecting21:07
thingeesee my last few messages for the trace and code from nova/cinder sides21:07
mtreinishthingee: yeah that's definitely a behavior change then21:07
*** rongze has joined #openstack-cinder21:08
mtreinishit would be pretty easy to either change cinder back, or have nova understand a 400 in that except block21:08
*** bswartz has quit IRC21:08
hemnalaunchpad url ?21:09
thingeehemna: none yet, just discovered21:09
hemnaok well why are we getting an invalid volume?21:10
hemnais the uuid bogus being passed to cinder ?21:10
thingeehemna: dunno, I've been working most yesterday and today with mtreinish tracing down this issue.21:11
hemnalogfile with the exception ?21:11
*** harlowja is now known as harlowja_away21:11
*** emagana has quit IRC21:12
mtreinishhemna: it's not bogus, it's just already been deleted21:12
hemnawhy ?21:12
*** rongze has quit IRC21:13
hemnawhy is nova calling cinder to detach a deleted volume ?21:13
*** EmilienM is now known as EmilienM|afk21:13
hemnaand why is cinder allowing a volume to be deleted if it's attached ?21:13
thingeehemna: I think it's because nova has it in the bdm, but tempest is deleting it.21:13
thingeehemna: the point is that nova just isn't expecting invalidvolume to be raised, so it stacktraces.21:14
hemnanot sure that makes sense to me21:14
hemnacinder shouldn't allow a volume to be deleted by anyone if it's attached21:14
thingeehemna: still trying to understand myself21:14
mtreinishhemna: tempest deletes the server waits for it to disappear, then issues the volume delete and waits for it to delete21:14
mtreinishsince it's a boot from volume server, the detach is happening in nova21:15
mtreinishthingee: does cinderclient's detach call wait for the detach to finish? (it's async iirc)21:15
hemnaok, but that still doesn't explain the out of sequence events21:15
hemnare: a volume being deleted before detached21:16
*** Liu has quit IRC21:16
thingeehemna: again, tempest does the delete, nova does the detach.21:16
thingeecompletely separate21:16
jgriffithhemna: I'm completely confused because one of the pastebins I looked at earlier had the detach21:16
jgriffithso who knows21:16
*** annegentle has joined #openstack-cinder21:16
mtreinishI'll bbiab, got to head home before the qa meeting21:16
jgriffiththingee: is there any way to just get a complete set of logs posted?21:17
* thingee still waiting for test21:17
hemnahehe ok.  I'm not totally crazy then21:17
thingeejgriffith: yeah just as I mentioned I'm waiting on the test21:18
jgriffiththingee: cool21:18
hemnamaybe the status == detaching ?21:19
*** Liu has joined #openstack-cinder21:21
*** dims__ has quit IRC21:23
*** cbader has quit IRC21:25
*** Adriano_ has quit IRC21:26
thingeejgriffith, hemna, mtreinish: ok I think that was a result of my breakpoints. false alarm.. However, I think it may be possible for that to happen if things were running slow?21:27
thingeeas hemna mentioned with the possibility of things in a detaching state.21:28
openstackgerritWalter A. Boring IV (hemna) proposed openstack/os-brick: Brick: Fix race in removing iSCSI device
*** bswartz has joined #openstack-cinder21:29
openstackgerritAngela Smith proposed openstack/cinder-specs: Spec for Brocade Zone Driver Virtual Fabrics support
hemnaI'm just guessing at this point21:30
*** eharney has quit IRC21:30
hemnaI'd hope that the API would prevent the call from getting to the manager if the volume is in an 'ing' state21:30
hemnabut I don't think we are that safe21:30
*** annegentle has joined #openstack-cinder21:48
openstackgerritAnish Bhatt proposed openstack/cinder-specs: Add open-iscsi transport support to brick
anishthat's what I did right now21:52
hemnajogo had enabled a tempest test, and the test was calling delete on the vm (boot from volume), and then the cleanup of the test called detach.22:13
hemnaso the test was calling detach on a volume that had already been detached22:14
hemnaso the exception was correct.22:14
*** rongze has quit IRC22:14
hemnathis may be the case as well22:14
jgriffithhemna: thingee 2015-04-15 20:01:50.739 1093822:15
*** annegentle has quit IRC22:15
thingeejgriffith: which process?22:15
jgriffiththingee: c-vol22:15
jgriffiththingee: call to driver.detach_volume22:16
* jgriffith doesn't know why the trace info doesn't seem to align though22:19
jgriffithand grep doesn't seem to find that message which is "weird"22:20
jgriffithit *was* here:
jgriffithbut looks like it's been removed since22:24
jgriffiththingee: guess I could look at the patch ref you have )22:25
jgriffiththingee: oh... March 4!22:25
jgriffiththingee: that's pretty old22:26
jgriffiththingee: although updated April 1522:26
thingeejgriffith: march 4th?22:27
thingeejgriffith: where did you see that?22:27
jgriffiththingee: that patch was submitted on march 422:27
jgriffiththingee: but updated version 6 on the 15'th of april22:27
jgriffiththingee: I was trying to figure out why the messages were so out of date with what's in master right now22:28
jgriffiththingee: because a bunch of that appears to have changed22:28
hemnaso for whatever reason nova is calling detach on that volume more than once22:28
hemnahence the exception22:28
jgriffiththingee: your logs are for:
hemnamost likely a tempest cleanup22:28
jgriffithhemna: which is fine22:28
jgriffithhemna: IMO that should just be handled/ignored by the driver22:29
jgriffithhemna: "Hey... stand up", "ok, I already am"22:29
thingeejgriffith: in
jgriffiththingee: lemme pull the patch so I can look at it all22:30
jgriffiththingee: but you may want to try this on current master too22:30
jgriffiththingee: as it looks like that raise might have been removed anyway22:30
*** Lee1092 has quit IRC22:30
*** krtaylor has quit IRC22:31
jgriffiththingee: Yeah, that's been *fixed* already22:31
*** jaypipes has quit IRC22:31
openstackgerritWalter A. Boring IV (hemna) proposed openstack/os-brick: Brick: Fix race in removing iSCSI device
jgriffithhemna: oh... so I was saying don't raise, pass.. and update22:37
jgriffithhemna: but it's not that simple22:38
jgriffithhemna: there's a good deal of *stuff* after that now22:38
jgriffithhemna: so I'm curious... did something "change" that we don't use "begin detaching" anymore?22:43
jgriffithhemna: or that Nova doesn't use it I guess?22:43
hemnaso that's part of the problem22:44
hemnaNova isn't using the correct order of Cinder APIs22:44
hemnaat times22:44
hemnathe rebuild bug that popped up was a case in point of that22:44
hemnanova was simply calling detach22:44
hemnawithout calling terminate_connection22:45
hemnaso it wouldn't surprise me if nova doesn't call reserve, begin, etc22:45
* hemna has a sad22:45
openstackgerritJohn Griffith proposed openstack/cinder: Add volume status check to api detach method
jgriffithhemna: oh... yeah :)22:46
jgriffithhemna: although actually.. I think in this test Tempest is calling cinderclient directly itself22:46
jgriffithso it has nothing to do with Nova22:46
hemnayah you are probably right.22:46
jgriffiththingee: so that's your failure BTW I think22:46
jgriffiththingee: we didn't used to raise in that case... now we do22:46
jgriffiththingee: not quite sure why it wouldn't be seen *everywhere* but frankly I haven't looked :)22:47
thingeejgriffith: good, that's what I was saying earlier :)22:47
jgriffiththingee: by "everywhere" I mean other drivers22:47
thingeejgriffith: maybe my ci is slow22:47
jgriffiththingee: Oh... well if you already knew why did I just waste the last hour looking at this with you22:47
thingeejgriffith: I raised it earlier...and then I think you just started looking on your own to come to the same conclusion22:48
*** markvoelker has joined #openstack-cinder22:48
jgriffiththingee: oh... how stupid of me22:48
jgriffiththought i was being helpful22:48
jgriffithdidn't realize you had it all figured out.22:48
jgriffithsorry... ok, just ignore me :)22:49
thingeewell I wasn't sure...that's why I included you all. to see if I was crazy22:49
*** rongze has quit IRC23:14
mtreinishjgriffith: tempest doesn't call cinderclient at all (except in the cli tests which will be moved shortly :) )23:14
jgriffithmtreinish: LOL... nice plug! :)23:17
jgriffithmtreinish: so I should clarify what I meant... meaning that the detach call is made during the cleanup by Tempest23:18
mtreinishjgriffith: it's not in test volume boot pattern which is what thingee was looking at23:18
jgriffithmtreinish: ahhh23:18
mtreinishit's all boot from volume23:19
mtreinishso there is no manual attach or detach23:19
jgriffithmtreinish: well regardless, apparently he knew/knows what the problem is there23:19
jgriffithmtreinish: oh..... so it's using the crazy "nova boot" args that create the volume and boot all at once23:19
jgriffithmtreinish: yeah, so that's troublesome all on its own I'm afraid23:22
thingeemtreinish: my tempest.log was showing a DELETE to cinder23:22
*** rongze has quit IRC23:23
jgriffithmtreinish: oh... no, we're talking different :)23:23
*** rongze has joined #openstack-cinder23:23
*** fanyaohong has joined #openstack-cinder23:23
jgriffithmtreinish: that sends the call to create_volume_from_image and then passes in that ID23:23
jgriffithmtreinish: that's cool23:23
mtreinishoh ok, cool23:23
jgriffithmtreinish: there's a crazy thing that Horizon does where it does it all in one shot23:23
mtreinishoh I didn't even realize there was something to do that23:24
jgriffithmtreinish: and it's stupid lame because for any "real" sized volume it will time out every time23:24
jgriffithor... real sized image23:24
mtreinishthingee: sure a delete, but no detach23:24
*** rongze_ has joined #openstack-cinder23:24
*** rongze has quit IRC23:24
mtreinishthingee: it deletes the volume it creates after it deletes the server23:24
jgriffithmtreinish: yeah, it's an awful thing IMHO23:24
thingeemtreinish: yea23:24
*** rongze has joined #openstack-cinder23:25
*** rongze_ has quit IRC23:25
*** rongze_ has joined #openstack-cinder23:32
*** rongze has quit IRC23:35
*** Apoorva_ has joined #openstack-cinder23:37
*** Apoorva has quit IRC23:37
