21:00:24 #startmeeting Cinder-Nova API changes
21:00:25 Meeting started Wed Mar 30 21:00:24 2016 UTC and is due to finish in 60 minutes. The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:29 The meeting name has been set to 'cinder_nova_api_changes'
21:00:30 Hi
21:00:34 hey
21:00:37 scottda ildikov DuncanT ameade cFouts johnthetubaguy jaypipes takashin alaski e0ne jgriffith tbarron andrearosa hemna erlon mriedem gouthamr
21:00:51 Hi
21:00:53 hi
21:00:54 my typing abilities are limited, sorry :S
21:00:57 hi
21:01:00 Hi
21:01:02 hi everyone
21:01:17 #chair scottda
21:01:17 Current chairs: ildikov scottda
21:01:27 * alaski lurks
21:01:41 #link https://etherpad.openstack.org/p/cinder-nova-api-changes
21:01:59 ildikov: I can type if you'd like
21:02:05 let's wait one more min for folks to show up
21:02:08 OK
21:02:10 hi
21:02:44 o/
21:02:51 i'm here!
21:03:03 * jungleboyj cheers for mriedem
21:03:03 Please add items to the etherpad under Issues Summary if you have any
21:03:20 hi
21:03:24 ok, I guess we can start
21:03:46 So, we'd like to fix some things in the API between Nova and Cinder
21:03:53 the agenda is on the etherpad scottda posted above
21:04:11 #topic Briefly discuss the current situation, summarize/agree on the issues
21:04:34 One issue we faced in Mitaka is with multi-attach in Nova...
21:04:42 there were a few mail threads on the mailing list about issues with the interaction between these two modules
21:05:13 unfortunately i've failed at reading the multiattach detach thread
21:05:24 but i think i have a general idea
21:05:54 We had to revert ildikov's patches because they broke older Cinder clients by passing in a new parameter.
21:06:04 multiattach is special, it highlights already existing issues pretty well
21:06:11 We can fix that now that Cinder has API microversions
21:06:43 But that will require figuring out the pattern to use when Nova needs to use a Cinder microversion. So that is one thing.
21:06:45 scottda: those were the patches reverted during the midcycle, right?
21:06:50 mriedem: yes
21:07:05 and they were to what, pass the connector on attach/detach?
21:07:18 mriedem: I believe so
21:07:19 they passed the host name to Cinder
21:07:25 oh, right.
21:07:41 Cinder can accept host name, but not host name and instance at the same time
21:08:03 Newer (Mitaka+) Cinder can, the issue is with Liberty and older
21:08:08 don't ask about the history of that check, but it's what broke Cinder
21:08:27 so there is a microversion that added hostname to attach/detach?
21:08:28 scottda: yeah, exactly, hemna fixed this during Mitaka
21:08:40 this is why tests passed on the gate for my patch
21:08:52 mriedem: Actually, not microversioned in Cinder. Just an optional parameter in Mitaka
21:09:05 but I think this is a small part of the whole issue
21:09:32 so if i have a Newton nova and Liberty cinder,
21:09:37 and nova passes hostname to cinder
21:09:39 kablammo
21:09:40 right?
21:09:46 instance + hostname
21:09:50 yes
21:09:54 hence why we have microversions
21:10:01 b/c nova has to ask what microversions it can send
21:10:07 right
21:10:32 And that paves the way for any other changes that are backward incompatible.
21:11:05 well, it's also about discoverability, but yeah
21:12:04 Yes. I think when Nova instantiates the cinderclient, it can figure out the server version then and possibly store it.
21:12:22 Then use that to trigger different logic in the code. Maybe.
21:12:29 * scottda is waving his arms around here
21:12:54 But we'll have to figure out those details, and then use that pattern for other stuff.
21:13:29 ildikov: Any other things around multiattach?
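The discovery-then-gate idea scottda and mriedem sketch above could look roughly like this. This is a hypothetical illustration, not real Nova or cinderclient code; the function names and the cutoff version are assumptions.

```python
# Hypothetical sketch: Nova discovers the Cinder server version once, then
# gates what it sends on that. Names and the cutoff are illustrative only.

MIN_HOSTNAME_VERSION = '3.27'  # assumed cutoff; purely illustrative

def _parse(version):
    """Turn '3.27' into (3, 27) so versions compare numerically."""
    major, minor = version.split('.')
    return (int(major), int(minor))

def build_attach_args(server_version, instance_uuid, host_name):
    """Include the newer parameter only when the server supports it."""
    args = {'instance_uuid': instance_uuid}
    if _parse(server_version) >= _parse(MIN_HOSTNAME_VERSION):
        # Liberty and older Cinder reject host name combined with an
        # instance uuid -- the breakage that forced the Mitaka revert.
        args['host_name'] = host_name
    return args
```

With an older server the new parameter is simply never sent, so an old deployment never sees a request shape it cannot handle.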
21:13:42 does this fix the detach issue though?
21:13:53 I'm not sure sending hostname will fix everything
21:13:56 i thought there was a problem there where cinder needed to start storing the connector in the attachment db
21:14:14 mriedem: There is a POC by jgriffith that might help more with detach...
21:14:30 #link https://review.openstack.org/#/c/284867/
21:14:35 mriedem: it's kind of an open question still to what extent we're dealing with the dead horse in each module
21:14:42 That stores the connector in the attachment db
21:14:58 mriedem: Cinder can figure out detach based on this extra info on its side
21:15:06 And could also return the attachment_id to Nova when initialize_connection is called.
21:15:34 if the hostname is stored with the attachments in volume info then Nova can also use that information when it disconnects a volume
21:15:48 although we might face issues still due to the different Cinder back ends
21:16:39 Yes, that is another issue. Some drivers multiplex the iscsi connection for >1 volume to the same compute host. Others create 1 connection per attachment.
21:16:43 scottda: "And could also return the attachment_id to Nova when initialize_connection is called." - that's prior to nova calling os-attach in the volume api
21:16:47 scottda: so that's a chicken/egg
21:17:13 mriedem: that was my thinking as well
21:17:22 mriedem: Not sure about the details of jgriffith's POC. You are right. Might return it in attach()
21:17:34 we already get the attachment_id back from os-attach
21:17:49 yeah, we get it before detaching
21:18:22 i think having the info was half the issue, knowing when to actually disconnect from the backend was the other part that is (in my mind) still unclear
21:18:25 oh nvm
21:18:30 os-attach is a 202
21:18:36 nova waits for the volume to go to in-use
21:18:42 and nova can get the attachments from the volume GET
21:18:49 but it's only for telling Cinder which attachment to detach in case of multiple, it does not say anything about the connector
21:18:53 like scottda mentioned some backends would have a single iscsi session open for all the attachments
21:19:30 which is a problem for multi-attach to >1 vm on the same compute node
21:19:42 well, becomes a problem
21:19:48 patrickeast: yeah, exactly, tempest tests with lvm and logs show nicely in the multiattach case that the session is killed on the first detach on a host
21:20:21 patrickeast: Right. So there's a question of whether Nova or Cinder should contain the logic around when to actually detach() in Cinder and remove the export
21:20:26 patrickeast: and the target is also removed on the Cinder side at the moment
21:20:35 yep yep
21:20:48 I don't think we need to answer these questions now...
21:20:58 I think we are better off figuring out what the questions are...
21:21:06 and listing the possible solutions.
21:21:08 scottda: +1
21:21:30 scottda: yeah, we already got into a few small rabbit holes here
21:22:32 I think regarding multiattach the main issue is detach
21:23:08 ildikov: And if we figured out when to actually remove the target and finalize the detach in Cinder, would that solve all known issues?
21:23:47 That, and only passing in host+instance_id with Mitaka+ versions of Cinder?
21:24:00 Nova also needs to know when to call out to os-brick
21:24:29 ildikov: This is for the driver/backend specific issues? Around when to actually remove the target?
21:24:42 Or something else?
21:25:14 as far as I understood Nova closes the session on the compute node, while Cinder removes the target, but I might have messed up with the logs regarding this
21:25:16 does removing the target happen in the terminate_connection API in cinder?
21:25:31 yes
21:25:40 and nova passes the connector to that,
21:25:41 but we also call disconnect_volume in Nova
21:25:41 removing the target does, but you need to log out on the compute node first
21:25:46 which it's getting from os-brick
21:25:48 so at some point nova needs to know what to do
21:26:01 mriedem: yeah, that's what I meant
21:26:03 yes, the order is the nova virt driver disconnects the volume
21:26:10 which actually happens in os-brick
21:26:17 then it calls the terminate-connection API in cinder
21:26:24 yep, exactly
21:26:25 right
21:26:25 and passes the volume connector
21:26:38 if that connector dict was stored with the attachment in the cinder db, nova wouldn't have to pass that
21:27:03 nova also gets the connector from os-brick
21:27:27 mriedem: I would assume in that case Nova could also check whether there are multiple volume attachments using the same?
21:27:53 using the same what?
21:27:54 hemna had an idea for how to do that from the Nova side.
21:28:01 He is not here ATM
21:28:20 yeah, but that required calling initialize_connection from detach
21:28:44 ildikov: Yes. But we could add a new API to Cinder to get that info
21:29:03 +1
21:29:16 scottda: can we store that info with the attachments?
21:29:31 Can who? Nova?
21:29:33 or Cinder
21:29:58 scottda: if not I'm fine with the new API as well, I just would like to minimise the round trips as that's a constant concern
21:30:06 Cinder
21:30:08 There are a few ideas around what to store, where, by whom....
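The detach ordering the participants settle on above can be sketched with stand-in classes for os-brick and python-cinderclient (the real code lives in the Nova virt driver); only the call order is the point here, and the class and method bodies are invented for illustration.

```python
# Sketch of the detach ordering discussed above: log out on the compute node
# first (os-brick), then remove the target in Cinder (terminate_connection).
# The Fake* classes are illustrative stand-ins, not real library code.

class FakeBrickConnector:
    """Stands in for an os-brick connector on the compute node."""
    def __init__(self, log):
        self.log = log
    def get_connector_properties(self):
        return {'host': 'compute1', 'initiator': 'iqn.2016-03.fake:abc'}
    def disconnect_volume(self, connection_info):
        self.log.append('disconnect_volume')  # session logout on compute node

class FakeCinderVolumes:
    """Stands in for cinderclient's volumes manager."""
    def __init__(self, log):
        self.log = log
    def terminate_connection(self, volume_id, connector):
        self.log.append('terminate_connection')  # Cinder removes the target

def detach_volume(brick, volumes, volume_id, connection_info):
    """Compute-node logout first, then target removal on the Cinder side."""
    connector = brick.get_connector_properties()
    brick.disconnect_volume(connection_info)            # step 1: os-brick
    volumes.terminate_connection(volume_id, connector)  # step 2: Cinder API
    # If Cinder stored the connector with the attachment (the PoC linked
    # above), Nova would not need to pass it back here.

log = []
detach_volume(FakeBrickConnector(log), FakeCinderVolumes(log),
              'vol-1', {'driver_volume_type': 'iscsi'})
```

Reversing the two steps is exactly the multiattach hazard discussed above: removing the target while another attachment on the same host still shares the session.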
21:30:22 one argument in the past is that cinder may not always know about all of the places a volume was attached by some cinder user (in this case nova-compute + os-brick)
21:30:27 and which entity determines when to remove the target/export
21:30:34 so storing the refcount of attachments to some host wasn't really ideal to keep there
21:30:50 like the vm->attached volume mapping count, that is
21:30:57 scottda: yeah, I know, it would be nice to clarify that either here or in a follow-up meeting/mail thread...
21:31:00 patrickeast: Right. There is the argument that Cinder should not care about any of this. Just call Cinder when you want the target removed.
21:31:51 scottda: yeah, i mean, there is sort of a fundamental argument about whether it's right or wrong, but also concerns about whether it actually *can* consistently keep track
21:32:11 patrickeast: I remember a version of that idea, live migration can mess those things up for instance if it's allowed
21:33:03 ildikov: I'm working on a Cinder spec to allow update of connector_info and host in the Cinder DB. That would help with live migration issues.
21:33:25 patrickeast: I was wondering about storing connector info
21:34:09 scottda: yeah, sounds like a good start to me, although not a solution to the debates mentioned here
21:34:46 ildikov: Right. Just one piece of the puzzle. And one that doesn't have a lot of different ideas floating around to confuse things.
21:35:30 scottda: I was suggesting this direction as well, so I'm convinced :)
21:35:42 So, under etherpad Issues Summary #2: New Cinder API for updating connector info and Nova host is needed for Nova live migration, evacuation, and shelve offload
21:35:53 Are there any debates or issues with that?
21:36:00 scottda: it's still a question though how closely coupled we want Nova and Cinder to be here
21:36:23 Well, for live migration/shelve_offload/evacuate, we are changing the host....
21:36:32 And Cinder stores that host info.
21:36:52 If we are using host+instance_id for detach in multi-attach, we'd need to update it.
21:37:13 andrearosa: Wasn't there some other bug around volumes attached during migration?
21:37:27 That would be fixed if we could update that info in the Cinder DB?
21:38:04 when the host is changed we need to do updates for sure
21:38:21 IIRC there is an issue in not closing the connection with the target when we migrate, that is an issue with shelve_offload, not sure if it is an issue for LM
21:38:23 OK, so that's Issue #2
21:38:50 andrearosa: Ahh..right. I filed a bug for that but it was relegated to "Wishlist"
21:38:59 yeap
21:39:06 I wanted to add a Cinder API to close the connection/remove the target
21:39:34 so, I'm skipping issue #1 for a moment because it is hard..
21:39:41 Issue #3 is Cinder cannot force-detach without Nova/caller help since it does not have connector info
21:39:51 We can fix that by saving the connector_info
21:40:00 easy :D
21:40:01 using jgriffith's patch, or something similar
21:40:15 yup. So let's move back to #1
21:40:26 Detaching volumes with multi-attach when on the same host causes problems
21:40:52 I'm suggesting we list the issues and alternatives, with pros and cons.
21:41:03 +1
21:41:11 +1
21:41:41 sounds cool
21:42:04 We can add that to the etherpad. And discuss a bit in IRC. Then have a meeting or wrestling match or something to get to some resolution.
21:42:51 Did anyone have any other issues that needed addressing? If so, please add to the etherpad "Issues Summary"
21:43:29 Anything else we could discuss right now?
21:43:31 ndipanov had a patch in nova i can probably add
21:43:33 it would be good to see a bigger picture as opposed to focusing on one particular item
21:43:35 but it's more of a nova issue i think
21:43:54 mriedem: which one?
21:43:56 https://review.openstack.org/#/c/290793/
21:44:27 mriedem: tnx!
21:44:30 mriedem: OK. I think we are open to any changes in the Cinder API to make things better. So anything is fair game.
21:44:41 i'm not sure this requires a cinder api change
21:44:54 scottda: mriedem: do we have a picture of what to capture where?
21:44:55 i think it's a nova problem, but was related to some cinder interaction on attach
21:45:00 I actually have a WIP spec to refactor the basic attach and detach APIs. But that is probably a lower priority.
21:45:00 I mean spec-wise
21:45:26 ildikov: Well, one spec is cinder-connector-host-update
21:45:31 you probably can't write specs until you know what the current multiattach problems are and what the possible fixes are
21:45:48 which i thought needed documenting somewhere (maybe the same etherpad?)
21:45:52 ildikov: I'm working on that one.
21:46:20 mriedem: Yes, I think that etherpad can work for the issues and alternatives for the multiattach solution.
21:46:23 i've honestly lost a lot of the context on the multiattach details since prior to the midcycle
21:46:26 mriedem: I mainly meant whether we can all agree that it would be better to keep things separate a bit
21:46:39 I think we all have. It's a complex area.
21:47:03 like don't shoehorn everything into the multiattach spec for instance
21:47:10 ildikov: i'd agree with that
21:47:19 yup, me too.
21:47:39 as storing extra info is used for other cases as well, so it would be better to have a full view as opposed to fixing only multiattach aspects IMHO
21:47:46 we'll need some details laid out on the version constraints here too, i.e. nova Newton talking to cinder Liberty
21:47:52 We usually start with implementing changes on the Cinder side. Nova cannot implement things until Cinder has.
21:48:16 mriedem: how many versions do we plan to support backwards?
21:48:49 as during Mitaka I think someone mentioned Icehouse or Juno also, which is kind of way back in the past to adapt to IMHO
21:48:57 ildikov: well, it really comes down to the api
21:49:17 nova needs a way to tell that the version of cinder it's talking to can support api x
21:49:28 We cannot crash older deployments of Cinder. We'll have to make sure that any incompatible changes are safely using microversions.
21:49:30 and if not, we have to fail on anything that requires nova using cinder api x
21:49:30 mriedem: like everything by the end of the day :)
21:50:14 OK. we can get folks thinking about the best way to do that.
21:50:45 but the client interaction doesn't really get fleshed out until we know what the cinder api changes are, if any
21:50:45 Nova should be able to discover the Cinder API version when the cinderclient is instantiated.
21:50:47 yep, right, let's elaborate things on the etherpad
21:51:11 scottda: yup
21:51:14 OK. We'll get hemna and jgriffith to chime in there.
21:51:52 Alright. Anything else?
21:51:53 I'd like to mention that the Cinder community should discuss how often the Cinder API version is updated. I've seen it updated each time there is an API change and that seems to cause quite a bit of pain
21:52:19 maybe only update the API version at milestones?
21:52:20 scottda: do we have a follow-up plan?
21:53:02 scottda: have another sync before the summit and/or talk about this at the summit? and/or bring it back to the mailing list?
21:53:05 cfouts: What does Nova do? I think Manila changes for each API change. We need to allow any deployer to take HEAD of master at any time and use it.
21:53:25 Nova seems to change far less often but I don't know what the logic is
21:53:25 ildikov: I think we'll end up syncing again before the summit. That's still 3+ weeks out.
21:53:33 scottda: cfouts: http://docs.openstack.org/developer/nova/api_microversion_dev.html
21:53:40 mriedem: thanks
21:53:49 the microversion changes when it needs to based on the change
21:54:01 you can't line things up around milestones
21:54:03 mriedem: Yes. I stole all the Cinder code and docs from Nova. Or Manila, which got it from Nova.
21:54:17 the point of microversions is the client opts into them
21:54:22 cfouts: What do you mean by 'problems'?
21:54:28 so if your client code is requesting x.latest, that's a bug in the client
21:54:33 scottda: I think it would be good to have a quick one, but we will organise it when we're getting closer
21:54:41 cfouts: 'pain', sorry
21:54:56 ildikov: OK. Once we're happy with our listing of alternatives, we can sync again.
21:55:09 scottda: ildikov: yeah we need another sync-up meeting before the summit
21:55:15 maybe in 2 weeks?
21:55:23 Sure.
21:55:23 fwiw, i'm fine with weekly meetings if needed
21:55:33 DuncanT: competing patches with the same microversion, client needing to be updated by a developer to work with the latest server version, QA wondering why things don't work because of a microversion change
21:55:36 most of the work right now is on the cinder team i think to lay out the plan
21:55:57 cfouts: QA shouldn't be testing with the latest
21:56:06 mriedem: I agree. But there are the issues around alternatives for multi-attach support. And Nova will have opinions on that.
21:56:11 clients need to update to support microversions in order
21:56:19 scottda: yeah, agree
21:56:28 scottda: you guys set up the options, and we'll knock them down
21:56:37 scottda: mriedem: agreed, I will call for another meeting before the summit
21:56:44 mriedem: our QA tests internally before we push upstream
21:57:00 maybe that isn't a widespread practice
21:57:06 cfouts: Competing patches, sure, it's annoying. We can at least avoid merging clashes with unit tests. The client only needs updating if it wants to use the new feature, that's the point, and that change is needed whether you microversion it or not
21:57:08 cfouts: i guess i don't have context
21:57:11 We only test in production.
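The opt-in rule mriedem states above ("requesting x.latest is a bug in the client") amounts to pinning a microversion. A minimal sketch, assuming the OpenStack-API-Version header convention; the pinned value and function name are illustrative:

```python
# Sketch of microversion opt-in: a well-behaved client pins the version it
# was written and tested against, and refuses the 'latest' wildcard.
# The pinned value here is purely illustrative.

PINNED_VERSION = '3.0'  # version this hypothetical client was tested with

def build_request_headers(requested=None):
    """Build microversion headers, rejecting the 'latest' wildcard."""
    version = requested or PINNED_VERSION
    if version == 'latest':
        # Asking for latest couples the client to server behaviour it has
        # never been tested against -- the "bug in the client" above.
        raise ValueError("request a pinned microversion, not 'latest'")
    return {'OpenStack-API-Version': 'volume %s' % version}
```

This is why server-side microversion bumps are cheap: existing clients keep getting the behaviour they pinned, and only clients that want a new feature update their pin.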
21:57:14 :)
21:57:28 for example, tempest is testing nova v2.1
21:57:32 but nova has like v2.25
21:57:35 mriedem: oh, I missed the weekly meetings idea, TBH I could live with that as well, I think we can easily end up with a bigger amount of tasks here
21:57:53 ildikov: at least to checkpoint on progress i think that would be fine
21:57:55 OK, I have to run. Thanks everyone. Weekly sounds good for now. It will put some pressure on us.
21:57:59 ildikov: if there is no news, it's a short meeting
21:58:16 cfouts: All sorts of things can change with a patch between upstream submission and merging, that is just the nature of the beast
21:58:25 mriedem: exactly, and having a weekly sync can ensure progress as well
21:58:42 scottda: thanks!
21:58:47 DuncanT: maybe it is more of our process then. I can live with that :)
21:58:47 ildikov: Don't forget to endmeeting when we are all done.
21:59:00 scottda: I will contact you about the follow-up/maybe weekly meetings
21:59:07 great
21:59:24 scottda: no worries, it's not my first one I had in mind :)
22:00:01 ok, we have half a minute left
22:00:05 I would like to talk about the nova swap volume function issue.
22:00:13 are there any other aspects we would want to discuss here today?
22:00:23 #link https://bugs.launchpad.net/cinder/+bug/1522705
22:00:25 Launchpad bug 1522705 in Cinder "Cinder volumes are stuck when non admin user executes nova swap volume API" [Undecided,In progress] - Assigned to Takashi NATSUME (natsume-takashi)
22:00:25 cfouts: We've had similar issues before microversions... I wonder if you could do something by bumping to a massive microversion in the (local-only) patch, so your test can request something that will never be supported by upstream code or something? I'll have a think about it
22:01:03 takashin: is this the one you wrote the mail about earlier?
22:01:21 thank you all, I have to go. Bye
22:01:25 I wrote it in the etherpad.
22:01:49 andrearosa: thanks for joining
22:01:55 The bug is caused because the Cinder 'migrate_volume_completion' API can be executed by admin only in the default settings of Cinder's policy.json.
22:02:21 takashin: yeah, I can see now
22:02:35 takashin: That's because it was only designed to be used by nova internals
22:03:34 IMO, the problem is whether attach/detach of volumes is performed on the nova side or the cinder side.
22:04:42 takashin: so this is only a problem if it's a user-initiated volume migration on the cinder side, right?
22:05:05 yes
22:05:08 i don't know the details, but cinder calls the nova swap-volume API?
22:05:13 as a non-admin
22:05:25 and nova then calls back to cinder as that non-admin and it fails
22:05:33 in cinder, volume migration is admin-only
22:05:46 But nova allows a non-admin user
22:06:39 for swap
22:07:04 because for nova it's just a detach of one volume and attach of another
22:07:29 Now detach/attach volume is performed on the cinder side.
22:07:37 So my first question would be: what is the use case for a non-admin to call this?
22:08:05 Originally nova allowed it for non-admin users
22:08:45 if cinder is initiating the swap volume call to nova, why doesn't it do it with the same admin context?
22:08:59 oh nvm
22:09:09 No
22:09:10 so admin volume migration from cinder is not a problem, right?
22:09:16 but non-admin swap volume from nova is a problem
22:09:24 It was expected to be admin only, because it's dangerous in general to go swapping volumes without generating detach/attach events on the VM - it's an easy way to corrupt your volumes with old cached data
22:09:41 because http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py#n4943
22:10:30 so it seems like nova should check if the context is admin before calling http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py#n4943
22:10:52 this also bleeds into a cross-project discussion about api capability discoverability
22:11:16 Does it mean admin-context in a non-admin user operation?
22:11:17 so the user can find out what apis they can actually perform
22:11:40 mriedem: In this case, the answer is 'admin only', and always has been
22:11:55 DuncanT: the policy file is configurable though
22:12:01 unless there is something hardcoded in cinder for this
22:12:23 mriedem: Not sure without checking the code... we fixed a bunch of those but there might be more
22:12:46 Change the policy in nova?
22:12:56 takashin: to what? make it admin-only?
22:13:02 Yes
22:13:08 idk, i guess that's an option
22:13:22 O.K.
22:13:24 i honestly don't know what all of the use cases are around swap volume or when it was originally introduced
22:13:36 i'd need some input from others on the nova team about that, like ndipanov
22:14:05 seems it should have been implemented like the nova/neutron callback api that we have for vif plugging
22:14:09 which is admin only
22:14:25 "compute_extension:instance_actions:events": "rule:admin_api",
22:14:31 we can follow up on this as well and add info to the etherpad in the meantime
22:14:42 yeah
22:14:45 i have to head out
22:15:31 is there a ML thread about this?
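The option takashin raises, making swap volume admin-only via policy analogous to the instance_actions:events rule quoted above, could be sketched like this. The rule name and the enforcement helper are invented for illustration; real Nova delegates this to oslo.policy.

```python
# Toy sketch of an admin-only policy check for swap volume. The rule name
# "os_compute_api:servers:swap_volume" and the enforce() helper are
# hypothetical, modeled on the existing admin_api rules.

POLICY = {
    # analogous to: "compute_extension:instance_actions:events": "rule:admin_api"
    "os_compute_api:servers:swap_volume": "rule:admin_api",
}

def enforce(rule, context):
    """Minimal stand-in for oslo.policy enforcement."""
    if POLICY.get(rule) == "rule:admin_api" and not context.get('is_admin'):
        raise PermissionError("%s is admin-only" % rule)

def swap_volume(context, old_volume_id, new_volume_id):
    enforce("os_compute_api:servers:swap_volume", context)
    # ... detach old volume, attach new volume, then call Cinder's
    # migrate_volume_completion (itself admin-only by default) ...
    return 'swapped %s -> %s' % (old_volume_id, new_volume_id)
```

With the rule in place, a non-admin request fails at the Nova API layer instead of leaving Cinder volumes stuck when the later admin-only Cinder call is rejected.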
22:15:54 mriedem: second one at the bottom of the etherpad IIRC
22:16:01 Yes
22:16:05 Fix nova swap volume (updating an attached volume) function
22:16:06 #link http://lists.openstack.org/pipermail/openstack-dev/2016-February/087549.html
22:16:07 yeah
22:16:20 yeah, that's the one
22:16:23 i see it got 0 replies :)
22:16:35 ok, i'll try to read through that at some point and reply, or get others from nova involved
22:16:47 Thank you.
22:18:11 great, I suggest we close the meeting now
22:18:15 yes please :)
22:18:21 O.K.
22:18:24 I will call for a follow-up meeting or even make it a series
22:18:47 I would encourage everyone to keep an eye on and add info to the etherpad in the meantime!
22:19:02 thanks everyone for joining, I think it was a good start
22:19:53 so if nothing else, then good afternoon/evening/night/morning everyone :)
22:20:05 ildikov: yup, thanks for starting this
22:20:10 #endmeeting