17:00:23 #startmeeting cinder-nova-api-changes
17:00:25 Meeting started Thu May 12 17:00:23 2016 UTC and is due to finish in 60 minutes. The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:29 o/
17:00:30 The meeting name has been set to 'cinder_nova_api_changes'
17:00:35 hi
17:00:36 o/
17:00:41 scottda ildikov DuncanT ameade cFouts johnthetubaguy jaypipes takashin alaski e0ne jgriffith tbarron andrearosa hemna erlon mriedem gouthamr ebalduf patrickeast smcginnis diablo_rojo gsilvis
17:00:52 hi
17:01:19 o/
17:01:31 mriedem: I'm not sure why my question is being avoided after I asked twice. Is this because it's a priority problem in nova, or because it never will be for multiattach?
17:01:33 Do we have an agenda up somewhere?
17:01:36 as far as I know jgriffith_ is out today, but we still have a few items to touch on
17:01:40 o/
17:01:50 thingee: later
17:02:02 etherpad with info: #link https://etherpad.openstack.org/p/cinder-nova-api-changes
17:02:07 lurking and learning
17:02:22 there were some items from ildikov's meeting minutes from last week
17:02:33 smcginnis: I added the list of items we are targeting to get done to the etherpad
17:02:38 "John Griffith will work on the above described solution; the target is to have patches up by next week."
17:02:39 ildikov: Thanks!
17:02:49 we can go through those
17:03:16 I have a question
17:03:46 I'm working on a nova patch to not call check_attach at attach time
17:03:47 I haven't seen patch(es) up from John yet
17:04:14 and check_attach does 2 things: 1) it checks the internal state of the volume and 2) it checks the availability zone
17:04:38 does it make sense to add an optional AZ param to os-attach?
17:04:50 and have cinder check it at os-reserve?
17:05:00 or just keep the check on the nova side only?
17:05:01 https://github.com/openstack/nova/blob/026468772672215d34a593e631d1e62d6a615aa4/nova/volume/cinder.py#L279
17:05:18 https://github.com/openstack/nova/blob/master/nova/volume/cinder.py#L289-L299
17:05:38 so, the az stuff is kind of a mess
17:05:38 I was just working on moving that code into a check_availability_zone() call in there instead
17:05:53 but before I go forward, I'd like to hear opinions on it
17:05:53 see https://github.com/openstack/nova/blob/026468772672215d34a593e631d1e62d6a615aa4/nova/virt/block_device.py#L60
17:06:40 ^ is really for boot from volume, where nova creates the volume
17:06:52 I'd prefer to change nova's attach code to simply call os-reserve
17:06:57 because nova will create the volume in the same AZ that the instance is in, which might not exist in cinder
17:07:00 instead of a volume get, then check, then os-reserve
17:07:15 hemna: i think the az check in the api just needs to remain a separate thing
17:07:34 see my todo here https://github.com/openstack/nova/blob/026468772672215d34a593e631d1e62d6a615aa4/nova/virt/block_device.py#L79
17:07:57 i've had a long-term wish of creating the volume in nova-api for boot from volume, and then attaching it later
17:08:05 https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3095
17:08:07 so we do all of the az checking and stuff with cinder in the api rather than on the compute
17:08:08 that thing
17:08:22 But for nova to do the AZ check, it will still need the volume.get, which defeats the point of what hemna is trying to do.
17:08:23 I was hoping it could simply be a call to self.volume_api.reserve_volume()
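
[Editor's note: a minimal sketch of the check_attach() split hemna describes above, based on the nova/volume/cinder.py code linked in the log. This is illustrative only, not the merged nova change; any name not mentioned in the discussion is an assumption.]

    # Sketch only: check_attach() currently does two jobs. The volume-state
    # half duplicates what cinder's os-reserve already enforces server-side,
    # so only the AZ half needs to survive as a separate nova-side call.
    from oslo_config import cfg

    from nova import availability_zones as az
    from nova import exception

    CONF = cfg.CONF

    def check_availability_zone(context, volume, instance=None):
        """AZ-only validation, factored out of check_attach()."""
        if instance and not CONF.cinder.cross_az_attach:
            instance_az = az.get_instance_availability_zone(context, instance)
            if instance_az != volume['availability_zone']:
                raise exception.InvalidVolume(
                    reason='Instance %s and volume %s are not in the same '
                           'availability zone' % (instance.uuid, volume['id']))

    # The attach path then becomes: get the volume, do the AZ-only check,
    # and let os-reserve reject bad volume states on the cinder side instead
    # of racing a client-side status check:
    #
    #     volume = self.volume_api.get(context, volume_id)
    #     check_availability_zone(context, volume, instance=instance)
    #     self.volume_api.reserve_volume(context, volume_id)
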
17:08:32 scottda, +1
17:08:33 yah
17:08:45 so there is that.
17:09:13 the get, then reserve means there is still a race
17:09:26 so you'd have to pass the az to os-reserve
17:09:29 yah
17:09:42 as an optional param
17:09:49 if it's there, cinder tests it.
17:09:57 re: https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3095 ndipanov had a patch for a race in there also: https://review.openstack.org/#/c/290793/
17:09:59 if it's not, it assumes it's open, re: no AZ
17:10:45 yeah, and nova's logic for passing the az would be based on what we have in https://github.com/openstack/nova/blob/026468772672215d34a593e631d1e62d6a615aa4/nova/virt/block_device.py#L60
17:10:46 for bfv
17:11:59 shall we take notes in https://etherpad.openstack.org/p/cinder-nova-api-changes ?
17:11:59 https://github.com/openstack/nova/blob/master/nova/volume/cinder.py#L289
17:12:05 so right now, that's checked
17:12:34 kinda the same thing
17:12:43 yeah, nova would just need to re-use some logic to determine if it needs to pass the az to os-reserve
17:12:52 sounds like _get_volume_create_az_value() needs to be public
17:12:57 if CONF.cinder.cross_az_attach, we'd pass None
17:12:58 mriedem: I will add the decision points to the etherpad after the meeting
17:14:20 I don't see any AZ check on the cinder side
17:14:29 so I dunno
17:14:48 I don't think there are any AZ checks enforced in Cinder
17:14:51 yes there is
17:14:55 when creating the volume
17:15:06 nova can pass an az and if it doesn't exist cinder fails the volume create request
17:15:08 UNLESS
17:15:14 you set a backdoor config option to ignore that
17:15:20 the create flow passes in an AZ
17:15:20 Unless a fallback is configured.
17:15:34 smcginnis: right, which was a hack because we didn't have the fix in nova
17:15:39 which is https://github.com/openstack/nova/blob/026468772672215d34a593e631d1e62d6a615aa4/nova/virt/block_device.py#L60
17:15:42 bleh
17:15:47 https://github.com/openstack/nova/commit/f9a51b970f688b90baf0ae3ef31d79b3fec02ed1
17:15:52 ok, so I don't want to make the AZ nightmare worse
17:16:05 hemna: You made it worse by mentioning it.
17:16:19 well, passing the az to os-reserve and cinder checking if it's provided, isn't really making it worse
17:16:24 if nova doesn't provide it, it's a noop
17:16:32 scottda, :)
17:16:43 if the cinder microversion isn't new enough for nova to pass it, then nova still has to check like it does today
17:16:47 I guess the real question is, should cinder care?
17:17:00 should cinder be doing the check and fail if the AZ doesn't match?
17:17:10 so,
17:17:25 when i was fixing this bug in nova, i had a thread in the ML about removing the nova cross_az_attach option
17:17:25 There are use cases where deployers had geographically distinct AZs, so this was needed.
17:17:26 afaik AZ is a nova concept?
17:17:30 and there were operators saying they relied on it
17:17:35 scottda: yes
17:17:36 that
17:17:49 We did it in our (now defunct) public cloud...
17:18:02 see http://lists.openstack.org/pipermail/openstack-operators/2015-September/008252.html
17:18:06 for some light bedtime reading
17:18:11 :)
17:18:17 The backdoor config option work brought up the fact that AZs were never fully baked.
17:18:25 starts here http://lists.openstack.org/pipermail/openstack-operators/2015-September/008224.html
17:18:45 Yeah, the terminology is vague, and that's part of the problem....but we still have to live with it.
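
[Editor's note: the optional AZ on os-reserve discussed above was only a proposal; no such parameter exists in cinder's API. A sketch of the intent, with hypothetical names, simplified from how reserve_volume behaves:]

    # Hypothetical cinder-side reserve taking an optional AZ, per the
    # proposal: if nova supplies one, cinder validates it; if not, it is
    # a no-op and os-reserve behaves exactly as it does today.
    def reserve_volume(self, context, volume, availability_zone=None):
        if (availability_zone is not None
                and volume['availability_zone'] != availability_zone):
            raise exception.InvalidVolume(
                reason='volume is not in availability zone %s'
                       % availability_zone)
        # existing behavior (simplified): flip an 'available' volume to
        # 'attaching', failing the request for any other volume state
        self.update(context, volume, {'status': 'attaching'})

    # Nova's side of it, mirroring what _get_volume_create_az_value()
    # already computes for boot-from-volume:
    #
    #     az_to_pass = None if CONF.cinder.cross_az_attach else instance_az
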
17:18:49 and by 'fully baked' does that mean that nova should be passing the AZ in calls to Cinder?
17:18:56 so that they can both be on the same page?
17:18:58 this is the cinder workaround https://review.openstack.org/#/c/217857/
17:19:15 hemna: I think to fully support and enforce AZs, yeah. :/
17:19:47 there are some decent details and background in the commit message of https://review.openstack.org/#/c/227564/
17:19:53 so if a user creates a volume, is the AZ set? and to what? and how is that checked against attach calls from nova?
17:19:56 But maybe we should shelve this az discussion for now and get back to multiattach. AZs are an issue for single and multi attach.
17:19:57 bleh
17:20:01 * hemna cowers in defeat
17:20:31 so ftr, to fully remove nova's check_attach, cinder's os-reserve would need to take an az
17:20:33 to validate it
17:20:39 mriedem, yah
17:20:40 at least to be consistent with how things are today
17:20:46 that's why I brought it up
17:20:47 let it be written in the etherpad for all time!
17:20:47 I guess we can make the 'check_attach' removal a two step process
17:21:06 so, if I still do the AZ check on the nova side
17:21:16 the race is smaller
17:21:24 at least nova won't be checking volume state
17:21:48 Yeah, but that's a bit of code churn and review time for an incomplete fix...
17:21:52 I think eventually, we do want to just pass the AZ to cinder and then nova can call reserve w/o a get.
17:22:25 I won't change the functionality of check_attach for now.
17:22:33 but I will refactor the AZ check out of there
17:22:34 scottda: it's just a bug fix really
17:22:42 and then simply call the new AZ check after the get.
17:22:51 yeah i think hemna and i are on the same page
17:22:53 fair enough
17:22:55 then reserve_volume will catch the state checks.
17:23:07 is it only the BFV case?
17:23:14 no
17:23:16 ildikov: no
17:23:18 I mean when the AZ check will need to be called
17:23:35 so in the remaining 7 minutes i have...
17:23:42 ok I'll forge ahead with this and push it up today then.
17:23:59 hemna: you might want to look at https://review.openstack.org/#/c/290793/ too
17:24:04 cool, added a note to the etherpad about the AZ check
17:25:11 mriedem, ok will do
17:25:16 mriedem: can you check the multiattach spec when you have some time?
17:25:39 ildikov: is it any different from mitaka?
17:26:20 mriedem: slightly updated, I added a link to the etherpad so that we would not need to add implementation details to the spec regarding how to sort out things in Cinder
17:26:22 because i was under the impression that the multiattach spec was going to be dependent on the POC that jgriffith_ was going to be doing
17:26:59 does this mean we can talk about approving it, when that is ready?
17:27:31 i'd prefer to not land a bunch of technical debt in nova just to get this in
17:28:13 the Cinder part is a dependency in the spec; if these issues are not sorted out, then we're in trouble anyway
17:28:28 that does not mean we should sort it out in Nova instead, in my view
17:29:15 ok i'll have to review the spec to see the changes then
17:29:21 and the plan is to get them done :)
17:29:25 #action mriedem to review multiattach nova spec
17:29:36 tnx
17:29:41 #action hemna to poke at cleaning up nova check_attach
17:29:53 coolio
17:29:55 what's the status on cinder migrate testing on the multinode job in the gate?
17:29:57 if there's anything Nova-specific that's missing, I will add it
17:30:30 mriedem: We're starting with cinder migrate on a single node. We think we can get that working....
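
[Editor's note: a sketch of the single-node exercise scottda describes, assuming two LVM backends named lvmdriver-1 and lvmdriver-2 with matching volume types; the server ID, volume ID, and hostname placeholders are illustrative.]

    # create a volume on the first backend and attach it to a server
    cinder create --name testvol --volume-type lvmdriver-1 1
    nova volume-attach <server-id> <volume-id>
    # migrating the attached volume to the second backend is what drives
    # nova's swap volume under the covers
    cinder migrate <volume-id> <hostname>@lvmdriver-2
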
17:30:54 But it looks like Devstack support for multi-backend was removed. I'm trying to figure out why, and what alternative exists.
17:31:09 scottda: as in resize?
17:31:10 But we eventually want multi-node as well.
17:31:30 wait what?
17:31:38 cinder multi-backend removed from devstack?
17:31:41 no, just have 2 LVM volume groups as separate backends, and migrate between them on a single node.
17:31:51 mriedem: no, like multi drivers
17:32:04 hemna: No, I've actually found a way to do it, the syntax has changed...
17:32:06 and
17:32:20 and Tempest multi-backend tests are failing for me. Not sure why.
17:32:47 and that will still test swap volume?
17:33:08 yes, calling cinder migrate will call swap volume.
17:34:35 ok, do we want to talk about https://review.openstack.org/#/c/312773/ ?
17:35:39 What do you think of that patch mriedem?
17:35:49 honestly i haven't had the time to dig into it
17:36:19 would be nice to see the live migration job or multi-node job passing on it
17:36:22 but those are super flaky
17:36:50 i can dig into the test failures for volume-backed live migration
17:36:54 and see if they are related
17:37:54 finally, before i go,
17:38:04 anyone talked to jgriffith_ on the os-initialize_connection changes?
17:38:31 no, I haven't
17:38:37 mriedem: the live migration job says it passed, but I might have missed something in the logs...
17:38:57 mriedem: I talked to him briefly, he's working on it, but we couldn't go into details
17:39:03 ildikov: yeah http://logs.openstack.org/73/312773/1/experimental/gate-tempest-dsvm-multinode-live-migration/c57f6b9/console.html#_2016-05-08_09_14_44_572
17:39:11 the experimental jobs seem.....borked almost every time. :(
17:39:26 I think John said in IRC that he had unit tests for his patch mostly passing...
17:39:54 hemna: it's a bit weird, it congratulates you and then marks the test as failed...
17:40:17 hehe
17:40:41 it is
17:40:41 like a participation trophy.
17:40:41 http://logs.openstack.org/73/312773/1/experimental/gate-tempest-dsvm-multinode-live-migration/c57f6b9/console.html#_2016-05-08_09_18_59_360
17:40:44 setting up ceph
17:40:50 i've pinged tdurakov on that, he works on that job
17:41:02 mriedem: scottda: I will try to catch him and add notes to the etherpad about that item this week or early next
17:41:03 that job sets up various storage backends in a single job
17:41:10 and runs the same 4 tests
17:41:13 looks like it's not working for ceph atm
17:41:21 ildikov: ok
17:41:42 alright, over by 11 minutes
17:41:44 anything else?
17:41:52 also this time next week might be tricky for me
17:42:00 but will try my best
17:42:20 change the time as needed
17:42:25 Let's work on a new time. It'd be nice to have JohnGarbutt here, and JohnG as well
17:42:35 also I know johnthetubaguy cannot make it at this slot, so if it's problematic to either of you in general please let me know and then we can find another one
17:42:37 ok
17:42:48 also, fyi, i'm out from 5/20-5/30
17:42:51 thanks for the help guys
17:42:56 back on 5/31
17:43:09 mriedem: Nice
17:43:11 mriedem: ok, thanks for the info
17:43:20 ok, bye all.
17:43:33 I will reach out to you regarding time slots
17:43:41 thanks all!
17:44:15 #endmeeting
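
[Editor's note: the changed devstack syntax scottda mentions is presumably the CINDER_ENABLED_BACKENDS local.conf setting; a sketch for the two-LVM-backend setup discussed above, with illustrative backend names.]

    [[local|localrc]]
    # two LVM volume groups as separate cinder backends on one node,
    # giving cinder migrate a source and a target for the swap-volume test
    CINDER_ENABLED_BACKENDS=lvm:lvmdriver-1,lvm:lvmdriver-2
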