15:00:20 #startmeeting manila
15:00:21 Meeting started Thu Jun 15 15:00:20 2017 UTC and is due to finish in 60 minutes. The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:25 The meeting name has been set to 'manila'
15:00:26 hello all
15:00:28 Hi
15:00:33 hi
15:00:34 hi
15:00:38 hi
15:00:39 \o
15:00:45 hello
15:00:51 hello
15:01:08 hi
15:01:10 no announcements today
15:01:18 #agenda https://wiki.openstack.org/wiki/Manila/Meetings
15:01:31 we have 2 important topics to cover
15:01:46 first up:
15:01:50 #topic How we should support both IPv6 and IPv4 with DHSS=true
15:02:31 so during the Atlanta PTG we identified an issue with supporting both IPv4 and IPv6 at the same time for DHSS=true backends
15:02:40 o/
15:02:42 Should we support both 6 and 4 with DHSS=true as the initial target?
15:02:53 it's not possible to do it with the current share network design
15:03:10 so we know that a change to share networks is needed before we can achieve that
15:03:27 some of us (me, tbarron, gouthamr) discussed possible approaches on Friday
15:03:40 #link https://review.openstack.org/#/c/391805/
15:04:00 but I'd like to table that part of the discussion and first make sure we have agreement on our Pike goal
15:04:34 for Pike the plan is to support both IPv4 and IPv6 for dhss=false drivers
15:05:05 and for dhss=true drivers, either IPv4 or IPv6 should be supported, but not both at the same time
15:05:07 o/
15:05:21 bswartz: agree
15:05:21 * bswartz marks gouthamr tardy :-p
15:05:31 :[
15:06:00 +1 on the proposal
15:06:40 I'm pretty sure this is consistent with what we've been planning since ocata, but since zhongjun was confused about it, I wanted to make sure we were all on the same page as a community
15:07:09 +1
15:07:19 so if there are no issues there, we can move on to a more contentious topic...
15:07:36 not anymore
15:07:37 In coat, we were planning to support either IPv4 or IPv6 for dhss=false and true :)
15:07:50 s/coat/ocata
15:08:20 zhongjun: we've always wanted dual support
15:08:33 it's not hard to achieve for dhss=false
15:08:52 :) +1
15:08:57 it is hard for dhss=true, and we haven't had a volunteer to redesign the share networks APIs yet
15:08:58 bswartz zhongjun: have we resolved how we'd test it in the gate yet though?
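A minimal sketch of why the current share network design blocks dual-stack DHSS=true, as discussed above. This is illustrative only, not manila's actual SQLAlchemy model; the point is that a share network carries a single subnet and a single ip_version:

```python
# Illustrative only -- not manila's actual model.  A share network (as of
# Pike) references exactly one neutron subnet and records one ip_version,
# so a single share network cannot describe an IPv4 subnet and an IPv6
# subnet at the same time.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ShareNetwork:
    id: str
    neutron_net_id: Optional[str] = None
    neutron_subnet_id: Optional[str] = None  # exactly one subnet
    cidr: Optional[str] = None               # e.g. "10.0.0.0/24" OR "fd00::/64"
    ip_version: Optional[int] = None         # 4 or 6, never both

# Dual-stack DHSS=true would need something like a list of subnets per
# share network -- the API redesign the discussion refers to.
```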
15:09:05 gouthamr: 1 job
15:09:12 both in 1 job I mean
15:09:25 ipv6 testing would just be a tempest flag
15:10:04 okay let's move on
15:10:07 #topic IDs and instance IDs in the driver interface
15:10:17 #link https://bugs.launchpad.net/manila/+bug/1697581
15:10:19 Launchpad bug 1697581 in Manila "create snapshot failed due to absent of snapshot['share_id']" [Critical,In progress] - Assigned to Valeriy Ponomaryov (vponomaryov)
15:10:20 gouthamr: a job was added by vkmc, but it isn't tested yet
15:10:22 #link https://review.openstack.org/#/c/433854/12/manila/db/sqlalchemy/models.py
15:10:28 #link https://review.openstack.org/#/c/473864/
15:10:52 so there was some driver breakage caused by the share groups DB refactor change that merged
15:11:11 unfortunately reviewers (me included) did not catch the 3rd party CI failures
15:11:56 so I'll remind everyone that we should be looking for 3rd party CI failures on changes that can potentially break drivers (especially changes affecting the driver interface)
15:12:24 everyone just got used to their failures )
15:12:50 the history here is that when we introduced share instances 2 years ago, we planned to keep them hidden from drivers, and present the instances to drivers as if they were the actual shares
15:13:19 vponomaryov: yes that's part of the problem -- some CIs were already failing for different reasons so it was impossible to notice that the share groups change broke them
15:14:00 bswartz: i had a different way to tell, our DHSS=True driver was always passing :P
15:14:16 at that time we agreed that the actual share IDs and snapshot IDs (the ones exposed through the REST API) should never be used by the drivers
15:14:30 and then the consistency_group_xyz (don't remember which) flag for tempest was reused and our manifests weren't updated
15:14:48 later on, when we added migration and replication, we started to bend our own rule
15:15:12 and now we have a situation where at least 1 driver (ZFS) actually does use the share ID for something
15:15:20 gouthamr: lol
15:16:00 there are multiple options to fix this, but we don't seem to agree on which one to pursue
15:16:24 i think we need to fix replication and migration if they are the problem areas
15:16:47 1) admit defeat and present both share IDs and instance IDs through the driver interface -- this would require changing all the drivers that directly consume the instance IDs instead of the share IDs
15:17:23 2) be more strict about never presenting the share IDs to the driver, and fix drivers which currently rely on them
15:17:48 3) roll back to the previous state where stuff worked but it was all murky and confusing
15:18:00 snapshot['share_id'] has always been meant to refer to whatever ID is picked by the driver to refer to shares on their respective backends... and for a lot of drivers, it's whatever comes as share['id'] in create_share
15:18:06 we should keep it that way
15:18:33 vponomaryov's bugfix for this patch was the first step in (1), but xyang didn't like it -- AIUI because it would require significant driver changes
15:18:36 about 2) this is enforced in code review...
15:18:50 bswartz, https://review.openstack.org/#/c/473864/ is (3) right now
15:18:50 vponomaryov: can ZFS avoid using the snapshot share_id?
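A minimal sketch of the convention being debated. The helper names below are hypothetical, not manila's actual manager code; the invariant they illustrate is the one gouthamr states: drivers see the share *instance* dressed up as the share, and snapshot['share_id'] must match the share['id'] the driver saw in create_share:

```python
# Hypothetical helpers; manila's real manager code differs, but the
# invariant the bug broke is the one shown here.

def share_view_for_driver(share_instance):
    """Present a share instance to the driver as if it were the share."""
    view = dict(share_instance)
    view['id'] = share_instance['id']  # the *instance* ID, not the REST API share ID
    return view

def snapshot_view_for_driver(snapshot_instance, parent_share_instance):
    view = dict(snapshot_instance)
    # Must equal the value the driver saw as share['id'] in create_share;
    # after the share groups refactor this field went missing.
    view['share_id'] = parent_share_instance['id']
    return view
```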
15:18:58 I personally lean towards option (2)
15:19:20 but I want to know if anything other than ZFS actually requires the share ID or snapshot ID (not the instance IDs)
15:19:26 ganso: this driver stores appropriate data for the share facade and the share instances
15:19:28 bswartz: If I understood correctly, we are broken at this moment, so we need option #1 or #3 to fix
15:19:35 ganso: stores separate sets of data
15:19:52 bswartz: (3) now and (2) or (1) later
15:20:10 +1
15:20:20 +1
15:20:44 ganso: I think ZFS could be modified to not rely on the share ID, but it would be a breaking change
15:21:00 that is, existing replicated shares would probably not survive an upgrade
15:21:18 We always use the instance IDs in our driver.
15:21:23 I'm not aware of any users relying on ZFS fortunately
15:21:26 do drivers need to log the ID, and maybe should present the share ID rather than the instance ID in the log? o/w I like #2
15:21:51 or is it ok to log instance IDs ... as long as we don't present them to the end user
15:22:03 quick search: the container driver also seems to be using share['share_id']
15:22:22 gouthamr: it doesn't do snapshots ))
15:22:22 IMO it should be possible for drivers to come up with their own identifier that ties together the various replicas of a share and store that in driver private share data or in the provider location field
15:22:23 share_name = share.share_id :|
15:22:41 and thus not rely on the share ID to relate different replicas to each other
15:22:54 vponomaryov: or replication
15:24:17 gouthamr: regarding enforcement of (2), code reviews haven't worked well, so I would propose actually modifying the manager to pass down synthetic objects to the drivers instead of the model objects
15:25:06 so I'd like to figure out which approach we should pursue
15:25:15 hopefully nobody is in favor of (1)
15:25:22 but if you are please speak up
15:25:44 I think the question is whether to do (2) or (3), or (3) followed by (2)
15:26:17 sorry if my usage of numbers is confusing
15:26:29 the latter... so we can evaluate this with more time..
15:26:42 we can't keep being broken in the meantime.
15:26:53 bswartz: me, personally, I don't see a problem with (1)
15:27:02 bswartz: so, I would say (3) then (1)
15:27:34 vponomaryov: the downside there is that existing drivers need to be modified significantly, and future driver authors see a more complex interface to deal with
15:27:53 it's hard enough to figure out how to write a manila driver
15:27:59 bswartz: significantly? cannot agree
15:28:06 adding different kinds of IDs would make that worse IMO
15:28:34 vponomaryov: the problem is that it's hard to be sure you've found every place the IDs are used when doing a field rename
15:28:55 python is an untyped language so there's no "refactor" button that just works
15:28:58 significantly: relative term --- less significant for someone who wrote multiple first party drivers, more significant for someone bridging a storage array :D
15:29:31 gouthamr: +1
15:29:37 the other downside of any kind of driver refactor is that it makes backporting bugfixes harder -- so I'd rather not force all the drivers to change
15:30:24 (3) forever then! ))
15:30:24 bswartz: +1
15:30:30 plus, we've seen the response time of driver updates when we forced everyone to implement update_access... some are still doing it or haven't done it
15:30:58 well what arguments are there against (2)?
15:31:18 Are we aware of any cases other than ZFS replication where the share ID actually matters to a driver?
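A sketch of the driver-private-data idea bswartz raises above: the driver mints its own identifier to tie replicas together instead of leaning on share['id']. The get/update calls follow manila's DriverPrivateData helper, but treat the exact signatures as an assumption:

```python
import uuid

# Instead of using share['id'] (the REST API share ID) to relate replicas,
# the driver mints its own group identifier and stores it in driver-private
# data keyed by the instance ID.  A new replica copies the identifier from
# the active replica, which the driver receives in replica_list.
def ensure_replica_group_id(private_storage, replica, active_replica=None):
    group_id = private_storage.get(replica['id'], 'replica_group_id')
    if group_id is None and active_replica is not None:
        group_id = private_storage.get(active_replica['id'], 'replica_group_id')
    if group_id is None:
        group_id = uuid.uuid4().hex
    private_storage.update(replica['id'], {'replica_group_id': group_id})
    return group_id
```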
15:31:28 bswartz: i'm concerned about drivers that use driver-private-data
15:31:38 if not, then the sole downside to (2) is that we need to modify ZFS replication
15:33:12 which brings up another question...
15:33:33 since vponomaryov won't be maintaining ZfsOnLinux anymore, is there a volunteer to take that over?
15:33:51 Do we need to check it again? Some other driver maintainers don't attend this meeting.
15:35:03 zhongjun: we will double check before introducing any changes, and we can/should look closely at CI systems on any changes that actually remove the share ID or snapshot ID from the objects passed to the drivers
15:35:49 gouthamr: what's your concern about private data?
15:36:06 private data is meant to make things easier for drivers
15:36:13 bswartz: i'm concerned about updates
15:36:49 what updates
15:37:04 if the driver relies on checking the id from fields that will disappear from resources passed in the driver interface
15:37:23 to store things into the driver-private-data, how would we avoid that problem?
15:38:00 gouthamr: that would count as a driver using those fields, and that's exactly what we're trying to determine
15:38:18 perhaps we just need to audit the code, but I was looking for anyone that knows of cases already
15:38:29 because vponomaryov pointed out the ZFS use case
15:38:53 the container driver refers to shares created with the share ID, not the instance ID
15:39:25 the container driver should be fixed then -- it's not like it has a good reason to do that, right?
15:39:39 bswartz: yes, compared to ZFS
15:40:40 can we agree in principle that we will do (3) and then pursue (2), assuming no other difficult cases besides ZFS come up?
15:40:54 since there are no volunteers to maintain ZFS I can take a look at that myself
15:41:22 I'm actually a heavy user of ZFS so I'm quite familiar with how it works
15:42:04 ZFSonLinux, not the Oracle ZFS driver
15:42:11 bswartz: nope, but vponomaryov will help with (3)?
15:42:25 (3) is already here -> https://review.openstack.org/#/c/473864/
15:42:38 I'll just upload one more patch set
15:42:39 gouthamr: if you read the #manila channel, that was discussed already
15:43:12 vponomaryov: awesome.. thanks. I'll see what needs to be fixed on the NetApp CI to unbreak ourselves
15:43:28 okay it sounds like we have a path forward here
15:43:39 I'll open up the floor to other topics
15:43:42 #topic open discussion
15:43:55 one note from me
15:44:02 From now on, I will only be able to spend my spare time on manila.
15:44:18 we hope you have lots of spare time
15:44:24 ^_^
15:44:28 :)
15:44:31 tbarron: +1000
15:44:43 vponomaryov: and seriously, we hope your non-spare time is rewarding to you!
15:44:55 ))
15:45:10 vponomaryov: that includes saturday, sunday, and every night. lots of time :)
15:45:18 reserve some time to argue with ganso and myself.
15:45:25 in my spare time I've been playing with the neutron l2gw plugin -- it may have some interesting applications for Manila
15:45:28 gouthamr: +1 xD
15:45:36 vponomaryov: you leave big shoes to fill
15:45:42 xyang2: i see you doing regular work during those times
15:46:09 markstur: just do your best )
15:46:15 tbarron: right. I sleep during the day :)
15:46:21 bswartz: 'splain about l2gw
15:46:25 markstur: and you will see it is not that big
15:46:28 vponomaryov: thanks for all you've done for manila -- we would never have been this successful without your efforts
15:46:42 vponomaryov: thank you!
15:46:51 xyang2: i sleep during meetings too
15:47:00 thanks guys, it is a pleasure to work with all of you
15:47:05 tbarron: :)
15:47:06 +1000
15:47:13 vponomaryov: =)
15:47:26 and argue too ^_^
15:47:29 #link https://github.com/openstack/networking-l2gw
15:47:31 :P
15:47:31 vponomaryov: thank you for reviewing my code.
15:47:36 #link https://docs.openstack.org/developer/networking-l2gw/readme.html
15:47:37 um, the thousand was thanks to VP, not in response to Tom's napping in meetings
15:47:51 * tbarron zzz....
15:48:42 ganso: where can I get a member ticket to the closed club of retired manila devs?
15:48:57 ^ this neutron plugin allows you to take a neutron network and extend it to an external VLAN network
15:49:07 vponomaryov: haha I'll email it to you :P
15:50:00 http://livinglifeph.com/wp-content/uploads/2016/11/metro-manila-retirement-hoppler-1.jpg
15:50:05 tbarron: it's primarily of interest to dhss=true drivers
15:50:19 gouthamr: ROFL
15:50:46 looks nice :)
15:50:56 bswartz: with off-cloud appliances, to help integrate them into neutron, right?
15:50:59 * bswartz doesn't want to live in the Philippines
15:51:15 tbarron: yep..
15:51:16 tbarron: that's the use case I'm interested in
15:51:51 bswartz: I'm just calling that out b/c there may be some other appliances that have similar needs
15:52:07 yes that's why I'm mentioning it too
15:52:09 and may share your interest
15:52:58 currently the netapp driver has awkward requirements to be able to run in dhss=true mode
15:53:14 the l2gw stuff may allow us to relax the requirements and work in more use cases
15:53:25 i guess every dhss=true driver in the tree, unless they natively support vxlan
15:53:28 http://pics4.city-data.com/cpicc/cfiles37329.jpg
15:53:44 bswartz: You can retire in Manila, Arkansas ^
15:53:53 >_<
15:54:00 hahahaha
15:54:07 North Carolina is fine with me
15:54:17 manila, Philippines is probably safer
15:54:24 markstur: "cry me a river"?
15:54:42 Speaking of shared interests, I'm starting to research "instance HA" for service VMs since, as manila uses them currently, they are a SPOF in the data path
15:54:56 if anyone else is looking at this please ping me
15:55:29 vponomaryov: :)
15:55:33 tbarron: related to that -- does anyone know anything about neutron "service" networks?
15:55:41 tbarron: Are there some links about that?
15:56:04 I'm not even sure what they're called in neutron or if they exist
15:56:13 zhongjun: not really that I know of, maybe I'll start a wiki or blog ...
15:56:26 certainly not called service-networks.. what are you thinking about?
15:56:47 networks that have ports on the control nodes
15:57:11 so m-shr could SSH to a service_instance through that port, for example
15:57:23 gouthamr: a common network for service needs that is available from hosts/controllers/compute nodes
15:57:42 gouthamr: but not exposed to users
15:57:45 vponomaryov: did you ever find any concrete information on that?
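For anyone unfamiliar with the l2gw plugin mentioned above, a rough sketch of its workflow based on the networking-l2gw README linked earlier: define a gateway device, then bridge a tenant network to a VLAN behind it. Command names and argument syntax should be verified against those docs; the device, network, and VLAN values here are made up:

```
# Define an L2 gateway backed by a switch device and one of its interfaces
neutron l2-gateway-create --device name=switch1,interface_names=eth0 gw1

# Bridge tenant network "net1" to VLAN 100 on that gateway, extending the
# neutron network to the external VLAN network (e.g. where a DHSS=true
# backend appliance lives)
neutron l2-gateway-connection-create gw1 net1 --default-segmentation-id 100
```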
15:57:49 bswartz: on that subject I've thought that maybe a bridge on the control node, plus an admin/service-user-owned neutron net connected to that bridge, may do what is needed
15:57:51 bswartz: no
15:57:58 okay
15:58:07 so that remains an area to research
15:58:32 yeah, that's the other area of the service instance module that really needs attention I think
15:58:40 besides the instance HA area
15:58:50 tbarron: the important thing would be neutron APIs to set that up though
15:59:10 how to get connectivity to SVMs w/o the @#$#@$ layer 2 stitching that we have to do now
15:59:25 bswartz: yeah, agree on the APIs
15:59:25 if it's not supported by neutron then it would be no better than the hack currently used by the generic driver
15:59:44 okay we're out of time
15:59:49 thank you all
15:59:57 #endmeeting