14:00:50 #startmeeting kuryr
14:00:51 Meeting started Mon Jan 16 14:00:50 2017 UTC and is due to finish in 60 minutes. The chair is apuimedo. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:52 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:54 The meeting name has been set to 'kuryr'
14:01:10 Hello and welcome everybody to another Kuryr weekly IRC meeting
14:01:21 o/
14:01:22 o/
14:01:27 o/
14:01:28 Today vikasc and irenab have excused themselves and won't be able to join
14:01:30 o/
14:02:04 #topic kuryr-lib
14:03:40 o/
14:03:41 There's basically no news on kuryr-lib :-)
14:03:44 No news, good news
14:03:47 as they say
14:03:54 only this patch
14:03:58 #link https://review.openstack.org/#/c/418792/
14:04:19 which pisses me off a bit that it came so late
14:04:45 but... What can we do? We merge it now and we'll consider whether to do a point release to get the requirement upped
14:05:04 o/
14:05:46 Anybody got anything on kuryr-lib?
14:06:42 very well!
14:06:44 Moving on
14:06:48 #topic kuryr-libnetwokr
14:06:51 darn!!!
14:06:54 typo
14:06:56 eheh
14:06:58 #topic kuryr-libnetwork
14:07:15 ltomasbo: can you update us on the race?
14:07:25 Hi
14:07:40 the problem is that, after calling to attach the (sub)port to the trunk
14:07:49 we (kuryr-libnetwork) call update_port
14:07:59 and it seems attach_subport also calls update_port internally
14:08:05 therefore, there is a race there
14:08:12 and sometimes the device_owner is not properly set
14:08:29 which makes kuryr-libnetwork unable to remove the port after removing the container
14:08:53 since there is a filter by device_owner before removing the port
14:09:10 #link https://review.openstack.org/#/c/419028/
14:09:41 I've been discussing with armax the possibility of reverting the patch that sets device_owner for trunk ports
14:10:04 another easy fix could be to remove the device_owner filter we do on ipam_release before calling the remove_port
14:10:22 #info there is a race in the container-in-VM flow due to subport addition's usage of device_owner
14:11:06 I'm sorry, but I'm not sure I've understood the root cause of the race
14:11:15 ltomasbo: it is kind of a nice service we provide our users to mark which ports we manage for them automatically
14:11:26 apuimedo, yes, I agree
14:11:28 so we should try to preserve it as much as possible
14:11:46 mchiappero: basically we update the device_owner field to kuryr:container
14:11:50 the problem is that the patch I proposed to revert is setting a different device_owner on the subports
14:11:57 and trunk_subport_add does the same behind the scenes
14:12:07 therefore we call attach_subport and then update_port
14:12:18 but attach_subport internally calls update_port too
14:12:24 so IIRC, if the trunk_subport_add op finds it changed before it finishes, it goes boom. Is that right ltomasbo?
14:12:27 and both update_ports set a different device_owner
14:12:47 or is it only that it breaks our removal?
14:13:12 we need to call trunk_subport_add first and then update_port
14:13:26 since trunk_subport_add will fail if device_id is already set on the port
14:13:55 but trunk_subport_add also calls update_port to set the device_owner to trunk:subport
14:14:14 while we call update_port from kuryr-libnetwork to set device_owner to kuryr:container
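To make the ordering concrete, here is a minimal sketch of the two-call sequence ltomasbo describes, assuming python-neutronclient with the trunk extension; credentials and IDs are placeholders:

```python
# Minimal sketch of the sequence under discussion (assumes
# python-neutronclient with the trunk extension; all credentials
# and IDs below are placeholders).
from keystoneauth1 import identity, session
from neutronclient.v2_0 import client

auth = identity.Password(auth_url='http://controller:5000/v3',
                         username='admin', password='secret',
                         project_name='admin',
                         user_domain_id='default',
                         project_domain_id='default')
neutron = client.Client(session=session.Session(auth=auth))

trunk_id = 'TRUNK-UUID'      # the VM's trunk
subport_id = 'SUBPORT-UUID'  # the container's neutron port

# Step 1: attach the container port as a subport. Internally neutron
# also issues an update_port() that writes device_owner='trunk:subport'.
neutron.trunk_add_subports(trunk_id, {
    'sub_ports': [{'port_id': subport_id,
                   'segmentation_type': 'vlan',
                   'segmentation_id': 102}]})

# Step 2: kuryr-libnetwork marks the port as kuryr-managed. This update
# races with neutron's internal one: whichever lands last wins, so the
# port sometimes keeps 'trunk:subport' instead of 'kuryr:container'.
neutron.update_port(subport_id,
                    {'port': {'device_owner': 'kuryr:container'}})
```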
14:14:15 ltomasbo wait a sec, if trunk_subport_add fails, does that mean that it does not support device_id? or is it a bug in neutron?
14:14:42 with the current implementation, there is no problem with device_id, as we call trunk_subport_add first
14:14:52 the problem is that, as device_owner is not set to kuryr:container
14:15:00 the port will not be deleted after the container is removed
14:15:30 and the fact that trunk_subport_add fails if device_id is already set is not a bug
14:15:32 what is the device_owner then? if not 'kuryr:container'?
14:15:33 it is the way it should work
14:15:55 if the port is already in use, it should not be made part of a trunk
14:16:01 ivc_: no, it is trunk:subport
14:16:08 device_owner is set to trunk:subport
14:16:08 sometimes, depending on the race
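That lost update is what breaks the cleanup path. A rough sketch of the delete-side filter, under the same python-neutronclient assumptions as the previous sketch:

```python
# Why losing the race leaks ports: kuryr-libnetwork only deletes ports
# it recognizes as its own (the filter value mirrors the discussion;
# same python-neutronclient assumptions as the previous sketch).
for port in neutron.list_ports(device_owner='kuryr:container')['ports']:
    neutron.delete_port(port['id'])

# A subport left as 'trunk:subport' by neutron's internal update never
# matches the filter above, so it survives the container's removal.
```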
14:16:42 apuimedo thats what i see as the problem. neutron trunk code sets it to 'trunk:subport' and we reset it to 'kuryr:container', right?
14:16:55 i think we are breaking some neutron contract here
14:16:56 ivc_: I think so
14:17:03 no, no contract
14:17:05 yes
14:17:17 yes, no? :D
14:17:22 they only set it to trunk:subport for no specific reason
14:17:25 they just tagged subports as trunk:subport for simplicity
14:17:33 there is no real need for that in neutron
14:17:36 neutron might expect it to be 'trunk:subport'
14:17:48 maybe it's not used now, but we should not rely on it
14:17:59 ivc_: mchiappero: "Although this is not currently required by any of the business logic, it is handy to have this set to help users quickly identify ports used in trunks"
14:18:11 This is the justification for setting trunk:subport in the original patch
14:18:16 nothing uses this fact
14:18:19 'currently'
14:18:36 ivc_: this 'currently' has not changed
14:18:42 and there's no good reason it should
14:18:56 at some point neutron could add the business logic that would rely on it
14:18:57 I agree with using kuryr as device owner, but still, I don't fully understand whether it's a timing issue or what
14:18:57 we can modify (I'm waiting for armax's answer) the way trunk_subport_add sets the device_owner
14:19:04 ivc_: I agree with you on that
14:19:08 I find it misguided
14:19:15 and make it possible to not set it to anything (for the kuryr case)
14:19:59 ivc_: I honestly just don't see the point of setting it to trunk:subport
14:20:09 neither do I
14:20:14 when it is something that can be checked in the API, that it belongs to a trunk
14:20:21 I think it is just to easily find the subports
14:20:24 apuimedo i understand that, but thats the current api
14:20:26 but now the damage is done
14:20:29 exactly
14:21:08 imo best course of action is to update the neutron trunk api in a way that would allow us to legitimately set device_owner
14:21:16 yes, it seems reverting that could affect existing deployments
14:21:23 excluding the potential conflict between trunk code and kuryr
14:21:26 it is quite annoying, but we'll probably have to consider whether we stop marking subports, use tags, or something else
14:21:38 ivc_: that was my first thought
14:21:46 to extend the api of trunk_subport_add
14:21:51 so that you can pass it a device owner
14:22:10 (which by the way saves us one neutron call :P )
14:22:15 yup
14:22:29 ltomasbo: did you propose that to armax?
14:22:30 an easy modification could be to just change the config option and allow not setting any device_owner
14:22:35 Is there a mailing list thread for that?
14:22:38 without needing to modify the API
14:22:43 which is always trickier
14:23:02 ivc_: the problem with extending the API is that we're probably too close to freeze or already frozen
14:23:03 it is being discussed on the revert patch:
14:23:09 https://review.openstack.org/#/c/419028
14:23:12 ltomasbo: gotcha
14:23:49 -2 ...
14:23:55 :D
14:24:09 I know! But still discussing with armax
14:24:20 ltomasbo got the hammer of justice!
14:24:28 and I agree the revert may not happen, but we are using it to drive the discussion
14:24:37 and then (I suppose) another solution will be proposed
14:24:55 I see the other option, not setting TRUNK_SUBPORT_OWNER
14:25:02 ltomasbo: from the commit message discussion I see an interesting option
14:25:16 and making the code not set the device_owner on trunk_add_subport
14:25:20 we could probably argue for leaving the device_owner unchanged if it is not None
14:25:22 it will be just a couple of lines
14:25:25 so how tricky would it be to keep the 'trunk:subport' owner? do we have some sort of workaround?
14:25:30 and move the update before the trunk_subport_add
14:25:44 sorry but I guess there is no guarantee on the serialization of the operations in neutron
14:25:53 but shouldn't it be that way for the same port?
14:26:09 shouldn't actions on the same port be serialized?
14:26:20 wouldn't this solve the issue?
14:26:30 mchiappero: not sure
14:26:37 mchiappero afaik they are. as soon as you get confirmation for your request, it is committed
14:27:08 yes, but calls are async, so they can be executed in different orders
14:27:09 so in this case our update port could get confirmed first, right?
14:27:15 ivc_: re keeping trunk:subport, it would break our contract of marking our resources
14:27:19 the problem is not the race as i understand it but just the conflict between kuryr's port update and trunk logic
14:27:32 yes and no
14:27:42 ivc_: For me the race is just a symptom
14:27:57 if it ensures that our call happens after trunk_add_subport has fully finished
14:28:01 apuimedo: right
14:28:05 then from kuryr's point of view, that will work
14:28:14 but I agree with apuimedo
14:28:17 o/
14:28:21 the problem is the use of device_owner
14:28:29 exactly
14:28:33 yup
14:28:46 I would expect neutron to perform some ordering or take some port-specific lock
14:28:48 I would really like to have the extra parameter in trunk_subport_add
14:29:06 but that will not solve the problem
14:29:15 if we set device_owner to whatever we want
14:29:19 we can still do that
14:29:23 but the problem will be the same
14:29:36 not a unified view of what device_owner should be about
14:29:57 ltomasbo: recognizing that it can be set to something else kind of forces Neutron to acknowledge that this field is informative for them and not for logic
14:30:06 ltomasbo, toni's point is if we get 'device_owner' as part of the 'trunk_subport_add' api, it would make the 'kuryr:container' device owner legit
14:30:18 got it
14:30:22 ivc_: or a "use at your own risk"
14:30:34 depends on the wording in the method doc
14:30:39 that gets merged
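What ltomasbo is proposing to armax would look roughly like this from kuryr's side. The device_owner key in the subport body is purely hypothetical; it does not exist in the current trunk API:

```python
# Hypothetical sketch only: the trunk_subport_add extension being
# proposed to armax. The 'device_owner' key below does NOT exist in
# the current neutron trunk API.
neutron.trunk_add_subports(trunk_id, {
    'sub_ports': [{'port_id': subport_id,
                   'segmentation_type': 'vlan',
                   'segmentation_id': 102,
                   'device_owner': 'kuryr:container'}]})  # proposed field

# Neutron's single internal update_port() would then write the
# caller-supplied owner, removing both the race and kuryr's extra call.
```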
14:31:09 of course, for kuryr deployments we can state that TRUNK_SUBPORT_OWNER should be set to kuryr:container
14:31:19 I still don't fully understand: does setting the owner after a fully finished and successful trunk_subport_add work?
14:31:21 and that will solve the problem, at the expense of flexibility
14:31:23 ivc_: I didn't want to say it. But the right fix, things being what they are now, would be that there would be multiple owners
14:31:39 but that is even more API breaking :/
14:31:52 mchiappero: it does
14:31:59 ok, so the problem is neutron
14:32:05 but it makes us tread in unsafe waters
14:32:19 that's something for them to fix
14:32:31 in that if neutron subport-related code started relying on this (like for upgrades)
14:32:39 it could render our subports useless
14:33:12 the fix I'd like to avoid, but that would work right away
14:33:18 apuimedo i think multiple owners would only add confusion. the 'owner' should be unique, but the device_owner field should not be used for storing 'informative' data as is the case with 'trunk:subport'
14:33:32 ivc_: agree
14:33:32 is to use a new tag instead of the device_owner
14:34:21 imo kuryr is the real owner in this case
14:34:32 ivc_: I agree with you, if Neutron wanted so much to have this informative bit to avoid checking in the DB whether it was a subport, it could have added a field for that
14:34:39 ivc_: no question about that
14:34:45 but we need a pragmatic solution
14:35:06 let's wait to hear what ltomasbo gets from armax on proposing a new parameter for the trunk operations
14:35:47 would you agree with/like the other solution? just disabling trunk_add_subport's write to device_owner?
14:35:51 the problem is HCF
14:35:56 #action ltomasbo to continue discussion with armax, proposing trunk_subport_add to optionally receive an API owner name
14:36:13 ltomasbo: disabling it how?
14:36:16 ivc_: HCF?
14:36:23 hard code freeze
14:36:41 as they just set whatever value is in TRUNK_SUBPORT_OWNER in the config.py file
14:36:45 just setting that to None
14:37:02 and if it is set to None, then just don't call update_port
14:37:15 and if it is set to whatever it is, keep working as it is
14:37:22 so that it does not break current deployments
14:37:32 ltomasbo: that would apply to all the ports, not just those used by kuryr
14:37:44 only to subports
14:38:00 right, but a user may use subports for other purposes
14:38:20 apuimedo ltomasbo if we look at it from a different perspective, do we need 'device_owner' for cleanup only?
14:38:38 we don't even really need it
14:38:45 it is just a filter to speed up the search
14:38:55 ivc_: we use it for cleanup. But its main purpose is to notify users that it is automatically handled by kuryr
14:39:10 it was sort of... Since it is already there, let's use it
14:39:47 to me, it makes sense, as it is the kuryr service creating/managing them
14:39:49 I used it a lot while working on ipvlan
14:39:50 as much as i dislike special-casing, maybe then we could have a special case for 'trunk:subport' that would fetch the ports for kuryr-managed nodes somehow
14:39:58 i often had leftovers
14:40:10 while tagging that it is a subport should not go in device_owner, but in device_type or something like that if they need it
14:40:13 ofc until we get a proper api update on the neutron side
14:41:36 ivc_: let's wait a week to see what the Neutron people say
14:41:48 and then we can decide on contingency measures
14:41:50 sure
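ivc_'s special-casing fallback might look something like this; it is a hypothetical sketch (the managed_trunk_ids input and the helper itself are illustrative, not agreed-upon design), again assuming python-neutronclient's trunk calls:

```python
# Hypothetical sketch of ivc_'s special-case fallback (not an agreed
# design): sweep subports on kuryr-managed trunks whose device_owner
# was left as 'trunk:subport' because neutron's internal update won.
def kuryr_leftover_subports(neutron, managed_trunk_ids):
    """Yield IDs of ports kuryr should still clean up despite the race.

    managed_trunk_ids is an illustrative input: the trunk IDs of the
    VMs this kuryr instance manages.
    """
    for trunk_id in managed_trunk_ids:
        for sub in neutron.trunk_get_subports(trunk_id)['sub_ports']:
            port = neutron.show_port(sub['port_id'])['port']
            if port['device_owner'] != 'kuryr:container':
                yield port['id']
```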
14:42:32 anything else about kuryr-libnetwork?
14:43:34 very well
14:43:35 moving on
14:43:42 #topic fuxi
14:43:50 #chair hongbin
14:43:51 Current chairs: apuimedo hongbin
14:43:53 hi
14:44:06 in the last week, there were several proposed fixes
14:44:17 #link https://review.openstack.org/#/c/419767/
14:44:19 hongbin: today I was asked about fuxi on magnum. Do we have some docs on that? Or does it only target bare metal?
14:44:39 apuimedo: i am happy to explore fuxi on magnum
14:44:56 apuimedo: it is definitely one of the targets
14:45:03 apuimedo: there are several things that need to be done
14:45:13 apuimedo: 1. containerized fuxi
14:45:19 apuimedo: 2. trust support
14:45:34 apuimedo: then, we are ready to propose it to magnum
14:45:57 cool
14:46:04 sorry for the interruption
14:46:09 apuimedo: np
14:46:09 :-)
14:46:24 yes, to continue,
14:46:36 i was trying to move fuxi to py35
14:46:43 #link https://review.openstack.org/#/c/419683/
14:47:03 the last one, i have a PoC for multi-tenancy support
14:47:12 #link https://review.openstack.org/#/c/420386/
14:47:25 all of those are under review, feedback is appreciated
14:47:31 apuimedo: that is all from my side
14:47:51 hongbin: I suppose swift's failure with py3 is reported as a bug, right?
14:47:55 thanks hongbin
14:48:02 #topic kuryr-kubernetes
14:48:09 apuimedo: yes, we worked around swift
14:49:04 #info ivc_ is a new core for kuryr-kubernetes! Congratulations!
14:49:19 thanks! :)
14:49:20 ivc_: and now that you are congratulated, pls review https://review.openstack.org/#/c/419933/ for merge :P
14:50:00 #info vikasc reported that he is finishing addressing ltomasbo and ivc_'s comments on https://review.openstack.org/#/c/410578/
14:50:29 hongbin: we would appreciate help in drafting a plan to integrate kuryr-kubernetes with Magnum once that patch is merged
14:50:48 a TODO list or something like that
14:50:57 apuimedo: i can try
14:51:05 nice, I'm trying to follow the instructions to set it up, but so far no luck with it
14:51:05 :-)
14:51:07 thanks hongbin
14:51:19 ltomasbo: you mean vikasc's patch?
14:51:23 yep
14:51:31 using the devstack templates
14:51:33 ltomasbo: ping him in the morning then :P
14:51:40 I'll do
14:51:43 cool
14:51:51 ivc_: any news on the services front?
14:52:07 nope
14:52:31 very well
14:52:36 services == lbaas/octavia?
14:52:44 ltomasbo: neutron-lbaasv2
14:52:55 we should add octavia after that
14:53:05 is there a patch on that already? I would like to take a look
14:53:06 ltomasbo i think toni is referring to the split of https://review.openstack.org/#/c/376045/ :)
14:53:10 it should be a matter of changing the driver
14:53:28 great!
14:53:30 ltomasbo: there is a patch, the one ivc_ linked to. However, it needs a bit of splitting and UT
14:53:54 should that include the floating ip support too (in a follow-up patch)?
14:54:12 let me check
14:54:27 (maybe it already does...)
14:54:32 ltomasbo it does not
14:54:33 just asking...
14:54:39 ok
14:55:01 I was checking if you could define an externalIP for a pod
14:55:04 apparently you can't
14:55:12 so yeah, it should be a follow-up patch
14:55:53 anybody got anything about kuryr-kubernetes?
14:56:20 apuimedo ltomasbo thats a rather trivial change technically, but i'm not yet certain if floating ip is the right fit for external IP
14:57:11 ivc_: why?
14:57:38 I think it is something you can add to the VIP in lbaas, so that the load balancer can be reached from outside
14:58:09 ltomasbo apuimedo because there's also the 'loadbalancer' type of service
14:58:58 my understanding was that an external IP (https://kubernetes.io/docs/user-guide/services/#external-ips) from the k8s point of view is an IP configured on the node's interface
14:59:04 apuimedo: y'all wrapping up?
14:59:11 sigmavirus: we are
14:59:15 sorry about that
14:59:21 No problem :)
14:59:27 let's move to the channel
14:59:32 thank you all for joining
14:59:36 #endmeeting
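For reference on the services discussion above: one possible mapping of a Kubernetes external IP onto neutron-lbaasv2 plus a floating IP, sketched with python-neutronclient. The IDs are placeholders and this is not the driver code from the patch under review:

```python
# Sketch of the floating-IP-for-externalIP idea from the discussion
# (assumes neutron-lbaasv2 and the same python-neutronclient setup as
# the earlier sketches; IDs are placeholders, not kuryr-kubernetes
# driver code).
pod_subnet_id = 'POD-SUBNET-UUID'
external_net_id = 'EXTERNAL-NET-UUID'

# Create the load balancer that would back the Kubernetes Service.
lb = neutron.create_loadbalancer(
    {'loadbalancer': {'name': 'svc-frontend',
                      'vip_subnet_id': pod_subnet_id}})['loadbalancer']

# Associating a floating IP with the VIP port is what would make the
# Service reachable from outside, mirroring an externalIP.
neutron.create_floatingip(
    {'floatingip': {'floating_network_id': external_net_id,
                    'port_id': lb['vip_port_id']}})
```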