14:07:39 #startmeeting kuryr
14:07:40 Meeting started Mon Nov 13 14:07:39 2017 UTC and is due to finish in 60 minutes. The chair is irenab. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:07:41 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:07:43 The meeting name has been set to 'kuryr'
14:07:57 hi, who is here for the kuryr meeting?
14:08:19 Hi
14:09:31 o/
14:09:43 yboaron, dulek hi
14:09:44 o/
14:09:53 o/
14:10:28 #info yboaron dulek leyal ltomasbo irenab in a meeting
14:10:36 let's start
14:11:04 apuimedo asked me to chair the meeting today, he is sick and still recovering from the trip to the OS summit
14:11:39 I suggest we skip the kuryr and kuryr-libnetwork topics, unless you have something to update
14:12:16 #topic kuryr-kubernetes
14:12:35 dulek: do you want to update regarding the CNI split adventure?
14:12:57 irenab: Sure!
14:13:15 Sooo… ltomasbo found bugs that I was fixing for most of the last week.
14:13:45 Patches are ready again. The bugs were related to an incorrect assumption we had in Kuryr, that's why it took me a whole week.
14:13:46 well, I would say I hit the bug, but dulek found it
14:14:28 So this is the main patch: https://review.openstack.org/#/c/515186/, it's dependent on one more bugfix.
14:14:28 dulek: the missing container_id check?
14:15:03 irenab: Yes, but also some pyroute2 internal timeout issues.
14:15:23 I hope apuimedo will be well tomorrow and we'll be able to start merging the patches. :)
14:15:26 dulek: please share the link
14:15:46 #link https://review.openstack.org/#/c/515186/
14:16:06 #action apuimedo irenab to review https://review.openstack.org/#/c/515186/
14:16:33 irenab: Hm, I don't have a bug created for this pyroute2 issue as I think it only manifests in the CNI daemon case - so the fix for it is included. ;)
14:17:01 dulek: so the same patch, right?
14:17:05 Yes.
14:17:11 great
14:17:37 good that you and ltomasbo ran the scale scenario
14:17:46 irenab: https://review.openstack.org/#/c/515186/11/kuryr_kubernetes/cni/daemon/service.py@194 - if you're interested in what the fix looks like.
14:18:09 Yeah, thanks ltomasbo for running the tests!
14:18:24 np, glad to help!
14:18:36 dulek: so the fix is to increase the timeout?
14:18:50 and it was not that big a scale, just a bunch of pods on a single server (around 20-30)
14:19:29 irenab: More like to make it configurable. My kernel on a VM was just not fast enough to complete all the pyroute2 operations in the default 5 seconds.
14:19:52 I wonder if the timeout shouldn't be dynamically adjustable
14:20:13 irenab: In case a timeout is hit - kubelet will retry, so with bug 1731485 fixed we'll be fine anyway.
14:20:13 bug 1731485 in kuryr-kubernetes "Kuryr ignores CNI_CONTAINERID when serving requests" [High,In progress] https://launchpad.net/bugs/1731485 - Assigned to Michal Dulko (michal-dulko-f)
14:20:21 dulek: probably need to add some advisory on how to set the timeout
14:21:06 irenab: To be honest - I don't really know how to do that myself. I was just trying a few values; 30 seconds were enough for ~50 pods being created.
14:21:29 irenab: Plus this timeout doesn't mean we'll always be waiting that much. It's just the maximum wait time.
14:22:10 dulek: I am just not sure we should expose operator-facing configs if we are not sure how to set them…
14:22:51 irenab: I think I can add some documentation describing the relationships between the timeouts, as we have multiple options now.
14:23:26 good enough, let's keep a reasonable default and document the relationship to enable tuning
14:23:37 Will do!
14:23:46 dulek: thanks a lot
14:24:30 #action dulek to document the CNI daemon timeout config
14:25:01 dulek: anything else you would like to raise?
14:25:11 Nope, that's it, thanks!
14:25:40 ltomasbo: anything from your side?
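[Editor's note: the timeout semantics dulek describes above - a configurable upper bound, not a fixed delay - can be sketched as below. This is an illustrative helper, not Kuryr's actual code; the function name, defaults, and polling interval are all made up for the example.]

```python
import time


def wait_for(condition, timeout=30, interval=0.1):
    """Poll ``condition`` until it returns True or ``timeout`` seconds pass.

    Like the CNI daemon's configurable timeout, ``timeout`` is only a
    maximum wait: the call returns as soon as the condition holds, so a
    large value does not slow down the fast path.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False


# A condition that is already satisfied returns almost immediately,
# even though the configured maximum is 30 seconds.
start = time.monotonic()
satisfied = wait_for(lambda: True, timeout=30)
elapsed = time.monotonic() - start
```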
14:25:56 I have a patch about adding a readiness probe to the kuryr controller
14:26:03 when running in containerized mode
14:26:15 https://review.openstack.org/#/c/518502/
14:26:30 dulek already provided some reviews
14:26:43 ltomasbo: this should prevent the kuryr controller pod from being considered active till the pools are populated?
14:26:53 it basically makes the controller not ready until the precreated ports have been loaded
14:27:09 #link https://review.openstack.org/#/c/518502/
14:27:28 irenab, yep, until the existing ports are loaded into their respective pools
14:27:41 ltomasbo: any issue you want to discuss?
14:27:57 #action everyone please review https://review.openstack.org/#/c/518502/
14:28:06 I tested it and it seems to work fine
14:28:45 ltomasbo: One question… Besides the pod being not ready - does it affect whether it accepts requests?
14:28:56 ltomasbo: I wonder if there is supposed to be a single readiness probe per container or there can be a few?
14:29:35 irenab, that I don't know, but probably there may be more than one, need to check that
14:29:49 irenab, are you thinking of another test to be added?
14:30:17 ltomasbo: yes, maybe one will be needed once Network Policies are supported
14:30:29 dulek, I assume it should not accept requests, but didn't actually check
14:30:50 dulek: this is probably up to k8s to manage
14:31:17 irenab, worst case we can create a script to be executed that returns a given value, and that could include several checks
14:31:20 ltomasbo: If it's before the Watcher is started, then it won't.
14:31:30 ltomasbo: +1
14:32:05 dulek, good question, I will double check that
14:32:07 irenab: k8s can manage that if e.g. the Pod is added to a Service. But kuryr-controller has no API.
14:32:48 there is the one for the tool to populate pools
14:32:50 ltomasbo: It would be good to block annotating VIFs until we have all the info recovered. But that's most likely already done. :)
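[Editor's note: for readers unfamiliar with readiness probes, an exec-style probe of the general kind discussed here looks like the fragment below in a pod definition. This is a hypothetical illustration only - the script path and timings are invented, not taken from the actual patch (https://review.openstack.org/#/c/518502/).]

```yaml
# Hypothetical readiness probe for a containerized kuryr-controller.
# The check command would exit 0 only once the precreated ports have
# been loaded into their pools.
containers:
- name: kuryr-controller
  readinessProbe:
    exec:
      command:
      - /usr/bin/check-pools-populated   # hypothetical script
    initialDelaySeconds: 5
    periodSeconds: 5
```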
14:33:24 dulek: do you suggest an internal controller state of being active?
14:33:46 maybe a good idea
14:34:00 dulek, that was the intention with the readiness probe, but I just added the check, so probably not at the right place... :/
14:34:12 irenab: Yes, or more specifically - not to start any Watcher before all ports are recovered.
14:34:29 ltomasbo, is it a probe per pod/resource or a global one?
14:34:33 ltomasbo: Yeah, so if that was the intention, then k8s will not manage that for you on its own.
14:34:35 resource
14:34:56 dulek: yes, moving to Active will 'open' the controller to the external world
14:35:43 yboaron: Pod (which is the k8s Controller container)
14:35:51 irenab, dulek: I thought that was managed by kubernetes, if it was not ready, it would not receive requests
14:35:57 irenab: Oh? So until it's ready the pod will have no network connectivity? Then it will be unable to complete the recovery.
14:36:04 yboaron, yep, the kuryr-controller pod
14:36:19 ltomasbo: Yes, but what requests? The controller watches the k8s API, it doesn't do much more.
14:36:48 dulek, the controller calls neutron to get/create the ports
14:37:02 I think it can still work, but will allocate the ports via neutron
14:37:42 ltomasbo: Sure thing, but how can k8s block that? Let me try to find this in the k8s docs.
14:38:02 ltomasbo: what was your intention with the readiness probe?
14:38:16 dulek, irenab: "A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services."
14:38:48 ltomasbo: Exactly. And we're not using Services for kuryr-controller, as it doesn't have an API.
14:39:12 ltomasbo: so what is it supposed to serve?
14:39:14 ltomasbo: Where by Service I mean this: https://kubernetes.io/docs/concepts/services-networking/service/ - it's just a load balancer.
14:39:25 dulek, aren't we?
14:39:45 don't we have a service (lbaas) for the K8S API?
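[Editor's note: dulek's suggestion above - do not start any Watcher until port recovery is complete, instead of relying on Kubernetes to gate traffic - can be sketched as below. The class and method names are illustrative, not Kuryr's actual internals.]

```python
import threading


class ControllerSketch:
    """Illustrative only: gate event watchers on pool recovery."""

    def __init__(self):
        self._pools_ready = threading.Event()
        self.watchers_started = False

    def recover_precreated_ports(self):
        # In the real controller this would load existing Neutron ports
        # into their pools; here we only flip the readiness flag.
        self._pools_ready.set()

    def is_ready(self):
        # What an exec-style readiness probe could check.
        return self._pools_ready.is_set()

    def start_watchers(self):
        # Refuse to watch the k8s API until recovery is done, so the
        # controller cannot annotate VIFs from incomplete state.
        if not self._pools_ready.is_set():
            raise RuntimeError("port recovery not finished")
        self.watchers_started = True


controller = ControllerSketch()
controller.recover_precreated_ports()
controller.start_watchers()
```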
14:39:48 ltomasbo: but the k8s controller just watches the events, it does not serve any API requests
14:40:06 irenab: Right! :)
14:40:39 ltomasbo: I wonder if it is related to any further work of adding HA to the k8s controller
14:40:53 dulek, irenab: yep, but if kubernetes sets the pod as not ready, it will not perform the API actions regarding the kuryr-controller (the needed annotations)
14:41:35 ltomasbo: Can you check that? If it's true, then I'm totally wrong and should apologize.
14:41:59 no, most probably I'm wrong, I did not check that
14:42:09 I just assumed kubernetes was taking care of that
14:42:16 I will double check
14:42:25 thanks for pointing that out!
14:42:33 ltomasbo: so you suggest that k8s will ignore the changes applied by the k8s controller on the k8s API?
14:42:44 till it is ready?
14:43:09 that is what I understood from this: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-readiness-probes
14:43:27 #action ltomasbo check and update regarding the readiness probe for the k8s controller
14:44:07 ltomasbo: thanks, let's follow up on this after the check
14:44:31 leyal: do you want to update regarding Network Policy support?
14:44:42 yes,
14:44:50 I uploaded a spec -
14:45:00 link?
14:45:02 https://review.openstack.org/#/c/519319/
14:45:18 #link https://review.openstack.org/#/c/519319/ Network Policy Spec
14:46:00 everyone please review the spec
14:46:22 leyal: anything you want to raise now?
14:47:05 let's have the discussion on the patch..
14:47:22 sure, thank you for the update
14:47:42 it contains a lot of details..
14:48:00 anything else on kuryr-kubernetes?
14:49:21 #topic open discussion
14:49:47 any other issue/topic to discuss?
14:50:21 I hope next time we will have dmellado and apuimedo to share the inputs from the summit
14:51:10 Well, I think we can close the meeting a few minutes early
14:51:28 thanks everyone for joining
14:51:37 #endmeeting