09:01:03 #startmeeting dragonflow
09:01:04 Meeting started Mon May 16 09:01:03 2016 UTC and is due to finish in 60 minutes. The chair is gsagie. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:08 The meeting name has been set to 'dragonflow'
09:01:09 Hello everyone,
09:01:11 hi
09:01:16 Who is here for dragonflow meeting? :)
09:01:18 Hi
09:01:19 hi
09:01:50 hello
09:01:54 hi
09:01:55 Hi
09:02:06 yo
09:02:22 #info scsnow, DuanKebo, gampel, yuli_s, oanson, Shlomo_N, vikasc_, gsagie in meeting
09:02:41 ok everyone, we are doing the first meeting after the summit
09:02:54 let's focus on the version stability and bugs
09:03:08 and try to get to a version we can tag
09:03:15 #topic bugs - security groups
09:03:53 DuanKebo: i enabled tempest security groups tests and some tests are failing; we need someone to look at the logs and fix these bugs
09:03:57 you can see the scenarios here:
09:04:19 http://logs.openstack.org/58/316558/2/check/gate-tempest-dsvm-dragonflow/6e108c9/logs/testr_results.html.gz
09:04:33 If you go down you will see .TestSecurityGroupsBasicOps
09:04:52 there is a good chance that the failures are related to floating IPs as well, so we need to investigate this
09:04:55 Hi gal, np
09:05:06 We also should investigate the other failing tests there (many related to floating IP)
09:05:11 okie great
09:05:24 yes, we need to fix all the errors in tempest
09:05:32 #action DuanKebo to investigate failing security groups tempest tests
09:05:45 Any more known bugs for security groups?
09:05:46 i have numerous problems with FIP
09:05:57 let's finish with security groups for a sec and then move to FIP
09:06:02 but each time the bug is different and I cannot reproduce the same bug
09:06:20 yuli_s: any important bugs open for security groups?
09:06:52 I saw we have a patch that implements port security; we need to review this but it's less critical right now
09:07:06 oanson: noticed any bug with security groups?
09:07:10 i have seen the following problem: a packet coming from outside
09:07:14 No.
09:07:21 Nothing new that isn't reported
09:07:22 to a FIP was not passing the firewall
09:07:52 yuli_s: you mean the SG rules
09:07:54 yuli_s: do we have a bug report?
09:08:28 and does someone from DuanKebo's team know this bug? i think we need to work closely with them and report and assign the bugs to the appropriate person
09:08:37 nope, I will try to debug this. all the time I have different problems with FIP
09:08:57 but I cannot reproduce something specific
09:09:11 yuli_s: we are still on security groups :)
09:09:27 you have problems with FIP and security groups?
09:09:39 yes,
09:10:14 i will try to reproduce one of the bugs and report it
09:10:16 ok, we need to try and identify a problem and a way to reproduce it in order to turn this into a bug
09:10:41 reported bugs with no reproduction are problematic
09:10:57 okie, and then we can assign this to DuanKebo's team
09:11:12 I have reported an SG bug and it is still not assigned to anyone: https://bugs.launchpad.net/dragonflow/+bug/1577121
09:11:14 Launchpad bug 1577121 in DragonFlow "VM isn't getting an IP using Nova api boot" [High,New]
09:11:52 humm
09:11:56 are you using nova-network commands
09:12:03 ?
09:12:03 Hi Shlomo, i will check the SG bug
09:12:23 and find someone to solve it.
09:12:24 ok, 10x DuanKebo
09:12:25 gampel: i believe this might be the problem, i think he is using the client
09:12:37 the python client
09:12:38 Hi folks, i am a newbie and would like to start with an easy bug. Please let me know if I can help.
09:12:52 vikasc_: welcome :)
09:13:00 vikasc_: would you like to look at this bug?
09:13:01 gsagie, thanks :)
09:13:02 welcome!
09:13:15 Shlomo_N: please add to the bug report details of how you create the VM with Nova
09:13:17 the one which Shlomo_N reported?
09:13:23 sure
09:13:23 or the python script
09:13:35 vikasc_: ok great, will assign this to you
09:13:36 https://bugs.launchpad.net/dragonflow/+bug/1577121
09:13:37 Launchpad bug 1577121 in DragonFlow "VM isn't getting an IP using Nova api boot" [High,New]
09:13:44 Shlomo_N: I commented on your performance patch; we support only neutron networking, i am not sure if this is related
09:13:57 vikasc_: or just assign this to yourself
09:14:05 vikasc_: welcome
09:14:14 gsagie, thanks. will assign
09:14:19 #action vikasc_ look at bug 1577121
09:14:32 gampel: I have used the nova boot command
09:14:35 gampel, I was able to set up with your pointers. Thanks :)
09:14:40 #action Shlomo_N update bug 1577121 with steps to reproduce (script or CLI commands to create VM with Nova)
09:14:41 bug 1577121 in DragonFlow "VM isn't getting an IP using Nova api boot" [High,New] https://launchpad.net/bugs/1577121
09:14:45 you are welcome
09:15:12 vikasc_: feel free to ask in the channel if you encounter any problems, and thanks for your help
09:15:19 #topic bugs - distributed DNAT
09:15:25 gsagie, sure
09:15:37 okie, any problems regarding FIP?
09:15:41 i saw oanson has a patch
09:15:54 #link https://review.openstack.org/#/c/316558/
09:16:16 This patch is l3 general - not DNAT specific
09:16:16 DuanKebo: i need you to look at this patch and review it as well please; i believe you also investigated this issue and said there is a missing RPC call
09:16:40 yuli_s: did you test the patch?
09:16:59 gampel, I am running stack now with this patch
09:17:03 gampel: i reviewed the patch, it has some comments, we don't need to use the Neutron DVR
09:17:23 classes
09:17:27 Ok, I will review it today
09:17:44 gsagie, I am not certain that's true
09:17:50 OK i will review it
09:17:53 Neutron detects the type of agent.
09:17:56 gampel: do you still want us to test this patch?
09:17:56 today or tomorrow
09:18:11 If the agent is legacy, not DVR, neutron assumes the router shouldn't be on the compute node.
09:18:27 In that case, the agent doesn't receive the routers to update (during, e.g., sync)
09:18:47 If the agent doesn't mark the routers as updated, Neutron code deletes the related namespaces
09:19:02 oanson: so what you are saying is that Neutron doesn't work unless it's working in DVR mode
09:19:24 In our deployment
09:19:31 Since we are doing a form of DVR
09:19:40 routers are implemented on compute nodes, and not network nodes.
09:19:57 we are not doing a form of Neutron DVR; let's just continue to investigate this problem
09:20:13 gsagie, I'd be happy if you could explain
09:20:33 oanson: But the l3-agent is not aware of that, so it should work
09:21:03 DVR is implemented internally in DF; Neutron l3 is not aware of this. On the network node we do the same behaviour as the legacy Neutron implementation (for SNAT)
09:21:21 also we need to pay attention that with the added code the FIP namespaces are not added
09:21:28 But we can take this offline
09:21:44 yes, I agree, let's take this offline and let's all review the patch
09:22:01 i was the one that wrote the Neutron separation code between the DvrLocalRouter and DvrEdgeRouter, so i am familiar with this code
09:22:30 DuanKebo: in general i think that both yuli_s and oanson see that the FIP feature is not stable enough right now, is that correct oanson and yuli_s?
09:22:44 gsagie, so can I expect a walkthrough published soon? :)
09:22:47 gsagie, yes
09:23:10 oanson: np :)
09:23:22 you both can report bugs about it, let's work together to stabilize it
09:23:30 DuanKebo: is anyone from your team currently testing the version?
09:23:33 sure
09:23:39 we need more testing on the FIP part
09:24:01 DuanKebo, what I saw is reported. I was trying to solve the bug with the patch above.
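The l3-agent sync behaviour oanson describes above (routers a legacy agent is not told about get their namespaces deleted) can be sketched in a few lines of Python. This is a simplified illustration of the logic under discussion, not Neutron's actual code; the function name is hypothetical:

```python
def stale_routers(hosted_router_ids, synced_router_ids):
    """Simplified sketch of the cleanup step under discussion: routers the
    agent hosts locally, but that the server did not include in the sync
    reply, are considered stale, and their namespaces get deleted.

    For a legacy (non-DVR) agent on a compute node, the server assumes no
    routers belong there, so synced_router_ids comes back empty and every
    locally hosted router would be torn down.
    """
    return sorted(set(hosted_router_ids) - set(synced_router_ids))
```

Under this sketch, a deployment that hosts routers on compute nodes while registering a legacy agent would see all of its router namespaces flagged stale, which matches the failure mode discussed here.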
09:24:12 DuanKebo: will it be possible for your team to do some FIP and SG testing
09:24:22 btw I have this problem: https://bugs.launchpad.net/dragonflow/+bug/1581946
09:24:23 Launchpad bug 1581946 in DragonFlow "when df-l3-agent is used external network in not created" [Medium,New] - Assigned to Duan Kebo (duankebo)
09:24:58 We need to make sure that we have good reproduction steps in all the bugs
09:25:26 yuli_s, I think this is the l3 agent issue. I think these networks are only created when q-l3 is enabled.
09:25:27 We will do some FIP tests after this iteration.
09:25:36 yuli_s: can you please go over all the open FIP bugs that are unassigned and assign them to DuanKebo's team?
09:25:53 i will go over it and check that all the bugs are clear enough
09:26:05 DuanKebo: if there are any bugs you need more information about, please reach out
09:26:16 some of the descriptions are not enough
09:26:30 DuanKebo: this bug is assigned to you, are you looking into it?
09:26:51 np, you can assign it to me.
09:27:08 It is already assigned, but if you cannot, I can look into it
09:27:38 I can, thank you Eran!
09:27:54 okie, anything else on FIP?
09:28:04 let's organize the bugs into priorities
09:28:25 #action yuli_s organize bugs into priorities and assign unassigned bugs
09:28:37 nick-ma here?
09:28:53 yes
09:28:59 #topic DB consistency
09:29:12 nick-ma: saw the patches, anything that needs to be discussed?
09:29:36 currently the db objects are all versioned. i'm discussing with hujie. he will follow the spec to go to the next step.
09:29:47 okie great, good job
09:30:03 #action hujie continue versioned object work
09:30:06 good job with the lock :)
09:30:14 gampel: anything else on this topic?
09:30:22 hear hear
09:30:37 we need to continue the versioned object work to the local cache
09:30:53 gampel: yes, someone needs to do a design for this
09:30:54 I will try to add this
09:31:00 i can take it if you want
09:31:00 yes, and pub/sub.
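The version-based consistency idea discussed in this topic (versioned DB objects, with the local cache updated via pub/sub events) can be sketched roughly as follows. The class and method names here are hypothetical illustrations, not the actual Dragonflow code:

```python
class LocalCache:
    """Minimal sketch of version-gated cache updates: a pub/sub event is
    applied only if its object version is newer than the cached copy, so
    stale or reordered events cannot overwrite fresher DB state.
    (Hypothetical API, not the actual Dragonflow implementation.)
    """

    def __init__(self):
        self._objects = {}  # key -> (version, value)

    def apply_update(self, key, version, value):
        """Apply a pub/sub update; return False if it is stale."""
        cached = self._objects.get(key)
        if cached is not None and cached[0] >= version:
            # Stale or duplicate event: the copy we hold is at least as new.
            return False
        self._objects[key] = (version, value)
        return True

    def get(self, key):
        entry = self._objects.get(key)
        return entry[1] if entry else None
```

A subscriber that falls too far behind would still need a full resync against the DF DB; this sketch only covers the per-event version check.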
09:31:24 nick-ma: yeah, send the version and update the cache accordingly
09:31:26 yes, the high level design is in the pub/sub spec, i will review whether we need to update it
09:31:36 Eran, hujie is working on it
09:31:53 DuanKebo: so no help is needed from me for the local cache and pub-sub for versions?
09:31:54 consistency between df db and local cache
09:32:10 Ok great, tell him to talk with me and review the pub/sub spec
09:32:25 need you to review ;-_
09:32:27 #action hujie work on pub-sub version and local cache - db consistency using object versions
09:32:30 ;-)
09:32:34 will do
09:32:37 #topic open patches
09:32:45 okie, DuanKebo, now it's your time :)
09:32:53 what are the urgent patches that need to be reviewed
09:32:57 and we haven't reviewed yet?
09:33:14 we have several patches to review
09:33:35 some are in merge conflict, so need to be resolved
09:33:44 the redis ha patch and ml2 driver patch
09:34:09 #action nick-ma,gampel,gsagie review Redis HA and ML2 driver patches
09:34:14 anything else urgent?
09:34:26 I think there's another patch that we need to take care of, https://review.openstack.org/#/c/315433/, this is not a new feature, but has lots of changes on interfaces.
09:34:29 Do we want to stabilize the main branch before merging the ML2?
09:35:08 gampel: i think the ML2 is not related to any code
09:35:13 we can make ML2 an option in devstack.
09:35:21 so it's transparent; if it's not, then we shouldn't merge it at this point
09:35:39 i agree with gal
09:35:45 #action gsagie,gampel review https://review.openstack.org/#/c/315433/
09:35:54 thanks nick-ma, will take a look
09:36:03 Ok, no problem, we just need to make sure we can have a stable tag soon
09:36:22 yes, we are not merging anything that might hurt stability, or new features, at this point until we tag
09:36:23 Okie.
09:36:41 ok
09:37:09 DuanKebo: we will take a look at these patches ASAP
09:37:33 any other important open patch that needs our attention?
09:37:33 OK, thx to all of you!
09:37:52 I will also encourage the guys in Beijing to review
09:38:11 btw DuanKebo, one thing that is missing in the ML2 patch is another patch to add an L3 service plugin to run with it
09:38:35 #topic metadata service
09:38:36 Yes, we are working on it
09:38:41 DuanKebo: ok great
09:38:45 oanson: :)
09:38:49 Yes
09:38:58 I have uploaded a metadata service spec
09:39:05 it is here: https://review.openstack.org/#/c/309753/
09:39:18 #link https://review.openstack.org/#/c/309753/ metadata service spec
09:39:20 There are some open issues that I would like to cover
09:39:26 For instance the namespace issue
09:39:33 oanson: can you remind us of the issue?
09:39:41 I think the service can be done without a namespace. I think it is secure enough
09:39:50 I think it will lead to better performance.
09:39:59 oanson: you mean as an application?
09:40:05 or as a process
09:40:19 I mean as a service within the DF controller
09:40:21 gampel: as a process, i believe
09:40:32 or a separate process - whichever is preferable.
09:40:34 or a service
09:40:44 a thread in the df_controller?
09:40:49 gampel, yes
09:41:06 why do you think it will give better performance?
09:41:14 Fewer processes means less bookkeeping. So within the DF controller seems preferable.
09:41:39 If the service is run within a namespace, then we need a veth pair and NAT to allow packets to reach the nova API.
09:41:48 Without the namespace, this can be waived.
09:42:09 i think a thread is going to be problematic, but i don't think performance is the main issue here regardless, security and isolation are
09:42:25 gsagie, please explain.
09:42:31 But then it is in Python, performance will be bad
09:43:05 i agree with gampel here, i don't think we want to run it in a thread either way, another process is ok. what we are talking about is the namespace or not the namespace
09:43:21 The performance is not the main issue. Without a namespace we have simpler code.
09:43:26 Especially if it is run as a thread.
09:43:41 Why don't we want to run it in a thread?
09:43:51 But then your controller is busy with data path
09:43:57 DuanKebo: basically, according to oanson's spec, metadata traffic is going to go from the VM to a process running on the compute host (our metadata service); is this something that is acceptable, or do you think this process needs to run in a separate namespace for isolation?
09:44:37 is it not an app? just like dhcp?
09:44:45 DuanKebo: not only an app
09:44:48 And why do we need isolation? The code is safe. External data is sanitised (in some cases twice).
09:44:49 as we do TCP termination
09:45:13 so we have an app to install flows and a "service" that adds the HTTP headers
09:45:16 gampel, It's very little data, and very little to deal with. And if done from the controller, we can use the cache
09:45:26 and traffic is directed to (oanson please correct me if i got something wrong)
09:45:30 it
09:45:38 I'm not sure about this question, currently we don't work on the metadata service.
09:45:56 gsagie, the traffic is always directed to 169.254.169.254. We divert it in OVS.
09:46:07 DuanKebo: oanson works on it, we just wanted to consult whether the way we are planning to do it is ok with you
09:46:12 But essentially, yes, the traffic is directed to the service.
09:46:29 I think that the DF controller should not be part of the data PATH if it is not mandatory; you could have multiple VMs or containers trying to connect to the metadata server at the same time
09:46:48 OK, i will review the patch. But i think performance is not the problem.
09:46:56 for metadata.
09:47:06 DuanKebo: yes, mostly a security issue
09:47:23 yes. the metadata only serves the local instances.
09:47:38 i don't think the namespace addition is a big overhead either way, especially since it's only one namespace, but as oanson indicated it's simpler without it
09:47:39 gsagie, I am missing the security issue.
09:47:41 i am ok with this
09:47:44 I am back
09:48:03 problem with Internet connectivity
09:48:23 gsagie: what are your security concerns, a DOS attack?
09:48:30 if the code guarantees the isolation by nature, why not use a thread instead of a process in a namespace.
09:49:04 gampel: it's not only security, we are attaching processes and binding ports on the host machine without doing isolation for no good reason, but as i mentioned it's not so important to me
09:49:08 +1 nick
09:49:16 DOS attacks are not solved with namespaces anyway.
09:49:22 +1 nick-ma
09:49:23 i suggested on the spec to email the mailing list to ask about it
09:49:50 I explain in the spec exactly how and why it is safe.
09:49:57 nick-ma: it's about binding a process to the host networking
09:50:06 We only bind to the one address (169.254.169.254) on an ephemeral port
09:50:11 okie
09:50:12 My feeling is that the user space threads we are using are very problematic (signals, core, etc.); I am not for adding data path in a thread
09:50:49 got it.
09:50:51 if everyone is ok without the namespace it's fine with me, i don't have a -1 on the patch anyway
09:50:58 let's continue to the next topic
09:51:23 another issue with this patch: i feel that using ports is much simpler
09:51:25 it's not that critical either way, switching to a namespace will not be too hard if we find problems
09:51:33 To summarize: separate process. No namespace.
09:51:41 gampel: i agree
09:52:15 gampel, what do you mean?
09:52:30 oanson: using TCP ports and not changing IPs, i believe
09:52:46 oanson: I think that using the IP address for the redirection is too complicated; it will be much simpler with port redirection
09:53:14 I am not sure I see how. We have to override one field. We have to somehow keep the original value of that field
09:53:24 What does it matter if it's IP or port?
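For context on the IP-versus-port question, the spec's IP-based scheme (the OF port is encoded in the rewritten source IP, so the service can immediately identify the source VM) could be illustrated roughly like this. The 240.0.0.0 base range and the function names are assumptions made for the sketch, not necessarily what the spec actually uses:

```python
import ipaddress

# Assumed reserved base range for the rewritten source IPs;
# the real spec may choose a different range or encoding.
_BASE = int(ipaddress.ip_address("240.0.0.0"))

def encode_ofport(ofport):
    """Map an OVS OpenFlow port number to a synthetic source IP (sketch)."""
    return str(ipaddress.ip_address(_BASE + ofport))

def decode_ofport(ip):
    """Recover the OVS OpenFlow port number from the synthetic IP (sketch)."""
    return int(ipaddress.ip_address(ip)) - _BASE
```

Round-tripping through a pair of functions like these is what would let the service know which local VM originated a metadata request; the port-based alternative gampel suggests would instead keep the original source IP and track the connection by TCP port.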
09:53:25 oanson: no need for the new ARP
09:53:29 responders
09:53:33 we can manage this per network
09:53:37 and not per port
09:53:43 but again, let's take this offline
09:53:47 gsagie, that's not true, since we need to support any IP address anyway.
09:53:56 ARPs are sent anyway.
09:54:00 oanson: that's not how this works in Neutron
09:54:03 oanson: it is much easier to review and debug, and simpler to understand; the only drawback is the size
09:54:12 as we discussed, cloud-init happens after the instance gets an IP
09:54:13 gsagie, Yes. That's the point. :)
09:54:28 so we don't need to support "Auto IPs"
09:54:41 gsagie, which is irrelevant (as I mentioned in the spec). Any IP is supported.
09:55:01 The proposed solution supports auto IPs (and any other IPs) automatically.
09:55:23 gampel, I don't see the difference.
09:55:31 okie, let's continue this on the review
09:55:35 or offline
09:55:35 The OF port is encoded in the new IP. So we immediately know the source VM anyway.
09:55:40 want to give Shlomo some minutes
09:55:48 #topic performance testing
09:55:52 Shlomo_N: any update?
09:56:00 10x :)
09:56:26 I am working on the auto-testing framework
09:56:38 You can already use it for env. setup
09:56:44 for multi-node
09:56:46 Shlomo_N: any patches?
09:56:51 so we can test it
09:56:51 sure
09:57:00 https://review.openstack.org/#/c/304470/
09:57:22 #link https://review.openstack.org/#/c/304470/
09:57:28 okie thanks, we will take a look
09:57:37 great
09:57:38 did we get any performance numbers from feipeng?
09:57:47 comparing DF to DVR?
09:58:08 not yet, he needs to send them to me in the coming days
09:58:10 if you haven't, i can send them to you
09:58:24 DuanKebo: yeah please
09:58:26 he sent you?
09:58:26 thanks :)
09:58:32 sure, thanks
09:58:57 yes, but l3 only
09:59:22 anything else, anyone?
09:59:27 #topic open discussion
09:59:35 Hi Gal
09:59:46 hi :)
09:59:49 :)
09:59:55 gampel, are you working on https://bugs.launchpad.net/dragonflow/+bug/1524348 ?
09:59:56 Launchpad bug 1524348 in DragonFlow "dhcp app is missing some functionality which are defined by protocol" [Low,New] - Assigned to Eran Gampel (eran-gampel)
09:59:56 lol
09:59:57 CI uses etcd now,
10:00:09 can we add some tests for redis?
10:00:13 DuanKebo: we can replace this with redis
10:00:17 we want Redis to be our default
10:00:20 No soon do you want it
10:00:22 fullstack test for redis.
10:00:37 gampel, Yep, I would like to look into this
10:00:43 if you don't mind
10:00:51 so yes, thx, please assign it to yourself
10:00:53 Great!
10:01:03 thanks
10:01:04 DuanKebo: do you know how to do it or do you need my help?
10:01:16 oanson, can you please answer my question in https://bugs.launchpad.net/dragonflow/+bug/1562966 ? thanks
10:01:17 Launchpad bug 1562966 in DragonFlow "Subnets in fullstack tests collide and collisions are resolved manually." [Wishlist,New] - Assigned to Pavel Gluschak (scsnow)
10:01:30 No, I don't know
10:01:32 #action gsagie replace etcd with Redis in CI gate testing
10:01:33 scsnow, sure, let me just read it :)
10:01:34 how to do it
10:01:40 oanson, thanks
10:01:45 DuanKebo: okie np, will do it and put you as a reviewer so you will see
10:02:01 Great, thank you!
10:02:24 if there are any failures, i can solve them
10:02:28 okie, thanks everyone for attending, we are out of time
10:02:36 thanks.
10:02:37 DuanKebo: i will let you know once the patch is up
10:02:39 thanks everyone
10:02:46 thanks
10:02:46 Thank you!
10:02:47 and let's continue in #openstack-dragonflow
10:02:49 Thanks.
10:02:50 for any questions
10:02:53 #endmeeting