09:01:07 #startmeeting Dragonflow
09:01:08 Meeting started Mon Sep 5 09:01:07 2016 UTC and is due to finish in 60 minutes. The chair is oanson. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:12 The meeting name has been set to 'dragonflow'
09:01:23 All right, who is here for the Dragonflow meeting?
09:01:27 o/
09:01:47 o/
09:02:37 o/
09:02:39 I was kinda sorta hoping for more people :)
09:02:51 let's give them a couple more
09:05:03 DuanKebo, hi.
09:05:20 hi
09:07:31 Well, let's get started
09:07:52 #topic Roadmap
09:08:03 Security groups is giving us a lot of grief.
09:08:12 It hasn't been working on the gate for a while.
09:08:23 DuanKebo, I understand your team is taking this?
09:08:50 It's duo to kernel module of ct
09:08:55 *due to
09:09:04 What version has to be installed?
09:09:28 as far as i know, introduced by the dpdk patch.
09:09:50 I think we can optimize the install of ovs
09:10:08 DuanKebo, that would be great. How?
09:10:19 ovs version 2.5 is ok,
09:10:34 Do you know what kernel version and what modules are needed?
09:11:11 if the right version of ovs is installed, we needn't uninstall and install it again.
09:11:28 Because in my environment I have upgraded to the latest kernel (4.7.2) and it still fails.
09:11:42 so, this is a patch to the dragonflow devstack?
09:12:18 Additionally, in ML2 (I'll go into more details in a second) it failed before the DPDK patch was merged.
09:12:20 i'm using ubuntu 14.04
09:13:13 hi
09:13:17 nick-ma, Hi
09:13:43 Hi
09:13:49 DuanKebo, have you tried locally reverting the dpdk patch and seeing if it solves the issue?
09:14:16 the failure is caused by not loading conntrack module
09:14:27 why does the dpdk patch affect sg?
09:14:35 sorry i don't get it
09:14:44 nick-ma, +1 - neither do i
09:14:54 one second, yuanwei has the details.
09:15:17 DuanKebo, yuanwei, what's the name of the kernel module that needs to be loaded?
09:15:42 Hello
09:15:57 wait a sec
09:17:11 I found errors in vswitch log
09:17:27 yuanwei, yes
09:17:46 i tried several combinations of kernel and ovs locally. but they failed. any temp solutions?
09:18:16 oanson, i think ovs uses libnetfilter_conntrack
09:18:24 2016-08-31T07:47:38.381Z|00011|ofproto_dpif|INFO|system@ovs-system: Datapath does not support ct_state
09:18:24 2016-08-31T07:47:38.381Z|00011|ofproto_dpif|INFO|system@ovs-system: Datapath does not support ct_state
09:18:24 Datapath does not support ct_state
09:18:24 Datapath does not support ct_state
09:18:24 Datapath does not support ct_state
09:19:15 I think an old datapath was in the kernel, not the new one
09:19:48 yuanwei, the dpdk patch should only be in effect if it is activated. How did it make a difference?
09:21:48 I would like to move on.
09:21:59 I found this patch deleted some code
09:22:03 DuanKebo, yuanwei, please open a bug about SG, and put all the information you have in there
09:22:18 a temp solution is to remove ovs and reinstall it manually
09:22:23 We will keep the discussion on the bug, and see if we can find a solution.
09:22:34 DuanKebo, noted. Please add that to the bug report.
09:23:01 #action DuanKebo,yuanwei open bug report on SG and update it with cause and workaround
09:23:10 About ML2
09:23:18 I have tried setting it up and testing it on my environment.
09:23:28 It appears to work, but the fullstack tests fail there as well
09:23:41 I also saw Li Ma tried to have the gate test it, and there are many errors there too.
09:24:00 In my environment, specifically, some issues (but not all) were related to SG, so I will try again with the workaround
09:24:15 DuanKebo, can you guys take ownership on this and iron out the ML2 bugs?
09:24:38 I want to reach a state where ML2 is completely working before the summit
09:24:52 We have the same puzzles.
09:25:24 I think for now we should add another gate task for ml2 fullstack.
09:25:28 the ml2 plugin runs successfully on our local environment.
09:25:44 i.e. have both core plugin and ml2 run on the gate.
09:25:49 but no problem, we can take it.
09:25:58 oanson, perhaps try to uninstall OVS and re-install OVS 2.5
09:25:59 This way we can keep tabs on how well ml2 is doing as well.
09:26:34 https://review.openstack.org/#/c/348169/15/devstack/ovs_setup.sh@116 @MaLi those changes may cause the error I mentioned
09:26:41 oshidoshi, will try that too
09:27:08 oanson, if that works out, the devstack can be updated to do the same (uninstall and reinstall)
09:27:23 oshidoshi, the devstack already does that.
09:27:30 The location may have to be changed
09:27:36 you mean the removal of the module
09:27:46 yes
09:28:06 yuanwei: ok, but these lines always fail in my local env.
09:28:37 i think we need a more stable solution to reload the kernel module.
09:29:09 yes
09:29:12 ok, I agree, that is only a temp fix
09:30:37 in fact, for a system like ubuntu 14.04, we can just use the default ovs, it's already 2.5.0
09:30:58 DuanKebo, I think you need to install the cloud:mitaka repository for that.
09:31:03 And in Fedora, that's what we do
09:31:22 I think we just need to make sure the conntrack module is loaded before openvswitch
09:32:21 QoS and VLAN reviews are online.
09:32:32 Non-cores, please review them.
09:33:00 There are a lot of reviews, and only two cores, so we need all the help we can get
09:33:23 Especially the long ones, e.g. qos, vlan, extra routes
09:33:45 DuanKebo, about extra routes - what about adding routing policy api to Neutron? Is that still interesting?
09:34:23 Yes, before we run it stably in df
09:34:40 all right. I'll work something up.
09:35:09 Please also review this patch https://review.openstack.org/#/c/339975/, that is the first patch about allowed address pairs module
09:35:27 ok
09:35:40 yuanwei, this patch has 2 downvotes on it.
09:35:46 I will update this patch today
09:35:57 All right. Then we will review it.
09:36:07 thanks;)
09:36:25 Any other road-map issues?
09:36:27 oanson, you need to spend more time reviewing also ^o^
09:36:54 DuanKebo, as I mentioned, there are too many reviews for the number of reviewers.
09:37:17 But I am doing my best :)
09:37:45 Any other roadmap issues?
09:37:56 i will also try to spend more time on reviewing.
09:38:12 nick-ma, I think you're the only one who reviews too much :)
09:38:42 #topic Bugs
09:39:01 sec.
09:39:14 There is bug 1619101
09:39:14 bug 1619101 in DragonFlow "secgroup ofperror flooding in the fullstack ci" [Critical,New] https://launchpad.net/bugs/1619101
09:39:28 yes, i wanted to talk about it
09:39:38 It is critical, and it is SG related. Yuanwei, can I assign it to you?
09:40:57 is this the same problem related to conntrack?
09:41:02 Is bug 1571551 still experienced by anyone?
09:41:02 bug 1571551 in DragonFlow "Kernel module vport_geneve.ko fails to load on ubuntu" [High,New] https://launchpad.net/bugs/1571551
09:41:11 DuanKebo, yes, that's what I thought as well
09:41:24 That's why I am assigning it to you and yuanwei.
09:41:55 All right, I am bumping 1571551 down to medium. If anyone runs into it, please bump it back up to high.
09:41:56 OK
09:42:04 ok
09:42:13 we need https://bugs.launchpad.net/dragonflow/+bug/1571551
09:42:13 Launchpad bug 1571551 in DragonFlow "Kernel module vport_geneve.ko fails to load on ubuntu" [Medium,New]
09:42:32 we can bypass it by completely reinstalling ovs
09:42:47 before it is solved
09:43:04 DuanKebo, yes, but that's a workaround. On the other hand, this is a devstack environment, so it's acceptable.
09:43:28 Bug 1480672, which is on Gal
09:43:28 bug 1480672 in DragonFlow "Add Neighbour Discovery handling in local controller" [Medium,Triaged] https://launchpad.net/bugs/1480672 - Assigned to Gal Sagie (gal-sagie)
09:43:33 lihi, I understand you are working on it?
09:43:52 yes, omer, we still need to work on the bug.
09:43:52 yes, I'm on it
09:44:12 DuanKebo, I am just organising to know who works on what.
09:44:24 You don't have to solve it today.
09:44:26 Tomorrow...
09:44:35 lihi, I am assigning it to you, then.
09:44:55 we need another owner for: https://bugs.launchpad.net/dragonflow/+bug/1614334
09:44:55 Launchpad bug 1614334 in DragonFlow "Fail to Install dragonflow" [Medium,New]
09:45:01 OK. Just don't forget to review it on time
09:46:07 yuli, if no one, I can take it.
09:46:13 DuanKebo, about bug 1614334, did you have git installed when this happened?
09:46:13 bug 1614334 in DragonFlow "Fail to Install dragonflow" [Medium,New] https://launchpad.net/bugs/1614334
09:46:31 DuanKebo, it is possible that this bug happened because of old libraries
09:46:51 yuli_s1, no.
09:47:10 I'll take it. I was just looking at versioning issues this morning, so I may recognise something :)
09:47:15 I think someone else who doesn't use git may face this problem
09:47:41 if i git clone the code, it can install successfully.
09:47:55 DuanKebo, yes.
09:48:11 yuli, not because of old libs.
09:48:16 That's what the error message says - it needs access to the remote repository, which it doesn't have with a tarball.
09:48:27 as i have said, when using git clone, it works.
09:48:39 Yes
09:48:50 All right, I'll see what I can find about it.
09:48:50 yesterday i found another issue with performance degradation when creating hundreds of subnets,
09:49:19 yuli_s1, did you open a bug report?
09:49:22 i have not reported it, i will report it and take it myself.
09:49:38 Very good.
09:49:49 control plane performance?
09:49:50 Anything else?
09:49:59 #topic Performance
09:50:14 yuli_s1, any new exciting results?
09:50:20 i was working on performance tests
09:50:42 I have built the following test
09:51:03 30+ real servers running full devstack installation
09:51:29 +129 additional df-controllers (each one works with its own br-int)
09:51:36 (on each server)
09:52:00 in my test I am doing the following
09:52:09 i create a network and 256 subnets for it
09:52:37 currently it takes almost 200 secs to finish the whole test
09:53:08 yesterday I was checking
09:53:18 According to your test, do you think dragonflow can currently support 4000+ compute nodes? @yuli?
09:53:23 number of rules added to each br-int
09:53:48 when running a script that printed the number of rules
09:54:03 sometimes the counter went down and then up
09:54:24 i suppose, when receiving an update for a network, we recreate dhcp related rules
09:54:40 which can be a reason for performance degradation
09:55:08 DuanKebo, I think we can support it, but I think we need to run another test first, to measure full-system update when creating ports - it is the more generic use case
09:55:11 yuli, can you report a bug about this?
09:55:18 sure, will do it
09:55:40 what is full-system update?
09:55:41 DuanKebo, yes
09:55:41 sure
09:56:09 yuli_s1, 256 subnets per server? Or just 1 network with 256 subnets for the entire test?
09:56:20 130 df-controllers per server
09:56:33 But how many subnets?
09:56:36 in the test I create 1 network with 256 subnets
09:56:37 And was this core or ML2?
09:56:46 yuli_s1, I see.
09:56:51 sec.
09:57:10 not ml2
09:58:00 yuli_s1, I think the test with ML2 is also important - since this will be our standard next version
09:58:05 (hopefully)
09:58:15 sure, I will test it too !
09:58:20 All right. That's all the time we have
09:58:33 Lucky no one added any items to the open discussion :)
09:58:38 we'd better test it first. @yuli
09:58:53 ok
09:58:55 Anything else before we finish?
09:58:55 if you have some problem testing it, we can help you
09:59:01 thanks !
09:59:19 DuanKebo, I will try to push the scripts to the repository
09:59:38 yuli_s1, that's a great idea!
09:59:44 so you will be able to see the code
09:59:46 launchpad blueprints need to be updated.
09:59:56 nick-ma, yes.
10:00:05 I will go over them for next meeting.
10:00:09 ok, we will review it.
10:00:11 I think we need more than 2 minutes for them :)
10:00:30 #action oanson Review and summarise blueprints.
10:00:37 All right. That's our time.
10:00:40 Thanks everyone.
10:00:49 #endmeeting
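
Note on the SG issue discussed under Roadmap: the vswitch log message quoted at 09:18:24 ("Datapath does not support ct_state") is the symptom the team planned to record in the bug report. The following is only a minimal sketch, not part of Dragonflow or its devstack scripts, of how such a log could be scanned for that message. The function name find_ct_state_errors is hypothetical, and the pipe-separated layout (timestamp|sequence|module|level|message) is an assumption inferred from the single line pasted in the meeting.

```python
import re

# Message text quoted in the meeting from the vswitch log.
CT_STATE_ERROR = "Datapath does not support ct_state"

# Assumed line layout, inferred from the one pasted line:
# timestamp|sequence|module|level|message
LOG_LINE = re.compile(
    r"^(?P<timestamp>[^|]+)\|(?P<seq>\d+)\|(?P<module>[^|]+)"
    r"\|(?P<level>[^|]+)\|(?P<message>.*)$"
)


def find_ct_state_errors(log_text):
    """Return (timestamp, message) pairs for structured log lines
    that report missing ct_state support in the datapath."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and CT_STATE_ERROR in match.group("message"):
            hits.append((match.group("timestamp"), match.group("message")))
    return hits


# First line is the one pasted in the meeting; the second is a
# hypothetical unrelated line added only to show filtering.
sample = (
    "2016-08-31T07:47:38.381Z|00011|ofproto_dpif|INFO|"
    "system@ovs-system: Datapath does not support ct_state\n"
    "2016-08-31T07:47:38.400Z|00012|bridge|INFO|hypothetical unrelated line\n"
)

for timestamp, message in find_ct_state_errors(sample):
    print(timestamp, message)
```

To use it on a real system, pass in the contents of whatever vswitch log file the installation writes. A non-empty result would suggest the loaded datapath lacks conntrack support, which matches the reinstall workaround mentioned at 09:22:18. Lines that do not follow the assumed pipe-separated layout are ignored.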