09:04:05 #startmeeting Dragonflow
09:04:06 Meeting started Mon Apr 11 09:04:05 2016 UTC and is due to finish in 60 minutes. The chair is oanson. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:04:07 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:04:09 The meeting name has been set to 'dragonflow'
09:04:12 gampel, thanks
09:04:19 #info scsnow gampel Shlomo_N raofei nick-ma oanson yuli_s gsagie in meeting
09:04:36 #topic Security Groups
09:04:50 #link https://review.openstack.org/#/c/280538/
09:05:13 From the testing so far it does not break the code
09:05:23 We are in feature freeze, but due to the importance of this feature and seeing how close it is to completion, we want to try and merge it.
09:05:24 but does not work properly
09:05:59 yuli_s, dingboopt you were testing this feature. Any comments?
09:06:54 I found a bug with double security group fields, so basically SG does not work for me
09:07:18 oanson: I think we can merge it if it does not break the flow when set to work without SG
09:07:28 gampel, We can do that
09:07:37 yuli_s, do you think it can be made to work by Monday?
09:07:49 (i.e. next Monday, the 18th?)
09:07:51 I see that there is a new patch from today. What does it include?
09:07:52 probably yes
09:08:03 why do we need to merge it if it doesn't work?
09:08:05 I have not tested the last patch
09:08:07 yes, suggest to disable it by default in devstack.
09:08:07 this doesn't make sense
09:08:28 gsagie, if it can be made to work by the code freeze, merging it should expedite matters
09:08:30 if it's not enabled then it's not in the feature freeze
09:08:41 that's something that needs to be decided now
09:08:41 gsagie, not enabled by default
09:09:02 still available for anyone who wants to use it
09:09:30 who will want to use it if it's not working?
09:09:43 it depends. if someone is willing to use it, but it doesn't work, it hurts.
09:09:48 oanson: I will take a look at the bugs after work.
09:09:48 we will make it work this week
09:09:56 so let's merge it when it works
09:10:02 The plan is to get it to work by the code freeze
09:10:04 gsagie, I suppose it will be fixed ASAP
09:10:21 Then it will be available, working, but not enabled by default.
09:10:34 We could enable it by default if we are sure of its stability
09:10:37 gsagie, agree
09:10:44 so if that's the plan let's get it to work, why do we need to merge code that is not working?
09:10:47 is yuanwei here?
09:11:11 if security groups is in the version then it needs to work, and we are past the deadline
09:11:12 agree.
09:11:20 Yes
09:11:23 so we need to agree if it's an important enough feature or not
09:11:32 if it is, we extend the deadline
09:11:39 I understood that we agreed that it was.
09:11:45 dingboopt: can you make security group work in the following 2 days?
09:11:46 i think it's important.
09:11:49 yes it is and we need to extend the deadline only for SG
09:11:55 Duan Kebo: can yuanwei do it?
09:12:02 If anyone finds an SG bug, would you please forward it to me?
09:12:05 we need fast iterations on it
09:12:23 I will make sure there is someone working on it?
09:12:31 not for other features that did not make the feature freeze
09:12:49 yuli can you please share the bug here?
09:12:53 Frank_Duan: ok thanks, yuli_s and oanson will report bugs to you
09:13:14 In addition to launchpad. It is helpful that the list is maintained
09:13:27 SG bugs can be marked critical so that we won't miss them.
09:13:49 It's better to decide who should take responsibility for this feature. It's better for yuanwei or dingbo to complete it fully.
09:13:51 gampel, In the lport table we have 2 fields to save: sgids and security_groups. In my test they have different values.
09:13:53 yuli_s: please share the bug you found, also i noticed that IPv6 is not supported, need to add a bug for it as well
09:14:20 yes. ipv6 is not working.
09:14:23 the problem yuli_s showed me is that we have 2 fields for security groups
09:14:33 need to remove one
09:14:37 I have talked about it with dingboot
09:15:01 He has been busy recently, so yuanwei and I will fix the SG bugs
09:15:13 Frank_Duan: ok great, please let me know if you need help
09:15:24 I am also available to help
09:15:24 Ok thx, we need to make it work by the end of the week
09:15:30 ok, great
09:15:55 thank you omer!
09:15:55 #action Frank_Duan yuanwei To finish SG app by end of week. oanson gsagie yuli_s to help where needed
09:16:06 Anything else on this topic?
09:16:16 the reliability feature
09:16:22 #topic Reliability
09:16:28 hshan: The stage is yours.
09:16:34 thx
09:16:40 I think that we should delay it until after the Mitaka tag
09:17:47 what is missing for that feature?
09:17:51 hshan: ?
09:17:55 we need to move the 'switch_features_handler' from the openflow handler to ovsdb_monitor, the reliability feature relies on that
09:18:39 this is the last dependency of the reliability feature
09:18:49 hshan: There is also the use of mod_flow and OFPFlowMod
09:19:05 oanson: yes
09:19:09 ok, so this is a quick fix
09:19:13 you need to change all the apps to use mod_flow
09:19:14 moving the switch features handler
09:19:26 this is a big change
09:19:33 gsagie: I'll add a new patch to do that
09:19:35 gsagie: we should consider if we keep this event
09:19:49 I think that we should delay the reliability feature to after Mitaka
09:19:50 the sync_started and sync_finished should cover all cases.
09:20:02 But this is definitely a change for after Mitaka, yes
09:20:41 maybe we should merge it first, but don't enable it
09:20:49 what do you think?
09:20:53 hshan: I know that you worked very hard on this one, but I feel that we need to merge it next week to master
09:20:56 hshan: so change mod_flow and change switch_features_handler, what about L3 app?
09:20:58 hshan: I am afraid it may make the code unstable
09:21:23 I think that it is problematic because it changes some infrastructure code
09:21:29 like mod_flow
09:21:34 Even disabled, it has some changes within the code that have some effect
09:21:42 like what gampel said :)
09:21:45 gampel: but this we can disable
09:21:51 very easily
09:22:22 I feel that this feature is not high priority like SG and it is too risky
09:22:36 currently, the reliability feature's change list is too long, i plan to divide it into several patches
09:22:52 Actually, looking at the patch again, it looks like most changes are in new files
09:23:02 (I remembered an older version (: )
09:23:10 huh
09:23:13 But if you all think it is not, we can consider adding it with a disable flag
09:23:17 hshan: ok, let's decide that it's going to be in our next version, it doesn't mean it's not going to be merged
09:23:27 yes, this reliability feature itself is simple
09:23:27 mark it as experimental in the option. if it can be disabled and not affect the pipeline, i suggest we do it and continue to test by enabling it in devstack.
09:23:28 it's just going to be merged in next version to master
09:23:40 The only problem if it's disabled in the code in df_base_app. It has to be conditional anyway. So if we add that, it can also be merged.
09:23:55 nick-ma: i agree, i personally don't see why it's not going to be easy to disable this feature
09:24:29 need to make sure it is working by code freeze.
09:24:29 Ok so hshan, add a patch so that we can disable everything and we could merge it this week
09:24:34 hshan: the question is, why it's important to get ir merged?
09:24:36 it
09:24:45 and is it working
09:24:50 I agree with the question ?
09:24:52 of course
09:25:06 if it is not working by end of this week, let it go to next release. even if it is experimental, we need to make sure it is working then.
09:25:10 hshan: did you change L3 application?
09:25:23 nick-ma: agree
09:25:28 I changed mod_flow
09:25:44 all other apps use mod_flow
09:26:03 hshan: ahh i see
09:26:11 hshan: This is in the latest patchset?
09:26:11 ok
09:26:12 hshan: so any app that uses the cookie will work
09:26:32 The change is mainly in controller reliability itself. Modifications to other files are minor.
09:26:55 yes i can see that, i will test this patch today
09:27:10 oanson: I think it is, all other work will be done in a new patch
09:27:10 and report by tomorrow and let's decide then
09:27:24 ok so we agree to add it if it works and that it could be disabled and marked as experimental
09:27:26 ok
09:27:36 So I agree we mark it as an experimental feature.
09:27:43 gsagie: you must wait for the new patches a little, I think I can finish it today
09:27:54 hshan: please do and email me when it's done
09:28:00 and tell me which patch to use for testing
09:28:05 gsagie: ok
09:28:12 #action hshan finish reliability and add option to disable reliability from configuration
09:28:17 sure, I will
09:28:18 It's also an important feature if someone wants to use df in any production
09:28:21 i will also look how we can safely disable it if it turns out not to work
09:28:28 #action gsagie test reliability feature
09:28:51 Frank_Duan: we all agree it's important, the question is whether it is stable enough right now or needs to be merged in a week
09:28:57 it's really not that big a delay either way
09:29:04 i agree
09:29:11 but we want a "semi" stable version for more testing
09:29:18 and not to deal with logic bugs
09:29:21 Though I guess that if anyone uses it in production, they'll want a tagged version
09:29:23 before the summit, that's all
09:29:28 rather than the ongoing master.
09:29:43 oanson: yes but we can always add tags
09:29:57 We can mark it as RC_1
09:30:00 gsagie: Not if we're in the middle of feature development and adding new bugs :)
09:30:21 All right, I think we reached an agreement.
09:30:25 and cherry pick it when it is done to RC_2
09:30:38 This feature needs some time to mature.
09:31:02 All features need to be heavily tested in the coming week
09:31:05 I do think it will be perfect in a few weeks.
09:31:14 oanson: agree.
09:31:19 We need to find and iron out as many bugs as possible before the summit
09:31:56 Any more issues on this topic?
09:32:14 #topic Bugs
09:32:15 Frank_Duan: the big question, it also depends on your schedule
09:32:47 Yes
09:32:56 #link https://bugs.launchpad.net/dragonflow
09:33:02 yuli_s, anything to report?
09:33:05 We will focus on fixing bugs in these features.
09:33:09 oanson, most of the bugs are taken care of
09:33:47 when doing debugging, I am getting strange exceptions, I just sent you one
09:34:08 oanson: there is the unreported one about the publisher update frequency
09:34:28 gampel, I am on that one.
09:34:55 yuli_s, please report these exceptions
09:35:04 They may hide a larger problem beneath
09:35:06 Frank_Duan: are you testing the selective proactive on multi node setup
09:35:07 I hope we will close all open issues
09:35:12 oanson, yup
09:35:14 And it allows others to investigate and fix them
09:35:25 Not yet.
09:35:35 But hujie has
09:35:46 Frank_Duan: I think it is very important to test both redis and zmq
09:36:04 I see there are also 2 unassigned sg bugs.
09:36:07 for example this one happens during debugging
09:36:10 https://bugs.launchpad.net/dragonflow/+bug/1568506
09:36:11 Launchpad bug 1568506 in DragonFlow "sg test bug 2" [High,New]
09:36:17 I tested all features for basic function in the middle of March, based on redis in a multi-node environment
09:36:18 gampel, testing is next topic
09:36:28 Let's focus on bugs, and then we can get to it :)
09:36:30 Oops
09:36:30 Gampel, we only tested redis
09:36:38 and will test zmq later
09:36:45 okie let's go to testing
09:37:10 who is taking the SG bug
09:37:20 go to testing
09:37:34 You are assign the SG bug to Yuanwei
09:37:45 You can
09:37:57 #action yuanwei to review SG bugs
09:38:02 yuli_s, please assign the bugs
09:38:07 ok
09:38:26 Any other bug-related topics?
09:38:30 nop
09:38:38 #topic Testing
09:38:47 We have to make sure all features are tested
09:38:58 including multinode, using redis and ZMQ
09:39:10 and east-west traffic on L2 and L3 networks
09:39:24 and Distributed DNAT
09:39:29 Yes, including dnat
09:39:38 And SG
09:39:49 And if we can, also reliability
09:40:28 It is important that any bugs found are uploaded to launchpad, possibly with a mail to yuli_s who is our bug-master
09:40:37 sure
09:41:03 sure
09:41:15 Shlomo_N: Can you share the status of your performance testing so far
09:41:44 yes, sure
09:42:33 Last week I tested 4 scenarios: E-W (L2 and L3), N-S and SNAT
09:43:26 After tweaking the linux kernel, I have managed to get near 90% line utilization
09:43:41 Here are the results:
09:43:42 L2 VM VM 7.39Gb/s
09:43:45 oops
09:44:00 L2: 7.39Gb/s
09:44:11 L3: 7.24Gb/s
09:44:21 North-South: 6.98Gb/s
09:44:29 is it cross node
09:44:32 ?
09:44:34 SNAT: 8.38Gb/s
09:44:43 Yes, all tests are cross node only
09:44:52 On a single 10G line?
09:44:59 geneve or vxlan?
09:45:17 Yep, the lab I'm using is based on 10Gb/s and vxlan
09:45:20 did you get the line rate with multi VM traffic
09:45:21 ?
09:45:30 Shlomo, do you have the performance data of neutron + ovs agent?
09:46:07 I got near line rate with multi vm traffic
09:46:29 why does snat get more bandwidth/sec? i don't understand.
09:46:31 Frank_Duan: I have, but it's based on a 1Gb/s lab
09:47:00 Frank_Duan: I think next will be to test with SG and then DVR
09:47:08 Shlomo_N: Do you also have CPU usage statistics for these tests? Is the bottleneck now the line width?
09:47:28 nick-ma: probably because we have only a single VMSwitch in the way
09:47:59 Shlomo_N: i agree with nick-ma: why are the SNAT and DNAT not the same? it is all centralized, right?
09:48:07 oanson: the bottleneck is not the line bandwidth
09:48:57 Yes, gampel. we also need to compare them with performance data of neutron + ovs agent
09:48:59 gampel: yes. DNAT wasn't tested
09:49:21 you said North-South: 6.98Gb/s and SNAT: 8.38Gb/s
09:49:47 yes
09:49:56 Frank_Duan: yes next we will do reference implementation with DVR
09:50:18 probably because we have only a single VMSwitch in the SNAT scenario
09:50:22 We can do this in Beijing
09:50:36 That will be a great help
09:50:55 shlomo is working on automating the testing, you could use his work
09:51:24 Shlomo_N: will you be able to upload it to the DF repository
09:51:28 Frank_Duan: how many servers do you have there?
09:51:40 Sure, I will
09:51:57 we have 3 servers with 10GE if
09:52:15 other servers only have 1ge ports.
09:52:28 So we have a bigger lab here
09:52:35 :-)
09:53:42 Anyone have a 40Gb/s lab anywhere?
09:53:57 #action Shlomo_N upload performance tests to repository
09:54:19 10x oanson
09:54:22 Shlomo_N: maybe they can do the DVR OVS test and you could focus on the SG automation
09:54:32 Shlomo_N, no problem :)
09:54:44 gampel: ok
09:55:07 All right. Anything else on testing?
09:55:40 #topic open discussion
09:56:21 i suggest we vote on py34 gate.
09:56:50 gampel: did you have a chance to look for a bug I could work on?
09:56:50 nick-ma: second
09:57:06 it was working and broken and fixed and working then.
09:57:12 please look at the DF Austin topics and add suggestions https://etherpad.openstack.org/p/dragonflow-design-summit
09:57:14 +1
09:57:23 i fixed it two times.
09:57:29 you mean to make it voting
09:57:38 yes.
09:58:20 scsnow__: please contact yuli, he is the bug master, but if you do some testing you will find bugs, we are in integration time
09:58:24 I think it's important. We should be able to automatically move over to py34
09:58:39 is it working now
09:58:50 of course it is.
09:58:53 It appears stable. I looked back a few reviews.
09:58:59 Thanks to nick-ma :)
09:58:59 scsnow__: will you be in Austin
09:59:11 thx yes +1 on making it voting
09:59:28 gampel: no :(
09:59:34 So it's agreed?
09:59:43 :-)
10:00:08 gampel, nick-ma: Who gets to do it?
10:00:12 yuli: can you help scsnow__ find an easy bug to work on
10:00:29 I will do it no problem
10:00:44 #action gampel make gate test py34 voting
10:00:50 gampel, sure
10:00:57 scsnow yuli_s: there are lots of simple tasks in wishlist.
10:00:59 The fullstack tests also seem to be getting there, but not ready yet.
10:01:15 The following error repeats: Exception: VM is not deleted
10:01:27 I will open a bug, and if I have time I will look into it.
10:01:52 Yes i think making the fullstack stable is very important!
10:01:54 nick-ma, yup, you are right
10:02:07 #action oanson make fullstack more stable
10:02:16 time is up thank you everyone
10:02:25 thanks all.
10:02:32 bb
10:02:35 Thank you.
10:02:42 #endmeeting