09:04:05 <oanson> #startmeeting Dragonflow
09:04:06 <openstack> Meeting started Mon Apr 11 09:04:05 2016 UTC and is due to finish in 60 minutes.  The chair is oanson. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:04:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:04:09 <openstack> The meeting name has been set to 'dragonflow'
09:04:12 <oanson> gampel, thanks
09:04:19 <oanson> #info scsnow gampel Shlomo_N raofei nick-ma oanson yuli_s gsagie  in meeting
09:04:36 <oanson> #topic Security Groups
09:04:50 <oanson> #link https://review.openstack.org/#/c/280538/
09:05:13 <gampel> From the testing so far it does not break the code
09:05:23 <oanson> We are in feature freeze, but due to the importance of this feature and seeing how close it is to completion, we want to try and merge it.
09:05:24 <gampel> but does not work properly
09:05:59 <oanson> yuli_s, dingboopt you were testing this feature. Any comments?
09:06:54 <yuli_s> I found a bug with double security group fields, so basically SG does not work for me
09:07:18 <gampel> oanson: I think we can merge it if it does not break the flow when set to work without SG
09:07:28 <oanson> gampel, We can do that
09:07:37 <oanson> yuli_s, do you think it can be made to work by Monday?
09:07:49 <oanson> (i.e. next Monday, the 18th?)
09:07:51 <gampel> I see that there is a new patch from today what does it include ?
09:07:52 <yuli_s> probably yes
09:08:03 <gsagie> why do we need to merge it if it doesnt work?
09:08:05 <yuli_s> I have not tested the last patch
09:08:07 <nick-ma> yes, suggest to disable it by default in devstack.
09:08:07 <gsagie> this doesnt make sense
09:08:28 <oanson> gsagie, if it can be made to work by the code freeze, merging it should expedite matters
09:08:30 <gsagie> if its not enabled then its not in the feature freeze
09:08:41 <gsagie> thats something needs to be decided now
09:08:41 <oanson> gsagie, not enabled by default
09:09:02 <oanson> still available for anyone who wants to use it
09:09:30 <gsagie> who will want to use it if its not working?
09:09:43 <nick-ma> it depends. if someone is willing to use it, but it doesn't work, it hurts.
09:09:48 <dingboopt> oanson: I will take a look at the bugs after work.
09:09:48 <gampel> we will make it work this week
09:09:56 <gsagie> so lets merge it when it works
09:10:02 <oanson> The plan is to get it to work by the code freeze
09:10:04 <yuli_s> gsagie, I suppose it will be fixed ASAP
09:10:21 <oanson> Then it will be available, working, but not enabled by default.
09:10:34 <oanson> We could enable it by default if we are sure of its stability
09:10:37 <yuli_s> gsagie, agree
09:10:44 <gsagie> so if thats the plan lets get it to work, why we need to merge code that is not working?
09:10:47 <gampel> is yuanwei here ?
09:11:11 <gsagie> if security groups is in the version then it needs to work, and we are past the deadline
09:11:12 <nick-ma> agree.
09:11:20 <oanson> Yes
09:11:23 <gsagie> so we need to agree if its important feature enough or not
09:11:32 <gsagie> if it is, we extend the deadline
09:11:39 <oanson> I understood that we agreed that it was.
09:11:45 <gsagie> dingboopt: can you make security group work in the following 2 days?
09:11:46 <nick-ma> i think it's important.
09:11:49 <gampel> yes it is and we need to extend the deadline only for SG
09:11:55 <gsagie> Duan Kebo: can yuanwei do it?
09:12:02 <Frank_Duan> If anyone finds an SG bug, would you please forward it to me?
09:12:05 <gsagie> we need fast iterations on it
09:12:23 <Frank_Duan> I will make sure there is someone working on it.
09:12:31 <gampel> not for other features that did not make the feature freeze
09:12:49 <gampel> yuli can you please share the bug here ?
09:12:53 <gsagie> Frank_Duan: ok thanks, yuli_s and oanson will report bugs to you
09:13:14 <oanson> In addition to launchpad. It is helpful that the list is maintained
09:13:27 <oanson> SG bugs can be marked critical so that we won't miss them.
09:13:49 <raofei> It's better to decide who should take responsibility for this feature. It's better for yuanwei or dingbo to complete it fully.
09:13:51 <yuli_s> gampel, In lport table we have 2 fields to save: sgids and security_groups. In my test they have different value.
09:13:53 <gsagie> yuli_s: please share the bug you found, also i noticed that IPv6 is not supported need to add a bug for it as well
09:14:20 <nick-ma> yes. ipv6 is not working.
09:14:23 <gsagie> the problem yuli_s showed me is that we have 2 fields for security groups
09:14:33 <gsagie> need to remove one
09:14:37 <Frank_Duan> I have talked about it with dingboopt
09:15:01 <Frank_Duan> He is busy recently, so yuanwei and I will fix the SG bugs
09:15:13 <gsagie> Frank_Duan: ok great, please let me know if you need help
09:15:24 <oanson> I am also available to help
09:15:24 <gampel> Ok thx we need to make it to work by the end of the week
09:15:30 <yuli_s> ok, great
09:15:55 <Frank_Duan> thank you omer!
09:15:55 <oanson> #action Frank_Duan yuanwei To finish SG app by end of week. oanson gsagie yuli_s to help where needed
09:16:06 <oanson> Anything else on this topic?
09:16:16 <hshan> the reliability feature
09:16:22 <oanson> #topic Reliability
09:16:28 <oanson> hshan: The stage is yours.
09:16:34 <hshan> thx
09:16:40 <gampel> I think that we should delay it to after the Mitaka tag
09:17:47 <gsagie> what is missing for that feature?
09:17:51 <gsagie> hshan: ?
09:17:55 <hshan> we need to move the 'switch_features_handler' from openflow handler to ovsdb_monitor, the reliability feature relies on that
09:18:39 <hshan> this is the last dependency of the reliability feature
09:18:49 <oanson> hshan: There is also the use of mod_flow and OFPFlowMod
09:19:05 <hshan> oanson: yes
09:19:09 <gsagie> ok, so this is a quick fix
09:19:13 <gampel> you need to change all the app to use mod_flow
09:19:14 <gsagie> moving the switch features handler
09:19:26 <gampel> this is a big change
09:19:33 <hshan> gsagie: I'll add a new patch to do that
09:19:35 <oanson> gsagie: we should consider if we keep this event
09:19:49 <gampel> I think that we should delay the reliability feature to after Mitaka
09:19:50 <oanson> the sync_started and sync_finished should cover all cases.
09:20:02 <oanson> But this is definitely a change for after Mitaka, yes
09:20:41 <hshan> maybe we should merge it first, but don't enable it
09:20:49 <hshan> what do you think?
09:20:53 <gampel> hshan: I know that you worked very hard on this one But I feel that we need to merge it next week to master
09:20:56 <gsagie> hshan: so change mod_flow and change switch_features_handler, what about L3 app?
09:20:58 <oanson> hshan: I am afraid it may make the code unstable
09:21:23 <gampel> I think that it is problematic because it changes some infrastructure code
09:21:29 <gampel> like mod_flow
09:21:34 <oanson> Even disabled, it has some changes within the code that have some effect
09:21:42 <oanson> like what gampel said :)
09:21:45 <gsagie> gampel: but this we can disable
09:21:51 <gsagie> very easily
09:22:22 <gampel> I feel that this feature is not high priority like SG and it is too risky
09:22:36 <hshan> currently, the reliability feature's change list is too long, i plan to divide it into several patches
09:22:52 <oanson> Actually, looking at the patch again, it looks like most changes are in new files
09:23:02 <oanson> (I remembered an older version (: )
09:23:10 <hshan> huh
09:23:13 <gampel> But if you all think it is not we can consider adding it with disable flag
09:23:17 <gsagie> hshan: ok, lets decide that its going to be in our next version, it doesnt mean its not going to be merged
09:23:27 <hshan> yes, this reliability feature itself is simple
09:23:27 <nick-ma> mark it as experimental in the option. if it can be disabled and not affect the pipeline, i suggest we do it and continue to test by enabling it in devstack.
09:23:28 <gsagie> its just going to be merged in next version to master
09:23:40 <oanson> The only problem is if it's disabled in the code in df_base_app. It has to be conditional anyway. So if we add that, it can also be merged.
09:23:55 <gsagie> nick-ma: i agree, i personally dont see why its not going to be easy to disable this feature
09:24:29 <nick-ma> need to make sure it is working by code freeze.
09:24:29 <gampel> Ok so hshan add a patch that we can disable everything and we could merge it this week
09:24:34 <gsagie> hshan: the question is, why its important to get ir merged?
09:24:36 <gsagie> it
09:24:45 <gsagie> and is it working
09:24:50 <gampel> I agree with the question ?
09:24:52 <hshan> of course
09:25:06 <nick-ma> if it is not working by end of this week, let it go to next release. even if it is experimental, we need to make sure it is working then.
09:25:10 <gsagie> hshan: did you change L3 application?
09:25:23 <gsagie> nick-ma: agree
09:25:28 <hshan> I change mod_flow
09:25:44 <hshan> all other apps use mod_flow
09:26:03 <gsagie> hshan: ahh i see
09:26:11 <oanson> hshan: This is in the latest patchset?
09:26:11 <hshan> ok
09:26:12 <gsagie> hshan: so any app that uses the cookie will work
09:26:32 <Frank_Duan> The change is mainly in controller reliability itself. Modifications to other files are minor.
09:26:55 <gsagie> yes i can see that, i will test this patch today
09:27:10 <hshan> oanson: I think it is, all other work will be done in a new patch
09:27:10 <gsagie> and report by tomorrow and lets decide then
09:27:24 <gampel> ok so we agree to add it if it works and that it could be disabled and marked as experimental
09:27:26 <nick-ma> ok
09:27:36 <Frank_Duan> So I agree we mark it as experimental feature.
09:27:43 <hshan> gsagie: you must wait for the new patches a little, I think I can finish it today
09:27:54 <gsagie> hshan: please do and email me when its done
09:28:00 <gsagie> and tell me which patch to use for testing
09:28:05 <hshan> gsagie: ok
09:28:12 <oanson> #action hshan finish reliability and add option to disable reliability from configuration
09:28:17 <hshan> sure, I will
09:28:18 <Frank_Duan> It's also an important feature if someone wants to use df in any production
09:28:21 <gsagie> i will also look how we can safely disable it if it turns out not to work
09:28:28 <oanson> #action gsagie test reliability feature
09:28:51 <gsagie> Frank_Duan: we all agree its important, the question is it stable enough right now or needs to be merged in a week
09:28:57 <gsagie> its really not that big delay either way
09:29:04 <gampel> i agree
09:29:11 <gsagie> but we want a "semi" stable version for more testing
09:29:18 <gsagie> and not to deal with logic bugs
09:29:21 <oanson> Though I guess that if anyone uses it in production, they'll want a tagged version
09:29:23 <gsagie> before the summit, thats all
09:29:28 <oanson> rather than the ongoing master.
09:29:43 <gsagie> oanson: yes but we can always add tags
09:29:57 <gampel> We can mark it as RC_1
09:30:00 <oanson> gsagie: Not if we're in the middle of feature development and adding new bugs :)
09:30:21 <oanson> All right, I think we reached an agreement.
09:30:25 <gampel> and cherry pick it when it is done to RC_2
09:30:38 <Frank_Duan> This feature needs some time to mature.
09:31:02 <oanson> All features need to be heavily tested in the coming week
09:31:05 <Frank_Duan> I do think it will be perfect in a few weeks.
09:31:14 <nick-ma> oanson: agree.
09:31:19 <oanson> We need to find and iron out as many bugs before the summit
09:31:56 <oanson> Any more issues on this topic?
09:32:14 <oanson> #topic Bugs
09:32:15 <gsagie> Frank_Duan: the big question it also depends on your schedule
09:32:47 <Frank_Duan> Yes
09:32:56 <oanson> #link https://bugs.launchpad.net/dragonflow
09:33:02 <oanson> yuli_s, anything to report?
09:33:05 <Frank_Duan> We will focus on fixing bugs of these features.
09:33:09 <yuli_s> oanson, most of the bugs are taken care of
09:33:47 <yuli_s> when doing debugging, I am getting strange exceptions, I just send you one
09:34:08 <gampel> oanson:  there is the unreported one about the publisher update frequency
09:34:28 <oanson> gampel, I am on that one.
09:34:55 <oanson> yuli_s, please report these exceptions
09:35:04 <oanson> They may hide a larger problem beneath
09:35:06 <gampel> Frank_Duan:  are you testing the selective proactive on multi node setup
09:35:07 <yuli_s> I hope we will close all open issues
09:35:12 <yuli_s> oanson, yup
09:35:14 <oanson> And it allows others to investigate and fix them
09:35:25 <Frank_Duan> Not yet.
09:35:35 <Frank_Duan> But hujie has
09:35:46 <gampel> Frank_Duan:  I think it is very important to test both redis and zmq
09:36:04 <oanson> I see there are also 2 unassigned sg bugs.
09:36:07 <yuli_s> for example this one happens during debugging
09:36:10 <yuli_s> https://bugs.launchpad.net/dragonflow/+bug/1568506
09:36:11 <openstack> Launchpad bug 1568506 in DragonFlow "sg test bug 2" [High,New]
09:36:17 <hujie> I have tested all features for basic function in the middle of March based on redis in multi node environment
09:36:18 <oanson> gampel, testing is next topic
09:36:28 <oanson> Lets focus on bugs, and then we can get to it :)
09:36:30 <gampel> Oops
09:36:30 <Frank_Duan> Gampel, we only tested redis
09:36:38 <Frank_Duan> and will test zmq later
09:36:45 <gsagie> okie lets go to testing
09:37:10 <gampel> who is taking the SG bug
09:37:20 <gsagie> go to testing
09:37:34 <Frank_Duan> You are assign the SG bug to Yuanwei
09:37:45 <Frank_Duan> You can
09:37:57 <oanson> #action yuanwei to review SG bugs
09:38:02 <oanson> yuli_s, please assign the bugs
09:38:07 <yuli_s> ok
09:38:26 <oanson> Any other bug-related topics?
09:38:30 <yuli_s> nop
09:38:38 <oanson> #topic Testing
09:38:47 <oanson> We have to make sure all features are tested
09:38:58 <oanson> including multinode, using redis and ZMQ
09:39:10 <oanson> and east-west traffic on L2 and L3 networks
09:39:24 <gampel> and Distributed  DNAT
09:39:29 <oanson> Yes, including dnat
09:39:38 <oanson> And SG
09:39:49 <oanson> And if we can, also reliability
09:40:28 <oanson> It is important that any bugs found are uploaded to launchpad, possibly with a mail to yuli_s who is our bug-master
09:40:37 <Shlomo_N> sure
09:41:03 <raofei> sure
09:41:15 <gampel> Shlomo_N: Can you share the status of your performance testing so far
09:41:44 <Shlomo_N> yes, sure
09:42:33 <Shlomo_N> Last week I have tested for 4 scenarios: E-W (L2 and L3), N-S and SNAT
09:43:26 <Shlomo_N> After tweaking the Linux kernel, I have managed to get near 90% line utilization
09:43:41 <Shlomo_N> Here are the results:
09:43:42 <Shlomo_N> L2	VM	VM	7.39Gb/s
09:43:45 <Shlomo_N> oops
09:44:00 <Shlomo_N> L2: 7.39Gb/s
09:44:11 <Shlomo_N> L3: 7.24Gb/s
09:44:21 <Shlomo_N> North-South: 6.98Gb/s
09:44:29 <gampel> is it cross node
09:44:32 <gampel> ?
09:44:34 <Shlomo_N> SNAT: 8.38Gb/s
09:44:43 <Shlomo_N> Yes, all tests are cross node only
09:44:52 <oanson> On a single 10G line?
09:44:59 <nick-ma> geneve or vxlan?
09:45:17 <Shlomo_N> Yep, the lab I'm using is based on 10Gb/s and vxlan
09:45:20 <gampel> did you get the line rate with multi VM traffic
09:45:21 <gampel> ?
09:45:30 <Frank_Duan> Shlomo, do you have the performance data of neutron + ovs agent?
09:46:07 <Shlomo_N> I got near line rate with multi vm traffic
09:46:29 <nick-ma> why does snat get more bandwidth/sec? i don't understand.
09:46:31 <Shlomo_N> Frank_Duan: I have, but it's based on 1Gb/s lab
09:47:00 <gampel> Frank_Duan: I think next will be to test with SG and then DVR
09:47:08 <oanson> Shlomo_N: Do you also have CPU usage statistics for these tests? Is the bottleneck now the line width?
09:47:28 <Shlomo_N> nick-ma: probably because we have only single VMSwitch in the way
09:47:59 <gampel> Shlomo_N:  i agree with nick-ma: why are the SNAT and DNAT not the same? it is all centralized, right?
09:48:07 <Shlomo_N> oanson: the bottleneck is not the line bandwidth
09:48:57 <Frank_Duan> Yes, gampel. we also need to compare them with performance data of neutron + ovs agent
09:48:59 <Shlomo_N> gampel: yes. DNAT wasn't tested
09:49:21 <gampel> you said North-South: 6.98Gb/s  and SNAT: 8.38Gb/s
09:49:47 <Shlomo_N> yes
09:49:56 <gampel> Frank_Duan:  yes next we will do reference implementation with DVR
09:50:18 <Shlomo_N> probably because we have only single VMSwitch in the SNAT scenario
09:50:22 <Frank_Duan> We can do this in Beijing
09:50:36 <gampel> That will be great help
09:50:55 <gampel> shlomo is working on automating the testing you could use his work
09:51:24 <gampel> Shlomo_N: will you be able to upload it to the DF repository
09:51:28 <Shlomo_N> Frank_Duan: how many servers you have there?
09:51:40 <Shlomo_N> Sure, I will
09:51:57 <Frank_Duan> we have 3 servers with 10GE interfaces
09:52:15 <Frank_Duan> other servers only have 1ge ports.
09:52:28 <Shlomo_N> So we have bigger lab here
09:52:35 <Shlomo_N> :-)
09:53:42 <Shlomo_N> Anyone have 40Gb/s lab anywhere?
09:53:57 <oanson> #action Shlomo_N upload performance tests to repository
09:54:19 <Shlomo_N> 10x oanson
09:54:22 <gampel> Shlomo_N:  maybe they can do the DVR OVS test and you could focus on the SG automation
09:54:32 <oanson> Shlomo_N, no problem :)
09:54:44 <Shlomo_N> gampel: ok
09:55:07 <oanson> All right. Anything else on testing?
09:55:40 <oanson> #topic open discussion
09:56:21 <nick-ma> i suggest we vote on py34 gate.
09:56:50 <scsnow__> gampel: did you have a chance to look for bug I could work on?
09:56:50 <oanson> nick-ma: second
09:57:06 <nick-ma> it was working and broken and fixed and working then.
09:57:12 <gampel> please look at the DF Austin topics and add suggestions https://etherpad.openstack.org/p/dragonflow-design-summit
09:57:14 <gampel> +1
09:57:23 <nick-ma> i fixed it two times.
09:57:29 <gampel> you mean to make it voting
09:57:38 <nick-ma> yes.
09:58:20 <gampel> scsnow__: please contact yuli, he is the bug master, but if you do some testing you will find bugs, we are in integration time
09:58:24 <oanson> I think it's important. We should be able to automatically move over to py34
09:58:39 <gampel> is it working now
09:58:50 <nick-ma> of course it is.
09:58:53 <oanson> It appears stable. I looked back a few reviews.
09:58:59 <oanson> Thanks to nick-ma :)
09:58:59 <gampel> scsnow__: will you be in Austin
09:59:11 <gampel> thx yes +1 on make it voting
09:59:28 <scsnow__> gampel: no :(
09:59:34 <oanson> So it's agreed?
09:59:43 <nick-ma> :-)
10:00:08 <oanson> gampel, nick-ma: Who gets to do it?
10:00:12 <gampel> yuli: can you help scsnow__ find an easy bug to work on
10:00:29 <gampel> I will do it no problem
10:00:44 <oanson> #action gampel make gate test py34 voting
10:00:50 <yuli_s> gampel, sure
10:00:57 <nick-ma> scsnow yuli_s: there are lots of simple tasks in wishlist.
10:00:59 <oanson> The fullstack tests also seem to be getting there, but not ready yet.
10:01:15 <oanson> The following error repeats: Exception: VM is not deleted
10:01:27 <oanson> I will open a bug, and if I have time I will look into it.
10:01:52 <gampel> Yes i think making the fullstack stable is very important!
10:01:54 <yuli_s> nick-ma, yup, you are right
10:02:07 <oanson> #action oanson make fullstack more stable
10:02:16 <gampel> time is up thank you everyone
10:02:25 <nick-ma> thanks all.
10:02:32 <scsnow__> bb
10:02:35 <oanson> Thank you.
10:02:42 <oanson> #endmeeting