09:00:30 <gsagie> #startmeeting dragonflow
09:00:31 <openstack> Meeting started Mon Feb 22 09:00:30 2016 UTC and is due to finish in 60 minutes.  The chair is gsagie. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:32 <gsagie> Hello all
09:00:34 <openstack> The meeting name has been set to 'dragonflow'
09:00:38 <gsagie> Who is here for Dragonflow meeting :)?
09:00:47 <gampel> Hi everyone
09:01:01 <dingboopt> o/
09:01:09 <oanson> Hi
09:01:41 <gsagie> gampel: if you can please ping DuanKebo
09:01:47 <gampel> ok i will
09:01:47 <nick-ma> hi
09:02:14 <gsagie> #info gampel, nick-ma, oanson, dingboopt, gsagie in meeting
09:02:24 <gsagie> yuli_s. Shlomo here?
09:02:32 <yuli_s> Yes
09:02:35 <Shlomo_N> yes
09:02:35 <gsagie> #info yuli_s in meeting :)
09:02:47 <gsagie> #info Shlomo_N in meeting :):):):):):)
09:02:57 <Shlomo_N> :0
09:03:19 <gsagie> lets wait a minute or two for DuanKebo and we can start
09:03:41 <gsagie> #topic announcements
09:04:17 <gsagie> As we mentioned last week, Dragonflow submitted a request to be added as a big-tent project
09:04:47 <gsagie> gampel will be our PTL for the first round; the discussion about it in the TC meeting is supposed to happen tomorrow
09:05:08 <gampel> we will have standard election in April
09:05:27 <gampel> so this is only intermediate stage
09:05:32 <gsagie> #info https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee  TC irc meeting
09:05:39 <gsagie> if anyone is interested in joining
09:05:48 <gsagie> dingboopt: is raofei here, btw?
09:07:05 <gsagie> another announcement: we decided and voted to have nick-ma join the Dragonflow core team. nick-ma has been a huge help for us in reviews and patches and is leading our DB-consistency efforts, and i am sure he will continue to be a great help to Dragonflow
09:07:12 <gsagie> so welcome to the team nick-ma
09:07:38 <gampel> :) welcome nick-ma
09:07:46 <nick-ma> thanks.
09:07:47 <dingboopt> i just notified him
09:07:59 <Shlomo_N> welcome nick-ma
09:08:02 <gsagie> #topic security groups and port security
09:08:17 <gsagie> dingboopt: the stage is yours
09:08:23 <gsagie> keep everyone in the loop :)
09:08:53 <dingboopt> I wrote most of the code
09:08:59 <dingboopt> and tested it, by the way
09:09:07 <dingboopt> everything looks ok
09:09:14 <gsagie> okie, so by next week we should have most/all of the code merged right?
09:09:24 <dingboopt> yes
09:09:31 <gsagie> we also would like to do performance testing for it (Shlomo can assist with this)
09:09:38 <dingboopt> ok
09:09:39 <gsagie> to see the overhead of security groups
09:09:46 <dingboopt> i see
09:09:48 <gsagie> okie great, good job on that front
09:10:06 <gsagie> #action dingboopt upload all security groups app code for review
09:10:19 <gsagie> we also need to test it and see that it works in all cases
09:10:26 <gsagie> including multi node setups
09:10:52 <dingboopt> ok
09:10:57 <gampel> dingboopt: we need to make sure we add fullstack test
09:11:03 <gampel> for sg
09:11:04 <dingboopt> I see
09:11:11 <gsagie> okie great, please let us know if you encounter any problem or need our help
09:11:31 <dingboopt> will keep all related people updated
09:11:42 <gampel> please look at the patch omer sent for ARP responder testing
09:11:46 <gsagie> i want to see the overall pipeline once it's merged; we are going to have many changes to the pipeline now with all the new features and we need to make sure
09:11:52 <gsagie> everything integrates well together
09:12:10 <gsagie> yes, you can talk with oanson about the new testing infrastructure added
09:12:16 <dingboopt> yes
09:12:18 <gsagie> will be good to add security group and port security tests using that
09:12:19 <gsagie> ok
09:12:24 <dingboopt> I see
09:12:31 <gsagie> thank you dingboopt
09:12:35 <gsagie> #topic DB consistency
09:12:40 <nick-ma> ok
09:12:40 <gsagie> nick-ma: the stage is yours
09:12:59 <nick-ma> i uploaded the source code that I wrote over the last several days and updated the spec to reflect the design.
09:13:14 <nick-ma> the current code is under manual testing.
09:13:44 <dingboopt> Only a version number, no journal db table?
09:13:47 <gsagie> #info DB consistency patch https://review.openstack.org/#/c/282290/
09:14:00 <nick-ma> version number and SQL lock db table.
09:14:11 <gsagie> #info DB consistency spec https://review.openstack.org/#/c/268005/
09:14:23 <dingboopt> agree
09:14:23 <gsagie> hi raofei, welcome! :)
09:14:31 <nick-ma> we introduce one db table for versioned objects and another db table for the distributed lock; all implemented in SQL.
09:14:40 <nick-ma> you can check the updated spec.
09:14:44 <gsagie> dingboopt: would be great if you can review it as well and provide feedback
09:14:57 <dingboopt> yes
09:15:07 <dingboopt> will do
09:15:19 <gsagie> nick-ma: ok sounds great, anything you would like oanson to help with?
09:15:26 <gsagie> err any task
09:15:28 <nick-ma> in testing, I concurrently created 10 subnets and it works. But it may bring overhead on the SQL backend due to the isolation level problem
09:16:40 <nick-ma> neutron by default sets the isolation level to REPEATABLE READ, but that is not good for SQL lock statements. the MySQL documentation suggests using READ COMMITTED, which is the default in PostgreSQL, but the neutron core team refused in a mailing-list discussion. this problem also exists for the nova team.
09:17:40 <nick-ma> there is no solution upstream because this is a common problem for almost all projects.
09:17:48 <gampel> i see. what's the process to change that? is it in the code or in configuration?
09:18:32 <dingboopt> changing the isolation level may cause problems for existing neutron code
09:18:36 <nick-ma> code conversion. lots of db code assumes REPEATABLE READ.
09:18:41 <nick-ma> yes
09:18:44 <nick-ma> dingbo is right.
09:18:54 <gsagie> you mean the separation of network and subnet?
09:18:54 <gsagie> into two objects in the dragonflow model
09:19:06 <gampel> OK, there are some connectivity issues for gal
09:19:20 <nick-ma> no, currently i don't change the schema.
09:19:30 <dingboopt> transaction isolation level
09:19:54 <nick-ma> i think if the code works for the current schema, it will also work for schema changes at a later stage.
09:20:03 <gampel> next stage is to do the compare and swap? will the DB setting affect the performance then?
09:21:11 <gampel> nick-ma: you mean the effect is when you lock the version table
09:21:18 <nick-ma> compare and swap depends on which DB backend we use. etcd provides compare-and-swap on values; zookeeper provides compare-and-swap on its internal version. if we need it, i can provide another spec on that.
09:22:00 <nick-ma> yes, lock statements are working, but not optimal.
09:22:05 <gampel> I see, I think that the end goal is to do the compare and swap
09:22:33 <gampel> but i agree we need to do small steps and make sure we understand the problems
09:22:43 <nick-ma> another test is also needed for Galera clustering to demonstrate the sql lock mechanism.
09:23:29 <gampel> Ok, we will try to set it up here. are you using Galera clustering in your solution?
09:23:29 <nick-ma> i'll do it this week and try to finish the current work using sql lock to protect concurrent writes on DF DB.
09:24:00 <gampel> thats great i will try to review your patch today
09:24:04 <nick-ma> no, i'll evaluate and test the code under Galera to see the effect.
09:24:34 <nick-ma> you can review them tomorrow. i'll update it this evening.
09:24:39 <gsagie> ok sorry
09:24:41 <gampel> Ok will do
09:24:45 <gsagie> we have internet problems here :)
09:24:59 <gsagie> gampel, nick-ma: problem solved?
09:25:06 <gsagie> err i mean topic :)
09:25:26 <gsagie> #topic publish-subscribe
09:25:37 <gsagie> anything open on DB consistency?
09:25:42 <gampel> I think we covered this topic, with action items for me and nick-ma to test under Galera
09:25:45 <yuli_s2> I am back
09:25:58 <gsagie> #action nick-ma, gampel test DB consistency under Galera
09:26:04 <nick-ma> yes. if you have any problems or questions, please review my spec.
09:26:10 <gsagie> okie great
09:26:29 <gampel> and we need to better understand the effect of the DB isolation setting, REPEATABLE READ vs READ COMMITTED
09:26:36 <nick-ma> i'll make sure the spec and code are consistent.
09:26:45 <gsagie> gampel: anything on the publish-subscribe?
09:26:53 <gampel> we merged the API
09:27:06 <gsagie> too bad DuanKebo is not here, as we need to sync regarding redis :(
09:27:14 <gampel> currently all the publishing is done via the neutron server
09:27:15 <gsagie> we will schedule a meeting for tomorrow to discuss it
09:27:31 <gampel> he told me he will join shortly
09:27:34 <gsagie> okie great, oanson is adding a monitoring thread
09:27:52 <gsagie> oanson: hi, just in time :)
09:27:57 <gampel> yes we are adding a publisher service
09:28:02 <gsagie> would you like to describe the monitoring thread for tables?
09:28:18 <oanson> Sure
09:28:34 <oanson> The plan is to add a separate service, i.e. separate process, for publishing info.
09:28:49 <gsagie> #info pub-sub service https://review.openstack.org/#/c/281336/
09:28:53 <oanson> On this service, we plan to run a thread monitoring the distributed db for changes
09:29:09 <gsagie> #info pub-sub api https://review.openstack.org/#/c/282879/
09:29:15 <oanson> This means that the monitor will only run once on a neutron server, and not once per CPU/core
09:29:49 <oanson> A service is necessary in most cases, since binding the socket (for TCP/IP communications) can only be done from a single process
09:30:04 <gampel> it is important to note that q-svc is forked per core, so currently we run it multiple times on the same machine
09:30:38 <oanson> We'll make the inter-process and network communications pluggable, so in case a published event doesn't need to go through the service (e.g. UDP), it can bypass it and go directly out the network
09:30:51 <oanson> In that case, the service will only be for DB table monitoring.
09:31:04 <gsagie> okie, good work oanson, you believe it can be done this week? we do need to see how we can help on the redis part as well, to have it act similarly to this
09:31:21 <oanson> Definitely will be up for review this week
09:31:27 <gsagie> #action oanson pub/sub service patch for review
09:31:30 <gsagie> ok great, thanks
09:31:36 <gampel> we currently plan to call it the publisher service; if there are any other suggestions that would be great
09:31:51 <gsagie> db-publisher-service
09:32:05 <gampel> sound good to me
09:32:11 <gsagie> publisher service sounds good either way
09:32:33 <gsagie> gampel: anything else open on this front? we do need to start testing it
09:32:36 <oanson> df-publisher-service, in case it ever publishes non-db related events
09:32:41 <gsagie> more often
09:32:52 <gampel> yes i need to add the service discovery table for all the publishers
09:33:12 <gsagie> #action gampel add service discovery DF table for publishers
09:33:13 <gampel> and I will address all the comments on the spec, please review
09:33:24 <gsagie> yes, in general i think we want to have all/most of our configuration from the DB
09:33:30 <gsagie> less use of configuration files
09:33:39 <gsagie> will make updates/upgrades/deployments easier
09:33:46 <gsagie> and of course management
09:33:53 <gampel> #info: https://review.openstack.org/#/c/263733/
09:34:10 <gampel> yes
09:34:12 <gsagie> #info pub-sub spec https://review.openstack.org/#/c/263733/
09:34:16 <gsagie> please review everyone
09:34:32 <gsagie> #topic distributed DNAT
09:34:38 <gsagie> raofei: the stage is yours :)
09:34:45 <gsagie> any update?
09:35:06 <gsagie> #info dnat patch https://review.openstack.org/#/c/279434/
09:35:34 <raofei> This week I will start the DNAT app coding, maybe i can commit it this week.
09:35:54 <gsagie> raofei: okie great, regarding the spec you sent, maybe consider changing the name
09:36:06 <raofei> sorry, I'm attending a meeting with somebody just now.
09:36:11 <gsagie> to make the title more specific, as we already have a distributed dnat spec
09:36:24 <gsagie> raofei: okie np, lets take this offline and on the review
09:36:37 <raofei> OK, please commit your comments, thanks.
09:36:41 <gsagie> #topic selective proactive
09:36:46 <gsagie> DuanKebo here?
09:37:06 <gampel> he is in another meeting, let me check
09:37:30 <gsagie> the spec was merged; we talked and decided not to move the functionality out of the controller
09:37:57 <gsagie> but to have a separate module just for the selective/proactive part, and then when everything is done maybe we can consider refactoring
09:38:09 <gampel> OK
09:38:12 <gsagie> the reason is that there are many people working on the same code and we don't want to introduce major
09:38:23 <gsagie> code refactorings that are not needed. personally i don't think they are right,
09:38:31 <gsagie> but we can re-evaluate when everything is done
09:38:50 <gsagie> the OVSDB monitor events are sent to the DB changes queue and the controller will react accordingly
09:39:03 <gampel> yes we have a lot of patch integration on the way
09:39:11 <gsagie> anyone has anything to add on selective proactive?
09:39:32 <gsagie> we will discuss with DuanKebo tomorrow; anyone who wants to join is welcome, come to #openstack-dragonflow
09:39:56 <gsagie> ok, for controller reliability no one is here to talk about it
09:40:02 <gsagie> #topic controller reliability
09:40:15 <gsagie> we will need to make sure this is all good and works with current L3
09:40:24 <gsagie> application as it changes cookie for flows
09:40:42 <gsagie> i think we should set the L3 proactive app as default for now
09:40:47 <gsagie> any thoughts?
09:41:00 <gampel> what's the namespace needed for the L3? how many bits?
09:41:41 <gsagie> there is one more optimization needed in the L3 app, and that's not using flows from every subnet to every subnet but just
09:41:55 <gampel> the cookie  is 64 bit, if we use 32 bit for the reliability is it enough ?
09:42:01 <gsagie> classifying the router id (giving it a register) and then having one flow per router interface
09:42:32 <gsagie> gampel: for reliability even less is enough; it basically only needs to be 2-3 values
09:42:37 <raofei> yes, I think so, using router id instead of subnet
09:42:40 <gsagie> 2-3 options
09:42:41 <gampel> yes, we discussed this in the past. how big an effort do you think it is?
09:42:56 <gsagie> gampel: not big at all we already have everything in place
09:43:00 <gsagie> i will start working on it
09:43:36 <gsagie> #action gsagie enhance L3 apps to use one flow per router interface based on router id
09:43:48 <gsagie> then, i think we dont even need the cookie
09:43:52 <gsagie> for the L3 proactive app
09:44:17 <oanson> gsagie, you mean one flow from every subnet to every router? Or one flow from every subnet to each router, and then flows between routers?
09:44:29 <oanson> to its* router, in the second question.
09:44:58 <gsagie> oanson: regardless of proactive/reactive, i mean the flows at the first table, we will only need one flow per router interface
09:45:12 <gsagie> right now we have flows from each router interface to all the other interfaces to check there is a route
09:45:26 <gsagie> for proactive, we will have to detect the destination port in the next table
09:45:39 <gsagie> we are running out of time so lets continue this offline
09:45:43 <gsagie> Hi DuanKebo :)
09:45:46 <DuanKebo> Hi
09:45:52 <gsagie> we wont have enough time for all topics
09:45:55 <DuanKebo> Sorry for being late
09:45:56 <gsagie> #topic RedisDB
09:46:20 <gsagie> DuanKebo: np, we really feel we need to sync regarding redis DB implementation, both the DB and the pub/sub
09:46:22 <gampel> DuanKebo ?  are you here
09:46:28 <gsagie> as its going to be the default implementation
09:46:37 <DuanKebo> Eran has posted some comments about redis driver
09:46:56 <DuanKebo> Feipeng will discuss them with Eran
09:47:14 <gampel> yes  he can ping me any time
09:47:39 <gsagie> okie, we need to see whether me or oanson can help with any task
09:47:42 <gsagie> or someone else
09:48:05 <gsagie> #action gsagie, gampel, DuanKebo, feipeng sync regarding redis db efforts and make sure we are on schedule
09:48:47 <DuanKebo> We also need to modify the df plugin to support redis db
09:48:49 <gampel> yes make sure we are aligned with the db-publisher-service idea
09:49:18 <DuanKebo> Yes, it will use the sub-pub framework
09:49:41 <nick-ma> does redis provide build-in pub/sub?
09:49:52 <DuanKebo> Yes
09:49:58 <DuanKebo> it does.
09:50:21 <nick-ma> ok, get it. thanks.
09:50:27 <gsagie> yes it has some nice features regarding selective proactive
09:50:41 <gsagie> it also filters things in the publisher, which is good
09:51:08 <gsagie> okie
09:51:10 <DuanKebo> Hi gal, we need a "tenant_id" param in the northbound db interfaces.
09:51:11 <nick-ma> okie
09:51:31 <gampel> we need to test it at scale; zmq has an XPUB publisher socket but it does not scale when doing publisher-side filtering
09:51:52 <gsagie> DuanKebo: i am adding this part right now, what do you see as missing?
09:52:36 <gsagie> #action test redis pub/sub with publisher side filtering
09:52:39 <gampel> in ZMQ we currently use PUB, where filtering is on the client side (we need to test the trade-off in network/server load)
09:52:39 <DuanKebo> Great
09:53:20 <gsagie> DuanKebo: i added it to db_api, i think it's already there, and the selective proactive will also need to call the pub/sub "register_topic" with new tenant ids
09:53:30 <gsagie> to register for pub/sub, in addition to pulling the correct data from the DB
09:53:58 <gsagie> our Neutron plugin should already populate all objects with another parameter called "topic" which we pass to the DB_API
09:54:03 <gsagie> thats the tenant id
09:54:06 <DuanKebo> Yes, it will call register_topic and unregister_topic
09:54:15 <gsagie> okie great
09:54:20 <gampel> yes and unregister when deleting the Tenant VMs
09:54:41 <DuanKebo> In addition to the sub/pub interface,
09:54:50 <gsagie> we don't have time to talk about testing/bugs, but let's agree to put it first next meeting. Shlomo_N is doing some good work with the community so we can compare results with DVR
09:54:58 <gsagie> and understand L3 performance
09:54:59 <DuanKebo> the query interface also needs a tenant_id param.
09:55:12 <gsagie> yuli_s will soon start working on control plane testing and stress tests
09:55:33 <Shlomo_N> thanks gsagie :)
09:55:37 <gsagie> DuanKebo: what do you mean "query" ? you mean query from the DB? like "get_all_keys"?
09:55:53 <DuanKebo> for example
09:56:02 <gsagie> DuanKebo: in master we already have this
09:56:10 <DuanKebo> get_ports()
09:56:11 <yuli_s2> yup
09:56:11 <gsagie> in the db_api
09:56:24 <gsagie> we called it "topic", by default its None
09:56:29 <gampel> #info:  Yuli control plane spec (WIP) https://review.openstack.org/#/c/282873/
09:56:37 <DuanKebo> OK, i will update my code
09:56:46 <gsagie> #info:  Yuli control plane spec (WIP) https://review.openstack.org/#/c/282873/
09:56:51 <gsagie> cool didnt see this, will review
09:57:02 <gsagie> DuanKebo: ok, please let me know if anything is missing
09:57:06 <gsagie> #topic open discussion
09:57:16 <gsagie> only 1-2 minutes left, anyone wants to add anything? :)
09:57:37 <gsagie> DuanKebo: we will have the meeting tomorrow to further sync regarding redis db and selective proactive
09:57:47 <DuanKebo> OK
09:57:50 <gsagie> so if you guys have any questions..
09:57:59 <DuanKebo> I will book a meeting room in advance.
09:58:07 <DuanKebo> Sorry for the inconvenience.
09:58:13 <gampel> thank you everyone :)
09:58:17 <gsagie> ok, thanks everyone for joining
09:58:24 <nick-ma> in irc?
09:58:38 <Shlomo_N> yes
09:58:42 <gsagie> nick-ma: we will send you the invitation
09:58:46 <nick-ma> okie. thanks.
09:58:48 <gsagie> so you can join in
09:59:11 <nick-ma> thanks, all.
09:59:29 <gsagie> ok thanks everyone!
09:59:31 <gsagie> #endmeeting