09:00:30 #startmeeting dragonflow
09:00:31 Meeting started Mon Feb 22 09:00:30 2016 UTC and is due to finish in 60 minutes. The chair is gsagie. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:32 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:32 Hello all
09:00:34 The meeting name has been set to 'dragonflow'
09:00:38 Who is here for the Dragonflow meeting :)?
09:00:47 Hi everyone
09:01:01 o/
09:01:09 Hi
09:01:41 gampel: if you can, please ping DuanKebo
09:01:47 ok i will
09:01:47 hi
09:02:14 #info gampel, nick-ma, oanson, dingboopt, gsagie in meeting
09:02:24 yuli_s, Shlomo here?
09:02:32 Yes
09:02:35 yes
09:02:35 #info yuli_s in meeting :)
09:02:47 #info Shlomo_N in meeting :):):):):):)
09:02:57 :0
09:03:19 let's wait a minute or two for DuanKebo and we can start
09:03:41 #topic announcements
09:04:17 As we mentioned last week, Dragonflow submitted a request to be added as a big-tent project
09:04:47 gampel will be our PTL for the first round; the discussion about it in the TC meeting is supposed to be tomorrow
09:05:08 we will have a standard election in April
09:05:27 so this is only an intermediate stage
09:05:32 #info https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee TC irc meeting
09:05:39 if anyone is interested to join in
09:05:48 dingboopt: raofei is here btw?
09:07:05 another announcement is that we decided and voted to let nick-ma join the Dragonflow core team. nick-ma has been a huge help for us in reviews and in patches, is leading our DB-consistency efforts, and i am sure he will continue to be a great help to Dragonflow
09:07:12 so welcome to the team nick-ma
09:07:38 :) welcome nick-ma
09:07:46 thanks.
09:07:47 i just notified him
09:07:59 welcome nick-ma
09:08:02 #topic security groups and port security
09:08:17 dingboopt: the stage is yours
09:08:23 keep everyone in the loop :)
09:08:53 I wrote most of the code
09:08:59 and tested it, by the way
09:09:07 everything looks ok
09:09:14 okie, so by next week we should have most/all of the code merged, right?
09:09:24 yes
09:09:31 we also would like to do performance testing for it (Shlomo can assist with this)
09:09:38 ok
09:09:39 to see the overhead of security groups
09:09:46 i see
09:09:48 okie great, good job on that front
09:10:06 #action dingboopt upload all security groups app code for review
09:10:19 we also need to test it and see it works for all cases
09:10:26 including multi-node setups
09:10:52 ok
09:10:57 dingboopt: we need to make sure we add a fullstack test
09:11:03 for sg
09:11:04 I see
09:11:11 okie great, please let us know if you encounter any problem or need our help
09:11:31 keep all related people updated
09:11:42 please look at the patch omer sent for ARP responder testing
09:11:46 i want to see the overall pipeline once it's merged, we are going to have many changes to the pipeline now with all the new features and we need to make sure
09:11:52 everything integrates well together
09:12:10 yes, you can talk with oanson about the new testing infrastructure added
09:12:16 yes
09:12:18 it will be good to add security group and port security tests using that
09:12:19 ok
09:12:24 I see
09:12:31 thank you dingboopt
09:12:35 #topic DB consistency
09:12:40 ok
09:12:40 nick-ma: the stage is yours
09:12:59 i uploaded the source code that I wrote over the last several days and updated the spec to reflect the design.
09:13:14 the current code is under manual testing.
09:13:44 Only a version number, no journal db table?
09:13:47 #info DB consistency patch https://review.openstack.org/#/c/282290/
09:14:00 version number and SQL lock db table.
09:14:11 #info DB consistency spec https://review.openstack.org/#/c/268005/
09:14:23 agree
09:14:23 hi raofei, welcome! :)
09:14:31 introduce one db table for versioned objects and another db table for the distributed lock. all are implemented in SQL.
09:14:40 you can check the updated spec.
09:14:44 dingboopt: it would be great if you can review it as well and provide feedback
09:14:57 yes
09:15:07 will do
09:15:19 nick-ma: ok sounds great, any task you would like to have oanson help with?
09:15:28 the testing result is that I concurrently create 10 subnets and it works. But it may bring overhead on the sql backend due to the isolation level problem
09:16:40 neutron by default sets the isolation level to REPEATABLE READ, but that is not good for SQL lock statements. the official mysql docs suggest using READ COMMITTED, which is the default in postgresql, but the neutron core team refused during some mailing list discussion. this problem also exists in the nova team.
09:17:40 there is no solution upstream because this is a common problem for almost all the projects.
09:17:48 i see, what's the process to change that? is it in the code or configuration?
09:18:32 changing the isolation level may cause problems for existing neutron code
09:18:36 code conversion. lots of db code assumes REPEATABLE READ.
09:18:41 yes
09:18:44 dingbo is right.
09:18:54 you mean the separation of network and subnet?
09:18:54 into two objects in the dragonflow model
09:19:06 OK, there are some connectivity issues for gal
09:19:20 no, currently i don't change the schema.
09:19:30 transaction isolation level
09:19:54 i think if the code is working for the current schema, it will also work for changes in a later stage.
09:20:03 the next stage is to do the compare and swap? will the DB setting then affect the performance?
09:21:11 nick-ma: you mean the effect is when you lock the version table
09:21:18 compare and swap depends on what DB backends we use. etcd provides compare-and-swap for values; zookeeper provides compare-and-swap for its internal version. if we need it, i can provide another spec on that.
09:22:00 yes, lock statements are working, but not optimal.
09:22:05 I see, I think that the end goal is to do the compare and swap
09:22:33 but i agree we need to take small steps and make sure we understand the problems
09:22:43 another test is also needed for Galera clustering to demonstrate the sql lock mechanism.
09:23:29 Ok, we will try to set it up here. you are using Galera clustering in your solution?
09:23:29 i'll do it this week and try to finish the current work using the sql lock to protect concurrent writes on the DF DB.
09:24:00 that's great, i will try to review your patch today
09:24:04 no, i'll evaluate and test the code under Galera to see the effect.
09:24:34 you can review them tomorrow. i'll update it this evening.
09:24:39 ok sorry
09:24:41 Ok will do
09:24:45 we have internet problems here :)
09:24:59 gampel, nick-ma: problem solved?
09:25:06 err i mean topic :)
09:25:26 #topic publish-subscribe
09:25:37 anything open on DB consistency?
09:25:42 I think we covered this topic, with action items for me and nick-ma to test under Galera
09:25:45 I am back
09:25:58 #action nick-ma, gampel test DB consistency under Galera
09:26:04 yes. if you have any problems or questions, please review my spec.
09:26:10 okie great
09:26:29 and we need to better understand the effect of the DB isolation setting, REPEATABLE READ vs READ COMMITTED
09:26:36 i'll make sure the spec and code are consistent.
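[Editor's note: a minimal sketch of the versioned-object/SQL-lock scheme discussed above, assuming SQLAlchemy. The table and column names are illustrative only, not Dragonflow's actual schema; the point is that SELECT ... FOR UPDATE serializes concurrent writers on the version row, which is where the REPEATABLE READ vs READ COMMITTED concern comes in.]

    # Illustrative sketch only -- not Dragonflow's actual schema or API.
    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()

    class ObjectVersion(Base):
        # One row per logical object; the version bumps on every write.
        __tablename__ = 'df_object_versions'
        object_id = Column(String(36), primary_key=True)
        version = Column(Integer, nullable=False, default=0)

    engine = create_engine('sqlite://')  # stand-in for the MySQL backend
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)

    def bump_version(session, object_id):
        # SELECT ... FOR UPDATE takes a row lock so concurrent writers
        # queue up instead of racing; the MySQL docs recommend READ
        # COMMITTED for such locking reads, while Neutron defaults to
        # REPEATABLE READ -- the conflict raised in the meeting.
        row = (session.query(ObjectVersion)
               .filter_by(object_id=object_id)
               .with_for_update()
               .one_or_none())
        if row is None:
            row = ObjectVersion(object_id=object_id, version=0)
            session.add(row)
        row.version += 1
        return row.version

    session = Session()
    with session.begin():
        print('new version:', bump_version(session, 'subnet-1234'))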
09:26:45 gampel: anything on the publish-subscribe?
09:26:53 we merged the API
09:27:06 too bad DuanKebo is not here, as we need to sync regarding redis :(
09:27:14 currently all the publishing is done via the neutron server
09:27:15 we will schedule a meeting for tomorrow to discuss it
09:27:31 he told me he will join shortly
09:27:34 okie great, oanson is adding a monitoring thread
09:27:52 oanson: hi, just in time :)
09:27:57 yes, we are adding a publisher service
09:28:02 would you like to describe the monitoring threads
09:28:07 for tables
09:28:10 ?
09:28:18 Sure
09:28:34 The plan is to add a separate service, i.e. a separate process, for publishing info.
09:28:49 #info pub-sub service https://review.openstack.org/#/c/281336/
09:28:53 On this service, we plan to run a thread monitoring the distributed db for changes
09:29:09 #info pub-sub api https://review.openstack.org/#/c/282879/
09:29:15 This means that the monitor will only run once on a neutron server, and not once per CPU/core
09:29:49 A service is necessary in most cases, since binding the socket (for TCP/IP communications) can only be done from a single process
09:30:04 it is important to note that q-svc is forked per core, so currently we run it multiple times on the same machine
09:30:38 We'll make the inter-process and network communications pluggable, so in case a published event doesn't need to go through the service (e.g. UDP), it can bypass it and go directly out to the network
09:30:51 In that case, the service will only be for DB table monitoring.
09:31:04 okie, good work oanson, do you believe it can be done this week? we do need to see how we help on the redis part as well, to have it act similar to this
09:31:21 Definitely will be up for review this week
09:31:27 #action oanson pub/sub service patch for review
09:31:30 ok great, thanks
09:31:36 we currently think to call it "publisher service"; if there are any other suggestions that would be great
09:31:51 db-publisher-service
09:32:05 sounds good to me
09:32:11 publisher service sounds good either way
09:32:33 gampel: anything else open on this front? we do need to start testing it
09:32:36 df-publisher-service, in case it ever publishes non-db related events
09:32:41 more often
09:32:52 yes, i need to add the service discovery table for all the publishers
09:33:12 #action gampel add service discovery DF table for publishers
09:33:13 and will address all the comments on the spec, please review
09:33:24 yes, in general i think we want to have all/most of our configuration in the DB
09:33:30 less use of configuration files
09:33:39 it will make updates/upgrades/deployments easier
09:33:46 and of course management
09:34:10 yes
09:34:12 #info pub-sub spec https://review.openstack.org/#/c/263733/
09:34:16 please review everyone
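[Editor's note: a minimal sketch of the publisher-service idea oanson describes above — one separate process running a single monitor thread that watches the DB for changes and publishes them, instead of one publisher per forked q-svc worker. The Publisher/monitor interfaces here are hypothetical stand-ins, not the actual Dragonflow pub/sub API.]

    # Illustrative sketch only -- interfaces are stand-ins, not the
    # actual Dragonflow pub/sub API.
    import threading
    import time

    class Publisher:
        # The transport is meant to be pluggable (ZMQ, Redis, UDP, ...);
        # printing stands in for a real network send.
        def publish(self, topic, event):
            print('publish %s: %r' % (topic, event))

    class TableMonitor(threading.Thread):
        def __init__(self, get_all_rows, publisher, poll_interval=1.0):
            super().__init__(daemon=True)
            self._get_all_rows = get_all_rows  # callable: {key: version}
            self._publisher = publisher
            self._poll_interval = poll_interval
            self._seen = {}

        def run(self):
            # Single monitor loop: only one of these runs per server,
            # not one per CPU core like the forked q-svc workers.
            while True:
                current = self._get_all_rows()
                for key, version in current.items():
                    if self._seen.get(key) != version:
                        self._publisher.publish(
                            'db-changes', {'key': key, 'version': version})
                self._seen = current
                time.sleep(self._poll_interval)

    # Usage: a fake table whose contents change under the monitor.
    table = {'lport-1': 1}
    TableMonitor(lambda: dict(table), Publisher(), poll_interval=0.05).start()
    time.sleep(0.1)
    table['lport-1'] = 2   # simulated DB update -> one new event published
    time.sleep(0.1)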
09:34:32 #topic distributed DNAT
09:34:38 raofei: the stage is yours :)
09:34:45 any update?
09:35:06 #info dnat patch https://review.openstack.org/#/c/279434/
09:35:34 This week I will start the DNAT app coding; maybe i can commit it this week.
09:35:54 raofei: okie great, regarding the spec you sent, maybe consider changing the name
09:36:06 sorry, I'm attending a meeting with somebody just now.
09:36:11 to be more specific for the title, as we already have a distributed dnat spec
09:36:24 raofei: okie np, let's take this offline and on the review
09:36:37 OK, please commit your comments, thanks.
09:36:41 #topic selective proactive
09:36:46 DuanKebo here?
09:37:06 he is in another meeting, let me check
09:37:30 the spec was merged; we talked and decided not to move the functionality from the controller
09:37:57 but have a separate module just for the selective/proactive part, and then when everything is done maybe we can consider refactoring
09:38:09 OK
09:38:12 the reason is there are many people working on the same code and we don't want to introduce major
09:38:23 code refactorings that are not needed; personally i don't think they are right
09:38:31 but we can re-look when everything is done
09:38:50 the OVSDB monitor events are sent to the DB changes queue and the controller will react accordingly
09:39:03 yes, we have a lot of patch integration on the way
09:39:11 anyone has anything to add on selective proactive?
09:39:32 we will discuss with DuanKebo tomorrow, anyone that wants to join is welcome, come to #openstack-dragonflow
09:39:56 ok, for controller reliability no one is here to talk about it
09:40:02 #topic controller reliability
09:40:15 we will need to make sure this is all good and works with the current L3
09:40:24 application, as it changes the cookie for flows
09:40:42 i think we should set the L3 proactive app as the default for now
09:40:47 any thoughts?
09:41:00 what's the namespace needed for the L3, how many bits?
09:41:41 there is one more optimization that is needed in the L3 app, and that's not using flows from every subnet to every subnet but just
09:41:55 the cookie is 64 bit; if we use 32 bit for the reliability, is it enough?
09:42:01 classifying the router id (giving it a register) and then having one flow per router interface
09:42:32 gampel: for reliability even less is enough, it basically only needs to be 2-3 numbers
09:42:37 yes, I think so, using router id instead of subnet
09:42:40 2-3 options
09:42:41 yes, we discussed this in the past; how big an effort do you think it is?
09:42:56 gampel: not big at all, we already have everything in place
09:43:00 i will start working on it
09:43:36 #action gsagie enhance L3 apps to use one flow per router interface based on router id
09:43:48 then, i think we don't even need the cookie
09:43:52 for the L3 proactive app
09:44:17 gsagie, you mean one flow from every subnet to every router? Or one flow from every subnet to each router, and then flows between routers?
09:44:29 to its* router, in the second question.
09:44:58 oanson: regardless of proactive/reactive, i mean the flows at the first table, we will only need one flow per router interface
09:45:12 right now we have, from each router interface to all the other interfaces, a check that there is a route
09:45:26 for proactive, we will have to detect the destination port in the next table
09:45:39 we are running out of time so let's continue this offline
09:45:43 Hi DuanKebo :)
09:45:46 Hi
09:45:52 we won't have enough time for all topics
09:45:55 Sorry for being late
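[Editor's note: a sketch of the flow-count reduction behind the #action above. Instead of one flow per (source interface, destination interface) pair, each router interface gets a single classification flow that loads the router id into a register, and the next table resolves the destination. The table numbers and the choice of reg5 are assumptions for illustration, not Dragonflow's actual pipeline.]

    # Illustrative arithmetic: per-pair flows vs per-interface flows.
    def per_pair_flows(interfaces):
        # Current scheme: each interface checks routes to every other one.
        return interfaces * (interfaces - 1)

    def per_interface_flows(interfaces):
        # Proposed scheme: one classification flow per router interface;
        # the destination lookup happens in the next table.
        return interfaces

    for n in (2, 5, 10):
        print('%2d interfaces: %3d flows -> %2d flows'
              % (n, per_pair_flows(n), per_interface_flows(n)))

    # Hypothetical OVS flows (ovs-ofctl syntax) for the proposed scheme;
    # table numbers and reg5 as the router-id register are assumptions.
    classification_flow = (
        'table=0,priority=100,dl_dst=fa:16:3e:00:00:01,'
        'actions=load:0x7->NXM_NX_REG5[],resubmit(,20)')
    routing_flow = (
        'table=20,priority=100,reg5=0x7,ip,nw_dst=10.0.1.0/24,'
        'actions=dec_ttl,mod_dl_src:fa:16:3e:00:00:01,resubmit(,30)')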
09:45:56 #topic RedisDB
09:46:20 DuanKebo: np, we really feel we need to sync regarding the redis DB implementation, both the DB and the pub/sub
09:46:22 DuanKebo? are you here
09:46:28 as it's going to be the default implementation
09:46:37 Eran has posted some comments about the redis driver
09:46:56 Feipeng will discuss them with Eran
09:47:14 yes, he can ping me any time
09:47:39 okie, we need to see if maybe me or oanson can help with any task
09:47:42 or someone else
09:48:05 #action gsagie, gampel, DuanKebo, feipeng sync regarding redis db efforts and see we are on schedule
09:48:47 We also need to modify the df plugin to support redis db
09:48:49 yes, make sure we are aligned with the db-publisher-service idea
09:49:18 Yes, it will use the pub-sub framework
09:49:41 does redis provide built-in pub/sub?
09:49:52 Yes
09:49:58 it does.
09:50:21 ok, got it. thanks.
09:50:27 yes, it has some nice features regarding selective proactive
09:50:41 it also filters things in the publisher, which is good
09:51:08 okie
09:51:10 Hi gal, we need a "tenant_id" parameter in the northbound db interfaces.
09:51:11 okie
09:51:31 we need to test it at scale; zmq has a publisher XPUB socket, but it does not scale when doing publisher-side filtering
09:51:52 DuanKebo: i am adding this part right now, what do you see is missing?
09:52:36 #action test redis pub/sub with publisher side filtering
09:52:39 in ZMQ we currently use PUB, which is client-side filtering (we need to test the trade-off of network vs server load)
09:52:39 Great
09:53:20 DuanKebo: i added it to db_api, i think it's already added, and the selective proactive will also need to call the pub/sub "register_topic" with new tenant ids
09:53:30 to register for pub/sub in addition to bringing the correct DB
09:53:58 our Neutron plugin should already populate all objects with another parameter called "topic" which we pass to the DB_API
09:54:03 that's the tenant id
09:54:06 Yes, it can call register_topic and unregister_topic
09:54:15 okie great
09:54:20 yes, and unregister when deleting the tenant's VMs
09:54:41 In addition to the sub/pub interface,
09:54:50 we don't have time to talk about testing/bugs, but let's agree that next meeting we are going to put it first. Shlomo_N is doing some good work with the community for us to compare results with DVR
09:54:58 and understand L3 performance
09:54:59 the query interface also needs a tenant_id parameter.
09:55:12 yuli_s will soon start working on control plane testing and stress tests
09:55:33 thanks gsagie :)
09:55:37 DuanKebo: what do you mean "query"? you mean query from the DB? like "get_all_keys"?
09:55:53 for example
09:56:02 DuanKebo: in master we already have this
09:56:10 get_ports()
09:56:11 yup
09:56:11 in the db_api
09:56:24 we called it "topic", by default it's None
09:56:29 #info Yuli control plane spec (WIP) https://review.openstack.org/#/c/282873/
09:56:37 OK, i will update my code
09:56:51 cool, didn't see this, will review
09:57:02 DuanKebo: ok, please let me know if anything is missing
09:57:06 #topic open discussion
09:57:16 only 1-2 minutes left, anyone wants to add anything? :)
09:57:37 DuanKebo: we will have the meeting tomorrow to further sync regarding redis db and selective proactive
09:57:47 OK
09:57:50 so if you guys have any questions..
09:57:59 I will book a meeting room in advance.
09:58:07 Sorry for the inconvenience.
09:58:13 thank you everyone :)
09:58:17 ok, thanks everyone for joining
09:58:24 in irc?
09:58:38 yes
09:58:42 nick-ma: we will send you the invitation
09:58:46 okie. thanks.
09:58:48 so you can join in
09:59:11 thanks, all.
09:59:29 ok thanks everyone!
09:59:31 #endmeeting
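[Editor's note: a minimal sketch of the per-tenant topic registration discussed in the RedisDB section above, using Redis's built-in pub/sub via the redis-py package. register_topic/unregister_topic and the "df.<tenant>" channel naming follow the names mentioned in the meeting but are otherwise illustrative, not the actual Dragonflow driver.]

    # Illustrative sketch only -- not the actual Dragonflow Redis driver.
    import json
    import redis

    class RedisSubscriber:
        def __init__(self, host='127.0.0.1', port=6379):
            self._client = redis.StrictRedis(host=host, port=port)
            self._pubsub = self._client.pubsub()

        def register_topic(self, topic):
            # Selective proactive: subscribe only to tenants with VMs on
            # this node, so filtering happens on the broker side.
            self._pubsub.subscribe('df.%s' % topic)

        def unregister_topic(self, topic):
            # Called e.g. when the tenant's last VM leaves this node.
            self._pubsub.unsubscribe('df.%s' % topic)

        def poll(self):
            message = self._pubsub.get_message(timeout=1.0)
            if message and message['type'] == 'message':
                return json.loads(message['data'])

    def publish_update(client, topic, table, key, value):
        # On the Neutron plugin side, "topic" is the tenant id by default.
        client.publish('df.%s' % topic,
                       json.dumps({'table': table, 'key': key, 'value': value}))

    # Usage (requires a reachable Redis server):
    # sub = RedisSubscriber()
    # sub.register_topic('tenant-1234')
    # publish_update(redis.StrictRedis(), 'tenant-1234',
    #                'lport', 'port-1', {'status': 'up'})
    # print(sub.poll())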