09:01:42 #startmeeting dragonflow
09:01:43 Meeting started Mon May 23 09:01:42 2016 UTC and is due to finish in 60 minutes. The chair is gsagie. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:47 The meeting name has been set to 'dragonflow'
09:01:51 Hello everyone,
09:01:54 Hi
09:01:56 Hi
09:01:56 who is here for the meeting?
09:02:20 #info oanson, DuanKebo, hujie, gampel, gsagie in meeting
09:02:26 hi
09:02:30 Hi
09:02:47 hi
09:03:07 Hello !
09:03:12 da jai hao ["hello everyone" in Chinese]
09:03:15 #info scsnow and yuli_s1 also in meeting :)
09:03:16 da jia hao
09:03:24 ;)
09:03:31 #topic Redis HA
09:03:50 i have put some comments on the patch, i think oanson also had some suggestions
09:04:03 oanson: like to share?
09:04:13 I have also put my comments on the patch.
09:04:24 I suggested to use an existing solution like sentinel.
09:04:38 gsagie: Do you have the link on hand?
09:04:55 link:https://review.openstack.org/#/c/298701/13/doc/source/specs/redis_availability.rst
09:04:57 #link https://review.openstack.org/#/c/298701/
09:04:57 one second, Zhu weiqiong is coming
09:05:08 zhuweiqiong said that this is for old implementation, but I didn't have time to investigate further.
09:05:20 I used sentinel with Redis 3.0.7. Seems to work
09:06:18 hello
09:06:29 hi nick-ma
09:06:38 But as I said, I didn't research this further
09:06:42 I think the overall picture seems ok, i wonder few things like why we read/write only to master, i assume this is because of the consistency issues but dont we solve this with the work that nick-ma done?
09:07:50 okie maybe we can continue this later when Zhu weiqiong comes
09:07:54 I think it is decided by redis itself.
09:08:01 you mean the redis clustering management. i didn't research it further either. does it use a kind of election? if so, that makes sense.
09:08:10 why we need to read from the slave?
09:08:30 DuanKebo: If the master fails, the slave acts as a backup
09:08:34 ohh hi zhuweiqiong
09:08:42 hi
09:08:55 oanson had a question regarding the sentinel use in Redis HA, maybe you can clarify please
09:09:07 the Redis cluster is a multi-master cluster, right?
09:09:19 and each master node has its slave node
09:09:34 there is no need to use sentinel
09:09:39 hujie, yes
09:10:38 cluster has already integrated the functions of sentinel
09:11:48 zhuweiqiong, but you say you need to re-implement things such as node failure detection, and master election, which already exist in other solutions
09:13:00 is the Redis solution already written? or just the spec?
09:13:40 zhuweiqiong: i think oanson is just trying to save you some work effort :) but we can continue look at this offline
09:13:53 the only thing i do is to read the cluster topology to update the connections between client and server and notify to re-sync data. Failure detection and master election is done by cluster itself
09:13:59 The code is be reviewing for several weeking
09:14:03 and talk on the review, but after your explanation today on IRC i think it make sense either way
09:14:06 *weeks
09:14:33 zhuweiqiong: ok that sounds reasonable, we will review the patches
09:14:37 can you give links ?
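The topology handling zhuweiqiong describes (failure detection and master election left to Redis Cluster, while the client only re-reads the topology, reconnects, and triggers a re-sync) can be sketched roughly as below. This is an illustrative sketch, not the code under review in 309642: the function names are hypothetical, it assumes the plain-text output format of the Redis CLUSTER NODES command, and it uses the documented Redis Cluster key-to-slot mapping (CRC16 of the key, or of its {hash tag}, modulo 16384). Keys are routed only to the master owning their slot, which matches the "read/write only to master" design asked about above; replicas are skipped.

```python
NUM_SLOTS = 16384  # fixed slot count in Redis Cluster


def crc16_xmodem(data):
    """CRC-16/XMODEM (poly 0x1021, init 0), the checksum Redis Cluster uses."""
    crc = 0
    for byte in bytearray(data):
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key):
    """Return the cluster slot of a key (bytes), honouring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the tag
            key = key[start + 1:end]
    return crc16_xmodem(key) % NUM_SLOTS


def parse_masters(cluster_nodes_text):
    """Return sorted (first_slot, last_slot, (host, port)) tuples for live masters.

    Assumes CLUSTER NODES lines of the form:
    <id> <ip:port[@cport]> <flags> <master> <ping> <pong> <epoch> <link> <slots...>
    """
    ranges = []
    for line in cluster_nodes_text.strip().splitlines():
        fields = line.split()
        addr, flags = fields[1], fields[2].split(",")
        if "master" not in flags or "fail" in flags:
            continue  # replicas own no slots; skip masters marked as failed
        host, port = addr.split("@")[0].rsplit(":", 1)
        for slot_spec in fields[8:]:
            if slot_spec.startswith("["):
                continue  # importing/migrating markers, not ownership
            first, _, last = slot_spec.partition("-")
            ranges.append((int(first), int(last or first), (host, int(port))))
    return sorted(ranges)


def node_for_key(ranges, key):
    """Return the (host, port) of the master serving key, or None if uncovered."""
    slot = key_slot(key)
    for first, last, node in ranges:
        if first <= slot <= last:
            return node
    return None


def topology_changed(old_ranges, new_ranges):
    """True when the client should rebuild connections and request a re-sync."""
    return old_ranges != new_ranges
```

A client would call parse_masters() on each topology refresh and, when topology_changed() reports a difference, reconnect and ask the controller to re-sync; election itself stays inside Redis Cluster.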
09:14:44 so they will be in the log and we can take a deeper look
09:14:48 sentinel could not help on this
09:15:08 sure, the redis ha: https://review.openstack.org/#/c/309642/35
09:15:19 #link https://review.openstack.org/#/c/309642/35 redis ha
09:15:31 thanks
09:15:38 ok we will speed up review on this
09:15:41 #topic bugs
09:15:48 hi, sorry for being late :)
09:15:50 thx :)
09:15:56 hi Vikasc
09:16:05 hi vikasc welcome
09:16:08 oanson and yuli_s found some gaps in the DNAT implementation which are being closed
09:16:14 hi gsagie
09:16:16 nick-ma did the work on the br-ex adding
09:16:31 yes
09:16:44 and there is another issue which yuli_s is working on when the floating IP is inside the L2 network
09:16:54 you mean missing the br-ex in the devstack setup for compute nodes ?
09:16:57 in table 30 - EGRESS_NAT_TABLE
09:16:59 gampel: yes
09:17:19 we send packet rewriting dest mac
09:17:23 of gateway
09:17:26 what do you mean that the fip is inside the l2
09:17:26 I also found some bugs in the master branch when I wrote the db consistency code in my local computer :)
09:18:08 we have a bug when connecting with the IP with the same net as public net
09:18:28 gampel: the problem was noticed that when we send traffic "from the floating ip" we send on the GW mac, but the destination could also be another compute node in the same network, what yuli_s,oanson saw in the office is that the router didnt forward the packet
09:18:30 the packets are send to the gateway, while they should be send directly
09:18:53 oanson suggested a solution that sounds good
09:18:54 a solution for this is to forward packets to NORMAL
09:18:55 The router doesn't have to forward the packets.
09:19:04 oanson: want to explain
09:19:26 oanson, go for it
09:19:27 Just that you said 'router didnt forward the packet'. It doesn't have to.
09:19:40 you mean two compute nodes with FIPs on them
09:19:57 If the router recognises that the next hop is on the same port as the incoming packet, it is allowed to send a message saying it is the wrong destination.
09:20:18 gampel: for example..
09:20:20 This tells the sending host to find the correct gateway or the correct direct host
09:20:36 two compute nodes with FIPs on the same network not actually classic "north-south"
09:20:58 gsagie: but a possible scenario
09:21:05 yes
09:21:18 In any case, the proposed solution is to send the packet to the compute node for routing.
09:21:19 is there a patch with the solution?
09:21:30 yuli_s1?
09:21:32 gsagie, I am working on it
09:21:41 So the BUG is that they currently go via the SNAT
09:21:58 The compute node recognises if the packet needs to be routed to the direct network, and does the neighbour discovery and forwards the packet.
09:22:00 gampel: no, its not working if the router doesnt forward the packet
09:22:07 Ok got it
09:22:13 Or recognises which is the next hop gateway, according to its routing table, and passes the packet to that.
09:22:25 (with neighbour discovery as well)
09:22:39 #action yuli_s add patch to solve dnat problem with GW mac
09:23:10 i think that they are working on adding ARP sender support to OVS so we might be able to leverage something like that in the future
09:23:12 all the code related to gw mac discovery is not needed
09:23:38 are we going to wait with the release tag for this bug?
09:23:40 as the packet will be forwarded to linux routing
09:24:22 gsagie, yes
09:24:39 gampel, DuanKebo: ? should we wait with the release tag for that?
09:24:43 nick-ma: ?
09:24:52 This is a possible scenario, but not always needed.
09:25:01 DuanKebo: i agree
09:25:27 how long do we think it will take ?
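The fix discussed for the floating-IP case hands the packet to the host's normal routing instead of always rewriting the destination MAC to the gateway's. The decision that routing makes is the standard on-link versus off-link check, sketched here with Python's ipaddress module. next_hop() and its arguments are hypothetical names, not Dragonflow code, and the addresses in the test are example values only.

```python
import ipaddress


def next_hop(dst_ip, external_cidr, gateway_ip):
    """Return the address whose MAC should be resolved (by ARP) to reach dst_ip.

    Hypothetical helper illustrating the on-link/off-link decision.
    """
    dst = ipaddress.ip_address(dst_ip)
    if dst in ipaddress.ip_network(external_cidr):
        # On-link: e.g. a FIP hosted on another compute node in the same
        # external network. Resolve its MAC directly; no router involved.
        return dst
    # Off-link: only these destinations should go via the gateway MAC.
    return ipaddress.ip_address(gateway_ip)
```

With this check the gateway MAC is used only for off-link destinations, so the router never sees traffic it would otherwise have to bounce back (or answer with a redirect) onto the same segment.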
09:25:29 i think we can tag and backport this fix, but whatever you guys think
09:25:43 gampel, i will send first patch today
09:25:48 I don't think this solution should take long
09:26:07 I agree with gal we could do a backport to the release branch
09:26:13 yes, agree.
09:26:31 okie
09:26:45 any other problems with the current version?
09:26:46 what the release tag is for?
09:26:47 yuli, you mean the arp packet will work through the NORMAL flow to learn GW MAC?
09:27:05 DuanKebo: this is essentially a release in OpenStack for Dragonflow
09:27:15 we want to have a stable Mitaka tag
09:27:19 hujie, no need for mac gw,
09:27:24 so we can have a stable version and then continue work on new features and more patches
09:27:36 Ok
09:27:37 hujie, we will set the br-ex mac and NORMAL
09:27:42 hujie, in essence yes
09:27:52 and it fixes all problems like a miracle ;)
09:28:13 ok, any other notable bug/problem ?
09:28:21 I have :)
09:28:37 please share :)
09:28:37 we have a build issue for OVS
09:28:54 @omer is working on that,
09:29:09 Yes.
09:29:11 I have a patch which describe the db consistency based on nick-ma
09:29:20 gsagie, please let me know if any specific help is needed :)
09:29:22 I also noticed a problem with security groups and OVS 2.5
09:29:24 https://review.openstack.org/#/c/300877/
09:29:24 hujie: share the link?
09:29:47 btw, I have not tested the metadata patch
09:29:58 i took up two bugs but unfortunately forund invalid both of them :)
09:30:00 we probably need few days
09:30:11 to make sure everything works as expected
09:30:12 *found
09:30:12 would you please review it and I have modify the spec according to Eran's recently review comments
09:30:21 vikasc: there is one thing that i wasnt able to work on, we want to make Redis as the default DB in the gate CI tests
09:30:32 yuli_s1, I think the metadata service is not for the tagged version
09:30:33 #link https://review.openstack.org/#/c/300877/
09:30:43 vikasc, great !
09:30:46 vikasc: So you closed 2 bugs. That's great! :)
09:30:54 vikasc: think you can take it?
09:31:04 yuli_s1, oanson :) with 0 line commits
09:31:21 vikasc, that's the best way. 0 degradation. 0 new bugs.
09:31:22 hujie: okie, its on my list we will review it today
09:31:27 thanks
09:31:29 also, I have another patch, some future patch may rely on it and I think it's higher priority :)
09:31:32 gsagie, sure .. i will seek help offline from you if needed
09:31:40 vikasc: ok great
09:31:44 https://review.openstack.org/#/c/315433/
09:31:57 #action vikasc change gate testing to Redis from etcd
09:32:07 #link https://review.openstack.org/#/c/315433/
09:32:16 And I'll commit the bugs I found recently after the patch is merged :)
09:32:28 I also noticed a problem with security groups and OVS 2.5
09:32:38 oanson: please share
09:32:45 security groups rely on conjunction feature, which was added only in january
09:32:54 hujie: yes good work on that one we just need to make sure we did not break anything, it is a big patch of the api interfaces
09:32:59 Commit 18080541d2768c17c17711c35b4d4a23ab3e4153
09:33:26 ok, I know that, thanks :)
09:33:34 oanson: conjunction feature was added only on OVS 2.5?
09:33:43 sounds strange to me
09:33:44 When using ovs 2.5 with security groups, I see many OFP errors in log, and no sg flows
09:33:58 ovs 2.4
09:33:59 I just saw it now, so I didn't have time to investigate
09:34:15 okie, let me put an action on this
09:34:38 #action oanson investigate problem with conjunction, security group and OVS 2.5
09:34:59 I found a new bug https://bugs.launchpad.net/dragonflow/+bug/1583992, I think this is an important bug.
09:35:00 Launchpad bug 1583992 in DragonFlow "There are many DBKeyNotFound exceptions in dragonflow's log when running fullstack" [High,In progress] - Assigned to Duan Kebo (duankebo)
09:35:01 And if anything is needed from DuanKebo team please let them know
09:35:23 OK
09:35:34 DuanKebo: you run fullstack locally? or on the gate?
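For context on the conjunction issue: OVS conjunctive match (which, per the discussion above, dates to OVS 2.4) lets a security-group rule with N remote addresses and M ports be written as N + M + 1 flows instead of N * M. The generator below is a hypothetical sketch in ovs-ofctl flow syntax, not Dragonflow's security-group application; the table number, priority, and action are placeholders.

```python
def conjunction_flows(conj_id, priority, remote_ips, tcp_ports, table=0, action="NORMAL"):
    """Hypothetical generator of ovs-ofctl flow strings for one SG rule."""
    # All flows of one conjunction share a table and priority.
    prefix = "table=%d,priority=%d," % (table, priority)
    flows = []
    # Dimension 1 of 2: source address.
    for ip in remote_ips:
        flows.append(prefix + "ip,nw_src=%s,actions=conjunction(%d,1/2)" % (ip, conj_id))
    # Dimension 2 of 2: destination TCP port.
    for port in tcp_ports:
        flows.append(prefix + "tcp,tp_dst=%d,actions=conjunction(%d,2/2)" % (port, conj_id))
    # Taken only when a packet matched at least one flow in every dimension.
    flows.append(prefix + "conj_id=%d,actions=%s" % (conj_id, action))
    return flows
```

Printing the result and loading it with ovs-ofctl add-flows on a test bridge is one quick way to see whether a given OVS build accepts these flows or rejects them with the kind of OFP errors reported above.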
09:35:42 yes, yuan wei can help
09:35:51 no
09:36:11 the exceptions are in CI
09:36:17 If exception occurs when processing a northbound event, one second delay will be introduced, it leads to more exceptions because of the delay. Avalanche will occur then.
09:36:19 regarding fullstack test. should it work out of the box in centos 7?
09:36:30 I see this bug is assigned on you, do you need any help to investigate it?
09:36:57 I want to limit the frequency of sync
09:37:08 scsnow: we mainly test on Fedora and Ubuntu, dunno if anyone tried on CentOS
09:37:22 Do you think it's ok?
09:37:33 DuanKebo: what do you mean "limit the frequency of sync" ?
09:37:37 gsagie, I have some problems with deps, that I had to resolve manually
09:37:54 scsnow: please add bug for this
09:38:15 self.apply_db_change(None, None, 'sync', None) this will introduce a 1 second delay
09:38:17 we did not tested it on centos only on fedora
09:38:50 scsnow: and if you have a solution :) i believe our devstack script need to be aligned to have special support for centos
09:39:02 gsagie, ok, I'll recreate my centos 7 vm from scratch and then report bug if needed
09:39:21 if a nb exception occur, self.apply_db_change will be called. and the delay will cause more exceptions.
09:40:07 DuanKebo: so you want to cancel this sync?
09:40:24 no, limit the frequency,
09:40:35 DuanKebo: it sounds ok to me
09:40:56 not do the sync every time exception occurs.
09:41:02 we should also check if its really needed with Redis
09:41:08 and with out consistency solution
09:41:10 our
09:41:15 yes.
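DuanKebo's proposal is to stop triggering a full sync (and its one-second delay) on every northbound exception. One way to limit the frequency is a time-based limiter that coalesces a burst of errors into at most one sync per interval. The class below is a hypothetical sketch, not Dragonflow code; the interval and the pending/flush mechanism are assumptions.

```python
import time


class SyncRateLimiter(object):
    """Hypothetical helper: allow at most one full sync per min_interval seconds."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable so tests need not sleep
        self._last_sync = None
        self.pending = False

    def request_sync(self):
        """Call on a northbound error; True means "run the sync now"."""
        now = self.clock()
        if self._last_sync is None or now - self._last_sync >= self.min_interval:
            self._last_sync = now
            self.pending = False
            return True
        # Too soon: remember that a sync is owed instead of running it now.
        self.pending = True
        return False

    def flush_pending(self):
        """Call periodically; True if a deferred sync is now due."""
        if self.pending:
            return self.request_sync()
        return False
```

In a controller, the exception handler would call request_sync() and only run apply_db_change(None, None, 'sync', None) when it returns True, while a periodic task calls flush_pending() so a suppressed sync is not lost.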
09:41:17 Do you meet this bug, when we use the redis db driver, invoke the get_logical_port() in nb_api, if you just input the port_id arg, that's ok, but if you add the topic arg, that happens the exception
09:41:29 The end goal is to have a "heartbit" from the API servers (publishers) so we dont really need this
09:41:54 I think it will occur under redis
09:41:59 but you can limit the frequency for now if that helps
09:42:08 or even cancel it
09:42:18 we only added it before we had a consistency solution
09:42:49 Yes
09:43:05 i have canceled the "sync" in redis driver in my patch
09:43:34 DuanKebo, zhuweiqiong: sounds good to me
09:43:40 the message in db consistency patch we call it recover :)
09:43:48 yes
09:43:49 no problem :)
09:44:08 anything else?
09:44:14 nick-ma: something you want to raise?
09:44:21 Do you meet this bug, when we use the redis db driver, invoke the get_logical_port() in nb_api, if you just input the port_id arg, that's ok, but if you add the topic arg, that happens the exception
09:44:58 gsagie: nop.
09:45:03 hujie: ??
09:45:31 hujie: DuanKebo raised this bug, not familiar with what you talk about
09:45:47 do you have test the current redis db driver in your environment?
09:45:53 also if there is a simple bug, vikasc might want to help with that :)
09:46:04 I mean if get the lport by id, it's ok
09:46:05 sure
09:46:07 hujie: i think yuli_s and oanson did, i havent yet
09:46:12 ok. here's a weird problem in my patch, the jenkins gives -1, but i don't know why. https://review.openstack.org/#/c/319713/
09:46:22 but if get lport by id+topic, it happens exception
09:46:43 redis db driver
09:46:46 nick-ma: requirement problem
09:47:05 nick-ma: Requirement for package neutron-lib : Requirement(package=u'neutron-lib', location='', specifiers='>=0.0.1', markers=u'', comment=u'# Apache-2.0', extras=frozenset([])) does not match openstack/requirements value : set([Requirement(package='neutron-lib', location='', specifiers='>=0.1.0', markers='', comment='# Apache-2.0', extras=frozenset([]))])
09:47:23 ok, thanks. i will check it.
09:47:23 need to change to >=0.1.0
09:47:35 hujie: what exception?
09:48:02 Traceback (most recent call last):
09:48:02   File "/home/hujie/tool-db-sync/df/api_nb.py", line 201, in get_logical_port
09:48:02     return LogicalPort(port_value)
09:48:02   File "/home/hujie/tool-db-sync/df/api_nb.py", line 505, in __init__
09:48:02     self.lport = jsonutils.loads(value)
09:48:03   File "/usr/local/lib/python2.7/dist-packages/oslo_serialization/jsonutils.py", line 229, in loads
09:48:03     return json.loads(encodeutils.safe_decode(s, encoding), **kwargs)
09:48:04   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/encodeutils.py", line 39, in safe_decode
09:48:04     raise TypeError("%s can't be decoded" % type(text))
09:48:57 i think it is not a redis related problem
09:49:23 zhuweiqiong, it could be the value redis returned is not a json string
09:49:53 yes, I agree with omer, but we need to verify it
09:50:00 yes, you are right
09:50:17 oanson, do you meet this bug in your environment?
09:50:22 okie, let me know if you need any help with that and please add a bug
09:50:32 i will try to raise an env with Redis this week and test it out
09:50:45 I think we can go on
09:50:45 hujie, I didn't notice, but I did very specific tests. So I may have missed it.
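The traceback ends in oslo_utils.encodeutils.safe_decode, which raises this TypeError when it is given something other than a text or byte string. That fits the suggestion that the driver returned a value that is not a JSON string (for example None, or possibly a list of values when a topic is passed); the meeting did not confirm which. A defensive decoder, sketched here with the standard-library json module and a hypothetical function name, would turn that into an explicit error.

```python
import json


def decode_lport(value):
    """Hypothetical helper: turn a raw NB DB value into a dict, or None if missing."""
    if isinstance(value, (list, tuple)):
        # Assumption: a topic-scoped lookup might return a list of values.
        if len(value) != 1:
            raise ValueError("expected one value, got %d" % len(value))
        value = value[0]
    if value is None:
        return None  # key not found; the caller can report "no such port"
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    if not isinstance(value, str):
        raise TypeError("unexpected DB value type: %s" % type(value).__name__)
    return json.loads(value)
```

Logging type(value) at the point of failure, as this sketch's error message does, would be a quick way to verify which case the Redis driver actually hits.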
09:50:52 okie
09:50:57 #topic open discussion
09:51:51 I would like to announce that the team decided to add oanson as a core reviewer for Dragonflow, all of oanson help to the project is greatly appreciated and he is doing good work, hopefully this will help us to review patches faster
09:51:56 and merge patches faster
09:52:07 so good work oanson
09:52:25 Thanks.
09:52:25 oanson, congrats :)
09:52:27 thanks, oanson's work. congrats.
09:52:30 I agree omer did a great job
09:52:31 +1
09:52:37 If everyone find any security groups related bug, can send it to Yuan Wei
09:52:38 oanson, mazal tov !
09:52:42 +1
09:52:43 anyone would like anything else to add?
09:53:14 DuanKebo: okie, oanson will verify security group bug with OVS 2.5 and send to YuanWei if he find anything
09:53:42 for FIP related ones, Fei Peng can help.
09:54:12 DuanKebo: okie, for the FIP case let have Fei Peng review yuli_s patch when its ready today
09:54:25 because its going to change the GW mac setting, so need your comments on that
09:54:40 OK, np
09:54:57 okie thanks everyone, and if anyone needs anything #openstack-dragonflow :)
09:55:08 good day/evening
09:55:13 bye
09:55:15 Thanks bye
09:55:18 #endmeeting