09:01:42 <gsagie> #startmeeting dragonflow
09:01:43 <openstack> Meeting started Mon May 23 09:01:42 2016 UTC and is due to finish in 60 minutes.  The chair is gsagie. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:47 <openstack> The meeting name has been set to 'dragonflow'
09:01:51 <gsagie> Hello everyone,
09:01:54 <oanson> Hi
09:01:56 <DuanKebo> Hi
09:01:56 <gsagie> who is here for the meeting?
09:02:20 <gsagie> #info oanson, DuanKebo, hujie, gampel, gsagie in meeting
09:02:26 <gampel> hi
09:02:30 <hujie> Hi
09:02:47 <scsnow> hi
09:03:07 <yuli_s1> Hello !
09:03:12 <yuli_s1> da jai hao
09:03:15 <gsagie> #info scsnow and yuli_s1 also in meeting :)
09:03:16 <yuli_s1> da jia hao
09:03:24 <yuli_s1> ;)
09:03:31 <gsagie> #topic Redis HA
09:03:50 <gsagie> i have put some comments on the patch, i think oanson also had some suggestions
09:04:03 <gsagie> oanson: like to share?
09:04:13 <oanson> I have also put my comments on the patch.
09:04:24 <oanson> I suggested to use an existing solution like sentinel.
09:04:38 <oanson> gsagie: Do you have the link on hand?
09:04:55 <gampel> link:https://review.openstack.org/#/c/298701/13/doc/source/specs/redis_availability.rst
09:04:57 <gsagie> #link https://review.openstack.org/#/c/298701/
09:04:57 <DuanKebo> one second, Zhu weiqiong is coming
09:05:08 <oanson> zhuweiqiong said that this is for old implementation, but I didn't have time to investigate further.
09:05:20 <oanson> I used sentinel with Redis 3.0.7. Seems to work
09:06:18 <nick-ma> hello
09:06:29 <gampel> hi nick-ma
09:06:38 <oanson> But as I said, I didn't research this further
09:06:42 <gsagie> I think the overall picture seems ok. I wonder about a few things, like why we read/write only to master. I assume this is because of the consistency issues, but don't we solve this with the work that nick-ma has done?
09:07:50 <gsagie> okie maybe we can continue this later when Zhu weiqiong comes
09:07:54 <DuanKebo> I think it is decided by redis itself.
09:08:01 <nick-ma> you mean the redis clustering management. i didn't research it further either. does it use a kind of election? if so, that makes sense.
09:08:10 <DuanKebo> why do we need to read from the slave?
09:08:30 <oanson> DuanKebo: If the master fails, the slave acts as a backup
09:08:34 <gsagie> ohh hi zhuweiqiong
09:08:42 <zhuweiqiong> hi
09:08:55 <gsagie> oanson had a question regarding the sentinel use in Redis HA, maybe you can clarify please
09:09:07 <hujie> the Redis cluster is a multi-master cluster, right?
09:09:19 <hujie> and each master node has its slave node
09:09:34 <zhuweiqiong> there is no need to use sentinel
09:09:39 <oanson> hujie, yes
09:10:38 <zhuweiqiong> cluster has already integrated the functions of sentinel
09:11:48 <oanson> zhuweiqiong, but you say you need to re-implement things such as node failure detection, and master election, which already exist in other solutions
09:13:00 <gsagie> is the Redis solution already written? or just the spec?
09:13:40 <gsagie> zhuweiqiong: i think oanson is just trying to save you some work effort :) but we can continue looking at this offline
09:13:53 <zhuweiqiong> the only thing i do is to read the cluster topology to update the connections between client and server and notify to re-sync data. Failure detection and master election is done by cluster itself
09:13:59 <DuanKebo> The code has been under review for several weeks
09:14:03 <gsagie> and talk on the review, but after your explanation today on IRC i think it makes sense either way
09:14:33 <gsagie> zhuweiqiong: ok that sounds reasonable, we will review the patches
09:14:37 <gsagie> can you give links ?
09:14:44 <gsagie> so they will be in the log and we can take a deeper look
09:14:48 <zhuweiqiong> sentinel could not help on this
09:15:08 <zhuweiqiong> sure, the redis ha: https://review.openstack.org/#/c/309642/35
09:15:19 <gsagie> #link https://review.openstack.org/#/c/309642/35 redis ha
09:15:31 <gsagie> thanks
09:15:38 <gsagie> ok we will speed up review on this
09:15:41 <gsagie> #topic bugs
09:15:48 <vikasc> hi, sorry for being late :)
09:15:50 <zhuweiqiong> thx :)
09:15:56 <gsagie> hi Vikasc
09:16:05 <gampel> hi vikasc welcome
09:16:08 <gsagie> oanson and yuli_s found some gaps in the DNAT implementation which are being closed
09:16:14 <vikasc> hi gsagie
09:16:16 <gsagie> nick-ma did the work on the br-ex adding
09:16:31 <yuli_s1> yes
09:16:44 <gsagie> and there is another issue which yuli_s is working on when the floating IP is inside the L2 network
09:16:54 <gampel> you mean missing the br-ex in the devstack setup for compute nodes ?
09:16:57 <yuli_s1> in table 30 - EGRESS_NAT_TABLE
09:16:59 <gsagie> gampel: yes
09:17:19 <yuli_s1> we send packet rewriting dest mac
09:17:23 <yuli_s1> of gateway
09:17:26 <gampel> what do you mean that the fip is inside the l2
09:17:26 <hujie> I also found some bugs in the master branch when I wrote the db consistency code in my local computer :)
09:18:08 <yuli_s1> we have a bug when connecting with the IP with the same net as public net
09:18:28 <gsagie> gampel: the problem was noticed that when we send traffic "from the floating ip" we send on the GW mac, but the destination could also be another compute node in the same network, what yuli_s,oanson saw in the office is that the router didnt forward the packet
09:18:30 <yuli_s1> the packets are sent to the gateway, while they should be sent directly
09:18:53 <gsagie> oanson suggested a solution that sounds good
09:18:54 <yuli_s1> a solution for this is to forward packets to NORMAL
09:18:55 <oanson> The router doesn't have to forward the packets.
09:19:04 <gsagie> oanson: want to explain
09:19:26 <yuli_s1> oanson, go for it
09:19:27 <oanson> Just that you said 'router didnt forward the packet'. It doesn't have to.
09:19:40 <gampel> you mean two compute nodes with FIPs on them
09:19:57 <oanson> If the router recognises that the next hop is on the same port as the incoming packet, it is allowed to send a message saying it is the wrong destination.
09:20:18 <gsagie> gampel: for example..
09:20:20 <oanson> This tells the sending host to find the correct gateway or the correct direct host
09:20:36 <gsagie> two compute nodes with FIPs on the same network not actually classic "north-south"
09:20:58 <oanson> gsagie: but a possible scenario
09:21:05 <gsagie> yes
09:21:18 <oanson> In any case, the proposed solution is to send the packet to the compute node for routing.
09:21:19 <gsagie> is there a patch with the solution?
09:21:30 <oanson> yuli_s1?
09:21:32 <yuli_s1> gsagie, I am working on it
09:21:41 <gampel> So the BUG is that they currently go via the SNAT
09:21:58 <oanson> The compute node recognises if the packet needs to be routed to the direct network, and does the neighbour discovery and forwards the packet.
09:22:00 <gsagie> gampel: no, its not working if the router doesnt forward the packet
09:22:07 <gampel> Ok got it
09:22:13 <oanson> Or recognises which is the next hop gateway, according to its routing table, and passes the packet to that.
09:22:25 <oanson> (with neighbour discovery as well)
09:22:39 <gsagie> #action yuli_s add patch to solve dnat problem with GW mac
09:23:10 <gsagie> i think that they are working on adding ARP sender support to OVS so we might be able to leverage something like that in the future
09:23:12 <yuli_s1> all the code related to gw mac discovery is not needed
09:23:38 <gsagie> are we going to wait with the release tag for this bug?
09:23:40 <yuli_s1> as the packet will be forwarded to linux routing
09:24:22 <yuli_s1> gsagie, yes
09:24:39 <gsagie> gampel, DuanKebo: ? should we wait with the release tag for that?
09:24:43 <gsagie> nick-ma: ?
09:24:52 <DuanKebo> This is a possible scenario, but not always needed.
09:25:01 <gsagie> DuanKebo: i agree
09:25:27 <gampel> how long do we think it will take ?
09:25:29 <gsagie> i think we can tag and backport this fix, but whatever you guys think
09:25:43 <yuli_s1> gampel, i will send first patch today
09:25:48 <oanson> I don't think this solution should take long
09:26:07 <gampel> I agree with gal we could do a backport to the release branch
09:26:13 <nick-ma> yes, agree.
09:26:31 <gsagie> okie
09:26:45 <gsagie> any other problems with the current version?
09:26:46 <DuanKebo> what is the release tag for?
09:26:47 <hujie> yuli, you mean the arp packet will work through the NORMAL flow to learn GW MAC?
09:27:05 <gsagie> DuanKebo: this is essentially a release in OpenStack for Dragonflow
09:27:15 <gampel> we want to have a stable Mitaka tag
09:27:19 <yuli_s1> hujie, no need for mac gw,
09:27:24 <gsagie> so we can have a stable version and then continue work on new features and more patches
09:27:36 <DuanKebo> Ok
09:27:37 <yuli_s1> hujie, we will set the br-ex mac and NORMAL
09:27:42 <oanson> hujie, in essence yes
09:27:52 <yuli_s1> and it fixes all problems like a miracle ;)
09:28:13 <gsagie> ok, any other notable bug/problem ?
09:28:21 <hujie> I have :)
09:28:37 <gsagie> please share :)
09:28:37 <gampel> we have a build issue for OVS
09:28:54 <gampel> @omer is working on that,
09:29:09 <oanson> Yes.
09:29:11 <hujie> I have a patch which describes the db consistency based on nick-ma's work
09:29:20 <vikasc> gsagie,  please let me know if any specific help is needed :)
09:29:22 <oanson> I also noticed a problem with security groups and OVS 2.5
09:29:24 <hujie> https://review.openstack.org/#/c/300877/
09:29:24 <gsagie> hujie: share the link?
09:29:47 <yuli_s1> btw, I have not tested the metadata patch
09:29:58 <vikasc> i took up two bugs but unfortunately found both of them invalid :)
09:30:00 <yuli_s1> we probably need a few days
09:30:11 <yuli_s1> to make sure everything works as expected
09:30:12 <hujie> would you please review it? I have modified the spec according to Eran's recent review comments
09:30:21 <gsagie> vikasc: there is one thing that i wasnt able to work on, we want to make Redis as the default DB in the gate CI tests
09:30:32 <oanson> yuli_s1, I think the metadata service is not for the tagged version
09:30:33 <gsagie> #link https://review.openstack.org/#/c/300877/
09:30:43 <yuli_s1> vikasc, great !
09:30:46 <oanson> vikasc: So you closed 2 bugs. That's great! :)
09:30:54 <gsagie> vikasc: think you can take it?
09:31:04 <vikasc> yuli_s1, oanson  :) with 0 line commits
09:31:21 <oanson> vikasc, that's the best way. 0 degradation. 0 new bugs.
09:31:22 <gsagie> hujie: okie, its on my list we will review it today
09:31:27 <gsagie> thanks
09:31:29 <hujie> also, I have another patch, some future patch may rely on it and I think it's higher priority :)
09:31:32 <vikasc> gsagie,  sure .. i will seek help offline from you if needed
09:31:40 <gsagie> vikasc: ok great
09:31:44 <hujie> https://review.openstack.org/#/c/315433/
09:31:57 <gsagie> #action vikasc change gate testing to Redis from etcd
09:32:07 <gsagie> #link https://review.openstack.org/#/c/315433/
09:32:16 <hujie> And I'll commit the bugs I found recently after the patch is merged :)
09:32:28 <oanson> I also noticed a problem with security groups and OVS 2.5
09:32:38 <gsagie> oanson: please share
09:32:45 <oanson> security groups rely on conjunction feature, which was added only in january
09:32:54 <gampel> hujie: yes, good work on that one. we just need to make sure we did not break anything, it is a big patch to the api interfaces
09:32:59 <oanson> Commit 18080541d2768c17c17711c35b4d4a23ab3e4153
09:33:26 <hujie> ok, I know that, thanks :)
09:33:34 <gsagie> oanson: conjunction feature was added only on OVS 2.5?
09:33:43 <gsagie> sounds strange to me
09:33:44 <oanson> When using ovs 2.5 with security groups, I see many OFP errors in log, and no sg flows
09:33:58 <DuanKebo> ovs 2.4
09:33:59 <oanson> I just saw it now, so I didn't have time to investigate
09:34:15 <gsagie> okie, let me put an action on this
09:34:38 <gsagie> #action oanson investigate problem with conjunction, security group and OVS 2.5
09:34:59 <DuanKebo> I found a new bug https://bugs.launchpad.net/dragonflow/+bug/1583992, I think this is an important bug.
09:35:00 <openstack> Launchpad bug 1583992 in DragonFlow "There are many DBKeyNotFound exceptions in dragonflow's log when running fullstack" [High,In progress] - Assigned to Duan Kebo (duankebo)
09:35:01 <gsagie> And if anything is needed from DuanKebo team please let them know
09:35:23 <hujie> OK
09:35:34 <gsagie> DuanKebo: you run fullstack locally? or on the gate?
09:35:42 <DuanKebo> yes, yuan wei can help
09:35:51 <DuanKebo> no
09:36:11 <DuanKebo> the exceptions are in CI
09:36:17 <DuanKebo> If an exception occurs when processing a northbound event, a one second delay will be introduced, and it leads to more exceptions because of the delay. An avalanche will occur then.
09:36:19 <scsnow> regarding fullstack test. should it work out of the box in centos 7?
09:36:30 <gsagie> I see this bug is assigned to you, do you need any help to investigate it?
09:36:57 <DuanKebo> I want to limit the frequency of sync
09:37:08 <gsagie> scsnow: we mainly test on Fedora and Ubuntu, dunno if anyone tried on CentOS
09:37:22 <DuanKebo> Do you think it's ok?
09:37:33 <gsagie> DuanKebo: what do you mean" limit the frequency of sync" ?
09:37:37 <scsnow> gsagie, I have some problems with deps, that I had to resolve manually
09:37:54 <gsagie> scsnow: please add bug for this
09:38:15 <DuanKebo> self.apply_db_change(None, None, 'sync', None) this will introduce a 1 second delay
09:38:17 <gampel> we did not test it on centos, only on fedora
09:38:50 <gsagie> scsnow: and if you have a solution :) i believe our devstack script need to be aligned to have special support for centos
09:39:02 <scsnow> gsagie, ok, I'll recreate my centos 7 vm from scratch and then report bug if needed
09:39:21 <DuanKebo> if an nb exception occurs, self.apply_db_change will be called, and the delay will cause more exceptions.
09:40:07 <gsagie> DuanKebo: so you want to cancel this sync?
09:40:24 <DuanKebo> no, limit the frequency,
09:40:35 <gsagie> DuanKebo: it sounds ok to me
09:40:56 <DuanKebo> not do the sync every time an exception occurs.
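The frequency limit DuanKebo proposes here could look roughly like the sketch below. It is only an illustration of the idea, not the patch under discussion: the SyncThrottle class, its min_interval default, and the injectable clock are all hypothetical names, and the sketch uses Python 3's time.monotonic. Only apply_db_change(None, None, 'sync', None) comes from the log.

```python
import time


class SyncThrottle:
    """Hypothetical helper: allow at most one sync per `min_interval` seconds."""

    def __init__(self, min_interval=5.0, clock=time.monotonic):
        self.min_interval = min_interval  # assumed value, not taken from the log
        self._clock = clock               # injectable so the logic can be tested
        self._last_sync = None            # time of the last sync that was allowed

    def should_sync(self):
        """Return True if enough time has passed since the last allowed sync."""
        now = self._clock()
        if self._last_sync is not None and now - self._last_sync < self.min_interval:
            return False  # a sync ran recently; skip this one
        self._last_sync = now
        return True
```

An exception handler would then call should_sync() first and only invoke self.apply_db_change(None, None, 'sync', None) when it returns True, so a burst of northbound exceptions triggers a single sync instead of one per exception.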
09:41:02 <gsagie> we should also check if its really needed with Redis
09:41:08 <gsagie> and with our consistency solution
09:41:15 <nick-ma> yes.
09:41:17 <hujie> Have you hit this bug? With the redis db driver, calling get_logical_port() in nb_api with only the port_id arg is ok, but if you add the topic arg, an exception is raised
09:41:29 <gsagie> The end goal is to have a "heartbeat" from the API servers (publishers) so we dont really need this
09:41:54 <DuanKebo> I think it will occur under redis
09:41:59 <gsagie> but you can limit the frequency for now if that helps
09:42:08 <gsagie> or even cancel it
09:42:18 <gsagie> we only added it before we had a consistency solution
09:42:49 <DuanKebo> Yes
09:43:05 <zhuweiqiong> i have canceled the "sync" in redis driver in my patch
09:43:34 <gsagie> DuanKebo, zhuweiqiong: sounds good to me
09:43:40 <hujie> the message in db consistency patch we call it recover :)
09:43:48 <zhuweiqiong> yes
09:43:49 <gsagie> no problem :)
09:44:08 <gsagie> anything else?
09:44:14 <gsagie> nick-ma: something you want to raise?
09:44:21 <hujie> Have you hit this bug? With the redis db driver, calling get_logical_port() in nb_api with only the port_id arg is ok, but if you add the topic arg, an exception is raised
09:44:58 <nick-ma> gsagie: nop.
09:45:03 <gsagie> hujie: ??
09:45:31 <gsagie> hujie: DuanKebo raised this bug, i am not familiar with what you are talking about
09:45:47 <hujie> have you tested the current redis db driver in your environment?
09:45:53 <gsagie> also if there is a simple bug, vikasc might want to help with that :)
09:46:04 <hujie> I mean if you get the lport by id, it's ok
09:46:05 <vikasc> sure
09:46:07 <gsagie> hujie: i think yuli_s and oanson did, i havent yet
09:46:12 <nick-ma> ok. here's a weird problem in my patch, the jenkins gives -1, but i don't know why.  https://review.openstack.org/#/c/319713/
09:46:22 <hujie> but if you get the lport by id+topic, an exception is raised
09:46:43 <hujie> redis db driver
09:46:46 <gsagie> nick-ma: requirement problem
09:47:05 <gsagie> nick-ma: Requirement for package neutron-lib : Requirement(package=u'neutron-lib', location='', specifiers='>=0.0.1', markers=u'', comment=u'# Apache-2.0', extras=frozenset([])) does not match openstack/requirements value : set([Requirement(package='neutron-lib', location='', specifiers='>=0.1.0', markers='', comment='# Apache-2.0', extras=frozenset([]))])
09:47:23 <nick-ma> ok, thanks. i will check it.
09:47:23 <gsagie> need to change to >=0.1.0
09:47:35 <zhuweiqiong> hujie: what exception?
09:48:02 <hujie> Traceback (most recent call last):
09:48:02 <hujie> File "/home/hujie/tool-db-sync/df/api_nb.py", line 201, in get_logical_port
09:48:02 <hujie> return LogicalPort(port_value)
09:48:02 <hujie> File "/home/hujie/tool-db-sync/df/api_nb.py", line 505, in __init__
09:48:02 <hujie> self.lport = jsonutils.loads(value)
09:48:03 <hujie> File "/usr/local/lib/python2.7/dist-packages/oslo_serialization/jsonutils.py", line 229, in loads
09:48:03 <hujie> return json.loads(encodeutils.safe_decode(s, encoding), **kwargs)
09:48:04 <hujie> File "/usr/local/lib/python2.7/dist-packages/oslo_utils/encodeutils.py", line 39, in safe_decode
09:48:04 <hujie> raise TypeError("%s can't be decoded" % type(text))
09:48:57 <zhuweiqiong> i think it is not a redis related problem
09:49:23 <oanson> zhuweiqiong, it could be the value redis returned is not a json string
09:49:53 <hujie> yes, I agree with omer, but we need to verify it
09:50:00 <zhuweiqiong> yes, you are right
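oanson's hypothesis (the value returned by the driver is not a JSON string) matches the traceback: safe_decode raises TypeError when it is handed something that is neither text nor bytes. A defensive check along the following lines would surface that case before decoding. This is only a sketch using the standard json module; parse_port_value is a hypothetical helper, not a function in api_nb.py, and returning None for a missing value is an assumption about the desired behaviour.

```python
import json


def parse_port_value(value):
    """Hypothetical helper: decode a port value, or return None if it is unusable."""
    # A missing key may come back as None (or another non-string type),
    # which is the kind of input that makes safe_decode raise TypeError.
    if not isinstance(value, (str, bytes)):
        return None
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    try:
        return json.loads(value)
    except ValueError:
        # The value was a string but not valid JSON.
        return None
```

With a check like this, get_logical_port() could distinguish "port not found for this topic" from a decoding failure, which would help confirm or rule out the hypothesis.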
09:50:17 <hujie> oanson, did you hit this bug in your environment?
09:50:22 <gsagie> okie, let me know if you need any help with that and please add a bug
09:50:32 <gsagie> i will try to raise an env with Redis this week and test it out
09:50:45 <DuanKebo> I think we can go on
09:50:45 <oanson> hujie, I didn't notice, but I did very specific tests. So I may have missed it.
09:50:52 <gsagie> okie
09:50:57 <gsagie> #topic open discussion
09:51:51 <gsagie> I would like to announce that the team decided to add oanson as a core reviewer for Dragonflow. All of oanson's help to the project is greatly appreciated and he is doing good work, hopefully this will help us to review patches faster
09:51:56 <gsagie> and merge patches faster
09:52:07 <gsagie> so good work oanson
09:52:25 <oanson> Thanks.
09:52:25 <vikasc> oanson, congrats :)
09:52:27 <nick-ma> thanks for oanson's work. congrats.
09:52:30 <gampel> I agree omer did a great job
09:52:31 <gampel> +1
09:52:37 <DuanKebo> If anyone finds any security groups related bug, please send it to Yuan Wei
09:52:38 <yuli_s1> oanson, mazal tov !
09:52:42 <DuanKebo> +1
09:52:43 <gsagie> anyone would like anything else to add?
09:53:14 <gsagie> DuanKebo: okie, oanson will verify the security group bug with OVS 2.5 and send it to YuanWei if he finds anything
09:53:42 <DuanKebo> for FIP related ones, Fei Peng can help.
09:54:12 <gsagie> DuanKebo: okie, for the FIP case let's have Fei Peng review yuli_s patch when it's ready today
09:54:25 <gsagie> because its going to change the GW mac setting, so need your comments on that
09:54:40 <DuanKebo> OK, np
09:54:57 <gsagie> okie thanks everyone, and if anyone needs anything #openstack-dragonflow :)
09:55:08 <gsagie> good day/evening
09:55:13 <scsnow> bye
09:55:15 <gampel> Thanks bye
09:55:18 <gsagie> #endmeeting