15:00:53 <carl_baldwin> #startmeeting neutron_l3
15:00:53 <openstack> Meeting started Thu Feb 19 15:00:53 2015 UTC and is due to finish in 60 minutes.  The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:54 <amuller> heya
15:00:57 <openstack> The meeting name has been set to 'neutron_l3'
15:01:04 <salv-orlando> aloha
15:01:18 <carl_baldwin> #topic Announcements
15:01:30 <mrsmith> G'day
15:01:33 <carl_baldwin> #link https://wiki.openstack.org/wiki/Meetings/Neutron-L3-Subteam
15:01:58 <carl_baldwin> kilo-3 is four weeks away.  That isn’t much time at all.
15:02:58 <carl_baldwin> Voting for summit talks is open this week and will close soon.  I’m not sure when.
15:03:25 <pc_m> 23rd I think...
15:03:30 <amuller> yeah
15:04:04 <carl_baldwin> That sounds right.  I’ve already voted.
15:04:09 <carl_baldwin> Any other announcements?
15:04:47 <carl_baldwin> #topic Bugs
15:05:24 <carl_baldwin> I don’t have any bugs to bring up now.  Are there any bugs to bring to our attention?
15:06:23 <salv-orlando> not from me. I am not aware of serious issues with l3 agent or dvr
15:06:42 <salv-orlando> but this does not mean that there aren't issues - it's just that I'm not aware of them!
15:07:08 <carl_baldwin> #topic L3 agent restructuring.
15:07:49 <carl_baldwin> We’re winding down this effort but still trying to get patches merged
15:07:52 <carl_baldwin> #link https://review.openstack.org/#/q/topic:bp/restructure-l3-agent+status:open,n,z
15:08:17 <carl_baldwin> If you remove the status:open from the query, you can see just how much has already been merged.  We’re really on the tail end here.
15:08:45 <carl_baldwin> Anything to discuss here?
15:10:00 <carl_baldwin> We’ll just keep moving forward then.
15:10:13 <carl_baldwin> #topic neutron-ipam
15:10:54 <carl_baldwin> How are we doing here?
15:11:19 <tidwellr1> new patch set here https://review.openstack.org/#/c/148698/
15:11:31 <salv-orlando> aloha
15:11:50 <carl_baldwin> tidwellr1: Thanks.
15:12:09 <salv-orlando> so I have two topics which I'd like to bring to your attention
15:12:13 <carl_baldwin> tidwellr1: I will be sure to review it today.
15:12:14 <pavel_bondar> New version of the db_base refactor, still WIP: https://review.openstack.org/#/c/153236/
15:12:38 <salv-orlando> 1)  non blocking IP Allocation algorithms 2) what we can reasonably merge in Kilo
15:13:00 <salv-orlando> I might become quite chatty - so if you want me to move these discussions to the mailing list I'll do that
15:13:03 <carl_baldwin> pavel_bondar: Thanks.  I will review it today.  Hopefully others here can too.
15:13:31 <carl_baldwin> salv-orlando: No worries.  Let’s see what we can discuss here.
15:13:41 <salv-orlando> Ip allocation: https://github.com/salv-orlando/ip_allocation_poc
15:13:58 <salv-orlando> I've tested 2 non-locking algorithms (that don't do lock for update)
15:14:32 <salv-orlando> and compared them with lock for update.  executing a lock for update scales a lot better. Were you expecting that?
15:14:56 <salv-orlando> I kind of was expecting this - since approaches based on retries are conceptually similar to active waits
15:15:17 <carl_baldwin> salv-orlando: That kind of makes sense.
15:15:27 <salv-orlando> however the issue is that lock for update queries suffer from data set validation issues in active/active clusters like galera
15:15:36 <salv-orlando> so we need an alternative
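
For context, a minimal sketch of the two patterns being compared: a pessimistic SELECT ... FOR UPDATE allocation and a retry-on-conflict allocation. The model names (IPAvailabilityRange, IPAllocation) and helpers (next_address, pick_candidate) are hypothetical stand-ins, not salv-orlando's PoC code.

    from sqlalchemy.exc import IntegrityError

    def allocate_with_lock(session, subnet_id):
        # Pessimistic: SELECT ... FOR UPDATE serializes allocators on the range row.
        ip_range = (session.query(IPAvailabilityRange)
                    .filter_by(subnet_id=subnet_id)
                    .with_for_update()
                    .first())
        ip = ip_range.first_ip
        ip_range.first_ip = next_address(ip)
        session.commit()
        return ip

    def allocate_with_retries(session, subnet_id, max_retries=10):
        # Optimistic: pick a candidate, try to commit it, retry on conflict.
        # Under contention this loops repeatedly, which is why retry-based
        # approaches resemble active waits.
        for _ in range(max_retries):
            candidate = pick_candidate(session, subnet_id)
            session.add(IPAllocation(subnet_id=subnet_id, ip_address=candidate))
            try:
                session.commit()
                return candidate
            except IntegrityError:
                session.rollback()
        raise RuntimeError("could not allocate an address")
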
15:15:37 <carl_baldwin> Are both found in db.py in your link?
15:15:53 <salv-orlando> carl_baldwin: the algorithms package
15:16:12 <carl_baldwin> salv-orlando: Thanks, I glanced right over that sub-package.
15:16:15 <salv-orlando> carl_baldwin: that github repo is kind of ok - I just need to add some documentation to explain the algorithms
15:17:08 <salv-orlando> I devised two - one based on primary keys. It's rather slow, and I also found out that in an active/active cluster primary key violations trigger data set validation failures and hence db deadlocks
15:17:17 <salv-orlando> so that approach is baddish as well.
15:17:57 <salv-orlando> the third one is a three-step approach which leverages compare-and-swap and also uses the same technique as the bully election algorithm to uniquely determine a winner
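
A hedged sketch of what a single compare-and-swap step could look like: claim a candidate address only if nobody else has, and treat an unaffected row as losing the race. Table and column names here are made up, not the PoC's schema.

    from sqlalchemy import text

    def try_claim(session, subnet_id, candidate_ip, requester_id):
        result = session.execute(
            text("UPDATE available_ips SET claimed_by = :who "
                 "WHERE subnet_id = :subnet AND ip_address = :ip "
                 "AND claimed_by IS NULL"),
            {"who": requester_id, "subnet": subnet_id, "ip": candidate_ip})
        # A bully-election style tie-break (e.g. highest requester id wins) can
        # then resolve the case where several writers contend for the same row.
        return result.rowcount == 1
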
15:18:08 <carl_baldwin> salv-orlando: I’m not sure I understand yet the data set validation failures with primary key violations.
15:19:07 <salv-orlando> carl_baldwin: basically if you add two records with the same pkey on two nodes you won't know about the violation until the data sets are synchronized, and this will be signalled with a DBDeadlock error
15:19:20 <salv-orlando> apparently dealing with these errors is quite expensive
15:19:57 <salv-orlando> I say apparently because I do not have numbers to support a quantitative judgment. But somebody from Galera came to the ML to explain how things work.
15:20:04 <carl_baldwin> salv-orlando: I see.  It seems there is much I don’t know about the active/active clusters.  I wouldn’t have expected such a delay.
15:20:40 <carl_baldwin> salv-orlando: Do you have a link to that ML thread?
15:20:49 <salv-orlando> carl_baldwin: gimme one minute to find it.
15:21:25 <salv-orlando> carl_baldwin: I think the issue lies in the fact that the rollback upon a data set validation error is a lot more expensive than a rollback which can be triggered before a local commit (local == your local node)
15:22:12 * salv-orlando finding ML threads... please wait
15:22:39 <salv-orlando> carl_baldwin: this is one thread -> http://lists.openstack.org/pipermail/openstack-dev/2014-May/035264.html
15:23:07 <salv-orlando> carl_baldwin: and this is the second one -> http://lists.openstack.org/pipermail/openstack-dev/2015-February/056007.html
15:23:24 <salv-orlando> summarizing - I will post a detailed analysis later today
15:24:01 <salv-orlando> but my opinion is: give users a chance to choose the algorithm that best suits their needs. A query which does lock for update works in most cases.
15:24:15 <carl_baldwin> salv-orlando: Thank you.  This will take some time to review.
15:24:25 <salv-orlando> And also provide support for DBDeadlock failures when doing SELECT ... FOR UPDATE
15:24:39 <salv-orlando> But then also offer an alternative which scales decently for people using ACTIVE/ACTIVE clusters
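
One possible way to provide that support, sketched with oslo.db's retry decorator; how it would actually be wired into the allocation code is an assumption, not the agreed design.

    from oslo_db import api as oslo_db_api

    @oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
    def allocate_ip(context, subnet_id):
        # The SELECT ... FOR UPDATE allocation runs inside this scope, so a
        # Galera certification failure surfaced as DBDeadlock just triggers
        # another attempt.
        return allocate_with_lock(context.session, subnet_id)
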
15:24:57 <salv-orlando> finally, there are two more options to consider from an architectural perspective:
15:25:39 * carl_baldwin on the edge of his seat
15:25:42 <salv-orlando> - distributed locking at the application layer. We have avoided it so far not because "it does not scale" (as I've heard several times) but because it is a process which is intrinsically error prone. So if we can avoid distributed coordination, it's better
15:25:53 <salv-orlando> But if we are better off with some distributed coordination, we should use it
15:26:49 <amuller> salv-orlando: like dogpile?
15:26:55 <carl_baldwin> Right, up to now, we’ve limited all distributed coordination to what we can accomplish with the db.
15:27:00 <salv-orlando> - then we can also think of having a centralized IPAM server even when there are multiple API workers. This conductor-like solution will greatly simplify the architecture and the implementation of IP Allocation strategies, at the expense of reducing the level of concurrency supported by the IPAM service
15:27:08 <salv-orlando> amuller: dogpile but even memcached
15:27:24 <salv-orlando> amuller: and if you want to find the best way to make your life more complicated, use zookeeper
15:27:33 <salv-orlando> or even better, implement your own system!
15:28:33 <amuller> using dogpile or memcache to retain 'select for update' "like" behavior but making Galera happy seems like such a braindead simple solution to me
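
A sketch of that idea, assuming a memcached-backed dogpile lock per subnet so the plain (non-locking) allocation can run unchanged inside it; the key naming and memcached endpoint are assumptions.

    from dogpile.cache import make_region

    region = make_region().configure(
        'dogpile.cache.memcached',
        arguments={'url': ['127.0.0.1:11211'], 'distributed_lock': True})

    def allocate_ip_locked(subnet_id, allocate_fn):
        # Serialize concurrent API workers on a per-subnet distributed lock.
        mutex = region.backend.get_mutex('ipam-subnet-%s' % subnet_id)
        mutex.acquire()
        try:
            return allocate_fn(subnet_id)
        finally:
            mutex.release()
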
15:28:49 <salv-orlando> this is all for the first part I wanted to discuss (IP allocation)
15:29:04 <amuller> much easier than shifting our entire locking paradigm throughout the code
15:30:32 <salv-orlando> amuller: while I am sure that adding a distributed lock among workers is kind of easy, I am not sure what you mean about shifting the entire locking paradigm throughout the code. Anyway, we have plenty of time for in-depth technical discussions, because I think that for Kilo we should be happy to just have an IPAM driver which does IPAM as it's done today.
15:30:39 <carl_baldwin> We need to look in to it further.  I’ve had it in the back of my mind that distributed coordination or a separate service will be the answer.
15:31:02 <salv-orlando> not because it's difficult to code, but simply because we are an opinionated community and reviews can lead to the most unexpected and spectacular results.
15:31:15 <salv-orlando> anyway this leads me to the second point, which is what we can merge for Kilo.
15:31:23 <carl_baldwin> salv-orlando: Right, we’ll have to look at this for Liberty.  We need to discuss now what we can do for Kilo.
15:32:14 <salv-orlando> carl_baldwin: to cut a long story short I think 1) db_base refactoring that enables optional usage of ipam driver; 2) reference ipam driver 3) other drivers - but I'm not sure if we want to have them in openstack/neutron
15:32:58 <salv-orlando> Please do not hate me but I reckon it might be better to move subnet pools out to Liberty. I'm not sure we can get enough review cycles there and the necessary API changes in.
15:33:11 <carl_baldwin> salv-orlando: So far, that matches what I was thinking.
15:33:39 <carl_baldwin> salv-orlando: Are you aware that the API changes are up for review?
15:33:53 <salv-orlando> carl_baldwin: yes I am. But you know how it is with APIs ;)
15:34:53 <salv-orlando> I have to leave because I have another meeting now - but I want just to mention that I think I'll have the IPAM driver ready by next weds
15:35:10 <salv-orlando> and then I can move on to help with Pavel's patch review and ensure it gets merged in time.
15:35:28 <salv-orlando> As with anything that touches the db_base_plugin class, that might raise some eyebrows
15:35:31 <pavel_bondar> thanks, sounds good to me
15:35:59 <carl_baldwin> salv-orlando: Thanks. Would you mind doing a review of the API changes before making any judgement?
15:36:49 <salv-orlando> carl_baldwin: I will. But I sense that you are then still confident to get the subnet pools merged in Kilo ;)
15:37:06 <carl_baldwin> salv-orlando: I’m not ready to concede defeat, no.
15:37:09 <carl_baldwin> :)
15:37:33 <salv-orlando> carl_baldwin: In my opinion the API change is fine as long as it's backward compatible, but I'm not the only reviewer around!
15:37:54 <salv-orlando> anyway, I'll review these patches and provide a detailed update on IP allocation
15:37:57 <carl_baldwin> salv-orlando: Who else should I get to review it?
15:38:11 <carl_baldwin> salv-orlando: Thank you.
15:38:26 <salv-orlando> carl_baldwin: markmcclain in the past took interest in all the IPAM stuff
15:38:40 <carl_baldwin> salv-orlando: I will talk to him.
15:38:41 <salv-orlando> it might be worth also getting armax's and mestery's perspectives on the release impact
15:38:56 <carl_baldwin> salv-orlando: I will
15:39:12 <carl_baldwin> #action carl_baldwin will talk to Mark about subnet API.
15:39:13 <salv-orlando> carl_baldwin: cool. Enjoy the rest of the meeting guys!
15:39:27 <carl_baldwin> #action carl_baldwin will talk to mestery and armax about release impact.
15:39:32 <carl_baldwin> salv-orlando: bye.
15:39:38 <carl_baldwin> Anything more on ipam?
15:42:13 <carl_baldwin> Let’s get some review attention on the patches for the remainder of this week. I will be sure they can all be easily found from the L3 meeting page.
15:42:33 <carl_baldwin> #action carl_baldwin will update meeting page to easily find ipam patches.
15:42:53 <carl_baldwin> #topic neutron-ovs-dvr
15:43:10 <carl_baldwin> mrsmith: Rajeev: Swami: you around?
15:43:13 <carl_baldwin> Anything to discuss?
15:43:33 <carl_baldwin> viveknarasimhan also
15:43:51 <mrsmith> I am just getting back up to speed from holiday
15:44:25 <Rajeev> carl_baldwin: resuming work on HA
15:44:59 <Rajeev> and some other patches that were pending but nothing to discuss here for me.
15:48:16 <carl_baldwin> Thanks, I see that Swami’s patch merged.  Is anyone keeping an eye on the dvr job failures?
15:49:11 <carl_baldwin> I don’t see any significant improvement yet.
15:49:17 <Rajeev> carl_baldwin: I think armax's graph will be the indicator
15:49:26 <mrsmith> Slight drop, yeah
15:49:37 <mrsmith> Last I checked
15:50:31 <carl_baldwin> I think it needs a little more time.
15:51:11 <Rajeev> carl_baldwin: agreed
15:52:42 <Rajeev> we do have a couple of talks submitted, please do vote
15:53:41 <carl_baldwin> Rajeev: I did see them.  Best of luck.
15:54:04 <Rajeev> carl_baldwin: Thanks.
15:54:24 <carl_baldwin> Well, I think we’re about ready to wrap up.
15:54:29 <carl_baldwin> #topic Open Discussion
15:54:53 <carl_baldwin> Anyone have good talk abstracts to plug?  ;)
15:56:00 <pc_m> carl_baldwin: Do you think we're at a point where we can refactor the VPN device driver to access the new router object, vs. using the VPN service and agent?
15:56:51 <carl_baldwin> pc_m: Yes, I think we can get started on that.
15:56:52 <pc_m> currently: device driver -> vpn service -> agent -> router
15:57:15 <carl_baldwin> John submitted a talk on IPAM
15:57:18 <carl_baldwin> #link https://www.openstack.org/vote-vancouver/presentation/subnet-pools-and-pluggable-external-ip-management-in-openstack-kilo
15:57:32 <pc_m> the device driver has the router_id; should we use that to get the router object and then move the methods into the driver?
15:57:47 <carl_baldwin> rossella_s submitted a talk on L2/L3 agent improvements
15:57:51 <carl_baldwin> #link https://www.openstack.org/vote-vancouver/presentation/neutron-l2-and-l3-agents-how-they-work-and-how-kilo-improves-them
15:58:29 <carl_baldwin> sc68cal and I submitted one on IPv6 and L3
15:58:33 <carl_baldwin> #link https://www.openstack.org/vote-vancouver/presentation/whats-coming-for-ipv6-and-l3-in-neutron
15:58:34 <pc_m> carl_baldwin: driver needs to add/delete nat rules. Are there existing methods in the router class that can be used?
15:59:14 <carl_baldwin> pc_m: Some of that might still be unmerged in the patch chain.
15:59:33 <pc_m> carl_baldwin: OK. Will ping you off line to discuss
15:59:41 <carl_baldwin> pc_m: It just occurred to me that (obviously) the advanced services can’t have patches that depend on unmerged patches in Neutron.
15:59:46 <carl_baldwin> pc_m: Okay.
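
A rough sketch of what pc_m is describing: the VPN device driver using its router_id to reach the restructured l3 agent's RouterInfo directly and manage its own NAT rules there. The attribute names follow the kilo-era agent (router_info dict, iptables_manager), but this wiring is an assumption, not the agreed refactor.

    def add_vpn_nat_rule(l3_agent, router_id, chain, rule):
        # The restructured agent keeps a router_id -> RouterInfo mapping.
        ri = l3_agent.router_info.get(router_id)
        if ri is None:
            return  # router not hosted on this agent
        ri.iptables_manager.ipv4['nat'].add_rule(chain, rule)
        ri.iptables_manager.apply()
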
16:00:09 <carl_baldwin> Okay, we’re out of time.
16:00:11 <carl_baldwin> #endmeeting