15:00:53 #startmeeting neutron_l3
15:00:53 Meeting started Thu Feb 19 15:00:53 2015 UTC and is due to finish in 60 minutes. The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:54 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:54 heya
15:00:57 The meeting name has been set to 'neutron_l3'
15:01:04 aloha
15:01:18 #topic Announcements
15:01:30 G'day
15:01:33 #link https://wiki.openstack.org/wiki/Meetings/Neutron-L3-Subteam
15:01:58 kilo-3 is four weeks away. That isn’t much time at all.
15:02:58 Voting for summit talks is open this week and will close soon. I’m not sure when.
15:03:25 23rd I think...
15:03:30 yeah
15:04:04 That sounds right. I’ve already voted.
15:04:09 Any other announcements?
15:04:47 #topic Bugs
15:05:24 I don’t have any bugs to bring up now. Are there any bugs to bring to our attention?
15:06:23 not from me. I am not aware of serious issues with l3 agent or dvr
15:06:42 but this does not mean that there aren't issues - it's just that I'm not aware of them!
15:07:08 #topic L3 agent restructuring.
15:07:49 We’re winding down this effort but still trying to get patches merged
15:07:52 #link https://review.openstack.org/#/q/topic:bp/restructure-l3-agent+status:open,n,z
15:08:17 If you remove the status:open from the query, you can see just how much has already been merged. We’re really on the tail end here.
15:08:45 Anything to discuss here?
15:10:00 We’ll just keep moving forward then.
15:10:13 #topic neutron-ipam
15:10:54 How are we doing here?
15:11:19 new patch here https://review.openstack.org/#/c/148698/
15:11:30 *patch set
15:11:31 aloha
15:11:50 tidwellr1: Thanks.
15:12:09 so I have two topics which I'd like to bring to your attention
15:12:13 tidwellr1: I will be sure to review it today.
15:12:14 New version of db_base re-factor, still WIP, https://review.openstack.org/#/c/153236/
15:12:38 1) non-blocking IP allocation algorithms 2) what we can reasonably merge in Kilo
15:13:00 I might become quite chatty - so if you want me to move this discussion to the mailing list I'll do that
15:13:03 pavel_bondar: Thanks. I will review it today. Hopefully others here can too.
15:13:31 salv-orlando: No worries. Let’s see what we can discuss here.
15:13:41 IP allocation: https://github.com/salv-orlando/ip_allocation_poc
15:13:58 I've tested 2 non-locking algorithms (that don't do lock for update)
15:14:32 and compared them with lock for update. Executing a lock for update scales a lot better. Were you expecting that?
15:14:56 I kind of was expecting this - since approaches based on retries are conceptually similar to active waits
15:15:17 salv-orlando: That kind of makes sense.
15:15:27 however the issue is that lock for update queries suffer from data set validation issues in active/active clusters like Galera
15:15:36 so we need an alternative
15:15:37 Are both found in db.py in your link?
15:15:53 carl_baldwin: the algorithms package
15:16:12 salv-orlando: Thanks, I glanced right over that sub-package.
15:16:15 carl_baldwin: that github repo is kind of ok - I just need to add some documentation to explain the algorithms
15:17:08 I devised two - one based on primary keys. It's rather slow, and I also found out that in an active/active cluster primary key violations also trigger data set validation failures and hence db deadlocks
15:17:17 so that approach is baddish as well.
15:17:57 the third one instead is a three-step approach which leverages compare-and-swap and also uses the same technique as the bully election to uniquely determine a winner
15:18:08 salv-orlando: I’m not sure I understand yet the data set validation failures with primary key violations.
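The lock-for-update vs. retry trade-off discussed above can be sketched roughly as follows. This is an illustrative sketch, not code from the ip_allocation_poc repo: the table and column names are made up, and since SQLite has no SELECT ... FOR UPDATE, BEGIN IMMEDIATE stands in for the row lock (on MySQL/PostgreSQL the locking query would be used instead). It assumes a connection opened with isolation_level=None so explicit transaction control works.

```python
import sqlite3


def allocate_ip_locking(conn):
    """Pessimistic approach: take a write lock up front, then pick the
    first free address. On MySQL/PostgreSQL this would be a
    SELECT ... FOR UPDATE; SQLite's BEGIN IMMEDIATE is a stand-in."""
    conn.execute("BEGIN IMMEDIATE")
    row = conn.execute(
        "SELECT ip FROM ip_requests WHERE allocated = 0 "
        "ORDER BY ip LIMIT 1").fetchone()
    if row is None:
        conn.execute("ROLLBACK")
        return None
    conn.execute("UPDATE ip_requests SET allocated = 1 WHERE ip = ?", (row[0],))
    conn.execute("COMMIT")
    return row[0]


def allocate_ip_retrying(conn, max_retries=5):
    """Optimistic approach: read without a lock, then try to claim the
    address with a compare-and-swap UPDATE; losing the race means
    retrying. Each failed attempt is wasted work, which is why
    retry-based schemes resemble active waits under contention."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT ip FROM ip_requests WHERE allocated = 0 "
            "ORDER BY ip LIMIT 1").fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE ip_requests SET allocated = 1 "
            "WHERE ip = ? AND allocated = 0", (row[0],))
        if cur.rowcount == 1:  # CAS succeeded: we won the race
            return row[0]
    return None  # gave up after max_retries lost races
```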
15:19:07 carl_baldwin: basically if you add two records with the same pkey on two nodes you won't know about the violation until the data sets are synchronized, and this will be signalled with a DBDeadlock error
15:19:20 apparently dealing with these errors is quite expensive
15:19:57 I say apparently because I do not have numbers to support a quantitative judgment. But somebody from Galera came to the ML to explain how things work.
15:20:04 salv-orlando: I see. It seems there is much I don’t know about the active/active clusters. I wouldn’t have expected such a delay.
15:20:40 salv-orlando: Do you have a link to that ML thread?
15:20:49 carl_baldwin: gimme one minute to find it.
15:21:25 carl_baldwin: I think the issue lies in the fact that the rollback upon a data set validation error is a lot more expensive than a rollback which can be triggered before a local commit (local == your local node)
15:22:12 * salv-orlando finding ML threads... please wait
15:22:39 carl_baldwin: this is one thread -> http://lists.openstack.org/pipermail/openstack-dev/2014-May/035264.html
15:23:07 carl_baldwin: and this is the second one -> http://lists.openstack.org/pipermail/openstack-dev/2015-February/056007.html
15:23:24 summarizing - I will post a detailed analysis later today
15:24:01 but my opinion is: give users a chance to choose the algorithm that best suits their needs. A query which does lock for update works in most cases.
15:24:15 salv-orlando: Thank you. This will take some time to review.
15:24:25 And also provide support for DBDeadlock failures when doing LOCK...FOR UPDATE
15:24:39 But then also offer an alternative which scales decently for people using ACTIVE/ACTIVE clusters
15:24:57 finally, there are two more options to consider from an architectural perspective:
15:25:39 * carl_baldwin on the edge of his seat
15:25:42 - distributed locking at the application layer.
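The three-step compare-and-swap approach salv-orlando mentions earlier (with a bully-election-style tie-break) could look roughly like this. This is a hypothetical simplification, not the PoC's actual code, and all table/column names are invented: each contender advertises a claim row for a candidate address (a plain INSERT with no unique constraint, so no primary-key violation for Galera to certify against), then a winner is elected deterministically — here the lowest requester id, as in a bully election — and losers withdraw and retry on another address.

```python
import sqlite3


def allocate_cas_bully(conn, requester_id, max_retries=10):
    """Hypothetical sketch of a lock-free, three-step allocation:
    1) pick a candidate address that looks free and unclaimed;
    2) advertise a claim for it (plain INSERT, no unique constraint);
    3) read back all claims for that address and deterministically
       elect a winner (lowest requester id, bully-election style).
    Losers withdraw their claim and retry with another address."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT ip FROM addresses WHERE allocated = 0 "
            "AND ip NOT IN (SELECT ip FROM claims) "
            "ORDER BY ip LIMIT 1").fetchone()
        if row is None:
            return None  # pool exhausted (or everything claimed)
        ip = row[0]
        conn.execute("INSERT INTO claims (ip, requester) VALUES (?, ?)",
                     (ip, requester_id))
        conn.commit()
        winner = conn.execute(
            "SELECT MIN(requester) FROM claims WHERE ip = ?",
            (ip,)).fetchone()[0]
        if winner == requester_id:
            # We won the election: finalize and clean up the claims.
            conn.execute("UPDATE addresses SET allocated = 1 WHERE ip = ?",
                         (ip,))
            conn.execute("DELETE FROM claims WHERE ip = ?", (ip,))
            conn.commit()
            return ip
        # Lost the election: withdraw our claim, try another address.
        conn.execute("DELETE FROM claims WHERE ip = ? AND requester = ?",
                     (ip, requester_id))
        conn.commit()
    return None
```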
We have avoided it so far not because "it does not scale" (as I've heard several times) but because it is a process which is intrinsically error prone. So if we can avoid distributed coordination, it's better
15:25:53 But if we are better off with some distributed coordination, we should use it
15:26:49 salv-orlando: like dogpile?
15:26:55 Right, up to now, we’ve limited all distributed coordination to what we can accomplish with the db.
15:27:00 - then we can also think of having a centralized IPAM server even when there are multiple API workers. This conductor-like solution would greatly simplify the architecture and the implementation of IP allocation strategies, at the expense of reducing the level of concurrency supported by the IPAM service
15:27:08 amuller: dogpile but even memcached
15:27:24 amuller: and if you want to find the best way to make your life more complicated, use zookeeper
15:27:33 or even better, implement your own system!
15:28:33 using dogpile or memcache to retain 'select for update'-like behavior while making Galera happy seems like such a braindead simple solution to me
15:28:49 this is all for the first part I wanted to discuss (IP allocation)
15:29:04 much easier than shifting our entire locking paradigm throughout the code
15:30:32 amuller: while I am sure that adding a distributed lock among workers is kind of easy, I am not sure what you mean about shifting the entire locking paradigm throughout the code. Anyway, we have plenty of time for in-depth technical discussions, because I think that for Kilo we should be happy to just have an IPAM driver which does IPAM as it's done today.
15:30:39 We need to look into it further. I’ve had it in the back of my mind that distributed coordination or a separate service will be the answer.
15:31:02 not because it's difficult to code, but simply because we are an opinionated community and reviews can lead to the most unexpected and spectacular results.
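The application-layer lock amuller suggests could be built on memcached's atomic add() (set-only-if-absent), which gives the test-and-set a lock needs without any conflicting database writes for Galera to certify. The sketch below is a hedged illustration, not Neutron code: FakeCacheClient is an in-memory stand-in for a real memcached client, the lock has no timeout or failure handling, and the pool/key names are invented.

```python
import time
import uuid


class FakeCacheClient:
    """In-memory stand-in for a memcached client. Real memcached
    offers the same atomic semantics for add() and delete()."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        # memcached 'add' succeeds only if the key does not exist yet:
        # exactly the atomic test-and-set a lock needs.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)


class CacheLock:
    """Coarse inter-worker lock built on atomic add(). This retains
    'select for update'-like serialization at the application layer
    instead of asking the database for conflicting row locks."""

    def __init__(self, client, name):
        self.client, self.key = client, "lock:" + name
        self.token = str(uuid.uuid4())

    def __enter__(self):
        while not self.client.add(self.key, self.token):
            time.sleep(0.01)  # busy-wait; a real impl would time out
        return self

    def __exit__(self, *exc):
        self.client.delete(self.key)


def allocate(client, pool):
    # Serialize allocation across API workers with the cache lock.
    with CacheLock(client, "ipam-pool-1"):
        return pool.pop(0) if pool else None
```

Note the trade-off salv-orlando raises: this is simple, but it is distributed coordination, so lock-holder crashes and cache restarts become failure modes the IPAM code must then handle.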
15:31:15 anyway this leads me to the second point, which is what we can merge on kilo.
15:31:17 for kilo, sorry
15:31:23 salv-orlando: Right, we’ll have to look at this for Liberty. We need to discuss now what we can do for Kilo.
15:32:14 carl_baldwin: to cut a long story short I think 1) db_base refactoring that enables optional usage of ipam driver; 2) reference ipam driver 3) other drivers - but I'm not sure if we want to have them in openstack/neutron
15:32:58 Please do not hate me but I reckon it might be better to move subnet pools out to Liberty. I'm not sure we can get enough review cycles there and the necessary API changes in.
15:33:11 salv-orlando: So far, that matches what I was thinking.
15:33:39 salv-orlando: Are you aware that the API changes are up for review?
15:33:53 carl_baldwin: yes I am. But you know how it is with APIs ;)
15:34:53 I have to leave because I have another meeting now - but I just want to mention that I think I'll have the IPAM driver ready by next Weds
15:35:10 and then I can move on to help with Pavel's patch review and ensure it gets merged in time.
15:35:28 As with anything that touches the db_base_plugin class, that might raise some eyebrows
15:35:31 thanks, sounds good to me
15:35:59 salv-orlando: Thanks. Would you mind doing a review of the API changes before making any judgement?
15:36:49 carl_baldwin: I will. But I sense that you are then still confident to get the subnet pools merged in Kilo ;)
15:37:06 salv-orlando: I’m not ready to concede defeat, no.
15:37:09 :)
15:37:33 carl_baldwin: In my opinion the API change is fine as long as it's backward compatible, but I'm not the only reviewer around!
15:37:54 anyway, I'll review these patches and provide a detailed update on IP allocation
15:37:57 salv-orlando: Who else should I get to review it?
15:38:11 salv-orlando: Thank you.
15:38:26 carl_baldwin: markmcclain in the past took interest in all the IPAM stuff
15:38:40 salv-orlando: I will talk to him.
15:38:41 it might be worth also getting armax's and mestery's perspective on the release impact
15:38:56 salv-orlando: I will
15:39:12 #action carl_baldwin will talk to Mark about subnet API.
15:39:13 carl_baldwin: cool. Enjoy the rest of the meeting guys!
15:39:27 #action carl_baldwin will talk to mestery and armax about release impact.
15:39:32 salv-orlando: bye.
15:39:38 Anything more on ipam?
15:42:13 Let’s get some review attention on the patches for the remainder of this week. I will be sure they can all be easily found from the L3 meeting page.
15:42:33 #action carl_baldwin will update meeting page to easily find ipam patches.
15:42:53 #topic neutron-ovs-dvr
15:43:10 mrsmith: Rajeev: Swami: you around?
15:43:13 Anything to discuss?
15:43:33 viveknarasimhan alse
15:43:38 s/alse/also/
15:43:51 I am just getting back up to speed from holiday
15:44:25 carl_baldwin: resuming work on HA
15:44:59 and some other patches that were pending but nothing to discuss here for me.
15:48:16 Thanks, I see that Swami’s patch merged. Is anyone keeping an eye on the dvr job failures?
15:49:11 I don’t see any significant improvement yet.
15:49:17 carl_baldwin: I think armax's graph will be the indicator
15:49:26 Slight drop ya
15:49:37 Last I checked
15:50:31 I think it needs a little more time.
15:51:11 carl_baldwin: agreed
15:52:42 we do have a couple of talks submitted, please do vote
15:53:41 Rajeev: I did see them. Best of luck.
15:54:04 carl_baldwin: Thanks.
15:54:24 Well, I think we’re about ready to wrap up.
15:54:29 #topic Open Discussion
15:54:53 Anyone have good talk abstracts to plug? ;)
15:56:00 carl_baldwin: Do you think we're at a point where we can refactor the VPN device driver to access the new router object, vs. using VPN_service and agent?
15:56:51 pc_m: Yes, I think we can get started on that.
15:56:52 current device driver -> vpn service -> agent -> router
15:57:15 John submitted a talk on IPAM
15:57:18 #link https://www.openstack.org/vote-vancouver/presentation/subnet-pools-and-pluggable-external-ip-management-in-openstack-kilo
15:57:32 device driver has router_id, should we use that to get the router object and then move the methods into the driver?
15:57:47 rossella_s submitted a talk on L2/L3 agent improvements
15:57:51 #link https://www.openstack.org/vote-vancouver/presentation/neutron-l2-and-l3-agents-how-they-work-and-how-kilo-improves-them
15:58:29 sc68cal and I submitted one on IPv6 and L3
15:58:33 #link https://www.openstack.org/vote-vancouver/presentation/whats-coming-for-ipv6-and-l3-in-neutron
15:58:34 carl_baldwin: driver needs to add/delete NAT rules. Are there existing methods in the router class that can be used?
15:59:14 pc_m: Some of that might still be unmerged in the patch chain.
15:59:33 carl_baldwin: OK. Will ping you offline to discuss
15:59:41 pc_m: It just occurred to me that (obviously) the advanced services can’t have patches that depend on unmerged patches in Neutron.
15:59:46 pc_m: Okay.
16:00:09 Okay, we’re out of time.
16:00:11 #endmeeting