14:00:10 #startmeeting neutron_drivers
14:00:11 Meeting started Fri Jul 26 14:00:10 2019 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:14 The meeting name has been set to 'neutron_drivers'
14:00:17 hi
14:00:23 hey
14:00:25 hi
14:00:34 hi
14:00:43 hi
14:00:58 hello
14:01:11 haleyb|away is off on vacation, so we are good to go
14:01:19 #topic RFEs
14:02:11 Good to see yamamoto has recovered from his surgery. \o/
14:02:22 \o/
14:02:24 thank you
14:02:26 o/
14:02:54 This is the RFE to be discussed today: https://bugs.launchpad.net/neutron/+bug/1836834
14:02:55 Launchpad bug 1836834 in neutron "[RFE] introduce distributed locks to ipam" [Wishlist,Confirmed] - Assigned to qinhaizhong (qinhaizhong)
14:03:20 and its associated spec: https://review.opendev.org/#/c/657221/
14:06:07 this spec already has a +2; maybe it needs some other proposed changes or a W+1
14:06:11 I will say that the spec is lacking a bit of detail on the basic operation of this proposal. For example, we are implementing distributed locks, but what precisely is getting locked? If a process fails to release a lock, how is that sensed and remedied?
14:06:18 for me it looks good as a separate driver
14:07:24 Is the lock on a per-subnet basis or is it for all IPAM IP allocation? Is there a fallback mode if the backend tooz points to does not respond in a timely manner?
14:07:57 I think these implementation details will be reflected in the code logic; this is a new driver that you can configure to use.
14:09:30 brinzhang: I understand, and there are many details that will be determined at implementation. But the spec says "We will fix IP allocation conflicts by using locks" without saying *how* using locks fixes IP allocation conflicts.
14:10:14 brinzhang, you may give us a brief summary of the implementation. :) njohnston's concern is worth clarifying.
14:10:28 Based on subnet-id + ip
14:10:29 I agree with njohnston. We can be relaxed about the spec description, but something more detailed should be defined in the document
14:11:00 njohnston: good point. the spec should clarify the basic approach and how the problem can be solved.
14:11:10 njohnston: qinhaizhong will be in attendance, please wait.
14:12:56 the other question is, how much experience does the team proposing this spec have with tooz? I mean, have you solved previous scale-up problems using it?
14:13:25 I was thinking about this spec more as an overall description of the problem, and wanted to see implementation details as a PoC and to check if it will really improve IPAM, but I can also wait for a more detailed spec
14:14:46 mlavalle: I don't have much experience with tooz, but I know that when lucasgomes implemented the "hash ring" mechanism in networking-ovn using tooz, it sped up the process of creating trunk subports by about 10 times or so, so it may well help with such problems
14:15:21 I don't think there has to be a deep, deep description, just a few sentences laying out the approach so it's transparent to the community why this is a superior approach
14:15:22 Moreover, will this introduce a new lock store? What if the lock DB is down? Can no port be created successfully?
14:15:50 I am not doubting tooz. I just want evidence that it has a real chance to help
14:16:29 and the ovn experience you mention is a good data point. thanks for mentioning it
14:16:34 liuyulong, I don't think this relies on the same DB, but I'm not sure
14:16:43 mlavalle: I understand, and IMO the best evidence would be a PoC that we could compare with the current driver :)
14:17:06 slaweq, agree
14:17:12 slaweq: I agree.
14:17:37 ralonsoh, tooz can use various store drivers, like redis, mysql and so on.
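The lock scheme being discussed (a distributed lock keyed on subnet-id + ip, obtained through tooz against a backend such as redis) can be illustrated with a minimal, hypothetical sketch. Since tooz requires an external coordination backend, a process-local registry of threading.Lock objects stands in for tooz's coordinator.get_lock() here; the key format and all helper names are assumptions for illustration, not the actual driver code.

```python
# Hypothetical sketch of the per-(subnet_id, ip) lock-key idea.
# In the proposed driver the lock would come from tooz, e.g.
#   coordinator = coordination.get_coordinator('redis://host:6379', member_id)
#   with coordinator.get_lock(key): ...
# Here a process-local dict of threading.Lock objects stands in for
# tooz so the idea is runnable without an external store.
import threading
from contextlib import contextmanager

_locks = {}
_registry_guard = threading.Lock()

def _lock_key(subnet_id, ip):
    # The discussion suggests keying on "ip + subnet_id", so two
    # workers allocating the same address in the same subnet
    # serialize, while unrelated allocations proceed in parallel.
    return "ipam-%s-%s" % (subnet_id, ip)

@contextmanager
def allocation_lock(subnet_id, ip):
    key = _lock_key(subnet_id, ip)
    with _registry_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        yield key

allocated = set()

def try_allocate(subnet_id, ip):
    # Only one concurrent caller can be inside the critical section
    # for a given (subnet_id, ip), which removes the race that leads
    # to duplicate-IP allocation conflicts.
    with allocation_lock(subnet_id, ip):
        if (subnet_id, ip) in allocated:
            return False
        allocated.add((subnet_id, ip))
        return True
```

With a real tooz coordinator the lock is held across processes and hosts, and tooz backends handle the "stuck lock" concern raised above via lock timeouts/heartbeats; this local stand-in only demonstrates the keying scheme.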
14:17:39 (and a more detailed description in the spec)
14:17:51 If the authors wanted to work out the details in a PoC and then update the spec with a concise description of the approach once they have worked it out, I think that would allay my concerns about transparency
14:17:54 liuyulong, I know, just guessing
14:18:40 njohnston++
14:19:03 njohnston: agree
14:20:05 qinhaizhong01: njohnston: I was thinking about this spec more as an overall description of the problem, and wanted to see implementation details as a PoC and to check if it will really improve IPAM, but I can also wait for a more detailed spec
14:20:05 mlavalle: I don't have much experience with tooz, but I know that when lucasgomes implemented the "hash ring" mechanism in networking-ovn using tooz, it sped up the process of creating trunk subports by about 10 times or so, so it may well help with such problems
14:21:13 Performance and scalability improvements are very important for Neutron
14:21:24 Based on the original _generate_ips algorithm, this method will be reimplemented. The IPs calculated by _generate_ips will use "ip+subnet_id" as the key for the distributed lock.
14:22:00 So I like this proposal in principle, even in an exploratory fashion
14:22:10 Introducing such centralized components always increases the deployment difficulty of operation and maintenance. (I'm not saying this is not acceptable, just some thoughts.)
14:22:35 I would like to see why and how a distributed lock addresses the problem in the spec.
14:23:04 with that in mind and seeing the feedback from the team, this is what we propose:
14:23:23 1) We approve the RFE today with the understanding that
14:24:02 2) A more detailed spec will be proposed with the feedback from the team today
14:24:46 3) We will see the code as a PoC.
We will use the experience of the PoC to feed back into the spec if needed
14:24:58 +1
14:25:05 4) The code will come with Rally tests, so we can measure improvement
14:26:17 IMO, we should welcome experimenting
14:26:25 what do others think?
14:26:34 +1
14:26:45 it totally makes sense to me
14:26:48 +1
14:26:50 +1
14:26:51 +1
14:26:54 I really like that approach
14:26:55 +1
14:27:04 +1
14:28:35 brinzhang, qinhaizhong01: let me be clear. we are heading towards the final milestone of the cycle, and we are taking this proposal as a PoC. So the chances of this merging in Train are rather slim. ok?
14:30:31 I'll take the silence as acquiescence :-)
14:31:55 I'll approve the RFE at the end of the meeting, adding the four points ^^^^ in the comments section
14:32:06 Let's move on then
14:32:29 Next one we have today is https://bugs.launchpad.net/neutron/+bug/1837847
14:32:30 Launchpad bug 1837847 in neutron "[RFE] neutron-vpnaas OpenVPN driver" [Undecided,New]
14:33:07 I am not sure we should discuss it today. I don't understand all the details in this proposal
14:33:47 I am bringing it up today with the hope that amotoki and yamamoto, who know more about VPN, could comment on it and help us triage it
14:33:57 mlavalle: what is confusing you?
14:34:03 What details?
14:34:31 I am talking now about another RFE, not yours, brinzhang and qinhaizhong01
14:36:27 I haven't looked at the OpenVPN driver RFE...
14:36:29 I'm not a vpnaas expert so it's hard for me to talk about it
14:36:58 When he says his use case is to have broadcast/multicast communication with the instances, does that mean the VPN IP needs to be within the same L2 domain?
14:37:49 good question to ask in the RFE
14:37:56 If so, then I don't see how going through Neutron IPAM can be avoided, whether it's in a pre-reservation capacity or on-demand.
14:38:13 * njohnston posts the question
14:38:38 Thanks!
14:39:14 njohnston is much faster than me.
14:39:21 Looks like it will involve DHCP?
A spec with some detail is needed also.
14:39:49 At a glance, I cannot understand that point and am thinking....
14:40:13 liuyulong: perhaps we need to understand what the actual problem is first, before a spec.
14:41:45 A spec always has a "Problem Description" section :)
14:41:48 ok, let's post questions in the RFE and see if we can move it forward
14:42:23 anything else we should discuss today?
14:42:40 I have one
14:43:01 https://bugs.launchpad.net/neutron/+bug/1817881
14:43:02 Launchpad bug 1817881 in neutron "[RFE] L3 IPs monitor/metering via current QoS functionality (tc filters)" [Wishlist,In progress] - Assigned to LIU Yulong (dragon889)
14:43:07 We discussed this once in a drivers meeting.
14:43:18 But we have no result.
14:43:42 Basically we have a consensus on adding a new l3 agent extension.
14:43:55 But it is not approved yet.
14:44:51 one question: is it still proposed only for FIPs with QoS set?
14:45:48 slaweq, it relies on that, the tc statistics.
14:46:11 can't we use tc statistics without QoS enabled for the FIP?
14:46:51 it doesn't look like something user friendly: "do You want metering, You need to set QoS on FIP" :)
14:46:52 I'm not sure, but perhaps we could, once tc filters accept 0 for rate and burst.
14:47:09 But for now, we may need a very large value for it as a default.
14:47:39 if we want metering enabled, it should be done for all FIPs/routers handled by the L3 agent
14:47:46 slaweq, we need to measure the possible performance impact
14:48:03 and it should be independent of the QoS feature
14:48:12 if we add a TC filter on an interface without needing it, this can slow it down
14:48:18 Bandwidth limitation and metering of public IPs is a basic rule.
14:49:00 liuyulong: do you mean all deployments should have bandwidth limitation and metering of public IPs?
14:49:04 but we only need the TC filter if the operator configures metering, right?
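A rough illustration of the tc-statistics idea under discussion: the proposed L3 agent extension would read byte/packet counters from `tc -s filter show` on the router's gateway device. The sample output and parser below are illustrative assumptions (real iproute2 output varies by version and filter configuration), not the actual extension code; a real agent would invoke `tc -s filter show dev <device>` rather than use a canned string.

```python
# Hypothetical sketch: extract the "Sent N bytes M pkt" counters that
# `tc -s filter show` prints for filters, which is the accurate
# statistics source mentioned in the discussion above.  SAMPLE_TC_OUTPUT
# is a made-up example for illustration only.
import re

SAMPLE_TC_OUTPUT = """\
filter protocol ip pref 1 u32 chain 0
 match 0a000005/ffffffff at 16
 police 0x1 rate 10Gbit burst 1Mb
 Sent 123456 bytes 789 pkt (dropped 0, overlimits 0 requeues 0)
"""

_STATS_RE = re.compile(r"Sent (\d+) bytes (\d+) pkt")

def parse_tc_stats(output):
    """Return (total_bytes, total_packets) parsed from tc -s output."""
    total_bytes = 0
    total_pkts = 0
    for match in _STATS_RE.finditer(output):
        total_bytes += int(match.group(1))
        total_pkts += int(match.group(2))
    return total_bytes, total_pkts
```

This also shows why the default-rate question matters: if metering rides on a policing filter, the rate (10Gbit in the sample) must be set to something, hence the debate over what counts as "effectively unlimited".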
14:49:18 in this scenario I mean
14:49:23 mlavalle, right now yes
14:49:42 sorry, we have TC filters if we ask for QoS
14:49:51 FYI, the previous discussion: http://eavesdrop.openstack.org/meetings/neutron_drivers/2019/neutron_drivers.2019-03-29-14.00.log.html
14:50:49 mlavalle: yes, and IMO it should be: "You want metering - You should enable metering in the config file", not "You want metering - You should enable QoS for the FIP with some custom, very high value for the bw limit" :)
14:51:11 mlavalle, yes, a tc filter rule is enough. It has accurate statistics data.
14:51:55 so will slaweq's point be addressed?
14:52:40 amotoki, no, it is not a compulsory requirement; I mean it's more like a deployment consensus.
14:53:41 So, what value will be considered large enough for the filter default?
14:53:54 10Gbps?
14:53:56 40Gbps?
14:54:43 ConnectX-6 HCAs with 200Gbps
14:54:53 from the Mellanox website
14:55:19 Haha, it seems there will always be higher values.
14:58:06 there is a spec for this, I will try to review it in the next few days, will that be ok liuyulong?
14:58:35 slaweq, OK, let me paste it here.
14:58:47 #link https://review.opendev.org/#/c/658511/
14:58:54 liuyulong: thx
14:59:00 Title "L3 agent self-service metering"
14:59:05 ok, let's all review the spec and we'll start with this RFE next week
14:59:20 mlavalle++
14:59:40 Have a nice weekend
14:59:48 #endmeeting