14:01:01 <slaweq> #startmeeting neutron_drivers
14:01:02 <openstack> Meeting started Fri Oct 18 14:01:01 2019 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:04 <slaweq> hi
14:01:05 <openstack> The meeting name has been set to 'neutron_drivers'
14:01:09 <mlavalle> o/
14:01:11 <njohnston> o/
14:02:14 <slaweq> let's wait a few more minutes for haleyb and yamamoto so we have a quorum
14:02:17 <haleyb> hi
14:02:42 <ralonsoh> hi
14:04:27 <slaweq> amotoki will not be here today, we are still missing yamamoto but I think we can start as we already have quorum
14:04:37 <slaweq> #topic RFEs
14:04:54 <slaweq> first on the list for today is:
14:04:56 <slaweq> https://bugs.launchpad.net/neutron/+bug/1843924
14:04:56 <openstack> Launchpad bug 1843924 in neutron "[RFE] Create optional bulk resource_extend" [Wishlist,Triaged]
14:05:02 <yamamoto> hi
14:05:15 <slaweq> proposed by njohnston
14:06:22 <njohnston> Hi!  This spec is oriented towards solving one of the two areas of bulk operations that are hard to optimize (the other being IPAM).
14:07:11 <njohnston> There is significant time being spent in resource_extend, and at present there is no way to pass resources to extenders in bulk form.
14:08:23 <njohnston> My main concern that led me to submit this as an RFE is that this be graceful to use for items that leverage resource_extend, especially because I don't know if any out of tree extensions register with resource_extend.  I would not be surprised if they do.
14:12:20 <njohnston> I don't have a clear idea of what the performance gains will be, but I do think they will be non-trivial.  And this benefits more than just the bulk port optimization work - if any other bulk operations will be optimized in future this can be leveraged there as well.
14:14:23 <mlavalle> Yeah... This is an area we can explore with the work I've been doing with code profiling
14:14:56 <mlavalle> This weekend I will push the final version of that tooling, per the last meeting of the performance team
14:15:44 <mlavalle> so we can start exploring how much improvement comes out of this RFE
14:16:32 <slaweq> njohnston: so with this profiling tool from mlavalle You should be able to measure how much time is actually spent on resource_extend operations now, and then we will know better how much (if anything) we can win there
14:16:58 <njohnston> slaweq: Correct, which I think is a natural pre-requisite for this.
14:18:28 <slaweq> so njohnston, do You think we should get back to this rfe once You have done some profiling?
14:19:11 <mlavalle> why not approve the RFE and require a PoC
14:19:27 <njohnston> I am fine with either option
14:19:32 <mlavalle> where njohnston writes the PoC and I help with the profiling
14:19:57 <slaweq> mlavalle: that would work for me too
14:20:09 <slaweq> haleyb: yamamoto: any opinions?
14:20:34 <yamamoto> fine with either way
14:20:52 <haleyb> i think it would be great to get this done, as we know more users are using bulk ops
14:21:15 <slaweq> ok, so let's approve the rfe and I will add a comment about the PoC and profiling
14:22:35 <mlavalle> profiling before and after
14:23:14 <slaweq> mlavalle: correct
14:23:18 <njohnston> The POC will start disabled, so it will be functionally the same as master... then it should be a one-liner to enable it for profiling
14:23:26 <njohnston> to make it easy
14:23:30 <mlavalle> yeap
14:24:09 <slaweq> ok, let's move on then
14:24:13 <slaweq> next rfe
14:24:15 <slaweq> https://bugs.launchpad.net/neutron/+bug/1825345
14:24:15 <openstack> Launchpad bug 1825345 in neutron "[RFE] admin-state-down doesn't evacuate bindings in the dhcp_agent_id column" [Wishlist,Confirmed]
14:24:30 <slaweq> this one was discussed some time ago already
14:24:37 <slaweq> recently I talked with zigo about it on irc
14:24:57 <slaweq> and I wrote a summary of the pros and cons of the 2 possible solutions in my last comment
14:25:44 <slaweq> so I wanted to discuss here which option is in Your opinion better (or maybe there is some 3rd one?) and what we will finally do with this rfe
14:29:29 <ralonsoh> with the client side implementation we have the drawback of a slower evacuation, but this is also a pro (see the server side cons: possible congestion)
14:29:33 <ralonsoh> Am I correct?
14:29:53 <slaweq> ralonsoh: yes, that is correct IMO
14:30:13 <ralonsoh> so +1 to client (easier, no API, no server change)
14:31:18 <haleyb> it reminds me a little of the StarlingX proposal for automatic re-balancing, but not exactly
14:31:39 <njohnston> I had the same thought haleyb
14:32:06 <mlavalle> yeah, but in this case the rebalancing is manual
14:32:20 <mlavalle> it's in control of the admin
14:32:21 <haleyb> that was a little different, i think it might be Ok to do this in the server, maybe just in two steps - --disable, then --evacuate
14:32:45 <haleyb> mlavalle: agreed, automatic is not preferred by me
14:33:53 <haleyb> slaweq: in some cases, for example when dhcp_agents_per_network (or whatever that is) == number of controllers, evacuating an agent might do nothing, since the network is already scheduled on the others
14:33:56 <slaweq> haleyb: I agree, this should be done only on admin's request
14:34:29 <haleyb> we still have the issue of un-balanced when the agent is brought back
14:35:05 <slaweq> haleyb: but that's IMO different problem than this rfe is trying to address
14:35:27 <haleyb> slaweq: right, just an obvious comment for another RFE :)
14:35:32 <slaweq> :)
14:36:17 <yamamoto> how would the --evacuate option look like at the rest api level?
14:36:17 <slaweq> yamamoto: I guess it should be a new API call, something like /agents/<agent_id>/evacuate
14:37:31 <slaweq> but that's only my assumption now
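Following slaweq's guess, the request shape might look like the sketch below. To be clear, the `/agents/<agent_id>/evacuate` endpoint is purely the assumption from this discussion; nothing like it exists in the Neutron API today, and the helper name is invented for illustration.

```python
# Hypothetical request builder for the guessed evacuate action;
# the endpoint does not exist in Neutron, this only illustrates
# the shape slaweq suggested above.

def build_evacuate_request(base_url, agent_id):
    """Return (method, url, body) for the hypothetical evacuate call."""
    url = '%s/v2.0/agents/%s/evacuate' % (base_url.rstrip('/'), agent_id)
    # Mirroring nova's server evacuate action, the body could stay
    # empty and let the scheduler pick the destination agents.
    return ('PUT', url, {})
```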
14:39:05 <mlavalle> but maybe yamamoto is asking more about the mechanics of how it would work
14:40:17 <slaweq> mlavalle: so that I don't know, it's zigo's proposal
14:40:47 <njohnston> I wonder if it's like get-me-a-network, it's a rest call that just orchestrates a few other operations under the hood
14:41:04 <mlavalle> most likely
14:41:41 <slaweq> njohnston: probably it would be like that
14:42:19 <yamamoto> is there any precedent where we have this kind of automation in the client?
14:44:29 <slaweq> IIUC there are some plans to add project deletion (like ospurge) into openstack sdk
14:44:44 <slaweq> but it's for sure not implemented there yet
14:45:27 <njohnston> and that is better as a client based thing because it spans multiple services
14:49:18 <slaweq> njohnston: I agree
14:50:36 <slaweq> nova has evacuate of VMs from a host, right?
14:50:44 <mlavalle> yes
14:50:46 <slaweq> and it's done on server side, is that correct?
14:50:52 <mlavalle> yes
14:52:06 <slaweq> do You know of other projects that provide such an option for some resources maybe?
14:52:07 <mlavalle> https://docs.openstack.org/api-ref/compute/?expanded=evacuate-server-evacuate-action-detail
14:52:33 <mlavalle> it is an action you perform on a server
14:53:05 <slaweq> so maybe we should do it also on server side to be kind of "consistent" with e.g. nova?
14:53:57 <mlavalle> and the 'host' parameter that goes in the request body is optional
14:54:11 <mlavalle> if not specified, the scheduler picks a destination host
14:54:53 <slaweq> ahh, but it's to evacuate "VM"
14:55:10 <slaweq> so in our case it would be "evacuate network X from agent Y"
14:55:50 <mlavalle> yeah
14:56:02 <mlavalle> but it's a good parallel
14:56:15 <slaweq> but for that we already have an API
14:56:18 <mlavalle> most likely that's where zigo got the idea from
14:56:43 <slaweq> https://docs.openstack.org/api-ref/network/v2/?expanded=schedule-a-network-to-a-dhcp-agent-detail#agents
14:57:06 <slaweq> what we are potentially missing is something like "evacuate all networks from agent Y"
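The "evacuate all networks from agent Y" idea could be built client-side on top of that existing scheduling API. A minimal sketch, assuming a client object exposing the DHCP-agent scheduler calls that python-neutronclient already provides (`list_networks_on_dhcp_agent`, `remove_network_from_dhcp_agent`); removing a network from the agent lets the server-side scheduler place it on the remaining agents:

```python
# Client-side evacuation sketch: loop over the networks hosted by the
# agent and unschedule each one, relying on the existing automatic
# scheduler to rehost them elsewhere.

def evacuate_dhcp_agent(client, agent_id):
    """Remove every network hosted by agent_id; return their ids."""
    networks = client.list_networks_on_dhcp_agent(agent_id)['networks']
    evacuated = []
    for net in networks:
        client.remove_network_from_dhcp_agent(agent_id, net['id'])
        evacuated.append(net['id'])
    return evacuated
```

This is exactly the slower-but-simpler client-side option ralonsoh favored earlier: no new REST API, with the per-network pacing acting as a natural throttle against congestion.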
14:57:32 <mlavalle> why not clarify that in the RFE?
14:57:34 <slaweq> so it would be something a bit different from what "nova's evacuate" is
14:57:55 <slaweq> ok, I will ask to clarify that in rfe
14:58:03 <slaweq> and we will get back to this one once again
14:58:07 <slaweq> ok for You?
14:58:54 <njohnston> +1
14:58:59 <mlavalle> if you read the initial description in the RFE, he proposes: openstack network agent evacuate e865d619-b122-4234-aebb-3f5c24df1c8e
14:59:39 <mlavalle> which means network by network
14:59:58 <slaweq> mlavalle: do You think that "e865d619-b122-4234-aebb-3f5c24df1c8e" is network id in this example?
15:00:00 <mlavalle> in other words, it is a parallel to the Nova evacuate
15:00:01 <slaweq> or agent id?
15:00:12 <slaweq> IMO it's agent id
15:00:13 <njohnston> I think it's an agent id
15:00:23 <mlavalle> you are right, it is agent id
15:00:26 <ralonsoh> (time is up)
15:00:30 <slaweq> ok, have to finish now
15:00:34 <slaweq> thx for attending
15:00:38 <njohnston> o/
15:00:38 <slaweq> and have a great weekend
15:00:40 <slaweq> o/
15:00:43 <slaweq> #endmeeting