14:00:18 <slaweq> #startmeeting neutron_drivers
14:00:19 <openstack> Meeting started Fri Jan  8 14:00:18 2021 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:23 <openstack> The meeting name has been set to 'neutron_drivers'
14:00:29 <mlavalle> o/
14:00:41 <rafaelweingartne> \o
14:00:42 <ralonsoh> hi
14:00:44 <yonglihe> hi
14:00:45 <gibi> o/
14:00:53 <amotoki> hi
14:01:07 <xinranwang> Hi
14:01:20 <slaweq> haleyb: njohnston yamamoto: ping
14:01:32 <haleyb> hi, didn't see reminder
14:01:49 <slaweq> hello everyone at the first drivers meeting in 2021 :)
14:01:56 <slaweq> first of all Happy New Year!
14:02:04 <yonglihe> Happy New Year
14:02:18 <lajoskatona> o/
14:02:22 <amotoki> happy new year
14:02:31 <ralonsoh> hny!
14:02:43 <slaweq> and now let's start as we have a couple of topics to discuss
14:02:46 <slaweq> #topic RFEs
14:02:55 <slaweq> first one:
14:02:56 <slaweq> https://bugs.launchpad.net/neutron/+bug/1909100
14:02:59 <openstack> Launchpad bug 1909100 in neutron "[RFE]add new vnic type "cyborg"" [Wishlist,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
14:03:28 <ralonsoh> I think xinranwang could explain this RFE a bit
14:03:47 <xinranwang> sure
14:04:26 <xinranwang> We hope to add a new vnic type for port to indicate that the port has a backend managed by cyborg
14:04:48 <xinranwang> so that nova can trigger the interaction with cyborg according to this vnic type
14:06:04 <slaweq> based on the last comments by gibi and ralonsoh in the LP I'm not sure we really need to add such a new vnic type
14:06:16 <ralonsoh> and amotoki's comment
14:06:20 <slaweq> right
14:06:46 <ralonsoh> this port is almost a "direct" port
14:06:54 <ralonsoh> actually this is a PCI device
14:07:00 <gibi> if there is no new vnic_type then nova will either ignore the vnic_type for these ports or neutron should enforce vnic_type=direct
14:07:07 <gibi> or direct-physical
14:07:25 <gibi> ignoring incoming vnic_type seems hackish
14:07:32 <ralonsoh> agree
14:07:36 <yonglihe> yeah, it should behave like this.
14:08:17 <amotoki> vnic types are used to determine how neutron handles the port. my concern is what happens if there are two or more vnic types which are backed by cyborg.
14:08:51 <yonglihe> maybe neutron should limit it so it can't be "normal"? after all, neutron knows what the network should look like.
14:09:30 <yonglihe> for now, direct is supported, and in the future direct-physical is a candidate.
14:10:02 <amotoki> so does it mean you need more vnic types for cyborg-backed ports?
14:10:43 <sean-k-mooney> the intent was to have a separate vnic type dedicated to devices that are managed by cyborg and not support the device-profile with other vnic types
14:11:30 <ralonsoh> but this is not needed on the nova side
14:11:34 <sean-k-mooney> on the nova side we wanted a clear way to differentiate between hardware-offloaded ovs and ovs with cyborg ports, or similar for the ml2/sriov nic agent
14:11:41 <ralonsoh> and neutron can limit the device-profile to direct ports
14:12:07 <yonglihe> sean, we had to use the sriov agent
14:12:11 <ralonsoh> this can be done by reading the port definition, with the "device_profile" extension
14:12:17 <sean-k-mooney> then we can't have colocation of hardware-offloaded ovs and cyborg on the same compute
14:12:20 <sean-k-mooney> right?
14:12:24 <xinranwang> we should limit it so that only the new vnic type can have device-profile filled, if we have a new vnic type.
14:12:36 <sean-k-mooney> we don't want to assume that any existing ml2 driver that supports ovs will work with cyborg
14:13:08 <sean-k-mooney> * not "supports ovs", I meant "supports vnic type direct"
14:13:21 <yonglihe> sean, sure, only the sriov agent is working, ovs-managed VFs are not supported
14:13:51 <sean-k-mooney> right, but ml2/ovs supports direct, as does ovn for hardware-offloaded ovs
14:14:14 <sean-k-mooney> we did not want those ml2 drivers to bind the port in that case, correct
14:14:17 <ralonsoh> OVN direct is for external ports (sriov)
14:14:51 <sean-k-mooney> ralonsoh: that will take the hardware-offloaded ovs codepath in os-vif
14:15:03 <yonglihe> how does ml2/ovs differentiate it from normal sriov direct?
14:15:12 <sean-k-mooney> depending on the vif_type that is returned
14:15:17 <sean-k-mooney> separate topic i guess
14:15:53 <yonglihe> so if there's no new vnic, neutron should make sure the backend is not set if the port belongs to ovs
14:16:10 <yonglihe> based on vif_type
14:16:47 <ralonsoh> neutron folks?
14:16:48 <sean-k-mooney> yes, so there are 2 things: nova would have to treat the port as a normal sriov port
14:17:10 <sean-k-mooney> e.g. not attempt to add it to ovs, but if it's bound by the ml2/ovs driver then we would try to add it to ovs
14:17:26 <sean-k-mooney> the only thing that would prevent that today would be the check for the switchdev api
14:17:42 <sean-k-mooney> presumably the cyborg VF would not have that enabled, but it could in theory
14:17:44 <slaweq> so IIUC the new vnic type will be mostly useful for nova, right? So nova will not need to do various "if's"
14:18:19 <slaweq> and will know that if vnic_type=='cyborg' then device_profile is needed also
14:18:20 <sean-k-mooney> slaweq: it's also useful for neutron so existing ml2 drivers don't have to be modified to filter out cyborg ports if they don't support them
14:18:22 <slaweq> is that correct?
14:18:36 <sean-k-mooney> yes
14:18:38 <slaweq> sean-k-mooney: right
14:19:12 <sean-k-mooney> we could technically make this work without the new vnic type
14:19:28 <slaweq> and without this new type both nova and neutron will need to do something like: if vnic_type=='direct' and device_profile is not None, then "it's a cyborg port"
14:19:29 <sean-k-mooney> but we felt being explicit was simpler
14:19:31 <slaweq> or something like that
14:19:35 <slaweq> correct?
14:19:43 <sean-k-mooney> yes
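For illustration, a minimal Python sketch of the trade-off slaweq and sean-k-mooney describe above (the helper names and the 'cyborg' value are hypothetical, taken from the RFE's original proposal; the final name is discussed later in the meeting):

    def looks_cyborg_backed(port):
        # Heuristic both nova and neutron would need without a new vnic type.
        return (port.get('binding:vnic_type') == 'direct'
                and port.get('device_profile') is not None)

    def is_accelerator_port(port):
        # With an explicit vnic type the intent travels with the port itself,
        # so ML2 drivers that do not list it simply never bind such ports.
        return port.get('binding:vnic_type') == 'cyborg'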
14:19:45 <yonglihe> what about new vif?
14:19:55 <yonglihe> just like ml2/ovs does it
14:20:13 <slaweq> thx sean-k-mooney for confirmation
14:20:25 <sean-k-mooney> yonglihe: sorry, I'm not following, can you restate that?
14:20:55 <yonglihe> use the vif to mark the port as 'cyborg backed' instead of the vnic.
14:21:05 <yonglihe> vif_type vs vnic_type
14:21:08 <sean-k-mooney> you cannot set the vif-type
14:21:13 <yonglihe> ok
14:21:16 <sean-k-mooney> that is chosen by the driver
14:21:25 <sean-k-mooney> we could add a new one i guess
14:21:28 <mlavalle> vif type is an output of the binding process
14:21:42 <yonglihe> so that won't work then
14:21:45 <yonglihe> thanks
14:21:48 <slaweq> so, one more question - about amotoki's concern regarding name of the new vnic_type
14:21:52 <sean-k-mooney> so we could use vnic direct with vif-type cyborg
14:22:20 <sean-k-mooney> that would allow us to reuse macvtap or direct-physical if we wanted to
14:22:27 <slaweq> can it be something else, to reflect "corresponding functionality rather than who implements the functionality"?
14:22:40 <amotoki> if we assume the 'direct' vnic type with cyborg support, isn't it better to name it as direct-cyborg or direct-<....> with a more functional name?
14:23:03 <slaweq> or "accelerator" maybe?
14:23:09 <yonglihe> ok for me
14:23:13 <amotoki> if so, if you want to support cyborg with another vnic, we can name it as XXX-cyborg/accelerator.
14:23:24 <sean-k-mooney> amotoki: that came up at the PTG and I believe sylvain had a similar concern, basically suggesting not to include the project name
14:23:48 <mlavalle> that's a good idea IMO
14:23:48 <yonglihe> accelerator-x might be nice
14:23:53 <sean-k-mooney> slaweq: accelerator and device-profile were both suggested before
14:24:03 <slaweq> :)
14:24:04 <sean-k-mooney> i have no strong feeling either way
14:24:19 <yonglihe> accelerator-direct accelerator-direct-phy
14:24:21 <mlavalle> let's not use the project name
14:24:27 <yonglihe> agree
14:24:28 <gibi> yonglihe: +1
14:24:39 <amotoki> +1 for yonglihe's idea
14:24:41 <slaweq> that is fine for me
14:24:43 <sean-k-mooney> yep accelerator-<connection mechanism> sounds good to me
14:24:44 <slaweq> +1
14:24:55 <yonglihe> nice
14:24:57 <xinranwang> yonglihe: +1
14:25:04 <mlavalle> +1
14:25:12 <ralonsoh> +1
14:25:14 <ralonsoh> I'll amend the patch today
14:25:24 <slaweq> haleyb: njohnston: any thoughts?
14:25:56 <haleyb> +1 from me
14:26:12 <yonglihe> ralonsoh, thanks, I'm going to verify that patch in 3 days
14:26:30 <slaweq> I will mark that rfe as approved
14:26:31 <sean-k-mooney> by the way we are avoiding the existing smartnic vnic type because that is used for ironic already.
14:26:33 <sean-k-mooney> https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/definitions/portbindings.py#L119
14:26:35 <slaweq> with note about naming change
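To make the agreed naming concrete, a sketch of what the new constants in neutron-lib's portbindings module could look like (the names are assumptions following the accelerator-<connection mechanism> pattern above, not ralonsoh's actual amended patch):

    # Hypothetical additions, mirroring the style of the existing VNIC_* constants.
    VNIC_ACCELERATOR_DIRECT = 'accelerator-direct'
    VNIC_ACCELERATOR_DIRECT_PHYSICAL = 'accelerator-direct-physical'
    # Both values would also be appended to the module's VNIC_TYPES list so that
    # only ML2 drivers which explicitly support them can bind such ports.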
14:27:13 <slaweq> next RFE now
14:27:15 <xinranwang> slaweq ralonsoh  thanks
14:27:15 <slaweq> https://bugs.launchpad.net/neutron/+bug/1900934
14:27:19 <openstack> Launchpad bug 1900934 in neutron "[RFE][DHCP][OVS] flow based DHCP" [Wishlist,New] - Assigned to LIU Yulong (dragon889)
14:27:23 <slaweq> thank You xinranwang for proposal
14:27:54 <slaweq> regarding LP 1900934 - this was already discussed a few times
14:28:10 <slaweq> liuyulong proposed spec already https://review.opendev.org/c/openstack/neutron-specs/+/768588
14:28:20 <slaweq> but rfe is still not decided
14:28:57 <slaweq> so I think we should decide if we want to go with this solution, and continue discussion about details in the review of spec or if we don't want it in neutron at all
14:29:03 <sean-k-mooney> slaweq: i assume this is doing dhcp via openflow rules similar to ovn, with ml2/ovs?
14:29:11 <slaweq> sean-k-mooney: yes
14:29:15 <slaweq> exactly
14:29:43 <sean-k-mooney> cool, that would be nice especially for routed networks
14:29:56 <sean-k-mooney> since each l2 agent could provide dhcp for the segment
14:30:05 <mlavalle> I also have to say that my employer might be interested on this
14:30:26 <sean-k-mooney> assuming it was done as an l2 agent extension rather than in the dhcp agent
14:30:32 <ralonsoh> I'm ok with the RFE, just some comments in the spec (we can move the discussion there)
14:30:43 <slaweq> sean-k-mooney: that is original proposal IIRC
14:30:46 <ralonsoh> just wondering about the gaps between the DHCP agent and OVS DHCP
14:31:29 <slaweq> ralonsoh: one of the gaps will for sure be that there will be no dns name resolving in that case
14:31:32 <slaweq> only dhcp
14:31:35 <ralonsoh> yeah
14:31:56 <amotoki> how about extra dhcp options?
14:32:01 <slaweq> also, I'm not sure if all extra-dhcp-options will work
14:32:05 <slaweq> amotoki++
14:32:13 <slaweq> probably some of them may not work, I'm not sure
14:32:25 <slaweq> but IMHO that
14:32:33 <amotoki> anyway it can be covered by documentation on feature differences between flow-based dhcp and dhcp-agent
14:32:36 <slaweq> that is fine as long as it is documented
14:32:51 <slaweq> amotoki: You are faster than me again :P
14:32:59 <mlavalle> and will serve a lot of 'plain vanilla' dhcp cases
14:33:28 <amotoki> I think it is better to call it "flow-based dhcp agent" rather than distributed dhcp agent. slaweq's rfe covers a distributed agent in some way too.
14:34:50 <slaweq> amotoki: technically it's not even an "agent" but a dhcp extension
14:35:14 <amotoki> slaweq: correct. I know it is not an agent.
14:35:36 <amotoki> I spelled "dhcp AGENT" too many times :(
14:35:43 <slaweq> :)
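As a conceptual Python sketch of the flow-based approach being discussed (hypothetical names, not code from the spec): the L2 agent already knows each bound port's MAC and fixed IPs, so a DHCP reply can be assembled locally and answered via OpenFlow instead of being relayed to a dnsmasq process managed by the DHCP agent.

    def build_dhcp_reply_options(port, subnet):
        """Options a local, flow-based DHCP responder could hand out."""
        options = {
            'yiaddr': port['fixed_ip'],        # offered address
            'subnet_mask': subnet['netmask'],
            'router': subnet['gateway_ip'],
            'lease_time': 86400,
        }
        # Gaps noted above: no local DNS resolving, so only statically
        # configured name servers can be served, and some extra_dhcp_opts
        # may not be supported; the differences need to be documented.
        options['domain_name_servers'] = subnet.get('dns_nameservers', [])
        for opt in port.get('extra_dhcp_opts', []):
            options[opt['opt_name']] = opt['opt_value']
        return options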
14:37:08 <mlavalle> so let's approve it and move on with the RFE
14:37:20 <mlavalle> +1 from me
14:37:36 <ralonsoh> +1
14:37:37 <amotoki> i'm fine to approve it
14:37:39 <haleyb> +1 from me
14:37:40 <slaweq> mlavalle: that is also my vote - let's approve the rfe and continue the discussion about details in the spec review
14:37:46 <slaweq> so +1
14:38:12 <slaweq> I will mark this rfe as approved
14:38:14 <slaweq> thx
14:38:27 <slaweq> last rfe for today
14:38:28 <slaweq> https://bugs.launchpad.net/neutron/+bug/1910533
14:38:41 <openstack> Launchpad bug 1910533 in neutron "[RFE] New dhcp agents scheduler - use all agents for each network" [Wishlist,New] - Assigned to Slawek Kaplonski (slaweq)
14:39:05 <slaweq> I proposed that RFE but ralonsoh may have more details about the use case
14:39:12 <ralonsoh> I can confirm this is a source of problems in some deployments
14:39:16 <slaweq> as he was recently "impacted" by this limitation :)
14:39:52 <ralonsoh> if you have several leafs in a deployment and networks across those leafs
14:40:26 <ralonsoh> if you don't specify the correct number of DHCP agents, some leafs won't have a DHCP agent running
14:40:34 <ralonsoh> and the VMs won't have IP
14:40:53 <amotoki> ralonsoh: is the broadcast domain separated?
14:40:59 <ralonsoh> yes
14:41:55 <amotoki> to overcome it we need to use dhcp-relay or deploy dhcp agents per broadcast domain
14:42:14 <amotoki> this request sounds reasonable to me
14:42:18 <haleyb> ralonsoh: so when you add a leaf/site but don't increase the agents it doesn't get an agent?
14:42:35 <ralonsoh> exactly this is the problem
14:42:57 <haleyb> ack, thanks
14:43:05 <slaweq> haleyb: right, as the number of agents per network isn't related to sites at all
14:43:32 <slaweq> so such a new scheduler could simply be a "workaround" for that problem
14:44:11 <sean-k-mooney> ralonsoh: technically you could deploy 1 dhcp instance per network segment
14:44:30 <ralonsoh> yes, that's a possibility
14:44:37 <sean-k-mooney> at least for routed networks
14:44:39 <ralonsoh> but you need to know where each agent is
14:44:57 <sean-k-mooney> you kind of already do
14:45:13 <sean-k-mooney> you know its hosts and the segment mappings
14:45:43 <sean-k-mooney> but again only in the routed networks case
14:47:02 <sean-k-mooney> increasing the dhcp agent count would not guarantee it is on the leaf site, right
14:47:16 <sean-k-mooney> it could add another instance to the central site in principle
14:47:20 <ralonsoh> ok, I was looking for the BZ
14:47:21 <ralonsoh> https://bugzilla.redhat.com/show_bug.cgi?id=1886622
14:47:33 <openstack> bugzilla.redhat.com bug 1886622 in openstack-neutron "Floating IP assigned to an instance is not accessible in scale up Multistack env with spine& leaf topology" [High,Assigned] - Assigned to ralonsoh
14:47:35 <ralonsoh> it has public information about this error
14:48:04 <amotoki> we can assume a deployment knows which network node belongs to which segment
14:48:09 <slaweq> sean-k-mooney: right, that's why we propose to add a scheduler which will schedule the network to all dhcp agents
14:48:19 <sean-k-mooney> so really, when adding a new leaf site today you would have to explicitly add an instance to the new agent deployed at that site
14:48:25 <ralonsoh> amotoki, yes, that's correct
14:48:27 <amotoki> if we deploy a dhcp agent per segment, scheduling a network to all dhcp agents will be a workaround
14:49:10 <sean-k-mooney> amotoki: you could do both: all agents if it's not a routed network and per segment if it is. but per segment is just an optimisation really
14:49:53 <amotoki> sean-k-mooney: yes, that's right
14:50:08 <slaweq> TBH Liu's proposal about distributed dhcp would solve this use case also
14:50:45 <ralonsoh> right (for OVS)
14:50:53 <slaweq> ralonsoh: yep
14:51:37 <ralonsoh> but it would be desirable to have this dhcp scheduler to avoid the need to set the exact number of DHCP agents needed
14:52:20 <amotoki> yeah, agree. deployments can continue to use the DHCP agent they are familiar with too.
14:52:43 <amotoki> I am not sure we need a new dhcp agent scheduler for this.
14:52:46 <amotoki> Another option is to modify the current dhcp agent scheduler to accept an option like dhcp_agent_per_network=all
14:53:02 <ralonsoh> agree
14:53:19 <slaweq> amotoki: that may be good idea
14:54:18 <ralonsoh> do you prefer to explore this idea? the change in the code will be smaller
14:54:37 <slaweq> ralonsoh: I can
14:54:51 <slaweq> and we will get back to that in next weeks
14:55:21 <amotoki> anyway I am okay with the basic idea to assign a network to all agents.
14:55:38 <ralonsoh> +1 to this idea
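A minimal sketch of the scheduling behaviour under discussion (a hypothetical function, not Neutron's scheduler API), covering both amotoki's dhcp_agent_per_network=all option and today's fixed-count behaviour:

    import random

    def agents_to_schedule(alive_dhcp_agents, dhcp_agents_per_network):
        """Pick the DHCP agents that should host a given network."""
        if dhcp_agents_per_network == 'all':
            # Every alive agent hosts the network, so every leaf that runs an
            # agent serves DHCP locally without tuning the configured count.
            return list(alive_dhcp_agents)
        # Current behaviour: a fixed-size sample, which can leave a new leaf
        # site without a local agent if the count is not raised accordingly.
        count = min(int(dhcp_agents_per_network), len(alive_dhcp_agents))
        return random.sample(alive_dhcp_agents, count)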
14:55:56 <slaweq> so do You want to vote on approving the rfe as an idea today, or wait for some PoC code?
14:56:16 <slaweq> (I will not vote as I proposed rfe)
14:56:17 <mlavalle> +1
14:56:24 <ralonsoh> I can wait for a POC
14:56:35 <amotoki> i am okay with either way
14:56:45 <mlavalle> but we can surely approve the RFE
14:56:45 <haleyb> +1 from me
14:57:03 <mlavalle> if the PoC is not satisfactory, we can scrap it
14:57:12 <mlavalle> I think we can trust slaweq
14:57:17 <slaweq> thx :)
14:57:17 <mlavalle> can't we?
14:57:18 <ralonsoh> maybe...
14:57:20 <ralonsoh> hehehehe
14:57:23 <slaweq> :P
14:57:28 <amotoki> hehe :)
14:57:35 <slaweq> I don't trust myself so I'm not sure ;)
14:57:41 <slaweq> but thank You
14:57:48 <slaweq> I will mark this one as approved also
14:57:56 <slaweq> this was a really effective meeting
14:58:00 <slaweq> 3 rfes approved
14:58:00 <ralonsoh> sure
14:58:06 <slaweq> thank You
14:58:14 <mlavalle> o/
14:58:16 <slaweq> I think we can call it a meeting now
14:58:20 <ralonsoh> bye!
14:58:29 <slaweq> have a great weekend and see You online
14:58:29 <amotoki> o/
14:58:31 <slaweq> o/
14:58:33 <slaweq> #endmeeting