15:00:08 <slaweq> #startmeeting neutron_qos
15:00:09 <openstack> Meeting started Tue May  8 15:00:08 2018 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:12 <openstack> The meeting name has been set to 'neutron_qos'
15:00:14 <slaweq> hi!
15:00:27 <mlavalle> hi
15:00:34 <rubasov> hi
15:01:16 <slaweq> ok, let's start
15:01:22 <slaweq> #topic RFEs
15:01:24 <lajoskatona_> hi
15:01:36 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1727578
15:01:37 <openstack> Launchpad bug 1727578 in neutron "[RFE]Support apply qos policy in VPN service" [Wishlist,Triaged]
15:01:56 <slaweq> just FYI: the spec for this one is merged already
15:01:57 <njohnston> hello
15:02:07 <slaweq> I hope zhaobo6 will start working on implementation now
15:02:17 <mlavalle> I know he is working on it
15:02:34 <slaweq> super :)
15:02:44 <mlavalle> it is actually in his goals
15:02:59 <slaweq> is he from Your company?
15:03:03 <mlavalle> yeah
15:03:19 <slaweq> super - so we have good info about that :)
15:03:20 <slaweq> thx
15:03:43 <slaweq> next one on list is:
15:03:43 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1560963
15:03:45 <openstack> Launchpad bug 1560963 in neutron "[RFE] Minimum bandwidth support (egress)" [Wishlist,In progress]
15:03:55 <slaweq> I have an idea how it could maybe be done, but I haven't even tested a PoC to see if it actually works
15:04:43 <slaweq> I'm thinking about marking packets (like MARK in iptables) when a packet is going out from the tap interface, and then matching such marks on the physical interface to the proper class
15:05:17 <slaweq> but as I said, I haven't even checked with any PoC whether that makes sense
15:05:32 <slaweq> do You think that such a solution might work?
15:05:46 <mlavalle> worth giving it a try
15:05:59 <mlavalle> did the paper I sent help?
15:06:08 <slaweq> yes, I read it
15:06:10 <reedip_> o/
15:06:40 <slaweq> and it is quite helpful
15:06:45 <slaweq> thx mlavalle for it
15:06:45 <mlavalle> cool
15:06:50 <slaweq> hi reedip_
15:06:51 <rubasov> I have a faint memory of somebody claiming iptables CLASSIFY is more efficient than MARK
15:07:03 <njohnston> hi reedip_!
15:07:31 <slaweq> rubasov: thx for the tip
15:07:40 <slaweq> I will try to explore it
15:08:07 <slaweq> there is also the case with openflow rules and without iptables, so that should be covered somehow too :)
15:09:36 <slaweq> I really want to make some PoC for it soon
15:09:40 <slaweq> :)
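(Editor's note: a minimal sketch of the MARK/CLASSIFY idea discussed above, assuming an HTB qdisc on the physical interface; the interface names, mark values and rates are illustrative only, not the eventual Neutron implementation. The openflow/no-iptables case slaweq mentions would need the marking done with OVS flow rules instead.)

    # Sketch of the mark-and-classify idea (all names and numbers illustrative).
    # Traffic leaving a VM's tap device is marked (or directly classified) in
    # the mangle table; an HTB class on the physical interface then provides
    # the minimum-bandwidth guarantee ('rate' = guaranteed floor, 'ceil' =
    # borrowing limit).
    import subprocess

    def run(cmd):
        # Run a shell command, raising on failure.
        subprocess.run(cmd, shell=True, check=True)

    PHYS = "eth0"  # assumed physical (egress) interface
    TAP = "tap0"   # assumed VM tap device

    run(f"tc qdisc add dev {PHYS} root handle 1: htb")
    run(f"tc class add dev {PHYS} parent 1: classid 1:10 htb rate 10mbit ceil 1gbit")

    # Variant A: MARK in mangle, then a tc 'fw' filter maps the mark to the class.
    run(f"iptables -t mangle -A POSTROUTING -o {PHYS} "
        f"-m physdev --physdev-in {TAP} -j MARK --set-mark 0x1")
    run(f"tc filter add dev {PHYS} parent 1: protocol ip prio 1 handle 0x1 fw flowid 1:10")

    # Variant B: CLASSIFY assigns the tc class directly, skipping the fw filter
    # (the reportedly cheaper alternative rubasov mentions above).
    # run(f"iptables -t mangle -A POSTROUTING -o {PHYS} "
    #     f"-m physdev --physdev-in {TAP} -j CLASSIFY --set-class 1:10")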
15:09:49 <slaweq> ok, moving to the next one
15:09:50 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1560963
15:09:52 <openstack> Launchpad bug 1560963 in neutron "[RFE] Minimum bandwidth support (egress)" [Wishlist,In progress]
15:10:13 <slaweq> AFAIK rubasov and mlavalle wanted to discuss something about that one today
15:10:16 <slaweq> right?
15:10:23 <mlavalle> yeah
15:10:25 <rubasov> yep
15:10:32 <slaweq> so go on :)
15:10:52 <rubasov> let me briefly summarize the question
15:11:00 <rubasov> we have two options
15:11:51 <rubasov> option 1: one qos policy rule (minimum_bw as today) extended with two boolean parameters: enforce_data_plane, enforce_placement
15:12:09 <rubasov> option 2: two qos policy rules, one for the data plane and one for placement
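(Editor's note: for illustration, the two designs as hypothetical API request bodies; the field names here are made up for the example, not the final API.)

    # Option 1: a single minimum_bandwidth rule with two boolean switches.
    option_1 = {
        "minimum_bandwidth_rule": {
            "direction": "egress",
            "min_kbps": 10000,
            "enforce_data_plane": True,  # enforced by the backend driver
            "enforce_placement": True,   # enforced at scheduling time
        }
    }

    # Option 2: two separate rule types, one per enforcement point.
    option_2 = [
        {"minimum_bandwidth_rule": {"direction": "egress", "min_kbps": 10000}},
        {"minimum_bandwidth_placement_rule": {"direction": "egress", "min_kbps": 10000}},
    ]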
15:12:55 <mlavalle> the point that I don't get is why we need to separate data plane enforcement and placement enforcement
15:12:56 <rubasov> option 1 is clearly better API design and better user experience (for the admin at least)
15:13:30 <rubasov> but I'm not sure about how it works with rule validation
15:13:33 <slaweq> I have the same question as mlavalle
15:14:01 <mlavalle> if a port has a policy with a minimum bandwidth rule, it should be enforced in the data plane and for placement purposes also
15:14:48 <rubasov> mlavalle, slaweq: since placement enforcement is agnostic of drivers we may have users wanting placement enforcement but not yet having data plane enforcement
15:15:45 <mlavalle> but shouldn't those users be aware of what is available?
15:15:52 <mlavalle> and request what is available
15:15:57 <rubasov> also (at least during upgrade) the current behavior is enforce_data_plane=True but enforce_placement=False
15:16:50 <rubasov> mlavalle: er, I'm not sure I understand your last question
15:17:32 <slaweq> what about validating this rule type for all backends and documenting properly that it is supported by X, Y, Z backends, and if You are using a different one then min bandwidth will only be enforced by placement
15:18:00 <slaweq> without an additional switch in the API
15:18:16 <mlavalle> The way I see it, having a minimum bandwidth rule in a port means that it should be enforced in all the components that are necessary in the OpenStack system to make it effective
15:18:52 <mlavalle> we shouldn't separate data plane and placement
15:19:02 <rubasov> but the system will actually work differently in different openstack versions, shouldn't that be shown in the API?
15:19:19 <slaweq> yes, but (at least for now) we don't have a way to enforce it on the data plane yet
15:19:59 <slaweq> but for some private clouds it might be useful to use it only with a "guarantee" at the placement level
15:20:16 <mlavalle> so, if I can create a minimum bandwidth rule, that is the API telling me it is possible
15:20:17 <slaweq> rubasov: is that what You want to achieve?
15:20:57 <rubasov> slaweq: yes
15:21:14 <slaweq> ok, so I at least understand it properly :)
15:21:15 <mlavalle> the fact that I can create and assign a minimum bandwidth rule to a port should be the necessary condition for that rule to be enforced across all the OpenStack components necessary to make it effective
15:21:26 <gibi> keeping a single rule and enforcing that placement allocation is mandatory makes the upgrade harder. This will force us to create allocations for bound ports in placement during upgrade. However it is the same upgrade that will install the code that will report bandwidth inventory.
15:21:52 <gibi> feels like a chicken and egg problem
15:22:29 <mlavalle> so let's solve a migration problem
15:22:36 <rubasov> mlavalle: for example what about direction ingress?
15:22:43 <mlavalle> but let's not create artificial concepts in the API
15:23:23 <slaweq> mlavalle++
15:23:30 <rubasov> as soon as we have placement enforcement, that will work for that direction (both directions in fact) for all drivers, but we may not have data plane enforcement for all drivers at that point
15:24:20 <slaweq> yes, and IMHO that should be properly documented so that users are aware of what is guaranteed and what is best effort only
15:25:04 <mlavalle> yes, isn't this a documentation issue?
15:26:57 <rubasov> but that documentation could not be global, it would depend on which drivers are loaded, right? so each deployer would have to document this again for each deployment
15:27:31 <slaweq> rubasov: but I was thinking about something like:
15:28:04 <mlavalle> we can give them the rules: if you have such and such drivers, your minimum bandwidth rules will be enforced
15:28:06 <slaweq> min bw limit is supported by the ovs and sr-iov backends, so if You use those backends, Your min bw will be guaranteed
15:28:34 * mlavalle tried to say the same thing as slaweq
15:28:37 <slaweq> if You use another backend, like linuxbridge, min bw is only enforced by placement and is not guaranteed
15:29:04 <slaweq> what do You think about something like that?
15:30:10 <rubasov> that's pretty much what we had in patch set 6
15:30:15 <rubasov> https://review.openstack.org/#/c/508149/6/specs/rocky/minimum-bandwidth-allocation-placement-api.rst@83
15:30:15 <patchbot> patch 508149 - neutron-specs - QoS minimum bandwidth allocation in Placement API
15:30:52 <rubasov> I'm trying to remember why we moved away from it (if there was a proper reason, I'm not sure)
15:31:21 <mlavalle> I am happy with that
15:31:46 <mlavalle> because what is stated there is a "temporary" situation
15:31:58 <mlavalle> that we can change as we add support for more drivers
15:32:18 <slaweq> at least it should be temporary and we should add support for ovs and lb backends for it :)
15:32:26 <mlavalle> in contrast, adding stuff to the API creates long term, permanent situations
15:32:37 <mlavalle> that are difficult to change in the future
15:33:06 <slaweq> mlavalle: in our case API changes are almost impossible to revert :)
15:33:08 <mlavalle> APIs create commitments
15:34:21 <rubasov> I'm just a bit afraid of the user being confused when asking for the same thing in the API and getting many different things depending on OpenStack version and/or backend drivers (both being things the end user should not know about)
15:35:27 <mlavalle> In that sense, how undesirable the situation is depends on how fast the community can add minimum bandwidth support to different drivers
15:35:29 <rubasov> but I can let that fear go if you prefer to handle this kind of support matrix by documentation
15:35:59 <slaweq> I would prefer docs way for that
15:36:06 <mlavalle> but that undesirable situation is temporary, as we add support for more drivers
15:36:10 <slaweq> but I think that it's mlavalle and drivers team choice :)
15:36:44 <mlavalle> and we can mitigate it with proper documentation
15:37:25 <rubasov> I accept that choice
15:37:34 <gibi> can we get back to the upgrade issue?
15:37:34 <mlavalle> in fact, we use that matrix approach already in the QoS documentation, IIRC
15:37:43 <mlavalle> sure
15:37:49 <gibi> assume there is a bound SRIOV port with a minimum bandwidth rule today. that is enforced on the data plane
15:38:36 <gibi> now the deployer upgrades neutron and nova to the version that supports the placement bandwidth rule and that
15:38:58 <gibi> requires the port to have a bandwidth allocation in placement
15:39:15 <gibi> during the upgrade we cannot make that allocation, as no bandwidth inventory will be in place
15:39:16 <slaweq> mlavalle: we have such matrix here: https://docs.openstack.org/neutron/latest/contributor/internals/quality_of_service.html#agent-backends
15:39:49 <rubasov> slaweq: yep, I was thinking of the same
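(Editor's note: for reference, a simplified rendering of the matrix on that page as it stood around this time; see the link above for the authoritative table.)

    rule type           OVS    Linux bridge    SR-IOV
    bandwidth limit     yes    yes             yes (egress)
    minimum bandwidth   no     no              yes (egress)
    DSCP marking        yes    yes             no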
15:39:56 <gibi> so after such an upgrade we will have an inconsistent resource allocation situation in placement
15:40:27 <gibi> which means any new boot with a QoS bandwidth aware port will result in a resource limit that is not enforced
15:40:46 <mlavalle> can't we include in the upgrade process a script that creates the bandwidth inventories?
15:40:58 <gibi> mlavalle: that inventory is agent dependent
15:41:02 <gibi> mlavalle: so I'm not sure
15:41:47 <gibi> moreover there are two cases
15:41:53 <mlavalle> if a host is running agents with minimum bandwidth support
15:42:10 <mlavalle> the script will create the RPs necessary
15:42:30 <gibi> mlavalle: so that script will duplicate the work of the given agent (ovs, sriov, etc)
15:42:40 <mlavalle> we can get that data in the controller from the agents heartbeats
15:42:58 <mlavalle> only during the upgrades
15:43:00 <gibi> mlavalle: but that would require an upgraded and running agent
15:44:01 <gibi> mlavalle: and an upgraded and running controller that understands the new heartbeat sturcture
15:44:04 <slaweq> gibi: such a script would only need to duplicate the sriov agent's work, as only this backend currently supports min bandwidth
15:44:12 <gibi> slaweq: good point
15:44:16 <gibi> slaweq: that helps
15:44:27 <gibi> still there are two cases
15:44:45 <gibi> i) we can do an in place allocation for the bound port
15:44:56 <gibi> ii) we cannot as there is not enough bandwidth on that host
15:45:11 <rubasov> I think (ii) is the really hard one
15:45:18 <gibi> ii) would require a live migration to resolve
15:45:38 <mlavalle> yeah, and the admin should be aware of it
15:45:49 <mlavalle> and take necessary actions
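(Editor's note: a minimal sketch of what the one-time upgrade helper mlavalle suggests could look like against the placement REST API, covering case i only; the endpoint, token handling, numbers and the custom resource class name are assumptions for illustration, not an agreed design.)

    # Pre-populate placement with a bandwidth inventory for one host (case i).
    import requests

    PLACEMENT = "http://placement.example:8778"  # assumed endpoint
    HEADERS = {
        "X-Auth-Token": "ADMIN_TOKEN",              # assumed admin token
        "OpenStack-API-Version": "placement 1.19",  # assumed microversion
    }

    def ensure_bandwidth_inventory(rp_uuid, total_kbps):
        # Fetch the resource provider to learn its current generation.
        rp = requests.get(f"{PLACEMENT}/resource_providers/{rp_uuid}",
                          headers=HEADERS).json()
        body = {
            "resource_provider_generation": rp["generation"],
            "inventories": {
                # Illustrative custom resource class; no standard class for
                # NIC bandwidth existed at the time of this meeting (a custom
                # class must first be created with PUT /resource_classes/{name}).
                "CUSTOM_NET_BW_EGRESS_KBPS": {
                    "total": total_kbps,
                    "reserved": 0,
                    "min_unit": 1,
                    "max_unit": total_kbps,
                    "step_size": 1,
                    "allocation_ratio": 1.0,
                }
            },
        }
        resp = requests.put(f"{PLACEMENT}/resource_providers/{rp_uuid}/inventories",
                            headers=HEADERS, json=body)
        resp.raise_for_status()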
15:46:05 <slaweq> mlavalle++
15:46:17 <mlavalle> however unpleasant the upgrade process might be, it will happen once
15:46:19 <gibi> mlavalle: either we fail the upgrade or implement a pre-flight check for such a situation
15:46:55 <mlavalle> but that is preferable to adding things to the API that we will never be able to remove
15:47:21 <mlavalle> and that will create an unpleasant experience to users and admins for a loooong time
15:48:02 <gibi> mlavalle: your statement about API commitments makes sense to me. I'm just pretty afraid about the unpleasantness of this upgrade
15:48:26 <mlavalle> me too, but it is only once per deployment
15:48:34 <gibi> mlavalle: maybe when we will have some PoC code where I can play with the upgrade that would help removing some of the fear
15:48:59 <mlavalle> let's focus on mitigating that unpleasantness as much as we can
15:49:14 * mlavalle not sure that word exists
15:50:02 <rubasov> IMHO even if gibi's upgrade concerns are valid (not 100% sure myself at the moment) we can get back to this if and when the upgrade proves to be too painful in a lab setting (but before removing the experimental flag from the overall feature)
15:50:31 <gibi> rubasov: I can live with that
15:50:36 <mlavalle> yeah, let's move ahead with the spec assuming documentation
15:50:55 <mlavalle> will help us address the migration problem
15:51:07 <mlavalle> we all understand that there are trade offs
15:51:30 <mlavalle> and let's not over commit to an API design that is not ideal
15:51:33 <slaweq> I agree, let's make something first and then deal with upgrades problem :)
15:51:47 <rubasov> understood
15:52:02 <rubasov> I'll update the spec accordingly tomorrow
15:52:42 <mlavalle> rubasov, gibi, lajoskatona_: I want to commend your great work over the past few weeks on this topic
15:52:51 <mlavalle> you guys are the best !
15:52:58 <slaweq> mlavalle++
15:52:59 <gibi> mlavalle: thanks
15:53:03 <rubasov> mlavalle, slaweq: thank you for all the help
15:54:16 <mlavalle> ok, slaweq I think we can move on
15:54:26 <slaweq> ok, I think we can quickly move to next topic now :)
15:54:38 <slaweq> or wait
15:54:41 <slaweq> there is also
15:54:42 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1505627
15:54:43 <openstack> Launchpad bug 1505627 in neutron "[RFE] QoS Explicit Congestion Notification (ECN) Support" [Wishlist,Triaged] - Assigned to Reedip (reedip-banerjee)
15:54:47 <slaweq> which is marked as postponed
15:55:03 <slaweq> but specs is waiting for review: https://review.openstack.org/#/c/445762/
15:55:04 <patchbot> patch 445762 - neutron-specs - Spec for Explicit Congestion Notification
* mlavalle has that spec in his backlog of reviews
15:55:18 <slaweq> so please add it to Your queue maybe :)
15:55:23 <slaweq> thx mlavalle
15:55:28 <slaweq> ok, so next topic
15:55:33 <slaweq> #topic Bugs
15:55:52 <slaweq> I'm not aware of any new bugs related to QoS
15:56:01 <slaweq> so just short sum up
15:56:06 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1758316
15:56:07 <openstack> Launchpad bug 1758316 in neutron "Floating IP QoS don't work in DVR router" [High,In progress] - Assigned to LIU Yulong (dragon889)
15:56:21 <slaweq> the patch for that is probably in the gate: https://review.openstack.org/#/c/558724/
15:56:22 <patchbot> patch 558724 - neutron - [L3][QoS] Cover mixed dvr_snat and compute node dv...
15:56:33 <slaweq> so it might be considered as fixed IMO :)
15:56:51 <slaweq> no, there is a -2 from zuul :/
15:56:52 <mlavalle> yeap
15:56:58 <slaweq> so we will have to recheck it
* mlavalle who is weizj? he has been reviewing a lot of patches
15:57:40 <slaweq> mlavalle: I don't know
15:57:49 <slaweq> but I also saw a lot of reviews from him recently
15:57:54 <mlavalle> well, it is good to have one more reviewer
15:58:03 <slaweq> yes
15:58:11 <slaweq> two more bugs (very old):
15:58:12 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1639186
15:58:13 <openstack> Launchpad bug 1639186 in neutron "qos max bandwidth rules not working for neutron trunk ports" [Low,Confirmed]
15:58:16 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1732852
15:58:17 <openstack> Launchpad bug 1732852 in neutron "neutron don't support Router gateway rate limit " [Low,In progress]
15:58:32 <slaweq> there is no update about them
15:58:37 <slaweq> it's just a short reminder - maybe someone wants to hug them :)
15:59:05 <lajoskatona_> bye
16:00:18 <slaweq> ok, I think that we are almost out of time now
16:00:24 <slaweq> so thanks guys
16:00:25 <mlavalle> yeap
16:00:35 <slaweq> #endmeeting