15:02:22 <ajo> #startmeeting neutron_qos
15:02:22 <davidsha> hi
15:02:22 <openstack> Meeting started Tue Jan 31 15:02:22 2017 UTC and is due to finish in 60 minutes.  The chair is ajo. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:23 <ajo> Hello :)
15:02:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:25 <openstack> The meeting name has been set to 'neutron_qos'
15:02:39 <ajo> #topic Announcements
15:02:45 <njohnston> :q
15:02:49 <ajo> Qos driver is finally in shape and refactor merged
15:02:53 <ralonsoh> great!!!
15:03:04 <slaweq_> \o/
15:03:08 <davidsha> hi njohnston
15:03:21 <ajo> #link https://review.openstack.org/#/c/396651/
15:03:22 <ajo> sorry for being sooooo slow :/
15:03:30 <ajo> We shall ping the relevant people on midokura (yamamoto), ovn (russellb),
15:03:30 <ajo> nsx (garyk) to see if they need any help migrating to the new
15:03:30 <ajo> model, the old notification_driver support will be removed in Pike
15:03:53 <ajo> #action ajo ping all qos-driver implementers to update their drivers in pike, change shall be easy,
15:04:30 <ajo> Now slaweq_ is working on the 2nd thing: Enhanced validation
15:04:30 <ajo> #link https://review.openstack.org/#/c/426946/
15:04:35 <ajo> for Pike
15:05:03 <slaweq_> I will do my best to finish it asap
15:05:20 <ralonsoh> I'll review this patch tomorrow and I'll follow it
15:05:21 <ajo> The structural changes made him reboot his original patch
15:05:25 <ajo> But I hope the new interface will help
15:05:25 <slaweq_> now it should be easier than it was before Your refactor
15:05:49 <ajo> yes, before it was becoming a bit too complicated, let's see how it will look now
15:06:00 <slaweq_> as I was looking into it yesterday it should be easier but we will see :)
15:06:37 <ajo> My colleagues always fear me when I say it will be easy, it never is :) right dalvarez ? :P
15:07:02 <slaweq_> I know, that's why I wrote "it should be" :P
15:07:02 <ajo> so... I have another point for documentation
15:07:20 <ajo> 'XD
15:07:21 <ajo> #topic Documentation
15:07:26 <ajo> Networking guide needs to be updated about minimum egress bandwidth
15:07:26 <ajo> support.
15:07:27 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1618769
15:07:28 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1618762
15:07:28 <openstack> Launchpad bug 1618769 in openstack-manuals " SR-IOV: add agent QoS driver to support egress minimum bandwidth" [Low,Confirmed] - Assigned to Miguel Angel Ajo (mangelajo)
15:07:30 <openstack> Launchpad bug 1618762 in openstack-manuals " Add QoS minimum bandwidth rule for instance egress traffic" [Low,Confirmed] - Assigned to Miguel Angel Ajo (mangelajo)
15:07:43 <ajo> I planned to work on that
15:07:47 <ajo> it shall be easy
15:08:00 <ajo> but we must have it ready for the release, this was on Newton I think, but we forgot ':D
15:08:23 <ralonsoh> I can take one of this
15:08:25 <dalvarez> ajo, lol im scared now
15:08:28 <ralonsoh> one of these
15:08:33 <ajo> X) dalvarez
15:08:56 <ajo> ralonsoh it's the same one I think, not sure if we need to do anything special for SR-IOV
15:09:05 <ralonsoh> No, nothing special
15:09:21 <ajo> one was generated for the "api changes" and the other one for docimpact on the SR-IOV
15:09:27 <ralonsoh> ahhh I see
15:09:28 <ajo> so one commit shall close it all, may by I can write, and I can ping you for reviews ralonsoh ?
15:09:34 <ralonsoh> yes
15:09:38 <ajo> may by->may be
15:09:39 <ajo> :)
15:09:49 <reedip_> o/
15:10:00 <ajo> hi reedip_ I've got a point for you later ;)
15:10:15 <reedip_> :D ok ajo
15:10:21 <ajo> #topic Bugs
15:10:32 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1649503
15:10:32 <openstack> Launchpad bug 1649503 in neutron "Mechanism driver can't be notified with updated network" [High,In progress] - Assigned to Hong Hui Xiao (xiaohhui)
15:10:32 <ajo> We need kevinbenton's help on this one
15:10:47 <ajo> it's related to some recent change, now when you detach a policy from a network, that's not notified properly
15:11:24 <ajo> this looks like a high priority bug (in the context of QoS)
15:11:41 <slaweq_> but it's been waiting for action for quite a long time now
15:11:54 <slaweq_> maybe You can ask kevinbenton to look at it? :)
15:12:00 <ajo> yes, I will,
15:12:04 <ajo> I started by adding him to the bug
15:12:08 <ajo> I will ping him personally
15:12:48 <ajo> #action ajo ping kevinbenton about bug 1649503
15:12:48 <openstack> bug 1649503 in neutron "Mechanism driver can't be notified with updated network" [High,In progress] https://launchpad.net/bugs/1649503 - Assigned to Hong Hui Xiao (xiaohhui)
15:13:26 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1627749 better error handling
15:13:26 <openstack> Launchpad bug 1627749 in neutron "qos driver api can have better error handling" [Medium,Confirmed] - Assigned to Miguel Angel Ajo (mangelajo)
15:13:38 <ajo> there was some work around that on the qos driver refactor, but not sure if enough
15:13:41 * ajo looks for the link
15:14:10 <ajo> https://review.openstack.org/#/c/396651/29/neutron/services/qos/drivers/manager.py@69
15:14:15 <ajo> may be this is not enough to close that bug
15:14:44 <ajo> shall we probably, let all drivers be called, and then raise the exception after rpc push happens ?
15:15:02 <ajo> thoughts?
15:15:22 <ralonsoh> I don't think so
15:15:28 <ajo> we don't have yamamoto here
15:15:31 <ralonsoh> I prefer the way it is implemented now
15:15:53 <ajo> ralonsoh the problem with current implementation, is that if one driver fails, the others aren't called at all
15:15:58 <ajo> may be some are called, some are not
15:15:59 <ralonsoh> I know
15:16:01 <ajo> depending on the order
15:16:04 <ralonsoh> but no driver should fail
15:16:07 <ajo> it's very non-deterministic
15:16:16 <ajo> that is true
15:16:29 <ajo> if a driver fails it's its responsibility to retry later, and resync...
15:16:30 <ajo> but
15:16:40 <ajo> a bad implementation of one, should not affect the others, or the rpc
15:16:56 <ajo> it's not bad if we are more robust in the face of external errors
15:17:20 <ajo> unless we find a good reason why that would be problematic
15:17:38 <ralonsoh> ok, that's a new patch!
15:17:55 <ajo> (code being more complex can be a good reason against it, but I believe it shouldn't be very complicated *warning on perceived complexity O:)* )
15:18:23 <ajo> ralonsoh I will try to put up a new patch for this, see how it looks, if the code goes weird I'm ok to drop it
15:18:38 <ralonsoh> ok
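A minimal sketch of the behaviour ajo proposes above: call every driver, remember the failures, do the RPC push, and only then raise. This is not the code in neutron/services/qos/drivers/manager.py; the names DriverCallError, notify_drivers and rpc_push are hypothetical and exist only for illustration.

```python
class DriverCallError(Exception):
    """Hypothetical error raised once every driver has been tried."""

    def __init__(self, failures):
        # failures is a list of (driver_name, exception) pairs
        self.failures = failures
        names = ", ".join(name for name, _ in failures)
        super().__init__("QoS drivers failed: " + names)


def notify_drivers(drivers, method_name, policy, rpc_push):
    """Call method_name on each driver, then push over RPC, then raise.

    A failing driver does not prevent the remaining drivers or the
    RPC push from running, so the result no longer depends on the
    order in which drivers are registered.
    """
    failures = []
    for driver in drivers:
        try:
            getattr(driver, method_name)(policy)
        except Exception as exc:  # deliberately broad: isolate bad drivers
            failures.append((driver.name, exc))

    # The RPC push happens regardless of driver failures.
    rpc_push(policy)

    # Report the collected failures only after everything else ran.
    if failures:
        raise DriverCallError(failures)
```

The trade-off discussed in the meeting still applies: this is more robust against one bad driver, at the cost of a little extra code in the manager.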
15:19:21 * ajo looks at https://bugs.launchpad.net/neutron/+bugs?field.tag=qos
15:19:23 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1657381
15:19:23 <openstack> Launchpad bug 1657381 in neutron "QoS drivers need to implement a precommit for the actions" [Medium,In progress] - Assigned to Miguel Angel Ajo (mangelajo)
15:19:32 <ajo> This one, after a thought, is not very clear to me that it's necessary
15:19:56 <ajo> since in the end, drivers are not supposed to fail on any policy modification
15:20:04 <ajo> may be what's bad for them is good for other drivers
15:20:17 <ajo> and that will be handled by the enhanced validation
15:20:35 <ajo> bad for them = "the specific driver can't handle it"
15:21:00 <ajo> so, the fact that ODL for example writes to a log, and then sends the log, is implementation detail, they can do that in the one existing call
15:21:20 <ajo> and if they fail, it's their responsibility, again, to retry to sync it later
15:21:29 <ajo> ralonsoh thoughts?
15:21:45 <ajo> I wish I had an ODL'r here :D
15:21:57 <ralonsoh> the point is why they need those calls?
15:22:16 <ralonsoh> they don't implement any action for them
15:22:32 <ajo> ralonsoh exactly, and, even if they need them, if we're not going to allow exceptions happening to stop the db transactions I see no benefit
15:22:32 <slaweq_> I think that maybe we should make it work with any plugin, not only ML2, which doesn't use such a precommit
15:22:34 <ralonsoh> but I'll review ODL code
15:22:47 <slaweq_> so maybe it's worth doing if some plugins need it
15:23:01 <ajo> slaweq_ if there's good justification, right
15:23:21 <ajo> but they still failed to justify the need
15:23:27 <ajo> or may be they did, but I didn't get it
15:24:28 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1639186
15:24:28 <openstack> Launchpad bug 1639186 in neutron "qos max bandwidth rules not working for neutron trunk ports" [Low,Confirmed] - Assigned to Luis Tomas Bolivar (ltomasbo)
15:24:30 <ajo> ltomasbo  ^
15:24:53 <ajo> any advance on this? did you look at russellb comments  here https://bugs.launchpad.net/neutron/+bug/1639186/comments/11 ?
15:25:24 <ajo> I'm pinging him
15:25:45 <ajo> ralonsoh russellb says they use queues, and they steer the traffic using OF rules
15:25:47 <ltomasbo> hi ajo
15:25:55 <ajo> hi ltomasbo  :)
15:25:59 <ltomasbo> no, I did not take any further action on that
15:26:20 <ajo> ralonsoh : so may be that will be doable when we have min bw and queues in OVS. ?
15:26:41 <ajo> ltomasbo may be I should detach you from the bug for now, just in case anyone else wants to step up for it?
15:26:49 <ralonsoh> but we still don't have min in OVS
15:27:08 <ltomasbo> ajo: sure! please do
15:27:20 <ajo> ralonsoh correct, but, it would be similar to what you did in LB, and then we could get rid of this bug with trunk ports
15:27:52 <ralonsoh> ajo: I tried to do this, but we don't have the IFB like in Linux Bridge
15:28:13 <ralonsoh> ajo: I tried to figure out how to implement this, without any luck...
15:28:21 <ajo> ralonsoh yes, we'd need to rely on queues, etc... all the thing we talked about, *not easy*
15:28:30 <ralonsoh> ok
15:28:58 <ajo> let's revisit it on pike and try to get it going. Will you be around on the PTG?
15:29:10 <ralonsoh> yes, we can talk about his in the PTG
15:29:23 <ralonsoh> s/his/this
15:30:14 <ajo> ralonsoh we also have the one related to router GW ports
15:30:20 <ajo> I can't find it now
15:30:51 <ralonsoh> https://review.openstack.org/#/c/425218/
15:30:53 <ralonsoh> this one?
15:31:02 <ajo> #link https://review.openstack.org/#/c/425218/
15:31:27 <ajo> oh, I must review it again
15:31:27 <ralonsoh> yes, I know. kevinbenton told us to implement it
15:32:00 <ralonsoh> and https://review.openstack.org/#/c/425280/
15:32:40 <ajo> #link https://review.openstack.org/#/c/425280/
15:33:21 <ajo> please folks review those two ^
15:33:22 <ajo> thanks ralonsoh !!
15:33:23 <ajo> any bug I'm missing?
15:34:01 <ajo> 3,
15:34:09 <ajo> 2,
15:34:17 <ajo> 1,
15:34:28 <ajo> #topic RFEs
15:34:53 <ajo> So, after the validation, for pike, we have in pipeline:
15:34:54 <ajo> * instance ingress bw limiting,
15:35:11 <slaweq_> yes, it waits for improved validation :)
15:35:22 <ajo> * strict minimum bw (integration with nova placement api)
15:35:31 <ajo> I believe bits are in place for those two
15:35:47 <ralonsoh> cool!
15:36:01 <ajo> we can think of VLAN 802.1p if somebody is willing to take it, it shall be easy
15:36:29 <reedip_> ajo : link ?
15:37:10 <ralonsoh> do you mean ECN?
15:37:21 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1560961. instance-ingress
15:37:21 <openstack> Launchpad bug 1560961 in neutron "[RFE] Allow instance-ingress bandwidth limiting" [Wishlist,In progress] - Assigned to Slawek Kaplonski (slaweq)
15:37:22 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1578989 strict minimum bw
15:37:23 <reedip_> never mind, got it : https://bugs.launchpad.net/neutron/+bug/1505631
15:37:23 <openstack> Launchpad bug 1578989 in neutron "[RFE] Strict minimum bandwidth support (egress)" [Wishlist,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:37:24 <openstack> Launchpad bug 1505631 in neutron "[RFE] QoS VLAN 802.1p Support" [Wishlist,Confirmed] - Assigned to Kannan Raman (kannanrc20)
15:37:33 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1505631
15:37:34 <davidsha> ralonsoh: There is another for mapping dscp to v-lan pcp
15:37:48 <ajo> yes, we need to talk about that one
15:38:00 <ajo> but I wanted to talk about reedip's ECN proposal again
15:38:09 <ajo> he updated the etherpad and I couldn't look at it until today
15:38:17 <ajo> #link https://etherpad.openstack.org/p/QoS_ECN
15:38:52 <ajo> reedip_ : most details make sense, but we can't control the VM internal settings
15:39:04 <reedip_> ajo : hmm, okay
15:39:20 <ajo> reedip_ it will be OS dependent, etc... and it's no business of neutron or nova to tweak anything inside the VMs
15:39:21 <ajo> we can provide documentation, or heat templates for that
15:39:25 <ajo> in linux I believe it's on by default
15:39:27 <ajo> may be I'm wrong
15:39:54 <reedip_> ajo : in linux. no , ECN has to be enabled
15:40:14 <ajo> aha, so we may want to provide documentation to let people do that
15:40:17 <reedip_> I think yes, we can provide heat templates for that / documentation if required
15:40:19 <ajo> I thought It was on by default
15:40:27 <ajo> so
15:40:46 <ajo> the proposal, has an API that is a bit out of how we handle everything else in QoS plugin now
15:40:57 <ajo> It'd be great if we can find a way to fit it in the current model
15:41:20 <reedip_> ajo : yes, I saw that comment, would try to integrate it , but need to look more into it .
15:41:21 <ajo> may be, if the change to routers is not invasive,
15:41:22 <ajo> (or changes performance)
15:41:23 <ajo> we could just implement that in the l3 agent
15:41:24 <ajo> and have it enabled by default
15:41:48 <davidsha> So if the ECN bit is set and it's not enabled, the packet isn't echoed with the receive bit set, right?
15:42:46 <ajo> I believe that what reedip_ proposes is that our routers will set the flag on the outgoing packet, that then will be echoed by the receiving machine back to the sender
15:42:50 <reedip_> davidsha : you mean if one router has detected congestion, but one of the end points (a VM for example) doesn't have ECN enabled, then YES, the ACK packet back to the sender won't have the ECN received bit set
15:43:21 <ajo> VM1 sends packet
15:43:22 <ajo> packet crosses router (congested one)
15:43:23 <davidsha> reedip_: ack, thanks!
15:43:23 <ajo> router sets flag on packet
15:43:24 <ajo> packet arrives VM2
15:43:35 <ajo> VM2 sends the echo bit on next TCP frame to sender
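The flow ajo lists above can be sketched as a toy model. The two-bit ECN codepoints follow RFC 3168; everything else (the function names, treating congestion as a boolean, ignoring packet drops and the TCP handshake that negotiates ECN) is a simplifying assumption and not neutron code.

```python
# ECN field codepoints in the IP header (RFC 3168).
NOT_ECT = 0b00  # sender does not support ECN
ECT_1 = 0b01    # ECN-capable transport
ECT_0 = 0b10    # ECN-capable transport
CE = 0b11       # congestion experienced


def router_forward(ecn_bits, congested):
    """Return the ECN bits the packet carries when it leaves the router.

    A congested router marks ECN-capable packets with CE. Packets that
    are not ECN-capable are left untouched in this simplified model.
    """
    if congested and ecn_bits in (ECT_0, ECT_1):
        return CE
    return ecn_bits


def receiver_ack_has_ece(ecn_bits, receiver_ecn_enabled):
    """Return True if the receiver echoes congestion back to the sender.

    The echo (the ECE flag on the returning TCP segment) is only set
    when the packet arrived marked CE and the receiver has ECN enabled.
    """
    return receiver_ecn_enabled and ecn_bits == CE


# VM1 sends an ECN-capable packet through a congested router to VM2.
arrived = router_forward(ECT_0, congested=True)
echoed = receiver_ack_has_ece(arrived, receiver_ecn_enabled=True)
```

This also covers davidsha's question: with receiver_ecn_enabled=False the same packet produces no echo, so the sender never learns about the congestion.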
15:43:43 <ajo> so
15:43:48 <ajo> missing details for this proposal are:
15:43:53 <ajo> 1) how to do it in the neutron routers
15:44:02 <ajo> 2) ways to fit this on the current model
15:44:24 <ajo> for 2, may be we don't even need it, but if we need conditional enablement of ECN on the routers, we could, for example:
15:44:43 <ajo> let admins set policies with an ECN rule in the external and internal networks of the router
15:44:55 <ajo> and when at least one internal network, and the external network provide ECN, we ECN-enable the router
15:45:02 <reedip_> ajo : ack
15:45:20 <ajo> this would require integration of QoS in l3, but now l3 has a framework to add extensions, it should be doable
15:45:21 <ajo> thanks njohnston !! ;)
15:45:33 * njohnston bows
15:45:44 <reedip_> thanks njohnston :D
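The conditional enablement ajo describes above (an ECN rule on the external network and on at least one internal network) reduces to a small predicate. This is only a sketch of that rule; router_ecn_enabled and its parameters are hypothetical names, and no such ECN rule exists in the QoS API discussed here.

```python
def router_ecn_enabled(external_net_has_ecn, internal_nets_have_ecn):
    """Decide whether a router should be ECN-enabled.

    external_net_has_ecn: True if the policy on the router's external
        network carries an ECN rule (hypothetical rule type).
    internal_nets_have_ecn: one boolean per internal network attached
        to the router, True if that network's policy has an ECN rule.
    """
    return bool(external_net_has_ecn and any(internal_nets_have_ecn))
```

For example, router_ecn_enabled(True, [False, True]) is True, while a router whose external network has no ECN rule stays disabled regardless of its internal networks.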
15:46:01 <ajo> reedip_ I don't know about 1, if you show me a viable POC of how to do it in the low level (qrouter), then we can formalize this RFE properly
15:46:24 <reedip_> ajo : yep, this is an Action Item for me before the next meeting
15:46:47 <ajo> #action reedip ECN RFE refinement :)
15:47:20 <ajo> davidsha did you want to talk about the DSCP/VLAN mapping ?
15:47:31 <ajo> may be we should cover first the basic VLAN rules
15:47:49 <ajo> I asked the submitter for more details and they never answered
15:47:50 <reedip_> ajo : meanwhile , I think I can volunteer on the 802.1p if it's available
15:47:51 <davidsha> ajo: kk, it wan't my rfe I'd just read it previously ;)
15:48:08 <davidsha> wasn't*
15:49:23 <ajo> reedip_ https://bugs.launchpad.net/neutron/+bug/1505631 It seems like you could revive it in Pike, it's postponed
15:49:23 <openstack> Launchpad bug 1505631 in neutron "[RFE] QoS VLAN 802.1p Support" [Wishlist,Confirmed] - Assigned to Kannan Raman (kannanrc20)
15:49:30 <ajo> which means, it was approved, but we were waiting on other stuff, or we had no hands :)
15:49:42 <davidsha> Just for clarity on this going forward, this would mean we'd have the dscp marking rule, a pcp marking rule and a "traffic class" marking rule correct?
15:50:22 <reedip_> Yes, I will take it up ajo  for Pike
15:50:30 <ajo> davidsha what was pcp? %)
15:51:13 <davidsha> ajo: the vlan version of dscp it stands for Priority Code Point
15:51:20 <ajo> ahh,
15:51:28 <ajo> that would be then 2 ?
15:51:31 <ajo> 1) DSCP
15:51:40 <ajo> 2) VLAN
15:52:08 <davidsha> There was another RFE for mapping over DSCP and PCP if I recall
15:52:44 <davidsha> so PCP is a 0-7 value and DSCP is mapped to 0-7 traffic classes.
15:53:23 <ajo> davidsha, let's look at that when we have vlan, but yes
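To make the value ranges concrete: DSCP is a 6-bit field (0-63) and the 802.1p PCP is a 3-bit field (0-7). One common way to derive a 0-7 class from a DSCP value is to keep its three most significant bits. That choice is an assumption made here for illustration; the RFE mentioned in the meeting may define the mapping differently, and dscp_to_pcp is a hypothetical name.

```python
def dscp_to_pcp(dscp):
    """Map a 6-bit DSCP value to a 3-bit 802.1p priority (0-7).

    Assumes the mapping keeps the three most significant DSCP bits,
    which is one common convention and not necessarily what the RFE
    proposes.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp >> 3
```

With this convention, DSCP 46 (expedited forwarding) lands in priority 5 and DSCP 0 (best effort) in priority 0.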
15:53:41 <ajo> at that point, DSCP marking rules would be incompatible with mapping rules
15:53:58 <ajo> or am I getting it wrong?
15:53:59 <ajo> do you mean
15:54:07 <ajo> mapping the DSCP flags when going over vlan? or what ?
15:54:18 <ajo> that RFE we mentioned before?
15:54:27 <davidsha> ajo: kk, I'm not sure who proposed it but I recall reading the RFE.
15:55:21 <ajo> davidsha link, or bring it to next meeting :)
15:55:27 <davidsha> ajo: It was a while since I read it so I'm not entirely sure.
15:55:34 <ajo> anyway, unless it's of special interest to you, or anyone is willing to code it, I suspect we have enough on our plate for Pike already :)
15:55:50 <davidsha> ack
15:56:06 <ajo> we have the openflow pipeline in the queue too, which could be cool to fix for Pike too :)
15:56:08 <ralonsoh> one more: https://bugs.launchpad.net/neutron/+bug/1639220
15:56:08 <openstack> Launchpad bug 1639220 in neutron "[RFE] Default action for RBAC" [Undecided,New] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:56:12 <ralonsoh> this one is very short
15:56:13 <ajo> oh right
15:56:13 <davidsha> +1
15:56:25 <ralonsoh> just waiting your reply
15:57:32 * ajo reads
15:58:00 <ajo> yes, having an api to set a per-tenant default could make sense
15:58:08 <ajo> a lot of sense actually
15:58:47 <ajo> I'm trying to add tags, but it ignores me
15:58:56 <ajo> ralonsoh can you add #qos and #rfe tags?
15:59:20 <ajo> ralonsoh we would need that triaged by the drivers meeting
15:59:28 <ajo> but, makes sense to me, I'm going to comment
15:59:32 <ajo> and close the meeting, we have 30 secs
15:59:32 <ralonsoh> ajo: done
15:59:54 <ajo> #endmeeting