14:03:42 <ajo> #startmeeting neutron_qos
14:03:43 <openstack> Meeting started Wed Feb 24 14:03:42 2016 UTC and is due to finish in 60 minutes.  The chair is ajo. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:47 <openstack> The meeting name has been set to 'neutron_qos'
14:03:47 <ajo> Hi everybody! ;)
14:03:48 <ihrachys> o/
14:03:53 <jschwarz> \o/
14:04:02 <irenab> hi
14:04:06 <davidsha> hi!
14:04:06 <ajo> #link http://eavesdrop.openstack.org/#Neutron_QoS_Meeting
14:04:12 <njohnston> o/
14:04:15 <ajo> o/ :)
14:04:46 <ajo> I wanted to start by raising the topic of our roadmap
14:05:14 <ajo> at the last drivers meetings there were concerns about our roadmap, status, and the number of RFEs they were finding
14:05:28 <ajo> #link http://eavesdrop.openstack.org/meetings/neutron_drivers/2016/neutron_drivers.2016-02-18-22.01.log.html#l-52
14:05:51 <irenab> ajo: so QoS features are in high demand
14:05:56 <ajo> So I thought we may clarify that
14:05:59 <ajo> #link http://lists.openstack.org/pipermail/openstack-dev/2016-February/087360.html
14:06:08 <ajo> I sent this email to the mailing list
14:06:24 <ihrachys> irenab: yeah, but it's not like you post an RFE and it magically happens
14:06:46 <ajo> and, to be fair, armax was partly right, because I haven't been doing a good job reviewing new RFEs, as I was focused on the mitaka bits
14:06:47 <ihrachys> we should consider available resources, current roadmap...
14:07:12 <irenab> ihrachys: the intent should be approved so the one who proposes can move on
14:07:18 <ajo> and I guess they felt overwhelmed by RFEs when they didn't understand exactly how they fit into the architecture we designed
14:07:24 <armax> ajo: I am always 100% right!
14:07:24 <armax> :)
14:07:30 <armax> ajo: jokes aside, I saw your email…but I haven’t had the chance to reply yet…I’ll do that today
14:07:30 <ajo> armax++
14:07:31 <ajo> lol
14:07:32 <ihrachys> irenab: you can't effectively move forward without having reviewers on board
14:07:50 <ihrachys> that's why we have approvers for blueprints
14:08:00 <irenab> ihrachys: so intent and review commitment, right?
14:08:02 <ihrachys> (we don't have them for RFEs and I believe that's a bug)
14:08:19 <ajo> armax: I wanted to discuss the current status on the meeting, and then send a detailed report, I'm sorry I haven't been communicating to you properly, and actively reviewing new RFEs, consider that changed from now on
14:08:19 <ihrachys> armax: btw do we plan to have approvers for RFEs?
14:08:58 <armax> ihrachys: something to consider/experiment with next cycle. We’ll do a postmortem once mitaka is out of the way
14:09:11 <njohnston> So I think we take the list of all QoS features that could be implemented - BW limiting, DSCP, ECN, 802.1p/q, and minimum bandwidth guarantees - and the possible implementations for each - OVS, LB, SR-IOV - and we can provide a matrix of all the QoS items between us and full implementation.  Some of them will be empty spots - DSCP on SR-IOV is an impossibility - but at least we can say
14:09:11 <njohnston> "this is the complete roadmap", and show which 1-3 items we're targeting this cycle and next cycle
14:09:33 <ajo> yes,
14:09:40 <ajo> I have brought up this:
14:09:42 <ajo> #link https://etherpad.openstack.org/p/qos-roadmap
14:09:47 <ajo> to discuss during the meeting
14:09:48 <ihrachys> the matrix idea seems like a good one
14:09:55 <ajo> And I was thinking of the same
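A rough sketch of the matrix njohnston describes, expressed as a Python dict; the cell values below are illustrative guesses as of this meeting, not an agreed roadmap:

    # Illustrative only: QoS feature x backend support matrix.
    # "n/a" marks combinations that can't work (e.g. DSCP marking on SR-IOV).
    QOS_SUPPORT_MATRIX = {
        # feature:             (OVS,         Linux bridge, SR-IOV)
        "bandwidth_limit":     ("done",      "in review",  "done (egress)"),
        "dscp_marking":        ("in review", "todo",       "n/a"),
        "ecn":                 ("rfe",       "rfe",        "n/a"),
        "vlan_802.1p_marking": ("rfe",       "rfe",        "rfe"),
        "minimum_bandwidth":   ("rfe",       "rfe",        "rfe"),
    }

    def remaining_work(backend_index):
        """Features still open for a backend column (0=OVS, 1=LB, 2=SR-IOV)."""
        return [f for f, cells in QOS_SUPPORT_MATRIX.items()
                if not cells[backend_index].startswith(("done", "n/a"))]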
14:10:08 <ajo> one important thing is that we don't need to file RFEs for specific ref-arch implementations
14:10:25 <ajo> probably a bug is enough
14:10:41 <ihrachys> +
14:10:57 <ajo> only if the implementation is a huge change to the specific reference implementation could that be a matter for a spec/RFE/devref
14:11:14 <ajo> to have a better understanding of how it's going to be implemented
14:11:33 <njohnston> That sounds fair to me.
14:11:47 <ajo> So, in the tiny etherpad,
14:11:54 <ajo> I have detailed our current status
14:11:57 <ajo> the documentation we have,
14:12:04 <ajo> and what we're doing for mitaka
14:12:23 <ajo> basically, we don't have a *lot* of things in mitaka, because, doing things right, we need to cover a lot of related dependencies
14:12:38 <ihrachys> irenab: my understanding is that posting an RFE without having anyone to implement and approve the code is a waste of drivers time
14:12:49 <ajo> yeah
14:12:49 * jschwarz thinks that aside from listing features, etc, you guys may want to assign them to people (so drivers will feel comfortable and will know who to ask when things go south)
14:12:51 <ajo> I agree too
14:12:57 <ajo> we can discuss new ideas in the meeting
14:13:18 <ihrachys> jschwarz: and that's where approvers for RFE should come to help
14:13:23 <ajo> but I'd say, let's only file RFEs if we have people willing, and with the ability, to implement
14:13:47 <irenab> ajo: ihrachys : sounds reasonable
14:14:11 <ajo> We still won't have control over the RFEs people file, but I will monitor that
14:14:16 <ajo> on a weekly basis
14:14:37 <ajo> #action ajo sets a calendar reminder for himself before the drivers meeting to check any new QoS-related RFEs
14:15:24 <irenab> Initially I thought that RFE was for users to express requirements
14:15:51 <ajo> yes, in fact I understand that's the thing
14:16:16 <ajo> but if we can do some global filtering ourselves here, since we're developers proposing features, we can discuss them in advance
14:16:19 <ajo> to end up with more filtered RFEs
14:16:31 <irenab> +1
14:16:35 <ajo> or higher-quality RFEs, knowing that we have backers to write the code, etc.
14:16:35 <ihrachys> irenab: well kinda. are users posting the RFEs in question though? for the most part, it's people who are in the community, so a more informal means of tracking ideas could be less harsh for drivers. but maybe it's just me ranting and we should post more RFEs.
14:16:51 <ajo> I guess the general workflow is
14:17:16 <ajo> customer -> openstack-related-company -> developer -> RFE
14:17:20 <ajo> and in some cases
14:17:29 <ihrachys> the thing is, I see that some RFEs are actually closed on drivers meetings because there is no one to back the implementation up.
14:17:33 <ajo> openstack-user/contributor-company -> developer -> RFE
14:17:44 <moshele> hi
14:18:28 <ajo> njohnston, and vhoward seem to be in a good TZ for the drivers meeting
14:18:41 <ajo> and they helped so far by being there and answering :)
14:18:50 <njohnston> We're happy to represent :)
14:18:51 <ihrachys> I usually try to join too, but that time I was off
14:18:57 <ajo> so I guess we could pre-analyze here, and if they can represent us, that's great
14:19:23 <ajo> I used to join too, and I will try from now on, but my attendance will be quite random
14:19:28 <ajo> njohnston++
14:19:30 <ajo> thanks
14:19:43 <ihrachys> yes, let's have US folks on board with representing the group there :)
14:20:04 <ajo> I guess that from now on, we could have a meeting section for qos-related-rfes
14:20:45 <njohnston> sounds good
14:20:55 <ajo> if you find anything missing or you believe something is wrong, please update https://etherpad.openstack.org/p/qos-roadmap when you have time
14:21:55 <ajo> I think VLAN marking and ingress QoS rate limiting are probably quite straightforward
14:22:20 <ajo> in fact, when we implemented the low-level bits of vm-egress, we did vm-ingress by mistake
14:22:31 <ajo> and had to switch the implementation
14:22:37 <ihrachys> :D we could close two features in one go
14:22:45 <davidsha> VLAN marking works the same way as DSCP for Open vSwitch.
14:22:54 <ajo> gal-sagie's implementation is still there in Gerrit
14:23:11 <ajo> davidsha, exactly, it's almost the same
14:23:26 <ajo> one tackles L3, and the other tackles L2
14:23:38 <ihrachys> still two separate rule types
14:23:42 <ajo> exactly
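A hypothetical modeling sketch of those two rule types; the class and field names are illustrative only (the DSCP rule is still in review at this point, and a VLAN-marking rule does not exist yet):

    # Hypothetical sketch: two distinct QoS rule types, one marking at L3
    # (the 6-bit DSCP field in the IP header) and one at L2 (the 3-bit
    # 802.1p PCP field in the VLAN header). Not merged neutron code.
    import dataclasses

    @dataclasses.dataclass
    class QosDscpMarkingRule:
        dscp_mark: int  # DSCP codepoint, must fit in 6 bits

        def __post_init__(self):
            if not 0 <= self.dscp_mark <= 63:
                raise ValueError("DSCP mark must be in 0..63")

    @dataclasses.dataclass
    class QosVlanMarkingRule:
        vlan_priority: int  # 802.1p PCP, must fit in 3 bits

        def __post_init__(self):
            if not 0 <= self.vlan_priority <= 7:
                raise ValueError("802.1p priority must be in 0..7")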
14:24:05 <ajo> then, we have the ECN RFE, which makes sense IMO, but there are a few things to clarify
14:24:36 <ajo> basically, ECN seems to be a mechanism that can be used in combination with TCP/IP to dynamically throttle the other end host
14:24:43 <ajo> if your ingress is getting congested
14:25:14 <ajo> but I believe we need to clarify how congestion is detected, and how we model the rules
14:25:28 <irenab> ajo: I have a suggestion
14:25:37 <davidsha> would that be something to use with traffic classification then?
14:25:39 <ajo> irenab, shoot :D
14:25:46 <njohnston> "Conventionally, TCP/IP networks signal congestion by dropping packets. When ECN is successfully negotiated, an ECN-aware router may set a mark in the IP header instead of dropping a packet in order to signal impending congestion. The receiver of the packet echoes the congestion indication to the sender, which reduces its transmission rate as if it detected a dropped packet."
14:25:53 <irenab> I think each RFE should present relevant use case(s)
14:26:06 <ajo> davidsha, no, that's more related to your other RFE :)
14:26:19 <irenab> so it will be clear how the requested functionality is used
14:26:31 <ajo> irenab: +1
14:26:42 <davidsha> ajo: Ah ok.
14:26:50 <ajo> I see use cases for ECN now that I understood it,
14:26:56 <ajo> but, yes, that's not well addressed
14:27:10 <ajo> we should ask vikram and reedip_ reedip for that
14:27:11 <irenab> I think the neutron implementation details are much less important and can be resolved later
14:27:30 <ihrachys> is there a case when you have ECN supported but you want to disable it?
14:27:30 <ajo> yes
14:27:47 <ajo> ihrachys, like filter ECN flags?
14:28:07 <irenab> I believe this can be the case, since it should work across the fabric
14:28:09 <njohnston> My question is, ECN is negotiated, it isn't something that is supposed to be administratively enabled or disabled.  And the thing doing the negotiation won't be neutron, it will be the TCP stack implementation itself.
14:28:46 <ajo> njohnston, as far as I understood, switches and mid-point network devices can modify the flags
14:28:48 <ajo> in flight
14:29:03 <ihrachys> ajo: yeah but who's going to decide the flag to be set?
14:29:03 <ajo> so they're able to throttle traffic going through it
14:29:12 <ajo> but I'm not 100% sure, we may ask vikram and reedip
14:29:13 <njohnston> "When both endpoints support ECN they mark their packets with ECT(0) or ECT(1). If the packet traverses an active queue management (AQM) queue (e.g., a queue that uses random early detection (RED)) that is experiencing congestion and the corresponding router supports ECN, it may change the codepoint to CE instead of dropping the packet."
14:29:21 <ajo> I'm not an ECN expert by any means, it's totally new to me
14:29:44 <ajo> njohnston, ahh, exactly, I got it right then
14:29:55 <ajo> ihrachys: that's one of the questions I had for them
14:30:16 <ajo> ihrachys, it could be the agent, inspecting the ports' sustained BW, the host load, the host br-* interfaces' BW... etc
14:30:28 <ihrachys> it's clearly not well understood in the team. let's do some homework reading docs before we decide anything on its feasibility.
14:30:33 <njohnston> +1
14:30:38 <davidsha> +1
14:30:39 <ajo> yeah,
14:30:48 <ajo> it's time to read, and ask the RFE proposers
14:30:52 <ajo> I'm still in that process
14:31:03 <ajo> I see possible value in it
14:31:22 <ajo> as something softer/more effective than policing
14:31:29 <ajo> but, policing is fully automatic
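For reference, the ECN codepoints njohnston quoted live in the two low-order bits of the IP TOS/Traffic Class byte (RFC 3168), next to the six DSCP bits; a small self-contained illustration:

    # The two low-order bits of the TOS/Traffic Class byte carry ECN state
    # (RFC 3168); the upper six bits carry the DSCP codepoint.
    ECN_CODEPOINTS = {
        0b00: "Not-ECT (endpoint does not support ECN)",
        0b01: "ECT(1) (ECN-capable transport)",
        0b10: "ECT(0) (ECN-capable transport)",
        0b11: "CE (congestion experienced; set by a router instead of dropping)",
    }

    def split_tos(tos_byte):
        """Split a TOS byte into its (dscp, ecn) components."""
        return tos_byte >> 2, tos_byte & 0b11

    dscp, ecn = split_tos(0b10110110)
    print(dscp, "->", ECN_CODEPOINTS[ecn])  # 45 -> ECT(0) (ECN-capable transport)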
14:31:33 <irenab> ajo: ihrachys : general question regarding RFEs. Let's say there's something that cannot be implemented by the reference implementation; should it not be proposed?
14:32:06 <ajo> irenab, it's my understanding that "no", but, well, we have things that are only implemented by Cisco
14:32:12 <ajo> what was the name of it...
14:32:13 <ajo> hmm
14:32:26 <ihrachys> irenab: I think otherwise, I believe it can be proposed.
14:32:51 <ihrachys> though I really wonder what can't be implemented in ovs.
14:32:57 <ajo> I believe it could be proposed, if some SDN vendor implements it, but we'd have to discuss with drivers & core team
14:33:09 <irenab> got it, thanks
14:33:20 <ajo> ihrachys, I start to grasp the limits sometimes,
14:33:21 <ihrachys> yes, that would require some exception process, but I believe it may have a place
14:33:50 <ajo> yes, if the use case is well founded, as you said, and it can be modeled, we may try
14:34:07 <ajo> Then
14:34:29 <ajo> we have davidsha's RFE #link https://etherpad.openstack.org/p/qos-roadmap about neutron QoS priority queuing rules
14:34:44 <ajo> davidsha, if I didn't get it wrong
14:34:46 <ihrachys> ajo: wrong link
14:34:50 <ajo> sorry
14:34:50 <ajo> :/
14:35:04 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1527671
14:35:04 <openstack> Launchpad bug 1527671 in neutron "[RFE]Neutron QoS Priority Queuing rule" [Wishlist,Triaged]
14:35:12 <ajo> If I didn't get it wrong
14:35:27 <ajo> you propose to have filters for traffic, so different traffic can be limited in different ways
14:35:28 <ajo> right?
14:35:36 <davidsha> correct
14:35:39 <ajo> ok
14:35:52 <ihrachys> does it rely on traffic classifier?
14:35:55 <ihrachys> I assume yes
14:36:05 <ajo> that was foreseen  in our initial brainstorms
14:36:16 <ajo> and we thought we could model such thing
14:36:20 <ihrachys> I don't see the dep mentioned there
14:36:23 <ajo> by attaching rules to traffic classifiers
14:36:30 <davidsha> it was originally going to use ovs flows and then I was looking into tc
14:36:33 <ajo> ihrachys, I commented in #12
14:36:39 <ihrachys> oh I see
14:36:46 <ihrachys> was looking at original description
14:36:48 <ajo> I believe
14:36:54 <ajo> the use case is clear,
14:37:11 <ajo> and the modeling, needs some eye on this: http://git.openstack.org/cgit/openstack/neutron-classifier
14:37:14 <ajo> #link http://git.openstack.org/cgit/openstack/neutron-classifier
14:37:33 <ajo> the RFE should probably be refactored to something like that
14:37:35 <njohnston> davidsha: Can we talk after this meeting about this?  I would like to understand how "the least likelihood of being rejected due to a queue reaching its maximum capacity" is different from DSCP.  That's kind of what DSCP is all about.
14:38:08 <davidsha> njohnston: kk, I'm free to talk.
14:38:12 <njohnston> thanks
14:38:20 <ajo> njohnston, the idea is that you assign different bw limits to different kinds of traffic
14:38:28 <ajo> so you have different likelihoods
14:38:30 <ajo> but let's expand later :)
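A hypothetical sketch of the "rules attached to traffic classifiers" idea under discussion; none of these names come from neutron-classifier or neutron, they only illustrate the shape of the model:

    # Hypothetical sketch of attaching a bandwidth limit to a traffic class;
    # neutron-classifier's actual models may differ.
    import dataclasses
    from typing import Optional

    @dataclasses.dataclass
    class TrafficClassifier:
        protocol: Optional[str] = None          # e.g. "tcp"
        destination_port: Optional[int] = None

    @dataclasses.dataclass
    class ClassifiedBandwidthLimit:
        classifier: TrafficClassifier
        max_kbps: int

    # e.g. cap bulk HTTP traffic harder than the rest of the port's traffic:
    http_cap = ClassifiedBandwidthLimit(
        classifier=TrafficClassifier(protocol="tcp", destination_port=80),
        max_kbps=5000,
    )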
14:38:53 <irenab> ajo: what is the state of neutron-classifier?
14:39:03 <ajo> davidsha, does it seem reasonable for you to change that RFE into: integrating QoS rules to neutron-classifiers ?
14:39:10 <ajo> that is something we should investigate, definitely
14:39:21 <ajo> I thought we'd end up with some common REST API to manage the classifiers
14:39:25 <ajo> but I don't see that, just libs
14:39:26 <ihrachys> irenab: I believe it's on hold
14:39:29 <ajo> and DB models
14:39:42 <ajo> ok, may be they need help on that
14:39:48 <ihrachys> irenab: probably starving for implementers
14:39:57 <davidsha> ajo: would it be ok if I looked into neutron classifier a bit more first?
14:40:11 <ajo> davidsha, makes total sense
14:40:25 <ajo> #action davidsha to look into neutron-classifiers state
14:40:30 <irenab> I do not remember seeing anything on the mailing list or any dedicated sub-team
14:41:02 <ihrachys> irenab: I believe it was just an experiment from Sean Collins that never delivered much
14:41:08 <ajo> let's investigate, and bring up the topic to see how it is.
14:41:19 <ihrachys> I suspect Sean would appreciate help
14:41:27 <ajo> ihrachys, when it was proposed the call was to make it a separate library
14:41:54 <ajo> yes
14:42:03 <ajo> ok
14:42:23 <ajo> and, the last RFE(s) in place are for bandwidth guarantees
14:42:44 <ajo> when we talk about BW guarantees, it's about minimum bandwidth on ports
14:42:54 <ajo> we can have strict, or best-effort
14:43:11 <ajo> strict requires coordination with nova-scheduler, so no interface is oversubscribed ....
14:43:24 <ajo> I'm trying to fight that battle, but, to be fair, I'm far from success,
14:43:50 <ajo> there's a spec from Jaypipes which could satisfy what we need in that regard, but it doesn't look to me as dynamic as I think it could be
14:44:06 <ajo> if we could use that mechanism they're designing, it could be awesome
14:44:11 <ajo> (generic resource pools)
14:44:13 * ajo looks for the link
14:44:26 <irenab> ajo: meaning nova will manage BW counting?
14:44:27 <ajo> #link https://review.openstack.org/#/c/253187/
14:44:32 <ajo> irenab, nope
14:44:38 <ajo> not by itself
14:44:54 <ajo> I mean yes, sorry
14:45:02 <irenab> you mean the counting is done by a 3rd party (neutron?)
14:45:17 <ajo> but I'm unsure our dynamic way of modifying policies plays well with that
14:45:30 <ajo> we may need some sort of process to sync any of our policy changes to that API
14:45:44 <ajo> so the nova database is always up to date
14:46:05 <ajo> what bugs me is that, to make that possible, we would need to create one resource pool (or several) per compute node
14:46:19 <ajo> because those resources are consumed in the compute nodes themselves
14:46:54 <ajo> I guess that could also help model things like TOR switch bandwidth, and things like that
14:47:03 <ajo> but ok, I'm trying to explore that
14:47:10 <ajo> ..
14:47:16 <ajo> On the other hand, and to finish,
14:47:22 <ajo> is best-effort
14:47:31 <ajo> that basically, is... do what we can within the hosts/hypervisors
14:47:33 <ajo> to guarantee that
14:47:42 <ajo> ovs and TC have mechanisms for that
14:48:01 <ajo> I explored them, and I think davidsha did it too
14:48:11 <ajo> they seem to work
14:48:21 <irenab> ajo: any summary you can share on your findings?
14:48:25 <ajo> the OVS/OF ones require a total refactor of our OpenFlow rules
14:48:48 <ajo> because NORMAL rules don't work for queuing traffic (we need to use explicit queues)
14:48:49 <ajo> and...
14:49:19 <ajo> the TC approach mixes technologies for filtering traffic (TC and OF...) (a bit like mixing linuxbridge and iptables with openflow)
14:49:22 <ajo> so
14:49:26 <ajo> there's no golden path
14:49:28 <ajo> it can be done
14:49:51 <ajo> maybe we could start with TC, and then upgrade to something better in the future (OF only)
14:50:03 <ajo> it worked pretty well in my testing
14:50:13 <ajo> but ok
14:50:28 <ajo> I can dive into the details in another meeting, probably not important now
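For reference, a minimal sketch of the OVS plumbing ajo refers to: a linux-htb QoS with a queue guaranteeing a minimum rate, driven through ovs-vsctl. The port name and rates are made up, and this is not the neutron agent code:

    # Minimal sketch, not neutron agent code: give an OVS port a linux-htb
    # QoS with one queue that guarantees a minimum rate. The NORMAL action
    # alone won't steer traffic into the queue; flows need e.g. a set_queue
    # OpenFlow action.
    import subprocess

    def add_min_bw_queue(port="tap-example", min_bps=5000000, max_bps=10000000):
        subprocess.check_call([
            "ovs-vsctl", "set", "port", port, "qos=@qos",
            "--", "--id=@qos", "create", "qos", "type=linux-htb",
            "other-config:max-rate=%d" % max_bps, "queues:0=@q0",
            "--", "--id=@q0", "create", "queue",
            "other-config:min-rate=%d" % min_bps,
            "other-config:max-rate=%d" % max_bps,
        ])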
14:50:49 <ajo> I will switch to checking the status of the ongoing patches if there's no objection
14:51:00 <ihrachys> YES
14:51:05 <njohnston> no objection
14:51:07 <ajo> #topic status
14:51:21 <ajo> njohnston, how's DSCP? and L2 api, any blocker?
14:51:46 <ajo> I made a comment on the RPC patch so you can test the upgrade mechanism
14:51:47 <njohnston> The L2 agent patch has Ihar's +2 and just needs another https://review.openstack.org/#/c/267591/
14:51:56 <ajo> I'm not sure if I fully clarified it
14:52:04 <ajo> #action ajo review L2 agent patch!!! :]
14:52:07 <ihrachys> njohnston: there is a concern from yamamoto there. are we going to handle that?
14:52:11 <njohnston> RPC rolling upgrades has some concerns https://review.openstack.org/#/c/268040/
14:52:32 <ajo> njohnston, yes will address it tonight I guess, I was focusing on the roadmap this morning :)
14:52:59 <irenab> ajo: just wanted to raise the discussion from the neutron channel today regarding the misleading name of the max_burst parameter in the bw_limit rule
14:53:03 <njohnston> ihrachys: I don't know how we can properly account for that
14:53:07 <davidsha> ihrachys: I don't think it's a problem, any project that used the agent_uuid_stamp was using it for flows.
14:53:22 <njohnston> +1 ^^
14:53:45 <ihrachys> davidsha: I am good. just wanted to clarify.
14:54:02 <njohnston> So once those 2 patches get merged the main DSCP patch looks good, it only has one nit from Vikram https://review.openstack.org/#/c/251738
14:54:02 <ajo> irenab, I agree it's misleading :/ can we talk about it after the meeting? :)
14:54:30 <irenab> if this is agreed to be a bug, slawek mentioned he would like to fix it
14:54:46 <irenab> ajo: sure
14:54:48 <njohnston> and then the python-neutronclient change for DSCP also looks to be in good shape: https://review.openstack.org/#/c/254280
14:55:21 <ajo> ok, that's great :)
14:55:25 <njohnston> The documentation changes associated with DSCP already have 2 +2s, so I think they can go as soon as the patch they depend on merges
14:55:39 <njohnston> https://review.openstack.org/#/c/273638
14:55:58 <ajo> njohnston, great; side note: the QoS API docs got re-injected, it seems the coauthor removed them by mistake
14:56:13 <ajo> https://review.openstack.org/#/c/284059/1
14:56:39 <ajo> njohnston, we need to contribute it to the common API guide ^
14:56:43 <njohnston> d'oh
14:56:58 <ajo> it's a hell, XML :)
14:57:05 <njohnston> ajo: There is an API guide change as well: https://review.openstack.org/#/c/275253 with one +2 already
14:57:09 <ajo> I must admit for QoS somebody from the doc team helped
14:57:24 <ajo> ohhh
14:57:29 <ajo> awesome njohnston !!!
14:57:33 <ajo> good work
14:57:46 <njohnston> ajo: All of the gaggle of DSCP changes are listed explicitly in the main patch's commit message: https://review.openstack.org/#/c/251738
14:57:56 <njohnston> and they all depend on the main patch
14:58:14 <ajo> yikes
14:58:15 <ihrachys> nice work
14:58:21 <ajo> 2 minutes to the end of the hour :/
14:58:26 <ajo> any other important updates?
14:58:38 <ajo> I saw the LB support was making good progress too
14:58:42 <ajo> we have fullstack tests now :)
14:58:46 <ihrachys> I believe it's ready to merge
14:58:50 <ajo> ok
14:58:54 <ajo> so another action for me
14:59:01 <ajo> #action ajo review Linux bridge related patches for QoS
14:59:09 <ihrachys> I wonder whether everyone is fine with two Red Hat cores merging stuff
14:59:21 * ajo tries to clone himself:  raise CloneError()
14:59:30 <ajo> ihrachys, that's a good question
14:59:51 <ihrachys> ajo: maybe it's fine to review and then ask someone else to rubber stamp it
14:59:51 <ajo> given that LB is not our main thing at Red Hat
14:59:58 <ajo> we're doing it for the community mostly :)
15:00:03 <njohnston> link for the LB change: https://review.openstack.org/236210
15:00:12 <ihrachys> ok we need to wrap up
15:00:15 <ajo> yeah, maybe asking for a third +2 and +W
15:00:16 <ajo> yes
15:00:31 <ajo> ok, wrapping up,thanks everybody
15:00:33 * njohnston has no religion on +2s from y'all
15:00:49 <ajo> let's keep discussing on #openstack-neutron (whoever can)
15:00:53 <ajo> #endmeeting