21:04:02 <markmcclain> #startmeeting Networking
21:04:03 <openstack> Meeting started Mon Sep 30 21:04:02 2013 UTC and is due to finish in 60 minutes.  The chair is markmcclain. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:04:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:04:07 <openstack> The meeting name has been set to 'networking'
21:04:18 <markmcclain> #link https://wiki.openstack.org/wiki/Network/Meetings
21:04:23 <markmcclain> #topic Announcements
21:04:31 <markmcclain> The RC will cut tonight/tomorrow
21:04:50 <markmcclain> I've been tagging non-essential bugs as havana-rc-potential
21:05:05 <markmcclain> if they land before the release candidate is cut great
21:05:32 <markmcclain> if not we'll consider any late merges into a potential RC2
21:06:59 <markmcclain> We've had a few bugs that have been causing problems in the gate for a few days now
21:07:03 <markmcclain> #topic Bugs
21:07:09 <markmcclain> https://bugs.launchpad.net/neutron/+bugs?search=Search&field.importance=Critical&field.status=New&field.status=Confirmed&field.status=Triaged&field.status=In+Progress
21:07:54 <markmcclain> The bugs have been causing gate resets which slows down code merging for the entire community
21:08:04 <marun> :(
21:08:06 <markmcclain> https://bugs.launchpad.net/neutron/+bug/1230407
21:08:08 <uvirtbot> Launchpad bug 1230407 in neutron "VMs can't progress through state changes because Neutron is deadlocking on it's database queries, and thus leaving networks in inconsistent states" [Critical,Confirmed]
21:08:31 <markmcclain> the interim workaround was to revert the inclusion of the VPN service into the gate
21:08:33 <marun> is that a duplicate of that previous gate bug?
21:08:58 <markmcclain> marun: they're all slight variants of the same problem
21:09:24 <marun> I think jlibosva (probably not on) has narrowed the problem to a deadlock with the l3scheduler
21:09:41 <marun> he reports being able to consistently reproduce anyway
21:09:59 <marun> to clarify, competition between l3scheduler and agent heartbeat update via rpc
21:10:17 <markmcclain> good to know.. reproducing these failures has been a bit of a challenge
21:11:25 <nati_ueno> Hi sorry late
21:12:18 <markmcclain> Are there any other critical bugs that the team needs to be tracking?
21:13:03 <markmcclain> #topic API Docs
21:13:13 <emagana> we are targeting the first cut of the ML2 documentation this Wednesday, the ML2 team has done a great job finding owners for the doc bugs.. all of them have been assigned
21:13:29 <markmcclain> emagana: you jumped the gun :)
21:14:02 <emagana> also a doc bug to include a note about deprecating the OVS and LB
21:14:25 <emagana> markmcclain: sorry..
21:14:29 <emagana> will wait!
21:14:38 <markmcclain> no worries
21:14:40 <markmcclain> Salvatore is off this evening, but emailed me to say that the API docs are on schedule and should be completed by the Havana release
21:14:46 <markmcclain> https://bugs.launchpad.net/openstack-api-site/+bugs?field.tag=netconn-api
21:15:05 <markmcclain> now we'll move to Admin and user docs
21:15:09 <markmcclain> #topics Docs
21:15:14 <markmcclain> emagana: your turn
21:15:21 <markmcclain> #topic Docs
21:15:28 <emagana> well, I did already mention the ML2
21:15:40 <emagana> I think that is the most important update
21:16:40 <markmcclain> that's good news
21:16:59 <markmcclain> Any questions on docs?
21:17:58 <markmcclain> #topic FWaaS
21:18:12 <markmcclain> SumitNaiksatam: looks like we have a few open reviews for Horizon support
21:18:26 <SumitNaiksatam> just one as of now
21:18:30 <SumitNaiksatam> one just got merged
21:18:35 <SumitNaiksatam> we are doing good :-)
21:18:35 <markmcclain> good
21:18:43 <SumitNaiksatam> this one should also get through
21:18:51 <markmcclain> this one? https://review.openstack.org/#/c/49019
21:19:03 <SumitNaiksatam> yeah
21:19:10 <SumitNaiksatam> it's failing on the tempest gate
21:19:31 <SumitNaiksatam> otherwise fix is good
21:19:47 <amotoki> SumitNaiksatam: the fix is good. now looking :-)
21:20:01 <SumitNaiksatam> yeah thanks to KC for the fix
21:20:09 <markmcclain> looks like it is being caught up in the mix of the critical bugs we talked about earlier
21:20:24 <SumitNaiksatam> yeah
21:20:43 <SumitNaiksatam> otherwise all other fwaas neutron bugs fixed, so good for now, touch wood
21:21:04 <markmcclain> good thanks for updating
21:21:21 <markmcclain> #topic Horizon
21:21:44 <amotoki> hi, morning.
21:21:53 <markmcclain> amotoki: Any additional Horizon related information to share?
21:22:15 <amotoki> thanks for taking care of quota issue.
21:22:50 <amotoki> https://bugs.launchpad.net/neutron/+bug/1189671 is the only remaining one.
21:22:51 <markmcclain> you're welcome
21:22:52 <uvirtbot> Launchpad bug 1189671 in neutron "default quota driver not suitable for production" [High,In progress]
21:23:22 <amotoki> what is its status?
21:23:26 <markmcclain> that one has some strange unit test assumptions
21:24:01 <markmcclain> so when the tests run out of order, the installed quota driver isn't always initialized
21:24:52 <markmcclain> in production the order is correct based on normal operation
21:26:07 <markmcclain> I'll post a new revision for you to look at today.. I'll ping you when it's ready
21:26:33 <amotoki> markmcclain: thanks. I will take care of the status.
21:26:55 <amotoki> all from me.
21:27:04 <markmcclain> amotoki: thanks for updating
21:27:24 <markmcclain> #topic Hong Kong Design Summit
21:27:34 <markmcclain> #link http://summit.openstack.org
21:28:14 <markmcclain> You can file summit session ideas at that site.
21:28:42 <markmcclain> Remember it is not first to file wins, so if your idea overlaps with another, feel free to file yours too.
21:29:01 <geoffarnold> What's the deadline?
21:29:38 <markmcclain> I believe the deadline is October 18th
21:29:53 <markmcclain> I encourage folks to file early
21:30:10 <markmcclain> it enables time to provide feedback and helps with planning
21:30:12 <mestery> markmcclain: At one point you said these need blueprints attached, still the case?
21:30:28 <geoffarnold> And do we know how many Neutron-related slots will be available? (From which we can infer pressure to merge....)
21:31:06 <markmcclain> geoffarnold: We have Tues, Wednesday, and Friday
21:32:29 <markmcclain> there will be some pressure to merge as our time in our track has been oversubscribed for the last 3 summits
21:33:27 <markmcclain> That's where filing early helps us better plan and schedule to reduce conflicts for folks who are speaking or needed for cross project sessions
21:33:51 <markmcclain> When proposing topics remember to consider community features too.
21:34:42 <markmcclain> Community features are beneficial to everyone in the Neutron community.
21:35:07 <markmcclain> Any questions on the summit?
21:35:17 <SumitNaiksatam> markmcclain: except vendor plugin features, isn't what we are all doing beneficial to the entire community?
21:35:28 <mestery> Do we need blueprints for design summit proposals?
21:36:06 <marun> mestery: only if you're implementing something!
21:36:39 <mestery> marun: I thought markmcclain said at an earlier meeting design summit sessions with blueprints would have higher precedence.
21:36:53 <geoffarnold> Implementing includes refactoring and deprecating, of course ;-)
21:37:01 <markmcclain> SumitNaiksatam: Yes, but there is a tendency to favor shiny new features instead of tasks such as refactoring code or improving scalability
21:37:01 <marun> of course
21:37:19 <marun> *cough* testing *cough*
21:37:38 * mestery thinks marun is trying to tell us something. :)
21:37:44 <markmcclain> mestery: yes.. including a blueprint with your summit proposal will increase your chances of being selected
21:37:54 <marun> mestery: it's like you're psychic or something!
21:37:56 <mestery> markmcclain: thanks for clarifying.
21:38:02 <SumitNaiksatam> markmcclain: don't want to get into philosophical debate, but I request we stop using the "shiny" new feature terminology
21:38:13 <marun> why?
21:38:41 <SumitNaiksatam> in my mind there are two types of feature, neutron common, and vendor specific
21:38:47 <markmcclain> marun is being a little subtle on the testing bit, but testing is something that we would definitely want to discuss
21:39:18 <markmcclain> the current tempest random failures are a good indicator that we have some work to do in this area
21:39:30 <marun> SumitNaiksatam: if our existing codebase is extremely problematic from a quality perspective - and our gate issues suggest that is the case - most new features are shiny by definition
21:40:00 <SumitNaiksatam> marun: to me that is a matter of prioritization
21:40:13 <marun> Hopefully the 'neutron sucks' (provisional name of course) session will bring about consensus on how to change this
21:40:25 <marun> But one of the main things is that no new feature should be acceptable without corresponding tempest tests
21:40:34 <markmcclain> SumitNaiksatam: shiny features are good things too. I'll find a different way of classifying items
21:40:37 <emagana> marun: +1
21:40:38 <marun> Writing them after the fact has proven disastrous.
21:40:53 <SumitNaiksatam> marun: agree
21:41:25 <mestery> marun: +1
21:41:26 <marun> SumitNaiksatam: Agree on prioritization, completely.
21:41:42 <mestery> And let's not forget migration from and deprecation of nova-networking
21:41:56 <marun> SumitNaiksatam: The hope is to make sure all the important quality issues can be weighed against features this cycle rather than being relegated to the end of the list by inattention.
21:41:57 <mestery> That's been on the docket for a while as well, and is something we as a team need to start looking at solving I think.
21:42:12 <SumitNaiksatam> marun: agree
21:42:30 <geoffarnold> I know we've used Tempest for a while, but it still makes me think of http://en.wikipedia.org/wiki/Tempest_(codename)
21:42:40 <markmcclain> mestery: I agree
21:43:10 <marun> mestery: +1
21:43:12 <beagles> I've been working on a first crack of a definition of "parity", it isn't quite done yet (should be next COB)
21:43:21 <emagana> please, include Performance Improvement as a main topic. I am pretty sure some developers are working on that
21:43:21 <markmcclain> beagles: great!
21:43:26 <mestery> beagles: Perfect!
21:43:28 <beagles> https://docs.google.com/document/d/1E9CiG8fmcn0FMI15zpE0dRQ1CcFhXV8TxL7mU7vgWv8/edit?usp=sharing
21:43:36 <marun> emagana: I'm not sure we should be approaching that yet
21:43:38 <geoffarnold> emagana: +1
21:43:50 <marun> I think quality and testing are a prerequisite for optimization.
21:43:51 <mestery> beagles: This is pretty awesome, thanks for writing this up.
21:43:58 <marun> If we do one before the other we're going to have issues.
21:44:02 <beagles> there is still lots to add in there, but some of that is in "put all my bloody notes together"
21:44:12 <markmcclain> Remember the summit is community driven, so please do file sessions on these topics
21:44:21 <beagles> there is some very strong language about the necessity of testing there :)
21:44:26 <emagana> marun: we added a lot of new features in this release, I suggest we focus on performance and scalability for the I release
21:44:32 <mestery> beagles: I'm sure marun approves of that. :)
21:45:15 <marun> +1
21:45:26 <beagles> heh
21:45:30 <marun> emagana: if there were tempest tests for those features, I'd agree.
21:45:55 <marun> emagana: But in the absence of being able to ensure that optimizations don't cause regressions, I'll fight tooth and nail against optimization initiatives.
21:46:27 <emagana> marun: Yes, of course. My point is to give priority to performance improvement instead of more new features. I remember reading an email where the L3 agent took 5 minutes to restart with just 10 tenants and 10 networks each
21:47:43 <marun> emagana: Ah, ok.  Yes, we need acceptable quality (whatever that might be) in existing features before new ones get priority.
21:47:45 <marun> +1
21:47:53 <markmcclain> So remember to file early: http://summit.openstack.org
21:48:10 <markmcclain> #topic Open Discussion
21:48:28 <markmcclain> Any other items the team needs to discuss?
21:48:30 <mestery> The patch to default devstack to ML2 instead of OVS has been stalling in the gate all day: https://review.openstack.org/#/c/47837/
21:48:41 <mestery> But once that goes in, ML2 will be exercised much more as part of the gate.
21:49:33 <markmcclain> cool
21:49:47 <amotoki> cool, but one question.
21:49:48 <amotoki> Doesn't it make the situation more complicated if we change the default plugin at this moment?
21:50:03 <jog0> markmcclain: what's your plan for stabilizing the gate?
21:50:13 <mestery> amotoki: It's a risk, but this will help catch issues with ML2 earlier rather than later.
21:50:22 <mestery> And tempest passes the full suite with it already.
21:50:32 <mestery> rkukura and I have verified this locally in our setups.
21:50:45 <markmcclain> jog0: Salvatore did some digging on two of the issues
21:51:20 <jog0> markmcclain: because cutting an RC and opening up development with these bugs outstanding concerns me
21:51:24 <markmcclain> so I'm picking up from the notes he left and I'll ping marun about the agent scheduling deadlock
21:51:53 <amotoki> mestery: I don't have an objection on moving the default. just hearing opinions.
21:52:00 <rkukura> I have one topic to discuss
21:52:04 <mestery> amotoki: Understood.
21:52:19 <sdague> markmcclain: as far as i can see dropping q-vpn from devstack-gate made the deadlock go away just by decreasing neutron db traffic
21:52:36 <markmcclain> jog0: I don't think these are recent regressions, but instead we've managed to tweak unrelated items in such a way to amplify them
21:52:38 <sdague> which means the issue is still there, we just aren't tickling it
21:53:10 <markmcclain> sdague: agreed that's why I'm still working on that problem
21:53:22 <sdague> that looked like the minor race though, there are a lot more hard to track down races on networks not starting, and no indication why
21:53:28 <markmcclain> the reversion was a temporary workaround
21:53:31 <jog0> sdague: that sounds like a workaround not a fix
21:53:38 <markmcclain> the failures still occur too frequently
21:53:45 <sdague> jog0: absolutely
21:54:29 <markmcclain> sdague: devstack is a still testing environment, so the problem itself needs to be addressed because prod is going to be significantly more complex
21:54:43 <markmcclain> s/still/simple/
21:54:53 <sdague> markmcclain: yep, definitely
21:55:01 <jog0> markmcclain: so will RC and trunk open even if these aren't fixed?
21:55:49 <markmcclain> my personal preference is to cut an RC with this as a known issue
21:56:08 <jog0> markmcclain: can we cut a release with this as a bug?
21:56:13 <markmcclain> mainly for the psychological reason that folks are more likely to test the RC
21:56:50 <jog0> markmcclain: true, but I am afraid of devs moving forward while critical bugs still exist
21:56:50 <markmcclain> jog0: I'd have reservations cutting a release until we understand the bug better
21:57:05 <jog0> markmcclain: fair enough, thanks
21:57:15 <sdague> markmcclain: so I'm actually more concerned with the other neutron races which seem far easier to tickle
21:57:28 <sdague> and far harder to figure out the issue
21:57:38 <sdague> https://bugs.launchpad.net/swift/+bug/1224001
21:57:39 <uvirtbot> Launchpad bug 1224001 in neutron "test_network_basic_ops fails waiting for network to become available" [Critical,New]
21:57:48 <jog0> sdague: yeah that one is a strange one
21:58:04 <sdague> that's basically why the gate is slow
21:58:13 <markmcclain> sdague: I think that one might be related to same agent problems
21:58:26 <sdague> possibly
21:58:31 <sdague> http://logstash.openstack.org/#eyJzZWFyY2giOiJcInRlbXBlc3Quc2NlbmFyaW8udGVzdF9uZXR3b3JrX2Jhc2ljX29wcyBBc3NlcnRpb25FcnJvcjogVGltZWQgb3V0IHdhaXRpbmcgZm9yXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6Ijg2NDAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM4MDQ1NzY5NzMxOX0=
21:58:45 <sdague> just to get an idea of *how* often that one's popping up
21:59:08 <jog0> sdague: and a flaky gate means a bad user experience for havana :(
21:59:45 <markmcclain> jog0: agreed…. users expect stability
22:01:34 <markmcclain> Ok time is up for today… thanks for stopping by.. if you've more data on the critical bugs please add comments to https://bugs.launchpad.net/neutron/+bug/1230407 or https://bugs.launchpad.net/neutron/+bug/1224001
22:01:36 <uvirtbot> Launchpad bug 1230407 in neutron "VMs can't progress through state changes because Neutron is deadlocking on it's database queries, and thus leaving networks in inconsistent states" [Critical,Confirmed]
22:01:54 <markmcclain> Have a good rest of the week
22:01:56 <markmcclain> #endmeeting