21:04:02 #startmeeting Networking
21:04:03 Meeting started Mon Sep 30 21:04:02 2013 UTC and is due to finish in 60 minutes. The chair is markmcclain. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:04:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:04:07 The meeting name has been set to 'networking'
21:04:18 #link https://wiki.openstack.org/wiki/Network/Meetings
21:04:23 #topic Announcements
21:04:31 The RC will be cut tonight/tomorrow
21:04:50 I've been tagging non-essential bugs as havana-rc-potential
21:05:05 if they land before the release candidate is cut, great
21:05:32 if not, we'll consider any late merges for a potential RC2
21:06:59 We've had a few bugs that have been causing problems in the gate for a few days now
21:07:03 #topic Bugs
21:07:09 https://bugs.launchpad.net/neutron/+bugs?search=Search&field.importance=Critical&field.status=New&field.status=Confirmed&field.status=Triaged&field.status=In+Progress
21:07:54 The bugs have been causing gate resets, which slow down code merging for the entire community
21:08:04 :(
21:08:06 https://bugs.launchpad.net/neutron/+bug/1230407
21:08:08 Launchpad bug 1230407 in neutron "VMs can't progress through state changes because Neutron is deadlocking on it's database queries, and thus leaving networks in inconsistent states" [Critical,Confirmed]
21:08:31 the interim workaround was to revert the inclusion of the VPN service in the gate
21:08:33 is that a duplicate of that previous gate bug?
21:08:58 marun: they're all slight variants of the same problem
21:09:24 I think jlibosva (probably not on) has narrowed the problem to a deadlock with the l3scheduler
21:09:41 he reports being able to reproduce it consistently anyway
21:09:59 to clarify, competition between the l3scheduler and the agent heartbeat update via rpc
21:10:17 good to know.. reproducing these failures has been a bit of a challenge
21:11:25 Hi, sorry I'm late
21:12:18 Are there any other critical bugs that the team needs to be tracking?
21:13:03 #topic API Docs
21:13:13 we are targeting the first cut of the ML2 documentation this Wednesday; the ML2 team has done a great job finding owners for the doc bugs.. all of them have been assigned
21:13:29 emagana: you jumped the gun :)
21:14:02 also a doc bug to include a note about deprecating the OVS and LB plugins
21:14:25 markmcclain: sorry..
21:14:29 will wait!
21:14:38 no worries
21:14:40 Salvatore is off this evening, but emailed me to say that the API docs are on schedule and should be completed by the Havana release
21:14:46 https://bugs.launchpad.net/openstack-api-site/+bugs?field.tag=netconn-api
21:15:05 now we'll move to admin and user docs
21:15:09 #topics Docs
21:15:14 emagana: your turn
21:15:21 #topic Docs
21:15:28 well, I did already mention the ML2 docs
21:15:40 I think that is the most important update
21:16:40 that's good news
21:16:59 Any questions on docs?
21:17:58 #topic FWaaS
21:18:12 SumitNaiksatam: looks like we have a few open reviews for Horizon support
21:18:26 just one as of now
21:18:30 one just got merged
21:18:35 we are doing good :-)
21:18:35 good
21:18:43 this one should also get through
21:18:51 this one? https://review.openstack.org/#/c/49019
21:19:03 yeah
21:19:10 it's failing on the tempest gate
21:19:31 otherwise the fix is good
21:19:47 SumitNaiksatam: the fix is good. now looking :-)
21:20:01 yeah, thanks to KC for the fix
21:20:09 looks like it is being caught up in the mix of the critical bugs we talked about earlier
21:20:24 yeah
21:20:43 otherwise all other fwaas neutron bugs are fixed, so good for now, touch wood
21:21:04 good, thanks for updating
21:21:21 #topic Horizon
21:21:44 hi, morning.
21:21:53 amotoki: Any additional Horizon-related information to share?
21:22:15 thanks for taking care of the quota issue.
21:22:50 https://bugs.launchpad.net/neutron/+bug/1189671 is the only remaining one.
21:22:51 you're welcome
21:22:52 Launchpad bug 1189671 in neutron "default quota driver not suitable for production" [High,In progress]
21:23:22 what is its status?
21:23:26 that one has some strange unit test assumptions
21:24:01 so when the tests run out of order, the installed quota driver isn't always initialized
21:24:52 in production the order is correct based on normal operation
21:26:07 I'll post a new revision for you to look at today.. I'll ping you when it's ready
21:26:33 markmcclain: thanks. I will take care of the status.
21:26:55 that's all from me.
21:27:04 amotoki: thanks for updating
21:27:24 #topic Hong Kong Design Summit
21:27:34 #link http://summit.openstack.org
21:28:14 You can file summit session ideas at that site.
21:28:42 Remember that sessions are not chosen on a first-to-file basis, so if your idea overlaps with another, feel free to file yours too.
21:29:01 What's the deadline?
21:29:38 I believe the deadline is October 18th
21:29:53 I encourage folks to file early
21:30:10 it allows time for feedback and helps with planning
21:30:12 markmcclain: At one point you said these need blueprints attached; is that still the case?
21:30:28 And do we know how many Neutron-related slots will be available? (From which we can infer pressure to merge....)
21:31:06 geoffarnold: We have Tuesday, Wednesday, and Friday
21:32:29 there will be some pressure to merge, as our track has been oversubscribed for the last 3 summits
21:33:27 That's where filing early helps us better plan and schedule to reduce conflicts for folks who are speaking or needed for cross-project sessions
21:33:51 When proposing topics, remember to consider community features too.
21:34:42 Community features are beneficial to everyone in the Neutron community.
21:35:07 Any questions on the summit?
21:35:17 markmcclain: apart from vendor plugin features, isn't everything we are all doing beneficial to the entire community?
21:35:28 Do we need blueprints for design summit proposals?
21:36:06 mestery: only if you're implementing something!
21:36:39 marun: I thought markmcclain said at an earlier meeting that design summit sessions with blueprints would have higher precedence.
21:36:53 Implementing includes refactoring and deprecating, of course ;-)
21:37:01 SumitNaiksatam: Yes, but there is a tendency to favor shiny new features over tasks such as refactoring code or improving scalability
21:37:01 of course
21:37:19 *cough* testing *cough*
21:37:38 * mestery thinks marun is trying to tell us something. :)
21:37:44 mestery: yes.. including a blueprint with your summit proposal will increase your chances of being selected
21:37:54 mestery: it's like you're psychic or something!
21:37:56 markmcclain: thanks for clarifying.
21:38:02 markmcclain: don't want to get into a philosophical debate, but I request we stop using the "shiny" new feature terminology
21:38:13 why?
21:38:41 in my mind there are two types of features: neutron common and vendor specific
21:38:47 marun is being a little subtle on the testing bit, but testing is something we would definitely want to discuss
21:39:18 the current tempest random failures are a good indicator that we have some work to do in this area
21:39:30 SumitNaiksatam: if our existing codebase is extremely problematic from a quality perspective - and our gate issues suggest that is the case - most new features are shiny by definition
21:40:00 marun: to me that is a matter of prioritization
21:40:13 Hopefully the 'neutron sucks' (provisional name of course) session will bring about consensus on how to change this
21:40:25 But one of the main things is that no new feature should be acceptable without corresponding tempest tests
21:40:34 SumitNaiksatam: shiny features are good things too; I'll find a different way of classifying items
21:40:37 marun: +1
21:40:38 Writing them after the fact has proven disastrous.
21:40:53 marun: agree
21:41:25 marun: +1
21:41:26 SumitNaiksatam: Agree on prioritization, completely.
21:41:42 And let's not forget migration from and deprecation of nova-networking
21:41:56 SumitNaiksatam: The hope is to make sure all the important quality issues can be weighed against features this cycle rather than being relegated to the end of the list by inattention.
21:41:57 That's been on the docket for a while as well, and is something we as a team need to start looking at solving, I think.
21:42:12 marun: agree
21:42:30 I know we've used Tempest for a while, but it still makes me think of http://en.wikipedia.org/wiki/Tempest_(codename)
21:42:40 mestery: I agree
21:43:10 mestery: +1
21:43:12 I've been working on a first crack at a definition of "parity"; it isn't quite done yet (should be by next COB)
21:43:21 please include Performance Improvement as a main topic. I am pretty sure some developers are working on that
21:43:21 beagles: great!
21:43:26 beagles: Perfect!
21:43:28 https://docs.google.com/document/d/1E9CiG8fmcn0FMI15zpE0dRQ1CcFhXV8TxL7mU7vgWv8/edit?usp=sharing
21:43:36 emagana: I'm not sure we should be approaching that yet
21:43:38 emagana: +1
21:43:50 I think quality and testing are a prerequisite for optimization.
21:43:51 beagles: This is pretty awesome, thanks for writing this up.
21:43:58 If we do one before the other we're going to have issues.
21:44:02 there is still lots to add in there, but some of that is in "put all my bloody notes together"
21:44:12 Remember the summit is community driven, so please do file sessions on these topics
21:44:21 there is some very strong language about the necessity of testing there :)
21:44:26 marun: we added a lot of new features in this release; I suggest we focus on performance and scalability for the I release
21:44:32 beagles: I'm sure marun approves of that. :)
21:45:15 +1
21:45:26 heh
21:45:30 emagana: if there were tempest tests for those features, I'd agree.
21:45:55 emagana: But in the absence of being able to ensure that optimizations don't cause regressions, I'll fight tooth and nail against optimization initiatives.
21:46:27 marun: Yes, of course. My point is to give priority to performance improvement instead of more new features. I remember reading an email about the L3 agent taking 5 minutes to restart with just 10 tenants and 10 networks each
21:47:43 emagana: Ah, ok. Yes, we need acceptable quality (whatever that might be) in existing features before new ones get priority.
21:47:45 +1
21:47:53 So remember to file early: http://summit.openstack.org
21:48:10 #topic Open Discussion
21:48:28 Any other items the team needs to discuss?
21:48:30 The patch to default devstack to ML2 instead of OVS has been stalling in the gate all day: https://review.openstack.org/#/c/47837/
21:48:41 But once that goes in, ML2 will be exercised much more as part of the gate.
21:49:33 cool
21:49:47 cool, but one question.
21:49:48 Doesn't it make the situation more complicated if we change the default plugin at this moment?
21:50:03 markmcclain: what's your plan for stabilizing the gate?
21:50:13 amotoki: It's a risk, but this will help catch issues with ML2 earlier rather than later.
21:50:22 And tempest passes the full suite with it already.
21:50:32 rkukura and I have verified this locally in our setups.
21:50:45 jog0: Salvatore did some digging on two of the issues
21:51:20 markmcclain: because cutting an RC and opening up development with these bugs outstanding concerns me
21:51:24 so I'm picking up from the notes he left and I'll ping marun about the agent scheduling deadlock
21:51:53 mestery: I don't have an objection to moving the default; just hearing opinions.
21:52:00 I have one topic to discuss
21:52:04 amotoki: Understood.
21:52:19 markmcclain: as far as I can see, dropping q-vpn from devstack-gate made the deadlock go away just by decreasing neutron db traffic
21:52:36 jog0: I don't think these are recent regressions; instead we've managed to tweak unrelated items in such a way as to amplify them
21:52:38 which means the issue is still there, we just aren't tickling it
21:53:10 sdague: agreed, that's why I'm still working on that problem
21:53:22 that looked like the minor race though; there are a lot more hard-to-track-down races on networks not starting, with no indication why
21:53:28 the revert was a temporary workaround
21:53:31 sdague: that sounds like a workaround, not a fix
21:53:38 the failures still occur too frequently
21:53:45 jog0: absolutely
21:54:29 sdague: devstack is a still testing environment, so the problem itself needs to be addressed because prod is going to be significantly more complex
21:54:43 s/still/simple/
21:54:53 markmcclain: yep, definitely
21:55:01 markmcclain: so will the RC be cut and trunk open even if these aren't fixed?
21:55:49 my personal preference is to cut an RC with this as a known issue
21:56:08 markmcclain: can we cut a release with this as a bug?
21:56:13 mainly for the psychological reason that folks are more likely to test the RC
21:56:50 markmcclain: true, but I am afraid of devs moving forward while critical bugs still exist
21:56:50 jog0: I'd have reservations about cutting a release until we understand the bug better
21:57:05 markmcclain: fair enough, thanks
21:57:15 markmcclain: so I'm actually more concerned with the other neutron races, which seem far easier to tickle
21:57:28 and far harder to figure out the issue
21:57:38 https://bugs.launchpad.net/swift/+bug/1224001
21:57:39 Launchpad bug 1224001 in neutron "test_network_basic_ops fails waiting for network to become available" [Critical,New]
21:57:48 sdague: yeah, that one is strange
21:58:04 that's basically why the gate is slow
21:58:13 sdague: I think that one might be related to the same agent problems
21:58:26 possibly
21:58:31 http://logstash.openstack.org/#eyJzZWFyY2giOiJcInRlbXBlc3Quc2NlbmFyaW8udGVzdF9uZXR3b3JrX2Jhc2ljX29wcyBBc3NlcnRpb25FcnJvcjogVGltZWQgb3V0IHdhaXRpbmcgZm9yXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6Ijg2NDAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM4MDQ1NzY5NzMxOX0=
21:58:45 just to get an idea of *how* often that one's popping up
21:59:08 sdague: and a flaky gate means a bad user experience for havana :(
21:59:45 jog0: agreed... users expect stability
22:01:34 Ok, time is up for today… thanks for stopping by. If you have more data on the critical bugs, please add comments to https://bugs.launchpad.net/neutron/+bug/1230407 or https://bugs.launchpad.net/neutron/+bug/1224001
22:01:36 Launchpad bug 1230407 in neutron "VMs can't progress through state changes because Neutron is deadlocking on it's database queries, and thus leaving networks in inconsistent states" [Critical,Confirmed]
22:01:54 Have a good rest of the week
22:01:56 #endmeeting
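The logstash link posted at 21:58:31 carries its saved query in the URL fragment as base64-encoded JSON. As a minimal sketch (not something run during the meeting), the Python standard library alone can decode that fragment to show which failure string the graph was counting; the helper name `decode_fragment` is hypothetical, and the meaning of fields such as `timeframe` is inferred from their values, not from any documented logstash API.

```python
import base64
import json

# Fragment (the part after '#') of the logstash link posted at 21:58:31,
# split across lines only for readability; the concatenation is unchanged.
FRAGMENT = (
    "eyJzZWFyY2giOiJcInRlbXBlc3Quc2NlbmFyaW8udGVzdF9uZXR3b3JrX2Jhc2ljX29wcyBB"
    "c3NlcnRpb25FcnJvcjogVGltZWQgb3V0IHdhaXRpbmcgZm9yXCIiLCJmaWVsZHMiOltdLCJv"
    "ZmZzZXQiOjAsInRpbWVmcmFtZSI6Ijg2NDAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1l"
    "Ijp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM4MDQ1NzY5NzMxOX0="
)


def decode_fragment(fragment: str) -> dict:
    """Hypothetical helper: base64-decode a URL fragment and parse it as JSON."""
    raw = base64.b64decode(fragment)
    return json.loads(raw.decode("utf-8"))


query = decode_fragment(FRAGMENT)
print(query["search"])     # the quoted failure message being matched
print(query["timeframe"])  # "86400", presumably a one-day window in seconds
print(query["graphmode"])  # "count", presumably hits plotted over time
```

The decoded `search` field is the quoted phrase "tempest.scenario.test_network_basic_ops AssertionError: Timed out waiting for", i.e. the same symptom reported in bug 1224001.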