21:02:16 #startmeeting Networking
21:02:17 Meeting started Mon Jan 6 21:02:16 2014 UTC and is due to finish in 60 minutes. The chair is markmcclain. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:21 The meeting name has been set to 'networking'
21:02:23 Ha!
21:02:33 * mestery can see his breath as he types in this meeting.
21:02:49 #link https://wiki.openstack.org/wiki/Network/Meetings
21:02:51 cold, heh?
21:02:55 I'm guessing anything south of the Mason-Dixon line
21:03:52 so it's been two weeks since our last meeting and many took time off during the holidays
21:03:53 Almost -30 air temperature this morning, at least -50 with wind chill, all temps F. As cold as I can remember it.
21:04:40 mestery: that just sounds painful
21:04:47 mestery: where do you live?
21:04:49 Wow!!
21:04:55 mestery: if next week in Montreal is anything like that, I think I'll just die. That is way below my operating temperature. Seriously, I have a label which says "operate strictly between 0C and 40C"
21:05:11 mlavelle: Minnesota
21:05:12 welcome to the polar vortex
21:05:26 Icehouse-2 is Jan 24rd
21:05:30 oops 23rd
21:05:57 The tempest/Neutron sprint is next week
21:06:01 * mestery hopes someone brings salv-orlando a toque: http://en.wikipedia.org/wiki/Toque#Canadian_usage
21:06:40 #topic Bugs
21:06:54 mestery: believe it or not I have a Toronto Maple Leafs toque, somewhere
21:07:11 * mestery loves the Maple Leafs. :)
21:07:12 mestery: it's a balmy 28F in montreal
21:07:26 dkehn: Psssh. That's for amateurs. :)
21:07:40 dkehn: oh yeah?
I'm so packing my swimsuit and sunglasses then
21:07:54 so we're still tracking the same critical bugs as before Christmas
21:07:55 http://www.weather.com/weather/extended/CAXX0301?par=yahoo&site=www.yahoo.com&promo=extendedforecast&cm_ven=Yahoo&cm_cat=www.yahoo.com&cm_pla=forecastpage&cm_ite=CityPage
21:08:10 salv-orlando: I'll meet you by the pool
21:08:22 markmcclain: I have updated the status of bug 1253896
21:08:22 markmcclain: at least one should go away
21:08:31 salv-orlando: ok
21:08:37 enikanorov: which one?
21:08:50 "timeout waiting for thing" one
21:09:19 great
21:09:23 the fix has been committed to nova
21:10:43 enikanorov: do you have a link for the fix?
21:10:49 I want to link it into https://bugs.launchpad.net/neutron/+bug/1250168
21:10:56 let me find
21:11:14 nati_ueno: looks like we have reviews ready for https://bugs.launchpad.net/neutron/+bug/1112912
21:11:43 nati_ueno: still seems to be failing jenkins
21:11:53 have you been able to triage it?
21:12:02 markmcclain: I got it. I'll fix it this afternoon
21:12:16 great
21:12:38 markmcclain: Jenkins looks to be working for me (Dec 16) but maybe a rebase is needed
21:12:45 markmcclain: https://review.openstack.org/#/c/64383/
21:12:56 I've been getting Jenkins failures on my code review (only docstring changes in latest version).
21:13:03 nati_ueno: please rebase
21:13:06 enikanorov: thanks
21:13:09 Can someone help me off-line to devug
21:13:09 markmcclain: gotcha
21:13:15 debug?
21:13:38 pcm_: ask around in the channel after the meeting
21:13:45 or do I need to rebase (was done last week).
21:13:50 if I didn't have to run I'd hang around and help
21:13:50 markmcclain: OK
21:14:06 pcm_: I'd try that
21:14:13 markmcclain: OK.
21:14:40 marun: this bug is still open
21:14:40 https://bugs.launchpad.net/neutron/+bug/1192381
21:14:45 can we consider it closed?
21:14:54 markmcclain: I think so, yes.
21:15:10 ok
21:15:12 markmcclain: there is a follow-on blueprint that will ensure eventual consistency: https://blueprints.launchpad.net/neutron/+spec/eventually-consistent-dhcp-agent
21:15:25 markmcclain: but the best we can do without a refactor has already been committed
21:15:33 ok
21:15:45 what milestone should I target for the bp?
21:16:06 markmcclain: probably icehouse-3
21:16:10 ok
21:16:24 markmcclain: hopefully progress can come sooner but i don't want to rush it
21:17:09 yeah we definitely want to make sure we maintain stability
21:17:27 Any other critical bugs the team needs to discuss?
21:17:42 do we skip our favourite bug?
21:17:46 bug 1253896?
21:18:15 we did
21:18:24 https://bugs.launchpad.net/neutron/+bug/1253896
21:18:42 looks like you did some research on it late last week
21:18:44 come on, we're actually not fixing this bug just because we love talking about it.
21:19:05 seriously, failures in non-isolated jobs are down to 0% which is good for the gate
21:19:12 yeah
21:19:46 however we want to make isolated and parallel jobs the default solution, so it's not so good that just the isolated jobs have a failure rate of about ^%
21:19:50 sorry 6%
21:20:02 and that's up from about 2.5% before christmas
21:20:07 ugh
21:20:22 do we want to get isolated/parallel working as currently defined?
21:20:36 it's currently a really beastly stress test that is unrepresentative of real-world usage for the most part
21:20:50 marun: yes, that is correct, but I would like to move this discussion to the tempest part
21:20:51 (because of running on a single node that is cpu/io bound like crazy)
21:21:05 salv-orlando: fair enough
21:21:32 We'll return to parallel tests in the tempest section
21:21:43 for the current situation, we need to look at the logs. If we conclude that the reasons for the failure are the same ones that are causing failures in the parallel jobs, then we should just wait for the patches to merge; but I doubt that.
21:22:05 It would be great if we can get some fresh eyes to look at the logs, as I won't have much time during this week.
21:22:08 yeah I am interested to know what caused the failure rate to triple
21:22:31 Any other bugs the team needs to discuss?
21:22:35 I am too. Note that I've been taking 24-hour samples - the first on Dec 23 and the second on Jan 2
21:23:14 that's all. I would be happier if somebody else volunteers to look at the logs and provide feedback
21:23:22 especially if that somebody is coming to montreal next week.
21:24:01 I've got training tomorrow, but I'll try to spend time digging; if anyone has spare cycles before then, feel free to update the bug
21:24:05 * salv-orlando is sure people are now rushing to logs.openstack.org to check the logs
21:24:50 #topic Docs
21:25:04 emagana is out today but has filled in the report
21:25:15 #topic Tempest
21:25:24 markmcclain: can I bring up a docs item?
21:25:28 #undo
21:25:29 Removing item from minutes:
21:25:36 annegentle_: yes!
21:25:39 markmcclain: we're having a discussion on the mailing list about a new networking-only guide
21:25:45 #link http://lists.openstack.org/pipermail/openstack-docs/2014-January/003582.html
21:25:54 that just goes to a mid-thread discussion
21:26:04 but since Edgar has been out I haven't been able to ping him about it
21:26:33 so, just wanted to put it on your radar... I'm hesitant to add another guide what with all the reorg we've been doing (augggh) but wanted to see what you all think, would it be useful, better, worse?
21:27:13 I missed this thread, so thanks for raising it
21:27:30 +1
21:27:55 I agree with Tim Bell that "if image management gets its own book, why not networking"
21:28:14 but, it's a pile of work, with real underlying teaching needs, so we need to find a good "owner"
21:28:18 subscribed to the BP - i'll pitch in from a lot of doc that I wrote up internally while I was learning Neutron
21:28:24 I'm all in favor of a guide for networking.
21:28:37 anyway, feel free to discuss amongst yourselves, comment on the blueprint, etc. Ohh thanks sc68cal
21:28:54 markmcclain: thanks for letting me pop in :)
21:29:03 carry on
21:29:05 my 2p at first glance is that the Neutron community has struggled a bit to handle the current workload
21:29:37 salv-orlando: yep really it'd be best to have a tech writer take it
21:29:51 salv-orlando: if you know of anyone, I could even ask for a contract
21:30:48 annegentle_: exactly; my point is that we don't have a doc guy so far, just people taking turns playing that role. I'll let you know if I hear of somebody interested
21:31:00 k thanks
21:31:19 ok.. everyone feel free to catch up on the thread and chime in on the mailing list
21:31:25 annegentle_: thanks for pointing this out
21:31:38 annegentle_: anything else since edgar is out this week?
21:31:40 note that it is a thread in the docs ML
21:31:52 amotoki: good reminder
21:32:19 #topic tempest
21:32:19 * mestery didn't know there was a docs mailer and wonders why that is in fact ... :(
21:32:49 Let's circle back to parallel testing before we dive into Tempest
21:32:58 salv-orlando: want to update on parallel testing?
21:33:06 sure markmcclain
21:33:39 http://lists.openstack.org/pipermail/openstack-dev/2013-December/023109.html
21:34:02 We have a bunch of patches which are aimed at solving the structural problems we found in the OVS agent.
21:34:15 They're all listed in the email linked above
21:34:37 http://lists.openstack.org/pipermail/openstack-dev/2014-January/023289.html
21:34:38 While running parallel tests we noticed a set of new issues.
21:35:24 * salv-orlando is looking for a link, sorry
21:35:52 these issues have all been tagged with neutron-parallel: https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel
21:36:20 a single issue is causing 90% of tests to fail and has to do with an error in the port quota check.
21:36:24 I have a fix for it.
21:36:39 ok
21:36:55 But sdague rightly asked me to make sure that I'm not just gaming the test and hiding what would be a fundamental issue in neutron
21:37:06 but I'm not doing that anyway
21:37:35 https://review.openstack.org/#/c/64217/
21:37:38 on the other issues, I think most of them are just because the tests are not parallel-safe, or do not take into account that things might work differently with neutron
21:37:47 except for one error: the ssh protocol banner error
21:38:06 I think Nachi had a good hint that it might depend on the metadata server being slow or failing
21:38:12 nati-ueno: ^^^
21:38:36 because that would explain why ping works and ssh does not, even if iptables rules are correctly configured on the l3-agent
21:39:03 So that's the current situation. What I would love to see is people picking up all the bugs tagged with neutron-parallel
21:39:09 and squashing all of them by next week
21:39:20 that's all from me. Questions?
21:39:51 it would be really nice to have the parallel bugs solved before we all meet up
21:40:16 * marun gets with the squashing
21:40:24 marun: thanks
21:40:43 salv-orlando: thanks for working through the parallel issue
21:41:31 I think it's now time to move to what marun said: the load imposed by tempest
21:41:40 happy to talk about that?
21:41:48 yeah we can discuss
21:42:06 salv-orlando: Yes I faced the errors when the metadata server wasn't working well
21:42:14 I think if I interpret marun correctly, tempest with isolation and parallelism brings the cpu on the gate close to 100%
21:42:19 salv-orlando: problem is the ssl certificate configuration via metadata server
21:42:22 i think it's reasonable to do stress testing, but watching neutron fall over in a scenario that no self-respecting operator will allow is crazy.
21:42:47 * mestery agrees with marun.
21:42:48 I think this is because every test creates and wires a network with dhcp enabled, and attaches it to a router (which is wired as well)
21:43:11 now some history, deriving from conversations with mtreinish
21:43:25 this has been deemed good, because it stressed neutron a bit
21:43:59 since there is no other form of stress testing (the large ops job uses a fake driver, so it stresses just the api server)
21:44:18 I would agree the stress has been good
21:44:20 it is arguable whether stress testing should be part of the current test suite, whose aim is functional
21:44:27 and that decision is not mine.
21:44:43 also agree that functional and stress should be different conversations
21:45:21 From my side, I've noticed that in particular creating and wiring a router for every test put such a load on the l3 agent that it caused long delays and potential timeouts; so I had a patch making router creation optional
21:46:07 salv-orlando: so it's our agent not handling the load
21:46:44 markmcclain: I would say that if you ask to wire 50 routers in a minute, it's understandable that the agent might take more than 60 seconds to process the load
21:47:18 yeah agreed
21:47:24 is that a reasonable usage scenario?
21:47:45 in a large cloud maybe
21:48:06 anyway, I would just like to get a consensus between the neutron and qa teams
21:48:24 on how to behave wrt network resource creation for each test
21:48:46 I am happy with anything :)
21:49:15 salv-orlando: i think providing stable behavior under parallel execution is the goal.
21:49:21 ++
21:49:27 Seems to me that the goal for functional (tempest) tests should be to run as quickly as possible, and with minimal resources (i.e. in a single VM). Stress testing is different.
21:49:29 But I guess we can't do anything better than moving the discussion to the mailing list
21:49:38 rkukura: +1
21:49:43 salv-orlando: agreed
21:50:01 ok marun, can we count on you to start the thread?
21:50:01 yeah I'd like to give folks an opportunity to weigh in
21:50:12 salv-orlando: can do
21:50:18 marun: thanks
21:50:19 marun: thanks
21:50:29 np
21:50:36 ok 10 minutes left
21:51:00 mlavalle: Any items that you included on the agenda that need to be highlighted?
21:51:20 nope, everything is in the agenda, let's move on
21:51:34 mlavalle: thanks for providing the update
21:51:47 np
21:52:03 Any other Tempest items to discuss?
21:52:19 #topic IPv6
21:52:25 hello
21:52:38 I'm still working on the tail-f failure
21:52:51 I've been trying to track down the system maintainer
21:53:08 I found one person, but he was the point of contact
21:53:12 *was not
21:53:21 Thanks - that's pretty much blocking all of our progress, since it seems to scare off reviewers with the big scary -1
21:53:30 markmcclain: I was wondering if it's ok to ask the infra team to temporarily revoke voting rights for the service user
21:53:50 salv-orlando: that's an interesting proposition
21:53:55 I'll discuss it with them
21:54:21 We're also looking more deeply into what would need to be done for VIF attributes for hairpinning
21:54:22 sc68cal: yeah I've changed my gerrit view so that I can see who gave -1s
21:54:29 ok
21:54:34 we're finding that it's a libvirt-only behavior
21:54:50 ok
21:55:06 Anything else?
21:55:13 So is it worth having a VIF attribute if it's really only one compute driver using it?
21:55:33 otherwise that's it for me
21:55:44 that's probably best asked on the ML, so that we give everyone a chance to weigh in
21:55:51 Thanks for the update
21:55:53 #topic ML2
21:56:00 rkukura or mestery?
21:56:23 We plan to drive the port delete issue to closure this week at the meeting; there has been a long-standing thread on that since before the holidays.
21:56:36 Also, the plan for this week is to refocus on the TypeDriver enhancements as well.
21:56:45 some closure on that will be good
21:56:54 We're seeing a few more ML2 MechanismDrivers pop up, which is also pretty cool!
21:57:00 awesome
21:57:18 That's about it for ML2, more detail in the Wednesday meeting on IRC.
21:57:35 thanks for updating
21:57:36 nothing to add
21:57:43 #topic CLI
21:58:09 On Friday, a version of a dependent lib was released that broke the neutronclient
21:58:21 we rushed out a temporary fix
21:58:34 and released 3.3.3 which is compatible with the newer version of cliff
21:58:50 #action markmcclain to send out email on long term client fix
21:58:55 * anteaya notes verification voting is turned off for tail-f, see ml
21:59:22 also new accounts have verification voting off by default
21:59:42 anteaya: thanks for clearing that up
21:59:50 np, sorry I'm late
21:59:50 #topic Open Discussion
22:00:00 montreal weather: http://www.weather.com/weather/tenday/Montreal+Canada+CAXX0301
22:00:05 dumb question...there's been some meeting date/time changes. Is there a wiki with all meetings listed (versus scanning ML)?
22:00:18 pcm_: https://wiki.openstack.org/wiki/Meetings
22:00:25 sc68cal: Thanks!
22:00:31 we are doing waves of cold and warm, so far looking warm for the code sprint
22:00:42 Hi Folks, I have posted an updated version of the "Distributed Virtual Router" blueprint for review. Please review it and provide your feedback.
22:00:43 oh, on the issue of ml2....
22:00:48 markmcclain: neutron multihost. is it planned? I guess it should be on the nova-parity list?
22:00:52 link: https://docs.google.com/document/d/1iXMAyVMf42FTahExmGdYNGOBFyeA4e74sAO3pvr_RjA/edit
22:01:03 requiring the agent to be up to bind ports synchronously. good? bad? discuss!
22:01:11 i also remember a patch from gongysh implementing multihost
22:01:15 enikanorov: yes there are multiple groups working on solutions to tackle it
22:01:42 marun: Sounds like a question for openstack-dev
22:01:42 markmcclain: do you know who I can contact to discuss it?
22:01:42 marun: Is that with OVS and/or LB? I would think it has to be up for a synchronous bind, no?
22:01:47 enikanorov: the original patch from gongysh I don't think would be accepted right now
22:01:57 that's for sure
22:02:05 I'll encourage the teams working on solutions to post updates to the ML
22:02:15 ok, thanks
22:02:42 rkukura: fair enough
22:02:47 marun, mestery: I think this is the "fail to bind if agent isn't currently alive" vs. "there has been an agent on that node sometime in the past so it will eventually work, so let's just bind now"
22:02:58 marun: that's definitely a ML discussion since we're over time
22:03:08 enikanorov: I think the DVR is a direction to replace multihost.
22:03:15 mestery: neutron is a distributed app. we need to act accordingly
22:03:44 marun: Let's discuss on the ML :)
22:03:55 Thanks to everyone for stopping in this week… remember we have the IRC channel and ML to discuss items between meetings
22:03:55 mestery: :)
22:04:00 #endmeeting