17:00:47 <davidkranz> #startmeeting qa
17:00:49 <openstack> Meeting started Thu Jan 17 17:00:47 2013 UTC.  The chair is davidkranz. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:52 <openstack> The meeting name has been set to 'qa'
17:00:57 <mtreinish> hi
17:01:01 <mlavalle> hi
17:01:03 <ravikumar_hp> hi
17:01:05 <davidkranz> Hi there.
17:01:05 <donaldngo> hi
17:01:24 <davidkranz> Jay couldn't make it today.
17:01:50 <davidkranz> afazekas: Around?
17:02:19 <davidkranz> sdague: Here?
17:02:50 <mtreinish> davidkranz: sdague is out sick today. So, I don't think he'll make it.
17:03:05 <davidkranz> mtreinish: Ok, then let's start.
17:03:10 <davidkranz> #topic Reviews
17:03:34 <davidkranz> I don't think there is much to discuss there.
17:04:07 <davidkranz> Can any one speak to the state of the quantum tests?
17:04:32 * afazekas is here
17:04:35 <mlavalle> davidkranz: yes
17:04:47 <mlavalle> I am developing the code for 2 BP's
17:04:50 <afrittoli> hi
17:04:54 <mlavalle> basic tests
17:04:59 <mlavalle> and advanced tests
17:05:38 <mlavalle> https://blueprints.launchpad.net/tempest/+spec/quantum-basic-api
17:06:01 <mlavalle> https://blueprints.launchpad.net/tempest/+spec/quantum-extended-api
17:06:12 <mlavalle> coding the first one
17:06:24 <mlavalle> zyluo is writing code for the second one
17:06:30 <davidkranz> mlavalle: I see those blueprints. What is the relation to https://review.openstack.org/#/c/19152/
17:06:36 <ravikumar_hp> mlavalle: will the basic tests be gated tests?
17:07:21 <mlavalle> davidkranz: that's a refactoring of a smoketest that mnewby implemented a month ago
17:08:01 <davidkranz> mlavalle: So your new tests are in addition to those.
17:08:17 <mlavalle> davidkranz: correct
17:08:54 <davidkranz> mlavalle: So it sounds like progress is being made. That's great.
17:09:15 <mlavalle> davidkranz: :-)
17:10:12 <davidkranz> OK, I think the next topic is progress on parallel execution.
17:10:20 <davidkranz> #topic Parallel execution
17:11:34 <afazekas> At the moment resource reuse has more benefits.
17:12:12 <davidkranz> afazekas: Resource reuse is certainly easier since it is a more local change.
17:12:56 <donaldngo> have we decided to use testr to run the current tempest tests?
17:13:22 <davidkranz> afazekas: But they are both important. The sheer volume of tests is increasing rapidly.
17:13:44 <davidkranz> donaldngo: That is a work in progress.
17:13:59 <afazekas> We probably need to add more CPUs to the gate VMs in order to see a performance improvement from parallel testing.
17:14:23 <afazekas> It looks like heavy load alone can now cause flaky cases.
17:15:05 <davidkranz> afazekas: That's probably true.
17:16:01 <davidkranz> afazekas: But still, a lot of time is spent waiting for state changes, and eliminating that does not require more CPU.
17:16:05 <afazekas> I will experiment with tmpfs for instance and glance image storage; it can probably mitigate the flaky issues and might improve performance.
17:16:51 <afazekas> davidkranz: since tempest and n-cpu are running on the same CPU, we are affected by the CPU load.
17:17:01 <davidkranz> afazekas: That would be great.
17:17:56 <davidkranz> afazekas: It may just be that there are simply too many processes running for a single-CPU instance.
17:18:32 <afazekas> yes
17:18:37 <davidkranz> It would be interesting to compare the performance right now with a 2-cpu instance.
17:18:52 <afazekas> we should test the impact of adding more cpu to qemu
17:19:01 <afazekas> yes
17:19:49 <davidkranz> #topic Open Discussion
17:19:59 <davidkranz> Does any one have anything else to discuss?
17:20:03 <donaldngo> is the migration to testtools still a work in progress or are we sticking with nose?
17:21:26 <davidkranz> donaldngo: I think cyeoh is working on it based on the chatter in #openstack-qa
17:21:40 <afazekas> Why do we need to switch to testtools?
17:22:10 <donaldngo> Yeah, I saw cyeoh's email; I wasn't sure whether this was a proof of concept or a change in direction.
17:22:46 <davidkranz> afazekas: Jay and Daryl both tried and failed to get around bugs in the nose multiprocessing plugin
17:23:12 <davidkranz> afazekas: There was no response from the nose people.
17:23:34 <davidkranz> The developer of testtools is part of the OpenStack community and was seen as an alternative.
17:23:38 <afazekas> davidkranz: can you send me links to these bugs?
17:24:02 <davidkranz> afazekas: You should ask jaypipes for the details
17:24:16 <davidkranz> Also, the ci team switched away from nose.
17:24:20 <jhenner> what about fixing the multiprocessing plugin? (I know it is a headache to use)
17:24:35 <davidkranz> But there has not been a decision that we should stop using nose.
17:24:50 <afazekas> IMHO our case is very special, so we might need to develop our own very dynamic test tooling, with resource-reuse capabilities.
17:25:00 <davidkranz> jhenner: I wish jaypipes were here to comment.
17:25:04 <jaypipes> davidkranz: I spent a few hours last night, and have some promising code.
17:25:12 <jaypipes> davidkranz:  on testtools + fixtures.
17:25:24 <jaypipes> davidkranz:  problem is, it really does require quite a big rewrite.
17:25:41 <jaypipes> and I'm not sure how long it would take
17:25:51 <davidkranz> jaypipes: Cool. I think folks are still unsure whether we should do such a rewrite or whether we could salvage nose with less work.
17:26:37 <donaldngo> davidkranz++
17:26:40 <afazekas> Does testtools use the threading module for parallel execution?
17:26:40 <davidkranz> jaypipes: testtools is still work-in-progress and I'm not sure how we will decide in the end.
17:28:28 <afazekas> We need to consider IPC between test threads/processes, because of the resource sharing
17:29:35 <davidkranz> I think if some one can make nose do what we need, we would probably stick with it. But no one has.
17:29:52 <davidkranz> And those who have tried are pursuing the testtools approach at the moment.
17:31:58 <davidkranz> Any other comments or other topics to discuss?
17:33:21 <afazekas> It would be nice if we could identify flaky issues more easily.
17:34:13 <afazekas> Basically, collecting reviews where a recheck/reverify fixed an issue.
17:34:49 <jhenner> do we know what we need?
17:35:14 <davidkranz> afazekas: There are also the hourly full tempest runs.
17:35:15 <jhenner> I mean, what are our requirements? Are they summarized somewhere?
17:35:21 <davidkranz> afazekas: They are still flaky.
17:35:46 <davidkranz> jhenner: You mean requirements about flakiness?
17:36:22 <afazekas> I want to find coincidences in the flaky cases' log files.
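The log-coincidence search afazekas describes could start as simply as intersecting the error lines of several failed runs; error lines shared by many runs are flaky-failure suspects. The log contents below are invented for illustration:

```python
from collections import Counter


def common_error_lines(logs, min_runs=2):
    """Count how many failed runs each ERROR line appears in and
    return the lines seen in at least min_runs of them."""
    counts = Counter()
    for log in logs:
        # Deduplicate within one run so a repeated line counts once.
        errors = {line.strip() for line in log.splitlines() if "ERROR" in line}
        counts.update(errors)
    return [line for line, n in counts.most_common() if n >= min_runs]


# Toy logs from three hypothetical failed gate runs
run1 = "INFO boot ok\nERROR timeout waiting for ACTIVE\nERROR ssh failed"
run2 = "INFO boot ok\nERROR timeout waiting for ACTIVE"
run3 = "ERROR quota exceeded\nERROR timeout waiting for ACTIVE"

print(common_error_lines([run1, run2, run3]))
# -> ['ERROR timeout waiting for ACTIVE']
```

A real version would need to normalize timestamps and request IDs out of the lines before counting, but the shape of the analysis is the same.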
17:37:03 <mtreinish> davidkranz: I didn't think the hourly had failed recently, since I pushed the fix for that error-state-during-build problem.
17:38:08 <davidkranz> mtreinish: It failed yesterday and the day before. Doesn't seem tempest-related.
17:38:40 <mtreinish> davidkranz: ok
17:40:28 <davidkranz> It would make sense to keep track of the failure rate.
17:40:45 <davidkranz> I will ask the ci folks about that.
17:40:48 <jhenner> davidkranz: I meant requirements for the test runner to run. I didn't finish my sentence in time.
17:41:36 <jhenner> I think the failure rate can be watched by some Jenkins plugin. Let me check.
17:41:36 <davidkranz> jhenner: Ah. I don't think there are any written down.
17:42:13 <davidkranz> jhenner: That would be great.
17:42:45 <jhenner> "Project Health" can display the number of failures per test over an interval you choose.
17:43:11 <davidkranz> jhenner: Is this something the ci folks have to install or whatever?
17:43:32 <afazekas> davidkranz: I think so
17:43:54 <davidkranz> jeblair: You there?
17:44:02 <jhenner> https://wiki.jenkins-ci.org/display/JENKINS/Project+Health+Report+Plugin
17:45:07 <davidkranz> That looks like exactly what we need.
17:46:43 <davidkranz> I'll ping the ci folks about that.
17:46:57 <jeblair> hi!
17:47:06 <jeblair> scrolling back
17:47:15 <davidkranz> jeblair: We were talking about the need to track flakiness.
17:47:33 <davidkranz> jeblair: And wondering if we could use that plugin.
17:47:48 <davidkranz> jeblair: Or some other idea you might have for that purpose.
17:48:51 <jeblair> davidkranz: we had to stop having jenkins parse test output, because of the impact that has on jenkins (it creates a synchronization point in test runs, and keeping a build history causes too much load)
17:49:03 <jeblair> davidkranz: so i don't think we can use that plugin
17:49:22 <davidkranz> jeblair: What about just tracking the % failure on a day-by-day basis?
17:49:49 <davidkranz> jeblair: I mean overall success so test output is not needed.
17:49:51 <jeblair> davidkranz: the concern here is which individual tests are failing, right?
17:50:05 <jhenner> jeblair: There is some clustering support for Jenkins. Do you know about that? Wouldn't it help?
17:50:20 <jeblair> davidkranz: (btw, you saw http://status.openstack.org/rechecks/ which is human crowdsourcing for overall job failing flakiness)
17:50:51 <davidkranz> jeblair: I did see that.
17:51:21 <davidkranz> jeblair: It's been my experience that folks ignore flakies unless the heat rises to a certain level
17:51:42 <davidkranz> jeblair: So I just thought it would be good to know when we were there in an obvious and objective way.
17:52:30 <davidkranz> Still trying to move toward a bigger gate but worried about flakiness.
17:52:45 <jeblair> davidkranz: so what's the granularity you want?  at the jenkins job level, or individual test level?
17:53:29 <davidkranz> jeblair: Both really, but it seemed like you were saying individual test level was too expensive.
17:53:47 <jeblair> davidkranz: well, just that having _jenkins_ parse that is too expensive
17:54:09 <davidkranz> jeblair: I see. We could scrape log files instead.
17:54:32 <jeblair> davidkranz: also, there's a data-cleanliness issue, in that we run tests for proposed as well as merged changes, and also periodic;  you're probably not so concerned with which tests are failing on proposed changes?
17:54:51 <davidkranz> jeblair: Right.
17:55:34 <davidkranz> jeblair: If we could start with job-level failure rate it would be helpful.
17:55:53 <davidkranz> jeblair: If load or other infrastructure issues are a problem it causes random tests to fail.
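The job-level tracking davidkranz asks for needs nothing from Jenkins beyond per-run pass/fail results scraped from logs. A minimal sketch over hypothetical (date, status) records, not any real scraper:

```python
from collections import defaultdict


def daily_failure_rate(runs):
    """runs: iterable of (date_string, status) pairs where status is
    'SUCCESS' or 'FAILURE'. Returns {date: fraction of failed runs}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for date, status in runs:
        totals[date] += 1
        if status == "FAILURE":
            failures[date] += 1
    return {d: failures[d] / totals[d] for d in totals}


# Hypothetical scraped results for a full-tempest job
runs = [
    ("2013-01-15", "SUCCESS"), ("2013-01-15", "FAILURE"),
    ("2013-01-16", "SUCCESS"), ("2013-01-16", "SUCCESS"),
    ("2013-01-16", "FAILURE"), ("2013-01-16", "FAILURE"),
]
print(daily_failure_rate(runs))
# -> {'2013-01-15': 0.5, '2013-01-16': 0.5}
```

Filtering the input to only post-merge and periodic runs would address the data-cleanliness concern jeblair raised about proposed-change results.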
17:56:22 <jeblair> okay, i think the meeting's almost over, so let's brainstorm about that.
17:56:39 <davidkranz> jeblair: Thanks. That sounds good.
17:57:03 <davidkranz> Anything else before closing the meeting?
17:57:33 <davidkranz> OK, see you all next week.
17:57:37 * afazekas NO
17:58:03 <davidkranz> or on #openstack-qa
17:58:07 <davidkranz> #endmeeting