17:00:25 #startmeeting
17:00:26 Meeting started Thu Jun 21 17:00:25 2012 UTC. The chair is jaypipes. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:31 hi!
17:00:35 Hello!
17:01:10 davidkranz_: around?
17:01:25 Here now.
17:02:05 heya
17:02:08 jaypipes: You saw that Daryl can't make it, right?
17:02:13 yeah
17:02:45 We're pretty much all bowing out this week, meeting conflicts and the like.
17:02:56 so I'm getting increasingly frustrated with running tempest with multiple parallel processes...
17:03:11 jaypipes: Anything I could help with?
17:03:16 trying to diagnose Nova timeouts is getting very annoying.
17:03:28 davidkranz: I'm not sure :(
17:04:21 davidkranz: the problem is that when I run the tests without --processes, it takes a while but eventually completes. If I run with --processes=8 (my box is a 12-core machine), after a while I start seeing RPC timeouts in networking. and then shit starts snowballing after that.
17:05:15 jaypipes: Hmm.
17:05:27 davidkranz: and it's not an issue with quotas, because I've made the base compute test class create its own tenant/user for its testing.
17:05:39 davidkranz: I'm wondering if Nova just can't keep up with it.
17:05:57 jaypipes: That's what I was thinking. Are you using a single devstack node?
17:05:58 davidkranz: so I will reset my env (yet again) and try with processes=2 instead, and see if things work better.
17:06:17 davidkranz: yup, but frankly, the box has 12 cores and 24G of memory... should NOT be an issue
17:06:24 jaypipes: We need to separate nova stress issues from tempest parallelization issues.
17:07:02 davidkranz: I'm actually not trying to stress Nova! :) Just trying to run the tempest test suite in a shorter amount of time with parallel processes
17:07:10 jaypipes: The number of cores might not matter if there is only one api server or nova-network server.
17:07:36 davidkranz: right.
17:07:42 jaypipes: You may not be trying to stress, but if you are running 8 copies of tempest then you are!
17:08:11 davidkranz: here's the kicker, though: when tempest starts crawling (after these RPC timeouts), doing a virsh list --all hangs indefinitely. So I think this may actually be a libvirt issue.
17:08:39 davidkranz: not running 8 copies of tempest... just one copy of tempest, with all the tests split across 8 processes.
17:08:53 jaypipes: I will try running against a multi-node system with nova-network running on all compute nodes.
17:09:16 davidkranz: k, I will push my code then for you to pull.
17:10:07 jaypipes: 8 copies of one with 8 processes still puts the same amount of transient stress, just for shorter duration overall
17:10:34 jaypipes: I meant "or" one with 8 processes.
17:10:41 sure
17:10:56 but that doesn't explain libvirt/QEMU hanging. :(
17:11:23 I'm going to chat with vish about the libvirt non-blocking mode patch that is currently in the queue to see if that might help
17:11:52 jaypipes: It is also possible this is a result of some post-essex regression.
17:12:17 yeah
17:12:22 jaypipes: I think I should work on getting some version of the stress tests into a job that runs every night.
17:13:04 davidkranz: that would be good, yes.
17:13:28 davidkranz: problem is, running stress tests on a 4G VM in the CI environment isn't particularly useful in reporting real errors...
17:13:59 jaypipes: Yes, we need a real cluster for that. We will also need a real cluster when there are real performance tests.
17:14:09 indeed
17:14:53 well, besides me bitching about this, are there particular topics we need to discuss this week?
17:15:25 jaypipes: Just what we should do about the resource thing.
17:15:40 jaypipes: Daryl seemed to think it was related to something you were working on.
17:16:16 davidkranz: well, the original patch I put together for the refactoring of smoke tests did have a resource manager in the base test classes.
17:16:21 I believe that is what he means
17:16:48 jaypipes: OK. Perhaps you can comment on the email I sent outlining my "counter-proposal".
17:17:04 davidkranz: I will, yes
17:17:11 jaypipes: Great.
17:17:24 jaypipes: Are we taking a direction on the negative tests re-factor? I've submitted a lot of negative tests, how can I help?
17:17:33 jaypipes: Nothing else that I know of at the moment.
17:18:07 1. Identifying overlaps in unit tests
17:18:18 rohitk: we are not adding any more negative tests at this point. Instead, we are looking at using a grammar-based fuzz testing tool like randgen to do negative API testing
17:18:39 jaypipes: hmmm
17:19:07 jaypipes: The randgen would do negative API (blackbox) testing
17:19:41 rohitk: correct.
17:19:51 rohitk: although so do the unit tests mostly.
17:19:53 jaypipes: I think that would depend on the FuzzClientManager
17:19:54 ?
17:20:01 yes.
17:20:45 jaypipes: ok, I'll look up the randgen LP link that you put up in the e-mail
17:20:49 thanks!
17:20:59 rohitk: basically, the recent addition of so many negative test cases has made tempest run about 200% longer than before, and we need to find a better, faster strategy instead of adding a test method for every possible negative iteration
17:21:30 jaypipes: totally agree, there is little value in making tests unnecessarily run longer
17:22:07 jaypipes: As soon as you push your code I will give it a try.
17:22:17 davidkranz: k, thx
17:23:07 * jaypipes wishes there were 30 hours in a day... :(
17:23:55 alright... JoseSwiftQE, any update on swift?
17:25:19 jaypipes: I've also tracked updates on the Bugs filed for the Skipped tests, I'll wear the SkipCaptain hat for cleaning those up
17:25:30 jaypipes: libvirt hang: is it on oneiric?
17:25:44 vishy: yep
17:25:54 jaypipes: No changes since last meeting. Just waiting for reviews.
17:25:57 rohitk: thx. where are you keeping track of that stuff?
17:26:02 JoseSwiftQE: k, thx
17:26:10 jaypipes: it is a libvirt bug that has been discussed on the ml
17:26:26 I saw updates on many of the keystone bugs filed by myself, haven't tracked them at a place yet,
17:26:31 jaypipes: but will do
17:26:40 vishy: it's that RPC timeout thing... it's back. Whenever I run with --processes=X where X is >1
17:26:55 jaypipes: oh nm then
17:26:57 vishy: and libvirt just seems to hang and ERROR builds just pile up.
17:27:34 jaypipes: oh i have a good idea about that
17:27:42 vishy: do tell!
17:27:55 jaypipes: are you sure it is libvirt that is hanging?
17:28:32 vishy: if I do a virsh list --all, it hangs. doing ps aux |grep kvm shows a bunch of instances
17:28:42 jaypipes: it is probably this: http://www.gossamer-threads.com/lists/openstack/dev/8808?do=post_view_threaded#8808
17:28:51 jaypipes: solution: use precise :)
17:29:31 vishy: heh.
17:30:36 jaypipes: I think that explains it. Notice the comment from me in that thread. I have been using precise since April...
17:30:46 davidkranz: k.
17:30:58 I will try installing 12.04 then
17:31:14 dist-upgrade from oneiric to precise is a complete FAIL.
17:31:35 jaypipes: Good idea. Just beware that there are some incompatibilities with glance I ran into.
17:31:36 I'll pull another 12.04 iso and reinstall everything... ugh.
17:31:37 jaypipes: ++
17:31:45 davidkranz: what incompats?
17:32:08 jaypipes: It had to do with resyncing the database.
17:32:30 I don't remember the details. It was a while ago.
17:32:47 davidkranz: oh, k
17:32:50 jaypipes: And they may have been fixed. I was a guinea pig for 12.04 with Adam G.
17:32:55 heh
17:34:06 alright y'all, I'm going to head out and install 12.04. davidkranz could you type up a very brief summary to the ML?
17:34:14 #endmeeting