17:00:47 #startmeeting qa
17:00:48 Meeting started Thu Dec 13 17:00:47 2012 UTC. The chair is jaypipes. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:49 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:51 The meeting name has been set to 'qa'
17:01:30 Hi Jay.
17:01:58 #topic Proposal to add Attila (afazekas) to QA core
17:02:20 I'd like to recognize afazekas's work on Tempest over the past couple of months
17:02:31 and am proposing his inclusion into qa-core
17:02:51 I figure it's good enough to do an informal vote here
17:03:05 jaypipes: +1
17:03:16 sounds good +1
17:03:23 If you agree with the proposal, please say so. If you have reservations or would like to hold off, please also say so
17:03:26 +1 yes. afazekas provides valuable review feedback
17:04:12 * jaypipes has been impressed with afazekas's ability to decipher the relationships between tempest, devstack and devstack-gate, which can be tricky!
17:05:08 OK, well, I'll take that as tentative agreement on Attila. I'll send a note to the QA list this afternoon asking for more feedback, and if I receive none by end of today, I will add afazekas to qa-core
17:05:22 alrighty, next topic...
17:05:29 #topic Outstanding reviews
17:05:37 #link https://review.openstack.org/#q,status:open+project:openstack/tempest,n,z
17:05:43 We will work from the bottom up.
17:06:04 fungi, mordred: https://review.openstack.org/#/c/17063/
17:06:27 fungi, mordred: any progress on this, or thoughts about how to proceed toward getting Tempest installable via normal Python means?
17:06:59 fungi, mordred: last I checked, there were issues because tempest had a tempest/openstack.py module whose name interfered with the tempest.openstack.common inclusion?
17:06:59 i'm not entirely sure what issues mordred ran into trying to add that
17:07:15 ahh, right, namespace problems
17:07:19 jaypipes: that is correct - and I have not yet had time to sort that out
17:07:50 mordred: OK, well it has a negative review and will be auto-expired in the next couple of days, I believe...
17:08:05 mordred: if you want to shelve it, could you mark it Work In Progress?
17:08:08 A similar change was added to tempest; I did not see the concurrent attempt
17:08:10 jaypipes: yah
17:08:21 mordred: also, a blueprint or bug would be great ;)
17:08:45 the issue with that patch is that we have an openstack.py and an openstack folder in the same location
17:08:46 afazekas: I think the review above is about pulling in the Oslo (openstack-common) packaging help
17:08:51 afazekas: right.
17:09:15 afazekas: so it will take some effort to rename the openstack.py module, since it's used virtually everywhere ;)
17:09:44 afazekas: if you're interested, mordred and fungi can provide some insight into their overall direction on that work
17:10:08 yes.
17:10:35 ok, anything more to add on that review?
17:11:07 I am waiting for the setup.py change to be merged
17:11:14 #action mordred to mark https://review.openstack.org/#/c/17063/ Work in Progress
17:11:38 #action afazekas to work with mordred and fungi on setup.py normalization with openstack-common
17:12:02 alright, next one...
17:12:04 #link https://review.openstack.org/#/c/17829/
17:12:09 Ravikumar_hp: you're up!
17:12:14 jaypipes: object expiry test case - resubmitted incorporating review feedback. Hoping it is merged this week.
17:12:18 jaypipes: I think this just needs a review.
17:12:19 this is the swift object expiry test
17:12:30 i will follow up with sdague to get it reviewed
17:12:37 also afazekas
17:12:52 Ravikumar_hp: good. I will give it a stab this afternoon as well
17:13:28 Ravikumar_hp: it looked like the initial concerns from sdague were addressed with a shorter sleep(5) call?
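The collision described above (an openstack.py module and an openstack/ package side by side) can be reproduced in miniature. This is an illustrative sketch only: the `demo` package and `WHO` attribute are invented stand-ins, not the real tempest layout. Python's import system prefers the package over the same-named module, so the module is silently shadowed:

```python
# Reproduce a module/package name collision like tempest/openstack.py
# vs. tempest/openstack/ -- all names here are invented for the demo.
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo")
os.makedirs(os.path.join(pkg, "openstack", "common"))
open(os.path.join(pkg, "__init__.py"), "w").close()

# The standalone module -- the analogue of tempest/openstack.py
with open(os.path.join(pkg, "openstack.py"), "w") as f:
    f.write("WHO = 'module'\n")

# The package -- the analogue of the incoming tempest.openstack.common tree
open(os.path.join(pkg, "openstack", "__init__.py"), "w").close()
open(os.path.join(pkg, "openstack", "common", "__init__.py"), "w").close()

sys.path.insert(0, root)
import demo.openstack

# The package wins the import; openstack.py never loads, so code that
# expected the module's attributes breaks.
print(hasattr(demo.openstack, "WHO"))
```

Renaming the module (as suggested later in the meeting) is the usual way out, since the package name is fixed by the shared openstack-common code.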
17:13:36 yes
17:13:51 Ravikumar_hp: Other than that, the test is essentially skipped until bug 1069849 is resolved, right?
17:13:52 Launchpad bug 1069849 in swift "Containers show expired objects" [Undecided,In progress] https://launchpad.net/bugs/1069849
17:14:01 yes
17:14:16 also it is not a gated smoke test
17:14:21 right, noted
17:15:00 Ravikumar_hp: anything more on that one?
17:15:09 no . thanks
17:15:12 np :)
17:15:17 #link https://review.openstack.org/#/c/18035/
17:15:38 mtreinish: ping
17:15:42 jaypipes: This failed with our friendly flaky server failure. Just rechecked.
17:15:48 anyone know Jaroslav Henner's IRC nick?
17:16:19 davidkranz: which one? https://review.openstack.org/#/c/18035/ ?
17:16:28 jaypipes: Yes.
17:16:55 davidkranz: hmm, that's odd... it's the glanceclient test.
17:17:27 maurosr: ping
17:17:37 maurosr: you had some concerns on https://review.openstack.org/#/c/18035/
17:17:53 maurosr: and I wanted to make sure you had your questions answered.
17:18:11 maurosr: what I believe Jaroslav is doing is correct, though a bit obtuse
17:18:50 maurosr: the set() <= set() operation is checking whether the IDs of the added images are all contained in the set of "current images" from the image list call
17:19:26 dwalleck: welcome back ;) gotta love VPNs.
17:19:46 yup
17:19:52 OK, well it looks like the folks who need to talk about https://review.openstack.org/#/c/18035/ aren't around... so we'll move on.
17:20:06 jaypipes: right.. got it now, should have tested it before..
17:20:19 maurosr: no worries!
17:20:30 #link https://review.openstack.org/#/c/18030/
17:20:55 I tend to agree with mtreinish about https://review.openstack.org/#/c/18030/. The XML to JSON (and JSON to XML) stuff there is very fragile
17:21:09 and I'm not sure that the proposed solution really solves the bug properly.
17:21:36 davidkranz, dwalleck, afazekas, sdague: if you could give a review on https://review.openstack.org/#/c/18030/, that would be great. It's an XML output vs.
JSON output mismatch issue.
17:22:07 mnewby: around?
17:22:12 http://docs.openstack.org/compute/api/v1.1 contains the API version
17:23:14 probably this part should be different with a different compute api version
17:23:42 afazekas: sorry, I'm not following you...
17:23:58 afazekas: That link does not exist
17:24:34 but it is in the xml
17:24:57 and we might have multiple api versions to support
17:25:25 Oh, I see what you're saying....
17:25:58 looks like the [compute] section does not have an api_version option
17:26:28 afazekas: I think, though, in the case of this review, it's a matter of the XML translation in the volume_extensions rest client not being correct
17:28:23 OK, well, let's move on...
17:28:45 The remaining reviews seem to be just waiting on a successful tempest gate run, so we can move on to other topics.
17:28:50 #topic Open Discussion
17:29:05 Please feel free to bring up issues now
17:29:28 jaypipes: any ideas on parallel execution or test tools?
17:29:36 dwalleck_ and others: Has anyone been able to make progress on parallelization?
17:29:43 Ravikumar_hp: lol, you beat me to it :)
17:30:01 jaypipes: yes, I've had success with a few different options
17:30:24 dwalleck_: please do tell! :)
17:30:26 dwalleck_: Enough to make a recommendation?
17:30:39 The first step for all of them, though, requires ripping out all of our nose tags/imports
17:31:34 The second is refactoring some of our tests to be more efficient when run in parallel (which goes back to the patch I unsubmitted/need to submit again)
17:32:16 Even something as simple as writing a short python script to gather the tests and spin them up in threads/processes/greenthreads works
17:32:48 dwalleck_: the patch that breaks out some of the tests into smaller tests?
17:32:53 dwalleck_: the server actions, etc?
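The "short python script to gather the tests and spin them up in threads" idea dwalleck_ mentions can be sketched roughly as below. The test classes here are fabricated placeholders (real tempest classes would be loaded instead), and sleeps stand in for slow API calls; it is an illustration of the approach, not the eventual implementation:

```python
# Minimal sketch of a thread-per-test-class runner. The Fake* classes
# and their sleeps are invented stand-ins for real, slow test classes.
import io
import threading
import time
import unittest


class FakeServerActionsTest(unittest.TestCase):
    def test_reboot(self):
        time.sleep(0.3)  # pretend this is a slow compute API call


class FakeImagesTest(unittest.TestCase):
    def test_list(self):
        time.sleep(0.3)


def run_suite(cls, results):
    """Load one TestCase class and run it, recording pass/fail."""
    suite = unittest.TestLoader().loadTestsFromTestCase(cls)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    results[cls.__name__] = runner.run(suite).wasSuccessful()


results = {}
threads = [threading.Thread(target=run_suite, args=(cls, results))
           for cls in (FakeServerActionsTest, FakeImagesTest)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# Wall time tracks the slowest class rather than the sum of all of
# them -- which is also why one very long class caps the speedup.
print(results, round(elapsed, 1))
```

The granularity is the test class, which is exactly why the discussion turns to breaking up the longest-running classes: total wall time can never drop below the slowest unit handed to a worker.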
17:33:20 The problem we're going to run into isn't resource constraints on the test server side; it's the fact that even though you run everything in parallel, the tests will still take as long to run as the longest running test class/module
17:33:22 yes
17:33:37 So without that, the benefits are there, but minor
17:34:33 dwalleck_: well, just getting to the state where tempest only takes as long as its longest test would be a huge accomplishment!
17:34:39 The only way to get around it would be to "fix" the type of parallelization nose allows, but there's a good reason no one else does it: it's tricky to implement right
17:34:41 dwalleck_: Why are they minor? If the longest test takes 1 minute, that's great.
17:34:54 davidkranz: heck, if it takes 5 minutes, great ;)
17:35:16 davidkranz: The longest running test class. In this case, that's test server actions, which alone takes well over 10 min
17:35:28 I think the big loss right now is failing to overlap actual test cases with waiting for servers.
17:35:33 And it got much longer when I added admin actions
17:35:43 dwalleck_: So if we just break up that test we should be pretty good.
17:36:02 Unfortunately we are limited in the number of VMs we can run on a single machine at the same time
17:36:16 Yeah, that will be about as good as it gets without resorting to other tricks such as pre-building VMs for certain tests
17:36:25 dwalleck_: We just need to make sure we don't sequentially allocate servers in a single test.
17:36:52 afazekas: well, sure, but we aren't really hitting those issues yet... at least as far as total runtime of tempest goes. We're still a serialized execution :(
17:37:33 afazekas: If we are parallel, we can just throttle vm creation and make tests wait.
17:37:38 davidkranz: You can even do that (I think we do it already in the list servers test).
You just don't start waiting until you've created all the servers you want; then you start waiting
17:37:56 jaypipes: I am speaking about what can happen if we start every case in parallel.
17:38:03 afazekas: ah, yes indeed.
17:38:20 So if no one minds me making a WIP branch, I can rip all the nose stuff out and show an example of what this could look like
17:38:33 dwalleck_: go for it.
17:38:43 I think we're going to need a lot more discussion, but it gives us a starting point
17:39:04 dwalleck_: Yes. But once we parallelize the waiting, it is not a problem any more.
17:39:14 sounds good
17:39:16 dwalleck_: We become limited by the resource limit.
17:39:30 dwalleck_: Which is the best we can do.
17:39:45 davidkranz: The resource limit can be worked around as well. Quotas are very easy to manipulate
17:39:49 something else that could significantly improve performance: only do a single setup for ListXXX tests, and then execute the XML *and* JSON clients against those fixtures. Right now, we do a setUpClass() creating servers for both XML and JSON when that isn't necessary for list tests
17:39:54 dwalleck_: It would be great to have a CSV of how much time is spent in every method
17:40:28 afazekas: I had that somewhere (you can get the same thing by using the --with-xunit option with nose)
17:40:34 afazekas: If you mean every test case, you can get XML from a nosetests option
17:41:26 dwalleck_: unfortunately, the xunit output only includes the tests themselves, whereas a large portion of time is spent in setUpClass and tearDownClass, and those are not represented in the timings
17:41:30 But the problem is that it doesn't include the time spent in fixtures, so I did some instrumentation to get stats on how many things we build and how long we spend waiting. Those numbers seem to be the crux of the problem anyway
17:41:37 right
17:41:38 dwalleck_: :) right.
17:42:09 dwalleck_: Sounds like you have a good handle on this. That's great.
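The "create everything first, then wait" pattern dwalleck_ describes can be shown with a stubbed client. `create_server` / `wait_for_status` are hypothetical names chosen for the sketch, not the real tempest rest client API, and build time is simulated:

```python
# Sketch of overlapping server builds: issue all creates up front,
# then poll each server. The client below is a stub -- each "server"
# simply becomes ACTIVE 0.2s after its create call.
import time


class FakeClient:
    BUILD_TIME = 0.2  # simulated seconds until a server goes ACTIVE

    def create_server(self, name):
        # A create call returns immediately; the build runs "in the
        # background" (here, modeled by the creation timestamp).
        return {"id": name, "created_at": time.time()}

    def wait_for_status(self, server, build_interval=0.05):
        # Poll until the server's simulated build time has elapsed.
        while time.time() - server["created_at"] < self.BUILD_TIME:
            time.sleep(build_interval)


client = FakeClient()
start = time.time()

# Create all five servers first -- no waiting between creates...
servers = [client.create_server("s%d" % i) for i in range(5)]

# ...then wait. The builds overlap, so total wall time is ~one build
# (~0.2s) instead of five sequential builds (~1.0s).
for s in servers:
    client.wait_for_status(s)

elapsed = time.time() - start
print(round(elapsed, 1))
```

The same reasoning is why a sequential create-then-wait inside a single test is the pattern to avoid: it serializes exactly the part of the run that could be overlapped for free.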
17:42:09 dwalleck_: I've found that reducing the build_interval from 10 to 3 speeds things up significantly.
17:42:10 And if my devstack environment will play nicely, I can get that. Having some odd issues with the tests hanging on deletion of floating IPs
17:42:34 But I'd be glad to share those numbers once I have them
17:42:41 cool.
17:42:58 jaypipes: In the devstack case, definitely. Since the servers build so fast, it makes sense to check more often
17:43:05 Done with this topic?
17:43:41 I have been thinking about fuzz testing.
17:44:02 Is anyone actually working on that?
17:44:28 matt has been doing some with his team. Not sure about his progress though
17:44:48 dwalleck_: matt?
17:44:53 dwalleck_: in our CI cluster, our servers build out in about 6 seconds, on average...
17:45:00 I'll poke him to make an appearance or at least send an email out
17:45:10 dwalleck_: which means setting it to 3 is usually a good target
17:45:30 davidkranz: matt tesauro, the app sec guy who was with my group at the conference
17:45:42 dwalleck_: Got it.
17:45:57 dwalleck_: I've found that deleting and waiting for a converged server delete can take as much time as launching...
17:48:03 anybody have anything more to bring up? davidkranz, I have not seen any response to my ML post about the flaky test failures :( other than sdague's response about the DNS fixes.
17:48:17 perhaps I should word it differently? or send it to a different group?
17:48:51 jaypipes: Not sure. It just seems like we see this as more important than the nova group does.
17:49:10 jaypipes: We could turn on the gate :)
17:50:04 jaypipes: I was also thinking of trying the stress tests again.
17:50:26 jaypipes: As soon as we get the bogus ERRORs out of the logs.
17:51:02 jaypipes: If the problem is that machines outside of the CI environment are too fast, stress might show the problem better.
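The build_interval being tuned above is the sleep in a status-poll loop of roughly the following shape. This is a generic sketch, not tempest's actual helper; `get_status` is a caller-supplied function. With servers building in ~6 seconds, a 10-second interval can overshoot a finished build by several seconds per server, which is why polling every 3 seconds pays off:

```python
# Generic status-polling loop of the kind build_interval configures.
# get_status is any zero-argument callable returning the current status.
import time


def wait_for(get_status, wanted="ACTIVE", build_interval=3, timeout=300):
    """Poll get_status() every build_interval seconds until it returns
    wanted; raise if timeout elapses first. A smaller build_interval
    notices fast builds sooner, at the cost of more API calls."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_status() == wanted:
            return True
        time.sleep(build_interval)
    raise RuntimeError("timed out waiting for status %s" % wanted)


# Simulated status sequence: two polls see BUILD, the third sees ACTIVE.
statuses = iter(["BUILD", "BUILD", "ACTIVE"])
print(wait_for(lambda: next(statuses), build_interval=0.01))
```

The trade-off cuts both ways: against a loaded public endpoint, a very small interval just turns waiting time into extra API load, so the right value depends on typical build time for the environment.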
17:51:07 davidkranz: well, let's do this: let's work on the ERROR crap and clean up the tempest output (glanceclient logging, etc) so it's easier for nova devs to work with us in diagnosing the race conditions we see sometimes, and then I'll send another post begging for help
17:51:38 jaypipes: ++ All the ERROR crap is encapsulated in nova bugs I filed.
17:52:11 I wonder how someone will use OpenStack's json/xml REST API if they can't read python (OpenStack's documentation is the source code). For example, a typical java or C coder does not know python.
17:52:50 davidkranz: excellent.
17:53:06 afazekas: it's a problem, true.
17:53:31 afazekas: I believe at this point the best solution is to do the following:
17:53:47 a) Identify *very specific questions* that are unanswered or vague in the docs
17:53:57 afazekas: A good start would be published docstrings for the python API.
17:54:07 b) Bring the specific issue to the attention of annegentle and the PTL for the project
17:54:15 c) File a bug, tagged with doc-impact
17:54:22 d) Rinse and repeat
17:55:06 jaypipes: Part of the issue is that in novaclient, for example, there are good docstrings for the cli but not the python API.
17:55:35 jaypipes: You have to read the code to use the API, but not to use the cli.
17:55:54 davidkranz: yep.
17:55:58 jaypipes: I was actually doing that today :)
17:57:15 OK all, going to end the meeting now. Please send last-minute questions or feedback as posts to the QA mailing list (I just sent out an ML post about the afazekas nomination to qa-core)
17:57:18 jaypipes: hi
17:57:27 Good night/day/morning/weekend ;)
17:57:32 mnewby: ! there you are...
17:57:49 mnewby: closing the meeting right now, but let's go to #openstack-dev to chat about the quantum test review
17:57:53 #endmeeting