17:02:39 #startmeeting qa
17:02:40 Meeting started Thu Aug 30 17:02:39 2012 UTC. The chair is jaypipes. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:02:41 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:02:42 The meeting name has been set to 'qa'
17:03:02 Welcome back, Jay!
17:03:03 dwalleck, davidkranz, Ravikumar_hp, torandu: yo
17:03:07 howdy!
17:03:14 Glad to have you back
17:03:28 hi Jay
17:03:30 jaypipes: Lots of action while you were gone :)
17:04:04 davidkranz: trying to catch up :)
17:04:09 We have an issue with flakey tests.
17:04:10 yup, re-wrote the whole thing in perl :D
17:04:18 dwalleck: lol
17:04:36 davidkranz: can you summarize for me what you've found (re: shaky tests)
17:04:42 If you look at last week's failures of tempest gate there is a bunch of inexplicable stuff
17:04:52 Just a sec..
17:05:16 I know some of it was servers going into error
17:05:49 Here is a selection: http://paste.openstack.org/show/20586/
17:06:21 There are also the three issues of last week that I submitted skips for but still need to be investigated.
17:06:29 The flavors ones were a bug that I reported
17:06:36 kk, looking
17:06:46 dwalleck: The keypair one really bothers me.
17:07:07 dwalleck: Also the one with the reference to the %xxx% template variable. Seems like some kind of infrastructure issues.
17:07:28 guys, are these errors really due to changes in core projects that are breaking tempest because we are only gating upstream core projects on smoke tests, not whole suite?
17:07:38 I mean Could not parse rfc1738 URL from string '%COMPUTE_DB_URI%'
17:07:57 jaypipes: I'm not sure because they are flakey.
17:08:09 jaypipes: There are no tests that always fail.
17:08:51 davidkranz: well, the COMPUTE_DB_URI one should always fail or never fail... that is surprising if it's flaky
17:08:54 the flavors one was definitely an upstream issue
17:09:12 davidkranz: because that is entirely due to the devstack/tools/configure_tempest.sh being created from devstack-gate
17:09:36 no idea on the keypair one... looks like the behaviour changed upstream and broke us
17:09:58 jaypipes: But it is flakey..
17:10:00 * jaypipes thinks this kind of thing points to the need to have upstream projects gated on the whole tempest run...
17:10:29 flakey is okay if it's still a bug. It just means it's an intermittent/race condition defect
17:10:29 davidkranz: the COMPUTE_DB_URI one? Could you provide a link to the log files for a job that failed b/c of that error
17:10:36 Which are good ones to catch!
17:11:06 fyi, i introduced a bug in devstack-gate that caused old commits of some projects to be used between Aug 29, 2012 8:35 AM and Aug 29, 2012 2:51 PM
17:11:15 (pacific time)
17:11:17 jaypipes: http://logs.openstack.org/12048/1/gate/gate-tempest-devstack-vm/9641/console.html
17:11:56 gimme a sec to read through this....
17:12:05 i think the things you're talking about are mostly outside that range, but just in case.
17:12:13 (also, sorry :( )
17:12:27 :), no worries jeblair
17:13:27 jeblair: These things happen.
17:13:40 jeblair: It would be good though if the logs had a timestamp at the top.
17:13:55 jeblair: I mean with a real date.
17:14:14 davidkranz: OK, so the COMPUTE_DB_URI should have been replaced in the devstack/tools/configure_tempest.sh script.
17:14:42 davidkranz: good point. you can check the file date for now, but i'll work on adding that.
17:14:44 davidkranz: one sec, finding devstack patch that is matched to that
17:14:51 jaypipes: I looked at that stuff for a bug but didn't find one.
17:15:52 davidkranz: https://review.openstack.org/#/c/10659/3/tools/configure_tempest.sh
17:16:03 davidkranz: that COMPUTE_DB_URI should be fixed now... for good.
17:16:34 davidkranz: that failure should have only occurred in the time between the patch for whitebox tests in tempest and the correlated devstack change that added the whitebox vars to the tempest config
17:17:38 cripes, I'm a bit late,
17:17:39 but I'm here
17:18:22 jaypipes: OK, I don't remember the date of that log but let's assume that is OK until proven otherwise.
17:18:39 davidkranz: kk
17:19:11 So that leaves us with the keypair one.
17:19:34 sorry, had a drive by abduction, back
17:20:47 davidkranz: I can look into the keypair one today.
17:21:00 if I can get through this craptop of emails... :(
17:21:07 OK, great. We should create a page in etherpad or wiki where people can paste these kinds of failures.
17:21:31 It is too easy to see a failure and just do a 'recheck'
17:21:58 yep
17:22:14 we also need to discuss whether running the full suite should be the gate instead of just smoke...
17:22:24 because I have a feeling that is what is biting us here...
17:22:30 jaypipes: ++
17:22:37 you mean for nova's gate?
17:22:40 Though it is good to see if things break
17:22:42 jaypipes: Probably so.
17:23:04 jaypipes: We could skip the really slow tests for that gate.
17:23:08 Perhaps not as a gate job, it might still be a good idea to run the full suite daily
17:23:08 jaypipes: Total execution time of tests should be as low as possible, meaning more optimizations
17:23:24 dwalleck: +1
17:23:38 there is a new feature in zuul that allows you to run a test as non voting
17:23:49 dansmith: I mean for all the core projects.
17:23:59 jaypipes: yeah, I vote for that.. spread the pain around a bit :D
17:24:37 jaypipes: I think that having a third group of tests that are "long and boring" that don't gate might be suitable, but unless other projects are gated the same way,
17:24:46 tempest seems like it's just going to continue to be broken
17:25:11 clarkb: That would be great. Blocking a bad checkin is good but the critical thing is tying a failure to a particular change which we can't do now except for the real gate.
17:25:22 dansmith: well, the issue is, anything that isn't gated can break Tempest... for instance, if an API changes, and the API call isn't in smoke tests, tempest's own gate (which runs all tempest suite) will break...
17:25:37 clarkb: ++
17:25:46 at a minimum, that would be very useful
17:25:56 davidkranz: yes it would allow you to run the full tempest suite against a change, results would be reported, but won't actually count for or against that change
17:26:01 jaypipes: ++
17:26:14 clarkb: yeah, that sounds reasonable
17:26:45 A non-gating full run and moving some more tests into the gate would be a good start.
17:26:52 yeah
17:27:03 maybe,
17:27:28 we make the non-gating full run shame the submitter with verbal insults so they feel bad...
17:27:53 In the long run we want a full tempest run to take hours so we will have to draw the line somewhere in any event.
17:28:37 davidkranz: If it takes multiple hours, we've done something very wrong
17:29:15 dwalleck: I meant taking that long because there is eventually 20 core projects with negative fuzz testing blah blah blah.
17:29:23 davidkranz: I'd classify such tests as longevity tests, which are important to OpenStack too
17:29:24 ahh, gotcha
17:29:54 davidkranz: ok
17:29:54 rohitk: I didn't mean longevity tests, just good coverage of a huge number of apis.
17:30:26 davidkranz: ++, which leads us to 'what happened to the fuzz client?? ' :)
17:30:42 indeed.
17:30:46 jaypipes: This came up at a meeting while you were away.
17:30:55 jaypipes: You seemed to be the one with the most info.
17:31:28 davidkranz: I was only the one that proposed it :)
17:31:44 clarkb: Can you set up the non-gating tempest run for all projects?
17:32:06 jaypipes: there was someone who volunteered to introduce randgen, but we haven't reached there
17:32:09 davidkranz: the idea is to have a fuzz testing tool like randgen take the Compute API grammar and fire random/bad crap at the API and verify no holes and expected error returns
17:32:18 jeblair: ^
17:32:21 rohitk: pcrews on mtaylor's CI team
17:32:32 jaypipes: ok
17:32:37 jaypipes: Right. The question was whether anyone was actually working on that or not.
17:32:51 davidkranz: not that I know of.
17:33:01 davidkranz: it just hasn't been a priority.
17:33:02 if there is github ongoing project/code, people could contribute and accelerate
17:33:16 davidkranz: all I know is the negative tests added were not scalable or maintainable IMHO
17:33:28 rohitk: agreed, just don't have the resources right now.
17:33:28 now that I think about it more one concern would be it would add a third devstack-tempest run to each change
17:33:31 jaypipes: We all agreed with that.
17:33:54 (we are currently transitioning from having one that uses cinder to two: one with cinder and one with nova volume)
17:34:36 davidkranz: I would want jeblair to provide some feedback before implementing anything
17:34:48 clarkb: Sure.
17:37:19 jaypipes: Where is the "Compute API grammar"?
17:38:01 davidkranz: we need to make it.
17:38:11 jaypipes: :)
17:38:15 davidkranz: not that hard given decent API docs
17:40:45 Anything else for now?
17:40:48 ok, y'all, anything urgent?
17:40:52 * dwalleck actually understands the admin api now...a bit
17:41:00 not from em... needs to get back to emails... and reviews
17:41:03 me
17:41:23 ditto.
17:41:32 OK, I guess that is it for today.
17:41:37 done here too
17:41:58 kk
17:42:00 #endmeeting
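
Editor's sketch (not part of the meeting log): the fuzz-testing idea raised at 17:32:09, where a tool takes a grammar for the Compute API, fires random or malformed requests at it, and verifies that only expected error codes come back, might look roughly like the following. Everything here is an assumption made for illustration: the field names in GRAMMAR, the list of bad values, and the fake_create_server stand-in are hypothetical, and none of it reflects randgen or any grammar the team actually wrote (per the log, no such grammar existed yet).

```python
import random

# Hypothetical, hand-written grammar for a single request body.
# Each field maps to a generator that produces a valid value.
GRAMMAR = {
    "name": lambda rng: "server-%d" % rng.randint(0, 999),
    "imageRef": lambda rng: str(rng.randint(1, 50)),
    "flavorRef": lambda rng: str(rng.randint(1, 5)),
}

# Illustrative values that a well-behaved API should reject with a 4xx.
BAD_VALUES = [None, "", -1, "x" * 10000, {"nested": []}]


def valid_request(rng):
    """Build a request body that conforms to GRAMMAR."""
    return {field: gen(rng) for field, gen in GRAMMAR.items()}


def fuzzed_request(rng):
    """Drop or corrupt exactly one field of an otherwise valid request."""
    body = valid_request(rng)
    field = rng.choice(sorted(body))
    if rng.random() < 0.5:
        del body[field]
    else:
        body[field] = rng.choice(BAD_VALUES)
    return body


def fake_create_server(body):
    """Stand-in for a real API call; returns an HTTP-like status code."""
    for field in GRAMMAR:
        value = body.get(field)
        if not isinstance(value, str) or not value or len(value) > 255:
            return 400
    return 202


def run_fuzz(send, iterations=200, seed=0):
    """Send fuzzed bodies; collect any that did not produce a 4xx status."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        body = fuzzed_request(rng)
        status = send(body)
        if not 400 <= status < 500:
            failures.append((body, status))
    return failures


if __name__ == "__main__":
    print("unexpected responses:", len(run_fuzz(fake_create_server)))
```

Passing a fixed seed keeps a failing run reproducible, which bears on the concern in the log about tying an intermittent failure to a particular change. In a real setup the send callable would wrap an actual API client rather than the fake handler used here.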