17:02:39 <jaypipes> #startmeeting qa
17:02:40 <openstack> Meeting started Thu Aug 30 17:02:39 2012 UTC.  The chair is jaypipes. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:02:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:02:42 <openstack> The meeting name has been set to 'qa'
17:03:02 <davidkranz> Welcome back, Jay!
17:03:03 <jaypipes> dwalleck, davidkranz, Ravikumar_hp, torandu: yo
17:03:07 <dwalleck> howdy!
17:03:14 <dwalleck> Glad to have you back
17:03:28 <Ravikumar_hp> hi Jay
17:03:30 <davidkranz> jaypipes: Lots of action while you were gone :)
17:04:04 <jaypipes> davidkranz: trying to catch up :)
17:04:09 <davidkranz> We have an issue with flakey tests.
17:04:10 <dwalleck> yup, re-wrote the whole thing in perl :D
17:04:18 <jaypipes> dwalleck: lol
17:04:36 <jaypipes> davidkranz: can you summarize for me what you've found (re: shaky tests)
17:04:42 <davidkranz> If you look at last week's failures of the tempest gate there is a bunch of inexplicable stuff
17:04:52 <davidkranz> Just a sec..
17:05:16 <dwalleck> I know some of it was servers going into error
17:05:49 <davidkranz> Here is a selection: http://paste.openstack.org/show/20586/
17:06:21 <davidkranz> There are also the three issues of last week that I submitted skips for but still need to be investigated.
17:06:29 <dwalleck> The flavors ones were a bug that I reported
17:06:36 <jaypipes> kk, looking
17:06:46 <davidkranz> dwalleck: The keypair one really bothers me.
17:07:07 <davidkranz> dwalleck: Also the one with the reference to the %xxx% template variable. Seems like some kind of infrastructure issues.
17:07:28 <jaypipes> guys, are these errors really due to changes in core projects that are breaking tempest because we are only gating upstream core projects on smoke tests, not the whole suite?
17:07:38 <davidkranz> I mean: Could not parse rfc1738 URL from string '%COMPUTE_DB_URI%'
17:07:57 <davidkranz> jaypipes: I'm not sure because they are flakey.
17:08:09 <davidkranz> jaypipes: There are no tests that always fail.
17:08:51 <jaypipes> davidkranz: well, the COMPUTE_DB_URI one should always fail or never fail... that is surprising if it's flaky
17:08:54 <dwalleck> the flavors one was definitely an upstream issue
17:09:12 <jaypipes> davidkranz: because that is entirely due to the devstack/tools/configure_tempest.sh being created from devstack-gate
17:09:36 <jaypipes> no idea on the keypair one... looks like the behaviour changed upstream and broke us
17:09:58 <davidkranz> jaypipes: But it is flakey..
17:10:00 * jaypipes thinks this kind of thing points to the need to have upstream projects gated on the whole tempest run...
17:10:29 <dwalleck> flakey is okay if it's still a bug. It just means it's an intermittent/race condition defect
17:10:29 <jaypipes> davidkranz: the COMPUTE_DB_URI one? Could you provide a link to the log files for a job that failed b/c of that error
17:10:36 <dwalleck> Which are good ones to catch!
17:11:06 <jeblair> fyi, i introduced a bug in devstack-gate that caused old commits of some projects to be used between Aug 29, 2012 8:35 AM and Aug 29, 2012 2:51 PM
17:11:15 <jeblair> (pacific time)
17:11:17 <davidkranz> jaypipes: http://logs.openstack.org/12048/1/gate/gate-tempest-devstack-vm/9641/console.html
17:11:56 <jaypipes> gimme a sec to read through this....
17:12:05 <jeblair> i think the things you're talking about are mostly outside that range, but just in case.
17:12:13 <jeblair> (also, sorry :( )
17:12:27 <jaypipes> :), no worries, jeblair
17:13:27 <davidkranz> jeblair: These things happen.
17:13:40 <davidkranz> jeblair: It would be good though if the logs had a timestamp at the top.
17:13:55 <davidkranz> jeblair: I mean with a real date.
17:14:14 <jaypipes> davidkranz: OK, so the COMPUTE_DB_URI should have been replaced in the devstack/tools/configure_tempest.sh script.
17:14:42 <jeblair> davidkranz: good point.  you can check the file date for now, but i'll work on adding that.
17:14:44 <jaypipes> davidkranz: one sec, finding devstack patch that is matched to that
17:14:51 <davidkranz> jaypipes: I looked at that stuff for a bug but didn't find one.
17:15:52 <jaypipes> davidkranz: https://review.openstack.org/#/c/10659/3/tools/configure_tempest.sh
17:16:03 <jaypipes> davidkranz: that COMPUTE_DB_URI should be fixed now... for good.
17:16:34 <jaypipes> davidkranz: that failure should have only occurred in the time between the patch for whitebox tests in tempest and the correlated devstack change that added the whitebox vars to the tempest config
17:17:38 <dansmith> cripes, I'm a bit late,
17:17:39 <dansmith> but I'm here
17:18:22 <davidkranz> jaypipes: OK, I don't remember the date of that log but let's assume that is OK until proven otherwise.
17:18:39 <jaypipes> davidkranz: kk
17:19:11 <davidkranz> So that leaves us with the keypair one.
17:19:34 <dwalleck> sorry, had a drive by abduction, back
17:20:47 <jaypipes> davidkranz: I can look into the keypair one today.
17:21:00 <jaypipes> if I can get through this craptop of emails... :(
17:21:07 <davidkranz> OK, great. We should create a page in etherpad or wiki where people can paste these kinds of failures.
17:21:31 <davidkranz> It is too easy to see a failure and just do a 'recheck'
17:21:58 <jaypipes> yep
17:22:14 <jaypipes> we also need to discuss whether running the full suite should be the gate instead of just smoke...
17:22:24 <jaypipes> because I have a feeling that is what is biting us here...
17:22:30 <dwalleck> jaypipes: ++
17:22:37 <dansmith> you mean for nova's gate?
17:22:40 <dwalleck> Though it is good to see if things break
17:22:42 <davidkranz> jaypipes: Probably so.
17:23:04 <davidkranz> jaypipes: We could skip the really slow tests for that gate.
17:23:08 <dwalleck> Perhaps not as a gate job, it might still be a good idea to run the full suite daily
17:23:08 <rohitk> jaypipes: Total execution time of tests should be as low as possible, meaning more optimizations
17:23:24 <rohitk> dwalleck: +1
17:23:38 <clarkb> there is a new feature in zuul that allows you to run a test as non voting
17:23:49 <jaypipes> dansmith: I mean for all the core projects.
17:23:59 <dansmith> jaypipes: yeah, I vote for that.. spread the pain around a bit :D
17:24:37 <dansmith> jaypipes: I think that having a third group of tests that are "long and boring" that don't gate might be suitable, but unless other projects are gated the same way,
17:24:46 <dansmith> tempest seems like it's just going to continue to be broken
17:25:11 <davidkranz> clarkb: That would be great. Blocking a bad checkin is good but the critical thing is tying a failure to a particular change which we can't do now except for the real gate.
17:25:22 <jaypipes> dansmith: well, the issue is, anything that isn't gated can break Tempest... for instance, if an API changes, and the API call isn't in smoke tests, tempest's own gate (which runs all tempest suite) will break...
17:25:37 <jaypipes> clarkb: ++
17:25:46 <jaypipes> at a minimum, that would be very useful
17:25:56 <clarkb> davidkranz: yes it would allow you to run the full tempest suite against a change, results would be reported, but won't actually count for or against that change
17:26:01 <dwalleck> jaypipes: ++
17:26:14 <dansmith> clarkb: yeah, that sounds reasonable
17:26:45 <davidkranz> A non-gating full run and moving some more tests into the gate would be a good start.
17:26:52 <jaypipes> yeah
17:27:03 <dansmith> maybe,
17:27:28 <dansmith> we make the non-gating full run shame the submitter with verbal insults so they feel bad...
17:27:53 <davidkranz> In the long run we want a full tempest run to take hours so we will have to draw the line somewhere in any event.
17:28:37 <dwalleck> davidkranz: If it takes multiple hours, we've done something very wrong
17:29:15 <davidkranz> dwalleck: I meant taking that long because there are eventually 20 core projects with negative fuzz testing blah blah blah.
17:29:23 <rohitk> davidkranz: I'd classify such tests as longevity tests, which are important to OpenStack too
17:29:24 <dwalleck> ahh, gotcha
17:29:54 <rohitk> davidkranz: ok
17:29:54 <davidkranz> rohitk: I didn't mean longevity tests, just good coverage of a huge number of APIs.
17:30:26 <rohitk> davidkranz: ++, which leads us to 'what happened to the fuzz client?? ' :)
17:30:42 <jaypipes> indeed.
17:30:46 <davidkranz> jaypipes: This came up at a meeting while you were away.
17:30:55 <davidkranz> jaypipes: You seemed to be the one with the most info.
17:31:28 <jaypipes> davidkranz: I was only the one that proposed it :)
17:31:44 <davidkranz> clarkb: Can you set up the non-gating tempest run for all projects?
17:32:06 <rohitk> jaypipes: there was someone who volunteered to introduce randgen, but we haven't reached there
17:32:09 <jaypipes> davidkranz: the idea is to have a fuzz testing tool like randgen take the Compute API grammar and fire random/bad crap at the API and verify no holes and expected error returns
17:32:18 <clarkb> jeblair: ^
17:32:21 <jaypipes> rohitk: pcrews on mtaylor's CI team
17:32:32 <rohitk> jaypipes: ok
17:32:37 <davidkranz> jaypipes: Right. The question was whether anyone was actually working on that or not.
17:32:51 <jaypipes> davidkranz: not that I know of.
17:33:01 <jaypipes> davidkranz: it just hasn't been a priority.
17:33:02 <rohitk> if there is an ongoing github project/code, people could contribute and accelerate
17:33:16 <jaypipes> davidkranz: all I know is the negative tests added were not scalable or maintainable IMHO
17:33:28 <jaypipes> rohitk: agreed, just don't have the resources right now.
17:33:28 <clarkb> now that I think about it more one concern would be it would add a third devstack-tempest run to each change
17:33:31 <davidkranz> jaypipes: We all agreed with that.
17:33:54 <clarkb> (we are currently transitioning from having one that uses cinder to two: one with cinder and one with nova volume)
17:34:36 <clarkb> davidkranz: I would want jeblair to provide some feedback before implementing anything
17:34:48 <davidkranz> clarkb: Sure.
17:37:19 <davidkranz> jaypipes: Where is the "Compute API grammar"?
17:38:01 <jaypipes> davidkranz: we need to make it.
17:38:11 <davidkranz> jaypipes: :)
17:38:15 <jaypipes> davidkranz: not that hard given decent API docs
17:40:45 <davidkranz> Anything else for now?
17:40:48 <jaypipes> ok, y'all, anything urgent?
17:40:52 * dwalleck actually understands the admin api now...a bit
17:41:00 <jaypipes> not from em... needs to get back to emails... and reviews
17:41:03 <jaypipes> me
17:41:23 <dwalleck> ditto.
17:41:32 <davidkranz> OK, I guess that is it for today.
17:41:37 <rohitk> done here too
17:41:58 <jaypipes> kk
17:42:00 <jaypipes> #endmeeting