17:00:58 <sdague> #startmeeting qa
17:00:59 <openstack> Meeting started Thu Oct 24 17:00:58 2013 UTC and is due to finish in 60 minutes.  The chair is sdague. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:00 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:02 <openstack> The meeting name has been set to 'qa'
17:01:07 <sdague> who's here?
17:01:10 <afazekas> O/
17:01:14 <mtreinish> I am here
17:01:14 <mlavalle> sdague: hi
17:01:17 <maurosr> o/
17:01:19 <Anju> hi
17:01:19 <dkranz> Here
17:01:20 <adalbas> here
17:01:32 <sdague> #link https://wiki.openstack.org/wiki/Meetings/QATeamMeeting Agenda
17:01:44 <sdague> #topic Design Summit Schedule (sdague)
17:01:53 <mkoderer> hi *
17:01:58 <sdague> #link http://icehousedesignsummit.sched.org/
17:02:20 <mkoderer> sdague: I got my final approval
17:02:30 <sdague> so the summit schedule is pushed, I figured we'd take a minute to figure out if there were any last minute sessions that we really need, and that I need to adjust for
17:02:49 <sdague> otherwise, I'm pretty happy with how the schedule played out, and I think there will be a lot of good meat there
17:03:16 <dkranz> sdague: lgtm
17:03:18 <mkoderer> looks good for me
17:03:45 <mtreinish> sdague: nice pun?
17:03:55 <sdague> :)
17:04:16 <sdague> I guess I asked before, but who all is going to be there?
17:04:18 <sdague> o/
17:04:23 <sdague> just to get a sense of things
17:04:25 <mtreinish> I will
17:04:38 <mkoderer> I will be there
17:04:49 <Anju> sdague:  I will also
17:04:58 <dkranz> I will
17:05:01 <afazekas> o/
17:05:11 <sdague> cool, the gang will all be there :)
17:05:18 <sdague> ok, next topic
17:05:33 <sdague> #topic Neutron job status (mtreinish)
17:05:41 <mtreinish> oh this is a topic
17:05:57 <mtreinish> ok so you may have noticed a new gating job on tempest neutron-pg-isolated
17:05:59 <sdague> I figured you've got the most recent knowledge on that
17:06:16 <mlavalle> sdague: I also have things to report regarding neutron
17:06:25 <mtreinish> that is the same as the regular neutron job just with tenant isolation enabled (also with a postgres db)
17:06:32 <sdague> mlavalle: cool, jump in
17:06:45 <mlavalle> i'll wait for mtreinish to finish
17:06:52 <mtreinish> yesterday I broke the neutron gate by increasing the number of tests that have isolation enabled
17:07:16 <mtreinish> it exposed another real bug in neutron
17:07:29 <jog0> what does isolated mean?
17:07:48 <mtreinish> jog0: it creates a separate tenant and user for each test class
17:07:58 <mtreinish> and with neutron makes a separate network for each tenant
17:08:15 <jog0> mtreinish: thought so, thanks. and that makes neutron fail more?
17:08:19 <mtreinish> yep
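A minimal sketch of what per-class tenant isolation means as described above: each test class gets its own tenant and user, plus its own network when Neutron is in use. The `IsolatedCreds` type, the `isolated_creds_for` helper and the naming scheme are hypothetical illustrations, not Tempest's actual code.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

# Counter that keeps generated names unique within one test run.
_counter = itertools.count(1)


@dataclass(frozen=True)
class IsolatedCreds:
    """Hypothetical bundle of per-test-class resources (not Tempest's real API)."""
    tenant: str
    user: str
    network: Optional[str]  # None when Neutron is not in use


def isolated_creds_for(test_class: str, neutron_enabled: bool) -> IsolatedCreds:
    """Allocate a fresh tenant and user, plus a network under Neutron, for one test class."""
    base = f"{test_class}-{next(_counter)}"
    return IsolatedCreds(
        tenant=f"{base}-tenant",
        user=f"{base}-user",
        # Under Neutron each isolated tenant also gets its own network, so
        # isolating more classes means more networks exist at the same time.
        network=f"{base}-network" if neutron_enabled else None,
    )
```

Because every isolated class adds its own network, turning isolation on for more test classes raises the number of concurrent Neutron resources, which is one plausible reason it exposes bugs the shared-tenant job does not.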
17:08:39 <jog0> strange
17:08:55 <mtreinish> so the job was added to fix the asymmetry between the neutron gate and the tempest gate
17:08:55 <sdague> it looks like there is some resource starvation happening
17:08:57 <mlavalle> mtreinish; the good thing about this isolation code is that we are really putting neutron through its paces
17:09:15 <mtreinish> so we can catch these issues without me breaking the neutron gate to do it
17:09:35 <dkranz> mtreinish: ++
17:09:58 <sdague> mtreinish: and you tripped another deadlock, right?
17:10:21 <mtreinish> nati_uen_ thought so, but I did a logstash query this morning and it wasn't a 1:1 match with the failure
17:10:48 <mtreinish> this is nati_uen_'s etherpad with debug notes: https://etherpad.openstack.org/p/debug1243726
17:11:19 <mtreinish> and bug 1243726 was opened for the issue
17:11:21 <uvirtbot> Launchpad bug 1243726 in neutron "tempest failure: No more IP addresses available on network" [Critical,Confirmed] https://launchpad.net/bugs/1243726
17:11:32 <sdague> #link https://etherpad.openstack.org/p/debug1243726 etherpad for debugging tenant isolation
17:11:50 <sdague> mtreinish: you know if nati_uen_ is still working the issue?
17:11:59 <mtreinish> I think so
17:12:08 <mlavalle> mtreinish: yeah, that bug is consistent with what I find in my dev system
17:13:11 <sdague> mlavalle: ok, great. Are there other things you have to report on it?
17:13:18 <sdague> or on other issues here?
17:13:32 <mlavalle> sdague: I've been working on debugging https://bugs.launchpad.net/swift/+bug/1224001
17:13:34 <uvirtbot> Launchpad bug 1224001 in neutron "test_network_basic_ops fails waiting for network to become available" [Critical,Fix released]
17:13:58 <mlavalle> sdague: the nature of the failure has changed in logstash since the last fix to neutron
17:14:09 <mlavalle> it now is mostly ping failures
17:14:23 <mtreinish> mlavalle: do we need to change the elastic recheck query?
17:14:33 <mtreinish> or open a new bug?
17:15:18 <mlavalle> i can reproduce in my dev system and will continue debugging. I will use this as an opportunity to develop some of the tcpdump stuff we talked about last week
17:15:27 <sdague> mlavalle: do you have a new recheck query for it?
17:15:39 <sdague> that would be good to change so we can categorize it
17:15:50 <mlavalle> sdague: i will soon
17:16:17 <mlavalle> mtreinish: not for the time being. but i will ping you in irc if i think we should do it
17:16:31 <mtreinish> mlavalle: ok
17:16:41 <mlavalle> that's all i have
17:17:25 <mlavalle> by the way, i'm not going to HKG but want somehow to be part of the neutron conversation :-(
17:17:25 <sdague> ok, great
17:17:51 <sdague> mlavalle: ok... I'm not sure how we do that, but we'll at least try to have a solid etherpad in advance
17:18:14 <sdague> #topic Tempest config file naming conventions and reorg (mtreinish)
17:18:24 <sdague> ok, mtreinish yours again
17:18:32 <mtreinish> I thought this was at the bottom
17:18:32 <mlavalle> sdague: that's great, thanks. I just want the team to know that i'm committed to this effort
17:18:46 <dkranz> mtreinish: Doesn't matter, go ahead
17:19:05 <mtreinish> so this week I've been going through the config file and changing the grouping around and trying to update the naming to be consistent
17:19:17 <mkoderer> mtreinish: +1
17:19:40 <mtreinish> I want to start adding options for every extension and extra feature we're testing
17:19:47 <mtreinish> instead of just assuming that they are enabled
17:20:08 <mtreinish> but sdague brought up the good point of how we handle that with multiple api versions for the same extension
17:20:20 <mtreinish> like the nova api v3
17:21:03 <mtreinish> so does anyone have any input on what our config strategy should be for this kind of thing?
17:21:29 <mtreinish> I was thinking for extensions with multiple versions we make it a string instead of a bool option to specify which versions are enabled
17:21:46 <mtreinish> obviously this is only a transient issue because eventually the old api version will be deprecated
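A minimal sketch of the proposal above, assuming a hypothetical config where each extension maps to a comma-separated string of the API versions it is enabled for (for example `"v2,v3"`), instead of one boolean per extension per version. The extension names, option format and helper names are illustrative only.

```python
def parse_versions(value: str) -> frozenset:
    """Parse a hypothetical per-extension value such as "v2,v3" into a set of versions.

    An empty string means the extension is enabled for no API version.
    """
    return frozenset(v.strip() for v in value.split(",") if v.strip())


def extension_enabled(config: dict, extension: str, version: str) -> bool:
    """Return True if `extension` is enabled for API `version` in the hypothetical config."""
    return version in parse_versions(config.get(extension, ""))


# Hypothetical config: one string per extension rather than one bool per (extension, version).
config = {"os-keypairs": "v2,v3", "os-floating-ips": "v2"}
```

When the old API version is eventually deprecated, its entry simply drops out of the strings, so the option count does not change.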
17:21:52 <dkranz> mtreinish: This is going to be pretty ugly a year or two from now
17:21:53 <sdague> so we also have the issue of configuring this from devstack
17:22:08 <sdague> because devstack really has no idea, as the way nova works is that everything is loaded by default
17:22:32 <dkranz> Couldn't we have some way to "opt-out" of extensions?
17:22:40 <mtreinish> sdague: so I'm fine for defaulting everything true in the sample conf that will work around the devstack issue
17:22:52 <dkranz> Realistically, installations are going to have most enabled, if not all.
17:22:54 <sdague> yeh, that just seems like a huge number of options, easy to get wrong
17:22:55 <mtreinish> and I'm working on the config verification script for people who are manually configuring tempest
17:23:11 <mtreinish> which will do the api querying to figure out what is enabled
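A minimal sketch of the comparison such a verification script might make, assuming the list of enabled extensions has already been fetched from the cloud's extension-listing API and is passed in as plain data. The function name and result keys are hypothetical.

```python
def config_mismatches(configured, discovered):
    """Compare extensions enabled in a hypothetical tempest config with those the cloud reports."""
    configured, discovered = set(configured), set(discovered)
    return {
        # Configured as enabled, but the API does not list them: those tests would fail.
        "missing_in_cloud": sorted(configured - discovered),
        # Listed by the API, but not configured: those tests would be skipped.
        "not_configured": sorted(discovered - configured),
    }
```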
17:23:12 <sdague> what if we had a list option
17:23:28 <sdague> computev2 = blah,foo,bar
17:23:35 <dkranz> sdague: of excluded extensions?
17:23:38 <sdague> and 'all' is a special value
17:23:39 <mtreinish> that list will get pretty long
17:23:55 <dkranz> mtreinish: Not if it means exclusion
17:23:56 <maurosr> using something similar to what we do in nova policy file wouldn't work?
17:24:02 <sdague> dkranz: so, again, with nova, the minute you specify extensions, you specify them all
17:24:11 <sdague> there isn't an exclude
17:24:28 <sdague> so doing the math becomes interesting
17:24:34 <dkranz> sdague: So you either have all extensions enabled or none?
17:24:46 <sdague> either all, or the list you provide
17:25:09 <sdague> in v3 it's different, because of entry points
17:25:19 <dkranz> sdague: I was talking about exclusion only in the tempest config
17:25:34 <mtreinish> maurosr: do you have a link?
17:25:40 <dkranz> So tempest would assume enabled unless mentioned
17:25:51 <dkranz> which would also handle the devstack case
17:25:52 <sdague> dkranz: right, but that would mean you have to figure out that nova added a new extension that you didn't know about
17:26:07 <sdague> because you actually need to compute the diff
17:26:29 <dkranz> but if you didn't know about it you would be running with it, unless the default was disabled
17:26:56 <dkranz> I think I may be too ignorant about this  so will be quiet
17:27:06 <sdague> heh, no think of it this way :)
17:27:15 <sdague> avail extensions: a, b, c, d, e
17:27:21 <sdague> nova loads: a, b
17:27:24 <maurosr> mtreinish: https://github.com/openstack/nova/blob/master/etc/nova/policy.json of course just the model, the idea would be to enable extensions or not instead of privilege level
17:27:31 <sdague> tempest exclusion for: c, d, e
17:27:40 <sdague> now nova adds ext f
17:27:45 <sdague> and your validation breaks
17:27:56 <sdague> because f isn't excluded from your tempest config
17:28:01 <sdague> but it's not enabled in nova
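sdague's example can be worked through directly. This sketch assumes nova loads an explicit list of extensions while Tempest uses a hypothetical exclusion list; the helper name is illustrative.

```python
def enabled_by_exclusion(available: set, excluded: set) -> set:
    """Exclusion model: Tempest assumes every known extension is enabled unless excluded."""
    return available - excluded


nova_loads = {"a", "b"}             # explicit list nova actually loads
tempest_excluded = {"c", "d", "e"}  # exclusion list written when only a..e existed

before = enabled_by_exclusion({"a", "b", "c", "d", "e"}, tempest_excluded)
# Nova later ships extension "f"; the deployment's explicit list still loads only a and b.
after = enabled_by_exclusion({"a", "b", "c", "d", "e", "f"}, tempest_excluded)

print(before == nova_loads)        # True: config matches the cloud
print(after == nova_loads)         # False: Tempest now expects "f"
print(sorted(after - nova_loads))  # ['f']
```

The exclusion list silently goes stale whenever nova adds an extension, which is why computing the diff against nova's explicit list is the hard part.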
17:28:13 <dkranz> sdague: I just did not realize that these extensions in nova were by default opt-in
17:28:17 <sdague> so if we are building a list, it should be in the same order as the services
17:28:25 <sdague> dkranz: well, it's weird
17:28:29 <mtreinish> maurosr: that's basically what I'm proposing except instead of doubling up for the v3 extensions make it a string which specifies the versions
17:28:31 <sdague> it's all in, or explicit it
17:28:55 <dkranz> sdague: I don't think that matches the real usage model at all, which will mostly be "in" not "out"
17:28:57 <sdague> explicit in
17:29:01 <dkranz> sdague: but oh well
17:29:11 <sdague> yeh, the way it is
17:29:35 <sdague> mtreinish: so is there an oslo config type that would let us do this with lists (that could be multi line)?
17:29:40 <sdague> instead of lots of options?
17:29:52 <mtreinish> sdague: there is ListOpt
17:29:59 <sdague> I think it's at least worth exploring how terrible that patch would be
17:30:16 <sdague> because a ton of boolean options feels weird to me
17:30:33 <sdague> nova v2 is 70 extensions I think
17:30:40 <mtreinish> sdague: we use it for logging right now: https://git.openstack.org/cgit/openstack/tempest/tree/etc/tempest.conf.sample#n13
17:30:45 <mlavalle> sdague: I have to run to another meeting. See you in openstack-qa
17:30:50 <sdague> mlavalle: sure
17:31:13 <mtreinish> sdague: yeah that's fair
17:31:16 <sdague> mtreinish: sure, but that's a much smaller list
17:31:26 <dkranz> mtreinish: Could we have the option just point to a policy file, or wherever the "in" is defined?
17:31:41 <sdague> I guess dkranz's exclude approach would be good as well
17:31:53 <dkranz> Then the conf would not have to be updated all the time.
17:31:55 <sdague> for brevity, though we know it would cause issues
17:32:03 <sdague> dkranz: the policy file is not network accessible
17:32:15 <mtreinish> sdague: that approach just makes my verification script more difficult
17:32:25 <mtreinish> sdague: I think he's saying break out this into a separate file
17:32:26 <dkranz> sdague: I meant "get a copy from the cloud you are running against with tempest"
17:32:27 <sdague> mtreinish: excludes... yeh
17:32:39 <sdague> dkranz: so the problem is, you might not be able to do that
17:32:56 <sdague> if I want to run tempest against hp cloud to figure out if it's really openstack, I can't get their policy file
17:33:09 <dkranz> sdague: Why not, or at least a sanitized subset with just what we care about?
17:33:24 <dkranz> sdague: Surely the implemented extensions is public?
17:33:36 <sdague> dkranz: but not the policy file
17:33:47 <dkranz> sdague: I am going for DRY really
17:33:59 <dkranz> But perhaps it is not possible
17:34:16 <sdague> and the reason we're going down this path, vs. trusting list_extensions, is to be explicit
17:34:33 <dkranz> sdague: I understand
17:34:50 <sdague> mtreinish: ok, so how about explicit and "all"
17:34:57 <sdague> as a list option
17:35:15 <mtreinish> sdague: sure I can do that
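A minimal sketch of the agreed direction: one list option holding either explicit extension names or the special value `all`, mirroring nova's "all, or the list you provide" behaviour. oslo.config's `ListOpt` (mentioned above) could hold such a value; this sketch splits a plain string instead so it runs without oslo.config, and the helper names are hypothetical.

```python
def parse_list_option(raw: str) -> list:
    """Split a comma-separated option value, roughly as a list-type config option would."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def is_extension_enabled(raw: str, extension: str) -> bool:
    """Hypothetical check: 'all' enables every extension, otherwise the list is explicit."""
    enabled = parse_list_option(raw)
    return "all" in enabled or extension in enabled
```

Devstack-style runs would set the value to `all`, while someone testing a partially enabled cloud would list exactly what the deployment loads.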
17:35:23 <sdague> lets see how bad it is
17:35:41 <mtreinish> well we'll never see how bad it will get because we only run it as all :)
17:35:52 <mtreinish> and not 60 of 70 extensions
17:36:07 <sdague> well, someone else will tell us how bad it is
17:36:34 <sdague> ok, lets move on
17:36:39 <sdague> #topic Scope and place for performance testing such as Rally (dkranz)
17:36:53 <dkranz> So there was a discussion about this on the ml
17:37:14 <dkranz> I just wanted to get a feel of whether we think performance testing should ever be part of tempest
17:37:50 <dkranz> I could go either way
17:38:02 <mtreinish> dkranz: I think that's a good idea or at least the part of it that's actually exercising things
17:38:31 <mkoderer> dkranz: I like the idea
17:38:41 <dkranz> mtreinish: Right, but who will do this work?
17:38:58 <dkranz> mtreinish: If it is not done soon, and people like rally, it will get harder and harder.
17:39:24 <sdague> dkranz: I think the point was letting the rally folks know that we'd like that part in tempest
17:39:42 <sdague> they presumably already were going to do that work, so just that this is the place it should happen
17:40:01 <dkranz> sdague: That works for me, but I was not sure they intended to do that
17:40:06 <sdague> I think the community spoke up pretty strongly about not wanting another load driver out there
17:40:30 <dkranz> sdague: ok, so we will see it percolate a bit
17:40:38 <sdague> dkranz: yes, it remains unclear to me as well, but it also was clear they wanted to be part of the gate, and I don't think that will happen if they remain split off doing their own thing
17:40:45 <dkranz> sdague: perhaps there could be some informal discussion at the summit
17:40:50 <sdague> sure
17:41:10 <dkranz> sdague: Agreed about the gate. But that is sketchy for the real value
17:41:17 <dkranz> sdague: Even more so than for stress tests
17:41:45 <sdague> dkranz: I think it's a hard problem, but I don't want to completely give up on it yet
17:42:00 <dkranz> sdague: ok let's discuss at summit
17:43:19 <sdague> #topic Status and roll-out plan for failing the gate on log errors (dkranz)
17:43:22 <sdague> next topic
17:43:28 <sdague> all you dkranz
17:43:39 <dkranz> So there was more contention about this on the ml than I expected
17:44:40 <dkranz> I'm not sure how to proceed. I think my case was convincing.
17:45:13 <dkranz> A lot of folks seem to not get that if we allow crap in logs, no one will look at them and that is really bad.
17:45:28 <sdague> dkranz: I don't think it was that contentious
17:45:54 <sdague> honestly, I think the current whitelist approach is fine, and I expect there might be just a few error conditions that we negotiate over at the end
17:45:55 <dkranz> sdague: So if we say we are going to start to fail non-whitelisted errors there will be no objection?
17:46:13 <sdague> dkranz: I think so
17:46:19 <dkranz> sdague: Great, if that is true.
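A minimal sketch of the whitelist approach: scan service log lines for ERROR entries and report any that match no whitelisted pattern; a non-empty result would fail the job. The log line format and the single whitelist pattern are hypothetical examples, not the actual gate whitelist.

```python
import re

# Hypothetical whitelist: regexes for ERROR lines that are known and accepted.
WHITELIST = [
    re.compile(r"Instance .* could not be found"),
]


def unexpected_errors(log_lines, whitelist=WHITELIST):
    """Return ERROR lines matching no whitelist pattern; a non-empty result fails the run."""
    return [
        line for line in log_lines
        if " ERROR " in line and not any(p.search(line) for p in whitelist)
    ]
```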
17:46:38 <sdague> jgriffith already went and changed a couple of error conditions in cinder because of the conversation
17:46:46 <dkranz> sdague: ok, cool
17:46:49 <dkranz> sdague: next
17:47:06 <sdague> it looks like there are a couple more that need to be whitelisted out of nova network
17:47:13 <sdague> from the last time I looked at logs
17:47:27 <dkranz> sdague: I am re-watching now
17:47:37 <dkranz> sdague: You can probably imagine how painful this is.
17:47:54 <dkranz> sdague: But I will push through, The end is in sight.
17:48:50 <sdague> cool :)
17:48:56 <dkranz> next topic?
17:49:00 <sdague> yep
17:49:03 <sdague> #topic State of 'smoke' tagging: can we make it useful? (dkranz)
17:49:10 <sdague> it's the dkranz show :)
17:49:14 <dkranz> :)
17:49:26 <sdague> and, honestly, I'm super excited for the whitelist error stuff to hit
17:49:34 <dkranz> So it came up that the current smoke tagging is pretty arbitrary.
17:49:52 <dkranz> We want a set of tests that can run in 5-10 minutes that cover the most ground
17:50:02 <mtreinish> dkranz: right now it's only used for what runs in grenade and the neutron jobs
17:50:18 <mkoderer> we should get rid of all negative test flagged as smoke
17:50:21 <dkranz> mtreinish: Right, but that was not the intent
17:50:27 <sdague> yeh, so honestly, I think we should just dump smoke and have our smoke target be all the non-slow scenario tests
17:50:35 <mkoderer> I don't see any reason for negative tests that are smoke tests
17:50:45 <sdague> mkoderer: +1 agree with that
17:51:01 <dkranz> sdague: That is reasonable if we have the right coverage.
17:51:15 <dkranz> sdague: Certainly the scenario tests *should* have enough coverage.
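A minimal sketch of sdague's proposal that the smoke target be every scenario test not tagged slow. It assumes tests can be described by a dotted test id and a set of tags; the `CaseInfo` record and the test ids in the usage check are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CaseInfo:
    """Hypothetical record for a discovered test: its dotted id and its tags."""
    test_id: str
    tags: set = field(default_factory=set)


def smoke_selection(cases):
    """Select every scenario test that is not tagged 'slow'."""
    return [
        c.test_id for c in cases
        if c.test_id.startswith("tempest.scenario.") and "slow" not in c.tags
    ]
```

Note that, as dkranz points out, this set grows with every new scenario test, so it may not stay within a 5 to 10 minute budget.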
17:51:31 <mtreinish> sdague: that was the future intent for grenade right, to just run scenario (and increase the scenario coverage)
17:51:39 <mtreinish> and for neutron we want that to be running full anyway
17:51:42 <sdague> dkranz: agreed, actually I'm hoping we can talk about that in - http://icehousedesignsummit.sched.org/event/1a28654a7e05217067ded2bacbfa7484
17:51:46 <dkranz> So that works for me
17:52:20 <sdague> mtreinish: yeh
17:52:26 <dkranz> except that as we go forward we will have more non-slow scenario tests than can run in 5 minutes
17:52:31 <afazekas> IMHO several auth* related tests should be smoke even if it's a negative test
17:53:07 <sdague> so maybe the negative test discussion at summit, and the scenario test discussion will flesh this out
17:53:14 <dkranz> sdague: ok
17:53:35 <dkranz> Perhaps after neutron is working we can re-use smoke to mean what it should.
17:53:48 <sdague> dkranz: yeh, that would be nice
17:53:49 <dkranz> Then we can run 'smoke' scenario and api
17:53:52 <mkoderer> afazekas: could be that there are some exceptions..
17:53:52 <dkranz> I think that is the right answer for what we want.
17:54:23 <sdague> sure, I guess we could do that, tag some representative scenario tests and a few others we think are important
17:54:30 <dkranz> sdague: Exactly
17:54:53 <sdague> but in reality the API tests feel like they are largely a different class, and each very small, so being in smoke isn't quite right
17:55:01 <sdague> but... a summit discussion
17:55:05 <sdague> also possibly with beer :)
17:55:12 <dkranz> sdague: Definitely
17:55:22 <dkranz> sdague: That's all from me
17:55:23 * mkoderer not sure if the beer tastes good in HK
17:55:33 <sdague> I will have to say the beer during summit sessions in san diego was a great idea
17:55:35 <dkranz> mkoderer: Bring some!
17:55:49 <sdague> mkoderer: I'm sure they have importers :)
17:55:52 <sdague> ok
17:55:59 <mkoderer> ok :)
17:56:00 <sdague> #topic Open Discussion
17:56:05 <sdague> anything else?
17:56:29 <sdague> are people going to be around next week, or are they starting to travel by that point?
17:56:40 <dkranz> sdague: I will be on a plane next Thur so may or may not have connectivity for the meeting
17:56:46 <sdague> mtreinish and I are flying on Fri
17:56:55 <sdague> others?
17:57:06 <mkoderer> I leave on Sunday
17:57:13 <mtreinish> dkranz, sdague: doesn't that depend on your reference point? (and departure time)
17:57:38 <jog0> https://review.openstack.org/#/c/53699/ needs review to unblock stable/havana
17:57:56 <jog0> I had a commit message typo but zuul said it was working
17:58:06 <mtreinish> jog0: why don't you cherry pick the version of that in master?
17:58:12 <sdague> heh
17:58:36 <jog0> mtreinish: that is a separate patch https://review.openstack.org/#/c/51041/
17:58:45 <jog0> this is a bigger issue though
17:58:51 <sdague> dkranz: ok, can you +2 - https://review.openstack.org/#/c/52413/1 first?
17:59:00 <dkranz> sdague: Yeah, just a sec
17:59:05 <mtreinish> oh crap, that's going to be a merge conflict
17:59:13 <mtreinish> no there was patch bumping six on master
17:59:16 <mtreinish> that's been merged
17:59:32 <mtreinish> jog0: https://git.openstack.org/cgit/openstack/tempest/commit/?id=c0441be3d7f994998779054991214242c5005877
18:00:01 <dkranz> sdague: I did it
18:00:15 <sdague> ok, we need to give up the slot, lets take this to -qa
18:00:20 <sdague> #endmeeting