17:00:03 <mtreinish> #startmeeting qa
17:00:04 <openstack> Meeting started Thu Sep 19 17:00:03 2013 UTC and is due to finish in 60 minutes.  The chair is mtreinish. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:07 <openstack> The meeting name has been set to 'qa'
17:00:20 <mtreinish> who's here for the meeting?
17:00:37 <afazekas> hi
17:00:42 <Anju> hii
17:00:42 <mkoderer> hi
17:01:10 <dkranz> hi
17:01:10 <giulivo> hi
17:01:12 <mtreinish> ok, here's today's agenda:
17:01:14 <mtreinish> #link https://wiki.openstack.org/wiki/Meetings/QATeamMeeting
17:01:23 <mtreinish> let's get started
17:01:34 <mtreinish> #topic neutron testing status
17:01:39 <ravikumar_hp> hi
17:01:41 <mtreinish> mlavalle: are you around?
17:02:10 <mlavalle> Yes i am
17:02:21 <mtreinish> ok, any update on neutron testing?
17:02:44 <mlavalle> mtreinish: I've been debugging isolated_creds
17:03:13 <mlavalle> mtreinish: as you know, I am having a race issue. With the latest run you sent me…..
17:03:33 <mtreinish> ok
17:03:43 <mlavalle> I think I need to add exception handling when I don't find the port in cleanup_ports
17:03:47 <psedlak> hi
17:04:02 <mlavalle> that's going to be my next patchset
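A minimal sketch of the tolerant cleanup mlavalle describes: a port that is already gone is treated as cleaned up rather than as a failure. It assumes a network client whose `delete_port` raises a `NotFound` exception for a missing port. The function signature, `delete_port`, and `NotFound` here are hypothetical stand-ins, not tempest's actual isolated_creds code.

```python
class NotFound(Exception):
    """Hypothetical stand-in for the client's 404 exception."""


def cleanup_ports(client, port_ids):
    """Delete each port, skipping ports that no longer exist.

    Returns the list of port IDs that were actually deleted here.
    """
    deleted = []
    for port_id in port_ids:
        try:
            client.delete_port(port_id)
        except NotFound:
            # Something else (another cleanup path, or Neutron itself)
            # removed the port first; that is the race, so don't fail.
            continue
        deleted.append(port_id)
    return deleted


class FakeClient:
    """Tiny in-memory client used only to exercise cleanup_ports."""

    def __init__(self, existing):
        self.existing = set(existing)

    def delete_port(self, port_id):
        if port_id not in self.existing:
            raise NotFound(port_id)
        self.existing.remove(port_id)
```

Catching only the not-found case keeps real errors (auth failures, timeouts) visible instead of swallowing every exception during cleanup.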
17:04:20 <mtreinish> mlavalle: ok, we can talk about the details of the patch after the meeting; I had some questions
17:04:31 <mtreinish> is there anything else on the neutron front?
17:04:43 <mlavalle> I would also like to request a run of the neutron_full gate job
17:04:44 <afazekas> can we merge the partially working isolation patch now ?
17:05:11 <mtreinish> afazekas: it'll fail on clean up
17:05:20 <mtreinish> mlavalle: I think it's in the experimental queue
17:05:21 <mlavalle> right now it's disabled. I want to see the status of a run after all the changes that have been merged
17:05:36 <mlavalle> can you point me to it?
17:06:05 <mtreinish> mlavalle: I think you leave a comment with a 0 vote saying 'check experimental'
17:06:08 <mtreinish> dkranz: is that right?
17:06:14 <dkranz> mtreinish: Yes.
17:07:00 <mlavalle> mtreinish: ok, with that I will be able to gauge how much more work we need to do to fix it for good
17:07:12 <mlavalle> that's all I have
17:07:13 <mtreinish> ok cool
17:07:21 <mtreinish> let's move on then
17:07:28 <mtreinish> #topic blueprints
17:07:49 <afazekas> The patch brings us closer to working network isolation, and it is easier to continue if it is merged; it does not have any negative impact on the current jobs
17:07:49 <mtreinish> are there any blueprints that we need to discuss?
17:08:32 <mtreinish> afazekas: I'm not comfortable merging code that doesn't work
17:09:00 <mtreinish> ok, if there aren't any blueprints to bring up then let's move on
17:09:14 <mtreinish> #topic Critical Reviews
17:09:23 <mlavalle> afazekas: I don't feel comfortable either, yet. let me wrestle with it a little longer
17:09:28 <mtreinish> are there any reviews that people need to get eyes on?
17:09:28 <dkranz> mlavalle: You should do the check experimental in a tempest patch. It was not added to all projects.
17:10:17 <Anju> mtreinish:   https://review.openstack.org/#/c/43039/
17:10:20 <Anju> this one
17:10:28 <mtreinish> #link https://review.openstack.org/#/c/43039/
17:10:29 <dkranz> mtreinish: https://review.openstack.org/#/c/38995/ is very old and I hope can be approved.
17:10:38 <mtreinish> #link https://review.openstack.org/#/c/38995/
17:11:05 <mtreinish> ok I've got them open on my browser I'll take a look after the meeting
17:11:06 <dkranz> Anju: That link was merged
17:11:20 <mtreinish> and any other cores can pick them off too
17:11:31 <giulivo> mtreinish, one more; it's not critical but it's useful for figuring out what is and isn't good for the stable branches
17:11:32 <giulivo> https://review.openstack.org/#/c/45808/
17:11:33 <mtreinish> Anju: yeah it is
17:11:44 <mtreinish> Anju: it's also a nova patch
17:11:52 <Anju> mtreinish:   https://review.openstack.org/#/c/39621/
17:11:56 <Anju> mtreinish:  this one
17:11:58 <mtreinish> #link https://review.openstack.org/#/c/45808/
17:12:16 <giulivo> btw https://review.openstack.org/#/c/45808/ is a cherry pick
17:12:16 <Anju> I want to know how to proceed with the v3 tests
17:12:21 <mkoderer> https://review.openstack.org/#/c/39621/ got several -1's
17:12:58 <mtreinish> giulivo: yeah I see that we can talk about that after the meeting
17:13:12 <mtreinish> ok are there any other reviews?
17:13:22 <adalbas> #link https://review.openstack.org/#/c/39621/
17:13:34 <dkranz> mkoderer: I don't think the -1s are valid at this point.
17:13:48 <Anju> mtreinish, mkoderer , giulivo , dkranz ,afazekas : I know this topic isn't about v3 tests, but I need some direction...
17:14:11 <Anju> mtreinish, mkoderer , giulivo , dkranz ,afazekas : this patch got -1 twice
17:14:12 <mtreinish> dkranz: yeah it just needs a rebase probably
17:14:12 <malini2> sorry to interrupt -- but do we not have a security meeting today?
17:14:27 <mtreinish> malini2: it's in an hour I think
17:14:39 <malini2> oh
17:15:00 <mtreinish> Anju: there is nothing wrong with the v3 tests but that one just needs a rebase
17:15:12 <mtreinish> that commit is a straight copy and paste of the v2 tests
17:15:31 <dkranz> mtreinish: To be clear, we have adopted the copy/modify approach and are ready to move on any v3 reviews, right?
17:15:52 <Anju> afazekas:  is this ok?
17:16:10 <mtreinish> dkranz: I think that's what the plan is. I didn't really agree with it but the consensus was against me so I'm fine with it now :)
17:16:25 <Anju> what about your comments asking for more inheritance and less copy-paste?
17:16:33 <afazekas> Anju: It is the faster way to get v3 tests; adding some inheritance is possible later
17:16:39 <dkranz> mtreinish: What did you propose?
17:16:56 <mtreinish> just adding v3 tests as individual patches, without bothering to copy and paste
17:17:05 <mtreinish> and maybe break it up into smaller patches in a longer series
17:17:24 <dkranz> mtreinish: but the end result is still a full copy of tests for v3?
17:17:33 <mtreinish> yeah
17:17:59 <dkranz> mtreinish: I was talking about copy/modify vs attempted inheritance
17:18:23 <mtreinish> yeah the copy/modify is the way they're going to do it
17:18:32 <dkranz> mtreinish: I don't remember a discussion and don't really have an opinion about how we get there
17:18:48 <dkranz> mtreinish: But we can move on now.
17:18:51 <mtreinish> it was on the ml and partially in a review at one point
17:18:59 <mtreinish> yeah ok let's go to the next topic
17:19:11 <mtreinish> #topic How to handle bug fixes in launchpad
17:19:18 <mtreinish> so this one is mine
17:19:42 <mtreinish> during the bug day earlier this week we noticed that all the fixed bugs were in the fix committed state
17:20:07 <mtreinish> it used to immediately go into fix released after the patch was merged
17:20:21 <mtreinish> there seemed to be a bit of disagreement over which approach was better
17:20:31 <afazekas> IMHO after it merges it should go to the Fix Released state automatically or with a 1-7 day delay
17:20:40 <dkranz> afazekas: I agree
17:20:42 <mtreinish> so I just wanted to bring it up during the meeting to see if there was a strong opinion one way or the other
17:21:07 <mtreinish> afazekas: ok yeah that's what I'm leaning towards again
17:21:31 <mtreinish> I can push a revert for jeblair to switch it back to the old behavior then
17:21:40 <dkranz> mtreinish: Thanks.
17:21:54 <mtreinish> ok this was a quick topic
17:22:01 <mtreinish> #topic Bogus errors in logs one more time
17:22:04 <mtreinish> dkranz: you're up
17:22:32 <dkranz> So I was trying to see if there was a failure in the neutron logs and saw that it too has lots of bogus ERROR/stacktrace
17:22:35 <giulivo> mtreinish, and eventually move whatever is in fix committed to fix released ?
17:22:48 <dkranz> That show up even on successful runs
17:22:54 <afazekas> the tempest.log should have a thread/pid number
17:23:10 <mtreinish> giulivo: yeah, that was the intent of switching to leaving it in Fix Committed
17:23:13 <dkranz> infra has agreed they could put in a regexp on the logs to fail a build if bogus errors show up
17:23:46 <dkranz> Part of fixing this issue is educating developers about what log.error should and should not be used for.
17:24:10 <dkranz> The second part is giving some priority to fixing the bogus ones
17:24:10 <mtreinish> dkranz: yeah I've seen overuse of log.ERROR before too
17:24:21 <afazekas> I hope it will not lead to not using the error or critical level when it is required
17:24:22 <mtreinish> I think sdague was working on a whitelist at one point
17:24:31 <dkranz> The third part is defining a "whitelist" so that infra can start failing builds that introduce new ones.
17:24:35 <mtreinish> or that might have just been for stacktraces
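The whitelist idea (fail a build when a log contains an ERROR or stacktrace that isn't on a known list) could look roughly like this. It is only a sketch under assumptions: the whitelist pattern is a made-up placeholder, and this is not the script infra or sdague actually wrote.

```python
import re

# Hypothetical whitelist: regexes for ERROR lines already known about.
# The entry below is a placeholder, not a real OpenStack log message.
WHITELIST = [
    re.compile(r"Ignoring known-harmless condition"),
]

# Lines that count as "errors": an ERROR log level or the first line
# of a Python traceback.
ERROR_RE = re.compile(r"\bERROR\b|Traceback \(most recent call last\)")


def unexpected_errors(lines, whitelist=WHITELIST):
    """Return the error lines that no whitelist pattern accounts for."""
    bad = []
    for line in lines:
        if not ERROR_RE.search(line):
            continue
        if any(pattern.search(line) for pattern in whitelist):
            continue
        bad.append(line)
    return bad
```

A gate job would run this over each service log and exit non-zero when the returned list is non-empty, so a patch that introduces a new bogus ERROR fails while the existing, whitelisted ones are cleaned up over time.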
17:25:00 <dkranz> mtreinish: I haven
17:25:20 <dkranz> I haven't heard any one disagree with these points but I'm not sure how to make it happen.
17:25:26 <dkranz> Particularly the priority part
17:25:55 <dkranz> I don't know why people don't think this has customer/user impact
17:26:16 <dkranz> It was a major headache with the system I ran.
17:26:29 <mtreinish> dkranz: well we can make a blueprint for this, for the priority one it's really 2 parts identifying the spurious log messages and then opening bugs for them
17:26:45 <mtreinish> and marking them as high priority bugs
17:26:51 <jog0> it would be nice to see some of this before Havana is cut
17:26:54 <dkranz> mtreinish: Yeah. THere are already a bunch of bugs I filed a while ago.
17:27:05 <dkranz> mtreinish: I can take another pass at this.
17:27:25 <dkranz> But really someone from each team should grep their logs and go from there
17:27:27 <mtreinish> I think the best way to start going about this is to send out a post to the ML to try to get a wider audience
17:27:47 <afazekas> +1
17:28:09 <dkranz> mtreinish: Sure. But I've done that before :(
17:28:10 <jog0> we can use logstash for this i think
17:28:29 <mtreinish> jog0: yeah logstash will be useful for this too
17:28:32 <dkranz> jog0: How?
17:28:33 <jog0> (for finding what stacktraces happen today)
17:28:41 <giulivo> dkranz, mtreinish how about going the opposite route which is collecting what is logged as error also when tempest succeeds and post it to the -dev list ?
17:28:54 <dkranz> giulivo: That is exactly what I was talking about
17:29:07 <giulivo> oh I thought asking -dev to inspect
17:29:25 <dkranz> giulivo: Yes, to inspect the bogus ERROR in their logs
17:29:51 <giulivo> ok so I was just suggesting -qa/-infra inspects the logs first and post the bogus messages
17:30:22 <jog0> @message:Traceback* AND @fields.filename:"logs/screen-n-api.txt"
17:30:34 <jog0> dkranz: ^ put that in logstash.openstack.org
17:30:53 <mtreinish> dkranz: there should probably be a summit topic on this too.
17:31:06 <dkranz> mtreinish: OK, I'll put one in.
17:31:11 <mtreinish> but we want to start it before havana
17:31:11 <jog0> '@message:Traceback* AND @fields.filename:"logs/screen-n-api.txt" AND @fields.build_status:"SUCCESS"' will tell you for only passing jobs
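To run jog0's query against more than one service log, a small helper can build the same query string per file. This is a sketch that only reproduces the query syntax pasted above; it does not talk to logstash.openstack.org, and any log file names other than `logs/screen-n-api.txt` would be your own assumptions.

```python
def traceback_query(filename, build_status=None):
    """Build a logstash query for tracebacks in one devstack log file.

    Uses the same syntax as the queries pasted in the meeting; pass
    build_status="SUCCESS" to restrict the search to passing jobs.
    """
    query = '@message:Traceback* AND @fields.filename:"%s"' % filename
    if build_status is not None:
        query += ' AND @fields.build_status:"%s"' % build_status
    return query
```

Restricting to `SUCCESS` builds is the useful variant here: any traceback in a passing run is, almost by definition, a candidate bogus error.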
17:31:30 <dkranz> jog0: OK
17:31:59 <dkranz> jog0: There are also bogus ERROR without stacktrace
17:32:13 <mtreinish> there is also the find_stack_traces.py script in tools
17:32:23 <mtreinish> although I haven't used it at all
17:32:46 <dkranz> mtreinish: I will do something
17:33:03 <afazekas> where should the whitelist be stored? is the devstack-gate repo a good place for it?
17:33:27 <mtreinish> afazekas: it should probably be separate
17:33:31 <dkranz> afazekas: Sean had some idea for this but didn't tell me details
17:34:06 <mtreinish> dkranz: well then we can all bug him when he gets back.
17:34:16 <mtreinish> dkranz: it probably wouldn't hurt to start a bp on this now
17:34:28 <mtreinish> and we can fill in the details after we have some more discussion about how to do it
17:34:36 <dkranz> mtreinish: OK, I'll do that.
17:34:47 <afazekas> tempest  or infra bp ?
17:34:49 <dkranz> mtreinish: The problem is that it is really a cross-project blueprint
17:35:05 <dkranz> mtreinish: We as a community don't always do so well with those.
17:35:38 <mtreinish> dkranz: yeah, it really doesn't fit too well in one project. Just stick it somewhere I guess
17:35:46 <mtreinish> we can make individual project bps too
17:35:52 <mtreinish> and use dependency to track them
17:35:59 <mtreinish> that's what I did for the coverage extension
17:36:00 <dkranz> mtreinish: That's a good idea.
17:36:10 <mtreinish> dkranz: https://blueprints.launchpad.net/tempest/+spec/tempest-coverage-reporting
17:36:17 <dkranz> mtreinish: I'll figure out which projects have issues and open them in each
17:36:48 <dkranz> mtreinish: tempest is probably the one project that won't require code changes :)
17:36:50 <psedlak> isn't infra the obvious choice for that (bogus errors)? What would be the reason to create it as a tempest bp?
17:36:54 <jog0> if we can gate on no new stacktraces before Havana is out that would be really amazing
17:37:15 <dkranz> jog0: Yes
17:37:40 <dkranz> jog0: To be clear, are you distinguishing between incorrect ERRORs that have stacktraces and those that don't?
17:38:10 <dkranz> jog0: If we make "rules" for developers we have to be very precise
17:39:06 <afazekas> if there is an ERROR message  without detailed info it is a double -1 :)
17:39:58 <dkranz> IMO, ERROR in log should be for something the operator *should* understand/investigate
17:40:26 <dkranz> And can thus be used as a monitoring alert
17:40:45 <jog0> I agree with dkranz.  I think its better to take a smaller step at first and just worry about stacktraces and not errors
17:40:46 <afazekas> +1
17:40:50 <dkranz> swift is hopeless in this regard
17:41:27 <mtreinish> dkranz: +1 (not about swift I haven't really looked at swift logs much)
17:41:37 <mtreinish> dkranz: ok is there anything else on this topic?
17:41:39 <rockyg> Gotta start somewhere.  +1
17:41:39 <dkranz> jog0: I'm ok with that.
17:41:56 <dkranz> Just want to make sure if people agree with my statement about ERROR above
17:42:08 <dkranz> at 17:39
17:42:21 <afazekas> dkranz: I assume syslog can be configured to separate the swift logs
17:42:34 <rockyg> Yes
17:42:39 <mtreinish> dkranz: I do (that's what the +1 was for)
17:42:39 <dkranz> afazekas: Sure
17:42:42 <jog0> dkranz: why not punt the ERROR msg stuff till Icehouse
17:42:53 <dkranz> mtreinish: OK, great. I'll report next week on my progress.
17:42:55 <jog0> (consensus and all)
17:42:59 <mtreinish> jog0: yeah that makes sense
17:43:18 <mtreinish> dkranz: ok you'll get a semi-permanent spot on the agenda then :)
17:43:26 <dkranz> mtreinish: :)
17:43:36 <mtreinish> ok then let's open the floor
17:43:39 <rockyg> Error in log means investigate to any ops guy.  So better have enough info for them to stay to dog e
17:43:41 <mtreinish> #topic open discussion
17:44:08 <mtreinish> are there any topics to bring up with what time is left?
17:44:31 <jog0> mtreinish: you can mention the work we have been doing
17:44:48 <mtreinish> oh yeah this is a good forum to discuss that
17:45:01 <afazekas> The pid/thread should be logged in tempest.log, in order to distinguish the logs from different workers
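With Python's standard `logging` module, adding the pid and thread is a format-string change: `%(process)d` and `%(threadName)s` are built-in record attributes. The helper name below is hypothetical, and how tempest actually wires up its log file (it goes through oslo logging config) may differ.

```python
import logging

# %(process)d and %(threadName)s are standard LogRecord attributes, so
# every line records which worker process/thread emitted it.
FORMAT = ("%(asctime)s %(process)d %(threadName)s "
          "%(levelname)s %(name)s: %(message)s")


def configure_tempest_log(path):
    """Hypothetical helper: log everything to `path` with pid/thread info."""
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter(FORMAT))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.DEBUG)
    return handler
```

With parallel runs each worker is a separate process, so the pid alone is usually enough to split an interleaved tempest.log back into per-worker streams (e.g. `grep " 12345 " tempest.log`).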
17:45:20 <mtreinish> so jog0 and I have been working on a bot that watches the gerrit stream for tempest failures
17:45:31 <mtreinish> and then use logstash to find fingerprints for open bugs
17:45:47 <mtreinish> and report back on irc and the gerrit commit with what it found
17:45:48 <rockyg> Cool
17:46:01 <mtreinish> you've probably seen RecheckWatchBot on the -qa channel
17:46:04 <jog0> well, a human still has to find the fingerprint, but then we use logstash to classify the failures
17:46:23 <mtreinish> yeah that's what I meant (I didn't word it clearly)
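The classification step can be illustrated with a local stand-in. The bot queries logstash with human-written queries; this sketch swaps that for plain regex fingerprints matched against log lines, just to show the idea of mapping a failure to known bugs. The bug ids and patterns are hypothetical placeholders, not entries from the real queries.json.

```python
import re

# Hypothetical fingerprint table: bug id -> regex a human wrote after
# triaging the failure. The real bot stores logstash queries instead.
FINGERPRINTS = {
    "bug-A": re.compile(r"ConnectionError: .* refused"),
    "bug-B": re.compile(r"Timed out waiting for .* ACTIVE"),
}


def classify_failure(log_lines, fingerprints=FINGERPRINTS):
    """Return the sorted ids of bugs whose fingerprint matches a line.

    An empty result means the failure is unclassified, so someone has
    to triage it and write a new fingerprint.
    """
    hits = set()
    for bug, pattern in fingerprints.items():
        if any(pattern.search(line) for line in log_lines):
            hits.add(bug)
    return sorted(hits)
```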
17:46:52 <mtreinish> we're going to be moving it over to infra soon and will have everything up in gerrit too
17:47:36 <dkranz> mtreinish: That's cool.
17:47:48 <dkranz> mtreinish: Who do we expect to look at this, and what do we expect them to do?
17:48:06 <mtreinish> dkranz: it's mostly to lower the developer load for using recheck
17:48:12 <jog0> dkranz: you and the patch author
17:48:13 <mtreinish> and avoid duplicate bugs
17:48:28 <afazekas> What is the easiest way to find the logs related to a recheck? On the recheck page you see the change number, but you need to click a lot to get to the real job logs
17:48:33 <dkranz> jog0: Ah
17:48:52 <dkranz> So if I see this I should go to the review and do a recheck if one hasn't been done already?
17:48:55 <jog0> in this case https://review.openstack.org/#/c/47365/ just the patch author looks at it
17:49:13 <jog0> or you if you want
17:49:28 <jog0> but when an unclassified failure comes up, we need to write a logstash query for it
17:49:46 <jog0> https://github.com/jogo/elasticRecheck/blob/master/queries.json
17:50:20 <jog0> afazekas: there isn't an easy way which is why we wrote this
17:50:27 <dkranz> jog0: I see.
17:51:12 <mtreinish> afazekas: about the logging I'm not opposed to doing that, but it's not exactly straightforward and I think it's a lower priority
17:51:13 <jog0> this should reduce the number of 'recheck no bug' comments, making our recheck numbers more accurate
17:51:26 <jog0> allowing us to better prioritize transient gate failures
17:52:39 <afazekas> Does anybody know why this bug was moved to Medium? https://bugs.launchpad.net/tempest/+bug/1205344 Can we enable the test case?
17:52:41 <uvirtbot> Launchpad bug 1205344 in nova "mkfs error in test_stamp_pattern" [Medium,Confirmed]
17:53:09 <mtreinish> jog0: ^^^ you and russellb were talking about that yesterday right?
17:53:11 <jog0> afazekas: yeah russell changed it  yesterday
17:53:24 <jog0> you can dig through the eavesdrop for nova for the exact wording
17:53:43 <jog0> how frequent was that bug
17:54:12 <afazekas> it was really frequent
17:54:43 <jog0> russellb: ^
17:55:08 <mtreinish> afazekas: you guys should probably take this offline (we've got ~5 min left)
17:55:29 <mtreinish> are there any other topics to bring up with what time we've got left?
17:55:36 <dkranz> Not from me
17:55:57 <giulivo> jog0, I think it is a nice tool; you said it will enable collaboration via gerrit, not git pull requests, right?
17:56:15 <giulivo> looks like managing the queries is the most collaborative part
17:56:24 <mtreinish> giulivo: yeah that's the plan we're probably going to be making the move today
17:56:33 <clarkb> at least starting the process of moving
17:56:45 <mtreinish> clarkb: :)
17:56:56 <clarkb> I don't want to sneak it by jeblair, fungi, and mordred and we are all pretty busy this week
17:57:16 <psedlak> afazekas touched on the issue that logs from parallel runs aren't very useful ... could we add thread/pid numbers to the log format or something like that?
17:58:04 <clarkb> psedlak: the subunit log has that
17:58:22 <clarkb> psedlak: I think we should probably attach the logs to the subunit for each test as they are run
17:58:29 <clarkb> this is what nova et al do
17:58:59 <psedlak> yes, I meant for example tempest.txt ...
17:59:15 <psedlak> clarkb: what has to be done for that?
18:00:00 <clarkb> psedlak: http://git.openstack.org/cgit/openstack/nova/tree/nova/test.py#n242
18:00:06 <mtreinish> psedlak: another option which might be simpler would be printing the testname with each log message
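mtreinish's alternative (print the test name with every log message) can be done with a standard `logging.Filter` that stamps each record. A sketch, assuming the test base class calls the hypothetical `set_current_test()` in its setUp; nothing like this is confirmed to exist in tempest.

```python
import logging
import threading

# Per-thread storage so concurrent threads in one worker don't clobber
# each other's test name. Separate worker processes each get their own.
_state = threading.local()


def set_current_test(name):
    """Hypothetical hook: call from the test base class's setUp()."""
    _state.name = name


class CurrentTestFilter(logging.Filter):
    """Attach the running test's id to every record as %(test_name)s."""

    def filter(self, record):
        record.test_name = getattr(_state, "name", "-")
        return True  # never drop records, only annotate them
```

Attach the filter to the handler rather than a logger (`handler.addFilter(CurrentTestFilter())`) so records propagated from child loggers are stamped too, then add `%(test_name)s` to the formatter string.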
18:00:14 <mtreinish> but we're out of time
18:00:20 <mtreinish> psedlak: we can pick this up on -qa
18:00:23 <mtreinish> #endmeeting