17:00:03 #startmeeting qa
17:00:04 Meeting started Thu Sep 19 17:00:03 2013 UTC and is due to finish in 60 minutes. The chair is mtreinish. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:07 The meeting name has been set to 'qa'
17:00:20 who's here for the meeting?
17:00:37 hi
17:00:42 hii
17:00:42 hi
17:01:10 hi
17:01:10 hi
17:01:12 ok here's today's agenda:
17:01:14 #link https://wiki.openstack.org/wiki/Meetings/QATeamMeeting
17:01:23 let's get started
17:01:34 #topic neutron testing status
17:01:39 hi
17:01:41 mlavalle: are you around?
17:02:10 Yes I am
17:02:21 ok, any update on neutron testing?
17:02:44 mtreinish: I've been debugging isolated_creds
17:03:13 mtreinish: as you know, I am having a race issue. With the latest run you sent me…..
17:03:33 ok
17:03:43 I think I need to add exception handling when I don't find the port in cleanup_ports
17:03:47 hi
17:04:02 that's going to be my next patchset
17:04:20 mlavalle: ok, we can talk about the details of the patch after the meeting; I had some questions
17:04:31 is there anything else on the neutron front?
17:04:43 I would also like to request a run of the neutron_full gate job
17:04:44 can we merge the partially working isolation patch now?
17:05:11 afazekas: it'll fail on clean up
17:05:20 mlavalle: I think it's in the experimental queue
17:05:21 right now it is disabled. I want to see the status of a run, after all the changes that have been merged
17:05:36 can you point me to it?
17:06:05 mlavalle: I think you leave a zero comment 'check experimental'
17:06:08 dkranz: is that right?
17:06:14 mtreinish: Yes.
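[Editor's note] The exception handling mentioned at 17:03:43 for cleanup_ports could look roughly like the sketch below. It is not the actual isolated_creds code: `PortNotFound`, the `client` object, and its `delete_port` method are hypothetical stand-ins, and the only point is that a port which has already disappeared during cleanup is skipped instead of aborting the whole cleanup.

```python
class PortNotFound(Exception):
    """Hypothetical stand-in for the client's 'port not found' error."""


def cleanup_ports(client, port_ids):
    """Delete each port, tolerating ports that are already gone.

    Returns the ids that were missing, so the caller can log them.
    `client.delete_port` is an assumed interface, not a real API.
    """
    missing = []
    for port_id in port_ids:
        try:
            client.delete_port(port_id)
        except PortNotFound:
            # Another worker (or the service itself) removed the port first;
            # for cleanup purposes that is the outcome we wanted anyway.
            missing.append(port_id)
    return missing
```

Returning the missing ids rather than silently dropping them keeps the race visible in the test logs.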
17:07:00 mtreinish: ok, with that I will be able to gauge how much more work we need to do to fix it for good
17:07:12 that's all I have
17:07:13 ok cool
17:07:21 let's move on then
17:07:28 #topic blueprints
17:07:49 The patch brings us closer to working network isolation, and it is easier to continue if it is merged; it does not have any negative impact on the current jobs
17:07:49 are there any blueprints that we need to discuss?
17:08:32 afazekas: I'm not comfortable merging code that doesn't work
17:09:00 ok if there aren't any blueprints to bring up then let's move on
17:09:14 #topic Critical Reviews
17:09:23 afazekas: I don't feel comfortable either, yet. let me wrestle with it a little longer
17:09:28 are there any reviews that people need to get eyes on?
17:09:28 mlavalle: You should do the check experimental in a tempest patch. It was not added to all projects.
17:10:17 mtreinish: https://review.openstack.org/#/c/43039/
17:10:20 this one
17:10:28 #link https://review.openstack.org/#/c/43039/
17:10:29 mtreinish: https://review.openstack.org/#/c/38995/ is very old and I hope can be approved.
17:10:38 #link https://review.openstack.org/#/c/38995/
17:11:05 ok I've got them open on my browser, I'll take a look after the meeting
17:11:06 Anju: That link was merged
17:11:20 and any other cores can pick them off too
17:11:31 mtreinish, one more, it's not critical but useful to figure what is good and what isn't for the stable branches
17:11:32 https://review.openstack.org/#/c/45808/
17:11:33 Anju: yeah it is
17:11:44 Anju: it's also a nova patch
17:11:52 mtreinish: https://review.openstack.org/#/c/39621/
17:11:56 mtreinish: this one
17:11:58 #link https://review.openstack.org/#/c/45808/
17:12:16 btw https://review.openstack.org/#/c/45808/ is a cherry pick
17:12:16 I want to know how to proceed with the v3 tests
17:12:21 https://review.openstack.org/#/c/39621/ got several -1's
17:12:58 giulivo: yeah I see that; we can talk about that after the meeting
17:13:12 ok are there any other reviews?
17:13:22 #link https://review.openstack.org/#/c/39621/
17:13:34 mkoderer: I don't think the -1s are valid at this point.
17:13:48 mtreinish, mkoderer, giulivo, dkranz, afazekas: I know the topic is not for v3 tests, but I need a direction...
17:14:11 mtreinish, mkoderer, giulivo, dkranz, afazekas: two times -1 in this patch
17:14:12 dkranz: yeah it just needs a rebase probably
17:14:12 sorry to interrupt -- but do we not have a security meeting today?
17:14:27 malini2: it's in an hour I think
17:14:39 oh
17:15:00 Anju: there is nothing wrong with the v3 tests but that one just needs a rebase
17:15:12 that commit is a straight copy and paste of the v2 tests
17:15:31 mtreinish: To be clear, we have adopted the copy/modify approach and are ready to move on any v3 reviews, right?
17:15:52 afazekas: is this ok?
17:16:10 dkranz: I think that's what the plan is. I didn't really agree with it but the consensus was against me so I'm fine with it now :)
17:16:25 your comments to more inheritance and less copy-paste?
17:16:33 Anju: It is the faster way to get v3 tests; adding same inheritance is possible later
17:16:39 mtreinish: What did you propose?
17:16:56 just adding v3 tests as individual patches, don't bother copying and pasting
17:17:05 and maybe break it up into smaller patches in a longer series
17:17:24 mtreinish: but the end result is still a full copy of tests for v3?
17:17:33 yeah
17:17:59 mtreinish: I was talking about copy/modify vs attempted inheritance
17:18:23 yeah the copy/modify is the way they're going to do it
17:18:32 mtreinish: I don't remember a discussion and don't really have an opinion about how we get there
17:18:48 mtreinish: But we can move on now.
17:18:51 it was on the ml and partially in a review at one point
17:18:59 yeah ok let's go to the next topic
17:19:11 #topic How to handle bug fixes in launchpad
17:19:18 so this one is mine
17:19:42 during the bug day earlier this week we noticed that all the fixed bugs were in the fix committed state
17:20:07 it used to immediately go into fix released after the patch was merged
17:20:21 there seemed to be a bit of disagreement over which approach was better
17:20:31 IMHO after it is merged it should go to the Fix Released state automatically or with a 1-7 day delay
17:20:40 afazekas: I agree
17:20:42 so I just wanted to bring it up during the meeting to see if there was a strong opinion one way or the other
17:21:07 afazekas: ok yeah that's what I'm leaning towards again
17:21:31 I can revert for jeepb to switch it back to the old behavior then
17:21:40 mtreinish: Thanks.
17:21:54 ok this was a quick topic
17:22:01 #topic Bogus errors in logs one more time
17:22:04 dkranz: you're up
17:22:32 So I was trying to see if there was a failure in the neutron logs and saw that it too has lots of bogus ERROR/stacktrace
17:22:35 mtreinish, and eventually move whatever is in fix committed to fix released?
17:22:48 That show up even on successful runs
17:22:54 the tempest.log should have a thread/pid number
17:23:10 giulivo: yeah that was the intent of switching to leaving it in fix committed
17:23:13 infra has agreed they could put in a regexp on the logs to fail a build if bogus errors show up
17:23:46 Part of fixing this issue is educating developers about what log.error should be used and not used for.
17:24:10 The second part is giving some priority to fixing the bogus ones
17:24:10 dkranz: yeah I've seen overuse of log.ERROR before too
17:24:21 I hope it will not lead to not using the error or critical level when it is required
17:24:22 I think sdague was working on a whitelist at one point
17:24:31 The third part is defining a "whitelist" so that infra can start failing builds that introduce new ones.
17:24:35 or that might have just been for stacktraces
17:25:00 mtreinish: I haven
17:25:20 I haven't heard anyone disagree with these points but I'm not sure how to make it happen.
17:25:26 Particularly the priority part
17:25:55 I don't know why people don't think this has customer/user impact
17:26:16 It was a major headache with the system I ran.
17:26:29 dkranz: well we can make a blueprint for this; for the priority one it's really 2 parts: identifying the spurious log messages and then opening bugs for them
17:26:45 and marking them as high priority bugs
17:26:51 it would be nice to see some of this before Havana is cut
17:26:54 mtreinish: Yeah. There are already a bunch of bugs I filed a while ago.
17:27:05 mtreinish: I can take another pass at this.
17:27:25 But really someone from each team should grep their logs and go from there
17:27:27 I think the best way to start going about this is to send out a post to the ML to try to get a wider audience
17:27:47 +1
17:28:09 mtreinish: Sure. But I've done that before :(
17:28:10 we can use logstash for this I think
17:28:29 jog0: yeah logstash will be useful for this too
17:28:32 jog0: How?
17:28:33 (for finding what stacktraces happen today)
17:28:41 dkranz, mtreinish how about going the opposite route, which is collecting what is logged as error also when tempest succeeds and posting it to the -dev list?
17:28:54 giulivo: That is exactly what I was talking about
17:29:07 oh I thought asking -dev to inspect
17:29:25 giulivo: Yes, to inspect the bogus ERROR in their logs
17:29:51 ok so I was just suggesting -qa/-infra inspect the logs first and post the bogus messages
17:30:22 @message:Traceback* AND @fields.filename:"logs/screen-n-api.txt"
17:30:34 dkranz: ^ put that in logstash.openstack.org
17:30:53 dkranz: there should probably be a summit topic on this too.
17:31:06 mtreinish: OK, I'll put one in.
17:31:11 but we want to start it before havana
17:31:11 'message:Traceback* AND @fields.filename:"logs/screen-n-api.txt" AND @fields.build_status:"SUCCESS"' will tell you for only passing jobs
17:31:30 jog0: OK
17:31:59 jog0: There are also bogus ERROR without stacktrace
17:32:13 there is also the find_stack_traces.py script in tools
17:32:23 although I haven't used it at all
17:32:46 mtreinish: I will do something
17:33:03 where should the whitelist be stored? is the devstack-gate repo good for the whitelist?
17:33:27 afazekas: it should probably be separate
17:33:31 afazekas: Sean had some idea for this but didn't tell me details
17:34:06 dkranz: well then we can all bug him when he gets back.
17:34:16 dkranz: it probably wouldn't hurt to start a bp on this now
17:34:28 and we can fill in the details after we have some more discussion about how to do it
17:34:36 mtreinish: OK, I'll do that.
17:34:47 tempest or infra bp?
17:34:49 mtreinish: The problem is that it is really a cross-project blueprint
17:35:05 mtreinish: We as a community don't always do so well with those.
17:35:38 dkranz: yeah, it really doesn't fit too well in one project. Just stick it somewhere I guess
17:35:46 we can make individual project bps too
17:35:52 and use dependency to track them
17:35:59 that's what I did for the coverage extension
17:36:00 mtreinish: That's a good idea.
17:36:10 dkranz: https://blueprints.launchpad.net/tempest/+spec/tempest-coverage-reporting
17:36:17 mtreinish: I'll figure out which projects have issues and open them in each
17:36:48 mtreinish: tempest is probably the one project that won't require code changes :)
17:36:50 isn't infra the obvious choice for that (bogus errors)? what would be the reason to create it as a tempest bp?
17:36:54 if we can gate on no new stacktraces before Havana is out that would be really amazing
17:37:15 jog0: Yes
17:37:40 jog0: To be clear, are you distinguishing between incorrect ERRORs that have stacktraces and those that don't?
17:38:10 jog0: If we make "rules" for developers we have to be very precise
17:39:06 if there is an ERROR message without detailed info it is a double -1 :)
17:39:58 IMO, ERROR in log should be for something the operator *should* understand/investigate
17:40:26 And can thus be used as a monitoring alert
17:40:45 I agree with dkranz. I think it's better to take a smaller step at first and just worry about stacktraces and not errors
17:40:46 +1
17:40:50 swift is hopeless in this regard
17:41:27 dkranz: +1 (not about swift, I haven't really looked at swift logs much)
17:41:37 dkranz: ok is there anything else on this topic?
17:41:39 Gotta start somewhere. +1
17:41:39 jog0: I'm ok with that.
17:41:56 Just want to make sure if people agree with my statement about ERROR above
17:42:08 at 13:39
17:42:21 dkranz: I assume syslog can be configured to separate the swift logs
17:42:34 Yes
17:42:39 dkranz: I do (that's what the +1 was for)
17:42:39 afazekas: Sure
17:42:42 dkranz: why not punt the ERROR msg stuff till Icehouse
17:42:53 mtreinish: OK, great. I'll report next week on my progress.
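[Editor's note] The "gate on no new stacktraces" idea with a whitelist could be sketched roughly as below. This is not the find_stack_traces.py script from tools, nor whatever infra would actually deploy. It assumes each traceback's first log line carries the module name in front of the `Traceback (most recent call last)` marker, and the whitelist entries are hypothetical regular expressions for already-reported noise.

```python
import re

# Marker that starts a Python stack trace in a service log.
TRACEBACK_RE = re.compile(r"Traceback \(most recent call last\)")

# Hypothetical whitelist: patterns for stack traces that already have bugs
# filed. The module name used here is made up for the example.
DEFAULT_WHITELIST = [
    re.compile(r"TRACE example\.known_noisy_module "),
]


def find_new_tracebacks(lines, whitelist=DEFAULT_WHITELIST):
    """Return (line_number, text) for traceback lines not on the whitelist."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if not TRACEBACK_RE.search(line):
            continue
        if any(pattern.search(line) for pattern in whitelist):
            continue  # known and tracked, do not fail the build for it
        hits.append((number, line.rstrip("\n")))
    return hits
```

A gate job built on something like this would fail only when the returned list is non-empty, so existing noise stays tolerated while new stack traces are rejected.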
17:42:55 consensus and all)
17:42:59 jog0: yeah that makes sense
17:43:18 dkranz: ok you'll get a semi-permanent spot on the agenda then :)
17:43:26 mtreinish: :)
17:43:36 ok then let's open the floor
17:43:39 Error in log means investigate to any ops guy. So better have enough info for them to stay to dog e
17:43:41 #topic open discussion
17:44:08 are there any topics to bring up with what time is left?
17:44:31 mtreinish: you can mention the work we have been doing
17:44:48 oh yeah this is a good forum to discuss that
17:45:01 The pid/thread should be logged in the tempest.log, in order to distinguish the logs from different workers
17:45:20 so jog0 and I have been working on a bot that watches the gerrit stream for tempest failures
17:45:31 and then uses logstash to find fingerprints for open bugs
17:45:47 and reports back on irc and the gerrit commit with what it found
17:45:48 Cool
17:46:01 you've probably seen RecheckWatchBot on the -qa channel
17:46:04 well a human still has to find the fingerprint but then we use logstash to classify the failures
17:46:23 yeah that's what I meant (I didn't word it clearly)
17:46:52 we're going to be moving it over to infra soon and will have everything up in gerrit too
17:47:36 mtreinish: That's cool.
17:47:48 mtreinish: Who do we expect to look at this, and what do we expect them to do?
17:48:06 dkranz: it's mostly to lower the developer load for using recheck
17:48:12 dkranz: you and the patch author
17:48:13 and avoid duplicate bugs
17:48:28 What is the easiest way to find the logs related to a recheck? On the recheck page you see the change number, but you need to click a lot to get to the real job logs
17:48:33 jog0: Ah
17:48:52 So if I see this I should go to the review and do a recheck if one hasn't been done already?
17:48:55 in this case https://review.openstack.org/#/c/47365/ just the patch author looks at it
17:49:13 or you if you want
17:49:28 but when an unclassified failure comes up, we need to write a logstash query for it
17:49:46 https://github.com/jogo/elasticRecheck/blob/master/queries.json
17:50:20 afazekas: there isn't an easy way, which is why we wrote this
17:50:27 jog0: I see.
17:51:12 afazekas: about the logging I'm not opposed to doing that, but it's not exactly straightforward and I think it's a lower priority
17:51:13 this should reduce the number of recheck no bugs, making our recheck numbers more accurate
17:51:26 allowing us to better prioritize transient gate failures
17:52:39 Does anybody know why this bug was moved to medium? https://bugs.launchpad.net/tempest/+bug/1205344 can we enable the test case?
17:52:41 Launchpad bug 1205344 in nova "mkfs error in test_stamp_pattern" [Medium,Confirmed]
17:53:09 jog0: ^^^ you and russellb were talking about that yesterday right?
17:53:11 afazekas: yeah russell changed it yesterday
17:53:24 you can dig through the eavesdrop for nova for the exact wording
17:53:43 how frequent was that bug
17:54:12 it was really frequent
17:54:43 russellb: ^
17:55:08 afazekas: you guys should probably take this offline (we've got ~5 min left)
17:55:29 are there any other topics to bring up with what time we've got left?
17:55:36 Not from me
17:55:57 jog0, I think it is a nice tool, you said it will enable coop via gerrit, not git pull requests, right?
17:56:15 looks like managing the queries is the most coop part
17:56:24 giulivo: yeah that's the plan, we're probably going to be making the move today
17:56:33 at least starting the process of moving
17:56:45 clarkb: :)
17:56:56 I don't want to sneak it by jeblair, fungi, and mordred and we are all pretty busy this week
17:57:16 afazekas touched on the issue that logs from parallel runs are not very useful ... could we add thread/pid numbers to log format or something like that?
17:58:04 psedlak: the subunit log has that
17:58:22 psedlak: I think we should probably attach the logs to the subunit for each test as they are run
17:58:29 this is what nova et al do
17:58:59 yes, I for example meant the tempest.txt ...
17:59:15 clarkb: what has to be done for that?
18:00:00 psedlak: http://git.openstack.org/cgit/openstack/nova/tree/nova/test.py#n242
18:00:06 psedlak: another option which might be simpler would be printing the testname with each log message
18:00:14 but we're out of time
18:00:20 psedlak: we can pick this up on -qa
18:00:23 #endmeeting
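[Editor's note] The request to add pid/thread information to the log format, so that output from parallel workers can be told apart, could be sketched with the standard library's logging module as below. This is not tempest's actual logging setup: the logger name and the `make_worker_logger` helper are hypothetical, and only the `%(process)d` and `%(threadName)s` format attributes are standard.

```python
import io
import logging

# %(process)d and %(threadName)s are standard LogRecord attributes; adding
# them lets interleaved lines be attributed to a specific worker.
LOG_FORMAT = (
    "%(asctime)s %(process)d %(threadName)s "
    "%(levelname)s [%(name)s] %(message)s"
)


def make_worker_logger(name, stream):
    """Return a logger (hypothetical helper) that writes LOG_FORMAT lines."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the example output in one place
    return logger


# Usage: write one record to an in-memory stream and show it.
buffer = io.StringIO()
log = make_worker_logger("tempest.sketch", buffer)
log.info("creating server")
print(buffer.getvalue(), end="")
```

The simpler alternative raised at 18:00:06, printing the test name with each message, would mean putting that name into the message or into a custom record attribute instead.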