17:00:34 <mtreinish> #startmeeting qa
17:00:34 <openstack> Meeting started Thu Jul  3 17:00:34 2014 UTC and is due to finish in 60 minutes.  The chair is mtreinish. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:37 <openstack> The meeting name has been set to 'qa'
17:00:45 <mtreinish> Hi, who's here today?
17:00:50 <k4n0> o/
17:00:50 <asselin> hi
17:00:53 <mkoderer> o/
17:00:58 <mtreinish> #link https://wiki.openstack.org/wiki/Meetings/QATeamMeeting#Proposed_Agenda_for_July_3_2014_.281700_UTC.29
17:00:59 <salv-orlando> aloha
17:01:02 <mtreinish> ^^^ Today's agenda
17:01:19 <dkranz> o/
17:01:28 <jlanoux> hi
17:01:55 <mtreinish> ok let's get started
17:02:04 <mtreinish> #topic Spec review day July 9th
17:02:15 <mtreinish> #link http://lists.openstack.org/pipermail/openstack-dev/2014-July/039105.html
17:02:26 <mtreinish> so I'm not sure everyone saw that ML post
17:02:36 <mtreinish> but next Wed. we're going to have a spec review day
17:02:42 <mkoderer> ok
17:02:49 <mtreinish> the goal is to go through the backlog on the qa-specs repo
17:02:56 <mtreinish> which has been a pretty slow process
17:03:17 <mtreinish> so if everyone could concentrate on spec reviews then that would be awesome
17:03:59 <mtreinish> that's all I had on this topic. So unless someone has something else to add we can move on
17:04:14 <asselin> added to my calendar
17:04:37 <mtreinish> #topic Mid-cycle Meet-up full, registration closed
17:04:47 <mtreinish> #link http://lists.openstack.org/pipermail/openstack-dev/2014-July/039209.html
17:05:05 <mtreinish> so just another announcement that we have no more space available for the midcycle meetup
17:05:08 <mkoderer> mtreinish: there was one registration after you closed it ;)
17:05:17 <mtreinish> mkoderer: heh, yeah I noticed
17:05:30 <mtreinish> I'll talk to cody-somerville...
17:05:31 <dkranz> eventual consistency
17:05:36 <mkoderer> I put him on the list anyway
17:05:47 <mtreinish> yeah we actually had 1 free slot so it's ok
17:05:56 <mtreinish> so anyway now we really are full
17:06:14 <mtreinish> so if your name's not on the list, unfortunately there isn't any room
17:06:38 <mtreinish> sorry if you're unable to attend
17:07:03 <mkoderer> next time we have to plan with more ppl ;)
17:07:23 <mtreinish> honestly I wasn't expecting to hit 30
17:07:32 <mtreinish> so I'm a bit surprised
17:07:52 <mkoderer> sure but it's a good sign :)
17:08:10 <mtreinish> mkoderer: no disagreement from me :)
17:08:14 <mtreinish> that's all I had for this topic. So unless someone has something else we can move on
17:08:46 <mtreinish> #topic Specs Review
17:08:57 <mtreinish> #link https://review.openstack.org/#/c/104576/
17:09:07 <andreaf> hi - sorry I'm late
17:09:12 <mtreinish> so I pushed this out this morning to drop a spec that was superseded
17:09:31 <mtreinish> but andreaf brought up the point: do we really want to just rm the rst files if a spec is abandoned
17:09:37 <mtreinish> or archive them somewhere in tree
17:10:06 <mtreinish> so we have a record of them if they ever need to be restored
17:10:06 <dkranz> mtreinish: Can't hurt much to move them to some abandoned dir
17:10:23 <mtreinish> yeah, I don't have a strong opinion either way, rm just seemed simpler
17:10:25 <dkranz> mtreinish: I doubt there will be a large number, at least I hope not
17:10:30 <mtreinish> yeah hopefully not
17:10:46 <dkranz> It can also be recovered from git though, right?
17:11:02 <mtreinish> dkranz: yeah getting it from git is as simple as a revert
17:11:10 <andreaf> mtreinish dkranz: yes it can also be recovered from git - it's just that a folder is more visible / accessible
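(For reference, a minimal sketch of the git recovery being discussed, assuming the spec lived at specs/example.rst - a hypothetical path: a removed file can usually be located and restored from history with

    git log --diff-filter=D -- specs/example.rst   # find the commit that removed it
    git checkout <commit>^ -- specs/example.rst    # restore it from that commit's parent

or, as mtreinish notes, by reverting the removal commit outright with git revert <commit>.)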
17:11:28 <dkranz> andreaf: But realistically who will want to see it?
17:11:38 <mtreinish> dkranz: yeah that's what I was thinking
17:11:48 <dkranz> andreaf: Anyway, I don't have a strong opinion either
17:11:53 <mtreinish> if it's abandoned it means it's not going to be implemented
17:12:03 <mtreinish> and why do we want to archive it
17:12:19 <mkoderer> and the reason why it's abandoned is in gerrit?
17:12:20 <mtreinish> the implemented dir is different because it's pseudo documentation of features
17:12:22 <andreaf> dkranz mtreinish so can a spec stay in review forever?
17:12:29 <dkranz> And it will bitrot anyway as things change.
17:12:53 <mkoderer> I would simply remove it...
17:12:54 <mtreinish> andreaf: yes, I'm not sure cores can force an abandon, just -2
17:13:03 <andreaf> I was thinking about specs which are interesting but abandoned because no one has time to work on them atm
17:13:17 <mtreinish> andreaf: that's the TODO thing again
17:13:41 <mkoderer> andreaf: ah, ok that can be useful
17:13:47 <mkoderer> like a backlog of specs
17:13:53 <mtreinish> andreaf: so there is probably a class of things like that: specs without owners
17:13:57 <dkranz> +1
17:14:02 <k4n0> I have a question on specs, do new tests have to be proposed via a spec or a bug?
17:14:03 <mtreinish> but that's different and it hasn't come up yet
17:14:21 <dkranz> k4n0: Not by bug
17:14:24 <mtreinish> k4n0: can we discuss that after the meeting in openstack-qa?
17:14:28 <k4n0> ok
17:14:31 <k4n0> thanks
17:15:12 <mtreinish> ok so I'm thinking the direction we should take with this case and others like it when the spec is superseded is to just rm the file
17:15:21 <andreaf> mtreinish: so I don't have a very strong opinion on an abandoned folder, I thought it could be useful for new people to look into that folder to pick specs
17:15:30 <mtreinish> and if there is a case where the owner can't continue the work we can revisit archiving them somewhere else
17:15:41 <andreaf> mtreinish: ok sounds good
17:15:53 <mtreinish> ok then let's move on the next spec on the agenda
17:15:53 <dkranz> andreaf: There is a difference between abandoned and no one has done it yet.
17:16:11 <mtreinish> #link https://review.openstack.org/#/c/101232/
17:16:29 <mtreinish> I'm assuming yfried put this on the agenda
17:16:35 <mtreinish> but I'm not sure what he wanted to discuss
17:16:35 <dkranz> mtreinish: Yes
17:16:50 <dkranz> mtreinish: He wants another core to review it
17:17:00 <dkranz> mtreinish: You disqualified yourself as a co-author
17:17:26 <mtreinish> dkranz: this is a spec, not the addcleanup patch :)
17:17:42 <mtreinish> that bounced on merge conflict after the +A
17:17:47 <dkranz> mtreinish: Oh, sorry
17:18:11 <dkranz> mtreinish: I guess he wanted some feedback, even if not a formal review
17:18:31 <dkranz> mtreinish: I think he is eager to get going but thinks it might be controversial
17:18:53 <mtreinish> ok, well I'll take a look at it. It sounds controversial from the commit summary
17:18:55 <andreaf> dkranz: I'll add it to my review list
17:19:06 <dkranz> mtreinish: thanks
17:19:24 <mtreinish> especially given how much work is involved in a major organizational refactor
17:19:43 <mtreinish> ok are there any other specs that people would like to discuss?
17:19:57 <dkranz> mtreinish: We have some movement on the tempest config script
17:20:21 <dkranz> mtreinish: I left two more comments but am almost ready to give my +2
17:20:29 <mtreinish> dkranz: yeah I saw, I need to take another pass at it too
17:21:01 <dkranz> mtreinish: I am going to also add that the discovery part should be its own module that can be shared with the verify script or anything else that needs it
17:21:27 <andreaf> dkranz: +1
17:21:49 <mtreinish> dkranz: that makes sense to me, you can just break it out of what's in the verify script
17:21:52 <mtreinish> and go from there
17:22:01 <mtreinish> but that should be an explicit work item then
17:22:15 <dkranz> mtreinish: right, that was going to be my additional comment which I will make right after meeting
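(To illustrate the shared discovery module idea, a minimal sketch in Python, assuming an identity client object that exposes list_services(); the module and function names are illustrative, not the actual Tempest code:

    # discovery.py - hypothetical helper shared by the config and verify scripts
    def discover_services(identity_client):
        """Return the set of service types registered in the keystone catalog."""
        services = identity_client.list_services()
        return {s['type'] for s in services}

    def service_enabled(identity_client, service_type):
        """Check whether a service type (e.g. 'volume') is registered."""
        return service_type in discover_services(identity_client)
)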
17:22:41 <mtreinish> ok, cool
17:22:55 <mtreinish> are there any other specs, otherwise lets move on
17:23:29 <mtreinish> #topic Blueprints
17:23:36 <mtreinish> #link https://blueprints.launchpad.net/tempest/
17:23:47 <mtreinish> does anyone who has an in-progress BP have a status update?
17:24:14 <mtreinish> we're down 2 from last week; we marked branchless tempest as complete, although one work item was spun off as a separate spec
17:24:20 <dkranz> mtreinish: The ui would not let me change https://blueprints.launchpad.net/tempest/+spec/client-checks-success to Started
17:24:23 <mtreinish> and the nova v3 test refactor was dropped
17:24:27 <dkranz> mtreinish: Not sure why
17:24:42 <dkranz> mtreinish: At least I could not figure out how to do it
17:24:45 <mtreinish> dkranz: hmm, I just did it for you
17:24:59 <mtreinish> didn't seem to complain
17:25:03 <dkranz> mtreinish: weird
17:25:05 <mtreinish> but lp is weird
17:25:21 <mtreinish> it still drives me crazy that I can't get new bp notifications
17:25:41 <dkranz> mtreinish: anyway I would appreciate a review of https://review.openstack.org/#/c/104290/ because it touches a lot of files
17:25:53 <dkranz> though in a simple way
17:26:02 <dkranz> and I hope to avoid rebase issues :)
17:26:11 <andreaf> dkranz: I'll have a look
17:26:27 <dkranz> That is most of the identity portion of client checking
17:26:36 <dkranz> andreaf: Thanks
17:26:37 <jlanoux> dkranz: me too
17:26:55 <mtreinish> dkranz: ok, I'll take a look. That's just to move the resp code checks into the clients right?
17:27:56 <dkranz> mtreinish: Right, along with a bug fix in the same code
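(For context, the gist of moving response-code checks into the clients looks roughly like the following; a minimal sketch, assuming a RestClient-style base class with an expected_success() helper - the names are illustrative, not the exact Tempest API:

    import json

    class FlavorsClient(RestClient):  # RestClient base class is assumed here

        def list_flavors(self):
            resp, body = self.get('flavors')
            # The client asserts the expected status itself, so individual tests
            # no longer need to repeat assertEqual(200, resp.status)
            self.expected_success(200, resp.status)
            return resp, json.loads(body)['flavors']
)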
17:28:30 <mtreinish> ok, are there any other BPs to discuss?
17:28:55 <mtreinish> dkranz: hmm, what happened to one logical change per commit :)
17:29:31 <mtreinish> ok, let's move on
17:29:32 <dkranz> mtreinish: The problem was that the fix for the bug and the client check change overwrite exactly the same code. You will see.
17:29:43 <mtreinish> ok, I was just giving you a hard time
17:29:47 <dkranz> mtreinish: :)
17:29:51 <mtreinish> #topic Grenade
17:30:04 <mtreinish> so I don't think sdague is around right now
17:30:15 <mtreinish> but there was a discussion of javelin2 on the ML
17:30:41 <mtreinish> and I suggested that we should avoid adding new features to javelin2 until we get it working in the grenade job
17:31:02 <mtreinish> I also know that EmilienM has been doing a bunch of work on adding other services to grenade
17:31:24 <mtreinish> but I haven't been following things that closely
17:31:45 <mtreinish> so unless anyone has something to add here we can just move on
17:32:42 <mtreinish> #topic Neutron testing
17:32:54 <mtreinish> salv-orlando: I'm sure you've got something for this topic :)
17:33:16 <salv-orlando> yes. Basically we have made some progress in making the full job voting.
17:33:32 <salv-orlando> in a nutshell the problem is not the job being "full" but rather being "parallel"
17:33:58 <salv-orlando> anyway, the top offenders have been identified, and we have patches for it.
17:34:21 <salv-orlando> However, I made a mess in one of them, as I did not correctly identify the root cause
17:34:32 <salv-orlando> that’s the bad patch: https://review.openstack.org/#/c/99182/
17:34:43 <mtreinish> heh, well parallel is always the trouble spot...
17:34:49 <salv-orlando> I now have the correct root cause and will push a patch soon.
17:35:25 <salv-orlando> needless to say, these failures are related to the neutron/nova event mechanism, which is the biggest feature introduced since we ‘fixed’ parallel testing back in February
17:35:57 <salv-orlando> beyond these failures we have a few remaining issues, mostly around ‘lock wait timeouts’
17:36:18 <salv-orlando> these issues affect both smoke and full jobs, but parallelism makes their frequency slightly higher.
17:36:39 <salv-orlando> We have people working on the ‘lock wait timeout’ issues, but I don’t have a timeline for that.
17:37:09 <mtreinish> salv-orlando: ok, so do you think after the bugginess around the event mechanism is resolved it's time to flip the switch?
17:37:23 <mtreinish> and live with the lock wait timeouts for a while
17:37:30 <salv-orlando> mtreinish: in a nutshell, yes.
17:37:46 <mtreinish> ok sounds sane to me
17:37:50 <salv-orlando> also because the lock wait timeout error is not always critical, from a job success perspective
17:37:52 * afazekas hopes just fixing the #1329546 will be enough for voting
17:38:12 <salv-orlando> afazekas: 1329546 is where I made the mistake in finding the root cause
17:39:00 <salv-orlando> There is a nova issue where in some cases the VM just boots and does not wait for an event, and I thought the opposite was also happening: the compute node waiting for an event that would never come
17:39:06 <salv-orlando> as you saw, this was not the case.
17:39:17 <mtreinish> salv-orlando: and on the full everywhere vs asymmetrical thing I'm fine either way. I'll bring it up during the project meeting next week
17:39:36 <salv-orlando> mtreinish: cool, so I’ll change the config patch to make the job full everywhere
17:39:48 <salv-orlando> so I guess we’ll put it directly on the integrated gate?
17:39:48 <mtreinish> just to see if there is a strong opinion either way
17:40:04 <mtreinish> salv-orlando: yeah if we go with it everywhere that'd be the best way to do it
17:40:21 <mtreinish> although I don't think all the projects are using the integrated-gate template
17:40:34 <mtreinish> so you might have to update it manually for a couple of projects
17:40:51 <salv-orlando> that’s all for the neutron full job side from me. If we’re lucky we can get these patches merged this week so the folks at the neutron code sprint next week will deal with the increased failure rate!
17:41:09 <afazekas> #link http://logstash.openstack.org/#eyJmaWVsZHMiOltdLCJzZWFyY2giOiJtZXNzYWdlOlwiZmFpbGVkIHRvIHJlYWNoIFZFUklGWV9SRVNJWkUgc3RhdHVzXCIgQU5EIG1lc3NhZ2U6XCJDdXJyZW50IHRhc2sgc3RhdGVcXDogcmVzaXplX2ZpbmlzaFwiIEFORCB0YWdzOlwiY29uc29sZS5odG1sXCIiLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsIm9mZnNldCI6MCwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0MDQyODA3NTYxNDN9 This is one of the issue types that happens significantly more frequently in the full job
17:41:36 <marun> I have a topic for discussion before we move on from Neutron, if there's time.
17:41:44 <afazekas> Solving this issue might be enough for voting
17:41:58 <mtreinish> marun: sure
17:42:27 <salv-orlando> afazekas: that signature includes both bug 1329546 and 1333654
17:42:29 <uvirtbot> Launchpad bug 1329546 in nova "Upon rebuild instances might never get to Active state" [Undecided,In progress] https://launchpad.net/bugs/1329546
17:42:30 <uvirtbot> Launchpad bug 1333654 in nova "Timeout waiting for vif plugging callback for instance" [Undecided,In progress] https://launchpad.net/bugs/1333654
17:43:24 <salv-orlando> afazekas: you’ll have to dig into logs to see if the action fails because of an error while posting to instance_external_event
17:43:28 <salv-orlando> or because of a timeout
17:43:56 <salv-orlando> marun: go ahead. I’m done if nobody has anything to add on the full job.
17:44:10 <marun> ok
17:44:14 <marun> As you all know, nova network/neutron parity is mandated by the TC, and validating the work items requires multi-node testing
17:44:34 <marun> We know that it's not going to be possible to do multi-node in the gate in the near term, so we're left with 3rd party testing.
17:44:54 <marun> We still have to hash out who's going to provide the resources for 3rd party testing, but that's a separate concern.
17:45:32 <marun> I'd like to see Tempest accept multinode-requiring tests, with the proviso (as nova and neutron already require) that such tests are run by 3rd party jobs.
17:45:53 <mtreinish> marun: awesome if there is someone running a ci with multinode
17:45:57 <clarkb> marun: why is it not possible in the gate near term?
17:46:00 <mtreinish> that opens up all sorts of new testing
17:46:03 <marun> Failing that, we'll have to put multi-node scenario tests in the Neutron tree, and it will be harder to get good oversight from the tempest team.
17:46:14 <marun> clarkb: near-term -> in the next month?
17:46:23 <clarkb> marun: I mean we support multinode testing now
17:46:25 <marun> clarkb: I'm happy to be wrong :)
17:46:26 <clarkb> no one is using it
17:46:33 <marun> clarkb: wow, news to me.
17:46:38 <dkranz> clarkb: news to me too
17:46:40 <mtreinish> marun: the only requirement related to this for tempest is that code in new tests gets executed in a ci system
17:46:49 <mtreinish> because tempest is mostly self verifying
17:46:55 <sdague> marun: honestly, it's probably a week's worth of work for someone to make it do multinode devstack testing
17:47:00 <dkranz> clarkb: Is there a wiki page or something about it?
17:47:03 <sdague> but no volunteers
17:47:15 <clarkb> dkranz: no. no one has done anything with it so nothing to wiki
17:47:24 <marun> mtreinish: awesome.  so whether we can take advantage of upstream or 3rd party, so long as we are running it, the tests can go in.
17:47:39 <marun> sdague: we'll have volunteers ;)
17:47:43 <dkranz> sdague: I will try to find a volunteer.
17:47:59 <mtreinish> marun: yep, we just don't want to land code that hasn't been run. Which is why we've been blocking things that require a multi-node env
17:47:59 <afazekas> How many nodes could we use in a multinode job?
17:48:12 <clarkb> afazekas: its technically arbitrary but starting with 2 is probably easiest
17:48:16 <sdague> afazekas: honestly, start with 2
17:48:17 <andreaf> sdague, mtreinish: there are things that may not work in multinode, one that comes to mind is the log parsing - unless multinode uses some rsyslog server or so
17:48:20 <marun> mtreinish: totally understand, wouldn't have it any other way.
17:48:36 <sdague> andreaf: there are definitely things that have to be sorted
17:48:40 <mtreinish> andreaf: well that's something for whoever implements it to solve :)
17:49:05 <sdague> but the real issue is just no one is taking advantage of the nodepool facility yet
17:49:07 <afazekas> sdague: I can configure devstack to work in multi-node, but I do not know how it could be added to the gate
17:49:11 <andreaf> marun: so do you need 1 node as now +1 compute or do you need to split neutron control plane as well?
17:49:27 <dkranz> sdague: sounds like afazekas is a volunteer perhaps
17:49:29 <sdague> afazekas: right, that's where we need a person to dive into that
17:49:34 <marun> andreaf: more than 1 compute node
17:49:53 <afazekas> sdague: can we discuss it at the meetup?
17:49:57 <marun> andreaf: so that we can validate connectivity between vm's on multiple nodes and test nova ha/neutron dvr
17:49:59 <andreaf> marun: ok that's great - it would allow also to run migration tests
17:50:06 <sdague> initial multinode configurations should be all services (including compute) on 1 node, and compute only on 2nd node
17:50:08 <marun> andreaf: great!
17:50:10 <sdague> so you'll have 2 computes
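(For anyone who picks this up, a rough sketch of what the localrc on the compute-only second node might contain; the addresses are placeholders and the exact service list is an assumption, not a tested configuration:

    # localrc on the compute-only subnode (illustrative values)
    HOST_IP=192.168.1.11          # this node's address
    SERVICE_HOST=192.168.1.10     # the all-in-one controller node
    MYSQL_HOST=$SERVICE_HOST
    RABBIT_HOST=$SERVICE_HOST
    GLANCE_HOSTPORT=$SERVICE_HOST:9292
    ENABLED_SERVICES=n-cpu,n-net,n-api-meta,c-vol
)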
17:50:50 <sdague> afazekas: sure, we should have all the right people there
17:51:07 <mtreinish> yeah it's probably a good topic for the meetup
17:51:12 <afazekas> ok
17:51:13 <mtreinish> I'll add it to the potential topic list
17:51:23 <clarkb> I think someone could start working on it now though
17:51:27 <andreaf> mtreinish: yes good idea :)
17:51:29 <mtreinish> clarkb: very true
17:51:32 <clarkb> I am not convinced meetup is necessary to start the work
17:51:46 <clarkb> might be good to take the hard bits to the meetup
17:51:49 <afazekas> psedlak: ^
17:52:08 <dkranz> clarkb: Can you give enough info for some one to get started before then?
17:52:22 <andreaf> clarkb: so does nodepool already understand the concept of having more than one node associated to a job?
17:53:08 <clarkb> andreaf: yes
17:53:15 <mtreinish> ok well we're at < 8 min left, so can you guys take the multinode conversation to -infra or -qa after the meeting
17:53:22 <clarkb> yes please to -infra
17:53:25 <andreaf> does it make sense to have a qa-spec or infra-spec for this?
17:53:26 <afazekas> clarkb: Are all the test nodes on the same network?
17:53:34 <clarkb> afazekas: we can talk about it in -infra
17:53:38 <mtreinish> ok let's move on
17:53:43 <mtreinish> #topic Critical Reviews
17:53:58 <mtreinish> so does anyone have any reviews that they'd like to get extra eyes on?
17:54:47 <mtreinish> wow this must be a first no one has any reviews that need extra attention :)
17:55:07 <adam_g> I've got a few up that add new compute feature flags, helpful for getting our ironic testing rolling in a more vanilla fashion
17:55:13 <mtreinish> andreaf: links?
17:55:18 <adam_g> sec
17:55:34 <mtreinish> oops stupid tab complete
17:55:34 <dkranz> mtreinish: I already mentioned mine :)
17:55:50 <adam_g> https://review.openstack.org/102628 https://review.openstack.org/101381  actually, I guess it's only one at this point but there will be more coming
17:56:15 <mtreinish> #link  https://review.openstack.org/102628
17:56:35 <mtreinish> #link https://review.openstack.org/#/c/101381/
17:56:41 <adam_g> thanks :)
17:57:36 <mtreinish> ok if there aren't any other reviews I guess we can move on to a brief open discussion...
17:57:41 <mtreinish> #topic Open Discussion
17:58:13 <andreaf> mtreinish: we said at some point we would discuss how to use topics in reviews
17:58:21 <andreaf> mtreinish: but we never did until now
17:58:34 <mtreinish> we did? I don't recall that
17:58:46 <andreaf> mtreinish: I still think it would be helpful to have a way of filtering reviews based on the service
17:58:55 <andreaf> at the summit
17:59:08 <mtreinish> andreaf: well that's part of the point of having only one bp for a testing effort
17:59:17 <mtreinish> to group reviews by an effort
17:59:29 <mtreinish> the problem is that the burden is on the submitter
17:59:49 <mtreinish> and I'm not sure forcing a specific topic is something we want to nit pick on
18:00:19 <mtreinish> although I guess we do nitpick on the commit msg mentioning the bp
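(As a concrete illustration of grouping reviews by effort: when a change's topic is set to the blueprint - for example the client-checks-success blueprint mentioned earlier - reviews can be filtered in Gerrit with a query along the lines of

    status:open project:openstack/tempest topic:bp/client-checks-success
)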
18:00:23 <mtreinish> anyway that's time
18:00:27 <mtreinish> thanks everyone
18:00:30 <mtreinish> #endmeeting