17:02:20 <krtaylor> #startmeeting ironic-qa
17:02:20 <openstack> Meeting started Wed Dec  9 17:02:20 2015 UTC and is due to finish in 60 minutes.  The chair is krtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:02:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:02:24 <DuncanT> hemna: Moving to main channel. Model updates - we want to kill db access in drivers
17:02:24 <openstack> The meeting name has been set to 'ironic_qa'
17:02:34 <krtaylor> anyone here for ironic-qa?
17:02:35 <hemna> DuncanT, +1
17:02:39 <[1]cdearborn> o/
17:02:41 <rpioso> o/
17:02:43 <jroll> krtaylor: SUP :)
17:02:50 <sinval> o/
17:02:54 <liliars> o/
17:03:16 <krtaylor> hi everyone
17:03:25 <kcalman> o/
17:03:32 <krtaylor> here is the agenda:
17:03:36 <krtaylor> #link https://wiki.openstack.org/wiki/Meetings/Ironic-QA
17:03:59 <krtaylor> not much there, but I know better than to think we won't have anything to talk about  :)
17:04:26 <krtaylor> #topic Announcements
17:04:54 <sinval> We have great news about OneView CI: we are able to test the deployment workflow, and I expect we will publish the deploy test job by the end of this week :)
17:05:22 <krtaylor> good
17:05:28 <krtaylor> and we have one on the agenda
17:05:31 <krtaylor> OpenStack Health Dashboard
17:05:42 <krtaylor> not sure if that was sambetts
17:05:47 <jroll> also just a quick note, we're making the ipxe job voting, it's been stable for a while: https://review.openstack.org/#/c/255382/
17:06:01 <krtaylor> but here's the link for the Dashboard:
17:06:14 <krtaylor> #link http://status.openstack.org/openstack-health/#/g/project/openstack%252Fironic
17:06:37 <krtaylor> ipxe voting:
17:06:47 <krtaylor> #link https://review.openstack.org/#/c/255382/
17:07:03 <jroll> so now that we've brought the dashboard up... can we talk about how abysmal that failure rate is? can save for open discussion if we like...
17:07:33 <krtaylor> yes, we should, let's table that for open discussion
17:07:34 <jroll> and even worse, though I think it may be a sample size thing: http://status.openstack.org/openstack-health/#/g/project/openstack%252Fironic-python-agent
17:07:37 <jroll> k
17:07:49 <sambetts> o/
17:08:06 <krtaylor> hi sambetts we were just looking at the dashboard
17:08:31 <krtaylor> ok, any other quick announcements?
17:08:32 <sambetts> Yeah it came up on the ML and I thought it was cool
17:09:24 <krtaylor> #topic Grenade and Functional testing
17:09:52 <krtaylor> not sure of any progress this week either, jlvillal is in class
17:10:08 <krtaylor> anyone?
17:10:26 <krtaylor> then onward
17:10:42 <krtaylor> #topic Third party CI
17:10:54 <krtaylor> so, the spec has had a lot of activity  :)
17:11:36 <sinval> hehe
17:11:36 <krtaylor> jroll, I read the comments and started a revision, but got distracted by my day job, sorry about the duplicate patch
17:12:07 <jroll> krtaylor: no worries :)
17:12:20 <krtaylor> there were some counterpoint comments
17:12:50 <krtaylor> lucasagomes isn't here...
17:13:12 <jroll> I think we're way past the point of deciding whether or not we're doing this...
17:13:52 <[1]cdearborn> agree
17:14:01 <krtaylor> jroll, agreed, but maybe we can address some of the concerns that prompted the concern
17:14:26 <krtaylor> used concern too many times
17:14:37 <jroll> seems like the concerns were "I don't want my driver dropped from tree, but I can't CI it"
17:14:50 <jroll> idk how to address this unless we just don't do it
17:15:15 <krtaylor> can that driver be tested as a part of the infra testing?
17:15:39 <krtaylor> if it requires special hw, then no
17:15:41 <jroll> almost certainly not, it needs real hardware afaik
17:15:44 <lucasagomes> we could somehow mark drivers in tree that are CI'ed and not CI'ed?
17:15:56 <lucasagomes> having a config option to allow enabling non-CI'ed driver?
17:15:59 <jroll> maybe kvm has a wol
17:16:12 <jroll> lucasagomes: that was an alternative proposed a long time ago that we decided not to do
17:16:17 <jroll> I don't remember off hand all the reasons why
17:16:26 <lucasagomes> jroll, it was a bit different AFAIR
17:16:39 <lucasagomes> the proposal before was you could either enable production drivers or testing drivers
17:17:07 <lucasagomes> (but I wanna point out that, I'm OK with that proposal in the spec. I understand it has been discussed)
17:17:12 <jroll> hmm
17:17:22 <jroll> yeah, I think we don't want to turn back now
17:17:27 <krtaylor> historically, as projects have grown, that has not worked well, if it is in tree, it needs to be tested
17:17:36 <lucasagomes> jroll, yeah, and I'm fine with that
17:18:06 <jroll> cool
17:18:11 <lucasagomes> krtaylor, it's "tested" not as a functional test
17:18:17 <lucasagomes> but unittest is testing
17:18:29 <lucasagomes> I think my idea is more like staging drivers in linux
17:18:39 <jroll> dprince: do you want to talk about this at all or are you good with my replies on the spec patch?
17:18:50 <lucasagomes> which are drivers in-tree but with less promises that they are working
17:19:43 <krtaylor> lucasagomes, yeah, that is what I was proposing at summit, like an attic for drivers that were interesting, but not production
17:20:00 <lucasagomes> krtaylor, ++
17:20:30 <lucasagomes> and I understand that there's an overhead about having driver in tree as well, we have to maintain the code
17:20:38 <lucasagomes> so all ideas have pros/cons
17:20:54 <krtaylor> the spec started that way, but the consensus was that once out of main tested tree, they were noop
17:21:21 <dprince> lucasagomes: maintenance in-tree is much cheaper though
17:21:29 <lucasagomes> dprince, right
17:21:36 <jroll> why?
17:21:44 <dprince> jroll: grep
17:22:09 <jroll> grep -r thing ironic/ mydriver/
17:22:09 <jroll> ?
17:22:22 <dprince> jroll: I'm just not keen on all the split out here
17:22:45 <dprince> jroll: I mean if we expected 10 more power management drivers over the next year, maybe
17:22:45 <jroll> right, I'd like a concrete reason why. most of the issues I've heard are easily solved
17:23:04 <dprince> jroll: even then, the main argument seems to be quality of the "in-tree" drivers
17:23:16 <dprince> jroll: I don't think iboot, and wol are dragging down that quality
17:24:06 <jroll> dprince: we want to be able to say "all drivers in tree are well tested"
17:24:21 <krtaylor> put another way, why would we be different than any other project wrt vendor CI?
17:24:33 <jroll> it's not that drivers are dragging down quality, it's that the quality is 100% unknown
17:24:41 <dprince> krtaylor: other projects got it wrong in some cases
17:24:46 <dprince> krtaylor: took it too far
17:24:48 <lucasagomes> one thing that in-tree drivers help with is expanding the driver interface... It's easy to argue for adding an extra parameter, for example, to one of the methods if the driver is in-tree
17:24:51 <lucasagomes> IMO
17:25:07 <lucasagomes> like when get_supported_boot_devices() was extended to add the "task" argument
17:25:20 <lucasagomes> because we needed to check the arch of the node instead of just returning a static list
17:25:44 <lucasagomes> small example, that is easy to reason about if the driver is in-tree when changing the base interfaces
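[Editor's note] The interface change lucasagomes describes can be sketched as below. This is a minimal illustration, not Ironic's code: the `Node`, `Task`, `StaticManagement`, and `ArchAwareManagement` classes, the `cpu_arch` property name, and the arch-to-device mapping are all assumptions made up for the example. Only the method name `get_supported_boot_devices()` and the added `task` argument come from the discussion.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a node record; "cpu_arch" is an assumed key.
    properties: dict = field(default_factory=dict)


@dataclass
class Task:
    # Hypothetical stand-in for the task object handed to driver methods.
    node: Node


class StaticManagement:
    """Before the change: no arguments, so only a fixed list is possible."""

    def get_supported_boot_devices(self):
        return ["pxe", "disk", "cdrom"]


class ArchAwareManagement:
    """After the change: the task gives access to the node being managed."""

    def get_supported_boot_devices(self, task):
        arch = task.node.properties.get("cpu_arch", "x86_64")
        # Invented rule for illustration only: pretend ppc64 nodes
        # cannot boot from cdrom.
        if arch.startswith("ppc64"):
            return ["pxe", "disk"]
        return ["pxe", "disk", "cdrom"]


if __name__ == "__main__":
    ppc_task = Task(Node({"cpu_arch": "ppc64le"}))
    print(ArchAwareManagement().get_supported_boot_devices(ppc_task))
```

Any driver that still implements the zero-argument form would fail when called with a task, which is why a signature change like this is easier to coordinate when every implementation lives in the same tree.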
17:25:57 <dprince> just my perspective, I don't see any reason to penalize developers who have taken the extra time to enhance their dev environments with drivers which are super helpful at developing and testing Ironic
17:26:19 <dprince> perhaps out of tree makes sense for some vendors who don't generally engage upstream
17:26:28 <dprince> sort of a pay to play model
17:26:33 <jroll> dprince: I don't see the penalty, is my point
17:26:44 <dprince> jroll: out of tree, out of sight, out of mind
17:26:59 <jroll> and we *want* to keep out of tree drivers working
17:27:01 <dprince> jroll: i.e. I have to go and read the changelogs as to why interface X changed
17:27:02 <jroll> I think they're super valuable
17:27:15 <jroll> we don't want to break that interface
17:27:19 <dprince> jroll: right, I'm saying (in practice) it isn't perfect
17:27:26 <dprince> jroll: look at the includes in all the vendor drivers
17:27:41 <jroll> no, but we're working towards that
17:27:41 <jroll> yep
17:27:43 <dprince> jroll: some of those I think might be considered internal drivers
17:27:44 <jroll> we have a plan to get rid of those
17:28:14 <jroll> and have an actual driver api sort of thing
17:28:30 <dprince> jroll: until there are versioned (internally versioned) libraries, out of tree could be painful
17:28:34 <dprince> jroll: that is all I'm saying
17:28:45 <jroll> sure
17:29:17 <jroll> we also have 1.5+ cycles to figure out how to make it less painful
17:29:29 <krtaylor> true
17:29:42 <krtaylor> but, we have already started off down the runway
17:30:32 <lucasagomes> jroll, what about documentation for out-of-tree drivers?
17:30:39 <lucasagomes> things like http://docs.openstack.org/developer/ironic/drivers/wol.html
17:30:57 <lucasagomes> or http://docs.openstack.org/developer/ironic/drivers/vbox.html
17:31:05 <jroll> great question, we'll need to solve that somehow
17:31:17 <jroll> readthedocs is pretty easy, and infra has support for it
17:31:39 <jroll> there's probably other good ways
17:31:51 <lucasagomes> true, yeah we should give it some thought
17:31:57 <krtaylor> in the spirit of big tent and fostering new projects (drivers), maybe there can be a cross-project driver incubation repo
17:32:36 <krtaylor> but we have time to brainstorm that
17:33:25 <krtaylor> so, the issue is that there are valuable drivers that developers are using, that cannot be tested as a part of our infra check pipeline
17:34:02 <lucasagomes> krtaylor, right... they can be tested, but we just don't have $$ to actually test it
17:34:26 <krtaylor> and that is a special case, maybe we can find a vendor that could?
17:34:33 * krtaylor thinks ipmi...
17:34:38 <lucasagomes> I think jroll is looking at it
17:34:41 <lucasagomes> for ipmitool for e.g
17:34:52 <lucasagomes> because that's our "main" (not sure I can say it) driver
17:34:54 <jroll> yeah, I'm on top of ipmitool
17:35:02 <jroll> "recommended" :)
17:35:15 <jroll> so my thought is we roll forward with our plan, try to make it work well for those drivers, and go from there
17:35:15 <lucasagomes> IBM may want to do it for ipminative as well, since pyghmi is something they work on
17:35:39 <krtaylor> yes, we are having those discussions also
17:35:51 <jroll> and if it's super terrible for out of tree drivers, we either work really hard to fix it, or admit that out of tree in ironic is terrible and allow them back in tree with some other mechanism to say it's untested
17:36:18 <krtaylor> works for me
17:37:16 <lucasagomes> cool
17:37:25 <jroll> can we land this spec then? :D
17:37:30 <sambetts> I'm already maintaining an out of tree driver, and it's not that bad right now, nothing has drastically broken anything recently
17:37:42 <krtaylor> jroll, +1000
17:37:45 <lucasagomes> sambetts, yeah we want to make it even better
17:37:51 <sambetts> of course :)
17:37:51 * lucasagomes wants at least
17:38:20 <jroll> lucasagomes: don't forget we had a whole session on the driver interface and came out with a plan :)
17:38:26 <sinval> jroll, I think there are some issues regarding testing, I'm not sure if krtaylor is planning to discuss that during this meeting...
17:38:40 <lucasagomes> jroll, right
17:38:51 <krtaylor> ok, we are winding down on that topic
17:38:57 <jroll> sinval: open discussion? :)
17:39:03 <krtaylor> anyone object to moving on?
17:39:12 <jroll> let's do it
17:39:14 <sinval> jroll: can be
17:39:15 <krtaylor> #topic General QA and Open Discussion
17:39:26 <jroll> sinval: whatcha got
17:40:31 <sinval> deva had comments about the testing section in the spec, something about: "None is not good for a testing spec"
17:40:40 <jroll> sinval: oh, that's been fixed
17:41:14 <krtaylor> that was a joke about a testing spec not having test impact :)
17:41:36 <krtaylor> it was the subject of the entire spec :)
17:41:48 <jroll> my open discussion topic: we have sporadic failures caused by either timeouts and/or pxe failures, still plaguing the gate for months (years?) now, and it's long past the point where we're terrible people for not fixing them
17:41:53 <jroll> #link http://status.openstack.org/elastic-recheck/#1393099
17:41:59 <jroll> #link http://status.openstack.org/elastic-recheck/#1408067
17:42:06 <jroll> top 2 out of 3 on elastic recheck
17:42:29 <jroll> we can't just let these sit anymore, we need someone working on them as hard as possible, and nothing else
17:42:32 <lucasagomes> that timeout is a PITA indeed
17:42:44 <jroll> clark b has the bug numbers memorized - that's how bad it is
17:43:16 <jroll> I tried to get to it, but I have way too much other stuff going on
17:43:25 <jroll> I'm happy to prioritize it on my list, but would rather have a volunteer that knows this stuff well working on it
17:43:37 <jroll> it's a terrible waste of infra resources :(
17:43:38 <krtaylor> jlvillal will more than likely be interested, but I won't volunteer him
17:45:22 <krtaylor> #action find someone to work the transient timeout and pxe failures
17:45:27 * lucasagomes is also too busy to take a look
17:45:36 <lucasagomes> but I'm happy to help
17:45:37 <jroll> no volunteers? :(
17:45:44 <krtaylor> I am too atm, maybe in a few weeks
17:45:57 <lucasagomes> jroll, do we know at least why it takes so long? slow network when PXE booting?
17:46:26 <jroll> lucasagomes: probably slow gate nodes, I don't think localhost networking could be that slow
17:46:34 <lucasagomes> tftp can be a real pain even for local network
17:46:44 <lucasagomes> right :-/
17:47:05 <lucasagomes> sambetts, would be good to test ur tiny ipa there anyway
17:47:10 <lucasagomes> it will consume far fewer resources
17:47:16 <sambetts> I was about to suggest that :-P
17:47:17 <lucasagomes> should be very quick to boot that
17:47:47 <jroll> idk if it's always the boot being slow, either, hard to tell
17:47:51 <jroll> could just be *everything* being slow
17:48:06 <jroll> for instance, jay had a test yesterday where it timed out during cleaning
17:48:18 <sambetts> it's a considerably smaller image to transfer by tftp too, so that might help
17:49:16 <sambetts> :/
17:49:27 <lucasagomes> one thing that is hard to debug is the boot time
17:49:35 <lucasagomes> because every time we power on the nested VM
17:49:44 <lucasagomes> the file which the console is redirected to is overwritten
17:49:56 <lucasagomes> so it's actually hard to see the logs and figure out how long it took to boot
17:49:58 <sambetts> yeah that is really frustrating for debugging inspector too
17:50:15 <jroll> yep, need to fix that as well
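[Editor's note] One possible way to keep the console output from earlier boots is to move the existing log aside before each power-on. The sketch below only illustrates that idea and is not how the gate jobs are implemented: the `preserve_console_log` helper and the numbered-suffix naming scheme are hypothetical, and where such a call would be hooked in is left open.

```python
from pathlib import Path


def preserve_console_log(log_path):
    """Rename an existing console log to the first free numbered suffix.

    Returns the new path, or None if there was no log to preserve.
    """
    log = Path(log_path)
    if not log.exists():
        return None
    index = 1
    while True:
        candidate = log.with_name(f"{log.name}.{index}")
        if not candidate.exists():
            break
        index += 1
    # The original name is now free for the next boot to write to.
    log.rename(candidate)
    return candidate


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        console = Path(tmp) / "node-0_console.log"
        console.write_text("first boot output\n")
        print(preserve_console_log(console))
```

With one file per boot, the file modification times also give a rough idea of how long each boot took.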
17:50:41 <jroll> I guess I'm going to get hacking on that stuff, then
17:51:11 <lucasagomes> yeah, improving the troubleshooting def helps
17:51:23 <jroll> I may also work on moving devstack code into our tree as a plugin, that will make it much easier to work on it I think
17:51:28 <jroll> and make devstack people super happy
17:51:32 <lucasagomes> ++
17:51:35 <sinval> krtaylor: did you have the chance to think about those questions about testing in the spec? or are we going to postpone it?
17:52:07 <jroll> sinval: have you looked at the spec since yesterday?
17:52:18 <jroll> sinval: what questions do you have that are still unanswered?
17:52:19 <krtaylor> sinval, yes I replied, we will document
17:52:39 <sinval> jroll, krtaylor I'm reading right now
17:52:56 <jroll> ok
17:53:54 <krtaylor> we are nearing the top of the hour, any other topics, questions?
17:55:46 <sambetts> Nothing from me
17:56:02 <krtaylor> going once...twice...
17:56:08 <sinval> oops
17:56:23 <sinval> 1. Are we considering that the tempest-dsvm-pxe-ipa job is enough for testing a patch's impact on a driver?
17:56:29 <sinval> 1.1 If not: What are the base test cases for considering a driver CI as "reliable"?
17:56:38 <sinval> 2. Can a CI implement specific test cases that are not in the Ironic tree or even in the Tempest tree to ensure that their driver is not broken by a patch?
17:56:51 <sinval> 3. Are we considering implementation of functional test cases of driver interfaces to ensure that a CI is reliable?
17:57:21 <sinval> just some things to think about, I'm not sure if it is clear to everyone
17:57:47 <krtaylor> good questions, but don't have all the answers atm
17:58:25 <jroll> so, tl;dr 1) it's a good start, and we can add to it. 2) sure? :) 3) I haven't considered it much, but we should totally add things like power off/on calls to the API to tempest if they don't exist already
17:59:00 <krtaylor> agreed  :)
17:59:00 <sinval> cool
17:59:07 <jroll> sinval: let's add those to the docs when we write those :)
17:59:17 <sinval> jroll: sure
17:59:29 <krtaylor> yes! I'll sign you up :)
17:59:38 <krtaylor> ok, so I think we are done here
17:59:42 <krtaylor> thanks everyone!
17:59:48 <jroll> thanks krtaylor :D
17:59:58 <sinval> o/
18:00:07 <krtaylor> #endmeeting