19:01:24 <mtaylor> #startmeeting
19:01:25 <openstack> Meeting started Tue Aug  7 19:01:24 2012 UTC.  The chair is mtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:26 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:26 <mtaylor> CI anybody?
19:01:33 <jeblair> please.
19:01:43 <mtaylor> jeblair: you wanna talk about anything?
19:02:21 <jeblair> mtaylor: i'd like to talk about current problems and what we're going to do about them.
19:02:28 <mtaylor> #topic current problems
19:02:36 <mtaylor> jeblair: you have the floor
19:02:41 <jeblair> mtaylor: have you had a chance to look into why a git remote update takes 6 minutes?
19:03:21 <mtaylor> jeblair: no, I have not. I keep getting pulled on to phone calls
19:03:59 <mtaylor> jeblair: is it only between rax hosts? or does that need investigation too?
19:04:02 <jeblair> so i'm very worried about that.
19:04:55 <jeblair> i have only performed minimal investigation.  it seems to be most pronounced on an oneiric rs host talking to review.openstack.org.  it seems less of a problem from a precise host.  and it doesn't seem to be much of a problem from my home connection.
19:04:58 <jeblair> that's all the data i have.
19:05:08 <mtaylor> ok
19:05:22 <jeblair> also....
19:06:15 <jeblair> review.o.o is averaging 2.24 Mb outbound traffic.  I forget what the link is, but i don't think it's hitting the limit.
19:06:28 <mtaylor> oneiric is running an older version of git than precise - but not by much, so it _should_ support the efficient http protocol
19:07:06 <jeblair> i'd really love it if someone could look into that, because it's eating up 6 minutes of run time on every unit test run, and more than _20_ minutes on devstack runs.
19:07:11 <jeblair> i think it's a critical issue.
19:07:14 <mtaylor> 1.7.5.4 vs. 1.7.9.5
19:07:35 <mtaylor> yeah. I will work on that this afternoon. LinuxJedi any chance you have any brain-space to help?
19:07:53 <jeblair> the ideas i've brainstormed are slow links between certain sets of hosts within rackspace
19:08:09 <LinuxJedi> probably, yes
19:08:33 <jeblair> or perhaps even oversubscribed io
19:08:53 * LinuxJedi wonders if we can simulate the traffic
19:09:11 <jeblair> which traffic?
19:09:38 <LinuxJedi> git http
19:10:14 <jeblair> "git remote update" ?
19:10:41 <mtaylor> jeblair: btw - do we need to do remote update?
19:10:51 <LinuxJedi> sure, but I meant with something we can get some real data from
19:10:54 <mtaylor> jeblair: since we're later doing a fetch on a specific refspec?
19:10:59 <jeblair> yes, so that we can "git checkout master"
19:11:23 <jeblair> LinuxJedi: i'm trying to follow, but i'm puzzled.  what's not real about that?
19:11:56 <jeblair> LinuxJedi: i'm trying to understand what thing you want to simulate?
19:12:45 <LinuxJedi> jeblair: nevermind we can just use packet analysers to get everything I was thinking of
19:12:56 <jeblair> ok
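
As a starting point for the investigation discussed above, a minimal timing harness along these lines could compare hosts. This is a sketch, not something agreed on in the meeting: the repository path is hypothetical, and the use of GIT_TRACE/GIT_CURL_VERBOSE is one assumed way to get transport detail out of git 1.7.x.

    import os
    import subprocess
    import time

    def timed_remote_update(repo_path, trace=False):
        """Run 'git remote update' in repo_path and return elapsed seconds."""
        env = dict(os.environ)
        if trace:
            # Both variables are supported by git 1.7.x and dump transport
            # details to stderr, which should help localize the slowness.
            env['GIT_TRACE'] = '1'
            env['GIT_CURL_VERBOSE'] = '1'
        start = time.time()
        subprocess.check_call(['git', 'remote', 'update'], cwd=repo_path, env=env)
        return time.time() - start

    if __name__ == '__main__':
        # Hypothetical clone whose origin points at review.openstack.org.
        print('git remote update took %.1fs' % timed_remote_update('/opt/stack/nova'))

Running the same script from an oneiric slave, a precise slave, and a host outside Rackspace would give the kind of comparison jeblair describes above.
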
19:13:12 <jeblair> next critical issue:
19:13:17 <jeblair> i filed this bug: https://bugs.launchpad.net/openstack-ci/+bug/1034032
19:13:18 <uvirtbot> Launchpad bug 1034032 in openstack-ci "make static html versions of jenkins reports for archiving" [Critical,Triaged]
19:13:44 <jeblair> we really need to stop using the junit post-build option in jenkins
19:14:31 <jeblair> that's step one in that bug -- to generate junit output some other way
19:14:55 <jeblair> then we can stick the job information, including the unit test report, on a static webserver
19:15:28 <jeblair> i think this is the next biggest scalability hurdle for us -- it's significantly slowing down zuul and jenkins
19:15:39 <jeblair> and it's the biggest cause of deadlocks that we have to manually clear out
19:16:07 <clarkb> can we disable that plugin while we work on a fix?
19:16:11 <jeblair> so i'd really like to see some progress on that.
19:16:32 <jeblair> clarkb: unfortunately, we did a lot of work to make the unit test output usable via the junit module...
19:16:36 <mtaylor> clarkb: doesn't nose have an html output mechanism?
19:16:48 <jeblair> mtaylor: no it does not.
19:17:34 <jeblair> mtaylor: if we disabled the junit build step, do you believe the results would be usable to developers, as things currently stand?
19:17:53 <jeblair> (my understanding is that you did some work to get all of the log output, etc, in the xunit report)
19:18:16 <mtaylor> jeblair: I did - but that was really just getting the results to be picked up by nose
19:18:35 <mtaylor> jeblair: if we turned off xml output, we'd get standard nose error reports to stdout at the end of the run
19:19:06 <clarkb> it isn't the output that is failing though right? it is the junit plugin waiting for the output?
19:19:11 <jeblair> correct
19:20:00 <jeblair> mtaylor: so you think that output would be useful, without the organization provided by junit?
19:20:34 <mtaylor> jeblair: it wouldn't be as pretty, but all of the information should be there
19:20:45 <mtaylor> oh, wait
19:21:42 <mtaylor> so - they'd get this:
19:21:44 <mtaylor> https://jenkins.openstack.org/job/gate-nova-python26/3956/console
19:21:53 <mtaylor> so you can see the traceback at the end
19:22:00 <mtaylor> as well as captured logging
19:22:03 <mtaylor> it's ugly, but it's there
19:22:05 <jeblair> not yet i can't, i'm waiting for jenkins.
19:23:10 <jeblair> mtaylor: why isn't that job processed by junit?
19:23:29 <clarkb> jeblair: it is but the option to also retain stdout results is checked
19:23:46 <mtaylor> what clarkb said
19:23:49 <jeblair> er
19:23:49 <jeblair> no
19:23:58 <jeblair> why does that build not have a test report?
19:24:00 <mtaylor> oh, you're right
19:24:48 <mtaylor> we do this: export NOSE_WITH_XUNIT=1
19:24:49 <jeblair> the option to "also retain stdout" is an option in jenkins junit processing that causes jenkins to keep the output recorded in the xml file
19:25:00 <jeblair> yes, that job _wrote_ an xml file
19:25:05 <jeblair> but jenkins did not read it.
19:25:36 <jeblair> clarkb: the "retain stdout" option doesn't have any effect on what is printed to stdout by nose, or what's recorded in the xml file.
19:25:56 <mtaylor> good question
19:26:07 <jeblair> mtaylor: so what i'm getting at is, is there logging that's going into the xml file that we're not seeing?
19:26:42 <mtaylor> jeblair: there should be no _additional_ logging into xml
19:26:46 <jeblair> mtaylor: i thought there was a whole "run_tests.log" thing and you got most of the output to go via the nose log capture plugin.
19:26:58 <jeblair> i don't see any logs from that failed test
19:27:06 <mtaylor> that is the output at the end
19:27:14 <jeblair> oh wait
19:27:17 <jeblair> there is _one_ log line
19:27:27 <mtaylor> yeah. there just wasn't much log output during that test run
19:27:38 <jeblair> okay, sorry i missed that.
19:27:50 <jeblair> i just saw the exception.  was expecting more logs.
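
For reference, a sketch of what the gate effectively does today, built around the NOSE_WITH_XUNIT variable mentioned above; the test directory and xunit filename here are hypothetical.

    import os
    import subprocess

    env = dict(os.environ)
    env['NOSE_WITH_XUNIT'] = '1'   # equivalent to passing --with-xunit
    # With xunit enabled, nose still prints its normal failure tracebacks and
    # captured logging to stdout at the end of the run, and additionally
    # writes an xml file for anything that wants to post-process results.
    subprocess.call(['nosetests', '--xunit-file=nosetests.xml', 'nova/tests'], env=env)

Disabling the Jenkins junit step only stops Jenkins from reading that xml file; developers would still see the console output shown in the gate-nova-python26 link above.
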
19:28:29 <jeblair> okay, so i guess we can disable junit in jenkins as clarkb suggested.  do we have agreement on that?
19:28:43 <mtaylor> yes. works for me
19:28:49 <clarkb> works for me too
19:29:06 <mtaylor> and then we can work on getting nose output into html pages that can be copied to a static web server
19:29:24 <mtaylor> once we do that, I assume zuul will be able to report back the right link for people to look at?
19:29:25 <jeblair> okay.  i will submit a change to disable junit
19:29:32 <jeblair> yep. that's all described in the bug
19:29:35 <mtaylor> great
19:29:59 <jeblair> (the rest of the bug still stands -- this is just the first step)
19:30:18 <clarkb> yup
19:30:26 <jeblair> #action jeblair disable junit processing in jenkins jobs
19:30:36 <jeblair> (is that the right action syntax?)
19:31:06 <jeblair> next critical item: https://bugs.launchpad.net/openstack-ci/+bug/1010621
19:31:08 <uvirtbot> Launchpad bug 1010621 in openstack-ci "important servers should have backups" [Critical,In progress]
19:31:30 <jeblair> mtaylor: have you heard back about the hpcloud volume service?
19:32:10 <clarkb> jeblair: yes I think that is correct
19:32:30 <mtaylor> jeblair: no
19:32:34 <mtaylor> jeblair: pinging again
19:33:32 <jeblair> mtaylor: are they being unresponsive?  cause I'd really like to get this going, but i don't think we have an effective backup until we _at least_ have one in another account from the one where all our important servers are...
19:34:28 <jeblair> i mean, should we come up with a different backup strategy?
19:36:09 <mtaylor> jeblair: they report it as enabled on tenant id 15813847660783
19:36:46 <jeblair> mtaylor: no idea what tenant that is.
19:36:52 <mtaylor> I know. super helpful, right?
19:37:59 <jeblair> okay, that's supposed to be the stackforge tenant.  i'll try again.
19:38:21 <jeblair> #action jeblair to see if volume service works in hpcloud for backups
19:38:38 <jeblair> so those are all the critical issues i know of that we can do something about at the moment...
19:39:16 <jeblair> i'd really like to see work going into addressing those, and maybe dealing with some of the bug backlog.
19:39:57 * mtaylor agrees
19:40:11 <mtaylor> I'll work with LinuxJedi on trying to figure out the git slowdown
19:40:20 <jeblair> i'm filing a bug about that one right now
19:40:21 <mtaylor> clarkb: can you take figuring out how to get a nice html report from nose?
19:40:27 <clarkb> sure
19:40:44 <mtaylor> jeblair: artifact copy itself is a blocking operation, yeah?
19:40:57 <mtaylor> jeblair: do we need to solve that for html report transfer?
19:41:19 <clarkb> mtaylor: we could just add another 'execute shell' at the end of the builds if we need to
19:41:25 <jeblair> yeah...
19:41:28 <jeblair> or
19:41:40 <clarkb> there is the option to always run a piece of code even if a test fails I think
19:41:42 <jeblair> just as an scp operation, without listing them as artifacts, like the docs jobs.
19:42:00 <clarkb> that should work
19:42:13 <mtaylor> k
19:42:17 <jeblair> i think that's worth investigating first -- i don't think it has the jenkins locking problems that artifacts have
19:42:26 <jeblair> and it's cleaner than running a shell
19:42:37 <jeblair> but another challenge is how to get the console output..
19:42:38 <clarkb> and we already have jenkins_job builders for it
19:42:47 <clarkb> jeblair: more tee?
19:43:19 <mtaylor> let's do one thing at a time - let's get test output fixed
19:43:26 <jeblair> clarkb: that's an idea.  output with timestamps would be even more awesome though.  :)
19:43:38 <mtaylor> then we can figure out console output
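
One way to read the "more tee" idea with timestamps is a small wrapper like the following; the class and log filename are hypothetical, not an agreed design.

    import sys
    import time

    class TimestampTee(object):
        """Copy console output to a log file, prefixing each write with a timestamp."""
        def __init__(self, path, stream=sys.stdout):
            self.stream = stream
            self.log = open(path, 'a')
        def write(self, data):
            self.log.write(time.strftime('%Y-%m-%d %H:%M:%S ') + data)
            self.stream.write(data)
        def flush(self):
            self.stream.flush()
            self.log.flush()

    # e.g. wrap stdout before the test run, so console.log can later be
    # copied to the static server along with the html report.
    sys.stdout = TimestampTee('console.log')
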
19:43:56 <jeblair> https://bugs.launchpad.net/openstack-ci/+bug/1034130
19:43:58 <uvirtbot> Launchpad bug 1034130 in openstack-ci "find out why git operations from oneiric hosts are slow" [Critical,Triaged]
19:44:25 <jeblair> #action clarkb get a nice html report from nose
19:44:41 <jeblair> #mtaylor,LinuxJedi find out why git operations from oneiric hosts are slow
19:45:05 <clarkb> you forgot the #action
19:45:10 <jeblair> #action mtaylor,LinuxJedi find out why git operations from oneiric hosts are slow
19:45:11 <jeblair> heh
19:45:37 <LinuxJedi> mtaylor is an action :)
19:45:42 <jeblair> i think that covers critical operational stuff.
19:46:09 * clarkb jumps in really quick so that he doesn't forget. if people could take a look at https://review.openstack.org/#/c/10784/ that would be awesome
19:46:24 <clarkb> that is a draft to deal with one of the things assigned to me last week
19:49:14 <mtaylor> clarkb: I shall look at that
19:49:21 <mtaylor> clarkb: perhaps un-draft it so I don't forget?
19:49:35 <clarkb> ok
19:50:16 <mtaylor> #topic open floor
19:50:22 <mtaylor> anybody got anything else?
19:50:38 <clarkb> gerritbot is ready for its first release as soon as zuul can handle it
19:50:50 <jeblair> go for it
19:51:00 <clarkb> I don't think I have the proper permissions
19:51:30 <clarkb> I can't even +2 gerritbot
19:51:42 <devananda> devstack merged my openvz support patch, so it would be great if devstack-gate did the same soon
19:51:59 <jeblair> clarkb: apparently you need to join openstack-ci-core
19:52:07 <clarkb> jeblair: ok
19:52:13 <jeblair> (and we probably need to give that group perms to tag)
19:52:38 <mtaylor> clarkb: http://pypi.python.org/pypi/HTMLTestRunner
19:52:44 <mtaylor> devananda: ++
19:53:03 <jeblair> mtaylor: why don't you review it?  i gave it a +2.  ;)
19:53:11 <clarkb> mtaylor: nice. BSD too
19:53:46 <mtaylor> jeblair: maybe I will
19:53:51 <jeblair> lovely.  i mean, other than the colors.  but i think that's exactly what we want.
19:54:01 <mtaylor> I have no idea if that's suitable, but it might be a start at least
19:54:22 <jeblair> that at least is what i was imagining the end product should look like.
19:55:02 <mtaylor> ++
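
For the record, HTMLTestRunner's API is roughly as follows; the test module name and output path are hypothetical, and whether it plugs cleanly into the existing nose-based runs is exactly what still needs investigating.

    import unittest
    import HTMLTestRunner

    # Load whatever suite the job runs; 'nova.tests' is a placeholder.
    suite = unittest.defaultTestLoader.loadTestsFromName('nova.tests')

    with open('results.html', 'wb') as fp:
        runner = HTMLTestRunner.HTMLTestRunner(
            stream=fp,
            title='gate-nova-python26',
            description='Unit test results',
        )
        runner.run(suite)

The resulting results.html is a single static file, so it could be copied out with the same scp mechanism the docs jobs use, per the earlier discussion.
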
19:57:01 <mtaylor> okie. that's all I've got for this week
19:57:07 <mtaylor> thanks all
19:57:10 <mtaylor> #endmeeting