19:04:34 <fungi> #startmeeting infra
19:04:35 <openstack> Meeting started Tue Feb 12 19:04:34 2013 UTC.  The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:04:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:04:38 <openstack> The meeting name has been set to 'infra'
19:05:00 <fungi> Wiki: Meetings/InfraTeamMeeting (last edited 2013-01-29 05:30:39 by JamesBlair)
19:05:10 <fungi> i feel like i've seen this episode before
19:05:24 * fungi pulls up the action items list
19:06:02 <fungi> clarkb start discussion on long term log archival options when jeblair gets back
19:06:10 <fungi> i guess that can happen now
19:06:31 <clarkb> yes, sort of started it yesterday but not in much detail
19:06:32 <jeblair> o/
19:06:41 <fungi> it's a jeblair!
19:06:42 <clarkb> should we have that discussion here and now?
19:07:09 <fungi> up to you guys. the itinerary is short and mostly checkboxish
19:07:17 <clarkb> may as well then
19:07:30 <clarkb> our test log growth is not linear
19:07:37 <fungi> to say the least
19:08:01 <jeblair> so i was wondering how much logstash could be a complete replacement for statically storing logs
19:08:09 <clarkb> http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=309&rra_id=all you can see the curve at the bottom of that page
19:08:12 <jeblair> and clarkb said it may not be a good one
19:08:21 <jeblair> clarkb: can you elaborate on why?
19:08:37 <clarkb> jeblair: logstash's purpose is to index and give quick access to your logs
19:08:49 <clarkb> this makes it a bad way to archive your logs
19:09:03 <jeblair> clarkb: but doesn't it have a complete copy of them and present them to you through a web ui?
19:09:14 <fungi> so it's only useful for recent/small log quantities i guess?
19:09:43 <clarkb> yes, however if we were to keep 100GB of logs in logstash I get the feeling we would need a much larger elasticsearch cluster to handle the increased workload
19:10:07 <clarkb> I don't think we want to sacrifice index speed and usability for archival functionality
19:10:37 <jeblair> clarkb: ok, so you're suggesting we maintain a smaller logstash system for searching the last x days of logs
19:10:43 <ttx> o/
19:10:44 <clarkb> yeah
19:10:49 <fungi> oh, guess i forgot to /topic
19:10:57 <fungi> #topic log archival
19:11:05 <jeblair> clarkb: and keep 6 mos of logs in a static setup?
19:11:18 <jeblair> clarkb: (it's easy to delete >x days of logs from logstash?)
19:11:20 <clarkb> the way logstash is configured to index by default suggests that this is the normal operating behavior (each day gets its own index and you want to keep the number of indexes down)
19:11:50 <clarkb> jeblair: yes and yes. To make deleting logs from logstash easy you set it to use a time-based index of some sort
19:12:02 <clarkb> jeblair: then you have a cron or similar look for old indexes and delete them
19:12:05 <jeblair> gotcha
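A minimal sketch of the pruning cron job clarkb describes, assuming elasticsearch's HTTP API on localhost:9200 and logstash's default daily index naming; the script name, endpoint, and retention window are illustrative, not settings from the actual deployment:

    # prune_logstash_indexes.py - drop logstash indexes older than the
    # retention window; intended to run daily from cron
    from datetime import datetime, timedelta

    import requests

    ES_URL = "http://localhost:9200"  # assumed elasticsearch endpoint
    RETAIN_DAYS = 14                  # illustrative retention window

    def prune_old_indexes():
        cutoff = datetime.utcnow() - timedelta(days=RETAIN_DAYS)
        # the _aliases endpoint lists every index name as a JSON key
        for index in requests.get("%s/_aliases" % ES_URL).json():
            if not index.startswith("logstash-"):
                continue
            day = datetime.strptime(index, "logstash-%Y.%m.%d")
            if day < cutoff:
                # deleting the whole index drops that day's logs at once
                requests.delete("%s/%s" % (ES_URL, index))

    if __name__ == "__main__":
        prune_old_indexes()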
19:12:49 <jeblair> clarkb: so for the zuul reports in gerrit, we'd probably need to link to the static archiving solution
19:12:52 <mordred> o/
19:12:59 <jeblair> clarkb: since the logstash one will disappear
19:13:17 <clarkb> jeblair: I think so. But probably with a note that you can search recent job runs in logstash and link to logstash
19:13:18 <jeblair> but maybe we can link to logstash from the static reports
19:13:21 <fungi> and so we'll still need some separate interface to anonymously browse older archived logs. does cloudfiles have that built in?
19:13:49 <jeblair> yeah, we're relying on apache's mod_autoindex for that now
19:14:08 <jeblair> i don't think there's such a thing for cloud files
19:14:14 <mordred> maybe notmyname has something ...
19:14:17 <jeblair> notmyname, mordred?
19:14:23 <notmyname> just saw my name
19:14:45 <mordred> notmyname: last 8 lines of scrollback - discussing putting build logs in swift - wondering about indexes
19:15:02 <jeblair> notmyname: http://logs.openstack.org/21691/1/check/gate-tempest-devstack-vm-full/2081/
19:15:23 <notmyname> if you use the pseudo directory naming structure I suggested yesterday (to clarkb?), then you have listing support
19:15:56 <notmyname> combined with staticweb (which cloud files supports), you can even get "pretty" web pages
19:16:04 <jeblair> notmyname: oh, so if you GET a .../job-id/ url, you get an index page?
19:16:19 <notmyname> jeblair: ya, it can be configured to do so
19:16:29 <jeblair> notmyname: cool, that sounds like exactly what we need then
19:16:32 <jeblair> thx!
19:16:35 <fungi> perfect
19:17:03 <ttx> yay cloud
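A sketch of the pseudo-directory and staticweb setup notmyname describes, using python-swiftclient; the auth endpoint, credentials, and container name are illustrative assumptions, not the real deployment's settings:

    # upload a job log under a pseudo-directory path and enable
    # anonymous, browsable listings via staticweb
    from swiftclient import client

    conn = client.Connection(
        authurl="https://identity.example.com/v2.0",  # assumed endpoint
        user="logs-uploader", key="secret", auth_version="2")

    # slashes in the object name act as pseudo-directories, so a GET on
    # .../2081/ can return a listing of that job's files
    conn.put_object(
        "logs",
        "21691/1/check/gate-tempest-devstack-vm-full/2081/console.html",
        contents=open("console.html", "rb"))

    # a world-readable ACL plus staticweb's web-listings flag makes
    # those pseudo-directory URLs render as index pages
    conn.post_container("logs", {
        "X-Container-Read": ".r:*,.rlistings",
        "X-Container-Meta-Web-Listings": "true"})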
19:17:17 <jeblair> so we should be able to use jclouds to put these things in cloud files...
19:17:18 <fungi> have we exhausted that topic for the moment? action items coming from it?
19:17:22 <clarkb> the second thing we need to sort out is having jenkins use swift for the logs. in theory we can rely on jclouds for that. in practice clarkb had a bad experience last week
19:17:42 <fungi> ahh, right
19:17:48 <jeblair> then clarkb can watch the zeromq stream to fetch those things and shove them in logstash
19:17:52 <fungi> you said there was a patch forthcoming from jclouds
19:18:01 <jeblair> clarkb: oh :(, what's the skinny on that?
19:18:05 <clarkb> the current release of the jclouds plugin is broken for blob storing. this has been fixed in master.
19:18:30 <clarkb> second, that fixed commit dumps your cloud key/password to the console log when it gets a 401 from the cloud provider
19:18:47 <clarkb> there is a potential fix for that at the tip of master but i have yet to find time to test it
19:18:52 <jeblair> my experience suggests that errors from cloud providers are frequent.
19:18:59 <fungi> oh, right. lots of notfun in that
19:19:21 <clarkb> so we will just need to be careful and defensive about how we test the jclouds blobstore
19:19:32 <clarkb> abayer in #jclouds has been super helpful though
19:20:06 <jeblair> clarkb: yeah, we don't need to rush this.  :)
19:20:35 <fungi> we now have more space at least
19:21:51 <fungi> okay, so next action items, or was there more on that one?
19:22:09 <jeblair> fungi: i'm good
19:22:20 <clarkb> I think that is it
19:22:30 <fungi> #topic wiki stuffs
19:22:35 <mordred> moin sucks
19:22:39 * annegentle waves
19:22:45 * annegentle agrees with mordred
19:22:52 <fungi> next action items were the date change to saturday and annegentle sending an updated announcement?
19:23:09 <annegentle> fungi: I wasn't sure if the date really changed so I didn't send anything :)
19:23:19 <jeblair> to answer annegentle's question in email, no, i'm not critical.  :)
19:23:34 <mordred> jeblair is so critical
19:24:04 <fungi> so then it's sticking with sunday after all?
19:24:58 <pleia2> ryan confirmed saturday was ok
19:25:41 <jeblair> (i'm so sorry!)
19:25:43 <clarkb> yeah we are saturdaying
19:26:00 <clarkb> annegentle: oh, you didn't get the email from ryan? I completely missed that this didn't go out
19:26:07 <clarkb> my bad
19:26:14 <fungi> so if it's saturday, then we need a last-minute update announcement i guess
19:26:17 <annegentle> clarkb: ohh ok
19:26:23 <annegentle> yeah that's how I missed it
19:26:27 <annegentle> sure, I'll send now
19:26:32 <fungi> annegentle: thanks!
19:27:06 <fungi> so who all is planning to be around for the cut-over on saturday then?
19:27:14 * jeblair plans to be around
19:27:21 <mordred> yeah. me too
19:27:30 <olaph> should be
19:27:39 <pleia2> not me (I'll be out of town for the holiday weekend)
19:27:45 * ttx will probably be jetlagged but ~present
19:28:10 <clarkb> o/
19:28:15 <fungi> #action jeblair, clarkb, olaph, Ryan_Lane, mordred, annegentle, ttx, fungi work on wiki upgrade 2013-02-16
19:28:53 <fungi> any other wiki-related notes while we're on the topic? we're missing a Ryan_Lane in here i guess
19:29:34 * mordred just saw Ryan_Lane an hour ago - he was leaving on his way to work ...
19:29:40 <fungi> we can catch up with him later in the week if there are last-minute issues i suppose
19:29:48 * zaro is out of town.
19:30:13 <fungi> #topic python-swiftclient upload
19:30:17 <fungi> mordred to upload python-swiftclient 1.3.0
19:30:20 <fungi> that happened, right?
19:30:25 <mordred> yup
19:30:33 <fungi> okay. that's it for old business
19:30:36 <mordred> I used the jenkins jobs too - so they have been tested
19:30:42 <fungi> new business...
19:30:57 <fungi> # cla
19:31:02 <fungi> er
19:31:05 <fungi> #topic cla
19:31:30 <mordred> funny - we're just now talking about CLAs at the board meeting
19:31:38 <fungi> no real news on the cla stuff. basically ready, pending last-minute reviews (i'll hit people up in the next week for reviewing after a rebase)
19:31:51 <jeblair> at LCA we chatted with notmyname about it, and realized he was unaware of the change...
19:31:58 <fungi> still on track for 2013-02-24 cut-over
19:32:00 <jeblair> mordred: are we getting rid of CLA? ;)
19:32:05 <mordred> jeblair: nope
19:32:12 <clarkb> jeblair: fungi should we send another announcement to the mailing list?
19:32:15 <jeblair> so we discussed that we should do some extra communication...
19:32:20 <jeblair> probably in a different form
19:32:21 <fungi> yes, agreed
19:32:21 <mordred> yeah. more specific
19:32:27 <mordred> less explanatory
19:32:36 <mordred> "you will need to re-sign the CLA on X"
19:32:43 <jeblair> ttx: can you make sure the PTLs are all aware of this at the project meeting?
19:32:57 <ttx> jeblair: sure
19:32:59 <fungi> #action fungi draft and send more explicit/cautionary announcement about cla cut-over
19:33:10 <fungi> #action ttx discuss cla at next project meeting
19:33:19 <jeblair> we can ask lauren/stef to use blog/community newsletter/etc to disseminate it as well
19:33:27 <ttx> fungi: any chance you can send that before the meeting, so that I can reference your post?
19:33:35 <fungi> #action fungi hit up infra core team for reviews
19:33:40 <fungi> ttx: sure
19:33:51 <fungi> remind me when the project meeting happens?
19:33:57 <fungi> i can look it up if necessary
19:34:11 <fungi> i can join in when that topic comes up too
19:34:15 <jeblair> fungi: then maybe pass that announcement to lauren and stef for them to process
19:34:39 <fungi> jeblair: awesome. sounds like a great idea
19:34:46 <jeblair> finally, there's one other option we should consider: we _could_ email everyone in gerrit.
19:35:00 <fungi> spamtastic
19:35:07 <fungi> i'm a little edgy about that
19:35:19 <jeblair> i _think_ the other things we've discussed should be enough...
19:35:29 <mordred> they'll notice when they go to upload something
19:35:59 <jeblair> so i'd propose that we do those, and only if we still think no one knows about it a week before the cutover, should we spam.
19:36:15 <jeblair> (regardless, we should send at least one more announcement closer to the cutover)
19:36:28 <jeblair> (ml, i mean, not spam)
19:36:33 <fungi> wiki tells me the project/release meeting is Tuesdays at 2100 UTC so i'll see about getting that announcement to the -dev ml in the next hour after this meeting
19:36:45 <fungi> if that's what ttx was suggesting
19:37:05 <ttx> yep
19:37:11 <jeblair> mordred: indeed, i think that's a reasonable mitigating factor.  the error has its own instructions for rectification.
19:37:15 <ttx> otherwise I'll just point to the date
19:37:29 <fungi> yes. the error messages are quite explicit
19:37:38 <fungi> urls and all
19:38:04 <fungi> okay, anything else on cla stuff?
19:38:14 <jeblair> nope
19:38:20 <fungi> #topic jenkins slave operating systems
19:38:50 <fungi> i think we covered a couple items under this topic umbrella previously... jclouds and quantal
19:39:25 <fungi> quantal is working for static slaves but i'm having trouble with jclouds'ing it. asked on #jclouds but nothing too helpful yet
19:40:17 <fungi> short story is i can launch slaves from jclouds with the ubuntu version specified as 11.10 or 12.04, but if i change that to 12.10 i get an immediate java exception about not matching any images
19:40:36 <fungi> so i think it's an image metadata issue in rackspace at this point
19:40:47 <mordred> YAY!
19:41:08 <fungi> also jclouds-related, clarkb: you had some issues with slaves not deleting right?
19:41:21 <clarkb> maybe
19:41:30 <fungi> fungi@ci-puppetmaster:~$ nova list|grep -c jclouds
19:41:32 <fungi> 16
19:41:39 <clarkb> so yeah
19:41:48 <fungi> i don't see anywhere near that many in jenkins (like maybe only 1)
19:41:57 <ttx> fungi: 1h20min from now
19:42:21 <fungi> ttx: yep. thanks! i checked the wiki pretty much immediately anyway
19:42:25 * ttx is answering asynchronously
19:42:31 <jeblair> probably the same thing we work around in devstack-gate...
19:42:46 <jeblair> nova returns a 200 for the delete api call and then does not delete the server
19:43:14 <fungi> that's pretty neat
19:43:20 <jeblair> somehow this is apparently not a bug
19:43:35 <fungi> s/bug/money making opportunity for providers/
19:43:45 <jeblair> except in our case.  :/
19:43:50 <fungi> indeed
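For reference, a sketch of the devstack-gate-style workaround jeblair alludes to, using python-novaclient: treat the DELETE's 200 as advisory and poll (re-issuing the delete) until the server is really gone. The attempt count and wait interval are illustrative:

    import time

    from novaclient import exceptions

    def delete_until_gone(nova, server_id, attempts=10, wait=30):
        """nova is an authenticated novaclient Client instance."""
        for _ in range(attempts):
            try:
                # re-issue the delete; a 200 does not guarantee removal
                nova.servers.delete(server_id)
            except exceptions.NotFound:
                return True  # finally gone
            time.sleep(wait)
            try:
                nova.servers.get(server_id)
            except exceptions.NotFound:
                return True
        return False  # leaked server; flag for manual cleanup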
19:44:42 <fungi> anyway, if there's nothing much else on that, we should probably jump into devstack/tempest/gating changes et cetera
19:45:04 <fungi> #topic devstack, tempest and gating changes
19:45:05 <ttx> notabug: "Empowering the ecosystem"
19:45:14 <fungi> ttx: heh
19:45:23 <jeblair> enriching the ecosystem
19:45:57 <fungi> okay, so there's been some more improvements to test scope, runtimes and also proposed efficiency improvements for the gating pipeline?
19:46:09 <fungi> anyone want to discuss high points on that?
19:46:56 <jeblair> fungi: is 'efficiency improvements' https://review.openstack.org/#/c/20948/  ?
19:47:17 <fungi> jeblair: that looks like one of them
19:47:25 <jeblair> fungi: another?
19:47:29 <fungi> clarkb: also has a wip one i think
19:47:51 <fungi> and we just put something through to make expensive tests dependent on cheap ones
19:48:04 <fungi> though only for the check pipeline, not gate
19:48:08 <jeblair> fungi, clarkb, link?
19:48:13 <jeblair> to clarkb's change
19:48:14 * fungi is looking
19:48:24 <jeblair> i saw mordred's pep8 change
19:48:35 <fungi> #link https://review.openstack.org/21723
19:49:51 <jeblair> mordred: for https://review.openstack.org/#/c/21267/
19:49:57 <jeblair> mordred: what problem does that solve?
19:50:35 <clarkb> jeblair: https://bugs.launchpad.net/zuul/+bug/1123329
19:50:37 <uvirtbot> Launchpad bug 1123329 in zuul "Zuul should remove unmergable changes from queue as soon as possible." [Undecided,New]
19:51:31 <jeblair> clarkb: i like the sound of that
19:51:31 <clarkb> jeblair: with the increased gate time per change, unmergeable changes end up serializing zuul more than is necessary
19:51:51 <fungi> i believe the idea behind 21267 was to avoid burning slave time on tempest tests (which are upwards of an hour or two at this point) if the tests which take <5 minutes don't pass anyway
19:52:48 <jeblair> fungi: yeah, but we're not running low on devstack/tempest slaves, yeah?  and with jclouds, it seems like we shouldn't be running low on regular slaves either, in general...
19:53:01 <jeblair> fungi: so what that change does is optimize for the case where pep8 fails
19:53:36 <jeblair> it means that if your change fails pep8, the devs are notified very early
19:53:50 <clarkb> a lot of changes fail pep8
19:53:56 <jeblair> but if your change passes pep8, it now takes runtime(tempest)+runtime(pep8) to be notified
19:54:09 <fungi> or pyflakes in some cases, but yes. more generally it was to avoid long-running tests if the patch isn't syntactically correct python (style checks are just a bonus there)
19:54:43 <jeblair> yeah, i'm just wondering why that's desirable -- is it because we want to notify people that changes fail pep8 fast (at the cost of making successful changes take _longer_ to notify)
19:54:54 <jeblair> or is it to solve a slave resource contention problem
19:55:16 <fungi> and we're still running short on static slaves for the moment, until jclouds is in better shape
19:55:42 <mordred> the devs who were chatting with us (sdague was an active one) didn't seem to mind the extra 5ish minutes compared to getting the reject quicker
19:55:47 <clarkb> I see it as reducing unnecessary load on the slaves and to encourage devs to run tox -epep8 before pushing
19:55:59 <clarkb> jenkins is becoming everyones personal test box
19:56:22 <mordred> alternately ... it might be interesting to add a feature to zuul to cancel and dequeue remaining jobs if one of them fails
19:56:41 <jeblair> yeah, though looking at the zuul queue, the static slave tests are running ahead of the devstack tests, which is the main thing
19:56:43 <mordred> which could get us to a place where we get the canary benefit without the serialization concern from jeblair
19:57:03 <jeblair> mordred: yeah, that's kind of where i was heading
19:57:06 <fungi> we wind up starving the gate pipeline of available static slaves during heavy activity periods right now, and a significant percentage are spinning on failed unit tests for the check pipeline
19:57:45 <fungi> so yes, anything to help that situation would be an improvement
19:58:08 <jeblair> mordred: we wanted to return as much info to devs as quickly as possible
19:58:17 <mordred> ++
19:58:18 <jeblair> mordred: i think that's still desirable
19:58:23 <mordred> I agree
19:58:43 <jeblair> mordred: but clarkb has a good point that people are just throwing shit at jenkins and seeing what sticks
19:58:57 <jeblair> admittedly, he didn't quite put it like that.  but i will. :)
19:59:02 <clarkb> :)
19:59:26 <fungi> and we're about out of time
19:59:39 <mordred> right. which is why I think early fail in check queue is helpful
19:59:48 <jeblair> so anyway, yeah, let's think about short-circuiting the whole test run if some tests fail
19:59:51 <pleia2> real quick, I made a wiki page: http://wiki.openstack.org/InfraTeam
19:59:52 <jeblair> but still start them all in parallel
20:00:04 <ttx> jeblair: +1
20:00:09 <ttx> at least for gate
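A sketch (not zuul's actual implementation) of the behavior jeblair and mordred converge on above: start every job for a change in parallel, but cancel whatever has not finished as soon as any job fails, so devs get fast feedback without serializing cheap jobs ahead of expensive ones:

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def run_jobs_short_circuit(jobs):
        """jobs: dict of job name -> zero-arg callable returning bool."""
        with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
            futures = {pool.submit(fn): name for name, fn in jobs.items()}
            for future in as_completed(futures):
                if not future.result():
                    # first failure: cancel anything not yet started and
                    # report early instead of waiting on slow tempest runs
                    for other in futures:
                        other.cancel()
                    return False, futures[future]
        return True, None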
20:00:16 <fungi> #topic general
20:00:20 <fungi> #link http://wiki.openstack.org/InfraTeam
20:00:26 <mordred> jeblair: agree
20:00:31 <pleia2> thanks fungi
20:00:43 <fungi> thank you pleia2!
20:01:00 <fungi> okay, i'll go ahead and shut this down so ttx can have the channel for tc
20:01:05 <fungi> #endmeeting