19:03:45 <mtaylor> #startmeeting
19:03:46 <openstack> Meeting started Tue Jun  5 19:03:45 2012 UTC.  The chair is mtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:47 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:04:00 <jeblair> thank goodness that made it in before the startmeeting.
19:04:10 <clarkb> jeblair: we were racing?
19:04:12 <mtaylor> who wants to talk about barrell racing?
19:04:34 <mtaylor> OR, I guess we can talk about CI stuff
19:04:36 * jeblair wants to know what a barrell is.
19:04:41 * mtaylor can't spell
19:05:02 <mtaylor> #topic zuul
19:05:02 <jeblair> bigger than a barrel, i'd imagine
19:05:15 <mtaylor> jeblair: so - you wanna tell folks all about the new hotness?
19:05:21 <jeblair> yeah, so zuul is in production, basically globally for openstack now
19:05:39 <jeblair> because of the interdependencies of all the projects, we can't phase it in, it's pretty much all or nothing.
19:05:55 <jeblair> I wrote a mailing list post about it, which you should receive in the next 6 hours if you haven't already
19:06:02 * mtaylor hands jeblair a large salmon
19:06:04 <jeblair> and a blog post here
19:06:04 <mtaylor> totally awesome
19:06:08 <jeblair> #link http://amo-probos.org/post/14
19:06:26 <jeblair> After rolling it out, it pretty much immediately started testing keystone changes in parallel
19:06:30 <jeblair> http://paste.openstack.org/show/18354/
19:06:35 <jeblair> that's what that looks like.
19:06:47 <jeblair> not to be outdone, 4 nova changes were tested in parallel shortly after that
19:06:51 <jeblair> http://paste.openstack.org/show/18357/
19:07:39 <mtaylor> things I like: a) parallel testing b) dependent testing (yay for not running long-running tests if the quick ones don't pass)
19:07:58 <jeblair> i'm pretty sure the ssh connection is going to die at some point
19:08:14 <clarkb> so in that output the change at the top was tested with all of the changes below it merged in as well?
19:08:14 <jeblair> but that's a matter of waiting until that happens, and figuring out why from the debug messages.
19:08:27 <jeblair> clarkb: yep
19:08:41 <jeblair> clarkb: and only merged if they all passed (they did)
19:09:13 <deva> Cross project dependencies, even?
19:09:26 <jeblair> deva: yes and no...
19:09:37 <jeblair> yes in that the changes across dependent projects are sequenced
19:09:56 <jeblair> no in that you cannot specify that a change to one project must be tested with a change to another project
19:10:25 <jeblair> deva: it may be possible to do that if we can get the merge job behaving exactly like gerrit's merge check.  it's something i plan on looking into.
19:11:05 <deva> Gotcha
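
A simplified, sequential sketch of the speculative-merge idea discussed above (zuul actually runs the builds in parallel and handles failures by re-queueing); the refs and the tox command are placeholders, not zuul's real implementation:

    #!/usr/bin/env python
    # Illustration only: test each pending change with every change ahead of it
    # merged in first, and only submit the stack if all of them pass.
    import subprocess

    PENDING = ["refs/changes/1", "refs/changes/2", "refs/changes/3"]  # hypothetical refs

    def run(*cmd):
        subprocess.check_call(cmd)

    def test_stack(pending):
        run("git", "checkout", "-B", "speculative", "origin/master")
        for ref in pending:
            run("git", "merge", "--no-ff", ref)   # merge this change plus everything ahead of it
            run("tox", "-e", "py27")              # run the tests against the merged state
        # only if every change in the stack passed do we actually submit them
        for ref in pending:
            print("would submit %s" % ref)

    if __name__ == "__main__":
        test_stack(PENDING)
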
19:11:47 <mtaylor> jeblair: should we do pep8 before unittests similar to how we do merge first now?
19:12:44 <jeblair> mtaylor: we could do that; the pep8 tests take a little longer since they're done in a tox venv
19:12:54 <mtaylor> jeblair: good point
19:13:43 <jeblair> mtaylor: also, unit tests can still be meaningful even if pep8 fails
19:13:51 <jeblair> (which isn't true for a failing merge test)
19:13:57 <mtaylor> indeed
19:14:12 <jeblair> so i think we'd at least want to keep the current setup for the check queue
19:14:29 <jeblair> let's look into how long the pep8 tests take before deciding to change the gate queue
19:14:31 <mtaylor> yeah - I can be on board with that
19:15:40 <jeblair> that's probably it for zuul
19:17:00 <mtaylor> cool
19:17:05 <mtaylor> lemme see ...
19:17:16 <mtaylor> #topic gerrit changes
19:17:32 <mtaylor> Shrews, clarkb: how are we doing on our new gerrit features?
19:18:01 <Shrews> Work In Progress is ready, available on review-dev now.
19:18:26 <clarkb> and I think the first attempt at a better dashboard and list of "reviewable" changes is complete
19:18:42 <Shrews> As an enhancement, we'll soon be adding a new per-project permission so more people can use the WIP feature.
19:19:06 <Shrews> right now, only change submitter, branch owner, project owner, and admins can use it
19:19:31 <mtaylor> I think we should land both of your most recent changes, install those on review-dev to double-check ... and then release to review.openstack.org
19:19:40 <mtaylor> unless somebody thinks we should wait for Shrews' acl fix?
19:20:05 <Shrews> mtaylor: i see no reason to wait on it
19:20:35 <clarkb> I have no problems with it
19:20:38 <mtaylor> I think that gerrit 2.4 + dashboard are pretty compelling, and giving change owner ability to WIP is nice
19:20:52 <mtaylor> and might get us a little bit more real-world use of wip
19:21:03 <clarkb> I have a feeling the better priority sorting will take some time
19:21:18 <jeblair> how long do you think the acl will take?
19:21:19 <clarkb> and I haven't really dug into it yet, so don't wait
19:22:06 <jeblair> (because if it's not going to be too long, we may want to wait until we can announce the feature, and announce that -core developers can wip changes)
19:22:26 <mtaylor> that's a good point - Shrews? thoughts?
19:23:00 <Shrews> jeblair: i'm *hoping* this week
19:23:18 <Shrews> so we can hold off a couple of days if you want to see where i stand then
19:25:47 <clarkb> I was going to update puppet to land http://ci.openstack.org/tarballs/test/gerrit-2.4-11-gd4a0c4b.war on review-dev. Should I go ahead or will Shrews' change and my latest one be approved soon?
19:27:50 <mtaylor> I'm good with both changes landing
19:28:29 <clarkb> I can update puppet after they land then
19:28:33 <mtaylor> cool
19:28:54 <mtaylor> alright, let's hold off a couple of days before updating review and see how the acl changes go
19:30:02 <mtaylor> I think that's all the big-ticket topics for the moment ...
19:30:06 <mtaylor> #topic open discussion
19:30:44 <mtaylor> I'm trying to get the global dependency list stuff up and going (after realizing that we can use the update.py machinery in openstack-common to our advantage)
19:30:57 <mtaylor> and I got pure-nosetests changes done for nova and glance
19:31:10 <mtaylor> OH - I did something else I forgot about ... new pypi mirror code
19:31:40 <clarkb> LinuxJedi isn't here, but after cleaning up etherpad-lite's puppet module I think I may want a precise host instead of an oneiric host for that >_>
19:31:41 <mtaylor> pypi.openstack.org is created from all of the packages downloaded by pip-installing all of the requirements from all of the branches of all of our projects
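
A rough sketch of that mirror-building idea, with assumed paths and a per-branch checkout layout; it uses modern pip's "download" subcommand for illustration (the 2012-era tooling differed), and building the PyPI-style simple index on top of the directory is a separate step:

    #!/usr/bin/env python
    # Illustration: seed a local package mirror by downloading every requirement
    # from every branch of every project into one directory.
    import glob
    import subprocess

    MIRROR_DIR = "/srv/static/mirror"                                  # hypothetical path
    REQ_FILES = glob.glob("/opt/checkouts/*/*/requirements.txt")       # project/branch layout is assumed

    for req in REQ_FILES:
        # "pip download" fetches sdists/wheels without installing them
        subprocess.check_call(["pip", "download", "-r", req, "-d", MIRROR_DIR])
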
19:31:45 <jeblair> mtaylor: re dependency list, is awesome -- basic idea to have the list in openstack-common, and use update.py to copy it into projects?
19:31:52 <mtaylor> jeblair: yes.
19:31:55 <mtaylor> jeblair: except
19:31:59 <LinuxJedi> clarkb: can't do that yet
19:32:05 <clarkb> LinuxJedi: darn, ok
19:32:09 <mtaylor> jeblair: we won't copy entries from the global list into the projects unless that dependency is there first
19:32:11 <LinuxJedi> clarkb: since Rackspace doesn't give us Precise
19:32:31 <jeblair> and nosetests is awesome, except it outputs a lot of logging to console.
19:32:31 <mtaylor> so each project's list will be a subset of the global list ... but the versions will be tied...
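
A small sketch of that subset-sync idea; the file names (global-requirements.txt and a project's requirements.txt) are assumptions for illustration, and the real update.py machinery in openstack-common may look quite different:

    #!/usr/bin/env python
    # Illustration: copy version pins from the global list into a project's
    # requirements, but only for dependencies the project already declares.
    import re

    def parse(path):
        reqs = {}
        for line in open(path):
            line = line.split("#")[0].strip()
            if not line:
                continue
            name = re.split(r"[<>=!]", line)[0].strip().lower()
            reqs[name] = line
        return reqs

    global_reqs = parse("global-requirements.txt")
    project_reqs = parse("requirements.txt")

    with open("requirements.txt", "w") as out:
        for name in sorted(project_reqs):
            # keep only entries the project already has, but tie them to the
            # version pinned in the global list when one exists
            out.write(global_reqs.get(name, project_reqs[name]) + "\n")
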
19:32:36 <LinuxJedi> clarkb: unless mtaylor wants it on the SF HP Cloud account or something
19:32:49 <mtaylor> jeblair: yeah, I've gotta fix the nosetest output thing ... vishy said he was cool with our proposed change
19:32:52 <jeblair> LinuxJedi: i think precise images exist now.
19:32:57 <mtaylor> they do
19:33:00 <LinuxJedi> jeblair: ah, awesome
19:33:05 <mtaylor> we can spin up precise slaves via jclouds-plugin even
19:33:07 <LinuxJedi> clarkb: ok, scrap what I said ;)
19:34:03 <clarkb> LinuxJedi: if you can swap oneiric out for precise when you get back that would be awesome
19:34:24 <mtaylor> speaking of that ...
19:34:28 <mtaylor> #topic etherpad
19:34:34 <clarkb> I am still fiddling with it a little on my test box though. Not entirely sure logrotate is working the way I want it to
19:34:38 <mtaylor> should we talk about a transition plan?
19:34:51 <LinuxJedi> clarkb: sure, can I erase the oneiric one in the process or do you temporarily need both?
19:35:02 <clarkb> LinuxJedi: I do not need the oneiric box so erasing is fine
19:35:07 <LinuxJedi> cool
19:35:23 * LinuxJedi goes back to lurking and pretending to be not working on a public holiday ;)
19:35:27 <jeblair> clarkb: lovely puppet work, btw.
19:36:18 <clarkb> #link https://github.com/Pita/etherpad-lite/wiki/How-to-migrate-the-database-from-Etherpad-to-Etherpad-Lite
19:36:32 <clarkb> that link describes the technical process behind migrating
19:37:07 <clarkb> basically run a js script to dump the old DB then cat that back into the etherpad lite DB
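
A sketch of that dump-and-load flow, with hypothetical script, user, and database names; the wiki page linked above is the authoritative procedure:

    #!/usr/bin/env python
    # Illustration: dump the old etherpad database with a converter script,
    # then load the result into the etherpad-lite database.
    import subprocess

    with open("dump.sql", "w") as dump:
        # hypothetical converter script name; see the linked wiki page
        subprocess.check_call(["node", "convert.js", "old_etherpad_db"], stdout=dump)

    with open("dump.sql") as dump:
        subprocess.check_call(["mysql", "-u", "eplite", "-p", "etherpad_lite"], stdin=dump)
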
19:37:23 <mtaylor> so we should be able to dry run the data migration a few times to make sure it's solid and see how long it takes
19:37:42 <LinuxJedi> clarkb: let me know if you need any more VMs for the dry runs
19:37:51 * LinuxJedi can spin up as many as you need
19:37:54 <clarkb> ok
19:37:57 <mtaylor> at that point, should just be a scheduled downtime and migration, yeah?
19:38:12 <mtaylor> are we close enough on it to be thinking about that? or am I jumping the gun?
19:38:49 <clarkb> probably jumping the gun a little, but yes if things look good after migrating a couple times we should be able to schedule a downtime and DNS cutover or however you want to actually flip the switch
19:39:17 <clarkb> does the CI team admin etherpad.openstack.org?
19:39:22 <mtaylor> ok. I'll just sit back on my haunches for a while
19:39:26 <LinuxJedi> clarkb: yes
19:39:32 <mtaylor> well, sort of
19:39:37 <mtaylor> we have the login to it :)
19:39:37 <LinuxJedi> clarkb: I can help you with a migration plan when ready
19:39:37 <clarkb> so access to the old DB shouldn't be a problem?
19:39:43 <LinuxJedi> clarkb: I have logins for everything
19:39:50 <clarkb> great
19:39:51 <mtaylor> LinuxJedi: has global root on the internet
19:40:01 <LinuxJedi> rm -rf /internet
19:40:35 <mtaylor> crap. now I can't work
19:40:46 <mtaylor> #topic open discussion
19:41:01 <mtaylor> anybody got anything else? questions? comments?
19:41:44 * LinuxJedi has had 2 days off and lots of non-public admin stuff this week so it will probably be a quietish week from me
19:42:23 <LinuxJedi> but I can fix everyone's problems as usual and I have a few things planned
19:42:25 <LinuxJedi> :)
19:42:28 <mtaylor> hehehe
19:42:41 <mtaylor> well, for the record, I did NOT break anything this weekend
19:42:49 <LinuxJedi> yay \o/
19:43:01 * LinuxJedi buys mtaylor a beer
19:43:20 <clarkb> are we fully recovered from the forkbombs?
19:43:32 <mtaylor> good question. actually...
19:43:40 <mtaylor> #topic multiprocess forkbombs
19:43:49 <mtaylor> we should probably talk about that for a sec just for the record
19:43:58 <jeblair> i think so, unless a test snuck in last night as i was merging the revert patch
19:44:22 <jeblair> #link https://wiki.jenkins-ci.org/display/JENKINS/ProcessTreeKiller
19:44:38 <jeblair> because of that, i believe that jenkins should have killed the processes that got out of control
19:45:07 <jeblair> on the two machines i could (eventually) log into, the processes in question had the correct environment for that to operate
19:45:12 <clarkb> is there any value in setting ulimits on the test VMs?
19:45:13 <jeblair> so i'm not sure why it didn't happen.
19:45:36 <mtaylor> someone was suggesting that the forkbomb was going so fast that perhaps the killer couldn't keep up
19:45:38 <jeblair> it may have been so pathologically bad that jenkins couldn't run that code.
19:46:19 <jeblair> perhaps, but that's a naive implementation of a process killer; it should do a complete pass and eventually kill the parent.
19:46:24 <jeblair> but i don't know how it's implemented in jenkins.
19:46:43 * mtaylor blames java
19:46:48 <jeblair> clarkb: we may want to look into that.  or something with cgroups
19:47:32 <clarkb> I think the goal with ulimit/cgroups would be to keep the machine in a useable state for debugging?
19:47:40 <clarkb> and possibly give jenkins a better shot at cleaning things up
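
A minimal sketch of the ulimit idea for the test slaves, assuming the test command is launched from a wrapper; the limit value is a placeholder, and RLIMIT_NPROC counts all processes of the user, not just this tree:

    #!/usr/bin/env python
    # Illustration: cap the number of processes a test run may create, so a
    # runaway fork loop hits the limit instead of taking down the slave.
    import resource
    import subprocess

    def cap_procs():
        # runs in the child before exec; the limit is inherited by descendants,
        # but does not affect the jenkins slave agent itself
        resource.setrlimit(resource.RLIMIT_NPROC, (512, 512))

    subprocess.check_call(["tox", "-e", "py27"], preexec_fn=cap_procs)
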
19:47:43 <jeblair> and probably look into the processtreekiller code to see what it's actually doing.
19:47:56 <mtaylor> jeblair: any further thoughts on the post-build action of cleaning up lurking processes?
19:48:33 <jeblair> mtaylor: my thoughts on that are disrupted by the processtreekiller -- if it was supposed to run but failed, i think there's probably nothing we can do from within jenkins to do the same thing.
19:49:22 <mtaylor> jeblair: good point
19:50:37 <Shrews> heh, it lists ALL processes and checks the env variables of each. ick
19:50:57 <mtaylor> wow, really? that's special
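
For reference, a rough sketch of that scan-everything approach (walking /proc and checking each process's environment for the build's marker variable, as ProcessTreeKiller is described as doing); the real Jenkins implementation may differ in detail, and reading another process's environ requires matching privileges:

    #!/usr/bin/env python
    # Illustration: find and kill every process whose environment carries the
    # BUILD_ID of a finished jenkins build, wherever it got reparented to.
    import os
    import signal

    def kill_build_procs(build_id):
        for pid in os.listdir("/proc"):
            if not pid.isdigit():
                continue
            try:
                environ = open("/proc/%s/environ" % pid).read().split("\0")
            except IOError:
                continue  # process exited, or we lack permission
            if ("BUILD_ID=%s" % build_id) in environ:
                os.kill(int(pid), signal.SIGKILL)
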
19:51:43 <jeblair> Shrews: better ideas?
19:52:14 <Shrews> jeblair: store list of pids? not sure without understanding jenkins code
19:52:55 <jeblair> jenkins spawns processes that can spawn processes that can spawn processes whose parents can die making the children be reparented to PID 1.
19:53:05 <jeblair> all of which happened yesterday
19:53:34 <jeblair> so i'm hard pressed to see a better way (other than using cgroups which isn't cross-platform)
19:54:14 <LinuxJedi> jeblair: still loving Jenkins? ;)
19:55:04 <jeblair> LinuxJedi: in my statement above, the processes i'm talking about are the test processes.
19:55:13 <LinuxJedi> ah, ok :)
19:55:22 <Shrews> eh, there could probably be some sort of central reporting system when a new child is spawned.
19:56:21 <mtaylor> well... I think that's about it for real this time
19:56:21 <jeblair> Shrews: I think what you're describing doesn't exist in unix.
19:56:26 <mtaylor> last thoughts?
19:56:28 <clarkb> Shrews: you should write a custom init just for jenkins hosts
19:56:44 <Shrews> jeblair: i'm thinking at the jenkins level.
19:56:48 <jeblair> perhaps we should use systemd.
19:57:20 <jeblair> Shrews: the processes we're talking about aren't spawned by jenkins, they're spawned by the test runner that we told jenkins to run.
19:57:33 <Shrews> jeblair: oh, well that is different indeed
19:58:19 <mtaylor> thanks everybody!
19:58:22 <mtaylor> #endmeeting