19:03:45 #startmeeting
19:03:46 Meeting started Tue Jun 5 19:03:45 2012 UTC. The chair is mtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:47 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:04:00 thank goodness that made it in before the startmeeting.
19:04:10 jeblair: we were racing?
19:04:12 who wants to talk about barrell racing?
19:04:34 OR, I guess we can talk about CI stuff
19:04:36 * jeblair wants to know what a barrell is.
19:04:41 * mtaylor can't spell
19:05:02 #topic zuul
19:05:02 bigger than a barrel, i'd imagine
19:05:15 jeblair: so - you wanna tell folks all about the new hotness?
19:05:21 yeah, so zuul is in production, basically globally for openstack now
19:05:39 because of the interdependencies of all the projects, we can't phase it in, it's pretty much all or nothing.
19:05:55 I wrote a mailing list post about it, which you should receive in the next 6 hours if you haven't already
19:06:02 * mtaylor hands jeblair a large salmon
19:06:04 and a blog post here
19:06:04 totally awesome
19:06:08 #link http://amo-probos.org/post/14
19:06:26 After rolling it out, it pretty much immediately started testing keystone changes in parallel
19:06:30 http://paste.openstack.org/show/18354/
19:06:35 that's what that looks like.
19:06:47 not to be outdone, 4 nova changes were tested in parallel shortly after that
19:06:51 http://paste.openstack.org/show/18357/
19:07:39 things I like a) parallel testing b) dependent testing (yay for not running long-running tests if the quick ones don't pass)
19:07:58 i'm pretty sure the ssh connection is going to die at some point
19:08:14 so in that output the change at the top was tested with all of the changes below it merged in as well?
19:08:14 but that's a matter of waiting until that happens, and figuring out why from the debug messages.
19:08:27 clarkb: yep
19:08:41 clarkb: and only merged if they all passed (they did)
19:09:13 Cross project dependencies, even?
19:09:26 deva: yes and no...
19:09:37 yes in that the changes across dependent projects are sequenced
19:09:56 no in that you can not specify a change to one project must be tested with a change to another project
19:10:25 deva: it may be possible to do that if we can get the merge job behaving exactly like gerrit's merge check. it's something i plan on looking into.
19:11:05 Gotcha
19:11:47 jeblair: should we do pep8 before unittests similar to how we do merge first now?
19:12:44 mtaylor: we could do that; the pep8 tests take a little longer since they're done in a tox venv
19:12:54 jeblair: good point
19:13:43 mtaylor: also, unit tests can still be meaningful even if pep8 fails
19:13:51 (which isn't true for a failing merge test)
19:13:57 indeed
19:14:12 so i think we'd at least want to keep the current setup for the check queue
19:14:29 let's look into how long the pep8 tests take before deciding to change the gate queue
19:14:31 yeah - I can be on board with that
19:15:40 that's probably it for zuul
19:17:00 cool
19:17:05 lemme see ...
19:17:16 #topic gerrit changes
19:17:32 Shrews, clarkb: how are we doing on our new gerrit features?
19:18:01 Work In Progress is ready, available on review-dev now.
19:18:26 and I think the first attempt at a better dashboard and list of "reviewable" changes is complete
19:18:42 As an enhancement, we'll soon be adding a new per-project permission so more people can use the WIP feature.
19:19:06 right now, only change submitter, branch owner, project owner, and admins can use it
19:19:31 I think we should land both of your most recent changes, install those on review-dev to double-check ... and then release to review.openstack.org
19:19:40 unless somebody thinks we should wait for Shrews' acl fix?
19:20:05 mtaylor: i see no reason to wait on it
19:20:35 I have no problems with it
19:20:38 I think that gerrit 2.4 + dashboard are pretty compelling, and giving change owner ability to WIP is nice
19:20:52 and might get us a little bit more real-world use of wip
19:21:03 I have a feeling the better priority sorting will take some time
19:21:18 how long do you think the acl will take?
19:21:19 and I haven't really dug into it yet, so don't wait
19:22:06 (because if it's not going to be too long, we may want to wait until we can announce the feature, and announce that -core developers can wip changes)
19:22:26 that's a good point - Shrews? thoughts?
19:23:00 jeblair: i'm *hoping* this week
19:23:18 so we can hold off a couple of days if you want to see where i stand then
19:25:47 I was going to update puppet to land http://ci.openstack.org/tarballs/test/gerrit-2.4-11-gd4a0c4b.war on review-dev. Should I go ahead or will Shrews' change and my latest one be approved soon?
19:27:50 I'm good with both changes landing
19:28:29 I can update puppet after they land then
19:28:33 cool
19:28:54 alright, let's hold off a couple of days before updating review and see how the acl changes go
19:30:02 I think that's all the big-ticket topics for the moment ...
19:30:06 #topic open discussion
19:30:44 I'm trying to get the global dependency list stuff up and going (after realizing that we can use the update.py machinery in openstack-common to our advantage)
19:30:57 and I got pure-nosetests changes done for nova and glance
19:31:10 OH - I did something else I forgot about ... new pypi mirror code
19:31:40 LinuxJedi isn't here, but after cleaning up etherpad-lite's puppet module I think I may want a precise host instead of an oneiric host for that >_>
19:31:41 pypi.openstack.org is created from all of the packages downloaded by pip-installing all of the requirements from all of the branches of all of our projects
19:31:45 mtaylor: re dependency list, is awesome -- basic idea to have the list in openstack-common, and use update.py to copy it into projects?
19:31:52 jeblair: yes.
19:31:55 jeblair: except
19:31:59 clarkb: can't do that yet
19:32:05 LinuxJedi: darn, ok
19:32:09 jeblair: we won't copy entries from the global list into the projects unless that depend is there first
19:32:11 clarkb: since Rackspace doesn't give us Precise
19:32:31 and nosetests is awesome, except it outputs a lot of logging to console.
19:32:31 so each project's list will be a subset of the global list ... but the versions will be tied...
19:32:36 clarkb: unless mtaylor wants it on the SF HP Cloud account or something
19:32:49 jeblair: yeah, I've gotta fix the nosetest output thing ... vishy said he was cool with our proposed change
19:32:52 LinuxJedi: i think precise images exist now.
19:32:57 they do
19:33:00 jeblair: ah, awesome
19:33:05 we can spin up precise slaves via jclouds-plugin even
19:33:07 clarkb: ok, scrap what I said ;)
19:34:03 LinuxJedi: if you can swap oneiric out for precise when you get back that would be awesome
19:34:24 speaking of that ...
19:34:28 #topic etherpad
19:34:34 I am still fiddling with it a little on my test box though.
Not entirely sure logrotate is working the way I want it to
19:34:38 should we talk about a transition plan?
19:34:51 clarkb: sure, can I erase the oneiric one in the process or do you temporarily need both?
19:35:02 LinuxJedi: I do not need the oneiric box so erasing is fine
19:35:07 cool
19:35:23 * LinuxJedi goes back to lurking and pretending to be not working on a public holiday ;)
19:35:27 clarkb: lovely puppet work, btw.
19:36:18 #link https://github.com/Pita/etherpad-lite/wiki/How-to-migrate-the-database-from-Etherpad-to-Etherpad-Lite
19:36:32 that link describes the technical process behind migrating
19:37:07 basically run a js script to dump the old DB then cat that back into the etherpad lite DB
19:37:23 so we should be able to dry run the data migration a few times to make sure it's solid and see how long it takes
19:37:42 clarkb: let me know if you need any more VMs for the dry runs
19:37:51 * LinuxJedi can spin up as many as you need
19:37:54 ok
19:37:57 at that point, should just be a scheduled downtime and migration, yeah?
19:38:12 are we close enough on it to be thinking about that? or am I jumping the gun?
19:38:49 probably jumping the gun a little, but yes if things look good after migrating a couple times we should be able to schedule a downtime and DNS cutover or however you want to actually flip the switch
19:39:17 does the CI team admin etherpad.openstack.org?
19:39:22 ok. I'll just sit back on my haunches for a while
19:39:26 clarkb: yes
19:39:32 well, sort of
19:39:37 we have the login to it :)
19:39:37 clarkb: I can help you with a migration plan when ready
19:39:37 so access to the old DB shouldn't be a problem?
19:39:43 clarkb: I have logins for everything
19:39:50 great
19:39:51 LinuxJedi: has global root on the internet
19:40:01 rm -rf /internet
19:40:35 crap. now I can't work
19:40:46 #topic open discussion
19:41:01 anybody got anything else? questions? comments?
19:41:44 * LinuxJedi has had 2 days off this week and lots of non-public admin stuff this week so it will probably be a quietish week from me
19:42:23 but I can fix everyone's problems as usual and I have a few things planned
19:42:25 :)
19:42:28 hehehe
19:42:41 well, for the record, I did NOT break anything this weekend
19:42:49 yay \o/
19:43:01 * LinuxJedi buys mtaylor a beer
19:43:20 are we fully recovered from the forkbombs?
19:43:32 good question. actually...
19:43:40 #topic multiprocess forkbombs
19:43:49 we should probably talk about that for a sec just for the record
19:43:58 i think so, unless a test snuck in last night as i was merging the revert patch
19:44:22 #link https://wiki.jenkins-ci.org/display/JENKINS/ProcessTreeKiller
19:44:38 because of that, i believe that jenkins should have killed the processes that got out of control
19:45:07 on the two machines i could (eventually) log into, the processes in question had the correct environment for that to operate
19:45:12 is there any value in setting ulimits on the test VMs?
19:45:13 so i'm not sure why it didn't happen.
19:45:36 someone was suggesting that the forkbomb was going so fast that perhaps the killer couldn't keep up
19:45:38 it may have been so pathologically bad that jenkins couldn't run that code.
19:46:19 perhaps, but that's a naive implementation of a process killer; it should do a complete pass and eventually kill the parent.
19:46:24 but i don't know how it's implemented in jenkins.
19:46:43 * mtaylor blames java
19:46:48 clarkb: we may want to look into that.
or something with cgroups
19:47:32 I think the goal with ulimit/cgroups would be to keep the machine in a usable state for debugging?
19:47:40 and possibly give jenkins a better shot at cleaning things up
19:47:43 and probably look into the processtreekiller code to see what it's actually doing.
19:47:56 jeblair: any further thoughts on the post-build action of cleaning up lurking processes?
19:48:33 mtaylor: my thoughts on that are disrupted by the processtreekiller -- if it was supposed to run but failed, i think there's probably nothing we can do from within jenkins to do the same thing.
19:49:22 jeblair: good point
19:50:37 heh, it lists ALL processes and checks the env variables of each. ick
19:50:57 wow, really? that's special
19:51:43 Shrews: better ideas?
19:52:14 jeblair: store list of pids? not sure without understanding jenkins code
19:52:55 jenkins spawns processes that can spawn processes that can spawn processes whose parents can die making the children be reparented to PID 1.
19:53:05 all of which happened yesterday
19:53:34 so i'm hard pressed to see a better way (other than using cgroups which isn't cross-platform)
19:54:14 jeblair: still loving Jenkins? ;)
19:55:04 LinuxJedi: in my statement above, the processes i'm talking about are the test processes.
19:55:13 ah, ok :)
19:55:22 eh, there could probably be some sort of central reporting system when a new child is spawned.
19:56:21 well... I think that's about it for real this time
19:56:21 Shrews: I think what you're describing doesn't exist in unix.
19:56:26 last thoughts?
19:56:28 Shrews: you should write a custom init just for jenkins hosts
19:56:44 jeblair: i'm thinking at the jenkins level.
19:56:48 perhaps we should use systemd.
19:57:20 Shrews: the processes we're talking about aren't spawned by jenkins, they're spawned by the test runner that we told jenkins to run.
19:57:33 jeblair: oh, well that is different indeed
19:58:19 thanks everybody!
19:58:22 #endmeeting
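
A few sketches to accompany the discussion above. First, the dependent testing described under "#topic zuul": the Python below is a simplified conceptual sketch, not Zuul's actual code. Every name in it is invented for the example; it only illustrates the idea of testing each queued change against a speculative branch state that already includes every change ahead of it, and merging the series only if everything passes.

def test_dependent_queue(queue, run_tests):
    """Conceptual sketch of a dependent gate queue (not Zuul's implementation).

    queue: ordered list of change ids, head of the gate queue first.
    run_tests(tree) -> bool, where tree is the speculative set of changes
    merged together for that test run.
    """
    results = {}
    tree = []                            # speculative future state of the branch
    for change in queue:
        tree = tree + [change]           # this change plus everything ahead of it
        results[change] = run_tests(tree)  # in Zuul these runs happen in parallel
    if all(results.values()):
        return ('merge', queue)          # only merged if they all passed
    # A real gate would evict the first failing change and retest the changes
    # behind it without it; that bookkeeping is omitted here for brevity.
    return ('report', [c for c, ok in results.items() if not ok])

# e.g. test_dependent_queue(['keystone/1', 'keystone/2'], run_tests=lambda tree: True)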
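
Second, the global dependency list discussed in open discussion. This is a rough sketch of the sync behavior described there, not the actual update.py from openstack-common; the file paths and the requirements parsing are assumptions. The constraint it encodes is the one stated in the meeting: entries are only copied for dependencies a project already declares, so each project's list stays a subset of the global list while the versions are tied to the global pins.

def sync_requirements(global_reqs_path, project_reqs_path):
    """Pin a project's requirements to the globally agreed versions.

    Only dependencies the project already lists are touched; nothing new is
    copied in from the global list.
    """
    def parse(path):
        entries = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith('#'):
                    continue
                # crude split of "name>=1.0"-style specifiers into a bare name
                name = line.split('<')[0].split('>')[0].split('=')[0].strip()
                entries[name.lower()] = line
        return entries

    global_reqs = parse(global_reqs_path)
    project_reqs = parse(project_reqs_path)

    synced = []
    for name, line in project_reqs.items():
        # keep the project's entry, but prefer the globally pinned version
        synced.append(global_reqs.get(name, line))

    with open(project_reqs_path, 'w') as f:
        f.write('\n'.join(synced) + '\n')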
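
Third, the ProcessTreeKiller behavior Shrews describes ("it lists ALL processes and checks the env variables of each"). Jenkins' real implementation is Java; the Linux-only Python below is just an illustration of that same idea: walk /proc, read each process's environment, and kill anything tagged with the build's marker variable. BUILD_ID is the variable Jenkins sets; the value in the usage comment is a made-up example.

import os
import signal

def kill_tagged_processes(var, value):
    """Kill every process whose environment contains var=value."""
    needle = '{0}={1}'.format(var, value).encode()
    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        try:
            with open('/proc/{0}/environ'.format(pid), 'rb') as f:
                env_entries = f.read().split(b'\0')
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        if needle in env_entries:
            try:
                os.kill(int(pid), signal.SIGKILL)
            except ProcessLookupError:
                pass  # already gone

# e.g. kill_tagged_processes('BUILD_ID', '2012-06-05_19-00-00')

As raised in the meeting, a forkbomb can spawn children faster than a single pass like this can kill them, which is one reason repeated passes (or kernel-level limits such as cgroups) were floated as complements.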
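
Finally, one possible shape of the ulimit idea clarkb raised: cap the number of processes a test run may create so a runaway fork loop hits its own limit instead of taking the slave down, keeping the machine usable for debugging. resource.RLIMIT_NPROC is the real Python constant; the cap value and the tox command are made-up examples, and this is only one option rather than what the team settled on.

import resource
import subprocess

MAX_PROCS = 512  # hypothetical cap; would need tuning against real test loads

def run_tests_with_nproc_limit(cmd):
    def apply_limit():
        # applied in the child before exec, so it covers everything it forks
        resource.setrlimit(resource.RLIMIT_NPROC, (MAX_PROCS, MAX_PROCS))
    return subprocess.call(cmd, preexec_fn=apply_limit)

# e.g. run_tests_with_nproc_limit(['tox', '-e', 'py27'])

One caveat with this approach: RLIMIT_NPROC counts all processes belonging to the same user, so on a slave where everything runs as the jenkins user it would also constrain other builds on that host.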