19:02:09 #startmeeting infra
19:02:11 Meeting started Tue May 7 19:02:09 2013 UTC. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:13 o/
19:02:14 The meeting name has been set to 'infra'
19:02:43 #topic bugs
19:03:00 (i'm going to abuse my position as chair to insert a topic not currently on the agenda)
19:03:09 abuse away
19:03:28 this is a PSA to remind people to use launchpad bugs for infra tasks
19:03:43 i think we were a bit lax about that last cycle (all of us, me too)
19:04:20 we were. ++ to pleia for forcing us to do bug days
19:04:21 especially as we're trying to make it easier for others to get involved, i think keeping up with bug status is important
19:04:39 yes, much thanks to pleia for that; we'd be in a much worse position otherwise
19:04:40 +1 to that
19:04:48 us new folks have to know where to find work that needs to be done
19:04:56 i couldn't agree more. definitely going to strive to improve on that as my new cycle resolution
19:05:32 so anyway, please take a minute and make sure that things you're working on have bugs assigned to you, and things you aren't working on don't. :)
19:05:53 btw, i think we have started doing a better job with low-hanging-fruit tags
19:06:05 o/
19:06:12 so hopefully that will be an effective way for new people to pick up fairly independent tasks
19:06:28 any other thoughts on that?
19:06:45 I think we should try and make the bugday thing frequent and scheduled in advance
19:06:53 oh, something that is also missing
19:06:57 seconded
19:07:02 a document to outline proper bug workflow
19:07:17 maybe that exists somewhere?
19:07:19 clarkb: +1; how often? line up with milestones?
19:07:30 I just took a guess at what to do
19:07:48 jeblair: I was thinking once a month. lining up with milestones might be hard as we end up being very busy around milestone time it seems like
19:07:48 jlk: no, but i think i need to write a 'how to contribute to openstack-infra' doc
19:08:07 jlk: i should assign a bug to myself for that. :)
19:08:09 lining up between milestones ;)
19:08:23 but any schedule that is consistent and doesn't allow us to put it off would be good
19:08:43 and maybe we cycle responsibility for driving it so that pleia doesn't have to do it each time
19:08:49 bknudson: yes, if it exist then use it.. if not, then use virtual default domain
19:09:11 sorry, wrong chat box
19:09:34 clarkb: want to mock up a calendar?
19:09:48 jeblair: sure. I will submit a bug for it too :P
19:09:58 #action clarkb to mock up infra bugday calendar
19:10:02 o/
19:10:06 jlk: basically, feel free to assign a bug to yourself when you decide to start working on something
19:10:52 #topic actions from last meeting
19:10:58 #link http://eavesdrop.openstack.org/meetings/infra/2013/infra.2013-04-30-19.03.html
19:11:00 jeblair: that's what I assumed.
19:11:13 mordred: mordred set up per-provider apt mirrors (incl cloud archive) and magic puppet config to use them?
19:11:52 mordred: maybe you should just open a bug for that and let us know when there's something to start testing
19:12:20 jeblair: yes. I will do this
19:12:24 clarkb: clarkb to ping markmc and sdague about move to testr
19:12:33 I have not done that yet
19:12:42 #action clarkb to ping markmc and sdague about move to testr
19:12:47 i assume that's still a good idea. :)
19:12:53 it is
19:13:07 and it should be a higher priority of mine to get things in before milestone 1 if possible
19:13:15 mordred did bring it up in the project meeting iirc
19:13:23 yeah. people are receptive to it
19:13:41 I think on my tdl is "open a bug about migrating everything and set up the per-project bug tasks"
19:14:21 #topic oneiric server migrations
19:14:26 so we moved lists and eavesdrop
19:14:41 the continued avalanche of emails to os-dev seems to indicate that went okay
19:14:52 and meetbot is answering hails so
19:15:01 i guess that's that?
19:15:04 we need to shutdown/delete the old servers at some point. Once we have done that the task is complete
19:15:07 jeblair: not quite
19:15:14 a resounding success
19:15:30 we need to delete the old servers (unless you already did that) and mirror26 needs to be swapped out for a centos slave
19:15:48 reed: have you logged into the new lists.o.o?
19:16:07 i can take mirror26 as an action item
19:16:15 reed: if not, let us know when you do so and if you have any objections to deleting the old server.
19:16:15 unless you wanted it, clarkb
19:16:35 fungi: go for it
19:16:39 jeblair, yes
19:16:43 #action fungi open a bug about replacing mirror26 and assign it to himself
19:16:43 system restart required
19:16:53 reed: ?
19:17:08 just logged in, *** System restart required ***
19:17:09 oh. the issue. :)
19:17:32 i believe it actually was recently rebooted.
19:17:45 it was rebooted on saturday before we updated DNS
19:17:55 I guess that means that more updates have come in since then
19:17:58 reed: anything missing from the move? or can we delete the old server?
19:18:01 * fungi is pretty sure our devstack slaves are the only servers we have which don't say "restart required" every time you log in
19:18:28 jeblair, how should I know if anything is missing? did anybody complain?
19:19:04 reed: no, i think we have the archives and the lists seem to be working, so i don't see a reason
19:19:24 reed: but we didn't sync homedirs (i don't think)
19:19:40 alright then, I don't think I have anything in the old server there anyway
19:19:41 reed: so if your bitcoin wallet is on that server you should copy it. :)
19:19:44 jeblair: I did not sync homedirs
19:19:55 oh, my wallet!
19:19:55 jeblair already stole all of my bitcoins
19:20:18 #action jeblair delete old lists and eavesdrop
19:20:21 one of the cloud expense management systems allows you to request bitcoins for payments
19:20:39 we should charge bitcoins for rechecks
19:20:47 ha
19:20:52 we should charge bugfixes for rechecks
19:20:59 #topic jenkins slave operating systems
19:21:17 my notes in the wiki say: current idea: test master and stable branches on latest lts+cloud archive at time of initial development
19:21:21 and: open question: what to do with havana (currently testing on quantal -- "I" release would test on precise?)
19:21:28 there's an idea about having the ci system generate a bitcoin for each build, and then embed build id information into the bitcoin...
19:21:55 oh good. this topic again. my favorite :)
19:22:16 jeblair: I have thought about it a bit over the last week and I think that testing havana on quantal then "I" on precise is silly
19:22:29 clarkb: yes, that sounds silly to me too.
19:22:46 it opens us to potential problems when we open I for dev
19:23:08 and we may as well sink the cost now before quantal and precise have time to diverge
19:23:41 so if we're going to stick with the plan of lts+cloud archive, then i think we should roll back our slaves to precise asap.
19:23:56 and the thought is that we'll be able to test the "j" release on the next lts?
19:24:18 fungi: yes
19:24:30 lts+cloud archive ++
19:24:44 at least until it causes some unforeseen problem
19:24:55 makes sense. i can spin up a new farm of precise slaves then. most of the old ones were rackspace legacy and needed rebuilding anyway
19:25:03 I believe zul and Daviey indicated they didn't think tracking depends in that order would be a problem
19:25:10 jeblair: I assume we want to run it by the TC first?
19:25:23 but I agree that sooner is better than later
19:25:58 the tc agenda is probably long since closed for today's meeting. do we need to see about getting something in for next week with them?
19:26:02 honestly, I don't think the TC will want to be bothered with it (gut feeling, based on previous times I've asked things)
19:26:22 yes, why don't we do it, and just let them know
19:26:24 it doesn't change much in terms of developer experience, since we're still hacking on pypi
19:26:26 don't make it a question
19:26:30 fair enough
19:26:34 make it a "hey we're doing this thing, thought you'd like to know"
19:26:54 if they feel strongly about it, we can certainly open the discussion (and i would _love_ new ideas about how to solve the problem. :)
19:27:19 mordred: you want to be the messenger?
19:27:23 jeblair: sure
19:27:30 I believe we'll be talking soon
19:27:41 #action mordred inform TC of current testing plans
19:28:04 #agreed drop quantal slaves in favor of precise+cloud archive
19:28:14 #action fungi open bug about spinning up new precise slaves, then do it
19:28:45 any baremetal updates this week?
19:28:55 not to my knowledge
19:29:02 #topic open discussion
19:29:20 oh, while we're talking about slave servers, rackspace says the packet loss on mirror27 is due to another customer on that compute node
19:29:27 fungi: !
19:29:32 fwiw, I'm almost done screwing with hacking to add support for per-project local checks
19:29:41 as a result, I'd like to say "pep8 is a terrible code base"
19:29:45 fungi: we should DoS them in return :P
19:29:47 they offered to migrate us to another compute node, but it will involve downtime. should i just build another instead?
19:29:48 fungi: want to spin up a replacement mirror27? istr that we have had long-running problems with that one?
19:29:53 alias opix="open and fix" #usage I'll opix a bug for that
19:30:06 heh
19:30:23 that's the cloud way right? problem server? spin up a new one!
19:30:26 (or 10)
19:30:33 mordred: i agree. :)
19:30:33 yeah, i'll just do replacements for both mirrors in that case
19:30:38 fungi: +1
19:30:51 do we need to spend more time troubleshooting static.o.o problems?
19:31:03 oh?
19:31:04 sounds like we were happy calling it a network issue
19:31:16 oh, the ipv6 ssh thing?
19:31:19 are we still happy with that as the most recent pypi.o.o failure?
19:31:36 ahh, that, yes
19:31:38 fungi: no, pip couldn't fetch 5 packages from static.o.o the other day
19:31:42 right
19:31:48 clarkb: i just re-ran the logstash query with no additional hits
19:32:07 that's what prompted me to open the ticket. i strongly suspect it was the packet loss getting worse than usual
19:32:13 fungi: I see
19:32:33 i'd seen it off and on in the past, but never to the point of impacting tests (afaik)
19:32:46 so - possibly a can of worms - but based off of "jlk | that's the cloud way right? problem server? spin up a new one!"
19:33:16 fungi: though i believe the mirror packet loss is mirror27 <-> static, whereas the test timeouts were slave <-> static...
19:33:18 should we spend _any_ time thinking about ways we can make some of our longer-lived services more cloud-y?
19:33:39 for easier "oh, just add another mirror to the pool and kill the ugly one" like our slaves are
19:33:59 mmm, right. i keep forgetting static is what actually serves the mirrors
19:34:14 mordred: are you going to make heat work for us?
19:34:17 so then no, that was not necessarily related
19:34:20 because I would be onboard with that :)
19:34:29 mordred: fyi we're struggling with that internally too, w/ our openstack control stuff in cloud, treating them more "cloudy" whatever that means.
19:35:06 jlk: yeah, I mean - it's easier for services that are actually intended for it - like our slave pool
19:35:11 otoh - jenkins, you know?
19:35:13 yup
19:35:20 mordred: as they present problems, sure, but not necessarily go fixing things that aren't broke.
19:35:26 yeah, these are harder questions
19:35:29 jeblair: good point
19:35:32 jeblair: +1
19:35:34 mordred: we are making jenkins more cloudy -- zuul/gearman...
19:35:57 does gearman have an easy way to promote to master?
19:36:01 * mordred used floating ip's on hp cloud the other day to support creating/deleting the same thing over and over again while testing - but having the dns point to the floating ip
19:36:12 jlk: no, gearman and zuul will be (co-located) SPOFs
19:36:13 jlk: gearman doesn't have a master/slave concept
19:36:19 mordred: yeah I intend on trying floating ips at some point
19:36:30 clarkb: it worked VERY well and made me happy
19:36:35 * ttx waves
19:36:45 jeblair: doesn't gearman have support for multi-master operation-ish something?
19:36:59 gearman job server(s)
19:37:04 mordred: I think it does, but if zuul is already a spof...
19:37:08 ttx: I am tasked with communicating a change in our jenkins slave strategy to the TC - do I need an agenda item?
19:37:13 clarkb: good point
19:37:21 mordred, jlk: yeah, actually you can just use multiple gearman masters
19:37:32 mordred, jlk: and have all the clients and workers talk to all of the masters
19:37:32 so yes, you can have multiple in an active/active mode
19:37:40 so once gearman is in, then our only spofs will be gerrit/zuul
19:37:44 but as stated, doesn't solve zuul
19:37:51 mordred, jlk: however, we'll probably just run one on the zuul server. because zuul spof.
19:37:57 yeah
19:37:58 mordred: you can probably use the open discussion area at the end. If it's more significant it should be posted to -dev and linked to -tc to get a proper topic on the agenda
19:38:09 (of next week)
19:38:18 ttx: it's not. I don't think anyone will actually care or have an opinion - but information is good
19:38:41 mordred: will try to give you one minute at the end -- busy agenda
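For reference, here is a minimal sketch of the floating-IP pattern mordred describes above: keep one allocated IP (with DNS pointing at it) while the server behind it is repeatedly deleted and rebuilt. It assumes the python-novaclient library of that era; the credentials, server names, and auth URL are placeholders, not anything actually configured on hpcloud.

```python
# Illustrative sketch only: reuse one floating IP across server rebuilds so
# the DNS record pointing at it never has to change. All names/credentials
# below are placeholders.
import time

from novaclient.v1_1 import client

nova = client.Client('username', 'password', 'tenant-name',
                     'https://identity.example.org:5000/v2.0')

# Allocate the floating IP once; DNS (e.g. zuul.example.org) points at fip.ip.
fip = nova.floating_ips.create()

def rebuild(name, image_id, flavor_id):
    """Delete any existing server with this name and boot a fresh one."""
    for old in nova.servers.list(search_opts={'name': name}):
        old.delete()
    server = nova.servers.create(name=name, image=image_id, flavor=flavor_id)
    # Wait for the new server to become ACTIVE before attaching the address.
    while nova.servers.get(server.id).status != 'ACTIVE':
        time.sleep(5)
    # Re-associate the same floating IP, so the DNS name keeps resolving to
    # a working host without any record change.
    nova.servers.add_floating_ip(server, fip.ip)
    return server
```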
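And a minimal sketch of the "multiple gearman masters" idea raised above, using the python-gearman library: workers and clients simply register with every job server, so losing any single server does not take the function offline. The job server hostnames and function name are hypothetical, and zuul's own gearman integration (then still under development) is not shown and may differ.

```python
# Illustrative sketch: a gearman worker connected to more than one job server
# (active/active), so jobs can be pulled from whichever server has them.
# Hostnames and the function name are placeholders.
import gearman

JOB_SERVERS = ['gearman01.example.org:4730', 'gearman02.example.org:4730']

def run_unit_tests(worker, job):
    # job.data carries whatever the client submitted (e.g. a change ref).
    print('running tests for %s' % job.data)
    return 'SUCCESS'

worker = gearman.GearmanWorker(JOB_SERVERS)
worker.register_task('build:run-unit-tests', run_unit_tests)
worker.work()  # blocks, serving jobs from any of the listed servers

# A client would likewise be given the full server list:
#   gearman_client = gearman.GearmanClient(JOB_SERVERS)
#   gearman_client.submit_job('build:run-unit-tests', 'refs/changes/12/3412/2')
```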
19:39:42 jeblair: can I take a turn?
19:39:53 anteaya: the floor is yours
19:39:56 thanks
19:40:09 sorry I haven't been around much lately, figuring out the new job and all
19:40:28 hoping to get back to the things I was working on like the openstackwatch url patch
19:40:49 but if something I said I would do is important, pluck it from my hands and carry on
19:40:51 and thanks
19:41:29 anteaya: thank you, and i hope the new job is going well.
19:41:38 :D thanks jeblair, it is
19:41:57 like most new jobs I have to get in there and do stuff for a while to figure out what I should be doing
19:42:09 getting there though
19:42:46 anteaya: we do need to sync up with you about donating devstack nodes. should i email someone?
19:43:00 hmmm, I was hoping I would have them by now
19:43:18 when I was in Montreal last week I met with all the people I thought I needed to meet with
19:43:29 and was under the impression there were no impediments
19:43:37 thought I would have the account by now
19:43:50 you are welcome to email the thread I started to inquire
19:44:02 though it will probably be me that replies
19:44:13 anteaya: ok, will do. and if you need us to sign up with a jenkins@openstack.org email address or something, we can do that.
19:44:15 let's do that, let's use the official channels and see what happens
19:44:37 I don't think so, I got the -infra core emails from mordred last week
19:44:49 so I don't think I need more emails
19:45:34 email the thread, I'll forward it around, maybe that will help things
19:45:39 and thanks
19:45:43 thank you
19:45:54 i'm continuing to hack away at zuul+gearman
19:46:06 fun times
19:46:08 right before this meeting, i had 5 of its functional tests working
19:46:19 oh, on the centos py26 unit test front, dprince indicated yesterday that he thought finalizing the remaining nova stable backports by thursday was doable (when oneiric's support expires)
19:46:22 dprince, still the case?
19:46:31 i'm hoping i can have a patchset that passes tests soon.
19:46:41 jeblair: nice
19:46:56 yay for passing tests
19:47:13 I have a series of changes up that makes the jenkins log pusher stuff for logstash more properly daemon-like
19:47:44 i'm figuring out how to integrate WIP with gerrit 2.6 configs.
19:48:06 I think that what I currently have is enough to start transitioning back to importing more logs and working to normalize the log formats. But I will probably push that down the stack while I sort out testr
19:48:06 fungi: for grizzly we need the one branch in and we are set.
19:48:18 dprince: any hope for folsom?
19:48:24 fungi: for folsom I think centos6 may be a lost cause.
19:48:30 ugh
19:48:54 clarkb: what are you doing with testr?
19:49:13 'twould be sad if we could test stable/folsom for everything except nova on centos
19:49:33 dprince, fungi: hrm, that means we have no supported python2.6 test for folsom nova
19:49:35 fungi: it looks like it could be several things (more than 2 or 3) that would need to get backported to fix all that stuff.
19:49:43 jeblair: motivating people like sdague and markmc to push everyone else along :)
19:49:49 leaves us maintaining special nova-folsom test slaves running some other os as of yet undetermined
19:49:55 jeblair: I don't intend on doing much implementation myself this time around
19:50:02 clarkb: +1
19:50:33 oh no, what did I do wrong? :)
19:50:41 sdague: nothing :)
19:50:42 jeblair: the centos work can be done. but I'm not convinced it is worth the effort.
19:50:52 jeblair, I'm fine with dropping 2.6 testing on stable/folsom - there should be pretty much nothing happening there now
19:51:05 i'll start to look into debian slaves for nova/folsom unit tests i guess?
19:51:10 options for testing nova on python2.6 on folsom: a) backport fixes and test on centos; b) drop tests; c) use debian
19:51:38 (b) or we'll get (a) done somehow IMHO
19:52:05 I saw b. folsom came out before we made the current distro policy
19:52:06 oh, i prefer markmc's suggestion in that case. less work for me ;)
19:52:12 s/saw/say/
19:52:45 oh so I don't forget.
19:52:49 okay. (b) is a one line change to zuul's config
19:52:54 #action clarkb to get hpcloud az3 sorted out
19:53:10 * jlk has to drop off
19:53:12 #agreed drop python2.6 testing for nova on folsom
19:53:21 jlk: thanks!
19:53:23 * mordred shoots folsom/python2.6 in the facehole
19:53:35 #action fungi add change to disable nova py26 tests for folsom
19:53:58 i'll drop that on top of my oneiric->centos change and we can merge them together
19:54:18 fungi: cool. oh, sorry, i think it's 2 lines.
19:54:30 jeblair: i'll find the extra electrons somewhere
19:54:57 anything else?
19:56:08 hubcap
19:56:13 a merry tuesday to all! (excepting those for whom it may already be wednesday)
19:56:41 thanks everyone!
19:56:42 #endmeeting
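For reference, a sketch of what the "2 line" zuul change agreed above might look like. Zuul's layout.yaml at the time allowed a per-job branch regex, so restricting the python2.6 job for nova away from stable/folsom would be roughly the excerpt below; the job name and regex are illustrative, not the change that actually merged.

```yaml
# Hypothetical excerpt from zuul layout.yaml: stop running the python2.6
# unit test job on the stable/folsom branch of nova.
jobs:
  - name: gate-nova-python26
    branch: ^(?!stable/folsom).*$
```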