19:03:46 <jeblair> #startmeeting infra
19:03:47 <openstack> Meeting started Tue Apr 30 19:03:46 2013 UTC.  The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:48 <ttx> ~o~
19:03:49 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:51 <openstack> The meeting name has been set to 'infra'
19:03:53 <topol> isnt that just the next ptl yelling at you
19:04:06 <topol> to get off
19:04:15 <olaph> o/
19:04:43 <clarkb> o/
19:05:00 <fungi> heyoh!
19:05:25 <jlk> o/
19:05:26 <zaro> o/
19:05:40 <jeblair> mordred: are you above 10,000 feet yet?  :)
19:05:55 <mordred> o/
19:06:15 <jeblair> #topic items from last meeting
19:06:24 <jeblair> fungi: slaves?
19:06:28 <fungi> yup
19:06:42 <fungi> rhel6 slaves have been replaced by centos6 slaves and destroyed/deleted
19:07:06 <fungi> the change to switch the node labels (i added temporary compatibility labels ahead of time) is in the process of merging
19:07:14 <jeblair> cool
19:07:36 <fungi> it also adds some periodic centos6 bitrot jobs so dprince can ferret out the remainder of patches needing backports
19:07:55 <fungi> and i disabled and eventually deleted the precise unit test slaves too
19:08:09 <fungi> since those hadn't been used for a month or more
19:08:10 <mordred> w00t
19:08:29 <jeblair> i believe the stable 2.6 tests are the last thing that holds us back from dropping oneiric?
19:08:32 <fungi> i think that's the current status on slave versions at the moment
19:08:42 <clarkb> jeblair: yes
19:09:04 <jeblair> fungi: sounds good!
19:09:09 <fungi> yeah, dprince is working on sorting the last few patches he needs for that
19:09:12 <fungi> oh
19:09:30 <fungi> also i cleaned up our puppeting to make it easier to add debian slaves should we want/need to do so later
19:09:53 <jlk> or sooner...
19:09:54 <clarkb> that segues us into "what do we do about >= 13.04 having only 9 months of support?"
19:09:59 <jeblair> good, i like having a plan b
19:10:04 <fungi> someone on the tc (hint mordred) needs to strike up the conversations around all that
19:10:20 <jeblair> #topic jenkins slave operating systems
19:10:32 <clarkb> rackspace does have a raring image now fwiw
19:10:32 <mordred> merp
19:10:35 * clarkb checks hpcloud
19:10:36 <mordred> oh good
19:10:41 <jeblair> mordred: you had some thoughts about that last week; want to chat about those?
19:11:15 <mordred> the idea was that since we're not supposed to break rhel/lts
19:11:22 <clarkb> no raring on hpcloud
19:11:36 <mordred> that we use lts nodes to do testing of stable+1
19:12:03 <clarkb> mordred: my concern with that is while we are not supposed to break lts how do we know we havent?
19:12:04 <mordred> and maintain our focus otherwise on current ubuntu release for dev purposes of master
19:12:15 <jeblair> mordred: by stable+1 do you mean two releases back?  eg, folsom now?
19:12:25 <mordred> yes. stable+1 == folsom now
19:12:41 <fungi> good thing our precise unit test slaves needed to be rebuilt on rackspace nova anyway, so deleting them was not wasted work
19:12:51 <mordred> and I'd say that since stable branches are really the purview of the distros and they've pledged to support things on their lts release
19:12:54 <fungi> (well, all but 2 of the 16 anyway)
19:13:00 <mordred> there's a clear ownership for problems
19:13:39 <mordred> Daviey: you around? does the above sound reasonable to you?
19:13:48 <jeblair> mordred: what do we use to test 2.6 on master?
19:13:57 <clarkb> jeblair: centos
19:14:06 <mordred> centos
19:14:17 <clarkb> that problem exists even without the 9 month support
19:14:28 <mordred> or hell - debian apparently has all pythons :)
19:14:55 <fungi> and i assume lts for stable+1 means the lts which was current at the time stable+1 was developed/released, not whatever the latest lts is (which might not be the same occasionally)
19:15:19 <jeblair> i'm wondering why we should bother testing on non-lts at all?  (which i think is pretty similar to what clarkb is saying?)
19:15:21 <fungi> mordred: i wouldn't say all, but wheezy will have 2.6, 2.7 and 3.2 in main, with 3.3 in backports
19:15:34 <clarkb> jeblair: correct
19:15:35 <ttx> fungi: with Ubuntu's cloud archive latest LTS is where you would find stable+1
19:15:39 <mordred> jeblair: because our devs focus on current release, not on lts
19:15:45 <mordred> and for good reason
19:16:11 <ttx> https://wiki.ubuntu.com/ServerTeam/CloudArchive
19:16:16 <zul> ttx: not necessarily  a new version of python
19:16:44 <jeblair> mordred: sure, but your plan is to bump testing of a project that was tested on latest back to lts at a more or less arbitrary point in time
19:16:50 <jlk> wheezy is probably not a good target either, unless you want to shuffle it again soon
19:17:19 <jeblair> mordred: so if that's going to work, why won't just testing on the lts to start with work?
19:17:27 <mordred> jeblair: we could test on both the whole time and then just drop "latest" when it dies
19:17:35 <mordred> sequencing, I believe
19:17:43 <mordred> new things go in to master, tested against latest ubuntu
19:17:59 <clarkb> we have been really bad at changing platforms for various reasons
19:18:01 <mordred> they will be backported to cloud archive for lts, but probably not until they've landed on master I'd imagine
19:18:14 <mordred> new features are not landed against stable branches
19:18:16 <clarkb> we are only just recently on quantal (~6 months after release) and devstack is still all precise
19:18:25 <mordred> so the needs for preemptive backporting don't exist
19:18:39 <clarkb> if we have to iterate that quickly just to drop support 3 months later that feels like a lot of wasted effort to me
19:19:03 <mordred> clarkb makes a good point that most of this is theory and not practice
19:19:06 <zul> wait what are you guys trying to do?
19:19:21 <mordred> zul: your new support lifecycle broke ours
19:19:22 <fungi> well, i consider the time i spent getting tests running on quantal will be applicable toward getting them running on raring anyway
19:19:34 <mordred> zul: our stable branches need testing for 12 months
19:19:43 <zul> mordred:  how?
19:19:44 <mordred> but ubuntu only now exists for 9 months at a pop
19:19:47 <ttx> mordred: sounds like revenge :)
19:19:52 <mordred> ttx: likely :)
19:20:11 <mordred> zul: so we're trying to sort out how to test changes to stable+1 branches
19:20:24 <zul> mordred:  we wouldnt get revenge ;)
19:20:39 <clarkb> now we could just automagically switch to $newrelease when they come out and break the gate
19:20:41 <zul> mordred:  why not 12.04 with the cloud archive enabled
19:20:47 <mordred> because
19:20:49 <ttx> zul: we can't continue to test grizzly/raring when raring goes out of support, and we need it for 12 months, and you now provide 9 instead of 18
19:21:00 <mordred> zul: sorry, that's what I was proposing
19:21:05 <clarkb> with the expectation that a week after ubuntu releases we spend a couple days fixing all the things
19:21:16 <zul> mordred:  that sounds sane to me
19:21:18 <clarkb> but if we try to get things in place and working first I think we will always be well behind the curve
19:21:29 <mordred> clarkb: you mean for master?
19:21:32 <clarkb> mordred: yes
19:21:34 * fungi dislikes the kind of scramble "taking a couple days to fix all the things breaking the gate" implies
19:21:45 <mordred> clarkb: I think our problem has historically been the slow speed our cloud vendors have in providing us images
19:21:48 <zul> well all of the stuff is gotten from pip isnt it?
19:21:55 <mordred> zul: libvirt
19:22:04 <fungi> and python itself
19:22:10 <jeblair> mordred: clarkb, fungi, and dprince have put a huge amount of effort into upgrading to quantal, and we're not done yet
19:22:14 <mordred> which, btw, is broken on redhat (python itself)
19:22:15 <zul> mordred:  libvirt is not a problem the cloud archive gets the same version whats in the development release
19:22:21 <mordred> jeblair: agree
19:22:36 <clarkb> mordred: my issue is that if we switch to raring at the end of havana (as we switched to quantal after grizzly) then we have only 3 months of support on that before we drop back to LTS
19:22:52 <jeblair> clarkb: we shouldn't have waited that long
19:22:55 <clarkb> so its not just the stable branches that are at issue. we *will* have to iterate much faster than we have been able to
19:22:59 <clarkb> jeblair: yeah that
19:23:20 <mordred> well, problem 1 is that it takes so long for us to be able to _start_ migrating
19:23:26 <jlk> the worry is that changing the test platform in the middle of a release cycle introduces more churn than desired?
19:23:31 <mordred> but - I'm willing to not die on this hill
19:23:54 <clarkb> jlk: it is a lot of churn that we give up on shortly after (to me the benefits are fuzzy but the costs are known and expensive)
19:24:02 <jlk> nod
19:24:09 <mordred> if everyone else thinks that lts+cloud-archive is sane for master, then fine... I just worry that we're going to hit backport hell like we had 2 years ago
19:24:30 <jlk> and the flip side is that continuing to test on a dead-end platform isn't providing much benefit in the Real World?
19:24:32 <mordred> but it's also possible that we've stabilized and I'm being an old curmudgeon
19:25:01 <zul> mordred:  you shouldnt since you guys are getting the python dependencies from pypi
19:25:29 <mordred> zul: yeah. and I think libxml and libvirt are reasonably sane from an api perspective at this point
19:25:40 <mordred> I'm game
19:25:42 <clarkb> mordred: if we could get rackspace and hp to commit to having images available in a decent amount of time (thank you rackspace for raring) I think we could try speeding the cycle up
19:25:43 <zul> mordred:  hah libvirt sane
19:25:51 <mordred> clarkb: HAHAHAHAHAHAHAHAHAHA
19:25:51 <jlk> the desire is for something newer than LTS, but with longer support than Ubuntu now has
19:25:55 <clarkb> mordred: exactly
19:26:04 <mordred> clarkb: HAHAHAHAHAHAHAHAHAHA
19:26:07 <mordred> sorry
19:26:11 <mordred> I repeated myself
19:26:11 <clarkb> if we are beholden to other people it is really hard to promise with such a short window
19:26:35 <mordred> yes. I believe the story will be a bit different once we have glance api endpoints available, but they are not there now
19:26:37 <jlk> clarkb: is it just Ubuntu images you're in need of, or would fast access to other platforms (like Fedora) help as well?
19:26:47 <clarkb> jlk: right now its just ubuntu
19:26:48 <mordred> jlk: fedora would not be helpful
19:26:54 <fungi> from my perspective, the desire is for something which has the versions of system components we need (python et al) and decent run time for security support into the future
19:27:00 <jlk> ok. (just trying to understand the problem scope)
19:27:18 <mordred> jlk: when people talk about cloud interop - this is one of the things that doesn't get talked about enough
19:27:32 <jlk> image availability, or platform to run on?
19:27:33 <clarkb> jlk: the TC decided in january that we would test on latest ubuntu with an eye for not breaking current RHEL and Ubuntu LTS
19:27:34 <mordred> without image upload ability, we're stuck waiting until BOTH clouds upload new images
19:27:46 <jlk> nod
19:28:01 <clarkb> we are going to use centos to test python 2.6 as ubuntu has ditched 2.6. That covers current RHEL
19:28:07 <jlk> is somebody working on image upload capability on the RAX side? (understanding that it's not a problem for raring at this time)
19:28:24 <mordred> yes. both clouds want it as a feature
19:28:27 <clarkb> so now we need to accommodate testing on current ubuntu or ask the TC to reconsider the platforms we test on or ????
19:28:46 <mordred> well - the policy has always been dev on latest ubuntu
19:28:57 <mordred> however, I do not believe we have EVER actually been able to do that
19:29:01 <mordred> due to lag time
19:29:20 <mordred> we've pretty consistently been at least one release behind
19:29:30 <jeblair> https://etherpad.openstack.org/python-support-motion
19:29:31 * jlk cries a little to himself, softly.
19:29:36 <jeblair> #link https://etherpad.openstack.org/python-support-motion
19:29:36 <mordred> so it might just be time to call a spade a spade and go with a new plan
19:30:00 <fungi> distro making release + all providers of interest making images available of that release + time for us to test and fix things we need working on that release + time to switch tests over to it
19:30:23 <jeblair> technically the tc agreed on a motion about targeting development.  it wasn't quite so specific to specify exactly what test platforms, but i think the intent is that we should use it to guide what we do
19:30:33 <mordred> I agree
19:30:58 <mordred> however, I betcha we could do the lts+cloud archive for testing
19:31:06 <mordred> as a vehicle to support that motion
19:31:10 <zul> +1
19:31:17 <mordred> and have few enough corner cases that anyone would ever notice
19:31:41 <jeblair> mordred: it sounds like that's worth a shot, and if we get into dependency hell, then we know we have a reason to speed up the treadmill
19:31:43 <mordred> since cloud archive has latest ubuntu backports of the relevant bits
19:31:48 <mordred> jeblair: ++
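(For reference, enabling the Cloud Archive on a precise slave is roughly the following — a minimal sketch based on the ServerTeam/CloudArchive wiki linked above; the grizzly pocket is shown only as an example and would need to match the release actually under test.)

    # Minimal sketch of enabling the Ubuntu Cloud Archive on a precise node.
    # The grizzly pocket is illustrative; pick the pocket for the branch being tested.
    sudo apt-get install -y ubuntu-cloud-keyring
    echo "deb http://ubuntu-cloud.archive.canonical.com/ubuntu precise-updates/grizzly main" \
        | sudo tee /etc/apt/sources.list.d/cloud-archive.list
    sudo apt-get update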
19:32:00 <mordred> so....
19:32:14 <mordred> we may want to add an apt mirror for the cloud archive
19:32:23 <mordred> as I do not believe our providers are doing local mirrors of it
19:32:49 <mordred> or maybe it doesn't matter?
19:32:56 <jlk> seems like Ubuntu is going to be trying to do that work anyway (keeping OStack releases going on LTS) so making use of that effort makes sense to me.
19:33:11 <fungi> if we're going to that trouble, it seems sane to just mirror what we need in general (rackspace's ubuntu mirror has gone down from time to time too)
19:33:22 <clarkb> fungi: ++
19:33:38 <mordred> fungi: I have a reprepro config for it already actually
19:34:00 <jeblair> mordred: you want to drop that in puppet then?
19:34:06 <mordred> we'd need cloud-local mirrors
19:34:21 <mordred> which I'm not 100% sure how to solve - but I'll put my brainhole on it
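(Not mordred's actual config, but a rough sketch of what a reprepro mirror of the Cloud Archive pocket could look like; the base directory, pocket, and architectures here are assumptions.)

    # Rough sketch of mirroring the Cloud Archive grizzly pocket with reprepro.
    mkdir -p /srv/reprepro/cloud-archive/conf

    cat > /srv/reprepro/cloud-archive/conf/distributions <<'EOF'
    Origin: Ubuntu Cloud Archive
    Codename: precise-updates/grizzly
    Architectures: amd64
    Components: main
    Update: cloud-archive
    EOF

    cat > /srv/reprepro/cloud-archive/conf/updates <<'EOF'
    Name: cloud-archive
    Method: http://ubuntu-cloud.archive.canonical.com/ubuntu
    Suite: precise-updates/grizzly
    Architectures: amd64
    Components: main
    EOF

    # Initial pull (and subsequent refreshes, e.g. from cron)
    reprepro -b /srv/reprepro/cloud-archive update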
19:34:47 <fungi> i continue to wonder if cloud-local (one per az or whatever) mirrors don't also make sense for our pypi mirroring
19:35:01 <jeblair> mordred: i think devstack-gate can accommodate that fairly easily
19:35:08 <jlk> mordred: that'd be our own instances acting as a mirror, because of fear that the provider provided mirror might go down?
19:35:16 <clarkb> jlk: correct
19:35:17 <jeblair> mordred: (in the image creation step, do provider-specific apt-source config)
19:35:27 <clarkb> jlk: and they do go down occasionally
19:35:46 <mordred> jeblair: yes. although I'd like to figure it out for unittest slaves too
19:35:58 <jlk> clarkb: but doesn't that just cause them to hit the next mirror (maybe more slowly)?
19:36:09 <jlk> forgive me, I come from the yum world
19:36:13 <zul> jeblair:  besides you know who to bug if something breaks ;)
19:36:17 <fungi> jlk: the bigger issue is, the more external bits we rely on being reachable for tests, the more their outages multiply each other (multiple points of failure rather than just a single point of failure)
19:36:22 <jeblair> mordred: actually, if you solve it for unit test slaves in puppet , you might not have to do anything special for devstack-gate.
19:36:25 <clarkb> jlk: no apt usually complains then dies
19:36:31 <jlk> clarkb: awesome :/
19:36:48 <jeblair> mordred: (even though unit test slaves don't strictly need it right now because they are one-provider)
19:36:48 <jlk> fungi: I understand that. What I didn't quite grasp was that apt doesn't have a mirror system to fail through
19:37:00 <mordred> jeblair: let me poke and see if there is a pure-puppet mechanism I can dream up
19:37:05 <mordred> jeblair: ++
19:37:31 <jeblair> #action mordred set up per-provider apt mirrors (incl cloud archive) and magic puppet config to use them
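(One way the per-provider selection could be wired up at image-build time — a hypothetical sketch; the provider names, mirror hostnames, and pocket are made up for illustration and are not the names used by the real puppet modules or devstack-gate.)

    # Hypothetical sketch of provider-specific apt sources at image-build time.
    PROVIDER=${PROVIDER:-rackspace}

    case "$PROVIDER" in
        rackspace) MIRROR="http://mirror.dfw.example.org" ;;
        hpcloud)   MIRROR="http://mirror.az1.example.org" ;;
        *)         MIRROR="http://mirror.example.org" ;;
    esac

    # Point apt at the per-provider mirror, including a Cloud Archive pocket
    sudo tee /etc/apt/sources.list.d/openstack-infra.list <<EOF
    deb $MIRROR/ubuntu precise main universe
    deb $MIRROR/ubuntu precise-updates main universe
    deb $MIRROR/cloud-archive precise-updates/grizzly main
    EOF
    sudo apt-get update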
19:37:52 <fungi> jlk: yeah apt expects to try one url to retrieve a package, and then errors out rather than continuing to spend time trying other urls
19:38:14 <jlk> sad
19:38:36 <fungi> well, the alternative is to take lots of time to realize your network is broken and it's not a mirror issue
19:38:38 <jlk> so really, doing our own is just moving the potential problem closer to us
19:38:43 <jeblair> so, er, we're dropping the quantal slaves and going back to precise?
19:39:00 <jeblair> jlk: which tends to work out for us
19:39:10 <clarkb> jeblair: I really want to say no, because quantal has 18 months of support
19:39:16 <jlk> fungi: *shrug* in the yum world that could be a matter of seconds or so. But you never suffer from a single mirror being out of date or down.
19:39:16 <clarkb> but doing quantal then going back to precise is just weird
19:39:47 <fungi> when is the next ubuntu lts due?
19:40:01 <clarkb> 14.04
19:40:11 <clarkb> one year
19:40:15 * fungi thinks doing quantal and then upgrading to wheezy doesn't sound *that* weird ;)
19:40:40 <clarkb> (we may want to move onto other topics before our hour is up)
19:40:47 <fungi> concur
19:40:52 <jlk> +1
19:40:53 <clarkb> I think we have a good handle of the problem with a general solution. we can sort out details later
19:41:00 <jeblair> hrm
19:41:02 <jeblair> details are important?
19:41:15 <jeblair> clarkb: i think you just said that we have decided to test "I" on precise
19:41:21 <jlk> details are important, but consensus may not be reached during meeting.
19:41:26 <clarkb> definitely but so are things like gerrit 2.6, lists.o.o, logstash, etc :)
19:41:36 <jeblair> clarkb: and we're either planning on testing "H" on either quantal or precise?
19:41:47 <clarkb> jeblair: that is how I grok
19:42:13 <jeblair> i think that's kind of an important point to resolve so we don't go off-track...
19:42:38 <fungi> will 14.04 be available in time for the "ifoo" development timeframe, or not until "jbar"?
19:42:43 <jeblair> if you want, we can punt to the next meeting for time, but i don't want to start work on this project without resolving that.
19:43:03 <clarkb> jeblair: ++ I don't intend on things changing until we have consensus
19:43:38 <jeblair> okay, i'll put this on the agenda for next time then
19:43:53 <jeblair> however, the agenda wasn't updated since last time, so, what else would you like to discuss? :)
19:44:23 <clarkb> woops. I know gerrit 2.6, lists.o.o, and logstash are things that are on my radar
19:44:28 <mordred> bunnies
19:44:30 <fungi> does anyone have anything specifically for me? i need to duck out early (another minute or two)
19:44:35 <mordred> nope
19:44:46 <jeblair> #topic gerrit2.6
19:44:48 <clarkb> also I'd like to talk about testr maybe as we should really push that hard before people get wary of merging those changes
19:44:59 <mordred> I believe zaro is going to start looking at 2.6
19:45:08 * fungi ducks out. back in #-infra later if something comes up
19:45:10 <zaro> just started reading docs.
19:45:19 <mordred> awesome
19:45:25 <jeblair> zaro: can you find-or-create a bug in openstack-ci about upgrading to 2.6 and assign it to yourself?
19:45:35 <zaro> jeblair: sure will.
19:45:54 <clarkb> as I understand it the intent with gerrit 2.6 is to no longer run a fork of gerrit
19:46:00 <clarkb> is that correct?
19:46:01 <mordred> zaro: welcome to the traditional gerrit hazing tasks - everyone has had one when they started ... :)
19:46:05 <jeblair> i would like to not run a fork
19:46:05 <mordred> clarkb: yes. if possible
19:46:19 <mordred> ++
19:46:24 <zaro> mordred: thanks a lot.  clarkb warned me during interview.
19:46:27 <clarkb> awesome
19:46:29 <jeblair> if we have to diverge, i'd like us to try to do it in a way where we expect the divergence to be accepted upstream
19:46:37 <clarkb> ++
19:46:41 <mordred> ++
19:46:49 <clarkb> I ask because I think this will influence the upgrade process
19:46:59 <clarkb> want to make sure we are on the same page. sounds like we are \o/
19:47:18 <jeblair> yeah, it's not just forward-port patches, it's gap-analysis, and try to figure out the easiest way to close
19:47:24 <jeblair> clarkb: +1
19:47:33 <jeblair> #topic lists.o.o
19:47:59 <jeblair> clarkb: TTL is already 300
19:48:02 <clarkb> so I just booted and puppetted a replacement server for lists.o.o (old server is oneiric which will EOL in just over a week)
19:48:13 <jeblair> clarkb: so dns is ready to change when you are
19:48:33 <clarkb> ok. I will set temporary DNS records for the new host after this meeting.
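(For reference, the lowered TTL jeblair mentions can be verified before the cutover with dig; lists.openstack.org is the real hostname and 300 seconds is the TTL in question.)

    # Check the remaining TTL on the current record before switching DNS;
    # the second field of each answer line is the TTL in seconds.
    dig +noall +answer lists.openstack.org A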
19:48:44 <jeblair> we should announce a cutover time
19:48:59 <clarkb> I will start the data migration after the 1st to avoid any mailman monthly emails
19:49:14 <clarkb> jeblair: yes. Is this something that we think needs to happen over a weekend?
19:49:20 <clarkb> (I am leaning that way)
19:49:21 <jeblair> and as i mentioned in -infra a few mins ago, i think we should avoid having exim send over v6 to start
19:49:28 <mordred> ++
19:49:30 <jeblair> though i think it's okay to add AAAA records
19:49:45 <jeblair> (and have exim receive on v6)
19:49:50 * mordred agrees with every opinion jeblair has on mail
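(A sketch of how the "receive on v6, send on v4 only" behaviour could be expressed in Exim's main configuration — dns_ipv4_lookup and local_interfaces are real Exim options, but the split-config path and restart commands below are assumptions about how the server is managed.)

    # Hypothetical sketch: accept mail over IPv6 but deliver outbound over IPv4 only.
    cat <<'EOF' | sudo tee /etc/exim4/conf.d/main/00_ipv4_outbound
    # Only look up A records when routing, so deliveries go out over IPv4
    dns_ipv4_lookup = *
    # Still listen on both IPv4 and IPv6 for inbound mail
    local_interfaces = <; 0.0.0.0 ; ::0
    EOF
    sudo update-exim4.conf && sudo service exim4 restart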
19:50:00 <clarkb> if we want to do a weekend before oneiric EOLs we will have to do it this weekend. We can do it the one after if we are willing to risk a couple days of EOL
19:50:20 <zaro> this weekend is good for me.
19:50:21 <jeblair> clarkb: yes, i think something in the friday-night to sunday-morning range
19:50:30 <jeblair> clarkb: and this weekend works for me too
19:50:44 <mordred> same here
19:50:47 <clarkb> how about 9am PST saturday?
19:50:54 <mordred> great
19:50:57 <jeblair> wfm
19:51:01 <zaro> +1
19:51:10 <clarkb> ok, I will send a notification this afternoon after lunch
19:51:16 <jeblair> #action clarkb send email announcing lists.o.o move at 9am pst saturday
19:51:37 <jeblair> #topic testr
19:51:38 <Daviey> mordred: hey
19:51:48 <jeblair> clarkb: testr thoughts?
19:51:57 <mordred> yes. I agree
19:51:58 <clarkb> ya, nova, quantum and some of the clients are done
19:52:08 <mordred> we should push hard on testr early in the cycle
19:52:15 <clarkb> mordred: do we need to be more coordinated and push testr on everyone else before milestone 1?
19:52:23 <mordred> but - it's a big task and slightly out of scope for us
19:52:37 <mordred> I think we should just get markmc and sdague to yell at people
19:53:10 <mordred> (honestly, there's no way that we have the manpower to do it by ourselves)
19:53:36 <mordred> so perhaps bugging ttx to start a chat with folks in the meeting about best ways to get them migrated?
19:55:53 <mordred> I can tell everyone is excited by this topic
19:56:47 <hub_cap> woooo
19:56:55 <sdague> me yelling doesn't help all that much :)
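(For the projects still to be migrated, the core of the change is small — roughly a .testr.conf like the ones already-converted projects carry. This is a sketch of the common pattern, not any specific project's file; the ./project/tests path is a placeholder.)

    # Sketch of the .testr.conf most migrated projects use.
    cat > .testr.conf <<'EOF'
    [DEFAULT]
    test_command=OS_STDOUT_CAPTURE=1 OS_STDERR_CAPTURE=1 python -m subunit.run discover -t ./ ./project/tests $LISTOPT $IDOPTION
    test_id_option=--load-list $IDFILE
    test_list_option=--list
    EOF

    # Create the local test repository and run the suite in parallel
    testr init && testr run --parallel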
19:56:55 <jeblair> #topic eavesdrop.o.o
19:57:05 <jeblair> i think eavesdrop needs migration too.
19:57:11 <mordred> yah
19:57:11 <clarkb> that works for me.
19:57:11 <clarkb> but I think we should pay attention to it and be proactive
19:57:11 <clarkb> #action clarkb to ping markmc and sdague about move to testr
19:57:11 <clarkb> I will see what they think and do braindumps as necessary
19:57:11 <jeblair> clarkb: shall we do it at the same time as lists?
19:57:11 <clarkb> Do we want an open discussion? I can talk about logstash a little bit too
19:57:18 <clarkb> jeblair: might as well
19:57:35 <jeblair> #topic open discussion
19:57:45 <ttx> mordred: ack, how about I plug you in during the meeting so that you pass the bucket?
19:57:45 <ttx> maybe someone else will volunteer to do the nagging
19:57:55 <mordred> ttx: great
19:58:05 <clarkb> FYI I think logstash.o.o's data is now consistent. After much hammering
19:58:21 <clarkb> index size per day has grown to about 5GB
19:58:25 <jeblair> clarkb: so we're past the burn-and-rebuild stage?
19:58:30 <clarkb> jeblair: I think so
19:58:37 <jeblair> neato!
19:58:42 <mordred> ++
19:59:26 <clarkb> there is a bug in kibana where the timestamps don't show their milliseconds correctly... it's fixed in master. I may pull that in. Otherwise I think the next step is getting other logs into logstash
20:00:08 <jeblair> cool.  i think that's time for us.
20:00:24 <jeblair> thanks all, and we'll work out the rest of the details about test platforms next week
20:00:26 <jeblair> #endmeeting