19:02:38 #startmeeting infra
19:02:38 o/
19:02:38 Meeting started Tue Apr 9 19:02:38 2013 UTC. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:39 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:41 The meeting name has been set to 'infra'
19:02:44 o/
19:03:08 mordred: do you have anything before you get on a plane?
19:03:20 jeblair: I need to unbreak 2.6 for pbr projects
19:03:25 #topic mordred plane flight
19:03:55 we had wanted to just wait for rhel to fix that for us
19:04:06 fungi: but i'm guessing that's not going to happen today
19:04:10 yeah - but rhel seems unhappy too I thought?
19:04:25 or is it legit just 2.6 ubuntu that's giving us the problem?
19:04:25 mordred: and pbr is broken now
19:04:26 i believe dprince was still working on backports from master
19:04:51 which is to say we could switch to rhel6 today for master/havana afaik
19:05:09 mordred: so what if we switch the pbr projects to rhel for 26?
19:05:21 jeblair: let's try that as step one
19:05:32 jeblair: and see how it goes before trying a plan b
19:05:39 mordred: i'm worried that otherwise it involves puppet hacking
19:05:43 it _should_ solve the underlying problem
19:05:43 mordred: have an example of a job you'd like me to fire on a rhel6 slave as a test?
19:06:00 mordred: which would be very useful, i mean, clarkb thinks we are going to run into this again with python3
19:06:06 or is it already testing successfully on rhel6 non-voting?
19:06:15 jeblair: I actually spoke to someone about that and assuming pip -E works properly we should be able to patch the puppet provider
19:06:27 or we run puppet twice with two different python envs set
19:06:29 fungi: gate-os-config-applier-python26 and gate-gear-python26
19:06:44 jeblair: yah. I believe we ultimately need to solve this more resiliently
19:06:46 mordred, clarkb: you want to fight over which of you hacks that? :)
19:06:55 but I think that doing that with some time to think about it properly would be nice
19:06:55 jeblair: I did install zmq for the log pusher script through apt and not pip to avoid this problem :)
19:07:06 mordred: those are also failing on rhel6...
19:07:07 clarkb: also, that's preferred anyway. :)
19:07:09 #link https://jenkins.openstack.org/job/gate-os-config-applier-python26-rhel6/
19:07:20 fungi: GREAT
19:07:35 no idea if they're failing the same way, but they're failing
19:07:37 jeblair: I can bring it back up with finch over at puppetlabs, and see if we can hack something useful for all of puppet users
19:07:38 mordred: fascinating -- and that's with a 2.6 egg
19:08:00 ok. that blows my previous theory
19:08:39 basically requires me to do more pip testing and feed him the info so that we can get a patch into the provider
19:08:50 how about I spin up a rhel env and debug and get back to everyone. in the mean time, os-config-applier and gear could disable python2.6 tests for the time being if they're blocked
19:08:58 mordred: +1
19:09:05 sounds good
19:09:09 clarkb: +1 as well
19:09:13 (not ideal, but, you know, neither project will die without 2.6 for a day)
19:09:16 clarkb: ++
19:09:27 * mordred will add better 2.6 testing to pbr as well
19:09:30 * mordred cries
19:09:35 mordred: and did you see the zuul bug i filed?
19:09:44 * dprince is willing to help out if needed too
19:09:46 #link https://bugs.launchpad.net/zuul/+bug/1166937
19:09:47 Launchpad bug 1166937 in zuul "Option to group multiple jobs together in job trees" [Wishlist,Triaged]
19:09:54 jeblair: I did not. I'll look
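
For context on bug 1166937: zuul runs each project's jobs as a tree, where the jobs in a sub-tree only start once their parent job has succeeded, and the bug asks for a way to group several jobs together within such a tree. A simplified sketch of the structure in Python (illustrative only, not the actual zuul/model.py code):

    # Simplified sketch of zuul's job-tree idea (not the real zuul code):
    # each node holds one job plus the sub-trees of jobs that run only if
    # that job succeeds.  Bug 1166937 asks for a way to group several jobs
    # together at one level of such a tree.
    class JobTree(object):
        def __init__(self, job=None):
            self.job = job          # None for the root of a project's tree
            self.job_trees = []     # children; run only after self.job succeeds

        def addJob(self, job):
            tree = JobTree(job)
            self.job_trees.append(tree)
            return tree
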
19:09:59 awesome
19:10:20 dprince - if you happen to get bored and figure out why https://jenkins.openstack.org/job/gate-os-config-applier-python26-rhel6/ is breaking in the next hour before I get back online, I will buy you a puppy
19:10:22 i encapsulated what we talked about, including after you dropped off
19:10:29 jeblair: thanks!
19:10:47 mordred: i think that implementation is mostly in zuul/model.py
19:10:58 dprince: make sure it is a house trained puppy
19:10:59 excellent
19:11:00 mordred: and a little bit in the config parser in scheduler.py
19:11:07 mordred: I already have one (a humpy one)
19:11:14 nice
19:11:18 ok - me run to plane
19:11:21 back online in a bit
19:11:29 mordred: godspeed
19:11:53 there were no actions from last meeting
19:11:57 #topi gerrit/lp groups
19:11:59 #topic gerrit/lp groups
19:12:25 mmm, did we still have any to-dos on that?
19:12:42 i think it's wrapped up aside from any other cleanup ttx might have wanted to do in lp
19:12:52 fungi: did the ptl change land?
19:13:01 i was just checking...
19:13:24 #link https://review.openstack.org/25806
19:13:30 merged yesterday
19:13:53 woo
19:13:55 oh, i probably should add a note in that groups cleanup bug of ttx's
19:14:03 #topic grenade
19:14:19 dtroyer pointed me at some changes he wants to merge first, and then...
19:14:30 we can cut stable/grizzly branches of grenade and devstack
19:14:50 and then i think we'll be set to run non-voting grenade jobs widely on both master and stable/grizzly
19:15:14 \o/
19:15:20 is it working now?
19:15:29 clarkb: it has occasionally succeeded
19:16:14 nice
19:16:16 clarkb: i haven't really analyzed the failures to know more about when it succeeds/fails
19:17:09 #topic gearman
19:17:40 so on my side, i wrote a new python gearman client that is much more suited to how we want to use it in zuul
19:17:43 this is depressing me
19:17:58 i've been debugging.
19:18:02 #link https://github.com/openstack-infra/gear
19:18:11 finally figured out exactly why we're getting double builds.
19:18:57 it is because an error occurs when attempting to reregister functions while the current build is running.
19:19:17 zaro: i can't see a need to register functions while a build is running
19:19:43 when an error occurs on the worker it will close the connection with gearman then reopen, but the build is still on the gearman queue so it runs again.
19:20:34 jeblair: code i've got re-registers on events from jenkins.
19:20:55 zaro: right, but it doesn't need to do that while a build is running
19:20:57 jeblair: you might want to register at any time.
19:21:24 jeblair: you mean block until build finishes?
19:21:30 zaro: functions are registered per-worker; a worker doesn't need to change its functions while a build is running
19:22:26 zaro: i would postpone changing functions until after the build is complete (which i included in my sketch of a worker routine i sent the other day)
19:22:46 jeblair: i see what you mean. i was looking for a way to get more granular in registering, but didn't see a way. i can look again.
19:24:28 ok. will try this approach again. can't remember why i gave up last time.
19:24:33 zaro: i'm of the opinion that the gearman-java GearmanWorkerImpl makes too many assumptions about how it's being used; I think we probably will need to write our own GearmanWorker. I'm still reading, but I'd like you to consider that as you continue to dig into it.
19:25:08 will do.
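
The ordering jeblair suggests — let the current build finish, then change the worker's registered functions between jobs — can be sketched in Python with the gear library linked above (assuming its Worker/addServer/registerFunction/getJob/sendWorkComplete interface; the real fix belongs in the Java gearman-plugin, so this only illustrates the shape of the idea, and run_build is a hypothetical helper):

    import gear

    worker = gear.Worker('example-jenkins-worker')
    worker.addServer('127.0.0.1')            # gearman server host (assumed)
    worker.registerFunction('build:example-job')

    pending_functions = []                   # changes queued by jenkins events

    while True:
        job = worker.getJob()                # blocks until a build is assigned
        run_build(job)                       # hypothetical: run the jenkins build
        job.sendWorkComplete(b'SUCCESS')
        # Only now, with no build in flight, apply queued (re)registrations,
        # so a registration error can't drop the connection and leave a
        # running build on the gearman queue to be run a second time.
        for name in pending_functions:
            worker.registerFunction(name)
        del pending_functions[:]
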
19:25:43 #topic pypi mirror/requirements
19:26:07 we are gating openstack/requirements on the ability to install all requirements together
19:26:38 and when https://review.openstack.org/#/c/26490/ merges, we will actually be running the requirements gate jobs for projects
19:27:51 we probably need to make the jobs and repo branch aware pretty soon...
19:28:06 i think that depends on how openstack/requirements wants to handle branches. maybe a summit question.
19:28:47 I do have a question about reviewing openstack/requirements. currently we have +2 and approve perms, but it seems like we should defer to the PTLs for most of those reviews?
19:29:35 clarkb: i think so. i only really intend on weighing in when it seems to affect build/test oriented things...
19:29:47 i've been refraining from approving them in most cases if it's only ci core votes on them, unless there's some urgency
19:30:26 usually only when it's breaking the gate or holding back a ci project
19:30:39 i don't feel i have a lot of input on random library versions, so yeah, i'd say we should be conservative and mostly the openstack-common and ptls should be weighing in most of the time
19:30:40 fungi: +1
19:31:04 cool. I figured we had the perms to sort out problems, but wasn't sure if we had been asked to actually manage the repo
19:31:08 er, poor wording. not a ci project but rather ci work on an openstack project which uses the requirements repo
19:31:40 markmc did explicitly want us to be involved, so afaik, we're not stepping on anyone's toes.
19:32:10 we didn't accidentally get perms to the repo, we really are supposed to have them. :)
19:32:57 #topic releasing git-review
19:33:03 it happened
19:33:10 woo!
19:33:16 1.21 is on pypi, manually this time
19:33:33 1.22 may be automated, if the gods are willing
19:33:52 i've been running back through and closing out bug reports if they're fixed in 1.21
19:34:02 fungi: do we need to schedule a chat (perhaps when mordred is around) about pbr/etc for git-review?
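
As background for that question: moving git-review onto pbr would mostly mean putting its packaging metadata into setup.cfg and shrinking setup.py to the pbr boilerplate, which (per pbr's documentation; the exact version pins in use at the time varied) looks roughly like:

    import setuptools

    # Minimal pbr-style setup.py: the real metadata lives in setup.cfg,
    # and pbr derives the version from git tags (e.g. the 1.21 tag above).
    setuptools.setup(
        setup_requires=['pbr'],
        pbr=True)
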
19:34:29 yes, some time in one of those rare moments when he's not on a plane
19:34:41 fungi: i'll put it on the agenda so we don't forget
19:34:51 thanks
19:35:07 #topic baremetal testing
19:35:31 so the tripleo folks have been changing up diskimage-builder and how it creates the bootstrap node a bit
19:36:04 so I've been testing changes as they come along so we're ready once they're ready to start doing formal testing
19:37:13 also been working on getting this going https://github.com/openstack-infra/devstack-gate/blob/master/README.md#developer-setup but keep bumping into issues with the instructions (they're a bit slim, need some local additions and modifications)
19:37:23 * ttx lurks
19:37:32 pleia2: they may have bitrotted too, let me know if you have questions
19:37:44 pleia2: i haven't actually had to follow those in months
19:37:49 did have a wip-devstack-precise-1365534386.template.openstack.org started on hpcloud this morning though, even if it failed once it tried to grab hiera data from puppet
19:38:09 jeblair: great, thanks
19:38:24 just trying to work with it to get a feel for how this works
19:38:31 (aside from just reading scripts)
19:38:50 that's about it though for baremetal
19:38:56 that's awesome progress
19:39:03 o/
19:39:03 pleia2: you may need to combine the 'install_jenkins_slave.sh' trick of running puppet apply with the devstack-gate developer setup to avoid it trying to talk to our puppetmaster
19:39:28 jeblair: makes sense, thanks
19:39:32 pleia2: did you get past the sqlite db content requirements, i guess?
19:39:44 fungi: yeah, devananda got me sorted :) (I'll be updating the docs)
19:39:53 pleia2: thanks much. :)
19:39:59 excellent
19:40:33 i think the config portion of the sqlite db should become a yaml file (though the status portion should probably remain a sqlite db)
19:40:55 that is not high on my todo list. :(
19:41:23 #topic open discussion
19:41:37 logstash
19:41:45 o/ I added openstackwatch as an agenda item to the wrong wikipage
19:41:53 anteaya: which page?
19:42:06 https://wiki.openstack.org/wiki/Meetings/CITeamMeeting
19:42:13 anteaya: yeah should have been https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting
19:42:13 * clarkb blazes ahead with logstash because it should be short
19:42:21 yeah
19:42:22 d'oh
19:42:23 clarkb: go
19:42:39 logstash is running on logstash.openstack.org. you can get the web gui and query it at http://logstash.openstack.org
19:42:39 anteaya: (didn't know that existed; will delete)
19:42:44 k
19:42:51 currently all jenkins job console logs should be getting indexed there.
19:43:42 I may delete data for cleanup purposes
19:44:03 and some of the current data is ugly, but it is getting better as I add filters to logstash to properly parse things
19:44:18 clarkb: i'd say feel free to delete/reset at will as we work this out up until (if/when) we decide logstash is the primary repository instead of logs.o.o
19:44:18 over the course of 24 hours we have added over 21 million log lines
19:44:44 any feel for how much retention we can reasonably shoot for there?
19:44:46 the index for today (UTC time) is up to almost 12GB compressed
19:44:55 and this is just console logs
19:45:16 at the end of today UTC time I will run the optimize operation on that index to see if that results in a smaller index
19:45:43 clarkb: that is a lot.
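
The "optimize operation" mentioned above is elasticsearch's per-index optimize API. Assuming logstash's default daily logstash-YYYY.MM.DD index naming and an elasticsearch endpoint reachable from the logstash host (host, port and index name here are illustrative), forcing today's index down to a single segment looks roughly like:

    import requests

    # Ask elasticsearch to merge the day's logstash index down to one
    # segment, which is what "running the optimize operation" amounts to.
    resp = requests.post(
        'http://localhost:9200/logstash-2013.04.09/_optimize',
        params={'max_num_segments': 1})
    print(resp.status_code, resp.text)
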
19:46:15 yeah, we may need to aggressively filter and/or run a proper elasticsearch cluster if we want to use this for long term storage
19:46:17 clarkb: as in, almost certainly too much, especially since it's a fraction of what we're storing.
19:46:25 yeah, I think we're averaging about 2G/day compressed in static form
19:46:59 fwiw I think logstash is viable as a short term storage location for easy querying
19:47:12 then have an archive like logs.openstack.org for long term storage.
19:47:30 then we can run the log-pusher script over particular logs to reshove them into logstash if we need something from the past
19:48:01 that's about all I have
19:48:14 clarkb: if we have to compromise on what we use it for, we should actually set out some goals and requirements and make sure we achieve them.
19:48:27 clarkb: good summit conversation fodder
19:48:29 reminds me, someone might want to check that my find command to rotate logs is behaving as expected
19:48:33 jeblair: yup
19:48:37 pretty sure it should have deleted some by now
19:49:09 pleia2: maybe not... as i said before i didn't restore any logs from prior to september 26 when i rebuilt the server
19:49:19 anteaya: I think you are up
19:49:31 openstackwatch is alive: http://rss.cdn.openstack.org/cinder.xml
19:49:37 but serves no content
19:49:43 fungi: oh right, I had an off by one month in my head month-wise
19:49:55 this is what it should be serving: http://rss.chmouel.com/cinder.xml
19:50:11 so somewhere part of the script is not getting what it expected
19:50:32 so the question came up, do we stay with making swift work or do we go with serving xml files
19:50:34 anteaya: i thought the hypothesis was that review-dev was overwriting it?
19:50:49 that was a potential hypothesis yes
19:50:57 clarkb might be able to expand on that more
19:51:13 jeblair: i chimed in later when i got back from dinner and pointed out that the config on review-dev lacks swift credentials, so could not
19:51:20 ah
19:51:28 I too had missed that, thank you fungi
19:51:36 so in terms of a way forward
19:51:57 stay with swift, or go with xml was my understanding of the question
19:52:05 and also suggested that the stdout capability openstackwatch has as a fallback would be a useful way to troubleshoot it
19:52:39 yes, it seems the lack of content is a separate question from the output format. that just needs debugging.
19:52:45 so at this point, do we have a way to debug what is running?
19:53:06 at least to understand why no content is being served?
19:53:16 i think it would be useful for this group to decide what we actually want to do with this
19:53:18 anteaya: you can run it yourself but don't put swift credentials in the config and it should spew on stdout what it would otherwise upload to swift
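
The stdout fallback described just above can be pictured as something like the following (an illustrative sketch of the pattern, not the actual jeepyb code; publish, the container and the object name are made up for the example):

    def publish(feed_xml, swift_conn=None, container='rss', name='cinder.xml'):
        # With no swift connection configured, dump the generated feed to
        # stdout so it can be inspected locally while debugging; with
        # credentials, upload it to the swift container behind rss.cdn.o.o.
        if swift_conn is None:
            print(feed_xml)
        else:
            swift_conn.put_object(container, name, feed_xml)
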
19:53:29 what service are we trying to provide? and how should we host that service?
19:53:34 at least from what i could tell reading through the script
19:53:53 yes, well chmouel's feed bears witness to that
19:54:09 but I am at a loss as to why our configuration of the script serves no content
19:54:24 well, the short description is "rss feeds of new changes uploaded for review on individual projects"
19:54:29 jeblair: I think the service here is providing reviewers/interested parties an alternative to the gerrit project watches and email
19:54:30 maybe it would be best to generate a static xml file rather than uploading to swift?
19:55:02 clarkb: that sounds like a useful service; so in that case, i think we should have it automatically generate a feed for every project on review.o.o
19:55:04 chmouel any idea why our config would serve no content yet yours does?
19:55:31 humm i'm not sure
19:55:36 let me check the scrollback
19:55:53 chmouel: that's one suggestion which came up. basically modify it so that we serve those rss xml files directly from the apache instance on our gerrit server
19:56:00 chmouel: the script is the same: https://github.com/openstack-infra/jeepyb/blob/master/jeepyb/cmd/openstackwatch.py
19:56:01 if we want to make this a more seamless integration with gerrit, then i think we should host it at review.o.o. perhaps at a url like 'review.openstack.org/rss/org/project.xml'
19:56:22 and then link that in the gerrit theme?
19:56:32 jeblair: the proper solution would be for gerrit itself to provide rss feeds :)
19:56:35 we could have it either read project.yaml or 'gerrit ls-projects' to get the list
19:57:05 chmouel: true. :) people so rarely volunteer for java hacking projects around here.
19:57:32 heh fair, java+xml is not much fun
19:58:11 fungi: the theme linking will take some thought i think, especially a way to handle the per-project feeds
19:58:33 jeblair: should we continue to mull on it and discuss it again next week?
19:58:40 I'm not feeling a decision is nigh
19:58:42 right, i'm not immediately coming up with any great ideas as to how to make that visible in the gerrit interface
19:59:03 the projects list is not something people hit often, for example
19:59:06 next week is probably going to be beer^H^Hsummit time :)
19:59:18 plenty of time for discussion :)
19:59:19 * anteaya notes to use the correct wiki page next time
19:59:25 yeah, let's think about that. there's always just documenting it in the wiki; but it would be nice to get some kind of link going on.
19:59:33 thanks everyone!
19:59:41 see you next week, in person, i hope
19:59:44 see you all soon!
19:59:52 see you soon!
19:59:57 #endmeeting