19:02:59 <jeblair> #startmeeting infra
19:03:00 <openstack> Meeting started Tue Jan 20 19:02:59 2015 UTC and is due to finish in 60 minutes.  The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:02 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:04 <openstack> The meeting name has been set to 'infra'
19:03:13 <jeblair> #link agenda https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:24 <jeblair> ...no idea if that's accurate...
19:03:25 <jeblair> #link previous meeting http://eavesdrop.openstack.org/meetings/infra/2015/infra.2015-01-13-19.02.html
19:03:37 <asselin_> hi
19:03:42 <krtaylor> o/
19:03:49 <jeblair> #topic Actions from last meeting
19:03:55 <jeblair> SergeyLukjanov send announcement about puppet module split to infra and dev
19:03:58 <jeblair> that happened
19:04:17 <jeblair> lemme dig up a link
19:04:31 <clarkb> I have it
19:04:36 <clarkb> #link http://lists.openstack.org/pipermail/openstack-infra/2015-January/002323.html
19:05:02 <jeblair> yay.  so that's jan 28th
19:05:33 <jeblair> we should all be back from running around various places by then :)
19:05:38 <jeblair> or at least most of us
19:05:39 <fungi> yep
19:05:43 <clarkb> me three
19:05:44 <fungi> except for anteaya
19:05:59 <pleia2> yeah, I'll be around then too (but flying off to fosdem the day after)
19:06:18 <jeblair> jhesketh: around?
19:06:43 <jeblair> i think we can defer scheduling project renames for now
19:07:06 <fungi> yeah
19:07:14 <fungi> don't think we'd do any before next week
19:07:33 <jeblair> #topic Priority efforts:  Puppet module split
19:07:52 <clarkb> I have mostly put these changes on the back burner figuring we would crank through them during the sprint
19:07:52 <jeblair> #link https://etherpad.openstack.org/p/puppet-module-split-sprint
19:08:02 <clarkb> if that is not ok let me know and I can put those changes back on my review list
19:08:17 <jeblair> there's a sign up sheet on the etherpad ^
19:08:56 <mmedvede> I have assigned myself a bunch of modules on storyboard
19:09:16 <asselin_> clarkb, I added recheckwatch to run through before the sprint, but it can wait
19:10:08 <asselin_> I updated storyboard a bit, but not sure it's 100% synchronized with reality
19:10:34 <jeblair> asselin_: recheckwatch?
19:10:47 <asselin_> puppet-recheckwatch
19:10:48 <nibalizer> o/ (sorry for late)
19:11:22 <asselin_> jeblair, https://review.openstack.org/#/c/140430/
19:11:24 <fungi> it seemed like there were a lot of volunteers to write the modules and split the repos, so i figured i would spend the sprint focusing on reviewing the same
19:11:40 <fungi> as that is where i fear we may get bottlenecked
19:11:51 <fungi> that and possibly troubleshooting problems arising from merging them
19:13:40 <jeblair> asselin_: ah, gotcha, thanks
19:13:54 <jeblair> fungi: ++
19:15:39 <jeblair> #topic Priority efforts:  Nodepool DIB
19:16:15 <clarkb> so as of today this is working for devstack images in hpcloud best I can tell
19:16:26 <clarkb> though I should check overall health of image building across all the images
19:16:34 <clarkb> but we can build them and mix snapshots and dib images
19:16:51 <fungi> and check experimental runs stuff on them
19:16:56 <fungi> for devstack-gate
19:17:05 <clarkb> fungi: I switched all jobs too
19:17:17 <fungi> oh, right, now devstack jobs running in hpcloud get dib
19:17:21 <jeblair> clarkb: you switched all jobs? :)
19:17:27 <fungi> er, dib-built images
19:17:27 <clarkb> so the devstack-trusty and devstack-precise labels in hpcloud are using those images
19:17:35 <jeblair> oh wow!
19:17:47 <clarkb> ya it worked so I dove in and it hasn't broken yet
19:17:53 <fungi> and seem (yay!) to not have broken anything obvious
19:17:56 <clarkb> that was thursday last week?
19:18:02 <fungi> sounds right
19:18:18 <jeblair> that's excellent news!
19:18:33 <clarkb> so next steps are to make non-devstack images dib-able (make jobs do more?) and make nodepool speak rax glance and image format
19:18:35 <jeblair> so what's next?  wait for mordred to get us ready for rax-lance?
19:18:50 <clarkb> jeblair: ya that and working on moving non devstack jobs to dib'd images too
19:19:00 <fungi> rax lance sounds like a fiendish weapon
19:19:05 <mordred> ohai!
19:19:09 <jeblair> fungi: i'm told it's painful, yes
19:19:16 <mordred> sorry - was arguing keystone in -dev
19:19:37 <fungi> keystone wins. flawless victory
19:19:39 <mordred> I've got a local patch to nodepool that's halfway to getting the needed glance/swift code in place
19:19:44 <jeblair> clarkb: cool, that will make it easier to expand to all 9 virtual routers in hpcloud too (once we get rid of nodepool-built images there)
19:19:48 <mordred> I'm hoping to finish on today's plane
19:20:21 <clarkb> oh there is a bug where we build two images right next to each other and upload those as our two current images instead of having a 24-hour-old image and a current image
19:20:23 <mordred> also - jroll tells me that the network info in config-drive is rolling out to rax now and is in place in IAD
19:20:29 <clarkb> so that needs debugging and fixing too
19:21:17 <jeblair> clarkb: after 24 hours, do we _also_ have a 24 hour old image plus 1 or 2 current images?
19:21:30 <jeblair> or do we delete both then have 2 new ones?
19:21:38 <clarkb> jeblair: delete both and have 2 new ones
19:21:52 <clarkb> jeblair: I think it may be because I manually did image builds at some point and it threw off image rotation
19:22:07 <clarkb> I think that if I maybe delete the older of the two it will go back to normal?
19:22:10 <jeblair> okay, that's pretty important then -- i don't want to be without the ability to roll back images quickly
19:22:18 <clarkb> jeblair: ya
19:22:41 <jeblair> that should be unit-testable in nodepool i think
19:22:49 <clarkb> ok will look into that
19:22:58 <clarkb> and ya should be since we don't actually need to make images to test rotation
19:23:09 <jeblair> (so hopefully should have a faster debug/test cycle than many nodepool changes)
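(A minimal sketch of the kind of unit test being discussed, exercising a rotation policy without building real images. The images_to_delete helper and the timestamps here are hypothetical illustrations, not nodepool's actual code.)

```python
import unittest


def images_to_delete(image_timestamps, keep=2):
    """Given build timestamps for one provider/image pair, return the
    timestamps to delete so that only the newest `keep` builds remain
    (the current image plus the previous one, kept for quick rollback)."""
    newest_first = sorted(image_timestamps, reverse=True)
    return newest_first[keep:]


class TestImageRotation(unittest.TestCase):
    def test_keeps_current_and_previous_build(self):
        # Three daily builds: only the oldest should be rotated out,
        # leaving yesterday's image available for rollback.
        builds = [100, 86500, 172900]
        self.assertEqual(images_to_delete(builds), [100])

    def test_back_to_back_builds(self):
        # Two builds made right next to each other (the bug described
        # above) leave no day-old image to roll back to, even though
        # the policy only deletes builds beyond the newest two.
        builds = [172900, 172950]
        self.assertEqual(images_to_delete(builds), [])


if __name__ == '__main__':
    unittest.main()
```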
19:23:40 <jeblair> anything else on dib nodepool?
19:24:03 <clarkb> not from me
19:24:26 <jeblair> #topic priority efforts:  Jobs on trusty
19:24:45 <fungi> there has been some movement in the discussion with the ubuntu package maintainers
19:25:01 <mordred> fungi: was it just a shuffling kind of dance?
19:25:02 <fungi> however, it sounds like they don't expect to have it fixed in the next few weeks at least
19:25:15 <fungi> so i think we should go ahead and drop py33 testing for now
19:25:49 <fungi> we can switch all but two projects which are currently gating on 3.3 to 3.4
19:25:58 <fungi> i can write up the change some time today hopefully
19:26:53 <jeblair> we should probably send something to -dev and give at least 1 week notice, yeah?
19:26:55 <fungi> oslo.rootwrap and oslo.messaging would be the two left naked until we can run their jobs on trusty
19:27:09 <fungi> yep, i'll put together an announcement for that
19:27:28 <mordred> I'm assuming dhellmann is fine with leaving rootwrap and messaging naked for a minute?
19:28:04 <fungi> i _did_ look into centos 7 as an option, but they don't have 3.4 available for side-by-side installation (another option i'm not fond of is compiling our own or using a third-party package)
19:28:58 <fungi> i hope so but need to confirm with him again. i did offer some help to the ubuntuites if i can be of assistance at all in getting some of the legwork done but haven't seen a reply yet
19:30:02 <jeblair> i'm not sure what else is current on the agenda, so....
19:30:06 <fungi> anyway, i don't have anything else for this topic at the moment
19:30:06 <jeblair> #topic open discussion
19:30:47 <clarkb> reports of slow gerrit
19:31:16 <krtaylor> I'll mention that we have the third-party documentation sprint coming up this week
19:31:28 <clarkb> we are hitting our 8GB memory limit on the jvm
19:31:38 <clarkb> that may be a symptom or the cause
19:32:09 <jeblair> krtaylor: what is the goal of the sprint?
19:32:10 <clarkb> zaro: ^ is 8GB reasonable? maybe this is a memory leak
19:32:40 <krtaylor> jeblair, major rewrite/refresh of the third-party docs, any and all help would be appreciated ->  https://etherpad.openstack.org/p/third-party-ci-documentation
19:32:42 <jeblair> clarkb: how much history for that metric do we have?
19:33:05 <asselin_> clarkb, I've got a tool that can pinpoint memory leaks
19:33:26 <clarkb> jeblair: via melody a year but the graphs are kinda rough
19:33:38 <clarkb> er sorry melody hasn't been there for a year so since we reinstalled gerrit?
19:33:47 <jeblair> krtaylor: gotcha.  unfortunately, most of infra-core is traveling and in meetings this week :(
19:34:08 <pleia2> my WIP zanata module change could use some feedback if someone wants to play around with it this week, there are some puppet order of operations issues (need wildfly modules installed before zanata, etc) and I'm not quite sure what the best way to structure it is: https://review.openstack.org/#/c/147947/
19:34:31 <asselin_> jeblair, then there'll be (hopefully) lots to review when on planes, or when you're back
19:34:36 <krtaylor> jeblair, yeah, but at least we can get a bunch of patches waiting for review when everyone frees up
19:34:44 <krtaylor> asselin_, yes
19:34:45 <clarkb> pleia2: that sounds like a good thing to look at during my layover
19:34:46 <jeblair> asselin_, krtaylor ++
19:34:57 <fungi> there was a point at which we speculated that gertty's polling frequency might be too high and could put increased load on gerrit... any way to confirm whether its increased popularity is/isn't contributing?
19:35:20 <pleia2> krtaylor: I'm not core, but I should be around during the sprint and can help out with reviews
19:35:27 <clarkb> I have since end of september looks like in melody
19:35:51 <clarkb> and there is a definite positive slope on the memory graph
19:35:51 <krtaylor> pleia2, great, thanks!
19:36:18 <nibalizer> pleia2: im happy to be a resource on zanata/puppet stuff
19:36:23 <jeblair> clarkb: that does sound leaky; is that the last restart?
19:36:56 <clarkb> jeblair: maybe? I need to read the ps manpage to confirm :) ps -elf says 2014
19:37:04 <jeblair> heh
19:37:06 <pleia2> nibalizer: thanks, I've appreciated your review thus far :)
19:37:15 <asselin_> clarkb, http://h30499.www3.hp.com/t5/Business-Service-Management-BAC/Plugging-the-leaks-Blog-3-A-novel-approach/ba-p/5410775#.VL6uV0cVgh8
19:37:25 <jeblair> clarkb: maybe restarting right now would give us info
19:37:29 <asselin_> clarkb, there's a free 'profiler' mode that should work
19:39:22 <jeblair> clarkb: should we try that now?
19:40:32 <clarkb> jeblair: restart was december 6th
19:40:37 <clarkb> let me look at graph again knowing that
19:41:16 <clarkb> ya looks like there is a fall in memory use right around then and about a week later it picks up in a large tick then again in a couple weeks
19:41:20 <fungi> ftr, `ps -o pid,lstart` is the magic for seeing a proper start time
19:41:27 <clarkb> jeblair: so at the very least it may be a good temporary fix to restart gerrit
19:42:15 <jeblair> yeah, i think it would be useful to know if a restart makes it better, and how long it takes for it to get worse (and whether we can correlate that with anything)
19:42:41 <jeblair> i think this meeting has about wound down... any last items before we call it done?
19:42:41 <clarkb> ok I can do that now, someone want to kick statusbot?
19:43:04 <jeblair> clarkb: will do
19:43:22 <fungi> oh, while we have a bunch of foundation people in a room together as a captive audience, anyone have anything they want us to make them generally aware of which you think they might not be?
19:43:41 <clarkb> jeblair: do I need to do anything to make zuul not freak out or just roll the dice?
19:44:00 <jeblair> fungi: thank them for the web site work? :)
19:44:04 <fungi> zuul _seems_ to reconnect fine
19:44:23 <clarkb> ya I think it's more if zuul tries to report while gerrit is restarting that it may have a sad
19:44:36 <fungi> jeblair: yes! it's even on the roadmap in the master slide deck to achieve "infra team parity" by end of 2015
19:44:54 <clarkb> zuul is processing a small queue backlog maybe I should wait
19:44:59 <fungi> so hoping to help make that goal a reality
19:45:12 <jeblair> woot!
19:45:30 <clarkb> except zuul is probably slow processing that due to slow gerrit
19:45:53 <clarkb> ok I am not waiting just going to restart
19:45:58 <jeblair> thanks everyone, see you at next week's meeting and sprint
19:46:02 <jeblair> #endmeeting