19:02:59 #startmeeting infra
19:03:00 Meeting started Tue Jan 20 19:02:59 2015 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:04 The meeting name has been set to 'infra'
19:03:13 #link agenda https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:24 ...no idea if that's accurate...
19:03:25 #link previous meeting http://eavesdrop.openstack.org/meetings/infra/2015/infra.2015-01-13-19.02.html
19:03:37 hi
19:03:42 o/
19:03:49 #topic Actions from last meeting
19:03:55 SergeyLukjanov send announcement about puppet module split to infra and dev
19:03:58 that happened
19:04:17 lemme dig up a link
19:04:31 I have it
19:04:36 #link http://lists.openstack.org/pipermail/openstack-infra/2015-January/002323.html
19:05:02 yay. so that's jan 28th
19:05:33 we should all be back from running around various places by then :)
19:05:38 or at least most of us
19:05:39 yep
19:05:43 me three
19:05:44 except for anteaya
19:05:59 yeah, I'll be around then too (but flying off to fosdem the day after)
19:06:18 jhesketh: around?
19:06:43 i think we can defer scheduling project renames for now
19:07:06 yeah
19:07:14 don't think we'd do any before next week
19:07:33 #topic Priority efforts: Puppet module split
19:07:52 I have mostly put these changes on the back burner figuring we would crank through them during the sprint
19:07:52 #link https://etherpad.openstack.org/p/puppet-module-split-sprint
19:08:02 if that is not ok let me know and I can put those changes back on my review list
19:08:17 there's a sign-up sheet on the etherpad ^
19:08:56 I have assigned myself a bunch of modules on storyboard
19:09:16 clarkb, I added recheckwatch to run through before the sprint, but it can wait
19:10:08 I updated storyboard a bit, but not sure it's 100% synchronized with reality
19:10:34 asselin_: recheckwatch?
19:10:47 puppet-recheckwatch
19:10:48 o/ (sorry for being late)
19:11:22 jeblair, https://review.openstack.org/#/c/140430/
19:11:24 it seemed like there were a lot of volunteers to write the modules and split the repos, so i figured i would spend the sprint focusing on reviewing the same
19:11:40 as that is where i fear we may get bottlenecked
19:11:51 that and possibly troubleshooting problems arising from merging them
19:13:40 asselin_: ah, gotcha, thanks
19:13:54 fungi: ++
19:15:39 #topic Priority efforts: Nodepool DIB
19:16:15 so as of today this is working for devstack images in hpcloud as best I can tell
19:16:26 though I should check overall health of image building across all the images
19:16:34 but we can build them and mix snapshots and dib images
19:16:51 and "check experimental" runs stuff on them
19:16:56 for devstack-gate
19:17:05 fungi: I switched all jobs too
19:17:17 oh, right, now devstack jobs running in hpcloud get dib
19:17:21 clarkb: you switched all jobs? :)
19:17:27 er, dib-built images
19:17:27 so the devstack-trusty and devstack-precise labels in hpcloud are using those images
19:17:35 oh wow!
19:17:47 ya it worked so I dove in and it hasn't broken yet
19:17:53 and seem (yay!) to not have broken anything obvious
19:17:56 that was thursday last week?
19:18:02 sounds right
19:18:18 that's excellent news!
19:18:33 so next steps are to make non-devstack images dib-able (make jobs do more?) and make nodepool speak rax glance and image format
19:18:35 so what's next? wait for mordred to get us ready for rax-lance?
19:18:50 jeblair: ya that and working on moving non-devstack jobs to dib'd images too
19:19:00 rax lance sounds like a fiendish weapon
19:19:05 ohai!
19:19:09 fungi: i'm told it's painful, yes
19:19:16 sorry - was arguing keystone in -dev
19:19:37 keystone wins. flawless victory
19:19:39 I've got a local patch to nodepool that's halfway to getting the needed glance/swift code in place
19:19:44 clarkb: cool, that will make it easier to expand to all 9 virtual routers in hpcloud too (once we get rid of nodepool-built images there)
19:19:48 I'm hoping to finish on today's plane
19:20:21 oh there is a bug where we build two images right next to each other and upload those as our two current images instead of having a 24 hour old image and a current image
19:20:23 also - jroll tells me that the network info in config-drive is rolling out to rax now and is in place in IAD
19:20:29 so that needs debugging and fixing too
19:21:17 clarkb: after 24 hours, do we _also_ have a 24 hour old image plus 1 or 2 current images?
19:21:30 or do we delete both then have 2 new ones?
19:21:38 jeblair: delete both and have 2 new ones
19:21:52 jeblair: I think it may be because I manually did image builds at some point and it threw off image rotation
19:22:07 I think that if I maybe delete the older of the two it will go back to normal?
19:22:10 okay, that's pretty important then -- i don't want to be without the ability to roll back images quickly
19:22:18 jeblair: ya
19:22:41 that should be unit-testable in nodepool i think
19:22:49 ok will look into that
19:22:58 and ya should be since we don't actually need to make images to test rotation
19:23:09 (so hopefully should have a faster debug/test cycle than many nodepool changes)
19:23:40 anything else on dib nodepool?
19:24:03 not from me
19:24:26 #topic priority efforts: Jobs on trusty
19:24:45 there has been some movement in the discussion with the ubuntu package maintainers
19:25:01 fungi: was it just a shuffling kind of dance?
19:25:02 however, it sounds like they don't expect to have it fixed in the next few weeks at least
19:25:15 so i think we should go ahead and drop py33 testing for now
19:25:49 we can switch all but two projects which are currently gating on 3.3 to 3.4
19:25:58 i can write up the change some time today hopefully
19:26:53 we should probably send something to -dev and give at least 1 week notice, yeah?
19:26:55 oslo.rootwrap and oslo.messaging would be the two left naked until we can run their jobs on trusty
19:27:09 yep, i'll put together an announcement for that
19:27:28 I'm assuming dhellmann is fine with leaving rootwrap and messaging naked for a minute?
19:28:04 i _did_ look into centos 7 as an option, but they don't have 3.4 available for side-by-side installation (another option i'm not fond of is compiling our own or using a third-party package)
19:28:58 i hope so but need to confirm with him again. i did offer some help to the ubuntuites if i can be of assistance at all in getting some of the legwork done but haven't seen a reply yet
19:30:02 i'm not sure what else is current on the agenda, so....
19:30:06 anyway, i don't have anything else for this topic at the moment
19:30:06 #topic open discussion
19:30:47 reports of slow gerrit
19:31:16 I'll mention that we have the third-party documentation sprint coming up this week
19:31:28 we are hitting our 8GB memory limit on the jvm
19:31:38 that may be a symptom or the cause
19:32:09 krtaylor: what is the goal of the sprint?
19:32:10 zaro: ^ is 8GB reasonable? maybe this is a memory leak
19:32:40 jeblair, major rewrite/refresh of the third-party docs, any and all help would be appreciated -> https://etherpad.openstack.org/p/third-party-ci-documentation
19:32:42 clarkb: how much history for that metric do we have?
19:33:05 clarkb, I've got a tool that can pinpoint memory leaks
19:33:26 jeblair: via melody a year but the graphs are kinda rough
19:33:38 er sorry, melody hasn't been there for a year, so since we reinstalled gerrit?
19:33:47 krtaylor: gotcha. unfortunately, most of infra-core is traveling and in meetings this week :(
19:34:08 my WIP zanata module change could use some feedback if someone wants to play around with it this week, there are some puppet order-of-operations issues (need wildfly modules installed before zanata, etc) and I'm not quite sure what the best way of structuring it is https://review.openstack.org/#/c/147947/
19:34:31 jeblair, then there'll be (hopefully) lots to review when on planes, or when you're back
19:34:36 jeblair, yeah, but at least we can get a bunch of patches waiting for review when everyone frees up
19:34:44 asselin_, yes
19:34:45 pleia2: that sounds like a good thing to look at during my layover
19:34:46 asselin_, krtaylor ++
19:34:57 there was a point at which we speculated that gertty's polling frequency might be too high and could put increased load on gerrit... any way to confirm whether its increased popularity is/isn't contributing?
19:35:20 krtaylor: I'm not core, but I should be around during the sprint and can help out with reviews
19:35:27 I have since end of september, looks like, in melody
19:35:51 and there is a definite positive slope on the memory graph
19:35:51 pleia2, great, thanks!
19:36:18 pleia2: i'm happy to be a resource on zanata/puppet stuff
19:36:23 clarkb: that does sound leaky; is that the last restart?
19:36:56 jeblair: maybe? I need to read the ps manpage to confirm :) ps -elf says 2014
19:37:04 heh
19:37:06 nibalizer: thanks, I've appreciated your review thus far
19:37:15 clarkb, http://h30499.www3.hp.com/t5/Business-Service-Management-BAC/Plugging-the-leaks-Blog-3-A-novel-approach/ba-p/5410775#.VL6uV0cVgh8
19:37:25 clarkb: maybe restarting right now would give us info
19:37:29 clarkb, there's a free 'profiler' mode that should work
19:39:22 clarkb: should we try that now?
19:40:32 jeblair: restart was december 6th
19:40:37 let me look at the graph again knowing that
19:41:16 ya, looks like there is a fall in memory use right around then, and about a week later it picks up in a large tick, then again in a couple weeks
19:41:20 ftr, `ps -o pid,lstart` is the magic for seeing a proper start time
19:41:27 jeblair: so at the very least it may be a good temporary fix to restart gerrit
19:42:15 yeah, i think it would be useful to know if a restart makes it better, and how long it takes for it to get worse (and whether we can correlate that with anything)
19:42:41 i think this meeting has about wound down... any last items before we call it done?
19:42:41 ok I can do that now; someone want to kick statusbot?
19:43:04 clarkb: will do
19:43:22 oh, while we have a bunch of foundation people in a room together as a captive audience, anyone have anything they want us to make them generally aware of which you think they might not be?
19:43:41 jeblair: do I need to do anything to make zuul not freak out or just roll the dice?
19:44:00 fungi: thank them for the web site work? :)
19:44:04 zuul _seems_ to reconnect fine
19:44:23 ya I think it's more that if zuul tries to report while gerrit is restarting it may have a sad
19:44:36 jeblair: yes! it's even on the roadmap in the master slide deck to achieve "infra team parity" by end of 2015
19:44:54 zuul is processing a small queue backlog, maybe I should wait
19:44:59 so hoping to help make that goal a reality
19:45:12 woot!
19:45:30 except zuul is probably slow processing that due to slow gerrit
19:45:53 ok I am not waiting, just going to restart
19:45:58 thanks everyone, see you at next week's meeting and sprint
19:46:02 #endmeeting
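
The image-rotation bug discussed above (two back-to-back builds uploaded as the "current" pair, with the 24-hour-old rollback image deleted) is, as noted at 19:22:58, pure bookkeeping that can be exercised without building any images. The following is a minimal sketch of that idea in Python, not nodepool's actual code or API: the Image record, the select_images_to_keep helper, and the "keep the newest image plus the newest image at least 24 hours older" policy are all assumptions invented for illustration.

# Hypothetical sketch; not nodepool's real rotation code or API.
# Assumed policy: keep the newest image, plus the newest image that is at
# least `min_age` older than it, so a rollback target always survives.
import unittest
from collections import namedtuple
from datetime import datetime, timedelta

Image = namedtuple('Image', ['name', 'built_at'])


def select_images_to_keep(images, min_age=timedelta(hours=24)):
    """Return the images to keep: the newest build plus a rollback image."""
    ordered = sorted(images, key=lambda i: i.built_at, reverse=True)
    if not ordered:
        return []
    keep = [ordered[0]]
    for image in ordered[1:]:
        # The rollback image must be meaningfully older than the newest one,
        # so two builds made right next to each other never crowd out
        # yesterday's image.
        if keep[0].built_at - image.built_at >= min_age:
            keep.append(image)
            break
    return keep


class RotationTest(unittest.TestCase):
    def test_back_to_back_builds_keep_rollback_image(self):
        now = datetime(2015, 1, 20, 19, 0)
        images = [
            Image('two-days-old', now - timedelta(hours=48)),
            Image('yesterday', now - timedelta(hours=24)),
            Image('manual-rebuild', now - timedelta(minutes=10)),
            Image('current', now),
        ]
        kept = {i.name for i in select_images_to_keep(images)}
        # Desired behavior: current plus yesterday survive. The failure mode
        # described in the meeting would instead keep current and
        # manual-rebuild, deleting the only image old enough to roll back to.
        self.assertEqual(kept, {'current', 'yesterday'})


if __name__ == '__main__':
    unittest.main()

Because the rotation decision only looks at timestamps, a test along these lines needs no image builds at all, which is the faster debug/test cycle mentioned at 19:23:09.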