19:02:12 #startmeeting infra
19:02:14 Meeting started Tue Jan 14 19:02:12 2014 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:15 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:17 The meeting name has been set to 'infra'
19:02:23 o/
19:02:27 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:07 i took the liberty of throwing a few more items on the agenda at the last minute, mainly stuff which has cropped up that i'm either looking into or failing to find bandwidth to address
19:03:30 we may not get to everything mentioned there, but we'll see how far the hour takes us
19:03:41 #topic Actions from last meeting
19:03:46 there were none
19:03:53 yay!
19:03:54 we win
19:03:57 it was also a very short meeting
19:04:11 #topic Trove testing (mordred, hub_cap, SlickNik)
19:04:28 I have no status
19:04:34 no exciting news?
19:04:37 and hub_cap and SlickNik are not here
19:04:47 well, I've done no real work in a month, so expect very little from me
19:04:47 SlickNik was in here last week and said he's working on it
19:04:51 awesome
19:05:05 so we'll just assume that's still the case and move along
19:05:15 #topic Tripleo testing (lifeless, pleia2, fungi)
19:05:26 we apparently have a new tripleo cloud?
19:05:30 the new cloud is up for this, fungi added the new info in a review
19:05:37 lifeless and i were discussing it last night
19:05:45 and again just now
19:05:51 :)
19:06:02 i need to test the credentials by creating a floating-ip for the controller i guess?
19:06:04 I have to schedule a meeting with dprince and derekh to chat about progress otherwise
19:06:22 i'll get the details later on what should happen from my end
19:06:30 thanks
19:06:41 that's it from me, holidays + LCA have put me a bit behind
19:06:42 the config change for nodepool is at...
19:06:46 #link https://review.openstack.org/66491
19:07:05 i've updated the creds in hiera to what i think they're supposed to be now
19:07:27 the old poc cloud going down exposed a nodepool bug for us too
19:07:27 * mordred is excited
19:07:44 #link https://launchpad.net/bugs/1269001
19:07:46 Launchpad bug 1269001 in nodepool "Nodepool stops building any new nodes when one provider is down" [High,Triaged]
19:07:54 ouch
19:08:04 it resulted in the backup you see in the gate currently
19:08:06 yeh, that's why we have a huge gate queue this morning
19:08:11 the test nodes graph on the status page is fun
19:08:36 we had 250 jobs in the check queue, at least that is trending down
19:09:01 i said in the bug i'd make a patch, and then started to dig into the nodepool source, and then was drawn and quartered by other things which cropped up, so if anyone else wants that bug, it's probably not too hard
19:09:25 otherwise i hope to get to it before it bites us again
19:10:03 anything else on tripleo testing before i move on?
19:10:25 #action fungi test new tripleo ci cloud account credentials
19:10:34 #topic Savanna testing (SergeyLukjanov)
19:10:46 nothing really interesting this week either
19:10:55 changes are under review in tempest
19:11:01 okay, want to keep it on the agenda for next week still?
19:11:12 basic integration has already been merged in
19:11:27 awesome
19:11:30 fungi, probably we could move it to the end of the agenda
19:11:36 okay, will do
19:12:09 #topic Zuul release (2.0?) / packaging (jeblair, pabelanger)
19:12:11 I don't think we'll have enough questions in the near future to need a separate section in the meeting
19:12:20 ^^ about savanna testing
19:12:40 #link http://git.openstack.org/cgit/openstack-infra/zuul/tag/?id=2.0.0
19:12:47 i guess that happened
19:13:01 weeks ago
19:13:22 has there been any fallout from it which bears discussing, or should it come off the agenda?
19:13:32 i'm thinking the latter
19:13:42 * mordred votes latter
19:14:15 after more than a month, any bugs should be addressed as, well, bugs
19:14:29 #topic Jenkins 1.543 upgrade (zaro, clarkb)
19:15:13 i believe the main news here is that jenkins.o.o and jenkins01 still need an upgrade to match 02-04, but sdague has spotted some missing logs which clarkb thinks may be a locking/sync problem in the scp plugin
19:15:42 yeh, we're losing console logs an alarming amount of the time
19:15:49 5 - 10% by what I'm seeing
19:16:02 which explains why elastic recheck has been missing a lot of things
19:16:06 can we use turbo-hipster yet?
19:16:37 current guess based on the logstash client logs is that we're racing and requesting the console log before it's available, so we get a 404
19:17:20 and that this is probably the upshot of the threading fix which was made to the scp plugin to work properly on newer jenkins
19:17:35 which coincides with when we think this behavior began
19:17:44 seems sensible to me
19:18:10 yeh, fixing that is somewhat of a blocker for some of the ER work, because if ES isn't a reliable source of truth, a lot of the numbers have no meaning
19:18:16 zaro: you were wanting to discuss it in detail with clarkb before digging into it further, you said
19:19:30 #action zaro discuss potential scp plugin race with clarkb
19:20:14 #action fungi upgrade jenkins.o.o and jenkins01 to match 02-04
19:20:28 if somebody else beats me to that, i won't complain
19:20:41 #topic Requested StackForge project rename (fungi, clarkb, zhiwei)
19:20:55 #link http://lists.openstack.org/pipermail/openstack-infra/2014-January/000594.html
19:21:18 stackforge/cookbook-openstack-metering wants to rename to stackforge/cookbook-openstack-telemetry
19:21:52 apparently using official terms instead of codenames for openstack projects doesn't keep you from having to rename things
19:21:57 :)
19:22:32 i'm willing to do this on saturday (the 18th) and clarkb said he expected to be around that day if i ran into major issues
19:23:15 i'll tentatively set this for 19:00 utc, but i'll nail down a time when he's around
19:23:36 #action fungi rename stackforge/cookbook-openstack-metering to -telemetry
19:23:55 #topic Ongoing new project creation issues (fungi, clarkb)
19:24:05 * mordred was just reading the latest on that
19:24:10 manage-projects is apparently still broken
19:24:26 didn't we make it so that it would error out if group creation didn't work
19:24:33 so that at least you could just re-run it over and over again
19:24:35 ?
19:24:39 i tried two more new project creations as guinea pigs yesterday and got the same behavior we'd been seeing previously
19:24:46 * mordred is sad
19:25:02 i even tried it on one project which was reusing an existing acl, thus no group creation required
19:25:35 something prevented it from getting as far as cloning the upstream repo, yet it created an empty project in gerrit and then we got broken mirrors everywhere for it
19:25:56 oh god
19:26:00 wth?
19:26:00 i think next we should run it manually without letting puppet try to run it first, since when i rerun it, everything seems fine
19:26:27 yeah. maybe we should just, for the time being, run it manually from time to time
19:26:41 since that's probably less work than fixing the broken runs
19:26:50 anyway, new project requests are piling up, most have -2 votes on them waiting on this to get working
19:26:52 and then once we've figured out what's wrong, we can re-enable the puppet trigger
19:27:05 o/
19:27:47 mordred: i've been reviewing most of the new project requests even in light of their -2 condition, trying to get them in shape anyway. if you want to look at them and try manual manage-projects runs on them, i won't object
19:28:05 fungi: k.
19:28:18 that would require me having +2 access and ssh access again
19:28:20 though i will admit, my review backlog the past few weeks has been abominable
19:28:26 mordred: yes
19:28:45 #action fungi get mordred's gerrit group membership reinstated
19:28:56 thanks for the reminder ;)
19:29:11 #action mordred look at the current state of manage-projects failures
19:29:35 #topic Pip 1.5 readiness efforts (fungi, mordred)
19:30:13 at this point most stuff is okay, but requirements integration is still broken
19:30:48 we have four known global requirements which pip 1.5 will not download without explicit --allow-external --allow-insecure whitelisting
19:31:14 i have a change proposed to do that...
19:31:24 #link https://review.openstack.org/66364
19:32:03 however, it's now hitting an issue with pip 1.5's refusal to follow -f urls in requirements by default
19:32:22 apparently we have projects consuming oslo.messaging even though it's never been released to pypi
19:32:37 fungi: how about we land a change to run-mirror with the allow-insecure flag turned on to allow -f
19:32:39 i did get around to reserving it on pypi yesterday at least, so nobody else can squat it
19:32:47 fungi: then we land the changes to things to remove their -f
19:32:56 then we land a change to remove the allow-insecure
19:33:23 mordred: i do want to try that next, however i also want to make sure it's not going to result in us pulling those things into our actual mirror (which pypi-mirror also builds/updates)
19:34:08 and the command-line flag to allow it only first appeared in pip 1.5, the same release which needs it, so if we want it to be able to run on <1.5 we need to pass it as an envvar instead
19:34:11 no more so than it would have before
19:34:24 I do not think we care about being able to run run-mirror on pip <1.5
19:34:39 * mordred strongly does not care
19:35:02 mordred: well, that puts us in a bit of a chicken-and-egg situation while transitioning, since we have pinned our slaves to older virtualenv
19:35:20 hrm. wait
19:35:33 why?
19:35:38 so we need it to work with virtualenv 1.10.1/pip 1.4.1 long enough to switch the mirror updater
19:35:45 run-mirror upgrades pip in the venv it creates as one of its first steps
19:36:00 so the pip that comes with the venv should not matter
19:36:14 ahh, so we should already be failing this way on the mirror slaves the same way we're failing in the requirements integration jobs?
19:36:25 yup
19:36:46 run-mirror itself creates and operates inside of venvs to protect against bonghits
19:36:47 good to know. in that case maybe i just try the cli option and see how far it gets us on that existing patch
19:37:20 anyway, weeds
19:37:43 anybody have anything else on new pip goings on before i move to the next topic?
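A side note on the run-mirror behavior described above, before the links that follow: the point being made is that run-mirror builds its own virtualenv and upgrades pip inside it before doing anything else, so the pip pinned on the slave never actually runs the mirror build. The sketch below is only a rough illustration of that pattern, not the actual run-mirror code from pypi-mirror; the paths and the requirements file name are made up.

    import os
    import subprocess

    def build_venv(path):
        # create the virtualenv with whatever (possibly pinned, older)
        # virtualenv happens to be installed on the slave
        subprocess.check_call(['virtualenv', path])
        pip = os.path.join(path, 'bin', 'pip')
        # first step inside the venv: upgrade pip itself, so the version
        # bundled with the slave's virtualenv no longer matters
        subprocess.check_call([pip, 'install', '--upgrade', 'pip'])
        return pip

    pip = build_venv('/tmp/mirror-venv')  # hypothetical location
    # every subsequent install runs with the upgraded pip
    subprocess.check_call([pip, 'install', '-r', 'requirements.txt'])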
19:38:13 i'll link the tracking bug and etherpad...
19:38:28 #link https://launchpad.net/bugs/1267364
19:38:30 Launchpad bug 1267364 in openstack-ci "Recurrent jenkins slave agent failures" [Critical,In progress]
19:38:39 o/
19:38:48 er, wrong bug
19:39:19 #link https://launchpad.net/bugs/1266513
19:39:21 Launchpad bug 1266513 in tripleo "Some Python requirements are not hosted on PyPI" [Critical,In progress]
19:39:45 #link https://etherpad.openstack.org/p/pip1.5Upgrade
19:40:08 #topic OpenID provider project (fungi, reed)
19:40:45 mmm. openid
19:40:45 smarcet has been working on the php end of things for this and has gotten some of the initial redis module written for puppet, which it's using
19:40:55 mordred, openid is yummy
19:41:00 my next phase of the deployment automation is awaiting review...
19:41:06 #link https://review.openstack.org/63316
19:41:37 i'm a bit swamped and it needs someone to take up the mantle of adding the project-specific deployment steps on top of that
19:42:07 I will try to identify roadblocks with smarcet and try to recruit a mentor for him that is not swamped
19:42:12 i have details from smarcet on what commands need to be run to deploy it
19:42:37 i've just been doing a horrible job of finding time to help with next steps
19:42:42 if meanwhile we can merge 63316 that'd be great
19:42:56 fungi, you've already gone above and beyond, thank you
19:43:21 looks like jeblair reviewed it on sunday, so i may just go ahead and merge that change so we can pick up some momentum
19:43:30 ++
19:43:54 #action reed to talk to smarcet and find a mentor to help him get through the CI learning curve faster
19:44:00 but definitely, anyone who finds this exciting is more than welcome to pitch in. i find it exciting, i'm just very busy already
19:44:39 anyway, trying to get through the meeting agenda, so moving on...
19:44:44 #topic Graphite cleanup (fungi)
19:45:02 the graphite server is spending a *lot* of time (an entire cpu pegged) in iowait
19:45:29 oh, well that's not great
19:45:34 the load also seems to be causing it to fail to generate and serve graphs
19:45:51 i think we probably need to look at a faster cinder volume for the whisper files (ssd-backed media)
19:46:24 it's also running out of disk space. the whisper files are fixed size, but we add more and more metrics (new job names, et cetera)
19:46:58 i discussed it with jeblair and he's on board with autodeleting any whisper files which haven't received an update in 2 weeks or maybe a month
19:47:45 it needs someone to look into it, but i'll throw myself on the action item for now as a placeholder and just assume i won't get a chance to look at it between now and the next meeting
19:48:24 #action fungi move graphite whisper files to faster volume
19:48:38 #action fungi prune obsolete whisper files automatically on graphite server
19:49:01 probably best done in the opposite order, so there are fewer files to rsync
19:49:19 #topic Maven clouddoc plugin move (zaro, mordred)
19:49:32 ugh. what did I do now?
19:49:43 so i don't think there's anything more to do on this.
19:49:48 i think your name is a historical artifact on there, mordred
19:49:57 looks like dcramer is doing the release manually
19:50:14 using the maven release plugin
19:50:23 i agreed to do the bits mentioned at the end of this review, but haven't found the time (maven nexus org setup stuff)
19:50:30 #link https://review.openstack.org/46099
19:51:30 that one is not needed unless this one is approved.. https://review.openstack.org/#/c/58349
19:51:56 ahh, good to know
19:52:22 #action fungi request org.openstack group in sonatype jira for maven nexus
19:52:42 i'll likely just defer that until 58349 gets traction in that case
19:52:48 thanks zaro!
19:52:54 right now dcramer is doing the releases manually, bypassing what the openstack CI wants to do
19:53:57 #topic Private gerrit for security reviews (zaro, fungi)
19:54:08 I'm guessing we can just ignore it until ann or dcramer needs something.
19:54:20 pretty sure this has ended up on the back burner until the gerrit upgrades are worked further through
19:54:21 fungi, I could probably help you with sonatype if you want, I have some groups there
19:54:39 private gerrit.. i think we should wait until after gerrit 2.8+
19:54:43 ++
19:54:44 SergeyLukjanov: when that task wakes back up, i'll try to remember to ping you for suggestions. thanks!
19:54:46 almost there.
19:54:58 zaro: I completely agree - it seems bananas to add a gerrit that we'll need to upgrade
19:55:17 so that takes us to...
19:55:22 #topic Upgrade gerrit (zaro)
19:55:41 upgrade upgrade!!!
19:55:44 well.. gerrit 2.8 is on review.o.o
19:55:55 and seems to be increasingly usable
19:55:55 just baking in i guess..
19:56:16 you mean review-dev.o.o ?
19:56:17 i believe all questions have been answered on the etherpad: https://etherpad.openstack.org/p/gerrit-2.8-upgrade
19:56:17 i think not enough of us have been around to test things we want to make sure didn't break on it
19:56:40 AaronGr: ^^ don't know if you've been tracking this one
19:56:59 the next thing i was gonna do was to create a script to semi-automate the upgrade
19:57:06 what sort of schedule is google looking at for gerrit 2.9, any idea?
19:57:09 mordred: i haven't been, no.
19:57:19 the new review screen looks quite overloaded
19:57:39 the idea is to semi-automate the 1st upgrade to 2.8 since it's a troublesome process. then automate 2.8 to next releases via puppet.
19:57:49 I think that's great
19:58:00 apparently 2.9 is taking away the old review screen view entirely, so the sooner we prepare to make the new one usable (upstream patches, whatever) the better on that
19:58:02 SergeyLukjanov: not turned on in review-dev.o.o
19:58:05 SergeyLukjanov: I think we're planning on having the old screen on by default to start with, yeah?
19:58:13 here is a topic about the 2.9 release: https://groups.google.com/d/topic/repo-discuss/rAmliEzSsko/discussion
19:58:45 #link https://groups.google.com/d/topic/repo-discuss/rAmliEzSsko/discussion
19:58:55 mordred, AFAIK it'll be removed in the next gerrit releases, and I saw a CR to enable the new screen for review.o.o while upgrading gerrit
19:59:33 zaro, I know, I've set up an instance for myself and was surprised :(
19:59:35 2.9-rc0 early this week according to mfick
19:59:37 fungi: i believe google is targeting march for the 2.9 release?
20:00:20 according to the first message in https://groups.google.com/forum/#!topic/repo-discuss/rAmliEzSsko, master will delete the old change screen code
20:00:21 okay, we're over time
20:00:31 need to get the tc meeting going
20:00:39 thanks everybody!
20:00:45 #endmeeting
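A note on the graphite cleanup action items from 19:46-19:48: the pruning discussed is mtime-based, since whisper files are fixed size but graphite touches them on every write, so a file that hasn't been modified recently is a metric that stopped receiving data. The sketch below is only an illustration of that idea, not an existing infra script; the storage path and the 30-day cutoff are assumptions standing in for the "2 weeks or maybe a month" window mentioned in the meeting.

    import os
    import time

    WHISPER_ROOT = '/var/lib/graphite/storage/whisper'  # hypothetical path
    MAX_AGE = 30 * 24 * 3600  # assumed cutoff: roughly one month

    def prune_stale_whisper(root, max_age):
        cutoff = time.time() - max_age
        for dirpath, dirnames, filenames in os.walk(root, topdown=False):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # whisper metric files end in .wsp; mtime tells us when
                # the metric last received an update
                if name.endswith('.wsp') and os.path.getmtime(path) < cutoff:
                    os.remove(path)
            # drop directories left empty after pruning, but keep the root
            if dirpath != root and not os.listdir(dirpath):
                os.rmdir(dirpath)

    prune_stale_whisper(WHISPER_ROOT, MAX_AGE)

Running something like this before the rsync to the faster volume matches the "probably best done in the opposite order" comment, since it reduces the number of files that need to be copied.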