19:03:53 #startmeeting infra
19:03:54 Meeting started Tue Aug 4 19:03:53 2015 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:55 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:57 The meeting name has been set to 'infra'
19:04:05 #link agenda https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:04:05 #link previous meeting http://eavesdrop.openstack.org/meetings/infra/2015/infra.2015-07-28-19.01.html
19:04:14 #topic Announcements
19:04:20 #topic Announcements: Project management
19:04:30 we've made a lot of changes with how we work recently:
19:04:42 we write specs for any moderately complex task so we can articulate what we want to accomplish and help reviewers understand our changes
19:04:42 we have priority efforts so we know where we should look first to keep our most important efforts moving
19:04:52 and we've radically decentralized with the formation of the council composed of core teams
19:05:02 however, recently i feel like it's getting harder to see where we are in progressing our work
19:05:09 hi
19:05:22 this is important because it means an effort could be blocked on something without our realizing how to move it forward
19:05:34 or you might imagine some dependency on an effort that involves a transition, and it's not clear whether that's happened yet
19:05:46 all of this can be addressed with better project management (which when done poorly is just nagging, but when done well helps facilitate communication and progress among a distributed team)
19:05:59 so i've asked Clint to help us out with that
19:06:08 and in a fit of generosity, he accepted :)
19:06:11 thanks Clint!!!
19:06:18 it's a pretty open ended problem and he'll be looking at how to actually accomplish it
19:06:20 o/
19:06:29 but i think there's a lot of room for improvement here, and i think it could be really helpful to all of us
19:06:38 i've asked him to start by focusing on the priority efforts, and also maniphest (as i believe it will also be helpful in this regard)
19:06:47 i expect that he may have a lot of questions for folks, so please help him out
19:06:54 i will definitely be asking questions
19:07:04 where is Monty?
19:07:10 Clint: cool :)
19:07:25 mrmartin: on a pacific island
19:07:26 Clint: sounds great :-)
19:07:29 monty is vacationing
19:07:39 :)
19:07:46 we should totally have a "where is monty" agenda item; though this isn't it :)
19:08:08 jeblair: i have considered a cross-platform app
19:09:16 any other questions or helpful suggestions for Clint?
19:10:09 Clint: thanks!
19:10:13 * Clint nods.
19:10:13 #topic Specs approval
19:10:17 #topic Specs approval: Zuul v3 (jeblair)
19:10:36 #link zuulv3 spec http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html
19:10:42 #info zuulv3 spec was approved
19:10:48 Niiice
19:10:51 huzzah
19:11:06 some quick thoughts on that:
19:11:23 i'm thinking of revising the spec to suggest that we do development in branches on both zuul and nodepool
19:11:49 +1
19:11:57 and i'm going to collect the names of folks from the summit etherpad and those who otherwise volunteered to help and see if we can have a huddle and discuss how to proceed
19:12:01 maybe divvy up some items
19:12:19 so keep an eye out for that
19:12:29 There might be a lot of churn in nodepool with both v3 and builders at the same time
19:12:34 maybe too much rebasing
19:12:49 greghaynes: with branches we can use merge commits!
19:12:49 and branching will make that a bit worse
19:13:08 yes, there will be a pretty big delta, it's more clear to resolve but still a lot of drift
19:13:24 yeah, if we can get builders in soon, that will help
19:13:51 but if we don't, i'm sure we'll manage
19:14:15 #topic Schedule Project Renames
19:14:32 i'm going to abuse this topic since i didn't put in what we should really start talking about...
19:15:00 which is that the "put all new projects in openstack/" resolution was approved
19:15:02 #link http://governance.openstack.org/resolutions/20150615-stackforge-retirement.html
19:15:23 so we should work on a plan for dealing with that --
19:15:47 does this mean we are also moving the existing stackforge ones over?
19:15:47 do we want to perform an automatic mass-migration simultaneously ourselves?
19:15:59 oh wow, it does
19:16:26 i think if we don't move everyone at once, it will be painful
19:16:52 or let projects do it on their own time-frame, but maybe we batch them so we do moves every 2 months or so...?
19:16:52 we should try to get a list of those that want to be attic'd as part of the process too (thinking at least pecan)
19:16:59 i mean, _more_ painful
19:17:09 oh, also, i would like to end the practice of moving things to the attic
19:17:29 jeblair: ok, what do we do about pecan then? just have them update that repo with a "use github" message instead?
19:17:30 i previously had a weak preference not to attic things
19:17:43 yeah, attic means more pointless renaming
19:17:52 (I am happy either way, but we do need to accommodate their concerns)
19:18:08 but now that i have seen projects moved to the attic with absolutely no indication they are actually dead (complete with transparent github rewrites so their users won't even notice), i have a _strong_ conviction it's a bad idea.
19:18:42 clarkb: yeah, so perhaps we let defunct (for whatever reason) projects just stay in stackforge
19:18:51 if we need a script to push up a boilerplate readme and file deletion change along with a read-only acl update, that's still a much better use of our time to semi-automate
19:18:52 how about this:
19:19:13 we ask projects to sign up to be moved to openstack. if they don't sign up, they end up on a list of projects we will make read-only.
19:19:30 we publicise that list, and timetable, etc, so people can review it.
19:19:41 then we move them en masse, ourselves.
19:19:54 that seems reasonable
19:20:03 (and, obviously, do not move the ones on the read-only list)
19:20:07 ++
19:20:15 "publicise" may be tricky, but can we poll+send to cores on all the projects?
19:20:22 I think some stackforge folks don't read all of openstack-dev
19:20:28 pleia2: ya that should be easy with a gerrit api script
19:20:31 cool
19:20:32 will 'read-only' projects be listed anywhere other than gerrit acls?
19:20:55 should be trivial to build a contact list. oh, i already almost have that written i think
19:21:06 a number of people (and me) write scripts like for repo in `gerrit ls-projects | grep infra | grep puppet`; do something; git review; done
19:21:22 and inactive/readonly projects in that list screws that up
19:21:56 #link http://git.openstack.org/cgit/openstack-infra/system-config/tree/tools/who-approves.py
19:21:57 I wonder if ls-projects can be limited to non-readonly somehow?
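
A minimal sketch of the kind of "gerrit api script" discussed above: listing projects over the REST /projects/ endpoint and skipping read-only ones. This is illustrative only, not infra's actual tooling; it assumes retired projects really end up in Gerrit's READ_ONLY project state rather than only being locked down via ACLs (the open question above), so treat the state handling as an assumption.

    # Illustrative sketch only: list Gerrit projects over the REST API and
    # skip the ones marked read-only.  Assumes anonymous access to /projects/
    # and that retired projects are put into the READ_ONLY project state
    # (not only restricted via ACLs).
    import json

    import requests

    GERRIT = 'https://review.openstack.org'


    def list_active_projects():
        resp = requests.get(GERRIT + '/projects/?d')
        resp.raise_for_status()
        body = resp.text
        # Gerrit prefixes JSON responses with ")]}'" to defeat XSSI; strip it.
        if body.startswith(")]}'"):
            body = body.split('\n', 1)[1]
        projects = json.loads(body)
        for name, info in sorted(projects.items()):
            # Entries without an explicit state are treated as ACTIVE here.
            if info.get('state', 'ACTIVE') == 'ACTIVE':
                yield name


    if __name__ == '__main__':
        for name in list_active_projects():
            print(name)
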
19:22:15 how about this: i write a message to openstack-dev giving folks general context for the governance resolution, since it still hasn't really been publicised, and let people know that we will start planning a future move. then after that, i'll start a thread for planning the actual move, and we can work out logistics and dates.
19:22:27 +1
19:22:54 jeblair: sounds good to me
19:23:15 that sounds good
19:23:19 nibalizer: maybe we should look into whether we can do something similar with the rest api?
19:23:36 #action jeblair write message to openstack-dev with overall context for stackforge retirement/move
19:23:40 sounds great. we also all need to remember to remind people who don't read closely that this is a purely cosmetic change. there's no implied governance alteration needed/accompanying it
19:24:04 #action jeblair start discussion thread about logistics of repo moves
19:24:15 it's a thing that only really makes sense in the github world
19:24:23 and we github so :/
19:24:47 we're just "getting rid of the word 'stackforge' in our urls"
19:25:11 (and replacing it with another word we already use a lot)
19:25:13 yep. and maybe in 6 months or a year, we have a discussion like "hey, so should we remove openstack/ from git.o.o"? :) (would be so much less typing) but that's getting way ahead.
19:25:49 okay, i think we know generally how to proceed here, thanks
19:25:53 #topic Priority Efforts (Migration to Zanata)
19:25:53 * fungi would (eventually) love that
19:26:02 hey
19:26:08 pleia2: howdy!
19:26:21 so I've got UbuConLA, some hiking in the Andes and then the Ops Mid-cycle in the next two weeks (starting tomorrow)
19:26:43 I don't want my absence to cause problems for the i18n team and StevenK who are still working on the Zanata stuff
19:27:08 on line 200 of https://etherpad.openstack.org/p/zanata-install I listed my outstanding tasks, so I'd like to do a soft hand-off to someone while I'm gone so things keep chugging along
19:27:30 I can help out there
19:27:40 I am relatively familiar with how all that fits together so happy to
19:27:55 #link https://etherpad.openstack.org/p/zanata-install
19:28:20 and as always, if I can get folks to review open zanata reviews, it would get StevenK unstuck in a few places https://review.openstack.org/#/q/status:open+topic:zanata,n,z
19:28:23 #info clarkb is pleia2 for next 2 weeks
19:28:46 https://review.openstack.org/#/c/206305/ needs another +2
19:28:48 I did review the jeepyb change StevenK wrote yesterday
19:28:51 it looks good
19:28:55 great
19:29:03 need to find the puppet change to run it but that should be straightforward
19:29:18 will review 206305 today
19:29:22 clarkb: https://review.openstack.org/208811
19:29:28 topic:zanata ftw
19:29:40 he also said he'll be working on 203928 this week, so that WIP will move along soon
19:29:53 the jeepyb change is under that topic too, perfect
19:30:16 clarkb, pleia2: thanks!
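
The topic:zanata dashboard URL above maps directly onto Gerrit's /changes/ REST search; a small illustrative sketch (not existing infra tooling, anonymous access assumed) of pulling that same list programmatically:

    # Illustrative sketch: fetch open changes under a given topic via Gerrit's
    # /changes/ REST endpoint.  The query syntax is standard Gerrit search.
    import json

    import requests

    GERRIT = 'https://review.openstack.org'


    def open_changes(query):
        resp = requests.get(GERRIT + '/changes/', params={'q': query})
        resp.raise_for_status()
        body = resp.text
        # Strip Gerrit's ")]}'" anti-XSSI prefix before parsing.
        if body.startswith(")]}'"):
            body = body.split('\n', 1)[1]
        return json.loads(body)


    for change in open_changes('status:open topic:zanata'):
        print('%s/%d  %s' % (GERRIT, change['_number'], change['subject']))
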
19:30:29 I think that's it, I'm around for the rest of the day if clarkb needs any more details for anything
19:30:55 thanks clarkb :)
19:31:02 #topic #topic Puppet-httpd replacement of puppet-apache (pabelanger/nibalizer)
19:31:04 gah
19:31:05 #topic Puppet-httpd replacement of puppet-apache (pabelanger/nibalizer)
19:31:18 this is quite close to being done
19:31:22 aye
19:31:40 https://review.openstack.org/#/c/198819/ one more
19:32:06 to system-config this time, not a module
19:32:43 https://review.openstack.org/#/c/201759/ is for grafana
19:32:49 and system-config too
19:32:58 it's also worth noting that the existence of a patch like this demonstrates that there is still active configuration inside system-config that hasn't been pushed out into the modules, but that's a different topic
19:33:55 nibalizer: yeah, some of that is probably some low-hanging fruit
19:33:56 there are a couple of follow-on activities such as removing if statements that paul had to hack in, as well as finally removing apache 0.0.4 from modules.env
19:34:16 but I'm very pleased with how quickly and effectively this went
19:34:17 (static.o.o is not lhf, but many of the others are)
19:34:19 thanks pabelanger
19:34:41 https://review.openstack.org/#/c/205696/ is the removal of puppet-apache
19:34:49 pabelanger, nibalizer: indeed, thanks! :)
19:34:51 nibalizer: np, was straightforward
19:35:44 that's all I have on that topic
19:35:49 cool, thanks
19:35:56 #topic Restore from backup test (jeblair)
19:36:12 this is a thing we have talked about a bit... but haven't really done
19:36:24 i mean, backups are great, but you don't really have them unless you've used them
19:36:40 when i set some of this up, early on i did some limited restore tests
19:36:52 (like log into the server and restore to a dir in /tmp/)
19:37:08 but that was a long time ago, and not at all systematic, nor has it been done for every host
19:37:12 when I updated the docs I too restored specific files from backup to test the docs
19:37:21 For some context - what all is this a backup of?
19:37:23 but has all those same issues
19:37:27 greghaynes: ah great question!
19:37:50 we perform fairly traditional filesystem backups of major hosts to an off-host location
19:37:59 specifically a server in a different cloud account on a different provider
19:38:07 er
19:38:11 strike different provider
19:38:25 heh, different region?
19:38:27 (eventually different provider)
19:38:31 yes different region
19:38:31 yes different region
19:38:40 ok, gotcha
19:38:51 I am really a fan of daily restore from backup jobs :)
19:38:54 we used to, but then the 'different provider' did one of the things that the diversity is supposed to protect us from
19:38:59 which is, they deleted the host
19:39:06 greghaynes: no points for guessing which
19:39:13 haha
19:39:58 but yeah, ideally, we back up to multiple locations that are as diverse from where we actually run these as possible. we are not currently living the dream.
19:40:33 at any rate, i put this on the agenda so we could start talking about it and brainstorming...
19:40:45 though it makes a great point. prepare for the loss of your backup server ;)
19:40:59 we do have a third region available to us to add diversity
19:41:09 (we should probably work on replacing the one that was deleted, i think we can now)
19:41:31 nameless provider seems to have gotten a little more reliable about not randomly deleting our servers
19:41:35 since i did most of the work designing the system as is, i think i'd like someone else to take the lead on evaluating our preparedness to restore
19:42:11 i was imagining we design and conduct a periodic drill, but i'm also open to greghaynes suggestion of automated continuous testing...
19:42:28 not to derail on specifics, but one related item I think we need to handle better is the "we ran out of disk on the backup server" issue, since our backups are append only
19:42:43 clarkb: agreed
19:43:22 i like the automated continuous testing idea if it's feasible (i mean, we already do a lot of that) but would need some way to assure we pay attention to failures
19:43:23 #link http://docs.openstack.org/infra/system-config/sysadmin.html#backups
19:43:26 Yea, I'm sure it's a matter of how much work restoring from backup is. Also, the automated-ish setup is only useful if you have a good way to notice when a failure happens
19:44:09 periodic drills are something we can have as early as today if we want them, modulo root sysadmin availability
19:44:42 Going through one is probably a prereq for knowing what all an automated solution would need to do, also
19:44:47 at least for me
19:45:17 greghaynes: good point
19:45:34 so probably we should start with that, and then see whether we think the process is amenable to automation
19:46:13 ++
19:46:40 at any rate, i'm looking for not-me volunteers for this... think about whether you can help out, and we'll check back in next week
19:46:52 yep, identifying _what_ we want to check is probably step #1
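
A very rough sketch of what the periodic restore drill / continuous restore check discussed above might look like, assuming the backups are bup archives as the linked sysadmin documentation describes and that the check runs somewhere the bup repository is readable. The branch name and sample path below are placeholders, not real values.

    # Very rough sketch of a periodic "can we actually restore?" check.
    # Assumes bup-based backups; branch name and sample path are placeholders.
    import os
    import subprocess
    import sys
    import tempfile

    BRANCH = 'example-host'        # placeholder: assumes one bup branch per backed-up host
    SAMPLE_PATH = '/etc/hostname'  # placeholder: a small file every backup should contain


    def check_restore():
        dest = tempfile.mkdtemp(prefix='restore-test-')
        # Restore one known path from the most recent save into a scratch dir.
        subprocess.check_call(
            ['bup', 'restore', '-C', dest,
             '/%s/latest%s' % (BRANCH, SAMPLE_PATH)])
        if not os.listdir(dest):
            raise RuntimeError('bup restore produced no files in %s' % dest)
        print('restore check passed, output in %s' % dest)


    if __name__ == '__main__':
        try:
            check_restore()
        except Exception as exc:
            # A real job would need to alert somewhere people actually look.
            sys.stderr.write('restore check FAILED: %s\n' % exc)
            sys.exit(1)
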
19:47:18 #topic Puppet apply coverage for fedora21 / centos7 (pabelanger)
19:47:25 hey
19:47:43 pabelanger: o/
19:47:49 this one is quick, I have 2 reviews up to add support for Fedora21 and Centos7 for our puppet-apply tests.
19:47:53 https://review.openstack.org/#/c/203756/
19:47:57 https://review.openstack.org/#/c/203759/
19:48:16 they are non-voting jobs right now, but want to kick them to voting for next week
19:48:25 so, just need some help reviewing and approving
19:48:33 oh hey, they make the jobs go green, respectively
19:49:02 this seems like a swell idea
19:49:10 * clarkb will add to review list today
19:49:24 while people are thinking about puppet-apply, https://review.openstack.org/205873 & https://review.openstack.org/206319 are about running it with project-config
19:49:54 ianw: oh thanks for bringing that up...
19:50:30 that's all I had
19:50:45 i've been thinking about that, and i think there's the idea that it should have an impact, though it doesn't in our current jobs
19:51:25 I think it matters if project_config ever grows a puppet template
19:51:27 but right now it does not
19:51:30 that is to say, it really is related, but our apply jobs may not get far enough in what they do to actually exercise it
19:51:33 ianw: we pull the config we are deploying out of project_config so it needs to co-gate
19:51:50 ianw: for noop it's less important but anything else would need it
19:52:46 and on the subject of puppet: https://review.openstack.org/#/c/206779/ adds a hiera directory to system-config and starts using it
19:52:49 (just noting that as we do the other beaker based testing)
19:53:17 nibalizer: nice, will be checking that out
19:53:33 pabelanger: https://review.openstack.org/#/c/203756/ doesn't pass the f21 test
19:54:15 ok, well i'll have to think about it. i think we could skip those jobs for the bulk of project-config changes that are adding/modifying zuul jobs
19:54:15 clarkb: right, I don't know why right now. I need to check out dib builds to see what happens. Since it should be the same code for production installs
19:54:29 clarkb: I won't be gating that until I figure it out
19:55:14 ianw: yeah. i think it's worth some thought. there's definitely a relationship there. we're probably not adequately testing it. if we can, we should. but if it's not possible, then yeah, maybe we should drop the jobs.
19:55:35 #topic Open discussion
19:55:39 1/ could we have reviews on "Add support for backport-potential commit flag" > https://review.openstack.org/#/c/175849/ and 2/ Also "jenkins: implement copy-puppet-logs publisher" (miss +A) > https://review.openstack.org/206693 - thank you
19:56:37 Gerrit upgrade update, new info in #link https://code.google.com/p/gerrit/issues/detail?id=3424
19:56:43 So, I know we only have 5mins, but was going to do this on mailing list. Could somebody explain when it is appropriate to make a job (node) reusable? https://github.com/openstack-infra/project-config/blob/master/zuul/openstack_functions.py#L34
19:56:46 EmilienM: my initial impression of 175849 is the submitter who wrote the commit message should just push to the old branch so that conflicts can be handled
19:57:03 was wondering if you guys can take a look and help me determine if we want to increase diff timeout?
19:57:25 pabelanger: hey, what's the latest status on bringing stackalytics into openstack.org?
19:57:28 ericsson has independently confirmed our jenkins slave plugin bug
19:58:05 jaypipes: good, I rebased patches today. We're pretty close. I suspect next week I'll have a request up for reviews / approval
19:58:24 pabelanger: coolio. let me know if you need any assistance from my end.
19:58:41 clarkb: don't you think it's a good first iteration?
19:58:55 EmilienM: I worry it's putting too much effort into an intractable problem
19:59:18 pabelanger: i think reusable if it's not running untrusted code?
19:59:27 is any code trusted though...
19:59:29 it's trivial to push the backport if you know you need to push it and it allows you to do the non trivial work of resolving conflicts
20:00:05 ianw: so, what if a job sandboxes code into the workspace, which gets deleted after a job runs?
20:00:20 ianw: e.g. not using sudo
20:00:30 clarkb: it's just we want to automate it
20:00:36 clarkb: the "push" part
20:00:42 time's up; maybe finish these in #-infra
20:00:43 EmilienM: yes, and I am saying you cannot automate it
20:00:47 thanks everyone!
20:00:50 #endmeeting
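
For anyone following up on pabelanger's reusable-node question afterwards: the linked openstack_functions.py plugs into Zuul v2's parameter-function hook, whose functions receive (item, job, params). A rough sketch of the sort of functions involved is below; the OFFLINE_NODE_WHEN_COMPLETE parameter is an assumption about how single-use vs. reusable nodes were wired up, so check the linked file for the real implementation.

    # Rough sketch only: Zuul v2 parameter functions of the kind found in the
    # linked openstack_functions.py.  The (item, job, params) signature is the
    # standard Zuul v2 parameter-function hook; OFFLINE_NODE_WHEN_COMPLETE is
    # an assumption about the mechanism, not a confirmed detail.


    def single_use_node(item, job, params):
        # Ask the launcher to take the node offline once the job finishes, so a
        # build that ran untrusted code (or used sudo) never touches another job.
        params['OFFLINE_NODE_WHEN_COMPLETE'] = '1'


    def reusable_node(item, job, params):
        # Leave the node in the pool afterwards: only safe for jobs that stay in
        # the workspace and do not run untrusted code or escalate privileges.
        params.pop('OFFLINE_NODE_WHEN_COMPLETE', None)
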