19:03:53 <jeblair> #startmeeting infra
19:03:54 <openstack> Meeting started Tue Aug  4 19:03:53 2015 UTC and is due to finish in 60 minutes.  The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:55 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:57 <openstack> The meeting name has been set to 'infra'
19:04:05 <jeblair> #link agenda https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:04:05 <jeblair> #link previous meeting http://eavesdrop.openstack.org/meetings/infra/2015/infra.2015-07-28-19.01.html
19:04:14 <jeblair> #topic Announcements
19:04:20 <jeblair> #topic Announcements: Project management
19:04:30 <jeblair> we've made a lot of changes with how we work recently:
19:04:42 <jeblair> we write specs for any moderately complex task so we can articulate what we want to accomplish and help reviewers understand our changes
19:04:42 <jeblair> we have priority efforts so we know where we should look first to keep our most important efforts moving
19:04:52 <jeblair> and we've radically decentralized with the formation of the council composed of core teams
19:05:02 <jeblair> however, recently i feel like it's getting harder to see where we are in progressing our work
19:05:09 <nibalizer> hi
19:05:22 <jeblair> this is important because it means an effort could be blocked on something without our realizing how to move it forward
19:05:34 <jeblair> or you might imagine some dependency on an effort that involves a transition, and it's not clear whether that's happened yet
19:05:46 <jeblair> all of this can be addressed with better project management (which when done poorly is just nagging, but when done well helps facilitate communication and progress among a distributed team)
19:05:59 <jeblair> so i've asked Clint to help us out with that
19:06:08 <jeblair> and in a fit of generosity, he accepted :)
19:06:11 <fungi> thanks Clint!!!
19:06:18 <jeblair> it's a pretty open ended problem and he'll be looking at how to actually accomplish it
19:06:20 <mrmartin> o/
19:06:29 <jeblair> but i think there's a lot of room for improvement here, and i think it could be really helpful to all of us
19:06:38 <jeblair> i've asked him to start by focusing on the priority efforts, and also maniphest (as i believe it will also be helpful in this regard)
19:06:47 <jeblair> i expect that he may have a lot of questions for folks, so please help him out
19:06:54 <Clint> i will definitely be asking questions
19:07:04 <mrmartin> where is Monty?
19:07:10 <jeblair> Clint: cool :)
19:07:25 <jeblair> mrmartin: on a pacific island
19:07:26 <jhesketh> Clint: sounds great :-)
19:07:29 <Shrews> monty is vacationing
19:07:39 <aukhan> :)
19:07:46 <jeblair> we should totally have a "where is monty" agenda item; though this isn't it :)
19:08:08 <Shrews> jeblair: i have considered a cross-platform app
19:09:16 <jeblair> any other questions or helpful suggestions for Clint?
19:10:09 <jeblair> Clint: thanks!
19:10:13 * Clint nods.
19:10:13 <jeblair> #topic Specs approval
19:10:17 <jeblair> #topic Specs approval: Zuul v3 (jeblair)
19:10:36 <jeblair> #link zuulv3 spec http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html
19:10:42 <jeblair> #info zuulv3 spec was approved
19:10:48 <pabelanger> Niiice
19:10:51 <pleia2> huzzah
19:11:06 <jeblair> some quick thoughts on that:
19:11:23 <jeblair> i'm thinking of revising the spec to suggest that we do development in branches on both zuul and nodepool
19:11:49 <jhesketh> +1
19:11:57 <jeblair> and i'm going to collect the names of folks from the summit etherpad and those who otherwise volunteered to help and see if we can have a huddle and discuss how to proceed
19:12:01 <jeblair> maybe divvy up some items
19:12:19 <jeblair> so keep an eye out for that
19:12:29 <greghaynes> There might be a lot of churn in nodepool with both v3 and builders at the same time
19:12:34 <greghaynes> maybe too much rebasing
19:12:49 <jeblair> greghaynes: with branches we can use merge commits!
19:12:49 <greghaynes> and branching will make that a bit worse
19:13:08 <greghaynes> yes, there will be a pretty big delta; it's clearer to resolve but still a lot of drift
19:13:24 <jeblair> yeah, if we can get builders in soon, that will help
19:13:51 <jeblair> but if we don't, i'm sure we'll manage
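For context on the merge-commit point above: the usual way to keep a long-lived feature branch current without constant rebasing is to periodically merge master into it, so conflicts are resolved once per merge rather than on every rebase. A minimal sketch, with feature/zuulv3 as a hypothetical branch name:

```bash
# Absorb master's churn (e.g. the nodepool builders work) into the feature
# branch as a single merge commit instead of rebasing the whole series.
git checkout feature/zuulv3
git fetch origin
git merge origin/master    # conflicts get resolved once, here
```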
19:14:15 <jeblair> #topic Schedule Project Renames
19:14:32 <jeblair> i'm going to abuse this topic since i didn't put in what we should really start talking about...
19:15:00 <jeblair> which is that the "put all new projects in openstack/" resolution was approved
19:15:02 <jeblair> #link http://governance.openstack.org/resolutions/20150615-stackforge-retirement.html
19:15:23 <jeblair> so we should work on a plan for dealing with that --
19:15:47 <greghaynes> does this mean we are also moving the existing stackforge ones over?
19:15:47 <jeblair> do we want to perform an automatic mass-migration simultaneously ourselves?
19:15:59 <greghaynes> oh wow, it does
19:16:26 <fungi> i think if we don't move everyone at once, it will be painful
19:16:52 <jeblair> or let projects do it on their own time-frame, but maybe we batch them so we do moves every 2 months or so...?
19:16:52 <clarkb> we should try to get a list of those that want to be attic'd as part of the process too (thinking at least pecan)
19:16:59 <fungi> i mean, _more_ painful
19:17:09 <jeblair> oh, also, i would like to end the practice of moving things to the attic
19:17:29 <clarkb> jeblair: ok, what do we do about pecan then? just have them update that repo with a "use github" message instead?
19:17:30 <jeblair> i previously had a weak preference not to attic things
19:17:43 <fungi> yeah, attic means more pointless renaming
19:17:52 <clarkb> (I am happy either way, but we do need to accomodate their concerns)
19:18:08 <jeblair> but now that i have seen projects moved to the attic with absolutely no indication they are actually dead (complete with transparent github rewrites so their users won't even notice), i have a _strong_ conviction it's a bad idea.
19:18:42 <jeblair> clarkb: yeah, so perhaps we let defunct (for whatever reason) projects just stay in stackforge
19:18:51 <fungi> if we need a script to push up a boilerplate readme and file deletion change along with a read-only acl update, that's still a much better use of our time to semi-automate
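A hedged sketch of the semi-automated change fungi describes; retired-repos.txt, the README wording, and the commit message here are placeholders rather than any agreed-on format (the read-only ACL update would still be a separate project-config change):

```bash
# For each repo slated for retirement, push up a change that deletes the tree
# and leaves a boilerplate README.  Naive about filenames containing spaces.
while read repo; do
    git clone "https://git.openstack.org/$repo" retire-tmp
    (
        cd retire-tmp
        # keep .gitreview so git-review can still find the Gerrit remote
        git rm -q $(git ls-files | grep -v '^\.gitreview$')
        printf 'This project is no longer maintained.\n' > README.rst
        git add README.rst
        git commit -m 'Retire repository'
        git review
    )
    rm -rf retire-tmp
done < retired-repos.txt
```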
19:18:52 <jeblair> how about this:
19:19:13 <jeblair> we ask projects to sign up to be moved to openstack.  if they don't sign up, they end up on a list of projects we will make read-only.
19:19:30 <jeblair> we publicise that list, and timetable, etc, so people can review it.
19:19:41 <jeblair> then we move them en-masse, ourselves.
19:19:54 <clarkb> that seems reasonable
19:20:03 <jeblair> (and, obviously, do not move the ones on the read-only list)
19:20:07 <greghaynes> ++
19:20:15 <pleia2> "publicise" may be tricky, but can we poll+send to cores on all the projects?
19:20:22 <pleia2> I think some stackforge folks don't read all of openstack-dev
19:20:28 <clarkb> pleia2: ya that should be easy with a gerrit api script
19:20:31 <pleia2> cool
19:20:32 <nibalizer> will 'read-only' projects be listed anywhere other than gerrit acls?
19:20:55 <fungi> should be trivial to build a contact list. oh, i already almost have that written i think
19:21:06 <nibalizer> a number of people (and me) write scripts like for repo in `gerrit ls-projects | grep infra | grep puppet`; do something; git review; done
19:21:22 <nibalizer> and inactive/readonly projects in that list screws that up
19:21:56 <fungi> #link http://git.openstack.org/cgit/openstack-infra/system-config/tree/tools/who-approves.py
19:21:57 <jhesketh> I wonder if ls-projects can be limited to non-readonly somehow?
19:22:15 <jeblair> how about this: i write a message to openstack-dev giving folks general context for governance resolution, since it still hasn't really been publicised, and let people know that we will start planning a future move. then after that, i'll start a thread for planning the actual move, and we can work out logistics and dates.
19:22:27 <pleia2> +1
19:22:54 <clarkb> jeblair: sounds good to me
19:23:15 <nibalizer> that sounds good
19:23:19 <jeblair> nibalizer: maybe we should look into whether we can do something similar with the rest api?
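A possible shape for this, sketched against the SSH command's JSON output (the REST API jeblair mentions is the other option); it assumes that output carries a per-project state field, which is worth checking against the Gerrit version in use, and that jq is available:

```bash
# List only projects Gerrit does not report as READ_ONLY, then loop as before.
active_repos=$(ssh -p 29418 review.openstack.org gerrit ls-projects --format json \
    | jq -r 'to_entries[] | select(.value.state != "READ_ONLY") | .key')

for repo in $active_repos; do
    echo "would operate on $repo"    # replace with the real per-repo work
done
```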
19:23:36 <jeblair> #action jeblair write message to openstack-dev with overall context for stackforge retirement/move
19:23:40 <fungi> sounds great. we also all need to remember to remind people who don't read closely that this is a purely cosmetic change. there's no implied governance alteration needed/accompanying it
19:24:04 <jeblair> #action jeblair start discussion thread about logistics of repo moves
19:24:15 <clarkb> it's a thing that only really makes sense in the github world
19:24:23 <clarkb> and we github so :/
19:24:47 <fungi> we're just "getting rid of the word 'stackforge' in our urls"
19:25:11 <fungi> (and replacing it with another word we already use a lot)
19:25:13 <jeblair> yep.  and maybe in 6 months or a year, we have a discussion like "hey, so should we remove openstack/ from git.o.o"? :)  (would be so much less typing)  but that's getting way ahead.
19:25:49 <jeblair> okay, i think we know generally how to proceed here, thanks
19:25:53 <jeblair> #topic Priority Efforts (Migration to Zanata)
19:25:53 * fungi would (eventually) love that
19:26:02 <pleia2> hey
19:26:08 <jeblair> pleia2: howdy!
19:26:21 <pleia2> so I've got UbuConLA, some hiking in the Andes and then the Ops Mid-cycle in the next two weeks (starting tomorrow)
19:26:43 <pleia2> I don't want my absence to cause problems for the i18n team and StevenK who are still working on the Zanata stuff
19:27:08 <pleia2> on line 200 of https://etherpad.openstack.org/p/zanata-install I listed my outstanding tasks, so I'd like to do a soft hand-off to someone while I'm gone so things keep chugging along
19:27:30 <clarkb> I can help out there
19:27:40 <clarkb> I am relatively familiar with how all that fits together so happy to
19:27:55 <clarkb> #link https://etherpad.openstack.org/p/zanata-install
19:28:20 <pleia2> and as always, if I can get folks to review open zanata reviews, it would get StevenK unstuck in a few places https://review.openstack.org/#/q/status:open+topic:zanata,n,z
19:28:23 <jeblair> #info clarkb is pleia2 for next 2 weeks
19:28:46 <pleia2> https://review.openstack.org/#/c/206305/ needs another +2
19:28:48 <clarkb> I did review the jeepyb change stevenK wrote yesterday
19:28:51 <clarkb> it looks good
19:28:55 <pleia2> great
19:29:03 <clarkb> need to find the puppet change to run it but that should be straightforward
19:29:18 <clarkb> will review 206305 today
19:29:22 <jeblair> clarkb: https://review.openstack.org/208811
19:29:28 <jeblair> topic:zanata ftw
19:29:40 <pleia2> he also said he'll be working on 203928 this week, so that WIP will move along soon
19:29:53 <clarkb> the jeepyb change is under that topic too, perfect
19:30:16 <jeblair> clarkb, pleia2: thanks!
19:30:29 <pleia2> I think that's it, I'm around for the rest of the day if clarkb needs any more details for anything
19:30:55 <pleia2> thanks clarkb :)
19:31:02 <jeblair> #topic #topic Puppet-httpd replacement of puppet-apache (pabelanger/nibalizer)
19:31:04 <jeblair> gah
19:31:05 <jeblair> #topic Puppet-httpd replacement of puppet-apache (pabelanger/nibalizer)
19:31:18 <nibalizer> this is quite close to being done
19:31:22 <pabelanger> aye
19:31:40 <nibalizer> https://review.openstack.org/#/c/198819/ one more
19:32:06 <nibalizer> to system-config this time, not a module
19:32:43 <pabelanger> https://review.openstack.org/#/c/201759/ is for grafana
19:32:49 <pabelanger> and system-config too
19:32:58 <nibalizer> it's also worth noting that the existence of a patch like this demonstrates that there is still active configuration inside system-config that hasn't been pushed out into the modules, but that's a different topic
19:33:55 <jeblair> nibalizer: yeah, some of that is probably low-hanging fruit
19:33:56 <nibalizer> there are a couple of follow-on activities, such as removing the if statements that paul had to hack in, as well as finally removing apache 0.0.4 from modules.env
19:34:16 <nibalizer> but I'm very pleased with how quickly and effectively this went
19:34:17 <jeblair> (static.o.o is not lhf, but many of the others are)
19:34:19 <nibalizer> thanks pabelanger
19:34:41 <pabelanger> https://review.openstack.org/#/c/205696/ is the removal of puppet-apache
19:34:49 <jeblair> pabelanger, nibalizer: indeed, thanks! :)
19:34:51 <pabelanger> nibalizer: np, was straightforward
19:35:44 <nibalizer> that's all I have on that topic
19:35:49 <jeblair> cool, thanks
19:35:56 <jeblair> #topic Restore from backup test (jeblair)
19:36:12 <jeblair> this is a thing we have talked about a bit... but haven't really done
19:36:24 <jeblair> i mean, backups are great, but you don't really have them unless you've used them
19:36:40 <jeblair> when i set some of this up, early on i did some limited restore tests
19:36:52 <jeblair> (like log into the server and restore to a dir in /tmp/)
19:37:08 <jeblair> but that was a long time ago, and not at all systematic, nor has it been done for every host
19:37:12 <clarkb> when I updated the docs I too restored specific files from backup to test the docs
19:37:21 <greghaynes> For some context - what all is this a backup of?
19:37:23 <clarkb> but has all those same issues
19:37:27 <jeblair> greghaynes: ah great question!
19:37:50 <jeblair> we perform fairly traditional filesystem backups of major hosts to an off-host location
19:37:59 <jeblair> specifically a server in a different cloud account on a different provider
19:38:07 <jeblair> er
19:38:11 <jeblair> strike different provider
19:38:25 <greghaynes> heh, different region?
19:38:27 <fungi> (eventually different provider)
19:38:31 <clarkb> yes different region
19:38:31 <jeblair> yes different region
19:38:40 <greghaynes> ok, gotcha
19:38:51 <greghaynes> I am really a fan of daily restore from backup jobs :)
19:38:54 <jeblair> we used to, but then the 'different provider' did one of the things that the diversity is supposed to protect us from
19:38:59 <jeblair> which is, they deleted the host
19:39:06 <jeblair> greghaynes: no points for guessing which
19:39:13 <greghaynes> haha
19:39:58 <jeblair> but yeah, ideally, we back up to multiple locations that are as diverse from where we actually run these as possible.  we are not currently living the dream.
19:40:33 <jeblair> at any rate, i put this on the agenda so we could start talking about it and brainstorming...
19:40:45 <fungi> though it makes a great point. prepare for the loss of your backup server ;)
19:40:59 <clarkb> we do have a third region available to us to add diversity
19:41:09 <jeblair> (we should probably work on replacing the one that was deleted, i think we can now)
19:41:31 <fungi> nameless provider seems to have gotten a little more reliable about not randomly deleting our servers
19:41:35 <jeblair> since i did most of the work designing the system as is, i think i'd like someone else to take the lead on evaluating our preparedness to restore
19:42:11 <jeblair> i was imagining we design and conduct a periodic drill, but i'm also open to greghaynes' suggestion of automated continuous testing...
19:42:28 <clarkb> not to derail on specifics, but one related item I think we need to handle better is the "we ran out of disk on the backup server" issue, since our backups are append-only
19:42:43 <jeblair> clarkb: agreed
19:43:22 <fungi> i like the automated continuous testing idea if it's feasible (i mean, we already do a lot of that) but would need some way to assure we pay attention to failures
19:43:23 <jeblair> #link http://docs.openstack.org/infra/system-config/sysadmin.html#backups
19:43:26 <greghaynes> Yea, I'm sure it's a matter of how much work restoring from backup is. Also, the automated-ish setup is only useful if you have a good way to notice when a failure happens
19:44:09 <fungi> periodic drills are something we can have as early as today if we want them, modulo root sysadmin availability
19:44:42 <greghaynes> Going through one is probably a prereq for knowing what all an automated solution would need to do, also
19:44:47 <greghaynes> at least for me
19:45:17 <jeblair> greghaynes: good point
19:45:34 <jeblair> so probably we should start with that, and then see whether we think the process is amenable to automation
19:46:13 <greghaynes> ++
19:46:40 <jeblair> at any rate, i'm looking for not-me volunteers for this... think about whether you can help out, and we'll check back in next week
19:46:52 <fungi> yep, identifying _what_ we want to check is probably step #1
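As one possible starting point for such a drill, a hedged sketch against the bup-based backups described in the sysadmin doc linked above; the repository path, branch name, and file picked here are placeholders for whatever the drill actually decides to verify:

```bash
# On the backup server: restore one file from the most recent save of a host
# and eyeball it, as a minimal "do the backups actually work" check.
export BUP_DIR=/opt/backups/bup-review.openstack.org   # placeholder repo location
bup ls review.openstack.org/latest/                    # browse the latest save
bup restore -C /tmp/restore-drill \
    review.openstack.org/latest/etc/hosts              # pull back one known file
cat /tmp/restore-drill/hosts                           # sanity-check the contents
```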
19:47:18 <jeblair> #topic  Puppet apply coverage for fedora21 / centos7 (pabelanger)
19:47:25 <pabelanger> hey
19:47:43 <jeblair> pabelanger: o/
19:47:49 <pabelanger> this one is quick, I have 2 reviews up to add support for Fedora21 and Centos7 for our puppet-apply tests.
19:47:53 <pabelanger> https://review.openstack.org/#/c/203756/
19:47:57 <pabelanger> https://review.openstack.org/#/c/203759/
19:48:16 <pabelanger> they are non-voting jobs right now, but want to kick them to voting for next week
19:48:25 <pabelanger> so, just need some help reviewing and approving
19:48:33 <jeblair> oh hey, they make the jobs go green, respectively
19:49:02 <jeblair> this seems like a swell idea
19:49:10 * clarkb will add to review list today
19:49:24 <ianw> while people are thinking about puppet-apply, https://review.openstack.org/205873 & https://review.openstack.org/206319 are about running it with project-config
19:49:54 <jeblair> ianw: oh thanks for bringing that up...
19:50:30 <pabelanger> thats all I had
19:50:45 <jeblair> i've been thinking about that, and i think there's the idea that it should have an impact, though it doesn't in our current jobs
19:51:25 <nibalizer> I think it matters if project_config ever grows a puppet template
19:51:27 <nibalizer> but right now it does not
19:51:30 <jeblair> that is to say, it really is related, but our apply jobs may not get far enough in what they do to actually exercise it
19:51:33 <clarkb> ianw: we pull the config we are deploying out of project_config so it needs to co-gate
19:51:50 <clarkb> ianw: for noop it's less important but anything else would need it
19:52:46 <nibalizer> and on the subject of puppet: https://review.openstack.org/#/c/206779/ adds a hiera directory to system-config and starts using it
19:52:49 <clarkb> (just noting that as we do the other beaker based testing)
19:53:17 <pabelanger> nibalizer: nice, will be checking that out
19:53:33 <clarkb> pabelanger: https://review.openstack.org/#/c/203756/ doesn't pass the f21 test
19:54:15 <ianw> ok, well i'll have to think about it.  i think we could skip those jobs for the bulk of project-config changes that are adding/modifying zuul jobs
19:54:15 <pabelanger> clarkb: right, I don't know why right now.  I need to check out dib builds to see what happens. Since it should be the same code for production installs
19:54:29 <pabelanger> clarkb: I won't be gating that until I figure it out
19:55:14 <jeblair> ianw: yeah.  i think it's worth some thought.  there's definitely a relationship there.  we're probably not adequately testing it.  if we can, we should.  but if it's not possible, then yeah, maybe we should drop the jobs.
19:55:35 <jeblair> #topic Open discussion
19:55:39 <EmilienM> 1/ could we have reviews on "Add support for backport-potential commit flag" > https://review.openstack.org/#/c/175849/ and 2/ Also "jenkins: implement copy-puppet-logs publisher" (miss +A) > https://review.openstack.org/206693 - thank you
19:56:37 <zaro> Gerrit upgrade update, new info in #link https://code.google.com/p/gerrit/issues/detail?id=3424
19:56:43 <pabelanger> So, I know we only have 5mins, but was going to do this on the mailing list. Could somebody explain when it is appropriate to make a job (node) reusable? https://github.com/openstack-infra/project-config/blob/master/zuul/openstack_functions.py#L34
19:56:46 <clarkb> EmilienM: my initial impression of 175849 is that the submitter who wrote the commit should just push to the old branch so that conflicts can be handled
19:57:03 <zaro> was wondering if you guys can take a look and help me determine if we want to increase diff timeout?
19:57:25 <jaypipes> pabelanger: hey, what's the latest status on bringing stackalytics into openstack.org?
19:57:28 <clarkb> ericsson has independently confirmed our jenkins slave plugin bug
19:58:05 <pabelanger> jaypipes: good, I rebased patches today. We're pretty close. I suspect next week I'll have a request up for reviews / approval
19:58:24 <jaypipes> pabelanger: coolio. let me know if you need any assistance from my end.
19:58:41 <EmilienM> clarkb: don't you think it's a good first iteration?
19:58:55 <clarkb> EmilienM: I worry it's putting too much effort into an intractable problem
19:59:18 <ianw> pabelanger: i think reusable if it's not running untrusted code?
19:59:27 <ianw> is any code trusted though...
19:59:29 <clarkb> it's trivial to push the backport if you know you need to push it, and it allows you to do the non-trivial work of resolving conflicts
20:00:05 <pabelanger> ianw: so, what if a job sandboxes code into the workspace, which gets deleted after a job runs?  Could that be reusable?
20:00:20 <pabelanger> ianw: EG: not using sudo
20:00:30 <EmilienM> clarkb: it's just we want to automate it
20:00:36 <EmilienM> clarkb: the "push" part
20:00:42 <jeblair> times up; maybe finish these in #-infra
20:00:43 <clarkb> EmilienM: yes and I am saying you cannot automate it
20:00:47 <jeblair> thanks everyone!
20:00:50 <jeblair> #endmeeting