19:01:34 <clarkb> #startmeeting infra
19:01:35 <openstack> Meeting started Tue Oct 31 19:01:34 2017 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:39 <openstack> The meeting name has been set to 'infra'
19:02:19 <ianw> o/
19:02:27 <clarkb> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:02:35 <clarkb> #topic Announcements
19:02:45 <clarkb> #info Summit next week
19:03:08 <clarkb> I expect next week will be somewhat quiet as a result
19:03:33 <fungi> unless you're at the conference
19:03:42 <fungi> in which case it will be noisy
19:03:45 <clarkb> #info No meeting November 7 (next week)
19:03:50 <clarkb> we won't have a meeting as a result
19:04:06 <fungi> (except possibly in person)
19:04:07 <clarkb> I will be home the week after summit though so intend on being here to run the meeting then
19:04:18 <dmsimard> oh, hey, there'll be a bunch of infra-root coverage in APAC during the summit week :)
19:04:24 <clarkb> dmsimard: indeed
19:04:26 <fungi> dmsimard: you know it!
19:04:40 <jeblair> does it count if we're drunk?
19:04:42 <dmsimard> I'll stay in the cold Canada so you guys drink a beer or two in my honor
19:04:42 <clarkb> I will be on a plane tomorrow as will others. Excited to see those who can make it
19:05:03 <pabelanger> yah, I start travels on Thursday evening
19:05:04 <fungi> jeblair: that's a rather existential question
19:05:22 <clarkb> #topic Actions from last meeting
19:05:29 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-10-24-19.00.txt Minutes from last meeting
19:05:39 <clarkb> #action fungi Document secrets backup policy
19:05:47 <clarkb> fungi: ^ is there something we can review for this yet?
19:05:51 <fungi> thanks, and yeah it's coming when, er, i get to it
19:05:55 <clarkb> ok :)
19:06:13 <clarkb> ianw has started work on backups so I won't action him again on that as it is in progress
19:06:25 <clarkb> #topic Specs approval
19:06:32 <clarkb> #link https://review.openstack.org/#/c/516454/
19:06:59 <clarkb> this is an easy mark-as-implemented spec change. I'd like to open that up to voting now and will likely approve it as soon as I have a computer with internets again after flying across the pacific
19:07:21 <pabelanger> ++
19:07:42 <fungi> works for me, but then again it's my change
19:08:07 <clarkb> haven't really seen any other specs that are urgent likely due to the next topic
19:08:09 <clarkb> #topic Priority Efforts
19:08:14 <clarkb> #topic Zuul v3
19:08:31 <clarkb> we seem to be settling in quite a bit on v3 now \o/
19:08:49 <jeblair> i agree, not everything is on fire
19:08:54 <pabelanger> ha
19:08:55 <fungi> it's refreshing
19:08:57 <clarkb> jeblair: there wasn't anything else we need to talk about the PTI and job variants is there?
19:09:29 <dmsimard> there's still some fire but it's... less fire-y
19:09:52 <fungi> i am sort of interested to see how we deal with the pti saying no tox for docs builds and dealing with dependencies for sphinx extensions
19:09:58 <jeblair> clarkb: i don't think so -- i think our last decision on that is holding.  we had that change where we worked through the implications, but i don't think it changed our thinking really.
19:10:00 <fungi> but that's not really v3-related
19:10:19 <clarkb> jeblair: ok I'll remove it from the agenda then
19:10:21 <clarkb> pabelanger: you are up
19:10:44 <jeblair> fungi: yeah, best i can see is the pti will specify a docs/requirements.txt file that our job will install ... somewhere, and then run docs.
19:10:51 <pabelanger> so, wanted to highlight zuul-cloner removal, we have 2 reviews up
19:11:13 <pabelanger> however, there was some discussions this morning that will likely result in broken branch-tarball jobs
19:11:24 <pabelanger> because tox_install.sh in projects relies on zuul-cloner.
19:11:43 <fungi> usually that's a conditional thing and they skip it if not present
19:11:49 <pabelanger> eg: https://review.openstack.org/514483/ and https://review.openstack.org/513506/
19:12:03 <fungi> because not all their devs are going to want to locally install zuul to get zuul-cloner available
19:12:14 <jeblair> fungi: but if it's not there, it's going to clone from git.o.o which is worse for us
19:12:19 <fungi> ahh
19:12:25 <AJaeger> (sorry, going back a bit) jeblair: we discussed some policy last week - will you update the infra-manual as well, please?
19:12:28 <pabelanger> basically, once the zuul-cloner role is removed from the base playbook, we should be suggesting jobs use legacy-base, right? Or create a variant?
19:12:41 <fungi> jeblair: that would be an alternate implementation i hadn't seen yet
19:12:56 <dmsimard> another thing is that this isn't just for tox_install.sh, but other projects that have jobs run off of base which aren't necessarily legacy and end up running a script that uses zuul-cloner
19:13:02 <jeblair> fungi: oh?  i thought it cloned requirements from git.o.o so local devs have constraints?
19:13:10 <fungi> i wonder how many have that specific behavior (retrieving zuul-cloner over the network if not available)
19:13:17 <jeblair> dmsimard: then that's a legacy job
19:13:25 <jeblair> dmsimard: if so, it's parented wrong
19:13:25 <pabelanger> agree
19:13:36 <fungi> jeblair: ahh, that's what you meant by cloning over the network
19:13:53 <jeblair> fungi: no sorry, i meant that tox_install is going to clone *requirements*
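For context on the tox_install.sh discussion above: those in-repo scripts generally follow the shape sketched below, though the exact paths and fallback behaviour vary per project, so this is an illustrative reconstruction rather than any one repo's actual script.

    #!/bin/bash
    # rough shape of a project's tools/tox_install.sh from the v2 era (illustrative only)
    ZUUL_CLONER=/usr/zuul-env/bin/zuul-cloner
    if [ -x "$ZUUL_CLONER" ]; then
        # on CI nodes: use the zuul-cloner shim to fetch the matching requirements branch
        "$ZUUL_CLONER" --cache-dir /opt/git git://git.openstack.org openstack/requirements
    else
        # on developer machines, and on nodes without the shim: clone over the network,
        # which is the extra git.o.o load being discussed here
        git clone https://git.openstack.org/openstack/requirements
    fi
    pip install -c requirements/upper-constraints.txt "$@"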
19:13:54 <clarkb> so to clarify the issue is jobs that use zuul-cloner that are not parented to legacy-base
19:14:05 <clarkb> we will continue to install the z-c shim on legacy-base but want to remove it from base
19:14:23 <pabelanger> yes
19:14:30 <dmsimard> jeblair: for me, a legacy job is a job that was automatically migrated from v2.
19:14:32 <clarkb> can we just update the jobs to use the different base then?
19:14:39 <jeblair> clarkb: yeah, and the only ones i'm worried about are the ones that are using in-repo tox_install.sh files, which are a bunch of neutron and horizon jobs.
19:14:46 <jeblair> dmsimard: me too :)
19:14:55 <pabelanger> in the example this morning, our native zuulv3 job publish-openstack-python-branch-tarball still needed zuul-cloner for python-ironicclient, due to tox_install.sh
19:14:59 <jeblair> let me try to rephrase
19:15:50 <AJaeger> and publish-openstack-python-branch-tarball is based on publish-openstack-artifacts - which has no parent
19:15:58 <jeblair> we have a set of v3 native jobs that incidentally cause tox_install.sh to run.  these are the ones i'm concerned about.  they include unit test jobs, tarball jobs, and probably some others.
19:16:08 <AJaeger> agreed, jeblair
19:16:09 <jeblair> AJaeger: then its parent is 'base'
19:16:43 <fungi> so longer-term we want the affected projects to adjust their tox configs and we also want to stop using tox for some things (such as tarball generation and sphinx doc builds)
19:16:45 <AJaeger> jeblair: yes, and that is not legacy-base, so part of your set of jobs
19:16:50 <jeblair> so, in parallel, mordred is doing work to eliminate, or significantly change, tox_install.sh.
19:16:56 <fungi> and i guess the question is what to do in the near term?
19:17:12 <jeblair> i suggest that we defer removing the shim from base until mordred's tox_install.sh work is done
19:17:17 <clarkb> jeblair: gotcha, so is the idea to wait for tox_install.sh modifications to get in first then make the shim change?
19:17:21 <clarkb> sounds like it :)
19:17:33 <clarkb> that seems fine to me
19:17:46 <AJaeger> +1
19:17:47 <fungi> the down-side being that we may end up with an increasing proliferation of zuul-cloner dependence cargo-culted from legacy jobs?
19:17:52 <pabelanger> that could risk more jobs coming online with zuul-cloner, and breaking, if it stays in base? which, it sounds like we are okay with
19:17:54 <jeblair> that's the easiest thing, though it creates a risk people may accidentally use the zuul-cloner shim as they write native v3 jobs
19:18:02 <ianw> mordred's work being tox-siblings?
19:18:15 <jeblair> ianw: yeah, that's part of it
19:19:41 <pabelanger> what if we figured out the jobs that still use tox_install.sh, and moved them into a legacy job again? even if it needed to be created
19:19:43 <jeblair> though there was also talk about publishing, and dependencies, and... i can't really speak to all of it.  :)
19:20:42 <clarkb> pabelanger: one problem with that is if it is eg horizon python jobs then we are using different python job defs for different projects
19:20:49 <clarkb> (and we've tried really hard to not do that for years now)
19:21:06 <jeblair> if we did that, i assume we'd change all of openstack-tox-py27 and friends to parent to this job
19:21:38 <clarkb> I think the risk of more z-c sneaking in is low
19:21:45 <pabelanger> okay
19:21:54 <clarkb> and if we are working to eliminate the need for that so we can remove it from the base job that seems fine?
19:22:26 <pabelanger> yah, we can remove zuul-env from DIB, that should be okay
19:22:35 <jeblair> i still lean that way because it's the least amount of 'extra' work, assuming more zuul-cloner doesn't sneak in.
19:22:36 <pabelanger> just removing the role from the base job will break things
19:23:08 <jeblair> yeah, assuming the role is robust enough to create the zuul-cloner binary and any needed directories if none already exist
19:23:14 <fungi> plan sounds solid enough to me
19:23:21 <jeblair> i *think* it's just splatting it into /usr/local/bin, so probably fine
19:23:34 <clarkb> jeblair: my recollection of reviewing that was that it did set it up in a way that should just work (tm)
19:23:44 <pabelanger> jeblair: it creates zuul-env path and overwrites today
19:23:58 <jeblair> oh right, it's zuul-env.  so as long as it makes the dirs.
19:24:09 <pabelanger> yah, it does right now
19:24:22 <pabelanger> thanks Shrews!
19:24:28 <clarkb> we could update the shim to emit a string that says "this is deprecated"
19:24:36 <clarkb> then assuming we get logstash working reliably again, query that string
19:24:41 <clarkb> to find where it is being used
19:24:46 <pabelanger> yes, I would like that
19:24:55 <AJaeger> clarkb: codesearch might be easier ;)
19:25:01 <clarkb> AJaeger: ya or codesearch
19:25:50 <dmsimard> clarkb: it already does
19:25:51 <pabelanger> the other one is jenkins element / user, how did we want to handle that.
19:25:51 <fungi> well, codesearch doesn't really tell us whether it's actually used, while logstash gets us actual uses but mostly only the frequent ones (so infrequently-run jobs might get missed)
19:25:54 <clarkb> dmsimard: perfect
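A minimal sketch of the deprecation-warning idea above; the exact wording and the shim's path are assumptions, the point is just a fixed, greppable string on stderr that logstash or codesearch queries can match:

    #!/bin/bash
    # hypothetical prelude added to the zuul-cloner shim installed under the zuul-env path
    echo "WARNING: the zuul-cloner shim is deprecated in Zuul v3 jobs; use required-projects instead" >&2
    # existing shim behaviour would continue below; the marker above can then be found with
    # a logstash query like message:"zuul-cloner shim is deprecated"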
19:26:16 <clarkb> pabelanger: one thing to keep in mind is we'll break third party CIs
19:26:17 <pabelanger> https://review.openstack.org/514485/ Remove jenkins from DIB image
19:26:26 <clarkb> we already get frequent complaints from them about the lack of java
19:26:39 <pabelanger> clarkb: right, that is true
19:26:44 <clarkb> maybe the thing to do is keep the element in tree, update it to install java, but remove it from our images
19:26:53 <pabelanger> however, we can point them to project-config branch?
19:26:56 <clarkb> and that way third party CIs get a thing that functions for them and shouldn't need much changes
19:26:58 <fungi> we could leave the dib element which installs zuul-cloner but not actually include it in our images, right?
19:27:13 <fungi> er, what clarkb said while i was typing ;)
19:28:59 <pabelanger> we'd have to refactor it a bit into another element but ya
19:29:34 <pabelanger> jenkins-slave for the most part is good, but nodepool-base and openstack-repos do some jenkins things
19:29:50 <pabelanger> if we did that, then we'd stop including jenkins-slave in nodepool.yaml
19:29:59 <clarkb> ya we'd drop it from our images
19:30:05 <clarkb> but carry the image to avoid all the questions we get about it
19:30:31 <clarkb> er carry the element
19:30:33 <pabelanger> okay, I can work on that
19:30:52 <jeblair> yeah, especially since we're not yet telling 3pcis to run zuulv3 yet, we should not be too quick to delete the elements.  but i'd love to drop from our images.
19:31:22 <clarkb> sounds like a plan then
19:31:29 <pabelanger> yah, so I'll move everything into jenkins-slave element, and see how that works
19:31:43 <clarkb> #agreed keep element for jenkins in project-config to aid third party ci, don't install it on our images
19:32:04 <clarkb> #agreed drop z-c shim from base job after mordred removes it from jobs that currently use it with a base parent
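As a rough illustration of the first #agreed item, the retained jenkins-slave element could keep installing a JRE and the jenkins user for third-party CIs while our own images simply stop listing it; the package names and element layout here are assumptions, not the current project-config contents:

    #!/bin/bash
    # hypothetical nodepool/elements/jenkins-slave/install.d/20-jenkins (sketch)
    set -e
    # install the java runtime third-party Jenkins masters keep asking about
    if command -v apt-get >/dev/null; then
        apt-get -y install default-jre-headless
    else
        yum -y install java-1.8.0-openjdk-headless
    fi
    # and create the jenkins user they connect as
    useradd -m -s /bin/bash jenkins || true

Our own images would then just drop jenkins-slave from their element lists in nodepool.yaml, which is the "don't install it on our images" half of the agreement.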
19:32:16 <clarkb> anything else zuul related before we move on?
19:32:35 <ianw> can we do the zuul-env drop in a separate change?
19:32:54 <ianw> i think i mentioned that, it was in with the jenkins user removal
19:33:25 <pabelanger> 514483 is its own change, but I'll move it into the jenkins-slave element now
19:33:26 <clarkb> ya we should separate those
19:33:40 <ianw> oh, no i was thinking of the sudo grep stuff, sorry
19:33:48 <ianw> https://review.openstack.org/#/c/514485/1/nodepool/elements/zuul-worker/install.d/60-zuul-worker
19:33:51 <ianw> ok, ignore me
19:34:01 <clarkb> alright zuulv3 going once
19:34:31 <clarkb> #topic General topics
19:34:41 <clarkb> #topic New backup server in ord
19:34:50 <clarkb> ianw: ^ want to update us?
19:35:06 <ianw> ok, so i started looking at zuulv3 backups and decided this server really wanted updating
19:35:17 <ianw> we now have 3 backup servers in various states of completeness
19:35:49 <ianw> firstly,
19:36:03 <ianw> #link https://review.openstack.org/516148 is for a new xenial server, which should be uncontroversial?
19:36:16 <ianw> unless the name has problems
19:36:51 <ianw> that server is up, and has an initial rsync of the existing server's /opt/backups
19:37:23 <ianw> #link https://review.openstack.org/516157 i think zuulv3 could just start using it now, as it has no existing backups
19:37:56 <ianw> #link https://review.openstack.org/516159 moves everything else
19:38:23 <ianw> that would need me to babysit: in a quiet period, do a final rsync and ensure the new host key is accepted on the to-be-backed-up hosts
19:38:24 <clarkb> ok so mostly just needing reviews to switch over the backup target?
19:38:56 <ianw> yes, as long as we agree ... i just kind of went ahead with this on a monday so, yeah
19:39:13 <clarkb> ya I think updating to modern host is a good idea
19:39:24 <clarkb> I haven't reviewed the changes yet but am onboard with the idea
19:39:39 <ianw> ok, the only other thing for discussion is the size
19:39:55 <ianw> it's not critical but /dev/mapper/main-backups 3.0T 2.3T 528G 82% /opt/backups
19:40:12 <ianw> from my playing with bup, one issue is there isn't really a way to prune old data
19:40:15 <clarkb> ya bup is append only and you can't delete the old things
19:40:17 <jeblair> it's append-only, so we'll get a reprieve when we switch
19:40:34 <ianw> jeblair: well i cloned the old backups.  but we could start again?
19:40:51 <ianw> that seems to be the option, move the old directory out of the way and start over
19:40:55 <jeblair> oh yeah
19:41:06 <jeblair> maybe we should start over and keep the old host for a couple months?
19:41:07 <ianw> or, we attach another tb and worry about it some other time
19:41:24 <clarkb> jeblair: that seems reasonable
19:41:35 <clarkb> and a host rebuild seems like as good a time as any to actually reset the base of the backups
19:41:45 <ianw> ok, we can attach the cinder volumes to the new host and just keep them at /opt/old for a bit or something
19:42:22 <fungi> also i expect the old backup instance we replaced a while back can be safely deleted now?
19:42:26 <fungi> i can't even seem to log into it
19:42:47 <ianw> fungi: i think so, i can clean that up
19:42:48 <fungi> shows active in nova list but doesn't actually respond
19:43:12 <clarkb> #agreed Start fresh backups on new backup host, keep old backups around for a few months before deleting them
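Roughly, that fresh start could look like the following; the user names, host name and paths are illustrative rather than the actual puppet-managed values:

    # on the new backup server: park the rsync'd clone of the old history for a few months
    mv /opt/backups /opt/backups-old
    mkdir -p /opt/backups
    # on each backed-up host, the next run then begins a brand-new bup repository:
    bup init
    bup index /etc /var/backups
    bup save -r backup-user@backup01.ord.example.org: -n "$(hostname)" /etc /var/backups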
19:43:39 <pabelanger> wfm
19:43:42 <clarkb> ianw: so other than answering questions around ^ the other thing you need is reviews?
19:44:15 <ianw> yep, great.  that means i can just setup the new server, accept the keys on the to-be-backed-up hosts and they can start fresh, easy
19:44:51 <clarkb> #topic rax-ord instance clean up
19:44:51 <fungi> wfm
19:45:00 <clarkb> pabelanger: this one is yours too
19:45:12 <pabelanger> this one is me
19:45:19 <pabelanger> 2 x fg-test, 1 x pypi.slave.openstack.org can we delete these instances?
19:45:27 <pabelanger> or anybody know what they are?
19:45:30 <clarkb> thinking about pypi.slave.openstack.org more I think that may have been the instance that built our old old old mirror?
19:45:45 <pabelanger> yah, I think so too
19:45:47 <clarkb> or was it what we used to publish to pypi?
19:46:02 <clarkb> in either case I think we can remove it since both those use cases are now solved elsewhere
19:46:22 <clarkb> however it doesn't appear to be in dns
19:46:25 <pabelanger> sure, just wanted to make sure before I deleted them
19:46:34 <clarkb> so maybe we need to log in and check what is actually at that IP before we delete?
19:46:49 <pabelanger> I couldn't get into fg-test
19:46:49 <clarkb> also I don't know what fg-test is
19:46:57 <pabelanger> will try pypi.slave.o.o
19:47:20 <clarkb> jeblair: fungi any idea what fg-test is?
19:47:40 <fungi> not a clue
19:47:45 <jeblair> no
19:48:16 <fungi> have the ip address handy?
19:48:28 <clarkb> server list is slow
19:48:49 <clarkb> pabelanger: ^ do you have IPs?
19:48:51 <pabelanger> yes, took some time to clean up this afternoon
19:48:54 <pabelanger> yah, 1 sec
19:50:04 <pabelanger> no, not in buffer any more, waiting for openstack server list now
19:50:29 <clarkb> probably the thing to do here is do our best to log in via IP and just double check the servers aren't misnamed or otherwise important
19:50:31 <clarkb> then delete them
19:50:40 <clarkb> we can follow up on that after the meeting?
19:50:40 <fungi> server list isn't usually that slow
19:50:47 <fungi> but yeah, let's move on
19:51:02 <clarkb> #topic Open Discussion
19:51:09 <pabelanger> 50.56.112.15 pypi.slave.o.o
19:51:30 <pabelanger> 50.56.121.53 and 50.56.121.54 fg-test
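Before deleting anything, a quick sanity check against those addresses could look roughly like this; it is plain OpenStack CLI and ssh usage, nothing infra-specific:

    # confirm what actually answers at each address before removing the instances
    ssh -o ConnectTimeout=10 root@50.56.112.15 hostname   # pypi.slave.o.o
    ssh -o ConnectTimeout=10 root@50.56.121.53 hostname   # fg-test
    ssh -o ConnectTimeout=10 root@50.56.121.54 hostname   # fg-test
    # then, once satisfied they are unused:
    openstack server show <uuid-or-name>    # double-check metadata and creation date
    openstack server delete <uuid-or-name>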
19:51:37 <clarkb> I would appreciate more reviews on https://review.openstack.org/#/c/516502/ as I think something like that will have a big impact on logstash indexing
19:51:40 <clarkb> jeblair: thank you for the review
19:51:42 <fungi> pretty sure pypi.slave.o.o got replaced by release.slave.o.o
19:52:09 <fungi> but it was in ord?
19:52:09 <pabelanger> still working on launching more logstash-workers, likely try again this evening after halloween
19:52:22 <pabelanger> fungi: yes, along side nodepool nodes
19:52:29 <jeblair> clarkb: yeah i like the approach there (non-.gz is authoritative name)
19:52:31 <fungi> not dfw?
19:52:50 <pabelanger> fungi: correct
19:52:52 <fungi> i also don't see an fg-test node. what are you looking at the openstackci tenant or somewhere else?
19:53:12 <clarkb> also I'll start to become afk more and more over the next few days, back around on the weekend during board meetings and such :)
19:53:35 <pabelanger> fungi: I'm using nodepool.o.o clouds.yaml file (nodepool user)
19:53:44 <ianychoi> clarkb, will you have time during Summit? Zanata development team will be in Summit - I am not sure whether the team already contacted you or not but hopefully I18n and infra things regarding Zanata would be successfully discussed on the next week
19:53:49 <fungi> ahh, so openstackjenkins
19:53:49 <clarkb> fungi: its the openstackjenkins tenant
19:54:00 <pabelanger> yes
19:54:02 <pabelanger> sorry
19:54:21 <clarkb> ianychoi: I'll be around but the schedule is quite full already, probably the best bet is lunch?
19:54:37 <clarkb> ianychoi: ping me and we can sort something out
19:54:50 <pabelanger> ya, my key doesn't work for fg-test, but possible it is old key at this point
19:55:07 <ianychoi> clarkb, I think so. Yep - I will tell Zanata development team members, thanks!
19:55:12 <clarkb> oh also ianw will we have a decision on team evening things soon ish?
19:55:25 <ianw> oh, i think wednesday works?
19:55:47 <ianw> mon is some rdo thing i think, tuesday is melbourne cup thing, so that leaves ...
19:56:20 <clarkb> I think wednesday works
19:56:53 <rbowen> 778108
19:56:55 <rbowen> sorry.
19:57:26 <clarkb> ianw: probably the best thing is to respond to the thread and make some decisions, and anyone that has a major conflict can attempt to plan something different >_> this was my tactic at previous PTGs
19:57:39 <ianw> clarkb: ok, will do
19:57:49 <clarkb> and with that I'll get out of the way of the TC maybe having a meeting
19:57:51 <clarkb> #endmeeting