19:01:34 #startmeeting infra
19:01:35 Meeting started Tue Oct 31 19:01:34 2017 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:37 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:39 The meeting name has been set to 'infra'
19:02:19 o/
19:02:27 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:02:35 #topic Announcements
19:02:45 #info Summit next week
19:03:08 I expect next week will be somewhat quiet as a result
19:03:33 unless you're at the conference
19:03:42 in which case it will be noisy
19:03:45 #info No meeting November 7 (next week)
19:03:50 we won't have a meeting as a result
19:04:06 (except possibly in person)
19:04:07 I will be home the week after summit though so intend on being here to run the meeting then
19:04:18 oh, hey, there'll be a bunch of infra-root coverage in APAC during the summit week :)
19:04:24 dmsimard: indeed
19:04:26 dmsimard: you know it!
19:04:40 does it count if we're drunk?
19:04:42 I'll stay in cold Canada, so you guys drink a beer or two in my honor
19:04:42 I will be on a plane tomorrow as will others. Excited to see those who can make it
19:05:03 yah, I start travels on Thursday evening
19:05:04 jeblair: that's a rather existential question
19:05:22 #topic Actions from last meeting
19:05:29 #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-10-24-19.00.txt Minutes from last meeting
19:05:39 #action fungi Document secrets backup policy
19:05:47 fungi: ^ is there something we can review for this yet?
19:05:51 thanks, and yeah it's coming when, er, i get to it
19:05:55 ok :)
19:06:13 ianw has started work on backups so I won't action him again on that as it is in progress
19:06:25 #topic Specs approval
19:06:32 #link https://review.openstack.org/#/c/516454/
19:06:59 this is an easy mark-implemented spec change. I'd like to open that up to voting now and will likely approve it as soon as I have a computer with internets again after flying across the pacific
19:07:21 ++
19:07:42 works for me, but then again it's my change
19:08:07 haven't really seen any other specs that are urgent, likely due to the next topic
19:08:09 #topic Priority Efforts
19:08:14 #topic Zuul v3
19:08:31 we seem to be settling in quite a bit on v3 now \o/
19:08:49 i agree, not everything is on fire
19:08:54 ha
19:08:55 it's refreshing
19:08:57 jeblair: there wasn't anything else we need to talk about regarding the PTI and job variants, was there?
19:09:29 there's still some fire but it's... less fire-y
19:09:52 i am sort of interested to see how we deal with the pti saying no tox for docs builds and dealing with dependencies for sphinx extensions
19:09:58 clarkb: i don't think so -- i think our last decision on that is holding. we had that change that we worked through the implications, but i don't think it changed our thinking really.
19:10:00 but that's not really v3-related
19:10:19 jeblair: ok I'll remove it from the agenda then
19:10:21 pabelanger: you are up
19:10:44 fungi: yeah, best i can see is the pti will specify a docs/requirements.txt file that our job will install ... somewhere, and then run docs.
19:10:51 so, wanted to highlight zuul-cloner removal, we have 2 reviews up
19:11:13 however, there were some discussions this morning that will likely result in broken branch-tarball jobs
19:11:24 because tox_install.sh in projects relies on zuul-cloner.
19:11:43 usually that's a conditional thing and they skip it if not present
19:11:49 eg: https://review.openstack.org/514483/ and https://review.openstack.org/513506/
19:12:03 because not all their devs are going to want to locally install zuul to get zuul-cloner available
19:12:14 fungi: but if it's not there, it's going to clone from git.o.o which is worse for us
19:12:19 ahh
19:12:25 (sorry, going back a bit) jeblair: we discussed some policy last week - will you update the infra-manual as well, please?
19:12:28 basically, once the zuul-cloner role is removed from the base playbook, we should be suggesting jobs use base-legacy, right? Or create a variant?
19:12:41 jeblair: that would be an alternate implementation i hadn't seen yet
19:12:56 another thing is that this isn't just for tox_install.sh, but other projects that have jobs run off of base which aren't necessarily legacy and end up running a script that uses zuul-cloner
19:13:02 fungi: oh? i thought it cloned requirements from git.o.o so local devs have constraints?
19:13:10 i wonder how many have that specific behavior (retrieving zuul-cloner over the network if not available)
19:13:17 dmsimard: then that's a legacy job
19:13:25 dmsimard: if so, it's parented wrong
19:13:25 agree
19:13:36 jeblair: ahh, that's what you meant by cloning over the network
19:13:53 fungi: no sorry, i meant that tox_install is going to clone *requirements*
19:13:54 so to clarify, the issue is jobs that use zuul-cloner that are not parented to legacy-base
19:14:05 we will continue to install the z-c shim on legacy-base but want to remove it from base
19:14:23 yes
19:14:30 jeblair: for me, a legacy job is a job that was automatically migrated from v2.
19:14:32 can we just update the jobs to use the different base then?
19:14:39 clarkb: yeah, and the only ones i'm worried about are the ones that are using in-repo tox_install.sh files, which are a bunch of neutron and horizon jobs.
19:14:46 dmsimard: me too :)
19:14:55 in the example this morning, our native zuulv3 job publish-openstack-python-branch-tarball still needed zuul-cloner for python-ironicclient, due to tox_install.sh
19:14:59 let me try to rephrase
19:15:50 and publish-openstack-python-branch-tarball is based on publish-openstack-artifacts - which has no parent
19:15:58 we have a set of v3 native jobs that incidentally cause tox_install.sh to run. these are the ones i'm concerned about. they include unit test jobs, tarball jobs, and probably some others.
19:16:08 agreed, jeblair
19:16:09 AJaeger: then its parent is 'base'
19:16:43 so longer-term we want the affected projects to adjust their tox configs and we also want to stop using tox for some things (such as tarball generation and sphinx doc builds)
19:16:45 jeblair: yes, and that is not legacy-base, so part of your set of jobs
19:16:50 so, in parallel, mordred is doing work to eliminate, or significantly change, tox_install.sh.
19:16:56 and i guess the question is what to do in the near term?
19:17:12 i suggest that we defer removing the shim from base until mordred's tox_install.sh work is done
19:17:17 jeblair: gotcha, so is the idea to wait for the tox_install.sh modifications to get in first, then make the shim change?
19:17:21 sounds like it :)
19:17:33 that seems fine to me
19:17:46 +1
19:17:47 the down-side being that we may end up with an increasing proliferation of zuul-cloner dependence cargo-culted from legacy jobs?
19:17:52 that could risk more jobs coming on line with zuul-cloner, and breaking, if in base? which, sounds like we are okay with
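[Editor's note: for readers unfamiliar with the tox_install.sh pattern discussed above, a rough sketch of the conditional zuul-cloner usage follows. It is illustrative only, not any particular project's script; the exact paths, options, and constraints handling vary between neutron, horizon, and others.]

    #!/bin/bash
    # Rough sketch of the in-repo tools/tox_install.sh pattern under discussion.
    # In CI it uses the zuul-cloner shim so it picks up the speculatively
    # prepared openstack/requirements repo; on a developer machine without
    # zuul-cloner it falls back to cloning over the network from git.o.o,
    # which is the extra load mentioned above.
    set -e

    ZUUL_CLONER=/usr/zuul-env/bin/zuul-cloner
    REQS_DIR=/tmp/openstack/requirements

    if [ -x "$ZUUL_CLONER" ]; then
        # CI path: clone requirements as prepared by Zuul for this change.
        "$ZUUL_CLONER" --cache-dir /opt/git --workspace /tmp \
            git://git.openstack.org openstack/requirements
    else
        # Developer path: plain clone straight from git.openstack.org.
        git clone https://git.openstack.org/openstack/requirements "$REQS_DIR"
    fi

    # Install the package under test with upper constraints applied.
    pip install -c "$REQS_DIR/upper-constraints.txt" -U "$@"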
19:17:54 that's the easiest thing, though it creates a risk people may accidentally use the zuul-cloner shim as they write native v3 jobs
19:18:02 mordred's work being tox-siblings?
19:18:15 ianw: yeah, that's part of it
19:19:41 what if we figured out the jobs that still use tox_install.sh, and moved them into a legacy job again? even if it needed to be created
19:19:43 though there was also talk about publishing, and dependencies, and... i can't really speak to all of it. :)
19:20:42 pabelanger: one problem with that is if it is eg horizon python jobs then we are using different python job defs for different projects
19:20:49 (and we've tried really hard to not do that for years now)
19:21:06 if we did that, i assume we'd change all of openstack-tox-py27 and friends to parent to this job
19:21:38 I think the risk of more z-c sneaking in is low
19:21:45 okay
19:21:54 and if we are working to eliminate the need for that so we can remove it from the base job, that seems fine?
19:22:26 yah, we can remove zuul-env from DIB, that should be okay
19:22:35 i still lean that way because it's the least amount of 'extra' work, assuming more zuul-cloner doesn't sneak in.
19:22:36 just removal of the role in the base job will break things
19:23:08 yeah, assuming the role is robust enough to create the zuul-cloner binary and any needed directories if none already exist
19:23:14 plan sounds solid enough to me
19:23:21 i *think* it's just splatting it into /usr/local/bin, so probably fine
19:23:34 jeblair: my recollection of reviewing that was that it did set it up in a way that should just work (tm)
19:23:44 jeblair: it creates the zuul-env path and overwrites today
19:23:58 oh right, it's zuul-env. so as long as it makes the dirs.
19:24:09 yah, it does right now
19:24:22 thanks Shrews!
19:24:28 we could update the shim to emit a string that says "this is deprecated"
19:24:36 then assuming we get logstash working reliably again, query that string
19:24:41 to find where it is being used
19:24:46 yes, I would like that
19:24:55 clarkb: codesearch might be easier ;)
19:25:01 AJaeger: ya or codesearch
19:25:50 clarkb: it already does
19:25:51 the other one is the jenkins element / user, how did we want to handle that.
19:25:51 well, codesearch doesn't really tell us whether it's actually used, while logstash gets us actual uses but mostly only the frequent ones (so infrequently-run jobs might get missed)
19:25:54 dmsimard: perfect
19:26:16 pabelanger: one thing to keep in mind is we'll break third party CIs
19:26:17 https://review.openstack.org/514485/ Remove jenkins from DIB image
19:26:26 we already get frequent complaints from them about the lack of java
19:26:39 clarkb: right, that is true
19:26:44 maybe the thing to do is keep the element in tree, update it to install java, but remove it from our images
19:26:53 however, we can point them to the project-config branch?
19:26:56 and that way third party CIs get a thing that functions for them and shouldn't need many changes
19:26:58 we could leave the dib element which installs zuul-cloner but not actually include it in our images, right?
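[Editor's note: regarding the 19:23-19:24 exchange about whether the shim role is "robust enough to create the zuul-cloner binary and any needed directories", what the role does on each node amounts roughly to the sketch below. The paths and the name of the shim script are assumptions for illustration; the role in the base playbook is authoritative.]

    #!/bin/bash
    # Approximation of the zuul-cloner shim installation step: create the
    # zuul-env path if the image no longer ships it (e.g. once zuul-env is
    # dropped from DIB) and splat an executable shim into place, overwriting
    # whatever is already there.
    set -e

    ZUUL_ENV_BIN=/usr/zuul-env/bin

    # Make the directory tree so removing the DIB element does not break this.
    mkdir -p "$ZUUL_ENV_BIN"

    # Install the shim; it already emits a deprecation warning when run,
    # which is what makes the logstash/codesearch follow-up possible.
    # (zuul-cloner-shim.sh is a placeholder name for the role's shim script.)
    install -m 0755 zuul-cloner-shim.sh "$ZUUL_ENV_BIN/zuul-cloner"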
19:27:13 er, what clarkb said while i was typing ;)
19:28:59 we'd have to refactor it a bit into another element but ya
19:29:34 jenkins-slave for the most part is good, but nodepool-base and openstack-repos do some jenkins things
19:29:50 if we did that, then we'd stop including jenkins-slave in nodepool.yaml
19:29:59 ya we'd drop it from our images
19:30:05 but carry the image to avoid all the questions we get about it
19:30:31 er carry the element
19:30:33 okay, I can work on that
19:30:52 yeah, especially since we're not telling 3pcis to run zuulv3 yet, we should not be too quick to delete the elements. but i'd love to drop them from our images.
19:31:22 sounds like a plan then
19:31:29 yah, so I'll move everything into the jenkins-slave element, and see how that works
19:31:43 #agreed keep element for jenkins in project-config to aid third party ci, don't install it on our images
19:32:04 #agreed drop z-c shim from base job after mordred removes it from jobs that currently use it with a base parent
19:32:16 anything else zuul related before we move on?
19:32:35 can we do the zuul-env drop in a separate change?
19:32:54 i think i mentioned that, it was in with the jenkins user removal
19:33:25 514483 is its own change, but I'll move it into the jenkins-slave element now
19:33:26 ya we should separate those
19:33:40 oh, no i was thinking of the sudo grep stuff, sorry
19:33:48 https://review.openstack.org/#/c/514485/1/nodepool/elements/zuul-worker/install.d/60-zuul-worker
19:33:51 ok, ignore me
19:34:01 alright zuulv3 going once
19:34:31 #topic General topics
19:34:41 #topic New backup server in ord
19:34:50 ianw: ^ want to update us?
19:35:06 ok, so i started looking at zuulv3 backups and decided this server really wanted updating
19:35:17 we now have 3 backup servers in various states of completeness
19:35:49 firstly,
19:36:03 #link https://review.openstack.org/516148 is for a new xenial server, which should be uncontroversial?
19:36:16 unless the name has problems
19:36:51 that server is up, and has an initial rsync of the existing server's /opt/backups
19:37:23 #link https://review.openstack.org/516157 i think zuulv3 could just start using it now, as it has no existing backups
19:37:56 #link https://review.openstack.org/516159 moves everything else
19:38:23 that would need me to babysit. in a quiet period do a final rsync, and ensure the new host key is accepted on the to-be-backed-up hosts
19:38:24 ok so mostly just needing reviews to switch over the backup target?
19:38:56 yes, as long as we agree ... i just kind of went ahead with this on a monday so, yeah
19:39:13 ya I think updating to a modern host is a good idea
19:39:24 I haven't reviewed the changes yet but am onboard with the idea
19:39:39 ok, the only other thing for discussion is the size
19:39:55 it's not critical but /dev/mapper/main-backups 3.0T 2.3T 528G 82% /opt/backups
19:40:12 from my playing with bup, one issue is there isn't really a way to prune old data
19:40:15 ya bup is append only and you can't delete the old things
19:40:17 it's append-only, so we'll get a reprieve when we switch
19:40:34 jeblair: well i cloned the old backups. but we could start again?
19:40:51 that seems to be the option, move the old directory out of the way and start over
19:40:55 oh yeah
19:41:06 maybe we should start over and keep the old host for a couple months?
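[Editor's note: a minimal sketch of the "start over" option being weighed here, assuming the bup repository simply lives on the /opt/backups volume; the actual per-host backup users, paths, and server name may differ.]

    #!/bin/bash
    # bup is append-only, so "resetting" means keeping the old data aside and
    # initialising an empty repository rather than pruning in place. The old
    # cinder volume can stay mounted (e.g. at /opt/old) for the agreed grace
    # period and be detached and deleted later.
    set -e

    # Fresh, empty bup repository on the new volume.
    mkdir -p /opt/backups/bup
    BUP_DIR=/opt/backups/bup bup init

    # Each backed-up host then starts a new history from scratch, roughly:
    #   bup index /etc /var/backups
    #   bup save -r bup@<new-backup-server>: -n "$(hostname)" /etc /var/backups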
19:41:07 or, we attach another tb and worry about it some other time
19:41:24 jeblair: that seems reasonable
19:41:35 and a host rebuild seems like as good a time as any to actually reset the base of the backups
19:41:45 ok, we can attach the cinder volumes to the new host and just keep them at /opt/old for a bit or something
19:42:22 also i expect the old backup instance we replaced a while back can be safely deleted now?
19:42:26 i can't even seem to log into it
19:42:47 fungi: i think so, i can clean that up
19:42:48 shows active in nova list but doesn't actually respond
19:43:12 #agreed Start fresh backups on new backup host, keep old backups around for a few months before deleting them
19:43:39 wfm
19:43:42 ianw: so other than answering questions around ^ the other thing you need is reviews?
19:44:15 yep, great. that means i can just set up the new server, accept the keys on the to-be-backed-up hosts and they can start fresh, easy
19:44:51 #topic rax-ord instance clean up
19:44:51 wfm
19:45:00 pabelanger: this one is yours too
19:45:12 this one is me
19:45:19 2 x fg-test, 1 x pypi.slave.openstack.org - can we delete these instances?
19:45:27 or does anybody know what they are?
19:45:30 thinking about pypi.slave.openstack.org more, I think that may have been the instance that built our old old old mirror?
19:45:45 yah, I think so too
19:45:47 or was it what we used to publish to pypi?
19:46:02 in either case I think we can remove it since both those use cases are now solved elsewhere
19:46:22 however it doesn't appear to be in dns
19:46:25 sure, just wanted to make sure before I deleted them
19:46:34 so maybe we need to log in and check what is actually at that IP before we delete?
19:46:49 I couldn't get into fg-test
19:46:49 also I don't know what fg-test is
19:46:57 will try pypi.slave.o.o
19:47:20 jeblair: fungi: any idea what fg-test is?
19:47:40 not a clue
19:47:45 no
19:48:16 have the ip address handy?
19:48:28 server list is slow
19:48:49 pabelanger: ^ do you have IPs?
19:48:51 yes, took some time to clean up this afternoon
19:48:54 yah, 1 sec
19:50:04 no, not in buffer any more, waiting for openstack server list now
19:50:29 probably the thing to do here is do our best to log in via ip and just double check the servers aren't misnamed or otherwise important
19:50:31 then delete them
19:50:40 can we follow up on that after the meeting?
19:50:40 server list isn't usually that slow
19:50:47 but yeah, let's move on
19:51:02 #topic Open Discussion
19:51:09 50.56.112.15 pypi.slave.o.o
19:51:30 50.56.121.53 and 50.56.121.54 fg-test
19:51:37 I would appreciate more reviews on https://review.openstack.org/#/c/516502/ as I think something like that will have a big impact on logstash indexing
19:51:40 jeblair: thank you for the review
19:51:42 pretty sure pypi.slave.o.o got replaced by release.slave.o.o
19:52:09 but it was in ord?
19:52:09 still working on launching more logstash-workers, likely try again this evening after halloween
19:52:22 fungi: yes, alongside nodepool nodes
19:52:29 clarkb: yeah i like the approach there (non-.gz is authoritative name)
19:52:31 not dfw?
19:52:50 fungi: correct
19:52:52 i also don't see an fg-test node. what are you looking at, the openstackci tenant or somewhere else?
19:53:12 also I'll start to become afk more and more over the next few days, back around on the weekend during board meetings and such :)
19:53:35 fungi: I'm using nodepool.o.o's clouds.yaml file (nodepool user)
19:53:44 clarkb, will you have time during Summit? The Zanata development team will be at Summit - I am not sure whether the team has already contacted you or not, but hopefully I18n and infra things regarding Zanata can be successfully discussed next week
19:53:49 ahh, so openstackjenkins
19:53:49 fungi: it's the openstackjenkins tenant
19:54:00 yes
19:54:02 sorry
19:54:21 ianychoi: I'll be around but the schedule is quite full already, probably the best bet is lunch?
19:54:37 ianychoi: ping me and we can sort something out
19:54:50 ya, my key doesn't work for fg-test, but it's possible it is an old key at this point
19:55:07 clarkb, I think so. Yep - I will tell the Zanata development team members, thanks!
19:55:12 oh also ianw, will we have a decision on team evening things soon ish?
19:55:25 oh, i think wednesday works?
19:55:47 mon is some rdo thing i think, tuesday is the melbourne cup thing, so that leaves ...
19:56:20 I think wednesday works
19:56:53 778108
19:56:55 sorry.
19:57:26 ianw: probably the best thing is to respond to the thread and make some decisions, and anyone that has a major conflict can attempt to plan something different >_> this was my tactic at previous PTGs
19:57:39 clarkb: ok, will do
19:57:49 and with that I'll get out of the way of the TC maybe having a meeting
19:57:51 #endmeeting