14:51:37 <clarkb> #startmeeting infra
14:51:38 <openstack> Meeting started Fri Jul 24 14:51:37 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:51:39 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:51:41 <openstack> The meeting name has been set to 'infra'
14:51:48 <clarkb> #topic Gerrit Project Renames July 24, 2020
14:52:00 <clarkb> #link https://etherpad.opendev.org/p/gerrit-2020-07-24
14:52:05 <fungi> clarkb: we've used opendev-maint for the meeting name previously i think?
14:52:33 <clarkb> ah that is what I was wondering. Maybe we can just mv this file when we are done?
14:52:38 <clarkb> or should I end and start new?
14:53:02 <fungi> it's probably fine for this one
14:53:14 <clarkb> k
14:53:23 <fungi> i missed you were asking what meeting name to use, sorry :/
14:53:47 <clarkb> I have started a root screen on bridge and run disable-ansible there. We are also waiting for a couple of openstack releases to flush through the release and post pipelines.
14:54:42 <clarkb> Once the releases are complete we'll proceed with irc notices and running the playbook
14:55:29 <fungi> i've checked the renames data change and the copy of the yaml file from it on bridge.o.o, both identical and correct
14:56:04 <fungi> they match what's going on in the rename changes too
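(For reference, the renames data being checked here is a small YAML file; an entry looks roughly like the following. The exact schema is whatever the rename playbook expects, and these project names are invented for illustration.)

    # hypothetical renames entry; the real files live in project-config
    # and use whatever keys the rename playbook actually consumes
    - old: openstack/example-project
      new: opendev/example-project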
14:56:44 <clarkb> there is also a single ironic job that should end any minute now and if it succeeds it will flush about 5 changes, so we may wait for that too, though I think that is less critical
15:01:17 <fungi> it's zuul.opendev.org/t/openstack/status/
15:01:23 <fungi> er, mispaste
15:02:35 <clarkb> we're still waiting on nodes for the releases but I think all of the jobs are queued at this point. Now we wait
15:03:51 <clarkb> nova's release notes job takes more than half an hour?
15:04:04 <clarkb> I imagine it's safe to restart Gerrit while that is running since it isn't pushing tags
15:04:53 <fungi> yeah, the problem is if we take gerrit down during tag pushes, slightly less so for the job which proposes constraints changes
15:05:15 <fungi> the last build for that ironic change is also just wrapping up now
15:05:57 <clarkb> fungi: it would probably be good to wait at least for the release-openstack-python jobs to finish too in case they fetch from gerrit?
15:06:04 <clarkb> I don't think they do but I'm always surprised :)
15:08:58 <fungi> i don't think they do. propose-update-constraints will definitely be pushing to gerrit though
15:09:36 <clarkb> oh I see
15:09:38 <clarkb> ya
15:10:20 <clarkb> we are 4 minutes away from the one release
15:13:24 <clarkb> fungi: https://zuul.opendev.org/t/openstack/build/ad0de3a8325c4c3ab4c462e6ee1bf509 nova release failed on that
15:13:36 <clarkb> is that something we can deal with after the maintenance?
15:14:10 <clarkb> looks like it did upload to pypi
15:14:29 <clarkb> and it ran on ze10 (not sure if that was one of the time delta servers)
15:15:11 <fungi> ugh, yes that's the same problem i was trying to track down on ze11 yesterday
15:15:33 <fungi> ze02, ze10 and ze11 were the three which spontaneously rebooted at various times on wednesday
15:16:36 <fungi> it'll have to be dealt with after the maintenance
15:16:46 <fungi> it involves manually copying files from pypi into afs
15:17:36 <clarkb> I'm checking with smcginnis now that we are good to proceed with maintenance from their side
15:17:44 <clarkb> all the tagging and constraints have been pushed I think
15:19:20 <clarkb> fungi: corvus I've got the ansible playbook command queued up in screen. smcginnis thinks we are clear to proceed.
15:19:25 <corvus> ++
15:19:50 <clarkb> will running the status notice here confuse the bots: #status notice We are renaming projects in Gerrit and review.opendev.org will experience a short outage. Thank you for your patience.
15:20:01 <fungi> yep, should be clear to proceed
15:20:33 <clarkb> I'll run it in #opendev to avoid any bot confusion
15:20:41 <fungi> it shouldn't confuse the bots
15:20:47 <clarkb> oh well then I guess I'll try it here
15:20:52 <clarkb> #status notice We are renaming projects in Gerrit and review.opendev.org will experience a short outage. Thank you for your patience.
15:20:52 <openstackstatus> clarkb: sending notice
15:20:59 <fungi> the only real confusion will be if we switch meeting topics while still under alert
15:21:06 <fungi> i think
15:21:16 <fungi> or end the meeting while still under alert
15:21:27 <clarkb> ya I didn't alert in part to avoid topic changes
15:21:55 <clarkb> Once it says it is done I'll remove my # prefix on the bridge screen command line and run the playbook?
15:22:15 <fungi> the maintenance itself will take less time to complete than undoing the alert would anyway
15:22:34 <fungi> yeah, looks right
15:23:57 <corvus> ++
15:24:03 <openstackstatus> clarkb: finished sending notice
15:24:12 <clarkb> alright proceeding with the playbook command now
15:24:56 <fungi> please do
15:24:58 <clarkb> it is running
15:25:17 <fungi> no usable temp dir found?
15:25:18 <clarkb> uhm there was a grumble about a tmp dir?
15:25:21 <clarkb> but it's proceeding?
15:25:28 <corvus> review-test
15:25:33 <clarkb> review-test
15:25:34 <clarkb> ya ok
15:25:35 <fungi> yup
15:26:34 <clarkb> it's at the "wait for gerrit to come up" stage
15:28:10 <clarkb> logs claim it is ready so web should catch up momentarily
15:28:21 <fungi> api has gone from refusing to hanging
15:28:54 <fungi> help responds now
15:29:02 <fungi> ls-projects isn't returning yet for me though
15:29:10 <corvus> i can load changes using the web ui
15:29:46 <clarkb> yup web ui is up for me but ls-projects isn't working yet
15:30:32 <clarkb> show-queue works /me tries ls-projects again
15:30:49 <fungi> now it's outputting the projects list for me, just pausing/buffering slowly
15:31:06 <fungi> maybe project listing is slow at startup
15:31:16 <fungi> okay it finally returned for me
15:31:28 <fungi> and now it returns quickly when rerunning
15:31:35 <clarkb> yup my initial call errored but now it works when I rerun it
15:31:36 <fungi> so we're probably in the clear to move along
15:31:47 <clarkb> corvus: ^ you think we are ready too?
15:32:15 <corvus> yep
15:32:40 <clarkb> that lgtm other than review-test
15:33:16 <fungi> why does it run against review-test?
15:33:33 <clarkb> fungi: we must be using the review group and review-test is in it?
15:33:49 <clarkb> would probably be a good idea to be more explicit or remove review-test from that group but that will need investigating
15:33:55 <fungi> ahh, that would make sense, yeah
15:34:27 <fungi> okay, so now we land the rename changes?
15:34:31 <clarkb> we are ready to merge https://review.opendev.org/#/c/739286/ and https://review.opendev.org/#/c/738979 right?
15:34:39 <fungi> yeah i think so
15:34:44 <clarkb> this is where we had problems last time but the ansible disablement seems to be working so I think it's ok
15:34:57 <fungi> also 742731 right?
15:34:57 <clarkb> and I should force merge those because the first change cannot merge as is?
15:35:12 <clarkb> fungi: ya that one too but it can happen without force merging and before or after we re-enable ansible
15:35:37 <corvus> yeah, i think force-merge 738286
15:35:41 <fungi> yeah, unless we want to split the zuul tenant config change out we'll need to bypass zuul
15:35:53 <clarkb> corvus: should I force merge both so that we can reenable ansible more quickly?
15:35:57 <corvus> we'll get a zuul config error which will reconcile once we re-enable zuul on bridge and deploy
15:36:13 <corvus> clarkb: yes i think it's fine to do both
15:36:20 <clarkb> ok I'll force merge both now
15:36:33 <fungi> we need all three merged before we turn ansible back on though, right?
15:37:04 <fungi> er, no i guess the renames data is only used if we rebuild gitea servers
15:37:06 <clarkb> fungi: no, the record change is purely information and not processed by automation (yet)
15:37:16 <fungi> so that can merge in its own time, yeah
15:37:22 <corvus> (i had a typo earlier, 739286 not 738286)
15:38:15 <clarkb> corvus: ya I opened them from the etherpad and checked content
15:38:17 <clarkb> I have merged both
15:38:26 <fungi> yep, gerritbot confirmed
15:39:39 <clarkb> https://gitea01.opendev.org:3000/openstack/project-config/commits/branch/master I'm checking that on 01 to 08 now
15:39:41 <clarkb> 01 lgtm
15:40:31 <clarkb> all 8 lgtm
15:40:50 <fungi> yep, i also just finished checking them. all 8 look like they have those now
15:40:55 <clarkb> deploy has both changes queued up and the first manage projects job should be hitting our ansible is disabled check
15:41:26 <clarkb> fungi: corvus let me know if you think there is anything else we should check before removing the ansible disablement file. I think we should be good to proceed
15:41:34 <corvus> clarkb: i think we're good
15:41:36 <fungi> i think we're ready to go
15:42:04 <clarkb> the file has been rm'd
15:42:42 <clarkb> msg executing local code is prohibited
15:42:45 <clarkb> ok then
15:42:52 <corvus> ?
15:42:59 <clarkb> manage-projects job failed due to ^
15:43:04 <corvus> link?
15:43:22 <clarkb> getting one
15:43:33 <clarkb> https://zuul.opendev.org/t/openstack/build/948ba0341b334e9db4c2f32779fdae86
15:43:38 <clarkb> I think we're ok from a renaming standpoint
15:44:05 <clarkb> we just failed to run the job but once the job is working again we'll apply the updated state and noop
15:44:12 <clarkb> however, we cannot create new projects at the moment
15:45:09 <clarkb> it is the git repo update
15:45:19 <clarkb> we may actually not update anything right now :/
15:45:35 <clarkb> I'm wondering if that means we want to disable ansible again to keep periodic jobs from potentially being unhappy about things?
15:46:49 <fungi> i guess we haven't actually run infra-prod-manage-projects successfully since the upgrade: https://zuul.opendev.org/t/openstack/builds?job_name=infra-prod-manage-projects
15:47:27 <corvus> i suspect we don't really need to disable since it's just going to continue to bomb at the start
15:47:30 <clarkb> fungi: or any other infra-prod playbook
15:47:43 <clarkb> corvus: ya I think that is correct since this is very early in the infra prod playbook
15:48:35 <corvus> we're using the zuul project ssh key for access control, right?
15:48:43 <clarkb> corvus: yes
15:48:55 <clarkb> it should be added to the bridge zuul user
15:49:59 <corvus> i think we're going to need a new base job in a config-project to fix this
15:50:18 <corvus> system-config is currently in the openstack tenant, so we can add it to opendev/base-jobs or openstack/project-config
15:50:40 <corvus> i don't think we have a config-project that's just for the opendev tenant, do we?
15:50:55 <clarkb> corvus: we have opendev/project-config which we don't really use for much yet
15:50:59 <corvus> aha
15:51:32 <corvus> it's not in openstack
15:51:40 <corvus> maybe we should just put this in base-jobs for now, then move it?
15:51:54 <fungi> i'm okay with it going into opendev/base-jobs initially
15:51:55 <clarkb> opendev/base-jobs you mean? ya I think that would be easy enough
15:52:06 <corvus> i mean, no other project will be able to use it because of the project key anyway
15:52:26 <corvus> i'll work on a change
15:52:28 <fungi> we'll eventually shuffle all of that into the opendev tenant regardless
15:52:37 <clarkb> we can also explicitly limit it to openstack/project-config and openstack/system-config right?
15:52:49 <fungi> and can then put it in opendev/project-config when we do
15:52:51 <clarkb> allowed-projects or whatever the term is
15:52:54 <corvus> fungi: yeah, and when we do, we can pull it into a narrower scope
15:52:56 <corvus> clarkb: yes
15:55:29 <clarkb> and the only role it seems to use is prepare-workspace-git which is in zuul-jobs, which we include in base-jobs, so that will be ok too I think
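(A rough sketch of the kind of base job being discussed, as it might be defined in opendev/base-jobs. The job name and playbook paths are guesses for illustration only; allowed-projects is shown commented out because, later in this log, the decision is to skip it and rely on the per-project ssh key.)

    # illustrative only: real name, paths, and secrets handling may differ
    - job:
        name: infra-prod-playbook
        description: Run a production playbook against bridge.
        pre-run: playbooks/infra-prod/pre.yaml   # would use prepare-workspace-git from zuul-jobs
        run: playbooks/infra-prod/run.yaml       # sshes to bridge with the per-project key
        roles:
          - zuul: zuul/zuul-jobs
        # allowed-projects could narrow who may use the job, e.g.:
        # allowed-projects:
        #   - openstack/system-config
        #   - openstack/project-config
        # but the per-project ssh key already keeps other projects from
        # reaching bridge, so it is ultimately skipped here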
15:56:03 <clarkb> we can also probably start small and disable most of the CD jobs and try the new one with a single job? and then expand from there as things are happy
16:00:03 <clarkb> hrm unknown project opendev/system-config :/
16:00:21 <fungi> where did you see that?
16:00:39 <clarkb> on the change corvus just pushed
16:00:43 <corvus> i'll fix it in a minute, working on the other change now
16:00:51 <corvus> that should be openstack
16:00:51 <clarkb> k
16:00:59 <clarkb> oh right
16:01:08 <fungi> oh, in 742934
16:01:24 <corvus> oh wait no that's right
16:01:39 <corvus> it's just that opendev/system-config is not in every tenant
16:01:45 <corvus> let's just skip the allowed-projects
16:02:54 <clarkb> and by skipping allowed projects other projects can run it but will fail to ssh because they don't have the project key?
16:03:16 <corvus> yep.
16:03:44 <clarkb> ok maybe if we do that we should test it once it has landed (to ensure it fails as expected)
16:03:51 <clarkb> but I may be overly paranoid
16:04:34 <corvus> clarkb: sounds good
16:08:44 <clarkb> corvus: as another option can we add system-config to opendev without loading any configs from it?
16:08:54 <clarkb> its a move we'll do eventually (but with loaded configs) so maybe that is a good step anyway?
16:09:05 <corvus> clarkb: would need to be added to every tenant
16:09:19 <corvus> the issue is opendev/base-jobs is in every tenant, so every tenant needs to understand that job defn
16:09:19 <clarkb> because we load base jobs in every tenant, got it
16:09:53 <corvus> we could add system-config and project-config to every tenant and just "include: []" but i think even that is too messy
16:10:06 <corvus> this gets better once we move system-config into the opendev tenant
16:10:14 <clarkb> ya
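(The "include: []" idea mentioned above would look something like this in Zuul's tenant configuration; the tenant name and project placement are illustrative rather than the actual OpenDev layout.)

    # illustrative tenant stanza: the projects become known to the tenant
    # without any job configuration being loaded from them
    - tenant:
        name: sometenant
        source:
          gerrit:
            config-projects:
              - opendev/base-jobs
            untrusted-projects:
              - opendev/system-config:
                  include: []
              - opendev/project-config:
                  include: []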
16:11:22 <clarkb> https://review.opendev.org/#/admin/groups/459,members is who has approval on that repo fwiw. I'm thinking I may trim it down a bit?
16:11:59 <clarkb> infra-core + dmsimard, frickler, mnaser, ajaeger?
16:12:02 <clarkb> any objections?
16:12:16 <clarkb> I guess the ssh key still protects us there
16:12:20 <clarkb> so its probably fine to leave it as is
16:12:30 <fungi> oh, yeah, lots of emeritus reviewers who haven't been involved for a while
16:12:47 <clarkb> oh no the ssh key doesn't protect us as much there once we consume the base job
16:13:05 <clarkb> so ya I think trimming that a bit makes sense. Any objections to doing it with the group above?
16:13:19 <mnaser> no objection
16:13:32 <mnaser> and if they're active again, we can bring them up anytime
16:13:45 <clarkb> mnaser: ya it's an active vs inactive question but also a root vs not root question
16:14:08 <mnaser> ah.  well, in full transparency, i have access to vexxhost/base-jobs which is a config project inside vexxhost tenant
16:14:11 <mnaser> so.. i'd lose that..
16:14:36 <corvus> i think i'm lost
16:14:47 <fungi> mnaser: clarkb is proposing to continue including you
16:14:53 <corvus> we're talking about trimming membership of project-config-core which has approval rights for which repo(s)?
16:15:01 <clarkb> corvus: opendev/base-jobs
16:15:08 <mnaser> oh, sigh, sorry.  i thought the + was a ping, and not foo + bar
16:15:39 <corvus> i think clarkb's trimming is appropriate and should have no net effect
16:16:08 <fungi> yes, i agree with the proposed adjustment to that group membership
16:16:46 <AJaeger> clarkb: trimming looks fine - I just wonder about dmsimard, is he still reviewing? I would drop him as well
16:16:47 <clarkb> mnaser: yup I trust both you and AJaeger with that access even if you aren't proper roots
16:17:02 <clarkb> AJaeger: let me double check if we've cleared out his ssh key if so I'll clear from that group too
16:17:29 <AJaeger> clarkb: change Ife3cfdfe3b674c7703adcbcf7f5a4af708fcd03a
16:17:35 <clarkb> dmsimard is removed yup
16:18:11 <clarkb> https://review.opendev.org/#/admin/groups/459,members should be good now. infra-core had ianw and frickler in it so they are removed from the extra individuals list
16:18:53 <clarkb> now I'm going to update infra-core while I'm thinking about it
16:19:24 <clarkb> that's done, should better reflect what we've got in ansible
16:19:37 <AJaeger> clarkb: updates look good to me
16:24:12 <clarkb> I've approved https://review.opendev.org/#/c/742731/1
16:28:36 <clarkb> I'll end the meeting here as the steps on the etherpad (other than disk cleanups) are completed
16:28:39 <clarkb> #endmeeting