14:51:37 #startmeeting infra
14:51:38 Meeting started Fri Jul 24 14:51:37 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:51:39 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:51:41 The meeting name has been set to 'infra'
14:51:48 #topic Gerrit Project Renames July 24, 2020
14:52:00 #link https://etherpad.opendev.org/p/gerrit-2020-07-24
14:52:05 clarkb: we've used opendev-maint for the meeting name previously i think?
14:52:33 ah that is what I was wondering. Maybe we can just mv this file when we are done?
14:52:38 or should I end and start new?
14:53:02 it's probably fine for this one
14:53:14 k
14:53:23 i missed you were asking what meeting name to use, sorry :/
14:53:47 I have started a root screen on bridge and run disable-ansible there. We are also waiting for a couple of openstack releases to flush through the release and post pipelines.
14:54:42 Once the releases are complete we'll proceed with irc notices and running the playbook
14:55:29 i've checked the renames data change and the copy of the yaml file from it on bridge.o.o, both identical and correct
14:56:04 they match what's going on in the rename changes too
14:56:44 there is also a single ironic job that should end any minute now and if it succeeds will flush about 5 changes, so we may wait for that too, though I think that is less critical
15:01:17 it's zuul.opendev.org/t/openstack/status/
15:01:23 er, mispaste
15:02:35 we're still waiting on nodes for the releases but I think all of the jobs are queued at this point. Now we wait
15:03:51 nova's release notes job takes more than half an hour?
15:04:04 I imagine it's safe to restart Gerrit while that is running since it isn't pushing tags
15:04:53 yeah, the problem is if we take gerrit down during tag pushes, slightly less so for the job which proposes constraints changes
15:05:15 the last build for that ironic change is also just wrapping up now
15:05:57 fungi: it would probably be good to wait at least for the release-openstack-python jobs to finish too in case they fetch from gerrit?
15:06:04 I don't think they do but I'm always surprised :)
15:08:58 i don't think they do. propose-update-constraints will definitely be pushing to gerrit though
15:09:36 oh I see
15:09:38 ya
15:10:20 we are 4 minutes away from the one release
15:13:24 fungi: https://zuul.opendev.org/t/openstack/build/ad0de3a8325c4c3ab4c462e6ee1bf509 nova release failed on that
15:13:36 is that something we can deal with after the maintenance?
15:14:10 looks like it did upload to pypi
15:14:29 and it ran on ze10 (not sure if that was one of the time delta servers)
15:15:11 ugh, yes that's the same problem i was trying to track down on ze11 yesterday
15:15:33 ze02, ze10 and ze11 were the three which spontaneously rebooted at various times on wednesday
15:16:36 it'll have to be dealt with after the maintenance
15:16:46 it involves manually copying files from pypi into afs
15:17:36 I'm checking with smcginnis now that we are good to proceed with maintenance from their side
15:17:44 all the tagging and constraints have been pushed I think
15:19:20 fungi: corvus I've got the ansible playbook command queued up in screen. smcginnis thinks we are clear to proceed.
15:19:25 ++
15:19:50 will running the status notice here confuse the bots: #status notice We are renaming projects in Gerrit and review.opendev.org will experience a short outage. Thank you for your patience.
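For context on the disable-ansible step mentioned at 14:53:47: it drops a flag file on bridge which the automated infra-prod runs check before doing anything. A minimal sketch of that pattern, assuming a hypothetical flag path rather than the exact script used on bridge:

```shell
# Rough sketch of the disable-ansible pattern, not the exact script on bridge.
# DISABLE_FILE is a hypothetical path; the real location may differ.
DISABLE_FILE=/home/zuul/DISABLE-ANSIBLE

# The operator drops the flag (recording why) before the maintenance:
echo "gerrit project renames 2020-07-24" > "$DISABLE_FILE"

# Automated infra-prod runs bail out early while the flag exists:
if [ -f "$DISABLE_FILE" ]; then
    echo "ansible runs are disabled: $(cat "$DISABLE_FILE")" >&2
    exit 1
fi
```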
15:20:01 yep, should be clear to proceed
15:20:33 I'll run it in #opendev to avoid any bot confusion
15:20:41 it shouldn't confuse the bots
15:20:47 oh well then I guess I'll try it here
15:20:52 #status notice We are renaming projects in Gerrit and review.opendev.org will experience a short outage. Thank you for your patience.
15:20:52 clarkb: sending notice
15:20:59 the only real confusion will be if we switch meeting topics while still under alert
15:21:06 i think
15:21:16 or end the meeting while still under alert
15:21:27 ya I didn't alert in part to avoid topic changes
15:21:55 Once it says it is done I'll remove my # prefix on the bridge screen command line and run the playbook?
15:22:15 the maintenance itself will take less time to complete than undoing the alert would anyway
15:22:34 yeah, looks right
15:23:57 ++
15:24:03 clarkb: finished sending notice
15:24:12 alright proceeding with the playbook command now
15:24:56 please do
15:24:58 it is running
15:25:17 no usable temp dir found?
15:25:18 uhm there was a grumble about a tmp dir?
15:25:21 but it's proceeding?
15:25:28 review-test
15:25:33 review-test
15:25:34 ya ok
15:25:35 yup
15:26:34 it's at the wait-for-gerrit-to-come-up stage
15:28:10 logs claim it is ready so web should catch up momentarily
15:28:21 api has gone from refusing to hanging
15:28:54 help responds now
15:29:02 ls-projects isn't returning yet for me though
15:29:10 i can load changes using the web ui
15:29:46 yup web ui is up for me but ls-projects isn't working yet
15:30:32 show-queue works /me tries ls-projects again
15:30:49 now it's outputting the projects list for me, just pausing/buffering slowly
15:31:06 maybe project listing is slow at startup
15:31:16 okay it finally returned for me
15:31:28 and now it returns quickly when rerunning
15:31:35 yup my initial call errored but now if I do it it works
15:31:36 so we're probably in the clear to move along
15:31:47 corvus: ^ you think we are ready too?
15:32:15 yep
15:32:40 that lgtm other than review-test
15:33:16 why does it run against review-test?
15:33:33 fungi: we must be using the review group and review-test is in it?
15:33:49 would probably be a good idea to be more explicit or remove review-test from that group but that will need investigating
15:33:55 ahh, that would make sense, yeah
15:34:27 okay, so now we land the rename changes?
15:34:31 we are ready to merge https://review.opendev.org/#/c/739286/ and https://review.opendev.org/#/c/738979 right?
15:34:39 yeah i think so
15:34:44 this is where we had problems last time but the ansible disablement seems to be working so I think it's ok
15:34:57 also 742731 right?
15:34:57 and I should force merge those because the first change cannot merge as is?
15:35:12 fungi: ya that one too but it can happen without force merging and before or after we disable ansible
15:35:37 yeah, i think force-merge 738286
15:35:41 yeah, unless we want to split the zuul tenant config change out we'll need to bypass zuul
15:35:53 corvus: should I force merge both so that we can reenable ansible more quickly?
15:35:57 we'll get a zuul config error which will reconcile once we re-enable zuul on bridge and deploy
15:36:13 clarkb: yes i think it's fine to do both
15:36:20 ok I'll force merge both now
15:36:33 we need all three merged before we turn ansible back on though, right?
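The ls-projects and show-queue checks above go through Gerrit's SSH command interface. Roughly the following, with <username> standing in for the operator's Gerrit account:

```shell
# Gerrit SSH command-line checks used while waiting for the service to settle.
# Replace <username> with a real Gerrit account; 29418 is Gerrit's SSH port.
ssh -p 29418 <username>@review.opendev.org gerrit version             # basic liveness
ssh -p 29418 <username>@review.opendev.org gerrit show-queue          # internal task queue
ssh -p 29418 <username>@review.opendev.org gerrit ls-projects | wc -l # full project listing
```

A slow or empty ls-projects right after startup, as seen here, usually just means the caches are still warming up.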
15:37:04 er, no i guess the renames data is only used if we rebuild gitea servers
15:37:06 fungi: no, the record change is purely informational and not processed by automation (yet)
15:37:16 so that can merge in its own time, yeah
15:37:22 (i had a typo earlier, 739286 not 738286)
15:38:15 corvus: ya I opened them from the etherpad and checked content
15:38:17 I have merged both
15:38:26 yep, gerritbot confirmed
15:39:39 https://gitea01.opendev.org:3000/openstack/project-config/commits/branch/master I'm checking that on 01 to 08 now
15:39:41 01 lgtm
15:40:31 all 8 lgtm
15:40:50 yep, i also just finished checking them. all 8 look like they have those now
15:40:55 deploy has both changes queued up and the first manage-projects job should be hitting our ansible-is-disabled check
15:41:26 fungi: corvus let me know if you think there is anything else we should check before removing the ansible disablement file. I think we should be good to proceed
15:41:34 clarkb: i think we're good
15:41:36 i think we're ready to go
15:42:04 the file has been rm'd
15:42:42 msg executing local code is prohibited
15:42:45 ok then
15:42:52 ?
15:42:59 manage-projects job failed due to ^
15:43:04 link?
15:43:22 getting one
15:43:33 https://zuul.opendev.org/t/openstack/build/948ba0341b334e9db4c2f32779fdae86
15:43:38 I think we're ok from a renaming standpoint
15:44:05 we just failed to run the job but once the job is working again we'll apply the updated state and noop
15:44:12 however, we cannot create new projects at the moment
15:45:09 it is the git repo update
15:45:19 we may actually not update anything right now :/
15:45:35 I'm wondering if that means we want to disable ansible again to keep periodic jobs from potentially being unhappy about things?
15:46:49 i guess we haven't actually run infra-prod-manage-projects successfully since the upgrade: https://zuul.opendev.org/t/openstack/builds?job_name=infra-prod-manage-projects
15:47:27 i suspect we don't really need to disable since it's just going to continue to bomb at the start
15:47:30 fungi: or any other infra-prod playbook
15:47:43 corvus: ya I think that is correct since this is very early in the infra-prod playbook
15:48:35 we're using the zuul project ssh key for access control, right?
15:48:43 corvus: yes
15:48:55 it should be added to the bridge zuul user
15:49:59 i think we're going to need a new base job in a config-project to fix this
15:50:18 system-config is currently in the openstack tenant, so we can add it to opendev/base-jobs or openstack/project-config
15:50:40 i don't think we have a config-project that's just for the opendev tenant, do we?
15:50:55 corvus: we have opendev/project-config which we don't really use for much yet
15:50:59 aha
15:51:32 it's not in openstack
15:51:40 maybe we should just put this in base-jobs for now, then move it?
15:51:54 i'm okay with it going into opendev/base-jobs initially
15:51:55 opendev/base-jobs you mean? ya I think that would be easy enough
15:52:06 i mean, no other project will be able to use it because of the project key anyway
15:52:26 i'll work on a change
15:52:28 we'll eventually shuffle all of that into the opendev tenant regardless
15:52:37 we can also explicitly limit it to openstack/project-config and openstack/system-config right?
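The access control referred to at 15:48:35 relies on the per-project SSH key Zuul generates: only openstack/system-config's key is authorized for the zuul user on bridge, so a job inheriting the new base job from any other project fails at the SSH step. A rough sketch of how the manual test discussed below could look, with the host and key path as placeholders rather than the real job internals:

```shell
# Hypothetical manual check of the per-project key gate once the base job lands.
# ZUUL_PROJECT_KEY stands in for the SSH key Zuul supplies to the running job;
# only the key belonging to openstack/system-config should be accepted.
ZUUL_PROJECT_KEY=/path/to/project/ssh/key

if ssh -i "$ZUUL_PROJECT_KEY" -o BatchMode=yes zuul@bridge.openstack.org true; then
    echo "bridge accepted this project's key (expected only for system-config)"
else
    echo "bridge rejected the key, as expected for any other project"
fi
```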
15:52:49 and can then put it in opendev/project-config when we do
15:52:51 allowed-projects or whatever the term is
15:52:54 fungi: yeah, and when we do, we can pull it into a narrower scope
15:52:56 clarkb: yes
15:55:29 and the only role it seems to use is prepare-workspace-git which is in zuul-jobs which we include in base-jobs so that will be ok too I think
15:56:03 we can also probably start small and disable most of the CD jobs and try the new one with a single job? and then expand from there as things are happy
16:00:03 hrm unknown project opendev/system-config :/
16:00:21 where did you see that?
16:00:39 on the change corvus just pushed
16:00:43 i'll fix it in a minute, working on the other change now
16:00:51 that should be openstack
16:00:51 k
16:00:59 oh right
16:01:08 oh, in 742934
16:01:24 oh wait no that's right
16:01:39 it's just that opendev/system-config is not in every tenant
16:01:45 let's just skip the allowed-projects
16:02:54 and by skipping allowed-projects other projects can run it but will fail to ssh because they don't have the project key?
16:03:16 yep.
16:03:44 ok maybe if we do that we should test it once it has landed (to ensure it fails as expected)
16:03:51 but I may be overly paranoid
16:04:34 clarkb: sounds good
16:08:44 corvus: as another option can we add system-config to opendev without loading any configs from it?
16:08:54 it's a move we'll do eventually (but with loaded configs) so maybe that is a good step anyway?
16:09:05 clarkb: would need to be added to every tenant
16:09:19 the issue is opendev/base-jobs is in every tenant, so every tenant needs to understand that job defn
16:09:19 because we load base jobs in every tenant, got it
16:09:53 we could add system-config and project-config to every tenant and just "include: []" but i think even that is too messy
16:10:06 this gets better once we move system-config into the opendev tenant
16:10:14 ya
16:11:22 https://review.opendev.org/#/admin/groups/459,members is who has approval on that repo fwiw. I'm thinking I may trim it down a bit?
16:11:59 infra-core + dmsimard, frickler, mnaser, ajaeger?
16:12:02 any objections?
16:12:16 I guess the ssh key still protects us there
16:12:20 so it's probably fine to leave it as is
16:12:30 oh, yeah, lots of emeritus reviewers who haven't been involved for a while
16:12:47 oh no the ssh key doesn't protect us as much there once we consume the base job
16:13:05 so ya I think trimming that a bit makes sense. Any objections to doing it with the group above?
16:13:19 no objection
16:13:32 and if they're active again, we can bring them up anytime
16:13:45 mnaser: ya it's an active vs inactive question but also a root vs not root question
16:14:08 ah. well, in full transparency, i have access to vexxhost/base-jobs which is a config project inside the vexxhost tenant
16:14:11 so.. i'd lose that..
16:14:36 i think i'm lost
16:14:47 mnaser: clarkb is proposing to continue including you
16:14:53 we're talking about trimming membership of project-config-core which has approval rights for which repo(s)?
16:15:01 corvus: opendev/base-jobs
16:15:08 oh, sigh, sorry. i thought the + was a ping, and not foo + bar
16:15:39 i think clarkb's trimming is appropriate and should have no net effect
16:16:08 yes, i agree with the proposed adjustment to that group membership
16:16:46 clarkb: trimming looks fine - I just wonder about dmsimard, is he still reviewing? I would drop him as well
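The group being trimmed (459 in the URL above) can also be inspected from the command line via Gerrit's REST API; a small sketch, assuming HTTP credentials with permission to view the group and jq on the path:

```shell
# List current members of Gerrit group 459 (the approvers for opendev/base-jobs).
# GERRIT_USER/GERRIT_HTTP_PASSWORD are placeholders for real HTTP credentials;
# the sed strips Gerrit's ")]}'" anti-XSSI prefix from the JSON response.
curl -s -u "$GERRIT_USER:$GERRIT_HTTP_PASSWORD" \
    "https://review.opendev.org/a/groups/459/members/" \
    | sed 1d | jq -r '.[].name'
```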
16:16:47 mnaser: yup I trust both you and AJaeger with that access even if you aren't proper roots
16:17:02 AJaeger: let me double check whether we've cleared out his ssh key; if so I'll clear him from that group too
16:17:29 clarkb: change Ife3cfdfe3b674c7703adcbcf7f5a4af708fcd03a
16:17:35 dmsimard is removed yup
16:18:11 https://review.opendev.org/#/admin/groups/459,members should be good now. infra-core had ianw and frickler in it so they are removed from the extra individuals list
16:18:53 now I'm going to update infra-core while I'm thinking about it
16:19:24 that's done, should better reflect what we've got in ansible
16:19:37 clarkb: updates look good to me
16:24:12 I've approved https://review.opendev.org/#/c/742731/1
16:28:36 I'll end the meeting here as the steps on the etherpad (other than disk cleanups) are completed
16:28:39 #endmeeting