Saturday, 2023-09-09

14:37 corvus: there are 3 changes in the zuul openstack tenant; i'm dequeing them manually to let the automated zuul restart playbook finish gracefully
14:40 fungi: thanks! i'm around for a few minutes, but need to leave to get lunch soon
14:41 corvus: i think i'll just wait for it to finish; it's rebooting mergers now; probably takes a few minutes for each, so probably nothing happening until about 1500 anyway :)
14:48 fungi: zuul does look pretty quiet, but i guess that's to be expected on a saturday
14:53 corvus: it's restarting schedulers now, so getting close to being finished... then i can restart it again :)
15:03 corvus: okay it's done; i'm going to ... dequeue some more changes, then stop all of zuul
15:04 corvus: speaking of... https://review.opendev.org/867177 would be really helpful to have merged.  :)
15:04 corvus: as a consequence the opendev-prod-hourly jobs aren't going to be dequeued, they're just going to be killed
15:05 Clark[m]: That is fine they'll run again in an hour
15:05 corvus: yeah, but if it worked i could gracefully dequeue and re-enqueue them :)
15:06 Clark[m]: Is there a reason to dequeue first? Aren't we stopping zuul then clearing all the state anyway?
15:06 corvus: so that i can re-enqueue them with a click
15:06 Clark[m]: Ah
15:07 corvus: everything is down now
15:09 corvus: running docker-compose run scheduler bash
15:09 corvus: and zuul-admin delete-state
15:09 corvus: on zuul01
15:10 corvus: it's not fast
15:11 corvus: the zk data size is progressing down on grafana
15:11 corvus: it's done
15:12 corvus: started zuul01 scheduler and all mergers
15:12 corvus: starting all executors
15:13 corvus: starting zuul02 scheduler
15:13 Clark[m]: Was zuul02 started after zuul01 steady state?
15:13 corvus: starting both web servers
15:13 corvus: no
15:13 Clark[m]: (just curious on order of operations)
15:14 corvus: starting everything at once should be fine even.
15:14 corvus: they will handle different tenants
15:15 corvus: it looks like all tenants except openstack are online now
15:17 corvus: we're on x/f
15:17 Clark[m]: I think last time we did this openstack took almost 20 minutes
15:18 corvus: i think since then we may have added the threadpool executor processing of cat jobs
15:18 corvus: it's done with the cat jobs now
15:19 corvus: 2023-09-09 15:18:29,657 DEBUG zuul.GithubRequest: GET https://api.github.com/repos/eventlet/eventlet result: 403, size: 280, duration: 41
15:19 corvus: 2023-09-09 15:18:29,657 WARNING zuul.GithubRateLimitHandler: API rate limit reached, need to wait for 3259 seconds
15:20 corvus: so... we're offline for an hour?
15:24 corvus: https://paste.opendev.org/show/b9VVT76ngRpZDknxhnpe/ is the full list of projects without installation ids
15:25 Clark[m]: The issue there being we get fewer API requests per $time when scanning those repos. I guess we should consider some cleanup if it causes us to be unable to reload configs on a clean startup
15:26 corvus: yeah, if nobody is actually using those, then probably time to remove them
15:26 Clark[m]: I suspect that Ansible and friends require the most API requests to evaluate. But also projects like cherrypy may not be necessary any longer
15:26 corvus: pretty sure the app was installed in some of those repos previously
15:27 corvus: anyway, i see two ways forward: either we manually remove them from the config now, or eat breakfast and come back at 16:15.
15:28 corvus: (though there's no guarantee that we won't run into another rate limit then)
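[Editorial note] The "come back at 16:15" estimate follows from the rate-limit warning logged at 15:18:29 plus its stated wait of 3259 seconds. Below is a minimal Python sketch of that arithmetic, assuming the log line has exactly the shape quoted above. The regular expression and the function name rate_limit_clears_at are hypothetical helpers for illustration and are not part of Zuul.

```python
import re
from datetime import datetime, timedelta

# Log line copied from the scheduler output quoted in the channel.
LOG_LINE = (
    "2023-09-09 15:18:29,657 WARNING zuul.GithubRateLimitHandler: "
    "API rate limit reached, need to wait for 3259 seconds"
)

# Hypothetical pattern for this one message shape (an assumption, not Zuul code).
PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*"
    r"need to wait for (?P<wait>\d+) seconds"
)


def rate_limit_clears_at(line):
    """Return the datetime at which the logged wait ends, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    logged = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
    return logged + timedelta(seconds=int(match.group("wait")))


print(rate_limit_clears_at(LOG_LINE))
```

For the 15:18:29 warning this gives 16:12:48, which is consistent with the 16:15 estimate in the channel. As noted there, a later request can still hit the limit again.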
15:28 Clark[m]: I think some of those are used and never had apps installed. As they are used on our end for integration stuff not reporting back with any coordination to the other side
15:29 corvus: yep
15:30 Clark[m]: We should already exclude things like loading configs from those repos. Does that help reduce the number of API queries we need to make?
15:32 Clark[m]: In any case I suspect that the Ansible related repos are the majority of the problem here due to how they use branches. I think Ansible did have an app once but no longer. Removing it would likely break people including our system-config Ansible devel test job
15:33 corvus: i'm not sure all the api requests that are being done, but the one that tripped the limit was just fetching general info for the repo.
15:35 Clark[m]: (for those that don't know Ansible repos seem to use the central repo as location for branches used to make PRs rather than separate forks. Similar in workflow to Gerrit if you hand wave a bit but the problem is you get real branches and lots of them)
15:36 corvus: 2023-09-09 15:18:27,226 DEBUG zuul.GithubRequest: GET https://api.github.com/repositories/7833168/branches?per_page=100&page=7&per_page=100 result: 200, size: 14097, duration: 264
15:37 corvus: Clark: i think your theory is sound -- but maybe also add kibana to the list; that appears to be what that one is
15:38 Clark[m]: I'm guessing 100 per page is the most we can request. I wonder if we can have git list them directly instead and bypass the rate limit
15:39 Clark[m]: But that doesn't change all the work needed to be done against the branches
15:40 corvus: git doesn't tell you if branch protection is enabled
15:40 Clark[m]: Aha
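[Editorial note] The branches request quoted above asks for page=7 at per_page=100. Below is a rough Python sketch of what that implies for request counts. It assumes the page returned was non-empty and that each page costs one API request. The function names are hypothetical and do not come from Zuul or any GitHub client library.

```python
import math

# Page size taken from the request quoted in the channel.
PER_PAGE = 100


def pages_needed(branch_count, per_page=PER_PAGE):
    """Requests needed to list all branches; even an empty list costs one."""
    return max(1, math.ceil(branch_count / per_page))


def min_branches_implied(page, per_page=PER_PAGE):
    """Smallest branch count for which the given page is non-empty."""
    return (page - 1) * per_page + 1


print(min_branches_implied(7))  # lower bound implied by a non-empty page 7
print(pages_needed(601))        # requests to list that many branches
```

Under these assumptions the repository in that request has at least 601 branches, so listing them takes at least 7 requests before any other per-repo work is counted.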
16:01 fungi: do we need to know about branch protection for repos that we're not loading configuration from and not gating?
16:05 Clark[m]: You could still have centrally managed zuul config operating against the repos that need the protection info?
16:08 fungi: ah, i suppose so
16:09 fungi: anyway, disappearing for the next ~24 hours, won't have a computer with me, but will check back in tomorrow
16:09 fungi: thanks for working on the restarts, sorry i had to miss most of it
16:10 * fungi vanishes in a puff of electrons
16:13 corvus: 2023-09-09 16:13:25,109 WARNING zuul.GithubRateLimitHandler: API rate limit reached, need to wait for 3565 seconds
16:13 corvus: this is impractical
16:15 Clark[m]: If we remove those github projects from the list would we fail to load configs for all projects that rely on them?
16:16 corvus: depends on the config, but if they're just referenced by individual jobs, then just those jobs would disappear.  if they are in project-pipelines, then we'd lose the project pipeline config.
16:17 Clark[m]: We'd lose the project pipeline config for the GitHub project? Or maybe I don't understand the nuance there.
16:21 Clark[m]: If I understand generally though jobs like system-config's run job with Ansible devel and ara from source would go away but system config as a whole would be fine. There are likely other users of Ansible though
16:21 Clark[m]: I think this job is the only usage of ara according to codesearch
16:23 corvus: Clark: yes, we'd lose the ppc for the github project
16:32 corvus: summary: the current status is that the scheduler is waiting on the github api limit to clear in order to finish loading the openstack tenant, at which point the web servers will come online, and the system will be fully up.
16:32 corvus: that will take at least another 45 minutes, and if the api limit is exceeded again, another hour after that, and so on.
17:44 corvus: Looks like it's up
18:50 opendevreview: Ghanshyam proposed openstack/project-config master: Retire os-win: end project gating  https://review.opendev.org/c/openstack/project-config/+/894407
18:52 opendevreview: Ghanshyam proposed openstack/project-config master: Retire compute-hyperv: end project gating  https://review.opendev.org/c/openstack/project-config/+/894408
18:54 opendevreview: Ghanshyam proposed openstack/project-config master: Retire networking-hyperv: end project gating  https://review.opendev.org/c/openstack/project-config/+/894409
18:55 opendevreview: Ghanshyam proposed openstack/project-config master: Retire oswin-tempest-plugin: end project gating  https://review.opendev.org/c/openstack/project-config/+/894410
19:49 opendevreview: Radosław Piliszek proposed opendev/system-config master: Add codesearch to cacti  https://review.opendev.org/c/opendev/system-config/+/894417
19:51 opendevreview: Ghanshyam proposed openstack/project-config master: Update theacl for retired winstacker project repo  https://review.opendev.org/c/openstack/project-config/+/894418
19:53 opendevreview: Ghanshyam proposed openstack/project-config master: Update the gerrit acl for retired winstacker project  https://review.opendev.org/c/openstack/project-config/+/894418
19:53 opendevreview: Ghanshyam proposed openstack/project-config master: Retire os-win: end project gating  https://review.opendev.org/c/openstack/project-config/+/894407
19:55 opendevreview: Ghanshyam proposed openstack/project-config master: Retire compute-hyperv: end project gating  https://review.opendev.org/c/openstack/project-config/+/894408
19:55 opendevreview: Ghanshyam proposed openstack/project-config master: Retire networking-hyperv: end project gating  https://review.opendev.org/c/openstack/project-config/+/894409
19:55 opendevreview: Ghanshyam proposed openstack/project-config master: Retire oswin-tempest-plugin: end project gating  https://review.opendev.org/c/openstack/project-config/+/894410
19:59 opendevreview: Ghanshyam proposed openstack/project-config master: Retire os-win: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/894419
20:02 opendevreview: Ghanshyam proposed openstack/project-config master: Retire compute-hyperv: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/894420
20:05 opendevreview: Ghanshyam proposed openstack/project-config master: Retire networking-hyperv: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/894441
20:10 opendevreview: Ghanshyam proposed openstack/project-config master: Retire oswin-tempest-plugin: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/894442
