Wednesday, 2024-04-24

@picog:matrix.org Hi guys. 06:00
I'm still seeing Zuul containers crash every night and would love to understand what is happening here.
```
podman ps -a
CONTAINER ID  IMAGE                                     COMMAND               CREATED       STATUS                    PORTS                                             NAMES
f41107fe744e  docker.io/gerritcodereview/gerrit:latest                        22 hours ago  Exited (137) 3 hours ago  0.0.0.0:8080->8080/tcp, 0.0.0.0:29418->29418/tcp  zuul_gerrit_1
3b95446473e3  docker.io/library/zookeeper:latest        sh -c /var/playbo...  22 hours ago  Up 22 hours                                                                 zuul_zk_1
3f30fca7aaa4  docker.io/library/mariadb:latest          mariadbd              22 hours ago  Up 22 hours                                                                 zuul_mysql_1
c98938fba7bc  localhost/zuul_logs:latest                httpd-foreground      22 hours ago  Up 22 hours               0.0.0.0:8000->80/tcp                              zuul_logs_1
c3ae94ef5c28  quay.io/zuul-ci/zuul-executor:latest      /usr/local/lib/zu...  22 hours ago  Exited (2) 22 hours ago                                                     zuul_gerritconfig_1
5bc728574e98  quay.io/zuul-ci/nodepool-launcher:latest  sh -c /var/playbo...  22 hours ago  Up 22 hours               0.0.0.0:8005->8005/tcp                            zuul_launcher_1
ccc15f05723a  quay.io/zuul-ci/zuul-scheduler:latest     sh -c /var/playbo...  22 hours ago  Exited (139) 3 hours ago                                                    zuul_scheduler_1
ddfce79beac9  quay.io/zuul-ci/zuul-web:latest           sh -c /var/playbo...  22 hours ago  Up 22 hours               0.0.0.0:9000->9000/tcp                            zuul_web_1
a718dabae24b  localhost/zuul_executor:latest            sh -c /var/playbo...  22 hours ago  Up 22 hours                                                                 zuul_executor_1
```
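As a side note on the `Exited (137)` and `Exited (139)` statuses above: exit statuses greater than 128 conventionally mean the process was terminated by a signal, with the signal number being the status minus 128. The sketch below decodes them under that convention. The helper name `describe_exit_status` is hypothetical, and the signal names assume a Linux host.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe an exit status using the shell's 128 + signal convention."""
    if status > 128:
        signum = status - 128
        try:
            # Map the number to its symbolic name on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    if status == 0:
        return "exited cleanly"
    return f"exited with error code {status}"


if __name__ == "__main__":
    # The three non-running statuses from the listing above.
    for status in (137, 139, 2):
        print(status, describe_exit_status(status))
```

On Linux this reports SIGKILL for 137 and SIGSEGV for 139. A SIGKILL is often the kernel's out-of-memory killer or a forced stop, but the output here does not confirm which.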
Gerrit container logs (I can't see anything interesting):
```
[2024-04-23T16:08:56.233Z] [HTTP POST /a/changes/zuul-config~master~I875745a421c5eba6457e10b93d2c56e43373aff7/revisions/8096783323b3f1b2ef39ec67d0bb (zuul from 10.89.0.1)] INFO com.googlesource.gerrit.plugins.hooks.HookFactory : hooks.commentAddedHook resolved to /var/gerrit/hooks/comment-added [CONTEXT project="zuul-config" request="REST /changes/*/revisions/*/review" ]
[2024-04-23T16:38:23.222Z] [HTTP POST /changes/zuul-config~261/revisions/1/submit (pgeyer from 172.16.20.10)] INFO com.googlesource.gerrit.plugins.hooks.HookFactory : hooks.submitHook resolved to /var/gerrit/hooks/submit [CONTEXT SUBMISSION_ID="261-1713890303193-993ea61c" project="zuul-config" request="REST /changes/*/revisions/*/submit" ]
[2024-04-23T16:38:23.507Z] [HTTP POST /changes/zuul-config~261/revisions/1/submit (pgeyer from 172.16.20.10)] INFO com.googlesource.gerrit.plugins.hooks.HookFactory : hooks.changeMergedHook resolved to /var/gerrit/hooks/change-merged [CONTEXT SUBMISSION_ID="261-1713890303193-993ea61c" project="zuul-config" request="REST /changes/*/revisions/*/submit" ]
[2024-04-23T16:52:17.489Z] [SSH git-receive-pack /zuul-config (pgeyer)] INFO com.google.gerrit.server.git.MultiProgressMonitor : Processing changes: new: 1 (\) [CONTEXT ratelimit_period="1 MINUTES [skipped: 7]" ]
[2024-04-23T19:34:59.257Z] [SSH git-receive-pack /zuul-config (pgeyer)] INFO com.google.gerrit.server.git.MultiProgressMonitor : Processing changes: refs: 1, new: 1 [CONTEXT ratelimit_period="1 MINUTES [skipped: 4]" ]
```
gerrit_config:
```
podman logs --tail 20 c3ae94ef5c28
ok: [localhost]
TASK [Create temp dir for Gerrit config update] ********************************
changed: [localhost]
TASK [Set All-Project repo location] *******************************************
ok: [localhost]
TASK [Checkout All-Projects config] ********************************************
changed: [localhost]
TASK [Copy new All-Projects config into place] *********************************
ok: [localhost]
TASK [Update All-Projects config in Gerrit] ************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -e\ngit config user.email 'admin@example.com'\ngit commit -a -m 'update config'\ngit push http://admin:secret@gerrit:8080/All-Projects +HEAD:refs/meta/config\n", "delta": "0:00:00.026840", "end": "2024-04-23 07:40:07.222905", "msg": "non-zero return code", "rc": 1, "start": "2024-04-23 07:40:07.196065", "stderr": "", "stderr_lines": [], "stdout": "Not currently on any branch.\nnothing to commit, working tree clean", "stdout_lines": ["Not currently on any branch.", "nothing to commit, working tree clean"]}
PLAY RECAP *********************************************************************
localhost : ok=12 changed=3 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
```
Does the scheduler quit because it can't connect to Gerrit?
```
podman logs --tail 20 ccc15f05723a
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: Traceback (most recent call last):
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/zuul/driver/gerrit/gerriteventssh.py", line 115, in run
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: self.watcher_election.run(self._run)
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/zuul/zk/election.py", line 28, in run
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: return super().run(func, *args, **kwargs)
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/kazoo/recipe/election.py", line 54, in run
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: func(*args, **kwargs)
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/zuul/driver/gerrit/gerriteventssh.py", line 80, in _run
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: client.connect(self.hostname,
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/paramiko/client.py", line 377, in connect
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: to_try = list(self._families_and_addresses(hostname, port))
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/paramiko/client.py", line 202, in _families_and_addresses
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: addrinfos = socket.getaddrinfo(
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: socket.gaierror: [Errno -2] Name or service not known
```
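The traceback ends in `socket.getaddrinfo` raising `gaierror`, so the failure is a name lookup rather than a refused SSH connection. A minimal sketch of the same check, which could be run from another container on the same podman network, might look like this. The function name `can_resolve` and the injectable `resolver` parameter are hypothetical. The hostname `gerrit` and port `29418` are assumptions taken from the push URL and port mapping in the output above, not from the scheduler's actual configuration.

```python
import socket


def can_resolve(host: str, port: int, resolver=socket.getaddrinfo) -> bool:
    """Return True if `host` resolves, False on the gaierror seen in the traceback."""
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        # Same exception class as "[Errno -2] Name or service not known".
        return False


if __name__ == "__main__":
    # Assumed values: hostname from the git push URL, SSH port from `podman ps`.
    print(can_resolve("gerrit", 29418))
```

If this prints `False` only while the Gerrit container is stopped, that would suggest the scheduler's error is a consequence of Gerrit exiting first rather than the original cause.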
Sorry for the long message. Should I file this info somewhere else?
@picog:matrix.org * Hi guys.06:03
I'm still seeing zuul containers crash every night, would love to understand what is happening here.
```
podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f41107fe744e docker.io/gerritcodereview/gerrit:latest 22 hours ago Exited (137) 3 hours ago 0.0.0.0:8080->8080/tcp, 0.0.0.0:29418->29418/tcp zuul_gerrit_1
3b95446473e3 docker.io/library/zookeeper:latest sh -c /var/playbo... 22 hours ago Up 22 hours zuul_zk_1
3f30fca7aaa4 docker.io/library/mariadb:latest mariadbd 22 hours ago Up 22 hours zuul_mysql_1
c98938fba7bc localhost/zuul_logs:latest httpd-foreground 22 hours ago Up 22 hours 0.0.0.0:8000->80/tcp zuul_logs_1
c3ae94ef5c28 quay.io/zuul-ci/zuul-executor:latest /usr/local/lib/zu... 22 hours ago Exited (2) 22 hours ago zuul_gerritconfig_1
5bc728574e98 quay.io/zuul-ci/nodepool-launcher:latest sh -c /var/playbo... 22 hours ago Up 22 hours 0.0.0.0:8005->8005/tcp zuul_launcher_1
ccc15f05723a quay.io/zuul-ci/zuul-scheduler:latest sh -c /var/playbo... 22 hours ago Exited (139) 3 hours ago zuul_scheduler_1
ddfce79beac9 quay.io/zuul-ci/zuul-web:latest sh -c /var/playbo... 22 hours ago Up 22 hours 0.0.0.0:9000->9000/tcp zuul_web_1
a718dabae24b localhost/zuul_executor:latest sh -c /var/playbo... 22 hours ago Up 22 hours zuul_executor_1
```
gerrit container logs (Can't see anything interesting)
```
[2024-04-23T16:08:56.233Z] [HTTP POST /a/changes/zuul-config~master~I875745a421c5eba6457e10b93d2c56e43373aff7/revisions/8096783323b3f1b2ef39ec67d0bb (zuul from 10.89.0.1)] INFO com.googlesource.gerrit.plugins.hooks.HookFactory : hooks.commentAddedHook resolved to /var/gerrit/hooks/comment-added [CONTEXT project="zuul-config" request="REST /changes/*/revisions/*/review" ]
[2024-04-23T16:38:23.222Z] [HTTP POST /changes/zuul-config~261/revisions/1/submit (pgeyer from 172.16.20.10)] INFO com.googlesource.gerrit.plugins.hooks.HookFactory : hooks.submitHook resolved to /var/gerrit/hooks/submit [CONTEXT SUBMISSION_ID="261-1713890303193-993ea61c" project="zuul-config" request="REST /changes/*/revisions/*/submit" ]
[2024-04-23T16:38:23.507Z] [HTTP POST /changes/zuul-config~261/revisions/1/submit (pgeyer from 172.16.20.10)] INFO com.googlesource.gerrit.plugins.hooks.HookFactory : hooks.changeMergedHook resolved to /var/gerrit/hooks/change-merged [CONTEXT SUBMISSION_ID="261-1713890303193-993ea61c" project="zuul-config" request="REST /changes/*/revisions/*/submit" ]
[2024-04-23T16:52:17.489Z] [SSH git-receive-pack /zuul-config (pgeyer)] INFO com.google.gerrit.server.git.MultiProgressMonitor : Processing changes: new: 1 (\) [CONTEXT ratelimit_period="1 MINUTES [skipped: 7]" ]
[2024-04-23T19:34:59.257Z] [SSH git-receive-pack /zuul-config (pgeyer)] INFO com.google.gerrit.server.git.MultiProgressMonitor : Processing changes: refs: 1, new: 1 [CONTEXT ratelimit_period="1 MINUTES [skipped: 4]" ]
```
gerrit\_config:
````
podman logs --tail 20 c3ae94ef5c28
ok: [localhost]
TASK [Create temp dir for Gerrit config update] ********************************
changed: [localhost]
TASK [Set All-Project repo location] *******************************************
ok: [localhost]
TASK [Checkout All-Projects config] ********************************************
changed: [localhost]
TASK [Copy new All-Projects config into place] *********************************
ok: [localhost]
TASK [Update All-Projects config in Gerrit] ************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -e\ngit config user.email 'admin@example.com'\ngit commit -a -m 'update config'\ngit push http://admin:secret@gerrit:8080/All-Projects +HEAD:refs/meta/config\n", "delta": "0:00:00.026840", "end": "2024-04-23 07:40:07.222905", "msg": "non-zero return code", "rc": 1, "start": "2024-04-23 07:40:07.196065", "stderr": "", "stderr_lines": [], "stdout": "Not currently on any branch.\nnothing to commit, working tree clean", "stdout_lines": ["Not currently on any branch.", "nothing to commit, working tree clean"]}
PLAY RECAP *********************************************************************
localhost : ok=12 changed=3 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
```
Scheduler quits because it can't connect to gerrit?
```
podman logs --tail 20 ccc15f05723a
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: Traceback (most recent call last):
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/zuul/driver/gerrit/gerriteventssh.py", line 115, in run
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: self.watcher\_election.run(self.\_run)
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/zuul/zk/election.py", line 28, in run
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: return super().run(func, \*args, \*\*kwargs)
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/kazoo/recipe/election.py", line 54, in run
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: func(\*args, \*\*kwargs)
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/zuul/driver/gerrit/gerriteventssh.py", line 80, in \_run
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: client.connect(self.hostname,
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/paramiko/client.py", line 377, in connect
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: to_try = list(self._families_and_addresses(hostname, port))
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/paramiko/client.py", line 202, in _families_and_addresses
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: addrinfos = socket.getaddrinfo(
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: socket.gaierror: [Errno -2] Name or service not known
```
Sorry for the long message, should I file this info somewhere else?
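The last frame of the scheduler traceback above is `socket.getaddrinfo` raising `gaierror`: at that moment the scheduler could not resolve Gerrit's hostname. A minimal Python sketch of the same lookup, which could be run from inside a container to check name resolution. The helper name `can_resolve` and its injectable `resolver` parameter are illustrative assumptions, not part of Zuul or paramiko.

```python
import socket


def can_resolve(host, port, resolver=socket.getaddrinfo):
    """Return (True, addresses) if host resolves, else (False, error text).

    `resolver` defaults to the same stdlib call that fails in the
    traceback; it is injectable only to make the function easy to test.
    """
    try:
        infos = resolver(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # For example "[Errno -2] Name or service not known".
        return False, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr).
    addresses = sorted({info[4][0] for info in infos})
    return True, addresses
```

Calling `can_resolve("gerrit", 29418)` from the scheduler container would exercise the same name and SSH port that appear in the `git push` URL and the `podman ps` output. A `False` result while the gerrit container is stopped would be consistent with the traceback being a symptom rather than the cause.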
@picog:matrix.orgMaybe the gerrit_config container failing is irrelevant, I see it exits soon after starting even when I restart everything. Perhaps just used to do a one time setup? 06:57
@picog:matrix.org * Oh, perhaps this is the real reason?07:11
```
Apr 24 04:41 zuul (sd-parse-elf)[3228707]: Could not parse number of program headers from core file: invalid `Elf' handle
Apr 24 04:41 zuul (sd-parse-elf)[3228707]: Could not parse number of program headers from core file: invalid `Elf' handle
Apr 24 04:41 zuul (sd-parse-elf)[3228707]: Could not parse number of program headers from core file: invalid `Elf' handle
Apr 24 04:41 zuul systemd-coredump[3228705]: [🡕] Process 3103907 (zuul-scheduler) of user 0 dumped core.
Module /usr/local/lib/python3.11/site-packages/confluent_kafka.libs/librdkafka-55260171.so.1 without build-id.
Module /usr/local/lib/python3.11/site-packages/confluent_kafka.libs/librdkafka-55260171.so.1
Module /usr/local/lib/python3.11/site-packages/confluent_kafka/cimpl.cpython-311-x86_64-linux-gnu.so without build-id.
Module /usr/local/lib/python3.11/site-packages/confluent_kafka/cimpl.cpython-311-x86_64-linux-gnu.so
Module /usr/local/lib/python3.11/site-packages/google/_upb/_message.abi3.so without build-id.
Module /usr/local/lib/python3.11/site-packages/google/_upb/_message.abi3.so
Stack trace of thread 88:
#0 0x00007f8311ee583c n/a (/usr/lib/x86_64-linux-gnu/libc.so.6 + 0x8883c)
ELF object binary architecture: AMD x86-64
```
-@gerrit:opendev.org- Felix Edel proposed:07:57
- [zuul/zuul] 916744: Visualize branches in ChangeQueues https://review.opendev.org/c/zuul/zuul/+/916744
- [zuul/zuul] 916867: Implement admin actions (promote, dequeue) in new QueueItem component https://review.opendev.org/c/zuul/zuul/+/916867
-@gerrit:opendev.org- Felix Edel proposed: [zuul/zuul] 916867: Implement admin actions (promote, dequeue) in new QueueItem component https://review.opendev.org/c/zuul/zuul/+/91686708:02
-@gerrit:opendev.org- Christian Mueller proposed: [zuul/nodepool] 916801: WIP: enable EC2 Fleet API https://review.opendev.org/c/zuul/nodepool/+/91680109:30
@fungicide:matrix.orgwe don't bound the confluent-kafka version in our image builds, so based on a reading of https://pypi.org/project/confluent-kafka/2.3.0/#files we should presumably be installing confluent_kafka-2.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl in our container images12:44
@fungicide:matrix.orgthat's the latest version since october 202312:44
@fungicide:matrix.org * and that wheel does bundle a pre-built copy of `confluent_kafka.libs/librdkafka-55260171.so.1`12:56
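For reference, the wheel filename quoted above encodes which interpreter and platform it targets, in the form `{name}-{version}-{python tag}-{abi tag}-{platform tag}.whl`, with `.`-separated alternatives inside a tag. A rough Python sketch that splits it apart, assuming the optional build tag is absent. The helper `split_wheel_name` is illustrative and not a packaging API.

```python
def split_wheel_name(filename):
    """Split a wheel filename into its tag components.

    Assumes the five-field form without the optional build tag.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    name, version, py_tag, abi_tag, platform_tag = filename[:-4].split("-")
    return {
        "name": name,
        "version": version,
        "python": py_tag.split("."),
        "abi": abi_tag.split("."),
        "platform": platform_tag.split("."),
    }


wheel = ("confluent_kafka-2.3.0-cp311-cp311-"
         "manylinux_2_17_x86_64.manylinux2014_x86_64.whl")
print(split_wheel_name(wheel))
```

The `cp311` tags line up with the `python3.11` paths in the core dump's module list.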
@fungicide:matrix.orgthere was a protobuf release much more recently, but still nearly a month ago so seems unlikely this is a new change there13:03
@picog:matrix.orgIt's probably going to be quite hard to get a debug build, right? If I switch out the python command to python3-dbg, would that help? 13:03
@fungicide:matrix.orgi'd need to look into where the base images get their python builds from. the images themselves are based on debian bookworm, but i get the impression the cpython interpreter there is not from debian's own python3 packages, they may provide a separate image layer for debugging symbols13:07
@picog:matrix.orgIt looked as if it was built from source, yes 13:07
@fungicide:matrix.orgfor approximately how long have you been observing this failure? a few days? weeks? longer?13:08
@picog:matrix.orgIt seems to happen nightly, but this is a fairly new setup, so not longer than a week. 13:09
@fungicide:matrix.orghave you checked dmesg for signs of oom killer activity or the like?13:09
@picog:matrix.orgI found the segfault when I looked through journalctl, saw nothing else. 13:10
@picog:matrix.orgAre there any specific hardware requirements for zuul? I'm running in a VM that reports a skylake cpu (probably from the host)13:12
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Vendor ID: GenuineIntel
BIOS Vendor ID: QEMU
Model name: Intel Core Processor (Skylake, IBRS)
BIOS Model name: pc-q35-2.11 CPU @ 2.0GHz
BIOS CPU family: 1
CPU family: 6
Model: 94
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 2
Stepping: 3
BogoMIPS: 4399.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti intel_ppin ibrs ibpb tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat vnmi
```
@jim:acmegating.comI build debug images of acme enterprise zuul for acme gating customers.  here is the upstream change i use to do it: https://review.opendev.org/897859  -- you can pull that change locally if you want to make your own debug image.13:13
@picog:matrix.orgVery nice. 13:15
@fungicide:matrix.orgcomparing timestamps, it looks like there's about a 15-second delay between getting disconnected from gerrit in the scheduler container log and the coredump reported in the system journal, i suppose those are close enough together to potentially be related. wonder if the coredump is merely a side effect of how the process stopped13:21
@fungicide:matrix.orgPico: could it be something simple, like some process (package upgrades?) restarting docker?13:22
@fungicide:matrix.orgi've seen unexpected docker restarts kill every running container13:23
@fungicide:matrix.orgyour podman ps says both gerrit and zuul-scheduler containers stopped around the same time13:24
@fungicide:matrix.orgthough i guess if you're running podman then there's no dockerd to get restarted13:30
@picog:matrix.orgI see this https://zuul-ci.org/docs/zuul/latest/tutorials/quick-start.html suggests 2GB of ram, but that's probably the bare minimum.13:31
I'm bumping this up to 4, or 8?
@fungicide:matrix.orgregardless, it seems like the sequence is that connections to gerrit's ssh socket stopped working (perhaps when its container suddenly stopped), and then shortly thereafter the process in the zuul-scheduler container died ungracefully (possibly due to some outside signal)13:31
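As a side note on the `podman ps` output: by the usual shell and container-runtime convention, an exit status above 128 means the process was terminated by signal `status - 128`. So `Exited (137)` on gerrit corresponds to signal 9 (SIGKILL, which is among other things what the kernel OOM killer sends) and `Exited (139)` on the scheduler to signal 11 (SIGSEGV, matching the core dump). A small Python sketch of that arithmetic; the function name `describe_exit` is made up for illustration.

```python
import signal


def describe_exit(status):
    """Map a container exit status to a signal name where applicable."""
    if status > 128:
        signum = status - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"exit code {status}"


# The three exit statuses shown in the `podman ps -a` listing.
for status in (137, 139, 2):
    print(status, describe_exit(status))
```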
@picog:matrix.org> <@fungicide:matrix.org> have you checked dmesg for signs of oom killer activity or the like?13:33
I didn't check properly before I responded, sorry. I see multiple events
@fungicide:matrix.orgPico: do you have anything tracking memory usage on that vm? maybe some internally-scheduled process caused memory usage to balloon, but why there wouldn't be oom killer messages in the journal i'm not sure13:33
@fungicide:matrix.orgoh, you see multiple oom killer events? yeah there's your next breadcrumb at least13:33
@picog:matrix.orgThere are, I was just manually scrolling through the logs and didn't see it, now that I grep, there are a few. 13:34
@fungicide:matrix.orgdo they seem to coincide with when the containers stopped? do the process names correspond to things that would be running in the containers? but yes, regardless you'll want to get that sorted13:35
@fungicide:matrix.orgyou could increase your available memory, but if there's something wrong causing utilization to grow unbounded then it may not do more than delay the issue13:36
@picog:matrix.orgYeah, let me bump it up slightly and then keep an eye on it 13:37
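One low-tech way to keep an eye on it, sketched in Python under the assumption that the VM is Linux and exposes `/proc/meminfo`: sample `MemAvailable` periodically and log it, so that OOM events can later be lined up against memory usage. The function names and the 60-second interval are arbitrary choices for illustration.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo content into a {field: value} dict (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def log_available_memory(interval=60, samples=None):
    """Print a timestamped MemAvailable reading every `interval` seconds.

    Runs forever when `samples` is None, otherwise stops after that
    many readings.
    """
    count = 0
    while samples is None or count < samples:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} MemAvailable={info.get('MemAvailable', 0) // 1024} MiB")
        count += 1
        if samples is None or count < samples:
            time.sleep(interval)
```

Running `log_available_memory()` in a terminal multiplexer or redirecting its output to a file overnight would show whether memory drains gradually or drops suddenly around the time the containers stop.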
@fungicide:matrix.orgif you can afford to have more than one vm, you might be better off moving your zuul-executor process to a separate one13:37
@fungicide:matrix.orgit could be some job the executor is running around that time consuming all your memory, for example13:37
@picog:matrix.orgFunny, I think this explains another issue I was having with playbooks just stopping without any failure reason  13:38
@fungicide:matrix.orgthe various component services for zuul are designed to be able to be distributed across the network and can scale horizontally that way to increase capacity13:38
@picog:matrix.orgOut of memory: Killed process 3288515 (ansible-playboo) total-vm:1300052kB, anon-rss:885364kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:2272kB oom_score_adj:013:38
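A quick way to summarize lines like the one above when there are several of them: a Python sketch that pulls the PID, command name and anonymous RSS out of an OOM-killer message. The regular expression is written against the single line pasted here, so treat the exact field layout as an assumption that may differ between kernel versions.

```python
import re

OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, command name and anonymous RSS for an OOM-kill line."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match["pid"]),
        "comm": match["comm"],
        "anon_rss_mib": round(int(match["anon_rss_kb"]) / 1024, 1),
    }


line = ("Out of memory: Killed process 3288515 (ansible-playboo) "
        "total-vm:1300052kB, anon-rss:885364kB, file-rss:0kB, "
        "shmem-rss:0kB, UID:0 pgtables:2272kB oom_score_adj:0")
print(parse_oom_kill(line))
```

The command name is `ansible-playbook` cut off at the kernel's 15-character limit for process names, which points at a playbook run (most likely on the executor) holding roughly 865 MiB when it was killed.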
@fungicide:matrix.orgbut the most volatile one, resource wise, is the executor of course since what it consumes will depend on job payloads13:39
@picog:matrix.orgThanks a lot, I was on a bit of a wild goose chase 13:39
@fungicide:matrix.organd if your executor is on a separate vm, you get a little bit of added protection from a job eating every last byte of ram and killing all your other services on the same system13:40
@fungicide:matrix.orgthe executor does have resource governors that can be used to try to prevent that from happening, but they're not bulletproof13:41
@picog:matrix.orgI will try that thanks 13:45
@picog:matrix.orgProbably a stupid question, but I don't immediately see the answer: where do I specify the hostname of the executor if it's running in a different vm? 13:58
@jim:acmegating.comPico: you point the components at the same zookeeper cluster.  basically they should just all have the same zuul.conf.  they will figure out how to talk to each other that way.13:59
@picog:matrix.orgOkay, interesting. 14:00
@jim:acmegating.comthough obviously, your zuul.conf shouldn't point to a zookeeper on localhost in that case; that's the hostname you'll need to set.14:10
-@gerrit:opendev.org- Christian Mueller proposed: [zuul/nodepool] 916801: WIP: enable EC2 Fleet API https://review.opendev.org/c/zuul/nodepool/+/91680115:23
@picog:matrix.org> <@jim:acmegating.com> though obviously, your zuul.conf shouldn't point to a zookeeper on localhost in that case; that's the hostname you'll need to set.15:42
There is a single zookeeper instance right?
@jim:acmegating.comPico: yes; there is a single zookeeper quorum (which is a cluster of zk servers acting in concert). in the zuul quickstart, that is configured as a quorum with a single server.  like all parts of zuul, that can be scaled as necessary.16:03
@picog:matrix.org> <@jim:acmegating.com> Pico: yes; there is a single zookeeper quorum (which is a cluster of zk servers acting in concert). in the zuul quickstart, that is configured as a quorum with a single server.  like all parts of zuul, that can be scaled as necessary.16:06
```
[zookeeper]
hosts=zk:2281
tls_cert=/var/certs/certs/client.pem
tls_key=/var/certs/keys/clientkey.pem
tls_ca=/var/certs/certs/cacert.pem
```
So if I instantiate a new vm for the executor, I just need to edit this "zk" to point to the old zk instance which will now listen on the host network.
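A sketch of that edit in Python, for what it's worth. It assumes the executor VM gets a copy of the same `zuul.conf` and that only the `hosts` value changes. `zk.example.internal` is a placeholder hostname, and whether the TLS certificates validate against the new name is not something this log settles.

```python
import configparser
import io

ORIGINAL = """\
[zookeeper]
hosts=zk:2281
tls_cert=/var/certs/certs/client.pem
tls_key=/var/certs/keys/clientkey.pem
tls_ca=/var/certs/certs/cacert.pem
"""


def repoint_zookeeper(conf_text, new_hosts):
    """Return conf_text with the [zookeeper] hosts value replaced."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    parser["zookeeper"]["hosts"] = new_hosts
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Placeholder hostname for the VM that runs ZooKeeper.
print(repoint_zookeeper(ORIGINAL, "zk.example.internal:2281"))
```

Note that `configparser` drops comments when it rewrites a file, so for a hand-maintained `zuul.conf` editing the one line by hand may be the simpler route.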
@f2ked:matrix.orgsadly, this still does not work.17:39
the nodes even make the `/tmp/console-*.log` files.
the executor logs do not mention console or the port number
could it be the web server?
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 916344: Gerrit: skip ref-updated /meta events https://review.opendev.org/c/zuul/zuul/+/91634419:50
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:21:17
- [zuul/nodepool] 916949: Add min-retention-time to metastatic driver https://review.opendev.org/c/zuul/nodepool/+/916949
- [zuul/nodepool] 916950: Add max-age to metastatic driver https://review.opendev.org/c/zuul/nodepool/+/916950
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/nodepool] 916008: Demote launch/delete timeeouts to warnings https://review.opendev.org/c/zuul/nodepool/+/91600823:08
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl: [zuul/zuul] 914947: Temporarily pin urllib3 != 2.1.0 https://review.opendev.org/c/zuul/zuul/+/91494723:10
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl: [zuul/zuul] 914947: Temporarily pin urllib3 != 2.1.0 https://review.opendev.org/c/zuul/zuul/+/91494723:11
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/nodepool] 916343: Demote launch keyscan exceptions to warnings https://review.opendev.org/c/zuul/nodepool/+/91634323:15