Friday, 2023-12-08

opendevreview  Felix Edel proposed zuul/zuul-jobs master: mirror-workspace-git-repos: Retry on failure in git update task  https://review.opendev.org/c/zuul/zuul-jobs/+/902907  08:07
opendevreview  Michal Nasiadka proposed openstack/diskimage-builder master: rocky-container: Add installation of Minimal Install group  https://review.opendev.org/c/openstack/diskimage-builder/+/899372  08:37
opendevreview  Michal Nasiadka proposed openstack/diskimage-builder master: rocky-container: Add installation of Minimal Install group  https://review.opendev.org/c/openstack/diskimage-builder/+/899372  10:21
opendevreview  James E. Blair proposed zuul/zuul-jobs master: mirror-workspace-git-repos: Retry on failure in git update task  https://review.opendev.org/c/zuul/zuul-jobs/+/902907  14:38
opendevreview  Merged zuul/zuul-jobs master: mirror-workspace-git-repos: Retry on failure in git update task  https://review.opendev.org/c/zuul/zuul-jobs/+/902907  15:04
clarkb  the gitea09 backups to the one backup server are still failing...  22:58
clarkb  I'm going to try a manual run  22:58
clarkb  it fails when run manually too, so now we know the periodic jobs aren't at fault. The row it complained about changed between the last automated run and my manual run  23:06
clarkb  after realizing I needed to set -o pipefail for accurate test results: running `bash /etc/borg-streams/mysql | gzip -9 > clarkb_test_db_backup.sql.gz` locally on the server, without piping to borg to stream off-server, succeeds. Which I expected, because the other backup host is backing up just fine  23:24
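A minimal sketch of the pipefail point above, reusing the command quoted in the log: without `set -o pipefail` the pipeline's exit status is gzip's, so a mid-stream mysqldump failure would be reported as success.

    # Sketch only: re-run the manual test so a failure anywhere in the pipe
    # is surfaced instead of being masked by gzip exiting 0.
    set -o pipefail
    bash /etc/borg-streams/mysql | gzip -9 > clarkb_test_db_backup.sql.gz
    echo "pipeline exit status: $?"   # non-zero if the dump or gzip failed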
clarkb  this leads me to think the problem has to do with the network connection between gitea09 and the vexxhost backup server causing a backlog in the stream, such that mysqldump hits a network error  23:25
opendevreview  Ghanshyam proposed openstack/project-config master: Remove retired js-openstack-lib from infra  https://review.opendev.org/c/openstack/project-config/+/798529  23:27
clarkb  I tried adding --max-allowed-packet=256M since the internets say one reason these sorts of errors can occur is having the packet size too small for a row  23:41
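For context, a hedged sketch of where that option would go; the actual contents of /etc/borg-streams/mysql aren't shown in the log, so the container name and the other mysqldump flags here are placeholders, not the real script.

    # Hypothetical dump command with the larger packet size; the container
    # name and --all-databases are assumptions, not taken from the real script.
    docker exec mariadb \
        mysqldump --all-databases --max-allowed-packet=256M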
clarkb  however, I didn't really expect that to help, because the other backup works, and if packet size were the issue I'd expect it to be a universal problem  23:42
clarkb  I undid that manual change and the server is back to the way it started. I think I'm going to need to sleep on this one. It feels like the sort of bug that bashing my head against isn't going to help, since it has to do with buffer/networking/mariadb stuff  23:44
clarkb  one thing I think we could do as a workaround is to have the backup write to a tmpfile on disk, cat the file to stream it out, then rm the file  23:44
clarkb  then we'd replace the direct mysqldump-into-borg-backup stream: have mysqldump write to disk first, then cat/zcat that file into borgbackup  23:45
clarkb  ianw: ^ fyi, struggles with the streaming backups  23:45
clarkb  not sure if you have seen similar before and might have pointers  23:45
clarkb  one thing that just occurred to me: this could be a regression in mariadb or mariadb-dump/mysqldump, since one of the things that does change over time is our mariadb container image  23:47
clarkb  I've put everything back the way it was before. I suspect this will continue to fail until we do something, or, if this is a mariadb regression, they fix it and it magically goes away.  23:51
clarkb  the more I think about it the more I like the idea of using a staging file locally. We should be able to do something like: our current docker exec command | gzip -9 > $(mktemp tmp.XXXXXXXXXX.sql.gz) && zcat $TMPFILE ; rm $TMPFILE  23:53
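A rough sketch of that staging-file idea, with the temp filename captured in a variable (as quoted above, the command substitution inside the redirect wouldn't set $TMPFILE); the dump source is assumed to be the existing /etc/borg-streams/mysql script, and the paths are placeholders rather than an agreed implementation.

    # Sketch only: dump to a local staging file, then stream it back out to
    # borg on stdout, and clean up afterwards.
    TMPFILE=$(mktemp --suffix=.sql.gz)
    bash /etc/borg-streams/mysql | gzip -9 > "$TMPFILE" && zcat "$TMPFILE"
    rm -f "$TMPFILE"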
clarkb  that said, debugging help to better understand is probably the first order of business  23:54
