Wednesday, 2021-11-03

clarkbon backup02 I think we can live with this situation considering the lists case is handled properly due to a lack of a db backup00:00
clarkbWhen we upgrade to mailman3 we'll add a db I think as part of that but we'll also switch to a new server00:00
fungiyep00:00
clarkband that new server will have a new borg backup location and we won't have problems as a result00:00
ianwi can just remove those archives, as they are old, from before i fixed it00:00
clarkbbasically that means lists is fine as is I think00:00
clarkband review01 is unfortunate but not much we can do now and we have backup01 archives00:00
ianw(the archives NOT named "lists-filesystem-*" in the lists backup, to be clear)00:01
clarkbianw: right or we can keep them and let them age out/persist as it's ok in that particular circumstance (maybe make note of it somewhere)00:01
ianwyes, actually, i can just rename it "lists-filesystem" with the same date00:01
ianwi'll do that, and same for review, to avoid confusion over this00:02
ianwand i will audit backup01 for the same issue00:02
clarkbok I'm completely off of that server fwiw so there won't be any interference from me00:02
clarkb(and the only non RO thing I did was run the prune script which logs to a file for us automatically)00:02
ianwand i'll post a change for storyboard db backup.  i'm kind of assuming that's a trove?00:02
clarkbI think it may be a local db fungi ^ ?00:02
ianwanyway, i'll look and figure it out00:03
fungiyes, it's on the server00:03
fungiwe did that to help performance00:03
clarkbianw: thanks. This was a good exercise for me to better understand this stuff. Thank you for walking me through this stuff00:03
ianwclarkb: thank you!  another set of eyes always finds interesting things00:03
clarkbI was hoping to restart nodepool-builder on nb03 today as well. I'll check on that after dinner I guess00:03
ianwdid that change pass?00:04
clarkbianw: its in the gate right now00:04
clarkb816389 specifically00:04
ianwok, no worries.  i can watch and pull the new container when it promotes and check it out00:04
clarkbI don't mind checking really quickly after dinner. Then can hand off or defer to tomorrow if this finds a new problem :)00:05
ianwif i get the f35 change in, i will want to bump a new nb container with a later dib release in it to build that too00:05
clarkbalright food is here. I'll check in in a bit on the nodepool thing00:05
clarkbthanks again00:05
ianwok, step 1, backups01 doesn't have this issue : https://paste.opendev.org/show/810352/00:13
ianw*** /opt/backups/borg-wiki-update-test/backup00:14
ianwwiki-upgrade-test-filesystem00:14
ianwwiki-upgrade-test-filesystem-2021-02-1600:14
ianwis weird though00:14
ianwthat is coming from "wiki-upgrade-test-filesystem-2021-02-16T02:56:09.checkpoint Tue, 2021-02-16 02:56:11 [c444a0765e5791f3f68f08624d1efd80bf8a3ebc96bb225f08e4013befa2b460]"00:16
ianwnot sure where ".checkpoint" comes from ...00:16
ianwhttps://borgbackup.readthedocs.io/en/stable/faq.html#if-a-backup-stops-mid-way-does-the-already-backed-up-data-stay-there00:17
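The FAQ entry linked above covers what happens when a backup stops partway, which is where the ".checkpoint" archive name comes from. A minimal sketch of skipping such names when working through a listing, assuming one archive name per line and a plain ".checkpoint" suffix; the function name is hypothetical and this is not the script used on the backup servers:

```python
def skip_checkpoints(archive_names):
    """Return only the archive names that do not end in '.checkpoint'."""
    # Assumption: an interrupted backup leaves "<name>.checkpoint" behind.
    return [name for name in archive_names if not name.endswith(".checkpoint")]


names = [
    "wiki-upgrade-test-filesystem-2021-02-16T02:56:09.checkpoint",
    "lists-filesystem-2021-01-14T05:51:03",
]
kept = skip_checkpoints(names)
print(kept)
```

Filtering on the suffix alone keeps the check independent of whatever prefix or timestamp the archive carries.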
clarkbok I've pulled the new arm64 builder image and started services there00:31
clarkbit seems to be running and not dying in a loop00:31
clarkbit is having some trouble deleting images in osuosl because they are in use outside of the glance store so we may have leaked images there00:32
clarkbbut things seem to be running so I can look at that tomorrow.00:32
ianwthanks00:34
opendevreviewIan Wienand proposed opendev/system-config master: borg-backup: skip .checkpoint archives  https://review.opendev.org/c/opendev/system-config/+/81642201:45
corvusianw: i'm going to restart zuul for some more interactive debugging01:47
ianwokdokie01:47
corvusrestarting now01:52
corvusre-enqueing02:03
ianwi've done like "/opt/borg/bin/borg rename /opt/backups/borg-lists/backup::lists-2021-01-14T05:51:03 lists-filesystem-2021-01-14T05:51:03" on backup01; all borg-lists archives are now correctly named02:07
fungiawesome, thanks for straightening that out02:09
corvusi'm starting a second scheduler, so status pages will produce sporadic errors for a bit02:10
corvusre-enqueue complete02:12
ianwok, same done for review01 on backup02 (sorry, prior message was about backup02 as well)02:13
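The rename described above follows a simple pattern: insert "filesystem" between the archive prefix and its timestamp. A rough sketch of building that `borg rename` invocation, assuming archive names of the form `<prefix>-<YYYY-MM-DDTHH:MM:SS>`; the helper names and the regular expression are illustrative assumptions, while the borg path, repository and archive name are the ones quoted in the channel:

```python
import re

# Assumed archive name shape: "<prefix>-<ISO timestamp>", e.g. "lists-2021-01-14T05:51:03".
ARCHIVE_RE = re.compile(
    r"^(?P<prefix>.+?)-(?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})$"
)


def filesystem_name(archive, suffix="filesystem"):
    """Return the corrected archive name, or None if no rename is needed."""
    match = ARCHIVE_RE.match(archive)
    if match is None:
        return None  # unexpected shape (e.g. a .checkpoint archive): leave alone
    prefix = match.group("prefix")
    if prefix.endswith("-" + suffix):
        return None  # already named correctly
    return f"{prefix}-{suffix}-{match.group('stamp')}"


def rename_command(repo, archive, borg="/opt/borg/bin/borg"):
    """Build the argv for 'borg rename REPO::ARCHIVE NEWNAME' (not executed here)."""
    new_name = filesystem_name(archive)
    if new_name is None:
        return None
    return [borg, "rename", f"{repo}::{archive}", new_name]


cmd = rename_command("/opt/backups/borg-lists/backup", "lists-2021-01-14T05:51:03")
print(" ".join(cmd))
```

Returning the argument list rather than running it leaves room to review each command before anything touches the repository.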
ianwclarkb: re storyboard dbs; i mounted the backups and var/backups/mysql_backups/ is populated and looks good.  i don't think it's worth updating the puppet to put in a streaming backup job02:37
ianwi've heard rumors of it being containerised, etc. so i guess we can just update it when we pull it all under ansible control02:38
corvusokay i have managed to break zuul in a controlled manner02:42
corvusi'm continuing to debug for a bit before i restart02:43
corvusi think i grok it.  i'll revert/restart now02:48
corvusdeleting zk state02:50
corvusre-enqueing03:14
corvusdone03:18
corvusianw:  okay, i think everything's back to normal now re zuul; i'm heading out03:19
*** redrobot2 is now known as redrobot05:58
*** gibi is now known as gibi_pto_back_thu06:11
*** jpena|off is now known as jpena10:09
opendevreviewDr. Jens Harbott proposed opendev/infra-manual master: Update description of our testing environment  https://review.opendev.org/c/opendev/infra-manual/+/81645111:02
opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: cloud-init: Support growpart and pvresize for specified devices  https://review.opendev.org/c/openstack/diskimage-builder/+/81645811:15
*** dviroel|out is now known as dviroel|rover11:21
opendevreviewAlfredo Moralejo proposed openstack/project-config master: Add support for CentOS Stream 9 in nodepool elements  https://review.opendev.org/c/openstack/project-config/+/81144211:49
opendevreviewAlfredo Moralejo proposed openstack/project-config master: Add centos-9-stream nodepool image  https://review.opendev.org/c/openstack/project-config/+/81646511:49
opendevreviewMerged opendev/infra-manual master: Update description of our testing environment  https://review.opendev.org/c/opendev/infra-manual/+/81645111:53
opendevreviewMerged openstack/diskimage-builder master: fedora-container: update to Fedora 35  https://review.opendev.org/c/openstack/diskimage-builder/+/81557411:58
clarkbianw: oh got it, it's doing the old school backups so we have the db in there.15:19
opendevreviewJeremy Stanley proposed opendev/infra-specs master: Add a specification for Mailman 3  https://review.opendev.org/c/opendev/infra-specs/+/81099015:21
clarkbfungi: do you want to respond to emilienM or should I do it re gophercloud using "some" opendev resources (we're pretty well aligned against doing that at this point, but an openstack project could set up third party testing with the sdk I suppose)16:26
fungiyeah, i was just pulling up the references to link in my reply16:27
clarkbthanks I figured you were already on top of it since you tend to read that mailing list more quickly than I do hence my question :)16:27
clarkblet me know if I can help16:27
fungireplied16:41
clarkbfungi: for the mailman3 spec (catching up on the latest ps) the talk about bounce handling and all that makes me wonder if we need to explicitly test the dmarc/dkim stuff?16:42
clarkbI think kata lists just do the munging and opendev lists do the don't rewrite/add headers option. We probably want to ensure both options are still viable?16:42
clarkbI don't think we need to explicitly add that to the spec unless you feel it is important too, but wanted to call it out while I was thinking of it16:43
*** marios is now known as marios|out16:43
fungii figured we'd evaluate all of that at the time, yeah16:48
corvusfyi, my thinking on zuul now is that it would be best to try to wait until we finish the branch cache refresh work before we restart on master again.  we have hopefully found and fixed the biggest bug (re the change cache) in master, but we know we can't run with two schedulers for long without the branch cache work.  so getting that work done before restarting again puts us in a place where we can potentially run with 2 schedulers for a longer17:16
corvusperiod if everything works.  and in the interim, that gives the opendev service a breather since we had a lot of restarts recently.17:16
clarkbmakes sense17:17
fungiyeah, works for me, thanks17:21
fricklerinfra-root: ethercalc seems down (503), can someone doublecheck and possibly restart?17:36
clarkblooks like the service itself isn't running17:38
clarkbI'll restart it17:38
clarkbI don't see any logs in journalctl or in /var/log/ethercalc so not sure why17:38
clarkbfrickler: it is up now17:40
fungiwe've seen it crash due to a csv export bug in the past17:40
clarkbfungi: any idea where the logs go?17:40
fungialso our puppet-ethercalc module explicitly ensures "started" which is why it magically recovers on its own if left alone17:41
fungiapache maybe?17:41
clarkbwell it runs as a separate node process17:41
clarkball apache does is proxy17:41
fungioh, syslog i think17:41
clarkbits also possible that it just doesn't log anything and dies17:41
fungidigging17:42
fungiyeah, it's in syslog17:42
fungi/var/log/syslog.1 around 14:08:43 utc yesterday17:43
fungiError: Can't set headers after they are sent.17:43
clarkbheh so a bug in the tool. I guess restarting is what we do then17:45
fungiyeah, same error we usually see, reported a few years ago but still unfixed: https://github.com/audreyt/ethercalc/issues/62617:46
fungiwe might be able to disable excel exporting17:46
*** jpena is now known as jpena|off17:48
fricklercacti02 says: The SSL certificate for openstackid.org "(CN: subject= /CN=openstackid.org)" will expire on Dec  1 19:21:18 2021 GMT18:23
fungiyeah, clarkb gave the sysadmins for that service a heads up about it earlier today18:25
fungiwe don't host it any longer, but we do host a few systems which would be inaccessible if it were to have its cert expire, so might be good to continue monitoring it18:26
fungior maybe it's just translate.openstack.org we host which uses it at this point18:26
clarkbya I think keeping that check in place for another cycle or two is a good idea18:28
clarkbthen if they are refreshing the cert properly we might turn it off18:28
fungiour znode count is drastically reduced (by more than half) since sunday. did zuul's znode usage get more efficient?19:03
dmsimardFYI, upstream EOL of ansible 2.9 and ansible-base 2.10 have been announced: https://groups.google.com/g/ansible-announce/c/kegIH5_okmg/19:04
fungihttps://grafana.opendev.org/d/5Imot6EMk/zuul-status?viewPanel=37&orgId=1&from=now-14d&to=now19:04
fungidmsimard: thanks for the info!19:04
fungi"The planned end of life date for upstream Ansible 2.9 is May 23, 2022 which coincides with the scheduled release of ansible-core 2.13. End of life for ansible-base 2.10 will also coincide with the scheduled release of ansible-core 2.13."19:05
*** dviroel|rover is now known as dviroel|rover|afk19:45
ianwclarkb/fungi: https://review.opendev.org/c/opendev/system-config/+/816422 was a fairly easy one from yesterday after i noticed weird archives in the list.  be good if you could double check my awking20:17
ianwit looks like magnum have merged the change to stop using our mirror on all branches that currently aren't in a failing state.20:18
ianwi'll take opinions on our mirror removal with https://review.opendev.org/c/opendev/system-config/+/816416 ... we could either force-merge the other changes to the broken branches, or just leave those changes as-is20:19
ianwif anyone wants to fix the gates, they can stack their changes on top, but i feel like we've given good notification the image isn't there any more20:19
fungiyeah, i think the awk script there looks okay, maybe in the future we can replace it with inefficient and verbose code which is a bit less inscrutable20:23
ianwprobably a "borg list-archives" command would be useful, that just gives the unique archive names in the backup20:25
ianw... having been prompted and actually reading borg list ...20:26
ianwpossibly --format '{name}{NL}' does that?20:27
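If `borg list --format '{name}{NL}'` does print one archive name per line, the unique archive series could be derived from that output along these lines. This is a sketch only: the timestamp pattern, the function name and the second sample name are assumptions for illustration, and it is not the awk from the change under review:

```python
import re

# Assumed trailing "-<ISO timestamp>", optionally followed by ".checkpoint".
STAMP_RE = re.compile(r"-\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.checkpoint)?$")


def unique_prefixes(listing):
    """Return archive prefixes in first-seen order, with timestamps stripped."""
    seen = []
    for line in listing.splitlines():
        name = line.strip()
        if not name:
            continue
        prefix = STAMP_RE.sub("", name)
        if prefix not in seen:
            seen.append(prefix)
    return seen


sample = (
    "lists-filesystem-2021-01-14T05:51:03\n"
    "lists-filesystem-2021-01-15T05:51:03\n"  # hypothetical second archive
    "wiki-upgrade-test-filesystem-2021-02-16T02:56:09.checkpoint\n"
)
prefixes = unique_prefixes(sample)
print(prefixes)
```

Stripping the optional ".checkpoint" together with the timestamp folds checkpoint archives into their parent series instead of reporting them as a separate name.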
clarkbdmsimard: I believe we're on ansible 4? something like that20:28
*** dviroel|rover|afk is now known as dviroel|rover21:29
*** dviroel|rover is now known as dviroel|rover|out22:18
clarkbianw: I've approved the .checkpoint fixup after testing the awk locally. I think that is correct but maybe we should run a noop prune after it lands?22:21
clarkbby testing the awk I mean echo FOO | awk where FOO has a .checkpoint or it doesn't22:21
clarkbI didn't check it with actual borg listings22:21
ianwclarkb: thanks, good idea; backup01 has one 22:26
opendevreviewMerged opendev/system-config master: borg-backup: skip .checkpoint archives  https://review.opendev.org/c/opendev/system-config/+/81642222:45
ianwi'm just running that noop prune test now23:29
opendevreviewMerged opendev/system-config master: Add Fedora 35 mirror  https://review.opendev.org/c/opendev/system-config/+/81640423:42
ianwi'll run that ^  i've noticed that heat is using fedora images (https://review.opendev.org/c/openstack/heat/+/816592)23:50
