Monday, 2021-08-09

*** abhishekk is now known as akekane|home [05:21]
*** akekane|home is now known as abhishekk [05:22]
*** ykarel_ is now known as ykarel [05:43]
*** jpena|off is now known as jpena [07:35]
*** rpittau|afk is now known as rpittau [07:52]
*** akekane_ is now known as abhishekk [08:26]
*** ykarel is now known as ykarel|lunch [08:32]
*** ykarel|lunch is now known as ykarel [10:06]
*** jcapitao is now known as jcapitao_lunch [10:41]
*** rlandy is now known as rlandy|ruck [11:08]
<yoctozepto> I am wondering why Masakari renos do not show up https://docs.openstack.org/releasenotes/masakari-dashboard/unreleased.html the promote job has run successfully but its effects cannot be seen [11:35]
*** jpena is now known as jpena|lunch [11:39]
<fungi> yoctozepto: do you see them being built in the docs job? [11:40]
<fungi> er, nevermind. openstack projects seem to use a separate releasenotes job [11:41]
*** hberaud_ is now known as hberaud [11:44]
<fungi> ahh, the releasenotes are built as part of the docs job but then the promote-openstack-releasenotes job pulls the release notes from the created docs tarball artifact and publishes those separately [11:47]
<fungi> for example, https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_2e4/798842/3/gate/build-openstack-releasenotes/2e4842e/docs-html.tar.gz is the one https://zuul.opendev.org/t/openstack/build/7d5734d34dff4f669f0b5584aff2e15e tried to publish them from [11:48]
<fungi> and i see a release note in the unreleased.html that it includes, which is not appearing on the docs site [11:51]
<fungi> i can see the new note and a build timestamp from saturday in /afs/.openstack.org/docs/releasenotes/masakari-dashboard/unreleased.html (the read-write volume path) [11:56]
<fungi> but the docs site publishes from a read-only replica at /afs/openstack.org/docs/releasenotes/masakari-dashboard/unreleased.html which lacks that note and has a build timestamp from months ago (2021-05-24) [11:57]
<fungi> so looks like that volume may have stale content, vos release could be broken, i'll look into that possibility [11:58]
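
Editor's sketch: the check fungi describes above, comparing the read-write path under /afs/.openstack.org with the read-only replica under /afs/openstack.org, can be expressed as a small helper. This is a minimal illustration, not tooling used by the OpenDev team. The function names and the 600-second grace period are assumptions made for the example.

```python
from pathlib import PurePosixPath


def read_write_path(read_only_path: str, cell: str = "openstack.org") -> str:
    """Map /afs/<cell>/... (read-only replica) to /afs/.<cell>/... (read-write)."""
    parts = PurePosixPath(read_only_path).parts
    # Expect ('/', 'afs', '<cell>', ...); anything else is not a replica path.
    if len(parts) < 3 or parts[0] != "/" or parts[1] != "afs" or parts[2] != cell:
        raise ValueError(f"not a read-only path in cell {cell}: {read_only_path}")
    return str(PurePosixPath("/afs", "." + cell, *parts[3:]))


def replica_is_stale(rw_mtime: float, ro_mtime: float, grace_seconds: float = 600.0) -> bool:
    """Return True if the read-only copy lags the read-write copy by more than
    the grace period (hypothetical threshold chosen for this sketch)."""
    return (rw_mtime - ro_mtime) > grace_seconds


if __name__ == "__main__":
    ro = "/afs/openstack.org/docs/releasenotes/masakari-dashboard/unreleased.html"
    print(read_write_path(ro))
```

On a host with AFS mounted, the two modification times could be taken from `os.stat()` on each path. The sketch takes them as plain numbers so it runs anywhere.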
<fungi> ianw: if you're still around, where did the vos release cronjob for static site volumes move to? [12:04]
<yoctozepto> thanks fungi for handling the issue [12:05]
<fungi> oh, right, we moved it from cron to ansible, so i guess bridge initiates and logs it now [12:06]
<fungi> looks like vos release has been failing for the docs volume for at least a week (that's as far back as our log retention goes) [12:10]
<fungi> i suppose it's possible we rebooted afs01.dfw in the middle of a vos release of the docs volume when we were doing kernel updates a few weeks back [12:14]
<fungi> there's a tarballs volume update in progress since a few minutes ago, but once it completes i'll take the lock for that cronjob in a root screen session on mirror-update and start trying to manually release the docs volume [12:19]
<fungi> yeah, vos release says the vldb entry is already locked. i'll try to clear it [12:27]
<fungi> unlocked it, vos release is running with -localauth under a root screen session on afs01.dfw now [12:28]
*** jpena|lunch is now known as jpena [12:32]
<fungi> it hasn't said anything to the tty yet though, leading me to wonder if there's an existing transaction it's waiting to see complete [12:48]
<fungi> if this takes much longer i'll propose a patch to switch to serving sites from the read-write volume until we can get the read-only replicas back in sync [12:50]
<yoctozepto> thanks [13:04]
<fungi> yoctozepto: it finished. check that https://docs.openstack.org/releasenotes/masakari-dashboard/unreleased.html now shows what you expect when refreshed [13:13]
<fungi> #status log Deleted stale vldb entry for AFS docs volume and ran vos release manually to catch up the read-only replicas [13:13]
<opendevstatus> fungi: finished logging [13:13]
<yoctozepto> fungi: thanks! I assume I should be expecting it now to always stay in sync? or does it not run on promote, only on some schedule? [14:23]
<fungi> yoctozepto: the promote job writes to the read-write volume. every 5 minutes a cronjob runs to sync that read-write volume to read-only replicas. this can take upwards of a few minutes (and occasionally much longer if queued up behind a particularly large tarballs site update or something), but generally within 10 minutes after completion of the promote job you should see it reflected [14:25]
<fungi> on the website [14:25]
<yoctozepto> fungi: thanks, that's really helpful; I will report if it ever takes longer than one hour for these to appear [14:26]
<fungi> thanks [14:28]
<yoctozepto> I have a related question too; does some process try to reconcile failures such as: https://zuul.opendev.org/t/openstack/build/d0cbe655424e4b1995fdcd39eae43221 or is it only best-effort under the assumption that some later promote happens and succeeds? [14:30]
<fungi> yoctozepto: it's assumed that a subsequent release notes publication job will succeed later and include the earlier content. also while those jobs are triggered on each branch, they all build and upload notes for all branches so as long as the job succeeds later on some branch the content should be incorporated [14:34]
<fungi> as for that exact failure, i wonder if there's a problem with the afs driver on one of the executors... looking into it now [14:36]
<yoctozepto> fungi: thanks and thanks [14:36]
<fungi> looks like it ran from ze12 according to the inventory [14:36]
<fungi> no recent afs errors in its dmesg though [14:37]
<fungi> oh, i should have looked at https://zuul.opendev.org/t/openstack/build/d0cbe655424e4b1995fdcd39eae43221/console#1/0/28/localhost closely [14:38]
<fungi> rsync: rename "/afs/.openstack.org/docs/releasenotes/kolla-ansible/.victoria.html.mxBbLh" -> "victoria.html": No such file or directory (2) [14:38]
<fungi> we sometimes see that when multiple release notes jobs are running at the same time and trying to update the same path [14:39]
<fungi> and one build delete's the other build's tempfile [14:39]
<fungi> deletes [14:39]
<yoctozepto> makes sense; would need a mutex to solve for good [14:40]
<yoctozepto> anyhow, good to know the details [14:40]
<fungi> yep, the 803845,1 build for stable/wallaby was running at exactly the same tie [14:40]
<fungi> time [14:40]
<fungi> we'd need to do cross-branch mutexes to avoid this, right [14:41]
<fungi> basically there were updates for the release notes on the stable/ussuri and stable/wallaby branches of the same project trying to rsync --delete at the exact same time [14:42]
<fungi> so one deleted the other's tempfiles because they weren't expected content [14:42]
<yoctozepto> can't be truer [14:43]
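
Editor's sketch: the race described above can be reproduced in miniature without rsync. The model below is deliberately simplified and does not reflect rsync's actual implementation. One "build" writes a hidden temp file and later renames it into place, while a second "build" runs a delete pass that removes anything it does not expect in the destination. The helper names and the temp-file suffix are hypothetical.

```python
import os
import tempfile


def delete_unexpected(dest_dir, expected_names):
    """Mimic a --delete pass: remove every file this build did not expect."""
    for name in os.listdir(dest_dir):
        if name not in expected_names:
            os.remove(os.path.join(dest_dir, name))


def simulate_race():
    """Interleave two builds by hand so the failure is deterministic."""
    with tempfile.TemporaryDirectory() as dest:
        # Build A starts updating victoria.html via a hidden temp file.
        temp_a = os.path.join(dest, ".victoria.html.tmpA")
        with open(temp_a, "w") as handle:
            handle.write("notes from build A")

        # Build B runs its delete pass; build A's temp file is not in its list.
        delete_unexpected(dest, {"victoria.html", "wallaby.html"})

        # Build A now tries to rename its temp file into place.
        try:
            os.rename(temp_a, os.path.join(dest, "victoria.html"))
        except FileNotFoundError:
            return "rename failed: No such file or directory"
        return "rename succeeded"


if __name__ == "__main__":
    print(simulate_race())
```

Serializing the two builds, the cross-branch mutex discussed in the log, would prevent the delete pass from running while another build's temp file is still pending.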
*** ykarel is now known as ykarel|away [14:48]
*** jpena is now known as jpena|off [15:32]
*** rpittau is now known as rpittau|afk [16:26]
<zul> Where would I be adding an envlist for this one? https://zuul.opendev.org/t/openstack/build/005f12d679ef4891ae00d87008ca681f [16:59]
<clarkb> zul: if you go to https://zuul.opendev.org/t/openstack/build/005f12d679ef4891ae00d87008ca681f/console (it's the console button on the right side of the link you gave above) you'll see it is the tox role that is failing (it's the 'tox:' prefix that gives that away). Then you can look up that role in https://opendev.org/zuul/zuul-jobs to find the args it takes: [17:02]
<clarkb> https://opendev.org/zuul/zuul-jobs/raw/branch/master/roles/tox/README.rst [17:02]
<clarkb> looks like you want to set a var called tox_envlist on the job [17:03]
<zul> thanks [17:03]
*** rlandy|ruck is now known as rlandy|drappt [17:21]
*** rlandy|drappt is now known as rlandy [19:43]
*** rlandy is now known as rlandy|ruck [19:45]
