Tuesday, 2022-11-29

*** persia is now known as Guest28500:19
*** rlandy is now known as rlandy|out00:29
ianwreally can't tell the difference between the working issue request and the failing one00:58
ianw[Mon Nov 28 23:55:46 UTC 2022] Dns record not added yet, so, save to /etc/letsencrypt-certs/docs.airshipit.org_ecc/docs.airshipit.org.conf and exit.01:03
ianwhrm ... i wonder if it's writing out the tokens into  /etc/letsencrypt-certs/docs.airshipit.org_ecc/docs.airshipit.org.conf but then reading  /etc/letsencrypt-certs/docs.airshipit.org/docs.airshipit.org.conf01:05
ianwso when we renew, it doesn't see those tokens01:05
clarkbit's odd it would work sometimes if that is the case?01:12
*** clarkb is now known as Guest29801:19
*** Guest298 is now known as clarkb01:20
ianwon the issue -> [Mon Nov 28 23:55:45 UTC 2022] DOMAIN_PATH='/etc/letsencrypt-certs/docs.airshipit.org_ecc'01:31
ianwon the renewal -> [Mon Nov 28 23:58:00 UTC 2022] DOMAIN_PATH='/etc/letsencrypt-certs/docs.airshipit.org'01:32
ianwalmost certainly has to do with https://github.com/acmesh-official/acme.sh/commit/ec0e871592d286b8cd4e1d407411f0ba17c775e601:32
ianwi've filed https://github.com/acmesh-official/acme.sh/issues/441602:04
ianwi think this works when we re-run it, because then it's using the tokens from the last run02:05
ianwi will pin static to 3.0.5 and run it manually, my intuition is that it will all work02:05
ianwyep02:09
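A minimal sketch of the suspected mismatch, not acme.sh's actual code. The helpers `domain_path` and `conf_path` and the `ecc` flag are hypothetical; only the two directory names come from the log lines quoted above. It shows how tokens saved under the `_ecc` directory at issue time would not be found if the renewal looks in the unsuffixed directory.

```python
import posixpath

CERT_HOME = "/etc/letsencrypt-certs"  # base directory seen in the quoted log lines


def domain_path(domain: str, ecc: bool) -> str:
    """Hypothetical helper: per-domain directory, with or without the _ecc suffix."""
    suffix = "_ecc" if ecc else ""
    return posixpath.join(CERT_HOME, domain + suffix)


def conf_path(domain: str, ecc: bool) -> str:
    """Path of the domain conf file inside the per-domain directory."""
    return posixpath.join(domain_path(domain, ecc), domain + ".conf")


# Where the issue step saved the tokens versus where the renewal looked.
issue_conf = conf_path("docs.airshipit.org", ecc=True)
renew_conf = conf_path("docs.airshipit.org", ecc=False)

print(issue_conf)
print(renew_conf)
print("same file:", issue_conf == renew_conf)
```

If the two paths differ, the renewal step reads a conf file that does not hold the tokens from the issue step, which matches the DOMAIN_PATH values seen above.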
*** atmark is now known as Guest30502:10
opendevreviewIan Wienand proposed opendev/system-config master: letsencrypt: pin acme.sh to 3.0.5  https://review.opendev.org/c/opendev/system-config/+/86593002:33
opendevreviewIan Wienand proposed opendev/system-config master: letsencrypt: more echos in driver output  https://review.opendev.org/c/opendev/system-config/+/86593603:56
opendevreviewIan Wienand proposed opendev/system-config master: letsencrypt: add variable for more verbose debugging  https://review.opendev.org/c/opendev/system-config/+/86593703:56
*** yadnesh|away is now known as yadnesh04:14
*** pojadhav|out is now known as pojadhav|ruck04:51
fricklerregarding the possible "use gitea for issues" discussion I think we also need to start the discussion on how we want to position ourselves between the original gitea and what seems to be becoming a major fork https://codeberg.org/forgejo/forgejo06:27
fricklerfwiw I tried yesterday to look at how codeberg does scale their gitea deployment, which does use issues and PRs and things already, but it seems they don't, running single-server so far06:28
*** ysandeep is now known as ysandeep|afk08:02
*** yadnesh is now known as yadnesh|afk08:11
*** jpena|off is now known as jpena08:23
*** yadnesh|afk is now known as yadnesh08:27
opendevreviewIan Wienand proposed opendev/system-config master: letsencrypt: more echos in driver output  https://review.opendev.org/c/opendev/system-config/+/86593609:09
opendevreviewIan Wienand proposed opendev/system-config master: letsencrypt: add variable for more verbose debugging  https://review.opendev.org/c/opendev/system-config/+/86593709:09
*** ysandeep|afk is now known as ysandeep09:36
opendevreviewCedric Jeanneret proposed opendev/base-jobs master: Add env var for ansible-galaxy proxy  https://review.opendev.org/c/opendev/base-jobs/+/86597010:38
*** rlandy|out is now known as rlandy11:05
*** dviroel|afk is now known as dviroel11:12
*** frenzy_friday is now known as frenzy_friday|rover12:15
opendevreviewMerged openstack/project-config master: Use kolla.config for kolla-ansible in gerrit  https://review.opendev.org/c/openstack/project-config/+/86568612:50
opendevreviewCedric Jeanneret proposed opendev/base-jobs master: Add env var for ansible-galaxy proxy  https://review.opendev.org/c/opendev/base-jobs/+/86597013:04
*** dasm|off is now known as dasm13:05
*** ysandeep is now known as ysandeep|afk13:16
opendevreviewMerged openstack/project-config master: Ensure NetworkManager doesn't override /etc/resolv.conf  https://review.opendev.org/c/openstack/project-config/+/86543313:19
*** pojadhav|ruck is now known as pojadhav|out13:32
*** frenzy_friday|rover is now known as frenzy_friday|rover|food13:43
*** ysandeep|afk is now known as ysandeep15:20
*** dviroel is now known as dviroel|lunch16:11
*** ysandeep is now known as ysandeep|out16:20
*** frenzy_friday|rover|food is now known as frenzy_friday|rover16:21
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Add support for nodejs testing on Debian  https://review.opendev.org/c/zuul/zuul-jobs/+/86545916:28
*** marios is now known as marios|out16:47
fungiianw: clarkb: the system-config-run-letsencrypt failure on https://review.opendev.org/865937 is baffling, what am i missing?17:00
fungiwhile the quote escaping shown by ansible looks questionable, the way that parameter is set and used seems to pre-date the change17:01
clarkbfungi: it looks like it is running the issue command which I thought we weren't supposed to do in CI17:03
fungioh, perhaps it's the condition not matching there17:04
clarkbwe might be setting the flag to talk to the staging server in that job though17:04
fungiseems like the letsencrypt_self_generate_tokens option would control that17:05
clarkbif we do that then I think we expect things to work, however it relies on talking to their staging service17:05
clarkbletsencrypt_self_generate_tokens: False is set on the job17:05
clarkbnow does it set the flag for using the staging service17:06
clarkbplaybooks/zuul/templates/group_vars/letsencrypt.yaml.j2:letsencrypt_use_staging: True17:07
clarkbI've not found anything adding those nodes to that group yet17:09
clarkbplaybooks/zuul/templates/gate-groups.yaml.j2 that file does it17:11
*** dviroel|lunch is now known as dviroel17:12
clarkbfungi: https://zuul.opendev.org/t/openstack/build/c84b753277f14c3fbc96b9ccf174bce4/log/letsencrypt01.opendev.org/acme.sh/acme.sh.log#417:12
clarkbI think it might be --debug217:12
clarkbnot --debug=217:12
fungiooh. so the quoting was a red herring17:14
fungiunfortunately no_log eating all the output from the command makes it hard to reason through from just the logs17:15
fungihttps://github.com/acmesh-official/acme.sh/wiki/How-to-debug-acme.sh says "--debug 2" though17:16
*** jpena is now known as jpena|off17:17
fungiaccording to https://github.com/acmesh-official/acme.sh/wiki/Enable-acme.sh-log there's also a "--log-level 2"17:18
clarkbfungi: I'm not sure what quoting you are looking at17:23
clarkbI just followed the errors reported by zuul and it led to the log file complaining about the invalid option17:23
fungioh, i keep forgetting about the redirected log being available. i was looking at the ara report for the failing nested ansible task, and going by how additional escaped quotes were wrapping the list of cli args being passed after the issue subcommand17:24
fungii quoted it in my first review comment17:25
fungihttps://0bf4aecb8c3e109f5411-ea455149ffcdba396338d4b9ec44d2c2.ssl.cf5.rackcdn.com/865937/2/check/system-config-run-letsencrypt/c84b753/bridge99.opendev.org/ara-report/results/491.html17:25
fungithe _raw_params string specifically17:25
fungimaybe the shell eventually ends up eating those additional quotes before tokenizing though17:26
fungibut yeah, https://zuul.opendev.org/t/openstack/build/c84b753277f14c3fbc96b9ccf174bce4/log/letsencrypt01.opendev.org/acme.sh/acme.sh.log#4 looks far more likely to be the cause of the nonzero exit, and makes sense as being the result of the proposed change17:28
fungiso maybe the option lexer in acme.sh doesn't handle = and requires space-separated option/value pairs instead?17:29
fungii'll give that a try17:29
clarkbI think --debug2 may be special17:29
clarkblet me look17:29
clarkb--debug [0|1|2|3]                 Output debug info. Defaults to 1 if argument is omitted.17:29
clarkbI guess it would be --debug 217:30
opendevreviewJeremy Stanley proposed opendev/system-config master: letsencrypt: add variable for more verbose debugging  https://review.opendev.org/c/opendev/system-config/+/86593717:31
fungicool. in which case that ^ removes the =17:31
fungiand replaces it with a space17:31
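A small sketch of the space-separated form discussed above, assuming the command line is assembled as a list of argv tokens. The helper `debug_args` is hypothetical and is not part of system-config or acme.sh; only the `--debug [0|1|2|3]` usage text comes from the help output quoted above.

```python
from typing import List, Optional


def debug_args(level: Optional[int]) -> List[str]:
    """Hypothetical helper: return the debug option as separate argv tokens."""
    if level is None:
        return []
    # Two tokens, "--debug" and "2", matching the usage text "--debug [0|1|2|3]".
    # The single-token form "--debug=2" is what the change under review used
    # when the job failed.
    return ["--debug", str(level)]


print(debug_args(2))
print(debug_args(None))
```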
clarkbany reason to not approve https://review.opendev.org/c/opendev/system-config/+/865930 ?17:33
fungii wasn't sure if i had adequately answered frickler's question17:34
clarkbI'm going to go ahead and approve it so that we can check for happy certs tomorrow and also make progress on mailman317:37
fungithanks!17:39
fungii was hoping to go ahead and approve the remaining topic:mailman3 changes as soon as that merges17:40
fricklerfungi: clarkb: I'm fine with 3.0.5 as workaround for now, I'm just concerned that it may be as difficult to move away from it again as with the grafana tag18:08
clarkbyup I'm hoping the upstream issue helps address that18:08
opendevreviewDr. Jens Harbott proposed zuul/zuul-jobs master: Use recent node version for markdownlint job  https://review.opendev.org/c/zuul/zuul-jobs/+/86605518:14
opendevreviewMerged opendev/system-config master: letsencrypt: pin acme.sh to 3.0.5  https://review.opendev.org/c/opendev/system-config/+/86593018:32
opendevreviewMerged opendev/system-config master: Add lists01.opendev.org to the inventory  https://review.opendev.org/c/opendev/system-config/+/86536119:15
opendevreviewMerged opendev/system-config master: bridge: switch OSC from container to local install  https://review.opendev.org/c/opendev/system-config/+/86560619:41
fungioh, in addition to checking blocklists, i sent a test message from a shell on the new lists01 server to my personal address to check for any obvious red flags my spamassassin filter might throw19:56
fungiinfra-pro-base failed for the production mm3 server deployment: https://zuul.opendev.org/t/openstack/build/59b05a7ff175438dbdb0e378f02fc93020:16
fungiansible log says:20:16
fungifatal: [lists01.opendev.org]: UNREACHABLE!20:16
fungiFailed to connect to the host via ssh: Host key verification failed.20:16
fungiianw: maybe we have a race condition with adding known_hosts entries and connecting to new servers?20:17
fungithere are lists01.opendev.org entries in /etc/ssh/ssh_known_hosts with a last modified time of 20:05 on the file20:18
fungibut looks like bridge probably tried to connect at 19:3720:19
fungii guess the next hourly will catch it up, but probably we need to make sure that the keys are plugged into bridge before we do any remote tasks?20:19
fungiinfra-root: draft mm3 migration announcement for the opendev and zuul lists sites is available for review near the top of https://etherpad.opendev.org/p/mm3migration now (just after the migration plan steps). feel free to mark it up and i'll incorporate changes and send sometime tomorrow if there's no disagreements and assuming we're not still struggling to get the configuration rolled out to20:21
fungithe server20:21
ianwfungi: hrm, that should be done in the "bootstrap-bridge" step, which should run before any other job20:42
ianwhrm, which didn't trigger in the buildset -> https://zuul.opendev.org/t/openstack/buildset/0709aa73d4124175b0a21146f1cfc77e20:44
ianwi'll have to think about that20:44
fungiinteresting. i guess it also happened in one of the other jobs which ran, just not early enough20:45
fungiso i suppose bootstrap-bridge needs to trigger any time a change alters inventory/base/hosts.yaml20:46
opendevreviewJeremy Stanley proposed opendev/system-config master: Also bootstrap bridge any time inventory changes  https://review.opendev.org/c/opendev/system-config/+/86606920:50
fungiianw: so that ^ ?20:50
ianwyeah, for parallel running, it should basically always trigger, as that would be the point we synchronize the system-config checkout, before letting the other jobs run against it20:50
fungioh, are you saying this is a transitional state and the parallel deploy work will address it anyway?20:51
ianwfungi: i think so -- https://review.opendev.org/c/opendev/system-config/+/820320/6/zuul.d/infra-prod.yaml would remove the matchers so the job always runs20:52
ianwbut for the time being, i think your change is correct20:52
fungii can adjust it to just do what 820320 does so there's no merge conflict20:53
fungiif you prefer20:53
ianwno that's fine, thanks, i think stick with the simple thing first20:53
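A rough illustration of the file-matcher idea above, not Zuul's implementation. The matcher list and the function `job_should_run` are hypothetical; only the path inventory/base/hosts.yaml comes from the discussion. The job runs when any changed file matches any matcher, so adding the inventory file makes bootstrap-bridge trigger on inventory changes.

```python
import re
from typing import Iterable, List

# Hypothetical matcher list; only inventory/base/hosts.yaml comes from the discussion.
BOOTSTRAP_FILE_MATCHERS: List[str] = [
    r"^inventory/base/hosts\.yaml$",
]


def job_should_run(changed_files: Iterable[str], matchers: List[str]) -> bool:
    """Return True if any changed file matches any of the regex matchers."""
    compiled = [re.compile(m) for m in matchers]
    return any(rx.search(path) for path in changed_files for rx in compiled)


print(job_should_run(["inventory/base/hosts.yaml"], BOOTSTRAP_FILE_MATCHERS))
print(job_should_run(["docs/source/index.rst"], BOOTSTRAP_FILE_MATCHERS))
```

Dropping the matchers entirely, as the parallel-deploy change is described as doing, corresponds to a job that always runs.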
fungialso it appears the osc change didn't successfully deploy either20:54
ianwrefusing to convert from file to symlink for /usr/local/bin/openstack20:55
ianwi guess i should have rm'd that first.  i think i'll just manually do that on the server20:55
fungiwfm20:56
fungiwe're never going to need to upgrade any more bridge servers with that change anyway20:57
fungii suppose if ansible's file module has something equivalent to ln's -f option, we could have used that (if we'd thought about it)20:58
ianwyeah i should probably have ensured absent: before hand.  anyway i re-ran service-bridge and it's setup now20:59
ianw(it == openstack command)20:59
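For comparison, a sketch in plain Python of what an `ln -sf` style replacement does when the destination is an existing regular file. This is not the Ansible file module and not what was run on bridge; the function name `force_symlink` is made up for the example.

```python
import os
import tempfile


def force_symlink(target: str, link_path: str) -> None:
    """Point link_path at target, replacing an existing regular file or symlink."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # Build the new link under a temporary name in the same directory,
    # then rename it over the old entry so the path never disappears.
    tmp_dir = tempfile.mkdtemp(dir=link_dir)
    tmp_link = os.path.join(tmp_dir, "link")
    try:
        os.symlink(target, tmp_link)
        os.replace(tmp_link, link_path)
    finally:
        # Clean up the temporary link if the rename did not happen.
        if os.path.lexists(tmp_link):
            os.unlink(tmp_link)
        os.rmdir(tmp_dir)
```

Removing the old file first and then creating the link also works, at the cost of a short window in which the path does not exist.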
fungithanks!21:00
ianw3.5.4 does have the speed improvements for the copy-approvals included (Split the work into slices of changes and then utilize one thread per slice. Also skip unnecessary reindexing. On a large Gerrit site with 1.3 million changes, this reduced the time for copy-approvals from 5 hours to 15-20 minutes.)21:05
ianwso perhaps hours is over-estimating now21:06
fungiwe're about 2/3 that size21:06
fungiwhat is copy-approvals?21:07
Clark[m]One thing about copy approvals is there aren't a ton of docs, so it's hard to know what timing to expect, how we know it is done, etc.21:07
fungiis that an upgrade time data migration?21:08
Clark[m]It can be run online so the suggestion is you run it well in advance of your upgrade21:12
Clark[m]A number of people had failures with it and this latest 3.5 release fixes some of those too21:12
Clark[m]But yes required to run while on 3.5 before upgrading to 3.6 and can be online or offline21:12
ianwClark[m]: afaics from looking at post 3.5.4 things in https://gerrit.googlesource.com/gerrit/+log/refs/heads/stable-3.5 -- i don't think there's any more additions to the copy-approvals bits21:13
ianwreading the threads, nothing specific i can find about 3.5.4 issues with copy-approvals21:17
ianwone thing people have been calling out is maybe running it individually for each project to make it clear if any fails21:17
ianw"Turns out the only one that failed was All-Projects... 🤔" -- https://groups.google.com/d/msgid/repo-discuss/CAALJ1-nM3K%2BgandnBOA4VSvYU5U2JuP490-xXDry6NS%2BsEATWA%40mail.gmail.com?utm_medium=email&utm_source=footer21:18
ianwmy current thinking is that after school run i'll do a test on just one project.  i'll see what sort of logs that produces and then we can have something that goes through each project in ls-projects one by one21:26
ianwi think that's a bit more conservative than letting it run globally in parallel and then having to pick apart the failures21:26
ianwwe're not in any particular rush21:27
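A sketch of the project-by-project approach described above, assuming the actual invocation is wrapped in a callable. The names `run_per_project`, `failed_projects` and `Runner` are hypothetical, and the sketch makes no assumption about how copy-approvals itself is invoked. It keeps one log file per project and records each return code so any failures are easy to single out.

```python
import os
from typing import Callable, Dict, Iterable, List, Tuple

# A runner takes a project name and returns (return_code, combined_output).
Runner = Callable[[str], Tuple[int, str]]


def run_per_project(projects: Iterable[str], run_one: Runner, log_dir: str) -> Dict[str, int]:
    """Run one invocation per project, saving each project's output to its own log file."""
    os.makedirs(log_dir, exist_ok=True)
    results: Dict[str, int] = {}
    for project in projects:
        return_code, output = run_one(project)
        # Project names contain "/", so flatten them into a single file name.
        log_name = project.replace("/", "_") + ".log"
        with open(os.path.join(log_dir, log_name), "w", encoding="utf-8") as log:
            log.write(output)
        results[project] = return_code
    return results


def failed_projects(results: Dict[str, int]) -> List[str]:
    """Projects whose invocation returned a nonzero exit code."""
    return sorted(p for p, rc in results.items() if rc != 0)
```

In practice `run_one` would wrap whatever command is used on the server, and the project list would come from ls-projects.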
Clark[m]That sounds like a great plan21:38
*** dasm is now known as dasm|off21:39
Clark[m]There are a handful of projects/changes that have problems with reindexing today. One of those may be a good option? I'm trying to remember what they are now but it's due to some changes with incomplete data iirc21:42
fungimaybe at least run it separately for the all-projects repo21:45
fungiand all-users21:45
Clark[m]I would expect those to be pretty safe since we don't do reviews on them. What copy approvals does is copy votes from old patchsets to new patchsets when those votes carry over due to trivial rebases and such21:50
Clark[m]But can't hurt to do them that way either21:51
fungioh, yes then i agree it's more likely we'll see issues with repos which have been around since the dawn of 201221:57
fungisince they may not test migrating reviews from gerrit 1.x21:57
*** dviroel is now known as dviroel|out22:00
clarkbfungi: ya I think the issues they already fixed were corner cases like that22:18
ianwso the output looks like -> https://paste.opendev.org/show/btjcjw7KfDgYxRV6wc2p/22:26
clarkbianw: ah ok so the output gives you progress in the foreground and we don't need to dig through the error log presumably. That's good22:28
ianwlooking at one of the modified changes, https://review.opendev.org/c/openstack/diskimage-builder/+/89531, doesn't seem to have any side-effects (e.g. like bumping it to the top of the review list)22:28
clarkblooks like the December Tuesdays largely dodge the major holidays? December 20, 27, and January 3. That said I don't really expect everyone to be around all those days so we may still skip some or all of them22:42
clarkbfor myself I'll probably be around for December 20 and January 3.22:43
fungiyah i'll be around but happy to skip22:50
clarkbfrickler: post /opt movement I'm seeing / 37GB total, 3.8GB used, 32GB available /opt is ~12GB used out of 74GB total. Pre /opt movement / is 37GB total, 14GB used, 22GB available.22:50
clarkbPartitioning and formatting isn't terribly slow so we could theoretically continue to make a swap partition and add the 74GB /opt and somehow mount the preexisting /opt over that22:51
clarkbcan you bind mount in the same namespace with overlapping names?22:52
ianwok, i have a small script running in a screen on review, output to ~ianw/copy-approvals22:53
fungicow mount the extra fs over top of /opt maybe?22:53
clarkbfungi: is that doable? that might be the ticket22:53
opendevreviewJames E. Blair proposed ttygroup/gertty master: WIP: support inline comment threads  https://review.opendev.org/c/ttygroup/gertty/+/86058822:53
clarkbianw: and that script is doing copy approvals for everything?22:53
fungii totes made it up just now, but maybe?22:53
clarkbfungi: looking at mount's manpage for --bind I don't see anything allowing one to be more specific than a file path which complicates things if we don't want to break existing paths22:54
ianwclarkb: yep, it's looping through each project; ~ianw/copy-approvals/logs has the output for each22:54
clarkbianw: I wonder if we should rerun it afterwards with no project filters just to be sure there aren't any hidden "projects" that gerrit will want to run it against too. That seems unlikely but gerrit is good at surprising me22:55
ianwsure, can do22:55
ianwso far everything seems to be working -- or at least the command is giving a return code of 022:55
corvusfungi: mm announcement email lgtm22:55
ianwhasn't hit openstack/nova yet though :)22:56
clarkbfungi: mailing list work announcement lgtm as well22:57
fungithanks!22:57
corvusfungi: the latest update to 860588 should have significant improvements to the comment editing.  i'd say the patch is about 80% complete now  (the diff screen needs more work), but i believe at this point there should be no more disappearing comments.  unified diff mode is lagging behind side-by-side a bit.22:59
corvusat this point, while i wouldn't recommend it for the casual user, it should work well enough without any hidden gotchas for the adventurous user.22:59
corvusclarkb: re the gerrit upgrade topic -- maybe something to put on the todo list for the time that we spend on 3.6 is to upgrade our project.config files to match the 3.7 expectations (even though it shouldn't break existing projects, it would be good to have the same configs for old + new)23:01
clarkbcorvus: I think I parse what you mean but to be sure you're saying once we are on 3.6 we should take a moment to synchronize acls for 3.7's expectations? if so I agree23:02
corvusyep exactly23:03
clarkbfungi: not having openstacksdk listed in https://review.opendev.org/c/opendev/system-config/+/865320/5/launch/pyproject.toml means that pip won't upgrade it to latest because 0.102 will be sufficient I believe23:03
clarkbfungi: do we want to explicitly ask for >=0.103 or manually update it. I'm leaning towards being explicit as we've found this whole thing to be confusing over time23:03
ianwprobably not a bad idea to drop a comment with a > just for context23:09
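A simplified illustration of why an explicit minimum matters. It compares purely numeric dotted versions only and is not how pip evaluates version specifiers; the function names and the example version strings are hypothetical. If the installed version already satisfies the stated requirement, nothing prompts an upgrade.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a purely numeric dotted version such as "0.102.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version already meets the minimum, so no upgrade is needed."""
    have, want = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that "0.103" and "0.103.0" compare equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want


print(satisfies_minimum("0.102.0", "0.102"))  # already satisfied, stays as-is
print(satisfies_minimum("0.102.0", "0.103"))  # explicit minimum forces an upgrade
```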
ianwok, finished for all projects.  none returned an error code, which is nice23:21
clarkbnice23:23
ianw$ cat * | grep updated | wc -l23:26
ianw558023:26
ianwit doesn't really seem like that many changes were really updated23:27
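A Python rendering of the shell pipeline above, for cases where the count needs to be scripted. The function name `count_matching_lines` is made up, and the only assumption about the log format is that relevant lines contain the word "updated", as in the grep.

```python
import glob
import os


def count_matching_lines(log_dir: str, needle: str = "updated") -> int:
    """Count lines containing `needle` across all regular files directly in log_dir."""
    total = 0
    for path in sorted(glob.glob(os.path.join(log_dir, "*"))):
        if not os.path.isfile(path):
            continue
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            total += sum(1 for line in handle if needle in line)
    return total
```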
clarkbit's only the subset of changes that have votes that need to be forward ported to newer patchsets23:29
clarkbthe old system apparently looked at all patchsets and figured out which votes apply to the current one but current Gerrit wants to reapply those votes as those events occur so it can view only the current patchset23:30
clarkbianw: I left comments on https://review.opendev.org/q/topic:rax-rdns23:30
clarkbthe first change in the stack has a bug otherwise it's mostly me just talking out loud23:30
ianwok thanks, will go over 23:31
clarkband I've approved the identity api filter removal for rax23:32
corvusclarkb: okay i dug into the gerrit 3.7.0 thing -- the latest is that it's accepted as a gerrit bug, nothing for zuul to fix.  bug here: https://bugs.chromium.org/p/gerrit/issues/detail?id=16475 patch here: https://gerrit-review.googlesource.com/c/gerrit/+/35301723:38
corvusclarkb: i think our action is to wait until that lands (or possibly we could do a test build if we want to double check that patch)23:38
clarkbI expect the original reporter may be able to test or we could reach out and offer to help them test?23:41
corvusluca requested they test23:41
clarkbah perfect if a test is added in the fix change that is probably good enough for our needs23:41
corvus(but also, luca tested locally and plans to add a unit test, so yeah, i think that sounds good enough unless someone requests otherwise)23:42
fungicorvus: thanks! i'll refresh my install23:46
corvusamusingly, i had trouble reproducing at first because my local js client was 3.6.x, but after a shift-reload saw the issue.  that may be something to keep in mind if any users report not seeing new features after upgrades :)23:47
ianwclarkb: good call on rerunning.  when rerunning it seemed to find another 24 changes to update (~ianw/copy-approvals/rerun.txt)23:47
fungiclarkb: ianw: that's a good point, we likely want to force newer sdk in order to make sure it actually upgrades, or completely recreate that venv on update23:48
ianwi might re-run it *again* to see if it has become idempotent23:49
fungiyeah, it could keep finding those same 24 every time now23:50
clarkbcorvus: I think I may have identified a bug in luca's fix. I left comments to illustrate it23:50
clarkbianw: ++23:50
clarkbianw: and maybe even rerun it again the day before we upgrade or something to make sure old data isn't being inserted somehow23:51
*** rlandy is now known as rlandy|out23:51
corvusclarkb: i'll repro your situation and see what stream-events emits23:51
corvusoh er that won't be useful without luca's patch, nevermind23:52
clarkbI only realized this because I looked at luca's comment and realized I could reply to it.23:53
corvusclarkb: fwiw, i believe that matches the existing behavior -- here's davido's original fix from a few years ago: https://gerrit-review.googlesource.com/c/gerrit/+/291144/16/java/com/google/gerrit/server/restapi/change/PostReview.java23:53
corvusso i agree with you that it's potentially weird, but also, it's probably not a regression from the current state23:54
clarkbneaty23:55
clarkb*neat23:55
corvusa potential way to disambiguate would be to look for the patchset level comment with no in_reply_to set23:55
corvus(now you know why i batched this work with the gertty work -- i paged all that in at the same time)23:55
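A sketch of the disambiguation idea above, under stated assumptions: comments are held in a dict keyed by file path, patchset-level comments sit under the key "/PATCHSET_LEVEL", and each comment is a dict that carries `in_reply_to` only when it is a reply. The function name and the sample data are hypothetical, and this is neither Gerrit's nor Zuul's code.

```python
from typing import Dict, List

# Assumed key under which patchset-level comments are grouped.
PATCHSET_LEVEL = "/PATCHSET_LEVEL"


def top_level_patchset_comments(comments_by_path: Dict[str, List[dict]]) -> List[dict]:
    """Return patchset-level comments that are not replies to another comment."""
    return [
        comment
        for comment in comments_by_path.get(PATCHSET_LEVEL, [])
        if not comment.get("in_reply_to")
    ]


# Made-up sample data for illustration only.
sample = {
    PATCHSET_LEVEL: [
        {"id": "a1", "message": "recheck"},
        {"id": "b2", "message": "replying here", "in_reply_to": "a1"},
    ],
    "README.rst": [{"id": "c3", "message": "typo", "line": 3}],
}
print([c["id"] for c in top_level_patchset_comments(sample)])
```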
clarkbyou're probably the expert on this now too :P23:56
* corvus hides23:56
