Friday, 2021-12-03

corvusthanks; the delta from your last review on keycloak is very small :)00:00
clarkbcorvus: for the auth token. We supply the value in the config as our token directly? or do we generate a time specific token? And if we generate a time specific token what prevents anyone from doing that (I've got the zuul client docs up and they don't seem to speak to this)00:02
clarkboh there it is zuul create-token00:02
clarkbzuul create-auth-token00:02
clarkbbut that still doesn't explain how this is ACL'd https://zuul-ci.org/docs/zuul/reference/client.html#create-auth-token00:03
corvusclarkb: my plan is to generate one token with no limit on the lifetime, and stick that in a config file on zuul0X for use with zuul-client00:05
corvusif we lose that token, we'll need to change the secret, but we can do that easily00:05
clarkbcorvus: ok, and what prevents anyone from running zuul create-auth-token and creating another token?00:06
clarkbthat is what I'm confused about. The command doesn't seem to accept a secret at all00:06
clarkbso wondering where the trust is involved00:06
corvusclarkb: oh we run that on zuul02 and it reads zuul.conf, so only we can create those tokens00:06
corvusand then the acl is here: https://review.opendev.org/82027700:06
clarkboh this isn't creating a token via the rest api00:06
clarkbcreating the token is local via the local config. Then the created token can be used via the rest api according to the rules in 82027700:07
corvuscorrect, 'zuul create-auth-token' is an admin-only command that can only be run with the actual production zuul.conf.00:07
corvusclarkb: exactly00:07
clarkbok now to find the docs on those config settings00:08
corvusand i set up the rules so we can use a single token for every tenant00:08
clarkbcorvus: if I read the docs correctly you may not actually be planning to use the create-auth-token command and instead manually generate the token?00:13
clarkbI'm basing this on the fact that you seem to be relying on the issuer to accept the token via the iss config in 820277. But create-auth-token doesn't seem to allow you to specify that? Or maybe it is automatic based on the zuul-web canonical name00:13
corvusit uses the value from the config file (issuer_id)00:14
clarkbaha00:14
clarkband we could filter this further by matching sub is admin or similar, but in this case it is sufficient to match the issuer since we are the only issuer00:14
clarkbI know we've broken down the zuul docs in terms of tutorials, guides, and reference but digging through docs for this makes me wish that everything was more centralized.00:15
clarkbI'll have to think on how to make this stuff more discoverable and easy to read in the docs. I don't have any good ideas other than I wish it wasn't in three different places right now :)00:17
clarkbcorvus: one last question. The docs say that revoking tokens is not trivial. I assume the process is basically to change the secret in the config? then old tokens won't match anymore?00:21
clarkbBasically it is doable but might require a restart?00:21
corvusyep00:22
corvus(it may as well be a shared-secret system the way we'll be using it -- but this way we don't need to implement another auth mech)00:23
clarkbcool and then if we only ever leave the token on the servers themselves the risk is roughly the same as the other side of the verification00:23
corvusyep00:23
clarkbif we move the secret off host we should time scope them00:23
corvusyeah.  hopefully instead of doing that though we just run keycloak for our local convenience :)00:24
clarkbright00:24
clarkbI'm just making sure I've got a good grasp of how this works. This has helped, thanks00:24
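A minimal sketch of the token model discussed above, assuming the tokens are HS256-signed JWTs (the log does not state the algorithm). The function names, the secret, and the issuer value are hypothetical and are not Zuul's implementation. It illustrates three points from the conversation: only a holder of the secret can mint a token, the verifier can admit a token by matching the `iss` claim, and changing the secret invalidates every token issued earlier.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: str, signing_input: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"),
                    hashlib.sha256).digest()


def create_token(secret, issuer, subject, lifetime=None):
    """Mint a token locally; requires the shared secret (hypothetical helper)."""
    claims = {"iss": issuer, "sub": subject}
    if lifetime is not None:
        # Time-scoped token: only set "exp" when a lifetime is requested.
        claims["exp"] = int(time.time()) + lifetime
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    return signing_input + "." + _b64url(_sign(secret, signing_input))


def verify_token(token, secret, expected_issuer):
    """Return the claims if the token is valid, otherwise None."""
    try:
        signing_input, signature = token.rsplit(".", 1)
        if not hmac.compare_digest(_sign(secret, signing_input),
                                   _b64url_decode(signature)):
            return None  # wrong secret, e.g. after the secret was changed
        claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    except (ValueError, IndexError):
        return None  # malformed token
    if claims.get("iss") != expected_issuer:
        return None  # the rule here matches on issuer only
    exp = claims.get("exp")
    if exp is not None and exp < time.time():
        return None  # expired
    return claims


token = create_token("example-secret", "example-issuer", "admin")
print(verify_token(token, "example-secret", "example-issuer") is not None)
print(verify_token(token, "rotated-secret", "example-issuer") is not None)
```

With no `lifetime` the token never expires, which is the "one token with no limit on the lifetime" case; passing a lifetime gives the time-scoped variant suggested for tokens that leave the host.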
clarkband for keycloak we'd add a new auth opendevkeycloak entry or similar that is driver openidconnect. Then in our admin rules we would configure it to look for that issuer and probably specific users00:25
clarkband then token issuance happens via keycloak's api00:25
clarkbwhatever that method might be00:25
corvusclarkb: yep, and we can do stuff with keycloak groups and zuul tenants, etc.00:26
Clark[m]Related to understanding how things work John Carmack did a commencement speech recently where he talks about not being afraid to dig into details and actually understand how things work. I'm sure I'll never have as much understanding as he does, but I find that I enjoy technology a lot more when I don't treat it as magic but instead as largely decipherable tools. https://www.youtube.com/watch?v=YOZnqjHkULc00:29
clarkbheh and now to the other client. Of course sometimes you wonder if software and computers truly are decipherable :)00:30
opendevreviewIan Wienand proposed opendev/system-config master: Update bridge playbook match  https://review.opendev.org/c/opendev/system-config/+/82028100:38
opendevreviewIan Wienand proposed opendev/system-config master: Rename install-ansible to bootstrap-bridge  https://review.opendev.org/c/opendev/system-config/+/82028200:38
opendevreviewIan Wienand proposed opendev/system-config master: Rename install-ansible to bootstrap-bridge  https://review.opendev.org/c/opendev/system-config/+/82028200:45
ianwcorvus: sorry, lost it in the scrollback, did you want a full zuul restart, or just schedulers/web?00:46
corvusRoll sched and web only00:48
ianwok, i can do that, and gerrit, in say 1-30/2 hrs from now00:49
corvusThx. I won't be around then fyi. Or even now really :)00:50
ianwno worries.  what could possibly go wrong :)00:50
ianwas clarkb says, always good to get into the details00:50
corvusYou know where the big red reset button is00:51
*** rlandy|ruck is now known as rlandy|out00:55
kevinzclarkb: ianw: Np, I will update today for the Cert.02:30
ianwok going for some restarts02:45
ianwok, https://zuul.opendev.org/t/openstack/build/414cdcdd14bc432c9909c692a3841aed/logs pushed 3.3: digest: sha256:152fc54f4d91f938cfe6bf5a762f129f8716e05a46619a5fe31eaaca5eabd7c502:48
ianwthat matches https://hub.docker.com/layers/opendevorg/gerrit/3.3/images/sha256-152fc54f4d91f938cfe6bf5a762f129f8716e05a46619a5fe31eaaca5eabd7c5?context=explore02:48
ianw "RepoDigests": [02:50
ianw            "opendevorg/gerrit@sha256:152fc54f4d91f938cfe6bf5a762f129f8716e05a46619a5fe31eaaca5eabd7c5"02:50
ianw        ],02:50
ianwmatches on review.  so we're ready to restart with 3.3.802:51
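The digest comparison above can be sketched as a small check against `docker inspect` style JSON. The helper name and the trimmed sample document are illustrative assumptions; only the repository name and digest come from the log.

```python
import json


def image_matches(inspect_json: str, repo: str, digest: str) -> bool:
    """Return True if any inspected image carries repo@digest in RepoDigests."""
    wanted = f"{repo}@{digest}"
    images = json.loads(inspect_json)
    # RepoDigests can be missing or null for locally built images.
    return any(wanted in (image.get("RepoDigests") or []) for image in images)


# Trimmed, hypothetical inspect output holding only the field of interest.
SAMPLE = json.dumps([
    {
        "RepoDigests": [
            "opendevorg/gerrit@sha256:"
            "152fc54f4d91f938cfe6bf5a762f129f8716e05a46619a5fe31eaaca5eabd7c5"
        ]
    }
])

PUSHED = ("sha256:"
          "152fc54f4d91f938cfe6bf5a762f129f8716e05a46619a5fe31eaaca5eabd7c5")

print(image_matches(SAMPLE, "opendevorg/gerrit", PUSHED))
```

Comparing the full `repo@sha256:...` string avoids relying on tags, which can move between builds.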
ianwold image is a071a9727a9202:51
fungilgtm02:53
ianw... and back  Powered by Gerrit Code Review (3.3.8-9-g783af24727-dirty) 02:53
fungiyay! thanks02:54
ianw#status log restarted gerrit with 3.3.8 from https://review.opendev.org/c/opendev/system-config/+/819733/02:54
opendevstatusianw: finished logging02:54
ianwnow zuul02:54
*** sshnaidm is now known as sshnaidm|off02:57
ianwzuul/zuul-scheduler                                         <none>                                    0a216ce83b59   26 hours ago   491MB03:00
ianwi'm not so sure about this on zuul0103:00
ianwhttps://hub.docker.com/layers/zuul/zuul-scheduler/latest/images/sha256-25347323eeaead7f8a8ca27f5b8ffd5ee62dda5ddcb508b30f1b6390727674bb?context=explore03:00
ianwis the latest03:00
ianw          "zuul/zuul-scheduler@sha256:25347323eeaead7f8a8ca27f5b8ffd5ee62dda5ddcb508b30f1b6390727674bb"03:02
ianwit's ok, i'm just blind, that's the old image03:02
ianweverything matches03:02
ianw2021-12-03 03:02:40,151 DEBUG zuul.CommandSocket: Received b'stop' from socket03:03
wxy-xiyuanHi, @ianw, cloud you please take a look https://review.opendev.org/c/openstack/project-config/+/818723 if you're free? When I tried to add openEuler support to devstack, the team suggest that the CI cloud be ready at the same time. Before I write the job, the node should be ready first I guess.03:03
wxy-xiyuans/cloud/could03:04
ianwhttps://zuul.opendev.org/components shows zuul01 init03:06
ianw01 is up, going to restart 02 & web03:24
ianwwxy-xiyuan: sorry, i had missed that one.  one comment inline03:33
ianwz2 scheduler up, restarting web now03:39
wxy-xiyuan@ianw, big thanks! will reply soon03:40
opendevreviewwangxiyuan proposed openstack/project-config master: Add openEuler 20.03 LTS SP2 node  https://review.opendev.org/c/openstack/project-config/+/81872303:44
ianwwe now seem to have a beaker next to pipelines, which i think means it worked03:57
ianw#status log performed rolling restart of zuul01/02 and zuul-web 03:58
opendevstatusianw: finished logging03:58
fungihttps://zuul.openstack.org/status is back as well now that the fix is in place04:14
*** ysandeep|out is now known as ysandeep|ruck04:54
*** pojadhav|afk is now known as pojadhav05:00
wxy-xiyuanianw, fixed now. :)06:06
*** raukadah is now known as chandankumar06:12
*** ysandeep|ruck is now known as ysandeep|afk06:16
ianwwxy-xiyuan: one other thing; is there a reason it's not added to nl02?  as it's a new distro, we could restrict it to the rax servers to start and then roll out everywhere when it is working (i.e do it in a follow-on)07:48
ianwit is easier to debug one thing at a time07:48
wxy-xiyuanMy thought is to enable it in a small place for testing first. Once it's stable enough, we can add it to everywhere. So I just added it in nl01 for x8607:51
opendevreviewIan Wienand proposed opendev/system-config master: infra-prod: setup system-config on bridge in bootstrap job  https://review.opendev.org/c/opendev/system-config/+/82032007:52
ianwwxy-xiyuan: ok; might be worth adding it in a follow-on but mark it wip07:53
wxy-xiyuanSure07:53
ianwclarkb / fungi: ^^that's a bit of a hail mary change and i'll think on it some more; but it feels right.  basically, a infra-prod-bootstrap-bridge job should *always* run, and it should a) install the production ansible and b) update system-config to the buildset reference07:55
ianwat the moment, we use playbooks/zuul/run-production-playbook.yaml to run "install-ansible.yaml".  that actually feels wrong -- that is using the "production" ansible to install ... the production ansible07:57
ianwnow I don't think you'd actually ever notice unless we wiped out ansible on bridge, but it still feels like we've got a hidden bootstrap problem with that07:58
ianwit is a question for me if our existing install-ansible role is 100% idempotent.  it probably is but i'd want to investigate07:59
ianwby removing the file matcher and making this run unconditionally, i think we have avoided the fundamental problem of the other queues not updating the source08:00
*** jpena|off is now known as jpena08:02
ianwthe DISABLE-ANSIBLE also needs integration.  I think the place to do that is from setup-keys -- after setting up so each executor can log into bridge, each job can then check the on-bridge-disk flag file and stop itself before it goes on08:02
*** ysandeep|afk is now known as ysandeep08:07
*** ysandeep is now known as ysandeep|ruck08:08
opendevreviewIan Wienand proposed openstack/project-config master: Update the opendev/system-config tag  https://review.opendev.org/c/openstack/project-config/+/81971508:16
opendevreviewElod Illes proposed openstack/project-config master: Add rights to neutron-dynamic-routing-stable-maint  https://review.opendev.org/c/openstack/project-config/+/82035108:51
*** ysandeep|ruck is now known as ysandeep|lunch08:58
*** ysandeep|lunch is now known as ysandeep10:13
*** ysandeep is now known as ysandeep|ruck10:16
opendevreviewArnaud Morin proposed openstack/project-config master: Disable nodepool temporarily  https://review.opendev.org/c/openstack/project-config/+/82036910:44
*** rlandy_ is now known as rlandy|ruck11:14
*** arxcruz|rover is now known as arxcruz12:43
fungiianw: i've pretty much always assumed there were bootstrapping gaps for bridge.o.o, but i agree it would be good to close them where we can13:08
*** pojadhav is now known as pojadhav|brb13:48
*** pojadhav|brb is now known as pojadhav15:20
*** chandankumar is now known as raukadah15:29
clarkbianw: fungi: left a couple of thoughts on that system-config update. I think it is close to what we want, but needs a few edits. Also as noted we might be able to do it in two stages where we can confirm the first job is doing what we want before we rely on it16:08
fungii'm still thinking through how best to test the hanging newlist call16:08
clarkbfungi: Maybe hack up the test input for the test job and undo the "don't send emails"16:10
fungiyeah, maybe drop all the lists except the mailman meta-list for one site16:11
clarkbfungi: for that I think you'll want to update the inventory stuff to remove all the lists except for your lists.openinfra.dev list. Then toggle the test flag to false16:11
clarkbya exactly16:11
clarkbthen you should be in a state where it is stuck and the job will timeout. Then you can hop onto the held node and rerun the newlist manually16:11
*** marios is now known as marios|afk16:15
*** pojadhav is now known as pojadhav|dinner16:36
*** ysandeep|ruck is now known as ysandeep|out16:38
opendevreviewJeremy Stanley proposed opendev/system-config master: DNM: Reproduce mailman newlist hanging  https://review.opendev.org/c/opendev/system-config/+/82039216:38
clarkbya something along those lines should work16:39
fungiif it ever gets a node assigned16:44
fungithere it goes16:45
corvusinfra-root: frickler requested that i move the docker volume mapped directory /var/keycloak/log to /var/log/keycloak in https://review.opendev.org/819923 --   do we have a collective preference for that?16:45
corvusin some cases we have, eg, /var/log/zuul, but in others, i see us putting all the docker volume dirs under one /var/foo16:46
fungii guess it's a question of whether it's more convenient to group the mapped dirs together in one place on the host fs16:46
clarkbI don't think we've been very consistent about how to capture logs for containers. Partly because services do logging in a variety of ways. For services that write to stdout/stderr we've got syslog capturing systems that write them to dedicated files on disk. Services like zuul and gitea we capture in log dirs but as you mention they are done in different ways16:47
fungii don't have a personal preference, other than for consistency, which we already seem to lack at this point16:47
corvusyeah, that change seems to have collected a very large number of nit comments where people said "we do it this way" and in fact, we do it that way half the time, and we do it another way the other half of the time.  so i'm trying to navigate that and produce something that will actually get some +2 reviews.16:48
corvusso i'm trying to figure out what the actual right answers are16:48
corvus(i copied that from the etherpad role, btw, so everything in that change has precedent)16:49
fungii'll admit i do tend to look in /var/log first when trying to find logs, but a quick skim of the docker-compose file typically sorts me out if i don't find whatever i'm looking for16:50
corvusokay, i'll switch it then since frickler has a preference and no one else does16:50
*** jpena is now known as jpena|off16:51
clarkbFor me my biggest concerns at this point are understanding the user we're running as (1000 in this case) and ensuring we don't accidentally delete state because it was written into the ephemeral image and not a bind mount (I think we're mounting the h2 db dir?)16:51
corvusclarkb: correct16:51
clarkbOtherwise it is hard to be consistent with every application simply because they differ and referring to the configs (docker-compose in many cases) is a good way to determine that when debugging16:52
corvus(tbh, i prefer the other way considering that the actual filesystem location in the container is /opt/jboss/keycloak/standalone/log , and it's a sibling directory to data, so in my original change, they are both sibling directories in both locations)16:52
*** pojadhav|dinner is now known as pojadhav16:53
opendevreviewJames E. Blair proposed opendev/system-config master: Add a keycloak server  https://review.opendev.org/c/opendev/system-config/+/81992316:54
corvusclarkb, ianw, frickler ^ i think i addressed all the comments16:54
*** marios|afk is now known as marios16:59
*** marios is now known as marios|out17:08
opendevreviewClark Boylan proposed opendev/system-config master: Add a second Zuul user in gerrit testing  https://review.opendev.org/c/opendev/system-config/+/82039517:10
clarkbThats a naive first step in testing around case sensitive usernames17:10
opendevreviewJames E. Blair proposed opendev/system-config master: Add a keycloak server  https://review.opendev.org/c/opendev/system-config/+/81992317:24
*** pojadhav is now known as pojadhav|out17:24
corvusclarkb, fungi ^ one more ps to fix the thing fungi caught17:24
fungithanks!17:25
fungiso the good news is that i was able to recreate the hanging newlist call: https://zuul.opendev.org/t/openstack/build/ef9e10d4365b4aa69afac0bd4bd149de17:34
clarkbyay17:34
fungithe ara report has the tasks up to the one which tries to create the lists since that task times out and never completes17:34
fungidoesn't actually have that task itself, so i simply assume it's hanging like we observed in production17:35
clarkbya the json only writes out when the task completes iirc17:35
clarkband we probably timed out and killed ansible before that happened due to the nag17:36
clarkb*hang17:36
clarkbfungi: I guess you can run newlist by hand next and see what it prompts for and if that is different for the mailman meta list than a normal list?17:41
clarkbit may be that there is a required value to be provided and it can't default like the normal list case17:41
fungii suspect it's more that our workaround isn't working around. i'm going to test that next17:43
clarkbfungi: also why didn't testing catch this when you added the new site? I guess that is why it is good news we replicated17:45
opendevreviewJeremy Stanley proposed opendev/system-config master: DNM: Reproduce mailman newlist hanging  https://review.opendev.org/c/opendev/system-config/+/82039217:45
fungiwell, we don't test it because we explicitly avoid sending notifications, and it's the notification sending which prompts17:46
fungii replicated it by running the test job without disabling notifications17:46
fungiwe previously thought we had replicated it and that replacing stdin with an empty string would do the same as what we had tested, i believe17:47
fungibut we may need to add </dev/null or something like that17:48
clarkboh right I forgot about that so ya removing the no email flag makes it prompt ( I wish software didn't do that, but what can you do)17:49
clarkbfungi: yes I'm fairly certain we managed to confirm it was fixed, but I suppose its possible we only convinced ourselves of that and reality was different17:49
fungiwell, what's fun is that removing the no email flag makes it prompt if it thinks you're in an interactive shell, and it looks like ansible probably goes out of its way to convince shells that's the case17:50
fungipart of the problem is that killing the newlist once it reaches the notification prompt basically has the desired effect minus notification, since the list has already been created at that point17:52
fungiso subsequent runs will see the list exists and not rerun newlist17:52
clarkbfun.17:52
fungithe effective way to test this would be to configure exim to send all messages to /dev/null or something17:54
fungiso that newlist can believe it's notifying admins17:54
clarkbfungi: and remove the test only flag? I'd be open to that17:54
clarkbbut also not sure I know how to make exim guarantee that17:54
fungiwell, we could in theory use this change or a similar one to work out the details17:55
fungiso that we avoid annoying/confusing real list admins17:55
clarkb++17:56
clarkbfungi: maybe we need to run under nohup?17:58
clarkba new patchset to your existing change should be able to test something like that. Then we can work backwards to swap out exim configs?17:59
fungiyep. that ought to work18:00
fungimmm, nohup also redirects stdout/stderr to local files automatically. that may make debugging harder18:06
fungii'll wip it for the moment while we test whether it's an effective workaround at all18:06
opendevreviewJeremy Stanley proposed opendev/system-config master: Run newlist under nohup  https://review.opendev.org/c/opendev/system-config/+/82039718:09
clarkbfungi: nohup manpage says we can redirect to other files if we prefer. We would have to switch from command to shell module in the ansible to use redirects18:13
fungiyeah, and if we can use redirects we could just </dev/null explicitly instead18:13
clarkbya18:18
fungii think we ended up using cmd and overloading stdin that way because "shell" is frowned upon?18:19
clarkbya I think the linter prefers it that way but if redirecting then you need the shell and its fine18:21
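As an illustration of the `</dev/null` idea, here is a minimal Python sketch showing that a child process whose stdin is the null device sees end-of-file at once instead of waiting on a prompt. The child command is a hypothetical stand-in for a tool that waits for Enter; it is not mailman's newlist, and whether newlist treats EOF the same way is exactly what the discussion sets out to test.

```python
import subprocess
import sys

# Hypothetical stand-in for a program that prompts and then reads one line.
CHILD = (
    "import sys; "
    "line = sys.stdin.readline(); "
    "print('got EOF' if line == '' else 'got input')"
)


def run_without_stdin(argv, timeout=30):
    """Run argv with stdin attached to the null device (like `cmd </dev/null`)."""
    return subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,   # reads return EOF immediately
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,            # guard against a hang all the same
        check=False,
    )


result = run_without_stdin([sys.executable, "-c", CHILD])
print(result.returncode, result.stdout.strip())
```

The `timeout` is a second line of defence: if the child still blocks for some other reason, `subprocess.run` raises `TimeoutExpired` rather than stalling the whole job.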
clarkbfungi: https://zuul.opendev.org/t/openstack/build/d6d023ba318e44e6b5aefd861614061e/log/job-output.txt#22193-2222118:32
clarkbfungi: maybe the "fix" here is to switch to sending a newline on the stdin18:32
fungi819923,9 looks like it failed system-config-run-keycloak when trying to start apache, but we don't collect apache logs18:32
clarkbhttps://docs.ansible.com/ansible/latest/collections/ansible/builtin/shell_module.html#parameter-stdin_add_newline but apparently it is already the default to send a newline18:33
clarkbfungi: https://opendev.org/opendev/puppet-mailman/src/branch/master/lib/puppet/provider/mailman_list/mailman.rb#L69-L93 that is what puppet was doing I think. Unfortunately no explicit stdin handling18:38
clarkbfungi: is it possible that mailman updates and/or mailman on focal is the cause of this behavior change?18:39
clarkbthat could explain why we were confident it was fixed but now it isn't18:39
fungii can certainly try running the job on bionic18:42
fungior xenial?18:43
clarkbya it would've been xenial before18:43
opendevreviewJeremy Stanley proposed opendev/system-config master: DNM: Reproduce mailman newlist hanging  https://review.opendev.org/c/opendev/system-config/+/82039218:47
opendevreviewJeremy Stanley proposed opendev/system-config master: Redirect stdin for newlist  https://review.opendev.org/c/opendev/system-config/+/82039718:47
fungiswitched the reproducer to xenial, updated the workaround to redirect stdin with a shell task18:47
opendevreviewJeremy Stanley proposed opendev/system-config master: Pipe yes into newlist  https://review.opendev.org/c/opendev/system-config/+/82039719:25
clarkbinfra-root I copied my raw notes file for the gerrit user summit into my homedir on review0219:27
clarkbI'd typically prefer to stick them in an etherpad but they are very raw and have event urls and I'm not comfortable putting them on etherpad right away19:28
clarkbif you would like them more curated on an etherpad I can work on that next week19:28
corvusclarkb: don't forget to scrub the names for gdpr compliance!  (/sarcasm -- maybe, honestly, don't know)19:30
clarkbcorvus: ya... thats one of the things since I put some names in there19:30
clarkband they seem very cautious in that community about names :)19:31
corvussimple solution: give everyone aliases from clue.  Colonel Mustard uploaded gerrit 3.4 in the office with the release script.19:32
clarkbhahahaha19:32
fungithe newlist </dev/null in https://zuul.opendev.org/t/openstack/build/052e2fa7f2ef4fa592db74ffe991136d looks like it did work (i think the prompt was printed but bypassed), though subsequent tests failed for it19:32
fungioh, but maybe that's because i didn't fix the tests for the lists i omitted19:32
clarkbfungi: ya the tests check specific sites and lists19:32
clarkbif </dev/null worked then we probably have a reasonable workaround19:33
fungii agree, i'll try switching back to that one momentarily19:34
opendevreviewJeremy Stanley proposed opendev/system-config master: DNM: Reproduce mailman newlist hanging  https://review.opendev.org/c/opendev/system-config/+/82039219:39
opendevreviewJeremy Stanley proposed opendev/system-config master: Redirect stdin for newlist  https://review.opendev.org/c/opendev/system-config/+/82039719:39
fungialso i confirmed the reproducer still reproduces on xenial, so i don't think this crept in with the focal upgrade, i think it was just never thoroughly tested19:40
fungifurther, i think corvus would make an excellent professor plum19:42
ianw> in some cases we have, eg, /var/log/zuul, but in others, i see us putting all the docker volume dirs under one /var/foo20:33
opendevreviewJeremy Stanley proposed opendev/system-config master: DNM: Reproduce mailman newlist hanging  https://review.opendev.org/c/opendev/system-config/+/82039220:34
opendevreviewJeremy Stanley proposed opendev/system-config master: Redirect stdin for newlist  https://review.opendev.org/c/opendev/system-config/+/82039720:34
ianwmy feeling on that is probably that if it's under /var/foo, /var/foo might be a separate cinder volume.  i feel like maybe gerrit/graphite are things that have separate storage volumes20:34
ianwanyway, not super fussed20:34
corvuswell, etherpad was the role i copied all that from20:35
corvus(so every thing someone disagreed with was true for the etherpad role).  there's /var/etherpad/db and /var/etherpad/www20:35
corvusi'm not super fussed either, but given the differences between roles and review comments, maybe we ought to go through and articulate a policy20:36
corvusinfra-root: public service announcement: because of all the recent 'zuul delete-state' runs, there are no autohold records in zuul, but there are some held nodes in nodepool.  might be worth a check of the nodepool nodes.20:38
ianwclarkb: thanks, will loop back on your comments, sorry yes i meant to add the nodes:[] from your prior comment on that, thanks for picking up20:39
fungiclarkb: one other thing i've noticed... even though i set my address as the testlist admin, i did *not* receive any notification from the test node. checked my mta's logs and there were no connections (not even rejections) from the node's ip address21:28
funginot sure if we're successfully blocking test nodes from sending e-mail already, or if that workaround is causing newlist not to generate the notification21:30
fungi(though it has an exit code of 0 so it didn't act like that was a failure)21:30
clarkbcorvus: thanks for the reminder I am pretty sure I have a couple I can clean up. Will check momentarily21:30
clarkbfungi: it could be the test node provider blocks smtp21:31
clarkbI have requested that nodepool delete my held nodes they should disappear momentarily21:32
clarkbfungi: maybe that is the easiest thing to do though add an iptables rule blocking port 25?21:33
clarkbleft that as a suggestion on the change so that it doesn't get missed if fungi is weekending already21:37
fungithis is how i weekend ;)21:45
clarkbfungi: if you were closer I'd take you fishing or something so that you could weekend more weekendy21:46
fungibut yeah, i agree, an egress rule blocking destination port 25/tcp before the allow all egress rule would be a great addition21:46
fungiif you were closer you wouldn't need to take me fishing, we could just cast from the yard ;)21:46
clarkbha indeed21:46
fungianyway, firewall rule is a stellar idea, far less effort than reconfiguring exim to drop outbound messages on the floor21:48
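Why the reject rule has to come before the allow-all egress rule can be shown with a toy first-match evaluator. This is a simplified model with hypothetical names, not iptables itself and not the rules from any real change.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    action: str                   # "ACCEPT" or "REJECT"
    proto: Optional[str] = None   # None matches any protocol
    dport: Optional[int] = None   # None matches any destination port

    def matches(self, proto: str, dport: int) -> bool:
        return self.proto in (None, proto) and self.dport in (None, dport)


def decide(rules: List[Rule], proto: str, dport: int,
           default: str = "ACCEPT") -> str:
    """First matching rule wins, as in an ordered firewall chain."""
    for rule in rules:
        if rule.matches(proto, dport):
            return rule.action
    return default


# Block outbound SMTP first, then allow everything else.
EGRESS = [Rule("REJECT", "tcp", 25), Rule("ACCEPT")]
# Same rules in the wrong order: the allow-all rule shadows the block.
WRONG_ORDER = [Rule("ACCEPT"), Rule("REJECT", "tcp", 25)]

print(decide(EGRESS, "tcp", 25), decide(EGRESS, "tcp", 443))
print(decide(WRONG_ORDER, "tcp", 25))
```

In this model the rejected connection attempt would still be visible in the sender's mail log, which is the property wanted for confirming that a notification was attempted.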
ianwi probably have some gerrit held nodes, they can be removed if in there, otherwise i'll clean up and test the new gerrit 3.4 images next week21:48
fungiclarkb: the main reason i brought it up is that i suspect we need to test whether it tried to deliver the message, so that we know the workaround isn't just equivalent to always doing newlist -q21:48
clarkbfungi: ah good point. Maybe we can test that on a held node?21:49
clarkbthrough manual invocations of newlist21:49
fungialso, maybe we can configure our parent job to collect exim logs21:49
fungiif i had the exim logs and they showed mailman sending notifications through exim (even if undeliverable because of firewall drop/reject) that would be enough to satisfy my concern21:50
fungiso i think that's what i'll do. a change to block outbound 25 on our test list nodes, a change to collect exim logs in all our deployment tests, and then drop the -q and associated conditional in the workaround21:52
clarkbsounds like a plan21:53
clarkbby the way my naive case insensitive username collision fails and this is apparently expected. The reason for this is while current gerrit treats existing usernames as case sensitive (therefore not breaking our existing users) it won't let you create new users that have collisions21:54
clarkbthat makes testing of the behavior changes a bit difficult, but probably good enough for now21:54
fungiin fact, it might be a good idea to just make system-config-run block outbound 25/tcp from everything?21:55
fungisystem-config-run is only inherited by test jobs, right?21:56
clarkbyes system-config-run is independent of our prod stuff21:56
clarkbthere is overlap where the roles and common group/host vars are used21:57
clarkbbut system-config-run runs distinct playbooks to setup stuff and will also put new zuul test job specific host/group vars in places21:57
fungiokay, i'll propose a single change which blocks 25/tcp outbound in system-config-run and collects exim logs21:57
fungiif that makes sense21:57
clarkbyup I think that sounds great21:58
fungithat way any of our deployment test jobs shouldn't be able to accidentally send outbound e-mail, but also if we're curious about whether something tried we can look in the log21:59
clarkbinfra-root https://review.opendev.org/c/opendev/system-config/+/820267 would be a good one to review for early next week. It upgrades gitea to 1.15.722:05
clarkbI'm a bit distracted this afternoon with parenting duties so please don't approve now unless you intend on watching it :) but I'll happily land it monday or fix issues if people find them22:05
opendevreviewJames E. Blair proposed opendev/system-config master: Add a keycloak server  https://review.opendev.org/c/opendev/system-config/+/81992322:17
corvusclarkb, fungi, ianw: i now have a definite preference for the ordering of server certs; i updated that to list the server first so that we don't have to template out more of the apache config22:18
corvus(and really, the individual server name is optional anyway; we could drop it and be fine; it's just a convenience for us when debugging with direct access)22:19
Clark[m]Ah because it can always be keycloak.opendev.org in the file path that way?22:19
corvusyep22:20
opendevreviewJames E. Blair proposed opendev/system-config master: Update letsencrypt role docs to suggest a specific order  https://review.opendev.org/c/opendev/system-config/+/82040922:26
ianwcorvus: i'm fine with adding on backups as we find out.  is there a particular reason you don't want to grab the service-status page via apache in the testinfra?  i've definitely seen things before where it was listening, but not actually responding correctly, so my preference is to do more end-to-end validation in testinfra if we can22:36
corvusianw:  oh sorry i forgot to reply to that comment... do we do that anywhere?22:40
ianwwe generally call out to curl; quite a few examples e.g. https://opendev.org/opendev/system-config/src/branch/master/testinfra/test_codesearch.py#L2322:40
ianwas this has a UI, could even do the screenshot stuff22:41
ianwi don't mind if this is a follow-on; just we do have facilities to do a lot more testing there22:41
corvushave an example with a system-status page?22:42
corvusi'm looking and can't find one22:42
ianwoh, i don't think explicitly a system-status page22:42
ianwbut we do have examples of using requests directly22:43
ianwhttps://opendev.org/opendev/system-config/src/branch/master/testinfra/test_paste.py#L35 is what i'm thinking of.  so if it's json that might be easier too22:45
corvushow about we put in a check for loading the main page (so something like "test_paste") in a followup?  I already destroyed my local test env; i think at this point it'll be easier to just look at the server when we boot it to get the correct output.  i think that's worth more than the apache status page anyway (which is only there again because i copied that from etherpad)22:50
corvusincidentally, the reason i missed that comment is that it's on line 23, and gerrit and gertty disagree on whether that file has a line 23.22:55
corvus(the final byte of that file is the newline on line 22)22:55
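The disagreement can be reproduced with two common ways of counting lines. This is only an illustration of the ambiguity, not a claim about how gerrit or gertty count lines internally; the function names and sample content are made up for the example.

```python
def count_by_terminators(text):
    """Treat each newline as ending a line: a trailing newline adds nothing."""
    return len(text.splitlines())


def count_by_separators(text):
    """Treat each newline as separating two lines: a trailing newline
    leaves an empty final line after it."""
    return len(text.split("\n"))


# A 22-line file whose final byte is the newline that ends line 22.
content = "".join(f"line {n}\n" for n in range(1, 23))

print(count_by_terminators(content))  # 22
print(count_by_separators(content))   # 23
```

A comment attached to "line 23" only has somewhere to live under the second interpretation.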
fungiseems fine to keep on the to do list for a followup change, i assume ianw would be fine with that too since he's +2'd the current change22:57
clarkbcorvus: that is a neat bug :)22:57
corvusyeah.  i'm clearly going to need to "fix" it even if i don't agree it's "broken" :)22:57
clarkbthe best bugs are the ones you fix that were never really broken in the first place22:58
corvusis it okay from a system-config ansible perspective for me to +w that now?22:58
clarkbI just sat back down after a school run and can take another look, however if only the le names moved I'm fine with a +w22:59
clarkbit also doesn't have a new server yet so this is largely a noop until we boot one right?22:59
clarkbso ya +w should be fine22:59
fungiyes, that's all23:14
ianwfollow-up is fine.  i've just been writing a talk about how amazing our testing is, so i'm attuned to it atm :)23:16
corvusshould i +w that change, or should i create the server and add it to dns and inventory first?23:17
corvusnot sure about the chicken/egg thing here23:17
corvusalso, i'm going to guess 4g for this server23:18
ianwi'd probably make the server and add to inventory then merge the change but i don't think it matters23:20
Clark[m]Ya doesn't matter too much. You'll just have to follow-up with an inventory update next23:20
corvusokay, i'll do the server first... and redo it since i just named it keystone instead of keycloak23:23
corvuser, anyone know how to run "openstack server list" on bridge?23:28
fungihuh, yeah, i'm getting a "temporary failure in name resolution"23:30
corvusoh okay so that was supposed to work23:30
fungisudo ~fungi/osc/bin/openstack --os-cloud openstackci-vexxhost --os-region-name ca-ymq-1 server list23:30
fungii installed latest osc in a venv there23:31
corvusoh should i make this in vexxhost and not rax dfw?23:31
fungioh, i see, i think the name resolution failures are something to do with osc being installed in a docker container?23:31
fungi/usr/local/bin/openstack on bridge is a wrapper script calling docker run23:32
corvusoh, so running your command with rax instead of vexx might work?23:32
fungithere may be dns resolver configuration problems within the container itself23:32
corvusrunning `~fungi/osc/bin/openstack --os-cloud openstackci-rax` as root does not work for me23:33
corvus`Version 2 is not supported, use supported version 3 instead.`23:33
fungiyeah, i was just trying to figure that out as well23:34
fungii wonder if rackspace changed their keystone23:34
corvussomehow launch-node works tho23:34
fungiyeah, our clouds.yaml seems to set identity_api_version: 223:35
corvusokay, i managed to find the right rackspace web login and deleted the server23:37
fungii'm betting we need to update the clouds.yaml now23:38
ianwi am 100% sure we've had this problem with the openstack wrapper on bridge before23:41
ianwi just can not find any details about it23:42
ianwi feel like we might have done something like a docker restart and it started working23:43
fungioh, i think the "Version 2 is not supported, use supported version 3 instead." error is coming from osc itself. rackspace still uses/needs keystone v2 api, so you have to use an old osc release to talk to rackspace23:43
fungiit seems like osc has given up on the idea of backward compatibility there23:45
Clark[m]This is why shade existed and it makes me wonder if we need to resurrect that sort of idea with the sdk team23:47
fungithough i can't be sure, as i'm not actually finding that error string in osc23:51
Clark[m]It would be in the keystoneauth library23:54
Clark[m]And ya grepping those repos is often an exercise in frustration because the code does too much magic23:54
fungii think it's actually bubbling up from cinderclient?23:58
fungisudo ~fungi/osc/bin/openstack --os-cloud openstackci-rax --os-region-name DFW --os-volume-api-version 3 server list23:59
fungithat's working for me now23:59
fungioverriding the volume_api_version: 2 from clouds.yaml23:59
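The override fungi describes amounts to a command-line value taking precedence over the matching clouds.yaml key. The sketch below models only that precedence rule with plain dictionaries; it is not osc's actual configuration code, the function name is hypothetical, and the two settings shown are just the ones mentioned in the chat.

```python
def effective_settings(cloud_config, cli_overrides):
    """Return the cloud settings with command-line values taking precedence.

    Options that were not given on the command line (None) leave the
    clouds.yaml value untouched.
    """
    merged = dict(cloud_config)
    merged.update({k: v for k, v in cli_overrides.items() if v is not None})
    return merged


# Values as described in the chat; the real clouds.yaml entry has more keys.
rax_config = {"identity_api_version": "2", "volume_api_version": "2"}

# Corresponds to passing --os-volume-api-version 3 on the command line.
overrides = {"volume_api_version": "3"}

settings = effective_settings(rax_config, overrides)
print(settings)
```

Only the volume API version changes here; the identity setting from the file is left as it was.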
