Thursday, 2022-10-27

opendevreviewIan Wienand proposed opendev/system-config master: inventory: add host keys  https://review.opendev.org/c/opendev/system-config/+/86276200:07
corvusinfra-root: there seems to be a zuul issue00:35
corvusi think nodepool is not connected to zk00:36
corvus2022-10-27 00:35:45,250 WARNING kazoo.client: Connection dropped: socket connection error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)00:36
ianwthis seems suspiciously like something a new bridge would have done ...00:36
corvusah -- maybe it created a new CA00:37
corvusi think if the ca directory wasn't manually copied over, it would probably do that00:37
ianwwe just fixed the nodepool job -> https://zuul.opendev.org/t/openstack/builds?job_name=infra-prod-service-nodepool&project=opendev/system-config00:38
corvusif that's the case, then we might have restarted nodepool but not the zk servers; so we might need to restart everything?00:38
ianwi did not copy a CA directory.  which one so i can update https://etherpad.opendev.org/p/bastion-upgrade-nodes-2022-10 ?00:38
ianw(sorry just pulling up roles, not 100% familiar with the layout)00:39
ianw /var/zookeeper/tls i guess?00:40
corvuson bridge it's /var/zk-ca and /var/jaeger-ca00:40
corvus(we have 2 cas now)00:40
ianwmea culpa i totally overlooked migrating those :/00:41
corvusi didn't think about it either :|00:41
ianwif i copy them from old bridge -> new bridge and re-ansible nodepool, that should work?00:41
corvusyep i think so00:41
ianwok one sec ...00:42
corvusand if i'm right about the sequencing, then that may be the only thing necessary -- i think zuul<->zk is still using existing connections and so hasn't noticed or cared about any files changing out from under it.00:42
corvus(so we might be able to get away with not touching the rest of the system and it will just resume once nodepool is supplying nodes)00:43
ianwi've put them in place (old versions in /root/old-ca) and am running a manual run of ansible-playbook service-nodepool now00:45
corvusnl01 is doing work now00:48
corvusand zuul is running new jobs00:48
ianw<phew>00:49
corvusso we're probably back in service00:49
ianwwe would have noticed almost straight away -- but the nodepool playbook wasn't working due to the old !!binary data in the inventory; our best guess is that python3.10's json encoder is stricter about it00:50
ianwc.f. https://review.opendev.org/c/opendev/system-config/+/86275900:50
ianwthanks for that, and i will be putting together notes from that etherpad to make it a bit easier to switch hosts next time00:51
ianwplaybook just finished00:52
corvusthank you!00:52
*** rlandy|bbl is now known as rlandy|out01:25
opendevreviewIan Wienand proposed opendev/system-config master: bastion host: add global known_hosts values  https://review.opendev.org/c/opendev/system-config/+/86276402:00
opendevreviewIan Wienand proposed opendev/system-config master: base: restrict bastion login to bridge01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/86276502:10
opendevreviewIan Wienand proposed opendev/system-config master: Remove old bridge testing  https://review.opendev.org/c/opendev/system-config/+/86276602:10
opendevreviewIan Wienand proposed opendev/system-config master: bastion host: add global known_hosts values  https://review.opendev.org/c/opendev/system-config/+/86276402:21
opendevreviewIan Wienand proposed opendev/system-config master: base: restrict bastion login to bridge01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/86276502:21
opendevreviewIan Wienand proposed opendev/system-config master: Remove old bridge testing  https://review.opendev.org/c/opendev/system-config/+/86276602:21
opendevreviewIan Wienand proposed opendev/system-config master: inventory: add host keys  https://review.opendev.org/c/opendev/system-config/+/86276202:49
opendevreviewIan Wienand proposed opendev/system-config master: bastion host: add global known_hosts values  https://review.opendev.org/c/opendev/system-config/+/86276402:49
opendevreviewIan Wienand proposed opendev/system-config master: base: restrict bastion login to bridge01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/86276502:49
opendevreviewIan Wienand proposed opendev/system-config master: Remove old bridge testing  https://review.opendev.org/c/opendev/system-config/+/86276602:49
opendevreviewIan Wienand proposed opendev/system-config master: bastion host: add global known_hosts values  https://review.opendev.org/c/opendev/system-config/+/86276403:28
opendevreviewIan Wienand proposed opendev/system-config master: base: restrict bastion login to bridge01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/86276503:28
opendevreviewIan Wienand proposed opendev/system-config master: Remove old bridge testing  https://review.opendev.org/c/opendev/system-config/+/86276603:28
opendevreviewIan Wienand proposed opendev/system-config master: bastion host: add global known_hosts values  https://review.opendev.org/c/opendev/system-config/+/86276403:49
opendevreviewIan Wienand proposed opendev/system-config master: base: restrict bastion login to bridge01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/86276503:49
opendevreviewIan Wienand proposed opendev/system-config master: Remove old bridge testing  https://review.opendev.org/c/opendev/system-config/+/86276603:49
*** dasm|rover is now known as dasm|off03:50
*** marios is now known as marios|ruck04:57
opendevreviewIan Wienand proposed opendev/system-config master: bastion host: add global known_hosts values  https://review.opendev.org/c/opendev/system-config/+/86276405:35
opendevreviewIan Wienand proposed opendev/system-config master: base: restrict bastion login to bridge01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/86276505:35
opendevreviewIan Wienand proposed opendev/system-config master: Remove old bridge testing  https://review.opendev.org/c/opendev/system-config/+/86276605:35
opendevreviewIan Wienand proposed opendev/system-config master: bootstrap-bridge: Codify allowed Zuul logins  https://review.opendev.org/c/opendev/system-config/+/86276105:35
opendevreviewIan Wienand proposed opendev/system-config master: base: restrict bastion login to bridge01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/86276505:35
opendevreviewIan Wienand proposed opendev/system-config master: Remove old bridge testing  https://review.opendev.org/c/opendev/system-config/+/86276605:36
opendevreviewIan Wienand proposed opendev/system-config master: inventory: add host keys  https://review.opendev.org/c/opendev/system-config/+/86276205:36
opendevreviewIan Wienand proposed opendev/system-config master: bootstrap-bridge: Codify allowed Zuul logins  https://review.opendev.org/c/opendev/system-config/+/86276105:38
opendevreviewIan Wienand proposed opendev/system-config master: inventory: add host keys  https://review.opendev.org/c/opendev/system-config/+/86276205:38
opendevreviewIan Wienand proposed opendev/system-config master: bastion host: add global known_hosts values  https://review.opendev.org/c/opendev/system-config/+/86276405:38
*** jpena|off is now known as jpena06:58
noonedeadpunkfrickler: yes, https://pubmirror1.math.uh.edu/fedora-buffet/ is broken indeed09:26
noonedeadpunkhttps://pubmirror1.math.uh.edu/fedora-buffet/epel/9/Everything/x86_64/repodata/repomd.xml contains links to files that do not in fact exist09:26
noonedeadpunkShould we just try switching mirrors then?09:27
mnasiadkaWe should, Kolla is also affected09:27
mnasiadkaquestion to which one09:27
noonedeadpunkTry to open the metalink from where rsync is run?09:29
noonedeadpunkIe `curl 'https://mirrors.fedoraproject.org/metalink?repo=epel-9&arch=x86_64'` ?09:29
noonedeadpunkthis will get list of best-matching mirrors based on the location09:29
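A small sketch of building that metalink URL programmatically, which sidesteps the shell treating an unquoted `&` as "run in background". Only the base URL and the `repo`/`arch` parameters come from the discussion above; the helper name `metalink_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL as quoted in the channel.
METALINK_BASE = "https://mirrors.fedoraproject.org/metalink"


def metalink_url(repo, arch):
    """Return the metalink query URL for a repo/arch pair (hypothetical helper)."""
    # urlencode joins the parameters with '&' and escapes any unsafe characters.
    return METALINK_BASE + "?" + urlencode({"repo": repo, "arch": arch})


if __name__ == "__main__":
    print(metalink_url("epel-9", "x86_64"))
```

The resulting string can be passed to any HTTP client as a single argument, so no shell quoting is involved.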
mnasiadkaYes, but we would need infra-root help with that I guess ;-)09:31
mnasiadka6 years ago it was changed from the mirror on kernel.org - https://opendev.org/opendev/system-config/commit/8e3bfee4ee740cb59bb800c309ae73e19d7a05c909:32
noonedeadpunkwell, or anybody who has access to rax or vexxhost I guess ?:)09:32
fricklerthis is what I get on mirror-update, where we run the rsync https://paste.opendev.org/show/bLeOZxhSnT4q7JtcDLUd/09:58
fricklerfeel free to propose a change to use one of those09:58
noonedeadpunkI like rsync://download-ib01.fedoraproject.org/ or rsync://mirrors.mit.edu/10:16
noonedeadpunkfedoraproject sounds like smth reference...10:16
noonedeadpunkWill propose patch10:16
noonedeadpunkbut not sure how they're permanent though...10:18
mnasiadkafedoraproject.org seems like it's at least close to source and maintained by fedora10:20
noonedeadpunkwell, they have download-ib01.fedoraproject.org and download-cc-rdu01.fedoraproject.org and dunno if they mean to make these persistent...10:22
noonedeadpunkI used mit.edu mirrors for other repos and they tend to work nicely and reliably for years...10:24
noonedeadpunkand domain doesn't look like smth that can change in a week and for metalink usage only...10:24
opendevreviewDmitriy Rabotyagov proposed opendev/system-config master: Switch epel mirror to mirrors.mit.edu  https://review.opendev.org/c/opendev/system-config/+/86279310:26
*** rlandy|out is now known as rlandy10:34
*** dviroel_ is now known as dviroel11:30
fungiwe did use mit mirrors for a while for something, but then they had prolonged issues and we switched, as usually tends to happen11:43
fungihttps://review.opendev.org/811832 Revert "Switch Fedora mirror to mirrors.mit.edu"11:44
fungiroughly a year ago we switched our fedora mirroring from mit to uh for similar reasons to the problem at uh now for epel11:45
noonedeadpunkugh11:49
noonedeadpunkI didn't know that :(11:49
noonedeadpunkshould we go with download-ib01.fedoraproject.org/ then?11:50
noonedeadpunkOr mirror.cogentco.com ? :D11:51
* noonedeadpunk don't trust cogent too much11:51
noonedeadpunkbut might be worth a shot...11:52
noonedeadpunkI would say that issue with MIT was different.11:53
noonedeadpunkas uh now just has repomd.xml out of sync with the actual content11:53
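The failure mode described here (a `repomd.xml` that references metadata files the mirror does not actually have) can be checked mechanically. The following is a minimal sketch, assuming the usual repomd layout in which each `<data>` element carries a `<location href="...">`; the sample document, the file names in it, and the helper names are made up for illustration, and a real check would fetch the file and probe each URL instead of comparing against a set.

```python
import xml.etree.ElementTree as ET

# Namespace used by repomd.xml documents.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Made-up sample; real files also carry checksums, sizes and timestamps.
SAMPLE_REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <location href="repodata/aaaa-primary.xml.gz"/>
  </data>
  <data type="filelists">
    <location href="repodata/bbbb-filelists.xml.gz"/>
  </data>
</repomd>
"""


def referenced_files(repomd_xml):
    """Return the relative paths that repomd.xml says should exist."""
    root = ET.fromstring(repomd_xml)
    locations = root.findall("repo:data/repo:location", REPO_NS)
    return [loc.attrib["href"] for loc in locations]


def missing_files(repomd_xml, available):
    """Return referenced paths that are absent from the `available` set."""
    return [href for href in referenced_files(repomd_xml) if href not in available]


if __name__ == "__main__":
    # Pretend the mirror only synced the primary metadata.
    on_mirror = {"repodata/aaaa-primary.xml.gz"}
    print(missing_files(SAMPLE_REPOMD, on_mirror))
```

A non-empty result corresponds to the "out of sync" state: the index was updated but the files it points at were not.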
fricklerthe ideal solution would be to find a mirror with operators someone has a direct contact to11:58
fungiyeah, the point is that every mirror we ever try eventually gets swapped out for another due to prolonged issues, so i don't expect that to necessarily change12:11
fungisometimes we even do have responsive mirror operators, but when they respond with something like "we're aware and are rebuilding our mirror but it's going to take another week to finish" then we still end up needing to switch to something else12:12
noonedeadpunkWell, it's the ideal solution indeed... 12:19
noonedeadpunkBut unfortunately I don't think I have such contacts :(12:20
*** dasm|off is now known as dasm|rover13:10
kayohello everyone, if I want to start coding around openstack where is the best place to do it? and we have a tag like 'easy hacks' to start?13:44
fungikayo: this channel is for a collaboration services community. our systems host a variety of projects, one of which is openstack. the openstack community has a guide for getting started contributing to their projects though, and you can find it here: https://docs.openstack.org/contributors/13:46
kayogot it, thanks fungi13:47
fungimy pleasure13:47
fungiand if you have questions generally relevant to the opendev collaboratory, this would be the place to ask13:48
noonedeadpunksooo..... should we just wait for current mirrors to re-sync (hopefully they're taking care of it)13:53
funginoonedeadpunk: this code comment suggests that tibbs in #fedora-admin might know the status: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/files/epel-mirror-update#L28-L2914:02
fungilooks like we switched from mirrors.kernel.org to pubmirror1.math.uh.edu for epel mirroring 5.5 years ago in https://review.openstack.org/45327414:05
fungiwith the rationale that "we have a point of contact for future failures"14:06
noonedeadpunksurprisingly, person still seems hanging in same channel...14:07
noonedeadpunkI hadn't treated this comment as a point of contact - shame on me14:08
fungiwe seem to be using mirror.facebook.net for centos and fedora, not sure if they also mirror epel, but also it's not as if they haven't had stale/corrupt mirrors for extended periods in the past year too14:09
noonedeadpunkthey do14:10
noonedeadpunkfacebook was in the list14:10
noonedeadpunk`rsync://mirror.facebook.net/fedora/epel/`14:10
fungiwe've tried lots of mirrors for these things over the years, and the only thing they seem to have in common is that they all suffer issues from time to time14:10
fungias evidenced by the fact that we have this discussion about one or another of them every few months14:11
noonedeadpunkwell, that is true. But they usually recover in ~24 hours I'd say14:11
noonedeadpunkAnd it's the third day it's been broken, which is a bit annoying14:12
noonedeadpunkI pinged tibbs so let's see14:13
fungithanks!14:14
fungiand if it's going to be a while (or they're unresponsive), we can try a different mirror yet again14:14
*** knikolla[m] is now known as knikolla14:38
clarkbI went ahead and +2'd the mirror change since that seems reasonable if we don't hear back15:08
clarkbalso I'm having a very slow start this morning15:08
clarkbnote that it is my understanding that we cannot use any of the mirrors under fedoraproject.org as those are all top level and only full public mirrors may sync from there15:10
noonedeadpunkI can put -W to wait for it for a while15:12
noonedeadpunk*for answer15:13
mnasiadkaI moved Kolla builds to use  download-ib01.fedoraproject.org mirror and it works15:13
noonedeadpunkI think you can release -W with core 15:13
mnasiadka(for now)15:13
clarkbya end users using those mirrors is fine. It is our mirrors that shouldn't sync from there15:14
clarkbinfra-root anyone have time to review https://review.opendev.org/c/opendev/system-config/+/862631 to create python3.11 base images now that 3.8 has been dropped?15:30
clarkbcorvus: ianw: do we need to double check that the auth for tracing didn't get affected like nodepool?15:35
noonedeadpunkum, should 3.8 be dropped?15:35
clarkbnoonedeadpunk: yes, opendev doesn't use it anymore on any of our container images15:35
noonedeadpunkah, ok, fair. it's internal15:36
clarkbthis is independent of the platforms Zuul provides for testing and is used for running our services15:36
noonedeadpunkyeah-yeah, fair15:36
noonedeadpunksorry15:36
clarkbseparately looking at the zuul job runtime improvements between 3.8 and 3.10/3.11 I think everyone should drop it :)15:37
clarkbit will be interesting to see if openstack's 3.10 testing shows similar results or not15:37
noonedeadpunkwell, after AA 15:37
corvusclarkb: i mentioned both CAs at the time and was operating under the assumption ianw copied both over.  jaeger is receiving traces, so i think we can wait for him to confirm.15:39
clarkback15:41
*** dviroel is now known as dviroel|lunch15:50
opendevreviewDmitriy Rabotyagov proposed opendev/system-config master: Switch epel mirror to pubmirror3.math.uh.edu  https://review.opendev.org/c/opendev/system-config/+/86279316:08
*** marios|ruck is now known as marios|out16:09
noonedeadpunkso, repo maintainer has responded and suggested using pubmirror3 as it's lighter and less loaded and less likely to fail16:09
fungithanks! expediting then16:09
fungistill has rsync available, i take it?16:10
noonedeadpunkWell look like it does16:10
fungii'll know momentarily16:11
noonedeadpunkI checked with rsync -rltvz   rsync://pubmirror3.math.uh.edu/fedora-buffet/epel/8 /tmp/epel16:11
noonedeadpunkbut would be great if you could double-check as I'm quite annoyingly side-pinged16:11
fungisync in progress now16:13
funginoonedeadpunk: mnasiadka: our epel mirrors should be updated to match what's on pubmirror3.math.uh.edu now. please double-check they have the content you're hoping for16:15
fungiif they do, i'll approve 862793 and release my lock session once that deploys16:16
noonedeadpunkthis means basically issue recheck somewhere?16:21
fungiunless you know how it was breaking and can more directly check the files there seem to have the correction16:25
noonedeadpunkissued recheck for 86160116:27
noonedeadpunkit will fall into retry_limit if did not help16:27
fungihopefully should be able to see fairly quickly if it gets past installing things from epel or not?16:28
noonedeadpunkwell... I think it takes ~15 mins before our stuff start running at all16:29
noonedeadpunkbut yes, relatively quickly we can confirm if it fixed the issue or not16:29
*** jpena is now known as jpena|off16:32
mnasiadkafungi: pubmirror3 looks good16:37
fungithanks for confirming!16:39
fungionce noonedeadpunk is satisfied, i'll make it permanent16:39
* noonedeadpunk needs another 5 mins16:41
*** dviroel|lunch is now known as dviroel16:45
noonedeadpunkfungi: works for us!16:49
fungiawesome, thanks!16:50
opendevreviewMerged opendev/system-config master: Switch epel mirror to pubmirror3.math.uh.edu  https://review.opendev.org/c/opendev/system-config/+/86279317:15
clarkbfungi: do you know, if I write config to disable phased updates on all debuntu machines, whether older versions like bionic will complain because they don't recognize the config entries?17:46
clarkbI've remembered we probably want to disable phased updates for our prod jammy servers and trying to decide if I need to only apply that file to jammy and newer17:46
clarkboh I can use the held gitea server to check18:00
clarkbhrm that server doesn't seem to exist anymore. I don't think I cleaned it up18:02
fungiyeah, i'm really not sure to be honest, though i expect apt will just ignore things it doesn't recognize18:02
fungithe held mm3 listserv is jammy18:03
fungi862793 has deployed, so i'm releasing the lock now18:04
clarkbI've just remembered my local fileserver is bionic. I'll test really quickly on that18:05
clarkbseems like it does get ignored (or if it doesn't then it does what we want either way). I was able to apt-get update and dist-upgrade my local machine without complaint18:08
opendevreviewClark Boylan proposed opendev/system-config master: Remove snapd from our servers  https://review.opendev.org/c/opendev/system-config/+/86283418:09
opendevreviewClark Boylan proposed opendev/system-config master: Don't install phased package updates with apt  https://review.opendev.org/c/opendev/system-config/+/86283518:09
clarkbThose are two changes related to booting new jammy servers. Though the first applies to our older servers too18:09
*** dasm|rover is now known as dasm|off19:21
*** dviroel is now known as dviroel|afk19:39
ianwclarkb/corvus: yep i copied both of those ca's to the new bridge19:54
clarkbianw: for https://review.opendev.org/c/opendev/system-config/+/862761 is that necessary because the normal mechanism only runs after bridge is bootstrapped?19:58
clarkbwondering if we need to remove the old mechanism to reduce confusion19:58
ianwclarkb: so afaict there is no old method, other than manually copying in keys to zuul's authorised keys?20:00
ianwthat was why i thought it was better to keep track of it in system-config20:01
clarkbI thought we did a zuul user install20:02
clarkbianw: ya its via extra_users in the bastion group file20:03
clarkbbut I think that does occur after bootstrapping, but maybe its ok?20:03
ianwclarkb: hrm -- that doesn't install the authorized keys for the other projects?20:05
clarkbianw: it should install them. They are listed for the user in the all.yaml group var file20:06
ianwsystem-config : inventory/service/group_vars/all.yaml ?20:08
ianwoh i see20:09
clarkbya, extra_users indicates which users listed in that file should be added20:10
ianwok, i missed that path20:11
clarkbI think the chicken and egg here is once you've booted a new server how do you add the zuul user to it since we don't normally do that. Maybe this should be a flag to launch node instead20:11
clarkband have base install it normally rather than something we do in that change?20:11
ianwyeah i will document this in the "how to replace bastion host"20:12
ianwthe launch node runs base manually, maybe?20:13
ianwthe problem might be that the new host isn't in the bastion group20:14
clarkbyes20:16
clarkbso the normal user setup happens which includes our users, but it won't do host specific extra users20:16
clarkbbut maybe we can set a flag for that as part of launch node? like setting a group20:16
clarkbthen still only run the base playbook, but it can preapply things like that (I think iptables would be the other big one?)20:17
ianwthanks, i'll flip that to WIP and look into that, i'm sure we can figure something there20:20
ianwhttps://review.opendev.org/c/opendev/system-config/+/862762/4 and https://review.opendev.org/c/opendev/system-config/+/862764/7 to add the host keys automatically i think are orthogonal to that, and hopefully useful20:20
fungii think i fell asleep while the nodepool zk config problem was being ironed out... is that solved now or should i review some changes?20:22
ianwfungi: i don't think there's anything more -- i migrated the old keys over and restarted nodepool and i believe that was it.  also added a note to https://etherpad.opendev.org/p/bastion-upgrade-nodes-2022-10 so i can write up something that doesn't forget it for next time20:23
fungioh, i meant the replacement for the ansible module. i saw you did something purely with templating/var expansion but i was losing coherency before test results came back20:25
ianwfungi: oh, that -- that is https://review.opendev.org/c/opendev/system-config/+/86275920:27
ianwthat seems to have reviews, thank you.  i think it should be safe to merge, but low priority, it's just slightly slow20:28
fungithanks, so nodepool deployments are still broken without that, right?20:30
fungioh, the extra binary copy of the zk keys was the underlying issue20:31
funginow i remember20:31
fungianyway, approved that module replacement. it seems like a net win all around20:33
fungimany times faster with much less code20:33
ianwfungi: yes, to be clear, that's correct.  the thing that was breaking was that we had (unused) keys in our private ansible with !!binary which couldn't be serialised to json.  the theory is that <python3.10 was for some reason less picky about it20:36
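The serialisation failure described above is easy to illustrate in isolation. This is a minimal sketch, assuming the `!!binary` values end up as Python `bytes` once the YAML is loaded (which is how PyYAML typically represents them); the variable name `zk_client_key` and its contents are hypothetical. It shows only that `bytes` is rejected by the standard `json` encoder and one way to make such data JSON-safe; it does not reproduce or explain any difference between Python versions.

```python
import base64
import json

# Hypothetical inventory fragment: a YAML `!!binary` scalar loaded as bytes.
inventory_vars = {"zk_client_key": b"\x00\x01binary-key-material"}


def try_dump(value):
    """Return the JSON text, or the TypeError message if encoding fails."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


def to_json_safe(value):
    """Recursively replace bytes with base64 text so json.dumps accepts it."""
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, dict):
        return {key: to_json_safe(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_json_safe(item) for item in value]
    return value


if __name__ == "__main__":
    print(try_dump(inventory_vars))                # encoder rejects bytes
    print(try_dump(to_json_safe(inventory_vars)))  # base64 text encodes fine
```

In the incident itself the keys were unused, so dropping them from the inventory was the simpler route than converting them.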
*** dviroel|afk is now known as dviroel20:46
opendevreviewMerged opendev/system-config master: nodepool-base: don't call out to find zk_hosts  https://review.opendev.org/c/opendev/system-config/+/86275921:05
*** rlandy is now known as rlandy|bbl21:12
*** dviroel is now known as dviroel|out22:10
clarkbfungi: or ianw any interest in python3.11 docker base images ? https://review.opendev.org/c/opendev/system-config/+/86263122:29
fungioh, right i had that up earlier and then got distracted22:30
ianwlgtm22:30
clarkbthanks!22:30
fungii could have sworn i reviewed that earlier, but anyway +322:30
fungii probably just got distracted by something in between looking through the diff and actually voting on it22:31
clarkbonce that is in I can update the zuul and nodepool changes to switch most of their testing over to 3.11 and then zuul can decide if that is worthwhile (I like zuul being able to show off how well CI/CD enables changes like that)22:31
fungithanks22:32
clarkbon the gitea side of things I was going to look into their new proxy protocol support. Has anyone set that up before and if so any idea if apache can speak it too?22:32
clarkb(I think the proxy protocol may give us the logs we want of being able to trace requests through the proxies properly)22:33
fungiwe have it set up between haproxy and apache2, right?22:34
fungii thought i remembered discussing it anyway, where proxy info is embedded into the socket setup22:34
clarkbI don't think we're doing the PROXY protocol22:35
clarkbits really only interesting because gitea's xforwarded for handling is broken22:37
clarkbmaybe it is best to just wait for or try to get that fixed instead22:37
clarkbkevinz: hey, it looks like we're still getting alerts that the cloud ssl cert will expire in a few days22:40
clarkbanother thing that just occurred to me is adding that may cause direct requests to the backend to not work the way we expect22:45
opendevreviewIan Wienand proposed opendev/system-config master: inventory: add host keys  https://review.opendev.org/c/opendev/system-config/+/86276223:09
opendevreviewIan Wienand proposed opendev/system-config master: bastion host: add global known_hosts values  https://review.opendev.org/c/opendev/system-config/+/86276423:09
opendevreviewMerged opendev/system-config master: Add python 3.11 docker images  https://review.opendev.org/c/opendev/system-config/+/86263123:11
