Monday, 2022-10-03

opendevreviewIan Wienand proposed opendev/system-config master: [dnm] attempting to trigger zuul syntax error  https://review.opendev.org/c/opendev/system-config/+/86006102:24
*** soniya29 is now known as soniya29|ruck05:05
*** soniya29|ruck is now known as soniya29|ruck|afk06:54
*** jpena|off is now known as jpena07:17
*** soniya29|ruck|afk is now known as soniya29|ruck07:37
*** pojadhav is now known as pojadhav|sick07:55
*** marios is now known as marios|call08:47
*** soniya29|ruck is now known as soniya29|ruck|lunch09:00
*** marios|call is now known as marios09:04
*** soniya29|ruck|lunch is now known as soniya29|ruck10:26
*** rlandy|out is now known as rlandy10:35
*** lbragstad4 is now known as lbragstad11:05
*** dviroel_ is now known as dviroel11:40
*** dasm|off is now known as dasm12:59
*** rcastillo is now known as rcastillo|ruck13:29
opendevreviewNeil Hanlon proposed openstack/project-config master: Add rockylinux 9 to OSA grafana  https://review.opendev.org/c/openstack/project-config/+/86009414:19
clarkbinfra-root I'm going to try and dig into the jammy launch node issues today. corvus iirc the issue was deleting the ubuntu user which we were currently ssh'd in as?15:10
clarkbhrm it looks like one of the first things that launch node does is switch to root if it isn't already root15:11
*** dviroel is now known as dviroel|lunch15:11
clarkbah ok it was a specific pid 1559 which may have been running independently of the ssh connection15:12
clarkbfungi: also before I dive too deeply into ^ I should probably go and check that I can trace a connection from the gitea lb to apache to gitea itself15:19
fungimakes sense15:21
clarkbok breakfast first, then gitea, then jammy launches15:21
mtreinishrandom ssh key question. I'm trying to push a patch to gerrit and my public key (which I had been using on gerrit since ~2013) is being rejected. Looking at the verbose output it seems to be caused by "no mutual signature algorithm"15:29
mtreinishwas there a change in the allowed ssh key algorithms that I missed?15:30
clarkbmtreinish: yes, but on the client side. Chances are you are running newish openssh which dropped support for rsa + sha1. but when they did that they didn't update the default to rsa + sha2 (a bug imo). Gerrit can do rsa+sha2 but doesn't support the key exchange extension to negotiate that so it fails15:35
clarkbmtreinish: I actually fixed that in newer gerrit but the backport to 3.5 is stalled because my account in upstream gerrit got deleted or something and other people won't manually cherry pick the change for me15:36
clarkband google hasn't said what happened to my account yet15:36
clarkbmtreinish: there are two workarounds. One is to use a key that isn't an rsa key. The other is to specifically allow rsa + sha1 to review.opendev.org15:36
mtreinishheh, I guess it's the curse of archlinux again :)15:37
clarkbmtreinish: https://www.openssh.com/txt/release-8.8 the section on backward incompatible changes covers this as well as the rsa + sha1 work around15:37
mtreinishI also can try pushing from a system that I haven't updated in a while I guess15:37
mtreinishthanks I'll give those a try15:37
clarkboriginally we weren't going to backport the fix to gerrit 3.5 because that would require updating mina on 3.5 which required updating jgit. But then they did that last week for other reasons so now the key negotiation fix is valid15:38
clarkbI also looked at the openssh code to try and figure out how to update the fallback when the negotiation fails to sha2 since sha1 is basically never going to work. And I got lost in all the indirection they do to implement defaults15:40
clarkbI could probably figure it out if I took the time to do a debug build and attach gdb15:40
clarkbbut meh15:40
mtreinishheh, yeah that's probably too much I would have given up long before that15:41
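The second workaround mentioned above (allowing rsa + sha1 only for review.opendev.org) can be sketched as a per-host ssh_config stanza. This is a minimal sketch, not a tested fix: the two option lines follow the example in the OpenSSH 8.8 release notes linked in the log, the helper name `rsa_sha1_stanza` is hypothetical, and the result would still have to be added to `~/.ssh/config` by hand.

```python
def rsa_sha1_stanza(host: str) -> str:
    """Return an ssh_config stanza that re-enables RSA/SHA-1 for one host only.

    The option names mirror the example in the OpenSSH 8.8 release notes;
    the function itself is a hypothetical helper for illustration.
    """
    lines = [
        f"Host {host}",
        "    HostkeyAlgorithms +ssh-rsa",
        "    PubkeyAcceptedAlgorithms +ssh-rsa",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the stanza so it can be reviewed before being pasted into a config.
    print(rsa_sha1_stanza("review.opendev.org"), end="")
```

Scoping the stanza to a single `Host` keeps the weaker algorithm disabled everywhere else, which is why the other workaround (a non-RSA key) is still the simpler option when it is available.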
priteauHello. Just got a POST_FAILURE due to a host key verification failure.15:44
priteauhttps://zuul.opendev.org/t/openstack/build/80949a6467644308837009c3c39a6ecd15:44
clarkbpriteau: we believe those occur because openstack is reusing IP addresses in the cloud(s) that we boot test nodes in.15:44
priteauWould it make sense to clear known host somewhere in Zuul?15:45
clarkbpriteau: unfortunately there isn't much we can do about that from our end other than try and encourage openstack to stop doing that. But since we don't run the clouds we don't have insight into when/why it happens (though aiui cells are suspected)15:45
clarkbpriteau: no that won't help we'd just fail to ssh when the IP is attached to a host we don't control15:45
clarkbpriteau: basically two (or more) hosts end up with the same IP then fight over populating ARP tables15:46
priteauOh, that's bad15:46
clarkbwhichever is currently in the ARP tables wins and gets the connections. If that isn't our host then you get the failure you see. It is entirely a bug in openstack15:46
priteauI thought you meant reusing as in reusing later. Like Neutron does everywhere.15:46
clarkbno thats fine15:46
priteauA genuine bug in openstack? Or something broken in one of the clouds opendev uses?15:47
clarkbpriteau: I mean the fact that it is possible is a bug in openstack to me.15:47
clarkbit should never be possible for neutron/nova/whatever to give two different hosts the same IP at the same time15:48
clarkbeven if the issue is in a third party driver nova/neutron/whatever should say "no"15:48
clarkband fail to boot the second instance instead15:48
clarkbfungi: I opened a connection to https://opendev.org/opendev/git-review from my desk. Then traced that to the backend. One thing I notice is that apache -> gitea is using a single connection for many requests to the frontend which means this isn't perfect but it is an improvement on what we had before15:49
clarkbpriteau: fwiw it is also possible for jobs to reset their ssh host keys which would also break this, but this is the standard openstack-tox-docs jobs which shouldn't do that unless the tox run is doing something very weird15:51
*** marios is now known as marios|out15:51
fungipriteau: apparently cells v1 was really bad about losing track of virtual machines, but i'm not sure all the occurrences are attributable to that15:58
fungibut in essence yes, what happens is that some old vm which nova no longer knows about is running on one of the hypervisors, but it/neutron think the ip address is available again so they assign it to a test node we boot, and then we intermittently end up trying to connect to the old stale guest rather than our test node15:59
fungithe cloud providers where this is relatively common seem to run automated "cleanup" tasks to find those rogue vms and clear them out periodically16:00
priteauIndeed, I could see this happening16:01
priteauRogue VMs16:01
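The failure mode described above (two guests claiming one IP, with whichever holds the ARP entry receiving the connection) can be illustrated with a toy model. This is only a sketch under stated assumptions: the MAC strings, key strings, and the `simulate` function are all hypothetical, and real ARP behaviour is more involved than a random last-writer-wins choice.

```python
import random


def simulate(claimants, expected_key, attempts=10, seed=0):
    """Model repeated connections to one IP claimed by several guests.

    claimants: dict mapping a (hypothetical) MAC address to the SSH host key
    that guest would present. expected_key is what known_hosts holds for the
    IP. Returns one bool per attempt: True if host key verification passes.
    """
    rng = random.Random(seed)
    macs = sorted(claimants)
    results = []
    for _ in range(attempts):
        # Every claimant announces the same IP; the latest announcement
        # overwrites the ARP entry, so the "owner" changes between attempts.
        current_owner = rng.choice(macs)
        presented_key = claimants[current_owner]
        results.append(presented_key == expected_key)
    return results


if __name__ == "__main__":
    both = {"aa:aa": "key-test-node", "bb:bb": "key-rogue-vm"}
    outcome = simulate(both, expected_key="key-test-node", attempts=20)
    print(f"{outcome.count(False)} of {len(outcome)} connections hit the wrong guest")
```

With a single claimant every attempt succeeds; adding a second one makes the result depend on which guest answered last, which matches the intermittent nature of the failures and shows why clearing known hosts would not help.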
corvusclarkb: yes, i think it's likely that userdel is just more careful now than in older versions, and the process (whatever it is) probably is running in older versions too.  i suspect the right answer may be to find a way to ignore the error and proceed16:02
mtreinishclarkb: thanks I just created a second key with ecdsa and was able to push my patch: https://review.opendev.org/c/openstack/stevedore/+/86010916:02
fungimtreinish: yep, that's probably the safest workaround16:02
mtreinish(the config option didn't work for me for whatever reason, I think I remember reading something in an arch package upgrade guide about the rsa keys, so it might be something on the package side)16:02
clarkbcorvus: ya userdel has a --force option which will get around that but it has a bunch of other new behavior it brings in too that we may not want15:03
clarkbcorvus just dropped but "This option forces the removal of the user account, even if the user is still logged in. It also forces userdel to remove the user's home directory and mail spool, even if another user uses the same home directory or if the mail spool is not owned by the specified user."16:05
clarkbthe idea I wanted to look into is rebooting before ssh'ing back in as root which should ensure that any remaining ubuntu owned processes are gone16:09
clarkbbut we can add the force option instead if others aren't worried about that (I think it may cause some stuff to get deleted for other old disabled users). Hrm maybe we move the regular disabled users and system image user disablement into different tasks and one can use force and the other won't16:10
clarkbI'll go ahead and write that change because I think it will be the least impactful and easiest to understand16:11
opendevreviewClark Boylan proposed opendev/system-config master: Disable distro cloud image users more forcefully  https://review.opendev.org/c/opendev/system-config/+/86011216:24
clarkbsomething like that maybe16:24
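The idea of splitting the two groups of users, forcing removal only for the cloud image user, can be sketched as below. This is not the content of the proposed change: the function names and the `olduser` account are hypothetical, and the sketch only builds `userdel` command lines (using the `--force` option quoted above) without running them.

```python
def userdel_command(user: str, force: bool = False) -> list[str]:
    """Build a userdel command line, adding --force only when asked to."""
    cmd = ["userdel"]
    if force:
        cmd.append("--force")
    cmd.append(user)
    return cmd


def plan_disable(image_users, retired_users):
    """Return command lines for both groups.

    image_users: distro cloud image accounts (e.g. "ubuntu") that may still
    own a running process, so removal is forced.
    retired_users: ordinary disabled accounts, removed without --force so the
    extra side effects of forcing do not apply to them.
    """
    plan = [userdel_command(user, force=True) for user in image_users]
    plan += [userdel_command(user) for user in retired_users]
    return plan


if __name__ == "__main__":
    # "olduser" is a made-up account name for illustration.
    for command in plan_disable(["ubuntu"], ["olduser"]):
        print(" ".join(command))
```

Keeping the forced path limited to the image accounts means the home directory and mail spool behaviour described for `--force` only touches users that exist solely because of the base image.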
clarkbfungi: https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/FVZW5DQJ7C3TW4LPIIU7ARI7XMVJYYWX/ thats the followup to my email about mailman3 docker images. TLDR it sounds like maxking is largely doing it alone and the others involved in the project aren't involved with the docker stuff :/16:30
clarkbfungi: I'm beginning to wonder if we shouldn't fork the images16:30
*** jpena is now known as jpena|off16:30
fungigot it, so more of an example/starting point16:31
clarkbwell I think the intention is that they be production ready 16:32
clarkbbut I'm not sure they receive the attention needed currently to manage that. I'd be happy to help maxking upstream, but I'm not sure how to reach out other than what I've already done (issues/PRs/mailing list)16:33
fungiyeah. also we could always un-fork later if maintenance picks back up on it16:33
fungihopefully there's not a ton of churn for the projects being bundled into those images these days, since mm3 has had many years to stabilize now16:34
clarkbI'll have to think on this a bit more now that I'm leaning towards a local fork or modification. In particular we should decide if we want to build them up ourselves using a complete fork of the upstream docker files or if we just want to fetch the upstream images and modify them to our needs16:36
fungisure, we do both in different places, depending on the situation16:37
clarkbI think the major upside to forking properly is we can set the uids and gids without needing to do a global chown across the image. However, if we do that we're more than likely going to never reconverge with upstream16:38
*** dviroel|lunch is now known as dviroel16:38
clarkbif we just want to install lynx then doing that in a new layer is probably simplest and most likely to allow us to unfork later16:38
clarkbso that might be the best place to start as it keeps the delta small and options open16:38
fungithough it leaves us with concerns over the uid/gid conflicts16:40
clarkbright16:42
fungilooks like the container author was last seen responding to this thread: https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/NNYOXOE33DJEFWQ5WUBMJBB35IRAACQK/#S3KFJQD23IMACB7CR6K4ZWUQITREG6ID16:42
fungithat was roughly a month ago16:42
clarkbfyi jitsi meet updated about 4 days ago. I've gone ahead and subscribed to release notifications for https://github.com/jitsi/docker-jitsi-meet so that I'll get alerted when those updates happen17:53
clarkbBut we may want to retest meetpad soon just to be sure the update is working for us17:53
fungiclarkb: spotz and i used it a few hours ago for the diversity wg meeting, seemed fine still18:37
*** dviroel is now known as dviroel|afk19:37
clarkbfungi: ah cool19:44
clarkbfungi: over in gerrit land they are trying to run debian buster jobs and use openjdk-8. It seems that openjdk-8 is not in buster proper but is in sid. Do you know anywhere in our jobs where we might add debian unstable as an example I can show them?19:59
clarkbI showed them what zuul did to install libc previously20:08
clarkbI think that will work. Pin the default release to stable then install openjdk-8 which only exists in unstable20:08
fungitechnically you don't need to pin testing or unstable if you have stable sources, since the repositories themselves set relative priorities, so you'll only ever wind up getting packages from sid if they don't exist in buster or you explicitly request them by version or suite name20:56
clarkboh cool. I'd push a patch to them but I can't do that anymore because something broke my account. But I gave them lots of examples and hopefully they can address it themselves with that info20:56
ianwclarkb: thanks for looking at the jammy launcher, i was wondering if that would work.  21:08
ianwrelated to that; https://review.opendev.org/q/topic:bridge-ansible-venv is ready for review21:09
*** dasm is now known as dasm|off21:16
clarkbah cool I'll have to take a look at those now21:21
clarkbI think I reviewed some of them previously but that was very early on.21:21
clarkbfungi: I've nearly got a mm3 docker image fork change ready to push on top of the existing change21:22
clarkbhowever, I'm realizing that we may run into the problem with the locale stuff21:22
clarkbbut we'll figure that out if it becomes a problem I guess21:22
opendevreviewClark Boylan proposed opendev/system-config master: WIP fork the maxking/docker-mailman images  https://review.opendev.org/c/opendev/system-config/+/86015721:36
clarkbI think the django msgfmt issue with the mailman3 images is addressed by https://github.com/django-extensions/django-extensions/pull/1740 which seems to be in the most recent release of that tool and is newer than the failures I saw upstream21:37
opendevreviewClark Boylan proposed opendev/system-config master: WIP fork the maxking/docker-mailman images  https://review.opendev.org/c/opendev/system-config/+/86015721:41
clarkbonce ^ seems to be working we can layer in our additions as heavily as we like21:44
clarkbright now all I'm doing different than upstream is adding lynx21:44
clarkbianw: re the ansible in venv. The plan is to switch the existing server over to the venv first right? So we've got to be careful about not updating ansible to start?21:46
fungiclarkb: awesome (wrt mm3 image fork), i was planning to do at least one more import on a fresh held node21:51
fungiand yes, i saw the thread on their ml about that pr which maxking suggested but then hasn't had time to review since21:51
fungispotted it when i was browsing their archives earlier today21:52
clarkbhttps://github.com/maxking/docker-mailman/pull/555 is the related PR fwiw21:54
clarkbheh tox-linters explodes on the vendored scripts21:55
opendevreviewClark Boylan proposed opendev/system-config master: WIP fork the maxking/docker-mailman images  https://review.opendev.org/c/opendev/system-config/+/86015722:03
clarkbwe apparently need buildkit to build these images22:03
ianwclarkb: yeah, that entire stack should be safe to apply to the existing host.  but i do want to cut over to the new server sooner rather than later.  22:08
*** rlandy is now known as rlandy|bbl22:11
clarkbianw: suggestion on https://review.opendev.org/c/opendev/system-config/+/857799/ which affects its child22:17
ianwthanks; yep we can add a bionic job, just to be sure22:18
clarkbianw: do you know if the updated selenium works against older selenium that will install on bionic? I guess we'll find out :)22:21
ianwno the python would be too old for that.  but in just a basic bridge deployment test i don't think we'd be trying to run selenium22:21
clarkboh right22:22
clarkbits only gitea, paste, codesearch etc that do the screenshots22:22
clarkbwoot my images built this time in 86015722:25
clarkbianw: also did you see https://review.opendev.org/c/zuul/zuul/+/855309/ merged?22:29
clarkbhttps://review.opendev.org/c/opendev/system-config/+/855472 which means that change is ready to land when we are I think. I've got it on the meeting agenda too22:29
ianwyeah, thanks.  maybe merge early tomorrow (for me) and can monitor22:30
clarkbinfra-root I finally updated the meeting agenda for tomorrow. Is anything important missing from that?22:34
ianwlgtm, thanks22:35
opendevreviewClark Boylan proposed opendev/system-config master: WIP fork the maxking/docker-mailman images  https://review.opendev.org/c/opendev/system-config/+/86015723:05
clarkbnow with less bashate hate23:05
clarkband agenda sent23:09
opendevreviewClark Boylan proposed opendev/system-config master: WIP fork the maxking/docker-mailman images  https://review.opendev.org/c/opendev/system-config/+/86015723:14
clarkbit helps to properly test that first (I did actually test it but because find prints a ton of lines I missed it was still printing the lines I didn't want)23:15
