Tuesday, 2021-01-05

clarkbfungi: we've got high load on bridge again. I think because of the other servers you discovered that took a holiday recently00:32
fungioh, likely so, i can take a look in a bit if you haven't00:33
clarkbya it appears to be leaky ansible again but for elasticsearch and logstash workers? I'm going to look at them then kill old processes00:34
clarkbelasticsearch 02-04 and 07 all needed to be restarted and logstash-worker02 and 1000:47
clarkbthis was based on info from stale ssh connections reported on bridge00:47
fungijudging solely from the ones they opened tickets on for us, rax was doing a significant amount of host migrations over the holidays, so not surprised at all00:49
clarkbI'm cleaning up the ansible processes now00:50
fungiahh, thanks!00:54
clarkbfungi: are you able to keep an eye on the gitea change?00:55
clarkbif so I'll finish this cleanup then go help with dinner00:55
fungiyeah, i'll try to keep tabs on the servers00:59
clarkbprotip the ssh controlmaster persistent processes don't seem to get cleaned up once they get reparented to init when you kill their parents01:03
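[editor's sketch] The orphaned ControlMaster processes described above can be picked out by their parent PID. This is a minimal example, not what was run on bridge: it assumes input shaped like `ps -eo pid,ppid,args` with a header row, and that a multiplexing master shows up with a process title like "ssh: <socket path> [mux]". The function name and the sample rows are hypothetical.

```python
def find_orphaned_ssh_masters(ps_output):
    """Return PIDs of ssh mux masters whose parent is init (PID 1)."""
    orphans = []
    # Skip the header row, then split each row into pid, ppid, and the rest.
    for line in ps_output.strip().splitlines()[1:]:
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, ppid, args = fields
        # Assumption: a ControlMaster appears as "ssh: <socket> [mux]".
        if ppid == "1" and args.startswith("ssh:") and "[mux]" in args:
            orphans.append(int(pid))
    return orphans


# Made-up sample rows for illustration only.
SAMPLE = """\
  PID  PPID COMMAND
  101     1 ssh: /root/.ansible/cp/aaaa [mux]
  202   150 ssh: /root/.ansible/cp/bbbb [mux]
  303     1 /usr/sbin/sshd -D
"""

print(find_orphaned_ssh_masters(SAMPLE))
```

Only the first sample row qualifies: the second still has a live parent, and the third is not a mux master.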
openstackgerritMerged opendev/system-config master: Upgrade to gitea 1.12.5  https://review.opendev.org/c/opendev/system-config/+/76922501:03
clarkbI got all the stale ansible-playbook processes and am now going through those ssh processes01:03
clarkbfungi: ok I think bridge is much happier now01:06
clarkbremote puppet else is currently running then the gitea deploy should happen01:06
fungiyup, that's what it's looking like to me01:07
clarkbfungi: I believe it should do the servers in order too so you can check https://gitea01.opendev.org:3000/ then 02 and so on as it works through them01:07
clarkbfungi: looks like gitea01 just updated. I see the new version and can browse zuul via the web ui01:19
fungiyeah, i've been testing them and not seen an issue yet01:31
fungiand finished01:32
fungiloading the webuis has been sluggish for me, but could be my connection, or cold caches on the servers i guess01:32
fungipoked around a bit and they seem fine01:33
fungionce i get the assets loaded for theming and whatnot, it's snappy01:33
openstackgerritBrian Rosmaita proposed opendev/gerritlib master: Update documentation  https://review.opendev.org/c/opendev/gerritlib/+/76924103:19
fungii'm firing up an engagement stats run, queries might ratchet up the load on gerrit a little but i'll keep an eye on it, will probably run until around 06:00 based on prior experiences03:52
openstackgerritdaniel.pawlik proposed openstack/diskimage-builder master: Replace Fedora 31 functional tests with Fedora 33  https://review.opendev.org/c/openstack/diskimage-builder/+/76809207:57
openstackgerritMerged openstack/diskimage-builder master: Add Python3 Wallaby unit tests  https://review.opendev.org/c/openstack/diskimage-builder/+/75724709:37
openstackgerritThierry Carrez proposed openstack/project-config master: release-scripts: Remove misleading error message  https://review.opendev.org/c/openstack/project-config/+/76935314:30
clarkbfungi: looks like ethercalc, logstash-worker 17 & 20 may also be in similar situation to those other servers from yesterday. We've got some older stale ansible ssh connections to them since my previous cleanup16:04
clarkbI half expected once the previous set was cleared out ansible would run further and find new ones. I'll take a look at them in a bit16:04
fungiclarkb: yeah, i was getting to ethercalc after checking zm02 (we got a ticket from rax overnight about a host problem impacting it)16:05
fungiseems to be up and working now though16:05
fungilooks like it came up around 13:14 utc16:05
clarkbethercalc is up you mean or zm02?16:06
fungi#status log zm02 was rebooted by the provider at 13:14 utc following recovery from a host outage16:06
openstackstatusfungi: finished logging16:06
fungii'm checking ethercalc's console now16:07
clarkbethercalc seems to ask for my ssh key then stops there. the logstash workers don't even get that far. I'll reboot the logstash workers16:07
fungiyeah, ethercalc may have lost contact with its rootfs or something16:07
fungithe usual hung kernel tasks kmesg spam on the console16:08
fungii'll reboot it16:08
clarkbthen I'll give it an hour and look at any leaked ansible processes (the time delta makes it easier to spot them in ps output)16:09
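[editor's sketch] The "time delta" idea above can be expressed as a filter on elapsed run time. A minimal example, assuming input shaped like `ps -eo pid,etimes,args` (elapsed time in seconds) with a header row. The function name, the one-hour threshold, and the sample rows are hypothetical.

```python
def stale_processes(ps_output, name, min_age_seconds):
    """Return (pid, elapsed_seconds) for matching processes, oldest first."""
    stale = []
    # Skip the header row, then split each row into pid, elapsed, and the rest.
    for line in ps_output.strip().splitlines()[1:]:
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, etimes, args = fields
        if name in args and int(etimes) >= min_age_seconds:
            stale.append((int(pid), int(etimes)))
    return sorted(stale, key=lambda item: item[1], reverse=True)


# Made-up sample rows for illustration only.
SAMPLE = """\
  PID ELAPSED COMMAND
  410   90000 /usr/bin/python3 /usr/local/bin/ansible-playbook site.yaml
  411     120 /usr/bin/python3 /usr/local/bin/ansible-playbook site.yaml
  412  200000 ssh: /root/.ansible/cp/cccc [mux]
"""

print(stale_processes(SAMPLE, "ansible-playbook", 3600))
```

Waiting before looking widens the gap between a freshly started run and a leaked one, so a simple threshold separates them.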
openstackgerritMerged openstack/project-config master: release-scripts: Remove misleading error message  https://review.opendev.org/c/openstack/project-config/+/76935316:11
fungi#status log rebooted ethercalc.o.o because the server hung some time in the past 24 hours16:20
openstackstatusfungi: finished logging16:20
fungilooks like we don't have the new ethercalc server in cacti16:21
*** lpetrut has quit IRC16:21
openstackgerritJeremy Stanley proposed opendev/system-config master: Clean up ethercalc server replacement transition  https://review.opendev.org/c/opendev/system-config/+/76939616:29
fungithat should solve it ^16:29
clarkbfungi: https://zuul.opendev.org/t/openstack/build/d72fc062ed96415a8d62dade5503a212/log/gitea99.opendev.org/docker/gitea-docker_gitea-ssh_1.txt#5 is at least one of the problems preventing that gitea 1.13.1 upgrade from succeeding. Any idea what sshd is trying to tell us there?16:43
clarkbalso really curious why 1.12.5 was fine and 1.13.1 is not. They use the same debian:buster-slim base image16:45
clarkbI believe that file is part of the image not bind mounted in so it should be consistent between those two builds16:46
clarkbya we don't bind mount it in and use an env var to change the listen port16:48
clarkbwe run /etc/s6/openssh/setup to generate host keys if necessary when starting the container which is what I think failed16:49
fungi`docker-compose exec gitea-ssh ssh -Q key` needs to match up with HostKeyAlgorithms16:50
fungihowever we don't seem to set HostKeyAlgorithms in /etc/ssh/sshd_config so it should be the default?16:52
clarkbfungi: right, I guess my confusion is that we're just using debian's docker image and ssh packaging. We don't appear to be changing any of that config. And it works in one case but not another16:52
clarkbyup exactly. that is what has me extra confused (if we were editing the list we likely got it wrong or things changed under us but we don't do that)16:52
fungiright, i'm just trying to work out where the problem lies first16:52
clarkbis it possible buster updated openssh-server builds and broke their default config and no one has noticed because it works ok on an upgrade?16:52
clarkbsince host keys will already exist if you just update openssh-server it won't regen them and this is a new install only problem potentially16:53
fungifairly unlikely for something like that to break in stable16:53
fungibut maybe it's something which broke in an update of the docker images themselves16:53
clarkbmaybe a regression in that setup script?16:53
clarkbor a problem with how we call it directly?16:54
fungii'm not entirely sure how the docker images are built (and i want to say they're not really officially produced by debian, but i don't exactly recall)16:54
clarkboh ya it is possible the config is baked into the image and conflicts with the upstream packaging16:55
clarkb"Debian Developers tianon and paultag" <- are the maintainers listed on docker hub16:55
fungiright, managed by debian developers, but not necessarily produced by the debian project (to be official, builds have to be performed with dsa-controlled infrastructure, et cetera)16:56
clarkbI guess we can try to reproduce by pulling debian:buster-slim, install openssh-server, then run the generate locally. I'll try that16:57
fungias to why it hit one gitea build and not the other, was there a significant gap in when those builds ran?16:59
clarkbI haven't checked but they were both in the queues at roughly the same time, but maybe a few minutes made all the difference here17:01
clarkbI think I may see it, need a few minutes to track it all down though17:04
fungican't wait for the deduction17:05
clarkbwe're reusing the ssh config from the upstream gitea docker images17:07
clarkband that is where sshd_config comes from17:07
clarkbgoing to diff 1.25.5 and 1.31.1 contents17:07
clarkber 12.5 and 13.117:07
clarkbthey set CASignatureAlgorithms ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ecdsa-sha2-nistp256@openssh.com,ssh-ed25519,sk-ssh-ed25519@openssh.com,rsa-sha2-512,rsa-sha2-256,ssh-rsa and their images are based on alpine. Ours are based on debian17:11
clarkbthe mismatch must be between debian and alpine openssh-server builds17:11
clarkbsk-ecdsa-sha2-nistp256@openssh.com and sk-ssh-ed25519@openssh.com are the ones debian openssh doesn't like I think17:12
fungiyep, that sounds entirely plausible17:12
clarkbthose are the new u2f key types. Is buster too old to support them?17:13
fungithe sk key variants are for fido u2f "security keys" looks like17:14
clarkbok I'm going to do the hacky thing and do a replacement that should work with debian just to see if we get a working system from that. Then we can decide how to unhack this from there17:14
fungiadded in openssh 8.2 looks like17:14
fungibuster ships openssh 7.9 but has 8.4 in backports17:15
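[editor's sketch] The mismatch worked out above amounts to a set difference between the algorithms named in the config and the ones the installed OpenSSH understands. A minimal example: the configured list is the CASignatureAlgorithms value quoted earlier in the log, while the supported list is a hypothetical stand-in for what an older build without the sk- variants might report. The function name is also hypothetical.

```python
# CASignatureAlgorithms value quoted in the log above.
CONFIGURED = (
    "ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,"
    "sk-ecdsa-sha2-nistp256@openssh.com,ssh-ed25519,"
    "sk-ssh-ed25519@openssh.com,rsa-sha2-512,rsa-sha2-256,ssh-rsa"
)

# Assumption: what an OpenSSH build predating the sk- key types supports.
SUPPORTED_OLD = [
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
    "ssh-ed25519",
    "rsa-sha2-512",
    "rsa-sha2-256",
    "ssh-rsa",
]


def unsupported_algorithms(configured, supported):
    """Return configured algorithm names missing from the supported list."""
    supported_set = set(supported)
    return [name for name in configured.split(",")
            if name and name not in supported_set]


print(unsupported_algorithms(CONFIGURED, SUPPORTED_OLD))
```

With these inputs the two sk- entries are the only ones flagged, which matches the guess made in the conversation.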
clarkboh hrm should we maybe just install a newer openssh17:15
clarkbthis is only for replication so adding fancy features like that for users doesn't help us much I don't think17:16
fungiright, i think whichever is the simpler solution is what gets my vote17:17
openstackgerritClark Boylan proposed opendev/system-config master: Update gitea to 1.13.1  https://review.opendev.org/c/opendev/system-config/+/76922617:24
clarkbI went with backports because that avoids needing to modify upstream things17:24
fungialso that's using the openssh candidate for the next debian stable version (bullseye), so fewer surprises when we update the image to it i guess17:26
fungiis repo.projects the key for the kanban bits?17:32
clarkbfungi: ya poorly named. Writing that change was like 60% figuring that out17:33
clarkbit jumped out to me because the repo html template added a new projects thing and I had to dig from there to figure out what it actually is17:34
clarkbI'm 95% sure it's the kanban project management stuff17:34
fungiclarkb: there's an e-mail confirmation probe from inmotion in the infra-root inbox, i assume that's you17:36
clarkbit is17:36
clarkbhowever, now it wants a credit card and I'm letting them know that I will not do that after past experience with $cloud17:36
clarkbso I'm just gonna let it sit in a half created state until they get back to me on next steps I think17:37
fungiyeah, sensible17:37
fungiit's not uncommon for billing departments to get their wires crossed and auto-charge folks for supposedly donated resources17:38
corvusclarkb: who?17:38
corvusi mean what are you working on now?17:38
clarkbcorvus: oh this is the inmotion hosted private cloud thing we talked about last year17:38
corvusok thanks :)17:39
clarkbsounds like they are close to being able to spin something up and have asked us to create an account so I was pushing on that17:39
fungi"InMotion Flex Metal Cloud"17:39
clarkbsounds like I add them as owners on the account too then they can sort out billing so I've done that and am waiting for further instructions17:44
clarkbI've updated the usual location with the details (those we have so far)17:51
clarkbsounds like we need further setup on their side though17:51
corvusi may be 5m late for mtg18:35
*** whoami-rajat__ has quit IRC18:39
clarkbI expect it will be pretty informal and just a catch up on things18:41
fungii'm happy to also be 5m late if that helps ;)18:42
openstackgerritJeremy Stanley proposed opendev/engagement master: Initial commit  https://review.opendev.org/c/opendev/engagement/+/72929319:33
fungii'm getting some very minor discrepancies in subsequent runs of ^ for the same time periods, so adding more debugging output to see if i can work out whether gerrit is actually returning unstable query results19:48
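[editor's sketch] One way to look for the kind of discrepancy described above is to compare the identifiers returned by two runs over the same period. A minimal example, not taken from the engagement tool itself: it assumes each run yields a list of change numbers, and the function name and sample numbers are hypothetical.

```python
def diff_runs(first, second):
    """Report identifiers that appear in only one of two query runs."""
    first_ids, second_ids = set(first), set(second)
    return {
        "only_in_first": sorted(first_ids - second_ids),
        "only_in_second": sorted(second_ids - first_ids),
    }


# Made-up change numbers for illustration only.
print(diff_runs([1, 2, 3], [2, 3, 4]))
```

Two empty lists in the result would mean both runs returned the same set, pointing away from unstable query results as the cause.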
*** rosmaita has joined #opendev20:02
openstackgerritlotorev vitaly proposed zuul/zuul-jobs master: Clarity tox_environment accepts dictionary not list  https://review.opendev.org/c/zuul/zuul-jobs/+/76943322:33
openstackgerritlotorev vitaly proposed zuul/zuul-jobs master: Clarity tox_environment accepts dictionary not list  https://review.opendev.org/c/zuul/zuul-jobs/+/76943322:35
openstackgerritlotorev vitaly proposed zuul/zuul-jobs master: Document Python siblings handling for tox role  https://review.opendev.org/c/zuul/zuul-jobs/+/76882322:35
*** tkajinam has joined #opendev23:01
*** diablo_rojo has joined #opendev23:48
