Friday, 2021-02-12

ianwyeah, rsa key seems to do the same thing00:00
clarkbjust talking out loud here: other things that can cause ssh to fail include the shell not being set properly (wrong path, not being installed, etc). Permissions on the authorized_keys file. What else? But you ruled those issues out by using the key to login from elsewhere00:02
clarkbif it was source side permissions on the key itself it wouldn't offer it at all and you said it was being offered00:03
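(A minimal sketch of the checks clarkb lists above, assuming a stock OpenSSH server on the target; the user, host, and key path are placeholders:)

    # verbose client output shows which key is offered and where auth stops
    ssh -vvv -i ~/.ssh/id_rsa someuser@target.example.org true
    # on the target: the login shell recorded in passwd must exist and be executable
    getent passwd someuser
    # authorized_keys and the directories above it must not be group/world writable
    ls -ld /home/someuser /home/someuser/.ssh /home/someuser/.ssh/authorized_keys
    # sshd usually logs the concrete reason for a rejection
    sudo tail -n 50 /var/log/auth.log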
ianwyeah, the user was generated by ansible, same as all the other users00:06
fungicorvus: yeah, it does appear that gerrit starts timing out worker threads and throwing write errors in that situation, at least judging from the exceptions raised in the log about a subset of the pushes from that series. how bad it gets may also depend on background load on the system00:09
fungiianw: agreed, it's possible the openssh client on trusty may have trouble with an openssh server on focal. i saw something similar trying to talk to very old gerrit (mina-sshd) from a focal openssh client too. i expect it comes down to deprecated ciphers/hashes in focal00:11
ianwi've built an OpenSSH_7.6p1 in /tmp and it still doesn't work00:12
ianwbut it is linked against its old openssl00:12
fungiright, i think it's likely more to do with what openssl supports or doesn't00:12
fungisince openssh is relying on it for cryptographic primitives00:13
fungii suppose we could snapshot the server and then try an in-place upgrade to xenial00:14
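(A sketch of how an old-client/new-server algorithm mismatch can be confirmed; the host is a placeholder and the exact algorithm names vary by release:)

    # what this client build can negotiate at all
    ssh -Q kex; ssh -Q cipher; ssh -Q key
    # the verbose handshake shows the server's offer and where negotiation dies
    ssh -vvv someuser@focal-server.example.org true 2>&1 | grep -iE 'kex|cipher|no matching'
    # on the focal side, dump the effective sshd policy for comparison
    sudo sshd -T | grep -iE 'kexalgorithms|ciphers|macs|pubkeyacceptedkeytypes'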
corvushappy [utc] lunar new year!00:29
fungiand to you!00:36
*** tosky has quit IRC00:36
corvusclarkb: npr talks krz: https://www.npr.org/2021/02/11/966499158/reading-the-game-kentucky-route-zero00:39
clarkbcorvus: it's the sort of game that non-gamers can get into too, if you are interested00:46
clarkbdoesn't require you to react quickly or figure out a controller to perform coordinated tasks00:46
*** mlavalle has quit IRC01:01
*** DSpider has quit IRC01:08
openstackgerritGoutham Pacha Ravi proposed opendev/yaml2ical master: Add third week meetings schedule  https://review.opendev.org/c/opendev/yaml2ical/+/77530401:24
ianwso it turned out to be me misreading user names01:33
ianwsigh01:33
ianwfungi: do we need the various /homes on wiki server backed up?01:33
fungidoubtful but i'll take a quick look01:36
*** hemanth_n has joined #opendev01:52
fungiianw: other than those usernames being a trip down memory lane, i don't see anything important to hold onto (it's all old downloads of wiki source, old copies of configs, et cetera)02:18
ianwthanks, i'll probably just prune them.  i need to put the db in too02:30
ianwsorry, a little distracted, we just went back into a 5 day lockdown due to the UK strain getting out :(02:31
fungioof02:33
*** ysandeep|out is now known as ysandeep|rover02:42
fungifrom what i gather it's already running rampant here, we're more concerned about the south african strain at this point02:48
ianw:(02:54
*** dviroel has quit IRC03:09
ysandeep|rover#opendev We're still seeing some limestone mirror related RETRY_LIMIT failures  https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_99a/775067/1/check/openstack-tox-py37/99aa3cd/job-output.txt03:35
fungithose are almost certainly the continued random missing ipv4 route problem; we abort builds pretty much first thing if they hit that, but if the build had already retried twice before (maybe it got extremely lucky and hit that problem three times in a row, or maybe it's a job which already tends to crash nodes at random some of the time) and then ran into that on its third try, it would show up as RETRY_LIMIT03:42
fungioh, looking at that log, the failure was something different03:43
fungithat's acting like the mirror there is having trouble getting some things in afs03:44
ysandeep|roverhmm, Failed to fetch https://mirror.regionone.limestone.opendev.org/ubuntu/dists/bionic/universe/binary-amd64/Packages  403  Forbidden [IP: 2607:ff68:100:54:f816:3eff:feb5:4635 443]03:44
fungils: cannot open directory '/afs/openstack.org/': Connection timed out03:45
fungi[Fri Feb 12 02:02:13 2021] afs: Lost contact with volume location server 23.253.200.228 in cell openstack.org (code -1)03:45
fungi[Fri Feb 12 02:03:08 2021] afs: Lost contact with volume location server 104.130.136.20 in cell openstack.org (code -1)03:45
fungiipv4 network connectivity problems there?03:45
fungiit can ping them03:46
fungitrying to restart openafs-client on it now but it's just hanging03:48
fungii'll try rebooting the mirror03:48
ysandeep|roverack03:48
fungithe reboot may take a minute to give up on the afs client03:49
fungithere it goes03:49
fungiit's booted back up03:51
fungi[Fri Feb 12 03:54:21 2021] afs: network error for 104.130.136.20:7003: origin 0 type 3 code 10 (Destination Host Prohibited)03:55
funginow it's saying that for both of them too03:55
fungiand still not connecting03:55
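(A sketch of the client-side AFS checks in play here; the service name assumes the Debian/Ubuntu openafs-client packaging:)

    # which db/file servers does this cache manager currently consider down?
    fs checkservers
    # kernel-side AFS messages (the "Lost contact ..." lines above come from here)
    dmesg -T | grep -i afs | tail -n 20
    # restarting the cache manager can hang if /afs is wedged; a reboot is then quicker
    sudo systemctl restart openafs-client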
fungium, our other mirrors are saying the same thing03:56
ysandeep|roverfungi.. fyi i just noticed another failure with different mirror03:56
ysandeep|roverhttps://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9b6/775310/3/check/tripleo-validations-centos-8-molecule-ceph/9b6b259/job-output.txt03:56
ysandeep|rover~~~03:56
ysandeep|rovererror: Status code: 403 for https://mirror.kna1.airship-citycloud.opendev.org/centos/8/AppStream/x86_64/os/repodata/repomd.xml (IP: 188.212.109.26) (https://mirror.kna1.airship-citycloud.opendev.org/centos/8/AppStream/x86_64/os/repodata/repomd.xml).03:56
ysandeep|rover~~~03:56
fungiat least ovh gra103:56
fungiwhere i just tried03:56
fungiysandeep|rover: yes, i think something has just happened to our fileservers03:56
fungii can ssh into both of them, they've been up for weeks since last reboots03:57
fungioh, that's the afs db servers everything's complaining about, not the fileservers03:59
fungithey're both reachable by ssh and up for over a week03:59
fungilooks like everything lost contact with them at 02:05 utc04:00
funginothing interesting in dmesg since their last reboots04:01
fungii can get to them via ipv4 as well04:01
fungiianw: any guesses as to what to check? syslog is basically quiet as well04:06
fungiif you filter out the constant snmp noise anyway04:06
fungiahh, we've got separate service logs under /var/log/openafs/04:06
fungibut they're also no help, basically nothing in them since the most recent reboots when those services started04:08
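(A sketch of the server-side checks, assuming the Debian/Ubuntu openafs-dbserver layout; the host name is a placeholder:)

    # per-service logs (BosLog, VLLog, PtLog) live here with these packages
    ls -l /var/log/openafs/
    # confirm the db processes are running and bound to their UDP ports
    pgrep -a bosserver; pgrep -a vlserver; pgrep -a ptserver
    ss -ulpn | grep -E ':700[0-9]'
    # probe the vlserver port (7003) from a client to see whether packets get through
    rxdebug afsdb01.example.org 7003 -version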
fungiianw: ansible updated our iptables rules just before everything lost contact04:09
fungii think the client errors about "Destination Host Prohibited" are literal04:10
fungi22:46 <openstackgerrit> Merged opendev/system-config master: Refactor AFS groups https://review.opendev.org/c/opendev/system-config/+/77505704:11
fungii bet it was deploying that04:11
funginow to figure out what we were previously allowing on the db servers which we suddenly blocked04:12
fungiiptables_extra_public_udp_ports: [7000,7001,7002,7003,7004,7005,7006,7007]04:15
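(A sketch of how the restored rules can be verified on a db server; the chain layout is whatever the ansible role generates, so the grep is intentionally loose:)

    # list INPUT rules with packet counters and look for the AFS UDP ports
    sudo iptables -L INPUT -n -v --line-numbers | grep -E '700[0-7]'
    # "Destination Host Prohibited" on the client side corresponds to a REJECT rule
    # (--reject-with icmp-host-prohibited), so check whether one is matching instead
    sudo iptables -L INPUT -n -v | grep -i reject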
fungii'm going to temporarily put these servers into the emergency disable list for ansible04:16
fungiokay, i think everything's back up04:20
fungi#status log Added afsdb01 and afsdb02 servers to emergency disable list and added back missing public UDP ports in firewall rules while we work out what was missing from 77505704:21
openstackstatusfungi: finished logging04:21
fungiysandeep|rover: i *think* everything should be back to normal now04:22
fungiianw: do we need to rename inventory/service/group_vars/afs.yaml and inventory/service/group_vars/afsdb.yaml to match the new group names?04:25
ysandeep|roverfungi thank you, ++04:26
openstackgerritJeremy Stanley proposed opendev/system-config master: Update AFS group vars filenames  https://review.opendev.org/c/opendev/system-config/+/77531104:29
fungiianw: ysandeep|rover: ^ i think that's the longer term fix04:29
fungiunless i'm misunderstanding how these pieces fit together04:29
ysandeep|roverfungi, thanks! I am rechecking failed patches, i will report here if i still find some issues.04:31
fungiappreciated! and thanks for letting us know you were seeing a problem04:32
ysandeep|roverthanks for fixing issues so quickly :)04:33
*** ykarel_ has joined #opendev04:51
*** ykarel_ is now known as ykarel05:59
*** marios has joined #opendev06:20
*** rchurch has quit IRC06:24
*** eolivare has joined #opendev06:55
*** slaweq has joined #opendev07:11
ianwarrrgghhh terribly sorry to step away and leave that07:11
openstackgerritIan Wienand proposed opendev/system-config master: Update AFS group vars filenames  https://review.opendev.org/c/opendev/system-config/+/77531107:27
ianwoh, the afs servers survived because they're still in emergency07:33
*** sboyron_ has joined #opendev07:45
*** ykarel_ has joined #opendev07:47
*** ralonsoh has joined #opendev07:50
*** ykarel has quit IRC07:50
*** hashar has joined #opendev07:54
*** ysandeep|rover is now known as ysandeep|lunch07:56
*** rpittau|afk is now known as rpittau08:02
*** andrewbonney has joined #opendev08:13
*** tosky has joined #opendev08:24
fungiyep08:33
fungii realized that when i went to add these08:33
fungiwe can peel them back carefully and make sure things still work08:33
fungiand no apologies needed, these things happen08:33
*** jpena|off is now known as jpena08:56
*** ysandeep|lunch is now known as ysandeep|rover08:59
*** DSpider has joined #opendev09:44
*** redrobot9 has joined #opendev10:26
*** redrobot has quit IRC10:27
*** redrobot9 is now known as redrobot10:27
*** ykarel_ is now known as ykarel10:37
*** ysandeep|rover is now known as ysandeep|afk11:18
*** dviroel has joined #opendev11:22
*** sshnaidm|afk has quit IRC11:41
*** dtantsur|afk is now known as dtantsur11:41
*** ysandeep|afk is now known as ysandeep|rover11:42
*** sshnaidm|afk has joined #opendev11:49
*** sshnaidm|afk is now known as sshnaidm|off11:50
*** hashar has quit IRC12:03
*** fressi has joined #opendev12:06
*** jpena is now known as jpena|lunch12:31
openstackgerritSorin Sbârnea proposed zuul/zuul-jobs master: update-json-file: avoid failure when destination does not exists  https://review.opendev.org/c/zuul/zuul-jobs/+/77537312:33
*** hashar has joined #opendev12:42
*** hemanth_n has quit IRC12:45
openstackgerritSorin Sbârnea proposed zuul/zuul-jobs master: update-json-file: avoid failure when destination does not exists  https://review.opendev.org/c/zuul/zuul-jobs/+/77537313:22
openstackgerritSorin Sbârnea proposed zuul/zuul-jobs master: update-json-file: avoid failure when destination does not exists  https://review.opendev.org/c/zuul/zuul-jobs/+/77537313:26
*** jpena|lunch is now known as jpena13:30
*** ysandeep|rover is now known as ysandeep|mtg13:32
*** d34dh0r53 has quit IRC13:46
*** d34dh0r53 has joined #opendev13:54
*** mlavalle has joined #opendev14:00
*** ysandeep|mtg is now known as ysandeep14:05
*** ysandeep is now known as ysandeep|away14:07
*** fressi has quit IRC14:22
*** d34dh0r53 has quit IRC14:45
*** d34dh0r53 has joined #opendev14:45
*** rpittau is now known as rpittau|afk15:03
*** ykarel has quit IRC15:42
*** ykarel has joined #opendev15:42
*** roman_g has joined #opendev15:45
*** lbragstad_ has joined #opendev15:46
fricklerslaweq: I have a held node for you on https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_881/773670/3/check/neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid/88171f4/ can you remind me of your ssh key once more? (we really should make a list of those somewhere)15:47
roman_gHello team. Is there any maintenance on CityCloud mirror VM? We are getting either "unable to connect" when trying to apt-get install packages from mirror.kna1.airship-citycloud.opendev.org, or HTTP 403, or something like that.15:47
slaweqfrickler: http://paste.openstack.org/show/802603/15:47
slaweqthx a lot15:47
fricklerslaweq: root@172.99.69.133 , let us know how it goes15:49
*** lbragstad has quit IRC15:50
fricklerroman_g: there was an issue about 12h ago, are you looking at old logs or is that still happening for you now?15:50
roman_gfrickler haha, I see old issues. Need to investigate latest ones.15:52
roman_gThank you.15:52
slaweqfrickler: sure, thx a lot15:55
roman_ghaha -> aha, sorry15:56
*** sboyron_ has quit IRC15:58
*** LowKeys has joined #opendev16:03
LowKeysHi morning16:04
LowKeysi've a question: how to fix this issue during git clone: error: RPC failed; curl 56 GnuTLS recv error (-9): Error decoding the received TLS packet.16:04
*** mlavalle has quit IRC16:08
*** marios is now known as marios|call16:08
*** mlavalle has joined #opendev16:09
clarkbLowKeys: what are you cloning against?16:14
clarkb(the url would be helpful)16:15
LowKeysclarkb: i do git clone https://opendev.org/openstack/openstack-ansible16:15
clarkbok as a quick sanity check I've cloned that repo from all three gitea backends directly16:18
clarkbthat seems happy at least16:18
clarkbcacti also shows things look happy so no clues there16:20
LowKeysok, problem solved. i think it was a connection issue; i changed the public ip and that fixed it16:23
clarkbya, looks like curl error 56 typically indicates a network issue16:25
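(A sketch of how a failure like that can be narrowed down; the repository URL is the one from the question, the rest is generic:)

    # watch the HTTP/TLS exchange git is doing under the hood
    GIT_CURL_VERBOSE=1 git clone https://opendev.org/openstack/openstack-ansible
    # test the smart-HTTP endpoint directly, bypassing git
    curl -sv 'https://opendev.org/openstack/openstack-ansible/info/refs?service=git-upload-pack' -o /dev/null
    # a shallow clone moves far less data and can sidestep a flaky connection
    git clone --depth 1 https://opendev.org/openstack/openstack-ansible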
*** lbragstad_ is now known as lbragstad16:25
slaweqfrickler: I have to leave now for some time, but I will continue my debugging in a few hours if that isn't a big problem; later I will tell you on this channel when the vm can be deleted16:29
slaweqI hope it's fine for you16:30
*** marios|call is now known as marios16:42
fricklerslaweq: sure, take your time, can also be a couple of days, no problem16:50
*** ykarel is now known as ykarel|away16:54
fungiroman_g: yeah, there was a ~2hr period around 02:00-04:00 where we accidentally merged some incorrect configuration management and it removed firewall rules from part of our distributed storage backend for the package mirrors, sorry about that. job volume was low enough at that time i opted to just mention something in the status log and not spam all the irc channels16:58
*** ykarel|away has quit IRC17:06
*** marios is now known as marios|out17:07
*** jpena is now known as jpena|brb17:13
LowKeysclarkb: yes, thank you btw17:16
*** marios|out has quit IRC17:18
*** hashar has quit IRC17:18
*** d34dh0r53 has quit IRC17:21
*** eolivare has quit IRC17:30
*** gmann is now known as gmann_afk17:40
roman_gfungi Thank you.17:42
*** andrewbonney has quit IRC18:02
*** jpena|brb is now known as jpena18:03
*** LowKeys has quit IRC18:04
*** hamalq has joined #opendev18:36
*** dtantsur is now known as dtantsur|afk18:39
*** roman_g has quit IRC18:47
*** gmann_afk is now known as gmann18:52
*** jpena is now known as jpena|off18:58
fungistill watching http://travaux.ovh.net/?do=details&id=48997 to determine when it's safe to merge https://review.opendev.org/775209 but they haven't marked the incident as resolved yet (last update was yesterday... not sure if those timestamps are utc or cst)19:42
*** hashar has joined #opendev20:19
*** auristor has quit IRC20:21
*** auristor has joined #opendev20:22
*** ralonsoh has quit IRC20:37
clarkbfungi: it seems like things are working if we want to just go for it, but ya, the indication that stuff was still being fixed made me decide not to push it20:43
*** hashar has quit IRC20:46
slaweqfrickler: clarkb: thx a lot for that host, I think I found the problem there. You can delete node 172.99.69.133 now20:50
slaweqand also have a great weekend :)20:50
*** slaweq has quit IRC21:00
fungilooks like that hold was removed21:18
*** roman_g has joined #opendev21:21
*** roman_g has quit IRC21:21
*** mlavalle has quit IRC21:22
*** mlavalle has joined #opendev21:35
*** klonn has joined #opendev22:14
*** hamalq has quit IRC23:51
