Wednesday, 2020-10-28

openstackgerritIan Wienand proposed opendev/zone-opendev.org master: Fix RAX ORD internal mirror typo  https://review.opendev.org/75997400:01
openstackgerritMerged opendev/zone-opendev.org master: Fix RAX ORD internal mirror typo  https://review.opendev.org/75997400:14
*** hamalq has quit IRC00:26
openstackgerritIan Wienand proposed opendev/system-config master: reprepo: enable cron jobs  https://review.opendev.org/75996500:29
openstackgerritIan Wienand proposed opendev/system-config master: reprepro: deploy Ubuntu keys  https://review.opendev.org/75997500:29
openstackgerritIan Wienand proposed opendev/system-config master: Remove mirror-update server and related puppet  https://review.opendev.org/75997600:43
ianw$ host mirror-int.ord.rax.opendev.org00:43
ianwmirror-int.ord.rax.opendev.org is an alias for mirror01-int.ord.rax.opendev.org.00:43
ianwmirror01-int.ord.rax.opendev.org has address 10.209.128.5700:43
fungilgtm now00:54
*** Green_Bird has joined #opendev00:57
*** Green_Bird has quit IRC00:57
*** Goneri has quit IRC01:03
openstackgerritMerged opendev/system-config master: Generate internal certs for RAX ORD mirror  https://review.opendev.org/75997101:09
*** kindwindfall has quit IRC01:40
openstackgerritMerged opendev/system-config master: reprepro: deploy Ubuntu keys  https://review.opendev.org/75997501:40
ianw[Wed Oct 28 01:44:39 UTC 2020] Verifying: mirror01-int.ord.rax.opendev.org02:01
ianw[Wed Oct 28 01:44:43 UTC 2020] mirror01-int.ord.rax.opendev.org:Verify error:During secondary validation: Incorrect TXT record02:01
ianwi'm not sure what the deal is with that :/02:01
ianwlooks like LE does multiple lookups and one of them failed, which is weird02:06
ianwacme.opendev.org is showing the right txt records02:06
clarkbdid it race?02:10
clarkbor maybe we have the old txt there too for the existing name and it doesn't see that as valid currently?02:10
ianw[Wed Oct 28 01:44:50 UTC 2020] review-dev.opendev.org:Verify error:During secondary validation: Incorrect TXT record02:11
ianwso same thing happened for review-dev02:11
ianw   "cmd": "rndc reload acme.opendev.org",02:13
ianwit says "zone reload queued"02:13
ianwbut i've never known it to take that long02:14
clarkbperhaps the lag was in syncing to the ns servers?02:15
ianwOct 28 01:44:34 adns1 named[15038]: client @0x7fdfd030d300 104.239.140.165#56334/key tsig (acme.opendev.org): transfer of 'acme.opendev.org/IN': AXFR started: TSIG tsig (serial 1603849472)02:16
ianwOct 28 01:44:34 adns1 named[15038]: client @0x7fdfd030d300 104.239.140.165#56334/key tsig (acme.opendev.org): transfer of 'acme.opendev.org/IN': AXFR ended02:16
ianwOct 28 01:52:19 adns1 named[15038]: client @0x7fdfd04e3790 162.253.55.16#35254/key tsig (acme.opendev.org): transfer of 'acme.opendev.org/IN': AXFR started: TSIG tsig (serial 1603849472)02:16
ianwOct 28 01:52:19 adns1 named[15038]: client @0x7fdfd04e3790 162.253.55.16#35254/key tsig (acme.opendev.org): transfer of 'acme.opendev.org/IN': AXFR ended02:16
ianwi think that says that ns2 got the update some 10 minutes later02:17
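The timing gap is easier to see with the transfer starts pulled out of the named log. A minimal sketch, assuming log lines shaped exactly like the ones pasted above (the `axfr_starts` name and the field positions are assumptions, not part of any opendev tooling):

```shell
# Print "time client-ip serial" for each "AXFR started" line read from stdin.
# Field positions assume the named log layout shown in the paste above.
axfr_starts() {
    awk '/AXFR started/ {
        split($8, client, "#")      # drop the "#port/key" suffix from the client field
        serial = $NF
        sub(/\)$/, "", serial)      # drop the closing parenthesis after the serial
        print $3, client[1], serial
    }'
}
```

Piping the log through it (for example `axfr_starts < /var/log/syslog`, path assumed) gives one line per secondary, so transfers of the same serial that started minutes apart stand out.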
ianwmaybe we have to manually run a transfer on ns1/ns2 to be sure?  i haven't seen this before02:19
clarkbYa I'm not sure why it would stagger them like that. It could be a load balancing thing done for larger zones?02:19
ianwOct 28 01:44:34 ns2 nsd[18851]: notify for acme.opendev.org. from 2001:4800:7819:104:be76:4eff:fe04:43d0 refused, no acl matches.02:20
ianwOct 28 01:44:34 ns2 nsd[2112]: [2020-10-28 01:44:34.796] nsd[18851]: info: notify for acme.opendev.org. from 2001:4800:7819:104:be76:4eff:fe04:43d0 refused, no acl matches.02:20
ianwi wonder if we don't have ipv6 configured02:21
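To check which notify sources a secondary is turning away, the refused lines can be reduced to unique zone and source pairs. A rough sketch, assuming the nsd message wording shown in the two lines above (the `refused_notifies` name is made up for illustration):

```shell
# Read nsd log lines on stdin and print unique "zone source-address" pairs
# for NOTIFY messages that were refused because no ACL matched.
refused_notifies() {
    sed -n 's/.*notify for \([^ ]*\) from \([^ ]*\) refused, no acl matches.*/\1 \2/p' |
        sort -u
}
```

Both log variants pasted above (with and without the bracketed timestamp) collapse to the same pair, so a single IPv6 address in the output points at one missing ACL entry rather than several.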
openstackgerritIan Wienand proposed opendev/system-config master: nameserver: Allow master server to notify via ipv6  https://review.opendev.org/75998802:34
ianwi think maybe we've just been lucky with timing?  unless something in particular changed02:35
clarkbdoes the ipv6 log happen earlier when we expect?02:37
ianwyeah, the timestamps line up02:37
ianwwe issued the reload at 01:44:5002:37
clarkbchange seems fine fwiw02:38
ianwOct 28 01:44:34 adns1 named[15038]: received control channel command 'reload acme.opendev.org'02:40
ianw Oct 28 01:44:34 ns2 nsd[18851]: notify for acme.opendev.org. from 2001:4800:7819:104:be76:4eff:fe04:43d0 refused, no acl matches02:40
ianwso yeah, same second02:40
ianwbut  [Wed Oct 28 01:44:50 UTC 2020] review-dev.opendev.org:Verify error:During secondary validation: Incorrect TXT record02:40
ianwperhaps we should put like a 1 minute sleep in there at least02:41
openstackgerritIan Wienand proposed opendev/system-config master: letsencrypt-install-txt-record: pause after adding TXT records  https://review.opendev.org/75999102:45
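An alternative to a fixed pause would be to poll each secondary until it serves the new TXT value. This is only a sketch of that idea, not what the change above does; `wait_for_txt`, the `DIG` override and the host names in the usage line are all hypothetical:

```shell
# DIG can be overridden so the loop can be exercised without network access.
DIG=${DIG:-dig}

# wait_for_txt NAME EXPECTED TIMEOUT_SECONDS SERVER...
# Returns 0 once every server answers with EXPECTED, 1 if the shared
# timeout runs out first.
wait_for_txt() {
    name=$1 expected=$2 timeout=$3
    shift 3
    waited=0
    for server in "$@"; do
        # dig +short prints TXT data wrapped in double quotes
        until "$DIG" +short TXT "$name" "@$server" | grep -qxF "\"$expected\""; do
            [ "$waited" -ge "$timeout" ] && return 1
            sleep 5
            waited=$((waited + 5))
        done
    done
}
```

Usage would look something like `wait_for_txt _acme-challenge.example.org "$token" 120 ns1.example.org ns2.example.org`. Querying the secondaries directly matters here because the validation failure came from one nameserver lagging behind the others.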
openstackgerritIan Wienand proposed opendev/system-config master: nameserver: Allow master server to notify via ipv6  https://review.opendev.org/75998803:10
openstackgerritIan Wienand proposed opendev/system-config master: letsencrypt-install-txt-record: pause after adding TXT records  https://review.opendev.org/75999103:10
*** tkajinam has quit IRC03:10
openstackgerritIan Wienand proposed opendev/system-config master: mirror: remove old ceph links  https://review.opendev.org/76004003:19
*** lamt has quit IRC03:22
*** rchurch has quit IRC03:33
*** rchurch has joined #opendev03:34
openstackgerritMerged opendev/system-config master: ARM64 : run base test on Focal too  https://review.opendev.org/75662903:57
*** sean-k-mooney has quit IRC04:16
*** sean-k-mooney has joined #opendev04:16
*** ykarel has joined #opendev04:24
*** fressi has joined #opendev04:33
*** fressi has quit IRC04:43
fungiokay, awake again... are we okay for a quick gerrit restart now before the call starts?04:51
clarkbI guess so? I'm around04:54
fungiokay, restarting04:55
fungidowned04:55
fungiup -d'd04:55
clarkbfungi: when gerrit is done you may want to catch up on the conversation we had around zone transfers and allowed ip addrs04:56
clarkbtldr we think LE may be failing because it's trying to use ipv6 and it falls back to ipv4 too slowly for LE verification to work04:56
ianwyou'd think we would have noticed by now, but yeah, maybe something changed?04:57
clarkbI know on some clouds the RAs happen late enough that you don't get them immediately on boot04:58
fungigerrit seems to be up again04:58
clarkbmaybe unattended upgrades caused the server to restart and notice it has an ipv6 address it can speak from now?04:58
ianwyeah, it hasn't rebooted in ... a long time04:59
fungi"notify for acme.opendev.org. from 2001:4800:7819:104:be76:4eff:fe04:43d0 refused, no acl matches" suggests that our secondary nameservers are not configured to trust zone update notifications from our primary04:59
fungior at least not from its v6 addy05:00
clarkbfungi: yup ianw has a change up to fix that05:00
fungiahh, okay05:00
funginow i see it05:00
fungialso i guess call is starting05:00
clarkbyup I'm there :)05:00
ianwit seems everything is having fun with the new cryptography release05:00
clarkbdid they break things?05:01
ianwhttp://paste.openstack.org/show/gYsegN1rbLIlCYAnR7Dj/05:15
ianwclarkb/fungi: http://paste.openstack.org/show/IpLdtZECcqsncXyThBGa/05:18
fungi#status log restarted gerrit at 04:55 to pick up its-storyboard plugin config update05:25
openstackstatusfungi: finished logging05:25
fungiopenstackgerrit seems to have gone silent05:44
fungii wonder if the gerrit restart has confuddled it. will restart05:44
fungi2020-10-28 04:55:41,410 ERROR gerrit.GerritWatcher: Exception con05:45
fungisuming ssh event stream:05:46
fungiyeah, restarting it now05:46
*** openstackgerrit has quit IRC05:46
fungi#status log restarted gerritbot which got confused reading the event stream while gerrit was restarted05:47
openstackstatusfungi: finished logging05:47
fungihttps://review.opendev.org/760051 Document dual account split for Gerrit admins05:49
*** openstackgerrit has joined #opendev05:55
openstackgerritIan Wienand proposed opendev/system-config master: mirror-update/reprepro : use common functions  https://review.opendev.org/75869505:55
openstackgerritMerged opendev/system-config master: mirror: remove old ceph links  https://review.opendev.org/76004006:08
*** ysandeep|away is now known as ysandeep|ruck06:08
*** DSpider has joined #opendev06:34
fungibefore i go to bed, just briefly marvelling at how polite robots are to one another, if only people were as cordial: https://github.com/openstack/openstack-helm-images/pull/106:34
fungias robograndpa always said, "you'll catch more rustflies with lubricant than cleaning fluid"06:44
* fungi realizing he's probably mildly delirious wanders off to find sleep06:46
*** tkajinam has joined #opendev06:53
*** marios has joined #opendev06:54
*** sshnaidm|afk is now known as sshnaidm|rover07:00
*** eolivare has joined #opendev07:22
*** sboyron has joined #opendev07:30
*** ykarel has quit IRC07:44
*** ralonsoh has joined #opendev07:45
*** ykarel has joined #opendev07:45
*** rpittau|afk is now known as rpittau07:51
*** slaweq has joined #opendev08:00
*** andrewbonney has joined #opendev08:10
*** lpetrut has joined #opendev08:11
*** webmariner has quit IRC08:35
*** hashar has joined #opendev08:42
*** ykarel_ has joined #opendev08:49
*** ykarel has quit IRC08:52
*** tosky has joined #opendev08:57
*** ysandeep|ruck is now known as ysandeep|lunch09:00
*** openstack has quit IRC09:21
*** openstack has joined #opendev09:22
*** ChanServ sets mode: +o openstack09:22
openstackgerritJens Harbott (frickler) proposed opendev/system-config master: nameserver: Allow master server to notify via ipv6  https://review.opendev.org/75998809:26
*** logan- has quit IRC09:27
*** logan- has joined #opendev09:27
*** ysandeep|lunch is now known as ysandeep|ruck09:45
*** ykarel_ is now known as ykarel09:51
openstackgerritLajos Katona proposed openstack/project-config master: Add publish-to-pypi template to networking-l2gw  https://review.opendev.org/76009609:59
openstackgerritLajos Katona proposed openstack/project-config master: Add publish-to-pypi template to networking-l2gw  https://review.opendev.org/76009610:03
*** marios has joined #opendev10:14
*** ysandeep|ruck is now known as ysandeep|brb10:31
zbrapparently gerrit seems to have problems remembering sessions, it lost my session seconds ago, even if I was already logged in and active.10:41
frickler#status log force-merged https://review.opendev.org/759831 at the request of nova ptl in order to unblock integrated gate10:43
openstackstatusfrickler: finished logging10:43
fricklerzbr: gerrit was restarted yesterday evening, was this the first time you accessed it since then?10:46
fricklerzbr: actually not yesterday but this morning at 04:55 UTC10:46
zbrlike seconds before, anyway lets hope we succeed with the upgrade soon.10:47
*** slaweq has quit IRC10:51
*** ysandeep|brb is now known as ysandeep|ruck10:54
*** slaweq has joined #opendev10:55
*** Green_Bird has joined #opendev11:13
*** lpetrut has quit IRC11:25
*** lpetrut has joined #opendev11:26
openstackgerritMerged opendev/system-config master: nameserver: Allow master server to notify via ipv6  https://review.opendev.org/75998811:26
*** lpetrut has quit IRC11:26
*** lpetrut has joined #opendev11:27
*** dmellado has quit IRC11:38
*** dmellado has joined #opendev11:39
*** ykarel has quit IRC11:47
*** ykarel has joined #opendev11:49
*** Green_Bird has quit IRC12:03
*** Green_Bird has joined #opendev12:04
*** ykarel_ has joined #opendev12:16
*** eolivare has quit IRC12:21
*** hashar has quit IRC12:50
*** hashar has joined #opendev12:56
openstackgerritMerged openstack/project-config master: Use internal address for RAX ORD  https://review.opendev.org/75997213:10
fungizbr: if you have multiple gerrit browser tabs open and the session for one of them expires and you re-login, then the login for the other tabs is invalidated, and will appear not to be logged in the next time you do anything in them (unless you just reload them). then if you try to log one of them in it will invalidate the other login you just completed... the only workarounds i know of are to either close13:11
fungiall but one of your gerrit tabs, or explicitly reload every other gerrit tab if you have to log one of them back in13:11
zbrouch... what can i say.13:12
fungii tend to do the latter if i have multiple gerrit tabs open13:12
zbrthis explains it, i have tons of tabs.13:12
fungiyeah, basically auth will fight between tabs13:12
zbri wonder if newer versions sorted that13:12
fungipolygerrit might, but this problem has been around for as long as i can remember13:13
fungiit's a good idea of something to test on review-test once we run back through the upgrade on it (likely next week)13:15
openstackgerritMerged openstack/project-config master: Add publish-to-pypi template to networking-l2gw  https://review.opendev.org/76009613:16
*** eolivare has joined #opendev13:24
*** sshnaidm|rover has quit IRC13:25
*** ykarel_ has quit IRC13:31
fricklerinfra-root: publish-irc-meetings in opendev-prod-hourly has been queued for 87 hrs according to the status page, that doesn't look right to me13:39
*** sshnaidm|rover has joined #opendev13:39
*** sshnaidm|rover is now known as sshnaidm|mtg13:44
*** mlavalle has joined #opendev14:07
*** d34dh0r53 has quit IRC14:07
*** d34dh0r53 has joined #opendev14:11
*** Goneri has joined #opendev14:48
*** Goneri has quit IRC14:52
fricklerinfra-root: puppet has failed in the deploy pipeline for https://review.opendev.org/759972 with some issues I can't identify14:53
openstackgerritAleksey Zvyagintsev proposed openstack/diskimage-builder master: Allow processing 'focal' ubuntu release in lvm  https://review.opendev.org/76015614:55
fungifrickler: where do you see the deploy failure? or has it not reported yet?15:00
fricklerfungi: ah, wrong link, sorry, https://review.opendev.org/759988 is the one with the failures15:03
fricklerfor the rax mirror change I was going to ask whether it needs some further action in order to get deployed15:04
fungiwhich failure are you looking at for 759988? looks like both infra-prod-remote-puppet-afs and infra-prod-remote-puppet-else had problems15:05
*** ykarel is now known as ykarel|away15:07
fungiinfra-prod-remote-puppet-else seems to have failed installing puppet on grafana01.opendev.org ("Platform not currently supported"), running puppet on ask01.openstack.org ("Error: Systemd start for jetty failed!") and running puppet on openstackid-dev01.openstack.org ("Error: Function lookup() did not find a value for the name 'openstackid_dev_message_broker_host'")15:10
*** lpetrut has quit IRC15:19
fungietherpad is looking really slow for me. is anyone else having trouble? could just be my system is overloaded by ptg browser stuff15:25
fungiscoreboard looks fairly open, so it's probably just me15:28
*** ykarel|away has quit IRC15:41
mrungehi there, is there a process to become stable core in projects?15:51
mrungeI've been stable core in horizon quite some time ago15:51
mrungenow most of the people from telemetry are gone, and it seems there is only one stable core for telemetry left15:52
fungimrunge: it's up to the projects themselves. sounds like a question you might want to (re)ask in #openstack-tc15:53
mrungefungi, ack15:53
fungimr i see zhurong is the ptl for telemetry, and cloudnull/gmann were the openstack tc liaisons for the project last cycle (i think they're in the process of identifying liaisons for the new cycle) so those might also be people to reach out to: https://governance.openstack.org/tc/reference/projects/telemetry.html15:54
fungier, mrunge ^15:54
mrungefungi, it seems I am the PTL for telemetry in this cycle15:55
mrunge:)15:55
fungimrunge: aha, i guess the change to set you hasn't merged yet15:56
mrungeprobably, yes15:56
fungiin that case i can add you to the core review group if none of the existing core reviewers is around to do so. i'll just double-check the governance changes real fast15:56
mrungethat would be great fungi15:56
mrungeyes, sure15:56
gmannmrunge: process is to ask stable core team to add in list but I do not think that team is active anymore. We in TC had some discussion about changing the process in Shanghai PTG but that could not proceed further. I am going to add it in TC PTG etherpad to discuss15:59
clarkbfungi: I believe we're running grafana on docker now so that may need an inventory cleanup?16:00
clarkbnot sure about the other two16:01
fungimrunge: aha, looks like the confusion is my (openstack technical election official) fault, i didn't notice https://review.opendev.org/757971 generated by our governance update script didn't reflect the telemetry election outcome recorded in https://review.opendev.org/75796716:05
fungimrunge: i'll push up a change to fix that real quick16:05
*** sauloasilva1 has joined #opendev16:06
mrungefungi, thank you!16:06
openstackgerritzbr proposed zuul/zuul-jobs master: Add test_setup_reset_connection setting  https://review.opendev.org/65313016:08
fnordahl#os-charms discussing in-flight development and hashing out topics for tomorrows session16:09
fnordahl#os-charms now discussing in-flight development and hashing out topics for tomorrows session16:10
*** chandankumar is now known as raukadah16:23
AJaegerfnordahl: are you sure you're in the right channel?16:27
*** ysandeep|ruck is now known as ysandeep|away16:29
fnordahlAJaeger: I'm positive that I was not, did not post #chanfail in an attempt to dodge that fact but you caught me :)16:33
openstackgerritJens Harbott (frickler) proposed openstack/project-config master: Fix the internal rax mirror name  https://review.opendev.org/76018316:35
mrungefungi, did we have a recent change in the ssh config for review?16:48
mrunge"no mutual signature algorithm" ?16:48
clarkbmrunge: did you recently upgrade to fedora 33?16:49
mrungeyes, I did16:49
fungimrunge: i'm guessing you had a recent change in your operating system ;)16:49
fungiyeah, that16:49
mrungeyes.... sigh16:50
clarkbfedora 33 has decided that the deprecated sha1 hashing for ssh host keys should be disallowed and not just deprecated so that is disabled by default on fedora 3316:50
clarkbyou can reenable it on a per host basis which is our suggestion for now. And you can see yesterday's announcement for a gerrit upgrade in a few weeks for how we'll address that longer term16:50
mrungethank you clarkb , that is a great help16:51
mrungeat least, I have an idea where to look now16:51
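A per-host re-enable along the lines clarkb describes might look like the sketch below. The option name is an assumption: `PubkeyAcceptedKeyTypes` is the OpenSSH 8.4 spelling and is not named anywhere in this discussion, and `ssh_rsa_stanza` is just an illustrative helper:

```shell
# Print a ~/.ssh/config stanza that re-allows ssh-rsa (SHA-1) signatures
# for one host only. PubkeyAcceptedKeyTypes is assumed here; check
# ssh_config(5) for the client version actually in use.
ssh_rsa_stanza() {
    printf 'Host %s\n    PubkeyAcceptedKeyTypes +ssh-rsa\n' "$1"
}
```

For example, `ssh_rsa_stanza review.opendev.org >> ~/.ssh/config` would scope the exception to the Gerrit host and leave the stricter default in place everywhere else.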
*** hamalq has joined #opendev16:51
openstackgerritMerged openstack/project-config master: Fix the internal rax mirror name  https://review.opendev.org/76018316:55
mrungeclarkb, out of curiosity, where was the announcement re. gerrit upgrade sent to? I certainly missed that16:56
fungimrunge: service-announce@lists.opendev.org mailing list16:57
clarkbmrunge: http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html16:57
mrungethank you, /me takes a note on subscribing to that16:57
openstackgerritClark Boylan proposed opendev/system-config master: This updates LE config for the ord mirror to the correct name  https://review.opendev.org/76018516:59
clarkbAJaeger: fungi ^ I think that is the fix we need then that will trigger the playbook and update the cert17:00
fungithanks17:00
clarkbmgagne: may also have thoughts on the network issues in inap17:01
fungithe ones i looked at were consistently reporting an unexpected network disconnection in ansible which bubbled up as failed/retry17:05
clarkbin the past we've seen similar when ansible tries to start a new connection (usually for rsync since it doesn't use the control persistent ssh process) when two instances are fighting over an IP address via arp17:06
fungii thought we usually saw ssh host key mismatches, but maybe only if the rogue instances were listening on ssh17:07
clarkbya that was how it often showed up, but I imagine it isn't the only way the error can manifest17:08
fungii agree17:08
*** rpittau is now known as rpittau|afk17:09
*** mlavalle has quit IRC17:14
openstackgerritMatthias Runge proposed openstack/project-config master: Create telemetry group and include  https://review.opendev.org/76019017:16
*** mlavalle has joined #opendev17:17
*** tosky has quit IRC17:20
*** marios is now known as marios|out17:25
*** eolivare has quit IRC17:25
clarkbfungi: zuul is happy with https://review.opendev.org/#/c/760185/ now if you take a look at that to fix the ssl cert for mirror-int.ord17:41
*** sshnaidm|mtg is now known as sshnaidm|rover17:46
*** ykarel|away has joined #opendev17:47
fungiwill check it out, thanks. juggling too many things at once17:47
*** marios|out has quit IRC17:53
*** andrewbonney has quit IRC17:57
*** ykarel|away has quit IRC17:59
*** ralonsoh has quit IRC18:00
openstackgerritMerged opendev/system-config master: This updates LE config for the ord mirror to the correct name  https://review.opendev.org/76018518:18
*** hashar has quit IRC18:28
*** webmariner has joined #opendev18:28
clarkbbase playbook for the LE config fix failed on a small number of hosts. It looks like rc: -13 running TASK [base/exim : Install Exim]18:46
clarkbthat should run hourly too, so I guess we check if the same hosts fail again and then dig in more (possible apt problem?)18:47
openstackgerritMatthias Runge proposed openstack/project-config master: Create telemetry group and include  https://review.opendev.org/76019018:52
clarkbfungi: ok the cert looks good to me now on mirror.ord. Checking to see if zuul executors have an updated project config which will change the mirror naming18:54
clarkbya ze01 has an updated project config18:54
clarkbI think we can reenable ord in nodepool if you agree18:54
*** sshnaidm_ has joined #opendev18:57
*** sshnaidm|rover has quit IRC19:00
*** sshnaidm_ is now known as sshnaidm|rover19:06
TheJuliais zuul being crawled again?19:14
clarkbdo you mean gerrit?19:14
clarkbload is elevated http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=25&rra_id=all19:15
TheJuliaerr, yeah19:16
TheJuliagerrit19:16
TheJulia20-30 seconds per page load19:16
TheJuliaand now a few in a second19:16
TheJulia~7-8 seconds on my last click19:17
clarkbthe index threads are the ones consuming all the cpu time says melody so that continues to fit the pattern19:17
clarkbya I see the requests in the apache logs now19:18
*** mtreinish has quit IRC19:19
clarkbI've applied a firewall rule for that source19:20
*** mtreinish has joined #opendev19:20
*** sshnaidm|rover is now known as sshnaidm|afk19:28
*** mtreinish has quit IRC19:29
*** mtreinish has joined #opendev19:29
clarkbianw: I think you had mentioned maybe contacting google abuse and seeing if we can work out something via that avenue? do we still think that is a good idea?19:33
*** rchurch has quit IRC19:36
*** hashar has joined #opendev19:39
*** rchurch has joined #opendev19:40
clarkb"Excessive web crawling" is an option in their drop down of abuse types19:44
fungicool, i've taken nl01 back out of the emergency disable list. should i manually restore the max-servers for rax-ord or wait for ansible to bring it back up? i guess no need to wait since only the executors need the mirror name fix right?19:52
clarkbya I think you can just return the old max-servers value19:52
fungioh, huh, looks like ansible did anyway. i probably raced an in-progress run and lost19:53
fungiyeah, it got updated at 17:33z19:54
fungishall i go ahead and revert the inap-mtl01 disablement too?19:54
clarkbI guess so and we can monitor it for trouble19:55
openstackgerritJeremy Stanley proposed openstack/project-config master: Revert "Temporarily stop booting nodes in inap-mtl01"  https://review.opendev.org/76022019:55
donnydin case somebody asks about POST_FAILURES related to OE - my edge router is on its last legs apparently and just turned itself off and was quite resistant to turning back on....20:00
openstackgerritClark Boylan proposed opendev/base-jobs master: Disable logging to OpenEdge  https://review.opendev.org/76022220:03
clarkbfungi: donnyd ^ that should cover the log publishing side20:03
fungidonnyd: i know how it feels20:05
donnydyea, its a pretty large bummer fungi20:10
donnydbut I have done what I can20:10
fungiwell, er, yes i also know how it feels to have a router which doesn't want to boot, but i meant more that after the past few weeks i think i know how your router feels too. i've got a tendency to turn off at random and don't really want to power back on20:12
openstackgerritDonny Davis proposed openstack/project-config master: Disable OE, edge router is about to buy the farm  https://review.opendev.org/76022720:12
donnydI think the power supply is going, but it also puked up some controller errors for the internal disk controller20:13
donnydhonestly I can't believe it came back on at all20:13
donnydnot good at all20:14
fungiahh, yeah, failing psu can result in power sags which lead to misbehaving subsystems for sure20:14
donnydoh I feel you on the powering down part too20:15
donnydI want to power myself down right now and reboot sometime next week20:15
fungidefinitely tempting20:16
openstackgerritMerged opendev/base-jobs master: Disable logging to OpenEdge  https://review.opendev.org/76022220:28
clarkbhttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=68293&rra_id=all I think that shows the mirror.ord change is working20:32
clarkbtraffic on the eth0 side has dropped way off20:33
clarkbwe're still proxying externally as well as pulling from afs externally so the public interface slowness may still impact us but much less so due to caching20:33
clarkbI get much quicker throughput pulling that fedora atomic image from home so ya seems happier20:34
*** tosky has joined #opendev20:39
clarkbTheJulia: ^ fyi on that since you reported the pep8 timeouts that led to us doing that switch of interfaces20:39
*** mlavalle has quit IRC20:40
ianwclarkb: yeah, i ended up reporting it, if it goes to a human is debatable20:46
clarkbianw: is there a way to add new IPs to that or maybe we just wait and see what happens with the original report?20:46
ianwit comes from an email google-cloud-compliance-reply+<uuid-looking-thing> which says "please do not hesitate to reach out to us with additional questions or concerns"20:48
ianwpleasingly vague on what reaching out means20:48
fungiplease feel free to go soak your head20:49
ianwi can put together a reply with the latest ip20:49
ianwfungi: i noticed discussion on grafana puppet, i need to remerge https://review.opendev.org/#/c/739625/ but we should get rid of that20:50
fungiahh, okay20:51
openstackgerritMerged openstack/project-config master: Disable OE, edge router is about to buy the farm  https://review.opendev.org/76022720:56
openstackgerritIan Wienand proposed opendev/system-config master: Cleanup grafana.openstack.org  https://review.opendev.org/73962520:59
ianwfungi: any objections switching the mirror-update jobs today?  https://review.opendev.org/#/c/759965/  i've run a few by hand, especially the ubuntu ones, and they seem ok (keys available, seems to just work)21:01
ianwnote logs will be exported now too @ https://static.opendev.org/mirror/logs/reprepro/21:02
openstackgerritJeremy Stanley proposed opendev/system-config master: Document dual account split for Gerrit admins  https://review.opendev.org/76005121:03
fungiianw: should we poweroff the mirror-update.openstack.org server first so it won't fire any of its cronjobs in a race with the newer server between deployment jobs for the two, or do you already have it covered another way?21:06
*** lamt has joined #opendev21:08
ianwfungi: that might be best, i've marked them as disabled via puppet in that change but really i imagine we don't even need to apply that21:08
fungimmm, yeah, mainly trying to work out how we stop them from possibly both running briefly and fighting one another21:09
ianwif you're happy i can shut it down, put it in emergency, merge the change, and then monitor; if it works we can just merge the change to remove the old server from puppet21:09
ianwif it doesn't, we can always hand-edit the cron jobs while we figure it out21:10
fungiyeah, i'll poweroff the old server now after commenting out everything in root's crontab just in case we have to boot it again for some reason21:11
fungi#status log powered off mirror-update.openstack.org (with its root crontab content commented out) in preparation for merging https://review.opendev.org/75996521:13
openstackstatusfungi: finished logging21:13
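Commenting out a whole crontab before powering a host off can be done with a small filter. This is a sketch of one way to do it, not a record of the commands fungi actually ran; `comment_out_crontab` is a hypothetical name:

```shell
# Prefix every line that is neither blank nor already a comment with "#".
comment_out_crontab() {
    sed 's/^[[:space:]]*[^#[:space:]]/#&/'
}
```

Run as root, `crontab -l | comment_out_crontab | crontab -` would rewrite the installed crontab in place (assuming a cron implementation whose `crontab` accepts `-` for stdin), so nothing fires if the server has to be booted again.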
fungiconfig-core: https://review.opendev.org/760220 brings inap back online, which should restore more capacity again21:15
ianwfungi: cool, thanks, i'll watch the jobs today21:17
ianwanother puppet host down!21:17
ianwclarkb: for consistency do we want to switch all the RAX's to an internal address?21:18
clarkbya probably a good idea21:18
ianwi can setup the dns entries today21:19
openstackgerritMerged opendev/system-config master: reprepo: enable cron jobs  https://review.opendev.org/75996521:45
*** slaweq has quit IRC21:48
*** Green_Bird has quit IRC21:54
*** dulek has quit IRC21:59
*** rm_work has quit IRC21:59
*** jrosser has quit IRC21:59
*** zer0c00l has quit IRC21:59
*** zer0c00l has joined #opendev21:59
*** rm_work has joined #opendev21:59
*** jrosser has joined #opendev21:59
*** dulek has joined #opendev22:01
*** slaweq has joined #opendev22:04
*** DSpider has quit IRC22:09
openstackgerritIan Wienand proposed opendev/system-config master: reprepro: randomise start times better  https://review.opendev.org/76025022:10
*** slaweq has quit IRC22:14
openstackgerritIan Wienand proposed opendev/system-config master: reprepro: fix cron config path and randomise times better  https://review.opendev.org/76025022:28
ianwfungi: ^ that one's now an actual fix as the cron job requires the full path (my testing i was doing with a manual command-line that i guess i got right)22:29
fungii've been noticing the cronspam from reprepro runs is kinda verbose too22:32
ianwhrm, are we missing stderr22:33
fungiit says it's doing 2>&122:34
fungiso i think reprepro may be writing to fd 3 or something22:34
ianwthat sounds like a reprepro thing to do22:35
fungiif you want to temporarily add your address to the daily firehose, it's done by editing /etc/ansible/hosts/group_vars/all.yaml22:35
fungion bridge22:36
ianwhrm, running it on command line i'm not seeing output with 2>&122:38
ianwfungi: is it debian-docker in particular?22:39
ianwflock -n /var/run/reprepro/debian-docker.lock bash -c "for DISTRO in xenial bionic focal; do reprepro-mirror-update /etc/reprepro/debian-docker-\$DISTRO mirror.deb-docker >>/var/log/reprepro/debian-docker-\$DISTRO-mirror.log; done" 2>&122:40
ianwi think the 2>&1 might be in the wrong place there22:40
fungiahh, yup possibly22:40
fungineeds to be inside the done, i agree that's it22:41
openstackgerritIan Wienand proposed opendev/system-config master: reprepro: catch stderr of individual deb-docker runs  https://review.opendev.org/76025422:42
fungiyeah, instead of redirecting stdout and stderr to the log, it was redirecting stdout to the log and then redirecting stderr of the outer calling process to its stderr22:43
fungier, to its stdout22:43
fungiso the stderr of the reprepro invocations was being written to the stdout of the flock and caught by crond22:44
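The effect of where `2>&1` sits can be reproduced without reprepro. A minimal sketch, with a made-up `noisy` command standing in for one mirror run and a temporary file standing in for the log:

```shell
log=$(mktemp)
# Stand-in for one reprepro run: one line on stdout, one on stderr.
noisy='echo "to stdout"; echo "to stderr" >&2'

# 2>&1 outside the quoted command: only stdout reaches the log, and stderr
# leaks into the caller's stdout (which is what crond would mail out).
leaked_outside=$(sh -c "{ $noisy; } >>'$log'" 2>&1)

# 2>&1 inside, after the >> redirect: both streams are appended to the log.
leaked_inside=$(sh -c "{ $noisy; } >>'$log' 2>&1")

printf 'outside: [%s]\ninside: [%s]\n' "$leaked_outside" "$leaked_inside"
```

The first capture holds the stderr line and the second is empty, which matches the explanation above: the redirect has to be attached to the command writing to the log, not to the outer `flock`/`bash -c` wrapper.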
ianwthat job's a bit ridiculous, but because they all share a volume it doesn't seem to make sense to split it into 3 jobs that could sometimes lock each other out22:45
fungiindeed22:45
TheJuliaclarkb: thanks for the update!23:05
*** sboyron has quit IRC23:10
openstackgerritMerged opendev/system-config master: reprepro: fix cron config path and randomise times better  https://review.opendev.org/76025023:27
*** hashar has quit IRC23:28
openstackgerritMerged opendev/system-config master: reprepro: catch stderr of individual deb-docker runs  https://review.opendev.org/76025423:47
ianwfungi: i think we're coming to the conclusion the next best step is to mod_rewrite these problem queries to a static html page saying "this is causing problems, please contact us"23:47
ianwit seems like a match on the UA + bits of the query string will keep it pretty unique?23:48
clarkbwe should be able to grep apache logs to do a rough confirmation of ^23:48
ianwi guess we have to chain a few rules together to be effective?23:48
ianwi was saying to clarkb though it will be good to have examples in there when we're not under actual pressure to block something urgently23:49
clarkbfwiw I just grepped the UA and the set of url query parameters and both the head and tail of that were our most recent friend23:50
clarkbianw: I think if you have a sequence of rewrite conditions they are ANDed together?23:50
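Consecutive RewriteCond directives are indeed ANDed by default unless one carries the [OR] flag. The rough log check mentioned above, matching on user agent plus part of the query string, could be sketched like this. It assumes Apache's combined log format, and `count_matches` and the patterns in the usage line are hypothetical:

```shell
# count_matches UA_REGEX QUERY_REGEX < access.log
# Prints "hits client-ip" for requests whose user agent AND request line
# both match, busiest client first. Splitting on double quotes puts the
# request line in field 2 and the user agent in field 6.
count_matches() {
    awk -F'"' -v ua="$1" -v qs="$2" '
        $6 ~ ua && $2 ~ qs { split($1, head, " "); hits[head[1]]++ }
        END { for (ip in hits) print hits[ip], ip }
    ' | sort -rn
}
```

Something like `count_matches 'ExampleBot' 'q=status:open' < access.log` then shows whether the UA and query combination is unique to the problem clients before it is turned into rewrite conditions.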
