Wednesday, 2020-08-12

fungii can't *quite* remember what openssl error 1416F086 is. i'm sure it'll come to me in a moment00:02
fungiself-signed cert, i guess?00:03
fungiand git-send-email isn't going to let the server present just any ol' cert, huh?00:04
*** ryohayakawa has joined #opendev00:05
fungiokay, jaraco/irc pr is all green now00:06
fungii substituted the python interpreter version in ctcp version replies, we'll see how that flies00:07
ianwfungi: this is just set up to send via gmail (the corporate mail server) ... if anything should work i'd think it would be this :/00:07
fungiyikes. maybe something's wrong with the ca bundle on the client then?00:08
fungior maybe someone let a test cert leak into production at gmail00:08
ianw... no ... i had an old config file lying around from the last time i tried to use git send-email pointing to the old, internal corporate server, which must have been overriding my settings00:11
fungiaha00:11
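For reference, git-send-email takes its SMTP settings from git config, and a stale [sendemail] block in any file on the lookup path (system, global, or repo-local) will quietly override what you expect to be in effect. A rough sketch of the kind of block involved, with illustrative host and user values:

    [sendemail]
        smtpServer = smtp.gmail.com
        smtpServerPort = 587
        smtpEncryption = tls
        smtpUser = someone@example.com

Running `git config --show-origin --get-all sendemail.smtpserver` lists each value along with the file it came from, which makes exactly this kind of stale override easy to spot.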
ianwi wish i could just let this ipv6 thing go but it's my white whale :)00:13
*** shtepanie has joined #opendev00:42
ianwdonnyd/fungi: this is something like what i'm proposing as a libvirt doc update for the nat address choice ->  http://paste.openstack.org/show/796749/ ... seem about right?00:49
donnydI would maybe just use your real world example for the libvirt doc00:52
donnydother than that it LGTM00:53
ianwdonnyd: yeah, i guess the thing is https://tools.ietf.org/html/rfc4193#section-3.2.2 goes into great detail about how to generate a random /48 00:53
ianwi figure if you put something in the doc, it just gets copied :)00:53
donnydThat is correct00:54
donnydand its exactly what people do00:54
donnydI would say it's less likely that people will go read the glyphs from ietf00:55
donnydReading an RFC isn't really at the top of the list for most people on "things I will do with my evening"00:56
ianwheh :)  you can either have ipv6, or not read RFCs ... choose one :)00:56
ianwi think in general, the world has chosen the latter00:57
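For anyone skipping the RFC: section 3.2.2 of RFC 4193 boils down to taking the fd00::/8 prefix and appending a pseudo-random 40-bit global ID, which yields a /48 to subnet from. A minimal sketch in Python (simplified: it uses plain random bits instead of the RFC's SHA-1-of-timestamp-and-EUI-64 construction):

    import ipaddress
    import secrets

    def random_ula_prefix() -> ipaddress.IPv6Network:
        """Generate an RFC 4193-style ULA /48: the fd00::/8 prefix plus a
        random 40-bit global ID (the RFC derives those 40 bits by hashing a
        timestamp and an EUI-64; plain random bits are a common shortcut)."""
        global_id = secrets.randbits(40)
        prefix_int = (0xFD << 120) | (global_id << 80)
        return ipaddress.IPv6Network((prefix_int, 48))

    print(random_ula_prefix())  # e.g. fd3c:a2b1:9e07::/48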
clarkbmy ISP says that we may have ipv6 by the end of the year00:58
clarkbwill be the best outcome of them being bought out if it happens00:58
ianwi don't know if it's a fedora bug that fd00: interfaces are not preferred over ipv4 00:59
ianwfrom what i can tell, it's a practical decision that people had fc00::/7 addresses that didn't route anywhere, and it would cause all sorts of issues01:00
ianwclarkb: odds that your ipv6 also comes with cgnat? :)01:18
clarkbI doubt it will01:20
*** qchris has quit IRC01:45
donnydOverall I think your post to the docs is a big value add, and over time, as ipv6 becomes less of a mystery to people, simple usable docs like this will only become more valuable.  And I think that is what you wrote up.01:54
*** qchris has joined #opendev01:57
*** shtepanie has quit IRC03:52
*** dmsimard2 has joined #opendev04:12
*** dmsimard has quit IRC04:13
*** dmsimard2 is now known as dmsimard04:13
*** ysandeep|away is now known as ysandeep04:14
*** logan- has joined #opendev04:40
*** weshay|pto has quit IRC06:12
*** weshay_ has joined #opendev06:13
*** DSpider has joined #opendev07:00
*** openstackgerrit has joined #opendev07:00
openstackgerrityatin proposed zuul/zuul-jobs master: Fix url for ARA report  https://review.opendev.org/745792 07:00
*** ryohayakawa has quit IRC07:03
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: validate-host: skip linux only tasks on windows machines  https://review.opendev.org/745797 07:10
*** ssbarnea has joined #opendev07:18
*** zbr has quit IRC07:29
*** ssbarnea has quit IRC07:29
*** zbr9 has joined #opendev07:29
*** hashar has joined #opendev07:30
*** tosky has joined #opendev07:41
*** moppy has quit IRC08:01
*** moppy has joined #opendev08:01
*** openstackgerrit has quit IRC08:09
*** zbr9 has quit IRC08:12
*** zbr has joined #opendev08:13
*** tkajinam has quit IRC08:19
*** Eighth_Doctor has quit IRC09:05
*** mordred has quit IRC09:06
*** Eighth_Doctor has joined #opendev09:14
*** mordred has joined #opendev09:45
*** openstackgerrit has joined #opendev10:04
openstackgerritRiccardo Pittau proposed openstack/diskimage-builder master: Update name of ipa job  https://review.opendev.org/743042 10:04
openstackgerritRiccardo Pittau proposed openstack/diskimage-builder master: Do not install python2 packages in ubuntu focal  https://review.opendev.org/745665 10:16
openstackgerritCarlos Goncalves proposed openstack/diskimage-builder master: Add octavia-amphora-image-build-live jobs  https://review.opendev.org/745823 10:20
cgoncalveshey there! openstackgerrit is back online but has not joined #openstack-lbaas and #openstack-infra at least10:22
*** DSpider has quit IRC10:32
*** DSpider has joined #opendev10:33
*** hashar has quit IRC10:44
*** lpetrut has joined #opendev10:45
*** calcmandan has quit IRC10:48
*** calcmandan has joined #opendev10:49
AJaegercgoncalves: are you missing notifications? It only joins channels when there's something to notify about. There's a maximal channel limit a user/bot can be in, so it leaves/re-joins as needed.10:54
cgoncalvesAJaeger, definitely missing in #openstack-lbaas10:55
*** sshnaidm is now known as sshnaidm|afk10:57
yoctozeptoit has not joined kolla either11:30
AJaegercgoncalves, yoctozepto: see above in this channel when it joined here - so, please give us a link to a change that should have been notified but wasn't - and then somebody can check log files...11:46
cgoncalvesAJaeger, bot did not notify on #openstack-lbaas of https://review.opendev.org/#/c/745831 11:48
cgoncalvesother changes: https://review.opendev.org/#/c/745820/ & https://review.opendev.org/#/c/685337/ 11:49
yoctozeptoI guess cgoncalves's changes are enough, in case of k&k-a it's a ton of these ;d11:50
AJaegerthanks. Let's ask infra-root to investigate those ^11:59
AJaegeryoctozepto: yes, cgoncalves' are enough12:00
AJaegerAt least I hope so ;)12:00
*** hashar has joined #opendev12:11
mnaserinfra-root: is http://mirror.ca-ymq-1.vexxhost.opendev.org having issues?  it's taking a long time to respond, but i don't have visibility into the VM12:38
mnaserthings load but take a _very_ long time, enough to cause jobs to timeout12:39
mnasernothing in console log12:41
mnaserload average on the hypervisor it's on is 1.98 so the system is fine12:42
openstackgerritCarlos Goncalves proposed openstack/project-config master: Update branch checkout for octavia-lib DIB element  https://review.opendev.org/745877 12:52
fricklermnaser: I can log in and don't see anything obviously bad. do you have logs? is it for the AFS mirror or some of the proxies?12:53
mnaserfrickler: seeing these "urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='mirror.ca-ymq-1.vexxhost.opendev.org', port=443): Read timed out. (read timeout=60.0)"12:54
mnaserbut also when i was opening it here on my side it took a long time for pages to load12:54
mnaserit seems responsive again now though12:54
fricklermnaser: hmm, cacti graphs are empty starting around 4:00, maybe some other infra-root can take a deeper look soon12:57
mnaserfrickler: ok good, i'm not going nuts :)12:57
*** Marcelo- has joined #opendev13:02
fricklerinfra-root: gerritbot indeed seems to assume it doesn't have to do any notification for most events, logging "INFO gerritbot: Potential channels to receive event notification: set()", so likely some kind of config issue with the new deployment13:02
mnaserfrickler, infra-root: to note though, i just saw it join #openstack-tc and post a notification about a merged change so..13:11
fungicacti can ping both v4 and v6 addresses for mirror.ca-ymq-1.vexxhost.opendev.org13:27
fungii thought we were only using sjc1 though?13:28
fungisnmpd is still running too13:29
*** sshnaidm|afk is now known as sshnaidm13:38
fungiit's configured to only log warnings and above, but journalctl doesn't have any for it other than restarts (most recent was a few weeks ago)13:45
fungiusing tcpdump now to see if snmp queries are getting there at all13:46
fungiokay, so i think the tcpdump nails down the issue... cacti is sending snmp queries to the mirror, it's receiving them and responding, but then cacti is never receiving the responses13:57
fungiit's over ipv6, so looks suspiciously like the unidirectional v6 packet loss we've been seeing with systems in rackspace, but which in particular seems to especially impact the cacti server for some reason13:58
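The kind of capture being described, for reference (the interface names are assumptions, and 161/udp is just the standard SNMP port):

    # on the mirror: watch the snmp queries arrive and the replies go back out
    sudo tcpdump -ni ens3 udp port 161

    # on the cacti server: confirm whether those replies ever make it back
    sudo tcpdump -ni eth0 udp port 161 and host mirror.ca-ymq-1.vexxhost.opendev.org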
openstackgerritMerged zuul/zuul-jobs master: Fix url for ARA report  https://review.opendev.org/745792 14:04
*** ysandeep is now known as ysandeep|dinner14:10
fricklermnaser: yes, it isn't 100% broken, it just seems to serve only a very exquisite subset of events14:12
fungiwell, keep in mind that the majority of events it logs shouldn't generate notifications to channels14:28
fungiit gets the full gerrit event stream, analyzes every event and logs its decisions in the debug log, then only sends notifications for the tiny subset which its configuration says should get them14:29
fungii'll see if i can tell why 745831 wasn't announced to #openstack-lbaas for a start14:29
fricklerfungi: ah, you're right of course, seems the docker log only holds about 1h worth of data, we might want to log to somewhere more persistent14:37
fungiyep, once i can figure out how to get docker-compose to show me more logs...14:37
fungihrm, yeah i'm starting to suspect that docker-compose is just throwing away logs and not saving them anywhere14:39
fungiaha! it also writes to syslog14:40
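If the compose file's default json-file log driver is what keeps aging entries out, one option is to raise its limits or to hand the container's output to the host syslog explicitly. A sketch only; the service name, the limits, and whether opendev's compose files already set any of this are assumptions:

    services:
      gerritbot:
        # keep more container log history on disk...
        logging:
          driver: json-file
          options:
            max-size: "50m"
            max-file: "10"
        # ...or ship it to the host's syslog instead:
        # logging:
        #   driver: syslog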
fungilooks like event 64c3a1b1decf is the one we want14:42
fungino, nevermind, that's not an event, that's a process14:42
fricklerbut it seems to claim to have logged a message for that. not sure why every line seems to be logged twice in syslog, though14:43
fungiAug 12 11:03:29 eavesdrop01 64c3a1b1decf[1386]: 2020-08-12 11:03:29,173 INFO gerritbot: Potential channels to receive event notification: {'openstack-lbaas'}14:44
fungiAug 12 11:03:29 eavesdrop01 64c3a1b1decf[1386]: 2020-08-12 11:03:29,173 INFO gerritbot: Compiled Message openstack-lbaas: Carlos Goncalves proposed openstack/octavia master: Set Grub timeout to 0 for fast boot times  https://review.opendev.org/745831 14:44
fungiAug 12 11:03:29 eavesdrop01 64c3a1b1decf[1386]: 2020-08-12 11:03:29,174 INFO gerritbot: Sending "Carlos Goncalves proposed openstack/octavia master: Set Grub timeout to 0 for fast boot times  https://review.opendev.org/745831" to openstack-lbaas 14:44
fungiso it thinks it sent it to the server14:44
fungibut yeah, no sign of it in http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/%23openstack-lbaas.2020-08-12.log.html 14:45
johnsomThe bot isn't in the #openstack-lbaas channel according to my client.14:45
johnsomI think it used to lurk in the channel if I remember right.14:46
fungijohnsom: yeah, it can't join all channels (it's configured for more than freenode allows) so it opportunistically joins channels if it has a message for them and then parts if it needs to free up available channels to be able to post messages in others14:46
cgoncalvesjohnsom, AJaeger wrote earlier this: " It only joins channels when there's something to notify about. There's a maximal channel limit a user/bot can be in, so it leaves/re-joins as needed."14:46
johnsomAh, ok14:47
fungiso at start it's present in no channels, and joins them on demand up to the (120?) channel limit, then starts leaving the least recently needed channels as it has to join others14:47
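The join-on-demand behaviour fungi is describing is essentially an LRU cache of channels. A toy illustration of the idea (not gerritbot's actual implementation; the 120-channel figure is the limit guessed at above):

    from collections import OrderedDict

    MAX_CHANNELS = 120  # per-connection channel limit mentioned above

    class ChannelManager:
        """Track joined channels, parting the least recently used when full."""

        def __init__(self, client):
            self.client = client         # any IRC client exposing join()/part()
            self.joined = OrderedDict()  # channel name -> None, in LRU order

        def ensure_joined(self, channel):
            if channel in self.joined:
                self.joined.move_to_end(channel)  # mark as recently used
                return
            if len(self.joined) >= MAX_CHANNELS:
                oldest, _ = self.joined.popitem(last=False)
                self.client.part(oldest)
            self.client.join(channel)
            self.joined[channel] = None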
fricklerI think there is some issue with the bot wanting to log to "openstack-lbaas" instead of "#openstack-lbaas"14:47
fricklerfor channels where it works, there is a "#" in the channel name14:48
*** mlavalle has joined #opendev14:49
fungii concur14:49
fungiAug 12 14:48:29 eavesdrop01 64c3a1b1decf[1386]: 2020-08-12 14:48:29,154 INFO gerritbot: Potential channels to receive event notification: {'#openstack-release'}14:49
fungiet cetera14:49
fungiso looks like maybe a configuration error14:49
fungithough the configuration doesn't use # in front of any channel names14:50
fricklerchannel_config.yaml doesn't have a # for any channel14:50
fungiyeah, filtering the logs i see it also incorrectly trying to send to a bunch of non-# channel names14:53
fungii wonder if this is a recent regression in gerritbot14:53
fungilooks like the version running on review.o.o was e387941 from december14:56
fungiluckily there have been only 6 commits since then14:57
*** priteau has joined #opendev14:59
fricklerwe did run with py2 on review, didn't we? so might be a py3 issue, the commits since dec don't look suspicious to me14:59
fungiyeah, i've now gone over all the recent commits since what we had installed on review.o.o and i agree, none of those were significant in ways which should impact this15:01
fungiso gonna need to roll up sleeves and dive deeper in the code15:01
*** ysandeep|dinner is now known as ysandeep15:03
fricklerfungi: https://opendev.org/opendev/gerritbot/src/branch/master/gerritbot/bot.py#L403-L412 looks to mix up using data and self.data, I'd go for cleaning that up first15:04
fungihttps://opendev.org/opendev/gerritbot/src/branch/master/gerritbot/bot.py#L406-L407 is where it seems to prepend the # 15:05
fungioh, you're already there-ish15:05
fungii feel like we're both covering the same ground15:05
fricklerfungi: my python foo isn't strong, but that code makes me wonder whether changing data after setting "self.data=data" may behave differently with py3 15:06
fungiyeah, modifying an iterable in place15:07
fungiwhile iterating on it15:08
fungiactually not a great idea. better to iterate on a copy while using it as a reference to modify the original15:08
fungii bet data.keys() returned a copy in python 2 but returns an iterable tied to the data object in python 3 15:09
fungiwe might want to do keys = list(data.keys()) there?15:09
AJaeger"modifying an iterable in place" is not allowed anymore with python 3.7 and is a hard failure.15:10
fungi`python --version` in the container says "Python 3.7.8"15:12
fungiso i guess that's not it15:12
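The behaviour change being discussed, boiled down (an illustration of the general pattern, not gerritbot's actual code):

    # Python 2's dict.keys() returned a list snapshot, so mutating the dict
    # inside the loop was harmless.  Python 3 returns a live view, and adding
    # or removing keys mid-iteration raises "RuntimeError: dictionary changed
    # size during iteration" (or just misbehaves if that exception is
    # swallowed somewhere).
    channels = {"openstack-lbaas": {}, "openstack-release": {}}

    # broken under py3:
    # for key in channels.keys():
    #     channels["#" + key] = channels.pop(key)

    # the fix fungi is suggesting: iterate over a copy of the keys
    for key in list(channels.keys()):
        if not key.startswith("#"):
            channels["#" + key] = channels.pop(key)

    print(sorted(channels))  # ['#openstack-lbaas', '#openstack-release']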
* frickler needs to leave, will check back later15:15
*** ysandeep is now known as ysandeep|away15:19
fungii'll keep fiddling with it15:19
smcginnismnaser: I've seen two patches with retry failures to a Vexxhost mirror. Not sure if it's just flakiness in the network or something else, but thought I should mention it.15:20
mnasersmcginnis: yeah, i noticed that this morning.  i dont see anything in the system itself15:20
mnaseris it happening recently? apparently the most recent i've seen it was around 9am-ish15:21
mnasersmcginnis: it was _really_ slow to respond, don't have access to the machine.  when did those patches fail?15:21
smcginnismnaser: Just hit it now with https://zuul.opendev.org/t/openstack/build/3ea6602420aa4c3abe80d497f57b5777 15:21
mnaseryes.  i can see that when i click http://mirror.ca-ymq-1.vexxhost.opendev.org/ it takes a while to open folders15:22
smcginnisThe last one I looked at before this (last night I think) had some retries that eventually succeeded for an earlier package, ran fine for a few more, then timed out with retries on a later one.15:22
mnasereverything is ok from our side :\15:24
mnasercc infra-root ^15:24
smcginnisWe can blame network gremlins for now.15:24
mnasersmcginnis: it's repeated a few times today15:24
mnaserplus it's stopped reporting into cacti too, so there's that15:25
mnaserwe really should get to the bottom of it otherwise we're just wasting compute power15:25
smcginnisYeah15:26
smcginnismnaser: When you said "on our side" above, were you referring to vexxhost or opendev as "our"? :)15:27
mnasersmcginnis: sorry, vexxhost is not seeing any issues :)15:27
openstackgerritThierry Carrez proposed opendev/system-config master: Redirect UC content to TC site  https://review.opendev.org/744497 15:27
fungimnaser: you probably missed my investigation of the cacti situation above, but i don't see that it's likely to be related15:28
mnaserit hurts us even more because that means any job running on our cloud will fail and just burn through systems15:28
mnaseroh, i guess this might be the ipv6 thing happening eh :\15:29
mnaserthe thing is, i am having problems accessing http://mirror.ca-ymq-1.vexxhost.opendev.org even over ipv4 (seeing it load slowly when i browse around)15:29
fungitcpdump shows snmp requests reaching the mirror, snmpd on the mirror replies, but those responses never reach cacti. that's over ipv6 and we've got similar situations with other systems not able to get v6 packets back to cacti (even from within rackspace's own network)15:29
fungiafs, in contrast, is all over ipv415:29
fungiit doesn't even support ipv615:29
mnaserdo apache logs show any slow requests?15:30
smcginnisOK, just saw three more recent job failures matching this. Looks like it definitely is a bigger issue.15:31
mnaseryeah, i am seeing failures here too15:31
fungiapache doesn't log the time a request takes to satisfy, that i can find15:36
mnaserah, that's a bummer15:37
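As an aside, Apache's mod_log_config can record per-request timing if asked: %D logs the time taken to serve the request in microseconds (%T gives whole seconds). A sketch of what adding it might look like (the format name and log path are illustrative, not what the mirror vhost actually does):

    LogFormat "%h %l %u %t \"%r\" %>s %b %D" combined_timing
    CustomLog ${APACHE_LOG_DIR}/mirror-timing.log combined_timing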
mnasernothing in error logs?15:37
fungiit's constantly spewing file negotiation failures, that's generally just because it doesn't know the file type though i think15:37
fungilike:15:38
fungiAH00687: Negotiation: discovered file(s) matching request: /var/www/mirror/wheel/ubuntu-18.04-x86_64/a/appdirs/index.html (None could be negotiated).,15:38
fungialso seeing some of:15:38
fungiAH01401: Zlib: Validation bytes not present,15:38
fungii'll shift gears to look into this deeper, and finish worrying about fixing gerritbot later15:39
fungidoes someone have a link to a job failure?15:39
mnaserfungi: https://zuul.opendev.org/t/vexxhost/build/26648c0867ab4f7eb4aa5567f60007e1 here is one15:40
fungithanks15:40
fungiokay, so for starters, /pypi/ isn't anything we mirror, it's not served out of afs, this is a proxy to the nearest fastly cdn endpoint for pypi.org15:43
fungiso i'll check to see whether that mirror host is having trouble reaching or getting responses from pypi15:43
mnaserhttps://zuul.opendev.org/t/vexxhost/build/61c6cf30b09c4deb821e05108dc1b0d3 -- another breakage too, this one might not be related to the cache though fungi15:43
mnaseri think /pypifiles/ is hosted locally15:43
funginope, also a proxy15:44
mnaserah15:44
fungihttps://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2#L94 15:44
fungipypi is split between an index site and a file hosting site, so we have to proxy both15:45
fungibut both sites use the same cdn network (fastly) so are likely winding up hitting the same endpoints for it15:45
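Roughly the shape of that setup, as a simplified sketch (not the actual mirror.vhost.j2 contents; the backend URLs follow the split described above and are assumptions):

    # index pages come from pypi.org, package files from files.pythonhosted.org,
    # both fronted by the same fastly cdn
    SSLProxyEngine on
    ProxyPass        "/pypi/"      "https://pypi.org/"
    ProxyPassReverse "/pypi/"      "https://pypi.org/"
    ProxyPass        "/pypifiles/" "https://files.pythonhosted.org/"
    ProxyPassReverse "/pypifiles/" "https://files.pythonhosted.org/"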
fungiConnecting to pypi.org (pypi.org)|2a04:4e42:200::223|:443...15:45
fungiwhen i do `wget -O/dev/null 'https://pypi.org/'`15:46
fungiit's just sitting there15:46
fungimnaser: can you reach pypi at all over ipv6 from vexxhost ca-ymq-115:47
fungi?15:47
mnaseryep, its hanging indeed15:48
fungiso that seems to be the crux of the problem15:48
mnaserat this point it wouldn't even surprise me that someone got mad at someone and unpeered and routes are gone15:48
fungiyup15:49
fungii don't miss my days doing isp backbone peering15:49
fungiall the finger-pointing between carriers was atrocious15:49
fungiwe can dial max-servers down to 0 there in the short term if you'd like15:50
fungibasically anything trying to pip install in a job is going to fail there for now, i think15:50
*** shtepanie has joined #opendev15:51
fungii'll push something up15:53
mnasergive me a few minutes to try and see whats going on before we stop the whole thing, if that's ok15:53
fungisure, works for me15:53
mnaseri mean, maybe we could workaround it by adding pypi.org ipv4 address to /etc/hosts -- just for now i guess15:54
fungithat might work. i think apache will check that. but since pypi and pythonhosted are using a cdn, hard-coding its ip addresses could be risky15:54
mnaseri agree, but i am talking about doing that for an hour or two at most while i debug, just to get things flowing again15:55
mnaseri agree, but i am talking about doing that for an hour or two at most while i debug, just to get things flowing again15:55
mnaseroops, wrong arrow up enter window15:55
fungiyeah, i also can't ping ipv6 addresses for some stuff in rackspace from the mirror there, so i guess it's a fairly broad set of routes affected15:56
mnaserping6 google.ca seems to work though15:57
mnaserso it's a subset of things...15:57
*** lpetrut has quit IRC15:58
fungiyep, i mean, i'm ssh'd in over ipv6 so it's obviously working for some routes15:58
fungibut it's clearly more than just the fastly cdn endpoint impacted15:59
mnaserfungi: so you said that when you were doing your tcpdumps, incoming traffic arrived, responded to but never reached rax (wrt snmp?)15:59
fungicorrect15:59
fungiso this could be related, and it's an asymmetric routr15:59
fungiroute16:00
fungiinstalling traceroute to see if i can spot a difference16:00
mnaserfungi: if you don't mind, could you run a traceroute from rax to the mirror?16:00
mnaserthank you :)16:00
fungiheh16:00
mnaserthat way i can see at least which path it's unhappy about16:00
fungiyeah, been in your shoes more times than i care to remember16:01
fungimnaser: oddly, i never get a response even from the gateway16:03
fungino hops responding16:03
fungioh wow.16:04
fungino default route?16:04
fungioh, nevermind, there are two16:04
fungilooks like i'm seeing default routes announced through fe80::ce2d:e0ff:fe0f:74af and fe80::ce2d:e0ff:fe5a:d84e but neither are responding with ttl expired when trying to traceroute16:05
fungitried both udp (default) and icmp traceroute16:06
fungii can traceroute to my home ipv6 address just fine, the one i'm ssh'd in from16:07
fungibut not to cacti's16:07
funginor to pypi.org16:08
fungimy home address is in 2600:6C00::/24 and traceroute shows responses starting from 2604:e100:1:0:ce2d:e0ff:fe5a:d84e in vexxhost16:10
fungibut even that hop doesn't show up when tracerouting to cacti or pypi16:10
fungido you plumb ebgp all the way down into your igp there? or is there something shadowing those prefixes in your igp?16:11
mnaserfungi: fe80::ce2d:e0ff:fe0f:74af is link-local interfaces which are announced to all vms16:13
mnaserthey shouldnt be reachable from the vms there16:14
mnaseri can see that it's pingable from another machine here16:14
fungipypi is?16:14
fungibut yeah, for v6 destinations i can reach, 2604:e100:1:0:ce2d:e0ff:fe5a:d84e responds as the first hop. for v6 destinations i can't reach, there is no first hop (or any hops) responding16:15
mnaserwhich means that's the same host as  fe80::ce2d:e0ff:fe5a:d84e 16:16
fungimaking me suspect that the first router is black-holing those prefixes somehow16:16
mnaserwhich means that potentially fe80::ce2d:e0ff:fe0f:74af is the issue16:17
fungiif it simply didn't have a route for them i'd expect an icmp no route to host or network unreachable16:17
mnaserplot twist, 74af is the one that holds the closer route to pypi16:18
fungifwiw, i can ping 2604:e100:1:0:ce2d:e0ff:fe0f:74af just fine16:18
mnaseroh i have an idea16:19
fungiso if it were the problem i'd expect to be getting messages back from it in a traceroute16:19
mnasercan you ping 2001:550:2:6::26:2 16:19
mnaserand 2605:9000:400:107::c 16:19
fungii get responses from 2605:9000:400:107::c but not 2001:550:2:6::26:2 16:19
mnaserprogress16:21
mnaserfungi: what about 2001:550:2:6::26:1 ?16:22
fungino response16:22
mnasercan i get a trace to :1 ?16:22
fungitraceroute to 2001:550:2:6::26:1 (2001:550:2:6::26:1), 30 hops max, 80 byte packets16:23
fungi1  * * *16:23
funginothing back from any hops16:23
mnaserthe heck16:23
mnaseris there no route towards it?16:23
mnaserfungi: i assume you are pinging/tracing from your local system right?16:23
fungior whatever is handling the next hop is eating it silently16:24
fungithese are pings/traceroutes from the mirror instance in ca-ymq-116:24
mnaserah, i thought those were pings externally16:24
mnaserare you able to reach those from rax or local at your side?16:25
fungifrom home i can reach all three of 2001:550:2:6::26:2 2605:9000:400:107::c 2001:550:2:6::26:1 16:26
fungisame from bridge.openstack.org in rackspace dfw16:26
mnaserwhats interesting is16:27
mnaserit stopped working exactly at 4am utc16:27
mnaserwhich is 12am est16:27
fungii can also ping the mirror from bridge.o.o but can't ping bridge.o.o from the mirror16:27
mnaserhmm, provider did an upgrade overnight16:28
fungisimilarly i can ping mirror.ca-ymq-1.vexxhost.opendev.org from cacti.openstack.org but not the reverse16:29
fungiit's like replies to inbound flows for icmp echo request are set up and the returning echo replies get routed correctly (but the same is apparently not true of snmp/udp?)16:30
fungiit's just odd to see this sort of stateful behavior at the carrier level. they must be doing some sort of flow-based balancing across their gear or something and not simple hash16:33
fungior maybe i'm just getting lucky and outbound icmp echo replies are getting hashed through a working router but not outbound echo requests16:35
mnaserfungi: yeah.. something is weird there.  i turned off the ipv6 peer to restart it16:35
mnaserand now it's just stuck in opensent 16:35
fungiew16:36
mnaserfungi: ok, it's escalated with the network provider right now16:42
mnaserbgp session is back up but this is still a problem16:42
fungiwant me to go forward with a max-servers=0 patch for now?16:43
mnaseryeah lets do that and we can approve it together quickly to unblock world16:43
openstackgerritJeremy Stanley proposed openstack/project-config master: Temporarily disable vexxhost ca-ymq-1  https://review.opendev.org/745929 16:45
fungimnaser: ^ feel free to single-core approve16:45
fungii can enqueue to the gate directly16:45
mnaserfungi: done and think that's a good idea16:46
mnaseras we may fail a few times along our way there16:46
fungiit's in the gate now16:46
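For reference, disabling a region in nodepool is just a pool setting; the change above amounts to something along these lines (the provider and pool names here are illustrative):

    providers:
      - name: vexxhost-ca-ymq-1
        # ...cloud/region settings unchanged...
        pools:
          - name: main
            # 0 stops new nodes from booting here; nodes already running
            # jobs finish and get cleaned up normally
            max-servers: 0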
openstackgerritJeremy Stanley proposed opendev/gerritbot master: Iterate over a copy of the channel keys  https://review.opendev.org/745930 16:50
fungifrickler: AJaeger: clarkb: ^ i hope that's the fix for gerritbot16:50
openstackgerritMerged openstack/project-config master: Temporarily disable vexxhost ca-ymq-1  https://review.opendev.org/745929 16:59
*** Baragaki has joined #opendev17:04
*** priteau has quit IRC17:27
*** Marcelo- has quit IRC17:59
fungiinfra-root: if anyone happens to have a moment to spare, i'm hoping https://review.opendev.org/745930 will solve our latest gerritbot regression18:11
*** knikolla_ has joined #opendev18:13
clarkbfungi: one jetty based +218:17
corvus+318:18
fungimuch thanks all!18:19
fungii'll give the logs a close watch after this merges18:20
*** mordred has quit IRC18:20
*** gouthamr has quit IRC18:20
*** knikolla has quit IRC18:20
*** knikolla_ is now known as knikolla18:20
fungiclarkb: enjoy the jetty! hope you catch something without catching something18:20
corvusheh, i thought clarkb was digging into java web servers, but this is better.  :)18:21
*** Eighth_Doctor has quit IRC18:21
*** gouthamr has joined #opendev18:22
*** mordred has joined #opendev18:28
openstackgerritMerged opendev/gerritbot master: Iterate over a copy of the channel keys  https://review.opendev.org/745930 18:56
*** Eighth_Doctor has joined #opendev19:00
*** shtepanie has quit IRC19:21
mnaserinfra-root: would anyone be kind enough to run a traceroute6 from bridge.openstack.org to mirror.ca-ymq-1.vexxhost.opendev.org ?19:23
corvuson it19:25
corvusmnaser: http://paste.openstack.org/show/796794/ 19:27
fungimnaser: traceroute back is unfortunately still blank, no response even from the first hop19:29
mnasercorvus, fungi: thank you.  yes, the no response from first hop is very confusing19:30
corvusmnaser, fungi: fwiw, mtr from bridge: http://paste.openstack.org/show/796795/ 19:32
mnasercorvus: while you're in there, can you confirm if '2604:e100:1:0:ce2d:e0ff:fe0f:74af' and '2604:e100:1:0:ce2d:e0ff:fe5a:d84e' are indeed currently sending icmp requests to that system?19:43
mnasersorry, i just don't have access to a system that is on an 'unreachable' network :(19:43
corvusworking19:43
corvusmnaser: yes receiving and replying19:48
corvusseems fairly steady at 1hz each19:48
mnaserhrm, ok.  alright, so both outbound routes are working just fine19:48
corvusand stopped :)19:48
mnaseryep, as expected19:49
*** hashar has quit IRC20:43
mnasersigh21:07
mnaserinfra-root: can someone run this on the mirror node -- `ip -6 addr list | grep 2001:db8 | awk '{ print $2 }' | xargs -I {} -n1 ip addr del {} dev eth0`21:08
mnasersomehow 2001:db8:{0,1}::/64 addresses got dynamically configured, i'm still digging into this, but they'll need to be removed21:08
openstackgerritMohammed Naser proposed openstack/project-config master: Revert "Temporarily disable vexxhost ca-ymq-1"  https://review.opendev.org/745966 21:10
fungimnaser: probably you want ens3 instead of eth0, and done21:10
mnaserinfra-root: ^ appreciate a vote on that, once that's done and verified, we can land that, meanwhile i'll be investigating how the ra showed up21:10
mnaserfungi: ah cool, wanna run/merge that?21:11
fungican do, also after clearing those routes i can reach stuff from the mirror again21:12
fungialso https://mirror.ca-ymq-1.vexxhost.opendev.org/pypi/ is working21:12
fungimnaser: and approved, thank for looking into its!21:13
fungier, it21:13
mnaserfungi: what i noticed was when tcpdumping, it was picking the 2001:db8:: as src address21:13
mnaserwhen trying to reach pypi.org21:13
mnaserbut going to google, it wasn't21:13
fungigot it, so there was a rogue prefix announced on that lan?21:16
fungifun stuff21:16
fungii think we thought we saw that once in limestone too, but couldn't repro it21:17
fungithat also explains the symptoms we saw, as far as being able to reach the machine but it not being able to reach stuff21:18
fungiit was responding from the address to which things were connecting, but initiating from a different address which wasn't routable21:18
fungi2001:db8:0:3::/64 dev ens3 proto ra metric 100 expires 2550846sec pref medium21:21
fungi2001:db8:1::/64 dev ens3 proto ra metric 100 expires 2550801sec pref medium21:21
fungiwe had both of those in the local routing table21:21
fungi2001:db8::/32 IPV6-DOC-AP "IPv6 prefix for documentation purpose" (This address range is to be used for documentation purpose only. For more information please see http://www.apnic.net/info/faq/ipv6-documentation-prefix-faq.html )21:22
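Two quick checks that make this kind of rogue-RA problem visible (a sketch; the interface name and the pypi address from earlier in the log are just the values seen in this incident):

    # show the route and, crucially, the "src" address the kernel picks for pypi
    ip -6 route get 2a04:4e42:200::223

    # list global addresses on the interface; ones learned from a router
    # advertisement show up as dynamic, with a finite valid_lft
    ip -6 addr show dev ens3 scope global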
logan-https://bugs.launchpad.net/neutron/+bug/1844712 21:23
openstackLaunchpad bug 1844712 in OpenStack Security Advisory "RA Leak on tenant network" [Undecided,Incomplete]21:23
logan-that was a strange one. what youre seeing looks like a recurrence of that bug mnaser. the block is different, but that's only because the ipv6 blocks those jobs use were updated over the course of that bug: https://bugs.launchpad.net/neutron/+bug/1844712/comments/8 ...to the cidr you saw today :)21:28
openstackLaunchpad bug 1844712 in OpenStack Security Advisory "RA Leak on tenant network" [Undecided,Incomplete]21:28
clarkbfungi: corvus got two keeper rock fish. we got a ling cod but it was below minimum size so went back in21:30
fungicodesearch mostly turns up hits in our docs (unsurprisingly) but the prefix also gets heavy use in tests for neutron, horizon, octavia, nova, tripleo, searchlight, ironic, manila, swift, zun, charms, designate, tempest, devstack, kuryr, osc, vitrage, watcher, cinder, monasca, sdk, puppet, several oslo libs... http://codesearch.openstack.org/?q=2001%3Adb8 21:31
fungihard to tell just from that what might be spewed via route announcements from misconfigured job nodes21:32
mnaserlogan-: urgh.  but you run lxb right?21:32
logan-yup, lxb cloud in that bug21:32
fungibut since the mirror is in a different tenant... it's somewhat unexpected behavior21:33
mnaseri guess this is more of a firewall driver issue.  i think we have some systems that use iptables_hybrid and some with ovs driver21:33
mnaserlogan-: you dont have a repro i assume?21:35
logan-possible way to approach it: find a timestamp for the RA / IP getting added on mirror, and then correlate that with jobs that were running at the time to try and identify the suspect VM(s), then look thru nova/neutron logs to try to find what went wrong21:36
logan-nope, never was able to repro21:36
mnaseri cant imagine this would be easy to reproduce21:36
logan-and you'd think with all the vm launches going on we'd see it more often. it is crazy when it pops up21:36
fungigotta be a rare race with ports and filters or something21:37
mnaserlogan-: how many times have you hit this?  i'm asking because we only recently upgraded to stein for this cloud21:37
logan-once21:37
logan-on rocky21:37
mnaserthis cloud was on queens for a little while and we never hit it, but hitting it once also isn't much of an indicator21:37
mnaserlogan-: what is interesting is this failed almost exactly at 4am utc / 12am est21:37
mnaserhttps://usercontent.irccloud-cdn.com/file/JGp4BKOx/image.png 21:38
logan-hmm, iirc ours was mid-morning EST so i didn't think much of the timing21:38
mnaser19th of september21:39
mnaserlet me see if cacti goes that far back21:39
mnaserunfortunately not21:40
fungilikely the server got rebuilt21:49
fungireplaced, whatever21:49
fungiwe've been replacing a lot of our mirrors over the past year for ubuntu upgrades and newer domain name21:49
fungiand ansibilification21:49
openstackgerritMerged openstack/project-config master: Revert "Temporarily disable vexxhost ca-ymq-1"  https://review.opendev.org/745966 21:50
logan-in the irc log from that bug, https://i.imgur.com/XsJTj6Y.jpg was linked. that's in central time.. so when it happened on our cloud it was around 9:15 AM Central. then it wasn't discovered until around 10:30-11, and the test node(s) that we guessed might have caused the issue were long gone by then.21:55
fungiclarkb: if you're around, what triggers updating the docker image for gerritbot on eavesdrop? i see that we published the hopefully-fixed build to dockerhub but the hourly system-config deploy doesn't seem to be doing it. the daily deploy?22:38
clarkbI think that is currnetly missing22:39
clarkbwe can have gerritbot changes themselves trigger them or do tthem hourly like zuul and nodepool22:39
clarkbalso need to tie in project-config to trigger infra-prod-service-eavesdrop when channel config updates22:40
*** tkajinam has joined #opendev22:57
fungiclarkb: so for now should i just docker-compose down/up -d?23:00
clarkbyou need to do a pull first I think23:12
clarkbbut ya that should do it23:12
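The sequence being described, spelled out (the compose directory is a placeholder; run it wherever the gerritbot docker-compose file lives):

    cd /path/to/gerritbot-compose/   # placeholder path
    docker-compose pull              # fetch the newly published image
    docker-compose down              # stop and remove the old container
    docker-compose up -d             # start a fresh one from the new image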
fungiahh, right-o23:12
fungipulled23:12
*** openstackgerrit has quit IRC23:13
fungidowned23:13
fungiupped23:13
fungiwatching syslog23:13
*** tosky has quit IRC23:13
fungi#status log manually pulled, downed and upped gerritbot container on eavesdrop for recent config parsing fix23:15
openstackstatusfungi: finished logging23:15
*** ryohayakawa has joined #opendev23:49
*** ryohayakawa has quit IRC23:56
*** ryohayakawa has joined #opendev23:57

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!