Friday, 2022-09-30

01:33 *** ysandeep is now known as ysandeep|afk
02:19 *** ysandeep|afk is now known as ysandeep
04:15 *** ysandeep is now known as ysandeep|afk
04:32 *** soniya29 is now known as soniya29|ruck
05:04 *** ysandeep|afk is now known as ysandeep
05:26 *** soniya29|ruck is now known as soniya29|ruck|afk
06:00 *** soniya29|ruck|afk is now known as soniya29|ruck
07:01 *** ysandeep is now known as ysandeep|away
07:07 *** frenzyfriday|ruck is now known as frenzyfriday|rover
07:17 *** jpena|off is now known as jpena
08:10 *** ysandeep|away is now known as ysandeep|lunch
08:13 <opendevreview> Simon Westphahl proposed zuul/zuul-jobs master: Allow overriding of Bazel installer checksum  https://review.opendev.org/c/zuul/zuul-jobs/+/859943
08:31 <opendevreview> Simon Westphahl proposed zuul/zuul-jobs master: Allow overriding of Bazel installer checksum  https://review.opendev.org/c/zuul/zuul-jobs/+/859943
08:45 <opendevreview> Simon Westphahl proposed zuul/zuul-jobs master: Allow overriding of Bazel installer checksum  https://review.opendev.org/c/zuul/zuul-jobs/+/859943
08:46 *** soniya29|ruck is now known as soniya29|ruck|lunch
09:36 *** soniya29|ruck|lunch is now known as soniya29|ruck
09:40 *** ysandeep|lunch is now known as ysandeep
10:09 *** Guest1737 is now known as diablo_rojo
10:24 *** soniya29|ruck is now known as soniya29|ruck|afk
10:34 *** rlandy|out is now known as rlandy
10:36 *** soniya29|ruck|afk is now known as soniya29|ruck
10:45 *** ysandeep is now known as ysandeep|brb
11:13 *** ysandeep|brb is now known as ysandeep
11:42 <opendevreview> Artem Goncharov proposed openstack/project-config master: Add Post-Check flag to OpenStackSDK project  https://review.opendev.org/c/openstack/project-config/+/859976
11:43 <opendevreview> Artem Goncharov proposed openstack/project-config master: Add check-post pipeline  https://review.opendev.org/c/openstack/project-config/+/859977
11:50 *** dviroel is now known as dviroel|afk
11:56 <opendevreview> Artem Goncharov proposed openstack/project-config master: Add Post-Check flag to OpenStackSDK project  https://review.opendev.org/c/openstack/project-config/+/859976
12:15 <opendevreview> Artem Goncharov proposed openstack/project-config master: Add Post-Check flag to OpenStackSDK project  https://review.opendev.org/c/openstack/project-config/+/859976
12:27 <opendevreview> Artem Goncharov proposed openstack/project-config master: Add Post-Check flag to OpenStackSDK project  https://review.opendev.org/c/openstack/project-config/+/859976
13:06 *** dviroel|afk is now known as dviroel
13:07 <opendevreview> Artem Goncharov proposed openstack/project-config master: Add Allow-Post-Review flag to OpenStackSDK project  https://review.opendev.org/c/openstack/project-config/+/859976
13:07 <opendevreview> Artem Goncharov proposed openstack/project-config master: Add post-review pipeline  https://review.opendev.org/c/openstack/project-config/+/859977
13:38 *** soniya29|ruck is now known as soniya29|ruck|dinner
13:39 *** dasm|off is now known as dasm
14:04 *** soniya29|ruck|dinner is now known as soniya29|ruck
14:06 <Clark[m]> fungi: re the git clone errors, I'm not sure playing telephone on the mailing list is going to help us. We probably need to have those affected share IPs if we can't infer them from the logs, then trace specific requests to the backends and see what gitea and haproxy look like
14:07 <Clark[m]> However, that sort of issue is one that corporate firewalls can instigate iirc
14:08 <fungi> yeah, that's why in my last call i asked if folks are using the same provider or are in the same region of the world
14:53 *** lbragstad4 is now known as lbragstad
15:04 <Clark[m]> Gitea06 has a spike in connections, but it's in line with other busy backends
15:08 <fungi> yeah, and it didn't persist
15:09 <clarkb> the other thing to check is for connections in haproxy with a CD (or is it cD?) status
15:10 <clarkb> that indicates haproxy believes the client disconnected iirc
15:10 <clarkb> and would be a good clue that something downstream of us initiated the eof
15:10 <frickler> heads up: coming monday is a bank holiday in Germany, I may or may not be around
15:11 <fungi> enjoy! and thanks for the reminder
15:11 *** ysandeep is now known as ysandeep|out
15:12 *** dviroel is now known as dviroel|lunch
15:17 <clarkb> https://www.haproxy.com/documentation/hapee/latest/onepage/#8.5 yes, CD is client disconnection
15:17 <fungi> clarkb: it seems this is somewhat common if git (or other things doing long-running transfers) is built against gnutls. it's apparently not as robust against network issues as openssl
15:20 <fungi> it might be solved in newer gnutls, or on newer distro releases which build git against openssl (since the relicensing)
15:22 <fungi> though the version in debian/sid still says it depends on libcurl3-gnutls
15:24 <clarkb> I count 301 CD-terminated connections since today's log began from a german ipv6 addr
15:24 <clarkb> this isn't the worst offender (an amazon IP is) and there is a .eu ipv4 addr as well
15:25 <fungi> does traceroute to those from the lb seem to traverse a common provider?
15:25 <clarkb> something like cat haproxy.log | grep -v -- ' -- ' | cut -d' ' -f 6 | sed -ne 's/\(:[0-9]\+\)$//p' | sort | uniq -c | sort | tail
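A slightly tidier equivalent of that one-liner (a sketch; like the original it assumes the client address:port is the sixth whitespace-separated field of the syslog-prefixed haproxy line, and that ' -- ' only appears as the normal termination state):

    # count abnormally terminated sessions (termination state other than --) per client address
    grep -v -- ' -- ' haproxy.log \
      | awk '{print $6}' \
      | sed 's/:[0-9]*$//' \
      | sort | uniq -c | sort -rn | head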
15:26 <fungi> particularly one other than (or after) zayo?
15:26 * clarkb checks
15:27 <clarkb> the ipv4 trace has a distinct lack of hostnames, and since the other is ipv6 it's hard to map across them
15:28 <clarkb> with ipv6 we seem to go cogent to level 3. With ipv4, zayo to I-don't-know-what to the final destination
15:28 <fungi> yeah, i've noticed a lot of backbone hops no longer publish reverse dns entries, sadly
15:28 <fungi> what is this net coming to?
15:28 <clarkb> this is the gitea load balancer
15:28 <fungi> sorry, that was a rhetorical quip
15:28 <clarkb> oh heh
15:29 <clarkb> In any case I do see a fairly large number of CD states recorded by haproxy, implying to me that it is very possible this is downstream of us. But as I mentioned before, playing telephone on the mailing list makes it difficult to say for sure
15:29 <fungi> anyway, i'm turning off my git clone loops since they never produced any errors at all
15:29 <clarkb> if we had someone in here who could tell us the originating IPs and more specific timestamps, this would be easier
15:29 <clarkb> ++
15:31 *** marios is now known as marios|out
15:33 <fungi> gtema: ^ you mentioned seeing it from somewhere in europe?
15:33 <fungi> neil indicated seeing it from his home provider in the uk, and says the ci system he's seeing it on is hosted in germany
15:34 <gtema> yes, I am seeing it from europe
15:34 <gtema> from germany
15:36 <fungi> apparently if you have a git built linking against openssl instead of gnutls it might handle that better, but it looks like debian (and so ubuntu) are at least still linking git with gnutls in their latest builds
15:36 <gtema> under fedora and mac I see a different error, but it ends up failing the same way
15:38 <clarkb> gtema: and this is all over ipv4?
15:38 <gtema> yes
15:38 <clarkb> if you're comfortable PM'ing me a source IP addr I can check the haproxy logs to see if that IP shows the CD termination state
15:39 <dtantsur> gtema: I assume you also have telekom?
15:39 <gtema> 80.158.88.206
15:39 <gtema> dtantsur - yes, telekom
15:39 * dtantsur may have IPv6 though
15:39 <fungi> hopefully this isn't hurricane impact on transatlantic communications, but the winds are only just now reaching mae east where most of that peering is
15:40 <gtema> but not only from home telekom, from the telekom cloud as well; in all cases the traceroute starts to hiccup in the zayo net
15:40 <gtema> but issues have also been seen between nl and uk already, sometimes uk - us
15:40 <clarkb> I see 10 completed connections. 5 with CD state. 3 with SD. 2 with -- (normal)
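A tally like that can be reproduced with something along these lines (a sketch; 80.158.88.206 is the address gtema shared above, and the two-character termination state is matched as a standalone field):

    # count sessions from one client, broken down by haproxy termination state
    for state in CD SD -- ; do
      printf '%4s %s\n' "$state" "$(grep -F '80.158.88.206' haproxy.log | grep -c -- " $state ")"
    done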
15:41 <dtantsur> PING opendev.org(2604:e100:3:0:f816:3eff:fe6b:ad62 (2604:e100:3:0:f816:3eff:fe6b:ad62)) 56 data bytes
15:41 <dtantsur> From 2003:0:8200:c000::1 (2003:0:8200:c000::1) icmp_seq=1 Destination unreachable: No route
15:41 <dtantsur> this is weird, I can definitely clone from it..
15:41 <clarkb> dtantsur: interesting, you're on ipv6 not ipv4?
15:41 <dtantsur> but yeah, my difference with gtema may be IPv6
15:41 <clarkb> that may explain it
15:41 <fungi> also the mae east peering is pretty much a bunker (inside a parking garage in tyson's corner), and has massive battery backups, so very unlikely to be the weather ;)
15:42 <clarkb> interestingly, the SD states all happened in a ~5 minute window about 5 hours ago, and since then it's largely CDs spread out
15:43 <clarkb> definitely seems network related, particularly if someone on the same isp but using ipv6 is able to get through just fine
15:43 <dtantsur> clarkb: aha, curl tries IPv6 and falls back to v4
15:43 <clarkb> dtantsur: are you able to force ipv4 and reproduce?
15:43 <clarkb> (you can modify /etc/hosts for opendev.org to be the ipv4 addr only)
15:43 <clarkb> there might be a git flag too, but ^ should work reliably
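A minimal sketch of that /etc/hosts override (the address below is a placeholder from the documentation range, not necessarily opendev.org's real A record; substitute the output of dig +short A opendev.org):

    # /etc/hosts -- pin opendev.org to IPv4 only so git/curl never tries the AAAA record
    203.0.113.10  opendev.org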
15:44 <dtantsur> "connect to 2604:e100:3:0:f816:3eff:fe6b:ad62 port 443 failed: Network is unreachable" from curl
15:44 <dtantsur> I do find it suspicious
15:44 <dtantsur> lemme try
15:44 <gtema> clarkb - do you see any changes with my ip - right now it failed again
15:44 <clarkb> gtema: this time it was an SD
15:44 <gtema> SD is what?
15:45 <clarkb> let me check the gitea02 logs
15:45 <clarkb> gtema: the server initiated the disconnect
15:45 <gtema> hmm
15:45 <gtema> I see: error: RPC failed; curl 18 transfer closed with outstanding read data remaining
15:45 <gtema> fatal: early EOF
15:45 <gtema> fatal: fetch-pack: invalid index-pack output
15:45 <fungi> git has a "-4" command-line option
15:45 <dtantsur> clarkb: I forced v4 in hosts, cloning successfully so far
15:46 <fungi> git clone -4 https://opendev.org/openstack/devstack
15:46 <fungi> or whatever
15:46 <dtantsur> does anyone have a guess why I'm seeing "destination unreachable" using v6?
15:46 <clarkb> uhm, did we stop recording backend ports in the haproxy logs?
15:46 <gtema> dtantsur - for me it clones perfectly until the last percent
15:47 <dtantsur> gtema: I repeated the clone in a loop
15:47 <dtantsur> all succeeded
15:47 <fungi> dtantsur: are you able to reach anything over v6 from where you're testing?
15:47 <clarkb> gtema: which repo was that clone for that just failed?
15:47 <gtema> git clone https://opendev.org/openstack/heat
15:47 <fungi> though i think frickler previously reported that vexxhost's v6 prefixes are too long and get filtered by some isps
15:48 <gtema> but it's not 100% reproducible - sometimes it passes, but it is clearly slow
15:48 <fungi> and at least at the time when it was a problem they didn't have any backup v6 routes for longer prefixes
15:48 <fungi> er, i mean backup routes with shorter prefixes
15:48 <dtantsur> fungi: google and facebook curl via v6 for me
15:49 <gtema> dtantsur - sure you get google from us?
15:49 <gtema> I tested it also and was landing always in EU mirrir
15:49 <gtema> mirror
15:49 <dtantsur> gtema: the question was about v6 on my side
15:49 <dtantsur> I'm quite sure it works in general
15:49 <fungi> yeah, i guess it's possible their v6 routes are still filtered at some isps' borders because of the prefix length
15:49 <dtantsur> that explains it
15:49 <fungi> i haven't looked in bgp recently to see what it's like for them
15:51 <dtantsur> anyway, it's funny. we have the same provider in the same region of the same country. I cannot reproduce the failure Oo
15:51 <fungi> there was old iana guidance that said v6 announcements shouldn't be longer than 48 bits, and vexxhost was doing 56 i think, in order to carve up their assignments for different regions in a more fine-grained way
15:51 <clarkb> dtantsur: can you try talking to https://gitea02.opendev.org:3081 to reproduce?
15:51 <clarkb> it's possible this is backend specific and that is the backend gtema is talking to
15:52 <dtantsur> curl works, trying git
15:54 <dtantsur> (okay, same provider, but my IP address is clearly from a different subnet)
15:54 <fungi> https://bgpview.io/ip/2604:e100:3:0:f816:3eff:fe6b:ad62 reports seeing a /48 announced currently. maybe it was that the old iana guidance was for filtering prefixes longer than 32 bits. i'll see if i can find it
15:54 <dtantsur> 4 attempts worked on that backend
15:57 <clarkb> dtantsur: thank you for checking
15:57 <clarkb> fwiw I'm still trying to work my way through the logs on gitea02 to see if there is any indication of the server closing things for some reason (so far nothing)
15:58 <gtema> definitely something on the general routing side: time git clone https://gitea02.opendev.org:3081/openstack/python-openstackclient => 5s (from 91.7.x.x) and 2m39s from 80.158.88.x
15:58 <gtema> but in both cases it worked now
15:59 <clarkb> https://stackoverflow.com/questions/21277806/fatal-early-eof-fatal-index-pack-failed indicates that the heat clone error may be due to git's memory needs
16:00 <clarkb> (I think that is client side?)
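The usual client-side workaround offered in that stackoverflow thread is to rein in git's pack memory use, roughly as below (a sketch, not something verified against this particular failure):

    # limit delta window memory, cap pack size, and use a single pack thread
    git config --global pack.windowMemory 100m
    git config --global pack.packSizeLimit 100m
    git config --global pack.threads 1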
16:00 <gtema> heh, this looks funny - are our repos now too big?
16:00 <clarkb> for your client's defaults to unpack, maybe?
16:00 <frickler> I can confirm the issue with IPv6 from AS3320 to opendev.org is back. I can try to ping my contact there, or maybe gtema has some internal link
16:00 <fungi> on the v6 prefix filtering, all current recommendations i'm finding are to filter out bgp6 announcements longer than 48 bits, so that vexxhost route should be fine by modern standards. even ripe seems to say so (slide 54 in this training deck): https://www.ripe.net/support/training/material/webinar-slides/bgp-security-irr-filtering.pdf
16:01 <frickler> yes, /48 is usually acceptable, and the route6 object that mnaser created last year for this is still in place
16:01 <clarkb> gtema: git/2.33.1.gl1 is that you?
16:01 <clarkb> er, is that your git client version?
16:02 <gtema> 2.35.3
16:02 <gtema> and 2.37.3 from mac
16:03 <gtema> frickler - wrt as3320, I have a contact who knows a contact who ... so generally I can try, but that is definitely not going to be fast going my way
16:05 *** dviroel|lunch is now known as dviroel
16:05 <clarkb> gtema: interesting, does that mean your heat clone that failed above ran for about half an hour?
16:05 <clarkb> (I'm trying to reconcile it with what I see in the logs)
16:06 <gtema> clarkb - nope. It either breaks after 3-4 min or I cancel it
16:06 <fungi> it's possible 30 minutes was when the lb gave up waiting for the client response
16:06 <clarkb> fungi: oh, that could be
16:06 <fungi> because it never "got the memo" that the client went away
16:07 <clarkb> ya, then it's a server side disconnect at that point maybe
16:07 <fungi> right, we may have failsafe timeouts on the gitea end or in haproxy, or it could be some state tracking middlebox elsewhere which terminated the session later
16:08 <fungi> maybe even an inactivity timeout from conntrack on the hypervisor host
16:08 <fungi> too many variables
16:09 <clarkb> but ya, I see 200s and 307s. There are a couple of 401s that are curious, but I have to assume those are due to parameters sent with the request that I can't view in the logs
16:09 <clarkb> separately, I thought we had fixed the issue of having ports logged all the way through to trace requests, and we definitely don't have that anymore
16:09 <clarkb> we need the port for the backend request logged in haproxy. Then we also need to have gitea log the port (it logs :0 for some reason)
16:10 <fungi> i thought we had it logged in apache on the gitea side?
16:11 <clarkb> fungi: we do, but that is insufficient to trace a connection/request from haproxy through to gitea
16:11 <fungi> or maybe it was that we broke that logging when we added the apache layer
16:11 <clarkb> we need it in the gitea logs and haproxy as well to trace through the entire system
16:11 <clarkb> I'm going to look at that now, as debugging this without that info is not pleasant
16:12 <fungi> i can't imagine how we lost the haproxy side log details, unless they changed their log formatting language
16:14 <clarkb> it seems we're ignoring the format specification for sure
16:15 <fungi> i see the source ports log-format  "%ci:%cp [%t] %ft [%bi]:%bp %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%
16:15 <fungi> sc/%rc %sq/%bq"
16:15 <fungi> grr, stray newline in my buffer
16:16 <fungi> but yes, it doesn't seem to appear in the logs
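For reference, fungi's pasted log-format rejoined on one line (the stray newline removed) reads:

    log-format "%ci:%cp [%t] %ft [%bi]:%bp %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq"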
16:17 *** jpena is now known as jpena|off
16:17 <clarkb> [38.108.68.124]:52854 balance_git_https/gitea02.opendev.org 1/0/164628
16:17 <clarkb> does that map to [%bi]:%bp %b/%s %Tw/%Tc/%Tt ?
16:17 <clarkb> and if so, why is the frontend ip shown in the []s
16:18 <clarkb> oh, that is because it is the source side of the connection
16:19 <clarkb> fungi: ok, I think we have haproxy <-> apache info but not apache <-> gitea
16:20 <fungi> oh, right, we have to map the frontend entries to their corresponding backend entries in the haproxy log
16:21 <clarkb> the gitea log seems to be logging some forwarded-for entry and shows 38.108.68.124:0. I think I recall trying to add this info to the gitea log, and apparently that doesn't work
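Roughly, the three log layers being lined up here are (a sketch of the intent as described above, not the exact formats in use):

    haproxy:  [%bi]:%bp       lb-side addr:port of the haproxy -> apache (backend) connection
    apache:   peer addr:port  the same connection, as logged on the gitea backend host
    gitea:    remote addr     should carry a matching addr:port for the request, but currently records e.g. 38.108.68.124:0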
16:22 <clarkb> using that extra bit of info, gitea and apache appear to report an http 200 response for the request behind that fatal: fetch-pack: invalid index-pack output
16:22 <clarkb> I think it must've failed on the client side and then not properly shut down the connection, so the server eventually does it
16:23 <clarkb> maybe the bits are being flipped or there is some memory issue on the client, hard to say at this point, but good to know gitea seems to think it was fine
16:28 <clarkb> Ok, and I did update the app.ini for gitea to record the request remote addr, which I thought was working in testing
16:29 <clarkb> Either going through the haproxy breaks this, or this is a gitea regression
16:32 <clarkb> https://github.com/go-chi/chi/issues/453 I think that is related
16:37 <fungi> https://github.com/go-chi/chi/issues/708 seems to propose an alternative which would preserve the port
16:38 <fungi> and there's this option: https://github.com/go-chi/chi/pull/518
16:40 <fungi> clarkb: maybe the simple solution is to just tell apache mod_proxy not to set x-forwarded-for and then map them up from the logs ourselves?
16:40 <clarkb> I think gitea 1.14.0 broke this. This is when gitea migrated from macaron (which my commit updating the access log format refers to) to chi
16:41 <clarkb> fungi: we'd need to record the port in gitea somehow and I'm not sure how to do that without x-forwarded-for. Or maybe you're saying it will fall back to the regular behavior? Ya, that might make sense
16:42 <fungi> right, wondering if the realip middleware will just not replace it if it thinks the client address is already "real" (because of a lack of proxy headers)
16:42 <clarkb> we should be able to test that at least
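If we go the "don't set x-forwarded-for" route, the apache side is probably just the stock mod_proxy knob (a sketch, untested here; whether gitea's realip middleware then leaves the raw connection peer alone is the part that needs testing):

    # stop mod_proxy from adding X-Forwarded-For/Host/Server on requests proxied to gitea
    ProxyAddHeaders Off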
16:43 <clarkb> I need to eat something, but then I can look at updating the apache configs to try that
16:43 <clarkb> also, I think we can drop the custom access log template now that gitea doesn't use macaron
16:43 <fungi> yeah, i still haven't had a chance to get my morning shower, so should take a break as well
16:43 <clarkb> it's possible that may fix it too if we're accessing the remoteaddr differently (though I doubt it)
17:48 <clarkb> fungi: I'm not seeing any way to have apache record the port it uses to establish the proxy connection?
17:51 <clarkb> oh, I see
17:58 <opendevreview> Clark Boylan proposed opendev/system-config master: Update gitea logs for better request tracing  https://review.opendev.org/c/opendev/system-config/+/860010
17:59 <clarkb> something like that, maybe. We should be able to check the logs from the zuul jobs to confirm
18:00 <fungi> ah okay, so we were already setting a custom logformat in the vhost
18:00 <fungi> the suggestions i found were for adjusting loglevel for the proxy subsystem
18:06 <clarkb> fungi: ya, I think we set the custom apache log to log the port of the upstream connection
18:06 <clarkb> previously this should've given us the same host:port pair in all three log files
18:06 <clarkb> but then gitea updated to chi and broke that
18:27 <clarkb> fungi: I jumped onto the host and this seems to be working. I'm actually going to split this into two changes now, because maybe our overriding of the access log template is at least partially to blame
18:30 <opendevreview> Clark Boylan proposed opendev/system-config master: Update gitea logs for better request tracing  https://review.opendev.org/c/opendev/system-config/+/860010
18:30 <opendevreview> Clark Boylan proposed opendev/system-config master: Switch back to default gitea access log format  https://review.opendev.org/c/opendev/system-config/+/860017
19:47 <clarkb> Looks like https://zuul.opendev.org/t/openstack/build/6622a1ec85d0476584db69047ed4673d/log/gitea99.opendev.org/logs/access.log#11 shows that simply using the default log format won't fix this. I think landing that is good cleanup though, and it isn't a bigger regression
19:48 <clarkb> and we need to collect apache logs
19:48 * clarkb writes another change
19:49 <opendevreview> Clark Boylan proposed opendev/system-config master: Collect apache logs from gitea99 host in testing  https://review.opendev.org/c/opendev/system-config/+/860030
20:23 *** lbragstad1 is now known as lbragstad
20:28 *** dviroel is now known as dviroel|afk
20:39 *** frenzyfriday|rover is now known as frenzyfriday
20:41 <opendevreview> Merged opendev/system-config master: Switch back to default gitea access log format  https://review.opendev.org/c/opendev/system-config/+/860017
21:17 *** dasm is now known as dasm|off
21:22 <opendevreview> Merged opendev/system-config master: Update gitea logs for better request tracing  https://review.opendev.org/c/opendev/system-config/+/860010
21:41 <clarkb> the testing update followup to ^ seems to show they both work in a way that is traceable now
21:41 <clarkb> it's not as simple as finding the same string in three different log files, but it is doable
21:41 <fungi> yep
21:44 <clarkb> I did leave a message in the gitea dev discord channel (via matrix) asking if anyone knows why that is broken.
21:44 <clarkb> I still suspect go-chi, but from what I can tell go-chi wants middleware installed to do that translation, and go-chi isn't the one filling in the :0 when no other valid value is found. That makes me think gitea itself may be doing it somehow
22:44 <opendevreview> Merged opendev/system-config master: Collect apache logs from gitea99 host in testing  https://review.opendev.org/c/opendev/system-config/+/860030
