Wednesday, 2023-08-23

fungimnaser: okay so that's connecting to afs01.dfw.openstack.org from vexxhost ca-ymq-1?00:24
mnaserfungi: that’s correct00:25
mnaserwell, from the same site but not from ca-ymq-1 directly :)00:25
fungii can successfully `ls /afs/openstack.org/` from mirror.ca-ymq-1.vexxhost.opendev.org00:26
mnaserfungi: hmm, the only thing that changed is the SDN is now OVN.. but I’m seeing traffic go back and forth so I’m a bit clueless00:28
mnaserI wonder if it’s something to do with UDP traffic00:29
fungiinteresting, i think mirror.ca-ymq-1.vexxhost.opendev.org is hitting afs01.ord.openstack.org instead of dfw00:35
fungimaybe it's a problem reaching rax-dfw?00:35
fungiping works fine though00:35
mnaserfungi: it does ping fine indeed, i'm even seeing traffic back and forth.. but just seems like the ls times out00:37
fungifrom home i'm hitting afs01.dfw.openstack.org when i browse though00:38
fungiso if it's a problem with dfw and not ord, it's also not broken from some parts of the internet either00:39
mnaseryeah.. i wish i had more info, maybe what's happening is some mtu thing00:39
fungipmtud blackhole maybe00:39
fungii'll see if i can figure out how to get the mirror to hit dfw00:41
mnaserim curious what a successful tcpdump looks like00:42
* mnaser hmm, interesting, looks like the afsd process is hung00:44
fungimnaser: https://paste.opendev.org/show/bDrIsFdnfpLUUqqbn7KL/00:44
fungithat's from my home workstation which seems to be hitting afs01.dfw00:45
mnaserhrm00:46
mnaseryeah definitely don't see that00:46
mnaserhttps://paste.opendev.org/show/bB7O2Jye8Hug5NPFL2iB/00:48
mnaserthis is what im seeing on start up00:49
fungiwhat openafs-client version? (or are you using kafs?)00:50
mnaseropenafs-client-1.8.8.1-1.el7.x86_6400:51
fungiwe're using 1.8.8.1-3ubuntu2~22.04.2 on our zuul executors, so should be fine00:51
fungiany chance you see similar issues from sjc1?00:52
mnaserdoes openafs use bidirectional traffic?00:52
mnaseri have allow out all, but nothing inbound (other than ssh)00:52
mnaserand i can test.. but this is not from the public cloud but from another cloud running in the same site (so not vexxhost cloud per se)00:53
mnaseri wonder if this is because ovn is using stateless rules00:54
mnaserso i need to add the openafs client ports00:54
JayFI am pretty sure UDP needs incoming ports opened, and it was a range iirc (it's been a very long while since I ran openafs)00:54
fungiyeah, i believe it opens return data channels00:55
mnaseryeah and ovn uses stateless fw rules as opposed to ml2/ovs stateful ones00:55
mnaserso i wonder since this system only has port 22 open..00:55
JayFwell, I think it's the sorta thing you'd need special handling for, like ftp or sip do to mark return traffic as related00:55
JayFit's not so much stateful as I think two-way RPC comms 00:56
mnaserill try to enable 7000 to 7007 and see what happens00:56
JayF(again my knowledge is at least a decade old so please validate for yourself)00:56
fungiwe have iptables set for "-m state --state ESTABLISHED,RELATED -j ACCEPT" but nothing special for inbound communications, fwiw00:57
fungimaybe ovn is less smart about identifying afs responses00:57
fungiwe have our security groups wide open in both directions though, and are relying on iptables on the server instances instead01:00
fungiat home it's working fine through an openbsd nat, for that matter, with no inbound mappings at all01:03
mnaseryeah but i think ovn is full stateless when it comes to its security groups01:05
fungiah, okay, so doesn't attempt to do udp state tracking guesswork?01:07
mnaserit might not be, well, i just tried to enable all incoming tcp and udp and still no bueno01:08
mnaser`$ ls /afs/openstack.org` just hangs01:08
mnaserim deffo seeing 'action' in tcpdump happening.. with pings and pongs,01:09
fungiour mirrors in vexxhost ca-ymq-1 and sjc1 are hitting afs01.ord instead of dfw, so it may still be something going missing on some routes and not others01:09
mnaserthe odd thing is im actually seeing traffic go and come back01:09
mnaserhttps://paste.opendev.org/show/bd223OuxBuM5GS0zup5n/ things are moving here01:11
fricklermnaser: so iiuc the issue is when a vm is in a tenant network behind OVN? as opposed to our mirror which is in a provider network, so no OVN SNAT involved?07:44
*** dviroel_ is now known as dviroel11:27
fungialternatively it could be some packets being modified or going missing between there and rackspace's dfw network but not their ord network11:36
fungii didn't try forcing mirror.ca-ymq-1.vexxhost.opendev.org to prefer afs01.dfw.openstack.org instead of afs01.ord.openstack.org11:37
mnaserfrickler: so it seems with ovn + distributed floating ips.. if the MTU of the tenant network is smaller than the provider network, the network will end up getting packets that are too big for the interface and get dropped13:51
*** d34dh0r5- is now known as d34dh0r5313:52
fricklermnaser: ah, yes, that would be a plausible explanation. one of the reasons I always make sure tenant networks as well as provider networks have MTU 150013:55
fungimnaser: can't it do proper pmtud or allow fragments worst case?13:55
JayFmnaser: frickler: That's literally the thing we just figured out was breaking Julia's attempt to get Ironic+OVN working :)13:56
fricklerJayF: that was the issue with tftp stalling, right?13:56
JayFwell, more than just tftp, we switched to http and it was still stalling13:57
fungimaybe i'm just too used to bsd based routers, but mtu blackholes are, like, a flashback to the 1990s for me13:57
JayFyeah when I saw the shape of Julia's failure MTU is the first thing I said 13:58
fungiit's hard to believe a modern routing platform wouldn't just quietly solve them (either by telling the peers to negotiate packet sizes down or fragmentation and reassembly on the fly)13:59
fricklerI wouldn't be surprised if OVN would silently drop too big packets instead of generating proper ICMP responses, but also I'm biased about that piece of software14:00
fungithe other possibility is that something else is zealously discarding the icmp error replies14:01
fungii used to see that back in the bad old days where people thought "block all icmp" was something expected for firewalling14:01
fricklerguess I'll need to do some devstack test setup. I need to do a big update on OVN gaps anyhow14:05
frickleror maybe I can create a test network in our vexxhost tenant?14:05
fungias long as you don't attach any of our production servers to it, that's probably safe14:06
mnaserfrickler: the vexxhost public cloud is not running ovn yet.14:06
fungiand also without access to the underlying infrastructure it might be hard to troubleshoot even if it were14:07
mnaserfor context, tenant network is 1450, provider network is 1500, distributed floating ip set to on .. boom14:08
fricklermnaser: so without distributed fip it is working?14:09
fricklerplease create a bug report for neutron if possible. also just for completeness, which ovn version?14:11
fricklerthinking about it, JayF: do you have a bug report for your ironic issue?14:15
JayFI don't know if we ever considered it a bug; I think we consider it a devstack configuration issue14:17
JayFwhile building ironic support for ovn14:17
mnaserfrickler: i did not try to have it disabled, but i think it might be more complicated since i end up with many messages like this in dmesg: `tapdf9341b0-6d: dropped over-mtu packet: 1472 > 1450`14:24
fricklermnaser: oh, so that looks like OVN does forward the packet, but the kernel then drops it14:27
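The dmesg line quoted by mnaser has a regular shape, so drops of this kind can be counted per device. A minimal sketch, assuming the kernel message format is exactly as quoted above; the function name `parse_over_mtu` and the regular expression are illustrative, not part of any existing tool:

```python
import re

# Matches kernel messages of the form quoted in the log, e.g.
# "tapdf9341b0-6d: dropped over-mtu packet: 1472 > 1450"
OVER_MTU = re.compile(
    r"(?P<dev>\S+): dropped over-mtu packet: (?P<size>\d+) > (?P<mtu>\d+)"
)


def parse_over_mtu(lines):
    """Return (device, packet_size, mtu) for every line that matches."""
    drops = []
    for line in lines:
        match = OVER_MTU.search(line)
        if match:
            drops.append(
                (match.group("dev"), int(match.group("size")), int(match.group("mtu")))
            )
    return drops


# The sample line is the one from the log; real input would come from dmesg.
sample = ["tapdf9341b0-6d: dropped over-mtu packet: 1472 > 1450"]
for dev, size, mtu in parse_over_mtu(sample):
    print(f"{dev}: packet of {size} bytes exceeds MTU {mtu} by {size - mtu}")
```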
fricklerralonsoh: lajoskatona: ^^ is that something you've maybe seen before?14:27
clarkbone issue with neutron and mtus historically is that there are multiple l2 (not l3) devices in a row which prevents them from generating proper icmp responses to the source to shorten their packets14:27
clarkbbecause icmp operates at l3 and requires an ip address iirc14:28
ralonsohlet me read the conversation14:28
fungioh, yes connecting l2 broadcast domains without an actual routing node is a recipe for packet loss and black holes14:28
clarkbthis problem is what drove me to push neutron to manage mtus far more aggressively along the entire pathway for VM connections because neutron is the only thing that knows the complete info14:28
clarkbessentially neutron should (and did/does) set the mtu to the lowest value on the pathway across all nodes14:28
clarkbto address this sort of problem. But for a long time neutron did not do this and things broke a lot particularly when we first started doing multinode testing14:29
fungithe former network engineer in me shudders at that thought14:29
fricklerralonsoh: tl;dr: UDP from the outside towards a FIP with size 1500 gets dropped when the inside tenant network has smaller MTU14:29
clarkbmy initial impression is that neutron should/must set the mtu on the fip to the lowest size of the path behind it14:29
ralonsohfrickler, without packet fragmentation?14:30
clarkbbecause the path behind it is almost always >3 interfaces only the last of which actually has an ip address and can send an icmp14:30
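clarkb's point about using the lowest MTU along the whole path can be illustrated with a small calculation. This is only a sketch of the idea, not how neutron implements it; the helper names are hypothetical, and the numbers are the ones mnaser reported (tenant network 1450, provider network 1500, dropped packet of 1472 bytes):

```python
def effective_mtu(path_mtus):
    """The largest packet that survives every hop is the smallest MTU on the path."""
    if not path_mtus:
        raise ValueError("path must contain at least one hop")
    return min(path_mtus)


def fits(packet_size, path_mtus):
    """True if a packet of this size can traverse the path without fragmentation."""
    return packet_size <= effective_mtu(path_mtus)


# Values taken from the log: provider network 1500, tenant network 1450.
path = [1500, 1450]
print(effective_mtu(path))  # smallest MTU on the path
print(fits(1472, path))     # the packet size from the dmesg line
```

With these values a 1472-byte packet is accepted on the provider side but is too large for the tenant side, which matches the drop mnaser saw on the tap device.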
fricklerralonsoh: it seems so. I'll set up a test in devstack for myself now14:30
ralonsohfrickler, I would need to check with core OVN folks if that is possible. If not, the inner tenant network should increase the MTU14:31
ralonsohis that a problem to increase the MTU?14:31
fungidoes whatever neutron's using to interconnect the bridges actually have the ability to do on-the-fly transparent fragmentation and reassembly?14:31
clarkbyou can't increase the inner tenant mtu if it is running on a 1500 mtu host network14:31
funginormally i'd expect to connect the bridges by a router which can supply ntf responses to the sender at least14:32
fungior df and do pmtud14:32
ralonsohsorry what bridges? the issue is between networks, not bridges14:32
mnaserralonsoh: in my case i can increase it with no issues, but this seems like an undocumented ml2/ovs vs ml2/ovn issue14:32
mnaserfungi: i think you're thinking in ml2/ovs terms14:33
mnasertap/qbr/qvo/etc .. not a thing in an ovn deployment14:33
ralonsohmnaser, for sure. Please open a LP bug because I need to redirect that to some core OVN folks14:33
mnaseri am writing one up right now :)14:33
ralonsohand please specify the network types used14:33
fungiclarkb was saying earlier that in some architectures neutron is directly connecting layer 2 broadcast domains with potentially differing mtus14:33
clarkbmnaser: fwiw the example you gave is for a tap so thats still a thing here14:33
ralonsohmnaser, are you using VLAN?14:33
mnaseryea, its a tap directly attached to br-int14:33
mnaseryes vlan provider network ralonsoh 14:34
ralonsohyes, ok, that could be a problem with FIPs and DVR14:34
clarkbto be clear I don't know what neutron is doing with ovn. I do know this exact sort of problem was a major issue in neutron when we set up multinode testing and you couldn't rely on fragmentation due to the lack of ability for devices to send icmp responses14:34
ralonsohactually we are going to limit that for now14:34
ralonsohreason: FIP with port forwarding implies sending the traffic through the SNAT node (central node)14:35
ralonsohand that implies any other FIP in this network should be sent too to the central node14:35
ralonsohbut please, open the LP bug documenting the issue. I'll mark it as priority=high14:36
mnaserralonsoh: https://bugs.launchpad.net/neutron/+bug/203281714:37
ralonsohthanks14:37
clarkband the tap device doesn't have an associated l3 address to generate an icmp response from I guess?14:39
clarkbfwiw it wouldn't surprise me if OVN says this isn't a bug because the remote side is supposed to complain not the source right?14:39
-opendevstatus- NOTICE: Gerrit is going to be restarted to pick up a small config update. You will notice a short outage of the service.15:33
dpawlikfungi: hey, tl;dr it was a network issue. AFS is connecting to openstack well :)15:52
dpawlikthanks for yesterday's help 15:52
fungidpawlik: good to know, thanks for following up!16:58
fricklerdpawlik: just being curious, was that the same thing that mnaser was talking about earlier or a different one?17:21
fricklermnaser: ralonsoh: just fyi found this by accident: "MTU handling (fragmentation on output)" https://github.com/ovn-org/ovn/blob/main/TODO.rst18:45
mnaserfrickler: if that’s the case.. ouch :(18:48
fungijust don't stub your toe on all the "under construction" signs18:50
fricklerat least I could trivially reproduce this by simply doing a large ping to an instance in devstack. so also not related to vlan, same thing with geneve19:08
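For anyone repeating frickler's large-ping test, the payload size that exactly fills a given MTU can be worked out as follows. A minimal sketch, assuming IPv4 with a 20-byte header (no IP options) and the 8-byte ICMP echo header; the function name is illustrative:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


# MTU values from the log: tenant network 1450, provider network 1500.
for mtu in (1450, 1500):
    print(mtu, max_icmp_payload(mtu))
```

A payload above the tenant-network value but at or below the provider-network value (for example passed to `ping -s`) is the interesting range, since it fits the outer network but not the inner one.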
fricklerfungi: or "mind the gap(s)", cf. also https://review.opendev.org/c/openstack/neutron/+/89257819:18
JayFfrickler: I passed that on to Julia, as well19:27
JayF[screams in MTU but nobody can hear it because the scream is too large]19:28
JayFshe literally has been trying to get OVN working with Ironic jobs off and on for a month+ and that's at least one giant reason why not :(19:28
clarkbwhy would OVN spoof your dns servers? that seems like a massive flaw21:15
clarkbwe (opendev) intentionally configure DNS servers external to our clouds because the cloud run dns servers are often deficient21:15
fungis/deficient/utterly broken/21:18
fricklerclarkb: they do that because they replaced the dnsmasq setup that Neutron used earlier with packet mangling rules. those rules don't have any IP address they could use as source other than the one the original request was sent to21:49
clarkbfrickler: but neutron dnsmasq is/was only for dhcp right? it isn't the actual dns server?21:50
clarkbI guess I don't understand the nuance there21:50
fricklerclarkb: it does both, depending on the deployment scenario. https://docs.openstack.org/neutron/latest/admin/config-dns-res.html#case-2-dhcp-agents-forward-dns-queries-from-instances21:51
frickleractually that doc is incomplete, it doesn't mention that queries for local host entries are not forwarded21:52
fricklerso if you have vm1 and vm2 in a subnet, and vm1 asks "A? vm2", dnsmasq will answer that query from a hosts file generated by neutron21:53
fricklerthe difference is that in the OVS scenario, the vm will explicitly have configured the dhcp server IP as resolver address to query21:55
fricklerwith OVN, you ask 1.1.1.1 or similar, and OVN will intercept that packet and generate the answer "vm2 A 10.1.2.3"21:56
fricklerand yes, in some legislation that could be seen as data fraud21:56
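The resolution behaviour frickler describes (answer local host names from a generated table, forward everything else) can be sketched as a toy lookup. This is an illustration of the logic only, not dnsmasq or OVN code; `LOCAL_HOSTS`, `forward_upstream`, and `resolve_a` are hypothetical names, and the single entry reuses the example from the log:

```python
# Hypothetical per-subnet host table, standing in for the hosts file
# that neutron generates. The entry is the example from the log.
LOCAL_HOSTS = {"vm2": "10.1.2.3"}


def forward_upstream(name):
    """Placeholder for forwarding to a real upstream resolver (assumption)."""
    return None


def resolve_a(name, local_hosts=LOCAL_HOSTS, upstream=forward_upstream):
    """Answer locally known names directly, forward everything else."""
    if name in local_hosts:
        return ("local", local_hosts[name])
    return ("upstream", upstream(name))


print(resolve_a("vm2"))
print(resolve_a("example.org"))
```

The difference between the two backends, as described above, is where this lookup sits: with dnsmasq the instance addresses the resolver explicitly, while with OVN the same decision is applied to packets addressed to some other resolver.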
clarkbhuh I never knew that neutron would resolve that way. I thought you had to do designate integration and query the actual zone22:11
fricklerI think this is somehow mimicking what nova-network did, though I never worked with the latter22:18
clarkband I guess dns over tls or dns over https would completely break that too. Seems that is the direction we're headed too22:25
frickleryes. dnssec in particular won't work with that. so maybe in due time this will all be dropped and only designate integration remains. if designate ever learns to deal with dnssec, that is22:29
*** dtantsur_ is now known as dtantsur23:58
