Followup: OpenConnect unusably slow

Wed Jun 19 15:57:07 EDT 2013

On Wed, 2013-06-19 at 21:16 +0200, Thomas Richter wrote:
> 
> 18.06.2013 23:20:58 **fragmentation flood** 129.69.90.139, 443->> 
> 192.168.2.103, 59947 (von PPPoE - Eingang)
> 18.06.2013 23:20:56 **fragmentation flood** 129.69.90.139, 443->> 
> 192.168.2.103, 59947 (von PPPoE - Eingang)

This is almost certainly your culprit. You say the router does no
filtering, but there's no point in detecting and reporting a
"fragmentation flood" if you aren't going to start *dropping* some of
the packets.

It is very odd, though, to drop the 'start of packet' fragments and not
the subsequent ones. And the times in the log you show above don't
match the times at which packets were missing from your previous
tcpdumps.

> So, could it be that either the router is not telling me the right MTU, 

The MTU is per-link. The Ethernet link between your internal clients and
the router, over your wireless or wired network, *is* 1500. It's not
giving you incorrect information.

The MTU on the link between the router and the ISP is (presumably) 1492.

There exists the concept of a "path MTU", which is the *lowest* MTU of
all the links on the path between you and the server. Which is likely
also to be 1492, since every *proper* link has an MTU of at least 1500.
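
To make the arithmetic concrete, here is a minimal Python sketch. The
hop names and the list of links are made up for illustration, and the
MTU of the hops beyond your ISP is an assumption. The only solid numbers
are the 1500-byte Ethernet MTU and the 8 bytes of PPPoE overhead (a
6-byte PPPoE header plus a 2-byte PPP protocol field), which is where
1492 comes from.

```python
# Hypothetical list of hops; names and the last entry are illustrative only.
ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8  # 6-byte PPPoE header + 2-byte PPP protocol ID

links = [
    ("client -> router (Ethernet/WiFi)", ETHERNET_MTU),
    ("router -> ISP (PPPoE)", ETHERNET_MTU - PPPOE_OVERHEAD),
    ("ISP -> server (assumed)", ETHERNET_MTU),
]


def path_mtu(links):
    """The path MTU is the lowest MTU of any link on the path."""
    return min(mtu for _name, mtu in links)


print(path_mtu(links))
```

With these assumed links the PPPoE hop is the bottleneck, so the path
MTU comes out as 1492.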

In Legacy IP (IPv4), routers are allowed to automatically fragment
packets. So when a router sees a packet that's too big to fit down the
next link, it can split it up into fragments itself.

However, that ends up being really inefficient. The hosts at each end
really want to learn what the *path* MTU is, and send packets no larger
than that size.

To do this, they set a 'Don't Fragment' (DF) bit on the packets they
send. And this means that a router *doesn't* automatically fragment
them. Instead it sends an ICMP 'fragmentation needed' message back to
the sender, letting the *sender* know that it needs to reduce the size
of the packets it's sending. (In IPv6, there is no DF bit and it's as if
it were always set; routers *never* fragment for you.)
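
The decision a router makes can be sketched roughly like this in Python.
It is a toy model and not any real router's code: the Packet class and
the forward() function are hypothetical names, and it assumes a 20-byte
IPv4 header with no options. It only shows the three outcomes described
above: forward as-is, fragment, or refuse and send an ICMP error back.

```python
from dataclasses import dataclass

IP_HEADER = 20  # assumed IPv4 header size, no options


@dataclass
class Packet:
    size: int  # total IP packet length in bytes
    df: bool   # Don't Fragment bit (ignored when ipv6=True)


def forward(packet, next_link_mtu, ipv6=False):
    """Toy model of what a router does with a packet for the next link."""
    if packet.size <= next_link_mtu:
        return ("forward", [packet.size])
    if packet.df or ipv6:
        # IPv4: ICMP 'fragmentation needed'; IPv6: ICMPv6 'packet too big'.
        # The error tells the sender the MTU of the link that was too small.
        return ("icmp_too_big", next_link_mtu)
    # Legacy IP without DF: the router fragments the packet itself.
    # Every fragment except the last carries a multiple of 8 payload bytes.
    payload = packet.size - IP_HEADER
    chunk = (next_link_mtu - IP_HEADER) // 8 * 8
    sizes = []
    while payload > 0:
        take = min(chunk, payload)
        sizes.append(take + IP_HEADER)
        payload -= take
    return ("fragment", sizes)


print(forward(Packet(1500, df=True), 1492))
print(forward(Packet(1500, df=False), 1492))
```

In this model a full-sized 1500-byte packet without DF turns into one
1492-byte fragment followed by a tiny 28-byte one, which is the kind of
inefficiency that makes hosts prefer to learn the path MTU instead.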

So, you'll notice that the DF bit is set on your outbound packets, and
your machine should *learn* about the lower MTU of the PPPoE link when
your local router tells it that certain packets didn't fit.

However, you'll also notice that the DF bit *isn't* set on the incoming
packets you see from the Cisco side. As far as I can tell, that's purely
because a large number of Cisco admins are morons, and like to filter
ICMP because they don't realise how badly that breaks networks. So the
Cisco server *doesn't* set the DF bit, *doesn't* get hurt when broken
firewalls break incoming ICMP, and doesn't know what the path MTU is
between you and the server. So it can't make optimal use of the network
by fragmenting packets properly at source.

> Not yet tried so far, but at least 700 *is* sufficient to fix the 
> problem. Probably I'll try just the 1500 minus the overhead of PPPoE, 
> minus the overhead of VPN and bisect it from both ends. This will take
> a while.

You could try just leaving it at the default (1408?) and then using
'ping -s xxx' to try packets of different lengths. Watch what's
happening with tcpdump, and see when the fragmentation of the incoming
packets starts to happen.

You could try pings of these sizes both over the VPN and on the real
network. No need to do it with UDP, I suspect.
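
If it helps with picking sizes, here is a rough Python sketch of the
arithmetic and of the bisection you mentioned. Note that 'ping -s' sets
the ICMP payload size, so the packet on the wire is 28 bytes larger
(20-byte IPv4 header without options plus 8-byte ICMP header). The
helper names are hypothetical, and fake_probe is only a stand-in for
"send a ping of this size and check in tcpdump whether it arrived
unfragmented"; it just pretends the limit is 1492.

```python
IP_HEADER = 20    # IPv4 header, assuming no options
ICMP_HEADER = 8   # ICMP echo header


def ping_payload(ip_packet_size):
    """Value to pass to 'ping -s' for a given total IP packet size."""
    return ip_packet_size - IP_HEADER - ICMP_HEADER


def largest_passing(probe, lo, hi):
    """Largest size in [lo, hi] for which probe(size) is True.

    Assumes probe(lo) is True and that once a size fails, all larger
    sizes fail too.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def fake_probe(size):
    # Stand-in for a real measurement; pretends the path MTU is 1492.
    return size <= 1492


best = largest_passing(fake_probe, 700, 1500)
print(best, ping_payload(best))
```

Starting from the 700 you already know works and the 1500 upper bound,
this takes about ten probes rather than a long manual search.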

-- 
dwmw2
