From 67469601406c12ced3db9956aeb0ef0854e2952f Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 24 Apr 2012 07:37:38 +0000 Subject: ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Quoting Tore Anderson from : https://bugzilla.kernel.org/show_bug.cgi?id=42572 When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment size does not take into account the size of the IPv6 Fragmentation header that needs to be included in outbound packets, causing every transmitted TCP segment to be fragmented across two IPv6 packets, the latter of which will only contain 8 bytes of actual payload. RTAX_FEATURE_ALLFRAG is typically set on a route in response to receving a ICMPv6 Packet Too Big message indicating a Path MTU of less than 1280 bytes. 1280 bytes is the minimum IPv6 MTU, however ICMPv6 PTBs with MTU < 1280 are still valid, in particular when an IPv6 packet is sent to an IPv4 destination through a stateless translator. Any ICMPv4 Need To Fragment packets originated from the IPv4 part of the path will be translated to ICMPv6 PTB which may then indicate an MTU of less than 1280. The Linux kernel refuses to reduce the effective MTU to anything below 1280 bytes, instead it sets it to exactly 1280 bytes, and RTAX_FEATURE_ALLFRAG is also set. However, the TCP segment size appears to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header), instead of 1232 (additionally taking into account the 8 bytes required by the IPv6 Fragmentation extension header). This in turn results in rather inefficient transmission, as every transmitted TCP segment now is split in two fragments containing 1232+8 bytes of payload. After this patch, all the outgoing packets that includes a Fragmentation header all are "atomic" or "non-fragmented" fragments, i.e., they both have Offset=0 and More Fragments=0. With help from David S. Miller Reported-by: Tore Anderson Signed-off-by: Eric Dumazet Cc: Maciej Żenczykowski Cc: Tom Herbert Tested-by: Tore Anderson Signed-off-by: David S. Miller --- include/net/inet_connection_sock.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/net/inet_connection_sock.h') diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 46c9e2ccdf02..7d83f90f203f 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -45,6 +45,7 @@ struct inet_connection_sock_af_ops { struct dst_entry *dst); struct inet_peer *(*get_peer)(struct sock *sk, bool *release_it); u16 net_header_len; + u16 net_frag_header_len; u16 sockaddr_len; int (*setsockopt)(struct sock *sk, int level, int optname, char __user *optval, unsigned int optlen); -- cgit v1.2.3