[BUG,REGRESSION?] 3.11.6+,3.12: GbE iface rate drops to few KB/s

Eric Dumazet eric.dumazet at gmail.com
Sun Nov 17 12:41:38 EST 2013


On Sun, 2013-11-17 at 15:19 +0100, Willy Tarreau wrote:

> 
> So it is fairly possible that in your case you can't fill the link if you
> consume too many descriptors. For example, if your server uses TCP_NODELAY
> and sends incomplete segments (which is quite common), it's very easy to
> run out of descriptors before the link is full.

BTW I have a very simple patch for the TCP stack that could help this exact
situation...

The idea is to use TCP Small Queues so that we don't fill the qdisc/TX ring
with very small frames, and give tcp_sendmsg() a better chance to build
complete packets.

Again, for this to work well, the NIC needs to perform TX completion in a
reasonable amount of time...

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3dc0c6c..10456cf 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -624,13 +624,19 @@ static inline void tcp_push(struct sock *sk, int flags, int mss_now,
 {
 	if (tcp_send_head(sk)) {
 		struct tcp_sock *tp = tcp_sk(sk);
+		struct sk_buff *skb = tcp_write_queue_tail(sk);
 
 		if (!(flags & MSG_MORE) || forced_push(tp))
-			tcp_mark_push(tp, tcp_write_queue_tail(sk));
+			tcp_mark_push(tp, skb);
 
 		tcp_mark_urg(tp, flags);
-		__tcp_push_pending_frames(sk, mss_now,
-					  (flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
+		if (flags & MSG_MORE)
+			nonagle = TCP_NAGLE_CORK;
+		if (atomic_read(&sk->sk_wmem_alloc) > 2048) {
+			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
+			nonagle = TCP_NAGLE_CORK;
+		}
+		__tcp_push_pending_frames(sk, mss_now, nonagle);
 	}
 }
 
More information about the linux-arm-kernel mailing list