<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/net/ipv4/tcp_timer.c, branch v3.7.9</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>tcp: Reject invalid ack_seq to Fast Open sockets</title>
<updated>2012-10-23T06:42:56+00:00</updated>
<author>
<name>Jerry Chu</name>
<email>hkchu@google.com</email>
</author>
<published>2012-10-22T11:26:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=37561f68bd527ec39076e32effdc7b1dcdfb17ea'/>
<id>37561f68bd527ec39076e32effdc7b1dcdfb17ea</id>
<content type='text'>
A packet with an invalid ack_seq may cause a TCP Fast Open socket to switch
to the unexpected TCP_CLOSING state, triggering a BUG_ON kernel panic.

When a FIN packet with an invalid ack_seq# arrives at a socket in
the TCP_FIN_WAIT1 state, rather than discarding the packet, the current
code will accept the FIN, causing state transition to TCP_CLOSING.

This may be a small deviation from RFC793, which seems to say that the
packet should be dropped. Unfortunately I did not expect this case for
Fast Open hence it will trigger a BUG_ON panic.

It turns out there is really nothing bad about a TFO socket going into
TCP_CLOSING state so I could just remove the BUG_ON statements. But after
some thought I think it's better to treat this case like TCP_SYN_RECV
and return a RST to the confused peer who caused the unacceptable ack_seq
to be generated in the first place.

Signed-off-by: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A packet with an invalid ack_seq may cause a TCP Fast Open socket to switch
to the unexpected TCP_CLOSING state, triggering a BUG_ON kernel panic.

When a FIN packet with an invalid ack_seq# arrives at a socket in
the TCP_FIN_WAIT1 state, rather than discarding the packet, the current
code will accept the FIN, causing state transition to TCP_CLOSING.

This may be a small deviation from RFC793, which seems to say that the
packet should be dropped. Unfortunately I did not expect this case for
Fast Open hence it will trigger a BUG_ON panic.

It turns out there is really nothing bad about a TFO socket going into
TCP_CLOSING state so I could just remove the BUG_ON statements. But after
some thought I think it's better to treat this case like TCP_SYN_RECV
and return a RST to the confused peer who caused the unacceptable ack_seq
to be generated in the first place.

Signed-off-by: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: TCP Fast Open Server - support TFO listeners</title>
<updated>2012-09-01T00:02:19+00:00</updated>
<author>
<name>Jerry Chu</name>
<email>hkchu@google.com</email>
</author>
<published>2012-08-31T12:29:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8336886f786fdacbc19b719c1f7ea91eb70706d4'/>
<id>8336886f786fdacbc19b719c1f7ea91eb70706d4</id>
<content type='text'>
This patch builds on top of the previous patch to add the support
for TFO listeners. This includes -

1. allocating, properly initializing, and managing the per listener
fastopen_queue structure when TFO is enabled

2. changes to the inet_csk_accept code to support TFO. E.g., the
request_sock can no longer be freed upon accept(), not until 3WHS
finishes

3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
if it's a TFO socket

4. properly closing a TFO listener, and a TFO socket before 3WHS
finishes

5. supporting TCP_FASTOPEN socket option

6. modifying tcp_check_req() so it can be used to check a TFO socket as
well as a request_sock

7. supporting TCP's TFO cookie option

8. adding a new SYN-ACK retransmit handler to use the timer directly
off the TFO socket rather than the listener socket. Note that TFO
server side will not retransmit anything other than SYN-ACK until
the 3WHS is completed.

The patch also contains an important function
"reqsk_fastopen_remove()" to manage the somewhat complex relation
between a listener, its request_sock, and the corresponding child
socket. See the comment above the function for the detail.

Signed-off-by: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch builds on top of the previous patch to add the support
for TFO listeners. This includes -

1. allocating, properly initializing, and managing the per listener
fastopen_queue structure when TFO is enabled

2. changes to the inet_csk_accept code to support TFO. E.g., the
request_sock can no longer be freed upon accept(), not until 3WHS
finishes

3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
if it's a TFO socket

4. properly closing a TFO listener, and a TFO socket before 3WHS
finishes

5. supporting TCP_FASTOPEN socket option

6. modifying tcp_check_req() so it can be used to check a TFO socket as
well as a request_sock

7. supporting TCP's TFO cookie option

8. adding a new SYN-ACK retransmit handler to use the timer directly
off the TFO socket rather than the listener socket. Note that TFO
server side will not retransmit anything other than SYN-ACK until
the 3WHS is completed.

The patch also contains an important function
"reqsk_fastopen_remove()" to manage the somewhat complex relation
between a listener, its request_sock, and the corresponding child
socket. See the comment above the function for the detail.

Signed-off-by: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: fix possible socket refcount problem</title>
<updated>2012-08-21T21:42:23+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-08-20T00:22:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=144d56e91044181ec0ef67aeca91e9a8b5718348'/>
<id>144d56e91044181ec0ef67aeca91e9a8b5718348</id>
<content type='text'>
Commit 6f458dfb40 (tcp: improve latencies of timer triggered events)
introduced a bug leading to the following trace:

[ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.131726]
[ 2866.132188] =========================
[ 2866.132281] [ BUG: held lock freed! ]
[ 2866.132281] 3.6.0-rc1+ #622 Not tainted
[ 2866.132281] -------------------------
[ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there!
[ 2866.132281]  (sk_lock-AF_INET-RPC){+.+...}, at: [&lt;ffffffff81903619&gt;] tcp_sendmsg+0x29/0xcc6
[ 2866.132281] 4 locks held by kworker/0:1/652:
[ 2866.132281]  #0:  (rpciod){.+.+.+}, at: [&lt;ffffffff81083567&gt;] process_one_work+0x1de/0x47f
[ 2866.132281]  #1:  ((&amp;task-&gt;u.tk_work)){+.+.+.}, at: [&lt;ffffffff81083567&gt;] process_one_work+0x1de/0x47f
[ 2866.132281]  #2:  (sk_lock-AF_INET-RPC){+.+...}, at: [&lt;ffffffff81903619&gt;] tcp_sendmsg+0x29/0xcc6
[ 2866.132281]  #3:  (&amp;icsk-&gt;icsk_retransmit_timer){+.-...}, at: [&lt;ffffffff81078017&gt;] run_timer_softirq+0x1ad/0x35f
[ 2866.132281]
[ 2866.132281] stack backtrace:
[ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622
[ 2866.132281] Call Trace:
[ 2866.132281]  &lt;IRQ&gt;  [&lt;ffffffff810bc527&gt;] debug_check_no_locks_freed+0x112/0x159
[ 2866.132281]  [&lt;ffffffff818a0839&gt;] ? __sk_free+0xfd/0x114
[ 2866.132281]  [&lt;ffffffff811549fa&gt;] kmem_cache_free+0x6b/0x13a
[ 2866.132281]  [&lt;ffffffff818a0839&gt;] __sk_free+0xfd/0x114
[ 2866.132281]  [&lt;ffffffff818a08c0&gt;] sk_free+0x1c/0x1e
[ 2866.132281]  [&lt;ffffffff81911e1c&gt;] tcp_write_timer+0x51/0x56
[ 2866.132281]  [&lt;ffffffff81078082&gt;] run_timer_softirq+0x218/0x35f
[ 2866.132281]  [&lt;ffffffff81078017&gt;] ? run_timer_softirq+0x1ad/0x35f
[ 2866.132281]  [&lt;ffffffff810f5831&gt;] ? rb_commit+0x58/0x85
[ 2866.132281]  [&lt;ffffffff81911dcb&gt;] ? tcp_write_timer_handler+0x148/0x148
[ 2866.132281]  [&lt;ffffffff81070bd6&gt;] __do_softirq+0xcb/0x1f9
[ 2866.132281]  [&lt;ffffffff81a0a00c&gt;] ? _raw_spin_unlock+0x29/0x2e
[ 2866.132281]  [&lt;ffffffff81a1227c&gt;] call_softirq+0x1c/0x30
[ 2866.132281]  [&lt;ffffffff81039f38&gt;] do_softirq+0x4a/0xa6
[ 2866.132281]  [&lt;ffffffff81070f2b&gt;] irq_exit+0x51/0xad
[ 2866.132281]  [&lt;ffffffff81a129cd&gt;] do_IRQ+0x9d/0xb4
[ 2866.132281]  [&lt;ffffffff81a0a3ef&gt;] common_interrupt+0x6f/0x6f
[ 2866.132281]  &lt;EOI&gt;  [&lt;ffffffff8109d006&gt;] ? sched_clock_cpu+0x58/0xd1
[ 2866.132281]  [&lt;ffffffff81a0a172&gt;] ? _raw_spin_unlock_irqrestore+0x4c/0x56
[ 2866.132281]  [&lt;ffffffff81078692&gt;] mod_timer+0x178/0x1a9
[ 2866.132281]  [&lt;ffffffff818a00aa&gt;] sk_reset_timer+0x19/0x26
[ 2866.132281]  [&lt;ffffffff8190b2cc&gt;] tcp_rearm_rto+0x99/0xa4
[ 2866.132281]  [&lt;ffffffff8190dfba&gt;] tcp_event_new_data_sent+0x6e/0x70
[ 2866.132281]  [&lt;ffffffff8190f7ea&gt;] tcp_write_xmit+0x7de/0x8e4
[ 2866.132281]  [&lt;ffffffff818a565d&gt;] ? __alloc_skb+0xa0/0x1a1
[ 2866.132281]  [&lt;ffffffff8190f952&gt;] __tcp_push_pending_frames+0x2e/0x8a
[ 2866.132281]  [&lt;ffffffff81904122&gt;] tcp_sendmsg+0xb32/0xcc6
[ 2866.132281]  [&lt;ffffffff819229c2&gt;] inet_sendmsg+0xaa/0xd5
[ 2866.132281]  [&lt;ffffffff81922918&gt;] ? inet_autobind+0x5f/0x5f
[ 2866.132281]  [&lt;ffffffff810ee7f1&gt;] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [&lt;ffffffff8189adab&gt;] sock_sendmsg+0xa3/0xc4
[ 2866.132281]  [&lt;ffffffff810f5de6&gt;] ? rb_reserve_next_event+0x26f/0x2d5
[ 2866.132281]  [&lt;ffffffff8103e6a9&gt;] ? native_sched_clock+0x29/0x6f
[ 2866.132281]  [&lt;ffffffff8103e6f8&gt;] ? sched_clock+0x9/0xd
[ 2866.132281]  [&lt;ffffffff810ee7f1&gt;] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [&lt;ffffffff8189ae03&gt;] kernel_sendmsg+0x37/0x43
[ 2866.132281]  [&lt;ffffffff8199ce49&gt;] xs_send_kvec+0x77/0x80
[ 2866.132281]  [&lt;ffffffff8199cec1&gt;] xs_sendpages+0x6f/0x1a0
[ 2866.132281]  [&lt;ffffffff8107826d&gt;] ? try_to_del_timer_sync+0x55/0x61
[ 2866.132281]  [&lt;ffffffff8199d0d2&gt;] xs_tcp_send_request+0x55/0xf1
[ 2866.132281]  [&lt;ffffffff8199bb90&gt;] xprt_transmit+0x89/0x1db
[ 2866.132281]  [&lt;ffffffff81999bcd&gt;] ? call_connect+0x3c/0x3c
[ 2866.132281]  [&lt;ffffffff81999d92&gt;] call_transmit+0x1c5/0x20e
[ 2866.132281]  [&lt;ffffffff819a0d55&gt;] __rpc_execute+0x6f/0x225
[ 2866.132281]  [&lt;ffffffff81999bcd&gt;] ? call_connect+0x3c/0x3c
[ 2866.132281]  [&lt;ffffffff819a0f33&gt;] rpc_async_schedule+0x28/0x34
[ 2866.132281]  [&lt;ffffffff810835d6&gt;] process_one_work+0x24d/0x47f
[ 2866.132281]  [&lt;ffffffff81083567&gt;] ? process_one_work+0x1de/0x47f
[ 2866.132281]  [&lt;ffffffff819a0f0b&gt;] ? __rpc_execute+0x225/0x225
[ 2866.132281]  [&lt;ffffffff81083a6d&gt;] worker_thread+0x236/0x317
[ 2866.132281]  [&lt;ffffffff81083837&gt;] ? process_scheduled_works+0x2f/0x2f
[ 2866.132281]  [&lt;ffffffff8108b7b8&gt;] kthread+0x9a/0xa2
[ 2866.132281]  [&lt;ffffffff81a12184&gt;] kernel_thread_helper+0x4/0x10
[ 2866.132281]  [&lt;ffffffff81a0a4b0&gt;] ? retint_restore_args+0x13/0x13
[ 2866.132281]  [&lt;ffffffff8108b71e&gt;] ? __init_kthread_worker+0x5a/0x5a
[ 2866.132281]  [&lt;ffffffff81a12180&gt;] ? gs_change+0x13/0x13
[ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.309689] =============================================================================
[ 2866.310254] BUG TCP (Not tainted): Object already free
[ 2866.310254] -----------------------------------------------------------------------------
[ 2866.310254]

The bug comes from the fact that the timer set in sk_reset_timer() can run
before we actually do the sock_hold(). The socket refcount reaches zero and
we free the socket too soon.

The timer handler is not allowed to reduce the socket refcnt if the socket
is owned by the user, or we need to change the sk_reset_timer()
implementation.

We should take a reference on the socket in case the TCP_WRITE_TIMER_DEFERRED
or TCP_DELACK_TIMER_DEFERRED bit is set in tsq_flags.

Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED
was used instead of TCP_DELACK_TIMER_DEFERRED.

For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED,
even if not fired from a timer.
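
As a rough illustration of the reference counting described above, here is a
toy Python model (not kernel code). The Sock class, timer_fires() and
release_cb() are hypothetical stand-ins for the kernel socket, the timer
handlers and tcp_release_cb(); only the flag name is taken from the text.

```python
# Toy model of the refcount rule; every name here is illustrative, not a kernel API.
class Sock:
    def __init__(self):
        self.refcnt = 1             # reference held by the socket's owner
        self.owned_by_user = False
        self.deferred = set()       # stands in for the tsq_flags bits
        self.freed = False

    def hold(self):
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True


def timer_fires(sk, flag, handler):
    """Timer path: the timer drops one reference when it is done."""
    if sk.owned_by_user:
        # Defer the work and take a reference for the release callback,
        # so the put() below cannot lower the count on its own.
        if flag not in sk.deferred:
            sk.deferred.add(flag)
            sk.hold()
    else:
        handler(sk)
    sk.put()


def release_cb(sk, handlers):
    """Runs when the user releases the socket: run deferred work, drop its reference."""
    for flag in sorted(sk.deferred):
        handlers[flag](sk)
        sk.put()
    sk.deferred.clear()


# Scenario from the report: the timer fires before the arming path's hold().
sk = Sock()
sk.owned_by_user = True
ran = []
handlers = {"TCP_WRITE_TIMER_DEFERRED": lambda s: ran.append("write")}
timer_fires(sk, "TCP_WRITE_TIMER_DEFERRED", handlers["TCP_WRITE_TIMER_DEFERRED"])
sk.hold()                    # the late hold from the arming path
sk.owned_by_user = False
release_cb(sk, handlers)
print(sk.refcnt, sk.freed, ran)
```

In this model a timer that fires while the socket is owned by the user takes
and drops one reference, so the count never falls because of the timer alone.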

Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Tested-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 6f458dfb40 (tcp: improve latencies of timer triggered events)
introduced a bug leading to the following trace:

[ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.131726]
[ 2866.132188] =========================
[ 2866.132281] [ BUG: held lock freed! ]
[ 2866.132281] 3.6.0-rc1+ #622 Not tainted
[ 2866.132281] -------------------------
[ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there!
[ 2866.132281]  (sk_lock-AF_INET-RPC){+.+...}, at: [&lt;ffffffff81903619&gt;] tcp_sendmsg+0x29/0xcc6
[ 2866.132281] 4 locks held by kworker/0:1/652:
[ 2866.132281]  #0:  (rpciod){.+.+.+}, at: [&lt;ffffffff81083567&gt;] process_one_work+0x1de/0x47f
[ 2866.132281]  #1:  ((&amp;task-&gt;u.tk_work)){+.+.+.}, at: [&lt;ffffffff81083567&gt;] process_one_work+0x1de/0x47f
[ 2866.132281]  #2:  (sk_lock-AF_INET-RPC){+.+...}, at: [&lt;ffffffff81903619&gt;] tcp_sendmsg+0x29/0xcc6
[ 2866.132281]  #3:  (&amp;icsk-&gt;icsk_retransmit_timer){+.-...}, at: [&lt;ffffffff81078017&gt;] run_timer_softirq+0x1ad/0x35f
[ 2866.132281]
[ 2866.132281] stack backtrace:
[ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622
[ 2866.132281] Call Trace:
[ 2866.132281]  &lt;IRQ&gt;  [&lt;ffffffff810bc527&gt;] debug_check_no_locks_freed+0x112/0x159
[ 2866.132281]  [&lt;ffffffff818a0839&gt;] ? __sk_free+0xfd/0x114
[ 2866.132281]  [&lt;ffffffff811549fa&gt;] kmem_cache_free+0x6b/0x13a
[ 2866.132281]  [&lt;ffffffff818a0839&gt;] __sk_free+0xfd/0x114
[ 2866.132281]  [&lt;ffffffff818a08c0&gt;] sk_free+0x1c/0x1e
[ 2866.132281]  [&lt;ffffffff81911e1c&gt;] tcp_write_timer+0x51/0x56
[ 2866.132281]  [&lt;ffffffff81078082&gt;] run_timer_softirq+0x218/0x35f
[ 2866.132281]  [&lt;ffffffff81078017&gt;] ? run_timer_softirq+0x1ad/0x35f
[ 2866.132281]  [&lt;ffffffff810f5831&gt;] ? rb_commit+0x58/0x85
[ 2866.132281]  [&lt;ffffffff81911dcb&gt;] ? tcp_write_timer_handler+0x148/0x148
[ 2866.132281]  [&lt;ffffffff81070bd6&gt;] __do_softirq+0xcb/0x1f9
[ 2866.132281]  [&lt;ffffffff81a0a00c&gt;] ? _raw_spin_unlock+0x29/0x2e
[ 2866.132281]  [&lt;ffffffff81a1227c&gt;] call_softirq+0x1c/0x30
[ 2866.132281]  [&lt;ffffffff81039f38&gt;] do_softirq+0x4a/0xa6
[ 2866.132281]  [&lt;ffffffff81070f2b&gt;] irq_exit+0x51/0xad
[ 2866.132281]  [&lt;ffffffff81a129cd&gt;] do_IRQ+0x9d/0xb4
[ 2866.132281]  [&lt;ffffffff81a0a3ef&gt;] common_interrupt+0x6f/0x6f
[ 2866.132281]  &lt;EOI&gt;  [&lt;ffffffff8109d006&gt;] ? sched_clock_cpu+0x58/0xd1
[ 2866.132281]  [&lt;ffffffff81a0a172&gt;] ? _raw_spin_unlock_irqrestore+0x4c/0x56
[ 2866.132281]  [&lt;ffffffff81078692&gt;] mod_timer+0x178/0x1a9
[ 2866.132281]  [&lt;ffffffff818a00aa&gt;] sk_reset_timer+0x19/0x26
[ 2866.132281]  [&lt;ffffffff8190b2cc&gt;] tcp_rearm_rto+0x99/0xa4
[ 2866.132281]  [&lt;ffffffff8190dfba&gt;] tcp_event_new_data_sent+0x6e/0x70
[ 2866.132281]  [&lt;ffffffff8190f7ea&gt;] tcp_write_xmit+0x7de/0x8e4
[ 2866.132281]  [&lt;ffffffff818a565d&gt;] ? __alloc_skb+0xa0/0x1a1
[ 2866.132281]  [&lt;ffffffff8190f952&gt;] __tcp_push_pending_frames+0x2e/0x8a
[ 2866.132281]  [&lt;ffffffff81904122&gt;] tcp_sendmsg+0xb32/0xcc6
[ 2866.132281]  [&lt;ffffffff819229c2&gt;] inet_sendmsg+0xaa/0xd5
[ 2866.132281]  [&lt;ffffffff81922918&gt;] ? inet_autobind+0x5f/0x5f
[ 2866.132281]  [&lt;ffffffff810ee7f1&gt;] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [&lt;ffffffff8189adab&gt;] sock_sendmsg+0xa3/0xc4
[ 2866.132281]  [&lt;ffffffff810f5de6&gt;] ? rb_reserve_next_event+0x26f/0x2d5
[ 2866.132281]  [&lt;ffffffff8103e6a9&gt;] ? native_sched_clock+0x29/0x6f
[ 2866.132281]  [&lt;ffffffff8103e6f8&gt;] ? sched_clock+0x9/0xd
[ 2866.132281]  [&lt;ffffffff810ee7f1&gt;] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [&lt;ffffffff8189ae03&gt;] kernel_sendmsg+0x37/0x43
[ 2866.132281]  [&lt;ffffffff8199ce49&gt;] xs_send_kvec+0x77/0x80
[ 2866.132281]  [&lt;ffffffff8199cec1&gt;] xs_sendpages+0x6f/0x1a0
[ 2866.132281]  [&lt;ffffffff8107826d&gt;] ? try_to_del_timer_sync+0x55/0x61
[ 2866.132281]  [&lt;ffffffff8199d0d2&gt;] xs_tcp_send_request+0x55/0xf1
[ 2866.132281]  [&lt;ffffffff8199bb90&gt;] xprt_transmit+0x89/0x1db
[ 2866.132281]  [&lt;ffffffff81999bcd&gt;] ? call_connect+0x3c/0x3c
[ 2866.132281]  [&lt;ffffffff81999d92&gt;] call_transmit+0x1c5/0x20e
[ 2866.132281]  [&lt;ffffffff819a0d55&gt;] __rpc_execute+0x6f/0x225
[ 2866.132281]  [&lt;ffffffff81999bcd&gt;] ? call_connect+0x3c/0x3c
[ 2866.132281]  [&lt;ffffffff819a0f33&gt;] rpc_async_schedule+0x28/0x34
[ 2866.132281]  [&lt;ffffffff810835d6&gt;] process_one_work+0x24d/0x47f
[ 2866.132281]  [&lt;ffffffff81083567&gt;] ? process_one_work+0x1de/0x47f
[ 2866.132281]  [&lt;ffffffff819a0f0b&gt;] ? __rpc_execute+0x225/0x225
[ 2866.132281]  [&lt;ffffffff81083a6d&gt;] worker_thread+0x236/0x317
[ 2866.132281]  [&lt;ffffffff81083837&gt;] ? process_scheduled_works+0x2f/0x2f
[ 2866.132281]  [&lt;ffffffff8108b7b8&gt;] kthread+0x9a/0xa2
[ 2866.132281]  [&lt;ffffffff81a12184&gt;] kernel_thread_helper+0x4/0x10
[ 2866.132281]  [&lt;ffffffff81a0a4b0&gt;] ? retint_restore_args+0x13/0x13
[ 2866.132281]  [&lt;ffffffff8108b71e&gt;] ? __init_kthread_worker+0x5a/0x5a
[ 2866.132281]  [&lt;ffffffff81a12180&gt;] ? gs_change+0x13/0x13
[ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.309689] =============================================================================
[ 2866.310254] BUG TCP (Not tainted): Object already free
[ 2866.310254] -----------------------------------------------------------------------------
[ 2866.310254]

The bug comes from the fact that the timer set in sk_reset_timer() can run
before we actually do the sock_hold(). The socket refcount reaches zero and
we free the socket too soon.

The timer handler is not allowed to reduce the socket refcnt if the socket
is owned by the user, or we need to change the sk_reset_timer()
implementation.

We should take a reference on the socket in case the TCP_WRITE_TIMER_DEFERRED
or TCP_DELACK_TIMER_DEFERRED bit is set in tsq_flags.

Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED
was used instead of TCP_DELACK_TIMER_DEFERRED.

For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED,
even if not fired from a timer.

Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Tested-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: improve latencies of timer triggered events</title>
<updated>2012-07-20T17:59:41+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-07-20T05:45:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6f458dfb409272082c9bfa412f77ff2fc21c626f'/>
<id>6f458dfb409272082c9bfa412f77ff2fc21c626f</id>
<content type='text'>
The modern TCP stack depends heavily on tcp_write_timer() having a small
latency, but the current implementation doesn't exactly meet
expectations.

When a timer fires but finds the socket is owned by the user, it rearms
itself for an additional delay, hoping the next run will be more
successful.

tcp_write_timer(), for example, uses a 50ms delay for the next try, which
defeats many attempts to get predictable TCP behavior in terms of
latency.

Use the recently introduced tcp_release_cb(), so that the user owning
the socket will call various handlers right before socket release.

This will permit us to post a followup patch to address the
tcp_tso_should_defer() syndrome (some deferred packets have to wait
RTO timer to be transmitted, while cwnd should allow us to send them
sooner)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Nandita Dukkipati &lt;nanditad@google.com&gt;
Cc: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: John Heffner &lt;johnwheffner@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The modern TCP stack depends heavily on tcp_write_timer() having a small
latency, but the current implementation doesn't exactly meet
expectations.

When a timer fires but finds the socket is owned by the user, it rearms
itself for an additional delay, hoping the next run will be more
successful.

tcp_write_timer(), for example, uses a 50ms delay for the next try, which
defeats many attempts to get predictable TCP behavior in terms of
latency.

Use the recently introduced tcp_release_cb(), so that the user owning
the socket will call various handlers right before socket release.

This will permit us to post a followup patch to address the
tcp_tso_should_defer() syndrome (some deferred packets have to wait
RTO timer to be transmitted, while cwnd should allow us to send them
sooner)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Nandita Dukkipati &lt;nanditad@google.com&gt;
Cc: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: John Heffner &lt;johnwheffner@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: early retransmit: delayed fast retransmit</title>
<updated>2012-05-03T00:56:10+00:00</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2012-05-02T13:30:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=750ea2bafa55aaed208b2583470ecd7122225634'/>
<id>750ea2bafa55aaed208b2583470ecd7122225634</id>
<content type='text'>
Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2).
Delays the fast retransmit by an interval of RTT/4. We borrow the
RTO timer to implement the delay. If we receive another ACK or send
a new packet, the timer is cancelled and restored to original RTO
value offset by time elapsed.  When the delayed-ER timer fires,
we enter fast recovery and perform fast retransmit.
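
The timer arithmetic above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function names are
hypothetical, times are in milliseconds, and flooring the restored value at
zero is an assumption rather than something stated in this message.

```python
def delayed_er_timeout(rtt_ms):
    """Delay applied to the fast retransmit: a quarter of the RTT."""
    return rtt_ms / 4.0


def restored_rto(rto_ms, elapsed_ms):
    """Time left on the original RTO once the delayed-ER timer is cancelled.

    The original RTO is offset by the time already elapsed; the floor at
    zero is an assumption made for this sketch.
    """
    return max(rto_ms - elapsed_ms, 0.0)


# Example: RTT of 100 ms, RTO of 300 ms, ACK arrives 10 ms into the delay.
print(delayed_er_timeout(100.0))
print(restored_rto(300.0, 10.0))
```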

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2).
Delays the fast retransmit by an interval of RTT/4. We borrow the
RTO timer to implement the delay. If we receive another ACK or send
a new packet, the timer is cancelled and restored to original RTO
value offset by time elapsed.  When the delayed-ER timer fires,
we enter fast recovery and perform fast retransmit.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: ipv4: Standardize prefixes for message logging</title>
<updated>2012-03-13T00:05:21+00:00</updated>
<author>
<name>Joe Perches</name>
<email>joe@perches.com</email>
</author>
<published>2012-03-12T07:03:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=afd465030acb4098abcb6b965a5aebc7ea2209e0'/>
<id>afd465030acb4098abcb6b965a5aebc7ea2209e0</id>
<content type='text'>
Add #define pr_fmt(fmt) as appropriate.

Add "IPv4: ", "TCP: ", and "IPsec: " to appropriate files.
Standardize on "UDPLite: " for appropriate uses.
Some prefixes were previously "UDPLITE: " and "UDP-Lite: ".

Add KBUILD_MODNAME ": " to icmp and gre.
Remove embedded prefixes as appropriate.

Add missing "\n" to pr_info in gre.c.

Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add #define pr_fmt(fmt) as appropriate.

Add "IPv4: ", "TCP: ", and "IPsec: " to appropriate files.
Standardize on "UDPLite: " for appropriate uses.
Some prefixes were previously "UDPLITE: " and "UDP-Lite: ".

Add KBUILD_MODNAME ": " to icmp and gre.
Remove embedded prefixes as appropriate.

Add missing "\n" to pr_info in gre.c.

Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Disambiguate kernel message</title>
<updated>2012-02-01T19:41:50+00:00</updated>
<author>
<name>Arun Sharma</name>
<email>asharma@fb.com</email>
</author>
<published>2012-01-30T22:16:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=efcdbf24fd5daa88060869e51ed49f68b7ac8708'/>
<id>efcdbf24fd5daa88060869e51ed49f68b7ac8708</id>
<content type='text'>
Some of our machines were reporting:

TCP: too many of orphaned sockets

even when the number of orphaned sockets was well below the
limit.

We print a different message depending on whether we're out
of TCP memory or there are too many orphaned sockets.
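
A minimal sketch of the idea, in Python rather than kernel C: each condition
is tested separately and gets its own message. The function name, its
parameters and the message strings are all hypothetical and are not the
wording used by the kernel.

```python
def oom_messages(orphans, orphan_limit, mem_used, mem_limit):
    """Return one illustrative message per condition that actually holds."""
    messages = []
    if orphans > orphan_limit:
        messages.append("too many orphaned sockets")
    if mem_used > mem_limit:
        messages.append("out of TCP memory")
    return messages


# Few orphans but memory exhausted: only the memory message is reported.
print(oom_messages(orphans=10, orphan_limit=100, mem_used=500, mem_limit=400))
```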

Also move the check out of line and clean up the messages
that were printed.

Signed-off-by: Arun Sharma &lt;asharma@fb.com&gt;
Suggested-by: Mohan Srinivasan &lt;mohan@fb.com&gt;
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some of our machines were reporting:

TCP: too many of orphaned sockets

even when the number of orphaned sockets was well below the
limit.

We print a different message depending on whether we're out
of TCP memory or there are too many orphaned sockets.

Also move the check out of line and clean up the messages
that were printed.

Signed-off-by: Arun Sharma &lt;asharma@fb.com&gt;
Suggested-by: Mohan Srinivasan &lt;mohan@fb.com&gt;
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: fix assignment of 0/1 to bool variables.</title>
<updated>2011-12-20T03:27:29+00:00</updated>
<author>
<name>Rusty Russell</name>
<email>rusty@rustcorp.com.au</email>
</author>
<published>2011-12-19T13:56:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3db1cd5c05f35fb43eb134df6f321de4e63141f2'/>
<id>3db1cd5c05f35fb43eb134df6f321de4e63141f2</id>
<content type='text'>
DaveM said:
   Please, this kind of stuff rots forever and not using bool properly
   drives me crazy.

Joe Perches &lt;joe@perches.com&gt; gave me the spatch script:

	@@
	bool b;
	@@
	-b = 0
	+b = false
	@@
	bool b;
	@@
	-b = 1
	+b = true

I merely installed coccinelle, read the documentation and took credit.

Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
DaveM said:
   Please, this kind of stuff rots forever and not using bool properly
   drives me crazy.

Joe Perches &lt;joe@perches.com&gt; gave me the spatch script:

	@@
	bool b;
	@@
	-b = 0
	+b = false
	@@
	bool b;
	@@
	-b = 1
	+b = true

I merely installed coccinelle, read the documentation and took credit.

Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>foundations of per-cgroup memory pressure controlling.</title>
<updated>2011-12-13T00:04:10+00:00</updated>
<author>
<name>Glauber Costa</name>
<email>glommer@parallels.com</email>
</author>
<published>2011-12-11T21:47:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=180d8cd942ce336b2c869d324855c40c5db478ad'/>
<id>180d8cd942ce336b2c869d324855c40c5db478ad</id>
<content type='text'>
This patch replaces all uses of the struct sock fields memory_pressure,
memory_allocated, sockets_allocated, and sysctl_mem with accessor
macros. Those macros can receive either a socket argument or a mem_cgroup
argument, depending on the context they live in.

Since we're only doing a macro wrapping here, no performance impact at all is
expected in the case where cgroups are disabled.

Signed-off-by: Glauber Costa &lt;glommer@parallels.com&gt;
Reviewed-by: Hiroyouki Kamezawa &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
CC: David S. Miller &lt;davem@davemloft.net&gt;
CC: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
CC: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch replaces all uses of the struct sock fields memory_pressure,
memory_allocated, sockets_allocated, and sysctl_mem with accessor
macros. Those macros can receive either a socket argument or a mem_cgroup
argument, depending on the context they live in.

Since we're only doing a macro wrapping here, no performance impact at all is
expected in the case where cgroups are disabled.

Signed-off-by: Glauber Costa &lt;glommer@parallels.com&gt;
Reviewed-by: Hiroyouki Kamezawa &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
CC: David S. Miller &lt;davem@davemloft.net&gt;
CC: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
CC: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: use IS_ENABLED(CONFIG_IPV6)</title>
<updated>2011-12-11T23:25:16+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-12-10T09:48:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dfd56b8b38fff3586f36232db58e1e9f7885a605'/>
<id>dfd56b8b38fff3586f36232db58e1e9f7885a605</id>
<content type='text'>
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
