linux-toradex.git/net/core/sock.c, branch v3.13.7

net: use __GFP_NORETRY for high order allocations

2014-03-07T06:06:15+00:00

[ Upstream commit ed98df3361f059db42786c830ea96e2d18b8d4db ]

sock_alloc_send_pskb() & sk_page_frag_refill()
have a loop that tries high order allocations to build
skbs with a low number of fragments, as this increases performance.

The problem is that under memory pressure or fragmentation, this can
trigger the OOM killer, while the intent was only to try the high order
allocations and then fall back to order-0 allocations.

We had various reports of unexpected regressions.

According to David, setting __GFP_NORETRY should be fine,
as asynchronous compaction is still enabled, and this
will prevent the OOM killer from kicking in, as in:

CFSClientEventm invoked oom-killer: gfp_mask=0x42d0, order=3, oom_adj=0,
oom_score_adj=0, oom_score_badness=2 (enabled),memcg_scoring=disabled
CFSClientEventm

Call Trace:
 [] dump_header+0xe1/0x23e
 [] oom_kill_process+0x6a/0x323
 [] out_of_memory+0x4b3/0x50d
 [] __alloc_pages_may_oom+0xa2/0xc7
 [] __alloc_pages_nodemask+0x1002/0x17f0
 [] alloc_pages_current+0x103/0x2b0
 [] sk_page_frag_refill+0x8f/0x160
 [] tcp_sendmsg+0x560/0xee0
 [] inet_sendmsg+0x67/0x100
 [] __sock_sendmsg_nosec+0x6c/0x90
 [] sock_sendmsg+0xc5/0xf0
 [] __sys_sendmsg+0x136/0x430
 [] sys_sendmsg+0x88/0x110
 [] system_call_fastpath+0x16/0x1b
Out of Memory: Kill process 2856 (bash) score 9999 or sacrifice child
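
The fallback described above can be sketched in userspace C. This is only
an illustration, not the kernel code: sketch_alloc_pages(), SKETCH_NORETRY,
max_available_order and heavy_reclaim_calls are hypothetical stand-ins for
alloc_pages(), __GFP_NORETRY and the retry/OOM path. The idea shown is that
every order above 0 is attempted with the no-retry flag, so a failed high
order attempt falls through to the next order instead of escalating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_ORDER 3      /* 2^3 = 8 pages, the order in the OOM report */
#define SKETCH_NORETRY   0x1u   /* hypothetical stand-in for __GFP_NORETRY */

/* Hypothetical allocator state: pretend memory is fragmented so that only
 * blocks up to max_available_order can be found cheaply. */
static int max_available_order = 0;
static int heavy_reclaim_calls = 0;

static void *sketch_alloc_pages(unsigned int flags, int order)
{
    assert(order >= 0 && order <= SKETCH_MAX_ORDER);
    if (order > max_available_order) {
        /* Without the no-retry flag, a failure would escalate
         * (modelled here as a counter instead of an OOM kill). */
        if (!(flags & SKETCH_NORETRY))
            heavy_reclaim_calls++;
        return NULL;
    }
    return malloc((size_t)SKETCH_PAGE_SIZE << order);
}

/* Try the largest order first; only order 0 is allowed to retry hard. */
static void *sketch_refill(int *order_out)
{
    for (int order = SKETCH_MAX_ORDER; order >= 0; order--) {
        unsigned int flags = order ? SKETCH_NORETRY : 0;
        void *block = sketch_alloc_pages(flags, order);

        if (block) {
            *order_out = order;
            return block;
        }
    }
    return NULL;
}
```

With max_available_order set to 0, sketch_refill() ends up with an order-0
block and heavy_reclaim_calls stays at 0, which is the behaviour the patch
is after.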

Signed-off-by: Eric Dumazet 
Acked-by: David Rientjes 
Acked-by: "Eric W. Biederman" 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: unix: allow set_peek_off to fail

2013-12-11T02:45:15+00:00

unix_dgram_recvmsg() holds the readlock of the socket until the recv
is complete.

At the same time, we may try to setsockopt(SO_PEEK_OFF), which will hang until
unix_dgram_recvmsg() completes (which can take a while) without allowing
us to break out of it, triggering a hung task spew.

Instead, allow set_peek_off to fail, so that userspace will not hang.

Signed-off-by: Sasha Levin 
Acked-by: Pavel Emelyanov 
Signed-off-by: David S. Miller

net: remove function sk_reset_txq()

2013-10-22T18:00:21+00:00

sk_reset_txq() just calls sk_tx_queue_reset(), and it is used
only in sock.h, by dst_negative_advice().
Let dst_negative_advice() call sk_tx_queue_reset() directly so we
can remove the unneeded sk_reset_txq().

Signed-off-by: ZHAO Gang 
Signed-off-by: David S. Miller

net: refactor sk_page_frag_refill()

2013-10-18T04:08:51+00:00

While working on a new virtio_net allocation strategy to increase
the payload/truesize ratio, we found that refactoring sk_page_frag_refill()
was needed.

This patch splits sk_page_frag_refill() into two parts, adding
skb_page_frag_refill() which can be used without a socket.

While we are at it, add a minimum frag size of 32 for
sk_page_frag_refill().

Michael will either use netdev_alloc_frag() from softirq context,
or skb_page_frag_refill() from process context in refill_work()
(GFP_KERNEL allocations).

Signed-off-by: Eric Dumazet 
Cc: Michael Dalton 
Signed-off-by: David S. Miller

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

2013-10-09T03:07:53+00:00

Conflicts:
	include/linux/netdevice.h
	net/core/sock.c

Trivial merge issues.

Removal of "extern" from function declarations in netdevice.h
at the same time as "const" was added to an argument.

Two parallel line additions in net/core/sock.c.

Signed-off-by: David S. Miller

pkt_sched: fq: fix non TCP flows pacing

2013-10-09T01:54:01+00:00

Steinar reported FQ pacing was not working for UDP flows.

It looks like the initial sk->sk_pacing_rate value of 0 was
the wrong choice. We should init it to ~0U (unlimited).

Then, TCA_FQ_FLOW_DEFAULT_RATE should be removed because it makes
no real sense. The default rate is really unlimited, and we
need to avoid a zero divide.
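
To illustrate why ~0U is a safer "unlimited" sentinel than 0, here is a
small userspace sketch of a per-packet pacing delay. It is not the fq
code; sketch_pacing_delay_ns() and the SKETCH_* names are hypothetical.
The delay is assumed to be packet length divided by rate, so a rate of 0
would divide by zero, while ~0U can be tested for and skipped.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC   1000000000ull
#define SKETCH_RATE_UNLIMITED UINT32_MAX   /* same bit pattern as ~0U */

/* Nanoseconds to wait before sending the next packet of a flow.
 * pkt_len is in bytes, rate is in bytes per second. */
static uint64_t sketch_pacing_delay_ns(uint32_t pkt_len, uint32_t rate)
{
    if (rate == SKETCH_RATE_UNLIMITED)
        return 0;               /* unlimited: no pacing, no division */

    assert(rate != 0);          /* 0 is never a valid rate in this sketch */
    return (uint64_t)pkt_len * SKETCH_NSEC_PER_SEC / rate;
}
```

For example, a 1500 byte packet at 1000000 bytes per second yields a delay
of 1500000 ns, and the same packet at SKETCH_RATE_UNLIMITED yields 0.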

Reported-by: Steinar H. Gunderson 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

net: introduce SO_MAX_PACING_RATE

2013-09-28T22:35:41+00:00

As mentioned in commit afe4fd062416b ("pkt_sched: fq: Fair Queue packet
scheduler"), this patch adds a new socket option.

SO_MAX_PACING_RATE offers the application the ability to cap the
rate computed by transport layer. Value is in bytes per second.

u32 val = 1000000;
setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));

To be effectively paced, a flow must use FQ packet scheduler.

Note that a packet scheduler takes into account the headers for its
computations. The effective payload rate depends on MSS and retransmits
if any.
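
As a rough worked example of the header overhead mentioned above, the
sketch below estimates the payload rate for a given pacing rate. The
function name and the overhead figure are assumptions made for
illustration: 66 bytes corresponds to Ethernet (14) + IPv4 (20) + TCP (20)
+ TCP timestamps (12), and retransmits are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate payload bytes per second when each packet carries mss bytes of
 * payload plus header_bytes of headers and the scheduler paces the whole
 * packet at pacing_rate bytes per second. Retransmits are not modelled. */
static uint64_t sketch_payload_rate(uint64_t pacing_rate, uint32_t mss,
                                    uint32_t header_bytes)
{
    assert(mss > 0);
    return pacing_rate * mss / ((uint64_t)mss + header_bytes);
}
```

With val = 1000000 as in the setsockopt() call above, an MSS of 1448 and
66 bytes of headers, sketch_payload_rate(1000000, 1448, 66) gives 956406
bytes per second of payload.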

I chose to make this pacing rate a SOL_SOCKET option instead of a
TCP one because this can be used by other protocols.

Signed-off-by: Eric Dumazet 
Cc: Steinar H. Gunderson 
Cc: Michael Kerrisk 
Signed-off-by: David S. Miller

net: attempt high order allocations in sock_alloc_send_pskb()

2013-08-10T08:16:44+00:00

Adding paged frags skbs to af_unix sockets introduced a performance
regression on large sends because of additional page allocations, even
if each skb could carry at least 100% more payload than before.

We can instruct sock_alloc_send_pskb() to attempt high order
allocations.

Most of the time, it does a single page allocation instead of 8.

I added an additional parameter to sock_alloc_send_pskb() to
let other users opt in to this new feature in follow-up patches.

Tested:

Before patch :

$ netperf -t STREAM_STREAM
STREAM STREAM TEST
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 2304  212992  212992    10.00    46861.15

After patch :

$ netperf -t STREAM_STREAM
STREAM STREAM TEST
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 2304  212992  212992    10.00    57981.11

Signed-off-by: Eric Dumazet 
Cc: David Rientjes 
Signed-off-by: David S. Miller

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

2013-08-04T04:36:46+00:00

Merge net into net-next to setup some infrastructure Eric
Dumazet needs for usbnet changes.

Signed-off-by: David S. Miller

net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL

2013-08-01T22:11:17+00:00

Eliezer renamed several *ll_poll to *busy_poll, but forgot
CONFIG_NET_LL_RX_POLL, so to avoid confusion, rename it too.

Cc: Eliezer Tamir 
Cc: David S. Miller 
Signed-off-by: Cong Wang 
Signed-off-by: David S. Miller