linux-toradex.git/include/net, branch v3.0.79

ipv6: do not clear pinet6 field

2013-05-19T17:04:46+00:00

[ Upstream commit f77d602124d865c38705df7fa25c03de9c284ad2 ]

We have seen multiple NULL dereferences in __inet6_lookup_established()

After analysis, I found that inet6_sk() could be NULL while the
check for sk_family == AF_INET6 was true.

Bug was added in linux-2.6.29 when RCU lookups were introduced in UDP
and TCP stacks.

Once an IPv6 socket, using SLAB_DESTROY_BY_RCU is inserted in a hash
table, we no longer can clear pinet6 field.

This patch extends logic used in commit fcbdf09d9652c891
("net: fix nulls list corruptions in sk_prot_alloc")

TCP/UDP/UDPLite IPv6 protocols provide their own .clear_sk() method
to make sure we do not clear pinet6 field.

At socket clone phase, we do not really care, as cloning the parent (non
NULL) pinet6 is not adding a fatal race.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: force a dst refcount when prequeue packet

2013-05-19T17:04:42+00:00

[ Upstream commit 093162553c33e9479283e107b4431378271c735d ]

Before escaping RCU protected section and adding packet into
prequeue, make sure the dst is refcounted.

Reported-by: Mike Galbraith 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: fix incorrect credentials passing

2013-05-01T15:56:38+00:00

[ Upstream commit 83f1b4ba917db5dc5a061a44b3403ddb6e783494 ]

Commit 257b5358b32f ("scm: Capture the full credentials of the scm
sender") changed the credentials passing code to pass in the effective
uid/gid instead of the real uid/gid.

Obviously this doesn't matter most of the time (since normally they are
the same), but it results in differences for suid binaries when the wrong
uid/gid ends up being used.

This just undoes that (presumably unintentional) part of the commit.

Reported-by: Andy Lutomirski 
Cc: Eric W. Biederman 
Cc: Serge E. Hallyn 
Cc: David S. Miller 
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds 
Acked-by: "Eric W. Biederman" 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

inet: limit length of fragment queue hash table bucket lists

2013-03-28T19:05:59+00:00

[ Upstream commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055 ]

This patch introduces a constant limit of the fragment queue hash
table bucket list lengths. Currently the limit 128 is choosen somewhat
arbitrary and just ensures that we can fill up the fragment cache with
empty packets up to the default ip_frag_high_thresh limits. It should
just protect from list iteration eating considerable amounts of cpu.

If we reach the maximum length in one hash bucket a warning is printed.
This is implemented on the caller side of inet_frag_find to distinguish
between the different users of inet_fragment.c.

I dropped the out of memory warning in the ipv4 fragment lookup path,
because we already get a warning by the slab allocator.

Cc: Eric Dumazet 
Cc: Jesper Dangaard Brouer 
Signed-off-by: Hannes Frederic Sowa 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ipv4: fix definition of FIB_TABLE_HASHSZ

2013-03-28T19:05:59+00:00

[ Upstream commit 5b9e12dbf92b441b37136ea71dac59f05f2673a9 ]

a long time ago by the commit

  commit 93456b6d7753def8760b423ac6b986eb9d5a4a95
  Author: Denis V. Lunev 
  Date:   Thu Jan 10 03:23:38 2008 -0800

    [IPV4]: Unify access to the routing tables.

the defenition of FIB_HASH_TABLE size has obtained wrong dependency:
it should depend upon CONFIG_IP_MULTIPLE_TABLES (as was in the original
code) but it was depended from CONFIG_IP_ROUTE_MULTIPATH

This patch returns the situation to the original state.

The problem was spotted by Tingwei Liu.

Signed-off-by: Denis V. Lunev 
CC: Tingwei Liu 
CC: Alexey Kuznetsov 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ipv6: use a stronger hash for tcp

2013-02-28T14:32:27+00:00

[ Upstream commit 08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664 ]

It looks like its possible to open thousands of TCP IPv6
sessions on a server, all landing in a single slot of TCP hash
table. Incoming packets have to lookup sockets in a very
long list.

We should hash all bits from foreign IPv6 addresses, using
a salt and hash mix, not a simple XOR.

inet6_ehashfn() can also separately use the ports, instead
of xoring them.

Reported-by: Neal Cardwell 
Signed-off-by: Eric Dumazet 
Cc: Yuchung Cheng 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

rtnetlink: Fix problem with buffer allocation

2013-01-17T16:43:58+00:00

commit 115c9b81928360d769a76c632bae62d15206a94a upstream.

Implement a new netlink attribute type IFLA_EXT_MASK.  The mask
is a 32 bit value that can be used to indicate to the kernel that
certain extended ifinfo values are requested by the user application.
At this time the only mask value defined is RTEXT_FILTER_VF to
indicate that the user wants the ifinfo dump to send information
about the VFs belonging to the interface.

This patch fixes a bug in which certain applications do not have
large enough buffers to accommodate the extra information returned
by the kernel with large numbers of SR-IOV virtual functions.
Those applications will not send the new netlink attribute with
the interface info dump request netlink messages so they will
not get unexpectedly large request buffers returned by the kernel.

Modifies the rtnl_calcit function to traverse the list of net
devices and compute the minimum buffer size that can hold the
info dumps of all matching devices based upon the filter passed
in via the new netlink attribute filter mask.  If no filter
mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.

With this change it is possible to add yet to be defined netlink
attributes to the dump request which should make it fairly extensible
in the future.

Signed-off-by: Greg Rose 
Acked-by: Greg Rose 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.0:
 - Adjust context
 - Drop the change in do_setlink() that reverts commit f18da1456581
   ('net: RTNETLINK adjusting values of min_ifinfo_dump_size'), which
   was never applied here]
Signed-off-by: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman

rtnetlink: Compute and store minimum ifinfo dump size

2013-01-17T16:43:58+00:00

commit c7ac8679bec9397afe8918f788cbcef88c38da54 upstream.

The message size allocated for rtnl ifinfo dumps was limited to
a single page.  This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.

Signed-off-by: Greg Rose 
Signed-off-by: Jeff Kirsher 
Cc: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman

tcp: implement RFC 5961 3.2

2013-01-11T17:03:48+00:00

[ Upstream commit 282f23c6ee343126156dd41218b22ece96d747e3 ]

Implement the RFC 5691 mitigation against Blind
Reset attack using RST bit.

Idea is to validate incoming RST sequence,
to match RCV.NXT value, instead of previouly accepted
window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND)

If sequence is in window but not an exact match, send
a "challenge ACK", so that the other part can resend an
RST with the appropriate sequence.

Add a new sysctl, tcp_challenge_ack_limit, to limit
number of challenge ACK sent per second.

Add a new SNMP counter to count number of challenge acks sent.
(netstat -s | grep TCPChallengeACK)

Signed-off-by: Eric Dumazet 
Cc: Kiran Kumar Kella 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ipvs: fix oops on NAT reply in br_nf context

2012-10-21T16:17:11+00:00

commit 9e33ce453f8ac8452649802bee1f410319408f4b upstream.

IPVS should not reset skb->nf_bridge in FORWARD hook
by calling nf_reset for NAT replies. It triggers oops in
br_nf_forward_finish.

[  579.781508] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[  579.781669] IP: [] br_nf_forward_finish+0x58/0x112
[  579.781792] PGD 218f9067 PUD 0
[  579.781865] Oops: 0000 [#1] SMP
[  579.781945] CPU 0
[  579.781983] Modules linked in:
[  579.782047]
[  579.782080]
[  579.782114] Pid: 4644, comm: qemu Tainted: G        W    3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard  /30E8
[  579.782300] RIP: 0010:[]  [] br_nf_forward_finish+0x58/0x112
[  579.782455] RSP: 0018:ffff88007b003a98  EFLAGS: 00010287
[  579.782541] RAX: 0000000000000008 RBX: ffff8800762ead00 RCX: 000000000001670a
[  579.782653] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8800762ead00
[  579.782845] RBP: ffff88007b003ac8 R08: 0000000000016630 R09: ffff88007b003a90
[  579.782957] R10: ffff88007b0038e8 R11: ffff88002da37540 R12: ffff88002da01a02
[  579.783066] R13: ffff88002da01a80 R14: ffff88002d83c000 R15: ffff88002d82a000
[  579.783177] FS:  0000000000000000(0000) GS:ffff88007b000000(0063) knlGS:00000000f62d1b70
[  579.783306] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[  579.783395] CR2: 0000000000000004 CR3: 00000000218fe000 CR4: 00000000000027f0
[  579.783505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  579.783684] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  579.783795] Process qemu (pid: 4644, threadinfo ffff880021b20000, task ffff880021aba760)
[  579.783919] Stack:
[  579.783959]  ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00
[  579.784110]  ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7
[  579.784260]  ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0
[  579.784477] Call Trace:
[  579.784523]  
[  579.784562]
[  579.784603]  [] br_nf_forward_ip+0x275/0x2c8
[  579.784707]  [] nf_iterate+0x47/0x7d
[  579.784797]  [] ? br_dev_queue_push_xmit+0xae/0xae
[  579.784906]  [] nf_hook_slow+0x6d/0x102
[  579.784995]  [] ? br_dev_queue_push_xmit+0xae/0xae
[  579.785175]  [] ? _raw_write_unlock_bh+0x19/0x1b
[  579.785179]  [] __br_forward+0x97/0xa2
[  579.785179]  [] br_handle_frame_finish+0x1a6/0x257
[  579.785179]  [] br_nf_pre_routing_finish+0x26d/0x2cb
[  579.785179]  [] br_nf_pre_routing+0x55d/0x5c1
[  579.785179]  [] nf_iterate+0x47/0x7d
[  579.785179]  [] ? br_handle_local_finish+0x44/0x44
[  579.785179]  [] nf_hook_slow+0x6d/0x102
[  579.785179]  [] ? br_handle_local_finish+0x44/0x44
[  579.785179]  [] ? sky2_poll+0xb35/0xb54
[  579.785179]  [] br_handle_frame+0x213/0x229
[  579.785179]  [] ? br_handle_frame_finish+0x257/0x257
[  579.785179]  [] __netif_receive_skb+0x2b4/0x3f1
[  579.785179]  [] process_backlog+0x99/0x1e2
[  579.785179]  [] net_rx_action+0xdf/0x242
[  579.785179]  [] __do_softirq+0xc1/0x1e0
[  579.785179]  [] ? trace_hardirqs_off_thunk+0x3a/0x6c
[  579.785179]  [] call_softirq+0x1c/0x30

The steps to reproduce as follow,

1. On Host1, setup brige br0(192.168.1.106)
2. Boot a kvm guest(192.168.1.105) on Host1 and start httpd
3. Start IPVS service on Host1
   ipvsadm -A -t 192.168.1.106:80 -s rr
   ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m
4. Run apache benchmark on Host2(192.168.1.101)
   ab -n 1000 http://192.168.1.106/

ip_vs_reply4
  ip_vs_out
    handle_response
      ip_vs_notrack
        nf_reset()
        {
          skb->nf_bridge = NULL;
        }

Actually, IPVS wants in this case just to replace nfct
with untracked version. So replace the nf_reset(skb) call
in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call.

Signed-off-by: Lin Ming 
Signed-off-by: Julian Anastasov 
Signed-off-by: Simon Horman 
Signed-off-by: Pablo Neira Ayuso 
Acked-by: David Miller 
Signed-off-by: Greg Kroah-Hartman