linux-toradex.git/include, branch v3.10.41

net: Fix ns_capable check in sock_diag_put_filterinfo

2014-05-31T04:52:15+00:00

[ Upstream commit 78541c1dc60b65ecfce5a6a096fc260219d6784e ]

The caller needs capabilities on the namespace being queried, not on
their own namespace.  This is a security bug, although it likely has
only a minor impact.

Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski 
Acked-by: Nicolas Dichtel 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: sctp: cache auth_enable per endpoint

2014-05-31T04:52:15+00:00

[ Upstream commit b14878ccb7fac0242db82720b784ab62c467c0dc ]

Currently, it is possible to create an SCTP socket, then switch
auth_enable via sysctl setting to 1 and crash the system on connect:

Oops[#1]:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
[...]
Call Trace:
[] sctp_auth_asoc_set_default_hmac+0x68/0x80
[] sctp_process_init+0x5e0/0x8a4
[] sctp_sf_do_5_1B_init+0x234/0x34c
[] sctp_do_sm+0xb4/0x1e8
[] sctp_endpoint_bh_rcv+0x1c4/0x214
[] sctp_rcv+0x588/0x630
[] sctp6_rcv+0x10/0x24
[] ip6_input+0x2c0/0x440
[] __netif_receive_skb_core+0x4a8/0x564
[] process_backlog+0xb4/0x18c
[] net_rx_action+0x12c/0x210
[] __do_softirq+0x17c/0x2ac
[] irq_exit+0x54/0xb0
[] ret_from_irq+0x0/0x4
[] rm7k_wait_irqoff+0x24/0x48
[] cpu_startup_entry+0xc0/0x148
[] start_kernel+0x37c/0x398
Code: dd0900b8  000330f8  0126302d  50c0fff1  0047182a  a48306a0
03e00008  00000000
---[ end trace b530b0551467f2fd ]---
Kernel panic - not syncing: Fatal exception in interrupt

What happens while auth_enable=0 in that case is, that
ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
when endpoint is being created.

After that point, if an admin switches over to auth_enable=1,
the machine can crash due to NULL pointer dereference during
reception of an INIT chunk. When we enter sctp_process_init()
via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
the INIT verification succeeds and while we walk and process
all INIT params via sctp_process_param() we find that
net->sctp.auth_enable is set, therefore do not fall through,
but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
dereference what we have set to NULL during endpoint
initialization phase.

The fix is to make auth_enable immutable by caching its value
during endpoint initialization, so that its original value is
being carried along until destruction. The bug seems to originate
from the very first days.

Fix in joint work with Daniel Borkmann.

Reported-by: Joshua Kinard 
Signed-off-by: Vlad Yasevich 
Signed-off-by: Daniel Borkmann 
Acked-by: Neil Horman 
Tested-by: Joshua Kinard 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ipv6: Limit mtu to 65575 bytes

2014-05-31T04:52:14+00:00

[ Upstream commit 30f78d8ebf7f514801e71b88a10c948275168518 ]

Francois reported that setting big mtu on loopback device could prevent
tcp sessions making progress.

We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets.

We must limit the IPv6 MTU to (65535 + 40) bytes in theory.

Tested:

ifconfig lo mtu 70000
netperf -H ::1

Before patch : Throughput :   0.05 Mbits

After patch : Throughput : 35484 Mbits

Reported-by: Francois WELLENREITER 
Signed-off-by: Eric Dumazet 
Acked-by: YOSHIFUJI Hideaki 
Acked-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

list: introduce list_next_entry() and list_prev_entry()

2014-05-31T04:52:14+00:00

[ Upstream commit 008208c6b26f21c2648c250a09c55e737c02c5f8 ]

Add two trivial helpers list_next_entry() and list_prev_entry(), they
can have a lot of users including list.h itself.  In fact the 1st one is
already defined in events/core.c and bnx2x_sp.c, so the patch simply
moves the definition to list.h.

Signed-off-by: Oleg Nesterov 
Cc: Eilon Greenstein 
Cc: Greg Kroah-Hartman 
Cc: Peter Zijlstra 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: use paravirt friendly ops for NUMA hinting ptes

2014-05-31T04:52:12+00:00

commit 29c7787075c92ca8af353acd5301481e6f37082f upstream.

David Vrabel identified a regression when using automatic NUMA balancing
under Xen whereby page table entries were getting corrupted due to the
use of native PTE operations.  Quoting him

	Xen PV guest page tables require that their entries use machine
	addresses if the preset bit (_PAGE_PRESENT) is set, and (for
	successful migration) non-present PTEs must use pseudo-physical
	addresses.  This is because on migration MFNs in present PTEs are
	translated to PFNs (canonicalised) so they may be translated back
	to the new MFN in the destination domain (uncanonicalised).

	pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma()
	set and clear the _PAGE_PRESENT bit using pte_set_flags(),
	pte_clear_flags(), etc.

	In a Xen PV guest, these functions must translate MFNs to PFNs
	when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
	_PAGE_PRESENT.

His suggested fix converted p[te|md]_[set|clear]_flags to using
paravirt-friendly ops but this is overkill.  He suggested an alternative
of using p[te|md]_modify in the NUMA page table operations but this is
does more work than necessary and would require looking up a VMA for
protections.

This patch modifies the NUMA page table operations to use paravirt
friendly operations to set/clear the flags of interest.  Unfortunately
this will take a performance hit when updating the PTEs on
CONFIG_PARAVIRT but I do not see a way around it that does not break
Xen.

Signed-off-by: Mel Gorman 
Acked-by: David Vrabel 
Tested-by: David Vrabel 
Cc: Ingo Molnar 
Cc: Peter Anvin 
Cc: Fengguang Wu 
Cc: Linus Torvalds 
Cc: Steven Noonan 
Cc: Rik van Riel 
Cc: Peter Zijlstra 
Cc: Andrea Arcangeli 
Cc: Dave Hansen 
Cc: Srikar Dronamraju 
Cc: Cyrill Gorcunov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len

2014-05-31T04:52:11+00:00

commit 223b02d923ecd7c84cf9780bb3686f455d279279 upstream.

"len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
case it can contain all extensions. Bellow you can find sizes for all
types of extensions. Their sum is definitely bigger than 256.

nf_ct_ext_types[0]->len = 24
nf_ct_ext_types[1]->len = 32
nf_ct_ext_types[2]->len = 24
nf_ct_ext_types[3]->len = 32
nf_ct_ext_types[4]->len = 152
nf_ct_ext_types[5]->len = 2
nf_ct_ext_types[6]->len = 16
nf_ct_ext_types[7]->len = 8

I have seen "len" up to 280 and my host has crashes w/o this patch.

The right way to fix this problem is reducing the size of the ecache
extension (4) and Florian is going to do this, but these changes will
be quite large to be appropriate for a stable tree.

Fixes: 5b423f6a40a0 (netfilter: nf_conntrack: fix racy timer handling with reliable)
Cc: Pablo Neira Ayuso 
Cc: Patrick McHardy 
Cc: Jozsef Kadlecsik 
Cc: "David S. Miller" 
Signed-off-by: Andrey Vagin 
Signed-off-by: Pablo Neira Ayuso 
Signed-off-by: Greg Kroah-Hartman

blktrace: fix accounting of partially completed requests

2014-05-31T04:52:11+00:00

commit af5040da01ef980670b3741b3e10733ee3e33566 upstream.

trace_block_rq_complete does not take into account that request can
be partially completed, so we can get the following incorrect output
of blkparser:

  C   R 232 + 240 [0]
  C   R 240 + 232 [0]
  C   R 248 + 224 [0]
  C   R 256 + 216 [0]

but should be:

  C   R 232 + 8 [0]
  C   R 240 + 8 [0]
  C   R 248 + 8 [0]
  C   R 256 + 8 [0]

Also, the whole output summary statistics of completed requests and
final throughput will be incorrect.

This patch takes into account real completion size of the request and
fixes wrong completion accounting.

Signed-off-by: Roman Pen 
CC: Steven Rostedt 
CC: Frederic Weisbecker 
CC: Ingo Molnar 
CC: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

scsi: fix our current target reap infrastructure

2014-05-31T04:52:11+00:00

commit e63ed0d7a98014fdfc2cfeb3f6dada313dcabb59 upstream.

This patch eliminates the reap_ref and replaces it with a proper kref.
On last put of this kref, the target is removed from visibility in
sysfs.  The final call to scsi_target_reap() for the device is done from
__scsi_remove_device() and only if the device was made visible.  This
ensures that the target disappears as soon as the last device is gone
rather than waiting until final release of the device (which is often
too long).

Reviewed-by: Alan Stern 
Tested-by: Sarah Sharp 
Signed-off-by: James Bottomley 
Signed-off-by: Greg Kroah-Hartman

libata/ahci: accommodate tag ordered controllers

2014-05-13T11:59:43+00:00

commit 8a4aeec8d2d6a3edeffbdfae451cdf05cbf0fefd upstream.

The AHCI spec allows implementations to issue commands in tag order
rather than FIFO order:

	5.3.2.12 P:SelectCmd
	HBA sets pSlotLoc = (pSlotLoc + 1) mod (CAP.NCS + 1)
	or HBA selects the command to issue that has had the
	PxCI bit set to '1' longer than any other command
	pending to be issued.

The result is that commands posted sequentially (time-wise) may play out
of sequence when issued by hardware.

This behavior has likely been hidden by drives that arrange for commands
to complete in issue order.  However, it appears recent drives (two from
different vendors that we have found so far) inflict out-of-order
completions as a matter of course.  So, we need to take care to maintain
ordered submission, otherwise we risk triggering a drive to fall out of
sequential-io automation and back to random-io processing, which incurs
large latency and degrades throughput.

This issue was found in simple benchmarks where QD=2 seq-write
performance was 30-50% *greater* than QD=32 seq-write performance.

Tagging for -stable and making the change globally since it has a low
risk-to-reward ratio.  Also, word is that recent versions of an unnamed
OS also does it this way now.  So, drives in the field are already
experienced with this tag ordering scheme.

Cc: Dave Jiang 
Cc: Ed Ciechanowski 
Reviewed-by: Matthew Wilcox 
Signed-off-by: Dan Williams 
Signed-off-by: Tejun Heo 
Signed-off-by: Greg Kroah-Hartman

nfsd: check passed socket's net matches NFSd superblock's one

2014-05-06T14:55:29+00:00

commit 3064639423c48d6e0eb9ecc27c512a58e38c6c57 upstream.

There could be a case, when NFSd file system is mounted in network, different
to socket's one, like below:

"ip netns exec" creates new network and mount namespace, which duplicates NFSd
mount point, created in init_net context. And thus NFS server stop in nested
network context leads to RPCBIND client destruction in init_net.
Then, on NFSd start in nested network context, rpc.nfsd process creates socket
in nested net and passes it into "write_ports", which leads to RPCBIND sockets
creation in init_net context because of the same reason (NFSd monut point was
created in init_net context). An attempt to register passed socket in nested
net leads to panic, because no RPCBIND client present in nexted network
namespace.

This patch add check that passed socket's net matches NFSd superblock's one.
And returns -EINVAL error to user psace otherwise.

v2: Put socket on exit.

Reported-by: Weng Meiling 
Signed-off-by: Stanislav Kinsbursky 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Greg Kroah-Hartman