linux-toradex.git/include/linux, branch v3.0.79

macvlan: fix passthru mode race between dev removal and rx path

2013-05-19T17:04:46+00:00

[ Upstream commit 233c7df0821c4190e2d3f4be0f2ca0ab40a5ed8c, note
  that I had to add list_first_or_null_rcu to rculist.h in order
  to accommodate this fix. ]

Currently, if a macvlan device in passthru mode is created, data is
received on it, and the device is then removed, the following panic happens:

NULL pointer dereference at 0000000000000198
IP: [] macvlan_handle_frame+0x153/0x1f7 [macvlan]

I'm using the following script to trigger this:


I run this script while "ping -f" is running on another machine to send
packets to e1 rx.

The reason for the panic is that list_first_entry() is blindly called in
macvlan_handle_frame() even if the list is empty. vlan is then set to an
incorrect pointer, which leads to the crash.

I'm fixing this by protecting the port->vlans list with RCU and by making
sure no incorrect pointer is returned when the list is empty.
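
The difference between blindly taking the first entry and checking for an
empty list first can be sketched in userspace C. This is a simplified model
without RCU: list_first_or_null(), struct vlan, port_first_vlan() and
first_or_null_demo() are hypothetical names used only for illustration,
whereas the actual fix relies on list_first_or_null_rcu().

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Only valid on a non-empty list: on an empty list head->next is the head
 * itself, so the result would not point at a real entry. */
#define list_first_entry(head, type, member) \
	container_of((head)->next, type, member)

/* Hypothetical non-RCU counterpart of list_first_or_null_rcu(). */
#define list_first_or_null(head, type, member) \
	((head)->next != (head) ? list_first_entry(head, type, member) : NULL)

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical stand-in for a macvlan device hanging off a port. */
struct vlan {
	int id;
	struct list_head list;
};

/* Returns the first vlan on the list, or NULL if the list is empty. */
struct vlan *port_first_vlan(struct list_head *vlans)
{
	return list_first_or_null(vlans, struct vlan, list);
}

/* Small self-check of both cases; returns 0 when all checks pass. */
int first_or_null_demo(void)
{
	struct list_head vlans;
	struct vlan v = { .id = 7 };

	list_init(&vlans);
	assert(port_first_vlan(&vlans) == NULL);	/* empty list */

	list_add_tail(&v.list, &vlans);
	assert(port_first_vlan(&vlans) == &v);		/* one entry */
	return 0;
}
```

A caller in the rx path would then test the returned pointer for NULL and
bail out instead of dereferencing it.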

Introduced by: commit eb06acdc85585f2 "macvlan: Introduce 'passthru' mode to takeover the underlying device"

Signed-off-by: Jiri Pirko 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

if_cablemodem.h: Add parenthesis around ioctl macros

2013-05-19T17:04:46+00:00

[ Upstream commit 4f924b2aa4d3cb30f07e57d6b608838edcbc0d88 ]

Protect the SIOCGCM* ioctl macros with parentheses.
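
Why the parentheses matter can be shown with a small sketch. The macro
names and the base value below are hypothetical and chosen only for
illustration; they are not the definitions from if_cablemodem.h.

```c
#include <assert.h>

/* Hypothetical base value standing in for a private ioctl range. */
#define PRIVATE_BASE 0x89F0

/* Unprotected: the expansion is spliced into the surrounding expression. */
#define CM_CMD_BAD  PRIVATE_BASE + 2
/* Protected: the macro always behaves like a single value. */
#define CM_CMD_GOOD (PRIVATE_BASE + 2)

/* Intended to return the distance of cmd from the command number. */
int cmd_offset_bad(int cmd)
{
	/* Expands to cmd - PRIVATE_BASE + 2, which is off by 4. */
	return cmd - CM_CMD_BAD;
}

int cmd_offset_good(int cmd)
{
	/* Expands to cmd - (PRIVATE_BASE + 2), as intended. */
	return cmd - CM_CMD_GOOD;
}
```

With cmd equal to PRIVATE_BASE + 2, the unprotected variant yields 4 where
the protected one yields 0.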

Reported-by: Paul Wouters 
Signed-off-by: Josh Boyer 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ipc: sysv shared memory limited to 8TiB

2013-05-08T02:57:27+00:00

commit d69f3bad4675ac519d41ca2b11e1c00ca115cecd upstream.

While trying to run an application that puts data into half of memory
using shmget(), we found that a shmall value below 8EiB-8TiB prevented us
from using anything more than 8TiB.  Setting kernel.shmall greater than
8EiB-8TiB made the job work.

In the newseg() function, ns->shm_tot, which counts pages, reaches INT_MAX
at 8TiB.

ipc/shm.c:
static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
{
...
        int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT;
...
        if (ns->shm_tot + numpages > ns->shm_ctlall)
                return -ENOSPC;
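
The arithmetic behind the 8TiB limit can be checked with a short sketch,
assuming 4 KiB pages and a 32-bit int. pages_for_size() and
page_count_fits_int() are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumes 4 KiB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Number of pages needed to back a segment of the given size in bytes. */
uint64_t pages_for_size(uint64_t size)
{
	return (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/* Returns 1 if the page count can be stored in an int, 0 otherwise. */
int page_count_fits_int(uint64_t size)
{
	return pages_for_size(size) <= (uint64_t)INT_MAX;
}
```

Under these assumptions 8TiB corresponds to 2^31 pages, one more than
INT_MAX, so a page counter of type int cannot represent it.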

[akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
Signed-off-by: Robin Holt 
Reported-by: Alex Thorlton 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

net: fix incorrect credentials passing

2013-05-01T15:56:38+00:00

[ Upstream commit 83f1b4ba917db5dc5a061a44b3403ddb6e783494 ]

Commit 257b5358b32f ("scm: Capture the full credentials of the scm
sender") changed the credentials passing code to pass in the effective
uid/gid instead of the real uid/gid.

Obviously this doesn't matter most of the time (since normally they are
the same), but it results in differences for suid binaries when the wrong
uid/gid ends up being used.

This just undoes that (presumably unintentional) part of the commit.
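
The distinction between the real and the effective IDs can be sketched
from userspace. struct sender_ids and fill_real_ids() are hypothetical
names for illustration only; they do not reproduce the kernel's scm code.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the IDs a sender would pass along. */
struct sender_ids {
	pid_t pid;
	uid_t uid;
	gid_t gid;
};

/* Fills in the real (not effective) uid/gid of the calling process. */
void fill_real_ids(struct sender_ids *ids)
{
	ids->pid = getpid();
	ids->uid = getuid();	/* real uid; geteuid() can differ under suid */
	ids->gid = getgid();	/* real gid; getegid() can differ under sgid */
}
```

In an ordinary process both pairs are the same; they only diverge when a
suid or sgid binary is involved, which is the case the commit describes.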

Reported-by: Andy Lutomirski 
Cc: Eric W. Biederman 
Cc: Serge E. Hallyn 
Cc: David S. Miller 
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds 
Acked-by: "Eric W. Biederman" 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

netfilter: don't reset nf_trace in nf_reset()

2013-05-01T15:56:37+00:00

[ Upstream commit 124dff01afbdbff251f0385beca84ba1b9adda68 ]

Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.

nf_reset() is used in the following cases:

- when passing packets up to the socket layer, at which point we want to
  release all netfilter references that might keep modules pinned while
  the packet is queued. nf_trace doesn't matter anymore at this point.

- when encapsulating or decapsulating IPsec packets. We want to continue
  tracing these packets after IPsec processing.

- when passing packets through virtual network devices. Only devices
  that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
  used anymore. It's not entirely clear whether those packets should
  be traced after that, however we've always done that.

- when passing packets through virtual network devices that make the
  packet cross network namespace boundaries. This is the only case
  where we clearly want to reset nf_trace and is also what the
  original patch intended to fix.

Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.
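
The split between the two reset operations can be modelled in a few lines
of userspace C. struct pkt and the pkt_*() functions are hypothetical
stand-ins, and the model assumes that the forwarding path calls both
helpers; it is a sketch of the idea, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct pkt {
	void *nf_ref;		/* models references that pin netfilter modules */
	unsigned int nf_trace;	/* models the per-packet trace flag */
};

/* Models nf_reset(): drop netfilter references, leave tracing alone. */
void pkt_nf_reset(struct pkt *p)
{
	p->nf_ref = NULL;
}

/* Models nf_reset_trace(): clear only the trace flag. */
void pkt_nf_reset_trace(struct pkt *p)
{
	p->nf_trace = 0;
}

/* Models forwarding across a namespace boundary, where both are wanted. */
void pkt_forward_across_netns(struct pkt *p)
{
	pkt_nf_reset(p);
	pkt_nf_reset_trace(p);
}
```

Keeping the trace reset separate means the other callers of the plain
reset, such as the IPsec paths listed above, keep tracing intact.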

Signed-off-by: Patrick McHardy 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: count hw_addr syncs so that unsync works properly.

2013-05-01T15:56:36+00:00

[ Upstream commit 4543fbefe6e06a9e40d9f2b28d688393a299f079 ]

A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices.  In
some cases (bond/team) a single address list is synched down
to multiple devices.  At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.

Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.
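
A reduced model shows why a count is needed once one address list is
synced to several lower devices. struct hw_addr here is a hypothetical
two-field model, and addr_sync()/addr_unsync() are illustrative names
rather than the kernel's functions.

```c
#include <assert.h>

/* Hypothetical, reduced model of a hardware address list entry. */
struct hw_addr {
	int refcount;	/* users of this address, including synced copies */
	int synced;	/* number of lower devices it has been synced to */
};

/* Record that the address was synced to one more lower device. */
void addr_sync(struct hw_addr *ha)
{
	ha->synced++;
	ha->refcount++;
}

/* Undo one sync; returns 0 on success, -1 if nothing was synced. */
int addr_unsync(struct hw_addr *ha)
{
	if (ha->synced == 0)
		return -1;
	ha->synced--;
	ha->refcount--;
	return 0;
}
```

With a boolean in place of the counter, the second addr_unsync() call
would find the flag already cleared and leave one reference behind, which
is the leak described above.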

Signed-off-by: Vlad Yasevich 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

vm: add vm_iomap_memory() helper function

2013-04-26T04:23:49+00:00

commit b4cbb197c7e7a68dbad0d491242e3ca67420c13e upstream.

Various drivers end up replicating the code to mmap() their memory
buffers into user space, and our core memory remapping function may be
very flexible but it is unnecessarily complicated for the common cases
to use.

Our internal VM uses pfn's ("page frame numbers") which simplifies
things for the VM, and allows us to pass physical addresses around in a
denser and more efficient format than passing a "phys_addr_t" around,
and having to shift it up and down by the page size.  But it just means
that drivers end up doing that shifting instead at the interface level.

It also means that drivers end up mucking around with internal VM things
like the vma details (vm_pgoff, vm_start/end) way more than they really
need to.

So this just exports a function to map a certain physical memory range
into user space (using a phys_addr_t based interface that is much more
natural for a driver) and hides all the complexity from the driver.
Some drivers will still end up tweaking the vm_page_prot details for
things like prefetching or cacheability etc, but that's actually
relevant to the driver, rather than caring about what the page offset of
the mapping is into the particular IO memory region.
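
The kind of bookkeeping that drivers otherwise repeat can be sketched as a
userspace model. This is not the kernel's implementation: struct vma_model
and iomap_first_pfn() are hypothetical, 4 KiB pages are assumed, and the
function only computes the first page frame number instead of creating a
mapping.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumes 4 KiB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Hypothetical subset of the vma fields a driver would otherwise touch. */
struct vma_model {
	uint64_t vm_start;	/* first user address of the mapping */
	uint64_t vm_end;	/* one past the last user address */
	uint64_t vm_pgoff;	/* offset into the region, in pages */
};

/*
 * Given a physical range [start, start + len), work out the first page
 * frame number to map for this vma. Returns 0 and stores the pfn on
 * success, or -1 if the request does not fit inside the region.
 */
int iomap_first_pfn(const struct vma_model *vma, uint64_t start,
		    uint64_t len, uint64_t *pfn_out)
{
	uint64_t pfn, pages, vm_len;

	if (start + len < start)		/* physical range wraps around */
		return -1;

	len += start & ~PAGE_MASK;		/* account for an unaligned start */
	pfn = start >> PAGE_SHIFT;
	pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;

	if (vma->vm_pgoff > pages)		/* offset lies beyond the region */
		return -1;
	pfn += vma->vm_pgoff;
	pages -= vma->vm_pgoff;

	vm_len = vma->vm_end - vma->vm_start;
	if ((vm_len >> PAGE_SHIFT) > pages)	/* mapping exceeds what is left */
		return -1;

	*pfn_out = pfn;
	return 0;
}
```

A driver using a phys_addr_t based helper passes only the start address
and length of its region and leaves the shifting and range checks to the
core.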

Acked-by: Greg Kroah-Hartman 
Signed-off-by: Linus Torvalds

KVM: Allow cross page reads and writes from cached translations.

2013-04-26T04:23:48+00:00

commit 8f964525a121f2ff2df948dac908dcc65be21b5b upstream.

This patch adds support for kvm_gfn_to_hva_cache_init functions for
reads and writes that will cross a page.  If the range falls within
the same memslot, then this will be a fast operation.  If the range
is split between two memslots, then the slower kvm_read_guest and
kvm_write_guest are used.
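
Whether an access crosses a page can be decided from its first and last
guest frame numbers. The sketch below assumes 4 KiB pages and a non-zero
length; pages_spanned() and crosses_page() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Number of guest pages touched by an access of len bytes at gpa (len > 0). */
uint64_t pages_spanned(uint64_t gpa, uint64_t len)
{
	uint64_t start_gfn = gpa >> PAGE_SHIFT;
	uint64_t end_gfn = (gpa + len - 1) >> PAGE_SHIFT;

	return end_gfn - start_gfn + 1;
}

/* Returns 1 if the access crosses a page boundary, 0 otherwise. */
int crosses_page(uint64_t gpa, uint64_t len)
{
	return pages_spanned(gpa, len) > 1;
}
```

An 8-byte access at 0x1ffc touches two pages, while the same access at
0x1ff8 stays within one.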

Tested: Test against kvm_clock unit tests.

Signed-off-by: Andrew Honig 
Signed-off-by: Gleb Natapov 
Cc: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman

spinlocks and preemption points need to be at least compiler barriers

2013-04-12T16:18:09+00:00

commit 386afc91144b36b42117b0092893f15bc8798a80 upstream.

In UP and non-preempt respectively, the spinlocks and preemption
disable/enable points are stubbed out entirely, because there is no
regular code that can ever hit the kind of concurrency they are meant to
protect against.

However, while there is no regular code that can cause scheduling, we
_do_ end up having some exceptional (literally!) code that can do so,
and that we need to make sure does not ever get moved into the critical
region by the compiler.

In particular, get_user() and put_user() are generally implemented as
inline asm statements (even if the inline asm may then make a call
instruction to call out-of-line), and can obviously cause a page fault
and IO as a result.  If that inline asm has been scheduled into the
middle of a preemption-safe (or spinlock-protected) code region, we
obviously lose.

Now, admittedly this is *very* unlikely to actually ever happen, and
we've not seen examples of actual bugs related to this.  But partly
exactly because it's so hard to trigger and the resulting bug is so
subtle, we should be extra careful to get this right.

So make sure that even when preemption is disabled, and we don't have to
generate any actual *code* to explicitly tell the system that we are in
a preemption-disabled region, we need to at least tell the compiler not
to move things around the critical region.
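
A compiler-only barrier of this kind can be sketched with GCC/Clang inline
asm. The demo_* names below are hypothetical stubs for illustration and
not the kernel's preempt macros.

```c
#include <assert.h>

/* Compiler-only barrier (GCC/Clang syntax): emits no instructions, but the
 * "memory" clobber stops the compiler moving memory accesses across it. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stubs for a build where preemption is compiled out. */
#define demo_preempt_disable() barrier()
#define demo_preempt_enable()  barrier()

static int demo_counter;

/* The update of demo_counter cannot be moved outside the two barriers. */
int demo_critical_increment(void)
{
	demo_preempt_disable();
	demo_counter++;
	demo_preempt_enable();
	return demo_counter;
}
```

The stubs cost nothing at run time; they only constrain how the compiler
may reorder code around the critical region.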

This patch grew out of the same discussion that caused commits
79e5f05edcbf ("ARC: Add implicit compiler barrier to raw_local_irq*
functions") and 3e2e0d2c222b ("tile: comment assumption about
__insn_mtspr for ") to come about.

Note for stable: use discretion when/if applying this.  As mentioned,
this bug may never have actually bitten anybody, and gcc may never have
done the required code motion for it to possibly ever trigger in
practice.

Signed-off-by: Linus Torvalds 
Cc: Steven Rostedt 
Cc: Peter Zijlstra 
Signed-off-by: Greg Kroah-Hartman

libata: Set max sector to 65535 for Slimtype DVD A DS8A8SH drive

2013-04-12T16:18:09+00:00

commit a32450e127fc6e5ca6d958ceb3cfea4d30a00846 upstream.

The Slimtype DVD A  DS8A8SH drive locks up when max sector is smaller than
65535, and the backtrace below is observed when it locks up:

INFO: task flush-8:32:1130 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:32      D ffffffff8180cf60     0  1130      2 0x00000000
 ffff880273aef618 0000000000000046 0000000000000005 ffff880273aee000
 ffff880273aee000 ffff880273aeffd8 ffff880273aee010 ffff880273aee000
 ffff880273aeffd8 ffff880273aee000 ffff88026e842ea0 ffff880274a10000
Call Trace:
 [] schedule+0x5d/0x70
 [] io_schedule+0x8c/0xd0
 [] get_request+0x731/0x7d0
 [] ? cfq_allow_merge+0x50/0x90
 [] ? wake_up_bit+0x40/0x40
 [] ? bio_attempt_back_merge+0x33/0x110
 [] blk_queue_bio+0x23a/0x3f0
 [] generic_make_request+0xc6/0x120
 [] submit_bio+0x138/0x160
 [] ? bio_alloc_bioset+0x96/0x120
 [] submit_bh+0x1f1/0x220
 [] __block_write_full_page+0x228/0x340
 [] ? attach_nobh_buffers+0xc0/0xc0
 [] ? I_BDEV+0x10/0x10
 [] ? I_BDEV+0x10/0x10
 [] block_write_full_page_endio+0xe6/0x100
 [] block_write_full_page+0x15/0x20
 [] blkdev_writepage+0x18/0x20
 [] __writepage+0x17/0x40
 [] write_cache_pages+0x34a/0x4a0
 [] ? set_page_dirty+0x70/0x70
 [] generic_writepages+0x51/0x80
 [] do_writepages+0x20/0x50
 [] __writeback_single_inode+0xa6/0x2b0
 [] writeback_sb_inodes+0x311/0x4d0
 [] __writeback_inodes_wb+0x86/0xd0
 [] wb_writeback+0x1a3/0x330
 [] ? _raw_spin_lock_irqsave+0x3f/0x50
 [] ? get_nr_inodes+0x52/0x70
 [] wb_do_writeback+0x1dc/0x260
 [] ? schedule_timeout+0x204/0x240
 [] bdi_writeback_thread+0x102/0x2b0
 [] ? wb_do_writeback+0x260/0x260
 [] kthread+0xc0/0xd0
 [] ? kthread_worker_fn+0x1b0/0x1b0
 [] ret_from_fork+0x7c/0xb0
 [] ? kthread_worker_fn+0x1b0/0x1b0

 The above trace was triggered by
   "dd if=/dev/zero of=/dev/sr0 bs=2048 count=32768"

 It was previously working by accident, since another bug introduced
 by 4dce8ba94c7 (libata: Use 'bool' return value for ata_id_XXX) caused
 all drives to use maxsect=65535.

Signed-off-by: Shan Hai 
Signed-off-by: Jeff Garzik 
Signed-off-by: Greg Kroah-Hartman