linux-toradex.git/include, branch v2.6.27.62

lib: proportion: lower PROP_MAX_SHIFT to 32 on 64-bit kernel

2012-03-17T13:03:57+00:00

commit 3310225dfc71a35a2cc9340c15c0e08b14b3c754 upstream.

PROP_MAX_SHIFT should be set to <=32 on 64-bit box. This fixes two bugs
in the below lines of bdi_dirty_limit():

	bdi_dirty *= numerator;
	do_div(bdi_dirty, denominator);

1) divide error: do_div() only uses the lower 32 bits of the denominator,
   which may be truncated to 0 when PROP_MAX_SHIFT > 32.

2) overflow: (bdi_dirty * numerator) could easily overflow if numerator
   uses up to 48 bits, leaving only 16 bits for bdi_dirty.
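
Both failure modes can be reproduced in userspace. The following is only a
sketch: div_u64_by_low32() and mul_overflows_u64() are hypothetical helpers,
not kernel functions. The first one mimics the way do_div() treats its
divisor as a 32-bit value; the second one checks whether a 64-bit product
would wrap.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's do_div(): the divisor
 * is treated as 32 bits wide. Returns 0 on success, or -1 if the truncated
 * divisor is zero (the kernel would take a divide error instead).
 */
static int div_u64_by_low32(uint64_t *n, uint64_t divisor)
{
    uint32_t base = (uint32_t)divisor;   /* upper 32 bits are silently lost */

    if (base == 0)
        return -1;
    *n /= base;
    return 0;
}

/* Returns 1 if a * b does not fit in 64 bits, 0 otherwise. */
static int mul_overflows_u64(uint64_t a, uint64_t b)
{
    return a != 0 && b > UINT64_MAX / a;
}
```

With a denominator of 1ULL << 32 the truncated divisor is 0, and with a
48-bit numerator any bdi_dirty value above 16 bits makes the product wrap.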

Cc: Peter Zijlstra 
Reported-by: Ilya Tumaykin 
Tested-by: Ilya Tumaykin 
Signed-off-by: Wu Fengguang 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

block: fail SCSI passthrough ioctls on partition devices

2012-02-11T14:40:55+00:00

commit 0bfc96cb77224736dfa35c3c555d37b3646ef35e upstream.

[ Changes with respect to 3.3: return -ENOTTY from scsi_verify_blk_ioctl
  and -ENOIOCTLCMD from sd_compat_ioctl. ]

Linux allows executing the SG_IO ioctl on a partition or LVM volume, and
will pass the command to the underlying block device.  This is
well-known, but it is also a large security problem when (via Unix
permissions, ACLs, SELinux or a combination thereof) a program or user
needs to be granted access only to part of the disk.

This patch lets partitions forward a small set of harmless ioctls;
others are logged with printk so that we can see which ioctls are
actually sent.  In my tests only CDROM_GET_CAPABILITY actually occurred.
Of course it was being sent to a (partition on a) hard disk, so it would
have failed with ENOTTY and the patch isn't changing anything in
practice.  Still, I'm treating it specially to avoid spamming the logs.

In principle, this restriction should include programs running with
CAP_SYS_RAWIO.  If for example I let a program access /dev/sda2 and
/dev/sdb, it still should not be able to read/write outside the
boundaries of /dev/sda2 independent of the capabilities.  However, for
now programs with CAP_SYS_RAWIO will still be allowed to send the
ioctls.  Their actions will still be logged.
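
The policy described above can be sketched as a small decision function.
This is an illustration only: demo_verify_blk_ioctl() and the demo_cmd
values are hypothetical names, the real ioctl numbers and the real list of
harmless ioctls are not reproduced here, and the "logged" flag merely
stands in for the printk.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command identifiers; real ioctl numbers live in kernel headers. */
enum demo_cmd {
    DEMO_CMD_HARMLESS,        /* stands in for the whitelisted ioctls */
    DEMO_CMD_CDROM_GET_CAP,   /* stands in for CDROM_GET_CAPABILITY */
    DEMO_CMD_SG_IO            /* stands in for SCSI passthrough */
};

/*
 * Returns 0 if the ioctl may be forwarded to the underlying device and
 * -ENOTTY otherwise. *logged is set when the request would be logged.
 */
static int demo_verify_blk_ioctl(bool is_partition, enum demo_cmd cmd,
                                 bool has_rawio, bool *logged)
{
    *logged = false;

    if (!is_partition)
        return 0;                      /* whole devices are unaffected */

    switch (cmd) {
    case DEMO_CMD_HARMLESS:
        return 0;                      /* whitelisted, forwarded silently */
    case DEMO_CMD_CDROM_GET_CAP:
        return -ENOTTY;                /* rejected silently to avoid log spam */
    default:
        break;
    }

    *logged = true;                    /* everything else is logged */
    return has_rawio ? 0 : -ENOTTY;    /* CAP_SYS_RAWIO is still allowed for now */
}
```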

This patch does not affect the non-libata IDE driver.  That driver
however already tests for bd != bd->bd_contains before issuing some
ioctl; it could be restricted further to forbid these ioctls even for
programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO.

Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe 
Cc: James Bottomley 
Signed-off-by: Paolo Bonzini 
[ Make it also print the command name when warning - Linus ]
Signed-off-by: Linus Torvalds 
[bwh: Backport to 2.6.32 - ENOIOCTLCMD does not get converted to
 ENOTTY, so we must return ENOTTY directly]
Signed-off-by: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman 

Signed-off-by: Willy Tarreau

block: add and use scsi_blk_cmd_ioctl

2012-02-11T14:40:54+00:00

commit 577ebb374c78314ac4617242f509e2f5e7156649 upstream.

Introduce a wrapper around scsi_cmd_ioctl that takes a block device.

The function will then be enhanced to detect partition block devices
and, in that case, subject the ioctls to whitelisting.

Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe 
Cc: James Bottomley 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman 
[bwh: Backport to 2.6.32 - adjust context]
Signed-off-by: Ben Hutchings 
[wt: slightly changed the interface to match 2.6.27's scsi_cmd_ioctl()
     which still needs the file pointer but has no mode parameter].

Signed-off-by: Willy Tarreau

af_packet: prevent information leak

2012-02-11T14:40:47+00:00

[ Upstream commit 13fcb7bd322164c67926ffe272846d4860196dc6 ]

In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace)
added a small information leak.

Add a padding field and make sure it is zeroed before the copy to user space.
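
As a rough illustration of the pattern (struct demo_auxdata and
demo_fill_auxdata() are hypothetical and do not reproduce the real
af_packet structure layout): the padding becomes a named field, and the
whole record is cleared before the meaningful fields are filled in.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record; not the real af_packet auxiliary data layout. */
struct demo_auxdata {
    uint32_t status;
    uint16_t vlan_tci;
    uint16_t padding;   /* explicit field instead of implicit compiler padding */
};

/* Fill a record that will later be copied out to an untrusted reader. */
static void demo_fill_auxdata(struct demo_auxdata *aux, uint32_t status,
                              uint16_t vlan_tci)
{
    memset(aux, 0, sizeof(*aux));   /* no stale bytes survive, padding included */
    aux->status = status;
    aux->vlan_tci = vlan_tci;
}
```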

Signed-off-by: Eric Dumazet 
CC: Patrick McHardy 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

x86, 64-bit: Fix copy_[to/from]_user() checks for the userspace address limit

2012-02-11T14:40:45+00:00

commit 26afb7c661080ae3f1f13ddf7f0c58c4f931c22b upstream.

As reported in BZ #30352:

  https://bugzilla.kernel.org/show_bug.cgi?id=30352

there's a kernel bug related to reading the last allowed page on x86_64.

The _copy_to_user() and _copy_from_user() functions use the following
check for address limit:

  if (buf + size >= limit)
	fail();

while it should be more permissive:

  if (buf + size > limit)
	fail();

That's because size is the number of bytes read from or written to
buf, counting the byte at the buf address itself. The last byte
touched is at "buf + size - 1", so the copy function never touches
the limit address even if "buf + size == limit".
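
The two checks can be compared side by side in a small sketch.
demo_range_ok_old() and demo_range_ok_new() are hypothetical helpers, not
the kernel's functions, and the wraparound test is an addition of this
sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Old check: wrongly rejects a buffer that ends exactly at the limit. */
static bool demo_range_ok_old(uint64_t buf, uint64_t size, uint64_t limit)
{
    uint64_t end = buf + size;

    if (end < buf)            /* address arithmetic wrapped around */
        return false;
    return !(end >= limit);
}

/* Fixed check: only rejects a buffer that reaches past the limit. */
static bool demo_range_ok_new(uint64_t buf, uint64_t size, uint64_t limit)
{
    uint64_t end = buf + size;

    if (end < buf)            /* address arithmetic wrapped around */
        return false;
    return !(end > limit);
}
```

For a 4096-byte buffer at 0x7fffffffe000 and a limit of 0x7ffffffff000
(a value assumed here so that the buffer ends exactly at the limit), the
old check fails and the new one passes.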

The following program fails to use the last page as a buffer
due to the wrong limit check:

 #include <sys/socket.h>
 #include <sys/mman.h>
 #include <assert.h>
 #include <stdio.h>

 #define PAGE_SIZE       (4096)
 #define LAST_PAGE       ((void*)(0x7fffffffe000))

 int main()
 {
        int fds[2], err;
        void * ptr = mmap(LAST_PAGE, PAGE_SIZE, PROT_READ | PROT_WRITE,
                          MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
        assert(ptr == LAST_PAGE);
        err = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
        assert(err == 0);
        err = send(fds[0], ptr, PAGE_SIZE, 0);
        perror("send");
        assert(err == PAGE_SIZE);
        err = recv(fds[1], ptr, PAGE_SIZE, MSG_WAITALL);
        perror("recv");
        assert(err == PAGE_SIZE);
        return 0;
 }

The other place checking the addr limit is the access_ok() function,
which is working properly. There's just a misleading comment
for the __range_not_ok() macro - which this patch fixes as well.

The last page of the user-space address range is a guard page, and
Brian Gerst observed that the guard page itself exists due to an erratum
on K8 CPUs (#121 Sequential Execution Across Non-Canonical Boundary Causes
Processor Hang).

However, the test code is using the last valid page before the guard page.
The bug is that the last byte before the guard page can't be read
because of the off-by-one error. The guard page is left in place.

This bug would normally not show up because the last page is
part of the process stack and never accessed via syscalls.

[WT: in 2.6.27 use include/asm-x86/uaccess.h]

Signed-off-by: Jiri Olsa 
Acked-by: Brian Gerst 
Acked-by: Linus Torvalds 
Link: http://lkml.kernel.org/r/1305210630-7136-1-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

x86, mm: Add __get_user_pages_fast()

2012-02-11T14:38:10+00:00

Introduce a gup_fast() variant which is usable from IRQ/NMI context.

[ WT: this one is only needed for next patch ]

Signed-off-by: Peter Zijlstra 
CC: Nick Piggin 
Cc: Mike Galbraith 
Cc: Paul Mackerras 
Cc: Arnaldo Carvalho de Melo 
LKML-Reference: 
Signed-off-by: Ingo Molnar 
Signed-off-by: Willy Tarreau

NLM: Don't hang forever on NLM unlock requests

2012-02-11T14:37:49+00:00

commit 0b760113a3a155269a3fba93a409c640031dd68f upstream.

If the NLM daemon is killed on the NFS server, we can currently end up
hanging forever on an 'unlock' request, instead of aborting. Basically,
if the rpcbind request fails, or the server keeps returning garbage, we
really want to quit instead of retrying.

Tested-by: Vasily Averin 
Signed-off-by: Trond Myklebust 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

seqlock: Don't smp_rmb in seqlock reader spin loop

2012-02-11T14:37:29+00:00

commit 5db1256a5131d3b133946fa02ac9770a784e6eb2 upstream.

Move the smp_rmb after cpu_relax loop in read_seqlock and add
ACCESS_ONCE to make sure the test and return are consistent.
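
The shape of the reader can be sketched in userspace with C11 atomics.
This is not the kernel code: struct demo_seqlock and both functions are
hypothetical, relaxed atomic loads stand in for ACCESS_ONCE, and an
acquire fence stands in for smp_rmb.

```c
#include <stdatomic.h>

struct demo_seqlock {
    atomic_uint sequence;   /* odd while a writer is inside the critical section */
};

/* Wait until no writer is active, then return the sequence that was observed. */
static unsigned demo_read_seqbegin(struct demo_seqlock *sl)
{
    unsigned seq;

    /* Spin on plain loads only; there is no barrier inside the loop. */
    while ((seq = atomic_load_explicit(&sl->sequence,
                                       memory_order_relaxed)) & 1)
        ;   /* the kernel would call cpu_relax() here */

    /* A single read barrier after the loop. */
    atomic_thread_fence(memory_order_acquire);

    /* The value returned is the same one that passed the test above. */
    return seq;
}

/* Returns nonzero if a writer ran since 'start' was sampled. */
static int demo_read_seqretry(struct demo_seqlock *sl, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&sl->sequence, memory_order_relaxed) != start;
}
```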

A multi-threaded core in the lab didn't like the update
from 2.6.35 to 2.6.36, to the point that it would hang during
boot when multiple threads were active.  Bisection showed
af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 (clockevents:
Remove the per cpu tick skew) as the culprit, and this is
supported by stack traces showing xtime_lock waits including
tick_do_update_jiffies64 and/or update_vsyscall.

Experimentation showed the combination of cpu_relax and smp_rmb
was significantly slowing the progress of other threads sharing
the core, and this patch is effective in avoiding the hang.

One theory is that the rmb affects the whole core while the
cpu_relax causes a resource rebalance flush; together they
cause an interference cadence that is unbroken when the seqlock
reader has interrupts disabled.

At first I was confused why the refactor in
3c22cd5709e8143444a6d08682a87f4c57902df3 (kernel: optimise
seqlock) didn't affect this patch application, but after some
study I found that it affected seqcount, not seqlock. The new seqcount
was not factored back into the seqlock.  I defer that to the future.

While the removal of the timer interrupt offset created
contention for the xtime lock while a cpu does the
additional work to update the system clock, the seqlock
implementation with the tight rmb spin loop goes back much
further, and was just waiting for the right trigger.

Signed-off-by: Milton Miller 
Cc: 
Cc: Linus Torvalds 
Cc: Andi Kleen 
Cc: Nick Piggin 
Cc: Benjamin Herrenschmidt 
Cc: Anton Blanchard 
Cc: Paul McKenney 
Acked-by: Eric Dumazet 
Link: http://lkml.kernel.org/r/%3Cseqlock-rmb%40mdm.bga.com%3E
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

next_pidmap: fix overflow condition

2011-04-30T14:53:38+00:00

commit c78193e9c7bcbf25b8237ad0dec82f805c4ea69b upstream.

next_pidmap() just quietly accepted whatever 'last' pid that was passed
in, which is not all that safe when one of the users is /proc.

Admittedly the proc code should do some sanity checking on the range
(and that will be the next commit), but that doesn't mean that the
helper functions should just do that pidmap pointer arithmetic without
checking the range of its arguments.

So clamp 'last' to PID_MAX_LIMIT.  The fact that we then do "last+1"
doesn't really matter, the for-loop does check against the end of the
pidmap array properly (it's only the actual pointer arithmetic overflow
case we need to worry about, and going one bit beyond isn't going to
overflow).
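
A sketch of the range check, under stated assumptions: the DEMO_*
constants and demo_pidmap_index() are hypothetical, the limit and page
size are illustrative values, and rejecting an out-of-range 'last' with -1
is just one way to bound it.

```c
/* Illustrative values only; the real limits depend on the kernel configuration. */
#define DEMO_PID_MAX_LIMIT   (4u * 1024u * 1024u)
#define DEMO_BITS_PER_PAGE   (4096u * 8u)
#define DEMO_PIDMAP_ENTRIES  \
    ((DEMO_PID_MAX_LIMIT + DEMO_BITS_PER_PAGE - 1) / DEMO_BITS_PER_PAGE)

/*
 * Returns the pidmap page index that holds pid 'last + 1', or -1 if 'last'
 * is out of range. Bounding 'last' first keeps the index at most one entry
 * past the end of the array, which the caller's loop can check for.
 */
static int demo_pidmap_index(unsigned int last)
{
    if (last >= DEMO_PID_MAX_LIMIT)
        return -1;
    return (int)((last + 1) / DEMO_BITS_PER_PAGE);
}
```

Without the check, a 'last' such as 0xffffffff taken from /proc would
produce an index far outside the array before any loop bound is consulted.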

[ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]

Reported-by: Tavis Ormandy 
Analyzed-by: Robert Święcki 
Cc: Eric W. Biederman 
Cc: Pavel Emelyanov 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

exec: copy-and-paste the fixes into compat_do_execve() paths

2011-04-30T14:53:36+00:00

commit 114279be2120a916e8a04feeb2ac976a10016f2f upstream.

Note: this patch targets 2.6.37 and tries to be as simple as possible.
That is why it adds more copy-and-paste horror into fs/compat.c and
uglifies fs/exec.c; this will be cleaned up later.

compat_copy_strings() plays with bprm->vma/mm directly and thus has
two problems: it lacks the RLIMIT_STACK check, and argv/envp memory
is not visible to the OOM killer.

Export acct_arg_size() and get_arg_page(), change compat_copy_strings()
to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0)
as do_execve() does.

Add the fatal_signal_pending/cond_resched checks into compat_count() and
compat_copy_strings(), this matches the code in fs/exec.c and certainly
makes sense.

Signed-off-by: Oleg Nesterov 
Cc: KOSAKI Motohiro 
Signed-off-by: Linus Torvalds 
Signed-off-by: Andi Kleen 
Cc: Moritz Muehlenhoff 
Signed-off-by: Greg Kroah-Hartman