linux-toradex.git/kernel, branch v3.2.23

tracing: change CPU ring buffer state from tracing_cpumask

2012-07-12T03:32:09+00:00

commit 71babb2705e2203a64c27ede13ae3508a0d2c16c upstream.

According to Documentation/trace/ftrace.txt:

tracing_cpumask:

        This is a mask that lets the user only trace
        on specified CPUS. The format is a hex string
        representing the CPUS.

The tracing_cpumask currently doesn't affect the tracing state of
per-CPU ring buffers.

This patch enables/disables CPU recording as its corresponding bit in
tracing_cpumask is set/unset.
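The per-CPU decision this patch describes can be sketched in userspace C as below. This is a hedged sketch, not the kernel code. The `recording[]` array, `NR_CPUS_SKETCH` and `apply_tracing_cpumask()` are hypothetical stand-ins for the per-CPU ring buffers. The comments name `ring_buffer_record_disable_cpu()` / `ring_buffer_record_enable_cpu()` only as the kernel helpers this step roughly corresponds to.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 64   /* assumption: one mask bit per CPU, at most 64 */

/* Hypothetical per-CPU "is the ring buffer recording" flags. */
static bool recording[NR_CPUS_SKETCH];

/* Compare the old and new masks bit by bit: a cleared bit stops recording on
 * that CPU, a newly set bit resumes it, unchanged bits are left alone. */
static void apply_tracing_cpumask(uint64_t old_mask, uint64_t new_mask)
{
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        uint64_t bit = UINT64_C(1) << cpu;

        if ((old_mask & bit) && !(new_mask & bit))
            recording[cpu] = false;   /* kernel: ring_buffer_record_disable_cpu() */
        else if (!(old_mask & bit) && (new_mask & bit))
            recording[cpu] = true;    /* kernel: ring_buffer_record_enable_cpu() */
    }
}
```

For example, changing the mask from hex `f` to hex `5` corresponds to `apply_tracing_cpumask(0xf, 0x5)`. That call stops recording on CPUs 1 and 3 and leaves CPUs 0 and 2 recording.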

Link: http://lkml.kernel.org/r/1336096792-25373-3-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Laurent Chavey 
Cc: Justin Teravest 
Cc: David Sharp 
Signed-off-by: Vaibhav Nagarnaik 
Signed-off-by: Steven Rostedt 
Signed-off-by: Ben Hutchings

splice: fix racy pipe->buffers uses

2012-07-12T03:31:59+00:00

commit 047fe3605235888f3ebcda0c728cb31937eadfe6 upstream.

Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
by splice_shrink_spd() called from vmsplice_to_pipe().

commit 35f3d14dbbc5 ("pipe: add support for shrinking and growing pipes")
added the ability to adjust pipe->buffers.

The problem is that some paths don't hold the pipe mutex and assume
pipe->buffers doesn't change for their duration.

Fix this by adding an nr_pages_max field to struct splice_pipe_desc and
using it in place of pipe->buffers where appropriate.

splice_shrink_spd() loses its struct pipe_inode_info argument.
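The snapshot idea can be sketched in userspace C as below. This is a sketch under assumptions, not the kernel code. `struct pipe_sketch`, `struct spd_sketch`, `spd_grow_sketch()` and `spd_shrink_sketch()` are hypothetical simplified stand-ins. The key property is that pipe->buffers is read once, and the later free/no-free decision uses the stored nr_pages_max rather than a fresh read. With a fresh read, a concurrent resize could free an on-stack array or leak a heap one. That also explains why the shrink side no longer needs the pipe argument.

```c
#include <stdlib.h>

#define PIPE_DEF_BUFFERS 16  /* default pipe capacity, as in the kernel */

/* Hypothetical simplified stand-ins; not the kernel's definitions. */
struct pipe_sketch {
    unsigned int buffers;        /* can be resized while we run unlocked */
};

struct spd_sketch {
    void **pages;
    unsigned int nr_pages_max;   /* snapshot of pipe->buffers */
};

/* Use the caller's on-stack array unless the pipe is larger than default. */
static int spd_grow_sketch(struct spd_sketch *spd,
                           const struct pipe_sketch *pipe, void **stack_pages)
{
    spd->nr_pages_max = pipe->buffers;   /* read pipe->buffers exactly once */
    if (spd->nr_pages_max <= PIPE_DEF_BUFFERS) {
        spd->pages = stack_pages;
        return 0;
    }
    spd->pages = calloc(spd->nr_pages_max, sizeof(*spd->pages));
    return spd->pages ? 0 : -1;
}

/* Decide from the snapshot only, so no pipe argument is needed.
 * Returns 1 if heap memory was freed, 0 otherwise. */
static int spd_shrink_sketch(struct spd_sketch *spd)
{
    if (spd->nr_pages_max <= PIPE_DEF_BUFFERS)
        return 0;                        /* on-stack array: nothing to free */
    free(spd->pages);
    spd->pages = NULL;
    return 1;
}
```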

Reported-by: Dave Jones 
Signed-off-by: Eric Dumazet 
Cc: Jens Axboe 
Cc: Alexander Viro 
Cc: Tom Herbert 
Tested-by: Dave Jones 
Signed-off-by: Jens Axboe 
[bwh: Backported to 3.2:
 - Adjust context in vmsplice_to_pipe()
 - Update one more call to splice_shrink_spd(), from skb_splice_bits()]
Signed-off-by: Ben Hutchings

sched: Fix the relax_domain_level boot parameter

2012-06-19T22:18:06+00:00

commit a841f8cef4bb124f0f5563314d0beaf2e1249d72 upstream.

It does not get processed because sched_domain_level_max is 0 at the
time that setup_relax_domain_level() is run.

Simply accept the value as it is, as we don't know the value of
sched_domain_level_max until sched domain construction is completed.

Fix sched_relax_domain_level in cpuset. The build_sched_domain() routine
calls set_domain_attribute() before setting sd->level; however,
set_domain_attribute() relies on sd->level to decide whether idle load
balancing will be off or on.
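Both parts of this fix can be sketched in userspace C. The boot value is accepted unchecked, and the domain level is set before the level-dependent attribute is applied. This is a hedged sketch. `struct sd_sketch`, `relax_request` and the `*_sketch()` functions are hypothetical. The rule "idle balancing stays on when the requested level is at least the domain's level" is our reading of relax_domain_level, not a quotation of the kernel source.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct sched_domain. */
struct sd_sketch {
    int level;
    bool idle_balance;   /* stands in for the idle load-balancing flags */
};

static int relax_request = -1;   /* -1: no boot parameter given */

/* Accept the boot value as-is: the maximum domain level is unknown this early. */
static void setup_relax_level_sketch(int val)
{
    relax_request = val;
}

/* Reads sd->level, so it is only correct once the level has been set. */
static void set_attribute_sketch(struct sd_sketch *sd)
{
    if (relax_request < 0)
        return;                          /* keep the default */
    sd->idle_balance = relax_request >= sd->level;
}

static void build_domain_sketch(struct sd_sketch *sd, int level)
{
    sd->idle_balance = true;             /* default for this sketch */
    sd->level = level;                   /* set the level first ... */
    set_attribute_sketch(sd);            /* ... then apply the attribute */
}
```

If the last two statements in `build_domain_sketch()` were swapped, a zero-initialized domain would be judged as level 0. Every domain would then keep idle balancing regardless of the request, which is the bug described above.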

Signed-off-by: Dimitri Sivanich 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20120605184436.GA15668@sgi.com
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.2: adjust the filename]
Signed-off-by: Ben Hutchings

mm/fork: fix overflow in vma length when copying mmap on clone

2012-06-10T13:41:44+00:00

commit 7edc8b0ac16cbaed7cb4ea4c6b95ce98d2997e84 upstream.

The vma length in dup_mmap is calculated and stored in an unsigned int,
which is insufficient and hence overflows for very large maps (beyond
16TB). The following program demonstrates this:

#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define GIG (1024 * 1024 * 1024L)
#define EXTENT 16393

int main(void)
{
        int i;
        pid_t r;
        void *m;

        for (i = 0; i < EXTENT; i++) {
                /* Map 1 GiB per iteration; adjacent anonymous mappings
                 * can merge into one ever-growing VMA. */
                m = mmap(NULL, (size_t) GIG,
                         PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

                if (m == MAP_FAILED)
                        perror("MMAP Failed");
                else
                        printf("%d : MMAP returned %p\n", i, m);

                /* Flush so buffered output is not duplicated in the child. */
                fflush(stdout);

                /* fork() copies the parent's mappings via dup_mmap(). */
                r = fork();

                if (r == 0) {
                        printf("%d: succeeded\n", i);
                        return 0;
                } else if (r < 0)
                        perror("FORK Failed");
                else
                        wait(NULL);
        }
        return 0;
}

Increase the storage size of the result to unsigned long, which is
sufficient for storing the difference between addresses.
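The 16TB threshold follows from simple arithmetic. It assumes, as that threshold implies, that the length is kept as a page count with 4 KiB pages. 2^32 pages of 4 KiB is exactly 16 TiB, and the program above maps 16393 GiB, slightly more. The helpers below are hypothetical and only illustrate the truncation. uint64_t plays the role that unsigned long plays on a 64-bit kernel.

```c
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12   /* assumption: 4 KiB pages */

/* Page count of a VMA spanning [start, end), kept at full 64-bit width. */
static uint64_t vma_pages_wide(uint64_t start, uint64_t end)
{
    return (end - start) >> PAGE_SHIFT_SKETCH;
}

/* The same count squeezed into an unsigned int, as before the fix. */
static unsigned int vma_pages_narrow(uint64_t start, uint64_t end)
{
    return (unsigned int)((end - start) >> PAGE_SHIFT_SKETCH);
}
```

For a 16393 GiB VMA the full count is 16393 * 262144 = 4297326592 pages. Truncated to 32 bits this leaves 2359296 pages, which is only 9 GiB.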

Signed-off-by: Siddhesh Poyarekar 
Cc: Tejun Heo 
Cc: Oleg Nesterov 
Cc: Jens Axboe 
Cc: Peter Zijlstra 
Acked-by: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

compat: Fix RT signal mask corruption via sigprocmask

2012-05-30T23:43:51+00:00

commit b7dafa0ef3145c31d7753be0a08b3cbda51f0209 upstream.

compat_sys_sigprocmask reads a smaller signal mask from userspace than
sigprocmask accepts for setting. As a result, the high word of
blocked.sig[0] is cleared, releasing any potentially blocked RT signal.

This was discovered via userspace code that relies on get/setcontext.
glibc's i386 versions of those functions use sigprocmask instead of
rt_sigprocmask to save/restore the signal mask, and thereby unblocked
RT signals.

As suggested by Linus, this replaces the sys_sigprocmask based compat
version with one that open-codes the required logic, including the merge
of the existing blocked set with the new one provided on SIG_SETMASK.
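The merge on SIG_SETMASK can be sketched on a single 64-bit mask word. The sketch assumes bit n-1 stands for signal n and a 64-bit kernel word receiving a 32-bit compat mask. `kset_word`, `compat_word`, `enum how_sketch` and `compat_apply()` are hypothetical names, not the kernel's. The point is that SIG_SETMASK replaces only the low 32 bits, so RT signals blocked in the high word stay blocked.

```c
#include <stdint.h>

typedef uint64_t kset_word;     /* hypothetical: first word of the kernel set */
typedef uint32_t compat_word;   /* what a 32-bit sigprocmask caller passes */

enum how_sketch { HOW_BLOCK, HOW_UNBLOCK, HOW_SETMASK };

static kset_word compat_apply(kset_word blocked, compat_word set,
                              enum how_sketch how)
{
    switch (how) {
    case HOW_BLOCK:
        return blocked | set;                  /* add the given signals */
    case HOW_UNBLOCK:
        return blocked & ~(kset_word)set;      /* high word stays untouched */
    case HOW_SETMASK:
    default:
        /* Merge: replace the low 32 bits, keep the existing high word. */
        return (blocked & ~(kset_word)UINT32_MAX) | set;
    }
}
```

Without the merge, SIG_SETMASK would simply return `(kset_word)set` and silently drop every blocked signal above bit 31.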

Signed-off-by: Jan Kiszka 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

workqueue: skip nr_running sanity check in worker_enter_idle() if trustee is active

2012-05-30T23:43:42+00:00

commit 544ecf310f0e7f51fa057ac2a295fc1b3b35a9d3 upstream.

worker_enter_idle() has WARN_ON_ONCE() which triggers if nr_running
isn't zero when every worker is idle. This can trigger spuriously
while a CPU is going down due to the way the trustee sets %WORKER_ROGUE
and zaps nr_running.

It first sets %WORKER_ROGUE on all workers without updating
nr_running, releases gcwq->lock, schedules, regrabs gcwq->lock and
then zaps nr_running. If the last running worker enters idle
in between, it sees a stale nr_running which hasn't been zapped yet
and triggers the WARN_ON_ONCE().

Fix it by performing the sanity check iff the trustee is idle.
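The guarded check can be sketched as a predicate that returns whether the warning would fire. This is a hedged sketch. `struct gcwq_sketch`, the two-value `enum trustee_state_sketch` and `idle_sanity_check_fires()` are hypothetical simplifications; the real gcwq has more trustee states. The idea is that nr_running is only trusted when no trustee is in the middle of rewriting it.

```c
#include <stdbool.h>

/* Hypothetical two-state simplification of the trustee state. */
enum trustee_state_sketch { TRUSTEE_IDLE_SKETCH, TRUSTEE_ACTIVE_SKETCH };

struct gcwq_sketch {
    enum trustee_state_sketch trustee_state;
    int nr_workers;
    int nr_idle;
    int nr_running;   /* may be stale while the trustee is mid-teardown */
};

/* Would the sanity warning fire when a worker enters idle? */
static bool idle_sanity_check_fires(const struct gcwq_sketch *g)
{
    return g->trustee_state == TRUSTEE_IDLE_SKETCH &&   /* the added guard */
           g->nr_workers == g->nr_idle &&               /* everyone idle */
           g->nr_running != 0;                          /* yet count nonzero */
}
```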

Signed-off-by: Tejun Heo 
Reported-by: "Paul E. McKenney" 
Signed-off-by: Ben Hutchings

namespaces, pid_ns: fix leakage on fork() failure

2012-05-20T21:56:32+00:00

commit 5e2bf0142231194d36fdc9596b36a261ed2b9fe7 upstream.

A fork() failure after namespace creation for a child cloned with
CLONE_NEWPID leaks pid_namespace/mnt_cache, because proc is mounted
during creation but not unmounted during cleanup. Call
pid_ns_release_proc() during cleanup.
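The shape of the fix is the usual goto-unwind error path: whatever a CLONE_NEWPID fork set up must be torn down on the failure label too. This is a hedged sketch. `struct child_sketch` and `fork_sketch()` are hypothetical. The booleans merely stand in for the pid namespace and its proc mount, and the marked line is where the real fix calls pid_ns_release_proc().

```c
#include <stdbool.h>

/* Hypothetical flags standing in for the new namespace and its proc mount. */
struct child_sketch {
    bool ns_alive;
    bool proc_mounted;
};

/* Resources are acquired in order and the failure label releases them in
 * reverse. Returns 0 on success, -1 on failure. */
static int fork_sketch(struct child_sketch *c, bool clone_newpid,
                       bool later_step_fails)
{
    c->ns_alive = true;                 /* namespaces created for the child */
    c->proc_mounted = clone_newpid;     /* proc mounted for a new pid namespace */

    if (later_step_fails)
        goto cleanup_namespaces;
    return 0;

cleanup_namespaces:
    if (clone_newpid)
        c->proc_mounted = false;        /* the added pid_ns_release_proc() step */
    c->ns_alive = false;                /* then drop the namespaces */
    return -1;
}
```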

Signed-off-by: Mike Galbraith 
Acked-by: Oleg Nesterov 
Reviewed-by: "Eric W. Biederman" 
Cc: Pavel Emelyanov 
Cc: Cyrill Gorcunov 
Cc: Louis Rilling 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

exit_signal: fix the "parent has changed security domain" logic

2012-05-11T12:15:04+00:00

commit b6e238dceed36891cc633167afe7151f1f3d83c5 upstream.

exit_notify() changes ->exit_signal if the parent already did exec.
This doesn't really work: we are not going to send the signal now
if there is another live thread or the exiting task is traced, and the
parent can exec before the last thread dies or the tracer detaches.

Move this check into do_notify_parent() which actually sends the
signal.
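The moved check can be sketched as a function that picks the signal at send time and leaves ->exit_signal untouched. This is a hedged sketch. `struct task_sketch` and `notify_signal()` are hypothetical. The exec-id comparison mirrors the self_exec_id / parent_exec_id fields named in this log: the child records its parent's exec counter at fork, and a mismatch means the parent has exec'd since.

```c
#include <signal.h>

/* Hypothetical simplified stand-in for the task fields involved. */
struct task_sketch {
    int exit_signal;
    unsigned int parent_exec_id;       /* parent's self_exec_id at fork time */
    const struct task_sketch *parent;
    unsigned int self_exec_id;         /* bumped on every exec */
};

/* Decide which signal to send right now; ->exit_signal itself is not changed. */
static int notify_signal(const struct task_sketch *tsk)
{
    int sig = tsk->exit_signal;

    if (sig != SIGCHLD && tsk->parent_exec_id != tsk->parent->self_exec_id)
        sig = SIGCHLD;                 /* parent changed domain: fall back */
    return sig;
}
```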

The user-visible change is that we do not change ->exit_signal,
and thus the exiting task is still a "clone child" for
do_wait()->eligible_child(__WCLONE). Hopefully this is fine; the
current logic is racy anyway.

Signed-off-by: Oleg Nesterov 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

exit_signal: simplify the "we have changed execution domain" logic

2012-05-11T12:15:03+00:00

commit e636825346b36a07ccfc8e30946d52855e21f681 upstream.

exit_notify() checks "tsk->self_exec_id != tsk->parent_exec_id"
to handle the "we have changed execution domain" case.

We can change de_thread() to always set ->exit_signal = SIGCHLD
and remove this check to simplify the code.

We could change setup_new_exec() instead; this looks more logical
because it increments ->self_exec_id. But note that de_thread()
already resets ->exit_signal if it changes the leader, so let's keep
both changes close to each other.

Note that we change ->exit_signal locklessly, which changes the rules.
Thereafter ->exit_signal is not stable under tasklist, but this is
fine: the only possible change is OLDSIG -> SIGCHLD. This can race
with eligible_child(), but the race is harmless. We can also race with
reparent_leader(), which changes our ->exit_signal in parallel, but
it makes the same change to SIGCHLD.

The noticeable user-visible change is that the execing task is not
"visible" to do_wait()->eligible_child(__WCLONE) right after exec.
To me this looks more logical, and it is consistent with the
multithreaded case.

Signed-off-by: Oleg Nesterov 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

sched: Fix nohz load accounting -- again!

2012-05-11T12:14:49+00:00

commit c308b56b5398779cd3da0f62ab26b0453494c3d4 upstream.

Various people reported nohz load tracking still being wrecked, but Doug
spotted the actual problem. We fold the nohz remainder in too soon,
causing us to lose samples and under-account.

So instead of playing catch-up up-front, always do a single load-fold
with whatever state we encounter and only then fold the nohz remainder
and play catch-up.
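The new ordering can be sketched with the familiar fixed-point load-average step, load = load*exp + active*(1 - exp). The sketch keeps the kernel's FSHIFT/FIXED_1/EXP_1 constants but simplifies everything else. `global_load_update()` and its parameters are hypothetical. Catch-up is done with a plain loop, whereas the kernel uses a fixed-point power helper, so rounding may differ slightly.

```c
#define FSHIFT   11                 /* bits of precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point */
#define EXP_1    1884               /* 1/exp(5sec/1min) in fixed point */

/* One exponential-decay step of the load average. */
static unsigned long calc_load(unsigned long load, unsigned long exp,
                               unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

/* Hypothetical sketch of the ordering described above:
 * 1. fold one period with whatever active count we found,
 * 2. only then apply the nohz remainder,
 * 3. then age the average over the periods missed while idle. */
static unsigned long global_load_update(unsigned long avenrun,
                                        long *active_tasks, long nohz_delta,
                                        unsigned int missed_periods)
{
    long a = *active_tasks > 0 ? *active_tasks : 0;

    avenrun = calc_load(avenrun, EXP_1, (unsigned long)a * FIXED_1);

    *active_tasks += nohz_delta;
    a = *active_tasks > 0 ? *active_tasks : 0;
    for (unsigned int i = 0; i < missed_periods; i++)
        avenrun = calc_load(avenrun, EXP_1, (unsigned long)a * FIXED_1);
    return avenrun;
}
```

Consider starting from an average of 0 with 2 runnable tasks and a nohz remainder of -2. The first fold still counts the 2 tasks, giving 328, or about 0.16 in fixed point. Folding the remainder first would have produced 0, which is the lost sample described above.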

Reported-by: Doug Smythies 
Reported-by: Lesław Kopeć
Reported-by: Aman Gupta 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/n/tip-4v31etnhgg9kwd6ocgx3rxl8@git.kernel.org
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.2: change filename]
Signed-off-by: Ben Hutchings