linux-toradex.git/kernel, branch v2.6.32.5

sched: Fix task priority bug

2010-01-22T23:18:40+00:00

commit 57785df5ac53c70da9fb53696130f3c551bfe1f9 upstream.

83f9ac removed a call to effective_prio() in wake_up_new_task(), which
leads to tasks running at MAX_PRIO.

This is caused by the idle thread being set to MAX_PRIO before forking
off init. O(1) used that to make sure idle was always preempted, CFS
uses check_preempt_curr_idle() for that so we can savely remove this bit
of legacy code.

Reported-by: Mike Galbraith 
Tested-by: Mike Galbraith 
Signed-off-by: Peter Zijlstra 
LKML-Reference: <1259754383.4003.610.camel@laptop>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

2010-01-22T23:18:30+00:00

commit b9f8fcd55bbdb037e5332dbdb7b494f0b70861ac upstream.

Relax stable-sched-clock architectures to not save/disable/restore
hardirqs in cpu_clock().

The background is that I was trying to resolve a sparc64 perf
issue when I discovered this problem.

On sparc64 I implement pseudo NMIs by simply running the kernel
at IRQ level 14 when local_irq_disable() is called, this allows
performance counter events to still come in at IRQ level 15.

This doesn't work if any code in an NMI handler does
local_irq_save() or local_irq_disable() since the "disable" will
kick us back to cpu IRQ level 14 thus letting NMIs back in and
we recurse.

The only path which that does that in the perf event IRQ
handling path is the code supporting frequency based events.  It
uses cpu_clock().

cpu_clock() simply invokes sched_clock() with IRQs disabled.

And that's a fundamental bug all on it's own, particularly for
the HAVE_UNSTABLE_SCHED_CLOCK case.  NMIs can thus get into the
sched_clock() code interrupting the local IRQ disable code
sections of it.

Furthermore, for the not-HAVE_UNSTABLE_SCHED_CLOCK case, the IRQ
disabling done by cpu_clock() is just pure overhead and
completely unnecessary.

So the core problem is that sched_clock() is not NMI safe, but
we are invoking it from NMI contexts in the perf events code
(via cpu_clock()).

A less important issue is the overhead of IRQ disabling when it
isn't necessary in cpu_clock().

CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures are not
affected by this patch.

Signed-off-by: David S. Miller 
Acked-by: Peter Zijlstra 
Cc: Mike Galbraith 
LKML-Reference: <20091213.182502.215092085.davem@davemloft.net>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

futexes: Remove rw parameter from get_futex_key()

2010-01-22T23:18:11+00:00

commit 7485d0d3758e8e6491a5c9468114e74dc050785d upstream.

Currently, futexes have two problem:

A) The current futex code doesn't handle private file mappings properly.

get_futex_key() uses PageAnon() to distinguish file and
anon, which can cause the following bad scenario:

  1) thread-A call futex(private-mapping, FUTEX_WAIT), it
     sleeps on file mapping object.
  2) thread-B writes a variable and it makes it cow.
  3) thread-B calls futex(private-mapping, FUTEX_WAKE), it
     wakes up blocked thread on the anonymous page. (but it's nothing)

B) Current futex code doesn't handle zero page properly.

Read mode get_user_pages() can return zero page, but current
futex code doesn't handle it at all. Then, zero page makes
infinite loop internally.

The solution is to use write mode get_user_page() always for
page lookup. It prevents the lookup of both file page of private
mappings and zero page.

Performance concerns:

Probaly very little, because glibc always initialize variables
for futex before to call futex(). It means glibc users never see
the overhead of this patch.

Compatibility concerns:

This patch has few compatibility issues. After this patch,
FUTEX_WAIT require writable access to futex variables (read-only
mappings makes EFAULT). But practically it's not a problem,
glibc always initalizes variables for futexes explicitly - nobody
uses read-only mappings.

Reported-by: Hugh Dickins 
Signed-off-by: KOSAKI Motohiro 
Acked-by: Peter Zijlstra 
Acked-by: Darren Hart 
Cc: Linus Torvalds 
Cc: KAMEZAWA Hiroyuki 
Cc: Nick Piggin 
Cc: Ulrich Drepper 
LKML-Reference: <20100105162633.45A2.A69D9226@jp.fujitsu.com>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

module: handle ppc64 relocating kcrctabs when CONFIG_RELOCATABLE=y

2010-01-18T18:19:51+00:00

commit d4703aefdbc8f9f347f6dcefcddd791294314eb7 upstream.

powerpc applies relocations to the kcrctab.  They're absolute symbols,
but it's not completely unreasonable: other archs may too, but the
relocation is often 0.

http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-November/077972.html

Inspired-by: Neil Horman 
Signed-off-by: Rusty Russell 
Tested-by: Neil Horman 
Acked-by: Paul Mackerras 
Signed-off-by: Greg Kroah-Hartman

fix more leaks in audit_tree.c tag_chunk()

2010-01-18T18:19:50+00:00

commit b4c30aad39805902cf5b855aa8a8b22d728ad057 upstream.

Several leaks in audit_tree didn't get caught by commit
318b6d3d7ddbcad3d6867e630711b8a705d873d7, including the leak on normal
exit in case of multiple rules refering to the same chunk.

Signed-off-by: Al Viro 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

fix braindamage in audit_tree.c untag_chunk()

2010-01-18T18:19:50+00:00

commit 6f5d51148921c242680a7a1d9913384a30ab3cbe upstream.

... aka "Al had badly fscked up when writing that thing and nobody
noticed until Eric had fixed leaks that used to mask the breakage".

The function essentially creates a copy of old array sans one element
and replaces the references to elements of original (they are on cyclic
lists) with those to corresponding elements of new one.  After that the
old one is fair game for freeing.

First of all, there's a dumb braino: when we get to list_replace_init we
use indices for wrong arrays - position in new one with the old array
and vice versa.

Another bug is more subtle - termination condition is wrong if the
element to be excluded happens to be the last one.  We shouldn't go
until we fill the new array, we should go until we'd finished the old
one.  Otherwise the element we are trying to kill will remain on the
cyclic lists...

That crap used to be masked by several leaks, so it was not quite
trivial to hit.  Eric had fixed some of those leaks a while ago and the
shit had hit the fan...

Signed-off-by: Al Viro 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

kernel/sysctl.c: fix stable merge error in NOMMU mmap_min_addr

2010-01-18T18:19:49+00:00

Stable commit 0399123f3dcce1a515d021107ec0fb4413ca3efa didn't match the
original upstream commit.  The CONFIG_MMU check was added much too early
in the list disabling a lot of proc entries in the process.

Signed-off-by: Mike Frysinger 
Signed-off-by: Greg Kroah-Hartman

kernel/signal.c: fix kernel information leak with print-fatal-signals=1

2010-01-18T18:19:33+00:00

commit b45c6e76bc2c72f6426c14bed64fdcbc9bf37cb0 upstream.

When print-fatal-signals is enabled it's possible to dump any memory
reachable by the kernel to the log by simply jumping to that address from
user space.

Or crash the system if there's some hardware with read side effects.

The fatal signals handler will dump 16 bytes at the execution address,
which is fully controlled by ring 3.

In addition when something jumps to a unmapped address there will be up to
16 additional useless page faults, which might be potentially slow (and at
least is not very efficient)

Fortunately this option is off by default and only there on i386.

But fix it by checking for kernel addresses and also stopping when there's
a page fault.

Signed-off-by: Andi Kleen 
Cc: Ingo Molnar 
Cc: Oleg Nesterov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

cgroups: fix 2.6.32 regression causing BUG_ON() in cgroup_diput()

2010-01-18T18:19:32+00:00

commit bd4f490a079730aadfaf9a728303ea0135c01945 upstream.

The LTP cgroup test suite generates a "kernel BUG at kernel/cgroup.c:790!"
here in cgroup_diput():

                 /*
                  * if we're getting rid of the cgroup, refcount should ensure
                  * that there are no pidlists left.
                  */
                 BUG_ON(!list_empty(&cgrp->pidlists));

The cgroup pidlist rework in 2.6.32 generates the BUG_ON, which is caused
when pidlist_array_load() calls cgroup_pidlist_find():

(1) if a matching cgroup_pidlist is found, it down_write's the mutex of the
     pre-existing cgroup_pidlist, and increments its use_count.
(2) if no matching cgroup_pidlist is found, then a new one is allocated, it
     down_write's its mutex, and the use_count is set to 0.
(3) the matching, or new, cgroup_pidlist gets returned back to pidlist_array_load(),
     which increments its use_count -- regardless whether new or pre-existing --
     and up_write's the mutex.

So if a matching list is ever encountered by cgroup_pidlist_find() during
the life of a cgroup directory, it results in an inflated use_count value,
preventing it from ever getting released by cgroup_release_pid_array().
Then if the directory is subsequently removed, cgroup_diput() hits the
BUG_ON() when it finds that the directory's cgroup is still populated with
a pidlist.

The patch simply removes the use_count increment when a matching pidlist
is found by cgroup_pidlist_find(), because it gets bumped by the calling
pidlist_array_load() function while still protected by the list's mutex.

Signed-off-by: Dave Anderson 
Reviewed-by: Li Zefan 
Acked-by: Ben Blum 
Cc: Paul Menage 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

modules: Skip empty sections when exporting section notes

2010-01-18T18:19:14+00:00

commit 10b465aaf9536ee5a16652fa0700740183d48ec9 upstream.

Commit 35dead4 "modules: don't export section names of empty sections
via sysfs" changed the set of sections that have attributes, but did
not change the iteration over these attributes in add_notes_attrs().
This can lead to add_notes_attrs() creating attributes with the wrong
names or with null name pointers.

Introduce a sect_empty() function and use it in both add_sect_attrs()
and add_notes_attrs().

Reported-by: Martin Michlmayr 
Signed-off-by: Ben Hutchings 
Tested-by: Martin Michlmayr 
Signed-off-by: Rusty Russell 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman