linux-toradex.git/kernel, branch v2.6.33.5

profile: fix stats and data leakage

2010-05-26T21:32:09+00:00

commit 16a2164bb03612efe79a76c73da6da44445b9287 upstream.

If the kernel is large or the profiling step small, /proc/profile
leaks data and readprofile shows silly stats, until readprofile -r
has reset the buffer: clear the prof_buffer when it is vmalloc()ed.

Signed-off-by: Hugh Dickins 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

revert "procfs: provide stack information for threads" and its fixup commits

2010-05-26T21:32:04+00:00

commit 34441427aab4bdb3069a4ffcda69a99357abcb2e upstream.

Originally, commit d899bf7b ("procfs: provide stack information for
threads") attempted to introduce a new feature for showing where the
threadstack was located and how many pages are being utilized by the
stack.

Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was
applied to fix the NO_MMU case.

Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on
64-bit") was applied to fix a bug in ia32 executables being loaded.

Commit 9ebd4eba7 ("procfs: fix /proc//stat stack pointer for kernel
threads") was applied to fix a bug which had kernel threads printing a
userland stack address.

Commit 1306d603f ('proc: partially revert "procfs: provide stack
information for threads"') was then applied to revert the stack pages
being used to solve a significant performance regression.

This patch nearly undoes the effect of all these patches.

The reason for reverting these is it provides an unusable value in
field 28.  For x86_64, a fork will result in the task->stack_start
value being updated to the current user top of stack and not the stack
start address.  This unpredictability of the stack_start value makes
it worthless.  That includes the intended use of showing how much stack
space a thread has.

Other architectures will get different values.  As an example, ia64
gets 0.  The do_fork() and copy_process() functions appear to treat the
stack_start and stack_size parameters as architecture specific.

I only partially reverted c44972f1 ("procfs: disable per-task stack usage
on NOMMU") .  If I had completely reverted it, I would have had to change
mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is
configured.  Since I could not test the builds without significant effort,
I decided to not change mm/Makefile.

I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack
information for threads on 64-bit") .  I left the KSTK_ESP() change in
place as that seemed worthwhile.

Signed-off-by: Robin Holt 
Cc: Stefani Seibold 
Cc: KOSAKI Motohiro 
Cc: Michal Simek 
Cc: Ingo Molnar 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

CRED: Fix a race in creds_are_invalid() in credentials debugging

2010-05-12T22:02:50+00:00

commit e134d200d57d43b171dcb0b55c178a1a0c7db14a upstream.

creds_are_invalid() reads both cred->usage and cred->subscribers and then
compares them to make sure the number of processes subscribed to a cred struct
never exceeds the refcount of that cred struct.

The problem is that this can cause a race with both copy_creds() and
exit_creds() as the two counters, whilst they are of atomic_t type, are only
atomic with respect to themselves, and not atomic with respect to each other.

This means that if creds_are_invalid() can read the values on one CPU whilst
they're being modified on another CPU, and so can observe an evolving state in
which the subscribers count now is greater than the usage count a moment
before.

Switching the order in which the counts are read cannot help, so the thing to
do is to remove that particular check.

I had considered rechecking the values to see if they're in flux if the test
fails, but I can't guarantee they won't appear the same, even if they've
changed several times in the meantime.

Note that this can only happen if CONFIG_DEBUG_CREDENTIALS is enabled.

The problem is only likely to occur with multithreaded programs, and can be
tested by the tst-eintr1 program from glibc's "make check".  The symptoms look
like:

	CRED: Invalid credentials
	CRED: At include/linux/cred.h:240
	CRED: Specified credentials: ffff88003dda5878 [real][eff]
	CRED: ->magic=43736564, put_addr=(null)
	CRED: ->usage=766, subscr=766
	CRED: ->*uid = { 0,0,0,0 }
	CRED: ->*gid = { 0,0,0,0 }
	CRED: ->security is ffff88003d72f538
	CRED: ->security {359, 359}
	------------[ cut here ]------------
	kernel BUG at kernel/cred.c:850!
	...
	RIP: 0010:[]  [] __invalid_creds+0x4e/0x52
	...
	Call Trace:
	 [] copy_creds+0x6b/0x23f

Note the ->usage=766 and subscr=766.  The values appear the same because
they've been re-read since the check was made.

Reported-by: Roland McGrath 
Signed-off-by: David Howells 
Signed-off-by: James Morris 
Signed-off-by: Greg Kroah-Hartman

perf: Fix resource leak in failure path of perf_event_open()

2010-05-12T22:02:42+00:00

commit 048c852051d2bd5da54a4488bc1f16b0fc74c695 upstream.

perf_event_open() kfrees event after init failure which doesn't
release all resources allocated by perf_event_alloc().  Use
free_event() instead.

Signed-off-by: Tejun Heo 
Cc: Peter Zijlstra 
Cc: Paul Mackerras 
Cc: Arnaldo Carvalho de Melo 
LKML-Reference: <4BDBE237.1040809@kernel.org>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

sched: Use proper type in sched_getaffinity()

2010-04-26T14:48:04+00:00

commit 8bc037fb89bb3104b9ae290d18c877624cd7d9cc upstream.

Using the proper type fixes the following compiler warning:

  kernel/sched.c:4850: warning: comparison of distinct pointer types lacks a cast

Signed-off-by: KOSAKI Motohiro 
Cc: torvalds@linux-foundation.org
Cc: travis@sgi.com
Cc: peterz@infradead.org
Cc: drepper@redhat.com
Cc: rja@sgi.com
Cc: sharyath@in.ibm.com
Cc: steiner@sgi.com
LKML-Reference: <20100317090046.4C79.A69D9226@jp.fujitsu.com>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

lockdep: fix incorrect percpu usage

2010-04-26T14:48:03+00:00

The mainline kernel as of 2.6.34-rc5 is not affected by this problem because
commit 10fad5e46f6c7bdfb01b1a012380a38e3c6ab346 fixed it by refactoring.

lockdep fix incorrect percpu usage

Should use per_cpu_ptr() to obfuscate the per cpu pointers (RELOC_HIDE is needed
for per cpu pointers).

git blame points to commit:

lockdep.c: commit 8e18257d29238311e82085152741f0c3aa18b74d

But it's really just moving the code around. But it's enough to say that the
problems appeared before Jul 19 01:48:54 2007, which brings us back to 2.6.23.

It should be applied to stable 2.6.23.x to 2.6.33.x (or whichever of these
stable branches are still maintained).

(tested on 2.6.33.1 x86_64)

Signed-off-by: Mathieu Desnoyers 
CC: Randy Dunlap 
CC: Eric Dumazet 
CC: Rusty Russell 
CC: Peter Zijlstra 
CC: Tejun Heo 
CC: Ingo Molnar 
CC: Andrew Morton 
CC: Linus Torvalds 
CC: Greg Kroah-Hartman 
CC: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

modules: fix incorrect percpu usage

2010-04-26T14:48:03+00:00

Mainline does not need this fix, as commit
259354deaaf03d49a02dbb9975d6ec2a54675672 fixed the problem by refactoring.

Should use per_cpu_ptr() to obfuscate the per cpu pointers (RELOC_HIDE is needed
for per cpu pointers).

Introduced by commit:

module.c: commit 6b588c18f8dacfa6d7957c33c5ff832096e752d3

This patch should be queued for the stable branch, for kernels 2.6.29.x to
2.6.33.x.  (tested on 2.6.33.1 x86_64)

Signed-off-by: Mathieu Desnoyers 
CC: Randy Dunlap 
CC: Eric Dumazet 
CC: Rusty Russell 
CC: Peter Zijlstra 
CC: Tejun Heo 
CC: Ingo Molnar 
CC: Andrew Morton 
CC: Linus Torvalds 
CC: Greg Kroah-Hartman 
CC: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

sched: Fix sched_getaffinity()

2010-04-26T14:47:53+00:00

commit 84fba5ec91f11c0efb27d0ed6098f7447491f0df upstream.

taskset on 2.6.34-rc3 fails on one of my ppc64 test boxes with
the following error:

  sched_getaffinity(0, 16, 0x10029650030) = -1 EINVAL (Invalid argument)

This box has 128 threads and 16 bytes is enough to cover it.

Commit cd3d8031eb4311e516329aee03c79a08333141f1 (sched:
sched_getaffinity(): Allow less than NR_CPUS length) is
comparing this 16 bytes agains nr_cpu_ids.

Fix it by comparing nr_cpu_ids to the number of bits in the
cpumask we pass in.

Signed-off-by: Anton Blanchard 
Reviewed-by: KOSAKI Motohiro 
Cc: Sharyathi Nagesh 
Cc: Ulrich Drepper 
Cc: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: Jack Steiner 
Cc: Russ Anderson 
Cc: Mike Travis 
LKML-Reference: <20100406070218.GM5594@kryten>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

sched: sched_getaffinity(): Allow less than NR_CPUS length

2010-04-26T14:47:53+00:00

commit cd3d8031eb4311e516329aee03c79a08333141f1 upstream.

[ Note, this commit changes the syscall ABI for > 1024 CPUs systems. ]

Recently, some distro decided to use NR_CPUS=4096 for mysterious reasons.
Unfortunately, glibc sched interface has the following definition:

	# define __CPU_SETSIZE  1024
	# define __NCPUBITS     (8 * sizeof (__cpu_mask))
	typedef unsigned long int __cpu_mask;
	typedef struct
	{
	  __cpu_mask __bits[__CPU_SETSIZE / __NCPUBITS];
	} cpu_set_t;

It mean, if NR_CPUS is bigger than 1024, cpu_set_t makes an
ABI issue ...

More recently, Sharyathi Nagesh reported following test program makes
misterious syscall failure:

 -----------------------------------------------------------------------
 #define _GNU_SOURCE
 #include
 #include
 #include

 int main()
 {
     cpu_set_t set;
     if (sched_getaffinity(0, sizeof(cpu_set_t), &set) < 0)
         printf("\n Call is failing with:%d", errno);
 }
 -----------------------------------------------------------------------

Because the kernel assumes len argument of sched_getaffinity() is bigger
than NR_CPUS. But now it is not correct.

Now we are faced with the following annoying dilemma, due to
the limitations of the glibc interface built in years ago:

 (1) if we change glibc's __CPU_SETSIZE definition, we lost
     binary compatibility of _all_ application.

 (2) if we don't change it, we also lost binary compatibility of
     Sharyathi's use case.

Then, I would propse to change the rule of the len argument of
sched_getaffinity().

Old:
	len should be bigger than NR_CPUS
New:
	len should be bigger than maximum possible cpu id

This creates the following behavior:

 (A) In the real 4096 cpus machine, the above test program still
     return -EINVAL.

 (B) NR_CPUS=4096 but the machine have less than 1024 cpus (almost
     all machines in the world), the above can run successfully.

Fortunatelly, BIG SGI machine is mainly used for HPC use case. It means
they can rebuild their programs.

IOW we hope they are not annoyed by this issue ...

Reported-by: Sharyathi Nagesh 
Signed-off-by: KOSAKI Motohiro 
Acked-by: Ulrich Drepper 
Acked-by: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Cc: Jack Steiner 
Cc: Russ Anderson 
Cc: Mike Travis 
LKML-Reference: <20100312161316.9520.A69D9226@jp.fujitsu.com>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

genirq: Force MSI irq handlers to run with interrupts disabled

2010-04-26T14:47:49+00:00

commit 753649dbc49345a73a2454c770a3f2d54d11aec6 upstream.

Network folks reported that directing all MSI-X vectors of their multi
queue NICs to a single core can cause interrupt stack overflows when
enough interrupts fire at the same time.

This is caused by the fact that we run interrupt handlers by default
with interrupts enabled unless the driver reuqests the interrupt with
the IRQF_DISABLED set. The NIC handlers do not set this flag, so
simultaneous interrupts can nest unlimited and cause the stack
overflow.

The only safe counter measure is to run the interrupt handlers with
interrupts disabled. We can't switch to this mode in general right
now, but it is safe to do so for MSI interrupts.

Force IRQF_DISABLED for MSI interrupt handlers.

Signed-off-by: Thomas Gleixner 
Cc: Andi Kleen 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Alan Cox 
Cc: David Miller 
Cc: Greg Kroah-Hartman 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Greg Kroah-Hartman