linux-toradex.git/kernel, branch v2.6.33.4

CRED: Fix a race in creds_are_invalid() in credentials debugging

2010-05-12T22:02:50+00:00

commit e134d200d57d43b171dcb0b55c178a1a0c7db14a upstream.

creds_are_invalid() reads both cred->usage and cred->subscribers and then
compares them to make sure the number of processes subscribed to a cred struct
never exceeds the refcount of that cred struct.

The problem is that this can cause a race with both copy_creds() and
exit_creds() as the two counters, whilst they are of atomic_t type, are only
atomic with respect to themselves, and not atomic with respect to each other.

This means that creds_are_invalid() can read the values on one CPU whilst
they're being modified on another CPU, and so can observe an evolving state in
which the subscribers count now is greater than the usage count was a moment
before.
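
The interleaving can be replayed deterministically in userspace.  The sketch
below is illustrative only: struct fake_cred, racing_update() and
check_trips() are hypothetical stand-ins rather than kernel code, and the
"other CPU" is simulated by a callback that runs between the two reads.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two counters in struct cred. */
struct fake_cred {
	atomic_int usage;       /* reference count */
	atomic_int subscribers; /* number of tasks using these creds */
};

/* Simulates another CPU taking a reference and then subscribing. */
void racing_update(struct fake_cred *c)
{
	atomic_fetch_add(&c->usage, 1);
	atomic_fetch_add(&c->subscribers, 1);
}

/*
 * Models the removed check: each load is atomic, but the pair is not.
 * between_reads, if non-NULL, runs after usage has been sampled.
 */
bool check_trips(struct fake_cred *c,
		 void (*between_reads)(struct fake_cred *))
{
	int usage = atomic_load(&c->usage);

	if (between_reads)
		between_reads(c);

	return atomic_load(&c->subscribers) > usage;
}
```

Starting from equal counters, check_trips(&c, NULL) does not fire, whereas
check_trips(&c, racing_update) does, even though the counters are equal again
by the time anyone re-reads them.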

Switching the order in which the counts are read cannot help, so the thing to
do is to remove that particular check.

I had considered rechecking the values to see if they're in flux if the test
fails, but I can't guarantee they won't appear the same, even if they've
changed several times in the meantime.

Note that this can only happen if CONFIG_DEBUG_CREDENTIALS is enabled.

The problem is only likely to occur with multithreaded programs, and can be
tested by the tst-eintr1 program from glibc's "make check".  The symptoms look
like:

	CRED: Invalid credentials
	CRED: At include/linux/cred.h:240
	CRED: Specified credentials: ffff88003dda5878 [real][eff]
	CRED: ->magic=43736564, put_addr=(null)
	CRED: ->usage=766, subscr=766
	CRED: ->*uid = { 0,0,0,0 }
	CRED: ->*gid = { 0,0,0,0 }
	CRED: ->security is ffff88003d72f538
	CRED: ->security {359, 359}
	------------[ cut here ]------------
	kernel BUG at kernel/cred.c:850!
	...
	RIP: 0010:[]  [] __invalid_creds+0x4e/0x52
	...
	Call Trace:
	 [] copy_creds+0x6b/0x23f

Note the ->usage=766 and subscr=766.  The values appear the same because
they've been re-read since the check was made.

Reported-by: Roland McGrath 
Signed-off-by: David Howells 
Signed-off-by: James Morris 
Signed-off-by: Greg Kroah-Hartman

perf: Fix resource leak in failure path of perf_event_open()

2010-05-12T22:02:42+00:00

commit 048c852051d2bd5da54a4488bc1f16b0fc74c695 upstream.

perf_event_open() kfrees event after init failure which doesn't
release all resources allocated by perf_event_alloc().  Use
free_event() instead.
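
The general pattern is that an object which owns further resources has to be
released through its full teardown routine, not by freeing only the top-level
allocation.  A minimal userspace sketch, in which struct fake_event,
fake_event_alloc(), fake_free_event() and the live_allocs counter are all
hypothetical names and not the perf code:

```c
#include <stdlib.h>

/* Counts outstanding allocations so that a leak is observable. */
static int live_allocs;

void *counted_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

void counted_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Hypothetical event that owns a second allocation. */
struct fake_event {
	char *ctx;
};

struct fake_event *fake_event_alloc(void)
{
	struct fake_event *ev = counted_alloc(sizeof(*ev));

	if (!ev)
		return NULL;
	ev->ctx = counted_alloc(64);
	if (!ev->ctx) {
		counted_free(ev);
		return NULL;
	}
	return ev;
}

/* Full teardown: releases everything fake_event_alloc() acquired. */
void fake_free_event(struct fake_event *ev)
{
	if (!ev)
		return;
	counted_free(ev->ctx);
	counted_free(ev);
}
```

Calling counted_free(ev) alone on a failure path would leave ev->ctx
allocated; fake_free_event(ev) brings live_allocs back to zero.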

Signed-off-by: Tejun Heo 
Cc: Peter Zijlstra 
Cc: Paul Mackerras 
Cc: Arnaldo Carvalho de Melo 
LKML-Reference: <4BDBE237.1040809@kernel.org>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

sched: Use proper type in sched_getaffinity()

2010-04-26T14:48:04+00:00

commit 8bc037fb89bb3104b9ae290d18c877624cd7d9cc upstream.

Using the proper type fixes the following compiler warning:

  kernel/sched.c:4850: warning: comparison of distinct pointer types lacks a cast

Signed-off-by: KOSAKI Motohiro 
Cc: torvalds@linux-foundation.org
Cc: travis@sgi.com
Cc: peterz@infradead.org
Cc: drepper@redhat.com
Cc: rja@sgi.com
Cc: sharyath@in.ibm.com
Cc: steiner@sgi.com
LKML-Reference: <20100317090046.4C79.A69D9226@jp.fujitsu.com>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

lockdep: fix incorrect percpu usage

2010-04-26T14:48:03+00:00

The mainline kernel as of 2.6.34-rc5 is not affected by this problem because
commit 10fad5e46f6c7bdfb01b1a012380a38e3c6ab346 fixed it by refactoring.

lockdep fix incorrect percpu usage

Should use per_cpu_ptr() to obfuscate the per cpu pointers (RELOC_HIDE is needed
for per cpu pointers).

git blame points to commit:

lockdep.c: commit 8e18257d29238311e82085152741f0c3aa18b74d

But it's really just moving the code around. But it's enough to say that the
problems appeared before Jul 19 01:48:54 2007, which brings us back to 2.6.23.

It should be applied to stable 2.6.23.x to 2.6.33.x (or whichever of these
stable branches are still maintained).

(tested on 2.6.33.1 x86_64)

Signed-off-by: Mathieu Desnoyers 
CC: Randy Dunlap 
CC: Eric Dumazet 
CC: Rusty Russell 
CC: Peter Zijlstra 
CC: Tejun Heo 
CC: Ingo Molnar 
CC: Andrew Morton 
CC: Linus Torvalds 
CC: Greg Kroah-Hartman 
CC: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

modules: fix incorrect percpu usage

2010-04-26T14:48:03+00:00

Mainline does not need this fix, as commit
259354deaaf03d49a02dbb9975d6ec2a54675672 fixed the problem by refactoring.

Should use per_cpu_ptr() to obfuscate the per cpu pointers (RELOC_HIDE is needed
for per cpu pointers).

Introduced by commit:

module.c: commit 6b588c18f8dacfa6d7957c33c5ff832096e752d3

This patch should be queued for the stable branch, for kernels 2.6.29.x to
2.6.33.x.  (tested on 2.6.33.1 x86_64)

Signed-off-by: Mathieu Desnoyers 
CC: Randy Dunlap 
CC: Eric Dumazet 
CC: Rusty Russell 
CC: Peter Zijlstra 
CC: Tejun Heo 
CC: Ingo Molnar 
CC: Andrew Morton 
CC: Linus Torvalds 
CC: Greg Kroah-Hartman 
CC: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

sched: Fix sched_getaffinity()

2010-04-26T14:47:53+00:00

commit 84fba5ec91f11c0efb27d0ed6098f7447491f0df upstream.

taskset on 2.6.34-rc3 fails on one of my ppc64 test boxes with
the following error:

  sched_getaffinity(0, 16, 0x10029650030) = -1 EINVAL (Invalid argument)

This box has 128 threads and 16 bytes is enough to cover it.

Commit cd3d8031eb4311e516329aee03c79a08333141f1 (sched:
sched_getaffinity(): Allow less than NR_CPUS length) is
comparing these 16 bytes against nr_cpu_ids.

Fix it by comparing nr_cpu_ids to the number of bits in the
cpumask we pass in.
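
The unit mismatch is easy to see in isolation.  The sketch below is a
userspace model, not the kernel source: len_check_bytes() and
len_check_bits() are hypothetical helpers, and nr_cpu_ids is passed in as an
ordinary parameter.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Buggy model: compares a byte count against a CPU count. */
int len_check_bytes(size_t len, unsigned int nr_cpu_ids)
{
	return len < nr_cpu_ids ? -EINVAL : 0;
}

/* Fixed model: converts the buffer length to bits before comparing. */
int len_check_bits(size_t len, unsigned int nr_cpu_ids)
{
	return len * CHAR_BIT < nr_cpu_ids ? -EINVAL : 0;
}
```

With the 16-byte buffer and 128 threads from the report, the byte comparison
rejects the call while the bit comparison accepts it.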

Signed-off-by: Anton Blanchard 
Reviewed-by: KOSAKI Motohiro 
Cc: Sharyathi Nagesh 
Cc: Ulrich Drepper 
Cc: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: Jack Steiner 
Cc: Russ Anderson 
Cc: Mike Travis 
LKML-Reference: <20100406070218.GM5594@kryten>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

sched: sched_getaffinity(): Allow less than NR_CPUS length

2010-04-26T14:47:53+00:00

commit cd3d8031eb4311e516329aee03c79a08333141f1 upstream.

[ Note, this commit changes the syscall ABI for > 1024 CPUs systems. ]

Recently, some distro decided to use NR_CPUS=4096 for mysterious reasons.
Unfortunately, glibc sched interface has the following definition:

	# define __CPU_SETSIZE  1024
	# define __NCPUBITS     (8 * sizeof (__cpu_mask))
	typedef unsigned long int __cpu_mask;
	typedef struct
	{
	  __cpu_mask __bits[__CPU_SETSIZE / __NCPUBITS];
	} cpu_set_t;

This means that if NR_CPUS is bigger than 1024, cpu_set_t causes an
ABI issue ...

More recently, Sharyathi Nagesh reported that the following test program
triggers a mysterious syscall failure:

 -----------------------------------------------------------------------
 #define _GNU_SOURCE
 #include <stdio.h>
 #include <errno.h>
 #include <sched.h>

 int main()
 {
     cpu_set_t set;
     if (sched_getaffinity(0, sizeof(cpu_set_t), &set) < 0)
         printf("\n Call is failing with:%d", errno);
 }
 -----------------------------------------------------------------------

This fails because the kernel assumes that the len argument of
sched_getaffinity() is bigger than NR_CPUS, which is no longer a valid
assumption.

Now we are faced with the following annoying dilemma, due to
the limitations of the glibc interface built in years ago:

 (1) if we change glibc's __CPU_SETSIZE definition, we lose
     binary compatibility of _all_ applications.

 (2) if we don't change it, we also lose binary compatibility of
     Sharyathi's use case.

Therefore, I would propose changing the rule for the len argument of
sched_getaffinity().

Old:
	len should be bigger than NR_CPUS
New:
	len should be bigger than maximum possible cpu id

This creates the following behavior:

 (A) On a real 4096-CPU machine, the above test program still
     returns -EINVAL.

 (B) With NR_CPUS=4096 but a machine that has fewer than 1024 CPUs (almost
     all machines in the world), the above program runs successfully.
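
Cases (A) and (B) can be modelled with a single length check whose limit
changes meaning between the old and new rule.  This is a sketch under stated
assumptions: affinity_len_check() and GLIBC_CPU_SET_BYTES are hypothetical
names, lengths are converted to bits for the comparison, and the limit is
supplied by the caller instead of being read from the kernel.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Size in bytes of glibc's fixed cpu_set_t (__CPU_SETSIZE is 1024 bits). */
#define GLIBC_CPU_SET_BYTES (1024 / CHAR_BIT)

/*
 * Returns 0 if a buffer of len bytes is accepted, -EINVAL otherwise.
 * cpu_limit models NR_CPUS under the old rule and the number of
 * possible CPU ids on the running machine under the new rule.
 */
int affinity_len_check(size_t len, unsigned int cpu_limit)
{
	return len * CHAR_BIT < cpu_limit ? -EINVAL : 0;
}
```

With a limit of 4096 (old rule with NR_CPUS=4096, or case (A) under the new
rule) a cpu_set_t sized buffer is rejected; with a limit of, say, 64 possible
CPUs (case (B)) the same buffer is accepted.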

Fortunately, BIG SGI machines are mainly used for HPC use cases. It means
they can rebuild their programs.

IOW we hope they are not annoyed by this issue ...

Reported-by: Sharyathi Nagesh 
Signed-off-by: KOSAKI Motohiro 
Acked-by: Ulrich Drepper 
Acked-by: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Cc: Jack Steiner 
Cc: Russ Anderson 
Cc: Mike Travis 
LKML-Reference: <20100312161316.9520.A69D9226@jp.fujitsu.com>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

genirq: Force MSI irq handlers to run with interrupts disabled

2010-04-26T14:47:49+00:00

commit 753649dbc49345a73a2454c770a3f2d54d11aec6 upstream.

Network folks reported that directing all MSI-X vectors of their multi
queue NICs to a single core can cause interrupt stack overflows when
enough interrupts fire at the same time.

This is caused by the fact that we run interrupt handlers by default
with interrupts enabled unless the driver requests the interrupt with
the IRQF_DISABLED flag set. The NIC handlers do not set this flag, so
simultaneous interrupts can nest without limit and cause the stack
overflow.

The only safe countermeasure is to run the interrupt handlers with
interrupts disabled. We can't switch to this mode in general right
now, but it is safe to do so for MSI interrupts.

Force IRQF_DISABLED for MSI interrupt handlers.
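
In outline, the change amounts to setting the flag on the driver's behalf
whenever the interrupt is an MSI.  The following is a userspace model with
hypothetical types and names (struct fake_irqaction, struct fake_irq_desc,
FAKE_IRQF_DISABLED, fake_setup_irq()); it is not the kernel's request path.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for IRQF_DISABLED. */
#define FAKE_IRQF_DISABLED 0x20u

struct fake_irqaction {
	unsigned int flags; /* flags the driver passed when requesting */
};

struct fake_irq_desc {
	bool is_msi; /* true if the line is an MSI/MSI-X vector */
};

/*
 * Models handler registration: MSI handlers always get the flag,
 * whatever the driver asked for; other interrupts are left alone.
 */
void fake_setup_irq(const struct fake_irq_desc *desc,
		    struct fake_irqaction *action)
{
	if (desc->is_msi)
		action->flags |= FAKE_IRQF_DISABLED;
}

/* True if the handler would run with interrupts disabled. */
bool runs_with_irqs_off(const struct fake_irqaction *action)
{
	return (action->flags & FAKE_IRQF_DISABLED) != 0;
}
```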

Signed-off-by: Thomas Gleixner 
Cc: Andi Kleen 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Alan Cox 
Cc: David Miller 
Cc: Greg Kroah-Hartman 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Greg Kroah-Hartman

Freezer: Fix buggy resume test for tasks frozen with cgroup freezer

2010-04-26T14:47:47+00:00

commit 5a7aadfe2fcb0f69e2acc1fbefe22a096e792fc9 upstream.

When the cgroup freezer is used to freeze tasks we do not want to thaw
those tasks during resume. Currently we test the cgroup freezer
state of the resuming tasks to see if the cgroup is FROZEN.  If so
then we don't thaw the task. However, the FREEZING state also indicates
that the task should remain frozen.

This also avoids a problem pointed out by Oren Ladaan: the freezer state
transition from FREEZING to FROZEN is updated lazily when userspace reads
or writes the freezer.state file in the cgroup filesystem. This means that
resume will thaw tasks in cgroups which should be in the FROZEN state if
there is no read/write of the freezer.state file to trigger this
transition before suspend.

NOTE: Another "simple" solution would be to always update the cgroup
freezer state during resume. However it's a bad choice for several reasons:
Updating the cgroup freezer state is somewhat expensive because it requires
walking all the tasks in the cgroup and checking if they are each frozen.
Worse, this could easily make resume run in N^2 time where N is the number
of tasks in the cgroup. Finally, updating the freezer state from this code
path requires trickier locking because of the way locks must be ordered.

Instead of updating the freezer state we rely on the fact that lazy
updates only manage the transition from FREEZING to FROZEN. We know that
a cgroup with the FREEZING state may actually be FROZEN so test for that
state too. This makes sense in the resume path even for partially-frozen
cgroups -- those that really are FREEZING but not FROZEN.
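
The difference between the old and new resume test can be sketched as
follows.  The enum and both helpers are hypothetical names chosen for
illustration, and FAKE_THAWED is an assumed third state for cgroups that are
not being frozen at all.

```c
#include <stdbool.h>

/* Hypothetical model of the cgroup freezer states. */
enum fake_freezer_state {
	FAKE_THAWED,
	FAKE_FREEZING,
	FAKE_FROZEN,
};

/* Old resume test: only a cgroup already marked FROZEN keeps its
 * tasks frozen. */
bool skip_thaw_old(enum fake_freezer_state state)
{
	return state == FAKE_FROZEN;
}

/* New resume test: FREEZING may really be FROZEN because the
 * transition is recorded lazily, so both states are treated alike. */
bool skip_thaw_new(enum fake_freezer_state state)
{
	return state == FAKE_FREEZING || state == FAKE_FROZEN;
}
```

The two tests differ only for FAKE_FREEZING, which is exactly the case where
the old test would wrongly thaw the tasks.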

Reported-by: Oren Ladaan 
Signed-off-by: Matt Helsley 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman

softlockup: Stop spurious softlockup messages due to overflow

2010-04-01T23:01:51+00:00

commit 8c2eb4805d422bdbf60ba00ff233c794d23c3c00 upstream.

Ensure additions on touch_ts do not overflow.  This can occur
when the top 32 bits of the TSC reach 0xffffffff causing
additions to touch_ts to overflow and this in turn generates
spurious softlockup warnings.
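
A userspace sketch of the failure mode and of one wrap-safe way to write the
comparison.  lockup_naive() and lockup_wrap_safe() are hypothetical helpers;
the signed-difference trick is the same idea as the kernel's time_after()
family of macros, and it assumes the usual two's-complement conversion from
unsigned long to long.

```c
#include <stdbool.h>
#include <limits.h>

/* Naive test: breaks when touch_ts + thresh wraps past ULONG_MAX. */
bool lockup_naive(unsigned long now, unsigned long touch_ts,
		  unsigned long thresh)
{
	return now > touch_ts + thresh;
}

/* Wrap-safe test: looks at the sign of the difference, so a deadline
 * that has wrapped around still compares as being in the future. */
bool lockup_wrap_safe(unsigned long now, unsigned long touch_ts,
		      unsigned long thresh)
{
	return (long)(now - (touch_ts + thresh)) > 0;
}
```

With touch_ts a few ticks below ULONG_MAX, the naive test reports a lockup
immediately because the deadline wraps to a small number, while the wrap-safe
test reports one only once the threshold has really elapsed.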

Signed-off-by: Colin Ian King 
Cc: Peter Zijlstra 
Cc: Eric Dumazet 
LKML-Reference: <1268994482.1798.6.camel@lenovo>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman