linux-toradex.git/kernel, branch v2.6.20.21

[PATCH] futex_compat: fix list traversal bugs

2007-10-17T19:30:32+00:00

commit 179c85ea53bef807621f335767e41e23f86f01df in mainline.

The futex list traversal on the compat side appears to have
a bug.

It's loop termination condition compares:

        while (compat_ptr(uentry) != &head->list)

But that can't be right because "uentry" has the special
"pi" indicator bit still potentially set at bit 0.  This
is cleared by fetch_robust_entry() into the "entry"
return value.

What this seems to mean is that the list won't terminate
when list iteration gets back to the the head.  And we'll
also process the list head like a normal entry, which could
cause all kinds of problems.

So we should check for equality with "entry".  That pointer
is of the non-compat type so we have to do a little casting
to keep the compiler and sparse happy.

The same problem can in theory occur with the 'pending'
variable, although that has not been reported from users
so far.

Based on the original patch from David Miller.

Acked-by: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: David Miller 
Signed-off-by: Arnd Bergmann 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

[PATCH] sigqueue_free: fix the race with collect_signal()

2007-10-17T19:30:29+00:00

commit 60187d2708caa870f0825d753df1612ea688eb9e in mainline.

Spotted by taoyue  and Jeremy Katz .

collect_signal:				sigqueue_free:

	list_del_init(&first->list);
						if (!list_empty(&q->list)) {
							// not taken
						}
						q->flags &= ~SIGQUEUE_PREALLOC;

	__sigqueue_free(first);			__sigqueue_free(q);

Now, __sigqueue_free() is called twice on the same "struct sigqueue" with the
obviously bad implications.

In particular, this double free breaks the array_cache->avail logic, so the
same sigqueue could be "allocated" twice, and the bug can manifest itself via
the "impossible" BUG_ON(!SIGQUEUE_PREALLOC) in sigqueue_free/send_sigqueue.

Hopefully this can explain these mysterious bug-reports, see

	http://marc.info/?t=118766926500003
	http://marc.info/?t=118466273000005

Alexey Dobriyan reports this patch makes the difference for the testcase, but
nobody has an access to the application which opened the problems originally.

Also, this patch removes tasklist lock/unlock, ->siglock is enough.

Signed-off-by: Oleg Nesterov 
Cc: taoyue 
Cc: Jeremy Katz 
Cc: Sukadev Bhattiprolu 
Cc: Alexey Dobriyan 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Roland McGrath 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

[PATCH] setpgid(child) fails if the child was forked by sub-thread

2007-10-17T19:30:29+00:00

commit b07e35f94a7b6a059f889b904529ee907dc0634d in mainline tree

Spotted by Marcin Kowalczyk .

sys_setpgid(child) fails if the child was forked by sub-thread.

Fix the "is it our child" check. The previous commit
ee0acf90d320c29916ba8c5c1b2e908d81f5057d was not complete.

(this patch asks for the new same_thread_group() helper, but mainline doesn't
 have it yet).

Signed-off-by: Oleg Nesterov 
Acked-by: Roland McGrath 
Tested-by: "Marcin 'Qrczak' Kowalczyk" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

CPU time limit patch / setrlimit(RLIMIT_CPU, 0) cheat fix

2007-09-23T09:22:24+00:00

CPU time limit patch / setrlimit(RLIMIT_CPU, 0) cheat fix

As discovered here today, the change in Kernel 2.6.17 intended to inhibit
users from setting RLIMIT_CPU to 0 (as that is equivalent to unlimited) by
"cheating" and setting it to 1 in such a case, does not make a difference,
as the check is done in the wrong place (too late), and only applies to the
profiling code.

On all systems I checked running kernels above 2.6.17, no matter what the
hard and soft CPU time limits were before, a user could escape them by
issuing in the shell (sh/bash/zsh) "ulimit -t 0", and then the user's
process was not ever killed.

Attached is a trivial patch to fix that.  Simply moving the check to a
slightly earlier location (specifically, before the line that actually
assigns the limit - *old_rlim = new_rlim), does the trick.

Do note that at least the zsh (but not ash, dash, or bash) shell has the
problem of "caching" the limits set by the ulimit command, so when running
zsh the fix will not immediately be evident - after entering "ulimit -t 0",
"ulimit -a" will show "-t: cpu time (seconds) 0", even though the actual
limit as returned by getrlimit(...) will be 1.  It can be verified by
opening a subshell (which will not have the values of the parent shell in
cache) and checking in it, or just by running a CPU intensive command like
"echo '65536^1048576' | bc" and verifying that it dumps core after one
second.

Regardless of whether that is a misfeature in the shell, perhaps it would
be better to return -EINVAL from setrlimit in such a case instead of
cheating and setting to 1, as that does not really reflect the actual state
of the process anymore.  I do not however know what the ground for that
decision was in the original 2.6.17 change, and whether there would be any
"backward" compatibility issues, so I preferred not to touch that right
now.

Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] Fix leak on /proc/lockdep_stats

2007-08-25T15:24:05+00:00

Signed-off-by: Alexey Dobriyan 
Signed-off-by: Andrew Morton 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

[PATCH] audit: fix oops removing watch if audit disabled

2007-08-15T08:02:32+00:00

Removing a watched file will oops if audit is disabled (auditctl -e 0).

To reproduce:
- auditctl -e 1
- touch /tmp/foo
- auditctl -w /tmp/foo
- auditctl -e 0
- rm /tmp/foo (or mv)

Signed-off-by: Tony Jones 
Cc: Al Viro 
Signed-off-by: Andrew Morton 
Signed-off-by: Chris Wright 
Signed-off-by: Greg Kroah-Hartman

[PATCH] FUTEX: Restore the dropped ERSCH fix

2007-08-15T08:02:31+00:00

The return value of futex_find_get_task() needs to be -ESRCH in case
that the search fails. This was part of the original futex fixes and
got accidentally dropped, when the futex-tidy-up patch was split out.

Results in a NULL pointer dereference in case the search fails.

Restore it.

Signed-off-by: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Ulrich Drepper 
Signed-off-by: Chris Wright 
Signed-off-by: Greg Kroah-Hartman

[PATCH] sched: fix next_interval determination in idle_balance()

2007-08-15T08:02:31+00:00

Fix massive SMP imbalance on NUMA nodes observed on 2.6.21.5 with CFS.
(and later on reproduced without CFS as well).

The intervals of domains that do not have SD_BALANCE_NEWIDLE must be
considered for the calculation of the time of the next balance.
Otherwise we may defer rebalancing forever and nodes might stay idle for
very long times.

Siddha also spotted that the conversion of the balance interval to
jiffies is missing. Fix that to.

From: Srivatsa Vaddagiri 

also continue the loop if !(sd->flags & SD_LOAD_BALANCE).

Tested-by: Paul E. McKenney 

It did in fact trigger under all three of mainline, CFS, and -rt
including CFS -- see below for a couple of emails from last Friday
giving results for these three on the AMD box (where it happened) and on
a single-quad NUMA-Q system (where it did not, at least not with such
severity).

Signed-off-by: Christoph Lameter 
Signed-off-by: Ingo Molnar 
Signed-off-by: Chris Wright 
Signed-off-by: Greg Kroah-Hartman

[PATCH] pi-futex: Fix exit races and locking problems

2007-08-15T08:02:26+00:00

1. New entries can be added to tsk->pi_state_list after task completed
   exit_pi_state_list(). The result is memory leakage and deadlocks.

2. handle_mm_fault() is called under spinlock. The result is obvious.

3. results in self-inflicted deadlock inside glibc.
   Sometimes futex_lock_pi returns -ESRCH, when it is not expected
   and glibc enters to for(;;) sleep() to simulate deadlock. This problem
   is quite obvious and I think the patch is right. Though it looks like
   each "if" in futex_lock_pi() got some stupid special case "else if". :-)

4. sometimes futex_lock_pi() returns -EDEADLK,
   when nobody has the lock. The reason is also obvious (see comment
   in the patch), but correct fix is far beyond my comprehension.
   I guess someone already saw this, the chunk:

                        if (rt_mutex_trylock(&q.pi_state->pi_mutex))
                                ret = 0;

   is obviously from the same opera. But it does not work, because the
   rtmutex is really taken at this point: wake_futex_pi() of previous
   owner reassigned it to us. My fix works. But it looks very stupid.
   I would think about removal of shift of ownership in wake_futex_pi()
   and making all the work in context of process taking lock.

From: Thomas Gleixner 

Fix 1) Avoid the tasklist lock variant of the exit race fix by adding
    an additional state transition to the exit code.

    This fixes also the issue, when a task with recursive segfaults
    is not able to release the futexes.

Fix 2) Cleanup the lookup_pi_state() failure path and solve the -ESRCH
    problem finally.

Fix 3) Solve the fixup_pi_state_owner() problem which needs to do the fixup
    in the lock protected section by using the in_atomic userspace access
    functions.

    This removes also the ugly lock drop / unqueue inside of fixup_pi_state()

Fix 4) Fix a stale lock in the error path of futex_wake_pi()

Added some error checks for verification.

The -EDEADLK problem is solved by the rtmutex fixups.

Cc: Alexey Kuznetsov 
Signed-off-by: Thomas Gleixner 
Acked-by: Ingo Molnar 
Signed-off-by: Chris Wright 
Signed-off-by: Greg Kroah-Hartman

[PATCH] rt-mutex: Fix chain walk early wakeup bug

2007-08-15T08:02:26+00:00

Alexey Kuznetsov found some problems in the pi-futex code.

One of the root causes is:

When a wakeup happens, we do not to stop the chain walk so we
we follow a non existing locking chain.

Drop out when this happens.

Cc: Alexey Kuznetsov 
Signed-off-by: Thomas Gleixner 
Acked-by: Ingo Molnar 
Signed-off-by: Chris Wright 
Signed-off-by: Greg Kroah-Hartman