linux-toradex.git/include/linux/sched.h, branch v2.6.35.4

sched: Revert nohz_ratelimit() for now

2010-08-13T20:31:02+00:00

commit 396e894d289d69bacf5acd983c97cd6e21a14c08 upstream.

Norbert reported that nohz_ratelimit() causes his laptop to burn about
4W (40%) extra. For now back out the change and see if we can adjust
the power management code to make better decisions.

Reported-by: Norbert Preining 
Signed-off-by: Peter Zijlstra 
Acked-by: Mike Galbraith 
Cc: Arjan van de Ven 
LKML-Reference: 
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

CRED: Fix __task_cred()'s lockdep check and banner comment

2010-07-29T22:16:18+00:00

Fix __task_cred()'s lockdep check by removing the following validation
condition:

	lockdep_tasklist_lock_is_held()

as commit_creds() does not take the tasklist_lock, and nor do most of the
functions that call it, so this check is pointless and it can prevent
detection of the RCU lock not being held if the tasklist_lock is held.

Instead, add the following validation condition:

	task->exit_state >= 0

to permit the access if the target task is dead and therefore unable to change
its own credentials.

Fix __task_cred()'s comment to:

 (1) discard the bit that says that the caller must prevent the target task
     from being deleted.  That shouldn't need saying.

 (2) add a comment indicating that the result of __task_cred() should not be
     passed directly to get_cred(), but rather that get_task_cred() should be
     used instead.

Also put a note into the documentation to enforce this point there too.

Signed-off-by: David Howells 
Acked-by: Jiri Olsa 
Cc: Paul E. McKenney 
Signed-off-by: Linus Torvalds

sched: Cure nr_iowait_cpu() users

2010-07-01T07:39:48+00:00

Commit 0224cf4c5e (sched: Intoduce get_cpu_iowait_time_us())
broke things by not making sure preemption was indeed disabled
by the callers of nr_iowait_cpu() which took the iowait value of
the current cpu.

This resulted in a heap of preempt warnings. Cure this by making
nr_iowait_cpu() take a cpu number and fix up the callers to pass
in the right number.
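
A minimal userspace sketch of the interface change described above, not the
kernel code itself. The names rq_sketch, runqueues, NR_CPUS_SKETCH and
nr_iowait_cpu_sketch() are hypothetical stand-ins; the point is only that the
helper no longer looks up "the current cpu" on its own, so it carries no
hidden requirement that preemption be disabled.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4    /* hypothetical fixed CPU count for this sketch */

/* Stand-in for a per-CPU runqueue; only the iowait counter is modelled. */
struct rq_sketch {
    unsigned long nr_iowait;
};

struct rq_sketch runqueues[NR_CPUS_SKETCH];

/*
 * The caller names the CPU explicitly. A caller that wants "this CPU"
 * is responsible for obtaining a stable CPU number before calling.
 */
unsigned long nr_iowait_cpu_sketch(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    return runqueues[cpu].nr_iowait;
}
```

A caller that already holds a CPU number (for example one passed down from a
per-cpu timer path) can pass it straight through instead of re-deriving it.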

Signed-off-by: Peter Zijlstra 
Cc: Arjan van de Ven 
Cc: Sergey Senozhatsky 
Cc: Rafael J. Wysocki 
Cc: Maxim Levitsky 
Cc: Len Brown 
Cc: Pavel Machek 
Cc: Jiri Slaby 
Cc: linux-pm@lists.linux-foundation.org
LKML-Reference: <1277968037.1868.120.camel@laptop>
Signed-off-by: Ingo Molnar

proc: turn signal_struct->count into "int nr_threads"

2010-05-27T16:12:47+00:00

No functional changes, just s/atomic_t count/int nr_threads/.

With the recent changes this counter has a single user, get_nr_threads().
None of its callers needs a really accurate number of threads, not to
mention that each caller obviously races with fork/exit.  It is only used to
report this value to user-space, except that first_tid() uses it to avoid
the unnecessary while_each_thread() loop in the unlikely case.

It is a bit sad we need a word in struct signal_struct for this, perhaps
we can change get_nr_threads() to approximate the number of threads using
signal->live and kill ->nr_threads later.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov 
Cc: Alexey Dobriyan 
Cc: "Eric W. Biederman" 
Acked-by: Roland McGrath 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

proc: get_nr_threads() doesn't need ->siglock any longer

2010-05-27T16:12:47+00:00

Now that task->signal can't go away get_nr_threads() doesn't need
->siglock to read signal->count.

Also, make it inline, move into sched.h, and convert 2 other proc users of
signal->count to use this (now trivial) helper.

Henceforth get_nr_threads() is the only valid user of signal->count, so we
are ready to turn it into "int nr_threads" or, perhaps, kill it.

Signed-off-by: Oleg Nesterov 
Cc: Alexey Dobriyan 
Cc: David Howells 
Cc: "Eric W. Biederman" 
Acked-by: Roland McGrath 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

kill the obsolete thread_group_cputime_free() helper

2010-05-27T16:12:46+00:00

Kill the empty thread_group_cputime_free() helper.  It was needed to free
the per-cpu data which we no longer have.

Signed-off-by: Oleg Nesterov 
Cc: Balbir Singh 
Cc: Roland McGrath 
Cc: Veaceslav Falico 
Cc: Stanislaw Gruszka 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

signals: kill the awful task_rq_unlock_wait() hack

2010-05-27T16:12:46+00:00

Now that task->signal can't go away we can revert the horrible hack added
by ad474caca3e2a0550b7ce0706527ad5ab389a4d4 ("fix for
account_group_exec_runtime(), make sure ->signal can't be freed under
rq->lock").

And we can do more cleanups sched_stats.h/posix-cpu-timers.c later.

Signed-off-by: Oleg Nesterov 
Cc: Alan Cox 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Acked-by: Roland McGrath 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

signals: make task_struct->signal immutable/refcountable

2010-05-27T16:12:46+00:00

We have a lot of problems with accessing task_struct->signal, it can
"disappear" at any moment.  Even current can't use its ->signal safely
after exit_notify().  ->siglock helps, but it is not convenient, not
always possible, and sometimes it makes sense to use task->signal even
after this task is already dead.

This patch adds the reference counter, sigcnt, into signal_struct.  This
reference is owned by task_struct and it is dropped in
__put_task_struct().  Perhaps it makes sense to export
get/put_signal_struct() later, but currently I don't see the immediate
reason.

Rename __cleanup_signal() to free_signal_struct() and unexport it.  With
the previous changes it does nothing except kmem_cache_free().

Change __exit_signal() to not clear/free ->signal, it will be freed when
the last reference to any thread in the thread group goes away.

Note:
	- when the last thread exits signal->tty can point to nowhere, see
	  the next patch.

	- with or without this patch signal_struct->count should go away,
	  or at least it should be "int nr_threads" for fs/proc. This will
	  be addressed later.
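
The ownership rule described above can be sketched in userspace C as follows.
This is an illustration under simplifying assumptions, not the kernel
implementation: signal_sketch, task_sketch, signal_alloc(), task_attach() and
task_release() are hypothetical names, and C11 atomics stand in for the
kernel's atomic_t. Each task holds one reference, the pointer is never
cleared, and the structure is freed only when the last task drops its
reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct signal_sketch {
    atomic_int sigcnt;      /* one reference per task in the group */
    int nr_threads;
};

struct task_sketch {
    struct signal_sketch *signal;   /* set once, never cleared */
};

struct signal_sketch *signal_alloc(void)
{
    struct signal_sketch *sig = calloc(1, sizeof(*sig));

    if (sig)
        atomic_init(&sig->sigcnt, 0);
    return sig;
}

/* Take a reference on behalf of the task and point the task at it. */
void task_attach(struct task_sketch *t, struct signal_sketch *sig)
{
    atomic_fetch_add(&sig->sigcnt, 1);
    t->signal = sig;
}

/*
 * Drop the task's reference; plays the role the message assigns to
 * __put_task_struct(). Returns 1 if this was the last reference and
 * the structure was freed, 0 otherwise.
 */
int task_release(struct task_sketch *t)
{
    struct signal_sketch *sig = t->signal;

    if (atomic_fetch_sub(&sig->sigcnt, 1) == 1) {
        free(sig);
        return 1;
    }
    return 0;
}
```

With two tasks attached, releasing the first leaves the second task's
->signal pointer valid; only the second release frees the structure.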

Signed-off-by: Oleg Nesterov 
Cc: Alan Cox 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Acked-by: Roland McGrath 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

exit: change zap_other_threads() to count sub-threads

2010-05-27T16:12:46+00:00

Change zap_other_threads() to return the number of other sub-threads found
on ->thread_group list.

Other changes are cosmetic:

	- change the code to use while_each_thread() helper

	- remove the obsolete comment about SIGKILL/SIGSTOP
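
As a rough illustration of the new return value, here is a userspace sketch
that walks a circular thread list and counts every thread other than the
caller. The type thread_sketch, its killed flag and
zap_other_threads_sketch() are hypothetical; the loop only imitates the shape
of a while_each_thread() walk and omits signal delivery and locking.

```c
#include <assert.h>
#include <stddef.h>

struct thread_sketch {
    struct thread_sketch *next;     /* circular list of the thread group */
    int killed;                     /* stand-in for "SIGKILL was queued" */
};

/* Mark every other thread in the ring and return how many were found. */
int zap_other_threads_sketch(struct thread_sketch *self)
{
    struct thread_sketch *t = self;
    int count = 0;

    /* Step around the ring until we arrive back at the caller. */
    while ((t = t->next) != self) {
        count++;
        t->killed = 1;
    }
    return count;
}
```

For a single-threaded group, where the only element points at itself, the
loop body never runs and the function returns 0.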

Signed-off-by: Oleg Nesterov 
Acked-by: Roland McGrath 
Cc: Veaceslav Falico 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

cpusets: new round-robin rotor for SLAB allocations

2010-05-27T16:12:44+00:00

We have observed several workloads running on multi-node systems where
memory is assigned unevenly across the nodes in the system.  There are
numerous reasons for this but one is the round-robin rotor in
cpuset_mem_spread_node().

For example, a simple test that writes a multi-page file will allocate
pages on nodes 0 2 4 6 ...  Odd nodes are skipped.  (Sometimes it
allocates on odd nodes & skips even nodes).

An example is shown below.  The program "lfile" writes a file consisting
of 10 pages.  The program then mmaps the file & uses get_mempolicy(...,
MPOL_F_NODE) to determine the nodes where the file pages were allocated.
The output is shown below:

	# ./lfile
	 allocated on nodes: 2 4 6 0 1 2 6 0 2

There is a single rotor that is used for allocating both file pages & slab
pages.  Writing the file allocates both a data page & a slab page
(buffer_head).  This advances the RR rotor 2 nodes for each page
allocated.

A quick test seems to confirm this is the cause of the uneven
allocation:

	# echo 0 >/dev/cpuset/memory_spread_slab
	# ./lfile
	 allocated on nodes: 6 7 8 9 0 1 2 3 4 5

This patch introduces a second rotor that is used for slab allocations.
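
The effect of sharing one rotor can be reproduced with a small simulation.
This is only a sketch under the assumption stated above, that each file page
costs exactly one page allocation followed by one slab allocation; the names
rotors_sketch, next_node() and write_file_sketch() are hypothetical and do
not correspond to kernel functions.

```c
#include <assert.h>

struct rotors_sketch {
    int nnodes;
    int mem_rotor;      /* rotor used for file page allocations */
    int slab_rotor;     /* second rotor, used only for slab allocations */
};

/* Return the node the rotor points at, then advance it round-robin. */
int next_node(int *rotor, int nnodes)
{
    int node;

    assert(nnodes > 0);
    node = *rotor;
    *rotor = (*rotor + 1) % nnodes;
    return node;
}

/*
 * Simulate writing npages pages. When shared is non-zero both kinds of
 * allocation advance mem_rotor, as before the patch; otherwise the slab
 * allocation advances its own rotor. out[i] records the node of page i.
 */
void write_file_sketch(struct rotors_sketch *r, int shared,
                       int npages, int *out)
{
    int i;

    for (i = 0; i < npages; i++) {
        out[i] = next_node(&r->mem_rotor, r->nnodes);
        (void)next_node(shared ? &r->mem_rotor : &r->slab_rotor,
                        r->nnodes);
    }
}
```

On a 10-node model starting from node 0, the shared rotor places pages on
every second node, while separate rotors visit each node in turn.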

Signed-off-by: Jack Steiner 
Acked-by: Christoph Lameter 
Cc: Pekka Enberg 
Cc: Paul Menage 
Cc: Jack Steiner 
Cc: Robin Holt 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds