linux-toradex.git/include/linux/sched.h, branch v2.6.29.3

sched: do not count frozen tasks toward load

2009-04-27T17:37:00+00:00

upstream commit: e3c8ca8336707062f3f7cb1cd7e6b3c753baccdd

Freezing tasks via the cgroup freezer causes the load average to climb
because the freezer's current implementation puts frozen tasks in
uninterruptible sleep (D state).

Some applications which perform job-scheduling functions consult the
load average when making decisions.  If a cgroup is frozen, the load
average does not provide a useful measure of the system's utilization
to such applications.  This is especially inconvenient if the job
scheduler employs the cgroup freezer as a mechanism for preempting low
priority jobs.  Contrast this with using SIGSTOP for the same purpose:
the stopped tasks do not count toward system load.

Change task_contributes_to_load() to return false if the task is
frozen.  This results in /proc/loadavg behavior that better meets
users' expectations.
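
As a rough illustration of the change described above, the check can be
sketched in plain C.  The struct and the flag values below are stand-ins
chosen for this example (assumptions, not copied from the kernel headers);
only the shape of the predicate follows the commit text.

```c
#include <stdbool.h>

/* Illustrative values; assumptions for this sketch only. */
#define TASK_UNINTERRUPTIBLE 0x0002      /* D state */
#define PF_FROZEN            0x00010000  /* task has been frozen */

/* Hypothetical stand-in for the two task_struct fields involved. */
struct task {
    long state;
    unsigned int flags;
};

/* A task counts toward load only if it is in uninterruptible sleep
 * and is not frozen. */
static bool task_contributes_to_load(const struct task *t)
{
    return (t->state & TASK_UNINTERRUPTIBLE) != 0 &&
           (t->flags & PF_FROZEN) == 0;
}
```

With this predicate a frozen task in D state no longer raises the load
average, which matches the SIGSTOP behavior mentioned above.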

Signed-off-by: Nathan Lynch 
Acked-by: Andrew Morton 
Acked-by: Nigel Cunningham 
Tested-by: Nigel Cunningham 
Cc: containers@lists.linux-foundation.org
Cc: linux-pm@lists.linux-foundation.org
Cc: Matt Helsley 
LKML-Reference: <20090408194512.47a99b95@manatee.lan>
Signed-off-by: Ingo Molnar 
Signed-off-by: Chris Wright

cpumask: tsk_cpumask for accessing the struct task_struct's cpus_allowed.

2009-03-12T04:05:44+00:00

This allows us to change the representation (to a dangling bitmap or
cpumask_var_t) without breaking all the callers: they can use
tsk_cpumask() now and won't see a difference as the changes roll into
linux-next.
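
A minimal sketch of why an accessor helps: callers go through
tsk_cpumask() and never name the field, so the field's representation can
change later.  The one-word cpumask_t and the task_allowed_on() helper are
hypothetical simplifications for this example.

```c
#include <stdint.h>

/* Hypothetical simplified mask: one 64-bit word, one bit per CPU. */
typedef struct { uint64_t bits; } cpumask_t;

/* Hypothetical stand-in for struct task_struct. */
struct task {
    cpumask_t cpus_allowed;
};

/* Accessor: the only place that knows how the mask is stored. */
#define tsk_cpumask(tsk) (&(tsk)->cpus_allowed)

/* Example caller (hypothetical helper): tests one CPU bit via the accessor. */
static int task_allowed_on(const struct task *t, unsigned int cpu)
{
    const cpumask_t *mask = tsk_cpumask(t);

    return cpu < 64 && ((mask->bits >> cpu) & 1u) != 0;
}
```

If cpus_allowed later becomes a pointer or a cpumask_var_t, only the
accessor definition has to change; callers such as task_allowed_on() stay
as they are.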

Signed-off-by: Rusty Russell

sched: don't allow setuid to succeed if the user does not have rt bandwidth

2009-02-27T10:11:53+00:00

Impact: fix hung task with certain (non-default) rt-limit settings

Corey Hickey reported that after using setuid to change the uid of an
rt process, the process became unkillable and stopped running.
This happened because there was no rt runtime for that user's task group.
Add a check to see if a user can attach an rt task to its task group.
On failure, return EINVAL, which is also what the CONFIG_CGROUP_SCHED
case returns.
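
The check described above might look roughly like the following.  All
names and fields here (struct task_group with rt_runtime_us,
rt_can_attach(), switch_user_group()) are hypothetical and exist only to
show the control flow; they are not the kernel's actual functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: per-user task group with its rt runtime budget. */
struct task_group {
    long rt_runtime_us;   /* 0 means no rt bandwidth assigned */
};

/* Hypothetical stand-in for the task being moved. */
struct task {
    bool is_rt;           /* true for a realtime-policy task */
};

/* An rt task may only join a group that has some rt runtime. */
static bool rt_can_attach(const struct task_group *tg, const struct task *t)
{
    return !(t->is_rt && tg->rt_runtime_us == 0);
}

/* Sketch of the uid switch: refuse instead of stranding the task. */
static int switch_user_group(const struct task_group *new_tg,
                             const struct task *t)
{
    if (!rt_can_attach(new_tg, t))
        return -EINVAL;
    return 0;   /* the real code would go on to move the task */
}
```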

Reported-by: Corey Hickey 
Signed-off-by: Dhaval Giani 
Acked-by: Peter Zijlstra 
Signed-off-by: Ingo Molnar

timers: fix TIMER_ABSTIME for process wide cpu timers

2009-02-11T13:04:21+00:00

The POSIX timer interface allows for absolute time expiry values through the
TIMER_ABSTIME flag, therefore we have to synchronize the timer to the clock
every time we start it.
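
For context, this is a user-space sketch of how an absolute expiry on the
process CPU clock is typically built.  It shows the caller's side only,
not the kernel fix; timespec_add() and abs_cpu_deadline() are hypothetical
helper names introduced for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Add two timespec values, carrying nanoseconds into seconds. */
static struct timespec timespec_add(struct timespec a, struct timespec b)
{
    struct timespec r = { .tv_sec = a.tv_sec + b.tv_sec,
                          .tv_nsec = a.tv_nsec + b.tv_nsec };

    if (r.tv_nsec >= 1000000000L) {
        r.tv_sec++;
        r.tv_nsec -= 1000000000L;
    }
    return r;
}

/* Fill *its with a one-shot absolute expiry 'delay' past the current
 * process CPU time.  Returns 0 on success, -1 if the clock read fails. */
static int abs_cpu_deadline(struct timespec delay, struct itimerspec *its)
{
    struct timespec now;

    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &now) != 0)
        return -1;
    its->it_value = timespec_add(now, delay);
    its->it_interval = (struct timespec){ .tv_sec = 0, .tv_nsec = 0 };
    return 0;
}
```

The resulting itimerspec would be passed to timer_settime() with the
TIMER_ABSTIME flag on a timer created for CLOCK_PROCESS_CPUTIME_ID.  An
absolute value like this is only meaningful if the timer's notion of time
agrees with the clock, which is the synchronization this commit adds.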

Signed-off-by: Peter Zijlstra 
Signed-off-by: Ingo Molnar

timers: split process wide cpu clocks/timers, fix

2009-02-11T13:04:19+00:00

To decrease the chance of a missed enable, always enable the timer when we
sample it.  We always disable it when we find that there are no active timers
in the jiffy tick.

This fixes a flood of warnings reported by Mike Galbraith.

Reported-by: Mike Galbraith 
Signed-off-by: Peter Zijlstra 
Signed-off-by: Ingo Molnar

timers: split process wide cpu clocks/timers, remove spurious warning

2009-02-06T13:57:51+00:00

Mike Galbraith reported that the new warning in thread_group_cputimer()
triggers en masse with Amarok running.

Oleg Nesterov observed:

  Can't fastpath_timer_check()->thread_group_cputimer() have the
  false warning too? Suppose we had the timer, then posix_cpu_timer_del()
  removes this timer, but task_cputime_zero(&sig->cputime_expires) still
  not true.

Remove the spurious debug warning.

Reported-by: Mike Galbraith 
Explained-by: Oleg Nesterov 
Signed-off-by: Ingo Molnar

timers: split process wide cpu clocks/timers

2009-02-05T12:04:33+00:00

Change the process wide cpu timers/clocks so that we:

 1) don't mess up the kernel with too many threads,
 2) don't have a per-cpu allocation for each process,
 3) have no impact when not used.

In order to accomplish this we're going to split it into two parts:

 - clocks; which can take all the time they want since they run
           from user context -- ie. sys_clock_gettime(CLOCK_PROCESS_CPUTIME_ID)

 - timers; which need constant time sampling but since they're
           explicitly used, the user can pay the overhead.

The clock readout will go back to a full sum of the thread group, while the
timers will run off a global 'clock' that only runs when needed, so only
programs that make use of the facility pay the price.
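
The split can be pictured with a small model.  This is a sketch under
simplifying assumptions: the types and function names below are
hypothetical, there is no locking, and time is a plain counter.

```c
#include <stddef.h>

/* Hypothetical simplified accounting record. */
struct cputime {
    unsigned long long utime;
    unsigned long long stime;
};

struct thread {
    struct cputime t;
};

/* Group-wide timer 'clock': only accumulates while running is set. */
struct group_cputimer {
    struct cputime total;
    int running;
};

/* Clock path: walk every thread and sum.  Slow for big groups, but it
 * runs from user context, so it can take the time it needs. */
static struct cputime group_clock_read(const struct thread *th, size_t n)
{
    struct cputime sum = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        sum.utime += th[i].t.utime;
        sum.stime += th[i].t.stime;
    }
    return sum;
}

/* Timer path: constant-time update at tick time, and a no-op for
 * processes that never armed a process-wide timer. */
static void group_timer_account(struct group_cputimer *ct,
                                unsigned long long u, unsigned long long s)
{
    if (!ct->running)
        return;
    ct->total.utime += u;
    ct->total.stime += s;
}
```

In this model a process without process-wide timers pays one flag test per
tick and nothing else, while a clock read costs a walk over the thread
list.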

Signed-off-by: Peter Zijlstra 
Reviewed-by: Ingo Molnar 
Signed-off-by: Ingo Molnar

signal: re-add dead task accumulation stats.

2009-02-05T12:04:33+00:00

We're going to split the process wide cpu accounting into two parts:

 - clocks; which can take all the time they want since they run
           from user context.

 - timers; which need constant time tracing but can afford the overhead
           because they're default off -- and rare.

The clock readout will go back to a full sum of the thread group, for this
we need to re-add the exit stats that were removed in the initial itimer
rework (f06febc9: timers: fix itimer/many thread hang).

Furthermore, since that full sum can be rather slow for large thread groups
and we have the complete dead task stats, revert the do_notify_parent time
computation.
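
A sketch of why the exit stats are needed for a full sum: once a thread
has exited it can no longer be walked, so its times must be folded into a
per-group accumulator first.  The names here (struct group,
thread_exit_account(), group_total()) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical simplified accounting record. */
struct cputime {
    unsigned long long utime;
    unsigned long long stime;
};

/* Per-group state: times inherited from threads that already exited. */
struct group {
    struct cputime dead;
};

/* Called when a thread exits: keep its times in the group. */
static void thread_exit_account(struct group *g, const struct cputime *t)
{
    g->dead.utime += t->utime;
    g->dead.stime += t->stime;
}

/* Full group time = accumulated dead-thread times + live threads. */
static struct cputime group_total(const struct group *g,
                                  const struct cputime *live, size_t n)
{
    struct cputime sum = g->dead;

    for (size_t i = 0; i < n; i++) {
        sum.utime += live[i].utime;
        sum.stime += live[i].stime;
    }
    return sum;
}
```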

Signed-off-by: Peter Zijlstra 
Reviewed-by: Ingo Molnar 
Signed-off-by: Ingo Molnar

sched: add missing kernel-doc in sched.h

2009-02-03T05:32:10+00:00

Add kernel-doc notation for @lock:

include/linux/sched.h:457: No description found for parameter 'lock'

Signed-off-by: Randy Dunlap 
Signed-off-by: Ingo Molnar

epoll: drop max_user_instances and rely only on max_user_watches

2009-01-30T02:04:45+00:00

Linus suggested putting limits where the money is, and max_user_watches
already does that without the need for max_user_instances.  This
mitigates the potential DoS while still allowing fairly generous
default behavior.

Allowing up to 4% of low memory (per user) to be allocated to epoll watches,
we have:

LOMEM    MAX_WATCHES (per user)
512MB    ~178000
1GB      ~356000
2GB      ~712000

A box with 512MB of low memory will have trouble reaching 180K
watches, as socket buffer math shows.  So no more max_user_instances
limit.
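
The arithmetic behind the table can be sketched as below.  The per-watch
cost is an assumption: roughly 120 bytes is what the table's figures imply
(4% of 512MB divided by ~178000), and the function name simply mirrors the
sysctl.  It is not taken from the kernel source.

```c
#include <stdint.h>

/* Watches allowed per user: 4% (1/25) of low memory divided by the
 * assumed memory cost of one watch, in bytes. */
static uint64_t max_user_watches(uint64_t lowmem_bytes,
                                 uint64_t watch_cost_bytes)
{
    return (lowmem_bytes / 25) / watch_cost_bytes;
}
```

With the assumed 120-byte cost, max_user_watches(512ULL << 20, 120)
evaluates to 178956, in line with the ~178000 shown for 512MB, and the
result scales linearly with low memory as in the table.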

Signed-off-by: Davide Libenzi 
Cc: Willy Tarreau 
Cc: Michael Kerrisk 
Cc: Bron Gondwana 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds