linux-toradex.git/kernel/sched_rt.c, branch v2.6.26-rc3

sched: fix RT task-wakeup logic

2008-05-05T21:56:18+00:00

Dmitry Adamushko pointed out a logic error in task_wake_up_rt() where the
condition will always evaluate to "true".  You can find the thread here:

http://lkml.org/lkml/2008/4/22/296

In reality, we only want to try to push tasks away when a wake-up request is
not going to preempt the current task.  So let's fix it.

Note: We introduce test_tsk_need_resched() instead of open-coding the flag
check so that the merge-conflict with -rt should help remind us that we
may need to support NEEDS_RESCHED_DELAYED in the future, too.
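
As a rough user-space model of the corrected condition (the struct layouts
and should_push_on_wakeup() are simplified assumptions for illustration, not
the kernel's definitions), a push is attempted only when the woken task is
not already running, the current task has not been flagged for rescheduling,
and the runqueue has surplus RT work:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumptions). */
struct task {
	bool need_resched;	/* models the "needs resched" thread flag */
	bool running;		/* true while the task executes on a CPU */
};

struct rq {
	struct task *curr;	/* task currently running on this runqueue */
	bool rt_overloaded;	/* models "this runqueue has surplus RT work" */
};

/* Wrapper so that callers never open-code the flag check. */
static bool test_tsk_need_resched(const struct task *t)
{
	assert(t != NULL);
	return t->need_resched;
}

/*
 * Returns true when a push should be attempted: the woken task p is not
 * running, the wakeup will not preempt rq->curr, and the runqueue is
 * overloaded.
 */
static bool should_push_on_wakeup(const struct rq *rq, const struct task *p)
{
	return !p->running &&
	       !test_tsk_need_resched(rq->curr) &&
	       rq->rt_overloaded;
}
```

Keeping the flag test behind a helper means a later variant of the flag only
has to be handled in one place.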

Signed-off-by: Gregory Haskins 
CC: Dmitry Adamushko 
CC: Steven Rostedt 
Signed-off-by: Ingo Molnar

sched: make rt_sched_class, idle_sched_class static

2008-05-05T21:56:17+00:00

The C files are included directly in sched.c, so they are
effectively static.

Signed-off-by: Harvey Harrison 
Acked-by: Peter Zijlstra 
Signed-off-by: Ingo Molnar

sched: rt-group: optimize dequeue_rt_stack

2008-04-19T17:45:00+00:00

Now that the group hierarchy can have an arbitrary depth, the O(n^2) nature
of RT task dequeues will really hurt. Optimize this by providing space to
store the tree path, so we can walk it the other way.
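
A minimal sketch of the idea, assuming a simplified entity with a parent
pointer and a scratch "back" pointer (the type and field names here are
illustrative, not the kernel's definitions): the first pass walks up and
records the reverse path, the second pass follows it from the top down.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified scheduling entity; field names are assumptions. */
struct rt_entity {
	struct rt_entity *parent;	/* next level up; NULL at the top */
	struct rt_entity *back;		/* scratch: next level down, set per walk */
	int queued;			/* non-zero while on its runqueue */
};

/*
 * Dequeue se and all of its ancestors, top level first.
 * Pass 1 walks up and stores the reverse path in ->back.
 * Pass 2 follows ->back downwards, so each level is visited once and
 * the whole operation is O(depth).
 * Returns the number of entities that were dequeued.
 */
static int dequeue_stack(struct rt_entity *se)
{
	struct rt_entity *back = NULL;
	int dequeued = 0;

	assert(se != NULL);

	for (; se; se = se->parent) {
		se->back = back;
		back = se;
	}

	for (se = back; se; se = se->back) {
		if (se->queued) {
			se->queued = 0;
			dequeued++;
		}
	}
	return dequeued;
}
```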

Signed-off-by: Peter Zijlstra 
Signed-off-by: Ingo Molnar

sched: fair-group: SMP-nice for group scheduling

2008-04-19T17:45:00+00:00

Implement SMP nice support for the full group hierarchy.

On each load-balance action, compile a sched_domain wide view of the full
task_group tree. We compute the domain wide view when walking down the
hierarchy, and readjust the weights when walking back up.

After collecting and readjusting the domain wide view, we try to balance the
tasks within the task_groups. The current approach is to naively balance each
task group until we've moved the targeted amount of load.

Inspired by Srivatsa Vaddagiri's previous code and Abhishek Chandra's H-SMP
paper.

XXX: there will be some numerical issues due to the limited nature of
     SCHED_LOAD_SCALE wrt representing a task_group's influence on the
     total weight. When the tree is deep enough, or the task weight small
     enough, we'll run out of bits.
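
To illustrate the loss of precision, here is a small sketch that assumes a
10-bit SCHED_LOAD_SCALE (1024) and a hypothetical helper,
influence_at_depth(), which scales a group's influence by weight/total at
every level using integer arithmetic. It is not the balancing code itself.

```c
#include <assert.h>

/* Assumed values for illustration: a 10-bit fixed-point scale. */
#define SCHED_LOAD_SHIFT 10
#define SCHED_LOAD_SCALE (1UL << SCHED_LOAD_SHIFT)	/* 1024 */

/*
 * Scale a group's influence down through `depth` levels, where at each
 * level the group holds weight/total of its parent's load. Integer
 * division truncates, so precision is lost at every level.
 */
static unsigned long influence_at_depth(unsigned long weight,
					unsigned long total, int depth)
{
	unsigned long influence = SCHED_LOAD_SCALE;

	assert(total != 0 && weight <= total);
	while (depth-- > 0)
		influence = influence * weight / total;
	return influence;
}
```

With weight 1 out of a total of 16 at each level, the influence drops from
1024 to 64, then to 4, and reaches 0 at the third level.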

Signed-off-by: Peter Zijlstra 
CC: Abhishek Chandra 
CC: Srivatsa Vaddagiri 
Signed-off-by: Ingo Molnar

sched: mix tasks and groups

2008-04-19T17:44:59+00:00

This patch allows tasks and groups to exist in the same cfs_rq. With this
change, CFS group scheduling moves from a 1/(1+N) fairness model to a
1/(M+N) model, where M tasks and N groups exist at the cfs_rq level.
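
A worked example of the two models, under the assumption that all entities
have equal weight and that in the 1/(1+N) model the M tasks together count
as a single entity next to the N groups. The helper names are hypothetical
and shares are expressed in parts per thousand.

```c
#include <assert.h>

/* 1/(1+N) model: the M tasks share one slot; each group gets its own. */
static unsigned int group_share_old(unsigned int m, unsigned int n)
{
	(void)m;
	return 1000 / (1 + n);
}

static unsigned int task_share_old(unsigned int m, unsigned int n)
{
	assert(m > 0);
	return 1000 / ((1 + n) * m);
}

/* 1/(M+N) model: every task and every group is one entity on the cfs_rq. */
static unsigned int entity_share_new(unsigned int m, unsigned int n)
{
	assert(m + n > 0);
	return 1000 / (m + n);
}
```

For M = 3 tasks and N = 1 group, the group goes from 500 to 250 and each
task from 166 to 250 parts per thousand.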

[a.p.zijlstra@chello.nl: rt bits and assorted fixes]
Signed-off-by: Dhaval Giani 
Signed-off-by: Srivatsa Vaddagiri 
Signed-off-by: Peter Zijlstra 
Signed-off-by: Ingo Molnar

sched: add new set_cpus_allowed_ptr function

2008-04-19T17:44:59+00:00

Add a new function that accepts a pointer to the "newly allowed cpus"
cpumask argument.

int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)

The current set_cpus_allowed() function is modified to use the above
but this does not result in an ABI change.  And with some compiler
optimization help, it may not introduce any additional overhead.

Additionally, to enforce the read only nature of the new_mask arg, the
"const" property is migrated to sub-functions called by set_cpus_allowed.
This silences compiler warnings.
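
The relationship between the two functions can be sketched as follows. The
cpumask_t and task_struct definitions below are toy stand-ins, and the
empty-mask check is a placeholder for the real validation; only the two
function signatures follow the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the kernel's cpumask_t and task_struct are far larger. */
typedef struct { unsigned long bits; } cpumask_t;

struct task_struct {
	cpumask_t cpus_allowed;
};

/*
 * New interface: the mask is passed by const pointer, so it is not
 * copied onto the stack and the callee cannot modify it.
 */
static int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
{
	assert(p != NULL && new_mask != NULL);
	if (new_mask->bits == 0)
		return -1;	/* toy validation: reject an empty mask */
	p->cpus_allowed = *new_mask;
	return 0;
}

/* Old by-value interface kept as a thin wrapper for existing callers. */
static inline int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
{
	return set_cpus_allowed_ptr(p, &new_mask);
}
```

Because the wrapper is a trivial inline function, a compiler can typically
fold it into the caller, which is why no extra overhead is expected.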

Signed-off-by: Mike Travis 
Signed-off-by: Ingo Molnar

sched: rt-group: smp balancing

2008-04-19T17:44:58+00:00

Currently rt group scheduling enforces a per-cpu runtime limit; however,
the rt load balancer makes no guarantees about an equal spread of real-time
tasks, just that at any one time the highest priority tasks run.

Solve this by making the runtime limit a global property: borrow excess
runtime from the other cpus once the local limit runs out.
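
A simplified sketch of the borrowing step, assuming a fixed number of CPUs
and a hypothetical per-CPU rt_budget structure. This version takes all of a
neighbour's unused budget until the local budget reaches the period; the
actual policy for how much to take from each CPU may differ.

```c
#include <assert.h>

#define NR_CPUS 4

/* Per-CPU RT accounting; names and units are assumptions. */
struct rt_budget {
	unsigned long long runtime;	/* budget allowed in the current period */
	unsigned long long used;	/* time already consumed in the period */
};

/*
 * Move unused budget from the other CPUs to `cpu`, never letting the
 * local budget exceed the period. The total budget across all CPUs is
 * unchanged. Returns the amount borrowed.
 */
static unsigned long long borrow_runtime(struct rt_budget b[NR_CPUS], int cpu,
					 unsigned long long period)
{
	unsigned long long borrowed = 0;

	assert(cpu >= 0 && cpu < NR_CPUS);
	assert(b[cpu].runtime <= period);

	for (int i = 0; i < NR_CPUS; i++) {
		unsigned long long spare, room;

		if (b[cpu].runtime == period)
			break;			/* local budget is full */
		if (i == cpu || b[i].runtime <= b[i].used)
			continue;		/* nothing to lend */

		spare = b[i].runtime - b[i].used;
		room = period - b[cpu].runtime;
		if (spare > room)
			spare = room;

		b[i].runtime -= spare;
		b[cpu].runtime += spare;
		borrowed += spare;
	}
	return borrowed;
}
```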

Signed-off-by: Peter Zijlstra 
Signed-off-by: Ingo Molnar

sched: rt-group: synchronised bandwidth period

2008-04-19T17:44:57+00:00

Various SMP balancing algorithms require that the bandwidth period
run in sync.

Possible improvements are moving the rt_bandwidth thing into root_domain
and keeping a span per rt_bandwidth which marks throttled cpus.

Signed-off-by: Peter Zijlstra 
Signed-off-by: Ingo Molnar

sched: balance RT task resched only on runqueue

2008-03-07T15:43:00+00:00

Sripathi Kodi reported a crash in the -rt kernel:

  https://bugzilla.redhat.com/show_bug.cgi?id=435674

This is due to a code path that can reschedule a task without holding
the task's runqueue lock.  This was caused by the RT balancing code
that pulls RT tasks to the current run queue and will reschedule the
current task.

There's a slight chance that the pulling of the RT tasks will release
the current runqueue's lock and retake it (in the double_lock_balance).
While the runqueue lock is released, the current task can migrate to
another runqueue.

In the prio_changed_rt code, after the pull, if the current task is of
lesser priority than one of the RT tasks pulled, resched_task is called
on the current task. If the current task had migrated in that small
window, resched_task will be called without holding the runqueue lock
for the runqueue that the task is on.

This race condition also exists in the mainline kernel and this patch
adds a check to make sure the task hasn't migrated before calling
resched_task.
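
The shape of such a check can be sketched in user space as follows. The
struct fields and resched_if_still_current() are assumptions made for
illustration; the point is only that the task is rescheduled when it is
still the current task of the runqueue whose lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins (assumptions). A lower prio value means a higher priority. */
struct task {
	int prio;
	bool need_resched;
};

struct rq {
	struct task *curr;	/* task currently running on this runqueue */
	int highest_prio;	/* best priority queued here after the pull */
};

/*
 * Meant to be called with rq's lock held, after pulling RT tasks.
 * If p migrated away while the lock was dropped, rq->curr is no longer p
 * and p must not be touched, because its new runqueue is not locked.
 * Returns true if p was marked for rescheduling.
 */
static bool resched_if_still_current(struct rq *rq, struct task *p)
{
	assert(rq != NULL && p != NULL);

	if (p->prio > rq->highest_prio && rq->curr == p) {
		p->need_resched = true;
		return true;
	}
	return false;
}
```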

Signed-off-by: Steven Rostedt 
Tested-by: Sripathi Kodi 
Acked-by: Peter Zijlstra 
Signed-off-by: Ingo Molnar

sched: revert load_balance_monitor() changes

2008-03-04T16:54:06+00:00

The following commits cause a number of regressions:

  commit 58e2d4ca581167c2a079f4ee02be2f0bc52e8729
  Author: Srivatsa Vaddagiri 
  Date:   Fri Jan 25 21:08:00 2008 +0100
  sched: group scheduling, change how cpu load is calculated

  commit 6b2d7700266b9402e12824e11e0099ae6a4a6a79
  Author: Srivatsa Vaddagiri 
  Date:   Fri Jan 25 21:08:00 2008 +0100
  sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups

Namely:
 - very frequent wakeups on SMP, reported by PowerTop users.
 - cacheline thrashing on (large) SMP
 - some latencies larger than 500ms

While there is a mergeable patch to fix the latter, the former issues
are not fixable in a manner suitable for .25 (we're at -rc3 now).

Hence we revert them and try again in v2.6.26.

Signed-off-by: Peter Zijlstra 
CC: Srivatsa Vaddagiri 
Tested-by: Alexey Zaytsev 
Signed-off-by: Ingo Molnar