linux-toradex.git/kernel/sched.c, branch v2.6.32.28

sched: Fix string comparison in /proc/sched_features

2010-11-22T18:47:30+00:00

commit 7740191cd909b75d75685fb08a5d1f54b8a9d28b upstream.

Fix incorrect handling of the following case:

 INTERACTIVE
 INTERACTIVE_SOMETHING_ELSE

The comparison only checks up to each element's length.
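
A minimal userspace sketch of the failure mode, not the kernel code itself. The table and the function names find_feat_prefix() and find_feat_exact() are hypothetical and exist only for illustration; the upstream fix may differ in detail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature table mirroring the two names above. */
static const char *const feat_names[] = {
	"INTERACTIVE",
	"INTERACTIVE_SOMETHING_ELSE",
};
#define NR_FEATS (sizeof(feat_names) / sizeof(feat_names[0]))

/* Compares only strlen(entry) bytes, so a shorter entry also
 * matches any input that merely starts with it. */
int find_feat_prefix(const char *cmp)
{
	size_t i;

	for (i = 0; i < NR_FEATS; i++) {
		size_t len = strlen(feat_names[i]);

		if (strncmp(cmp, feat_names[i], len) == 0)
			return (int)i;
	}
	return -1;
}

/* Requires the whole string to match the entry. */
int find_feat_exact(const char *cmp)
{
	size_t i;

	for (i = 0; i < NR_FEATS; i++) {
		if (strcmp(cmp, feat_names[i]) == 0)
			return (int)i;
	}
	return -1;
}
```

With the prefix-style lookup, writing INTERACTIVE_SOMETHING_ELSE selects the INTERACTIVE entry because that entry is tested first; the exact lookup selects the intended one.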

Changelog since v1:
 - Embellish using some Rostedtisms.
  [ mingo:                 ^^ == smaller and cleaner ]

Signed-off-by: Mathieu Desnoyers 
Reviewed-by: Steven Rostedt 
Cc: Peter Zijlstra 
Cc: Tony Lindgren 
LKML-Reference: <20100913214700.GB16118@Krystal>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

fix 2.6.32.23 suspend regression caused by commit 6f6198a

2010-10-29T04:44:16+00:00

[Not upstream in the same way, as it was fixed differently there]

6f6198a sched: kill migration thread in CPU_POST_DEAD instead of CPU_DEAD
leaves migration threads lying about.  Mask out CPU_TASKS_FROZEN.
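
A rough sketch of why the mask matters, assuming that suspend delivers hotplug events with an extra "tasks frozen" bit OR-ed into the action. The EV_* names and their values are placeholders for illustration, not the kernel's definitions.

```c
/* Placeholder event encoding: a base action plus an optional flag bit. */
enum {
	EV_POST_DEAD    = 0x0009,
	EV_TASKS_FROZEN = 0x0010,
};

/* Matches only the plain event; the frozen variant slips through. */
int is_post_dead_unmasked(unsigned long action)
{
	return action == EV_POST_DEAD;
}

/* Strips the flag first, so both variants are handled. */
int is_post_dead_masked(unsigned long action)
{
	return (action & ~(unsigned long)EV_TASKS_FROZEN) == EV_POST_DEAD;
}
```

A handler that tests the unmasked value never runs its cleanup for the frozen variant, which is one way a thread could be left behind during suspend.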

Signed-off-by: Mike Galbraith 
Signed-off-by: Greg Kroah-Hartman

sched: Fix user time incorrectly accounted as system time on 32-bit

2010-09-27T00:21:25+00:00

commit e75e863dd5c7d96b91ebbd241da5328fc38a78cc upstream.

The multiplications in task_times() and thread_group_times() can
overflow a 32-bit variable. When the overflow happens, the scaled
utime value becomes erroneously small and the scaled stime becomes
erroneously big.
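
The arithmetic can be illustrated with a small userspace sketch. The function names and the use of plain uint32_t tick counts are assumptions made for the example; the kernel operates on cputime_t and uses its own division helpers.

```c
#include <stdint.h>

/* Scale utime by rtime / total with a 32-bit intermediate.
 * The product wraps modulo 2^32 once it exceeds UINT32_MAX.
 * total must be non-zero. */
uint32_t scale_utime_32(uint32_t rtime, uint32_t utime, uint32_t total)
{
	uint32_t temp = rtime * utime;

	return temp / total;
}

/* Same scaling, but the product is widened to 64 bits first,
 * so it cannot wrap for any pair of 32-bit inputs. */
uint32_t scale_utime_64(uint32_t rtime, uint32_t utime, uint32_t total)
{
	uint64_t temp = (uint64_t)rtime * utime;

	return (uint32_t)(temp / total);
}
```

For rtime = 100000, utime = 90000 and total = 100000 the true result is 90000, but the 32-bit variant returns 4100. If stime is then derived as rtime minus the scaled utime, it comes out far too large.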

Reported here:

 https://bugzilla.redhat.com/show_bug.cgi?id=633037
 https://bugzilla.kernel.org/show_bug.cgi?id=16559

Reported-by: Michael Chapman 
Reported-by: Ciriaco Garcia de Celis 
Signed-off-by: Stanislaw Gruszka 
Signed-off-by: Peter Zijlstra 
Cc: Hidetoshi Seto 
LKML-Reference: <20100914143513.GB8415@redhat.com>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

sched: cpuacct: Use bigger percpu counter batch values for stats counters

2010-09-20T20:18:12+00:00

commit fa535a77bd3fa32b9215ba375d6a202fe73e1dd6 upstream

When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are
enabled we can call cpuacct_update_stats with values much larger
than percpu_counter_batch.  This means the call to
percpu_counter_add will always add to the global count which is
protected by a spinlock and we end up with a global spinlock in
the scheduler.
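
The effect of the batch threshold can be sketched as follows. This is a simplified single-CPU model with hypothetical names (struct batched_counter, batched_add()); the lock is not modelled, only counted.

```c
#include <stdint.h>

/* Hypothetical one-CPU view of a batched counter. */
struct batched_counter {
	int64_t global;               /* shared total, lock-protected in a real kernel */
	int32_t local;                /* this CPU's pending delta */
	unsigned long global_updates; /* how often the shared total was touched */
};

void batched_add(struct batched_counter *c, int64_t amount, int32_t batch)
{
	int64_t pending = c->local + amount;

	if (pending >= batch || pending <= -batch) {
		/* Fold into the shared total: the contended path. */
		c->global += pending;
		c->local = 0;
		c->global_updates++;
	} else {
		/* Stay on the cheap per-CPU path. */
		c->local = (int32_t)pending;
	}
}
```

With a batch of 32, ten additions of 1 never touch the shared total, while ten additions of 1000000 touch it every time.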

Based on an idea by KOSAKI Motohiro, this patch scales the batch
value by cputime_one_jiffy such that we have the same batch
limit as we would if CONFIG_VIRT_CPU_ACCOUNTING was disabled.
His patch did this once at boot but that initialisation happened
too early on PowerPC (before time_init) and it was never updated
at runtime as a result of a hotplug cpu add/remove.

This patch instead scales percpu_counter_batch by
cputime_one_jiffy at runtime, which keeps the batch correct even
after cpu hotplug operations.  We cap it at INT_MAX in case of
overflow.
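
A sketch of the scaling and the cap, under the assumption that one_jiffy is positive and the product fits in 64 bits. The function name scaled_batch() is made up for the example and is not the macro the patch adds.

```c
#include <limits.h>

/* Scale the batch by the cputime value of one jiffy and cap the
 * result at INT_MAX so that it still fits in an int. */
int scaled_batch(int batch, long long one_jiffy)
{
	long long scaled = (long long)batch * one_jiffy;

	return scaled > INT_MAX ? INT_MAX : (int)scaled;
}
```

When one_jiffy is 1 the function returns batch unchanged, which corresponds to the case where CONFIG_VIRT_CPU_ACCOUNTING is not supported.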

For architectures that do not support
CONFIG_VIRT_CPU_ACCOUNTING, cputime_one_jiffy is the constant 1
and gcc is smart enough to optimise min(s32
percpu_counter_batch, INT_MAX) to just percpu_counter_batch at
least on x86 and PowerPC.  So there is no need to add an #ifdef.

On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark
is 234x faster and almost matches a CONFIG_CGROUP_CPUACCT
disabled kernel:

 CONFIG_CGROUP_CPUACCT disabled:   16906698 ctx switches/sec
 CONFIG_CGROUP_CPUACCT enabled:       61720 ctx switches/sec
 CONFIG_CGROUP_CPUACCT + patch:    16663217 ctx switches/sec

Tested with:

 wget http://ozlabs.org/~anton/junkcode/context_switch.c
 make context_switch
 for i in `seq 0 63`; do taskset -c $i ./context_switch & done
 vmstat 1

Signed-off-by: Anton Blanchard 
Reviewed-by: KOSAKI Motohiro 
Acked-by: Balbir Singh 
Tested-by: Balbir Singh 
Cc: Peter Zijlstra 
Cc: Martin Schwidefsky 
Cc: Tony Luck 
Signed-off-by: Andrew Morton 
Signed-off-by: Ingo Molnar 
Signed-off-by: Mike Galbraith 
Signed-off-by: Greg Kroah-Hartman

sched: Pre-compute cpumask_weight(sched_domain_span(sd))

2010-09-20T20:18:11+00:00

commit 669c55e9f99b90e46eaa0f98a67ec53d46dc969a upstream

Dave reported that his large SPARC machines spend lots of time in
hweight64(). Try to optimize away some of those needless
cpumask_weight() invocations (with the large offstack cpumasks these
are very expensive indeed).
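
The idea of pre-computing the weight can be sketched like this. The 64-bit mask, struct domain and the helper names are simplifications assumed for illustration; real cpumasks can be much larger, which is what makes repeated counting expensive.

```c
#include <stdint.h>

/* Hypothetical domain with a one-word CPU mask and a cached weight. */
struct domain {
	uint64_t span;            /* one bit per CPU */
	unsigned int span_weight; /* number of bits set in span */
};

/* Count set bits by clearing the lowest one until none remain. */
unsigned int mask_weight(uint64_t mask)
{
	unsigned int w = 0;

	while (mask) {
		mask &= mask - 1;
		w++;
	}
	return w;
}

/* Compute the weight once, when the span is set up, so that hot
 * paths can read span_weight instead of recounting. */
void domain_set_span(struct domain *d, uint64_t span)
{
	d->span = span;
	d->span_weight = mask_weight(span);
}
```

Callers on hot paths then read d->span_weight, and the count is refreshed only when the span itself changes.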

Reported-by: David Miller 
Signed-off-by: Peter Zijlstra 
LKML-Reference: 
Signed-off-by: Ingo Molnar 
Signed-off-by: Mike Galbraith 
Signed-off-by: Greg Kroah-Hartman

sched: Fix nr_uninterruptible count

2010-09-20T20:18:09+00:00

commit cc87f76a601d2d256118f7bab15e35254356ae21 upstream

The cpuload calculation in calc_load_account_active() assumes
rq->nr_uninterruptible will not change on an offline cpu after
migrate_nr_uninterruptible(). However the recent migrate on wakeup
changes broke that and would result in decrementing the offline cpu's
rq->nr_uninterruptible.

Fix this by accounting the nr_uninterruptible on the waking cpu.

Signed-off-by: Peter Zijlstra 
LKML-Reference: 
Signed-off-by: Ingo Molnar 
Signed-off-by: Mike Galbraith 
Signed-off-by: Greg Kroah-Hartman

sched: Optimize task_rq_lock()

2010-09-20T20:18:09+00:00

commit 65cc8e4859ff29a9ddc989c88557d6059834c2a2 upstream

Now that we hold the rq->lock over set_task_cpu() again, we can do
away with most of the TASK_WAKING checks and reduce them again to
set_cpus_allowed_ptr().

Removes some conditionals from scheduling hot-paths.

Signed-off-by: Peter Zijlstra 
Cc: Oleg Nesterov 
LKML-Reference: 
Signed-off-by: Ingo Molnar 
Signed-off-by: Mike Galbraith 
Signed-off-by: Greg Kroah-Hartman

sched: Fix TASK_WAKING vs fork deadlock

2010-09-20T20:18:09+00:00

commit 0017d735092844118bef006696a750a0e4ef6ebd upstream

Oleg noticed a few races with the TASK_WAKING usage on fork.

 - since TASK_WAKING is basically a spinlock, it should be IRQ safe
 - since we set TASK_WAKING (*) without holding rq->lock, there could
   still be a rq->lock holder, so it does not actually provide full
   serialization.

(*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING.

Cure the second issue by not setting TASK_WAKING in sched_fork(), but
only temporarily in wake_up_new_task() while calling select_task_rq().

Cure the first by holding rq->lock around the select_task_rq() call,
which disables IRQs. This, however, requires that we push the
rq->lock release down into select_task_rq_fair()'s cgroup stuff.

Because select_task_rq_fair() still needs to drop the rq->lock we
cannot fully get rid of TASK_WAKING.

Reported-by: Oleg Nesterov 
Signed-off-by: Peter Zijlstra 
LKML-Reference: 
Signed-off-by: Ingo Molnar 
Signed-off-by: Mike Galbraith 
Signed-off-by: Greg Kroah-Hartman

sched: Make select_fallback_rq() cpuset friendly

2010-09-20T20:18:08+00:00

commit 9084bb8246ea935b98320554229e2f371f7f52fa upstream

Introduce cpuset_cpus_allowed_fallback() helper to fix the cpuset problems
with select_fallback_rq(). It can be called from any context and can't use
any cpuset locks including task_lock(). It is called when the task doesn't
have online cpus in ->cpus_allowed but ttwu/etc must be able to find a
suitable cpu.
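
A much simplified sketch of the fallback decision, using plain 64-bit masks. The function name pick_fallback_cpu() is hypothetical, and widening to "any online cpu" is only a stand-in for what the cpuset helper does; none of the locking constraints described above are modelled.

```c
#include <stdint.h>

/* Return the lowest CPU that is both allowed and online. If the task
 * has no online CPU in its allowed mask, widen the choice to any
 * online CPU. Returns -1 if no CPU is online at all. */
int pick_fallback_cpu(uint64_t allowed, uint64_t online)
{
	uint64_t candidates = allowed & online;
	int cpu = 0;

	if (!candidates)
		candidates = online; /* stand-in for the cpuset fallback */
	if (!candidates)
		return -1;

	/* candidates is non-zero here, so the loop terminates. */
	while (!(candidates & 1)) {
		candidates >>= 1;
		cpu++;
	}
	return cpu;
}
```

For example, a task allowed only on cpu 3 while cpus 0 and 1 are online ends up on cpu 0 rather than having nowhere to run.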

I am not proud of this patch. Everything which needs such a fat comment
can't be good even if correct. But I'd prefer to not change the locking
rules in the code I hardly understand, and in any case I believe this
simple change makes the code much more correct compared to the
deadlocks we currently have.

Signed-off-by: Oleg Nesterov 
Signed-off-by: Peter Zijlstra 
LKML-Reference: <20100315091027.GA9155@redhat.com>
Signed-off-by: Ingo Molnar 
Signed-off-by: Mike Galbraith 
Signed-off-by: Greg Kroah-Hartman

sched: _cpu_down(): Don't play with current->cpus_allowed

2010-09-20T20:18:08+00:00

commit 6a1bdc1b577ebcb65f6603c57f8347309bc4ab13 upstream

_cpu_down() changes the current task's affinity and then recovers it at
the end. The problems are well known: we can't restore old_allowed if it
was bound to the now-dead-cpu, and we can race with the userspace which
can change cpu-affinity during unplug.

_cpu_down() should not play with current->cpus_allowed at all. Instead,
take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable()
removes the dying cpu from cpu_online_mask.

Signed-off-by: Oleg Nesterov 
Acked-by: Rafael J. Wysocki 
Signed-off-by: Peter Zijlstra 
LKML-Reference: <20100315091023.GA9148@redhat.com>
Signed-off-by: Ingo Molnar 
Signed-off-by: Mike Galbraith 
Signed-off-by: Greg Kroah-Hartman