<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel, branch v2.6.26.3</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>posix-timers: fix posix_timer_event() vs dequeue_signal() race</title>
<updated>2008-08-20T18:05:00+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@tv-sign.ru</email>
</author>
<published>2008-08-12T15:30:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=24bad46e1e0c9b3882ecacdbefbbf63249419aaf'/>
<id>24bad46e1e0c9b3882ecacdbefbbf63249419aaf</id>
<content type='text'>
commit ba661292a2bc6ddd305a212b0526e5dc22195fe7 upstream

The bug was reported and analysed by Mark McLoughlin &lt;markmc@redhat.com&gt;,
the patch is based on his and Roland's suggestions.

posix_timer_event() always rewrites the pre-allocated siginfo before sending
the signal. Most of the written info is the same all the time, but memset(0)
is very wrong. If -&gt;sigq is queued we can race with collect_signal() which
can fail to find this siginfo looking at .si_signo, or copy_siginfo() can
copy the wrong .si_code/si_tid/etc.

In short, sys_timer_settime() can in fact stop the active timer, or the user
can receive the siginfo with the wrong .si_xxx values.

Move "memset(-&gt;info, 0)" from posix_timer_event() to alloc_posix_timer(),
change send_sigqueue() to set .si_overrun = 0 when -&gt;sigq is not queued.
It would be nice to move the whole sigq-&gt;info initialization from send to
create path, but this is not easy to do without uglifying timer_create()
further.

As Roland rightly pointed out, we need more cleanups/fixes here, see the
"FIXME" comment in the patch. Hopefully this patch makes sense anyway, and
it can mask the most bad implications.

Reported-by: Mark McLoughlin &lt;markmc@redhat.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Mark McLoughlin &lt;markmc@redhat.com&gt;
Cc: Oliver Pinter &lt;oliver.pntr@gmail.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ba661292a2bc6ddd305a212b0526e5dc22195fe7 upstream

The bug was reported and analysed by Mark McLoughlin &lt;markmc@redhat.com&gt;,
the patch is based on his and Roland's suggestions.

posix_timer_event() always rewrites the pre-allocated siginfo before sending
the signal. Most of the written info is the same all the time, but memset(0)
is very wrong. If -&gt;sigq is queued we can race with collect_signal() which
can fail to find this siginfo looking at .si_signo, or copy_siginfo() can
copy the wrong .si_code/si_tid/etc.

In short, sys_timer_settime() can in fact stop the active timer, or the user
can receive the siginfo with the wrong .si_xxx values.

Move "memset(-&gt;info, 0)" from posix_timer_event() to alloc_posix_timer(),
change send_sigqueue() to set .si_overrun = 0 when -&gt;sigq is not queued.
It would be nice to move the whole sigq-&gt;info initialization from send to
create path, but this is not easy to do without uglifying timer_create()
further.

As Roland rightly pointed out, we need more cleanups/fixes here, see the
"FIXME" comment in the patch. Hopefully this patch makes sense anyway, and
it can mask the most bad implications.

Reported-by: Mark McLoughlin &lt;markmc@redhat.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Mark McLoughlin &lt;markmc@redhat.com&gt;
Cc: Oliver Pinter &lt;oliver.pntr@gmail.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>posix-timers: do_schedule_next_timer: fix the setting of -&gt;si_overrun</title>
<updated>2008-08-20T18:05:00+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@tv-sign.ru</email>
</author>
<published>2008-08-12T15:30:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=88be1c98cbdfd8ba4e231dc37da368d64d832e9e'/>
<id>88be1c98cbdfd8ba4e231dc37da368d64d832e9e</id>
<content type='text'>
commit 54da1174922cddd4be83d5a364b2e0fdd693f513 upstream

do_schedule_next_timer() sets info-&gt;si_overrun = timr-&gt;it_overrun_last,
this discards the already accumulated overruns.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Mark McLoughlin &lt;markmc@redhat.com&gt;
Cc: Oliver Pinter &lt;oliver.pntr@gmail.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 54da1174922cddd4be83d5a364b2e0fdd693f513 upstream

do_schedule_next_timer() sets info-&gt;si_overrun = timr-&gt;it_overrun_last,
this discards the already accumulated overruns.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Mark McLoughlin &lt;markmc@redhat.com&gt;
Cc: Oliver Pinter &lt;oliver.pntr@gmail.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>relay: fix "full buffer with exactly full last subbuffer" accounting problem</title>
<updated>2008-08-20T18:04:59+00:00</updated>
<author>
<name>Tom Zanussi</name>
<email>tzanussi@gmail.com</email>
</author>
<published>2008-08-06T00:05:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d6d033c731bbbdc700913790627968ef26ac29db'/>
<id>d6d033c731bbbdc700913790627968ef26ac29db</id>
<content type='text'>
commit 32194450330be327f3b25bf6b66298bd122599e9 upstream

In relay's current read implementation, if the buffer is completely full
but hasn't triggered the buffer-full condition (i.e. the last write
didn't cross the subbuffer boundary) and the last subbuffer is exactly
full, the subbuffer accounting code erroneously finds nothing available.
This patch fixes the problem.

Signed-off-by: Tom Zanussi &lt;tzanussi@gmail.com&gt;
Cc: Eduard - Gabriel Munteanu &lt;eduard.munteanu@linux360.ro&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Mathieu Desnoyers &lt;compudj@krystal.dyndns.org&gt;
Cc: Andrea Righi &lt;righi.andrea@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 32194450330be327f3b25bf6b66298bd122599e9 upstream

In relay's current read implementation, if the buffer is completely full
but hasn't triggered the buffer-full condition (i.e. the last write
didn't cross the subbuffer boundary) and the last subbuffer is exactly
full, the subbuffer accounting code erroneously finds nothing available.
This patch fixes the problem.

Signed-off-by: Tom Zanussi &lt;tzanussi@gmail.com&gt;
Cc: Eduard - Gabriel Munteanu &lt;eduard.munteanu@linux360.ro&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Mathieu Desnoyers &lt;compudj@krystal.dyndns.org&gt;
Cc: Andrea Righi &lt;righi.andrea@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>markers: fix markers read barrier for multiple probes</title>
<updated>2008-08-01T19:43:11+00:00</updated>
<author>
<name>Mathieu Desnoyers</name>
<email>mathieu.desnoyers@polymtl.ca</email>
</author>
<published>2008-07-30T18:20:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=30812f838cde7347071b8119667ea9bb8623ee78'/>
<id>30812f838cde7347071b8119667ea9bb8623ee78</id>
<content type='text'>
commit 5def9a3a22e09c99717f41ab7f07ec9e1a1f3ec8 upstream

Paul pointed out two incorrect read barriers in the marker handler code in
the path where multiple probes are connected.  Those are ordering reads of
"ptype" (single or multi probe marker), "multi" array pointer, and "multi"
array data access.

It should be ordered like this :

read ptype
smp_rmb()
read multi array pointer
smp_read_barrier_depends()
access data referenced by multi array pointer

The code with a single probe connected (optimized case, does not have to
allocate an array) has correct memory ordering.

It applies to kernel 2.6.26.x, 2.6.25.x and linux-next.

Signed-off-by: Mathieu Desnoyers &lt;mathieu.desnoyers@polymtl.ca&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5def9a3a22e09c99717f41ab7f07ec9e1a1f3ec8 upstream

Paul pointed out two incorrect read barriers in the marker handler code in
the path where multiple probes are connected.  Those are ordering reads of
"ptype" (single or multi probe marker), "multi" array pointer, and "multi"
array data access.

It should be ordered like this :

read ptype
smp_rmb()
read multi array pointer
smp_read_barrier_depends()
access data referenced by multi array pointer

The code with a single probe connected (optimized case, does not have to
allocate an array) has correct memory ordering.

It applies to kernel 2.6.26.x, 2.6.25.x and linux-next.

Signed-off-by: Mathieu Desnoyers &lt;mathieu.desnoyers@polymtl.ca&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>cpusets: fix wrong domain attr updates</title>
<updated>2008-08-01T19:43:04+00:00</updated>
<author>
<name>Miao Xie</name>
<email>miaox@cn.fujitsu.com</email>
</author>
<published>2008-07-22T20:05:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d4f3865285ebc54998c341c57498de96f8daf4d0'/>
<id>d4f3865285ebc54998c341c57498de96f8daf4d0</id>
<content type='text'>
commit 91cd4d6ef0abb1f65e81f8fe37e7d3c10344e38c upstream

Fix wrong domain attr updates, or we will always update the first sched
domain attr.

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: Hidetoshi Seto &lt;seto.hidetoshi@jp.fujitsu.com&gt;
Cc: Paul Jackson &lt;pj@sgi.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 91cd4d6ef0abb1f65e81f8fe37e7d3c10344e38c upstream

Fix wrong domain attr updates, or we will always update the first sched
domain attr.

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: Hidetoshi Seto &lt;seto.hidetoshi@jp.fujitsu.com&gt;
Cc: Paul Jackson &lt;pj@sgi.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Fix build on COMPAT platforms when CONFIG_EPOLL is disabled</title>
<updated>2008-08-01T19:43:03+00:00</updated>
<author>
<name>Atsushi Nemoto</name>
<email>anemo@mba.ocn.ne.jp</email>
</author>
<published>2008-07-22T20:05:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=12b47b11e2cc0fae5ff87d63e455a76d0844fa55'/>
<id>12b47b11e2cc0fae5ff87d63e455a76d0844fa55</id>
<content type='text'>
commit 5f17156fc55abac476d180e480bedb0f07f01b14 upstream

Add missing cond_syscall() entry for compat_sys_epoll_pwait.

Signed-off-by: Atsushi Nemoto &lt;anemo@mba.ocn.ne.jp&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5f17156fc55abac476d180e480bedb0f07f01b14 upstream

Add missing cond_syscall() entry for compat_sys_epoll_pwait.

Signed-off-by: Atsushi Nemoto &lt;anemo@mba.ocn.ne.jp&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: fix rcu_try_flip_waitack_needed() to prevent grace-period stall</title>
<updated>2008-08-01T19:43:00+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2008-07-15T21:30:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6c6748b28299ab09172b387c98bb7204e8d043b0'/>
<id>6c6748b28299ab09172b387c98bb7204e8d043b0</id>
<content type='text'>
commit d7c0651390b6a03ad53f99faec0ba88109d7191d upstream

The comment was correct -- need to make the code match the comment.
Without this patch, if a CPU goes dynticks idle (and stays there forever)
in just the right phase of preemptible-RCU grace-period processing,
grace periods stall.  The offending sequence of events (courtesy
of Promela/spin, at least after I got the liveness criterion coded
correctly...) is as follows:

o	CPU 0 is in dynticks-idle mode.  Its dynticks_progress_counter
	is (say) 10.

o	CPU 0 takes an interrupt, so rcu_irq_enter() increments CPU 0's
	dynticks_progress_counter to 11.

o	CPU 1 is doing RCU grace-period processing in rcu_try_flip_idle(),
	sees rcu_pending(), so invokes dyntick_save_progress_counter(),
	which in turn takes a snapshot of CPU 0's dynticks_progress_counter
	into CPU 0's rcu_dyntick_snapshot -- now set to 11.  CPU 1 then
	updates the RCU grace-period state to rcu_try_flip_waitack().

o	CPU 0 returns from its interrupt, so rcu_irq_exit() increments
	CPU 0's dynticks_progress_counter to 12.

o	CPU 1 later invokes rcu_try_flip_waitack(), which notices that
	CPU 0 has not yet responded, and hence in turn invokes
	rcu_try_flip_waitack_needed().  This function examines the
	state of CPU 0's dynticks_progress_counter and rcu_dyntick_snapshot
	variables, which it copies to curr (== 12) and snap (== 11),
	respectively.

	Because curr!=snap, the first condition fails.

	Because curr-snap is only 1 and snap is odd, the second
	condition fails.

	rcu_try_flip_waitack_needed() therefore incorrectly concludes
	that it must wait for CPU 0 to explicitly acknowledge the
	counter flip.

o	CPU 0 remains forever in dynticks-idle mode, never taking
	any more hardware interrupts or any NMIs, and never running
	any more tasks.  (Of course, -something- will usually eventually
	happen, which might be why we haven't seen this one in the
	wild.  Still should be fixed!)

Therefore the grace period never ends.  Fix is to make the code match
the comment, as shown below.  With this fix, the above scenario
would be satisfied with curr being even, and allow the grace period
to proceed.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Josh Triplett &lt;josh@kernel.org&gt;
Cc: Dipankar Sarma &lt;dipankar@in.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d7c0651390b6a03ad53f99faec0ba88109d7191d upstream

The comment was correct -- need to make the code match the comment.
Without this patch, if a CPU goes dynticks idle (and stays there forever)
in just the right phase of preemptible-RCU grace-period processing,
grace periods stall.  The offending sequence of events (courtesy
of Promela/spin, at least after I got the liveness criterion coded
correctly...) is as follows:

o	CPU 0 is in dynticks-idle mode.  Its dynticks_progress_counter
	is (say) 10.

o	CPU 0 takes an interrupt, so rcu_irq_enter() increments CPU 0's
	dynticks_progress_counter to 11.

o	CPU 1 is doing RCU grace-period processing in rcu_try_flip_idle(),
	sees rcu_pending(), so invokes dyntick_save_progress_counter(),
	which in turn takes a snapshot of CPU 0's dynticks_progress_counter
	into CPU 0's rcu_dyntick_snapshot -- now set to 11.  CPU 1 then
	updates the RCU grace-period state to rcu_try_flip_waitack().

o	CPU 0 returns from its interrupt, so rcu_irq_exit() increments
	CPU 0's dynticks_progress_counter to 12.

o	CPU 1 later invokes rcu_try_flip_waitack(), which notices that
	CPU 0 has not yet responded, and hence in turn invokes
	rcu_try_flip_waitack_needed().  This function examines the
	state of CPU 0's dynticks_progress_counter and rcu_dyntick_snapshot
	variables, which it copies to curr (== 12) and snap (== 11),
	respectively.

	Because curr!=snap, the first condition fails.

	Because curr-snap is only 1 and snap is odd, the second
	condition fails.

	rcu_try_flip_waitack_needed() therefore incorrectly concludes
	that it must wait for CPU 0 to explicitly acknowledge the
	counter flip.

o	CPU 0 remains forever in dynticks-idle mode, never taking
	any more hardware interrupts or any NMIs, and never running
	any more tasks.  (Of course, -something- will usually eventually
	happen, which might be why we haven't seen this one in the
	wild.  Still should be fixed!)

Therefore the grace period never ends.  Fix is to make the code match
the comment, as shown below.  With this fix, the above scenario
would be satisfied with curr being even, and allow the grace period
to proceed.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Josh Triplett &lt;josh@kernel.org&gt;
Cc: Dipankar Sarma &lt;dipankar@in.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip</title>
<updated>2008-07-13T18:03:59+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2008-07-13T18:03:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3b5c6b834984b09b7b7fba6a97d3a2878a4a8e42'/>
<id>3b5c6b834984b09b7b7fba6a97d3a2878a4a8e42</id>
<content type='text'>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cpusets, hotplug, scheduler: fix scheduler domain breakage
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cpusets, hotplug, scheduler: fix scheduler domain breakage
</pre>
</div>
</content>
</entry>
<entry>
<title>cpusets, hotplug, scheduler: fix scheduler domain breakage</title>
<updated>2008-07-13T09:37:02+00:00</updated>
<author>
<name>Dmitry Adamushko</name>
<email>dmitry.adamushko@gmail.com</email>
</author>
<published>2008-07-13T00:10:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3e84050c81ffb4961ef43d20e1fb1d7607167d83'/>
<id>3e84050c81ffb4961ef43d20e1fb1d7607167d83</id>
<content type='text'>
Commit f18f982ab ("sched: CPU hotplug events must not destroy scheduler
domains created by the cpusets") introduced a hotplug-related problem as
described below:

Upon CPU_DOWN_PREPARE,

  update_sched_domains() -&gt; detach_destroy_domains(&amp;cpu_online_map)

does the following:

/*
 * Force a reinitialization of the sched domains hierarchy. The domains
 * and groups cannot be updated in place without racing with the balancing
 * code, so we temporarily attach all running cpus to the NULL domain
 * which will prevent rebalancing while the sched domains are recalculated.
 */

The sched-domains should be rebuilt when a CPU_DOWN ops. has been
completed, effectively either upon CPU_DEAD{_FROZEN} (upon success) or
CPU_DOWN_FAILED{_FROZEN} (upon failure -- restore the things to their
initial state). That's what update_sched_domains() also does but only
for !CPUSETS case.

With f18f982ab, sched-domains' reinitialization is delegated to
CPUSETS code:

cpuset_handle_cpuhp() -&gt; common_cpu_mem_hotplug_unplug() -&gt;
rebuild_sched_domains()

Being called for CPU_UP_PREPARE and if its callback is called after
update_sched_domains()), it just negates all the work done by
update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
the sched-domains and that makes it visible for the load-balancer
while the CPU_DOWN ops. is in progress.

__migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
"offline" when this function is called).

try_to_wake_up() is called for one of these tasks from another CPU -&gt;
the load-balancer (wake_idle()) picks up a "dead" CPU and places the
task on it. Then e.g. BUG_ON(rq-&gt;nr_running) detects this a bit later
-&gt; oops.

Signed-off-by: Dmitry Adamushko &lt;dmitry.adamushko@gmail.com&gt;
Tested-by: Vegard Nossum &lt;vegard.nossum@gmail.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Max Krasnyansky &lt;maxk@qualcomm.com&gt;
Cc: Paul Jackson &lt;pj@sgi.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: miaox@cn.fujitsu.com
Cc: rostedt@goodmis.org
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit f18f982ab ("sched: CPU hotplug events must not destroy scheduler
domains created by the cpusets") introduced a hotplug-related problem as
described below:

Upon CPU_DOWN_PREPARE,

  update_sched_domains() -&gt; detach_destroy_domains(&amp;cpu_online_map)

does the following:

/*
 * Force a reinitialization of the sched domains hierarchy. The domains
 * and groups cannot be updated in place without racing with the balancing
 * code, so we temporarily attach all running cpus to the NULL domain
 * which will prevent rebalancing while the sched domains are recalculated.
 */

The sched-domains should be rebuilt when a CPU_DOWN ops. has been
completed, effectively either upon CPU_DEAD{_FROZEN} (upon success) or
CPU_DOWN_FAILED{_FROZEN} (upon failure -- restore the things to their
initial state). That's what update_sched_domains() also does but only
for !CPUSETS case.

With f18f982ab, sched-domains' reinitialization is delegated to
CPUSETS code:

cpuset_handle_cpuhp() -&gt; common_cpu_mem_hotplug_unplug() -&gt;
rebuild_sched_domains()

Being called for CPU_UP_PREPARE and if its callback is called after
update_sched_domains()), it just negates all the work done by
update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
the sched-domains and that makes it visible for the load-balancer
while the CPU_DOWN ops. is in progress.

__migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
"offline" when this function is called).

try_to_wake_up() is called for one of these tasks from another CPU -&gt;
the load-balancer (wake_idle()) picks up a "dead" CPU and places the
task on it. Then e.g. BUG_ON(rq-&gt;nr_running) detects this a bit later
-&gt; oops.

Signed-off-by: Dmitry Adamushko &lt;dmitry.adamushko@gmail.com&gt;
Tested-by: Vegard Nossum &lt;vegard.nossum@gmail.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Max Krasnyansky &lt;maxk@qualcomm.com&gt;
Cc: Paul Jackson &lt;pj@sgi.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: miaox@cn.fujitsu.com
Cc: rostedt@goodmis.org
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip</title>
<updated>2008-07-10T19:34:55+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2008-07-10T19:34:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a26449daa285c858fc68991c1d585b6927702cf5'/>
<id>a26449daa285c858fc68991c1d585b6927702cf5</id>
<content type='text'>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: fix cpu hotplug, cleanup
  sched: fix cpu hotplug
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: fix cpu hotplug, cleanup
  sched: fix cpu hotplug
</pre>
</div>
</content>
</entry>
</feed>
