<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/sched.h, branch v2.6.35-rc3</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>proc: turn signal_struct-&gt;count into "int nr_threads"</title>
<updated>2010-05-27T16:12:47+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b3ac022cb9dc5883505a88b159d1b240ad1ef405'/>
<id>b3ac022cb9dc5883505a88b159d1b240ad1ef405</id>
<content type='text'>
No functional changes, just s/atomic_t count/int nr_threads/.

With the recent changes this counter has a single user, get_nr_threads()
And, none of its callers need the really accurate number of threads, not
to mention each caller obviously races with fork/exit.  It is only used to
report this value to the user-space, except first_tid() uses it to avoid
the unnecessary while_each_thread() loop in the unlikely case.

It is a bit sad we need a word in struct signal_struct for this, perhaps
we can change get_nr_threads() to approximate the number of threads using
signal-&gt;live and kill -&gt;nr_threads later.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
No functional changes, just s/atomic_t count/int nr_threads/.

With the recent changes this counter has a single user, get_nr_threads()
And, none of its callers need the really accurate number of threads, not
to mention each caller obviously races with fork/exit.  It is only used to
report this value to the user-space, except first_tid() uses it to avoid
the unnecessary while_each_thread() loop in the unlikely case.

It is a bit sad we need a word in struct signal_struct for this, perhaps
we can change get_nr_threads() to approximate the number of threads using
signal-&gt;live and kill -&gt;nr_threads later.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: get_nr_threads() doesn't need -&gt;siglock any longer</title>
<updated>2010-05-27T16:12:47+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7e49827cc937a742ae02078b483e3eb78f791a2a'/>
<id>7e49827cc937a742ae02078b483e3eb78f791a2a</id>
<content type='text'>
Now that task-&gt;signal can't go away get_nr_threads() doesn't need
-&gt;siglock to read signal-&gt;count.

Also, make it inline, move into sched.h, and convert 2 other proc users of
signal-&gt;count to use this (now trivial) helper.

Henceforth get_nr_threads() is the only valid user of signal-&gt;count, we
are ready to turn it into "int nr_threads" or, perhaps, kill it.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that task-&gt;signal can't go away get_nr_threads() doesn't need
-&gt;siglock to read signal-&gt;count.

Also, make it inline, move into sched.h, and convert 2 other proc users of
signal-&gt;count to use this (now trivial) helper.

Henceforth get_nr_threads() is the only valid user of signal-&gt;count, we
are ready to turn it into "int nr_threads" or, perhaps, kill it.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kill the obsolete thread_group_cputime_free() helper</title>
<updated>2010-05-27T16:12:46+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a705be6b5e8b05f2ae51536ec709de921960326c'/>
<id>a705be6b5e8b05f2ae51536ec709de921960326c</id>
<content type='text'>
Kill the empty thread_group_cputime_free() helper.  It was needed to free
the per-cpu data which we no longer have.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Kill the empty thread_group_cputime_free() helper.  It was needed to free
the per-cpu data which we no longer have.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>signals: kill the awful task_rq_unlock_wait() hack</title>
<updated>2010-05-27T16:12:46+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b7b8ff6373d4b910af081f76888395e6df53249d'/>
<id>b7b8ff6373d4b910af081f76888395e6df53249d</id>
<content type='text'>
Now that task-&gt;signal can't go away we can revert the horrible hack added
by ad474caca3e2a0550b7ce0706527ad5ab389a4d4 ("fix for
account_group_exec_runtime(), make sure -&gt;signal can't be freed under
rq-&gt;lock").

And we can do more cleanups sched_stats.h/posix-cpu-timers.c later.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alan Cox &lt;alan@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that task-&gt;signal can't go away we can revert the horrible hack added
by ad474caca3e2a0550b7ce0706527ad5ab389a4d4 ("fix for
account_group_exec_runtime(), make sure -&gt;signal can't be freed under
rq-&gt;lock").

And we can do more cleanups sched_stats.h/posix-cpu-timers.c later.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alan Cox &lt;alan@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>signals: make task_struct-&gt;signal immutable/refcountable</title>
<updated>2010-05-27T16:12:46+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ea6d290ca34c4fd91b7348338c0cc7bdeff94a35'/>
<id>ea6d290ca34c4fd91b7348338c0cc7bdeff94a35</id>
<content type='text'>
We have a lot of problems with accessing task_struct-&gt;signal, it can
"disappear" at any moment.  Even current can't use its -&gt;signal safely
after exit_notify().  -&gt;siglock helps, but it is not convenient, not
always possible, and sometimes it makes sense to use task-&gt;signal even
after this task has already dead.

This patch adds the reference counter, sigcnt, into signal_struct.  This
reference is owned by task_struct and it is dropped in
__put_task_struct().  Perhaps it makes sense to export
get/put_signal_struct() later, but currently I don't see the immediate
reason.

Rename __cleanup_signal() to free_signal_struct() and unexport it.  With
the previous changes it does nothing except kmem_cache_free().

Change __exit_signal() to not clear/free -&gt;signal, it will be freed when
the last reference to any thread in the thread group goes away.

Note:
	- when the last thead exits signal-&gt;tty can point to nowhere, see
	  the next patch.

	- with or without this patch signal_struct-&gt;count should go away,
	  or at least it should be "int nr_threads" for fs/proc. This will
	  be addressed later.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alan Cox &lt;alan@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have a lot of problems with accessing task_struct-&gt;signal, it can
"disappear" at any moment.  Even current can't use its -&gt;signal safely
after exit_notify().  -&gt;siglock helps, but it is not convenient, not
always possible, and sometimes it makes sense to use task-&gt;signal even
after this task has already dead.

This patch adds the reference counter, sigcnt, into signal_struct.  This
reference is owned by task_struct and it is dropped in
__put_task_struct().  Perhaps it makes sense to export
get/put_signal_struct() later, but currently I don't see the immediate
reason.

Rename __cleanup_signal() to free_signal_struct() and unexport it.  With
the previous changes it does nothing except kmem_cache_free().

Change __exit_signal() to not clear/free -&gt;signal, it will be freed when
the last reference to any thread in the thread group goes away.

Note:
	- when the last thead exits signal-&gt;tty can point to nowhere, see
	  the next patch.

	- with or without this patch signal_struct-&gt;count should go away,
	  or at least it should be "int nr_threads" for fs/proc. This will
	  be addressed later.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alan Cox &lt;alan@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>exit: change zap_other_threads() to count sub-threads</title>
<updated>2010-05-27T16:12:46+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=09faef11df8c559a23e2405d123cb2683733a79a'/>
<id>09faef11df8c559a23e2405d123cb2683733a79a</id>
<content type='text'>
Change zap_other_threads() to return the number of other sub-threads found
on -&gt;thread_group list.

Other changes are cosmetic:

	- change the code to use while_each_thread() helper

	- remove the obsolete comment about SIGKILL/SIGSTOP

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change zap_other_threads() to return the number of other sub-threads found
on -&gt;thread_group list.

Other changes are cosmetic:

	- change the code to use while_each_thread() helper

	- remove the obsolete comment about SIGKILL/SIGSTOP

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpusets: new round-robin rotor for SLAB allocations</title>
<updated>2010-05-27T16:12:44+00:00</updated>
<author>
<name>Jack Steiner</name>
<email>steiner@sgi.com</email>
</author>
<published>2010-05-26T21:42:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6adef3ebe570bcde67fd6c16101451ddde5712b5'/>
<id>6adef3ebe570bcde67fd6c16101451ddde5712b5</id>
<content type='text'>
We have observed several workloads running on multi-node systems where
memory is assigned unevenly across the nodes in the system.  There are
numerous reasons for this but one is the round-robin rotor in
cpuset_mem_spread_node().

For example, a simple test that writes a multi-page file will allocate
pages on nodes 0 2 4 6 ...  Odd nodes are skipped.  (Sometimes it
allocates on odd nodes &amp; skips even nodes).

An example is shown below.  The program "lfile" writes a file consisting
of 10 pages.  The program then mmaps the file &amp; uses get_mempolicy(...,
MPOL_F_NODE) to determine the nodes where the file pages were allocated.
The output is shown below:

	# ./lfile
	 allocated on nodes: 2 4 6 0 1 2 6 0 2

There is a single rotor that is used for allocating both file pages &amp; slab
pages.  Writing the file allocates both a data page &amp; a slab page
(buffer_head).  This advances the RR rotor 2 nodes for each page
allocated.

A quick confirmation seems to confirm this is the cause of the uneven
allocation:

	# echo 0 &gt;/dev/cpuset/memory_spread_slab
	# ./lfile
	 allocated on nodes: 6 7 8 9 0 1 2 3 4 5

This patch introduces a second rotor that is used for slab allocations.

Signed-off-by: Jack Steiner &lt;steiner@sgi.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Jack Steiner &lt;steiner@sgi.com&gt;
Cc: Robin Holt &lt;holt@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have observed several workloads running on multi-node systems where
memory is assigned unevenly across the nodes in the system.  There are
numerous reasons for this but one is the round-robin rotor in
cpuset_mem_spread_node().

For example, a simple test that writes a multi-page file will allocate
pages on nodes 0 2 4 6 ...  Odd nodes are skipped.  (Sometimes it
allocates on odd nodes &amp; skips even nodes).

An example is shown below.  The program "lfile" writes a file consisting
of 10 pages.  The program then mmaps the file &amp; uses get_mempolicy(...,
MPOL_F_NODE) to determine the nodes where the file pages were allocated.
The output is shown below:

	# ./lfile
	 allocated on nodes: 2 4 6 0 1 2 6 0 2

There is a single rotor that is used for allocating both file pages &amp; slab
pages.  Writing the file allocates both a data page &amp; a slab page
(buffer_head).  This advances the RR rotor 2 nodes for each page
allocated.

A quick confirmation seems to confirm this is the cause of the uneven
allocation:

	# echo 0 &gt;/dev/cpuset/memory_spread_slab
	# ./lfile
	 allocated on nodes: 6 7 8 9 0 1 2 3 4 5

This patch introduces a second rotor that is used for slab allocations.

Signed-off-by: Jack Steiner &lt;steiner@sgi.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Jack Steiner &lt;steiner@sgi.com&gt;
Cc: Robin Holt &lt;holt@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN</title>
<updated>2010-05-25T15:07:02+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2010-05-24T21:33:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4be929be34f9bdeffa40d815d32d7d60d2c7f03b'/>
<id>4be929be34f9bdeffa40d815d32d7d60d2c7f03b</id>
<content type='text'>
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
  USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Acked-by: WANG Cong &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
  USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Acked-by: WANG Cong &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpuset,mm: fix no node to alloc memory when changing cpuset's mems</title>
<updated>2010-05-25T15:06:57+00:00</updated>
<author>
<name>Miao Xie</name>
<email>miaox@cn.fujitsu.com</email>
</author>
<published>2010-05-24T21:32:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c0ff7453bb5c7c98e0885fb94279f2571946f280'/>
<id>c0ff7453bb5c7c98e0885fb94279f2571946f280</id>
<content type='text'>
Before applying this patch, cpuset updates task-&gt;mems_allowed and
mempolicy by setting all new bits in the nodemask first, and clearing all
old unallowed bits later.  But in the way, the allocator may find that
there is no node to alloc memory.

The reason is that cpuset rebinds the task's mempolicy, it cleans the
nodes which the allocater can alloc pages on, for example:

(mpol: mempolicy)
	task1			task1's mpol	task2
	alloc page		1
	  alloc on node0? NO	1
				1		change mems from 1 to 0
				1		rebind task1's mpol
				0-1		  set new bits
				0	  	  clear disallowed bits
	  alloc on node1? NO	0
	  ...
	can't alloc page
	  goto oom

This patch fixes this problem by expanding the nodes range first(set newly
allowed bits) and shrink it lazily(clear newly disallowed bits).  So we
use a variable to tell the write-side task that read-side task is reading
nodemask, and the write-side task clears newly disallowed nodes after
read-side task ends the current memory allocation.

[akpm@linux-foundation.org: fix spello]
Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Cc: Hugh Dickins &lt;hugh.dickins@tiscali.co.uk&gt;
Cc: Ravikiran Thirumalai &lt;kiran@scalex86.org&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before applying this patch, cpuset updates task-&gt;mems_allowed and
mempolicy by setting all new bits in the nodemask first, and clearing all
old unallowed bits later.  But in the way, the allocator may find that
there is no node to alloc memory.

The reason is that cpuset rebinds the task's mempolicy, it cleans the
nodes which the allocater can alloc pages on, for example:

(mpol: mempolicy)
	task1			task1's mpol	task2
	alloc page		1
	  alloc on node0? NO	1
				1		change mems from 1 to 0
				1		rebind task1's mpol
				0-1		  set new bits
				0	  	  clear disallowed bits
	  alloc on node1? NO	0
	  ...
	can't alloc page
	  goto oom

This patch fixes this problem by expanding the nodes range first(set newly
allowed bits) and shrink it lazily(clear newly disallowed bits).  So we
use a variable to tell the write-side task that read-side task is reading
nodemask, and the write-side task clears newly disallowed nodes after
read-side task ends the current memory allocation.

[akpm@linux-foundation.org: fix spello]
Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Cc: Hugh Dickins &lt;hugh.dickins@tiscali.co.uk&gt;
Cc: Ravikiran Thirumalai &lt;kiran@scalex86.org&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip</title>
<updated>2010-05-18T15:27:54+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-05-18T15:27:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b8ae30ee26d379db436b0b8c8c3ff1b52f69e5d1'/>
<id>b8ae30ee26d379db436b0b8c8c3ff1b52f69e5d1</id>
<content type='text'>
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
  stop_machine: Move local variable closer to the usage site in cpu_stop_cpu_callback()
  sched, wait: Use wrapper functions
  sched: Remove a stale comment
  ondemand: Make the iowait-is-busy time a sysfs tunable
  ondemand: Solve a big performance issue by counting IOWAIT time as busy
  sched: Intoduce get_cpu_iowait_time_us()
  sched: Eliminate the ts-&gt;idle_lastupdate field
  sched: Fold updating of the last_update_time_info into update_ts_time_stats()
  sched: Update the idle statistics in get_cpu_idle_time_us()
  sched: Introduce a function to update the idle statistics
  sched: Add a comment to get_cpu_idle_time_us()
  cpu_stop: add dummy implementation for UP
  sched: Remove rq argument to the tracepoints
  rcu: need barrier() in UP synchronize_sched_expedited()
  sched: correctly place paranioa memory barriers in synchronize_sched_expedited()
  sched: kill paranoia check in synchronize_sched_expedited()
  sched: replace migration_thread with cpu_stop
  stop_machine: reimplement using cpu_stop
  cpu_stop: implement stop_cpu[s]()
  sched: Fix select_idle_sibling() logic in select_task_rq_fair()
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
  stop_machine: Move local variable closer to the usage site in cpu_stop_cpu_callback()
  sched, wait: Use wrapper functions
  sched: Remove a stale comment
  ondemand: Make the iowait-is-busy time a sysfs tunable
  ondemand: Solve a big performance issue by counting IOWAIT time as busy
  sched: Intoduce get_cpu_iowait_time_us()
  sched: Eliminate the ts-&gt;idle_lastupdate field
  sched: Fold updating of the last_update_time_info into update_ts_time_stats()
  sched: Update the idle statistics in get_cpu_idle_time_us()
  sched: Introduce a function to update the idle statistics
  sched: Add a comment to get_cpu_idle_time_us()
  cpu_stop: add dummy implementation for UP
  sched: Remove rq argument to the tracepoints
  rcu: need barrier() in UP synchronize_sched_expedited()
  sched: correctly place paranioa memory barriers in synchronize_sched_expedited()
  sched: kill paranoia check in synchronize_sched_expedited()
  sched: replace migration_thread with cpu_stop
  stop_machine: reimplement using cpu_stop
  cpu_stop: implement stop_cpu[s]()
  sched: Fix select_idle_sibling() logic in select_task_rq_fair()
  ...
</pre>
</div>
</content>
</entry>
</feed>
