<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/sched.h, branch v2.6.35.4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>sched: Revert nohz_ratelimit() for now</title>
<updated>2010-08-13T20:31:02+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2010-07-09T13:12:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1dc89aec877583e3e42421be77b063724a4bbb07'/>
<id>1dc89aec877583e3e42421be77b063724a4bbb07</id>
<content type='text'>
commit 396e894d289d69bacf5acd983c97cd6e21a14c08 upstream.

Norbert reported that nohz_ratelimit() causes his laptop to burn about
4W (40%) extra. For now back out the change and see if we can adjust
the power management code to make better decisions.

Reported-by: Norbert Preining &lt;preining@logic.at&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Arjan van de Ven &lt;arjan@infradead.org&gt;
LKML-Reference: &lt;new-submission&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 396e894d289d69bacf5acd983c97cd6e21a14c08 upstream.

Norbert reported that nohz_ratelimit() causes his laptop to burn about
4W (40%) extra. For now back out the change and see if we can adjust
the power management code to make better decisions.

Reported-by: Norbert Preining &lt;preining@logic.at&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Arjan van de Ven &lt;arjan@infradead.org&gt;
LKML-Reference: &lt;new-submission&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>CRED: Fix __task_cred()'s lockdep check and banner comment</title>
<updated>2010-07-29T22:16:18+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2010-07-29T11:45:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8f92054e7ca1d3a3ae50fb42d2253ac8730d9b2a'/>
<id>8f92054e7ca1d3a3ae50fb42d2253ac8730d9b2a</id>
<content type='text'>
Fix __task_cred()'s lockdep check by removing the following validation
condition:

	lockdep_tasklist_lock_is_held()

as commit_creds() does not take the tasklist_lock, and nor do most of the
functions that call it, so this check is pointless and it can prevent
detection of the RCU lock not being held if the tasklist_lock is held.

Instead, add the following validation condition:

	task-&gt;exit_state &gt;= 0

to permit the access if the target task is dead and therefore unable to change
its own credentials.

Fix __task_cred()'s comment to:

 (1) discard the bit that says that the caller must prevent the target task
     from being deleted.  That shouldn't need saying.

 (2) Add a comment indicating the result of __task_cred() should not be passed
     directly to get_cred(), but rather than get_task_cred() should be used
     instead.

Also put a note into the documentation to enforce this point there too.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix __task_cred()'s lockdep check by removing the following validation
condition:

	lockdep_tasklist_lock_is_held()

as commit_creds() does not take the tasklist_lock, and nor do most of the
functions that call it, so this check is pointless and it can prevent
detection of the RCU lock not being held if the tasklist_lock is held.

Instead, add the following validation condition:

	task-&gt;exit_state &gt;= 0

to permit the access if the target task is dead and therefore unable to change
its own credentials.

Fix __task_cred()'s comment to:

 (1) discard the bit that says that the caller must prevent the target task
     from being deleted.  That shouldn't need saying.

 (2) Add a comment indicating the result of __task_cred() should not be passed
     directly to get_cred(), but rather than get_task_cred() should be used
     instead.

Also put a note into the documentation to enforce this point there too.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Cure nr_iowait_cpu() users</title>
<updated>2010-07-01T07:39:48+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2010-07-01T07:07:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8c215bd3890c347dfb6a2db4779755f8b9c298a9'/>
<id>8c215bd3890c347dfb6a2db4779755f8b9c298a9</id>
<content type='text'>
Commit 0224cf4c5e (sched: Intoduce get_cpu_iowait_time_us())
broke things by not making sure preemption was indeed disabled
by the callers of nr_iowait_cpu() which took the iowait value of
the current cpu.

This resulted in a heap of preempt warnings. Cure this by making
nr_iowait_cpu() take a cpu number and fix up the callers to pass
in the right number.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arjan van de Ven &lt;arjan@infradead.org&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Cc: Rafael J. Wysocki &lt;rjw@sisk.pl&gt;
Cc: Maxim Levitsky &lt;maximlevitsky@gmail.com&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Jiri Slaby &lt;jslaby@suse.cz&gt;
Cc: linux-pm@lists.linux-foundation.org
LKML-Reference: &lt;1277968037.1868.120.camel@laptop&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 0224cf4c5e (sched: Intoduce get_cpu_iowait_time_us())
broke things by not making sure preemption was indeed disabled
by the callers of nr_iowait_cpu() which took the iowait value of
the current cpu.

This resulted in a heap of preempt warnings. Cure this by making
nr_iowait_cpu() take a cpu number and fix up the callers to pass
in the right number.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arjan van de Ven &lt;arjan@infradead.org&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Cc: Rafael J. Wysocki &lt;rjw@sisk.pl&gt;
Cc: Maxim Levitsky &lt;maximlevitsky@gmail.com&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Jiri Slaby &lt;jslaby@suse.cz&gt;
Cc: linux-pm@lists.linux-foundation.org
LKML-Reference: &lt;1277968037.1868.120.camel@laptop&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: turn signal_struct-&gt;count into "int nr_threads"</title>
<updated>2010-05-27T16:12:47+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b3ac022cb9dc5883505a88b159d1b240ad1ef405'/>
<id>b3ac022cb9dc5883505a88b159d1b240ad1ef405</id>
<content type='text'>
No functional changes, just s/atomic_t count/int nr_threads/.

With the recent changes this counter has a single user, get_nr_threads()
And, none of its callers need the really accurate number of threads, not
to mention each caller obviously races with fork/exit.  It is only used to
report this value to the user-space, except first_tid() uses it to avoid
the unnecessary while_each_thread() loop in the unlikely case.

It is a bit sad we need a word in struct signal_struct for this, perhaps
we can change get_nr_threads() to approximate the number of threads using
signal-&gt;live and kill -&gt;nr_threads later.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
No functional changes, just s/atomic_t count/int nr_threads/.

With the recent changes this counter has a single user, get_nr_threads()
And, none of its callers need the really accurate number of threads, not
to mention each caller obviously races with fork/exit.  It is only used to
report this value to the user-space, except first_tid() uses it to avoid
the unnecessary while_each_thread() loop in the unlikely case.

It is a bit sad we need a word in struct signal_struct for this, perhaps
we can change get_nr_threads() to approximate the number of threads using
signal-&gt;live and kill -&gt;nr_threads later.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: get_nr_threads() doesn't need -&gt;siglock any longer</title>
<updated>2010-05-27T16:12:47+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7e49827cc937a742ae02078b483e3eb78f791a2a'/>
<id>7e49827cc937a742ae02078b483e3eb78f791a2a</id>
<content type='text'>
Now that task-&gt;signal can't go away get_nr_threads() doesn't need
-&gt;siglock to read signal-&gt;count.

Also, make it inline, move into sched.h, and convert 2 other proc users of
signal-&gt;count to use this (now trivial) helper.

Henceforth get_nr_threads() is the only valid user of signal-&gt;count, we
are ready to turn it into "int nr_threads" or, perhaps, kill it.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that task-&gt;signal can't go away get_nr_threads() doesn't need
-&gt;siglock to read signal-&gt;count.

Also, make it inline, move into sched.h, and convert 2 other proc users of
signal-&gt;count to use this (now trivial) helper.

Henceforth get_nr_threads() is the only valid user of signal-&gt;count, we
are ready to turn it into "int nr_threads" or, perhaps, kill it.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kill the obsolete thread_group_cputime_free() helper</title>
<updated>2010-05-27T16:12:46+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a705be6b5e8b05f2ae51536ec709de921960326c'/>
<id>a705be6b5e8b05f2ae51536ec709de921960326c</id>
<content type='text'>
Kill the empty thread_group_cputime_free() helper.  It was needed to free
the per-cpu data which we no longer have.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Kill the empty thread_group_cputime_free() helper.  It was needed to free
the per-cpu data which we no longer have.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>signals: kill the awful task_rq_unlock_wait() hack</title>
<updated>2010-05-27T16:12:46+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b7b8ff6373d4b910af081f76888395e6df53249d'/>
<id>b7b8ff6373d4b910af081f76888395e6df53249d</id>
<content type='text'>
Now that task-&gt;signal can't go away we can revert the horrible hack added
by ad474caca3e2a0550b7ce0706527ad5ab389a4d4 ("fix for
account_group_exec_runtime(), make sure -&gt;signal can't be freed under
rq-&gt;lock").

And we can do more cleanups sched_stats.h/posix-cpu-timers.c later.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alan Cox &lt;alan@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that task-&gt;signal can't go away we can revert the horrible hack added
by ad474caca3e2a0550b7ce0706527ad5ab389a4d4 ("fix for
account_group_exec_runtime(), make sure -&gt;signal can't be freed under
rq-&gt;lock").

And we can do more cleanups sched_stats.h/posix-cpu-timers.c later.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alan Cox &lt;alan@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>signals: make task_struct-&gt;signal immutable/refcountable</title>
<updated>2010-05-27T16:12:46+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ea6d290ca34c4fd91b7348338c0cc7bdeff94a35'/>
<id>ea6d290ca34c4fd91b7348338c0cc7bdeff94a35</id>
<content type='text'>
We have a lot of problems with accessing task_struct-&gt;signal, it can
"disappear" at any moment.  Even current can't use its -&gt;signal safely
after exit_notify().  -&gt;siglock helps, but it is not convenient, not
always possible, and sometimes it makes sense to use task-&gt;signal even
after this task has already dead.

This patch adds the reference counter, sigcnt, into signal_struct.  This
reference is owned by task_struct and it is dropped in
__put_task_struct().  Perhaps it makes sense to export
get/put_signal_struct() later, but currently I don't see the immediate
reason.

Rename __cleanup_signal() to free_signal_struct() and unexport it.  With
the previous changes it does nothing except kmem_cache_free().

Change __exit_signal() to not clear/free -&gt;signal, it will be freed when
the last reference to any thread in the thread group goes away.

Note:
	- when the last thead exits signal-&gt;tty can point to nowhere, see
	  the next patch.

	- with or without this patch signal_struct-&gt;count should go away,
	  or at least it should be "int nr_threads" for fs/proc. This will
	  be addressed later.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alan Cox &lt;alan@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have a lot of problems with accessing task_struct-&gt;signal, it can
"disappear" at any moment.  Even current can't use its -&gt;signal safely
after exit_notify().  -&gt;siglock helps, but it is not convenient, not
always possible, and sometimes it makes sense to use task-&gt;signal even
after this task has already dead.

This patch adds the reference counter, sigcnt, into signal_struct.  This
reference is owned by task_struct and it is dropped in
__put_task_struct().  Perhaps it makes sense to export
get/put_signal_struct() later, but currently I don't see the immediate
reason.

Rename __cleanup_signal() to free_signal_struct() and unexport it.  With
the previous changes it does nothing except kmem_cache_free().

Change __exit_signal() to not clear/free -&gt;signal, it will be freed when
the last reference to any thread in the thread group goes away.

Note:
	- when the last thead exits signal-&gt;tty can point to nowhere, see
	  the next patch.

	- with or without this patch signal_struct-&gt;count should go away,
	  or at least it should be "int nr_threads" for fs/proc. This will
	  be addressed later.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alan Cox &lt;alan@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>exit: change zap_other_threads() to count sub-threads</title>
<updated>2010-05-27T16:12:46+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=09faef11df8c559a23e2405d123cb2683733a79a'/>
<id>09faef11df8c559a23e2405d123cb2683733a79a</id>
<content type='text'>
Change zap_other_threads() to return the number of other sub-threads found
on -&gt;thread_group list.

Other changes are cosmetic:

	- change the code to use while_each_thread() helper

	- remove the obsolete comment about SIGKILL/SIGSTOP

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change zap_other_threads() to return the number of other sub-threads found
on -&gt;thread_group list.

Other changes are cosmetic:

	- change the code to use while_each_thread() helper

	- remove the obsolete comment about SIGKILL/SIGSTOP

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpusets: new round-robin rotor for SLAB allocations</title>
<updated>2010-05-27T16:12:44+00:00</updated>
<author>
<name>Jack Steiner</name>
<email>steiner@sgi.com</email>
</author>
<published>2010-05-26T21:42:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6adef3ebe570bcde67fd6c16101451ddde5712b5'/>
<id>6adef3ebe570bcde67fd6c16101451ddde5712b5</id>
<content type='text'>
We have observed several workloads running on multi-node systems where
memory is assigned unevenly across the nodes in the system.  There are
numerous reasons for this but one is the round-robin rotor in
cpuset_mem_spread_node().

For example, a simple test that writes a multi-page file will allocate
pages on nodes 0 2 4 6 ...  Odd nodes are skipped.  (Sometimes it
allocates on odd nodes &amp; skips even nodes).

An example is shown below.  The program "lfile" writes a file consisting
of 10 pages.  The program then mmaps the file &amp; uses get_mempolicy(...,
MPOL_F_NODE) to determine the nodes where the file pages were allocated.
The output is shown below:

	# ./lfile
	 allocated on nodes: 2 4 6 0 1 2 6 0 2

There is a single rotor that is used for allocating both file pages &amp; slab
pages.  Writing the file allocates both a data page &amp; a slab page
(buffer_head).  This advances the RR rotor 2 nodes for each page
allocated.

A quick confirmation seems to confirm this is the cause of the uneven
allocation:

	# echo 0 &gt;/dev/cpuset/memory_spread_slab
	# ./lfile
	 allocated on nodes: 6 7 8 9 0 1 2 3 4 5

This patch introduces a second rotor that is used for slab allocations.

Signed-off-by: Jack Steiner &lt;steiner@sgi.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Jack Steiner &lt;steiner@sgi.com&gt;
Cc: Robin Holt &lt;holt@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have observed several workloads running on multi-node systems where
memory is assigned unevenly across the nodes in the system.  There are
numerous reasons for this but one is the round-robin rotor in
cpuset_mem_spread_node().

For example, a simple test that writes a multi-page file will allocate
pages on nodes 0 2 4 6 ...  Odd nodes are skipped.  (Sometimes it
allocates on odd nodes &amp; skips even nodes).

An example is shown below.  The program "lfile" writes a file consisting
of 10 pages.  The program then mmaps the file &amp; uses get_mempolicy(...,
MPOL_F_NODE) to determine the nodes where the file pages were allocated.
The output is shown below:

	# ./lfile
	 allocated on nodes: 2 4 6 0 1 2 6 0 2

There is a single rotor that is used for allocating both file pages &amp; slab
pages.  Writing the file allocates both a data page &amp; a slab page
(buffer_head).  This advances the RR rotor 2 nodes for each page
allocated.

A quick confirmation seems to confirm this is the cause of the uneven
allocation:

	# echo 0 &gt;/dev/cpuset/memory_spread_slab
	# ./lfile
	 allocated on nodes: 6 7 8 9 0 1 2 3 4 5

This patch introduces a second rotor that is used for slab allocations.

Signed-off-by: Jack Steiner &lt;steiner@sgi.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Jack Steiner &lt;steiner@sgi.com&gt;
Cc: Robin Holt &lt;holt@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
