<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/nodemask.h, branch v2.6.36-rc5</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Revert "cpusets: randomize node rotor used in cpuset_mem_spread_node()"</title>
<updated>2010-05-30T16:00:03+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-05-30T16:00:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=35926ff5fba8245bd1c6ac04155048f6f89232b1'/>
<id>35926ff5fba8245bd1c6ac04155048f6f89232b1</id>
<content type='text'>
This reverts commit 0ac0c0d0f837c499afd02a802f9cf52d3027fa3b, which
caused cross-architecture build problems for all the wrong reasons.
IA64 already added its own version of __node_random(), but the fact is,
there is nothing architectural about the function, and the original
commit was just badly done. Revert it, since no fix is forthcoming.

Requested-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 0ac0c0d0f837c499afd02a802f9cf52d3027fa3b, which
caused cross-architecture build problems for all the wrong reasons.
IA64 already added its own version of __node_random(), but the fact is,
there is nothing architectural about the function, and the original
commit was just badly done. Revert it, since no fix is forthcoming.

Requested-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpusets: randomize node rotor used in cpuset_mem_spread_node()</title>
<updated>2010-05-27T16:12:44+00:00</updated>
<author>
<name>Jack Steiner</name>
<email>steiner@sgi.com</email>
</author>
<published>2010-05-26T21:42:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0ac0c0d0f837c499afd02a802f9cf52d3027fa3b'/>
<id>0ac0c0d0f837c499afd02a802f9cf52d3027fa3b</id>
<content type='text'>
Some workloads that create a large number of small files tend to assign
too many pages to node 0 (multi-node systems).  Part of the reason is that
the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at
node 0 for newly created tasks.

This patch changes the rotor to be initialized to a random node number of
the cpuset.

[akpm@linux-foundation.org: fix layout]
[Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
Signed-off-by: Jack Steiner &lt;steiner@sgi.com&gt;
Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Jack Steiner &lt;steiner@sgi.com&gt;
Cc: Robin Holt &lt;holt@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some workloads that create a large number of small files tend to assign
too many pages to node 0 (multi-node systems).  Part of the reason is that
the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at
node 0 for newly created tasks.

This patch changes the rotor to be initialized to a random node number of
the cpuset.

[akpm@linux-foundation.org: fix layout]
[Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
Signed-off-by: Jack Steiner &lt;steiner@sgi.com&gt;
Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Jack Steiner &lt;steiner@sgi.com&gt;
Cc: Robin Holt &lt;holt@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nodemask: fix the declaration of NODEMASK_ALLOC()</title>
<updated>2010-03-12T23:52:38+00:00</updated>
<author>
<name>Miao Xie</name>
<email>miaox@cn.fujitsu.com</email>
</author>
<published>2010-03-10T23:22:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7baab93f9297da3e42a8cecfbf91d5f22f415500'/>
<id>7baab93f9297da3e42a8cecfbf91d5f22f415500</id>
<content type='text'>
we can't declarate two variable at the same scope by NODEMASK_ALLOC().

This patch fixes it.

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
we can't declarate two variable at the same scope by NODEMASK_ALLOC().

This patch fixes it.

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nodemask.h: remove macro any_online_node</title>
<updated>2010-03-06T19:26:31+00:00</updated>
<author>
<name>H Hartley Sweeten</name>
<email>hartleys@visionengravers.com</email>
</author>
<published>2010-03-05T21:42:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=72c3368856c543ace033f6a5b9a3edf1f4043236'/>
<id>72c3368856c543ace033f6a5b9a3edf1f4043236</id>
<content type='text'>
The macro any_online_node() is prone to producing sparse warnings due to
the local symbol 'node'.  Since all the in-tree users are really
requesting the first online node (the mask argument is either
NODE_MASK_ALL or node_online_map) just use the first_online_node macro and
remove the any_online_node macro since there are no users.

Signed-off-by: H Hartley Sweeten &lt;hsweeten@visionengravers.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Acked-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Dave Hansen &lt;dave@linux.vnet.ibm.com&gt;
Cc: Milton Miller &lt;miltonm@bga.com&gt;
Cc: Nathan Fontenot &lt;nfont@austin.ibm.com&gt;
Cc: Geoff Levand &lt;geoffrey.levand@am.sony.com&gt;
Cc: Grant Likely &lt;grant.likely@secretlab.ca&gt;
Cc: J. Bruce Fields &lt;bfields@fieldses.org&gt;
Cc: Neil Brown &lt;neilb@suse.de&gt;
Cc: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Benny Halevy &lt;bhalevy@panasas.com&gt;
Cc: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Cc: Ricardo Labiaga &lt;Ricardo.Labiaga@netapp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The macro any_online_node() is prone to producing sparse warnings due to
the local symbol 'node'.  Since all the in-tree users are really
requesting the first online node (the mask argument is either
NODE_MASK_ALL or node_online_map) just use the first_online_node macro and
remove the any_online_node macro since there are no users.

Signed-off-by: H Hartley Sweeten &lt;hsweeten@visionengravers.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Acked-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Dave Hansen &lt;dave@linux.vnet.ibm.com&gt;
Cc: Milton Miller &lt;miltonm@bga.com&gt;
Cc: Nathan Fontenot &lt;nfont@austin.ibm.com&gt;
Cc: Geoff Levand &lt;geoffrey.levand@am.sony.com&gt;
Cc: Grant Likely &lt;grant.likely@secretlab.ca&gt;
Cc: J. Bruce Fields &lt;bfields@fieldses.org&gt;
Cc: Neil Brown &lt;neilb@suse.de&gt;
Cc: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Benny Halevy &lt;bhalevy@panasas.com&gt;
Cc: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Cc: Ricardo Labiaga &lt;Ricardo.Labiaga@netapp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: add gfp flags for NODEMASK_ALLOC slab allocations</title>
<updated>2009-12-15T16:53:13+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2009-12-15T01:58:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bad44b5be84cf3bb1ff900bec02ee61e1993328c'/>
<id>bad44b5be84cf3bb1ff900bec02ee61e1993328c</id>
<content type='text'>
Objects passed to NODEMASK_ALLOC() are relatively small in size and are
backed by slab caches that are not of large order, traditionally never
greater than PAGE_ALLOC_COSTLY_ORDER.

Thus, using GFP_KERNEL for these allocations on large machines when
CONFIG_NODES_SHIFT &gt; 8 will cause the page allocator to loop endlessly in
the allocation attempt, each time invoking both direct reclaim or the oom
killer.

This is of particular interest when using NODEMASK_ALLOC() from a
mempolicy context (either directly in mm/mempolicy.c or the mempolicy
constrained hugetlb allocations) since the oom killer always kills current
when allocations are constrained by mempolicies.  So for all present use
cases in the kernel, current would end up being oom killed when direct
reclaim fails.  That would allow the NODEMASK_ALLOC() to succeed but
current would have sacrificed itself upon returning.

This patch adds gfp flags to NODEMASK_ALLOC() to pass to kmalloc() on
CONFIG_NODES_SHIFT &gt; 8; this parameter is a nop on other configurations.
All current use cases either directly from hugetlb code or indirectly via
NODEMASK_SCRATCH() union __GFP_NORETRY to avoid direct reclaim and the oom
killer when the slab allocator needs to allocate additional pages.

The side-effect of this change is that all current use cases of either
NODEMASK_ALLOC() or NODEMASK_SCRATCH() need appropriate -ENOMEM handling
when the allocation fails (never for CONFIG_NODES_SHIFT &lt;= 8).  All
current use cases were audited and do have appropriate error handling at
this time.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Cc: Nishanth Aravamudan &lt;nacc@us.ibm.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Adam Litke &lt;agl@us.ibm.com&gt;
Cc: Andy Whitcroft &lt;apw@canonical.com&gt;
Cc: Eric Whitney &lt;eric.whitney@hp.com&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Objects passed to NODEMASK_ALLOC() are relatively small in size and are
backed by slab caches that are not of large order, traditionally never
greater than PAGE_ALLOC_COSTLY_ORDER.

Thus, using GFP_KERNEL for these allocations on large machines when
CONFIG_NODES_SHIFT &gt; 8 will cause the page allocator to loop endlessly in
the allocation attempt, each time invoking both direct reclaim or the oom
killer.

This is of particular interest when using NODEMASK_ALLOC() from a
mempolicy context (either directly in mm/mempolicy.c or the mempolicy
constrained hugetlb allocations) since the oom killer always kills current
when allocations are constrained by mempolicies.  So for all present use
cases in the kernel, current would end up being oom killed when direct
reclaim fails.  That would allow the NODEMASK_ALLOC() to succeed but
current would have sacrificed itself upon returning.

This patch adds gfp flags to NODEMASK_ALLOC() to pass to kmalloc() on
CONFIG_NODES_SHIFT &gt; 8; this parameter is a nop on other configurations.
All current use cases either directly from hugetlb code or indirectly via
NODEMASK_SCRATCH() union __GFP_NORETRY to avoid direct reclaim and the oom
killer when the slab allocator needs to allocate additional pages.

The side-effect of this change is that all current use cases of either
NODEMASK_ALLOC() or NODEMASK_SCRATCH() need appropriate -ENOMEM handling
when the allocation fails (never for CONFIG_NODES_SHIFT &lt;= 8).  All
current use cases were audited and do have appropriate error handling at
this time.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Cc: Nishanth Aravamudan &lt;nacc@us.ibm.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Adam Litke &lt;agl@us.ibm.com&gt;
Cc: Andy Whitcroft &lt;apw@canonical.com&gt;
Cc: Eric Whitney &lt;eric.whitney@hp.com&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hugetlb: factor init_nodemask_of_node()</title>
<updated>2009-12-15T16:53:12+00:00</updated>
<author>
<name>Lee Schermerhorn</name>
<email>lee.schermerhorn@hp.com</email>
</author>
<published>2009-12-15T01:58:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c1e6c8d074ea3621106548654cc244d2edc12ead'/>
<id>c1e6c8d074ea3621106548654cc244d2edc12ead</id>
<content type='text'>
Factor init_nodemask_of_node() out of the nodemask_of_node() macro.

This will be used to populate the huge pages "nodes_allowed" nodemask for
a single node when basing nodes_allowed on a preferred/local mempolicy or
when a persistent huge page pool page count is modified via a per node
sysfs attribute.

Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Reviewed-by: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Cc: Nishanth Aravamudan &lt;nacc@us.ibm.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Adam Litke &lt;agl@us.ibm.com&gt;
Cc: Andy Whitcroft &lt;apw@canonical.com&gt;
Cc: Eric Whitney &lt;eric.whitney@hp.com&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Factor init_nodemask_of_node() out of the nodemask_of_node() macro.

This will be used to populate the huge pages "nodes_allowed" nodemask for
a single node when basing nodes_allowed on a preferred/local mempolicy or
when a persistent huge page pool page count is modified via a per node
sysfs attribute.

Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Reviewed-by: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Cc: Nishanth Aravamudan &lt;nacc@us.ibm.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Adam Litke &lt;agl@us.ibm.com&gt;
Cc: Andy Whitcroft &lt;apw@canonical.com&gt;
Cc: Eric Whitney &lt;eric.whitney@hp.com&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nodemask: make NODEMASK_ALLOC more general</title>
<updated>2009-12-15T16:53:12+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2009-12-15T01:58:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4e7b8a6cef64a4c1f1194f9926f794c2b75ebdd7'/>
<id>4e7b8a6cef64a4c1f1194f9926f794c2b75ebdd7</id>
<content type='text'>
This is a series of patches to provide control over the location of the
allocation and freeing of persistent huge pages on a NUMA platform.
Please consider for merging into mmotm.

This series uses two mechanisms to constrain the nodes from which
persistent huge pages are allocated: 1) the task NUMA mempolicy of the
task modifying a new sysctl "nr_hugepages_mempolicy", based on a
suggestion by Mel Gorman; and 2) a subset of the hugepages hstate sysfs
attributes have been added [in V4] to each node system device under:

	/sys/devices/node/node[0-9]*/hugepages

The per node attibutes allow direct assignment of a huge page count on a
specific node, regardless of the task's mempolicy or cpuset constraints.

This patch:

NODEMASK_ALLOC(x, m) assumes x is a type of struct, which is unnecessary.
It's perfectly reasonable to use this macro to allocate a nodemask_t,
which is anonymous, either dynamically or on the stack depending on
NODES_SHIFT.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Cc: Nishanth Aravamudan &lt;nacc@us.ibm.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Adam Litke &lt;agl@us.ibm.com&gt;
Cc: Andy Whitcroft &lt;apw@canonical.com&gt;
Cc: Eric Whitney &lt;eric.whitney@hp.com&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is a series of patches to provide control over the location of the
allocation and freeing of persistent huge pages on a NUMA platform.
Please consider for merging into mmotm.

This series uses two mechanisms to constrain the nodes from which
persistent huge pages are allocated: 1) the task NUMA mempolicy of the
task modifying a new sysctl "nr_hugepages_mempolicy", based on a
suggestion by Mel Gorman; and 2) a subset of the hugepages hstate sysfs
attributes have been added [in V4] to each node system device under:

	/sys/devices/node/node[0-9]*/hugepages

The per node attibutes allow direct assignment of a huge page count on a
specific node, regardless of the task's mempolicy or cpuset constraints.

This patch:

NODEMASK_ALLOC(x, m) assumes x is a type of struct, which is unnecessary.
It's perfectly reasonable to use this macro to allocate a nodemask_t,
which is anonymous, either dynamically or on the stack depending on
NODES_SHIFT.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Cc: Nishanth Aravamudan &lt;nacc@us.ibm.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Adam Litke &lt;agl@us.ibm.com&gt;
Cc: Andy Whitcroft &lt;apw@canonical.com&gt;
Cc: Eric Whitney &lt;eric.whitney@hp.com&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: make set_mempolicy(MPOL_INTERLEAV) N_HIGH_MEMORY aware</title>
<updated>2009-08-07T17:39:55+00:00</updated>
<author>
<name>KAMEZAWA Hiroyuki</name>
<email>kamezawa.hiroyu@jp.fujitsu.com</email>
</author>
<published>2009-08-06T22:07:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4bfc44958e499af9a73f62201543b3a1f617cfeb'/>
<id>4bfc44958e499af9a73f62201543b3a1f617cfeb</id>
<content type='text'>
At first, init_task's mems_allowed is initialized as this.
 init_task-&gt;mems_allowed == node_state[N_POSSIBLE]

And cpuset's top_cpuset mask is initialized as this
 top_cpuset-&gt;mems_allowed = node_state[N_HIGH_MEMORY]

Before 2.6.29:
policy's mems_allowed is initialized as this.

  1. update tasks-&gt;mems_allowed by its cpuset-&gt;mems_allowed.
  2. policy-&gt;mems_allowed = nodes_and(tasks-&gt;mems_allowed, user's mask)

Updating task's mems_allowed in reference to top_cpuset's one.
cpuset's mems_allowed is aware of N_HIGH_MEMORY, always.

In 2.6.30: After commit 58568d2a8215cb6f55caf2332017d7bdff954e1c
("cpuset,mm: update tasks' mems_allowed in time"), policy's mems_allowed
is initialized as this.

  1. policy-&gt;mems_allowd = nodes_and(task-&gt;mems_allowed, user's mask)

Here, if task is in top_cpuset, task-&gt;mems_allowed is not updated from
init's one.  Assume user excutes command as #numactrl --interleave=all
,....

  policy-&gt;mems_allowd = nodes_and(N_POSSIBLE, ALL_SET_MASK)

Then, policy's mems_allowd can includes a possible node, which has no pgdat.

MPOL's INTERLEAVE just scans nodemask of task-&gt;mems_allowd and access this
directly.

  NODE_DATA(nid)-&gt;zonelist even if NODE_DATA(nid)==NULL

Then, what's we need is making policy-&gt;mems_allowed be aware of
N_HIGH_MEMORY.  This patch does that.  But to do so, extra nodemask will
be on statck.  Because I know cpumask has a new interface of
CPUMASK_ALLOC(), I added it to node.

This patch stands on old behavior.  But I feel this fix itself is just a
Band-Aid.  But to do fundametal fix, we have to take care of memory
hotplug and it takes time.  (task-&gt;mems_allowd should be N_HIGH_MEMORY, I
think.)

mpol_set_nodemask() should be aware of N_HIGH_MEMORY and policy's nodemask
should be includes only online nodes.

In old behavior, this is guaranteed by frequent reference to cpuset's
code.  Now, most of them are removed and mempolicy has to check it by
itself.

To do check, a few nodemask_t will be used for calculating nodemask.  But,
size of nodemask_t can be big and it's not good to allocate them on stack.

Now, cpumask_t has CPUMASK_ALLOC/FREE an easy code for get scratch area.
NODEMASK_ALLOC/FREE shoudl be there.

[akpm@linux-foundation.org: cleanups &amp; tweaks]
Tested-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
At first, init_task's mems_allowed is initialized as this.
 init_task-&gt;mems_allowed == node_state[N_POSSIBLE]

And cpuset's top_cpuset mask is initialized as this
 top_cpuset-&gt;mems_allowed = node_state[N_HIGH_MEMORY]

Before 2.6.29:
policy's mems_allowed is initialized as this.

  1. update tasks-&gt;mems_allowed by its cpuset-&gt;mems_allowed.
  2. policy-&gt;mems_allowed = nodes_and(tasks-&gt;mems_allowed, user's mask)

Updating task's mems_allowed in reference to top_cpuset's one.
cpuset's mems_allowed is aware of N_HIGH_MEMORY, always.

In 2.6.30: After commit 58568d2a8215cb6f55caf2332017d7bdff954e1c
("cpuset,mm: update tasks' mems_allowed in time"), policy's mems_allowed
is initialized as this.

  1. policy-&gt;mems_allowd = nodes_and(task-&gt;mems_allowed, user's mask)

Here, if task is in top_cpuset, task-&gt;mems_allowed is not updated from
init's one.  Assume user excutes command as #numactrl --interleave=all
,....

  policy-&gt;mems_allowd = nodes_and(N_POSSIBLE, ALL_SET_MASK)

Then, policy's mems_allowd can includes a possible node, which has no pgdat.

MPOL's INTERLEAVE just scans nodemask of task-&gt;mems_allowd and access this
directly.

  NODE_DATA(nid)-&gt;zonelist even if NODE_DATA(nid)==NULL

Then, what's we need is making policy-&gt;mems_allowed be aware of
N_HIGH_MEMORY.  This patch does that.  But to do so, extra nodemask will
be on statck.  Because I know cpumask has a new interface of
CPUMASK_ALLOC(), I added it to node.

This patch stands on old behavior.  But I feel this fix itself is just a
Band-Aid.  But to do fundametal fix, we have to take care of memory
hotplug and it takes time.  (task-&gt;mems_allowd should be N_HIGH_MEMORY, I
think.)

mpol_set_nodemask() should be aware of N_HIGH_MEMORY and policy's nodemask
should be includes only online nodes.

In old behavior, this is guaranteed by frequent reference to cpuset's
code.  Now, most of them are removed and mempolicy has to check it by
itself.

To do check, a few nodemask_t will be used for calculating nodemask.  But,
size of nodemask_t can be big and it's not good to allocate them on stack.

Now, cpumask_t has CPUMASK_ALLOC/FREE an easy code for get scratch area.
NODEMASK_ALLOC/FREE shoudl be there.

[akpm@linux-foundation.org: cleanups &amp; tweaks]
Tested-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>page allocator: use a pre-calculated value instead of num_online_nodes() in fast paths</title>
<updated>2009-06-17T02:47:35+00:00</updated>
<author>
<name>Christoph Lameter</name>
<email>cl@linux-foundation.org</email>
</author>
<published>2009-06-16T22:32:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=62bc62a873116805774ffd37d7f86aa4faa832b1'/>
<id>62bc62a873116805774ffd37d7f86aa4faa832b1</id>
<content type='text'>
num_online_nodes() is called in a number of places but most often by the
page allocator when deciding whether the zonelist needs to be filtered
based on cpusets or the zonelist cache.  This is actually a heavy function
and touches a number of cache lines.

This patch stores the number of online nodes at boot time and updates the
value when nodes get onlined and offlined.  The value is then used in a
number of important paths in place of num_online_nodes().

[rientjes@google.com: do not override definition of node_set_online() with macro]
Signed-off-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Dave Hansen &lt;dave@linux.vnet.ibm.com&gt;
Cc: Lee Schermerhorn &lt;Lee.Schermerhorn@hp.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
num_online_nodes() is called in a number of places but most often by the
page allocator when deciding whether the zonelist needs to be filtered
based on cpusets or the zonelist cache.  This is actually a heavy function
and touches a number of cache lines.

This patch stores the number of online nodes at boot time and updates the
value when nodes get onlined and offlined.  The value is then used in a
number of important paths in place of num_online_nodes().

[rientjes@google.com: do not override definition of node_set_online() with macro]
Signed-off-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Dave Hansen &lt;dave@linux.vnet.ibm.com&gt;
Cc: Lee Schermerhorn &lt;Lee.Schermerhorn@hp.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mempolicy: add bitmap_onto() and bitmap_fold() operations</title>
<updated>2008-04-28T15:58:19+00:00</updated>
<author>
<name>Paul Jackson</name>
<email>pj@sgi.com</email>
</author>
<published>2008-04-28T09:12:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7ea931c9fc80c4d0a4306c30ec92eb0f1d922a0b'/>
<id>7ea931c9fc80c4d0a4306c30ec92eb0f1d922a0b</id>
<content type='text'>
The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(),
with the usual cpumask and nodemask wrappers.

The bitmap_onto() operator computes one bitmap relative to another.  If the
n-th bit in the origin mask is set, then the m-th bit of the destination mask
will be set, where m is the position of the n-th set bit in the relative mask.

The bitmap_fold() operator folds a bitmap into a second that has bit m set iff
the input bitmap has some bit n set, where m == n mod sz, for the specified sz
value.

There are two substantive changes between this patch and its
predecessor bitmap_relative:
 1) Renamed bitmap_relative() to be bitmap_onto().
 2) Added bitmap_fold().

The essential motivation for bitmap_onto() is to provide a mechanism for
converting a cpuset-relative CPU or Node mask to an absolute mask.  Cpuset
relative masks are written as if the current task were in a cpuset whose CPUs
or Nodes were just the consecutive ones numbered 0..N-1, for some N.  The
bitmap_onto() operator is provided in anticipation of adding support for the
first such cpuset relative mask, by the mbind() and set_mempolicy() system
calls, using a planned flag of MPOL_F_RELATIVE_NODES.  These bitmap operators
(and their nodemask wrappers, in particular) will be used in code that
converts the user specified cpuset relative memory policy to a specific system
node numbered policy, given the current mems_allowed of the tasks cpuset.

Such cpuset relative mempolicies will address two deficiencies
of the existing interface between cpusets and mempolicies:
 1) A task cannot at present reliably establish a cpuset
    relative mempolicy because there is an essential race
    condition, in that the tasks cpuset may be changed in
    between the time the task can query its cpuset placement,
    and the time the task can issue the applicable mbind or
    set_memplicy system call.
 2) A task cannot at present establish what cpuset relative
    mempolicy it would like to have, if it is in a smaller
    cpuset than it might have mempolicy preferences for,
    because the existing interface only allows specifying
    mempolicies for nodes currently allowed by the cpuset.

Cpuset relative mempolicies are useful for tasks that don't distinguish
particularly between one CPU or Node and another, but only between how many of
each are allowed, and the proper placement of threads and memory pages on the
various CPUs and Nodes available.

The motivation for the added bitmap_fold() can be seen in the following
example.

Let's say an application has specified some mempolicies that presume 16 memory
nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset
relative) nodes 12-15.  Then lets say that application is crammed into a
cpuset that only has 8 memory nodes, 0-7.  If one just uses bitmap_onto(),
this mempolicy, mapped to that cpuset, would ignore the requested relative
nodes above 7, leaving it empty of nodes.  That's not good; better to fold the
higher nodes down, so that some nodes are included in the resulting mapped
mempolicy.  In this case, the mempolicy nodes 12-15 are taken modulo 8 (the
weight of the mems_allowed of the confining cpuset), resulting in a mempolicy
specifying nodes 4-7.

Signed-off-by: Paul Jackson &lt;pj@sgi.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Cc: &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: &lt;ray-lk@madrabbit.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(),
with the usual cpumask and nodemask wrappers.

The bitmap_onto() operator computes one bitmap relative to another.  If the
n-th bit in the origin mask is set, then the m-th bit of the destination mask
will be set, where m is the position of the n-th set bit in the relative mask.

The bitmap_fold() operator folds a bitmap into a second that has bit m set iff
the input bitmap has some bit n set, where m == n mod sz, for the specified sz
value.

There are two substantive changes between this patch and its
predecessor bitmap_relative:
 1) Renamed bitmap_relative() to be bitmap_onto().
 2) Added bitmap_fold().

The essential motivation for bitmap_onto() is to provide a mechanism for
converting a cpuset-relative CPU or Node mask to an absolute mask.  Cpuset
relative masks are written as if the current task were in a cpuset whose CPUs
or Nodes were just the consecutive ones numbered 0..N-1, for some N.  The
bitmap_onto() operator is provided in anticipation of adding support for the
first such cpuset relative mask, by the mbind() and set_mempolicy() system
calls, using a planned flag of MPOL_F_RELATIVE_NODES.  These bitmap operators
(and their nodemask wrappers, in particular) will be used in code that
converts the user specified cpuset relative memory policy to a specific system
node numbered policy, given the current mems_allowed of the tasks cpuset.

Such cpuset relative mempolicies will address two deficiencies
of the existing interface between cpusets and mempolicies:
 1) A task cannot at present reliably establish a cpuset
    relative mempolicy because there is an essential race
    condition, in that the tasks cpuset may be changed in
    between the time the task can query its cpuset placement,
    and the time the task can issue the applicable mbind or
    set_memplicy system call.
 2) A task cannot at present establish what cpuset relative
    mempolicy it would like to have, if it is in a smaller
    cpuset than it might have mempolicy preferences for,
    because the existing interface only allows specifying
    mempolicies for nodes currently allowed by the cpuset.

Cpuset relative mempolicies are useful for tasks that don't distinguish
particularly between one CPU or Node and another, but only between how many of
each are allowed, and the proper placement of threads and memory pages on the
various CPUs and Nodes available.

The motivation for the added bitmap_fold() can be seen in the following
example.

Let's say an application has specified some mempolicies that presume 16 memory
nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset
relative) nodes 12-15.  Then lets say that application is crammed into a
cpuset that only has 8 memory nodes, 0-7.  If one just uses bitmap_onto(),
this mempolicy, mapped to that cpuset, would ignore the requested relative
nodes above 7, leaving it empty of nodes.  That's not good; better to fold the
higher nodes down, so that some nodes are included in the resulting mapped
mempolicy.  In this case, the mempolicy nodes 12-15 are taken modulo 8 (the
weight of the mems_allowed of the confining cpuset), resulting in a mempolicy
specifying nodes 4-7.

Signed-off-by: Paul Jackson &lt;pj@sgi.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Cc: &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: &lt;ray-lk@madrabbit.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
