<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/mm/mempolicy.c, branch v4.10</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>mm/mempolicy.c: do not put mempolicy before using its nodemask</title>
<updated>2017-01-25T00:26:14+00:00</updated>
<author>
<name>Vlastimil Babka</name>
<email>vbabka@suse.cz</email>
</author>
<published>2017-01-24T23:18:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d51e9894d27492783fc6d1b489070b4ba66ce969'/>
<id>d51e9894d27492783fc6d1b489070b4ba66ce969</id>
<content type='text'>
Since commit be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to
alloc_pages_vma") alloc_pages_vma() can potentially free a mempolicy by
mpol_cond_put() before accessing the embedded nodemask by
__alloc_pages_nodemask().  The commit log says it's so "we can use a
single exit path within the function" but that's clearly wrong.  We can
still do that when doing mpol_cond_put() after the allocation attempt.

Make sure the mempolicy is not freed prematurely, otherwise
__alloc_pages_nodemask() can end up using a bogus nodemask, which could
lead e.g.  to premature OOM.

Fixes: be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vma")
Link: http://lkml.kernel.org/r/20170118141124.8345-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.0+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to
alloc_pages_vma") alloc_pages_vma() can potentially free a mempolicy by
mpol_cond_put() before accessing the embedded nodemask by
__alloc_pages_nodemask().  The commit log says it's so "we can use a
single exit path within the function" but that's clearly wrong.  We can
still do that when doing mpol_cond_put() after the allocation attempt.

Make sure the mempolicy is not freed prematurely, otherwise
__alloc_pages_nodemask() can end up using a bogus nodemask, which could
lead e.g.  to premature OOM.

Fixes: be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vma")
Link: http://lkml.kernel.org/r/20170118141124.8345-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.0+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Replace &lt;asm/uaccess.h&gt; with &lt;linux/uaccess.h&gt; globally</title>
<updated>2016-12-24T19:46:01+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-12-24T19:46:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7c0f6ba682b9c7632072ffbedf8d328c8f3c42ba'/>
<id>7c0f6ba682b9c7632072ffbedf8d328c8f3c42ba</id>
<content type='text'>
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*&lt;asm/uaccess.h&gt;'
  sed -i -e "s!$PATT!#include &lt;linux/uaccess.h&gt;!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*&lt;asm/uaccess.h&gt;'
  sed -i -e "s!$PATT!#include &lt;linux/uaccess.h&gt;!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/mempolicy.c: forbid static or relative flags for local NUMA mode</title>
<updated>2016-12-13T02:55:07+00:00</updated>
<author>
<name>Piotr Kwapulinski</name>
<email>kwapulinski.piotr@gmail.com</email>
</author>
<published>2016-12-13T00:42:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8d303e44e99c2ae5cad31f3dded10a572b0fd4d7'/>
<id>8d303e44e99c2ae5cad31f3dded10a572b0fd4d7</id>
<content type='text'>
The MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES flags are irrelevant
when setting them for MPOL_LOCAL NUMA memory policy via set_mempolicy or
mbind.

Return the "invalid argument" from set_mempolicy and mbind whenever any
of these flags is passed along with MPOL_LOCAL.

It is consistent with MPOL_PREFERRED passed with empty nodemask.

It slightly shortens the execution time in paths where these flags are
used e.g.  when trying to rebind the NUMA nodes for changes in cgroups
cpuset mems (mpol_rebind_preferred()) or when just printing the mempolicy
structure (/proc/PID/numa_maps).  Isolated tests done.

Link: http://lkml.kernel.org/r/20161027163037.4089-1-kwapulinski.piotr@gmail.com
Signed-off-by: Piotr Kwapulinski &lt;kwapulinski.piotr@gmail.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Liang Chen &lt;liangchen.linux@gmail.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Nathan Zimmer &lt;nzimmer@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES flags are irrelevant
when setting them for MPOL_LOCAL NUMA memory policy via set_mempolicy or
mbind.

Return the "invalid argument" from set_mempolicy and mbind whenever any
of these flags is passed along with MPOL_LOCAL.

It is consistent with MPOL_PREFERRED passed with empty nodemask.

It slightly shortens the execution time in paths where these flags are
used e.g.  when trying to rebind the NUMA nodes for changes in cgroups
cpuset mems (mpol_rebind_preferred()) or when just printing the mempolicy
structure (/proc/PID/numa_maps).  Isolated tests done.

Link: http://lkml.kernel.org/r/20161027163037.4089-1-kwapulinski.piotr@gmail.com
Signed-off-by: Piotr Kwapulinski &lt;kwapulinski.piotr@gmail.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Liang Chen &lt;liangchen.linux@gmail.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Nathan Zimmer &lt;nzimmer@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, mempolicy: clean up __GFP_THISNODE confusion in policy_zonelist</title>
<updated>2016-12-13T02:55:07+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2016-12-13T00:42:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6d8409580bee356ce418dcb94260b24dda639934'/>
<id>6d8409580bee356ce418dcb94260b24dda639934</id>
<content type='text'>
__GFP_THISNODE is documented to enforce the allocation to be satisified
from the requested node with no fallbacks or placement policy
enforcements.  policy_zonelist seemingly breaks this semantic if the
current policy is MPOL_MBIND and instead of taking the node it will
fallback to the first node in the mask if the requested one is not in
the mask.  This is confusing to say the least because it fact we
shouldn't ever go that path.  First tasks shouldn't be scheduled on CPUs
with nodes outside of their mempolicy binding.  And secondly
policy_zonelist is called only from 3 places:

 - huge_zonelist - never should do __GFP_THISNODE when going this path

 - alloc_pages_vma - which shouldn't depend on __GFP_THISNODE either

 - alloc_pages_current - which uses default_policy id __GFP_THISNODE is
   used

So we shouldn't even need to care about this possibility and can drop
the confusing code.  Let's keep a WARN_ON_ONCE in place to catch
potential users and fix them up properly (aka use a different allocation
function which ignores mempolicy).

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20161013125958.32155-1-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
__GFP_THISNODE is documented to enforce the allocation to be satisified
from the requested node with no fallbacks or placement policy
enforcements.  policy_zonelist seemingly breaks this semantic if the
current policy is MPOL_MBIND and instead of taking the node it will
fallback to the first node in the mask if the requested one is not in
the mask.  This is confusing to say the least because it fact we
shouldn't ever go that path.  First tasks shouldn't be scheduled on CPUs
with nodes outside of their mempolicy binding.  And secondly
policy_zonelist is called only from 3 places:

 - huge_zonelist - never should do __GFP_THISNODE when going this path

 - alloc_pages_vma - which shouldn't depend on __GFP_THISNODE either

 - alloc_pages_current - which uses default_policy id __GFP_THISNODE is
   used

So we shouldn't even need to care about this possibility and can drop
the confusing code.  Let's keep a WARN_ON_ONCE in place to catch
potential users and fix them up properly (aka use a different allocation
function which ignores mempolicy).

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20161013125958.32155-1-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, thp: avoid unlikely branches for split_huge_pmd</title>
<updated>2016-12-13T02:55:07+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2016-12-13T00:42:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fd60775aea802beef444881ddfa111a4b73b1bbc'/>
<id>fd60775aea802beef444881ddfa111a4b73b1bbc</id>
<content type='text'>
While doing MADV_DONTNEED on a large area of thp memory, I noticed we
encountered many unlikely() branches in profiles for each backing
hugepage.  This is because zap_pmd_range() would call split_huge_pmd(),
which rechecked the conditions that were already validated, but as part
of an unlikely() branch.

Avoid the unlikely() branch when in a context where pmd is known to be
good for __split_huge_pmd() directly.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1610181600300.84525@chino.kir.corp.google.com
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
While doing MADV_DONTNEED on a large area of thp memory, I noticed we
encountered many unlikely() branches in profiles for each backing
hugepage.  This is because zap_pmd_range() would call split_huge_pmd(),
which rechecked the conditions that were already validated, but as part
of an unlikely() branch.

Avoid the unlikely() branch when in a context where pmd is known to be
good for __split_huge_pmd() directly.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1610181600300.84525@chino.kir.corp.google.com
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: replace get_user_pages() write/force parameters with gup_flags</title>
<updated>2016-10-19T15:11:43+00:00</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lstoakes@gmail.com</email>
</author>
<published>2016-10-13T00:20:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=768ae309a96103ed02eb1e111e838c87854d8b51'/>
<id>768ae309a96103ed02eb1e111e838c87854d8b51</id>
<content type='text'>
This removes the 'write' and 'force' from get_user_pages() and replaces
them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
as use of this flag can result in surprising behaviour (and hence bugs)
within the mm subsystem.

Signed-off-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Jesper Nilsson &lt;jesper.nilsson@axis.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This removes the 'write' and 'force' from get_user_pages() and replaces
them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
as use of this flag can result in surprising behaviour (and hence bugs)
within the mm subsystem.

Signed-off-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Jesper Nilsson &lt;jesper.nilsson@axis.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: use zonelist name instead of using hardcoded index</title>
<updated>2016-10-08T01:46:28+00:00</updated>
<author>
<name>Aneesh Kumar K.V</name>
<email>aneesh.kumar@linux.vnet.ibm.com</email>
</author>
<published>2016-10-07T23:59:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c9634cf012321243ee8e4ea0fb0709904cd58395'/>
<id>c9634cf012321243ee8e4ea0fb0709904cd58395</id>
<content type='text'>
Use the existing enums instead of hardcoded index when looking at the
zonelist.  This makes it more readable.  No functionality change by this
patch.

Link: http://lkml.kernel.org/r/1472227078-24852-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Reviewed-by: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use the existing enums instead of hardcoded index when looking at the
zonelist.  This makes it more readable.  No functionality change by this
patch.

Link: http://lkml.kernel.org/r/1472227078-24852-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Reviewed-by: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, mempolicy: task-&gt;mempolicy must be NULL before dropping final reference</title>
<updated>2016-09-02T00:52:01+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2016-09-01T23:15:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c11600e4fed67ae4cd6a8096936afd445410e8ed'/>
<id>c11600e4fed67ae4cd6a8096936afd445410e8ed</id>
<content type='text'>
KASAN allocates memory from the page allocator as part of
kmem_cache_free(), and that can reference current-&gt;mempolicy through any
number of allocation functions.  It needs to be NULL'd out before the
final reference is dropped to prevent a use-after-free bug:

	BUG: KASAN: use-after-free in alloc_pages_current+0x363/0x370 at addr ffff88010b48102c
	CPU: 0 PID: 15425 Comm: trinity-c2 Not tainted 4.8.0-rc2+ #140
	...
	Call Trace:
		dump_stack
		kasan_object_err
		kasan_report_error
		__asan_report_load2_noabort
		alloc_pages_current	&lt;-- use after free
		depot_save_stack
		save_stack
		kasan_slab_free
		kmem_cache_free
		__mpol_put		&lt;-- free
		do_exit

This patch sets current-&gt;mempolicy to NULL before dropping the final
reference.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1608301442180.63329@chino.kir.corp.google.com
Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Reported-by: Vegard Nossum &lt;vegard.nossum@oracle.com&gt;
Acked-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.6+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
KASAN allocates memory from the page allocator as part of
kmem_cache_free(), and that can reference current-&gt;mempolicy through any
number of allocation functions.  It needs to be NULL'd out before the
final reference is dropped to prevent a use-after-free bug:

	BUG: KASAN: use-after-free in alloc_pages_current+0x363/0x370 at addr ffff88010b48102c
	CPU: 0 PID: 15425 Comm: trinity-c2 Not tainted 4.8.0-rc2+ #140
	...
	Call Trace:
		dump_stack
		kasan_object_err
		kasan_report_error
		__asan_report_load2_noabort
		alloc_pages_current	&lt;-- use after free
		depot_save_stack
		save_stack
		kasan_slab_free
		kmem_cache_free
		__mpol_put		&lt;-- free
		do_exit

This patch sets current-&gt;mempolicy to NULL before dropping the final
reference.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1608301442180.63329@chino.kir.corp.google.com
Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Reported-by: Vegard Nossum &lt;vegard.nossum@oracle.com&gt;
Acked-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.6+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, vmscan: move LRU lists to node</title>
<updated>2016-07-28T23:07:41+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2016-07-28T22:45:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=599d0c954f91d0689c9bb421b5bc04ea02437a41'/>
<id>599d0c954f91d0689c9bb421b5bc04ea02437a41</id>
<content type='text'>
This moves the LRU lists from the zone to the node and related data such
as counters, tracing, congestion tracking and writeback tracking.

Unfortunately, due to reclaim and compaction retry logic, it is
necessary to account for the number of LRU pages on both zone and node
logic.  Most reclaim logic is based on the node counters but the retry
logic uses the zone counters which do not distinguish inactive and
active sizes.  It would be possible to leave the LRU counters on a
per-zone basis but it's a heavier calculation across multiple cache
lines that is much more frequent than the retry checks.

Other than the LRU counters, this is mostly a mechanical patch but note
that it introduces a number of anomalies.  For example, the scans are
per-zone but using per-node counters.  We also mark a node as congested
when a zone is congested.  This causes weird problems that are fixed
later but is easier to review.

In the event that there is excessive overhead on 32-bit systems due to
the nodes being on LRU then there are two potential solutions

1. Long-term isolation of highmem pages when reclaim is lowmem

   When pages are skipped, they are immediately added back onto the LRU
   list. If lowmem reclaim persisted for long periods of time, the same
   highmem pages get continually scanned. The idea would be that lowmem
   keeps those pages on a separate list until a reclaim for highmem pages
   arrives that splices the highmem pages back onto the LRU. It potentially
   could be implemented similar to the UNEVICTABLE list.

   That would reduce the skip rate with the potential corner case is that
   highmem pages have to be scanned and reclaimed to free lowmem slab pages.

2. Linear scan lowmem pages if the initial LRU shrink fails

   This will break LRU ordering but may be preferable and faster during
   memory pressure than skipping LRU pages.

Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Hillf Danton &lt;hillf.zj@alibaba-inc.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This moves the LRU lists from the zone to the node and related data such
as counters, tracing, congestion tracking and writeback tracking.

Unfortunately, due to reclaim and compaction retry logic, it is
necessary to account for the number of LRU pages on both zone and node
logic.  Most reclaim logic is based on the node counters but the retry
logic uses the zone counters which do not distinguish inactive and
active sizes.  It would be possible to leave the LRU counters on a
per-zone basis but it's a heavier calculation across multiple cache
lines that is much more frequent than the retry checks.

Other than the LRU counters, this is mostly a mechanical patch but note
that it introduces a number of anomalies.  For example, the scans are
per-zone but using per-node counters.  We also mark a node as congested
when a zone is congested.  This causes weird problems that are fixed
later but is easier to review.

In the event that there is excessive overhead on 32-bit systems due to
the nodes being on LRU then there are two potential solutions

1. Long-term isolation of highmem pages when reclaim is lowmem

   When pages are skipped, they are immediately added back onto the LRU
   list. If lowmem reclaim persisted for long periods of time, the same
   highmem pages get continually scanned. The idea would be that lowmem
   keeps those pages on a separate list until a reclaim for highmem pages
   arrives that splices the highmem pages back onto the LRU. It potentially
   could be implemented similar to the UNEVICTABLE list.

   That would reduce the skip rate with the potential corner case is that
   highmem pages have to be scanned and reclaimed to free lowmem slab pages.

2. Linear scan lowmem pages if the initial LRU shrink fails

   This will break LRU ordering but may be preferable and faster during
   memory pressure than skipping LRU pages.

Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Hillf Danton &lt;hillf.zj@alibaba-inc.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>shmem: add huge pages support</title>
<updated>2016-07-26T23:19:19+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-07-26T22:26:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=800d8c63b2e989c2e349632d1648119bf5862f01'/>
<id>800d8c63b2e989c2e349632d1648119bf5862f01</id>
<content type='text'>
Here's basic implementation of huge pages support for shmem/tmpfs.

It's all pretty streight-forward:

  - shmem_getpage() allcoates huge page if it can and try to inserd into
    radix tree with shmem_add_to_page_cache();

  - shmem_add_to_page_cache() puts the page onto radix-tree if there's
    space for it;

  - shmem_undo_range() removes huge pages, if it fully within range.
    Partial truncate of huge pages zero out this part of THP.

    This have visible effect on fallocate(FALLOC_FL_PUNCH_HOLE)
    behaviour. As we don't really create hole in this case,
    lseek(SEEK_HOLE) may have inconsistent results depending what
    pages happened to be allocated.

  - no need to change shmem_fault: core-mm will map an compound page as
    huge if VMA is suitable;

Link: http://lkml.kernel.org/r/1466021202-61880-30-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Here's basic implementation of huge pages support for shmem/tmpfs.

It's all pretty streight-forward:

  - shmem_getpage() allcoates huge page if it can and try to inserd into
    radix tree with shmem_add_to_page_cache();

  - shmem_add_to_page_cache() puts the page onto radix-tree if there's
    space for it;

  - shmem_undo_range() removes huge pages, if it fully within range.
    Partial truncate of huge pages zero out this part of THP.

    This have visible effect on fallocate(FALLOC_FL_PUNCH_HOLE)
    behaviour. As we don't really create hole in this case,
    lseek(SEEK_HOLE) may have inconsistent results depending what
    pages happened to be allocated.

  - no need to change shmem_fault: core-mm will map an compound page as
    huge if VMA is suitable;

Link: http://lkml.kernel.org/r/1466021202-61880-30-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
