linux-toradex.git/include/linux/mempolicy.h, branch v4.11-rc3

mm: disable numa migration faults for dax vmas

2016-12-13T02:55:07+00:00

Mark dax vmas as not migratable to exclude them from task_numa_work().
This is especially relevant for device-dax which wants to ensure
predictable access latency and not incur periodic faults.
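
As a rough userspace illustration of the idea (not the kernel source), the check can be modelled as below. The struct, the flag values and the function names are hypothetical stand-ins: the `is_dax` field plays the role of a "is this vma dax?" test, and the counting loop stands in for the way the NUMA scanner skips vmas that are not migratable.

```c
#include <stdbool.h>

/* Illustrative flag values for this model only; not the kernel's definitions. */
#define MODEL_VM_IO     0x1UL
#define MODEL_VM_PFNMAP 0x2UL

/* Hypothetical stand-in for struct vm_area_struct. */
struct vma_model {
	unsigned long vm_flags;
	bool is_dax;		/* assumed result of a "vma is dax" test */
};

/* Model of a migratability check that excludes dax vmas. */
bool vma_migratable_model(const struct vma_model *vma)
{
	/* Special mappings are assumed to be excluded already. */
	if (vma->vm_flags & (MODEL_VM_IO | MODEL_VM_PFNMAP))
		return false;

	/* dax vmas: never migrate, so no periodic NUMA hinting faults. */
	if (vma->is_dax)
		return false;

	return true;
}

/* Model of a scanner loop: count the vmas it would actually visit. */
int count_numa_scan_candidates(const struct vma_model *vmas, int n)
{
	int i, count = 0;

	for (i = 0; i < n; i++) {
		if (!vma_migratable_model(&vmas[i]))
			continue;	/* skipped, as a dax vma would be */
		count++;
	}
	return count;
}
```

With one ordinary vma, one dax vma and one I/O mapping, only the ordinary vma is counted as a scan candidate in this model.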

[akpm@linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/147892450132.22062.16875659431109209179.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams 
Reported-by: Aneesh Kumar K.V 
Cc: Michal Hocko 
Cc: Vlastimil Babka 
Cc: "Kirill A. Shutemov" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, mempolicy: task->mempolicy must be NULL before dropping final reference

2016-09-02T00:52:01+00:00

KASAN allocates memory from the page allocator as part of
kmem_cache_free(), and that can reference current->mempolicy through any
number of allocation functions.  It needs to be NULL'd out before the
final reference is dropped to prevent a use-after-free bug:

	BUG: KASAN: use-after-free in alloc_pages_current+0x363/0x370 at addr ffff88010b48102c
	CPU: 0 PID: 15425 Comm: trinity-c2 Not tainted 4.8.0-rc2+ #140
	...
	Call Trace:
		dump_stack
		kasan_object_err
		kasan_report_error
		__asan_report_load2_noabort
		alloc_pages_current	<-- use after free
		depot_save_stack
		save_stack
		kasan_slab_free
		kmem_cache_free
		__mpol_put		<-- free
		do_exit

This patch sets current->mempolicy to NULL before dropping the final
reference.
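
The ordering can be sketched in userspace C as follows. This is only a model under stated assumptions: `mempolicy_model`, `task_model`, `current_model` and the "free hook" flag are hypothetical, and the hook merely imitates a free path that looks at the current task's policy the way the KASAN stack save does in the trace above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct mempolicy and struct task_struct. */
struct mempolicy_model { int refcnt; };
struct task_model { struct mempolicy_model *mempolicy; };

/* Stands in for `current` in this model. */
struct task_model *current_model;

/* Set to 1 if the free path could still reach the policy being freed. */
int free_hook_saw_stale_policy;

/* Drop one reference; free the policy when the last one goes away. */
void mpol_put_model(struct mempolicy_model *pol)
{
	if (!pol || --pol->refcnt > 0)
		return;

	/*
	 * Imitates work done as part of the free (an allocation that
	 * consults the current task's policy). If the task still points
	 * at `pol` here, that would be the use-after-free.
	 */
	if (current_model && current_model->mempolicy == pol)
		free_hook_saw_stale_policy = 1;

	free(pol);
}

/* Exit-time teardown: clear the pointer first, then drop the reference. */
void exit_mempolicy_model(struct task_model *tsk)
{
	struct mempolicy_model *pol = tsk->mempolicy;

	tsk->mempolicy = NULL;	/* must happen before the final put */
	mpol_put_model(pol);
}
```

If the two statements in `exit_mempolicy_model()` were swapped, the hook in this model would observe the stale pointer.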

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1608301442180.63329@chino.kir.corp.google.com
Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
Signed-off-by: David Rientjes 
Reported-by: Vegard Nossum 
Acked-by: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Dmitry Vyukov 
Cc: 	[4.6+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

tmpfs: preliminary minor tidyups

2016-05-20T02:12:14+00:00

Make a few cleanups in mm/shmem.c, before going on to complicate it.

shmem_alloc_page() will become more complicated: we can't afford to
have that complication duplicated between a CONFIG_NUMA version and a
!CONFIG_NUMA version, so rearrange the #ifdef'ery there to yield a
single shmem_swapin() and a single shmem_alloc_page().

Yes, it's a shame to inflict the horrid pseudo-vma on non-NUMA
configurations, but eliminating it is a larger cleanup: I have an
alloc_pages_mpol() patchset not yet ready - mpol handling is subtle and
bug-prone, and changed yet again since my last version.

Move __SetPageLocked, __SetPageSwapBacked from shmem_getpage_gfp() to
shmem_alloc_page(): that SwapBacked flag will be useful in future, to
help to distinguish different cases appropriately.

And the SGP_DIRTY variant of SGP_CACHE is hard to understand and of
little use (IIRC it dates back to when shmem_getpage() returned the page
unlocked): kill it and do the necessary in shmem_file_read_iter().

But an arm64 build then complained that info may be uninitialized (where
shmem_getpage_gfp() deletes a freshly alloced page beyond eof), and
advancing to an "sgp <= SGP_CACHE" test jogged it back to reality.

Signed-off-by: Hugh Dickins 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Andres Lagar-Cavilla 
Cc: Yang Shi 
Cc: Ning Qu 
Cc: Mel Gorman 
Cc: Konstantin Khlebnikov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/mempolicy.c: vma_migratable() can return bool

2016-05-20T02:12:14+00:00

Make vma_migratable() return bool, since this function only ever uses
one or zero as its return value.

Signed-off-by: Yaowei Bai 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/mempolicy.c: convert the shared_policy lock to a rwlock

2016-01-15T00:00:49+00:00

When running the SPECint_rate gcc on some very large boxes it was
noticed that the system was spending lots of time in
mpol_shared_policy_lookup().  The gamess benchmark also shows it, and is
what I mostly used to chase down the issue since I found its setup
easier.

To be clear the binaries were on tmpfs because of disk I/O requirements.
We then used text replication to avoid icache misses and having all the
copies banging on the memory where the instruction code resides.  This
results in us hitting a bottleneck in mpol_shared_policy_lookup() since
lookup is serialised by the shared_policy lock.

I have only reproduced this on very large (3k+ cores) boxes.  The
problem starts showing up at just a few hundred ranks, getting worse
until it threatens to livelock once it gets large enough.  For example
on the gamess benchmark at 128 ranks this area consumes only ~1% of
time, at 512 ranks it consumes nearly 13%, and at 2k ranks it is over
90%.

To alleviate the contention in this area I converted the spinlock to an
rwlock.  This allows a large number of lookups to happen simultaneously.
The results were quite good, reducing this consumption at max ranks to
around 2%.

[akpm@linux-foundation.org: tidy up code comments]
Signed-off-by: Nathan Zimmer 
Acked-by: David Rientjes 
Acked-by: Vlastimil Babka 
Cc: Nadia Yvette Chambers 
Cc: Naoya Horiguchi 
Cc: Mel Gorman 
Cc: "Aneesh Kumar K.V" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mempolicy: unexport get_vma_policy() and remove its "task" arg

2014-10-10T02:25:56+00:00

- get_vma_policy(task) is not safe if task != current, remove this
  argument.

- get_vma_policy() no longer has callers outside of mempolicy.c,
  make it static.

Signed-off-by: Oleg Nesterov 
Cc: KAMEZAWA Hiroyuki 
Cc: David Rientjes 
Cc: KOSAKI Motohiro 
Cc: Alexander Viro 
Cc: Cyrill Gorcunov 
Cc: "Eric W. Biederman" 
Cc: "Kirill A. Shutemov" 
Cc: Peter Zijlstra 
Cc: Hugh Dickins 
Cc: Andi Kleen 
Cc: Naoya Horiguchi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mempolicy: introduce __get_vma_policy(), export get_task_policy()

2014-10-10T02:25:56+00:00

Extract the code which looks for vma's policy from get_vma_policy()
into the new helper, __get_vma_policy(). Export get_task_policy().
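
The split can be pictured with the userspace model below. It is a sketch under assumptions: the types and function names are hypothetical, and the fallback order shown (vma policy first, then the current task's policy, then a default) is an assumption made for illustration rather than something stated in this message.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct mempolicy_model { int mode; };
struct vma_model { struct mempolicy_model *vm_policy; };
struct task_model { struct mempolicy_model *mempolicy; };

/* Assumed system-wide default used when nothing else is set. */
struct mempolicy_model default_policy_model = { 0 };

/* Helper: return only the vma's own policy, or NULL if it has none. */
struct mempolicy_model *vma_policy_lookup_model(const struct vma_model *vma)
{
	return vma ? vma->vm_policy : NULL;
}

/* Task-level lookup, usable on its own by other callers. */
struct mempolicy_model *task_policy_model(const struct task_model *tsk)
{
	return tsk->mempolicy ? tsk->mempolicy : &default_policy_model;
}

/* Combined lookup: vma policy if present, else the task's policy. */
struct mempolicy_model *effective_policy_model(const struct vma_model *vma,
					       const struct task_model *cur)
{
	struct mempolicy_model *pol = vma_policy_lookup_model(vma);

	if (!pol)
		pol = task_policy_model(cur);
	return pol;
}
```

Keeping the vma-only lookup separate lets a caller distinguish "the vma has no policy" from "the task-level fallback was used".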

Signed-off-by: Oleg Nesterov 
Cc: KAMEZAWA Hiroyuki 
Cc: David Rientjes 
Cc: KOSAKI Motohiro 
Cc: Alexander Viro 
Cc: Cyrill Gorcunov 
Cc: "Eric W. Biederman" 
Cc: "Kirill A. Shutemov" 
Cc: Peter Zijlstra 
Cc: Hugh Dickins 
Cc: Andi Kleen 
Cc: Naoya Horiguchi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mempolicy: remove the "task" arg of vma_policy_mof() and simplify it

2014-10-10T02:25:56+00:00

1. vma_policy_mof(task) is simply not safe unless task == current,
   it can race with do_exit()->mpol_put(). Remove this arg and update
   its single caller.

2. vma can not be NULL, remove this check and simplify the code.

Signed-off-by: Oleg Nesterov 
Cc: KAMEZAWA Hiroyuki 
Cc: David Rientjes 
Cc: KOSAKI Motohiro 
Cc: Alexander Viro 
Cc: Cyrill Gorcunov 
Cc: "Eric W. Biederman" 
Cc: "Kirill A. Shutemov" 
Cc: Peter Zijlstra 
Cc: Hugh Dickins 
Cc: Andi Kleen 
Cc: Naoya Horiguchi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

hugetlb: restrict hugepage_migration_support() to x86_64

2014-06-04T23:53:51+00:00

Currently hugepage migration is available for all archs which support
pmd-level hugepage, but testing is done only for x86_64 and there're
bugs for other archs.  So to avoid breaking such archs, this patch
limits the availability strictly to x86_64 until developers of other
archs get interested in enabling this feature.

Simply disabling hugepage migration on non-x86_64 archs is not enough to
fix the reported problem where sys_move_pages() hits the BUG_ON() in
follow_page(FOLL_GET), so let's fix this by checking if hugepage
migration is supported in vma_migratable().
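
A minimal userspace sketch of that check follows. The macro, flag value and function names are hypothetical; the build-time switch stands in for whatever per-architecture configuration decides whether hugepage migration is enabled.

```c
#include <stdbool.h>

/* Illustrative flag value for this model only. */
#define MODEL_VM_HUGETLB 0x1UL

/*
 * Hypothetical build-time switch: define as 1 to model an architecture
 * on which hugepage migration is supported.
 */
#ifndef MODEL_ARCH_HUGEPAGE_MIGRATION
#define MODEL_ARCH_HUGEPAGE_MIGRATION 0
#endif

struct vma_model { unsigned long vm_flags; };

bool hugepage_migration_supported_model(void)
{
	return MODEL_ARCH_HUGEPAGE_MIGRATION != 0;
}

/* hugetlb vmas are migratable only where the arch supports it. */
bool vma_migratable_model(const struct vma_model *vma)
{
	if ((vma->vm_flags & MODEL_VM_HUGETLB) &&
	    !hugepage_migration_supported_model())
		return false;
	return true;
}

/* Model of a move-pages style caller: refuse early instead of crashing. */
int move_page_model(const struct vma_model *vma)
{
	if (!vma_migratable_model(vma))
		return -1;	/* caller sees an error, nothing is touched */
	return 0;
}
```

Putting the test in the migratability check means every caller that already consults it gets the restriction without further changes.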

Signed-off-by: Naoya Horiguchi 
Reported-by: Michael Ellerman 
Tested-by: Michael Ellerman 
Acked-by: Hugh Dickins 
Cc: Benjamin Herrenschmidt 
Cc: Tony Luck 
Cc: Russell King 
Cc: Martin Schwidefsky 
Cc: James Hogan 
Cc: Ralf Baechle 
Cc: David Miller 
Cc: 	[3.12+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, mempolicy: remove per-process flag

2014-04-07T23:35:54+00:00

PF_MEMPOLICY is an unnecessary optimization for CONFIG_SLAB users.
There's no significant performance degradation to checking
current->mempolicy rather than current->flags & PF_MEMPOLICY in the
allocation path, especially since this is considered unlikely().
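
The replacement test can be sketched as below. This is an illustrative userspace model: the struct layouts, `unlikely_model()` and `pick_node_model()` are hypothetical names, and the branch hint relies on the GCC/Clang `__builtin_expect` builtin when available.

```c
#include <stddef.h>

/* Branch hint in the spirit of unlikely(); falls back to a plain test. */
#if defined(__GNUC__)
#define unlikely_model(x) __builtin_expect(!!(x), 0)
#else
#define unlikely_model(x) (x)
#endif

/* Hypothetical stand-ins for the kernel structures. */
struct mempolicy_model { int preferred_node; };
struct task_model {
	unsigned int flags;			/* no policy bit needed here */
	struct mempolicy_model *mempolicy;	/* NULL when no policy is set */
};

/*
 * Allocation-path decision: test the pointer directly rather than a
 * separate per-process flag that mirrors it.
 */
int pick_node_model(const struct task_model *tsk, int local_node)
{
	if (unlikely_model(tsk->mempolicy != NULL))
		return tsk->mempolicy->preferred_node;
	return local_node;
}
```

The pointer already encodes "has a policy", so a flag bit that duplicates it only has to be kept in sync and can be reclaimed for another use.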

Running TCP_RR with netperf-2.4.5 through localhost on a 16 cpu machine
with 64GB of memory and without a mempolicy:

	threads		before		after
	16		1249409		1244487
	32		1281786		1246783
	48		1239175		1239138
	64		1244642		1241841
	80		1244346		1248918
	96		1266436		1254316
	112		1307398		1312135
	128		1327607		1326502

Per-process flags are a scarce resource so we should free them up whenever
possible and make them available.  We'll be using this one shortly for
memcg oom reserves.

Signed-off-by: David Rientjes 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: KAMEZAWA Hiroyuki 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: Tejun Heo 
Cc: Mel Gorman 
Cc: Oleg Nesterov 
Cc: Rik van Riel 
Cc: Jianguo Wu 
Cc: Tim Hockin 
Cc: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds