linux-toradex.git/mm, branch v3.14.8

mm: rmap: fix use-after-free in __put_anon_vma

2014-06-11T18:54:13+00:00

commit 624483f3ea82598ab0f62f1bdb9177f531ab1892 upstream.

While working address sanitizer for kernel I've discovered
use-after-free bug in __put_anon_vma.

For the last anon_vma, anon_vma->root freed before child anon_vma.
Later in anon_vma_free(anon_vma) we are referencing to already freed
anon_vma->root to check rwsem.

This fixes it by freeing the child anon_vma before freeing
anon_vma->root.

Signed-off-by: Andrey Ryabinin 
Acked-by: Peter Zijlstra 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: add !pte_present() check on existing hugetlb_entry callbacks

2014-06-11T18:54:13+00:00

commit d4c54919ed86302094c0ca7d48a8cbd4ee753e92 upstream.

The age table walker doesn't check non-present hugetlb entry in common
path, so hugetlb_entry() callbacks must check it.  The reason for this
behavior is that some callers want to handle it in its own way.

[ I think that reason is bogus, btw - it should just do what the regular
  code does, which is to call the "pte_hole()" function for such hugetlb
  entries  - Linus]

However, some callers don't check it now, which causes unpredictable
result, for example when we have a race between migrating hugepage and
reading /proc/pid/numa_maps.  This patch fixes it by adding !pte_present
checks on buggy callbacks.

This bug exists for years and got visible by introducing hugepage
migration.

ChangeLog v2:
- fix if condition (check !pte_present() instead of pte_present())

Reported-by: Sasha Levin 
Signed-off-by: Naoya Horiguchi 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
[ Backported to 3.15.  Signed-off-by: Josh Boyer  ]
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

Revert "revert "mm: vmscan: do not swap anon pages just because free+file is low""

2014-06-11T18:54:10+00:00

This reverts commit 623762517e2370be3b3f95f4fe08d6c063a49b06.

Ben rightly points out that commit 0bf1457f0cfc, which is what this
original commit was reverting, never ended up in 3.14-stable, but was
only for 3.15.

So revert this patch as we now have the same check twice in a row, which
is pretty pointless.  Although the comments were "prettier"...

Cc: Ben Hutchings 
Cc: Johannes Weiner 
Cc: Christian Borntraeger 
Cc: Christian Borntraeger 
Cc: Rafael Aquini 
Cc: Rik van Riel 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm/memory-failure.c: fix memory leak by race between poison and unpoison

2014-06-11T18:54:08+00:00

commit 3e030ecc0fc7de10fd0da10c1c19939872a31717 upstream.

When a memory error happens on an in-use page or (free and in-use)
hugepage, the victim page is isolated with its refcount set to one.

When you try to unpoison it later, unpoison_memory() calls put_page()
for it twice in order to bring the page back to free page pool (buddy or
free hugepage list).  However, if another memory error occurs on the
page which we are unpoisoning, memory_failure() returns without
releasing the refcount which was incremented in the same call at first,
which results in memory leak and unconsistent num_poisoned_pages
statistics.  This patch fixes it.

Signed-off-by: Naoya Horiguchi 
Cc: Andi Kleen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree()

2014-06-07T17:28:22+00:00

commit 5a838c3b60e3a36ade764cf7751b8f17d7c9c2da upstream.

pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) +
	BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long)

It hardly could be ever bigger than PAGE_SIZE even for large-scale machine,
but for consistency with its couterpart pcpu_mem_zalloc(),
use pcpu_mem_free() instead.

Commit b4916cb17c26 ("percpu: make pcpu_free_chunk() use
pcpu_mem_free() instead of kfree()") addressed this problem, but
missed this one.

tj: commit message updated

Signed-off-by: Jianyu Zhan 
Signed-off-by: Tejun Heo 
Fixes: 099a19d91ca4 ("percpu: allow limited allocation before slab is online)
Signed-off-by: Greg Kroah-Hartman

revert "mm: vmscan: do not swap anon pages just because free+file is low"

2014-06-07T17:28:17+00:00

commit 623762517e2370be3b3f95f4fe08d6c063a49b06 upstream.

This reverts commit 0bf1457f0cfc ("mm: vmscan: do not swap anon pages
just because free+file is low") because it introduced a regression in
mostly-anonymous workloads, where reclaim would become ineffective and
trap every allocating task in direct reclaim.

The problem is that there is a runaway feedback loop in the scan balance
between file and anon, where the balance tips heavily towards a tiny
thrashing file LRU and anonymous pages are no longer being looked at.
The commit in question removed the safe guard that would detect such
situations and respond with forced anonymous reclaim.

This commit was part of a series to fix premature swapping in loads with
relatively little cache, and while it made a small difference, the cure
is obviously worse than the disease.  Revert it.

Signed-off-by: Johannes Weiner 
Reported-by: Christian Borntraeger 
Acked-by: Christian Borntraeger 
Acked-by: Rafael Aquini 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm/compaction: make isolate_freepages start at pageblock boundary

2014-06-07T17:28:17+00:00

commit 49e068f0b73dd042c186ffa9b420a9943e90389a upstream.

The compaction freepage scanner implementation in isolate_freepages()
starts by taking the current cc->free_pfn value as the first pfn.  In a
for loop, it scans from this first pfn to the end of the pageblock, and
then subtracts pageblock_nr_pages from the first pfn to obtain the first
pfn for the next for loop iteration.

This means that when cc->free_pfn starts at offset X rather than being
aligned on pageblock boundary, the scanner will start at offset X in all
scanned pageblock, ignoring potentially many free pages.  Currently this
can happen when

 a) zone's end pfn is not pageblock aligned, or

 b) through zone->compact_cached_free_pfn with CONFIG_HOLES_IN_ZONE
    enabled and a hole spanning the beginning of a pageblock

This patch fixes the problem by aligning the initial pfn in
isolate_freepages() to pageblock boundary.  This also permits replacing
the end-of-pageblock alignment within the for loop with a simple
pageblock_nr_pages increment.

Signed-off-by: Vlastimil Babka 
Reported-by: Heesub Shin 
Acked-by: Minchan Kim 
Cc: Mel Gorman 
Acked-by: Joonsoo Kim 
Cc: Bartlomiej Zolnierkiewicz 
Cc: Michal Nazarewicz 
Cc: Naoya Horiguchi 
Cc: Christoph Lameter 
Acked-by: Rik van Riel 
Cc: Dongjun Shin 
Cc: Sunghwan Yun 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm/page-writeback.c: fix divide by zero in pos_ratio_polynom

2014-06-07T17:28:16+00:00

commit d5c9fde3dae750889168807038243ff36431d276 upstream.

It is possible for "limit - setpoint + 1" to equal zero, after getting
truncated to a 32 bit variable, and resulting in a divide by zero error.

Using the fully 64 bit divide functions avoids this problem.  It also
will cause pos_ratio_polynom() to return the correct value when
(setpoint - limit) exceeds 2^32.

Also uninline pos_ratio_polynom, at Andrew's request.

Signed-off-by: Rik van Riel 
Reviewed-by: Michal Hocko 
Cc: Aneesh Kumar K.V 
Cc: Mel Gorman 
Cc: Nishanth Aravamudan 
Cc: Luiz Capitulino 
Cc: Masayoshi Mizuma 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

hwpoison, hugetlb: lock_page/unlock_page does not match for handling a free hugepage

2014-06-07T17:28:11+00:00

commit b985194c8c0a130ed155b71662e39f7eaea4876f upstream.

For handling a free hugepage in memory failure, the race will happen if
another thread hwpoisoned this hugepage concurrently.  So we need to
check PageHWPoison instead of !PageHWPoison.

If hwpoison_filter(p) returns true or a race happens, then we need to
unlock_page(hpage).

Signed-off-by: Chen Yucong 
Reviewed-by: Naoya Horiguchi 
Tested-by: Naoya Horiguchi 
Reviewed-by: Andi Kleen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm, thp: close race between mremap() and split_huge_page()

2014-06-07T17:28:10+00:00

commit dd18dbc2d42af75fffa60c77e0f02220bc329829 upstream.

It's critical for split_huge_page() (and migration) to catch and freeze
all PMDs on rmap walk.  It gets tricky if there's concurrent fork() or
mremap() since usually we copy/move page table entries on dup_mm() or
move_page_tables() without rmap lock taken.  To get it work we rely on
rmap walk order to not miss any entry.  We expect to see destination VMA
after source one to work correctly.

But after switching rmap implementation to interval tree it's not always
possible to preserve expected walk order.

It works fine for dup_mm() since new VMA has the same vma_start_pgoff()
/ vma_last_pgoff() and explicitly insert dst VMA after src one with
vma_interval_tree_insert_after().

But on move_vma() destination VMA can be merged into adjacent one and as
result shifted left in interval tree.  Fortunately, we can detect the
situation and prevent race with rmap walk by moving page table entries
under rmap lock.  See commit 38a76013ad80.

Problem is that we miss the lock when we move transhuge PMD.  Most
likely this bug caused the crash[1].

[1] http://thread.gmane.org/gmane.linux.kernel.mm/96473

Fixes: 108d6642ad81 ("mm anon rmap: remove anon_vma_moveto_tail")

Signed-off-by: Kirill A. Shutemov 
Reviewed-by: Andrea Arcangeli 
Cc: Rik van Riel 
Acked-by: Michel Lespinasse 
Cc: Dave Jones 
Cc: David Miller 
Acked-by: Johannes Weiner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman