author | Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> | 2015-08-06 15:47:08 -0700 |
---|---|---|
committer | Sasha Levin <sasha.levin@oracle.com> | 2016-05-17 13:42:51 -0400 |
commit | 9de27bd72b3aba88cf7847e8834cc54745ea3352 (patch) | |
tree | 126134a7ae31d09befd193d670b718f85c81c985 /mm | |
parent | 6936c1672176fa8403818ff95cff1f4a972b787d (diff) |
mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*
[ Upstream commit f4c18e6f7b5bbb5b528b3334115806b0d76f50f9 ]
The race condition addressed in commit add05cecef80 ("mm: soft-offline:
don't free target page in successful page migration") was not closed
completely, because it can occur not only for soft-offline but also
for hard-offline. Consider a slab page that is about to be freed into the
buddy pool: if an uncorrected memory error hits the page just
after __free_one_page() is entered, VM_BUG_ON_PAGE(page->flags &
PAGE_FLAGS_CHECK_AT_PREP) is triggered, even though the BUG is not
necessary because the data on the affected page is not consumed.
To solve this, this patch drops __PG_HWPOISON from the page flag checks at
allocation/free time. I think this is justified because the __PG_HWPOISON
flag is defined to prevent the page from being reused, and setting it
outside the page's alloc-free cycle is designed behavior (not a bug).
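
The separation described above can be sketched in userspace C. This is only an illustration, not the kernel code: the bit positions, the FLAGS_CHECK_AT_PREP mask, and the helper names check_at_free(), check_at_alloc(), and self_check() are hypothetical, and the mask is assumed to no longer contain the hwpoison bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit layout, chosen for this sketch only. */
#define PG_LOCKED_BIT   (1UL << 0)
#define PG_DIRTY_BIT    (1UL << 1)
#define PG_HWPOISON_BIT (1UL << 2)

/* Assumed post-patch style mask: hwpoison is not part of the generic check. */
#define FLAGS_CHECK_AT_PREP (PG_LOCKED_BIT | PG_DIRTY_BIT)

/* Free-time check: a page that is only poisoned passes (returns NULL). */
static const char *check_at_free(unsigned long flags)
{
	if (flags & FLAGS_CHECK_AT_PREP)
		return "PAGE_FLAGS_CHECK_AT_PREP flag set";
	return NULL;
}

/* Allocation-time check: hwpoison is tested on its own, before the mask. */
static const char *check_at_alloc(unsigned long flags)
{
	const char *bad_reason = NULL;

	if (flags & PG_HWPOISON_BIT)
		bad_reason = "HWPoisoned (hardware-corrupted)";
	if (flags & FLAGS_CHECK_AT_PREP)
		bad_reason = "PAGE_FLAGS_CHECK_AT_PREP flag set";
	return bad_reason;
}

/* Small self-check exercising both helpers. */
static void self_check(void)
{
	/* Poison set during the free path: no longer reported. */
	assert(check_at_free(PG_HWPOISON_BIT) == NULL);
	/* The same page is still refused when it would be handed out again. */
	assert(check_at_alloc(PG_HWPOISON_BIT) != NULL);
}
```

In this sketch a page poisoned while it is being freed does not trip the free-time check, yet it is still rejected at allocation time, which matches the intent stated in the commit message.
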
For the past few months, I have been annoyed by the BUG_ON that fires when a
soft-offlined page remains on the lru cache list for a while; this is avoided
by calling put_page() instead of putback_lru_page() in page migration's success
path. This means that this patch reverts a major change from commit
add05cecef80 about the new refcounting rule of soft-offlined pages, so
the "reuse window" is revived. It will be closed by a subsequent patch.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Diffstat (limited to 'mm')
-rw-r--r-- | mm/huge_memory.c | 7 |
-rw-r--r-- | mm/migrate.c | 5 |
-rw-r--r-- | mm/page_alloc.c | 4 |
3 files changed, 9 insertions, 7 deletions
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b807a8fbf3bc..52975ebcfaa4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1676,12 +1676,7 @@ static void __split_huge_page_refcount(struct page *page,
 		/* after clearing PageTail the gup refcount can be released */
 		smp_mb__after_atomic();
 
-		/*
-		 * retain hwpoison flag of the poisoned tail page:
-		 *   fix for the unsuitable process killed on Guest Machine(KVM)
-		 *   by the memory-failure.
-		 */
-		page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP | __PG_HWPOISON;
+		page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 		page_tail->flags |= (page->flags &
 				     ((1L << PG_referenced) |
 				      (1L << PG_swapbacked) |
diff --git a/mm/migrate.c b/mm/migrate.c
index 1d425611b180..fe71f91c7b27 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -950,7 +950,10 @@ out:
 		list_del(&page->lru);
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
-		if (reason != MR_MEMORY_FAILURE)
+		/* Soft-offlined page shouldn't go through lru cache list */
+		if (reason == MR_MEMORY_FAILURE)
+			put_page(page);
+		else
 			putback_lru_page(page);
 	}
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 872b2ac95dec..551923097bbc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -962,6 +962,10 @@ static inline int check_new_page(struct page *page)
 		bad_reason = "non-NULL mapping";
 	if (unlikely(atomic_read(&page->_count) != 0))
 		bad_reason = "nonzero _count";
+	if (unlikely(page->flags & __PG_HWPOISON)) {
+		bad_reason = "HWPoisoned (hardware-corrupted)";
+		bad_flags = __PG_HWPOISON;
+	}
 	if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) {
 		bad_reason = "PAGE_FLAGS_CHECK_AT_PREP flag set";
 		bad_flags = PAGE_FLAGS_CHECK_AT_PREP;
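
The mm/huge_memory.c hunk may look as if it stops retaining the hwpoison flag on tail pages. The userspace sketch below suggests why the two forms are equivalent once the check mask no longer contains the hwpoison bit. The bit positions, both mask definitions, and the helpers clear_tail_old(), clear_tail_new(), and self_check() are assumptions made for illustration; the real mask lives outside mm/ and is not shown in this diffstat.

```c
#include <assert.h>

/* Hypothetical bit layout, chosen for this sketch only. */
#define PG_LOCKED_BIT   (1UL << 0)
#define PG_DIRTY_BIT    (1UL << 1)
#define PG_HWPOISON_BIT (1UL << 2)

/* Assumed pre-patch mask: hwpoison is one of the checked flags. */
#define OLD_CHECK_AT_PREP (PG_LOCKED_BIT | PG_DIRTY_BIT | PG_HWPOISON_BIT)
/* Assumed post-patch mask: hwpoison is excluded. */
#define NEW_CHECK_AT_PREP (PG_LOCKED_BIT | PG_DIRTY_BIT)

/* Old form: ~ binds tighter than |, so this is (~mask) | hwpoison. */
static unsigned long clear_tail_old(unsigned long flags)
{
	flags &= ~OLD_CHECK_AT_PREP | PG_HWPOISON_BIT;
	return flags;
}

/* New form: the mask itself no longer covers hwpoison. */
static unsigned long clear_tail_new(unsigned long flags)
{
	flags &= ~NEW_CHECK_AT_PREP;
	return flags;
}

/* Small self-check: both forms keep the poison bit and clear the rest. */
static void self_check(void)
{
	unsigned long flags = PG_LOCKED_BIT | PG_HWPOISON_BIT;

	assert(clear_tail_old(flags) == PG_HWPOISON_BIT);
	assert(clear_tail_new(flags) == PG_HWPOISON_BIT);
}
```

Under these assumptions both helpers clear the checked bits and leave the poison bit of a tail page untouched, so the explicit `| __PG_HWPOISON` and its comment become redundant.
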