From a8bef8ff6ea15fa4c67433cab0f5f3484574ef7c Mon Sep 17 00:00:00 2001
From: Mel Gorman
Date: Mon, 24 May 2010 14:32:24 -0700
Subject: mm: migration: avoid race between shift_arg_pages() and rmap_walk()
 during migration by not migrating temporary stacks

Page migration requires rmap to be able to find all ptes mapping a page
at all times, otherwise the migration entry can be instantiated, but it
is possible to leave one behind if the second rmap_walk fails to find
the page.  If this page is later faulted, migration_entry_to_page() will
call BUG because the page is locked, indicating that the page was migrated
but the migration PTE was not cleaned up.  For example:

  kernel BUG at include/linux/swapops.h:105!
  invalid opcode: 0000 [#1] PREEMPT SMP
  ...
  Call Trace:
   [] handle_mm_fault+0x3f8/0x76a
   [] do_page_fault+0x44a/0x46e
   [] page_fault+0x25/0x30
   [] load_elf_binary+0x152a/0x192b
   [] search_binary_handler+0x173/0x313
   [] do_execve+0x219/0x30a
   [] sys_execve+0x43/0x5e
   [] stub_execve+0x6a/0xc0
  RIP  [] migration_entry_wait+0xc1/0x129

There is a race between shift_arg_pages() and migration that triggers this
bug.  A temporary stack is set up during exec and later moved.  If
migration moves a page in the temporary stack and the VMA is then removed
before migration completes, the migration PTE may not be found, leading to
a BUG when the stack is faulted.

This patch causes pages within the temporary stack used during exec to be
skipped by migration.  It does this by marking the VMA covering the
temporary stack with an otherwise impossible combination of VMA flags.
These flags are cleared when the temporary stack is moved to its final
location.

[kamezawa.hiroyu@jp.fujitsu.com: idea for having migration skip temporary stacks]
Signed-off-by: Mel Gorman
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Rik van Riel
Acked-by: Linus Torvalds
Cc: Minchan Kim
Cc: Christoph Lameter
Cc: Andrea Arcangeli
Cc: Rik van Riel
Cc: Peter Zijlstra
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 include/linux/mm.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include/linux/mm.h')

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fb19bb92b809..98ea5bab963e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -106,6 +106,9 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_PFN_AT_MMAP	0x40000000	/* PFNMAP vma that is fully mapped at mmap time */
 #define VM_MERGEABLE	0x80000000	/* KSM may merge identical pages */
 
+/* Bits set in the VMA until the stack is in its final location */
+#define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
+
 #ifndef VM_STACK_DEFAULT_FLAGS		/* arch can override this */
 #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
 #endif
--
cgit v1.2.3
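The mm.h hunk above only defines the flag combination; the hunks in fs/exec.c and
mm/rmap.c are filtered out of this view.  As a rough sketch of how the bits are
meant to be used (the helper name and call sites below are illustrative, not
quoted from the patch): exec marks the temporary stack VMA while it is being
built, migration's rmap walk leaves any VMA carrying both bits alone, and the
bits are cleared once shift_arg_pages() has placed the stack.

	/* Sketch only: helper name and call sites are illustrative. */
	static inline bool vma_is_temporary_stack(struct vm_area_struct *vma)
	{
		/* VM_RAND_READ and VM_SEQ_READ are never set together otherwise. */
		return (vma->vm_flags & VM_STACK_INCOMPLETE_SETUP) ==
						VM_STACK_INCOMPLETE_SETUP;
	}

	/* exec: while the temporary stack is being set up */
	vma->vm_flags |= VM_STACK_INCOMPLETE_SETUP;

	/* migration's rmap walk, inside the loop over VMAs mapping the page */
	if (vma_is_temporary_stack(vma))
		continue;	/* skip: do not install a migration PTE here */

	/* exec: after shift_arg_pages() has moved the stack to its final place */
	vma->vm_flags &= ~VM_STACK_INCOMPLETE_SETUP;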
From 748446bb6b5a9390b546af38ec899c868a9dbcf0 Mon Sep 17 00:00:00 2001
From: Mel Gorman
Date: Mon, 24 May 2010 14:32:27 -0700
Subject: mm: compaction: memory compaction core

This patch is the core of a mechanism which compacts memory in a zone by
relocating movable pages towards the end of the zone.

A single compaction run involves a migration scanner and a free scanner.
Both scanners operate on pageblock-sized areas in the zone.  The migration
scanner starts at the bottom of the zone and searches for all movable
pages within each area, isolating them onto a private list called
migratelist.  The free scanner starts at the top of the zone, searches for
suitable areas and consumes the free pages within them, making them
available to the migration scanner.  The pages isolated for migration are
then migrated to the newly isolated free pages.

[aarcange@redhat.com: Fix unsafe optimisation]
[mel@csn.ul.ie: do not schedule work on other CPUs for compaction]
Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Reviewed-by: Minchan Kim
Cc: KOSAKI Motohiro
Cc: Christoph Lameter
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 include/linux/mm.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include/linux/mm.h')

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 98ea5bab963e..963f908af9d0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -337,6 +337,7 @@ void put_page(struct page *page);
 void put_pages_list(struct list_head *pages);
 
 void split_page(struct page *page, unsigned int order);
+int split_free_page(struct page *page);
 
 /*
  * Compound pages have a destructor function.  Provide a
--
cgit v1.2.3

From c6f6b596a5a73e63e5e930c414375c0c389199ab Mon Sep 17 00:00:00 2001
From: Chris Metcalf
Date: Mon, 24 May 2010 14:32:53 -0700
Subject: mm: make lowmem_page_address() use PFN_PHYS() for improved
 portability

This ensures that platforms with lowmem PAs above 32 bits work correctly
by avoiding truncation of the PA during a left shift.

Signed-off-by: Chris Metcalf
Cc: Barry Song <21cnbao@gmail.com>
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 include/linux/mm.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'include/linux/mm.h')

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 963f908af9d0..b969efb03787 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include <linux/pfn.h>
 
 struct mempolicy;
 struct anon_vma;
@@ -595,7 +596,7 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
 
 static __always_inline void *lowmem_page_address(struct page *page)
 {
-	return __va(page_to_pfn(page) << PAGE_SHIFT);
+	return __va(PFN_PHYS(page_to_pfn(page)));
 }
 
 #if defined(CONFIG_HIGHMEM) && !defined(WANT_PAGE_VIRTUAL)
--
cgit v1.2.3
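To make the truncation in the last patch concrete: on a configuration where
unsigned long is 32 bits but physical addresses are wider (phys_addr_t being
64 bits), the old expression shifts the pfn in 32-bit arithmetic and silently
drops the high bits, whereas PFN_PHYS() widens the pfn to phys_addr_t before
shifting.  A small illustrative fragment (the pfn value is made up and is not
part of the patch):

	/* Assume 32-bit unsigned long, 64-bit phys_addr_t, 4 KiB pages. */
	unsigned long pfn = 0x500000;	/* page at physical address 0x5_0000_0000 */

	phys_addr_t bad  = (phys_addr_t)(pfn << PAGE_SHIFT);	/* shift done in 32 bits: high bits lost, result 0x0 */
	phys_addr_t good = PFN_PHYS(pfn);			/* pfn cast to phys_addr_t first: 0x5_0000_0000 */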