From bda807d4445414e8e77da704f116bb0880fe0c76 Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Tue, 26 Jul 2016 15:23:05 -0700 Subject: mm: migrate: support non-lru movable page migration We have allowed migration for only LRU pages until now and it was enough to make high-order pages. But recently, embedded system(e.g., webOS, android) uses lots of non-movable pages(e.g., zram, GPU memory) so we have seen several reports about troubles of small high-order allocation. For fixing the problem, there were several efforts (e,g,. enhance compaction algorithm, SLUB fallback to 0-order page, reserved memory, vmalloc and so on) but if there are lots of non-movable pages in system, their solutions are void in the long run. So, this patch is to support facility to change non-movable pages with movable. For the feature, this patch introduces functions related to migration to address_space_operations as well as some page flags. If a driver want to make own pages movable, it should define three functions which are function pointers of struct address_space_operations. 1. bool (*isolate_page) (struct page *page, isolate_mode_t mode); What VM expects on isolate_page function of driver is to return *true* if driver isolates page successfully. On returing true, VM marks the page as PG_isolated so concurrent isolation in several CPUs skip the page for isolation. If a driver cannot isolate the page, it should return *false*. Once page is successfully isolated, VM uses page.lru fields so driver shouldn't expect to preserve values in that fields. 2. int (*migratepage) (struct address_space *mapping, struct page *newpage, struct page *oldpage, enum migrate_mode); After isolation, VM calls migratepage of driver with isolated page. The function of migratepage is to move content of the old page to new page and set up fields of struct page newpage. Keep in mind that you should indicate to the VM the oldpage is no longer movable via __ClearPageMovable() under page_lock if you migrated the oldpage successfully and returns 0. If driver cannot migrate the page at the moment, driver can return -EAGAIN. On -EAGAIN, VM will retry page migration in a short time because VM interprets -EAGAIN as "temporal migration failure". On returning any error except -EAGAIN, VM will give up the page migration without retrying in this time. Driver shouldn't touch page.lru field VM using in the functions. 3. void (*putback_page)(struct page *); If migration fails on isolated page, VM should return the isolated page to the driver so VM calls driver's putback_page with migration failed page. In this function, driver should put the isolated page back to the own data structure. 4. non-lru movable page flags There are two page flags for supporting non-lru movable page. * PG_movable Driver should use the below function to make page movable under page_lock. void __SetPageMovable(struct page *page, struct address_space *mapping) It needs argument of address_space for registering migration family functions which will be called by VM. Exactly speaking, PG_movable is not a real flag of struct page. Rather than, VM reuses page->mapping's lower bits to represent it. #define PAGE_MAPPING_MOVABLE 0x2 page->mapping = page->mapping | PAGE_MAPPING_MOVABLE; so driver shouldn't access page->mapping directly. Instead, driver should use page_mapping which mask off the low two bits of page->mapping so it can get right struct address_space. For testing of non-lru movable page, VM supports __PageMovable function. However, it doesn't guarantee to identify non-lru movable page because page->mapping field is unified with other variables in struct page. As well, if driver releases the page after isolation by VM, page->mapping doesn't have stable value although it has PAGE_MAPPING_MOVABLE (Look at __ClearPageMovable). But __PageMovable is cheap to catch whether page is LRU or non-lru movable once the page has been isolated. Because LRU pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also good for just peeking to test non-lru movable pages before more expensive checking with lock_page in pfn scanning to select victim. For guaranteeing non-lru movable page, VM provides PageMovable function. Unlike __PageMovable, PageMovable functions validates page->mapping and mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden destroying of page->mapping. Driver using __SetPageMovable should clear the flag via __ClearMovablePage under page_lock before the releasing the page. * PG_isolated To prevent concurrent isolation among several CPUs, VM marks isolated page as PG_isolated under lock_page. So if a CPU encounters PG_isolated non-lru movable page, it can skip it. Driver doesn't need to manipulate the flag because VM will set/clear it automatically. Keep in mind that if driver sees PG_isolated page, it means the page have been isolated by VM so it shouldn't touch page.lru field. PG_isolated is alias with PG_reclaim flag so driver shouldn't use the flag for own purpose. [opensource.ganesh@gmail.com: mm/compaction: remove local variable is_lru] Link: http://lkml.kernel.org/r/20160618014841.GA7422@leo-test Link: http://lkml.kernel.org/r/1464736881-24886-3-git-send-email-minchan@kernel.org Signed-off-by: Gioh Kim Signed-off-by: Minchan Kim Signed-off-by: Ganesh Mahendran Acked-by: Vlastimil Babka Cc: Sergey Senozhatsky Cc: Rik van Riel Cc: Joonsoo Kim Cc: Mel Gorman Cc: Hugh Dickins Cc: Rafael Aquini Cc: Jonathan Corbet Cc: John Einar Reitan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/compaction.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include/linux/compaction.h') diff --git a/include/linux/compaction.h b/include/linux/compaction.h index a58c852a268f..c6b47c861cea 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -54,6 +54,9 @@ enum compact_result { struct alloc_context; /* in mm/internal.h */ #ifdef CONFIG_COMPACTION +extern int PageMovable(struct page *page); +extern void __SetPageMovable(struct page *page, struct address_space *mapping); +extern void __ClearPageMovable(struct page *page); extern int sysctl_compact_memory; extern int sysctl_compaction_handler(struct ctl_table *table, int write, void __user *buffer, size_t *length, loff_t *ppos); @@ -151,6 +154,19 @@ extern void kcompactd_stop(int nid); extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx); #else +static inline int PageMovable(struct page *page) +{ + return 0; +} +static inline void __SetPageMovable(struct page *page, + struct address_space *mapping) +{ +} + +static inline void __ClearPageMovable(struct page *page) +{ +} + static inline enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, int alloc_flags, const struct alloc_context *ac, @@ -212,6 +228,7 @@ static inline void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_i #endif /* CONFIG_COMPACTION */ #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA) +struct node; extern int compaction_register_node(struct node *node); extern void compaction_unregister_node(struct node *node); -- cgit v1.2.3 From dd4123f324bbaec7618b677b7bce2b11aee9594e Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Tue, 26 Jul 2016 15:26:50 -0700 Subject: mm: fix build warnings in Randy reported below build error. > In file included from ../include/linux/balloon_compaction.h:48:0, > from ../mm/balloon_compaction.c:11: > ../include/linux/compaction.h:237:51: warning: 'struct node' declared inside parameter list [enabled by default] > static inline int compaction_register_node(struct node *node) > ../include/linux/compaction.h:237:51: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] > ../include/linux/compaction.h:242:54: warning: 'struct node' declared inside parameter list [enabled by default] > static inline void compaction_unregister_node(struct node *node) > It was caused by non-lru page migration which needs compaction.h but compaction.h doesn't include any header to be standalone. I think proper header for non-lru page migration is migrate.h rather than compaction.h because migrate.h has already headers needed to work non-lru page migration indirectly like isolate_mode_t, migrate_mode MIGRATEPAGE_SUCCESS. [akpm@linux-foundation.org: revert mm-balloon-use-general-non-lru-movable-page-feature-fix.patch temp fix] Link: http://lkml.kernel.org/r/20160610003304.GE29779@bbox Signed-off-by: Minchan Kim Reported-by: Randy Dunlap Cc: Konstantin Khlebnikov Cc: Vlastimil Babka Cc: Gioh Kim Cc: Rafael Aquini Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/compaction.h | 16 ---------------- 1 file changed, 16 deletions(-) (limited to 'include/linux/compaction.h') diff --git a/include/linux/compaction.h b/include/linux/compaction.h index c6b47c861cea..1a02dab16646 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -54,9 +54,6 @@ enum compact_result { struct alloc_context; /* in mm/internal.h */ #ifdef CONFIG_COMPACTION -extern int PageMovable(struct page *page); -extern void __SetPageMovable(struct page *page, struct address_space *mapping); -extern void __ClearPageMovable(struct page *page); extern int sysctl_compact_memory; extern int sysctl_compaction_handler(struct ctl_table *table, int write, void __user *buffer, size_t *length, loff_t *ppos); @@ -154,19 +151,6 @@ extern void kcompactd_stop(int nid); extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx); #else -static inline int PageMovable(struct page *page) -{ - return 0; -} -static inline void __SetPageMovable(struct page *page, - struct address_space *mapping) -{ -} - -static inline void __ClearPageMovable(struct page *page) -{ -} - static inline enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, int alloc_flags, const struct alloc_context *ac, -- cgit v1.2.3 From a5508cd83f10f663e05d212cb81f600a3af46e40 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Thu, 28 Jul 2016 15:49:28 -0700 Subject: mm, compaction: introduce direct compaction priority In the context of direct compaction, for some types of allocations we would like the compaction to either succeed or definitely fail while trying as hard as possible. Current async/sync_light migration mode is insufficient, as there are heuristics such as caching scanner positions, marking pageblocks as unsuitable or deferring compaction for a zone. At least the final compaction attempt should be able to override these heuristics. To communicate how hard compaction should try, we replace migration mode with a new enum compact_priority and change the relevant function signatures. In compact_zone_order() where struct compact_control is constructed, the priority is mapped to suitable control flags. This patch itself has no functional change, as the current priority levels are mapped back to the same migration modes as before. Expanding them will be done next. Note that !CONFIG_COMPACTION variant of try_to_compact_pages() is removed, as the only caller exists under CONFIG_COMPACTION. Link: http://lkml.kernel.org/r/20160721073614.24395-8-vbabka@suse.cz Signed-off-by: Vlastimil Babka Acked-by: Michal Hocko Acked-by: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/compaction.h | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) (limited to 'include/linux/compaction.h') diff --git a/include/linux/compaction.h b/include/linux/compaction.h index 1a02dab16646..0980a6ce4436 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -1,6 +1,18 @@ #ifndef _LINUX_COMPACTION_H #define _LINUX_COMPACTION_H +/* + * Determines how hard direct compaction should try to succeed. + * Lower value means higher priority, analogically to reclaim priority. + */ +enum compact_priority { + COMPACT_PRIO_SYNC_LIGHT, + MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT, + DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT, + COMPACT_PRIO_ASYNC, + INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC +}; + /* Return values for compact_zone() and try_to_compact_pages() */ /* When adding new states, please adjust include/trace/events/compaction.h */ enum compact_result { @@ -66,7 +78,7 @@ extern int fragmentation_index(struct zone *zone, unsigned int order); extern enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, unsigned int alloc_flags, const struct alloc_context *ac, - enum migrate_mode mode, int *contended); + enum compact_priority prio, int *contended); extern void compact_pgdat(pg_data_t *pgdat, int order); extern void reset_isolation_suitable(pg_data_t *pgdat); extern enum compact_result compaction_suitable(struct zone *zone, int order, @@ -151,14 +163,6 @@ extern void kcompactd_stop(int nid); extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx); #else -static inline enum compact_result try_to_compact_pages(gfp_t gfp_mask, - unsigned int order, int alloc_flags, - const struct alloc_context *ac, - enum migrate_mode mode, int *contended) -{ - return COMPACT_CONTINUE; -} - static inline void compact_pgdat(pg_data_t *pgdat, int order) { } -- cgit v1.2.3 From c3486f5376696034d0fcbef8ba70c70cfcb26f51 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Thu, 28 Jul 2016 15:49:30 -0700 Subject: mm, compaction: simplify contended compaction handling Async compaction detects contention either due to failing trylock on zone->lock or lru_lock, or by need_resched(). Since 1f9efdef4f3f ("mm, compaction: khugepaged should not give up due to need_resched()") the code got quite complicated to distinguish these two up to the __alloc_pages_slowpath() level, so different decisions could be taken for khugepaged allocations. After the recent changes, khugepaged allocations don't check for contended compaction anymore, so we again don't need to distinguish lock and sched contention, and simplify the current convoluted code a lot. However, I believe it's also possible to simplify even more and completely remove the check for contended compaction after the initial async compaction for costly orders, which was originally aimed at THP page fault allocations. There are several reasons why this can be done now: - with the new defaults, THP page faults no longer do reclaim/compaction at all, unless the system admin has overridden the default, or application has indicated via madvise that it can benefit from THP's. In both cases, it means that the potential extra latency is expected and worth the benefits. - even if reclaim/compaction proceeds after this patch where it previously wouldn't, the second compaction attempt is still async and will detect the contention and back off, if the contention persists - there are still heuristics like deferred compaction and pageblock skip bits in place that prevent excessive THP page fault latencies Link: http://lkml.kernel.org/r/20160721073614.24395-9-vbabka@suse.cz Signed-off-by: Vlastimil Babka Acked-by: Michal Hocko Acked-by: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/compaction.h | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) (limited to 'include/linux/compaction.h') diff --git a/include/linux/compaction.h b/include/linux/compaction.h index 0980a6ce4436..d4e106b5dc27 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -55,14 +55,6 @@ enum compact_result { COMPACT_PARTIAL, }; -/* Used to signal whether compaction detected need_sched() or lock contention */ -/* No contention detected */ -#define COMPACT_CONTENDED_NONE 0 -/* Either need_sched() was true or fatal signal pending */ -#define COMPACT_CONTENDED_SCHED 1 -/* Zone lock or lru_lock was contended in async compaction */ -#define COMPACT_CONTENDED_LOCK 2 - struct alloc_context; /* in mm/internal.h */ #ifdef CONFIG_COMPACTION @@ -76,9 +68,8 @@ extern int sysctl_compact_unevictable_allowed; extern int fragmentation_index(struct zone *zone, unsigned int order); extern enum compact_result try_to_compact_pages(gfp_t gfp_mask, - unsigned int order, - unsigned int alloc_flags, const struct alloc_context *ac, - enum compact_priority prio, int *contended); + unsigned int order, unsigned int alloc_flags, + const struct alloc_context *ac, enum compact_priority prio); extern void compact_pgdat(pg_data_t *pgdat, int order); extern void reset_isolation_suitable(pg_data_t *pgdat); extern enum compact_result compaction_suitable(struct zone *zone, int order, -- cgit v1.2.3