linux-toradex.git/mm/internal.h, branch v3.7

mm, thp: fix mlock statistics

2012-10-09T07:23:03+00:00

NR_MLOCK is only accounted in single page units: there's no logic to
handle transparent hugepages.  This patch checks the appropriate number of
pages to adjust the statistics by so that the correct amount of memory is
reflected.

Currently:

		$ grep Mlocked /proc/meminfo
		Mlocked:           19636 kB

	#define MAP_SIZE	(4 << 30)	/* 4GB */

	void *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
	mlock(ptr, MAP_SIZE);

		$ grep Mlocked /proc/meminfo
		Mlocked:           29844 kB

	munlock(ptr, MAP_SIZE);

		$ grep Mlocked /proc/meminfo
		Mlocked:           19636 kB

And with this patch:

		$ grep Mlock /proc/meminfo
		Mlocked:           19636 kB

	mlock(ptr, MAP_SIZE);

		$ grep Mlock /proc/meminfo
		Mlocked:         4213664 kB

	munlock(ptr, MAP_SIZE);

		$ grep Mlock /proc/meminfo
		Mlocked:           19636 kB
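
The accounting change can be pictured with a small userspace model.  This
is only a sketch, not the kernel code: struct model_page and the model_*
helpers are hypothetical names, and the 4 kB base page and 512 base pages
per huge page are assumptions typical of x86-64.  The point it shows is
that the counter moves by the number of base pages a page covers, in the
spirit of the kernel's hpage_nr_pages() helper, rather than by one.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed for illustration: 4 kB base pages, 2 MB transparent hugepages. */
#define MODEL_PAGE_KB		4L
#define MODEL_THP_NR_PAGES	512L

/* Hypothetical stand-in for struct page: only records whether it is a THP. */
struct model_page {
	bool is_thp;
};

/* Running NR_MLOCK-style counter, kept in base pages. */
static long model_nr_mlock;

/* Number of base pages this page covers. */
static long model_nr_pages(const struct model_page *page)
{
	return page->is_thp ? MODEL_THP_NR_PAGES : 1;
}

/* Account an mlocked page by the full number of base pages it covers. */
static void model_mlock_page(const struct model_page *page)
{
	model_nr_mlock += model_nr_pages(page);
}

/* Undo the accounting with the same unit, so the counter returns to zero. */
static void model_munlock_page(const struct model_page *page)
{
	model_nr_mlock -= model_nr_pages(page);
	assert(model_nr_mlock >= 0);
}

/* Value as it would be reported in kB, like Mlocked: in /proc/meminfo. */
static long model_mlocked_kb(void)
{
	return model_nr_mlock * MODEL_PAGE_KB;
}
```

Locking one huge page in this model moves the counter by 2048 kB rather
than 4 kB, and unlocking it subtracts the same amount.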

Signed-off-by: David Rientjes 
Reported-by: Hugh Dickins 
Acked-by: Hugh Dickins 
Reviewed-by: Andrea Arcangeli 
Cc: Naoya Horiguchi 
Cc: KAMEZAWA Hiroyuki 
Cc: Johannes Weiner 
Reviewed-by: Michel Lespinasse 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

CMA: migrate mlocked pages

2012-10-09T07:23:00+00:00

Presently CMA cannot migrate mlocked pages so it ends up failing to allocate
contiguous memory space.

This patch makes mlocked pages be migrated out.  Of course, it can affect
realtime processes, but in the CMA use case a failed contiguous memory
allocation is far worse than variable access latency to an mlocked page
while CMA is running.  Anyone who wants a realtime system should not
enable CMA, because stalls can still happen at random times.

[akpm@linux-foundation.org: tweak comment text, per Mel]
Signed-off-by: Minchan Kim 
Acked-by: Mel Gorman 
Cc: Michal Nazarewicz 
Cc: Bartlomiej Zolnierkiewicz 
Cc: Marek Szyprowski 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: use clear_page_mlock() in page_remove_rmap()

2012-10-09T07:22:56+00:00

We had thought that pages could no longer get freed while still marked as
mlocked; but Johannes Weiner posted this program to demonstrate that
truncating an mlocked private file mapping containing COWed pages is still
mishandled:

#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	char *map;
	int fd;

	system("grep mlockfreed /proc/vmstat");
	fd = open("chigurh", O_CREAT|O_EXCL|O_RDWR);
	unlink("chigurh");
	ftruncate(fd, 4096);
	map = mmap(NULL, 4096, PROT_WRITE, MAP_PRIVATE, fd, 0);
	map[0] = 11;
	mlock(map, sizeof(fd));
	ftruncate(fd, 0);
	close(fd);
	munlock(map, sizeof(fd));
	munmap(map, 4096);
	system("grep mlockfreed /proc/vmstat");
	return 0;
}

The anon COWed pages are not caught by truncation's clear_page_mlock() of
the pagecache pages; but unmap_mapping_range() unmaps them, so we ought to
look out for them there in page_remove_rmap().  Indeed, why should
truncation or invalidation be doing the clear_page_mlock() when removing
from pagecache?  mlock is a property of mapping in userspace, not a
property of pagecache: an mlocked unmapped page is nonsensical.
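
A minimal userspace sketch of the idea follows.  It is a model under
stated assumptions, not the kernel implementation: struct model_page and
every model_* function are hypothetical, and locking, LRU handling and
hugepages are left out.  It shows the mlocked state being cleared when the
last mapping goes away, so a page can no longer reach the free path still
marked mlocked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page. */
struct model_page {
	int  mapcount;	/* number of userspace mappings */
	bool mlocked;	/* PageMlocked-style flag */
};

static long model_nr_mlock;	/* mlocked pages currently accounted */
static long model_mlockfreed;	/* pages freed while still marked mlocked */

static void model_mlock_page(struct model_page *page)
{
	if (!page->mlocked) {
		page->mlocked = true;
		model_nr_mlock++;
	}
}

static void model_clear_page_mlock(struct model_page *page)
{
	if (page->mlocked) {
		page->mlocked = false;
		model_nr_mlock--;
	}
}

/* Drop one mapping; when it was the last one, the page cannot stay mlocked. */
static void model_page_remove_rmap(struct model_page *page)
{
	assert(page->mapcount > 0);
	if (--page->mapcount > 0)
		return;		/* still mapped by someone else */
	model_clear_page_mlock(page);
}

/* Freeing a page that is still mlocked is the bug the counter exposes. */
static void model_free_page(const struct model_page *page)
{
	if (page->mlocked)
		model_mlockfreed++;
}
```

With the clear done at unmap time, the COWed anonymous page in the
reproducer is handled the same way as any other page losing its last
mapping, whichever path unmapped it.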

Reported-by: Johannes Weiner 
Signed-off-by: Hugh Dickins 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Michel Lespinasse 
Cc: Ying Han 
Acked-by: Johannes Weiner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: remove vma arg from page_evictable

2012-10-09T07:22:55+00:00

page_evictable(page, vma) is an irritant: almost all its callers pass
NULL for vma.  Remove the vma arg and use mlocked_vma_newpage(vma, page)
explicitly in the couple of places it's needed.  But in those places we
don't even need page_evictable() itself!  They're dealing with a freshly
allocated anonymous page, which has no "mapping" and cannot be mlocked yet.
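
As a rough illustration of why the vma argument is unnecessary, the check
can be modelled on the page alone.  The types and the
model_page_evictable() name below are hypothetical; the sketch only
assumes that evictability depends on the page's mapping being marked
unevictable and on the page's own mlocked flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct address_space and struct page. */
struct model_mapping {
	bool unevictable;	/* whole mapping marked unevictable */
};

struct model_page {
	const struct model_mapping *mapping;	/* NULL for a fresh anon page */
	bool mlocked;
};

/* Evictability decided from the page alone; no vma is consulted. */
static bool model_page_evictable(const struct model_page *page)
{
	assert(page != NULL);
	if (page->mapping != NULL && page->mapping->unevictable)
		return false;
	return !page->mlocked;
}
```

A freshly allocated anonymous page has a NULL mapping and is not yet
mlocked, so this check is trivially true for it, which is why those call
sites only need the explicit mlocked-vma test.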

Signed-off-by: Hugh Dickins 
Acked-by: Mel Gorman 
Cc: Rik van Riel 
Acked-by: Johannes Weiner 
Cc: Michel Lespinasse 
Cc: Ying Han 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: compaction: Restart compaction from near where it left off

2012-10-09T07:22:50+00:00

This is almost entirely based on Rik's previous patches and discussions
with him about how this might be implemented.

Order > 0 compaction stops when enough free pages of the correct page
order have been coalesced.  When doing subsequent higher order
allocations, it is possible for compaction to be invoked many times.

However, the compaction code always starts out looking for things to
compact at the start of the zone, and for free pages to compact things to
at the end of the zone.

This can cause quadratic behaviour, with isolate_freepages starting at the
end of the zone each time, even though previous invocations of the
compaction code already filled up all free memory on that end of the zone.
This can cause isolate_freepages to take enormous amounts of CPU with
certain workloads on larger memory systems.

This patch caches where the migration and free scanner should start from
on subsequent compaction invocations using the pageblock-skip information.
When compaction starts it begins from the cached restart points and will
update the cached restart points until a page is isolated or a pageblock
is skipped that would have been scanned by synchronous compaction.
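
The caching can be sketched in userspace as below.  This is a simplified
model, not the patch itself: struct model_zone and the model_* functions
are hypothetical names, pfns are plain integers, and the rule for when to
stop updating the cache is not modelled.  It shows only that the migration
scanner's restart point moves up, the free scanner's moves down, and a
reset returns both to the zone boundaries.

```c
#include <assert.h>

/* Hypothetical zone model: scanners work within [start_pfn, end_pfn). */
struct model_zone {
	unsigned long start_pfn;
	unsigned long end_pfn;
	unsigned long cached_migrate_pfn;	/* migration scanner restart */
	unsigned long cached_free_pfn;		/* free scanner restart */
};

/* Forget cached positions: scan again from both ends of the zone. */
static void model_reset_cached(struct model_zone *zone)
{
	zone->cached_migrate_pfn = zone->start_pfn;
	zone->cached_free_pfn = zone->end_pfn;
}

static void model_zone_init(struct model_zone *zone,
			    unsigned long start_pfn, unsigned long end_pfn)
{
	assert(start_pfn < end_pfn);
	zone->start_pfn = start_pfn;
	zone->end_pfn = end_pfn;
	model_reset_cached(zone);
}

/*
 * Record how far each scanner got.  The migration scanner only ever
 * advances upwards and the free scanner only downwards, so stale
 * positions never move a restart point backwards.
 */
static void model_record_progress(struct model_zone *zone,
				  unsigned long migrate_pfn,
				  unsigned long free_pfn)
{
	if (migrate_pfn > zone->cached_migrate_pfn)
		zone->cached_migrate_pfn = migrate_pfn;
	if (free_pfn < zone->cached_free_pfn)
		zone->cached_free_pfn = free_pfn;
}
```

A later compaction run in this model would read the two cached fields as
its starting positions instead of the zone boundaries, which is what
avoids rescanning the already exhausted ends of the zone.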

Signed-off-by: Mel Gorman 
Acked-by: Rik van Riel 
Cc: Richard Davies 
Cc: Shaohua Li 
Cc: Avi Kivity 
Acked-by: Rafael Aquini 
Cc: Fengguang Wu 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: compaction: cache if a pageblock was scanned and no pages were isolated

2012-10-09T07:22:50+00:00

When compaction was implemented it was known that scanning could
potentially be excessive.  The ideal was that a counter be maintained for
each pageblock but maintaining this information would incur a severe
penalty due to a shared writable cache line.  It has reached the point
where the scanning costs are a serious problem, particularly on
long-lived systems where a large process starts and allocates a large
number of THPs at the same time.

Instead of using a shared counter, this patch adds another bit to the
pageblock flags called PG_migrate_skip.  If a pageblock is scanned by
either migrate or free scanner and 0 pages were isolated, the pageblock is
marked to be skipped in the future.  When scanning, this bit is checked
before any scanning takes place and the block skipped if set.

The main difficulty with a patch like this is "when to ignore the cached
information?" If it's ignored too often, the scanning rates will still be
excessive.  If the information is too stale then allocations will fail
that might have otherwise succeeded.  In this patch

o CMA always ignores the information
o If the migrate and free scanner meet then the cached information will
  be discarded if it's at least 5 seconds since the last time the cache
  was discarded
o If there are a large number of allocation failures, discard the cache.

The time-based heuristic is very clumsy but there are few choices for a
better event.  Depending solely on multiple allocation failures still
allows excessive scanning when THP allocations are failing in quick
succession due to memory pressure.  Waiting until memory pressure is
relieved would cause compaction to continually fail instead of using
reclaim/compaction to try to allocate the page.  The time-based mechanism is
clumsy but a better option is not obvious.
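
The skip bit and the time-based reset can be modelled roughly as follows.
Everything here is a hypothetical userspace sketch: the model_* names, the
fixed number of pageblocks and the use of plain seconds are assumptions
for illustration.  The allocation-failure trigger is not modelled because
its threshold is not stated here; it would simply call the same reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NR_PAGEBLOCKS		8
#define MODEL_RESET_INTERVAL_SEC	5	/* "at least 5 seconds" */

/* Hypothetical zone model: one skip bit per pageblock. */
struct model_zone {
	bool skip[MODEL_NR_PAGEBLOCKS];
	long last_reset_sec;
};

/* A scanner finished a pageblock; mark it if nothing was isolated. */
static void model_note_scan(struct model_zone *zone, int block, int nr_isolated)
{
	assert(block >= 0 && block < MODEL_NR_PAGEBLOCKS);
	if (nr_isolated == 0)
		zone->skip[block] = true;
}

/* CMA-style callers pass ignore_skip_hint = true and always scan. */
static bool model_should_scan(const struct model_zone *zone, int block,
			      bool ignore_skip_hint)
{
	assert(block >= 0 && block < MODEL_NR_PAGEBLOCKS);
	return ignore_skip_hint || !zone->skip[block];
}

/* Discard all cached skip information. */
static void model_reset_skip(struct model_zone *zone, long now_sec)
{
	memset(zone->skip, 0, sizeof(zone->skip));
	zone->last_reset_sec = now_sec;
}

/*
 * Called when the migrate and free scanners meet.  The cache is only
 * discarded if enough time has passed since the previous discard.
 * Returns true when a reset happened.
 */
static bool model_scanners_met(struct model_zone *zone, long now_sec)
{
	if (now_sec - zone->last_reset_sec < MODEL_RESET_INTERVAL_SEC)
		return false;
	model_reset_skip(zone, now_sec);
	return true;
}
```

The interval check is what keeps the cache from being thrown away on
every failed pass while memory pressure persists.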

Signed-off-by: Mel Gorman 
Acked-by: Rik van Riel 
Cc: Richard Davies 
Cc: Shaohua Li 
Cc: Avi Kivity 
Acked-by: Rafael Aquini 
Cc: Fengguang Wu 
Cc: Michal Nazarewicz 
Cc: Bartlomiej Zolnierkiewicz 
Cc: Kyungmin Park 
Cc: Mark Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

revert "mm: have order > 0 compaction start off where it left"

2012-10-09T07:22:50+00:00

This reverts commit 7db8889ab05b ("mm: have order > 0 compaction start
off where it left") and commit de74f1cc ("mm: have order > 0 compaction
start near a pageblock with free pages").  These patches were a good
idea and tests confirmed that they massively reduced the amount of
scanning but the implementation is complex and tricky to understand.  A
later patch will cache which pageblocks should be skipped and
reimplement the concept of compact_cached_free_pfn on top for both
migration and free scanners.

Signed-off-by: Mel Gorman 
Acked-by: Rik van Riel 
Cc: Richard Davies 
Cc: Shaohua Li 
Cc: Avi Kivity 
Acked-by: Rafael Aquini 
Acked-by: Minchan Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: compaction: abort compaction loop if lock is contended or run too long

2012-10-09T07:22:48+00:00

isolate_migratepages_range() might isolate no pages, for example when
zone->lru_lock is contended during asynchronous compaction.  In this
case we should abort compaction; otherwise compact_zone() will run a
useless loop and make zone->lru_lock even more contended.

An additional check is added to ensure that cc.migratepages and
cc.freepages get properly drained when compaction is aborted.
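
The control flow can be sketched as a toy loop.  This is an illustration
under assumptions, not the kernel code: the enum values, struct model_cc
and model_compact_zone() are hypothetical, isolation results are fed in as
an array, and page lists are reduced to counters.  It shows the loop
stopping on an abort result and both counters being drained on the way
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of one isolation attempt. */
enum model_isolate {
	MODEL_ISOLATE_ABORT,	/* lock contended or ran too long */
	MODEL_ISOLATE_NONE,	/* nothing isolated in this pageblock */
	MODEL_ISOLATE_SUCCESS,	/* pages isolated and ready to migrate */
};

/* Hypothetical compaction state: lists are modelled as counters. */
struct model_cc {
	int  nr_migratepages;
	int  nr_freepages;
	bool contended;
};

/* Run the loop over scripted isolation results; return passes executed. */
static size_t model_compact_zone(struct model_cc *cc,
				 const enum model_isolate *steps,
				 size_t nr_steps)
{
	size_t passes = 0;

	assert(cc != NULL && (steps != NULL || nr_steps == 0));
	for (size_t i = 0; i < nr_steps; i++) {
		passes++;
		if (steps[i] == MODEL_ISOLATE_ABORT) {
			/* Put back anything still isolated for migration. */
			cc->nr_migratepages = 0;
			cc->contended = true;
			break;
		}
		if (steps[i] == MODEL_ISOLATE_NONE)
			continue;
		/* Success: pretend every isolated page was migrated. */
		cc->nr_migratepages = 0;
	}
	/* Release free pages that were isolated as migration targets. */
	cc->nr_freepages = 0;
	return passes;
}
```

Without the early break, every remaining pass in this model would retry
isolation against the same contended lock.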

[minchan@kernel.org: Putback pages isolated for migration if aborting]
[akpm@linux-foundation.org: compact_zone_order requires non-NULL arg contended]
[akpm@linux-foundation.org: make compact_zone_order() require non-NULL arg `contended']
Signed-off-by: Andrea Arcangeli 
Signed-off-by: Shaohua Li 
Signed-off-by: Mel Gorman 
Acked-by: Minchan Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

cma: fix watermark checking

2012-10-09T07:22:45+00:00

* Add ALLOC_CMA alloc flag and pass it to [__]zone_watermark_ok()
  (from Minchan Kim).

* During watermark check decrease available free pages number by
  free CMA pages number if necessary (unmovable allocations cannot
  use pages from CMA areas).
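
The two points above can be sketched as a simplified check.  The names
model_zone, MODEL_ALLOC_CMA and model_zone_watermark_ok() are
hypothetical, and the allocation order and lowmem reserve that a real
watermark check would also consider are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: the allocation is allowed to use CMA pages. */
#define MODEL_ALLOC_CMA	0x1u

/* Hypothetical zone model holding only the two counters that matter here. */
struct model_zone {
	long free_pages;	/* all free pages, CMA included */
	long free_cma_pages;	/* free pages sitting in CMA areas */
};

/*
 * Simplified watermark check.  Allocations that cannot use CMA areas
 * must not count free CMA pages as available.
 */
static bool model_zone_watermark_ok(const struct model_zone *zone, long mark,
				    unsigned int alloc_flags)
{
	long free_pages;

	assert(zone != NULL);
	free_pages = zone->free_pages;
	if (!(alloc_flags & MODEL_ALLOC_CMA))
		free_pages -= zone->free_cma_pages;
	return free_pages > mark;
}
```

With 1000 free pages of which 800 are in CMA areas and a mark of 500, the
model passes the check for a CMA-capable allocation and fails it for one
that cannot use CMA pages.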

Signed-off-by: Bartlomiej Zolnierkiewicz 
Signed-off-by: Kyungmin Park 
Cc: Marek Szyprowski 
Cc: Michal Nazarewicz 
Cc: Minchan Kim 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: cma: discard clean pages during contiguous allocation instead of migration

2012-10-09T07:22:43+00:00

Drop clean cache pages instead of migrating them during
alloc_contig_range() to minimise allocation latency by reducing the amount
of migration that is necessary.  This is useful for CMA because allocation
latency matters more than preserving the background process's working set.
In addition, because pages are reclaimed, fewer free pages are needed as
migration targets, which avoids reclaiming memory just to obtain free
pages, itself a contributory factor to increased latency.

I measured the elapsed time of __alloc_contig_migrate_range() migrating
10M in a 40M movable zone on a QEMU machine.

Before - 146ms, After - 7ms
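
The selection step can be pictured with the sketch below.  It is a
hypothetical userspace model: struct model_page and model_discard_clean()
are invented for illustration, and it assumes the only criterion is that a
page is file-backed and not dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page isolated from the target range. */
struct model_page {
	bool file_backed;	/* page cache page rather than anonymous */
	bool dirty;		/* contents differ from backing store */
	bool discarded;		/* set when dropped instead of migrated */
};

/*
 * Drop clean file-backed pages, since they can be re-read from backing
 * store later.  Returns how many pages are left for migration.
 */
static size_t model_discard_clean(struct model_page *pages, size_t n)
{
	size_t to_migrate = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (pages[i].file_backed && !pages[i].dirty)
			pages[i].discarded = true;
		else
			to_migrate++;
	}
	return to_migrate;
}
```

Every discarded page in this model is one fewer page to copy and one
fewer free target page to find.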

[akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Mel Gorman 
Signed-off-by: Minchan Kim 
Reviewed-by: Mel Gorman 
Cc: Marek Szyprowski 
Acked-by: Michal Nazarewicz 
Cc: Rik van Riel 
Tested-by: Kyungmin Park 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds