<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/huge_mm.h, branch v3.7</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>mm, thp: fix mapped pages avoiding unevictable list on mlock</title>
<updated>2012-10-09T07:23:02+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2012-10-08T23:34:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b676b293fb48672904ee1b9828cb50b4eed01717'/>
<id>b676b293fb48672904ee1b9828cb50b4eed01717</id>
<content type='text'>
When a transparent hugepage is mapped and it is included in an mlock()
range, follow_page() incorrectly avoids setting the page's mlock bit and
moving it to the unevictable lru.

This is evident if you try to mlock(), munlock(), and then mlock() a
range again.  Currently:

	#define MAP_SIZE	(4 &lt;&lt; 30)	/* 4GB */

	void *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
	mlock(ptr, MAP_SIZE);

		$ grep -E "Unevictable|Inactive\(anon" /proc/meminfo
		Inactive(anon):     6304 kB
		Unevictable:     4213924 kB

	munlock(ptr, MAP_SIZE);

		Inactive(anon):  4186252 kB
		Unevictable:       19652 kB

	mlock(ptr, MAP_SIZE);

		Inactive(anon):  4198556 kB
		Unevictable:       21684 kB

Notice that less than 2MB was added to the unevictable list; this is
because these pages in the range are not transparent hugepages since the
4GB range was allocated with mmap() and has no specific alignment.  If
posix_memalign() were used instead, unevictable would not have grown at
all on the second mlock().

The fix is to call mlock_vma_page() so that the mlock bit is set and the
page is added to the unevictable list.  With this patch:

	mlock(ptr, MAP_SIZE);

		Inactive(anon):     4056 kB
		Unevictable:     4213940 kB

	munlock(ptr, MAP_SIZE);

		Inactive(anon):  4198268 kB
		Unevictable:       19636 kB

	mlock(ptr, MAP_SIZE);

		Inactive(anon):     4008 kB
		Unevictable:     4213940 kB

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a transparent hugepage is mapped and it is included in an mlock()
range, follow_page() incorrectly avoids setting the page's mlock bit and
moving it to the unevictable lru.

This is evident if you try to mlock(), munlock(), and then mlock() a
range again.  Currently:

	#define MAP_SIZE	(4 &lt;&lt; 30)	/* 4GB */

	void *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
	mlock(ptr, MAP_SIZE);

		$ grep -E "Unevictable|Inactive\(anon" /proc/meminfo
		Inactive(anon):     6304 kB
		Unevictable:     4213924 kB

	munlock(ptr, MAP_SIZE);

		Inactive(anon):  4186252 kB
		Unevictable:       19652 kB

	mlock(ptr, MAP_SIZE);

		Inactive(anon):  4198556 kB
		Unevictable:       21684 kB

Notice that less than 2MB was added to the unevictable list; this is
because these pages in the range are not transparent hugepages since the
4GB range was allocated with mmap() and has no specific alignment.  If
posix_memalign() were used instead, unevictable would not have grown at
all on the second mlock().

The fix is to call mlock_vma_page() so that the mlock bit is set and the
page is added to the unevictable list.  With this patch:

	mlock(ptr, MAP_SIZE);

		Inactive(anon):     4056 kB
		Unevictable:     4213940 kB

	munlock(ptr, MAP_SIZE);

		Inactive(anon):  4198268 kB
		Unevictable:       19636 kB

	mlock(ptr, MAP_SIZE);

		Inactive(anon):     4008 kB
		Unevictable:     4213940 kB

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thp: remove assumptions on pgtable_t type</title>
<updated>2012-10-09T07:22:29+00:00</updated>
<author>
<name>Gerald Schaefer</name>
<email>gerald.schaefer@de.ibm.com</email>
</author>
<published>2012-10-08T23:30:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e3ebcf64381188a2744a9829a4eb5c2b60f1974c'/>
<id>e3ebcf64381188a2744a9829a4eb5c2b60f1974c</id>
<content type='text'>
The thp page table pre-allocation code currently assumes that pgtable_t is
of type "struct page *".  This may not be true for all architectures, so
this patch removes that assumption by replacing the functions
prepare_pmd_huge_pte() and get_pmd_huge_pte() with two new functions that
can be defined architecture-specific.

It also removes two VM_BUG_ON checks for page_count() and page_mapcount()
operating on a pgtable_t.  Apart from the VM_BUG_ON removal, there will be
no functional change introduced by this patch.

Signed-off-by: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The thp page table pre-allocation code currently assumes that pgtable_t is
of type "struct page *".  This may not be true for all architectures, so
this patch removes that assumption by replacing the functions
prepare_pmd_huge_pte() and get_pmd_huge_pte() with two new functions that
can be defined architecture-specific.

It also removes two VM_BUG_ON checks for page_count() and page_mapcount()
operating on a pgtable_t.  Apart from the VM_BUG_ON removal, there will be
no functional change introduced by this patch.

Signed-off-by: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: move is_vma_temporary_stack() declaration to huge_mm.h</title>
<updated>2012-05-29T23:22:21+00:00</updated>
<author>
<name>Alex Shi</name>
<email>alex.shi@intel.com</email>
</author>
<published>2012-05-29T22:06:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2099597401c7710c00b0d7c32b24a44a193836e1'/>
<id>2099597401c7710c00b0d7c32b24a44a193836e1</id>
<content type='text'>
When transparent_hugepage_enabled() is used outside mm/, such as in
arch/x86/xx/tlb.c:

+       if (!cpu_has_invlpg || vma-&gt;vm_flags &amp; VM_HUGETLB
+                       || transparent_hugepage_enabled(vma)) {
+               flush_tlb_mm(vma-&gt;vm_mm);

is_vma_temporary_stack() isn't referenced in huge_mm.h, so it has compile
errors:

  arch/x86/mm/tlb.c: In function `flush_tlb_range':
  arch/x86/mm/tlb.c:324:4: error: implicit declaration of function `is_vma_temporary_stack' [-Werror=implicit-function-declaration]

Since is_vma_temporay_stack() is just used in rmap.c and huge_memory.c, it
is better to move it to huge_mm.h from rmap.h to avoid such errors.

Signed-off-by: Alex Shi &lt;alex.shi@intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When transparent_hugepage_enabled() is used outside mm/, such as in
arch/x86/xx/tlb.c:

+       if (!cpu_has_invlpg || vma-&gt;vm_flags &amp; VM_HUGETLB
+                       || transparent_hugepage_enabled(vma)) {
+               flush_tlb_mm(vma-&gt;vm_mm);

is_vma_temporary_stack() isn't referenced in huge_mm.h, so it has compile
errors:

  arch/x86/mm/tlb.c: In function `flush_tlb_range':
  arch/x86/mm/tlb.c:324:4: error: implicit declaration of function `is_vma_temporary_stack' [-Werror=implicit-function-declaration]

Since is_vma_temporay_stack() is just used in rmap.c and huge_memory.c, it
is better to move it to huge_mm.h from rmap.h to avoid such errors.

Signed-off-by: Alex Shi &lt;alex.shi@intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE</title>
<updated>2012-03-22T00:55:02+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2012-03-21T23:34:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d8c37c480678ebe09bc570f33e085e28049db035'/>
<id>d8c37c480678ebe09bc570f33e085e28049db035</id>
<content type='text'>
These macros will be used in a later patch, where all usages are expected
to be optimized away without #ifdef CONFIG_TRANSPARENT_HUGEPAGE.  But to
detect unexpected usages, we convert the existing BUG() to BUILD_BUG().

[akpm@linux-foundation.org: fix build in mm/pgtable-generic.c]
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Acked-by: Hillf Danton &lt;dhillf@gmail.com&gt;
Reviewed-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These macros will be used in a later patch, where all usages are expected
to be optimized away without #ifdef CONFIG_TRANSPARENT_HUGEPAGE.  But to
detect unexpected usages, we convert the existing BUG() to BUILD_BUG().

[akpm@linux-foundation.org: fix build in mm/pgtable-generic.c]
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Acked-by: Hillf Danton &lt;dhillf@gmail.com&gt;
Reviewed-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thp: optimize away unnecessary page table locking</title>
<updated>2012-03-22T00:54:57+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2012-03-21T23:33:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=025c5b2451e42c9e8dfdecd6dc84956ce8f321b5'/>
<id>025c5b2451e42c9e8dfdecd6dc84956ce8f321b5</id>
<content type='text'>
Currently when we check if we can handle thp as it is or we need to split
it into regular sized pages, we hold page table lock prior to check
whether a given pmd is mapping thp or not.  Because of this, when it's not
"huge pmd" we suffer from unnecessary lock/unlock overhead.  To remove it,
this patch introduces a optimized check function and replace several
similar logics with it.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Jiri Slaby &lt;jslaby@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently when we check if we can handle thp as it is or we need to split
it into regular sized pages, we hold page table lock prior to check
whether a given pmd is mapping thp or not.  Because of this, when it's not
"huge pmd" we suffer from unnecessary lock/unlock overhead.  To remove it,
this patch introduces a optimized check function and replace several
similar logics with it.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Jiri Slaby &lt;jslaby@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thp: add tlb_remove_pmd_tlb_entry</title>
<updated>2012-01-13T04:13:08+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shaohua.li@intel.com</email>
</author>
<published>2012-01-13T01:19:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f21760b15dcd091e5afd38d0b97197b45f7ef2ea'/>
<id>f21760b15dcd091e5afd38d0b97197b45f7ef2ea</id>
<content type='text'>
We have tlb_remove_tlb_entry to indicate a pte tlb flush entry should be
flushed, but not a corresponding API for pmd entry.  This isn't a
problem so far because THP is only for x86 currently and tlb_flush()
under x86 will flush entire TLB.  But this is confusion and could be
missed if thp is ported to other arch.

Also convert tlb-&gt;need_flush = 1 to a VM_BUG_ON(!tlb-&gt;need_flush) in
__tlb_remove_page() as suggested by Andrea Arcangeli.  The
__tlb_remove_page() function is supposed to be called after
tlb_remove_xxx_tlb_entry() and we can catch any misuse.

Signed-off-by: Shaohua Li &lt;shaohua.li@intel.com&gt;
Reviewed-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have tlb_remove_tlb_entry to indicate a pte tlb flush entry should be
flushed, but not a corresponding API for pmd entry.  This isn't a
problem so far because THP is only for x86 currently and tlb_flush()
under x86 will flush entire TLB.  But this is confusion and could be
missed if thp is ported to other arch.

Also convert tlb-&gt;need_flush = 1 to a VM_BUG_ON(!tlb-&gt;need_flush) in
__tlb_remove_page() as suggested by Andrea Arcangeli.  The
__tlb_remove_page() function is supposed to be called after
tlb_remove_xxx_tlb_entry() and we can catch any misuse.

Signed-off-by: Shaohua Li &lt;shaohua.li@intel.com&gt;
Reviewed-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thp: mremap support and TLB optimization</title>
<updated>2011-11-01T00:30:48+00:00</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2011-11-01T00:08:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=37a1c49a91ad55f917a399ef2174b5ebda4283f9'/>
<id>37a1c49a91ad55f917a399ef2174b5ebda4283f9</id>
<content type='text'>
This adds THP support to mremap (decreases the number of split_huge_page()
calls).

Here are also some benchmarks with a proggy like this:

===
#define _GNU_SOURCE
#include &lt;sys/mman.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;sys/time.h&gt;

#define SIZE (5UL*1024*1024*1024)

int main()
{
        static struct timeval oldstamp, newstamp;
	long diffsec;
	char *p, *p2, *p3, *p4;
	if (posix_memalign((void **)&amp;p, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&amp;p2, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&amp;p3, 2*1024*1024, 4096))
		perror("memalign"), exit(1);

	memset(p, 0xff, SIZE);
	memset(p2, 0xff, SIZE);
	memset(p3, 0x77, 4096);
	gettimeofday(&amp;oldstamp, NULL);
	p4 = mremap(p, SIZE, SIZE, MREMAP_FIXED|MREMAP_MAYMOVE, p3);
	gettimeofday(&amp;newstamp, NULL);
	diffsec = newstamp.tv_sec - oldstamp.tv_sec;
	diffsec = newstamp.tv_usec - oldstamp.tv_usec + 1000000 * diffsec;
	printf("usec %ld\n", diffsec);
	if (p == MAP_FAILED || p4 != p3)
	//if (p == MAP_FAILED)
		perror("mremap"), exit(1);
	if (memcmp(p4, p2, SIZE))
		printf("mremap bug\n"), exit(1);
	printf("ok\n");

	return 0;
}
===

THP on

 Performance counter stats for './largepage13' (3 runs):

          69195836 dTLB-loads                 ( +-   3.546% )  (scaled from 50.30%)
             60708 dTLB-load-misses           ( +-  11.776% )  (scaled from 52.62%)
         676266476 dTLB-stores                ( +-   5.654% )  (scaled from 69.54%)
             29856 dTLB-store-misses          ( +-   4.081% )  (scaled from 89.22%)
        1055848782 iTLB-loads                 ( +-   4.526% )  (scaled from 80.18%)
              8689 iTLB-load-misses           ( +-   2.987% )  (scaled from 58.20%)

        7.314454164  seconds time elapsed   ( +-   0.023% )

THP off

 Performance counter stats for './largepage13' (3 runs):

        1967379311 dTLB-loads                 ( +-   0.506% )  (scaled from 60.59%)
           9238687 dTLB-load-misses           ( +-  22.547% )  (scaled from 61.87%)
        2014239444 dTLB-stores                ( +-   0.692% )  (scaled from 60.40%)
           3312335 dTLB-store-misses          ( +-   7.304% )  (scaled from 67.60%)
        6764372065 iTLB-loads                 ( +-   0.925% )  (scaled from 79.00%)
              8202 iTLB-load-misses           ( +-   0.475% )  (scaled from 70.55%)

        9.693655243  seconds time elapsed   ( +-   0.069% )

grep thp /proc/vmstat
thp_fault_alloc 35849
thp_fault_fallback 0
thp_collapse_alloc 3
thp_collapse_alloc_failed 0
thp_split 0

thp_split 0 confirms no thp split despite plenty of hugepages allocated.

The measurement of only the mremap time (so excluding the 3 long
memset and final long 10GB memory accessing memcmp):

THP on

usec 14824
usec 14862
usec 14859

THP off

usec 256416
usec 255981
usec 255847

With an older kernel without the mremap optimizations (the below patch
optimizes the non THP version too).

THP on

usec 392107
usec 390237
usec 404124

THP off

usec 444294
usec 445237
usec 445820

I guess with a threaded program that sends more IPI on large SMP it'd
create an even larger difference.

All debug options are off except DEBUG_VM to avoid skewing the
results.

The only problem for native 2M mremap like it happens above both the
source and destination address must be 2M aligned or the hugepmd can't be
moved without a split but that is an hardware limitation.

[akpm@linux-foundation.org: coding-style nitpicking]
Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Acked-by: Johannes Weiner &lt;jweiner@redhat.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This adds THP support to mremap (decreases the number of split_huge_page()
calls).

Here are also some benchmarks with a proggy like this:

===
#define _GNU_SOURCE
#include &lt;sys/mman.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;sys/time.h&gt;

#define SIZE (5UL*1024*1024*1024)

int main()
{
        static struct timeval oldstamp, newstamp;
	long diffsec;
	char *p, *p2, *p3, *p4;
	if (posix_memalign((void **)&amp;p, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&amp;p2, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&amp;p3, 2*1024*1024, 4096))
		perror("memalign"), exit(1);

	memset(p, 0xff, SIZE);
	memset(p2, 0xff, SIZE);
	memset(p3, 0x77, 4096);
	gettimeofday(&amp;oldstamp, NULL);
	p4 = mremap(p, SIZE, SIZE, MREMAP_FIXED|MREMAP_MAYMOVE, p3);
	gettimeofday(&amp;newstamp, NULL);
	diffsec = newstamp.tv_sec - oldstamp.tv_sec;
	diffsec = newstamp.tv_usec - oldstamp.tv_usec + 1000000 * diffsec;
	printf("usec %ld\n", diffsec);
	if (p == MAP_FAILED || p4 != p3)
	//if (p == MAP_FAILED)
		perror("mremap"), exit(1);
	if (memcmp(p4, p2, SIZE))
		printf("mremap bug\n"), exit(1);
	printf("ok\n");

	return 0;
}
===

THP on

 Performance counter stats for './largepage13' (3 runs):

          69195836 dTLB-loads                 ( +-   3.546% )  (scaled from 50.30%)
             60708 dTLB-load-misses           ( +-  11.776% )  (scaled from 52.62%)
         676266476 dTLB-stores                ( +-   5.654% )  (scaled from 69.54%)
             29856 dTLB-store-misses          ( +-   4.081% )  (scaled from 89.22%)
        1055848782 iTLB-loads                 ( +-   4.526% )  (scaled from 80.18%)
              8689 iTLB-load-misses           ( +-   2.987% )  (scaled from 58.20%)

        7.314454164  seconds time elapsed   ( +-   0.023% )

THP off

 Performance counter stats for './largepage13' (3 runs):

        1967379311 dTLB-loads                 ( +-   0.506% )  (scaled from 60.59%)
           9238687 dTLB-load-misses           ( +-  22.547% )  (scaled from 61.87%)
        2014239444 dTLB-stores                ( +-   0.692% )  (scaled from 60.40%)
           3312335 dTLB-store-misses          ( +-   7.304% )  (scaled from 67.60%)
        6764372065 iTLB-loads                 ( +-   0.925% )  (scaled from 79.00%)
              8202 iTLB-load-misses           ( +-   0.475% )  (scaled from 70.55%)

        9.693655243  seconds time elapsed   ( +-   0.069% )

grep thp /proc/vmstat
thp_fault_alloc 35849
thp_fault_fallback 0
thp_collapse_alloc 3
thp_collapse_alloc_failed 0
thp_split 0

thp_split 0 confirms no thp split despite plenty of hugepages allocated.

The measurement of only the mremap time (so excluding the 3 long
memset and final long 10GB memory accessing memcmp):

THP on

usec 14824
usec 14862
usec 14859

THP off

usec 256416
usec 255981
usec 255847

With an older kernel without the mremap optimizations (the below patch
optimizes the non THP version too).

THP on

usec 392107
usec 390237
usec 404124

THP off

usec 444294
usec 445237
usec 445820

I guess with a threaded program that sends more IPI on large SMP it'd
create an even larger difference.

All debug options are off except DEBUG_VM to avoid skewing the
results.

The only problem for native 2M mremap like it happens above both the
source and destination address must be 2M aligned or the hugepmd can't be
moved without a split but that is an hardware limitation.

[akpm@linux-foundation.org: coding-style nitpicking]
Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Acked-by: Johannes Weiner &lt;jweiner@redhat.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: convert anon_vma-&gt;lock to a mutex</title>
<updated>2011-05-25T15:39:19+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-05-25T00:12:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2b575eb64f7a9c701fb4bfdb12388ac547f6c2b6'/>
<id>2b575eb64f7a9c701fb4bfdb12388ac547f6c2b6</id>
<content type='text'>
Straightforward conversion of anon_vma-&gt;lock to a mutex.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Russell King &lt;rmk@arm.linux.org.uk&gt;
Cc: Paul Mundt &lt;lethal@linux-sh.org&gt;
Cc: Jeff Dike &lt;jdike@addtoit.com&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Namhyung Kim &lt;namhyung@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Straightforward conversion of anon_vma-&gt;lock to a mutex.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Russell King &lt;rmk@arm.linux.org.uk&gt;
Cc: Paul Mundt &lt;lethal@linux-sh.org&gt;
Cc: Jeff Dike &lt;jdike@addtoit.com&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Namhyung Kim &lt;namhyung@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups</title>
<updated>2011-04-28T18:28:20+00:00</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2011-04-27T22:26:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=78f11a255749d09025f54d4e2df4fbcb031530e2'/>
<id>78f11a255749d09025f54d4e2df4fbcb031530e2</id>
<content type='text'>
The huge_memory.c THP page fault was allowed to run if vm_ops was null
(which would succeed for /dev/zero MAP_PRIVATE, as the f_op-&gt;mmap wouldn't
setup a special vma-&gt;vm_ops and it would fallback to regular anonymous
memory) but other THP logics weren't fully activated for vmas with vm_file
not NULL (/dev/zero has a not NULL vma-&gt;vm_file).

So this removes the vm_file checks so that /dev/zero also can safely use
THP (the other albeit safer approach to fix this bug would have been to
prevent the THP initial page fault to run if vm_file was set).

After removing the vm_file checks, this also makes huge_memory.c stricter
in khugepaged for the DEBUG_VM=y case.  It doesn't replace the vm_file
check with a is_pfn_mapping check (but it keeps checking for VM_PFNMAP
under VM_BUG_ON) because for a is_cow_mapping() mapping VM_PFNMAP should
only be allowed to exist before the first page fault, and in turn when
vma-&gt;anon_vma is null (so preventing khugepaged registration).  So I tend
to think the previous comment saying if vm_file was set, VM_PFNMAP might
have been set and we could still be registered in khugepaged (despite
anon_vma was not NULL to be registered in khugepaged) was too paranoid.
The is_linear_pfn_mapping check is also I think superfluous (as described
by comment) but under DEBUG_VM it is safe to stay.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=33682

Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reported-by: Caspar Zhang &lt;bugs@casparzhang.com&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: &lt;stable@kernel.org&gt;		[2.6.38.x]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The huge_memory.c THP page fault was allowed to run if vm_ops was null
(which would succeed for /dev/zero MAP_PRIVATE, as the f_op-&gt;mmap wouldn't
setup a special vma-&gt;vm_ops and it would fallback to regular anonymous
memory) but other THP logics weren't fully activated for vmas with vm_file
not NULL (/dev/zero has a not NULL vma-&gt;vm_file).

So this removes the vm_file checks so that /dev/zero also can safely use
THP (the other albeit safer approach to fix this bug would have been to
prevent the THP initial page fault to run if vm_file was set).

After removing the vm_file checks, this also makes huge_memory.c stricter
in khugepaged for the DEBUG_VM=y case.  It doesn't replace the vm_file
check with a is_pfn_mapping check (but it keeps checking for VM_PFNMAP
under VM_BUG_ON) because for a is_cow_mapping() mapping VM_PFNMAP should
only be allowed to exist before the first page fault, and in turn when
vma-&gt;anon_vma is null (so preventing khugepaged registration).  So I tend
to think the previous comment saying if vm_file was set, VM_PFNMAP might
have been set and we could still be registered in khugepaged (despite
anon_vma was not NULL to be registered in khugepaged) was too paranoid.
The is_linear_pfn_mapping check is also I think superfluous (as described
by comment) but under DEBUG_VM it is safe to stay.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=33682

Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reported-by: Caspar Zhang &lt;bugs@casparzhang.com&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: &lt;stable@kernel.org&gt;		[2.6.38.x]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thp: prevent hugepages during args/env copying into the user stack</title>
<updated>2011-02-15T23:21:11+00:00</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2011-02-15T18:02:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a7d6e4ecdb7648478ddec76d30d87d03d6e22b31'/>
<id>a7d6e4ecdb7648478ddec76d30d87d03d6e22b31</id>
<content type='text'>
Transparent hugepages can only be created if rmap is fully
functional. So we must prevent hugepages to be created while
is_vma_temporary_stack() is true.

This also optmizes away some harmless but unnecessary setting of
khugepaged_scan.address and it switches some BUG_ON to VM_BUG_ON.

Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Transparent hugepages can only be created if rmap is fully
functional. So we must prevent hugepages to be created while
is_vma_temporary_stack() is true.

This also optmizes away some harmless but unnecessary setting of
khugepaged_scan.address and it switches some BUG_ON to VM_BUG_ON.

Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
