<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/mm.h, branch v3.3.5</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2012-01-12T03:12:10+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-01-12T03:12:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d0b9706c20ebb4ba181dc26e52ac9a6861abf425'/>
<id>d0b9706c20ebb4ba181dc26e52ac9a6861abf425</id>
<content type='text'>
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/numa: Add constraints check for nid parameters
  mm, x86: Remove debug_pagealloc_enabled
  x86/mm: Initialize high mem before free_all_bootmem()
  arch/x86/kernel/e820.c: quiet sparse noise about plain integer as NULL pointer
  arch/x86/kernel/e820.c: Eliminate bubble sort from sanitize_e820_map()
  x86: Fix mmap random address range
  x86, mm: Unify zone_sizes_init()
  x86, mm: Prepare zone_sizes_init() for unification
  x86, mm: Use max_low_pfn for ZONE_NORMAL on 64-bit
  x86, mm: Wrap ZONE_DMA32 with CONFIG_ZONE_DMA32
  x86, mm: Use max_pfn instead of highend_pfn
  x86, mm: Move zone init from paging_init() on 64-bit
  x86, mm: Use MAX_DMA_PFN for ZONE_DMA on 32-bit
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/numa: Add constraints check for nid parameters
  mm, x86: Remove debug_pagealloc_enabled
  x86/mm: Initialize high mem before free_all_bootmem()
  arch/x86/kernel/e820.c: quiet sparse noise about plain integer as NULL pointer
  arch/x86/kernel/e820.c: Eliminate bubble sort from sanitize_e820_map()
  x86: Fix mmap random address range
  x86, mm: Unify zone_sizes_init()
  x86, mm: Prepare zone_sizes_init() for unification
  x86, mm: Use max_low_pfn for ZONE_NORMAL on 64-bit
  x86, mm: Wrap ZONE_DMA32 with CONFIG_ZONE_DMA32
  x86, mm: Use max_pfn instead of highend_pfn
  x86, mm: Move zone init from paging_init() on 64-bit
  x86, mm: Use MAX_DMA_PFN for ZONE_DMA on 32-bit
</pre>
</div>
</content>
</entry>
<entry>
<title>procfs: introduce the /proc/&lt;pid&gt;/map_files/ directory</title>
<updated>2012-01-11T00:30:54+00:00</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@parallels.com</email>
</author>
<published>2012-01-10T23:11:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=640708a2cff7f81e246243b0073c66e6ece7e53e'/>
<id>640708a2cff7f81e246243b0073c66e6ece7e53e</id>
<content type='text'>
This one behaves similarly to the /proc/&lt;pid&gt;/fd/ one - it contains
symlinks one for each mapping with file, the name of a symlink is
"vma-&gt;vm_start-vma-&gt;vm_end", the target is the file.  Opening a symlink
results in a file that point exactly to the same inode as them vma's one.

For example the ls -l of some arbitrary /proc/&lt;pid&gt;/map_files/

 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80403000-7f8f80404000 -&gt; /lib64/libc-2.5.so
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f8061e000-7f8f80620000 -&gt; /lib64/libselinux.so.1
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80826000-7f8f80827000 -&gt; /lib64/libacl.so.1.1.0
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a2f000-7f8f80a30000 -&gt; /lib64/librt-2.5.so
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a30000-7f8f80a4c000 -&gt; /lib64/ld-2.5.so

This *helps* checkpointing process in three ways:

1. When dumping a task mappings we do know exact file that is mapped
   by particular region.  We do this by opening
   /proc/$pid/map_files/$address symlink the way we do with file
   descriptors.

2. This also helps in determining which anonymous shared mappings are
   shared with each other by comparing the inodes of them.

3. When restoring a set of processes in case two of them has a mapping
   shared, we map the memory by the 1st one and then open its
   /proc/$pid/map_files/$address file and map it by the 2nd task.

Using /proc/$pid/maps for this is quite inconvenient since it brings
repeatable re-reading and reparsing for this text file which slows down
restore procedure significantly.  Also as being pointed in (3) it is a way
easier to use top level shared mapping in children as
/proc/$pid/map_files/$address when needed.

[akpm@linux-foundation.org: coding-style fixes]
[gorcunov@openvz.org: make map_files depend on CHECKPOINT_RESTORE]
Signed-off-by: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Signed-off-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Reviewed-by: Vasiliy Kulikov &lt;segoon@openwall.com&gt;
Reviewed-by: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This one behaves similarly to the /proc/&lt;pid&gt;/fd/ one - it contains
symlinks one for each mapping with file, the name of a symlink is
"vma-&gt;vm_start-vma-&gt;vm_end", the target is the file.  Opening a symlink
results in a file that point exactly to the same inode as them vma's one.

For example the ls -l of some arbitrary /proc/&lt;pid&gt;/map_files/

 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80403000-7f8f80404000 -&gt; /lib64/libc-2.5.so
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f8061e000-7f8f80620000 -&gt; /lib64/libselinux.so.1
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80826000-7f8f80827000 -&gt; /lib64/libacl.so.1.1.0
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a2f000-7f8f80a30000 -&gt; /lib64/librt-2.5.so
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a30000-7f8f80a4c000 -&gt; /lib64/ld-2.5.so

This *helps* checkpointing process in three ways:

1. When dumping a task mappings we do know exact file that is mapped
   by particular region.  We do this by opening
   /proc/$pid/map_files/$address symlink the way we do with file
   descriptors.

2. This also helps in determining which anonymous shared mappings are
   shared with each other by comparing the inodes of them.

3. When restoring a set of processes in case two of them has a mapping
   shared, we map the memory by the 1st one and then open its
   /proc/$pid/map_files/$address file and map it by the 2nd task.

Using /proc/$pid/maps for this is quite inconvenient since it brings
repeatable re-reading and reparsing for this text file which slows down
restore procedure significantly.  Also as being pointed in (3) it is a way
easier to use top level shared mapping in children as
/proc/$pid/map_files/$address when needed.

[akpm@linux-foundation.org: coding-style fixes]
[gorcunov@openvz.org: make map_files depend on CHECKPOINT_RESTORE]
Signed-off-by: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Signed-off-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Reviewed-by: Vasiliy Kulikov &lt;segoon@openwall.com&gt;
Reviewed-by: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: more intensive memory corruption debugging</title>
<updated>2012-01-11T00:30:42+00:00</updated>
<author>
<name>Stanislaw Gruszka</name>
<email>sgruszka@redhat.com</email>
</author>
<published>2012-01-10T23:07:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c0a32fc5a2e470d0b02597b23ad79a317735253e'/>
<id>c0a32fc5a2e470d0b02597b23ad79a317735253e</id>
<content type='text'>
With CONFIG_DEBUG_PAGEALLOC configured, the CPU will generate an exception
on access (read,write) to an unallocated page, which permits us to catch
code which corrupts memory.  However the kernel is trying to maximise
memory usage, hence there are usually few free pages in the system and
buggy code usually corrupts some crucial data.

This patch changes the buddy allocator to keep more free/protected pages
and to interlace free/protected and allocated pages to increase the
probability of catching corruption.

When the kernel is compiled with CONFIG_DEBUG_PAGEALLOC,
debug_guardpage_minorder defines the minimum order used by the page
allocator to grant a request.  The requested size will be returned with
the remaining pages used as guard pages.

The default value of debug_guardpage_minorder is zero: no change from
current behaviour.

[akpm@linux-foundation.org: tweak documentation, s/flg/flag/]
Signed-off-by: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With CONFIG_DEBUG_PAGEALLOC configured, the CPU will generate an exception
on access (read,write) to an unallocated page, which permits us to catch
code which corrupts memory.  However the kernel is trying to maximise
memory usage, hence there are usually few free pages in the system and
buggy code usually corrupts some crucial data.

This patch changes the buddy allocator to keep more free/protected pages
and to interlace free/protected and allocated pages to increase the
probability of catching corruption.

When the kernel is compiled with CONFIG_DEBUG_PAGEALLOC,
debug_guardpage_minorder defines the minimum order used by the page
allocator to grant a request.  The requested size will be returned with
the remaining pages used as guard pages.

The default value of debug_guardpage_minorder is zero: no change from
current behaviour.

[akpm@linux-foundation.org: tweak documentation, s/flg/flag/]
Signed-off-by: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'memblock-kill-early_node_map' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/memblock</title>
<updated>2011-12-20T11:14:26+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2011-12-20T11:14:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=45aa0663cc408617b79a2b53f0a5f50e94688a48'/>
<id>45aa0663cc408617b79a2b53f0a5f50e94688a48</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>vmscan: use atomic-long for shrinker batching</title>
<updated>2011-12-09T15:50:27+00:00</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@openvz.org</email>
</author>
<published>2011-12-08T22:33:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=83aeeada7c69f35e5100b27ec354335597a7a488'/>
<id>83aeeada7c69f35e5100b27ec354335597a7a488</id>
<content type='text'>
Use atomic-long operations instead of looping around cmpxchg().

[akpm@linux-foundation.org: massage atomic.h inclusions]
Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use atomic-long operations instead of looping around cmpxchg().

[akpm@linux-foundation.org: massage atomic.h inclusions]
Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>memblock: Kill early_node_map[]</title>
<updated>2011-12-08T18:22:09+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-12-08T18:22:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0ee332c1451869963626bf9cac88f165a90990e1'/>
<id>0ee332c1451869963626bf9cac88f165a90990e1</id>
<content type='text'>
Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEBLOCK_NODE_MAP -
there's no user of early_node_map[] left.  Kill early_node_map[] and
replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP.  Also,
relocate for_each_mem_pfn_range() and helper from mm.h to memblock.h
as page_alloc.c would no longer host an alternative implementation.

This change is ultimately one to one mapping and shouldn't cause any
observable difference; however, after the recent changes, there are
some functions which now would fit memblock.c better than page_alloc.c
and dependency on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK
doesn't make much sense on some of them.  Further cleanups for
functions inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice.

-v2: Fix compile bug introduced by mis-spelling
 CONFIG_HAVE_MEMBLOCK_NODE_MAP to CONFIG_MEMBLOCK_HAVE_NODE_MAP in
 mmzone.h.  Reported by Stephen Rothwell.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Chen Liqin &lt;liqin.chen@sunplusct.com&gt;
Cc: Paul Mundt &lt;lethal@linux-sh.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEBLOCK_NODE_MAP -
there's no user of early_node_map[] left.  Kill early_node_map[] and
replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP.  Also,
relocate for_each_mem_pfn_range() and helper from mm.h to memblock.h
as page_alloc.c would no longer host an alternative implementation.

This change is ultimately one to one mapping and shouldn't cause any
observable difference; however, after the recent changes, there are
some functions which now would fit memblock.c better than page_alloc.c
and dependency on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK
doesn't make much sense on some of them.  Further cleanups for
functions inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice.

-v2: Fix compile bug introduced by mis-spelling
 CONFIG_HAVE_MEMBLOCK_NODE_MAP to CONFIG_MEMBLOCK_HAVE_NODE_MAP in
 mmzone.h.  Reported by Stephen Rothwell.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Chen Liqin &lt;liqin.chen@sunplusct.com&gt;
Cc: Paul Mundt &lt;lethal@linux-sh.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, x86: Remove debug_pagealloc_enabled</title>
<updated>2011-12-06T08:24:07+00:00</updated>
<author>
<name>Stanislaw Gruszka</name>
<email>sgruszka@redhat.com</email>
</author>
<published>2011-11-29T16:05:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=54c29c635ae91f5d75ced7bffeaa77ba37ca02bb'/>
<id>54c29c635ae91f5d75ced7bffeaa77ba37ca02bb</id>
<content type='text'>
When (no)bootmem finish operation, it pass pages to buddy
allocator. Since debug_pagealloc_enabled is not set, we will do
not protect pages, what is not what we want with
CONFIG_DEBUG_PAGEALLOC=y.

To fix remove debug_pagealloc_enabled. That variable was
introduced by commit 12d6f21e "x86: do not PSE on
CONFIG_DEBUG_PAGEALLOC=y" to get more CPA (change page
attribude) code testing. But currently we have CONFIG_CPA_DEBUG,
which test CPA.

Signed-off-by: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1322582711-14571-1-git-send-email-sgruszka@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When (no)bootmem finish operation, it pass pages to buddy
allocator. Since debug_pagealloc_enabled is not set, we will do
not protect pages, what is not what we want with
CONFIG_DEBUG_PAGEALLOC=y.

To fix remove debug_pagealloc_enabled. That variable was
introduced by commit 12d6f21e "x86: do not PSE on
CONFIG_DEBUG_PAGEALLOC=y" to get more CPA (change page
attribude) code testing. But currently we have CONFIG_CPA_DEBUG,
which test CPA.

Signed-off-by: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1322582711-14571-1-git-send-email-sgruszka@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'master' into x86/memblock</title>
<updated>2011-11-28T17:46:22+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-11-28T17:46:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d4bbf7e7759afc172e2bfbc5c416324590049cdd'/>
<id>d4bbf7e7759afc172e2bfbc5c416324590049cdd</id>
<content type='text'>
Conflicts &amp; resolutions:

* arch/x86/xen/setup.c

	dc91c728fd "xen: allow extra memory to be in multiple regions"
	24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..."

	conflicted on xen_add_extra_mem() updates.  The resolution is
	trivial as the latter just want to replace
	memblock_x86_reserve_range() with memblock_reserve().

* drivers/pci/intel-iommu.c

	166e9278a3f "x86/ia64: intel-iommu: move to drivers/iommu/"
	5dfe8660a3d "bootmem: Replace work_with_active_regions() with..."

	conflicted as the former moved the file under drivers/iommu/.
	Resolved by applying the chnages from the latter on the moved
	file.

* mm/Kconfig

	6661672053a "memblock: add NO_BOOTMEM config symbol"
	c378ddd53f9 "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option"

	conflicted trivially.  Both added config options.  Just
	letting both add their own options resolves the conflict.

* mm/memblock.c

	d1f0ece6cdc "mm/memblock.c: small function definition fixes"
	ed7b56a799c "memblock: Remove memblock_memory_can_coalesce()"

	confliected.  The former updates function removed by the
	latter.  Resolution is trivial.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts &amp; resolutions:

* arch/x86/xen/setup.c

	dc91c728fd "xen: allow extra memory to be in multiple regions"
	24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..."

	conflicted on xen_add_extra_mem() updates.  The resolution is
	trivial as the latter just want to replace
	memblock_x86_reserve_range() with memblock_reserve().

* drivers/pci/intel-iommu.c

	166e9278a3f "x86/ia64: intel-iommu: move to drivers/iommu/"
	5dfe8660a3d "bootmem: Replace work_with_active_regions() with..."

	conflicted as the former moved the file under drivers/iommu/.
	Resolved by applying the chnages from the latter on the moved
	file.

* mm/Kconfig

	6661672053a "memblock: add NO_BOOTMEM config symbol"
	c378ddd53f9 "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option"

	conflicted trivially.  Both added config options.  Just
	letting both add their own options resolves the conflict.

* mm/memblock.c

	d1f0ece6cdc "mm/memblock.c: small function definition fixes"
	ed7b56a799c "memblock: Remove memblock_memory_can_coalesce()"

	confliected.  The former updates function removed by the
	latter.  Resolution is trivial.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thp: share get_huge_page_tail()</title>
<updated>2011-11-02T23:06:58+00:00</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2011-11-02T20:37:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b35a35b556f5e6b7993ad0baf20173e75c09ce8c'/>
<id>b35a35b556f5e6b7993ad0baf20173e75c09ce8c</id>
<content type='text'>
This avoids duplicating the function in every arch gup_fast.

Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This avoids duplicating the function in every arch gup_fast.

Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: thp: tail page refcounting fix</title>
<updated>2011-11-02T23:06:57+00:00</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2011-11-02T20:36:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=70b50f94f1644e2aa7cb374819cfd93f3c28d725'/>
<id>70b50f94f1644e2aa7cb374819cfd93f3c28d725</id>
<content type='text'>
Michel while working on the working set estimation code, noticed that
calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
wasn't safe, if the pfn ended up being a tail page of a transparent
hugepage under splitting by __split_huge_page_refcount().

He then found the problem could also theoretically materialize with
page_cache_get_speculative() during the speculative radix tree lookups
that uses get_page_unless_zero() in SMP if the radix tree page is freed
and reallocated and get_user_pages is called on it before
page_cache_get_speculative has a chance to call get_page_unless_zero().

So the best way to fix the problem is to keep page_tail-&gt;_count zero at
all times.  This will guarantee that get_page_unless_zero() can never
succeed on any tail page.  page_tail-&gt;_mapcount is guaranteed zero and
is unused for all tail pages of a compound page, so we can simply
account the tail page references there and transfer them to
tail_page-&gt;_count in __split_huge_page_refcount() (in addition to the
head_page-&gt;_mapcount).

While debugging this s/_count/_mapcount/ change I also noticed get_page is
called by direct-io.c on pages returned by get_user_pages.  That wasn't
entirely safe because the two atomic_inc in get_page weren't atomic.  As
opposed to other get_user_page users like secondary-MMU page fault to
establish the shadow pagetables would never call any superflous get_page
after get_user_page returns.  It's safer to make get_page universally safe
for tail pages and to use get_page_foll() within follow_page (inside
get_user_pages()).  get_page_foll() is safe to do the refcounting for tail
pages without taking any locks because it is run within PT lock protected
critical sections (PT lock for pte and page_table_lock for
pmd_trans_huge).

The standard get_page() as invoked by direct-io instead will now take
the compound_lock but still only for tail pages.  The direct-io paths
are usually I/O bound and the compound_lock is per THP so very
finegrined, so there's no risk of scalability issues with it.  A simple
direct-io benchmarks with all lockdep prove locking and spinlock
debugging infrastructure enabled shows identical performance and no
overhead.  So it's worth it.  Ideally direct-io should stop calling
get_page() on pages returned by get_user_pages().  The spinlock in
get_page() is already optimized away for no-THP builds but doing
get_page() on tail pages returned by GUP is generally a rare operation
and usually only run in I/O paths.

This new refcounting on page_tail-&gt;_mapcount in addition to avoiding new
RCU critical sections will also allow the working set estimation code to
work without any further complexity associated to the tail page
refcounting with THP.

Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reported-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Cc: &lt;stable@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Michel while working on the working set estimation code, noticed that
calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
wasn't safe, if the pfn ended up being a tail page of a transparent
hugepage under splitting by __split_huge_page_refcount().

He then found the problem could also theoretically materialize with
page_cache_get_speculative() during the speculative radix tree lookups
that uses get_page_unless_zero() in SMP if the radix tree page is freed
and reallocated and get_user_pages is called on it before
page_cache_get_speculative has a chance to call get_page_unless_zero().

So the best way to fix the problem is to keep page_tail-&gt;_count zero at
all times.  This will guarantee that get_page_unless_zero() can never
succeed on any tail page.  page_tail-&gt;_mapcount is guaranteed zero and
is unused for all tail pages of a compound page, so we can simply
account the tail page references there and transfer them to
tail_page-&gt;_count in __split_huge_page_refcount() (in addition to the
head_page-&gt;_mapcount).

While debugging this s/_count/_mapcount/ change I also noticed get_page is
called by direct-io.c on pages returned by get_user_pages.  That wasn't
entirely safe because the two atomic_inc in get_page weren't atomic.  As
opposed to other get_user_page users like secondary-MMU page fault to
establish the shadow pagetables would never call any superflous get_page
after get_user_page returns.  It's safer to make get_page universally safe
for tail pages and to use get_page_foll() within follow_page (inside
get_user_pages()).  get_page_foll() is safe to do the refcounting for tail
pages without taking any locks because it is run within PT lock protected
critical sections (PT lock for pte and page_table_lock for
pmd_trans_huge).

The standard get_page() as invoked by direct-io instead will now take
the compound_lock but still only for tail pages.  The direct-io paths
are usually I/O bound and the compound_lock is per THP so very
finegrined, so there's no risk of scalability issues with it.  A simple
direct-io benchmarks with all lockdep prove locking and spinlock
debugging infrastructure enabled shows identical performance and no
overhead.  So it's worth it.  Ideally direct-io should stop calling
get_page() on pages returned by get_user_pages().  The spinlock in
get_page() is already optimized away for no-THP builds but doing
get_page() on tail pages returned by GUP is generally a rare operation
and usually only run in I/O paths.

This new refcounting on page_tail-&gt;_mapcount in addition to avoiding new
RCU critical sections will also allow the working set estimation code to
work without any further complexity associated to the tail page
refcounting with THP.

Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reported-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Cc: &lt;stable@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
