<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/swapops.h, branch v4.14-rc7</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory</title>
<updated>2017-09-09T01:26:46+00:00</updated>
<author>
<name>Jérôme Glisse</name>
<email>jglisse@redhat.com</email>
</author>
<published>2017-09-08T23:11:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5042db43cc26f51eed51c56192e2c2317e44315f'/>
<id>5042db43cc26f51eed51c56192e2c2317e44315f</id>
<content type='text'>
HMM (heterogeneous memory management) need struct page to support
migration from system main memory to device memory.  Reasons for HMM and
migration to device memory is explained with HMM core patch.

This patch deals with device memory that is un-addressable memory (ie CPU
can not access it).  Hence we do not want those struct page to be manage
like regular memory.  That is why we extend ZONE_DEVICE to support
different types of memory.

A persistent memory type is define for existing user of ZONE_DEVICE and a
new device un-addressable type is added for the un-addressable memory
type.  There is a clear separation between what is expected from each
memory type and existing user of ZONE_DEVICE are un-affected by new
requirement and new use of the un-addressable type.  All specific code
path are protect with test against the memory type.

Because memory is un-addressable we use a new special swap type for when a
page is migrated to device memory (this reduces the number of maximum swap
file).

The main two additions beside memory type to ZONE_DEVICE is two callbacks.
First one, page_free() is call whenever page refcount reach 1 (which
means the page is free as ZONE_DEVICE page never reach a refcount of 0).
This allow device driver to manage its memory and associated struct page.

The second callback page_fault() happens when there is a CPU access to an
address that is back by a device page (which are un-addressable by the
CPU).  This callback is responsible to migrate the page back to system
main memory.  Device driver can not block migration back to system memory,
HMM make sure that such page can not be pin into device memory.

If device is in some error condition and can not migrate memory back then
a CPU page fault to device memory should end with SIGBUS.

[arnd@arndb.de: fix warning]
  Link: http://lkml.kernel.org/r/20170823133213.712917-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170817000548.32038-8-jglisse@redhat.com
Signed-off-by: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Aneesh Kumar &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Evgeny Baskakov &lt;ebaskakov@nvidia.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Mark Hairgrove &lt;mhairgrove@nvidia.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Sherry Cheung &lt;SCheung@nvidia.com&gt;
Cc: Subhash Gutti &lt;sgutti@nvidia.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Bob Liu &lt;liubo95@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
HMM (heterogeneous memory management) need struct page to support
migration from system main memory to device memory.  Reasons for HMM and
migration to device memory is explained with HMM core patch.

This patch deals with device memory that is un-addressable memory (ie CPU
can not access it).  Hence we do not want those struct page to be manage
like regular memory.  That is why we extend ZONE_DEVICE to support
different types of memory.

A persistent memory type is define for existing user of ZONE_DEVICE and a
new device un-addressable type is added for the un-addressable memory
type.  There is a clear separation between what is expected from each
memory type and existing user of ZONE_DEVICE are un-affected by new
requirement and new use of the un-addressable type.  All specific code
path are protect with test against the memory type.

Because memory is un-addressable we use a new special swap type for when a
page is migrated to device memory (this reduces the number of maximum swap
file).

The main two additions beside memory type to ZONE_DEVICE is two callbacks.
First one, page_free() is call whenever page refcount reach 1 (which
means the page is free as ZONE_DEVICE page never reach a refcount of 0).
This allow device driver to manage its memory and associated struct page.

The second callback page_fault() happens when there is a CPU access to an
address that is back by a device page (which are un-addressable by the
CPU).  This callback is responsible to migrate the page back to system
main memory.  Device driver can not block migration back to system memory,
HMM make sure that such page can not be pin into device memory.

If device is in some error condition and can not migrate memory back then
a CPU page fault to device memory should end with SIGBUS.

[arnd@arndb.de: fix warning]
  Link: http://lkml.kernel.org/r/20170823133213.712917-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170817000548.32038-8-jglisse@redhat.com
Signed-off-by: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Aneesh Kumar &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Evgeny Baskakov &lt;ebaskakov@nvidia.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Mark Hairgrove &lt;mhairgrove@nvidia.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Sherry Cheung &lt;SCheung@nvidia.com&gt;
Cc: Subhash Gutti &lt;sgutti@nvidia.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Bob Liu &lt;liubo95@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: soft-dirty: keep soft-dirty bits over thp migration</title>
<updated>2017-09-09T01:26:45+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2017-09-08T23:11:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ab6e3d0939bb332d72444a532f0f72e0dfde7b7b'/>
<id>ab6e3d0939bb332d72444a532f0f72e0dfde7b7b</id>
<content type='text'>
Soft dirty bit is designed to keep tracked over page migration.  This
patch makes it work in the same manner for thp migration too.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Soft dirty bit is designed to keep tracked over page migration.  This
patch makes it work in the same manner for thp migration too.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: thp: enable thp migration in generic path</title>
<updated>2017-09-09T01:26:45+00:00</updated>
<author>
<name>Zi Yan</name>
<email>zi.yan@cs.rutgers.edu</email>
</author>
<published>2017-09-08T23:10:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=616b8371539a6c487404c3b8fb04078016dab4ba'/>
<id>616b8371539a6c487404c3b8fb04078016dab4ba</id>
<content type='text'>
Add thp migration's core code, including conversions between a PMD entry
and a swap entry, setting PMD migration entry, removing PMD migration
entry, and waiting on PMD migration entries.

This patch makes it possible to support thp migration.  If you fail to
allocate a destination page as a thp, you just split the source thp as
we do now, and then enter the normal page migration.  If you succeed to
allocate destination thp, you enter thp migration.  Subsequent patches
actually enable thp migration for each caller of page migration by
allowing its get_new_page() callback to allocate thps.

[zi.yan@cs.rutgers.edu: fix gcc-4.9.0 -Wmissing-braces warning]
  Link: http://lkml.kernel.org/r/A0ABA698-7486-46C3-B209-E95A9048B22C@cs.rutgers.edu
[akpm@linux-foundation.org: fix x86_64 allnoconfig warning]
Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add thp migration's core code, including conversions between a PMD entry
and a swap entry, setting PMD migration entry, removing PMD migration
entry, and waiting on PMD migration entries.

This patch makes it possible to support thp migration.  If you fail to
allocate a destination page as a thp, you just split the source thp as
we do now, and then enter the normal page migration.  If you succeed to
allocate destination thp, you enter thp migration.  Subsequent patches
actually enable thp migration for each caller of page migration by
allowing its get_new_page() callback to allocate thps.

[zi.yan@cs.rutgers.edu: fix gcc-4.9.0 -Wmissing-braces warning]
  Link: http://lkml.kernel.org/r/A0ABA698-7486-46C3-B209-E95A9048B22C@cs.rutgers.edu
[akpm@linux-foundation.org: fix x86_64 allnoconfig warning]
Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: hwpoison: change PageHWPoison behavior on hugetlb pages</title>
<updated>2017-07-10T23:32:30+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2017-07-10T22:47:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b37ff71cc626a0c1b5e098ff9a0b723815f6aaeb'/>
<id>b37ff71cc626a0c1b5e098ff9a0b723815f6aaeb</id>
<content type='text'>
We'd like to narrow down the error region in memory error on hugetlb
pages.  However, currently we set PageHWPoison flags on all subpages in
the error hugepage and add # of subpages to num_hwpoison_pages, which
doesn't fit our purpose.

So this patch changes the behavior and we only set PageHWPoison on the
head page then increase num_hwpoison_pages only by 1.  This is a
preparation for narrow-down part which comes in later patches.

Link: http://lkml.kernel.org/r/1496305019-5493-4-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We'd like to narrow down the error region in memory error on hugetlb
pages.  However, currently we set PageHWPoison flags on all subpages in
the error hugepage and add # of subpages to num_hwpoison_pages, which
doesn't fit our purpose.

So this patch changes the behavior and we only set PageHWPoison on the
head page then increase num_hwpoison_pages only by 1.  This is a
preparation for narrow-down part which comes in later patches.

Link: http://lkml.kernel.org/r/1496305019-5493-4-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/hwpoison: fix race between soft_offline_page and unpoison_memory</title>
<updated>2015-09-08T22:35:28+00:00</updated>
<author>
<name>Wanpeng Li</name>
<email>wanpeng.li@hotmail.com</email>
</author>
<published>2015-09-08T22:03:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=da1b13ccfbebe0b9d69b5d61eff0a675e19e69a5'/>
<id>da1b13ccfbebe0b9d69b5d61eff0a675e19e69a5</id>
<content type='text'>
Wanpeng Li reported a race between soft_offline_page() and
unpoison_memory(), which causes the following kernel panic:

   BUG: Bad page state in process bash  pfn:97000
   page:ffffea00025c0000 count:0 mapcount:1 mapping:          (null) index:0x7f4fdbe00
   flags: 0x1fffff80080048(uptodate|active|swapbacked)
   page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
   bad because of flags:
   flags: 0x40(active)
   Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 nfsv4 dns_resolver bnep rfcomm nfsd bluetooth auth_rpcgss nfs_acl nfs rfkill lockd grace sunrpc i2c_algo_bit drm_kms_helper snd_hda_codec_realtek snd_hda_codec_generic drm snd_hda_intel fscache snd_hda_codec x86_pkg_temp_thermal coretemp kvm_intel snd_hda_core snd_hwdep kvm snd_pcm snd_seq_dummy snd_seq_oss crct10dif_pclmul snd_seq_midi crc32_pclmul snd_seq_midi_event ghash_clmulni_intel snd_rawmidi aesni_intel lrw gf128mul snd_seq glue_helper ablk_helper snd_seq_device cryptd fuse snd_timer dcdbas serio_raw mei_me parport_pc snd mei ppdev i2c_core video lp soundcore parport lpc_ich shpchp mfd_core ext4 mbcache jbd2 sd_mod e1000e ahci ptp libahci crc32c_intel libata pps_core
   CPU: 3 PID: 2211 Comm: bash Not tainted 4.2.0-rc5-mm1+ #45
   Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
   Call Trace:
     dump_stack+0x48/0x5c
     bad_page+0xe6/0x140
     free_pages_prepare+0x2f9/0x320
     ? uncharge_list+0xdd/0x100
     free_hot_cold_page+0x40/0x170
     __put_single_page+0x20/0x30
     put_page+0x25/0x40
     unmap_and_move+0x1a6/0x1f0
     migrate_pages+0x100/0x1d0
     ? kill_procs+0x100/0x100
     ? unlock_page+0x6f/0x90
     __soft_offline_page+0x127/0x2a0
     soft_offline_page+0xa6/0x200

This race is explained like below:

  CPU0                    CPU1

  soft_offline_page
  __soft_offline_page
  TestSetPageHWPoison
                        unpoison_memory
                        PageHWPoison check (true)
                        TestClearPageHWPoison
                        put_page    -&gt; release refcount held by get_hwpoison_page in unpoison_memory
                        put_page    -&gt; release refcount held by isolate_lru_page in __soft_offline_page
  migrate_pages

The second put_page() releases refcount held by isolate_lru_page() which
will lead to unmap_and_move() releases the last refcount of page and w/
mapcount still 1 since try_to_unmap() is not called if there is only one
user map the page.  Anyway, the page refcount and mapcount will still
mess if the page is mapped by multiple users.

This race was introduced by commit 4491f71260 ("mm/memory-failure: set
PageHWPoison before migrate_pages()"), which focuses on preventing the
reuse of successfully migrated page.  Before this commit we prevent the
reuse by changing the migratetype to MIGRATE_ISOLATE during soft
offlining, which has the following problems, so simply reverting the
commit is not a best option:

  1) it doesn't eliminate the reuse completely, because
     set_migratetype_isolate() can fail to set MIGRATE_ISOLATE to the
     target page if the pageblock of the page contains one or more
     unmovable pages (i.e.  has_unmovable_pages() returns true).

  2) the original code changes migratetype to MIGRATE_ISOLATE
     forcibly, and sets it to MIGRATE_MOVABLE forcibly after soft offline,
     regardless of the original migratetype state, which could impact
     other subsystems like memory hotplug or compaction.

This patch moves PageSetHWPoison just after put_page() in
unmap_and_move(), which closes up the reported race window and minimizes
another race window b/w SetPageHWPoison and reallocation (which causes
the reuse of soft-offlined page.) The latter race window still exists
but it's acceptable, because it's rare and effectively the same as
ordinary "containment failure" case even if it happens, so keep the
window open is acceptable.

Fixes: 4491f71260 ("mm/memory-failure: set PageHWPoison before migrate_pages()")
Signed-off-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Reported-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Tested-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Wanpeng Li reported a race between soft_offline_page() and
unpoison_memory(), which causes the following kernel panic:

   BUG: Bad page state in process bash  pfn:97000
   page:ffffea00025c0000 count:0 mapcount:1 mapping:          (null) index:0x7f4fdbe00
   flags: 0x1fffff80080048(uptodate|active|swapbacked)
   page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
   bad because of flags:
   flags: 0x40(active)
   Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 nfsv4 dns_resolver bnep rfcomm nfsd bluetooth auth_rpcgss nfs_acl nfs rfkill lockd grace sunrpc i2c_algo_bit drm_kms_helper snd_hda_codec_realtek snd_hda_codec_generic drm snd_hda_intel fscache snd_hda_codec x86_pkg_temp_thermal coretemp kvm_intel snd_hda_core snd_hwdep kvm snd_pcm snd_seq_dummy snd_seq_oss crct10dif_pclmul snd_seq_midi crc32_pclmul snd_seq_midi_event ghash_clmulni_intel snd_rawmidi aesni_intel lrw gf128mul snd_seq glue_helper ablk_helper snd_seq_device cryptd fuse snd_timer dcdbas serio_raw mei_me parport_pc snd mei ppdev i2c_core video lp soundcore parport lpc_ich shpchp mfd_core ext4 mbcache jbd2 sd_mod e1000e ahci ptp libahci crc32c_intel libata pps_core
   CPU: 3 PID: 2211 Comm: bash Not tainted 4.2.0-rc5-mm1+ #45
   Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
   Call Trace:
     dump_stack+0x48/0x5c
     bad_page+0xe6/0x140
     free_pages_prepare+0x2f9/0x320
     ? uncharge_list+0xdd/0x100
     free_hot_cold_page+0x40/0x170
     __put_single_page+0x20/0x30
     put_page+0x25/0x40
     unmap_and_move+0x1a6/0x1f0
     migrate_pages+0x100/0x1d0
     ? kill_procs+0x100/0x100
     ? unlock_page+0x6f/0x90
     __soft_offline_page+0x127/0x2a0
     soft_offline_page+0xa6/0x200

This race is explained like below:

  CPU0                    CPU1

  soft_offline_page
  __soft_offline_page
  TestSetPageHWPoison
                        unpoison_memory
                        PageHWPoison check (true)
                        TestClearPageHWPoison
                        put_page    -&gt; release refcount held by get_hwpoison_page in unpoison_memory
                        put_page    -&gt; release refcount held by isolate_lru_page in __soft_offline_page
  migrate_pages

The second put_page() releases refcount held by isolate_lru_page() which
will lead to unmap_and_move() releases the last refcount of page and w/
mapcount still 1 since try_to_unmap() is not called if there is only one
user map the page.  Anyway, the page refcount and mapcount will still
mess if the page is mapped by multiple users.

This race was introduced by commit 4491f71260 ("mm/memory-failure: set
PageHWPoison before migrate_pages()"), which focuses on preventing the
reuse of successfully migrated page.  Before this commit we prevent the
reuse by changing the migratetype to MIGRATE_ISOLATE during soft
offlining, which has the following problems, so simply reverting the
commit is not a best option:

  1) it doesn't eliminate the reuse completely, because
     set_migratetype_isolate() can fail to set MIGRATE_ISOLATE to the
     target page if the pageblock of the page contains one or more
     unmovable pages (i.e.  has_unmovable_pages() returns true).

  2) the original code changes migratetype to MIGRATE_ISOLATE
     forcibly, and sets it to MIGRATE_MOVABLE forcibly after soft offline,
     regardless of the original migratetype state, which could impact
     other subsystems like memory hotplug or compaction.

This patch moves PageSetHWPoison just after put_page() in
unmap_and_move(), which closes up the reported race window and minimizes
another race window b/w SetPageHWPoison and reallocation (which causes
the reuse of soft-offlined page.) The latter race window still exists
but it's acceptable, because it's rare and effectively the same as
ordinary "containment failure" case even if it happens, so keep the
window open is acceptable.

Fixes: 4491f71260 ("mm/memory-failure: set PageHWPoison before migrate_pages()")
Signed-off-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Reported-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Tested-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/hwpoison: introduce num_poisoned_pages wrappers</title>
<updated>2015-09-08T22:35:28+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2015-09-08T22:03:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8e30456b6c56029ecbb43b777519175e478adfbf'/>
<id>8e30456b6c56029ecbb43b777519175e478adfbf</id>
<content type='text'>
num_poisoned_pages counter will be changed outside mm/memory-failure.c
by a subsequent patch, so this patch prepares wrappers to manipulate it.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Tested-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
num_poisoned_pages counter will be changed outside mm/memory-failure.c
by a subsequent patch, so this patch prepares wrappers to manipulate it.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Tested-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: remove remaining references to NUMA hinting bits and helpers</title>
<updated>2015-02-13T02:54:08+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2015-02-12T22:58:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=21d9ee3eda7792c45880b2f11bff8e95c9a061fb'/>
<id>21d9ee3eda7792c45880b2f11bff8e95c9a061fb</id>
<content type='text'>
This patch removes the NUMA PTE bits and associated helpers.  As a
side-effect it increases the maximum possible swap space on x86-64.

One potential source of problems is races between the marking of PTEs
PROT_NONE, NUMA hinting faults and migration.  It must be guaranteed that
a PTE being protected is not faulted in parallel, seen as a pte_none and
corrupting memory.  The base case is safe but transhuge has problems in
the past due to an different migration mechanism and a dependance on page
lock to serialise migrations and warrants a closer look.

task_work hinting update			parallel fault
------------------------			--------------
change_pmd_range
  change_huge_pmd
    __pmd_trans_huge_lock
      pmdp_get_and_clear
						__handle_mm_fault
						pmd_none
						  do_huge_pmd_anonymous_page
						  read? pmd_lock blocks until hinting complete, fail !pmd_none test
						  write? __do_huge_pmd_anonymous_page acquires pmd_lock, checks pmd_none
      pmd_modify
      set_pmd_at

task_work hinting update			parallel migration
------------------------			------------------
change_pmd_range
  change_huge_pmd
    __pmd_trans_huge_lock
      pmdp_get_and_clear
						__handle_mm_fault
						  do_huge_pmd_numa_page
						    migrate_misplaced_transhuge_page
						    pmd_lock waits for updates to complete, recheck pmd_same
      pmd_modify
      set_pmd_at

Both of those are safe and the case where a transhuge page is inserted
during a protection update is unchanged.  The case where two processes try
migrating at the same time is unchanged by this series so should still be
ok.  I could not find a case where we are accidentally depending on the
PTE not being cleared and flushed.  If one is missed, it'll manifest as
corruption problems that start triggering shortly after this series is
merged and only happen when NUMA balancing is enabled.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Tested-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Kirill Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mark Brown &lt;broonie@kernel.org&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch removes the NUMA PTE bits and associated helpers.  As a
side-effect it increases the maximum possible swap space on x86-64.

One potential source of problems is races between the marking of PTEs
PROT_NONE, NUMA hinting faults and migration.  It must be guaranteed that
a PTE being protected is not faulted in parallel, seen as a pte_none and
corrupting memory.  The base case is safe but transhuge has problems in
the past due to an different migration mechanism and a dependance on page
lock to serialise migrations and warrants a closer look.

task_work hinting update			parallel fault
------------------------			--------------
change_pmd_range
  change_huge_pmd
    __pmd_trans_huge_lock
      pmdp_get_and_clear
						__handle_mm_fault
						pmd_none
						  do_huge_pmd_anonymous_page
						  read? pmd_lock blocks until hinting complete, fail !pmd_none test
						  write? __do_huge_pmd_anonymous_page acquires pmd_lock, checks pmd_none
      pmd_modify
      set_pmd_at

task_work hinting update			parallel migration
------------------------			------------------
change_pmd_range
  change_huge_pmd
    __pmd_trans_huge_lock
      pmdp_get_and_clear
						__handle_mm_fault
						  do_huge_pmd_numa_page
						    migrate_misplaced_transhuge_page
						    pmd_lock waits for updates to complete, recheck pmd_same
      pmd_modify
      set_pmd_at

Both of those are safe and the case where a transhuge page is inserted
during a protection update is unchanged.  The case where two processes try
migrating at the same time is unchanged by this series so should still be
ok.  I could not find a case where we are accidentally depending on the
PTE not being cleared and flushed.  If one is missed, it'll manifest as
corruption problems that start triggering shortly after this series is
merged and only happen when NUMA balancing is enabled.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Tested-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Kirill Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mark Brown &lt;broonie@kernel.org&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/hugetlb: take page table lock in follow_huge_pmd()</title>
<updated>2015-02-12T01:06:01+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2015-02-11T23:25:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e66f17ff71772b209eed39de35aaa99ba819c93d'/>
<id>e66f17ff71772b209eed39de35aaa99ba819c93d</id>
<content type='text'>
We have a race condition between move_pages() and freeing hugepages, where
move_pages() calls follow_page(FOLL_GET) for hugepages internally and
tries to get its refcount without preventing concurrent freeing.  This
race crashes the kernel, so this patch fixes it by moving FOLL_GET code
for hugepages into follow_huge_pmd() with taking the page table lock.

This patch intentionally removes page==NULL check after pte_page.
This is justified because pte_page() never returns NULL for any
architectures or configurations.

This patch changes the behavior of follow_huge_pmd() for tail pages and
then tail pages can be pinned/returned.  So the caller must be changed to
properly handle the returned tail pages.

We could have a choice to add the similar locking to
follow_huge_(addr|pud) for consistency, but it's not necessary because
currently these functions don't support FOLL_GET flag, so let's leave it
for future development.

Here is the reproducer:

  $ cat movepages.c
  #include &lt;stdio.h&gt;
  #include &lt;stdlib.h&gt;
  #include &lt;numaif.h&gt;

  #define ADDR_INPUT      0x700000000000UL
  #define HPS             0x200000
  #define PS              0x1000

  int main(int argc, char *argv[]) {
          int i;
          int nr_hp = strtol(argv[1], NULL, 0);
          int nr_p  = nr_hp * HPS / PS;
          int ret;
          void **addrs;
          int *status;
          int *nodes;
          pid_t pid;

          pid = strtol(argv[2], NULL, 0);
          addrs  = malloc(sizeof(char *) * nr_p + 1);
          status = malloc(sizeof(char *) * nr_p + 1);
          nodes  = malloc(sizeof(char *) * nr_p + 1);

          while (1) {
                  for (i = 0; i &lt; nr_p; i++) {
                          addrs[i] = (void *)ADDR_INPUT + i * PS;
                          nodes[i] = 1;
                          status[i] = 0;
                  }
                  ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                        MPOL_MF_MOVE_ALL);
                  if (ret == -1)
                          err("move_pages");

                  for (i = 0; i &lt; nr_p; i++) {
                          addrs[i] = (void *)ADDR_INPUT + i * PS;
                          nodes[i] = 0;
                          status[i] = 0;
                  }
                  ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                        MPOL_MF_MOVE_ALL);
                  if (ret == -1)
                          err("move_pages");
          }
          return 0;
  }

  $ cat hugepage.c
  #include &lt;stdio.h&gt;
  #include &lt;sys/mman.h&gt;
  #include &lt;string.h&gt;

  #define ADDR_INPUT      0x700000000000UL
  #define HPS             0x200000

  int main(int argc, char *argv[]) {
          int nr_hp = strtol(argv[1], NULL, 0);
          char *p;

          while (1) {
                  p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
                  if (p != (void *)ADDR_INPUT) {
                          perror("mmap");
                          break;
                  }
                  memset(p, 0, nr_hp * HPS);
                  munmap(p, nr_hp * HPS);
          }
  }

  $ sysctl vm.nr_hugepages=40
  $ ./hugepage 10 &amp;
  $ ./movepages 10 $(pgrep -f hugepage)

Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()")
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Reported-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Cc: Nishanth Aravamudan &lt;nacc@linux.vnet.ibm.com&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.12+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have a race condition between move_pages() and freeing hugepages, where
move_pages() calls follow_page(FOLL_GET) for hugepages internally and
tries to get its refcount without preventing concurrent freeing.  This
race crashes the kernel, so this patch fixes it by moving FOLL_GET code
for hugepages into follow_huge_pmd() with taking the page table lock.

This patch intentionally removes page==NULL check after pte_page.
This is justified because pte_page() never returns NULL for any
architectures or configurations.

This patch changes the behavior of follow_huge_pmd() for tail pages and
then tail pages can be pinned/returned.  So the caller must be changed to
properly handle the returned tail pages.

We could have a choice to add the similar locking to
follow_huge_(addr|pud) for consistency, but it's not necessary because
currently these functions don't support FOLL_GET flag, so let's leave it
for future development.

Here is the reproducer:

  $ cat movepages.c
  #include &lt;stdio.h&gt;
  #include &lt;stdlib.h&gt;
  #include &lt;numaif.h&gt;

  #define ADDR_INPUT      0x700000000000UL
  #define HPS             0x200000
  #define PS              0x1000

  int main(int argc, char *argv[]) {
          int i;
          int nr_hp = strtol(argv[1], NULL, 0);
          int nr_p  = nr_hp * HPS / PS;
          int ret;
          void **addrs;
          int *status;
          int *nodes;
          pid_t pid;

          pid = strtol(argv[2], NULL, 0);
          addrs  = malloc(sizeof(char *) * nr_p + 1);
          status = malloc(sizeof(char *) * nr_p + 1);
          nodes  = malloc(sizeof(char *) * nr_p + 1);

          while (1) {
                  for (i = 0; i &lt; nr_p; i++) {
                          addrs[i] = (void *)ADDR_INPUT + i * PS;
                          nodes[i] = 1;
                          status[i] = 0;
                  }
                  ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                        MPOL_MF_MOVE_ALL);
                  if (ret == -1)
                          err("move_pages");

                  for (i = 0; i &lt; nr_p; i++) {
                          addrs[i] = (void *)ADDR_INPUT + i * PS;
                          nodes[i] = 0;
                          status[i] = 0;
                  }
                  ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                        MPOL_MF_MOVE_ALL);
                  if (ret == -1)
                          err("move_pages");
          }
          return 0;
  }

  $ cat hugepage.c
  #include &lt;stdio.h&gt;
  #include &lt;sys/mman.h&gt;
  #include &lt;string.h&gt;

  #define ADDR_INPUT      0x700000000000UL
  #define HPS             0x200000

  int main(int argc, char *argv[]) {
          int nr_hp = strtol(argv[1], NULL, 0);
          char *p;

          while (1) {
                  p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
                  if (p != (void *)ADDR_INPUT) {
                          perror("mmap");
                          break;
                  }
                  memset(p, 0, nr_hp * HPS);
                  munmap(p, nr_hp * HPS);
          }
  }

  $ sysctl vm.nr_hugepages=40
  $ ./hugepage 10 &amp;
  $ ./movepages 10 $(pgrep -f hugepage)

Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()")
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Reported-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Cc: Nishanth Aravamudan &lt;nacc@linux.vnet.ibm.com&gt;
Cc: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.12+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: remove rest usage of VM_NONLINEAR and pte_file()</title>
<updated>2015-02-10T22:30:31+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-02-10T22:10:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0661a33611fca12570cba48d9344ce68834ee86c'/>
<id>0661a33611fca12570cba48d9344ce68834ee86c</id>
<content type='text'>
One bit in -&gt;vm_flags is unused now!

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
One bit in -&gt;vm_flags is unused now!

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels</title>
<updated>2014-06-04T23:53:55+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2014-06-04T23:06:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c46a7c817e662a820373bb76b88d0ad67d6abe5d'/>
<id>c46a7c817e662a820373bb76b88d0ad67d6abe5d</id>
<content type='text'>
_PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
faults on x86.  Care is taken such that _PAGE_NUMA is used only in
situations where the VMA flags distinguish between NUMA hinting faults
and prot_none faults.  This decision was x86-specific and conceptually
it is difficult requiring special casing to distinguish between PROTNONE
and NUMA ptes based on context.

Fundamentally, we only need the _PAGE_NUMA bit to tell the difference
between an entry that is really unmapped and a page that is protected
for NUMA hinting faults as if the PTE is not present then a fault will
be trapped.

Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset.
This patch shrinks the maximum possible swap size and uses the bit to
uniquely distinguish between NUMA hinting ptes and swap ptes.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Steven Noonan &lt;steven@uplinklabs.net&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
_PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
faults on x86.  Care is taken such that _PAGE_NUMA is used only in
situations where the VMA flags distinguish between NUMA hinting faults
and prot_none faults.  This decision was x86-specific and conceptually
it is difficult requiring special casing to distinguish between PROTNONE
and NUMA ptes based on context.

Fundamentally, we only need the _PAGE_NUMA bit to tell the difference
between an entry that is really unmapped and a page that is protected
for NUMA hinting faults as if the PTE is not present then a fault will
be trapped.

Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset.
This patch shrinks the maximum possible swap size and uses the bit to
uniquely distinguish between NUMA hinting ptes and swap ptes.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Steven Noonan &lt;steven@uplinklabs.net&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
