linux-toradex.git/mm/sparse-vmemmap.c, branch v5.16

mm: remove redundant smp_wmb()

2021-11-06T20:30:36+00:00

The smp_wmb() which is in the __pte_alloc() is used to ensure all ptes
setup is visible before the pte is made visible to other CPUs by being
put into page tables.  We only need this when the pte is actually
populated, so move it to pmd_install().  __pte_alloc_kernel(),
__p4d_alloc(), __pud_alloc() and __pmd_alloc() are similar to this case.
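
A minimal userspace sketch of the ordering described above, using C11
atomics as an analogy rather than kernel code.  The names struct table,
slot, install() and lookup() are hypothetical.  The release fence stands in
for smp_wmb(), and callers of install() are assumed to be serialised (the
kernel holds the page table lock at this point).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ENTRIES 8

/* Hypothetical stand-in for a freshly allocated, zeroed pte page. */
struct table {
	unsigned long entry[NR_ENTRIES];
};

/* Hypothetical stand-in for the pmd entry that points at the pte page. */
static _Atomic(struct table *) slot;

/*
 * Publish a fully initialised table.  The barrier is only paid when the
 * slot is really populated, which is the point of moving it out of the
 * allocation path.
 */
static bool install(struct table *t)
{
	assert(t != NULL);

	if (atomic_load_explicit(&slot, memory_order_relaxed) != NULL)
		return false;	/* already populated: no barrier needed */

	/* Order the table initialisation before the publishing store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&slot, t, memory_order_relaxed);
	return true;
}

/* Reader side: observe the slot, then the table contents behind it. */
static struct table *lookup(void)
{
	return atomic_load_explicit(&slot, memory_order_acquire);
}
```

A second install() on a populated slot returns false without issuing the
fence, mirroring the case where another thread has already populated the
entry.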

We can also defer smp_wmb() to the place where the pmd entry is really
populated by a preallocated pte.  There are two kinds of users of a
preallocated pte: one is filemap & finish_fault(), the other is THP.  The
former does not need another smp_wmb() because the smp_wmb() has been
done by pmd_install().  Fortunately, the latter also does not need
another smp_wmb() because there is already a smp_wmb() before populating
the new pte when the THP uses a preallocated pte to split a huge pmd.

Link: https://lkml.kernel.org/r/20210901102722.47686-3-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng 
Reviewed-by: Muchun Song 
Acked-by: David Hildenbrand 
Acked-by: Kirill A. Shutemov 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Mika Penttila 
Cc: Thomas Gleixner 
Cc: Vladimir Davydov 
Cc: Vlastimil Babka 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: sparsemem: split the huge PMD mapping of vmemmap pages

2021-07-01T03:47:26+00:00

Patch series "Split huge PMD mapping of vmemmap pages", v4.

To reduce the difficulty of code review in series [1], we disabled huge
PMD mapping of vmemmap pages when that feature was enabled.  In this
series, we no longer disable huge PMD mapping of vmemmap pages; we split
the huge PMD mapping when needed.  When HugeTLB pages are freed from the
pool, we do not attempt to coalesce and move back to a PMD mapping
because it is much more complex.

[1] https://lore.kernel.org/linux-doc/20210510030027.56044-1-songmuchun@bytedance.com/

This patch (of 3):

In [1], PMD mappings of vmemmap pages were disabled if the feature
hugetlb_free_vmemmap was enabled.  This was done to simplify the initial
implementation of vmemmap freeing for hugetlb pages.  Now, remove this
simplification by allowing PMD mapping and switching to PTE mappings as
needed for allocated hugetlb pages.

When a hugetlb page is allocated, the vmemmap page tables are walked to
free vmemmap pages.  During this walk, split huge PMD mappings to PTE
mappings as required.  In the unlikely case that PTE pages cannot be
allocated, return an error (ENOMEM) and do not optimize the vmemmap of the
hugetlb page.

When HugeTLB pages are freed from the pool, we do not attempt to
coalesce and move back to a PMD mapping because it is much more complex.

[1] https://lkml.kernel.org/r/20210510030027.56044-8-songmuchun@bytedance.com

Link: https://lkml.kernel.org/r/20210616094915.34432-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20210616094915.34432-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song 
Reviewed-by: Mike Kravetz 
Cc: Oscar Salvador 
Cc: Michal Hocko 
Cc: David Hildenbrand 
Cc: Chen Huang 
Cc: Jonathan Corbet 
Cc: Xiongchun Duan 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

2021-07-01T03:47:25+00:00

When we free a HugeTLB page to the buddy allocator, we need to allocate
the vmemmap pages associated with it.  However, we may not be able to
allocate the vmemmap pages when the system is under memory pressure.  In
this case, we just refuse to free the HugeTLB page.  This changes behavior
in some corner cases as listed below:

 1) Failing to free a huge page triggered by the user (decrease nr_pages).

    User needs to try again later.

 2) Failing to free a surplus huge page when freed by the application.

    Try again later when freeing a huge page next time.

 3) Failing to dissolve a free huge page on ZONE_MOVABLE via
    offline_pages().

    This can happen when we have plenty of ZONE_MOVABLE memory, but
    not enough kernel memory to allocate vmemmap pages.  We may even
    be able to migrate huge page contents, but will not be able to
    dissolve the source huge page.  This will prevent an offline
    operation and is unfortunate as memory offlining is expected to
    succeed on movable zones.  Users that depend on memory hotplug
    to succeed for movable zones should carefully consider whether the
    memory savings gained from this feature are worth the risk of
    possibly not being able to offline memory in certain situations.

 4) Failing to dissolve a huge page on CMA/ZONE_MOVABLE via
    alloc_contig_range() - once we have that handling in place. Mainly
    affects CMA and virtio-mem.

    Similar to 3). virtio-mem will handle migration errors gracefully.
    CMA might be able to fall back on other free areas within the CMA
    region.

Vmemmap pages are allocated from the page freeing context.  To keep those
allocations from being disruptive (e.g. triggering the OOM killer),
__GFP_NORETRY is used.  hugetlb_lock is dropped for the allocation because
a non-sleeping allocation would be too fragile and could fail too easily
under memory pressure.  GFP_ATOMIC and other modes that access memory
reserves are not used because we want to avoid consuming reserves under
heavy hugetlb freeing.

[mike.kravetz@oracle.com: fix dissolve_free_huge_page use of tail/head page]
  Link: https://lkml.kernel.org/r/20210527231225.226987-1-mike.kravetz@oracle.com
[willy@infradead.org: fix alloc_vmemmap_page_list documentation warning]
  Link: https://lkml.kernel.org/r/20210615200242.1716568-6-willy@infradead.org

Link: https://lkml.kernel.org/r/20210510030027.56044-7-songmuchun@bytedance.com
Signed-off-by: Muchun Song 
Signed-off-by: Mike Kravetz 
Signed-off-by: Matthew Wilcox (Oracle) 
Reviewed-by: Mike Kravetz 
Reviewed-by: Oscar Salvador 
Cc: Alexander Viro 
Cc: Andy Lutomirski 
Cc: Anshuman Khandual 
Cc: Balbir Singh 
Cc: Barry Song 
Cc: Bodeddula Balasubramaniam 
Cc: Borislav Petkov 
Cc: Chen Huang 
Cc: Dave Hansen 
Cc: David Hildenbrand 
Cc: David Rientjes 
Cc: HORIGUCHI NAOYA 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Joao Martins 
Cc: Joerg Roedel 
Cc: Jonathan Corbet 
Cc: Matthew Wilcox 
Cc: Miaohe Lin 
Cc: Michal Hocko 
Cc: Mina Almasry 
Cc: Oliver Neukum 
Cc: Paul E. McKenney 
Cc: Pawan Gupta 
Cc: Peter Zijlstra 
Cc: Randy Dunlap 
Cc: Thomas Gleixner 
Cc: Xiongchun Duan 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: hugetlb: free the vmemmap pages associated with each HugeTLB page

2021-07-01T03:47:25+00:00

Every HugeTLB has more than one struct page structure.  We __know__ that
we only use the first 4 (__NR_USED_SUBPAGE) struct page structures to
store metadata associated with each HugeTLB.

There are a lot of struct page structures associated with each HugeTLB
page.  For tail pages, the value of compound_head is the same, so we can
reuse the first page of tail page structures.  We map the virtual
addresses of the remaining pages of tail page structures to the first tail
page struct, and then free these page frames.  Therefore, we need to
reserve two pages as vmemmap areas.
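
As a worked example of the savings, the sketch below counts the vmemmap
pages behind one HugeTLB page and how many remain after reserving two.  It
is illustrative arithmetic only: the 4 KiB base page size and the 64-byte
struct page are assumptions (both vary by architecture and config), and the
helper names are hypothetical, not the kernel's.

```c
#include <assert.h>

#define PAGE_SIZE		4096UL	/* assumption: 4 KiB base pages */
#define STRUCT_PAGE_SIZE	64UL	/* assumption: sizeof(struct page) */
#define RESERVED_VMEMMAP_PAGES	2UL	/* from the text: two pages are kept */

/* Number of base pages holding the struct pages of one huge page. */
static unsigned long vmemmap_pages_per_hpage(unsigned long hpage_size)
{
	unsigned long nr_struct_pages;

	assert(hpage_size % PAGE_SIZE == 0);
	nr_struct_pages = hpage_size / PAGE_SIZE;
	return nr_struct_pages * STRUCT_PAGE_SIZE / PAGE_SIZE;
}

/* Number of those pages that could be freed after the reservation. */
static unsigned long freeable_vmemmap_pages(unsigned long hpage_size)
{
	unsigned long total = vmemmap_pages_per_hpage(hpage_size);

	return total > RESERVED_VMEMMAP_PAGES ?
	       total - RESERVED_VMEMMAP_PAGES : 0;
}
```

Under these assumptions a 2 MiB HugeTLB page has 512 struct pages
occupying 8 vmemmap pages, of which 6 could be freed; a 1 GiB page has
4096 vmemmap pages, of which 4094 could be freed.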

When we allocate a HugeTLB page from the buddy, we can free some vmemmap
pages associated with each HugeTLB page.  It is more appropriate to do it
in the prep_new_huge_page().

The free_vmemmap_pages_per_hpage(), which indicates how many vmemmap pages
associated with a HugeTLB page can be freed, returns zero for now, which
means the feature is disabled.  We will enable it once all the
infrastructure is there.

[willy@infradead.org: fix documentation warning]
  Link: https://lkml.kernel.org/r/20210615200242.1716568-5-willy@infradead.org

Link: https://lkml.kernel.org/r/20210510030027.56044-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song 
Signed-off-by: Matthew Wilcox (Oracle) 
Reviewed-by: Oscar Salvador 
Tested-by: Chen Huang 
Tested-by: Bodeddula Balasubramaniam 
Acked-by: Michal Hocko 
Reviewed-by: Mike Kravetz 
Cc: Alexander Viro 
Cc: Andy Lutomirski 
Cc: Anshuman Khandual 
Cc: Balbir Singh 
Cc: Barry Song 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: David Hildenbrand 
Cc: David Rientjes 
Cc: HORIGUCHI NAOYA 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Joao Martins 
Cc: Joerg Roedel 
Cc: Jonathan Corbet 
Cc: Matthew Wilcox 
Cc: Miaohe Lin 
Cc: Mina Almasry 
Cc: Oliver Neukum 
Cc: Paul E. McKenney 
Cc: Pawan Gupta 
Cc: Peter Zijlstra 
Cc: Randy Dunlap 
Cc: Thomas Gleixner 
Cc: Xiongchun Duan 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/sparse: only sub-section aligned range would be populated

2020-08-07T18:33:27+00:00

There are two code paths that invoke __populate_section_memmap():

  * sparse_init_nid()
  * sparse_add_section()

In both cases, we are sure the memory range is sub-section aligned:

  * we pass PAGES_PER_SECTION to sparse_init_nid()
  * we check the range with check_pfn_span() before calling
    sparse_add_section()

Also, in the counterpart of __populate_section_memmap(), we do no such
calculation and check, since the range is checked by check_pfn_span() in
__remove_pages().

Remove the calculation and check to keep it simple and consistent with its
counterpart.
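
A minimal sketch of the kind of alignment test that callers rely on here,
written as standalone C.  It assumes 2 MiB sub-sections and 4 KiB pages
(so 512 pfns per sub-section); the helper name is hypothetical and this is
not the kernel's check_pfn_span() itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 2 MiB sub-sections with 4 KiB pages. */
#define PAGES_PER_SUBSECTION	512UL

static_assert((PAGES_PER_SUBSECTION & (PAGES_PER_SUBSECTION - 1)) == 0,
	      "sub-section size must be a power of two");

/* True if both the start pfn and the length are sub-section aligned. */
static bool pfn_span_is_subsection_aligned(unsigned long pfn,
					   unsigned long nr_pages)
{
	return ((pfn | nr_pages) & (PAGES_PER_SUBSECTION - 1)) == 0;
}
```

Once every caller has passed such a check, the populate path has no need
to round the range again.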

Signed-off-by: Wei Yang 
Signed-off-by: Andrew Morton 
Acked-by: David Hildenbrand 
Link: http://lkml.kernel.org/r/20200703031828.14645-1-richard.weiyang@linux.alibaba.com
Signed-off-by: Linus Torvalds

mm/sparsemem: enable vmem_altmap support in vmemmap_alloc_block_buf()

2020-08-07T18:33:27+00:00

There are many instances where vmemmap allocation is switched between
regular memory and device memory just based on whether altmap is available
or not.  vmemmap_alloc_block_buf() is used on various platforms to
allocate vmemmap mappings.  Let's also enable it to handle altmap based
device memory allocation along with existing regular memory allocations.
This will help in avoiding the altmap based allocation switch in many
places.  To summarize, there are two different methods to call
vmemmap_alloc_block_buf().

vmemmap_alloc_block_buf(size, node, NULL)   /* Allocate from system RAM */
vmemmap_alloc_block_buf(size, node, altmap) /* Allocate from altmap */

This converts altmap_alloc_block_buf() into a static function, drops its
entry from the header and updates Documentation/vm/memory-model.rst.
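
The dispatch on a NULL versus non-NULL altmap can be sketched in userspace
C as below.  Everything here is a hypothetical model: struct demo_altmap
is a simple bump allocator over a caller-supplied buffer, malloc() stands
in for system RAM, and none of the names are kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a device-provided region to carve memmap from. */
struct demo_altmap {
	unsigned char *base;	/* start of the reserved region */
	size_t size;		/* total bytes available */
	size_t used;		/* bytes already handed out */
};

/* Internal helper: bump-allocate from the altmap region, NULL when full. */
static void *demo_altmap_alloc(size_t size, struct demo_altmap *altmap)
{
	void *p;

	assert(altmap->used <= altmap->size);
	if (altmap->size - altmap->used < size)
		return NULL;
	p = altmap->base + altmap->used;
	altmap->used += size;
	return p;
}

/* Single entry point: the altmap argument alone selects the backing store. */
static void *demo_alloc_block_buf(size_t size, struct demo_altmap *altmap)
{
	if (altmap)
		return demo_altmap_alloc(size, altmap);
	return malloc(size);	/* stand-in for allocating from system RAM */
}
```

Keeping the altmap helper static and exposing one entry point means
callers no longer need their own "if altmap" branch.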

Suggested-by: Robin Murphy 
Signed-off-by: Anshuman Khandual 
Signed-off-by: Andrew Morton 
Tested-by: Jia He 
Reviewed-by: Catalin Marinas 
Cc: Jonathan Corbet 
Cc: Will Deacon 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Dan Williams 
Cc: David Hildenbrand 
Cc: Fenghua Yu 
Cc: Hsin-Yi Wang 
Cc: "Kirill A. Shutemov" 
Cc: Mark Rutland 
Cc: "Matthew Wilcox (Oracle)" 
Cc: Michal Hocko 
Cc: Mike Rapoport 
Cc: Palmer Dabbelt 
Cc: Paul Walmsley 
Cc: Pavel Tatashin 
Cc: Steve Capper 
Cc: Tony Luck 
Cc: Yu Zhao 
Link: http://lkml.kernel.org/r/1594004178-8861-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds

mm/sparsemem: enable vmem_altmap support in vmemmap_populate_basepages()

2020-08-07T18:33:27+00:00

Patch series "arm64: Enable vmemmap mapping from device memory", v4.

This series enables vmemmap backing memory allocation from device memory
ranges on arm64.  But before that, it enables vmemmap_populate_basepages()
and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
allocation requests.

This patch (of 3):

vmemmap_populate_basepages() is used across platforms to allocate backing
memory for vmemmap mapping.  This is used as a standard default choice or
as a fallback when intended huge pages allocation fails.  This just
creates entire vmemmap mapping with base pages (PAGE_SIZE).

On arm64 platforms, vmemmap_populate_basepages() is called instead of the
platform specific vmemmap_populate() when ARM64_SWAPPER_USES_SECTION_MAPS
is not enabled, as is the case for ARM64_16K_PAGES and ARM64_64K_PAGES
configs.

At present vmemmap_populate_basepages() does not support allocating from
driver defined struct vmem_altmap while trying to create vmemmap mapping
for a device memory range.  It prevents ARM64_16K_PAGES and
ARM64_64K_PAGES configs on arm64 from supporting device memory with
vmem_altmap requests.

This enables vmem_altmap support in vmemmap_populate_basepages(), unlocking
device memory allocation for vmemmap mapping on arm64 platforms with 16K or
64K base page configs.

Each architecture should evaluate and decide on subscribing to device
memory based base page allocation through vmemmap_populate_basepages().
Hence, let's keep it disabled on all archs in order to preserve the
existing semantics.  A subsequent patch enables it on arm64.

Signed-off-by: Anshuman Khandual 
Signed-off-by: Andrew Morton 
Tested-by: Jia He 
Reviewed-by: David Hildenbrand 
Acked-by: Will Deacon 
Acked-by: Catalin Marinas 
Cc: Mark Rutland 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Mike Rapoport 
Cc: Michal Hocko 
Cc: "Matthew Wilcox (Oracle)" 
Cc: "Kirill A. Shutemov" 
Cc: Dan Williams 
Cc: Pavel Tatashin 
Cc: Benjamin Herrenschmidt 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Hsin-Yi Wang 
Cc: Jonathan Corbet 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Robin Murphy 
Cc: Steve Capper 
Cc: Yu Zhao 
Link: http://lkml.kernel.org/r/1594004178-8861-1-git-send-email-anshuman.khandual@arm.com
Link: http://lkml.kernel.org/r/1594004178-8861-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds

mm: don't include asm/pgtable.h if linux/mm.h is already included

2020-06-09T16:39:13+00:00

Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once.  For
instance, we have 31 definitions of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

static inline unsigned long pmd_index(unsigned long address)
{
        return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
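
To make the index computation concrete, here is a standalone sketch that
can be compiled in userspace.  The constants are assumptions matching a
common 4 KiB page, 512-entry layout (PMD_SHIFT of 21); pmd_t is reduced to
a plain integer and pmd_offset() takes the table directly instead of a
pud_t, so this is a simplified model and not the generic kernel header.

```c
#include <assert.h>

/* Assumed geometry: each pmd entry covers 2 MiB, 512 entries per table. */
#define PMD_SHIFT	21
#define PTRS_PER_PMD	512UL

static_assert((PTRS_PER_PMD & (PTRS_PER_PMD - 1)) == 0,
	      "the mask below requires a power-of-two table size");

typedef unsigned long pmd_t;	/* simplified stand-in for the real type */

/* Select the bits of the address that index into the pmd table. */
static inline unsigned long pmd_index(unsigned long address)
{
	return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

/* Simplified: index into a table passed in by the caller. */
static inline pmd_t *pmd_offset(pmd_t *table, unsigned long address)
{
	return table + pmd_index(address);
}
```

With these assumed constants, address 0x40200000 shifts down to 513 and
masks to index 1, so pmd_offset() returns the second entry of the table.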

For architectures that really need a custom version there is always the
possibility to override the generic version with the usual ifdef magic.

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g.  pte_alloc() and
pmd_alloc().  So, there is no point in explicitly including <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are removed with a simple loop:

	for f in $(git grep -l "include <linux/mm.h>") ; do
		sed -i -e '/include <asm\/pgtable.h>/ d' $f
	done

Signed-off-by: Mike Rapoport 
Signed-off-by: Andrew Morton 
Cc: Arnd Bergmann 
Cc: Borislav Petkov 
Cc: Brian Cain 
Cc: Catalin Marinas 
Cc: Chris Zankel 
Cc: "David S. Miller" 
Cc: Geert Uytterhoeven 
Cc: Greentime Hu 
Cc: Greg Ungerer 
Cc: Guan Xuetao 
Cc: Guo Ren 
Cc: Heiko Carstens 
Cc: Helge Deller 
Cc: Ingo Molnar 
Cc: Ley Foon Tan 
Cc: Mark Salter 
Cc: Matthew Wilcox 
Cc: Matt Turner 
Cc: Max Filippov 
Cc: Michael Ellerman 
Cc: Michal Simek 
Cc: Mike Rapoport 
Cc: Nick Hu 
Cc: Paul Walmsley 
Cc: Richard Weinberger 
Cc: Rich Felker 
Cc: Russell King 
Cc: Stafford Horne 
Cc: Thomas Bogendoerfer 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: Vincent Chen 
Cc: Vineet Gupta 
Cc: Will Deacon 
Cc: Yoshinori Sato 
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds

mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()

2019-07-19T00:08:07+00:00

Allow sub-section sized ranges to be added to the memmap.

populate_section_memmap() takes an explicit pfn range rather than
assuming a full section, and those parameters are plumbed all the way
through to vmemmap_populate().  There should be no sub-section usage in
current deployments.  New warnings are added to clarify which memmap
allocation paths are sub-section capable.

Link: http://lkml.kernel.org/r/156092352058.979959.6551283472062305149.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams 
Reviewed-by: Pavel Tatashin 
Tested-by: Aneesh Kumar K.V 	[ppc64]
Reviewed-by: Oscar Salvador 
Cc: Michal Hocko 
Cc: David Hildenbrand 
Cc: Logan Gunthorpe 
Cc: Jane Chu 
Cc: Jeff Moyer 
Cc: Jérôme Glisse 
Cc: Jonathan Corbet 
Cc: Mike Rapoport 
Cc: Toshi Kani 
Cc: Vlastimil Babka 
Cc: Wei Yang 
Cc: Jason Gunthorpe 
Cc: Christoph Hellwig 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: remove include/linux/bootmem.h

2018-10-31T15:54:16+00:00

Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.

The includes were replaced with the semantic patch below, followed by
semi-automated removal of duplicated '#include <linux/memblock.h>':

@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>

[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
  Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport 
Signed-off-by: Stephen Rothwell 
Acked-by: Michal Hocko 
Cc: Catalin Marinas 
Cc: Chris Zankel 
Cc: "David S. Miller" 
Cc: Geert Uytterhoeven 
Cc: Greentime Hu 
Cc: Greg Kroah-Hartman 
Cc: Guan Xuetao 
Cc: Ingo Molnar 
Cc: "James E.J. Bottomley" 
Cc: Jonas Bonn 
Cc: Jonathan Corbet 
Cc: Ley Foon Tan 
Cc: Mark Salter 
Cc: Martin Schwidefsky 
Cc: Matt Turner 
Cc: Michael Ellerman 
Cc: Michal Simek 
Cc: Palmer Dabbelt 
Cc: Paul Burton 
Cc: Richard Kuo 
Cc: Richard Weinberger 
Cc: Rich Felker 
Cc: Russell King 
Cc: Serge Semin 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: Vineet Gupta 
Cc: Yoshinori Sato 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds