linux-toradex.git/mm/page_alloc.c, branch v2.6.27.50

page-allocator: preserve PFN ordering when __GFP_COLD is set

2009-08-16T21:26:35+00:00

commit e084b2d95e48b31aa45f9c49ffc6cdae8bdb21d4 upstream.

Fix a post-2.6.24 performace regression caused by
3dfa5721f12c3d5a441448086bee156887daa961 ("page-allocator: preserve PFN
ordering when __GFP_COLD is set").

Narayanan reports "The regression is around 15%.  There is no disk controller
as our setup is based on Samsung OneNAND used as a memory mapped device on a
OMAP2430 based board."

The page allocator tries to preserve contiguous PFN ordering when returning
pages such that repeated callers to the allocator have a strong chance of
getting physically contiguous pages, particularly when external fragmentation
is low.  However, of the bulk of the allocations have __GFP_COLD set as they
are due to aio_read() for example, then the PFNs are in reverse PFN order.
This can cause performance degration when used with IO controllers that could
have merged the requests.

This patch attempts to preserve the contiguous ordering of PFNs for users of
__GFP_COLD.

Signed-off-by: Mel Gorman 
Reported-by: Narayananu Gopalakrishnan 
Tested-by: Narayanan Gopalakrishnan 
Cc: KAMEZAWA Hiroyuki 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: fix handling of pagesets for downed cpus

2009-07-02T23:32:06+00:00

commit 364df0ebfbbb1330bfc6ca159f4d6020efc15a12 upstream.

After downing/upping a cpu, an attempt to set
/proc/sys/vm/percpu_pagelist_fraction results in an oops in
percpu_pagelist_fraction_sysctl_handler().

If a processor is downed then we need to set the pageset pointer back to
the boot pageset.

Updates of the high water marks should not access pagesets of unpopulated
zones (those pointer go to the boot pagesets which would be no longer
functional if their size would be increased beyond zero).

Signed-off-by: Dimitri Sivanich 
Signed-off-by: Christoph Lameter 
Reviewed-by: KOSAKI Motohiro 
Cc: Nick Piggin 
Cc: Mel Gorman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: fix memmap init for handling memory hole

2009-03-17T00:52:46+00:00

commit cc2559bccc72767cb446f79b071d96c30c26439b upstream.

Now, early_pfn_in_nid(PFN, NID) may returns false if PFN is a hole.
and memmap initialization was not done. This was a trouble for
sparc boot.

To fix this, the PFN should be initialized and marked as PG_reserved.
This patch changes early_pfn_in_nid() return true if PFN is a hole.

Signed-off-by: KAMEZAWA Hiroyuki 
Reported-by: David Miller 
Tested-by: KOSAKI Motohiro 
Cc: Mel Gorman 
Cc: Heiko Carstens 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: clean up for early_pfn_to_nid()

2009-03-17T00:52:45+00:00

commit f2dbcfa738368c8a40d4a5f0b65dc9879577cb21 upstream.

What's happening is that the assertion in mm/page_alloc.c:move_freepages()
is triggering:

	BUG_ON(page_zone(start_page) != page_zone(end_page));

Once I knew this is what was happening, I added some annotations:

	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
		printk(KERN_ERR "move_freepages: Bogus zones: "
		       "start_page[%p] end_page[%p] zone[%p]\n",
		       start_page, end_page, zone);
		printk(KERN_ERR "move_freepages: "
		       "start_zone[%p] end_zone[%p]\n",
		       page_zone(start_page), page_zone(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
		       page_to_pfn(start_page), page_to_pfn(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_nid[%d] end_nid[%d]\n",
		       page_to_nid(start_page), page_to_nid(end_page));
 ...

And here's what I got:

	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
	move_freepages: start_nid[1] end_nid[0]

My memory layout on this box is:

[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x00000000 -> 0x0081ff5d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x00020000
[    0.000000]     1: 0x00800000 -> 0x0081f7ff
[    0.000000]     1: 0x0081f800 -> 0x0081fe50
[    0.000000]     1: 0x0081fed1 -> 0x0081fed8
[    0.000000]     1: 0x0081feda -> 0x0081fedb
[    0.000000]     1: 0x0081fedd -> 0x0081fee5
[    0.000000]     1: 0x0081fee7 -> 0x0081ff51
[    0.000000]     1: 0x0081ff59 -> 0x0081ff5d

So it's a block move in that 0x81f600-->0x81f7ff region which triggers
the problem.

This patch:

Declaration of early_pfn_to_nid() is scattered over per-arch include
files, and it seems it's complicated to know when the declaration is used.
 I think it makes fix-for-memmap-init not easy.

This patch moves all declaration to include/linux/mm.h

After this,
  if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use static definition in include/linux/mm.h
  else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use generic definition in mm/page_alloc.c
  else
     -> per-arch back end function will be called.

Signed-off-by: KAMEZAWA Hiroyuki 
Tested-by: KOSAKI Motohiro 
Reported-by: David Miller 
Cc: Mel Gorman 
Cc: Heiko Carstens 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock

2008-12-18T17:13:41+00:00

commit 1125b4e3949949b44a7c80b619507c6f61d62911 upstream.

This replaces zone->lru_lock in setup_per_zone_pages_min() with zone->lock.
There seems to be no need for the lru_lock anymore, but there is a need for
zone->lock instead, because that function may call move_freepages() via
setup_zone_migrate_reserve().

Signed-off-by: Gerald Schaefer 
Acked-by: KAMEZAWA Hiroyuki 
Tested-by: Yasunori Goto 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

hugetlb: pull gigantic page initialisation out of the default path

2008-11-13T17:55:56+00:00

commit 18229df5b613ed0732a766fc37850de2e7988e43 upstream

As we can determine exactly when a gigantic page is in use we can optimise
the common regular page cases by pulling out gigantic page initialisation
into its own function.  As gigantic pages are never released to buddy we
do not need a destructor.  This effectivly reverts the previous change to
the main buddy allocator.  It also adds a paranoid check to ensure we
never release gigantic pages from hugetlbfs to the main buddy.

Signed-off-by: Andy Whitcroft 
Cc: Jon Tollefson 
Cc: Mel Gorman 
Cc: Nick Piggin 
Cc: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: handle initialising compound pages at orders greater than MAX_ORDER

2008-10-02T22:53:13+00:00

When we initialise a compound page we initialise the page flags and head
page pointer for all base pages spanned by that page.  When we initialise
a gigantic page (a page of order greater than or equal to MAX_ORDER) we
have to initialise more than MAX_ORDER_NR_PAGES pages.  Currently we
assume that all elements of the mem_map in this page are contigious in
memory.  However this is only guarenteed out to MAX_ORDER_NR_PAGES pages,
and with SPARSEMEM enabled they will not be contigious.  This leads us to
walk off the end of the first section and scribble on everything which
follows, BAD.

When we reach a MAX_ORDER_NR_PAGES boundary we much locate the next
section of the mem_map.  As gigantic pages can only be maximally aligned
we know this will occur at exact multiple of MAX_ORDER_NR_PAGES pages from
the start of the page.

This is a bug fix for the gigantic page support in hugetlbfs.

Credit to Mel Gorman for spotting the issue.

Signed-off-by: Andy Whitcroft 
Cc: Mel Gorman 
Cc: Jon Tollefson 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/bootmem: silence section mismatch warning - contig_page_data/bootmem_node_data

2008-09-03T02:21:37+00:00

WARNING: vmlinux.o(.data+0x1f5c0): Section mismatch in reference from the variable contig_page_data to the variable .init.data:bootmem_node_data
The variable contig_page_data references
the variable __initdata bootmem_node_data
If the reference is valid then annotate the
variable with __init* (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

Signed-off-by: Marcin Slusarz 
Cc: Johannes Weiner 
Cc: Sean MacLennan 
Cc: Sam Ravnborg 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: make setup_zone_migrate_reserve() aware of overlapping nodes

2008-09-03T02:21:37+00:00

I have gotten to the root cause of the hugetlb badness I reported back on
August 15th.  My system has the following memory topology (note the
overlapping node):

            Node 0 Memory: 0x8000000-0x44000000
            Node 1 Memory: 0x0-0x8000000 0x44000000-0x80000000

setup_zone_migrate_reserve() scans the address range 0x0-0x8000000 looking
for a pageblock to move onto the MIGRATE_RESERVE list.  Finding no
candidates, it happily continues the scan into 0x8000000-0x44000000.  When
a pageblock is found, the pages are moved to the MIGRATE_RESERVE list on
the wrong zone.  Oops.

setup_zone_migrate_reserve() should skip pageblocks in overlapping nodes.

Signed-off-by: Adam Litke 
Acked-by: Mel Gorman 
Cc: Dave Hansen 
Cc: Nishanth Aravamudan 
Cc: Andy Whitcroft 
Cc: 		[2.6.25.x, 2.6.26.x]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

page allocator: use no-panic variant of alloc_bootmem() in alloc_large_system_hash()

2008-08-12T23:07:27+00:00

..  since a failed allocation is being (initially) handled gracefully, and
panic()-ed upon failure explicitly in the function if retries with smaller
sizes failed.

Signed-off-by: Jan Beulich 
Signed-off-by: David Howells 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds