linux-toradex.git/include/linux/mmzone.h, branch v2.6.22.7

Move remote node draining out of slab allocators

2007-05-09T19:30:56+00:00

Currently the slab allocators contain callbacks into the page allocator to
perform the draining of pagesets on remote nodes.  This requires SLUB to have
a whole subsystem in order to be compatible with SLAB.  Moving node draining
out of the slab allocators avoids a section of code in SLUB.

Move the node draining so that is is done when the vm statistics are updated.
At that point we are already touching all the cachelines with the pagesets of
a processor.

Add a expire counter there.  If we have to update per zone or global vm
statistics then assume that the pageset will require subsequent draining.

The expire counter will be decremented on each vm stats update pass until it
reaches zero.  Then we will drain one batch from the pageset.  The draining
will cause vm counter updates which will then cause another expiration until
the pcp is empty.  So we will drain a batch every 3 seconds.

Note that remote node draining is a somewhat esoteric feature that is required
on large NUMA systems because otherwise significant portions of system memory
can become trapped in pcp queues.  The number of pcp is determined by the
number of processors and nodes in a system.  A system with 4 processors and 2
nodes has 8 pcps which is okay.  But a system with 1024 processors and 512
nodes has 512k pcps with a high potential for large amount of memory being
caught in them.

Signed-off-by: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

add pfn_valid_within helper for sub-MAX_ORDER hole detection

2007-05-07T19:12:52+00:00

Generally we work under the assumption that memory the mem_map array is
contigious and valid out to MAX_ORDER_NR_PAGES block of pages, ie.  that if we
have validated any page within this MAX_ORDER_NR_PAGES block we need not check
any other.  This is not true when CONFIG_HOLES_IN_ZONE is set and we must
check each and every reference we make from a pfn.

Add a pfn_valid_within() helper which should be used when scanning pages
within a MAX_ORDER_NR_PAGES block when we have already checked the validility
of the block normally with pfn_valid().  This can then be optimised away when
we do not have holes within a MAX_ORDER_NR_PAGES block of pages.

Signed-off-by: Andy Whitcroft 
Acked-by: Mel Gorman 
Acked-by: Bob Picco 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] optional ZONE_DMA: optional ZONE_DMA in the VM

2007-02-11T18:51:18+00:00

Make ZONE_DMA optional in core code.

- ifdef all code for ZONE_DMA and related definitions following the example
  for ZONE_DMA32 and ZONE_HIGHMEM.

- Without ZONE_DMA, ZONE_HIGHMEM and ZONE_DMA32 we get to a ZONES_SHIFT of
  0.

- Modify the VM statistics to work correctly without a DMA zone.

- Modify slab to not create DMA slabs if there is no ZONE_DMA.

[akpm@osdl.org: cleanup]
[jdike@addtoit.com: build fix]
[apw@shadowen.org: Simplify calculation of the number of bits we need for ZONES_SHIFT]
Signed-off-by: Christoph Lameter 
Cc: Andi Kleen 
Cc: "Luck, Tony" 
Cc: Kyle McMartin 
Cc: Matthew Wilcox 
Cc: James Bottomley 
Cc: Paul Mundt 
Signed-off-by: Andy Whitcroft 
Signed-off-by: Jeff Dike 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] Drop __get_zone_counts()

2007-02-11T18:51:18+00:00

Values are readily available via ZVC per node and global sums.

Signed-off-by: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] Reorder ZVCs according to cacheline

2007-02-11T18:51:17+00:00

The global and per zone counter sums are in arrays of longs.  Reorder the ZVCs
so that the most frequently used ZVCs are put into the same cacheline.  That
way calculations of the global, node and per zone vm state touches only a
single cacheline.  This is mostly important for 64 bit systems were one 128
byte cacheline takes only 8 longs.

Signed-off-by: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] Use ZVC for free_pages

2007-02-11T18:51:17+00:00

This is again simplifies some of the VM counter calculations through the use
of the ZVC consolidated counters.

[michal.k.k.piotrowski@gmail.com: build fix]
Signed-off-by: Christoph Lameter 
Signed-off-by: Michal Piotrowski 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] Use ZVC for inactive and active counts

2007-02-11T18:51:17+00:00

The determination of the dirty ratio to determine writeback behavior is
currently based on the number of total pages on the system.

However, not all pages in the system may be dirtied.  Thus the ratio is always
too low and can never reach 100%.  The ratio may be particularly skewed if
large hugepage allocations, slab allocations or device driver buffers make
large sections of memory not available anymore.  In that case we may get into
a situation in which f.e.  the background writeback ratio of 40% cannot be
reached anymore which leads to undesired writeback behavior.

This patchset fixes that issue by determining the ratio based on the actual
pages that may potentially be dirty.  These are the pages on the active and
the inactive list plus free pages.

The problem with those counts has so far been that it is expensive to
calculate these because counts from multiple nodes and multiple zones will
have to be summed up.  This patchset makes these counters ZVC counters.  This
means that a current sum per zone, per node and for the whole system is always
available via global variables and not expensive anymore to calculate.

The patchset results in some other good side effects:

- Removal of the various functions that sum up free, active and inactive
  page counts

- Cleanup of the functions that display information via the proc filesystem.

This patch:

The use of a ZVC for nr_inactive and nr_active allows a simplification of some
counter operations.  More ZVC functionality is used for sums etc in the
following patches.

[akpm@osdl.org: UP build fix]
Signed-off-by: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] Fix sparsemem on Cell

2007-01-12T02:18:20+00:00

Fix an oops experienced on the Cell architecture when init-time functions,
early_*(), are called at runtime.  It alters the call paths to make sure
that the callers explicitly say whether the call is being made on behalf of
a hotplug even, or happening at boot-time.

It has been compile tested on ppc64, ia64, s390, i386 and x86_64.

Acked-by: Arnd Bergmann 
Signed-off-by: Dave Hansen 
Cc: Yasunori Goto 
Acked-by: Andy Whitcroft 
Cc: Christoph Lameter 
Cc: Martin Schwidefsky 
Acked-by: Heiko Carstens 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] struct seq_operations and struct file_operations constification

2006-12-07T16:39:46+00:00

 - move some file_operations structs into the .rodata section

 - move static strings from policy_types[] array into the .rodata section

 - fix generic seq_operations usages, so that those structs may be defined
   as "const" as well

[akpm@osdl.org: couple of fixes]
Signed-off-by: Helge Deller 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] memory page_alloc zonelist caching reorder structure

2006-12-07T16:39:20+00:00

Rearrange the struct members in the 'struct zonelist_cache' structure, so
as to put the readonly (once initialized) z_to_n[] array first, where it
will come right after the zones[] array in struct zonelist.

This pretty much eliminates the chance that the two frequently written
elements of 'struct zonelist_cache', the fullzones bitmap and last_full_zap
times, will end up on the same cache line as the performance sensitive,
frequently read, never (after init) written zones[] array.

Keeping frequently written data off frequently read cache lines is good for
performance.

Thanks to Rohit Seth for the suggestion.

Signed-off-by: Paul Jackson 
Cc: Rohit Seth 
Cc: Paul Menage 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds