linux-toradex.git/mm, branch v2.6.35.8

mm: Move vma_stack_continue into mm.h

2010-10-29T04:51:47+00:00

commit 39aa3cb3e8250db9188a6f1e3fb62ffa1a717678 upstream.

So it can be used by all that need to check for that.

Signed-off-by: Stefan Bader 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

ksm: fix bad user data when swapping

2010-10-29T04:51:17+00:00

commit 4e31635c367a9e21a43cfbfae4c9deda2e19d1f4 upstream.

Building under memory pressure, with KSM on 2.6.36-rc5, collapsed with
an internal compiler error: typically indicating an error in swapping.

Perhaps there's a timing issue which makes it now more likely, perhaps
it's just a long time since I tried for so long: this bug goes back to
KSM swapping in 2.6.33.

Notice how reuse_swap_page() allows an exclusive page to be reused, but
only does SetPageDirty if it can delete it from swap cache right then -
if it's currently under Writeback, it has to be left in cache and we
don't SetPageDirty, but the page can be reused.  Fine, the dirty bit
will get set in the pte; but notice how zap_pte_range() does not bother
to transfer pte_dirty to page_dirty when unmapping a PageAnon.

If KSM chooses to share such a page, it will look like a clean copy of
swapcache, and not be written out to swap when its memory is needed;
then stale data read back from swap when it's needed again.

We could fix this in reuse_swap_page() (or even refuse to reuse a
page under writeback), but it's more honest to fix my oversight in
KSM's write_protect_page().  Several days of testing on three machines
confirms that this fixes the issue they showed.

Signed-off-by: Hugh Dickins 
Cc: Andrew Morton 
Cc: Andrea Arcangeli 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

guard page for stacks that grow upwards

2010-09-27T00:18:41+00:00

commit 8ca3eb08097f6839b2206e2242db4179aee3cfb3 upstream.

pa-risc and ia64 have stacks that grow upwards. Check that
they do not run into other mappings. By making VM_GROWSUP
0x0 on architectures that do not ever use it, we can avoid
some unpleasant #ifdefs in check_stack_guard_page().

Signed-off-by: Tony Luck 
Signed-off-by: Linus Torvalds 
Cc: dann frazier 
Signed-off-by: Greg Kroah-Hartman

mm: page allocator: update free page counters after pages are placed on the free list

2010-09-27T00:18:41+00:00

commit 72853e2991a2702ae93aaf889ac7db743a415dd3 upstream.

When allocating a page, the system uses NR_FREE_PAGES counters to
determine if watermarks would remain intact after the allocation was made.
This check is made without interrupts disabled or the zone lock held and
so is race-prone by nature.  Unfortunately, when pages are being freed in
batch, the counters are updated before the pages are added on the list.
During this window, the counters are misleading as the pages do not exist
yet.  When under significant pressure on systems with large numbers of
CPUs, it's possible for processes to make progress even though they should
have been stalled.  This is particularly problematic if a number of the
processes are using GFP_ATOMIC as the min watermark can be accidentally
breached and in extreme cases, the system can livelock.

This patch updates the counters after the pages have been added to the
list.  This makes the allocator more cautious with respect to preserving
the watermarks and mitigates livelock possibilities.

[akpm@linux-foundation.org: avoid modifying incoming args]
Signed-off-by: Mel Gorman 
Reviewed-by: Rik van Riel 
Reviewed-by: Minchan Kim 
Reviewed-by: KAMEZAWA Hiroyuki 
Reviewed-by: Christoph Lameter 
Reviewed-by: KOSAKI Motohiro 
Acked-by: Johannes Weiner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake

2010-09-27T00:18:40+00:00

commit aa45484031ddee09b06350ab8528bfe5b2c76d1c upstream.

Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is
cheaper than scanning a number of lists.  To avoid synchronization
overhead, counter deltas are maintained on a per-cpu basis and drained
both periodically and when the delta is above a threshold.  On large CPU
systems, the difference between the estimated and real value of
NR_FREE_PAGES can be very high.  If NR_FREE_PAGES is much higher than
number of real free page in buddy, the VM can allocate pages below min
watermark, at worst reducing the real number of pages to zero.  Even if
the OOM killer kills some victim for freeing memory, it may not free
memory if the exit path requires a new page resulting in livelock.

This patch introduces a zone_page_state_snapshot() function (courtesy of
Christoph) that takes a slightly more accurate view of an arbitrary vmstat
counter.  It is used to read NR_FREE_PAGES while kswapd is awake to avoid
the watermark being accidentally broken.  The estimate is not perfect and
may result in cache line bounces but is expected to be lighter than the
IPI calls necessary to continually drain the per-cpu counters while kswapd
is awake.

Signed-off-by: Christoph Lameter 
Signed-off-by: Mel Gorman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: page allocator: drain per-cpu lists after direct reclaim allocation fails

2010-09-27T00:18:40+00:00

commit 9ee493ce0a60bf42c0f8fd0b0fe91df5704a1cbf upstream.

When under significant memory pressure, a process enters direct reclaim
and immediately afterwards tries to allocate a page.  If it fails and no
further progress is made, it's possible the system will go OOM.  However,
on systems with large amounts of memory, it's possible that a significant
number of pages are on per-cpu lists and inaccessible to the calling
process.  This leads to a process entering direct reclaim more often than
it should increasing the pressure on the system and compounding the
problem.

This patch notes that if direct reclaim is making progress but allocations
are still failing that the system is already under heavy pressure.  In
this case, it drains the per-cpu lists and tries the allocation a second
time before continuing.

Signed-off-by: Mel Gorman 
Reviewed-by: Minchan Kim 
Reviewed-by: KAMEZAWA Hiroyuki 
Reviewed-by: KOSAKI Motohiro 
Reviewed-by: Christoph Lameter 
Cc: Dave Chinner 
Cc: Wu Fengguang 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

percpu: fix pcpu_last_unit_cpu

2010-09-27T00:18:29+00:00

commit 46b30ea9bc3698bc1d1e6fd726c9601d46fa0a91 upstream.

pcpu_first/last_unit_cpu are used to track which cpu has the first and
last units assigned.  This in turn is used to determine the span of a
chunk for man/unmap cache flushes and whether an address belongs to
the first chunk or not in per_cpu_ptr_to_phys().

When the number of possible CPUs isn't power of two, a chunk may
contain unassigned units towards the end of a chunk.  The logic to
determine pcpu_last_unit_cpu was incorrect when there was an unused
unit at the end of a chunk.  It failed to ignore the unused unit and
assigned the unused marker NR_CPUS to pcpu_last_unit_cpu.

This was discovered through kdump failure which was caused by
malfunctioning per_cpu_ptr_to_phys() on a kvm setup with 50 possible
CPUs by CAI Qian.

Signed-off-by: Tejun Heo 
Reported-by: CAI Qian 
Signed-off-by: Greg Kroah-Hartman

vmscan: check all_unreclaimable in direct reclaim path

2010-09-27T00:18:29+00:00

commit d1908362ae0b97374eb8328fbb471576332f9fb1 upstream.

M.  Vefa Bicakci reported 2.6.35 kernel hang up when hibernation on his
32bit 3GB mem machine.
(https://bugzilla.kernel.org/show_bug.cgi?id=16771). Also he bisected
the regression to

  commit bb21c7ce18eff8e6e7877ca1d06c6db719376e3c
  Author: KOSAKI Motohiro 
  Date:   Fri Jun 4 14:15:05 2010 -0700

     vmscan: fix do_try_to_free_pages() return value when priority==0 reclaim failure

At first impression, this seemed very strange because the above commit
only chenged function return value and hibernate_preallocate_memory()
ignore return value of shrink_all_memory().  But it's related.

Now, page allocation from hibernation code may enter infinite loop if the
system has highmem.  The reasons are that vmscan don't care enough OOM
case when oom_killer_disabled.

The problem sequence is following as.

1. hibernation
2. oom_disable
3. alloc_pages
4. do_try_to_free_pages
       if (scanning_global_lru(sc) && !all_unreclaimable)
               return 1;

If kswapd is not freozen, it would set zone->all_unreclaimable to 1 and
then shrink_zones maybe return true(ie, all_unreclaimable is true).  So at
last, alloc_pages could go to _nopage_.  If it is, it should have no
problem.

This patch adds all_unreclaimable check to protect in direct reclaim path,
too.  It can care of hibernation OOM case and help bailout
all_unreclaimable case slightly.

Signed-off-by: KOSAKI Motohiro 
Signed-off-by: Minchan Kim 
Reported-by: M. Vefa Bicakci 
Reported-by: 
Reviewed-by: Johannes Weiner 
Tested-by: 
Acked-by: Rafael J. Wysocki 
Acked-by: Rik van Riel 
Acked-by: KAMEZAWA Hiroyuki 
Cc: Balbir Singh 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mmap: call unlink_anon_vmas() in __split_vma() in case of error

2010-09-27T00:18:25+00:00

commit 2aeadc30de45a72648f271603203ab392b80f607 upstream.

If __split_vma fails because of an out of memory condition the
anon_vma_chain isn't teardown and freed potentially leading to rmap walks
accessing freed vma information plus there's a memleak.

Signed-off-by: Andrea Arcangeli 
Acked-by: Johannes Weiner 
Acked-by: Rik van Riel 
Acked-by: Hugh Dickins 
Cc: Marcelo Tosatti 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

bdi: Initialize noop_backing_dev_info properly

2010-09-27T00:18:24+00:00

commit 976e48f8a5b02fc33f3e5cad87fb3fcea041a49c upstream.

Properly initialize this backing dev info so that writeback code does not
barf when getting to it e.g. via sb->s_bdi.

Signed-off-by: Jan Kara 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman