linux-toradex.git/mm, branch v2.6.16-rc4

[PATCH] sys_mbind sanity checking

2006-02-17T22:09:22+00:00

Make sure maxnodes is a safe size before calculating nlongs in
get_nodes().
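
A minimal userspace sketch of the kind of check described above, not the
kernel code itself: the bit count is bounded before it is rounded up to a
number of longs, so the rounding cannot overflow. The names sketch_nlongs
and SKETCH_PAGE_SIZE, and the one-page bound, are assumptions made for
illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE     4096UL  /* assumed page size, illustration only */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: validate a user-supplied bit count, then compute
 * how many unsigned longs are needed to hold that many bits.
 * Returns 0 on success and -EINVAL if the count is unreasonably large.
 */
int sketch_nlongs(unsigned long maxnode, unsigned long *nlongs_out)
{
    /* Reject first: without this, the round-up below can wrap around. */
    if (maxnode > SKETCH_PAGE_SIZE * CHAR_BIT)
        return -EINVAL;

    *nlongs_out = (maxnode + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;
    return 0;
}
```

With the check in place, a caller passing ULONG_MAX gets -EINVAL instead of
a wrapped-around, too-small nlongs.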

Signed-off-by: Chris Wright 
Signed-off-by: Linus Torvalds

[PATCH] Handle holes in node mask in node fallback list setup

2006-02-17T21:27:06+00:00

Change the find_next_best_node algorithm to correctly skip
over holes in the node online mask. Previously it did not handle
missing nodes correctly, which caused crashes at boot.

[Written by Linus, tested by AK]
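
A simplified userspace sketch of the idea, not the code from this patch:
candidates are taken only from the bits set in an online mask, so a missing
node id is skipped even when it would be the closest one. The bitmask
representation, the distance array and every sk_/SK_ name are assumptions
made for illustration.

```c
#include <limits.h>

#define SK_MAX_NODES 8  /* hypothetical upper bound on node ids */

/*
 * Pick the closest node that is online and not yet used, mark it used,
 * and return its id. Returns -1 when no candidate is left.
 * dist_from[n] is the assumed distance from the starting node to node n.
 */
int sk_next_best_node(unsigned long online, unsigned long *used,
                      const int *dist_from)
{
    int best = -1;
    int best_dist = INT_MAX;

    for (int n = 0; n < SK_MAX_NODES; n++) {
        unsigned long bit = 1UL << n;

        if (!(online & bit))
            continue;  /* hole in the online mask: never a candidate */
        if (*used & bit)
            continue;  /* already placed in the fallback list */
        if (dist_from[n] < best_dist) {
            best_dist = dist_from[n];
            best = n;
        }
    }
    if (best >= 0)
        *used |= 1UL << best;
    return best;
}
```

Calling it repeatedly until it returns -1 yields a fallback order that
contains only online nodes.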

Signed-off-by: Andi Kleen 
Signed-off-by: Linus Torvalds

[PATCH] Handle all and empty zones when setting up custom zonelists for mbind

2006-02-17T16:18:14+00:00

The memory allocator doesn't like empty zones (which have an
uninitialized freelist), so an x86-64 system with a node that lies
entirely in GFP_DMA32 would crash on mbind.

Fix that up by putting all possible zones as fallback into the zonelist
and skipping the empty ones.

In fact the code always allocated enough space for all zones,
but only used it for the highest. This change just uses all the
memory that was allocated before.

This should work fine for now, but whoever implements node hot removal
needs to fix this somewhere else too (or make sure zone data structures
themselves never go away, only their memory).
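
A minimal single-node sketch of the approach described above, under
simplifying assumptions: walk the zone types from highest to lowest, add
each as a fallback, and skip the ones with no pages. The sk_ types and the
present_pages test are illustrative and do not reproduce the kernel's
zonelist code.

```c
#include <stddef.h>

/* Hypothetical zone types, lowest to highest. */
enum { SK_ZONE_DMA, SK_ZONE_DMA32, SK_ZONE_NORMAL, SK_NR_ZONES };

struct sk_zone {
    unsigned long present_pages;  /* 0 means the zone is empty */
};

/*
 * Fill 'out' with pointers to the non-empty zones, highest zone first,
 * and NULL-terminate it. 'out' must have room for SK_NR_ZONES + 1
 * entries. Returns the number of zones placed in the list.
 */
size_t sk_build_zonelist(struct sk_zone *zones, struct sk_zone **out)
{
    size_t n = 0;

    for (int k = SK_NR_ZONES - 1; k >= 0; k--) {
        if (zones[k].present_pages == 0)
            continue;  /* empty zone: never hand it to the allocator */
        out[n++] = &zones[k];
    }
    out[n] = NULL;
    return n;
}
```

For a node whose memory lies only in the DMA32 range, the resulting list
holds just that one zone.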

Signed-off-by: Andi Kleen 
Acked-by: Christoph Lameter 
Signed-off-by: Linus Torvalds

[PATCH] x86_64: Add boot option to disable randomized mappings and cleanup

2006-02-17T16:00:40+00:00

AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution
installation it's easiest if it's a boot time option.

Also I moved the variable to a more appropriate place and made
it independent from sysctl.

And marked it __read_mostly, which it is.

Signed-off-by: Andi Kleen 
Signed-off-by: Linus Torvalds

[PATCH] madvise MADV_DONTFORK/MADV_DOFORK

2006-02-15T00:09:34+00:00

Currently, copy-on-write may change the physical address of a page even if the
user requested that the page is pinned in memory (either by mlock or by
get_user_pages).  This happens if the process forks meanwhile, and the parent
writes to that page.  As a result, the page is orphaned: in case of
get_user_pages, the application will never see any data hardware DMA's into
this page after the COW.  In case of mlock'd memory, the parent is not getting
the realtime/security benefits of mlock.

In particular, this affects the Infiniband modules which do DMA from and into
user pages all the time.

This patch adds madvise options to control whether a memory range is
inherited across fork.  Useful e.g.  when hardware is doing DMA from/into
these pages.  Could also be useful to an application wanting to speed up its
forks by cutting large areas out of consideration.
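
A minimal userspace sketch of how an application might use the new advice
values, assuming a Linux system whose headers define MADV_DONTFORK and
MADV_DOFORK.  The helper names are hypothetical; only mmap, madvise and
munmap are real calls.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map an anonymous buffer and ask the kernel not to
 * make it available to children created by fork(), so the parent keeps
 * the physical pages a device may be doing DMA into.
 */
void *sketch_alloc_nofork_buffer(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}

/* Restore the default fork behaviour, then unmap the buffer. */
int sketch_free_nofork_buffer(void *buf, size_t len)
{
    if (madvise(buf, len, MADV_DOFORK) != 0)
        return -1;
    return munmap(buf, len);
}
```

A child forked while MADV_DONTFORK is in effect does not have the range
mapped at all, so it must not touch that address.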

Signed-off-by: Michael S. Tsirkin 
Acked-by: Hugh Dickins 
Cc: Michael Kerrisk 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] compound page: default destructor

2006-02-15T00:09:33+00:00

Somehow I imagined that calling a NULL destructor would free a compound page
rather than oopsing.  No, we must supply a default destructor, __free_pages_ok
using the order noted by prep_compound_page.  hugetlb can still replace this
as before with its own free_huge_page pointer.

The case that needs this is not common: rarely does put_compound_page's
put_page_testzero bring the count down to 0.  But if get_user_pages is applied
to some part of a compound page, without immediate release (e.g.  AIO or
Infiniband), then it's possible for its put_page to come after the containing
vma has been unmapped and the driver has done its free_pages.

That's just the kind of case compound pages are supposed to be guarding
against (but Nick points out, nor did PageReserved handle this right).
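
A toy userspace model of the pattern, not the kernel implementation: every
compound page gets a default destructor when it is prepared, so the last
put always has something valid to call.  All sk_ names and the simplified
refcount are assumptions made for illustration.

```c
struct sk_cpage;
typedef void (*sk_dtor_t)(struct sk_cpage *);

struct sk_cpage {
    unsigned int order;  /* recorded when the compound page is prepared */
    int refcount;
    sk_dtor_t dtor;      /* must never be NULL once prepared */
};

/* Records the order passed to the last default free, for demonstration. */
static int sk_last_freed_order = -1;

/* Default destructor: stands in for freeing 2^order pages. */
static void sk_default_dtor(struct sk_cpage *page)
{
    sk_last_freed_order = (int)page->order;
}

void sk_prep_compound(struct sk_cpage *page, unsigned int order)
{
    page->order = order;
    page->refcount = 1;
    page->dtor = sk_default_dtor;  /* a special user may override this */
}

/* Drop one reference; the final put runs the destructor. */
void sk_put(struct sk_cpage *page)
{
    if (--page->refcount == 0)
        page->dtor(page);
}
```

If a second holder takes a reference and releases it after the owner, it is
that late put which ends up running the destructor.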

Signed-off-by: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] compound page: use page[1].lru

2006-02-15T00:09:33+00:00

If a compound page has its own put_page_testzero destructor (the only current
example is free_huge_page), that is noted in page[1].mapping of the compound
page.  But that's rather a poor place to keep it: functions which call
set_page_dirty_lock after get_user_pages (e.g.  Infiniband's
__ib_umem_release) ought to be checking first, otherwise set_page_dirty is
liable to crash on what's not the address of a struct address_space.

And now I'm about to make that worse: it turns out that every compound page
needs a destructor, so we can no longer rely on hugetlb pages going their own
special way, to avoid further problems of page->mapping reuse.  For example,
not many people know that: on 50% of i386 -Os builds, the first tail page of a
compound page purports to be PageAnon (when its destructor has an odd
address), which surprises page_add_file_rmap.

Keep the compound page destructor in page[1].lru.next instead.  And to free up
the common pairing of mapping and index, also move compound page order from
index to lru.prev.  Slab reuses page->lru too: but if we ever need slab to use
compound pages, it can easily stack its use above this.

(akpm: decoded version of the above: the tail pages of a compound page now
have ->mapping==NULL, so there's no need for the set_page_dirty[_lock]()
caller to check that they're not compound pages before doing the dirty).
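
A simplified userspace model of the layout change, for illustration only:
the destructor and order live in the lru fields of the first tail page, so
->mapping of that page stays NULL and a low-bit "anon" test on it cannot
misfire.  The sk_ struct is a stand-in, not the real struct page.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_MAPPING_ANON 1UL  /* low bit of ->mapping marks an anon page */

struct sk_list_head { void *next, *prev; };

struct sk_page {
    void *mapping;
    unsigned long index;
    struct sk_list_head lru;
};

typedef void (*sk_dtor_t)(struct sk_page *);

/* Models the anon test: it looks only at the low bit of ->mapping. */
int sk_page_anon(const struct sk_page *page)
{
    return ((uintptr_t)page->mapping & SK_MAPPING_ANON) != 0;
}

/* Keep destructor and order in page[1].lru; leave ->mapping untouched. */
void sk_set_compound(struct sk_page *page, sk_dtor_t dtor,
                     unsigned long order)
{
    page[1].lru.next = (void *)(uintptr_t)dtor;
    page[1].lru.prev = (void *)(uintptr_t)order;
}

sk_dtor_t sk_get_dtor(const struct sk_page *page)
{
    return (sk_dtor_t)(uintptr_t)page[1].lru.next;
}

unsigned long sk_get_order(const struct sk_page *page)
{
    return (unsigned long)(uintptr_t)page[1].lru.prev;
}

/* Placeholder destructor used only to have an address to store. */
void sk_noop_dtor(struct sk_page *page)
{
    (void)page;
}
```

Had the destructor been stored in ->mapping instead, sk_page_anon() on the
tail page would depend on whether the function's address happens to be odd.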

Signed-off-by: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] vmscan: skip reclaim_mapped determination if we do not swap

2006-02-12T05:41:11+00:00

This puts the variables and the way to get to reclaim_mapped in one block,
and allows zone_reclaim or other things to skip the determination (maybe
this whole block of code does not belong in refill_inactive_zone()?)

Signed-off-by: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] vmscan: remove duplicate increment of reclaim_in_progress

2006-02-12T05:41:11+00:00

shrink_zone() already increments reclaim_in_progress.  No need to do it in
balance_pgdat.

Signed-off-by: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] zone reclaim: do not check references to a page during zone reclaim

2006-02-12T05:41:11+00:00

shrink_list() and refill_inactive() check all ptes pointing to a page for
reference bits in order to decide if the page should be put on the active
list.  This is not necessary for zone_reclaim since we are only interested
in removing unmapped pages.  Skip the checks in both functions.

Signed-off-by: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds