linux-toradex.git/include/linux/swap.h, branch v3.9-rc3

vmscan: change type of vm_total_pages to unsigned long

2013-02-24T01:50:22+00:00

vm_total_pages is calculated from nr_free_pagecache_pages(), which
returns unsigned long, so change its type to unsigned long to match.

Signed-off-by: Zhang Yanfei 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: fix return type for functions nr_free_*_pages

2013-02-24T01:50:21+00:00

Currently, the page counts that the nr_free_*_pages functions return are
held in an unsigned int.  On machines with big memory (exceeding 16TB
with 4KB pages), the count may be incorrect because of overflow, so
return unsigned long instead.
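
To make the overflow concrete, here is a minimal sketch (not kernel code). It assumes 4KB pages and a 32-bit unsigned int; the helper names and EX_PAGE_SHIFT are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12 /* assumption: 4KB pages */

/* Old behaviour: the page count is squeezed into 32 bits. */
static inline uint32_t ex_pages_as_u32(uint64_t bytes)
{
	return (uint32_t)(bytes >> EX_PAGE_SHIFT);
}

/* New behaviour: the page count keeps its full width
 * (unsigned long is 64 bits wide on 64-bit kernels). */
static inline uint64_t ex_pages_as_u64(uint64_t bytes)
{
	return bytes >> EX_PAGE_SHIFT;
}
```

With exactly 16TB (1ULL << 44 bytes) the 32-bit variant wraps to 0 pages, while the 64-bit variant reports 1ULL << 32 pages.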

Signed-off-by: Zhang Yanfei 
Cc: Simon Horman 
Cc: Julian Anastasov 
Cc: David Miller 
Cc: Eric Van Hensbergen 
Cc: Ron Minnich 
Cc: Latchesar Ionkov 
Cc: Mel Gorman 
Cc: Minchan Kim 
Cc: KAMEZAWA Hiroyuki 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

swap: add per-partition lock for swapfile

2013-02-24T01:50:17+00:00

swap_lock is heavily contended when I test swapping to 3 fast SSDs (it is
even slightly slower than swapping to 2 such SSDs).  The main contention
comes from swap_info_get().  This patch tries to close the gap by adding
a new per-partition lock.

Global data like nr_swapfiles, total_swap_pages, least_priority and
swap_list are still protected by swap_lock.

nr_swap_pages is now an atomic, so it can be changed without swap_lock.
In theory, get_swap_page() may find no swap pages even though some are
actually free, but that doesn't sound like a big problem.
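
A lock-free counter of this kind can be sketched in userspace with C11 atomics. This is an illustration only: the kernel uses its own atomic_long_t API, and the names ex_nr_swap_pages, ex_add_swap_pages and ex_reserve_swap_page are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global free swap page counter. */
static atomic_long ex_nr_swap_pages;

/* Pages become available (e.g. on free): no lock needed. */
static inline void ex_add_swap_pages(long n)
{
	atomic_fetch_add(&ex_nr_swap_pages, n);
}

/* Try to reserve one page without taking a lock.  A concurrent
 * free that lands just after the load can make this report
 * "no pages" although one is about to be available. */
static inline bool ex_reserve_swap_page(void)
{
	long old = atomic_load(&ex_nr_swap_pages);

	while (old > 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&ex_nr_swap_pages,
						 &old, old - 1))
			return true;
	}
	return false;
}
```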

Accessing partition specific data (like scan_swap_map and so on) is only
protected by swap_info_struct.lock.

Changing swap_info_struct.flags requires holding both swap_lock and
swap_info_struct.lock, because scan_swap_map() will check it.  Reading
the flags is OK with either of the locks held.

If both swap_lock and swap_info_struct.lock must be held, we always take
the former first to avoid deadlock.
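
The locking rules above can be sketched in userspace as follows. This is only an illustration: pthread mutexes stand in for the kernel spinlocks, and every ex_* name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for swap_info_struct. */
struct ex_swap_info {
	pthread_mutex_t lock;	/* per-partition lock */
	unsigned long flags;
};

/* Hypothetical stand-in for the global swap_lock. */
static pthread_mutex_t ex_swap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writers take both locks, always global first, then per-partition,
 * so two writers can never wait on each other in opposite order. */
static inline void ex_set_flags(struct ex_swap_info *si, unsigned long bits)
{
	pthread_mutex_lock(&ex_swap_lock);
	pthread_mutex_lock(&si->lock);
	si->flags |= bits;
	pthread_mutex_unlock(&si->lock);
	pthread_mutex_unlock(&ex_swap_lock);
}

/* Readers need only one of the two locks; here the per-partition one. */
static inline unsigned long ex_read_flags(struct ex_swap_info *si)
{
	unsigned long flags;

	pthread_mutex_lock(&si->lock);
	flags = si->flags;
	pthread_mutex_unlock(&si->lock);
	return flags;
}
```

Because a writer holds both locks, a reader holding either one is guaranteed not to see the flags change underneath it.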

swap_entry_free() can change swap_list.  To delete that code, we add a
new highest_priority_index.  Whenever get_swap_page() is called, we
check it.  If it's valid, we use it.

It's a pity get_swap_page() still holds swap_lock.  But in practice,
swap_lock isn't heavily contended in my test with this patch (or rather,
there are other much heavier bottlenecks like TLB flush).  And BTW, it
looks like get_swap_page() doesn't really need the lock.  We never free
swap_info[] and we check the SWP_WRITEOK flag.  The only risk without the
lock is that we could swap out to some low priority swap, but we can
quickly recover after several rounds of swap, so it doesn't sound like a
big deal to me.  But I'd prefer to fix this if it's a real problem.

"swap: make each swap partition have one address_space" improved the
swapout speed from 1.7G/s to 2G/s.  This patch further improves the
speed to 2.3G/s, so around 15% improvement.  It's a multi-process test,
so TLB flush isn't the biggest bottleneck before the patches.

[arnd@arndb.de: fix it for nommu]
[hughd@google.com: add missing unlock]
[minchan@kernel.org: get rid of lockdep whinge on sys_swapon]
Signed-off-by: Shaohua Li 
Cc: Hugh Dickins 
Cc: Rik van Riel 
Cc: Minchan Kim 
Cc: Greg Kroah-Hartman 
Cc: Seth Jennings 
Cc: Konrad Rzeszutek Wilk 
Cc: Xiao Guangrong 
Cc: Dan Magenheimer 
Cc: Stephen Rothwell 
Signed-off-by: Arnd Bergmann 
Signed-off-by: Hugh Dickins 
Signed-off-by: Minchan Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

swap: make each swap partition have one address_space

2013-02-24T01:50:17+00:00

When I use several fast SSDs for swap, swapper_space.tree_lock is
heavily contended.  This patch gives each swap partition its own
address_space to reduce the lock contention.  There is an array of
address_space for swap.  The swap entry type is the index into the array.
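
As a rough sketch of the lookup (not the kernel's definitions): the entry encoding, the array size and all ex_* names below are simplified assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MAX_SWAPFILES 32	/* assumption: up to 32 swap types */
#define EX_TYPE_SHIFT    59	/* assumption: type lives in the top 5 bits */
#define EX_OFFSET_MASK   ((1ULL << EX_TYPE_SHIFT) - 1)

/* Hypothetical stand-in for struct address_space. */
struct ex_address_space {
	unsigned long nrpages;
};

/* One address_space (and therefore one tree lock) per swap type. */
static struct ex_address_space ex_swapper_spaces[EX_MAX_SWAPFILES];

static inline uint64_t ex_make_entry(unsigned int type, uint64_t offset)
{
	assert(type < EX_MAX_SWAPFILES);
	return ((uint64_t)type << EX_TYPE_SHIFT) | (offset & EX_OFFSET_MASK);
}

static inline unsigned int ex_swp_type(uint64_t entry)
{
	return (unsigned int)(entry >> EX_TYPE_SHIFT);
}

/* The swap entry type selects the address_space. */
static inline struct ex_address_space *ex_swap_address_space(uint64_t entry)
{
	return &ex_swapper_spaces[ex_swp_type(entry)];
}
```

Entries on different partitions resolve to different array slots, so they no longer serialize on one shared lock.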

In my test with 3 SSDs, this increases the swapout throughput by 20%.

[akpm@linux-foundation.org: revert unneeded change to __add_to_swap_cache]
Signed-off-by: Shaohua Li 
Cc: Hugh Dickins 
Acked-by: Rik van Riel 
Acked-by: Minchan Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: vmscan: save work scanning (almost) empty LRU lists

2013-02-24T01:50:09+00:00

In certain cases (kswapd reclaim, memcg target reclaim), a fixed minimum
number of pages is scanned from the LRU lists on each iteration, to make
progress.

Do not make this minimum bigger than the respective LRU list size,
however, and save some busy work trying to isolate and reclaim pages
that are not there.
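
The clamp can be sketched like this. It is a simplified illustration, not the vmscan code: EX_MIN_SCAN is an assumed stand-in for the kernel's minimum batch size, and ex_scan_target and its parameters are hypothetical.

```c
#include <assert.h>

#define EX_MIN_SCAN 32UL	/* assumption: fixed minimum scan batch */

/*
 * size:       number of pages currently on the LRU list
 * scan:       scan target computed from the reclaim priority
 * force_scan: non-zero when a minimum amount of scanning is required
 */
static inline unsigned long ex_scan_target(unsigned long size,
					   unsigned long scan,
					   int force_scan)
{
	/* Forced minimum, but never more than the list holds. */
	if (!scan && force_scan)
		scan = size < EX_MIN_SCAN ? size : EX_MIN_SCAN;
	return scan;
}
```

For an empty list the forced target is 0, so the caller has nothing to isolate and can skip the list.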

Empty LRU lists are quite common with memory cgroups in NUMA
environments because there exists a set of LRU lists for each zone for
each memory cgroup, while the memory of a single cgroup is expected to
stay on just one node.  The number of expected empty LRU lists is thus

  memcgs * (nodes - 1) * lru types
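
As a worked instance of the formula, with figures that are assumptions chosen for illustration: 100 memcgs on a 4-node machine with 5 LRU types give 100 * 3 * 5 = 1500 expected empty lists. The helper name is hypothetical.

```c
#include <assert.h>

/* Expected number of empty LRU lists when each memcg's memory
 * stays on a single node. */
static inline unsigned long ex_expected_empty_lrus(unsigned long memcgs,
						   unsigned long nodes,
						   unsigned long lru_types)
{
	assert(nodes >= 1);
	return memcgs * (nodes - 1) * lru_types;
}
```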

Each attempt to reclaim from an empty LRU list does expensive size
comparisons between lists, acquires the zone's lru lock etc.  Avoid
that.

Signed-off-by: Johannes Weiner 
Reviewed-by: Rik van Riel 
Acked-by: Mel Gorman 
Reviewed-by: Michal Hocko 
Cc: Hugh Dickins 
Cc: Satoru Moriya 
Cc: Simon Jeons 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: remove vma arg from page_evictable

2012-10-09T07:22:55+00:00

page_evictable(page, vma) is an irritant: almost all its callers pass
NULL for vma.  Remove the vma arg and use mlocked_vma_newpage(vma, page)
explicitly in the couple of places it's needed.  But in those places we
don't even need page_evictable() itself!  They're dealing with a freshly
allocated anonymous page, which has no "mapping" and cannot be mlocked yet.

Signed-off-by: Hugh Dickins 
Acked-by: Mel Gorman 
Cc: Rik van Riel 
Acked-by: Johannes Weiner 
Cc: Michel Lespinasse 
Cc: Ying Han 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: swap: implement generic handler for swap_activate

2012-08-01T01:42:47+00:00

The version of swap_activate introduced is sufficient for swap-over-NFS
but would not provide enough information to implement a generic handler.
This patch shuffles things slightly to ensure the same information is
available for aops->swap_activate() as is available to the core.

No functionality change.

Signed-off-by: Mel Gorman 
Acked-by: Rik van Riel 
Cc: Christoph Hellwig 
Cc: David S. Miller 
Cc: Eric B Munson 
Cc: Eric Paris 
Cc: James Morris 
Cc: Mel Gorman 
Cc: Mike Christie 
Cc: Neil Brown 
Cc: Peter Zijlstra 
Cc: Sebastian Andrzej Siewior 
Cc: Trond Myklebust 
Cc: Xiaotian Feng 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages

2012-08-01T01:42:47+00:00

Currently swapfiles are managed entirely by the core VM by using ->bmap to
allocate space and write to the blocks directly.  This effectively ensures
that the underlying blocks are allocated and avoids the need for the swap
subsystem to locate what physical blocks store offsets within a file.

If the swap subsystem is to use the filesystem information to locate the
blocks, it is critical that information such as block groups, block
bitmaps and the block descriptor table that map the swap file is
resident in memory.  This patch adds address_space_operations that the VM
can call when activating or deactivating swap backed by a file.

  int swap_activate(struct file *);
  int swap_deactivate(struct file *);

The ->swap_activate() method is used to communicate to the file that the
VM relies on it, and the address_space should take adequate measures such
as reserving space in the underlying device, reserving memory for mempools
and pinning information such as the block descriptor table in memory.  The
->swap_deactivate() method is called on sys_swapoff() if ->swap_activate()
returned success.
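
The calling convention can be sketched in userspace as below. This is an illustration under simplifying assumptions, not the kernel implementation: the ex_* types are hypothetical, and the "activated" field merely records that ->swap_activate() succeeded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file. */
struct ex_file {
	int pinned;	/* filesystem metadata pinned in memory? */
};

/* Hypothetical stand-in for the two new operations. */
struct ex_aops {
	int (*swap_activate)(struct ex_file *);
	int (*swap_deactivate)(struct ex_file *);
};

struct ex_swap_info {
	struct ex_file *file;
	const struct ex_aops *ops;
	int activated;	/* set only after a successful activate */
};

/* swapon path: give the filesystem a chance to prepare. */
static inline int ex_swapon(struct ex_swap_info *sis)
{
	int ret;

	if (!sis->ops->swap_activate)
		return 0;	/* no hook: the filesystem is not involved */
	ret = sis->ops->swap_activate(sis->file);
	if (ret == 0)
		sis->activated = 1;
	return ret;
}

/* swapoff path: deactivate only if activate returned success. */
static inline void ex_swapoff(struct ex_swap_info *sis)
{
	if (sis->activated && sis->ops->swap_deactivate)
		sis->ops->swap_deactivate(sis->file);
	sis->activated = 0;
}

/* Example filesystem hooks: pin and unpin metadata. */
static inline int ex_fs_activate(struct ex_file *f)
{
	f->pinned = 1;
	return 0;
}

static inline int ex_fs_deactivate(struct ex_file *f)
{
	f->pinned = 0;
	return 0;
}
```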

After a successful swapfile ->swap_activate, the swapfile is marked
SWP_FILE and swapper_space.a_ops will proxy to
sis->swap_file->f_mapping->a_ops using ->direct_IO to write swapcache
pages and ->readpage to read.

It is perfectly possible that direct_IO be used to read the swap pages but
it is an unnecessary complication.  Similarly, it is possible that
->writepage be used instead of direct_IO to write the pages but filesystem
developers have stated that calling writepage from the VM is undesirable
for a variety of reasons and using direct_IO opens up the possibility of
writing back batches of swap pages in the future.

[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman 
Acked-by: Rik van Riel 
Cc: Christoph Hellwig 
Cc: David S. Miller 
Cc: Eric B Munson 
Cc: Eric Paris 
Cc: James Morris 
Cc: Mel Gorman 
Cc: Mike Christie 
Cc: Neil Brown 
Cc: Peter Zijlstra 
Cc: Sebastian Andrzej Siewior 
Cc: Trond Myklebust 
Cc: Xiaotian Feng 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: methods for teaching filesystems about PG_swapcache pages

2012-08-01T01:42:47+00:00

In order to teach filesystems to handle swap cache pages, three new page
functions are introduced:

  pgoff_t page_file_index(struct page *);
  loff_t page_file_offset(struct page *);
  struct address_space *page_file_mapping(struct page *);

page_file_index() - gives the offset of this page in the file in
PAGE_CACHE_SIZE blocks.  It is what page->index is for mapped pages, but
it also gives the correct index for PG_swapcache pages.

page_file_offset() - uses page_file_index(), so that it will give the
expected result, even for PG_swapcache pages.

page_file_mapping() - gives the mapping backing the actual page; that is,
for swap cache pages it will give swap_file->f_mapping.
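
How the three helpers relate can be sketched with a mock page structure. Everything here is a simplified assumption for illustration: struct ex_page is not struct page, the swap offset and swap file mapping are stored directly instead of being looked up, and 4KB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>

#define EX_PAGE_SHIFT 12	/* assumption: 4KB pages */

struct ex_mapping {
	int id;
};

/* Hypothetical, flattened stand-in for struct page. */
struct ex_page {
	int swapcache;				/* PG_swapcache stand-in */
	unsigned long index;			/* page->index */
	unsigned long swap_offset;		/* offset within the swap file */
	struct ex_mapping *mapping;		/* page->mapping */
	struct ex_mapping *swap_file_mapping;	/* swap_file->f_mapping */
};

/* Index of the page in its backing file, in page-sized blocks. */
static inline unsigned long ex_page_file_index(const struct ex_page *p)
{
	return p->swapcache ? p->swap_offset : p->index;
}

/* Byte offset, derived from the index so swap cache pages work too. */
static inline long long ex_page_file_offset(const struct ex_page *p)
{
	return (long long)ex_page_file_index(p) << EX_PAGE_SHIFT;
}

/* Mapping that actually backs the page. */
static inline struct ex_mapping *ex_page_file_mapping(const struct ex_page *p)
{
	return p->swapcache ? p->swap_file_mapping : p->mapping;
}
```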

Signed-off-by: Peter Zijlstra 
Signed-off-by: Mel Gorman 
Reviewed-by: Rik van Riel 
Cc: Christoph Hellwig 
Cc: David S. Miller 
Cc: Eric B Munson 
Cc: Eric Paris 
Cc: James Morris 
Cc: Mel Gorman 
Cc: Mike Christie 
Cc: Neil Brown 
Cc: Sebastian Andrzej Siewior 
Cc: Trond Myklebust 
Cc: Xiaotian Feng 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

memcg: rename config variables

2012-08-01T01:42:43+00:00

Sanity:

CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM

[mhocko@suse.cz: fix missed bits]
Cc: Glauber Costa 
Acked-by: Michal Hocko 
Cc: Johannes Weiner 
Cc: KAMEZAWA Hiroyuki 
Cc: Hugh Dickins 
Cc: Tejun Heo 
Cc: Aneesh Kumar K.V 
Cc: David Rientjes 
Cc: KOSAKI Motohiro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds