linux-toradex.git/include/linux/swap.h, branch v2.6.35.4

memcg: move charge of file pages

2010-05-27T16:12:43+00:00

This patch adds support for moving charge of file pages, which include
normal file, tmpfs file and swaps of tmpfs file.  It's enabled by setting
bit 1 of /memory.move_charge_at_immigrate.

Unlike the case of anonymous pages, file pages(and swaps) in the range
mmapped by the task will be moved even if the task hasn't done page fault,
i.e.  they might not be the task's "RSS", but other task's "RSS" that maps
the same file.  And mapcount of the page is ignored(the page can be moved
even if page_mapcount(page) > 1).  So, conditions that the page/swap
should be met to be moved is that it must be in the range mmapped by the
target task and it must be charged to the old cgroup.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Daisuke Nishimura 
Acked-by: KAMEZAWA Hiroyuki 
Cc: Balbir Singh 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: compaction: memory compaction core

2010-05-25T15:06:59+00:00

This patch is the core of a mechanism which compacts memory in a zone by
relocating movable pages towards the end of the zone.

A single compaction run involves a migration scanner and a free scanner.
Both scanners operate on pageblock-sized areas in the zone.  The migration
scanner starts at the bottom of the zone and searches for all movable
pages within each area, isolating them onto a private list called
migratelist.  The free scanner starts at the top of the zone and searches
for suitable areas and consumes the free pages within making them
available for the migration scanner.  The pages isolated for migration are
then migrated to the newly isolated free pages.

[aarcange@redhat.com: Fix unsafe optimisation]
[mel@csn.ul.ie: do not schedule work on other CPUs for compaction]
Signed-off-by: Mel Gorman 
Acked-by: Rik van Riel 
Reviewed-by: Minchan Kim 
Cc: KOSAKI Motohiro 
Cc: Christoph Lameter 
Cc: KAMEZAWA Hiroyuki 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: move definition for LRU isolation modes to a header

2010-05-25T15:06:59+00:00

Currently, vmscan.c defines the isolation modes for __isolate_lru_page().
Memory compaction needs access to these modes for isolating pages for
migration.  This patch exports them.

Signed-off-by: Mel Gorman 
Acked-by: Christoph Lameter 
Cc: Rik van Riel 
Cc: Minchan Kim 
Cc: KOSAKI Motohiro 
Cc: KAMEZAWA Hiroyuki 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

tmpfs: insert tmpfs cache pages to inactive list at first

2010-05-25T15:06:56+00:00

Shaohua Li reported parallel file copy on tmpfs can lead to OOM killer.
This is regression of caused by commit 9ff473b9a7 ("vmscan: evict
streaming IO first").  Wow, It is 2 years old patch!

Currently, tmpfs file cache is inserted active list at first.  This means
that the insertion doesn't only increase numbers of pages in anon LRU, but
it also reduces anon scanning ratio.  Therefore, vmscan will get totally
confused.  It scans almost only file LRU even though the system has plenty
unused tmpfs pages.

Historically, lru_cache_add_active_anon() was used for two reasons.
1) Intend to priotize shmem page rather than regular file cache.
2) Intend to avoid reclaim priority inversion of used once pages.

But we've lost both motivation because (1) Now we have separate anon and
file LRU list.  then, to insert active list doesn't help such priotize.
(2) In past, one pte access bit will cause page activation.  then to
insert inactive list with pte access bit mean higher priority than to
insert active list.  Its priority inversion may lead to uninteded lru
chun.  but it was already solved by commit 645747462 (vmscan: detect
mapped file pages used only once).  (Thanks Hannes, you are great!)

Thus, now we can use lru_cache_add_anon() instead.

Signed-off-by: KOSAKI Motohiro 
Reported-by: Shaohua Li 
Reviewed-by: Wu Fengguang 
Reviewed-by: Johannes Weiner 
Reviewed-by: Rik van Riel 
Reviewed-by: Minchan Kim 
Acked-by: Hugh Dickins 
Cc: Henrique de Moraes Holschuh 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

swap: Add flag to identify block swap devices

2010-05-18T22:07:52+00:00

Added SWP_BLKDEV flag to distinguish block and regular file backed
swap devices. We could also check if a swap is entire block device,
rather than a file, by:
S_ISBLK(swap_info_struct->swap_file->f_mapping->host->i_mode)
but, I think, simply checking this flag is more convenient.

Signed-off-by: Nitin Gupta 
Acked-by: Linus Torvalds 
Acked-by: Nigel Cunningham 
Acked-by: Pekka Enberg 
Reviewed-by: Minchan Kim 
Signed-off-by: Greg Kroah-Hartman

memcg: move charges of anonymous swap

2010-03-12T23:52:36+00:00

This patch is another core part of this move-charge-at-task-migration
feature.  It enables moving charges of anonymous swaps.

To move the charge of swap, we need to exchange swap_cgroup's record.

In current implementation, swap_cgroup's record is protected by:

  - page lock: if the entry is on swap cache.
  - swap_lock: if the entry is not on swap cache.

This works well in usual swap-in/out activity.

But this behavior make the feature of moving swap charge check many
conditions to exchange swap_cgroup's record safely.

So I changed modification of swap_cgroup's recored(swap_cgroup_record())
to use xchg, and define a new function to cmpxchg swap_cgroup's record.

This patch also enables moving charge of non pte_present but not uncharged
swap caches, which can be exist on swap-out path, by getting the target
pages via find_get_page() as do_mincore() does.

[kosaki.motohiro@jp.fujitsu.com: fix ia64 build]
[akpm@linux-foundation.org: fix typos]
Signed-off-by: Daisuke Nishimura 
Cc: Balbir Singh 
Acked-by: KAMEZAWA Hiroyuki 
Cc: Li Zefan 
Cc: Paul Menage 
Cc: Daisuke Nishimura 
Signed-off-by: KOSAKI Motohiro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

swap: rework map_swap_page() again

2009-12-15T16:53:16+00:00

Seems that page_io.c doesn't really need to know that page_private(page)
is the swp_entry 'val'.  Rework map_swap_page() to do what its name says
and map a page to a page offset in the swap space.

The only other caller of map_swap_page() is internal to mm/swapfile.c and
it does want to map a swap entry to the 'sector'.  So rename
map_swap_page() to map_swap_entry(), make it 'static' and and implement
map_swap_page() as a wrapper around that.

Signed-off-by: Lee Schermerhorn 
Cc: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

swap_info: reorder its fields

2009-12-15T16:53:16+00:00

Reorder (and comment) the fields of swap_info_struct, to make better
use of its cachelines: it's good for swap_duplicate() in particular
if unsigned int max and swap_map are near the start.

Signed-off-by: Hugh Dickins 
Cc: KAMEZAWA Hiroyuki 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

swap_info: note SWAP_MAP_SHMEM

2009-12-15T16:53:16+00:00

While we're fiddling with the swap_map values, let's assign a particular
value to shmem/tmpfs swap pages: their swap counts are never incremented,
and it helps swapoff's try_to_unuse() a little if it can immediately
distinguish those pages from process pages.

Since we've no use for SWAP_MAP_BAD | COUNT_CONTINUED,
we might as well use that 0xbf value for SWAP_MAP_SHMEM.

Signed-off-by: Hugh Dickins 
Reviewed-by: KAMEZAWA Hiroyuki 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

swap_info: swap count continuations

2009-12-15T16:53:15+00:00

Swap is duplicated (reference count incremented by one) whenever the same
swap page is inserted into another mm (when forking finds a swap entry in
place of a pte, or when reclaim unmaps a pte to insert the swap entry).

swap_info_struct's vmalloc'ed swap_map is the array of these reference
counts: but what happens when the unsigned short (or unsigned char since
the preceding patch) is full? (and its high bit is kept for a cache flag)

We then lose track of it, never freeing, leaving it in use until swapoff:
at which point we _hope_ that a single pass will have found all instances,
assume there are no more, and will lose user data if we're wrong.

Swapping of KSM pages has not yet been enabled; but it is implemented,
and makes it very easy for a user to overflow the maximum swap count:
possible with ordinary process pages, but unlikely, even when pid_max
has been raised from PID_MAX_DEFAULT.

This patch implements swap count continuations: when the count overflows,
a continuation page is allocated and linked to the original vmalloc'ed
map page, and this used to hold the continuation counts for that entry
and its neighbours.  These continuation pages are seldom referenced:
the common paths all work on the original swap_map, only referring to
a continuation page when the low "digit" of a count is incremented or
decremented through SWAP_MAP_MAX.

Signed-off-by: Hugh Dickins 
Cc: KAMEZAWA Hiroyuki 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds