<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/mm/filemap.c, branch v3.0.16</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>vfs: __read_cache_page should use gfp argument rather than GFP_KERNEL</title>
<updated>2012-01-06T22:13:54+00:00</updated>
<author>
<name>Dave Kleikamp</name>
<email>dave.kleikamp@oracle.com</email>
</author>
<published>2011-12-21T17:05:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2f79eddd49d391aded0b0c20f7eef3bec6d5e096'/>
<id>2f79eddd49d391aded0b0c20f7eef3bec6d5e096</id>
<content type='text'>
commit e6f67b8c05f5e129e126f4409ddac6f25f58ffcb upstream.

lockdep reports a deadlock in jfs because a special inode's rw semaphore
is taken recursively.  The mapping's gfp mask is GFP_NOFS, but is not
used when __read_cache_page() calls add_to_page_cache_lru().

Signed-off-by: Dave Kleikamp &lt;dave.kleikamp@oracle.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e6f67b8c05f5e129e126f4409ddac6f25f58ffcb upstream.

lockdep reports a deadlock in jfs because a special inode's rw semaphore
is taken recursively.  The mapping's gfp mask is GFP_NOFS, but is not
used when __read_cache_page() calls add_to_page_cache_lru().

Signed-off-by: Dave Kleikamp &lt;dave.kleikamp@oracle.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>more conservative S_NOSEC handling</title>
<updated>2011-06-03T22:24:58+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2011-06-03T22:24:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9e1f1de02c2275d7172e18dc4e7c2065777611bf'/>
<id>9e1f1de02c2275d7172e18dc4e7c2065777611bf</id>
<content type='text'>
Caching "we have already removed suid/caps" was overenthusiastic as merged.
On network filesystems we might have had suid/caps set on another client,
silently picked by this client on revalidate, all of that *without* clearing
the S_NOSEC flag.

AFAICS, the only reasonably sane way to deal with that is
	* new superblock flag; unless set, S_NOSEC is not going to be set.
	* local block filesystems set it in their -&gt;mount() (more accurately,
mount_bdev() does, so does btrfs -&gt;mount(), users of mount_bdev() other than
local block ones clear it)
	* if any network filesystem (or a cluster one) wants to use S_NOSEC,
it'll need to set MS_NOSEC in sb-&gt;s_flags *AND* take care to clear S_NOSEC when
inode attribute changes are picked from other clients.

It's not an earth-shattering hole (anybody that can set suid on another client
will almost certainly be able to write to the file before doing that anyway),
but it's a bug that needs fixing.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Caching "we have already removed suid/caps" was overenthusiastic as merged.
On network filesystems we might have had suid/caps set on another client,
silently picked by this client on revalidate, all of that *without* clearing
the S_NOSEC flag.

AFAICS, the only reasonably sane way to deal with that is
	* new superblock flag; unless set, S_NOSEC is not going to be set.
	* local block filesystems set it in their -&gt;mount() (more accurately,
mount_bdev() does, so does btrfs -&gt;mount(), users of mount_bdev() other than
local block ones clear it)
	* if any network filesystem (or a cluster one) wants to use S_NOSEC,
it'll need to set MS_NOSEC in sb-&gt;s_flags *AND* take care to clear S_NOSEC when
inode attribute changes are picked from other clients.

It's not an earth-shattering hole (anybody that can set suid on another client
will almost certainly be able to write to the file before doing that anyway),
but it's a bug that needs fixing.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Cache xattr security drop check for write v2</title>
<updated>2011-05-28T16:02:09+00:00</updated>
<author>
<name>Andi Kleen</name>
<email>ak@linux.intel.com</email>
</author>
<published>2011-05-28T15:25:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=69b4573296469fd3f70cf7044693074980517067'/>
<id>69b4573296469fd3f70cf7044693074980517067</id>
<content type='text'>
Some recent benchmarking on btrfs showed that a major scaling bottleneck
on large systems on btrfs is currently the xattr lookup on every write.

Why xattr lookup on every write I hear you ask?

write wants to drop suid and security related xattrs that could set o
capabilities for executables.  To do that it currently looks up
security.capability on EVERY write (even for non executables) to decide
whether to drop it or not.

In btrfs this causes an additional tree walk, hitting some per file system
locks and quite bad scalability. In a simple read workload on a 8S
system I saw over 90% CPU time in spinlocks related to that.

Chris Mason tells me this is also a problem in ext4, where it hits
the global mbcache lock.

This patch adds a simple per inode to avoid this problem.  We only
do the lookup once per file and then if there is no xattr cache
the decision. All xattr changes clear the flag.

I also used the same flag to avoid the suid check, although
that one is pretty cheap.

A file system can also set this flag when it creates the inode,
if it has a cheap way to do so.  This is done for some common file systems
in followon patches.

With this patch a major part of the lock contention disappears
for btrfs. Some testing on smaller systems didn't show significant
performance changes, but at least it helps the larger systems
and is generally more efficient.

v2: Rename is_sgid. add file system helper.
Cc: chris.mason@oracle.com
Cc: josef@redhat.com
Cc: viro@zeniv.linux.org.uk
Cc: agruen@linbit.com
Cc: Serge E. Hallyn &lt;serue@us.ibm.com&gt;
Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some recent benchmarking on btrfs showed that a major scaling bottleneck
on large systems on btrfs is currently the xattr lookup on every write.

Why xattr lookup on every write I hear you ask?

write wants to drop suid and security related xattrs that could set o
capabilities for executables.  To do that it currently looks up
security.capability on EVERY write (even for non executables) to decide
whether to drop it or not.

In btrfs this causes an additional tree walk, hitting some per file system
locks and quite bad scalability. In a simple read workload on a 8S
system I saw over 90% CPU time in spinlocks related to that.

Chris Mason tells me this is also a problem in ext4, where it hits
the global mbcache lock.

This patch adds a simple per inode to avoid this problem.  We only
do the lookup once per file and then if there is no xattr cache
the decision. All xattr changes clear the flag.

I also used the same flag to avoid the suid check, although
that one is pretty cheap.

A file system can also set this flag when it creates the inode,
if it has a cheap way to do so.  This is done for some common file systems
in followon patches.

With this patch a major part of the lock contention disappears
for btrfs. Some testing on smaller systems didn't show significant
performance changes, but at least it helps the larger systems
and is generally more efficient.

v2: Rename is_sgid. add file system helper.
Cc: chris.mason@oracle.com
Cc: josef@redhat.com
Cc: viro@zeniv.linux.org.uk
Cc: agruen@linbit.com
Cc: Serge E. Hallyn &lt;serue@us.ibm.com&gt;
Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: Wait for writeback when grabbing pages to begin a write</title>
<updated>2011-05-28T05:03:21+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@us.ibm.com</email>
</author>
<published>2011-05-27T19:23:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3d08bcc887a1c8d12be8d81f747ffa2e8a44b67b'/>
<id>3d08bcc887a1c8d12be8d81f747ffa2e8a44b67b</id>
<content type='text'>
When grabbing a page for a buffered IO write, the mm should wait for writeback
on the page to complete so that the page does not become writable during the IO
operation.  This change is needed to provide page stability during writes for
all filesystems.

Signed-off-by: Darrick J. Wong &lt;djwong@us.ibm.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When grabbing a page for a buffered IO write, the mm should wait for writeback
on the page to complete so that the page does not become writable during the IO
operation.  This change is needed to provide page stability during writes for
all filesystems.

Signed-off-by: Darrick J. Wong &lt;djwong@us.ibm.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>memcg: add the pagefault count into memcg stats</title>
<updated>2011-05-27T00:12:36+00:00</updated>
<author>
<name>Ying Han</name>
<email>yinghan@google.com</email>
</author>
<published>2011-05-26T23:25:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=456f998ec817ebfa254464be4f089542fa390645'/>
<id>456f998ec817ebfa254464be4f089542fa390645</id>
<content type='text'>
Two new stats in per-memcg memory.stat which tracks the number of page
faults and number of major page faults.

  "pgfault"
  "pgmajfault"

They are different from "pgpgin"/"pgpgout" stat which count number of
pages charged/discharged to the cgroup and have no meaning of reading/
writing page to disk.

It is valuable to track the two stats for both measuring application's
performance as well as the efficiency of the kernel page reclaim path.
Counting pagefaults per process is useful, but we also need the aggregated
value since processes are monitored and controlled in cgroup basis in
memcg.

Functional test: check the total number of pgfault/pgmajfault of all
memcgs and compare with global vmstat value:

  $ cat /proc/vmstat | grep fault
  pgfault 1070751
  pgmajfault 553

  $ cat /dev/cgroup/memory.stat | grep fault
  pgfault 1071138
  pgmajfault 553
  total_pgfault 1071142
  total_pgmajfault 553

  $ cat /dev/cgroup/A/memory.stat | grep fault
  pgfault 199
  pgmajfault 0
  total_pgfault 199
  total_pgmajfault 0

Performance test: run page fault test(pft) wit 16 thread on faulting in
15G anon pages in 16G container.  There is no regression noticed on the
"flt/cpu/s"

Sample output from pft:

  TAG pft:anon-sys-default:
    Gb  Thr CLine   User     System     Wall    flt/cpu/s fault/wsec
    15   16   1     0.67s   233.41s    14.76s   16798.546 266356.260

  +-------------------------------------------------------------------------+
      N           Min           Max        Median           Avg        Stddev
  x  10     16682.962     17344.027     16913.524     16928.812      166.5362
  +  10     16695.568     16923.896     16820.604     16824.652     84.816568
  No difference proven at 95.0% confidence

[akpm@linux-foundation.org: fix build]
[hughd@google.com: shmem fix]
Signed-off-by: Ying Han &lt;yinghan@google.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Acked-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Two new stats in per-memcg memory.stat which tracks the number of page
faults and number of major page faults.

  "pgfault"
  "pgmajfault"

They are different from "pgpgin"/"pgpgout" stat which count number of
pages charged/discharged to the cgroup and have no meaning of reading/
writing page to disk.

It is valuable to track the two stats for both measuring application's
performance as well as the efficiency of the kernel page reclaim path.
Counting pagefaults per process is useful, but we also need the aggregated
value since processes are monitored and controlled in cgroup basis in
memcg.

Functional test: check the total number of pgfault/pgmajfault of all
memcgs and compare with global vmstat value:

  $ cat /proc/vmstat | grep fault
  pgfault 1070751
  pgmajfault 553

  $ cat /dev/cgroup/memory.stat | grep fault
  pgfault 1071138
  pgmajfault 553
  total_pgfault 1071142
  total_pgmajfault 553

  $ cat /dev/cgroup/A/memory.stat | grep fault
  pgfault 199
  pgmajfault 0
  total_pgfault 199
  total_pgmajfault 0

Performance test: run page fault test(pft) wit 16 thread on faulting in
15G anon pages in 16G container.  There is no regression noticed on the
"flt/cpu/s"

Sample output from pft:

  TAG pft:anon-sys-default:
    Gb  Thr CLine   User     System     Wall    flt/cpu/s fault/wsec
    15   16   1     0.67s   233.41s    14.76s   16798.546 266356.260

  +-------------------------------------------------------------------------+
      N           Min           Max        Median           Avg        Stddev
  x  10     16682.962     17344.027     16913.524     16928.812      166.5362
  +  10     16695.568     16923.896     16820.604     16824.652     84.816568
  No difference proven at 95.0% confidence

[akpm@linux-foundation.org: fix build]
[hughd@google.com: shmem fix]
Signed-off-by: Ying Han &lt;yinghan@google.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Acked-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem</title>
<updated>2011-05-26T17:50:56+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-05-26T17:50:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f8d613e2a665bf1be9628a3c3f9bafe7599b32c0'/>
<id>f8d613e2a665bf1be9628a3c3f9bafe7599b32c0</id>
<content type='text'>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem:
  xen: cleancache shim to Xen Transcendent Memory
  ocfs2: add cleancache support
  ext4: add cleancache support
  btrfs: add cleancache support
  ext3: add cleancache support
  mm/fs: add hooks to support cleancache
  mm: cleancache core ops functions and config
  fs: add field to superblock to support cleancache
  mm/fs: cleancache documentation

Fix up trivial conflict in fs/btrfs/extent_io.c due to includes
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem:
  xen: cleancache shim to Xen Transcendent Memory
  ocfs2: add cleancache support
  ext4: add cleancache support
  btrfs: add cleancache support
  ext3: add cleancache support
  mm/fs: add hooks to support cleancache
  mm: cleancache core ops functions and config
  fs: add field to superblock to support cleancache
  mm/fs: cleancache documentation

Fix up trivial conflict in fs/btrfs/extent_io.c due to includes
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/fs: add hooks to support cleancache</title>
<updated>2011-05-26T16:01:43+00:00</updated>
<author>
<name>Dan Magenheimer</name>
<email>dan.magenheimer@oracle.com</email>
</author>
<published>2011-05-26T16:01:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c515e1fd361c2a08a9c2eb139396ec30a4f477dc'/>
<id>c515e1fd361c2a08a9c2eb139396ec30a4f477dc</id>
<content type='text'>
This fourth patch of eight in this cleancache series provides the
core hooks in VFS for: initializing cleancache per filesystem;
capturing clean pages reclaimed by page cache; attempting to get
pages from cleancache before filesystem read; and ensuring coherency
between pagecache, disk, and cleancache.  Note that the placement
of these hooks was stable from 2.6.18 to 2.6.38; a minor semantic
change was required due to a patchset in 2.6.39.

All hooks become no-ops if CONFIG_CLEANCACHE is unset, or become
a check of a boolean global if CONFIG_CLEANCACHE is set but no
cleancache "backend" has claimed cleancache_ops.

Details and a FAQ can be found in Documentation/vm/cleancache.txt

[v8: minchan.kim@gmail.com: adapt to new remove_from_page_cache function]
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
Signed-off-by: Dan Magenheimer &lt;dan.magenheimer@oracle.com&gt;
Reviewed-by: Jeremy Fitzhardinge &lt;jeremy@goop.org&gt;
Reviewed-by: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Matthew Wilcox &lt;matthew@wil.cx&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Rik Van Riel &lt;riel@redhat.com&gt;
Cc: Jan Beulich &lt;JBeulich@novell.com&gt;
Cc: Andreas Dilger &lt;adilger@sun.com&gt;
Cc: Ted Ts'o &lt;tytso@mit.edu&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Cc: Joel Becker &lt;joel.becker@oracle.com&gt;
Cc: Nitin Gupta &lt;ngupta@vflare.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This fourth patch of eight in this cleancache series provides the
core hooks in VFS for: initializing cleancache per filesystem;
capturing clean pages reclaimed by page cache; attempting to get
pages from cleancache before filesystem read; and ensuring coherency
between pagecache, disk, and cleancache.  Note that the placement
of these hooks was stable from 2.6.18 to 2.6.38; a minor semantic
change was required due to a patchset in 2.6.39.

All hooks become no-ops if CONFIG_CLEANCACHE is unset, or become
a check of a boolean global if CONFIG_CLEANCACHE is set but no
cleancache "backend" has claimed cleancache_ops.

Details and a FAQ can be found in Documentation/vm/cleancache.txt

[v8: minchan.kim@gmail.com: adapt to new remove_from_page_cache function]
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
Signed-off-by: Dan Magenheimer &lt;dan.magenheimer@oracle.com&gt;
Reviewed-by: Jeremy Fitzhardinge &lt;jeremy@goop.org&gt;
Reviewed-by: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Matthew Wilcox &lt;matthew@wil.cx&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Rik Van Riel &lt;riel@redhat.com&gt;
Cc: Jan Beulich &lt;JBeulich@novell.com&gt;
Cc: Andreas Dilger &lt;adilger@sun.com&gt;
Cc: Ted Ts'o &lt;tytso@mit.edu&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Cc: Joel Becker &lt;joel.becker@oracle.com&gt;
Cc: Nitin Gupta &lt;ngupta@vflare.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>readahead: trigger mmap sequential readahead on PG_readahead</title>
<updated>2011-05-25T15:39:27+00:00</updated>
<author>
<name>Wu Fengguang</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2011-05-25T00:12:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2cbea1d3ab11946885d37a2461072ee4d687cb4e'/>
<id>2cbea1d3ab11946885d37a2461072ee4d687cb4e</id>
<content type='text'>
Previously the mmap sequential readahead is triggered by updating
ra-&gt;prev_pos on each page fault and compare it with current page offset.

It costs dirtying the cache line on each _minor_ page fault.  So remove
the ra-&gt;prev_pos recording, and instead tag PG_readahead to trigger the
possible sequential readahead.  It's not only more simple, but also will
work more reliably and reduce cache line bouncing on concurrent page
faults on shared struct file.

In the mosbench exim benchmark which does multi-threaded page faults on
shared struct file, the ra-&gt;mmap_miss and ra-&gt;prev_pos updates are found
to cause excessive cache line bouncing on tmpfs, which actually disabled
readahead totally (shmem_backing_dev_info.ra_pages == 0).

So remove the ra-&gt;prev_pos recording, and instead tag PG_readahead to
trigger the possible sequential readahead.  It's not only more simple, but
also will work more reliably on concurrent reads on shared struct file.

Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Tested-by: Tim Chen &lt;tim.c.chen@intel.com&gt;
Reported-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously the mmap sequential readahead is triggered by updating
ra-&gt;prev_pos on each page fault and compare it with current page offset.

It costs dirtying the cache line on each _minor_ page fault.  So remove
the ra-&gt;prev_pos recording, and instead tag PG_readahead to trigger the
possible sequential readahead.  It's not only more simple, but also will
work more reliably and reduce cache line bouncing on concurrent page
faults on shared struct file.

In the mosbench exim benchmark which does multi-threaded page faults on
shared struct file, the ra-&gt;mmap_miss and ra-&gt;prev_pos updates are found
to cause excessive cache line bouncing on tmpfs, which actually disabled
readahead totally (shmem_backing_dev_info.ra_pages == 0).

So remove the ra-&gt;prev_pos recording, and instead tag PG_readahead to
trigger the possible sequential readahead.  It's not only more simple, but
also will work more reliably on concurrent reads on shared struct file.

Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Tested-by: Tim Chen &lt;tim.c.chen@intel.com&gt;
Reported-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>readahead: reduce unnecessary mmap_miss increases</title>
<updated>2011-05-25T15:39:26+00:00</updated>
<author>
<name>Andi Kleen</name>
<email>ak@linux.intel.com</email>
</author>
<published>2011-05-25T00:12:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=207d04baa3591a354711e863dd90087fc75873b3'/>
<id>207d04baa3591a354711e863dd90087fc75873b3</id>
<content type='text'>
The original INT_MAX is too large, reduce it to

- avoid unnecessarily dirtying/bouncing the cache line

- restore mmap read-around faster on changed access pattern

Background: in the mosbench exim benchmark which does multi-threaded page
faults on shared struct file, the ra-&gt;mmap_miss updates are found to cause
excessive cache line bouncing on tmpfs.  The ra state updates are needless
for tmpfs because it actually disabled readahead totally
(shmem_backing_dev_info.ra_pages == 0).

Tested-by: Tim Chen &lt;tim.c.chen@intel.com&gt;
Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The original INT_MAX is too large, reduce it to

- avoid unnecessarily dirtying/bouncing the cache line

- restore mmap read-around faster on changed access pattern

Background: in the mosbench exim benchmark which does multi-threaded page
faults on shared struct file, the ra-&gt;mmap_miss updates are found to cause
excessive cache line bouncing on tmpfs.  The ra state updates are needless
for tmpfs because it actually disabled readahead totally
(shmem_backing_dev_info.ra_pages == 0).

Tested-by: Tim Chen &lt;tim.c.chen@intel.com&gt;
Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>readahead: return early when readahead is disabled</title>
<updated>2011-05-25T15:39:26+00:00</updated>
<author>
<name>Wu Fengguang</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2011-05-25T00:12:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=275b12bf5486f6f531111fd3d7dbbf01df427cfe'/>
<id>275b12bf5486f6f531111fd3d7dbbf01df427cfe</id>
<content type='text'>
Reduce readahead overheads by returning early in do_sync_mmap_readahead().

tmpfs has ra_pages=0 and it can page fault really fast (not constraint by
IO if not swapping).

Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Tested-by: Tim Chen &lt;tim.c.chen@intel.com&gt;
Reported-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reduce readahead overheads by returning early in do_sync_mmap_readahead().

tmpfs has ra_pages=0 and it can page fault really fast (not constraint by
IO if not swapping).

Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Tested-by: Tim Chen &lt;tim.c.chen@intel.com&gt;
Reported-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
