<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/fscache.h, branch v4.11-rc3</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>FS-Cache: Provide the ability to enable/disable cookies</title>
<updated>2013-09-27T17:40:25+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2013-09-20T23:09:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=94d30ae90a00cafe686c1057be57f4885f963abf'/>
<id>94d30ae90a00cafe686c1057be57f4885f963abf</id>
<content type='text'>
Provide the ability to enable and disable fscache cookies.  A disabled cookie
will reject or ignore further requests to:

	Acquire a child cookie
	Invalidate and update backing objects
	Check the consistency of a backing object
	Allocate storage for backing page
	Read backing pages
	Write to backing pages

but still allows:

	Checks/waits on the completion of already in-progress objects
	Uncaching of pages
	Relinquishment of cookies

Two new operations are provided:

 (1) Disable a cookie:

	void fscache_disable_cookie(struct fscache_cookie *cookie,
				    bool invalidate);

     If the cookie is not already disabled, this locks the cookie against other
     dis/enablement ops, marks the cookie as being disabled, discards or
     invalidates any backing objects and waits for cessation of activity on any
     associated object.

     This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
     but it reinitialises the cookie such that it can be reenabled.

     All possible failures are handled internally.  The caller should consider
     calling fscache_uncache_all_inode_pages() afterwards to make sure all page
     markings are cleared up.

 (2) Enable a cookie:

	void fscache_enable_cookie(struct fscache_cookie *cookie,
				   bool (*can_enable)(void *data),
				   void *data)

     If the cookie is not already enabled, this locks the cookie against other
     dis/enablement ops, invokes can_enable() and, if the cookie is not an
     index cookie, will begin the procedure of acquiring backing objects.

     The optional can_enable() function is passed the data argument and returns
     a ruling as to whether or not enablement should actually be permitted to
     begin.

     All possible failures are handled internally.  The cookie will only be
     marked as enabled if provisional backing objects are allocated.

A later patch will introduce these to NFS.  Cookie enablement during nfs_open()
is then contingent on i_writecount &lt;= 0.  can_enable() checks for a race
between open(O_RDONLY) and open(O_WRONLY/O_RDWR).  This simplifies NFS's cookie
handling and allows us to get rid of open(O_RDONLY) accidentally introducing
caching to an inode that's open for writing already.

One operation has its API modified:

 (3) Acquire a cookie.

	struct fscache_cookie *fscache_acquire_cookie(
		struct fscache_cookie *parent,
		const struct fscache_cookie_def *def,
		void *netfs_data,
		bool enable);

     This now has an additional argument that indicates whether the requested
     cookie should be enabled by default.  It doesn't need the can_enable()
     function because the caller must prevent multiple calls for the same netfs
     object and it doesn't need to take the enablement lock because no one else
     can get at the cookie before this returns.

Signed-off-by: David Howells &lt;dhowells@redhat.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Provide the ability to enable and disable fscache cookies.  A disabled cookie
will reject or ignore further requests to:

	Acquire a child cookie
	Invalidate and update backing objects
	Check the consistency of a backing object
	Allocate storage for backing page
	Read backing pages
	Write to backing pages

but still allows:

	Checks/waits on the completion of already in-progress objects
	Uncaching of pages
	Relinquishment of cookies

Two new operations are provided:

 (1) Disable a cookie:

	void fscache_disable_cookie(struct fscache_cookie *cookie,
				    bool invalidate);

     If the cookie is not already disabled, this locks the cookie against other
     dis/enablement ops, marks the cookie as being disabled, discards or
     invalidates any backing objects and waits for cessation of activity on any
     associated object.

     This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
     but it reinitialises the cookie such that it can be reenabled.

     All possible failures are handled internally.  The caller should consider
     calling fscache_uncache_all_inode_pages() afterwards to make sure all page
     markings are cleared up.

 (2) Enable a cookie:

	void fscache_enable_cookie(struct fscache_cookie *cookie,
				   bool (*can_enable)(void *data),
				   void *data)

     If the cookie is not already enabled, this locks the cookie against other
     dis/enablement ops, invokes can_enable() and, if the cookie is not an
     index cookie, will begin the procedure of acquiring backing objects.

     The optional can_enable() function is passed the data argument and returns
     a ruling as to whether or not enablement should actually be permitted to
     begin.

     All possible failures are handled internally.  The cookie will only be
     marked as enabled if provisional backing objects are allocated.

A later patch will introduce these to NFS.  Cookie enablement during nfs_open()
is then contingent on i_writecount &lt;= 0.  can_enable() checks for a race
between open(O_RDONLY) and open(O_WRONLY/O_RDWR).  This simplifies NFS's cookie
handling and allows us to get rid of open(O_RDONLY) accidentally introducing
caching to an inode that's open for writing already.

One operation has its API modified:

 (3) Acquire a cookie.

	struct fscache_cookie *fscache_acquire_cookie(
		struct fscache_cookie *parent,
		const struct fscache_cookie_def *def,
		void *netfs_data,
		bool enable);

     This now has an additional argument that indicates whether the requested
     cookie should be enabled by default.  It doesn't need the can_enable()
     function because the caller must prevent multiple calls for the same netfs
     object and it doesn't need to take the enablement lock because no one else
     can get at the cookie before this returns.

Signed-off-by: David Howells &lt;dhowells@redhat.com
</pre>
</div>
</content>
</entry>
<entry>
<title>fscache: Netfs function for cleanup post readpages</title>
<updated>2013-09-06T08:17:30+00:00</updated>
<author>
<name>Milosz Tanski</name>
<email>milosz@adfin.com</email>
</author>
<published>2013-08-21T21:30:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5a6f282a2052bb13171b53f03b34501cf72c33f1'/>
<id>5a6f282a2052bb13171b53f03b34501cf72c33f1</id>
<content type='text'>
Currently the fscache code expect the netfs to call fscache_readpages_or_alloc
inside the aops readpages callback.  It marks all the pages in the list
provided by readahead with PG_private_2.  In the cases that the netfs fails to
read all the pages (which is legal) it ends up returning to the readahead and
triggering a BUG.  This happens because the page list still contains marked
pages.

This patch implements a simple fscache_readpages_cancel function that the netfs
should call before returning from readpages.  It will revoke the pages from the
underlying cache backend and unmark them.

The problem was originally worked out in the Ceph devel tree, but it also
occurs in CIFS.  It appears that NFS, AFS and 9P are okay as read_cache_pages()
will clean up the unprocessed pages in the case of an error.

This can be used to address the following oops:

[12410647.597278] BUG: Bad page state in process petabucket  pfn:3d504e
[12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:
	(null) index:0x0
[12410647.597298] page flags: 0x200000000001000(private_2)

...

[12410647.597334] Call Trace:
[12410647.597345]  [&lt;ffffffff815523f2&gt;] dump_stack+0x19/0x1b
[12410647.597356]  [&lt;ffffffff8111def7&gt;] bad_page+0xc7/0x120
[12410647.597359]  [&lt;ffffffff8111e49e&gt;] free_pages_prepare+0x10e/0x120
[12410647.597361]  [&lt;ffffffff8111fc80&gt;] free_hot_cold_page+0x40/0x170
[12410647.597363]  [&lt;ffffffff81123507&gt;] __put_single_page+0x27/0x30
[12410647.597365]  [&lt;ffffffff81123df5&gt;] put_page+0x25/0x40
[12410647.597376]  [&lt;ffffffffa02bdcf9&gt;] ceph_readpages+0x2e9/0x6e0 [ceph]
[12410647.597379]  [&lt;ffffffff81122a8f&gt;] __do_page_cache_readahead+0x1af/0x260
[12410647.597382]  [&lt;ffffffff81122ea1&gt;] ra_submit+0x21/0x30
[12410647.597384]  [&lt;ffffffff81118f64&gt;] filemap_fault+0x254/0x490
[12410647.597387]  [&lt;ffffffff8113a74f&gt;] __do_fault+0x6f/0x4e0
[12410647.597391]  [&lt;ffffffff810125bd&gt;] ? __switch_to+0x16d/0x4a0
[12410647.597395]  [&lt;ffffffff810865ba&gt;] ? finish_task_switch+0x5a/0xc0
[12410647.597398]  [&lt;ffffffff8113d856&gt;] handle_pte_fault+0xf6/0x930
[12410647.597401]  [&lt;ffffffff81008c33&gt;] ? pte_mfn_to_pfn+0x93/0x110
[12410647.597403]  [&lt;ffffffff81008cce&gt;] ? xen_pmd_val+0xe/0x10
[12410647.597405]  [&lt;ffffffff81005469&gt;] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[12410647.597407]  [&lt;ffffffff8113f361&gt;] handle_mm_fault+0x251/0x370
[12410647.597411]  [&lt;ffffffff812b0ac4&gt;] ? call_rwsem_down_read_failed+0x14/0x30
[12410647.597414]  [&lt;ffffffff8155bffa&gt;] __do_page_fault+0x1aa/0x550
[12410647.597418]  [&lt;ffffffff8108011d&gt;] ? up_write+0x1d/0x20
[12410647.597422]  [&lt;ffffffff8113141c&gt;] ? vm_mmap_pgoff+0xbc/0xe0
[12410647.597425]  [&lt;ffffffff81143bb8&gt;] ? SyS_mmap_pgoff+0xd8/0x240
[12410647.597427]  [&lt;ffffffff8155c3ae&gt;] do_page_fault+0xe/0x10
[12410647.597431]  [&lt;ffffffff81558818&gt;] page_fault+0x28/0x30

Signed-off-by: Milosz Tanski &lt;milosz@adfin.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently the fscache code expect the netfs to call fscache_readpages_or_alloc
inside the aops readpages callback.  It marks all the pages in the list
provided by readahead with PG_private_2.  In the cases that the netfs fails to
read all the pages (which is legal) it ends up returning to the readahead and
triggering a BUG.  This happens because the page list still contains marked
pages.

This patch implements a simple fscache_readpages_cancel function that the netfs
should call before returning from readpages.  It will revoke the pages from the
underlying cache backend and unmark them.

The problem was originally worked out in the Ceph devel tree, but it also
occurs in CIFS.  It appears that NFS, AFS and 9P are okay as read_cache_pages()
will clean up the unprocessed pages in the case of an error.

This can be used to address the following oops:

[12410647.597278] BUG: Bad page state in process petabucket  pfn:3d504e
[12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:
	(null) index:0x0
[12410647.597298] page flags: 0x200000000001000(private_2)

...

[12410647.597334] Call Trace:
[12410647.597345]  [&lt;ffffffff815523f2&gt;] dump_stack+0x19/0x1b
[12410647.597356]  [&lt;ffffffff8111def7&gt;] bad_page+0xc7/0x120
[12410647.597359]  [&lt;ffffffff8111e49e&gt;] free_pages_prepare+0x10e/0x120
[12410647.597361]  [&lt;ffffffff8111fc80&gt;] free_hot_cold_page+0x40/0x170
[12410647.597363]  [&lt;ffffffff81123507&gt;] __put_single_page+0x27/0x30
[12410647.597365]  [&lt;ffffffff81123df5&gt;] put_page+0x25/0x40
[12410647.597376]  [&lt;ffffffffa02bdcf9&gt;] ceph_readpages+0x2e9/0x6e0 [ceph]
[12410647.597379]  [&lt;ffffffff81122a8f&gt;] __do_page_cache_readahead+0x1af/0x260
[12410647.597382]  [&lt;ffffffff81122ea1&gt;] ra_submit+0x21/0x30
[12410647.597384]  [&lt;ffffffff81118f64&gt;] filemap_fault+0x254/0x490
[12410647.597387]  [&lt;ffffffff8113a74f&gt;] __do_fault+0x6f/0x4e0
[12410647.597391]  [&lt;ffffffff810125bd&gt;] ? __switch_to+0x16d/0x4a0
[12410647.597395]  [&lt;ffffffff810865ba&gt;] ? finish_task_switch+0x5a/0xc0
[12410647.597398]  [&lt;ffffffff8113d856&gt;] handle_pte_fault+0xf6/0x930
[12410647.597401]  [&lt;ffffffff81008c33&gt;] ? pte_mfn_to_pfn+0x93/0x110
[12410647.597403]  [&lt;ffffffff81008cce&gt;] ? xen_pmd_val+0xe/0x10
[12410647.597405]  [&lt;ffffffff81005469&gt;] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[12410647.597407]  [&lt;ffffffff8113f361&gt;] handle_mm_fault+0x251/0x370
[12410647.597411]  [&lt;ffffffff812b0ac4&gt;] ? call_rwsem_down_read_failed+0x14/0x30
[12410647.597414]  [&lt;ffffffff8155bffa&gt;] __do_page_fault+0x1aa/0x550
[12410647.597418]  [&lt;ffffffff8108011d&gt;] ? up_write+0x1d/0x20
[12410647.597422]  [&lt;ffffffff8113141c&gt;] ? vm_mmap_pgoff+0xbc/0xe0
[12410647.597425]  [&lt;ffffffff81143bb8&gt;] ? SyS_mmap_pgoff+0xd8/0x240
[12410647.597427]  [&lt;ffffffff8155c3ae&gt;] do_page_fault+0xe/0x10
[12410647.597431]  [&lt;ffffffff81558818&gt;] page_fault+0x28/0x30

Signed-off-by: Milosz Tanski &lt;milosz@adfin.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: Add interface to check consistency of a cached object</title>
<updated>2013-09-06T08:17:30+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2013-08-21T21:29:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=da9803bc8812f5bd3b26baaa90e515b843c65ff7'/>
<id>da9803bc8812f5bd3b26baaa90e515b843c65ff7</id>
<content type='text'>
Extend the fscache netfs API so that the netfs can ask as to whether a cache
object is up to date with respect to its corresponding netfs object:

	int fscache_check_consistency(struct fscache_cookie *cookie)

This will call back to the netfs to check whether the auxiliary data associated
with a cookie is correct.  It returns 0 if it is and -ESTALE if it isn't; it
may also return -ENOMEM and -ERESTARTSYS.

The backends now have to implement a mandatory operation pointer:

	int (*check_consistency)(struct fscache_object *object)

that corresponds to the above API call.  FS-Cache takes care of pinning the
object and the cookie in memory and managing this call with respect to the
object state.

Original-author: Hongyi Jia &lt;jiayisuse@gmail.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Hongyi Jia &lt;jiayisuse@gmail.com&gt;
cc: Milosz Tanski &lt;milosz@adfin.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Extend the fscache netfs API so that the netfs can ask as to whether a cache
object is up to date with respect to its corresponding netfs object:

	int fscache_check_consistency(struct fscache_cookie *cookie)

This will call back to the netfs to check whether the auxiliary data associated
with a cookie is correct.  It returns 0 if it is and -ESTALE if it isn't; it
may also return -ENOMEM and -ERESTARTSYS.

The backends now have to implement a mandatory operation pointer:

	int (*check_consistency)(struct fscache_object *object)

that corresponds to the above API call.  FS-Cache takes care of pinning the
object and the cookie in memory and managing this call with respect to the
object state.

Original-author: Hongyi Jia &lt;jiayisuse@gmail.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Hongyi Jia &lt;jiayisuse@gmail.com&gt;
cc: Milosz Tanski &lt;milosz@adfin.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: Provide proper invalidation</title>
<updated>2012-12-20T22:04:07+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2012-12-20T21:52:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ef778e7ae67cd426c30cad43378b908f5eb0bad5'/>
<id>ef778e7ae67cd426c30cad43378b908f5eb0bad5</id>
<content type='text'>
Provide a proper invalidation method rather than relying on the netfs retiring
the cookie it has and getting a new one.  The problem with this is that isn't
easy for the netfs to make sure that it has completed/cancelled all its
outstanding storage and retrieval operations on the cookie it is retiring.

Instead, have the cache provide an invalidation method that will cancel or wait
for all currently outstanding operations before invalidating the cache, and
will cause new operations to queue up behind that.  Whilst invalidation is in
progress, some requests will be rejected until the cache can stack a barrier on
the operation queue to cause new operations to be deferred behind it.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Provide a proper invalidation method rather than relying on the netfs retiring
the cookie it has and getting a new one.  The problem with this is that isn't
easy for the netfs to make sure that it has completed/cancelled all its
outstanding storage and retrieval operations on the cookie it is retiring.

Instead, have the cache provide an invalidation method that will cancel or wait
for all currently outstanding operations before invalidating the cache, and
will cause new operations to queue up behind that.  Whilst invalidation is in
progress, some requests will be rejected until the cache can stack a barrier on
the operation queue to cause new operations to be deferred behind it.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>CacheFiles: Fix the marking of cached pages</title>
<updated>2012-12-20T21:54:30+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2012-12-20T21:52:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c4d6d8dbf335c7fa47341654a37c53a512b519bb'/>
<id>c4d6d8dbf335c7fa47341654a37c53a512b519bb</id>
<content type='text'>
Under some circumstances CacheFiles defers the marking of pages with PG_fscache
so that it can take advantage of pagevecs to reduce the number of calls to
fscache_mark_pages_cached() and the netfs's hook to keep track of this.

There are, however, two problems with this:

 (1) It can lead to the PG_fscache mark being applied _after_ the page is set
     PG_uptodate and unlocked (by the call to fscache_end_io()).

 (2) CacheFiles's ref on the page is dropped immediately following
     fscache_end_io() - and so may not still be held when the mark is applied.
     This can lead to the page being passed back to the allocator before the
     mark is applied.

Fix this by, where appropriate, marking the page before calling
fscache_end_io() and releasing the page.  This means that we can't take
advantage of pagevecs and have to make a separate call for each page to the
marking routines.

The symptoms of this are Bad Page state errors cropping up under memory
pressure, for example:

BUG: Bad page state in process tar  pfn:002da
page:ffffea0000009fb0 count:0 mapcount:0 mapping:          (null) index:0x1447
page flags: 0x1000(private_2)
Pid: 4574, comm: tar Tainted: G        W   3.1.0-rc4-fsdevel+ #1064
Call Trace:
 [&lt;ffffffff8109583c&gt;] ? dump_page+0xb9/0xbe
 [&lt;ffffffff81095916&gt;] bad_page+0xd5/0xea
 [&lt;ffffffff81095d82&gt;] get_page_from_freelist+0x35b/0x46a
 [&lt;ffffffff810961f3&gt;] __alloc_pages_nodemask+0x362/0x662
 [&lt;ffffffff810989da&gt;] __do_page_cache_readahead+0x13a/0x267
 [&lt;ffffffff81098942&gt;] ? __do_page_cache_readahead+0xa2/0x267
 [&lt;ffffffff81098d7b&gt;] ra_submit+0x1c/0x20
 [&lt;ffffffff8109900a&gt;] ondemand_readahead+0x28b/0x29a
 [&lt;ffffffff81098ee2&gt;] ? ondemand_readahead+0x163/0x29a
 [&lt;ffffffff810990ce&gt;] page_cache_sync_readahead+0x38/0x3a
 [&lt;ffffffff81091d8a&gt;] generic_file_aio_read+0x2ab/0x67e
 [&lt;ffffffffa008cfbe&gt;] nfs_file_read+0xa4/0xc9 [nfs]
 [&lt;ffffffff810c22c4&gt;] do_sync_read+0xba/0xfa
 [&lt;ffffffff81177a47&gt;] ? security_file_permission+0x7b/0x84
 [&lt;ffffffff810c25dd&gt;] ? rw_verify_area+0xab/0xc8
 [&lt;ffffffff810c29a4&gt;] vfs_read+0xaa/0x13a
 [&lt;ffffffff810c2a79&gt;] sys_read+0x45/0x6c
 [&lt;ffffffff813ac37b&gt;] system_call_fastpath+0x16/0x1b

As can be seen, PG_private_2 (== PG_fscache) is set in the page flags.

Instrumenting fscache_mark_pages_cached() to verify whether page-&gt;mapping was
set appropriately showed that sometimes it wasn't.  This led to the discovery
that sometimes the page has apparently been reclaimed by the time the marker
got to see it.

Reported-by: M. Stevens &lt;m@tippett.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Under some circumstances CacheFiles defers the marking of pages with PG_fscache
so that it can take advantage of pagevecs to reduce the number of calls to
fscache_mark_pages_cached() and the netfs's hook to keep track of this.

There are, however, two problems with this:

 (1) It can lead to the PG_fscache mark being applied _after_ the page is set
     PG_uptodate and unlocked (by the call to fscache_end_io()).

 (2) CacheFiles's ref on the page is dropped immediately following
     fscache_end_io() - and so may not still be held when the mark is applied.
     This can lead to the page being passed back to the allocator before the
     mark is applied.

Fix this by, where appropriate, marking the page before calling
fscache_end_io() and releasing the page.  This means that we can't take
advantage of pagevecs and have to make a separate call for each page to the
marking routines.

The symptoms of this are Bad Page state errors cropping up under memory
pressure, for example:

BUG: Bad page state in process tar  pfn:002da
page:ffffea0000009fb0 count:0 mapcount:0 mapping:          (null) index:0x1447
page flags: 0x1000(private_2)
Pid: 4574, comm: tar Tainted: G        W   3.1.0-rc4-fsdevel+ #1064
Call Trace:
 [&lt;ffffffff8109583c&gt;] ? dump_page+0xb9/0xbe
 [&lt;ffffffff81095916&gt;] bad_page+0xd5/0xea
 [&lt;ffffffff81095d82&gt;] get_page_from_freelist+0x35b/0x46a
 [&lt;ffffffff810961f3&gt;] __alloc_pages_nodemask+0x362/0x662
 [&lt;ffffffff810989da&gt;] __do_page_cache_readahead+0x13a/0x267
 [&lt;ffffffff81098942&gt;] ? __do_page_cache_readahead+0xa2/0x267
 [&lt;ffffffff81098d7b&gt;] ra_submit+0x1c/0x20
 [&lt;ffffffff8109900a&gt;] ondemand_readahead+0x28b/0x29a
 [&lt;ffffffff81098ee2&gt;] ? ondemand_readahead+0x163/0x29a
 [&lt;ffffffff810990ce&gt;] page_cache_sync_readahead+0x38/0x3a
 [&lt;ffffffff81091d8a&gt;] generic_file_aio_read+0x2ab/0x67e
 [&lt;ffffffffa008cfbe&gt;] nfs_file_read+0xa4/0xc9 [nfs]
 [&lt;ffffffff810c22c4&gt;] do_sync_read+0xba/0xfa
 [&lt;ffffffff81177a47&gt;] ? security_file_permission+0x7b/0x84
 [&lt;ffffffff810c25dd&gt;] ? rw_verify_area+0xab/0xc8
 [&lt;ffffffff810c29a4&gt;] vfs_read+0xaa/0x13a
 [&lt;ffffffff810c2a79&gt;] sys_read+0x45/0x6c
 [&lt;ffffffff813ac37b&gt;] system_call_fastpath+0x16/0x1b

As can be seen, PG_private_2 (== PG_fscache) is set in the page flags.

Instrumenting fscache_mark_pages_cached() to verify whether page-&gt;mapping was
set appropriately showed that sometimes it wasn't.  This led to the discovery
that sometimes the page has apparently been reclaimed by the time the marker
got to see it.

Reported-by: M. Stevens &lt;m@tippett.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: Add a helper to bulk uncache pages on an inode</title>
<updated>2011-07-07T20:21:56+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2011-07-07T11:19:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c902ce1bfb40d8b049bd2319b388b4b68b04bc27'/>
<id>c902ce1bfb40d8b049bd2319b388b4b68b04bc27</id>
<content type='text'>
Add an FS-Cache helper to bulk uncache pages on an inode.  This will
only work for the circumstance where the pages in the cache correspond
1:1 with the pages attached to an inode's page cache.

This is required for CIFS and NFS: When disabling inode cookie, we were
returning the cookie and setting cifsi-&gt;fscache to NULL but failed to
invalidate any previously mapped pages.  This resulted in "Bad page
state" errors and manifested in other kind of errors when running
fsstress.  Fix it by uncaching mapped pages when we disable the inode
cookie.

This patch should fix the following oops and "Bad page state" errors
seen during fsstress testing.

  ------------[ cut here ]------------
  kernel BUG at fs/cachefiles/namei.c:201!
  invalid opcode: 0000 [#1] SMP
  Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs
  RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  RSP: 0018:ffff88002ce6dd00  EFLAGS: 00010282
  RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282
  RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300
  R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840
  R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0
  FS:  00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0)
  Stack:
   0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00
   ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380
   ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56
  Call Trace:
   cachefiles_lookup_object+0x78/0xd4 [cachefiles]
   fscache_lookup_object+0x131/0x16d [fscache]
   fscache_object_work_func+0x1bc/0x669 [fscache]
   process_one_work+0x186/0x298
   worker_thread+0xda/0x15d
   kthread+0x84/0x8c
   kernel_thread_helper+0x4/0x10
  RIP  cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  ---[ end trace 1d481c9af1804caa ]---

I tested the uncaching by the following means:

 (1) Create a big file on my NFS server (104857600 bytes).

 (2) Read the file into the cache with md5sum on the NFS client.  Look in
     /proc/fs/fscache/stats:

	Pages  : mrk=25601 unc=0

 (3) Open the file for read/write ("bash 5&lt;&gt;/warthog/bigfile").  Look in proc
     again:

	Pages  : mrk=25601 unc=25601

Reported-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-and-Tested-by: Suresh Jayaraman &lt;sjayaraman@suse.de&gt;
cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add an FS-Cache helper to bulk uncache pages on an inode.  This will
only work for the circumstance where the pages in the cache correspond
1:1 with the pages attached to an inode's page cache.

This is required for CIFS and NFS: When disabling inode cookie, we were
returning the cookie and setting cifsi-&gt;fscache to NULL but failed to
invalidate any previously mapped pages.  This resulted in "Bad page
state" errors and manifested in other kind of errors when running
fsstress.  Fix it by uncaching mapped pages when we disable the inode
cookie.

This patch should fix the following oops and "Bad page state" errors
seen during fsstress testing.

  ------------[ cut here ]------------
  kernel BUG at fs/cachefiles/namei.c:201!
  invalid opcode: 0000 [#1] SMP
  Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs
  RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  RSP: 0018:ffff88002ce6dd00  EFLAGS: 00010282
  RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282
  RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300
  R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840
  R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0
  FS:  00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0)
  Stack:
   0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00
   ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380
   ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56
  Call Trace:
   cachefiles_lookup_object+0x78/0xd4 [cachefiles]
   fscache_lookup_object+0x131/0x16d [fscache]
   fscache_object_work_func+0x1bc/0x669 [fscache]
   process_one_work+0x186/0x298
   worker_thread+0xda/0x15d
   kthread+0x84/0x8c
   kernel_thread_helper+0x4/0x10
  RIP  cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  ---[ end trace 1d481c9af1804caa ]---

I tested the uncaching by the following means:

 (1) Create a big file on my NFS server (104857600 bytes).

 (2) Read the file into the cache with md5sum on the NFS client.  Look in
     /proc/fs/fscache/stats:

	Pages  : mrk=25601 unc=0

 (3) Open the file for read/write ("bash 5&lt;&gt;/warthog/bigfile").  Look in proc
     again:

	Pages  : mrk=25601 unc=25601

Reported-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-and-Tested-by: Suresh Jayaraman &lt;sjayaraman@suse.de&gt;
cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix common misspellings</title>
<updated>2011-03-31T14:26:23+00:00</updated>
<author>
<name>Lucas De Marchi</name>
<email>lucas.demarchi@profusion.mobi</email>
</author>
<published>2011-03-31T01:57:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=25985edcedea6396277003854657b5f3cb31a628'/>
<id>25985edcedea6396277003854657b5f3cb31a628</id>
<content type='text'>
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@profusion.mobi&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@profusion.mobi&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fscache: fix missing kerneldoc annotation</title>
<updated>2010-07-11T20:22:23+00:00</updated>
<author>
<name>Suresh Jayaraman</name>
<email>sjayaraman@suse.de</email>
</author>
<published>2010-07-06T12:59:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=49a3df804bec09b8ee8196f79b81757e95cc6de4'/>
<id>49a3df804bec09b8ee8196f79b81757e95cc6de4</id>
<content type='text'>
.. and make kerneldoc scripts happy.

Signed-off-by: Suresh Jayaraman &lt;sjayaraman@suse.de&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
.. and make kerneldoc scripts happy.

Signed-off-by: Suresh Jayaraman &lt;sjayaraman@suse.de&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fscache: fix a trivial typo in the comment</title>
<updated>2010-07-11T20:21:26+00:00</updated>
<author>
<name>Suresh Jayaraman</name>
<email>sjayaraman@suse.de</email>
</author>
<published>2010-07-06T12:42:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ab0cfb928a3839e21942a28a86ad88e56ea3b136'/>
<id>ab0cfb928a3839e21942a28a86ad88e56ea3b136</id>
<content type='text'>
Signed-off-by: Suresh Jayaraman &lt;sjayaraman@suse.de&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Suresh Jayaraman &lt;sjayaraman@suse.de&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: Handle pages pending storage that get evicted under OOM conditions</title>
<updated>2009-11-19T18:11:35+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2009-11-19T18:11:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=201a15428bd54f83eccec8b7c64a04b8f9431204'/>
<id>201a15428bd54f83eccec8b7c64a04b8f9431204</id>
<content type='text'>
Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache.  Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.

The problem is typified by the following trace of a stuck process:

	kslowd005     D 0000000000000000     0  4253      2 0x00000080
	 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
	 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
	Call Trace:
	 [&lt;ffffffffa00782d8&gt;] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
	 [&lt;ffffffff8104c0f1&gt;] ? autoremove_wake_function+0x0/0x34
	 [&lt;ffffffffa0078240&gt;] ? __fscache_check_page_write+0x63/0x70 [fscache]
	 [&lt;ffffffffa00b671d&gt;] nfs_fscache_release_page+0x4e/0xc4 [nfs]
	 [&lt;ffffffffa00927f0&gt;] nfs_release_page+0x3c/0x41 [nfs]
	 [&lt;ffffffff810885d3&gt;] try_to_release_page+0x32/0x3b
	 [&lt;ffffffff81093203&gt;] shrink_page_list+0x316/0x4ac
	 [&lt;ffffffff8109372b&gt;] shrink_inactive_list+0x392/0x67c
	 [&lt;ffffffff813532fa&gt;] ? __mutex_unlock_slowpath+0x100/0x10b
	 [&lt;ffffffff81058df0&gt;] ? trace_hardirqs_on_caller+0x10c/0x130
	 [&lt;ffffffff8135330e&gt;] ? mutex_unlock+0x9/0xb
	 [&lt;ffffffff81093aa2&gt;] shrink_list+0x8d/0x8f
	 [&lt;ffffffff81093d1c&gt;] shrink_zone+0x278/0x33c
	 [&lt;ffffffff81052d6c&gt;] ? ktime_get_ts+0xad/0xba
	 [&lt;ffffffff81094b13&gt;] try_to_free_pages+0x22e/0x392
	 [&lt;ffffffff81091e24&gt;] ? isolate_pages_global+0x0/0x212
	 [&lt;ffffffff8108e743&gt;] __alloc_pages_nodemask+0x3dc/0x5cf
	 [&lt;ffffffff81089529&gt;] grab_cache_page_write_begin+0x65/0xaa
	 [&lt;ffffffff8110f8c0&gt;] ext3_write_begin+0x78/0x1eb
	 [&lt;ffffffff81089ec5&gt;] generic_file_buffered_write+0x109/0x28c
	 [&lt;ffffffff8103cb69&gt;] ? current_fs_time+0x22/0x29
	 [&lt;ffffffff8108a509&gt;] __generic_file_aio_write+0x350/0x385
	 [&lt;ffffffff8108a588&gt;] ? generic_file_aio_write+0x4a/0xae
	 [&lt;ffffffff8108a59e&gt;] generic_file_aio_write+0x60/0xae
	 [&lt;ffffffff810b2e82&gt;] do_sync_write+0xe3/0x120
	 [&lt;ffffffff8104c0f1&gt;] ? autoremove_wake_function+0x0/0x34
	 [&lt;ffffffff810b18e1&gt;] ? __dentry_open+0x1a5/0x2b8
	 [&lt;ffffffff810b1a76&gt;] ? dentry_open+0x82/0x89
	 [&lt;ffffffffa00e693c&gt;] cachefiles_write_page+0x298/0x335 [cachefiles]
	 [&lt;ffffffffa0077147&gt;] fscache_write_op+0x178/0x2c2 [fscache]
	 [&lt;ffffffffa0075656&gt;] fscache_op_execute+0x7a/0xd1 [fscache]
	 [&lt;ffffffff81082093&gt;] slow_work_execute+0x18f/0x2d1
	 [&lt;ffffffff8108239a&gt;] slow_work_thread+0x1c5/0x308
	 [&lt;ffffffff8104c0f1&gt;] ? autoremove_wake_function+0x0/0x34
	 [&lt;ffffffff810821d5&gt;] ? slow_work_thread+0x0/0x308
	 [&lt;ffffffff8104be91&gt;] kthread+0x7a/0x82
	 [&lt;ffffffff8100beda&gt;] child_rip+0xa/0x20
	 [&lt;ffffffff8100b87c&gt;] ? restore_args+0x0/0x30
	 [&lt;ffffffff8102ef83&gt;] ? tg_shares_up+0x171/0x227
	 [&lt;ffffffff8104be17&gt;] ? kthread+0x0/0x82
	 [&lt;ffffffff8100bed0&gt;] ? child_rip+0x0/0x20

In the above backtrace, the following is happening:

 (1) A page storage operation is being executed by a slow-work thread
     (fscache_write_op()).

 (2) FS-Cache farms the operation out to the cache to perform
     (cachefiles_write_page()).

 (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
     standard write (do_sync_write()) under KERNEL_DS directly from the netfs
     page.

 (4) However, for Ext3 to perform the write, it must allocate some memory, in
     particular, it must allocate at least one page cache page into which it
     can copy the data from the netfs page.

 (5) Under OOM conditions, the memory allocator can't immediately come up with
     a page, so it uses vmscan to find something to discard
     (try_to_free_pages()).

 (6) vmscan finds a clean netfs page it might be able to discard (possibly the
     one it's trying to write out).

 (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
     called with __GFP_WAIT, so the netfs decides to wait for the store to
     complete (__fscache_wait_on_page_write()).

 (8) This blocks a slow-work processing thread - possibly against itself.

The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.

To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed.  This means that some data won't make it into the
cache this time.  To support this, a new FS-Cache function is added
fscache_maybe_release_page() that replaces what the netfs releasepage()
functions used to do with respect to the cache.

The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan".  There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.

What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages.  If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache.  Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.

The problem is typified by the following trace of a stuck process:

	kslowd005     D 0000000000000000     0  4253      2 0x00000080
	 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
	 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
	Call Trace:
	 [&lt;ffffffffa00782d8&gt;] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
	 [&lt;ffffffff8104c0f1&gt;] ? autoremove_wake_function+0x0/0x34
	 [&lt;ffffffffa0078240&gt;] ? __fscache_check_page_write+0x63/0x70 [fscache]
	 [&lt;ffffffffa00b671d&gt;] nfs_fscache_release_page+0x4e/0xc4 [nfs]
	 [&lt;ffffffffa00927f0&gt;] nfs_release_page+0x3c/0x41 [nfs]
	 [&lt;ffffffff810885d3&gt;] try_to_release_page+0x32/0x3b
	 [&lt;ffffffff81093203&gt;] shrink_page_list+0x316/0x4ac
	 [&lt;ffffffff8109372b&gt;] shrink_inactive_list+0x392/0x67c
	 [&lt;ffffffff813532fa&gt;] ? __mutex_unlock_slowpath+0x100/0x10b
	 [&lt;ffffffff81058df0&gt;] ? trace_hardirqs_on_caller+0x10c/0x130
	 [&lt;ffffffff8135330e&gt;] ? mutex_unlock+0x9/0xb
	 [&lt;ffffffff81093aa2&gt;] shrink_list+0x8d/0x8f
	 [&lt;ffffffff81093d1c&gt;] shrink_zone+0x278/0x33c
	 [&lt;ffffffff81052d6c&gt;] ? ktime_get_ts+0xad/0xba
	 [&lt;ffffffff81094b13&gt;] try_to_free_pages+0x22e/0x392
	 [&lt;ffffffff81091e24&gt;] ? isolate_pages_global+0x0/0x212
	 [&lt;ffffffff8108e743&gt;] __alloc_pages_nodemask+0x3dc/0x5cf
	 [&lt;ffffffff81089529&gt;] grab_cache_page_write_begin+0x65/0xaa
	 [&lt;ffffffff8110f8c0&gt;] ext3_write_begin+0x78/0x1eb
	 [&lt;ffffffff81089ec5&gt;] generic_file_buffered_write+0x109/0x28c
	 [&lt;ffffffff8103cb69&gt;] ? current_fs_time+0x22/0x29
	 [&lt;ffffffff8108a509&gt;] __generic_file_aio_write+0x350/0x385
	 [&lt;ffffffff8108a588&gt;] ? generic_file_aio_write+0x4a/0xae
	 [&lt;ffffffff8108a59e&gt;] generic_file_aio_write+0x60/0xae
	 [&lt;ffffffff810b2e82&gt;] do_sync_write+0xe3/0x120
	 [&lt;ffffffff8104c0f1&gt;] ? autoremove_wake_function+0x0/0x34
	 [&lt;ffffffff810b18e1&gt;] ? __dentry_open+0x1a5/0x2b8
	 [&lt;ffffffff810b1a76&gt;] ? dentry_open+0x82/0x89
	 [&lt;ffffffffa00e693c&gt;] cachefiles_write_page+0x298/0x335 [cachefiles]
	 [&lt;ffffffffa0077147&gt;] fscache_write_op+0x178/0x2c2 [fscache]
	 [&lt;ffffffffa0075656&gt;] fscache_op_execute+0x7a/0xd1 [fscache]
	 [&lt;ffffffff81082093&gt;] slow_work_execute+0x18f/0x2d1
	 [&lt;ffffffff8108239a&gt;] slow_work_thread+0x1c5/0x308
	 [&lt;ffffffff8104c0f1&gt;] ? autoremove_wake_function+0x0/0x34
	 [&lt;ffffffff810821d5&gt;] ? slow_work_thread+0x0/0x308
	 [&lt;ffffffff8104be91&gt;] kthread+0x7a/0x82
	 [&lt;ffffffff8100beda&gt;] child_rip+0xa/0x20
	 [&lt;ffffffff8100b87c&gt;] ? restore_args+0x0/0x30
	 [&lt;ffffffff8102ef83&gt;] ? tg_shares_up+0x171/0x227
	 [&lt;ffffffff8104be17&gt;] ? kthread+0x0/0x82
	 [&lt;ffffffff8100bed0&gt;] ? child_rip+0x0/0x20

In the above backtrace, the following is happening:

 (1) A page storage operation is being executed by a slow-work thread
     (fscache_write_op()).

 (2) FS-Cache farms the operation out to the cache to perform
     (cachefiles_write_page()).

 (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
     standard write (do_sync_write()) under KERNEL_DS directly from the netfs
     page.

 (4) However, for Ext3 to perform the write, it must allocate some memory, in
     particular, it must allocate at least one page cache page into which it
     can copy the data from the netfs page.

 (5) Under OOM conditions, the memory allocator can't immediately come up with
     a page, so it uses vmscan to find something to discard
     (try_to_free_pages()).

 (6) vmscan finds a clean netfs page it might be able to discard (possibly the
     one it's trying to write out).

 (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
     called with __GFP_WAIT, so the netfs decides to wait for the store to
     complete (__fscache_wait_on_page_write()).

 (8) This blocks a slow-work processing thread - possibly against itself.

The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.

To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed.  This means that some data won't make it into the
cache this time.  To support this, a new FS-Cache function is added
fscache_maybe_release_page() that replaces what the netfs releasepage()
functions used to do with respect to the cache.

The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan".  There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.

What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages.  If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
