<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/fscache, branch v4.2.1</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>FS-Cache: Retain the netfs context in the retrieval op earlier</title>
<updated>2015-04-02T13:28:53+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2015-02-24T10:05:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4a47132ff472a0c2c5441baeb50cf97f2580bc43'/>
<id>4a47132ff472a0c2c5441baeb50cf97f2580bc43</id>
<content type='text'>
Now that the retrieval operation may be disposed of by fscache_put_operation()
before we actually set the context, the retrieval-specific cleanup operation
can produce a NULL-pointer dereference when it tries to unconditionally clean
up the netfs context.

Given that it is expected that we'll get at least as far as the place where we
currently set the context pointer and it is unlikely we'll go through the
error handling paths prior to that point, retain the context right from the
point that the retrieval op is allocated.

Concomitant to this, we need to retain the cookie pointer in the retrieval op
also so that we can call the netfs to release its context in the release
method.

In addition, we might now get into fscache_release_retrieval_op() with the op
only initialised.  To this end, set the operation to DEAD only after the
release method has been called and skip the n_pages test upon cleanup if the
op is still in the INITIALISED state.

Without these changes, the following oops might be seen:

	BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
	...
	RIP: 0010:[&lt;ffffffffa0089c98&gt;] fscache_release_retrieval_op+0xae/0x100
	...
	Call Trace:
	 [&lt;ffffffffa0088560&gt;] fscache_put_operation+0x117/0x2e0
	 [&lt;ffffffffa008b8f5&gt;] __fscache_read_or_alloc_pages+0x351/0x3ac
	 [&lt;ffffffffa00b761f&gt;] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
	 [&lt;ffffffffa00b06c5&gt;] nfs_readpages+0x10c/0x185 [nfs]
	 [&lt;ffffffff81124925&gt;] ? alloc_pages_current+0x119/0x13e
	 [&lt;ffffffff810ee5fd&gt;] ? __page_cache_alloc+0xfb/0x10a
	 [&lt;ffffffff810f87f8&gt;] __do_page_cache_readahead+0x188/0x22c
	 [&lt;ffffffff810f8b3a&gt;] ondemand_readahead+0x29e/0x2af
	 [&lt;ffffffff810f8c92&gt;] page_cache_sync_readahead+0x38/0x3a
	 [&lt;ffffffff810ef337&gt;] generic_file_read_iter+0x1a2/0x55a
	 [&lt;ffffffffa00a9dff&gt;] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
	 [&lt;ffffffffa00a6a23&gt;] nfs_file_read+0x49/0x70 [nfs]
	 [&lt;ffffffff811363be&gt;] new_sync_read+0x78/0x9c
	 [&lt;ffffffff81137164&gt;] __vfs_read+0x13/0x38
	 [&lt;ffffffff8113721e&gt;] vfs_read+0x95/0x121
	 [&lt;ffffffff811372f6&gt;] SyS_read+0x4c/0x8a
	 [&lt;ffffffff81557a52&gt;] system_call_fastpath+0x12/0x17

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that the retrieval operation may be disposed of by fscache_put_operation()
before we actually set the context, the retrieval-specific cleanup operation
can produce a NULL-pointer dereference when it tries to unconditionally clean
up the netfs context.

Given that it is expected that we'll get at least as far as the place where we
currently set the context pointer and it is unlikely we'll go through the
error handling paths prior to that point, retain the context right from the
point that the retrieval op is allocated.

Concomitant to this, we need to retain the cookie pointer in the retrieval op
also so that we can call the netfs to release its context in the release
method.

In addition, we might now get into fscache_release_retrieval_op() with the op
only initialised.  To this end, set the operation to DEAD only after the
release method has been called and skip the n_pages test upon cleanup if the
op is still in the INITIALISED state.

Without these changes, the following oops might be seen:

	BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
	...
	RIP: 0010:[&lt;ffffffffa0089c98&gt;] fscache_release_retrieval_op+0xae/0x100
	...
	Call Trace:
	 [&lt;ffffffffa0088560&gt;] fscache_put_operation+0x117/0x2e0
	 [&lt;ffffffffa008b8f5&gt;] __fscache_read_or_alloc_pages+0x351/0x3ac
	 [&lt;ffffffffa00b761f&gt;] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
	 [&lt;ffffffffa00b06c5&gt;] nfs_readpages+0x10c/0x185 [nfs]
	 [&lt;ffffffff81124925&gt;] ? alloc_pages_current+0x119/0x13e
	 [&lt;ffffffff810ee5fd&gt;] ? __page_cache_alloc+0xfb/0x10a
	 [&lt;ffffffff810f87f8&gt;] __do_page_cache_readahead+0x188/0x22c
	 [&lt;ffffffff810f8b3a&gt;] ondemand_readahead+0x29e/0x2af
	 [&lt;ffffffff810f8c92&gt;] page_cache_sync_readahead+0x38/0x3a
	 [&lt;ffffffff810ef337&gt;] generic_file_read_iter+0x1a2/0x55a
	 [&lt;ffffffffa00a9dff&gt;] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
	 [&lt;ffffffffa00a6a23&gt;] nfs_file_read+0x49/0x70 [nfs]
	 [&lt;ffffffff811363be&gt;] new_sync_read+0x78/0x9c
	 [&lt;ffffffff81137164&gt;] __vfs_read+0x13/0x38
	 [&lt;ffffffff8113721e&gt;] vfs_read+0x95/0x121
	 [&lt;ffffffff811372f6&gt;] SyS_read+0x4c/0x8a
	 [&lt;ffffffff81557a52&gt;] system_call_fastpath+0x12/0x17

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: The operation cancellation method needs calling in more places</title>
<updated>2015-04-02T13:28:53+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2015-02-24T10:05:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d3b97ca4a99e4e6c78f5a21c968eadf5c8ba9971'/>
<id>d3b97ca4a99e4e6c78f5a21c968eadf5c8ba9971</id>
<content type='text'>
Any time an incomplete operation is cancelled, the operation cancellation
function needs to be called to clean up.  This is currently being passed
directly to some of the functions that might want to call it, but not all.

Instead, pass the cancellation method pointer to fscache_operation_init()
and have that cache it in the operation struct.  Further, plug in a dummy
cancellation handler if the caller declines to set one as this allows us to
call the function unconditionally (the extra overhead isn't worth bothering
about as we don't expect to be calling this typically).

The cancellation method must thence be called everywhere the CANCELLED state
is set.  Note that we call it *before* setting the CANCELLED state such that
the method can use the old state value to guide its operation.
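
As a rough illustration only (a small Python model, not the kernel C code),
the arrangement described above might look like the sketch below.  The class,
the null_cancel() helper and the state constants are hypothetical stand-ins
named after the commit text.

```python
# Illustrative model only; names mirror the commit text, not the kernel API.
INITIALISED, PENDING, IN_PROGRESS, CANCELLED = range(4)


def null_cancel(op):
    """Dummy handler plugged in when the caller supplies none."""


class Operation:
    def __init__(self, cancel=None):
        # Cache the cancellation method in the op so every cancel path can reach it.
        self.cancel = cancel if cancel is not None else null_cancel
        self.state = INITIALISED

    def do_cancel(self):
        # Call the method *before* changing state so it can see the old state.
        self.cancel(self)
        self.state = CANCELLED


seen = []
op = Operation(cancel=lambda o: seen.append(o.state))
op.state = PENDING
op.do_cancel()

plain = Operation()   # no handler given: the dummy is called unconditionally
plain.do_cancel()
```

The handler records the state it observes, which is still PENDING at the time
of the call because the CANCELLED state is only set afterwards.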

fscache_do_cancel_retrieval() needs moving higher up in the sources so that
the init function can use it now.

Without this, the following oops may be seen:

	FS-Cache: Assertion failed
	FS-Cache: 3 == 0 is false
	------------[ cut here ]------------
	kernel BUG at ../fs/fscache/page.c:261!
	...
	RIP: 0010:[&lt;ffffffffa0089c1b&gt;]  fscache_release_retrieval_op+0x77/0x100
	 [&lt;ffffffffa008853d&gt;] fscache_put_operation+0x114/0x2da
	 [&lt;ffffffffa008b8c2&gt;] __fscache_read_or_alloc_pages+0x358/0x3b3
	 [&lt;ffffffffa00b761f&gt;] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
	 [&lt;ffffffffa00b06c5&gt;] nfs_readpages+0x10c/0x185 [nfs]
	 [&lt;ffffffff81124925&gt;] ? alloc_pages_current+0x119/0x13e
	 [&lt;ffffffff810ee5fd&gt;] ? __page_cache_alloc+0xfb/0x10a
	 [&lt;ffffffff810f87f8&gt;] __do_page_cache_readahead+0x188/0x22c
	 [&lt;ffffffff810f8b3a&gt;] ondemand_readahead+0x29e/0x2af
	 [&lt;ffffffff810f8c92&gt;] page_cache_sync_readahead+0x38/0x3a
	 [&lt;ffffffff810ef337&gt;] generic_file_read_iter+0x1a2/0x55a
	 [&lt;ffffffffa00a9dff&gt;] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
	 [&lt;ffffffffa00a6a23&gt;] nfs_file_read+0x49/0x70 [nfs]
	 [&lt;ffffffff811363be&gt;] new_sync_read+0x78/0x9c
	 [&lt;ffffffff81137164&gt;] __vfs_read+0x13/0x38
	 [&lt;ffffffff8113721e&gt;] vfs_read+0x95/0x121
	 [&lt;ffffffff811372f6&gt;] SyS_read+0x4c/0x8a
	 [&lt;ffffffff81557a52&gt;] system_call_fastpath+0x12/0x17

The assertion is showing that the remaining number of pages (n_pages) is not 0
when the operation is being released.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Any time an incomplete operation is cancelled, the operation cancellation
function needs to be called to clean up.  This is currently being passed
directly to some of the functions that might want to call it, but not all.

Instead, pass the cancellation method pointer to fscache_operation_init()
and have that cache it in the operation struct.  Further, plug in a dummy
cancellation handler if the caller declines to set one as this allows us to
call the function unconditionally (the extra overhead isn't worth bothering
about as we don't expect to be calling this typically).

The cancellation method must thence be called everywhere the CANCELLED state
is set.  Note that we call it *before* setting the CANCELLED state such that
the method can use the old state value to guide its operation.

fscache_do_cancel_retrieval() needs moving higher up in the sources so that
the init function can use it now.

Without this, the following oops may be seen:

	FS-Cache: Assertion failed
	FS-Cache: 3 == 0 is false
	------------[ cut here ]------------
	kernel BUG at ../fs/fscache/page.c:261!
	...
	RIP: 0010:[&lt;ffffffffa0089c1b&gt;]  fscache_release_retrieval_op+0x77/0x100
	 [&lt;ffffffffa008853d&gt;] fscache_put_operation+0x114/0x2da
	 [&lt;ffffffffa008b8c2&gt;] __fscache_read_or_alloc_pages+0x358/0x3b3
	 [&lt;ffffffffa00b761f&gt;] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
	 [&lt;ffffffffa00b06c5&gt;] nfs_readpages+0x10c/0x185 [nfs]
	 [&lt;ffffffff81124925&gt;] ? alloc_pages_current+0x119/0x13e
	 [&lt;ffffffff810ee5fd&gt;] ? __page_cache_alloc+0xfb/0x10a
	 [&lt;ffffffff810f87f8&gt;] __do_page_cache_readahead+0x188/0x22c
	 [&lt;ffffffff810f8b3a&gt;] ondemand_readahead+0x29e/0x2af
	 [&lt;ffffffff810f8c92&gt;] page_cache_sync_readahead+0x38/0x3a
	 [&lt;ffffffff810ef337&gt;] generic_file_read_iter+0x1a2/0x55a
	 [&lt;ffffffffa00a9dff&gt;] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
	 [&lt;ffffffffa00a6a23&gt;] nfs_file_read+0x49/0x70 [nfs]
	 [&lt;ffffffff811363be&gt;] new_sync_read+0x78/0x9c
	 [&lt;ffffffff81137164&gt;] __vfs_read+0x13/0x38
	 [&lt;ffffffff8113721e&gt;] vfs_read+0x95/0x121
	 [&lt;ffffffff811372f6&gt;] SyS_read+0x4c/0x8a
	 [&lt;ffffffff81557a52&gt;] system_call_fastpath+0x12/0x17

The assertion is showing that the remaining number of pages (n_pages) is not 0
when the operation is being released.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: Put an aborted initialised op so that it is accounted correctly</title>
<updated>2015-04-02T13:28:53+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2015-02-25T14:22:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a39caadf06879017cb9a8c5c5cb4fc4ccb213275'/>
<id>a39caadf06879017cb9a8c5c5cb4fc4ccb213275</id>
<content type='text'>
Call fscache_put_operation() or a wrapper on any op that has gone through
fscache_operation_init() so that the accounting shown in /proc is done
correctly, specifically fscache_n_op_release.

fscache_put_operation() therefore now allows an op in the INITIALISED state as
well as in the CANCELLED and COMPLETE states.

Note that this means that an operation can get put that doesn't have its
-&gt;object pointer filled in, so anything that depends on the object needs to be
conditional in fscache_put_operation().
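
A minimal sketch of the idea, written as a Python model rather than kernel C.
The Op and Obj classes, the stats dictionary and the n_ops decrement are
assumptions standing in for whatever object-dependent work the real function
performs.

```python
# Hypothetical model of the put path; state names come from the commit text.
INITIALISED, IN_PROGRESS, COMPLETE, CANCELLED = (
    "INITIALISED", "IN_PROGRESS", "COMPLETE", "CANCELLED")

stats = {"n_op_release": 0}


class Obj:
    def __init__(self):
        self.n_ops = 0


class Op:
    def __init__(self):
        self.state = INITIALISED
        self.object = None      # not filled in until the op is submitted


def put_operation(op):
    # An aborted, never-submitted op is now acceptable here.
    assert op.state in (INITIALISED, COMPLETE, CANCELLED)
    stats["n_op_release"] += 1
    if op.object is not None:   # object-dependent work must be conditional
        op.object.n_ops -= 1    # assumed stand-in for that work


aborted = Op()
put_operation(aborted)          # released without ever gaining an object

done = Op()
done.object = Obj()
done.object.n_ops = 1
done.state = COMPLETE
put_operation(done)
```

Both ops are counted as released, and the aborted one never touches an object.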

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Call fscache_put_operation() or a wrapper on any op that has gone through
fscache_operation_init() so that the accounting shown in /proc is done
correctly, specifically fscache_n_op_release.

fscache_put_operation() therefore now allows an op in the INITIALISED state as
well as in the CANCELLED and COMPLETE states.

Note that this means that an operation can get put that doesn't have its
-&gt;object pointer filled in, so anything that depends on the object needs to be
conditional in fscache_put_operation().

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: Fix cancellation of in-progress operation</title>
<updated>2015-04-02T13:28:53+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2015-02-25T14:07:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=73c04a47bf79770fbe7f3cf515f5831fccab88ee'/>
<id>73c04a47bf79770fbe7f3cf515f5831fccab88ee</id>
<content type='text'>
Cancellation of an in-progress operation needs to update the relevant counters
and start any operations that are pending waiting on this one.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Cancellation of an in-progress operation needs to update the relevant counters
and start any operations that are pending waiting on this one.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: Count the number of initialised operations</title>
<updated>2015-04-02T13:28:53+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2015-02-25T13:21:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=03cdd0e4b9a98ae995b81cd8f58e992ec3f44ae2'/>
<id>03cdd0e4b9a98ae995b81cd8f58e992ec3f44ae2</id>
<content type='text'>
Count and display through /proc/fs/fscache/stats the number of initialised
operations.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Count and display through /proc/fs/fscache/stats the number of initialised
operations.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: Out of line fscache_operation_init()</title>
<updated>2015-04-02T13:28:53+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2015-02-25T13:26:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1339ec98e32b4bc8efb6fbb71c006a465130aaba'/>
<id>1339ec98e32b4bc8efb6fbb71c006a465130aaba</id>
<content type='text'>
Out of line fscache_operation_init() so that it can access internal FS-Cache
features, such as stats, in a later commit.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Out of line fscache_operation_init() so that it can access internal FS-Cache
features, such as stats, in a later commit.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: Permit fscache_cancel_op() to cancel in-progress operations too</title>
<updated>2015-04-02T13:28:53+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2015-02-24T10:05:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=418b7eb9e1011bc11220a03ad0045885d04698d2'/>
<id>418b7eb9e1011bc11220a03ad0045885d04698d2</id>
<content type='text'>
Currently, fscache_cancel_op() only cancels pending operations - attempts to
cancel in-progress operations are ignored.  This leads to a problem in
fscache_wait_for_operation_activation() whereby the wait is terminated, but
the object has been killed.

The check at the end of the function now triggers because it's no longer
contingent on the cache having produced an I/O error since the commit that
fixed the logic error in fscache_object_is_dead().

The result of the check is that it tries to cancel the operation - but since
the operation may not be pending by this point, the cancellation request may be
ignored - with the result that the operation is just put by the caller and
fscache_put_operation() has an assertion failure because the operation isn't in
either the COMPLETE or the CANCELLED state.

To fix this, we permit in-progress ops to be cancelled under some
circumstances.

The bug results in an oops that looks something like this:

	FS-Cache: fscache_wait_for_operation_activation() = -ENOBUFS [obj dead 3]
	FS-Cache:
	FS-Cache: Assertion failed
	FS-Cache: 3 == 5 is false
	------------[ cut here ]------------
	kernel BUG at ../fs/fscache/operation.c:432!
	...
	RIP: 0010:[&lt;ffffffffa0088574&gt;] fscache_put_operation+0xf2/0x2cd
	Call Trace:
	 [&lt;ffffffffa008b92a&gt;] __fscache_read_or_alloc_pages+0x2ec/0x3b3
	 [&lt;ffffffffa00b761f&gt;] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
	 [&lt;ffffffffa00b06c5&gt;] nfs_readpages+0x10c/0x185 [nfs]
	 [&lt;ffffffff81124925&gt;] ? alloc_pages_current+0x119/0x13e
	 [&lt;ffffffff810ee5fd&gt;] ? __page_cache_alloc+0xfb/0x10a
	 [&lt;ffffffff810f87f8&gt;] __do_page_cache_readahead+0x188/0x22c
	 [&lt;ffffffff810f8b3a&gt;] ondemand_readahead+0x29e/0x2af
	 [&lt;ffffffff810f8c92&gt;] page_cache_sync_readahead+0x38/0x3a
	 [&lt;ffffffff810ef337&gt;] generic_file_read_iter+0x1a2/0x55a
	 [&lt;ffffffffa00a9dff&gt;] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
	 [&lt;ffffffffa00a6a23&gt;] nfs_file_read+0x49/0x70 [nfs]
	 [&lt;ffffffff811363be&gt;] new_sync_read+0x78/0x9c
	 [&lt;ffffffff81137164&gt;] __vfs_read+0x13/0x38
	 [&lt;ffffffff8113721e&gt;] vfs_read+0x95/0x121
	 [&lt;ffffffff811372f6&gt;] SyS_read+0x4c/0x8a
	 [&lt;ffffffff81557a52&gt;] system_call_fastpath+0x12/0x17

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, fscache_cancel_op() only cancels pending operations - attempts to
cancel in-progress operations are ignored.  This leads to a problem in
fscache_wait_for_operation_activation() whereby the wait is terminated, but
the object has been killed.

The check at the end of the function now triggers because it's no longer
contingent on the cache having produced an I/O error since the commit that
fixed the logic error in fscache_object_is_dead().

The result of the check is that it tries to cancel the operation - but since
the operation may not be pending by this point, the cancellation request may be
ignored - with the result that the operation is just put by the caller and
fscache_put_operation() has an assertion failure because the operation isn't in
either the COMPLETE or the CANCELLED state.

To fix this, we permit in-progress ops to be cancelled under some
circumstances.

The bug results in an oops that looks something like this:

	FS-Cache: fscache_wait_for_operation_activation() = -ENOBUFS [obj dead 3]
	FS-Cache:
	FS-Cache: Assertion failed
	FS-Cache: 3 == 5 is false
	------------[ cut here ]------------
	kernel BUG at ../fs/fscache/operation.c:432!
	...
	RIP: 0010:[&lt;ffffffffa0088574&gt;] fscache_put_operation+0xf2/0x2cd
	Call Trace:
	 [&lt;ffffffffa008b92a&gt;] __fscache_read_or_alloc_pages+0x2ec/0x3b3
	 [&lt;ffffffffa00b761f&gt;] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
	 [&lt;ffffffffa00b06c5&gt;] nfs_readpages+0x10c/0x185 [nfs]
	 [&lt;ffffffff81124925&gt;] ? alloc_pages_current+0x119/0x13e
	 [&lt;ffffffff810ee5fd&gt;] ? __page_cache_alloc+0xfb/0x10a
	 [&lt;ffffffff810f87f8&gt;] __do_page_cache_readahead+0x188/0x22c
	 [&lt;ffffffff810f8b3a&gt;] ondemand_readahead+0x29e/0x2af
	 [&lt;ffffffff810f8c92&gt;] page_cache_sync_readahead+0x38/0x3a
	 [&lt;ffffffff810ef337&gt;] generic_file_read_iter+0x1a2/0x55a
	 [&lt;ffffffffa00a9dff&gt;] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
	 [&lt;ffffffffa00a6a23&gt;] nfs_file_read+0x49/0x70 [nfs]
	 [&lt;ffffffff811363be&gt;] new_sync_read+0x78/0x9c
	 [&lt;ffffffff81137164&gt;] __vfs_read+0x13/0x38
	 [&lt;ffffffff8113721e&gt;] vfs_read+0x95/0x121
	 [&lt;ffffffff811372f6&gt;] SyS_read+0x4c/0x8a
	 [&lt;ffffffff81557a52&gt;] system_call_fastpath+0x12/0x17

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: fscache_object_is_dead() has wrong logic, kill it</title>
<updated>2015-04-02T13:28:53+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2015-02-24T10:52:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=87021526300f1a292dd966e141e183630ac95317'/>
<id>87021526300f1a292dd966e141e183630ac95317</id>
<content type='text'>
fscache_object_is_dead() returns true only if the object is marked dead and
the cache got an I/O error.  This should be a logical OR instead.  Since two
of the callers got split up into handling for separate subcases, expand the
other callers and kill the function.  This is probably the right thing to do
anyway since one of the subcases isn't about the object at all, but rather
about the cache.
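
To make the logic error concrete, here is a small Python sketch of the two
predicates.  The function and parameter names are illustrative assumptions and
do not correspond to kernel fields.

```python
# Illustrative predicates; the parameter names are assumptions, not kernel fields.
def is_dead_old(object_dead, cache_io_error):
    # Behaviour described above: both conditions had to hold.
    return object_dead and cache_io_error


def is_dead_intended(object_dead, cache_io_error):
    # Intended behaviour: either condition is enough.
    return object_dead or cache_io_error


# The two predicates disagree exactly when only one condition holds.
disagreements = [
    (d, e)
    for d in (False, True)
    for e in (False, True)
    if is_dead_old(d, e) != is_dead_intended(d, e)
]
```

The disagreements are the two cases where only one condition holds, including
a dead object on a cache with no I/O error, which the old test missed.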

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fscache_object_is_dead() returns true only if the object is marked dead and
the cache got an I/O error.  This should be a logical OR instead.  Since two
of the callers got split up into handling for separate subcases, expand the
other callers and kill the function.  This is probably the right thing to do
anyway since one of the subcases isn't about the object at all, but rather
about the cache.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: Synchronise object death state change vs operation submission</title>
<updated>2015-04-02T13:28:53+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2015-02-24T10:05:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f09b443d0e09f37121c55d7f83056f6ebff6ab4f'/>
<id>f09b443d0e09f37121c55d7f83056f6ebff6ab4f</id>
<content type='text'>
When an object is being marked as no longer live, do this under the object
spinlock to prevent a race with operation submission targeted on that object.

The problem occurs due to the following pair of intertwined sequences when the
cache tries to create an object that would take it over the hard available
space limit:

 NETFS INTERFACE
 ===============
 (A) The netfs calls fscache_acquire_cookie().  Object creation is deferred to
     the object state machine and the netfs is allowed to continue.

	OBJECT STATE MACHINE KTHREAD
	============================
	(1) The object is looked up on disk by fscache_look_up_object()
	    calling cachefiles_walk_to_object().  The latter finds that the
	    object is not yet represented on disk and calls
	    fscache_object_lookup_negative().

	(2) fscache_object_lookup_negative() sets FSCACHE_COOKIE_NO_DATA_YET
	    and clears FSCACHE_COOKIE_LOOKING_UP, thus allowing the netfs to
	    start queuing read operations.

 (B) The netfs calls fscache_read_or_alloc_pages().  This calls
     fscache_wait_for_deferred_lookup() which sees FSCACHE_COOKIE_LOOKING_UP
     become clear, allowing the read to begin.

 (C) A read operation is set up and passed to fscache_submit_op() to deal
     with.

	(3) cachefiles_walk_to_object() calls cachefiles_has_space(), which
	    fails (or one of the file operations to create stuff fails).
	    cachefiles returns an error to fscache.

	(4) fscache_look_up_object() transits to the LOOKUP_FAILURE state.

	(5) fscache_lookup_failure() sets FSCACHE_OBJECT_LOOKED_UP and
	    FSCACHE_COOKIE_UNAVAILABLE and clears FSCACHE_COOKIE_LOOKING_UP
	    then transits to the KILL_OBJECT state.

	(6) fscache_kill_object() clears FSCACHE_OBJECT_IS_LIVE in an attempt
	    to reject any further requests from the netfs.

	(7) object-&gt;n_ops is examined and found to be 0.
	    fscache_kill_object() transits to the DROP_OBJECT state.

 (D) fscache_submit_op() locks the object spinlock, sees if it can dispatch
     the op immediately by calling fscache_object_is_active() - which fails
     since FSCACHE_OBJECT_IS_AVAILABLE has not yet been set.

 (E) fscache_submit_op() then tests FSCACHE_OBJECT_LOOKED_UP - which is set.
     It then queues the object and increments object-&gt;n_ops.

	(8) fscache_drop_object() releases the object and eventually
	    fscache_put_object() calls cachefiles_put_object() which suffers
	    an assertion failure here:

		ASSERTCMP(object-&gt;fscache.n_ops, ==, 0);

Locking the object spinlock in step (6) around the clearance of
FSCACHE_OBJECT_IS_LIVE ensures that the decision trees in
fscache_submit_op() and fscache_submit_exclusive_op() don't see the IS_LIVE
flag being cleared mid-decision: either the op is queued before step (7) - in
which case fscache_kill_object() will see n_ops&gt;0 and will deal with the op -
or the op will be rejected.

This, combined with rejecting op submission if the target object is dying,
fixes the problem.
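
The locking rule can be sketched with a simplified Python model (not the
kernel implementation).  The Obj class and both functions are hypothetical;
only the is_live and n_ops names follow the text above.

```python
import threading

# Simplified model; field names follow the commit text, everything else is assumed.
class Obj:
    def __init__(self):
        self.lock = threading.Lock()
        self.is_live = True
        self.n_ops = 0


def submit_op(obj):
    """Netfs side: queue an op only if the object is still live."""
    with obj.lock:
        if not obj.is_live:
            return False        # object is dying: reject the op
        obj.n_ops += 1
        return True


def kill_object(obj):
    """State machine side: clear IS_LIVE under the lock, then look at n_ops."""
    with obj.lock:
        obj.is_live = False
    return obj.n_ops            # ops queued before the clear are still visible


early = Obj()
queued = submit_op(early)       # submitted before the kill
outstanding = kill_object(early)

late = Obj()
kill_object(late)
rejected = not submit_op(late)  # submitted after the kill
```

Because the liveness test and the n_ops increment happen under the same lock
as the clearance, an op is either counted before the kill examines n_ops or
it is rejected.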

The problem shows up as the following oops:

CacheFiles: Assertion failed
CacheFiles: 1 == 0 is false
------------[ cut here ]------------
kernel BUG at ../fs/cachefiles/interface.c:339!
...
RIP: 0010:[&lt;ffffffffa014fd9c&gt;]  [&lt;ffffffffa014fd9c&gt;] cachefiles_put_object+0x2a4/0x301 [cachefiles]
...
Call Trace:
 [&lt;ffffffffa008674b&gt;] fscache_put_object+0x18/0x21 [fscache]
 [&lt;ffffffffa00883e6&gt;] fscache_object_work_func+0x3ba/0x3c9 [fscache]
 [&lt;ffffffff81054dad&gt;] process_one_work+0x226/0x441
 [&lt;ffffffff81055d91&gt;] worker_thread+0x273/0x36b
 [&lt;ffffffff81055b1e&gt;] ? rescuer_thread+0x2e1/0x2e1
 [&lt;ffffffff81059b9d&gt;] kthread+0x10e/0x116
 [&lt;ffffffff81059a8f&gt;] ? kthread_create_on_node+0x1bb/0x1bb
 [&lt;ffffffff815579ac&gt;] ret_from_fork+0x7c/0xb0
 [&lt;ffffffff81059a8f&gt;] ? kthread_create_on_node+0x1bb/0x1bb

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When an object is being marked as no longer live, do this under the object
spinlock to prevent a race with operation submission targeted on that object.

The problem occurs due to the following pair of intertwined sequences when the
cache tries to create an object that would take it over the hard available
space limit:

 NETFS INTERFACE
 ===============
 (A) The netfs calls fscache_acquire_cookie().  Object creation is deferred to
     the object state machine and the netfs is allowed to continue.

	OBJECT STATE MACHINE KTHREAD
	============================
	(1) The object is looked up on disk by fscache_look_up_object()
	    calling cachefiles_walk_to_object().  The latter finds that the
	    object is not yet represented on disk and calls
	    fscache_object_lookup_negative().

	(2) fscache_object_lookup_negative() sets FSCACHE_COOKIE_NO_DATA_YET
	    and clears FSCACHE_COOKIE_LOOKING_UP, thus allowing the netfs to
	    start queuing read operations.

 (B) The netfs calls fscache_read_or_alloc_pages().  This calls
     fscache_wait_for_deferred_lookup() which sees FSCACHE_COOKIE_LOOKING_UP
     become clear, allowing the read to begin.

 (C) A read operation is set up and passed to fscache_submit_op() to deal
     with.

	(3) cachefiles_walk_to_object() calls cachefiles_has_space(), which
	    fails (or one of the file operations to create stuff fails).
	    cachefiles returns an error to fscache.

	(4) fscache_look_up_object() transits to the LOOKUP_FAILURE state.

	(5) fscache_lookup_failure() sets FSCACHE_OBJECT_LOOKED_UP and
	    FSCACHE_COOKIE_UNAVAILABLE and clears FSCACHE_COOKIE_LOOKING_UP
	    then transits to the KILL_OBJECT state.

	(6) fscache_kill_object() clears FSCACHE_OBJECT_IS_LIVE in an attempt
	    to reject any further requests from the netfs.

	(7) object-&gt;n_ops is examined and found to be 0.
	    fscache_kill_object() transits to the DROP_OBJECT state.

 (D) fscache_submit_op() locks the object spinlock, sees if it can dispatch
     the op immediately by calling fscache_object_is_active() - which fails
     since FSCACHE_OBJECT_IS_AVAILABLE has not yet been set.

 (E) fscache_submit_op() then tests FSCACHE_OBJECT_LOOKED_UP - which is set.
     It then queues the object and increments object-&gt;n_ops.

	(8) fscache_drop_object() releases the object and eventually
	    fscache_put_object() calls cachefiles_put_object() which suffers
	    an assertion failure here:

		ASSERTCMP(object-&gt;fscache.n_ops, ==, 0);

Locking the object spinlock in step (6) around the clearance of
FSCACHE_OBJECT_IS_LIVE ensures that the decision trees in
fscache_submit_op() and fscache_submit_exclusive_op() don't see the IS_LIVE
flag being cleared mid-decision: either the op is queued before step (7) - in
which case fscache_kill_object() will see n_ops&gt;0 and will deal with the op -
or the op will be rejected.

This, combined with rejecting op submission if the target object is dying,
fixes the problem.

The problem shows up as the following oops:

CacheFiles: Assertion failed
CacheFiles: 1 == 0 is false
------------[ cut here ]------------
kernel BUG at ../fs/cachefiles/interface.c:339!
...
RIP: 0010:[&lt;ffffffffa014fd9c&gt;]  [&lt;ffffffffa014fd9c&gt;] cachefiles_put_object+0x2a4/0x301 [cachefiles]
...
Call Trace:
 [&lt;ffffffffa008674b&gt;] fscache_put_object+0x18/0x21 [fscache]
 [&lt;ffffffffa00883e6&gt;] fscache_object_work_func+0x3ba/0x3c9 [fscache]
 [&lt;ffffffff81054dad&gt;] process_one_work+0x226/0x441
 [&lt;ffffffff81055d91&gt;] worker_thread+0x273/0x36b
 [&lt;ffffffff81055b1e&gt;] ? rescuer_thread+0x2e1/0x2e1
 [&lt;ffffffff81059b9d&gt;] kthread+0x10e/0x116
 [&lt;ffffffff81059a8f&gt;] ? kthread_create_on_node+0x1bb/0x1bb
 [&lt;ffffffff815579ac&gt;] ret_from_fork+0x7c/0xb0
 [&lt;ffffffff81059a8f&gt;] ? kthread_create_on_node+0x1bb/0x1bb

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: Handle a new operation submitted against a killed object</title>
<updated>2015-04-02T13:28:53+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2015-02-25T11:53:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6515d1dbf424c5c3b94d44e9c7f581026e7fc0d3'/>
<id>6515d1dbf424c5c3b94d44e9c7f581026e7fc0d3</id>
<content type='text'>
Reject new operations that are being submitted against an object if that
object has failed its lookup or creation states or has been killed by the
cache backend for some other reason, such as having been culled.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reject new operations that are being submitted against an object if that
object has failed its lookup or creation states or has been killed by the
cache backend for some other reason, such as having been culled.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
