linux-toradex.git/Documentation/filesystems/caching, branch v3.10.78

FS-Cache: Provide proper invalidation

2012-12-20T22:04:07+00:00

Provide a proper invalidation method rather than relying on the netfs retiring
the cookie it has and getting a new one.  The problem with this is that it isn't
easy for the netfs to make sure that it has completed/cancelled all its
outstanding storage and retrieval operations on the cookie it is retiring.

Instead, have the cache provide an invalidation method that will cancel or wait
for all currently outstanding operations before invalidating the cache, and
will cause new operations to queue up behind that.  Whilst invalidation is in
progress, some requests will be rejected until the cache can stack a barrier on
the operation queue to cause new operations to be deferred behind it.
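The barrier behaviour described above can be sketched in userspace roughly as
follows.  This is a minimal sketch that assumes a simplified object tracking
only a running count and a deferred count.  All names (struct inv_object,
inv_submit_op(), inv_begin(), inv_end()) are hypothetical and are not the
FS-Cache API.  The window during which requests are rejected before the
barrier is stacked is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a cache object's operation queue. */
struct inv_object {
	int  n_in_progress;  /* operations currently running */
	int  n_deferred;     /* operations queued behind the barrier */
	bool invalidating;   /* invalidation barrier present on the queue */
};

enum inv_submit_result { INV_RUN, INV_DEFERRED };

/* Submit a new operation: run it now, or queue it behind the barrier. */
enum inv_submit_result inv_submit_op(struct inv_object *obj)
{
	if (obj->invalidating) {
		obj->n_deferred++;
		return INV_DEFERRED;
	}
	obj->n_in_progress++;
	return INV_RUN;
}

/* An operation finished (completed or was cancelled). */
void inv_op_done(struct inv_object *obj)
{
	assert(obj->n_in_progress > 0);
	obj->n_in_progress--;
}

/* Stack the barrier.  Returns true if invalidation may proceed at once,
 * false if it must wait for outstanding operations to drain first. */
bool inv_begin(struct inv_object *obj)
{
	obj->invalidating = true;
	return obj->n_in_progress == 0;
}

/* Invalidation done: lift the barrier and release the deferred ops. */
void inv_end(struct inv_object *obj)
{
	assert(obj->invalidating && obj->n_in_progress == 0);
	obj->invalidating = false;
	obj->n_in_progress += obj->n_deferred;
	obj->n_deferred = 0;
}
```

Deferring rather than rejecting after the barrier is in place means the netfs
does not have to retry; the deferred operations simply run against the freshly
invalidated object.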

Signed-off-by: David Howells

FS-Cache: Fix operation state management and accounting

2012-12-20T21:58:26+00:00

Fix the state management of internal fscache operations and the accounting of
what operations are in what states.

This is done by:

 (1) Give struct fscache_operation an enum variable that directly represents the
     state it's currently in, rather than spreading this knowledge over a bunch
     of flags, who's processing the operation at the moment and whether it is
     queued or not.

     This makes it easier to write assertions to check the state at various
     points and to prevent invalid state transitions.

 (2) Add an 'operation complete' state and supply a function to indicate the
     completion of an operation (fscache_op_complete()) and make things call
     it.  The final call to fscache_put_operation() can then check that an op
     is in the appropriate state (complete or cancelled).

 (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
     govern the state of an object:

	(a) The ->n_ops is now the number of extant operations on the object
	    and is now decremented by fscache_put_operation() only.

	(b) The ->n_in_progress is simply the number of operations that have
	    been taken off the object's pending queue for the purposes of being
	    run.  This is decremented by fscache_op_complete() only.

	(c) The ->n_exclusive is the number of exclusive ops that have been
	    submitted and queued or are in progress.  It is decremented by
	    fscache_op_complete() and by fscache_cancel_op().

     fscache_put_operation() and fscache_operation_gc() no longer try to
     clean up ->n_exclusive and ->n_in_progress.  That was leading to double
     decrements against fscache_cancel_op().

     fscache_cancel_op() no longer decrements ->n_ops.  That was leading to
     double decrements against fscache_put_operation().

     fscache_submit_exclusive_op() now decides whether it has to queue an op
     based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
     will persist in being true even after all preceding operations have been
     cancelled or completed.  Furthermore, if an object is active and there are
     runnable ops against it, there must be at least one op running.

 (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
     provide a function to record completion of the pages as they complete.

     When n_pages reaches 0, the operation is deemed to be complete and
     fscache_op_complete() is called.

     Add calls to fscache_retrieval_complete() anywhere we've finished with a
     page we've been given to read or allocate for.  This includes places where
     we just return pages to the netfs for reading from the server and where
     accessing the cache fails and we discard the proposed netfs page.
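Points (1) to (3) above can be illustrated with a small state machine in which
each counter is dropped in exactly one place.  This is a sketch under
simplifying assumptions: the state names, struct layouts and functions
(op_submit(), op_start(), op_complete(), op_cancel(), op_put()) are
hypothetical and only loosely mirror the fscache operation code described
here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical op states, loosely modelled on the description above. */
enum op_state {
	OP_ST_INITIALISED,
	OP_ST_PENDING,
	OP_ST_IN_PROGRESS,
	OP_ST_COMPLETE,
	OP_ST_CANCELLED,
	OP_ST_DEAD,
};

struct op_object {
	int n_ops;          /* extant ops; dropped only by op_put() */
	int n_in_progress;  /* ops taken off the pending queue to run */
	int n_exclusive;    /* exclusive ops queued or running */
};

struct op {
	enum op_state state;
	bool exclusive;
	struct op_object *obj;
};

void op_submit(struct op *op, struct op_object *obj, bool exclusive)
{
	assert(op->state == OP_ST_INITIALISED);
	op->obj = obj;
	op->exclusive = exclusive;
	obj->n_ops++;
	if (exclusive)
		obj->n_exclusive++;
	op->state = OP_ST_PENDING;
}

/* Take an op off the pending queue and start running it. */
void op_start(struct op *op)
{
	assert(op->state == OP_ST_PENDING);
	op->obj->n_in_progress++;
	op->state = OP_ST_IN_PROGRESS;
}

/* The only place that drops n_in_progress. */
void op_complete(struct op *op)
{
	assert(op->state == OP_ST_IN_PROGRESS);
	op->obj->n_in_progress--;
	if (op->exclusive)
		op->obj->n_exclusive--;
	op->state = OP_ST_COMPLETE;
}

/* Cancel a pending op: it never ran, so n_in_progress is untouched,
 * and n_ops is left for op_put() to drop. */
void op_cancel(struct op *op)
{
	assert(op->state == OP_ST_PENDING);
	if (op->exclusive)
		op->obj->n_exclusive--;
	op->state = OP_ST_CANCELLED;
}

/* Final put: the only place that drops n_ops. */
void op_put(struct op *op)
{
	assert(op->state == OP_ST_COMPLETE || op->state == OP_ST_CANCELLED);
	op->obj->n_ops--;
	op->state = OP_ST_DEAD;
}

/* An exclusive op must queue only if something is actually running;
 * n_ops > 0 would stay true after every op had finished or been cancelled. */
bool op_exclusive_must_queue(const struct op_object *obj)
{
	return obj->n_in_progress > 0;
}
```

Because every transition asserts its starting state, a double complete or a
put on a still-pending op trips immediately rather than silently skewing the
counters.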

The bugs in the unfixed state management manifest themselves as oopses like the
following where the operation completion gets out of sync with return of the
cookie by the netfs.  This is possible because the cache unlocks and returns
all the netfs pages before recording its completion - which means that there's
nothing to stop the netfs discarding them and returning the cookie.


FS-Cache: Cookie 'NFS.fh' still has outstanding reads
------------[ cut here ]------------
kernel BUG at fs/fscache/cookie.c:519!
invalid opcode: 0000 [#1] SMP
CPU 1
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090                  /DG965RY
RIP: 0010:[]  [] __fscache_relinquish_cookie+0x170/0x343 [fscache]
RSP: 0018:ffff8800368cfb00  EFLAGS: 00010282
RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
FS:  0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
Stack:
 ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
 ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
 ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
Call Trace:
 [] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
 [] nfs_clear_inode+0x3c/0x41 [nfs]
 [] nfs4_evict_inode+0x2f/0x33 [nfs]
 [] evict+0xa1/0x15c
 [] dispose_list+0x2c/0x38
 [] prune_icache_sb+0x28c/0x29b
 [] prune_super+0xd5/0x140
 [] shrink_slab+0x102/0x1ab
 [] balance_pgdat+0x2f2/0x595
 [] ? process_timeout+0xb/0xb
 [] kswapd+0x270/0x289
 [] ? __init_waitqueue_head+0x46/0x46
 [] ? balance_pgdat+0x595/0x595
 [] kthread+0x7f/0x87
 [] kernel_thread_helper+0x4/0x10
 [] ? finish_task_switch+0x45/0xc0
 [] ? retint_restore_args+0xe/0xe
 [] ? __init_kthread_worker+0x53/0x53
 [] ? gs_change+0xb/0xb
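The n_pages counter from point (4), together with the ordering problem
described before the oops, can be sketched as follows.  This assumes a
hypothetical retrieval struct and a stand-in for handing a page back to the
netfs.  None of these names are the real fscache_retrieval_complete()
interface.  Recording completion before releasing the page is one ordering
that avoids the race described above, shown here for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical retrieval op with a remaining-pages counter. */
struct sk_retrieval {
	int  n_pages;    /* pages the op has not finished with yet */
	bool complete;   /* stands in for the op reaching "complete" */
};

/* Hypothetical stand-in for unlocking a page and giving it to the netfs. */
static int sk_pages_returned;
static void sk_return_page_to_netfs(void *page)
{
	(void)page;
	sk_pages_returned++;
}

/* Record that n pages are finished with; the op completes at zero. */
void sk_retrieval_complete(struct sk_retrieval *op, int n)
{
	assert(n > 0 && n <= op->n_pages);
	op->n_pages -= n;
	if (op->n_pages == 0)
		op->complete = true;
}

/* Finish one page: account for it first, then hand it back, so the op
 * never still looks outstanding once the netfs holds all its pages. */
void sk_retrieval_finish_page(struct sk_retrieval *op, void *page)
{
	sk_retrieval_complete(op, 1);
	sk_return_page_to_netfs(page);
}
```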

Signed-off-by: David Howells

doc: fix broken references

2011-09-27T16:08:04+00:00

There are numerous broken references to Documentation files (in other
Documentation files, in comments, etc.). These broken references are
caused by typos in the references, and by renames or removals of the
Documentation files. Some broken references are simply odd.

Fix these broken references, sometimes by dropping the irrelevant text
they were part of.

Signed-off-by: Paul Bolle 
Signed-off-by: Jiri Kosina

FS-Cache: Add a helper to bulk uncache pages on an inode

2011-07-07T20:21:56+00:00

Add an FS-Cache helper to bulk uncache pages on an inode.  This will
only work for the circumstance where the pages in the cache correspond
1:1 with the pages attached to an inode's page cache.

This is required for CIFS and NFS: when disabling the inode cookie, we were
returning the cookie and setting cifsi->fscache to NULL but failed to
invalidate any previously mapped pages.  This resulted in "Bad page
state" errors and manifested as other kinds of errors when running
fsstress.  Fix it by uncaching mapped pages when we disable the inode
cookie.

This patch should fix the following oops and "Bad page state" errors
seen during fsstress testing.

  ------------[ cut here ]------------
  kernel BUG at fs/cachefiles/namei.c:201!
  invalid opcode: 0000 [#1] SMP
  Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs
  RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  RSP: 0018:ffff88002ce6dd00  EFLAGS: 00010282
  RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282
  RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300
  R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840
  R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0
  FS:  00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0)
  Stack:
   0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00
   ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380
   ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56
  Call Trace:
   cachefiles_lookup_object+0x78/0xd4 [cachefiles]
   fscache_lookup_object+0x131/0x16d [fscache]
   fscache_object_work_func+0x1bc/0x669 [fscache]
   process_one_work+0x186/0x298
   worker_thread+0xda/0x15d
   kthread+0x84/0x8c
   kernel_thread_helper+0x4/0x10
  RIP  cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  ---[ end trace 1d481c9af1804caa ]---

I tested the uncaching by the following means:

 (1) Create a big file on my NFS server (104857600 bytes).

 (2) Read the file into the cache with md5sum on the NFS client.  Look in
     /proc/fs/fscache/stats:

	Pages  : mrk=25601 unc=0

 (3) Open the file for read/write ("bash 5<>/warthog/bigfile").  Look in proc
     again:

	Pages  : mrk=25601 unc=25601
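The bulk uncache can be pictured as a walk over every page attached to the
inode that clears the cache mark on each one.  This is a minimal sketch
assuming the 1:1 page correspondence described above.  The page model, the
counters (named after the mrk/unc stats fields) and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model: only the "cache knows this page" mark. */
struct sk_page {
	bool cache_marked;
};

/* Counters loosely mirroring the "Pages  : mrk=N unc=N" stats line. */
struct sk_stats {
	long mrk;
	long unc;
};

void sk_mark_page(struct sk_page *p, struct sk_stats *st)
{
	if (!p->cache_marked) {
		p->cache_marked = true;
		st->mrk++;
	}
}

/* Walk every page attached to the inode and uncache each marked one.
 * Only valid when cache pages map 1:1 onto the inode's page-cache pages.
 * Returns the number of pages uncached. */
size_t sk_uncache_all_inode_pages(struct sk_page *pages, size_t n,
				  struct sk_stats *st)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].cache_marked) {
			pages[i].cache_marked = false;
			st->unc++;
			count++;
		}
	}
	return count;
}
```

As in the test above, reading the file drives mrk up, and disabling the
cookie should then bring unc up to match.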

Reported-by: Jeff Layton 
Signed-off-by: David Howells 
Reviewed-and-Tested-by: Suresh Jayaraman 
cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Fix common misspellings

2011-03-31T14:26:23+00:00

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi

fscache: convert object to use workqueue instead of slow-work

2010-07-22T20:58:34+00:00

Make fscache object state transition callbacks use workqueue instead
of slow-work.  New dedicated unbound CPU workqueue fscache_object_wq
is created.  get/put callbacks are renamed and modified to take
@object and called directly from the enqueue wrapper and the work
function.  While at it, make all open-coded instances of get/put
use fscache_get/put_object().

* Unbound workqueue is used.

* work_busy() output is printed instead of slow-work flags in object
  debugging outputs.  They mean basically the same thing bit-for-bit.

* sysctl fscache.object_max_active added to control concurrency.  The
  default value is nr_cpus clamped between 4 and
  WQ_UNBOUND_MAX_ACTIVE.

* slow_work_sleep_till_thread_needed() is replaced with fscache
  private implementation fscache_object_sleep_till_congested() which
  waits on fscache_object_wq congestion.

* debugfs support is dropped for now.  Tracing API based debug
  facility is planned to be added.
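The default for fscache.object_max_active described above is a plain clamp.
This sketch assumes the unbound workqueue limit is passed in as a parameter
rather than read from the kernel's WQ_UNBOUND_MAX_ACTIVE, and the function
names are hypothetical.

```c
#include <assert.h>

/* Clamp v into the range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical default for fscache.object_max_active: the CPU count
 * clamped between 4 and the unbound workqueue max_active limit. */
int default_object_max_active(int nr_cpus, int wq_unbound_max_active)
{
	return clamp_int(nr_cpus, 4, wq_unbound_max_active);
}
```

A small machine therefore still gets at least 4 concurrent object workers,
while a very large one is capped by the workqueue limit.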

Signed-off-by: Tejun Heo 
Acked-by: David Howells

CacheFiles: Catch an overly long wait for an old active object

2009-11-19T18:12:05+00:00

Catch an overly long wait for an old, dying active object when we want to
replace it with a new one.  The likely cause is that all the slow-work threads
are hogged, so the deletion can't get a look in.

What we do instead is:

 (1) if there's nothing in the slow work queue, we sleep until either the dying
     object has finished dying or there is something in the slow work queue
     behind which we can queue our object.

 (2) if there is something in the slow work queue, we return ETIMEDOUT to
     fscache_lookup_object(), which then puts us back on the slow work queue,
     presumably behind the deletion that we're blocked by.  We are then
     deferred for a while until we work our way back through the queue -
     without blocking a slow-work thread unnecessarily.
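The two-way decision above can be sketched as a function that either sleeps
in place or returns -ETIMEDOUT so that the caller requeues the lookup.  This
assumes the only input is the number of items waiting on the work queue; the
sleep itself is elided, and the names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of finding an old, dying object in our way. */
enum sk_wait_action { SK_SLEEP, SK_REQUEUE };

/* Returns 0 after choosing to sleep in place, or -ETIMEDOUT to tell the
 * caller (the lookup) to put itself back on the work queue. */
int sk_wait_for_dying_object(int queued_work_items, enum sk_wait_action *action)
{
	if (queued_work_items == 0) {
		/* Nothing to queue behind: sleep until the old object has
		 * died or some work appears on the queue. */
		*action = SK_SLEEP;
		return 0;
	}
	/* Requeue behind (presumably) the deletion we are waiting for,
	 * rather than pinning a worker thread. */
	*action = SK_REQUEUE;
	return -ETIMEDOUT;
}
```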

A backtrace similar to the following may appear in the log without this patch:

	INFO: task kslowd004:5711 blocked for more than 120 seconds.
	"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	kslowd004     D 0000000000000000     0  5711      2 0x00000080
	 ffff88000340bb80 0000000000000046 ffff88002550d000 0000000000000000
	 ffff88002550d000 0000000000000007 ffff88000340bfd8 ffff88002550d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff88002550d2a8
	Call Trace:
	 [] ? trace_hardirqs_on+0xd/0xf
	 [] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [] cachefiles_wait_bit+0x9/0xd [cachefiles]
	 [] __wait_on_bit+0x43/0x76
	 [] ? ext3_xattr_get+0x1ec/0x270
	 [] out_of_line_wait_on_bit+0x69/0x74
	 [] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [] ? wake_bit_function+0x0/0x2e
	 [] cachefiles_mark_object_active+0x203/0x23b [cachefiles]
	 [] cachefiles_walk_to_object+0x558/0x827 [cachefiles]
	 [] cachefiles_lookup_object+0xac/0x12a [cachefiles]
	 [] fscache_lookup_object+0x1c7/0x214 [fscache]
	 [] fscache_object_state_machine+0xa5/0x52d [fscache]
	 [] fscache_object_slow_work_execute+0x5f/0xa0 [fscache]
	 [] slow_work_execute+0x18f/0x2d1
	 [] slow_work_thread+0x1c5/0x308
	 [] ? autoremove_wake_function+0x0/0x34
	 [] ? slow_work_thread+0x0/0x308
	 [] kthread+0x7a/0x82
	 [] child_rip+0xa/0x20
	 [] ? restore_args+0x0/0x30
	 [] ? kthread+0x0/0x82
	 [] ? child_rip+0x0/0x20
	1 lock held by kslowd004/5711:
	 #0:  (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [] cachefiles_walk_to_object+0x1b3/0x827 [cachefiles]

Signed-off-by: David Howells

FS-Cache: Start processing an object's operations on that object's death

2009-11-19T18:11:45+00:00

Start processing an object's operations when that object moves into the DYING
state as the object cannot be destroyed until all its outstanding operations
have completed.

Furthermore, make sure that read and allocation operations handle being woken
up on a dead object.  Such events are recorded in the Allocs.abt and
Retrvls.abt statistics as viewable through /proc/fs/fscache/stats.

The code for waiting for object activation for the read and allocation
operations is also extracted into its own function as it is much the same in
all cases, differing only in the stats incremented.
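A shared activation-wait helper that differs between callers only in the
counters it is given might look like this.  The object states, the stats
struct and the -ENOBUFS return for a dead object are assumptions made for
illustration, and the actual sleep is elided.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical object states relevant to a waiting reader or allocator. */
enum sk_obj_state { SK_OBJ_LOOKING_UP, SK_OBJ_ACTIVE, SK_OBJ_DYING };

/* Per-caller counters; reads and allocations pass different sets. */
struct sk_wait_stats {
	long waited;  /* had to wait for activation */
	long abt;     /* woken on a dead object (cf. Retrvls.abt/Allocs.abt) */
};

/* Shared helper, run once the waiter has been woken.  Returns 0 if the
 * object is usable, or -ENOBUFS (an assumption here) if it died. */
int sk_wait_for_activation(enum sk_obj_state state_on_wake, bool had_to_wait,
			   struct sk_wait_stats *st)
{
	if (had_to_wait)
		st->waited++;
	if (state_on_wake == SK_OBJ_DYING) {
		st->abt++;
		return -ENOBUFS;
	}
	return 0;
}
```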

Signed-off-by: David Howells

FS-Cache: Handle pages pending storage that get evicted under OOM conditions

2009-11-19T18:11:35+00:00

Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache.  Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.

The problem is typified by the following trace of a stuck process:

	kslowd005     D 0000000000000000     0  4253      2 0x00000080
	 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
	 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
	Call Trace:
	 [] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
	 [] ? autoremove_wake_function+0x0/0x34
	 [] ? __fscache_check_page_write+0x63/0x70 [fscache]
	 [] nfs_fscache_release_page+0x4e/0xc4 [nfs]
	 [] nfs_release_page+0x3c/0x41 [nfs]
	 [] try_to_release_page+0x32/0x3b
	 [] shrink_page_list+0x316/0x4ac
	 [] shrink_inactive_list+0x392/0x67c
	 [] ? __mutex_unlock_slowpath+0x100/0x10b
	 [] ? trace_hardirqs_on_caller+0x10c/0x130
	 [] ? mutex_unlock+0x9/0xb
	 [] shrink_list+0x8d/0x8f
	 [] shrink_zone+0x278/0x33c
	 [] ? ktime_get_ts+0xad/0xba
	 [] try_to_free_pages+0x22e/0x392
	 [] ? isolate_pages_global+0x0/0x212
	 [] __alloc_pages_nodemask+0x3dc/0x5cf
	 [] grab_cache_page_write_begin+0x65/0xaa
	 [] ext3_write_begin+0x78/0x1eb
	 [] generic_file_buffered_write+0x109/0x28c
	 [] ? current_fs_time+0x22/0x29
	 [] __generic_file_aio_write+0x350/0x385
	 [] ? generic_file_aio_write+0x4a/0xae
	 [] generic_file_aio_write+0x60/0xae
	 [] do_sync_write+0xe3/0x120
	 [] ? autoremove_wake_function+0x0/0x34
	 [] ? __dentry_open+0x1a5/0x2b8
	 [] ? dentry_open+0x82/0x89
	 [] cachefiles_write_page+0x298/0x335 [cachefiles]
	 [] fscache_write_op+0x178/0x2c2 [fscache]
	 [] fscache_op_execute+0x7a/0xd1 [fscache]
	 [] slow_work_execute+0x18f/0x2d1
	 [] slow_work_thread+0x1c5/0x308
	 [] ? autoremove_wake_function+0x0/0x34
	 [] ? slow_work_thread+0x0/0x308
	 [] kthread+0x7a/0x82
	 [] child_rip+0xa/0x20
	 [] ? restore_args+0x0/0x30
	 [] ? tg_shares_up+0x171/0x227
	 [] ? kthread+0x0/0x82
	 [] ? child_rip+0x0/0x20

In the above backtrace, the following is happening:

 (1) A page storage operation is being executed by a slow-work thread
     (fscache_write_op()).

 (2) FS-Cache farms the operation out to the cache to perform
     (cachefiles_write_page()).

 (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
     standard write (do_sync_write()) under KERNEL_DS directly from the netfs
     page.

 (4) However, for Ext3 to perform the write, it must allocate some memory, in
     particular, it must allocate at least one page cache page into which it
     can copy the data from the netfs page.

 (5) Under OOM conditions, the memory allocator can't immediately come up with
     a page, so it uses vmscan to find something to discard
     (try_to_free_pages()).

 (6) vmscan finds a clean netfs page it might be able to discard (possibly the
     one it's trying to write out).

 (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
     called with __GFP_WAIT, so the netfs decides to wait for the store to
     complete (__fscache_wait_on_page_write()).

 (8) This blocks a slow-work processing thread - possibly against itself.

The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.

To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed.  This means that some data won't make it into the
cache this time.  To support this, a new FS-Cache function,
fscache_maybe_release_page(), is added to replace what the netfs releasepage()
functions used to do with respect to the cache.

The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan".  There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.
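The four VmScan outcomes can be sketched as a decision over the page's storage
state as seen before and after taking the object lock.  This is a sketch
only.  The state enum, the stats struct and sk_maybe_release_page() are
hypothetical and are not the signature of fscache_maybe_release_page().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page-storage states seen by a releasepage() helper. */
enum sk_store_state { SK_NOT_PENDING, SK_PENDING, SK_BEING_WRITTEN };

/* Counters named after the "VmScan" line in /proc/fs/fscache/stats. */
struct sk_vmscan_stats { long nos, gon, bsy, can; };

/* 'first_look' is the state seen without the object lock, 'under_lock'
 * the state seen after taking it.  Returns true if the page may go. */
bool sk_maybe_release_page(enum sk_store_state first_look,
			   enum sk_store_state under_lock,
			   struct sk_vmscan_stats *st)
{
	if (first_look == SK_NOT_PENDING) {
		st->nos++;      /* never pending storage */
		return true;
	}
	if (under_lock == SK_NOT_PENDING) {
		st->gon++;      /* storage finished while we took the lock */
		return true;
	}
	if (under_lock == SK_BEING_WRITTEN) {
		st->bsy++;      /* mid-write: leave it alone, don't wait */
		return false;
	}
	st->can++;              /* pending but not started: cancel the store */
	return true;
}
```

The key point is that no branch waits: a page in the middle of being written
is simply refused, which avoids the self-deadlock in the backtrace above.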

What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages.  If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.

Signed-off-by: David Howells

FS-Cache: Handle read request vs lookup, creation or other cache failure

2009-11-19T18:11:32+00:00

FS-Cache doesn't correctly handle the netfs requesting a read from the cache
on an object that failed or was withdrawn by the cache.  A trace similar to
the following might be seen:

	CacheFiles: Lookup failed error -105
	[exe   ] unexpected submission OP165afe [OBJ6cac OBJECT_LC_DYING]
	[exe   ] objstate=OBJECT_LC_DYING [OBJECT_LC_DYING]
	[exe   ] objflags=0
	[exe   ] objevent=9 [fffffffffffffffb]
	[exe   ] ops=0 inp=0 exc=0
	Pid: 6970, comm: exe Not tainted 2.6.32-rc6-cachefs #50
	Call Trace:
	 [] fscache_submit_op+0x3ff/0x45a [fscache]
	 [] __fscache_read_or_alloc_pages+0x187/0x3c4 [fscache]
	 [] ? nfs_readpage_from_fscache_complete+0x0/0x66 [nfs]
	 [] __nfs_readpages_from_fscache+0x7e/0x176 [nfs]
	 [] ? __alloc_pages_nodemask+0x11c/0x5cf
	 [] nfs_readpages+0x114/0x1d7 [nfs]
	 [] __do_page_cache_readahead+0x15f/0x1ec
	 [] ? __do_page_cache_readahead+0x73/0x1ec
	 [] ra_submit+0x1c/0x20
	 [] ondemand_readahead+0x227/0x23a
	 [] page_cache_sync_readahead+0x17/0x19
	 [] generic_file_aio_read+0x236/0x5a0
	 [] nfs_file_read+0xe4/0xf3 [nfs]
	 [] do_sync_read+0xe3/0x120
	 [] ? _spin_unlock_irq+0x2b/0x31
	 [] ? autoremove_wake_function+0x0/0x34
	 [] ? selinux_file_permission+0x5d/0x10f
	 [] ? thread_return+0x3e/0x101
	 [] ? security_file_permission+0x11/0x13
	 [] vfs_read+0xaa/0x16f
	 [] ? trace_hardirqs_on_caller+0x10c/0x130
	 [] sys_read+0x45/0x6c
	 [] system_call_fastpath+0x16/0x1b

The object state might also be OBJECT_DYING or OBJECT_WITHDRAWING.

This should be handled by simply rejecting the new operation with ENOBUFS.
There's no need to log an error for it.  Events of this type now appear in the
stats file under Ops:rej.
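The quiet rejection can be sketched as a submit function that returns -ENOBUFS
for any failed or withdrawn object state and bumps a "rej" counter instead of
logging.  The state names echo those in the trace above, but the enum, struct
and function are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lifecycle states, named after those in the trace above. */
enum sk_lc_state {
	SK_OBJECT_ACTIVE,
	SK_OBJECT_LC_DYING,
	SK_OBJECT_DYING,
	SK_OBJECT_WITHDRAWING,
};

struct sk_submit_stats { long ok, rej; };

/* Submit a read/alloc op: reject it with -ENOBUFS if the object has
 * failed or is being withdrawn, counting it rather than logging it. */
int sk_submit_op(enum sk_lc_state state, struct sk_submit_stats *st)
{
	switch (state) {
	case SK_OBJECT_LC_DYING:
	case SK_OBJECT_DYING:
	case SK_OBJECT_WITHDRAWING:
		st->rej++;   /* expected event: no error message */
		return -ENOBUFS;
	default:
		st->ok++;
		return 0;
	}
}
```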

Signed-off-by: David Howells