<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/fscache/stats.c, branch v4.5</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>FS-Cache: Count the number of initialised operations</title>
<updated>2015-04-02T13:28:53+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2015-02-25T13:21:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=03cdd0e4b9a98ae995b81cd8f58e992ec3f44ae2'/>
<id>03cdd0e4b9a98ae995b81cd8f58e992ec3f44ae2</id>
<content type='text'>
Count and display through /proc/fs/fscache/stats the number of initialised
operations.
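
As a purely illustrative sketch (placeholder names, assuming the usual
fscache stats pattern rather than quoting the patch), such a counter is an
atomic_t bumped when an operation reaches the initialised state and printed
from the stats seq_file:

	/* Illustrative sketch only - placeholder names, not the actual patch. */
	static atomic_t fscache_n_op_initialised;	/* operations initialised so far */

	/* bumped at the point an operation is initialised */
	atomic_inc(&amp;fscache_n_op_initialised);

	/* emitted from the /proc/fs/fscache/stats show routine */
	seq_printf(m, "Ops    : ini=%u\n",
		   atomic_read(&amp;fscache_n_op_initialised));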

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</content>
</entry>
<entry>
<title>FS-Cache: Count culled objects and objects rejected due to lack of space</title>
<updated>2015-02-24T10:05:27+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2015-02-19T23:47:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=182d919b84902eece162c63ed3d476c8016b4197'/>
<id>182d919b84902eece162c63ed3d476c8016b4197</id>
<content type='text'>
Count the number of objects that get culled by the cache backend and the
number of objects that the cache backend declines to instantiate due to lack
of space in the cache.

These numbers are made available through /proc/fs/fscache/stats
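
As a hedged sketch of the pattern (placeholder names, not the patch itself),
the cache backend bumps one counter when it culls an object and another when
it turns an object away for lack of space, and the stats file prints both:

	/* Sketch with placeholder names. */
	atomic_inc(&amp;fscache_n_cache_culled_objects);	/* object culled by the backend */
	atomic_inc(&amp;fscache_n_cache_no_space_reject);	/* object refused: cache full */

	/* in the /proc/fs/fscache/stats show routine */
	seq_printf(m, "CacheEv: nsp=%d cul=%d\n",
		   atomic_read(&amp;fscache_n_cache_no_space_reject),
		   atomic_read(&amp;fscache_n_cache_culled_objects));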

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Steve Dickson &lt;steved@redhat.com&gt;
Acked-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
</content>
</entry>
<entry>
<title>fs/fscache/stats.c: fix memory leak</title>
<updated>2013-04-29T22:54:27+00:00</updated>
<author>
<name>Anurup m</name>
<email>anurup.m@huawei.com</email>
</author>
<published>2013-04-29T22:05:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ec686c9239b4d472052a271c505d04dae84214cc'/>
<id>ec686c9239b4d472052a271c505d04dae84214cc</id>
<content type='text'>
There is a kernel memory leak observed when the proc file
/proc/fs/fscache/stats is read.

The reason is that fscache_stats_open() calls single_open(), but the
corresponding release function is never called when the file is closed.
Fix this by using the correct release function, single_release().
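
The shape of the fix, as a minimal sketch rather than the verbatim diff: a
seq_file set up with single_open() must be torn down with single_release(),
otherwise the buffer that single_open() allocates is leaked on every close:

	/* Minimal sketch of the open/release pairing. */
	static int fscache_stats_open(struct inode *inode, struct file *file)
	{
		return single_open(file, fscache_stats_show, NULL);
	}

	static const struct file_operations fscache_stats_fops = {
		.open		= fscache_stats_open,
		.read		= seq_read,
		.llseek		= seq_lseek,
		.release	= single_release,	/* was seq_release, which leaks */
	};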

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=57101

Signed-off-by: Anurup m &lt;anurup.m@huawei.com&gt;
Cc: shyju pv &lt;shyju.pv@huawei.com&gt;
Cc: Sanil kumar &lt;sanil.kumar@huawei.com&gt;
Cc: Nataraj m &lt;nataraj.m@huawei.com&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>NFS: nfs_migrate_page() does not wait for FS-Cache to finish with a page</title>
<updated>2012-12-20T22:12:03+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2012-12-05T13:34:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8c209ce721444a61b61d9e772746c721e4d8d1e8'/>
<id>8c209ce721444a61b61d9e772746c721e4d8d1e8</id>
<content type='text'>
nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably
leading to the following bad-page-state:

 BUG: Bad page state in process python-bin  pfn:17d39b
 page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null) index:38686 (Tainted: G    B      ---------------- )
 Pid: 31053, comm: python-bin Tainted: G    B      ---------------- 2.6.32-71.24.1.el6.x86_64 #1
 Call Trace:
 [&lt;ffffffff8111bfe7&gt;] bad_page+0x107/0x160
 [&lt;ffffffff8111ee69&gt;] free_hot_cold_page+0x1c9/0x220
 [&lt;ffffffff8111ef19&gt;] __pagevec_free+0x59/0xb0
 [&lt;ffffffff8104b988&gt;] ? flush_tlb_others_ipi+0x128/0x130
 [&lt;ffffffff8112230c&gt;] release_pages+0x21c/0x250
 [&lt;ffffffff8115b92a&gt;] ? remove_migration_pte+0x28a/0x2b0
 [&lt;ffffffff8115f3f8&gt;] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70
 [&lt;ffffffff81122687&gt;] ____pagevec_lru_add+0x167/0x180
 [&lt;ffffffff811226f8&gt;] __lru_cache_add+0x58/0x70
 [&lt;ffffffff81122731&gt;] lru_cache_add_lru+0x21/0x40
 [&lt;ffffffff81123f49&gt;] putback_lru_page+0x69/0x100
 [&lt;ffffffff8115c0bd&gt;] migrate_pages+0x13d/0x5d0
 [&lt;ffffffff81122687&gt;] ? ____pagevec_lru_add+0x167/0x180
 [&lt;ffffffff81152ab0&gt;] ? compaction_alloc+0x0/0x370
 [&lt;ffffffff8115255c&gt;] compact_zone+0x4cc/0x600
 [&lt;ffffffff8111cfac&gt;] ? get_page_from_freelist+0x15c/0x820
 [&lt;ffffffff810672f4&gt;] ? check_preempt_wakeup+0x1c4/0x3c0
 [&lt;ffffffff8115290e&gt;] compact_zone_order+0x7e/0xb0
 [&lt;ffffffff81152a49&gt;] try_to_compact_pages+0x109/0x170
 [&lt;ffffffff8111e94d&gt;] __alloc_pages_nodemask+0x5ed/0x850
 [&lt;ffffffff814c9136&gt;] ? thread_return+0x4e/0x778
 [&lt;ffffffff81150d43&gt;] alloc_pages_vma+0x93/0x150
 [&lt;ffffffff81167ea5&gt;] do_huge_pmd_anonymous_page+0x135/0x340
 [&lt;ffffffff814cb6f6&gt;] ? rwsem_down_read_failed+0x26/0x30
 [&lt;ffffffff81136755&gt;] handle_mm_fault+0x245/0x2b0
 [&lt;ffffffff814ce383&gt;] do_page_fault+0x123/0x3a0
 [&lt;ffffffff814cbdf5&gt;] page_fault+0x25/0x30

nfs_migrate_page() calls nfs_fscache_release_page(), which doesn't actually wait
- even if __GFP_WAIT is set.  The reason it doesn't wait is that
fscache_maybe_release_page() might otherwise deadlock the allocator, as the work
threads writing to the cache may all end up sleeping on memory allocation.

However, I wonder if that is actually a problem.  There are a number of things
I can do to deal with this:

 (1) Make nfs_migrate_page() wait.

 (2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag.

 (3) Set a timeout around the wait.

 (4) Make nfs_migrate_page() return an error if the page is still busy.

For the moment, I'll select (2) and (4).
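
Roughly, (2) and (4) combine as in the hedged sketch below (simplified,
placeholder control flow, not the actual patch): the cache release path only
sleeps when the caller allows it, and the migration path reports the page as
busy rather than freeing it while FS-Cache still holds it:

	/* Hedged sketch only; simplified, placeholder control flow. */

	/* (2) in the cache release path: honour __GFP_WAIT */
	if (PageFsCache(page)) {
		if (!(gfp &amp; __GFP_WAIT))
			return false;		/* may not sleep: refuse the release */
		fscache_wait_on_page_write(cookie, page);
	}

	/* (4) in nfs_migrate_page(): don't migrate a page FS-Cache still owns */
	if (PageFsCache(page))
		return -EBUSY;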

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: Jeff Layton &lt;jlayton@redhat.com&gt;
</content>
</entry>
<entry>
<title>FS-Cache: Provide proper invalidation</title>
<updated>2012-12-20T22:04:07+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2012-12-20T21:52:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ef778e7ae67cd426c30cad43378b908f5eb0bad5'/>
<id>ef778e7ae67cd426c30cad43378b908f5eb0bad5</id>
<content type='text'>
Provide a proper invalidation method rather than relying on the netfs retiring
the cookie it has and getting a new one.  The problem with this is that it isn't
easy for the netfs to make sure that it has completed/cancelled all its
outstanding storage and retrieval operations on the cookie it is retiring.

Instead, have the cache provide an invalidation method that will cancel or wait
for all currently outstanding operations before invalidating the cache, and
will cause new operations to queue up behind that.  Whilst invalidation is in
progress, some requests will be rejected until the cache can stack a barrier on
the operation queue to cause new operations to be deferred behind it.
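
Schematically (a sketch of the ordering only; the details are placeholders),
the invalidation path marks the cookie, deals with the operations already
outstanding, and only then lets deferred operations run again:

	/* Sketch of the ordering only; details are placeholders. */
	set_bit(FSCACHE_COOKIE_INVALIDATING, &amp;cookie-&gt;flags);	/* new ops now defer */
	/* ... cancel or wait for the operations already outstanding ... */
	/* ... ask the cache backend to invalidate the object's data ... */
	clear_bit(FSCACHE_COOKIE_INVALIDATING, &amp;cookie-&gt;flags);
	wake_up_bit(&amp;cookie-&gt;flags, FSCACHE_COOKIE_INVALIDATING);	/* release deferred ops */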

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>fs-cache: order the debugfs stats correctly</title>
<updated>2010-04-07T15:38:05+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2010-04-06T21:35:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cc4fc29e59c386919a7674e203be7822dc968dc0'/>
<id>cc4fc29e59c386919a7674e203be7822dc968dc0</id>
<content type='text'>
Order the debugfs statistics correctly.  The values displayed through a
seq_printf() statement should be in the same order as the names in the
format string.

In the 'Lookups' line, objects created ('crt=') and lookups timed out
('tmo=') have their values transposed.
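
A minimal before/after sketch of the kind of mix-up involved (illustrative
variable names, not the exact line from the file): the format string lists
'crt=' before 'tmo=', so the argument list has to supply them in that order:

	/* before: values attached to the wrong labels */
	seq_printf(m, "Lookups: n=%u neg=%u pos=%u crt=%u tmo=%u\n",
		   n_lookups, n_negative, n_positive, n_timed_out, n_created);

	/* after: argument order matches the names in the format string */
	seq_printf(m, "Lookups: n=%u neg=%u pos=%u crt=%u tmo=%u\n",
		   n_lookups, n_negative, n_positive, n_created, n_timed_out);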

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>CacheFiles: Catch an overly long wait for an old active object</title>
<updated>2009-11-19T18:12:05+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2009-11-19T18:12:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fee096deb4f33897937b974cb2c5168bab7935be'/>
<id>fee096deb4f33897937b974cb2c5168bab7935be</id>
<content type='text'>
Catch an overly long wait for an old, dying active object when we want to
replace it with a new one.  The probability is that all the slow-work threads
are hogged, and the delete can't get a look in.

What we do instead is:

 (1) if there's nothing in the slow work queue, we sleep until either the dying
     object has finished dying or there is something in the slow work queue
     behind which we can queue our object.

 (2) if there is something in the slow work queue, we return ETIMEDOUT to
     fscache_lookup_object(), which then puts us back on the slow work queue,
     presumably behind the deletion that we're blocked by.  We are then
     deferred for a while until we work our way back through the queue -
     without blocking a slow-work thread unnecessarily.
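
Put roughly in code, the two cases above amount to something like the
following hedged sketch (helpers such as slow_work_queue_empty() and
object_is_dead() are hypothetical stand-ins for the real checks):

	/* Hedged sketch; the helper predicates here are hypothetical. */
	if (slow_work_queue_empty()) {
		/* (1) nothing to queue behind: sleep until the old object dies
		 * or the queue gains something we can wait behind */
		wait_event(active_obj_waitq,
			   object_is_dead(old_object) || !slow_work_queue_empty());
	} else if (!wait_event_timeout(active_obj_waitq,
				       object_is_dead(old_object), timeout)) {
		/* (2) give the thread back and requeue behind the pending deletion */
		return -ETIMEDOUT;
	}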

A backtrace similar to the following may appear in the log without this patch:

	INFO: task kslowd004:5711 blocked for more than 120 seconds.
	"echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	kslowd004     D 0000000000000000     0  5711      2 0x00000080
	 ffff88000340bb80 0000000000000046 ffff88002550d000 0000000000000000
	 ffff88002550d000 0000000000000007 ffff88000340bfd8 ffff88002550d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff88002550d2a8
	Call Trace:
	 [&lt;ffffffff81058e21&gt;] ? trace_hardirqs_on+0xd/0xf
	 [&lt;ffffffffa011c4d8&gt;] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [&lt;ffffffffa011c4e1&gt;] cachefiles_wait_bit+0x9/0xd [cachefiles]
	 [&lt;ffffffff81353153&gt;] __wait_on_bit+0x43/0x76
	 [&lt;ffffffff8111ae39&gt;] ? ext3_xattr_get+0x1ec/0x270
	 [&lt;ffffffff813531ef&gt;] out_of_line_wait_on_bit+0x69/0x74
	 [&lt;ffffffffa011c4d8&gt;] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [&lt;ffffffff8104c125&gt;] ? wake_bit_function+0x0/0x2e
	 [&lt;ffffffffa011bc79&gt;] cachefiles_mark_object_active+0x203/0x23b [cachefiles]
	 [&lt;ffffffffa011c209&gt;] cachefiles_walk_to_object+0x558/0x827 [cachefiles]
	 [&lt;ffffffffa011a429&gt;] cachefiles_lookup_object+0xac/0x12a [cachefiles]
	 [&lt;ffffffffa00aa1e9&gt;] fscache_lookup_object+0x1c7/0x214 [fscache]
	 [&lt;ffffffffa00aafc5&gt;] fscache_object_state_machine+0xa5/0x52d [fscache]
	 [&lt;ffffffffa00ab4ac&gt;] fscache_object_slow_work_execute+0x5f/0xa0 [fscache]
	 [&lt;ffffffff81082093&gt;] slow_work_execute+0x18f/0x2d1
	 [&lt;ffffffff8108239a&gt;] slow_work_thread+0x1c5/0x308
	 [&lt;ffffffff8104c0f1&gt;] ? autoremove_wake_function+0x0/0x34
	 [&lt;ffffffff810821d5&gt;] ? slow_work_thread+0x0/0x308
	 [&lt;ffffffff8104be91&gt;] kthread+0x7a/0x82
	 [&lt;ffffffff8100beda&gt;] child_rip+0xa/0x20
	 [&lt;ffffffff8100b87c&gt;] ? restore_args+0x0/0x30
	 [&lt;ffffffff8104be17&gt;] ? kthread+0x0/0x82
	 [&lt;ffffffff8100bed0&gt;] ? child_rip+0x0/0x20
	1 lock held by kslowd004/5711:
	 #0:  (&amp;sb-&gt;s_type-&gt;i_mutex_key#7/1){+.+.+.}, at: [&lt;ffffffffa011be64&gt;] cachefiles_walk_to_object+0x1b3/0x827 [cachefiles]

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>FS-Cache: Start processing an object's operations on that object's death</title>
<updated>2009-11-19T18:11:45+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2009-11-19T18:11:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=60d543ca724be155c2b6166e36a00c80b21bd810'/>
<id>60d543ca724be155c2b6166e36a00c80b21bd810</id>
<content type='text'>
Start processing an object's operations when that object moves into the DYING
state as the object cannot be destroyed until all its outstanding operations
have completed.

Furthermore, make sure that read and allocation operations handle being woken
up on a dead object.  Such events are recorded in the Allocs.abt and
Retrvls.abt statistics as viewable through /proc/fs/fscache/stats.

The code for waiting for object activation for the read and allocation
operations is also extracted into its own function as it is much the same in
all cases, differing only in the stats incremented.
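
A hedged sketch of what such a shared helper might look like (placeholder
names; the point is only the parameterisation by stat counters):

	/* Placeholder-name sketch: one wait helper, told which stats to bump. */
	static int fscache_wait_for_activation_sketch(struct fscache_operation *op,
						      atomic_t *stat_op_waits,
						      atomic_t *stat_object_dead)
	{
		atomic_inc(stat_op_waits);
		/* ... sleep until the operation is activated or its object dies ... */
		if (op_woken_on_dead_object(op)) {	/* hypothetical predicate */
			atomic_inc(stat_object_dead);	/* counted as Allocs.abt or Retrvls.abt */
			return -ENOBUFS;
		}
		return 0;
	}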

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>FS-Cache: Add a retirement stat counter</title>
<updated>2009-11-19T18:11:38+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2009-11-19T18:11:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2175bb06dc6cf2af9c098a1770561f9e63edae4e'/>
<id>2175bb06dc6cf2af9c098a1770561f9e63edae4e</id>
<content type='text'>
Add a stat counter to count retirement events rather than ordinary release
events (the retire argument to fscache_relinquish_cookie()).

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>FS-Cache: Handle pages pending storage that get evicted under OOM conditions</title>
<updated>2009-11-19T18:11:35+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2009-11-19T18:11:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=201a15428bd54f83eccec8b7c64a04b8f9431204'/>
<id>201a15428bd54f83eccec8b7c64a04b8f9431204</id>
<content type='text'>
Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache.  Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.

The problem is typified by the following trace of a stuck process:

	kslowd005     D 0000000000000000     0  4253      2 0x00000080
	 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
	 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
	Call Trace:
	 [&lt;ffffffffa00782d8&gt;] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
	 [&lt;ffffffff8104c0f1&gt;] ? autoremove_wake_function+0x0/0x34
	 [&lt;ffffffffa0078240&gt;] ? __fscache_check_page_write+0x63/0x70 [fscache]
	 [&lt;ffffffffa00b671d&gt;] nfs_fscache_release_page+0x4e/0xc4 [nfs]
	 [&lt;ffffffffa00927f0&gt;] nfs_release_page+0x3c/0x41 [nfs]
	 [&lt;ffffffff810885d3&gt;] try_to_release_page+0x32/0x3b
	 [&lt;ffffffff81093203&gt;] shrink_page_list+0x316/0x4ac
	 [&lt;ffffffff8109372b&gt;] shrink_inactive_list+0x392/0x67c
	 [&lt;ffffffff813532fa&gt;] ? __mutex_unlock_slowpath+0x100/0x10b
	 [&lt;ffffffff81058df0&gt;] ? trace_hardirqs_on_caller+0x10c/0x130
	 [&lt;ffffffff8135330e&gt;] ? mutex_unlock+0x9/0xb
	 [&lt;ffffffff81093aa2&gt;] shrink_list+0x8d/0x8f
	 [&lt;ffffffff81093d1c&gt;] shrink_zone+0x278/0x33c
	 [&lt;ffffffff81052d6c&gt;] ? ktime_get_ts+0xad/0xba
	 [&lt;ffffffff81094b13&gt;] try_to_free_pages+0x22e/0x392
	 [&lt;ffffffff81091e24&gt;] ? isolate_pages_global+0x0/0x212
	 [&lt;ffffffff8108e743&gt;] __alloc_pages_nodemask+0x3dc/0x5cf
	 [&lt;ffffffff81089529&gt;] grab_cache_page_write_begin+0x65/0xaa
	 [&lt;ffffffff8110f8c0&gt;] ext3_write_begin+0x78/0x1eb
	 [&lt;ffffffff81089ec5&gt;] generic_file_buffered_write+0x109/0x28c
	 [&lt;ffffffff8103cb69&gt;] ? current_fs_time+0x22/0x29
	 [&lt;ffffffff8108a509&gt;] __generic_file_aio_write+0x350/0x385
	 [&lt;ffffffff8108a588&gt;] ? generic_file_aio_write+0x4a/0xae
	 [&lt;ffffffff8108a59e&gt;] generic_file_aio_write+0x60/0xae
	 [&lt;ffffffff810b2e82&gt;] do_sync_write+0xe3/0x120
	 [&lt;ffffffff8104c0f1&gt;] ? autoremove_wake_function+0x0/0x34
	 [&lt;ffffffff810b18e1&gt;] ? __dentry_open+0x1a5/0x2b8
	 [&lt;ffffffff810b1a76&gt;] ? dentry_open+0x82/0x89
	 [&lt;ffffffffa00e693c&gt;] cachefiles_write_page+0x298/0x335 [cachefiles]
	 [&lt;ffffffffa0077147&gt;] fscache_write_op+0x178/0x2c2 [fscache]
	 [&lt;ffffffffa0075656&gt;] fscache_op_execute+0x7a/0xd1 [fscache]
	 [&lt;ffffffff81082093&gt;] slow_work_execute+0x18f/0x2d1
	 [&lt;ffffffff8108239a&gt;] slow_work_thread+0x1c5/0x308
	 [&lt;ffffffff8104c0f1&gt;] ? autoremove_wake_function+0x0/0x34
	 [&lt;ffffffff810821d5&gt;] ? slow_work_thread+0x0/0x308
	 [&lt;ffffffff8104be91&gt;] kthread+0x7a/0x82
	 [&lt;ffffffff8100beda&gt;] child_rip+0xa/0x20
	 [&lt;ffffffff8100b87c&gt;] ? restore_args+0x0/0x30
	 [&lt;ffffffff8102ef83&gt;] ? tg_shares_up+0x171/0x227
	 [&lt;ffffffff8104be17&gt;] ? kthread+0x0/0x82
	 [&lt;ffffffff8100bed0&gt;] ? child_rip+0x0/0x20

In the above backtrace, the following is happening:

 (1) A page storage operation is being executed by a slow-work thread
     (fscache_write_op()).

 (2) FS-Cache farms the operation out to the cache to perform
     (cachefiles_write_page()).

 (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
     standard write (do_sync_write()) under KERNEL_DS directly from the netfs
     page.

 (4) However, for Ext3 to perform the write, it must allocate some memory, in
     particular, it must allocate at least one page cache page into which it
     can copy the data from the netfs page.

 (5) Under OOM conditions, the memory allocator can't immediately come up with
     a page, so it uses vmscan to find something to discard
     (try_to_free_pages()).

 (6) vmscan finds a clean netfs page it might be able to discard (possibly the
     one it's trying to write out).

 (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
     called with __GFP_WAIT, so the netfs decides to wait for the store to
     complete (__fscache_wait_on_page_write()).

 (8) This blocks a slow-work processing thread - possibly against itself.

The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.

To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed.  This means that some data won't make it into the
cache this time.  To support this, a new FS-Cache function,
fscache_maybe_release_page(), is added to replace what the netfs releasepage()
functions used to do with respect to the cache.

The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan".  There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.
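
For illustration, a netfs releasepage() implementation would then defer the
decision to the new helper along these lines (hedged sketch; 'cookie' stands
for the object's FS-Cache cookie and the surrounding code is simplified):

	/* Hedged sketch of a netfs releasepage using fscache_maybe_release_page(). */
	static int example_netfs_release_page(struct page *page, gfp_t gfp)
	{
		/* FS-Cache may cancel a pending store, wait (if gfp allows sleeping)
		 * or refuse the release - the nos/gon/bsy/can counters above record
		 * which of those happened */
		if (!fscache_maybe_release_page(cookie, page, gfp))
			return 0;	/* page is still in use by the cache */

		/* ... normal netfs page release work ... */
		return 1;
	}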

What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages.  If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
</feed>
