linux-toradex.git/fs/fscache, branch v4.4.73

FS-Cache: Initialise stores_lock in netfs cookie

2017-06-17T04:39:37+00:00

[ Upstream commit 62deb8187d116581c88c69a2dd9b5c16588545d4 ]

Initialise the stores_lock in fscache netfs cookies.  Technically, it
shouldn't be necessary, since the netfs cookie is an index and stores no
data, but initialising it anyway adds insignificant overhead.

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
Acked-by: Steve Dickson 
Signed-off-by: Al Viro 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

fscache: Clear outstanding writes when disabling a cookie

2017-06-17T04:39:37+00:00

[ Upstream commit 6bdded59c8933940ac7e5b416448276ac89d1144 ]

fscache_disable_cookie() needs to clear the outstanding writes on the
cookie it's disabling because they cannot be completed after.

Without this, nfs_fscache_open_file() gets stuck because it disables the
cookie when the file is opened for writing but can't uncache the pages till
afterwards - otherwise there's a race between the open routine and anyone
who already has it open R/O and is still reading from it.

Looking in /proc/pid/stack of the offending process shows:

[] __fscache_wait_on_page_write+0x82/0x9b [fscache]
[] __fscache_uncache_all_inode_pages+0x91/0xe1 [fscache]
[] nfs_fscache_open_file+0x59/0x9e [nfs]
[] nfs4_file_open+0x17f/0x1b8 [nfsv4]
[] do_dentry_open+0x16d/0x2b7
[] vfs_open+0x5c/0x65
[] path_openat+0x785/0x8fb
[] do_filp_open+0x48/0x9e
[] do_sys_open+0x13b/0x1cb
[] SyS_open+0x19/0x1b
[] do_syscall_64+0x80/0x17a
[] return_from_SYSCALL_64+0x0/0x7a
[] 0xffffffffffffffff

Reported-by: Jianhong Yin 
Signed-off-by: David Howells 
Acked-by: Jeff Layton 
Acked-by: Steve Dickson 
Signed-off-by: Al Viro 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

fscache: Fix dead object requeue

2017-06-17T04:39:36+00:00

[ Upstream commit e26bfebdfc0d212d366de9990a096665d5c0209a ]

Under some circumstances, an fscache object can become queued such that
fscache_object_work_func() can be called once the object is in the
OBJECT_DEAD state.  This results in the kernel oopsing when it tries to
invoke the handler for the state (which is hard coded to 0x2).

The way this comes about is something like the following:

 (1) The object dispatcher is processing a work state for an object.  This
     is done in workqueue context.

 (2) An out-of-band event comes in that isn't masked, causing the object to
     be queued, say EV_KILL.

 (3) The object dispatcher finishes processing the current work state on
     that object and then sees there's another event to process, so,
     without returning to the workqueue core, it processes that event too.
     It then follows the chain of events that initiates until we reach
     OBJECT_DEAD without going through a wait state (such as
     WAIT_FOR_CLEARANCE).

     At this point, object->events may be 0, object->event_mask will be 0
     and oob_event_mask will be 0.

 (4) The object dispatcher returns to the workqueue processor, and in due
     course, this sees that the object's work item is still queued and
     invokes it again.

 (5) The current state is a work state (OBJECT_DEAD), so the dispatcher
     jumps to it - resulting in an OOPS.

When I'm seeing this, the work state in (1) appears to have been either
LOOK_UP_OBJECT or CREATE_OBJECT (object->oob_table is
fscache_osm_lookup_oob).

The window for (2) is very small:

 (A) object->event_mask is cleared whilst the event dispatch process is
     underway - though there's no memory barrier to force this to the top
     of the function.

     The window, therefore is from the time the object was selected by the
     workqueue processor and made requeueable to the time the mask was
     cleared.

 (B) fscache_raise_event() will only queue the object if it manages to set
     the event bit and the corresponding event_mask bit was set.

     The enqueuement is then deferred slightly whilst we get a ref on the
     object and get the per-CPU variable for workqueue congestion.  This
     slight deferral slightly increases the probability by allowing extra
     time for the workqueue to make the item requeueable.

Handle this by giving the dead state a processor function and checking for
the dead state address rather than seeing if the processor function is
address 0x2.  The dead state processor function can then set a flag to
indicate that it's occurred and give a warning if it occurs more than once
per object.
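
A minimal userspace sketch of the approach described above.  All names
here (struct state, struct object, dispatch() and so on) are hypothetical
simplifications for illustration, not the fscache code: the dead state gets
a real handler, it is recognised by address, and a repeat visit only
produces a warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct object;

/* Hypothetical, simplified state descriptor. */
struct state {
	const char *name;
	const struct state *(*work)(struct object *obj);
};

struct object {
	const struct state *state;
	bool dead_seen;		/* set the first time the dead state runs */
	unsigned int warnings;	/* how often the dead state ran again */
};

static const struct state *dead_work(struct object *obj);
static const struct state dead_state = { "OBJECT_DEAD", dead_work };

/* The dead state has a real handler, so dispatching it cannot jump to
 * a bogus address; it records the visit and warns on a repeat. */
static const struct state *dead_work(struct object *obj)
{
	if (obj->dead_seen) {
		obj->warnings++;
		fprintf(stderr, "warning: dead state processed again\n");
	}
	obj->dead_seen = true;
	return NULL;		/* no further transition */
}

static const struct state *kill_work(struct object *obj)
{
	(void)obj;
	return &dead_state;
}
static const struct state kill_state = { "KILL_OBJECT", kill_work };

/* Identify the dead state by its address, not by a magic handler value. */
static bool object_is_dead(const struct object *obj)
{
	return obj->state == &dead_state;
}

/* One invocation of the work item: follow transitions until a handler
 * reports that there is nothing more to do. */
static void dispatch(struct object *obj)
{
	for (;;) {
		const struct state *next = obj->state->work(obj);

		if (!next)
			return;
		obj->state = next;
	}
}

static void example_usage(void)
{
	struct object obj = { &kill_state, false, 0 };

	dispatch(&obj);		/* normal pass: reaches the dead state */
	assert(object_is_dead(&obj) && obj.dead_seen && obj.warnings == 0);

	dispatch(&obj);		/* spurious requeue: warns, does not crash */
	assert(obj.warnings == 1);
}
```

In this model the second dispatch() call stands in for step (4) above: the
work item runs again on an object that is already dead.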

If this race occurs, an oops similar to the following is seen (note the RIP
value):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
IP: [<0000000000000002>] 0x1
PGD 0
Oops: 0010 [#1] SMP
Modules linked in: ...
CPU: 17 PID: 16077 Comm: kworker/u48:9 Not tainted 3.10.0-327.18.2.el7.x86_64 #1
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
Workqueue: fscache_object fscache_object_work_func [fscache]
task: ffff880302b63980 ti: ffff880717544000 task.ti: ffff880717544000
RIP: 0010:[<0000000000000002>]  [<0000000000000002>] 0x1
RSP: 0018:ffff880717547df8  EFLAGS: 00010202
RAX: ffffffffa0368640 RBX: ffff880edf7a4480 RCX: dead000000200200
RDX: 0000000000000002 RSI: 00000000ffffffff RDI: ffff880edf7a4480
RBP: ffff880717547e18 R08: 0000000000000000 R09: dfc40a25cb3a4510
R10: dfc40a25cb3a4510 R11: 0000000000000400 R12: 0000000000000000
R13: ffff880edf7a4510 R14: ffff8817f6153400 R15: 0000000000000600
FS:  0000000000000000(0000) GS:ffff88181f420000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 000000000194a000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffffffffa0363695 ffff880edf7a4510 ffff88093f16f900 ffff8817faa4ec00
 ffff880717547e60 ffffffff8109d5db 00000000faa4ec18 0000000000000000
 ffff8817faa4ec18 ffff88093f16f930 ffff880302b63980 ffff88093f16f900
Call Trace:
 [] ? fscache_object_work_func+0xa5/0x200 [fscache]
 [] process_one_work+0x17b/0x470
 [] worker_thread+0x21c/0x400
 [] ? rescuer_thread+0x400/0x400
 [] kthread+0xcf/0xe0
 [] ? kthread_create_on_node+0x140/0x140
 [] ret_from_fork+0x58/0x90
 [] ? kthread_create_on_node+0x140/0x140

Signed-off-by: David Howells 
Acked-by: Jeremy McNicoll 
Tested-by: Frank Sorenson 
Tested-by: Benjamin Coddington 
Reviewed-by: Benjamin Coddington 
Signed-off-by: Al Viro 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

FS-Cache: Handle a write to the page immediately beyond the EOF marker

2015-11-11T07:11:02+00:00

Handle a write being requested to the page immediately beyond the EOF
marker on a cache object.  Currently this gets an assertion failure in
CacheFiles because the EOF marker is used there to encode information about
a partial page at the EOF - which could lead to an unknown blank spot in
the file if we extend the file over it.

The problem is actually in fscache where we check the index of the page
being written against store_limit.  store_limit is set to the number of
pages that we're allowed to store by fscache_set_store_limit() - which
means it's one more than the index of the last page we're allowed to store.
The problem is that we permit writing to a page with an index _equal_ to
the store limit - when we should reject that case.

Whilst we're at it, change the triggered assertion in CacheFiles to just
return -ENOBUFS instead.
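
The off-by-one can be illustrated with a small standalone sketch.  The
function names and reduced signatures below are hypothetical and are not
the kernel's code; only the comparison against store_limit follows the
description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Model: store_limit is the NUMBER of pages that may be stored, so the
 * valid page indices are 0 .. store_limit - 1. */

/* Old behaviour as described: index == store_limit slips through. */
static bool may_store_page_buggy(unsigned long index, unsigned long store_limit)
{
	return index <= store_limit;
}

/* Fixed behaviour: the page at index == store_limit is rejected. */
static bool may_store_page_fixed(unsigned long index, unsigned long store_limit)
{
	return index < store_limit;
}

/* Returns 0 if the write may proceed, -ENOBUFS otherwise. */
static int check_write(unsigned long index, unsigned long store_limit)
{
	return may_store_page_fixed(index, store_limit) ? 0 : -ENOBUFS;
}

static void example_usage(void)
{
	/* With a limit of 5 pages, index 4 is the last storable page. */
	assert(check_write(4, 5) == 0);
	assert(check_write(5, 5) == -ENOBUFS);
	/* The buggy check lets the page just past the limit through. */
	assert(may_store_page_buggy(5, 5));
}
```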

The assertion failure looks something like this:

CacheFiles: Assertion failed
1000 < 7b1 is false
------------[ cut here ]------------
kernel BUG at fs/cachefiles/rdwr.c:962!
...
RIP: 0010:[]  [] cachefiles_write_page+0x273/0x2d0 [cachefiles]

Cc: stable@vger.kernel.org # v2.6.31+; earlier - that + backport of a17754f (at least)
Signed-off-by: David Howells 
Signed-off-by: Al Viro

FS-Cache: Don't override netfs's primary_index if registering failed

2015-11-11T07:07:51+00:00

Only override netfs->primary_index when registration succeeds.

Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Kinglong Mee 
Signed-off-by: David Howells 
Signed-off-by: Al Viro

FS-Cache: Increase reference of parent after registering, netfs success

2015-11-11T07:06:53+00:00

If the netfs already exists, fscache should not increase the parent's
usage and n_children counts; otherwise they are never decreased.

v2: following David's suggestion,
 take the parent reference only on success
 use kmem_cache_free() to free primary_index directly

v3: don't move "netfs->primary_index->parent = &fscache_fsdef_index;"
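
A rough userspace model of the ordering this change is after.  The
structures, the name-based duplicate check and register_netfs() are
assumptions made for illustration, not the fscache code: the parent's
counters and netfs->primary_index are only touched once registration is
known to succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for the cookie and netfs structures. */
struct index {
	int usage;
	int n_children;
	struct index *parent;
};

struct netfs {
	const char *name;
	struct index *primary_index;
	struct netfs *next;
};

static struct index root_index = { 1, 0, NULL };
static struct netfs *registered;

static int register_netfs(struct netfs *netfs)
{
	struct index *idx = calloc(1, sizeof(*idx));
	struct netfs *p;

	if (!idx)
		return -ENOMEM;
	idx->usage = 1;
	idx->parent = &root_index;

	/* Assumed failure case: a netfs of the same name already exists. */
	for (p = registered; p; p = p->next) {
		if (strcmp(p->name, netfs->name) == 0) {
			free(idx);	/* parent counters were never touched */
			return -EEXIST;
		}
	}

	/* Success: only now pin the parent and publish the new index. */
	root_index.usage++;
	root_index.n_children++;
	netfs->primary_index = idx;
	netfs->next = registered;
	registered = netfs;
	return 0;
}

static void example_usage(void)
{
	static struct netfs first = { "nfs", NULL, NULL };
	static struct netfs dup = { "nfs", NULL, NULL };

	assert(register_netfs(&first) == 0);
	assert(root_index.usage == 2 && root_index.n_children == 1);

	/* A failed registration leaves the counters and dup untouched. */
	assert(register_netfs(&dup) == -EEXIST);
	assert(root_index.usage == 2 && root_index.n_children == 1);
	assert(dup.primary_index == NULL);
}
```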

Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Kinglong Mee 
Signed-off-by: David Howells 
Signed-off-by: Al Viro

mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd

2015-11-07T01:50:42+00:00

__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access to one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimistic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites:

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.
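
The false positive mentioned above can be shown with a small userspace
model.  The M_* bit values, the gfp_model_t type and both helper names are
invented for illustration and do not match the kernel's definitions; only
the relationship between the flags follows the description in this
message.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real definitions differ. */
typedef unsigned int gfp_model_t;

#define M_GFP_ATOMIC		0x01u
#define M_GFP_DIRECT_RECLAIM	0x02u
#define M_GFP_KSWAPD_RECLAIM	0x04u
/* __GFP_WAIT as redefined: direct reclaim plus waking kswapd. */
#define M_GFP_WAIT	(M_GFP_DIRECT_RECLAIM | M_GFP_KSWAPD_RECLAIM)

/* Models the helper's intent: may this caller sleep in direct reclaim? */
static bool model_allow_blocking(gfp_model_t flags)
{
	return (flags & M_GFP_DIRECT_RECLAIM) != 0;
}

/* Historical-style test: true if any bit of the old flag is set. */
static bool model_old_wait_test(gfp_model_t flags)
{
	return (flags & M_GFP_WAIT) != 0;
}

static void example_usage(void)
{
	/* A mempool-backed caller: wakes kswapd but must not block. */
	gfp_model_t mempool_like = M_GFP_WAIT & ~M_GFP_DIRECT_RECLAIM;

	assert(model_old_wait_test(mempool_like));	/* false positive */
	assert(!model_allow_blocking(mempool_like));	/* correct answer */

	/* An ordinary sleeping caller may block. */
	assert(model_allow_blocking(M_GFP_WAIT));
	/* A truly atomic caller may not. */
	assert(!model_allow_blocking(M_GFP_ATOMIC));
}
```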

The first key hazard to watch out for is callers that removed __GFP_WAIT
and were depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman 
Acked-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Acked-by: Johannes Weiner 
Cc: Christoph Lameter 
Cc: David Rientjes 
Cc: Vitaly Wool 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

KEYS: Merge the type-specific data with the payload data

2015-10-21T14:18:36+00:00

Merge the type-specific data with the payload data into one four-word chunk
as it seems pointless to keep them separate.

Use user_key_payload() for accessing the payloads of overloaded
user-defined keys.

Signed-off-by: David Howells 
cc: linux-cifs@vger.kernel.org
cc: ecryptfs@vger.kernel.org
cc: linux-ext4@vger.kernel.org
cc: linux-f2fs-devel@lists.sourceforge.net
cc: linux-nfs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: linux-ima-devel@lists.sourceforge.net

FS-Cache: Retain the netfs context in the retrieval op earlier

2015-04-02T13:28:53+00:00

Now that the retrieval operation may be disposed of by fscache_put_operation()
before we actually set the context, the retrieval-specific cleanup operation
can produce a NULL-pointer dereference when it tries to unconditionally clean
up the netfs context.

Given that it is expected that we'll get at least as far as the place where we
currently set the context pointer and it is unlikely we'll go through the
error handling paths prior to that point, retain the context right from the
point that the retrieval op is allocated.

Concomitant to this, we need to retain the cookie pointer in the retrieval op
also so that we can call the netfs to release its context in the release
method.

In addition, we might now get into fscache_release_retrieval_op() with the op
only initialised.  To this end, set the operation to DEAD only after the
release method has been called and skip the n_pages test upon cleanup if the
op is still in the INITIALISED state.

Without these changes, the following oops might be seen:

	BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
	...
	RIP: 0010:[] fscache_release_retrieval_op+0xae/0x100
	...
	Call Trace:
	 [] fscache_put_operation+0x117/0x2e0
	 [] __fscache_read_or_alloc_pages+0x351/0x3ac
	 [] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
	 [] nfs_readpages+0x10c/0x185 [nfs]
	 [] ? alloc_pages_current+0x119/0x13e
	 [] ? __page_cache_alloc+0xfb/0x10a
	 [] __do_page_cache_readahead+0x188/0x22c
	 [] ondemand_readahead+0x29e/0x2af
	 [] page_cache_sync_readahead+0x38/0x3a
	 [] generic_file_read_iter+0x1a2/0x55a
	 [] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
	 [] nfs_file_read+0x49/0x70 [nfs]
	 [] new_sync_read+0x78/0x9c
	 [] __vfs_read+0x13/0x38
	 [] vfs_read+0x95/0x121
	 [] SyS_read+0x4c/0x8a
	 [] system_call_fastpath+0x12/0x17

Signed-off-by: David Howells 
Reviewed-by: Steve Dickson 
Acked-by: Jeff Layton

FS-Cache: The operation cancellation method needs calling in more places

2015-04-02T13:28:53+00:00

Any time an incomplete operation is cancelled, the operation cancellation
function needs to be called to clean up.  This is currently being passed
directly to some of the functions that might want to call it, but not all.

Instead, pass the cancellation method pointer to fscache_operation_init()
and have that cache it in the operation struct.  Further, plug in a dummy
cancellation handler if the caller declines to set one as this allows us to
call the function unconditionally (the extra overhead isn't worth bothering
about as we don't expect to be calling this typically).

The cancellation method must thence be called everywhere the CANCELLED state
is set.  Note that we call it *before* setting the CANCELLED state such that
the method can use the old state value to guide its operation.
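
The pattern described above might be sketched as follows.  This is a
simplified userspace model with hypothetical names (struct operation,
operation_init(), operation_cancel()), not the fscache implementation: the
handler is cached at init time, a dummy is substituted when none is given,
and the handler runs before the state changes.

```c
#include <assert.h>
#include <stddef.h>

enum op_state { OP_INITIALISED, OP_PENDING, OP_IN_PROGRESS, OP_CANCELLED };

struct operation;
typedef void (*cancel_fn)(struct operation *op);

struct operation {
	enum op_state state;
	cancel_fn cancel;			/* never NULL after init */
	enum op_state state_seen_by_cancel;	/* recorded for the demo */
	int cancel_calls;
};

/* Dummy handler so the cancel method can be called unconditionally. */
static void dummy_cancel(struct operation *op)
{
	(void)op;
}

static void operation_init(struct operation *op, cancel_fn cancel)
{
	op->state = OP_INITIALISED;
	op->cancel = cancel ? cancel : dummy_cancel;
	op->state_seen_by_cancel = OP_INITIALISED;
	op->cancel_calls = 0;
}

/* Call the handler *before* setting CANCELLED so that it can use the
 * old state value to guide its cleanup. */
static void operation_cancel(struct operation *op)
{
	op->cancel(op);
	op->state = OP_CANCELLED;
}

/* Example handler that records which state it observed. */
static void recording_cancel(struct operation *op)
{
	op->state_seen_by_cancel = op->state;
	op->cancel_calls++;
}

static void example_usage(void)
{
	struct operation a, b;

	operation_init(&a, recording_cancel);
	a.state = OP_PENDING;
	operation_cancel(&a);
	assert(a.state_seen_by_cancel == OP_PENDING);
	assert(a.state == OP_CANCELLED && a.cancel_calls == 1);

	operation_init(&b, NULL);	/* caller declines to set a handler */
	operation_cancel(&b);		/* still safe to call */
	assert(b.state == OP_CANCELLED && b.cancel_calls == 0);
}
```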

fscache_do_cancel_retrieval() needs moving higher up in the sources so that
the init function can use it now.

Without this, the following oops may be seen:

	FS-Cache: Assertion failed
	FS-Cache: 3 == 0 is false
	------------[ cut here ]------------
	kernel BUG at ../fs/fscache/page.c:261!
	...
	RIP: 0010:[]  fscache_release_retrieval_op+0x77/0x100
	 [] fscache_put_operation+0x114/0x2da
	 [] __fscache_read_or_alloc_pages+0x358/0x3b3
	 [] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
	 [] nfs_readpages+0x10c/0x185 [nfs]
	 [] ? alloc_pages_current+0x119/0x13e
	 [] ? __page_cache_alloc+0xfb/0x10a
	 [] __do_page_cache_readahead+0x188/0x22c
	 [] ondemand_readahead+0x29e/0x2af
	 [] page_cache_sync_readahead+0x38/0x3a
	 [] generic_file_read_iter+0x1a2/0x55a
	 [] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
	 [] nfs_file_read+0x49/0x70 [nfs]
	 [] new_sync_read+0x78/0x9c
	 [] __vfs_read+0x13/0x38
	 [] vfs_read+0x95/0x121
	 [] SyS_read+0x4c/0x8a
	 [] system_call_fastpath+0x12/0x17

The assertion is showing that the remaining number of pages (n_pages) is not 0
when the operation is being released.

Signed-off-by: David Howells 
Reviewed-by: Steve Dickson 
Acked-by: Jeff Layton