linux-toradex.git/fs/dcache.c, branch v3.2.73

dcache: Handle escaped paths in prepend_path

2015-10-13T02:46:04+00:00

commit cde93be45a8a90d8c264c776fab63487b5038a65 upstream.

A rename can result in a dentry that by walking up d_parent
will never reach it's mnt_root.  For lack of a better term
I call this an escaped path.

prepend_path is called by four different functions __d_path,
d_absolute_path, d_path, and getcwd.

__d_path only wants to see paths are connected to the root it passes
in.  So __d_path needs prepend_path to return an error.

d_absolute_path similarly wants to see paths that are connected to
some root.  Escaped paths are not connected to any mnt_root so
d_absolute_path needs prepend_path to return an error greater
than 1.  So escaped paths will be treated like paths on lazily
unmounted mounts.

getcwd needs to prepend "(unreachable)" so getcwd also needs
prepend_path to return an error.

d_path is the interesting hold out.  d_path just wants to print
something, and does not care about the weird cases.  Which raises
the question what should be printed?

Given that / should result in -ENOENT I
believe it is desirable for escaped paths to be printed as empty
paths.  As there are not really any meaninful path components when
considered from the perspective of a mount tree.

So tweak prepend_path to return an empty path with an new error
code of 3 when it encounters an escaped path.

Signed-off-by: "Eric W. Biederman" 
Signed-off-by: Al Viro 
Signed-off-by: Ben Hutchings

d_walk() might skip too much

2015-08-06T23:32:13+00:00

commit 2159184ea01e4ae7d15f2017e296d4bc82d5aeb0 upstream.

when we find that a child has died while we'd been trying to ascend,
we should go into the first live sibling itself, rather than its sibling.

Off-by-one in question had been introduced in "deal with deadlock in
d_walk()" and the fix needs to be backported to all branches this one
has been backported to.

Signed-off-by: Al Viro 
[bwh: Backported to 3.2: apply to the 3 copies of this logic we
 ended up with]
Signed-off-by: Ben Hutchings

dcache: Fix locking bugs in backported "deal with deadlock in d_walk()"

2015-02-20T00:49:41+00:00

Steven Rostedt reported:
> Porting -rt to the latest 3.2 stable tree I triggered this bug:
> 
> =====================================
> [ BUG: bad unlock balance detected! ]
> -------------------------------------
> rm/1638 is trying to release lock (rcu_read_lock) at:
> [] rcu_read_unlock+0x0/0x23
> but there are no more locks to release!
> 
> other info that might help us debug this:
> 2 locks held by rm/1638:
>  #0:  (&sb->s_type->i_mutex_key#9/1){+.+.+.}, at: [] do_rmdir+0x5f/0xd2
>  #1:  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [] vfs_rmdir+0x49/0xac
> 
> stack backtrace:
> Pid: 1638, comm: rm Not tainted 3.2.66-test-rt96+ #2
> Call Trace:
>  [] ? printk+0x1d/0x1f
>  [] print_unlock_inbalance_bug+0xc3/0xcd
>  [] lock_release_non_nested+0x98/0x1ec
>  [] ? trace_hardirqs_off_caller+0x18/0x90
>  [] ? local_clock+0x2d/0x50
>  [] ? d_hash+0x2f/0x2f
>  [] ? d_hash+0x2f/0x2f
>  [] lock_release+0x192/0x1ad
>  [] rcu_read_unlock+0x17/0x23
>  [] shrink_dcache_parent+0x227/0x270
>  [] vfs_rmdir+0x68/0xac
>  [] do_rmdir+0x98/0xd2
>  [] ? fput+0x1a3/0x1ab
>  [] ? sysenter_exit+0xf/0x1a
>  [] ? trace_hardirqs_on_caller+0x118/0x149
>  [] sys_unlinkat+0x2b/0x35
>  [] sysenter_do_call+0x12/0x12
> 
> 
> 
> 
> There's a path to calling rcu_read_unlock() without calling
> rcu_read_lock() in have_submounts().
> 
> 	goto positive;
> 
> positive:
> 	if (!locked && read_seqretry(&rename_lock, seq))
> 		goto rename_retry;
> 
> rename_retry:
> 	rcu_read_unlock();
> 
> in the above path, rcu_read_lock() is never done before calling
> rcu_read_unlock();

I reviewed locking contexts in all three functions that I changed when
backporting "deal with deadlock in d_walk()".  It's actually worse
than this:

- We don't hold this_parent->d_lock at the 'positive' label in
  have_submounts(), but it is unlocked after 'rename_retry'.
- There is an rcu_read_unlock() after the 'out' label in
  select_parent(), but it's not held at the 'goto out'.

Fix all three lock imbalances.

Reported-by: Steven Rostedt 
Signed-off-by: Ben Hutchings 
Tested-by: Steven Rostedt

deal with deadlock in d_walk()

2015-01-01T01:27:51+00:00

commit ca5358ef75fc69fee5322a38a340f5739d997c10 upstream.

... by not hitting rename_retry for reasons other than rename having
happened.  In other words, do _not_ restart when finding that
between unlocking the child and locking the parent the former got
into __dentry_kill().  Skip the killed siblings instead...

Signed-off-by: Al Viro 
[bwh: Backported to 3.2:
 - As we only have try_to_ascend() and not d_walk(), apply this
   change to all callers of try_to_ascend()
 - Adjust context to make __dentry_kill() apply to d_kill()]
Signed-off-by: Ben Hutchings

move d_rcu from overlapping d_child to overlapping d_alias

2015-01-01T01:27:50+00:00

commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream.

Signed-off-by: Al Viro 
[bwh: Backported to 3.2:
 - Apply name changes in all the different places we use d_alias and d_child
 - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()]
Signed-off-by: Ben Hutchings

fs/dcache.c: add cond_resched() to shrink_dcache_parent()

2013-05-13T14:02:30+00:00

commit 421348f1ca0bf17769dee0aed4d991845ae0536d upstream.

Call cond_resched() in shrink_dcache_parent() to maintain interactivity.

Before this patch:

	void shrink_dcache_parent(struct dentry * parent)
	{
		while ((found = select_parent(parent, &dispose)) != 0)
			shrink_dentry_list(&dispose);
	}

select_parent() populates the dispose list with dentries which
shrink_dentry_list() then deletes.  select_parent() carefully uses
need_resched() to avoid doing too much work at once.  But neither
shrink_dcache_parent() nor its called functions call cond_resched().  So
once need_resched() is set select_parent() will return single dentry
dispose list which is then deleted by shrink_dentry_list().  This is
inefficient when there are a lot of dentry to process.  This can cause
softlockup and hurts interactivity on non preemptable kernels.

This change adds cond_resched() in shrink_dcache_parent().  The benefit
of this is that need_resched() is quickly cleared so that future calls
to select_parent() are able to efficiently return a big batch of dentry.

These additional cond_resched() do not seem to impact performance, at
least for the workload below.

Here is a program which can cause soft lockup if other system activity
sets need_resched().

	int main()
	{
	        struct rlimit rlim;
	        int i;
	        int f[100000];
	        char buf[20];
	        struct timeval t1, t2;
	        double diff;

	        /* cleanup past run */
	        system("rm -rf x");

	        /* boost nfile rlimit */
	        rlim.rlim_cur = 200000;
	        rlim.rlim_max = 200000;
	        if (setrlimit(RLIMIT_NOFILE, &rlim))
	                err(1, "setrlimit");

	        /* make directory for files */
	        if (mkdir("x", 0700))
	                err(1, "mkdir");

	        if (gettimeofday(&t1, NULL))
	                err(1, "gettimeofday");

	        /* populate directory with open files */
	        for (i = 0; i < 100000; i++) {
	                snprintf(buf, sizeof(buf), "x/%d", i);
	                f[i] = open(buf, O_CREAT);
	                if (f[i] == -1)
	                        err(1, "open");
	        }

	        /* close some of the files */
	        for (i = 0; i < 85000; i++)
	                close(f[i]);

	        /* unlink all files, even open ones */
	        system("rm -rf x");

	        if (gettimeofday(&t2, NULL))
	                err(1, "gettimeofday");

	        diff = (((double)t2.tv_sec * 1000000 + t2.tv_usec) -
	                ((double)t1.tv_sec * 1000000 + t1.tv_usec));

	        printf("done: %g elapsed\n", diff/1e6);
	        return 0;
	}

Signed-off-by: Greg Thelen 
Signed-off-by: Dave Chinner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

Nest rename_lock inside vfsmount_lock

2013-04-10T02:20:03+00:00

commit 7ea600b5314529f9d1b9d6d3c41cb26fce6a7a4a upstream.

... lest we get livelocks between path_is_under() and d_path() and friends.

The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks;
it is possible to have thread B spin on attempt to take lock shared while thread
A is already holding it shared, if B is on lower-numbered CPU than A and there's
a thread C spinning on attempt to take the same lock exclusive.

As the result, we need consistent ordering between vfsmount_lock (lglock) and
rename_lock (seq_lock), even though everything that takes both is going to take
vfsmount_lock only shared.

Spotted-by: Brad Spengler 
Signed-off-by: Al Viro 
[bwh: Backported to 3.2:
 - Adjust context
 - s/&vfsmount_lock/vfsmount_lock/]
Signed-off-by: Ben Hutchings

vfs: d_obtain_alias() needs to use "/" as default name.

2013-01-03T03:33:45+00:00

commit b911a6bdeef5848c468597d040e3407e0aee04ce upstream.

NFS appears to use d_obtain_alias() to create the root dentry rather than
d_make_root.  This can cause 'prepend_path()' to complain that the root
has a weird name if an NFS filesystem is lazily unmounted.  e.g.  if
"/mnt" is an NFS mount then

 { cd /mnt; umount -l /mnt ; ls -l /proc/self/cwd; }

will cause a WARN message like
   WARNING: at /home/git/linux/fs/dcache.c:2624 prepend_path+0x1d7/0x1e0()
   ...
   Root dentry has weird name <>

to appear in kernel logs.

So change d_obtain_alias() to use "/" rather than "" as the anonymous
name.

Signed-off-by: NeilBrown 
Cc: Trond Myklebust 
Cc: Al Viro 
Signed-off-by: Andrew Morton 
Signed-off-by: Al Viro 
[bwh: Backported to 3.2: use named initialisers instead of QSTR_INIT()]
Signed-off-by: Ben Hutchings

vfs: dcache: fix deadlock in tree traversal

2012-10-10T02:31:13+00:00

commit 8110e16d42d587997bcaee0c864179e6d93603fe upstream.

IBM reported a deadlock in select_parent().  This was found to be caused
by taking rename_lock when already locked when restarting the tree
traversal.

There are two cases when the traversal needs to be restarted:

 1) concurrent d_move(); this can only happen when not already locked,
    since taking rename_lock protects against concurrent d_move().

 2) racing with final d_put() on child just at the moment of ascending
    to parent; rename_lock doesn't protect against this rare race, so it
    can happen when already locked.

Because of case 2, we need to be able to handle restarting the traversal
when rename_lock is already held.  This patch fixes all three callers of
try_to_ascend().

IBM reported that the deadlock is gone with this patch.

[ I rewrote the patch to be smaller and just do the "goto again" if the
  lock was already held, but credit goes to Miklos for the real work.
   - Linus ]

Signed-off-by: Miklos Szeredi 
Cc: Al Viro 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill()

2012-10-10T02:30:48+00:00

commit b161dfa6937ae46d50adce8a7c6b12233e96e7bd upstream.

IBM reported a soft lockup after applying the fix for the rename_lock
deadlock.  Commit c83ce989cb5f ("VFS: Fix the nfs sillyrename regression
in kernel 2.6.38") was found to be the culprit.

The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the
dentry was killed.  This flag can be set on non-killed dentries too,
which results in infinite retries when trying to traverse the dentry
tree.

This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is
only set in d_kill() and makes try_to_ascend() test only this flag.

IBM reported successful test results with this patch.

Signed-off-by: Miklos Szeredi 
Cc: Trond Myklebust 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings