linux-toradex.git/fs, branch v3.14.43

deal with deadlock in d_walk()

2015-05-17T16:53:51+00:00

commit ca5358ef75fc69fee5322a38a340f5739d997c10 upstream.

... by not hitting rename_retry for reasons other than rename having
happened.  In other words, do _not_ restart when finding that
between unlocking the child and locking the parent the former got
into __dentry_kill().  Skip the killed siblings instead...

Signed-off-by: Al Viro 
Cc: Ben Hutchings 
[hujianyang: Backported to 3.14 refer to the work of Ben Hutchings in 3.2:
 - Adjust context to make __dentry_kill() apply to d_kill()]
Signed-off-by: hujianyang 
Signed-off-by: Greg Kroah-Hartman

mnt: Fix fs_fully_visible to verify the root directory is visible

2015-05-17T16:53:49+00:00

commit 7e96c1b0e0f495c5a7450dc4aa7c9a24ba4305bd upstream.

This fixes a dumb bug in fs_fully_visible that allows proc or sys to
be mounted if there is a bind mount of part of /proc/ or /sys/ visible.

Reported-by: Eric Windisch 
Signed-off-by: "Eric W. Biederman" 
Signed-off-by: Greg Kroah-Hartman

nilfs2: fix sanity check of btree level in nilfs_btree_root_broken()

2015-05-17T16:53:49+00:00

commit d8fd150fe3935e1692bf57c66691e17409ebb9c1 upstream.

The range check for b-tree level parameter in nilfs_btree_root_broken()
is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even
though the level is limited to values in the range of 0 to
(NILFS_BTREE_LEVEL_MAX - 1).

Since the level parameter is read from storage device and used to index
nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it
can cause memory overrun during btree operations if the boundary value
is set to the level parameter on device.

This fixes the broken sanity check and adds a comment to clarify that
the upper bound NILFS_BTREE_LEVEL_MAX is exclusive.
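
The off-by-one can be illustrated with a minimal sketch. The constant
value and the function names below are assumptions made for illustration
and are not taken from the nilfs2 source; only the shape of the bounds
check matters.

```c
#include <stdbool.h>

/* Hypothetical value; the real constant lives in the nilfs2 headers. */
#define SKETCH_LEVEL_MAX 14   /* exclusive: valid levels are 0..13 */

/* Broken check: a level equal to SKETCH_LEVEL_MAX is not rejected. */
bool level_is_broken_old(int level)
{
    return level < 0 || level > SKETCH_LEVEL_MAX;
}

/* Fixed check: the upper bound is treated as exclusive. */
bool level_is_broken_fixed(int level)
{
    return level < 0 || level >= SKETCH_LEVEL_MAX;
}
```

With an array of SKETCH_LEVEL_MAX elements indexed by level, only the
second form keeps the boundary value from indexing one past the end.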

Signed-off-by: Ryusuke Konishi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

ocfs2: dlm: fix race between purge and get lock resource

2015-05-17T16:53:49+00:00

commit b1432a2a35565f538586774a03bf277c27fc267d upstream.

There is a race window in dlm_get_lock_resource(), which may return a
lock resource which has been purged.  This will cause the process to
hang forever in dlmlock() as the ast msg can't be handled due to its
lock resource not existing.

    dlm_get_lock_resource {
        ...
        spin_lock(&dlm->spinlock);
        tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
        if (tmpres) {
             spin_unlock(&dlm->spinlock);
             >>>>>>>> race window, dlm_run_purge_list() may run and purge
                              the lock resource
             spin_lock(&tmpres->spinlock);
             ...
             spin_unlock(&tmpres->spinlock);
        }
    }

Signed-off-by: Junxiao Bi 
Cc: Joseph Qi 
Cc: Mark Fasheh 
Cc: Joel Becker 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

ext4: fix data corruption caused by unwritten and delayed extents

2015-05-13T12:16:57+00:00

commit d2dc317d564a46dfc683978a2e5a4f91434e9711 upstream.

Currently it is possible to lose a whole file system block's worth of data
when we hit a specific interaction between unwritten and delayed extents
in the extent status tree.

The problem is that once we insert a delayed extent into the extent status
tree, the only way to get rid of it is to write out the delayed buffer.
However, a limitation in the extent status tree implementation means that
when an unwritten extent is inserted and it contains even a single
delayed block, the whole unwritten extent is marked as delayed.

At this point, there is no way to get rid of the delayed extents,
because there are no delayed buffers to write out. So when we write
into said unwritten extent we will convert it to written, but it still
remains delayed.

When we try to write into that block later, ext4_da_map_blocks() will set
the buffer new and delayed and map it to an invalid block, which causes
the rest of the block to be zeroed, losing already written data.

For now we can fix this by simply not allowing the delayed status to be
set on a written extent in the extent status tree. Also add a WARN_ON() to
make sure that we notice if this happens in the future.

This problem can be easily reproduced by running the following commands:

xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
          -c "falloc 0 131072" \
          -c "pwrite -S 0xbb 65536 2048" \
          -c "fsync" /mnt/test/fff

echo 3 > /proc/sys/vm/drop_caches
xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff
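
To see which blocks the reproducer touches, the byte offsets can be
converted to block numbers. This is a small sketch that assumes a
4096-byte file system block; the block size and the helper names are
assumptions, not something stated in the commit message.

```c
#include <stdint.h>

/* Assumed file system block size for this illustration. */
#define ASSUMED_BLOCK_SIZE 4096u

/* Index of the block that contains the given byte offset. */
uint64_t block_of(uint64_t byte_offset)
{
    return byte_offset / ASSUMED_BLOCK_SIZE;
}

/* Position of the byte offset inside its block. */
uint64_t offset_in_block(uint64_t byte_offset)
{
    return byte_offset % ASSUMED_BLOCK_SIZE;
}
```

Under that assumption, offsets 65536 and 67584 both fall in block 16, at
in-block offsets 0 and 2048, so the last pwrite lands in the second half
of the block that already holds the 0xbb data.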

In theory this can also be reproduced at random by running fsx, but that
is not very reliable. On machines with a bigger page size (like ppc) it
can be seen more often (especially with xfstest generic/127).

Signed-off-by: Lukas Czerner 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Greg Kroah-Hartman

fs: take i_mutex during prepare_binprm for set[ug]id executables

2015-05-06T19:59:21+00:00

commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream.

This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.

This patch was mostly written by Linus Torvalds.

Signed-off-by: Jann Horn 
Signed-off-by: Linus Torvalds 
Signed-off-by: Charles Williams 
Signed-off-by: Greg Kroah-Hartman

RCU pathwalk breakage when running into a symlink overmounting something

2015-05-06T19:59:20+00:00

commit 3cab989afd8d8d1bc3d99fef0e7ed87c31e7b647 upstream.

Calling unlazy_walk() in walk_component() and do_last() when we find
a symlink that needs to be followed doesn't acquire a reference to vfsmount.
That's fine when the symlink is on the same vfsmount as the parent directory
(which is almost always the case), but it's not always true - one _can_
manage to bind a symlink on top of something.  And in such cases we end up
with excessive mntput().

Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman

ext4: make fsync to sync parent dir in no-journal for real this time

2015-05-06T19:59:15+00:00

commit e12fb97222fc41e8442896934f76d39ef99b590a upstream.

Previously commit 14ece1028b3ed53ffec1b1213ffc6acaf79ad77c added
support for syncing the parent directory of newly created inodes to
make sure that the inode is not lost after a power failure in
no-journal mode.

However this does not work in the majority of cases, namely:
 - if the directory has inline data
 - if the directory is already indexed
 - if the directory already has at least one block and:
	- the new entry fits into it
	- or we've successfully converted it to indexed

So in those cases we might lose the inode entirely even after fsync in
the no-journal mode. This also includes ext2 default mode obviously.

I've noticed this while running xfstest generic/321 and even though the
test should fail (we need to run fsck after a crash in no-journal mode)
I could not find the newly created entries even when they were fsynced
before.

Fix this by adjusting the ext4_add_entry() successful exit paths to set
the inode EXT4_STATE_NEWENTRY so that fsync has the chance to fsync the
parent directory as well.

Signed-off-by: Lukas Czerner 
Signed-off-by: Theodore Ts'o 
Reviewed-by: Jan Kara 
Cc: Frank Mayhar 
Signed-off-by: Greg Kroah-Hartman

fs/binfmt_elf.c: fix bug in loading of PIE binaries

2015-05-06T19:59:14+00:00

commit a87938b2e246b81b4fb713edb371a9fa3c5c3c86 upstream.

With CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE enabled, and a normal top-down
address allocation strategy, load_elf_binary() will attempt to map a PIE
binary into an address range immediately below mm->mmap_base.

Unfortunately, load_elf_binary() does not take account of the need to
allocate sufficient space for the entire binary which means that, while
the first PT_LOAD segment is mapped below mm->mmap_base, the subsequent
PT_LOAD segment(s) end up being mapped above mm->mmap_base into the area
that is supposed to be the "gap" between the stack and the binary.

Since the size of the "gap" on x86_64 is only guaranteed to be 128MB this
means that binaries with large data segments > 128MB can end up mapping
part of their data segment over their stack resulting in corruption of the
stack (and the data segment once the binary starts to run).

Any PIE binary with a data segment > 128MB is vulnerable to this although
address randomization means that the actual gap between the stack and the
end of the binary is normally greater than 128MB.  The larger the data
segment of the binary the higher the probability of failure.

Fix this by calculating the total size of the binary in the same way as
load_elf_interp().
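
As a rough illustration of what "total size of the binary" means here,
the sketch below computes the span from the page-aligned start of the
first PT_LOAD segment to the end of the last one. The struct, the helper
name and the 4096-byte page size are assumptions for illustration; this
is not the kernel code, and it assumes PT_LOAD entries appear in
ascending address order.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */
#define SKETCH_PT_LOAD   1u      /* PT_LOAD is 1 in the ELF specification */

/* Reduced, hypothetical stand-in for an ELF program header. */
struct sketch_phdr {
    uint32_t type;    /* segment type */
    uint64_t vaddr;   /* virtual address of the segment */
    uint64_t memsz;   /* size of the segment in memory */
};

/* Bytes needed to map every loadable segment in one contiguous range;
 * returns 0 when there is no loadable segment. */
uint64_t total_load_span(const struct sketch_phdr *ph, size_t n)
{
    size_t first = 0, last = 0;
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        if (ph[i].type != SKETCH_PT_LOAD)
            continue;
        if (!found) {
            first = i;
            found = 1;
        }
        last = i;
    }
    if (!found)
        return 0;

    /* Round the start of the first segment down to a page boundary. */
    uint64_t start = ph[first].vaddr & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
    return ph[last].vaddr + ph[last].memsz - start;
}
```

Reserving a range of this size up front is what keeps later segments from
spilling past the address chosen for the first one.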

Signed-off-by: Michael Davidson 
Cc: Alexander Viro 
Cc: Jiri Kosina 
Cc: Kees Cook 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

NFS: fix BUG() crash in notify_change() with patch to chown_common()

2015-05-06T19:59:11+00:00

commit c1b8940b42bb6487b10f2267a96b486276ce9ff7 upstream.

We have observed a BUG() crash in fs/attr.c:notify_change(). The crash
occurs during an rsync into a filesystem that is exported via NFS.

1.) fs/attr.c:notify_change() modifies the caller's version of attr.
2.) 6de0ec00ba8d ("VFS: make notify_change pass ATTR_KILL_S*ID to
    setattr operations") introduced a BUG() restriction such that "no
    function will ever call notify_change() with both ATTR_MODE and
    ATTR_KILL_S*ID set". Under some circumstances though, it will have
    assisted in setting the caller's version of attr to this very
    combination.
3.) 27ac0ffeac80 ("locks: break delegations on any attribute
    modification") introduced code to handle breaking
    delegations. This can result in notify_change() being re-called. attr
    _must_ be explicitly reset to avoid triggering the BUG() established
    in #2.
4.) The path that triggers this is via fs/open.c:chmod_common().
    The combination of attr flags set here and in the first call to
    notify_change() along with a later failed break_deleg_wait()
    results in notify_change() being called again via retry_deleg
    without resetting attr.

Solution is to move retry_deleg in chmod_common() a bit further up to
ensure attr is completely reset.
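
The effect of the label placement can be shown with a reduced model. The
flag values, struct and function names below are hypothetical stand-ins,
not kernel definitions; the model returns -1 where the real code would
hit BUG().

```c
/* Hypothetical flag bits standing in for the ATTR_* flags. */
#define F_CTIME 0x1u
#define F_MODE  0x2u
#define F_KILL  0x4u

struct sketch_attr {
    unsigned int valid;
};

/* Stand-in for notify_change(): returns -1 on the forbidden flag
 * combination, 1 when the caller has to retry, 0 on success. */
int sketch_notify(struct sketch_attr *a, int *must_retry)
{
    if ((a->valid & F_KILL) && (a->valid & F_MODE))
        return -1;
    if (a->valid & F_KILL)
        a->valid |= F_MODE;      /* modifies the caller's copy */
    if (*must_retry) {
        *must_retry = 0;         /* pretend the delegation was broken */
        return 1;
    }
    return 0;
}

/* Retry label placed after the flags are set: stale flags are reused. */
int change_buggy(void)
{
    struct sketch_attr a;
    int must_retry = 1, rc;

    a.valid = F_CTIME | F_KILL;
retry:
    rc = sketch_notify(&a, &must_retry);
    if (rc == 1)
        goto retry;
    return rc;
}

/* Retry label moved up: the flags are rebuilt on every attempt. */
int change_fixed(void)
{
    struct sketch_attr a;
    int must_retry = 1, rc;

retry:
    a.valid = F_CTIME | F_KILL;
    rc = sketch_notify(&a, &must_retry);
    if (rc == 1)
        goto retry;
    return rc;
}
```

In this model change_buggy() ends in the forbidden combination on its
second pass, while change_fixed() completes normally.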

There are other places where this seemingly could occur, such as
fs/utimes.c:utimes_common(), but the attr flags are not initially
set in such a way to trigger this.

Fixes: 27ac0ffeac80 ("locks: break delegations on any attribute modification")
Reported-by: Eric Meddaugh 
Tested-by: Eric Meddaugh 
Signed-off-by: Andrew Elble 
Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman