path: root/fs
Age  Commit message  Author
2016-07-27  ovl: verify upper dentry before unlink and rename  (Miklos Szeredi)
commit 11f3710417d026ea2f4fcf362d866342c5274185 upstream. Unlink and rename in overlayfs checked the upper dentry for staleness by verifying upper->d_parent against upperdir. However, the dentry can also go stale by being unhashed, for example. Expand the verification to actually look up the name again (under the parent lock) and check if it matches the upper dentry. This matches what the VFS does before passing the dentry to the filesystem's unlink/rename methods, which excludes any inconsistency caused by overlayfs. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
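The verification described above amounts to a lookup-and-compare under the parent lock. A minimal sketch of that idea, with a hypothetical helper name and error code (the applied patch is structured differently):

    static int ovl_verify_upper(struct dentry *upperdir, struct dentry *upper)
    {
            /* caller holds upperdir->d_inode's lock, as unlink/rename do */
            struct dentry *d = lookup_one_len(upper->d_name.name, upperdir,
                                              upper->d_name.len);
            int err = 0;

            if (IS_ERR(d))
                    return PTR_ERR(d);
            if (d != upper)
                    err = -ESTALE;  /* unhashed, unlinked or replaced behind our back */
            dput(d);
            return err;
    }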
2016-07-27  File names with trailing period or space need special case conversion  (Steve French)
commit 45e8a2583d97ca758a55c608f78c4cef562644d1 upstream. POSIX allows files with trailing spaces or a trailing period but SMB3 does not, so convert these using the normal Services For Mac mapping as we do for other reserved characters such as : < > | ? * This is similar to what Macs do for the same problem over SMB3. Signed-off-by: Steve French <steve.french@primarydata.com> Acked-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27  cifs: dynamic allocation of ntlmssp blob  (Jerome Marchand)
commit b8da344b74c822e966c6d19d6b2321efe82c5d97 upstream. In sess_auth_rawntlmssp_authenticate(), the ntlmssp blob is allocated statically and its size is an "empirical" 5*sizeof(struct _AUTHENTICATE_MESSAGE) (320B on x86_64). I don't know where this value comes from or if it was ever appropriate, but it is currently insufficient: the user and domain name in UTF16 could take 1kB by themselves. Because of that, build_ntlmssp_auth_blob() might corrupt memory (out-of-bounds write). The size of ntlmssp_blob in SMB2_sess_setup() is too small too (sizeof(struct _NEGOTIATE_MESSAGE) + 500). This patch allocates the blob dynamically in build_ntlmssp_auth_blob(). Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27  Fix reconnect to not defer smb3 session reconnect long after socket reconnect  (Steve French)
commit 4fcd1813e6404dd4420c7d12fb483f9320f0bf93 upstream. The Azure server blocks clients that open a socket and don't do anything on it. In our reconnect scenarios, we can reconnect the tcp session and detect that the socket is available, but we defer the negprot, SMB3 session setup and tree connect reconnection until the next i/o is requested, and this looks suspicious to some servers which expect SMB3 negprot and session setup soon after a socket is created. In the echo thread, reconnect SMB3 sessions and tree connections that are disconnected. A later patch will replay persistent (and resilient) handle opens. Signed-off-by: Steve French <steve.french@primarydata.com> Acked-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27  ovl: verify upper dentry in ovl_remove_and_whiteout()  (Maxim Patlasov)
commit cfc9fde0b07c3b44b570057c5f93dda59dca1c94 upstream. The upper dentry may become stale before we call ovl_lock_rename_workdir. For example, someone could (mistakenly or maliciously) manually unlink(2) it directly from upperdir. To ensure it is not stale, let's look it up again after ovl_lock_rename_workdir and check if it matches the upper dentry. Essentially, it is the same problem and a similar solution as in commit 11f3710417d0 ("ovl: verify upper dentry before unlink and rename"). Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27  ovl: Copy up underlying inode's ->i_mode to overlay inode  (Vivek Goyal)
commit 07a2daab49c549a37b5b744cbebb6e3f445f12bc upstream. Right now when a new overlay inode is created, we initialize overlay inode's ->i_mode from underlying inode ->i_mode but we retain only file type bits (S_IFMT) and discard permission bits. This patch changes it and retains permission bits too. This should allow overlay to do permission checks on overlay inode itself in task context. [SzM] It also fixes clearing suid/sgid bits on write. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
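In code terms the change is small; a sketch under illustrative variable names (overlay inode vs. the real underlying inode):

    /* old behaviour: keep only the file type, drop the permission bits */
    inode->i_mode = realinode->i_mode & S_IFMT;

    /* new behaviour: keep the permission bits as well, so permission checks
     * and suid/sgid clearing can operate on the overlay inode itself */
    inode->i_mode = realinode->i_mode;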
2016-07-27  btrfs: account for non-CoW'd blocks in btrfs_abort_transaction  (Jeff Mahoney)
commit 64c12921e11b3a0c10d088606e328c58e29274d8 upstream. The test for !trans->blocks_used in btrfs_abort_transaction is insufficient to determine whether it's safe to drop the transaction handle on the floor. btrfs_cow_block, informed by should_cow_block, can return blocks that have already been CoW'd in the current transaction. trans->blocks_used is only incremented for new block allocations. If an operation overlaps the blocks in the current transaction entirely and must abort the transaction, we'll happily let it clean up the trans handle even though it may have modified the blocks and will commit an incomplete operation. In the long-term, I'd like to do closer tracking of when the fs is actually modified so we can still recover as gracefully as possible, but that approach will need some discussion. In the short term, since this is the only code using trans->blocks_used, let's just switch it to a bool indicating whether any blocks were used and set it when should_cow_block returns false. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27  UBIFS: Implement ->migratepage()  (Kirill A. Shutemov)
commit 4ac1c17b2044a1b4b2fbed74451947e905fc2992 upstream. During page migrations UBIFS might get confused and the following assert triggers: [ 213.480000] UBIFS assert failed in ubifs_set_page_dirty at 1451 (pid 436) [ 213.490000] CPU: 0 PID: 436 Comm: drm-stress-test Not tainted 4.4.4-00176-geaa802524636-dirty #1008 [ 213.490000] Hardware name: Allwinner sun4i/sun5i Families [ 213.490000] [<c0015e70>] (unwind_backtrace) from [<c0012cdc>] (show_stack+0x10/0x14) [ 213.490000] [<c0012cdc>] (show_stack) from [<c02ad834>] (dump_stack+0x8c/0xa0) [ 213.490000] [<c02ad834>] (dump_stack) from [<c0236ee8>] (ubifs_set_page_dirty+0x44/0x50) [ 213.490000] [<c0236ee8>] (ubifs_set_page_dirty) from [<c00fa0bc>] (try_to_unmap_one+0x10c/0x3a8) [ 213.490000] [<c00fa0bc>] (try_to_unmap_one) from [<c00fadb4>] (rmap_walk+0xb4/0x290) [ 213.490000] [<c00fadb4>] (rmap_walk) from [<c00fb1bc>] (try_to_unmap+0x64/0x80) [ 213.490000] [<c00fb1bc>] (try_to_unmap) from [<c010dc28>] (migrate_pages+0x328/0x7a0) [ 213.490000] [<c010dc28>] (migrate_pages) from [<c00d0cb0>] (alloc_contig_range+0x168/0x2f4) [ 213.490000] [<c00d0cb0>] (alloc_contig_range) from [<c010ec00>] (cma_alloc+0x170/0x2c0) [ 213.490000] [<c010ec00>] (cma_alloc) from [<c001a958>] (__alloc_from_contiguous+0x38/0xd8) [ 213.490000] [<c001a958>] (__alloc_from_contiguous) from [<c001ad44>] (__dma_alloc+0x23c/0x274) [ 213.490000] [<c001ad44>] (__dma_alloc) from [<c001ae08>] (arm_dma_alloc+0x54/0x5c) [ 213.490000] [<c001ae08>] (arm_dma_alloc) from [<c035cecc>] (drm_gem_cma_create+0xb8/0xf0) [ 213.490000] [<c035cecc>] (drm_gem_cma_create) from [<c035cf20>] (drm_gem_cma_create_with_handle+0x1c/0xe8) [ 213.490000] [<c035cf20>] (drm_gem_cma_create_with_handle) from [<c035d088>] (drm_gem_cma_dumb_create+0x3c/0x48) [ 213.490000] [<c035d088>] (drm_gem_cma_dumb_create) from [<c0341ed8>] (drm_ioctl+0x12c/0x444) [ 213.490000] [<c0341ed8>] (drm_ioctl) from [<c0121adc>] (do_vfs_ioctl+0x3f4/0x614) [ 213.490000] [<c0121adc>] (do_vfs_ioctl) from [<c0121d30>] (SyS_ioctl+0x34/0x5c) [ 213.490000] [<c0121d30>] (SyS_ioctl) from [<c000f2c0>] (ret_fast_syscall+0x0/0x34) UBIFS is using PagePrivate() which can have different meanings across filesystems. Therefore the generic page migration code cannot handle this case correctly. We have to implement our own migration function which basically does a plain copy but also duplicates the page private flag. UBIFS is not a block device filesystem and cannot use buffer_migrate_page(). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [rw: Massaged changelog, build fixes, etc...] Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27  NFS: Fix another OPEN_DOWNGRADE bug  (Trond Myklebust)
commit e547f2628327fec6afd2e03b46f113f614cca05b upstream. Olga Kornievskaia reports that the following test fails to trigger an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE. fd0 = open(foo, RDRW) -- should be open on the wire for "both" fd1 = open(foo, RDONLY) -- should be open on the wire for "read" close(fd0) -- should trigger an open_downgrade read(fd1) close(fd1) The issue is that we're missing a check for whether or not the current state transitioned from an O_RDWR state as opposed to having transitioned from a combination of O_RDONLY and O_WRONLY. Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: cd9288ffaea4 ("NFSv4: Fix another bug in the close/open_downgrade code") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27  make nfs_atomic_open() call d_drop() on all ->open_context() errors.  (Al Viro)
commit d20cb71dbf3487f24549ede1a8e2d67579b4632e upstream. In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code" unconditional d_drop() after the ->open_context() had been removed. It had been correct for success cases (there ->open_context() itself had been doing dcache manipulations), but not for error ones. Only one of those (ENOENT) got a compensatory d_drop() added in that commit, but in fact it should've been done for all errors. As it is, the case of O_CREAT non-exclusive open on a hashed negative dentry racing with e.g. symlink creation from another client ended up with ->open_context() getting an error and proceeding to call nfs_lookup(). On a hashed dentry, which would've instantly triggered BUG_ON() in d_materialise_unique() (or, these days, its equivalent in d_splice_alias()). Tested-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27  nfsd: check permissions when setting ACLs  (Ben Hutchings)
commit 999653786df6954a31044528ac3f7a5dadca08f4 upstream. Use set_posix_acl, which includes proper permission checks, instead of calling ->set_acl directly. Without this anyone may be able to grant themselves permissions to a file by setting the ACL. Lock the inode to make the new checks atomic with respect to set_acl. (Also, nfsd was the only caller of set_acl not locking the inode, so I suspect this may fix other races.) This also simplifies the code, and ensures our ACLs are checked by posix_acl_valid. The permission checks and the inode locking were lost with commit 4ac7249e, which changed nfsd to use the set_acl inode operation directly instead of going through xattr handlers. Reported-by: David Sinquin <david@sinquin.eu> [agruenba@redhat.com: use set_posix_acl] Fixes: 4ac7249e Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
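The intended call pattern is roughly the following sketch (error handling trimmed; the acl/dfacl variables are illustrative, and on kernels of this era the inode lock is i_mutex rather than the later inode_lock() helper):

    struct inode *inode = d_inode(dentry);
    int error;

    mutex_lock(&inode->i_mutex);
    error = set_posix_acl(inode, ACL_TYPE_ACCESS, acl);   /* validates ACL, checks permission */
    if (!error && dfacl)
            error = set_posix_acl(inode, ACL_TYPE_DEFAULT, dfacl);
    mutex_unlock(&inode->i_mutex);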
2016-07-27  posix_acl: Add set_posix_acl  (Andreas Gruenbacher)
commit 485e71e8fb6356c08c7fc6bcce4bf02c9a9a663f upstream. Factor out part of posix_acl_xattr_set into a common function that takes a posix_acl, which nfsd can also call. The prototype already exists in include/linux/posix_acl.h. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27  nfsd: Extend the mutex holding region around in nfsd4_process_open2()  (Oleg Drokin)
commit 5cc1fb2a093e254b656c64ff24b0b76bed1d34d9 upstream. To avoid racing entry into nfs4_get_vfs_file(), make init_open_stateid() return with the stateid locked, to be unlocked by the caller. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27  nfsd: Always lock state exclusively.  (Oleg Drokin)
commit feb9dad5209280085d5b0c094fa67e7a8d75c81a upstream. It used to be the case that state had an rwlock that was locked for write by downgrades, but for read for upgrades (opens). Well, the problem is that if there are two competing opens for the same state, they step on each other's toes, potentially leading to leaking file descriptors from the state structure, since the access mode is a bitmap only set once. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27  nfsd4/rpc: move backchannel create logic into rpc code  (J. Bruce Fields)
commit d50039ea5ee63c589b0434baa5ecf6e5075bb6f9 upstream. Also simplify the logic a bit. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27  mnt: If fs_fully_visible fails call put_filesystem.  (Eric W. Biederman)
commit 97c1df3e54e811aed484a036a798b4b25d002ecf upstream. Add this trivial missing error handling. Fixes: 1b852bceb0d1 ("mnt: Refactor the logic for mounting sysfs and proc in a user namespace") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27  mnt: Account for MS_RDONLY in fs_fully_visible  (Eric W. Biederman)
commit 695e9df010e40f407f4830dc11d53dce957710ba upstream. In rare cases it is possible for s_flags & MS_RDONLY to be set but MNT_READONLY to be clear. This starting combination can cause fs_fully_visible to fail to ensure that the new mount is readonly. Therefore force MNT_LOCK_READONLY in the new mount if MS_RDONLY is set on the source filesystem of the mount. In general both MS_RDONLY and MNT_READONLY are set at the same time for mounts, so I don't expect any programs to care. Nor do I expect MS_RDONLY to be set on proc or sysfs in the initial user namespace, which further decreases the likelihood of problems. Which means this change should only affect system configurations by paranoid sysadmins who should welcome the additional protection, as it keeps people from wriggling out of their policies. Fixes: 8c6cf9cc829f ("mnt: Modify fs_fully_visible to deal with locked ro nodev and atime") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
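The heart of the change is a small conditional in fs_fully_visible(), roughly as below (variable naming is illustrative, not the literal hunk):

    /* if the source superblock is read-only, lock the new mount read-only
     * too, so a copy mounted in a user namespace cannot be remounted rw */
    if (mnt->mnt.mnt_sb->s_flags & MS_RDONLY)
            mnt_flags |= MNT_LOCK_READONLY;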
2016-07-27  mnt: fs_fully_visible test the proper mount for MNT_LOCKED  (Eric W. Biederman)
commit d71ed6c930ac7d8f88f3cef6624a7e826392d61f upstream. MNT_LOCKED on a child mount implies the child is locked to the parent. So while looping through the children, the children themselves should be tested (not their parent). Typically an unshare of a mount namespace locks all mounts together, making both the parent and the slave locked, but there are a few corner cases where things work differently. Fixes: ceeb0e5d39fc ("vfs: Ignore unlocked mounts in fs_fully_visible") Reported-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24  fix d_walk()/non-delayed __d_free() race  (Al Viro)
commit 3d56c25e3bb0726a5c5e16fc2d9e38f8ed763085 upstream. Ascend-to-parent logics in d_walk() depends on all encountered child dentries not getting freed without an RCU delay. Unfortunately, in quite a few cases it is not true, with a hard-to-hit oopsable race as the result. Fortunately, the fix is simple; right now the rule is "if it has ever been hashed, freeing must be delayed" and changing it to "if it has ever had a parent, freeing must be delayed" closes that hole and covers all cases the old rule used to cover. Moreover, pipes and sockets remain _not_ covered, so we do not introduce an RCU delay in the cases which are the reason for having that delay conditional in the first place. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24  proc: prevent stacking filesystems on top  (Jann Horn)
commit e54ad7f1ee263ffa5a2de9c609d58dfa27b21cd9 upstream. This prevents stacking filesystems (ecryptfs and overlayfs) from using procfs as lower filesystem. There is too much magic going on inside procfs, and there is no good reason to stack stuff on top of procfs. (For example, procfs does access checks in VFS open handlers, and ecryptfs by design calls open handlers from a kernel thread that doesn't drop privileges or so.) Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
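A plausible minimal form of such a fix is to declare procfs as already being at the maximum stacking depth when its superblock is set up (the exact placement in proc's fill_super is an assumption here):

    /* overlayfs/ecryptfs compare s_stack_depth against
     * FILESYSTEM_MAX_STACK_DEPTH before stacking on a lower filesystem */
    s->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;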
2016-06-24  ecryptfs: forbid opening files without mmap handler  (Jann Horn)
commit 2f36db71009304b3f0b95afacd8eba1f9f046b87 upstream. This prevents users from triggering a stack overflow through a recursive invocation of pagefault handling that involves mapping procfs files into virtual memory. Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07  xfs: handle dquot buffer readahead in log recovery correctly  (Dave Chinner)
commit 7d6a13f023567d573ac362502bb702eda716e654 upstream. When we do dquot readahead in log recovery, we do not use a verifier as the underlying buffer may not have dquots in it. e.g. the allocation operation hasn't yet been replayed. Hence we do not want to fail recovery because we detect an operation to be replayed has not been run yet. This problem was addressed for inodes in commit d891400 ("xfs: inode buffers may not be valid during recovery readahead") but the problem was not recognised to exist for dquots and their buffers as the dquot readahead did not have a verifier. The result of not using a verifier is that when the buffer is then next read to replay a dquot modification, the dquot buffer verifier will only be attached to the buffer if *readahead is not complete*. Hence we can read the buffer, replay the dquot changes and then add it to the delwri submission list without it having a verifier attached to it. This then generates warnings in xfs_buf_ioapply(), which catches and warns about this case. Fix this and make it handle the same readahead verifier error cases as for inode buffers by adding a new readahead verifier that has a write operation as well as a read operation that marks the buffer as not done if any corruption is detected. Also make sure we don't run readahead if the dquot buffer has been marked as cancelled by recovery. This will result in readahead either succeeding and the buffer having a valid write verifier, or readahead failing and the buffer state requiring the subsequent read to resubmit the IO with the new verifier. In either case, this will result in the buffer always ending up with a valid write verifier on it. Note: we also need to fix the inode buffer readahead error handling to mark the buffer with EIO. Brian noticed the code I copied from there wrong during review, so fix it at the same time. Add comments linking the two functions that handle readahead verifier errors together so we don't forget this behavioural link in future. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07  xfs: print name of verifier if it fails  (Eric Sandeen)
commit 233135b763db7c64d07b728a9c66745fb0376275 upstream. This adds a name to each buf_ops structure, so that if a verifier fails we can print the type of verifier that failed it. Should be a slight debugging aid, I hope. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Cc: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
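The shape of the change, abridged (one verifier instance shown as an example): a name is attached to each xfs_buf_ops so a failure report can say which verifier tripped.

    struct xfs_buf_ops {
            char *name;                             /* new: printed when a verifier fails */
            void (*verify_read)(struct xfs_buf *);
            void (*verify_write)(struct xfs_buf *);
    };

    const struct xfs_buf_ops xfs_agf_buf_ops = {
            .name         = "xfs_agf",
            .verify_read  = xfs_agf_read_verify,
            .verify_write = xfs_agf_write_verify,
    };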
2016-06-07  xfs: skip stale inodes in xfs_iflush_cluster  (Dave Chinner)
commit 7d3aa7fe970791f1a674b14572a411accf2f4d4e upstream. We don't write back stale inodes so we should skip them in xfs_iflush_cluster, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07  xfs: fix inode validity check in xfs_iflush_cluster  (Dave Chinner)
commit 51b07f30a71c27405259a0248206ed4e22adbee2 upstream. Some careless idiot(*) wrote crap code in commit 1a3e8f3 ("xfs: convert inode cache lookups to use RCU locking") back in late 2010, and so xfs_iflush_cluster checks the wrong inode for whether it is still valid under RCU protection. Fix it to lock and check the correct inode. (*) Careless-idiot: Dave Chinner <dchinner@redhat.com> Discovered-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07  xfs: xfs_iflush_cluster fails to abort on error  (Dave Chinner)
commit b1438f477934f5a4d5a44df26f3079a7575d5946 upstream. When a failure due to an inode buffer occurs, the error handling fails to abort the inode writeback correctly. This can result in the inode being reclaimed whilst still in the AIL, leading to use-after-free situations as well as filesystems that cannot be unmounted as the inode log items left in the AIL never get removed. Fix this by ensuring fatal errors from xfs_imap_to_bp() result in the inode flush being aborted correctly. Reported-by: Shyam Kaushik <shyam@zadarastorage.com> Diagnosed-by: Shyam Kaushik <shyam@zadarastorage.com> Tested-by: Shyam Kaushik <shyam@zadarastorage.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07  xfs: Don't wrap growfs AGFL indexes  (Dave Chinner)
commit ad747e3b299671e1a53db74963cc6c5f6cdb9f6d upstream. Commit 96f859d ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct") allowed the freelist to use the empty slot at the end of the freelist on 64 bit systems that was not being used due to sizeof() rounding up the structure size. This has caused versions of xfs_repair prior to 4.5.0 (which also has the fix) to report this as a corruption once the filesystem has been grown. Older kernels can also have problems (seen from a whacky container/vm management environment) mounting filesystems grown on a system with a newer kernel than the vm/container it is deployed on. To avoid this problem, change the initial free list indexes not to wrap across the end of the AGFL, hence avoiding the initialisation of agf_fllast to the last index in the AGFL. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07  xfs: disallow rw remount on fs with unknown ro-compat features  (Eric Sandeen)
commit d0a58e833931234c44e515b5b8bede32bd4e6eed upstream. Today, a kernel which refuses to mount a filesystem read-write due to unknown ro-compat features can still transition to read-write via the remount path. The old kernel is most likely none the wiser, because it's unaware of the new feature, and isn't using it. However, writing to the filesystem may well corrupt metadata related to that new feature, and moving to a newer kernel which understands the feature will have problems. Right now the only ro-compat feature we have is the free inode btree, which showed up in v3.16. It would be good to push this back to all the active stable kernels, I think, so that if anyone is using newer mkfs (which enables the finobt feature) with older kernel releases, they'll be protected. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
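A sketch of the extra check in the remount path (the warning text and exact placement are assumptions; the feature-test helpers are the ones xfs already uses at mount time):

    /* going from ro to rw? refuse if the superblock carries ro-compat
     * feature bits this kernel does not understand */
    if ((mp->m_flags & XFS_MOUNT_RDONLY) && !(*flags & MS_RDONLY) &&
        xfs_sb_has_ro_compat_feature(&mp->m_sb,
                                     XFS_SB_FEAT_RO_COMPAT_UNKNOWN)) {
            xfs_warn(mp,
                     "ro->rw transition prohibited by unknown ro-compat features");
            return -EINVAL;
    }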
2016-06-07  ext4: silence UBSAN in ext4_mb_init()  (Nicolai Stange)
commit 935244cd54b86ca46e69bc6604d2adfb1aec2d42 upstream. Currently, in ext4_mb_init(), there's a loop like the following: do { ... offset += 1 << (sb->s_blocksize_bits - i); i++; } while (i <= sb->s_blocksize_bits + 1); Note that the updated offset is used in the loop's next iteration only. However, at the last iteration, that is at i == sb->s_blocksize_bits + 1, the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3)) and UBSAN reports UBSAN: Undefined behaviour in fs/ext4/mballoc.c:2621:15 shift exponent 4294967295 is too large for 32-bit type 'int' [...] Call Trace: [<ffffffff818c4d25>] dump_stack+0xbc/0x117 [<ffffffff818c4c69>] ? _atomic_dec_and_lock+0x169/0x169 [<ffffffff819411ab>] ubsan_epilogue+0xd/0x4e [<ffffffff81941cac>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254 [<ffffffff81941ab1>] ? __ubsan_handle_load_invalid_value+0x158/0x158 [<ffffffff814b6dc1>] ? kmem_cache_alloc+0x101/0x390 [<ffffffff816fc13b>] ? ext4_mb_init+0x13b/0xfd0 [<ffffffff814293c7>] ? create_cache+0x57/0x1f0 [<ffffffff8142948a>] ? create_cache+0x11a/0x1f0 [<ffffffff821c2168>] ? mutex_lock+0x38/0x60 [<ffffffff821c23ab>] ? mutex_unlock+0x1b/0x50 [<ffffffff814c26ab>] ? put_online_mems+0x5b/0xc0 [<ffffffff81429677>] ? kmem_cache_create+0x117/0x2c0 [<ffffffff816fcc49>] ext4_mb_init+0xc49/0xfd0 [...] Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1. Unless compilers start to do some fancy transformations (which at least GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the such calculated value of offset is never used again. Silence UBSAN by introducing another variable, offset_incr, holding the next increment to apply to offset and adjust that one by right shifting it by one position per loop iteration. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161 Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
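The resulting loop looks roughly like the sketch below, which just follows the description above (only the relevant lines are shown); the bb_incr fix in the next entry applies the same pattern.

    unsigned int offset = 0;
    unsigned int offset_incr = 1 << (sb->s_blocksize_bits - 1);
    int i = 1;

    do {
            /* ... per-order initialisation using 'offset' ... */
            offset += offset_incr;
            offset_incr = offset_incr >> 1;   /* no shift count of blocksize_bits - i */
            i++;
    } while (i <= sb->s_blocksize_bits + 1);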
2016-06-07  ext4: address UBSAN warning in mb_find_order_for_block()  (Nicolai Stange)
commit b5cb316cdf3a3f5f6125412b0f6065185240cfdc upstream. Currently, in mb_find_order_for_block(), there's a loop like the following: while (order <= e4b->bd_blkbits + 1) { ... bb += 1 << (e4b->bd_blkbits - order); } Note that the updated bb is used in the loop's next iteration only. However, at the last iteration, that is at order == e4b->bd_blkbits + 1, the shift count becomes negative (c.f. C99 6.5.7(3)) and UBSAN reports UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1281:11 shift exponent -1 is negative [...] Call Trace: [<ffffffff818c4d35>] dump_stack+0xbc/0x117 [<ffffffff818c4c79>] ? _atomic_dec_and_lock+0x169/0x169 [<ffffffff819411bb>] ubsan_epilogue+0xd/0x4e [<ffffffff81941cbc>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254 [<ffffffff81941ac1>] ? __ubsan_handle_load_invalid_value+0x158/0x158 [<ffffffff816e93a0>] ? ext4_mb_generate_from_pa+0x590/0x590 [<ffffffff816502c8>] ? ext4_read_block_bitmap_nowait+0x598/0xe80 [<ffffffff816e7b7e>] mb_find_order_for_block+0x1ce/0x240 [...] Unless compilers start to do some fancy transformations (which at least GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the such calculated value of bb is never used again. Silence UBSAN by introducing another variable, bb_incr, holding the next increment to apply to bb and adjust that one by right shifting it by one position per loop iteration. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161 Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07  ext4: fix oops on corrupted filesystem  (Jan Kara)
commit 74177f55b70e2f2be770dd28684dd6d17106a4ba upstream. When filesystem is corrupted in the right way, it can happen ext4_mark_iloc_dirty() in ext4_orphan_add() returns error and we subsequently remove inode from the in-memory orphan list. However this deletion is done with list_del(&EXT4_I(inode)->i_orphan) and thus we leave i_orphan list_head with a stale content. Later we can look at this content causing list corruption, oops, or other issues. The reported trace looked like: WARNING: CPU: 0 PID: 46 at lib/list_debug.c:53 __list_del_entry+0x6b/0x100() list_del corruption, 0000000061c1d6e0->next is LIST_POISON1 0000000000100100) CPU: 0 PID: 46 Comm: ext4.exe Not tainted 4.1.0-rc4+ #250 Stack: 60462947 62219960 602ede24 62219960 602ede24 603ca293 622198f0 602f02eb 62219950 6002c12c 62219900 601b4d6b Call Trace: [<6005769c>] ? vprintk_emit+0x2dc/0x5c0 [<602ede24>] ? printk+0x0/0x94 [<600190bc>] show_stack+0xdc/0x1a0 [<602ede24>] ? printk+0x0/0x94 [<602ede24>] ? printk+0x0/0x94 [<602f02eb>] dump_stack+0x2a/0x2c [<6002c12c>] warn_slowpath_common+0x9c/0xf0 [<601b4d6b>] ? __list_del_entry+0x6b/0x100 [<6002c254>] warn_slowpath_fmt+0x94/0xa0 [<602f4d09>] ? __mutex_lock_slowpath+0x239/0x3a0 [<6002c1c0>] ? warn_slowpath_fmt+0x0/0xa0 [<60023ebf>] ? set_signals+0x3f/0x50 [<600a205a>] ? kmem_cache_free+0x10a/0x180 [<602f4e88>] ? mutex_lock+0x18/0x30 [<601b4d6b>] __list_del_entry+0x6b/0x100 [<601177ec>] ext4_orphan_del+0x22c/0x2f0 [<6012f27c>] ? __ext4_journal_start_sb+0x2c/0xa0 [<6010b973>] ? ext4_truncate+0x383/0x390 [<6010bc8b>] ext4_write_begin+0x30b/0x4b0 [<6001bb50>] ? copy_from_user+0x0/0xb0 [<601aa840>] ? iov_iter_fault_in_readable+0xa0/0xc0 [<60072c4f>] generic_perform_write+0xaf/0x1e0 [<600c4166>] ? file_update_time+0x46/0x110 [<60072f0f>] __generic_file_write_iter+0x18f/0x1b0 [<6010030f>] ext4_file_write_iter+0x15f/0x470 [<60094e10>] ? unlink_file_vma+0x0/0x70 [<6009b180>] ? unlink_anon_vmas+0x0/0x260 [<6008f169>] ? free_pgtables+0xb9/0x100 [<600a6030>] __vfs_write+0xb0/0x130 [<600a61d5>] vfs_write+0xa5/0x170 [<600a63d6>] SyS_write+0x56/0xe0 [<6029fcb0>] ? __libc_waitpid+0x0/0xa0 [<6001b698>] handle_syscall+0x68/0x90 [<6002633d>] userspace+0x4fd/0x600 [<6002274f>] ? save_registers+0x1f/0x40 [<60028bd7>] ? arch_prctl+0x177/0x1b0 [<60017bd5>] fork_handler+0x85/0x90 Fix the problem by using list_del_init() as we always should with i_orphan list. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
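The fix itself is a one-liner; the point is that list_del() poisons the entry's next/prev pointers, so a later walk of i_orphan dereferences the poison values, while list_del_init() leaves a valid empty list_head behind:

    /* was: list_del(&EXT4_I(inode)->i_orphan); */
    list_del_init(&EXT4_I(inode)->i_orphan);    /* entry becomes a safe empty list */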
2016-06-07  ext4: clean up error handling when orphan list is corrupted  (Theodore Ts'o)
commit 7827a7f6ebfcb7f388dc47fddd48567a314701ba upstream. Instead of just printing warning messages, if the orphan list is corrupted, declare the file system is corrupted. If there are any reserved inodes in the orphaned inode list, declare the file system corrupted and stop right away to avoid doing more potential damage to the file system. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07  ext4: fix hang when processing corrupted orphaned inode list  (Theodore Ts'o)
commit c9eb13a9105e2e418f72e46a2b6da3f49e696902 upstream. If the orphaned inode list contains inode #5, ext4_iget() returns a bad inode (since the bootloader inode should never be referenced directly). Because of the bad inode, we end up processing the inode repeatedly and this hangs the machine. This can be reproduced via: mke2fs -t ext4 /tmp/foo.img 100 debugfs -w -R "ssv last_orphan 5" /tmp/foo.img mount -o loop /tmp/foo.img /mnt (But don't do this if you are using an unpatched kernel if you care about the system staying functional. :-) This bug was found by the port of American Fuzzy Lop into the kernel to find file system problems[1]. (Since it *only* happens if inode #5 shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not surprising that AFL needed two hours before it found it.) [1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf Reported by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07  pipe: limit the per-user amount of pages allocated in pipes  (Willy Tarreau)
commit 759c01142a5d0f364a462346168a56de28a80f52 upstream. On not-so-small systems, it is possible for a single process to cause an OOM condition by filling large pipes with data that are never read. A typical process filling 4000 pipes with 1 MB of data will use 4 GB of memory. On small systems it may be tricky to set the pipe max size to prevent this from happening. This patch makes it possible to enforce a per-user soft limit above which new pipes will be limited to a single page, effectively limiting them to 4 kB each, as well as a hard limit above which no new pipes may be created for this user. This has the effect of protecting the system against memory abuse without hurting other users, and still allowing pipes to work correctly though with less data at once. The limits are controlled by two new sysctls: pipe-user-pages-soft, and pipe-user-pages-hard. Both may be disabled by setting them to zero. The default soft limit allows the default number of FDs per process (1024) to create pipes of the default size (64kB), thus reaching a limit of 64MB before starting to create only smaller pipes. With 256 processes limited to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB = 1084 MB of memory allocated for a user. The hard limit is disabled by default to avoid breaking existing applications that make intensive use of pipes (eg: for splicing). Reported-by: socketpair@gmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Mitigates: CVE-2013-4312 (Linux 2.0+) Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Moritz Muehlenhoff <moritz@wikimedia.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
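A sketch of how such soft and hard limits are typically applied when pipe buffers are allocated (the identifier names below are illustrative rather than the exact ones used in fs/pipe.c):

    unsigned long pipe_bufs = PIPE_DEF_BUFFERS;   /* 16 pages = 64kB by default */

    if (pipe_user_pages_soft &&
        atomic_long_read(&user->pipe_bufs) >= pipe_user_pages_soft)
            pipe_bufs = 1;                        /* soft limit: new pipes shrink to 4kB */

    if (pipe_user_pages_hard &&
        atomic_long_read(&user->pipe_bufs) >= pipe_user_pages_hard)
            return NULL;                          /* hard limit: refuse the new pipe */

    atomic_long_add(pipe_bufs, &user->pipe_bufs);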
2016-06-07  affs: fix remount failure when there are no options changed  (Mikulas Patocka)
commit 01d6e08711bf90bc4d7ead14a93a0cbd73b1896a upstream. Commit c8f33d0bec99 ("affs: kstrdup() memory handling") checks if the kstrdup function returns NULL due to out-of-memory condition. However, if we are remounting a filesystem with no change to filesystem-specific options, the parameter data is NULL. In this case, kstrdup returns NULL (because it was passed NULL parameter), although no out of memory condition exists. The mount syscall then fails with ENOMEM. This patch fixes the bug. We fail with ENOMEM only if data is non-NULL. The patch also changes the call to replace_mount_options - if we didn't pass any filesystem-specific options, we don't call replace_mount_options (thus we don't erase existing reported options). Fixes: c8f33d0bec99 ("affs: kstrdup() memory handling") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
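The fix reduces to guarding both the kstrdup() and the replace_mount_options() call on whether option data was actually supplied; the hpfs entry further down in this log received the same treatment. A sketch:

    char *new_opts = NULL;

    if (data) {
            new_opts = kstrdup(data, GFP_KERNEL);
            if (!new_opts)
                    return -ENOMEM;               /* only a real OOM gets here now */
    }

    /* ... parse and apply the options ... */

    if (data)
            replace_mount_options(sb, new_opts);  /* otherwise keep the reported options */
    return 0;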
2016-06-01  hpfs: implement the show_options method  (Mikulas Patocka)
commit 037369b872940cd923835a0a589763180c4a36bc upstream. The HPFS filesystem used generic_show_options to produce the string that is displayed in /proc/mounts. However, there is a problem that the options may disappear after a remount. If we mount the filesystem with option1 and then remount it with option2, /proc/mounts should show both option1 and option2, however it only shows option2 because the whole option string is replaced with replace_mount_options in hpfs_remount_fs. To fix this bug, implement the hpfs_show_options function that prints the options that are currently selected. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
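A minimal sketch of what such a .show_options method looks like (the superblock-info fields and option names are illustrative; hpfs prints its real mount options such as uid, gid and umask):

    static int hpfs_show_options(struct seq_file *seq, struct dentry *root)
    {
            struct hpfs_sb_info *sbi = hpfs_sb(root->d_sb);

            seq_printf(seq, ",uid=%u", from_kuid_munged(&init_user_ns, sbi->sb_uid));
            seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, sbi->sb_gid));
            seq_printf(seq, ",umask=%03o", (~sbi->sb_mode & 0777));
            return 0;
    }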
2016-06-01  hpfs: fix remount failure when there are no options changed  (Mikulas Patocka)
commit 44d51706b4685f965cd32acde3fe0fcc1e6198e8 upstream. Commit ce657611baf9 ("hpfs: kstrdup() out of memory handling") checks if the kstrdup function returns NULL due to out-of-memory condition. However, if we are remounting a filesystem with no change to filesystem-specific options, the parameter data is NULL. In this case, kstrdup returns NULL (because it was passed NULL parameter), although no out of memory condition exists. The mount syscall then fails with ENOMEM. This patch fixes the bug. We fail with ENOMEM only if data is non-NULL. The patch also changes the call to replace_mount_options - if we didn't pass any filesystem-specific options, we don't call replace_mount_options (thus we don't erase existing reported options). Fixes: ce657611baf9 ("hpfs: kstrdup() out of memory handling") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01  fs/cifs: correctly handle anonymous authentication for NTLM(v2)  (Stefan Metzmacher)
commit 1a967d6c9b39c226be1b45f13acd4d8a5ab3dc44 upstream. Only server which map unknown users to guest will allow access using a non-null NTLMv2_Response. For Samba it's the "map to guest = bad user" option. BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913 Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01  fs/cifs: correctly handle anonymous authentication for NTLM(v1)  (Stefan Metzmacher)
commit 777f69b8d26bf35ade4a76b08f203c11e048365d upstream. Only server which map unknown users to guest will allow access using a non-null NTChallengeResponse. For Samba it's the "map to guest = bad user" option. BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913 Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01  fs/cifs: correctly handle anonymous authentication for LANMAN  (Stefan Metzmacher)
commit fa8f3a354bb775ec586e4475bcb07f7dece97e0c upstream. Only server which map unknown users to guest will allow access using a non-null LMChallengeResponse. For Samba it's the "map to guest = bad user" option. BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913 Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01  fs/cifs: correctly handle anonymous authentication via NTLMSSP  (Stefan Metzmacher)
commit cfda35d98298131bf38fbad3ce4cd5ecb3cf18db upstream. See [MS-NLMP] 3.2.5.1.2 Server Receives an AUTHENTICATE_MESSAGE from the Client: ... Set NullSession to FALSE If (AUTHENTICATE_MESSAGE.UserNameLen == 0 AND AUTHENTICATE_MESSAGE.NtChallengeResponse.Length == 0 AND (AUTHENTICATE_MESSAGE.LmChallengeResponse == Z(1) OR AUTHENTICATE_MESSAGE.LmChallengeResponse.Length == 0)) -- Special case: client requested anonymous authentication Set NullSession to TRUE ... Only server which map unknown users to guest will allow access using a non-null NTChallengeResponse. For Samba it's the "map to guest = bad user" option. BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913 Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01  remove directory incorrectly tries to set delete on close on non-empty directories  (Steve French)
commit 897fba1172d637d344f009d700f7eb8a1fa262f1 upstream. The wrong return code was being returned on SMB3 rmdir of a non-empty directory. For SMB3 (unlike for cifs), we attempt to delete a directory by setting the delete-on-close flag on the open. Windows clients set this flag via a set info call (SET_FILE_DISPOSITION), which properly checks if the directory is empty. With this patch, on smb3 mounts we correctly return "DIRECTORY NOT EMPTY" on attempts to remove a non-empty directory. Signed-off-by: Steve French <steve.french@primarydata.com> Acked-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01  ext4: iterate over buffer heads correctly in move_extent_per_page()  (Eryu Guan)
commit 6ffe77bad545f4a7c8edd2a4ee797ccfcd894ab4 upstream. In commit bcff24887d00 ("ext4: don't read blocks from disk after extents being swapped") bh is not updated correctly in the for loop and wrong data has been written to disk. generic/324 catches this on sub-page block size ext4. Fixes: bcff24887d00 ("ext4: don't read blocks from disk after extents being swapped") Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
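The gist of the fix, sketched: the buffer head has to advance along with the block index inside the loop (after first being positioned past the data offset), otherwise every iteration keeps rewriting the first block:

    bh = page_buffers(pagep[0]);
    for (i = 0; i < data_offset_in_page; i++)
            bh = bh->b_this_page;                 /* skip to the first moved block */

    for (i = 0; i < block_len_in_page; i++) {
            *err = ext4_get_block(orig_inode, orig_blk_offset + i, bh, 0);
            if (*err < 0)
                    break;
            bh = bh->b_this_page;                 /* the previously missing advance */
    }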
2016-06-01  Btrfs: don't use src fd for printk  (Josef Bacik)
commit c79b4713304f812d3d6c95826fc3e5fc2c0b0c14 upstream. The fd we pass in may not be on a btrfs file system, so don't try to do BTRFS_I() on it. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18  get_rock_ridge_filename(): handle malformed NM entries  (Al Viro)
commit 99d825822eade8d827a1817357cbf3f889a552d6 upstream. Payloads of NM entries are not supposed to contain NUL. When we run into such, only the part prior to the first NUL goes into the concatenation (i.e. the directory entry name being encoded by a bunch of NM entries). We do stop when the amount collected so far + the claimed amount in the current NM entry exceed 254. So far, so good, but what we return as the total length is the sum of *claimed* sizes, not the actual amount collected. And that can grow pretty large - not unlimited, since you'd need to put CE entries in between to be able to get more than the maximum that could be contained in one isofs directory entry / continuation chunk, and we do stop once we'd encountered 32 CEs, but you can get about 8Kb easily. And that's what will be passed to the readdir callback as the name length. 8Kb __copy_to_user() from a buffer allocated by __get_free_page(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18  atomic_open(): fix the handling of create_error  (Al Viro)
commit 10c64cea04d3c75c306b3f990586ffb343b63287 upstream. * if we have a hashed negative dentry and either CREAT|EXCL on r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL with failing may_o_create(), we should fail with EROFS or the error may_o_create() has returned, but not ENOENT. Which is what the current code ends up returning. * if we have CREAT|TRUNC hitting a regular file on a read-only filesystem, we can't fail with EROFS here. At the very least, not until we'd done follow_managed() - we might have a writable file (or a device, for that matter) bound on top of that one. Moreover, the code downstream will see that O_TRUNC and attempt to grab the write access (*after* following possible mount), so if we really should fail with EROFS, it will happen. No need to do that inside atomic_open(). The real logics is much simpler than what the current code is trying to do - if we decided to go for simple lookup, ended up with a negative dentry *and* had create_error set, fail with create_error. No matter whether we'd got that negative dentry from lookup_real() or had found it in dcache. Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18  vfs: rename: check backing inode being equal  (Miklos Szeredi)
commit 9409e22acdfc9153f88d9b1ed2bd2a5b34d2d3ca upstream. If a file is renamed to a hardlink of itself POSIX specifies that rename(2) should do nothing and return success. This condition is checked in vfs_rename(). However it won't detect hard links on overlayfs where these are given separate inodes on the overlayfs layer. Overlayfs itself detects this condition and returns success without doing anything, but then vfs_rename() will proceed as if this was a successful rename (detach_mounts(), d_move()). The correct thing to do is to detect this condition before even calling into overlayfs. This patch does this by calling vfs_select_inode() to get the underlying inodes. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18  vfs: add vfs_select_inode() helper  (Miklos Szeredi)
commit 54d5ca871e72f2bb172ec9323497f01cd5091ec7 upstream. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
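What such a helper plausibly looks like, following the d_select_inode mechanism overlayfs already uses for open (a sketch, not guaranteed to match the applied patch line for line). vfs_rename() can then compare the backing inodes of source and target and return success without doing anything when they are equal:

    static inline struct inode *vfs_select_inode(struct dentry *dentry,
                                                 unsigned open_flags)
    {
            struct inode *inode = d_inode(dentry);

            if (inode && unlikely(dentry->d_flags & DCACHE_OP_SELECT_INODE))
                    inode = dentry->d_op->d_select_inode(dentry, open_flags);

            return inode;
    }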
2016-05-18  ocfs2: fix posix_acl_create deadlock  (Junxiao Bi)
commit c25a1e0671fbca7b2c0d0757d533bd2650d6dc0c upstream. Commit 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") refactored code to use posix_acl_create. The problem with this function is that it is not mindful of the cluster-wide inode lock, making it unsuitable for use with ocfs2 inode creation with ACLs. For example, when used in ocfs2_mknod, this function can cause deadlock as follows. The parent dir inode lock is taken when calling posix_acl_create -> get_acl -> ocfs2_iop_get_acl, which takes the inode lock again. This can cause deadlock if there is a blocked remote lock request waiting for the lock to be downconverted. And the same deadlock happened in ocfs2_reflink. This fix is to revert back to using ocfs2_init_acl. Fixes: 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18  ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang  (Junxiao Bi)
commit 5ee0fbd50fdf1c1329de8bee35ea9d7c6a81a2e0 upstream. Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") introduced this issue. ocfs2_setattr called by the chmod command holds the cluster-wide inode lock when calling posix_acl_chmod. This latter function in turn calls ocfs2_iop_get_acl and ocfs2_iop_set_acl. These two are also called directly from the vfs layer for the getfacl/setfacl commands and therefore acquire the cluster-wide inode lock. If a remote conversion request comes after the first inode lock in ocfs2_setattr, OCFS2_LOCK_BLOCKED will be set. And this will cause the second call to the inode lock from ocfs2_iop_get_acl() to block indefinitely. The deleted version of ocfs2_acl_chmod() calls __posix_acl_chmod(), which does not call back into the filesystem. Therefore, we restore ocfs2_acl_chmod(), modify it slightly for locking as needed, and use that instead. Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>