summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-06-24jbd2: get rid of superfluous __GFP_REPEATMichal Hocko
jbd2_alloc is explicit about its allocation preferences wrt. the allocation size. Sub page allocations go to the slab allocator and larger are using either the page allocator or vmalloc. This is all good but the logic is unnecessarily complex. 1) as per Ted, the vmalloc fallback is a left-over: : jbd2_alloc is only passed in the bh->b_size, which can't be PAGE_SIZE, so : the code path that calls vmalloc() should never get called. When we : conveted jbd2_alloc() to suppor sub-page size allocations in commit : d2eecb039368, there was an assumption that it could be called with a size : greater than PAGE_SIZE, but that's certaily not true today. Moreover vmalloc allocation might even lead to a deadlock because the callers expect GFP_NOFS context while vmalloc is GFP_KERNEL. 2) __GFP_REPEAT for requests <= PAGE_ALLOC_COSTLY_ORDER is ignored since the flag was introduced. Let's simplify the code flow and use the slab allocator for sub-page requests and the page allocator for others. Even though order > 0 is not currently used as per above leave that option open. Link: http://lkml.kernel.org/r/1464599699-30131-18-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-23Merge tag 'upstream-4.7-rc5' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull UBI/UBIFS fixes from Richard Weinberger: "This contains fixes for two critical bugs in UBI and UBIFS: - fix the possibility of losing data upon a power cut when UBI tries to recover from a write error - fix page migration on UBIFS. It turned out that the default page migration function is not suitable for UBIFS" * tag 'upstream-4.7-rc5' of git://git.infradead.org/linux-ubifs: UBIFS: Implement ->migratepage() mm: Export migrate_page_move_mapping and migrate_page_copy ubi: Make recover_peb power cut aware gpio: make library immune to error pointers gpio: make sure gpiod_to_irq() returns negative on NULL desc gpio: 104-idi-48: Fix missing spin_lock_init for ack_lock
2016-06-23UBIFS: Implement ->migratepage()Kirill A. Shutemov
During page migrations UBIFS might get confused and the following assert triggers: [ 213.480000] UBIFS assert failed in ubifs_set_page_dirty at 1451 (pid 436) [ 213.490000] CPU: 0 PID: 436 Comm: drm-stress-test Not tainted 4.4.4-00176-geaa802524636-dirty #1008 [ 213.490000] Hardware name: Allwinner sun4i/sun5i Families [ 213.490000] [<c0015e70>] (unwind_backtrace) from [<c0012cdc>] (show_stack+0x10/0x14) [ 213.490000] [<c0012cdc>] (show_stack) from [<c02ad834>] (dump_stack+0x8c/0xa0) [ 213.490000] [<c02ad834>] (dump_stack) from [<c0236ee8>] (ubifs_set_page_dirty+0x44/0x50) [ 213.490000] [<c0236ee8>] (ubifs_set_page_dirty) from [<c00fa0bc>] (try_to_unmap_one+0x10c/0x3a8) [ 213.490000] [<c00fa0bc>] (try_to_unmap_one) from [<c00fadb4>] (rmap_walk+0xb4/0x290) [ 213.490000] [<c00fadb4>] (rmap_walk) from [<c00fb1bc>] (try_to_unmap+0x64/0x80) [ 213.490000] [<c00fb1bc>] (try_to_unmap) from [<c010dc28>] (migrate_pages+0x328/0x7a0) [ 213.490000] [<c010dc28>] (migrate_pages) from [<c00d0cb0>] (alloc_contig_range+0x168/0x2f4) [ 213.490000] [<c00d0cb0>] (alloc_contig_range) from [<c010ec00>] (cma_alloc+0x170/0x2c0) [ 213.490000] [<c010ec00>] (cma_alloc) from [<c001a958>] (__alloc_from_contiguous+0x38/0xd8) [ 213.490000] [<c001a958>] (__alloc_from_contiguous) from [<c001ad44>] (__dma_alloc+0x23c/0x274) [ 213.490000] [<c001ad44>] (__dma_alloc) from [<c001ae08>] (arm_dma_alloc+0x54/0x5c) [ 213.490000] [<c001ae08>] (arm_dma_alloc) from [<c035cecc>] (drm_gem_cma_create+0xb8/0xf0) [ 213.490000] [<c035cecc>] (drm_gem_cma_create) from [<c035cf20>] (drm_gem_cma_create_with_handle+0x1c/0xe8) [ 213.490000] [<c035cf20>] (drm_gem_cma_create_with_handle) from [<c035d088>] (drm_gem_cma_dumb_create+0x3c/0x48) [ 213.490000] [<c035d088>] (drm_gem_cma_dumb_create) from [<c0341ed8>] (drm_ioctl+0x12c/0x444) [ 213.490000] [<c0341ed8>] (drm_ioctl) from [<c0121adc>] (do_vfs_ioctl+0x3f4/0x614) [ 213.490000] [<c0121adc>] (do_vfs_ioctl) from [<c0121d30>] (SyS_ioctl+0x34/0x5c) [ 213.490000] [<c0121d30>] (SyS_ioctl) from [<c000f2c0>] (ret_fast_syscall+0x0/0x34) UBIFS is using PagePrivate() which can have different meanings across filesystems. Therefore the generic page migration code cannot handle this case correctly. We have to implement our own migration function which basically does a plain copy but also duplicates the page private flag. UBIFS is not a block device filesystem and cannot use buffer_migrate_page(). Cc: stable@vger.kernel.org Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [rw: Massaged changelog, build fixes, etc...] Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Christoph Hellwig <hch@lst.de>
2016-06-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns fix from Eric Biederman: "This contains just a single small patch that fixes a tiny hole in the logic of allowing unprivileged mounting of proc and sysfs. In practice I don't think anyone is affected because having MNT_RDONLY clear in mnt->mnt_flags but MS_RDONLY set in sb->s_flags is very weird for a filesystem, and weirder for proc and sysfs. However if it happens let's handle it correctly and then no one has to to worry about this crazy case" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: mnt: Account for MS_RDONLY in fs_fully_visible
2016-06-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A couple more of d_walk()/d_subdirs reordering fixes (stable fodder; ought to solve that crap for good) and a fix for a brown paperbag bug in d_alloc_parallel() (this cycle)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix idiotic braino in d_alloc_parallel() autofs races much milder d_walk() race
2016-06-20fix idiotic braino in d_alloc_parallel()Al Viro
Check for d_unhashed() while searching in in-lookup hash was absolutely wrong. Worse, it masked a deadlock on dget() done under bitlock that nests inside ->d_lock. Thanks to J. R. Okajima for spotting it. Spotted-by: "J. R. Okajima" <hooanon05g@gmail.com> Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-19Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF fixes and a reiserfs fix from Jan Kara: "A couple of udf fixes (most notably a bug in parsing UDF partitions which led to inability to mount recent Windows installation media) and a reiserfs fix for handling kstrdup failure" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: reiserfs: check kstrdup failure udf: Use correct partition reference number for metadata udf: Use IS_ERR when loading metadata mirror file entry udf: Don't BUG on missing metadata partition descriptor
2016-06-18Merge tag 'driver-core-4.7-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are a small number of debugfs, ISA, and one driver core fix for 4.7-rc4. All of these resolve reported issues. The ISA ones have spent the least amount of time in linux-next, sorry about that, I didn't realize they were regressions that needed to get in now (thanks to Thorsten for the prodding!) but they do all pass the 0-day bot tests. The others have been in linux-next for a while now. Full details about them are in the shortlog below" * tag 'driver-core-4.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: isa: Dummy isa_register_driver should return error code isa: Call isa_bus_init before dependent ISA bus drivers register watchdog: ebc-c384_wdt: Allow build for X86_64 iio: stx104: Allow build for X86_64 gpio: Allow PC/104 devices on X86_64 isa: Allow ISA-style drivers on modern systems base: make module_create_drivers_dir race-free debugfs: open_proxy_open(): avoid double fops release debugfs: full_proxy_open(): free proxy on ->open() failure kernel/kcov: unproxify debugfs file's fops
2016-06-18Merge branch 'for-linus-4.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "The most user visible change here is a fix for our recent superblock validation checks that were causing problems on non-4k pagesized systems" * 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: btrfs_check_super_valid: Allow 4096 as stripesize btrfs: remove build fixup for qgroup_account_snapshot btrfs: use new error message helper in qgroup_account_snapshot btrfs: avoid blocking open_ctree from cleaner_kthread Btrfs: don't BUG_ON() in btrfs_orphan_add btrfs: account for non-CoW'd blocks in btrfs_abort_transaction Btrfs: check if extent buffer is aligned to sectorsize btrfs: Use correct format specifier
2016-06-17Btrfs: btrfs_check_super_valid: Allow 4096 as stripesizeChandan Rajendra
Older btrfs-progs/mkfs.btrfs sets 4096 as the stripesize. Hence restricting stripesize to be equal to sectorsize would cause super block validation to return an error on architectures where PAGE_SIZE is not equal to 4096. Hence as a workaround, this commit allows stripesize to be set to 4096 bytes. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17btrfs: remove build fixup for qgroup_account_snapshotDavid Sterba
Introduced in 2c1984f244838477aab ("btrfs: build fixup for qgroup_account_snapshot") as temporary bisectability build fixup. Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17btrfs: use new error message helper in qgroup_account_snapshotDavid Sterba
We've renamed btrfs_std_error, this one is left from last merge. Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17btrfs: avoid blocking open_ctree from cleaner_kthreadZygo Blaxell
This fixes a problem introduced in commit 2f3165ecf103599f82bf0ea254039db335fb5005 "btrfs: don't force mounts to wait for cleaner_kthread to delete one or more subvolumes". open_ctree eventually calls btrfs_replay_log which in turn calls btrfs_commit_super which tries to lock the cleaner_mutex, causing a recursive mutex deadlock during mount. Instead of playing whack-a-mole trying to keep up with all the functions that may want to lock cleaner_mutex, put all the cleaner_mutex lockers back where they were, and attack the problem more directly: keep cleaner_kthread asleep until the filesystem is mounted. When filesystems are mounted read-only and later remounted read-write, open_ctree did not set fs_info->open and neither does anything else. Set this flag in btrfs_remount so that neither btrfs_delete_unused_bgs nor cleaner_kthread get confused by the common case of "/" filesystem read-only mount followed by read-write remount. Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17Btrfs: don't BUG_ON() in btrfs_orphan_addJosef Bacik
This is just a screwup for developers, so change it to an ASSERT() so developers notice when things go wrong and deal with the error appropriately if ASSERT() isn't enabled. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17btrfs: account for non-CoW'd blocks in btrfs_abort_transactionJeff Mahoney
The test for !trans->blocks_used in btrfs_abort_transaction is insufficient to determine whether it's safe to drop the transaction handle on the floor. btrfs_cow_block, informed by should_cow_block, can return blocks that have already been CoW'd in the current transaction. trans->blocks_used is only incremented for new block allocations. If an operation overlaps the blocks in the current transaction entirely and must abort the transaction, we'll happily let it clean up the trans handle even though it may have modified the blocks and will commit an incomplete operation. In the long-term, I'd like to do closer tracking of when the fs is actually modified so we can still recover as gracefully as possible, but that approach will need some discussion. In the short term, since this is the only code using trans->blocks_used, let's just switch it to a bool indicating whether any blocks were used and set it when should_cow_block returns false. Cc: stable@vger.kernel.org # 3.4+ Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17Btrfs: check if extent buffer is aligned to sectorsizeLiu Bo
Thanks to fuzz testing, we can pass an invalid bytenr to extent buffer via alloc_extent_buffer(). An unaligned eb can have more pages than it should have, which ends up extent buffer's leak or some corrupted content in extent buffer. This adds a warning to let us quickly know what was happening. Now that alloc_extent_buffer() no more returns NULL, this changes its caller and callers of its caller to match with the new error handling. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17btrfs: Use correct format specifierHeinrich Schuchardt
Component mirror_num of struct btrfsic_block is defined as unsigned int. Use %u as format specifier. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-16Merge tag 'nfsd-4.7-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd bugfixes from Bruce Fields: "Oleg Drokin found and fixed races in the nfsd4 state code that go back to the big nfs4_lock_state removal around 3.17 (but that were also probably hard to reproduce before client changes in 3.20 allowed the client to perform parallel opens). Also fix a 4.1 backchannel crash due to rpc multipath changes in 4.6. Trond acked the client-side rpc fixes going through my tree" * tag 'nfsd-4.7-1' of git://linux-nfs.org/~bfields/linux: nfsd: Make init_open_stateid() a bit more whole nfsd: Extend the mutex holding region around in nfsd4_process_open2() nfsd: Always lock state exclusively. rpc: share one xps between all backchannels nfsd4/rpc: move backchannel create logic into rpc code SUNRPC: fix xprt leak on xps allocation failure nfsd: Fix NFSD_MDS_PR_KEY on 32-bit by adding ULL postfix
2016-06-16Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "This contains two regression fixes: one for the xattr API update and one for using the mounter's creds in file creation in overlayfs. There's also a fix for a bug in handling hard linked AF_UNIX sockets that's been there from day one. This fix is overlayfs only despite the fact that it touches code outside the overlay filesystem: d_real() is an identity function for all except overlay dentries" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix uid/gid when creating over whiteout ovl: xattr filter fix af_unix: fix hard linked sockets on overlay vfs: add d_real_inode() helper
2016-06-15nfsd: Make init_open_stateid() a bit more wholeOleg Drokin
Move the state selection logic inside from the caller, always making it return correct stp to use. Signed-off-by: J . Bruce Fields <bfields@fieldses.org> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-15nfsd: Extend the mutex holding region around in nfsd4_process_open2()Oleg Drokin
To avoid racing entry into nfs4_get_vfs_file(). Make init_open_stateid() return with locked stateid to be unlocked by the caller. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-15nfsd: Always lock state exclusively.Oleg Drokin
It used to be the case that state had an rwlock that was locked for write by downgrades, but for read for upgrades (opens). Well, the problem is if there are two competing opens for the same state, they step on each other toes potentially leading to leaking file descriptors from the state structure, since access mode is a bitmap only set once. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-15nfsd4/rpc: move backchannel create logic into rpc codeJ. Bruce Fields
Also simplify the logic a bit. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com>
2016-06-15ovl: fix uid/gid when creating over whiteoutMiklos Szeredi
Fix a regression when creating a file over a whiteout. The new file/directory needs to use the current fsuid/fsgid, not the ones from the mounter's credentials. The refcounting is a bit tricky: prepare_creds() sets an original refcount, override_creds() gets one more, which revert_cred() drops. So 1) we need to expicitly put the mounter's credentials when overriding with the updated one 2) we need to put the original ref to the updated creds (and this can safely be done before revert_creds(), since we'll still have the ref from override_creds()). Reported-by: Stephen Smalley <sds@tycho.nsa.gov> Fixes: 3fe6e52f0626 ("ovl: override creds with the ones from the superblock mounter") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-06-15debugfs: open_proxy_open(): avoid double fops releaseNicolai Stange
Debugfs' open_proxy_open(), the ->open() installed at all inodes created through debugfs_create_file_unsafe(), - grabs a reference to the original file_operations instance passed to debugfs_create_file_unsafe() via fops_get(), - installs it at the file's ->f_op by means of replace_fops() - and calls fops_put() on it. Since the semantics of replace_fops() are such that the reference's ownership is transferred, the subsequent fops_put() will result in a double release when the file is eventually closed. Currently, this is not an issue since fops_put() basically does a module_put() on the file_operations' ->owner only and there don't exist any modules calling debugfs_create_file_unsafe() yet. This is expected to change in the future though, c.f. commit c64688081490 ("debugfs: add support for self-protecting attribute file fops"). Remove the call to fops_put() from open_proxy_open(). Fixes: 9fd4dcece43a ("debugfs: prevent access to possibly dead file_operations at file open") Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-15debugfs: full_proxy_open(): free proxy on ->open() failureNicolai Stange
Debugfs' full_proxy_open(), the ->open() installed at all inodes created through debugfs_create_file(), - grabs a reference to the original struct file_operations instance passed to debugfs_create_file(), - dynamically allocates a proxy struct file_operations instance wrapping the original - and installs this at the file's ->f_op. Afterwards, it calls the original ->open() and passes its return value back to the VFS layer. Now, if that return value indicates failure, the VFS layer won't ever call ->release() and thus, neither the reference to the original file_operations nor the memory for the proxy file_operations will get released, i.e. both are leaked. Upon failure of the original fops' ->open(), undo the proxy installation. That is: - Set the struct file ->f_op to what it had been when full_proxy_open() was entered. - Drop the reference to the original file_operations. - Free the memory holding the proxy file_operations. Fixes: 49d200deaa68 ("debugfs: prevent access to removed files' private data") Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-15mnt: Account for MS_RDONLY in fs_fully_visibleEric W. Biederman
In rare cases it is possible for s_flags & MS_RDONLY to be set but MNT_READONLY to be clear. This starting combination can cause fs_fully_visible to fail to ensure that the new mount is readonly. Therefore force MNT_LOCK_READONLY in the new mount if MS_RDONLY is set on the source filesystem of the mount. In general both MS_RDONLY and MNT_READONLY are set at the same for mounts so I don't expect any programs to care. Nor do I expect MS_RDONLY to be set on proc or sysfs in the initial user namespace, which further decreases the likelyhood of problems. Which means this change should only affect system configurations by paranoid sysadmins who should welcome the additional protection as it keeps people from wriggling out of their policies. Cc: stable@vger.kernel.org Fixes: 8c6cf9cc829f ("mnt: Modify fs_fully_visible to deal with locked ro nodev and atime") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-14nfsd: Fix NFSD_MDS_PR_KEY on 32-bit by adding ULL postfixGeert Uytterhoeven
On 32-bit: fs/nfsd/blocklayout.c: In function ‘nfsd4_block_get_device_info_scsi’: fs/nfsd/blocklayout.c:337: warning: integer constant is too large for ‘long’ type fs/nfsd/blocklayout.c:344: warning: integer constant is too large for ‘long’ type fs/nfsd/blocklayout.c: In function ‘nfsd4_scsi_fence_client’: fs/nfsd/blocklayout.c:385: warning: integer constant is too large for ‘long’ type Add the missing "ULL" postfix to 64-bit constant NFSD_MDS_PR_KEY to fix this. Fixes: f99d4fbdae6765d0 ("nfsd: add SCSI layout support") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-12autofs racesAl Viro
* make autofs4_expire_indirect() skip the dentries being in process of expiry * do *not* mess with list_move(); making sure that dentry with AUTOFS_INF_EXPIRING are not picked for expiry is enough. * do not remove NO_RCU when we set EXPIRING, don't bother with smp_mb() there. Clear it at the same time we clear EXPIRING. Makes a bunch of tests simpler. * rename NO_RCU to WANT_EXPIRE, which is what it really is. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-10Merge branch 'for-linus-4.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Has some fixes and some new self tests for btrfs. The self tests are usually disabled in the .config file (unless you're doing btrfs dev work), and this bunch is meant to find problems with the 64K page size patches. Jeff has a patch to help people see if they are using the hardware assist crc32c module, which really helps us nail down problems when people ask why crcs are using so much CPU. Otherwise, it's small fixes" * 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: self-tests: Fix extent buffer bitmap test fail on BE system Btrfs: self-tests: Fix test_bitmaps fail on 64k sectorsize Btrfs: self-tests: Use macros instead of constants and add missing newline Btrfs: self-tests: Support testing all possible sectorsizes and nodesizes Btrfs: self-tests: Execute page straddling test only when nodesize < PAGE_SIZE btrfs: advertise which crc32c implementation is being used at module load Btrfs: add validadtion checks for chunk loading Btrfs: add more validation checks for superblock Btrfs: clear uptodate flags of pages in sys_array eb Btrfs: self-tests: Support non-4k page size Btrfs: Fix integer overflow when calculating bytes_per_bitmap Btrfs: test_check_exists: Fix infinite loop when searching for free space entries Btrfs: end transaction if we abort when creating uuid root btrfs: Use __u64 in exported linux/btrfs.h.
2016-06-10Merge branch 'stacking-fixes' (vfs stacking fixes from Jann)Linus Torvalds
Merge filesystem stacking fixes from Jann Horn. * emailed patches from Jann Horn <jannh@google.com>: sched: panic on corrupted stack end ecryptfs: forbid opening files without mmap handler proc: prevent stacking filesystems on top
2016-06-10ecryptfs: forbid opening files without mmap handlerJann Horn
This prevents users from triggering a stack overflow through a recursive invocation of pagefault handling that involves mapping procfs files into virtual memory. Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-10proc: prevent stacking filesystems on topJann Horn
This prevents stacking filesystems (ecryptfs and overlayfs) from using procfs as lower filesystem. There is too much magic going on inside procfs, and there is no good reason to stack stuff on top of procfs. (For example, procfs does access checks in VFS open handlers, and ecryptfs by design calls open handlers from a kernel thread that doesn't drop privileges or so.) Signed-off-by: Jann Horn <jannh@google.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-10much milder d_walk() raceAl Viro
d_walk() relies upon the tree not getting rearranged under it without rename_lock being touched. And we do grab rename_lock around the places that change the tree topology. Unfortunately, branch reordering is just as bad from d_walk() POV and we have two places that do it without touching rename_lock - one in handling of cursors (for ramfs-style directories) and another in autofs. autofs one is a separate story; this commit deals with the cursors. * mark cursor dentries explicitly at allocation time * make __dentry_kill() leave ->d_child.next pointing to the next non-cursor sibling, making sure that it won't be moved around unnoticed before the parent is relocked on ascend-to-parent path in d_walk(). * make d_walk() skip cursors explicitly; strictly speaking it's not necessary (all callbacks we pass to d_walk() are no-ops on cursors), but it makes analysis easier. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-08Merge branch 'misc-fixes-4.7' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.7
2016-06-08Merge branch 'for-chris' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.7
2016-06-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Fixes for crap of assorted ages: EOPENSTALE one is 4.2+, autofs one is 4.6, d_walk - 3.2+. The atomic_open() and coredump ones are regressions from this window" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coredump: fix dumping through pipes fix a regression in atomic_open() fix d_walk()/non-delayed __d_free() race autofs braino fix for do_last() fix EOPENSTALE bug in do_last()
2016-06-07coredump: fix dumping through pipesMateusz Guzik
The offset in the core file used to be tracked with ->written field of the coredump_params structure. The field was retired in favour of file->f_pos. However, ->f_pos is not maintained for pipes which leads to breakage. Restore explicit tracking of the offset in coredump_params. Introduce ->pos field for this purpose since ->written was already reused. Fixes: a00839395103 ("get rid of coredump_params->written"). Reported-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-07fix a regression in atomic_open()Al Viro
open("/foo/no_such_file", O_RDONLY | O_CREAT) on should fail with EACCES when /foo is not writable; failing with ENOENT is obviously wrong. That got broken by a braino introduced when moving the creat_error logics from atomic_open() to lookup_open(). Easy to fix, fortunately. Spotted-by: "Yan, Zheng" <ukernel@gmail.com> Tested-by: "Yan, Zheng" <ukernel@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-07fix d_walk()/non-delayed __d_free() raceAl Viro
Ascend-to-parent logics in d_walk() depends on all encountered child dentries not getting freed without an RCU delay. Unfortunately, in quite a few cases it is not true, with hard-to-hit oopsable race as the result. Fortunately, the fix is simiple; right now the rule is "if it ever been hashed, freeing must be delayed" and changing it to "if it ever had a parent, freeing must be delayed" closes that hole and covers all cases the old rule used to cover. Moreover, pipes and sockets remain _not_ covered, so we do not introduce RCU delay in the cases which are the reason for having that delay conditional in the first place. Cc: stable@vger.kernel.org # v3.2+ (and watch out for __d_materialise_dentry()) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-06mnt: fs_fully_visible test the proper mount for MNT_LOCKEDEric W. Biederman
MNT_LOCKED implies on a child mount implies the child is locked to the parent. So while looping through the children the children should be tested (not their parent). Typically an unshare of a mount namespace locks all mounts together making both the parent and the slave as locked but there are a few corner cases where other things work. Cc: stable@vger.kernel.org Fixes: ceeb0e5d39fc ("vfs: Ignore unlocked mounts in fs_fully_visible") Reported-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-06mnt: If fs_fully_visible fails call put_filesystem.Eric W. Biederman
Add this trivial missing error handling. Cc: stable@vger.kernel.org Fixes: 1b852bceb0d1 ("mnt: Refactor the logic for mounting sysfs and proc in a user namespace") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-06Btrfs: self-tests: Fix extent buffer bitmap test fail on BE systemFeifei Xu
In __test_eb_bitmaps(), we write random data to a bitmap. Then copy the bitmap to another bitmap that resides inside an extent buffer. Later we verify the values of corresponding bits in the bitmap and the bitmap inside the extent buffer. However, extent_buffer_test_bit() reads in byte granularity while test_bit() reads in unsigned long granularity. Hence we end up comparing wrong bits on big-endian systems such as ppc64. This commit fixes the issue by reading the bitmap in byte granularity. Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-06Btrfs: self-tests: Fix test_bitmaps fail on 64k sectorsizeFeifei Xu
With 64K sectorsize, 1G sized block group cannot span across bitmaps. To execute test_bitmaps() function, this commit allocates "BITS_PER_BITMAP * sectorsize + PAGE_SIZE" sized block group. Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-06Btrfs: self-tests: Use macros instead of constants and add missing newlineFeifei Xu
This commit replaces numerical constants with appropriate preprocessor macros. Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-06Btrfs: self-tests: Support testing all possible sectorsizes and nodesizesFeifei Xu
To test all possible sectorsizes, this commit adds a sectorsize array. This commit executes the tests for all possible sectorsizes and nodesizes. Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-06Btrfs: self-tests: Execute page straddling test only when nodesize < PAGE_SIZEFeifei Xu
On ppc64, PAGE_SIZE is 64k which is same as BTRFS_MAX_METADATA_BLOCKSIZE. In such a scenario, we will never be able to have an extent buffer containing more than one page. Hence in such cases this commit does not execute the page straddling tests. Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-06ovl: xattr filter fixMiklos Szeredi
a) ovl_need_xattr_filter() is wrong, we can have multiple lower layers overlaid, all of which (except the lowest one) honouring the "trusted.overlay.opaque" xattr. So need to filter everything except the bottom and the pure-upper layer. b) we no longer can assume that inode is attached to dentry in get/setxattr. This patch unconditionally filters private xattrs to fix both of the above. Performance impact for get/removexattrs is likely in the noise. For listxattrs it might be measurable in pathological cases, but I very much hope nobody cares. If they do, we'll fix it then. Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: b96809173e94 ("security_d_instantiate(): move to the point prior to attaching dentry to inode")
2016-06-06btrfs: advertise which crc32c implementation is being used at module loadJeff Mahoney
Since several architectures support hardware-accelerated crc32c calculation, it would be nice to confirm that btrfs is actually using it. We can see an elevated use count for the module, but it doesn't actually show who the users are. This patch simply prints the name of the driver after successfully initializing the shash. Signed-off-by: Jeff Mahoney <jeffm@suse.com> [ added a helper and used in module load-time message ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-06Btrfs: add validadtion checks for chunk loadingLiu Bo
To prevent fuzzed filesystem images from panic the whole system, we need various validation checks to refuse to mount such an image if btrfs finds any invalid value during loading chunks, including both sys_array and regular chunks. Note that these checks may not be sufficient to cover all corner cases, feel free to add more checks. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>