summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2021-04-16fs: direct-io: fix missing sdio->boundaryJack Qiu
commit df41872b68601059dd4a84858952dcae58acd331 upstream. I encountered a hung task issue, but not a performance one. I run DIO on a device (need lba continuous, for example open channel ssd), maybe hungtask in below case: DIO: Checkpoint: get addr A(at boundary), merge into BIO, no submit because boundary missing flush dirty data(get addr A+1), wait IO(A+1) writeback timeout, because DIO(A) didn't submit get addr A+2 fail, because checkpoint is doing dio_send_cur_page() may clear sdio->boundary, so prevent it from missing a boundary. Link: https://lkml.kernel.org/r/20210322042253.38312-1-jack.qiu@huawei.com Fixes: b1058b981272 ("direct-io: submit bio after boundary buffer is added to it") Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10cifs: Silently ignore unknown oplock break handleVincent Whitchurch
[ Upstream commit 219481a8f90ec3a5eed9638fb35609e4b1aeece7 ] Make SMB2 not print out an error when an oplock break is received for an unknown handle, similar to SMB1. The debug message which is printed for these unknown handles may also be misleading, so fix that too. The SMB2 lease break path is not affected by this patch. Without this, a program which writes to a file from one thread, and opens, reads, and writes the same file from another thread triggers the below errors several times a minute when run against a Samba server configured with "smb2 leases = no". CIFS: VFS: \\192.168.0.1 No task to wake, unknown frame received! NumMids 2 00000000: 424d53fe 00000040 00000000 00000012 .SMB@........... 00000010: 00000001 00000000 ffffffff ffffffff ................ 00000020: 00000000 00000000 00000000 00000000 ................ 00000030: 00000000 00000000 00000000 00000000 ................ Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Reviewed-by: Tom Talpey <tom@talpey.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-10cifs: revalidate mapping when we open files for SMB1 POSIXRonnie Sahlberg
[ Upstream commit cee8f4f6fcabfdf229542926128e9874d19016d5 ] RHBZ: 1933527 Under SMB1 + POSIX, if an inode is reused on a server after we have read and cached a part of a file, when we then open the new file with the re-cycled inode there is a chance that we may serve the old data out of cache to the application. This only happens for SMB1 (deprecated) and when posix are used. The simplest solution to avoid this race is to force a revalidate on smb1-posix open. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-07reiserfs: update reiserfs_xattrs_initialized() conditionTetsuo Handa
commit 5e46d1b78a03d52306f21f77a4e4a144b6d31486 upstream. syzbot is reporting NULL pointer dereference at reiserfs_security_init() [1], for commit ab17c4f02156c4f7 ("reiserfs: fixup xattr_root caching") is assuming that REISERFS_SB(s)->xattr_root != NULL in reiserfs_xattr_jcreate_nblocks() despite that commit made REISERFS_SB(sb)->priv_root != NULL && REISERFS_SB(s)->xattr_root == NULL case possible. I guess that commit 6cb4aff0a77cc0e6 ("reiserfs: fix oops while creating privroot with selinux enabled") wanted to check xattr_root != NULL before reiserfs_xattr_jcreate_nblocks(), for the changelog is talking about the xattr root. The issue is that while creating the privroot during mount reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which dereferences the xattr root. The xattr root doesn't exist, so we get an oops. Therefore, update reiserfs_xattrs_initialized() to check both the privroot and the xattr root. Link: https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cde # [1] Reported-and-tested-by: syzbot <syzbot+690cb1e51970435f9775@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 6cb4aff0a77c ("reiserfs: fix oops while creating privroot with selinux enabled") Acked-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Jan Kara <jack@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-07ext4: do not iput inode under running transaction in ext4_rename()zhangyi (F)
[ Upstream commit 5dccdc5a1916d4266edd251f20bbbb113a5c495f ] In ext4_rename(), when RENAME_WHITEOUT failed to add new entry into directory, it ends up dropping new created whiteout inode under the running transaction. After commit <9b88f9fb0d2> ("ext4: Do not iput inode under running transaction"), we follow the assumptions that evict() does not get called from a transaction context but in ext4_rename() it breaks this suggestion. Although it's not a real problem, better to obey it, so this patch add inode to orphan list and stop transaction before final iput(). Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210303131703.330415-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-07ext4: fix bh ref count on error pathsZhaolong Zhang
[ Upstream commit c915fb80eaa6194fa9bd0a4487705cd5b0dda2f1 ] __ext4_journalled_writepage should drop bhs' ref count on error paths Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com> Link: https://lore.kernel.org/r/1614678151-70481-1-git-send-email-zhangzl2013@126.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30squashfs: fix xattr id and id lookup sanity checksPhillip Lougher
commit 8b44ca2b634527151af07447a8090a5f3a043321 upstream. The checks for maximum metadata block size is missing SQUASHFS_BLOCK_OFFSET (the two byte length count). Link: https://lkml.kernel.org/r/2069685113.2081245.1614583677427@webmail.123-reg.co.uk Fixes: f37aa4c7366e23f ("squashfs: add more sanity checks in id lookup") Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Cc: Sean Nyekjaer <sean@geanix.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30squashfs: fix inode lookup sanity checksSean Nyekjaer
commit c1b2028315c6b15e8d6725e0d5884b15887d3daa upstream. When mouting a squashfs image created without inode compression it fails with: "unable to read inode lookup table" It turns out that the BLOCK_OFFSET is missing when checking the SQUASHFS_METADATA_SIZE agaist the actual size. Link: https://lkml.kernel.org/r/20210226092903.1473545-1-sean@geanix.com Fixes: eabac19e40c0 ("squashfs: add more sanity checks in inode lookup") Signed-off-by: Sean Nyekjaer <sean@geanix.com> Acked-by: Phillip Lougher <phillip@squashfs.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30nfs: we don't support removing system.nfs4_aclJ. Bruce Fields
[ Upstream commit 4f8be1f53bf615102d103c0509ffa9596f65b718 ] The NFSv4 protocol doesn't have any notion of reomoving an attribute, so removexattr(path,"system.nfs4_acl") doesn't make sense. There's no documented return value. Arguably it could be EOPNOTSUPP but I'm a little worried an application might take that to mean that we don't support ACLs or xattrs. How about EINVAL? Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30NFS: Correct size calculation for create reply lengthFrank Sorenson
[ Upstream commit ad3dbe35c833c2d4d0bbf3f04c785d32f931e7c9 ] CREATE requests return a post_op_fh3, rather than nfs_fh3. The post_op_fh3 includes an extra word to indicate 'handle_follows'. Without that additional word, create fails when full 64-byte filehandles are in use. Add NFS3_post_op_fh_sz, and correct the size calculation for NFS3_createres_sz. Signed-off-by: Frank Sorenson <sorenson@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30nfs: fix PNFS_FLEXFILE_LAYOUT Kconfig defaultTimo Rothenpieler
[ Upstream commit a0590473c5e6c4ef17c3132ad08fbad170f72d55 ] This follows what was done in 8c2fabc6542d9d0f8b16bd1045c2eda59bdcde13. With the default being m, it's impossible to build the module into the kernel. Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-24ext4: fix potential error in ext4_do_update_inodeShijie Luo
commit 7d8bd3c76da1d94b85e6c9b7007e20e980bfcfe6 upstream. If set_large_file = 1 and errors occur in ext4_handle_dirty_metadata(), the error code will be overridden, go to out_brelse to avoid this situation. Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Link: https://lore.kernel.org/r/20210312065051.36314-1-luoshijie1@huawei.com Cc: stable@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-24ext4: find old entry again if failed to rename whiteoutzhangyi (F)
commit b7ff91fd030dc9d72ed91b1aab36e445a003af4f upstream. If we failed to add new entry on rename whiteout, we cannot reset the old->de entry directly, because the old->de could have moved from under us during make indexed dir. So find the old entry again before reset is needed, otherwise it may corrupt the filesystem as below. /dev/sda: Entry '00000001' in ??? (12) has deleted/unused inode 15. CLEARED. /dev/sda: Unattached inode 75 /dev/sda: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. Fixes: 6b4b8e6b4ad ("ext4: fix bug for rename with RENAME_WHITEOUT") Cc: stable@vger.kernel.org Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210303131703.330415-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-24btrfs: fix race when cloning extent buffer during rewind of an old rootFilipe Manana
commit dbcc7d57bffc0c8cac9dac11bec548597d59a6a5 upstream. While resolving backreferences, as part of a logical ino ioctl call or fiemap, we can end up hitting a BUG_ON() when replaying tree mod log operations of a root, triggering a stack trace like the following: ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:1210! invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 19054 Comm: crawl_335 Tainted: G W 5.11.0-2d11c0084b02-misc-next+ #89 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:__tree_mod_log_rewind+0x3b1/0x3c0 Code: 05 48 8d 74 10 (...) RSP: 0018:ffffc90001eb70b8 EFLAGS: 00010297 RAX: 0000000000000000 RBX: ffff88812344e400 RCX: ffffffffb28933b6 RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff88812344e42c RBP: ffffc90001eb7108 R08: 1ffff11020b60a20 R09: ffffed1020b60a20 R10: ffff888105b050f9 R11: ffffed1020b60a1f R12: 00000000000000ee R13: ffff8880195520c0 R14: ffff8881bc958500 R15: ffff88812344e42c FS: 00007fd1955e8700(0000) GS:ffff8881f5600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007efdb7928718 CR3: 000000010103a006 CR4: 0000000000170ee0 Call Trace: btrfs_search_old_slot+0x265/0x10d0 ? lock_acquired+0xbb/0x600 ? btrfs_search_slot+0x1090/0x1090 ? free_extent_buffer.part.61+0xd7/0x140 ? free_extent_buffer+0x13/0x20 resolve_indirect_refs+0x3e9/0xfc0 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? add_prelim_ref.part.11+0x150/0x150 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? lock_acquired+0xbb/0x600 ? __kasan_check_write+0x14/0x20 ? do_raw_spin_unlock+0xa8/0x140 ? rb_insert_color+0x30/0x360 ? prelim_ref_insert+0x12d/0x430 find_parent_nodes+0x5c3/0x1830 ? resolve_indirect_refs+0xfc0/0xfc0 ? lock_release+0xc8/0x620 ? fs_reclaim_acquire+0x67/0xf0 ? lock_acquire+0xc7/0x510 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x160/0x210 ? lock_release+0xc8/0x620 ? fs_reclaim_acquire+0x67/0xf0 ? lock_acquire+0xc7/0x510 ? poison_range+0x38/0x40 ? unpoison_range+0x14/0x40 ? trace_hardirqs_on+0x55/0x120 btrfs_find_all_roots_safe+0x142/0x1e0 ? find_parent_nodes+0x1830/0x1830 ? btrfs_inode_flags_to_xflags+0x50/0x50 iterate_extent_inodes+0x20e/0x580 ? tree_backref_for_extent+0x230/0x230 ? lock_downgrade+0x3d0/0x3d0 ? read_extent_buffer+0xdd/0x110 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? lock_acquired+0xbb/0x600 ? __kasan_check_write+0x14/0x20 ? _raw_spin_unlock+0x22/0x30 ? __kasan_check_write+0x14/0x20 iterate_inodes_from_logical+0x129/0x170 ? iterate_inodes_from_logical+0x129/0x170 ? btrfs_inode_flags_to_xflags+0x50/0x50 ? iterate_extent_inodes+0x580/0x580 ? __vmalloc_node+0x92/0xb0 ? init_data_container+0x34/0xb0 ? init_data_container+0x34/0xb0 ? kvmalloc_node+0x60/0x80 btrfs_ioctl_logical_to_ino+0x158/0x230 btrfs_ioctl+0x205e/0x4040 ? __might_sleep+0x71/0xe0 ? btrfs_ioctl_get_supported_features+0x30/0x30 ? getrusage+0x4b6/0x9c0 ? __kasan_check_read+0x11/0x20 ? lock_release+0xc8/0x620 ? __might_fault+0x64/0xd0 ? lock_acquire+0xc7/0x510 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? __kasan_check_read+0x11/0x20 ? do_vfs_ioctl+0xfc/0x9d0 ? ioctl_file_clone+0xe0/0xe0 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? __kasan_check_read+0x11/0x20 ? lock_release+0xc8/0x620 ? __task_pid_nr_ns+0xd3/0x250 ? lock_acquire+0xc7/0x510 ? __fget_files+0x160/0x230 ? __fget_light+0xf2/0x110 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fd1976e2427 Code: 00 00 90 48 8b 05 (...) RSP: 002b:00007fd1955e5cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fd1955e5f40 RCX: 00007fd1976e2427 RDX: 00007fd1955e5f48 RSI: 00000000c038943b RDI: 0000000000000004 RBP: 0000000001000000 R08: 0000000000000000 R09: 00007fd1955e6120 R10: 0000557835366b00 R11: 0000000000000246 R12: 0000000000000004 R13: 00007fd1955e5f48 R14: 00007fd1955e5f40 R15: 00007fd1955e5ef8 Modules linked in: ---[ end trace ec8931a1c36e57be ]--- (gdb) l *(__tree_mod_log_rewind+0x3b1) 0xffffffff81893521 is in __tree_mod_log_rewind (fs/btrfs/ctree.c:1210). 1205 * the modification. as we're going backwards, we do the 1206 * opposite of each operation here. 1207 */ 1208 switch (tm->op) { 1209 case MOD_LOG_KEY_REMOVE_WHILE_FREEING: 1210 BUG_ON(tm->slot < n); 1211 fallthrough; 1212 case MOD_LOG_KEY_REMOVE_WHILE_MOVING: 1213 case MOD_LOG_KEY_REMOVE: 1214 btrfs_set_node_key(eb, &tm->key, tm->slot); Here's what happens to hit that BUG_ON(): 1) We have one tree mod log user (through fiemap or the logical ino ioctl), with a sequence number of 1, so we have fs_info->tree_mod_seq == 1; 2) Another task is at ctree.c:balance_level() and we have eb X currently as the root of the tree, and we promote its single child, eb Y, as the new root. Then, at ctree.c:balance_level(), we call: tree_mod_log_insert_root(eb X, eb Y, 1); 3) At tree_mod_log_insert_root() we create tree mod log elements for each slot of eb X, of operation type MOD_LOG_KEY_REMOVE_WHILE_FREEING each with a ->logical pointing to ebX->start. These are placed in an array named tm_list. Lets assume there are N elements (N pointers in eb X); 4) Then, still at tree_mod_log_insert_root(), we create a tree mod log element of operation type MOD_LOG_ROOT_REPLACE, ->logical set to ebY->start, ->old_root.logical set to ebX->start, ->old_root.level set to the level of eb X and ->generation set to the generation of eb X; 5) Then tree_mod_log_insert_root() calls tree_mod_log_free_eb() with tm_list as argument. After that, tree_mod_log_free_eb() calls __tree_mod_log_insert() for each member of tm_list in reverse order, from highest slot in eb X, slot N - 1, to slot 0 of eb X; 6) __tree_mod_log_insert() sets the sequence number of each given tree mod log operation - it increments fs_info->tree_mod_seq and sets fs_info->tree_mod_seq as the sequence number of the given tree mod log operation. This means that for the tm_list created at tree_mod_log_insert_root(), the element corresponding to slot 0 of eb X has the highest sequence number (1 + N), and the element corresponding to the last slot has the lowest sequence number (2); 7) Then, after inserting tm_list's elements into the tree mod log rbtree, the MOD_LOG_ROOT_REPLACE element is inserted, which gets the highest sequence number, which is N + 2; 8) Back to ctree.c:balance_level(), we free eb X by calling btrfs_free_tree_block() on it. Because eb X was created in the current transaction, has no other references and writeback did not happen for it, we add it back to the free space cache/tree; 9) Later some other task T allocates the metadata extent from eb X, since it is marked as free space in the space cache/tree, and uses it as a node for some other btree; 10) The tree mod log user task calls btrfs_search_old_slot(), which calls get_old_root(), and finally that calls __tree_mod_log_oldest_root() with time_seq == 1 and eb_root == eb Y; 11) First iteration of the while loop finds the tree mod log element with sequence number N + 2, for the logical address of eb Y and of type MOD_LOG_ROOT_REPLACE; 12) Because the operation type is MOD_LOG_ROOT_REPLACE, we don't break out of the loop, and set root_logical to point to tm->old_root.logical which corresponds to the logical address of eb X; 13) On the next iteration of the while loop, the call to tree_mod_log_search_oldest() returns the smallest tree mod log element for the logical address of eb X, which has a sequence number of 2, an operation type of MOD_LOG_KEY_REMOVE_WHILE_FREEING and corresponds to the old slot N - 1 of eb X (eb X had N items in it before being freed); 14) We then break out of the while loop and return the tree mod log operation of type MOD_LOG_ROOT_REPLACE (eb Y), and not the one for slot N - 1 of eb X, to get_old_root(); 15) At get_old_root(), we process the MOD_LOG_ROOT_REPLACE operation and set "logical" to the logical address of eb X, which was the old root. We then call tree_mod_log_search() passing it the logical address of eb X and time_seq == 1; 16) Then before calling tree_mod_log_search(), task T adds a key to eb X, which results in adding a tree mod log operation of type MOD_LOG_KEY_ADD to the tree mod log - this is done at ctree.c:insert_ptr() - but after adding the tree mod log operation and before updating the number of items in eb X from 0 to 1... 17) The task at get_old_root() calls tree_mod_log_search() and gets the tree mod log operation of type MOD_LOG_KEY_ADD just added by task T. Then it enters the following if branch: if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) { (...) } (...) Calls read_tree_block() for eb X, which gets a reference on eb X but does not lock it - task T has it locked. Then it clones eb X while it has nritems set to 0 in its header, before task T sets nritems to 1 in eb X's header. From hereupon we use the clone of eb X which no other task has access to; 18) Then we call __tree_mod_log_rewind(), passing it the MOD_LOG_KEY_ADD mod log operation we just got from tree_mod_log_search() in the previous step and the cloned version of eb X; 19) At __tree_mod_log_rewind(), we set the local variable "n" to the number of items set in eb X's clone, which is 0. Then we enter the while loop, and in its first iteration we process the MOD_LOG_KEY_ADD operation, which just decrements "n" from 0 to (u32)-1, since "n" is declared with a type of u32. At the end of this iteration we call rb_next() to find the next tree mod log operation for eb X, that gives us the mod log operation of type MOD_LOG_KEY_REMOVE_WHILE_FREEING, for slot 0, with a sequence number of N + 1 (steps 3 to 6); 20) Then we go back to the top of the while loop and trigger the following BUG_ON(): (...) switch (tm->op) { case MOD_LOG_KEY_REMOVE_WHILE_FREEING: BUG_ON(tm->slot < n); fallthrough; (...) Because "n" has a value of (u32)-1 (4294967295) and tm->slot is 0. Fix this by taking a read lock on the extent buffer before cloning it at ctree.c:get_old_root(). This should be done regardless of the extent buffer having been freed and reused, as a concurrent task might be modifying it (while holding a write lock on it). Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Link: https://lore.kernel.org/linux-btrfs/20210227155037.GN28049@hungrycats.org/ Fixes: 834328a8493079 ("Btrfs: tree mod log's old roots could still be part of the tree") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-24ext4: check journal inode extents more carefullyJan Kara
commit ce9f24cccdc019229b70a5c15e2b09ad9c0ab5d1 upstream. Currently, system zones just track ranges of block, that are "important" fs metadata (bitmaps, group descriptors, journal blocks, etc.). This however complicates how extent tree (or indirect blocks) can be checked for inodes that actually track such metadata - currently the journal inode but arguably we should be treating quota files or resize inode similarly. We cannot run __ext4_ext_check() on such metadata inodes when loading their extents as that would immediately trigger the validity checks and so we just hack around that and special-case the journal inode. This however leads to a situation that a journal inode which has extent tree of depth at least one can have invalid extent tree that gets unnoticed until ext4_cache_extents() crashes. To overcome this limitation, track inode number each system zone belongs to (0 is used for zones not belonging to any inode). We can then verify inode number matches the expected one when verifying extent tree and thus avoid the false errors. With this there's no need to to special-case journal inode during extent tree checking anymore so remove it. Fixes: 0a944e8a6c66 ("ext4: don't perform block validity checks on the journal inode") Reported-by: Wolfgang Frisch <wolfgang.frisch@suse.com> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200728130437.7804-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-24ext4: don't allow overlapping system zonesJan Kara
commit bf9a379d0980e7413d94cb18dac73db2bfc5f470 upstream. Currently, add_system_zone() just silently merges two added system zones that overlap. However the overlap should not happen and it generally suggests that some unrelated metadata overlap which indicates the fs is corrupted. We should have caught such problems earlier (e.g. in ext4_check_descriptors()) but add this check as another line of defense. In later patch we also use this for stricter checking of journal inode extent tree. Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200728130437.7804-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-24ext4: handle error of ext4_setup_system_zone() on remountJan Kara
commit d176b1f62f242ab259ff665a26fbac69db1aecba upstream. ext4_setup_system_zone() can fail. Handle the failure in ext4_remount(). Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200728130437.7804-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17NFSv4.2: fix return value of _nfs4_get_security_label()Ondrej Mosnacek
[ Upstream commit 53cb245454df5b13d7063162afd7a785aed6ebf2 ] An xattr 'get' handler is expected to return the length of the value on success, yet _nfs4_get_security_label() (and consequently also nfs4_xattr_get_nfs4_label(), which is used as an xattr handler) returns just 0 on success. Fix this by returning label.len instead, which contains the length of the result. Fixes: aa9c2669626c ("NFS: Client implementation of Labeled-NFS") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Reviewed-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17cifs: return proper error code in statfs(2)Paulo Alcantara
commit 14302ee3301b3a77b331cc14efb95bf7184c73cc upstream. In cifs_statfs(), if server->ops->queryfs is not NULL, then we should use its return value rather than always returning 0. Instead, use rc variable as it is properly set to 0 in case there is no server->ops->queryfs. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-07sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs outputJoe Perches
commit 2efc459d06f1630001e3984854848a5647086232 upstream. Output defects can exist in sysfs content using sprintf and snprintf. sprintf does not know the PAGE_SIZE maximum of the temporary buffer used for outputting sysfs content and it's possible to overrun the PAGE_SIZE buffer length. Add a generic sysfs_emit function that knows that the size of the temporary buffer and ensures that no overrun is done. Add a generic sysfs_emit_at function that can be used in multiple call situations that also ensures that no overrun is done. Validate the output buffer argument to be page aligned. Validate the offset len argument to be within the PAGE_SIZE buf. Signed-off-by: Joe Perches <joe@perches.com> Link: https://lore.kernel.org/r/884235202216d464d61ee975f7465332c86f76b2.1600285923.git.joe@perches.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-07xfs: Fix assert failure in xfs_setattr_size()Yumei Huang
commit 88a9e03beef22cc5fabea344f54b9a0dfe63de08 upstream. An assert failure is triggered by syzkaller test due to ATTR_KILL_PRIV is not cleared before xfs_setattr_size. As ATTR_KILL_PRIV is not checked/used by xfs_setattr_size, just remove it from the assert. Signed-off-by: Yumei Huang <yuhuang@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-07JFS: more checks for invalid superblockRandy Dunlap
commit 3bef198f1b17d1bb89260bad947ef084c0a2d1a6 upstream. syzbot is feeding invalid superblock data to JFS for mount testing. JFS does not check several of the fields -- just assumes that they are good since the JFS_MAGIC and version fields are good. In this case (syzbot reproducer), we have s_l2bsize == 0xda0c, pad == 0xf045, and s_state == 0x50, all of which are invalid IMO. Having s_l2bsize == 0xda0c causes this UBSAN warning: UBSAN: shift-out-of-bounds in fs/jfs/jfs_mount.c:373:25 shift exponent -9716 is negative s_l2bsize can be tested for correctness. pad can be tested for non-0 and punted. s_state can be tested for its valid values and punted. Do those 3 tests and if any of them fails, report the superblock as invalid/corrupt and let fsck handle it. With this patch, chkSuper() says this when JFS_DEBUG is enabled: jfs_mount: Mount Failure: superblock is corrupt! Mount JFS Failure: -22 jfs_mount failed w/return code = -22 The obvious problem with this method is that next week there could be another syzbot test that uses different fields for invalid values, this making this like a game of whack-a-mole. syzkaller link: https://syzkaller.appspot.com/bug?extid=36315852ece4132ec193 Reported-by: syzbot+36315852ece4132ec193@syzkaller.appspotmail.com Reported-by: kernel test robot <lkp@intel.com> # v2 Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: jfs-discussion@lists.sourceforge.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-03gfs2: Don't skip dlm unlock if glock has an lvbBob Peterson
commit 78178ca844f0eb88f21f31c7fde969384be4c901 upstream. Patch fb6791d100d1 was designed to allow gfs2 to unmount quicker by skipping the step where it tells dlm to unlock glocks in EX with lvbs. This was done because when gfs2 unmounts a file system, it destroys the dlm lockspace shortly after it destroys the glocks so it doesn't need to unlock them all: the unlock is implied when the lockspace is destroyed by dlm. However, that patch introduced a use-after-free in dlm: as part of its normal dlm_recoverd process, it can call ls_recovery to recover dead locks. In so doing, it can call recover_rsbs which calls recover_lvb for any mastered rsbs. Func recover_lvb runs through the list of lkbs queued to the given rsb (if the glock is cached but unlocked, it will still be queued to the lkb, but in NL--Unlocked--mode) and if it has an lvb, copies it to the rsb, thus trying to preserve the lkb. However, when gfs2 skips the dlm unlock step, it frees the glock and its lvb, which means dlm's function recover_lvb references the now freed lvb pointer, copying the freed lvb memory to the rsb. This patch changes the check in gdlm_put_lock so that it calls dlm_unlock for all glocks that contain an lvb pointer. Fixes: fb6791d100d1 ("GFS2: skip dlm_unlock calls in unmount") Cc: stable@vger.kernel.org # v3.8+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-03f2fs: fix out-of-repair __setattr_copy()Chao Yu
commit 2562515f0ad7342bde6456602c491b64c63fe950 upstream. __setattr_copy() was copied from setattr_copy() in fs/attr.c, there is two missing patches doesn't cover this inner function, fix it. Commit 7fa294c8991c ("userns: Allow chown and setgid preservation") Commit 23adbe12ef7d ("fs,userns: Change inode_capable to capable_wrt_inode_uidgid") Fixes: fbfa2cc58d53 ("f2fs: add file operations") Cc: stable@vger.kernel.org Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-03btrfs: fix reloc root leak with 0 ref reloc roots on recoveryJosef Bacik
commit c78a10aebb275c38d0cfccae129a803fe622e305 upstream. When recovering a relocation, if we run into a reloc root that has 0 refs we simply add it to the reloc_control->reloc_roots list, and then clean it up later. The problem with this is __del_reloc_root() doesn't do anything if the root isn't in the radix tree, which in this case it won't be because we never call __add_reloc_root() on the reloc_root. This exit condition simply isn't correct really. During normal operation we can remove ourselves from the rb tree and then we're meant to clean up later at merge_reloc_roots() time, and this happens correctly. During recovery we're depending on free_reloc_roots() to drop our references, but we're short-circuiting. Fix this by continuing to check if we're on the list and dropping ourselves from the reloc_control root list and dropping our reference appropriately. Change the corresponding BUG_ON() to an ASSERT() that does the correct thing if we aren't in the rb tree. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-03isofs: release buffer head before returnPan Bian
[ Upstream commit 0a6dc67a6aa45f19bd4ff89b4f468fc50c4b8daa ] Release the buffer_head before returning error code in do_isofs_readdir() and isofs_find_entry(). Fixes: 2deb1acc653c ("isofs: fix access to unallocated memory when reading corrupted filesystem") Link: https://lore.kernel.org/r/20210118120455.118955-1-bianpan2016@163.com Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-03jffs2: fix use after free in jffs2_sum_write_data()Tom Rix
[ Upstream commit 19646447ad3a680d2ab08c097585b7d96a66126b ] clang static analysis reports this problem fs/jffs2/summary.c:794:31: warning: Use of memory after it is freed c->summary->sum_list_head = temp->u.next; ^~~~~~~~~~~~ In jffs2_sum_write_data(), in a loop summary data is handles a node at a time. When it has written out the node it is removed the summary list, and the node is deleted. In the corner case when a JFFS2_FEATURE_RWCOMPAT_COPY is seen, a call is made to jffs2_sum_disable_collecting(). jffs2_sum_disable_collecting() deletes the whole list which conflicts with the loop's deleting the list by parts. To preserve the old behavior of stopping the write midway, bail out of the loop after disabling summary collection. Fixes: 6171586a7ae5 ("[JFFS2] Correct handling of JFFS2_FEATURE_RWCOMPAT_COPY nodes.") Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-03fs/jfs: fix potential integer overflow on shift of a intColin Ian King
[ Upstream commit 4208c398aae4c2290864ba15c3dab7111f32bec1 ] The left shift of int 32 bit integer constant 1 is evaluated using 32 bit arithmetic and then assigned to a signed 64 bit integer. In the case where l2nb is 32 or more this can lead to an overflow. Avoid this by shifting the value 1LL instead. Addresses-Coverity: ("Uninitentional integer overflow") Fixes: b40c2e665cd5 ("fs/jfs: TRIM support for JFS Filesystem") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-03btrfs: clarify error returns values in __load_free_space_cacheZhihao Cheng
[ Upstream commit 3cc64e7ebfb0d7faaba2438334c43466955a96e8 ] Return value in __load_free_space_cache is not properly set after (unlikely) memory allocation failures and 0 is returned instead. This is not a problem for the caller load_free_space_cache because only value 1 is considered as 'cache loaded' but for clarity it's better to set the errors accordingly. Fixes: a67509c30079 ("Btrfs: add a io_ctl struct and helpers for dealing with the space cache") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-03ntfs: check for valid standard information attributeRustam Kovhaev
commit 4dfe6bd94959222e18d512bdf15f6bf9edb9c27c upstream. Mounting a corrupted filesystem with NTFS resulted in a kernel crash. We should check for valid STANDARD_INFORMATION attribute offset and length before trying to access it Link: https://lkml.kernel.org/r/20210217155930.1506815-1-rkovhaev@gmail.com Link: https://syzkaller.appspot.com/bug?extid=c584225dabdea2f71969 Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Reported-by: syzbot+c584225dabdea2f71969@syzkaller.appspotmail.com Tested-by: syzbot+c584225dabdea2f71969@syzkaller.appspotmail.com Acked-by: Anton Altaparmakov <anton@tuxera.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23squashfs: add more sanity checks in xattr id lookupPhillip Lougher
commit 506220d2ba21791314af569211ffd8870b8208fa upstream. Sysbot has reported a warning where a kmalloc() attempt exceeds the maximum limit. This has been identified as corruption of the xattr_ids count when reading the xattr id lookup table. This patch adds a number of additional sanity checks to detect this corruption and others. 1. It checks for a corrupted xattr index read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This would cause an out of bounds read. 2. It checks against corruption of the xattr_ids count. This can either lead to the above kmalloc failure, or a smaller than expected table to be read. 3. It checks the contents of the index table for corruption. [phillip@squashfs.org.uk: fix checkpatch issue] Link: https://lkml.kernel.org/r/270245655.754655.1612770082682@webmail.123-reg.co.uk Link: https://lkml.kernel.org/r/20210204130249.4495-5-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: syzbot+2ccea6339d368360800d@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23squashfs: add more sanity checks in inode lookupPhillip Lougher
commit eabac19e40c095543def79cb6ffeb3a8588aaff4 upstream. Sysbot has reported an "slab-out-of-bounds read" error which has been identified as being caused by a corrupted "ino_num" value read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This patch adds additional sanity checks to detect this, and the following corruption. 1. It checks against corruption of the inodes count. This can either lead to a larger table to be read, or a smaller than expected table to be read. In the case of a too large inodes count, this would often have been trapped by the existing sanity checks, but this patch introduces a more exact check, which can identify too small values. 2. It checks the contents of the index table for corruption. [phillip@squashfs.org.uk: fix checkpatch issue] Link: https://lkml.kernel.org/r/527909353.754618.1612769948607@webmail.123-reg.co.uk Link: https://lkml.kernel.org/r/20210204130249.4495-4-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: syzbot+04419e3ff19d2970ea28@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23squashfs: add more sanity checks in id lookupPhillip Lougher
commit f37aa4c7366e23f91b81d00bafd6a7ab54e4a381 upstream. Sysbot has reported a number of "slab-out-of-bounds reads" and "use-after-free read" errors which has been identified as being caused by a corrupted index value read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This patch adds additional sanity checks to detect this, and the following corruption. 1. It checks against corruption of the ids count. This can either lead to a larger table to be read, or a smaller than expected table to be read. In the case of a too large ids count, this would often have been trapped by the existing sanity checks, but this patch introduces a more exact check, which can identify too small values. 2. It checks the contents of the index table for corruption. Link: https://lkml.kernel.org/r/20210204130249.4495-3-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: syzbot+b06d57ba83f604522af2@syzkaller.appspotmail.com Reported-by: syzbot+c021ba012da41ee9807c@syzkaller.appspotmail.com Reported-by: syzbot+5024636e8b5fd19f0f19@syzkaller.appspotmail.com Reported-by: syzbot+bcbc661df46657d0fa4f@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23memcg: fix a crash in wb_workfn when a device disappearsTheodore Ts'o
[ Upstream commit 68f23b89067fdf187763e75a56087550624fdbee ] Without memcg, there is a one-to-one mapping between the bdi and bdi_writeback structures. In this world, things are fairly straightforward; the first thing bdi_unregister() does is to shutdown the bdi_writeback structure (or wb), and part of that writeback ensures that no other work queued against the wb, and that the wb is fully drained. With memcg, however, there is a one-to-many relationship between the bdi and bdi_writeback structures; that is, there are multiple wb objects which can all point to a single bdi. There is a refcount which prevents the bdi object from being released (and hence, unregistered). So in theory, the bdi_unregister() *should* only get called once its refcount goes to zero (bdi_put will drop the refcount, and when it is zero, release_bdi gets called, which calls bdi_unregister). Unfortunately, del_gendisk() in block/gen_hd.c never got the memo about the Brave New memcg World, and calls bdi_unregister directly. It does this without informing the file system, or the memcg code, or anything else. This causes the root wb associated with the bdi to be unregistered, but none of the memcg-specific wb's are shutdown. So when one of these wb's are woken up to do delayed work, they try to dereference their wb->bdi->dev to fetch the device name, but unfortunately bdi->dev is now NULL, thanks to the bdi_unregister() called by del_gendisk(). As a result, *boom*. Fortunately, it looks like the rest of the writeback path is perfectly happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to create a bdi_dev_name() function which can handle bdi->dev being NULL. This also allows us to bulletproof the writeback tracepoints to prevent them from dereferencing a NULL pointer and crashing the kernel if one is tracing with memcg's enabled, and an iSCSI device dies or a USB storage stick is pulled. The most common way of triggering this will be hotremoval of a device while writeback with memcg enabled is going on. It was triggering several times a day in a heavily loaded production environment. Google Bug Id: 145475544 Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-10mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB pageMuchun Song
commit 585fc0d2871c9318c949fbf45b1f081edd489e96 upstream. If a new hugetlb page is allocated during fallocate it will not be marked as active (set_page_huge_active) which will result in a later isolate_huge_page failure when the page migration code would like to move that page. Such a failure would be unexpected and wrong. Only export set_page_huge_active, just leave clear_page_huge_active as static. Because there are no external users. Link: https://lkml.kernel.org/r/20210115124942.46403-3-songmuchun@bytedance.com Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate()) Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-10cifs: report error instead of invalid when revalidating a dentry failsAurelien Aptel
commit 21b200d091826a83aafc95d847139b2b0582f6d1 upstream. Assuming - //HOST/a is mounted on /mnt - //HOST/b is mounted on /mnt/b On a slow connection, running 'df' and killing it while it's processing /mnt/b can make cifs_get_inode_info() returns -ERESTARTSYS. This triggers the following chain of events: => the dentry revalidation fail => dentry is put and released => superblock associated with the dentry is put => /mnt/b is unmounted This patch makes cifs_d_revalidate() return the error instead of 0 (invalid) when cifs_revalidate_dentry() fails, except for ENOENT (file deleted) and ESTALE (file recreated). Signed-off-by: Aurelien Aptel <aaptel@suse.com> Suggested-by: Shyam Prasad N <nspmangalore@gmail.com> Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com> CC: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-10ELF/MIPS build fixRalf Baechle
commit f43edca7ed08fc02279f2a62015da5cb6aa0ad61 upstream. CONFIG_MIPS32_N32=y but CONFIG_BINFMT_ELF disabled results in the following linker errors: arch/mips/built-in.o: In function `elf_core_dump': binfmt_elfn32.c:(.text+0x23dbc): undefined reference to `elf_core_extra_phdrs' binfmt_elfn32.c:(.text+0x246e4): undefined reference to `elf_core_extra_data_size' binfmt_elfn32.c:(.text+0x248d0): undefined reference to `elf_core_write_extra_phdrs' binfmt_elfn32.c:(.text+0x24ac4): undefined reference to `elf_core_write_extra_data' CONFIG_MIPS32_O32=y but CONFIG_BINFMT_ELF disabled results in the following linker errors: arch/mips/built-in.o: In function `elf_core_dump': binfmt_elfo32.c:(.text+0x28a04): undefined reference to `elf_core_extra_phdrs' binfmt_elfo32.c:(.text+0x29330): undefined reference to `elf_core_extra_data_size' binfmt_elfo32.c:(.text+0x2951c): undefined reference to `elf_core_write_extra_phdrs' binfmt_elfo32.c:(.text+0x29710): undefined reference to `elf_core_write_extra_data' This is because binfmt_elfn32 and binfmt_elfo32 are using symbols from elfcore but for these configurations elfcore will not be built. Fixed by making elfcore selectable by a separate config symbol which unlike the current mechanism can also be used from other directories than kernel/, then having each flavor of ELF that relies on elfcore.o, select it in Kconfig, including CONFIG_MIPS32_N32 and CONFIG_MIPS32_O32 which fixes this issue. Link: http://lkml.kernel.org/r/20160520141705.GA1913@linux-mips.org Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-03exit/exec: Seperate mm_release()Thomas Gleixner
commit 4610ba7ad877fafc0a25a30c6c82015304120426 upstream. mm_release() contains the futex exit handling. mm_release() is called from do_exit()->exit_mm() and from exec()->exec_mm(). In the exit_mm() case PF_EXITING and the futex state is updated. In the exec_mm() case these states are not touched. As the futex exit code needs further protections against exit races, this needs to be split into two functions. Preparatory only, no functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191106224556.240518241@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23nfsd4: readdirplus shouldn't return parent of exportJ. Bruce Fields
commit 51b2ee7d006a736a9126e8111d1f24e4fd0afaa6 upstream. If you export a subdirectory of a filesystem, a READDIRPLUS on the root of that export will return the filehandle of the parent with the ".." entry. The filehandle is optional, so let's just not return the filehandle for ".." if we're at the root of an export. Note that once the client learns one filehandle outside of the export, they can trivially access the rest of the export using further lookups. However, it is also not very difficult to guess filehandles outside of the export. So exporting a subdirectory of a filesystem should considered equivalent to providing access to the entire filesystem. To avoid confusion, we recommend only exporting entire filesystems. Reported-by: Youjipeng <wangzhibei1999@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23ext4: fix superblock checksum failure when setting password saltJan Kara
commit dfd56c2c0c0dbb11be939b804ddc8d5395ab3432 upstream. When setting password salt in the superblock, we forget to recompute the superblock checksum so it will not match until the next superblock modification which recomputes the checksum. Fix it. CC: Michael Halcrow <mhalcrow@google.com> Reported-by: Andreas Dilger <adilger@dilger.ca> Fixes: 9bd8212f981e ("ext4 crypto: add encryption policy and password salt support") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-8-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23NFS: nfs_igrab_and_active must first reference the superblockTrond Myklebust
commit 896567ee7f17a8a736cda8a28cc987228410a2ac upstream. Before referencing the inode, we must ensure that the superblock can be referenced. Otherwise, we can end up with iput() calling superblock operations that are no longer valid or accessible. Fixes: ea7c38fef0b7 ("NFSv4: Ensure we reference the inode for return-on-close in delegreturn") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23ext4: fix bug for rename with RENAME_WHITEOUTyangerkun
[ Upstream commit 6b4b8e6b4ad8553660421d6360678b3811d5deb9 ] We got a "deleted inode referenced" warning cross our fsstress test. The bug can be reproduced easily with following steps: cd /dev/shm mkdir test/ fallocate -l 128M img mkfs.ext4 -b 1024 img mount img test/ dd if=/dev/zero of=test/foo bs=1M count=128 mkdir test/dir/ && cd test/dir/ for ((i=0;i<1000;i++)); do touch file$i; done # consume all block cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD, /dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in ext4_rename will return ENOSPC!! cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1 We will get the output: "ls: cannot access 'test/dir/file1': Structure needs cleaning" and the dmesg show: "EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls: deleted inode referenced: 139" ext4_rename will create a special inode for whiteout and use this 'ino' to replace the source file's dir entry 'ino'. Once error happens latter(the error above was the ENOSPC return from ext4_add_entry in ext4_rename since all space has been consumed), the cleanup do drop the nlink for whiteout, but forget to restore 'ino' with source file. This will trigger the bug describle as above. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Fixes: cd808deced43 ("ext4: support RENAME_WHITEOUT") Link: https://lore.kernel.org/r/20210105062857.3566-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-17ubifs: wbuf: Don't leak kernel memory to flashRichard Weinberger
commit 20f1431160c6b590cdc269a846fc5a448abf5b98 upstream Write buffers use a kmalloc()'ed buffer, they can leak up to seven bytes of kernel memory to flash if writes are not aligned. So use ubifs_pad() to fill these gaps with padding bytes. This was never a problem while scanning because the scanner logic manually aligns node lengths and skips over these gaps. Cc: <stable@vger.kernel.org> Fixes: 1e51764a3c2ac05a2 ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> [sudip: adjust context] Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-09quota: Don't overflow quota file offsetsJan Kara
[ Upstream commit 10f04d40a9fa29785206c619f80d8beedb778837 ] The on-disk quota format supports quota files with upto 2^32 blocks. Be careful when computing quota file offsets in the quota files from block numbers as they can overflow 32-bit types. Since quota files larger than 4GB would require ~26 millions of quota users, this is mostly a theoretical concern now but better be careful, fuzzers would find the problem sooner or later anyway... Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-09reiserfs: add check for an invalid ih_entry_countRustam Kovhaev
commit d24396c5290ba8ab04ba505176874c4e04a2d53c upstream. when directory item has an invalid value set for ih_entry_count it might trigger use-after-free or out-of-bounds read in bin_search_in_dir_item() ih_entry_count * IH_SIZE for directory item should not be larger than ih_item_len Link: https://lore.kernel.org/r/20201101140958.3650143-1-rkovhaev@gmail.com Reported-and-tested-by: syzbot+83b6f7cf9922cae5c4d7@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=83b6f7cf9922cae5c4d7 Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-29jfs: Fix array index bounds check in dbAdjTreeDave Kleikamp
commit c61b3e4839007668360ed8b87d7da96d2e59fc6c upstream. Bounds checking tools can flag a bug in dbAdjTree() for an array index out of bounds in dmt_stree. Since dmt_stree can refer to the stree in both structures dmaptree and dmapctl, use the larger array to eliminate the false positive. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-29jffs2: Fix GC exit abnormallyZhe Li
commit 9afc9a8a4909fece0e911e72b1060614ba2f7969 upstream. The log of this problem is: jffs2: Error garbage collecting node at 0x***! jffs2: No space for garbage collection. Aborting GC thread This is because GC believe that it do nothing, so it abort. After going over the image of jffs2, I find a scene that can trigger this problem stably. The scene is: there is a normal dirent node at summary-area, but abnormal at corresponding not-summary-area with error name_crc. The reason that GC exit abnormally is because it find that abnormal dirent node to GC, but when it goes to function jffs2_add_fd_to_list, it cannot meet the condition listed below: if ((*prev)->nhash == new->nhash && !strcmp((*prev)->name, new->name)) So no node is marked obsolete, statistical information of erase_block do not change, which cause GC exit abnormally. The root cause of this problem is: we do not check the name_crc of the abnormal dirent node with summary is enabled. Noticed that in function jffs2_scan_dirent_node, we use function jffs2_scan_dirty_space to deal with the dirent node with error name_crc. So this patch add a checking code in function read_direntry to ensure the correctness of dirent node. If checked failed, the dirent node will be marked obsolete so GC will pass this node and this problem will be fixed. Cc: <stable@vger.kernel.org> Signed-off-by: Zhe Li <lizhe67@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-29ceph: fix race in concurrent __ceph_remove_cap invocationsLuis Henriques
commit e5cafce3ad0f8652d6849314d951459c2bff7233 upstream. A NULL pointer dereference may occur in __ceph_remove_cap with some of the callbacks used in ceph_iterate_session_caps, namely trim_caps_cb and remove_session_caps_cb. Those callers hold the session->s_mutex, so they are prevented from concurrent execution, but ceph_evict_inode does not. Since the callers of this function hold the i_ceph_lock, the fix is simply a matter of returning immediately if caps->ci is NULL. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/43272 Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Luis Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-29ext4: fix a memory leak of ext4_free_dataChunguang Xu
commit cca415537244f6102cbb09b5b90db6ae2c953bdd upstream. When freeing metadata, we will create an ext4_free_data and insert it into the pending free list. After the current transaction is committed, the object will be freed. ext4_mb_free_metadata() will check whether the area to be freed overlaps with the pending free list. If true, return directly. At this time, ext4_free_data is leaked. Fortunately, the probability of this problem is small, since it only occurs if the file system is corrupted such that a block is claimed by more one inode and those inodes are deleted within a single jbd2 transaction. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Link: https://lore.kernel.org/r/1604764698-4269-8-git-send-email-brookxu@tencent.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-29btrfs: fix return value mixup in btrfs_get_extentPavel Machek
commit 881a3a11c2b858fe9b69ef79ac5ee9978a266dc9 upstream btrfs_get_extent() sets variable ret, but out: error path expect error to be in variable err so the error code is lost. Fixes: 6bf9e4bd6a27 ("btrfs: inode: Verify inode mode to avoid NULL pointer dereference") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Pavel Machek (CIP) <pavel@denx.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> [sudip: adjust context] Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>