summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2026-04-10smb: client: add support for O_TMPFILEPaulo Alcantara
Implement O_TMPFILE support for SMB2+ in the CIFS client. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2026-04-10vfs: introduce d_mark_tmpfile_name()Paulo Alcantara
CIFS requires O_TMPFILE dentries to have names of newly created delete-on-close files in the server so it can build full pathnames from the root of the share when performing operations on them. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2026-04-10smb: client: add missing MODULE_DESCRIPTION() to smb1maperror_testVenkat Rao Bagalkote
On the latest linux-next following modpost warning is reported: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/smb/client/smb1maperror_test.o Add MODULE_DESCRIPTION() to the test module to fix the warning. Reviewed-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-04-10Merge tag 'vfs-7.0-rc8.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "The kernfs rbtree is keyed by (hash, ns, name) where the hash is seeded with the raw namespace pointer via init_name_hash(ns). The resulting hash values are exposed to userspace through readdir seek positions, and the pointer-based ordering in kernfs_name_compare() is observable through entry order. Switch from raw pointers to ns_common::ns_id for both hashing and comparison. A preparatory commit first replaces all const void * namespace parameters with const struct ns_common * throughout kernfs, sysfs, and kobject so the code can access ns->ns_id. Also compare the ns_id when hashes match in the rbtree to handle crafted collisions. Also fix eventpoll RCU grace period issue and a cachefiles refcount problem" * tag 'vfs-7.0-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: kernfs: make directory seek namespace-aware kernfs: use namespace id instead of pointer for hashing and comparison kernfs: pass struct ns_common instead of const void * for namespace tags eventpoll: defer struct eventpoll free to RCU grace period cachefiles: fix incorrect dentry refcount in cachefiles_cull()
2026-04-10erofs: error out obviously illegal extents in advanceGao Xiang
Detect some corrupted extent cases during metadata parsing rather than letting them result in harmless decompression failures later: - For full-reference compressed extents, the compressed size must not exceed the decompressed size, which is a strict on-disk layout constraint; - For plain (shifted/interlaced) extents, the decoded size must not exceed the encoded size, even accounting for partial decoding. Both ways work but it should be better to report illegal extents as metadata layout violations rather than deferring as decompression failure. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-04-10erofs: clean up encoded map flagsGao Xiang
- Remove EROFS_MAP_ENCODED since it was always set together with EROFS_MAP_MAPPED for compressed extents and checked redundantly; - Replace the EROFS_MAP_FULL_MAPPED flag with the opposite EROFS_MAP_PARTIAL_MAPPED flag so that extents are implicitly fully mapped initially to simplify the logic; - Make fragment extents independent of EROFS_MAP_MAPPED since they are not directly allocated on disk; thus fragment extents are no longer twisted with mapped extents. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-04-09jbd2: fix deadlock in jbd2_journal_cancel_revoke()Zhang Yi
Commit f76d4c28a46a ("fs/jbd2: use sleeping version of __find_get_block()") changed jbd2_journal_cancel_revoke() to use __find_get_block_nonatomic() which holds the folio lock instead of i_private_lock. This breaks the lock ordering (folio -> buffer) and causes an ABBA deadlock when the filesystem blocksize < pagesize: T1 T2 ext4_mkdir() ext4_init_new_dir() ext4_append() ext4_getblk() lock_buffer() <- A sync_blockdev() blkdev_writepages() writeback_iter() writeback_get_folio() folio_lock() <- B ext4_journal_get_create_access() jbd2_journal_cancel_revoke() __find_get_block_nonatomic() folio_lock() <- B block_write_full_folio() lock_buffer() <- A This can occasionally cause generic/013 to hang. Fix by only calling __find_get_block_nonatomic() when the passed buffer_head doesn't belong to the bdev, which is the only case that we need to look up its bdev alias. Otherwise, the lookup is redundant since the found buffer_head is equal to the one we passed in. Fixes: f76d4c28a46a ("fs/jbd2: use sleeping version of __find_get_block()") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260409114204.917154-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2026-04-09ext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all()Sohei Koyama
The commit c8e008b60492 ("ext4: ignore xattrs past end") introduced a refcount leak in when block_csum is false. ext4_xattr_inode_dec_ref_all() calls ext4_get_inode_loc() to get iloc.bh, but never releases it with brelse(). Fixes: c8e008b60492 ("ext4: ignore xattrs past end") Signed-off-by: Sohei Koyama <skoyama@ddn.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: stable@vger.kernel.org Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Link: https://patch.msgid.link/20260406074830.8480-1-skoyama@ddn.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: fix possible null-ptr-deref in mbt_kunit_exit()Ye Bin
There's issue as follows: # test_new_blocks_simple: failed to initialize: -12 KASAN: null-ptr-deref in range [0x0000000000000638-0x000000000000063f] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST RIP: 0010:mbt_kunit_exit+0x5e/0x3e0 [ext4_test] Call Trace: <TASK> kunit_try_run_case_cleanup+0xbc/0x100 [kunit] kunit_generic_run_threadfn_adapter+0x89/0x100 [kunit] kthread+0x408/0x540 ret_from_fork+0xa76/0xdf0 ret_from_fork_asm+0x1a/0x30 If mbt_kunit_init() init testcase failed will lead to null-ptr-deref. So add test if 'sb' is inited success in mbt_kunit_exit(). Fixes: 7c9fa399a369 ("ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20260330133035.287842-6-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: fix possible null-ptr-deref in extents_kunit_exit()Ye Bin
There's issue as follows: KASAN: null-ptr-deref in range [0x00000000000002c0-0x00000000000002c7] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST RIP: 0010:extents_kunit_exit+0x2e/0xc0 [ext4_test] Call Trace: <TASK> kunit_try_run_case_cleanup+0xbc/0x100 [kunit] kunit_generic_run_threadfn_adapter+0x89/0x100 [kunit] kthread+0x408/0x540 ret_from_fork+0xa76/0xdf0 ret_from_fork_asm+0x1a/0x30 Above issue happens as extents_kunit_init() init testcase failed. So test if testcase is inited success. Fixes: cb1e0c1d1fad ("ext4: kunit tests for extent splitting and conversion") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/20260330133035.287842-5-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: fix the error handling process in extents_kunit_init).Ye Bin
The error processing in extents_kunit_init() is improper, causing resource leakage. Reconstruct the error handling process to prevent potential resource leaks Fixes: cb1e0c1d1fad ("ext4: kunit tests for extent splitting and conversion") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20260330133035.287842-4-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: call deactivate_super() in extents_kunit_exit()Ye Bin
Call deactivate_super() is called in extents_kunit_exit() to cleanup the file system resource. Fixes: cb1e0c1d1fad ("ext4: kunit tests for extent splitting and conversion") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20260330133035.287842-3-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: fix miss unlock 'sb->s_umount' in extents_kunit_init()Ye Bin
There's warning as follows when do ext4 kunit test: WARNING: kunit_try_catch/15923 still has locks held! 7.0.0-rc3-next-20260309-00028-g73f965a1bbb1-dirty #281 Tainted: G E N 1 lock held by kunit_try_catch/15923: #0: ffff888139f860e0 (&type->s_umount_key#70/1){+.+.}-{4:4}, at: alloc_super.constprop.0+0x172/0xa90 Call Trace: <TASK> dump_stack_lvl+0x180/0x1b0 debug_check_no_locks_held+0xc8/0xd0 do_exit+0x1502/0x2b20 kthread+0x3a9/0x540 ret_from_fork+0xa76/0xdf0 ret_from_fork_asm+0x1a/0x30 As sget() will return 'sb' which holds 's->s_umount' lock. However, "extents-test" miss unlock this lock. So unlock 's->s_umount' in the end of extents_kunit_init(). Fixes: cb1e0c1d1fad ("ext4: kunit tests for extent splitting and conversion") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20260330133035.287842-2-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: fix bounds check in check_xattrs() to prevent out-of-bounds accessDeepanshu Kartikey
The bounds check for the next xattr entry in check_xattrs() uses (void *)next >= end, which allows next to point within sizeof(u32) bytes of end. On the next loop iteration, IS_LAST_ENTRY() reads 4 bytes via *(__u32 *)(entry), which can overrun the valid xattr region. For example, if next lands at end - 1, the check passes since next < end, but IS_LAST_ENTRY() reads 4 bytes starting at end - 1, accessing 3 bytes beyond the valid region. Fix this by changing the check to (void *)next + sizeof(u32) > end, ensuring there is always enough space for the IS_LAST_ENTRY() read on the subsequent iteration. Fixes: 3478c83cf26b ("ext4: improve xattr consistency checking and error reporting") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20260224231429.31361-1-kartikey406@gmail.com/T/ [v1] Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Link: https://patch.msgid.link/20260328150038.349497-1-kartikey406@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: zero post-EOF partial block before appending writeZhang Yi
In cases of appending write beyond EOF, ext4_zero_partial_blocks() is called within ext4_*_write_end() to zero out the partial block beyond EOF. This prevents exposing stale data that might be written through mmap. However, supporting only the regular buffered write path is insufficient. It is also necessary to support the DAX path as well as the upcoming iomap buffered write path. Therefore, move this operation to ext4_write_checks(). In addition, this may introduce a race window in which a post-EOF buffered write can race with an mmap write after the old EOF block has been zeroed. As a result, the data in this block written by the buffer-write and the data written by the mmap-write may be mixed. However, this is safe because users should not rely on the result of the race condition. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260327102939.1095257-14-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: move pagecache_isize_extended() out of active handleZhang Yi
In ext4_alloc_file_blocks(), pagecache_isize_extended() is called under an active handle and may also hold folio lock if the block size is smaller than the folio size. This also breaks the "folio lock -> transaction start" lock ordering for the upcoming iomap buffered I/O path. Therefore, move pagecache_isize_extended() outside of an active handle. Additionally, it is unnecessary to update the file length during each iteration of the allocation loop. Instead, update the file length only to the position where the allocation is successful. Postpone updating the inode size until after the allocation loop completes or is interrupted due to an error. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260327102939.1095257-13-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: remove ctime/mtime update from ext4_alloc_file_blocks()Zhang Yi
The ctime and mtime update is already handled by file_modified() in ext4_fallocate(), the caller of ext4_alloc_file_blocks(). So remove the redundant calls to inode_set_ctime_current() and inode_set_mtime_to_ts() in ext4_alloc_file_blocks(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260327102939.1095257-12-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: unify SYNC mode checks in fallocate pathsZhang Yi
In the ext4 fallocate call chain, SYNC mode handling is inconsistent: some places check the inode state, while others check the open file descriptor state. Unify these checks by evaluating both conditions to ensure consistent behavior across all fallocate operations. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260327102939.1095257-11-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: ensure zeroed partial blocks are persisted in SYNC modeZhang Yi
In ext4_zero_range() and ext4_punch_hole(), when operating in SYNC mode and zeroing a partial block, only data=journal modes guarantee that the zeroed data is synchronously persisted after the operation completes. For data=ordered/writeback mode and non-journal modes, this guarantee is missing. Introduce a partial_zero parameter to explicitly trigger writeback for all scenarios where a partial block is zeroed, ensuring the zeroed data is durably persisted. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260327102939.1095257-10-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: move zero partial block range functions out of active handleZhang Yi
Move ext4_block_zero_eof() and ext4_zero_partial_blocks() calls out of the active handle context, making them independent operations, and also add return value checks. This is safe because it still ensures data is updated before metadata for data=ordered mode and data=journal mode because we still zero data and ordering data before modifying the metadata. This change is required for iomap infrastructure conversion because the iomap buffered I/O path does not use the same journal infrastructure for partial block zeroing. The lock ordering of folio lock and starting transactions is "folio lock -> transaction start", which is opposite of the current path. Therefore, zeroing partial blocks cannot be performed under the active handle. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260327102939.1095257-9-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: pass allocate range as loff_t to ext4_alloc_file_blocks()Zhang Yi
Change ext4_alloc_file_blocks() to accept offset and len in byte granularity instead of block granularity. This allows callers to pass byte offsets and lengths directly, and this prepares for moving the ext4_zero_partial_blocks() call from the while(len) loop for unaligned append writes, where it only needs to be invoked once before doing block allocation. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260327102939.1095257-8-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: remove handle parameters from zero partial block functionsZhang Yi
Only journal data mode requires an active journal handle when zeroing partial blocks. Stop passing handle_t *handle to ext4_zero_partial_blocks() and related functions, and make ext4_block_journalled_zero_range() start a handle independently. This change has no practical impact now because all callers invoke these functions within the context of an active handle. It prepares for moving ext4_block_zero_eof() out of an active handle in the next patch, which is a prerequisite for converting block zero range operations to iomap infrastructure. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260327102939.1095257-7-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: move ordered data handling out of ext4_block_do_zero_range()Zhang Yi
Remove the handle parameter from ext4_block_do_zero_range() and move the ordered data handling to ext4_block_zero_eof(). This is necessary for truncate up and append writes across a range extending beyond EOF. The ordered data must be committed before updating i_disksize to prevent exposing stale on-disk data from concurrent post-EOF mmap writes during previous folio writeback or in case of system crash during append writes. This is unnecessary for partial block hole punching because the entire punch operation does not provide atomicity guarantees and can already expose intermediate results in case of crash. Hole punching can only ever expose data that was there before the punch but missed zeroing during append / truncate could expose data that was not visible in the file before the operation. Since ordered data handling is no longer performed inside ext4_zero_partial_blocks(), ext4_punch_hole() no longer needs to attach jinode. This is prepared for the conversion to the iomap infrastructure, which does not use ordered data mode while zeroing post-EOF partial blocks. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260327102939.1095257-6-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: rename ext4_block_zero_page_range() to ext4_block_zero_range()Zhang Yi
Rename ext4_block_zero_page_range() to ext4_block_zero_range() since the "page" naming is no longer appropriate for current context. Also change its signature to take an inode pointer instead of an address_space. This aligns with the caller ext4_block_zero_eof() and ext4_zero_partial_blocks(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260327102939.1095257-5-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: factor out journalled block zeroing rangeZhang Yi
Refactor __ext4_block_zero_page_range() by separating the block zeroing operations for ordered data mode and journal data mode into two distinct functions: - ext4_block_do_zero_range(): handles non-journal data mode with ordered data support - ext4_block_journalled_zero_range(): handles journal data mode Also extract a common helper, ext4_load_tail_bh(), to handle buffer head and folio retrieval, along with the associated error handling. This prepares for converting the partial block zero range to the iomap infrastructure. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260327102939.1095257-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: rename and extend ext4_block_truncate_page()Zhang Yi
Rename ext4_block_truncate_page() to ext4_block_zero_eof() and extend its signature to accept an explicit 'end' offset instead of calculating the block boundary. This helper function now can replace all cases requiring zeroing of the partial EOF block, including the append buffered write paths in ext4_*_write_end(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260327102939.1095257-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: add did_zero output parameter to ext4_block_zero_page_range()Zhang Yi
Add a bool *did_zero output parameter to ext4_block_zero_page_range() and __ext4_block_zero_page_range(). The parameter reports whether a partial block was zeroed out, which is needed for the upcoming iomap buffered I/O conversion. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260327102939.1095257-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: fix diagnostic printf formatsDavid Laight
The formats for non-terminated names should be "%.*s" not "%*.s". The kernel currently treats "%*.s" as equivalent to "%*s" whereas userspace requires it be equivalent to "%*.0s". Neither is correct here. Signed-off-by: David Laight <david.laight.linux@gmail.com> Link: https://patch.msgid.link/20260326201804.3881-1-david.laight.linux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: move dcache manipulation out of __ext4_link()NeilBrown
__ext4_link() has two callers. - ext4_link() calls it during normal handling of the link() system call or similar - ext4_fc_replay_link_internal() calls it when replaying the journal at mount time. The former needs changes to dcache - instantiating the dentry to the inode on success. The latter doesn't need or want any dcache manipulation. So move the manipulation out of __ext4_link() and do it in ext4_link() only. This requires: - passing the qname from the dentry explicitly to __ext4_link. The parent dir is already passed. The dentry is still passed in the ext4_link() case purely for use by ext4_fc_track_link(). - passing the inode separately to ext4_fc_track_link() as the dentry will not be instantiated yet. - using __ext4_add_entry() in ext4_link, which doesn't need a dentry. - moving ihold(), d_instantiate(), drop_nlink() and iput() calls out of __ext4_link() into ext4_link(). Note that ext4_inc_count() and drop_nlink() remain in __ext4_link() as both callers need them and they are not related to the dentry. This substantially simplifies ext4_fc_replay_link_internal(), and removes a use of d_alloc() which, it is planned, will be removed. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20260320000838.3797494-4-neilb@ownmail.net Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: add ext4_fc_eligible()NeilBrown
Testing EXT4_MF_FC_INELIGIBLE is almost always combined with testing ext4_fc_disabled(). The code can be simplified by combining these two in a new ext4_fc_eligible(). In ext4_fc_track_inode() this moves the ext4_fc_disabled() test after ext4_fc_mark_ineligible(), but as that is a non-op when ext4_fc_disabled() is true, this is no no consequence. Note that it is important to still call ext4_fc_mark_ineligible() in ext4_fc_track_inode() even when ext4_fc_eligible() would return true. ext4_fc_mark_ineligible() does not ONLY set the "INELIGIBLE" flag but also updates ->s_fc_ineligible_tid to make sure that the flag remains set until all ineligible transactions have been committed. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20260320000838.3797494-3-neilb@ownmail.net Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: split __ext4_add_entry() out of ext4_add_entry()NeilBrown
__ext4_add_entry() is not given a dentry - just inodes and name. This will help the next patch which simplifies __ex4_link(). Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20260320000838.3797494-2-neilb@ownmail.net Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: prefer IS_ERR_OR_NULL over manual NULL checkPhilipp Hahn
Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL check. Change generated with coccinelle. To: "Theodore Ts'o" <tytso@mit.edu> To: Andreas Dilger <adilger.kernel@dilger.ca> Cc: linux-ext4@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Philipp Hahn <phahn-oss@avm.de> Link: https://patch.msgid.link/20260310-b4-is_err_or_null-v1-4-bd63b656022d@avm.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-10affs: bound hash_pos before table lookup in affs_readdirHyungjung Joo
affs_readdir() decodes ctx->pos into hash_pos and chain_pos and then dereferences AFFS_HEAD(dir_bh)->table[hash_pos] before validating that hash_pos is within the runtime table bound. Treat out-of-range positions as end-of-directory before the first table lookup. Signed-off-by: Hyungjung Joo <jhj140711@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.0-rc8). Conflicts: net/ipv6/seg6_iptunnel.c c3812651b522f ("seg6: separate dst_cache for input and output paths in seg6 lwtunnel") 78723a62b969a ("seg6: add per-route tunnel source address") https://lore.kernel.org/adZhwtOYfo-0ImSa@sirena.org.uk net/ipv4/icmp.c fde29fd934932 ("ipv4: icmp: fix null-ptr-deref in icmp_build_probe()") d98adfbdd5c01 ("ipv4: drop ipv6_stub usage and use direct function calls") https://lore.kernel.org/adO3dccqnr6j-BL9@sirena.org.uk Adjacent changes: drivers/net/ethernet/stmicro/stmmac/chain_mode.c 51f4e090b9f8 ("net: stmmac: fix integer underflow in chain mode") 6b4286e05508 ("net: stmmac: rename STMMAC_GET_ENTRY() -> STMMAC_NEXT_ENTRY()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-10erofs: fix unsigned underflow in z_erofs_lz4_handle_overlap()Junrui Luo
Some crafted images can have illegal (!partial_decoding && m_llen < m_plen) extents, and the LZ4 inplace decompression path can be wrongly hit, but it cannot handle (outpages < inpages) properly: "outpages - inpages" wraps to a large value and the subsequent rq->out[] access reads past the decompressed_pages array. However, such crafted cases can correctly result in a corruption report in the normal LZ4 non-inplace path. Let's add an additional check to fix this for backporting. Reproducible image (base64-encoded gzipped blob): H4sIAJGR12kCA+3SPUoDQRgG4MkmkkZk8QRbRFIIi9hbpEjrHQI5ghfwCN5BLCzTGtLbBI+g dilSJo1CnIm7GEXFxhT6PDDwfrs73/ywIQD/1ePD4r7Ou6ETsrq4mu7XcWfj++Pb58nJU/9i PNtbjhan04/9GtX4qVYc814WDqt6FaX5s+ZwXXeq52lndT6IuVvlblytLMvh4Gzwaf90nsvz 2DF/21+20T/ldgp5s1jXRaN4t/8izsy/OUB6e/Qa79r+JwAAAAAAAL52vQVuGQAAAP6+my1w ywAAAAAAAADwu14ATsEYtgBQAAA= $ mount -t erofs -o cache_strategy=disabled foo.erofs /mnt $ dd if=/mnt/data of=/dev/null bs=4096 count=1 Fixes: 598162d05080 ("erofs: support decompress big pcluster for lz4 backend") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-04-09jbd2: store jinode dirty range in PAGE_SIZE unitsLi Chen
jbd2_inode fields are updated under journal->j_list_lock, but some paths read them without holding the lock (e.g. fast commit helpers and ordered truncate helpers). READ_ONCE() alone is not sufficient for the dirty range fields when they are stored as loff_t because 32-bit platforms can observe torn loads. Store the dirty range in PAGE_SIZE units as pgoff_t instead. Represent the dirty range end as an exclusive end page. This avoids a special sentinel value and keeps MAX_LFS_FILESIZE on 32-bit representable. Publish a new dirty range by updating end_page before start_page, and treat start_page >= end_page as empty in the accessor for robustness. Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for the dirty range and i_flags to match the existing lockless access pattern. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> Link: https://patch.msgid.link/20260306085643.465275-5-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ocfs2: use jbd2 jinode dirty range accessorLi Chen
ocfs2 journal commit callback reads jbd2_inode dirty range fields without holding journal->j_list_lock. Use jbd2_jinode_get_dirty_range() to get the range in bytes. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> Link: https://patch.msgid.link/20260306085643.465275-4-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: use jbd2 jinode dirty range accessorLi Chen
ext4 journal commit callbacks access jbd2_inode dirty range fields without holding journal->j_list_lock. Use jbd2_jinode_get_dirty_range() to get the range in bytes, and read i_transaction with READ_ONCE() in the redirty check. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> Link: https://patch.msgid.link/20260306085643.465275-3-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09jbd2: gracefully abort on transaction state corruptionsMilos Nikic
Auditing the jbd2 codebase reveals several legacy J_ASSERT calls that enforce internal state machine invariants (e.g., verifying jh->b_transaction or jh->b_next_transaction pointers). When these invariants are broken, the journal is in a corrupted state. However, triggering a fatal panic brings down the entire system for a localized filesystem error. This patch targets a specific class of these asserts: those residing inside functions that natively return integer error codes, booleans, or error pointers. It replaces the hard J_ASSERTs with WARN_ON_ONCE to capture the offending stack trace, safely drops any held locks, gracefully aborts the journal, and returns -EINVAL. This prevents a catastrophic kernel panic while ensuring the corrupted journal state is safely contained and upstream callers (like ext4 or ocfs2) can gracefully handle the aborted handle. Functions modified in fs/jbd2/transaction.c: - jbd2__journal_start() - do_get_write_access() - jbd2_journal_dirty_metadata() - jbd2_journal_forget() - jbd2_journal_try_to_free_buffers() - jbd2_journal_file_inode() Signed-off-by: Milos Nikic <nikic.milos@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20260304172016.23525-3-nikic.milos@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09jbd2: gracefully abort instead of panicking on unlocked bufferMilos Nikic
In jbd2_journal_get_create_access(), if the caller passes an unlocked buffer, the code currently triggers a fatal J_ASSERT. While an unlocked buffer here is a clear API violation and a bug in the caller, crashing the entire system is an overly severe response. It brings down the whole machine for a localized filesystem inconsistency. Replace the J_ASSERT with a WARN_ON_ONCE to capture the offending caller's stack trace, and return an error (-EINVAL). This allows the journal to gracefully abort the transaction, protecting data integrity without causing a kernel panic. Signed-off-by: Milos Nikic <nikic.milos@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20260304172016.23525-2-nikic.milos@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: simplify mballoc preallocation size rounding for small filesWeixie Cui
The if-else ladder in ext4_mb_normalize_request() manually rounds up the preallocation size to the next power of two for files up to 1MB, enumerating each step from 16KB to 1MB individually. Replace this with a single roundup_pow_of_two() call clamped to a 16KB minimum, which is functionally equivalent but much more concise. Also replace raw byte constants with SZ_1M and SZ_16K from <linux/sizes.h> for clarity, and remove the stale "XXX: should this table be tunable?" comment that has been there since the original mballoc code. No functional change. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Weixie Cui <cuiweixie@gmail.com> Link: https://patch.msgid.link/tencent_E9C5F1B2E9939B3037501FD04A7E9CF0C407@qq.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4/move_extent: use folio_next_pos()Julia Lawall
A series of patches such as commit 60a70e61430b ("mm: Use folio_next_pos()") replace folio_pos() + folio_size() by folio_next_pos(). The former performs x << z + y << z while the latter performs (x + y) << z, which is slightly more efficient. This case was not taken into account, perhaps because the argument is not named folio. The change was performed using the following Coccinelle semantic patch: @@ expression folio; @@ - folio_pos(folio) + folio_size(folio) + folio_next_pos(folio) Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260222125049.1309075-1-Julia.Lawall@inria.fr Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: remove tl argument from ext4_fc_replay_{add,del}_rangeGuoqing Jiang
Since commit a7ba36bc94f2 ("ext4: fix fast commit alignment issues"), both ext4_fc_replay_add_range and ext4_fc_replay_del_range get ex based on 'val' instead of 'tl'. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260121063805.19863-1-guoqing.jiang@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: remove unused i_fc_waitLi Chen
i_fc_wait is only initialized in ext4_fc_init_inode() and never used for waiting or wakeups. Drop it. Signed-off-by: Li Chen <me@linux.beauty> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260120121941.144192-1-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09ext4: unmap invalidated folios from page tables in mpage_release_unused_pages()Deepanshu Kartikey
When delayed block allocation fails (e.g., due to filesystem corruption detected in ext4_map_blocks()), the writeback error handler calls mpage_release_unused_pages(invalidate=true) which invalidates affected folios by clearing their uptodate flag via folio_clear_uptodate(). However, these folios may still be mapped in process page tables. If a subsequent operation (such as ftruncate calling ext4_block_truncate_page) triggers a write fault, the existing page table entry allows access to the now-invalidated folio. This leads to ext4_page_mkwrite() being called with a non-uptodate folio, which then gets marked dirty, triggering: WARNING: CPU: 0 PID: 5 at mm/page-writeback.c:2960 __folio_mark_dirty+0x578/0x880 Call Trace: fault_dirty_shared_page+0x16e/0x2d0 do_wp_page+0x38b/0xd20 handle_pte_fault+0x1da/0x450 The sequence leading to this warning is: 1. Process writes to mmap'd file, folio becomes uptodate and dirty 2. Writeback begins, but delayed allocation fails due to corruption 3. mpage_release_unused_pages(invalidate=true) is called: - block_invalidate_folio() clears dirty flag - folio_clear_uptodate() clears uptodate flag - But folio remains mapped in page tables 4. Later, ftruncate triggers ext4_block_truncate_page() 5. This causes a write fault on the still-mapped folio 6. ext4_page_mkwrite() is called with folio that is !uptodate 7. block_page_mkwrite() marks buffers dirty 8. fault_dirty_shared_page() tries to mark folio dirty 9. block_dirty_folio() calls __folio_mark_dirty(warn=1) 10. WARNING triggers: WARN_ON_ONCE(warn && !uptodate && !dirty) Fix this by unmapping folios from page tables before invalidating them using unmap_mapping_pages(). This ensures that subsequent accesses trigger new page faults rather than reusing invalidated folios through stale page table entries. Note that this results in data loss for any writes to the mmap'd region that couldn't be written back, but this is expected behavior when writeback fails due to filesystem corruption. The existing error message already states "This should not happen!! Data will be lost". Reported-by: syzbot+b0a0670332b6b3230a0a@syzkaller.appspotmail.com Tested-by: syzbot+b0a0670332b6b3230a0a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b0a0670332b6b3230a0a Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Link: https://patch.msgid.link/20251205055914.1393799-1-kartikey406@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09kernfs: make directory seek namespace-awareChristian Brauner
The rbtree backing kernfs directories is ordered by (hash, ns_id, name) but kernfs_dir_pos() only searches by hash when seeking to a position during readdir. When two nodes from different namespaces share the same hash value, the binary search can land on a node in the wrong namespace. The subsequent skip-forward loop walks rb_next() and may overshoot the correct node, silently dropping an entry from the readdir results. With the recent switch from raw namespace pointers to public namespace ids as hash seeds, computing hash collisions became an offline operation. An unprivileged user could unshare into a new network namespace, create a single interface whose name-hash collides with a target entry in init_net, and cause a victim's seekdir/readdir on /sys/class/net to miss that entry. Fix this by extending the rbtree search in kernfs_dir_pos() to also compare namespace ids when hashes match. Since the rbtree is already ordered by (hash, ns_id, name), this makes the seek land directly in the correct namespace's range, eliminating the wrong-namespace overshoot. Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-04-09kernfs: use namespace id instead of pointer for hashing and comparisonChristian Brauner
kernfs uses the namespace tag as both a hash seed (via init_name_hash()) and a comparison key in the rbtree. The resulting hash values are exposed to userspace through directory seek positions (ctx->pos), and the raw pointer comparisons in kernfs_name_compare() encode kernel pointer ordering into the rbtree layout. This constitutes a KASLR information leak since the hash and ordering derived from kernel pointers can be observed from userspace. Fix this by using the 64-bit namespace id (ns_common::ns_id) instead of the raw pointer value for both hashing and comparison. The namespace id is a stable, non-secret identifier that is already exposed to userspace through other interfaces (e.g., /proc/pid/ns/, ioctl NS_GET_NSID). Introduce kernfs_ns_id() as a helper that extracts the namespace id from a potentially-NULL ns_common pointer, returning 0 for the no-namespace case. All namespace equality checks in the directory iteration and dentry revalidation paths are also switched from pointer comparison to ns_id comparison for consistency. Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-04-09kernfs: pass struct ns_common instead of const void * for namespace tagsChristian Brauner
kernfs has historically used const void * to pass around namespace tags used for directory-level namespace filtering. The only current user of this is sysfs network namespace tagging where struct net pointers are cast to void *. Replace all const void * namespace parameters with const struct ns_common * throughout the kernfs, sysfs, and kobject namespace layers. This includes the kobj_ns_type_operations callbacks, kobject_namespace(), and all sysfs/kernfs APIs that accept or return namespace tags. Passing struct ns_common is needed because various codepaths require access to the underlying namespace. A struct ns_common can always be converted back to the concrete namespace type (e.g., struct net) via container_of() or to_ns_common() in the reverse direction. This is a preparatory change for switching to ns_id-based directory iteration to prevent a KASLR pointer leak through the current use of raw namespace pointers as hash seeds and comparison keys. Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-04-09fs/adfs: validate nzones in adfs_validate_bblk()Bae Yeonju
Reject ADFS disc records with a zero zone count during boot block validation, before the disc record is used. When nzones is 0, adfs_read_map() passes it to kmalloc_array(0, ...) which returns ZERO_SIZE_PTR, and adfs_map_layout() then writes to dm[-1], causing an out-of-bounds write before the allocated buffer. adfs_validate_dr0() already rejects nzones != 1 for old-format images. Add the equivalent check to adfs_validate_bblk() for new-format images so that a crafted image with nzones == 0 is rejected at probe time. Found by syzkaller. Fixes: f6f14a0d71b0 ("fs/adfs: map: move map-specific sb initialisation to map.c") Signed-off-by: Bae Yeonju <iwasbaeyz@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2026-04-08ecryptfs: keep the lower iattr contained in truncate_upperChristoph Hellwig
Currently the two callers of truncate_upper handle passing information very differently. ecryptfs_truncate passes a zeroed lower_ia and expects truncate_upper to fill it in from the upper ia created just for that, while ecryptfs_setattr passes a fully initialized lower_ia copied from the upper one. Both of them then call notify_change on the lower_ia. Switch to only passing the upper ia, and derive the lower ia from it inside truncate_upper, and call notify_change inside the function itself. Because the old name is misleading now, rename the resulting function to __ecryptfs_truncate as it deals with both the lower and upper inodes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tyler Hicks <code@tyhicks.com>