summaryrefslogtreecommitdiff
path: root/fs/f2fs
AgeCommit message (Collapse)Author
2026-04-21Merge tag 'f2fs-for-7.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, the changes primarily focus on resolving race conditions, memory safety issues (UAF), and improving the robustness of garbage collection (GC), and folio management. Enhancements: - add page-order information for large folio reads in iostat - add defrag_blocks sysfs node Bug fixes: - fix uninitialized kobject put in f2fs_init_sysfs() - disallow setting an extension to both cold and hot - fix node_cnt race between extent node destroy and writeback - preserve previous reserve_{blocks,node} value when remount - freeze GC and discard threads quickly - fix false alarm of lockdep on cp_global_sem lock - fix data loss caused by incorrect use of nat_entry flag - skip empty sections in f2fs_get_victim - fix inline data not being written to disk in writeback path - fix fsck inconsistency caused by FGGC of node block - fix fsck inconsistency caused by incorrect nat_entry flag usage - call f2fs_handle_critical_error() to set cp_error flag - fix fiemap boundary handling when read extent cache is incomplete - fix use-after-free of sbi in f2fs_compress_write_end_io() - fix UAF caused by decrementing sbi->nr_pages[] in f2fs_write_end_io() - fix incorrect file address mapping when inline inode is unwritten - fix incomplete search range in f2fs_get_victim when f2fs_need_rand_seg is enabled - avoid memory leak in f2fs_rename()" * tag 'f2fs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (35 commits) f2fs: add page-order information for large folio reads in iostat f2fs: do not support mmap write for large folio f2fs: fix uninitialized kobject put in f2fs_init_sysfs() f2fs: protect extension_list reading with sb_lock in f2fs_sbi_show() f2fs: disallow setting an extension to both cold and hot f2fs: fix node_cnt race between extent node destroy and writeback f2fs: allow empty mount string for Opt_usr|grp|projjquota f2fs: fix to preserve previous reserve_{blocks,node} value when remount f2fs: invalidate block device page cache on umount f2fs: fix to freeze GC and discard threads quickly f2fs: fix to avoid uninit-value access in f2fs_sanity_check_node_footer f2fs: fix false alarm of lockdep on cp_global_sem lock f2fs: fix data loss caused by incorrect use of nat_entry flag f2fs: fix to skip empty sections in f2fs_get_victim f2fs: fix inline data not being written to disk in writeback path f2fs: fix fsck inconsistency caused by FGGC of node block f2fs: fix fsck inconsistency caused by incorrect nat_entry flag usage f2fs: fix to do sanity check on dcc->discard_cmd_cnt conditionally f2fs: refactor node footer flag setting related code f2fs: refactor f2fs_move_node_folio function ...
2026-04-18f2fs: add page-order information for large folio reads in iostatDaniel Lee
Track read folio counts by order in F2FS iostat sysfs and tracepoints. Signed-off-by: Daniel Lee <chullee@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-15Merge tag 'mm-stable-2026-04-13-21-45' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
2026-04-15f2fs: do not support mmap write for large folioJaegeuk Kim
Let's check mmap writes onto the large folio, since we don't support writing large folios. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-13Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds
Pull fscrypt updates from Eric Biggers: - Various cleanups for the interface between fs/crypto/ and filesystems, from Christoph Hellwig - Simplify and optimize the implementation of v1 key derivation by using the AES library instead of the crypto_skcipher API * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: use AES library for v1 key derivation ext4: use a byte granularity cursor in ext4_mpage_readpages fscrypt: pass a real sector_t to fscrypt_zeroout_range fscrypt: pass a byte length to fscrypt_zeroout_range fscrypt: pass a byte offset to fscrypt_zeroout_range fscrypt: pass a byte length to fscrypt_zeroout_range_inline_crypt fscrypt: pass a byte offset to fscrypt_zeroout_range_inline_crypt fscrypt: pass a byte offset to fscrypt_set_bio_crypt_ctx fscrypt: pass a byte offset to fscrypt_mergeable_bio fscrypt: pass a byte offset to fscrypt_generate_dun fscrypt: move fscrypt_set_bio_crypt_ctx_bh to buffer.c ext4, fscrypt: merge fscrypt_mergeable_bio_bh into io_submit_need_new_bio ext4: factor out a io_submit_need_new_bio helper ext4: open code fscrypt_set_bio_crypt_ctx_bh ext4: initialize the write hint in io_submit_init_bio
2026-04-13f2fs: fix uninitialized kobject put in f2fs_init_sysfs()Guangshuo Li
In f2fs_init_sysfs(), all failure paths after kset_register() jump to put_kobject, which unconditionally releases both f2fs_tune and f2fs_feat. If kobject_init_and_add(&f2fs_feat, ...) fails, f2fs_tune has not been initialized yet, so calling kobject_put(&f2fs_tune) is invalid. Fix this by splitting the unwind path so each error path only releases objects that were successfully initialized. Fixes: a907f3a68ee26ba4 ("f2fs: add a sysfs entry to reclaim POSIX_FADV_NOREUSE pages") Cc: stable@vger.kernel.org Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-13f2fs: protect extension_list reading with sb_lock in f2fs_sbi_show()Yongpeng Yang
In f2fs_sbi_show(), the extension_list, extension_count and hot_ext_count are read without holding sbi->sb_lock. If a concurrent sysfs store modifies the extension list via f2fs_update_extension_list(), the show path may read inconsistent count and array contents, potentially leading to out-of-bounds access or displaying stale data. Fix this by holding sb_lock around the entire extension list read and format operation. Fixes: b6a06cbbb5f7 ("f2fs: support hot file extension") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-13f2fs: disallow setting an extension to both cold and hotYongpeng Yang
An extension should not exist in both the cold and hot extension lists simultaneously. When adding a hot extension, check whether it already exists in the cold list, and vice versa. Reject the operation with -EINVAL if a conflict is found. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-13f2fs: fix node_cnt race between extent node destroy and writebackYongpeng Yang
f2fs_destroy_extent_node() does not set FI_NO_EXTENT before clearing extent nodes. When called from f2fs_drop_inode() with I_SYNC set, concurrent kworker writeback can insert new extent nodes into the same extent tree, racing with the destroy and triggering f2fs_bug_on() in __destroy_extent_node(). The scenario is as follows: drop inode writeback - iput - f2fs_drop_inode // I_SYNC set - f2fs_destroy_extent_node - __destroy_extent_node - while (node_cnt) { write_lock(&et->lock) __free_extent_tree write_unlock(&et->lock) - __writeback_single_inode - f2fs_outplace_write_data - f2fs_update_read_extent_cache - __update_extent_tree_range // FI_NO_EXTENT not set, // insert new extent node } // node_cnt == 0, exit while - f2fs_bug_on(node_cnt) // node_cnt > 0 Additionally, __update_extent_tree_range() only checks FI_NO_EXTENT for EX_READ type, leaving EX_BLOCK_AGE updates completely unprotected. This patch set FI_NO_EXTENT under et->lock in __destroy_extent_node(), consistent with other callers (__update_extent_tree_range and __drop_extent_tree) and check FI_NO_EXTENT for both EX_READ and EX_BLOCK_AGE tree. Fixes: 3fc5d5a182f6 ("f2fs: fix to shrink read extent node in batches") Cc: stable@vger.kernel.org Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-13f2fs: allow empty mount string for Opt_usr|grp|projjquotaJaegeuk Kim
The fsparam_string_empty() gives an error when mounting without string, since its type is set to fsparam_flag in VFS. So, let's allow the flag as well. This addresses xfstests/f2fs/015 and f2fs/021. Fixes: d18535132523 ("f2fs: separate the options parsing and options checking") Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-13Merge tag 'vfs-7.1-rc1.kino' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs i_ino updates from Christian Brauner: "For historical reasons, the inode->i_ino field is an unsigned long, which means that it's 32 bits on 32 bit architectures. This has caused a number of filesystems to implement hacks to hash a 64-bit identifier into a 32-bit field, and deprives us of a universal identifier field for an inode. This changes the inode->i_ino field from an unsigned long to a u64. This shouldn't make any material difference on 64-bit hosts, but 32-bit hosts will see struct inode grow by at least 4 bytes. This could have effects on slabcache sizes and field alignment. The bulk of the changes are to format strings and tracepoints, since the kernel itself doesn't care that much about the i_ino field. The first patch changes some vfs function arguments, so check that one out carefully. With this change, we may be able to shrink some inode structures. For instance, struct nfs_inode has a fileid field that holds the 64-bit inode number. With this set of changes, that field could be eliminated. I'd rather leave that sort of cleanups for later just to keep this simple" * tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group() EVM: add comment describing why ino field is still unsigned long vfs: remove externs from fs.h on functions modified by i_ino widening treewide: fix missed i_ino format specifier conversions ext4: fix signed format specifier in ext4_load_inode trace event treewide: change inode->i_ino from unsigned long to u64 nilfs2: widen trace event i_ino fields to u64 f2fs: widen trace event i_ino fields to u64 ext4: widen trace event i_ino fields to u64 zonefs: widen trace event i_ino fields to u64 hugetlbfs: widen trace event i_ino fields to u64 ext2: widen trace event i_ino fields to u64 cachefiles: widen trace event i_ino fields to u64 vfs: widen trace event i_ino fields to u64 net: change sock.sk_ino and sock_i_ino() to u64 audit: widen ino fields to u64 vfs: widen inode hash/lookup functions to u64
2026-04-13Merge tag 'vfs-7.1-rc1.writeback' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs writeback updates from Christian Brauner: "This introduces writeback helper APIs and converts f2fs, gfs2 and nfs to stop accessing writeback internals directly" * tag 'vfs-7.1-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nfs: stop using writeback internals for WB_WRITEBACK accounting gfs2: stop using writeback internals for dirty_exceeded check f2fs: stop using writeback internals for dirty_exceeded checks writeback: prep helpers for dirty-limit and writeback accounting
2026-04-05folio_batch: rename pagevec.h to folio_batch.hTal Zussman
struct pagevec was removed in commit 1e0877d58b1e ("mm: remove struct pagevec"). Rename include/linux/pagevec.h to reflect reality and update includes tree-wide. Add the new filename to MAINTAINERS explicitly, as it no longer matches the "include/linux/page[-_]*" pattern in MEMORY MANAGEMENT - CORE. Link: https://lkml.kernel.org/r/20260225-pagevec_cleanup-v2-3-716868cc2d11@columbia.edu Signed-off-by: Tal Zussman <tz2294@columbia.edu> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05fs: remove unncessary pagevec.h includesTal Zussman
Remove unused pagevec.h includes from .c files. These were found with the following command: grep -rl '#include.*pagevec\.h' --include='*.c' | while read f; do grep -qE 'PAGEVEC_SIZE|folio_batch' "$f" || echo "$f" done There are probably more removal candidates in .h files, but those are more complex to analyze. Link: https://lkml.kernel.org/r/20260225-pagevec_cleanup-v2-2-716868cc2d11@columbia.edu Signed-off-by: Tal Zussman <tz2294@columbia.edu> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Chris Li <chrisl@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: remove stray references to struct pagevecTal Zussman
Patch series "mm: Remove stray references to pagevec", v2. struct pagevec was removed in commit 1e0877d58b1e ("mm: remove struct pagevec"). Remove any stray references to it and rename relevant files and macros accordingly. While at it, remove unnecessary #includes of pagevec.h (now folio_batch.h) in .c files. There are probably more of these that could be removed in .h files, but those are more complex to verify. This patch (of 4): struct pagevec was removed in commit 1e0877d58b1e ("mm: remove struct pagevec"). Remove remaining forward declarations and change __folio_batch_release()'s declaration to match its definition. Link: https://lkml.kernel.org/r/20260225-pagevec_cleanup-v2-0-716868cc2d11@columbia.edu Link: https://lkml.kernel.org/r/20260225-pagevec_cleanup-v2-1-716868cc2d11@columbia.edu Signed-off-by: Tal Zussman <tz2294@columbia.edu> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Chris Li <chrisl@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-02f2fs: fix to preserve previous reserve_{blocks,node} value when remountZhiguo Niu
The following steps will change previous value of reserve_{blocks,node}, this dones not match the original intention. 1.mount -t f2fs -o reserve_root=8192 imgfile test_mount/ F2FS-fs (loop56): Mounted with checkpoint version = 1b69f8c7 mount info: /dev/block/loop56 on /data/test_mount type f2fs (xxx,reserve_root=8192,reserve_node=0,resuid=0,resgid=0,xxx) 2.mount -t f2fs -o remount,reserve_root=4096 /data/test_mount F2FS-fs (loop56): Preserve previous reserve_root=8192 check mount info: reserve_root change to 4096 /dev/block/loop56 on /data/test_mount type f2fs (xxx,reserve_root=4096,reserve_node=0,resuid=0,resgid=0,xxx) Prior to commit d18535132523 ("f2fs: separate the options parsing and options checking"), the value of reserve_{blocks,node} was only set during the first mount, along with the corresponding mount option F2FS_MOUNT_RESERVE_{ROOT,NODE} . If the mount option F2FS_MOUNT_RESERVE_{ROOT,NODE} was found to have been set during the mount/remount, the previously value of reserve_{blocks,node} would also be preserved, as shown in the code below. if (test_opt(sbi, RESERVE_ROOT)) { f2fs_info(sbi, "Preserve previous reserve_root=%u", F2FS_OPTION(sbi).root_reserved_blocks); } else { F2FS_OPTION(sbi).root_reserved_blocks = arg; set_opt(sbi, RESERVE_ROOT); } But commit d18535132523 ("f2fs: separate the options parsing and options checking") only preserved the previous mount option; it did not preserve the previous value of reserve_{blocks,node}. Since value of reserve_{blocks,node} value is assigned or not depends on ctx->spec_mask, ctx->spec_mask should be alos handled in f2fs_check_opt_consistency. This patch will clear the corresponding ctx->spec_mask bits in f2fs_check_opt_consistency to preserve the previously values of reserve_{blocks,node} if it already have a value. Fixes: d18535132523 ("f2fs: separate the options parsing and options checking") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-02f2fs: invalidate block device page cache on umountYongpeng Yang
Neither F2FS nor VFS invalidates the block device page cache, which results in reading stale metadata. An example scenario is shown below: Terminal A Terminal B mount /dev/vdb /mnt/f2fs touch mx // ino = 4 sync dump.f2fs -i 4 /dev/vdb// block on "[Y/N]" touch mx2 // ino = 5 sync umount /mnt/f2fs dump.f2fs -i 5 /dev/vdb // block addr is 0 After umount, the block device page cache is not purged, causing `dump.f2fs -i 5 /dev/vdb` to read stale metadata and see inode 5 with block address 0. Btrfs has encountered a similar issue before, the solution there was to call sync_blockdev() and invalidate_bdev() when the device is closed: mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html For the root user, the f2fs kernel calls sync_blockdev() on umount to flush all cached data to disk, and f2fs-tools can release the page cache by issuing ioctl(fd, BLKFLSBUF) when accessing the device. However, non-root users are not permitted to drop the page cache, and may still observe stale data. This patch calls sync_blockdev() and invalidate_bdev() during umount to invalidate the block device page cache, thereby preventing stale metadata from being read. Note that this may result in an extra sync_blockdev() call on the first device, in both f2fs_put_super() and kill_block_super(). The second call do nothing, as there are no dirty pages left to flush. It ensures that non-root users do not observe stale data. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-02f2fs: fix to freeze GC and discard threads quicklyDaeho Jeong
Suspend can fail if kernel threads do not freeze for a while. f2fs_gc and f2fs_discard threads can perform long-running operations that prevent them from reaching a freeze point in a timely manner. This patch adds explicit freezing checks in the following locations: 1. f2fs_gc: Added a check at the 'retry' label to exit the loop quickly if freezing is requested, especially during heavy GC rounds. 2. __issue_discard_cmd: Added a 'suspended' flag to break both inner and outer loops during discard command issuance if freezing is detected after at least one command has been issued. 3. __issue_discard_cmd_orderly: Added a similar check for orderly discard to ensure responsiveness. These checks ensure that the threads release locks safely and enter the frozen state. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-02f2fs: fix to avoid uninit-value access in f2fs_sanity_check_node_footerChao Yu
syzbot reported a f2fs bug as below: BUG: KMSAN: uninit-value in f2fs_sanity_check_node_footer+0x374/0xa20 fs/f2fs/node.c:1520 f2fs_sanity_check_node_footer+0x374/0xa20 fs/f2fs/node.c:1520 f2fs_finish_read_bio+0xe1e/0x1d60 fs/f2fs/data.c:177 f2fs_read_end_io+0x6ab/0x2220 fs/f2fs/data.c:-1 bio_endio+0x1006/0x1160 block/bio.c:1792 submit_bio_noacct+0x533/0x2960 block/blk-core.c:891 submit_bio+0x57a/0x620 block/blk-core.c:926 blk_crypto_submit_bio include/linux/blk-crypto.h:203 [inline] f2fs_submit_read_bio+0x12c/0x360 fs/f2fs/data.c:557 f2fs_submit_page_bio+0xee2/0x1450 fs/f2fs/data.c:775 read_node_folio+0x384/0x4b0 fs/f2fs/node.c:1481 __get_node_folio+0x5db/0x15d0 fs/f2fs/node.c:1576 f2fs_get_inode_folio+0x40/0x50 fs/f2fs/node.c:1623 do_read_inode fs/f2fs/inode.c:425 [inline] f2fs_iget+0x1209/0x9380 fs/f2fs/inode.c:596 f2fs_fill_super+0x8f5a/0xb2e0 fs/f2fs/super.c:5184 get_tree_bdev_flags+0x6e6/0x920 fs/super.c:1694 get_tree_bdev+0x38/0x50 fs/super.c:1717 f2fs_get_tree+0x35/0x40 fs/f2fs/super.c:5436 vfs_get_tree+0xb3/0x5d0 fs/super.c:1754 fc_mount fs/namespace.c:1193 [inline] do_new_mount_fc fs/namespace.c:3763 [inline] do_new_mount+0x885/0x1dd0 fs/namespace.c:3839 path_mount+0x7a2/0x20b0 fs/namespace.c:4159 do_mount fs/namespace.c:4172 [inline] __do_sys_mount fs/namespace.c:4361 [inline] __se_sys_mount+0x704/0x7f0 fs/namespace.c:4338 __x64_sys_mount+0xe4/0x150 fs/namespace.c:4338 x64_sys_call+0x39f0/0x3ea0 arch/x86/include/generated/asm/syscalls_64.h:166 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x134/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is: in f2fs_finish_read_bio(), we may access uninit data in folio if we failed to read the data from device into folio, let's add a check condition to avoid such issue. Cc: stable@kernel.org Fixes: 50ac3ecd8e05 ("f2fs: fix to do sanity check on node footer in {read,write}_end_io") Reported-by: syzbot+9aac813cdc456cdd49f8@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/69a9ca26.a70a0220.305d9a.0000.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-02f2fs: fix false alarm of lockdep on cp_global_sem lockChao Yu
lockdep reported a potential deadlock: a) TCMU device removal context: - call del_gendisk() to get q->q_usage_counter - call start_flush_work() to get work_completion of wb->dwork b) f2fs writeback context: - in wb_workfn(), which holds work_completion of wb->dwork - call f2fs_balance_fs() to get sbi->gc_lock c) f2fs vfs_write context: - call f2fs_gc() to get sbi->gc_lock - call f2fs_write_checkpoint() to get sbi->cp_global_sem d) f2fs mount context: - call recover_fsync_data() to get sbi->cp_global_sem - call f2fs_check_and_fix_write_pointer() to call blkdev_report_zones() that goes down to blk_mq_alloc_request and get q->q_usage_counter Original callstack is in Closes tag. However, I think this is a false alarm due to before mount returns successfully (context d), we can not access file therein via vfs_write (context c). Let's introduce per-sb cp_global_sem_key, and assign the key for cp_global_sem, so that lockdep can recognize cp_global_sem from different super block correctly. A lot of work are done by Shin'ichiro Kawasaki, thanks a lot for the work. Fixes: c426d99127b1 ("f2fs: Check write pointer consistency of open zones") Cc: stable@kernel.org Reported-and-tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Closes: https://lore.kernel.org/linux-f2fs-devel/20260218125237.3340441-1-shinichiro.kawasaki@wdc.com Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-02f2fs: fix data loss caused by incorrect use of nat_entry flagYongpeng Yang
Data loss can occur when fsync is performed on a newly created file (before any checkpoint has been written) concurrently with a checkpoint operation. The scenario is as follows: create & write & fsync 'file A' write checkpoint - f2fs_do_sync_file // inline inode - f2fs_write_inode // inode folio is dirty - f2fs_write_checkpoint - f2fs_flush_merged_writes - f2fs_sync_node_pages - f2fs_flush_nat_entries - f2fs_fsync_node_pages // no dirty node - f2fs_need_inode_block_update // return false SPO and lost 'file A' f2fs_flush_nat_entries() sets the IS_CHECKPOINTED and HAS_LAST_FSYNC flags for the nat_entry, but this does not mean that the checkpoint has actually completed successfully. However, f2fs_need_inode_block_update() checks these flags and incorrectly assumes that the checkpoint has finished. The root cause is that the semantics of IS_CHECKPOINTED and HAS_LAST_FSYNC are only guaranteed after the checkpoint write fully completes. This patch modifies f2fs_need_inode_block_update() to acquire the sbi->node_write lock before reading the nat_entry flags, ensuring that once IS_CHECKPOINTED and HAS_LAST_FSYNC are observed to be set, the checkpoint operation has already completed. Fixes: e05df3b115e7 ("f2fs: add node operations") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-02f2fs: fix to skip empty sections in f2fs_get_victimDaeho Jeong
In age-based victim selection (ATGC, AT_SSR, or GC_CB), f2fs_get_victim can encounter sections with zero valid blocks. This situation often arises when checkpoint is disabled or due to race conditions between SIT updates and dirty list management. In such cases, f2fs_get_section_mtime() returns INVALID_MTIME, which subsequently triggers a fatal f2fs_bug_on(sbi, mtime == INVALID_MTIME) in add_victim_entry() or get_cb_cost(). This patch adds a check in f2fs_get_victim's selection loop to skip sections with no valid blocks. This prevents unnecessary age calculations for empty sections and avoids the associated kernel panic. This change also allows removing redundant checks in add_victim_entry(). Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-02f2fs: fix inline data not being written to disk in writeback pathYongpeng Yang
When f2fs_fiemap() is called with `fileinfo->fi_flags` containing the FIEMAP_FLAG_SYNC flag, it attempts to write data to disk before retrieving file mappings via filemap_write_and_wait(). However, there is an issue where the file does not get mapped as expected. The following scenario can occur: root@vm:/mnt/f2fs# dd if=/dev/zero of=data.3k bs=3k count=1 root@vm:/mnt/f2fs# xfs_io data.3k -c "fiemap -v 0 4096" data.3k: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..5]: 0..5 6 0x307 The root cause of this issue is that f2fs_write_single_data_page() only calls f2fs_write_inline_data() to copy data from the data folio to the inode folio, and it clears the dirty flag on the data folio. However, it does not mark the data folio as writeback. When __filemap_fdatawait_range() checks for folios with the writeback flag, it returns early, causing f2fs_fiemap() to report that the file has no mapping. To fix this issue, the solution is to call f2fs_write_single_node_folio() in f2fs_inline_data_fiemap() when getting fiemap with FIEMAP_FLAG_SYNC flags. This patch ensures that the inode folio is written back and the writeback process completes before proceeding. Cc: stable@kernel.org Fixes: 9ffe0fb5f3bb ("f2fs: handle inline data operations") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-02f2fs: fix fsck inconsistency caused by FGGC of node blockYongpeng Yang
During FGGC node block migration, fsck may incorrectly treat the migrated node block as fsync-written data. The reproduction scenario: root@vm:/mnt/f2fs# seq 1 2048 | xargs -n 1 ./test_sync // write inline inode and sync root@vm:/mnt/f2fs# rm -f 1 root@vm:/mnt/f2fs# sync root@vm:/mnt/f2fs# f2fs_io gc_range // move data block in sync mode and not write CP SPO, "fsck --dry-run" find inode has already checkpointed but still with DENT_BIT_SHIFT set The root cause is that GC does not clear the dentry mark and fsync mark during node block migration, leading fsck to misinterpret them as user-issued fsync writes. In BGGC mode, node block migration is handled by f2fs_sync_node_pages(), which guarantees the dentry and fsync marks are cleared before writing. This patch move the set/clear of the fsync|dentry marks into __write_node_folio to make the logic clearer, and ensures the fsync|dentry mark is cleared in FGGC. Cc: stable@kernel.org Fixes: da011cc0da8c ("f2fs: move node pages only in victim section during GC") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-02f2fs: fix fsck inconsistency caused by incorrect nat_entry flag usageYongpeng Yang
f2fs_need_dentry_mark() reads nat_entry flags without mutual exclusion with the checkpoint path, which can result in an incorrect inode block marking state. The scenario is as follows: create & write & fsync 'file A' write checkpoint - f2fs_do_sync_file // inline inode - f2fs_write_inode // inode folio is dirty - f2fs_write_checkpoint - f2fs_flush_merged_writes - f2fs_sync_node_pages - f2fs_fsync_node_pages // no dirty node - f2fs_need_inode_block_update // return true - f2fs_fsync_node_pages // inode dirtied - f2fs_need_dentry_mark //return true - f2fs_flush_nat_entries - f2fs_write_checkpoint end - __write_node_folio // inode with DENT_BIT_SHIFT set SPO, "fsck --dry-run" find inode has already checkpointed but still with DENT_BIT_SHIFT set The state observed by f2fs_need_dentry_mark() can differ from the state observed in __write_node_folio() after acquiring sbi->node_write. The root cause is that the semantics of IS_CHECKPOINTED and HAS_FSYNCED_INODE are only guaranteed after the checkpoint write has fully completed. This patch moves set_dentry_mark() into __write_node_folio() and protects it with the sbi->node_write lock. Cc: stable@kernel.org Fixes: 88bd02c9472a ("f2fs: fix conditions to remain recovery information in f2fs_sync_file") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-02f2fs: fix to do sanity check on dcc->discard_cmd_cnt conditionallyChao Yu
Syzbot reported a f2fs bug as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/segment.c:1900! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 1 UID: 0 PID: 6527 Comm: syz.5.110 Not tainted syzkaller #0 PREEMPT_{RT,(full)} Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2026 RIP: 0010:f2fs_issue_discard_timeout+0x59b/0x5a0 fs/f2fs/segment.c:1900 Code: d9 80 e1 07 80 c1 03 38 c1 0f 8c d6 fe ff ff 48 89 df e8 a8 5e fa fd e9 c9 fe ff ff e8 4e 46 94 fd 90 0f 0b e8 46 46 94 fd 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 RSP: 0018:ffffc9000494f940 EFLAGS: 00010283 RAX: ffffffff843009ca RBX: 0000000000000001 RCX: 0000000000080000 RDX: ffffc9001ca78000 RSI: 00000000000029f3 RDI: 00000000000029f4 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: dffffc0000000000 R11: ffffed100893a431 R12: 1ffff1100893a430 R13: 1ffff1100c2b702c R14: dffffc0000000000 R15: ffff8880449d2160 FS: 00007ffa35fed6c0(0000) GS:ffff88812643d000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2b68634000 CR3: 0000000039f62000 CR4: 00000000003526f0 Call Trace: <TASK> __f2fs_remount fs/f2fs/super.c:2960 [inline] f2fs_reconfigure+0x108a/0x1710 fs/f2fs/super.c:5443 reconfigure_super+0x227/0x8a0 fs/super.c:1080 do_remount fs/namespace.c:3391 [inline] path_mount+0xdc5/0x10e0 fs/namespace.c:4151 do_mount fs/namespace.c:4172 [inline] __do_sys_mount fs/namespace.c:4361 [inline] __se_sys_mount+0x31d/0x420 fs/namespace.c:4338 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ffa37dbda0a The root cause is there will be race condition in between f2fs_ioc_fitrim() and f2fs_remount(): - f2fs_remount - f2fs_ioc_fitrim - f2fs_issue_discard_timeout - __issue_discard_cmd - __drop_discard_cmd - __wait_all_discard_cmd - f2fs_trim_fs - f2fs_write_checkpoint - f2fs_clear_prefree_segments - f2fs_issue_discard - __issue_discard_async - __queue_discard_cmd - __update_discard_tree_range - __insert_discard_cmd - __create_discard_cmd : atomic_inc(&dcc->discard_cmd_cnt); - sanity check on dcc->discard_cmd_cnt (expect discard_cmd_cnt to be zero) This will only happen when fitrim races w/ remount rw, if we remount to readonly filesystem, remount will wait until mnt_pcp.mnt_writers to zero, that means fitrim is not in process at that time. Cc: stable@kernel.org Fixes: 2482c4325dfe ("f2fs: detect bug_on in f2fs_wait_discard_bios") Reported-by: syzbot+62538b67389ee582837a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/69b07d7c.050a0220.8df7.09a1.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-02f2fs: refactor node footer flag setting related codeYongpeng Yang
This patch refactors the node footer flag setting code to simplify redundant logic and adjust function parameters and return types. No logical changes. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-04-02f2fs: refactor f2fs_move_node_folio functionYongpeng Yang
This patch refactor the f2fs_move_node_folio() function. No logical changes. Cc: stable@kernel.org Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: use more generic f2fs_stop_checkpoint()Chao Yu
Let's use more generic f2fs_stop_checkpoint() instead of f2fs_handle_critical_error() to handle critical error in f2fs. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: call f2fs_handle_critical_error() to set cp_error flagChao Yu
f2fs_handle_page_eio() is the only left place we set CP_ERROR_FLAG directly, it missed to update superblock.s_stop_reason, let's call f2fs_handle_critical_error() instead to fix that. Introduce STOP_CP_REASON_READ_{META,NODE,DATA} stop_cp_reason enum variable to indicate which kind of data we failed to read. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: add READ_ONCE() for i_blocks in f2fs_update_inode()Cen Zhang
f2fs_update_inode() reads inode->i_blocks without holding i_lock to serialize it to the on-disk inode, while concurrent truncate or allocation paths may modify i_blocks under i_lock. Since blkcnt_t is u64, this risks torn reads on 32-bit architectures. Following the approach in ext4_inode_blocks_set(), add READ_ONCE() to prevent potential compiler-induced tearing. Fixes: 19f99cee206c ("f2fs: add core inode operations") Cc: stable@vger.kernel.org Signed-off-by: Cen Zhang <zzzccc427@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: drop unused ri parameter from truncate_partial_nodes()Yongpeng Yang
The ri parameter in truncate_partial_nodes() is unused. Remove it along with the related code. No logical changes. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: fix fiemap boundary handling when read extent cache is incompleteYongpeng Yang
f2fs_fiemap() calls f2fs_map_blocks() to obtain the block mapping a file, and then merges contiguous mappings into extents. If the mapping is found in the read extent cache, node blocks do not need to be read. However, in the following scenario, a contiguous extent can be split into two extents: $ dd if=/dev/zero of=data.128M bs=1M count=128 $ losetup -f data.128M $ mkfs.f2fs /dev/loop0 -f $ mount -o mode=lfs /dev/loop0 /mnt/f2fs/ $ cd /mnt/f2fs/ $ dd if=/dev/zero of=data.72M bs=1M count=72 && sync $ dd if=/dev/zero of=data.4M bs=1M count=4 && sync $ dd if=/dev/zero of=data.4M bs=1M count=2 seek=2 conv=notrunc && sync $ echo 3 > /proc/sys/vm/drop_caches $ dd if=/dev/zero of=data.4M bs=1M count=2 seek=0 conv=notrunc && sync $ dd if=/dev/zero of=data.4M bs=1M count=2 seek=0 conv=notrunc && sync $ f2fs_io fiemap 0 1024 data.4M Fiemap: offset = 0 len = 1024 logical addr. physical addr. length flags 0 0000000000000000 0000000006400000 0000000000200000 00001000 1 0000000000200000 0000000006600000 0000000000200000 00001001 Although the physical addresses of the ranges 0~2MB and 2M~4MB are contiguous, the mapping for the 2M~4MB range is not present in memory. When the physical addresses for the 0~2MB range are updated, no merge happens because the adjacent mapping is missing from the in-memory cache. As a result, fiemap reports two separate extents instead of a single contiguous one. The root cause is that the read extent cache does not guarantee that all blocks of an extent are present in memory. Therefore, when the extent length returned by f2fs_map_blocks_cached() is smaller than maxblocks, the remaining mappings are retrieved via f2fs_get_dnode_of_data() to ensure correct fiemap extent boundary handling. Cc: stable@kernel.org Fixes: cd8fc5226bef ("f2fs: remove the create argument to f2fs_map_blocks") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: fix incorrect multidevice info in trace_f2fs_map_blocks()Yongpeng Yang
When f2fs_map_blocks()->f2fs_map_blocks_cached() hits the read extent cache, map->m_multidev_dio is not updated, which leads to incorrect multidevice information being reported by trace_f2fs_map_blocks(). This patch updates map->m_multidev_dio in f2fs_map_blocks_cached() when the read extent cache is hit. Cc: stable@kernel.org Fixes: 0094e98bd147 ("f2fs: factor a f2fs_map_blocks_cached helper") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: fix use-after-free of sbi in f2fs_compress_write_end_io()George Saad
In f2fs_compress_write_end_io(), dec_page_count(sbi, type) can bring the F2FS_WB_CP_DATA counter to zero, unblocking f2fs_wait_on_all_pages() in f2fs_put_super() on a concurrent unmount CPU. The unmount path then proceeds to call f2fs_destroy_page_array_cache(sbi), which destroys sbi->page_array_slab via kmem_cache_destroy(), and eventually kfree(sbi). Meanwhile, the bio completion callback is still executing: when it reaches page_array_free(sbi, ...), it dereferences sbi->page_array_slab — a destroyed slab cache — to call kmem_cache_free(), causing a use-after-free. This is the same class of bug as CVE-2026-23234 (which fixed the equivalent race in f2fs_write_end_io() in data.c), but in the compressed writeback completion path that was not covered by that fix. Fix this by moving dec_page_count() to after page_array_free(), so that all sbi accesses complete before the counter decrement that can unblock unmount. For non-last folios (where atomic_dec_return on cic->pending_pages is nonzero), dec_page_count is called immediately before returning — page_array_free is not reached on this path, so there is no post-decrement sbi access. For the last folio, page_array_free runs while the F2FS_WB_CP_DATA counter is still nonzero (this folio has not yet decremented it), keeping sbi alive, and dec_page_count runs as the final operation. Fixes: 4c8ff7095bef ("f2fs: support data compression") Cc: stable@vger.kernel.org Signed-off-by: George Saad <geoo115@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: drop unused sbi parameter from f2fs_in_warm_node_list()Yongpeng Yang
The sbi parameter in f2fs_in_warm_node_list() is not used. Remove it to simplify the function. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: fix UAF caused by decrementing sbi->nr_pages[] in f2fs_write_end_io()Yongpeng Yang
The xfstests case "generic/107" and syzbot have both reported a NULL pointer dereference. The concurrent scenario that triggers the panic is as follows: F2FS_WB_CP_DATA write callback umount - f2fs_write_checkpoint - f2fs_wait_on_all_pages(sbi, F2FS_WB_CP_DATA) - blk_mq_end_request - bio_endio - f2fs_write_end_io : dec_page_count(sbi, F2FS_WB_CP_DATA) : wake_up(&sbi->cp_wait) - kill_f2fs_super - kill_block_super - f2fs_put_super : iput(sbi->node_inode) : sbi->node_inode = NULL : f2fs_in_warm_node_list - is_node_folio // sbi->node_inode is NULL and panic The root cause is that f2fs_put_super() calls iput(sbi->node_inode) and sets sbi->node_inode to NULL after sbi->nr_pages[F2FS_WB_CP_DATA] is decremented to zero. As a result, f2fs_in_warm_node_list() may dereference a NULL node_inode when checking whether a folio belongs to the node inode, leading to a panic. This patch fixes the issue by calling f2fs_in_warm_node_list() before decrementing sbi->nr_pages[F2FS_WB_CP_DATA], thus preventing the use-after-free condition. Cc: stable@kernel.org Fixes: 50fa53eccf9f ("f2fs: fix to avoid broken of dnode block list") Reported-by: syzbot+6e4cb1cac5efc96ea0ca@syzkaller.appspotmail.com Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: avoid reading already updated pages during GCJianan Huang
We found the following issue during fuzz testing: page: refcount:3 mapcount:0 mapping:00000000b6e89c65 index:0x18b2dc pfn:0x161ba9 memcg:f8ffff800e269c00 aops:f2fs_meta_aops ino:2 flags: 0x52880000000080a9(locked|waiters|uptodate|lru|private|zone=1|kasantag=0x4a) raw: 52880000000080a9 fffffffec6e17588 fffffffec0ccc088 a7ffff8067063618 raw: 000000000018b2dc 0000000000000009 00000003ffffffff f8ffff800e269c00 page dumped because: VM_BUG_ON_FOLIO(folio_test_uptodate(folio)) page_owner tracks the page as allocated post_alloc_hook+0x58c/0x5ec prep_new_page+0x34/0x284 get_page_from_freelist+0x2dcc/0x2e8c __alloc_pages_noprof+0x280/0x76c __folio_alloc_noprof+0x18/0xac __filemap_get_folio+0x6bc/0xdc4 pagecache_get_page+0x3c/0x104 do_garbage_collect+0x5c78/0x77a4 f2fs_gc+0xd74/0x25f0 gc_thread_func+0xb28/0x2930 kthread+0x464/0x5d8 ret_from_fork+0x10/0x20 ------------[ cut here ]------------ kernel BUG at mm/filemap.c:1563! folio_end_read+0x140/0x168 f2fs_finish_read_bio+0x5c4/0xb80 f2fs_read_end_io+0x64c/0x708 bio_endio+0x85c/0x8c0 blk_update_request+0x690/0x127c scsi_end_request+0x9c/0xb8c scsi_io_completion+0xf0/0x250 scsi_finish_command+0x430/0x45c scsi_complete+0x178/0x6d4 blk_mq_complete_request+0xcc/0x104 scsi_done_internal+0x214/0x454 scsi_done+0x24/0x34 which is similar to the problem reported by syzbot: https://syzkaller.appspot.com/bug?extid=3686758660f980b402dc This case is consistent with the description in commit 9bf1a3f ("f2fs: avoid GC causing encrypted file corrupted"): Page 1 is moved from blkaddr A to blkaddr B by move_data_block, and after being written it is marked as uptodate. Then, Page 1 is moved from blkaddr B to blkaddr C, VM_BUG_ON_FOLIO was triggered in the endio initiated by ra_data_block. There is no need to read Page 1 again from blkaddr B, since it has already been updated. Therefore, avoid initiating I/O in this case. Fixes: 6aa58d8ad20a ("f2fs: readahead encrypted block during GC") Signed-off-by: Jianan Huang <huangjianan@xiaomi.com> Signed-off-by: Sheng Yong <shengyong1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: Add defrag_blocks sysfs nodeliujinbao1
Add the defrag_blocks sysfs node to track the amount of data blocks moved during filesystem defragmentation. Signed-off-by: Sheng Yong <shengyong1@xiaomi.com> Signed-off-by: liujinbao1 <liujinbao1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: fix incorrect file address mapping when inline inode is unwrittenYongpeng Yang
When `fileinfo->fi_flags` does not have the `FIEMAP_FLAG_SYNC` bit set and inline data has not been persisted yet, the physical address of the extent is calculated incorrectly for unwritten inline inodes. root@vm:/mnt/f2fs# dd if=/dev/zero of=data.3k bs=3k count=1 root@vm:/mnt/f2fs# f2fs_io fiemap 0 100 data.3k Fiemap: offset = 0 len = 100 logical addr. physical addr. length flags 0 0000000000000000 00000ffffffff16c 0000000000000c00 00000301 This patch fixes the issue by checking if the inode's address is valid. If the inline inode is unwritten, set the physical address to 0 and mark the extent with `FIEMAP_EXTENT_UNKNOWN | FIEMAP_EXTENT_DELALLOC` flags. Cc: stable@kernel.org Fixes: 67f8cf3cee6f ("f2fs: support fiemap for inline_data") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs:Fix incomplete search range in f2fs_get_victim when f2fs_need_rand_seg ↵liujinbao1
is enabled During the f2fs_get_victim process, when the f2fs_need_rand_seg is enabled in select_policy, p->offset is a random value, and the search range is from p->offset to MAIN_SECS. When segno >= last_segment, the loop breaks and exits directly without searching the range from 0 to p->offset.This results in an incomplete search when the random offset is not zero. Signed-off-by: liujinbao1 <liujinbao1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: fix to avoid memory leak in f2fs_rename()Chao Yu
syzbot reported a f2fs bug as below: BUG: memory leak unreferenced object 0xffff888127f70830 (size 16): comm "syz.0.23", pid 6144, jiffies 4294943712 hex dump (first 16 bytes): 3c af 57 72 5b e6 8f ad 6e 8e fd 33 42 39 03 ff <.Wr[...n..3B9.. backtrace (crc 925f8a80): kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline] slab_post_alloc_hook mm/slub.c:4520 [inline] slab_alloc_node mm/slub.c:4844 [inline] __do_kmalloc_node mm/slub.c:5237 [inline] __kmalloc_noprof+0x3bd/0x560 mm/slub.c:5250 kmalloc_noprof include/linux/slab.h:954 [inline] fscrypt_setup_filename+0x15e/0x3b0 fs/crypto/fname.c:364 f2fs_setup_filename+0x52/0xb0 fs/f2fs/dir.c:143 f2fs_rename+0x159/0xca0 fs/f2fs/namei.c:961 f2fs_rename2+0xd5/0xf20 fs/f2fs/namei.c:1308 vfs_rename+0x7ff/0x1250 fs/namei.c:6026 filename_renameat2+0x4f4/0x660 fs/namei.c:6144 __do_sys_renameat2 fs/namei.c:6173 [inline] __se_sys_renameat2 fs/namei.c:6168 [inline] __x64_sys_renameat2+0x59/0x80 fs/namei.c:6168 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is in commit 40b2d55e0452 ("f2fs: fix to create selinux label during whiteout initialization"), we added a call to f2fs_setup_filename() without a matching call to f2fs_free_filename(), fix it. Fixes: 40b2d55e0452 ("f2fs: fix to create selinux label during whiteout initialization") Cc: stable@kernel.org Reported-by: syzbot+cf7946ab25b21abc4b66@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/69a75fe1.a70a0220.b118c.0014.GAE@google.com Suggested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-24f2fs: remove unreachable code in f2fs_encrypt_one_page()Eric Biggers
Since commit 52e7e0d88933 ("fscrypt: Switch to sync_skcipher and on-stack requests") eliminated the dynamic allocation of crypto requests, the only remaining dynamic memory allocation done by fscrypt_encrypt_pagecache_blocks() is the bounce page allocation. The bounce page is allocated from a mempool. Mempool allocations with GFP_NOFS never fail. Therefore, fscrypt_encrypt_pagecache_blocks() can no longer return -ENOMEM when passed GFP_NOFS. Remove the now-unreachable code from f2fs_encrypt_one_page(). Suggested-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/all/d9dc2ee1-283d-4467-ad36-a6a4aa557589@suse.cz/ Signed-off-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-03-09fscrypt: pass a real sector_t to fscrypt_zeroout_rangeChristoph Hellwig
While the pblk argument to fscrypt_zeroout_range is declared as a sector_t, it actually is interpreted as a logical block size unit, which is highly unusual. Switch to passing the 512 byte units that sector_t is defined for. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20260302141922.370070-14-hch@lst.de Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-09fscrypt: pass a byte length to fscrypt_zeroout_rangeChristoph Hellwig
Range lengths are usually expressed as bytes in the VFS, switch fscrypt_zeroout_range to this convention. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20260302141922.370070-13-hch@lst.de Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-09fscrypt: pass a byte offset to fscrypt_zeroout_rangeChristoph Hellwig
Logical offsets into an inode are usually expressed as bytes in the VFS. Switch fscrypt_zeroout_range to that convention. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20260302141922.370070-12-hch@lst.de Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-09fscrypt: pass a byte offset to fscrypt_set_bio_crypt_ctxChristoph Hellwig
Logical offsets into an inode are usually expressed as bytes in the VFS. Switch fscrypt_set_bio_crypt_ctx to that convention. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20260302141922.370070-9-hch@lst.de Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-09fscrypt: pass a byte offset to fscrypt_mergeable_bioChristoph Hellwig
Logical offsets into an inode are usually expressed as bytes in the VFS. Switch fscrypt_mergeable_bio to that convention. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20260302141922.370070-8-hch@lst.de Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-06treewide: change inode->i_ino from unsigned long to u64Jeff Layton
On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-06vfs: widen inode hash/lookup functions to u64Jeff Layton
Change the inode hash/lookup VFS API functions to accept u64 parameters instead of unsigned long for inode numbers and hash values. This is preparation for widening i_ino itself to u64, which will allow filesystems to store full 64-bit inode numbers on 32-bit architectures. Since unsigned long implicitly widens to u64 on all architectures, this change is backward-compatible with all existing callers. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-1-2257ad83d372@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>