| author | Linus Torvalds <torvalds@linux-foundation.org> | 2026-02-14 09:48:10 -0800 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2026-02-14 09:48:10 -0800 |
| commit | 3e48a11675c50698374d4ac596fb506736eb1c53 | |
| tree | 19784102302960f534344f97950dd3230a0163f5 /Documentation | |
| parent | 770aaedb461a055f79b971d538678942b6607894 | |
| parent | 52190933c37a96164b271f3f30c16099d9eb8c09 | |
Merge tag 'f2fs-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this development cycle, we focused on several key performance
  optimizations:

   - introducing large folio support to enhance read speeds for
     immutable files

   - reducing checkpoint=enable latency by flushing only committed dirty
     pages

   - implementing tracepoints to diagnose and resolve lock priority
     inversion.

  Additionally, we introduced the packed_ssa feature to optimize the SSA
  footprint when utilizing large block sizes.

  Detail summary:

  Enhancements:
   - support large folio for immutable non-compressed case
   - support non-4KB block size without packed_ssa feature
   - optimize f2fs_enable_checkpoint() to avoid long delay
   - optimize f2fs_overwrite_io() for f2fs_iomap_begin
   - optimize NAT block loading during checkpoint write
   - add write latency stats for NAT and SIT blocks in
     f2fs_write_checkpoint
   - pin files do not require sbi->writepages lock for ordering
   - avoid f2fs_map_blocks() for consecutive holes in readpages
   - flush plug periodically during GC to maximize readahead effect
   - add tracepoints to catch lock overheads
   - add several sysfs entries to tune internal lock priorities

  Fixes:
   - fix lock priority inversion issue
   - fix incomplete block usage in compact SSA summaries
   - fix to show simulate_lock_timeout correctly
   - fix to avoid mapping wrong physical block for swapfile
   - fix IS_CHECKPOINTED flag inconsistency issue caused by
     concurrent atomic commit and checkpoint writes
   - fix to avoid UAF in f2fs_write_end_io()"

* tag 'f2fs-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (61 commits)
f2fs: sysfs: introduce critical_task_priority
f2fs: introduce trace_f2fs_priority_update
f2fs: fix lock priority inversion issue
f2fs: optimize f2fs_overwrite_io() for f2fs_iomap_begin
f2fs: fix incomplete block usage in compact SSA summaries
f2fs: decrease maximum flush retry count in f2fs_enable_checkpoint()
f2fs: optimize NAT block loading during checkpoint write
f2fs: change size parameter of __has_cursum_space() to unsigned int
f2fs: add write latency stats for NAT and SIT blocks in f2fs_write_checkpoint
f2fs: pin files do not require sbi->writepages lock for ordering
f2fs: fix to show simulate_lock_timeout correctly
f2fs: introduce FAULT_SKIP_WRITE
f2fs: check skipped write in f2fs_enable_checkpoint()
Revert "f2fs: add timeout in f2fs_enable_checkpoint()"
f2fs: fix to unlock folio in f2fs_read_data_large_folio()
f2fs: fix error path handling in f2fs_read_data_large_folio()
f2fs: use folio_end_read
f2fs: fix to avoid mapping wrong physical block for swapfile
f2fs: avoid f2fs_map_blocks() for consecutive holes in readpages
f2fs: advance index and offset after zeroing in large folio read
...
Diffstat (limited to 'Documentation')
| -rw-r--r-- | Documentation/ABI/testing/sysfs-fs-f2fs | 62 |
| -rw-r--r-- | Documentation/filesystems/f2fs.rst | 49 |
2 files changed, 106 insertions, 5 deletions
diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
index 770470e0598b..c1d2b3fd9c65 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -520,7 +520,7 @@ What: /sys/fs/f2fs/<disk>/ckpt_thread_ioprio
 Date:           January 2021
 Contact:        "Daeho Jeong" <daehojeong@google.com>
 Description:    Give a way to change checkpoint merge daemon's io priority.
-                Its default value is "be,3", which means "BE" I/O class and
+                Its default value is "rt,3", which means "RT" I/O class and
                 I/O priority "3". We can select the class between "rt" and "be",
                 and set the I/O priority within valid range of it. "," delimiter
                 is necessary in between I/O class and priority number.
@@ -732,7 +732,7 @@ Description: Support configuring fault injection type, should be
                 FAULT_TRUNCATE                   0x00000400
                 FAULT_READ_IO                    0x00000800
                 FAULT_CHECKPOINT                 0x00001000
-                FAULT_DISCARD                    0x00002000
+                FAULT_DISCARD                    0x00002000 (obsolete)
                 FAULT_WRITE_IO                   0x00004000
                 FAULT_SLAB_ALLOC                 0x00008000
                 FAULT_DQUOT_INIT                 0x00010000
@@ -741,8 +741,10 @@ Description: Support configuring fault injection type, should be
                 FAULT_BLKADDR_CONSISTENCE        0x00080000
                 FAULT_NO_SEGMENT                 0x00100000
                 FAULT_INCONSISTENT_FOOTER        0x00200000
-                FAULT_TIMEOUT                    0x00400000 (1000ms)
+                FAULT_ATOMIC_TIMEOUT             0x00400000 (1000ms)
                 FAULT_VMALLOC                    0x00800000
+                FAULT_LOCK_TIMEOUT               0x01000000 (1000ms)
+                FAULT_SKIP_WRITE                 0x02000000
                 ===========================      ==========
 
 What:           /sys/fs/f2fs/<disk>/discard_io_aware_gran
@@ -939,3 +941,57 @@ Description: Controls write priority in multi-devices setups. A value of 0 means
                 allocate_section_policy = 1  Prioritize writing to section before allocate_section_hint
                 allocate_section_policy = 2  Prioritize writing to section after allocate_section_hint
                 ===========================  ==========================================================
+
+What:           /sys/fs/f2fs/<disk>/max_lock_elapsed_time
+Date:           December 2025
+Contact:        "Chao Yu" <chao@kernel.org>
+Description:    This is a threshold, once a thread enters critical region that lock covers, total
+                elapsed time exceeds this threshold, f2fs will print tracepoint to dump information
+                of related context. This sysfs entry can be used to control the value of threshold,
+                by default, the value is 500 ms.
+
+What:           /sys/fs/f2fs/<disk>/inject_timeout_type
+Date:           December 2025
+Contact:        "Chao Yu" <chao@kernel.org>
+Description:    This sysfs entry can be used to change type of injected timeout:
+                ==========     ===============================
+                Flag_Value     Flag_Description
+                ==========     ===============================
+                0x00000000     No timeout (default)
+                0x00000001     Simulate running time
+                0x00000002     Simulate IO type sleep time
+                0x00000003     Simulate Non-IO type sleep time
+                0x00000004     Simulate runnable time
+                ==========     ===============================
+
+What:           /sys/fs/f2fs/<disk>/adjust_lock_priority
+Date:           January 2026
+Contact:        "Chao Yu" <chao@kernel.org>
+Description:    This sysfs entry can be used to enable/disable to adjust priority for task
+                which is in critical region covered by lock.
+                ==========     ==================
+                Flag_Value     Flag_Description
+                ==========     ==================
+                0x00000000     Disabled (default)
+                0x00000001     cp_rwsem
+                0x00000002     node_change
+                0x00000004     node_write
+                0x00000008     gc_lock
+                0x00000010     cp_global
+                0x00000020     io_rwsem
+                ==========     ==================
+
+What:           /sys/fs/f2fs/<disk>/lock_duration_priority
+Date:           January 2026
+Contact:        "Chao Yu" <chao@kernel.org>
+Description:    f2fs can tune priority of thread which has entered into critical region covered by
+                f2fs rwsemphore lock. This sysfs entry can be used to control priority value, the
+                range is [100,139], by default the value is 120.
+
+What:           /sys/fs/f2fs/<disk>/critical_task_priority
+Date:           February 2026
+Contact:        "Chao Yu" <chao@kernel.org>
+Description:    It can be used to tune priority of f2fs critical task, e.g. f2fs_ckpt, f2fs_gc
+                threads, limitation as below:
+                - it requires user has CAP_SYS_NICE capability.
+                - the range is [100, 139], by default the value is 100.
diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index cb90d1ae82d0..7e4031631286 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -206,7 +206,7 @@ fault_type=%d Support configuring fault injection type, should be
                          FAULT_TRUNCATE                   0x00000400
                          FAULT_READ_IO                    0x00000800
                          FAULT_CHECKPOINT                 0x00001000
-                         FAULT_DISCARD                    0x00002000
+                         FAULT_DISCARD                    0x00002000 (obsolete)
                          FAULT_WRITE_IO                   0x00004000
                          FAULT_SLAB_ALLOC                 0x00008000
                          FAULT_DQUOT_INIT                 0x00010000
@@ -215,8 +215,10 @@ fault_type=%d Support configuring fault injection type, should be
                          FAULT_BLKADDR_CONSISTENCE        0x00080000
                          FAULT_NO_SEGMENT                 0x00100000
                          FAULT_INCONSISTENT_FOOTER        0x00200000
-                         FAULT_TIMEOUT                    0x00400000 (1000ms)
+                         FAULT_ATOMIC_TIMEOUT             0x00400000 (1000ms)
                          FAULT_VMALLOC                    0x00800000
+                         FAULT_LOCK_TIMEOUT               0x01000000 (1000ms)
+                         FAULT_SKIP_WRITE                 0x02000000
                          ===========================      ==========
 mode=%s                  Control block allocation mode which supports "adaptive"
                          and "lfs". In "lfs" mode, there should be no random
@@ -1033,3 +1035,46 @@ the reserved space back to F2FS for its own use.
 So, the key idea is, user can do any file operations on /dev/vdc, and
 reclaim the space after the use, while the space is counted as /data.
 That doesn't require modifying partition size and filesystem format.
+
+Per-file Read-Only Large Folio Support
+--------------------------------------
+
+F2FS implements large folio support on the read path to leverage high-order
+page allocation for significant performance gains. To minimize code complexity,
+this support is currently excluded from the write path, which requires handling
+complex optimizations such as compression and block allocation modes.
+
+This optional feature is triggered only when a file's immutable bit is set.
+Consequently, F2FS will return EOPNOTSUPP if a user attempts to open a cached
+file with write permissions, even immediately after clearing the bit. Write
+access is only restored once the cached inode is dropped. The usage flow is
+demonstrated below:
+
+.. code-block::
+
+   # f2fs_io setflags immutable /data/testfile_read_seq
+
+   /* flush and reload the inode to enable the large folio */
+   # sync && echo 3 > /proc/sys/vm/drop_caches
+
+   /* mmap(MAP_POPULATE) + mlock() */
+   # f2fs_io read 128 0 1024 mmap 1 0 /data/testfile_read_seq
+
+   /* mmap() + fadvise(POSIX_FADV_WILLNEED) + mlock() */
+   # f2fs_io read 128 0 1024 fadvise 1 0 /data/testfile_read_seq
+
+   /* mmap() + mlock2(MLOCK_ONFAULT) + madvise(MADV_POPULATE_READ) */
+   # f2fs_io read 128 0 1024 madvise 1 0 /data/testfile_read_seq
+
+   # f2fs_io clearflags immutable /data/testfile_read_seq
+
+   # f2fs_io write 1 0 1 zero buffered /data/testfile_read_seq
+   Failed to open /mnt/test/test: Operation not supported
+
+   /* flush and reload the inode to disable the large folio */
+   # sync && echo 3 > /proc/sys/vm/drop_caches
+
+   # f2fs_io write 1 0 1 zero buffered /data/testfile_read_seq
+   Written 4096 bytes with pattern = zero, total_time = 29 us, max_latency = 28 us
+
+   # rm /data/testfile_read_seq
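
The fault_type values listed in the diff are single-bit flags, so several fault classes can be enabled at once by OR-ing them together. The adjust_lock_priority values are also powers of two, which suggests they combine the same way, although the diff does not state this explicitly. The following is a minimal sketch that composes a fault_type mask and decodes an adjust_lock_priority mask, using only the values shown in the tables of this diff. The helper names `combine` and `decode_locks` are hypothetical, and nothing here touches sysfs or mount options.

```python
# Bit values copied from the fault_type table in this diff.
FAULT_ATOMIC_TIMEOUT = 0x00400000
FAULT_LOCK_TIMEOUT = 0x01000000
FAULT_SKIP_WRITE = 0x02000000

# Bit values copied from the adjust_lock_priority table in this diff.
LOCK_FLAGS = {
    0x01: "cp_rwsem",
    0x02: "node_change",
    0x04: "node_write",
    0x08: "gc_lock",
    0x10: "cp_global",
    0x20: "io_rwsem",
}


def combine(*flags: int) -> int:
    """OR individual flag bits into one mask (hypothetical helper)."""
    mask = 0
    for flag in flags:
        mask |= flag
    return mask


def decode_locks(mask: int) -> list:
    """Return the lock names whose bits are set in an adjust_lock_priority mask.

    Assumes the flags are combinable; raises if a bit outside the table is set.
    """
    unknown = mask & ~combine(*LOCK_FLAGS)
    if unknown:
        raise ValueError(f"unknown bits set: {unknown:#x}")
    return [name for bit, name in LOCK_FLAGS.items() if mask & bit]


# The mount option is documented as fault_type=%d, so show the decimal form too.
fault_type = combine(FAULT_LOCK_TIMEOUT, FAULT_SKIP_WRITE)
print(f"fault_type={fault_type} ({fault_type:#010x})")
print(decode_locks(0x09))
```

Keeping the table values in one place makes it easier to notice the rename of FAULT_TIMEOUT to FAULT_ATOMIC_TIMEOUT, which keeps the same bit (0x00400000).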

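
The [100, 139] range documented for lock_duration_priority and critical_task_priority matches the kernel's static-priority scale for normal tasks, where priority 120 corresponds to nice 0 and a lower number means a higher priority. Assuming f2fs uses that scale (the diff does not say so explicitly), the sketch below validates a candidate value and shows the equivalent nice level. `to_nice` is a hypothetical helper for illustration only.

```python
MIN_PRIO = 100      # lower bound from the sysfs documentation in this diff
MAX_PRIO = 139      # upper bound from the sysfs documentation in this diff
NICE_ZERO_PRIO = 120  # kernel static priority of a nice-0 task (assumed scale)


def to_nice(prio: int) -> int:
    """Map a priority in [100, 139] to its nice level (hypothetical helper)."""
    if not MIN_PRIO <= prio <= MAX_PRIO:
        raise ValueError(f"priority {prio} is outside [{MIN_PRIO}, {MAX_PRIO}]")
    return prio - NICE_ZERO_PRIO


# Documented defaults: 120 for lock_duration_priority, 100 for critical_task_priority.
for name, prio in (("lock_duration_priority", 120), ("critical_task_priority", 100)):
    print(f"{name}: priority {prio} -> nice {to_nice(prio)}")
```

Under this assumption, the default of 100 for critical_task_priority is the most favourable priority a normal task can have, which is consistent with the CAP_SYS_NICE requirement stated in the documentation.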