summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
11 hoursMerge tag 'for-6.19-rc6-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - protect reading super block vs setting block size externally (found by syzbot) - make sure no transaction is started in read-only mode even with some rescue mount option combinations - fix checksum calculation of backup super blocks when block-group-tree is enabled - more extensive mount-time checks of device items that could be left after device replace and attempting degraded mount - fix build warning with -Wmaybe-uninitialized on loongarch64-gcc 12 * tag 'for-6.19-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: add extra device item checks at mount btrfs: fix missing fields in superblock backup with BLOCK_GROUP_TREE btrfs: reject new transactions if the fs is fully read-only btrfs: sync read disk super and set block size btrfs: fix Wmaybe-uninitialized warning in replay_one_buffer()
36 hoursbtrfs: add extra device item checks at mountQu Wenruo
[BUG] There is a bug report where after a dev-replace, the replace source device with devid 4 is properly erased (dump tree shows it's the old devid 4), but the target device is still using devid 0. When the user tries to mount the fs degraded, the mount failed with the following errors: BTRFS: device fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 devid 5 transid 1394395 /dev/sda (8:0) scanned by btrfs (261) BTRFS: device fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 devid 6 transid 1394395 /dev/sde (8:64) scanned by btrfs (261) BTRFS: device fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 devid 0 transid 1394395 /dev/sdd (8:48) scanned by btrfs (261) BTRFS: device fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 devid 3 transid 1394395 /dev/sdf (8:80) scanned by btrfs (261) BTRFS info (device sdd): first mount of filesystem 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 BTRFS info (device sdd): using crc32c (crc32c-intel) checksum algorithm BTRFS warning (device sdd): devid 4 uuid 01e2081c-9c2a-4071-b9f4-e1b27e571ff5 is missing BTRFS info (device sdd): bdev <missing disk> errs: wr 84994544, rd 15567, flush 65872, corrupt 0, gen 0 BTRFS info (device sdd): bdev /dev/sdd errs: wr 71489901, rd 0, flush 30001, corrupt 0, gen 0 BTRFS error (device sdd): replace without active item, run 'device scan --forget' on the target device BTRFS error (device sdd): failed to init dev_replace: -117 BTRFS error (device sdd): open_ctree failed: -117 [CAUSE] The devid 0 didn't get its devid updated is its own problem, here I'm only focusing on the mount failure itself. The mount is not caused by the missing device, as the fs has RAID1C3 for metadata and RAID10 for data, thus is completely able to tolerate one missing device. The device tree shows the dev-replace has properly finished: item 7 key (0 DEV_REPLACE 0) itemoff 15931 itemsize 72 src devid -1 cursor left 11091821199360 cursor right 11091821199360 mode ALWAYS state FINISHED write errors 0 uncorrectable read errors 0 ^^^^^^^^ And the chunk tree shows there is no devid 0: leaf 37980736602112 items 23 free space 12548 generation 1394388 owner CHUNK_TREE leaf 37980736602112 flags 0x1(WRITTEN) backref revision 1 fs uuid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 chunk uuid d074c661-6311-4570-b59f-a5c83fd37f8e item 0 key (DEV_ITEMS DEV_ITEM 3) itemoff 16185 itemsize 98 devid 3 total_bytes 20000588955648 bytes_used 8282877984768 io_align 4096 io_width 4096 sector_size 4096 type 0 generation 0 start_offset 0 dev_group 0 seek_speed 0 bandwidth 0 uuid 0d596b69-fb0d-4031-b4af-a301d0868b8b fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 ... Which shows the first device is devid 3. But there is indeed /dev/sdd with devid 0: superblock: bytenr=65536, device=/dev/sdd --------------------------------------------------------- csum_type 0 (crc32c) csum_size 4 csum 0xd4bed87e [match] bytenr 65536 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 ... uuid_tree_generation 1394388 dev_item.uuid ee6532ad-5442-45f7-87fb-7703e29ed934 dev_item.fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 [match] dev_item.type 0 dev_item.total_bytes 20000588955648 dev_item.bytes_used 8292541661184 dev_item.io_align 0 dev_item.io_width 0 dev_item.sector_size 0 dev_item.devid 0 <<< So this means device scan will register sdd as devid 0 into the fs, then during btrfs_init_dev_replace(), we located the replace progress item, found the previous replace is finished, but we still need to check if the dev-replace target device (devid 0) exists. If that device exists, we error out showing that error message. But to be honest the end user may not really remember which device is the replace target device, thus not sure what to do in the next step. [ENHANCEMENT] To make the error more obvious, and tell the end user which devices should be unregistered: - Introduce BTRFS_DEV_STATE_ITEM_FOUND flag During device item read from the chunk tree, set the flag for each found device item. - Verify there is no device without the above flag during mount Even missing device should have that flag set. If we found a device without that flag set, it means it's an unexpected one and should be rejected. - More detailed error message on what to do next This will show all unexpected devices and tell the end user to use 'btrfs dev scan --forget' to forget them or remove them before mount. There is an example dmesg where a device of a valid filesystem is modified to have devid 0, then try degraded mount: BTRFS info (device dm-6): first mount of filesystem 7c873869-844c-4b39-bd75-a96148bf4656 BTRFS info (device dm-6): using crc32c checksum algorithm BTRFS warning (device dm-6): devid 3 uuid b4a9f35b-db42-4ac4-b55a-cbf81d3b9683 is missing BTRFS error (device dm-6): devid 0 path /dev/mapper/test-scratch3 is registered but not found in chunk tree BTRFS error (device dm-6): please remove above devices or use 'btrfs device scan --forget <dev>' to unregister them before mount BTRFS error (device dm-6): open_ctree failed: -117 Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
36 hoursbtrfs: fix missing fields in superblock backup with BLOCK_GROUP_TREEMark Harmstone
When the BLOCK_GROUP_TREE compat_ro flag is set, the extent root and csum root fields are getting missed. This is because EXTENT_TREE_V2 treated these differently, and when they were split off this special-casing was mistakenly assigned to BGT rather than the rump EXTENT_TREE_V2. There's no reason why the existence of the block group tree should mean that we don't record the details of the last commit's extent root and csum root. Fix the code in backup_super_roots() so that the correct check gets made. Fixes: 1c56ab991903 ("btrfs: separate BLOCK_GROUP_TREE compat RO flag from EXTENT_TREE_V2") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
36 hoursbtrfs: reject new transactions if the fs is fully read-onlyQu Wenruo
[BUG] There is a bug report where a heavily fuzzed fs is mounted with all rescue mount options, which leads to the following warnings during unmount: BTRFS: Transaction aborted (error -22) Modules linked in: CPU: 0 UID: 0 PID: 9758 Comm: repro.out Not tainted 6.19.0-rc5-00002-gb71e635feefc #7 PREEMPT(full) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:find_free_extent_update_loop fs/btrfs/extent-tree.c:4208 [inline] RIP: 0010:find_free_extent+0x52f0/0x5d20 fs/btrfs/extent-tree.c:4611 Call Trace: <TASK> btrfs_reserve_extent+0x2cd/0x790 fs/btrfs/extent-tree.c:4705 btrfs_alloc_tree_block+0x1e1/0x10e0 fs/btrfs/extent-tree.c:5157 btrfs_force_cow_block+0x578/0x2410 fs/btrfs/ctree.c:517 btrfs_cow_block+0x3c4/0xa80 fs/btrfs/ctree.c:708 btrfs_search_slot+0xcad/0x2b50 fs/btrfs/ctree.c:2130 btrfs_truncate_inode_items+0x45d/0x2350 fs/btrfs/inode-item.c:499 btrfs_evict_inode+0x923/0xe70 fs/btrfs/inode.c:5628 evict+0x5f4/0xae0 fs/inode.c:837 __dentry_kill+0x209/0x660 fs/dcache.c:670 finish_dput+0xc9/0x480 fs/dcache.c:879 shrink_dcache_for_umount+0xa0/0x170 fs/dcache.c:1661 generic_shutdown_super+0x67/0x2c0 fs/super.c:621 kill_anon_super+0x3b/0x70 fs/super.c:1289 btrfs_kill_super+0x41/0x50 fs/btrfs/super.c:2127 deactivate_locked_super+0xbc/0x130 fs/super.c:474 cleanup_mnt+0x425/0x4c0 fs/namespace.c:1318 task_work_run+0x1d4/0x260 kernel/task_work.c:233 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0x694/0x22f0 kernel/exit.c:971 do_group_exit+0x21c/0x2d0 kernel/exit.c:1112 __do_sys_exit_group kernel/exit.c:1123 [inline] __se_sys_exit_group kernel/exit.c:1121 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1121 x64_sys_call+0x2210/0x2210 arch/x86/include/generated/asm/syscalls_64.h:232 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xe8/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x44f639 Code: Unable to access opcode bytes at 0x44f60f. RSP: 002b:00007ffc15c4e088 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 00000000004c32f0 RCX: 000000000044f639 RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001 RBP: 0000000000000001 R08: ffffffffffffffc0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004c32f0 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 </TASK> Since rescue mount options will mark the full fs read-only, there should be no new transaction triggered. But during unmount we will evict all inodes, which can trigger a new transaction, and triggers warnings on a heavily corrupted fs. [CAUSE] Btrfs allows new transaction even on a read-only fs, this is to allow log replay happen even on read-only mounts, just like what ext4/xfs do. However with rescue mount options, the fs is fully read-only and cannot be remounted read-write, thus in that case we should also reject any new transactions. [FIX] If we find the fs has rescue mount options, we should treat the fs as error, so that no new transaction can be started. Reported-by: Jiaming Zhang <r772577952@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CANypQFYw8Nt8stgbhoycFojOoUmt+BoZ-z8WJOZVxcogDdwm=Q@mail.gmail.com/ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
36 hoursbtrfs: sync read disk super and set block sizeEdward Adam Davis
When the user performs a btrfs mount, the block device is not set correctly. The user sets the block size of the block device to 0x4000 by executing the BLKBSZSET command. Since the block size change also changes the mapping->flags value, this further affects the result of the mapping_min_folio_order() calculation. Let's analyze the following two scenarios: Scenario 1: Without executing the BLKBSZSET command, the block size is 0x1000, and mapping_min_folio_order() returns 0; Scenario 2: After executing the BLKBSZSET command, the block size is 0x4000, and mapping_min_folio_order() returns 2. do_read_cache_folio() allocates a folio before the BLKBSZSET command is executed. This results in the allocated folio having an order value of 0. Later, after BLKBSZSET is executed, the block size increases to 0x4000, and the mapping_min_folio_order() calculation result becomes 2. This leads to two undesirable consequences: 1. filemap_add_folio() triggers a VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping)) assertion. 2. The syzbot report [1] shows a null pointer dereference in create_empty_buffers() due to a buffer head allocation failure. Synchronization should be established based on the inode between the BLKBSZSET command and read cache page to prevent inconsistencies in block size or mapping flags before and after folio allocation. [1] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:create_empty_buffers+0x4d/0x480 fs/buffer.c:1694 Call Trace: folio_create_buffers+0x109/0x150 fs/buffer.c:1802 block_read_full_folio+0x14c/0x850 fs/buffer.c:2403 filemap_read_folio+0xc8/0x2a0 mm/filemap.c:2496 do_read_cache_folio+0x266/0x5c0 mm/filemap.c:4096 do_read_cache_page mm/filemap.c:4162 [inline] read_cache_page_gfp+0x29/0x120 mm/filemap.c:4195 btrfs_read_disk_super+0x192/0x500 fs/btrfs/volumes.c:1367 Reported-by: syzbot+b4a2af3000eaa84d95d5@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b4a2af3000eaa84d95d5 Signed-off-by: Edward Adam Davis <eadavis@qq.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2 daysbtrfs: fix Wmaybe-uninitialized warning in replay_one_buffer()Qiang Ma
Warning was found when compiling using loongarch64-gcc 12.3.1: $ make CFLAGS_tree-log.o=-Wmaybe-uninitialized In file included from fs/btrfs/ctree.h:21, from fs/btrfs/tree-log.c:12: fs/btrfs/accessors.h: In function 'replay_one_buffer': fs/btrfs/accessors.h:66:16: warning: 'inode_item' may be used uninitialized [-Wmaybe-uninitialized] 66 | return btrfs_get_##bits(eb, s, offsetof(type, member)); \ | ^~~~~~~~~~ fs/btrfs/tree-log.c:2803:42: note: 'inode_item' declared here 2803 | struct btrfs_inode_item *inode_item; | ^~~~~~~~~~ Initialize the inode_item to NULL, the compiler does not seem to see the relation between the first 'wc->log_key.type == BTRFS_INODE_ITEM_KEY' check and the other one that also checks the replay phase. Signed-off-by: Qiang Ma <maqianga@uniontech.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2 daysfs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()Joanne Koong
Above the while() loop in wait_sb_inodes(), we document that we must wait for all pages under writeback for data integrity. Consequently, if a mapping, like fuse, traditionally does not have data integrity semantics, there is no need to wait at all; we can simply skip these inodes. This restores fuse back to prior behavior where syncs are no-ops. This fixes a user regression where if a system is running a faulty fuse server that does not reply to issued write requests, this causes wait_sb_inodes() to wait forever. Link: https://lkml.kernel.org/r/20260105211737.4105620-2-joannelkoong@gmail.com Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com> Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Tested-by: J. Neuschäfer <j.neuschaefer@gmx.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Bernd Schubert <bschubert@ddn.com> Cc: Bonaccorso Salvatore <carnil@debian.org> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysMerge tag 'ext4_for_linus-6.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: - Fix an inconsistency in structure size on 32-bit platforms caused by padding differences for the new EXT4_IOC_[GS]ET_TUNE_SB_PARAM ioctls - Fix a buffer leak on the error path when dropping the refcount an xattr value stored in an inode - Fix missing locking on the error path for the file defragmentation ioctl leading to a BUG * tag 'ext4_for_linus-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix iloc.bh leak in ext4_xattr_inode_update_ref ext4: add missing down_write_data_sem in mext_move_extent(). ext4: fix ext4_tune_sb_params padding
3 daysext4: fix iloc.bh leak in ext4_xattr_inode_update_refYang Erkun
The error branch for ext4_xattr_inode_update_ref forget to release the refcount for iloc.bh. Find this when review code. Fixes: 57295e835408 ("ext4: guard against EA inode refcount underflow in xattr update") Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20251213055706.3417529-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
3 daysext4: add missing down_write_data_sem in mext_move_extent().Julian Sun
Commit 962e8a01eab9 ("ext4: introduce mext_move_extent()") attempts to call ext4_swap_extents() on the failure path to recover the swapped extents, but fails to acquire locks for the two inode->i_data_sem, triggering the BUG_ON statement in ext4_swap_extents(). This issue can be fixed by calling ext4_double_down_write_data_sem() before ext4_swap_extents(). Signed-off-by: Julian Sun <sunjunchao@bytedance.com> Reported-by: syzbot+4ea6bd8737669b423aae@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69368649.a70a0220.38f243.0093.GAE@google.com/ Fixes: 962e8a01eab9 ("ext4: introduce mext_move_extent()") Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://patch.msgid.link/20251208123713.1971068-1-sunjunchao@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
4 daysMerge tag 'for-6.19-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - with large folios in use, fix partial incorrect update of a reflinked range - fix potential deadlock in iget when lookup fails and eviction is needed - in send, validate inline extent type while detecting file holes - fix memory leak after an error when creating a space info - remove zone statistics from sysfs again, the output size limitations make it unusable, we'll do it in another way in another release - test fixes: - return proper error codes from block remapping tests - fix tree root leaks in qgroup tests after errors * tag 'for-6.19-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: remove zoned statistics from sysfs btrfs: fix memory leaks in create_space_info() error paths btrfs: invalidate pages instead of truncate after reflinking btrfs: update the Kconfig string for CONFIG_BTRFS_EXPERIMENTAL btrfs: send: check for inline extents in range_is_hole_in_parent() btrfs: tests: fix return 0 on rmap test failure btrfs: tests: fix root tree leak in btrfs_test_qgroups() btrfs: release path before iget_failed() in btrfs_read_locked_inode()
5 daysMerge tag 'xfs-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Carlos Maiolino: "Just a few obvious fixes and some 'cosmetic' changes" * tag 'xfs-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: set max_agbno to allow sparse alloc of last full inode chunk xfs: Fix xfs_grow_last_rtg() xfs: improve the assert at the top of xfs_log_cover xfs: fix an overly long line in xfs_rtgroup_calc_geometry xfs: mark __xfs_rtgroup_extents static xfs: Fix the return value of xfs_rtcopy_summary() xfs: fix memory leak in xfs_growfs_check_rtgeom()
6 daysMerge tag 'nfs-for-6.19-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client fixes from Trond Myklebust: - Fix another deadlock involving nfs_release_folio() - localio: - Stop I/O upon hitting a fatal error - Deal with page offsets that are > PAGE_SIZE - Fix size read races in truncate, fallocate and copy offload - Several bugfixes for the NFSv4.x directory delegation client code - pNFS: - Fix a deadlock when returning delegations during open - Fix memory leaks in various error paths * tag 'nfs-for-6.19-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Fix size read races in truncate, fallocate and copy offload NFS: Don't immediately return directory delegations when disabled NFS/localio: Deal with page bases that are > PAGE_SIZE NFS/localio: Stop further I/O upon hitting an error NFSv4.x: Directory delegations don't require any state recovery NFSv4: Don't free slots prematurely if requesting a directory delegation NFSv4: Fix nfs_clear_verifier_delegated() for delegated directories NFS: Fix directory delegation verifier checks pnfs/blocklayout: Fix memory leak in bl_parse_scsi() pnfs/flexfiles: Fix memory leak in nfs4_ff_alloc_deviceid_node() NFS: Fix a deadlock involving nfs_release_folio() pNFS: Fix a deadlock when returning a delegation during open()
6 daysNFS: Fix size read races in truncate, fallocate and copy offloadTrond Myklebust
If the pre-operation file size is read before locking the inode and quiescing O_DIRECT writes, then nfs_truncate_last_folio() might end up overwriting valid file data. Fixes: b1817b18ff20 ("NFS: Protect against 'eof page pollution'") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
7 daysbtrfs: remove zoned statistics from sysfsJohannes Thumshirn
Remove the newly introduced zoned statistics from sysfs, as sysfs can only show a single page this will truncate the output on a busy filesystem. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
8 daysMerge tag 'gfs2-for-6.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 revert from Andreas Gruenbacher: "Revert bad commit "gfs2: Fix use of bio_chain" I was originally assuming that there must be a bug in gfs2 because gfs2 chains bios in the opposite direction of what bio_chain_and_submit() expects. It turns out that the bio chains are set up in "reverse direction" intentionally so that the first bio's bi_end_io callback is invoked rather than the last bio's callback. We want the first bio's callback invoked for the following reason: The initial bio starts page aligned and covers one or more pages. When it terminates at a non-page-aligned offset, subsequent bios are added to handle the remaining portion of the final page. Upon completion of the bio chain, all affected pages need to be be marked as read, and only the first bio references all of these pages" * tag 'gfs2-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: Fix use of bio_chain"
9 daysxfs: set max_agbno to allow sparse alloc of last full inode chunkBrian Foster
Sparse inode cluster allocation sets min/max agbno values to avoid allocating an inode cluster that might map to an invalid inode chunk. For example, we can't have an inode record mapped to agbno 0 or that extends past the end of a runt AG of misaligned size. The initial calculation of max_agbno is unnecessarily conservative, however. This has triggered a corner case allocation failure where a small runt AG (i.e. 2063 blocks) is mostly full save for an extent to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this case, which happens to be the offset of the last possible valid inode chunk in the AG. In practice, we should be able to allocate the 4-block cluster at agbno 2052 to map to the parent inode record at agbno 2048, but the max_agbno value precludes it. Note that this can result in filesystem shutdown via dirty trans cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because the tail AG selection by the allocator sets t_highest_agno on the transaction. If the inode allocator spins around and finds an inode chunk with free inodes in an earlier AG, the subsequent dir name creation path may still fail to allocate due to the AG restriction and cancel. To avoid this problem, update the max_agbno calculation to the agbno prior to the last chunk aligned agbno in the AG. This is not necessarily the last valid allocation target for a sparse chunk, but since inode chunks (i.e. records) are chunk aligned and sparse allocs are cluster sized/aligned, this allows the sb_spino_align alignment restriction to take over and round down the max effective agbno to within the last valid inode chunk in the AG. Note that even though the allocator improvements in the aforementioned commit seem to avoid this particular dirty trans cancel situation, the max_agbno logic improvement still applies as we should be able to allocate from an AG that has been appropriately selected. The more important target for this patch however are older/stable kernels prior to this allocator rework/improvement. Cc: stable@vger.kernel.org # v4.2 Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure") Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
9 daysxfs: Fix xfs_grow_last_rtg()Nirjhar Roy (IBM)
The last rtg should be able to grow when the size of the last is less than (and not equal to) sb_rgextents. xfs_growfs with realtime groups fails without this patch. The reason is that, xfs_growfs_rtg() tries to grow the last rt group even when the last rt group is at its maximal size i.e, sb_rgextents. It fails with the following messages: XFS (loop0): Internal error block >= mp->m_rsumblocks at line 253 of file fs/xfs/libxfs/xfs_rtbitmap.c. Caller xfs_rtsummary_read_buf+0x20/0x80 XFS (loop0): Corruption detected. Unmount and run xfs_repair XFS (loop0): Internal error xfs_trans_cancel at line 976 of file fs/xfs/xfs_trans.c. Caller xfs_growfs_rt_bmblock+0x402/0x450 XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x10a/0x1f0 (fs/xfs/xfs_trans.c:977). Shutting down filesystem. XFS (loop0): Please unmount the filesystem and rectify the problem(s) Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
9 daysxfs: improve the assert at the top of xfs_log_coverChristoph Hellwig
Move each condition into a separate assert so that we can see which on triggered. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
9 daysxfs: fix an overly long line in xfs_rtgroup_calc_geometryChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
9 daysxfs: mark __xfs_rtgroup_extents staticChristoph Hellwig
__xfs_rtgroup_extents is not used outside of xfs_rtgroup.c, so mark it static. Move it and xfs_rtgroup_extents up in the file to avoid forward declarations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
9 daysxfs: Fix the return value of xfs_rtcopy_summary()Nirjhar Roy (IBM)
xfs_rtcopy_summary() should return the appropriate error code instead of always returning 0. The caller of this function which is xfs_growfs_rt_bmblock() is already handling the error. Fixes: e94b53ff699c ("xfs: cache last bitmap block in realtime allocator") Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org # v6.7 Signed-off-by: Carlos Maiolino <cem@kernel.org>
9 daysNFS: Don't immediately return directory delegations when disabledAnna Schumaker
The function nfs_inode_evict_delegation() immediately and synchronously returns a delegation when called. This means we can't call it from nfs4_have_delegation(), since that function could be called under a lock. Instead we should mark the delegation for return and let the state manager handle it for us. Fixes: b6d2a520f463 ("NFS: Add a module option to disable directory delegations") Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
10 daysbtrfs: fix memory leaks in create_space_info() error pathsJiasheng Jiang
In create_space_info(), the 'space_info' object is allocated at the beginning of the function. However, there are two error paths where the function returns an error code without freeing the allocated memory: 1. When create_space_info_sub_group() fails in zoned mode. 2. When btrfs_sysfs_add_space_info_type() fails. In both cases, 'space_info' has not yet been added to the fs_info->space_info list, resulting in a memory leak. Fix this by adding an error handling label to kfree(space_info) before returning. Fixes: 2be12ef79fe9 ("btrfs: Separate space_info create/update") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
10 daysbtrfs: invalidate pages instead of truncate after reflinkingFilipe Manana
Qu reported that generic/164 often fails because the read operations get zeroes when it expects to either get all bytes with a value of 0x61 or 0x62. The issue stems from truncating the pages from the page cache instead of invalidating, as truncating can zero page contents. This zeroing is not just in case the range is not page sized (as it's commented in truncate_inode_pages_range()) but also in case we are using large folios, they need to be split and the splitting fails. Stealing Qu's comment in the thread linked below: "We can have the following case: 0 4K 8K 12K 16K | | | | | |<---- Extent A ----->|<----- Extent B ------>| The page size is still 4K, but the folio we got is 16K. Then if we remap the range for [8K, 16K), then truncate_inode_pages_range() will get the large folio 0 sized 16K, then call truncate_inode_partial_folio(). Which later calls folio_zero_range() for the [8K, 16K) range first, then tries to split the folio into smaller ones to properly drop them from the cache. But if splitting failed (e.g. racing with other operations holding the filemap lock), the partially zeroed large folio will be kept, resulting the range [8K, 16K) being zeroed meanwhile the folio is still a 16K sized large one." So instead of truncating, invalidate the page cache range with a call to filemap_invalidate_inode(), which besides not doing any zeroing also ensures that while it's invalidating folios, no new folios are added. This helps ensure that buffered reads that happen while a reflink operation is in progress always get either the whole old data (the one before the reflink) or the whole new data, which is what generic/164 expects. Link: https://lore.kernel.org/linux-btrfs/7fb9b44f-9680-4c22-a47f-6648cb109ddf@suse.com/ Reported-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
10 daysbtrfs: update the Kconfig string for CONFIG_BTRFS_EXPERIMENTALQu Wenruo
The following new features are missing: - Async checksum - Shutdown ioctl and auto-degradation - Larger block size support Which is dependent on larger folios. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
10 daysRevert "gfs2: Fix use of bio_chain"Andreas Gruenbacher
This reverts commit 8a157e0a0aa5143b5d94201508c0ca1bb8cfb941. That commit incorrectly assumed that the bio_chain() arguments were swapped in gfs2. However, gfs2 intentionally constructs bio chains so that the first bio's bi_end_io callback is invoked when all bios in the chain have completed, unlike bio chains where the last bio's callback is invoked. Fixes: 8a157e0a0aa5 ("gfs2: Fix use of bio_chain") Cc: stable@vger.kernel.org Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
10 daystreewide: Update email addressThomas Gleixner
In a vain attempt to consolidate the email zoo switch everything to the kernel.org account. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 dayserofs: fix file-backed mounts no longer working on EROFS partitionsGao Xiang
Sheng Yong reported [1] that Android APEX images didn't work with commit 072a7c7cdbea ("erofs: don't bother with s_stack_depth increasing for now") because "EROFS-formatted APEX file images can be stored within an EROFS-formatted Android system partition." In response, I sent a quick fat-fingered [PATCH v3] to address the report. Unfortunately, the updated condition was incorrect: if (erofs_is_fileio_mode(sbi)) { - sb->s_stack_depth = - file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1; - if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) { - erofs_err(sb, "maximum fs stacking depth exceeded"); + inode = file_inode(sbi->dif0.file); + if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) || + inode->i_sb->s_stack_depth) { The condition `!sb->s_bdev` is always true for all file-backed EROFS mounts, making the check effectively a no-op. The real fix tested and confirmed by Sheng Yong [2] at that time was [PATCH v3 RESEND], which correctly ensures the following EROFS^2 setup works: EROFS (on a block device) + EROFS (file-backed mount) But sadly I screwed it up again by upstreaming the outdated [PATCH v3]. This patch applies the same logic as the delta between the upstream [PATCH v3] and the real fix [PATCH v3 RESEND]. Reported-by: Sheng Yong <shengyong1@xiaomi.com> Closes: https://lore.kernel.org/r/3acec686-4020-4609-aee4-5dae7b9b0093@gmail.com [1] Fixes: 072a7c7cdbea ("erofs: don't bother with s_stack_depth increasing for now") Link: https://lore.kernel.org/r/243f57b8-246f-47e7-9fb1-27a771e8e9e8@gmail.com [2] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 daysMerge tag 'erofs-for-6.19-rc5-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fix from Gao Xiang: - Don't increase s_stack_depth which caused regressions in some composefs mount setups (EROFS + ovl^2) Instead just allow one extra unaccounted fs stacking level for straightforward cases. * tag 'erofs-for-6.19-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: don't bother with s_stack_depth increasing for now
12 dayserofs: don't bother with s_stack_depth increasing for nowGao Xiang
Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel stack overflow when stacking an unlimited number of EROFS on top of each other. This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes (and such setups are already used in production for quite a long time). One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH from 2 to 3, but proving that this is safe in general is a high bar. After a long discussion on GitHub issues [1] about possible solutions, one conclusion is that there is no need to support nesting file-backed EROFS mounts on stacked filesystems, because there is always the option to use loopback devices as a fallback. As a quick fix for the composefs regression for this cycle, instead of bumping `s_stack_depth` for file backed EROFS mounts, we disallow nesting file-backed EROFS over EROFS and over filesystems with `s_stack_depth` > 0. This works for all known file-backed mount use cases (composefs, containerd, and Android APEX for some Android vendors), and the fix is self-contained. Essentially, we are allowing one extra unaccounted fs stacking level of EROFS below stacking filesystems, but EROFS can only be used in the read path (i.e. overlayfs lower layers), which typically has much lower stack usage than the write path. We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more stack usage analysis or using alternative approaches, such as splitting the `s_stack_depth` limitation according to different combinations of stacking. Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts") Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com> Reported-by: Timothée Ravier <tim@siosm.fr> Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1] Reported-by: "Alekséi Naidénov" <an@digitaltide.io> Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com Acked-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Alexander Larsson <alexl@redhat.com> Reviewed-and-tested-by: Sheng Yong <shengyong1@xiaomi.com> Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
12 daysMerge tag 'for-6.19-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix potential NULL pointer dereference when replaying tree log after an error - release path before initializing extent tree to avoid potential deadlock when allocating new inode - on filesystems with block size > page size - fix potential read out of bounds during encoded read of an inline extent - only enforce free space tree if v1 cache is required - print correct tree id in error message * tag 'for-6.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: show correct warning if can't read data reloc tree btrfs: fix NULL pointer dereference in do_abort_log_replay() btrfs: force free space tree for bs > ps cases btrfs: only enforce free space tree if v1 cache is required for bs < ps cases btrfs: release path before initializing extent tree in btrfs_read_locked_inode() btrfs: avoid access-beyond-folio for bs > ps encoded writes
12 daysbtrfs: send: check for inline extents in range_is_hole_in_parent()Qu Wenruo
Before accessing the disk_bytenr field of a file extent item we need to check if we are dealing with an inline extent. This is because for inline extents their data starts at the offset of the disk_bytenr field. So accessing the disk_bytenr means we are accessing inline data or in case the inline data is less than 8 bytes we can actually cause an invalid memory access if this inline extent item is the first item in the leaf or access metadata from other items. Fixes: 82bfb2e7b645 ("Btrfs: incremental send, fix unnecessary hole writes for sparse files") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
12 daysbtrfs: tests: fix return 0 on rmap test failureNaohiro Aota
In test_rmap_blocks(), we have ret = 0 before checking the results. We need to set it to -EINVAL, so that a mismatching result will return -EINVAL not 0. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
12 daysbtrfs: tests: fix root tree leak in btrfs_test_qgroups()Zilin Guan
If btrfs_insert_fs_root() fails, the tmp_root allocated by btrfs_alloc_dummy_root() is leaked because its initial reference count is not decremented. Fix this by calling btrfs_put_root() unconditionally after btrfs_insert_fs_root(). This ensures the local reference is always dropped. Also fix a copy-paste error in the error message where the subvolume root insertion failure was incorrectly logged as "fs root". Co-developed-by: Jianhao Xu <jianhao.xu@seu.edu.cn> Signed-off-by: Jianhao Xu <jianhao.xu@seu.edu.cn> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
12 daysbtrfs: release path before iget_failed() in btrfs_read_locked_inode()Filipe Manana
In btrfs_read_locked_inode() if we fail to lookup the inode, we jump to the 'out' label with a path that has a read locked leaf and then we call iget_failed(). This can result in a ABBA deadlock, since iget_failed() triggers inode eviction and that causes the release of the delayed inode, which must lock the delayed inode's mutex, and a task updating a delayed inode starts by taking the node's mutex and then modifying the inode's subvolume btree. Syzbot reported the following lockdep splat for this: ====================================================== WARNING: possible circular locking dependency detected syzkaller #0 Not tainted ------------------------------------------------------ btrfs-cleaner/8725 is trying to acquire lock: ffff0000d6826a48 (&delayed_node->mutex){+.+.}-{4:4}, at: __btrfs_release_delayed_node+0xa0/0x9b0 fs/btrfs/delayed-inode.c:290 but task is already holding lock: ffff0000dbeba878 (btrfs-tree-00){++++}-{4:4}, at: btrfs_tree_read_lock_nested+0x44/0x2ec fs/btrfs/locking.c:145 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (btrfs-tree-00){++++}-{4:4}: __lock_release kernel/locking/lockdep.c:5574 [inline] lock_release+0x198/0x39c kernel/locking/lockdep.c:5889 up_read+0x24/0x3c kernel/locking/rwsem.c:1632 btrfs_tree_read_unlock+0xdc/0x298 fs/btrfs/locking.c:169 btrfs_tree_unlock_rw fs/btrfs/locking.h:218 [inline] btrfs_search_slot+0xa6c/0x223c fs/btrfs/ctree.c:2133 btrfs_lookup_inode+0xd8/0x38c fs/btrfs/inode-item.c:395 __btrfs_update_delayed_inode+0x124/0xed0 fs/btrfs/delayed-inode.c:1032 btrfs_update_delayed_inode fs/btrfs/delayed-inode.c:1118 [inline] __btrfs_commit_inode_delayed_items+0x15f8/0x1748 fs/btrfs/delayed-inode.c:1141 __btrfs_run_delayed_items+0x1ac/0x514 fs/btrfs/delayed-inode.c:1176 btrfs_run_delayed_items_nr+0x28/0x38 fs/btrfs/delayed-inode.c:1219 flush_space+0x26c/0xb68 fs/btrfs/space-info.c:828 do_async_reclaim_metadata_space+0x110/0x364 fs/btrfs/space-info.c:1158 btrfs_async_reclaim_metadata_space+0x90/0xd8 fs/btrfs/space-info.c:1226 process_one_work+0x7e8/0x155c kernel/workqueue.c:3263 process_scheduled_works kernel/workqueue.c:3346 [inline] worker_thread+0x958/0xed8 kernel/workqueue.c:3427 kthread+0x5fc/0x75c kernel/kthread.c:463 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:844 -> #0 (&delayed_node->mutex){+.+.}-{4:4}: check_prev_add kernel/locking/lockdep.c:3165 [inline] check_prevs_add kernel/locking/lockdep.c:3284 [inline] validate_chain kernel/locking/lockdep.c:3908 [inline] __lock_acquire+0x1774/0x30a4 kernel/locking/lockdep.c:5237 lock_acquire+0x14c/0x2e0 kernel/locking/lockdep.c:5868 __mutex_lock_common+0x1d0/0x2678 kernel/locking/mutex.c:598 __mutex_lock kernel/locking/mutex.c:760 [inline] mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:812 __btrfs_release_delayed_node+0xa0/0x9b0 fs/btrfs/delayed-inode.c:290 btrfs_release_delayed_node fs/btrfs/delayed-inode.c:315 [inline] btrfs_remove_delayed_node+0x68/0x84 fs/btrfs/delayed-inode.c:1326 btrfs_evict_inode+0x578/0xe28 fs/btrfs/inode.c:5587 evict+0x414/0x928 fs/inode.c:810 iput_final fs/inode.c:1914 [inline] iput+0x95c/0xad4 fs/inode.c:1966 iget_failed+0xec/0x134 fs/bad_inode.c:248 btrfs_read_locked_inode+0xe1c/0x1234 fs/btrfs/inode.c:4101 btrfs_iget+0x1b0/0x264 fs/btrfs/inode.c:5837 btrfs_run_defrag_inode fs/btrfs/defrag.c:237 [inline] btrfs_run_defrag_inodes+0x520/0xdc4 fs/btrfs/defrag.c:309 cleaner_kthread+0x21c/0x418 fs/btrfs/disk-io.c:1516 kthread+0x5fc/0x75c kernel/kthread.c:463 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:844 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(btrfs-tree-00); lock(&delayed_node->mutex); lock(btrfs-tree-00); lock(&delayed_node->mutex); *** DEADLOCK *** 1 lock held by btrfs-cleaner/8725: #0: ffff0000dbeba878 (btrfs-tree-00){++++}-{4:4}, at: btrfs_tree_read_lock_nested+0x44/0x2ec fs/btrfs/locking.c:145 stack backtrace: CPU: 0 UID: 0 PID: 8725 Comm: btrfs-cleaner Not tainted syzkaller #0 PREEMPT Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/03/2025 Call trace: show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:499 (C) __dump_stack+0x30/0x40 lib/dump_stack.c:94 dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120 dump_stack+0x1c/0x28 lib/dump_stack.c:129 print_circular_bug+0x324/0x32c kernel/locking/lockdep.c:2043 check_noncircular+0x154/0x174 kernel/locking/lockdep.c:2175 check_prev_add kernel/locking/lockdep.c:3165 [inline] check_prevs_add kernel/locking/lockdep.c:3284 [inline] validate_chain kernel/locking/lockdep.c:3908 [inline] __lock_acquire+0x1774/0x30a4 kernel/locking/lockdep.c:5237 lock_acquire+0x14c/0x2e0 kernel/locking/lockdep.c:5868 __mutex_lock_common+0x1d0/0x2678 kernel/locking/mutex.c:598 __mutex_lock kernel/locking/mutex.c:760 [inline] mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:812 __btrfs_release_delayed_node+0xa0/0x9b0 fs/btrfs/delayed-inode.c:290 btrfs_release_delayed_node fs/btrfs/delayed-inode.c:315 [inline] btrfs_remove_delayed_node+0x68/0x84 fs/btrfs/delayed-inode.c:1326 btrfs_evict_inode+0x578/0xe28 fs/btrfs/inode.c:5587 evict+0x414/0x928 fs/inode.c:810 iput_final fs/inode.c:1914 [inline] iput+0x95c/0xad4 fs/inode.c:1966 iget_failed+0xec/0x134 fs/bad_inode.c:248 btrfs_read_locked_inode+0xe1c/0x1234 fs/btrfs/inode.c:4101 btrfs_iget+0x1b0/0x264 fs/btrfs/inode.c:5837 btrfs_run_defrag_inode fs/btrfs/defrag.c:237 [inline] btrfs_run_defrag_inodes+0x520/0xdc4 fs/btrfs/defrag.c:309 cleaner_kthread+0x21c/0x418 fs/btrfs/disk-io.c:1516 kthread+0x5fc/0x75c kernel/kthread.c:463 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:844 Fix this by releasing the path before calling iget_failed(). Reported-by: syzbot+c1c6edb02bea1da754d8@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/694530c2.a70a0220.207337.010d.GAE@google.com/ Fixes: 69673992b1ae ("btrfs: push cleanup into btrfs_read_locked_inode()") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
12 daysMerge tag 'vfs-6.19-rc5.fixes' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Remove incorrect __user annotation from struct xattr_args::value - Documentation fix: Add missing kernel-doc description for the @isnew parameter in ilookup5_nowait() to silence Sphinx warnings - Documentation fix: Fix kernel-doc comment for __start_dirop() - the function name in the comment was wrong and the @state parameter was undocumented - Replace dynamic folio_batch allocation with stack allocation in iomap_zero_range(). The dynamic allocation was problematic for ext4-on-iomap work (didn't handle allocation failure properly) and triggered lockdep complaints. Uses a flag instead to control batch usage - Re-add #ifdef guards around PIDFD_GET_<ns-type>_NAMESPACE ioctls. When a namespace type is disabled, ns->ops is NULL, causes crashes during inode eviction when closing the fd. The ifdefs were removed in a recent simplification but are still needed - Fixe a race where a folio could be unlocked before the trailing zeros (for EOF within the page) were written - Split out a dedicated lease_dispose_list() helper since lease code paths always know they're disposing of leases. Removes unnecessary runtime flag checks and prepares for upcoming lease_manager enhancements - Fix userland delegation requests succeeding despite conflicting opens. Previously, FL_LAYOUT and FL_DELEG leases bypassed conflict checks (a hack for nfsd). Adds new ->lm_open_conflict() lease_manager operation so userland delegations get proper conflict checking while nfsd can continue its own conflict handling - Fix LOOKUP_CACHED path lookups incorrectly falling through to the slow path. After legitimize_links() calls were conditionally elided, the routine would always fail with LOOKUP_CACHED regardless of whether there were any links. Now the flag is checked at the two callsites before calling legitimize_links() - Fix bug in media fd allocation in media_request_alloc() - Fix mismatched API calls in ecryptfs_mknod(): was calling end_removing() instead of end_creating() after ecryptfs_start_creating_dentry() - Fix dentry reference count leak in ecryptfs_mkdir(): a dget() of the lower parent dir was added but never dput()'d, causing BUG during lower filesystem unmount due to the still-in-use dentry * tag 'vfs-6.19-rc5.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: pidfs: protect PIDFD_GET_* ioctls() via ifdef ecryptfs: Release lower parent dentry after creating dir ecryptfs: Fix improper mknod pairing of start_creating()/end_removing() get rid of bogus __user in struct xattr_args::value VFS: fix __start_dirop() kernel-doc warnings fs: Describe @isnew parameter in ilookup5_nowait() fs: make sure to fail try_to_unlazy() and try_to_unlazy() for LOOKUP_CACHED netfs: Fix early read unlock of page with EOF in middle filelock: allow lease_managers to dictate what qualifies as a conflict filelock: add lease_dispose_list() helper iomap: replace folio_batch allocation with stack allocation media: mc: fix potential use-after-free in media_request_alloc()
13 daysxfs: fix memory leak in xfs_growfs_check_rtgeom()Dan Carpenter
Free the "nmp" allocation before returning -EINVAL. Fixes: dc68c0f60169 ("xfs: fix the zoned RT growfs check for zone alignment") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-07NFS/localio: Deal with page bases that are > PAGE_SIZETrond Myklebust
When resending requests, etc, the page base can quickly grow larger than the page size. Fixes: 091bdcfcece0 ("nfs/localio: refactor iocb and iov_iter_bvec initialization") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Mike Snitzer <snitzer@kernel.org>
2026-01-07NFS/localio: Stop further I/O upon hitting an errorTrond Myklebust
If the call into the filesystem results in an I/O error, then the next chunk of data won't be contiguous with the end of the last successful chunk. So break out of the I/O loop and report the results. Currently the localio code will do this for a short read/write, but not for an error. Fixes: 6a218b9c3183 ("nfs/localio: do not issue misaligned DIO out-of-order") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Mike Snitzer <snitzer@kernel.org>
2026-01-07NFSv4.x: Directory delegations don't require any state recoveryTrond Myklebust
The state recovery code in nfs_end_delegation_return() is intended to allow regular files to recover cached open and lock state. It has no function for directory delegations, and may cause corruption. Fixes: 156b09482933 ("NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-01-06pidfs: protect PIDFD_GET_* ioctls() via ifdefChristian Brauner
We originally protected PIDFD_GET_<ns-type>_NAMESPACE ioctls() through ifdefs and recent rework made it possible to drop them. There was an oversight though. When the relevant namespace is turned off ns->ops will be NULL so even though opening a file descriptor is perfectly legitimate it would fail during inode eviction when the file was closed. The simple fix would be to check ns->ops for NULL and continue allow to retrieve namespace fds from pidfds but we don't allow retrieving them when the relevant namespace type is turned off. So keep the simplification but add the ifdefs back in. Link: https://lore.kernel.org/20251222214907.GA189632@quark Link: https://patch.msgid.link/20251224-ununterbrochen-gagen-ea949b83f8f2@brauner Fixes: a71e4f103aed ("pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls") Tested-by: Brendan Jackman <jackmanb@kernel.org> Tested-by: Eric Biggers <ebiggers@kernel.org> Reported-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-06Merge tag 'nfsd-6.19-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "A set of NFSD fixes for stable that arrived after the merge window: - Remove an invalid NFS status code - Fix an fstests failure when using pNFS - Fix a UAF in v4_end_grace() - Fix the administrative interface used to revoke NFSv4 state - Fix a memory leak reported by syzbot" * tag 'nfsd-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: net ref data still needs to be freed even if net hasn't startup nfsd: check that server is running in unlock_filesystem nfsd: use correct loop termination in nfsd4_revoke_states() nfsd: provide locking for v4_end_grace NFSD: Fix permission check for read access to executable-only files NFSD: Remove NFSERR_EAGAIN
2026-01-06btrfs: show correct warning if can't read data reloc treeMark Harmstone
If a filesystem is missing its data reloc tree, we get something like this in dmesg: BTRFS warning (device loop11): failed to read root (objectid=4): -2 BTRFS error (device loop11): open_ctree failed: -2 objectid is BTRFS_DEV_TREE_OBJECTID, but this should actually be the value of BTRFS_DATA_RELOC_TREE_OBJECTID. btrfs_read_roots() prints location.objectid on failure, but this isn't set when reading the data reloc tree. Set location.objectid to the correct value on failure, so that the error message makes sense. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-06btrfs: fix NULL pointer dereference in do_abort_log_replay()Suchit Karunakaran
Coverity reported a NULL pointer dereference issue (CID 1666756) in do_abort_log_replay(). When btrfs_alloc_path() fails in replay_one_buffer(), wc->subvol_path is NULL, but btrfs_abort_log_replay() calls do_abort_log_replay() which unconditionally dereferences wc->subvol_path when attempting to print debug information. Fix this by adding a NULL check before dereferencing wc->subvol_path in do_abort_log_replay(). Fixes: 2753e4917624 ("btrfs: dump detailed info and specific messages on log replay failures") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-06btrfs: force free space tree for bs > ps casesQu Wenruo
[BUG] Currently we only enforcing the free space tree for bs < ps cases, but with the recently added bs > ps support, we lack the free space tree enforcing, causing explicit v1 cache mount option to fail on bs > ps cases: # mount -o space_cache=v1 /dev/test/scratch1 /mnt/btrfs/ mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/mapper/test-scratch1, missing codepage or helper program, or other error. dmesg(1) may have more information after failed mount system call. # dmesg -t | tail -n7 BTRFS: device fsid ac14a6fa-4ec9-449e-aec9-7d1777bfdc06 devid 1 transid 11 /dev/mapper/test-scratch1 (253:3) scanned by mount (2849) BTRFS info (device dm-3): first mount of filesystem ac14a6fa-4ec9-449e-aec9-7d1777bfdc06 BTRFS info (device dm-3): using crc32c checksum algorithm BTRFS warning (device dm-3): support for block size 8192 with page size 4096 is experimental, some features may be missing BTRFS warning (device dm-3): space cache v1 is being deprecated and will be removed in a future release, please use -o space_cache=v2 BTRFS warning (device dm-3): v1 space cache is not supported for page size 4096 with sectorsize 8192 BTRFS error (device dm-3): open_ctree failed: -22 [FIX] Just enable the same free space tree for bs > ps cases, aligning the behavior to bs < ps cases. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-06btrfs: only enforce free space tree if v1 cache is required for bs < ps casesQu Wenruo
[BUG] Since the introduction of btrfs bs < ps support, v1 cache was never on the plan due to its hard coded PAGE_SIZE usage, and the future plan to properly deprecate it. However for bs < ps cases, even if 'nospace_cache,clear_cache' mount option is specified, it's never respected and free space tree is always enabled: mkfs.btrfs -f -O ^bgt,fst $dev mount $dev $mnt -o clear_cache,nospace_cache umount $mnt btrfs ins dump-super $dev ... compat_ro_flags 0x3 ( FREE_SPACE_TREE | FREE_SPACE_TREE_VALID ) ... This means a different behavior compared to bs >= ps cases. [CAUSE] The forcing usage of v2 space cache is done inside btrfs_set_free_space_cache_settings(), however it never checks if we're even using space cache but always enabling v2 cache. [FIX] Instead unconditionally enable v2 cache, only forcing v2 cache if the old v1 cache is required. Now v2 space cache can be properly disabled on bs < ps cases: mkfs.btrfs -f -O ^bgt,fst $dev mount $dev $mnt -o clear_cache,nospace_cache umount $mnt btrfs ins dump-super $dev ... compat_ro_flags 0x0 ... Fixes: 9f73f1aef98b ("btrfs: force v2 space cache usage for subpage mount") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-06btrfs: release path before initializing extent tree in btrfs_read_locked_inode()Filipe Manana
In btrfs_read_locked_inode() we are calling btrfs_init_file_extent_tree() while holding a path with a read locked leaf from a subvolume tree, and btrfs_init_file_extent_tree() may do a GFP_KERNEL allocation, which can trigger reclaim. This can create a circular lock dependency which lockdep warns about with the following splat: [6.1433] ====================================================== [6.1574] WARNING: possible circular locking dependency detected [6.1583] 6.18.0+ #4 Tainted: G U [6.1591] ------------------------------------------------------ [6.1599] kswapd0/117 is trying to acquire lock: [6.1606] ffff8d9b6333c5b8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1625] but task is already holding lock: [6.1633] ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60 [6.1646] which lock already depends on the new lock. [6.1657] the existing dependency chain (in reverse order) is: [6.1667] -> #2 (fs_reclaim){+.+.}-{0:0}: [6.1677] fs_reclaim_acquire+0x9d/0xd0 [6.1685] __kmalloc_cache_noprof+0x59/0x750 [6.1694] btrfs_init_file_extent_tree+0x90/0x100 [6.1702] btrfs_read_locked_inode+0xc3/0x6b0 [6.1710] btrfs_iget+0xbb/0xf0 [6.1716] btrfs_lookup_dentry+0x3c5/0x8e0 [6.1724] btrfs_lookup+0x12/0x30 [6.1731] lookup_open.isra.0+0x1aa/0x6a0 [6.1739] path_openat+0x5f7/0xc60 [6.1746] do_filp_open+0xd6/0x180 [6.1753] do_sys_openat2+0x8b/0xe0 [6.1760] __x64_sys_openat+0x54/0xa0 [6.1768] do_syscall_64+0x97/0x3e0 [6.1776] entry_SYSCALL_64_after_hwframe+0x76/0x7e [6.1784] -> #1 (btrfs-tree-00){++++}-{3:3}: [6.1794] lock_release+0x127/0x2a0 [6.1801] up_read+0x1b/0x30 [6.1808] btrfs_search_slot+0x8e0/0xff0 [6.1817] btrfs_lookup_inode+0x52/0xd0 [6.1825] __btrfs_update_delayed_inode+0x73/0x520 [6.1833] btrfs_commit_inode_delayed_inode+0x11a/0x120 [6.1842] btrfs_log_inode+0x608/0x1aa0 [6.1849] btrfs_log_inode_parent+0x249/0xf80 [6.1857] btrfs_log_dentry_safe+0x3e/0x60 [6.1865] btrfs_sync_file+0x431/0x690 [6.1872] do_fsync+0x39/0x80 [6.1879] __x64_sys_fsync+0x13/0x20 [6.1887] do_syscall_64+0x97/0x3e0 [6.1894] entry_SYSCALL_64_after_hwframe+0x76/0x7e [6.1903] -> #0 (&delayed_node->mutex){+.+.}-{3:3}: [6.1913] __lock_acquire+0x15e9/0x2820 [6.1920] lock_acquire+0xc9/0x2d0 [6.1927] __mutex_lock+0xcc/0x10a0 [6.1934] __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1944] btrfs_evict_inode+0x20b/0x4b0 [6.1952] evict+0x15a/0x2f0 [6.1958] prune_icache_sb+0x91/0xd0 [6.1966] super_cache_scan+0x150/0x1d0 [6.1974] do_shrink_slab+0x155/0x6f0 [6.1981] shrink_slab+0x48e/0x890 [6.1988] shrink_one+0x11a/0x1f0 [6.1995] shrink_node+0xbfd/0x1320 [6.1002] balance_pgdat+0x67f/0xc60 [6.1321] kswapd+0x1dc/0x3e0 [6.1643] kthread+0xff/0x240 [6.1965] ret_from_fork+0x223/0x280 [6.1287] ret_from_fork_asm+0x1a/0x30 [6.1616] other info that might help us debug this: [6.1561] Chain exists of: &delayed_node->mutex --> btrfs-tree-00 --> fs_reclaim [6.1503] Possible unsafe locking scenario: [6.1110] CPU0 CPU1 [6.1411] ---- ---- [6.1707] lock(fs_reclaim); [6.1998] lock(btrfs-tree-00); [6.1291] lock(fs_reclaim); [6.1581] lock(&delayed_node->mutex); [6.1874] *** DEADLOCK *** [6.1716] 2 locks held by kswapd0/117: [6.1999] #0: ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60 [6.1294] #1: ffff8d998344b0e0 (&type->s_umount_key#40){++++}- {3:3}, at: super_cache_scan+0x37/0x1d0 [6.1596] stack backtrace: [6.1183] CPU: 11 UID: 0 PID: 117 Comm: kswapd0 Tainted: G U 6.18.0+ #4 PREEMPT(lazy) [6.1185] Tainted: [U]=USER [6.1186] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023 [6.1187] Call Trace: [6.1187] <TASK> [6.1189] dump_stack_lvl+0x6e/0xa0 [6.1192] print_circular_bug.cold+0x17a/0x1c0 [6.1194] check_noncircular+0x175/0x190 [6.1197] __lock_acquire+0x15e9/0x2820 [6.1200] lock_acquire+0xc9/0x2d0 [6.1201] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1204] __mutex_lock+0xcc/0x10a0 [6.1206] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1208] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1211] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1213] __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1215] btrfs_evict_inode+0x20b/0x4b0 [6.1217] ? lock_acquire+0xc9/0x2d0 [6.1220] evict+0x15a/0x2f0 [6.1222] prune_icache_sb+0x91/0xd0 [6.1224] super_cache_scan+0x150/0x1d0 [6.1226] do_shrink_slab+0x155/0x6f0 [6.1228] shrink_slab+0x48e/0x890 [6.1229] ? shrink_slab+0x2d2/0x890 [6.1231] shrink_one+0x11a/0x1f0 [6.1234] shrink_node+0xbfd/0x1320 [6.1236] ? shrink_node+0xa2d/0x1320 [6.1236] ? shrink_node+0xbd3/0x1320 [6.1239] ? balance_pgdat+0x67f/0xc60 [6.1239] balance_pgdat+0x67f/0xc60 [6.1241] ? finish_task_switch.isra.0+0xc4/0x2a0 [6.1246] kswapd+0x1dc/0x3e0 [6.1247] ? __pfx_autoremove_wake_function+0x10/0x10 [6.1249] ? __pfx_kswapd+0x10/0x10 [6.1250] kthread+0xff/0x240 [6.1251] ? __pfx_kthread+0x10/0x10 [6.1253] ret_from_fork+0x223/0x280 [6.1255] ? __pfx_kthread+0x10/0x10 [6.1257] ret_from_fork_asm+0x1a/0x30 [6.1260] </TASK> This is because: 1) The fsync task is holding an inode's delayed node mutex (for a directory) while calling __btrfs_update_delayed_inode() and that needs to do a search on the subvolume's btree (therefore read lock some extent buffers); 2) The lookup task, at btrfs_lookup(), triggered reclaim with the GFP_KERNEL allocation done by btrfs_init_file_extent_tree() while holding a read lock on a subvolume leaf; 3) The reclaim triggered kswapd which is doing inode eviction for the directory inode the fsync task is using as an argument to btrfs_commit_inode_delayed_inode() - but in that call chain we are trying to read lock the same leaf that the lookup task is holding while calling btrfs_init_file_extent_tree() and doing the GFP_KERNEL allocation. Fix this by calling btrfs_init_file_extent_tree() after we don't need the path anymore and release it in btrfs_read_locked_inode(). Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/linux-btrfs/6e55113a22347c3925458a5d840a18401a38b276.camel@linux.intel.com/ Fixes: 8679d2687c35 ("btrfs: initialize inode::file_extent_tree after i_mode has been set") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-06btrfs: avoid access-beyond-folio for bs > ps encoded writesQu Wenruo
[POTENTIAL BUG] If the system page size is 4K and fs block size is 8K, and max_inline mount option is set to 6K, we can inline a 6K sized data extent. Then a encoded write submitted a compressed extent which is at file offset 0, and the compressed length is 6K, which is allowed to be inlined. Now a read beyond page boundary is triggered inside write_extent_buffer() from insert_inline_extent(). [CAUSE] Currently the function __cow_file_range_inline() can only accept a single folio. For regular compressed write path, we always allocate the compressed folios using the minimal order matching the block size, thus the @compressed_folio should always cover a full fs block thus it is fine. But for encoded writes, they allocate page size folios, this means we can hit a case where the compressed data is smaller than block size but still larger than page size, in that case __cow_file_range_inline() will be called with @compressed_size larger than a page. In that case we will trigger a read beyond the folio inside insert_inline_extent(). Thankfully this is not that common, as the default max_inline is only 2048 bytes, smaller than PAGE_SIZE, and bs > ps support is still experimental. [FIX] We need to either allow insert_inline_extent() to accept a page array to properly support such case, or reject such inline extent. The latter is a much simpler solution, and considering bs > ps will stay as a corner case and non-default max_inline will be even rarer, I don't think we really need to fulfill such niche. So just reject any inline extent that's larger than PAGE_SIZE, and add an extra ASSERT() to insert_inline_extent() to catch such beyond-boundary access. Fixes: ec20799064c8 ("btrfs: enable encoded read/write/send for bs > ps cases") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-05Merge tag 'for-6.19-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix potential deadlock due to mismatching transaction states when waiting for the current transaction - fix squota accounting with nested snapshots - fix quota inheritance of qgroups with multiple parent qgroups - fix NULL inode pointer in evict tracepoint - fix writes beyond end of file on systems with 64K page size and 4K block size - fix logging of inodes after exchange rename - fix use after free when using ref_tracker feature - space reservation fixes * tag 'for-6.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix reservation leak in some error paths when inserting inline extent btrfs: do not free data reservation in fallback from inline due to -ENOSPC btrfs: fix use-after-free warning in btrfs_get_or_create_delayed_node() btrfs: always detect conflicting inodes when logging inode refs btrfs: fix beyond-EOF write handling btrfs: fix deadlock in wait_current_trans() due to ignored transaction type btrfs: fix NULL dereference on root when tracing inode eviction btrfs: qgroup: update all parent qgroups when doing quick inherit btrfs: fix qgroup_snapshot_quick_inherit() squota bug