summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2025-05-23Merge tag 'v6.15-rc8-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Fix for rename regression due to the recent VFS lookup changes - Fix write failure - locking fix for oplock handling * tag 'v6.15-rc8-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: use list_first_entry_or_null for opinfo_get_list() ksmbd: fix rename failure ksmbd: fix stream write failure
2025-05-23Merge tag 'vfs-6.15-rc8.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains a small set of fixes for the blocking buffer lookup conversion done earlier this cycle. It adds a missing conversion in the getblk slowpath and a few minor optimizations and cleanups" * tag 'vfs-6.15-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs/buffer: optimize discard_buffer() fs/buffer: remove superfluous statements fs/buffer: avoid redundant lookup in getblk slowpath fs/buffer: use sleeping lookup in __getblk_slowpath()
2025-05-22Merge tag 'bcachefs-2025-05-22' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Small stuff, main ones users will be interested in: - Couple more casefolding fixes; we can now detect and repair casefolded dirents in non-casefolded dir and vice versa - Fix for massive write inflation with mmapped io, which hit certain databases" * tag 'bcachefs-2025-05-22' of git://evilpiepirate.org/bcachefs: bcachefs: Check for casefolded dirents in non casefolded dirs bcachefs: Fix bch2_dirent_create_snapshot() for casefolding bcachefs: Fix casefold opt via xattr interface bcachefs: mkwrite() now only dirties one page bcachefs: fix extent_has_stripe_ptr() bcachefs: Fix bch2_btree_path_traverse_cached() when paths realloced
2025-05-22Merge tag '6.15-rc8-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - Two fixes for use after free in readdir code paths * tag '6.15-rc8-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: Reset all search buffer pointers when releasing buffer smb: client: Fix use-after-free in cifs_fill_dirent
2025-05-21ksmbd: use list_first_entry_or_null for opinfo_get_list()Namjae Jeon
The list_first_entry() macro never returns NULL. If the list is empty then it returns an invalid pointer. Use list_first_entry_or_null() to check if the list is empty. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202505080231.7OXwq4Te-lkp@intel.com/ Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-21ksmbd: fix rename failureNamjae Jeon
I found that rename fails after cifs mount due to update of lookup_one_qstr_excl(). mv a/c b/ mv: cannot move 'a/c' to 'b/c': No such file or directory In order to rename to a new name regardless of whether the dentry is negative, we need to get the dentry through lookup_one_qstr_excl(). So It will not return error if the name doesn't exist. Fixes: 204a575e91f3 ("VFS: add common error checks to lookup_one_qstr_excl()") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-21bcachefs: Check for casefolded dirents in non casefolded dirsKent Overstreet
Check for mismatches between casefold dirents and casefold directories. A mismatch will cause lookups to fail, as we'll be doing the lookup with the casefolded name, which won't match the non-casefolded dirent, and vice versa. Reported-by: Christopher Snowhill <chris@kode54.net> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Fix bch2_dirent_create_snapshot() for casefoldingKent Overstreet
bch2_dirent_create_snapshot(), used in fsck, neglected to create a casefolded dirent. Just move this into dirent_create_key(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Fix casefold opt via xattr interfaceKent Overstreet
Changing the casefold option requires extra checks/work - factor out a helper from bch2_fileattr_set() for the xattr code to use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21fs/buffer: optimize discard_buffer()Davidlohr Bueso
While invalidating, the clearing of the bits in discard_buffer() is done in one fully ordered CAS operation. In the past this was done via individual clear_bit(), until e7470ee89f0 (fs: buffer: do not use unnecessary atomic operations when discarding buffers). This implies that there were never strong ordering requirements outside of being serialized by the buffer lock. As such relax the ordering for archs that can benefit. Further, the implied ordering in buffer_unlock() makes current cmpxchg implied barrier redundant due to release semantics. And while in theory the unlock could be part of the bulk clearing, it is best to leave it explicit, but without the double barriers. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://lore.kernel.org/20250515173925.147823-5-dave@stgolabs.net Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21fs/buffer: remove superfluous statementsDavidlohr Bueso
Get rid of those unnecessary return statements. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://lore.kernel.org/20250515173925.147823-4-dave@stgolabs.net Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21fs/buffer: avoid redundant lookup in getblk slowpathDavidlohr Bueso
__getblk_slow() already implies failing a first lookup as the fastpath, so try to create the buffers immediately and avoid the redundant lookup. This saves 5-10% of the total cost/latency of the slowpath. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://lore.kernel.org/20250515173925.147823-3-dave@stgolabs.net Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21fs/buffer: use sleeping lookup in __getblk_slowpath()Davidlohr Bueso
Just as with the fast path, call the lookup variant depending on the gfp flags. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://lore.kernel.org/20250515173925.147823-2-dave@stgolabs.net Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-20Merge tag 'for-linus-6.15-ofs2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs fix from Mike Marshall: "Fix for orangefs page writeout counting" * tag 'for-linus-6.15-ofs2' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: adjust counting code to recover from 665575cf
2025-05-20orangefs: adjust counting code to recover from 665575cfMike Marshall
A late commit to 6.14-rc7! broke orangefs. 665575cf seems like a good change, but maybe should have been introduced during the merge window. This patch adjusts the counting code associated with writing out pages so that orangefs works in a 665575cf world. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2025-05-19ksmbd: fix stream write failureNamjae Jeon
If there is no stream data in file, v_len is zero. So, If position(*pos) is zero, stream write will fail due to stream write position validation check. This patch reorganize stream write position validation. Fixes: 0ca6df4f40cf ("ksmbd: prevent out-of-bounds stream writes by validating *pos") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-19smb: client: Reset all search buffer pointers when releasing bufferWang Zhaolong
Multiple pointers in struct cifs_search_info (ntwrk_buf_start, srch_entries_start, and last_entry) point to the same allocated buffer. However, when freeing this buffer, only ntwrk_buf_start was set to NULL, while the other pointers remained pointing to freed memory. This is defensive programming to prevent potential issues with stale pointers. While the active UAF vulnerability is fixed by the previous patch, this change ensures consistent pointer state and more robust error handling. Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com> Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-19bcachefs: mkwrite() now only dirties one pageKent Overstreet
Don't dirty the whole folio - fixes write amplification with applications doing mmaped writes. https://www.reddit.com/r/bcachefs/comments/1klzcg1/incredible_amounts_of_write_amplification_when/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-18bcachefs: fix extent_has_stripe_ptr()Kent Overstreet
This wasn't checking indirect extents. Fixes: https://github.com/koverstreet/bcachefs/issues/887 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-18smb: client: Fix use-after-free in cifs_fill_direntWang Zhaolong
There is a race condition in the readdir concurrency process, which may access the rsp buffer after it has been released, triggering the following KASAN warning. ================================================================== BUG: KASAN: slab-use-after-free in cifs_fill_dirent+0xb03/0xb60 [cifs] Read of size 4 at addr ffff8880099b819c by task a.out/342975 CPU: 2 UID: 0 PID: 342975 Comm: a.out Not tainted 6.15.0-rc6+ #240 PREEMPT(full) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x53/0x70 print_report+0xce/0x640 kasan_report+0xb8/0xf0 cifs_fill_dirent+0xb03/0xb60 [cifs] cifs_readdir+0x12cb/0x3190 [cifs] iterate_dir+0x1a1/0x520 __x64_sys_getdents+0x134/0x220 do_syscall_64+0x4b/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f996f64b9f9 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0d f7 c3 0c 00 f7 d8 64 89 8 RSP: 002b:00007f996f53de78 EFLAGS: 00000207 ORIG_RAX: 000000000000004e RAX: ffffffffffffffda RBX: 00007f996f53ecdc RCX: 00007f996f64b9f9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007f996f53dea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000207 R12: ffffffffffffff88 R13: 0000000000000000 R14: 00007ffc8cd9a500 R15: 00007f996f51e000 </TASK> Allocated by task 408: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 __kasan_slab_alloc+0x6e/0x70 kmem_cache_alloc_noprof+0x117/0x3d0 mempool_alloc_noprof+0xf2/0x2c0 cifs_buf_get+0x36/0x80 [cifs] allocate_buffers+0x1d2/0x330 [cifs] cifs_demultiplex_thread+0x22b/0x2690 [cifs] kthread+0x394/0x720 ret_from_fork+0x34/0x70 ret_from_fork_asm+0x1a/0x30 Freed by task 342979: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x37/0x50 kmem_cache_free+0x2b8/0x500 cifs_buf_release+0x3c/0x70 [cifs] cifs_readdir+0x1c97/0x3190 [cifs] iterate_dir+0x1a1/0x520 __x64_sys_getdents64+0x134/0x220 do_syscall_64+0x4b/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e The buggy address belongs to the object at ffff8880099b8000 which belongs to the cache cifs_request of size 16588 The buggy address is located 412 bytes inside of freed 16588-byte region [ffff8880099b8000, ffff8880099bc0cc) The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99b8 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 anon flags: 0x80000000000040(head|node=0|zone=1) page_type: f5(slab) raw: 0080000000000040 ffff888001e03400 0000000000000000 dead000000000001 raw: 0000000000000000 0000000000010001 00000000f5000000 0000000000000000 head: 0080000000000040 ffff888001e03400 0000000000000000 dead000000000001 head: 0000000000000000 0000000000010001 00000000f5000000 0000000000000000 head: 0080000000000003 ffffea0000266e01 00000000ffffffff 00000000ffffffff head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880099b8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880099b8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8880099b8180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880099b8200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880099b8280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== POC is available in the link [1]. The problem triggering process is as follows: Process 1 Process 2 ----------------------------------------------------------------- cifs_readdir /* file->private_data == NULL */ initiate_cifs_search cifsFile = kzalloc(sizeof(struct cifsFileInfo), GFP_KERNEL); smb2_query_dir_first ->query_dir_first() SMB2_query_directory SMB2_query_directory_init cifs_send_recv smb2_parse_query_directory srch_inf->ntwrk_buf_start = (char *)rsp; srch_inf->srch_entries_start = (char *)rsp + ... srch_inf->last_entry = (char *)rsp + ... srch_inf->smallBuf = true; find_cifs_entry /* if (cfile->srch_inf.ntwrk_buf_start) */ cifs_small_buf_release(cfile->srch_inf // free cifs_readdir ->iterate_shared() /* file->private_data != NULL */ find_cifs_entry /* in while (...) loop */ smb2_query_dir_next ->query_dir_next() SMB2_query_directory SMB2_query_directory_init cifs_send_recv compound_send_recv smb_send_rqst __smb_send_rqst rc = -ERESTARTSYS; /* if (fatal_signal_pending()) */ goto out; return rc /* if (cfile->srch_inf.last_entry) */ cifs_save_resume_key() cifs_fill_dirent // UAF /* if (rc) */ return -ENOENT; Fix this by ensuring the return code is checked before using pointers from the srch_inf. Link: https://bugzilla.kernel.org/show_bug.cgi?id=220131 [1] Fixes: a364bc0b37f1 ("[CIFS] fix saving of resume key before CIFSFindNext") Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-17bcachefs: Fix bch2_btree_path_traverse_cached() when paths reallocedKent Overstreet
btree_key_cache_fill() will allocate and traverse another path (for the underlying btree), so we can't hold pointers to paths across a call - we have to pass indices. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-16Merge tag '6.15-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - Fix memory leak in mkdir error path - Fix max rsize miscalculation after channel reconnect * tag '6.15-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix zero rsize error messages smb: client: fix memory leak during error handling for POSIX mkdir
2025-05-16Merge tag 'nfs-for-6.15-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - NFS: Fix a couple of missed handlers for the ENETDOWN and ENETUNREACH transport errors - NFS: Handle Oopsable failure of nfs_get_lock_context in the unlock path - NFSv4: Fix a race in nfs_local_open_fh() - NFSv4/pNFS: Fix a couple of layout segment leaks in layoutreturn - NFSv4/pNFS Avoid sharing pNFS DS connections between net namespaces since IP addresses are not guaranteed to refer to the same nodes - NFS: Don't flush file data while holding multiple directory locks in nfs_rename() * tag 'nfs-for-6.15-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Avoid flushing data while holding directory locks in nfs_rename() NFS/pnfs: Fix the error path in pnfs_layoutreturn_retry_later_locked() NFSv4/pnfs: Reset the layout state after a layoutreturn NFS/localio: Fix a race in nfs_local_open_fh() nfs: nfs3acl: drop useless assignment in nfs3_get_acl() nfs: direct: drop useless initializer in nfs_direct_write_completion() nfs: move the nfs4_data_server_cache into struct nfs_net nfs: don't share pNFS DS connections between net namespaces nfs: handle failure of nfs_get_lock_context in unlock path pNFS/flexfiles: Record the RPC errors in the I/O tracepoints NFSv4/pnfs: Layoutreturn on close must handle fatal networking errors NFSv4: Handle fatal ENETDOWN and ENETUNREACH errors
2025-05-16NFS: Avoid flushing data while holding directory locks in nfs_rename()Trond Myklebust
The Linux client assumes that all filehandles are non-volatile for renames within the same directory (otherwise sillyrename cannot work). However, the existence of the Linux 'subtree_check' export option has meant that nfs_rename() has always assumed it needs to flush writes before attempting to rename. Since NFSv4 does allow the client to query whether or not the server exhibits this behaviour, and since knfsd does actually set the appropriate flag when 'subtree_check' is enabled on an export, it should be OK to optimise away the write flushing behaviour in the cases where it is clearly not needed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2025-05-16NFS/pnfs: Fix the error path in pnfs_layoutreturn_retry_later_locked()Trond Myklebust
If there isn't a valid layout, or the layout stateid has changed, the cleanup after a layout return should clear out the old data. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-05-16NFSv4/pnfs: Reset the layout state after a layoutreturnTrond Myklebust
If there are still layout segments in the layout plh_return_lsegs list after a layout return, we should be resetting the state to ensure they eventually get returned as well. Fixes: 68f744797edd ("pNFS: Do not free layout segments that are marked for return") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-05-16Merge tag 'xfs-fixes-6.15-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Carlos Maiolino: "This includes a bug fix for a possible data corruption vector on the zoned allocator garbage collector" * tag 'xfs-fixes-6.15-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: Fix comment on xfs_trans_ail_update_bulk() xfs: Fix a comment on xfs_ail_delete xfs: Fail remount with noattr2 on a v5 with v4 enabled xfs: fix zoned GC data corruption due to wrong bv_offset xfs: free up mp->m_free[0].count in error case
2025-05-15Merge tag 'bcachefs-2025-05-15' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "The main user reported ones are: - Fix a btree iterator locking inconsistency that's been causing us to go emergency read-only in evacuate: "Fix broken btree_path lock invariants in next_node()" - Minor btree node cache reclaim tweak that should help with OOMs: don't set btree nodes as accessed on fill - Fix a bch2_bkey_clear_rebalance() issue that was causing rebalance to do needless work" * tag 'bcachefs-2025-05-15' of git://evilpiepirate.org/bcachefs: bcachefs: fix wrong arg to fsck_err() bcachefs: Fix missing commit in backpointer to missing target bcachefs: Fix accidental O(n^2) in fiemap bcachefs: Fix set_should_be_locked() call in peek_slot() bcachefs: Fix self deadlock bcachefs: Don't set btree nodes as accessed on fill bcachefs: Fix livelock in journal_entry_open() bcachefs: Fix broken btree_path lock invariants in next_node() bcachefs: Don't strip rebalance_opts from indirect extents
2025-05-14Merge tag 'for-6.15-rc6-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix potential endless loop when discarding a block group when disabling discard - reinstate message when setting a large value of mount option 'commit' - fix a folio leak when async extent submission fails * tag 'for-6.15-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: add back warning for mount option commit values exceeding 300 btrfs: fix folio leak in submit_one_async_extent() btrfs: fix discard worker infinite loop after disabling discard
2025-05-14smb: client: fix zero rsize error messagesPaulo Alcantara
cifs_prepare_read() might be called with a disconnected channel, where TCP_Server_Info::max_read is set to zero due to reconnect, so calling ->negotiate_rize() will set @rsize to default min IO size (64KiB) and then logging CIFS: VFS: SMB: Zero rsize calculated, using minimum value 65536 If the reconnect happens in cifsd thread, cifs_renegotiate_iosize() will end up being called and then @rsize set to the expected value. Since we can't rely on the value of @server->max_read by the time we call cifs_prepare_read(), try to ->negotiate_rize() only if @cifs_sb->ctx->rsize is zero. Reported-by: Steve French <stfrench@microsoft.com> Fixes: c59f7c9661b9 ("smb: client: ensure aligned IO sizes") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-14smb: client: fix memory leak during error handling for POSIX mkdirJethro Donaldson
The response buffer for the CREATE request handled by smb311_posix_mkdir() is leaked on the error path (goto err_free_rsp_buf) because the structure pointer *rsp passed to free_rsp_buf() is not assigned until *after* the error condition is checked. As *rsp is initialised to NULL, free_rsp_buf() becomes a no-op and the leak is instead reported by __kmem_cache_shutdown() upon subsequent rmmod of cifs.ko if (and only if) the error path has been hit. Pass rsp_iov.iov_base to free_rsp_buf() instead, similar to the code in other functions in smb2pdu.c for which *rsp is assigned late. Cc: stable@vger.kernel.org Signed-off-by: Jethro Donaldson <devel@jro.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-14bcachefs: fix wrong arg to fsck_err()Kent Overstreet
fsck_err() needs the btree transaction passed to it if there is one - so that it can unlock/relock around prompting userspace for fixing the error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-14bcachefs: Fix missing commit in backpointer to missing targetKent Overstreet
Fsck wants to do transaction commits from an outer context; it may have other repair to do (i.e. duplicate backpointers). But when calling backpointer_not_found() from runtime code, i.e. runtime self healing, we should be doing the commit - the outer context expects to just be doing lookups. This fixes bugs where we get stuck spinning, reported as "RCU lock hold time warnings. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-14bcachefs: Fix accidental O(n^2) in fiemapKent Overstreet
Since bch2_seek_pagecache_data() searches for dirty data, we only want to call it for holes in the extents btree - otherwise we have an accidental O(n^2), as we repeatedly search the same range. Reported-by: Marcin Mirosław <marcin@mejor.pl> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-14bcachefs: Fix set_should_be_locked() call in peek_slot()Kent Overstreet
set_should_be_locked() needs to be called before peek_key_cache(), which traverses other paths and may do a trans unlock/relock. This fixes an assertion pop in path_peek_slot(), when the path we're using is unexpectedly not uptodate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-14bcachefs: Fix self deadlockAlan Huang
Before invoking bch2_accounting_mem_mod_locked in bch2_gc_accounting_done, we already write locked mark_lock, in bch2_accounting_mem_insert, we lock mark_lock again. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-14bcachefs: Don't set btree nodes as accessed on fillKent Overstreet
Prevent jobs that do lots of scanning (i.e. evacuatee, scrub) from causing OOMs. The shrinker code seems to be having issues when it doesn't do any freeing because it's just flipping off the acccessed bit - and the accessed bit shouldn't be set on first use anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-14bcachefs: Fix livelock in journal_entry_open()Kent Overstreet
When the journal is low on space, we might do discards from journal_res_get() -> journal_entry_open(). Make sure we set j->can_discard correctly, so that if we're low on space but not because discards aren't keeping up we don't livelock. Fixes: 8e4d28036c29 ("bcachefs: Don't aggressively discard the journal") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-14bcachefs: Fix broken btree_path lock invariants in next_node()Kent Overstreet
This fixes btree locking assert pops users were seeing during evacuate: https://github.com/koverstreet/bcachefs/issues/878 May 09 22:45:02 sharon kernel: bcachefs (68116e25-fa2d-4c6f-86c7-e8b431d792ae): bch2_btree_insert_node(): node not locked at level 1 May 09 22:45:02 sharon kernel: bch2_btree_node_rewrite [bcachefs]: watermark=btree no_check_rw alloc l=0-1 mode=none nodes_written=0 cl.remaining=2 journal_seq=0 May 09 22:45:02 sharon kernel: path: idx 1 ref 1:0 S B btree=alloc level=0 pos 0:3699637:0 0:3698012:1-0:3699637:0 bch2_move_btree.isra.0+0x1db/0x490 [bcachefs] uptodate 0 locks_want 2 May 09 22:45:02 sharon kernel: l=0 locks intent seq 4 node ffff8bd700c93600 May 09 22:45:02 sharon kernel: l=1 locks unlocked seq 1712 node ffff8bd6fd5e7a00 May 09 22:45:02 sharon kernel: l=2 locks unlocked seq 2295 node ffff8bd6cc725400 May 09 22:45:02 sharon kernel: l=3 locks unlocked seq 0 node 0000000000000000 Evacuate walks btree nodes with bch2_btree_iter_next_node() and rewrites them, bch2_btree_update_start() upgrades the path to take intent locks as far as it needs to. But next_node() does low level unlock/relock calls on individual nodes, and didn't handle the case where a path is supposed to be holding multiple intent locks. If a path has locks_want > 1, it needs to be either holding locks on all the btree nodes (at each level) requested, or none of them. Fix this with a bch2_btree_path_downgrade(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-14bcachefs: Don't strip rebalance_opts from indirect extentsKent Overstreet
Fix bch2_bkey_clear_needs_rebalance(): indirect extents are never supposed to have bch_extent_rebalance stripped off, because that's how we get the IO path options when we don't have the original inode it belonged to. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-14Merge tag 'execve-v6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve fix from Kees Cook: "This fixes a corner case for ASLR-disabled static-PIE brk collision with vdso allocations: - binfmt_elf: Move brk for static PIE even if ASLR disabled" * tag 'execve-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: binfmt_elf: Move brk for static PIE even if ASLR disabled
2025-05-14xfs: Fix comment on xfs_trans_ail_update_bulk()Carlos Maiolino
This function doesn't take the AIL lock, but should be called with AIL lock held. Also (hopefuly) simplify the comment. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-05-14xfs: Fix a comment on xfs_ail_deleteCarlos Maiolino
It doesn't return anything. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-05-14xfs: Fail remount with noattr2 on a v5 with v4 enabledNirjhar Roy (IBM)
Bug: When we compile the kernel with CONFIG_XFS_SUPPORT_V4=y, remount with "-o remount,noattr2" on a v5 XFS does not fail explicitly. Reproduction: mkfs.xfs -f /dev/loop0 mount /dev/loop0 /mnt/scratch mount -o remount,noattr2 /dev/loop0 /mnt/scratch However, with CONFIG_XFS_SUPPORT_V4=n, the remount correctly fails explicitly. This is because the way the following 2 functions are defined: static inline bool xfs_has_attr2 (struct xfs_mount *mp) { return !IS_ENABLED(CONFIG_XFS_SUPPORT_V4) || (mp->m_features & XFS_FEAT_ATTR2); } static inline bool xfs_has_noattr2 (const struct xfs_mount *mp) { return mp->m_features & XFS_FEAT_NOATTR2; } xfs_has_attr2() returns true when CONFIG_XFS_SUPPORT_V4=n and hence, the following if condition in xfs_fs_validate_params() succeeds and returns -EINVAL: /* * We have not read the superblock at this point, so only the attr2 * mount option can set the attr2 feature by this stage. */ if (xfs_has_attr2(mp) && xfs_has_noattr2(mp)) { xfs_warn(mp, "attr2 and noattr2 cannot both be specified."); return -EINVAL; } With CONFIG_XFS_SUPPORT_V4=y, xfs_has_attr2() always return false and hence no error is returned. Fix: Check if the existing mount has crc enabled(i.e, of type v5 and has attr2 enabled) and the remount has noattr2, if yes, return -EINVAL. I have tested xfs/{189,539} in fstests with v4 and v5 XFS with both CONFIG_XFS_SUPPORT_V4=y/n and they both behave as expected. This patch also fixes remount from noattr2 -> attr2 (on a v4 xfs). Related discussion in [1] [1] https://lore.kernel.org/all/Z65o6nWxT00MaUrW@dread.disaster.area/ Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-05-14xfs: fix zoned GC data corruption due to wrong bv_offsetChristoph Hellwig
xfs_zone_gc_write_chunk writes out the data buffer read in earlier using the same bio, and currenly looks at bv_offset for the offset into the scratch folio for that. But commit 26064d3e2b4d ("block: fix adding folio to bio") changed how bv_page and bv_offset are calculated for adding larger folios, breaking this fragile logic. Switch to extracting the full physical address from the old bio_vec, and calculate the offset into the folio from that instead. This fixes data corruption during garbage collection with heavy rockdsb workloads. Thanks to Hans for tracking down the culprit commit during long bisection sessions. Fixes: 26064d3e2b4d ("block: fix adding folio to bio") Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection") Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <Hans.Holmberg@wdc.com> Tested-by: Hans Holmberg <Hans.Holmberg@wdc.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-05-14xfs: free up mp->m_free[0].count in error caseWengang Wang
In xfs_init_percpu_counters(), memory for mp->m_free[0].count wasn't freed in error case. Free it up in this patch. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Fixes: 712bae96631852 ("xfs: generalize the freespace and reserved blocks handling") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-05-12btrfs: add back warning for mount option commit values exceeding 300Kyoji Ogasawara
The Btrfs documentation states that if the commit value is greater than 300 a warning should be issued. The warning was accidentally lost in the new mount API update. Fixes: 6941823cc878 ("btrfs: remove old mount API code") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-12btrfs: fix folio leak in submit_one_async_extent()Boris Burkov
If btrfs_reserve_extent() fails while submitting an async_extent for a compressed write, then we fail to call free_async_extent_pages() on the async_extent and leak its folios. A likely cause for such a failure would be btrfs_reserve_extent() failing to find a large enough contiguous free extent for the compressed extent. I was able to reproduce this by: 1. mount with compress-force=zstd:3 2. fallocating most of a filesystem to a big file 3. fragmenting the remaining free space 4. trying to copy in a file which zstd would generate large compressed extents for (vmlinux worked well for this) Step 4. hits the memory leak and can be repeated ad nauseam to eventually exhaust the system memory. Fix this by detecting the case where we fallback to uncompressed submission for a compressed async_extent and ensuring that we call free_async_extent_pages(). Fixes: 131a821a243f ("btrfs: fallback if compressed IO fails for ENOSPC") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Co-developed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-12btrfs: fix discard worker infinite loop after disabling discardFilipe Manana
If the discard worker is running and there's currently only one block group, that block group is a data block group, it's in the unused block groups discard list and is being used (it got an extent allocated from it after becoming unused), the worker can end up in an infinite loop if a transaction abort happens or the async discard is disabled (during remount or unmount for example). This happens like this: 1) Task A, the discard worker, is at peek_discard_list() and find_next_block_group() returns block group X; 2) Block group X is in the unused block groups discard list (its discard index is BTRFS_DISCARD_INDEX_UNUSED) since at some point in the past it become an unused block group and was added to that list, but then later it got an extent allocated from it, so its ->used counter is not zero anymore; 3) The current transaction is aborted by task B and we end up at __btrfs_handle_fs_error() in the transaction abort path, where we call btrfs_discard_stop(), which clears BTRFS_FS_DISCARD_RUNNING from fs_info, and then at __btrfs_handle_fs_error() we set the fs to RO mode (setting SB_RDONLY in the super block's s_flags field); 4) Task A calls __add_to_discard_list() with the goal of moving the block group from the unused block groups discard list into another discard list, but at __add_to_discard_list() we end up doing nothing because btrfs_run_discard_work() returns false, since the super block has SB_RDONLY set in its flags and BTRFS_FS_DISCARD_RUNNING is not set anymore in fs_info->flags. So block group X remains in the unused block groups discard list; 5) Task A then does a goto into the 'again' label, calls find_next_block_group() again we gets block group X again. Then it repeats the previous steps over and over since there are not other block groups in the discard lists and block group X is never moved out of the unused block groups discard list since btrfs_run_discard_work() keeps returning false and therefore __add_to_discard_list() doesn't move block group X out of that discard list. When this happens we can get a soft lockup report like this: [71.957] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u4:3:97] [71.957] Modules linked in: xfs af_packet rfkill (...) [71.957] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W 6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa [71.957] Tainted: [W]=WARN [71.957] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [71.957] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] [71.957] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs] [71.957] Code: c1 01 48 83 (...) [71.957] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246 [71.957] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000 [71.957] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad [71.957] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080 [71.957] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860 [71.957] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000 [71.957] FS: 0000000000000000(0000) GS:ffff89707bc00000(0000) knlGS:0000000000000000 [71.957] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [71.957] CR2: 00005605bcc8d2f0 CR3: 000000010376a001 CR4: 0000000000770ef0 [71.957] PKRU: 55555554 [71.957] Call Trace: [71.957] <TASK> [71.957] process_one_work+0x17e/0x330 [71.957] worker_thread+0x2ce/0x3f0 [71.957] ? __pfx_worker_thread+0x10/0x10 [71.957] kthread+0xef/0x220 [71.957] ? __pfx_kthread+0x10/0x10 [71.957] ret_from_fork+0x34/0x50 [71.957] ? __pfx_kthread+0x10/0x10 [71.957] ret_from_fork_asm+0x1a/0x30 [71.957] </TASK> [71.957] Kernel panic - not syncing: softlockup: hung tasks [71.987] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W L 6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa [71.989] Tainted: [W]=WARN, [L]=SOFTLOCKUP [71.989] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [71.991] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] [71.992] Call Trace: [71.993] <IRQ> [71.994] dump_stack_lvl+0x5a/0x80 [71.994] panic+0x10b/0x2da [71.995] watchdog_timer_fn.cold+0x9a/0xa1 [71.996] ? __pfx_watchdog_timer_fn+0x10/0x10 [71.997] __hrtimer_run_queues+0x132/0x2a0 [71.997] hrtimer_interrupt+0xff/0x230 [71.998] __sysvec_apic_timer_interrupt+0x55/0x100 [71.999] sysvec_apic_timer_interrupt+0x6c/0x90 [72.000] </IRQ> [72.000] <TASK> [72.001] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [72.002] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs] [72.002] Code: c1 01 48 83 (...) [72.005] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246 [72.006] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000 [72.006] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad [72.007] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080 [72.008] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860 [72.009] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000 [72.010] ? btrfs_discard_workfn+0x51/0x400 [btrfs 23b01089228eb964071fb7ca156eee8cd3bf996f] [72.011] process_one_work+0x17e/0x330 [72.012] worker_thread+0x2ce/0x3f0 [72.013] ? __pfx_worker_thread+0x10/0x10 [72.014] kthread+0xef/0x220 [72.014] ? __pfx_kthread+0x10/0x10 [72.015] ret_from_fork+0x34/0x50 [72.015] ? __pfx_kthread+0x10/0x10 [72.016] ret_from_fork_asm+0x1a/0x30 [72.017] </TASK> [72.017] Kernel Offset: 0x15000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [72.019] Rebooting in 90 seconds.. So fix this by making sure we move a block group out of the unused block groups discard list when calling __add_to_discard_list(). Fixes: 2bee7eb8bb81 ("btrfs: discard one region at a time in async discard") Link: https://bugzilla.suse.com/show_bug.cgi?id=1242012 CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Daniel Vacek <neelx@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-12Merge tag 'udf_for_v6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF fix from Jan Kara: "Fix a bug in UDF inode eviction leading to spewing pointless error messages" * tag 'udf_for_v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Make sure i_lenExtents is uptodate on inode eviction