| Age | Commit message (Collapse) | Author |
|
Pull smb server fixes from Steve French:
- Fix for rename regression due to the recent VFS lookup changes
- Fix write failure
- locking fix for oplock handling
* tag 'v6.15-rc8-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: use list_first_entry_or_null for opinfo_get_list()
ksmbd: fix rename failure
ksmbd: fix stream write failure
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains a small set of fixes for the blocking buffer lookup
conversion done earlier this cycle.
It adds a missing conversion in the getblk slowpath and a few minor
optimizations and cleanups"
* tag 'vfs-6.15-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs/buffer: optimize discard_buffer()
fs/buffer: remove superfluous statements
fs/buffer: avoid redundant lookup in getblk slowpath
fs/buffer: use sleeping lookup in __getblk_slowpath()
|
|
Pull bcachefs fixes from Kent Overstreet:
"Small stuff, main ones users will be interested in:
- Couple more casefolding fixes; we can now detect and repair
casefolded dirents in non-casefolded dir and vice versa
- Fix for massive write inflation with mmapped io, which hit certain
databases"
* tag 'bcachefs-2025-05-22' of git://evilpiepirate.org/bcachefs:
bcachefs: Check for casefolded dirents in non casefolded dirs
bcachefs: Fix bch2_dirent_create_snapshot() for casefolding
bcachefs: Fix casefold opt via xattr interface
bcachefs: mkwrite() now only dirties one page
bcachefs: fix extent_has_stripe_ptr()
bcachefs: Fix bch2_btree_path_traverse_cached() when paths realloced
|
|
Pull smb client fixes from Steve French:
- Two fixes for use after free in readdir code paths
* tag '6.15-rc8-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Reset all search buffer pointers when releasing buffer
smb: client: Fix use-after-free in cifs_fill_dirent
|
|
The list_first_entry() macro never returns NULL. If the list is
empty then it returns an invalid pointer. Use list_first_entry_or_null()
to check if the list is empty.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202505080231.7OXwq4Te-lkp@intel.com/
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
I found that rename fails after cifs mount due to update of
lookup_one_qstr_excl().
mv a/c b/
mv: cannot move 'a/c' to 'b/c': No such file or directory
In order to rename to a new name regardless of whether the dentry is
negative, we need to get the dentry through lookup_one_qstr_excl().
So It will not return error if the name doesn't exist.
Fixes: 204a575e91f3 ("VFS: add common error checks to lookup_one_qstr_excl()")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Check for mismatches between casefold dirents and casefold directories.
A mismatch will cause lookups to fail, as we'll be doing the lookup with
the casefolded name, which won't match the non-casefolded dirent, and
vice versa.
Reported-by: Christopher Snowhill <chris@kode54.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_dirent_create_snapshot(), used in fsck, neglected to create a
casefolded dirent.
Just move this into dirent_create_key().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Changing the casefold option requires extra checks/work - factor out a
helper from bch2_fileattr_set() for the xattr code to use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
While invalidating, the clearing of the bits in discard_buffer()
is done in one fully ordered CAS operation. In the past this was
done via individual clear_bit(), until e7470ee89f0 (fs: buffer:
do not use unnecessary atomic operations when discarding buffers).
This implies that there were never strong ordering requirements
outside of being serialized by the buffer lock.
As such relax the ordering for archs that can benefit. Further,
the implied ordering in buffer_unlock() makes current cmpxchg
implied barrier redundant due to release semantics. And while in
theory the unlock could be part of the bulk clearing, it is
best to leave it explicit, but without the double barriers.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/20250515173925.147823-5-dave@stgolabs.net
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Get rid of those unnecessary return statements.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/20250515173925.147823-4-dave@stgolabs.net
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
__getblk_slow() already implies failing a first lookup
as the fastpath, so try to create the buffers immediately
and avoid the redundant lookup. This saves 5-10% of the
total cost/latency of the slowpath.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/20250515173925.147823-3-dave@stgolabs.net
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Just as with the fast path, call the lookup variant depending
on the gfp flags.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/20250515173925.147823-2-dave@stgolabs.net
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs fix from Mike Marshall:
"Fix for orangefs page writeout counting"
* tag 'for-linus-6.15-ofs2' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: adjust counting code to recover from 665575cf
|
|
A late commit to 6.14-rc7! broke orangefs. 665575cf seems like a
good change, but maybe should have been introduced during the merge
window. This patch adjusts the counting code associated with
writing out pages so that orangefs works in a 665575cf world.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
|
|
If there is no stream data in file, v_len is zero.
So, If position(*pos) is zero, stream write will fail
due to stream write position validation check.
This patch reorganize stream write position validation.
Fixes: 0ca6df4f40cf ("ksmbd: prevent out-of-bounds stream writes by validating *pos")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Multiple pointers in struct cifs_search_info (ntwrk_buf_start,
srch_entries_start, and last_entry) point to the same allocated buffer.
However, when freeing this buffer, only ntwrk_buf_start was set to NULL,
while the other pointers remained pointing to freed memory.
This is defensive programming to prevent potential issues with stale
pointers. While the active UAF vulnerability is fixed by the previous
patch, this change ensures consistent pointer state and more robust error
handling.
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Don't dirty the whole folio - fixes write amplification with
applications doing mmaped writes.
https://www.reddit.com/r/bcachefs/comments/1klzcg1/incredible_amounts_of_write_amplification_when/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This wasn't checking indirect extents.
Fixes: https://github.com/koverstreet/bcachefs/issues/887
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There is a race condition in the readdir concurrency process, which may
access the rsp buffer after it has been released, triggering the
following KASAN warning.
==================================================================
BUG: KASAN: slab-use-after-free in cifs_fill_dirent+0xb03/0xb60 [cifs]
Read of size 4 at addr ffff8880099b819c by task a.out/342975
CPU: 2 UID: 0 PID: 342975 Comm: a.out Not tainted 6.15.0-rc6+ #240 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x53/0x70
print_report+0xce/0x640
kasan_report+0xb8/0xf0
cifs_fill_dirent+0xb03/0xb60 [cifs]
cifs_readdir+0x12cb/0x3190 [cifs]
iterate_dir+0x1a1/0x520
__x64_sys_getdents+0x134/0x220
do_syscall_64+0x4b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f996f64b9f9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 0d f7 c3 0c 00 f7 d8 64 89 8
RSP: 002b:00007f996f53de78 EFLAGS: 00000207 ORIG_RAX: 000000000000004e
RAX: ffffffffffffffda RBX: 00007f996f53ecdc RCX: 00007f996f64b9f9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007f996f53dea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000207 R12: ffffffffffffff88
R13: 0000000000000000 R14: 00007ffc8cd9a500 R15: 00007f996f51e000
</TASK>
Allocated by task 408:
kasan_save_stack+0x20/0x40
kasan_save_track+0x14/0x30
__kasan_slab_alloc+0x6e/0x70
kmem_cache_alloc_noprof+0x117/0x3d0
mempool_alloc_noprof+0xf2/0x2c0
cifs_buf_get+0x36/0x80 [cifs]
allocate_buffers+0x1d2/0x330 [cifs]
cifs_demultiplex_thread+0x22b/0x2690 [cifs]
kthread+0x394/0x720
ret_from_fork+0x34/0x70
ret_from_fork_asm+0x1a/0x30
Freed by task 342979:
kasan_save_stack+0x20/0x40
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x37/0x50
kmem_cache_free+0x2b8/0x500
cifs_buf_release+0x3c/0x70 [cifs]
cifs_readdir+0x1c97/0x3190 [cifs]
iterate_dir+0x1a1/0x520
__x64_sys_getdents64+0x134/0x220
do_syscall_64+0x4b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The buggy address belongs to the object at ffff8880099b8000
which belongs to the cache cifs_request of size 16588
The buggy address is located 412 bytes inside of
freed 16588-byte region [ffff8880099b8000, ffff8880099bc0cc)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99b8
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0x80000000000040(head|node=0|zone=1)
page_type: f5(slab)
raw: 0080000000000040 ffff888001e03400 0000000000000000 dead000000000001
raw: 0000000000000000 0000000000010001 00000000f5000000 0000000000000000
head: 0080000000000040 ffff888001e03400 0000000000000000 dead000000000001
head: 0000000000000000 0000000000010001 00000000f5000000 0000000000000000
head: 0080000000000003 ffffea0000266e01 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880099b8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880099b8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8880099b8180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880099b8200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880099b8280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
POC is available in the link [1].
The problem triggering process is as follows:
Process 1 Process 2
-----------------------------------------------------------------
cifs_readdir
/* file->private_data == NULL */
initiate_cifs_search
cifsFile = kzalloc(sizeof(struct cifsFileInfo), GFP_KERNEL);
smb2_query_dir_first ->query_dir_first()
SMB2_query_directory
SMB2_query_directory_init
cifs_send_recv
smb2_parse_query_directory
srch_inf->ntwrk_buf_start = (char *)rsp;
srch_inf->srch_entries_start = (char *)rsp + ...
srch_inf->last_entry = (char *)rsp + ...
srch_inf->smallBuf = true;
find_cifs_entry
/* if (cfile->srch_inf.ntwrk_buf_start) */
cifs_small_buf_release(cfile->srch_inf // free
cifs_readdir ->iterate_shared()
/* file->private_data != NULL */
find_cifs_entry
/* in while (...) loop */
smb2_query_dir_next ->query_dir_next()
SMB2_query_directory
SMB2_query_directory_init
cifs_send_recv
compound_send_recv
smb_send_rqst
__smb_send_rqst
rc = -ERESTARTSYS;
/* if (fatal_signal_pending()) */
goto out;
return rc
/* if (cfile->srch_inf.last_entry) */
cifs_save_resume_key()
cifs_fill_dirent // UAF
/* if (rc) */
return -ENOENT;
Fix this by ensuring the return code is checked before using pointers
from the srch_inf.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220131 [1]
Fixes: a364bc0b37f1 ("[CIFS] fix saving of resume key before CIFSFindNext")
Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
btree_key_cache_fill() will allocate and traverse another path (for the
underlying btree), so we can't hold pointers to paths across a call - we
have to pass indices.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Pull smb client fixes from Steve French:
- Fix memory leak in mkdir error path
- Fix max rsize miscalculation after channel reconnect
* tag '6.15-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix zero rsize error messages
smb: client: fix memory leak during error handling for POSIX mkdir
|
|
Pull NFS client bugfixes from Trond Myklebust:
- NFS: Fix a couple of missed handlers for the ENETDOWN and ENETUNREACH
transport errors
- NFS: Handle Oopsable failure of nfs_get_lock_context in the unlock
path
- NFSv4: Fix a race in nfs_local_open_fh()
- NFSv4/pNFS: Fix a couple of layout segment leaks in layoutreturn
- NFSv4/pNFS Avoid sharing pNFS DS connections between net namespaces
since IP addresses are not guaranteed to refer to the same nodes
- NFS: Don't flush file data while holding multiple directory locks in
nfs_rename()
* tag 'nfs-for-6.15-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Avoid flushing data while holding directory locks in nfs_rename()
NFS/pnfs: Fix the error path in pnfs_layoutreturn_retry_later_locked()
NFSv4/pnfs: Reset the layout state after a layoutreturn
NFS/localio: Fix a race in nfs_local_open_fh()
nfs: nfs3acl: drop useless assignment in nfs3_get_acl()
nfs: direct: drop useless initializer in nfs_direct_write_completion()
nfs: move the nfs4_data_server_cache into struct nfs_net
nfs: don't share pNFS DS connections between net namespaces
nfs: handle failure of nfs_get_lock_context in unlock path
pNFS/flexfiles: Record the RPC errors in the I/O tracepoints
NFSv4/pnfs: Layoutreturn on close must handle fatal networking errors
NFSv4: Handle fatal ENETDOWN and ENETUNREACH errors
|
|
The Linux client assumes that all filehandles are non-volatile for
renames within the same directory (otherwise sillyrename cannot work).
However, the existence of the Linux 'subtree_check' export option has
meant that nfs_rename() has always assumed it needs to flush writes
before attempting to rename.
Since NFSv4 does allow the client to query whether or not the server
exhibits this behaviour, and since knfsd does actually set the
appropriate flag when 'subtree_check' is enabled on an export, it
should be OK to optimise away the write flushing behaviour in the cases
where it is clearly not needed.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
If there isn't a valid layout, or the layout stateid has changed, the
cleanup after a layout return should clear out the old data.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If there are still layout segments in the layout plh_return_lsegs list
after a layout return, we should be resetting the state to ensure they
eventually get returned as well.
Fixes: 68f744797edd ("pNFS: Do not free layout segments that are marked for return")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Pull xfs fixes from Carlos Maiolino:
"This includes a bug fix for a possible data corruption vector on the
zoned allocator garbage collector"
* tag 'xfs-fixes-6.15-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Fix comment on xfs_trans_ail_update_bulk()
xfs: Fix a comment on xfs_ail_delete
xfs: Fail remount with noattr2 on a v5 with v4 enabled
xfs: fix zoned GC data corruption due to wrong bv_offset
xfs: free up mp->m_free[0].count in error case
|
|
Pull bcachefs fixes from Kent Overstreet:
"The main user reported ones are:
- Fix a btree iterator locking inconsistency that's been causing us
to go emergency read-only in evacuate: "Fix broken btree_path lock
invariants in next_node()"
- Minor btree node cache reclaim tweak that should help with OOMs:
don't set btree nodes as accessed on fill
- Fix a bch2_bkey_clear_rebalance() issue that was causing rebalance
to do needless work"
* tag 'bcachefs-2025-05-15' of git://evilpiepirate.org/bcachefs:
bcachefs: fix wrong arg to fsck_err()
bcachefs: Fix missing commit in backpointer to missing target
bcachefs: Fix accidental O(n^2) in fiemap
bcachefs: Fix set_should_be_locked() call in peek_slot()
bcachefs: Fix self deadlock
bcachefs: Don't set btree nodes as accessed on fill
bcachefs: Fix livelock in journal_entry_open()
bcachefs: Fix broken btree_path lock invariants in next_node()
bcachefs: Don't strip rebalance_opts from indirect extents
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix potential endless loop when discarding a block group when
disabling discard
- reinstate message when setting a large value of mount option 'commit'
- fix a folio leak when async extent submission fails
* tag 'for-6.15-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: add back warning for mount option commit values exceeding 300
btrfs: fix folio leak in submit_one_async_extent()
btrfs: fix discard worker infinite loop after disabling discard
|
|
cifs_prepare_read() might be called with a disconnected channel, where
TCP_Server_Info::max_read is set to zero due to reconnect, so calling
->negotiate_rize() will set @rsize to default min IO size (64KiB) and
then logging
CIFS: VFS: SMB: Zero rsize calculated, using minimum value
65536
If the reconnect happens in cifsd thread, cifs_renegotiate_iosize()
will end up being called and then @rsize set to the expected value.
Since we can't rely on the value of @server->max_read by the time we
call cifs_prepare_read(), try to ->negotiate_rize() only if
@cifs_sb->ctx->rsize is zero.
Reported-by: Steve French <stfrench@microsoft.com>
Fixes: c59f7c9661b9 ("smb: client: ensure aligned IO sizes")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The response buffer for the CREATE request handled by smb311_posix_mkdir()
is leaked on the error path (goto err_free_rsp_buf) because the structure
pointer *rsp passed to free_rsp_buf() is not assigned until *after* the
error condition is checked.
As *rsp is initialised to NULL, free_rsp_buf() becomes a no-op and the leak
is instead reported by __kmem_cache_shutdown() upon subsequent rmmod of
cifs.ko if (and only if) the error path has been hit.
Pass rsp_iov.iov_base to free_rsp_buf() instead, similar to the code in
other functions in smb2pdu.c for which *rsp is assigned late.
Cc: stable@vger.kernel.org
Signed-off-by: Jethro Donaldson <devel@jro.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
fsck_err() needs the btree transaction passed to it if there is one - so
that it can unlock/relock around prompting userspace for fixing the
error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fsck wants to do transaction commits from an outer context; it may have
other repair to do (i.e. duplicate backpointers).
But when calling backpointer_not_found() from runtime code, i.e. runtime
self healing, we should be doing the commit - the outer context expects
to just be doing lookups.
This fixes bugs where we get stuck spinning, reported as "RCU lock hold
time warnings.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Since bch2_seek_pagecache_data() searches for dirty data, we only want
to call it for holes in the extents btree - otherwise we have an
accidental O(n^2), as we repeatedly search the same range.
Reported-by: Marcin Mirosław <marcin@mejor.pl>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
set_should_be_locked() needs to be called before peek_key_cache(), which
traverses other paths and may do a trans unlock/relock.
This fixes an assertion pop in path_peek_slot(), when the path we're
using is unexpectedly not uptodate.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Before invoking bch2_accounting_mem_mod_locked in
bch2_gc_accounting_done, we already write locked mark_lock,
in bch2_accounting_mem_insert, we lock mark_lock again.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Prevent jobs that do lots of scanning (i.e. evacuatee, scrub) from
causing OOMs.
The shrinker code seems to be having issues when it doesn't do any
freeing because it's just flipping off the acccessed bit - and the
accessed bit shouldn't be set on first use anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When the journal is low on space, we might do discards from
journal_res_get() -> journal_entry_open().
Make sure we set j->can_discard correctly, so that if we're low on space
but not because discards aren't keeping up we don't livelock.
Fixes: 8e4d28036c29 ("bcachefs: Don't aggressively discard the journal")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes btree locking assert pops users were seeing during evacuate:
https://github.com/koverstreet/bcachefs/issues/878
May 09 22:45:02 sharon kernel: bcachefs (68116e25-fa2d-4c6f-86c7-e8b431d792ae): bch2_btree_insert_node(): node not locked at level 1
May 09 22:45:02 sharon kernel: bch2_btree_node_rewrite [bcachefs]: watermark=btree no_check_rw alloc l=0-1 mode=none nodes_written=0 cl.remaining=2 journal_seq=0
May 09 22:45:02 sharon kernel: path: idx 1 ref 1:0 S B btree=alloc level=0 pos 0:3699637:0 0:3698012:1-0:3699637:0 bch2_move_btree.isra.0+0x1db/0x490 [bcachefs] uptodate 0 locks_want 2
May 09 22:45:02 sharon kernel: l=0 locks intent seq 4 node ffff8bd700c93600
May 09 22:45:02 sharon kernel: l=1 locks unlocked seq 1712 node ffff8bd6fd5e7a00
May 09 22:45:02 sharon kernel: l=2 locks unlocked seq 2295 node ffff8bd6cc725400
May 09 22:45:02 sharon kernel: l=3 locks unlocked seq 0 node 0000000000000000
Evacuate walks btree nodes with bch2_btree_iter_next_node() and rewrites
them, bch2_btree_update_start() upgrades the path to take intent locks
as far as it needs to.
But next_node() does low level unlock/relock calls on individual nodes,
and didn't handle the case where a path is supposed to be holding
multiple intent locks. If a path has locks_want > 1, it needs to be
either holding locks on all the btree nodes (at each level) requested,
or none of them.
Fix this with a bch2_btree_path_downgrade().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix bch2_bkey_clear_needs_rebalance(): indirect extents are never
supposed to have bch_extent_rebalance stripped off, because that's how
we get the IO path options when we don't have the original inode it
belonged to.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fix from Kees Cook:
"This fixes a corner case for ASLR-disabled static-PIE brk collision
with vdso allocations:
- binfmt_elf: Move brk for static PIE even if ASLR disabled"
* tag 'execve-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf: Move brk for static PIE even if ASLR disabled
|
|
This function doesn't take the AIL lock, but should be called
with AIL lock held. Also (hopefuly) simplify the comment.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
It doesn't return anything.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Bug: When we compile the kernel with CONFIG_XFS_SUPPORT_V4=y,
remount with "-o remount,noattr2" on a v5 XFS does not
fail explicitly.
Reproduction:
mkfs.xfs -f /dev/loop0
mount /dev/loop0 /mnt/scratch
mount -o remount,noattr2 /dev/loop0 /mnt/scratch
However, with CONFIG_XFS_SUPPORT_V4=n, the remount
correctly fails explicitly. This is because the way the
following 2 functions are defined:
static inline bool xfs_has_attr2 (struct xfs_mount *mp)
{
return !IS_ENABLED(CONFIG_XFS_SUPPORT_V4) ||
(mp->m_features & XFS_FEAT_ATTR2);
}
static inline bool xfs_has_noattr2 (const struct xfs_mount *mp)
{
return mp->m_features & XFS_FEAT_NOATTR2;
}
xfs_has_attr2() returns true when CONFIG_XFS_SUPPORT_V4=n
and hence, the following if condition in
xfs_fs_validate_params() succeeds and returns -EINVAL:
/*
* We have not read the superblock at this point, so only the attr2
* mount option can set the attr2 feature by this stage.
*/
if (xfs_has_attr2(mp) && xfs_has_noattr2(mp)) {
xfs_warn(mp, "attr2 and noattr2 cannot both be specified.");
return -EINVAL;
}
With CONFIG_XFS_SUPPORT_V4=y, xfs_has_attr2() always return
false and hence no error is returned.
Fix: Check if the existing mount has crc enabled(i.e, of
type v5 and has attr2 enabled) and the
remount has noattr2, if yes, return -EINVAL.
I have tested xfs/{189,539} in fstests with v4
and v5 XFS with both CONFIG_XFS_SUPPORT_V4=y/n and
they both behave as expected.
This patch also fixes remount from noattr2 -> attr2 (on a v4 xfs).
Related discussion in [1]
[1] https://lore.kernel.org/all/Z65o6nWxT00MaUrW@dread.disaster.area/
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xfs_zone_gc_write_chunk writes out the data buffer read in earlier using
the same bio, and currenly looks at bv_offset for the offset into the
scratch folio for that. But commit 26064d3e2b4d ("block: fix adding
folio to bio") changed how bv_page and bv_offset are calculated for
adding larger folios, breaking this fragile logic.
Switch to extracting the full physical address from the old bio_vec,
and calculate the offset into the folio from that instead.
This fixes data corruption during garbage collection with heavy rockdsb
workloads. Thanks to Hans for tracking down the culprit commit during
long bisection sessions.
Fixes: 26064d3e2b4d ("block: fix adding folio to bio")
Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <Hans.Holmberg@wdc.com>
Tested-by: Hans Holmberg <Hans.Holmberg@wdc.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
In xfs_init_percpu_counters(), memory for mp->m_free[0].count wasn't freed
in error case. Free it up in this patch.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Fixes: 712bae96631852 ("xfs: generalize the freespace and reserved blocks handling")
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The Btrfs documentation states that if the commit value is greater than
300 a warning should be issued. The warning was accidentally lost in the
new mount API update.
Fixes: 6941823cc878 ("btrfs: remove old mount API code")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If btrfs_reserve_extent() fails while submitting an async_extent for a
compressed write, then we fail to call free_async_extent_pages() on the
async_extent and leak its folios. A likely cause for such a failure
would be btrfs_reserve_extent() failing to find a large enough
contiguous free extent for the compressed extent.
I was able to reproduce this by:
1. mount with compress-force=zstd:3
2. fallocating most of a filesystem to a big file
3. fragmenting the remaining free space
4. trying to copy in a file which zstd would generate large compressed
extents for (vmlinux worked well for this)
Step 4. hits the memory leak and can be repeated ad nauseam to
eventually exhaust the system memory.
Fix this by detecting the case where we fallback to uncompressed
submission for a compressed async_extent and ensuring that we call
free_async_extent_pages().
Fixes: 131a821a243f ("btrfs: fallback if compressed IO fails for ENOSPC")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Co-developed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If the discard worker is running and there's currently only one block
group, that block group is a data block group, it's in the unused block
groups discard list and is being used (it got an extent allocated from it
after becoming unused), the worker can end up in an infinite loop if a
transaction abort happens or the async discard is disabled (during remount
or unmount for example).
This happens like this:
1) Task A, the discard worker, is at peek_discard_list() and
find_next_block_group() returns block group X;
2) Block group X is in the unused block groups discard list (its discard
index is BTRFS_DISCARD_INDEX_UNUSED) since at some point in the past
it become an unused block group and was added to that list, but then
later it got an extent allocated from it, so its ->used counter is not
zero anymore;
3) The current transaction is aborted by task B and we end up at
__btrfs_handle_fs_error() in the transaction abort path, where we call
btrfs_discard_stop(), which clears BTRFS_FS_DISCARD_RUNNING from
fs_info, and then at __btrfs_handle_fs_error() we set the fs to RO mode
(setting SB_RDONLY in the super block's s_flags field);
4) Task A calls __add_to_discard_list() with the goal of moving the block
group from the unused block groups discard list into another discard
list, but at __add_to_discard_list() we end up doing nothing because
btrfs_run_discard_work() returns false, since the super block has
SB_RDONLY set in its flags and BTRFS_FS_DISCARD_RUNNING is not set
anymore in fs_info->flags. So block group X remains in the unused block
groups discard list;
5) Task A then does a goto into the 'again' label, calls
find_next_block_group() again we gets block group X again. Then it
repeats the previous steps over and over since there are not other
block groups in the discard lists and block group X is never moved
out of the unused block groups discard list since
btrfs_run_discard_work() keeps returning false and therefore
__add_to_discard_list() doesn't move block group X out of that discard
list.
When this happens we can get a soft lockup report like this:
[71.957] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u4:3:97]
[71.957] Modules linked in: xfs af_packet rfkill (...)
[71.957] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W 6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa
[71.957] Tainted: [W]=WARN
[71.957] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[71.957] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs]
[71.957] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs]
[71.957] Code: c1 01 48 83 (...)
[71.957] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246
[71.957] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000
[71.957] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad
[71.957] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080
[71.957] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860
[71.957] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000
[71.957] FS: 0000000000000000(0000) GS:ffff89707bc00000(0000) knlGS:0000000000000000
[71.957] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[71.957] CR2: 00005605bcc8d2f0 CR3: 000000010376a001 CR4: 0000000000770ef0
[71.957] PKRU: 55555554
[71.957] Call Trace:
[71.957] <TASK>
[71.957] process_one_work+0x17e/0x330
[71.957] worker_thread+0x2ce/0x3f0
[71.957] ? __pfx_worker_thread+0x10/0x10
[71.957] kthread+0xef/0x220
[71.957] ? __pfx_kthread+0x10/0x10
[71.957] ret_from_fork+0x34/0x50
[71.957] ? __pfx_kthread+0x10/0x10
[71.957] ret_from_fork_asm+0x1a/0x30
[71.957] </TASK>
[71.957] Kernel panic - not syncing: softlockup: hung tasks
[71.987] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W L 6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa
[71.989] Tainted: [W]=WARN, [L]=SOFTLOCKUP
[71.989] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[71.991] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs]
[71.992] Call Trace:
[71.993] <IRQ>
[71.994] dump_stack_lvl+0x5a/0x80
[71.994] panic+0x10b/0x2da
[71.995] watchdog_timer_fn.cold+0x9a/0xa1
[71.996] ? __pfx_watchdog_timer_fn+0x10/0x10
[71.997] __hrtimer_run_queues+0x132/0x2a0
[71.997] hrtimer_interrupt+0xff/0x230
[71.998] __sysvec_apic_timer_interrupt+0x55/0x100
[71.999] sysvec_apic_timer_interrupt+0x6c/0x90
[72.000] </IRQ>
[72.000] <TASK>
[72.001] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[72.002] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs]
[72.002] Code: c1 01 48 83 (...)
[72.005] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246
[72.006] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000
[72.006] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad
[72.007] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080
[72.008] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860
[72.009] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000
[72.010] ? btrfs_discard_workfn+0x51/0x400 [btrfs 23b01089228eb964071fb7ca156eee8cd3bf996f]
[72.011] process_one_work+0x17e/0x330
[72.012] worker_thread+0x2ce/0x3f0
[72.013] ? __pfx_worker_thread+0x10/0x10
[72.014] kthread+0xef/0x220
[72.014] ? __pfx_kthread+0x10/0x10
[72.015] ret_from_fork+0x34/0x50
[72.015] ? __pfx_kthread+0x10/0x10
[72.016] ret_from_fork_asm+0x1a/0x30
[72.017] </TASK>
[72.017] Kernel Offset: 0x15000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[72.019] Rebooting in 90 seconds..
So fix this by making sure we move a block group out of the unused block
groups discard list when calling __add_to_discard_list().
Fixes: 2bee7eb8bb81 ("btrfs: discard one region at a time in async discard")
Link: https://bugzilla.suse.com/show_bug.cgi?id=1242012
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fix from Jan Kara:
"Fix a bug in UDF inode eviction leading to spewing pointless
error messages"
* tag 'udf_for_v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Make sure i_lenExtents is uptodate on inode eviction
|