path: root/include/linux
Age    Commit message    Author
12 days    i3c: master: Add i3c_master_do_daa_ext() for post-hibernation address recovery    (Adrian Hunter)
After system hibernation, I3C Dynamic Addresses may be reassigned at boot and no longer match the values recorded before suspend. Introduce i3c_master_do_daa_ext() to handle this situation. The restore procedure is straightforward: issue a Reset Dynamic Address Assignment (RSTDAA), then run the standard DAA sequence. The existing DAA logic already supports detecting and updating devices whose dynamic addresses differ from previously known values. Refactor the DAA path by introducing a shared helper used by both the normal i3c_master_do_daa() path and the new extended restore function, and correct the kernel-doc in the process. Export i3c_master_do_daa_ext() so that master drivers can invoke it from their PM restore callbacks. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260123063325.8210-2-adrian.hunter@intel.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
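A minimal sketch of how a master driver's PM restore callback might use the new helper; the foo_* names are hypothetical and the i3c_master_do_daa_ext() signature is assumed to mirror i3c_master_do_daa():

    struct foo_i3c_master {
            struct i3c_master_controller base;
            /* ... */
    };

    static int foo_i3c_master_pm_restore(struct device *dev)
    {
            struct foo_i3c_master *master = dev_get_drvdata(dev);
            int ret;

            ret = foo_i3c_master_reinit_hw(master);       /* hypothetical */
            if (ret)
                    return ret;

            /* RSTDAA + DAA: rediscovers devices whose addresses changed. */
            return i3c_master_do_daa_ext(&master->base);
    }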
12 days    perf: sched: Fix perf crash with new is_user_task() helper    (Steven Rostedt)
In order to do a user space stacktrace the current task needs to be a user task that has executed in user space. It used to be possible to test if a task is a user task by simply checking the task_struct mm field: if it was non-NULL, it was a user task, and if not, a kernel task. But things have changed over time, and some kernel tasks now have their own mm field. One idea was to test PF_KTHREAD instead, and two functions were used to wrap this check in case it became more complex to test if a task was a user task or not[1]. But this was rejected and the C code simply checked PF_KTHREAD directly. It was later found that not all kernel threads set PF_KTHREAD. The io-uring helpers instead set PF_USER_WORKER and this needed to be added as well. But checking the flags is still not enough. There's a very small window when a task exits in which it frees its mm field and it is set back to NULL. If perf were to trigger at this moment, the flags test would say it is a user space task, but when perf read the mm field it would crash with a NULL pointer dereference. Now there are flags that can be used to test if a task is exiting, but they are set in areas where perf may still want to profile the user space task (to see where it exited). The only reliable test is to check both the flags and the mm field. Instead of making this modification in every location, create a new is_user_task() helper function that does all the tests needed to know if it is safe to read the user space memory or not. [1] https://lore.kernel.org/all/20250425204120.639530125@goodmis.org/ Fixes: 90942f9fac05 ("perf: Use current->flags & PF_KTHREAD|PF_USER_WORKER instead of current->mm == NULL") Closes: https://lore.kernel.org/all/0d877e6f-41a7-4724-875d-0b0a27b8a545@roeck-us.net/ Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260129102821.46484722@gandalf.local.home
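A hedged sketch of what such a helper could look like, based purely on the checks described above (the in-tree definition may differ):

    static inline bool is_user_task(struct task_struct *task)
    {
            /* Kernel threads and io-uring workers never executed user code. */
            if (task->flags & (PF_KTHREAD | PF_USER_WORKER))
                    return false;

            /* mm is freed late in exit; NULL means user memory is gone. */
            return task->mm != NULL;
    }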
12 days    Merge tag 'dma-mapping-6.19-2026-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux    (Linus Torvalds)
Pull dma-mapping fixes from Marek Szyprowski:

 - important fix for ARM 32-bit based systems using cma= kernel parameter (Oreoluwa Babatunde)

 - a fix for the corner case of the DMA atomic pool based allocations (Sai Sree Kartheek Adivi)

* tag 'dma-mapping-6.19-2026-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma/pool: distinguish between missing and exhausted atomic pools
  of: reserved_mem: Allow reserved_mem framework detect "cma=" kernel param
12 days    bpf: Allow sleepable programs to use tail calls    (Jiri Olsa)
Allow sleepable programs to use tail calls. Make sure we can't mix sleepable and non-sleepable bpf programs in a tail call map (BPF_MAP_TYPE_PROG_ARRAY) and allow such maps to be used in sleepable programs. Sleepable programs can be preempted and sleep, which might bring a new source of race conditions, but both direct and indirect tail calls should not be affected. Direct tail calls work by patching a direct jump to the callee into the bpf caller program, so no problem there: we atomically switch from a nop to a jump instruction. An indirect tail call reads the callee from the map and then jumps to it. The callee bpf program can't disappear (be released) from under the caller, because it is executed under rcu lock (rcu_read_lock_trace). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260130081208.1130204-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
12 days    block: introduce bdev_rot()    (Damien Le Moal)
Introduce the helper function bdev_rot() to test if a block device is a rotational one. The existing function bdev_nonrot() which tests for the opposite condition is redefined using this new helper. This avoids the double negation (operator and name) that appears when testing if a block device is a rotational device, thus making the code a little easier to read. Call sites of bdev_nonrot() in the block layer are updated to use this new helper. Remaining users in other subsystems are left unchanged for now. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
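A hedged sketch of the pair of helpers described above (the exact in-tree definitions may differ):

    static inline bool bdev_rot(struct block_device *bdev)
    {
            return blk_queue_rot(bdev_get_queue(bdev));
    }

    static inline bool bdev_nonrot(struct block_device *bdev)
    {
            return !bdev_rot(bdev);
    }

With this, call sites can write "if (bdev_rot(bdev))" instead of the double-negated "if (!bdev_nonrot(bdev))".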
12 days    Merge branch 'core/entry' into sched/core    (Thomas Gleixner)
Pull the entry update to avoid merge conflicts with the time slice extension changes. Signed-off-by: Thomas Gleixner <tglx@kernel.org>
12 days    entry: Inline syscall_exit_work() and syscall_trace_enter()    (Jinjie Ruan)
After switching ARM64 to the generic entry code, syscall_exit_work() appeared as a profiling hotspot because it is not inlined. Inlining both syscall_trace_enter() and syscall_exit_work() provides a performance gain when any of the work items is enabled. With audit enabled this results in a ~4% performance gain for perf bench basic syscall on a kunpeng920 system:

  | Metric     | Baseline    | Inlined     | Change |
  | ---------- | ----------- | ----------- | ------ |
  | Total time | 2.353 [sec] | 2.264 [sec] | ↓3.8%  |
  | usecs/op   | 0.235374    | 0.226472    | ↓3.8%  |
  | ops/sec    | 4,248,588   | 4,415,554   | ↑3.9%  |

Small gains can be observed on x86 as well, though the generated code optimizes for the work case, which is counterproductive for high performance scenarios where such entry/exit work is usually avoided. Avoid this by marking the work check in syscall_enter_from_user_mode_work() unlikely, which is what the corresponding check in the exit path already does. [ tglx: Massage changelog and add the unlikely() ] Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260128031934.3906955-14-ruanjinjie@huawei.com
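A hedged sketch of the entry-side check with the unlikely() annotation (simplified; the real function handles more details, and the syscall argument of syscall_trace_enter() is dropped by a companion patch below):

    static __always_inline long
    syscall_enter_from_user_mode_work(struct pt_regs *regs, long syscall)
    {
            unsigned long work = READ_ONCE(current_thread_info()->syscall_work);

            if (unlikely(work & SYSCALL_WORK_ENTER))
                    syscall = syscall_trace_enter(regs, work);

            return syscall;
    }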
12 days    entry: Add arch_ptrace_report_syscall_entry/exit()    (Jinjie Ruan)
ARM64 requires an architecture-specific ptrace wrapper as it needs to save and restore scratch registers. Provide arch_ptrace_report_syscall_entry/exit() wrappers which fall back to ptrace_report_syscall_entry/exit() if the architecture does not provide them. No functional change intended. [ tglx: Massaged changelog and comments ] Suggested-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Link: https://patch.msgid.link/20260128031934.3906955-11-ruanjinjie@huawei.com
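A minimal sketch of the usual fallback convention for such wrappers, assuming the generic entry code uses the common #ifndef override pattern:

    #ifndef arch_ptrace_report_syscall_entry
    static inline int arch_ptrace_report_syscall_entry(struct pt_regs *regs)
    {
            return ptrace_report_syscall_entry(regs);
    }
    #endif

An architecture that needs extra work (like ARM64's scratch register save/restore) provides its own definition plus the matching macro before the generic header is included.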
12 days    entry: Rework syscall_exit_to_user_mode_work() for architecture reuse    (Jinjie Ruan)
syscall_exit_to_user_mode_work() invokes local_irq_disable_exit_to_user() and syscall_exit_to_user_mode_prepare() after handling pending syscall exit work. The conversion of ARM64 to the generic entry code requires this to be split up, so move the invocations of local_irq_disable_exit_to_user() and syscall_exit_to_user_mode_prepare() into the only caller. No functional change intended. [ tglx: Massaged changelog and comments ] Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20260128031934.3906955-10-ruanjinjie@huawei.com
12 days    entry: Remove unused syscall argument from syscall_trace_enter()    (Jinjie Ruan)
The 'syscall' argument of syscall_trace_enter() is immediately overwritten before any real use and serves only as a local variable, so drop the parameter. No functional change intended. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260128031934.3906955-2-ruanjinjie@huawei.com
12 days    pkcs7, x509: Add ML-DSA support    (David Howells)
Add support for ML-DSA keys and signatures to the CMS/PKCS#7 and X.509 implementations. ML-DSA-44, -65 and -87 are all supported. For X.509 certificates, the TBSCertificate is required to be signed directly; for CMS, direct signing of the data is preferred, though use of SHA512 (and only that) as an intermediate hash of the content is permitted with signedAttrs. Signed-off-by: David Howells <dhowells@redhat.com> cc: Lukas Wunner <lukas@wunner.de> cc: Ignat Korchagin <ignat@cloudflare.com> cc: Stephan Mueller <smueller@chronox.de> cc: Eric Biggers <ebiggers@kernel.org> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: keyrings@vger.kernel.org cc: linux-crypto@vger.kernel.org
12 days    net: add skb_header_pointer_careful() helper    (Eric Dumazet)
This variant of skb_header_pointer() should be used in contexts where the @offset argument is user-controlled and could be negative. Negative offsets are supported, as long as the zone starts between skb->head and skb->data. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260128141539.3404400-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
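A hedged sketch of what the careful variant has to check (the actual bounds test may differ):

    static inline void *
    skb_header_pointer_careful(const struct sk_buff *skb, int offset,
                               int len, void *buffer)
    {
            /* Negative offsets must not reach in front of skb->head. */
            if (offset < 0 && (unsigned int)-offset > skb_headroom(skb))
                    return NULL;

            return skb_header_pointer(skb, offset, len, buffer);
    }

skb_headroom() is skb->data - skb->head, so this rejects exactly the zones that would start before skb->head.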
13 days    block: introduce blk_queue_rot()    (Damien Le Moal)
To check if a request queue is for a rotational device, a double negation is needed with the pattern "!blk_queue_nonrot(q)". Simplify this with the introduction of the helper blk_queue_rot(), which tests if a request queue's limits have the BLK_FEAT_ROTATIONAL feature set. All call sites of blk_queue_nonrot() are modified to use blk_queue_rot() and the blk_queue_nonrot() definition is removed. No functional changes. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
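Given the description above, the helper plausibly reduces to a feature-bit test (a sketch; the in-tree version may be spelled as a macro):

    static inline bool blk_queue_rot(struct request_queue *q)
    {
            return q->limits.features & BLK_FEAT_ROTATIONAL;
    }

Call sites then turn "if (!blk_queue_nonrot(q))" into the more readable "if (blk_queue_rot(q))".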
13 days    block: cleanup queue limit features definition    (Damien Le Moal)
Unwrap the definition of BLK_FEAT_ATOMIC_WRITES and renumber this feature to be sequential with BLK_FEAT_SKIP_TAGSET_QUIESCE. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 days    Merge tag 'mm-hotfixes-stable-2026-01-29-09-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm    (Linus Torvalds)
Pull misc fixes from Andrew Morton:
 "16 hotfixes. 9 are cc:stable, 12 are for MM.

  There's a patch series from Pratyush Yadav which fixes a few things in the new-in-6.19 LUO memfd code.

  Plus the usual shower of singletons - please see the changelogs for details"

* tag 'mm-hotfixes-stable-2026-01-29-09-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  vmcoreinfo: make hwerr_data visible for debugging
  mm/zone_device: reinitialize large zone device private folios
  mm/mm_init: don't cond_resched() in deferred_init_memmap_chunk() if called from deferred_grow_zone()
  mm/kfence: randomize the freelist on initialization
  kho: kho_preserve_vmalloc(): don't return 0 when ENOMEM
  kho: init alloc tags when restoring pages from reserved memory
  mm: memfd_luo: restore and free memfd_luo_ser on failure
  mm: memfd_luo: use memfd_alloc_file() instead of shmem_file_setup()
  memfd: export alloc_file()
  flex_proportions: make fprop_new_period() hardirq safe
  mailmap: add entry for Viacheslav Bocharov
  mm/memory-failure: teach kill_accessing_process to accept hugetlb tail page pfn
  mm/memory-failure: fix missing ->mf_stats count in hugetlb poison
  mm, swap: restore swap_space attr aviod kernel panic
  mm/kasan: fix KASAN poisoning in vrealloc()
  mm/shmem, swap: fix race of truncate and swap entry split
13 days    Merge tag 'mtk-soc-for-v6.20' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into soc/drivers    (Arnd Bergmann)
MediaTek soc driver updates

This adds:
 - A socinfo entry for the MT8371 Genio 520 SoC
 - Support for the Dynamic Voltage and Frequency Scaling Resource Controller (DVFSRC) version 4, found in the new MediaTek Kompanio Ultra (MT8196) SoC
 - Initial support for the CMDQ mailbox found in the MT8196
 - A memory leak fix in the MediaTek SVS driver's debug ops

* tag 'mtk-soc-for-v6.20' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux:
  soc: mediatek: mtk-cmdq: Add mminfra_offset adjustment for DRAM addresses
  soc: mediatek: mtk-cmdq: Extend cmdq_pkt_write API for SoCs without subsys ID
  soc: mediatek: mtk-cmdq: Add pa_base parsing for hardware without subsys ID support
  soc: mediatek: mtk-cmdq: Add cmdq_get_mbox_priv() in cmdq_pkt_create()
  mailbox: mtk-cmdq: Add driver data to support for MT8196
  mailbox: mtk-cmdq: Add mminfra_offset configuration for DRAM transaction
  mailbox: mtk-cmdq: Add GCE hardware virtualization configuration
  mailbox: mtk-cmdq: Add cmdq private data to cmdq_pkt for generating instruction
  soc: mediatek: mtk-dvfsrc: Rework bandwidth calculations
  soc: mediatek: mtk-dvfsrc: Get and Enable DVFSRC clock
  soc: mediatek: mtk-dvfsrc: Add support for DVFSRCv4 and MT8196
  soc: mediatek: mtk-dvfsrc: Write bandwidth to EMI DDR if present
  soc: mediatek: mtk-dvfsrc: Add a new callback for calc_dram_bw
  soc: mediatek: mtk-dvfsrc: Add and propagate DVFSRC bandwidth type
  soc: mediatek: mtk-dvfsrc: Change error check for DVFSRCv4 START cmd
  dt-bindings: soc: mediatek: dvfsrc: Document clock
  soc: mediatek: mtk-socinfo: Add entry for MT8371AV/AZA Genio 520
  soc: mediatek: svs: Fix memory leak in svs_enable_debug_write()

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
13 days    Merge tag 'apple-soc-drivers-6.20' of https://git.kernel.org/pub/scm/linux/kernel/git/sven/linux into soc/drivers    (Arnd Bergmann)
Apple SoC driver updates for 6.20

 - Add a poweroff function to the RTKit library which will be required for the first USB4/Thunderbolt series I hope to submit next cycle.

* tag 'apple-soc-drivers-6.20' of https://git.kernel.org/pub/scm/linux/kernel/git/sven/linux:
  soc: apple: rtkit: Add function to poweroff

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
14 days    of: reserved_mem: Allow reserved_mem framework detect "cma=" kernel param    (Oreoluwa Babatunde)
When initializing the default cma region, the "cma=" kernel parameter takes priority over a DT defined linux,cma-default region. Hence, give the reserved_mem framework the ability to detect this so that the DT defined cma region can skip initialization accordingly. Signed-off-by: Oreoluwa Babatunde <oreoluwa.babatunde@oss.qualcomm.com> Tested-by: Joy Zou <joy.zou@nxp.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Fixes: 8a6e02d0c00e ("of: reserved_mem: Restructure how the reserved memory regions are processed") Fixes: 2c223f7239f3 ("of: reserved_mem: Restructure call site for dma_contiguous_early_fixup()") Link: https://lore.kernel.org/r/20251210002027.1171519-1-oreoluwa.babatunde@oss.qualcomm.com [mszyprow: rebased onto v6.19-rc1, added fixes tags, added a stub for cma_skip_dt_default_reserved_mem() if no CONFIG_DMA_CMA is set] Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
14 days    cpufreq: ondemand: Simplify idle cputime granularity test    (Frederic Weisbecker)
cpufreq calls get_cpu_idle_time_us() just to know if idle cputime accounting has nanosecond granularity. Use the appropriate indicator instead to make that deduction. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Link: https://patch.msgid.link/aXozx0PXutnm8ECX@localhost.localdomain Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
14 days    compiler-context-analysis: Remove __assume_ctx_lock from initializers    (Marco Elver)
Remove __assume_ctx_lock() from lock initializers. Implicitly asserting an active context during initialization caused false-positive double-lock errors when acquiring a lock immediately after its initialization.

Moving forward, guarded member initialization must either:
 1. Use guard(type_init)(&lock) or scoped_guard(type_init, ...).
 2. Use context_unsafe() for simple initialization.

Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/57062131-e79e-42c2-aa0b-8f931cb8cac2@acm.org/ Link: https://patch.msgid.link/20260119094029.1344361-7-elver@google.com
14 days    compiler-context-analysis: Introduce scoped init guards    (Marco Elver)
Add scoped init guard definitions for common synchronization primitives supported by context analysis. The scoped init guards treat the context as active within the initialization scope of the underlying context lock, given that initialization implies exclusive access to the underlying object. This allows initialization of guarded members without disabling context analysis, while documenting initialization as distinct from subsequent usage. The documentation is updated with the new recommendation. Where scoped init guards are not provided or cannot be implemented (ww_mutex is omitted for lack of multi-arg guard initializers), the alternative is to just disable context analysis where guarded members are initialized. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20251212095943.GM3911114@noisy.programming.kicks-ass.net/ Link: https://patch.msgid.link/20260119094029.1344361-3-elver@google.com
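A hedged usage sketch, assuming a raw_spinlock_init guard (as in the companion cleanup patch below) and the __guarded_by annotation from the context-analysis series:

    struct foo {
            raw_spinlock_t lock;
            int counter __guarded_by(&lock);
    };

    static void foo_init(struct foo *f)
    {
            /* Analysis treats the lock context as active in this scope. */
            guard(raw_spinlock_init)(&f->lock);
            f->counter = 0;
    }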
14 days    cleanup: Make __DEFINE_LOCK_GUARD handle commas in initializers    (Marco Elver)
Initialization macros can expand to structure initializers containing commas, which when used as a "lock" function resulted in errors such as:

  >> include/linux/spinlock.h:582:56: error: too many arguments provided to function-like macro invocation
       582 | DEFINE_LOCK_GUARD_1(raw_spinlock_init, raw_spinlock_t, raw_spin_lock_init(_T->lock), /* */)
           |                                                        ^
     include/linux/spinlock.h:113:17: note: expanded from macro 'raw_spin_lock_init'
       113 | do { *(lock) = __RAW_SPIN_LOCK_UNLOCKED(lock); } while (0)
           |                 ^
     include/linux/spinlock_types_raw.h:70:19: note: expanded from macro '__RAW_SPIN_LOCK_UNLOCKED'
        70 | (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
           |                  ^
     include/linux/spinlock_types_raw.h:67:34: note: expanded from macro '__RAW_SPIN_LOCK_INITIALIZER'
        67 | RAW_SPIN_DEP_MAP_INIT(lockname) }
           |                                 ^
     include/linux/cleanup.h:496:9: note: macro '__DEFINE_LOCK_GUARD_1' defined here
       496 | #define __DEFINE_LOCK_GUARD_1(_name, _type, _lock) \
           |         ^
     include/linux/spinlock.h:582:1: note: parentheses are required around macro argument containing braced initializer list
       582 | DEFINE_LOCK_GUARD_1(raw_spinlock_init, raw_spinlock_t, raw_spin_lock_init(_T->lock), /* */)
           | ^
           | (
     include/linux/cleanup.h:558:60: note: expanded from macro 'DEFINE_LOCK_GUARD_1'
       558 | __DEFINE_UNLOCK_GUARD(_name, _type, _unlock, __VA_ARGS__) \
           |                                                           ^

Make __DEFINE_LOCK_GUARD_0 and __DEFINE_LOCK_GUARD_1 variadic so that __VA_ARGS__ captures everything.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260119094029.1344361-2-elver@google.com
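A hedged, simplified illustration of the variadic fix: taking the lock expression as __VA_ARGS__ means commas produced by expanded initializers no longer split it into extra macro arguments (the real macro in cleanup.h defines more than shown here):

    /* Before: a comma inside _lock's expansion breaks the argument count. */
    #define __DEFINE_LOCK_GUARD_1(_name, _type, _lock) ...

    /* After: everything following _type is treated as the lock expression. */
    #define __DEFINE_LOCK_GUARD_1(_name, _type, ...)                        \
    static inline void class_##_name##_lock(class_##_name##_t *_T)          \
    {                                                                       \
            __VA_ARGS__;                                                    \
    }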
14 days    ftrace: Factor ftrace_ops ops_func interface    (Jiri Olsa)
We are going to remove the "ftrace_ops->private == bpf_trampoline" setup in the following changes. Add an ip argument to the ftrace_ops_func_t callback function so we can use it to look up the trampoline. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-9-jolsa@kernel.org
14 days    bpf: Add trampoline ip hash table    (Jiri Olsa)
The following changes need to look up a trampoline based on its ip address; add a hash table for that. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-8-jolsa@kernel.org
14 days    ftrace: Add update_ftrace_direct_mod function    (Jiri Olsa)
Add an update_ftrace_direct_mod function that modifies all entries (ip -> direct) provided in the hash argument on a direct ftrace_ops and updates its attachments. The difference to the current modify_ftrace_direct is:

 - a hash argument that allows modifying multiple ip -> direct entries at once

This change will allow us to have a simple ftrace_ops for all bpf direct interface users in the following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-7-jolsa@kernel.org
14 days    ftrace: Add update_ftrace_direct_del function    (Jiri Olsa)
Add an update_ftrace_direct_del function that removes all entries (ip -> addr) provided in the hash argument from a direct ftrace_ops and updates its attachments. The differences to the current unregister_ftrace_direct are:

 - a hash argument that allows unregistering multiple ip -> direct entries at once
 - update_ftrace_direct_del can be called multiple times on the same ftrace_ops object, because we do not need to unregister all entries at once; we can do it gradually with the help of the ftrace_update_ops function

This change will allow us to have a simple ftrace_ops for all bpf direct interface users in the following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-6-jolsa@kernel.org
14 days    ftrace: Add update_ftrace_direct_add function    (Jiri Olsa)
Add an update_ftrace_direct_add function that adds all entries (ip -> addr) provided in the hash argument to a direct ftrace_ops and updates its attachments. The differences to the current register_ftrace_direct are:

 - a hash argument that allows registering multiple ip -> direct entries at once
 - update_ftrace_direct_add can be called multiple times on the same ftrace_ops object, because after the first registration with register_ftrace_function_nolock, it uses ftrace_update_ops to update the ftrace_ops object

This change will allow us to have a simple ftrace_ops for all bpf direct interface users in the following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-5-jolsa@kernel.org
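A hedged usage sketch of the hash-based API described by these three patches; the hash-building helper name is illustrative, not necessarily one of the exported functions:

    struct ftrace_hash *hash = alloc_ftrace_hash(FTRACE_HASH_DEFAULT_BITS);
    int err;

    /* Map several call sites to one trampoline in a single update. */
    add_hash_entry_direct(hash, ip1, (unsigned long)tramp);  /* illustrative */
    add_hash_entry_direct(hash, ip2, (unsigned long)tramp);

    err = update_ftrace_direct_add(&direct_ops, hash);

Later updates can call update_ftrace_direct_add/del/mod again on the same ftrace_ops instead of tearing everything down and re-registering.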
14 days    ftrace: Export some of hash related functions    (Jiri Olsa)
We are going to use these functions in the following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-4-jolsa@kernel.org
14 days    ftrace,bpf: Remove FTRACE_OPS_FL_JMP ftrace_ops flag    (Jiri Olsa)
At the moment we allow the jmp attach only for ftrace_ops that have FTRACE_OPS_FL_JMP set. This conflicts with the following changes where we use a single ftrace_ops object for all direct call sites, so any of them could be attached via either call or jmp. We already limit the jmp attach support with a config option and a bit (LSB) set on the trampoline address. It turns out that's actually enough to limit the jmp attach per architecture and only for chosen addresses (with the LSB bit set). Each user of register_ftrace_direct or modify_ftrace_direct can set the trampoline bit (LSB) to indicate it has to be attached by jmp. The bpf trampoline generation code uses trampoline flags to generate jmp-attach specific code and the ftrace inner code uses the trampoline bit (LSB) to handle return from a jmp attachment, so there's no harm in removing the FTRACE_OPS_FL_JMP bit. The fexit/fmodret performance stays the same (did not drop):

  current code:
    fentry  : 77.904 ± 0.546M/s
    fexit   : 62.430 ± 0.554M/s
    fmodret : 66.503 ± 0.902M/s

  with this change:
    fentry  : 80.472 ± 0.061M/s
    fexit   : 63.995 ± 0.127M/s
    fmodret : 67.362 ± 0.175M/s

Fixes: 25e4e3565d45 ("ftrace: Introduce FTRACE_OPS_FL_JMP") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-2-jolsa@kernel.org
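A hedged sketch of the LSB convention (helper names are hypothetical):

    #define FTRACE_JMP_BIT  1UL

    /* Callers tag a trampoline address that must be attached via jmp. */
    static inline unsigned long ftrace_jmp_tag(unsigned long addr)
    {
            return addr | FTRACE_JMP_BIT;
    }

    static inline bool ftrace_addr_is_jmp(unsigned long addr)
    {
            return addr & FTRACE_JMP_BIT;
    }

Since trampolines are at least word-aligned, the low bit of the address is otherwise always zero and is free to carry the attach mode, which is why the per-ops flag is redundant.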
14 days    iomap: add a flag to bounce buffer direct I/O    (Christoph Hellwig)
Add a new flag that requests bounce buffering for direct I/O. This is needed to provide the stable pages requirement of devices that need to calculate checksums or parity over the data, and allows file systems to properly work with things like T10 protection information. The implementation just calls out to the new bio bounce buffering helpers to allocate a bounce buffer, which is used for the I/O and to copy to/from. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
14 days    block: add helpers to bounce buffer an iov_iter into bios    (Christoph Hellwig)
Add helpers to implement bounce buffering of data into a bio to implement direct I/O for cases where direct user access is not possible because stable in-flight data is required. These are intended to be used as easily as bio_iov_iter_get_pages for the zero-copy path. The write side is trivial and just copies data into the bounce buffer. The read side is a lot more complex because it needs to perform the copy from the completion context, and without preserving the iov_iter through the call chain. It steals a trick from the integrity data user interface and uses the first vector in the bio for the bounce buffer data that is fed to the block I/O stack, and uses the others to record the user buffer fragments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
14 days    iov_iter: extract an iov_iter_extract_bvecs helper from bio code    (Christoph Hellwig)
Massage __bio_iov_iter_get_pages so that it doesn't need the bio, and move it to lib/iov_iter.c so that it can be used by block code for things other than filling a bio, and by other subsystems like netfs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
14 days    block: add a BIO_MAX_SIZE constant and use it    (Christoph Hellwig)
Currently the only constant for the maximum bio size is BIO_MAX_SECTORS, which is in units of 512-byte sectors, but a lot of users need a byte limit. Add a BIO_MAX_SIZE constant, redefine BIO_MAX_SECTORS in terms of it, and switch all bio-related uses of UINT_MAX for the maximum size to the symbolic names instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
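A hedged sketch of the likely relationship between the two constants (the actual definitions may differ):

    #define BIO_MAX_SIZE     UINT_MAX
    #define BIO_MAX_SECTORS  (BIO_MAX_SIZE >> SECTOR_SHIFT)

Callers that used to clamp a byte count to a bare UINT_MAX can then write min(len, BIO_MAX_SIZE) and say what they mean.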
2026-01-28    Merge tag 'health-monitoring-7.0_2026-01-20' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-7.0-merge    (Carlos Maiolino)
xfs: autonomous self healing of filesystems [v7]

This patchset builds new functionality to deliver live information about filesystem health events to userspace. This is done by creating an anonymous file that can be read() for events by userspace programs. Events are captured by hooking various parts of XFS and iomap so that metadata health failures, file I/O errors, and major changes in filesystem state (unmounts, shutdowns, etc.) can be observed by programs. When an event occurs, the hook functions queue an event object to each event anonfd for later processing. Programs must have CAP_SYS_ADMIN to open the anonfd and there's a maximum event lag to prevent resource overconsumption. The events themselves can be read() from the anonfd as C structs for the xfs_healer daemon.

In userspace, we create a new daemon program that will read the event objects and initiate repairs automatically. This daemon is managed entirely by systemd and will not block unmounting of the filesystem unless repairs are ongoing. The daemons are auto-started by a starter service that uses fanotify.

This patchset depends on the new fserror code that Christian Brauner has tentatively accepted for Linux 7.0: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs-7.0.fserror

v7: more cleanups of the media verification ioctl, improve comments, and reuse the bio
v6: fix pi-breaking bugs, make verify failures trigger health reports and filter bio status flags better
v5: add verify-media ioctl, collapse small helper funcs with only one caller
v4: drop multiple client support so we can make direct calls into healthmon instead of chasing pointers and doing indirect calls
v3: drag out of rfc status

With a bit of luck, this should all go splendidly.

Conflicts: this merge required an update to two files:
 - fs/xfs/xfs_healthmon.c
 - fs/xfs/xfs_verify_media.c
The change was needed because a parallel development renamed the XFS header file xfs.h to xfs_platform.h, so the merge had to update the includes in both files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-28    seqlock: fix scoped_seqlock_read kernel-doc    (Randy Dunlap)
Eliminate all kernel-doc warnings in seqlock.h:

 - correct the macro to have "()" immediately following the macro name
 - don't include the macro parameters in the short description (first line)
 - make the parameter names in the comments match the actual macro parameter names
 - use "::" for the Example

  WARNING: include/linux/seqlock.h:1341 This comment starts with '/**', but isn't a kernel-doc comment.
   * scoped_seqlock_read (lock, ss_state) - execute the read side critical
  Documentation/locking/seqlock:242: include/linux/seqlock.h:1351: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
  Warning: include/linux/seqlock.h:1357 function parameter '_seqlock' not described in 'scoped_seqlock_read'
  Warning: include/linux/seqlock.h:1357 function parameter '_target' not described in 'scoped_seqlock_read'

Fixes: cc39f3872c08 ("seqlock: Introduce scoped_seqlock_read()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260123183749.3997533-1-rdunlap@infradead.org
2026-01-27    bpf: Fix tcx/netkit detach permissions when prog fd isn't given    (Guillaume Gonnet)
This commit fixes a security issue where BPF_PROG_DETACH on tcx or netkit devices could be executed by any user when no program fd was provided, bypassing permission checks. The fix adds a capability check for CAP_NET_ADMIN or CAP_SYS_ADMIN in this case. Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support") Signed-off-by: Guillaume Gonnet <ggonnet.linux@gmail.com> Link: https://lore.kernel.org/r/20260127160200.10395-1-ggonnet.linux@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
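A hedged sketch of the described check in the detach path (the bpf_attr field name and exact placement are assumptions):

    /* Detach without a prog fd is an admin operation. */
    if (!attr->attach_bpf_fd &&
        !capable(CAP_NET_ADMIN) && !capable(CAP_SYS_ADMIN))
            return -EPERM;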
2026-01-27    io_uring/bpf_filter: cache lookup table in ctx->bpf_filters    (Jens Axboe)
Currently a few pointer dereferences need to be made to both check if BPF filters are installed, and then also to retrieve the actual filter for the opcode. Cache the table in ctx->bpf_filters to avoid that. Add a bit of debug info on ring exit to show if we ever got this wrong. Small risk of that given that the table is currently only updated in one spot, but once task forking is enabled, that will add one more spot. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-27    io_uring: add support for BPF filtering for opcode restrictions    (Jens Axboe)
Add support for loading classic BPF programs with io_uring to provide fine-grained filtering of SQE operations. Unlike IORING_REGISTER_RESTRICTIONS, which only allows bitmap-based allow/deny of opcodes, BPF filters can inspect request attributes and make dynamic decisions.

The filter is registered via IORING_REGISTER_BPF_FILTER with a struct io_uring_bpf:

    struct io_uring_bpf_filter {
            __u32   opcode;         /* io_uring opcode to filter */
            __u32   flags;
            __u32   filter_len;     /* number of BPF instructions */
            __u32   resv;
            __u64   filter_ptr;     /* pointer to BPF filter */
            __u64   resv2[5];
    };

    enum {
            IO_URING_BPF_CMD_FILTER = 1,
    };

    struct io_uring_bpf {
            __u16   cmd_type;       /* IO_URING_BPF_* values */
            __u16   cmd_flags;      /* none so far */
            __u32   resv;
            union {
                    struct io_uring_bpf_filter filter;
            };
    };

and the filters get supplied a struct io_uring_bpf_ctx:

    struct io_uring_bpf_ctx {
            __u64   user_data;
            __u8    opcode;
            __u8    sqe_flags;
            __u8    pdu_size;
            __u8    pad[5];
    };

where it's possible to filter on opcode and sqe_flags, with pdu_size indicating how much extra data is being passed in beyond the pad field. This will be used for finer-grained filtering inside an opcode. An example of that for sockets is in one of the following patches. Anything the opcode supports can end up in this struct, populated by the opcode itself, and hence can be filtered on.

Filters have the following semantics:

 - Return 1 to allow the request
 - Return 0 to deny the request with -EACCES
 - Multiple filters can be stacked per opcode. All filters must return 1 for the opcode to be allowed.
 - Filters are evaluated in registration order (most recent first)

The implementation uses classic BPF (cBPF) rather than eBPF, as that's required for containers and since the filters can be used by any user in the system.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
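To make the semantics concrete, here is a hedged userspace sketch of a cBPF filter that allows only IORING_OP_NOP; it assumes struct io_uring_bpf_ctx is exported via the io_uring uapi header as described above:

    #include <linux/filter.h>
    #include <linux/io_uring.h>
    #include <stddef.h>

    static struct sock_filter nop_only[] = {
            /* A = ctx->opcode */
            BPF_STMT(BPF_LD | BPF_B | BPF_ABS,
                     offsetof(struct io_uring_bpf_ctx, opcode)),
            /* if (A != IORING_OP_NOP) goto deny */
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, IORING_OP_NOP, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, 1),   /* allow */
            BPF_STMT(BPF_RET | BPF_K, 0),   /* deny: request fails -EACCES */
    };

The array would be passed via filter_ptr with filter_len set to 4.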
2026-01-27    bpf, sockmap: Fix FIONREAD for sockmap    (Jiayuan Chen)
A socket using sockmap has its own independent receive queue: ingress_msg. This queue may contain data from its own protocol stack or from other sockets. Therefore, for sockmap, relying solely on copied_seq and rcv_nxt to calculate FIONREAD is not enough. This patch adds a new msg_tot_len field in the psock structure to record the data length in ingress_msg. Additionally, we implement new ioctl interfaces for TCP and UDP to intercept FIONREAD operations. Note that we intentionally do not include sk_receive_queue data in the FIONREAD result. Data in sk_receive_queue has not yet been processed by the BPF verdict program, and may be redirected to other sockets or dropped. Including it would create semantic ambiguity since this data may never be readable by the user. Unix and VSOCK sockets have similar issues, but fixing them is outside the scope of this patch as it would require more intrusive changes. Previous work by John Fastabend made some efforts towards FIONREAD support: commit e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq") Although the current patch is based on the previous work by John Fastabend, it is acceptable for our Fixes tag to point to the same commit.

    FD1:read() -- FD1->copied_seq++
        |  [read data]
        |  [enqueue data]
        v
    [sockmap] -> ingress to self -> ingress_msg queue
    FD1 native stack ------>      ^ -- FD1->rcv_nxt++
        -> redirect to other      |  [enqueue data]
        |                         |
        |   ingress to FD1        v
        ^        ...
        |   [sockmap]
    FD2 native stack

Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20260124113314.113584-3-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
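A hedged sketch of the described ioctl intercept for TCP (field and helper placement follow the description above; the real code may differ):

    static int tcp_bpf_ioctl(struct sock *sk, int cmd, int *karg)
    {
            struct sk_psock *psock;

            if (cmd != SIOCINQ)
                    return -ENOIOCTLCMD;

            rcu_read_lock();
            psock = sk_psock(sk);
            /* Only count data queued on ingress_msg, as explained above;
             * sk_receive_queue is deliberately excluded. */
            *karg = psock ? psock->msg_tot_len : 0;
            rcu_read_unlock();
            return 0;
    }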
2026-01-27    bpf, sockmap: Fix incorrect copied_seq calculation    (Jiayuan Chen)
A socket using sockmap has its own independent receive queue: ingress_msg. This queue may contain data from its own protocol stack or from other sockets. The issue is that when reading from ingress_msg, we update tp->copied_seq by default. However, if the data is not from its own protocol stack, tcp->rcv_nxt is not increased. Later, if we convert this socket to a native socket, reading from this socket may fail because copied_seq might be significantly larger than rcv_nxt. This fix also addresses the syzkaller-reported bug referenced in the Closes tag. This patch marks the skmsg objects in ingress_msg. When reading, we update copied_seq only if the data is from its own protocol stack.

    FD1:read() -- FD1->copied_seq++
        |  [read data]
        |  [enqueue data]
        v
    [sockmap] -> ingress to self -> ingress_msg queue
    FD1 native stack ------>      ^ -- FD1->rcv_nxt++
        -> redirect to other      |  [enqueue data]
        |                         |
        |   ingress to FD1        v
        ^        ...
        |   [sockmap]
    FD2 native stack

Closes: https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983 Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20260124113314.113584-2-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-27    irqchip/gic-v5: Add ACPI IWB probing    (Lorenzo Pieralisi)
To probe an IWB in an ACPI based system it is required:

 - to implement the IORT functions handling the IWB IORT node and create functions to retrieve IWB firmware information
 - to augment the driver to match the DSDT ACPI "ARMH0003" device and retrieve the IWB wire and trigger mask from the GSI interrupt descriptor in the IWB msi_domain_ops.msi_translate() function

Make the required driver changes to enable IWB probing in ACPI systems. The GICv5 GSI format requires special handling for IWB routed IRQs. Add IWB GSI detection to the top level driver gic_v5_get_gsi_domain_id() function so that the correct IRQ domain for a GSI can be detected by parsing the GSI and checking whether it is an IWB-backed IRQ or not. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260115-gicv5-host-acpi-v3-6-c13a9a150388@kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-27    irqchip/gic-v5: Add ACPI ITS probing    (Lorenzo Pieralisi)
On ACPI ARM64 systems the GICv5 ITS configuration and translate frames are described in the MADT table. Refactor the current GICv5 ITS driver code to share common functions between ACPI and OF, and implement ACPI probing in the GICv5 ITS driver. Add iort_msi_xlate() to map a device ID and retrieve an MSI controller fwnode for ACPI systems, and update pci_msi_map_rid_ctlr_node() to use it in its ACPI code path. Add the required functions to the IORT code for deviceID retrieval and IRQ domain registration and look-up so that the GICv5 ITS driver in an ACPI based system can be successfully probed. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260115-gicv5-host-acpi-v3-5-c13a9a150388@kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-27    irqchip/gic-v5: Add ACPI IRS probing    (Lorenzo Pieralisi)
On ARM64 ACPI systems GICv5 IRSes are described in MADT sub-entries. Add the required plumbing to parse MADT IRS firmware table entries and probe the IRS components in ACPI. Augment the irqdomain_ops.translate() for PPI and SPI IRQs in order to provide support for their ACPI based firmware translation. Implement an irqchip ACPI based callback to initialize the global GSI domain upon an MADT IRS detection. The IRQCHIP_ACPI_DECLARE() entry in the top level GICv5 driver is only used to trigger the IRS probing (ie the global GSI domain is initialized once on the first call on multi-IRS systems); IRS probing takes place by calling acpi_table_parse_madt() in the IRS sub-driver, which probes all IRSes in sequence. Add a new ACPI interrupt model so that it can be detected at runtime and distinguished from previous GIC architecture models. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260115-gicv5-host-acpi-v3-4-c13a9a150388@kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-27    PCI/MSI: Make the pci_msi_map_rid_ctlr_node() interface firmware agnostic    (Lorenzo Pieralisi)
To support booting with OF and ACPI seamlessly, GIC ITS parent code requires the PCI/MSI irqdomain layer to implement a function to retrieve an MSI controller fwnode and map an RID in a firmware agnostic way (ie pci_msi_map_rid_ctlr_node()). Convert pci_msi_map_rid_ctlr_node() to an OF agnostic interface (fwnode_handle based) and update the GIC ITS MSI parent code to reflect the pci_msi_map_rid_ctlr_node() change. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260115-gicv5-host-acpi-v3-2-c13a9a150388@kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-27    irqdomain: Add parent field to struct irqchip_fwid    (Lorenzo Pieralisi)
The GICv5 driver IRQ domain hierarchy requires adding a parent field to struct irqchip_fwid so that core code can reference a fwnode_handle parent for a given fwnode. Add a parent field to struct irqchip_fwid and update the related kernel API functions to initialize and handle it. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260115-gicv5-host-acpi-v3-1-c13a9a150388@kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
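A hedged sketch of the struct change, based on the irqchip_fwid layout in kernel/irq/irqdomain.c (field order and existing members may differ):

    struct irqchip_fwid {
            struct fwnode_handle    fwnode;
            unsigned int            type;
            char                    *name;
            phys_addr_t             *pa;
            struct fwnode_handle    *parent;   /* new: parent fwnode */
    };

The allocation helpers (__irq_domain_alloc_fwnode() and friends) would then gain a way to set and query the parent for hierarchy construction.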
2026-01-26    block: remove bio_last_bvec_all    (Keith Busch)
There are no more callers of this function after commit f6b2d8b134b2413 ("btrfs: track the next file offset in struct btrfs_bio_ctrl"), so remove the function. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-26    mm/zone_device: reinitialize large zone device private folios    (Matthew Brost)
Reinitialize metadata for large zone device private folios in zone_device_page_init prior to creating a higher-order zone device private folio. This step is necessary when the folio's order changes dynamically between zone_device_page_init calls to avoid building a corrupt folio. As part of the metadata reinitialization, the dev_pagemap must be passed in from the caller because the pgmap stored in the folio page may have been overwritten with a compound head. Without this fix, individual pages could have invalid pgmap fields and flags (with PG_locked being notably problematic) due to prior different order allocations, which can, and will, result in kernel crashes. Link: https://lkml.kernel.org/r/20260116111325.1736137-2-francois.dugast@intel.com Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Balbir Singh <balbirs@nvidia.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26    memfd: export alloc_file()    (Pratyush Yadav (Google))
Patch series "mm: memfd_luo hotfixes". This series contains a couple of fixes for memfd preservation using LUO. This patch (of 3): The Live Update Orchestrator's (LUO) memfd preservation works by preserving all the folios of a memfd, re-creating an empty memfd on the next boot, and then inserting back the preserved folios. Currently it creates the file by directly calling shmem_file_setup(). This leaves out other work done by alloc_file() like setting up the file mode, flags, or calling the security hooks. Export alloc_file() to let memfd_luo use it. Rename it to memfd_alloc_file() since it is no longer private and thus needs a subsystem prefix. Link: https://lkml.kernel.org/r/20260122151842.4069702-1-pratyush@kernel.org Link: https://lkml.kernel.org/r/20260122151842.4069702-2-pratyush@kernel.org Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26    mm/kasan: fix KASAN poisoning in vrealloc()    (Andrey Ryabinin)
A KASAN warning can be triggered when vrealloc() changes the requested size to a value that is not aligned to KASAN_GRANULE_SIZE.

  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 1 at mm/kasan/shadow.c:174 kasan_unpoison+0x40/0x48
  ...
  pc : kasan_unpoison+0x40/0x48
  lr : __kasan_unpoison_vmalloc+0x40/0x68
  Call trace:
   kasan_unpoison+0x40/0x48 (P)
   vrealloc_node_align_noprof+0x200/0x320
   bpf_patch_insn_data+0x90/0x2f0
   convert_ctx_accesses+0x8c0/0x1158
   bpf_check+0x1488/0x1900
   bpf_prog_load+0xd20/0x1258
   __sys_bpf+0x96c/0xdf0
   __arm64_sys_bpf+0x50/0xa0
   invoke_syscall+0x90/0x160

Introduce a dedicated kasan_vrealloc() helper that centralizes KASAN handling for vmalloc reallocations. The helper accounts for KASAN granule alignment when growing or shrinking an allocation and ensures that partial granules are handled correctly. Use this helper from vrealloc_node_align_noprof() to fix the poisoning logic.

[ryabinin.a.a@gmail.com: move kasan_enabled() check, fix build]
Link: https://lkml.kernel.org/r/20260119144509.32767-1-ryabinin.a.a@gmail.com
Link: https://lkml.kernel.org/r/20260113191516.31015-1-ryabinin.a.a@gmail.com
Fixes: d699440f58ce ("mm: fix vrealloc()'s KASAN poisoning logic")
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Reported-by: <joonki.min@samsung-slsi.corp-partner.google.com>
Closes: https://lkml.kernel.org/r/CANP3RGeuRW53vukDy7WDO3FiVgu34-xVJYkfpm08oLO3odYFrA@mail.gmail.com
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
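An illustrative sketch of the granule handling such a helper has to get right; signatures and rounding details here are assumptions, not the actual mm/kasan code:

    static void kasan_vrealloc(void *p, size_t old_size, size_t new_size)
    {
            if (new_size > old_size) {
                    /* Growing: unpoison the newly usable bytes; the
                     * unpoison path handles a partial last granule. */
                    kasan_unpoison(p + old_size, new_size - old_size, false);
            } else {
                    /* Shrinking: poison only whole granules past new_size. */
                    size_t start = round_up(new_size, KASAN_GRANULE_SIZE);
                    size_t end = round_up(old_size, KASAN_GRANULE_SIZE);

                    kasan_poison(p + start, end - start,
                                 KASAN_VMALLOC_INVALID, false);
            }
    }

The granule rounding is exactly what the unaligned bpf allocation in the trace above tripped over.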
2026-01-26    Merge tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs    (Linus Torvalds)
Pull vfs fixes from Christian Brauner:

 - Fix the buggy conversion of fuse_reverse_inval_entry() introduced during the creation rework
 - Disallow nfs delegation requests for directories by setting simple_nosetlease()
 - Require an opt-in for getting readdir flag bits outside of S_DT_MASK set in d_type
 - Fix scheduling delayed writeback work by only scheduling when the dirty time expiry interval is non-zero and cancel the delayed work if the interval is set to zero
 - Use rounded_jiffies_interval for dirty time work
 - Check the return value of sb_set_blocksize() for romfs
 - Wait for batched folios to be stable in __iomap_get_folio()
 - Use private naming for fuse hash size
 - Fix the stale dentry cleanup to prevent a race that causes a UAF

* tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: document d_dispose_if_unused()
  fuse: shrink once after all buckets have been scanned
  fuse: clean up fuse_dentry_tree_work()
  fuse: add need_resched() before unlocking bucket
  fuse: make sure dentry is evicted if stale
  fuse: fix race when disposing stale dentries
  fuse: use private naming for fuse hash size
  writeback: use round_jiffies_relative for dirtytime_work
  iomap: wait for batched folios to be stable in __iomap_get_folio
  romfs: check sb_set_blocksize() return value
  docs: clarify that dirtytime_expire_seconds=0 disables writeback
  writeback: fix 100% CPU usage when dirtytime_expire_interval is 0
  readdir: require opt-in for d_type flags
  vboxsf: don't allow delegations to be set on directories
  ceph: don't allow delegations to be set on directories
  gfs2: don't allow delegations to be set on directories
  9p: don't allow delegations to be set on directories
  smb/client: properly disallow delegations on directories
  nfs: properly disallow delegation requests on directories
  fuse: fix conversion of fuse_reverse_inval_entry() to start_removing()