summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2026-02-09Merge tag 'for-7.0/block-20260206' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - Support for batch request processing for ublk, improving the efficiency of the kernel/ublk server communication. This can yield nice 7-12% performance improvements - Support for integrity data for ublk - Various other ublk improvements and additions, including a ton of selftests additions and updated - Move the handling of blk-crypto software fallback from below the block layer to above it. This reduces the complexity of dealing with bio splitting - Series fixing a number of potential deadlocks in blk-mq related to the queue usage counter and writeback throttling and rq-qos debugfs handling - Add an async_depth queue attribute, to resolve a performance regression that's been around for a qhilw related to the scheduler depth handling - Only use task_work for IOPOLL completions on NVMe, if it is necessary to do so. An earlier fix for an issue resulted in all these completions being punted to task_work, to guarantee that completions were only run for a given io_uring ring when it was local to that ring. With the new changes, we can detect if it's necessary to use task_work or not, and avoid it if possible. - rnbd fixes: - Fix refcount underflow in device unmap path - Handle PREFLUSH and NOUNMAP flags properly in protocol - Fix server-side bi_size for special IOs - Zero response buffer before use - Fix trace format for flags - Add .release to rnbd_dev_ktype - MD pull requests via Yu Kuai - Fix raid5_run() to return error when log_init() fails - Fix IO hang with degraded array with llbitmap - Fix percpu_ref not resurrected on suspend timeout in llbitmap - Fix GPF in write_page caused by resize race - Fix NULL pointer dereference in process_metadata_update - Fix hang when stopping arrays with metadata through dm-raid - Fix any_working flag handling in raid10_sync_request - Refactor sync/recovery code path, improve error handling for badblocks, and remove unused recovery_disabled field - Consolidate mddev boolean fields into mddev_flags - Use mempool to allocate stripe_request_ctx and make sure max_sectors is not less than io_opt in raid5 - Fix return value of mddev_trylock - Fix memory leak in raid1_run() - Add Li Nan as mdraid reviewer - Move phys_vec definitions to the kernel types, mostly in preparation for some VFIO and RDMA changes - Improve the speed for secure erase for some devices - Various little rust updates - Various other minor fixes, improvements, and cleanups * tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits) blk-mq: ABI/sysfs-block: fix docs build warnings selftests: ublk: organize test directories by test ID block: decouple secure erase size limit from discard size limit block: remove redundant kill_bdev() call in set_blocksize() blk-mq: add documentation for new queue attribute async_dpeth block, bfq: convert to use request_queue->async_depth mq-deadline: covert to use request_queue->async_depth kyber: covert to use request_queue->async_depth blk-mq: add a new queue sysfs attribute async_depth blk-mq: factor out a helper blk_mq_limit_depth() blk-mq-sched: unify elevators checking for async requests block: convert nr_requests to unsigned int block: don't use strcpy to copy blockdev name blk-mq-debugfs: warn about possible deadlock blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs() blk-mq-debugfs: remove blk_mq_debugfs_unregister_rqos() blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static blk-rq-qos: fix possible debugfs_mutex deadlock blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos blk-wbt: fix possible deadlock to nest pcpu_alloc_mutex under q_usage_counter ...
2026-02-09Merge tag 'io_uring-bpf-restrictions.4-20260206' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring bpf filters from Jens Axboe: "This adds support for both cBPF filters for io_uring, as well as task inherited restrictions and filters. seccomp and io_uring don't play along nicely, as most of the interesting data to filter on resides somewhat out-of-band, in the submission queue ring. As a result, things like containers and systemd that apply seccomp filters, can't filter io_uring operations. That leaves them with just one choice if filtering is critical - filter the actual io_uring_setup(2) system call to simply disallow io_uring. That's rather unfortunate, and has limited us because of it. io_uring already has some filtering support. It requires the ring to be setup in a disabled state, and then a filter set can be applied. This filter set is completely bi-modal - an opcode is either enabled or it's not. Once a filter set is registered, the ring can be enabled. This is very restrictive, and it's not useful at all to systemd or containers which really want both broader and more specific control. This first adds support for cBPF filters for opcodes, which enables tighter control over what exactly a specific opcode may do. As examples, specific support is added for IORING_OP_OPENAT/OPENAT2, allowing filtering on resolve flags. And another example is added for IORING_OP_SOCKET, allowing filtering on domain/type/protocol. These are both common use cases. cBPF was chosen rather than eBPF, because the latter is often restricted in containers as well. These filters are run post the init phase of the request, which allows filters to even dip into data that is being passed in struct in user memory, as the init side of requests make that data stable by bringing it into the kernel. This allows filtering without needing to copy this data twice, or have filters etc know about the exact layout of the user data. The filters get the already copied and sanitized data passed. On top of that support is added for per-task filters, meaning that any ring created with a task that has a per-task filter will get those filters applied when it's created. These filters are inherited across fork as well. Once a filter has been registered, any further added filters may only further restrict what operations are permitted. Filters cannot change the return value of an operation, they can only permit or deny it based on the contents" * tag 'io_uring-bpf-restrictions.4-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring: allow registration of per-task restrictions io_uring: add task fork hook io_uring/bpf_filter: add ref counts to struct io_bpf_filter io_uring/bpf_filter: cache lookup table in ctx->bpf_filters io_uring/bpf_filter: allow filtering on contents of struct open_how io_uring/net: allow filtering on IORING_OP_SOCKET data io_uring: add support for BPF filtering for opcode restrictions
2026-02-09Merge tag 'pull-filename' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs 'struct filename' updates from Al Viro: "[Mostly] sanitize struct filename handling" * tag 'pull-filename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (68 commits) sysfs(2): fs_index() argument is _not_ a pathname alpha: switch osf_mount() to strndup_user() ksmbd: use CLASS(filename_kernel) mqueue: switch to CLASS(filename) user_statfs(): switch to CLASS(filename) statx: switch to CLASS(filename_maybe_null) quotactl_block(): switch to CLASS(filename) chroot(2): switch to CLASS(filename) move_mount(2): switch to CLASS(filename_maybe_null) namei.c: switch user pathname imports to CLASS(filename{,_flags}) namei.c: convert getname_kernel() callers to CLASS(filename_kernel) do_f{chmod,chown,access}at(): use CLASS(filename_uflags) do_readlinkat(): switch to CLASS(filename_flags) do_sys_truncate(): switch to CLASS(filename) do_utimes_path(): switch to CLASS(filename_uflags) chdir(2): unspaghettify a bit... do_fchownat(): unspaghettify a bit... fspick(2): use CLASS(filename_flags) name_to_handle_at(): use CLASS(filename_uflags) vfs_open_tree(): use CLASS(filename_uflags) ...
2026-02-09tracing: Move d_max_latency out of CONFIG_FSNOTIFY protectionSteven Rostedt
The tracing_max_latency shouldn't be limited if CONFIG_FSNOTIFY is defined or not and it was moved out of that protection to be always available with CONFIG_TRACER_MAX_TRACE. All was moved out except the dentry descriptor for it (d_max_latency) and it failed to build on some configs. Move that out of the CONFIG_FSNOTIFY protection too. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260209194631.788bfc85@fedora Fixes: ba73713da50e ("tracing: Clean up use of trace_create_maxlat_file()") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202602092133.fTdojd95-lkp@intel.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-09Merge tag 'vfs-7.0-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains a mix of VFS cleanups, performance improvements, API fixes, documentation, and a deprecation notice. Scalability and performance: - Rework pid allocation to only take pidmap_lock once instead of twice during alloc_pid(), improving thread creation/teardown throughput by 10-16% depending on false-sharing luck. Pad the namespace refcount to reduce false-sharing - Track file lock presence via a flag in ->i_opflags instead of reading ->i_flctx, avoiding false-sharing with ->i_readcount on open/close hot paths. Measured 4-16% improvement on 24-core open-in-a-loop benchmarks - Use a consume fence in locks_inode_context() to match the store-release/load-consume idiom, eliminating a hardware fence on some architectures - Annotate cdev_lock with __cacheline_aligned_in_smp to prevent false-sharing - Remove a redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu() that never fires since the caller already verifies it, eliminating a 100% mispredicted branch - Fix a 100% mispredicted likely() in devcgroup_inode_permission() that became wrong after a prior code reorder Bug fixes and correctness: - Make insert_inode_locked() wait for inode destruction instead of skipping, fixing a corner case where two matching inodes could exist in the hash - Move f_mode initialization before file_ref_init() in alloc_file() to respect the SLAB_TYPESAFE_BY_RCU ordering contract - Add a WARN_ON_ONCE guard in try_to_free_buffers() for folios with no buffers attached, preventing a null pointer dereference when AS_RELEASE_ALWAYS is set but no release_folio op exists - Fix select restart_block to store end_time as timespec64, avoiding truncation of tv_sec on 32-bit architectures - Make dump_inode() use get_kernel_nofault() to safely access inode and superblock fields, matching the dump_mapping() pattern API modernization: - Make posix_acl_to_xattr() allocate the buffer internally since every single caller was doing it anyway. Reduces boilerplate and unnecessary error checking across ~15 filesystems - Replace deprecated simple_strtoul() with kstrtoul() for the ihash_entries, dhash_entries, mhash_entries, and mphash_entries boot parameters, adding proper error handling - Convert chardev code to use guard(mutex) and __free(kfree) cleanup patterns - Replace min_t() with min() or umin() in VFS code to avoid silently truncating unsigned long to unsigned int - Gate LOOKUP_RCU assertions behind CONFIG_DEBUG_VFS since callers already check the flag Deprecation: - Begin deprecating legacy BSD process accounting (acct(2)). The interface has numerous footguns and better alternatives exist (eBPF) Documentation: - Fix and complete kernel-doc for struct export_operations, removing duplicated documentation between ReST and source - Fix kernel-doc warnings for __start_dirop() and ilookup5_nowait() Testing: - Add a kunit test for initramfs cpio handling of entries with filesize > PATH_MAX Misc: - Add missing <linux/init_task.h> include in fs_struct.c" * tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits) posix_acl: make posix_acl_to_xattr() alloc the buffer fs: make insert_inode_locked() wait for inode destruction initramfs_test: kunit test for cpio.filesize > PATH_MAX fs: improve dump_inode() to safely access inode fields fs: add <linux/init_task.h> for 'init_fs' docs: exportfs: Use source code struct documentation fs: move initializing f_mode before file_ref_init() exportfs: Complete kernel-doc for struct export_operations exportfs: Mark struct export_operations functions at kernel-doc exportfs: Fix kernel-doc output for get_name() acct(2): begin the deprecation of legacy BSD process accounting device_cgroup: remove branch hint after code refactor VFS: fix __start_dirop() kernel-doc warnings fs: Describe @isnew parameter in ilookup5_nowait() fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS select: store end_time as timespec64 in restart block chardev: Switch to guard(mutex) and __free(kfree) namespace: Replace simple_strtoul with kstrtoul to parse boot params dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries ...
2026-02-09Merge tag 'lsm-pr-20260203' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Unify the security_inode_listsecurity() calls in NFSv4 While looking at security_inode_listsecurity() with an eye towards improving the interface, we realized that the NFSv4 code was making multiple calls to the LSM hook that could be consolidated into one. - Mark the LSM static branch keys as static - this helps resolve some sparse warnings - Add __rust_helper annotations to the LSM and cred wrapper functions - Remove the unsused set_security_override_from_ctx() function - Minor fixes to some of the LSM kdoc comment blocks * tag 'lsm-pr-20260203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: make keys for static branch static cred: remove unused set_security_override_from_ctx() rust: security: add __rust_helper to helpers rust: cred: add __rust_helper to helpers nfs: unify security_inode_listsecurity() calls lsm: fix kernel-doc struct member names
2026-02-09Merge tag 'audit-pr-20260203' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: - Improve the NETFILTER_PKT audit records Add source and destination ports to the NETFILTER_PKT audit records while also consolidating a lot of the code into a new, singular audit_log_nf_skb() function. This new approach to structuring the NETFILTER_PKT record generation should eliminate some unnecessary overhead when audit is not built into the kernel. - Update the audit syscall classifier code Add the listxattrat(), getxattrat(), and fchmodat2() syscall to the audit code which classifies syscalls into categories of operations, e.g. "read" or "change attributes". - Move the syscall classifier declarations into audit_arch.h Shuffle around some header file declarations to resolve some sparse warnings. * tag 'audit-pr-20260203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: move the compat_xxx_class[] extern declarations to audit_arch.h audit: add missing syscalls to read class audit: include source and destination ports to NETFILTER_PKT audit: add audit_log_nf_skb helper function audit: add fchmodat2() to change attributes class
2026-02-09Merge tag 'rcu.release.v7.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Boqun Feng: - RCU Tasks Trace: Re-implement RCU tasks trace in term of SRCU-fast, not only more than 500 lines of code are saved because of the reimplementation, a new set of API, rcu_read_{,un}lock_tasks_trace(), becomes possible as well. Compared to the previous rcu_read_{,un}lock_trace(), the new API avoid the task_struct accesses thanks to the SRCU-fast semantics. As a result, the old rcu_read{,un}lock_trace() API is now deprecated. - RCU Torture Test: - Multiple improvements on kvm-series.sh (parallel run and progress showing metrics) - Add context checks to rcu_torture_timer() - Make config2csv.sh properly handle comments in .boot files - Include commit discription in testid.txt - Miscellaneous RCU changes: - Reduce synchronize_rcu() latency by reporting GP kthread's CPU QS early - Use suitable gfp_flags for the init_srcu_struct_nodes() - Fix rcu_read_unlock() deadloop due to softirq - Correctly compute probability to invoke ->exp_current() in rcutorture - Make expedited RCU CPU stall warnings detect stall-end races - RCU nocb: - Remove unnecessary WakeOvfIsDeferred wake path and callback overload handling - Extract nocb_defer_wakeup_cancel() helper * tag 'rcu.release.v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (25 commits) rcu/nocb: Extract nocb_defer_wakeup_cancel() helper rcu/nocb: Remove dead callback overload handling rcu/nocb: Remove unnecessary WakeOvfIsDeferred wake path rcu: Reduce synchronize_rcu() latency by reporting GP kthread's CPU QS early srcu: Use suitable gfp_flags for the init_srcu_struct_nodes() rcu: Fix rcu_read_unlock() deadloop due to softirq rcutorture: Correctly compute probability to invoke ->exp_current() rcu: Make expedited RCU CPU stall warnings detect stall-end races rcutorture: Add --kill-previous option to terminate previous kvm.sh runs rcutorture: Prevent concurrent kvm.sh runs on same source tree torture: Include commit discription in testid.txt torture: Make config2csv.sh properly handle comments in .boot files torture: Make kvm-series.sh give run numbers and totals torture: Make kvm-series.sh give build numbers and totals torture: Parallelize kvm-series.sh guest-OS execution rcutorture: Add context checks to rcu_torture_timer() rcutorture: Test rcu_tasks_trace_expedite_current() srcu: Create an rcu_tasks_trace_expedite_current() function checkpatch: Deprecate rcu_read_{,un}lock_trace() rcu: Update Requirements.rst for RCU Tasks Trace ...
2026-02-08tracing: Better separate SNAPSHOT and MAX_TRACE optionsSteven Rostedt
The latency tracers (scheduler, irqsoff, etc) were created when tracing was first added. These tracers required a "snapshot" buffer that was the same size as the ring buffer being written to. When a new max latency was hit, the main ring buffer would swap with the snapshot buffer so that the trace leading up to the latency would be saved in the snapshot buffer (The snapshot buffer is never written to directly and the data within it can be viewed without fear of being overwritten). Later, a new feature was added to allow snapshots to be taken by user space or even event triggers. This created a "snapshot" file that allowed users to trigger a snapshot from user space to save the current trace. The config for this new feature (CONFIG_TRACER_SNAPSHOT) would select the latency tracer config (CONFIG_TRACER_MAX_LATENCY) as it would need all the functionality from it as it already existed. But this was incorrect. As the snapshot feature is really what the latency tracers need and not the other way around. Have CONFIG_TRACER_MAX_TRACE select CONFIG_TRACER_SNAPSHOT where the tracers that needs the max latency buffer selects the TRACE_MAX_TRACE which will then select TRACER_SNAPSHOT. Also, go through trace.c and trace.h and make the code that only needs the TRACER_MAX_TRACE protected by that and the code that always requires the snapshot to be protected by TRACER_SNAPSHOT. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260208183856.767870992@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Add tracer_uses_snapshot() helper to remove #ifdefsSteven Rostedt
Instead of having #ifdef CONFIG_TRACER_MAX_TRACE around every access to the struct tracer's use_max_tr field, add a helper function for that access and if CONFIG_TRACER_MAX_TRACE is not configured it just returns false. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260208183856.599390238@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Rename trace_array field max_buffer to snapshot_bufferSteven Rostedt
When tracing was first added, there were latency tracers that would take a snapshot of the current trace when a new max latency was hit. This snapshot buffer was called "max_buffer". Since then, a snapshot feature was added that allowed user space or event triggers to trigger a snapshot of the current buffer using the same max_buffer of the trace_array. As this snapshot buffer now has a more generic use case, calling it "max_buffer" is confusing. Rename it to snapshot_buffer. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260208183856.428446729@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Move pid filtering into trace_pid.cSteven Rostedt
The trace.c file was a dumping ground for most tracing code. Start organizing it better by moving various functions out into their own files. Move the PID filtering functions from trace.c into its own trace_pid.c file. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260208032450.998330662@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Move trace_printk functions out of trace.c and into trace_printk.cSteven Rostedt
The file trace.c has become a catchall for most things tracing. Start making it smaller by breaking out various aspects into their own files. Move the functions associated to the trace_printk operations out of trace.c and into trace_printk.c. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260208032450.828744197@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Use system_state in trace_printk_init_buffers()Steven Rostedt
The function trace_printk_init_buffers() is used to expand tha trace_printk buffers when trace_printk() is used within the kernel or in modules. On kernel boot up, it holds off from starting the sched switch cmdline recorder, but will start it immediately when it is added by a module. Currently it uses a trick to see if the global_trace buffer has been allocated or not to know if it was called by module load or not. But this is more of a hack, and can not be used when this code is moved out of trace.c. Instead simply look at the system_state and if it is running then it is know that it could only be called by module load. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260208032450.660237094@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Have trace_printk functions use flags instead of using global_traceSteven Rostedt
The trace.c file has become a dumping ground for all tracing code and has become quite large. In order to move the trace_printk functions out of it these functions can not access global_trace directly, as that is something that needs to stay static in trace.c. Instead of testing the trace_array tr pointer to &global_trace, test the tr->flags to see if TRACE_ARRAY_FL_GLOBAL set. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260208032450.491116245@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Make tracing_update_buffers() take NULL for global_traceSteven Rostedt
The trace.c file has become a dumping ground for all tracing code and has become quite large. In order to move the trace_printk functions out of it these functions can not access global_trace directly, as that is something that needs to stay static in trace.c. Have tracing_update_buffers() take NULL for its trace_array to denote it should work on the global_trace top level trace_array allows that function to be used outside of trace.c and still update the global_trace trace_array. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260208032450.318864210@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Make printk_trace global for tracing systemSteven Rostedt
The printk_trace is used to determine which trace_array trace_printk() writes to. By making it a global variable among the tracing subsystem it will allow the trace_printk functions to be moved out of trace.c and still have direct access to that variable. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260208032450.144525891@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Move ftrace_trace_stack() out of trace.c and into trace.hSteven Rostedt
The file trace.c has become a catchall for most things tracing. Start making it smaller by breaking out various aspects into their own files. Make ftrace_trace_stack() into a static inline that tests if stack tracing is enabled and if so to call __ftrace_trace_stack() to do the stack trace. This keeps the test inlined in the fast paths and only does the function call if stack tracing is enabled. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260208032449.974218132@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Move __trace_buffer_{un}lock_*() functions to trace.hSteven Rostedt
The file trace.c has become a catchall for most things tracing. Start making it smaller by breaking out various aspects into their own files. Move the __always_inline functions __trace_buffer_lock_reserve(), __trace_buffer_unlock_commit() and trace_event_setup() into trace.h. The trace.c file will be split up and these functions will be used in more than one of these files. As they are already __always_inline they can easily be moved into the trace.h header file. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260208032449.813550600@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Make tracing_selftest_running global to the tracing subsystemSteven Rostedt
The file trace.c has become a catchall for most things tracing. Start making it smaller by breaking out various aspects into their own files. Make the variable tracing_selftest_running global so that it can be used by other files in the tracing subsystem and trace.c can be split up. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260208032449.648932796@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Make tracing_disabled global for tracing systemSteven Rostedt
The tracing_disabled variable is set to one on boot up to prevent some parts of tracing to access the tracing infrastructure before it is set up. It also can be set after boot if an anomaly is discovered. It is currently a static variable in trace.c and can be accessed via a function call trace_is_disabled(). There's really no reason to use a function call as the tracing subsystem should be able to access it directly. By making the variable accessed directly, code can be moved out of trace.c without adding overhead of a function call to see if tracing is disabled or not. Make tracing_disabled global and remove the tracing_is_disabled() helper function. Also add some "unlikely()"s around tracing_disabled where it's checked in hot paths. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260208032449.483690153@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Clean up use of trace_create_maxlat_file()Steven Rostedt
In trace.c, the function trace_create_maxlat_file() is defined behind the #ifdef CONFIG_TRACER_MAX_TRACE block. The #else part defines it as: #define trace_create_maxlat_file(tr, d_tracer) \ trace_create_file("tracing_max_latency", TRACE_MODE_WRITE, \ d_tracer, tr, &tracing_max_lat_fops) But the one place that it it used has: #ifdef CONFIG_TRACER_MAX_TRACE trace_create_maxlat_file(tr, d_tracer); #endif Which is pointless and also wrong! It only gets created when both CONFIG_TRACE_MAX_TRACE and CONFIG_FS_NOTIFY is defined, but the file itself should not be dependent on CONFIG_FS_NOTIFY. Always create that file when TRACE_MAX_TRACE is defined regardless if FS_NOTIFY is or is not. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://patch.msgid.link/20260207191101.0e014abd@robin Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Move tracing_set_filter_buffering() into trace_events_hist.cSteven Rostedt
The function tracing_set_filter_buffering() is only used in trace_events_hist.c. Move it to that file and make it static. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260206195936.617080218@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08tracing: Have all triggers expect a file parameterSteven Rostedt
When the triggers were first created, they may not have had a file parameter passed to them and things needed to be done generically. But today, all triggers have a file parameter passed to them. Remove the generic code and add a "if (WARN_ON_ONCE(!file))" to each trigger. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20260206101351.609d8906@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-08watchdog/hardlockup: simplify perf event probe and remove per-cpu dependencyQiliang Yuan
Simplify the hardlockup detector's probe path and remove its implicit dependency on pinned per-cpu execution. Refactor hardlockup_detector_event_create() to be stateless. Return the created perf_event pointer to the caller instead of directly modifying the per-cpu 'watchdog_ev' variable. This allows the probe path to safely manage a temporary event without the risk of leaving stale pointers should task migration occur. Link: https://lkml.kernel.org/r/20260129022629.2201331-1-realwujing@gmail.com Signed-off-by: Shouxin Sun <sunshx@chinatelecom.cn> Signed-off-by: Junnan Zhang <zhangjn11@chinatelecom.cn> Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn> Signed-off-by: Qiliang Yuan <realwujing@gmail.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Cc: Jinchao Wang <wangjinchao600@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Song Liu <song@kernel.org> Cc: Thorsten Blum <thorsten.blum@linux.dev> Cc: Wang Jinchao <wangjinchao600@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-08watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()Shengming Hu
cpustat_tail indexes cpustat_util[], which is a NUM_SAMPLE_PERIODS-sized ring buffer. need_counting_irqs() currently wraps the index using NUM_HARDIRQ_REPORT, which only happens to match NUM_SAMPLE_PERIODS. Use NUM_SAMPLE_PERIODS for the wrap to keep the ring math correct even if the NUM_HARDIRQ_REPORT or NUM_SAMPLE_PERIODS changes. Link: https://lkml.kernel.org/r/tencent_7068189CB6D6689EB353F3D17BF5A5311A07@qq.com Fixes: e9a9292e2368 ("watchdog/softlockup: Report the most frequent interrupts") Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhang Run <zhang.run@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-08kho: fix doc for kho_restore_pages()Tycho Andersen (AMD)
This function returns NULL if kho_restore_page() returns NULL, which happens in a couple of corner cases. It never returns an error code. Link: https://lkml.kernel.org/r/20260123190506.1058669-1-tycho@kernel.org Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-08tests/liveupdate: add in-kernel liveupdate testPasha Tatashin
Introduce an in-kernel test module to validate the core logic of the Live Update Orchestrator's File-Lifecycle-Bound feature. This provides a low-level, controlled environment to test FLB registration and callback invocation without requiring userspace interaction or actual kexec reboots. The test is enabled by the CONFIG_LIVEUPDATE_TEST Kconfig option. Link: https://lkml.kernel.org/r/20251218155752.3045808-6-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: David Gow <davidgow@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <kees@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-08liveupdate: luo_flb: introduce File-Lifecycle-Bound global statePasha Tatashin
Introduce a mechanism for managing global kernel state whose lifecycle is tied to the preservation of one or more files. This is necessary for subsystems where multiple preserved file descriptors depend on a single, shared underlying resource. An example is HugeTLB, where multiple file descriptors such as memfd and guest_memfd may rely on the state of a single HugeTLB subsystem. Preserving this state for each individual file would be redundant and incorrect. The state should be preserved only once when the first file is preserved, and restored/finished only once the last file is handled. This patch introduces File-Lifecycle-Bound (FLB) objects to solve this problem. An FLB is a global, reference-counted object with a defined set of operations: - A file handler (struct liveupdate_file_handler) declares a dependency on one or more FLBs via a new registration function, liveupdate_register_flb(). - When the first file depending on an FLB is preserved, the FLB's .preserve() callback is invoked to save the shared global state. The reference count is then incremented for each subsequent file. - Conversely, when the last file is unpreserved (before reboot) or finished (after reboot), the FLB's .unpreserve() or .finish() callback is invoked to clean up the global resource. The implementation includes: - A new set of ABI definitions (luo_flb_ser, luo_flb_head_ser) and a corresponding FDT node (luo-flb) to serialize the state of all active FLBs and pass them via Kexec Handover. - Core logic in luo_flb.c to manage FLB registration, reference counting, and the invocation of lifecycle callbacks. - An API (liveupdate_flb_get/_incoming/_outgoing) for other kernel subsystems to safely access the live object managed by an FLB, both before and after the live update. This framework provides the necessary infrastructure for more complex subsystems like IOMMU, VFIO, and KVM to integrate with the Live Update Orchestrator. Link: https://lkml.kernel.org/r/20251218155752.3045808-5-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: David Gow <davidgow@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <kees@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-08liveupdate: luo_file: Use private listPasha Tatashin
Switch LUO to use the private list iterators. Link: https://lkml.kernel.org/r/20251218155752.3045808-4-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: David Gow <davidgow@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <kees@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-08delayacct: fix uapi timespec64 definitionArnd Bergmann
The custom definition of 'struct timespec64' is incompatible with both the kernel's internal definition and the glibc type, at least on big-endian targets that have the tv_nsec field in a different place, and the definition clashes with any userspace that also defines a timespec64 structure. Running the header check with -Wpadding enabled produces this output that warns about the incorrect padding: usr/include/linux/taskstats.h:25:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded] Remove the hack and instead use the regular __kernel_timespec type that is meant to be used in uapi definitions. Link: https://lkml.kernel.org/r/20260202095906.1344100-1-arnd@kernel.org Fixes: 29b63f6eff0e ("delayacct: add timestamp of delay max") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Jiang Kun <jiang.kun2@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-07Merge tag 'sched-urgent-2026-02-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Miscellaneous MMCID fixes to address bugs and performance regressions in the recent rewrite of the SCHED_MM_CID management code: - Fix livelock triggered by BPF CI testing - Fix hard lockup on weakly ordered systems - Simplify the dropping of CIDs in the exit path by removing an unintended transition phase - Fix performance/scalability regression on a thread-pool benchmark by optimizing transitional CIDs when scheduling out" * tag 'sched-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/mmcid: Optimize transitional CIDs when scheduling out sched/mmcid: Drop per CPU CID immediately when switching to per task mode sched/mmcid: Protect transition on weakly ordered systems sched/mmcid: Prevent live lock on task to CPU mode transition
2026-02-07workqueue: replace BUG_ON with panic in panic_on_wq_watchdogBreno Leitao
Replace BUG_ON() with panic() in panic_on_wq_watchdog(). This is not a bug condition but a deliberate forced panic requested by the user via module parameters to crash the system for debugging purposes. Using panic() instead of BUG_ON() makes this intent clearer and provides more informative output about which threshold was exceeded and the actual values, making it easier to diagnose the stall condition from crash dumps. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-07workqueue: add time-based panic for stallsBreno Leitao
Add a new module parameter 'panic_on_stall_time' that triggers a panic when a workqueue stall persists for longer than the specified duration in seconds. Unlike 'panic_on_stall' which counts accumulated stall events, this parameter triggers based on the duration of a single continuous stall. This is useful for catching truly stuck workqueues rather than accumulating transient stalls. Usage: workqueue.panic_on_stall_time=120 This would panic if any workqueue pool has been stalled for 120 seconds or more. The stall duration is measured from the workqueue last progress (poll_ts) which accounts for legitimate system stalls. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-07Merge tag 'objtool-urgent-2026-02-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar:: - Bump up the Clang minimum version requirements for livepatch builds, due to Clang assembler section handling bugs causing silent miscompilations - Strip livepatching symbol artifacts from non-livepatch modules - Fix livepatch build warnings when certain Clang LTO options are enabled - Fix livepatch build error when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y * tag 'objtool-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool/klp: Fix unexported static call key access for manually built livepatch modules objtool/klp: Fix symbol correlation for orphaned local symbols livepatch: Free klp_{object,func}_ext data after initialization livepatch: Fix having __klp_objects relics in non-livepatch modules livepatch/klp-build: Require Clang assembler >= 20
2026-02-06Merge branch 'pci/controller/tegra'Bjorn Helgaas
- Export irq_domain_free_irqs() to allow PCI/MSI drivers that tear down MSI domains to be built as modules (Aaron Kling) - Export tegra_cpuidle_pcie_irqs_in_use(), which disables Tegra CC6 while PCI IRQs are in use, so pci-tegra can be built as a module (Aaron Kling) - Allow pci-tegra to be built as a module (Aaron Kling) * pci/controller/tegra: PCI: tegra: Allow building as a module cpuidle: tegra: Export tegra_cpuidle_pcie_irqs_in_use() irqdomain: Export irq_domain_free_irqs()
2026-02-06bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy}Amery Hung
Take care of rqspinlock error in bpf_local_storage_{map_free, destroy}() properly by switching to bpf_selem_unlink_nofail(). Both functions iterate their own RCU-protected list of selems and call bpf_selem_unlink_nofail(). In map_free(), to prevent infinite loop when both map_free() and destroy() fail to remove a selem from b->list (extremely unlikely), switch to hlist_for_each_entry_rcu(). In destroy(), also switch to hlist_for_each_entry_rcu() since we no longer iterate local_storage->list under local_storage->lock. bpf_selem_unlink() now becomes dedicated to helpers and syscalls paths so reuse_now should always be false. Remove it from the argument and hardcode it. Acked-by: Alexei Starovoitov <ast@kernel.org> Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-12-ameryhung@gmail.com
2026-02-06bpf: Support lockless unlink when freeing map or local storageAmery Hung
Introduce bpf_selem_unlink_nofail() to properly handle errors returned from rqspinlock in bpf_local_storage_map_free() and bpf_local_storage_destroy() where the operation must succeeds. The idea of bpf_selem_unlink_nofail() is to allow an selem to be partially linked and use atomic operation on a bit field, selem->state, to determine when and who can free the selem if any unlink under lock fails. An selem initially is fully linked to a map and a local storage. Under normal circumstances, bpf_selem_unlink_nofail() will be able to grab locks and unlink a selem from map and local storage in sequeunce, just like bpf_selem_unlink(), and then free it after an RCU grace period. However, if any of the lock attempts fails, it will only clear SDATA(selem)->smap or selem->local_storage depending on the caller and set SELEM_MAP_UNLINKED or SELEM_STORAGE_UNLINKED according to the caller. Then, after both map_free() and destroy() see the selem and the state becomes SELEM_UNLINKED, one of two racing caller can succeed in cmpxchg the state from SELEM_UNLINKED to SELEM_TOFREE, ensuring no double free or memory leak. To make sure bpf_obj_free_fields() is done only once and when map is still present, it is called when unlinking an selem from b->list under b->lock. To make sure uncharging memory is done only when the owner is still present in map_free(), block destroy() from returning until there is no pending map_free(). Since smap may not be valid in destroy(), bpf_selem_unlink_nofail() skips bpf_selem_unlink_storage_nolock_misc() when called from destroy(). This is okay as bpf_local_storage_destroy() will return the remaining amount of memory charge tracked by mem_charge to the owner to uncharge. It is also safe to skip clearing local_storage->owner and owner_storage as the owner is being freed and no users or bpf programs should be able to reference the owner and using local_storage. Finally, access of selem, SDATA(selem)->smap and selem->local_storage are racy. Callers will protect these fields with RCU. Acked-by: Alexei Starovoitov <ast@kernel.org> Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-11-ameryhung@gmail.com
2026-02-06bpf: Prepare for bpf_selem_unlink_nofail()Amery Hung
The next patch will introduce bpf_selem_unlink_nofail() to handle rqspinlock errors. bpf_selem_unlink_nofail() will allow an selem to be partially unlinked from map or local storage. Save memory allocation method in selem so that later an selem can be correctly freed even when SDATA(selem)->smap is init to NULL. In addition, keep track of memory charge to the owner in local storage so that later bpf_selem_unlink_nofail() can return the correct memory charge to the owner. Updating local_storage->mem_charge is protected by local_storage->lock. Finally, extract miscellaneous tasks performed when unlinking an selem from local_storage into bpf_selem_unlink_storage_nolock_misc(). It will be reused by bpf_selem_unlink_nofail(). This patch also takes the chance to remove local_storage->smap, which is no longer used since commit f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage"). Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-10-ameryhung@gmail.com
2026-02-06bpf: Remove unused percpu counter from bpf_local_storage_map_freeAmery Hung
Percpu locks have been removed from cgroup and task local storage. Now that all local storage no longer use percpu variables as locks preventing recursion, there is no need to pass them to bpf_local_storage_map_free(). Remove the argument from the function. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-9-ameryhung@gmail.com
2026-02-06bpf: Remove cgroup local storage percpu counterAmery Hung
The percpu counter in cgroup local storage is no longer needed as the underlying bpf_local_storage can now handle deadlock with the help of rqspinlock. Remove the percpu counter and related migrate_{disable, enable}. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-8-ameryhung@gmail.com
2026-02-06bpf: Remove task local storage percpu counterAmery Hung
The percpu counter in task local storage is no longer needed as the underlying bpf_local_storage can now handle deadlock with the help of rqspinlock. Remove the percpu counter and related migrate_{disable, enable}. Since the percpu counter is removed, merge back bpf_task_storage_get() and bpf_task_storage_get_recur(). This will allow the bpf syscalls and helpers to run concurrently on the same CPU, removing the spurious -EBUSY error. bpf_task_storage_get(..., F_CREATE) will now always succeed with enough free memory unless being called recursively. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-7-ameryhung@gmail.com
2026-02-06bpf: Change local_storage->lock and b->lock to rqspinlockAmery Hung
Change bpf_local_storage::lock and bpf_local_storage_map_bucket::lock from raw_spin_lock to rqspinlock. Finally, propagate errors from raw_res_spin_lock_irqsave() to syscall return or BPF helper return. In bpf_local_storage_destroy(), ignore return from raw_res_spin_lock_irqsave() for now. A later patch will correctly handle errors correctly in bpf_local_storage_destroy() so that it can unlink selems even when failing to acquire locks. For __bpf_local_storage_map_cache(), instead of handling the error, skip updating the cache. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-6-ameryhung@gmail.com
2026-02-06bpf: Convert bpf_selem_unlink to failableAmery Hung
To prepare changing both bpf_local_storage_map_bucket::lock and bpf_local_storage::lock to rqspinlock, convert bpf_selem_unlink() to failable. It still always succeeds and returns 0 until the change happens. No functional change. Open code bpf_selem_unlink_storage() in the only caller, bpf_selem_unlink(), since unlink_map and unlink_storage must be done together after all the necessary locks are acquired. For bpf_local_storage_map_free(), ignore the return from bpf_selem_unlink() for now. A later patch will allow it to unlink selems even when failing to acquire locks. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-5-ameryhung@gmail.com
2026-02-06bpf: Convert bpf_selem_link_map to failableAmery Hung
To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock, convert bpf_selem_link_map() to failable. It still always succeeds and returns 0 until the change happens. No functional change. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-4-ameryhung@gmail.com
2026-02-06bpf: Convert bpf_selem_unlink_map to failableAmery Hung
To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock, convert bpf_selem_unlink_map() to failable. It still always succeeds and returns 0 for now. Since some operations updating local storage cannot fail in the middle, open-code bpf_selem_unlink_map() to take the b->lock before the operation. There are two such locations: - bpf_local_storage_alloc() The first selem will be unlinked from smap if cmpxchg owner_storage_ptr fails, which should not fail. Therefore, hold b->lock when linking until allocation complete. Helpers that assume b->lock is held by callers are introduced: bpf_selem_link_map_nolock() and bpf_selem_unlink_map_nolock(). - bpf_local_storage_update() The three step update process: link_map(new_selem), link_storage(new_selem), and unlink_map(old_selem) should not fail in the middle. In bpf_selem_unlink(), bpf_selem_unlink_map() and bpf_selem_unlink_storage() should either all succeed or fail as a whole instead of failing in the middle. So, return if unlink_map() failed. Remove the selem_linked_to_map_lockless() check as an selem in the common paths (not bpf_local_storage_map_free() or bpf_local_storage_destroy()), will be unlinked under b->lock and local_storage->lock and therefore no other threads can unlink the selem from map at the same time. In bpf_local_storage_destroy(), ignore the return of bpf_selem_unlink_map() for now. A later patch will allow bpf_local_storage_destroy() to unlink selems even when failing to acquire locks. Note that while this patch removes all callers of selem_linked_to_map(), a later patch that introduces bpf_selem_unlink_nofail() will use it again. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-3-ameryhung@gmail.com
2026-02-06bpf: Select bpf_local_storage_map_bucket based on bpf_local_storageAmery Hung
A later bpf_local_storage refactor will acquire all locks before performing any update. To simplified the number of locks needed to take in bpf_local_storage_map_update(), determine the bucket based on the local_storage an selem belongs to instead of the selem pointer. Currently, when a new selem needs to be created to replace the old selem in bpf_local_storage_map_update(), locks of both buckets need to be acquired to prevent racing. This can be simplified if the two selem belongs to the same bucket so that only one bucket needs to be locked. Therefore, instead of hashing selem, hashing the local_storage pointer the selem belongs. Performance wise, this is slightly better as update now requires locking one bucket. It should not change the level of contention on one bucket as the pointers to local storages of selems in a map are just as unique as pointers to selems. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-2-ameryhung@gmail.com
2026-02-06Merge tag 'trace-v6.19-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: - Fix event format field alignments for 32 bit architectures The fields in the event format files are used to parse the raw binary buffer data by applications. If they are incorrect, then the application produces garbage. On 32 bit architectures, the function graph 64bit calltime and rettime were off by 4bytes. That's because the actual fields are in a packed structure but the macros used by the ftrace events did not mark them as packed, and instead, gave them their natural alignment which made their offsets off by 4 bytes. There are macros to have a packed field within an embedded structure of an event, but there's no macro for normal fields within a packed structure of the event. The macro __field_packed() was used for the packed embedded structure field. Rename that to __field_desc_packed() (to match the non-packed embedded field macro __field_desc()), and make __field_packed() for fields that are in a packed event structure (which matches the unpacked __field() macro). Switch the calltime and rettime fields of the function graph event to use the new __field_packed() and this makes the offsets correct. * tag 'trace-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix ftrace event field alignments
2026-02-06tracing/kprobes: Skip setup_boot_kprobe_events() when no cmdline eventYaxiong Tian
When the 'kprobe_event=' kernel command-line parameter is not provided, there is no need to execute setup_boot_kprobe_events(). This change optimizes the initialization function init_kprobe_trace() by skipping unnecessary work and effectively prevents potential blocking that could arise from contention on the event_mutex lock in subsequent operations. Link: https://patch.msgid.link/20260204015401.163748-1-tianyaxiong@kylinos.cn Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-06blktrace: Make init_blk_tracer() asynchronousYaxiong Tian
The init_blk_tracer() function causes significant boot delay as it waits for the trace_event_sem lock held by trace_event_update_all(). Specifically, its child function register_trace_event() requires this lock, which is occupied for an extended period during boot. To resolve this, the execution of primary init_blk_tracer() is moved to the trace_init_wq workqueue, allowing it to run asynchronously, and prevent blocking the main boot thread. Link: https://patch.msgid.link/20260204015353.163331-1-tianyaxiong@kylinos.cn Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>