path: root/include/linux
Age | Commit message | Author
2026-02-09 | Merge tag 'pm-6.20-rc1' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "By the number of commits, cpufreq is the leading party (again) and the most visible change there is the removal of the omap-cpufreq driver that has not been used for a long time (good riddance). There are also quite a few changes in the cppc_cpufreq driver, mostly related to fixing its frequency invariance engine in the case when the CPPC registers used by it are not in PCC. In addition to that, support for AM62L3 is added to the ti-cpufreq driver and the cpufreq-dt-platdev list is updated for some platforms. The remaining cpufreq changes are assorted fixes and cleanups. Next up is cpuidle and the changes there are dominated by intel_idle driver updates, mostly related to the new command line facility allowing users to adjust the list of C-states used by the driver. There are also a few updates of cpuidle governors, including two menu governor fixes and some refinements of the teo governor, and a MAINTAINERS update adding Christian Loehle as a cpuidle reviewer. [Thanks for stepping up Christian!] The most significant update related to system suspend and hibernation is the one to stop freezing the PM runtime workqueue during system PM transitions which allows some deadlocks to be avoided. There is also a fix for possible concurrent bit field updates in the core device suspend code and a few other minor fixes. Apart from the above, several drivers are updated to discard the return value of pm_runtime_put() which is going to be converted to a void function as soon as everybody stops using its return value, PL4 support for Ice Lake is added to the Intel RAPL power capping driver, and there are assorted cleanups, documentation fixes, and some cpupower utility improvements. 
Specifics: - Remove the unused omap-cpufreq driver (Andreas Kemnade) - Optimize error handling code in cpufreq_boost_trigger_state() and make cpufreq_boost_trigger_state() return -EOPNOTSUPP if no policy supports boost (Lifeng Zheng) - Update cpufreq-dt-platdev list for tegra, qcom, TI (Aaron Kling, Dhruva Gole, and Konrad Dybcio) - Minor improvements to the cpufreq and cpumask rust implementation (Alexandre Courbot, Alice Ryhl, Tamir Duberstein, and Yilin Chen) - Add support for AM62L3 SoC to the ti-cpufreq driver (Dhruva Gole) - Update arch_freq_scale in the CPPC cpufreq driver's frequency invariance engine (FIE) in scheduler ticks if the related CPPC registers are not in PCC (Jie Zhan) - Assorted minor cleanups and improvements in ARM cpufreq drivers (Juan Martinez, Felix Gu, Luca Weiss, and Sergey Shtylyov) - Add generic helpers for sysfs show/store to cppc_cpufreq (Sumit Gupta) - Make the scaling_setspeed cpufreq sysfs attribute return the actual requested frequency to avoid confusion (Pengjie Zhang) - Simplify the idle CPU time granularity test in the ondemand cpufreq governor (Frederic Weisbecker) - Enable asym capacity in intel_pstate only when CPU SMT is not possible (Yaxiong Tian) - Update the description of rate_limit_us default value in cpufreq documentation (Yaxiong Tian) - Add a command line option to adjust the C-states table in the intel_idle driver, remove the 'preferred_cstates' module parameter from it, add C-states validation to it and clean it up (Artem Bityutskiy) - Make the menu cpuidle governor always check the time till the closest timer event when the scheduler tick has been stopped to prevent it from mistakenly selecting the deepest available idle state (Rafael Wysocki) - Update the teo cpuidle governor to avoid making suboptimal decisions in certain corner cases and generally improve idle state selection accuracy (Rafael Wysocki) - Remove an unlikely() annotation on the early-return condition in menu_select() that leads to branch 
misprediction 100% of the time on systems with only 1 idle state enabled, like ARM64 servers (Breno Leitao) - Add Christian Loehle to MAINTAINERS as a cpuidle reviewer (Christian Loehle) - Stop flagging the PM runtime workqueue as freezable to avoid system suspend and resume deadlocks in subsystems that assume asynchronous runtime PM to work during system-wide PM transitions (Rafael Wysocki) - Drop redundant NULL pointer checks before acomp_request_free() from the hibernation code handling image saving (Rafael Wysocki) - Update wakeup_sources_walk_start() to handle empty lists of wakeup sources as appropriate (Samuel Wu) - Make dev_pm_clear_wake_irq() check the power.wakeirq value under power.lock to avoid race conditions (Gui-Dong Han) - Avoid bit field races related to power.work_in_progress in the core device suspend code (Xuewen Yan) - Make several drivers discard pm_runtime_put() return value in preparation for converting that function to a void one (Rafael Wysocki) - Add PL4 support for Ice Lake to the Intel RAPL power capping driver (Daniel Tang) - Replace sprintf() with sysfs_emit() in power capping sysfs show functions (Sumeet Pawnikar) - Make dev_pm_opp_get_level() return value match the documentation after a previous update of the latter (Aleks Todorov) - Use scoped for each OF child loop in the OPP code (Krzysztof Kozlowski) - Fix a bug in an example code snippet and correct typos in the energy model management documentation (Patrick Little) - Fix miscellaneous problems in cpupower (Kaushlendra Kumar): * idle_monitor: Fix incorrect value logged after stop * Fix inverted APERF capability check * Use strcspn() to strip trailing newline * Reset errno before strtoull() * Show C0 in idle-info dump - Improve cpupower installation procedure by making the systemd step optional and allowing users to disable the installation of systemd's unit file (João Marcos Costa)" * tag 'pm-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 
commits) PM: sleep: core: Avoid bit field races related to work_in_progress PM: sleep: wakeirq: harden dev_pm_clear_wake_irq() against races cpufreq: Documentation: Update description of rate_limit_us default value cpufreq: intel_pstate: Enable asym capacity only when CPU SMT is not possible PM: wakeup: Handle empty list in wakeup_sources_walk_start() PM: EM: Documentation: Fix bug in example code snippet Documentation: Fix typos in energy model documentation cpuidle: governors: teo: Refine intercepts-based idle state lookup cpuidle: governors: teo: Adjust the classification of wakeup events cpufreq: ondemand: Simplify idle cputime granularity test cpufreq: userspace: make scaling_setspeed return the actual requested frequency PM: hibernate: Drop NULL pointer checks before acomp_request_free() cpufreq: CPPC: Add generic helpers for sysfs show/store cpufreq: scmi: Fix device_node reference leak in scmi_cpu_domain_id() cpufreq: ti-cpufreq: add support for AM62L3 SoC cpufreq: dt-platdev: Add ti,am62l3 to blocklist cpufreq/amd-pstate: Add comment explaining nominal_perf usage for performance policy cpufreq: scmi: correct SCMI explanation cpufreq: dt-platdev: Block the driver from probing on more QC platforms rust: cpumask: rename methods of Cpumask for clarity and consistency ...
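Two of the cpupower fixes listed above ("Use strcspn() to strip trailing newline" and "Reset errno before strtoull()") are standard C idioms that can be shown in isolation. The following is a minimal userspace sketch, not the actual cpupower code; the helper names `strip_newline()` and `parse_u64()` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: cut the string at the first '\n'.
 * strcspn() returns the string length when no newline is present,
 * so this is safe on empty strings and on strings without a newline.
 */
static void strip_newline(char *s)
{
	assert(s != NULL);
	s[strcspn(s, "\n")] = '\0';
}

/*
 * Hypothetical helper: parse an unsigned decimal number.
 * Returns 0 on success and -1 on failure.
 */
static int parse_u64(const char *s, unsigned long long *out)
{
	char *end;
	unsigned long long v;

	assert(s != NULL && out != NULL);

	/* strtoull() sets errno on failure but never clears it, so a
	 * stale value from an earlier call must be reset first. */
	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno != 0 || end == s || *end != '\0')
		return -1;

	*out = v;
	return 0;
}
```

A caller would typically run `strip_newline()` on a line read from a sysfs file and then hand the result to `parse_u64()`. Without the `errno = 0` reset, a leftover `ERANGE` from an unrelated earlier call would make a valid number look like an overflow.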
2026-02-09 | Merge tag 'acpi-6.20-rc1' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "This one is significantly larger than previous ACPI support pull requests because several significant updates have coincided in it. First, there is a routine ACPICA code update, to upstream version 20251212, but this time it covers new ACPI 6.6 material that has not been covered yet. Among other things, it includes definitions of a few new ACPI tables and updates of some others, like the GICv5 MADT structures and ARM IORT IWB node definitions that are used for adding GICv5 ACPI probing on ARM (that technically is IRQ subsystem material, but it depends on the ACPICA changes, so it is included here). The latter alone adds a few hundred lines of new code. Second, there is an update of ACPI _OSC handling including a fix that prevents failures from occurring in some corner cases due to careless handling of _OSC error bits. On top of that, the "system resource" ACPI device objects with the PNP0C01 and PNP0C02 are now going to be handled by the ACPI core device enumeration code instead of handing them over to the legacy PNP system driver which causes device enumeration issues to occur. Some of those issues have been worked around in device drivers and elsewhere and those workarounds should not be necessary any more, so they are going away. Moreover, the time has come to convert all "core ACPI" device drivers that were still using struct acpi_driver objects for device binding into proper platform drivers that use struct platform_driver for this purpose. These updates are accompanied by some requisite core ACPI device enumeration code changes. Next, there are ACPI APEI updates, including changes to avoid excess overhead in the NMI handler and in SEA on the ARM side, changes to unify ACPI-based HW error tracing and logging, and changes to prevent APEI code from reaching out of its allocated memory. 
There are also some ACPI power management updates, mostly related to the ACPI cpuidle support in the processor driver, suspend-to-idle handling on systems with ACPI support and to ACPI PM of devices. In addition to the above, bugs are fixed and the code is cleaned up in assorted places all over. Specifics: - Update the ACPICA code in the kernel to upstream version 20251212 which includes the following changes: * Add support for new ACPI table DTPR (Michal Camacho Romero) * Release objects with acpi_ut_delete_object_desc() (Zilin Guan) * Add UUIDs for Microsoft fan extensions and UUIDs associated with TPM 2.0 devices (Armin Wolf) * Fix NULL pointer dereference in acpi_ev_address_space_dispatch() (Alexey Simakov) * Add KEYP ACPI table definition (Dave Jiang) * Add support for the Microsoft display mux _OSI string (Armin Wolf) * Add definitions for the IOVT ACPI table (Xianglai Li) * Abort AML bytecode execution on AML_FATAL_OP (Armin Wolf) * Include all fields in subtable type1 for PPTT (Ben Horgan) * Add GICv5 MADT structures and Arm IORT IWB node definitions (Jose Marinho) * Update Parameter Block structure for RAS2 and add a new flag in Memory Affinity Structure for SRAT (Pawel Chmielewski) * Add _VDM (Voltage Domain) object (Pawel Chmielewski) - Add support for GICv5 ACPI probing on ARM which is based on the GICv5 MADT structures and ARM IORT IWB node definitions recently added to ACPICA (Lorenzo Pieralisi) - Rework ACPI PM notification setup for PCI root buses and modify the ACPI PM setup for devices to register wakeup source objects under physical (that is, PCI, platform, etc.) 
devices instead of doing that under their ACPI companions (Rafael Wysocki) - Adjust debug messages regarding postponed ACPI PM printed during system resume to be more accurate (Rafael Wysocki) - Remove dead code from lps0_device_attach() (Gergo Koteles) - Start to invoke Microsoft Function 9 (Turn On Display) of the Low- Power S0 Idle (LPS0) _DSM in the suspend-to-idle resume flow on systems with ACPI LPS0 support to address a functional issue on Lenovo Yoga Slim 7i Aura (15ILL9), where system fans and keyboard backlights fail to resume after suspend (Jakob Riemenschneider) - Add sysfs attribute cid for exposing _CID lists under ACPI device objects (Rafael Wysocki) - Replace sprintf() with sysfs_emit() in all of the core ACPI sysfs interface code (Sumeet Pawnikar) - Use acpi_get_local_u64_address() in the code implementing ACPI support for PCI to evaluate _ADR instead of evaluating that object directly (Andy Shevchenko) - Add JWIPC JVC9100 to irq1_level_low_skip_override[] to unbreak serial IRQs on that system (Ai Chao) - Fix handling of _OSC errors in acpi_run_osc() to avoid failures on systems where _OSC error bits are set even though the _OSC return buffer contains acknowledged feature bits (Rafael Wysocki) - Clean up and rearrange \_SB._OSC handling for general platform features and USB4 features to avoid code duplication and unnecessary memory management overhead (Rafael Wysocki) - Make the ACPI core device enumeration code handle PNP0C01 and PNP0C02 ("system resource") device objects directly instead of letting the legacy PNP system driver handle them to avoid device enumeration issues on systems where PNP0C02 is present in the _CID list under ACPI device objects with a _HID matching a proper device driver in Linux (Rafael Wysocki) - Drop workarounds for the known device enumeration issues related to _CID lists containing PNP0C02 (Rafael Wysocki) - Drop outdated comment regarding removed function in the ACPI-based device enumeration code (Julia Lawall) - Make 
PRP0001 device matching work as expected for ACPI device objects using it as a _HID for board development and similar purposes (Kartik Rajput) - Use async schedule function in acpi_scan_clear_dep_fn() to avoid races with user space initialization on some systems (Yicong Yang) - Add a piece of documentation explaining why binding drivers directly to ACPI device objects is not a good idea in general and why it is desirable to convert drivers doing so into proper platform drivers that use struct platform_driver for device binding (Rafael Wysocki) - Convert multiple "core ACPI" drivers, including the NFIT ACPI device driver, the generic ACPI button drivers, the generic ACPI thermal zone driver, the ACPI hardware event device (HED) driver, the ACPI EC driver, the ACPI SMBUS HC driver, the ACPI Smart Battery Subsystem (SBS) driver, and the ACPI backlight (video) driver to proper platform drivers that use struct platform_driver for device binding (Rafael Wysocki) - Use acpi_get_local_u64_address() in the ACPI backlight (video) driver to evaluate _ADR instead of evaluating that object directly (Andy Shevchenko) - Convert the generic ACPI battery driver to a proper platform driver using struct platform_driver for device binding (Rafael Wysocki) - Fix incorrect charging status when current is zero in the generic ACPI battery driver (Ata İlhan Köktürk) - Use LIST_HEAD() for initializing a stack-allocated list in the generic ACPI watchdog device driver (Can Peng) - Rework the ACPI idle driver initialization to register it directly from the common initialization code instead of doing that from a CPU hotplug "online" callback and clean it up (Huisong Li, Rafael Wysocki) - Fix a possible NULL pointer dereference in acpi_processor_errata_piix4() (Tuo Li) - Make read-only array non_mmio_desc[] static const (Colin Ian King) - Prevent the APEI GHES support code on ARM from accessing memory out of bounds or going past the ARM processor CPER record buffer (Mauro Carvalho Chehab) - 
Prevent cper_print_fw_err() from dumping the entire memory on systems with defective firmware (Mauro Carvalho Chehab) - Improve ghes_notify_nmi() status check to avoid unnecessary overhead in the NMI handler by carrying out all of the requisite preparations and the NMI registration time (Tony Luck) - Refactor the GHES driver by extracting common functionality into reusable helper functions to reduce code duplication and improve the ghes_notify_sea() status check in analogy with the previous ghes_notify_nmi() status check improvement (Shuai Xue) - Make ELOG and GHES log and trace consistently and support the CPER CXL protocol analogously (Fabio De Francesco) - Disable KASAN instrumentation in the APEI GHES driver when compile testing with clang < 18 (Nathan Chancellor) - Let ghes_edac be the preferred driver to load on __ZX__ and _BYO_ systems by extending the platform detection list in the APEI GHES driver (Tony W Wang-oc) - Clean up cppc_perf_caps and cppc_perf_ctrls structs and rename EPP constants for clarity in the ACPI CPPC library (Sumit Gupta)" * tag 'acpi-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (117 commits) ACPI: battery: fix incorrect charging status when current is zero ACPI: scan: Use async schedule function in acpi_scan_clear_dep_fn() ACPI: x86: s2idle: Invoke Microsoft _DSM Function 9 (Turn On Display) ACPI: APEI: GHES: Add ghes_edac support for __ZX__ and _BYO_ systems ACPI: APEI: GHES: Disable KASAN instrumentation when compile testing with clang < 18 ACPI: sysfs: Replace sprintf() with sysfs_emit() ACPI: CPPC: Rename EPP constants for clarity ACPI: CPPC: Clean up cppc_perf_caps and cppc_perf_ctrls structs ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe() ACPI: processor: idle: Convert acpi_processor_setup_cpuidle_dev() to void ACPI: processor: idle: Convert acpi_processor_setup_cpuidle_states() to void irqchip/gic-v5: Add ACPI IWB probing irqchip/gic-v5: Add ACPI ITS probing 
irqchip/gic-v5: Add ACPI IRS probing irqchip/gic-v5: Split IRS probing into OF and generic portions PCI/MSI: Make the pci_msi_map_rid_ctlr_node() interface firmware agnostic irqdomain: Add parent field to struct irqchip_fwid ACPI: PCI: simplify code with acpi_get_local_u64_address() ACPI: video: simplify code with acpi_get_local_u64_address() ACPI: PM: Adjust messages regarding postponed ACPI PM ...
2026-02-09 | Merge tag 'for-7.0/block-stable-pages-20260206' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull bounce buffer dio for stable pages from Jens Axboe:

"This adds support for bounce buffering of dio for stable pages. This was all done by Christoph. In his words:

This series tries to address the problem that under I/O pages can be modified during direct I/O, even when the device or file system requires stable pages during I/O to calculate checksums, parity or data operations. It does so by adding block layer helpers to bounce buffer an iov_iter into a bio, then wires that up in iomap and ultimately XFS.

The reason that the file system even needs to know about it is that reads need a user context to copy the data back, and the infrastructure to defer ioends to a workqueue currently sits in XFS. I'm going to look into moving that into ioend and enabling it for other file systems. Additionally btrfs already has its own infrastructure for this, and actually an urgent need to bounce buffer, so this should be useful there and could be wired up easily. In fact the idea comes from patches by Qu that did this in btrfs.

This patch fixes all but one xfstests failure on T10 PI capable devices (generic/095 seems to have issues with a mix of mmap and splice still, I'm looking into that separately), and makes qemu VMs running Windows, or Linux with swap enabled, work fine on an XFS file on a device using PI.

Performance numbers on my (not exactly state of the art) NVMe PI test setup:

Sequential reads using io_uring, QD=16. Bandwidth and CPU usage (usr/sys):

  | size | zero copy                | bounce                   |
  +------+--------------------------+--------------------------+
  | 4k   | 1316MiB/s (12.65/55.40%) | 1081MiB/s (11.76/49.78%) |
  | 64K  | 3370MiB/s ( 5.46/18.20%) | 3365MiB/s ( 4.47/15.68%) |
  | 1M   | 3401MiB/s ( 0.76/23.05%) | 3400MiB/s ( 0.80/09.06%) |
  +------+--------------------------+--------------------------+

Sequential writes using io_uring, QD=16. Bandwidth and CPU usage (usr/sys):

  | size | zero copy                | bounce                   |
  +------+--------------------------+--------------------------+
  | 4k   |  882MiB/s (11.83/33.88%) |  750MiB/s (10.53/34.08%) |
  | 64K  | 2009MiB/s ( 7.33/15.80%) | 2007MiB/s ( 7.47/24.71%) |
  | 1M   | 1992MiB/s ( 7.26/ 9.13%) | 1992MiB/s ( 9.21/19.11%) |
  +------+--------------------------+--------------------------+

Note that the 64k read numbers look really odd to me for the baseline zero copy case, but are reproducible over many repeated runs.

The bounce read numbers should further improve when moving the PI validation to the file system and removing the double context switch, for which I have patches that will be sent out soon"

* tag 'for-7.0/block-stable-pages-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  xfs: use bounce buffering direct I/O when the device requires stable pages
  iomap: add a flag to bounce buffer direct I/O
  iomap: support ioends for direct reads
  iomap: rename IOMAP_DIO_DIRTY to IOMAP_DIO_USER_BACKED
  iomap: free the bio before completing the dio
  iomap: share code between iomap_dio_bio_end_io and iomap_finish_ioend_direct
  iomap: split out the per-bio logic from iomap_dio_bio_iter
  iomap: simplify iomap_dio_bio_iter
  iomap: fix submission side handling of completion side errors
  block: add helpers to bounce buffer an iov_iter into bios
  block: remove bio_release_page
  iov_iter: extract a iov_iter_extract_bvecs helper from bio code
  block: open code bio_add_page and fix handling of mismatching P2P ranges
  block: refactor get_contig_folio_len
  block: add a BIO_MAX_SIZE constant and use it
2026-02-09 | Merge tag 'for-7.0/block-20260206' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:
 - Support for batch request processing for ublk, improving the efficiency of the kernel/ublk server communication. This can yield nice 7-12% performance improvements
 - Support for integrity data for ublk
 - Various other ublk improvements and additions, including a ton of selftests additions and updates
 - Move the handling of blk-crypto software fallback from below the block layer to above it. This reduces the complexity of dealing with bio splitting
 - Series fixing a number of potential deadlocks in blk-mq related to the queue usage counter and writeback throttling and rq-qos debugfs handling
 - Add an async_depth queue attribute, to resolve a performance regression that's been around for a while related to the scheduler depth handling
 - Only use task_work for IOPOLL completions on NVMe, if it is necessary to do so. An earlier fix for an issue resulted in all these completions being punted to task_work, to guarantee that completions were only run for a given io_uring ring when it was local to that ring. With the new changes, we can detect if it's necessary to use task_work or not, and avoid it if possible.
- rnbd fixes: - Fix refcount underflow in device unmap path - Handle PREFLUSH and NOUNMAP flags properly in protocol - Fix server-side bi_size for special IOs - Zero response buffer before use - Fix trace format for flags - Add .release to rnbd_dev_ktype - MD pull requests via Yu Kuai - Fix raid5_run() to return error when log_init() fails - Fix IO hang with degraded array with llbitmap - Fix percpu_ref not resurrected on suspend timeout in llbitmap - Fix GPF in write_page caused by resize race - Fix NULL pointer dereference in process_metadata_update - Fix hang when stopping arrays with metadata through dm-raid - Fix any_working flag handling in raid10_sync_request - Refactor sync/recovery code path, improve error handling for badblocks, and remove unused recovery_disabled field - Consolidate mddev boolean fields into mddev_flags - Use mempool to allocate stripe_request_ctx and make sure max_sectors is not less than io_opt in raid5 - Fix return value of mddev_trylock - Fix memory leak in raid1_run() - Add Li Nan as mdraid reviewer - Move phys_vec definitions to the kernel types, mostly in preparation for some VFIO and RDMA changes - Improve the speed for secure erase for some devices - Various little rust updates - Various other minor fixes, improvements, and cleanups * tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits) blk-mq: ABI/sysfs-block: fix docs build warnings selftests: ublk: organize test directories by test ID block: decouple secure erase size limit from discard size limit block: remove redundant kill_bdev() call in set_blocksize() blk-mq: add documentation for new queue attribute async_dpeth block, bfq: convert to use request_queue->async_depth mq-deadline: covert to use request_queue->async_depth kyber: covert to use request_queue->async_depth blk-mq: add a new queue sysfs attribute async_depth blk-mq: factor out a helper blk_mq_limit_depth() blk-mq-sched: unify elevators checking for async 
requests block: convert nr_requests to unsigned int block: don't use strcpy to copy blockdev name blk-mq-debugfs: warn about possible deadlock blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs() blk-mq-debugfs: remove blk_mq_debugfs_unregister_rqos() blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static blk-rq-qos: fix possible debugfs_mutex deadlock blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos blk-wbt: fix possible deadlock to nest pcpu_alloc_mutex under q_usage_counter ...
2026-02-09 | Merge tag 'io_uring-bpf-restrictions.4-20260206' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring bpf filters from Jens Axboe: "This adds support for both cBPF filters for io_uring, as well as task inherited restrictions and filters. seccomp and io_uring don't play along nicely, as most of the interesting data to filter on resides somewhat out-of-band, in the submission queue ring. As a result, things like containers and systemd that apply seccomp filters, can't filter io_uring operations. That leaves them with just one choice if filtering is critical - filter the actual io_uring_setup(2) system call to simply disallow io_uring. That's rather unfortunate, and has limited us because of it. io_uring already has some filtering support. It requires the ring to be setup in a disabled state, and then a filter set can be applied. This filter set is completely bi-modal - an opcode is either enabled or it's not. Once a filter set is registered, the ring can be enabled. This is very restrictive, and it's not useful at all to systemd or containers which really want both broader and more specific control. This first adds support for cBPF filters for opcodes, which enables tighter control over what exactly a specific opcode may do. As examples, specific support is added for IORING_OP_OPENAT/OPENAT2, allowing filtering on resolve flags. And another example is added for IORING_OP_SOCKET, allowing filtering on domain/type/protocol. These are both common use cases. cBPF was chosen rather than eBPF, because the latter is often restricted in containers as well. These filters are run post the init phase of the request, which allows filters to even dip into data that is being passed in struct in user memory, as the init side of requests make that data stable by bringing it into the kernel. This allows filtering without needing to copy this data twice, or have filters etc know about the exact layout of the user data. The filters get the already copied and sanitized data passed. 
On top of that support is added for per-task filters, meaning that any ring created with a task that has a per-task filter will get those filters applied when it's created. These filters are inherited across fork as well. Once a filter has been registered, any further added filters may only further restrict what operations are permitted. Filters cannot change the return value of an operation, they can only permit or deny it based on the contents" * tag 'io_uring-bpf-restrictions.4-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring: allow registration of per-task restrictions io_uring: add task fork hook io_uring/bpf_filter: add ref counts to struct io_bpf_filter io_uring/bpf_filter: cache lookup table in ctx->bpf_filters io_uring/bpf_filter: allow filtering on contents of struct open_how io_uring/net: allow filtering on IORING_OP_SOCKET data io_uring: add support for BPF filtering for opcode restrictions
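The rule stated above, that further added filters may only further restrict what is permitted and can only permit or deny, can be modelled as a chain in which an operation passes only if every registered filter permits it. The sketch below is a toy userspace model under that assumption. It does not use the real io_uring registration interface or cBPF programs, and every name in it (`toy_filter_chain`, `toy_chain_add()`, `toy_chain_permits()`, the sample filters) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FILTERS 8

/* Hypothetical filter callback: true permits the operation, false denies it. */
typedef bool (*toy_filter_fn)(int opcode, unsigned int flags);

struct toy_filter_chain {
	toy_filter_fn fns[TOY_MAX_FILTERS];
	size_t count;
};

/* Append a filter; existing filters are never removed or replaced. */
static int toy_chain_add(struct toy_filter_chain *c, toy_filter_fn fn)
{
	assert(c != NULL && fn != NULL);
	if (c->count == TOY_MAX_FILTERS)
		return -1;
	c->fns[c->count++] = fn;
	return 0;
}

/*
 * An operation is permitted only if all filters permit it, so adding
 * a filter can shrink the permitted set but never grow it.
 */
static bool toy_chain_permits(const struct toy_filter_chain *c,
			      int opcode, unsigned int flags)
{
	assert(c != NULL);
	for (size_t i = 0; i < c->count; i++)
		if (!c->fns[i](opcode, flags))
			return false;
	return true;
}

/* Sample filter: permit only the made-up opcode 1. */
static bool toy_only_opcode_1(int opcode, unsigned int flags)
{
	(void)flags;
	return opcode == 1;
}

/* Sample filter: deny any request that has flag bit 0 set. */
static bool toy_deny_flag_bit0(int opcode, unsigned int flags)
{
	(void)opcode;
	return (flags & 1u) == 0;
}
```

With an empty chain everything passes. After adding `toy_only_opcode_1` only opcode 1 passes, and after also adding `toy_deny_flag_bit0` opcode 1 passes only when flag bit 0 is clear. Inheritance across fork could be modelled by copying the chain into the child, which is again only an assumption about how one might sketch it.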
2026-02-09 | Merge tag 'for-7.0/io_uring-20260206' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring updates from Jens Axboe:
 - Clean up the IORING_SETUP_R_DISABLED and submitter task checking, mostly just in preparation for relaxing the locking for SINGLE_ISSUER in the future.
 - Improve IOPOLL by using a doubly linked list to manage completions. Previously it was singly linked, which meant that to complete request N in the chain, requests 0..N-1 had to have completed first. With a doubly linked list we can complete whatever request completes in that order, rather than needing to wait for a consecutive range to be available. This reduces latencies.
 - Improve the restriction setup and checking. Mostly in preparation for adding further features on top of that. Coming in a separate pull request.
 - Split out task_work and wait handling into separate files. These are mostly nicely abstracted already, but still remained in the io_uring.c file which is on the larger side.
 - Use GFP_KERNEL_ACCOUNT in a few more spots, where appropriate.
 - Ensure even the idle io-wq worker exits if a task no longer has any rings open.
 - Add support for a non-circular submission queue. By default, the SQ ring keeps moving around, even if only a few entries are used for each submission. This can be wasteful in terms of cachelines. If IORING_SETUP_SQ_REWIND is set for the ring when created, each submission will start at offset 0 instead of where we last left off doing submissions.
- Various little cleanups * tag 'for-7.0/io_uring-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (30 commits) io_uring/kbuf: fix memory leak if io_buffer_add_list fails io_uring: Add SPDX id lines to remaining source files io_uring: allow io-wq workers to exit when unused io_uring/io-wq: add exit-on-idle state io_uring/net: don't continue send bundle if poll was required for retry io_uring/rsrc: use GFP_KERNEL_ACCOUNT consistently io_uring/futex: use GFP_KERNEL_ACCOUNT for futex data allocation io_uring/io-wq: handle !sysctl_hung_task_timeout_secs io_uring: fix bad indentation for setup flags if statement io_uring/rsrc: take unsigned index in io_rsrc_node_lookup() io_uring: introduce non-circular SQ io_uring: split out CQ waiting code into wait.c io_uring: split out task work code into tw.c io_uring/io-wq: don't trigger hung task for syzbot craziness io_uring: add IO_URING_EXIT_WAIT_MAX definition io_uring/sync: validate passed in offset io_uring/eventfd: remove unused ctx->evfd_last_cq_tail member io_uring/timeout: annotate data race in io_flush_timeouts() io_uring/uring_cmd: explicitly disallow cancelations for IOPOLL io_uring: fix IOPOLL with passthrough I/O ...
2026-02-09 | Merge tag 'pull-filename' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs 'struct filename' updates from Al Viro: "[Mostly] sanitize struct filename handling" * tag 'pull-filename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (68 commits) sysfs(2): fs_index() argument is _not_ a pathname alpha: switch osf_mount() to strndup_user() ksmbd: use CLASS(filename_kernel) mqueue: switch to CLASS(filename) user_statfs(): switch to CLASS(filename) statx: switch to CLASS(filename_maybe_null) quotactl_block(): switch to CLASS(filename) chroot(2): switch to CLASS(filename) move_mount(2): switch to CLASS(filename_maybe_null) namei.c: switch user pathname imports to CLASS(filename{,_flags}) namei.c: convert getname_kernel() callers to CLASS(filename_kernel) do_f{chmod,chown,access}at(): use CLASS(filename_uflags) do_readlinkat(): switch to CLASS(filename_flags) do_sys_truncate(): switch to CLASS(filename) do_utimes_path(): switch to CLASS(filename_uflags) chdir(2): unspaghettify a bit... do_fchownat(): unspaghettify a bit... fspick(2): use CLASS(filename_flags) name_to_handle_at(): use CLASS(filename_uflags) vfs_open_tree(): use CLASS(filename_uflags) ...
2026-02-09 | Merge tag 'xfs-merge-7.0' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux | Linus Torvalds
Pull xfs updates from Carlos Maiolino: "This contains several improvements to zoned device support, performance improvements for the parent pointers, and a new health monitoring feature. There are some improvements in the journaling code too but no behavior change expected. Last but not least, some code refactoring and bug fixes are also included in this series" * tag 'xfs-merge-7.0' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (67 commits) xfs: add sysfs stats for zoned GC xfs: give the defer_relog stat a xs_ prefix xfs: add zone reset error injection xfs: refactor zone reset handling xfs: don't mark all discard issued by zoned GC as sync xfs: allow setting errortags at mount time xfs: use WRITE_ONCE/READ_ONCE for m_errortag xfs: move the guts of XFS_ERRORTAG_DELAY out of line xfs: don't validate error tags in the I/O path xfs: allocate m_errortag early xfs: fix the errno sign for the xfs_errortag_{add,clearall} stubs xfs: validate log record version against superblock log version xfs: fix spacing style issues in xfs_alloc.c xfs: remove xfs_zone_gc_space_available xfs: use a seprate member to track space availabe in the GC scatch buffer xfs: check for deleted cursors when revalidating two btrees xfs: fix UAF in xchk_btree_check_block_owner xfs: check return value of xchk_scrub_create_subord xfs: only call xf{array,blob}_destroy if we have a valid pointer xfs: get rid of the xchk_xfile_*_descr calls ...
2026-02-09Merge tag 'vfs-7.0-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains a mix of VFS cleanups, performance improvements, API fixes, documentation, and a deprecation notice. Scalability and performance: - Rework pid allocation to only take pidmap_lock once instead of twice during alloc_pid(), improving thread creation/teardown throughput by 10-16% depending on false-sharing luck. Pad the namespace refcount to reduce false-sharing - Track file lock presence via a flag in ->i_opflags instead of reading ->i_flctx, avoiding false-sharing with ->i_readcount on open/close hot paths. Measured 4-16% improvement on 24-core open-in-a-loop benchmarks - Use a consume fence in locks_inode_context() to match the store-release/load-consume idiom, eliminating a hardware fence on some architectures - Annotate cdev_lock with __cacheline_aligned_in_smp to prevent false-sharing - Remove a redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu() that never fires since the caller already verifies it, eliminating a 100% mispredicted branch - Fix a 100% mispredicted likely() in devcgroup_inode_permission() that became wrong after a prior code reorder Bug fixes and correctness: - Make insert_inode_locked() wait for inode destruction instead of skipping, fixing a corner case where two matching inodes could exist in the hash - Move f_mode initialization before file_ref_init() in alloc_file() to respect the SLAB_TYPESAFE_BY_RCU ordering contract - Add a WARN_ON_ONCE guard in try_to_free_buffers() for folios with no buffers attached, preventing a null pointer dereference when AS_RELEASE_ALWAYS is set but no release_folio op exists - Fix select restart_block to store end_time as timespec64, avoiding truncation of tv_sec on 32-bit architectures - Make dump_inode() use get_kernel_nofault() to safely access inode and superblock fields, matching the dump_mapping() pattern API modernization: - Make posix_acl_to_xattr() allocate the buffer internally 
since every single caller was doing it anyway. Reduces boilerplate and unnecessary error checking across ~15 filesystems - Replace deprecated simple_strtoul() with kstrtoul() for the ihash_entries, dhash_entries, mhash_entries, and mphash_entries boot parameters, adding proper error handling - Convert chardev code to use guard(mutex) and __free(kfree) cleanup patterns - Replace min_t() with min() or umin() in VFS code to avoid silently truncating unsigned long to unsigned int - Gate LOOKUP_RCU assertions behind CONFIG_DEBUG_VFS since callers already check the flag Deprecation: - Begin deprecating legacy BSD process accounting (acct(2)). The interface has numerous footguns and better alternatives exist (eBPF) Documentation: - Fix and complete kernel-doc for struct export_operations, removing duplicated documentation between ReST and source - Fix kernel-doc warnings for __start_dirop() and ilookup5_nowait() Testing: - Add a kunit test for initramfs cpio handling of entries with filesize > PATH_MAX Misc: - Add missing <linux/init_task.h> include in fs_struct.c" * tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits) posix_acl: make posix_acl_to_xattr() alloc the buffer fs: make insert_inode_locked() wait for inode destruction initramfs_test: kunit test for cpio.filesize > PATH_MAX fs: improve dump_inode() to safely access inode fields fs: add <linux/init_task.h> for 'init_fs' docs: exportfs: Use source code struct documentation fs: move initializing f_mode before file_ref_init() exportfs: Complete kernel-doc for struct export_operations exportfs: Mark struct export_operations functions at kernel-doc exportfs: Fix kernel-doc output for get_name() acct(2): begin the deprecation of legacy BSD process accounting device_cgroup: remove branch hint after code refactor VFS: fix __start_dirop() kernel-doc warnings fs: Describe @isnew parameter in ilookup5_nowait() fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in 
__follow_mount_rcu fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS select: store end_time as timespec64 in restart block chardev: Switch to guard(mutex) and __free(kfree) namespace: Replace simple_strtoul with kstrtoul to parse boot params dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries ...
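The min_t() item in the pull message above can be illustrated in userspace. What follows is only a sketch: MIN_T is a simplified stand-in for the kernel macro (cast both operands, then compare), the helper names are made up for illustration, and a 64-bit unsigned long (LP64) is assumed.

```c
/* Simplified stand-in for the kernel's min_t(): cast both sides, then compare. */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Type-preserving minimum on unsigned long; no narrowing cast is involved. */
static unsigned long min_ulong(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Clamp len to limit through an unsigned int cast. When len does not fit
 * in 32 bits, its upper bits are dropped before the comparison happens.
 */
static unsigned long clamp_with_cast(unsigned long len, unsigned long limit)
{
	return MIN_T(unsigned int, len, limit);
}
```

With len = 0x100000001 (4 GiB + 1) and limit = 4096, the cast variant compares 1 against 4096 and returns 1, while the type-preserving variant returns 4096. That silent narrowing is the failure mode the pull message refers to.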
2026-02-09Merge tag 'vfs-7.0-rc1.iomap' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs iomap updates from Christian Brauner: - Erofs page cache sharing preliminaries: Plumb a void *private parameter through iomap_read_folio() and iomap_readahead() into iomap_iter->private, matching iomap DIO. Erofs uses this to replace a bogus kmap_to_page() call, as preparatory work for page cache sharing. - Fix for invalid folio access: Fix an invalid folio access when a folio without iomap_folio_state is fully submitted to the IO helper — the helper may call folio_end_read() at any time, so ctx->cur_folio must be invalidated after full submission. * tag 'vfs-7.0-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: fix invalid folio access after folio_end_read() erofs: hold read context in iomap_iter if needed iomap: stash iomap read ctx in the private field of iomap_iter
2026-02-09Merge tag 'vfs-7.0-rc1.namespace' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: - statmount: accept fd as a parameter Extend struct mnt_id_req with a file descriptor field and a new STATMOUNT_BY_FD flag. When set, statmount() returns mount information for the mount the fd resides on — including detached mounts (unmounted via umount2(MNT_DETACH)). For detached mounts the STATMOUNT_MNT_POINT and STATMOUNT_MNT_NS_ID mask bits are cleared since neither is meaningful. The capability check is skipped for STATMOUNT_BY_FD since holding an fd already implies prior access to the mount and equivalent information is available through fstatfs() and /proc/pid/mountinfo without privilege. Includes comprehensive selftests covering both attached and detached mount cases. - fs: Remove internal old mount API code (1 patch) Now that every in-tree filesystem has been converted to the new mount API, remove all the legacy shim code in fs_context.c that handled unconverted filesystems. This deletes ~280 lines including legacy_init_fs_context(), the legacy_fs_context struct, and associated wrappers. The mount(2) syscall path for userspace remains untouched. Documentation references to the legacy callbacks are cleaned up. - mount: add OPEN_TREE_NAMESPACE to open_tree() Container runtimes currently use CLONE_NEWNS to copy the caller's entire mount namespace — only to then pivot_root() and recursively unmount everything they just copied. With large mount tables and thousands of parallel container launches this creates significant contention on the namespace semaphore. OPEN_TREE_NAMESPACE copies only the specified mount tree (like OPEN_TREE_CLONE) but returns a mount namespace fd instead of a detached mount fd. The new namespace contains the copied tree mounted on top of a clone of the real rootfs. This functions as a combined unshare(CLONE_NEWNS) + pivot_root() in a single syscall. 
Works with user namespaces: an unshare(CLONE_NEWUSER) followed by OPEN_TREE_NAMESPACE creates a mount namespace owned by the new user namespace. Mount namespace file mounts are excluded from the copy to prevent cycles. Includes ~1000 lines of selftests" * tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/open_tree: add OPEN_TREE_NAMESPACE tests mount: add OPEN_TREE_NAMESPACE fs: Remove internal old mount API code selftests: statmount: tests for STATMOUNT_BY_FD statmount: accept fd as a parameter statmount: permission check should return EPERM
2026-02-09Merge tag 'vfs-7.0-rc1.atomic_open' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs atomic_open updates from Christian Brauner: "Allow knfsd to use atomic_open() While knfsd offers combined exclusive create and open results to clients, on some filesystems those results are not atomic. The separate vfs_create() + vfs_open() sequence in dentry_create() can produce races and unexpected errors. For example, open O_CREAT with mode 0 will succeed in creating the file but return -EACCES from vfs_open(). Additionally, network filesystems benefit from reducing remote round-trip operations by using a single atomic_open() call. Teach dentry_create() -- whose sole caller is knfsd -- to use atomic_open() for filesystems that support it" * tag 'vfs-7.0-rc1.atomic_open' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs/namei: fix kernel-doc markup for dentry_create VFS/knfsd: Teach dentry_create() to use atomic_open() VFS: Prepare atomic_open() for dentry_create() VFS: move dentry_create() from fs/open.c to fs/namei.c
2026-02-09Merge tag 'vfs-7.0-rc1.nullfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs nullfs update from Christian Brauner: "Add a completely catatonic minimal pseudo filesystem called "nullfs" and make pivot_root() work in the initramfs. Currently pivot_root() does not work on the real rootfs because it cannot be unmounted. Userspace has to recursively delete initramfs contents manually before continuing boot, using the fragile switch_root sequence (overmount + chroot). Add nullfs, a minimal immutable filesystem that serves as the true root of the mount hierarchy. The mutable rootfs (tmpfs/ramfs) is mounted on top of it. This allows userspace to simply: chdir(new_root); pivot_root(".", "."); umount2(".", MNT_DETACH); without the traditional switch_root workarounds. systemd already handles this correctly. It tries pivot_root() first and falls back to MS_MOVE only when that fails. This also means rootfs mounts in unprivileged namespaces no longer need MNT_LOCKED, since the immutable nullfs guarantees nothing can be revealed by unmounting the covering mount. nullfs is a single-instance filesystem (get_tree_single()) marked SB_NOUSER | SB_I_NOEXEC | SB_I_NODEV with an immutable empty root directory. This means sooner or later it can be used to overmount other directories to hide their contents without any additional protection needed. We enable it unconditionally. If we see any real regression we'll hide it behind a boot option. nullfs has extensions beyond this in the future. It will serve as a concept to support the creation of completely empty mount namespaces - which is work coming up in the next cycle" * tag 'vfs-7.0-rc1.nullfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: use nullfs unconditionally as the real rootfs docs: mention nullfs fs: add immutable rootfs fs: add init_pivot_root() fs: ensure that internal tmpfs mount gets mount id zero
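The three-call sequence quoted in the nullfs message could be wrapped from userspace roughly as follows. This is a sketch only: the function name switch_to_new_root() is made up for illustration, the caller is assumed to hold CAP_SYS_ADMIN in its mount namespace, and new_root is assumed to be a mount point. glibc provides no pivot_root() wrapper, so the raw syscall is used.

```c
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Make new_root the root of the current mount namespace.
 * Returns 0 on success and -1 on failure (errno is set by the failing call).
 */
static int switch_to_new_root(const char *new_root)
{
	if (chdir(new_root) < 0)
		return -1;

	/* Pivot in place: the old root ends up stacked on top of the new root. */
	if (syscall(SYS_pivot_root, ".", ".") < 0)
		return -1;

	/* Lazily detach the old root that now covers ".". */
	if (umount2(".", MNT_DETACH) < 0)
		return -1;

	return 0;
}
```

The function stops at the first failing step, so a caller can fall back to another strategy, much as the message describes systemd falling back to MS_MOVE when pivot_root() fails.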
2026-02-09Merge tag 'vfs-7.0-rc1.btrfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs updates for btrfs from Christian Brauner: "This contains some changes for btrfs that are taken to the vfs tree to stop duplicating VFS code for subvolume/snapshot dentries. Btrfs has carried private copies of the VFS may_delete() and may_create() functions in fs/btrfs/ioctl.c for permission checks during subvolume creation and snapshot destruction. These copies have drifted out of sync with the VFS originals: btrfs_may_delete() is missing the uid/gid validity check and btrfs_may_create() is missing the audit_inode_child() call. Export the VFS functions as may_{create,delete}_dentry() and switch btrfs to use them, removing ~70 lines of duplicated code" * tag 'vfs-7.0-rc1.btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: btrfs: use may_create_dentry() in btrfs_mksubvol() btrfs: use may_delete_dentry() in btrfs_ioctl_snap_destroy() fs: export may_create() as may_create_dentry() fs: export may_delete() as may_delete_dentry()
2026-02-09Merge tag 'vfs-7.0-rc1.fserror' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs error reporting updates from Christian Brauner: "This contains the changes to support generic I/O error reporting. Filesystems currently have no standard mechanism for reporting metadata corruption and file I/O errors to userspace via fsnotify. Each filesystem (xfs, ext4, erofs, f2fs, etc.) privately defines EFSCORRUPTED, and error reporting to fanotify is inconsistent or absent entirely. This introduces a generic fserror infrastructure built around struct super_block that gives filesystems a standard way to queue metadata and file I/O error reports for delivery to fsnotify. Errors are queued via mempools and queue_work to avoid holding filesystem locks in the notification path; unmount waits for pending events to drain. A new super_operations::report_error callback lets filesystem drivers respond to file I/O errors themselves (to be used by an upcoming XFS self-healing patchset). On the uapi side, EFSCORRUPTED and EUCLEAN are promoted from private per-filesystem definitions to canonical errno.h values across all architectures" * tag 'vfs-7.0-rc1.fserror' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: ext4: convert to new fserror helpers xfs: translate fsdax media errors into file "data lost" errors when convenient xfs: report fs metadata errors via fsnotify iomap: report file I/O errors to the VFS fs: report filesystem and file I/O errors to fsnotify uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
2026-02-09Merge tag 'vfs-7.0-rc1.leases' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs lease updates from Christian Brauner: "This contains updates for lease support to require filesystems to explicitly opt in to lease support. Currently kernel_setlease() falls through to generic_setlease() when a filesystem does not define ->setlease(), silently granting lease support to every filesystem regardless of whether it is prepared for it. This is a poor default: most filesystems never intended to support leases, and the silent fallthrough makes it impossible to distinguish "supports leases" from "never thought about it". This inverts the default. It adds explicit .setlease = generic_setlease; assignments to every in-tree filesystem that should retain lease support, then changes kernel_setlease() to return -EINVAL when ->setlease is NULL. With the new default in place, simple_nosetlease() is redundant and is removed along with all references to it" * tag 'vfs-7.0-rc1.leases' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits) fuse: add setlease file operation fs: remove simple_nosetlease() filelock: default to returning -EINVAL when ->setlease operation is NULL xfs: add setlease file operation ufs: add setlease file operation udf: add setlease file operation tmpfs: add setlease file operation squashfs: add setlease file operation overlayfs: add setlease file operation orangefs: add setlease file operation ocfs2: add setlease file operation ntfs3: add setlease file operation nilfs2: add setlease file operation jfs: add setlease file operation jffs2: add setlease file operation gfs2: add a setlease file operation fat: add setlease file operation f2fs: add setlease file operation exfat: add setlease file operation ext4: add setlease file operation ...
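The inverted default described in the lease message can be modelled in a few lines of plain C. This is a sketch, not kernel code: every model_* type and function below is a hypothetical stand-in for the real struct file, struct file_operations, generic_setlease() and kernel_setlease(), and only the "NULL hook means -EINVAL" dispatch rule is taken from the message.

```c
#include <errno.h>
#include <stddef.h>

struct model_file;

/* Hypothetical stand-in for the ->setlease slot in struct file_operations. */
struct model_file_ops {
	int (*setlease)(struct model_file *file, int arg);
};

struct model_file {
	const struct model_file_ops *ops;
};

/* Stand-in for generic_setlease(): pretend the lease was granted. */
static int model_generic_setlease(struct model_file *file, int arg)
{
	(void)file;
	(void)arg;
	return 0;
}

/* New default: a missing ->setlease hook means "no lease support". */
static int model_kernel_setlease(struct model_file *file, int arg)
{
	if (!file->ops->setlease)
		return -EINVAL;
	return file->ops->setlease(file, arg);
}

/* One filesystem that opts in explicitly, and one that never set the hook. */
static const struct model_file_ops opted_in_ops = {
	.setlease = model_generic_setlease,
};
static const struct model_file_ops silent_ops = {
	.setlease = NULL,
};
```

In this model a file backed by opted_in_ops gets its lease, while one backed by silent_ops is refused with -EINVAL. A filesystem that never thought about leases therefore no longer supports them by accident.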
2026-02-09Merge tag 'vfs-7.0-rc1.nonblocking_timestamps' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This contains the changes to support non-blocking timestamp updates. Since commit 66fa3cedf16a ("fs: Add async write file modification handling") file_update_time_flags() unconditionally returns -EAGAIN when any timestamp needs updating and IOCB_NOWAIT is set. This makes non-blocking direct writes impossible on file systems with granular enough timestamps, which in practice means all of them. This reworks the timestamp update path to propagate IOCB_NOWAIT through ->update_time so that file systems which can update timestamps without blocking are no longer penalized. With that groundwork in place, the core change passes IOCB_NOWAIT into ->update_time and returns -EAGAIN only when the file system indicates it would block. XFS implements non-blocking timestamp updates by using the new ->sync_lazytime and open-coding generic_update_time without the S_NOWAIT check, since the lazytime path through the generic helpers can never block in XFS" * tag 'vfs-7.0-rc1.nonblocking_timestamps' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: enable non-blocking timestamp updates xfs: implement ->sync_lazytime fs: refactor file_update_time_flags fs: add support for non-blocking timestamp updates fs: add a ->sync_lazytime method fs: factor out a sync_lazytime helper fs: refactor ->update_time handling fat: cleanup the flags for fat_truncate_time nfs: split nfs_update_timestamps fs: allow error returns from generic_update_time fs: remove inode_update_time
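The behavioural change in the timestamp message can be summarised with a toy model. Everything here is hypothetical (struct model_fs, the update_may_block flag and both function names are invented for illustration); only the rule that -EAGAIN is now returned solely when the filesystem would block comes from the message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-filesystem property: can a timestamp update block? */
struct model_fs {
	bool update_may_block;
};

/* Old behaviour: any needed timestamp update fails a NOWAIT write. */
static int old_update_time(const struct model_fs *fs, bool nowait)
{
	(void)fs;
	return nowait ? -EAGAIN : 0;
}

/* New behaviour: fail only if this filesystem would actually block. */
static int new_update_time(const struct model_fs *fs, bool nowait)
{
	if (nowait && fs->update_may_block)
		return -EAGAIN;
	return 0;
}
```

A filesystem whose update path never blocks (the message names XFS with lazytime as one such case) passes through new_update_time() even with nowait set, where the old rule would have rejected it.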
2026-02-09Merge tag 'vfs-7.0-rc1.initrd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs initrd removal from Christian Brauner: "Remove the deprecated linuxrc-based initrd code path and related dead code. The linuxrc initrd path was deprecated in 2020 and this series completes its removal. If we see real-life regressions we'll revert. The core change removes handle_initrd() and init_linuxrc() — the entire flow that ran /linuxrc from an initrd, pivoted roots, and handed off to the real root filesystem. With that gone, initrd_load() becomes void (no longer short-circuits prepare_namespace()), rd_load_image() is simplified to always load /initrd.image instead of taking a path, and rd_load_disk() is deleted. The /proc/sys/kernel/real-root-dev sysctl and its backing variable are removed since they only existed for linuxrc to communicate the real root device back to the kernel. The no-op load_ramdisk= and prompt_ramdisk= parameters are dropped, and noinitrd and ramdisk_start= gain deprecation warnings. Initramfs is entirely unaffected. The non-linuxrc initrd path (root=/dev/ram0) is preserved but now carries a deprecation warning targeting January 2027 removal" * tag 'vfs-7.0-rc1.initrd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: init: remove /proc/sys/kernel/real-root-dev initrd: remove deprecated code path (linuxrc) init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters
2026-02-09Merge tag 'lsm-pr-20260203' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Unify the security_inode_listsecurity() calls in NFSv4 While looking at security_inode_listsecurity() with an eye towards improving the interface, we realized that the NFSv4 code was making multiple calls to the LSM hook that could be consolidated into one. - Mark the LSM static branch keys as static - this helps resolve some sparse warnings - Add __rust_helper annotations to the LSM and cred wrapper functions - Remove the unused set_security_override_from_ctx() function - Minor fixes to some of the LSM kdoc comment blocks * tag 'lsm-pr-20260203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: make keys for static branch static cred: remove unused set_security_override_from_ctx() rust: security: add __rust_helper to helpers rust: cred: add __rust_helper to helpers nfs: unify security_inode_listsecurity() calls lsm: fix kernel-doc struct member names
2026-02-09Merge tag 'audit-pr-20260203' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: - Improve the NETFILTER_PKT audit records Add source and destination ports to the NETFILTER_PKT audit records while also consolidating a lot of the code into a new, singular audit_log_nf_skb() function. This new approach to structuring the NETFILTER_PKT record generation should eliminate some unnecessary overhead when audit is not built into the kernel. - Update the audit syscall classifier code Add the listxattrat(), getxattrat(), and fchmodat2() syscalls to the audit code which classifies syscalls into categories of operations, e.g. "read" or "change attributes". - Move the syscall classifier declarations into audit_arch.h Shuffle around some header file declarations to resolve some sparse warnings. * tag 'audit-pr-20260203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: move the compat_xxx_class[] extern declarations to audit_arch.h audit: add missing syscalls to read class audit: include source and destination ports to NETFILTER_PKT audit: add audit_log_nf_skb helper function audit: add fchmodat2() to change attributes class
2026-02-09Merge tag 'i3c/for-6.20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c updates from Alexandre Belloni: "Subsystem: - add sysfs entry and attribute for Device NACK Retry count Drivers: - dw: Device NACK Retry configuration knob - mipi-i3c-hci: support multi-bus instances, runtime PM, and suspend - renesas: suspend/resume support" * tag 'i3c/for-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: (52 commits) i3c: dw-i3c-master: fix SIR reject bit mapping for dynamic addresses i3c: dw-i3c-master: convert spinlock usage to scoped guards i3c: dw: Fix memory leak in dw_i3c_master_i2c_xfers() i3c: mipi-i3c-hci-pci: Add System Suspend support i3c: mipi-i3c-hci: Add optional System Suspend support i3c: master: Add i3c_master_do_daa_ext() for post-hibernation address recovery i3c: dw: Initialize spinlock to avoid upsetting lockdep i3c: mipi-i3c-hci-pci: Add Runtime PM support i3c: mipi-i3c-hci: Add optional Runtime PM support i3c: master: Introduce optional Runtime PM support i3c: mipi-i3c-hci: Factor out master dynamic address setting into helper i3c: mipi-i3c-hci: Allow core re-initialization for Runtime PM support i3c: mipi-i3c-hci: Factor out core initialization into helper i3c: mipi-i3c-hci: Factor out IO mode setting into helper i3c: mipi-i3c-hci: Factor out software reset into helper i3c: mipi-i3c-hci: Add PIO suspend and resume support i3c: mipi-i3c-hci: Refactor PIO register initialization i3c: mipi-i3c-hci: Add DMA suspend and resume support i3c: mipi-i3c-hci: Extract ring initialization from hci_dma_init() i3c: mipi-i3c-hci: Introduce helper to restore DAT ...
2026-02-09Merge tag 'rcu.release.v7.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Boqun Feng: - RCU Tasks Trace: Re-implement RCU Tasks Trace in terms of SRCU-fast. Not only does the reimplementation save more than 500 lines of code, it also makes a new set of APIs, rcu_read_{,un}lock_tasks_trace(), possible. Compared to the previous rcu_read_{,un}lock_trace(), the new API avoids the task_struct accesses thanks to the SRCU-fast semantics. As a result, the old rcu_read_{,un}lock_trace() API is now deprecated. - RCU Torture Test: - Multiple improvements on kvm-series.sh (parallel run and progress showing metrics) - Add context checks to rcu_torture_timer() - Make config2csv.sh properly handle comments in .boot files - Include commit description in testid.txt - Miscellaneous RCU changes: - Reduce synchronize_rcu() latency by reporting GP kthread's CPU QS early - Use suitable gfp_flags for the init_srcu_struct_nodes() - Fix rcu_read_unlock() deadloop due to softirq - Correctly compute probability to invoke ->exp_current() in rcutorture - Make expedited RCU CPU stall warnings detect stall-end races - RCU nocb: - Remove unnecessary WakeOvfIsDeferred wake path and callback overload handling - Extract nocb_defer_wakeup_cancel() helper * tag 'rcu.release.v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (25 commits) rcu/nocb: Extract nocb_defer_wakeup_cancel() helper rcu/nocb: Remove dead callback overload handling rcu/nocb: Remove unnecessary WakeOvfIsDeferred wake path rcu: Reduce synchronize_rcu() latency by reporting GP kthread's CPU QS early srcu: Use suitable gfp_flags for the init_srcu_struct_nodes() rcu: Fix rcu_read_unlock() deadloop due to softirq rcutorture: Correctly compute probability to invoke ->exp_current() rcu: Make expedited RCU CPU stall warnings detect stall-end races rcutorture: Add --kill-previous option to terminate previous kvm.sh runs rcutorture: Prevent concurrent kvm.sh runs on same source tree torture: Include commit 
discription in testid.txt torture: Make config2csv.sh properly handle comments in .boot files torture: Make kvm-series.sh give run numbers and totals torture: Make kvm-series.sh give build numbers and totals torture: Parallelize kvm-series.sh guest-OS execution rcutorture: Add context checks to rcu_torture_timer() rcutorture: Test rcu_tasks_trace_expedite_current() srcu: Create an rcu_tasks_trace_expedite_current() function checkpatch: Deprecate rcu_read_{,un}lock_trace() rcu: Update Requirements.rst for RCU Tasks Trace ...
2026-02-07Merge tag 'sched-urgent-2026-02-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Miscellaneous MMCID fixes to address bugs and performance regressions in the recent rewrite of the SCHED_MM_CID management code: - Fix livelock triggered by BPF CI testing - Fix hard lockup on weakly ordered systems - Simplify the dropping of CIDs in the exit path by removing an unintended transition phase - Fix performance/scalability regression on a thread-pool benchmark by optimizing transitional CIDs when scheduling out" * tag 'sched-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/mmcid: Optimize transitional CIDs when scheduling out sched/mmcid: Drop per CPU CID immediately when switching to per task mode sched/mmcid: Protect transition on weakly ordered systems sched/mmcid: Prevent live lock on task to CPU mode transition
2026-02-07Merge tag 'objtool-urgent-2026-02-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar: - Bump up the Clang minimum version requirements for livepatch builds, due to Clang assembler section handling bugs causing silent miscompilations - Strip livepatching symbol artifacts from non-livepatch modules - Fix livepatch build warnings when certain Clang LTO options are enabled - Fix livepatch build error when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y * tag 'objtool-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool/klp: Fix unexported static call key access for manually built livepatch modules objtool/klp: Fix symbol correlation for orphaned local symbols livepatch: Free klp_{object,func}_ext data after initialization livepatch: Fix having __klp_objects relics in non-livepatch modules livepatch/klp-build: Require Clang assembler >= 20
2026-02-06Merge tag 'mm-hotfixes-stable-2026-02-06-12-37' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "A couple of late-breaking MM fixes. One against a new-in-this-cycle patch and the other addresses a locking issue which has been there for over a year" * tag 'mm-hotfixes-stable-2026-02-06-12-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/memory-failure: reject unsupported non-folio compound page procfs: avoid fetching build ID while holding VMA lock
2026-02-06Merge tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "One RBD and two CephFS fixes which address potential oopses. The RBD thing is more of a rare edge case that pops up in our CI, while the two CephFS scenarios are regressions that were reported by users and can be triggered trivially in normal operation. All marked for stable" * tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-client: ceph: fix NULL pointer dereference in ceph_mds_auth_match() ceph: fix oops due to invalid pointer for kfree() in parse_longname() rbd: check for EOD after exclusive lock is ensured to be held
2026-02-06io_uring: allow registration of per-task restrictionsJens Axboe
Currently io_uring supports restricting operations on a per-ring basis. To use those, the ring must be set up in a disabled state by setting IORING_SETUP_R_DISABLED. Then restrictions can be set for the ring, and the ring can then be enabled. This commit adds support for IORING_REGISTER_RESTRICTIONS with ring_fd == -1, like the other "blind" register opcodes which work on the task rather than a specific ring. This allows registration of the same kind of restrictions as can be done on a specific ring, but with the task itself. Once done, any ring created will inherit these restrictions. If a restriction filter is registered with a task, then it's inherited on fork for its children. Children may only further restrict operations, not extend them. Inherited restrictions include both the classic IORING_REGISTER_RESTRICTIONS based restrictions, as well as the BPF filters that have been registered with the task via IORING_REGISTER_BPF_FILTER. Signed-off-by: Jens Axboe <axboe@kernel.dk>
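The inheritance rule in this commit message (children inherit on fork and may only restrict further) can be pictured with a small bitmask model. This is purely conceptual and is not how io_uring stores restrictions: the struct, both function names and the "one bit per allowed opcode" encoding are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical model: bit N set means "opcode N is allowed for this task". */
struct task_restrictions {
	uint64_t allowed_ops;
};

/* On fork, the child starts with exactly the parent's restrictions. */
static struct task_restrictions inherit_on_fork(const struct task_restrictions *parent)
{
	return *parent;
}

/*
 * Registering a new restriction set intersects it with the current one,
 * so a task can drop permissions but never regain ones it has lost.
 */
static void restrict_further(struct task_restrictions *task, uint64_t requested)
{
	task->allowed_ops &= requested;
}
```

If a parent allows opcodes {0, 1, 2} and its child then asks for {1, 3}, the child ends up with {1} only: opcode 3 was never allowed by the parent, so the request cannot add it.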
2026-02-06io_uring: add task fork hookJens Axboe
Called when copy_process() is called to copy state to a new child. Right now this is just a stub, but will be used shortly to properly handle fork'ing of task based io_uring restrictions. Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-05procfs: avoid fetching build ID while holding VMA lockAndrii Nakryiko
Fix PROCMAP_QUERY to fetch optional build ID only after dropping mmap_lock or per-VMA lock, whichever was used to lock VMA under question, to avoid deadlock reported by syzbot: -> #1 (&mm->mmap_lock){++++}-{4:4}: __might_fault+0xed/0x170 _copy_to_iter+0x118/0x1720 copy_page_to_iter+0x12d/0x1e0 filemap_read+0x720/0x10a0 blkdev_read_iter+0x2b5/0x4e0 vfs_read+0x7f4/0xae0 ksys_read+0x12a/0x250 do_syscall_64+0xcb/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: __lock_acquire+0x1509/0x26d0 lock_acquire+0x185/0x340 down_read+0x98/0x490 blkdev_read_iter+0x2a7/0x4e0 __kernel_read+0x39a/0xa90 freader_fetch+0x1d5/0xa80 __build_id_parse.isra.0+0xea/0x6a0 do_procmap_query+0xd75/0x1050 procfs_procmap_ioctl+0x7a/0xb0 __x64_sys_ioctl+0x18e/0x210 do_syscall_64+0xcb/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(&mm->mmap_lock); lock(&sb->s_type->i_mutex_key#8); lock(&mm->mmap_lock); rlock(&sb->s_type->i_mutex_key#8); *** DEADLOCK *** This seems to be exacerbated (as we haven't seen these syzbot reports before that) by the recent: 777a8560fd29 ("lib/buildid: use __kernel_read() for sleepable context") To make this safe, we need to grab file refcount while VMA is still locked, but other than that everything is pretty straightforward. Internal build_id_parse() API assumes VMA is passed, but it only needs the underlying file reference, so just add another variant build_id_parse_file() that expects file passed directly. 
[akpm@linux-foundation.org: fix up kerneldoc] Link: https://lkml.kernel.org/r/20260129215340.3742283-1-andrii@kernel.org Fixes: ed5d583a88a9 ("fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reported-by: <syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
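The ordering that this fix establishes (pin the file while the VMA is locked, drop the lock, and only then read from the file) can be sketched with a single-threaded model. All names below are hypothetical stand-ins: the boolean models "mmap_lock or per-VMA lock held", and the integer models the file reference count.

```c
#include <assert.h>
#include <stdbool.h>

struct model_file {
	int refcount;
	const char *build_id;
};

struct model_vma {
	bool locked;
	struct model_file *file;
};

/* File I/O may fault or take inode locks, so the VMA lock must not be held. */
static const char *read_build_id(const struct model_vma *vma, struct model_file *file)
{
	assert(!vma->locked);
	return file->build_id;
}

static const char *query_build_id(struct model_vma *vma)
{
	struct model_file *file;
	const char *id;

	vma->locked = true;		/* look up the VMA under its lock */
	file = vma->file;
	file->refcount++;		/* pin the file while still locked */
	vma->locked = false;		/* drop the lock before any file I/O */

	id = read_build_id(vma, file);
	file->refcount--;		/* release the pinned reference */
	return id;
}
```

The extra reference keeps the file alive after the lock is dropped. That lets the read run outside the locked region, which removes the lock-order inversion shown in the report.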
2026-02-05 | Merge tag 'net-6.19-rc9' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless and Netfilter. Previous releases - regressions: - eth: stmmac: fix stm32 (and potentially others) resume regression - nf_tables: fix inverted genmask check in nft_map_catchall_activate() - usb: r8152: fix resume reset deadlock - fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for RSS contexts Previous releases - always broken: - sched: cls_u32: use skb_header_pointer_careful() to avoid OOB reads with malicious u32 rules - eth: ice: timestamping related fixes" * tag 'net-6.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits) ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate() net: cpsw: Execute ndo_set_rx_mode callback in a work queue net: cpsw_new: Execute ndo_set_rx_mode callback in a work queue gve: Correct ethtool rx_dropped calculation gve: Fix stats report corruption on queue count change selftest: net: add a test-case for encap segmentation after GRO net: gro: fix outer network offset net: add proper RCU protection to /proc/net/ptype net: ethernet: adi: adin1110: Check return value of devm_gpiod_get_optional() in adin1110_check_spi() wifi: iwlwifi: mvm: pause TCM on fast resume wifi: iwlwifi: mld: cancel mlo_scan_start_wk net: spacemit: k1-emac: fix jumbo frame support net: enetc: Convert 16-bit register reads to 32-bit for ENETC v4 net: enetc: Convert 16-bit register writes to 32-bit for ENETC v4 net: enetc: Remove CBDR cacheability AXI settings for ENETC v4 net: enetc: Remove SI/BDR cacheability AXI settings for ENETC v4 tipc: use kfree_sensitive() for session key material net: stmmac: fix stm32 (and potentially others) resume regression net: rss: fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for contexts ...
2026-02-05 | livepatch: Fix having __klp_objects relics in non-livepatch modules | Petr Pavlu
The linker script scripts/module.lds.S specifies that all input __klp_objects sections should be consolidated into an output section of the same name, and start/stop symbols should be created to enable scripts/livepatch/init.c to locate this data. This start/stop pattern is not ideal for modules because the symbols are created even if no __klp_objects input sections are present. Consequently, a dummy __klp_objects section also appears in the resulting module. This unnecessarily pollutes non-livepatch modules. Instead, since modules are relocatable files, the usual method for locating consolidated data in a module is to read its section table. This approach avoids the aforementioned problem. The klp_modinfo already stores a copy of the entire section table with the final addresses. Introduce a helper function that scripts/livepatch/init.c can call to obtain the location of the __klp_objects section from this data. Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Link: https://patch.msgid.link/20260123102825.3521961-2-petr.pavlu@suse.com Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
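The "read the section table" approach can be illustrated with a small lookup over a copied table. This is a sketch under simplifying assumptions: struct sec_model and find_section() are hypothetical, and section names are stored as plain C strings, whereas a real ELF section header refers to its name by an offset into a string table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one entry in a copied section table. */
struct sec_model {
    const char *name;   /* real ELF stores an offset into a string table */
    void *addr;         /* final address of the section */
    size_t size;        /* section size in bytes */
};

/* Return the entry with the given name, or NULL if it is absent. */
const struct sec_model *find_section(const struct sec_model *tbl, size_t n,
                                     const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].name, name) == 0)
            return &tbl[i];
    }
    return NULL;
}
```

Unlike linker-generated start/stop symbols, a lookup of this kind simply returns nothing for a module that has no such section, so no dummy section needs to exist.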
2026-02-05 | Merge branch 'acpi-apei' | Rafael J. Wysocki
Merge ACPI APEI support updates for 6.20-rc1/7.0-rc1: - Make read-only array non_mmio_desc[] static const (Colin Ian King) - Prevent the APEI GHES support code on ARM from accessing memory out of bounds or going past the ARM processor CPER record buffer (Mauro Carvalho Chehab) - Prevent cper_print_fw_err() from dumping the entire memory on systems with defective firmware (Mauro Carvalho Chehab) - Improve ghes_notify_nmi() status check to avoid unnecessary overhead in the NMI handler by carrying out all of the requisite preparations at the NMI registration time (Tony Luck) - Refactor the GHES driver by extracting common functionality into reusable helper functions to reduce code duplication and improve the ghes_notify_sea() status check in analogy with the previous ghes_notify_nmi() status check improvement (Shuai Xue) - Make ELOG and GHES log and trace consistently and support the CPER CXL protocol analogously (Fabio De Francesco) - Disable KASAN instrumentation in the APEI GHES driver when compile testing with clang < 18 (Nathan Chancellor) - Let ghes_edac be the preferred driver to load on __ZX__ and _BYO_ systems by extending the platform detection list in the APEI GHES driver (Tony W Wang-oc) * acpi-apei: ACPI: APEI: GHES: Add ghes_edac support for __ZX__ and _BYO_ systems ACPI: APEI: GHES: Disable KASAN instrumentation when compile testing with clang < 18 ACPI: extlog: Trace CPER CXL Protocol Error Section ACPI: APEI: GHES: Add helper to copy CPER CXL protocol error info to work struct ACPI: APEI: GHES: Add helper for CPER CXL protocol errors checks ACPI: extlog: Trace CPER PCI Express Error Section ACPI: extlog: Trace CPER Non-standard Section Body ACPI: APEI: GHES: Improve ghes_notify_sea() status check ACPI: APEI: GHES: Extract helper functions for error status handling ACPI: APEI: GHES: Improve ghes_notify_nmi() status check EFI/CPER: don't dump the entire memory region APEI/GHES: ensure that won't go past CPER allocated record EFI/CPER: don't go past the ARM processor CPER record buffer APEI/GHES: ARM processor Error: don't go past allocated memory ACPI: APEI: EINJ: make read-only array non_mmio_desc static const
2026-02-05 | Merge branches 'acpi-pm', 'acpi-sysfs', 'acpi-pci' and 'acpi-resource' | Rafael J. Wysocki
Merge ACPI power management updates, ACPI sysfs interface updates, an ACPI support update related to PCI, and an ACPI device resources management update for 6.20-rc1/7.0-rc1: - Rework ACPI PM notification setup for PCI root buses and modify the ACPI PM setup for devices to register wakeup source objects under physical (that is, PCI, platform, etc.) devices instead of doing that under their ACPI companions (Rafael Wysocki) - Adjust debug messages regarding postponed ACPI PM printed during system resume to be more accurate (Rafael Wysocki) - Remove dead code from lps0_device_attach() (Gergo Koteles) - Start to invoke Microsoft Function 9 (Turn On Display) of the Low-Power S0 Idle (LPS0) _DSM in the suspend-to-idle resume flow on systems with ACPI LPS0 support to address a functional issue on Lenovo Yoga Slim 7i Aura (15ILL9), where system fans and keyboard backlights fail to resume after suspend (Jakob Riemenschneider) - Add sysfs attribute cid for exposing _CID lists under ACPI device objects (Rafael Wysocki) - Replace sprintf() with sysfs_emit() in all of the core ACPI sysfs interface code (Sumeet Pawnikar) - Use acpi_get_local_u64_address() in the code implementing ACPI support for PCI to evaluate _ADR instead of evaluating that object directly (Andy Shevchenko) - Add JWIPC JVC9100 to irq1_level_low_skip_override[] to unbreak serial IRQs on that system (Ai Chao) * acpi-pm: ACPI: x86: s2idle: Invoke Microsoft _DSM Function 9 (Turn On Display) ACPI: PM: Adjust messages regarding postponed ACPI PM ACPI: x86: s2idle: Remove dead code in lps0_device_attach() ACPI: PM: Register wakeup sources under physical devices ACPI: PCI: PM: Rework root bus notification setup * acpi-sysfs: ACPI: sysfs: Replace sprintf() with sysfs_emit() ACPI: sysfs: Add device cid attribute for exposing _CID lists * acpi-pci: ACPI: PCI: simplify code with acpi_get_local_u64_address() * acpi-resource: ACPI: resource: Add JWIPC JVC9100 to irq1_level_low_skip_override[]
2026-02-04 | Merge tag 'tsm-fixes-for-6.19' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm Pull TSM (TEE Security Manager) fixes from Dan Williams: "The largest change is reverting part of an ABI that never shipped in a released kernel (Documentation/ABI/testing/sysfs-class-tsm). The fix / replacement for that is too large to squeeze in at this late date. The rest is a collection of small fixups: - Fix multiple streams per host bridge for SEV-TIO - Drop the TSM ABI for reporting IDE streams (to be replaced) - Fix virtual function enumeration - Fix reserved stream ID initialization - Fix unused variable compiler warning" * tag 'tsm-fixes-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm: crypto/ccp: Allow multiple streams on the same root bridge crypto/ccp: Use PCI bridge defaults for IDE coco/tsm: Remove unused variable tsm_rwsem PCI/IDE: Fix reading a wrong reg for unused sel stream initialization PCI/IDE: Fix off by one error calculating VF RID range Revert "PCI/TSM: Report active IDE streams"
2026-02-04 | ceph: fix NULL pointer dereference in ceph_mds_auth_match() | Viacheslav Dubeyko
The CephFS kernel client has a regression starting from 6.18-rc1. We have an issue in ceph_mds_auth_match() if fs_name == NULL:

    const char *fs_name = mdsc->fsc->mount_options->mds_namespace;
    ...
    if (auth->match.fs_name && strcmp(auth->match.fs_name, fs_name)) {
            /* fsname mismatch, try next one */
            return 0;
    }

Patrick Donnelly suggested that:

    In summary, we should definitely start decoding `fs_name` from the MDSMap and do strict authorizations checks against it. Note that the `-o mds_namespace=foo` should only be used for selecting the file system to mount and nothing else. It's possible no mds_namespace is specified but the kernel will mount the only file system that exists which may have name "foo".

This patch reworks ceph_mdsmap_decode() and namespace_equals() with the goal of supporting the suggested concept. Now struct ceph_mdsmap contains an m_fs_name field that receives a copy of the FS name extracted by ceph_extract_encoded_string(). For "old" CephFS file systems, the name "cephfs" is used.

[ idryomov: replace redundant %*pE with %s in ceph_mdsmap_decode(), get rid of a series of strlen() calls in ceph_namespace_match(), drop changes to namespace_equals() body to avoid treating empty mds_namespace as equal, drop changes to ceph_mdsc_handle_fsmap() as namespace_equals() isn't an equivalent substitution there ]

Cc: stable@vger.kernel.org Fixes: 22c73d52a6d0 ("ceph: fix multifs mds auth caps issue") Link: https://tracker.ceph.com/issues/73886 Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Patrick Donnelly <pdonnell@ibm.com> Tested-by: Patrick Donnelly <pdonnell@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-02-04 | Merge branch 'pm-runtime' | Rafael J. Wysocki
Merge updates related to runtime PM for 6.20-rc1/7.0-rc1: - Make several drivers discard pm_runtime_put() return value in preparation for converting that function to a void one (Rafael Wysocki) * pm-runtime: drm: Discard pm_runtime_put() return value genirq/chip: Change irq_chip_pm_put() return type to void scsi: ufs: core: Discard pm_runtime_put() return values platform/chrome: cros_hps_i2c: Discard pm_runtime_put() return value coresight: Discard pm_runtime_put() return values hwspinlock: omap: Discard pm_runtime_put() return value watchdog: rzv2h_wdt: Discard pm_runtime_put() return value watchdog: rz: Discard pm_runtime_put() return values media: ccs: Discard pm_runtime_put() return value drm/imagination: Discard pm_runtime_put() return value USB: core: Discard pm_runtime_put() return value
2026-02-04 | Merge branch 'pm-sleep' | Rafael J. Wysocki
Merge updates related to system suspend and hibernation for 6.20-rc1/7.0-rc1: - Stop flagging the PM runtime workqueue as freezable to avoid system suspend and resume deadlocks in subsystems that assume asynchronous runtime PM to work during system-wide PM transitions (Rafael Wysocki) - Drop redundant NULL pointer checks before acomp_request_free() from the hibernation code handling image saving (Rafael Wysocki) - Update wakeup_sources_walk_start() to handle empty lists of wakeup sources as appropriate (Samuel Wu) - Make dev_pm_clear_wake_irq() check the power.wakeirq value under power.lock to avoid race conditions (Gui-Dong Han) - Avoid bit field races related to power.work_in_progress in the core device suspend code (Xuewen Yan) * pm-sleep: PM: sleep: core: Avoid bit field races related to work_in_progress PM: sleep: wakeirq: harden dev_pm_clear_wake_irq() against races PM: wakeup: Handle empty list in wakeup_sources_walk_start() PM: hibernate: Drop NULL pointer checks before acomp_request_free() PM: sleep: Do not flag runtime PM workqueue as freezable
2026-02-04 | PM: sleep: core: Avoid bit field races related to work_in_progress | Xuewen Yan
In all of the system suspend transition phases, the async processing of a device may be carried out in parallel with power.work_in_progress updates for the device's parent or suppliers and if it touches bit fields from the same group (for example, power.must_resume or power.wakeup_path), bit field corruption is possible. To avoid that, turn work_in_progress in struct dev_pm_info into a proper bool field and relocate it to save space. Fixes: aa7a9275ab81 ("PM: sleep: Suspend async parents after suspending children") Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Closes: https://lore.kernel.org/linux-pm/20260203063459.12808-1-xuewen.yan@unisoc.com/ Cc: All applicable <stable@vger.kernel.org> [ rjw: Added subject and changelog ] Link: https://patch.msgid.link/CAB8ipk_VX2VPm706Jwa1=8NSA7_btWL2ieXmBgHr2JcULEP76g@mail.gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
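Why adjacent bit fields can corrupt each other is easiest to see by replaying the interleaving by hand. The sketch below is a deterministic, single-threaded model rather than real concurrent code: updating a bit field is a load, modify and store of the whole containing word, so two unsynchronized updaters can lose each other's bits. The names (shared_word_replay(), pm_model, the two flag macros) are hypothetical and only loosely mirror the fields named in the commit.

```c
#include <stdbool.h>
#include <stdint.h>

#define MUST_RESUME      (1u << 0)
#define WORK_IN_PROGRESS (1u << 1)

/* Replay of two unsynchronized read-modify-write updates of one word. */
uint32_t shared_word_replay(void)
{
    uint32_t word = 0;
    uint32_t a = word;          /* CPU A loads the shared word */
    uint32_t b = word;          /* CPU B loads the same word */

    a |= MUST_RESUME;           /* A sets its bit in a private copy */
    b |= WORK_IN_PROGRESS;      /* B sets its bit in a private copy */

    word = a;                   /* A stores */
    word = b;                   /* B stores and silently drops A's bit */
    return word;
}

/* Model of the fix: work_in_progress becomes its own object. */
struct pm_model {
    bool work_in_progress;      /* separate memory location */
    uint32_t flags;             /* remaining flag bits */
};

/* The same interleaving no longer has a shared word to clobber. */
struct pm_model separate_fields_replay(void)
{
    struct pm_model pm = { false, 0 };
    uint32_t a = pm.flags;      /* CPU A loads only the flag word */

    a |= MUST_RESUME;
    pm.work_in_progress = true; /* CPU B writes a different object */
    pm.flags = a;               /* A's store cannot undo B's write */
    return pm;
}
```

In the first replay the final word holds only WORK_IN_PROGRESS; in the second both updates survive, because a plain bool occupies a memory location of its own.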
2026-02-04 | sched/mmcid: Protect transition on weakly ordered systems | Thomas Gleixner
Shrikanth reported a hard lockup which he observed once. The stack trace shows the following CID related participants:

  watchdog: CPU 23 self-detected hard LOCKUP @ mm_get_cid+0xe8/0x188
  NIP: mm_get_cid+0xe8/0x188
  LR: mm_get_cid+0x108/0x188
   mm_cid_switch_to+0x3c4/0x52c
   __schedule+0x47c/0x700
   schedule_idle+0x3c/0x64
   do_idle+0x160/0x1b0
   cpu_startup_entry+0x48/0x50
   start_secondary+0x284/0x288
   start_secondary_prolog+0x10/0x14

  watchdog: CPU 11 self-detected hard LOCKUP @ plpar_hcall_norets_notrace+0x18/0x2c
  NIP: plpar_hcall_norets_notrace+0x18/0x2c
  LR: queued_spin_lock_slowpath+0xd88/0x15d0
   _raw_spin_lock+0x80/0xa0
   raw_spin_rq_lock_nested+0x3c/0xf8
   mm_cid_fixup_cpus_to_tasks+0xc8/0x28c
   sched_mm_cid_exit+0x108/0x22c
   do_exit+0xf4/0x5d0
   make_task_dead+0x0/0x178
   system_call_exception+0x128/0x390
   system_call_vectored_common+0x15c/0x2ec

The task on CPU11 is running the CID ownership mode change fixup function and is stuck on a runqueue lock. The task on CPU23 is trying to get a CID from the pool with the same runqueue lock held, but the pool is empty.

After decoding a similar issue in the opposite direction, switching from per task to per CPU mode, the tool which models the possible scenarios failed to come up with a similar loophole.

This showed up only once, was not reproducible and, according to tooling, not related to an overlooked scheduling scenario permutation. But the fact that it was observed on a PowerPC system gave the right hint: PowerPC is a weakly ordered architecture.

The transition mechanism does:

  WRITE_ONCE(mm->mm_cid.transit, MM_CID_TRANSIT);
  WRITE_ONCE(mm->mm_cid.percpu, new_mode);

  fixup()

  WRITE_ONCE(mm->mm_cid.transit, 0);

mm_cid_schedin() does:

  if (!READ_ONCE(mm->mm_cid.percpu))
     ...
     cid |= READ_ONCE(mm->mm_cid.transit);

so weakly ordered systems can observe percpu == false and transit == 0 even if the fixup function has not yet completed. As a consequence the task will not drop the CID when scheduling out before the fixup is completed, which means the CID space can be exhausted and the next task scheduling in will loop in mm_get_cid() and the fixup thread can livelock on the held runqueue lock as above.

This could obviously be solved by using:

  smp_store_release(&mm->mm_cid.percpu, true);

and

  smp_load_acquire(&mm->mm_cid.percpu);

but that brings a memory barrier back into the scheduler hotpath, which was just designed out by the CID rewrite.

That can be completely avoided by combining the per CPU mode and the transit storage into a single mm_cid::mode member and ordering the stores against the fixup functions to prevent the CPU from reordering them.

That makes the update of both states atomic and a concurrent read observes always consistent state.

The price is an additional AND operation in mm_cid_schedin() to evaluate the per CPU or the per task path, but that's in the noise even on strongly ordered architectures as the actual load can be significantly more expensive and the conditional branch evaluation is there anyway.

Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions") Closes: https://lore.kernel.org/bdfea828-4585-40e8-8835-247c6a8a76b0@linux.ibm.com Reported-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260201192834.965217106@kernel.org
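The single-word idea can be sketched with C11 atomics. This is a userspace illustration only: the bit layout, the MODE_* names and the helper functions are assumptions made for the example, and the sketch deliberately leaves out how the real code orders these stores against the fixup functions. What it shows is that a reader doing one load gets the per CPU flag and the transit flag from the same snapshot, so the two can never be observed out of step with each other.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout; the kernel's actual encoding may differ. */
#define MODE_PERCPU  0x1u
#define MODE_TRANSIT 0x2u

struct cid_mode_model { atomic_uint mode; };
struct cid_snapshot   { bool percpu; bool transit; };

/* One store publishes both the new mode and the "in transition" marker. */
void mode_begin_transition(struct cid_mode_model *m, bool to_percpu)
{
    unsigned int v = MODE_TRANSIT | (to_percpu ? MODE_PERCPU : 0u);

    atomic_store_explicit(&m->mode, v, memory_order_relaxed);
}

/* Clear only the transit bit once the (omitted) fixup has finished. */
void mode_end_transition(struct cid_mode_model *m)
{
    atomic_fetch_and_explicit(&m->mode, ~MODE_TRANSIT, memory_order_relaxed);
}

/* One load, then an AND per flag: both flags come from one snapshot. */
struct cid_snapshot mode_read(struct cid_mode_model *m)
{
    unsigned int v = atomic_load_explicit(&m->mode, memory_order_relaxed);
    struct cid_snapshot s = { (v & MODE_PERCPU) != 0, (v & MODE_TRANSIT) != 0 };

    return s;
}
```

With two separate variables, a reader needs two loads and may pair a new value of one with a stale value of the other; with one word that combination cannot be constructed, at the cost of the extra AND the commit message mentions.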
2026-02-03 | blk-mq: add a new queue sysfs attribute async_depth | Yu Kuai
Add a new field async_depth to request_queue, along with related APIs. It is currently not used; following patches will convert elevators to use it instead of their internal async_depth. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-03 | block: convert nr_requests to unsigned int | Yu Kuai
This value represents the number of requests for elevator tags, or driver tags if the elevator is none. The max value for elevator tags is 2048, and in drivers at most 16 bits are used for the tag. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-01 | Merge tag 'perf-urgent-2026-02-01' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events fix from Ingo Molnar: "Fix a race in the user-callchains code" * tag 'perf-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: sched: Fix perf crash with new is_user_task() helper
2026-01-31 | i3c: master: Add i3c_master_do_daa_ext() for post-hibernation address recovery | Adrian Hunter
After system hibernation, I3C Dynamic Addresses may be reassigned at boot and no longer match the values recorded before suspend. Introduce i3c_master_do_daa_ext() to handle this situation. The restore procedure is straightforward: issue a Reset Dynamic Address Assignment (RSTDAA), then run the standard DAA sequence. The existing DAA logic already supports detecting and updating devices whose dynamic addresses differ from previously known values. Refactor the DAA path by introducing a shared helper used by both the normal i3c_master_do_daa() path and the new extended restore function, and correct the kernel-doc in the process. Export i3c_master_do_daa_ext() so that master drivers can invoke it from their PM restore callbacks. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260123063325.8210-2-adrian.hunter@intel.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2026-01-30 | perf: sched: Fix perf crash with new is_user_task() helper | Steven Rostedt
In order to do a user space stacktrace the current task needs to be a user task that has executed in user space. It used to be possible to test if a task is a user task or not by simply checking the task_struct mm field. If it was non-NULL, it was a user task and if not it was a kernel task. But things have changed over time, and some kernel tasks now have their own mm field. An idea was proposed to instead test PF_KTHREAD, and two functions were used to wrap this check in case it became more complex to test if a task was a user task or not[1]. But this was rejected and the C code simply checked PF_KTHREAD directly. It was later found that not all kernel threads set PF_KTHREAD. The io-uring helpers instead set PF_USER_WORKER and this needed to be added as well. But checking the flags is still not enough. There's a very small window when a task exits in which it frees its mm field and the field is set back to NULL. If perf were to trigger at this moment, the flags test would say it's a user space task, but when perf read the mm field it would crash with a NULL pointer dereference. Now there are flags that can be used to test if a task is exiting, but they are set in areas where perf may still want to profile the user space task (to see where it exited). The only real test is to check both the flags and the mm field. Instead of making this modification in every location, create a new is_user_task() helper function that does all the tests needed to know if it is safe to read the user space memory or not.
[1] https://lore.kernel.org/all/20250425204120.639530125@goodmis.org/ Fixes: 90942f9fac05 ("perf: Use current->flags & PF_KTHREAD|PF_USER_WORKER instead of current->mm == NULL") Closes: https://lore.kernel.org/all/0d877e6f-41a7-4724-875d-0b0a27b8a545@roeck-us.net/ Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260129102821.46484722@gandalf.local.home
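The combined test described above (reject kernel-thread style flags and also require a non-NULL mm) can be sketched as a small predicate. This is a userspace model: task_model, mm_model, the MODEL_PF_* values and is_user_task_model() are hypothetical stand-ins, and the exact form of the kernel's is_user_task() is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's PF_* constants differ. */
#define MODEL_PF_KTHREAD     0x1u
#define MODEL_PF_USER_WORKER 0x2u

struct mm_model { int unused; };

struct task_model {
    unsigned int flags;
    struct mm_model *mm;    /* NULL for kernel tasks and late in exit */
};

/* True only when it is safe to treat the task as having user memory. */
bool is_user_task_model(const struct task_model *t)
{
    /* Kernel threads and user workers are never user tasks. */
    if (t->flags & (MODEL_PF_KTHREAD | MODEL_PF_USER_WORKER))
        return false;

    /* An exiting user task may already have dropped its mm. */
    return t->mm != NULL;
}
```

Neither check is sufficient alone: the flags test misses the exit window in which mm is already NULL, and the mm test misses kernel tasks that carry an mm of their own.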
2026-01-30 | Merge tag 'dma-mapping-6.19-2026-01-30' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: - important fix for ARM 32-bit based systems using cma= kernel parameter (Oreoluwa Babatunde) - a fix for the corner case of the DMA atomic pool based allocations (Sai Sree Kartheek Adivi) * tag 'dma-mapping-6.19-2026-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma/pool: distinguish between missing and exhausted atomic pools of: reserved_mem: Allow reserved_mem framework detect "cma=" kernel param
2026-01-30 | block: introduce bdev_rot() | Damien Le Moal
Introduce the helper function bdev_rot() to test if a block device is a rotational one. The existing function bdev_nonrot() which tests for the opposite condition is redefined using this new helper. This avoids the double negation (operator and name) that appears when testing if a block device is a rotational device, thus making the code a little easier to read. Call sites of bdev_nonrot() in the block layer are updated to use this new helper. Remaining users in other subsystems are left unchanged for now. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
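The relationship between the two helpers can be sketched as follows. This is a simplified model, assuming a single feature bit on a hypothetical bdev_model structure; the real helpers operate on struct block_device and its queue limits, which are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical feature bit and device structure for illustration. */
#define MODEL_FEAT_ROTATIONAL (1u << 0)

struct bdev_model { unsigned int features; };

/* Positive test: is this a rotational device? */
static inline bool bdev_rot_model(const struct bdev_model *bdev)
{
    return (bdev->features & MODEL_FEAT_ROTATIONAL) != 0;
}

/* The older negative test, now expressed through the positive one. */
static inline bool bdev_nonrot_model(const struct bdev_model *bdev)
{
    return !bdev_rot_model(bdev);
}
```

A caller that wants rotational devices can then write `if (bdev_rot_model(bdev))` instead of `if (!bdev_nonrot_model(bdev))`, which is the double negation the commit removes.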
2026-01-29 | net: add skb_header_pointer_careful() helper | Eric Dumazet
This variant of skb_header_pointer() should be used in contexts where @offset argument is user-controlled and could be negative. Negative offsets are supported, as long as the zone starts between skb->head and skb->data. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260128141539.3404400-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
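The bounds rule stated here (a negative offset is acceptable only while the requested zone still starts at or after the buffer head) can be modelled on a flat buffer. This is a sketch, not the kernel helper: pkt_model and header_pointer_careful_model() are hypothetical, the packet is assumed to be linear, and the copy-to-scratch-buffer behaviour of skb_header_pointer() is left out.

```c
#include <stddef.h>

/* Hypothetical linear packet: [head, data) is headroom, data has len bytes. */
struct pkt_model {
    const unsigned char *head;
    const unsigned char *data;
    unsigned int len;
};

/*
 * Return a pointer to n bytes at data + offset, or NULL if the zone
 * would start before head or end past the valid data.
 */
const unsigned char *header_pointer_careful_model(const struct pkt_model *p,
                                                  int offset, unsigned int n)
{
    long long off = offset;                     /* widen to avoid overflow */
    long long headroom = p->data - p->head;

    if (off < -headroom)                        /* would start before head */
        return NULL;
    if (off + (long long)n > (long long)p->len) /* would run past the data */
        return NULL;
    return p->data + off;
}
```

Doing the arithmetic in a wider signed type keeps a hostile offset such as INT_MIN from wrapping around before it is compared against the headroom.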
2026-01-29 | block: introduce blk_queue_rot() | Damien Le Moal
To check if a request queue is for a rotational device, a double negation is needed with the pattern "!blk_queue_nonrot(q)". Simplify this with the introduction of the helper blk_queue_rot(), which tests if a request queue's limits have the BLK_FEAT_ROTATIONAL feature set. All call sites of blk_queue_nonrot() are modified to use blk_queue_rot() and the blk_queue_nonrot() definition is removed. No functional changes. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-29 | block: cleanup queue limit features definition | Damien Le Moal
Unwrap the definition of BLK_FEAT_ATOMIC_WRITES and renumber this feature to be sequential with BLK_FEAT_SKIP_TAGSET_QUIESCE. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-29 | Merge tag 'mm-hotfixes-stable-2026-01-29-09-41' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "16 hotfixes. 9 are cc:stable, 12 are for MM. There's a patch series from Pratyush Yadav which fixes a few things in the new-in-6.19 LUO memfd code. Plus the usual shower of singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-01-29-09-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: vmcoreinfo: make hwerr_data visible for debugging mm/zone_device: reinitialize large zone device private folios mm/mm_init: don't cond_resched() in deferred_init_memmap_chunk() if called from deferred_grow_zone() mm/kfence: randomize the freelist on initialization kho: kho_preserve_vmalloc(): don't return 0 when ENOMEM kho: init alloc tags when restoring pages from reserved memory mm: memfd_luo: restore and free memfd_luo_ser on failure mm: memfd_luo: use memfd_alloc_file() instead of shmem_file_setup() memfd: export alloc_file() flex_proportions: make fprop_new_period() hardirq safe mailmap: add entry for Viacheslav Bocharov mm/memory-failure: teach kill_accessing_process to accept hugetlb tail page pfn mm/memory-failure: fix missing ->mf_stats count in hugetlb poison mm, swap: restore swap_space attr aviod kernel panic mm/kasan: fix KASAN poisoning in vrealloc() mm/shmem, swap: fix race of truncate and swap entry split