summaryrefslogtreecommitdiff
path: root/drivers/xen
AgeCommit message (Collapse)Author
14 hoursMerge tag 'for-linus-7.1-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - fix an error path in drivers/xen/manage.c - fix the Xen console driver solving a boot hangup when the console backend isn't yet running - comment fix in the Xen swiotlb driver - hardening for Xen on Arm adding a more thorough validation - cleanup of the Xen grant table code hiding suspend/resume code for the case if CONFIG_HIBERNATE_CALLBACKS isn't defined * tag 'for-linus-7.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/grant-table: guard gnttab_suspend/resume with CONFIG_HIBERNATE_CALLBACKS hvc/xen: Check console connection flag xen/swiotlb: fix stale reference to swiotlb_unmap_page() xen/manage: unwind partial shutdown watcher setup on error ARM: xen: validate hypervisor compatible before parsing its version
2 daysMerge tag 'acpi-7.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI support updates from Rafael Wysocki: "These include an update of the CMOS RTC driver and the related ACPI and x86 code that, among other things, switches it over to using the platform device interface for device binding on x86 instead of the PNP device driver interface (which allows the code in question to be simplified quite a bit), a major update of the ACPI Time and Alarm Device (TAD) driver adding an RTC class device interface to it, and updates of core ACPI drivers that remove some unnecessary and not really useful code from them. Apart from that, two drivers are converted to using the platform driver interface for device binding instead of the ACPI driver one, which is slated for removal, support for the Performance Limited register is added to the ACPI CPPC library and there are some janitorial updates of it and the related cpufreq CPPC driver, the ACPI processor driver is fixed and cleaned up, and NVIDIA vendor CPER record handler is added to the APEI GHES code. Also, the interface for obtaining a CPU UID from ACPI is consolidated across architectures and used for fixing a problem with the PCI TPH Steering Tag on ARM64, there are two updates related to ACPICA, a minor ACPI OS Services Layer (OSL) update, and a few assorted updates related to ACPI tables parsing. Specifics: - Update maintainers information regarding ACPICA (Rafael Wysocki) - Replace strncpy() with strscpy_pad() in acpi_ut_safe_strncpy() (Kees Cook) - Trigger an ordered system power off after encountering a fatal error operator in AML (Armin Wolf) - Enable ACPI FPDT parsing on LoongArch (Xi Ruoyao) - Remove the temporary stop-gap acpi_pptt_cache_v1_full structure from the ACPI PPTT parser (Ben Horgan) - Add support for exposing ACPI FPDT subtables FBPT and S3PT (Nate DeSimone) - Address multiple assorted issues and clean up the code in the ACPI processor idle driver (Huisong Li) - Replace strlcat() in the ACPI processor idle drive with a better alternative (Andy Shevchenko) - Rearrange and clean up acpi_processor_errata_piix4() (Rafael Wysocki) - Move reference performance to capabilities and fix an uninitialized variable in the ACPI CPPC library (Pengjie Zhang) - Add support for the Performance Limited Register to the ACPI CPPC library (Sumit Gupta) - Add cppc_get_perf() API to read performance controls, extend cppc_set_epp_perf() for FFH/SystemMemory, and make the ACPI CPPC library warn on missing mandatory DESIRED_PERF register (Sumit Gupta) - Modify the cpufreq CPPC driver to update MIN_PERF/MAX_PERF in target callbacks to allow it to control performance bounds via standard scaling_min_freq and scaling_max_freq sysfs attributes and add sysfs documentation for the Performance Limited Register to it (Sumit Gupta) - Add ACPI support to the platform device interface in the CMOS RTC driver, make the ACPI core device enumeration code create a platform device for the CMOS RTC, and drop CMOS RTC PNP device support (Rafael Wysocki) - Consolidate the x86-specific CMOS RTC handling with the ACPI TAD driver and clean up the CMOS RTC ACPI address space handler (Rafael Wysocki) - Enable ACPI alarm in the CMOS RTC driver if advertised in ACPI FADT and allow that driver to work without a dedicated IRQ if the ACPI alarm is used (Rafael Wysocki) - Clean up the ACPI TAD driver in various ways and add an RTC class device interface, including both the RTC setting/reading and alarm timer support, to it (Rafael Wysocki) - Clean up the ACPI AC and ACPI PAD (processor aggregator device) drivers (Rafael Wysocki) - Rework checking for duplicate video bus devices and consolidate pnp.bus_id workarounds handling in the ACPI video bus driver (Rafael Wysocki) - Update the ACPI core device drivers to stop setting acpi_device_name() unnecessarily (Rafael Wysocki) - Rearrange code using acpi_device_class() in the ACPI core device drivers and update them to stop setting acpi_device_class() unnecessarily (Rafael Wysocki) - Define ACPI_AC_CLASS in one place (Rafael Wysocki) - Convert the ni903x_wdt watchdog driver and the xen ACPI PAD driver to bind to platform devices instead of ACPI devices (Rafael Wysocki) - Add devm_ghes_register_vendor_record_notifier(), use it in the PCI hisi driver, and Add NVIDIA vendor CPER record handler (Kai-Heng Feng) - Consolidate the interface for obtaining a CPU UID from ACPI across architectures and use it to address incorrect PCI TPH Steering Tag on ARM64 resulting from the invalid assumption that the ACPI Processor UID would always be the same as the corresponding logical CPU ID in Linux (Chengwen Feng)" * tag 'acpi-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (73 commits) ACPICA: Update maintainers information watchdog: ni903x_wdt: Convert to a platform driver ACPI: PAD: xen: Convert to a platform driver ACPI: processor: idle: Reset cpuidle on C-state list changes cpuidle: Extract and export no-lock variants of cpuidle_unregister_device() PCI/TPH: Pass ACPI Processor UID to Cache Locality _DSM ACPI: PPTT: Use acpi_get_cpu_uid() and remove get_acpi_id_for_cpu() perf: arm_cspmu: Switch to acpi_get_cpu_uid() from get_acpi_id_for_cpu() ACPI: Centralize acpi_get_cpu_uid() declaration in include/linux/acpi.h x86/acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval RISC-V: ACPI: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval LoongArch: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval arm64: acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler PCI: hisi: Use devm_ghes_register_vendor_record_notifier() ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier() ACPI: tables: Enable FPDT on LoongArch ACPI: processor: idle: Fix NULL pointer dereference in hotplug path ACPI: processor: idle: Reset power_setup_done flag on initialization failure ACPI: TAD: Add alarm support to the RTC class device interface ...
2 daysMerge tag 'driver-core-7.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "debugfs: - Fix NULL pointer dereference in debugfs_create_str() - Fix misplaced EXPORT_SYMBOL_GPL for debugfs_create_str() - Fix soundwire debugfs NULL pointer dereference from uninitialized firmware_file device property: - Make fwnode flags modifications thread safe; widen the field to unsigned long and use set_bit() / clear_bit() based accessors - Document how to check for the property presence devres: - Separate struct devres_node from its "subclasses" (struct devres, struct devres_group); give struct devres_node its own release and free callbacks for per-type dispatch - Introduce struct devres_action for devres actions, avoiding the ARCH_DMA_MINALIGN alignment overhead of struct devres - Export struct devres_node and its init/add/remove/dbginfo primitives for use by Rust Devres<T> - Fix missing node debug info in devm_krealloc() - Use guard(spinlock_irqsave) where applicable; consolidate unlock paths in devres_release_group() driver_override: - Convert PCI, WMI, vdpa, s390/cio, s390/ap, and fsl-mc to the generic driver_override infrastructure, replacing per-bus driver_override strings, sysfs attributes, and match logic; fixes a potential UAF from unsynchronized access to driver_override in bus match() callbacks - Simplify __device_set_driver_override() logic kernfs: - Send IN_DELETE_SELF and IN_IGNORED inotify events on kernfs file and directory removal - Add corresponding selftests for memcg platform: - Allow attaching software nodes when creating platform devices via a new 'swnode' field in struct platform_device_info - Add kerneldoc for struct platform_device_info software node: - Move software node initialization from postcore_initcall() to driver_init(), making it available early in the boot process - Move kernel_kobj initialization (ksysfs_init) earlier to support the above - Remove software_node_exit(); dead code in a built-in unit SoC: - Introduce of_machine_read_compatible() and of_machine_read_model() OF helpers and export soc_attr_read_machine() to replace direct accesses to of_root from SoC drivers; also enables CONFIG_COMPILE_TEST coverage for these drivers sysfs: - Constify attribute group array pointers to 'const struct attribute_group *const *' in sysfs functions, device_add_groups() / device_remove_groups(), and struct class Rust: - Devres: - Embed struct devres_node directly in Devres<T> instead of going through devm_add_action(), avoiding the extra allocation and the unnecessary ARCH_DMA_MINALIGN alignment - I/O: - Turn IoCapable from a marker trait into a functional trait carrying the raw I/O accessor implementation (io_read / io_write), providing working defaults for the per-type Io methods - Add RelaxedMmio wrapper type, making relaxed accessors usable in code generic over the Io trait - Remove overloaded per-type Io methods and per-backend macros from Mmio and PCI ConfigSpace - I/O (Register): - Add IoLoc trait and generic read/write/update methods to the Io trait, making I/O operations parameterizable by typed locations - Add register! macro for defining hardware register types with typed bitfield accessors backed by Bounded values; supports direct, relative, and array register addressing - Add write_reg() / try_write_reg() and LocatedRegister trait - Update PCI sample driver to demonstrate the register! macro Example: ``` register! { /// UART control register. CTRL(u32) @ 0x18 { /// Receiver enable. 19:19 rx_enable => bool; /// Parity configuration. 14:13 parity ?=> Parity; } /// FIFO watermark and counter register. WATER(u32) @ 0x2c { /// Number of datawords in the receive FIFO. 26:24 rx_count; /// RX interrupt threshold. 17:16 rx_water; } } impl WATER { fn rx_above_watermark(&self) -> bool { self.rx_count() > self.rx_water() } } fn init(bar: &pci::Bar<BAR0_SIZE>) { let water = WATER::zeroed() .with_const_rx_water::<1>(); // > 3 would not compile bar.write_reg(water); let ctrl = CTRL::zeroed() .with_parity(Parity::Even) .with_rx_enable(true); bar.write_reg(ctrl); } fn handle_rx(bar: &pci::Bar<BAR0_SIZE>) { if bar.read(WATER).rx_above_watermark() { // drain the FIFO } } fn set_parity(bar: &pci::Bar<BAR0_SIZE>, parity: Parity) { bar.update(CTRL, |r| r.with_parity(parity)); } ``` - IRQ: - Move 'static bounds from where clauses to trait declarations for IRQ handler traits - Misc: - Enable the generic_arg_infer Rust feature - Extend Bounded with shift operations, single-bit bool conversion, and const get() Misc: - Make deferred_probe_timeout default a Kconfig option - Drop auxiliary_dev_pm_ops; the PM core falls back to driver PM callbacks when no bus type PM ops are set - Add conditional guard support for device_lock() - Add ksysfs.c to the DRIVER CORE MAINTAINERS entry - Fix kernel-doc warnings in base.h - Fix stale reference to memory_block_add_nid() in documentation" * tag 'driver-core-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (67 commits) bus: fsl-mc: use generic driver_override infrastructure s390/ap: use generic driver_override infrastructure s390/cio: use generic driver_override infrastructure vdpa: use generic driver_override infrastructure platform/wmi: use generic driver_override infrastructure PCI: use generic driver_override infrastructure driver core: make software nodes available earlier software node: remove software_node_exit() kernel: ksysfs: initialize kernel_kobj earlier MAINTAINERS: add ksysfs.c to the DRIVER CORE entry drivers/base/memory: fix stale reference to memory_block_add_nid() device property: Document how to check for the property presence soundwire: debugfs: initialize firmware_file to empty string debugfs: fix placement of EXPORT_SYMBOL_GPL for debugfs_create_str() debugfs: check for NULL pointer in debugfs_create_str() driver core: Make deferred_probe_timeout default a Kconfig option driver core: simplify __device_set_driver_override() clearing logic driver core: auxiliary bus: Drop auxiliary_dev_pm_ops device property: Make modifications of fwnode "flags" thread safe rust: devres: embed struct devres_node directly ...
6 daysxen/grant-table: guard gnttab_suspend/resume with CONFIG_HIBERNATE_CALLBACKSPengpeng Hou
In current linux.git, gnttab_suspend() and gnttab_resume() are defined and declared unconditionally. However, their only in-tree callers reside in drivers/xen/manage.c, which are guarded by CONFIG_HIBERNATE_CALLBACKS. Match the helper scope to their callers by wrapping the definitions in CONFIG_HIBERNATE_CALLBACKS and providing no-op stubs in the header. This fixes the config-scope mismatch and reduces the code footprint when hibernation callbacks are disabled. Signed-off-by: Pengpeng Hou <pengpeng.hou@isrc.iscas.ac.cn> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260310080800.742223-1-pengpeng.hou@isrc.iscas.ac.cn>
6 daysxen/swiotlb: fix stale reference to swiotlb_unmap_page()Kexin Sun
Commit af85de5a9f00 ("xen: swiotlb: Switch to physical address mapping callbacks") renamed xen_swiotlb_unmap_page() to xen_swiotlb_unmap_phys(). The comment in xen_swiotlb_unmap_sg() had already been missing the xen_ prefix (reading swiotlb_unmap_page()), and the rename only changed _page to _phys without correcting this, leaving it as swiotlb_unmap_phys(). Fix the reference to use the correct function name xen_swiotlb_unmap_phys(). Assisted-by: unnamed:deepseek-v3.2 coccinelle Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260321110039.8905-1-kexinsun@smail.nju.edu.cn>
6 daysxen/manage: unwind partial shutdown watcher setup on errorGuoHan Zhao
setup_shutdown_watcher() registers shutdown_watch first, then the sysrq watch, and finally publishes the supported feature-* nodes in xenstore. If sysrq watch registration fails, or xenbus_printf() fails after one or more feature nodes were created, the function returns immediately without undoing the earlier setup. This leaves the system in a partially initialized state, with registered watches and/or stale xenstore entries despite the function reporting failure. Unwind the partial setup before returning an error by unregistering any watches that were already registered and removing feature nodes that were already published. Signed-off-by: GuoHan Zhao <zhaoguohan@kylinos.cn> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260407022443.12971-1-zhaoguohan@kylinos.cn>
7 daysMerge branch 'acpi-driver'Rafael J. Wysocki
Merge ACPI core driver core driver updates and assorted driver updates related to ACPI support for 7.1-rc1: - Clean up the ACPI AC and ACPI PAD (processor aggregator device) drivers (Rafael Wysocki) - Rework checking for duplicate video bus devices and consolidate pnp.bus_id workarounds handling in the ACPI video bus driver (Rafael Wysocki) - Update the ACPI core device drivers to stop setting acpi_device_name() unnecessarily (Rafael Wysocki) - Rearrange code using acpi_device_class() in the ACPI core device drivers and update them to stop setting acpi_device_class() unnecessarily (Rafael Wysocki) - Define ACPI_AC_CLASS in one place (Rafael Wysocki) - Convert the ni903x_wdt watchdog driver and the xen ACPI PAD driver to bind to platform devices instead of ACPI devices (Rafael Wysocki) * acpi-driver: watchdog: ni903x_wdt: Convert to a platform driver ACPI: PAD: xen: Convert to a platform driver ACPI: AC: Define ACPI_AC_CLASS in one place ACPI: driver: Do not set acpi_device_class() unnecessarily ACPI: driver: Avoid using pnp.device_class for netlink handling ACPI: event: Redefine acpi_notifier_call_chain() ACPI: driver: Do not set acpi_device_name() unnecessarily ACPI: video: Consolidate pnp.bus_id workarounds handling ACPI: video: Rework checking for duplicate video bus devices driver core: auxiliary bus: Introduce dev_is_auxiliary() ACPI: PAD: Rearrange notify handler installation and removal ACPI: AC: Get rid of unnecessary declarations
9 daysACPI: PAD: xen: Convert to a platform driverRafael J. Wysocki
In all cases in which a struct acpi_driver is used for binding a driver to an ACPI device object, a corresponding platform device is created by the ACPI core and that device is regarded as a proper representation of underlying hardware. Accordingly, a struct platform_driver should be used by driver code to bind to that device. There are multiple reasons why drivers should not bind directly to ACPI device objects [1]. Overall, it is better to bind drivers to platform devices than to their ACPI companions, so convert the Xen ACPI processor aggregator device (PAD) driver to a platform one. While this is not expected to alter functionality, it changes sysfs layout and so it will be visible to user space. Link: https://lore.kernel.org/all/2396510.ElGaqSPkdT@rafael.j.wysocki/ [1] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/8683270.T7Z3S40VBb@rafael.j.wysocki
13 daysPCI: use generic driver_override infrastructureDanilo Krummrich
When a driver is probed through __driver_attach(), the bus' match() callback is called without the device lock held, thus accessing the driver_override field without a lock, which can cause a UAF. Fix this by using the driver-core driver_override infrastructure taking care of proper locking internally. Note that calling match() from __driver_attach() without the device lock held is intentional. [1] Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1] Reported-by: Gui-Dong Han <hanguidong02@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789 Fixes: 782a985d7af2 ("PCI: Introduce new device binding path using pci_dev.driver_override") Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Williamson <alex@shazbot.org> Tested-by: Gui-Dong Han <hanguidong02@gmail.com> Reviewed-by: Gui-Dong Han <hanguidong02@gmail.com> Link: https://patch.msgid.link/20260324005919.2408620-6-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-03-26xen/privcmd: unregister xenstore notifier on module exitGuoHan Zhao
Commit 453b8fb68f36 ("xen/privcmd: restrict usage in unprivileged domU") added a xenstore notifier to defer setting the restriction target until Xenstore is ready. XEN_PRIVCMD can be built as a module, but privcmd_exit() leaves that notifier behind. Balance the notifier lifecycle by unregistering it on module exit. This is harmless even if xenstore was already ready at registration time and the notifier was never queued on the chain. Fixes: 453b8fb68f3641fe ("xen/privcmd: restrict usage in unprivileged domU") Signed-off-by: GuoHan Zhao <zhaoguohan@kylinos.cn> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260325120246.252899-1-zhaoguohan@kylinos.cn>
2026-03-20xen/privcmd: add boot control for restricted usage in domUJuergen Gross
When running in an unprivileged domU under Xen, the privcmd driver is restricted to allow only hypercalls against a target domain, for which the current domU is acting as a device model. Add a boot parameter "unrestricted" to allow all hypercalls (the hypervisor will still refuse destructive hypercalls affecting other guests). Make this new parameter effective only in case the domU wasn't started using secure boot, as otherwise hypercalls targeting the domU itself might result in violating the secure boot functionality. This is achieved by adding another lockdown reason, which can be tested to not being set when applying the "unrestricted" option. This is part of XSA-482 Signed-off-by: Juergen Gross <jgross@suse.com> --- V2: - new patch
2026-03-20xen/privcmd: restrict usage in unprivileged domUJuergen Gross
The Xen privcmd driver allows to issue arbitrary hypercalls from user space processes. This is normally no problem, as access is usually limited to root and the hypervisor will deny any hypercalls affecting other domains. In case the guest is booted using secure boot, however, the privcmd driver would be enabling a root user process to modify e.g. kernel memory contents, thus breaking the secure boot feature. The only known case where an unprivileged domU is really needing to use the privcmd driver is the case when it is acting as the device model for another guest. In this case all hypercalls issued via the privcmd driver will target that other guest. Fortunately the privcmd driver can already be locked down to allow only hypercalls targeting a specific domain, but this mode can be activated from user land only today. The target domain can be obtained from Xenstore, so when not running in dom0 restrict the privcmd driver to that target domain from the beginning, resolving the potential problem of breaking secure boot. This is XSA-482 Reported-by: Teddy Astie <teddy.astie@vates.tech> Fixes: 1c5de1939c20 ("xen: add privcmd driver") Signed-off-by: Juergen Gross <jgross@suse.com> --- V2: - defer reading from Xenstore if Xenstore isn't ready yet (Jan Beulich) - wait in open() if target domain isn't known yet - issue message in case no target domain found (Jan Beulich)
2026-03-07Merge tag 'for-linus-7.0-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a cleanup of arch/x86/kernel/head_64.S removing the pre-built page tables for Xen guests - a small comment update - another cleanup for Xen PVH guests mode - fix an issue with Xen PV-devices backed by driver domains * tag 'for-linus-7.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/xenbus: better handle backend crash xenbus: add xenbus_device parameter to xenbus_read_driver_state() x86/PVH: Use boot params to pass RSDP address in start_info page x86/xen: update outdated comment xen/acpi-processor: fix _CST detection using undersized evaluation buffer x86/xen: Build identity mapping page tables dynamically for XENPV
2026-03-04xen/xenbus: better handle backend crashJuergen Gross
When the backend domain crashes, coordinated device cleanup is not possible (as it involves waiting for the backend state change). In that case, toolstack forcefully removes frontend xenstore entries. xenbus_dev_changed() handles this case, and triggers device cleanup. It's possible that toolstack manages to connect new device in that place, before xenbus_dev_changed() notices the old one is missing. If that happens, new one won't be probed and will forever remain in XenbusStateInitialising. Fix this by checking the frontend's state in Xenstore. In case it has been reset to XenbusStateInitialising by Xen tools, consider this being the result of an unplug+plug operation. It's important that cleanup on such unplug doesn't modify Xenstore entries (especially the "state" key) as it belong to the new device to be probed - changing it would derail establishing connection to the new backend (most likely, closing the device before it was even connected). Handle this case by setting new xenbus_device->vanished flag to true, and check it before changing state entry. And even if xenbus_dev_changed() correctly detects the device was forcefully removed, the cleanup handling is still racy. Since this whole handling doesn't happened in a single Xenstore transaction, it's possible that toolstack might put a new device there already. Avoid re-creating the state key (which in the case of loosing the race would actually close newly attached device). The problem does not apply to frontend domain crash, as this case involves coordinated cleanup. Problem originally reported at https://lore.kernel.org/xen-devel/aOZvivyZ9YhVWDLN@mail-itl/T/#t, including reproduction steps. Based-on-patch-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260218095205.453657-3-jgross@suse.com>
2026-03-04xenbus: add xenbus_device parameter to xenbus_read_driver_state()Juergen Gross
In order to prepare checking the xenbus device status in xenbus_read_driver_state(), add the pointer to struct xenbus_device as a parameter. Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: "Martin K. Petersen" <martin.petersen@oracle.com> # SCSI Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/xen-pcifront.c Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260218095205.453657-2-jgross@suse.com>
2026-03-03xen/acpi-processor: fix _CST detection using undersized evaluation bufferDavid Thomson
read_acpi_id() attempts to evaluate _CST using a stack buffer of sizeof(union acpi_object) (48 bytes), but _CST returns a nested Package of sub-Packages (one per C-state, each containing a register descriptor, type, latency, and power) requiring hundreds of bytes. The evaluation always fails with AE_BUFFER_OVERFLOW. On modern systems using FFH/MWAIT entry (where pblk is zero), this causes the function to return before setting the acpi_id_cst_present bit. In check_acpi_ids(), flags.power is then zero for all Phase 2 CPUs (physical CPUs beyond dom0's vCPU count), so push_cxx_to_hypervisor() is never called for them. On a system with dom0_max_vcpus=2 and 8 physical CPUs, only PCPUs 0-1 receive C-state data. PCPUs 2-7 are stuck in C0/C1 idle, unable to enter C2/C3. This costs measurable wall power (4W observed on an Intel Core Ultra 7 265K with Xen 4.20). The function never uses the _CST return value -- it only needs to know whether _CST exists. Replace the broken acpi_evaluate_object() call with acpi_has_method(), which correctly detects _CST presence using acpi_get_handle() without any buffer allocation. This brings C-state detection to parity with the P-state path, which already works correctly for Phase 2 CPUs. Fixes: 59a568029181 ("xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.") Signed-off-by: David Thomson <dt@linux-mail.net> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260224093707.19679-1-dt@linux-mail.net>
2026-02-22Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL usesKees Cook
Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_flex' family to use the new default GFP_KERNEL argumentLinus Torvalds
This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the *alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs*'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-10Merge tag 'x86_paravirt_for_v7.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt updates from Borislav Petkov: - A nice cleanup to the paravirt code containing a unification of the paravirt clock interface, taming the include hell by splitting the pv_ops structure and removing of a bunch of obsolete code (Juergen Gross) * tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/paravirt: Use XOR r32,r32 to clear register in pv_vcpu_is_preempted() x86/paravirt: Remove trailing semicolons from alternative asm templates x86/pvlocks: Move paravirt spinlock functions into own header x86/paravirt: Specify pv_ops array in paravirt macros x86/paravirt: Allow pv-calls outside paravirt.h objtool: Allow multiple pv_ops arrays x86/xen: Drop xen_mmu_ops x86/xen: Drop xen_cpu_ops x86/xen: Drop xen_irq_ops x86/paravirt: Move pv_native_*() prototypes to paravirt.c x86/paravirt: Introduce new paravirt-base.h header x86/paravirt: Move paravirt_sched_clock() related code into tsc.c x86/paravirt: Use common code for paravirt_steal_clock() riscv/paravirt: Use common code for paravirt_steal_clock() loongarch/paravirt: Use common code for paravirt_steal_clock() arm64/paravirt: Use common code for paravirt_steal_clock() arm/paravirt: Use common code for paravirt_steal_clock() sched: Move clock related paravirt code to kernel/sched paravirt: Remove asm/paravirt_api_clock.h x86/paravirt: Move thunk macros to paravirt_types.h ...
2026-02-09Merge tag 'for-linus-7.0-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - fix running as Xen PVH guest in 32-bit mode without PAE - fix PV device handling for suspend/resume when running as a Xen guest - clean up workqueue usage - fix the Xen balloon driver for PVH dom0 - introduce the possibility to use hypercalls for console messages in unprivileged guests - enable Xen dom0 use of virtio devices in nested virtualization setups - simplify the xen-mcelog driver * tag 'for-linus-7.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xenbus: Rename helpers to freeze/thaw/restore xenbus: Use .freeze/.thaw to handle xenbus devices xen/mcelog: simplify MCE_GETCLEAR_FLAGS using xchg() xen/balloon: improve accuracy of initial balloon target for dom0 Partial revert "x86/xen: fix balloon target initialization for PVH dom0" xen: introduce xen_console_io option xen/virtio: Don't use grant-dma-ops when running as Dom0 x86/xen/pvh: Enable PAE mode for 32-bit guest only when CONFIG_X86_PAE is set xen: privcmd: WQ_PERCPU added to alloc_workqueue users xen/events: replace use of system_wq with system_percpu_wq
2026-02-02xenbus: Rename helpers to freeze/thaw/restoreJason Andryuk
Rename the xenbus helpers called from the .freeze, .thaw, and .restore pm ops to have matching names. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20251119224731.61497-3-jason.andryuk@amd.com>
2026-02-02xenbus: Use .freeze/.thaw to handle xenbus devicesJason Andryuk
The goal is to fix s2idle and S3 for Xen PV devices. A domain resuming from s3 or s2idle disconnects its PV devices during resume. The backends are not expecting this and do not reconnect. b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/ resume/chkpt") changed xen_suspend()/do_suspend() from PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the suspend/resume callbacks remained. .freeze/restore are used with hiberation where Linux restarts in a new place in the future. .suspend/resume are useful for runtime power management for the duration of a boot. The current behavior of the callbacks works for an xl save/restore or live migration where the domain is restored/migrated to a new location and connecting to a not-already-connected backend. Change xenbus_pm_ops to use .freeze/thaw/restore and drop the .suspend/resume hook. This matches the use in drivers/xen/manage.c for save/restore and live migration. With .suspend/resume empty, PV devices are left connected during s2idle and s3, so PV devices are not changed and work after resume. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20251119224731.61497-2-jason.andryuk@amd.com>
2026-02-02xen/mcelog: simplify MCE_GETCLEAR_FLAGS using xchg()Uros Bizjak
The MCE_GETCLEAR_FLAGS ioctl retrieves xen_mcelog.flags while atomically clearing it. This was previously implemented using a cmpxchg() loop. Replace the cmpxchg() loop with a single xchg(), which provides the same atomic get-and-clear semantics, avoids retry spinning under contention, and simplifies the code. The code on x86_64 improves from: 186: 8b 15 00 00 00 00 mov 0x0(%rip),%edx 18c: 89 d0 mov %edx,%eax 18e: f0 0f b1 0d 00 00 00 lock cmpxchg %ecx,0x0(%rip) 195: 00 196: 39 c2 cmp %eax,%edx 198: 75 ec jne 186 <...> to just: 186: 87 05 00 00 00 00 xchg %eax,0x0(%rip) No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260122141754.116129-1-ubizjak@gmail.com>
2026-02-02xen/balloon: improve accuracy of initial balloon target for dom0Roger Pau Monne
The dom0 balloon target set by the toolstack is the value returned by XENMEM_current_reservation. Do the same in the kernel balloon driver and set the current allocation to the value returned by XENMEM_current_reservation. On my test system this causes the kernel balloon driver target to exactly match the value set by the toolstack in xenstore. Note this approach can be used by both PV and PVH dom0s, as the toolstack always uses XENMEM_current_reservation to set the initial target regardless of the dom0 type. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260128110510.46425-3-roger.pau@citrix.com>
2026-02-02Partial revert "x86/xen: fix balloon target initialization for PVH dom0"Roger Pau Monne
This partially reverts commit 87af633689ce16ddb166c80f32b120e50b1295de so the current memory target for PV guests is still fetched from start_info->nr_pages, which matches exactly what the toolstack sets the initial memory target to. Using get_num_physpages() is possible on PV also, but needs adjusting to take into account the ISA hole and the PFN at 0 not considered usable memory despite being populated, and hence would need extra adjustments. Instead of carrying those extra adjustments switch back to the previous code. That leaves Linux with a difference in how current memory target is obtained for HVM vs PV, but that's better than adding extra logic just for PV. However if switching to start_info->nr_pages for PV domains we need to differentiate between released pages (freed back to the hypervisor) as opposed to pages in the physmap which are not populated to start with. Introduce a new xen_unpopulated_pages to account for papges that have never been populated, and hence in the PV case don't need subtracting. Fixes: 87af633689ce ("x86/xen: fix balloon target initialization for PVH dom0") Reported-by: James Dingwall <james@dingwall.me.uk> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260128110510.46425-2-roger.pau@citrix.com>
2026-01-25Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Only one core change, the rest are drivers. The core change reorders some state operations in the error handler to try to prevent missed wake ups of the error handler (which can halt error processing and effectively freeze the entire system)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: qla2xxx: Sanitize payload size to prevent member overflow scsi: target: iscsi: Fix use-after-free in iscsit_dec_session_usage_count() scsi: target: iscsi: Fix use-after-free in iscsit_dec_conn_usage_count() scsi: core: Wake up the error handler when final completions race against each other scsi: storvsc: Process unsupported MODE_SENSE_10 scsi: xen: scsiback: Fix potential memory leak in scsiback_remove()
2026-01-12x86/paravirt: Use common code for paravirt_steal_clock()Juergen Gross
Remove the arch-specific variant of paravirt_steal_clock() and use the common one instead. With all archs supporting Xen now having been switched to the common variant, including paravirt.h can be dropped from drivers/xen/time.c. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-12-jgross@suse.com
2026-01-12arm/paravirt: Use common code for paravirt_steal_clock()Juergen Gross
Remove the arch-specific variant of paravirt_steal_clock() and use the common one instead. This allows to remove paravirt.c and paravirt.h from arch/arm. Until all archs supporting Xen have been switched to the common code of paravirt_steal_clock(), drivers/xen/time.c needs to include asm/paravirt.h for those archs, while this is not necessary for arm any longer. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-8-jgross@suse.com
2026-01-12sched: Move clock related paravirt code to kernel/schedJuergen Gross
Paravirt clock related functions are available in multiple archs. In order to share the common parts, move the common static keys to kernel/sched/ and remove them from the arch specific files. Make a common paravirt_steal_clock() implementation available in kernel/sched/cputime.c, guarding it with a new config option CONFIG_HAVE_PV_STEAL_CLOCK_GEN, which can be selected by an arch in case it wants to use that common variant. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-7-jgross@suse.com
2026-01-12xen/virtio: Don't use grant-dma-ops when running as Dom0Teddy Astie
Dom0 inherit devices from the machine and is usually in PV mode. If we are running in a virtual that has virtio devices, these devices would be considered as using grants with Dom0 as backend, while being the said Dom0 itself, while we want to use these devices like regular PCI devices. Fix this by preventing grant-dma-ops from being used when running as Dom0 (initial domain). We still keep the device-tree logic as-is. Signed-off-by: Teddy Astie <teddy.astie@vates.tech> Fixes: 61367688f1fb0 ("xen/virtio: enable grant based virtio on x86") Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <6698564dd2270a9f7377b78ebfb20cb425cabbe8.1767720955.git.teddy.astie@vates.tech>
2026-01-12xen: privcmd: WQ_PERCPU added to alloc_workqueue usersMarco Crivellari
Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq") commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20251106155831.306248-3-marco.crivellari@suse.com>
2026-01-12xen/events: replace use of system_wq with system_percpu_wqMarco Crivellari
Currently if a user enqueues a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq") commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") Switch to using system_percpu_wq because system_wq is going away as part of a workqueue restructuring. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20251106155831.306248-2-marco.crivellari@suse.com>
2026-01-11scsi: xen: scsiback: Fix potential memory leak in scsiback_remove()Abdun Nihaal
Memory allocated for struct vscsiblk_info in scsiback_probe() is not freed in scsiback_remove() leading to potential memory leaks on remove, as well as in the scsiback_probe() error paths. Fix that by freeing it in scsiback_remove(). Cc: stable@vger.kernel.org Fixes: d9d660f6e562 ("xen-scsiback: Add Xen PV SCSI backend driver") Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://patch.msgid.link/20251223063012.119035-1-nihaal@cse.iitm.ac.in Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-01-05ACPI: PCI: IRQ: Fix INTx GSIs signednessLorenzo Pieralisi
In ACPI Global System Interrupts (GSIs) are described using a 32-bit value. ACPI/PCI legacy interrupts (INTx) parsing code treats GSIs as 'int', which poses issues if the GSI interrupt value is a 32-bit value with the MSB set (as required in some interrupt configurations - eg ARM64 GICv5 systems) because acpi_pci_link_allocate_irq() treats a negative gsi return value as a failed GSI allocation (and acpi_irq_get_penalty() would trigger an out-of-bounds array dereference if the 'irq' param is a negative value). Fix ACPI/PCI legacy INTx parsing by converting variables representing GSIs from 'int' to 'u32' bringing the code in line with the ACPI specification and fixing the current parsing issue. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260105101705.36703-1-lpieralisi@kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-12-06Merge tag 'for-linus-6.19-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "This round it contains only three small cleanup patches" * tag 'for-linus-6.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: drivers/xen: use min() instead of min_t() drivers/xen/xenbus: Replace deprecated strcpy in xenbus_transaction_end drivers/xen/xenbus: Simplify return statement in join()
2025-12-06Merge tag 'dma-mapping-6.19-2025-12-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - More DMA mapping API refactoring to physical addresses as the primary interface instead of page+offset parameters. This time dma_map_ops callbacks are converted to physical addresses, what in turn results also in some simplification of architecture specific code (Leon Romanovsky and Jason Gunthorpe) - Clarify that dma_map_benchmark is not a kernel self-test, but standalone tool (Qinxin Xia) * tag 'dma-mapping-6.19-2025-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: remove unused map_page callback xen: swiotlb: Convert mapping routine to rely on physical address x86: Use physical address for DMA mapping sparc: Use physical address DMA mapping powerpc: Convert to physical address DMA mapping parisc: Convert DMA map_page to map_phys interface MIPS/jazzdma: Provide physical address directly alpha: Convert mapping routine to rely on physical address dma-mapping: remove unused mapping resource callbacks xen: swiotlb: Switch to physical address mapping callbacks ARM: dma-mapping: Switch to physical address mapping callbacks ARM: dma-mapping: Reduce struct page exposure in arch_sync_dma*() dma-mapping: convert dummy ops to physical address mapping dma-mapping: prepare dma_map_ops to conversion to physical address tools/dma: move dma_map_benchmark from selftests to tools/dma
2025-12-05Merge tag 'soc-drivers-6.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "This is the first half of the driver changes: - A treewide interface change to the "syscore" operations for power management, as a preparation for future Tegra specific changes - Reset controller updates with added drivers for LAN969x, eic770 and RZ/G3S SoCs - Protection of system controller registers on Renesas and Google SoCs, to prevent trivially triggering a system crash from e.g. debugfs access - soc_device identification updates on Nvidia, Exynos and Mediatek - debugfs support in the ST STM32 firewall driver - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI - Cleanups for memory controller support on Nvidia and Renesas" * tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (114 commits) memory: tegra186-emc: Fix missing put_bpmp Documentation: reset: Remove reset_controller_add_lookup() reset: fix BIT macro reference reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe reset: th1520: Support reset controllers in more subsystems reset: th1520: Prepare for supporting multiple controllers dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets reset: remove legacy reset lookup code clk: davinci: psc: drop unused reset lookup reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support reset: eswin: Add eic7700 reset driver dt-bindings: reset: eswin: Documentation for eic7700 SoC reset: sparx5: add LAN969x support dt-bindings: reset: microchip: Add LAN969x support soc: rockchip: grf: Add select correct PWM implementation on RK3368 soc/tegra: pmc: Add USB wake events for Tegra234 amba: tegra-ahb: Fix device leak on SMMU enable ...
2025-12-05Merge tag 'pull-persistency' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull persistent dentry infrastructure and conversion from Al Viro: "Some filesystems use a kinda-sorta controlled dentry refcount leak to pin dentries of created objects in dcache (and undo it when removing those). A reference is grabbed and not released, but it's not actually _stored_ anywhere. That works, but it's hard to follow and verify; among other things, we have no way to tell _which_ of the increments is intended to be an unpaired one. Worse, on removal we need to decide whether the reference had already been dropped, which can be non-trivial if that removal is on umount and we need to figure out if this dentry is pinned due to e.g. unlink() not done. Usually that is handled by using kill_litter_super() as ->kill_sb(), but there are open-coded special cases of the same (consider e.g. /proc/self). Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set claims responsibility for +1 in refcount. The end result this series is aiming for: - get these unbalanced dget() and dput() replaced with new primitives that would, in addition to adjusting refcount, set and clear persistency flag. - instead of having kill_litter_super() mess with removing the remaining "leaked" references (e.g. for all tmpfs files that hadn't been removed prior to umount), have the regular shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries, dropping the corresponding reference if it had been set. After that kill_litter_super() becomes an equivalent of kill_anon_super(). Doing that in a single step is not feasible - it would affect too many places in too many filesystems. It has to be split into a series. This work has really started early in 2024; quite a few preliminary pieces have already gone into mainline. This chunk is finally getting to the meat of that stuff - infrastructure and most of the conversions to it. Some pieces are still sitting in the local branches, but the bulk of that stuff is here" * tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) d_make_discardable(): warn if given a non-persistent dentry kill securityfs_recursive_remove() convert securityfs get rid of kill_litter_super() convert rust_binderfs convert nfsctl convert rpc_pipefs convert hypfs hypfs: swich hypfs_create_u64() to returning int hypfs: switch hypfs_create_str() to returning int hypfs: don't pin dentries twice convert gadgetfs gadgetfs: switch to simple_remove_by_name() convert functionfs functionfs: switch to simple_remove_by_name() functionfs: fix the open/removal races functionfs: need to cancel ->reset_work in ->kill_sb() functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}() functionfs: don't abuse ffs_data_closed() on fs shutdown convert selinuxfs ...
2025-12-05drivers/xen: use min() instead of min_t()David Laight
min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'. Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long' and so cannot discard significant bits. In this case the 'unsigned long' value is small enough that the result is ok. Detected by an extra check added to min_t(). Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20251119224140.8616-30-david.laight.linux@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-12-03Merge tag 'net-next-6.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Replace busylock at the Tx queuing layer with a lockless list. Resulting in a 300% (4x) improvement on heavy TX workloads, sending twice the number of packets per second, for half the cpu cycles. - Allow constantly busy flows to migrate to a more suitable CPU/NIC queue. Normally we perform queue re-selection when flow comes out of idle, but under extreme circumstances the flows may be constantly busy. Add sysctl to allow periodic rehashing even if it'd risk packet reordering. - Optimize the NAPI skb cache, make it larger, use it in more paths. - Attempt returning Tx skbs to the originating CPU (like we already did for Rx skbs). - Various data structure layout and prefetch optimizations from Eric. - Remove ktime_get() from the recvmsg() fast path, ktime_get() is sadly quite expensive on recent AMD machines. - Extend threaded NAPI polling to allow the kthread busy poll for packets. - Make MPTCP use Rx backlog processing. This lowers the lock pressure, improving the Rx performance. - Support memcg accounting of MPTCP socket memory. - Allow admin to opt sockets out of global protocol memory accounting (using a sysctl or BPF-based policy). The global limits are a poor fit for modern container workloads, where limits are imposed using cgroups. - Improve heuristics for when to kick off AF_UNIX garbage collection. - Allow users to control TCP SACK compression, and default to 33% of RTT. - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid unnecessarily aggressive rcvbuf growth and overshot when the connection RTT is low. - Preserve skb metadata space across skb_push / skb_pull operations. - Support for IPIP encapsulation in the nftables flowtable offload. - Support appending IP interface information to ICMP messages (RFC 5837). - Support setting max record size in TLS (RFC 8449). - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL. - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock. - Let users configure the number of write buffers in SMC. - Add new struct sockaddr_unsized for sockaddr of unknown length, from Kees. - Some conversions away from the crypto_ahash API, from Eric Biggers. - Some preparations for slimming down struct page. - YAML Netlink protocol spec for WireGuard. - Add a tool on top of YAML Netlink specs/lib for reporting commonly computed derived statistics and summarized system state. Driver API: - Add CAN XL support to the CAN Netlink interface. - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as defined by the OPEN Alliance's "Advanced diagnostic features for 100BASE-T1 automotive Ethernet PHYs" specification. - Add DPLL phase-adjust-gran pin attribute (and implement it in zl3073x). - Refactor xfrm_input lock to reduce contention when NIC offloads IPsec and performs RSS. - Add info to devlink params whether the current setting is the default or a user override. Allow resetting back to default. - Add standard device stats for PSP crypto offload. - Leverage DSA frame broadcast to implement simple HSR frame duplication for a lot of switches without dedicated HSR offload. - Add uAPI defines for 1.6Tbps link modes. Device drivers: - Add Motorcomm YT921x gigabit Ethernet switch support. - Add MUCSE driver for N500/N210 1GbE NIC series. - Convert drivers to support dedicated ops for timestamping control, and away from the direct IOCTL handling. While at it support GET operations for PHY timestamping. - Add (and convert most drivers to) a dedicated ethtool callback for reading the Rx ring count. - Significant refactoring efforts in the STMMAC driver, which supports Synopsys turn-key MAC IP integrated into a ton of SoCs. - Ethernet high-speed NICs: - Broadcom (bnxt): - support PPS in/out on all pins - Intel (100G, ice, idpf): - ice: implement standard ethtool and timestamping stats - i40e: support setting the max number of MAC addresses per VF - iavf: support RSS of GTP tunnels for 5G and LTE deployments - nVidia/Mellanox (mlx5): - reduce downtime on interface reconfiguration - disable being an XDP redirect target by default (same as other drivers) to avoid wasting resources if feature is unused - Meta (fbnic): - add support for Linux-managed PCS on 25G, 50G, and 100G links - Wangxun: - support Rx descriptor merge, and Tx head writeback - support Rx coalescing offload - support 25G SPF and 40G QSFP modules - Ethernet virtual: - Google (gve): - allow ethtool to configure rx_buf_len - implement XDP HW RX Timestamping support for DQ descriptor format - Microsoft vNIC (mana): - support HW link state events - handle hardware recovery events when probing the device - Ethernet NICs consumer, and embedded: - usbnet: add support for Byte Queue Limits (BQL) - AMD (amd-xgbe): - add device selftests - NXP (enetc): - add i.MX94 support - Broadcom integrated MACs (bcmgenet, bcmasp): - bcmasp: add support for PHY-based Wake-on-LAN - Broadcom switches (b53): - support port isolation - support BCM5389/97/98 and BCM63XX ARL formats - Lantiq/MaxLinear switches: - support bridge FDB entries on the CPU port - use regmap for register access - allow user to enable/disable learning - support Energy Efficient Ethernet - support configuring RMII clock delays - add tagging driver for MaxLinear GSW1xx switches - Synopsys (stmmac): - support using the HW clock in free running mode - add Eswin EIC7700 support - add Rockchip RK3506 support - add Altera Agilex5 support - Cadence (macb): - cleanup and consolidate descriptor and DMA address handling - add EyeQ5 support - TI: - icssg-prueth: support AF_XDP - Airoha access points: - add missing Ethernet stats and link state callback - add AN7583 support - support out-of-order Tx completion processing - Power over Ethernet: - pd692x0: preserve PSE configuration across reboots - add support for TPS23881B devices - Ethernet PHYs: - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support - Support 50G SerDes and 100G interfaces in Linux-managed PHYs - micrel: - support for non PTP SKUs of lan8814 - enable in-band auto-negotiation on lan8814 - realtek: - cable testing support on RTL8224 - interrupt support on RTL8221B - motorcomm: support for PHY LEDs on YT853 - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag - mscc: support for PHY LED control - CAN drivers: - m_can: add support for optional reset and system wake up - remove can_change_mtu() obsoleted by core handling - mcp251xfd: support GPIO controller functionality - Bluetooth: - add initial support for PASTa - WiFi: - split ieee80211.h file, it's way too big - improvements in VHT radiotap reporting, S1G, Channel Switch Announcement handling, rate tracking in mesh networks - improve multi-radio monitor mode support, and add a cfg80211 debugfs interface for it - HT action frame handling on 6 GHz - initial chanctx work towards NAN - MU-MIMO sniffer improvements - WiFi drivers: - RealTek (rtw89): - support USB devices RTL8852AU and RTL8852CU - initial work for RTL8922DE - improved injection support - Intel: - iwlwifi: new sniffer API support - MediaTek (mt76): - WED support for >32-bit DMA - airoha NPU support - regdomain improvements - continued WiFi7/MLO work - Qualcomm/Atheros: - ath10k: factory test support - ath11k: TX power insertion support - ath12k: BSS color change support - ath12k: statistics improvements - brcmfmac: Acer A1 840 tablet quirk - rtl8xxxu: 40 MHz connection fixes/support" * tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits) net: page_pool: sanitise allocation order net: page pool: xa init with destroy on pp init net/mlx5e: Support XDP target xmit with dummy program net/mlx5e: Update XDP features in switch channels selftests/tc-testing: Test CAKE scheduler when enqueue drops packets net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop wireguard: netlink: generate netlink code wireguard: uapi: generate header with ynl-gen wireguard: uapi: move flag enums wireguard: uapi: move enum wg_cmd wireguard: netlink: add YNL specification selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py selftests: drv-net: introduce Iperf3Runner for measurement use cases selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive() Documentation: net: dsa: mention simple HSR offload helpers Documentation: net: dsa: mention availability of RedBox ...
2025-11-17drivers/xen/xenbus: Replace deprecated strcpy in xenbus_transaction_endThorsten Blum
strcpy() is deprecated; inline the read-only string instead. Fix the function comment and use bool instead of int while we're at it. Link: https://github.com/KSPP/linux/issues/88 Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20251031112145.103257-2-thorsten.blum@linux.dev>
2025-11-17drivers/xen/xenbus: Simplify return statement in join()Thorsten Blum
Don't unnecessarily negate 'buffer' and simplify the return statement. Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20251112171410.3140-2-thorsten.blum@linux.dev>
2025-11-16convert xenfsAl Viro
entirely static tree, populated by simple_fill_super(). Can switch to kill_anon_super() without any other changes. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-14syscore: Pass context data to callbacksThierry Reding
Several drivers can benefit from registering per-instance data along with the syscore operations. To achieve this, move the modifiable fields out of the syscore_ops structure and into a separate struct syscore that can be registered with the framework. Add a void * driver data field for drivers to store contextual data that will be passed to the syscore ops. Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-13drivers/xen/xenbus: Fix namespace collision and split() section placement ↵Josh Poimboeuf
with AutoFDO When compiling the kernel with -ffunction-sections enabled, the split() function gets compiled into the .text.split section. In some cases it can even be cloned into .text.split.constprop.0 or .text.split.isra.0. However, .text.split.* is already reserved for use by the Clang -fsplit-machine-functions flag, which is used by AutoFDO. That may place part of a function's code in a .text.split.<func> section. This naming conflict causes the vmlinux linker script to wrongly place split() with other .text.split.* code, rather than where it belongs with regular text. Fix it by renaming split() to split_strings(). Fixes: 6568f14cb5ae ("vmlinux.lds: Exclude .text.startup and .text.exit from TEXT_MAIN") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: live-patching@vger.kernel.org Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://patch.msgid.link/92a194234a0f757765e275b288bb1a7236c2c35c.1762991150.git.jpoimboe@kernel.org
2025-11-04net: Convert proto_ops connect() callbacks to use sockaddr_unsizedKees Cook
Update all struct proto_ops connect() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: Convert proto_ops bind() callbacks to use sockaddr_unsizedKees Cook
Update all struct proto_ops bind() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>