| Age | Commit message (Collapse) | Author |
|
According to the PLIC specification[1], global interrupt sources are
assigned small unsigned integer identifiers beginning at the value 1.
An interrupt ID of 0 is reserved to mean "no interrupt".
The current plic_irq_resume() and plic_irq_suspend() functions incorrectly
start the loop from index 0, which accesses the register space for the
reserved interrupt ID 0.
Change the loop to start from index 1, skipping the reserved
interrupt ID 0 as per the PLIC specification.
This prevents potential undefined behavior when accessing the reserved
register space during suspend/resume cycles.
Fixes: e80f0b6a2cf3 ("irqchip/irq-sifive-plic: Add syscore callbacks for hibernation")
Co-developed-by: Jia Wang <wangjia@ultrarisc.com>
Signed-off-by: Jia Wang <wangjia@ultrarisc.com>
Co-developed-by: Charles Mirabile <cmirabil@redhat.com>
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Signed-off-by: Lucas Zampieri <lzampier@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://github.com/riscv/riscv-plic-spec/releases/tag/1.0.0
|
|
of_iomap() doesn't return error pointers, it returns NULL. Fix the error
checking to check for NULL pointers.
Fixes: 86cd4301c285 ("irqchip/aspeed-scu-ic: Refactor driver to support variant-based initialization")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Paul Walmsley:
- Support for the RISC-V-standardized RPMI interface.
RPMI is a platform management communication mechanism between OSes
running on application processors, and a remote platform management
processor. Similar to ARM SCMI, TI SCI, etc. This includes irqchip,
mailbox, and clk changes.
- Support for the RISC-V-standardized MPXY SBI extension.
MPXY is a RISC-V-specific standard implementing a shared memory
mailbox between S-mode operating systems (e.g., Linux) and M-mode
firmware (e.g., OpenSBI). It is part of this PR since one of its use
cases is to enable M-mode firmware to act as a single RPMI client for
all RPMI activity on a core (including S-mode RPMI activity).
Includes a mailbox driver.
- Some ACPI-related updates to enable the use of RPMI and MPXY.
- The addition of Linux-wide memcpy_{from,to}_le32() static inline
functions, for RPMI use.
- An ACPI Kconfig change to enable boot logos on any ACPI-using
architecture (including RISC-V)
- A RISC-V defconfig change to add GPIO keyboard and event device
support, for front panel shutdown or reboot buttons
* tag 'riscv-for-linus-6.18-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (26 commits)
clk: COMMON_CLK_RPMI should depend on RISCV
ACPI: support BGRT table on RISC-V
MAINTAINERS: Add entry for RISC-V RPMI and MPXY drivers
RISC-V: Enable GPIO keyboard and event device in RV64 defconfig
irqchip/riscv-rpmi-sysmsi: Add ACPI support
mailbox/riscv-sbi-mpxy: Add ACPI support
irqchip/irq-riscv-imsic-early: Export imsic_acpi_get_fwnode()
ACPI: RISC-V: Add RPMI System MSI to GSI mapping
ACPI: RISC-V: Add support to update gsi range
ACPI: RISC-V: Create interrupt controller list in sorted order
ACPI: scan: Update honor list for RPMI System MSI
ACPI: Add support for nargs_prop in acpi_fwnode_get_reference_args()
ACPI: property: Refactor acpi_fwnode_get_reference_args() to support nargs_prop
irqchip: Add driver for the RPMI system MSI service group
dt-bindings: Add RPMI system MSI interrupt controller bindings
dt-bindings: Add RPMI system MSI message proxy bindings
clk: Add clock driver for the RISC-V RPMI clock service group
dt-bindings: clock: Add RPMI clock service controller bindings
dt-bindings: clock: Add RPMI clock service message proxy bindings
mailbox: Add RISC-V SBI message proxy (MPXY) based mailbox driver
...
|
|
Pull kvm updates from Paolo Bonzini:
"This excludes the bulk of the x86 changes, which I will send
separately. They have two not complex but relatively unusual conflicts
so I will wait for other dust to settle.
guest_memfd:
- Add support for host userspace mapping of guest_memfd-backed memory
for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE
(which isn't precisely the same thing as CoCo VMs, since x86's
SEV-MEM and SEV-ES have no way to detect private vs. shared).
This lays the groundwork for removal of guest memory from the
kernel direct map, as well as for limited mmap() for
guest_memfd-backed memory.
For more information see:
- commit a6ad54137af9 ("Merge branch 'guest-memfd-mmap' into HEAD")
- guest_memfd in Firecracker:
https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
- direct map removal:
https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
- mmap support:
https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
ARM:
- Add support for FF-A 1.2 as the secure memory conduit for pKVM,
allowing more registers to be used as part of the message payload.
- Change the way pKVM allocates its VM handles, making sure that the
privileged hypervisor is never tricked into using uninitialised
data.
- Speed up MMIO range registration by avoiding unnecessary RCU
synchronisation, which results in VMs starting much quicker.
- Add the dump of the instruction stream when panic-ing in the EL2
payload, just like the rest of the kernel has always done. This
will hopefully help debugging non-VHE setups.
- Add 52bit PA support to the stage-1 page-table walker, and make use
of it to populate the fault level reported to the guest on failing
to translate a stage-1 walk.
- Add NV support to the GICv3-on-GICv5 emulation code, ensuring
feature parity for guests, irrespective of the host platform.
- Fix some really ugly architecture problems when dealing with debug
in a nested VM. This has some bad performance impacts, but is at
least correct.
- Add enough infrastructure to be able to disable EL2 features and
give effective values to the EL2 control registers. This then
allows a bunch of features to be turned off, which helps cross-host
migration.
- Large rework of the selftest infrastructure to allow most tests to
transparently run at EL2. This is the first step towards enabling
NV testing.
- Various fixes and improvements all over the map, including one BE
fix, just in time for the removal of the feature.
LoongArch:
- Detect page table walk feature on new hardware
- Add sign extension with kernel MMIO/IOCSR emulation
- Improve in-kernel IPI emulation
- Improve in-kernel PCH-PIC emulation
- Move kvm_iocsr tracepoint out of generic code
RISC-V:
- Added SBI FWFT extension for Guest/VM with misaligned delegation
and pointer masking PMLEN features
- Added ONE_REG interface for SBI FWFT extension
- Added Zicbop and bfloat16 extensions for Guest/VM
- Enabled more common KVM selftests for RISC-V
- Added SBI v3.0 PMU enhancements in KVM and perf driver
s390:
- Improve interrupt cpu for wakeup, in particular the heuristic to
decide which vCPU to deliver a floating interrupt to.
- Clear the PTE when discarding a swapped page because of CMMA; this
bug was introduced in 6.16 when refactoring gmap code.
x86 selftests:
- Add #DE coverage in the fastops test (the only exception that's
guest- triggerable in fastop-emulated instructions).
- Fix PMU selftests errors encountered on Granite Rapids (GNR),
Sierra Forest (SRF) and Clearwater Forest (CWF).
- Minor cleanups and improvements
x86 (guest side):
- For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
overriding guest MTRR for TDX/SNP to fix an issue where ACPI
auto-mapping could map devices as WB and prevent the device drivers
from mapping their devices with UC/UC-.
- Make kvm_async_pf_task_wake() a local static helper and remove its
export.
- Use native qspinlocks when running in a VM with dedicated
vCPU=>pCPU bindings even when PV_UNHALT is unsupported.
Generic:
- Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as
__GFP_NOWARN is now included in GFP_NOWAIT.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits)
KVM: s390: Fix to clear PTE when discarding a swapped page
KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs
KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs
KVM: arm64: selftests: Cope with arch silliness in EL2 selftest
KVM: arm64: selftests: Add basic test for running in VHE EL2
KVM: arm64: selftests: Enable EL2 by default
KVM: arm64: selftests: Initialize HCR_EL2
KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters
KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2
KVM: arm64: selftests: Select SMCCC conduit based on current EL
KVM: arm64: selftests: Provide helper for getting default vCPU target
KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts
KVM: arm64: selftests: Create a VGICv3 for 'default' VMs
KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation
KVM: arm64: selftests: Add helper to check for VGICv3 support
KVM: arm64: selftests: Initialize VGICv3 only once
KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code
KVM: selftests: Add ex_str() to print human friendly name of exception vectors
selftests/kvm: remove stale TODO in xapic_state_test
KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DT core:
- Update dtc to upstream version v1.7.2-35-g52f07dcca47c
- Add stub for of_get_next_child_with_prefix()
- Convert of_msi_map_id() callers to of_msi_xlate()
DT bindings:
- Convert multiple text board bindings to DT schema format
- Add bindings for synaptics,synaptics_i2c touchscreen controller,
innolux,n133hse-ea1 and nlt,nl12880bc20-spwg-24 displays, and NXP
vf610 reboot controller
- Add new Arm Cortex-A320/A520AE/A720AE and C1-Nano/Pro/Premium/Ultra
CPUs. Add missing Applied Micro CPU compatibles. Add pu-supply and
fsl,soc-operating-points properties for CPU nodes.
- Add QCom Glymur PDC and tegra264-agic interrupt controllers
- Add samsung,exynos8890-mali GPU to Arm Mali Midgard
- Drop Samsung S3C2410 display related bindings
- Allow separate DP lane and AUX connections in dp-connector
- Add some missing, undocumented vendor prefixes
- Add missing '#address-cells' properties in interrupt controller
bindings which dtc now warns about
- Drop duplicate socfpga-sdram-edac.txt, moxa,moxart-watchdog.txt,
fsl/mpic.txt, ti,opa362.txt, and cavium-thunder2.txt legacy text
bindings which are already covered by existing schemas.
- Various binding fixes for Mediatek platforms in mailbox, regulator,
pinctrl, timer, and display
- Drop work-around for yamllint quoting of values containing ','
- Various spelling, typo, grammar, and duplicated words fixes in DT
bindings and docs
- Add binding guidelines for defining properties at top level of
schemas, lack of node name ABI, and usage of simple-mfd"
* tag 'devicetree-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (81 commits)
dt-bindings: arm: altera: Drop socfpga-sdram-edac.txt
dt-bindings: gpu: Convert nvidia,gk20a to DT schema
dt-bindings: rng: sparc_sun_oracle_rng: convert to DT schema
dt-bindings: vendor-prefixes: update regex for properties without a prefix
dt-bindings: display: bridge: convert megachips-stdpxxxx-ge-b850v3-fw.txt to yaml
scripts: dt_to_config: fix grammar and a typo in --help text
dt-bindings: fix spelling, typos, grammar, duplicated words
docs: dt: fix grammar and spelling
of: base: Add of_get_next_child_with_prefix() stub
dt-bindings: trivial-devices: Add compatible string synaptics,synaptics_i2c
dt-bindings: soc: mediatek: pwrap: Add power-domains property
dt-bindings: pinctrl: mt65xx: Allow gpio-line-names
dt-bindings: media: Convert MediaTek mt8173-vpu bindings to DT schema
dt-bindings: arm: mediatek: Support mt8183-audiosys variant
dt-bindings: mailbox: mediatek,gce-mailbox: Make clock-names optional
dt-bindings: regulator: mediatek,mt6331: Add missing compatible
dt-bindings: regulator: mediatek,mt6331: Fix various regulator names
dt-bindings: regulator: mediatek,mt6332-regulator: Add missing compatible
dt-bindings: pinctrl: mediatek,mt7622-pinctrl: Add missing base reg
dt-bindings: pinctrl: mediatek,mt7622-pinctrl: Add missing pwm_ch7_2
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq chip driver updates from Thomas Gleixner:
- Use the startup/shutdown callbacks for the PCI/MSI per device
interrupt domains.
This allows us to initialize the RISCV PLIC interrupt hierarchy
correctly and provides a mechanism to decouple the masking and
unmasking during run-time from the expensive PCI mask and unmask when
the underlying MSI provider implementation allows the interrupt to be
masked.
- Initialize the RISCV PLIC MSI interrupt hierarchy correctly so that
the affinity assignment works correctly by switching it over to the
startup/shutdown scheme
- Allow MSI providers to opt out from masking a PCI/MSI interrupt at
the PCI device during operation when the provider can mask the
interrupt at the underlying interrupt chip. This reduces the overhead
in scenarios where disable_irq()/enable_irq() is utilized frequently
by a driver.
The PCI/MSI device level [un]masking is only required on startup and
shutdown in this case.
- Remove the conditional mask/unmask logic in the PCI/MSI layer as this
is now handled unconditionally.
- Replace the hardcoded interrupt routing in the Loongson EIOINTC
interrupt driver to respect the firmware settings and spread them out
to different CPU interrupt inputs so that the demultiplexing handler
only needs to read only a single 64-bit status register instead of
four, which significantly reduces the overhead in VMs as the status
register access causes a VM exit.
- Add support for the new AST2700 SCU interrupt controllers
- Use the legacy interrupt domain setup for the Loongson PCH-LPC
interrupt controller, which resembles the x86 legacy PIC setup and
has the same hardcoded legacy requirements.
- The usual set of cleanups, fixes and improvements all over the place
* tag 'irq-drivers-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
irqchip/loongson-pch-lpc: Use legacy domain for PCH-LPC IRQ controller
PCI/MSI: Remove the conditional parent [un]mask logic
irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
irqchip/aspeed-scu-ic: Add support for AST2700 SCU interrupt controllers
dt-bindings: interrupt-controller: aspeed: Add AST2700 SCU IC compatibles
dt-bindings: mfd: aspeed: Add AST2700 SCU compatibles
irqchip/aspeed-scu-ic: Refactor driver to support variant-based initialization
irqchip/gic-v5: Fix error handling in gicv5_its_irq_domain_alloc()
irqchip/gic-v5: Fix loop in gicv5_its_create_itt_two_level() cleanup path
irqchip/gic-v5: Delete a stray tab
irqchip/sg2042-msi: Set irq type according to DT configuration
riscv: sophgo: dts: sg2044: Change msi irq type to IRQ_TYPE_EDGE_RISING
riscv: sophgo: dts: sg2042: Change msi irq type to IRQ_TYPE_EDGE_RISING
irqchip/gic-v2m: Handle Multiple MSI base IRQ Alignment
irqchip/renesas-rzg2l: Remove dev_err_probe() if error is -ENOMEM
irqchip: Use int type to store negative error codes
irqchip/gic-v5: Remove the redundant ITS cache invalidation
PCI/MSI: Check MSI_FLAG_PCI_MSI_MASK_PARENT in cond_[startup|shutdown]_parent()
irqchip/loongson-eiointc: Add multiple interrupt pin routing support
irqchip/loongson-eiointc: Route interrupt parsed from bios table
...
|
|
Add ACPI support for the RISC-V RPMI system MSI based irqchip driver.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Acked-by: Jassi Brar <jassisinghbrar@gmail.com>
Link: https://lore.kernel.org/r/20250818040920.272664-23-apatel@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
ACPI based loadable drivers which need MSIs will also need
imsic_acpi_get_fwnode() to update the device MSI domain so
export this function.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20250818040920.272664-21-apatel@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
The RPMI specification defines a system MSI service group which
allows application processors to receive MSIs upon system events
such as graceful shutdown/reboot request, CPU hotplug event, memory
hotplug event, etc.
Add an irqchip driver for the RISC-V RPMI system MSI service group
to directly receive system MSIs in Linux kernel.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20250818040920.272664-14-apatel@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
The presence of FEAT_GCIE_LEGACY is now handled as a CPU
feature. Therefore, drop the check and flag from the GIC driver and
gic_kvm_info as it is no longer required or used by KVM.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
On certain Loongson platforms, drivers attempting to request a legacy
ISA IRQ directly via request_irq() (e.g., IRQ 4) may fail. The
virtual IRQ descriptor is not fully initialized and lacks a valid irqchip.
This issue does not affect ACPI-enumerated devices described in DSDT,
as their interrupts are properly mapped via the GSI translation path.
This indicates the LPC irqdomain itself is functional but is not correctly
handling direct VIRQ-to-HWIRQ mappings.
The root cause is the use of irq_domain_create_linear(). This API sets
up a domain for dynamic, on-demand mapping, typically triggered by a GSI
request. It does not pre-populate the mappings for the legacy VIRQ range
(0-15). Consequently, if no ACPI device claims a specific GSI
(e.g., GSI 4), the corresponding VIRQ (e.g., VIRQ 4) is never mapped to
the LPC domain. A direct call to request_irq(4, ...) then fails because
the kernel cannot resolve this VIRQ to a hardware interrupt managed by
the LPC controller.
The PCH-LPC interrupt controller is an i8259-compatible legacy device
that requires a deterministic, static 1-to-1 mapping for IRQs 0-15 to
support legacy drivers.
Fix this by replacing irq_domain_create_linear() with
irq_domain_create_legacy(). This API is specifically designed for such
controllers. It establishes the required static 1-to-1 VIRQ-to-HWIRQ
mapping for the entire legacy range (0-15) immediately upon domain
creation. This ensures that any VIRQ in this range is always resolvable,
making direct calls to request_irq() for legacy IRQs function correctly.
Signed-off-by: Ming Wang <wangming01@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
For systems that implement interrupt masking at the interrupt controller
level, the MSI library offers MSI_FLAG_PCI_MSI_MASK_PARENT. It indicates
that it isn't enough to only unmask the interrupt at the PCI device level,
but that the interrupt controller must also be involved.
However, the way this is currently done is less than optimal, as the
masking/unmasking is done on both sides, always. It would be far cheaper to
unmask both at the start of times, and then only deal with the interrupt
controller mask, which is cheaper than a round-trip to the PCI endpoint.
Now that the PCI/MSI layer implements irq_startup() and irq_shutdown()
callbacks, which [un]mask at the PCI level and honor the request to
[un]mask the parent, this can be trivially done.
Overwrite the irq_mask/unmask() callbacks of the device domain interrupt
chip with irq_[un]mask_parent() when the parent domain asks for it.
[ tglx: Adopted to the PCI/MSI changes ]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250903135433.380783272@linutronix.de
|
|
AST2700 continues the multi-instance SCU interrupt controller model
introduced in the AST2600, with four independent interrupt domains (scu-ic0
to 3).
Unlike earlier generations which combine interrupt enable and status bits
into a single register, AST2700 separates these into distinct IER and ISR
registers. Support for this layout is implemented by using register offsets
and separate chained IRQ handlers.
The variant table is extended to cover AST2700 IC instances, enabling
shared initialization logic while preserving support for previous SoCs.
[ tglx: Simplified the logic and cleaned up coding style ]
Signed-off-by: Ryan Chen <ryan_chen@aspeedtech.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250908011812.1033858-5-ryan_chen@aspeedtech.com
|
|
The SCU IC driver handles each AST2600 instance with separate
initialization functions and hardcoded register definitions, which is
inflexible and creates duplicated code.
Consolidate the implementation by introducing a variant-based structure,
selected via compatible string, and use a unified init path and MMIO access
via of_iomap(). This simplifies the code and prepares for upcoming SoCs
like AST2700, which require split register handling.
[ tglx: Cleaned up coding style and massaged change log ]
Signed-off-by: Ryan Chen <ryan_chen@aspeedtech.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250908011812.1033858-2-ryan_chen@aspeedtech.com
|
|
Code in gicv5_its_irq_domain_alloc() has two issues:
- it checks the wrong return value/variable when calling gicv5_alloc_lpi()
- The cleanup code does not take previous loop iterations into account
Fix both issues at once by adding the right gicv5_alloc_lpi() variable
check and by reworking the function cleanup code to take into account
current and previous iterations.
[ lpieralisi: Reworded commit message ]
Fixes: 57d72196dfc8 ("irqchip/gic-v5: Add GICv5 ITS support")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/all/20250908082745.113718-4-lpieralisi@kernel.org
|
|
The "i" variable in gicv5_its_create_itt_two_level() needs to be signed
otherwise it can cause a forever loop in the function's cleanup path.
[ lpieralisi: Reworded commit message ]
Fixes: 57d72196dfc8 ("irqchip/gic-v5: Add GICv5 ITS support")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/all/20250908082745.113718-3-lpieralisi@kernel.org
|
|
Delete a stray tab that is indenting the code erroneously.
[ lpieralisi: Reworded commit message]
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/all/20250908082745.113718-2-lpieralisi@kernel.org
|
|
Read the device tree configuration and use it to set the interrupt type.
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Inochi Amaoto <inochiama@gmail.com> # Sophgo SRD3-10
Link: https://lore.kernel.org/all/b22d2b0a00a96161253435d17b3c66538f3ba1c2.1756953919.git.unicorn_wang@outlook.com
|
|
The PCI Local Bus Specification 3.0 (section 6.8.1.6) allows modifying the
low-order bits of the MSI Message DATA register to encode nr_irqs interrupt
numbers in the log2(nr_irqs) bits for the domain.
The problem arises if the base vector (GICV2m base spi) is not aligned with
nr_irqs; in this case, the low-order log2(nr_irqs) bits from the base
vector conflict with the nr_irqs masking, causing the wrong MSI interrupt
to be identified.
To fix this, use bitmap_find_next_zero_area_off() instead of
bitmap_find_free_region() to align the initial base vector with nr_irqs.
Signed-off-by: Christian Bruel <christian.bruel@foss.st.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250902091045.220847-1-christian.bruel@foss.st.com
|
|
With the introduction of the of_msi_xlate() function, the OF layer
provides an API to map a device ID and retrieve the MSI controller
node the ID is mapped to with a single call.
of_msi_map_id() is currently used to map a deviceID to a specific
MSI controller node; of_msi_xlate() can be used for that purpose
too, there is no need to keep the two functions.
Convert of_msi_map_id() to of_msi_xlate() calls and update the
of_msi_xlate() documentation to describe how the struct device_node
pointer passed in should be set-up to either provide the MSI controller
node target or receive its pointer upon mapping completion.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20250805133443.936955-1-lpieralisi@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
The dev_err_probe() doesn't do anything when error is '-ENOMEM'.
Therefore, remove the useless call to dev_err_probe(), and just return the
value instead.
Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250821093845.564496-1-zhao.xichao@vivo.com
|
|
Change the 'ret' variable from unsigned int to int to store negative error
codes or zero returned by other functions.
Storing the negative error codes in unsigned type, doesn't cause an issue
at runtime but assigning negative error codes to unsigned type may trigger
a compiler warning when the -Wsign-conversion flag is enabled.
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250829132020.82077-1-rongqianfeng@vivo.com
|
|
An ITS cache invalidation has been performed immediately after programming
the L2 DTE in gicv5_its_device_register(). No need to perform it again
right after a successful gicv5_its_device_register().
Remove it.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250903023319.1820-1-yuzenghui@huawei.com
|
|
In gicv5_irs_of_init_affinity() a WARN_ON() is triggered if:
1) a phandle in the "cpus" property does not correspond to a valid OF
node
2 a CPU logical id does not exist for a given OF cpu_node
#1 is a firmware bug and should be reported as such but does not warrant a
WARN_ON() backtrace.
#2 is not necessarily an error condition (eg a kernel can be booted with
nr_cpus=X limiting the number of cores artificially) and therefore there
is no reason to clutter the kernel log with WARN_ON() output when the
condition is hit.
Rework the IRS affinity parsing code to remove undue WARN_ON()s thus
making it less noisy.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250814094138.1611017-1-lpieralisi@kernel.org
|
|
The eiointc interrupt controller supports 256 interrupt vectors at most,
and the interrupt handler gets the interrupt status from the base register
group EIOINTC_REG_ISR at the interrupt specific offset.
It needs to read the register group EIOINTC_REG_ISR four times to get all
256 interrupt vectors status.
Eiointc registers including EIOINTC_REG_ISR are software emulated for
VMs, so there will be VM-exits when accessing eiointc registers.
Introduce a method to make the eiointc interrupt controller route
to different CPU interrupt pins for every 64 interrupt vectors.
The interrupt handler can then reduce the read to one specific
EIOINTC_REG_ISR register instead of all four, which reduces VM exits.
[ tglx: Massage change log ]
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250804081946.1456573-3-maobibo@loongson.cn
|
|
Interrupt controller eiointc routes interrupts to CPU interface IP0 - IP7.
It is currently hard-coded that eiointc routes interrupts to the CPU
starting from IP1, but it should base that decision on the parent
interrupt, which is provided by ACPI or DTS.
Retrieve the parent's hardware interrupt number and store it in the
descriptor of the eointc instance, so that the routing function can utilize
it for the correct route settings.
[ tglx: Massaged change log ]
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250804081946.1456573-2-maobibo@loongson.cn
|
|
plic_set_affinity() always calls plic_irq_enable(), which clears up the
priority setting even the interrupt is only masked. This unmasks the
interrupt unexpectly.
Replace the plic_irq_enable/disable() with plic_irq_toggle() to avoid
changing the priority setting.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nam Cao <namcao@linutronix.de> # VisionFive 2
Tested-by: Chen Wang <unicorn_wang@outlook.com> # Pioneerbox
Reviewed-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Chen Wang <unicorn_wang@outlook.com>
Link: https://lore.kernel.org/all/20250811002633.55275-1-inochiama@gmail.com
Link: https://lore.kernel.org/lkml/20250722224513.22125-1-inochiama@gmail.com/
|
|
L2 IST table entries are allocated with the kmalloc interface and their
physical addresses are programmed in the GIC (either IST base address
register or L1 IST table entries) but their virtual addresses are not
stored in any kernel data structure because they are not needed at runtime
- the L2 IST table entries are managed through system instructions but
never dereferenced directly by the driver.
This triggers kmemleak false positive reports:
unreferenced object 0xffff00080039a000 (size 4096):
comm "swapper/0", pid 0, jiffies 4294892296
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 0):
kmemleak_alloc+0x34/0x40
__kmalloc_noprof+0x320/0x464
gicv5_irs_iste_alloc+0x1a4/0x484
gicv5_irq_lpi_domain_alloc+0xe4/0x194
irq_domain_alloc_irqs_parent+0x78/0xd8
gicv5_irq_ipi_domain_alloc+0x180/0x238
irq_domain_alloc_irqs_locked+0x238/0x7d4
__irq_domain_alloc_irqs+0x88/0x114
gicv5_of_init+0x284/0x37c
of_irq_init+0x3b8/0xb18
irqchip_init+0x18/0x40
init_IRQ+0x104/0x164
start_kernel+0x1a4/0x3d4
__primary_switched+0x8c/0x94
Instruct kmemleak to ignore L2 IST table memory allocation virtual
addresses to prevent these false positive reports.
Reported-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250811135001.1333684-1-lpieralisi@kernel.org
Closes: https://lore.kernel.org/lkml/cc611dda-d1e4-4793-9bb2-0eaa47277584@huawei.com/
|
|
Replace the open coded for_each_cpu(cpu, cpu_present_mask) loop with the
more readable and equivalent for_each_present_cpu(cpu) macro.
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250811064701.2906-1-wangfushuai@baidu.com
|
|
ioremap() never returns error pointers, it returns NULL on error. Fix the
check to match.
Fixes: 3c3d7dbab2c7 ("irqchip/mvebu-gicp: Clear pending interrupts on init")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aKRGcgMeaXm2TMIC@stanley.mountain
|
|
Commit b00bee8afaca ("irqchip: Convert generic irqchip locking to guards")
replaced calls to irq_gc_lock_irq{save,restore}() with
guard(raw_spinlock_irq).
However, in irq-atmel-aic5.c and irq-atmel-aic.c, the xlate callback is
used in the early boot process, before interrupts are initially enabled.
As its destructor enables interrupts, this triggers the warning in
start_kernel():
WARNING: CPU: 0 PID: 0 at init/main.c:1024 start_kernel+0x4d0/0x5dc
Interrupts were enabled early
Fix this by using guard(raw_spinlock_irqsave) instead.
[ tglx: Folded the equivivalent fix for atmel-aic ]
Fixes: b00bee8afaca ("irqchip: Convert generic irqchip locking to guards")
Signed-off-by: Edgar Bonet <bonet@grenoble.cnrs.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/280dd506-e1fc-4d2e-bdc4-98dd9dca6138@grenoble.cnrs.fr
|
|
The MSI controller on SG2044 has the ability to allocate multiple PCI MSI
interrupts. So the PCIe controller driver can use this feature if the
hardware supports multiple PCI MSI interrupts.
Add the MSI_FLAG_MULTI_PCI_MSI flag to the supported_flags of SG2044
msi_parent_ops to enable this functionality.
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen Wang <unicorn_wang@outlook.com> # Pioneerbox
Reviewed-by: Chen Wang <unicorn_wang@outlook.com>
Link: https://lore.kernel.org/all/20250813232835.43458-5-inochiama@gmail.com
|
|
When using NVME on SG2044, the NVME drvier always complains about "I/O tag
XXX (XXX) QID XX timeout, completion polled", which is caused by the broken
affinity setting mechanism of the sg2042-msi driver.
The PLIC driver can only the set the affinity when enabled, but the
sg2042-msi driver invokes the affinity setter in disabled state, which
causes the change to be lost.
Cure this by implementing the irq_startup()/shutdown() callbacks, which
allow to startup (enabled) the underlying PLIC first.
Fixes: e96b93a97c90 ("irqchip/sg2042-msi: Add the Sophgo SG2044 MSI interrupt controller")
Reported-by: Han Gao <rabenda.cn@gmail.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen Wang <unicorn_wang@outlook.com> # Pioneerbox
Reviewed-by: Chen Wang <unicorn_wang@outlook.com>
Link: https://lore.kernel.org/all/20250813232835.43458-4-inochiama@gmail.com
|
|
0-day reported an off by one in the ioremap() sizing:
drivers/irqchip/irq-mvebu-gicp.c:240:45-48: WARNING:
Suspicious code. resource_size is maybe missing with gicp -> res
Convert it to resource_size(), which does the right thing.
Fixes: 3c3d7dbab2c7 ("irqchip/mvebu-gicp: Clear pending interrupts on init")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Closes: https://lore.kernel.org/oe-kbuild-all/202508062150.mtFQMTXc-lkp@intel.com/
|
|
Compile-testing IMX_MU_MSI on x86 without PCI_MSI support results in a
build failure:
drivers/gpio/gpio-sprd.c:8:
include/linux/gpio/driver.h:41:33: error: field 'msiinfo' has incomplete type
drivers/iommu/iommufd/viommu.c:4:
include/linux/msi.h:528:33: error: field 'alloc_info' has incomplete type
Tighten the dependency further to only allow compile testing on Arm.
This could be refined further to allow certain x86 configs.
This was submitted before to address a different build failure, which was
fixed differently, but the problem has now returned in a different form.
Fixes: 70afdab904d2d1e6 ("irqchip: Add IMX MU MSI controller driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250805160952.4006075-1-arnd@kernel.org
Link: https://lore.kernel.org/all/20221215164109.761427-1-arnd@kernel.org/
|
|
GICv5 LPI interrupts have an active state hence they cannot retrigger
while the interrupt is being handled.
Therefore, setting the IRQD_RESEND_WHEN_IN_PROGRESS flag on LPIs is
pointless, as the situation this flag caters for cannot happen.
Remove it.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250801-gic-v5-fixes-6-17-v1-3-4fcedaccf9e6@kernel.org
|
|
The 0-day bot reported that on the failure path the driver iounmap()s IWB
resources that are managed through devm_ioremap(), which is clearly wrong
because the driver would end up unmapping the MMIO resource twice on
probing failure.
Fix this by removing the error path altogether and by letting devres manage
the iounmapping on clean-up.
Fixes: 695949d8b16f ("irqchip/gic-v5: Add GICv5 IWB support")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250801-gic-v5-fixes-6-17-v1-1-4fcedaccf9e6@kernel.org
Closes: https://lore.kernel.org/oe-kbuild-all/202508010038.N3r4ZmII-lkp@intel.com
|
|
When a kexec'ed kernel boots up, there might be stale unhandled interrupts
pending in the interrupt controller. These are delivered as spurious
interrupts once the boot CPU enables interrupts.
Clear all pending interrupts when the driver is initialized to prevent
these spurious interrupts from locking the CPU in an endless loop.
Signed-off-by: Elad Nachman <enachman@marvell.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250803102548.669682-2-enachman@marvell.com
|
|
Commit 8b65db1e93a2 ("irqchip/msi-lib: Add IRQ_DOMAIN_FLAG_FWNODE_PARENT
handling") added logic in msi_lib_irq_domain_select() to match the domain
fwnode against the fwnode parent of the fwspec.fwnode.
The fwnode_get_parent() caller must call fwnode_handle_put() on the
returned pointer value, lest fwnode refcounting for the parent ends up
being out of kilter.
Fix this by relying on the fwnode_handle clean-up handlers and by
incrementing the fwnode refcount regardless of whether parent matching is
used or not (the domain selection code already holds a reference before
calling msi_lib_irq_domain_select() but to make the exit path more uniform
if IRQ_DOMAIN_FLAG_FWNODE_PARENT is not set fwnode_handle_get() is called
again on fwspec.fwnode so that the clean-up code is the same for the two
matching patterns).
Fixes: 8b65db1e93a2 ("irqchip/msi-lib: Add IRQ_DOMAIN_FLAG_FWNODE_PARENT handling")
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250804145553.795065-1-lpieralisi@kernel.org
|
|
smatch warns about a dereference before check:
drivers/irqchip/irq-riscv-imsic-platform.c:317 imsic_irqdomain_init() warn: variable dereferenced before check 'imsic' (see line 311)
Cure it by moving the firmware not assignement after the checks.
Fixes: 59422904dd98 ("irqchip/riscv-imsic: Convert to msi_create_parent_irq_domain() helper")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Closes: https://lore.kernel.org/r/202507311953.NFVZkr0a-lkp@intel.com/ ---
drivers/irqchip/irq-riscv-imsic-platform.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
|
|
Pull kvm updates from Paolo Bonzini:
"ARM:
- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts
- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface
- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on
hardware that previously advertised it unconditionally
- Map supporting endpoints with cacheable memory attributes on
systems with FEAT_S2FWB and DIC where KVM no longer needs to
perform cache maintenance on the address range
- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
guest hypervisor to inject external aborts into an L2 VM and take
traps of masked external aborts to the hypervisor
- Convert more system register sanitization to the config-driven
implementation
- Fixes to the visibility of EL2 registers, namely making VGICv3
system registers accessible through the VGIC device instead of the
ONE_REG vCPU ioctls
- Various cleanups and minor fixes
LoongArch:
- Add stat information for in-kernel irqchip
- Add tracepoints for CPUCFG and CSR emulation exits
- Enhance in-kernel irqchip emulation
- Various cleanups
RISC-V:
- Enable ring-based dirty memory tracking
- Improve perf kvm stat to report interrupt events
- Delegate illegal instruction trap to VS-mode
- MMU improvements related to upcoming nested virtualization
s390x
- Fixes
x86:
- Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
APIC, PIC, and PIT emulation at compile time
- Share device posted IRQ code between SVM and VMX and harden it
against bugs and runtime errors
- Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
O(1) instead of O(n)
- For MMIO stale data mitigation, track whether or not a vCPU has
access to (host) MMIO based on whether the page tables have MMIO
pfns mapped; using VFIO is prone to false negatives
- Rework the MSR interception code so that the SVM and VMX APIs are
more or less identical
- Recalculate all MSR intercepts from scratch on MSR filter changes,
instead of maintaining shadow bitmaps
- Advertise support for LKGS (Load Kernel GS base), a new instruction
that's loosely related to FRED, but is supported and enumerated
independently
- Fix a user-triggerable WARN that syzkaller found by setting the
vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
possible path leading to architecturally forbidden states is hard
and even risks breaking userspace (if it goes from valid to valid
state but passes through invalid states), so just wait until
KVM_RUN to detect that the vCPU state isn't allowed
- Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
interception of APERF/MPERF reads, so that a "properly" configured
VM can access APERF/MPERF. This has many caveats (APERF/MPERF
cannot be zeroed on vCPU creation or saved/restored on suspend and
resume, or preserved over thread migration let alone VM migration)
but can be useful whenever you're interested in letting Linux
guests see the effective physical CPU frequency in /proc/cpuinfo
- Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
created, as there's no known use case for changing the default
frequency for other VM types and it goes counter to the very reason
why the ioctl was added to the vm file descriptor. And also, there
would be no way to make it work for confidential VMs with a
"secure" TSC, so kill two birds with one stone
- Dynamically allocation the shadow MMU's hashed page list, and defer
allocating the hashed list until it's actually needed (the TDP MMU
doesn't use the list)
- Extract many of KVM's helpers for accessing architectural local
APIC state to common x86 so that they can be shared by guest-side
code for Secure AVIC
- Various cleanups and fixes
x86 (Intel):
- Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
Failure to honor FREEZE_IN_SMM can leak host state into guests
- Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
prevent L1 from running L2 with features that KVM doesn't support,
e.g. BTF
x86 (AMD):
- WARN and reject loading kvm-amd.ko instead of panicking the kernel
if the nested SVM MSRPM offsets tracker can't handle an MSR (which
is pretty much a static condition and therefore should never
happen, but still)
- Fix a variety of flaws and bugs in the AVIC device posted IRQ code
- Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
supports) instead of rejecting vCPU creation
- Extend enable_ipiv module param support to SVM, by simply leaving
IsRunning clear in the vCPU's physical ID table entry
- Disable IPI virtualization, via enable_ipiv, if the CPU is affected
by erratum #1235, to allow (safely) enabling AVIC on such CPUs
- Request GA Log interrupts if and only if the target vCPU is
blocking, i.e. only if KVM needs a notification in order to wake
the vCPU
- Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
the vCPU's CPUID model
- Accept any SNP policy that is accepted by the firmware with respect
to SMT and single-socket restrictions. An incompatible policy
doesn't put the kernel at risk in any way, so there's no reason for
KVM to care
- Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
use WBNOINVD instead of WBINVD when possible for SEV cache
maintenance
- When reclaiming memory from an SEV guest, only do cache flushes on
CPUs that have ever run a vCPU for the guest, i.e. don't flush the
caches for CPUs that can't possibly have cache lines with dirty,
encrypted data
Generic:
- Rework irqbypass to track/match producers and consumers via an
xarray instead of a linked list. Using a linked list leads to
O(n^2) insertion times, which is hugely problematic for use cases
that create large numbers of VMs. Such use cases typically don't
actually use irqbypass, but eliminating the pointless registration
is a future problem to solve as it likely requires new uAPI
- Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
"void *", to avoid making a simple concept unnecessarily difficult
to understand
- Decouple device posted IRQs from VFIO device assignment, as binding
a VM to a VFIO group is not a requirement for enabling device
posted IRQs
- Clean up and document/comment the irqfd assignment code
- Disallow binding multiple irqfds to an eventfd with a priority
waiter, i.e. ensure an eventfd is bound to at most one irqfd
through the entire host, and add a selftest to verify eventfd:irqfd
bindings are globally unique
- Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
related to private <=> shared memory conversions
- Drop guest_memfd's .getattr() implementation as the VFS layer will
call generic_fillattr() if inode_operations.getattr is NULL
- Fix issues with dirty ring harvesting where KVM doesn't bound the
processing of entries in any way, which allows userspace to keep
KVM in a tight loop indefinitely
- Kill off kvm_arch_{start,end}_assignment() and x86's associated
tracking, now that KVM no longer uses assigned_device_count as a
heuristic for either irqbypass usage or MDS mitigation
Selftests:
- Fix a comment typo
- Verify KVM is loaded when getting any KVM module param so that
attempting to run a selftest without kvm.ko loaded results in a
SKIP message about KVM not being loaded/enabled (versus some random
parameter not existing)
- Skip tests that hit EACCES when attempting to access a file, and
print a "Root required?" help message. In most cases, the test just
needs to be run with elevated permissions"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
Documentation: KVM: Use unordered list for pre-init VGIC registers
RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
RISC-V: perf/kvm: Add reporting of interrupt events
RISC-V: KVM: Enable ring-based dirty memory tracking
RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
RISC-V: KVM: Delegate illegal instruction fault to VS mode
RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
RISC-V: KVM: Factor-out g-stage page table management
RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
RISC-V: KVM: Introduce struct kvm_gstage_mapping
RISC-V: KVM: Factor-out MMU related declarations into separate headers
RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
RISC-V: KVM: Don't flush TLB when PTE is unchanged
RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt chip driver updates from Thomas Gleixner:
- Add support of forced affinity setting to yet offline CPUs for the
MIPS-GIC to ensure that the affinity of per CPU interrupts can be set
during the early bringup phase of a secondary CPU in the hotplug code
before the CPU is set online and interrupts are enabled
- Add support for the MIPS (RISC-V !?!?) P8700 SoC in the ACLINT_SSWI
interrupt chip
- Make the interrupt routing to RISV-V harts specification compliant so
it supports arbitrary hart indices
- Add a command line parameter and related handling to disable the
generic RISCV IMSIC mechanism on platforms which use a trap-emulated
IMSIC. Unfortunatly this is required because there is no mechanism
available to discover this programatically.
- Enable wakeup sources on the Renesas RZV2H driver
- Convert interrupt chip drivers, which use a open coded variant of
msi_create_parent_irq_domain() to use the new functionality
- Convert interrupt chip drivers, which use the old style two level
implementation of MSI support over to the MSI parent mechanism to
prepare for removing at least one of the three PCI/MSI backend
variants.
- The usual cleanups and improvements all over the place
* tag 'irq-drivers-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
irqchip/renesas-irqc: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
irqchip/renesas-intc-irqpin: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
irqchip/riscv-imsic: Add kernel parameter to disable IPIs
irqchip/gic-v3: Fix GICD_CTLR register naming
irqchip/ls-scfg-msi: Fix NULL dereference in error handling
irqchip/ls-scfg-msi: Switch to use msi_create_parent_irq_domain()
irqchip/armada-370-xp: Switch to msi_create_parent_irq_domain()
irqchip/alpine-msi: Switch to msi_create_parent_irq_domain()
irqchip/alpine-msi: Convert to __free
irqchip/alpine-msi: Convert to lock guards
irqchip/alpine-msi: Clean up whitespace style
irqchip/sg2042-msi: Switch to msi_create_parent_irq_domain()
irqchip/loongson-pch-msi.c: Switch to msi_create_parent_irq_domain()
irqchip/imx-mu-msi: Convert to msi_create_parent_irq_domain() helper
irqchip/riscv-imsic: Convert to msi_create_parent_irq_domain() helper
irqchip/bcm2712-mip: Switch to msi_create_parent_irq_domain()
irqdomain: Add device pointer to irq_domain_info and msi_domain_info
irqchip/renesas-rzv2h: Remove unneeded includes
irqchip/renesas-rzv2h: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
irqchip/aslint-sswi: Resolve hart index
...
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #1
- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts.
- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface.
- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on hardware
that previously advertised it unconditionally.
- Map supporting endpoints with cacheable memory attributes on systems
with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
maintenance on the address range.
- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
hypervisor to inject external aborts into an L2 VM and take traps of
masked external aborts to the hypervisor.
- Convert more system register sanitization to the config-driven
implementation.
- Fixes to the visibility of EL2 registers, namely making VGICv3 system
registers accessible through the VGIC device instead of the ONE_REG
vCPU ioctls.
- Various cleanups and minor fixes.
|
|
KVM IRQ changes for 6.17
- Rework irqbypass to track/match producers and consumers via an xarray
instead of a linked list. Using a linked list leads to O(n^2) insertion
times, which is hugely problematic for use cases that create large numbers
of VMs. Such use cases typically don't actually use irqbypass, but
eliminating the pointless registration is a future problem to solve as it
likely requires new uAPI.
- Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
to avoid making a simple concept unnecessarily difficult to understand.
- Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC,
and PIT emulation at compile time.
- Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c.
- Fix a variety of flaws and bugs in the AVIC device posted IRQ code.
- Inhibited AVIC if a vCPU's ID is too big (relative to what hardware
supports) instead of rejecting vCPU creation.
- Extend enable_ipiv module param support to SVM, by simply leaving IsRunning
clear in the vCPU's physical ID table entry.
- Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
erratum #1235, to allow (safely) enabling AVIC on such CPUs.
- Dedup x86's device posted IRQ code, as the vast majority of functionality
can be shared verbatime between SVM and VMX.
- Harden the device posted IRQ code against bugs and runtime errors.
- Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
instead of O(n).
- Generate GA Log interrupts if and only if the target vCPU is blocking, i.e.
only if KVM needs a notification in order to wake the vCPU.
- Decouple device posted IRQs from VFIO device assignment, as binding a VM to
a VFIO group is not a requirement for enabling device posted IRQs.
- Clean up and document/comment the irqfd assignment code.
- Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e.
ensure an eventfd is bound to at most one irqfd through the entire host,
and add a selftest to verify eventfd:irqfd bindings are globally unique.
|
|
Convert the Renesas IRQC driver from SIMPLE_DEV_PM_OPS() to
DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr(). This allows to drop the
__maybe_unused annotations from its suspend callback, and reduces kernel
size in case CONFIG_PM or CONFIG_PM_SLEEP is disabled.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/5a14f9932da20ec46cde27f314414474072755ed.1752086718.git.geert+renesas@glider.be
|
|
Convert the Renesas INTC External IRQ Pin driver from SIMPLE_DEV_PM_OPS()
to DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr(). This allows to drop the
__maybe_unused annotations from its suspend callbacks, and reduces kernel
size in case CONFIG_PM or CONFIG_PM_SLEEP is disabled.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/865e5274cc516d8c345048330a46e753e2bda677.1752086656.git.geert+renesas@glider.be
|
|
When injecting IPIs to a set of harts, the IMSIC IPI support will do a
separate MMIO write to the SETIPNUM_LE register of each target hart. This
means on a platform where IMSIC is trap-n-emulated, there will be N MMIO
traps when injecting IPI to N target harts hence IMSIC IPIs will be slow on
such platforms compared to the SBI IPI extension.
Unfortunately, there is no DT, ACPI, or any other way of discovering
whether the underlying IMSIC is trap-n-emulated. Using MMIO write to the
SETIPNUM_LE register for injecting IPI is purely a software choice in the
IMSIC driver hence add a kernel parameter to allow users to disable IMSIC
IPIs on platforms with trap-n-emulated IMSIC.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250716123745.557585-1-apatel@ventanamicro.com
|
|
It was incorrectly named as GICD_CTRL in a pr_info() and comments. Fix
them.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250709130046.1354-1-yuzenghui@huawei.com
|
|
The call to irq_domain_remove(msi_data->parent); was accidentally left
behind during a code refactor. It's not necessary to free
"msi_data->parent" because it is NULL and, in fact, trying to free it
will lead to a NULL pointer dereference. Delete the unnecessary code.
Fixes: 94b59d5f567a ("irqchip/ls-scfg-msi: Switch to use msi_create_parent_irq_domain()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/all/15059507-6422-4333-94ca-e8e8840bd289@sabinyo.mountain
|
|
Populate the gic_kvm_info struct based on support for
FEAT_GCIE_LEGACY. The struct is used by KVM to probe for a compatible
GIC.
Co-authored-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Link: https://lore.kernel.org/r/20250627100847.1022515-3-sascha.bischoff@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|