Age | Commit message (Collapse) | Author |
|
It is now possible to select CONFIG_KVM_ARM_TIMER to enable the
KVM architected timer support.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Do the necessary save/restore dance for the timers in the world
switch code. In the process, allow the guest to read the physical
counter, which is useful for its own clock_event_device.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Add some the architected timer related infrastructure, and support timer
interrupt injection, which can happen as a resultof three possible
events:
- The virtual timer interrupt has fired while we were still
executing the guest
- The timer interrupt hasn't fired, but it expired while we
were doing the world switch
- A hrtimer we programmed earlier has fired
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into kvm-arm/timer
|
|
It is now possible to select the VGIC configuration option.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Add the init code for the hypervisor, the virtual machine, and
the virtual CPUs.
An interrupt handler is also wired to allow the VGIC maintenance
interrupts, used to deal with level triggered interrupts and LR
underflows.
A CPU hotplug notifier is registered to disable/enable the interrupt
as requested.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Enable the VGIC control interface to be save-restored on world switch.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Plug the interrupt injection code. Interrupts can now be generated
from user space.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
An interrupt may have been disabled after being made pending on the
CPU interface (the classic case is a timer running while we're
rebooting the guest - the interrupt would kick as soon as the CPU
interface gets enabled, with deadly consequences).
The solution is to examine already active LRs, and check the
interrupt is still enabled. If not, just retire it.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Add VGIC virtual CPU interface code, picking pending interrupts
from the distributor and stashing them in the VGIC control interface
list registers.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Add the GIC distributor emulation code. A number of the GIC features
are simply ignored as they are not required to boot a Linux guest.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
User space defines the model to emulate to a guest and should therefore
decide which addresses are used for both the virtual CPU interface
directly mapped in the guest physical address space and for the emulated
distributor interface, which is mapped in software by the in-kernel VGIC
support.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Wire the basic framework code for VGIC support and the initial in-kernel
MMIO support code for the VGIC, used for the distributor emulation.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
When an interrupt occurs for the guest, it is sometimes necessary
to find out which vcpu was running at that point.
Keep track of which vcpu is being run in kvm_arch_vcpu_ioctl_run(),
and allow the data to be retrieved using either:
- kvm_arm_get_running_vcpu(): returns the vcpu running at this point
on the current CPU. Can only be used in a non-preemptible context.
- kvm_arm_get_running_vcpus(): returns the per-CPU variable holding
the running vcpus, usable for per-CPU interrupts.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
On ARM some bits are specific to the model being emulated for the guest and
user space needs a way to tell the kernel about those bits. An example is mmio
device base addresses, where KVM must know the base address for a given device
to properly emulate mmio accesses within a certain address range or directly
map a device with virtualiation extensions into the guest address space.
We make this API ARM-specific as we haven't yet reached a consensus for a
generic API for all KVM architectures that will allow us to do something like
this.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
git://linux-arm.org/linux-mr into for-arm-soc/arch-timers
|
|
Implement timer_broadcast for the arm architecture, allowing for the use
of clock_event_device_drivers decoupled from the timer tick broadcast
mechanism.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
|
|
Currently, the ARM backend must maintain a redundant list of timers for
the purpose of centralising timer broadcast functionality. This prevents
sharing timer drivers across architectures.
This patch moves the pain of dealing with timer broadcasts to the core
clockevents tick broadcast code, which already maintains its own list
of timers.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
|
|
The core functionality of the arch_timer driver is not directly tied to
anything under arch/arm, and can be split out.
This patch factors out the core of the arch_timer driver, so it can be
shared with other architectures. A couple of functions are added so
that architecture-specific code can interact with the driver without
needing to touch its internals.
The ARM_ARCH_TIMER config variable is moved out to
drivers/clocksource/Kconfig, existing uses in arch/arm are replaced with
HAVE_ARM_ARCH_TIMER, which selects it.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Several bits in CNTKCTL reset to 0, including PL0VTEN. For architectures
using the generic timer which wish to have a fast gettimeofday vDSO
implementation, these bits must be set to 1 by the kernel. For
architectures without a vDSO, it's best to leave the bits set to 0 for
now to ensure that if and when support is added, it's implemented sanely
architecture wide.
As the bootloader might set PL0VTEN to a value that doesn't correspond
to that which the kernel prefers, we must explicitly set it to the
architecture port's preferred value.
This patch adds arch_counter_set_user_access, which sets the PL0 access
permissions to that required by the architecture. For arch/arm, this
currently means disabling all userspace access.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently, the arch_timer driver is tied to the arm port, as it relies
on code in arch/arm/smp.c to setup and teardown timers as cores are
hotplugged on and off. The timer is registered through an arm-specific
registration mechanism, preventing sharing the driver with the arm64
port.
This patch moves the driver to using a cpu notifier instead, making it
easier to port.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
|
|
Without the isbs in arch_timer_get_cnt{p,v}ct the cpu may speculate
reads and return stale values. This could be bad for code sensitive to
changes in expected deltas between calls (e.g. the delay loop).
Without isbs in arch_timer_reg_write the processor may reorder
instructions around enabling/disabling of the timer or writing the
compare value, which we probably don't want.
This patch adds isbs to prevent those issues.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently the arch_timer register accessors are thrown together with
the main driver, preventing us from porting the driver to other
architectures.
This patch moves the register accessors into a header file, as with
the arm64 version. Constants required by the accessors are also moved.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
|
|
The CNTFRQ register is not duplicated for physical and virtual timers,
and accessing it as if it were is confusing.
Instead, use a separate accessor which doesn't take the access type
as a parameter.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
|
|
We're currently inconsistent with respect to our accesses to the
physical and virtual counters, mixing and matching the two.
This patch introduces and uses a function pointer for accessing the
correct counter based on whether we're using physical or virtual
interrupts. All current accesses to the counter accessors are redirected
through it.
When the driver is moved out to drivers/clocksource, there's the
possibility that code called before the timer code is initialised will
attempt to call arch_timer_read_counter (e.g. sched_clock for AArch64).
To avoid having to have to check whether the timer has been initialised
either in arch_timer_read_counter or one of it's callers, a default
implementation is assigned that simply returns 0.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
|
|
To ensure the correct size of types, use u64 for the return value of
arch_timer_get_cnt{p,v}ct, and u32 for arch_timer_rate, matching the
size of the registers these values are taken from. While we're changing
them anyway, simplify the implementation of arch_timer_get_cnt{p,v}ct.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
|
|
This check is a holdover from the pre-devicetree days. As the timer
is not probed except by platforms which register it via devicetree,
it's not strictly necessary.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
When we get the device_node for the arch timer, it's refcount is
automatically incremented in of_find_matching_node, but it is
never decremented.
This patch decrements the refcount on the node after we're finished
using it.
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Pull ARM fixes from Russell King:
"A number of fixes:
Patrik found a problem with preempt counting in the VFP assembly
functions which can cause the preempt count to be upset.
Nicolas fixed a problem with the parsing of the DT when it straddles a
1MB boundary.
Subhash Jadavani reported a problem with sparsemem and our highmem
support for cache maintanence for DMA areas, and TI found a bug in
their strongly ordered memory mapping type.
Also, three fixes by way of Will Deacon's tree from Dave Martin for
instruction compatibility and Marc Zyngier to fix hypervisor boot mode
issues."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7629/1: mm: Fix missing XN flag for for MT_MEMORY_SO
ARM: DMA: Fix struct page iterator in dma_cache_maint() to work with sparsemem
ARM: 7628/1: head.S: map one extra section for the ATAG/DTB area
ARM: 7627/1: Predicate preempt logic on PREEMP_COUNT not PREEMPT alone
ARM: virt: simplify __hyp_stub_install epilog
ARM: virt: boot secondary CPUs through the right entry point
ARM: virt: Avoid bx instruction for compatibility with <=ARMv4
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Here's a long-pending fixes pull request for arm-soc (I didn't send
one in the -rc4 cycle).
The larger deltas are from:
- A fixup of error paths in the mvsdio driver
- Header file move for a driver that hadn't been properly converted
to multiplatform on i.MX, which was causing build failures when
included
- Device tree updates for at91 dealing mostly with their new pinctrl
setup merged in 3.8 and mistakes in those initial configs
The rest are the normal mix of small fixes all over the place; sunxi,
omap, imx, mvebu, etc, etc."
* tag 'fixes-for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (40 commits)
mfd: vexpress-sysreg: Don't skip initialization on probe
ARM: vexpress: Enable A7 cores in V2P-CA15_A7's Device Tree
ARM: vexpress: extend the MPIDR range used for pen release check
ARM: at91/dts: correct comment in at91sam9x5.dtsi for mii
ARM: at91/at91_dt_defconfig: add at91sam9n12 SoC to DT defconfig
ARM: at91/at91_dt_defconfig: remove memory specification to cmdline
ARM: at91/dts: add macb mii pinctrl config for kizbox
ARM: at91: rm9200: remake the BGA as default version
ARM: at91: fix gpios on i2c-gpio for RM9200 DT
ARM: at91/at91sam9x5 DTS: add SCK USART pins
ARM: at91/at91sam9x5 DTS: correct wrong PIO BANK values on u(s)arts
ARM: at91/at91-pinctrl documentation: fix typo and add some details
ARM: kirkwood: fix missing #interrupt-cells property
mmc: mvsdio: use devm_ API to simplify/correct error paths.
clk: mvebu/clk-cpu.c: fix memory leakage
ARM: OMAP2+: omap4-panda: add UART2 muxing for WiLink shared transport
ARM: OMAP2+: DT node Timer iteration fix
ARM: OMAP2+: Fix section warning for omap_init_ocp2scp()
ARM: OMAP2+: fix build break for omapdrm
ARM: OMAP2: Fix missing omap2xxx_clkt_vps_late_init function calls
...
|
|
into fixes
From Pawel Moll:
- makes the V2P-CA15_A7 (a.k.a. TC2) work with 3.8 kernels
- improves vexpress-sysreg.c behaviour on arm64 platforms
* 'vexpress/fixes' of git://git.linaro.org/people/pawelmoll/linux:
mfd: vexpress-sysreg: Don't skip initialization on probe
ARM: vexpress: Enable A7 cores in V2P-CA15_A7's Device Tree
ARM: vexpress: extend the MPIDR range used for pen release check
|
|
From Nicolas Ferre:
Here are fixes for AT91 that are mainly related to device tree.
One RM9200 setup option is the only C code change.
Some documentation changes can clarify the pinctrl use.
Then, some defconfig modifications are allowing the affected platforms
to boot.
* tag 'at91-fixes' of git://github.com/at91linux/linux-at91:
ARM: at91/dts: correct comment in at91sam9x5.dtsi for mii
ARM: at91/at91_dt_defconfig: add at91sam9n12 SoC to DT defconfig
ARM: at91/at91_dt_defconfig: remove memory specification to cmdline
ARM: at91/dts: add macb mii pinctrl config for kizbox
ARM: at91: rm9200: remake the BGA as default version
ARM: at91: fix gpios on i2c-gpio for RM9200 DT
ARM: at91/at91sam9x5 DTS: add SCK USART pins
ARM: at91/at91sam9x5 DTS: correct wrong PIO BANK values on u(s)arts
ARM: at91/at91-pinctrl documentation: fix typo and add some details
|
|
As the kernel is able to cope with multiple clusters,
uncomment the A7 cores in the Device Tree for V2P-CA15_A7
tile, making all 5 cores available to the user.
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
|
|
In ARM multi-cluster systems the MPIDR affinity level 0 cannot be used as a
single cpu identifier, affinity levels 1 and 2 must be taken into account as
well.
This patch extends the MPIDR usage to affinity levels 1 and 2 in versatile
secondary cores start up code in order to compare the passed pen_release
value with the full-blown affinity mask.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
|
|
|
|
git://git.infradead.org/users/jcooper/linux into fixes
From Jason Cooper:
mvebu fixes for v3.8-rc5
- fix memory leak in mvebu/clk-cpu.c
- use devm_ to correct/simplify error paths in mvsdio
- add missing #interrupt-cells property in kirkwood
* tag 'mvebu_fixes_for_v3.8-rc5' of git://git.infradead.org/users/jcooper/linux:
ARM: kirkwood: fix missing #interrupt-cells property
mmc: mvsdio: use devm_ API to simplify/correct error paths.
clk: mvebu/clk-cpu.c: fix memory leakage
|
|
Implement the PSCI specification (ARM DEN 0022A) to control
virtual CPUs being "powered" on or off.
PSCI/KVM is detected using the KVM_CAP_ARM_PSCI capability.
A virtual CPU can now be initialized in a "powered off" state,
using the KVM_ARM_VCPU_POWER_OFF feature flag.
The guest can use either SMC or HVC to execute a PSCI function.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
When the guest accesses I/O memory this will create data abort
exceptions and they are handled by decoding the HSR information
(physical address, read/write, length, register) and forwarding reads
and writes to QEMU which performs the device emulation.
Certain classes of load/store operations do not support the syndrome
information provided in the HSR. We don't support decoding these (patches
are available elsewhere), so we report an error to user space in this case.
This requires changing the general flow somewhat since new calls to run
the VCPU must check if there's a pending MMIO load and perform the write
after userspace has made the data available.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
Handles the guest faults in KVM by mapping in corresponding user pages
in the 2nd stage page tables.
We invalidate the instruction cache by MVA whenever we map a page to the
guest (no, we cannot only do it when we have an iabt because the guest
may happily read/write a page before hitting the icache) if the hardware
uses VIPT or PIPT. In the latter case, we can invalidate only that
physical page. In the first case, all bets are off and we simply must
invalidate the whole affair. Not that VIVT icaches are tagged with
vmids, and we are out of the woods on that one. Alexander Graf was nice
enough to remind us of this massive pain.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
We use space #18 for floating point regs.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
The Cache Size Selection Register (CSSELR) selects the current Cache
Size ID Register (CCSIDR). You write which cache you are interested
in to CSSELR, and read the information out of CCSIDR.
Which cache numbers are valid is known by reading the Cache Level ID
Register (CLIDR).
To export this state to userspace, we add a KVM_REG_ARM_DEMUX
numberspace (17), which uses 8 bits to represent which register is
being demultiplexed (0 for CCSIDR), and the lower 8 bits to represent
this demultiplexing (in our case, the CSSELR value, which is 4 bits).
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
The following three ioctls are implemented:
- KVM_GET_REG_LIST
- KVM_GET_ONE_REG
- KVM_SET_ONE_REG
Now we have a table for all the cp15 registers, we can drive a generic
API.
The register IDs carry the following encoding:
ARM registers are mapped using the lower 32 bits. The upper 16 of that
is the register group type, or coprocessor number:
ARM 32-bit CP15 registers have the following id bit patterns:
0x4002 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>
ARM 64-bit CP15 registers have the following id bit patterns:
0x4003 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>
For futureproofing, we need to tell QEMU about the CP15 registers the
host lets the guest access.
It will need this information to restore a current guest on a future
CPU or perhaps a future KVM which allow some of these to be changed.
We use a separate table for these, as they're only for the userspace API.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
Adds a new important function in the main KVM/ARM code called
handle_exit() which is called from kvm_arch_vcpu_ioctl_run() on returns
from guest execution. This function examines the Hyp-Syndrome-Register
(HSR), which contains information telling KVM what caused the exit from
the guest.
Some of the reasons for an exit are CP15 accesses, which are
not allowed from the guest and this commit handles these exits by
emulating the intended operation in software and skipping the guest
instruction.
Minor notes about the coproc register reset:
1) We reserve a value of 0 as an invalid cp15 offset, to catch bugs in our
table, at cost of 4 bytes per vcpu.
2) Added comments on the table indicating how we handle each register, for
simplicity of understanding.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
Provides complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture necessary exception information and stores the information on
the VCPU and KVM structures.
The following Hyp-ABI is also documented in the code:
Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
Switching to Hyp mode is done through a simple HVC #0 instruction. The
exception vector code will check that the HVC comes from VMID==0 and if
so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
- r0 contains a pointer to a HYP function
- r1, r2, and r3 contain arguments to the above function.
- The HYP function will be called with its arguments in r0, r1 and r2.
On HYP function return, we return directly to SVC.
A call to a function executing in Hyp mode is performed like the following:
<svc code>
ldr r0, =BSYM(my_hyp_fn)
ldr r1, =my_param
hvc #0 ; Call my_hyp_fn(my_param) from HYP mode
<svc code>
Otherwise, the world-switch is pretty straight-forward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values is loaded onto the hardware. State, which is not loaded, but
theoretically modifiable by the guest is protected through the
virtualiation features to generate a trap and cause software emulation.
Upon guest returns, all state is restored from hardware onto the VCPU
struct and the original state is restored from the Hyp-stack onto the
hardware.
SMP support using the VMPIDR calculated on the basis of the host MPIDR
and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
Reuse of VMIDs has been implemented by Antonios Motakis and adapated from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.
To support VFP/NEON we trap those instructions using the HPCTR. When
we trap, we switch the FPU. After a guest exit, the VFP state is
returned to the host. When disabling access to floating point
instructions, we also mask FPEXC_EN in order to avoid the guest
receiving Undefined instruction exceptions before we have a chance to
switch back the floating point state. We are reusing vfp_hard_struct,
so we depend on VFPv3 being enabled in the host kernel, if not, we still
trap cp10 and cp11 in order to inject an undefined instruction exception
whenever the guest tries to use VFP/NEON. VFP/NEON developed by
Antionios Motakis and Rusty Russell.
Aborts that are permission faults, and not stage-1 page table walk, do
not report the faulting address in the HPFAR. We have to resolve the
IPA, and store it just like the HPFAR register on the VCPU struct. If
the IPA cannot be resolved, it means another CPU is playing with the
page tables, and we simply restart the guest. This quirk was fixed by
Marc Zyngier.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
All interrupt injection is now based on the VM ioctl KVM_IRQ_LINE. This
works semantically well for the GIC as we in fact raise/lower a line on
a machine component (the gic). The IOCTL uses the follwing struct.
struct kvm_irq_level {
union {
__u32 irq; /* GSI */
__s32 status; /* not used for KVM_IRQ_LEVEL */
};
__u32 level; /* 0 or 1 */
};
ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
(GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
specific cpus. The irq field is interpreted like this:
bits: | 31 ... 24 | 23 ... 16 | 15 ... 0 |
field: | irq_type | vcpu_index | irq_number |
The irq_type field has the following values:
- irq_type[0]: out-of-kernel GIC: irq_number 0 is IRQ, irq_number 1 is FIQ
- irq_type[1]: in-kernel GIC: SPI, irq_number between 32 and 1019 (incl.)
(the vcpu_index field is ignored)
- irq_type[2]: in-kernel GIC: PPI, irq_number between 16 and 31 (incl.)
The irq_number thus corresponds to the irq ID in as in the GICv2 specs.
This is documented in Documentation/kvm/api.txt.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
This commit introduces the framework for guest memory management
through the use of 2nd stage translation. Each VM has a pointer
to a level-1 table (the pgd field in struct kvm_arch) which is
used for the 2nd stage translations. Entries are added when handling
guest faults (later patch) and the table itself can be allocated and
freed through the following functions implemented in
arch/arm/kvm/arm_mmu.c:
- kvm_alloc_stage2_pgd(struct kvm *kvm);
- kvm_free_stage2_pgd(struct kvm *kvm);
Each entry in TLBs and caches are tagged with a VMID identifier in
addition to ASIDs. The VMIDs are assigned consecutively to VMs in the
order that VMs are executed, and caches and tlbs are invalidated when
the VMID space has been used to allow for more than 255 simultaenously
running guests.
The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
freed in kvm_arch_destroy_vm(). Both functions are called from the main
KVM code.
We pre-allocate page table memory to be able to synchronize using a
spinlock and be called under rcu_read_lock from the MMU notifiers. We
steal the mmu_memory_cache implementation from x86 and adapt for our
specific usage.
We support MMU notifiers (thanks to Marc Zyngier) through
kvm_unmap_hva and kvm_set_spte_hva.
Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA,
which is used by VGIC support to map the virtual CPU interface registers
to the guest. This support is added by Marc Zyngier.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
Sets up KVM code to handle all exceptions taken to Hyp mode.
When the kernel is booted in Hyp mode, calling an hvc instruction with r0
pointing to the new vectors, the HVBAR is changed to the the vector pointers.
This allows subsystems (like KVM here) to execute code in Hyp-mode with the
MMU disabled.
We initialize other Hyp-mode registers and enables the MMU for Hyp-mode from
the id-mapped hyp initialization code. Afterwards, the HVBAR is changed to
point to KVM Hyp vectors used to catch guest faults and to switch to Hyp mode
to perform a world-switch into a KVM guest.
Also provides memory mapping code to map required code pages, data structures,
and I/O regions accessed in Hyp mode at the same virtual address as the host
kernel virtual addresses, but which conforms to the architectural requirements
for translations in Hyp mode. This interface is added in arch/arm/kvm/arm_mmu.c
and comprises:
- create_hyp_mappings(from, to);
- create_hyp_io_mappings(from, to, phys_addr);
- free_hyp_pmds();
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
Targets KVM support for Cortex A-15 processors.
Contains all the framework components, make files, header files, some
tracing functionality, and basic user space API.
Only supported core is Cortex-A15 for now.
Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
Add a method (hyp_idmap_setup) to populate a hyp pgd with an
identity mapping of the code contained in the .hyp.idmap.text
section.
Offer a method to drop this identity mapping through
hyp_idmap_teardown.
Make all the above depend on CONFIG_ARM_VIRT_EXT and CONFIG_ARM_LPAE.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
KVM uses the stage-2 page tables and the Hyp page table format,
so we define the fields and page protection flags needed by KVM.
The nomenclature is this:
- page_hyp: PL2 code/data mappings
- page_hyp_device: PL2 device mappings (vgic access)
- page_s2: Stage-2 code/data page mappings
- page_s2_device: Stage-2 device mappings (vgic access)
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Christoffer Dall <c.dall@virtualopensystems.com>
|