linux-toradex.git/arch, branch v4.1.15

KVM: s390: enable SIMD only when no VCPUs were created

2015-12-09T19:03:28+00:00

commit 5967c17b118a2bd1dd1d554cc4eee16233e52bec upstream.

We should never allow enabling or disabling any facilities for the guest
once VCPUs have already been created.

kvm_arch_vcpu_(load|put) relies on SIMD not changing during runtime.
If somebody created and ran VCPUs and then decided to enable
SIMD, undefined behaviour would be possible (e.g. the vector save area
not being set up).
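
A minimal sketch of the guard described above, assuming a simplified
stand-in for the per-VM state. The names struct vm_sketch, created_vcpus
and vm_enable_simd() are hypothetical and are not the kernel's actual code:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-VM state. */
struct vm_sketch {
	int created_vcpus;  /* number of VCPUs created so far */
	bool simd_enabled;  /* vector facility offered to the guest */
};

/* Refuse to change the facility once any VCPU exists. */
int vm_enable_simd(struct vm_sketch *vm)
{
	if (vm->created_vcpus > 0)
		return -EBUSY;
	vm->simd_enabled = true;
	return 0;
}
```

The point of the check is ordering: the facility is fixed before the first
VCPU is set up, so per-VCPU load/put code never sees it change.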

Acked-by: Christian Borntraeger 
Acked-by: Cornelia Huck 
Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
Signed-off-by: Greg Kroah-Hartman

KVM: s390: avoid memory overwrites on emergency signal injection

2015-12-09T19:03:23+00:00

commit b85de33a1a3433487b6a721cfdce25ec8673e622 upstream.

Commit 383d0b050106 ("KVM: s390: handle pending local interrupts via
bitmap") introduced a possible memory overwrite from user space.

User space could pass an invalid emergency signal code (sending VCPU)
and therefore exceed the bitmap. Let's take care of this case and
check that the id is in the valid range.
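
A minimal sketch of such a range check, assuming a fixed-size pending
bitmap indexed by the sending VCPU id. MAX_VCPUS_SKETCH, struct
emerg_state and inject_emergency() are hypothetical names chosen for
illustration only:

```c
#include <errno.h>
#include <limits.h>

#define MAX_VCPUS_SKETCH 64  /* assumed limit for this sketch */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((MAX_VCPUS_SKETCH + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical pending-signal state: one bit per possible sender. */
struct emerg_state {
	unsigned long pending[BITMAP_WORDS];
};

/* Reject ids outside the bitmap before touching any memory. */
int inject_emergency(struct emerg_state *st, unsigned int sender_id)
{
	if (sender_id >= MAX_VCPUS_SKETCH)
		return -EINVAL;
	st->pending[sender_id / BITS_PER_WORD] |=
		1UL << (sender_id % BITS_PER_WORD);
	return 0;
}
```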

Reviewed-by: Dominik Dingel 
Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
Signed-off-by: Greg Kroah-Hartman

KVM: s390: fix wrong lookup of VCPUs by array index

2015-12-09T19:03:23+00:00

commit 152e9f65d66f0a3891efc3869440becc0e7ff53f upstream.

Until now, VCPUs were always created sequentially with incrementing
VCPU ids. Therefore, the index in the VCPUs array matched the id.

As sequential creation might change with cpu hotplug, let's use
the correct lookup function to find a VCPU by id, not array index.

Let's also use kvm_lookup_vcpu() for validation of the sending VCPU
on external call injection.
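
To illustrate why index and id can diverge, here is a small sketch of a
lookup by id over an array whose slots were filled out of order. struct
guest and lookup_vcpu_by_id() are hypothetical and only model the idea:

```c
#include <stddef.h>

struct vcpu {
	int id;  /* VCPU id chosen at creation time */
};

/* Hypothetical VM with a small VCPU array, filled in creation order. */
struct guest {
	struct vcpu *vcpus[8];
	size_t nr_vcpus;
};

/* Search by id instead of assuming vcpus[id] is the right VCPU. */
struct vcpu *lookup_vcpu_by_id(struct guest *g, int id)
{
	for (size_t i = 0; i < g->nr_vcpus; i++)
		if (g->vcpus[i] && g->vcpus[i]->id == id)
			return g->vcpus[i];
	return NULL;
}
```

A NULL result doubles as validation: an id that names no existing VCPU is
rejected instead of being used as an array index.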

Reviewed-by: Christian Borntraeger 
Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
Signed-off-by: Greg Kroah-Hartman

KVM: s390: SCA must not cross page boundaries

2015-12-09T19:03:22+00:00

commit c5c2c393468576bad6d10b2b5fefff8cd25df3f4 upstream.

We seem to have missed a few corner cases in commit f6c137ff00a4
("KVM: s390: randomize sca address").

The SCA has a maximum size of 2112 bytes. By setting the sca_offset to
some unlucky numbers, we exceed the page.

0x7c0 (1984) -> Fits exactly
0x7d0 (2000) -> 16 bytes out
0x7e0 (2016) -> 32 bytes out
0x7f0 (2032) -> 48 bytes out

One VCPU entry is 32 bytes long.

For the last two cases, we actually write data to the other page.
1. The address of the VCPU.
2. Injection/delivery/clearing of SIGP external calls via SIGP IF.

Case 2 in particular happens regularly, so this could produce two problems:
1. The guest losing/getting external calls.
2. Random memory overwrites in the host.

So this problem hits the 127th and 128th of every 128 created VMs, when
they have 64 VCPUs.
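
The arithmetic in the table above can be reproduced with a short sketch.
It assumes a 4096-byte page and the 2112-byte maximum SCA size given in
the text. next_sca_offset() is a hypothetical way to avoid the crossing
and is not claimed to be the code of the actual fix:

```c
#define SCA_PAGE_SIZE 4096u  /* assumed 4 KiB page */
#define SCA_MAX_SIZE  2112u  /* maximum SCA size stated above */

/* Bytes of the SCA that spill past the end of the page, 0 if it fits. */
unsigned int sca_bytes_out(unsigned int sca_offset)
{
	unsigned int end = sca_offset + SCA_MAX_SIZE;

	return end > SCA_PAGE_SIZE ? end - SCA_PAGE_SIZE : 0;
}

/* Hypothetical: step by 16 and wrap to 0 before the SCA would cross. */
unsigned int next_sca_offset(unsigned int sca_offset)
{
	sca_offset += 16;
	if (sca_offset + SCA_MAX_SIZE > SCA_PAGE_SIZE)
		sca_offset = 0;
	return sca_offset;
}
```

With these definitions, offset 0x7c0 is the last one that still fits, and
0x7d0, 0x7e0 and 0x7f0 spill 16, 32 and 48 bytes respectively.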

Acked-by: Christian Borntraeger 
Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
Signed-off-by: Greg Kroah-Hartman

arm64: page-align sections for DEBUG_RODATA

2015-12-09T19:03:22+00:00

commit cb083816ab5ac3d10a9417527f07fc5962cc3808 upstream.

A kernel built with DEBUG_RODATA && !CONFIG_DEBUG_ALIGN_RODATA doesn't
have .text aligned to a page boundary, though fixup_executable works at
page-granularity thanks to its use of create_mapping. If .text is not
page-aligned, the first page it exists in may be marked non-executable,
leading to failures when an attempt is made to execute code in said
page.

This patch upgrades ALIGN_DEBUG_RO and ALIGN_DEBUG_RO_MIN to force page
alignment for DEBUG_RODATA && !CONFIG_DEBUG_ALIGN_RODATA kernels,
ensuring that all sections with specific RWX permission requirements are
mapped with the correct permissions.
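
As a rough illustration of the page-granularity problem, the sketch below
rounds addresses to page boundaries and reports whether a section start
shares its first page with whatever precedes it. It assumes 4 KiB pages.
All names are hypothetical, and the real alignment is done in the linker
script, not in C:

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Round an address up to the next page boundary. */
uintptr_t page_align_up(uintptr_t addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) &
	       ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
}

/* Round an address down to the start of its page. */
uintptr_t page_align_down(uintptr_t addr)
{
	return addr & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
}

/* Nonzero if the section start is not on a page boundary. */
int shares_first_page(uintptr_t section_start)
{
	return page_align_down(section_start) != section_start;
}
```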

Signed-off-by: Mark Rutland 
Reported-by: Jeremy Linton 
Reviewed-by: Laura Abbott 
Acked-by: Ard Biesheuvel 
Cc: Suzuki Poulose 
Cc: Will Deacon 
Fixes: da141706aea52c1a ("arm64: add better page protections to arm64")
Signed-off-by: Catalin Marinas 
Signed-off-by: Greg Kroah-Hartman

arm64: Fix compat register mappings

2015-12-09T19:03:22+00:00

commit 5accd17d0eb523350c9ef754d655e379c9bb93b3 upstream.

For reasons not entirely apparent, but now enshrined in history, the
architectural mapping of AArch32 banked registers to AArch64 registers
actually orders SP_<mode> and LR_<mode> backwards compared to the
intuitive r13/r14 order, for all modes except FIQ.

Fix the compat_<reg>_<mode> macros accordingly, in the hope of avoiding
subtle bugs with KVM and AArch32 guests.
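
For reference, a sketch of the ordering described above, written as
indices into the AArch64 general-purpose registers x0..x30. The enum
names are hypothetical. The index values reflect the architectural
mapping as I understand it and should be checked against the ARM
Architecture Reference Manual:

```c
/* Index of the AArch64 register backing each banked AArch32 register. */
enum compat_reg_index {
	COMPAT_LR_IRQ = 16, COMPAT_SP_IRQ = 17,
	COMPAT_LR_SVC = 18, COMPAT_SP_SVC = 19,
	COMPAT_LR_ABT = 20, COMPAT_SP_ABT = 21,
	COMPAT_LR_UND = 22, COMPAT_SP_UND = 23,
	/* FIQ is the exception: SP comes before LR, like r13/r14. */
	COMPAT_SP_FIQ = 29, COMPAT_LR_FIQ = 30,
};

/* Nonzero if LR is mapped below SP, i.e. "backwards" from r13/r14. */
int lr_mapped_before_sp(int lr_index, int sp_index)
{
	return lr_index < sp_index;
}
```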

Signed-off-by: Robin Murphy 
Acked-by: Will Deacon 
Signed-off-by: Catalin Marinas 
Signed-off-by: Greg Kroah-Hartman

x86/cpu: Fix SMAP check in PVOPS environments

2015-12-09T19:03:18+00:00

commit 581b7f158fe0383b492acd1ce3fb4e99d4e57808 upstream.

There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely.  Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at.  This may
have been true when initially implemented, but no longer is.

To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
build, but not consistent for all builds.  It has also been a sitting timebomb
since SMAP support was introduced.

Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.
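
A small model of the problem, assuming the standard x86 flag positions
(IF is bit 9, AC is bit 18). The two save_fl-style helpers are
hypothetical stand-ins that take the real flags as an argument and do
not touch the CPU:

```c
#define FLAGS_IF (1UL << 9)   /* Interrupt Flag */
#define FLAGS_AC (1UL << 18)  /* Alignment Check, used by SMAP */

/* Models a paravirt save_fl() that reports only the Interrupt Flag. */
unsigned long pv_style_save_fl(unsigned long real_flags)
{
	return real_flags & FLAGS_IF;
}

/* Models a native save_fl() that reports the full flags register. */
unsigned long native_style_save_fl(unsigned long real_flags)
{
	return real_flags;
}

/* The kind of test a SMAP sanity check needs to perform. */
int ac_is_set(unsigned long flags)
{
	return (flags & FLAGS_AC) != 0;
}
```

With AC set in the real flags, only the native-style value lets
ac_is_set() see it. The IF-only value always reports AC as clear.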

Signed-off-by: Andrew Cooper 
Reviewed-by: David Vrabel 
Tested-by: Rusty Russell 
Cc: Rusty Russell 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: Xen-devel 
Link: http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.cooper3@citrix.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

x86/cpu: Call verify_cpu() after having entered long mode too

2015-12-09T19:03:17+00:00

commit 04633df0c43d710e5f696b06539c100898678235 upstream.

When we get loaded by a 64-bit bootloader, the kernel entry point is
startup_64 in head_64.S. We don't trust any and all bootloaders because
some will fiddle with CPU configuration so we go ahead and massage each
CPU into sanity again.

For example, some Dell BIOSes have this XD disable feature which sets
IA32_MISC_ENABLE[34] and disables NX. This might be some dumb workaround
for other OSes but Linux sure doesn't need it.

A similar thing is present in the Surface 3 firmware - see
https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit
only on the BSP:

  # rdmsr -a 0x1a0
  400850089
  850089
  850089
  850089

I know, right?!

There's not even an off switch in there.

So fix all those cases by sanitizing the 64-bit entry point too. For
that, make verify_cpu() callable in 64-bit mode also.
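
The rdmsr values quoted above can be decoded with a few lines of C. This
is only a sketch of the bit test. The helper names are hypothetical, and
the real sanitizing is done in early boot assembly, not like this:

```c
#include <stdint.h>

#define MISC_ENABLE_XD_DISABLE_BIT 34  /* IA32_MISC_ENABLE[34] */

/* Nonzero if the XD disable bit is set in an IA32_MISC_ENABLE value. */
int xd_disabled(uint64_t misc_enable)
{
	return (int)((misc_enable >> MISC_ENABLE_XD_DISABLE_BIT) & 1);
}

/* The value with the XD disable bit cleared, all other bits untouched. */
uint64_t clear_xd_disable(uint64_t misc_enable)
{
	return misc_enable & ~(1ULL << MISC_ENABLE_XD_DISABLE_BIT);
}
```

Applied to the dump above, 0x400850089 (the BSP) has the bit set and
0x850089 (the other CPUs) does not.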

Requested-and-debugged-by: "H. Peter Anvin" 
Reported-and-tested-by: Bastien Nocera 
Signed-off-by: Borislav Petkov 
Cc: Matt Fleming 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

x86/setup: Fix low identity map for >= 2GB kernel range

2015-12-09T19:03:17+00:00

commit 68accac392d859d24adcf1be3a90e41f978bd54c upstream.

The commit f5f3497cad8c extended the low identity mapping. However, if
the kernel uses more than 2 GB (VMSPLIT_2G_OPT or VMSPLIT_1G memory
split), the normal memory mapping is overwritten by the low identity
mapping, causing a crash. To avoid overwriting, limit the low identity
map to cover only memory before kernel range (PAGE_OFFSET).
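
A sketch of the limit, assuming a non-PAE 32-bit layout with 1024 page
directory entries of 4 MiB each. The helper names are hypothetical, and
the PAGE_OFFSET values in the note below are example inputs rather than
values taken from this commit:

```c
#include <stdint.h>

#define SKETCH_PGDIR_SHIFT  22     /* assumed: each entry maps 4 MiB */
#define SKETCH_PTRS_PER_PGD 1024u  /* assumed: non-PAE page directory */

/* First page directory index that belongs to the kernel range. */
uint32_t kernel_pgd_boundary(uint32_t page_offset)
{
	return page_offset >> SKETCH_PGDIR_SHIFT;
}

/* Number of page directory entries covering the kernel range. */
uint32_t kernel_pgd_ptrs(uint32_t page_offset)
{
	return SKETCH_PTRS_PER_PGD - kernel_pgd_boundary(page_offset);
}

/* Entries to copy to index 0 without running into the kernel range. */
uint32_t low_identity_entries(uint32_t page_offset)
{
	uint32_t boundary = kernel_pgd_boundary(page_offset);
	uint32_t ptrs = kernel_pgd_ptrs(page_offset);

	return ptrs < boundary ? ptrs : boundary;
}
```

For a PAGE_OFFSET of 0xC0000000 the kernel range is 256 entries and all of
them fit below index 768. For 0x40000000 the kernel range is 768 entries,
so an unclamped copy to index 0 would run over entries 256 and up.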

Fixes: f5f3497cad8c ("x86/setup: Extend low identity map to cover whole kernel range")
Signed-off-by: Krzysztof Mazur 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Laszlo Ersek 
Cc: Matt Fleming 
Cc: Paolo Bonzini 
Link: http://lkml.kernel.org/r/1446815916-22105-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

x86/setup: Extend low identity map to cover whole kernel range

2015-12-09T19:03:17+00:00

commit f5f3497cad8c8416a74b9aaceb127908755d020a upstream.

On 32-bit systems, the initial_page_table is reused by
efi_call_phys_prolog as an identity map to call
SetVirtualAddressMap.  efi_call_phys_prolog takes care of
converting the current CPU's GDT to a physical address too.

For PAE kernels the identity mapping is achieved by aliasing the
first PDPE for the kernel memory mapping into the first PDPE
of initial_page_table.  This makes the EFI stub's trick "just work".

However, for non-PAE kernels there is no guarantee that the identity
mapping in the initial_page_table extends as far as the GDT; in this
case, accesses to the GDT will cause a page fault (which quickly becomes
a triple fault).  Fix this by copying the kernel mappings from
swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at
identity mapping.

For some reason, this is only reproducible with QEMU's dynamic translation
mode, and not for example with KVM.  However, even under KVM one can clearly
see that the page table is bogus:

    $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize
    $ gdb
    (gdb) target remote localhost:1234
    (gdb) hb *0x02858f6f
    Hardware assisted breakpoint 1 at 0x2858f6f
    (gdb) c
    Continuing.

    Breakpoint 1, 0x02858f6f in ?? ()
    (gdb) monitor info registers
    ...
    GDT=     0724e000 000000ff
    IDT=     fffbb000 000007ff
    CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690
    ...

The page directory is sane:

    (gdb) x/4wx 0x32b7000
    0x32b7000:	0x03398063	0x03399063	0x0339a063	0x0339b063
    (gdb) x/4wx 0x3398000
    0x3398000:	0x00000163	0x00001163	0x00002163	0x00003163
    (gdb) x/4wx 0x3399000
    0x3399000:	0x00400003	0x00401003	0x00402003	0x00403003

but our particular page directory entry is empty:

    (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4
    0x32b7070:	0x00000000
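
The address computed in the last gdb command can be reproduced in C. The
sketch assumes non-PAE 32-bit paging (4-byte entries, one entry per
4 MiB). pde_address() and pde_present() are hypothetical helper names:

```c
#include <stdint.h>

/* Address of the page directory entry that maps vaddr. */
uint32_t pde_address(uint32_t cr3, uint32_t vaddr)
{
	return (cr3 & 0xfffff000u) + (vaddr >> 22) * 4u;
}

/* Bit 0 of a page directory entry is the Present bit. */
int pde_present(uint32_t pde)
{
	return (int)(pde & 1u);
}
```

With CR3=032b7000 and the GDT base 0724e000 from the register dump,
pde_address() yields 0x32b7070, and the value read there (0x00000000) is
not present.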

[ It appears that you can skate past this issue if you don't receive
  any interrupts while the bogus GDT pointer is loaded, or if you avoid
  reloading the segment registers in general.

  Andy Lutomirski provides some additional insight:

   "AFAICT it's entirely permissible for the GDTR and/or LDT
    descriptor to point to unmapped memory.  Any attempt to use them
    (segment loads, interrupts, IRET, etc) will try to access that memory
    as if the access came from CPL 0 and, if the access fails, will
    generate a valid page fault with CR2 pointing into the GDT or
    LDT."

  Up until commit 23a0d4e8fa6d ("efi: Disable interrupts around EFI
  calls, not in the epilog/prolog calls") interrupts were disabled
  around the prolog and epilog calls, and the functional GDT was
  re-installed before interrupts were re-enabled.

  Which explains why no one has hit this issue until now. ]

Signed-off-by: Paolo Bonzini 
Reported-by: Laszlo Ersek 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Signed-off-by: Matt Fleming 
[ Updated changelog. ]
Signed-off-by: Greg Kroah-Hartman