linux-toradex.git/arch/x86/include, branch v3.2.63

x86_64/entry/xen: Do not invoke espfix64 on Xen

2014-09-13T22:41:51+00:00

commit 7209a75d2009dbf7745e2fd354abf25c3deb3ca3 upstream.

This moves the espfix64 logic into native_iret.  To make this work,
it gets rid of the native patch for INTERRUPT_RETURN:
INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.

This changes the 16-bit SS behavior on Xen from OOPSing to leaking
some bits of the Xen hypervisor's RSP (I think).

[ hpa: this is a nonzero cost on native, but probably not enough to
  measure. Xen needs to fix this in their own code, probably doing
  something equivalent to espfix64. ]

Signed-off-by: Andy Lutomirski 
Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ben Hutchings

x86, espfix: Fix broken header guard

2014-09-13T22:41:51+00:00

commit 20b68535cd27183ebd3651ff313afb2b97dac941 upstream.

Header guard is #ifndef, not #ifdef...

Reported-by: Fengguang Wu 
Signed-off-by: H. Peter Anvin 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ben Hutchings
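
For illustration only, the header-guard fix above can be sketched as follows. The guard name EXAMPLE_ESPFIX_H and the declarations inside it are hypothetical, not the ones used by the kernel header. The guarded text is repeated twice to simulate a double #include in one translation unit: with #ifndef the second copy is skipped, whereas with #ifdef the first copy would be skipped and the contents never declared.

```c
/* First inclusion: EXAMPLE_ESPFIX_H is not defined yet, so the body is compiled. */
#ifndef EXAMPLE_ESPFIX_H
#define EXAMPLE_ESPFIX_H

struct example_state {
	int ready;
};

static inline int example_is_ready(const struct example_state *s)
{
	return s->ready;
}

#endif /* EXAMPLE_ESPFIX_H */

/*
 * Second inclusion, simulated by repeating the same text: the macro is now
 * defined, so everything up to the #endif is skipped and the struct and
 * function are not redefined.
 */
#ifndef EXAMPLE_ESPFIX_H
#define EXAMPLE_ESPFIX_H

struct example_state {
	int ready;
};

static inline int example_is_ready(const struct example_state *s)
{
	return s->ready;
}

#endif /* EXAMPLE_ESPFIX_H */
```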

x86, espfix: Move espfix definitions into a separate header file

2014-09-13T22:41:51+00:00

commit e1fe9ed8d2a4937510d0d60e20705035c2609aea upstream.

Sparse warns that the percpu variables aren't declared before they are
defined.  Rather than hacking around it, move espfix definitions into
a proper header file.

Reported-by: Fengguang Wu 
Signed-off-by: H. Peter Anvin 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ben Hutchings

x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack

2014-09-13T22:41:51+00:00

commit 3891a04aafd668686239349ea58f3314ea2af86b upstream.

The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer.  This
causes some 16-bit software to break, but it also leaks kernel state
to user space.  We have a software workaround for that ("espfix") for
the 32-bit kernel, but it relies on a nonzero stack segment base which
is not available in 64-bit mode.

In checkin:

    b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
the logic that 16-bit support is crippled on 64-bit kernels anyway (no
V86 support), but it turns out that people are doing stuff like
running old Win16 binaries under Wine and expect it to work.

This patch works around the problem by creating percpu "ministacks", each of which
is mapped 2^16 times 64K apart.  When we detect that the return SS is
on the LDT, we copy the IRET frame to the ministack and use the
relevant alias to return to userspace.  The ministacks are mapped
readonly, so if IRET faults we promote #GP to #DF which is an IST
vector and thus has its own stack; we then do the fixup in the #DF
handler.

(Making #GP an IST exception would make the msr_safe functions unsafe
in NMI/MC context, and quite possibly have other effects.)

Special thanks to:

- Andy Lutomirski, for the suggestion of using very small stack slots
  and copy (as opposed to map) the IRET frame there, and for the
  suggestion to mark them readonly and let the fault promote to #DF.
- Konrad Wilk for paravirt fixup and testing.
- Borislav Petkov for testing help and useful comments.

Reported-by: Brian Gerst 
Signed-off-by: H. Peter Anvin 
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Cc: Konrad Rzeszutek Wilk 
Cc: Borislav Petkov 
Cc: Andrew Lutomirski
Cc: Linus Torvalds 
Cc: Dirk Hohndel 
Cc: Arjan van de Ven 
Cc: comex 
Cc: Alexander van Heukelum 
Cc: Boris Ostrovsky 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ben Hutchings
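
The arithmetic behind the leak and the aliasing trick described above can be modelled in user space. This is a simplified sketch, not the kernel implementation: the function names and the base address used in the examples are hypothetical, and it assumes the ministack base has bits 31:16 clear so that each of the 2^16 aliases (64K apart) differs only in those bits.

```c
#include <stdint.h>

/*
 * Model of IRET to a 16-bit stack segment: only bits 15:0 of the stack
 * pointer come from the IRET frame; bits 31:16 keep whatever the kernel
 * stack pointer held at the time.
 */
static uint32_t esp_after_iret16(uint64_t kernel_rsp, uint32_t user_esp)
{
	return ((uint32_t)kernel_rsp & 0xffff0000u) | (user_esp & 0x0000ffffu);
}

/*
 * Pick the alias of the ministack whose bits 31:16 equal the user's own
 * bits 31:16. Alias n lives at base + n * 64K, for n in 0..65535.
 * 'ministack_base' is a hypothetical address with bits 31:16 clear.
 */
static uint64_t ministack_alias(uint64_t ministack_base, uint32_t user_esp)
{
	return ministack_base + ((uint64_t)(user_esp >> 16) << 16);
}
```

Returning from an ordinary kernel stack, bits 31:16 of the kernel pointer end up in the user's %esp. Returning from the matching alias, those bits are the user's own, so nothing leaks.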

KVM: x86: preserve the high 32-bits of the PAT register

2014-08-06T17:07:32+00:00

commit 7cb060a91c0efc5ff94f83c6df3ed705e143cdb9 upstream.

KVM does not really do much with the PAT, so this went unnoticed for a
long time.  It is exposed however if you try to do rdmsr on the PAT
register.

Reported-by: Valentine Sinitsyn 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Ben Hutchings
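
A minimal sketch of the kind of truncation this commit title describes, assuming the high half was lost because the 64-bit MSR value was held in a 32-bit variable. The function names are hypothetical; the value used in the usage note is the architectural power-on default of the PAT MSR.

```c
#include <stdint.h>

/* Store the MSR value in a 32-bit slot and read it back: bits 63:32 are lost. */
static uint64_t pat_roundtrip_32(uint64_t msr_value)
{
	uint32_t stored = (uint32_t)msr_value;

	return stored;
}

/* Store the MSR value in a 64-bit slot and read it back: nothing is lost. */
static uint64_t pat_roundtrip_64(uint64_t msr_value)
{
	uint64_t stored = msr_value;

	return stored;
}
```

With the default PAT value 0x0007040600070406, the 32-bit round trip yields 0x00070406, while the 64-bit round trip returns the value unchanged, which is what a guest doing rdmsr expects.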

KVM: x86: Increase the number of fixed MTRR regs to 10

2014-08-06T17:07:31+00:00

commit 682367c494869008eb89ef733f196e99415ae862 upstream.

Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
sometimes make assumptions on CPUs while they ignore capability MSRs, it is
better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
actually supported has no functional implications.

Signed-off-by: Nadav Amit 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Ben Hutchings

ptrace,x86: force IRET path after a ptrace_stop()

2014-07-11T12:33:59+00:00

commit b9cd18de4db3c9ffa7e17b0dc0ca99ed5aa4d43a upstream.

The 'sysret' fastpath does not correctly restore even all regular
registers, much less any segment registers or rflags values.  That is
very much part of why it's faster than 'iret'.

Normally that isn't a problem, because the normal ptrace() interface
catches the process using the signal handler infrastructure, which
always returns with an iret.

However, some paths can get caught using ptrace_event() instead of the
signal path, and for those we need to make sure that we aren't going to
return to user space using 'sysret'.  Otherwise the modifications that
may have been done to the register set by the tracer wouldn't
necessarily take effect.

Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
arch_ptrace_stop_needed() which is invoked from ptrace_stop().

Signed-off-by: Tejun Heo 
Reported-by: Andy Lutomirski 
Acked-by: Oleg Nesterov 
Suggested-by: Linus Torvalds 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings
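
The effect of the fix above can be modelled with a toy flag word. This is a user-space sketch under stated assumptions, not kernel code: the bit position, the enum, and both function names are hypothetical, and the model only captures the idea that pending "notify resume" work diverts the return to user space away from the sysret fastpath.

```c
/* Hypothetical bit standing in for TIF_NOTIFY_RESUME. */
#define MODEL_TIF_NOTIFY_RESUME (1u << 1)

enum return_path {
	RETURN_SYSRET,	/* fastpath: does not restore the full register set */
	RETURN_IRET	/* slow path: restores everything from the saved frame */
};

/* Models the hook invoked from ptrace_stop(): mark resume work as pending. */
static unsigned int flags_after_ptrace_stop(unsigned int thread_flags)
{
	return thread_flags | MODEL_TIF_NOTIFY_RESUME;
}

/* Models the exit path: any pending work forces the IRET path. */
static enum return_path path_for(unsigned int thread_flags)
{
	return (thread_flags & MODEL_TIF_NOTIFY_RESUME) ? RETURN_IRET
							: RETURN_SYSRET;
}
```

A thread with no flags set would take the fastpath; once the flag is set at the ptrace stop, register changes made by the tracer are picked up on the IRET path.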

x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()

2014-06-09T12:29:11+00:00

commit 9844f5462392b53824e8b86726e7c33b5ecbb676 upstream.

The invalidation is required in order to maintain proper semantics
under CoW conditions. In scenarios where a process clones several
threads, a thread operating on a core whose DTLB entry for a
particular hugepage has not been invalidated, will be reading from
the hugepage that belongs to the forked child process, even after
hugetlb_cow().

The thread will not see the updated page as long as the stale DTLB
entry remains cached, that is, until the thread attempts to write
into the page, the child process exits, or the thread gets migrated
to a different processor.

Signed-off-by: Anthony Iliopoulos 
Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp
Suggested-by: Shay Goikhman 
Acked-by: Dave Hansen 
Signed-off-by: H. Peter Anvin 
Signed-off-by: Ben Hutchings

x86: Add check for number of available vectors before CPU down

2014-04-01T23:58:43+00:00

commit da6139e49c7cb0f4251265cb5243b8d220adb48d upstream.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791

When a cpu is downed on a system, the irqs on the cpu are assigned to
other cpus.  It is possible, however, that when a cpu is downed there
aren't enough free vectors on the remaining cpus to account for the
vectors from the cpu that is being downed.

This results in an interesting "overflow" condition where irqs are
"assigned" to a CPU but are not handled.

For example, when downing cpus on a 1-64 logical processor system:


[  232.021745] smpboot: CPU 61 is now offline
[  238.480275] smpboot: CPU 62 is now offline
[  245.991080] ------------[ cut here ]------------
[  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x246/0x250()
[  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
[  246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas scsi_transport_sas
[  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
[  246.044451] Hardware name: Intel Corporation S4600LH ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[  246.057371]  0000000000000009 ffff88081fa03d40 ffffffff8164fbf6 ffff88081fa0ee48
[  246.065728]  ffff88081fa03d90 ffff88081fa03d80 ffffffff81054ecc ffff88081fa13040
[  246.074073]  0000000000000000 ffff88200cce0000 0000000000000040 0000000000000000
[  246.082430] Call Trace:
[  246.085174]    [] dump_stack+0x46/0x58
[  246.091633]  [] warn_slowpath_common+0x8c/0xc0
[  246.098352]  [] warn_slowpath_fmt+0x46/0x50
[  246.104786]  [] dev_watchdog+0x246/0x250
[  246.110923]  [] ? dev_deactivate_queue.constprop.31+0x80/0x80
[  246.119097]  [] call_timer_fn+0x3a/0x110
[  246.125224]  [] ? update_process_times+0x6f/0x80
[  246.132137]  [] ? dev_deactivate_queue.constprop.31+0x80/0x80
[  246.140308]  [] run_timer_softirq+0x1f0/0x2a0
[  246.146933]  [] __do_softirq+0xe0/0x220
[  246.152976]  [] call_softirq+0x1c/0x30
[  246.158920]  [] do_softirq+0x55/0x90
[  246.164670]  [] irq_exit+0xa5/0xb0
[  246.170227]  [] smp_apic_timer_interrupt+0x4a/0x60
[  246.177324]  [] apic_timer_interrupt+0x6a/0x70
[  246.184041]    [] ? cpuidle_enter_state+0x5b/0xe0
[  246.191559]  [] ? cpuidle_enter_state+0x57/0xe0
[  246.198374]  [] cpuidle_idle_call+0xbd/0x200
[  246.204900]  [] arch_cpu_idle+0xe/0x30
[  246.210846]  [] cpu_startup_entry+0xd0/0x250
[  246.217371]  [] rest_init+0x77/0x80
[  246.223028]  [] start_kernel+0x3ee/0x3fb
[  246.229165]  [] ? repair_env_string+0x5e/0x5e
[  246.235787]  [] x86_64_start_reservations+0x2a/0x2c
[  246.242990]  [] x86_64_start_kernel+0xf8/0xfc
[  246.249610] ---[ end trace fb74fdef54d79039 ]---
[  246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout
[  246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter
Last login: Mon Nov 11 08:35:14 from 10.18.17.119
[root@(none) ~]# [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
[  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
[  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX

(last lines keep repeating.  ixgbe driver is dead until module reload.)

If the downed cpu has more vectors than are free on the remaining cpus on the
system, it is possible that some vectors are "orphaned" even though they are
assigned to a cpu.  In this case, since the ixgbe driver had a watchdog, the
watchdog fired and notified that something was wrong.

This patch adds a function, check_vectors(), that compares the number of
vectors on the CPU going down with the number of vectors available on the
rest of the system.  If there aren't enough vectors for the CPU to go down,
an error is returned and propagated back to userspace.

v2: Do not need to look at percpu irqs
v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
    Priority Mode
v4: Additional changes suggested by Gong Chen.
v5/v6/v7/v8: Updated comment text

Signed-off-by: Prarit Bhargava 
Link: http://lkml.kernel.org/r/1389613861-3853-1-git-send-email-prarit@redhat.com
Reviewed-by: Gong Chen 
Cc: Andi Kleen 
Cc: Michel Lespinasse 
Cc: Seiji Aguchi 
Cc: Yang Zhang 
Cc: Paul Gortmaker 
Cc: Janet Morgan 
Cc: Tony Luck 
Cc: Ruiv Wang 
Cc: Gong Chen 
Signed-off-by: H. Peter Anvin 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings
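
The comparison performed before taking a CPU down can be sketched as below. This is a simplified model, not the kernel function: the name, the parameters, and the choice of -EBUSY as the error code are assumptions made for illustration, and the per-CPU free-vector counts are supplied by the caller rather than derived from real IRQ state.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Return 0 if the vectors in use on the CPU going down fit into the free
 * vectors of the remaining online CPUs, or a negative errno otherwise.
 */
static int check_vectors_sketch(unsigned int vectors_on_dying_cpu,
				const unsigned int *free_vectors,
				size_t nr_remaining_cpus)
{
	unsigned int available = 0;
	size_t i;

	/* Sum the free vectors across every CPU that stays online. */
	for (i = 0; i < nr_remaining_cpus; i++)
		available += free_vectors[i];

	/* Refuse the offline request rather than orphan some IRQs. */
	if (vectors_on_dying_cpu > available)
		return -EBUSY;

	return 0;
}
```

With two remaining CPUs offering 4 and 3 free vectors, a CPU holding 7 vectors may go down, while one holding 10 is refused and the error reaches the caller.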

compiler/gcc4: Add quirk for 'asm goto' miscompilation bug

2013-11-28T14:02:03+00:00

commit 3f0116c3238a96bc18ad4b4acefe4e7be32fa861 upstream.

Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down
a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto'
constructs, as outlined here:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670

Implement a workaround suggested by Jakub Jelinek.

Reported-and-tested-by: Fengguang Wu 
Reported-by: Oleg Nesterov 
Reported-by: Peter Zijlstra 
Suggested-by: Jakub Jelinek 
Reviewed-by: Richard Henderson 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.2:
 - Drop inapplicable changes
 - Adjust context]
Signed-off-by: Ben Hutchings
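
The commit message does not spell out the workaround. One common form, shown here as a sketch rather than as the exact text of the patch, wraps each 'asm goto' in a macro that follows it with an empty asm statement acting as a compiler barrier. The macro definition and the demo function are illustrative and assume a GCC-compatible compiler with 'asm goto' support.

```c
/*
 * Follow every asm goto with an empty asm statement. The empty statement
 * emits no instructions; it only constrains how the compiler may schedule
 * code around the asm goto.
 */
#define asm_volatile_goto(...) \
	do { __asm__ goto(__VA_ARGS__); __asm__ (""); } while (0)

/*
 * Demo: the asm body is empty, so control never jumps to 'taken' at run
 * time and the function falls through to the first return.
 */
static int branch_taken(void)
{
	asm_volatile_goto("" : : : : taken);
	return 0;
taken:
	return 1;
}
```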