Age | Commit message (Collapse) | Author |
|
commit 086e8ced65d9bcc4a8e8f1cd39b09640f2883f90 upstream.
In x2apic mode, we need to set the upper address register of the fault
handling interrupt register of the vt-d hardware. Without this
irq migration of the vt-d fault handling interrupt is broken.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <1291225233.2648.39.camel@sbsiddha-MOBL3>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Tested-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 7f7fbf45c6b748074546f7f16b9488ca71de99c1 upstream.
Interrupt-remapping gets enabled very early in the boot, as it determines the
apic mode that the processor can use. And the current code enables the vt-d
fault handling before the setup_local_APIC(). And hence the APIC LDR registers
and data structure in the memory may not be initialized. So the vt-d fault
handling in logical xapic/x2apic modes were broken.
Fix this by enabling the vt-d fault handling in the end_local_APIC_setup()
A cleaner fix of enabling fault handling while enabling intr-remapping
will be addressed for v2.6.38. [ Enabling intr-remapping determines the
usage of x2apic mode and the apic mode determines the fault-handling
configuration. ]
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <20101201062244.541996375@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit de2a8cf98ecdde25231d6c5e7901e2cffaf32af9 upstream.
The vdso Makefile passes linker-style -m options not to the linker but
to gcc. This happens to work with earlier gcc, but fails with gcc
4.6. Pass gcc-style -m options, instead.
Note: all currently supported versions of gcc supports -m32, so there
is no reason to conditionalize it any more.
Reported-by: H. J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <tip-*@git.kernel.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit ce5f68246bf2385d6174856708d0b746dc378f20 upstream.
When we're using MWAIT for play_dead, explicitly CLFLUSH the cache
line before executing MONITOR. This is a potential workaround for the
Xeon 7400 erratum AAI65 after having a spurious wakeup and returning
around the loop. "Potential" here because it is not certain that that
erratum could actually trigger; however, the CLFLUSH should be
harmless.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Venkatesh Pallipadi <venki@google.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.kernel.org>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit a68e5c94f7d3dd64fef34dd5d97e365cae4bb42a upstream.
On processors with hyperthreading, when only one thread is offlined
the other thread can cause a spurious wakeup on the idled thread. We
do not want to re-WBINVD when that happens.
Ideally, we should simply skip WBINVD unless we're the last thread on
a particular core to shut down, but there might be similar issues
elsewhere in the system.
Thus, revert to previous behavior of only WBINVD outside the loop.
Partly as a result, remove the mb()'s around it: they are not
necessary since wbinvd() is a serializing instruction, but they were
intended to make sure the compiler didn't do any funny loop
optimizations.
Reported-by: Asit Mallick <asit.k.mallick@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.kernel.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <tip-ea53069231f9317062910d6e772cca4ce93de8c8@git.kernel.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit ea53069231f9317062910d6e772cca4ce93de8c8 upstream.
The code in native_play_dead() has a number of problems:
1. We should use MWAIT when available, to put ourselves into a deeper
sleep state.
2. We use the existence of CLFLUSH to determine if WBINVD is safe, but
that is totally bogus -- WBINVD is 486+, whereas CLFLUSH is a much
later addition.
3. We should do WBINVD inside the loop, just in case of something like
setting an A bit on page tables. Pointed out by Arjan van de Ven.
This code is based in part of a previous patch by Venki Pallipadi, but
unlike that patch this one keeps all the detection code local instead
of pre-caching a bunch of information. We're shutting down the CPU;
there is absolutely no hurry.
This patch moves all the code to C and deletes the global
wbinvd_halt() which is broken anyway.
Originally-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090522232230.162239000@intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit bc83cccc761953f878088cdfa682de0970b5561f upstream.
We have MWAIT constants spread across three different .c files, for no
good reason. Move them all into a common header file.
[PG: required for cherry pick of ce5f68246b - to avoid dup. mwait
fields in smpboot.c; drop intel_idle.c chunk, as 34 doesnt have it]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <tip-*@git.kernel.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit e8129c642155616d9e2160a75f103e127c8c3708 upstream.
On each machine check all registers are revalidated. The save area for
the clock comparator however only contains the upper most seven bytes
of the former contents, if valid.
Therefore the machine check handler uses a store clock instruction to
get the current time and writes that to the clock comparator register
which in turn will generate an immediate timer interrupt.
However within the lowcore the expected time of the next timer
interrupt is stored. If the interrupt happens before that time the
handler won't be called. In turn the clock comparator won't be
reprogrammed and therefore the interrupt condition stays pending which
causes an interrupt loop until the expected time is reached.
On NOHZ machines this can result in unresponsive machines since the
time of the next expected interrupted can be a couple of days in the
future.
To fix this just revalidate the clock comparator register with the
expected value.
In addition the special handling for udelay must be changed as well.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 8848a91068c018bc91f597038a0f41462a0f88a4 upstream.
Fix dummy inline stubs for trampoline-related functions when no
trampolines exist (until we get rid of the no-trampoline case
entirely.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <4C6C294D.3030404@zytor.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit b7d460897739e02f186425b7276e3fdb1595cea7 upstream.
rc2 kernel crashes when booting second cpu on this CONFIG_VMSPLIT_2G_OPT
laptop: whereas cloning from kernel to low mappings pgd range does need
to limit by both KERNEL_PGD_PTRS and KERNEL_PGD_BOUNDARY, cloning kernel
pgd range itself must not be limited by the smaller KERNEL_PGD_BOUNDARY.
Signed-off-by: Hugh Dickins <hughd@google.com>
LKML-Reference: <alpine.LSU.2.00.1008242235120.2515@sister.anvils>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit fd89a137924e0710078c3ae855e7cec1c43cb845 upstream.
This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:
1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.
2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.
3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.
4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET to be established
speculatively. These TLB entries are marked global and large.
5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.
6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.
7. Due to this permission mismatch a new 4kb, user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.
There are two ways to fix this issue:
1. Always do a global TLB flush when a new cr3 is loaded and the
old page table was swapper_pg_dir. I consider this a hack hard
to understand and with performance implications
2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
does.
This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.
-v2: switch to swapper_pg_dir right after entering start_secondary() so
that we are able to access percpu data which might not be mapped in the
trampoline page table.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20100816123833.GB28147@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 36ac4b987bea9a95217e1af552252f275ca7fc44 upstream.
Fix calculation of "max_pnode" for systems where the the highest
blade has neither cpus or memory. (And, yes, although rare this
does occur).
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100910150808.GA19802@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 2acebe9ecb2b77876e87a1480729cfb2db4570dd upstream.
SGI:UV: Delete extra boot messages that describe the system
topology. These messages are no longer useful.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100317154038.GA29346@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit c27852597829128a9c9d96d79ec454a83c6b0da5 upstream.
Explicitly clear the "in-syscall" bit when we have no signal
handler and back up the program counters to back up the system
call.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 392c21802ee3aa85cee0e703105f797a8a7b9416 upstream.
Don't invoke the signal handler tracehook in that situation
either.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 05c5e7698bdc54b3079a3517d86077f49ebcc788 upstream.
If another cpu does a very wide munmap() on the signal frame area,
it can tear down the page table hierarchy from underneath us.
Borrow an idea from the 64-bit fault path's get_user_insn(), and
disable cross call interrupts during the page table traversal
to lock them in place while we operate.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 0e91ec0c06d2cd15071a6021c94840a50e6671aa upstream.
The find_next_bit, find_first_bit, find_next_zero_bit
and find_first_zero_bit functions were not properly
clamping to the maxbit argument at the bit level. They
were instead only checking maxbit at the byte level.
To fix this, add a compare and a conditional move
instruction to the end of the common bit-within-the-
byte code used by all the functions and be sure not to
clobber the maxbit argument before it is used.
Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: James Jones <jajones@nvidia.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 1142b71d85894dcff1466dd6c871ea3c89e0352c upstream.
Commit 8b592783 added a Thumb-2 variant of usracc which, when it is
called with \rept=2, calls usraccoff once with an offset of 0 and
secondly with a hard-coded offset of 4 in order to avoid incrementing
the pointer again. If \inc != 4 then we will store the data to the wrong
offset from \ptr. Luckily, the only caller that passes \rept=2 to this
function is __clear_user so we haven't been actively corrupting user data.
This patch fixes usracc to pass \inc instead of #4 to usraccoff
when it is called a second time.
Reported-by: Tony Thompson <tony.thompson@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 69e83dad5207f8f03c9699e57e1febb114383cb8 upstream.
Disable the winch irq early to make sure we don't take an interrupt part
way through the freeing of the handler data, resulting in a crash on
shutdown:
winch_interrupt : read failed, errno = 9
fd 13 is losing SIGWINCH support
------------[ cut here ]------------
WARNING: at lib/list_debug.c:48 list_del+0xc6/0x100()
list_del corruption, next is LIST_POISON1 (00100100)
082578c8: [<081fd77f>] dump_stack+0x22/0x24
082578e0: [<0807a18a>] warn_slowpath_common+0x5a/0x80
08257908: [<0807a23e>] warn_slowpath_fmt+0x2e/0x30
08257920: [<08172196>] list_del+0xc6/0x100
08257940: [<08060244>] free_winch+0x14/0x80
08257958: [<080606fb>] winch_interrupt+0xdb/0xe0
08257978: [<080a65b5>] handle_IRQ_event+0x35/0xe0
08257998: [<080a8717>] handle_edge_irq+0xb7/0x170
082579bc: [<08059bc4>] do_IRQ+0x34/0x50
082579d4: [<08059e1b>] sigio_handler+0x5b/0x80
082579ec: [<0806a374>] sig_handler_common+0x44/0xb0
08257a68: [<0806a538>] sig_handler+0x38/0x50
08257a78: [<0806a77c>] handle_signal+0x5c/0xa0
08257a9c: [<0806be28>] hard_handler+0x18/0x20
08257aac: [<00c14400>] 0xc14400
Signed-off-by: Will Newton <will.newton@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit dab5fff14df2cd16eb1ad4c02e83915e1063fece upstream.
We didn't free per_cpu(acfreq_data, cpu)->freq_table
when acpi_freq driver is unloaded.
Resulting in the following messages in /sys/kernel/debug/kmemleak:
unreferenced object 0xf6450e80 (size 64):
comm "modprobe", pid 1066, jiffies 4294677317 (age 19290.453s)
hex dump (first 32 bytes):
00 00 00 00 e8 a2 24 00 01 00 00 00 00 9f 24 00 ......$.......$.
02 00 00 00 00 6a 18 00 03 00 00 00 00 35 0c 00 .....j.......5..
backtrace:
[<c123ba97>] kmemleak_alloc+0x27/0x50
[<c109f96f>] __kmalloc+0xcf/0x110
[<f9da97ee>] acpi_cpufreq_cpu_init+0x1ee/0x4e4 [acpi_cpufreq]
[<c11cd8d2>] cpufreq_add_dev+0x142/0x3a0
[<c11920b7>] sysdev_driver_register+0x97/0x110
[<c11cce56>] cpufreq_register_driver+0x86/0x140
[<f9dad080>] 0xf9dad080
[<c1001130>] do_one_initcall+0x30/0x160
[<c10626e9>] sys_init_module+0x99/0x1e0
[<c1002d97>] sysenter_do_call+0x12/0x26
[<ffffffff>] 0xffffffff
https://bugzilla.kernel.org/show_bug.cgi?id=15807#c21
Tested-by: Toralf Forster <toralf.foerster@gmx.de>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit c8770e7ba63bb5dd8fe5f9d251275a8fa717fb78 upstream.
We now use load_gs_index() to load gs safely; unfortunately this also
changes MSR_KERNEL_GS_BASE, which we managed separately. This resulted
in confusion and breakage running 32-bit host userspace on a 64-bit kernel.
Fix by
- saving guest MSR_KERNEL_GS_BASE before we we reload the host's gs
- doing the host save/load unconditionally, instead of only when in guest
long mode
Things can be cleaned up further, but this is the minmal fix for now.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 97e69aa62f8b5d338d6cff49be09e37cc1262838 upstream.
Structures kvm_vcpu_events, kvm_debugregs, kvm_pit_state2 and
kvm_clock_data are copied to userland with some padding and reserved
fields unitialized. It leads to leaking of contents of kernel stack
memory. We have to initialize them to zero.
In patch v1 Jan Kiszka suggested to fill reserved fields with zeros
instead of memset'ting the whole struct. It makes sense as these
fields are explicitly marked as padding. No more fields need zeroing.
[PG: 34 doesn't have the dbgregs section to be memset, so drop that bit]
KVM-Stable-Tag.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 31e323cca9d5c8afd372976c35a5d46192f540d1 upstream.
Xen will shoot all the VCPUs when we do a shutdown hypercall, so there's
no need to do it manually.
In any case it will fail because all the IPI irqs have been pulled
down by this point, so the cross-CPU calls will simply hang forever.
Until change 76fac077db6b34e2c6383a7b4f3f4f7b7d06d8ce the function calls
were not synchronously waited for, so this wasn't apparent. However after
that change the calls became synchronous leading to a hang on shutdown
on multi-VCPU guests.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Alok Kataria <akataria@vmware.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 482db6df1746c4fa7d64a2441d4cb2610249c679 upstream.
This fixes a issue which was introduced by fe2cc53e ("uml: track and make
up lost ticks").
timeval_to_ns() returns long long and not int. Due to that UML's timer
did not work properlt and caused timer freezes.
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 6915e04f8847bea16d0890f559694ad8eedd026c upstream.
The linker script cleanup that I did in commit 5d150a97f93 ("um: Clean up
linker script using standard macros.") (2.6.32) accidentally introduced an
ALIGN(PAGE_SIZE) when converting to use INIT_TEXT_SECTION; Richard
Weinberger reported that this causes the kernel to segfault with
CONFIG_STATIC_LINK=y.
I'm not certain why this extra alignment is a problem, but it seems likely
it is because previously
__init_begin = _stext = _text = _sinittext
and with the extra ALIGN(PAGE_SIZE), _sinittext becomes different from the
rest. So there is likely a bug here where something is assuming that
_sinittext is the same as one of those other symbols. But reverting the
accidental change fixes the regression, so it seems worth committing that
now.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Reported-by: Richard Weinberger <richard@nod.at>
Cc: Jeff Dike <jdike@addtoit.com>
Tested by: Antoine Martin <antoine@nagafix.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit b843e4ec01991a386a9e0e9030703524446e03da upstream.
When running make headers_install_all on x86_64 and make 3.82 I hit this:
arch/microblaze/Makefile:80: *** mixed implicit and normal rules. Stop.
make: *** [headers_install_all] Error 2
So split the rules to satisfy make 3.82.
Signed-off-by: Thomas Backlund <tmb@mandriva.org>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 9581d442b9058d3699b4be568b6e5eae38a41493 upstream.
kvm reloads the host's fs and gs blindly, however the underlying segment
descriptors may be invalid due to the user modifying the ldt after loading
them.
Fix by using the safe accessors (loadsegment() and load_gs_index()) instead
of home grown unsafe versions.
This is CVE-2010-3698.
KVM-Stable-Tag.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 47008cd887c1836bcadda123ba73e1863de7a6c4 upstream.
The VMCB is reset whenever we receive a startup IPI, so Linux is setting
TSC back to zero happens very late in the boot process and destabilizing
the TSC. Instead, just set TSC to zero once at VCPU creation time.
Why the separate patch? So git-bisect is your friend.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 58877679fd393d3ef71aa383031ac7817561463d upstream.
On reset, VMCB TSC should be set to zero. Instead, code was setting
tsc_offset to zero, which passes through the underlying TSC.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 3444d7da1839b851eefedd372978d8a982316c36 upstream.
vmx does not restore GDT.LIMIT to the host value, instead it sets it to 64KB.
This means host userspace can learn a few bits of host memory.
Fix by reloading GDTR when we load other host state.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 5fd5387c89ec99ff6cb82d2477ffeb7211b781c2 upstream.
In no-direct mapping, we mark sp is 'direct' when we mapping the
guest's larger page, but its access is encoded form upper page-struct
entire not include the last mapping, it will cause access conflict.
For example, have this mapping:
[W]
/ PDE1 -> |---|
P[W] | | LPA
\ PDE2 -> |---|
[R]
P have two children, PDE1 and PDE2, both PDE1 and PDE2 mapping the
same lage page(LPA). The P's access is WR, PDE1's access is WR,
PDE2's access is RO(just consider read-write permissions here)
When guest access PDE1, we will create a direct sp for LPA, the sp's
access is from P, is W, then we will mark the ptes is W in this sp.
Then, guest access PDE2, we will find LPA's shadow page, is the same as
PDE's, and mark the ptes is RO.
So, if guest access PDE1, the incorrect #PF is occured.
Fixed by encode the last mapping access into direct shadow page
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 9e7b0e7fba45ca3c6357aeb7091ebc281f1de365 upstream.
If the mapping is writable but the dirty flag is not set, we will find
the read-only direct sp and setup the mapping, then if the write #PF
occur, we will mark this mapping writable in the read-only direct sp,
now, other real read-only mapping will happily write it without #PF.
It may hurt guest's COW
Fixed by re-install the mapping when write #PF occur.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 37a2f9f30a360fb03522d15c85c78265ccd80287 upstream.
The copy of /proc/vmcore to a user buffer proceeds much faster
if the kernel addresses memory as cached.
With this patch we have seen an increase in transfer rate from
less than 15MB/s to 80-460MB/s, depending on size of the
transfer. This makes a big difference in time needed to save a
system dump.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: kexec@lists.infradead.org
LKML-Reference: <E1OtMLz-0001yp-Ia@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 75e3cfbed6f71a8f151dc6e413b6ce3c390030cb upstream.
Currently the redirection hint in the interrupt-remapping table entry
is set to 0, which means the remapped interrupt is directed to the
processors listed in the destination. So in logical flat mode
in the presence of intr-remapping, this results in a single
interrupt multi-casted to multiple cpu's as specified by the destination
bit mask. But what we really want is to send that interrupt to one of the cpus
based on the lowest priority delivery mode.
Set the redirection hint in the IRTE to '1' to indicate that we want
the remapped interrupt to be directed to only one of the processors
listed in the destination.
This fixes the issue of same interrupt getting delivered to multiple cpu's
in the logical flat mode in the presence of interrupt-remapping. While
there is no functional issue observed with this behavior, this will
impact performance of such configurations (<=8 cpu's using logical flat
mode in the presence of interrupt-remapping)
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100827181049.013051492@sbsiddha-MOBL3.sc.intel.com>
Cc: Weidong Han <weidong.han@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 3fdbf004c1706480a7c7fac3c9d836fa6df20d7d upstream.
Instead of adapting the CPU family check in amd_special_default_mtrr()
for each new CPU family assume that all new AMD CPUs support the
necessary bits in SYS_CFG MSR.
Tom2Enabled is architectural (defined in APM Vol.2).
Tom2ForceMemTypeWB is defined in all BKDGs starting with K8 NPT.
In pre K8-NPT BKDG this bit is reserved (read as zero).
W/o this adaption Linux would unnecessarily complain about bad MTRR
settings on every new AMD CPU family, e.g.
[ 0.000000] WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 4863MB of RAM.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930123235.GB20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 286e5b97eb22baab9d9a41ca76c6b933a484252c upstream.
Avoids a potential infinite loop.
It was observed once, during an EC hacking/debugging
session - not in regular operation.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Cc: dilinger@queued.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 76fac077db6b34e2c6383a7b4f3f4f7b7d06d8ce upstream.
x86 smp_ops now has a new op, stop_other_cpus which takes a parameter
"wait" this allows the caller to specify if it wants to stop until all
the cpus have processed the stop IPI. This is required specifically
for the kexec case where we should wait for all the cpus to be stopped
before starting the new kernel. We now wait for the cpus to stop in
all cases except for panic/kdump where we expect things to be broken
and we are doing our best to make things work anyway.
This patch fixes a legitimate regression, which was introduced during
2.6.30, by commit id 4ef702c10b5df18ab04921fc252c26421d4d6c75.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1286833028.1372.20.camel@ank32.eng.vmware.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 7ef8aa72ab176e0288f363d1247079732c5d5792 upstream.
The AMD SSE5 feature set as-it has been replaced by some extensions
to the AVX instruction set. Thus the bit formerly advertised as SSE5
is re-used for one of these extensions (XOP).
Although this changes the /proc/cpuinfo output, it is not user visible, as
there are no CPUs (yet) having this feature.
To avoid confusion this should be added to the stable series, too.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
LKML-Reference: <1283778860-26843-2-git-send-email-andre.przywara@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 3ee48b6af49cf534ca2f481ecc484b156a41451d upstream.
During the reading of /proc/vmcore the kernel is doing
ioremap()/iounmap() repeatedly. And the buildup of un-flushed
vm_area_struct's is causing a great deal of overhead. (rb_next()
is chewing up most of that time).
This solution is to provide function set_iounmap_nonlazy(). It
causes a subsequent call to iounmap() to immediately purge the
vma area (with try_purge_vmap_area_lazy()).
With this patch we have seen the time for writing a 250MB
compressed dump drop from 71 seconds to 44 seconds.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: kexec@lists.infradead.org
LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 9f5f9ffe50e90ed73040d2100db8bfc341cee352 upstream.
The logic to distinguish marked instruction events from ordinary events
on PPC970 and derivatives was flawed. The result is that instruction
sampling didn't get enabled in the PMU for some marked instruction
events, so they would never trigger. This fixes it by adding the
appropriate break statements in the switch statement.
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 1dedefd1a066a795a87afca9c0236e1a94de9bf6 upstream.
Some extra CPU features such as ARAT is needed in early boot so
that x86_init function pointers can be set up properly.
http://lkml.org/lkml/2010/5/18/519
At start_kernel() level, this patch moves init_scattered_cpuid_features()
from check_bugs() to setup_arch() -> early_cpu_init() which is earlier than
platform specific x86_init layer setup. Suggested by HPA.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1274295685-6774-2-git-send-email-jacob.jun.pan@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 54a834043314c257210db2a9d59f8cc605571639 upstream.
In f761622e59433130bc33ad086ce219feee9eb961 we changed
early_setup_secondary so it's called using the proper kernel stack
rather than the emergency one.
Unfortunately, this stack pointer can't be used when translation is off
on PHYP as this stack pointer might be outside the RMO. This results in
the following on all non zero cpus:
cpu 0x1: Vector: 300 (Data Access) at [c00000001639fd10]
pc: 000000000001c50c
lr: 000000000000821c
sp: c00000001639ff90
msr: 8000000000001000
dar: c00000001639ffa0
dsisr: 42000000
current = 0xc000000016393540
paca = 0xc000000006e00200
pid = 0, comm = swapper
The original patch was only tested on bare metal system, so it never
caught this problem.
This changes __secondary_start so that we calculate the new stack
pointer but only start using it after we've called early_setup_secondary.
With this patch, the above problem goes away.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit f761622e59433130bc33ad086ce219feee9eb961 upstream.
As early setup calls down to slb_initialize(), we must have kstack
initialised before checking "should we add a bolted SLB entry for our kstack?"
Failing to do so means stack access requires an SLB miss exception to refill
an entry dynamically, if the stack isn't accessible via SLB(0) (kernel text
& static data). It's not always allowable to take such a miss, and
intermittent crashes will result.
Primary CPUs don't have this issue; an SLB entry is not bolted for their
stack anyway (as that lives within SLB(0)). This patch therefore only
affects the init of secondaries.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 6dcbfe4f0b4e17e289d56fa534b7ce5a6b7f63a3 upstream.
This fixes possible cases of not collecting valid error info in
the MCE error thresholding groups on F10h hardware.
The current code contains a subtle problem of checking only the
Valid bit of MSR0000_0413 (which is MC4_MISC0 - DRAM
thresholding group) in its first iteration and breaking out if
the bit is cleared.
But (!), this MSR contains an offset value, BlkPtr[31:24], which
points to the remaining MSRs in this thresholding group which
might contain valid information too. But if we bail out only
after we checked the valid bit in the first MSR and not the
block pointer too, we miss that other information.
The thing is, MC4_MISC0[BlkPtr] is not predicated on
MCi_STATUS[MiscV] or MC4_MISC0[Valid] and should be checked
prior to iterating over the MCI_MISCj thresholding group,
irrespective of the MC4_MISC0[Valid] setting.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 47526903feb52f4c26a6350370bdf74e337fcdb1 upstream.
Commit f81f2f7c (ubd: drop unnecessary rq->sector manipulation)
dropped request->sector manipulation in preparation for global request
handling cleanup; unfortunately, it incorrectly assumed that the
updated sector wasn't being used.
ubd tries to issue as many requests as possible to io_thread. When
issuing fails due to memory pressure or other reasons, the device is
put on the restart list and issuing stops. On IO completion, devices
on the restart list are scanned and IO issuing is restarted.
ubd issues IOs sg-by-sg and issuing can be stopped in the middle of a
request, so each device on the restart queue needs to remember where
to restart in its current request. ubd needs to keep track of the
issue position itself because,
* blk_rq_pos(req) is now updated by the block layer to keep track of
_completion_ position.
* Multiple io_req's for the current request may be in flight, so it's
difficult to tell where blk_rq_pos(req) currently is.
Add ubd->rq_pos to keep track of the issue position and use it to
correctly restart io_req issue.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Richard Weinberger <richard@nod.at>
Tested-by: Richard Weinberger <richard@nod.at>
Tested-by: Chris Frey <cdfrey@foursquare.net>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 1cf180c94e9166cda083ff65333883ab3648e852 upstream.
free_irq_cfg() is not freeing the cpumask_vars in irq_cfg. Fixing this
triggers a use after free caused by the fact that copying struct
irq_cfg is done with memcpy, which copies the pointer not the cpumask.
Fix both places.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
LKML-Reference: <alpine.LFD.2.00.1009282052570.2416@localhost6.localdomain6>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 021989622810b02aab4b24f91e1f5ada2b654579 upstream.
create_irq() returns -1 if the interrupt allocation failed, but the
code checks for irq == 0.
Use create_irq_nr() instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <alpine.LFD.2.00.1009282310360.2416@localhost6.localdomain6>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 258af47479980d8238a04568b94a4e55aa1cb537 upstream.
The guest can use the paravirt clock in kvmclock.c which is used
by sched_clock(), which in turn is used by the tracing mechanism
for timestamps, which leads to infinite recursion.
Disable mcount/tracing for kvmclock.o.
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 9ecd4e1689208afe9b059a5ce1333acb2f42c4d2 upstream.
When using a paravirt clock, pvclock.c can be used by sched_clock(),
which in turn is used by the tracing mechanism for timestamps,
which leads to infinite recursion.
Disable mcount/tracing for pvclock.o.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
LKML-Reference: <4C9A9A3F.4040201@goop.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 4c894f47bb49284008073d351c0ddaac8860864e upstream.
This patch adds a workaround for an IOMMU BIOS problem to
the AMD IOMMU driver. The result of the bug is that the
IOMMU does not execute commands anymore when the system
comes out of the S3 state resulting in system failure. The
bug in the BIOS is that is does not restore certain hardware
specific registers correctly. This workaround reads out the
contents of these registers at boot time and restores them
on resume from S3. The workaround is limited to the specific
IOMMU chipset where this problem occurs.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|