Age | Commit message (Collapse) | Author |
|
commit 8762e5092828c4dc0f49da5a47a644c670df77f3 upstream.
init_espfix_ap() is currently off by one level when informing hypervisor
that allocated pages will be used for ministacks' page tables.
The most immediate effect of this on a PV guest is that if
'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor
will refuse to use it for a page table (which it shouldn't be anyway). This will
result in warnings by both Xen and Linux.
More importantly, a subsequent write to that page (again, by a PV guest) is
likely to result in fatal page fault.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ea9f9274bf4337ba7cbab241c780487651642d63 upstream.
Remove xen_enable_nmi() to fix a 64-bit guest crash when registering
the NMI callback on Xen 3.1 and earlier.
It's not needed since the NMI callback is set by a set_trap_table
hypercall (in xen_load_idt() or xen_write_idt_entry()).
It's also broken since it only set the current VCPU's callback.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7209a75d2009dbf7745e2fd354abf25c3deb3ca3 upstream.
This moves the espfix64 logic into native_iret. To make this work,
it gets rid of the native patch for INTERRUPT_RETURN:
INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.
This changes the 16-bit SS behavior on Xen from OOPSing to leaking
some bits of the Xen hypervisor's RSP (I think).
[ hpa: this is a nonzero cost on native, but probably not enough to
measure. Xen needs to fix this in their own code, probably doing
something equivalent to espfix64. ]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 34273f41d57ee8d854dcd2a1d754cbb546cb548f upstream.
Embedded systems, which may be very memory-size-sensitive, are
extremely unlikely to ever encounter any 16-bit software, so make it
a CONFIG_EXPERT option to turn off support for any 16-bit software
whatsoever.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 197725de65477bc8509b41388157c1a2283542bb upstream.
Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML
build which had broken due to the non-existence of init_espfix_bsp()
in UML: since UML uses its own Kconfig, this option does not appear in
the UML build.
This also makes it possible to make support for 16-bit segments a
configuration option, for the people who want to minimize the size of
the kernel.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Richard Weinberger <richard@nod.at>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 20b68535cd27183ebd3651ff313afb2b97dac941 upstream.
Header guard is #ifndef, not #ifdef...
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e1fe9ed8d2a4937510d0d60e20705035c2609aea upstream.
Sparse warns that the percpu variables aren't declared before they are
defined. Rather than hacking around it, move espfix definitions into
a proper header file.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3891a04aafd668686239349ea58f3314ea2af86b upstream.
The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer. This
causes some 16-bit software to break, but it also leaks kernel state
to user space. We have a software workaround for that ("espfix") for
the 32-bit kernel, but it relies on a nonzero stack segment base which
is not available in 64-bit mode.
In checkin:
b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
the logic that 16-bit support is crippled on 64-bit kernels anyway (no
V86 support), but it turns out that people are doing stuff like
running old Win16 binaries under Wine and expect it to work.
This works around this by creating percpu "ministacks", each of which
is mapped 2^16 times 64K apart. When we detect that the return SS is
on the LDT, we copy the IRET frame to the ministack and use the
relevant alias to return to userspace. The ministacks are mapped
readonly, so if IRET faults we promote #GP to #DF which is an IST
vector and thus has its own stack; we then do the fixup in the #DF
handler.
(Making #GP an IST exception would make the msr_safe functions unsafe
in NMI/MC context, and quite possibly have other effects.)
Special thanks to:
- Andy Lutomirski, for the suggestion of using very small stack slots
and copy (as opposed to map) the IRET frame there, and for the
suggestion to mark them readonly and let the fault promote to #DF.
- Konrad Wilk for paravirt fixup and testing.
- Borislav Petkov for testing help and useful comments.
Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrew Lutomriski <amluto@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dirk Hohndel <dirk@hohndel.org>
Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
Cc: comex <comexk@gmail.com>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <stable@vger.kernel.org> # consider after upstream merge
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7ed6fb9b5a5510e4ef78ab27419184741169978a upstream.
This reverts commit fa81511bb0bbb2b1aace3695ce869da9762624ff in
preparation of merging in the proper fix (espfix64).
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c7fb93ec51d462ec3540a729ba446663c26a0505 upstream.
The PE/COFF headers currently describe only the initialised-data
portions of the image, and result in no space being allocated for the
uninitialised-data portions. Consequently, the EFI boot stub will end
up overwriting unexpected areas of memory, with unpredictable results.
Fix by including a .bss section in the PE/COFF headers (functionally
equivalent to the init_size field in the bzImage header).
Signed-off-by: Michael Brown <mbrown@fensystems.co.uk>
Cc: Thomas Bächler <thomas@archlinux.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8142b215501f8b291a108a202b3a053a265b03dd upstream.
Commit 554086d ("x86_32, entry: Do syscall exit work on badsys
(CVE-2014-4508)") introduced a regression in the x86_32 syscall entry
code, resulting in syscall() not returning proper errors for undefined
syscalls on CPUs supporting the sysenter feature.
The following code:
> int result = syscall(666);
> printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));
results in:
> result=666 errno=0 error=Success
Obviously, the syscall return value is the called syscall number, but it
should have been an ENOSYS error. When run under ptrace it behaves
correctly, which makes it hard to debug in the wild:
> result=-1 errno=38 error=Function not implemented
The %eax register is the return value register. For debugging via ptrace
the syscall entry code stores the complete register context on the
stack. The badsys handlers only store the ENOSYS error code in the
ptrace register set and do not set %eax like a regular syscall handler
would. The old resume_userspace call chain contains code that clobbers
%eax and it restores %eax from the ptrace registers afterwards. The same
goes for the ptrace-enabled call chain. When ptrace is not used, the
syscall return value is the passed-in syscall number from the untouched
%eax register.
Use %eax as the return value register in syscall_badsys and
sysenter_badsys, like a real syscall handler does, and have the caller
push the value onto the stack for ptrace access.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.net
Reviewed-and-tested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4badad352a6bb202ec68afa7a574c0bb961e5ebc upstream.
The optimistic spin code assumes regular stores and cmpxchg() play nice;
this is found to not be true for at least: parisc, sparc32, tile32,
metag-lock1, arc-!llsc and hexagon.
There is further wreckage, but this in particular seemed easy to
trigger, so blacklist this.
Opt in for known good archs.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3896c329df8092661dac80f55a8c3f60136fd61a upstream.
Mauro reported that his AMD X2 using the powernow-k8 cpufreq driver
locked up when doing cpu hotplug.
Because we called set_cyc2ns_scale() from the time_cpufreq_notifier()
unconditionally, it gets called multiple times for each freq change,
instead of only the once, when the tsc_khz value actually changes.
Because it gets called more than once, we run out of cyc2ns data slots
and stall, waiting for a free one, but because we're half way offline,
there's no consumers to free slots.
By placing the call inside the condition that actually changes tsc_khz
we avoid superfluous calls and avoid the problem.
Reported-by: Mauro <registosites@hotmail.com>
Tested-by: Mauro <registosites@hotmail.com>
Fixes: 20d1c86a5776 ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs")
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Bin Gao <bin.gao@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b292d7a10487aee6e74b1c18b8d95b92f40d4a4f upstream.
Currently, any NMI is falsely handled by a NMI handler of NMI watchdog
if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set.
For example, we use external NMI to make system panic to get crash
dump, but in this case, the external NMI is falsely handled do to the
issue.
This commit deals with the issue simply by ignoring CondChgd bit.
Here is explanation in detail.
On x86 NMI watchdog uses performance monitoring feature to
periodically signal NMI each time performance counter gets overflowed.
intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
handler of NMI watchdog, perf_event_nmi_handler(). It identifies an
owner of a given NMI by looking at overflow status bits in
MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
handles the given NMI as its own NMI.
The problem is that the intel_pmu_handle_irq() doesn't distinguish
CondChgd bit from other bits. Unlike the other status bits, CondChgd
bit doesn't represent overflow status for performance counters. Thus,
CondChgd bit cannot be thought of as a mark indicating a given NMI is
NMI watchdog's.
As a result, if CondChgd bit is set, any NMI is falsely handled by the
NMI handler of NMI watchdog. Also, if type of the falsely handled NMI
is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding
action is never performed until CondChgd bit is cleared.
I noticed this behavior on systems with Ivy Bridge processors: Intel
Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
in the beginning at boot. Then the CondChgd bit is immediately cleared
by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain
0.
On the other hand, on older processors such as Nehalem, Xeon E7540,
CondChgd bit is not set in the beginning at boot.
I'm not sure about exact behavior of CondChgd bit, in particular when
this bit is set. Although I read Intel System Programmer's Manual to
figure out that, the descriptions I found are:
In 18.9.1:
"The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to
indicate changes to the state of performancmonitoring hardware"
In Table 35-2 IA-32 Architectural MSRs
63 CondChg: status bits of this register has changed.
These are different from the bahviour I see on the actual system as I
explained above.
At least, I think ignoring CondChgd bit should be enough for NMI
watchdog perspective.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c81c8a1eeede61e92a15103748c23d100880cc8a upstream.
In __ioremap_caller() (the guts of ioremap), we loop over the range of
pfns being remapped and checks each one individually with page_is_ram().
For large ioremaps, this can be very slow. For example, we have a
device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+
seconds -- sometimes long enough to trigger the soft lockup detector!
Internally, page_is_ram() calls walk_system_ram_range() on a single
page. Instead, we can make a single call to walk_system_ram_range()
from __ioremap_caller(), and do our further checks only for any RAM
pages that we find. For the common case of MMIO, this saves an enormous
amount of work, since the range being ioremapped doesn't intersect
system RAM at all.
With this change, ioremap on our 256 GiB BAR takes less than 1 second.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Link: http://lkml.kernel.org/r/1399054721-1331-1-git-send-email-roland@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cfe82d4f45c7cc39332a2be7c4c1d3bf279bbd3d upstream.
Byte-to-bit-count computation is only partly converted to big-endian and is
mixing in CPU-endian values. Problem was noticed by sparce with warning:
CHECK arch/x86/crypto/sha512_ssse3_glue.c
arch/x86/crypto/sha512_ssse3_glue.c:144:19: warning: restricted __be64 degrades to integer
arch/x86/crypto/sha512_ssse3_glue.c:144:17: warning: incorrect type in assignment (different base types)
arch/x86/crypto/sha512_ssse3_glue.c:144:17: expected restricted __be64 <noident>
arch/x86/crypto/sha512_ssse3_glue.c:144:17: got unsigned long long
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e1fa108d24697b78348fd4e5a531029a50d0d36d upstream.
When kvm_write_guest writes the tsc_ref structure to the guest, or it will lead
the low HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT bits of the TSC page address
must be cleared, or the guest can see a non-zero sequence number.
Otherwise Windows guests would not be able to get a correct clocksource
(QueryPerformanceCounter will always return 0) which causes serious chaos.
Signed-off-by: Xiaoming Gao <newtongao@tencnet.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7cb060a91c0efc5ff94f83c6df3ed705e143cdb9 upstream.
KVM does not really do much with the PAT, so this went unnoticed for a
long time. It is exposed however if you try to do rdmsr on the PAT
register.
Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 682367c494869008eb89ef733f196e99415ae862 upstream.
Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
sometime make assumptions on CPUs while they ignore capability MSRs, it is
better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
actually supported has no functional implications.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b9cd18de4db3c9ffa7e17b0dc0ca99ed5aa4d43a upstream.
The 'sysret' fastpath does not correctly restore even all regular
registers, much less any segment registers or reflags values. That is
very much part of why it's faster than 'iret'.
Normally that isn't a problem, because the normal ptrace() interface
catches the process using the signal handler infrastructure, which
always returns with an iret.
However, some paths can get caught using ptrace_event() instead of the
signal path, and for those we need to make sure that we aren't going to
return to user space using 'sysret'. Otherwise the modifications that
may have been done to the register set by the tracer wouldn't
necessarily take effect.
Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
arch_ptrace_stop_needed() which is invoked from ptrace_stop().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 554086d85e71f30abe46fc014fea31929a7c6a8a upstream.
The bad syscall nr paths are their own incomprehensible route
through the entry control flow. Rearrange them to work just like
syscalls that return -ENOSYS.
This fixes an OOPS in the audit code when fast-path auditing is
enabled and sysenter gets a bad syscall nr (CVE-2014-4508).
This has probably been broken since Linux 2.6.27:
af0575bba0 i386 syscall audit fast-path
Cc: Roland McGrath <roland@redhat.com>
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/e09c499eade6fc321266dd6b54da7beb28d6991c.1403558229.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7fd44dacdd803c0bbf38bf478d51d280902bb0f1 upstream.
The io_setup takes a pointer to a context id of type aio_context_t.
This in turn is typed to a __kernel_ulong_t. We could tweak the
exported headers to define this as a 64bit quantity for specific
ABIs, but since we already have a 32bit compat shim for the x86 ABI,
let's just re-use that logic. The libaio package is also written to
expect this as a pointer type, so a compat shim would simplify that.
The io_submit func operates on an array of pointers to iocb structs.
Padding out the array to be 64bit aligned is a huge pain, so convert
it over to the existing compat shim too.
We don't convert io_getevents to the compat func as its only purpose
is to handle the timespec struct, and the x32 ABI uses 64bit times.
With this change, the libaio package can now pass its testsuite when
built for the x32 ABI.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Link: http://lkml.kernel.org/r/1399250595-5005-1-git-send-email-vapier@gentoo.org
Cc: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 246f2d2ee1d715e1077fc47d61c394569c8ee692 upstream.
It is not safe to use LAR to filter when to go down the espfix path,
because the LDT is per-process (rather than per-thread) and another
thread might change the descriptors behind our back. Fortunately it
is always *safe* (if a bit slow) to go down the espfix path, and a
32-bit LDT stack segment is extremely rare.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c177c81e09e517bbf75b67762cdab1b83aba6976 upstream.
Currently hugepage migration is available for all archs which support
pmd-level hugepage, but testing is done only for x86_64 and there're
bugs for other archs. So to avoid breaking such archs, this patch
limits the availability strictly to x86_64 until developers of other
archs get interested in enabling this feature.
Simply disabling hugepage migration on non-x86_64 archs is not enough to
fix the reported problem where sys_move_pages() hits the BUG_ON() in
follow_page(FOLL_GET), so let's fix this by checking if hugepage
migration is supported in vma_migratable().
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fc57ac2c9ca8109ea97fcc594f4be436944230cc upstream.
When Hyper-V enlightenments are in effect, Windows prefers to issue an
Hyper-V MSR write to issue an EOI rather than an x2apic MSR write.
The Hyper-V MSR write is not handled by the processor, and besides
being slower, this also causes bugs with APIC virtualization. The
reason is that on EOI the processor will modify the highest in-service
interrupt (SVI) field of the VMCS, as explained in section 29.1.4 of
the SDM; every other step in EOI virtualization is already done by
apic_send_eoi or on VM entry, but this one is missing.
We need to do the same, and be careful not to muck with the isr_count
and highest_isr_cache fields that are unused when virtual interrupt
delivery is enabled.
Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fa81511bb0bbb2b1aace3695ce869da9762624ff upstream.
Checkin:
b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
disabled 16-bit segments on 64-bit kernels due to an information
leak. However, it does seem that people are genuinely using Wine to
run old 16-bit Windows programs on Linux.
A proper fix for this ("espfix64") is coming in the upcoming merge
window, but as a temporary fix, create a sysctl to allow the
administrator to re-enable support for 16-bit segments.
It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If
you hit this issue and care about your old Windows program more than
you care about a kernel stack address information leak, you can do
echo 1 > /proc/sys/abi/ldt16
as root (add it to your startup scripts), and you should be ok.
The sysctl table is only added if you have COMPAT support enabled on
x86-64, but I assume anybody who runs old windows binaries very much
does that ;)
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9844f5462392b53824e8b86726e7c33b5ecbb676 upstream.
The invalidation is required in order to maintain proper semantics
under CoW conditions. In scenarios where a process clones several
threads, a thread operating on a core whose DTLB entry for a
particular hugepage has not been invalidated, will be reading from
the hugepage that belongs to the forked child process, even after
hugetlb_cow().
The thread will not see the updated page as long as the stale DTLB
entry remains cached, the thread attempts to write into the page,
the child process exits, or the thread gets migrated to a different
processor.
Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com>
Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp
Suggested-by: Shay Goikhman <shay.goikhman@huawei.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 773cd38f40b8834be991dbfed36683acc1dd41ee ]
bpf_alloc_binary() adds 128 bytes of room to JITed program image
and rounds it up to the nearest page size. If image size is close
to page size (like 4000), it is rounded to two pages:
round_up(4000 + 4 + 128) == 8192
then 'hole' is computed as 8192 - (4000 + 4) = 4188
If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header)
then kernel will crash during bpf_jit_free():
kernel BUG at arch/x86/mm/pageattr.c:887!
Call Trace:
[<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460
[<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff810378ff>] set_memory_rw+0x2f/0x40
[<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60
[<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0
[<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0
[<ffffffff8106c90c>] worker_thread+0x11c/0x370
since bpf_jit_free() does:
unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
struct bpf_binary_header *header = (void *)addr;
to compute start address of 'bpf_binary_header'
and header->pages will pass junk to:
set_memory_rw(addr, header->pages);
Fix it by making sure that &header->image[prandom_u32() % hole] and &header
are in the same page
Fixes: 314beb9bcabfd ("x86: bpf_jit_comp: secure bpf jit against spraying attacks")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e0fc17a936334c08b2729fff87168c03fdecf5b6 upstream.
The git commit a945928ea2709bc0e8e8165d33aed855a0110279
('xen: Do not enable spinlocks before jump_label_init() has executed')
was added to deal with the jump machinery. Earlier the code
that turned on the jump label was only called by Xen specific
functions. But now that it had been moved to the initcall machinery
it gets called on Xen, KVM, and baremetal - ouch!. And the detection
machinery to only call it on Xen wasn't remembered in the heat
of merge window excitement.
This means that the slowpath is enabled on baremetal while it should
not be.
Reported-by: Waiman Long <waiman.long@hp.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Many people reported preemption/reschedule problems with i386 kernels
for .13 and .14. After Michele bisected this to a combination of
3e8e42c69bb ("sched: Revert need_resched() to look at TIF_NEED_RESCHED")
ded79754754 ("irq: Force hardirq exit's softirq processing on its own stack")
it finally dawned on me that i386's current_thread_info() was to
blame.
When we are on interrupt/exception stacks, we fail to observe the
right TIF_NEED_RESCHED bit and therefore the PREEMPT_NEED_RESCHED
folding malfunctions.
Current upstream fixes this by making i386 behave the same as x86_64
already did:
2432e1364bbe ("x86: Nuke the supervisor_stack field in i386 thread_info")
b807902a88c4 ("x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386")
0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
198d208df437 ("x86: Keep thread_info on thread stack in x86_32")
However, that is far too much to stuff into -stable. Therefore I
propose we merge the below patch which uses task_thread_info(current)
for tif_need_resched() instead of the ESP based current_thread_info().
This makes sure we always observe the one true TIF_NEED_RESCHED bit
and things will work as expected again.
Cc: bp@alien8.de
Cc: fweisbec@gmail.com
Cc: david.a.cohen@linux.intel.com
Cc: mingo@kernel.org
Cc: fweisbec@gmail.com
Cc: greg@kroah.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: gregkh@linuxfoundation.org
Cc: pbonzini@redhat.com
Cc: rostedt@goodmis.org
Cc: stefan.bader@canonical.com
Cc: mingo@kernel.org
Cc: toralf.foerster@gmx.de
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: torvalds@linux-foundation.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: <stable-commits@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: peterz@infradead.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: barra_cuda@katamail.com
Tested-by: Stefan Bader <stefan.bader@canonical.com>
Tested-by: Toralf F¿rster <toralf.foerster@gmx.de>
Tested-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140409142447.GD13658@twins.programming.kicks-ass.net
|
|
commit b351c39cc9e0151cee9b8d52a1e714928faabb38 upstream.
Function and callers can be preempted.
https://bugzilla.kernel.org/show_bug.cgi?id=73721
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 14262d67fe348018af368a07430fbc06eadeabb1 upstream.
If you are using a 64-bit kernel with 32-bit userland, then
scripts/gcc-x86_64-has-stack-protector.sh invokes 32-bit gcc
with -mcmodel=kernel, which produces:
<stdin>:1:0: error: code model 'kernel' not supported in the 32 bit mode
and trips the "broken compiler" test at arch/x86/Makefile:120.
There are several places a fix is possible, but the following seems
cleanest. (But it's minimal; it would also be possible to factor
out a bunch of stuff from the two branches of the if.)
Signed-off-by: George Spelvin <linux@horizon.com>
Link: http://lkml.kernel.org/r/20140507210552.7581.qmail@ns.horizon.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7e8213c1f3acc064aef37813a39f13cbfe7c3ce7 upstream.
code32_start should point at the start of the protected mode code, and
*not* at the beginning of the bzImage. This is much easier to do in
assembly so document that callers of make_boot_params() need to fill out
code32_start.
The fallout from this bug is that we would end up relocating the image
but copying the image at some offset, resulting in what appeared to be
memory corruption.
Reported-by: Thomas Bächler <thomas@archlinux.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b3b42ac2cbae1f3cecbb6229964a4d48af31d382 upstream.
The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer. We have
a software workaround for that ("espfix") for the 32-bit kernel, but
it relies on a nonzero stack segment base which is not available in
32-bit mode.
Since 16-bit support is somewhat crippled anyway on a 64-bit kernel
(no V86 mode), and most (if not quite all) 64-bit processors support
virtualization for the users who really need it, simply reject
attempts at creating a 16-bit segment when running on top of a 64-bit
kernel.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-kicdm89kzw9lldryb1br9od0@git.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 12729f14d8357fb845d75155228b21e76360272d upstream.
If a failure occurs while modifying ftrace function, it bails out and will
remove the tracepoints to be back to what the code originally was.
There is missing the final sync run across the CPUs after the fix up is done
and before the ftrace int3 handler flag is reset.
Here's the description of the problem:
CPU0 CPU1
---- ----
remove_breakpoint();
modifying_ftrace_code = 0;
[still sees breakpoint]
<takes trap>
[sees modifying_ftrace_code as zero]
[no breakpoint handler]
[goto failed case]
[trap exception - kernel breakpoint, no
handler]
BUG()
Link: http://lkml.kernel.org/r/1393258342-29978-2-git-send-email-pmladek@suse.cz
Fixes: 8a4d0a687a5 "ftrace: Use breakpoint method to update ftrace caller"
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c2bc11f10a39527cd1bb252097b5525664560956 upstream.
This patch enables Opmask, ZMM_Hi256, and Hi16_ZMM AVX-512 states for
xstate context switch.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1392931491-33237-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8e5780fdeef7dc490b3f0b3a62704593721fa4f3 upstream.
AVX-512 is an extention of AVX2. Its spec can be found at:
http://download-software.intel.com/sites/default/files/managed/71/2e/319433-017.pdf
This patch detects AVX-512 features by CPUID.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1392931491-33237-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 06325190bd577e11429444d54f454b9d13f560c9 upstream.
Just like for other ISA extension instruction uses we should check
whether the assembler actually supports them. The fallback here simply
is to encode an instruction with fixed operands (%eax and %ecx).
[ hpa: tagging for -stable as a build fix ]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/530F0996020000780011FBE7@nat28.tlf.novell.com
Cc: Francesco Fusco <ffusco@redhat.com>
Cc: Thomas Graf <tgraf@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6f8a1b335fde143b7407036e2368d3cd6eb55674 upstream.
Commit 03bbcb2e7e2 (iommu/vt-d: add quirk for broken interrupt
remapping on 55XX chipsets) properly disables irq remapping on the
5500/5520 chipsets that don't correctly perform that feature.
However, when I wrote it, I followed the errata sheet linked in that
commit too closely, and explicitly tied the activation of the quirk to
revision 0x13 of the chip, under the assumption that earlier revisions
were not in the field. Recently a system was reported to be suffering
from this remap bug and the quirk hadn't triggered, because the
revision id register read at a lower value that 0x13, so the quirk
test failed improperly. Given this, it seems only prudent to adjust
this quirk so that any revision less than 0x13 has the quirk asserted.
[ tglx: Removed the 0x12 comparison of pci id 3405 as this is covered
by the <= 0x13 check already ]
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1394649873-14913-1-git-send-email-nhorman@tuxdriver.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ca3ba2a2f4a49a308e7d78c784d51b2332064f15 upstream.
This patch bypass the timer_irq_works() check for hyperv guest since:
- It was guaranteed to work.
- timer_irq_works() may fail sometime due to the lpj calibration were inaccurate
in a hyperv guest or a buggy host.
In the future, we should get the tsc frequency from hypervisor and use preset
lpj instead.
[ hpa: I would prefer to not defer things to "the future" in the future... ]
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/1393558229-14755-1-git-send-email-jasowang@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8ceee72808d1ae3fb191284afc2257a2be964725 upstream.
The GHASH setkey() function uses SSE registers but fails to call
kernel_fpu_begin()/kernel_fpu_end(). Instead of adding these calls, and
then having to deal with the restriction that they cannot be called from
interrupt context, move the setkey() implementation to the C domain.
Note that setkey() does not use any particular SSE features and is not
expected to become a performance bottleneck.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Fixes: 0e1227d356e9b (crypto: ghash - Add PCLMULQDQ accelerated implementation)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b7b898ae0c0a82489511a1ce1b35f26215e6beb5 upstream.
Currently, running SetVirtualAddressMap() and passing the physical
address of the virtual map array was working only by a lucky coincidence
because the memory was present in the EFI page table too. Until Toshi
went and booted this on a big HP box - the krealloc() manner of resizing
the memmap we're doing did allocate from such physical addresses which
were not mapped anymore and boom:
http://lkml.kernel.org/r/1386806463.1791.295.camel@misato.fc.hp.com
One way to take care of that issue is to reimplement the krealloc thing
but with pages. We start with contiguous pages of order 1, i.e. 2 pages,
and when we deplete that memory (shouldn't happen all that often but you
know firmware) we realloc the next power-of-two pages.
Having the pages, it is much more handy and easy to map them into the
EFI page table with the already existing mapping code which we're using
for building the virtual mappings.
Thanks to Toshi Kani and Matt for the great debugging help.
Reported-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 42a5477251f0e0f33ad5f6a95c48d685ec03191e upstream.
We will use it in efi so expose it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
On x86 uniprocessor systems topology_physical_package_id() returns -1
which causes rapl_cpu_prepare() to leave rapl_pmu variable uninitialized
which leads to GPF in rapl_pmu_init().
See arch/x86/kernel/cpu/perf_event_intel_rapl.c.
It turns out that physical_package_id and core_id can actually be
retreived for uniprocessor systems too. Enabling them also fixes
rapl_pmu code.
Signed-off-by: Artem Fetishev <artem_fetishev@epam.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit a9c8e4beeeb64c22b84c803747487857fe424b68.
PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT
is set and pseudo-physical addresses is _PAGE_PRESENT is clear.
This is because during a domain save/restore (migration) the page
table entries are "canonicalised" and uncanonicalised". i.e., MFNs are
converted to PFNs during domain save so that on a restore the page
table entries may be rewritten with the new MFNs on the destination.
This canonicalisation is only done for PTEs that are present.
This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or
_PAGE_NUMA) was set but _PAGE_PRESENT was clear. These PTEs would be
migrated as-is which would result in unexpected behaviour in the
destination domain. Either a) the MFN would be translated to the
wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE
because the MFN is no longer owned by the domain; or c) the present
bit would not get set.
Symptoms include "Bad page" reports when munmapping after migrating a
domain.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org> [3.12+]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI resource management fix from Bjorn Helgaas:
"This is a fix for an AGP regression exposed by e501b3d87f00 ("agp:
Support 64-bit APBASE"), which we merged in v3.14-rc1.
We've warned about the conflict between the GART and PCI resources and
cleared out the PCI resource for a long time, but after e501b3d87f00,
we still *use* that cleared-out PCI resource. I think the GART
resource is incorrect, so this patch removes it"
* tag 'pci-v3.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "[PATCH] Insert GART region into resource map"
|
|
This reverts commit 56dd669a138c, which makes the GART visible in
/proc/iomem. This fixes a regression: e501b3d87f00 ("agp: Support 64-bit
APBASE") exposed an existing problem with a conflict between the GART
region and a PCI BAR region.
The GART addresses are bus addresses, not CPU addresses, and therefore
should not be inserted in iomem_resource.
On many machines, the GART region is addressable by the CPU as well as by
an AGP master, but CPU addressability is not required by the spec. On some
of these machines, the GART is mapped by a PCI BAR, and in that case, the
PCI core automatically inserts it into iomem_resource, just as it does for
all BARs.
Inserting it here means we'll have a conflict if the PCI core later tries
to claim the GART region, so let's drop the insertion here.
The conflict indirectly causes X failures, as reported by Jouni in the
bugzilla below. We detected the conflict even before e501b3d87f00, but
after it the AGP code (fix_northbridge()) uses the PCI resource (which is
zeroed because of the conflict) instead of reading the BAR again.
Conflicts:
arch/x86_64/kernel/aperture.c
Fixes: e501b3d87f00 agp: Support 64-bit APBASE
Link: https://bugzilla.kernel.org/show_bug.cgi?id=72201
Reported-and-tested-by: Jouni Mettälä <jtmettala@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc smaller fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix leak in uncore_type_init failure paths
perf machine: Use map as success in ip__resolve_ams
perf symbols: Fix crash in elf_section_by_name
perf trace: Decode architecture-specific signal numbers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"Two x86 fixes: Suresh's eager FPU fix, and a fix to the NUMA quirk for
AMD northbridges.
This only includes Suresh's fix patch, not the "mostly a cleanup"
patch which had __init issues"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/amd/numa: Fix northbridge quirk to assign correct NUMA node
x86, fpu: Check tsk_used_math() in kernel_fpu_end() for eager FPU
|
|
For systems with multiple servers and routed fabric, all
northbridges get assigned to the first server. Fix this by also
using the node reported from the PCI bus. For single-fabric
systems, the northbriges are on PCI bus 0 by definition, which
are on NUMA node 0 by definition, so this is invarient on most
systems.
Tested on fam10h and fam15h single and multi-fabric systems and
candidate for stable.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Steffen Persvold <sp@numascale.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1394710981-3596-1-git-send-email-daniel@numascale.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|