summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2015-10-03Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fixes all around the map: W+X kernel mapping fix, WCHAN fixes, two build failure fixes for corner case configs, x32 header fix and a speling fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds x86/mm: Set NX on gap between __ex_table and rodata x86/kexec: Fix kexec crash in syscall kexec_file_load() x86/process: Unify 32bit and 64bit implementations of get_wchan() x86/process: Add proper bound checks in 64bit get_wchan() x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernels x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE case x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag
2015-10-03Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "Two EFI fixes: one for x86, one for ARM, fixing a boot crash bug that can trigger under newer EFI firmware" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arm64/efi: Fix boot crash by not padding between EFI_MEMORY_RUNTIME regions x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down
2015-10-02x86/headers/uapi: Fix __BITS_PER_LONG value for x32 buildsBen Hutchings
On x32, gcc predefines __x86_64__ but long is only 32-bit. Use __ILP32__ to distinguish x32. Fixes this compiler error in perf: tools/include/asm-generic/bitops/__ffs.h: In function '__ffs': tools/include/asm-generic/bitops/__ffs.h:19:8: error: right shift count >= width of type [-Werror=shift-count-overflow] word >>= 32; ^ This isn't sufficient to build perf for x32, though. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443660043.2730.15.camel@decadent.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-02x86/mm: Set NX on gap between __ex_table and rodataStephen Smalley
Unused space between the end of __ex_table and the start of rodata can be left W+x in the kernel page tables. Extend the setting of the NX bit to cover this gap by starting from text_end rather than rodata_start. Before: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB x pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd After: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB NX pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-02x86/kexec: Fix kexec crash in syscall kexec_file_load()Lee, Chun-Yi
The original bug is a page fault crash that sometimes happens on big machines when preparing ELF headers: BUG: unable to handle kernel paging request at ffffc90613fc9000 IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260 The bug is caused by us under-counting the number of memory ranges and subsequently not allocating enough ELF header space for them. The bug is typically masked on smaller systems, because the ELF header allocation is rounded up to the next page. This patch modifies the code in fill_up_crash_elf_data() by using walk_system_ram_res() instead of walk_system_ram_range() to correctly count the max number of crash memory ranges. That's because the walk_system_ram_range() filters out small memory regions that reside in the same page, but walk_system_ram_res() does not. Here's how I found the bug: After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(), the code uses walk_system_ram_res() to fill-in crash memory regions information to the program header, so it counts those small memory regions that reside in a page area. But, when the kernel was using walk_system_ram_range() in fill_up_crash_elf_data() to count the number of crash memory regions, it filters out small regions. I printed those small memory regions, for example: kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0 Based on the code in walk_system_ram_range(), this memory region will be filtered out: pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593 end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593 end_pfn - pfn = 0x77593 - 0x77593 = 0 <=== if (end_pfn > pfn) is FALSE So, the max_nr_ranges that's counted by the kernel doesn't include small memory regions - causing us to under-allocate the required space. That causes the page fault crash that happens in a later code path when preparing ELF headers. This bug is not easy to reproduce on small machines that have few CPUs, because the allocated page aligned ELF buffer has more free space to cover those small memory regions' PT_LOAD headers. Signed-off-by: Lee, Chun-Yi <jlee@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: kexec@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-01Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "12 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: dmapool: fix overflow condition in pool_find_page() thermal: avoid division by zero in power allocator memcg: remove pcp_counter_lock kprobes: use _do_fork() in samples to make them work again drivers/input/joystick/Kconfig: zhenhua.c needs BITREVERSE memcg: make mem_cgroup_read_stat() unsigned memcg: fix dirty page migration dax: fix NULL pointer in __dax_pmd_fault() mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1) userfaultfd: remove kernel header include from uapi header arch/x86/include/asm/efi.h: fix build failure
2015-10-01arch/x86/include/asm/efi.h: fix build failureAndrey Ryabinin
With KMEMCHECK=y, KASAN=n: arch/x86/platform/efi/efi.c:673:3: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration] arch/x86/platform/efi/efi_64.c:139:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration] arch/x86/include/asm/desc.h:121:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration] Don't #undef memcpy if KASAN=n. Fixes: 769a8089c1fd ("x86, efi, kasan: #undef memset/memcpy/memmove per arch") Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "(Relatively) a lot of reverts, mostly. Bugs have trickled in for a new feature in 4.2 (MTRR support in guests) so I'm reverting it all; let's not make this -rc period busier for KVM than it's been so far. This covers the four reverts from me. The fifth patch is being reverted because Radim found a bug in the implementation of stable scheduler clock, *but* also managed to implement the feature entirely without hypervisor support. So instead of fixing the hypervisor side we can remove it completely; 4.4 will get the new implementation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS Update KVM homepage Url Revert "KVM: SVM: use NPT page attributes" Revert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask" Revert "KVM: SVM: Sync g_pat with guest-written PAT value" Revert "KVM: x86: apply guest MTRR virtualization on host reserved pages" Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR"
2015-10-01Use WARN_ON_ONCE for missing X86_FEATURE_NRIPSDirk Müller
The cpu feature flags are not ever going to change, so warning everytime can cause a lot of kernel log spam (in our case more than 10GB/hour). The warning seems to only occur when nested virtualization is enabled, so it's probably triggered by a KVM bug. This is a sensible and safe change anyway, and the KVM bug fix might not be suitable for stable releases anyway. Cc: stable@vger.kernel.org Signed-off-by: Dirk Mueller <dmueller@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01Revert "KVM: SVM: use NPT page attributes"Paolo Bonzini
This reverts commit 3c2e7f7de3240216042b61073803b61b9b3cfb22. Initializing the mapping from MTRR to PAT values was reported to fail nondeterministically, and it also caused extremely slow boot (due to caching getting disabled---bug 103321) with assigned devices. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Sebastian Schuette <dracon@ewetel.net> Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01Revert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask"Paolo Bonzini
This reverts commit 5492830370171b6a4ede8a3bfba687a8d0f25fa5. It builds on the commit that is being reverted next. Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01Revert "KVM: SVM: Sync g_pat with guest-written PAT value"Paolo Bonzini
This reverts commit e098223b789b4a618dacd79e5e0dad4a9d5018d1, which has a dependency on other commits being reverted. Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01Revert "KVM: x86: apply guest MTRR virtualization on host reserved pages"Paolo Bonzini
This reverts commit fd717f11015f673487ffc826e59b2bad69d20fe5. It was reported to cause Machine Check Exceptions (bug 104091). Reported-by: harn-solo@gmx.de Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, ↵Matt Fleming
instead of top-down Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that signals that the firmware PE/COFF loader supports splitting code and data sections of PE/COFF images into separate EFI memory map entries. This allows the kernel to map those regions with strict memory protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc. Unfortunately, an unwritten requirement of this new feature is that the regions need to be mapped with the same offsets relative to each other as observed in the EFI memory map. If this is not done crashes like this may occur, BUG: unable to handle kernel paging request at fffffffefe6086dd IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd Call Trace: [<ffffffff8104c90e>] efi_call+0x7e/0x100 [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90 [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70 [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392 [<ffffffff81f37e1b>] start_kernel+0x38a/0x417 [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef Here 0xfffffffefe6086dd refers to an address the firmware expects to be mapped but which the OS never claimed was mapped. The issue is that included in these regions are relative addresses to other regions which were emitted by the firmware toolchain before the "splitting" of sections occurred at runtime. Needless to say, we don't satisfy this unwritten requirement on x86_64 and instead map the EFI memory map entries in reverse order. The above crash is almost certainly triggerable with any kernel newer than v3.13 because that's when we rewrote the EFI runtime region mapping code, in commit d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping"). For kernel versions before v3.13 things may work by pure luck depending on the fragmentation of the kernel virtual address space at the time we map the EFI regions. Instead of mapping the EFI memory map entries in reverse order, where entry N has a higher virtual address than entry N+1, map them in the same order as they appear in the EFI memory map to preserve this relative offset between regions. This patch has been kept as small as possible with the intention that it should be applied aggressively to stable and distribution kernels. It is very much a bugfix rather than support for a new feature, since when EFI_PROPERTIES_TABLE is enabled we must map things as outlined above to even boot - we have no way of asking the firmware not to split the code/data regions. In fact, this patch doesn't even make use of the more strict memory protections available in UEFI v2.5. That will come later. Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Chun-Yi <jlee@suse.com> Cc: Dave Young <dyoung@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: James Bottomley <JBottomley@Odin.com> Cc: Lee, Chun-Yi <jlee@suse.com> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-01Merge remote-tracking branch 'tglx/x86/urgent' into x86/urgentIngo Molnar
Pick up the WCHAN fixes from Thomas. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-30x86/process: Unify 32bit and 64bit implementations of get_wchan()Thomas Gleixner
The stack layout and the functionality is identical. Use the 64bit version for all of x86. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: kasan-dev <kasan-dev@googlegroups.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de> Link: http://lkml.kernel.org/r/20150930083302.779694618@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-30x86/process: Add proper bound checks in 64bit get_wchan()Thomas Gleixner
Dmitry Vyukov reported the following using trinity and the memory error detector AddressSanitizer (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel). [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on address ffff88002e280000 [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a) [ 124.578633] Accessed by thread T10915: [ 124.579295] inlined in describe_heap_address ./arch/x86/mm/asan/report.c:164 [ 124.579295] #0 ffffffff810dd277 in asan_report_error ./arch/x86/mm/asan/report.c:278 [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region ./arch/x86/mm/asan/asan.c:37 [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0 [ 124.581893] #3 ffffffff8107c093 in get_wchan ./arch/x86/kernel/process_64.c:444 The address checks in the 64bit implementation of get_wchan() are wrong in several ways: - The lower bound of the stack is not the start of the stack page. It's the start of the stack page plus sizeof (struct thread_info) - The upper bound must be: top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long). The 2 * sizeof(unsigned long) is required because the stack pointer points at the frame pointer. The layout on the stack is: ... IP FP ... IP FP. So we need to make sure that both IP and FP are in the bounds. Fix the bound checks and get rid of the mix of numeric constants, u64 and unsigned long. Making all unsigned long allows us to use the same function for 32bit as well. Use READ_ONCE() when accessing the stack. This does not prevent a concurrent wakeup of the task and the stack changing, but at least it avoids TOCTOU. Also check task state at the end of the loop. Again that does not prevent concurrent changes, but it avoids walking for nothing. Add proper comments while at it. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: kasan-dev <kasan-dev@googlegroups.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-30x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernelsAndrey Ryabinin
With KMEMCHECK=y, KASAN=n we get this build failure: arch/x86/platform/efi/efi.c:673:3: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration] arch/x86/platform/efi/efi_64.c:139:2: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration] arch/x86/include/asm/desc.h:121:2: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration] Don't #undef memcpy if KASAN=n. Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 769a8089c1fd ("x86, efi, kasan: #undef memset/memcpy/memmove per arch") Link: http://lkml.kernel.org/r/1443544814-20122-1-git-send-email-ryabinin.a.a@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-30Merge tag 'v4.3-rc3' into x86/urgent, before applying dependent fixIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-30x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE caseVitaly Kuznetsov
Recent changes in the Hyper-V driver: b4370df2b1f5 ("Drivers: hv: vmbus: add special crash handler") broke the build when CONFIG_KEXEC_CORE is not set: arch/x86/built-in.o: In function `hv_machine_crash_shutdown': arch/x86/kernel/cpu/mshyperv.c:112: undefined reference to `native_machine_crash_shutdown' Decorate all kexec related code with #ifdef CONFIG_KEXEC_CORE. Reported-by: Jim Davis <jim.epost@gmail.com> Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: devel@linuxdriverproject.org Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1443002577-25370-1-git-send-email-vkuznets@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-28Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock ↵Radim Krčmář
system MSR" Shifting pvclock_vcpu_time_info.system_time on write to KVM system time MSR is a change of ABI. Probably only 2.6.16 based SLES 10 breaks due to its custom enhancements to kvmclock, but KVM never declared the MSR only for one-shot initialization. (Doc says that only one write is needed.) This reverts commit b7e60c5aedd2b63f16ef06fde4f81ca032211bc5. And adds a note to the definition of PVCLOCK_COUNTS_FROM_ZERO. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-27Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Another pile of fixes for perf: - Plug overflows and races in the core code - Sanitize the flow of the perf syscall so we error out before handling the more complex and hard to undo setups - Improve and fix Broadwell and Skylake hardware support - Revert a fix which broke what it tried to fix in perf tools - A couple of smaller fixes in various places of perf tools" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Fix copying of /proc/kcore perf intel-pt: Remove no_force_psb from documentation perf probe: Use existing routine to look for a kernel module by dso->short_name perf/x86: Change test_aperfmperf() and test_intel() to static tools lib traceevent: Fix string handling in heterogeneous arch environments perf record: Avoid infinite loop at buildid processing with no samples perf: Fix races in computing the header sizes perf: Fix u16 overflows perf: Restructure perf syscall point of no return perf/x86/intel: Fix Skylake FRONTEND MSR extrareg mask perf/x86/intel/pebs: Add PEBS frontend profiling for Skylake perf/x86/intel: Make the CYCLE_ACTIVITY.* constraint on Broadwell more specific perf tools: Bool functions shouldn't return -1 tools build: Add test for presence of __get_cpuid() gcc builtin tools build: Add test for presence of numa_num_possible_cpus() in libnuma Revert "perf symbols: Fix mismatched declarations for elf_getphdrnum" perf stat: Fix per-pkg event reporting bug
2015-09-27Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two bugfixes from Andy addressing at least some of the subtle NMI related wreckage which has been reported by Sasha Levin" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/nmi/64: Fix a paravirt stack-clobbering bug in the NMI code x86/paravirt: Replace the paravirt nop with a bona fide empty function
2015-09-25Merge tag 'pci-v4.3-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "These are fixes for things we merged for v4.3 (VPD, MSI, and bridge window management), and a new Renesas R8A7794 SoC device ID. Details: Resource management: - Revert pci_read_bridge_bases() unification (Bjorn Helgaas) - Clear IORESOURCE_UNSET when clipping a bridge window (Bjorn Helgaas) MSI: - Fix MSI IRQ domains for VFs on virtual buses (Alex Williamson) Renesas R-Car host bridge driver: - Add R8A7794 support (Sergei Shtylyov) Miscellaneous: - Fix devfn for VPD access through function 0 (Alex Williamson) - Use function 0 VPD only for identical functions (Alex Williamson)" * tag 'pci-v4.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: rcar: Add R8A7794 support PCI: Use function 0 VPD for identical functions, regular VPD for others PCI: Fix devfn for VPD access through function 0 PCI/MSI: Fix MSI IRQ domains for VFs on virtual buses PCI: Clear IORESOURCE_UNSET when clipping a bridge window PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code"
2015-09-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "AMD fixes for bugs introduced in the 4.2 merge window, and a few PPC bug fixes too" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: disable halt_poll_ns as default for s390x KVM: x86: fix off-by-one in reserved bits check KVM: x86: use correct page table format to check nested page table reserved bits KVM: svm: do not call kvm_set_cr0 from init_vmcb KVM: x86: trap AMD MSRs for the TSeg base and mask KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store() KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs kvm: svm: reset mmu on VCPU reset
2015-09-25KVM: disable halt_poll_ns as default for s390xDavid Hildenbrand
We observed some performance degradation on s390x with dynamic halt polling. Until we can provide a proper fix, let's enable halt_poll_ns as default only for supported architectures. Architectures are now free to set their own halt_poll_ns default value. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-25KVM: x86: fix off-by-one in reserved bits checkPaolo Bonzini
29ecd6601904 ("KVM: x86: avoid uninitialized variable warning", 2015-09-06) introduced a not-so-subtle problem, which probably escaped review because it was not part of the patch context. Before the patch, leaf was always equal to iterator.level. After, it is equal to iterator.level - 1 in the call to is_shadow_zero_bits_set, and when is_shadow_zero_bits_set does another "-1" the check on reserved bits becomes incorrect. Using "iterator.level" in the call fixes this call trace: WARNING: CPU: 2 PID: 17000 at arch/x86/kvm/mmu.c:3385 handle_mmio_page_fault.part.93+0x1a/0x20 [kvm]() Modules linked in: tun sha256_ssse3 sha256_generic drbg binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kvm_amd kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd fam15h_power amd64_edac_mod k10temp edac_core amdkfd amd_iommu_v2 radeon acpi_cpufreq [...] Call Trace: dump_stack+0x4e/0x84 warn_slowpath_common+0x95/0xe0 warn_slowpath_null+0x1a/0x20 handle_mmio_page_fault.part.93+0x1a/0x20 [kvm] tdp_page_fault+0x231/0x290 [kvm] ? emulator_pio_in_out+0x6e/0xf0 [kvm] kvm_mmu_page_fault+0x36/0x240 [kvm] ? svm_set_cr0+0x95/0xc0 [kvm_amd] pf_interception+0xde/0x1d0 [kvm_amd] handle_exit+0x181/0xa70 [kvm_amd] ? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm] kvm_arch_vcpu_ioctl_run+0x6f6/0x1730 [kvm] ? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm] ? preempt_count_sub+0x9b/0xf0 ? mutex_lock_killable_nested+0x26f/0x490 ? preempt_count_sub+0x9b/0xf0 kvm_vcpu_ioctl+0x358/0x710 [kvm] ? __fget+0x5/0x210 ? __fget+0x101/0x210 do_vfs_ioctl+0x2f4/0x560 ? __fget_light+0x29/0x90 SyS_ioctl+0x4c/0x90 entry_SYSCALL_64_fastpath+0x16/0x73 ---[ end trace 37901c8686d84de6 ]--- Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-25KVM: x86: use correct page table format to check nested page table reserved bitsPaolo Bonzini
Intel CPUID on AMD host or vice versa is a weird case, but it can happen. Handle it by checking the host CPU vendor instead of the guest's in reset_tdp_shadow_zero_bits_mask. For speed, the check uses the fact that Intel EPT has an X (executable) bit while AMD NPT has NX. Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-25KVM: svm: do not call kvm_set_cr0 from init_vmcbPaolo Bonzini
kvm_set_cr0 may want to call kvm_zap_gfn_range and thus access the memslots array (SRCU protected). Using a mini SRCU critical section is ugly, and adding it to kvm_arch_vcpu_create doesn't work because the VMX vcpu_create callback calls synchronize_srcu. Fixes this lockdep splat: =============================== [ INFO: suspicious RCU usage. ] 4.3.0-rc1+ #1 Not tainted ------------------------------- include/linux/kvm_host.h:488 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-i38/17000: #0: (&(&kvm->mmu_lock)->rlock){+.+...}, at: kvm_zap_gfn_range+0x24/0x1a0 [kvm] [...] Call Trace: dump_stack+0x4e/0x84 lockdep_rcu_suspicious+0xfd/0x130 kvm_zap_gfn_range+0x188/0x1a0 [kvm] kvm_set_cr0+0xde/0x1e0 [kvm] init_vmcb+0x760/0xad0 [kvm_amd] svm_create_vcpu+0x197/0x250 [kvm_amd] kvm_arch_vcpu_create+0x47/0x70 [kvm] kvm_vm_ioctl+0x302/0x7e0 [kvm] ? __lock_is_held+0x51/0x70 ? __fget+0x101/0x210 do_vfs_ioctl+0x2f4/0x560 ? __fget_light+0x29/0x90 SyS_ioctl+0x4c/0x90 entry_SYSCALL_64_fastpath+0x16/0x73 Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-25perf/x86: Change test_aperfmperf() and test_intel() to staticGeliang Tang
Fixes the following sparse warnings: arch/x86/kernel/cpu/perf_event_msr.c:13:6: warning: symbol 'test_aperfmperf' was not declared. Should it be static? arch/x86/kernel/cpu/perf_event_msr.c:18:6: warning: symbol 'test_intel' was not declared. Should it be static? Signed-off-by: Geliang Tang <geliangtang@163.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4588e8ab09638458f2451af572827108be3b4a36.1443123796.git.geliangtang@163.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-23x86/cpufeatures: Correct spelling of the HWP_NOTIFY flagKristen Carlson Accardi
Because noitification just isn't right. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/1442944296-11737-1-git-send-email-kristen@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-22x86, efi, kasan: #undef memset/memcpy/memmove per archAndrey Ryabinin
In not-instrumented code KASAN replaces instrumented memset/memcpy/memmove with not-instrumented analogues __memset/__memcpy/__memove. However, on x86 the EFI stub is not linked with the kernel. It uses not-instrumented mem*() functions from arch/x86/boot/compressed/string.c So we don't replace them with __mem*() variants in EFI stub. On ARM64 the EFI stub is linked with the kernel, so we should replace mem*() functions with __mem*(), because the EFI stub runs before KASAN sets up early shadow. So let's move these #undef mem* into arch's asm/efi.h which is also included by the EFI stub. Also, this will fix the warning in 32-bit build reported by kbuild test robot: efi-stub-helper.c:599:2: warning: implicit declaration of function 'memcpy' [akpm@linux-foundation.org: use 80 cols in comment] Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reported-by: Fengguang Wu <fengguang.wu@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-22x86/nmi/64: Fix a paravirt stack-clobbering bug in the NMI codeAndy Lutomirski
The NMI entry code that switches to the normal kernel stack needs to be very careful not to clobber any extra stack slots on the NMI stack. The code is fine under the assumption that SWAPGS is just a normal instruction, but that assumption isn't really true. Use SWAPGS_UNSAFE_STACK instead. This is part of a fix for some random crashes that Sasha saw. Fixes: 9b6e6a8334d5 ("x86/nmi/64: Switch stacks on userspace NMI entry") Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/974bc40edffdb5c2950a5c4977f821a446b76178.1442791737.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22x86/paravirt: Replace the paravirt nop with a bona fide empty functionAndy Lutomirski
PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an example, trimmed for readability): ff 15 00 00 00 00 callq *0x0(%rip) # 2796 <nmi+0x6> 2792: R_X86_64_PC32 pv_irq_ops+0x2c That's a call through a function pointer to regular C function that does nothing on native boots, but that function isn't protected against kprobes, isn't marked notrace, and is certainly not guaranteed to preserve any registers if the compiler is feeling perverse. This is bad news for a CLBR_NONE operation. Of course, if everything works correctly, once paravirt ops are patched, it gets nopped out, but what if we hit this code before paravirt ops are patched in? This can potentially cause breakage that is very difficult to debug. A more subtle failure is possible here, too: if _paravirt_nop uses the stack at all (even just to push RBP), it will overwrite the "NMI executing" variable if it's called in the NMI prologue. The Xen case, perhaps surprisingly, is fine, because it's already written in asm. Fix all of the cases that default to paravirt_nop (including adjust_exception_frame) with a big hammer: replace paravirt_nop with an asm function that is just a ret instruction. The Xen case may have other problems, so document them. This is part of a fix for some random crashes that Sasha saw. Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-21KVM: x86: trap AMD MSRs for the TSeg base and maskPaolo Bonzini
These have roughly the same purpose as the SMRR, which we do not need to implement in KVM. However, Linux accesses MSR_K8_TSEG_ADDR at boot, which causes problems when running a Xen dom0 under KVM. Just return 0, meaning that processor protection of SMRAM is not in effect. Reported-by: M A Young <m.a.young@durham.ac.uk> Cc: stable@vger.kernel.org Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Mostly stable material, a lot of ARM fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) sched: access local runqueue directly in single_task_running arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS' arm64: KVM: Remove all traces of the ThumbEE registers arm: KVM: Disable virtual timer even if the guest is not using it arm64: KVM: Disable virtual timer even if the guest is not using it arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources KVM: s390: Replace incorrect atomic_or with atomic_andnot arm: KVM: Fix incorrect device to IPA mapping arm64: KVM: Fix user access for debug registers KVM: vmx: fix VPID is 0000H in non-root operation KVM: add halt_attempted_poll to VCPU stats kvm: fix zero length mmio searching kvm: fix double free for fast mmio eventfd kvm: factor out core eventfd assign/deassign logic kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd KVM: make the declaration of functions within 80 characters KVM: arm64: add workaround for Cortex-A57 erratum #852523 KVM: fix polling for guest halt continued even if disable it arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores arm64: KVM: set {v,}TCR_EL2 RES1 bits ...
2015-09-18Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "This is a rather large update post rc1 due to the final steps of cleanups and API changes which had to wait for the preparatory patches to hit your tree. - Regression fixes for ARM GIC irqchips - Regression fixes and lockdep anotations for renesas irq chips - The leftovers of the cleanup and preparatory patches which have been ignored by maintainers - Final conversions of the newly merged users of obsolete APIs - Final removal of obsolete APIs - Final removal of ARM artifacts which had been introduced during the conversion of ARM to the generic interrupt code. - Final split of the irq_data into chip specific and common data to reflect the needs of hierarchical irq domains. - Treewide removal of the first argument of interrupt flow handlers, i.e. the irq number, which is not used by the majority of handlers and simple to retrieve from the other argument the irq descriptor. - A few comment updates and build warning fixes" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) arm64: Remove ununsed set_irq_flags ARM: Remove ununsed set_irq_flags sh: Kill off set_irq_flags usage irqchip: Kill off set_irq_flags usage gpu/drm: Kill off set_irq_flags usage genirq: Remove irq argument from irq flow handlers genirq: Move field 'msi_desc' from irq_data into irq_common_data genirq: Move field 'affinity' from irq_data into irq_common_data genirq: Move field 'handler_data' from irq_data into irq_common_data genirq: Move field 'node' from irq_data into irq_common_data irqchip/gic-v3: Use IRQD_FORWARDED_TO_VCPU flag irqchip/gic: Use IRQD_FORWARDED_TO_VCPU flag genirq: Provide IRQD_FORWARDED_TO_VCPU status flag genirq: Simplify irq_data_to_desc() genirq: Remove __irq_set_handler_locked() pinctrl/pistachio: Use irq_set_handler_locked gpio: vf610: Use irq_set_handler_locked powerpc/mpc8xx: Use irq_set_handler_locked() powerpc/ipic: Use irq_set_handler_locked() powerpc/cpm2: Use irq_set_handler_locked() ...
2015-09-18Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "A single regression fix for the x86 dma allocator which got wreckaged in the merge window" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pci/dma: Fix gfp flags for coherent DMA memory allocation
2015-09-18kvm: svm: reset mmu on VCPU resetIgor Mammedov
When INIT/SIPI sequence is sent to VCPU which before that was in use by OS, VMRUN might fail with: KVM: entry failed, hardware error 0xffffffff EAX=00000000 EBX=00000000 ECX=00000000 EDX=000006d3 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=00000000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =9a00 0009a000 0000ffff 00009a00 [...] CR0=60000010 CR2=b6f3e000 CR3=01942000 CR4=000007e0 [...] EFER=0000000000000000 with corresponding SVM error: KVM: FAILED VMRUN WITH VMCB: [...] cpl: 0 efer: 0000000000001000 cr0: 0000000080010010 cr2: 00007fd7fe85bf90 cr3: 0000000187d0c000 cr4: 0000000000000020 [...] What happens is that VCPU state right after offlinig: CR0: 0x80050033 EFER: 0xd01 CR4: 0x7e0 -> long mode with CR3 pointing to longmode page tables and when VCPU gets INIT/SIPI following transition happens CR0: 0 -> 0x60000010 EFER: 0x0 CR4: 0x7e0 -> paging disabled with stale CR3 However SVM under the hood puts VCPU in Paged Real Mode* which effectively translates CR0 0x60000010 -> 80010010 after svm_vcpu_reset() -> init_vmcb() -> kvm_set_cr0() -> svm_set_cr0() but from kvm_set_cr0() perspective CR0: 0 -> 0x60000010 only caching bits are changed and commit d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")' regressed svm_vcpu_reset() which relied on MMU being reset. As result VMRUN after svm_vcpu_reset() tries to run VCPU in Paged Real Mode with stale MMU context (longmode page tables), which causes some AMD CPUs** to bail out with VMEXIT_INVALID. Fix issue by unconditionally resetting MMU context at init_vmcb() time. * AMD64 Architecture Programmer’s Manual, Volume 2: System Programming, rev: 3.25 15.19 Paged Real Mode ** Opteron 1216 Signed-off-by: Igor Mammedov <imammedo@redhat.com> Fixes: d81135a57aa6 Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-18perf/x86/intel: Fix Skylake FRONTEND MSR extrareg maskAndi Kleen
Stephane pointed out that the extrareg mask was one bit too short. The bubble width field was truncated by one bit. Fix that here. Also add some extra comments on the reserved bits inside the event select code. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1441835640-21347-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-18perf/x86/intel/pebs: Add PEBS frontend profiling for SkylakeAndi Kleen
Skylake has a new FRONTEND_LATENCY PEBS event to accurately profile frontend problems (like ITLB or decoding issues). The new event is configured through a separate MSR, which selects a range of sub events. Define the extra MSR as a extra reg and export support for it through sysfs. To avoid duplicating the existing tables use a new function to add new entries to existing tables. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1435707205-6676-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-18perf/x86/intel: Make the CYCLE_ACTIVITY.* constraint on Broadwell more specificAndi Kleen
The counter constraint for CYCLE_ACTIVITY.* on Broadwell covered all CYCLE_ACTIVITY.* sub events, and forced them on counter 2. But actually only one sub event (umask 8) needs to be on counter 2, all others do not have any constraint. Only force that subevent. This fixes groups with multiple CYCLE_ACTIVITY.* events, for example: % perf stat -x, -e '{cpu/event=0xa3,umask=0x6,cmask=6/,\ cpu/event=0xa2,umask=0x8/,\ cpu/event=0xa3,umask=0x4,cmask=4/,cpu/event=0xb1,umask=0x1,cmask=1/}' true 122150,,cpu/event=0xa3,umask=0x6,cmask=6/,846486,100.00 16483,,cpu/event=0xa2,umask=0x8/,846486,100.00 252280,,cpu/event=0xa3,umask=0x4,cmask=4/,846486,100.00 233604,,cpu/event=0xb1,umask=0x1,cmask=1/,846486,100.00 % Without this patch the third result would be <unsupported> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1442267222-16464-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-17Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - misc fixes all around the map - block non-root vm86(old) if mmap_min_addr != 0 - two small debuggability improvements - removal of obsolete paravirt op * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform: Fix Geode LX timekeeping in the generic x86 build x86/apic: Serialize LVTT and TSC_DEADLINE writes x86/ioapic: Force affinity setting in setup_ioapic_dest() x86/paravirt: Remove the unused pv_time_ops::get_tsc_khz method x86/ldt: Fix small LDT allocation for Xen x86/vm86: Fix the misleading CONFIG_VM86 Kconfig help text x86/cpu: Print family/model/stepping in hex x86/vm86: Block non-root vm86(old) if mmap_min_addr != 0 x86/alternatives: Make optimize_nops() interrupt safe and synced x86/mm/srat: Print non-volatile flag in SRAT x86/cpufeatures: Enable cpuid for Intel SHA extensions
2015-09-17Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo MOlnar: "Mostly tooling fixes, but also two x86 PMU driver fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tests: Fix software clock events test setting maps perf tests: Fix task exit test setting maps perf evlist: Fix create_syswide_maps() not propagating maps perf evlist: Fix add() not propagating maps perf evlist: Factor out a function to propagate maps for a single evsel perf evlist: Make create_maps() use set_maps() perf evlist: Make set_maps() more resilient perf evsel: Add own_cpus member perf evlist: Fix missing thread_map__put in propagate_maps() perf evlist: Fix splice_list_tail() not setting evlist perf evlist: Add has_user_cpus member perf evlist: Remove redundant validation from propagate_maps() perf evlist: Simplify set_maps() logic perf evlist: Simplify propagate_maps() logic perf top: Fix segfault pressing -> with no hist entries perf header: Fixup reading of HEADER_NRCPUS feature perf/x86/intel: Fix constraint access perf/x86/intel/bts: Set event->hw.itrace_started in pmu::start to match the new logic perf tools: Fix use of wrong event when processing exit events perf tools: Fix parse_events_add_pmu caller
2015-09-17Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Spinlock performance regression fix, plus documentation fixes" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/static_keys: Fix up the static keys documentation locking/qspinlock/x86: Only emit the test-and-set fallback when building guest support locking/qspinlock/x86: Fix performance regression under unaccelerated VMs locking/static_keys: Fix a silly typo
2015-09-17x86/pci/dma: Fix gfp flags for coherent DMA memory allocationJunichi Nomura
Commit 6894258eda2f reversed the order of gfp_flags adjustment in dma_alloc_attrs() for x86 [arch/x86/kernel/pci-dma.c] As a result, relevant flags set by dma_alloc_coherent_gfp_flags() are just discarded and cause coherent DMA memory allocation failure on some devices. Fixes: 6894258eda2f ("dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}") Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Tested-by: Tony Luck <tony.luck@intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20150914073834.GA13077@xzibit.linux.bs1.fc.nec.co.jp Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-16x86/platform: Fix Geode LX timekeeping in the generic x86 buildDavid Woodhouse
In 2007, commit 07190a08eef36 ("Mark TSC on GeodeLX reliable") bypassed verification of the TSC on Geode LX. However, this code (now in the check_system_tsc_reliable() function in arch/x86/kernel/tsc.c) was only present if CONFIG_MGEODE_LX was set. OpenWRT has recently started building its generic Geode target for Geode GX, not LX, to include support for additional platforms. This broke the timekeeping on LX-based devices, because the TSC wasn't marked as reliable: https://dev.openwrt.org/ticket/20531 By adding a runtime check on is_geode_lx(), we can also include the fix if CONFIG_MGEODEGX1 or CONFIG_X86_GENERIC are set, thus fixing the problem. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: Andres Salomon <dilinger@queued.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <marcelo@kvack.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1442409003.131189.87.camel@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-16genirq: Remove irq argument from irq flow handlersThomas Gleixner
Most interrupt flow handlers do not use the irq argument. Those few which use it can retrieve the irq number from the irq descriptor. Remove the argument. Search and replace was done with coccinelle and some extra helper scripts around it. Thanks to Julia for her help! Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Jiang Liu <jiang.liu@linux.intel.com>
2015-09-16genirq: Move field 'affinity' from irq_data into irq_common_dataJiang Liu
Irq affinity mask is per-irq instead of per irqchip, so move it into struct irq_common_data. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Kevin Cernekee <cernekee@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Link: http://lkml.kernel.org/r/1433303281-27688-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-16KVM: vmx: fix VPID is 0000H in non-root operationWanpeng Li
Reference SDM 28.1: The current VPID is 0000H in the following situations: - Outside VMX operation. (This includes operation in system-management mode under the default treatment of SMIs and SMM with VMX operation; see Section 34.14.) - In VMX root operation. - In VMX non-root operation when the “enable VPID” VM-execution control is 0. The VPID should never be 0000H in non-root operation when "enable VPID" VM-execution control is 1. However, commit 34a1cd60 ("kvm: x86: vmx: move some vmx setting from vmx_init() to hardware_setup()") remove the codes which reserve 0000H for VMX root operation. This patch fix it by again reserving 0000H for VMX root operation. Cc: stable@vger.kernel.org # 3.19+ Fixes: 34a1cd60d17f62c1f077c1478a6c2ca8c3d17af4 Reported-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>