summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2018-11-27x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang ↵Matthias Kaehlcke
incompatibility commit 121843eb02a6e2fa30aefab64bfe183c97230c75 upstream. The constraint "rm" allows the compiler to put mix_const into memory. When the input operand is a memory location then MUL needs an operand size suffix, since Clang can't infer the multiplication width from the operand. Add and use the _ASM_MUL macro which determines the operand size and resolves to the NUL instruction with the corresponding suffix. This fixes the following error when building with clang: CC arch/x86/lib/kaslr.o /tmp/kaslr-dfe1ad.s: Assembler messages: /tmp/kaslr-dfe1ad.s:182: Error: no instruction mnemonic suffix given and no register operands; can't size instruction Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Cc: Grant Grundler <grundler@chromium.org> Cc: Greg Hackmann <ghackmann@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Davidson <md@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170501224741.133938-1-mka@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> [nc: Apply to aslr.c in get_random_long as the kaslr shift didn't happen until 4.8 in commit d899a7d146a2] Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-10x86/percpu: Fix this_cpu_read()Peter Zijlstra
commit b59167ac7bafd804c91e49ad53c6d33a7394d4c8 upstream. Eric reported that a sequence count loop using this_cpu_read() got optimized out. This is wrong, this_cpu_read() must imply READ_ONCE() because the interface is IRQ-safe, therefore an interrupt can have changed the per-cpu value. Fixes: 7c3576d261ce ("[PATCH] i386: Convert PDA into the percpu section") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Eric Dumazet <edumazet@google.com> Cc: hpa@zytor.com Cc: eric.dumazet@gmail.com Cc: bp@alien8.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181011104019.748208519@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-20x86/fpu: Finish excising 'eagerfpu'Andy Lutomirski
commit e63650840e8b053aa09ad934877e87e9941ed135 upstream. Now that eagerfpu= is gone, remove it from the docs and some comments. Also sync the changes to tools/. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/cf430dd4481d41280e93ac6cf0def1007a67fc8e.1476740397.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-20x86/fpu: Remove struct fpu::counterRik van Riel
commit 3913cc3507575273beb165a5e027a081913ed507 upstream. With the lazy FPU code gone, we no longer use the counter field in struct fpu for anything. Get rid it. Signed-off-by: Rik van Riel <riel@redhat.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: pbonzini@redhat.com Link: http://lkml.kernel.org/r/1475627678-20788-6-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-20x86/fpu: Remove use_eager_fpu()Andy Lutomirski
commit c592b57347069abfc0dcad3b3a302cf882602597 upstream. This removes all the obvious code paths that depend on lazy FPU mode. It shouldn't change the generated code at all. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: pbonzini@redhat.com Link: http://lkml.kernel.org/r/1475627678-20788-5-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-20KVM: x86: remove eager_fpu field of struct kvm_vcpu_archPaolo Bonzini
commit 5a5fbdc0e3f1159a734f1890da60fce70e98271d upstream. It is now equal to use_eager_fpu(), which simply tests a cpufeature bit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+Andi Kleen
[upstream cc51e5428ea54f575d49cfcede1d4cb3a72b4ec4 for 4.4. Note there might be still a trivial conflict with the backport for b0a182f875689647b014bc01d36b340217792852, but should be easy to resolve] On Nehalem and newer core CPUs the CPU cache internally uses 44 bits physical address space. The L1TF workaround is limited by this internal cache address width, and needs to have one bit free there for the mitigation to work. Older client systems report only 36bit physical address space so the range check decides that L1TF is not mitigated for a 36bit phys/32GB system with some memory holes. But since these actually have the larger internal cache width this warning is bogus because it would only really be needed if the system had more than 43bits of memory. Add a new internal x86_cache_bits field. Normally it is the same as the physical bits field reported by CPUID, but for Nehalem and newerforce it to be at least 44bits. Change the L1TF memory size warning to use the new cache_bits field to avoid bogus warnings and remove the bogus comment about memory size. Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf") Reported-by: George Anchev <studio@anchev.net> Reported-by: Christopher Snowhill <kode54@gmail.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Cc: Michael Hocko <mhocko@suse.com> Cc: vbabka@suse.cz Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180824170351.34874-1-andi@firstfloor.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-15x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clearJuergen Gross
commit b2d7a075a1ccef2fb321d595802190c8e9b39004 upstream. Using only 32-bit writes for the pte will result in an intermediate L1TF vulnerable PTE. When running as a Xen PV guest this will at once switch the guest to shadow mode resulting in a loss of performance. Use arch_atomic64_xchg() instead which will perform the requested operation atomically with all 64 bits. Some performance considerations according to: https://software.intel.com/sites/default/files/managed/ad/dc/Intel-Xeon-Scalable-Processor-throughput-latency.pdf The main number should be the latency, as there is no tight loop around native_ptep_get_and_clear(). "lock cmpxchg8b" has a latency of 20 cycles, while "lock xchg" (with a memory operand) isn't mentioned in that document. "lock xadd" (with xadd having 3 cycles less latency than xchg) has a latency of 11, so we can assume a latency of 14 for "lock xchg". Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jan Beulich <jbeulich@suse.com> Tested-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [ Atomic operations gained an arch_ prefix in 8bf705d13039 ("locking/atomic/x86: Switch atomic.h to use atomic-instrumented.h") so s/arch_atomic64_xchg/atomic64_xchg/ for backport.] Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-15x86/speculation/l1tf: Fix up pte->pfn conversion for PAEMichal Hocko
commit e14d7dfb41f5807a0c1c26a13f2b8ef16af24935 upstream. Jan has noticed that pte_pfn and co. resp. pfn_pte are incorrect for CONFIG_PAE because phys_addr_t is wider than unsigned long and so the pte_val reps. shift left would get truncated. Fix this up by using proper types. [Just one chunk, again, needed here. Thanks to Ben and Guenter for finding and fixing this. - gregkh] Fixes: 6b28baca9b1f ("x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation") Reported-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09x86/io: add interface to reserve io memtype for a resource range. (v1.1)Dave Airlie
commit 8ef4227615e158faa4ee85a1d6466782f7e22f2f upstream. A recent change to the mm code in: 87744ab3832b mm: fix cache mode tracking in vm_insert_mixed() started enforcing checking the memory type against the registered list for amixed pfn insertion mappings. It happens that the drm drivers for a number of gpus relied on this being broken. Currently the driver only inserted VRAM mappings into the tracking table when they came from the kernel, and userspace mappings never landed in the table. This led to a regression where all the mapping end up as UC instead of WC now. I've considered a number of solutions but since this needs to be fixed in fixes and not next, and some of the solutions were going to introduce overhead that hadn't been there before I didn't consider them viable at this stage. These mainly concerned hooking into the TTM io reserve APIs, but these API have a bunch of fast paths I didn't want to unwind to add this to. The solution I've decided on is to add a new API like the arch_phys_wc APIs (these would have worked but wc_del didn't take a range), and use them from the drivers to add a WC compatible mapping to the table for all VRAM on those GPUs. This means we can then create userspace mapping that won't get degraded to UC. v1.1: use CONFIG_X86_PAT + add some comments in io.h Cc: Toshi Kani <toshi.kani@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: x86@kernel.org Cc: mcgrof@suse.com Cc: Dan Williams <dan.j.williams@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05x86/irqflags: Mark native_restore_fl extern inlineNick Desaulniers
commit 1f59a4581b5ecfe9b4f049a7a2cf904d8352842d upstream. This should have been marked extern inline in order to pick up the out of line definition in arch/x86/kernel/irqflags.S. Fixes: 208cbb325589 ("x86/irqflags: Provide a declaration for native_save_fl") Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180827214011.55428-1-ndesaulniers@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05x86/speculation/l1tf: Fix off-by-one error when warning that system has too ↵Vlastimil Babka
much RAM commit b0a182f875689647b014bc01d36b340217792852 upstream. Two users have reported [1] that they have an "extremely unlikely" system with more than MAX_PA/2 memory and L1TF mitigation is not effective. In fact it's a CPU with 36bits phys limit (64GB) and 32GB memory, but due to holes in the e820 map, the main region is almost 500MB over the 32GB limit: [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000081effffff] usable Suggestions to use 'mem=32G' to enable the L1TF mitigation while losing the 500MB revealed, that there's an off-by-one error in the check in l1tf_select_mitigation(). l1tf_pfn_limit() returns the last usable pfn (inclusive) and the range check in the mitigation path does not take this into account. Instead of amending the range check, make l1tf_pfn_limit() return the first PFN which is over the limit which is less error prone. Adjust the other users accordingly. [1] https://bugzilla.suse.com/show_bug.cgi?id=1105536 Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf") Reported-by: George Anchev <studio@anchev.net> Reported-by: Christopher Snowhill <kode54@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180823134418.17008-1-vbabka@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05x86/speculation/l1tf: Fix overflow in l1tf_pfn_limit() on 32bitVlastimil Babka
commit 9df9516940a61d29aedf4d91b483ca6597e7d480 upstream. On 32bit PAE kernels on 64bit hardware with enough physical bits, l1tf_pfn_limit() will overflow unsigned long. This in turn affects max_swapfile_size() and can lead to swapon returning -EINVAL. This has been observed in a 32bit guest with 42 bits physical address size, where max_swapfile_size() overflows exactly to 1 << 32, thus zero, and produces the following warning to dmesg: [ 6.396845] Truncating oversized swap area, only using 0k out of 2047996k Fix this by using unsigned long long instead. Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf") Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2") Reported-by: Dominique Leuenberger <dimstar@suse.de> Reported-by: Adrian Schroeter <adrian@suse.de> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180820095835.5298-1-vbabka@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-28x86/mm: Fix use-after-free of ldt_structEric Biggers
commit ccd5b3235180eef3cfec337df1c8554ab151b5cc upstream. The following commit: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") renamed init_new_context() to init_new_context_ldt() and added a new init_new_context() which calls init_new_context_ldt(). However, the error code of init_new_context_ldt() was ignored. Consequently, if a memory allocation in alloc_ldt_struct() failed during a fork(), the ->context.ldt of the new task remained the same as that of the old task (due to the memcpy() in dup_mm()). ldt_struct's are not intended to be shared, so a use-after-free occurred after one task exited. Fix the bug by making init_new_context() pass through the error code of init_new_context_ldt(). This bug was found by syzkaller, which encountered the following splat: BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710 CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x24e/0x340 mm/kasan/report.c:409 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429 free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] exec_mmap fs/exec.c:1061 [inline] flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291 load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855 search_binary_handler+0x142/0x6b0 fs/exec.c:1652 exec_binprm fs/exec.c:1694 [inline] do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816 do_execve+0x31/0x40 fs/exec.c:1860 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Allocated by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627 kmalloc include/linux/slab.h:493 [inline] alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67 write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277 sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xca/0x250 mm/slab.c:3820 free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] __mmput kernel/fork.c:916 [inline] mmput+0x541/0x6e0 kernel/fork.c:927 copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931 copy_process kernel/fork.c:1546 [inline] _do_fork+0x1ef/0xfb0 kernel/fork.c:2025 SYSC_clone kernel/fork.c:2135 [inline] SyS_clone+0x37/0x50 kernel/fork.c:2129 do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287 return_from_SYSCALL_64+0x0/0x7a Here is a C reproducer: #include <asm/ldt.h> #include <pthread.h> #include <signal.h> #include <stdlib.h> #include <sys/syscall.h> #include <sys/wait.h> #include <unistd.h> static void *fork_thread(void *_arg) { fork(); } int main(void) { struct user_desc desc = { .entry_number = 8191 }; syscall(__NR_modify_ldt, 1, &desc, sizeof(desc)); for (;;) { if (fork() == 0) { pthread_t t; srand(getpid()); pthread_create(&t, NULL, fork_thread, NULL); usleep(rand() % 10000); syscall(__NR_exit_group, 0); } wait(NULL); } } Note: the reproducer takes advantage of the fact that alloc_ldt_struct() may use vmalloc() to allocate a large ->entries array, and after commit: 5d17a73a2ebe ("vmalloc: back off when the current task is killed") it is possible for userspace to fail a task's vmalloc() by sending a fatal signal, e.g. via exit_group(). It would be more difficult to reproduce this bug on kernels without that commit. This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> [v4.6+] Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-22x86/mm: Simplify p[g4um]d_page() macrosTom Lendacky
commit fd7e315988b784509ba3f1b42f539bd0b1fca9bb upstream. Create a pgd_pfn() macro similar to the p[4um]d_pfn() macros and then use the p[g4um]d_pfn() macros in the p[g4um]d_page() macros instead of duplicating the code. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/e61eb533a6d0aac941db2723d8aa63ef6b882dee.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> [Backported to 4.9 stable by AK, suggested by Michael Hocko] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-18x86/speculation/l1tf: Exempt zeroed PTEs from inversionSean Christopherson
commit f19f5c49bbc3ffcc9126cc245fc1b24cc29f4a37 upstream. It turns out that we should *not* invert all not-present mappings, because the all zeroes case is obviously special. clear_page() does not undergo the XOR logic to invert the address bits, i.e. PTE, PMD and PUD entries that have not been individually written will have val=0 and so will trigger __pte_needs_invert(). As a result, {pte,pmd,pud}_pfn() will return the wrong PFN value, i.e. all ones (adjusted by the max PFN mask) instead of zero. A zeroed entry is ok because the page at physical address 0 is reserved early in boot specifically to mitigate L1TF, so explicitly exempt them from the inversion when reading the PFN. Manifested as an unexpected mprotect(..., PROT_NONE) failure when called on a VMA that has VM_PFNMAP and was mmap'd to as something other than PROT_NONE but never used. mprotect() sends the PROT_NONE request down prot_none_walk(), which walks the PTEs to check the PFNs. prot_none_pte_entry() gets the bogus PFN from pte_pfn() and returns -EACCES because it thinks mprotect() is trying to adjust a high MMIO address. [ This is a very modified version of Sean's original patch, but all credit goes to Sean for doing this and also pointing out that sometimes the __pte_needs_invert() function only gets the protection bits, not the full eventual pte. But zero remains special even in just protection bits, so that's ok. - Linus ] Fixes: f22cc87f6c1f ("x86/speculation/l1tf: Invert all not present mappings") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/speculation/l1tf: Fix up CPU feature flagsGuenter Roeck
In linux-4.4.y, the definition of X86_FEATURE_RETPOLINE and X86_FEATURE_RETPOLINE_AMD is different from the upstream definition. Result is an overlap with the newly introduced X86_FEATURE_L1TF_PTEINV. Update RETPOLINE definitions to match upstream definitions to improve alignment with upstream code. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/mm/pat: Make set_memory_np() L1TF safeAndi Kleen
commit 958f79b9ee55dfaf00c8106ed1c22a2919e0028b upstream set_memory_np() is used to mark kernel mappings not present, but it has it's own open coded mechanism which does not have the L1TF protection of inverting the address bits. Replace the open coded PTE manipulation with the L1TF protecting low level PTE routines. Passes the CPA self test. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ dwmw2: Pull in pud_mkhuge() from commit a00cc7d9dd, and pfn_pud() ] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> [groeck: port to 4.4] Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/speculation/l1tf: Make pmd/pud_mknotpresent() invertAndi Kleen
commit 0768f91530ff46683e0b372df14fd79fe8d156e5 upstream Some cases in THP like: - MADV_FREE - mprotect - split mark the PMD non present for temporarily to prevent races. The window for an L1TF attack in these contexts is very small, but it wants to be fixed for correctness sake. Use the proper low level functions for pmd/pud_mknotpresent() to address this. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/speculation/l1tf: Invert all not present mappingsAndi Kleen
commit f22cc87f6c1f771b57c407555cfefd811cdd9507 upstream For kernel mappings PAGE_PROTNONE is not necessarily set for a non present mapping, but the inversion logic explicitely checks for !PRESENT and PROT_NONE. Remove the PROT_NONE check and make the inversion unconditional for all not present mappings. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/speculation/l1tf: Fix up pte->pfn conversion for PAEMichal Hocko
commit e14d7dfb41f5807a0c1c26a13f2b8ef16af24935 upstream Jan has noticed that pte_pfn and co. resp. pfn_pte are incorrect for CONFIG_PAE because phys_addr_t is wider than unsigned long and so the pte_val reps. shift left would get truncated. Fix this up by using proper types. [dwmw2: Backport to 4.9] Fixes: 6b28baca9b1f ("x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation") Reported-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/speculation/l1tf: Protect PAE swap entries against L1TFVlastimil Babka
commit 0d0f6249058834ffe1ceaad0bb31464af66f6e7a upstream The PAE 3-level paging code currently doesn't mitigate L1TF by flipping the offset bits, and uses the high PTE word, thus bits 32-36 for type, 37-63 for offset. The lower word is zeroed, thus systems with less than 4GB memory are safe. With 4GB to 128GB the swap type selects the memory locations vulnerable to L1TF; with even more memory, also the swap offfset influences the address. This might be a problem with 32bit PAE guests running on large 64bit hosts. By continuing to keep the whole swap entry in either high or low 32bit word of PTE we would limit the swap size too much. Thus this patch uses the whole PAE PTE with the same layout as the 64bit version does. The macros just become a bit tricky since they assume the arch-dependent swp_entry_t to be 32bit. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/cpufeatures: Add detection of L1D cache flush support.Konrad Rzeszutek Wilk
commit 11e34e64e4103955fc4568750914c75d65ea87ee upstream 336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR (IA32_FLUSH_CMD) which is detected by CPUID.7.EDX[28]=1 bit being set. This new MSR "gives software a way to invalidate structures with finer granularity than other architectual methods like WBINVD." A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199511 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappingsAndi Kleen
commit 42e4089c7890725fcd329999252dc489b72f2921 upstream For L1TF PROT_NONE mappings are protected by inverting the PFN in the page table entry. This sets the high bits in the CPU's address space, thus making sure to point to not point an unmapped entry to valid cached memory. Some server system BIOSes put the MMIO mappings high up in the physical address space. If such an high mapping was mapped to unprivileged users they could attack low memory by setting such a mapping to PROT_NONE. This could happen through a special device driver which is not access protected. Normal /dev/mem is of course access protected. To avoid this forbid PROT_NONE mappings or mprotect for high MMIO mappings. Valid page mappings are allowed because the system is then unsafe anyways. It's not expected that users commonly use PROT_NONE on MMIO. But to minimize any impact this is only enforced if the mapping actually refers to a high MMIO address (defined as the MAX_PA-1 bit being set), and also skip the check for root. For mmaps this is straight forward and can be handled in vm_insert_pfn and in remap_pfn_range(). For mprotect it's a bit trickier. At the point where the actual PTEs are accessed a lot of state has been changed and it would be difficult to undo on an error. Since this is a uncommon case use a separate early page talk walk pass for MMIO PROT_NONE mappings that checks for this condition early. For non MMIO and non PROT_NONE there are no changes. [dwmw2: Backport to 4.9] [groeck: Backport to 4.4] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/speculation/l1tf: Add sysfs reporting for l1tfAndi Kleen
commit 17dbca119312b4e8173d4e25ff64262119fcef38 upstream L1TF core kernel workarounds are cheap and normally always enabled, However they still should be reported in sysfs if the system is vulnerable or mitigated. Add the necessary CPU feature/bug bits. - Extend the existing checks for Meltdowns to determine if the system is vulnerable. All CPUs which are not vulnerable to Meltdown are also not vulnerable to L1TF - Check for 32bit non PAE and emit a warning as there is no practical way for mitigation due to the limited physical address bits - If the system has more than MAX_PA/2 physical memory the invert page workarounds don't protect the system against the L1TF attack anymore, because an inverted physical address will also point to valid memory. Print a warning in this case and report that the system is vulnerable. Add a function which returns the PFN limit for the L1TF mitigation, which will be used in follow up patches for sanity and range checks. [ tglx: Renamed the CPU feature bit to L1TF_PTEINV ] [ dwmw2: Backport to 4.9 (cpufeatures.h, E820) ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/speculation/l1tf: Protect PROT_NONE PTEs against speculationAndi Kleen
commit 6b28baca9b1f0d4a42b865da7a05b1c81424bd5c upstream When PTEs are set to PROT_NONE the kernel just clears the Present bit and preserves the PFN, which creates attack surface for L1TF speculation speculation attacks. This is important inside guests, because L1TF speculation bypasses physical page remapping. While the host has its own migitations preventing leaking data from other VMs into the guest, this would still risk leaking the wrong page inside the current guest. This uses the same technique as Linus' swap entry patch: while an entry is is in PROTNONE state invert the complete PFN part part of it. This ensures that the the highest bit will point to non existing memory. The invert is done by pte/pmd_modify and pfn/pmd/pud_pte for PROTNONE and pte/pmd/pud_pfn undo it. This assume that no code path touches the PFN part of a PTE directly without using these primitives. This doesn't handle the case that MMIO is on the top of the CPU physical memory. If such an MMIO region was exposed by an unpriviledged driver for mmap it would be possible to attack some real memory. However this situation is all rather unlikely. For 32bit non PAE the inversion is not done because there are really not enough bits to protect anything. Q: Why does the guest need to be protected when the HyperVisor already has L1TF mitigations? A: Here's an example: Physical pages 1 2 get mapped into a guest as GPA 1 -> PA 2 GPA 2 -> PA 1 through EPT. The L1TF speculation ignores the EPT remapping. Now the guest kernel maps GPA 1 to process A and GPA 2 to process B, and they belong to different users and should be isolated. A sets the GPA 1 PA 2 PTE to PROT_NONE to bypass the EPT remapping and gets read access to the underlying physical page. Which in this case points to PA 2, so it can read process B's data, if it happened to be in L1, so isolation inside the guest is broken. There's nothing the hypervisor can do about this. This mitigation has to be done in the guest itself. [ tglx: Massaged changelog ] [ dwmw2: backported to 4.9 ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/speculation/l1tf: Protect swap entries against L1TFLinus Torvalds
commit 2f22b4cd45b67b3496f4aa4c7180a1271c6452f6 upstream With L1 terminal fault the CPU speculates into unmapped PTEs, and resulting side effects allow to read the memory the PTE is pointing too, if its values are still in the L1 cache. For swapped out pages Linux uses unmapped PTEs and stores a swap entry into them. To protect against L1TF it must be ensured that the swap entry is not pointing to valid memory, which requires setting higher bits (between bit 36 and bit 45) that are inside the CPUs physical address space, but outside any real memory. To do this invert the offset to make sure the higher bits are always set, as long as the swap file is not too big. Note there is no workaround for 32bit !PAE, or on systems which have more than MAX_PA/2 worth of memory. The later case is very unlikely to happen on real systems. [AK: updated description and minor tweaks by. Split out from the original patch ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/speculation/l1tf: Change order of offset/type in swap entryLinus Torvalds
commit bcd11afa7adad8d720e7ba5ef58bdcd9775cf45f upstream If pages are swapped out, the swap entry is stored in the corresponding PTE, which has the Present bit cleared. CPUs vulnerable to L1TF speculate on PTE entries which have the present bit set and would treat the swap entry as phsyical address (PFN). To mitigate that the upper bits of the PTE must be set so the PTE points to non existent memory. The swap entry stores the type and the offset of a swapped out page in the PTE. type is stored in bit 9-13 and offset in bit 14-63. The hardware ignores the bits beyond the phsyical address space limit, so to make the mitigation effective its required to start 'offset' at the lowest possible bit so that even large swap offsets do not reach into the physical address space limit bits. Move offset to bit 9-58 and type to bit 59-63 which are the bits that hardware generally doesn't care about. That, in turn, means that if you on desktop chip with only 40 bits of physical addressing, now that the offset starts at bit 9, there needs to be 30 bits of offset actually *in use* until bit 39 ends up being set, which means when inverted it will again point into existing memory. So that's 4 terabyte of swap space (because the offset is counted in pages, so 30 bits of offset is 42 bits of actual coverage). With bigger physical addressing, that obviously grows further, until the limit of the offset is hit (at 50 bits of offset - 62 bits of actual swap file coverage). This is a preparatory change for the actual swap entry inversion to protect against L1TF. [ AK: Updated description and minor tweaks. Split into two parts ] [ tglx: Massaged changelog ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1Naoya Horiguchi
commit eee4818baac0f2b37848fdf90e4b16430dc536ac upstream _PAGE_PSE is used to distinguish between a truly non-present (_PAGE_PRESENT=0) PMD, and a PMD which is undergoing a THP split and should be treated as present. But _PAGE_SWP_SOFT_DIRTY currently uses the _PAGE_PSE bit, which would cause confusion between one of those PMDs undergoing a THP split, and a soft-dirty PMD. Dropping _PAGE_PSE check in pmd_present() does not work well, because it can hurt optimization of tlb handling in thp split. Thus, we need to move the bit. In the current kernel, bits 1-4 are not used in non-present format since commit 00839ee3b299 ("x86/mm: Move swap offset/type up in PTE to work around erratum"). So let's move _PAGE_SWP_SOFT_DIRTY to bit 1. Bit 7 is used as reserved (always clear), so please don't use it for other purpose. [dwmw2: Pulled in to 4.9 backport to support L1TF changes] Link: http://lkml.kernel.org/r/20170717193955.20207-3-zi.yan@sent.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: David Nellans <dnellans@nvidia.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/mm: Fix swap entry comment and macroDave Hansen
commit ace7fab7a6cdd363a615ec537f2aa94dbc761ee2 upstream A recent patch changed the format of a swap PTE. The comment explaining the format of the swap PTE is wrong about the bits used for the swap type field. Amusingly, the ASCII art and the patch description are correct, but the comment itself is wrong. As I was looking at this, I also noticed that the SWP_OFFSET_FIRST_BIT has an off-by-one error. This does not really hurt anything. It just wasted a bit of space in the PTE, giving us 2^59 bytes of addressable space in our swapfiles instead of 2^60. But, it doesn't match with the comments, and it wastes a bit of space, so fix it. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Fixes: 00839ee3b299 ("x86/mm: Move swap offset/type up in PTE to work around erratum") Link: http://lkml.kernel.org/r/20160810172325.E56AD7DA@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/mm: Move swap offset/type up in PTE to work around erratumDave Hansen
commit 00839ee3b299303c6a5e26a0a2485427a3afcbbf upstream This erratum can result in Accessed/Dirty getting set by the hardware when we do not expect them to be (on !Present PTEs). Instead of trying to fix them up after this happens, we just allow the bits to get set and try to ignore them. We do this by shifting the layout of the bits we use for swap offset/type in our 64-bit PTEs. It looks like this: bitnrs: | ... | 11| 10| 9|8|7|6|5| 4| 3|2|1|0| names: | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P| before: | OFFSET (9-63) |0|X|X| TYPE(1-5) |0| after: | OFFSET (14-63) | TYPE (9-13) |0|X|X|X| X| X|X|X|0| Note that D was already a don't care (X) even before. We just move TYPE up and turn its old spot (which could be hit by the A bit) into all don't cares. We take 5 bits away from the offset, but that still leaves us with 50 bits which lets us index into a 62-bit swapfile (4 EiB). I think that's probably fine for the moment. We could theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it doesn't gain us anything. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: dave.hansen@intel.com Cc: linux-mm@kvack.org Cc: mhocko@suse.com Link: http://lkml.kernel.org/r/20160708001911.9A3FD2B6@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFTAndi Kleen
commit 50896e180c6aa3a9c61a26ced99e15d602666a4c upstream L1 Terminal Fault (L1TF) is a speculation related vulnerability. The CPU speculates on PTE entries which do not have the PRESENT bit set, if the content of the resulting physical address is available in the L1D cache. The OS side mitigation makes sure that a !PRESENT PTE entry points to a physical address outside the actually existing and cachable memory space. This is achieved by inverting the upper bits of the PTE. Due to the address space limitations this only works for 64bit and 32bit PAE kernels, but not for 32bit non PAE. This mitigation applies to both host and guest kernels, but in case of a 64bit host (hypervisor) and a 32bit PAE guest, inverting the upper bits of the PAE address space (44bit) is not enough if the host has more than 43 bits of populated memory address space, because the speculation treats the PTE content as a physical host address bypassing EPT. The host (hypervisor) protects itself against the guest by flushing L1D as needed, but pages inside the guest are not protected against attacks from other processes inside the same guest. For the guest the inverted PTE mask has to match the host to provide the full protection for all pages the host could possibly map into the guest. The hosts populated address space is not known to the guest, so the mask must cover the possible maximal host address space, i.e. 52 bit. On 32bit PAE the maximum PTE mask is currently set to 44 bit because that is the limit imposed by 32bit unsigned long PFNs in the VMs. This limits the mask to be below what the host could possible use for physical pages. The L1TF PROT_NONE protection code uses the PTE masks to determine which bits to invert to make sure the higher bits are set for unmapped entries to prevent L1TF speculation attacks against EPT inside guests. In order to invert all bits that could be used by the host, increase __PHYSICAL_PAGE_SHIFT to 52 to match 64bit. The real limit for a 32bit PAE kernel is still 44 bits because all Linux PTEs are created from unsigned long PFNs, so they cannot be higher than 44 bits on a 32bit kernel. So these extra PFN bits should be never set. The only users of this macro are using it to look at PTEs, so it's safe. [ tglx: Massaged changelog ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15x86/irqflags: Provide a declaration for native_save_flNick Desaulniers
commit 208cbb32558907f68b3b2a081ca2337ac3744794 upstream. It was reported that the commit d0a8d9378d16 is causing users of gcc < 4.9 to observe -Werror=missing-prototypes errors. Indeed, it seems that: extern inline unsigned long native_save_fl(void) { return 0; } compiled with -Werror=missing-prototypes produces this warning in gcc < 4.9, but not gcc >= 4.9. Fixes: d0a8d9378d16 ("x86/paravirt: Make native_save_fl() extern inline"). Reported-by: David Laight <david.laight@aculab.com> Reported-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Cc: jgross@suse.com Cc: kstewart@linuxfoundation.org Cc: gregkh@linuxfoundation.org Cc: boris.ostrovsky@oracle.com Cc: astrachan@google.com Cc: mka@chromium.org Cc: arnd@arndb.de Cc: tstellar@redhat.com Cc: sedat.dilek@gmail.com Cc: David.Laight@aculab.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180803170550.164688-1-ndesaulniers@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/bugs: Rename SSBD_NO to SSB_NOKonrad Rzeszutek Wilk
commit 240da953fcc6a9008c92fae5b1f727ee5ed167ab upstream The "336996 Speculative Execution Side Channel Mitigations" from May defines this as SSB_NO, hence lets sync-up. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFGThomas Gleixner
commit 47c61b3955cf712cadfc25635bf9bc174af030ea upstream Add the necessary logic for supporting the emulated VIRT_SPEC_CTRL MSR to x86_virt_spec_ctrl(). If either X86_FEATURE_LS_CFG_SSBD or X86_FEATURE_VIRT_SPEC_CTRL is set then use the new guest_virt_spec_ctrl argument to check whether the state must be modified on the host. The update reuses speculative_store_bypass_update() so the ZEN-specific sibling coordination can be reused. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/bugs: Remove x86_spec_ctrl_set()Thomas Gleixner
commit 4b59bdb569453a60b752b274ca61f009e37f4dae upstream x86_spec_ctrl_set() is only used in bugs.c and the extra mask checks there provide no real value as both call sites can just write x86_spec_ctrl_base to MSR_SPEC_CTRL. x86_spec_ctrl_base is valid and does not need any extra masking or checking. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/bugs: Expose x86_spec_ctrl_base directlyThomas Gleixner
commit fa8ac4988249c38476f6ad678a4848a736373403 upstream x86_spec_ctrl_base is the system wide default value for the SPEC_CTRL MSR. x86_spec_ctrl_get_default() returns x86_spec_ctrl_base and was intended to prevent modification to that variable. Though the variable is read only after init and globaly visible already. Remove the function and export the variable instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/bugs: Unify x86_spec_ctrl_{set_guest, restore_host}Borislav Petkov
commit cc69b34989210f067b2c51d5539b5f96ebcc3a01 upstream Function bodies are very similar and are going to grow more almost identical code. Add a bool arg to determine whether SPEC_CTRL is being set for the guest or restored to the host. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/speculation: Rework speculative_store_bypass_update()Thomas Gleixner
commit 0270be3e34efb05a88bc4c422572ece038ef3608 upstream The upcoming support for the virtual SPEC_CTRL MSR on AMD needs to reuse speculative_store_bypass_update() to avoid code duplication. Add an argument for supplying a thread info (TIF) value and create a wrapper speculative_store_bypass_update_current() which is used at the existing call site. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/speculation: Add virtualized speculative store bypass disable supportTom Lendacky
commit 11fb0683493b2da112cd64c9dada221b52463bf7 upstream Some AMD processors only support a non-architectural means of enabling speculative store bypass disable (SSBD). To allow a simplified view of this to a guest, an architectural definition has been created through a new CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a hypervisor can virtualize the existence of this definition and provide an architectural method for using SSBD to a guest. Add the new CPUID feature, the new MSR and update the existing SSBD support to use this MSR when present. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRLThomas Gleixner
commit ccbcd2674472a978b48c91c1fbfb66c0ff959f24 upstream AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care about the bit position of the SSBD bit and thus facilitate migration. Also, the sibling coordination on Family 17H CPUs can only be done on the host. Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an extra argument for the VIRT_SPEC_CTRL MSR. Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU data structure which is going to be used in later patches for the actual implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [ Srivatsa: Backported to 4.4.y, skipping the KVM changes in this patch. ] Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/speculation: Handle HT correctly on AMDThomas Gleixner
commit 1f50ddb4f4189243c05926b842dc1a0332195f31 upstream The AMD64_LS_CFG MSR is a per core MSR on Family 17H CPUs. That means when hyperthreading is enabled the SSBD bit toggle needs to take both cores into account. Otherwise the following situation can happen: CPU0 CPU1 disable SSB disable SSB enable SSB <- Enables it for the Core, i.e. for CPU0 as well So after the SSB enable on CPU1 the task on CPU0 runs with SSB enabled again. On Intel the SSBD control is per core as well, but the synchronization logic is implemented behind the per thread SPEC_CTRL MSR. It works like this: CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL i.e. if one of the threads enables a mitigation then this affects both and the mitigation is only disabled in the core when both threads disabled it. Add the necessary synchronization logic for AMD family 17H. Unfortunately that requires a spinlock to serialize the access to the MSR, but the locks are only shared between siblings. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/cpufeatures: Add FEATURE_ZENThomas Gleixner
commit d1035d971829dcf80e8686ccde26f94b0a069472 upstream Add a ZEN feature bit so family-dependent static_cpu_has() optimizations can be built for ZEN. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/cpufeatures: Disentangle SSBD enumerationThomas Gleixner
commit 52817587e706686fcdb27f14c1b000c92f266c96 upstream The SSBD enumeration is similarly to the other bits magically shared between Intel and AMD though the mechanisms are different. Make X86_FEATURE_SSBD synthetic and set it depending on the vendor specific features or family dependent setup. Change the Intel bit to X86_FEATURE_SPEC_CTRL_SSBD to denote that SSBD is controlled via MSR_SPEC_CTRL and fix up the usage sites. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRSThomas Gleixner
commit 7eb8956a7fec3c1f0abc2a5517dada99ccc8a961 upstream The availability of the SPEC_CTRL MSR is enumerated by a CPUID bit on Intel and implied by IBRS or STIBP support on AMD. That's just confusing and in case an AMD CPU has IBRS not supported because the underlying problem has been fixed but has another bit valid in the SPEC_CTRL MSR, the thing falls apart. Add a synthetic feature bit X86_FEATURE_MSR_SPEC_CTRL to denote the availability on both Intel and AMD. While at it replace the boot_cpu_has() checks with static_cpu_has() where possible. This prevents late microcode loading from exposing SPEC_CTRL, but late loading is already very limited as it does not reevaluate the mitigation options and other bits and pieces. Having static_cpu_has() is the simplest and least fragile solution. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/speculation: Use synthetic bits for IBRS/IBPB/STIBPBorislav Petkov
commit e7c587da125291db39ddf1f49b18e5970adbac17 upstream Intel and AMD have different CPUID bits hence for those use synthetic bits which get set on the respective vendor's in init_speculation_control(). So that debacles like what the commit message of c65732e4f721 ("x86/cpu: Restore CPUID_8000_0008_EBX reload") talks about don't happen anymore. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Jörg Otte <jrg.otte@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20180504161815.GG9257@pd.tnic Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [ Srivatsa: Backported to 4.4.y, skipping the KVM changes in this patch. ] Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/cpu: Make alternative_msr_write work for 32-bit codeJim Mattson
commit 5f2b745f5e1304f438f9b2cd03ebc8120b6e0d3b upstream Cast val and (val >> 32) to (u32), so that they fit in a general-purpose register in both 32-bit and 64-bit code. [ tglx: Made it u32 instead of uintptr_t ] Fixes: c65732e4f721 ("x86/cpu: Restore CPUID_8000_0008_EBX reload") Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/bugs: Rename _RDS to _SSBDKonrad Rzeszutek Wilk
commit 9f65fb29374ee37856dbad847b4e121aab72b510 upstream Intel collateral will reference the SSB mitigation bit in IA32_SPEC_CTL[2] as SSBD (Speculative Store Bypass Disable). Hence changing it. It is unclear yet what the MSR_IA32_ARCH_CAPABILITIES (0x10a) Bit(4) name is going to be. Following the rename it would be SSBD_NO but that rolls out to Speculative Store Bypass Disable No. Also fixed the missing space in X86_FEATURE_AMD_SSBD. [ tglx: Fixup x86_amd_rds_enable() and rds_tif_to_amd_ls_cfg() as well ] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [ Srivatsa: Backported to 4.4.y, skipping the KVM changes in this patch. ] Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/speculation: Make "seccomp" the default mode for Speculative Store BypassKees Cook
commit f21b53b20c754021935ea43364dbf53778eeba32 upstream Unless explicitly opted out of, anything running under seccomp will have SSB mitigations enabled. Choosing the "prctl" mode will disable this. [ tglx: Adjusted it to the new arch_seccomp_spec_mitigate() mechanism ] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25x86/speculation: Add prctl for Speculative Store Bypass mitigationThomas Gleixner
commit a73ec77ee17ec556fe7f165d00314cb7c047b1ac upstream Add prctl based control for Speculative Store Bypass mitigation and make it the default mitigation for Intel and AMD. Andi Kleen provided the following rationale (slightly redacted): There are multiple levels of impact of Speculative Store Bypass: 1) JITed sandbox. It cannot invoke system calls, but can do PRIME+PROBE and may have call interfaces to other code 2) Native code process. No protection inside the process at this level. 3) Kernel. 4) Between processes. The prctl tries to protect against case (1) doing attacks. If the untrusted code can do random system calls then control is already lost in a much worse way. So there needs to be system call protection in some way (using a JIT not allowing them or seccomp). Or rather if the process can subvert its environment somehow to do the prctl it can already execute arbitrary code, which is much worse than SSB. To put it differently, the point of the prctl is to not allow JITed code to read data it shouldn't read from its JITed sandbox. If it already has escaped its sandbox then it can already read everything it wants in its address space, and do much worse. The ability to control Speculative Store Bypass allows to enable the protection selectively without affecting overall system performance. Based on an initial patch from Tim Chen. Completely rewritten. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Matt Helsley (VMware) <matt.helsley@gmail.com> Reviewed-by: Alexey Makhalov <amakhalov@vmware.com> Reviewed-by: Bo Gan <ganb@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>