summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/entry_64.S
AgeCommit message (Collapse)Author
2015-06-03x86/asm/entry: Move entry_64.S and entry_32.S to arch/x86/entry/Ingo Molnar
Create a new directory hierarchy for the low level x86 entry code: arch/x86/entry/* This will host all the low level glue that is currently scattered all across arch/x86/. Start with entry_64.S and entry_32.S. Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02x86/asm/entry/64: Fold identical code pathsJan Beulich
retint_kernel doesn't require %rcx to be pointing to thread info (anymore?), and the code on the two alternative paths is - not really surprisingly - identical. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/556C664F020000780007FB64@mail.emea.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02x86/debug: Remove perpetually broken, unmaintainable dwarf annotationsIngo Molnar
So the dwarf2 annotations in low level assembly code have become an increasing hindrance: unreadable, messy macros mixed into some of the most security sensitive code paths of the Linux kernel. These debug info annotations don't even buy the upstream kernel anything: dwarf driven stack unwinding has caused problems in the past so it's out of tree, and the upstream kernel only uses the much more robust framepointers based stack unwinding method. In addition to that there's a steady, slow bitrot going on with these annotations, requiring frequent fixups. There's no tooling and no functionality upstream that keeps it correct. So burn down the sick forest, allowing new, healthier growth: 27 files changed, 350 insertions(+), 1101 deletions(-) Someone who has the willingness and time to do this properly can attempt to reintroduce dwarf debuginfo in x86 assembly code plus dwarf unwinding from first principles, with the following conditions: - it should be maximally readable, and maximally low-key to 'ordinary' code reading and maintenance. - find a build time method to insert dwarf annotations automatically in the most common cases, for pop/push instructions that manipulate the stack pointer. This could be done for example via a preprocessing step that just looks for common patterns - plus special annotations for the few cases where we want to depart from the default. We have hundreds of CFI annotations, so automating most of that makes sense. - it should come with build tooling checks that ensure that CFI annotations are sensible. We've seen such efforts from the framepointer side, and there's no reason it couldn't be done on the dwarf side. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-17x86/asm/entry/64: Use shorter MOVs from segment registersDenys Vlasenko
The "movw %ds,%cx" instruction needs a 0x66 prefix, while "movl %ds,%ecx" does not. The difference is that latter form (on 64-bit CPUs) overwrites the entire %ecx, not only its lower half. But subsequent code doesn't depend on the value of upper half of %ecx, so we can safely use the shorter instruction. The new code is also faster than the old one - now we don't depend on the old value of %ecx, but this code fragment is not performance-critical so it does not matter much. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1431722346-26585-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08x86/entry: Define 'cpu_current_top_of_stack' for 64-bit codeDenys Vlasenko
32-bit code has PER_CPU_VAR(cpu_current_top_of_stack). 64-bit code uses somewhat more obscure: PER_CPU_VAR(cpu_tss + TSS_sp0). Define the 'cpu_current_top_of_stack' macro on CONFIG_X86_64 as well so that the PER_CPU_VAR(cpu_current_top_of_stack) expression can be used in both 32-bit and 64-bit code. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1429889495-27850-3-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08x86/entry: Stop using PER_CPU_VAR(kernel_stack)Denys Vlasenko
PER_CPU_VAR(kernel_stack) is redundant: - On the 64-bit build, we can use PER_CPU_VAR(cpu_tss + TSS_sp0). - On the 32-bit build, we can use PER_CPU_VAR(cpu_current_top_of_stack). PER_CPU_VAR(kernel_stack) will be deleted by a separate change. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1429889495-27850-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08Merge branch 'linus' into x86/asm, before applying dependent patchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08x86/asm/entry/64: Clean up usage of TEST insnsDenys Vlasenko
By the nature of TEST operation, it is often possible to test a narrower part of the operand: "testl $3, mem" -> "testb $3, mem" This results in shorter insns, because TEST insn has no sign-entending byte-immediate forms unlike other ALU ops. text data bss dec hex filename 11674 0 0 11674 2d9a entry_64.o.before 11658 0 0 11658 2d8a entry_64.o Changes in object code: - f7 84 24 88 00 00 00 03 00 00 00 testl $0x3,0x88(%rsp) + f6 84 24 88 00 00 00 03 testb $0x3,0x88(%rsp) - f7 44 24 68 03 00 00 00 testl $0x3,0x68(%rsp) + f6 44 24 68 03 testb $0x3,0x68(%rsp) - f7 84 24 90 00 00 00 03 00 00 00 testl $0x3,0x90(%rsp) + f6 84 24 90 00 00 00 03 testb $0x3,0x90(%rsp) Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1430140912-7960-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08x86/asm/entry/64: Tidy up JZ insns after TESTsDenys Vlasenko
After TESTs, use logically correct JZ/JNZ mnemonics instead of JE/JNE. This doesn't change code. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1430140912-7960-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-26x86_64, asm: Work around AMD SYSRET SS descriptor attribute issueAndy Lutomirski
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with SS == 0 results in an invalid usermode state in which SS is apparently equal to __USER_DS but causes #SS if used. Work around the issue by setting SS to __KERNEL_DS __switch_to, thus ensuring that SYSRET never happens with SS set to NULL. This was exposed by a recent vDSO cleanup. Fixes: e7d6eefaaa44 x86/vdso32/syscall.S: Do not load __USER32_DS to %ss Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Brian Gerst <brgerst@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-22x86/asm/entry/64: Merge 32-bit execve stubs with x32 ones, as they are identicalDenys Vlasenko
Run-tested. Suggested-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1429632194-13445-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-22x86/asm/entry/64: Implement better check for canonical addressesDenys Vlasenko
This change makes the check exact (no more false positives on "negative" addresses). Andy explains: "Canonical addresses either start with 17 zeros or 17 ones. In the old code, we checked that the top (64-47) = 17 bits were all zero. We did this by shifting right by 47 bits and making sure that nothing was left. In the new code, we're shifting left by (64 - 48) = 16 bits and then signed shifting right by the same amount, this propagating the 17th highest bit to all positions to its left. If we get the same value we started with, then we're good to go." While it isn't really important to be fully correct here - almost all addresses we'll ever see will be userspace ones, but OTOH it looks to be cheap enough: the new code uses two more ALU ops but preserves %rcx, allowing to not reload it from pt_regs->cx again. On disassembly level, the changes are: cmp %rcx,0x80(%rsp) -> mov 0x80(%rsp),%r11; cmp %rcx,%r11 shr $0x2f,%rcx -> shl $0x10,%rcx; sar $0x10,%rcx; cmp %rcx,%r11 mov 0x58(%rsp),%rcx -> (eliminated) Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1429633649-20169-1-git-send-email-dvlasenk@redhat.com [ Changelog massage. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09x86/asm/entry/64: Reduce padding in execve stubsDenys Vlasenko
execve stubs are 7 bytes only. Padding them to 16 bytes is a waste. text data bss dec hex filename 12594 0 0 12594 3132 entry_64.o.before 12530 0 0 12530 30f2 entry_64.o Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-8-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09x86/asm/entry/64: Remove GET_THREAD_INFO() in ret_from_forkDenys Vlasenko
It used to be used to check for _TIF_IA32, but the check has been removed. Remove GET_THREAD_INFO() too. Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-7-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09x86/asm/entry/64: Simplify jumps in ret_from_forkDenys Vlasenko
Replace test jz 1f jmp label 1: with test jnz label Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-6-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09x86/asm/entry/64: Remove a redundant jumpDenys Vlasenko
Jumping to the very next instruction is not very useful: jmp label label: Removing the jump. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-5-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09x86/asm/entry/64: Optimize [v]fork/clone stubsDenys Vlasenko
Replace "call func; ret" with "jmp func". Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-4-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09x86/asm/entry: Zero EXTRA_REGS for stub32_execve() tooDenys Vlasenko
The change which affected how execve clears EXTRA_REGS missed 32-bit execve syscalls. Fix this by using 64-bit execve stub epilogue for them too. Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-3-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09x86/asm/entry/64: Move stub_x32_execvecloser() to stub_execveat()Denys Vlasenko
This is a preparatory patch for moving stub32_execve[at]() to this file. It makes sense to have all execve stubs in one place, so that they can reuse code. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09x86/asm/entry/64: Use common code for rt_sigreturn() epilogueDenys Vlasenko
Similarly to stub_execve, we can reuse the epilogue in stub_rt_sigreturn() and stub_x32_rt_sigreturn(). Add a comment explaining why we can't eliminage SAVE_EXTRA_REGS here. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-08x86/asm/entry/64: Add forgotten CFI annotationDenys Vlasenko
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428424967-14460-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-08x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layoutDenys Vlasenko
Interrupt entry points are handled with the following code, each 32-byte code block contains seven entry points: ... [push][jump 22] // 4 bytes [push][jump 18] // 4 bytes [push][jump 14] // 4 bytes [push][jump 10] // 4 bytes [push][jump 6] // 4 bytes [push][jump 2] // 4 bytes [push][jump common_interrupt][padding] // 8 bytes [push][jump] [push][jump] [push][jump] [push][jump] [push][jump] [push][jump] [push][jump common_interrupt][padding] [padding_2] common_interrupt: And there is a table which holds pointers to every entry point, IOW: to every push. In cold cache, two jumps are still costlier than one, even though we get the benefit of them residing in the same cacheline. This change replaces short jumps with near ones to 'common_interrupt', and pads every push+jump pair to 8 bytes. This way, each interrupt takes only one jump. This change replaces ".p2align CONFIG_X86_L1_CACHE_SHIFT" before dispatch table with ".align 8" - we do not need anything stronger than that. The table of entry addresses (the interrupt[] array) is no longer necessary, the address of entries can be easily calculated as (irq_entries_start + i*8). text data bss dec hex filename 12546 0 0 12546 3102 entry_64.o.before 11626 0 0 11626 2d6a entry_64.o The size decrease is because 1656 bytes of .init.rodata are gone. That's initdata, though. The resident size does go up a bit. Run-tested (32 and 64 bits). Acked-and-Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428090553-7283-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-08x86/asm/entry/64: Move opportunistic sysret code to syscall code pathDenys Vlasenko
This change does two things: Copy-pastes "retint_swapgs:" code into syscall handling code, the copy is under "syscall_return:" label. The code is unchanged apart from some label renames. Removes "opportunistic sysret" code from "retint_swapgs:" code block, since now it won't be reached by syscall return. This in fact removes most of the code in question. text data bss dec hex filename 12530 0 0 12530 30f2 entry_64.o.before 12562 0 0 12562 3112 entry_64.o Run-tested. Acked-and-Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427993219-7291-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-08Merge tag 'v4.0-rc7' into x86/asm, to resolve conflictsIngo Molnar
Conflicts: arch/x86/kernel/entry_64.S Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-06x86/asm/entry: Clear EXTRA_REGS for all executable formatsDenys Vlasenko
On failure, sys_execve() does not clobber EXTRA_REGS, so we can just return to userpsace without saving/restoring them. On success, ELF_PLAT_INIT() in sys_execve() clears all these registers. On other executable formats: - binfmt_flat.c has similar FLAT_PLAT_INIT, but x86 (and everyone else except sh) doesn't define it. - binfmt_elf_fdpic.c has ELF_FDPIC_PLAT_INIT, but x86 (and most others) doesn't define it. - There are no such hooks in binfmt_aout.c et al. We inherit EXTRA_REGS from the prior executable. This inconsistency was not intended. This change removes SAVE/RESTORE_EXTRA_REGS in stub_execve, removes register clearing in ELF_PLAT_INIT(), and instead simply clears them on success in stub_execve. Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428173719-7637-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02x86/asm/entry/64: Fold the 'test_in_nmi' macro into its only userDenys Vlasenko
No code changes. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427899858-7165-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02x86/asm/entry/64: Disable opportunistic SYSRET if regs->flags has TF setAndy Lutomirski
When I wrote the opportunistic SYSRET code, I missed an important difference between SYSRET and IRET. Both instructions are capable of setting EFLAGS.TF, but they behave differently when doing so: - IRET will not issue a #DB trap after execution when it sets TF. This is critical -- otherwise you'd never be able to make forward progress when returning to userspace. - SYSRET, on the other hand, will trap with #DB immediately after returning to CPL3, and the next instruction will never execute. This breaks anything that opportunistically SYSRETs to a user context with TF set. For example, running this code with TF set and a SIGTRAP handler loaded never gets past 'post_nop': extern unsigned char post_nop[]; asm volatile ("pushfq\n\t" "popq %%r11\n\t" "nop\n\t" "post_nop:" : : "c" (post_nop) : "r11"); In my defense, I can't find this documented in the AMD or Intel manual. Fix it by using IRET to restore TF. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 2a23c6b8a9c4 ("x86_64, entry: Use sysret to return to userspace when possible") Link: http://lkml.kernel.org/r/9472f1ca4c19a38ecda45bba9c91b7168135fcfa.1427923514.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01x86/asm/entry/64: Use local label to skip around sycall dispatchDenys Vlasenko
Logically, we just want to jump around the following instruction and its prologue/epilogue: call *sys_call_table(,%rax,8) if the syscall number is too big - we do not specifically target the "int_ret_from_sys_call" label. Use a local, numerical label for this jump, for more clarity. This also makes the code smaller: -ffffffff8187756b: 0f 87 0f 00 00 00 ja ffffffff81877580 <int_ret_from_sys_call> +ffffffff8187756b: 77 0f ja ffffffff8187757c <int_ret_from_sys_call> because jumps to global labels are never translated to short jump instructions by GAS. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427821211-25099-9-git-send-email-dvlasenk@redhat.com [ Improved the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01x86/asm/entry/64: Simplify looping around preempt_schedule_irq()Denys Vlasenko
At the 'exit_intr' label we test whether interrupt/exception was in kernel. If it did, we jump to the preemption check. If preemption does happen (IOW if we call preempt_schedule_irq()), we go back to 'exit_intr'. But it's pointless, we already know that the test succeeded last time, preemption doesn't change the fact that interrupt/exception was in the kernel. We can go back directly to checking PER_CPU_VAR(__preempt_count) instead. This makes the 'exit_intr' label unused, drop it. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427821211-25099-5-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01x86/asm/entry/64: Remove redundant DISABLE_INTERRUPTS()Denys Vlasenko
At this location, we already have interrupts off, always. To be more specific, we already disabled them here: ret_from_intr: DISABLE_INTERRUPTS(CLBR_NONE) Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427821211-25099-4-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01x86/asm/entry/64: Simplify retint_kernel label usage, make ↵Denys Vlasenko
retint_restore_args label local Get rid of #define obfuscation of retint_kernel in CONFIG_PREEMPT case by defining retint_kernel label always, not only for CONFIG_PREEMPT. Strip retint_kernel of .global-ness (ENTRY macro) - it has no users outside of this file. This looks like cosmetics, but it is not: "je LABEL" can be optimized to short jump by assember only if LABEL is not global, for global labels jump is always a near one with relocation. Convert retint_restore_args to a local numeric label, making it clearer that it is not used elsewhere in the file. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427821211-25099-3-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01x86/asm/entry/64: Do not TRACE_IRQS fast SYSRET64 pathDenys Vlasenko
SYSRET code path has a small irq-off block. On this code path, TRACE_IRQS_ON can't be called right before interrupts are enabled for real, we can't clobber registers there. So current code does it earlier, in a safe place. But with this, TRACE_IRQS_OFF/ON frames just two fast instructions, which is ridiculous: now most of irq-off block is _outside_ of the framing. Do the same thing that we do on SYSCALL entry: do not track this irq-off block, it is very small to ever cause noticeable irq latency. Be careful: make sure that "jnz int_ret_from_sys_call_irqs_off" now does invoke TRACE_IRQS_OFF - move int_ret_from_sys_call_irqs_off label before TRACE_IRQS_OFF. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427821211-25099-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-31x86/asm/entry/64: Do not GET_THREAD_INFO() too earlyDenys Vlasenko
At exit_intr, we GET_THREAD_INFO(%rcx) and then jump to retint_kernel if saved CS was from kernel. But the code at retint_kernel doesn't need %rcx. Move GET_THREAD_INFO(%rcx) down, after CS check and branch. While at it, remove "has a correct top of stack" comment. After recent changes which eliminated FIXUP_TOP_OF_STACK, we always have a correct pt_regs layout. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427738975-7391-5-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-31x86/asm/entry/64: Move retint_kernel code block closer to its userDenys Vlasenko
The "retint_kernel" code block is misplaced. Since its logical continuation is "retint_restore_args", it is more natural to place it above that label. This also makes two jumps "short". This change only moves code block around, without changing logic. This enables the next simplification: making "retint_restore_args" label a local numeric one. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427738975-7391-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27x86/asm/entry/64: Add missing CFI annotationDenys Vlasenko
This is a missing bit of the recent MOV-to-PUSH conversion. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427452582-21624-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27x86/asm/entry/64: Use smaller instructionsDenys Vlasenko
The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2 is a 32-bit constant, loading it with 32-bit MOV produces 5-byte insn instead of 10-byte MOVABS one. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427303896-24023-3-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27x86/asm/entry/64: Use better label name, fix commentsDenys Vlasenko
A named label "ret_from_sys_call" implies that there are jumps to this location from elsewhere, as happens with many other labels in this file. But this label is used only by the JMP a few insns above. To make that obvious, use local numeric label instead. Improve comments: "and return regs->ax" isn't too informative. We always return regs->ax. The comment suggesting that it'd be cool to use rip relative addressing for CALL is deleted. It's unclear why that would be an improvement - we aren't striving to use position-independent code here. PIC code here would require something like LEA sys_call_table(%rip),reg + CALL *(reg,%rax*8)... "iret frame is also incomplete" is no longer true, fix that too. Also fix typo in comment. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427303896-24023-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-25Merge branch 'x86/urgent' into x86/asm, to resolve conflictIngo Molnar
Conflicts: arch/x86/kernel/entry_64.S Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24x86/asm/entry: Check for syscall exit work with IRQs disabledAndy Lutomirski
We currently have a race: if we're preempted during syscall exit, we can fail to process syscall return work that is queued up while we're preempted in ret_from_sys_call after checking ti.flags. Fix it by disabling interrupts before checking ti.flags. Reported-by: Stefan Seyfried <stefan.seyfried@googlemail.com> Reported-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tejun Heo <tj@kernel.org> Fixes: 96b6352c1271 ("x86_64, entry: Remove the syscall exit audit") Link: http://lkml.kernel.org/r/189320d42b4d671df78c10555976bb10af1ffc75.1427137498.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24x86/asm/entry/64: Rename THREAD_INFO() to ASM_THREAD_INFO()Ingo Molnar
The THREAD_INFO() macro has a somewhat confusingly generic name, defined in a generic .h C header file. It also does not make it clear that it constructs a memory operand for use in assembly code. Rename it to ASM_THREAD_INFO() to make it all glaringly obvious on first glance. Acked-by: Borislav Petkov <bp@suse.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/20150324184442.GC14760@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24x86/asm/entry/64: Merge the field offset into the THREAD_INFO() macroIngo Molnar
Before: TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d After: movl THREAD_INFO(TI_sysenter_return, %rsp, 3*8), %r10d to turn it into a clear thread_info accessor. No code changed: md5: fb4cb2b3ce05d89940ca304efc8ff183 ia32entry.o.before.asm fb4cb2b3ce05d89940ca304efc8ff183 ia32entry.o.after.asm e39f2958a5d1300158e276e4f7663263 entry_64.o.before.asm e39f2958a5d1300158e276e4f7663263 entry_64.o.after.asm Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/20150324184411.GB14760@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24x86/asm/entry/64: Get rid of int_ret_from_sys_call_fixupDenys Vlasenko
With the FIXUP_TOP_OF_STACK macro removed, this intermediate jump is unnecessary. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1426785469-15125-5-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24x86/asm/entry/64: Get rid of the FIXUP_TOP_OF_STACK/RESTORE_TOP_OF_STACK macrosDenys Vlasenko
The FIXUP_TOP_OF_STACK macro is only necessary because we don't save %r11 to pt_regs->r11 on SYSCALL64 fast path, but we want ptrace to see it populated. Bite the bullet, add a single additional PUSH instruction, and remove the FIXUP_TOP_OF_STACK macro. The RESTORE_TOP_OF_STACK macro is already a nop. Remove it too. On SandyBridge CPU, it does not get slower: measured 54.22 ns per getpid syscall before and after last two changes on defconfig kernel. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1426785469-15125-4-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24x86/asm/entry/64: Use PUSH instructions to build pt_regs on stackDenys Vlasenko
With this change, on SYSCALL64 code path we are now populating pt_regs->cs, pt_regs->ss and pt_regs->rcx unconditionally and therefore don't need to do that in FIXUP_TOP_OF_STACK. We lose a number of large instructions there: text data bss dec hex filename 13298 0 0 13298 33f2 entry_64_before.o 12978 0 0 12978 32b2 entry_64.o What's more important, we convert two "MOVQ $imm,off(%rsp)" to "PUSH $imm" (the ones which fill pt_regs->cs,ss). Before this patch, placing them on fast path was slowing it down by two cycles: this form of MOV is very large, 12 bytes, and this probably reduces decode bandwidth to one instruction per cycle when CPU sees them. Therefore they were living in FIXUP_TOP_OF_STACK instead (away from fast path). "PUSH $imm" is a small 2-byte instruction. Moving it to fast path does not slow it down in my measurements. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1426785469-15125-3-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24x86/asm/entry: Get rid of KERNEL_STACK_OFFSETDenys Vlasenko
PER_CPU_VAR(kernel_stack) was set up in a way where it points five stack slots below the top of stack. Presumably, it was done to avoid one "sub $5*8,%rsp" in syscall/sysenter code paths, where iret frame needs to be created by hand. Ironically, none of them benefits from this optimization, since all of them need to allocate additional data on stack (struct pt_regs), so they still have to perform subtraction. This patch eliminates KERNEL_STACK_OFFSET. PER_CPU_VAR(kernel_stack) now points directly to top of stack. pt_regs allocations are adjusted to allocate iret frame as well. Hopefully we can merge it later with 32-bit specific PER_CPU_VAR(cpu_current_top_of_stack) variable... Net result in generated code is that constants in several insns are changed. This change is necessary for changing struct pt_regs creation in SYSCALL64 code path from MOV to PUSH instructions. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24x86/asm/entry/64: Change the THREAD_INFO() definition to not depend on ↵Denys Vlasenko
KERNEL_STACK_OFFSET This changes the THREAD_INFO() definition and all its callsites so that they do not count stack position from (top of stack - KERNEL_STACK_OFFSET), but from top of stack. Semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??" are now replaced by more logical THREAD_INFO(%rsp,SIZEOF_PTREGS) - "calculate thread_info's address using information that rsp is SIZEOF_PTREGS bytes below top of stack". While at it, replace "(off)-THREAD_SIZE(reg)" with equivalent "((off)-THREAD_SIZE)(reg)". The form without parentheses falsely looks like we invoke THREAD_SIZE() macro. Improve comment atop THREAD_INFO macro definition. This patch does not change generated code (verified by objdump). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1426785469-15125-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23x86/asm/entry/64: Fix incorrect commentDenys Vlasenko
The recent old_rsp -> rsp_scratch rename also changed this comment, but in this case "old_rsp" was not referring to PER_CPU(old_rsp). Fix this. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427115839-6397-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17x86/asm/entry/64: Rename 'old_rsp' to 'rsp_scratch'Ingo Molnar
Make clear that the usage of PER_CPU(old_rsp) is purely temporary, by renaming it to 'rsp_scratch'. Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17x86/asm/entry/64: Update comments about stack framesIngo Molnar
Tweak a few outdated comments that were obsoleted by recent changes to syscall entry code: - we no longer have a "partial stack frame" on entry, ever. - explain the syscall entry usage of old_rsp. Partially based on a (split out of) patch from Denys Vlasenko. Originally-from: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17x86/asm/entry/64: Enable interrupts *after* we fetch PER_CPU_VAR(old_rsp)Denys Vlasenko
We want to use PER_CPU_VAR(old_rsp) as a simple temporary register, to shuffle user-space RSP into (and from) when we set up the system call stack frame. At that point we cannot shuffle values into general purpose registers, because we have not saved them yet. To be able to do this shuffling into a memory location, we must be atomic and must not be preempted while we do the shuffling, otherwise the 'temporary' register gets overwritten by some other task's temporary register contents ... Tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1426600344-8254-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>