From 9da3f2b74054406f87dff7101a569217ffceb29b Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Tue, 28 Aug 2018 22:14:20 +0200 Subject: x86/fault: BUG() when uaccess helpers fault on kernel addresses There have been multiple kernel vulnerabilities that permitted userspace to pass completely unchecked pointers through to userspace accessors: - the waitid() bug - commit 96ca579a1ecc ("waitid(): Add missing access_ok() checks") - the sg/bsg read/write APIs - the infiniband read/write APIs These don't happen all that often, but when they do happen, it is hard to test for them properly; and it is probably also hard to discover them with fuzzing. Even when an unmapped kernel address is supplied to such buggy code, it just returns -EFAULT instead of doing a proper BUG() or at least WARN(). Try to make such misbehaving code a bit more visible by refusing to do a fixup in the pagefault handler code when a userspace accessor causes a #PF on a kernel address and the current context isn't whitelisted. Signed-off-by: Jann Horn Signed-off-by: Thomas Gleixner Tested-by: Kees Cook Cc: Andy Lutomirski Cc: kernel-hardening@lists.openwall.com Cc: dvyukov@google.com Cc: Masami Hiramatsu Cc: "Naveen N. Rao" Cc: Anil S Keshavamurthy Cc: "David S. Miller" Cc: Alexander Viro Cc: linux-fsdevel@vger.kernel.org Cc: Borislav Petkov Link: https://lkml.kernel.org/r/20180828201421.157735-7-jannh@google.com --- include/linux/sched.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 977cb57d7bc9..56dd65f1be4f 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -739,6 +739,12 @@ struct task_struct { unsigned use_memdelay:1; #endif + /* + * May usercopy functions fault on kernel addresses? + * This is not just a single bit because this can potentially nest. + */ + unsigned int kernel_uaccess_faults_ok; + unsigned long atomic_flags; /* Flags requiring atomic access. */ struct restart_block restart_block; -- cgit v1.2.3 From 925b9cd1b89a94b7124d128c80dfc48f78a63098 Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Thu, 6 Sep 2018 16:18:34 -0400 Subject: locking/rwsem: Make owner store task pointer of last owning reader Currently, when a reader acquires a lock, it only sets the RWSEM_READER_OWNED bit in the owner field. The other bits are simply not used. When debugging hanging cases involving rwsems and readers, the owner value does not provide much useful information at all. This patch modifies the current behavior to always store the task_struct pointer of the last rwsem-acquiring reader in a reader-owned rwsem. This may be useful in debugging rwsem hanging cases especially if only one reader is involved. However, the task in the owner field may not the real owner or one of the real owners at all when the owner value is examined, for example, in a crash dump. So it is just an additional hint about the past history. If CONFIG_DEBUG_RWSEMS=y is enabled, the owner field will be checked at unlock time too to make sure the task pointer value is valid. That does have a slight performance cost and so is only enabled as part of that debug option. From the performance point of view, it is expected that the changes shouldn't have any noticeable performance impact. A rwsem microbenchmark (with 48 worker threads and 1:1 reader/writer ratio) was ran on a 2-socket 24-core 48-thread Haswell system. The locking rates on a 4.19-rc1 based kernel were as follows: 1) Unpatched kernel: 543.3 kops/s 2) Patched kernel: 549.2 kops/s 3) Patched kernel (CONFIG_DEBUG_RWSEMS on): 546.6 kops/s There was actually a slight increase in performance (1.1%) in this particular case. Maybe it was caused by the elimination of a branch or just a testing noise. Turning on the CONFIG_DEBUG_RWSEMS option also had less than the expected impact on performance. The least significant 2 bits of the owner value are now used to designate the rwsem is readers owned and the owners are anonymous. Signed-off-by: Waiman Long Acked-by: Peter Zijlstra Cc: Davidlohr Bueso Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Will Deacon Link: http://lkml.kernel.org/r/1536265114-10842-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar --- include/linux/rwsem.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index ab93b6eae696..67dbb57508b1 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -45,10 +45,10 @@ struct rw_semaphore { }; /* - * Setting bit 0 of the owner field with other non-zero bits will indicate + * Setting bit 1 of the owner field but not bit 0 will indicate * that the rwsem is writer-owned with an unknown owner. */ -#define RWSEM_OWNER_UNKNOWN ((struct task_struct *)-1L) +#define RWSEM_OWNER_UNKNOWN ((struct task_struct *)-2L) extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem); extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem); -- cgit v1.2.3 From 9ae033aca8d600e36034d4d0743aad624cec92ed Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Tue, 18 Sep 2018 23:51:36 -0700 Subject: jump_label: Abstract jump_entry member accessors In preparation of allowing architectures to use relative references in jump_label entries [which can dramatically reduce the memory footprint], introduce abstractions for references to the 'code' and 'key' members of struct jump_entry. Signed-off-by: Ard Biesheuvel Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: Arnd Bergmann Cc: Heiko Carstens Cc: Kees Cook Cc: Will Deacon Cc: Catalin Marinas Cc: Steven Rostedt Cc: Martin Schwidefsky Cc: Jessica Yu Link: https://lkml.kernel.org/r/20180919065144.25010-2-ard.biesheuvel@linaro.org --- include/linux/jump_label.h | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) (limited to 'include/linux') diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index 1a0b6f17a5d6..2eadff9b3b90 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -119,6 +119,40 @@ struct static_key { #ifdef HAVE_JUMP_LABEL #include + +#ifndef __ASSEMBLY__ + +static inline unsigned long jump_entry_code(const struct jump_entry *entry) +{ + return entry->code; +} + +static inline unsigned long jump_entry_target(const struct jump_entry *entry) +{ + return entry->target; +} + +static inline struct static_key *jump_entry_key(const struct jump_entry *entry) +{ + return (struct static_key *)((unsigned long)entry->key & ~1UL); +} + +static inline bool jump_entry_is_branch(const struct jump_entry *entry) +{ + return (unsigned long)entry->key & 1UL; +} + +static inline bool jump_entry_is_init(const struct jump_entry *entry) +{ + return entry->code == 0; +} + +static inline void jump_entry_set_init(struct jump_entry *entry) +{ + entry->code = 0; +} + +#endif #endif #ifndef __ASSEMBLY__ -- cgit v1.2.3 From 50ff18ab497aa22f6a59444625df7508c8918237 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Tue, 18 Sep 2018 23:51:37 -0700 Subject: jump_label: Implement generic support for relative references To reduce the size taken up by absolute references in jump label entries themselves and the associated relocation records in the .init segment, add support for emitting them as relative references instead. Note that this requires some extra care in the sorting routine, given that the offsets change when entries are moved around in the jump_entry table. Signed-off-by: Ard Biesheuvel Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: Arnd Bergmann Cc: Heiko Carstens Cc: Kees Cook Cc: Will Deacon Cc: Catalin Marinas Cc: Steven Rostedt Cc: Martin Schwidefsky Cc: Jessica Yu Link: https://lkml.kernel.org/r/20180919065144.25010-3-ard.biesheuvel@linaro.org --- include/linux/jump_label.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'include/linux') diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index 2eadff9b3b90..2768a925bafa 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -121,6 +121,32 @@ struct static_key { #include #ifndef __ASSEMBLY__ +#ifdef CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE + +struct jump_entry { + s32 code; + s32 target; + long key; // key may be far away from the core kernel under KASLR +}; + +static inline unsigned long jump_entry_code(const struct jump_entry *entry) +{ + return (unsigned long)&entry->code + entry->code; +} + +static inline unsigned long jump_entry_target(const struct jump_entry *entry) +{ + return (unsigned long)&entry->target + entry->target; +} + +static inline struct static_key *jump_entry_key(const struct jump_entry *entry) +{ + long offset = entry->key & ~1L; + + return (struct static_key *)((unsigned long)&entry->key + offset); +} + +#else static inline unsigned long jump_entry_code(const struct jump_entry *entry) { @@ -137,6 +163,8 @@ static inline struct static_key *jump_entry_key(const struct jump_entry *entry) return (struct static_key *)((unsigned long)entry->key & ~1UL); } +#endif + static inline bool jump_entry_is_branch(const struct jump_entry *entry) { return (unsigned long)entry->key & 1UL; -- cgit v1.2.3 From 19483677684b6ca01606f58503cb79cdfbbc7c72 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Tue, 18 Sep 2018 23:51:42 -0700 Subject: jump_label: Annotate entries that operate on __init code earlier Jump table entries are mostly read-only, with the exception of the init and module loader code that defuses entries that point into init code when the code being referred to is freed. For robustness, it would be better to move these entries into the ro_after_init section, but clearing the 'code' member of each jump table entry referring to init code at module load time races with the module_enable_ro() call that remaps the ro_after_init section read only, so we'd like to do it earlier. So given that whether such an entry refers to init code can be decided much earlier, we can pull this check forward. Since we may still need the code entry at this point, let's switch to setting a low bit in the 'key' member just like we do to annotate the default state of a jump table entry. Signed-off-by: Ard Biesheuvel Signed-off-by: Thomas Gleixner Reviewed-by: Kees Cook Acked-by: Peter Zijlstra (Intel) Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: Arnd Bergmann Cc: Heiko Carstens Cc: Will Deacon Cc: Catalin Marinas Cc: Steven Rostedt Cc: Martin Schwidefsky Cc: Jessica Yu Link: https://lkml.kernel.org/r/20180919065144.25010-8-ard.biesheuvel@linaro.org --- include/linux/jump_label.h | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index 2768a925bafa..5df6a621e464 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -141,7 +141,7 @@ static inline unsigned long jump_entry_target(const struct jump_entry *entry) static inline struct static_key *jump_entry_key(const struct jump_entry *entry) { - long offset = entry->key & ~1L; + long offset = entry->key & ~3L; return (struct static_key *)((unsigned long)&entry->key + offset); } @@ -160,7 +160,7 @@ static inline unsigned long jump_entry_target(const struct jump_entry *entry) static inline struct static_key *jump_entry_key(const struct jump_entry *entry) { - return (struct static_key *)((unsigned long)entry->key & ~1UL); + return (struct static_key *)((unsigned long)entry->key & ~3UL); } #endif @@ -172,12 +172,12 @@ static inline bool jump_entry_is_branch(const struct jump_entry *entry) static inline bool jump_entry_is_init(const struct jump_entry *entry) { - return entry->code == 0; + return (unsigned long)entry->key & 2UL; } static inline void jump_entry_set_init(struct jump_entry *entry) { - entry->code = 0; + entry->key |= 2; } #endif @@ -213,7 +213,6 @@ extern struct jump_entry __start___jump_table[]; extern struct jump_entry __stop___jump_table[]; extern void jump_label_init(void); -extern void jump_label_invalidate_initmem(void); extern void jump_label_lock(void); extern void jump_label_unlock(void); extern void arch_jump_label_transform(struct jump_entry *entry, @@ -261,8 +260,6 @@ static __always_inline void jump_label_init(void) static_key_initialized = true; } -static inline void jump_label_invalidate_initmem(void) {} - static __always_inline bool static_key_false(struct static_key *key) { if (unlikely(static_key_count(key) > 0)) -- cgit v1.2.3 From c06c4d8090513f2974dfdbed2ac98634357ac475 Mon Sep 17 00:00:00 2001 From: Nadav Amit Date: Wed, 3 Oct 2018 14:30:53 -0700 Subject: x86/objtool: Use asm macros to work around GCC inlining bugs As described in: 77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. In the case of objtool the resulting borkage can be significant, since all the annotations of objtool are discarded during linkage and never inlined, yet GCC bogusly considers most functions affected by objtool annotations as 'too large'. The workaround is to set an assembly macro and call it from the inline assembly block. As a result GCC considers the inline assembly block as a single instruction. (Which it isn't, but that's the best we can get.) This increases the kernel size slightly: text data bss dec hex filename 18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before 18140970 10225412 2957312 31323694 1ddf62e ./vmlinux after (+829) The number of static text symbols (i.e. non-inlined functions) is reduced: Before: 40321 After: 40302 (-19) [ mingo: Rewrote the changelog. ] Tested-by: Kees Cook Signed-off-by: Nadav Amit Reviewed-by: Josh Poimboeuf Acked-by: Peter Zijlstra (Intel) Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Christopher Li Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-sparse@vger.kernel.org Link: http://lkml.kernel.org/r/20181003213100.189959-4-namit@vmware.com Signed-off-by: Ingo Molnar --- include/linux/compiler.h | 56 +++++++++++++++++++++++++++++++++++++----------- 1 file changed, 43 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 681d866efb1e..1921545c6351 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -99,22 +99,13 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, * unique, to convince GCC not to merge duplicate inline asm statements. */ #define annotate_reachable() ({ \ - asm volatile("%c0:\n\t" \ - ".pushsection .discard.reachable\n\t" \ - ".long %c0b - .\n\t" \ - ".popsection\n\t" : : "i" (__COUNTER__)); \ + asm volatile("ANNOTATE_REACHABLE counter=%c0" \ + : : "i" (__COUNTER__)); \ }) #define annotate_unreachable() ({ \ - asm volatile("%c0:\n\t" \ - ".pushsection .discard.unreachable\n\t" \ - ".long %c0b - .\n\t" \ - ".popsection\n\t" : : "i" (__COUNTER__)); \ + asm volatile("ANNOTATE_UNREACHABLE counter=%c0" \ + : : "i" (__COUNTER__)); \ }) -#define ASM_UNREACHABLE \ - "999:\n\t" \ - ".pushsection .discard.unreachable\n\t" \ - ".long 999b - .\n\t" \ - ".popsection\n\t" #else #define annotate_reachable() #define annotate_unreachable() @@ -299,6 +290,45 @@ static inline void *offset_to_ptr(const int *off) return (void *)((unsigned long)off + *off); } +#else /* __ASSEMBLY__ */ + +#ifdef __KERNEL__ +#ifndef LINKER_SCRIPT + +#ifdef CONFIG_STACK_VALIDATION +.macro ANNOTATE_UNREACHABLE counter:req +\counter: + .pushsection .discard.unreachable + .long \counter\()b -. + .popsection +.endm + +.macro ANNOTATE_REACHABLE counter:req +\counter: + .pushsection .discard.reachable + .long \counter\()b -. + .popsection +.endm + +.macro ASM_UNREACHABLE +999: + .pushsection .discard.unreachable + .long 999b - . + .popsection +.endm +#else /* CONFIG_STACK_VALIDATION */ +.macro ANNOTATE_UNREACHABLE counter:req +.endm + +.macro ANNOTATE_REACHABLE counter:req +.endm + +.macro ASM_UNREACHABLE +.endm +#endif /* CONFIG_STACK_VALIDATION */ + +#endif /* LINKER_SCRIPT */ +#endif /* __KERNEL__ */ #endif /* __ASSEMBLY__ */ #ifndef __optimize -- cgit v1.2.3 From 8ca2b56cd7da98fc8f8d787bb706b9d6c8674a3b Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Wed, 3 Oct 2018 13:07:18 -0400 Subject: locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y A sizable portion of the CPU cycles spent on the __lock_acquire() is used up by the atomic increment of the class->ops stat counter. By taking it out from the lock_class structure and changing it to a per-cpu per-lock-class counter, we can reduce the amount of cacheline contention on the class structure when multiple CPUs are trying to acquire locks of the same class simultaneously. To limit the increase in memory consumption because of the percpu nature of that counter, it is now put back under the CONFIG_DEBUG_LOCKDEP config option. So the memory consumption increase will only occur if CONFIG_DEBUG_LOCKDEP is defined. The lock_class structure, however, is reduced in size by 16 bytes on 64-bit archs after ops removal and a minor restructuring of the fields. This patch also fixes a bug in the increment code as the counter is of the 'unsigned long' type, but atomic_inc() was used to increment it. Signed-off-by: Waiman Long Acked-by: Peter Zijlstra Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Will Deacon Link: http://lkml.kernel.org/r/d66681f3-8781-9793-1dcf-2436a284550b@redhat.com Signed-off-by: Ingo Molnar --- include/linux/lockdep.h | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index b0d0b51c4d85..1fd82ff99c65 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -99,13 +99,8 @@ struct lock_class { */ unsigned int version; - /* - * Statistics counter: - */ - unsigned long ops; - - const char *name; int name_version; + const char *name; #ifdef CONFIG_LOCK_STAT unsigned long contention_point[LOCKSTAT_POINTS]; -- cgit v1.2.3 From 01a14bda11add9dcd4a59200f13834d634559935 Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Thu, 18 Oct 2018 21:45:18 -0400 Subject: locking/lockdep: Make global debug_locks* variables read-mostly Make the frequently used lockdep global variable debug_locks read-mostly. As debug_locks_silent is sometime used together with debug_locks, it is also made read-mostly so that they can be close together. With false cacheline sharing, cacheline contention problem can happen depending on what get put into the same cacheline as debug_locks. Signed-off-by: Waiman Long Cc: Andrew Morton Cc: Linus Torvalds Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Will Deacon Link: http://lkml.kernel.org/r/1539913518-15598-2-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar --- include/linux/debug_locks.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h index 120225e9a366..257ab3c92cb8 100644 --- a/include/linux/debug_locks.h +++ b/include/linux/debug_locks.h @@ -8,8 +8,8 @@ struct task_struct; -extern int debug_locks; -extern int debug_locks_silent; +extern int debug_locks __read_mostly; +extern int debug_locks_silent __read_mostly; static inline int __debug_locks_off(void) -- cgit v1.2.3