From a5418be9dffe70ccbb0b4bd5ea3881c81927e965 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Tue, 8 Dec 2020 09:46:56 +0530 Subject: sched/core: Rename schedutil_cpu_util() and allow rest of the kernel to use it There is nothing schedutil specific in schedutil_cpu_util(), rename it to effective_cpu_util(). Also create and expose another wrapper sched_cpu_util() which can be used by other parts of the kernel, like thermal core (that will be done in a later commit). Signed-off-by: Viresh Kumar Signed-off-by: Peter Zijlstra (Intel) Acked-by: Rafael J. Wysocki Link: https://lkml.kernel.org/r/db011961fb3bb8bef1c0eda5cd64564637d3ef31.1607400596.git.viresh.kumar@linaro.org --- include/linux/sched.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 6e3a5eeec509..31169e70476a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1968,6 +1968,11 @@ extern long sched_getaffinity(pid_t pid, struct cpumask *mask); #define TASK_SIZE_OF(tsk) TASK_SIZE #endif +#ifdef CONFIG_SMP +/* Returns effective CPU energy utilization, as seen by the scheduler */ +unsigned long sched_cpu_util(int cpu, unsigned long max); +#endif /* CONFIG_SMP */ + #ifdef CONFIG_RSEQ /* -- cgit v1.2.3 From 620a6dc40754dc218f5b6389b5d335e9a107fd29 Mon Sep 17 00:00:00 2001 From: Valentin Schneider Date: Fri, 22 Jan 2021 12:39:43 +0000 Subject: sched/topology: Make sched_init_numa() use a set for the deduplicating sort MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The deduplicating sort in sched_init_numa() assumes that the first line in the distance table contains all unique values in the entire table. I've been trying to pen what this exactly means for the topology, but it's not straightforward. For instance, topology.c uses this example: node 0 1 2 3 0: 10 20 20 30 1: 20 10 20 20 2: 20 20 10 20 3: 30 20 20 10 0 ----- 1 | / | | / | | / | 2 ----- 3 Which works out just fine. However, if we swap nodes 0 and 1: 1 ----- 0 | / | | / | | / | 2 ----- 3 we get this distance table: node 0 1 2 3 0: 10 20 20 20 1: 20 10 20 30 2: 20 20 10 20 3: 20 30 20 10 Which breaks the deduplicating sort (non-representative first line). In this case this would just be a renumbering exercise, but it so happens that we can have a deduplicating sort that goes through the whole table in O(n²) at the extra cost of a temporary memory allocation (i.e. any form of set). The ACPI spec (SLIT) mentions distances are encoded on 8 bits. Following this, implement the set as a 256-bits bitmap. Should this not be satisfactory (i.e. we want to support 32-bit values), then we'll have to go for some other sparse set implementation. This has the added benefit of letting us allocate just the right amount of memory for sched_domains_numa_distance[], rather than an arbitrary (nr_node_ids + 1). Note: DT binding equivalent (distance-map) decodes distances as 32-bit values. Signed-off-by: Valentin Schneider Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210122123943.1217-2-valentin.schneider@arm.com --- include/linux/topology.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/topology.h b/include/linux/topology.h index ad03df1cc266..7634cd737061 100644 --- a/include/linux/topology.h +++ b/include/linux/topology.h @@ -48,6 +48,7 @@ int arch_update_cpu_topology(void); /* Conform to ACPI 2.0 SLIT distance definitions */ #define LOCAL_DISTANCE 10 #define REMOTE_DISTANCE 20 +#define DISTANCE_BITS 8 #ifndef node_distance #define node_distance(from,to) ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE) #endif -- cgit v1.2.3 From 1875dc5b8ff4690547c446ef222083e28e2d9463 Mon Sep 17 00:00:00 2001 From: Peter Oskolkov Date: Tue, 26 Jan 2021 11:34:49 -0800 Subject: sched: Correctly sort struct predeclarations The comment says: /* task_struct member predeclarations (sorted alphabetically): */ So move io_uring_task where it belongs (alphabetically). Signed-off-by: Peter Oskolkov Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210126193449.487547-1-posk@google.com --- include/linux/sched.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 31169e70476a..e1152227bb79 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -47,6 +47,7 @@ struct cfs_rq; struct fs_struct; struct futex_pi_state; struct io_context; +struct io_uring_task; struct mempolicy; struct nameidata; struct nsproxy; @@ -65,7 +66,6 @@ struct sighand_struct; struct signal_struct; struct task_delay_info; struct task_group; -struct io_uring_task; /* * Task state bitmask. NOTE! These bits are also -- cgit v1.2.3 From 2d24dd5798d0474d9bf705bfca8725e7d20f9d54 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 29 Apr 2020 17:03:22 +0200 Subject: rbtree: Add generic add and find helpers I've always been bothered by the endless (fragile) boilerplate for rbtree, and I recently wrote some rbtree helpers for objtool and figured I should lift them into the kernel and use them more widely. Provide: partial-order; less() based: - rb_add(): add a new entry to the rbtree - rb_add_cached(): like rb_add(), but for a rb_root_cached total-order; cmp() based: - rb_find(): find an entry in an rbtree - rb_find_add(): find an entry, and add if not found - rb_find_first(): find the first (leftmost) matching entry - rb_next_match(): continue from rb_find_first() - rb_for_each(): iterate a sub-tree using the previous two Inlining and constant propagation should see the compiler inline the whole thing, including the various compare functions. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Reviewed-by: Michel Lespinasse Acked-by: Davidlohr Bueso --- include/linux/rbtree.h | 190 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 190 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h index d7db17996322..e0b300de8f3f 100644 --- a/include/linux/rbtree.h +++ b/include/linux/rbtree.h @@ -158,4 +158,194 @@ static inline void rb_replace_node_cached(struct rb_node *victim, rb_replace_node(victim, new, &root->rb_root); } +/* + * The below helper functions use 2 operators with 3 different + * calling conventions. The operators are related like: + * + * comp(a->key,b) < 0 := less(a,b) + * comp(a->key,b) > 0 := less(b,a) + * comp(a->key,b) == 0 := !less(a,b) && !less(b,a) + * + * If these operators define a partial order on the elements we make no + * guarantee on which of the elements matching the key is found. See + * rb_find(). + * + * The reason for this is to allow the find() interface without requiring an + * on-stack dummy object, which might not be feasible due to object size. + */ + +/** + * rb_add_cached() - insert @node into the leftmost cached tree @tree + * @node: node to insert + * @tree: leftmost cached tree to insert @node into + * @less: operator defining the (partial) node order + */ +static __always_inline void +rb_add_cached(struct rb_node *node, struct rb_root_cached *tree, + bool (*less)(struct rb_node *, const struct rb_node *)) +{ + struct rb_node **link = &tree->rb_root.rb_node; + struct rb_node *parent = NULL; + bool leftmost = true; + + while (*link) { + parent = *link; + if (less(node, parent)) { + link = &parent->rb_left; + } else { + link = &parent->rb_right; + leftmost = false; + } + } + + rb_link_node(node, parent, link); + rb_insert_color_cached(node, tree, leftmost); +} + +/** + * rb_add() - insert @node into @tree + * @node: node to insert + * @tree: tree to insert @node into + * @less: operator defining the (partial) node order + */ +static __always_inline void +rb_add(struct rb_node *node, struct rb_root *tree, + bool (*less)(struct rb_node *, const struct rb_node *)) +{ + struct rb_node **link = &tree->rb_node; + struct rb_node *parent = NULL; + + while (*link) { + parent = *link; + if (less(node, parent)) + link = &parent->rb_left; + else + link = &parent->rb_right; + } + + rb_link_node(node, parent, link); + rb_insert_color(node, tree); +} + +/** + * rb_find_add() - find equivalent @node in @tree, or add @node + * @node: node to look-for / insert + * @tree: tree to search / modify + * @cmp: operator defining the node order + * + * Returns the rb_node matching @node, or NULL when no match is found and @node + * is inserted. + */ +static __always_inline struct rb_node * +rb_find_add(struct rb_node *node, struct rb_root *tree, + int (*cmp)(struct rb_node *, const struct rb_node *)) +{ + struct rb_node **link = &tree->rb_node; + struct rb_node *parent = NULL; + int c; + + while (*link) { + parent = *link; + c = cmp(node, parent); + + if (c < 0) + link = &parent->rb_left; + else if (c > 0) + link = &parent->rb_right; + else + return parent; + } + + rb_link_node(node, parent, link); + rb_insert_color(node, tree); + return NULL; +} + +/** + * rb_find() - find @key in tree @tree + * @key: key to match + * @tree: tree to search + * @cmp: operator defining the node order + * + * Returns the rb_node matching @key or NULL. + */ +static __always_inline struct rb_node * +rb_find(const void *key, const struct rb_root *tree, + int (*cmp)(const void *key, const struct rb_node *)) +{ + struct rb_node *node = tree->rb_node; + + while (node) { + int c = cmp(key, node); + + if (c < 0) + node = node->rb_left; + else if (c > 0) + node = node->rb_right; + else + return node; + } + + return NULL; +} + +/** + * rb_find_first() - find the first @key in @tree + * @key: key to match + * @tree: tree to search + * @cmp: operator defining node order + * + * Returns the leftmost node matching @key, or NULL. + */ +static __always_inline struct rb_node * +rb_find_first(const void *key, const struct rb_root *tree, + int (*cmp)(const void *key, const struct rb_node *)) +{ + struct rb_node *node = tree->rb_node; + struct rb_node *match = NULL; + + while (node) { + int c = cmp(key, node); + + if (c <= 0) { + if (!c) + match = node; + node = node->rb_left; + } else if (c > 0) { + node = node->rb_right; + } + } + + return match; +} + +/** + * rb_next_match() - find the next @key in @tree + * @key: key to match + * @tree: tree to search + * @cmp: operator defining node order + * + * Returns the next node matching @key, or NULL. + */ +static __always_inline struct rb_node * +rb_next_match(const void *key, struct rb_node *node, + int (*cmp)(const void *key, const struct rb_node *)) +{ + node = rb_next(node); + if (node && cmp(key, node)) + node = NULL; + return node; +} + +/** + * rb_for_each() - iterates a subtree matching @key + * @node: iterator + * @key: key to match + * @tree: tree to search + * @cmp: operator defining node order + */ +#define rb_for_each(node, key, tree, cmp) \ + for ((node) = rb_find_first((key), (tree), (cmp)); \ + (node); (node) = rb_next_match((key), (node), (cmp))) + #endif /* _LINUX_RBTREE_H */ -- cgit v1.2.3 From 8ecca39483ed4e4e97096d0d6f8e25fdd323b189 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 29 Apr 2020 17:04:41 +0200 Subject: rbtree, sched/deadline: Use rb_add_cached() Reduce rbtree boiler plate by using the new helpers. Make rb_add_cached() / rb_erase_cached() return a pointer to the leftmost node to aid in updating additional state. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Acked-by: Davidlohr Bueso --- include/linux/rbtree.h | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h index e0b300de8f3f..d31ecaf4fdd3 100644 --- a/include/linux/rbtree.h +++ b/include/linux/rbtree.h @@ -141,12 +141,18 @@ static inline void rb_insert_color_cached(struct rb_node *node, rb_insert_color(node, &root->rb_root); } -static inline void rb_erase_cached(struct rb_node *node, - struct rb_root_cached *root) + +static inline struct rb_node * +rb_erase_cached(struct rb_node *node, struct rb_root_cached *root) { + struct rb_node *leftmost = NULL; + if (root->rb_leftmost == node) - root->rb_leftmost = rb_next(node); + leftmost = root->rb_leftmost = rb_next(node); + rb_erase(node, &root->rb_root); + + return leftmost; } static inline void rb_replace_node_cached(struct rb_node *victim, @@ -179,8 +185,10 @@ static inline void rb_replace_node_cached(struct rb_node *victim, * @node: node to insert * @tree: leftmost cached tree to insert @node into * @less: operator defining the (partial) node order + * + * Returns @node when it is the new leftmost, or NULL. */ -static __always_inline void +static __always_inline struct rb_node * rb_add_cached(struct rb_node *node, struct rb_root_cached *tree, bool (*less)(struct rb_node *, const struct rb_node *)) { @@ -200,6 +208,8 @@ rb_add_cached(struct rb_node *node, struct rb_root_cached *tree, rb_link_node(node, parent, link); rb_insert_color_cached(node, tree, leftmost); + + return leftmost ? node : NULL; } /** -- cgit v1.2.3 From a3b89864554bbce1594b7abdb5739fc708c1ca95 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 29 Apr 2020 17:05:15 +0200 Subject: rbtree, perf: Use new rbtree helpers Reduce rbtree boiler plate by using the new helpers. One noteworthy change is unification of the various (partial) compare functions. We construct a subtree match by forcing the sub-order to always match, see __group_cmp(). Due to 'const' we had to touch cgroup_id(). Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Acked-by: Tejun Heo Acked-by: Davidlohr Bueso --- include/linux/cgroup.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 451c2d26a5db..4f2f79de083e 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -307,7 +307,7 @@ void css_task_iter_end(struct css_task_iter *it); * Inline functions. */ -static inline u64 cgroup_id(struct cgroup *cgrp) +static inline u64 cgroup_id(const struct cgroup *cgrp) { return cgrp->kn->id; } @@ -701,7 +701,7 @@ void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen); struct cgroup_subsys_state; struct cgroup; -static inline u64 cgroup_id(struct cgroup *cgrp) { return 1; } +static inline u64 cgroup_id(const struct cgroup *cgrp) { return 1; } static inline void css_get(struct cgroup_subsys_state *css) {} static inline void css_put(struct cgroup_subsys_state *css) {} static inline int cgroup_attach_task_all(struct task_struct *from, -- cgit v1.2.3 From ae18ad281e825993d190073d0ae2ea35dee27ee1 Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Thu, 28 Jan 2021 14:10:38 +0100 Subject: sched: Remove MAX_USER_RT_PRIO Commit d46523ea32a7 ("[PATCH] fix MAX_USER_RT_PRIO and MAX_RT_PRIO") was introduced due to a a small time period in which the realtime patch set was using different values for MAX_USER_RT_PRIO and MAX_RT_PRIO. This is no longer true, i.e. now MAX_RT_PRIO == MAX_USER_RT_PRIO. Get rid of MAX_USER_RT_PRIO and make everything use MAX_RT_PRIO instead. Signed-off-by: Dietmar Eggemann Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/20210128131040.296856-2-dietmar.eggemann@arm.com --- include/linux/sched/prio.h | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h index 7d64feafc408..d111f2fd77ea 100644 --- a/include/linux/sched/prio.h +++ b/include/linux/sched/prio.h @@ -11,16 +11,9 @@ * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority * values are inverted: lower p->prio value means higher priority. - * - * The MAX_USER_RT_PRIO value allows the actual maximum - * RT priority to be separate from the value exported to - * user-space. This allows kernel threads to set their - * priority to a value higher than any user task. Note: - * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO. */ -#define MAX_USER_RT_PRIO 100 -#define MAX_RT_PRIO MAX_USER_RT_PRIO +#define MAX_RT_PRIO 100 #define MAX_PRIO (MAX_RT_PRIO + NICE_WIDTH) #define DEFAULT_PRIO (MAX_RT_PRIO + NICE_WIDTH / 2) -- cgit v1.2.3 From 9d061ba6bc170045857f3efe0bba5def30188d4d Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Thu, 28 Jan 2021 14:10:39 +0100 Subject: sched: Remove USER_PRIO, TASK_USER_PRIO and MAX_USER_PRIO The only remaining use of MAX_USER_PRIO (and USER_PRIO) is the SCALE_PRIO() definition in the PowerPC Cell architecture's Synergistic Processor Unit (SPU) scheduler. TASK_USER_PRIO isn't used anymore. Commit fe443ef2ac42 ("[POWERPC] spusched: Dynamic timeslicing for SCHED_OTHER") copied SCALE_PRIO() from the task scheduler in v2.6.23. Commit a4ec24b48dde ("sched: tidy up SCHED_RR") removed it from the task scheduler in v2.6.24. Commit 3ee237dddcd8 ("sched/prio: Add 3 macros of MAX_NICE, MIN_NICE and NICE_WIDTH in prio.h") introduced NICE_WIDTH much later. With: MAX_USER_PRIO = USER_PRIO(MAX_PRIO) = MAX_PRIO - MAX_RT_PRIO MAX_PRIO = MAX_RT_PRIO + NICE_WIDTH MAX_USER_PRIO = MAX_RT_PRIO + NICE_WIDTH - MAX_RT_PRIO MAX_USER_PRIO = NICE_WIDTH MAX_USER_PRIO can be replaced by NICE_WIDTH to be able to remove all the {*_}USER_PRIO defines. Signed-off-by: Dietmar Eggemann Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/20210128131040.296856-3-dietmar.eggemann@arm.com --- include/linux/sched/prio.h | 9 --------- 1 file changed, 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h index d111f2fd77ea..ab83d85e1183 100644 --- a/include/linux/sched/prio.h +++ b/include/linux/sched/prio.h @@ -26,15 +26,6 @@ #define NICE_TO_PRIO(nice) ((nice) + DEFAULT_PRIO) #define PRIO_TO_NICE(prio) ((prio) - DEFAULT_PRIO) -/* - * 'User priority' is the nice value converted to something we - * can work with better when scaling various scheduler parameters, - * it's a [ 0 ... 39 ] range. - */ -#define USER_PRIO(p) ((p)-MAX_RT_PRIO) -#define TASK_USER_PRIO(p) USER_PRIO((p)->static_prio) -#define MAX_USER_PRIO (USER_PRIO(MAX_PRIO)) - /* * Convert nice value [19,-20] to rlimit style value [1,40]. */ -- cgit v1.2.3 From 880cfed3a012d7863f42251791cea7fe78c39390 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 18 Jan 2021 15:12:18 +0100 Subject: static_call: Pull some static_call declarations to the type headers Some static call declarations are going to be needed on low level header files. Move the necessary material to the dedicated static call types header to avoid inclusion dependency hell. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Frederic Weisbecker Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/20210118141223.123667-4-frederic@kernel.org --- include/linux/static_call.h | 21 --------------------- include/linux/static_call_types.h | 27 +++++++++++++++++++++++++++ 2 files changed, 27 insertions(+), 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/static_call.h b/include/linux/static_call.h index 695da4c9b338..a2c064585c03 100644 --- a/include/linux/static_call.h +++ b/include/linux/static_call.h @@ -107,26 +107,10 @@ extern void arch_static_call_transform(void *site, void *tramp, void *func, bool #define STATIC_CALL_TRAMP_ADDR(name) &STATIC_CALL_TRAMP(name) -/* - * __ADDRESSABLE() is used to ensure the key symbol doesn't get stripped from - * the symbol table so that objtool can reference it when it generates the - * .static_call_sites section. - */ -#define __static_call(name) \ -({ \ - __ADDRESSABLE(STATIC_CALL_KEY(name)); \ - &STATIC_CALL_TRAMP(name); \ -}) - #else #define STATIC_CALL_TRAMP_ADDR(name) NULL #endif - -#define DECLARE_STATIC_CALL(name, func) \ - extern struct static_call_key STATIC_CALL_KEY(name); \ - extern typeof(func) STATIC_CALL_TRAMP(name); - #define static_call_update(name, func) \ ({ \ BUILD_BUG_ON(!__same_type(*(func), STATIC_CALL_TRAMP(name))); \ @@ -174,7 +158,6 @@ extern int static_call_text_reserved(void *start, void *end); }; \ ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) -#define static_call(name) __static_call(name) #define static_call_cond(name) (void)__static_call(name) #define EXPORT_STATIC_CALL(name) \ @@ -207,7 +190,6 @@ struct static_call_key { }; \ ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) -#define static_call(name) __static_call(name) #define static_call_cond(name) (void)__static_call(name) static inline @@ -252,9 +234,6 @@ struct static_call_key { .func = NULL, \ } -#define static_call(name) \ - ((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_KEY(name).func)) - static inline void __static_call_nop(void) { } /* diff --git a/include/linux/static_call_types.h b/include/linux/static_call_types.h index 89135bb35bf7..08f78b1b88b4 100644 --- a/include/linux/static_call_types.h +++ b/include/linux/static_call_types.h @@ -4,6 +4,7 @@ #include #include +#include #define STATIC_CALL_KEY_PREFIX __SCK__ #define STATIC_CALL_KEY_PREFIX_STR __stringify(STATIC_CALL_KEY_PREFIX) @@ -32,4 +33,30 @@ struct static_call_site { s32 key; }; +#define DECLARE_STATIC_CALL(name, func) \ + extern struct static_call_key STATIC_CALL_KEY(name); \ + extern typeof(func) STATIC_CALL_TRAMP(name); + +#ifdef CONFIG_HAVE_STATIC_CALL + +/* + * __ADDRESSABLE() is used to ensure the key symbol doesn't get stripped from + * the symbol table so that objtool can reference it when it generates the + * .static_call_sites section. + */ +#define __static_call(name) \ +({ \ + __ADDRESSABLE(STATIC_CALL_KEY(name)); \ + &STATIC_CALL_TRAMP(name); \ +}) + +#define static_call(name) __static_call(name) + +#else + +#define static_call(name) \ + ((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_KEY(name).func)) + +#endif /* CONFIG_HAVE_STATIC_CALL */ + #endif /* _STATIC_CALL_TYPES_H */ -- cgit v1.2.3 From 3f2a8fc4b15de18644e8a80a09edda168676e22c Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 18 Jan 2021 15:12:16 +0100 Subject: static_call/x86: Add __static_call_return0() Provide a stub function that return 0 and wire up the static call site patching to replace the CALL with a single 5 byte instruction that clears %RAX, the return value register. The function can be cast to any function pointer type that has a single %RAX return (including pointers). Also provide a version that returns an int for convenience. We are clearing the entire %RAX register in any case, whether the return value is 32 or 64 bits, since %RAX is always a scratch register anyway. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Frederic Weisbecker Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/20210118141223.123667-2-frederic@kernel.org --- include/linux/static_call.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/static_call.h b/include/linux/static_call.h index a2c064585c03..bd6735d10e2e 100644 --- a/include/linux/static_call.h +++ b/include/linux/static_call.h @@ -142,6 +142,8 @@ extern void __static_call_update(struct static_call_key *key, void *tramp, void extern int static_call_mod_init(struct module *mod); extern int static_call_text_reserved(void *start, void *end); +extern long __static_call_return0(void); + #define DEFINE_STATIC_CALL(name, _func) \ DECLARE_STATIC_CALL(name, _func); \ struct static_call_key STATIC_CALL_KEY(name) = { \ @@ -206,6 +208,11 @@ static inline int static_call_text_reserved(void *start, void *end) return 0; } +static inline long __static_call_return0(void) +{ + return 0; +} + #define EXPORT_STATIC_CALL(name) \ EXPORT_SYMBOL(STATIC_CALL_KEY(name)); \ EXPORT_SYMBOL(STATIC_CALL_TRAMP(name)) @@ -222,6 +229,11 @@ struct static_call_key { void *func; }; +static inline long __static_call_return0(void) +{ + return 0; +} + #define DEFINE_STATIC_CALL(name, _func) \ DECLARE_STATIC_CALL(name, _func); \ struct static_call_key STATIC_CALL_KEY(name) = { \ -- cgit v1.2.3 From 29fd01944b7273bb630c649a2104b7f9e4ef3fa6 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Mon, 18 Jan 2021 15:12:17 +0100 Subject: static_call: Provide DEFINE_STATIC_CALL_RET0() DECLARE_STATIC_CALL() must pass the original function targeted for a given static call. But DEFINE_STATIC_CALL() may want to initialize it as off. In this case we can't pass NULL (for functions without return value) or __static_call_return0 (for functions returning a value) directly to DEFINE_STATIC_CALL() as that may trigger a static call redeclaration with a different function prototype. Type casts neither can work around that as they don't get along with typeof(). The proper way to do that for functions that don't return a value is to use DEFINE_STATIC_CALL_NULL(). But functions returning a actual value don't have an equivalent yet. Provide DEFINE_STATIC_CALL_RET0() to solve this situation. Signed-off-by: Frederic Weisbecker Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/20210118141223.123667-3-frederic@kernel.org --- include/linux/static_call.h | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/static_call.h b/include/linux/static_call.h index bd6735d10e2e..d69dd8b976ca 100644 --- a/include/linux/static_call.h +++ b/include/linux/static_call.h @@ -144,13 +144,13 @@ extern int static_call_text_reserved(void *start, void *end); extern long __static_call_return0(void); -#define DEFINE_STATIC_CALL(name, _func) \ +#define __DEFINE_STATIC_CALL(name, _func, _func_init) \ DECLARE_STATIC_CALL(name, _func); \ struct static_call_key STATIC_CALL_KEY(name) = { \ - .func = _func, \ + .func = _func_init, \ .type = 1, \ }; \ - ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func) + ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func_init) #define DEFINE_STATIC_CALL_NULL(name, _func) \ DECLARE_STATIC_CALL(name, _func); \ @@ -178,12 +178,12 @@ struct static_call_key { void *func; }; -#define DEFINE_STATIC_CALL(name, _func) \ +#define __DEFINE_STATIC_CALL(name, _func, _func_init) \ DECLARE_STATIC_CALL(name, _func); \ struct static_call_key STATIC_CALL_KEY(name) = { \ - .func = _func, \ + .func = _func_init, \ }; \ - ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func) + ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func_init) #define DEFINE_STATIC_CALL_NULL(name, _func) \ DECLARE_STATIC_CALL(name, _func); \ @@ -234,10 +234,10 @@ static inline long __static_call_return0(void) return 0; } -#define DEFINE_STATIC_CALL(name, _func) \ +#define __DEFINE_STATIC_CALL(name, _func, _func_init) \ DECLARE_STATIC_CALL(name, _func); \ struct static_call_key STATIC_CALL_KEY(name) = { \ - .func = _func, \ + .func = _func_init, \ } #define DEFINE_STATIC_CALL_NULL(name, _func) \ @@ -286,4 +286,10 @@ static inline int static_call_text_reserved(void *start, void *end) #endif /* CONFIG_HAVE_STATIC_CALL */ +#define DEFINE_STATIC_CALL(name, _func) \ + __DEFINE_STATIC_CALL(name, _func, _func) + +#define DEFINE_STATIC_CALL_RET0(name, _func) \ + __DEFINE_STATIC_CALL(name, _func, __static_call_return0) + #endif /* _LINUX_STATIC_CALL_H */ -- cgit v1.2.3 From b965f1ddb47daa5b8b2e2bc9c921431236830367 Mon Sep 17 00:00:00 2001 From: "Peter Zijlstra (Intel)" Date: Mon, 18 Jan 2021 15:12:20 +0100 Subject: preempt/dynamic: Provide cond_resched() and might_resched() static calls Provide static calls to control cond_resched() (called in !CONFIG_PREEMPT) and might_resched() (called in CONFIG_PREEMPT_VOLUNTARY) to that we can override their behaviour when preempt= is overriden. Since the default behaviour is full preemption, both their calls are ignored when preempt= isn't passed. [fweisbec: branch might_resched() directly to __cond_resched(), only define static calls when PREEMPT_DYNAMIC] Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Frederic Weisbecker Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/20210118141223.123667-6-frederic@kernel.org --- include/linux/kernel.h | 23 +++++++++++++++++++---- include/linux/sched.h | 27 ++++++++++++++++++++++++--- 2 files changed, 43 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kernel.h b/include/linux/kernel.h index f7902d8c1048..cfd3d349f905 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -15,7 +15,7 @@ #include #include #include - +#include #include #include @@ -81,11 +81,26 @@ struct pt_regs; struct user; #ifdef CONFIG_PREEMPT_VOLUNTARY -extern int _cond_resched(void); -# define might_resched() _cond_resched() + +extern int __cond_resched(void); +# define might_resched() __cond_resched() + +#elif defined(CONFIG_PREEMPT_DYNAMIC) + +extern int __cond_resched(void); + +DECLARE_STATIC_CALL(might_resched, __cond_resched); + +static __always_inline void might_resched(void) +{ + static_call(might_resched)(); +} + #else + # define might_resched() do { } while (0) -#endif + +#endif /* CONFIG_PREEMPT_* */ #ifdef CONFIG_DEBUG_ATOMIC_SLEEP extern void ___might_sleep(const char *file, int line, int preempt_offset); diff --git a/include/linux/sched.h b/include/linux/sched.h index e1152227bb79..2f35594b8b53 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1871,11 +1871,32 @@ static inline int test_tsk_need_resched(struct task_struct *tsk) * value indicates whether a reschedule was done in fact. * cond_resched_lock() will drop the spinlock before scheduling, */ -#ifndef CONFIG_PREEMPTION -extern int _cond_resched(void); +#if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC) +extern int __cond_resched(void); + +#ifdef CONFIG_PREEMPT_DYNAMIC + +DECLARE_STATIC_CALL(cond_resched, __cond_resched); + +static __always_inline int _cond_resched(void) +{ + return static_call(cond_resched)(); +} + #else + +static inline int _cond_resched(void) +{ + return __cond_resched(); +} + +#endif /* CONFIG_PREEMPT_DYNAMIC */ + +#else + static inline int _cond_resched(void) { return 0; } -#endif + +#endif /* !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC) */ #define cond_resched() ({ \ ___might_sleep(__FILE__, __LINE__, 0); \ -- cgit v1.2.3 From 40607ee97e4eec5655cc0f76a720bdc4c63a6434 Mon Sep 17 00:00:00 2001 From: "Peter Zijlstra (Intel)" Date: Mon, 18 Jan 2021 15:12:22 +0100 Subject: preempt/dynamic: Provide irqentry_exit_cond_resched() static call Provide static call to control IRQ preemption (called in CONFIG_PREEMPT) so that we can override its behaviour when preempt= is overriden. Since the default behaviour is full preemption, its call is initialized to provide IRQ preemption when preempt= isn't passed. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Frederic Weisbecker Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/20210118141223.123667-8-frederic@kernel.org --- include/linux/entry-common.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index a104b298019a..883acef895bc 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -2,6 +2,7 @@ #ifndef __LINUX_ENTRYCOMMON_H #define __LINUX_ENTRYCOMMON_H +#include #include #include #include @@ -454,6 +455,9 @@ irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs); * Conditional reschedule with additional sanity checks. */ void irqentry_exit_cond_resched(void); +#ifdef CONFIG_PREEMPT_DYNAMIC +DECLARE_STATIC_CALL(irqentry_exit_cond_resched, irqentry_exit_cond_resched); +#endif /** * irqentry_exit - Handle return from exception that used irqentry_enter() -- cgit v1.2.3 From 73f44fe19d359635a607e8e8daa0da4001c1cfc2 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Wed, 27 Jan 2021 17:18:37 -0600 Subject: static_call: Allow module use without exposing static_call_key When exporting static_call_key; with EXPORT_STATIC_CALL*(), the module can use static_call_update() to change the function called. This is not desirable in general. Not exporting static_call_key however also disallows usage of static_call(), since objtool needs the key to construct the static_call_site. Solve this by allowing objtool to create the static_call_site using the trampoline address when it builds a module and cannot find the static_call_key symbol. The module loader will then try and map the trampole back to a key before it constructs the normal sites list. Doing this requires a trampoline -> key associsation, so add another magic section that keeps those. Originally-by: Peter Zijlstra (Intel) Signed-off-by: Josh Poimboeuf Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/20210127231837.ifddpn7rhwdaepiu@treble --- include/linux/static_call.h | 22 ++++++++++++++++++++-- include/linux/static_call_types.h | 27 +++++++++++++++++++++++++-- 2 files changed, 45 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/static_call.h b/include/linux/static_call.h index d69dd8b976ca..85ecc789f4ff 100644 --- a/include/linux/static_call.h +++ b/include/linux/static_call.h @@ -138,6 +138,12 @@ struct static_call_key { }; }; +/* For finding the key associated with a trampoline */ +struct static_call_tramp_key { + s32 tramp; + s32 key; +}; + extern void __static_call_update(struct static_call_key *key, void *tramp, void *func); extern int static_call_mod_init(struct module *mod); extern int static_call_text_reserved(void *start, void *end); @@ -165,11 +171,18 @@ extern long __static_call_return0(void); #define EXPORT_STATIC_CALL(name) \ EXPORT_SYMBOL(STATIC_CALL_KEY(name)); \ EXPORT_SYMBOL(STATIC_CALL_TRAMP(name)) - #define EXPORT_STATIC_CALL_GPL(name) \ EXPORT_SYMBOL_GPL(STATIC_CALL_KEY(name)); \ EXPORT_SYMBOL_GPL(STATIC_CALL_TRAMP(name)) +/* Leave the key unexported, so modules can't change static call targets: */ +#define EXPORT_STATIC_CALL_TRAMP(name) \ + EXPORT_SYMBOL(STATIC_CALL_TRAMP(name)); \ + ARCH_ADD_TRAMP_KEY(name) +#define EXPORT_STATIC_CALL_TRAMP_GPL(name) \ + EXPORT_SYMBOL_GPL(STATIC_CALL_TRAMP(name)); \ + ARCH_ADD_TRAMP_KEY(name) + #elif defined(CONFIG_HAVE_STATIC_CALL) static inline int static_call_init(void) { return 0; } @@ -216,11 +229,16 @@ static inline long __static_call_return0(void) #define EXPORT_STATIC_CALL(name) \ EXPORT_SYMBOL(STATIC_CALL_KEY(name)); \ EXPORT_SYMBOL(STATIC_CALL_TRAMP(name)) - #define EXPORT_STATIC_CALL_GPL(name) \ EXPORT_SYMBOL_GPL(STATIC_CALL_KEY(name)); \ EXPORT_SYMBOL_GPL(STATIC_CALL_TRAMP(name)) +/* Leave the key unexported, so modules can't change static call targets: */ +#define EXPORT_STATIC_CALL_TRAMP(name) \ + EXPORT_SYMBOL(STATIC_CALL_TRAMP(name)) +#define EXPORT_STATIC_CALL_TRAMP_GPL(name) \ + EXPORT_SYMBOL_GPL(STATIC_CALL_TRAMP(name)) + #else /* Generic implementation */ static inline int static_call_init(void) { return 0; } diff --git a/include/linux/static_call_types.h b/include/linux/static_call_types.h index 08f78b1b88b4..ae5662d368b9 100644 --- a/include/linux/static_call_types.h +++ b/include/linux/static_call_types.h @@ -10,6 +10,7 @@ #define STATIC_CALL_KEY_PREFIX_STR __stringify(STATIC_CALL_KEY_PREFIX) #define STATIC_CALL_KEY_PREFIX_LEN (sizeof(STATIC_CALL_KEY_PREFIX_STR) - 1) #define STATIC_CALL_KEY(name) __PASTE(STATIC_CALL_KEY_PREFIX, name) +#define STATIC_CALL_KEY_STR(name) __stringify(STATIC_CALL_KEY(name)) #define STATIC_CALL_TRAMP_PREFIX __SCT__ #define STATIC_CALL_TRAMP_PREFIX_STR __stringify(STATIC_CALL_TRAMP_PREFIX) @@ -39,17 +40,39 @@ struct static_call_site { #ifdef CONFIG_HAVE_STATIC_CALL +#define __raw_static_call(name) (&STATIC_CALL_TRAMP(name)) + +#ifdef CONFIG_HAVE_STATIC_CALL_INLINE + /* * __ADDRESSABLE() is used to ensure the key symbol doesn't get stripped from * the symbol table so that objtool can reference it when it generates the * .static_call_sites section. */ +#define __STATIC_CALL_ADDRESSABLE(name) \ + __ADDRESSABLE(STATIC_CALL_KEY(name)) + #define __static_call(name) \ ({ \ - __ADDRESSABLE(STATIC_CALL_KEY(name)); \ - &STATIC_CALL_TRAMP(name); \ + __STATIC_CALL_ADDRESSABLE(name); \ + __raw_static_call(name); \ }) +#else /* !CONFIG_HAVE_STATIC_CALL_INLINE */ + +#define __STATIC_CALL_ADDRESSABLE(name) +#define __static_call(name) __raw_static_call(name) + +#endif /* CONFIG_HAVE_STATIC_CALL_INLINE */ + +#ifdef MODULE +#define __STATIC_CALL_MOD_ADDRESSABLE(name) +#define static_call_mod(name) __raw_static_call(name) +#else +#define __STATIC_CALL_MOD_ADDRESSABLE(name) __STATIC_CALL_ADDRESSABLE(name) +#define static_call_mod(name) __static_call(name) +#endif + #define static_call(name) __static_call(name) #else -- cgit v1.2.3 From ef72661e28c64ad610f89acc2832ec67b27ba438 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 25 Jan 2021 16:26:50 +0100 Subject: sched: Harden PREEMPT_DYNAMIC Use the new EXPORT_STATIC_CALL_TRAMP() / static_call_mod() to unexport the static_call_key for the PREEMPT_DYNAMIC calls such that modules can no longer update these calls. Having modules change/hi-jack the preemption calls would be horrible. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar --- include/linux/kernel.h | 2 +- include/linux/sched.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kernel.h b/include/linux/kernel.h index cfd3d349f905..5b7ed6dc99ac 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -93,7 +93,7 @@ DECLARE_STATIC_CALL(might_resched, __cond_resched); static __always_inline void might_resched(void) { - static_call(might_resched)(); + static_call_mod(might_resched)(); } #else diff --git a/include/linux/sched.h b/include/linux/sched.h index 2f35594b8b53..4d568288abf9 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1880,7 +1880,7 @@ DECLARE_STATIC_CALL(cond_resched, __cond_resched); static __always_inline int _cond_resched(void) { - return static_call(cond_resched)(); + return static_call_mod(cond_resched)(); } #else -- cgit v1.2.3 From 43789ef3f7d61aa7bed0cb2764e588fc990c30ef Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Mon, 1 Feb 2021 00:05:45 +0100 Subject: rcu/nocb: Perform deferred wake up before last idle's need_resched() check Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP kthread (rcuog) to be serviced. Usually a local wake up happening while running the idle task is handled in one of the need_resched() checks carefully placed within the idle loop that can break to the scheduler. Unfortunately the call to rcu_idle_enter() is already beyond the last generic need_resched() check and we may halt the CPU with a resched request unhandled, leaving the task hanging. Fix this with splitting the rcuog wakeup handling from rcu_idle_enter() and place it before the last generic need_resched() check in the idle loop. It is then assumed that no call to call_rcu() will be performed after that in the idle loop until the CPU is put in low power mode. Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf) Reported-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-3-frederic@kernel.org --- include/linux/rcupdate.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index fd02c5fa60cb..36c2119de702 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -110,8 +110,10 @@ static inline void rcu_user_exit(void) { } #ifdef CONFIG_RCU_NOCB_CPU void rcu_init_nohz(void); +void rcu_nocb_flush_deferred_wakeup(void); #else /* #ifdef CONFIG_RCU_NOCB_CPU */ static inline void rcu_init_nohz(void) { } +static inline void rcu_nocb_flush_deferred_wakeup(void) { } #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */ /** -- cgit v1.2.3 From 4ae7dc97f726ea95c58ac58af71cc034ad22d7de Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Mon, 1 Feb 2021 00:05:48 +0100 Subject: entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point Following the idle loop model, cleanly check for pending rcuog wakeup before the last rescheduling point upon resuming to guest mode. This way we can avoid to do it from rcu_user_enter() with the last resort self-IPI hack that enforces rescheduling. Suggested-by: Peter Zijlstra Signed-off-by: Frederic Weisbecker Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-6-frederic@kernel.org --- include/linux/entry-kvm.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h index 9b93f8584ff7..8b2b1d68b954 100644 --- a/include/linux/entry-kvm.h +++ b/include/linux/entry-kvm.h @@ -46,6 +46,20 @@ static inline int arch_xfer_to_guest_mode_handle_work(struct kvm_vcpu *vcpu, */ int xfer_to_guest_mode_handle_work(struct kvm_vcpu *vcpu); +/** + * xfer_to_guest_mode_prepare - Perform last minute preparation work that + * need to be handled while IRQs are disabled + * upon entering to guest. + * + * Has to be invoked with interrupts disabled before the last call + * to xfer_to_guest_mode_work_pending(). + */ +static inline void xfer_to_guest_mode_prepare(void) +{ + lockdep_assert_irqs_disabled(); + rcu_nocb_flush_deferred_wakeup(); +} + /** * __xfer_to_guest_mode_work_pending - Check if work is pending * -- cgit v1.2.3