From cb2517653fccaf9f9b4ae968c7ee005c1bbacdc5 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Fri, 5 Feb 2016 09:08:36 +0000 Subject: sched/debug: Make schedstats a runtime tunable that is disabled by default schedstats is very useful during debugging and performance tuning but it incurs overhead to calculate the stats. As such, even though it can be disabled at build time, it is often enabled as the information is useful. This patch adds a kernel command-line and sysctl tunable to enable or disable schedstats on demand (when it's built in). It is disabled by default as someone who knows they need it can also learn to enable it when necessary. The benefits are dependent on how scheduler-intensive the workload is. If it is then the patch reduces the number of cycles spent calculating the stats with a small benefit from reducing the cache footprint of the scheduler. These measurements were taken from a 48-core 2-socket machine with Xeon(R) E5-2670 v3 cpus although they were also tested on a single socket machine 8-core machine with Intel i7-3770 processors. netperf-tcp 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v3r1 Hmean 64 560.45 ( 0.00%) 575.98 ( 2.77%) Hmean 128 766.66 ( 0.00%) 795.79 ( 3.80%) Hmean 256 950.51 ( 0.00%) 981.50 ( 3.26%) Hmean 1024 1433.25 ( 0.00%) 1466.51 ( 2.32%) Hmean 2048 2810.54 ( 0.00%) 2879.75 ( 2.46%) Hmean 3312 4618.18 ( 0.00%) 4682.09 ( 1.38%) Hmean 4096 5306.42 ( 0.00%) 5346.39 ( 0.75%) Hmean 8192 10581.44 ( 0.00%) 10698.15 ( 1.10%) Hmean 16384 18857.70 ( 0.00%) 18937.61 ( 0.42%) Small gains here, UDP_STREAM showed nothing intresting and neither did the TCP_RR tests. The gains on the 8-core machine were very similar. tbench4 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v3r1 Hmean mb/sec-1 500.85 ( 0.00%) 522.43 ( 4.31%) Hmean mb/sec-2 984.66 ( 0.00%) 1018.19 ( 3.41%) Hmean mb/sec-4 1827.91 ( 0.00%) 1847.78 ( 1.09%) Hmean mb/sec-8 3561.36 ( 0.00%) 3611.28 ( 1.40%) Hmean mb/sec-16 5824.52 ( 0.00%) 5929.03 ( 1.79%) Hmean mb/sec-32 10943.10 ( 0.00%) 10802.83 ( -1.28%) Hmean mb/sec-64 15950.81 ( 0.00%) 16211.31 ( 1.63%) Hmean mb/sec-128 15302.17 ( 0.00%) 15445.11 ( 0.93%) Hmean mb/sec-256 14866.18 ( 0.00%) 15088.73 ( 1.50%) Hmean mb/sec-512 15223.31 ( 0.00%) 15373.69 ( 0.99%) Hmean mb/sec-1024 14574.25 ( 0.00%) 14598.02 ( 0.16%) Hmean mb/sec-2048 13569.02 ( 0.00%) 13733.86 ( 1.21%) Hmean mb/sec-3072 12865.98 ( 0.00%) 13209.23 ( 2.67%) Small gains of 2-4% at low thread counts and otherwise flat. The gains on the 8-core machine were slightly different tbench4 on 8-core i7-3770 single socket machine Hmean mb/sec-1 442.59 ( 0.00%) 448.73 ( 1.39%) Hmean mb/sec-2 796.68 ( 0.00%) 794.39 ( -0.29%) Hmean mb/sec-4 1322.52 ( 0.00%) 1343.66 ( 1.60%) Hmean mb/sec-8 2611.65 ( 0.00%) 2694.86 ( 3.19%) Hmean mb/sec-16 2537.07 ( 0.00%) 2609.34 ( 2.85%) Hmean mb/sec-32 2506.02 ( 0.00%) 2578.18 ( 2.88%) Hmean mb/sec-64 2511.06 ( 0.00%) 2569.16 ( 2.31%) Hmean mb/sec-128 2313.38 ( 0.00%) 2395.50 ( 3.55%) Hmean mb/sec-256 2110.04 ( 0.00%) 2177.45 ( 3.19%) Hmean mb/sec-512 2072.51 ( 0.00%) 2053.97 ( -0.89%) In constract, this shows a relatively steady 2-3% gain at higher thread counts. Due to the nature of the patch and the type of workload, it's not a surprise that the result will depend on the CPU used. hackbench-pipes 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v3r1 Amean 1 0.0637 ( 0.00%) 0.0660 ( -3.59%) Amean 4 0.1229 ( 0.00%) 0.1181 ( 3.84%) Amean 7 0.1921 ( 0.00%) 0.1911 ( 0.52%) Amean 12 0.3117 ( 0.00%) 0.2923 ( 6.23%) Amean 21 0.4050 ( 0.00%) 0.3899 ( 3.74%) Amean 30 0.4586 ( 0.00%) 0.4433 ( 3.33%) Amean 48 0.5910 ( 0.00%) 0.5694 ( 3.65%) Amean 79 0.8663 ( 0.00%) 0.8626 ( 0.43%) Amean 110 1.1543 ( 0.00%) 1.1517 ( 0.22%) Amean 141 1.4457 ( 0.00%) 1.4290 ( 1.16%) Amean 172 1.7090 ( 0.00%) 1.6924 ( 0.97%) Amean 192 1.9126 ( 0.00%) 1.9089 ( 0.19%) Some small gains and losses and while the variance data is not included, it's close to the noise. The UMA machine did not show anything particularly different pipetest 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v2r2 Min Time 4.13 ( 0.00%) 3.99 ( 3.39%) 1st-qrtle Time 4.38 ( 0.00%) 4.27 ( 2.51%) 2nd-qrtle Time 4.46 ( 0.00%) 4.39 ( 1.57%) 3rd-qrtle Time 4.56 ( 0.00%) 4.51 ( 1.10%) Max-90% Time 4.67 ( 0.00%) 4.60 ( 1.50%) Max-93% Time 4.71 ( 0.00%) 4.65 ( 1.27%) Max-95% Time 4.74 ( 0.00%) 4.71 ( 0.63%) Max-99% Time 4.88 ( 0.00%) 4.79 ( 1.84%) Max Time 4.93 ( 0.00%) 4.83 ( 2.03%) Mean Time 4.48 ( 0.00%) 4.39 ( 1.91%) Best99%Mean Time 4.47 ( 0.00%) 4.39 ( 1.91%) Best95%Mean Time 4.46 ( 0.00%) 4.38 ( 1.93%) Best90%Mean Time 4.45 ( 0.00%) 4.36 ( 1.98%) Best50%Mean Time 4.36 ( 0.00%) 4.25 ( 2.49%) Best10%Mean Time 4.23 ( 0.00%) 4.10 ( 3.13%) Best5%Mean Time 4.19 ( 0.00%) 4.06 ( 3.20%) Best1%Mean Time 4.13 ( 0.00%) 4.00 ( 3.39%) Small improvement and similar gains were seen on the UMA machine. The gain is small but it stands to reason that doing less work in the scheduler is a good thing. The downside is that the lack of schedstats and tracepoints may be surprising to experts doing performance analysis until they find the existence of the schedstats= parameter or schedstats sysctl. It will be automatically activated for latencytop and sleep profiling to alleviate the problem. For tracepoints, there is a simple warning as it's not safe to activate schedstats in the context when it's known the tracepoint may be wanted but is unavailable. Signed-off-by: Mel Gorman Reviewed-by: Matt Fleming Reviewed-by: Srikar Dronamraju Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1454663316-22048-1-git-send-email-mgorman@techsingularity.net Signed-off-by: Ingo Molnar --- include/linux/latencytop.h | 3 +++ include/linux/sched.h | 4 ++++ include/linux/sched/sysctl.h | 4 ++++ 3 files changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/latencytop.h b/include/linux/latencytop.h index e23121f9d82a..59ccab297ae0 100644 --- a/include/linux/latencytop.h +++ b/include/linux/latencytop.h @@ -37,6 +37,9 @@ account_scheduler_latency(struct task_struct *task, int usecs, int inter) void clear_all_latency_tracing(struct task_struct *p); +extern int sysctl_latencytop(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos); + #else static inline void diff --git a/include/linux/sched.h b/include/linux/sched.h index a10494a94cc3..a292c4b7e94c 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -920,6 +920,10 @@ static inline int sched_info_on(void) #endif } +#ifdef CONFIG_SCHEDSTATS +void force_schedstat_enabled(void); +#endif + enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index c9e4731cf10b..4f080ab4f2cd 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -95,4 +95,8 @@ extern int sysctl_numa_balancing(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); +extern int sysctl_schedstats(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos); + #endif /* _SCHED_SYSCTL_H */ -- cgit v1.2.3 From f4bcfa1da6fdcbc7a0854a28603bffc3c5656332 Mon Sep 17 00:00:00 2001 From: Stafford Horne Date: Tue, 23 Feb 2016 22:39:28 +0900 Subject: sched/wait: Fix wait_event_freezable() documentation I noticed the comment label 'wait_event' was wrong. Signed-off-by: Stafford Horne Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1456234768-24933-1-git-send-email-shorne@gmail.com Signed-off-by: Ingo Molnar --- include/linux/wait.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/wait.h b/include/linux/wait.h index ae71a769b89e..27d7a0ab5da3 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -338,7 +338,7 @@ do { \ schedule(); try_to_freeze()) /** - * wait_event - sleep (or freeze) until a condition gets true + * wait_event_freezable - sleep (or freeze) until a condition gets true * @wq: the waitqueue to wait on * @condition: a C expression for the event to wait for * -- cgit v1.2.3 From 13b35686e8b934ff78f59cef0c65fa3a43f8eeaf Mon Sep 17 00:00:00 2001 From: "Peter Zijlstra (Intel)" Date: Fri, 19 Feb 2016 09:46:37 +0100 Subject: wait.[ch]: Introduce the simple waitqueue (swait) implementation The existing wait queue support has support for custom wake up call backs, wake flags, wake key (passed to call back) and exclusive flags that allow wakers to be tagged as exclusive, for limiting the number of wakers. In a lot of cases, none of these features are used, and hence we can benefit from a slimmed down version that lowers memory overhead and reduces runtime overhead. The concept originated from -rt, where waitqueues are a constant source of trouble, as we can't convert the head lock to a raw spinlock due to fancy and long lasting callbacks. With the removal of custom callbacks, we can use a raw lock for queue list manipulations, hence allowing the simple wait support to be used in -rt. [Patch is from PeterZ which is based on Thomas version. Commit message is written by Paul G. Daniel: - Fixed some compile issues - Added non-lazy implementation of swake_up_locked as suggested by Boqun Feng.] Originally-by: Thomas Gleixner Signed-off-by: Daniel Wagner Acked-by: Peter Zijlstra (Intel) Cc: linux-rt-users@vger.kernel.org Cc: Boqun Feng Cc: Marcelo Tosatti Cc: Steven Rostedt Cc: Paul Gortmaker Cc: Paolo Bonzini Cc: "Paul E. McKenney" Link: http://lkml.kernel.org/r/1455871601-27484-2-git-send-email-wagi@monom.org Signed-off-by: Thomas Gleixner --- include/linux/swait.h | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 172 insertions(+) create mode 100644 include/linux/swait.h (limited to 'include/linux') diff --git a/include/linux/swait.h b/include/linux/swait.h new file mode 100644 index 000000000000..c1f9c62a8a50 --- /dev/null +++ b/include/linux/swait.h @@ -0,0 +1,172 @@ +#ifndef _LINUX_SWAIT_H +#define _LINUX_SWAIT_H + +#include +#include +#include +#include + +/* + * Simple wait queues + * + * While these are very similar to the other/complex wait queues (wait.h) the + * most important difference is that the simple waitqueue allows for + * deterministic behaviour -- IOW it has strictly bounded IRQ and lock hold + * times. + * + * In order to make this so, we had to drop a fair number of features of the + * other waitqueue code; notably: + * + * - mixing INTERRUPTIBLE and UNINTERRUPTIBLE sleeps on the same waitqueue; + * all wakeups are TASK_NORMAL in order to avoid O(n) lookups for the right + * sleeper state. + * + * - the exclusive mode; because this requires preserving the list order + * and this is hard. + * + * - custom wake functions; because you cannot give any guarantees about + * random code. + * + * As a side effect of this; the data structures are slimmer. + * + * One would recommend using this wait queue where possible. + */ + +struct task_struct; + +struct swait_queue_head { + raw_spinlock_t lock; + struct list_head task_list; +}; + +struct swait_queue { + struct task_struct *task; + struct list_head task_list; +}; + +#define __SWAITQUEUE_INITIALIZER(name) { \ + .task = current, \ + .task_list = LIST_HEAD_INIT((name).task_list), \ +} + +#define DECLARE_SWAITQUEUE(name) \ + struct swait_queue name = __SWAITQUEUE_INITIALIZER(name) + +#define __SWAIT_QUEUE_HEAD_INITIALIZER(name) { \ + .lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \ + .task_list = LIST_HEAD_INIT((name).task_list), \ +} + +#define DECLARE_SWAIT_QUEUE_HEAD(name) \ + struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INITIALIZER(name) + +extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name, + struct lock_class_key *key); + +#define init_swait_queue_head(q) \ + do { \ + static struct lock_class_key __key; \ + __init_swait_queue_head((q), #q, &__key); \ + } while (0) + +#ifdef CONFIG_LOCKDEP +# define __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name) \ + ({ init_swait_queue_head(&name); name; }) +# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name) \ + struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name) +#else +# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name) \ + DECLARE_SWAIT_QUEUE_HEAD(name) +#endif + +static inline int swait_active(struct swait_queue_head *q) +{ + return !list_empty(&q->task_list); +} + +extern void swake_up(struct swait_queue_head *q); +extern void swake_up_all(struct swait_queue_head *q); +extern void swake_up_locked(struct swait_queue_head *q); + +extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait); +extern void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state); +extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state); + +extern void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait); +extern void finish_swait(struct swait_queue_head *q, struct swait_queue *wait); + +/* as per ___wait_event() but for swait, therefore "exclusive == 0" */ +#define ___swait_event(wq, condition, state, ret, cmd) \ +({ \ + struct swait_queue __wait; \ + long __ret = ret; \ + \ + INIT_LIST_HEAD(&__wait.task_list); \ + for (;;) { \ + long __int = prepare_to_swait_event(&wq, &__wait, state);\ + \ + if (condition) \ + break; \ + \ + if (___wait_is_interruptible(state) && __int) { \ + __ret = __int; \ + break; \ + } \ + \ + cmd; \ + } \ + finish_swait(&wq, &__wait); \ + __ret; \ +}) + +#define __swait_event(wq, condition) \ + (void)___swait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, \ + schedule()) + +#define swait_event(wq, condition) \ +do { \ + if (condition) \ + break; \ + __swait_event(wq, condition); \ +} while (0) + +#define __swait_event_timeout(wq, condition, timeout) \ + ___swait_event(wq, ___wait_cond_timeout(condition), \ + TASK_UNINTERRUPTIBLE, timeout, \ + __ret = schedule_timeout(__ret)) + +#define swait_event_timeout(wq, condition, timeout) \ +({ \ + long __ret = timeout; \ + if (!___wait_cond_timeout(condition)) \ + __ret = __swait_event_timeout(wq, condition, timeout); \ + __ret; \ +}) + +#define __swait_event_interruptible(wq, condition) \ + ___swait_event(wq, condition, TASK_INTERRUPTIBLE, 0, \ + schedule()) + +#define swait_event_interruptible(wq, condition) \ +({ \ + int __ret = 0; \ + if (!(condition)) \ + __ret = __swait_event_interruptible(wq, condition); \ + __ret; \ +}) + +#define __swait_event_interruptible_timeout(wq, condition, timeout) \ + ___swait_event(wq, ___wait_cond_timeout(condition), \ + TASK_INTERRUPTIBLE, timeout, \ + __ret = schedule_timeout(__ret)) + +#define swait_event_interruptible_timeout(wq, condition, timeout) \ +({ \ + long __ret = timeout; \ + if (!___wait_cond_timeout(condition)) \ + __ret = __swait_event_interruptible_timeout(wq, \ + condition, timeout); \ + __ret; \ +}) + +#endif /* _LINUX_SWAIT_H */ -- cgit v1.2.3 From 8577370fb0cbe88266b7583d8d3b9f43ced077a0 Mon Sep 17 00:00:00 2001 From: Marcelo Tosatti Date: Fri, 19 Feb 2016 09:46:39 +0100 Subject: KVM: Use simple waitqueue for vcpu->wq The problem: On -rt, an emulated LAPIC timer instances has the following path: 1) hard interrupt 2) ksoftirqd is scheduled 3) ksoftirqd wakes up vcpu thread 4) vcpu thread is scheduled This extra context switch introduces unnecessary latency in the LAPIC path for a KVM guest. The solution: Allow waking up vcpu thread from hardirq context, thus avoiding the need for ksoftirqd to be scheduled. Normal waitqueues make use of spinlocks, which on -RT are sleepable locks. Therefore, waking up a waitqueue waiter involves locking a sleeping lock, which is not allowed from hard interrupt context. cyclictest command line: This patch reduces the average latency in my tests from 14us to 11us. Daniel writes: Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency benchmark on mainline. The test was run 1000 times on tip/sched/core 4.4.0-rc8-01134-g0905f04: ./x86-run x86/tscdeadline_latency.flat -cpu host with idle=poll. The test seems not to deliver really stable numbers though most of them are smaller. Paolo write: "Anything above ~10000 cycles means that the host went to C1 or lower---the number means more or less nothing in that case. The mean shows an improvement indeed." Before: min max mean std count 1000.000000 1000.000000 1000.000000 1000.000000 mean 5162.596000 2019270.084000 5824.491541 20681.645558 std 75.431231 622607.723969 89.575700 6492.272062 min 4466.000000 23928.000000 5537.926500 585.864966 25% 5163.000000 1613252.750000 5790.132275 16683.745433 50% 5175.000000 2281919.000000 5834.654000 23151.990026 75% 5190.000000 2382865.750000 5861.412950 24148.206168 max 5228.000000 4175158.000000 6254.827300 46481.048691 After min max mean std count 1000.000000 1000.00000 1000.000000 1000.000000 mean 5143.511000 2076886.10300 5813.312474 21207.357565 std 77.668322 610413.09583 86.541500 6331.915127 min 4427.000000 25103.00000 5529.756600 559.187707 25% 5148.000000 1691272.75000 5784.889825 17473.518244 50% 5160.000000 2308328.50000 5832.025000 23464.837068 75% 5172.000000 2393037.75000 5853.177675 24223.969976 max 5222.000000 3922458.00000 6186.720500 42520.379830 [Patch was originaly based on the swait implementation found in the -rt tree. Daniel ported it to mainline's version and gathered the benchmark numbers for tscdeadline_latency test.] Signed-off-by: Daniel Wagner Acked-by: Peter Zijlstra (Intel) Cc: linux-rt-users@vger.kernel.org Cc: Boqun Feng Cc: Marcelo Tosatti Cc: Steven Rostedt Cc: Paul Gortmaker Cc: Paolo Bonzini Cc: "Paul E. McKenney" Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org Signed-off-by: Thomas Gleixner --- include/linux/kvm_host.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 861f690aa791..5276fe0916fc 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -218,7 +219,7 @@ struct kvm_vcpu { int fpu_active; int guest_fpu_loaded, guest_xcr0_loaded; unsigned char fpu_counter; - wait_queue_head_t wq; + struct swait_queue_head wq; struct pid *pid; int sigset_active; sigset_t sigset; @@ -782,7 +783,7 @@ static inline bool kvm_arch_has_assigned_device(struct kvm *kvm) } #endif -static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) +static inline struct swait_queue_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) { #ifdef __KVM_HAVE_ARCH_WQP return vcpu->arch.wqp; -- cgit v1.2.3 From ff77e468535987b3d21b7bd4da15608ea3ce7d0b Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 18 Jan 2016 15:27:07 +0100 Subject: sched/rt: Fix PI handling vs. sched_setscheduler() Andrea Parri reported: > I found that the following scenario (with CONFIG_RT_GROUP_SCHED=y) is not > handled correctly: > > T1 (prio = 20) > lock(rtmutex); > > T2 (prio = 20) > blocks on rtmutex (rt_nr_boosted = 0 on T1's rq) > > T1 (prio = 20) > sys_set_scheduler(prio = 0) > [new_effective_prio == oldprio] > T1 prio = 20 (rt_nr_boosted = 0 on T1's rq) > > The last step is incorrect as T1 is now boosted (c.f., rt_se_boosted()); > in particular, if we continue with > > T1 (prio = 20) > unlock(rtmutex) > wakeup(T2) > adjust_prio(T1) > [prio != rt_mutex_getprio(T1)] > dequeue(T1) > rt_nr_boosted = (unsigned long)(-1) > ... > T1 prio = 0 > > then we end up leaving rt_nr_boosted in an "inconsistent" state. > > The simple program attached could reproduce the previous scenario; note > that, as a consequence of the presence of this state, the "assertion" > > WARN_ON(!rt_nr_running && rt_nr_boosted) > > from dec_rt_group() may trigger. So normally we dequeue/enqueue tasks in sched_setscheduler(), which would ensure the accounting stays correct. However in the early PI path we fail to do so. So this was introduced at around v3.14, by: c365c292d059 ("sched: Consider pi boosting in setscheduler()") which fixed another problem exactly because that dequeue/enqueue, joy. Fix this by teaching rt about DEQUEUE_SAVE/ENQUEUE_RESTORE and have it preserve runqueue location with that option. This requires decoupling the on_rt_rq() state from being on the list. In order to allow for explicit movement during the SAVE/RESTORE, introduce {DE,EN}QUEUE_MOVE. We still must use SAVE/RESTORE in these cases to preserve other invariants. Respecting the SAVE/RESTORE flags also has the (nice) side-effect that things like sys_nice()/sys_sched_setaffinity() also do not reorder FIFO tasks (whereas they used to before this patch). Reported-by: Andrea Parri Tested-by: Andrea Parri Signed-off-by: Peter Zijlstra (Intel) Cc: Juri Lelli Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Steven Rostedt Cc: Thomas Gleixner Signed-off-by: Ingo Molnar --- include/linux/sched.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index a292c4b7e94c..87e5f9886ac8 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1293,6 +1293,8 @@ struct sched_rt_entity { unsigned long timeout; unsigned long watchdog_stamp; unsigned int time_slice; + unsigned short on_rq; + unsigned short on_list; struct sched_rt_entity *back; #ifdef CONFIG_RT_GROUP_SCHED -- cgit v1.2.3 From f904f58263e1df5f70feb8b283f4bbe662847334 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Fri, 26 Feb 2016 14:54:56 +0100 Subject: sched/debug: Fix preempt_disable_ip recording for preempt_disable() The preempt_disable() invokes preempt_count_add() which saves the caller in ->preempt_disable_ip. It uses CALLER_ADDR1 which does not look for its caller but for the parent of the caller. Which means we get the correct caller for something like spin_lock() unless the architectures inlines those invocations. It is always wrong for preempt_disable() or local_bh_disable(). This patch makes the function get_lock_parent_ip() which tries CALLER_ADDR0,1,2 if the former is a locking function. This seems to record the preempt_disable() caller properly for preempt_disable() itself as well as for get_cpu_var() or local_bh_disable(). Steven asked for the get_parent_ip() -> get_lock_parent_ip() rename. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Steven Rostedt Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20160226135456.GB18244@linutronix.de Signed-off-by: Ingo Molnar --- include/linux/ftrace.h | 12 ++++++++++++ include/linux/sched.h | 2 -- 2 files changed, 12 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index c2b340e23f62..6d9df3f7e334 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -713,6 +713,18 @@ static inline void __ftrace_enabled_restore(int enabled) #define CALLER_ADDR5 ((unsigned long)ftrace_return_address(5)) #define CALLER_ADDR6 ((unsigned long)ftrace_return_address(6)) +static inline unsigned long get_lock_parent_ip(void) +{ + unsigned long addr = CALLER_ADDR0; + + if (!in_lock_functions(addr)) + return addr; + addr = CALLER_ADDR1; + if (!in_lock_functions(addr)) + return addr; + return CALLER_ADDR2; +} + #ifdef CONFIG_IRQSOFF_TRACER extern void time_hardirqs_on(unsigned long a0, unsigned long a1); extern void time_hardirqs_off(unsigned long a0, unsigned long a1); diff --git a/include/linux/sched.h b/include/linux/sched.h index 87e5f9886ac8..9519b66fc21f 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -182,8 +182,6 @@ extern void update_cpu_load_nohz(int active); static inline void update_cpu_load_nohz(int active) { } #endif -extern unsigned long get_parent_ip(unsigned long addr); - extern void dump_cpu_task(int cpu); struct seq_file; -- cgit v1.2.3 From 72f9f3fdc928dc3ecd223e801b32d930b662b6ed Mon Sep 17 00:00:00 2001 From: Luca Abeni Date: Mon, 7 Mar 2016 12:27:04 +0100 Subject: sched/deadline: Remove dl_new from struct sched_dl_entity The dl_new field of struct sched_dl_entity is currently used to identify new deadline tasks, so that their deadline and runtime can be properly initialised. However, these tasks can be easily identified by checking if their deadline is smaller than the current time when they switch to SCHED_DEADLINE. So, dl_new can be removed by introducing this check in switched_to_dl(); this allows to simplify the SCHED_DEADLINE code. Signed-off-by: Luca Abeni Signed-off-by: Peter Zijlstra (Intel) Cc: Juri Lelli Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1457350024-7825-2-git-send-email-luca.abeni@unitn.it Signed-off-by: Ingo Molnar --- include/linux/sched.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 9519b66fc21f..838a89a78332 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1333,10 +1333,6 @@ struct sched_dl_entity { * task has to wait for a replenishment to be performed at the * next firing of dl_timer. * - * @dl_new tells if a new instance arrived. If so we must - * start executing it with full runtime and reset its absolute - * deadline; - * * @dl_boosted tells if we are boosted due to DI. If so we are * outside bandwidth enforcement mechanism (but only until we * exit the critical section); @@ -1344,7 +1340,7 @@ struct sched_dl_entity { * @dl_yielded tells if task gave up the cpu before consuming * all its available runtime during the last job. */ - int dl_throttled, dl_new, dl_boosted, dl_yielded; + int dl_throttled, dl_boosted, dl_yielded; /* * Bandwidth enforcement timer. Each -deadline task has its -- cgit v1.2.3