cpufreq: schedutil: New governor based on scheduler utilization data

Add a new cpufreq scaling governor, called "schedutil", that uses scheduler-provided CPU utilization information as input for making its decisions.

Doing that is possible after commit 34e2c555f3e1 (cpufreq: Add mechanism for registering utilization update callbacks) that introduced cpufreq_update_util() called by the scheduler on utilization changes (from CFS) and RT/DL task status updates. In particular, CPU frequency scaling decisions may be based on the utilization data passed to cpufreq_update_util() by CFS.

The new governor is relatively simple.

The frequency selection formula used by it depends on whether or not the utilization is frequency-invariant. In the frequency-invariant case the new CPU frequency is given by

	next_freq = 1.25 * max_freq * util / max

where util and max are the last two arguments of cpufreq_update_util(). In turn, if util is not frequency-invariant, the maximum frequency in the above formula is replaced with the current frequency of the CPU:

	next_freq = 1.25 * curr_freq * util / max

The coefficient 1.25 corresponds to the frequency tipping point at (util / max) = 0.8.

All of the computations are carried out in the utilization update handlers provided by the new governor. One of those handlers is used for cpufreq policies shared between multiple CPUs and the other one is for policies with one CPU only (and therefore it doesn't need to use any extra synchronization means).

The governor supports fast frequency switching if that is supported by the cpufreq driver in use and possible for the given policy. In the fast switching case, all operations of the governor take place in its utilization update handlers. If fast switching cannot be used, the frequency switch operations are carried out with the help of a work item which only calls __cpufreq_driver_target() (under a mutex) to trigger a frequency update (to a value already computed beforehand in one of the utilization update handlers).

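As an illustration of the frequency selection formula above, here is a minimal user-space sketch. It is not the governor's code: the function name next_freq_khz(), its parameters and the kHz units are assumptions made for the example, and the factor 1.25 is evaluated in integer arithmetic as base + base / 4.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * freq_invariant: whether util is frequency-invariant
 * max_freq_khz:   maximum frequency of the CPU
 * curr_freq_khz:  current frequency of the CPU
 * util, max:      utilization and its maximum value, as in the last
 *                 two arguments of cpufreq_update_util(); max must
 *                 be nonzero
 */
static uint64_t next_freq_khz(bool freq_invariant,
			      uint64_t max_freq_khz,
			      uint64_t curr_freq_khz,
			      uint64_t util, uint64_t max)
{
	/* Frequency-invariant util scales max_freq, otherwise curr_freq. */
	uint64_t base = freq_invariant ? max_freq_khz : curr_freq_khz;

	/* 1.25 * base * util / max, without floating point. */
	return (base + base / 4) * util / max;
}
```

With util / max = 0.8 the helper returns the base frequency unchanged, which is the tipping point mentioned above: below it the result is lower than the base frequency, above it the result is higher.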
Currently, the governor treats all of the RT and DL tasks as "unknown utilization" and sets the frequency to the allowed maximum when updated from the RT or DL sched classes. That heavy-handed approach should be replaced with something more subtle and specifically targeted at RT and DL tasks.

The governor shares some tunables management code with the "ondemand" and "conservative" governors and uses some common definitions from cpufreq_governor.h, but apart from that it is stand-alone.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
author: Rafael J. Wysocki <rafael.j.wysocki@intel.com> 2016-04-02 01:09:12 +0200
committer: Rafael J. Wysocki <rafael.j.wysocki@intel.com> 2016-04-02 01:09:12 +0200
commit: 9bdcb44e391da5c41b98573bf0305a0e0b1c9569 (patch)
tree: d9785da0dfc47ca196fd8401e072a07623827793 /kernel/sched/sched.h
parent: b7898fda5bc7e786e76ce24fbd2ec993b08ec518 (diff)
1 files changed, 8 insertions, 0 deletions
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec2e8d23527e..921d6e5d33b7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1842,6 +1842,14 @@ static inline void cpufreq_update_util(u64 time, unsigned long util, unsigned lo
 static inline void cpufreq_trigger_update(u64 time) {}
 #endif /* CONFIG_CPU_FREQ */
 
+#ifdef arch_scale_freq_capacity
+#ifndef arch_scale_freq_invariant
+#define arch_scale_freq_invariant()	(true)
+#endif
+#else /* arch_scale_freq_capacity */
+#define arch_scale_freq_invariant()	(false)
+#endif
+
 static inline void account_reset_rq(struct rq *rq)
 {
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING