linux-toradex.git/drivers/cpufreq, branch v3.10.61

cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy

2014-11-14T16:47:58+00:00

commit 36b4bed5cd8f6e17019fa7d380e0836872c7b367 upstream.

Code which changes policy to powersave changes also max_policy_pct based on
max_freq. Code which change max_perf_pct has upper limit base on value
max_policy_pct. When policy is changing from powersave back to performance
then max_policy_pct is not changed. Which means that changing max_perf_pct is
not possible to high values if max_freq was too low in powersave policy.

Test case:

$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
800000
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3300000
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100

$ echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
$ echo 20 > /sys/devices/system/cpu/intel_pstate/max_perf_pct

$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
800000
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
20

$ echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ echo 3300000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
$ echo 100 > /sys/devices/system/cpu/intel_pstate/max_perf_pct

$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3300000
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
24

And now intel_pstate driver allows to set maximal value for max_perf_pct based
on max_policy_pct which is 24 for previous powersave max_freq 800000.

This patch will set default value for max_policy_pct when setting policy to
performance so it will allow to set also max value for max_perf_pct.

Signed-off-by: Pali Rohár 
Acked-by: Dirk Brandewie 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman

cpufreq: ondemand: Change the calculation of target frequency

2014-10-09T19:18:43+00:00

commit dfa5bb622555d9da0df21b50f46ebdeef390041b upstream.

The ondemand governor calculates load in terms of frequency and
increases it only if load_freq is greater than up_threshold
multiplied by the current or average frequency.  This appears to
produce oscillations of frequency between min and max because,
for example, a relatively small load can easily saturate minimum
frequency and lead the CPU to the max.  Then, it will decrease
back to the min due to small load_freq.

Change the calculation method of load and target frequency on the
basis of the following two observations:

 - Load computation should not depend on the current or average
   measured frequency.  For example, absolute load of 80% at 100MHz
   is not necessarily equivalent to 8% at 1000MHz in the next
   sampling interval.

 - It should be possible to increase the target frequency to any
   value present in the frequency table proportional to the absolute
   load, rather than to the max only, so that:

   Target frequency = C * load

   where we take C = policy->cpuinfo.max_freq / 100.

Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
that middle frequencies are used more, with this patch.  Highest
and lowest frequencies were used less by ~9%.

[rjw: We have run multiple other tests on kernels with this
 change applied and in the vast majority of cases it turns out
 that the resulting performance improvement also leads to reduced
 consumption of energy.  The change is additionally justified by
 the overall simplification of the code in question.]

Signed-off-by: Stratos Karafotis 
Acked-by: Viresh Kumar 
Signed-off-by: Rafael J. Wysocki 
Cc: Mark Brown 
Signed-off-by: Greg Kroah-Hartman

cpufreq: Fix wrong time unit conversion

2014-10-09T19:18:43+00:00

commit a857c0b9e24e39fe5be82451b65377795f9538d8 upstream.

The time spent by a CPU under a given frequency is stored in jiffies unit
in the cpu var cpufreq_stats_table->time_in_state[i], i being the index of
the frequency.

This is what is displayed in the following file on the right column:

     cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
     2301000 19835820
     2300000 3172
     [...]

Now cpufreq converts this jiffies unit delta to clock_t before returning it
to the user as in the above file. And that conversion is achieved using the API
cputime64_to_clock_t().

Although it accidentally works on traditional tick based cputime accounting, where
cputime_t maps directly to jiffies, it doesn't work with other types of cputime
accounting such as CONFIG_VIRT_CPU_ACCOUNTING_* where cputime_t can map to nsecs
or any granularity preffered by the architecture.

For example we get a buggy zero delta on full dyntick configurations:

     cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
     2301000 0
     2300000 0
     [...]

Fix this with using the proper jiffies_64_t to clock_t conversion.

Reported-and-tested-by: Carsten Emde 
Signed-off-by: Andreas Schwab 
Signed-off-by: Frederic Weisbecker 
Acked-by: Paul E. McKenney 
Signed-off-by: Rafael J. Wysocki 
Cc: Mark Brown 
Signed-off-by: Greg Kroah-Hartman

cpufreq: Makefile: fix compilation for davinci platform

2014-07-17T22:58:01+00:00

commit 5a90af67c2126fe1d04ebccc1f8177e6ca70d3a9 upstream.

Since commtit 8a7b1227e303 (cpufreq: davinci: move cpufreq driver to
drivers/cpufreq) this added dependancy only for CONFIG_ARCH_DAVINCI_DA850
where as davinci_cpufreq_init() call is used by all davinci platform.

This patch fixes following build error:

arch/arm/mach-davinci/built-in.o: In function `davinci_init_late':
:(.init.text+0x928): undefined reference to `davinci_cpufreq_init'
make: *** [vmlinux] Error 1

Fixes: 8a7b1227e303 (cpufreq: davinci: move cpufreq driver to drivers/cpufreq)
Signed-off-by: Lad, Prabhakar 
Acked-by: Viresh Kumar 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman

cpufreq: Fix timer/workqueue corruption due to double queueing

2014-04-14T13:42:19+00:00

commit 3617f2ca6d0eba48114308532945a7f1577816a4 upstream.

When a CPU is hot removed we'll cancel all the delayed work items
via gov_cancel_work(). Normally this will just cancels a delayed
timer on each CPU that the policy is managing and the work won't
run, but if the work is already running the workqueue code will
wait for the work to finish before continuing to prevent the
work items from re-queuing themselves like they normally do. This
scheme will work most of the time, except for the case where the
work function determines that it should adjust the delay for all
other CPUs that the policy is managing. If this scenario occurs,
the canceling CPU will cancel its own work but queue up the other
CPUs works to run. For example:

 CPU0                                        CPU1
 ----                                        ----
 cpu_down()
  ...
  __cpufreq_remove_dev()
   cpufreq_governor_dbs()
    case CPUFREQ_GOV_STOP:
     gov_cancel_work(dbs_data, policy);
      cpu0 work is canceled
       timer is canceled
       cpu1 work is canceled                    
                                od_dbs_timer()
                                                 gov_queue_work(*, *, true);
 						  cpu0 work queued
 						  cpu1 work queued
						  cpu2 work queued
						  ...
       cpu1 work is canceled
       cpu2 work is canceled
       ...

At the end of the GOV_STOP case cpu0 still has a work queued to
run although the code is expecting all of the works to be
canceled. __cpufreq_remove_dev() will then proceed to
re-initialize all the other CPUs works except for the CPU that is
going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs()
will trample over the queued work and debugobjects will spit out
a warning:

WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
Modules linked in:
CPU: 0 PID: 1491 Comm: sh Tainted: G        W    3.10.0 #19
[] (unwind_backtrace+0x0/0x11c) from [] (show_stack+0x10/0x14)
[] (show_stack+0x10/0x14) from [] (warn_slowpath_common+0x4c/0x6c)
[] (warn_slowpath_common+0x4c/0x6c) from [] (warn_slowpath_fmt+0x2c/0x3c)
[] (warn_slowpath_fmt+0x2c/0x3c) from [] (debug_print_object+0x94/0xbc)
[] (debug_print_object+0x94/0xbc) from [] (__debug_object_init+0x2d0/0x340)
[] (__debug_object_init+0x2d0/0x340) from [] (init_timer_key+0x14/0xb0)
[] (init_timer_key+0x14/0xb0) from [] (cpufreq_governor_dbs+0x3e8/0x5f8)
[] (cpufreq_governor_dbs+0x3e8/0x5f8) from [] (__cpufreq_governor+0xdc/0x1a4)
[] (__cpufreq_governor+0xdc/0x1a4) from [] (__cpufreq_remove_dev.isra.10+0x3b4/0x434)
[] (__cpufreq_remove_dev.isra.10+0x3b4/0x434) from [] (cpufreq_cpu_callback+0x60/0x80)
[] (cpufreq_cpu_callback+0x60/0x80) from [] (notifier_call_chain+0x38/0x68)
[] (notifier_call_chain+0x38/0x68) from [] (__cpu_notify+0x28/0x40)
[] (__cpu_notify+0x28/0x40) from [] (_cpu_down+0x7c/0x2c0)
[] (_cpu_down+0x7c/0x2c0) from [] (cpu_down+0x24/0x40)
[] (cpu_down+0x24/0x40) from [] (store_online+0x2c/0x74)
[] (store_online+0x2c/0x74) from [] (dev_attr_store+0x18/0x24)
[] (dev_attr_store+0x18/0x24) from [] (sysfs_write_file+0x100/0x148)
[] (sysfs_write_file+0x100/0x148) from [] (vfs_write+0xcc/0x174)
[] (vfs_write+0xcc/0x174) from [] (SyS_write+0x38/0x64)
[] (SyS_write+0x38/0x64) from [] (ret_fast_syscall+0x0/0x30)

Signed-off-by: Stephen Boyd 
Acked-by: Viresh Kumar 
Signed-off-by: Rafael J. Wysocki 
Cc: Krzysztof Kozlowski 
Signed-off-by: Greg Kroah-Hartman

cpufreq: Fix governor start/stop race condition

2014-04-14T13:42:19+00:00

commit 95731ebb114c5f0c028459388560fc2a72fe5049 upstream.

Cpufreq governors' stop and start operations should be carried out
in sequence.  Otherwise, there will be unexpected behavior, like in
the example below.

Suppose there are 4 CPUs and policy->cpu=CPU0, CPU1/2/3 are linked
to CPU0.  The normal sequence is:

 1) Current governor is userspace.  An application tries to set the
    governor to ondemand.  It will call __cpufreq_set_policy() in
    which it will stop the userspace governor and then start the
    ondemand governor.

 2) Current governor is userspace.  The online of CPU3 runs on CPU0.
    It will call cpufreq_add_policy_cpu() in which it will first
    stop the userspace governor, and then start it again.

If the sequence of the above two cases interleaves, it becomes:

 1) Application stops userspace governor
 2)                                  Hotplug stops userspace governor

which is a problem, because the governor shouldn't be stopped twice
in a row.  What happens next is:

 3) Application starts ondemand governor
 4)                                  Hotplug starts a governor

In step 4, the hotplug is supposed to start the userspace governor,
but now the governor has been changed by the application to ondemand,
so the ondemand governor is started once again, which is incorrect.

The solution is to prevent policy governors from being stopped
multiple times in a row.  A governor should only be stopped once for
one policy.  After it has been stopped, no more governor stop
operations should be executed.

Also add a mutex to serialize governor operations.

[rjw: Changelog.  And you owe me a beverage of my choice.]
Signed-off-by: Xiaoguang Chen 
Acked-by: Viresh Kumar 
Signed-off-by: Rafael J. Wysocki 
Cc: Krzysztof Kozlowski 
Signed-off-by: Greg Kroah-Hartman

powernow-k6: reorder frequencies

2014-04-14T13:42:14+00:00

commit 22c73795b101597051924556dce019385a1e2fa0 upstream.

This patch reorders reported frequencies from the highest to the lowest,
just like in other frequency drivers.

Signed-off-by: Mikulas Patocka 
Acked-by: Viresh Kumar 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman

powernow-k6: correctly initialize default parameters

2014-04-14T13:42:14+00:00

commit d82b922a4acc1781d368aceac2f9da43b038cab2 upstream.

The powernow-k6 driver used to read the initial multiplier from the
powernow register. However, there is a problem with this:

* If there was a frequency transition before, the multiplier read from the
  register corresponds to the current multiplier.
* If there was no frequency transition since reset, the field in the
  register always reads as zero, regardless of the current multiplier that
  is set using switches on the mainboard and that the CPU is running at.

The zero value corresponds to multiplier 4.5, so as a consequence, the
powernow-k6 driver always assumes multiplier 4.5.

For example, if we have 550MHz CPU with bus frequency 100MHz and
multiplier 5.5, the powernow-k6 driver thinks that the multiplier is 4.5
and bus frequency is 122MHz. The powernow-k6 driver then sets the
multiplier to 4.5, underclocking the CPU to 450MHz, but reports the
current frequency as 550MHz.

There is no reliable way how to read the initial multiplier. I modified
the driver so that it contains a table of known frequencies (based on
parameters of existing CPUs and some common overclocking schemes) and sets
the multiplier according to the frequency. If the frequency is unknown
(because of unusual overclocking or underclocking), the user must supply
the bus speed and maximum multiplier as module parameters.

This patch should be backported to all stable kernels. If it doesn't
apply cleanly, change it, or ask me to change it.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman

powernow-k6: disable cache when changing frequency

2014-04-14T13:42:14+00:00

commit e20e1d0ac02308e2211306fc67abcd0b2668fb8b upstream.

I found out that a system with k6-3+ processor is unstable during network
server load. The system locks up or the network card stops receiving. The
reason for the instability is the CPU frequency scaling.

During frequency transition the processor is in "EPM Stop Grant" state.
The documentation says that the processor doesn't respond to inquiry
requests in this state. Consequently, coherency of processor caches and
bus master devices is not maintained, causing the system instability.

This patch flushes the cache during frequency transition. It fixes the
instability.

Other minor changes:
* u64 invalue changed to unsigned long because the variable is 32-bit
* move the logic to set the multiplier to a separate function
  powernow_k6_set_cpu_multiplier
* preserve lower 5 bits of the powernow port instead of 4 (the voltage
  field has 5 bits)
* mask interrupts when reading the multiplier, so that the port is not
  open during other activity (running other kernel code with the port open
  shouldn't cause any misbehavior, but we should better be safe and keep
  the port closed)

This patch should be backported to all stable kernels. If it doesn't
apply cleanly, change it, or ask me to change it.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman

cpufreq: powernow-k8: Initialize per-cpu data-structures properly

2014-03-07T05:30:09+00:00

commit c3274763bfc3bf1ececa269ed6e6c4d7ec1c3e5e upstream.

The powernow-k8 driver maintains a per-cpu data-structure called
powernow_data that is used to perform the frequency transitions.
It initializes this data structure only for the policy->cpu. So,
accesses to this data structure by other CPUs results in various
problems because they would have been uninitialized.

Specifically, if a cpu (!= policy->cpu) invokes the drivers' ->get()
function, it returns 0 as the KHz value, since its per-cpu memory
doesn't point to anything valid. This causes problems during
suspend/resume since cpufreq_update_policy() tries to enforce this
(0 KHz) as the current frequency of the CPU, and this madness gets
propagated to adjust_jiffies() as well. Eventually, lots of things
start breaking down, including the r8169 ethernet card, in one
particularly interesting case reported by Pierre Ossman.

Fix this by initializing the per-cpu data-structures of all the CPUs
in the policy appropriately.

References: https://bugzilla.kernel.org/show_bug.cgi?id=70311
Reported-by: Pierre Ossman 
Signed-off-by: Srivatsa S. Bhat 
Acked-by: Viresh Kumar 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman