diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/testing/sysfs-power | 45 | ||||
-rw-r--r-- | Documentation/admin-guide/kernel-parameters.txt | 22 | ||||
-rw-r--r-- | Documentation/cpu-freq/cpufreq-stats.txt | 6 | ||||
-rw-r--r-- | Documentation/cpu-freq/intel-pstate.txt | 54 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/cpufreq/brcm,stb-avs-cpu-freq.txt | 78 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/opp/opp.txt | 27 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/power/domain-idle-state.txt | 33 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/power/power_domain.txt | 43 | ||||
-rw-r--r-- | Documentation/power/devices.txt | 14 | ||||
-rw-r--r-- | Documentation/power/states.txt | 62 |
10 files changed, 321 insertions, 63 deletions
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power index 50b368d490b5..f523e5a3ac33 100644 --- a/Documentation/ABI/testing/sysfs-power +++ b/Documentation/ABI/testing/sysfs-power @@ -7,30 +7,35 @@ Description: subsystem. What: /sys/power/state -Date: May 2014 +Date: November 2016 Contact: Rafael J. Wysocki <rjw@rjwysocki.net> Description: The /sys/power/state file controls system sleep states. Reading from this file returns the available sleep state - labels, which may be "mem", "standby", "freeze" and "disk" - (hibernation). The meanings of the first three labels depend on - the relative_sleep_states command line argument as follows: - 1) relative_sleep_states = 1 - "mem", "standby", "freeze" represent non-hibernation sleep - states from the deepest ("mem", always present) to the - shallowest ("freeze"). "standby" and "freeze" may or may - not be present depending on the capabilities of the - platform. "freeze" can only be present if "standby" is - present. - 2) relative_sleep_states = 0 (default) - "mem" - "suspend-to-RAM", present if supported. - "standby" - "power-on suspend", present if supported. - "freeze" - "suspend-to-idle", always present. - - Writing to this file one of these strings causes the system to - transition into the corresponding state, if available. See - Documentation/power/states.txt for a description of what - "suspend-to-RAM", "power-on suspend" and "suspend-to-idle" mean. + labels, which may be "mem" (suspend), "standby" (power-on + suspend), "freeze" (suspend-to-idle) and "disk" (hibernation). + + Writing one of the above strings to this file causes the system + to transition into the corresponding state, if available. + + See Documentation/power/states.txt for more information. + +What: /sys/power/mem_sleep +Date: November 2016 +Contact: Rafael J. Wysocki <rjw@rjwysocki.net> +Description: + The /sys/power/mem_sleep file controls the operating mode of + system suspend. Reading from it returns the available modes + as "s2idle" (always present), "shallow" and "deep" (present if + supported). The mode that will be used on subsequent attempts + to suspend the system (by writing "mem" to the /sys/power/state + file described above) is enclosed in square brackets. + + Writing one of the above strings to this file causes the mode + represented by it to be used on subsequent attempts to suspend + the system. + + See Documentation/power/states.txt for more information. What: /sys/power/disk Date: September 2006 diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 62d68b2056de..be2d6d0a03a4 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1560,6 +1560,12 @@ disable Do not enable intel_pstate as the default scaling driver for the supported processors + passive + Use intel_pstate as a scaling driver, but configure it + to work with generic cpufreq governors (instead of + enabling its internal governor). This mode cannot be + used along with the hardware-managed P-states (HWP) + feature. force Enable intel_pstate on systems that prohibit it by default in favor of acpi-cpufreq. Forcing the intel_pstate driver @@ -1580,6 +1586,9 @@ Description Table, specifies preferred power management profile as "Enterprise Server" or "Performance Server", then this feature is turned on by default. + per_cpu_perf_limits + Allow per-logical-CPU P-State performance control limits using + cpufreq sysfs interface intremap= [X86-64, Intel-IOMMU] on enable Interrupt Remapping (default) @@ -2122,6 +2131,12 @@ memory contents and reserves bad memory regions that are detected. + mem_sleep_default= [SUSPEND] Default system suspend mode: + s2idle - Suspend-To-Idle + shallow - Power-On Suspend or equivalent (if supported) + deep - Suspend-To-RAM or equivalent (if supported) + See Documentation/power/states.txt. + meye.*= [HW] Set MotionEye Camera parameters See Documentation/video4linux/meye.txt. @@ -3475,13 +3490,6 @@ [KNL, SMP] Set scheduler's default relax_domain_level. See Documentation/cgroup-v1/cpusets.txt. - relative_sleep_states= - [SUSPEND] Use sleep state labeling where the deepest - state available other than hibernation is always "mem". - Format: { "0" | "1" } - 0 -- Traditional sleep state labels. - 1 -- Relative sleep state labels. - reserve= [KNL,BUGS] Force the kernel to ignore some iomem area reservetop= [X86-32] diff --git a/Documentation/cpu-freq/cpufreq-stats.txt b/Documentation/cpu-freq/cpufreq-stats.txt index 8d9773f23550..3c355f6ad834 100644 --- a/Documentation/cpu-freq/cpufreq-stats.txt +++ b/Documentation/cpu-freq/cpufreq-stats.txt @@ -44,11 +44,17 @@ the stats driver insertion. total 0 drwxr-xr-x 2 root root 0 May 14 16:06 . drwxr-xr-x 3 root root 0 May 14 15:58 .. +--w------- 1 root root 4096 May 14 16:06 reset -r--r--r-- 1 root root 4096 May 14 16:06 time_in_state -r--r--r-- 1 root root 4096 May 14 16:06 total_trans -r--r--r-- 1 root root 4096 May 14 16:06 trans_table -------------------------------------------------------------------------------- +- reset +Write-only attribute that can be used to reset the stat counters. This can be +useful for evaluating system behaviour under different governors without the +need for a reboot. + - time_in_state This gives the amount of time spent in each of the frequencies supported by this CPU. The cat output will have "<frequency> <time>" pair in each line, which diff --git a/Documentation/cpu-freq/intel-pstate.txt b/Documentation/cpu-freq/intel-pstate.txt index e6bd1e6512a5..1953994ef5e6 100644 --- a/Documentation/cpu-freq/intel-pstate.txt +++ b/Documentation/cpu-freq/intel-pstate.txt @@ -48,7 +48,7 @@ In addition to the frequency-controlling interfaces provided by the cpufreq core, the driver provides its own sysfs files to control the P-State selection. These files have been added to /sys/devices/system/cpu/intel_pstate/. Any changes made to these files are applicable to all CPUs (even in a -multi-package system). +multi-package system, Refer to later section on placing "Per-CPU limits"). max_perf_pct: Limits the maximum P-State that will be requested by the driver. It states it as a percentage of the available performance. The @@ -120,13 +120,57 @@ frequency is fictional for Intel Core processors. Even if the scaling driver selects a single P-State, the actual frequency the processor will run at is selected by the processor itself. +Per-CPU limits + +The kernel command line option "intel_pstate=per_cpu_perf_limits" forces +the intel_pstate driver to use per-CPU performance limits. When it is set, +the sysfs control interface described above is subject to limitations. +- The following controls are not available for both read and write + /sys/devices/system/cpu/intel_pstate/max_perf_pct + /sys/devices/system/cpu/intel_pstate/min_perf_pct +- The following controls can be used to set performance limits, as far as the +architecture of the processor permits: + /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq + /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq + /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor +- User can still observe turbo percent and number of P-States from + /sys/devices/system/cpu/intel_pstate/turbo_pct + /sys/devices/system/cpu/intel_pstate/num_pstates +- User can read write system wide turbo status + /sys/devices/system/cpu/no_turbo + +Support of energy performance hints +It is possible to provide hints to the HWP algorithms in the processor +to be more performance centric to more energy centric. When the driver +is using HWP, two additional cpufreq sysfs attributes are presented for +each logical CPU. +These attributes are: + - energy_performance_available_preferences + - energy_performance_preference + +To get list of supported hints: +$ cat energy_performance_available_preferences + default performance balance_performance balance_power power + +The current preference can be read or changed via cpufreq sysfs +attribute "energy_performance_preference". Reading from this attribute +will display current effective setting. User can write any of the valid +preference string to this attribute. User can always restore to power-on +default by writing "default". + +Since threads can migrate to different CPUs, this is possible that the +new CPU may have different energy performance preference than the previous +one. To avoid such issues, either threads can be pinned to specific CPUs +or set the same energy performance preference value to all CPUs. + Tuning Intel P-State driver -When HWP mode is not used, debugfs files have also been added to allow the -tuning of the internal governor algorithm. These files are located at -/sys/kernel/debug/pstate_snb/. The algorithm uses a PID (Proportional -Integral Derivative) controller. The PID tunable parameters are: +When the performance can be tuned using PID (Proportional Integral +Derivative) controller, debugfs files are provided for adjusting performance. +They are presented under: +/sys/kernel/debug/pstate_snb/ +The PID tunable parameters are: deadband d_gain_pct i_gain_pct diff --git a/Documentation/devicetree/bindings/cpufreq/brcm,stb-avs-cpu-freq.txt b/Documentation/devicetree/bindings/cpufreq/brcm,stb-avs-cpu-freq.txt new file mode 100644 index 000000000000..af2385795d78 --- /dev/null +++ b/Documentation/devicetree/bindings/cpufreq/brcm,stb-avs-cpu-freq.txt @@ -0,0 +1,78 @@ +Broadcom AVS mail box and interrupt register bindings +===================================================== + +A total of three DT nodes are required. One node (brcm,avs-cpu-data-mem) +references the mailbox register used to communicate with the AVS CPU[1]. The +second node (brcm,avs-cpu-l2-intr) is required to trigger an interrupt on +the AVS CPU. The interrupt tells the AVS CPU that it needs to process a +command sent to it by a driver. Interrupting the AVS CPU is mandatory for +commands to be processed. + +The interface also requires a reference to the AVS host interrupt controller, +so a driver can react to interrupts generated by the AVS CPU whenever a command +has been processed. See [2] for more information on the brcm,l2-intc node. + +[1] The AVS CPU is an independent co-processor that runs proprietary +firmware. On some SoCs, this firmware supports DFS and DVFS in addition to +Adaptive Voltage Scaling. + +[2] Documentation/devicetree/bindings/interrupt-controller/brcm,l2-intc.txt + + +Node brcm,avs-cpu-data-mem +-------------------------- + +Required properties: +- compatible: must include: brcm,avs-cpu-data-mem and + should include: one of brcm,bcm7271-avs-cpu-data-mem or + brcm,bcm7268-avs-cpu-data-mem +- reg: Specifies base physical address and size of the registers. +- interrupts: The interrupt that the AVS CPU will use to interrupt the host + when a command completed. +- interrupt-parent: The interrupt controller the above interrupt is routed + through. +- interrupt-names: The name of the interrupt used to interrupt the host. + +Optional properties: +- None + +Node brcm,avs-cpu-l2-intr +------------------------- + +Required properties: +- compatible: must include: brcm,avs-cpu-l2-intr and + should include: one of brcm,bcm7271-avs-cpu-l2-intr or + brcm,bcm7268-avs-cpu-l2-intr +- reg: Specifies base physical address and size of the registers. + +Optional properties: +- None + + +Example +======= + + avs_host_l2_intc: interrupt-controller@f04d1200 { + #interrupt-cells = <1>; + compatible = "brcm,l2-intc"; + interrupt-parent = <&intc>; + reg = <0xf04d1200 0x48>; + interrupt-controller; + interrupts = <0x0 0x19 0x0>; + interrupt-names = "avs"; + }; + + avs-cpu-data-mem@f04c4000 { + compatible = "brcm,bcm7271-avs-cpu-data-mem", + "brcm,avs-cpu-data-mem"; + reg = <0xf04c4000 0x60>; + interrupts = <0x1a>; + interrupt-parent = <&avs_host_l2_intc>; + interrupt-names = "sw_intr"; + }; + + avs-cpu-l2-intr@f04d1100 { + compatible = "brcm,bcm7271-avs-cpu-l2-intr", + "brcm,avs-cpu-l2-intr"; + reg = <0xf04d1100 0x10>; + }; diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt index ee91cbdd95ee..9f5ca4457b5f 100644 --- a/Documentation/devicetree/bindings/opp/opp.txt +++ b/Documentation/devicetree/bindings/opp/opp.txt @@ -86,8 +86,14 @@ Optional properties: Single entry is for target voltage and three entries are for <target min max> voltages. - Entries for multiple regulators must be present in the same order as - regulators are specified in device's DT node. + Entries for multiple regulators shall be provided in the same field separated + by angular brackets <>. The OPP binding doesn't provide any provisions to + relate the values to their power supplies or the order in which the supplies + need to be configured and that is left for the implementation specific + binding. + + Entries for all regulators shall be of the same size, i.e. either all use a + single value or triplets. - opp-microvolt-<name>: Named opp-microvolt property. This is exactly similar to the above opp-microvolt property, but allows multiple voltage ranges to be @@ -104,10 +110,13 @@ Optional properties: Should only be set if opp-microvolt is set for the OPP. - Entries for multiple regulators must be present in the same order as - regulators are specified in device's DT node. If this property isn't required - for few regulators, then this should be marked as zero for them. If it isn't - required for any regulator, then this property need not be present. + Entries for multiple regulators shall be provided in the same field separated + by angular brackets <>. If current values aren't required for a regulator, + then it shall be filled with 0. If current values aren't required for any of + the regulators, then this field is not required. The OPP binding doesn't + provide any provisions to relate the values to their power supplies or the + order in which the supplies need to be configured and that is left for the + implementation specific binding. - opp-microamp-<name>: Named opp-microamp property. Similar to opp-microvolt-<name> property, but for microamp instead. @@ -386,10 +395,12 @@ Example 4: Handling multiple regulators / { cpus { cpu@0 { - compatible = "arm,cortex-a7"; + compatible = "vendor,cpu-type"; ... - cpu-supply = <&cpu_supply0>, <&cpu_supply1>, <&cpu_supply2>; + vcc0-supply = <&cpu_supply0>; + vcc1-supply = <&cpu_supply1>; + vcc2-supply = <&cpu_supply2>; operating-points-v2 = <&cpu0_opp_table>; }; }; diff --git a/Documentation/devicetree/bindings/power/domain-idle-state.txt b/Documentation/devicetree/bindings/power/domain-idle-state.txt new file mode 100644 index 000000000000..eefc7ed22ca2 --- /dev/null +++ b/Documentation/devicetree/bindings/power/domain-idle-state.txt @@ -0,0 +1,33 @@ +PM Domain Idle State Node: + +A domain idle state node represents the state parameters that will be used to +select the state when there are no active components in the domain. + +The state node has the following parameters - + +- compatible: + Usage: Required + Value type: <string> + Definition: Must be "domain-idle-state". + +- entry-latency-us + Usage: Required + Value type: <prop-encoded-array> + Definition: u32 value representing worst case latency in + microseconds required to enter the idle state. + The exit-latency-us duration may be guaranteed + only after entry-latency-us has passed. + +- exit-latency-us + Usage: Required + Value type: <prop-encoded-array> + Definition: u32 value representing worst case latency + in microseconds required to exit the idle state. + +- min-residency-us + Usage: Required + Value type: <prop-encoded-array> + Definition: u32 value representing minimum residency duration + in microseconds after which the idle state will yield + power benefits after overcoming the overhead in entering +i the idle state. diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt index 025b5e7df61c..723e1ad937da 100644 --- a/Documentation/devicetree/bindings/power/power_domain.txt +++ b/Documentation/devicetree/bindings/power/power_domain.txt @@ -29,6 +29,15 @@ Optional properties: specified by this binding. More details about power domain specifier are available in the next section. +- domain-idle-states : A phandle of an idle-state that shall be soaked into a + generic domain power state. The idle state definitions are + compatible with domain-idle-state specified in [1]. + The domain-idle-state property reflects the idle state of this PM domain and + not the idle states of the devices or sub-domains in the PM domain. Devices + and sub-domains have their own idle-states independent of the parent + domain's idle states. In the absence of this property, the domain would be + considered as capable of being powered-on or powered-off. + Example: power: power-controller@12340000 { @@ -59,6 +68,38 @@ The nodes above define two power controllers: 'parent' and 'child'. Domains created by the 'child' power controller are subdomains of '0' power domain provided by the 'parent' power controller. +Example 3: + parent: power-controller@12340000 { + compatible = "foo,power-controller"; + reg = <0x12340000 0x1000>; + #power-domain-cells = <0>; + domain-idle-states = <&DOMAIN_RET>, <&DOMAIN_PWR_DN>; + }; + + child: power-controller@12341000 { + compatible = "foo,power-controller"; + reg = <0x12341000 0x1000>; + power-domains = <&parent 0>; + #power-domain-cells = <0>; + domain-idle-states = <&DOMAIN_PWR_DN>; + }; + + DOMAIN_RET: state@0 { + compatible = "domain-idle-state"; + reg = <0x0>; + entry-latency-us = <1000>; + exit-latency-us = <2000>; + min-residency-us = <10000>; + }; + + DOMAIN_PWR_DN: state@1 { + compatible = "domain-idle-state"; + reg = <0x1>; + entry-latency-us = <5000>; + exit-latency-us = <8000>; + min-residency-us = <7000>; + }; + ==PM domain consumers== Required properties: @@ -76,3 +117,5 @@ Example: The node above defines a typical PM domain consumer device, which is located inside a PM domain with index 0 of a power controller represented by a node with the label "power". + +[1]. Documentation/devicetree/bindings/power/domain-idle-state.txt diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt index 8ba6625fdd63..73ddea39a9ce 100644 --- a/Documentation/power/devices.txt +++ b/Documentation/power/devices.txt @@ -607,7 +607,9 @@ individually. Instead, a set of devices sharing a power resource can be put into a low-power state together at the same time by turning off the shared power resource. Of course, they also need to be put into the full-power state together, by turning the shared power resource on. A set of devices with this -property is often referred to as a power domain. +property is often referred to as a power domain. A power domain may also be +nested inside another power domain. The nested domain is referred to as the +sub-domain of the parent domain. Support for power domains is provided through the pm_domain field of struct device. This field is a pointer to an object of type struct dev_pm_domain, @@ -629,6 +631,16 @@ support for power domains into subsystem-level callbacks, for example by modifying the platform bus type. Other platforms need not implement it or take it into account in any way. +Devices may be defined as IRQ-safe which indicates to the PM core that their +runtime PM callbacks may be invoked with disabled interrupts (see +Documentation/power/runtime_pm.txt for more information). If an IRQ-safe +device belongs to a PM domain, the runtime PM of the domain will be +disallowed, unless the domain itself is defined as IRQ-safe. However, it +makes sense to define a PM domain as IRQ-safe only if all the devices in it +are IRQ-safe. Moreover, if an IRQ-safe domain has a parent domain, the runtime +PM of the parent is only allowed if the parent itself is IRQ-safe too with the +additional restriction that all child domains of an IRQ-safe parent must also +be IRQ-safe. Device Low Power (suspend) States --------------------------------- diff --git a/Documentation/power/states.txt b/Documentation/power/states.txt index 50f3ef9177c1..8a39ce45d8a0 100644 --- a/Documentation/power/states.txt +++ b/Documentation/power/states.txt @@ -8,25 +8,43 @@ for each state. The states are represented by strings that can be read or written to the /sys/power/state file. Those strings may be "mem", "standby", "freeze" and -"disk", where the last one always represents hibernation (Suspend-To-Disk) and -the meaning of the remaining ones depends on the relative_sleep_states command -line argument. - -For relative_sleep_states=1, the strings "mem", "standby" and "freeze" label the -available non-hibernation sleep states from the deepest to the shallowest, -respectively. In that case, "mem" is always present in /sys/power/state, -because there is at least one non-hibernation sleep state in every system. If -the given system supports two non-hibernation sleep states, "standby" is present -in /sys/power/state in addition to "mem". If the system supports three -non-hibernation sleep states, "freeze" will be present in /sys/power/state in -addition to "mem" and "standby". - -For relative_sleep_states=0, which is the default, the following descriptions -apply. - -state: Suspend-To-Idle +"disk", where the last three always represent Power-On Suspend (if supported), +Suspend-To-Idle and hibernation (Suspend-To-Disk), respectively. + +The meaning of the "mem" string is controlled by the /sys/power/mem_sleep file. +It contains strings representing the available modes of system suspend that may +be triggered by writing "mem" to /sys/power/state. These modes are "s2idle" +(Suspend-To-Idle), "shallow" (Power-On Suspend) and "deep" (Suspend-To-RAM). +The "s2idle" mode is always available, while the other ones are only available +if supported by the platform (if not supported, the strings representing them +are not present in /sys/power/mem_sleep). The string representing the suspend +mode to be used subsequently is enclosed in square brackets. Writing one of +the other strings present in /sys/power/mem_sleep to it causes the suspend mode +to be used subsequently to change to the one represented by that string. + +Consequently, there are two ways to cause the system to go into the +Suspend-To-Idle sleep state. The first one is to write "freeze" directly to +/sys/power/state. The second one is to write "s2idle" to /sys/power/mem_sleep +and then to wrtie "mem" to /sys/power/state. Similarly, there are two ways +to cause the system to go into the Power-On Suspend sleep state (the strings to +write to the control files in that case are "standby" or "shallow" and "mem", +respectively) if that state is supported by the platform. In turn, there is +only one way to cause the system to go into the Suspend-To-RAM state (write +"deep" into /sys/power/mem_sleep and "mem" into /sys/power/state). + +The default suspend mode (ie. the one to be used without writing anything into +/sys/power/mem_sleep) is either "deep" (if Suspend-To-RAM is supported) or +"s2idle", but it can be overridden by the value of the "mem_sleep_default" +parameter in the kernel command line. On some ACPI-based systems, depending on +the information in the FADT, the default may be "s2idle" even if Suspend-To-RAM +is supported. + +The properties of all of the sleep states are described below. + + +State: Suspend-To-Idle ACPI state: S0 -Label: "freeze" +Label: "s2idle" ("freeze") This state is a generic, pure software, light-weight, system sleep state. It allows more energy to be saved relative to runtime idle by freezing user @@ -35,13 +53,13 @@ lower-power than available at run time), such that the processors can spend more time in their idle states. This state can be used for platforms without Power-On Suspend/Suspend-to-RAM -support, or it can be used in addition to Suspend-to-RAM (memory sleep) -to provide reduced resume latency. It is always supported. +support, or it can be used in addition to Suspend-to-RAM to provide reduced +resume latency. It is always supported. State: Standby / Power-On Suspend ACPI State: S1 -Label: "standby" +Label: "shallow" ("standby") This state, if supported, offers moderate, though real, power savings, while providing a relatively low-latency transition back to a working system. No @@ -58,7 +76,7 @@ state. State: Suspend-to-RAM ACPI State: S3 -Label: "mem" +Label: "deep" This state, if supported, offers significant power savings as everything in the system is put into a low-power state, except for memory, which should be placed |