summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/README13
-rw-r--r--Documentation/ABI/testing/sysfs-class-mtd2
-rw-r--r--Documentation/ABI/testing/sysfs-class-net-batman-adv4
-rw-r--r--Documentation/ABI/testing/sysfs-class-net-mesh34
-rw-r--r--Documentation/ABI/testing/sysfs-class-powercap152
-rw-r--r--Documentation/ABI/testing/sysfs-driver-hid-roccat-ryos178
-rw-r--r--Documentation/ABI/testing/sysfs-driver-hid-wiimote18
-rw-r--r--Documentation/DMA-API-HOWTO.txt37
-rw-r--r--Documentation/DMA-API.txt8
-rw-r--r--Documentation/DMA-attributes.txt6
-rw-r--r--Documentation/DocBook/80211.tmpl4
-rw-r--r--Documentation/DocBook/kernel-locking.tmpl2
-rw-r--r--Documentation/DocBook/mtdnand.tmpl2
-rw-r--r--Documentation/PCI/pci.txt8
-rw-r--r--Documentation/backlight/lp855x-driver.txt5
-rw-r--r--Documentation/blockdev/floppy.txt6
-rw-r--r--Documentation/cgroups/memory.txt10
-rw-r--r--Documentation/cpu-freq/cpu-drivers.txt27
-rw-r--r--Documentation/cpu-freq/governors.txt4
-rw-r--r--Documentation/cpu-hotplug.txt2
-rw-r--r--Documentation/cpuidle/governor.txt1
-rw-r--r--Documentation/device-mapper/cache-policies.txt6
-rw-r--r--Documentation/device-mapper/cache.txt57
-rw-r--r--Documentation/device-mapper/dm-crypt.txt11
-rw-r--r--Documentation/devices.txt1
-rw-r--r--Documentation/devicetree/bindings/hwmon/lm90.txt44
-rw-r--r--Documentation/devicetree/bindings/input/touchscreen/ti-tsc-adc.txt2
-rw-r--r--Documentation/devicetree/bindings/mfd/as3722.txt194
-rw-r--r--Documentation/devicetree/bindings/mfd/s2mps11.txt13
-rw-r--r--Documentation/devicetree/bindings/mtd/gpmc-nand.txt16
-rw-r--r--Documentation/devicetree/bindings/net/cpsw-phy-sel.txt28
-rw-r--r--Documentation/devicetree/bindings/pci/designware-pcie.txt7
-rw-r--r--Documentation/devicetree/bindings/pwm/pwm-samsung.txt2
-rw-r--r--Documentation/devicetree/bindings/video/atmel,lcdc.txt75
-rw-r--r--Documentation/devicetree/bindings/video/backlight/lp855x.txt29
-rw-r--r--Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt7
-rw-r--r--Documentation/filesystems/directory-locking31
-rw-r--r--Documentation/filesystems/f2fs.txt7
-rw-r--r--Documentation/filesystems/porting8
-rw-r--r--Documentation/filesystems/proc.txt1
-rw-r--r--Documentation/filesystems/vfat.txt2
-rw-r--r--Documentation/gcov.txt4
-rw-r--r--Documentation/hwmon/lm906
-rw-r--r--Documentation/input/gamepad.txt3
-rw-r--r--Documentation/kbuild/kconfig.txt11
-rw-r--r--Documentation/kernel-parameters.txt6
-rw-r--r--Documentation/lockstat.txt123
-rw-r--r--Documentation/mutex-design.txt10
-rw-r--r--Documentation/networking/batman-adv.txt54
-rw-r--r--Documentation/networking/bonding.txt75
-rw-r--r--Documentation/networking/can.txt217
-rw-r--r--Documentation/networking/ip-sysctl.txt15
-rw-r--r--Documentation/networking/netdevices.txt10
-rw-r--r--Documentation/power/opp.txt108
-rw-r--r--Documentation/power/powercap/powercap.txt236
-rw-r--r--Documentation/power/runtime_pm.txt14
-rw-r--r--Documentation/ptp/testptp.c65
-rw-r--r--Documentation/pwm.txt4
-rw-r--r--Documentation/sysctl/kernel.txt25
-rw-r--r--Documentation/sysctl/vm.txt15
-rw-r--r--Documentation/timers/00-INDEX4
-rw-r--r--Documentation/trace/tracepoints.txt5
-rw-r--r--Documentation/usb/gadget_configfs.txt6
-rw-r--r--Documentation/virtual/kvm/00-INDEX24
-rw-r--r--Documentation/virtual/kvm/api.txt152
-rw-r--r--Documentation/virtual/kvm/cpuid.txt7
-rw-r--r--Documentation/virtual/kvm/devices/vfio.txt22
-rw-r--r--Documentation/virtual/kvm/locking.txt19
-rw-r--r--Documentation/vm/00-INDEX20
-rw-r--r--Documentation/vm/split_page_table_lock94
-rw-r--r--Documentation/vm/zswap.txt8
71 files changed, 2044 insertions, 382 deletions
diff --git a/Documentation/ABI/README b/Documentation/ABI/README
index 10069828568b..1fafc4b0753b 100644
--- a/Documentation/ABI/README
+++ b/Documentation/ABI/README
@@ -72,3 +72,16 @@ kernel tree without going through the obsolete state first.
It's up to the developer to place their interfaces in the category they
wish for it to start out in.
+
+
+Notable bits of non-ABI, which should not under any circumstances be considered
+stable:
+
+- Kconfig. Userspace should not rely on the presence or absence of any
+ particular Kconfig symbol, in /proc/config.gz, in the copy of .config
+ commonly installed to /boot, or in any invocation of the kernel build
+ process.
+
+- Kernel-internal symbols. Do not rely on the presence, absence, location, or
+ type of any kernel symbol, either in System.map files or the kernel binary
+ itself. See Documentation/stable_api_nonsense.txt.
diff --git a/Documentation/ABI/testing/sysfs-class-mtd b/Documentation/ABI/testing/sysfs-class-mtd
index bfd119ace6ad..1399bb2da3eb 100644
--- a/Documentation/ABI/testing/sysfs-class-mtd
+++ b/Documentation/ABI/testing/sysfs-class-mtd
@@ -104,7 +104,7 @@ Description:
One of the following ASCII strings, representing the device
type:
- absent, ram, rom, nor, nand, dataflash, ubi, unknown
+ absent, ram, rom, nor, nand, mlc-nand, dataflash, ubi, unknown
What: /sys/class/mtd/mtdX/writesize
Date: April 2009
diff --git a/Documentation/ABI/testing/sysfs-class-net-batman-adv b/Documentation/ABI/testing/sysfs-class-net-batman-adv
index bdc00707c751..7f34a95bb963 100644
--- a/Documentation/ABI/testing/sysfs-class-net-batman-adv
+++ b/Documentation/ABI/testing/sysfs-class-net-batman-adv
@@ -1,13 +1,13 @@
What: /sys/class/net/<iface>/batman-adv/iface_status
Date: May 2010
-Contact: Marek Lindner <lindner_marek@yahoo.de>
+Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Indicates the status of <iface> as it is seen by batman.
What: /sys/class/net/<iface>/batman-adv/mesh_iface
Date: May 2010
-Contact: Marek Lindner <lindner_marek@yahoo.de>
+Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
The /sys/class/net/<iface>/batman-adv/mesh_iface file
displays the batman mesh interface this <iface>
diff --git a/Documentation/ABI/testing/sysfs-class-net-mesh b/Documentation/ABI/testing/sysfs-class-net-mesh
index bdcd8b4e38f2..0baa657b18c4 100644
--- a/Documentation/ABI/testing/sysfs-class-net-mesh
+++ b/Documentation/ABI/testing/sysfs-class-net-mesh
@@ -1,22 +1,23 @@
What: /sys/class/net/<mesh_iface>/mesh/aggregated_ogms
Date: May 2010
-Contact: Marek Lindner <lindner_marek@yahoo.de>
+Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Indicates whether the batman protocol messages of the
mesh <mesh_iface> shall be aggregated or not.
-What: /sys/class/net/<mesh_iface>/mesh/ap_isolation
+What: /sys/class/net/<mesh_iface>/mesh/<vlan_subdir>/ap_isolation
Date: May 2011
-Contact: Antonio Quartulli <ordex@autistici.org>
+Contact: Antonio Quartulli <antonio@meshcoding.com>
Description:
Indicates whether the data traffic going from a
wireless client to another wireless client will be
- silently dropped.
+ silently dropped. <vlan_subdir> is empty when referring
+ to the untagged lan.
What: /sys/class/net/<mesh_iface>/mesh/bonding
Date: June 2010
-Contact: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
+Contact: Simon Wunderlich <sw@simonwunderlich.de>
Description:
Indicates whether the data traffic going through the
mesh will be sent using multiple interfaces at the
@@ -24,7 +25,7 @@ Description:
What: /sys/class/net/<mesh_iface>/mesh/bridge_loop_avoidance
Date: November 2011
-Contact: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
+Contact: Simon Wunderlich <sw@simonwunderlich.de>
Description:
Indicates whether the bridge loop avoidance feature
is enabled. This feature detects and avoids loops
@@ -41,21 +42,21 @@ Description:
What: /sys/class/net/<mesh_iface>/mesh/gw_bandwidth
Date: October 2010
-Contact: Marek Lindner <lindner_marek@yahoo.de>
+Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the bandwidth which is propagated by this
node if gw_mode was set to 'server'.
What: /sys/class/net/<mesh_iface>/mesh/gw_mode
Date: October 2010
-Contact: Marek Lindner <lindner_marek@yahoo.de>
+Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the state of the gateway features. Can be
either 'off', 'client' or 'server'.
What: /sys/class/net/<mesh_iface>/mesh/gw_sel_class
Date: October 2010
-Contact: Marek Lindner <lindner_marek@yahoo.de>
+Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the selection criteria this node will use
to choose a gateway if gw_mode was set to 'client'.
@@ -77,25 +78,14 @@ Description:
What: /sys/class/net/<mesh_iface>/mesh/orig_interval
Date: May 2010
-Contact: Marek Lindner <lindner_marek@yahoo.de>
+Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the interval in milliseconds in which batman
sends its protocol messages.
What: /sys/class/net/<mesh_iface>/mesh/routing_algo
Date: Dec 2011
-Contact: Marek Lindner <lindner_marek@yahoo.de>
+Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the routing procotol this mesh instance
uses to find the optimal paths through the mesh.
-
-What: /sys/class/net/<mesh_iface>/mesh/vis_mode
-Date: May 2010
-Contact: Marek Lindner <lindner_marek@yahoo.de>
-Description:
- Each batman node only maintains information about its
- own local neighborhood, therefore generating graphs
- showing the topology of the entire mesh is not easily
- feasible without having a central instance to collect
- the local topologies from all nodes. This file allows
- to activate the collecting (server) mode.
diff --git a/Documentation/ABI/testing/sysfs-class-powercap b/Documentation/ABI/testing/sysfs-class-powercap
new file mode 100644
index 000000000000..db3b3ff70d84
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-powercap
@@ -0,0 +1,152 @@
+What: /sys/class/powercap/
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ The powercap/ class sub directory belongs to the power cap
+ subsystem. Refer to
+ Documentation/power/powercap/powercap.txt for details.
+
+What: /sys/class/powercap/<control type>
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ A <control type> is a unique name under /sys/class/powercap.
+ Here <control type> determines how the power is going to be
+ controlled. A <control type> can contain multiple power zones.
+
+What: /sys/class/powercap/<control type>/enabled
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ This allows to enable/disable power capping for a "control type".
+ This status affects every power zone using this "control_type.
+
+What: /sys/class/powercap/<control type>/<power zone>
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ A power zone is a single or a collection of devices, which can
+ be independently monitored and controlled. A power zone sysfs
+ entry is qualified with the name of the <control type>.
+ E.g. intel-rapl:0:1:1.
+
+What: /sys/class/powercap/<control type>/<power zone>/<child power zone>
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Power zones may be organized in a hierarchy in which child
+ power zones provide monitoring and control for a subset of
+ devices under the parent. For example, if there is a parent
+ power zone for a whole CPU package, each CPU core in it can
+ be a child power zone.
+
+What: /sys/class/powercap/.../<power zone>/name
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Specifies the name of this power zone.
+
+What: /sys/class/powercap/.../<power zone>/energy_uj
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Current energy counter in micro-joules. Write "0" to reset.
+ If the counter can not be reset, then this attribute is
+ read-only.
+
+What: /sys/class/powercap/.../<power zone>/max_energy_range_uj
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Range of the above energy counter in micro-joules.
+
+
+What: /sys/class/powercap/.../<power zone>/power_uw
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Current power in micro-watts.
+
+What: /sys/class/powercap/.../<power zone>/max_power_range_uw
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Range of the above power value in micro-watts.
+
+What: /sys/class/powercap/.../<power zone>/constraint_X_name
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Each power zone can define one or more constraints. Each
+ constraint can have an optional name. Here "X" can have values
+ from 0 to max integer.
+
+What: /sys/class/powercap/.../<power zone>/constraint_X_power_limit_uw
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Power limit in micro-watts should be applicable for
+ the time window specified by "constraint_X_time_window_us".
+ Here "X" can have values from 0 to max integer.
+
+What: /sys/class/powercap/.../<power zone>/constraint_X_time_window_us
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Time window in micro seconds. This is used along with
+ constraint_X_power_limit_uw to define a power constraint.
+ Here "X" can have values from 0 to max integer.
+
+
+What: /sys/class/powercap/<control type>/.../constraint_X_max_power_uw
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Maximum allowed power in micro watts for this constraint.
+ Here "X" can have values from 0 to max integer.
+
+What: /sys/class/powercap/<control type>/.../constraint_X_min_power_uw
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Minimum allowed power in micro watts for this constraint.
+ Here "X" can have values from 0 to max integer.
+
+What: /sys/class/powercap/.../<power zone>/constraint_X_max_time_window_us
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Maximum allowed time window in micro seconds for this
+ constraint. Here "X" can have values from 0 to max integer.
+
+What: /sys/class/powercap/.../<power zone>/constraint_X_min_time_window_us
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Minimum allowed time window in micro seconds for this
+ constraint. Here "X" can have values from 0 to max integer.
+
+What: /sys/class/powercap/.../<power zone>/enabled
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description
+ This allows to enable/disable power capping at power zone level.
+ This applies to current power zone and its children.
diff --git a/Documentation/ABI/testing/sysfs-driver-hid-roccat-ryos b/Documentation/ABI/testing/sysfs-driver-hid-roccat-ryos
new file mode 100644
index 000000000000..1d6a8cf9dc0a
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-hid-roccat-ryos
@@ -0,0 +1,178 @@
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/control
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one select which data from which
+ profile will be read next. The data has to be 3 bytes long.
+ This file is writeonly.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/profile
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: The mouse can store 5 profiles which can be switched by the
+ press of a button. profile holds index of actual profile.
+ This value is persistent, so its value determines the profile
+ that's active when the device is powered on next time.
+ When written, the device activates the set profile immediately.
+ The data has to be 3 bytes long.
+ The device will reject invalid data.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/keys_primary
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one set the default of all keys for
+ a specific profile. Profile index is included in written data.
+ The data has to be 125 bytes long.
+ Before reading this file, control has to be written to select
+ which profile to read.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/keys_function
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one set the function of the
+ function keys for a specific profile. Profile index is included
+ in written data. The data has to be 95 bytes long.
+ Before reading this file, control has to be written to select
+ which profile to read.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/keys_macro
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one set the function of the macro
+ keys for a specific profile. Profile index is included in
+ written data. The data has to be 35 bytes long.
+ Before reading this file, control has to be written to select
+ which profile to read.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/keys_thumbster
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one set the function of the
+ thumbster keys for a specific profile. Profile index is included
+ in written data. The data has to be 23 bytes long.
+ Before reading this file, control has to be written to select
+ which profile to read.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/keys_extra
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one set the function of the
+ capslock and function keys for a specific profile. Profile index
+ is included in written data. The data has to be 8 bytes long.
+ Before reading this file, control has to be written to select
+ which profile to read.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/keys_easyzone
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one set the function of the
+ easyzone keys for a specific profile. Profile index is included
+ in written data. The data has to be 294 bytes long.
+ Before reading this file, control has to be written to select
+ which profile to read.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/key_mask
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one deactivate certain keys like
+ windows and application keys, to prevent accidental presses.
+ Profile index for which this settings occur is included in
+ written data. The data has to be 6 bytes long.
+ Before reading this file, control has to be written to select
+ which profile to read.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/light
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one set the backlight intensity for
+ a specific profile. Profile index is included in written data.
+ This attribute is only valid for the glow and pro variant.
+ The data has to be 16 bytes long.
+ Before reading this file, control has to be written to select
+ which profile to read.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/macro
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one store macros with max 480
+ keystrokes for a specific button for a specific profile.
+ Button and profile indexes are included in written data.
+ The data has to be 2002 bytes long.
+ Before reading this file, control has to be written to select
+ which profile and key to read.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/info
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When read, this file returns general data like firmware version.
+ The data is 8 bytes long.
+ This file is readonly.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/reset
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one reset the device.
+ The data has to be 3 bytes long.
+ This file is writeonly.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/talk
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one trigger easyshift functionality
+ from the host.
+ The data has to be 16 bytes long.
+ This file is writeonly.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/light_control
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one switch between stored and custom
+ light settings.
+ This attribute is only valid for the pro variant.
+ The data has to be 8 bytes long.
+ This file is writeonly.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/stored_lights
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one set per-key lighting for different
+ layers.
+ This attribute is only valid for the pro variant.
+ The data has to be 1382 bytes long.
+ Before reading this file, control has to be written to select
+ which profile to read.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/custom_lights
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one set the actual per-key lighting.
+ This attribute is only valid for the pro variant.
+ The data has to be 20 bytes long.
+ This file is writeonly.
+Users: http://roccat.sourceforge.net
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/light_macro
+Date: October 2013
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When written, this file lets one set a light macro that is looped
+ whenever the device gets in dimness mode.
+ This attribute is only valid for the pro variant.
+ The data has to be 2002 bytes long.
+ Before reading this file, control has to be written to select
+ which profile to read.
+Users: http://roccat.sourceforge.net
diff --git a/Documentation/ABI/testing/sysfs-driver-hid-wiimote b/Documentation/ABI/testing/sysfs-driver-hid-wiimote
index ed5dd567d397..39dfa5cb1cc5 100644
--- a/Documentation/ABI/testing/sysfs-driver-hid-wiimote
+++ b/Documentation/ABI/testing/sysfs-driver-hid-wiimote
@@ -57,3 +57,21 @@ Description: This attribute is only provided if the device was detected as a
Calibration data is already applied by the kernel to all input
values but may be used by user-space to perform other
transformations.
+
+What: /sys/bus/hid/drivers/wiimote/<dev>/pro_calib
+Date: October 2013
+KernelVersion: 3.13
+Contact: David Herrmann <dh.herrmann@gmail.com>
+Description: This attribute is only provided if the device was detected as a
+ pro-controller. It provides a single line with 4 calibration
+ values for all 4 analog sticks. Format is: "x1:y1 x2:y2". Data
+ is prefixed with a +/-. Each value is a signed 16bit number.
+ Data is encoded as decimal numbers and specifies the offsets of
+ the analog sticks of the pro-controller.
+ Calibration data is already applied by the kernel to all input
+ values but may be used by user-space to perform other
+ transformations.
+ Calibration data is detected by the kernel during device setup.
+ You can write "scan\n" into this file to re-trigger calibration.
+ You can also write data directly in the form "x1:y1 x2:y2" to
+ set the calibration values manually.
diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index 14129f149a75..5e983031cc11 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -101,14 +101,23 @@ style to do this even if your device holds the default setting,
because this shows that you did think about these issues wrt. your
device.
-The query is performed via a call to dma_set_mask():
+The query is performed via a call to dma_set_mask_and_coherent():
- int dma_set_mask(struct device *dev, u64 mask);
+ int dma_set_mask_and_coherent(struct device *dev, u64 mask);
-The query for consistent allocations is performed via a call to
-dma_set_coherent_mask():
+which will query the mask for both streaming and coherent APIs together.
+If you have some special requirements, then the following two separate
+queries can be used instead:
- int dma_set_coherent_mask(struct device *dev, u64 mask);
+ The query for streaming mappings is performed via a call to
+ dma_set_mask():
+
+ int dma_set_mask(struct device *dev, u64 mask);
+
+ The query for consistent allocations is performed via a call
+ to dma_set_coherent_mask():
+
+ int dma_set_coherent_mask(struct device *dev, u64 mask);
Here, dev is a pointer to the device struct of your device, and mask
is a bit mask describing which bits of an address your device
@@ -137,7 +146,7 @@ exactly why.
The standard 32-bit addressing device would do something like this:
- if (dma_set_mask(dev, DMA_BIT_MASK(32))) {
+ if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
printk(KERN_WARNING
"mydev: No suitable DMA available.\n");
goto ignore_this_device;
@@ -171,22 +180,20 @@ the case would look like this:
int using_dac, consistent_using_dac;
- if (!dma_set_mask(dev, DMA_BIT_MASK(64))) {
+ if (!dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64))) {
using_dac = 1;
consistent_using_dac = 1;
- dma_set_coherent_mask(dev, DMA_BIT_MASK(64));
- } else if (!dma_set_mask(dev, DMA_BIT_MASK(32))) {
+ } else if (!dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
using_dac = 0;
consistent_using_dac = 0;
- dma_set_coherent_mask(dev, DMA_BIT_MASK(32));
} else {
printk(KERN_WARNING
"mydev: No suitable DMA available.\n");
goto ignore_this_device;
}
-dma_set_coherent_mask() will always be able to set the same or a
-smaller mask as dma_set_mask(). However for the rare case that a
+The coherent coherent mask will always be able to set the same or a
+smaller mask as the streaming mask. However for the rare case that a
device driver only uses consistent allocations, one would have to
check the return value from dma_set_coherent_mask().
@@ -199,9 +206,9 @@ address you might do something like:
goto ignore_this_device;
}
-When dma_set_mask() is successful, and returns zero, the kernel saves
-away this mask you have provided. The kernel will use this
-information later when you make DMA mappings.
+When dma_set_mask() or dma_set_mask_and_coherent() is successful, and
+returns zero, the kernel saves away this mask you have provided. The
+kernel will use this information later when you make DMA mappings.
There is a case which we are aware of at this time, which is worth
mentioning in this documentation. If your device supports multiple
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 78a6c569d204..e865279cec58 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -142,6 +142,14 @@ internal API for use by the platform than an external API for use by
driver writers.
int
+dma_set_mask_and_coherent(struct device *dev, u64 mask)
+
+Checks to see if the mask is possible and updates the device
+streaming and coherent DMA mask parameters if it is.
+
+Returns: 0 if successful and a negative error if not.
+
+int
dma_set_mask(struct device *dev, u64 mask)
Checks to see if the mask is possible and updates the device
diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index e59480db9ee0..cc2450d80310 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -13,7 +13,7 @@ all pending DMA writes to complete, and thus provides a mechanism to
strictly order DMA from a device across all intervening busses and
bridges. This barrier is not specific to a particular type of
interconnect, it applies to the system as a whole, and so its
-implementation must account for the idiosyncracies of the system all
+implementation must account for the idiosyncrasies of the system all
the way from the DMA device to memory.
As an example of a situation where DMA_ATTR_WRITE_BARRIER would be
@@ -60,7 +60,7 @@ such mapping is non-trivial task and consumes very limited resources
Buffers allocated with this attribute can be only passed to user space
by calling dma_mmap_attrs(). By using this API, you are guaranteeing
that you won't dereference the pointer returned by dma_alloc_attr(). You
-can threat it as a cookie that must be passed to dma_mmap_attrs() and
+can treat it as a cookie that must be passed to dma_mmap_attrs() and
dma_free_attrs(). Make sure that both of these also get this attribute
set on each call.
@@ -82,7 +82,7 @@ to 'device' domain, what synchronizes CPU caches for the given region
(usually it means that the cache has been flushed or invalidated
depending on the dma direction). However, next calls to
dma_map_{single,page,sg}() for other devices will perform exactly the
-same sychronization operation on the CPU cache. CPU cache sychronization
+same synchronization operation on the CPU cache. CPU cache synchronization
might be a time consuming operation, especially if the buffers are
large, so it is highly recommended to avoid it if possible.
DMA_ATTR_SKIP_CPU_SYNC allows platform code to skip synchronization of
diff --git a/Documentation/DocBook/80211.tmpl b/Documentation/DocBook/80211.tmpl
index f403ec3c5c9a..46ad6faee9ab 100644
--- a/Documentation/DocBook/80211.tmpl
+++ b/Documentation/DocBook/80211.tmpl
@@ -152,8 +152,8 @@
!Finclude/net/cfg80211.h cfg80211_scan_request
!Finclude/net/cfg80211.h cfg80211_scan_done
!Finclude/net/cfg80211.h cfg80211_bss
-!Finclude/net/cfg80211.h cfg80211_inform_bss_frame
-!Finclude/net/cfg80211.h cfg80211_inform_bss
+!Finclude/net/cfg80211.h cfg80211_inform_bss_width_frame
+!Finclude/net/cfg80211.h cfg80211_inform_bss_width
!Finclude/net/cfg80211.h cfg80211_unlink_bss
!Finclude/net/cfg80211.h cfg80211_find_ie
!Finclude/net/cfg80211.h ieee80211_bss_get_ie
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
index 09e884e5b9f5..19f2a5a5a5b4 100644
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ b/Documentation/DocBook/kernel-locking.tmpl
@@ -1958,7 +1958,7 @@ machines due to caching.
<chapter id="apiref-mutex">
<title>Mutex API reference</title>
!Iinclude/linux/mutex.h
-!Ekernel/mutex.c
+!Ekernel/locking/mutex.c
</chapter>
<chapter id="apiref-futex">
diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl
index a248f42a121e..cd11926e07c7 100644
--- a/Documentation/DocBook/mtdnand.tmpl
+++ b/Documentation/DocBook/mtdnand.tmpl
@@ -1222,8 +1222,6 @@ in this page</entry>
#define NAND_BBT_VERSION 0x00000100
/* Create a bbt if none axists */
#define NAND_BBT_CREATE 0x00000200
-/* Search good / bad pattern through all pages of a block */
-#define NAND_BBT_SCANALLPAGES 0x00000400
/* Write bbt if neccecary */
#define NAND_BBT_WRITE 0x00001000
/* Read and write back block contents when writing bbt */
diff --git a/Documentation/PCI/pci.txt b/Documentation/PCI/pci.txt
index bccf602a87f5..6f458564d625 100644
--- a/Documentation/PCI/pci.txt
+++ b/Documentation/PCI/pci.txt
@@ -525,8 +525,9 @@ corresponding register block for you.
6. Other interesting functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-pci_find_slot() Find pci_dev corresponding to given bus and
- slot numbers.
+pci_get_domain_bus_and_slot() Find pci_dev corresponding to given domain,
+ bus and slot and number. If the device is
+ found, its reference count is increased.
pci_set_power_state() Set PCI Power Management state (0=D0 ... 3=D3)
pci_find_capability() Find specified capability in device's capability
list.
@@ -582,7 +583,8 @@ having sane locking.
pci_find_device() Superseded by pci_get_device()
pci_find_subsys() Superseded by pci_get_subsys()
-pci_find_slot() Superseded by pci_get_slot()
+pci_find_slot() Superseded by pci_get_domain_bus_and_slot()
+pci_get_slot() Superseded by pci_get_domain_bus_and_slot()
The alternative is the traditional PCI device driver that walks PCI
diff --git a/Documentation/backlight/lp855x-driver.txt b/Documentation/backlight/lp855x-driver.txt
index 1c732f0c6758..01bce243d3d7 100644
--- a/Documentation/backlight/lp855x-driver.txt
+++ b/Documentation/backlight/lp855x-driver.txt
@@ -4,7 +4,8 @@ Kernel driver lp855x
Backlight driver for LP855x ICs
Supported chips:
- Texas Instruments LP8550, LP8551, LP8552, LP8553, LP8556 and LP8557
+ Texas Instruments LP8550, LP8551, LP8552, LP8553, LP8555, LP8556 and
+ LP8557
Author: Milo(Woogyom) Kim <milo.kim@ti.com>
@@ -24,7 +25,7 @@ Value : pwm based or register based
2) chip_id
The lp855x chip id.
-Value : lp8550/lp8551/lp8552/lp8553/lp8556/lp8557
+Value : lp8550/lp8551/lp8552/lp8553/lp8555/lp8556/lp8557
Platform data for lp855x
------------------------
diff --git a/Documentation/blockdev/floppy.txt b/Documentation/blockdev/floppy.txt
index 470fe4b5e379..e2240f5ab64d 100644
--- a/Documentation/blockdev/floppy.txt
+++ b/Documentation/blockdev/floppy.txt
@@ -39,15 +39,15 @@ Module configuration options
============================
If you use the floppy driver as a module, use the following syntax:
-modprobe floppy <options>
+modprobe floppy floppy="<options>"
Example:
- modprobe floppy omnibook messages
+ modprobe floppy floppy="omnibook messages"
If you need certain options enabled every time you load the floppy driver,
you can put:
- options floppy omnibook messages
+ options floppy floppy="omnibook messages"
in a configuration file in /etc/modprobe.d/.
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 8af4ad121828..e2bc132608fd 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -573,15 +573,19 @@ an memcg since the pages are allowed to be allocated from any physical
node. One of the use cases is evaluating application performance by
combining this information with the application's CPU allocation.
-We export "total", "file", "anon" and "unevictable" pages per-node for
-each memcg. The ouput format of memory.numa_stat is:
+Each memcg's numa_stat file includes "total", "file", "anon" and "unevictable"
+per-node page counts including "hierarchical_<counter>" which sums up all
+hierarchical children's values in addition to the memcg's own value.
+
+The ouput format of memory.numa_stat is:
total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ...
file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ...
anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
+hierarchical_<counter>=<counter pages> N0=<node 0 pages> N1=<node 1 pages> ...
-And we have total = file + anon + unevictable.
+The "total" count is sum of file + anon + unevictable.
6. Hierarchy support
diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt
index 40282e617913..8b1a4451422e 100644
--- a/Documentation/cpu-freq/cpu-drivers.txt
+++ b/Documentation/cpu-freq/cpu-drivers.txt
@@ -23,8 +23,8 @@ Contents:
1.1 Initialization
1.2 Per-CPU Initialization
1.3 verify
-1.4 target or setpolicy?
-1.5 target
+1.4 target/target_index or setpolicy?
+1.5 target/target_index
1.6 setpolicy
2. Frequency Table Helpers
@@ -56,7 +56,8 @@ cpufreq_driver.init - A pointer to the per-CPU initialization
cpufreq_driver.verify - A pointer to a "verification" function.
cpufreq_driver.setpolicy _or_
-cpufreq_driver.target - See below on the differences.
+cpufreq_driver.target/
+target_index - See below on the differences.
And optionally
@@ -66,7 +67,7 @@ cpufreq_driver.resume - A pointer to a per-CPU resume function
which is called with interrupts disabled
and _before_ the pre-suspend frequency
and/or policy is restored by a call to
- ->target or ->setpolicy.
+ ->target/target_index or ->setpolicy.
cpufreq_driver.attr - A pointer to a NULL-terminated list of
"struct freq_attr" which allow to
@@ -103,8 +104,8 @@ policy->governor must contain the "default policy" for
this CPU. A few moments later,
cpufreq_driver.verify and either
cpufreq_driver.setpolicy or
- cpufreq_driver.target is called with
- these values.
+ cpufreq_driver.target/target_index is called
+ with these values.
For setting some of these values (cpuinfo.min[max]_freq, policy->min[max]), the
frequency table helpers might be helpful. See the section 2 for more information
@@ -133,20 +134,28 @@ range) is within policy->min and policy->max. If necessary, increase
policy->max first, and only if this is no solution, decrease policy->min.
-1.4 target or setpolicy?
+1.4 target/target_index or setpolicy?
----------------------------
Most cpufreq drivers or even most cpu frequency scaling algorithms
only allow the CPU to be set to one frequency. For these, you use the
-->target call.
+->target/target_index call.
Some cpufreq-capable processors switch the frequency between certain
limits on their own. These shall use the ->setpolicy call
-1.4. target
+1.4. target/target_index
-------------
+The target_index call has two arguments: struct cpufreq_policy *policy,
+and unsigned int index (into the exposed frequency table).
+
+The CPUfreq driver must set the new frequency when called here. The
+actual frequency must be determined by freq_table[index].frequency.
+
+Deprecated:
+----------
The target call has three arguments: struct cpufreq_policy *policy,
unsigned int target_frequency, unsigned int relation.
diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt
index 219970ba54b7..77ec21574fb1 100644
--- a/Documentation/cpu-freq/governors.txt
+++ b/Documentation/cpu-freq/governors.txt
@@ -40,7 +40,7 @@ Most cpufreq drivers (in fact, all except one, longrun) or even most
cpu frequency scaling algorithms only offer the CPU to be set to one
frequency. In order to offer dynamic frequency scaling, the cpufreq
core must be able to tell these drivers of a "target frequency". So
-these specific drivers will be transformed to offer a "->target"
+these specific drivers will be transformed to offer a "->target/target_index"
call instead of the existing "->setpolicy" call. For "longrun", all
stays the same, though.
@@ -71,7 +71,7 @@ CPU can be set to switch independently | CPU can only be set
/ the limits of policy->{min,max}
/ \
/ \
- Using the ->setpolicy call, Using the ->target call,
+ Using the ->setpolicy call, Using the ->target/target_index call,
the limits and the the frequency closest
"policy" is set. to target_freq is set.
It is assured that it
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 786dc82f98ce..8cb9938cc47e 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -5,7 +5,7 @@
Rusty Russell <rusty@rustcorp.com.au>
Srivatsa Vaddagiri <vatsa@in.ibm.com>
i386:
- Zwane Mwaikambo <zwane@arm.linux.org.uk>
+ Zwane Mwaikambo <zwanem@gmail.com>
ppc64:
Nathan Lynch <nathanl@austin.ibm.com>
Joel Schopp <jschopp@austin.ibm.com>
diff --git a/Documentation/cpuidle/governor.txt b/Documentation/cpuidle/governor.txt
index 12c6bd50c9f6..d9020f5e847b 100644
--- a/Documentation/cpuidle/governor.txt
+++ b/Documentation/cpuidle/governor.txt
@@ -25,5 +25,4 @@ kernel configuration and platform will be selected by cpuidle.
Interfaces:
extern int cpuidle_register_governor(struct cpuidle_governor *gov);
-extern void cpuidle_unregister_governor(struct cpuidle_governor *gov);
struct cpuidle_governor
diff --git a/Documentation/device-mapper/cache-policies.txt b/Documentation/device-mapper/cache-policies.txt
index d7c440b444cc..df52a849957f 100644
--- a/Documentation/device-mapper/cache-policies.txt
+++ b/Documentation/device-mapper/cache-policies.txt
@@ -30,8 +30,10 @@ multiqueue
This policy is the default.
-The multiqueue policy has two sets of 16 queues: one set for entries
-waiting for the cache and another one for those in the cache.
+The multiqueue policy has three sets of 16 queues: one set for entries
+waiting for the cache and another two for those in the cache (a set for
+clean entries and a set for dirty entries).
+
Cache entries in the queues are aged based on logical time. Entry into
the cache is based on variable thresholds and queue selection is based
on hit count on entry. The policy aims to take different cache miss
diff --git a/Documentation/device-mapper/cache.txt b/Documentation/device-mapper/cache.txt
index 33d45ee0b737..274752f8bdf9 100644
--- a/Documentation/device-mapper/cache.txt
+++ b/Documentation/device-mapper/cache.txt
@@ -68,10 +68,11 @@ So large block sizes are bad because they waste cache space. And small
block sizes are bad because they increase the amount of metadata (both
in core and on disk).
-Writeback/writethrough
-----------------------
+Cache operating modes
+---------------------
-The cache has two modes, writeback and writethrough.
+The cache has three operating modes: writeback, writethrough and
+passthrough.
If writeback, the default, is selected then a write to a block that is
cached will go only to the cache and the block will be marked dirty in
@@ -81,8 +82,31 @@ If writethrough is selected then a write to a cached block will not
complete until it has hit both the origin and cache devices. Clean
blocks should remain clean.
+If passthrough is selected, useful when the cache contents are not known
+to be coherent with the origin device, then all reads are served from
+the origin device (all reads miss the cache) and all writes are
+forwarded to the origin device; additionally, write hits cause cache
+block invalidates. To enable passthrough mode the cache must be clean.
+Passthrough mode allows a cache device to be activated without having to
+worry about coherency. Coherency that exists is maintained, although
+the cache will gradually cool as writes take place. If the coherency of
+the cache can later be verified, or established through use of the
+"invalidate_cblocks" message, the cache device can be transitioned to
+writethrough or writeback mode while still warm. Otherwise, the cache
+contents can be discarded prior to transitioning to the desired
+operating mode.
+
A simple cleaner policy is provided, which will clean (write back) all
-dirty blocks in a cache. Useful for decommissioning a cache.
+dirty blocks in a cache. Useful for decommissioning a cache or when
+shrinking a cache. Shrinking the cache's fast device requires all cache
+blocks, in the area of the cache being removed, to be clean. If the
+area being removed from the cache still contains dirty blocks the resize
+will fail. Care must be taken to never reduce the volume used for the
+cache's fast device until the cache is clean. This is of particular
+importance if writeback mode is used. Writethrough and passthrough
+modes already maintain a clean cache. Future support to partially clean
+the cache, above a specified threshold, will allow for keeping the cache
+warm and in writeback mode during resize.
Migration throttling
--------------------
@@ -161,7 +185,7 @@ Constructor
block size : cache unit size in sectors
#feature args : number of feature arguments passed
- feature args : writethrough. (The default is writeback.)
+ feature args : writethrough or passthrough (The default is writeback.)
policy : the replacement policy to use
#policy args : an even number of arguments corresponding to
@@ -177,6 +201,13 @@ Optional feature arguments are:
back cache block contents later for performance reasons,
so they may differ from the corresponding origin blocks.
+ passthrough : a degraded mode useful for various cache coherency
+ situations (e.g., rolling back snapshots of
+ underlying storage). Reads and writes always go to
+ the origin. If a write goes to a cached origin
+ block, then the cache block is invalidated.
+ To enable passthrough mode the cache must be clean.
+
A policy called 'default' is always registered. This is an alias for
the policy we currently think is giving best all round performance.
@@ -231,12 +262,26 @@ The message format is:
E.g.
dmsetup message my_cache 0 sequential_threshold 1024
+
+Invalidation is removing an entry from the cache without writing it
+back. Cache blocks can be invalidated via the invalidate_cblocks
+message, which takes an arbitrary number of cblock ranges. Each cblock
+must be expressed as a decimal value, in the future a variant message
+that takes cblock ranges expressed in hexidecimal may be needed to
+better support efficient invalidation of larger caches. The cache must
+be in passthrough mode when invalidate_cblocks is used.
+
+ invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
+
+E.g.
+ dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
+
Examples
========
The test suite can be found here:
-https://github.com/jthornber/thinp-test-suite
+https://github.com/jthornber/device-mapper-test-suite
dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
/dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
diff --git a/Documentation/device-mapper/dm-crypt.txt b/Documentation/device-mapper/dm-crypt.txt
index 2c656ae43ba7..c81839b52c4d 100644
--- a/Documentation/device-mapper/dm-crypt.txt
+++ b/Documentation/device-mapper/dm-crypt.txt
@@ -4,12 +4,15 @@ dm-crypt
Device-Mapper's "crypt" target provides transparent encryption of block devices
using the kernel crypto API.
+For a more detailed description of supported parameters see:
+http://code.google.com/p/cryptsetup/wiki/DMCrypt
+
Parameters: <cipher> <key> <iv_offset> <device path> \
<offset> [<#opt_params> <opt_params>]
<cipher>
Encryption cipher and an optional IV generation mode.
- (In format cipher[:keycount]-chainmode-ivopts:ivmode).
+ (In format cipher[:keycount]-chainmode-ivmode[:ivopts]).
Examples:
des
aes-cbc-essiv:sha256
@@ -19,7 +22,11 @@ Parameters: <cipher> <key> <iv_offset> <device path> \
<key>
Key used for encryption. It is encoded as a hexadecimal number.
- You can only use key sizes that are valid for the selected cipher.
+ You can only use key sizes that are valid for the selected cipher
+ in combination with the selected iv mode.
+ Note that for some iv modes the key string can contain additional
+ keys (for example IV seed) so the key contains more parts concatenated
+ into a single string.
<keycount>
Multi-key compatibility mode. You can define <keycount> keys and
diff --git a/Documentation/devices.txt b/Documentation/devices.txt
index 23721d3be3e6..80b72419ffd8 100644
--- a/Documentation/devices.txt
+++ b/Documentation/devices.txt
@@ -414,6 +414,7 @@ Your cooperation is appreciated.
200 = /dev/net/tun TAP/TUN network device
201 = /dev/button/gulpb Transmeta GULP-B buttons
202 = /dev/emd/ctl Enhanced Metadisk RAID (EMD) control
+ 203 = /dev/cuse Cuse (character device in user-space)
204 = /dev/video/em8300 EM8300 DVD decoder control
205 = /dev/video/em8300_mv EM8300 DVD decoder video
206 = /dev/video/em8300_ma EM8300 DVD decoder audio
diff --git a/Documentation/devicetree/bindings/hwmon/lm90.txt b/Documentation/devicetree/bindings/hwmon/lm90.txt
new file mode 100644
index 000000000000..e8632486b9ef
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/lm90.txt
@@ -0,0 +1,44 @@
+* LM90 series thermometer.
+
+Required node properties:
+- compatible: manufacturer and chip name, one of
+ "adi,adm1032"
+ "adi,adt7461"
+ "adi,adt7461a"
+ "gmt,g781"
+ "national,lm90"
+ "national,lm86"
+ "national,lm89"
+ "national,lm99"
+ "dallas,max6646"
+ "dallas,max6647"
+ "dallas,max6649"
+ "dallas,max6657"
+ "dallas,max6658"
+ "dallas,max6659"
+ "dallas,max6680"
+ "dallas,max6681"
+ "dallas,max6695"
+ "dallas,max6696"
+ "onnn,nct1008"
+ "winbond,w83l771"
+ "nxp,sa56004"
+
+- reg: I2C bus address of the device
+
+- vcc-supply: vcc regulator for the supply voltage.
+
+Optional properties:
+- interrupts: Contains a single interrupt specifier which describes the
+ LM90 "-ALERT" pin output.
+ See interrupt-controller/interrupts.txt for the format.
+
+Example LM90 node:
+
+temp-sensor {
+ compatible = "onnn,nct1008";
+ reg = <0x4c>;
+ vcc-supply = <&palmas_ldo6_reg>;
+ interrupt-parent = <&gpio>;
+ interrupts = <TEGRA_GPIO(O, 4) IRQ_TYPE_LEVEL_LOW>;
+}
diff --git a/Documentation/devicetree/bindings/input/touchscreen/ti-tsc-adc.txt b/Documentation/devicetree/bindings/input/touchscreen/ti-tsc-adc.txt
index 491c97b78384..878549ba814d 100644
--- a/Documentation/devicetree/bindings/input/touchscreen/ti-tsc-adc.txt
+++ b/Documentation/devicetree/bindings/input/touchscreen/ti-tsc-adc.txt
@@ -6,7 +6,7 @@ Required properties:
ti,wires: Wires refer to application modes i.e. 4/5/8 wire touchscreen
support on the platform.
ti,x-plate-resistance: X plate resistance
- ti,coordiante-readouts: The sequencer supports a total of 16
+ ti,coordinate-readouts: The sequencer supports a total of 16
programmable steps each step is used to
read a single coordinate. A single
readout is enough but multiple reads can
diff --git a/Documentation/devicetree/bindings/mfd/as3722.txt b/Documentation/devicetree/bindings/mfd/as3722.txt
new file mode 100644
index 000000000000..fc2191ecfd6b
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/as3722.txt
@@ -0,0 +1,194 @@
+* ams AS3722 Power management IC.
+
+Required properties:
+-------------------
+- compatible: Must be "ams,as3722".
+- reg: I2C device address.
+- interrupt-controller: AS3722 has internal interrupt controller which takes the
+ interrupt request from internal sub-blocks like RTC, regulators, GPIOs as well
+ as external input.
+- #interrupt-cells: Should be set to 2 for IRQ number and flags.
+ The first cell is the IRQ number. IRQ numbers for different interrupt source
+ of AS3722 are defined at dt-bindings/mfd/as3722.h
+ The second cell is the flags, encoded as the trigger masks from binding document
+ interrupts.txt, using dt-bindings/irq.
+
+Optional submodule and their properties:
+=======================================
+
+Pinmux and GPIO:
+===============
+Device has 8 GPIO pins which can be configured as GPIO as well as the special IO
+functions.
+
+Please refer to pinctrl-bindings.txt in this directory for details of the
+common pinctrl bindings used by client devices, including the meaning of the
+phrase "pin configuration node".
+
+Following are properties which is needed if GPIO and pinmux functionality
+is required:
+ Required properties:
+ -------------------
+ - gpio-controller: Marks the device node as a GPIO controller.
+ - #gpio-cells: Number of GPIO cells. Refer to binding document
+ gpio/gpio.txt
+
+ Optional properties:
+ --------------------
+ Following properties are require if pin control setting is required
+ at boot.
+ - pinctrl-names: A pinctrl state named "default" be defined, using the
+ bindings in pinctrl/pinctrl-binding.txt.
+ - pinctrl[0...n]: Properties to contain the phandle that refer to
+ different nodes of pin control settings. These nodes represents
+ the pin control setting of state 0 to state n. Each of these
+ nodes contains different subnodes to represents some desired
+ configuration for a list of pins. This configuration can
+ include the mux function to select on those pin(s), and
+ various pin configuration parameters, such as pull-up,
+ open drain.
+
+ Each subnode have following properties:
+ Required properties:
+ - pins: List of pins. Valid values of pins properties are:
+ gpio0, gpio1, gpio2, gpio3, gpio4, gpio5,
+ gpio6, gpio7
+
+ Optional properties:
+ function, bias-disable, bias-pull-up, bias-pull-down,
+ bias-high-impedance, drive-open-drain.
+
+ Valid values for function properties are:
+ gpio, interrupt-out, gpio-in-interrupt,
+ vsup-vbat-low-undebounce-out,
+ vsup-vbat-low-debounce-out,
+ voltage-in-standby, oc-pg-sd0, oc-pg-sd6,
+ powergood-out, pwm-in, pwm-out, clk32k-out,
+ watchdog-in, soft-reset-in
+
+Regulators:
+===========
+Device has multiple DCDC and LDOs. The node "regulators" is require if regulator
+functionality is needed.
+
+Following are properties of regulator subnode.
+
+ Optional properties:
+ -------------------
+ The input supply of regulators are the optional properties on the
+ regulator node. The input supply of these regulators are provided
+ through following properties:
+ vsup-sd2-supply: Input supply for SD2.
+ vsup-sd3-supply: Input supply for SD3.
+ vsup-sd4-supply: Input supply for SD4.
+ vsup-sd5-supply: Input supply for SD5.
+ vin-ldo0-supply: Input supply for LDO0.
+ vin-ldo1-6-supply: Input supply for LDO1 and LDO6.
+ vin-ldo2-5-7-supply: Input supply for LDO2, LDO5 and LDO7.
+ vin-ldo3-4-supply: Input supply for LDO3 and LDO4.
+ vin-ldo9-10-supply: Input supply for LDO9 and LDO10.
+ vin-ldo11-supply: Input supply for LDO11.
+
+ Optional sub nodes for regulators:
+ ---------------------------------
+ The subnodes name is the name of regulator and it must be one of:
+ sd[0-6], ldo[0-7], ldo[9-11]
+
+ Each sub-node should contain the constraints and initialization
+ information for that regulator. See regulator.txt for a description
+ of standard properties for these sub-nodes.
+ Additional optional custom properties are listed below.
+ ams,ext-control: External control of the rail. The option of
+ this properties will tell which external input is
+ controlling this rail. Valid values are 0, 1, 2 ad 3.
+ 0: There is no external control of this rail.
+ 1: Rail is controlled by ENABLE1 input pin.
+ 2: Rail is controlled by ENABLE2 input pin.
+ 3: Rail is controlled by ENABLE3 input pin.
+ Missing this property on DT will be assume as no
+ external control. The external control pin macros
+ are defined @dt-bindings/mfd/as3722.h
+
+ ams,enable-tracking: Enable tracking with SD1, only supported
+ by LDO3.
+
+Example:
+--------
+#include <dt-bindings/mfd/as3722.h>
+...
+ams3722 {
+ compatible = "ams,as3722";
+ reg = <0x48>;
+
+ interrupt-parent = <&intc>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
+
+ gpio-controller;
+ #gpio-cells = <2>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&as3722_default>;
+
+ as3722_default: pinmux {
+ gpio0 {
+ pins = "gpio0";
+ function = "gpio";
+ bias-pull-down;
+ };
+
+ gpio1_2_4_7 {
+ pins = "gpio1", "gpio2", "gpio4", "gpio7";
+ function = "gpio";
+ bias-pull-up;
+ };
+
+ gpio5 {
+ pins = "gpio5";
+ function = "clk32k_out";
+ };
+ }
+
+ regulators {
+ vsup-sd2-supply = <...>;
+ ...
+
+ sd0 {
+ regulator-name = "vdd_cpu";
+ regulator-min-microvolt = <700000>;
+ regulator-max-microvolt = <1400000>;
+ regulator-always-on;
+ ams,ext-control = <2>;
+ };
+
+ sd1 {
+ regulator-name = "vdd_core";
+ regulator-min-microvolt = <700000>;
+ regulator-max-microvolt = <1400000>;
+ regulator-always-on;
+ ams,ext-control = <1>;
+ };
+
+ sd2 {
+ regulator-name = "vddio_ddr";
+ regulator-min-microvolt = <1350000>;
+ regulator-max-microvolt = <1350000>;
+ regulator-always-on;
+ };
+
+ sd4 {
+ regulator-name = "avdd-hdmi-pex";
+ regulator-min-microvolt = <1050000>;
+ regulator-max-microvolt = <1050000>;
+ regulator-always-on;
+ };
+
+ sd5 {
+ regulator-name = "vdd-1v8";
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-always-on;
+ };
+ ....
+ };
+};
diff --git a/Documentation/devicetree/bindings/mfd/s2mps11.txt b/Documentation/devicetree/bindings/mfd/s2mps11.txt
index c9332c626021..78a840d7510d 100644
--- a/Documentation/devicetree/bindings/mfd/s2mps11.txt
+++ b/Documentation/devicetree/bindings/mfd/s2mps11.txt
@@ -1,10 +1,10 @@
* Samsung S2MPS11 Voltage and Current Regulator
-The Samsung S2MP211 is a multi-function device which includes voltage and
+The Samsung S2MPS11 is a multi-function device which includes voltage and
current regulators, RTC, charger controller and other sub-blocks. It is
-interfaced to the host controller using a I2C interface. Each sub-block is
-addressed by the host system using different I2C slave address.
+interfaced to the host controller using an I2C interface. Each sub-block is
+addressed by the host system using different I2C slave addresses.
Required properties:
- compatible: Should be "samsung,s2mps11-pmic".
@@ -43,7 +43,8 @@ sub-node should be of the format as listed below.
BUCK[2/3/4/6] supports disabling ramp delay on hardware, so explictly
regulator-ramp-delay = <0> can be used for them to disable ramp delay.
- In absence of regulator-ramp-delay property, default ramp delay will be used.
+ In the absence of the regulator-ramp-delay property, the default ramp
+ delay will be used.
NOTE: Some BUCKs share the ramp rate setting i.e. same ramp value will be set
for a particular group of BUCKs. So provide same regulator-ramp-delay<value>.
@@ -58,10 +59,10 @@ supports. Note: The 'n' in LDOn and BUCKn represents the LDO or BUCK number
as per the datasheet of s2mps11.
- LDOn
- - valid values for n are 1 to 28
+ - valid values for n are 1 to 38
- Example: LDO0, LD01, LDO28
- BUCKn
- - valid values for n are 1 to 9.
+ - valid values for n are 1 to 10.
- Example: BUCK1, BUCK2, BUCK9
Example:
diff --git a/Documentation/devicetree/bindings/mtd/gpmc-nand.txt b/Documentation/devicetree/bindings/mtd/gpmc-nand.txt
index df338cb5059c..5e1f31b5ff70 100644
--- a/Documentation/devicetree/bindings/mtd/gpmc-nand.txt
+++ b/Documentation/devicetree/bindings/mtd/gpmc-nand.txt
@@ -22,10 +22,10 @@ Optional properties:
width of 8 is assumed.
- ti,nand-ecc-opt: A string setting the ECC layout to use. One of:
-
- "sw" Software method (default)
- "hw" Hardware method
- "hw-romcode" gpmc hamming mode method & romcode layout
+ "sw" <deprecated> use "ham1" instead
+ "hw" <deprecated> use "ham1" instead
+ "hw-romcode" <deprecated> use "ham1" instead
+ "ham1" 1-bit Hamming ecc code
"bch4" 4-bit BCH ecc code
"bch8" 8-bit BCH ecc code
@@ -36,8 +36,12 @@ Optional properties:
"prefetch-dma" Prefetch enabled sDMA mode
"prefetch-irq" Prefetch enabled irq mode
- - elm_id: Specifies elm device node. This is required to support BCH
- error correction using ELM module.
+ - elm_id: <deprecated> use "ti,elm-id" instead
+ - ti,elm-id: Specifies phandle of the ELM devicetree node.
+ ELM is an on-chip hardware engine on TI SoC which is used for
+ locating ECC errors for BCHx algorithms. SoC devices which have
+ ELM hardware engines should specify this device node in .dtsi
+ Using ELM for ECC error correction frees some CPU cycles.
For inline partiton table parsing (optional):
diff --git a/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt b/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt
new file mode 100644
index 000000000000..7ff57a119f81
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt
@@ -0,0 +1,28 @@
+TI CPSW Phy mode Selection Device Tree Bindings
+-----------------------------------------------
+
+Required properties:
+- compatible : Should be "ti,am3352-cpsw-phy-sel"
+- reg : physical base address and size of the cpsw
+ registers map
+- reg-names : names of the register map given in "reg" node
+
+Optional properties:
+-rmii-clock-ext : If present, the driver will configure the RMII
+ interface to external clock usage
+
+Examples:
+
+ phy_sel: cpsw-phy-sel@44e10650 {
+ compatible = "ti,am3352-cpsw-phy-sel";
+ reg= <0x44e10650 0x4>;
+ reg-names = "gmii-sel";
+ };
+
+(or)
+ phy_sel: cpsw-phy-sel@44e10650 {
+ compatible = "ti,am3352-cpsw-phy-sel";
+ reg= <0x44e10650 0x4>;
+ reg-names = "gmii-sel";
+ rmii-clock-ext;
+ };
diff --git a/Documentation/devicetree/bindings/pci/designware-pcie.txt b/Documentation/devicetree/bindings/pci/designware-pcie.txt
index e216af356847..d5d26d443693 100644
--- a/Documentation/devicetree/bindings/pci/designware-pcie.txt
+++ b/Documentation/devicetree/bindings/pci/designware-pcie.txt
@@ -3,7 +3,7 @@
Required properties:
- compatible: should contain "snps,dw-pcie" to identify the
core, plus an identifier for the specific instance, such
- as "samsung,exynos5440-pcie".
+ as "samsung,exynos5440-pcie" or "fsl,imx6q-pcie".
- reg: base addresses and lengths of the pcie controller,
the phy controller, additional register for the phy controller.
- interrupts: interrupt values for level interrupt,
@@ -21,6 +21,11 @@ Required properties:
- num-lanes: number of lanes to use
- reset-gpio: gpio pin number of power good signal
+Optional properties for fsl,imx6q-pcie
+- power-on-gpio: gpio pin number of power-enable signal
+- wake-up-gpio: gpio pin number of incoming wakeup signal
+- disable-gpio: gpio pin number of outgoing rfkill/endpoint disable signal
+
Example:
SoC specific DT Entry:
diff --git a/Documentation/devicetree/bindings/pwm/pwm-samsung.txt b/Documentation/devicetree/bindings/pwm/pwm-samsung.txt
index d61fccd40bad..5538de9c2007 100644
--- a/Documentation/devicetree/bindings/pwm/pwm-samsung.txt
+++ b/Documentation/devicetree/bindings/pwm/pwm-samsung.txt
@@ -15,7 +15,7 @@ Required properties:
samsung,s5pc100-pwm - for 32-bit timers present on S5PC100, S5PV210,
Exynos4210 rev0 SoCs
samsung,exynos4210-pwm - for 32-bit timers present on Exynos4210,
- Exynos4x12 and Exynos5250 SoCs
+ Exynos4x12, Exynos5250 and Exynos5420 SoCs
- reg: base address and size of register area
- interrupts: list of timer interrupts (one interrupt per timer, starting at
timer 0)
diff --git a/Documentation/devicetree/bindings/video/atmel,lcdc.txt b/Documentation/devicetree/bindings/video/atmel,lcdc.txt
new file mode 100644
index 000000000000..1ec175eddca8
--- /dev/null
+++ b/Documentation/devicetree/bindings/video/atmel,lcdc.txt
@@ -0,0 +1,75 @@
+Atmel LCDC Framebuffer
+-----------------------------------------------------
+
+Required properties:
+- compatible :
+ "atmel,at91sam9261-lcdc" ,
+ "atmel,at91sam9263-lcdc" ,
+ "atmel,at91sam9g10-lcdc" ,
+ "atmel,at91sam9g45-lcdc" ,
+ "atmel,at91sam9g45es-lcdc" ,
+ "atmel,at91sam9rl-lcdc" ,
+ "atmel,at32ap-lcdc"
+- reg : Should contain 1 register ranges(address and length)
+- interrupts : framebuffer controller interrupt
+- display: a phandle pointing to the display node
+
+Required nodes:
+- display: a display node is required to initialize the lcd panel
+ This should be in the board dts.
+- default-mode: a videomode within the display with timing parameters
+ as specified below.
+
+Example:
+
+ fb0: fb@0x00500000 {
+ compatible = "atmel,at91sam9g45-lcdc";
+ reg = <0x00500000 0x1000>;
+ interrupts = <23 3 0>;
+ pinctrl-names = "default";
+ pinctrl-0 = <&pinctrl_fb>;
+ display = <&display0>;
+ status = "okay";
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ };
+
+Atmel LCDC Display
+-----------------------------------------------------
+Required properties (as per of_videomode_helper):
+
+ - atmel,dmacon: dma controler configuration
+ - atmel,lcdcon2: lcd controler configuration
+ - atmel,guard-time: lcd guard time (Delay in frame periods)
+ - bits-per-pixel: lcd panel bit-depth.
+
+Optional properties (as per of_videomode_helper):
+ - atmel,lcdcon-backlight: enable backlight
+ - atmel,lcd-wiring-mode: lcd wiring mode "RGB" or "BRG"
+ - atmel,power-control-gpio: gpio to power on or off the LCD (as many as needed)
+
+Example:
+ display0: display {
+ bits-per-pixel = <32>;
+ atmel,lcdcon-backlight;
+ atmel,dmacon = <0x1>;
+ atmel,lcdcon2 = <0x80008002>;
+ atmel,guard-time = <9>;
+ atmel,lcd-wiring-mode = <1>;
+
+ display-timings {
+ native-mode = <&timing0>;
+ timing0: timing0 {
+ clock-frequency = <9000000>;
+ hactive = <480>;
+ vactive = <272>;
+ hback-porch = <1>;
+ hfront-porch = <1>;
+ vback-porch = <40>;
+ vfront-porch = <1>;
+ hsync-len = <45>;
+ vsync-len = <1>;
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/video/backlight/lp855x.txt b/Documentation/devicetree/bindings/video/backlight/lp855x.txt
index 1482103d288f..96e83a56048e 100644
--- a/Documentation/devicetree/bindings/video/backlight/lp855x.txt
+++ b/Documentation/devicetree/bindings/video/backlight/lp855x.txt
@@ -2,7 +2,7 @@ lp855x bindings
Required properties:
- compatible: "ti,lp8550", "ti,lp8551", "ti,lp8552", "ti,lp8553",
- "ti,lp8556", "ti,lp8557"
+ "ti,lp8555", "ti,lp8556", "ti,lp8557"
- reg: I2C slave address (u8)
- dev-ctrl: Value of DEVICE CONTROL register (u8). It depends on the device.
@@ -15,6 +15,33 @@ Optional properties:
Example:
+ /* LP8555 */
+ backlight@2c {
+ compatible = "ti,lp8555";
+ reg = <0x2c>;
+
+ dev-ctrl = /bits/ 8 <0x00>;
+ pwm-period = <10000>;
+
+ /* 4V OV, 4 output LED0 string enabled */
+ rom_14h {
+ rom-addr = /bits/ 8 <0x14>;
+ rom-val = /bits/ 8 <0xcf>;
+ };
+
+ /* Heavy smoothing, 24ms ramp time step */
+ rom_15h {
+ rom-addr = /bits/ 8 <0x15>;
+ rom-val = /bits/ 8 <0xc7>;
+ };
+
+ /* 4 output LED1 string enabled */
+ rom_19h {
+ rom-addr = /bits/ 8 <0x19>;
+ rom-val = /bits/ 8 <0x0f>;
+ };
+ };
+
/* LP8556 */
backlight@2c {
compatible = "ti,lp8556";
diff --git a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
index 1e4fc727f3b1..764db86d441a 100644
--- a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
+++ b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
@@ -10,12 +10,16 @@ Required properties:
last value in the array represents a 100% duty cycle (brightest).
- default-brightness-level: the default brightness level (index into the
array defined by the "brightness-levels" property)
+ - power-supply: regulator for supply voltage
Optional properties:
- pwm-names: a list of names for the PWM devices specified in the
"pwms" property (see PWM binding[0])
+ - enable-gpios: contains a single GPIO specifier for the GPIO which enables
+ and disables the backlight (see GPIO binding[1])
[0]: Documentation/devicetree/bindings/pwm/pwm.txt
+[1]: Documentation/devicetree/bindings/gpio/gpio.txt
Example:
@@ -25,4 +29,7 @@ Example:
brightness-levels = <0 4 8 16 32 64 128 255>;
default-brightness-level = <6>;
+
+ power-supply = <&vdd_bl_reg>;
+ enable-gpios = <&gpio 58 0>;
};
diff --git a/Documentation/filesystems/directory-locking b/Documentation/filesystems/directory-locking
index ff7b611abf33..09bbf9a54f80 100644
--- a/Documentation/filesystems/directory-locking
+++ b/Documentation/filesystems/directory-locking
@@ -2,6 +2,10 @@
kinds of locks - per-inode (->i_mutex) and per-filesystem
(->s_vfs_rename_mutex).
+ When taking the i_mutex on multiple non-directory objects, we
+always acquire the locks in order by increasing address. We'll call
+that "inode pointer" order in the following.
+
For our purposes all operations fall in 5 classes:
1) read access. Locking rules: caller locks directory we are accessing.
@@ -12,8 +16,9 @@ kinds of locks - per-inode (->i_mutex) and per-filesystem
locks victim and calls the method.
4) rename() that is _not_ cross-directory. Locking rules: caller locks
-the parent, finds source and target, if target already exists - locks it
-and then calls the method.
+the parent and finds source and target. If target already exists, lock
+it. If source is a non-directory, lock it. If that means we need to
+lock both, lock them in inode pointer order.
5) link creation. Locking rules:
* lock parent
@@ -30,7 +35,9 @@ rules:
fail with -ENOTEMPTY
* if new parent is equal to or is a descendent of source
fail with -ELOOP
- * if target exists - lock it.
+ * If target exists, lock it. If source is a non-directory, lock
+ it. In case that means we need to lock both source and target,
+ do so in inode pointer order.
* call the method.
@@ -56,9 +63,11 @@ objects - A < B iff A is an ancestor of B.
renames will be blocked on filesystem lock and we don't start changing
the order until we had acquired all locks).
-(3) any operation holds at most one lock on non-directory object and
- that lock is acquired after all other locks. (Proof: see descriptions
- of operations).
+(3) locks on non-directory objects are acquired only after locks on
+ directory objects, and are acquired in inode pointer order.
+ (Proof: all operations but renames take lock on at most one
+ non-directory object, except renames, which take locks on source and
+ target in inode pointer order in the case they are not directories.)
Now consider the minimal deadlock. Each process is blocked on
attempt to acquire some lock and already holds at least one lock. Let's
@@ -66,9 +75,13 @@ consider the set of contended locks. First of all, filesystem lock is
not contended, since any process blocked on it is not holding any locks.
Thus all processes are blocked on ->i_mutex.
- Non-directory objects are not contended due to (3). Thus link
-creation can't be a part of deadlock - it can't be blocked on source
-and it means that it doesn't hold any locks.
+ By (3), any process holding a non-directory lock can only be
+waiting on another non-directory lock with a larger address. Therefore
+the process holding the "largest" such lock can always make progress, and
+non-directory objects are not included in the set of contended locks.
+
+ Thus link creation can't be a part of deadlock - it can't be
+blocked on source and it means that it doesn't hold any locks.
Any contended object is either held by cross-directory rename or
has a child that is also contended. Indeed, suppose that it is held by
diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
index 3cd27bed6349..a3fe811bbdbc 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -119,6 +119,7 @@ active_logs=%u Support configuring the number of active logs. In the
Default number is 6.
disable_ext_identify Disable the extension list configured by mkfs, so f2fs
does not aware of cold files such as media files.
+inline_xattr Enable the inline xattrs feature.
================================================================================
DEBUGFS ENTRIES
@@ -164,6 +165,12 @@ Files in /sys/fs/f2fs/<devname>
gc_idle = 1 will select the Cost Benefit approach
& setting gc_idle = 2 will select the greedy aproach.
+ reclaim_segments This parameter controls the number of prefree
+ segments to be reclaimed. If the number of prefree
+ segments is larger than this number, f2fs tries to
+ conduct checkpoint to reclaim the prefree segments
+ to free segments. By default, 100 segments, 200MB.
+
================================================================================
USAGE
================================================================================
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index f0890581f7f6..fe2b7ae6f962 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -455,3 +455,11 @@ in your dentry operations instead.
vfs_follow_link has been removed. Filesystems must use nd_set_link
from ->follow_link for normal symlinks, or nd_jump_link for magic
/proc/<pid> style links.
+--
+[mandatory]
+ iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be
+ called with both ->i_lock and inode_hash_lock held; the former is *not*
+ taken anymore, so verify that your callbacks do not rely on it (none
+ of the in-tree instances did). inode_hash_lock is still held,
+ of course, so they are still serialized wrt removal from inode hash,
+ as well as wrt set() callback of iget5_locked().
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 823c95faebd2..22d89aa37218 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -460,6 +460,7 @@ manner. The codes are the following:
nl - non-linear mapping
ar - architecture specific flag
dd - do not include area into core dump
+ sd - soft-dirty flag
mm - mixed map area
hg - huge page advise flag
nh - no-huge page advise flag
diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt
index aa1f459fa6cf..4a93e98b290a 100644
--- a/Documentation/filesystems/vfat.txt
+++ b/Documentation/filesystems/vfat.txt
@@ -307,7 +307,7 @@ the following:
<proceeding files...>
<slot #3, id = 0x43, characters = "h is long">
- <slot #2, id = 0x02, characters = "xtension which">
+ <slot #2, id = 0x02, characters = "xtension whic">
<slot #1, id = 0x01, characters = "My Big File.E">
<directory entry, name = "MYBIGFIL.EXT">
diff --git a/Documentation/gcov.txt b/Documentation/gcov.txt
index e7ca6478cd93..7b727783db7e 100644
--- a/Documentation/gcov.txt
+++ b/Documentation/gcov.txt
@@ -50,6 +50,10 @@ Configure the kernel with:
CONFIG_DEBUG_FS=y
CONFIG_GCOV_KERNEL=y
+select the gcc's gcov format, default is autodetect based on gcc version:
+
+ CONFIG_GCOV_FORMAT_AUTODETECT=y
+
and to get coverage data for the entire kernel:
CONFIG_GCOV_PROFILE_ALL=y
diff --git a/Documentation/hwmon/lm90 b/Documentation/hwmon/lm90
index b466974e142f..ab81013cc390 100644
--- a/Documentation/hwmon/lm90
+++ b/Documentation/hwmon/lm90
@@ -122,6 +122,12 @@ Supported chips:
Prefix: 'g781'
Addresses scanned: I2C 0x4c, 0x4d
Datasheet: Not publicly available from GMT
+ * Texas Instruments TMP451
+ Prefix: 'tmp451'
+ Addresses scanned: I2C 0x4c
+ Datasheet: Publicly available at TI website
+ http://www.ti.com/litv/pdf/sbos686
+
Author: Jean Delvare <khali@linux-fr.org>
diff --git a/Documentation/input/gamepad.txt b/Documentation/input/gamepad.txt
index 8002c894c6b0..31bb6a4029ef 100644
--- a/Documentation/input/gamepad.txt
+++ b/Documentation/input/gamepad.txt
@@ -122,12 +122,14 @@ D-Pad:
BTN_DPAD_*
Analog buttons are reported as:
ABS_HAT0X and ABS_HAT0Y
+ (for ABS values negative is left/up, positive is right/down)
Analog-Sticks:
The left analog-stick is reported as ABS_X, ABS_Y. The right analog stick is
reported as ABS_RX, ABS_RY. Zero, one or two sticks may be present.
If analog-sticks provide digital buttons, they are mapped accordingly as
BTN_THUMBL (first/left) and BTN_THUMBR (second/right).
+ (for ABS values negative is left/up, positive is right/down)
Triggers:
Trigger buttons can be available as digital or analog buttons or both. User-
@@ -138,6 +140,7 @@ Triggers:
ABS_HAT2X (right/ZR) and BTN_TL2 or ABS_HAT2Y (left/ZL).
If only one trigger-button combination is present (upper+lower), they are
reported as "right" triggers (BTN_TR/ABS_HAT1X).
+ (ABS trigger values start at 0, pressure is reported as positive values)
Menu-Pad:
Menu buttons are always digital and are mapped according to their location
diff --git a/Documentation/kbuild/kconfig.txt b/Documentation/kbuild/kconfig.txt
index 8ef6dbb6a462..bbc99c0c1094 100644
--- a/Documentation/kbuild/kconfig.txt
+++ b/Documentation/kbuild/kconfig.txt
@@ -20,16 +20,9 @@ symbols have been introduced.
To see a list of new config symbols when using "make oldconfig", use
cp user/some/old.config .config
- yes "" | make oldconfig >conf.new
+ make listnewconfig
-and the config program will list as (NEW) any new symbols that have
-unknown values. Of course, the .config file is also updated with
-new (default) values, so you can use:
-
- grep "(NEW)" conf.new
-
-to see the new config symbols or you can use diffconfig to see the
-differences between the previous and new .config files:
+and the config program will list any new symbols, one per line.
scripts/diffconfig .config.old .config | less
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fd3ecedc084d..9ca3e74a10e1 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1070,6 +1070,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
VIA, nVidia)
verbose: show contents of HPET registers during setup
+ hpet_mmap= [X86, HPET_MMAP] Allow userspace to mmap HPET
+ registers. Default set by CONFIG_HPET_MMAP_DEFAULT.
+
hugepages= [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
On x86-64 and powerpc, this option can be specified
@@ -1775,6 +1778,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
that the amount of memory usable for all allocations
is not too small.
+ movable_node [KNL,X86] Boot-time switch to enable the effects
+ of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
+
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>
diff --git a/Documentation/lockstat.txt b/Documentation/lockstat.txt
index dd2f7b26ca30..72d010689751 100644
--- a/Documentation/lockstat.txt
+++ b/Documentation/lockstat.txt
@@ -46,16 +46,14 @@ With these hooks we provide the following statistics:
contentions - number of lock acquisitions that had to wait
wait time min - shortest (non-0) time we ever had to wait for a lock
max - longest time we ever had to wait for a lock
- total - total time we spend waiting on this lock
+ total - total time we spend waiting on this lock
+ avg - average time spent waiting on this lock
acq-bounces - number of lock acquisitions that involved x-cpu data
acquisitions - number of times we took the lock
hold time min - shortest (non-0) time we ever held the lock
- max - longest time we ever held the lock
- total - total time this lock was held
-
-From these number various other statistics can be derived, such as:
-
- hold time average = hold time total / acquisitions
+ max - longest time we ever held the lock
+ total - total time this lock was held
+ avg - average time this lock was held
These numbers are gathered per lock class, per read/write state (when
applicable).
@@ -84,37 +82,38 @@ Look at the current lock statistics:
# less /proc/lock_stat
-01 lock_stat version 0.3
-02 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-03 class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acquisitions holdtime-min holdtime-max holdtime-total
-04 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+01 lock_stat version 0.4
+02-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+03 class name con-bounces contentions waittime-min waittime-max waittime-total waittime-avg acq-bounces acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg
+04-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
05
-06 &mm->mmap_sem-W: 233 538 18446744073708 22924.27 607243.51 1342 45806 1.71 8595.89 1180582.34
-07 &mm->mmap_sem-R: 205 587 18446744073708 28403.36 731975.00 1940 412426 0.58 187825.45 6307502.88
-08 ---------------
-09 &mm->mmap_sem 487 [<ffffffff8053491f>] do_page_fault+0x466/0x928
-10 &mm->mmap_sem 179 [<ffffffff802a6200>] sys_mprotect+0xcd/0x21d
-11 &mm->mmap_sem 279 [<ffffffff80210a57>] sys_mmap+0x75/0xce
-12 &mm->mmap_sem 76 [<ffffffff802a490b>] sys_munmap+0x32/0x59
-13 ---------------
-14 &mm->mmap_sem 270 [<ffffffff80210a57>] sys_mmap+0x75/0xce
-15 &mm->mmap_sem 431 [<ffffffff8053491f>] do_page_fault+0x466/0x928
-16 &mm->mmap_sem 138 [<ffffffff802a490b>] sys_munmap+0x32/0x59
-17 &mm->mmap_sem 145 [<ffffffff802a6200>] sys_mprotect+0xcd/0x21d
+06 &mm->mmap_sem-W: 46 84 0.26 939.10 16371.53 194.90 47291 2922365 0.16 2220301.69 17464026916.32 5975.99
+07 &mm->mmap_sem-R: 37 100 1.31 299502.61 325629.52 3256.30 212344 34316685 0.10 7744.91 95016910.20 2.77
+08 ---------------
+09 &mm->mmap_sem 1 [<ffffffff811502a7>] khugepaged_scan_mm_slot+0x57/0x280
+19 &mm->mmap_sem 96 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510
+11 &mm->mmap_sem 34 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0
+12 &mm->mmap_sem 17 [<ffffffff81127e71>] vm_munmap+0x41/0x80
+13 ---------------
+14 &mm->mmap_sem 1 [<ffffffff81046fda>] dup_mmap+0x2a/0x3f0
+15 &mm->mmap_sem 60 [<ffffffff81129e29>] SyS_mprotect+0xe9/0x250
+16 &mm->mmap_sem 41 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510
+17 &mm->mmap_sem 68 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0
18
-19 ...............................................................................................................................................................................................
+19.............................................................................................................................................................................................................................
20
-21 dcache_lock: 621 623 0.52 118.26 1053.02 6745 91930 0.29 316.29 118423.41
-22 -----------
-23 dcache_lock 179 [<ffffffff80378274>] _atomic_dec_and_lock+0x34/0x54
-24 dcache_lock 113 [<ffffffff802cc17b>] d_alloc+0x19a/0x1eb
-25 dcache_lock 99 [<ffffffff802ca0dc>] d_rehash+0x1b/0x44
-26 dcache_lock 104 [<ffffffff802cbca0>] d_instantiate+0x36/0x8a
-27 -----------
-28 dcache_lock 192 [<ffffffff80378274>] _atomic_dec_and_lock+0x34/0x54
-29 dcache_lock 98 [<ffffffff802ca0dc>] d_rehash+0x1b/0x44
-30 dcache_lock 72 [<ffffffff802cc17b>] d_alloc+0x19a/0x1eb
-31 dcache_lock 112 [<ffffffff802cbca0>] d_instantiate+0x36/0x8a
+21 unix_table_lock: 110 112 0.21 49.24 163.91 1.46 21094 66312 0.12 624.42 31589.81 0.48
+22 ---------------
+23 unix_table_lock 45 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0
+24 unix_table_lock 47 [<ffffffff8150b111>] unix_release_sock+0x31/0x250
+25 unix_table_lock 15 [<ffffffff8150ca37>] unix_find_other+0x117/0x230
+26 unix_table_lock 5 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0
+27 ---------------
+28 unix_table_lock 39 [<ffffffff8150b111>] unix_release_sock+0x31/0x250
+29 unix_table_lock 49 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0
+30 unix_table_lock 20 [<ffffffff8150ca37>] unix_find_other+0x117/0x230
+31 unix_table_lock 4 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0
+
This excerpt shows the first two lock class statistics. Line 01 shows the
output version - each time the format changes this will be updated. Line 02-04
@@ -131,30 +130,30 @@ The integer part of the time values is in us.
Dealing with nested locks, subclasses may appear:
-32...............................................................................................................................................................................................
+32...........................................................................................................................................................................................................................
33
-34 &rq->lock: 13128 13128 0.43 190.53 103881.26 97454 3453404 0.00 401.11 13224683.11
+34 &rq->lock: 13128 13128 0.43 190.53 103881.26 7.91 97454 3453404 0.00 401.11 13224683.11 3.82
35 ---------
-36 &rq->lock 645 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
-37 &rq->lock 297 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
-38 &rq->lock 360 [<ffffffff8103c4c5>] select_task_rq_fair+0x1f0/0x74a
-39 &rq->lock 428 [<ffffffff81045f98>] scheduler_tick+0x46/0x1fb
+36 &rq->lock 645 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
+37 &rq->lock 297 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
+38 &rq->lock 360 [<ffffffff8103c4c5>] select_task_rq_fair+0x1f0/0x74a
+39 &rq->lock 428 [<ffffffff81045f98>] scheduler_tick+0x46/0x1fb
40 ---------
-41 &rq->lock 77 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
-42 &rq->lock 174 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
-43 &rq->lock 4715 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
-44 &rq->lock 893 [<ffffffff81340524>] schedule+0x157/0x7b8
+41 &rq->lock 77 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
+42 &rq->lock 174 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
+43 &rq->lock 4715 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
+44 &rq->lock 893 [<ffffffff81340524>] schedule+0x157/0x7b8
45
-46...............................................................................................................................................................................................
+46...........................................................................................................................................................................................................................
47
-48 &rq->lock/1: 11526 11488 0.33 388.73 136294.31 21461 38404 0.00 37.93 109388.53
+48 &rq->lock/1: 1526 11488 0.33 388.73 136294.31 11.86 21461 38404 0.00 37.93 109388.53 2.84
49 -----------
-50 &rq->lock/1 11526 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
+50 &rq->lock/1 11526 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
51 -----------
-52 &rq->lock/1 5645 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
-53 &rq->lock/1 1224 [<ffffffff81340524>] schedule+0x157/0x7b8
-54 &rq->lock/1 4336 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
-55 &rq->lock/1 181 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
+52 &rq->lock/1 5645 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
+53 &rq->lock/1 1224 [<ffffffff81340524>] schedule+0x157/0x7b8
+54 &rq->lock/1 4336 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
+55 &rq->lock/1 181 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
Line 48 shows statistics for the second subclass (/1) of &rq->lock class
(subclass starts from 0), since in this case, as line 50 suggests,
@@ -163,16 +162,16 @@ double_rq_lock actually acquires a nested lock of two spinlocks.
View the top contending locks:
# grep : /proc/lock_stat | head
- &inode->i_data.tree_lock-W: 15 21657 0.18 1093295.30 11547131054.85 58 10415 0.16 87.51 6387.60
- &inode->i_data.tree_lock-R: 0 0 0.00 0.00 0.00 23302 231198 0.25 8.45 98023.38
- dcache_lock: 1037 1161 0.38 45.32 774.51 6611 243371 0.15 306.48 77387.24
- &inode->i_mutex: 161 286 18446744073709 62882.54 1244614.55 3653 20598 18446744073709 62318.60 1693822.74
- &zone->lru_lock: 94 94 0.53 7.33 92.10 4366 32690 0.29 59.81 16350.06
- &inode->i_data.i_mmap_mutex: 79 79 0.40 3.77 53.03 11779 87755 0.28 116.93 29898.44
- &q->__queue_lock: 48 50 0.52 31.62 86.31 774 13131 0.17 113.08 12277.52
- &rq->rq_lock_key: 43 47 0.74 68.50 170.63 3706 33929 0.22 107.99 17460.62
- &rq->rq_lock_key#2: 39 46 0.75 6.68 49.03 2979 32292 0.17 125.17 17137.63
- tasklist_lock-W: 15 15 1.45 10.87 32.70 1201 7390 0.58 62.55 13648.47
+ clockevents_lock: 2926159 2947636 0.15 46882.81 1784540466.34 605.41 3381345 3879161 0.00 2260.97 53178395.68 13.71
+ tick_broadcast_lock: 346460 346717 0.18 2257.43 39364622.71 113.54 3642919 4242696 0.00 2263.79 49173646.60 11.59
+ &mapping->i_mmap_mutex: 203896 203899 3.36 645530.05 31767507988.39 155800.21 3361776 8893984 0.17 2254.15 14110121.02 1.59
+ &rq->lock: 135014 136909 0.18 606.09 842160.68 6.15 1540728 10436146 0.00 728.72 17606683.41 1.69
+ &(&zone->lru_lock)->rlock: 93000 94934 0.16 59.18 188253.78 1.98 1199912 3809894 0.15 391.40 3559518.81 0.93
+ tasklist_lock-W: 40667 41130 0.23 1189.42 428980.51 10.43 270278 510106 0.16 653.51 3939674.91 7.72
+ tasklist_lock-R: 21298 21305 0.20 1310.05 215511.12 10.12 186204 241258 0.14 1162.33 1179779.23 4.89
+ rcu_node_1: 47656 49022 0.16 635.41 193616.41 3.95 844888 1865423 0.00 764.26 1656226.96 0.89
+ &(&dentry->d_lockref.lock)->rlock: 39791 40179 0.15 1302.08 88851.96 2.21 2790851 12527025 0.10 1910.75 3379714.27 0.27
+ rcu_node_0: 29203 30064 0.16 786.55 1555573.00 51.74 88963 244254 0.00 398.87 428872.51 1.76
Clear the statistics:
diff --git a/Documentation/mutex-design.txt b/Documentation/mutex-design.txt
index 38c10fd7f411..1dfe62c3641d 100644
--- a/Documentation/mutex-design.txt
+++ b/Documentation/mutex-design.txt
@@ -116,11 +116,11 @@ using mutexes at the moment, please let me know if you find any. ]
Implementation of mutexes
-------------------------
-'struct mutex' is the new mutex type, defined in include/linux/mutex.h
-and implemented in kernel/mutex.c. It is a counter-based mutex with a
-spinlock and a wait-list. The counter has 3 states: 1 for "unlocked",
-0 for "locked" and negative numbers (usually -1) for "locked, potential
-waiters queued".
+'struct mutex' is the new mutex type, defined in include/linux/mutex.h and
+implemented in kernel/locking/mutex.c. It is a counter-based mutex with a
+spinlock and a wait-list. The counter has 3 states: 1 for "unlocked", 0 for
+"locked" and negative numbers (usually -1) for "locked, potential waiters
+queued".
the APIs of 'struct mutex' have been streamlined:
diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt
index c1d82047a4b1..89490beb3c0b 100644
--- a/Documentation/networking/batman-adv.txt
+++ b/Documentation/networking/batman-adv.txt
@@ -69,8 +69,7 @@ folder:
# aggregated_ogms gw_bandwidth log_level
# ap_isolation gw_mode orig_interval
# bonding gw_sel_class routing_algo
-# bridge_loop_avoidance hop_penalty vis_mode
-# fragmentation
+# bridge_loop_avoidance hop_penalty fragmentation
There is a special folder for debugging information:
@@ -78,7 +77,7 @@ There is a special folder for debugging information:
# ls /sys/kernel/debug/batman_adv/bat0/
# bla_backbone_table log transtable_global
# bla_claim_table originators transtable_local
-# gateways socket vis_data
+# gateways socket
Some of the files contain all sort of status information regard-
ing the mesh network. For example, you can view the table of
@@ -127,51 +126,6 @@ ously assigned to interfaces now used by batman advanced, e.g.
# ifconfig eth0 0.0.0.0
-VISUALIZATION
--------------
-
-If you want topology visualization, at least one mesh node must
-be configured as VIS-server:
-
-# echo "server" > /sys/class/net/bat0/mesh/vis_mode
-
-Each node is either configured as "server" or as "client" (de-
-fault: "client"). Clients send their topology data to the server
-next to them, and server synchronize with other servers. If there
-is no server configured (default) within the mesh, no topology
-information will be transmitted. With these "synchronizing
-servers", there can be 1 or more vis servers sharing the same (or
-at least very similar) data.
-
-When configured as server, you can get a topology snapshot of
-your mesh:
-
-# cat /sys/kernel/debug/batman_adv/bat0/vis_data
-
-This raw output is intended to be easily parsable and convertable
-with other tools. Have a look at the batctl README if you want a
-vis output in dot or json format for instance and how those out-
-puts could then be visualised in an image.
-
-The raw format consists of comma separated values per entry where
-each entry is giving information about a certain source inter-
-face. Each entry can/has to have the following values:
--> "mac" - mac address of an originator's source interface
- (each line begins with it)
--> "TQ mac value" - src mac's link quality towards mac address
- of a neighbor originator's interface which
- is being used for routing
--> "TT mac" - TT announced by source mac
--> "PRIMARY" - this is a primary interface
--> "SEC mac" - secondary mac address of source
- (requires preceding PRIMARY)
-
-The TQ value has a range from 4 to 255 with 255 being the best.
-The TT entries are showing which hosts are connected to the mesh
-via bat0 or being bridged into the mesh network. The PRIMARY/SEC
-values are only applied on primary interfaces
-
-
LOGGING/DEBUGGING
-----------------
@@ -245,5 +199,5 @@ Mailing-list: b.a.t.m.a.n@open-mesh.org (optional subscription
You can also contact the Authors:
-Marek Lindner <lindner_marek@yahoo.de>
-Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
+Marek Lindner <mareklindner@neomailbox.ch>
+Simon Wunderlich <sw@simonwunderlich.de>
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 9b28e714831a..2cdb8b66caa9 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -639,6 +639,15 @@ num_unsol_na
are generated by the ipv4 and ipv6 code and the numbers of
repetitions cannot be set independently.
+packets_per_slave
+
+ Specify the number of packets to transmit through a slave before
+ moving to the next one. When set to 0 then a slave is chosen at
+ random.
+
+ The valid range is 0 - 65535; the default value is 1. This option
+ has effect only in balance-rr mode.
+
primary
A string (eth0, eth2, etc) specifying which slave is the
@@ -743,21 +752,16 @@ xmit_hash_policy
protocol information to generate the hash.
Uses XOR of hardware MAC addresses and IP addresses to
- generate the hash. The IPv4 formula is
-
- (((source IP XOR dest IP) AND 0xffff) XOR
- ( source MAC XOR destination MAC ))
- modulo slave count
-
- The IPv6 formula is
+ generate the hash. The formula is
- hash = (source ip quad 2 XOR dest IP quad 2) XOR
- (source ip quad 3 XOR dest IP quad 3) XOR
- (source ip quad 4 XOR dest IP quad 4)
+ hash = source MAC XOR destination MAC
+ hash = hash XOR source IP XOR destination IP
+ hash = hash XOR (hash RSHIFT 16)
+ hash = hash XOR (hash RSHIFT 8)
+ And then hash is reduced modulo slave count.
- (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
- XOR (source MAC XOR destination MAC))
- modulo slave count
+ If the protocol is IPv6 then the source and destination
+ addresses are first hashed using ipv6_addr_hash.
This algorithm will place all traffic to a particular
network peer on the same slave. For non-IP traffic,
@@ -779,21 +783,16 @@ xmit_hash_policy
slaves, although a single connection will not span
multiple slaves.
- The formula for unfragmented IPv4 TCP and UDP packets is
+ The formula for unfragmented TCP and UDP packets is
- ((source port XOR dest port) XOR
- ((source IP XOR dest IP) AND 0xffff)
- modulo slave count
+ hash = source port, destination port (as in the header)
+ hash = hash XOR source IP XOR destination IP
+ hash = hash XOR (hash RSHIFT 16)
+ hash = hash XOR (hash RSHIFT 8)
+ And then hash is reduced modulo slave count.
- The formula for unfragmented IPv6 TCP and UDP packets is
-
- hash = (source port XOR dest port) XOR
- ((source ip quad 2 XOR dest IP quad 2) XOR
- (source ip quad 3 XOR dest IP quad 3) XOR
- (source ip quad 4 XOR dest IP quad 4))
-
- ((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
- modulo slave count
+ If the protocol is IPv6 then the source and destination
+ addresses are first hashed using ipv6_addr_hash.
For fragmented TCP or UDP packets and all other IPv4 and
IPv6 protocol traffic, the source and destination port
@@ -801,10 +800,6 @@ xmit_hash_policy
formula is the same as for the layer2 transmit hash
policy.
- The IPv4 policy is intended to mimic the behavior of
- certain switches, notably Cisco switches with PFC2 as
- well as some Foundry and IBM products.
-
This algorithm is not fully 802.3ad compliant. A
single TCP or UDP conversation containing both
fragmented and unfragmented packets will see packets
@@ -815,6 +810,26 @@ xmit_hash_policy
conversations. Other implementations of 802.3ad may
or may not tolerate this noncompliance.
+ encap2+3
+
+ This policy uses the same formula as layer2+3 but it
+ relies on skb_flow_dissect to obtain the header fields
+ which might result in the use of inner headers if an
+ encapsulation protocol is used. For example this will
+ improve the performance for tunnel users because the
+ packets will be distributed according to the encapsulated
+ flows.
+
+ encap3+4
+
+ This policy uses the same formula as layer3+4 but it
+ relies on skb_flow_dissect to obtain the header fields
+ which might result in the use of inner headers if an
+ encapsulation protocol is used. For example this will
+ improve the performance for tunnel users because the
+ packets will be distributed according to the encapsulated
+ flows.
+
The default value is layer2. This option was added in bonding
version 2.6.3. In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy. The
diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt
index 820f55344edc..4c072414eadb 100644
--- a/Documentation/networking/can.txt
+++ b/Documentation/networking/can.txt
@@ -25,6 +25,12 @@ This file contains
4.1.5 RAW socket option CAN_RAW_FD_FRAMES
4.1.6 RAW socket returned message flags
4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
+ 4.2.1 Broadcast Manager operations
+ 4.2.2 Broadcast Manager message flags
+ 4.2.3 Broadcast Manager transmission timers
+ 4.2.4 Broadcast Manager message sequence transmission
+ 4.2.5 Broadcast Manager receive filter timers
+ 4.2.6 Broadcast Manager multiplex message receive filter
4.3 connected transport protocols (SOCK_SEQPACKET)
4.4 unconnected transport protocols (SOCK_DGRAM)
@@ -593,6 +599,217 @@ solution for a couple of reasons:
In order to receive such messages, CAN_RAW_RECV_OWN_MSGS must be set.
4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
+
+ The Broadcast Manager protocol provides a command based configuration
+ interface to filter and send (e.g. cyclic) CAN messages in kernel space.
+
+ Receive filters can be used to down sample frequent messages; detect events
+ such as message contents changes, packet length changes, and do time-out
+ monitoring of received messages.
+
+ Periodic transmission tasks of CAN frames or a sequence of CAN frames can be
+ created and modified at runtime; both the message content and the two
+ possible transmit intervals can be altered.
+
+ A BCM socket is not intended for sending individual CAN frames using the
+ struct can_frame as known from the CAN_RAW socket. Instead a special BCM
+ configuration message is defined. The basic BCM configuration message used
+ to communicate with the broadcast manager and the available operations are
+ defined in the linux/can/bcm.h include. The BCM message consists of a
+ message header with a command ('opcode') followed by zero or more CAN frames.
+ The broadcast manager sends responses to user space in the same form:
+
+ struct bcm_msg_head {
+ __u32 opcode; /* command */
+ __u32 flags; /* special flags */
+ __u32 count; /* run 'count' times with ival1 */
+ struct timeval ival1, ival2; /* count and subsequent interval */
+ canid_t can_id; /* unique can_id for task */
+ __u32 nframes; /* number of can_frames following */
+ struct can_frame frames[0];
+ };
+
+ The aligned payload 'frames' uses the same basic CAN frame structure defined
+ at the beginning of section 4 and in the include/linux/can.h include. All
+ messages to the broadcast manager from user space have this structure.
+
+ Note a CAN_BCM socket must be connected instead of bound after socket
+ creation (example without error checking):
+
+ int s;
+ struct sockaddr_can addr;
+ struct ifreq ifr;
+
+ s = socket(PF_CAN, SOCK_DGRAM, CAN_BCM);
+
+ strcpy(ifr.ifr_name, "can0");
+ ioctl(s, SIOCGIFINDEX, &ifr);
+
+ addr.can_family = AF_CAN;
+ addr.can_ifindex = ifr.ifr_ifindex;
+
+ connect(s, (struct sockaddr *)&addr, sizeof(addr))
+
+ (..)
+
+ The broadcast manager socket is able to handle any number of in flight
+ transmissions or receive filters concurrently. The different RX/TX jobs are
+ distinguished by the unique can_id in each BCM message. However additional
+ CAN_BCM sockets are recommended to communicate on multiple CAN interfaces.
+ When the broadcast manager socket is bound to 'any' CAN interface (=> the
+ interface index is set to zero) the configured receive filters apply to any
+ CAN interface unless the sendto() syscall is used to overrule the 'any' CAN
+ interface index. When using recvfrom() instead of read() to retrieve BCM
+ socket messages the originating CAN interface is provided in can_ifindex.
+
+ 4.2.1 Broadcast Manager operations
+
+ The opcode defines the operation for the broadcast manager to carry out,
+ or details the broadcast managers response to several events, including
+ user requests.
+
+ Transmit Operations (user space to broadcast manager):
+
+ TX_SETUP: Create (cyclic) transmission task.
+
+ TX_DELETE: Remove (cyclic) transmission task, requires only can_id.
+
+ TX_READ: Read properties of (cyclic) transmission task for can_id.
+
+ TX_SEND: Send one CAN frame.
+
+ Transmit Responses (broadcast manager to user space):
+
+ TX_STATUS: Reply to TX_READ request (transmission task configuration).
+
+ TX_EXPIRED: Notification when counter finishes sending at initial interval
+ 'ival1'. Requires the TX_COUNTEVT flag to be set at TX_SETUP.
+
+ Receive Operations (user space to broadcast manager):
+
+ RX_SETUP: Create RX content filter subscription.
+
+ RX_DELETE: Remove RX content filter subscription, requires only can_id.
+
+ RX_READ: Read properties of RX content filter subscription for can_id.
+
+ Receive Responses (broadcast manager to user space):
+
+ RX_STATUS: Reply to RX_READ request (filter task configuration).
+
+ RX_TIMEOUT: Cyclic message is detected to be absent (timer ival1 expired).
+
+ RX_CHANGED: BCM message with updated CAN frame (detected content change).
+ Sent on first message received or on receipt of revised CAN messages.
+
+ 4.2.2 Broadcast Manager message flags
+
+ When sending a message to the broadcast manager the 'flags' element may
+ contain the following flag definitions which influence the behaviour:
+
+ SETTIMER: Set the values of ival1, ival2 and count
+
+ STARTTIMER: Start the timer with the actual values of ival1, ival2
+ and count. Starting the timer leads simultaneously to emit a CAN frame.
+
+ TX_COUNTEVT: Create the message TX_EXPIRED when count expires
+
+ TX_ANNOUNCE: A change of data by the process is emitted immediately.
+
+ TX_CP_CAN_ID: Copies the can_id from the message header to each
+ subsequent frame in frames. This is intended as usage simplification. For
+ TX tasks the unique can_id from the message header may differ from the
+ can_id(s) stored for transmission in the subsequent struct can_frame(s).
+
+ RX_FILTER_ID: Filter by can_id alone, no frames required (nframes=0).
+
+ RX_CHECK_DLC: A change of the DLC leads to an RX_CHANGED.
+
+ RX_NO_AUTOTIMER: Prevent automatically starting the timeout monitor.
+
+ RX_ANNOUNCE_RESUME: If passed at RX_SETUP and a receive timeout occured, a
+ RX_CHANGED message will be generated when the (cyclic) receive restarts.
+
+ TX_RESET_MULTI_IDX: Reset the index for the multiple frame transmission.
+
+ RX_RTR_FRAME: Send reply for RTR-request (placed in op->frames[0]).
+
+ 4.2.3 Broadcast Manager transmission timers
+
+ Periodic transmission configurations may use up to two interval timers.
+ In this case the BCM sends a number of messages ('count') at an interval
+ 'ival1', then continuing to send at another given interval 'ival2'. When
+ only one timer is needed 'count' is set to zero and only 'ival2' is used.
+ When SET_TIMER and START_TIMER flag were set the timers are activated.
+ The timer values can be altered at runtime when only SET_TIMER is set.
+
+ 4.2.4 Broadcast Manager message sequence transmission
+
+ Up to 256 CAN frames can be transmitted in a sequence in the case of a cyclic
+ TX task configuration. The number of CAN frames is provided in the 'nframes'
+ element of the BCM message head. The defined number of CAN frames are added
+ as array to the TX_SETUP BCM configuration message.
+
+ /* create a struct to set up a sequence of four CAN frames */
+ struct {
+ struct bcm_msg_head msg_head;
+ struct can_frame frame[4];
+ } mytxmsg;
+
+ (..)
+ mytxmsg.nframes = 4;
+ (..)
+
+ write(s, &mytxmsg, sizeof(mytxmsg));
+
+ With every transmission the index in the array of CAN frames is increased
+ and set to zero at index overflow.
+
+ 4.2.5 Broadcast Manager receive filter timers
+
+ The timer values ival1 or ival2 may be set to non-zero values at RX_SETUP.
+ When the SET_TIMER flag is set the timers are enabled:
+
+ ival1: Send RX_TIMEOUT when a received message is not received again within
+ the given time. When START_TIMER is set at RX_SETUP the timeout detection
+ is activated directly - even without a former CAN frame reception.
+
+ ival2: Throttle the received message rate down to the value of ival2. This
+ is useful to reduce messages for the application when the signal inside the
+ CAN frame is stateless as state changes within the ival2 periode may get
+ lost.
+
+ 4.2.6 Broadcast Manager multiplex message receive filter
+
+ To filter for content changes in multiplex message sequences an array of more
+ than one CAN frames can be passed in a RX_SETUP configuration message. The
+ data bytes of the first CAN frame contain the mask of relevant bits that
+ have to match in the subsequent CAN frames with the received CAN frame.
+ If one of the subsequent CAN frames is matching the bits in that frame data
+ mark the relevant content to be compared with the previous received content.
+ Up to 257 CAN frames (multiplex filter bit mask CAN frame plus 256 CAN
+ filters) can be added as array to the TX_SETUP BCM configuration message.
+
+ /* usually used to clear CAN frame data[] - beware of endian problems! */
+ #define U64_DATA(p) (*(unsigned long long*)(p)->data)
+
+ struct {
+ struct bcm_msg_head msg_head;
+ struct can_frame frame[5];
+ } msg;
+
+ msg.msg_head.opcode = RX_SETUP;
+ msg.msg_head.can_id = 0x42;
+ msg.msg_head.flags = 0;
+ msg.msg_head.nframes = 5;
+ U64_DATA(&msg.frame[0]) = 0xFF00000000000000ULL; /* MUX mask */
+ U64_DATA(&msg.frame[1]) = 0x01000000000000FFULL; /* data mask (MUX 0x01) */
+ U64_DATA(&msg.frame[2]) = 0x0200FFFF000000FFULL; /* data mask (MUX 0x02) */
+ U64_DATA(&msg.frame[3]) = 0x330000FFFFFF0003ULL; /* data mask (MUX 0x33) */
+ U64_DATA(&msg.frame[4]) = 0x4F07FC0FF0000000ULL; /* data mask (MUX 0x4F) */
+
+ write(s, &msg, sizeof(msg));
+
4.3 connected transport protocols (SOCK_SEQPACKET)
4.4 unconnected transport protocols (SOCK_DGRAM)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index a46d78583ae1..8b8a05787641 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -267,17 +267,6 @@ tcp_max_orphans - INTEGER
more aggressively. Let me to remind again: each orphan eats
up to ~64K of unswappable memory.
-tcp_max_ssthresh - INTEGER
- Limited Slow-Start for TCP with large congestion windows (cwnd) defined in
- RFC3742. Limited slow-start is a mechanism to limit growth of the cwnd
- on the region where cwnd is larger than tcp_max_ssthresh. TCP increases cwnd
- by at most tcp_max_ssthresh segments, and by at least tcp_max_ssthresh/2
- segments per RTT when the cwnd is above tcp_max_ssthresh.
- If TCP connection increased cwnd to thousands (or tens of thousands) segments,
- and thousands of packets were being dropped during slow-start, you can set
- tcp_max_ssthresh to improve performance for new TCP connection.
- Default: 0 (off)
-
tcp_max_syn_backlog - INTEGER
Maximal number of remembered connection requests, which have not
received an acknowledgment from connecting client.
@@ -451,7 +440,7 @@ tcp_fastopen - INTEGER
connect() to perform a TCP handshake automatically.
The values (bitmap) are
- 1: Enables sending data in the opening SYN on the client.
+ 1: Enables sending data in the opening SYN on the client w/ MSG_FASTOPEN.
2: Enables TCP Fast Open on the server side, i.e., allowing data in
a SYN packet to be accepted and passed to the application before
3-way hand shake finishes.
@@ -464,7 +453,7 @@ tcp_fastopen - INTEGER
different ways of setting max_qlen without the TCP_FASTOPEN socket
option.
- Default: 0
+ Default: 1
Note that the client & server side Fast Open flags (1 and 2
respectively) must be also enabled before the rest of flags can take
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt
index c7ecc7080494..0b1cf6b2a592 100644
--- a/Documentation/networking/netdevices.txt
+++ b/Documentation/networking/netdevices.txt
@@ -10,12 +10,12 @@ network devices.
struct net_device allocation rules
==================================
Network device structures need to persist even after module is unloaded and
-must be allocated with kmalloc. If device has registered successfully,
-it will be freed on last use by free_netdev. This is required to handle the
-pathologic case cleanly (example: rmmod mydriver </sys/class/net/myeth/mtu )
+must be allocated with alloc_netdev_mqs() and friends.
+If device has registered successfully, it will be freed on last use
+by free_netdev(). This is required to handle the pathologic case cleanly
+(example: rmmod mydriver </sys/class/net/myeth/mtu )
-There are routines in net_init.c to handle the common cases of
-alloc_etherdev, alloc_netdev. These reserve extra space for driver
+alloc_netdev_mqs()/alloc_netdev() reserve extra space for driver
private data which gets freed when the network device is freed. If
separately allocated data is attached to the network device
(netdev_priv(dev)) then it is up to the module exit handler to free that.
diff --git a/Documentation/power/opp.txt b/Documentation/power/opp.txt
index 425c51d56aef..b8a907dc0169 100644
--- a/Documentation/power/opp.txt
+++ b/Documentation/power/opp.txt
@@ -42,7 +42,7 @@ We can represent these as three OPPs as the following {Hz, uV} tuples:
OPP library provides a set of helper functions to organize and query the OPP
information. The library is located in drivers/base/power/opp.c and the header
-is located in include/linux/opp.h. OPP library can be enabled by enabling
+is located in include/linux/pm_opp.h. OPP library can be enabled by enabling
CONFIG_PM_OPP from power management menuconfig menu. OPP library depends on
CONFIG_PM as certain SoCs such as Texas Instrument's OMAP framework allows to
optionally boot at a certain OPP without needing cpufreq.
@@ -71,14 +71,14 @@ operations until that OPP could be re-enabled if possible.
OPP library facilitates this concept in it's implementation. The following
operational functions operate only on available opps:
-opp_find_freq_{ceil, floor}, opp_get_voltage, opp_get_freq, opp_get_opp_count
-and opp_init_cpufreq_table
+opp_find_freq_{ceil, floor}, dev_pm_opp_get_voltage, dev_pm_opp_get_freq, dev_pm_opp_get_opp_count
+and dev_pm_opp_init_cpufreq_table
-opp_find_freq_exact is meant to be used to find the opp pointer which can then
-be used for opp_enable/disable functions to make an opp available as required.
+dev_pm_opp_find_freq_exact is meant to be used to find the opp pointer which can then
+be used for dev_pm_opp_enable/disable functions to make an opp available as required.
WARNING: Users of OPP library should refresh their availability count using
-get_opp_count if opp_enable/disable functions are invoked for a device, the
+get_opp_count if dev_pm_opp_enable/disable functions are invoked for a device, the
exact mechanism to trigger these or the notification mechanism to other
dependent subsystems such as cpufreq are left to the discretion of the SoC
specific framework which uses the OPP library. Similar care needs to be taken
@@ -96,24 +96,24 @@ using RCU read locks. The opp_find_freq_{exact,ceil,floor},
opp_get_{voltage, freq, opp_count} fall into this category.
opp_{add,enable,disable} are updaters which use mutex and implement it's own
-RCU locking mechanisms. opp_init_cpufreq_table acts as an updater and uses
+RCU locking mechanisms. dev_pm_opp_init_cpufreq_table acts as an updater and uses
mutex to implment RCU updater strategy. These functions should *NOT* be called
under RCU locks and other contexts that prevent blocking functions in RCU or
mutex operations from working.
2. Initial OPP List Registration
================================
-The SoC implementation calls opp_add function iteratively to add OPPs per
+The SoC implementation calls dev_pm_opp_add function iteratively to add OPPs per
device. It is expected that the SoC framework will register the OPP entries
optimally- typical numbers range to be less than 5. The list generated by
registering the OPPs is maintained by OPP library throughout the device
operation. The SoC framework can subsequently control the availability of the
-OPPs dynamically using the opp_enable / disable functions.
+OPPs dynamically using the dev_pm_opp_enable / disable functions.
-opp_add - Add a new OPP for a specific domain represented by the device pointer.
+dev_pm_opp_add - Add a new OPP for a specific domain represented by the device pointer.
The OPP is defined using the frequency and voltage. Once added, the OPP
is assumed to be available and control of it's availability can be done
- with the opp_enable/disable functions. OPP library internally stores
+ with the dev_pm_opp_enable/disable functions. OPP library internally stores
and manages this information in the opp struct. This function may be
used by SoC framework to define a optimal list as per the demands of
SoC usage environment.
@@ -124,7 +124,7 @@ opp_add - Add a new OPP for a specific domain represented by the device pointer.
soc_pm_init()
{
/* Do things */
- r = opp_add(mpu_dev, 1000000, 900000);
+ r = dev_pm_opp_add(mpu_dev, 1000000, 900000);
if (!r) {
pr_err("%s: unable to register mpu opp(%d)\n", r);
goto no_cpufreq;
@@ -143,44 +143,44 @@ functions return the matching pointer representing the opp if a match is
found, else returns error. These errors are expected to be handled by standard
error checks such as IS_ERR() and appropriate actions taken by the caller.
-opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
+dev_pm_opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
availability. This function is especially useful to enable an OPP which
is not available by default.
Example: In a case when SoC framework detects a situation where a
higher frequency could be made available, it can use this function to
- find the OPP prior to call the opp_enable to actually make it available.
+ find the OPP prior to call the dev_pm_opp_enable to actually make it available.
rcu_read_lock();
- opp = opp_find_freq_exact(dev, 1000000000, false);
+ opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
rcu_read_unlock();
/* dont operate on the pointer.. just do a sanity check.. */
if (IS_ERR(opp)) {
pr_err("frequency not disabled!\n");
/* trigger appropriate actions.. */
} else {
- opp_enable(dev,1000000000);
+ dev_pm_opp_enable(dev,1000000000);
}
NOTE: This is the only search function that operates on OPPs which are
not available.
-opp_find_freq_floor - Search for an available OPP which is *at most* the
+dev_pm_opp_find_freq_floor - Search for an available OPP which is *at most* the
provided frequency. This function is useful while searching for a lesser
match OR operating on OPP information in the order of decreasing
frequency.
Example: To find the highest opp for a device:
freq = ULONG_MAX;
rcu_read_lock();
- opp_find_freq_floor(dev, &freq);
+ dev_pm_opp_find_freq_floor(dev, &freq);
rcu_read_unlock();
-opp_find_freq_ceil - Search for an available OPP which is *at least* the
+dev_pm_opp_find_freq_ceil - Search for an available OPP which is *at least* the
provided frequency. This function is useful while searching for a
higher match OR operating on OPP information in the order of increasing
frequency.
Example 1: To find the lowest opp for a device:
freq = 0;
rcu_read_lock();
- opp_find_freq_ceil(dev, &freq);
+ dev_pm_opp_find_freq_ceil(dev, &freq);
rcu_read_unlock();
Example 2: A simplified implementation of a SoC cpufreq_driver->target:
soc_cpufreq_target(..)
@@ -188,7 +188,7 @@ opp_find_freq_ceil - Search for an available OPP which is *at least* the
/* Do stuff like policy checks etc. */
/* Find the best frequency match for the req */
rcu_read_lock();
- opp = opp_find_freq_ceil(dev, &freq);
+ opp = dev_pm_opp_find_freq_ceil(dev, &freq);
rcu_read_unlock();
if (!IS_ERR(opp))
soc_switch_to_freq_voltage(freq);
@@ -208,34 +208,34 @@ as thermal considerations (e.g. don't use OPPx until the temperature drops).
WARNING: Do not use these functions in interrupt context.
-opp_enable - Make a OPP available for operation.
+dev_pm_opp_enable - Make a OPP available for operation.
Example: Lets say that 1GHz OPP is to be made available only if the
SoC temperature is lower than a certain threshold. The SoC framework
implementation might choose to do something as follows:
if (cur_temp < temp_low_thresh) {
/* Enable 1GHz if it was disabled */
rcu_read_lock();
- opp = opp_find_freq_exact(dev, 1000000000, false);
+ opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
rcu_read_unlock();
/* just error check */
if (!IS_ERR(opp))
- ret = opp_enable(dev, 1000000000);
+ ret = dev_pm_opp_enable(dev, 1000000000);
else
goto try_something_else;
}
-opp_disable - Make an OPP to be not available for operation
+dev_pm_opp_disable - Make an OPP to be not available for operation
Example: Lets say that 1GHz OPP is to be disabled if the temperature
exceeds a threshold value. The SoC framework implementation might
choose to do something as follows:
if (cur_temp > temp_high_thresh) {
/* Disable 1GHz if it was enabled */
rcu_read_lock();
- opp = opp_find_freq_exact(dev, 1000000000, true);
+ opp = dev_pm_opp_find_freq_exact(dev, 1000000000, true);
rcu_read_unlock();
/* just error check */
if (!IS_ERR(opp))
- ret = opp_disable(dev, 1000000000);
+ ret = dev_pm_opp_disable(dev, 1000000000);
else
goto try_something_else;
}
@@ -247,7 +247,7 @@ information from the OPP structure is necessary. Once an OPP pointer is
retrieved using the search functions, the following functions can be used by SoC
framework to retrieve the information represented inside the OPP layer.
-opp_get_voltage - Retrieve the voltage represented by the opp pointer.
+dev_pm_opp_get_voltage - Retrieve the voltage represented by the opp pointer.
Example: At a cpufreq transition to a different frequency, SoC
framework requires to set the voltage represented by the OPP using
the regulator framework to the Power Management chip providing the
@@ -256,15 +256,15 @@ opp_get_voltage - Retrieve the voltage represented by the opp pointer.
{
/* do things */
rcu_read_lock();
- opp = opp_find_freq_ceil(dev, &freq);
- v = opp_get_voltage(opp);
+ opp = dev_pm_opp_find_freq_ceil(dev, &freq);
+ v = dev_pm_opp_get_voltage(opp);
rcu_read_unlock();
if (v)
regulator_set_voltage(.., v);
/* do other things */
}
-opp_get_freq - Retrieve the freq represented by the opp pointer.
+dev_pm_opp_get_freq - Retrieve the freq represented by the opp pointer.
Example: Lets say the SoC framework uses a couple of helper functions
we could pass opp pointers instead of doing additional parameters to
handle quiet a bit of data parameters.
@@ -273,8 +273,8 @@ opp_get_freq - Retrieve the freq represented by the opp pointer.
/* do things.. */
max_freq = ULONG_MAX;
rcu_read_lock();
- max_opp = opp_find_freq_floor(dev,&max_freq);
- requested_opp = opp_find_freq_ceil(dev,&freq);
+ max_opp = dev_pm_opp_find_freq_floor(dev,&max_freq);
+ requested_opp = dev_pm_opp_find_freq_ceil(dev,&freq);
if (!IS_ERR(max_opp) && !IS_ERR(requested_opp))
r = soc_test_validity(max_opp, requested_opp);
rcu_read_unlock();
@@ -282,25 +282,25 @@ opp_get_freq - Retrieve the freq represented by the opp pointer.
}
soc_test_validity(..)
{
- if(opp_get_voltage(max_opp) < opp_get_voltage(requested_opp))
+ if(dev_pm_opp_get_voltage(max_opp) < dev_pm_opp_get_voltage(requested_opp))
return -EINVAL;
- if(opp_get_freq(max_opp) < opp_get_freq(requested_opp))
+ if(dev_pm_opp_get_freq(max_opp) < dev_pm_opp_get_freq(requested_opp))
return -EINVAL;
/* do things.. */
}
-opp_get_opp_count - Retrieve the number of available opps for a device
+dev_pm_opp_get_opp_count - Retrieve the number of available opps for a device
Example: Lets say a co-processor in the SoC needs to know the available
frequencies in a table, the main processor can notify as following:
soc_notify_coproc_available_frequencies()
{
/* Do things */
rcu_read_lock();
- num_available = opp_get_opp_count(dev);
+ num_available = dev_pm_opp_get_opp_count(dev);
speeds = kzalloc(sizeof(u32) * num_available, GFP_KERNEL);
/* populate the table in increasing order */
freq = 0;
- while (!IS_ERR(opp = opp_find_freq_ceil(dev, &freq))) {
+ while (!IS_ERR(opp = dev_pm_opp_find_freq_ceil(dev, &freq))) {
speeds[i] = freq;
freq++;
i++;
@@ -313,7 +313,7 @@ opp_get_opp_count - Retrieve the number of available opps for a device
6. Cpufreq Table Generation
===========================
-opp_init_cpufreq_table - cpufreq framework typically is initialized with
+dev_pm_opp_init_cpufreq_table - cpufreq framework typically is initialized with
cpufreq_frequency_table_cpuinfo which is provided with the list of
frequencies that are available for operation. This function provides
a ready to use conversion routine to translate the OPP layer's internal
@@ -326,7 +326,7 @@ opp_init_cpufreq_table - cpufreq framework typically is initialized with
soc_pm_init()
{
/* Do things */
- r = opp_init_cpufreq_table(dev, &freq_table);
+ r = dev_pm_opp_init_cpufreq_table(dev, &freq_table);
if (!r)
cpufreq_frequency_table_cpuinfo(policy, freq_table);
/* Do other things */
@@ -336,7 +336,7 @@ opp_init_cpufreq_table - cpufreq framework typically is initialized with
addition to CONFIG_PM as power management feature is required to
dynamically scale voltage and frequency in a system.
-opp_free_cpufreq_table - Free up the table allocated by opp_init_cpufreq_table
+dev_pm_opp_free_cpufreq_table - Free up the table allocated by dev_pm_opp_init_cpufreq_table
7. Data Structures
==================
@@ -358,16 +358,16 @@ accessed by various functions as described above. However, the structures
representing the actual OPPs and domains are internal to the OPP library itself
to allow for suitable abstraction reusable across systems.
-struct opp - The internal data structure of OPP library which is used to
+struct dev_pm_opp - The internal data structure of OPP library which is used to
represent an OPP. In addition to the freq, voltage, availability
information, it also contains internal book keeping information required
for the OPP library to operate on. Pointer to this structure is
provided back to the users such as SoC framework to be used as a
identifier for OPP in the interactions with OPP layer.
- WARNING: The struct opp pointer should not be parsed or modified by the
- users. The defaults of for an instance is populated by opp_add, but the
- availability of the OPP can be modified by opp_enable/disable functions.
+ WARNING: The struct dev_pm_opp pointer should not be parsed or modified by the
+ users. The defaults of for an instance is populated by dev_pm_opp_add, but the
+ availability of the OPP can be modified by dev_pm_opp_enable/disable functions.
struct device - This is used to identify a domain to the OPP layer. The
nature of the device and it's implementation is left to the user of
@@ -377,19 +377,19 @@ Overall, in a simplistic view, the data structure operations is represented as
following:
Initialization / modification:
- +-----+ /- opp_enable
-opp_add --> | opp | <-------
- | +-----+ \- opp_disable
+ +-----+ /- dev_pm_opp_enable
+dev_pm_opp_add --> | opp | <-------
+ | +-----+ \- dev_pm_opp_disable
\-------> domain_info(device)
Search functions:
- /-- opp_find_freq_ceil ---\ +-----+
-domain_info<---- opp_find_freq_exact -----> | opp |
- \-- opp_find_freq_floor ---/ +-----+
+ /-- dev_pm_opp_find_freq_ceil ---\ +-----+
+domain_info<---- dev_pm_opp_find_freq_exact -----> | opp |
+ \-- dev_pm_opp_find_freq_floor ---/ +-----+
Retrieval functions:
-+-----+ /- opp_get_voltage
++-----+ /- dev_pm_opp_get_voltage
| opp | <---
-+-----+ \- opp_get_freq
++-----+ \- dev_pm_opp_get_freq
-domain_info <- opp_get_opp_count
+domain_info <- dev_pm_opp_get_opp_count
diff --git a/Documentation/power/powercap/powercap.txt b/Documentation/power/powercap/powercap.txt
new file mode 100644
index 000000000000..1e6ef164e07a
--- /dev/null
+++ b/Documentation/power/powercap/powercap.txt
@@ -0,0 +1,236 @@
+Power Capping Framework
+==================================
+
+The power capping framework provides a consistent interface between the kernel
+and the user space that allows power capping drivers to expose the settings to
+user space in a uniform way.
+
+Terminology
+=========================
+The framework exposes power capping devices to user space via sysfs in the
+form of a tree of objects. The objects at the root level of the tree represent
+'control types', which correspond to different methods of power capping. For
+example, the intel-rapl control type represents the Intel "Running Average
+Power Limit" (RAPL) technology, whereas the 'idle-injection' control type
+corresponds to the use of idle injection for controlling power.
+
+Power zones represent different parts of the system, which can be controlled and
+monitored using the power capping method determined by the control type the
+given zone belongs to. They each contain attributes for monitoring power, as
+well as controls represented in the form of power constraints. If the parts of
+the system represented by different power zones are hierarchical (that is, one
+bigger part consists of multiple smaller parts that each have their own power
+controls), those power zones may also be organized in a hierarchy with one
+parent power zone containing multiple subzones and so on to reflect the power
+control topology of the system. In that case, it is possible to apply power
+capping to a set of devices together using the parent power zone and if more
+fine grained control is required, it can be applied through the subzones.
+
+
+Example sysfs interface tree:
+
+/sys/devices/virtual/powercap
+??? intel-rapl
+ ??? intel-rapl:0
+ ?   ??? constraint_0_name
+ ?   ??? constraint_0_power_limit_uw
+ ?   ??? constraint_0_time_window_us
+ ?   ??? constraint_1_name
+ ?   ??? constraint_1_power_limit_uw
+ ?   ??? constraint_1_time_window_us
+ ?   ??? device -> ../../intel-rapl
+ ?   ??? energy_uj
+ ?   ??? intel-rapl:0:0
+ ?   ?   ??? constraint_0_name
+ ?   ?   ??? constraint_0_power_limit_uw
+ ?   ?   ??? constraint_0_time_window_us
+ ?   ?   ??? constraint_1_name
+ ?   ?   ??? constraint_1_power_limit_uw
+ ?   ?   ??? constraint_1_time_window_us
+ ?   ?   ??? device -> ../../intel-rapl:0
+ ?   ?   ??? energy_uj
+ ?   ?   ??? max_energy_range_uj
+ ?   ?   ??? name
+ ?   ?   ??? enabled
+ ?   ?   ??? power
+ ?   ?   ?   ??? async
+ ?   ?   ?   []
+ ?   ?   ??? subsystem -> ../../../../../../class/power_cap
+ ?   ?   ??? uevent
+ ?   ??? intel-rapl:0:1
+ ?   ?   ??? constraint_0_name
+ ?   ?   ??? constraint_0_power_limit_uw
+ ?   ?   ??? constraint_0_time_window_us
+ ?   ?   ??? constraint_1_name
+ ?   ?   ??? constraint_1_power_limit_uw
+ ?   ?   ??? constraint_1_time_window_us
+ ?   ?   ??? device -> ../../intel-rapl:0
+ ?   ?   ??? energy_uj
+ ?   ?   ??? max_energy_range_uj
+ ?   ?   ??? name
+ ?   ?   ??? enabled
+ ?   ?   ??? power
+ ?   ?   ?   ??? async
+ ?   ?   ?   []
+ ?   ?   ??? subsystem -> ../../../../../../class/power_cap
+ ?   ?   ??? uevent
+ ?   ??? max_energy_range_uj
+ ?   ??? max_power_range_uw
+ ?   ??? name
+ ?   ??? enabled
+ ?   ??? power
+ ?   ?   ??? async
+ ?   ?   []
+ ?   ??? subsystem -> ../../../../../class/power_cap
+ ?   ??? enabled
+ ?   ??? uevent
+ ??? intel-rapl:1
+ ?   ??? constraint_0_name
+ ?   ??? constraint_0_power_limit_uw
+ ?   ??? constraint_0_time_window_us
+ ?   ??? constraint_1_name
+ ?   ??? constraint_1_power_limit_uw
+ ?   ??? constraint_1_time_window_us
+ ?   ??? device -> ../../intel-rapl
+ ?   ??? energy_uj
+ ?   ??? intel-rapl:1:0
+ ?   ?   ??? constraint_0_name
+ ?   ?   ??? constraint_0_power_limit_uw
+ ?   ?   ??? constraint_0_time_window_us
+ ?   ?   ??? constraint_1_name
+ ?   ?   ??? constraint_1_power_limit_uw
+ ?   ?   ??? constraint_1_time_window_us
+ ?   ?   ??? device -> ../../intel-rapl:1
+ ?   ?   ??? energy_uj
+ ?   ?   ??? max_energy_range_uj
+ ?   ?   ??? name
+ ?   ?   ??? enabled
+ ?   ?   ??? power
+ ?   ?   ?   ??? async
+ ?   ?   ?   []
+ ?   ?   ??? subsystem -> ../../../../../../class/power_cap
+ ?   ?   ??? uevent
+ ?   ??? intel-rapl:1:1
+ ?   ?   ??? constraint_0_name
+ ?   ?   ??? constraint_0_power_limit_uw
+ ?   ?   ??? constraint_0_time_window_us
+ ?   ?   ??? constraint_1_name
+ ?   ?   ??? constraint_1_power_limit_uw
+ ?   ?   ??? constraint_1_time_window_us
+ ?   ?   ??? device -> ../../intel-rapl:1
+ ?   ?   ??? energy_uj
+ ?   ?   ??? max_energy_range_uj
+ ?   ?   ??? name
+ ?   ?   ??? enabled
+ ?   ?   ??? power
+ ?   ?   ?   ??? async
+ ?   ?   ?   []
+ ?   ?   ??? subsystem -> ../../../../../../class/power_cap
+ ?   ?   ??? uevent
+ ?   ??? max_energy_range_uj
+ ?   ??? max_power_range_uw
+ ?   ??? name
+ ?   ??? enabled
+ ?   ??? power
+ ?   ?   ??? async
+ ?   ?   []
+ ?   ??? subsystem -> ../../../../../class/power_cap
+ ?   ??? uevent
+ ??? power
+ ?   ??? async
+ ?   []
+ ??? subsystem -> ../../../../class/power_cap
+ ??? enabled
+ ??? uevent
+
+The above example illustrates a case in which the Intel RAPL technology,
+available in Intel® IA-64 and IA-32 Processor Architectures, is used. There is one
+control type called intel-rapl which contains two power zones, intel-rapl:0 and
+intel-rapl:1, representing CPU packages. Each of these power zones contains
+two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the
+"core" and the "uncore" parts of the given CPU package, respectively. All of
+the zones and subzones contain energy monitoring attributes (energy_uj,
+max_energy_range_uj) and constraint attributes (constraint_*) allowing controls
+to be applied (the constraints in the 'package' power zones apply to the whole
+CPU packages and the subzone constraints only apply to the respective parts of
+the given package individually). Since Intel RAPL doesn't provide instantaneous
+power value, there is no power_uw attribute.
+
+In addition to that, each power zone contains a name attribute, allowing the
+part of the system represented by that zone to be identified.
+For example:
+
+cat /sys/class/power_cap/intel-rapl/intel-rapl:0/name
+package-0
+
+The Intel RAPL technology allows two constraints, short term and long term,
+with two different time windows to be applied to each power zone. Thus for
+each zone there are 2 attributes representing the constraint names, 2 power
+limits and 2 attributes representing the sizes of the time windows. Such that,
+constraint_j_* attributes correspond to the jth constraint (j = 0,1).
+
+For example:
+ constraint_0_name
+ constraint_0_power_limit_uw
+ constraint_0_time_window_us
+ constraint_1_name
+ constraint_1_power_limit_uw
+ constraint_1_time_window_us
+
+Power Zone Attributes
+=================================
+Monitoring attributes
+----------------------
+
+energy_uj (rw): Current energy counter in micro joules. Write "0" to reset.
+If the counter can not be reset, then this attribute is read only.
+
+max_energy_range_uj (ro): Range of the above energy counter in micro-joules.
+
+power_uw (ro): Current power in micro watts.
+
+max_power_range_uw (ro): Range of the above power value in micro-watts.
+
+name (ro): Name of this power zone.
+
+It is possible that some domains have both power ranges and energy counter ranges;
+however, only one is mandatory.
+
+Constraints
+----------------
+constraint_X_power_limit_uw (rw): Power limit in micro watts, which should be
+applicable for the time window specified by "constraint_X_time_window_us".
+
+constraint_X_time_window_us (rw): Time window in micro seconds.
+
+constraint_X_name (ro): An optional name of the constraint
+
+constraint_X_max_power_uw(ro): Maximum allowed power in micro watts.
+
+constraint_X_min_power_uw(ro): Minimum allowed power in micro watts.
+
+constraint_X_max_time_window_us(ro): Maximum allowed time window in micro seconds.
+
+constraint_X_min_time_window_us(ro): Minimum allowed time window in micro seconds.
+
+Except power_limit_uw and time_window_us other fields are optional.
+
+Common zone and control type attributes
+----------------------------------------
+enabled (rw): Enable/Disable controls at zone level or for all zones using
+a control type.
+
+Power Cap Client Driver Interface
+==================================
+The API summary:
+
+Call powercap_register_control_type() to register control type object.
+Call powercap_register_zone() to register a power zone (under a given
+control type), either as a top-level power zone or as a subzone of another
+power zone registered earlier.
+The number of constraints in a power zone and the corresponding callbacks have
+to be defined prior to calling powercap_register_zone() to register that zone.
+
+To Free a power zone call powercap_unregister_zone().
+To free a control type object call powercap_unregister_control_type().
+Detailed API can be generated using kernel-doc on include/linux/powercap.h.
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index 71d8fe4e75d3..0f54333b0ff2 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -145,11 +145,13 @@ The action performed by the idle callback is totally dependent on the subsystem
if the device can be suspended (i.e. if all of the conditions necessary for
suspending the device are satisfied) and to queue up a suspend request for the
device in that case. If there is no idle callback, or if the callback returns
-0, then the PM core will attempt to carry out a runtime suspend of the device;
-in essence, it will call pm_runtime_suspend() directly. To prevent this (for
-example, if the callback routine has started a delayed suspend), the routine
-should return a non-zero value. Negative error return codes are ignored by the
-PM core.
+0, then the PM core will attempt to carry out a runtime suspend of the device,
+also respecting devices configured for autosuspend. In essence this means a
+call to pm_runtime_autosuspend() (do note that drivers needs to update the
+device last busy mark, pm_runtime_mark_last_busy(), to control the delay under
+this circumstance). To prevent this (for example, if the callback routine has
+started a delayed suspend), the routine must return a non-zero value. Negative
+error return codes are ignored by the PM core.
The helper functions provided by the PM core, described in Section 4, guarantee
that the following constraints are met with respect to runtime PM callbacks for
@@ -308,7 +310,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
- execute the subsystem-level idle callback for the device; returns an
error code on failure, where -EINPROGRESS means that ->runtime_idle() is
already being executed; if there is no callback or the callback returns 0
- then run pm_runtime_suspend(dev) and return its result
+ then run pm_runtime_autosuspend(dev) and return its result
int pm_runtime_suspend(struct device *dev);
- execute the subsystem-level suspend callback for the device; returns 0 on
diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c
index f59ded066108..a74d0a84d329 100644
--- a/Documentation/ptp/testptp.c
+++ b/Documentation/ptp/testptp.c
@@ -100,6 +100,11 @@ static long ppb_to_scaled_ppm(int ppb)
return (long) (ppb * 65.536);
}
+static int64_t pctns(struct ptp_clock_time *t)
+{
+ return t->sec * 1000000000LL + t->nsec;
+}
+
static void usage(char *progname)
{
fprintf(stderr,
@@ -112,6 +117,8 @@ static void usage(char *progname)
" -f val adjust the ptp clock frequency by 'val' ppb\n"
" -g get the ptp clock time\n"
" -h prints this message\n"
+ " -k val measure the time offset between system and phc clock\n"
+ " for 'val' times (Maximum 25)\n"
" -p val enable output with a period of 'val' nanoseconds\n"
" -P val enable or disable (val=1|0) the system clock PPS\n"
" -s set the ptp clock time from the system time\n"
@@ -133,8 +140,12 @@ int main(int argc, char *argv[])
struct itimerspec timeout;
struct sigevent sigevent;
+ struct ptp_clock_time *pct;
+ struct ptp_sys_offset *sysoff;
+
+
char *progname;
- int c, cnt, fd;
+ int i, c, cnt, fd;
char *device = DEVICE;
clockid_t clkid;
@@ -144,14 +155,19 @@ int main(int argc, char *argv[])
int extts = 0;
int gettime = 0;
int oneshot = 0;
+ int pct_offset = 0;
+ int n_samples = 0;
int periodic = 0;
int perout = -1;
int pps = -1;
int settime = 0;
+ int64_t t1, t2, tp;
+ int64_t interval, offset;
+
progname = strrchr(argv[0], '/');
progname = progname ? 1+progname : argv[0];
- while (EOF != (c = getopt(argc, argv, "a:A:cd:e:f:ghp:P:sSt:v"))) {
+ while (EOF != (c = getopt(argc, argv, "a:A:cd:e:f:ghk:p:P:sSt:v"))) {
switch (c) {
case 'a':
oneshot = atoi(optarg);
@@ -174,6 +190,10 @@ int main(int argc, char *argv[])
case 'g':
gettime = 1;
break;
+ case 'k':
+ pct_offset = 1;
+ n_samples = atoi(optarg);
+ break;
case 'p':
perout = atoi(optarg);
break;
@@ -376,6 +396,47 @@ int main(int argc, char *argv[])
}
}
+ if (pct_offset) {
+ if (n_samples <= 0 || n_samples > 25) {
+ puts("n_samples should be between 1 and 25");
+ usage(progname);
+ return -1;
+ }
+
+ sysoff = calloc(1, sizeof(*sysoff));
+ if (!sysoff) {
+ perror("calloc");
+ return -1;
+ }
+ sysoff->n_samples = n_samples;
+
+ if (ioctl(fd, PTP_SYS_OFFSET, sysoff))
+ perror("PTP_SYS_OFFSET");
+ else
+ puts("system and phc clock time offset request okay");
+
+ pct = &sysoff->ts[0];
+ for (i = 0; i < sysoff->n_samples; i++) {
+ t1 = pctns(pct+2*i);
+ tp = pctns(pct+2*i+1);
+ t2 = pctns(pct+2*i+2);
+ interval = t2 - t1;
+ offset = (t2 + t1) / 2 - tp;
+
+ printf("system time: %ld.%ld\n",
+ (pct+2*i)->sec, (pct+2*i)->nsec);
+ printf("phc time: %ld.%ld\n",
+ (pct+2*i+1)->sec, (pct+2*i+1)->nsec);
+ printf("system time: %ld.%ld\n",
+ (pct+2*i+2)->sec, (pct+2*i+2)->nsec);
+ printf("system/phc clock time offset is %ld ns\n"
+ "system clock time delay is %ld ns\n",
+ offset, interval);
+ }
+
+ free(sysoff);
+ }
+
close(fd);
return 0;
}
diff --git a/Documentation/pwm.txt b/Documentation/pwm.txt
index 1039b68fe9c6..93cb97974986 100644
--- a/Documentation/pwm.txt
+++ b/Documentation/pwm.txt
@@ -39,7 +39,7 @@ New users should use the pwm_get() function and pass to it the consumer
device or a consumer name. pwm_put() is used to free the PWM device. Managed
variants of these functions, devm_pwm_get() and devm_pwm_put(), also exist.
-After being requested a PWM has to be configured using:
+After being requested, a PWM has to be configured using:
int pwm_config(struct pwm_device *pwm, int duty_ns, int period_ns);
@@ -94,7 +94,7 @@ for new drivers to use the generic PWM framework.
A new PWM controller/chip can be added using pwmchip_add() and removed
again with pwmchip_remove(). pwmchip_add() takes a filled in struct
pwm_chip as argument which provides a description of the PWM chip, the
-number of PWM devices provider by the chip and the chip-specific
+number of PWM devices provided by the chip and the chip-specific
implementation of the supported PWM operations to the framework.
Locking
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 4273b2d71a27..26b7ee491df8 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -290,13 +290,24 @@ Default value is "/sbin/hotplug".
kptr_restrict:
This toggle indicates whether restrictions are placed on
-exposing kernel addresses via /proc and other interfaces. When
-kptr_restrict is set to (0), there are no restrictions. When
-kptr_restrict is set to (1), the default, kernel pointers
-printed using the %pK format specifier will be replaced with 0's
-unless the user has CAP_SYSLOG. When kptr_restrict is set to
-(2), kernel pointers printed using %pK will be replaced with 0's
-regardless of privileges.
+exposing kernel addresses via /proc and other interfaces.
+
+When kptr_restrict is set to (0), the default, there are no restrictions.
+
+When kptr_restrict is set to (1), kernel pointers printed using the %pK
+format specifier will be replaced with 0's unless the user has CAP_SYSLOG
+and effective user and group ids are equal to the real ids. This is
+because %pK checks are done at read() time rather than open() time, so
+if permissions are elevated between the open() and the read() (e.g via
+a setuid binary) then %pK will not leak kernel pointers to unprivileged
+users. Note, this is a temporary solution only. The correct long-term
+solution is to do the permission checks at open() time. Consider removing
+world read permissions from files that use %pK, and using dmesg_restrict
+to protect against uses of %pK in dmesg(8) if leaking kernel pointer
+values to unprivileged users is a concern.
+
+When kptr_restrict is set to (2), kernel pointers printed using
+%pK will be replaced with 0's regardless of privileges.
==============================================================
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 79a797eb3e87..1fbd4eb7b64a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -119,8 +119,11 @@ other appears as 0 when read.
dirty_background_ratio
-Contains, as a percentage of total system memory, the number of pages at which
-the background kernel flusher threads will start writing out dirty data.
+Contains, as a percentage of total available memory that contains free pages
+and reclaimable pages, the number of pages at which the background kernel
+flusher threads will start writing out dirty data.
+
+The total avaiable memory is not equal to total system memory.
==============================================================
@@ -151,9 +154,11 @@ interval will be written out next time a flusher thread wakes up.
dirty_ratio
-Contains, as a percentage of total system memory, the number of pages at which
-a process which is generating disk writes will itself start writing out dirty
-data.
+Contains, as a percentage of total available memory that contains free pages
+and reclaimable pages, the number of pages at which a process which is
+generating disk writes will itself start writing out dirty data.
+
+The total avaiable memory is not equal to total system memory.
==============================================================
diff --git a/Documentation/timers/00-INDEX b/Documentation/timers/00-INDEX
index a9248da5cdbc..ef2ccbf77fa2 100644
--- a/Documentation/timers/00-INDEX
+++ b/Documentation/timers/00-INDEX
@@ -8,5 +8,9 @@ hpet_example.c
- sample hpet timer test program
hrtimers.txt
- subsystem for high-resolution kernel timers
+NO_HZ.txt
+ - Summary of the different methods for the scheduler clock-interrupts management.
+timers-howto.txt
+ - how to insert delays in the kernel the right (tm) way.
timer_stats.txt
- timer usage statistics
diff --git a/Documentation/trace/tracepoints.txt b/Documentation/trace/tracepoints.txt
index ac4170dd0f24..6b018b53177a 100644
--- a/Documentation/trace/tracepoints.txt
+++ b/Documentation/trace/tracepoints.txt
@@ -114,3 +114,8 @@ core kernel image or in modules.
If the tracepoint has to be used in kernel modules, an
EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be
used to export the defined tracepoints.
+
+Note: The convenience macro TRACE_EVENT provides an alternative way to
+ define tracepoints. Check http://lwn.net/Articles/379903,
+ http://lwn.net/Articles/381064 and http://lwn.net/Articles/383362
+ for a series of articles with more details.
diff --git a/Documentation/usb/gadget_configfs.txt b/Documentation/usb/gadget_configfs.txt
index 8ec2a67c39b7..4cf53e406613 100644
--- a/Documentation/usb/gadget_configfs.txt
+++ b/Documentation/usb/gadget_configfs.txt
@@ -26,7 +26,7 @@ Linux provides a number of functions for gadgets to use.
Creating a gadget means deciding what configurations there will be
and which functions each configuration will provide.
-Configfs (please see Documentation/filesystems/configfs/*) lends itslef nicely
+Configfs (please see Documentation/filesystems/configfs/*) lends itself nicely
for the purpose of telling the kernel about the above mentioned decision.
This document is about how to do it.
@@ -99,7 +99,7 @@ directories must be created:
$ mkdir configs/<name>.<number>
where <name> can be any string which is legal in a filesystem and the
-<numebr> is the configuration's number, e.g.:
+<number> is the configuration's number, e.g.:
$ mkdir configs/c.1
@@ -327,7 +327,7 @@ from the buffer to the cs), but it is up to the implementer of the
two functions to decide what they actually do.
typedef struct configured_structure cs;
-typedef struc specific_attribute sa;
+typedef struct specific_attribute sa;
sa
+----------------------------------+
diff --git a/Documentation/virtual/kvm/00-INDEX b/Documentation/virtual/kvm/00-INDEX
new file mode 100644
index 000000000000..641ec9220179
--- /dev/null
+++ b/Documentation/virtual/kvm/00-INDEX
@@ -0,0 +1,24 @@
+00-INDEX
+ - this file.
+api.txt
+ - KVM userspace API.
+cpuid.txt
+ - KVM-specific cpuid leaves (x86).
+devices/
+ - KVM_CAP_DEVICE_CTRL userspace API.
+hypercalls.txt
+ - KVM hypercalls.
+locking.txt
+ - notes on KVM locks.
+mmu.txt
+ - the x86 kvm shadow mmu.
+msr.txt
+ - KVM-specific MSRs (x86).
+nested-vmx.txt
+ - notes on nested virtualization for Intel x86 processors.
+ppc-pv.txt
+ - the paravirtualization interface on PowerPC.
+review-checklist.txt
+ - review checklist for KVM patches.
+timekeeping.txt
+ - timekeeping virtualization for x86-based architectures.
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 858aecf21db2..a30035dd4c26 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1122,9 +1122,9 @@ struct kvm_cpuid2 {
struct kvm_cpuid_entry2 entries[0];
};
-#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX 1
-#define KVM_CPUID_FLAG_STATEFUL_FUNC 2
-#define KVM_CPUID_FLAG_STATE_READ_NEXT 4
+#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
+#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
+#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
struct kvm_cpuid_entry2 {
__u32 function;
@@ -1810,6 +1810,50 @@ registers, find a list below:
PPC | KVM_REG_PPC_TLB3PS | 32
PPC | KVM_REG_PPC_EPTCFG | 32
PPC | KVM_REG_PPC_ICP_STATE | 64
+ PPC | KVM_REG_PPC_TB_OFFSET | 64
+ PPC | KVM_REG_PPC_SPMC1 | 32
+ PPC | KVM_REG_PPC_SPMC2 | 32
+ PPC | KVM_REG_PPC_IAMR | 64
+ PPC | KVM_REG_PPC_TFHAR | 64
+ PPC | KVM_REG_PPC_TFIAR | 64
+ PPC | KVM_REG_PPC_TEXASR | 64
+ PPC | KVM_REG_PPC_FSCR | 64
+ PPC | KVM_REG_PPC_PSPB | 32
+ PPC | KVM_REG_PPC_EBBHR | 64
+ PPC | KVM_REG_PPC_EBBRR | 64
+ PPC | KVM_REG_PPC_BESCR | 64
+ PPC | KVM_REG_PPC_TAR | 64
+ PPC | KVM_REG_PPC_DPDES | 64
+ PPC | KVM_REG_PPC_DAWR | 64
+ PPC | KVM_REG_PPC_DAWRX | 64
+ PPC | KVM_REG_PPC_CIABR | 64
+ PPC | KVM_REG_PPC_IC | 64
+ PPC | KVM_REG_PPC_VTB | 64
+ PPC | KVM_REG_PPC_CSIGR | 64
+ PPC | KVM_REG_PPC_TACR | 64
+ PPC | KVM_REG_PPC_TCSCR | 64
+ PPC | KVM_REG_PPC_PID | 64
+ PPC | KVM_REG_PPC_ACOP | 64
+ PPC | KVM_REG_PPC_VRSAVE | 32
+ PPC | KVM_REG_PPC_LPCR | 64
+ PPC | KVM_REG_PPC_PPR | 64
+ PPC | KVM_REG_PPC_ARCH_COMPAT 32
+ PPC | KVM_REG_PPC_TM_GPR0 | 64
+ ...
+ PPC | KVM_REG_PPC_TM_GPR31 | 64
+ PPC | KVM_REG_PPC_TM_VSR0 | 128
+ ...
+ PPC | KVM_REG_PPC_TM_VSR63 | 128
+ PPC | KVM_REG_PPC_TM_CR | 64
+ PPC | KVM_REG_PPC_TM_LR | 64
+ PPC | KVM_REG_PPC_TM_CTR | 64
+ PPC | KVM_REG_PPC_TM_FPSCR | 64
+ PPC | KVM_REG_PPC_TM_AMR | 64
+ PPC | KVM_REG_PPC_TM_PPR | 64
+ PPC | KVM_REG_PPC_TM_VRSAVE | 64
+ PPC | KVM_REG_PPC_TM_VSCR | 32
+ PPC | KVM_REG_PPC_TM_DSCR | 64
+ PPC | KVM_REG_PPC_TM_TAR | 64
ARM registers are mapped using the lower 32 bits. The upper 16 of that
is the register group type, or coprocessor number:
@@ -2304,7 +2348,31 @@ Possible features:
Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
-4.83 KVM_GET_REG_LIST
+4.83 KVM_ARM_PREFERRED_TARGET
+
+Capability: basic
+Architectures: arm, arm64
+Type: vm ioctl
+Parameters: struct struct kvm_vcpu_init (out)
+Returns: 0 on success; -1 on error
+Errors:
+ ENODEV: no preferred target available for the host
+
+This queries KVM for preferred CPU target type which can be emulated
+by KVM on underlying host.
+
+The ioctl returns struct kvm_vcpu_init instance containing information
+about preferred CPU target type and recommended features for it. The
+kvm_vcpu_init->features bitmap returned will have feature bits set if
+the preferred target recommends setting these features, but this is
+not mandatory.
+
+The information returned by this ioctl can be used to prepare an instance
+of struct kvm_vcpu_init for KVM_ARM_VCPU_INIT ioctl which will result in
+in VCPU matching underlying host.
+
+
+4.84 KVM_GET_REG_LIST
Capability: basic
Architectures: arm, arm64
@@ -2323,8 +2391,7 @@ struct kvm_reg_list {
This ioctl returns the guest registers that are supported for the
KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
-
-4.84 KVM_ARM_SET_DEVICE_ADDR
+4.85 KVM_ARM_SET_DEVICE_ADDR
Capability: KVM_CAP_ARM_SET_DEVICE_ADDR
Architectures: arm, arm64
@@ -2362,7 +2429,7 @@ must be called after calling KVM_CREATE_IRQCHIP, but before calling
KVM_RUN on any of the VCPUs. Calling this ioctl twice for any of the
base addresses will return -EEXIST.
-4.85 KVM_PPC_RTAS_DEFINE_TOKEN
+4.86 KVM_PPC_RTAS_DEFINE_TOKEN
Capability: KVM_CAP_PPC_RTAS
Architectures: ppc
@@ -2661,6 +2728,77 @@ and usually define the validity of a groups of registers. (e.g. one bit
};
+4.81 KVM_GET_EMULATED_CPUID
+
+Capability: KVM_CAP_EXT_EMUL_CPUID
+Architectures: x86
+Type: system ioctl
+Parameters: struct kvm_cpuid2 (in/out)
+Returns: 0 on success, -1 on error
+
+struct kvm_cpuid2 {
+ __u32 nent;
+ __u32 flags;
+ struct kvm_cpuid_entry2 entries[0];
+};
+
+The member 'flags' is used for passing flags from userspace.
+
+#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
+#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
+#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
+
+struct kvm_cpuid_entry2 {
+ __u32 function;
+ __u32 index;
+ __u32 flags;
+ __u32 eax;
+ __u32 ebx;
+ __u32 ecx;
+ __u32 edx;
+ __u32 padding[3];
+};
+
+This ioctl returns x86 cpuid features which are emulated by
+kvm.Userspace can use the information returned by this ioctl to query
+which features are emulated by kvm instead of being present natively.
+
+Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2
+structure with the 'nent' field indicating the number of entries in
+the variable-size array 'entries'. If the number of entries is too low
+to describe the cpu capabilities, an error (E2BIG) is returned. If the
+number is too high, the 'nent' field is adjusted and an error (ENOMEM)
+is returned. If the number is just right, the 'nent' field is adjusted
+to the number of valid entries in the 'entries' array, which is then
+filled.
+
+The entries returned are the set CPUID bits of the respective features
+which kvm emulates, as returned by the CPUID instruction, with unknown
+or unsupported feature bits cleared.
+
+Features like x2apic, for example, may not be present in the host cpu
+but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be
+emulated efficiently and thus not included here.
+
+The fields in each entry are defined as follows:
+
+ function: the eax value used to obtain the entry
+ index: the ecx value used to obtain the entry (for entries that are
+ affected by ecx)
+ flags: an OR of zero or more of the following:
+ KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
+ if the index field is valid
+ KVM_CPUID_FLAG_STATEFUL_FUNC:
+ if cpuid for this function returns different values for successive
+ invocations; there will be several entries with the same function,
+ all with this flag set
+ KVM_CPUID_FLAG_STATE_READ_NEXT:
+ for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
+ the first entry to be read by a cpu
+ eax, ebx, ecx, edx: the values returned by the cpuid instruction for
+ this function/index combination
+
+
6. Capabilities that can be enabled
-----------------------------------
diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 22ff659bc0fb..3c65feb83010 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -43,6 +43,13 @@ KVM_FEATURE_CLOCKSOURCE2 || 3 || kvmclock available at msrs
KVM_FEATURE_ASYNC_PF || 4 || async pf can be enabled by
|| || writing to msr 0x4b564d02
------------------------------------------------------------------------------
+KVM_FEATURE_STEAL_TIME || 5 || steal time can be enabled by
+ || || writing to msr 0x4b564d03.
+------------------------------------------------------------------------------
+KVM_FEATURE_PV_EOI || 6 || paravirtualized end of interrupt
+ || || handler can be enabled by writing
+ || || to msr 0x4b564d04.
+------------------------------------------------------------------------------
KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit
|| || before enabling paravirtualized
|| || spinlock support.
diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
new file mode 100644
index 000000000000..ef51740c67ca
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -0,0 +1,22 @@
+VFIO virtual device
+===================
+
+Device types supported:
+ KVM_DEV_TYPE_VFIO
+
+Only one VFIO instance may be created per VM. The created device
+tracks VFIO groups in use by the VM and features of those groups
+important to the correctness and acceleration of the VM. As groups
+are enabled and disabled for use by the VM, KVM should be updated
+about their presence. When registered with KVM, a reference to the
+VFIO-group is held by KVM.
+
+Groups:
+ KVM_DEV_VFIO_GROUP
+
+KVM_DEV_VFIO_GROUP attributes:
+ KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
+ KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
+
+For each, kvm_device_attr.addr points to an int32_t file descriptor
+for the VFIO group.
diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
index 41b7ac9884b5..f8869410d40c 100644
--- a/Documentation/virtual/kvm/locking.txt
+++ b/Documentation/virtual/kvm/locking.txt
@@ -132,10 +132,14 @@ See the comments in spte_has_volatile_bits() and mmu_spte_update().
------------
Name: kvm_lock
-Type: raw_spinlock
+Type: spinlock_t
Arch: any
Protects: - vm_list
- - hardware virtualization enable/disable
+
+Name: kvm_count_lock
+Type: raw_spinlock_t
+Arch: any
+Protects: - hardware virtualization enable/disable
Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
migration.
@@ -151,3 +155,14 @@ Type: spinlock_t
Arch: any
Protects: -shadow page/shadow tlb entry
Comment: it is a spinlock since it is used in mmu notifier.
+
+Name: kvm->srcu
+Type: srcu lock
+Arch: any
+Protects: - kvm->memslots
+ - kvm->buses
+Comment: The srcu read lock must be held while accessing memslots (e.g.
+ when using gfn_to_* functions) and while accessing in-kernel
+ MMIO/PIO address->device structure mapping (kvm->buses).
+ The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
+ if it is needed by multiple functions.
diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX
index 5481c8ba3412..a39d06680e1c 100644
--- a/Documentation/vm/00-INDEX
+++ b/Documentation/vm/00-INDEX
@@ -4,10 +4,12 @@ active_mm.txt
- An explanation from Linus about tsk->active_mm vs tsk->mm.
balance
- various information on memory balancing.
-hugepage-mmap.c
- - Example app using huge page memory with the mmap system call.
-hugepage-shm.c
- - Example app using huge page memory with Sys V shared memory system calls.
+cleancache.txt
+ - Intro to cleancache and page-granularity victim cache.
+frontswap.txt
+ - Outline frontswap, part of the transcendent memory frontend.
+highmem.txt
+ - Outline of highmem and common issues.
hugetlbpage.txt
- a brief summary of hugetlbpage support in the Linux kernel.
hwpoison.txt
@@ -16,21 +18,23 @@ ksm.txt
- how to use the Kernel Samepage Merging feature.
locking
- info on how locking and synchronization is done in the Linux vm code.
-map_hugetlb.c
- - an example program that uses the MAP_HUGETLB mmap flag.
numa
- information about NUMA specific code in the Linux vm.
numa_memory_policy.txt
- documentation of concepts and APIs of the 2.6 memory policy support.
overcommit-accounting
- description of the Linux kernels overcommit handling modes.
-page-types.c
- - Tool for querying page flags
page_migration
- description of page migration in NUMA systems.
pagemap.txt
- pagemap, from the userspace perspective
slub.txt
- a short users guide for SLUB.
+soft-dirty.txt
+ - short explanation for soft-dirty PTEs
+transhuge.txt
+ - Transparent Hugepage Support, alternative way of using hugepages.
unevictable-lru.txt
- Unevictable LRU infrastructure
+zswap.txt
+ - Intro to compressed cache for swap pages
diff --git a/Documentation/vm/split_page_table_lock b/Documentation/vm/split_page_table_lock
new file mode 100644
index 000000000000..7521d367f21d
--- /dev/null
+++ b/Documentation/vm/split_page_table_lock
@@ -0,0 +1,94 @@
+Split page table lock
+=====================
+
+Originally, mm->page_table_lock spinlock protected all page tables of the
+mm_struct. But this approach leads to poor page fault scalability of
+multi-threaded applications due high contention on the lock. To improve
+scalability, split page table lock was introduced.
+
+With split page table lock we have separate per-table lock to serialize
+access to the table. At the moment we use split lock for PTE and PMD
+tables. Access to higher level tables protected by mm->page_table_lock.
+
+There are helpers to lock/unlock a table and other accessor functions:
+ - pte_offset_map_lock()
+ maps pte and takes PTE table lock, returns pointer to the taken
+ lock;
+ - pte_unmap_unlock()
+ unlocks and unmaps PTE table;
+ - pte_alloc_map_lock()
+ allocates PTE table if needed and take the lock, returns pointer
+ to taken lock or NULL if allocation failed;
+ - pte_lockptr()
+ returns pointer to PTE table lock;
+ - pmd_lock()
+ takes PMD table lock, returns pointer to taken lock;
+ - pmd_lockptr()
+ returns pointer to PMD table lock;
+
+Split page table lock for PTE tables is enabled compile-time if
+CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less or equal to NR_CPUS.
+If split lock is disabled, all tables guaded by mm->page_table_lock.
+
+Split page table lock for PMD tables is enabled, if it's enabled for PTE
+tables and the architecture supports it (see below).
+
+Hugetlb and split page table lock
+---------------------------------
+
+Hugetlb can support several page sizes. We use split lock only for PMD
+level, but not for PUD.
+
+Hugetlb-specific helpers:
+ - huge_pte_lock()
+ takes pmd split lock for PMD_SIZE page, mm->page_table_lock
+ otherwise;
+ - huge_pte_lockptr()
+ returns pointer to table lock;
+
+Support of split page table lock by an architecture
+---------------------------------------------------
+
+There's no need in special enabling of PTE split page table lock:
+everything required is done by pgtable_page_ctor() and pgtable_page_dtor(),
+which must be called on PTE table allocation / freeing.
+
+Make sure the architecture doesn't use slab allocator for page table
+allocation: slab uses page->slab_cache and page->first_page for its pages.
+These fields share storage with page->ptl.
+
+PMD split lock only makes sense if you have more than two page table
+levels.
+
+PMD split lock enabling requires pgtable_pmd_page_ctor() call on PMD table
+allocation and pgtable_pmd_page_dtor() on freeing.
+
+Allocation usually happens in pmd_alloc_one(), freeing in pmd_free(), but
+make sure you cover all PMD table allocation / freeing paths: i.e X86_PAE
+preallocate few PMDs on pgd_alloc().
+
+With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.
+
+NOTE: pgtable_page_ctor() and pgtable_pmd_page_ctor() can fail -- it must
+be handled properly.
+
+page->ptl
+---------
+
+page->ptl is used to access split page table lock, where 'page' is struct
+page of page containing the table. It shares storage with page->private
+(and few other fields in union).
+
+To avoid increasing size of struct page and have best performance, we use a
+trick:
+ - if spinlock_t fits into long, we use page->ptr as spinlock, so we
+ can avoid indirect access and save a cache line.
+ - if size of spinlock_t is bigger then size of long, we use page->ptl as
+ pointer to spinlock_t and allocate it dynamically. This allows to use
+ split lock with enabled DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC, but costs
+ one more cache line for indirect access;
+
+The spinlock_t allocated in pgtable_page_ctor() for PTE table and in
+pgtable_pmd_page_ctor() for PMD table.
+
+Please, never access page->ptl directly -- use appropriate helper.
diff --git a/Documentation/vm/zswap.txt b/Documentation/vm/zswap.txt
index 7e492d8aaeaf..00c3d31e7971 100644
--- a/Documentation/vm/zswap.txt
+++ b/Documentation/vm/zswap.txt
@@ -8,7 +8,7 @@ significant performance improvement if reads from the compressed cache are
faster than reads from a swap device.
NOTE: Zswap is a new feature as of v3.11 and interacts heavily with memory
-reclaim. This interaction has not be fully explored on the large set of
+reclaim. This interaction has not been fully explored on the large set of
potential configurations and workloads that exist. For this reason, zswap
is a work in progress and should be considered experimental.
@@ -23,7 +23,7 @@ Some potential benefits:
    drastically reducing life-shortening writes.
Zswap evicts pages from compressed cache on an LRU basis to the backing swap
-device when the compressed pool reaches it size limit. This requirement had
+device when the compressed pool reaches its size limit. This requirement had
been identified in prior community discussions.
To enabled zswap, the "enabled" attribute must be set to 1 at boot time. e.g.
@@ -37,7 +37,7 @@ the backing swap device in the case that the compressed pool is full.
Zswap makes use of zbud for the managing the compressed memory pool. Each
allocation in zbud is not directly accessible by address. Rather, a handle is
-return by the allocation routine and that handle must be mapped before being
+returned by the allocation routine and that handle must be mapped before being
accessed. The compressed memory pool grows on demand and shrinks as compressed
pages are freed. The pool is not preallocated.
@@ -56,7 +56,7 @@ in the swap_map goes to 0) the swap code calls the zswap invalidate function,
via frontswap, to free the compressed entry.
Zswap seeks to be simple in its policies. Sysfs attributes allow for one user
-controlled policies:
+controlled policy:
* max_pool_percent - The maximum percentage of memory that the compressed
pool can occupy.