diff options
Diffstat (limited to 'Documentation')
35 files changed, 1202 insertions, 578 deletions
diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt index 3c4e07123e59..d568bc235bc0 100644 --- a/Documentation/DMA-API-HOWTO.txt +++ b/Documentation/DMA-API-HOWTO.txt @@ -738,17 +738,17 @@ to "Closing". CONFIG_NEED_SG_DMA_LENGTH if the architecture supports IOMMUs (including software IOMMU). -2) ARCH_KMALLOC_MINALIGN +2) ARCH_DMA_MINALIGN Architectures must ensure that kmalloc'ed buffer is DMA-safe. Drivers and subsystems depend on it. If an architecture isn't fully DMA-coherent (i.e. hardware doesn't ensure that data in the CPU cache is identical to data in main memory), - ARCH_KMALLOC_MINALIGN must be set so that the memory allocator + ARCH_DMA_MINALIGN must be set so that the memory allocator makes sure that kmalloc'ed buffer doesn't share a cache line with the others. See arch/arm/include/asm/cache.h as an example. - Note that ARCH_KMALLOC_MINALIGN is about DMA memory alignment + Note that ARCH_DMA_MINALIGN is about DMA memory alignment constraints. You don't need to worry about the architecture data alignment constraints (e.g. the alignment constraints about 64-bit objects). diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl index ecd35e9d4410..feca0758391e 100644 --- a/Documentation/DocBook/device-drivers.tmpl +++ b/Documentation/DocBook/device-drivers.tmpl @@ -46,7 +46,6 @@ <sect1><title>Atomic and pointer manipulation</title> !Iarch/x86/include/asm/atomic.h -!Iarch/x86/include/asm/unaligned.h </sect1> <sect1><title>Delaying, scheduling, and timer routines</title> diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl index a20c6f6fffc3..6899f471fb15 100644 --- a/Documentation/DocBook/kernel-api.tmpl +++ b/Documentation/DocBook/kernel-api.tmpl @@ -57,7 +57,6 @@ </para> <sect1><title>String Conversions</title> -!Ilib/vsprintf.c !Elib/vsprintf.c </sect1> <sect1><title>String Manipulation</title> diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl index 084f6ad7b7a0..a0d479d1e1dd 100644 --- a/Documentation/DocBook/kernel-locking.tmpl +++ b/Documentation/DocBook/kernel-locking.tmpl @@ -1922,9 +1922,12 @@ machines due to caching. <function>mutex_lock()</function> </para> <para> - There is a <function>mutex_trylock()</function> which can be - used inside interrupt context, as it will not sleep. + There is a <function>mutex_trylock()</function> which does not + sleep. Still, it must not be used inside interrupt context since + its implementation is not safe for that. <function>mutex_unlock()</function> will also never sleep. + It cannot be used in interrupt context either since a mutex + must be released by the same task that acquired it. </para> </listitem> </itemizedlist> @@ -1958,6 +1961,12 @@ machines due to caching. </sect1> </chapter> + <chapter id="apiref"> + <title>Mutex API reference</title> +!Iinclude/linux/mutex.h +!Ekernel/mutex.c + </chapter> + <chapter id="references"> <title>Further reading</title> diff --git a/Documentation/DocBook/tracepoint.tmpl b/Documentation/DocBook/tracepoint.tmpl index e8473eae2a20..b57a9ede3224 100644 --- a/Documentation/DocBook/tracepoint.tmpl +++ b/Documentation/DocBook/tracepoint.tmpl @@ -104,4 +104,9 @@ <title>Block IO</title> !Iinclude/trace/events/block.h </chapter> + + <chapter id="workqueue"> + <title>Workqueue</title> +!Iinclude/trace/events/workqueue.h + </chapter> </book> diff --git a/Documentation/acpi/method-customizing.txt b/Documentation/acpi/method-customizing.txt index e628cd23ca80..3e1d25aee3fb 100644 --- a/Documentation/acpi/method-customizing.txt +++ b/Documentation/acpi/method-customizing.txt @@ -19,6 +19,8 @@ Note: Only ACPI METHOD can be overridden, any other object types like "Device", "OperationRegion", are not recognized. Note: The same ACPI control method can be overridden for many times, and it's always the latest one that used by Linux/kernel. +Note: To get the ACPI debug object output (Store (AAAA, Debug)), + please run "echo 1 > /sys/module/acpi/parameters/aml_debug_output". 1. override an existing method a) get the ACPI table via ACPI sysfs I/F. e.g. to get the DSDT, diff --git a/Documentation/block/cfq-iosched.txt b/Documentation/block/cfq-iosched.txt new file mode 100644 index 000000000000..e578feed6d81 --- /dev/null +++ b/Documentation/block/cfq-iosched.txt @@ -0,0 +1,45 @@ +CFQ ioscheduler tunables +======================== + +slice_idle +---------- +This specifies how long CFQ should idle for next request on certain cfq queues +(for sequential workloads) and service trees (for random workloads) before +queue is expired and CFQ selects next queue to dispatch from. + +By default slice_idle is a non-zero value. That means by default we idle on +queues/service trees. This can be very helpful on highly seeky media like +single spindle SATA/SAS disks where we can cut down on overall number of +seeks and see improved throughput. + +Setting slice_idle to 0 will remove all the idling on queues/service tree +level and one should see an overall improved throughput on faster storage +devices like multiple SATA/SAS disks in hardware RAID configuration. The down +side is that isolation provided from WRITES also goes down and notion of +IO priority becomes weaker. + +So depending on storage and workload, it might be useful to set slice_idle=0. +In general I think for SATA/SAS disks and software RAID of SATA/SAS disks +keeping slice_idle enabled should be useful. For any configurations where +there are multiple spindles behind single LUN (Host based hardware RAID +controller or for storage arrays), setting slice_idle=0 might end up in better +throughput and acceptable latencies. + +CFQ IOPS Mode for group scheduling +=================================== +Basic CFQ design is to provide priority based time slices. Higher priority +process gets bigger time slice and lower priority process gets smaller time +slice. Measuring time becomes harder if storage is fast and supports NCQ and +it would be better to dispatch multiple requests from multiple cfq queues in +request queue at a time. In such scenario, it is not possible to measure time +consumed by single queue accurately. + +What is possible though is to measure number of requests dispatched from a +single queue and also allow dispatch from multiple cfq queue at the same time. +This effectively becomes the fairness in terms of IOPS (IO operations per +second). + +If one sets slice_idle=0 and if storage supports NCQ, CFQ internally switches +to IOPS mode and starts providing fairness in terms of number of requests +dispatched. Note that this mode switching takes effect only for group +scheduling. For non-cgroup users nothing should change. diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt index 48e0b21b0059..6919d62591d9 100644 --- a/Documentation/cgroups/blkio-controller.txt +++ b/Documentation/cgroups/blkio-controller.txt @@ -217,6 +217,7 @@ Details of cgroup files CFQ sysfs tunable ================= /sys/block/<disk>/queue/iosched/group_isolation +----------------------------------------------- If group_isolation=1, it provides stronger isolation between groups at the expense of throughput. By default group_isolation is 0. In general that @@ -243,6 +244,33 @@ By default one should run with group_isolation=0. If that is not sufficient and one wants stronger isolation between groups, then set group_isolation=1 but this will come at cost of reduced throughput. +/sys/block/<disk>/queue/iosched/slice_idle +------------------------------------------ +On a faster hardware CFQ can be slow, especially with sequential workload. +This happens because CFQ idles on a single queue and single queue might not +drive deeper request queue depths to keep the storage busy. In such scenarios +one can try setting slice_idle=0 and that would switch CFQ to IOPS +(IO operations per second) mode on NCQ supporting hardware. + +That means CFQ will not idle between cfq queues of a cfq group and hence be +able to driver higher queue depth and achieve better throughput. That also +means that cfq provides fairness among groups in terms of IOPS and not in +terms of disk time. + +/sys/block/<disk>/queue/iosched/group_idle +------------------------------------------ +If one disables idling on individual cfq queues and cfq service trees by +setting slice_idle=0, group_idle kicks in. That means CFQ will still idle +on the group in an attempt to provide fairness among groups. + +By default group_idle is same as slice_idle and does not do anything if +slice_idle is enabled. + +One can experience an overall throughput drop if you have created multiple +groups and put applications in that group which are not driving enough +IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle +on individual groups and throughput should improve. + What works ========== - Currently only sync IO queues are support. All the buffered writes are diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 842aa9de84a6..5e2bc4ab897a 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -386,34 +386,6 @@ Who: Tejun Heo <tj@kernel.org> ---------------------------- -What: Support for VMware's guest paravirtuliazation technique [VMI] will be - dropped. -When: 2.6.37 or earlier. -Why: With the recent innovations in CPU hardware acceleration technologies - from Intel and AMD, VMware ran a few experiments to compare these - techniques to guest paravirtualization technique on VMware's platform. - These hardware assisted virtualization techniques have outperformed the - performance benefits provided by VMI in most of the workloads. VMware - expects that these hardware features will be ubiquitous in a couple of - years, as a result, VMware has started a phased retirement of this - feature from the hypervisor. We will be removing this feature from the - Kernel too. Right now we are targeting 2.6.37 but can retire earlier if - technical reasons (read opportunity to remove major chunk of pvops) - arise. - - Please note that VMI has always been an optimization and non-VMI kernels - still work fine on VMware's platform. - Latest versions of VMware's product which support VMI are, - Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence - releases for these products will continue supporting VMI. - - For more details about VMI retirement take a look at this, - http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html - -Who: Alok N Kataria <akataria@vmware.com> - ----------------------------- - What: Support for lcd_switch and display_get in asus-laptop driver When: March 2010 Why: These two features use non-standard interfaces. There are the diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index bbcc15651a21..2db4283efa8d 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -374,8 +374,6 @@ prototypes: ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t); int (*readdir) (struct file *, void *, filldir_t); unsigned int (*poll) (struct file *, struct poll_table_struct *); - int (*ioctl) (struct inode *, struct file *, unsigned int, - unsigned long); long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); long (*compat_ioctl) (struct file *, unsigned int, unsigned long); int (*mmap) (struct file *, struct vm_area_struct *); @@ -409,8 +407,7 @@ write: no aio_write: no readdir: no poll: no -ioctl: yes (see below) -unlocked_ioctl: no (see below) +unlocked_ioctl: no compat_ioctl: no mmap: no open: no @@ -453,9 +450,6 @@ move ->readdir() to inode_operations and use a separate method for directory anything that resembles union-mount we won't have a struct file for all components. And there are other reasons why the current interface is a mess... -->ioctl() on regular files is superceded by the ->unlocked_ioctl() that -doesn't take the BKL. - ->read on directories probably must go away - we should just enforce -EISDIR in sys_read() and friends. diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 94677e7dcb13..ed7e5efc06d8 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -727,7 +727,6 @@ struct file_operations { ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t); int (*readdir) (struct file *, void *, filldir_t); unsigned int (*poll) (struct file *, struct poll_table_struct *); - int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long); long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); long (*compat_ioctl) (struct file *, unsigned int, unsigned long); int (*mmap) (struct file *, struct vm_area_struct *); @@ -768,10 +767,7 @@ otherwise noted. activity on this file and (optionally) go to sleep until there is activity. Called by the select(2) and poll(2) system calls - ioctl: called by the ioctl(2) system call - - unlocked_ioctl: called by the ioctl(2) system call. Filesystems that do not - require the BKL should use this method instead of the ioctl() above. + unlocked_ioctl: called by the ioctl(2) system call. compat_ioctl: called by the ioctl(2) system call when 32 bit system calls are used on 64 bit kernels. diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt index d96a6dba5748..9633da01ff46 100644 --- a/Documentation/gpio.txt +++ b/Documentation/gpio.txt @@ -109,17 +109,19 @@ use numbers 2000-2063 to identify GPIOs in a bank of I2C GPIO expanders. If you want to initialize a structure with an invalid GPIO number, use some negative number (perhaps "-EINVAL"); that will never be valid. To -test if a number could reference a GPIO, you may use this predicate: +test if such number from such a structure could reference a GPIO, you +may use this predicate: int gpio_is_valid(int number); A number that's not valid will be rejected by calls which may request or free GPIOs (see below). Other numbers may also be rejected; for -example, a number might be valid but unused on a given board. - -Whether a platform supports multiple GPIO controllers is currently a -platform-specific implementation issue. +example, a number might be valid but temporarily unused on a given board. +Whether a platform supports multiple GPIO controllers is a platform-specific +implementation issue, as are whether that support can leave "holes" in the space +of GPIO numbers, and whether new controllers can be added at runtime. Such issues +can affect things including whether adjacent GPIO numbers are both valid. Using GPIOs ----------- @@ -480,12 +482,16 @@ To support this framework, a platform's Kconfig will "select" either ARCH_REQUIRE_GPIOLIB or ARCH_WANT_OPTIONAL_GPIOLIB and arrange that its <asm/gpio.h> includes <asm-generic/gpio.h> and defines three functions: gpio_get_value(), gpio_set_value(), and gpio_cansleep(). -They may also want to provide a custom value for ARCH_NR_GPIOS. -ARCH_REQUIRE_GPIOLIB means that the gpio-lib code will always get compiled +It may also provide a custom value for ARCH_NR_GPIOS, so that it better +reflects the number of GPIOs in actual use on that platform, without +wasting static table space. (It should count both built-in/SoC GPIOs and +also ones on GPIO expanders. + +ARCH_REQUIRE_GPIOLIB means that the gpiolib code will always get compiled into the kernel on that architecture. -ARCH_WANT_OPTIONAL_GPIOLIB means the gpio-lib code defaults to off and the user +ARCH_WANT_OPTIONAL_GPIOLIB means the gpiolib code defaults to off and the user can enable it and build it into the kernel optionally. If neither of these options are selected, the platform does not support diff --git a/Documentation/hwmon/emc2103 b/Documentation/hwmon/emc2103 new file mode 100644 index 000000000000..a12b2c127140 --- /dev/null +++ b/Documentation/hwmon/emc2103 @@ -0,0 +1,33 @@ +Kernel driver emc2103 +====================== + +Supported chips: + * SMSC EMC2103 + Addresses scanned: I2C 0x2e + Prefix: 'emc2103' + Datasheet: Not public + +Authors: + Steve Glendinning <steve.glendinning@smsc.com> + +Description +----------- + +The Standard Microsystems Corporation (SMSC) EMC2103 chips +contain up to 4 temperature sensors and a single fan controller. + +Fan rotation speeds are reported in RPM (rotations per minute). An alarm is +triggered if the rotation speed has dropped below a programmable limit. Fan +readings can be divided by a programmable divider (1, 2, 4 or 8) to give +the readings more range or accuracy. Not all RPM values can accurately be +represented, so some rounding is done. With a divider of 1, the lowest +representable value is 480 RPM. + +This driver supports RPM based control, to use this a fan target +should be written to fan1_target and pwm1_enable should be set to 3. + +The 2103-2 and 2103-4 variants have a third temperature sensor, which can +be connected to two anti-parallel diodes. These values can be read +as temp3 and temp4. If only one diode is attached to this channel, temp4 +will show as "fault". The module parameter "apd=0" can be used to suppress +this 4th channel when anti-parallel diodes are not fitted. diff --git a/Documentation/hwmon/f71882fg b/Documentation/hwmon/f71882fg index 1a07fd674cd0..a7952c2bd959 100644 --- a/Documentation/hwmon/f71882fg +++ b/Documentation/hwmon/f71882fg @@ -2,10 +2,6 @@ Kernel driver f71882fg ====================== Supported chips: - * Fintek F71808E - Prefix: 'f71808fg' - Addresses scanned: none, address read from Super I/O config space - Datasheet: Not public * Fintek F71858FG Prefix: 'f71858fg' Addresses scanned: none, address read from Super I/O config space diff --git a/Documentation/hwmon/ltc4245 b/Documentation/hwmon/ltc4245 index 86b5880d8502..b478b0864965 100644 --- a/Documentation/hwmon/ltc4245 +++ b/Documentation/hwmon/ltc4245 @@ -72,9 +72,31 @@ in6_min_alarm 5v output undervoltage alarm in7_min_alarm 3v output undervoltage alarm in8_min_alarm Vee (-12v) output undervoltage alarm -in9_input GPIO voltage data +in9_input GPIO voltage data (see note 1) +in10_input GPIO voltage data (see note 1) +in11_input GPIO voltage data (see note 1) power1_input 12v power usage (mW) power2_input 5v power usage (mW) power3_input 3v power usage (mW) power4_input Vee (-12v) power usage (mW) + + +Note 1 +------ + +If you have NOT configured the driver to sample all GPIO pins as analog +voltages, then the in10_input and in11_input sysfs attributes will not be +created. The driver will sample the GPIO pin that is currently connected to the +ADC as an analog voltage, and report the value in in9_input. + +If you have configured the driver to sample all GPIO pins as analog voltages, +then they will be sampled in round-robin fashion. If userspace reads too +slowly, -EAGAIN will be returned when you read the sysfs attribute containing +the sensor reading. + +The LTC4245 chip can be configured to sample all GPIO pins with two methods: +1) platform data -- see include/linux/i2c/ltc4245.h +2) OF device tree -- add the "ltc4245,use-extra-gpios" property to each chip + +The default mode of operation is to sample a single GPIO pin. diff --git a/Documentation/hwmon/pc87427 b/Documentation/hwmon/pc87427 index db5cc1227a83..8fdd08c9e48b 100644 --- a/Documentation/hwmon/pc87427 +++ b/Documentation/hwmon/pc87427 @@ -18,10 +18,11 @@ Description The National Semiconductor Super I/O chip includes complete hardware monitoring capabilities. It can monitor up to 18 voltages, 8 fans and -6 temperature sensors. Only the fans are supported at the moment. +6 temperature sensors. Only the fans and temperatures are supported at +the moment, voltages aren't. -This chip also has fan controlling features, which are not yet supported -by this driver either. +This chip also has fan controlling features (up to 4 PWM outputs), +which are partly supported by this driver. The driver assumes that no more than one chip is present, which seems reasonable. @@ -36,3 +37,23 @@ signal. Speeds down to 83 RPM can be measured. An alarm is triggered if the rotation speed drops below a programmable limit. Another alarm is triggered if the speed is too low to be measured (including stalled or missing fan). + + +Fan Speed Control +----------------- + +Fan speed can be controlled by PWM outputs. There are 4 possible modes: +always off, always on, manual and automatic. The latter isn't supported +by the driver: you can only return to that mode if it was the original +setting, and the configuration interface is missing. + + +Temperature Monitoring +---------------------- + +The PC87427 relies on external sensors (following the SensorPath +standard), so the resolution and range depend on the type of sensor +connected. The integer part can be 8-bit or 9-bit, and can be signed or +not. I couldn't find a way to figure out the external sensor data +temperature format, so user-space adjustment (typically by a factor 2) +may be required. diff --git a/Documentation/hwmon/sysfs-interface b/Documentation/hwmon/sysfs-interface index d4e2917c6f18..48ceabedf55d 100644 --- a/Documentation/hwmon/sysfs-interface +++ b/Documentation/hwmon/sysfs-interface @@ -91,12 +91,11 @@ name The chip name. I2C devices get this attribute created automatically. RO -update_rate The rate at which the chip will update readings. +update_interval The interval at which the chip will update readings. Unit: millisecond RW - Some devices have a variable update rate. This attribute - can be used to change the update rate to the desired - frequency. + Some devices have a variable update rate or interval. + This attribute can be used to change it to the desired value. ************ @@ -107,10 +106,24 @@ in[0-*]_min Voltage min value. Unit: millivolt RW +in[0-*]_lcrit Voltage critical min value. + Unit: millivolt + RW + If voltage drops to or below this limit, the system may + take drastic action such as power down or reset. At the very + least, it should report a fault. + in[0-*]_max Voltage max value. Unit: millivolt RW +in[0-*]_crit Voltage critical max value. + Unit: millivolt + RW + If voltage reaches or exceeds this limit, the system may + take drastic action such as power down or reset. At the very + least, it should report a fault. + in[0-*]_input Voltage input value. Unit: millivolt RO @@ -284,7 +297,7 @@ temp[1-*]_input Temperature input value. Unit: millidegree Celsius RO -temp[1-*]_crit Temperature critical value, typically greater than +temp[1-*]_crit Temperature critical max value, typically greater than corresponding temp_max values. Unit: millidegree Celsius RW @@ -296,6 +309,11 @@ temp[1-*]_crit_hyst from the critical value. RW +temp[1-*]_lcrit Temperature critical min value, typically lower than + corresponding temp_min values. + Unit: millidegree Celsius + RW + temp[1-*]_offset Temperature offset which is added to the temperature reading by the chip. @@ -344,9 +362,6 @@ Also see the Alarms section for status flags associated with temperatures. * Currents * ************ -Note that no known chip provides current measurements as of writing, -so this part is theoretical, so to say. - curr[1-*]_max Current max value Unit: milliampere RW @@ -471,6 +486,7 @@ limit-related alarms, not both. The driver should just reflect the hardware implementation. in[0-*]_alarm +curr[1-*]_alarm fan[1-*]_alarm temp[1-*]_alarm Channel alarm @@ -482,6 +498,8 @@ OR in[0-*]_min_alarm in[0-*]_max_alarm +curr[1-*]_min_alarm +curr[1-*]_max_alarm fan[1-*]_min_alarm fan[1-*]_max_alarm temp[1-*]_min_alarm @@ -497,7 +515,6 @@ to notify open diodes, unconnected fans etc. where the hardware supports it. When this boolean has value 1, the measurement for that channel should not be trusted. -in[0-*]_fault fan[1-*]_fault temp[1-*]_fault Input fault condition @@ -513,6 +530,7 @@ beep_enable Master beep enable RW in[0-*]_beep +curr[1-*]_beep fan[1-*]_beep temp[1-*]_beep Channel beep diff --git a/Documentation/hwmon/w83627ehf b/Documentation/hwmon/w83627ehf index b7e42ec4b26b..13d556112fc0 100644 --- a/Documentation/hwmon/w83627ehf +++ b/Documentation/hwmon/w83627ehf @@ -20,6 +20,10 @@ Supported chips: Prefix: 'w83667hg' Addresses scanned: ISA address retrieved from Super I/O registers Datasheet: not available + * Winbond W83667HG-B + Prefix: 'w83667hg' + Addresses scanned: ISA address retrieved from Super I/O registers + Datasheet: Available from Nuvoton upon request Authors: Jean Delvare <khali@linux-fr.org> @@ -32,8 +36,8 @@ Description ----------- This driver implements support for the Winbond W83627EHF, W83627EHG, -W83627DHG, W83627DHG-P and W83667HG super I/O chips. We will refer to them -collectively as Winbond chips. +W83627DHG, W83627DHG-P, W83667HG and W83667HG-B super I/O chips. +We will refer to them collectively as Winbond chips. The chips implement three temperature sensors, five fan rotation speed sensors, ten analog voltage sensors (only nine for the 627DHG), one @@ -68,14 +72,15 @@ follows: temp1 -> pwm1 temp2 -> pwm2 temp3 -> pwm3 -prog -> pwm4 (not on 667HG; the programmable setting is not supported by - the driver) +prog -> pwm4 (not on 667HG and 667HG-B; the programmable setting is not + supported by the driver) /sys files ---------- name - this is a standard hwmon device entry. For the W83627EHF and W83627EHG, - it is set to "w83627ehf" and for the W83627DHG it is set to "w83627dhg" + it is set to "w83627ehf", for the W83627DHG it is set to "w83627dhg", + and for the W83667HG it is set to "w83667hg". pwm[1-4] - this file stores PWM duty cycle or DC value (fan speed) in range: 0 (stop) to 255 (full) diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index c375313cb128..c787ae512120 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt @@ -45,7 +45,6 @@ This document describes the Linux kernel Makefiles. --- 7.1 header-y --- 7.2 objhdr-y --- 7.3 destination-y - --- 7.4 unifdef-y (deprecated) === 8 Kbuild Variables === 9 Makefile language @@ -1245,11 +1244,6 @@ See subsequent chapter for the syntax of the Kbuild file. will be located in the directory "include/linux" when exported. - --- 7.4 unifdef-y (deprecated) - - unifdef-y is deprecated. A direct replacement is header-y. - - === 8 Kbuild Variables The top Makefile exports the following variables: diff --git a/Documentation/kernel-doc-nano-HOWTO.txt b/Documentation/kernel-doc-nano-HOWTO.txt index 27a52b35d55b..3d8a97747f77 100644 --- a/Documentation/kernel-doc-nano-HOWTO.txt +++ b/Documentation/kernel-doc-nano-HOWTO.txt @@ -345,5 +345,10 @@ documentation, in <filename>, for the functions listed. section titled <section title> from <filename>. Spaces are allowed in <section title>; do not quote the <section title>. +!C<filename> is replaced by nothing, but makes the tools check that +all DOC: sections and documented functions, symbols, etc. are used. +This makes sense to use when you use !F/!P only and want to verify +that all documentation is included. + Tim. */ <twaugh@redhat.com> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 873b68090098..281c70b94089 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -88,8 +88,8 @@ parameter is applicable: RAM RAM disk support is enabled. S390 S390 architecture is enabled. SCSI Appropriate SCSI support is enabled. - A lot of drivers has their options described inside of - Documentation/scsi/. + A lot of drivers have their options described inside + the Documentation/scsi/ sub-directory. SECURITY Different security models are enabled. SELINUX SELinux support is enabled. APPARMOR AppArmor support is enabled. @@ -284,27 +284,12 @@ and is between 256 and 4096 characters. It is defined in the file add_efi_memmap [EFI; X86] Include EFI memory map in kernel's map of available physical RAM. - advansys= [HW,SCSI] - See header of drivers/scsi/advansys.c. - agp= [AGP] { off | try_unsupported } off: disable AGP support try_unsupported: try to drive unsupported chipsets (may crash computer or cause data corruption) - aha152x= [HW,SCSI] - See Documentation/scsi/aha152x.txt. - - aha1542= [HW,SCSI] - Format: <portbase>[,<buson>,<busoff>[,<dmaspeed>]] - - aic7xxx= [HW,SCSI] - See Documentation/scsi/aic7xxx.txt. - - aic79xx= [HW,SCSI] - See Documentation/scsi/aic79xx.txt. - ALSA [HW,ALSA] See Documentation/sound/alsa/alsa-parameters.txt @@ -368,8 +353,6 @@ and is between 256 and 4096 characters. It is defined in the file atarimouse= [HW,MOUSE] Atari Mouse - atascsi= [HW,SCSI] Atari SCSI - atkbd.extra= [HW] Enable extra LEDs and keys on IBM RapidAccess, EzKey and similar keyboards @@ -419,10 +402,6 @@ and is between 256 and 4096 characters. It is defined in the file bttv.pll= See Documentation/video4linux/bttv/Insmod-options bttv.tuner= and Documentation/video4linux/bttv/CARDLIST - BusLogic= [HW,SCSI] - See drivers/scsi/BusLogic.c, comment before function - BusLogic_ParseDriverOptions(). - c101= [NET] Moxa C101 synchronous serial card cachesize= [BUGS=X86-32] Override level 2 CPU cache size detection. @@ -476,7 +455,7 @@ and is between 256 and 4096 characters. It is defined in the file [ARM] imx_timer1,OSTS,netx_timer,mpu_timer2, pxa_timer,timer3,32k_counter,timer0_1 [AVR32] avr32 - [X86-32] pit,hpet,tsc,vmi-timer; + [X86-32] pit,hpet,tsc; scx200_hrt on Geode; cyclone on IBM x440 [MIPS] MIPS [PARISC] cr16 @@ -671,8 +650,6 @@ and is between 256 and 4096 characters. It is defined in the file dscc4.setup= [NET] - dtc3181e= [HW,SCSI] - dynamic_printk Enables pr_debug()/dev_dbg() calls if CONFIG_DYNAMIC_PRINTK_DEBUG has been enabled. These can also be switched on/off via @@ -713,8 +690,6 @@ and is between 256 and 4096 characters. It is defined in the file This is desgined to be used in conjunction with the boot argument: earlyprintk=vga - eata= [HW,SCSI] - edd= [EDD] Format: {"off" | "on" | "skip[mbr]"} @@ -770,12 +745,6 @@ and is between 256 and 4096 characters. It is defined in the file Format: <interval>,<probability>,<space>,<times> See also /Documentation/fault-injection/. - fd_mcs= [HW,SCSI] - See header of drivers/scsi/fd_mcs.c. - - fdomain= [HW,SCSI] - See header of drivers/scsi/fdomain.c. - floppy= [HW] See Documentation/blockdev/floppy.txt. @@ -835,14 +804,9 @@ and is between 256 and 4096 characters. It is defined in the file When zero, profiling data is discarded and associated debugfs files are removed at module unload time. - gdth= [HW,SCSI] - See header of drivers/scsi/gdth.c. - gpt [EFI] Forces disk with valid GPT signature but invalid Protective MBR to be treated as GPT. - gvp11= [HW,SCSI] - hashdist= [KNL,NUMA] Large hashes allocated during boot are distributed across NUMA nodes. Defaults on for 64bit NUMA, off otherwise. @@ -931,9 +895,6 @@ and is between 256 and 4096 characters. It is defined in the file i8k.restricted [HW] Allow controlling fans only if SYS_ADMIN capability is set. - ibmmcascsi= [HW,MCA,SCSI] IBM MicroChannel SCSI adapter - See Documentation/mca.txt. - icn= [HW,ISDN] Format: <io>[,<membase>[,<icn_id>[,<icn_id2>]]] @@ -983,9 +944,6 @@ and is between 256 and 4096 characters. It is defined in the file programs exec'd, files mmap'd for exec, and all files opened for read by uid=0. - in2000= [HW,SCSI] - See header of drivers/scsi/in2000.c. - init= [KNL] Format: <full_path> Run specified binary instead of /sbin/init as init @@ -1023,6 +981,12 @@ and is between 256 and 4096 characters. It is defined in the file result in a hardware IOTLB flush operation as opposed to batching them for performance. + intremap= [X86-64, Intel-IOMMU] + Format: { on (default) | off | nosid } + on enable Interrupt Remapping (default) + off disable Interrupt Remapping + nosid disable Source ID checking + inttest= [IA64] iomem= Disable strict checking of access to MMIO memory @@ -1063,9 +1027,6 @@ and is between 256 and 4096 characters. It is defined in the file See comment before ip2_setup() in drivers/char/ip2/ip2base.c. - ips= [HW,SCSI] Adaptec / IBM ServeRAID controller - See header of drivers/scsi/ips.c. - irqfixup [HW] When an interrupt is not handled search all handlers for it. Intended to get systems with badly broken @@ -1341,9 +1302,6 @@ and is between 256 and 4096 characters. It is defined in the file ltpc= [NET] Format: <io>,<irq>,<dma> - mac5380= [HW,SCSI] Format: - <can_queue>,<cmd_per_lun>,<sg_tablesize>,<hostid>,<use_tags> - machvec= [IA64] Force the use of a particular machine-vector (machvec) in a generic kernel. Example: machvec=hpzx1_swiotlb @@ -1365,13 +1323,6 @@ and is between 256 and 4096 characters. It is defined in the file be mounted Format: <1-256> - max_luns= [SCSI] Maximum number of LUNs to probe. - Should be between 1 and 2^32-1. - - max_report_luns= - [SCSI] Maximum number of LUNs received. - Should be between 1 and 16384. - mcatest= [IA-64] mce [X86-32] Machine Check Exception @@ -1568,19 +1519,6 @@ and is between 256 and 4096 characters. It is defined in the file n2= [NET] SDL Inc. RISCom/N2 synchronous serial card - NCR_D700= [HW,SCSI] - See header of drivers/scsi/NCR_D700.c. - - ncr5380= [HW,SCSI] - - ncr53c400= [HW,SCSI] - - ncr53c400a= [HW,SCSI] - - ncr53c406a= [HW,SCSI] - - ncr53c8xx= [HW,SCSI] - netdev= [NET] Network devices parameters Format: <irq>,<io>,<mem_start>,<mem_end>,<name> Note that mem_start is often overloaded to mean @@ -1749,6 +1687,7 @@ and is between 256 and 4096 characters. It is defined in the file nointremap [X86-64, Intel-IOMMU] Do not enable interrupt remapping. + [Deprecated - use intremap=off] nointroute [IA-64] @@ -1859,10 +1798,6 @@ and is between 256 and 4096 characters. It is defined in the file OSS [HW,OSS] See Documentation/sound/oss/oss-parameters.txt - osst= [HW,SCSI] SCSI Tape Driver - Format: <buffer_size>,<write_threshold> - See also Documentation/scsi/st.txt. - panic= [KNL] Kernel behaviour on panic Format: <timeout> @@ -1895,9 +1830,6 @@ and is between 256 and 4096 characters. It is defined in the file Currently this function knows 686a and 8231 chips. Format: [spp|ps2|epp|ecp|ecpepp] - pas16= [HW,SCSI] - See header of drivers/scsi/pas16.c. - pause_on_oops= Halt all CPUs after the first oops has been printed for the specified number of seconds. This is to be used if @@ -2042,15 +1974,18 @@ and is between 256 and 4096 characters. It is defined in the file force Enable ASPM even on devices that claim not to support it. WARNING: Forcing ASPM on may cause system lockups. + pcie_ports= [PCIE] PCIe ports handling: + auto Ask the BIOS whether or not to use native PCIe services + associated with PCIe ports (PME, hot-plug, AER). Use + them only if that is allowed by the BIOS. + native Use native PCIe services associated with PCIe ports + unconditionally. + compat Treat PCIe ports as PCI-to-PCI bridges, disable the PCIe + ports driver. + pcie_pme= [PCIE,PM] Native PCIe PME signaling options: - Format: {auto|force}[,nomsi] - auto Use native PCIe PME signaling if the BIOS allows the - kernel to control PCIe config registers of root ports. - force Use native PCIe PME signaling even if the BIOS refuses - to allow the kernel to control the relevant PCIe config - registers. nomsi Do not use MSI for native PCIe PME signaling (this makes - all PCIe root ports use INTx for everything). + all PCIe root ports use INTx for all services). pcmv= [HW,PCMCIA] BadgePAD 4 @@ -2264,30 +2199,6 @@ and is between 256 and 4096 characters. It is defined in the file sched_debug [KNL] Enables verbose scheduler debug messages. - scsi_debug_*= [SCSI] - See drivers/scsi/scsi_debug.c. - - scsi_default_dev_flags= - [SCSI] SCSI default device flags - Format: <integer> - - scsi_dev_flags= [SCSI] Black/white list entry for vendor and model - Format: <vendor>:<model>:<flags> - (flags are integer value) - - scsi_logging_level= [SCSI] a bit mask of logging levels - See drivers/scsi/scsi_logging.h for bits. Also - settable via sysctl at dev.scsi.logging_level - (/proc/sys/dev/scsi/logging_level). - There is also a nice 'scsi_logging_level' script in the - S390-tools package, available for download at - http://www-128.ibm.com/developerworks/linux/linux390/s390-tools-1.5.4.html - - scsi_mod.scan= [SCSI] sync (default) scans SCSI busses as they are - discovered. async scans them in kernel threads, - allowing boot to proceed. none ignores them, expecting - user space to do the scan. - security= [SECURITY] Choose a security module to enable at boot. If this boot parameter is not specified, only the first security module asking for security registration will be @@ -2321,9 +2232,6 @@ and is between 256 and 4096 characters. It is defined in the file The parameter means the number of CPUs to show, for example 1 means boot CPU only. - sim710= [SCSI,HW] - See header of drivers/scsi/sim710.c. - simeth= [IA-64] simscsi= @@ -2395,9 +2303,6 @@ and is between 256 and 4096 characters. It is defined in the file spia_pedr= spia_peddr= - st= [HW,SCSI] SCSI tape parameters (buffers, etc.) - See Documentation/scsi/st.txt. - stacktrace [FTRACE] Enabled the stack tracer on boot up. @@ -2455,18 +2360,12 @@ and is between 256 and 4096 characters. It is defined in the file switches= [HW,M68k] - sym53c416= [HW,SCSI] - See header of drivers/scsi/sym53c416.c. - sysrq_always_enabled [KNL] Ignore sysrq setting - this boot parameter will neutralize any effect of /proc/sys/kernel/sysrq. Useful for debugging. - t128= [HW,SCSI] - See header of drivers/scsi/t128.c. - tdfx= [HW,DRM] test_suspend= [SUSPEND] @@ -2503,10 +2402,6 @@ and is between 256 and 4096 characters. It is defined in the file <deci-seconds>: poll all this frequency 0: no polling (default) - tmscsim= [HW,SCSI] - See comment before function dc390_setup() in - drivers/scsi/tmscsim.c. - topology= [S390] Format: {off | on} Specify if the kernel should make use of the cpu @@ -2547,9 +2442,6 @@ and is between 256 and 4096 characters. It is defined in the file <port#>,<js1>,<js2>,<js3>,<js4>,<js5>,<js6>,<js7> See also Documentation/input/joystick-parport.txt - u14-34f= [HW,SCSI] UltraStor 14F/34F SCSI host adapter - See header of drivers/scsi/u14-34f.c. - uhash_entries= [KNL,NET] Set number of hash buckets for UDP/UDP-Lite connections @@ -2715,12 +2607,6 @@ and is between 256 and 4096 characters. It is defined in the file overridden by individual drivers. 0 will hide cursors, 1 will display them. - wd33c93= [HW,SCSI] - See header of drivers/scsi/wd33c93.c. - - wd7000= [HW,SCSI] - See header of drivers/scsi/wd7000.c. - watchdog timers [HW,WDT] For information on watchdog timers, see Documentation/watchdog/watchdog-parameters.txt or other driver-specific files in the @@ -2746,8 +2632,10 @@ and is between 256 and 4096 characters. It is defined in the file aux-ide-disks -- unplug non-primary-master IDE devices nics -- unplug network devices all -- unplug all emulated devices (NICs and IDE disks) - ignore -- continue loading the Xen platform PCI driver even - if the version check failed + unnecessary -- unplugging emulated devices is + unnecessary even if the host did not respond to + the unplug protocol + never -- do not unplug even if version check succeeds xirc2ps_cs= [NET,PCMCIA] Format: diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index f6f80257addb..1565eefd6fd5 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -1024,6 +1024,10 @@ ThinkPad-specific interface. The driver will disable its native backlight brightness control interface if it detects that the standard ACPI interface is available in the ThinkPad. +If you want to use the thinkpad-acpi backlight brightness control +instead of the generic ACPI video backlight brightness control for some +reason, you should use the acpi_backlight=vendor kernel parameter. + The brightness_enable module parameter can be used to control whether the LCD brightness control feature will be enabled when available. brightness_enable=0 forces it to be disabled. brightness_enable=1 diff --git a/Documentation/lguest/Makefile b/Documentation/lguest/Makefile index 28c8cdfcafd8..bebac6b4f332 100644 --- a/Documentation/lguest/Makefile +++ b/Documentation/lguest/Makefile @@ -1,5 +1,6 @@ # This creates the demonstration utility "lguest" which runs a Linux guest. -CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include -I../../arch/x86/include -U_FORTIFY_SOURCE +# Missing headers? Add "-I../../include -I../../arch/x86/include" +CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -U_FORTIFY_SOURCE all: lguest diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index e9ce3c554514..8a6a8c6d4980 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -39,14 +39,14 @@ #include <limits.h> #include <stddef.h> #include <signal.h> -#include "linux/lguest_launcher.h" -#include "linux/virtio_config.h" -#include "linux/virtio_net.h" -#include "linux/virtio_blk.h" -#include "linux/virtio_console.h" -#include "linux/virtio_rng.h" -#include "linux/virtio_ring.h" -#include "asm/bootparam.h" +#include <linux/virtio_config.h> +#include <linux/virtio_net.h> +#include <linux/virtio_blk.h> +#include <linux/virtio_console.h> +#include <linux/virtio_rng.h> +#include <linux/virtio_ring.h> +#include <asm/bootparam.h> +#include "../../include/linux/lguest_launcher.h" /*L:110 * We can ignore the 42 include files we need for this program, but I do want * to draw attention to the use of kernel-style types. @@ -1447,14 +1447,15 @@ static void add_to_bridge(int fd, const char *if_name, const char *br_name) static void configure_device(int fd, const char *tapif, u32 ipaddr) { struct ifreq ifr; - struct sockaddr_in *sin = (struct sockaddr_in *)&ifr.ifr_addr; + struct sockaddr_in sin; memset(&ifr, 0, sizeof(ifr)); strcpy(ifr.ifr_name, tapif); /* Don't read these incantations. Just cut & paste them like I did! */ - sin->sin_family = AF_INET; - sin->sin_addr.s_addr = htonl(ipaddr); + sin.sin_family = AF_INET; + sin.sin_addr.s_addr = htonl(ipaddr); + memcpy(&ifr.ifr_addr, &sin, sizeof(sin)); if (ioctl(fd, SIOCSIFADDR, &ifr) != 0) err(1, "Setting %s interface address", tapif); ifr.ifr_flags = IFF_UP; diff --git a/Documentation/mutex-design.txt b/Documentation/mutex-design.txt index c91ccc0720fa..38c10fd7f411 100644 --- a/Documentation/mutex-design.txt +++ b/Documentation/mutex-design.txt @@ -9,7 +9,7 @@ firstly, there's nothing wrong with semaphores. But if the simpler mutex semantics are sufficient for your code, then there are a couple of advantages of mutexes: - - 'struct mutex' is smaller on most architectures: .e.g on x86, + - 'struct mutex' is smaller on most architectures: E.g. on x86, 'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes. A smaller structure size means less RAM footprint, and better CPU-cache utilization. @@ -136,3 +136,4 @@ the APIs of 'struct mutex' have been streamlined: void mutex_lock_nested(struct mutex *lock, unsigned int subclass); int mutex_lock_interruptible_nested(struct mutex *lock, unsigned int subclass); + int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); diff --git a/Documentation/networking/e1000.txt b/Documentation/networking/e1000.txt index 2df71861e578..d9271e74e488 100644 --- a/Documentation/networking/e1000.txt +++ b/Documentation/networking/e1000.txt @@ -1,82 +1,35 @@ Linux* Base Driver for the Intel(R) PRO/1000 Family of Adapters =============================================================== -September 26, 2006 - +Intel Gigabit Linux driver. +Copyright(c) 1999 - 2010 Intel Corporation. Contents ======== -- In This Release - Identifying Your Adapter -- Building and Installation - Command Line Parameters - Speed and Duplex Configuration - Additional Configurations -- Known Issues - Support - -In This Release -=============== - -This file describes the Linux* Base Driver for the Intel(R) PRO/1000 Family -of Adapters. This driver includes support for Itanium(R)2-based systems. - -For questions related to hardware requirements, refer to the documentation -supplied with your Intel PRO/1000 adapter. All hardware requirements listed -apply to use with Linux. - -The following features are now available in supported kernels: - - Native VLANs - - Channel Bonding (teaming) - - SNMP - -Channel Bonding documentation can be found in the Linux kernel source: -/Documentation/networking/bonding.txt - -The driver information previously displayed in the /proc filesystem is not -supported in this release. Alternatively, you can use ethtool (version 1.6 -or later), lspci, and ifconfig to obtain the same information. - -Instructions on updating ethtool can be found in the section "Additional -Configurations" later in this document. - -NOTE: The Intel(R) 82562v 10/100 Network Connection only provides 10/100 -support. - - Identifying Your Adapter ======================== For more information on how to identify your adapter, go to the Adapter & Driver ID Guide at: - http://support.intel.com/support/network/adapter/pro100/21397.htm + http://support.intel.com/support/go/network/adapter/idguide.htm For the latest Intel network drivers for Linux, refer to the following website. In the search field, enter your adapter name or type, or use the networking link on the left to search for your adapter: - http://downloadfinder.intel.com/scripts-df/support_intel.asp - + http://support.intel.com/support/go/network/adapter/home.htm Command Line Parameters ======================= -If the driver is built as a module, the following optional parameters -are used by entering them on the command line with the modprobe command -using this syntax: - - modprobe e1000 [<option>=<VAL1>,<VAL2>,...] - -For example, with two PRO/1000 PCI adapters, entering: - - modprobe e1000 TxDescriptors=80,128 - -loads the e1000 driver with 80 TX descriptors for the first adapter and -128 TX descriptors for the second adapter. - The default value for each parameter is generally the recommended setting, unless otherwise noted. @@ -89,10 +42,6 @@ NOTES: For more information about the AutoNeg, Duplex, and Speed parameters, see the application note at: http://www.intel.com/design/network/applnots/ap450.htm - A descriptor describes a data buffer and attributes related to - the data buffer. This information is accessed by the hardware. - - AutoNeg ------- (Supported only on adapters with copper connections) @@ -106,7 +55,6 @@ Duplex parameters must not be specified. NOTE: Refer to the Speed and Duplex section of this readme for more information on the AutoNeg parameter. - Duplex ------ (Supported only on adapters with copper connections) @@ -119,7 +67,6 @@ set to auto-negotiate, the board auto-detects the correct duplex. If the link partner is forced (either full or half), Duplex defaults to half- duplex. - FlowControl ----------- Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx) @@ -128,16 +75,16 @@ Default Value: Reads flow control settings from the EEPROM This parameter controls the automatic generation(Tx) and response(Rx) to Ethernet PAUSE frames. - InterruptThrottleRate --------------------- (not supported on Intel(R) 82542, 82543 or 82544-based adapters) -Valid Range: 0,1,3,100-100000 (0=off, 1=dynamic, 3=dynamic conservative) +Valid Range: 0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative, + 4=simplified balancing) Default Value: 3 The driver can limit the amount of interrupts per second that the adapter -will generate for incoming packets. It does this by writing a value to the -adapter that is based on the maximum amount of interrupts that the adapter +will generate for incoming packets. It does this by writing a value to the +adapter that is based on the maximum amount of interrupts that the adapter will generate per second. Setting InterruptThrottleRate to a value greater or equal to 100 @@ -146,37 +93,43 @@ per second, even if more packets have come in. This reduces interrupt load on the system and can lower CPU utilization under heavy load, but will increase latency as packets are not processed as quickly. -The default behaviour of the driver previously assumed a static -InterruptThrottleRate value of 8000, providing a good fallback value for -all traffic types,but lacking in small packet performance and latency. -The hardware can handle many more small packets per second however, and +The default behaviour of the driver previously assumed a static +InterruptThrottleRate value of 8000, providing a good fallback value for +all traffic types,but lacking in small packet performance and latency. +The hardware can handle many more small packets per second however, and for this reason an adaptive interrupt moderation algorithm was implemented. Since 7.3.x, the driver has two adaptive modes (setting 1 or 3) in which -it dynamically adjusts the InterruptThrottleRate value based on the traffic +it dynamically adjusts the InterruptThrottleRate value based on the traffic that it receives. After determining the type of incoming traffic in the last -timeframe, it will adjust the InterruptThrottleRate to an appropriate value +timeframe, it will adjust the InterruptThrottleRate to an appropriate value for that traffic. The algorithm classifies the incoming traffic every interval into -classes. Once the class is determined, the InterruptThrottleRate value is -adjusted to suit that traffic type the best. There are three classes defined: +classes. Once the class is determined, the InterruptThrottleRate value is +adjusted to suit that traffic type the best. There are three classes defined: "Bulk traffic", for large amounts of packets of normal size; "Low latency", for small amounts of traffic and/or a significant percentage of small -packets; and "Lowest latency", for almost completely small packets or +packets; and "Lowest latency", for almost completely small packets or minimal traffic. -In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 -for traffic that falls in class "Bulk traffic". If traffic falls in the "Low -latency" or "Lowest latency" class, the InterruptThrottleRate is increased +In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 +for traffic that falls in class "Bulk traffic". If traffic falls in the "Low +latency" or "Lowest latency" class, the InterruptThrottleRate is increased stepwise to 20000. This default mode is suitable for most applications. For situations where low latency is vital such as cluster or grid computing, the algorithm can reduce latency even more when InterruptThrottleRate is set to mode 1. In this mode, which operates -the same as mode 3, the InterruptThrottleRate will be increased stepwise to +the same as mode 3, the InterruptThrottleRate will be increased stepwise to 70000 for traffic in class "Lowest latency". +In simplified mode the interrupt rate is based on the ratio of Tx and +Rx traffic. If the bytes per second rate is approximately equal, the +interrupt rate will drop as low as 2000 interrupts per second. If the +traffic is mostly transmit or mostly receive, the interrupt rate could +be as high as 8000. + Setting InterruptThrottleRate to 0 turns off any interrupt moderation and may improve small packet latency, but is generally not suitable for bulk throughput traffic. @@ -212,8 +165,6 @@ NOTE: When e1000 is loaded with default settings and multiple adapters be platform-specific. If CPU utilization is not a concern, use RX_POLLING (NAPI) and default driver settings. - - RxDescriptors ------------- Valid Range: 80-256 for 82542 and 82543-based adapters @@ -225,15 +176,14 @@ by the driver. Increasing this value allows the driver to buffer more incoming packets, at the expense of increased system memory utilization. Each descriptor is 16 bytes. A receive buffer is also allocated for each -descriptor and can be either 2048, 4096, 8192, or 16384 bytes, depending +descriptor and can be either 2048, 4096, 8192, or 16384 bytes, depending on the MTU setting. The maximum MTU size is 16110. -NOTE: MTU designates the frame size. It only needs to be set for Jumbo - Frames. Depending on the available system resources, the request - for a higher number of receive descriptors may be denied. In this +NOTE: MTU designates the frame size. It only needs to be set for Jumbo + Frames. Depending on the available system resources, the request + for a higher number of receive descriptors may be denied. In this case, use a lower number. - RxIntDelay ---------- Valid Range: 0-65535 (0=off) @@ -254,7 +204,6 @@ CAUTION: When setting RxIntDelay to a value other than 0, adapters may restoring the network connection. To eliminate the potential for the hang ensure that RxIntDelay is set to 0. - RxAbsIntDelay ------------- (This parameter is supported only on 82540, 82545 and later adapters.) @@ -268,7 +217,6 @@ packet is received within the set amount of time. Proper tuning, along with RxIntDelay, may improve traffic throughput in specific network conditions. - Speed ----- (This parameter is supported only on adapters with copper connections.) @@ -280,7 +228,6 @@ Speed forces the line speed to the specified value in megabits per second partner is set to auto-negotiate, the board will auto-detect the correct speed. Duplex should also be set when Speed is set to either 10 or 100. - TxDescriptors ------------- Valid Range: 80-256 for 82542 and 82543-based adapters @@ -295,6 +242,36 @@ NOTE: Depending on the available system resources, the request for a higher number of transmit descriptors may be denied. In this case, use a lower number. +TxDescriptorStep +---------------- +Valid Range: 1 (use every Tx Descriptor) + 4 (use every 4th Tx Descriptor) + +Default Value: 1 (use every Tx Descriptor) + +On certain non-Intel architectures, it has been observed that intense TX +traffic bursts of short packets may result in an improper descriptor +writeback. If this occurs, the driver will report a "TX Timeout" and reset +the adapter, after which the transmit flow will restart, though data may +have stalled for as much as 10 seconds before it resumes. + +The improper writeback does not occur on the first descriptor in a system +memory cache-line, which is typically 32 bytes, or 4 descriptors long. + +Setting TxDescriptorStep to a value of 4 will ensure that all TX descriptors +are aligned to the start of a system memory cache line, and so this problem +will not occur. + +NOTES: Setting TxDescriptorStep to 4 effectively reduces the number of + TxDescriptors available for transmits to 1/4 of the normal allocation. + This has a possible negative performance impact, which may be + compensated for by allocating more descriptors using the TxDescriptors + module parameter. + + There are other conditions which may result in "TX Timeout", which will + not be resolved by the use of the TxDescriptorStep parameter. As the + issue addressed by this parameter has never been observed on Intel + Architecture platforms, it should not be used on Intel platforms. TxIntDelay ---------- @@ -307,7 +284,6 @@ efficiency if properly tuned for specific network traffic. If the system is reporting dropped transmits, this value may be set too high causing the driver to run out of available transmit descriptors. - TxAbsIntDelay ------------- (This parameter is supported only on 82540, 82545 and later adapters.) @@ -330,6 +306,35 @@ Default Value: 1 A value of '1' indicates that the driver should enable IP checksum offload for received packets (both UDP and TCP) to the adapter hardware. +Copybreak +--------- +Valid Range: 0-xxxxxxx (0=off) +Default Value: 256 +Usage: insmod e1000.ko copybreak=128 + +Driver copies all packets below or equaling this size to a fresh Rx +buffer before handing it up the stack. + +This parameter is different than other parameters, in that it is a +single (not 1,1,1 etc.) parameter applied to all driver instances and +it is also available during runtime at +/sys/module/e1000/parameters/copybreak + +SmartPowerDownEnable +-------------------- +Valid Range: 0-1 +Default Value: 0 (disabled) + +Allows PHY to turn off in lower power states. The user can turn off +this parameter in supported chipsets. + +KumeranLockLoss +--------------- +Valid Range: 0-1 +Default Value: 1 (enabled) + +This workaround skips resetting the PHY at shutdown for the initial +silicon releases of ICH8 systems. Speed and Duplex Configuration ============================== @@ -385,40 +390,9 @@ If the link partner is forced to a specific speed and duplex, then this parameter should not be used. Instead, use the Speed and Duplex parameters previously mentioned to force the adapter to the same speed and duplex. - Additional Configurations ========================= - Configuring the Driver on Different Distributions - ------------------------------------------------- - Configuring a network driver to load properly when the system is started - is distribution dependent. Typically, the configuration process involves - adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well - as editing other system startup scripts and/or configuration files. Many - popular Linux distributions ship with tools to make these changes for you. - To learn the proper way to configure a network device for your system, - refer to your distribution documentation. If during this process you are - asked for the driver or module name, the name for the Linux Base Driver - for the Intel(R) PRO/1000 Family of Adapters is e1000. - - As an example, if you install the e1000 driver for two PRO/1000 adapters - (eth0 and eth1) and set the speed and duplex to 10full and 100half, add - the following to modules.conf or or modprobe.conf: - - alias eth0 e1000 - alias eth1 e1000 - options e1000 Speed=10,100 Duplex=2,1 - - Viewing Link Messages - --------------------- - Link messages will not be displayed to the console if the distribution is - restricting system messages. In order to see network driver link messages - on your console, set dmesg to eight by entering the following: - - dmesg -n 8 - - NOTE: This setting is not saved across reboots. - Jumbo Frames ------------ Jumbo Frames support is enabled by changing the MTU to a value larger than @@ -437,9 +411,11 @@ Additional Configurations setting in a different location. Notes: - - - To enable Jumbo Frames, increase the MTU size on the interface beyond - 1500. + Degradation in throughput performance may be observed in some Jumbo frames + environments. If this is observed, increasing the application's socket buffer + size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help. + See the specific application manual and /usr/src/linux*/Documentation/ + networking/ip-sysctl.txt for more details. - The maximum MTU setting for Jumbo Frames is 16110. This value coincides with the maximum Jumbo Frames size of 16128. @@ -447,40 +423,11 @@ Additional Configurations - Using Jumbo Frames at 10 or 100 Mbps may result in poor performance or loss of link. - - Some Intel gigabit adapters that support Jumbo Frames have a frame size - limit of 9238 bytes, with a corresponding MTU size limit of 9216 bytes. - The adapters with this limitation are based on the Intel(R) 82571EB, - 82572EI, 82573L and 80003ES2LAN controller. These correspond to the - following product names: - Intel(R) PRO/1000 PT Server Adapter - Intel(R) PRO/1000 PT Desktop Adapter - Intel(R) PRO/1000 PT Network Connection - Intel(R) PRO/1000 PT Dual Port Server Adapter - Intel(R) PRO/1000 PT Dual Port Network Connection - Intel(R) PRO/1000 PF Server Adapter - Intel(R) PRO/1000 PF Network Connection - Intel(R) PRO/1000 PF Dual Port Server Adapter - Intel(R) PRO/1000 PB Server Connection - Intel(R) PRO/1000 PL Network Connection - Intel(R) PRO/1000 EB Network Connection with I/O Acceleration - Intel(R) PRO/1000 EB Backplane Connection with I/O Acceleration - Intel(R) PRO/1000 PT Quad Port Server Adapter - - Adapters based on the Intel(R) 82542 and 82573V/E controller do not support Jumbo Frames. These correspond to the following product names: Intel(R) PRO/1000 Gigabit Server Adapter Intel(R) PRO/1000 PM Network Connection - - The following adapters do not support Jumbo Frames: - Intel(R) 82562V 10/100 Network Connection - Intel(R) 82566DM Gigabit Network Connection - Intel(R) 82566DC Gigabit Network Connection - Intel(R) 82566MM Gigabit Network Connection - Intel(R) 82566MC Gigabit Network Connection - Intel(R) 82562GT 10/100 Network Connection - Intel(R) 82562G 10/100 Network Connection - - Ethtool ------- The driver utilizes the ethtool interface for driver configuration and @@ -490,142 +437,14 @@ Additional Configurations The latest release of ethtool can be found from http://sourceforge.net/projects/gkernel. - NOTE: Ethtool 1.6 only supports a limited set of ethtool options. Support - for a more complete ethtool feature set can be enabled by upgrading - ethtool to ethtool-1.8.1. - Enabling Wake on LAN* (WoL) --------------------------- - WoL is configured through the Ethtool* utility. Ethtool is included with - all versions of Red Hat after Red Hat 7.2. For other Linux distributions, - download and install Ethtool from the following website: - http://sourceforge.net/projects/gkernel. - - For instructions on enabling WoL with Ethtool, refer to the website listed - above. + WoL is configured through the Ethtool* utility. WoL will be enabled on the system during the next shut down or reboot. For this driver version, in order to enable WoL, the e1000 driver must be loaded when shutting down or rebooting the system. - Wake On LAN is only supported on port A for the following devices: - Intel(R) PRO/1000 PT Dual Port Network Connection - Intel(R) PRO/1000 PT Dual Port Server Connection - Intel(R) PRO/1000 PT Dual Port Server Adapter - Intel(R) PRO/1000 PF Dual Port Server Adapter - Intel(R) PRO/1000 PT Quad Port Server Adapter - - NAPI - ---- - NAPI (Rx polling mode) is enabled in the e1000 driver. - - See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI. - - -Known Issues -============ - -Dropped Receive Packets on Half-duplex 10/100 Networks ------------------------------------------------------- -If you have an Intel PCI Express adapter running at 10mbps or 100mbps, half- -duplex, you may observe occasional dropped receive packets. There are no -workarounds for this problem in this network configuration. The network must -be updated to operate in full-duplex, and/or 1000mbps only. - -Jumbo Frames System Requirement -------------------------------- -Memory allocation failures have been observed on Linux systems with 64 MB -of RAM or less that are running Jumbo Frames. If you are using Jumbo -Frames, your system may require more than the advertised minimum -requirement of 64 MB of system memory. - -Performance Degradation with Jumbo Frames ------------------------------------------ -Degradation in throughput performance may be observed in some Jumbo frames -environments. If this is observed, increasing the application's socket -buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values -may help. See the specific application manual and -/usr/src/linux*/Documentation/ -networking/ip-sysctl.txt for more details. - -Jumbo Frames on Foundry BigIron 8000 switch -------------------------------------------- -There is a known issue using Jumbo frames when connected to a Foundry -BigIron 8000 switch. This is a 3rd party limitation. If you experience -loss of packets, lower the MTU size. - -Allocating Rx Buffers when Using Jumbo Frames ---------------------------------------------- -Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if -the available memory is heavily fragmented. This issue may be seen with PCI-X -adapters or with packet split disabled. This can be reduced or eliminated -by changing the amount of available memory for receive buffer allocation, by -increasing /proc/sys/vm/min_free_kbytes. - -Multiple Interfaces on Same Ethernet Broadcast Network ------------------------------------------------------- -Due to the default ARP behavior on Linux, it is not possible to have -one system on two IP networks in the same Ethernet broadcast domain -(non-partitioned switch) behave as expected. All Ethernet interfaces -will respond to IP traffic for any IP address assigned to the system. -This results in unbalanced receive traffic. - -If you have multiple interfaces in a server, either turn on ARP -filtering by entering: - - echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter -(this only works if your kernel's version is higher than 2.4.5), - -NOTE: This setting is not saved across reboots. The configuration -change can be made permanent by adding the line: - net.ipv4.conf.all.arp_filter = 1 -to the file /etc/sysctl.conf - - or, - -install the interfaces in separate broadcast domains (either in -different switches or in a switch partitioned to VLANs). - -82541/82547 can't link or are slow to link with some link partners ------------------------------------------------------------------ -There is a known compatibility issue with 82541/82547 and some -low-end switches where the link will not be established, or will -be slow to establish. In particular, these switches are known to -be incompatible with 82541/82547: - - Planex FXG-08TE - I-O Data ETG-SH8 - -To workaround this issue, the driver can be compiled with an override -of the PHY's master/slave setting. Forcing master or forcing slave -mode will improve time-to-link. - - # make CFLAGS_EXTRA=-DE1000_MASTER_SLAVE=<n> - -Where <n> is: - - 0 = Hardware default - 1 = Master mode - 2 = Slave mode - 3 = Auto master/slave - -Disable rx flow control with ethtool ------------------------------------- -In order to disable receive flow control using ethtool, you must turn -off auto-negotiation on the same command line. - -For example: - - ethtool -A eth? autoneg off rx off - -Unplugging network cable while ethtool -p is running ----------------------------------------------------- -In kernel versions 2.5.50 and later (including 2.6 kernel), unplugging -the network cable while ethtool -p is running will cause the system to -become unresponsive to keyboard commands, except for control-alt-delete. -Restarting the system appears to be the only remedy. - - Support ======= diff --git a/Documentation/networking/e1000e.txt b/Documentation/networking/e1000e.txt new file mode 100644 index 000000000000..6aa048badf32 --- /dev/null +++ b/Documentation/networking/e1000e.txt @@ -0,0 +1,302 @@ +Linux* Driver for Intel(R) Network Connection +=============================================================== + +Intel Gigabit Linux driver. +Copyright(c) 1999 - 2010 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Command Line Parameters +- Additional Configurations +- Support + +Identifying Your Adapter +======================== + +The e1000e driver supports all PCI Express Intel(R) Gigabit Network +Connections, except those that are 82575, 82576 and 82580-based*. + +* NOTE: The Intel(R) PRO/1000 P Dual Port Server Adapter is supported by + the e1000 driver, not the e1000e driver due to the 82546 part being used + behind a PCI Express bridge. + +For more information on how to identify your adapter, go to the Adapter & +Driver ID Guide at: + + http://support.intel.com/support/go/network/adapter/idguide.htm + +For the latest Intel network drivers for Linux, refer to the following +website. In the search field, enter your adapter name or type, or use the +networking link on the left to search for your adapter: + + http://support.intel.com/support/go/network/adapter/home.htm + +Command Line Parameters +======================= + +The default value for each parameter is generally the recommended setting, +unless otherwise noted. + +NOTES: For more information about the InterruptThrottleRate, + RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay + parameters, see the application note at: + http://www.intel.com/design/network/applnots/ap450.htm + +InterruptThrottleRate +--------------------- +Valid Range: 0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative, + 4=simplified balancing) +Default Value: 3 + +The driver can limit the amount of interrupts per second that the adapter +will generate for incoming packets. It does this by writing a value to the +adapter that is based on the maximum amount of interrupts that the adapter +will generate per second. + +Setting InterruptThrottleRate to a value greater or equal to 100 +will program the adapter to send out a maximum of that many interrupts +per second, even if more packets have come in. This reduces interrupt +load on the system and can lower CPU utilization under heavy load, +but will increase latency as packets are not processed as quickly. + +The driver has two adaptive modes (setting 1 or 3) in which +it dynamically adjusts the InterruptThrottleRate value based on the traffic +that it receives. After determining the type of incoming traffic in the last +timeframe, it will adjust the InterruptThrottleRate to an appropriate value +for that traffic. + +The algorithm classifies the incoming traffic every interval into +classes. Once the class is determined, the InterruptThrottleRate value is +adjusted to suit that traffic type the best. There are three classes defined: +"Bulk traffic", for large amounts of packets of normal size; "Low latency", +for small amounts of traffic and/or a significant percentage of small +packets; and "Lowest latency", for almost completely small packets or +minimal traffic. + +In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 +for traffic that falls in class "Bulk traffic". If traffic falls in the "Low +latency" or "Lowest latency" class, the InterruptThrottleRate is increased +stepwise to 20000. This default mode is suitable for most applications. + +For situations where low latency is vital such as cluster or +grid computing, the algorithm can reduce latency even more when +InterruptThrottleRate is set to mode 1. In this mode, which operates +the same as mode 3, the InterruptThrottleRate will be increased stepwise to +70000 for traffic in class "Lowest latency". + +In simplified mode the interrupt rate is based on the ratio of Tx and +Rx traffic. If the bytes per second rate is approximately equal the +interrupt rate will drop as low as 2000 interrupts per second. If the +traffic is mostly transmit or mostly receive, the interrupt rate could +be as high as 8000. + +Setting InterruptThrottleRate to 0 turns off any interrupt moderation +and may improve small packet latency, but is generally not suitable +for bulk throughput traffic. + +NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and + RxAbsIntDelay parameters. In other words, minimizing the receive + and/or transmit absolute delays does not force the controller to + generate more interrupts than what the Interrupt Throttle Rate + allows. + +NOTE: When e1000e is loaded with default settings and multiple adapters + are in use simultaneously, the CPU utilization may increase non- + linearly. In order to limit the CPU utilization without impacting + the overall throughput, we recommend that you load the driver as + follows: + + modprobe e1000e InterruptThrottleRate=3000,3000,3000 + + This sets the InterruptThrottleRate to 3000 interrupts/sec for + the first, second, and third instances of the driver. The range + of 2000 to 3000 interrupts per second works on a majority of + systems and is a good starting point, but the optimal value will + be platform-specific. If CPU utilization is not a concern, use + RX_POLLING (NAPI) and default driver settings. + +RxIntDelay +---------- +Valid Range: 0-65535 (0=off) +Default Value: 0 + +This value delays the generation of receive interrupts in units of 1.024 +microseconds. Receive interrupt reduction can improve CPU efficiency if +properly tuned for specific network traffic. Increasing this value adds +extra latency to frame reception and can end up decreasing the throughput +of TCP traffic. If the system is reporting dropped receives, this value +may be set too high, causing the driver to run out of available receive +descriptors. + +CAUTION: When setting RxIntDelay to a value other than 0, adapters may + hang (stop transmitting) under certain network conditions. If + this occurs a NETDEV WATCHDOG message is logged in the system + event log. In addition, the controller is automatically reset, + restoring the network connection. To eliminate the potential + for the hang ensure that RxIntDelay is set to 0. + +RxAbsIntDelay +------------- +Valid Range: 0-65535 (0=off) +Default Value: 8 + +This value, in units of 1.024 microseconds, limits the delay in which a +receive interrupt is generated. Useful only if RxIntDelay is non-zero, +this value ensures that an interrupt is generated after the initial +packet is received within the set amount of time. Proper tuning, +along with RxIntDelay, may improve traffic throughput in specific network +conditions. + +TxIntDelay +---------- +Valid Range: 0-65535 (0=off) +Default Value: 8 + +This value delays the generation of transmit interrupts in units of +1.024 microseconds. Transmit interrupt reduction can improve CPU +efficiency if properly tuned for specific network traffic. If the +system is reporting dropped transmits, this value may be set too high +causing the driver to run out of available transmit descriptors. + +TxAbsIntDelay +------------- +Valid Range: 0-65535 (0=off) +Default Value: 32 + +This value, in units of 1.024 microseconds, limits the delay in which a +transmit interrupt is generated. Useful only if TxIntDelay is non-zero, +this value ensures that an interrupt is generated after the initial +packet is sent on the wire within the set amount of time. Proper tuning, +along with TxIntDelay, may improve traffic throughput in specific +network conditions. + +Copybreak +--------- +Valid Range: 0-xxxxxxx (0=off) +Default Value: 256 + +Driver copies all packets below or equaling this size to a fresh Rx +buffer before handing it up the stack. + +This parameter is different than other parameters, in that it is a +single (not 1,1,1 etc.) parameter applied to all driver instances and +it is also available during runtime at +/sys/module/e1000e/parameters/copybreak + +SmartPowerDownEnable +-------------------- +Valid Range: 0-1 +Default Value: 0 (disabled) + +Allows PHY to turn off in lower power states. The user can set this parameter +in supported chipsets. + +KumeranLockLoss +--------------- +Valid Range: 0-1 +Default Value: 1 (enabled) + +This workaround skips resetting the PHY at shutdown for the initial +silicon releases of ICH8 systems. + +IntMode +------- +Valid Range: 0-2 (0=legacy, 1=MSI, 2=MSI-X) +Default Value: 2 + +Allows changing the interrupt mode at module load time, without requiring a +recompile. If the driver load fails to enable a specific interrupt mode, the +driver will try other interrupt modes, from least to most compatible. The +interrupt order is MSI-X, MSI, Legacy. If specifying MSI (IntMode=1) +interrupts, only MSI and Legacy will be attempted. + +CrcStripping +------------ +Valid Range: 0-1 +Default Value: 1 (enabled) + +Strip the CRC from received packets before sending up the network stack. If +you have a machine with a BMC enabled but cannot receive IPMI traffic after +loading or enabling the driver, try disabling this feature. + +WriteProtectNVM +--------------- +Valid Range: 0-1 +Default Value: 1 (enabled) + +Set the hardware to ignore all write/erase cycles to the GbE region in the +ICHx NVM (non-volatile memory). This feature can be disabled by the +WriteProtectNVM module parameter (enabled by default) only after a hardware +reset, but the machine must be power cycled before trying to enable writes. + +Note: the kernel boot option iomem=relaxed may need to be set if the kernel +config option CONFIG_STRICT_DEVMEM=y, if the root user wants to write the +NVM from user space via ethtool. + +Additional Configurations +========================= + + Jumbo Frames + ------------ + Jumbo Frames support is enabled by changing the MTU to a value larger than + the default of 1500. Use the ifconfig command to increase the MTU size. + For example: + + ifconfig eth<x> mtu 9000 up + + This setting is not saved across reboots. + + Notes: + + - The maximum MTU setting for Jumbo Frames is 9216. This value coincides + with the maximum Jumbo Frames size of 9234 bytes. + + - Using Jumbo Frames at 10 or 100 Mbps is not supported and may result in + poor performance or loss of link. + + - Some adapters limit Jumbo Frames sized packets to a maximum of + 4096 bytes and some adapters do not support Jumbo Frames. + + + Ethtool + ------- + The driver utilizes the ethtool interface for driver configuration and + diagnostics, as well as displaying statistical information. We + strongly recommend downloading the latest version of Ethtool at: + + http://sourceforge.net/projects/gkernel. + + Speed and Duplex + ---------------- + Speed and Duplex are configured through the Ethtool* utility. For + instructions, refer to the Ethtool man page. + + Enabling Wake on LAN* (WoL) + --------------------------- + WoL is configured through the Ethtool* utility. For instructions on + enabling WoL with Ethtool, refer to the Ethtool man page. + + WoL will be enabled on the system during the next shut down or reboot. + For this driver version, in order to enable WoL, the e1000e driver must be + loaded when shutting down or rebooting the system. + + In most cases Wake On LAN is only supported on port A for multiple port + adapters. To verify if a port supports Wake on LAN run ethtool eth<X>. + + +Support +======= + +For general information, go to the Intel support website at: + + www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + + http://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on the supported +kernel with a supported adapter, email the specific information related +to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/ixgbevf.txt b/Documentation/networking/ixgbevf.txt index 19015de6725f..21dd5d15b6b4 100755..100644 --- a/Documentation/networking/ixgbevf.txt +++ b/Documentation/networking/ixgbevf.txt @@ -1,19 +1,16 @@ Linux* Base Driver for Intel(R) Network Connection ================================================== -November 24, 2009 +Intel Gigabit Linux driver. +Copyright(c) 1999 - 2010 Intel Corporation. Contents ======== -- In This Release - Identifying Your Adapter - Known Issues/Troubleshooting - Support -In This Release -=============== - This file describes the ixgbevf Linux* Base Driver for Intel Network Connection. @@ -33,7 +30,7 @@ Identifying Your Adapter For more information on how to identify your adapter, go to the Adapter & Driver ID Guide at: - http://support.intel.com/support/network/sb/CS-008441.htm + http://support.intel.com/support/go/network/adapter/idguide.htm Known Issues/Troubleshooting ============================ @@ -57,34 +54,3 @@ or the Intel Wired Networking project hosted by Sourceforge at: If an issue is identified with the released source code on the supported kernel with a supported adapter, email the specific information related to the issue to e1000-devel@lists.sf.net - -License -======= - -Intel 10 Gigabit Linux driver. -Copyright(c) 1999 - 2009 Intel Corporation. - -This program is free software; you can redistribute it and/or modify it -under the terms and conditions of the GNU General Public License, -version 2, as published by the Free Software Foundation. - -This program is distributed in the hope it will be useful, but WITHOUT -ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or -FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for -more details. - -You should have received a copy of the GNU General Public License along with -this program; if not, write to the Free Software Foundation, Inc., -51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. - -The full GNU General Public License is included in this distribution in -the file called "COPYING". - -Trademarks -========== - -Intel, Itanium, and Pentium are trademarks or registered trademarks of -Intel Corporation or its subsidiaries in the United States and other -countries. - -* Other names and brands may be claimed as the property of others. diff --git a/Documentation/power/regulator/overview.txt b/Documentation/power/regulator/overview.txt index 9363e056188a..8ed17587a74b 100644 --- a/Documentation/power/regulator/overview.txt +++ b/Documentation/power/regulator/overview.txt @@ -13,7 +13,7 @@ regulators (where voltage output is controllable) and current sinks (where current limit is controllable). (C) 2008 Wolfson Microelectronics PLC. -Author: Liam Girdwood <lg@opensource.wolfsonmicro.com> +Author: Liam Girdwood <lrg@slimlogic.co.uk> Nomenclature diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt index 568fa08e82e5..302db5da49b3 100644 --- a/Documentation/powerpc/booting-without-of.txt +++ b/Documentation/powerpc/booting-without-of.txt @@ -49,40 +49,13 @@ Table of Contents f) MDIO on GPIOs g) SPI busses - VII - Marvell Discovery mv64[345]6x System Controller chips - 1) The /system-controller node - 2) Child nodes of /system-controller - a) Marvell Discovery MDIO bus - b) Marvell Discovery ethernet controller - c) Marvell Discovery PHY nodes - d) Marvell Discovery SDMA nodes - e) Marvell Discovery BRG nodes - f) Marvell Discovery CUNIT nodes - g) Marvell Discovery MPSCROUTING nodes - h) Marvell Discovery MPSCINTR nodes - i) Marvell Discovery MPSC nodes - j) Marvell Discovery Watch Dog Timer nodes - k) Marvell Discovery I2C nodes - l) Marvell Discovery PIC (Programmable Interrupt Controller) nodes - m) Marvell Discovery MPP (Multipurpose Pins) multiplexing nodes - n) Marvell Discovery GPP (General Purpose Pins) nodes - o) Marvell Discovery PCI host bridge node - p) Marvell Discovery CPU Error nodes - q) Marvell Discovery SRAM Controller nodes - r) Marvell Discovery PCI Error Handler nodes - s) Marvell Discovery Memory Controller nodes - - VIII - Specifying interrupt information for devices + VII - Specifying interrupt information for devices 1) interrupts property 2) interrupt-parent property 3) OpenPIC Interrupt Controllers 4) ISA Interrupt Controllers - IX - Specifying GPIO information for devices - 1) gpios property - 2) gpio-controller nodes - - X - Specifying device power management information (sleep property) + VIII - Specifying device power management information (sleep property) Appendix A - Sample SOC node for MPC8540 diff --git a/Documentation/powerpc/hvcs.txt b/Documentation/powerpc/hvcs.txt index f93462c5db25..6d8be3468d7d 100644 --- a/Documentation/powerpc/hvcs.txt +++ b/Documentation/powerpc/hvcs.txt @@ -560,7 +560,7 @@ The proper channel for reporting bugs is either through the Linux OS distribution company that provided your OS or by posting issues to the PowerPC development mailing list at: -linuxppc-dev@ozlabs.org +linuxppc-dev@lists.ozlabs.org This request is to provide a documented and searchable public exchange of the problems and solutions surrounding this driver for the benefit of diff --git a/Documentation/scsi/scsi-parameters.txt b/Documentation/scsi/scsi-parameters.txt new file mode 100644 index 000000000000..21e5798526ee --- /dev/null +++ b/Documentation/scsi/scsi-parameters.txt @@ -0,0 +1,139 @@ + SCSI Kernel Parameters + ~~~~~~~~~~~~~~~~~~~~~~ + +See Documentation/kernel-parameters.txt for general information on +specifying module parameters. + +This document may not be entirely up to date and comprehensive. The command +"modinfo -p ${modulename}" shows a current list of all parameters of a loadable +module. Loadable modules, after being loaded into the running kernel, also +reveal their parameters in /sys/module/${modulename}/parameters/. Some of these +parameters may be changed at runtime by the command +"echo -n ${value} > /sys/module/${modulename}/parameters/${parm}". + + + advansys= [HW,SCSI] + See header of drivers/scsi/advansys.c. + + aha152x= [HW,SCSI] + See Documentation/scsi/aha152x.txt. + + aha1542= [HW,SCSI] + Format: <portbase>[,<buson>,<busoff>[,<dmaspeed>]] + + aic7xxx= [HW,SCSI] + See Documentation/scsi/aic7xxx.txt. + + aic79xx= [HW,SCSI] + See Documentation/scsi/aic79xx.txt. + + atascsi= [HW,SCSI] Atari SCSI + + BusLogic= [HW,SCSI] + See drivers/scsi/BusLogic.c, comment before function + BusLogic_ParseDriverOptions(). + + dtc3181e= [HW,SCSI] + + eata= [HW,SCSI] + + fd_mcs= [HW,SCSI] + See header of drivers/scsi/fd_mcs.c. + + fdomain= [HW,SCSI] + See header of drivers/scsi/fdomain.c. + + gdth= [HW,SCSI] + See header of drivers/scsi/gdth.c. + + gvp11= [HW,SCSI] + + ibmmcascsi= [HW,MCA,SCSI] IBM MicroChannel SCSI adapter + See Documentation/mca.txt. + + in2000= [HW,SCSI] + See header of drivers/scsi/in2000.c. + + ips= [HW,SCSI] Adaptec / IBM ServeRAID controller + See header of drivers/scsi/ips.c. + + mac5380= [HW,SCSI] Format: + <can_queue>,<cmd_per_lun>,<sg_tablesize>,<hostid>,<use_tags> + + max_luns= [SCSI] Maximum number of LUNs to probe. + Should be between 1 and 2^32-1. + + max_report_luns= + [SCSI] Maximum number of LUNs received. + Should be between 1 and 16384. + + NCR_D700= [HW,SCSI] + See header of drivers/scsi/NCR_D700.c. + + ncr5380= [HW,SCSI] + + ncr53c400= [HW,SCSI] + + ncr53c400a= [HW,SCSI] + + ncr53c406a= [HW,SCSI] + + ncr53c8xx= [HW,SCSI] + + nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects. + + osst= [HW,SCSI] SCSI Tape Driver + Format: <buffer_size>,<write_threshold> + See also Documentation/scsi/st.txt. + + pas16= [HW,SCSI] + See header of drivers/scsi/pas16.c. + + scsi_debug_*= [SCSI] + See drivers/scsi/scsi_debug.c. + + scsi_default_dev_flags= + [SCSI] SCSI default device flags + Format: <integer> + + scsi_dev_flags= [SCSI] Black/white list entry for vendor and model + Format: <vendor>:<model>:<flags> + (flags are integer value) + + scsi_logging_level= [SCSI] a bit mask of logging levels + See drivers/scsi/scsi_logging.h for bits. Also + settable via sysctl at dev.scsi.logging_level + (/proc/sys/dev/scsi/logging_level). + There is also a nice 'scsi_logging_level' script in the + S390-tools package, available for download at + http://www-128.ibm.com/developerworks/linux/linux390/s390-tools-1.5.4.html + + scsi_mod.scan= [SCSI] sync (default) scans SCSI busses as they are + discovered. async scans them in kernel threads, + allowing boot to proceed. none ignores them, expecting + user space to do the scan. + + sim710= [SCSI,HW] + See header of drivers/scsi/sim710.c. + + st= [HW,SCSI] SCSI tape parameters (buffers, etc.) + See Documentation/scsi/st.txt. + + sym53c416= [HW,SCSI] + See header of drivers/scsi/sym53c416.c. + + t128= [HW,SCSI] + See header of drivers/scsi/t128.c. + + tmscsim= [HW,SCSI] + See comment before function dc390_setup() in + drivers/scsi/tmscsim.c. + + u14-34f= [HW,SCSI] UltraStor 14F/34F SCSI host adapter + See header of drivers/scsi/u14-34f.c. + + wd33c93= [HW,SCSI] + See header of drivers/scsi/wd33c93.c. + + wd7000= [HW,SCSI] + See header of drivers/scsi/wd7000.c. diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt index ce46fa1e643e..37c6aad5e590 100644 --- a/Documentation/sound/alsa/HD-Audio-Models.txt +++ b/Documentation/sound/alsa/HD-Audio-Models.txt @@ -296,6 +296,7 @@ Conexant 5051 Conexant 5066 ============= laptop Basic Laptop config (default) + hp-laptop HP laptops, e g G60 dell-laptop Dell laptops dell-vostro Dell Vostro olpc-xo-1_5 OLPC XO 1.5 diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index ccd951fa94ee..cc96ee2666f2 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -478,7 +478,7 @@ static void prepare_hwpoison_fd(void) } if (opt_unpoison && !hwpoison_forget_fd) { - sprintf(buf, "%s/renew-pfn", hwpoison_debug_fs); + sprintf(buf, "%s/unpoison-pfn", hwpoison_debug_fs); hwpoison_forget_fd = checked_open(buf, O_WRONLY); } } diff --git a/Documentation/workqueue.txt b/Documentation/workqueue.txt new file mode 100644 index 000000000000..e4498a2872c3 --- /dev/null +++ b/Documentation/workqueue.txt @@ -0,0 +1,380 @@ + +Concurrency Managed Workqueue (cmwq) + +September, 2010 Tejun Heo <tj@kernel.org> + Florian Mickler <florian@mickler.org> + +CONTENTS + +1. Introduction +2. Why cmwq? +3. The Design +4. Application Programming Interface (API) +5. Example Execution Scenarios +6. Guidelines + + +1. Introduction + +There are many cases where an asynchronous process execution context +is needed and the workqueue (wq) API is the most commonly used +mechanism for such cases. + +When such an asynchronous execution context is needed, a work item +describing which function to execute is put on a queue. An +independent thread serves as the asynchronous execution context. The +queue is called workqueue and the thread is called worker. + +While there are work items on the workqueue the worker executes the +functions associated with the work items one after the other. When +there is no work item left on the workqueue the worker becomes idle. +When a new work item gets queued, the worker begins executing again. + + +2. Why cmwq? + +In the original wq implementation, a multi threaded (MT) wq had one +worker thread per CPU and a single threaded (ST) wq had one worker +thread system-wide. A single MT wq needed to keep around the same +number of workers as the number of CPUs. The kernel grew a lot of MT +wq users over the years and with the number of CPU cores continuously +rising, some systems saturated the default 32k PID space just booting +up. + +Although MT wq wasted a lot of resource, the level of concurrency +provided was unsatisfactory. The limitation was common to both ST and +MT wq albeit less severe on MT. Each wq maintained its own separate +worker pool. A MT wq could provide only one execution context per CPU +while a ST wq one for the whole system. Work items had to compete for +those very limited execution contexts leading to various problems +including proneness to deadlocks around the single execution context. + +The tension between the provided level of concurrency and resource +usage also forced its users to make unnecessary tradeoffs like libata +choosing to use ST wq for polling PIOs and accepting an unnecessary +limitation that no two polling PIOs can progress at the same time. As +MT wq don't provide much better concurrency, users which require +higher level of concurrency, like async or fscache, had to implement +their own thread pool. + +Concurrency Managed Workqueue (cmwq) is a reimplementation of wq with +focus on the following goals. + +* Maintain compatibility with the original workqueue API. + +* Use per-CPU unified worker pools shared by all wq to provide + flexible level of concurrency on demand without wasting a lot of + resource. + +* Automatically regulate worker pool and level of concurrency so that + the API users don't need to worry about such details. + + +3. The Design + +In order to ease the asynchronous execution of functions a new +abstraction, the work item, is introduced. + +A work item is a simple struct that holds a pointer to the function +that is to be executed asynchronously. Whenever a driver or subsystem +wants a function to be executed asynchronously it has to set up a work +item pointing to that function and queue that work item on a +workqueue. + +Special purpose threads, called worker threads, execute the functions +off of the queue, one after the other. If no work is queued, the +worker threads become idle. These worker threads are managed in so +called thread-pools. + +The cmwq design differentiates between the user-facing workqueues that +subsystems and drivers queue work items on and the backend mechanism +which manages thread-pool and processes the queued work items. + +The backend is called gcwq. There is one gcwq for each possible CPU +and one gcwq to serve work items queued on unbound workqueues. + +Subsystems and drivers can create and queue work items through special +workqueue API functions as they see fit. They can influence some +aspects of the way the work items are executed by setting flags on the +workqueue they are putting the work item on. These flags include +things like CPU locality, reentrancy, concurrency limits and more. To +get a detailed overview refer to the API description of +alloc_workqueue() below. + +When a work item is queued to a workqueue, the target gcwq is +determined according to the queue parameters and workqueue attributes +and appended on the shared worklist of the gcwq. For example, unless +specifically overridden, a work item of a bound workqueue will be +queued on the worklist of exactly that gcwq that is associated to the +CPU the issuer is running on. + +For any worker pool implementation, managing the concurrency level +(how many execution contexts are active) is an important issue. cmwq +tries to keep the concurrency at a minimal but sufficient level. +Minimal to save resources and sufficient in that the system is used at +its full capacity. + +Each gcwq bound to an actual CPU implements concurrency management by +hooking into the scheduler. The gcwq is notified whenever an active +worker wakes up or sleeps and keeps track of the number of the +currently runnable workers. Generally, work items are not expected to +hog a CPU and consume many cycles. That means maintaining just enough +concurrency to prevent work processing from stalling should be +optimal. As long as there are one or more runnable workers on the +CPU, the gcwq doesn't start execution of a new work, but, when the +last running worker goes to sleep, it immediately schedules a new +worker so that the CPU doesn't sit idle while there are pending work +items. This allows using a minimal number of workers without losing +execution bandwidth. + +Keeping idle workers around doesn't cost other than the memory space +for kthreads, so cmwq holds onto idle ones for a while before killing +them. + +For an unbound wq, the above concurrency management doesn't apply and +the gcwq for the pseudo unbound CPU tries to start executing all work +items as soon as possible. The responsibility of regulating +concurrency level is on the users. There is also a flag to mark a +bound wq to ignore the concurrency management. Please refer to the +API section for details. + +Forward progress guarantee relies on that workers can be created when +more execution contexts are necessary, which in turn is guaranteed +through the use of rescue workers. All work items which might be used +on code paths that handle memory reclaim are required to be queued on +wq's that have a rescue-worker reserved for execution under memory +pressure. Else it is possible that the thread-pool deadlocks waiting +for execution contexts to free up. + + +4. Application Programming Interface (API) + +alloc_workqueue() allocates a wq. The original create_*workqueue() +functions are deprecated and scheduled for removal. alloc_workqueue() +takes three arguments - @name, @flags and @max_active. @name is the +name of the wq and also used as the name of the rescuer thread if +there is one. + +A wq no longer manages execution resources but serves as a domain for +forward progress guarantee, flush and work item attributes. @flags +and @max_active control how work items are assigned execution +resources, scheduled and executed. + +@flags: + + WQ_NON_REENTRANT + + By default, a wq guarantees non-reentrance only on the same + CPU. A work item may not be executed concurrently on the same + CPU by multiple workers but is allowed to be executed + concurrently on multiple CPUs. This flag makes sure + non-reentrance is enforced across all CPUs. Work items queued + to a non-reentrant wq are guaranteed to be executed by at most + one worker system-wide at any given time. + + WQ_UNBOUND + + Work items queued to an unbound wq are served by a special + gcwq which hosts workers which are not bound to any specific + CPU. This makes the wq behave as a simple execution context + provider without concurrency management. The unbound gcwq + tries to start execution of work items as soon as possible. + Unbound wq sacrifices locality but is useful for the following + cases. + + * Wide fluctuation in the concurrency level requirement is + expected and using bound wq may end up creating large number + of mostly unused workers across different CPUs as the issuer + hops through different CPUs. + + * Long running CPU intensive workloads which can be better + managed by the system scheduler. + + WQ_FREEZEABLE + + A freezeable wq participates in the freeze phase of the system + suspend operations. Work items on the wq are drained and no + new work item starts execution until thawed. + + WQ_RESCUER + + All wq which might be used in the memory reclaim paths _MUST_ + have this flag set. This reserves one worker exclusively for + the execution of this wq under memory pressure. + + WQ_HIGHPRI + + Work items of a highpri wq are queued at the head of the + worklist of the target gcwq and start execution regardless of + the current concurrency level. In other words, highpri work + items will always start execution as soon as execution + resource is available. + + Ordering among highpri work items is preserved - a highpri + work item queued after another highpri work item will start + execution after the earlier highpri work item starts. + + Although highpri work items are not held back by other + runnable work items, they still contribute to the concurrency + level. Highpri work items in runnable state will prevent + non-highpri work items from starting execution. + + This flag is meaningless for unbound wq. + + WQ_CPU_INTENSIVE + + Work items of a CPU intensive wq do not contribute to the + concurrency level. In other words, runnable CPU intensive + work items will not prevent other work items from starting + execution. This is useful for bound work items which are + expected to hog CPU cycles so that their execution is + regulated by the system scheduler. + + Although CPU intensive work items don't contribute to the + concurrency level, start of their executions is still + regulated by the concurrency management and runnable + non-CPU-intensive work items can delay execution of CPU + intensive work items. + + This flag is meaningless for unbound wq. + + WQ_HIGHPRI | WQ_CPU_INTENSIVE + + This combination makes the wq avoid interaction with + concurrency management completely and behave as a simple + per-CPU execution context provider. Work items queued on a + highpri CPU-intensive wq start execution as soon as resources + are available and don't affect execution of other work items. + +@max_active: + +@max_active determines the maximum number of execution contexts per +CPU which can be assigned to the work items of a wq. For example, +with @max_active of 16, at most 16 work items of the wq can be +executing at the same time per CPU. + +Currently, for a bound wq, the maximum limit for @max_active is 512 +and the default value used when 0 is specified is 256. For an unbound +wq, the limit is higher of 512 and 4 * num_possible_cpus(). These +values are chosen sufficiently high such that they are not the +limiting factor while providing protection in runaway cases. + +The number of active work items of a wq is usually regulated by the +users of the wq, more specifically, by how many work items the users +may queue at the same time. Unless there is a specific need for +throttling the number of active work items, specifying '0' is +recommended. + +Some users depend on the strict execution ordering of ST wq. The +combination of @max_active of 1 and WQ_UNBOUND is used to achieve this +behavior. Work items on such wq are always queued to the unbound gcwq +and only one work item can be active at any given time thus achieving +the same ordering property as ST wq. + + +5. Example Execution Scenarios + +The following example execution scenarios try to illustrate how cmwq +behave under different configurations. + + Work items w0, w1, w2 are queued to a bound wq q0 on the same CPU. + w0 burns CPU for 5ms then sleeps for 10ms then burns CPU for 5ms + again before finishing. w1 and w2 burn CPU for 5ms then sleep for + 10ms. + +Ignoring all other tasks, works and processing overhead, and assuming +simple FIFO scheduling, the following is one highly simplified version +of possible sequences of events with the original wq. + + TIME IN MSECS EVENT + 0 w0 starts and burns CPU + 5 w0 sleeps + 15 w0 wakes up and burns CPU + 20 w0 finishes + 20 w1 starts and burns CPU + 25 w1 sleeps + 35 w1 wakes up and finishes + 35 w2 starts and burns CPU + 40 w2 sleeps + 50 w2 wakes up and finishes + +And with cmwq with @max_active >= 3, + + TIME IN MSECS EVENT + 0 w0 starts and burns CPU + 5 w0 sleeps + 5 w1 starts and burns CPU + 10 w1 sleeps + 10 w2 starts and burns CPU + 15 w2 sleeps + 15 w0 wakes up and burns CPU + 20 w0 finishes + 20 w1 wakes up and finishes + 25 w2 wakes up and finishes + +If @max_active == 2, + + TIME IN MSECS EVENT + 0 w0 starts and burns CPU + 5 w0 sleeps + 5 w1 starts and burns CPU + 10 w1 sleeps + 15 w0 wakes up and burns CPU + 20 w0 finishes + 20 w1 wakes up and finishes + 20 w2 starts and burns CPU + 25 w2 sleeps + 35 w2 wakes up and finishes + +Now, let's assume w1 and w2 are queued to a different wq q1 which has +WQ_HIGHPRI set, + + TIME IN MSECS EVENT + 0 w1 and w2 start and burn CPU + 5 w1 sleeps + 10 w2 sleeps + 10 w0 starts and burns CPU + 15 w0 sleeps + 15 w1 wakes up and finishes + 20 w2 wakes up and finishes + 25 w0 wakes up and burns CPU + 30 w0 finishes + +If q1 has WQ_CPU_INTENSIVE set, + + TIME IN MSECS EVENT + 0 w0 starts and burns CPU + 5 w0 sleeps + 5 w1 and w2 start and burn CPU + 10 w1 sleeps + 15 w2 sleeps + 15 w0 wakes up and burns CPU + 20 w0 finishes + 20 w1 wakes up and finishes + 25 w2 wakes up and finishes + + +6. Guidelines + +* Do not forget to use WQ_RESCUER if a wq may process work items which + are used during memory reclaim. Each wq with WQ_RESCUER set has one + rescuer thread reserved for it. If there is dependency among + multiple work items used during memory reclaim, they should be + queued to separate wq each with WQ_RESCUER. + +* Unless strict ordering is required, there is no need to use ST wq. + +* Unless there is a specific need, using 0 for @max_active is + recommended. In most use cases, concurrency level usually stays + well under the default limit. + +* A wq serves as a domain for forward progress guarantee (WQ_RESCUER), + flush and work item attributes. Work items which are not involved + in memory reclaim and don't need to be flushed as a part of a group + of work items, and don't require any special attribute, can use one + of the system wq. There is no difference in execution + characteristics between using a dedicated wq and a system wq. + +* Unless work items are expected to consume a huge amount of CPU + cycles, using a bound wq is usually beneficial due to the increased + level of locality in wq operations and work item execution. |