diff options
Diffstat (limited to 'Documentation')
113 files changed, 7513 insertions, 1663 deletions
diff --git a/Documentation/ABI/README b/Documentation/ABI/README new file mode 100644 index 000000000000..9feaf16f1617 --- /dev/null +++ b/Documentation/ABI/README @@ -0,0 +1,77 @@ +This directory attempts to document the ABI between the Linux kernel and +userspace, and the relative stability of these interfaces. Due to the +everchanging nature of Linux, and the differing maturity levels, these +interfaces should be used by userspace programs in different ways. + +We have four different levels of ABI stability, as shown by the four +different subdirectories in this location. Interfaces may change levels +of stability according to the rules described below. + +The different levels of stability are: + + stable/ + This directory documents the interfaces that the developer has + defined to be stable. Userspace programs are free to use these + interfaces with no restrictions, and backward compatibility for + them will be guaranteed for at least 2 years. Most interfaces + (like syscalls) are expected to never change and always be + available. + + testing/ + This directory documents interfaces that are felt to be stable, + as the main development of this interface has been completed. + The interface can be changed to add new features, but the + current interface will not break by doing this, unless grave + errors or security problems are found in them. Userspace + programs can start to rely on these interfaces, but they must be + aware of changes that can occur before these interfaces move to + be marked stable. Programs that use these interfaces are + strongly encouraged to add their name to the description of + these interfaces, so that the kernel developers can easily + notify them if any changes occur (see the description of the + layout of the files below for details on how to do this.) + + obsolete/ + This directory documents interfaces that are still remaining in + the kernel, but are marked to be removed at some later point in + time. The description of the interface will document the reason + why it is obsolete and when it can be expected to be removed. + The file Documentation/feature-removal-schedule.txt may describe + some of these interfaces, giving a schedule for when they will + be removed. + + removed/ + This directory contains a list of the old interfaces that have + been removed from the kernel. + +Every file in these directories will contain the following information: + +What: Short description of the interface +Date: Date created +KernelVersion: Kernel version this feature first showed up in. +Contact: Primary contact for this interface (may be a mailing list) +Description: Long description of the interface and how to use it. +Users: All users of this interface who wish to be notified when + it changes. This is very important for interfaces in + the "testing" stage, so that kernel developers can work + with userspace developers to ensure that things do not + break in ways that are unacceptable. It is also + important to get feedback for these interfaces to make + sure they are working in a proper way and do not need to + be changed further. + + +How things move between levels: + +Interfaces in stable may move to obsolete, as long as the proper +notification is given. + +Interfaces may be removed from obsolete and the kernel as long as the +documented amount of time has gone by. + +Interfaces in the testing state can move to the stable state when the +developers feel they are finished. They cannot be removed from the +kernel tree without going through the obsolete state first. + +It's up to the developer to place their interfaces in the category they +wish for it to start out in. diff --git a/Documentation/ABI/obsolete/devfs b/Documentation/ABI/obsolete/devfs new file mode 100644 index 000000000000..b8b87399bc8f --- /dev/null +++ b/Documentation/ABI/obsolete/devfs @@ -0,0 +1,13 @@ +What: devfs +Date: July 2005 +Contact: Greg Kroah-Hartman <gregkh@suse.de> +Description: + devfs has been unmaintained for a number of years, has unfixable + races, contains a naming policy within the kernel that is + against the LSB, and can be replaced by using udev. + The files fs/devfs/*, include/linux/devfs_fs*.h will be removed, + along with the the assorted devfs function calls throughout the + kernel tree. + +Users: + diff --git a/Documentation/ABI/stable/syscalls b/Documentation/ABI/stable/syscalls new file mode 100644 index 000000000000..c3ae3e7d6a0c --- /dev/null +++ b/Documentation/ABI/stable/syscalls @@ -0,0 +1,10 @@ +What: The kernel syscall interface +Description: + This interface matches much of the POSIX interface and is based + on it and other Unix based interfaces. It will only be added to + over time, and not have things removed from it. + + Note that this interface is different for every architecture + that Linux supports. Please see the architecture-specific + documentation for details on the syscall numbers that are to be + mapped to each syscall. diff --git a/Documentation/ABI/stable/sysfs-module b/Documentation/ABI/stable/sysfs-module new file mode 100644 index 000000000000..75be43118335 --- /dev/null +++ b/Documentation/ABI/stable/sysfs-module @@ -0,0 +1,30 @@ +What: /sys/module +Description: + The /sys/module tree consists of the following structure: + + /sys/module/MODULENAME + The name of the module that is in the kernel. This + module name will show up either if the module is built + directly into the kernel, or if it is loaded as a + dyanmic module. + + /sys/module/MODULENAME/parameters + This directory contains individual files that are each + individual parameters of the module that are able to be + changed at runtime. See the individual module + documentation as to the contents of these parameters and + what they accomplish. + + Note: The individual parameter names and values are not + considered stable, only the fact that they will be + placed in this location within sysfs. See the + individual driver documentation for details as to the + stability of the different parameters. + + /sys/module/MODULENAME/refcnt + If the module is able to be unloaded from the kernel, this file + will contain the current reference count of the module. + + Note: If the module is built into the kernel, or if the + CONFIG_MODULE_UNLOAD kernel configuration value is not enabled, + this file will not be present. diff --git a/Documentation/ABI/testing/sysfs-class b/Documentation/ABI/testing/sysfs-class new file mode 100644 index 000000000000..4b0cb891e46e --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class @@ -0,0 +1,16 @@ +What: /sys/class/ +Date: Febuary 2006 +Contact: Greg Kroah-Hartman <gregkh@suse.de> +Description: + The /sys/class directory will consist of a group of + subdirectories describing individual classes of devices + in the kernel. The individual directories will consist + of either subdirectories, or symlinks to other + directories. + + All programs that use this directory tree must be able + to handle both subdirectories or symlinks in order to + work properly. + +Users: + udev <linux-hotplug-devel@lists.sourceforge.net> diff --git a/Documentation/ABI/testing/sysfs-devices b/Documentation/ABI/testing/sysfs-devices new file mode 100644 index 000000000000..6a25671ee5f6 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-devices @@ -0,0 +1,25 @@ +What: /sys/devices +Date: February 2006 +Contact: Greg Kroah-Hartman <gregkh@suse.de> +Description: + The /sys/devices tree contains a snapshot of the + internal state of the kernel device tree. Devices will + be added and removed dynamically as the machine runs, + and between different kernel versions, the layout of the + devices within this tree will change. + + Please do not rely on the format of this tree because of + this. If a program wishes to find different things in + the tree, please use the /sys/class structure and rely + on the symlinks there to point to the proper location + within the /sys/devices tree of the individual devices. + Or rely on the uevent messages to notify programs of + devices being added and removed from this tree to find + the location of those devices. + + Note that sometimes not all devices along the directory + chain will have emitted uevent messages, so userspace + programs must be able to handle such occurrences. + +Users: + udev <linux-hotplug-devel@lists.sourceforge.net> diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle index ce5d2c038cf5..6d2412ec91ed 100644 --- a/Documentation/CodingStyle +++ b/Documentation/CodingStyle @@ -155,7 +155,83 @@ problem, which is called the function-growth-hormone-imbalance syndrome. See next chapter. - Chapter 5: Functions + Chapter 5: Typedefs + +Please don't use things like "vps_t". + +It's a _mistake_ to use typedef for structures and pointers. When you see a + + vps_t a; + +in the source, what does it mean? + +In contrast, if it says + + struct virtual_container *a; + +you can actually tell what "a" is. + +Lots of people think that typedefs "help readability". Not so. They are +useful only for: + + (a) totally opaque objects (where the typedef is actively used to _hide_ + what the object is). + + Example: "pte_t" etc. opaque objects that you can only access using + the proper accessor functions. + + NOTE! Opaqueness and "accessor functions" are not good in themselves. + The reason we have them for things like pte_t etc. is that there + really is absolutely _zero_ portably accessible information there. + + (b) Clear integer types, where the abstraction _helps_ avoid confusion + whether it is "int" or "long". + + u8/u16/u32 are perfectly fine typedefs, although they fit into + category (d) better than here. + + NOTE! Again - there needs to be a _reason_ for this. If something is + "unsigned long", then there's no reason to do + + typedef unsigned long myflags_t; + + but if there is a clear reason for why it under certain circumstances + might be an "unsigned int" and under other configurations might be + "unsigned long", then by all means go ahead and use a typedef. + + (c) when you use sparse to literally create a _new_ type for + type-checking. + + (d) New types which are identical to standard C99 types, in certain + exceptional circumstances. + + Although it would only take a short amount of time for the eyes and + brain to become accustomed to the standard types like 'uint32_t', + some people object to their use anyway. + + Therefore, the Linux-specific 'u8/u16/u32/u64' types and their + signed equivalents which are identical to standard types are + permitted -- although they are not mandatory in new code of your + own. + + When editing existing code which already uses one or the other set + of types, you should conform to the existing choices in that code. + + (e) Types safe for use in userspace. + + In certain structures which are visible to userspace, we cannot + require C99 types and cannot use the 'u32' form above. Thus, we + use __u32 and similar types in all structures which are shared + with userspace. + +Maybe there are other cases too, but the rule should basically be to NEVER +EVER use a typedef unless you can clearly match one of those rules. + +In general, a pointer, or a struct that has elements that can reasonably +be directly accessed should _never_ be a typedef. + + + Chapter 6: Functions Functions should be short and sweet, and do just one thing. They should fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, @@ -183,7 +259,7 @@ and it gets confused. You know you're brilliant, but maybe you'd like to understand what you did 2 weeks from now. - Chapter 6: Centralized exiting of functions + Chapter 7: Centralized exiting of functions Albeit deprecated by some people, the equivalent of the goto statement is used frequently by compilers in form of the unconditional jump instruction. @@ -220,7 +296,7 @@ out: return result; } - Chapter 7: Commenting + Chapter 8: Commenting Comments are good, but there is also a danger of over-commenting. NEVER try to explain HOW your code works in a comment: it's much better to @@ -240,7 +316,7 @@ When commenting the kernel API functions, please use the kerneldoc format. See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc for details. - Chapter 8: You've made a mess of it + Chapter 9: You've made a mess of it That's OK, we all do. You've probably been told by your long-time Unix user helper that "GNU emacs" automatically formats the C sources for @@ -288,7 +364,7 @@ re-formatting you may want to take a look at the man page. But remember: "indent" is not a fix for bad programming. - Chapter 9: Configuration-files + Chapter 10: Configuration-files For configuration options (arch/xxx/Kconfig, and all the Kconfig files), somewhat different indentation is used. @@ -313,7 +389,7 @@ support for file-systems, for instance) should be denoted (DANGEROUS), other experimental options should be denoted (EXPERIMENTAL). - Chapter 10: Data structures + Chapter 11: Data structures Data structures that have visibility outside the single-threaded environment they are created and destroyed in should always have @@ -344,7 +420,7 @@ Remember: if another thread can find your data structure, and you don't have a reference count on it, you almost certainly have a bug. - Chapter 11: Macros, Enums and RTL + Chapter 12: Macros, Enums and RTL Names of macros defining constants and labels in enums are capitalized. @@ -399,7 +475,7 @@ The cpp manual deals with macros exhaustively. The gcc internals manual also covers RTL which is used frequently with assembly language in the kernel. - Chapter 12: Printing kernel messages + Chapter 13: Printing kernel messages Kernel developers like to be seen as literate. Do mind the spelling of kernel messages to make a good impression. Do not use crippled @@ -410,7 +486,7 @@ Kernel messages do not have to be terminated with a period. Printing numbers in parentheses (%d) adds no value and should be avoided. - Chapter 13: Allocating memory + Chapter 14: Allocating memory The kernel provides the following general purpose memory allocators: kmalloc(), kzalloc(), kcalloc(), and vmalloc(). Please refer to the API @@ -429,7 +505,7 @@ from void pointer to any other pointer type is guaranteed by the C programming language. - Chapter 14: The inline disease + Chapter 15: The inline disease There appears to be a common misperception that gcc has a magic "make me faster" speedup option called "inline". While the use of inlines can be @@ -457,7 +533,7 @@ something it would have done anyway. - Chapter 15: References + Appendix I: References The C Programming Language, Second Edition by Brian W. Kernighan and Dennis M. Ritchie. @@ -481,4 +557,4 @@ Kernel CodingStyle, by greg@kroah.com at OLS 2002: http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/ -- -Last updated on 30 December 2005 by a community effort on LKML. +Last updated on 30 April 2006. diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl index ca02e04a906c..3630a0d7695f 100644 --- a/Documentation/DocBook/kernel-api.tmpl +++ b/Documentation/DocBook/kernel-api.tmpl @@ -62,6 +62,8 @@ <sect1><title>Internal Functions</title> !Ikernel/exit.c !Ikernel/signal.c +!Iinclude/linux/kthread.h +!Ekernel/kthread.c </sect1> <sect1><title>Kernel objects manipulation</title> @@ -114,9 +116,33 @@ X!Ilib/string.c </sect1> </chapter> + <chapter id="kernel-lib"> + <title>Basic Kernel Library Functions</title> + + <para> + The Linux kernel provides more basic utility functions. + </para> + + <sect1><title>Bitmap Operations</title> +!Elib/bitmap.c +!Ilib/bitmap.c + </sect1> + + <sect1><title>Command-line Parsing</title> +!Elib/cmdline.c + </sect1> + + <sect1><title>CRC Functions</title> +!Elib/crc16.c +!Elib/crc32.c +!Elib/crc-ccitt.c + </sect1> + </chapter> + <chapter id="mm"> <title>Memory Management in Linux</title> <sect1><title>The Slab Cache</title> +!Iinclude/linux/slab.h !Emm/slab.c </sect1> <sect1><title>User Space Memory Access</title> @@ -280,12 +306,13 @@ X!Ekernel/module.c <sect1><title>MTRR Handling</title> !Earch/i386/kernel/cpu/mtrr/main.c </sect1> + <sect1><title>PCI Support Library</title> !Edrivers/pci/pci.c !Edrivers/pci/pci-driver.c !Edrivers/pci/remove.c !Edrivers/pci/pci-acpi.c -<!-- kerneldoc does not understand to __devinit +<!-- kerneldoc does not understand __devinit X!Edrivers/pci/search.c --> !Edrivers/pci/msi.c @@ -314,6 +341,13 @@ X!Earch/i386/kernel/mca.c </sect1> </chapter> + <chapter id="firmware"> + <title>Firmware Interfaces</title> + <sect1><title>DMI Interfaces</title> +!Edrivers/firmware/dmi_scan.c + </sect1> + </chapter> + <chapter id="devfs"> <title>The Device File System</title> !Efs/devfs/base.c @@ -331,6 +365,18 @@ X!Earch/i386/kernel/mca.c !Esecurity/security.c </chapter> + <chapter id="audit"> + <title>Audit Interfaces</title> +!Ekernel/audit.c +!Ikernel/auditsc.c +!Ikernel/auditfilter.c + </chapter> + + <chapter id="accounting"> + <title>Accounting Framework</title> +!Ikernel/acct.c + </chapter> + <chapter id="pmfuncs"> <title>Power Management</title> !Ekernel/power/pm.c @@ -390,7 +436,6 @@ X!Edrivers/pnp/system.c </sect1> </chapter> - <chapter id="blkdev"> <title>Block Devices</title> !Eblock/ll_rw_blk.c @@ -401,6 +446,14 @@ X!Edrivers/pnp/system.c !Edrivers/char/misc.c </chapter> + <chapter id="parportdev"> + <title>Parallel Port Devices</title> +!Iinclude/linux/parport.h +!Edrivers/parport/ieee1284.c +!Edrivers/parport/share.c +!Idrivers/parport/daisy.c + </chapter> + <chapter id="viddev"> <title>Video4Linux</title> !Edrivers/media/video/videodev.c diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl index 158ffe9bfade..644c3884fab9 100644 --- a/Documentation/DocBook/kernel-locking.tmpl +++ b/Documentation/DocBook/kernel-locking.tmpl @@ -1590,7 +1590,7 @@ the amount of locking which needs to be done. <para> Our final dilemma is this: when can we actually destroy the removed element? Remember, a reader might be stepping through - this element in the list right now: it we free this element and + this element in the list right now: if we free this element and the <symbol>next</symbol> pointer changes, the reader will jump off into garbage and crash. We need to wait until we know that all the readers who were traversing the list when we deleted the diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl index f869b03929db..e97c32314541 100644 --- a/Documentation/DocBook/libata.tmpl +++ b/Documentation/DocBook/libata.tmpl @@ -169,6 +169,22 @@ void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf); </sect2> + <sect2><title>PIO data read/write</title> + <programlisting> +void (*data_xfer) (struct ata_device *, unsigned char *, unsigned int, int); + </programlisting> + + <para> +All bmdma-style drivers must implement this hook. This is the low-level +operation that actually copies the data bytes during a PIO data +transfer. +Typically the driver +will choose one of ata_pio_data_xfer_noirq(), ata_pio_data_xfer(), or +ata_mmio_data_xfer(). + </para> + + </sect2> + <sect2><title>ATA command execute</title> <programlisting> void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf); @@ -204,11 +220,10 @@ command. <programlisting> u8 (*check_status)(struct ata_port *ap); u8 (*check_altstatus)(struct ata_port *ap); -u8 (*check_err)(struct ata_port *ap); </programlisting> <para> - Reads the Status/AltStatus/Error ATA shadow register from + Reads the Status/AltStatus ATA shadow register from hardware. On some hardware, reading the Status register has the side effect of clearing the interrupt condition. Most drivers for taskfile-based hardware use @@ -269,23 +284,6 @@ void (*set_mode) (struct ata_port *ap); </sect2> - <sect2><title>Reset ATA bus</title> - <programlisting> -void (*phy_reset) (struct ata_port *ap); - </programlisting> - - <para> - The very first step in the probe phase. Actions vary depending - on the bus type, typically. After waking up the device and probing - for device presence (PATA and SATA), typically a soft reset - (SRST) will be performed. Drivers typically use the helper - functions ata_bus_reset() or sata_phy_reset() for this hook. - Many SATA drivers use sata_phy_reset() or call it from within - their own phy_reset() functions. - </para> - - </sect2> - <sect2><title>Control PCI IDE BMDMA engine</title> <programlisting> void (*bmdma_setup) (struct ata_queued_cmd *qc); @@ -354,16 +352,74 @@ int (*qc_issue) (struct ata_queued_cmd *qc); </sect2> - <sect2><title>Timeout (error) handling</title> + <sect2><title>Exception and probe handling (EH)</title> <programlisting> void (*eng_timeout) (struct ata_port *ap); +void (*phy_reset) (struct ata_port *ap); + </programlisting> + + <para> +Deprecated. Use ->error_handler() instead. + </para> + + <programlisting> +void (*freeze) (struct ata_port *ap); +void (*thaw) (struct ata_port *ap); + </programlisting> + + <para> +ata_port_freeze() is called when HSM violations or some other +condition disrupts normal operation of the port. A frozen port +is not allowed to perform any operation until the port is +thawed, which usually follows a successful reset. + </para> + + <para> +The optional ->freeze() callback can be used for freezing the port +hardware-wise (e.g. mask interrupt and stop DMA engine). If a +port cannot be frozen hardware-wise, the interrupt handler +must ack and clear interrupts unconditionally while the port +is frozen. + </para> + <para> +The optional ->thaw() callback is called to perform the opposite of ->freeze(): +prepare the port for normal operation once again. Unmask interrupts, +start DMA engine, etc. + </para> + + <programlisting> +void (*error_handler) (struct ata_port *ap); + </programlisting> + + <para> +->error_handler() is a driver's hook into probe, hotplug, and recovery +and other exceptional conditions. The primary responsibility of an +implementation is to call ata_do_eh() or ata_bmdma_drive_eh() with a set +of EH hooks as arguments: + </para> + + <para> +'prereset' hook (may be NULL) is called during an EH reset, before any other actions +are taken. + </para> + + <para> +'postreset' hook (may be NULL) is called after the EH reset is performed. Based on +existing conditions, severity of the problem, and hardware capabilities, + </para> + + <para> +Either 'softreset' (may be NULL) or 'hardreset' (may be NULL) will be +called to perform the low-level EH reset. + </para> + + <programlisting> +void (*post_internal_cmd) (struct ata_queued_cmd *qc); </programlisting> <para> -This is a high level error handling function, called from the -error handling thread, when a command times out. Most newer -hardware will implement its own error handling code here. IDE BMDMA -drivers may use the helper function ata_eng_timeout(). +Perform any hardware-specific actions necessary to finish processing +after executing a probe-time or EH-time command via ata_exec_internal(). </para> </sect2> diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 49e27cc19385..1d50cf0c905e 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -144,9 +144,47 @@ over a rather long period of time, but improvements are always welcome! whether the increased speed is worth it. 8. Although synchronize_rcu() is a bit slower than is call_rcu(), - it usually results in simpler code. So, unless update performance - is important or the updaters cannot block, synchronize_rcu() - should be used in preference to call_rcu(). + it usually results in simpler code. So, unless update + performance is critically important or the updaters cannot block, + synchronize_rcu() should be used in preference to call_rcu(). + + An especially important property of the synchronize_rcu() + primitive is that it automatically self-limits: if grace periods + are delayed for whatever reason, then the synchronize_rcu() + primitive will correspondingly delay updates. In contrast, + code using call_rcu() should explicitly limit update rate in + cases where grace periods are delayed, as failing to do so can + result in excessive realtime latencies or even OOM conditions. + + Ways of gaining this self-limiting property when using call_rcu() + include: + + a. Keeping a count of the number of data-structure elements + used by the RCU-protected data structure, including those + waiting for a grace period to elapse. Enforce a limit + on this number, stalling updates as needed to allow + previously deferred frees to complete. + + Alternatively, limit only the number awaiting deferred + free rather than the total number of elements. + + b. Limiting update rate. For example, if updates occur only + once per hour, then no explicit rate limiting is required, + unless your system is already badly broken. The dcache + subsystem takes this approach -- updates are guarded + by a global lock, limiting their rate. + + c. Trusted update -- if updates can only be done manually by + superuser or some other trusted user, then it might not + be necessary to automatically limit them. The theory + here is that superuser already has lots of ways to crash + the machine. + + d. Use call_rcu_bh() rather than call_rcu(), in order to take + advantage of call_rcu_bh()'s faster grace periods. + + e. Periodically invoke synchronize_rcu(), permitting a limited + number of updates per grace period. 9. All RCU list-traversal primitives, which include list_for_each_rcu(), list_for_each_entry_rcu(), diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index e4c38152f7f7..a4948591607d 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt @@ -7,7 +7,7 @@ The CONFIG_RCU_TORTURE_TEST config option is available for all RCU implementations. It creates an rcutorture kernel module that can be loaded to run a torture test. The test periodically outputs status messages via printk(), which can be examined via the dmesg -command (perhaps grepping for "rcutorture"). The test is started +command (perhaps grepping for "torture"). The test is started when the module is loaded, and stops when the module is unloaded. However, actually setting this config option to "y" results in the system @@ -35,6 +35,19 @@ stat_interval The number of seconds between output of torture be printed -only- when the module is unloaded, and this is the default. +shuffle_interval + The number of seconds to keep the test threads affinitied + to a particular subset of the CPUs. Used in conjunction + with test_no_idle_hz. + +test_no_idle_hz Whether or not to test the ability of RCU to operate in + a kernel that disables the scheduling-clock interrupt to + idle CPUs. Boolean parameter, "1" to test, "0" otherwise. + +torture_type The type of RCU to test: "rcu" for the rcu_read_lock() + API, "rcu_bh" for the rcu_read_lock_bh() API, and "srcu" + for the "srcu_read_lock()" API. + verbose Enable debug printk()s. Default is disabled. @@ -42,14 +55,14 @@ OUTPUT The statistics output is as follows: - rcutorture: --- Start of test: nreaders=16 stat_interval=0 verbose=0 - rcutorture: rtc: 0000000000000000 ver: 1916 tfle: 0 rta: 1916 rtaf: 0 rtf: 1915 - rcutorture: Reader Pipe: 1466408 9747 0 0 0 0 0 0 0 0 0 - rcutorture: Reader Batch: 1464477 11678 0 0 0 0 0 0 0 0 - rcutorture: Free-Block Circulation: 1915 1915 1915 1915 1915 1915 1915 1915 1915 1915 0 - rcutorture: --- End of test + rcu-torture: --- Start of test: nreaders=16 stat_interval=0 verbose=0 + rcu-torture: rtc: 0000000000000000 ver: 1916 tfle: 0 rta: 1916 rtaf: 0 rtf: 1915 + rcu-torture: Reader Pipe: 1466408 9747 0 0 0 0 0 0 0 0 0 + rcu-torture: Reader Batch: 1464477 11678 0 0 0 0 0 0 0 0 + rcu-torture: Free-Block Circulation: 1915 1915 1915 1915 1915 1915 1915 1915 1915 1915 0 + rcu-torture: --- End of test -The command "dmesg | grep rcutorture:" will extract this information on +The command "dmesg | grep torture:" will extract this information on most systems. On more esoteric configurations, it may be necessary to use other commands to access the output of the printk()s used by the RCU torture test. The printk()s use KERN_ALERT, so they should @@ -115,8 +128,9 @@ The following script may be used to torture RCU: modprobe rcutorture sleep 100 rmmod rcutorture - dmesg | grep rcutorture: + dmesg | grep torture: The output can be manually inspected for the error flag of "!!!". One could of course create a more elaborate script that automatically -checked for such errors. +checked for such errors. The "rmmod" command forces a "SUCCESS" or +"FAILURE" indication to be printk()ed. diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 07cb93b82ba9..4f41a60e5111 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -184,7 +184,17 @@ synchronize_rcu() blocking, it registers a function and argument which are invoked after all ongoing RCU read-side critical sections have completed. This callback variant is particularly useful in situations where - it is illegal to block. + it is illegal to block or where update-side performance is + critically important. + + However, the call_rcu() API should not be used lightly, as use + of the synchronize_rcu() API generally results in simpler code. + In addition, the synchronize_rcu() API has the nice property + of automatically limiting update rate should grace periods + be delayed. This property results in system resilience in face + of denial-of-service attacks. Code using call_rcu() should limit + update rate in order to gain this same sort of resilience. See + checklist.txt for some approaches to limiting the update rate. rcu_assign_pointer() @@ -790,7 +800,6 @@ RCU pointer update: RCU grace period: - synchronize_kernel (deprecated) synchronize_net synchronize_sched synchronize_rcu diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist new file mode 100644 index 000000000000..8230098da529 --- /dev/null +++ b/Documentation/SubmitChecklist @@ -0,0 +1,57 @@ +Linux Kernel patch sumbittal checklist +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Here are some basic things that developers should do if they +want to see their kernel patch submittals accepted quicker. + +These are all above and beyond the documentation that is provided +in Documentation/SubmittingPatches and elsewhere about submitting +Linux kernel patches. + + + +- Builds cleanly with applicable or modified CONFIG options =y, =m, and =n. + No gcc warnings/errors, no linker warnings/errors. + +- Passes allnoconfig, allmodconfig + +- Builds on multiple CPU arch-es by using local cross-compile tools + or something like PLM at OSDL. + +- ppc64 is a good architecture for cross-compilation checking because it + tends to use `unsigned long' for 64-bit quantities. + +- Matches kernel coding style(!) + +- Any new or modified CONFIG options don't muck up the config menu. + +- All new Kconfig options have help text. + +- Has been carefully reviewed with respect to relevant Kconfig + combinations. This is very hard to get right with testing -- + brainpower pays off here. + +- Check cleanly with sparse. + +- Use 'make checkstack' and 'make namespacecheck' and fix any + problems that they find. Note: checkstack does not point out + problems explicitly, but any one function that uses more than + 512 bytes on the stack is a candidate for change. + +- Include kernel-doc to document global kernel APIs. (Not required + for static functions, but OK there also.) Use 'make htmldocs' + or 'make mandocs' to check the kernel-doc and fix any issues. + +- Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT, + CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES, + CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously + enabled. + +- Has been build- and runtime tested with and without CONFIG_SMP and + CONFIG_PREEMPT. + +- If the patch affects IO/Disk, etc: has been tested with and without + CONFIG_LBD. + + +2006-APR-27 diff --git a/Documentation/arm/Samsung-S3C24XX/Overview.txt b/Documentation/arm/Samsung-S3C24XX/Overview.txt index 8c6ee684174c..3e46d2a31158 100644 --- a/Documentation/arm/Samsung-S3C24XX/Overview.txt +++ b/Documentation/arm/Samsung-S3C24XX/Overview.txt @@ -7,11 +7,13 @@ Introduction ------------ The Samsung S3C24XX range of ARM9 System-on-Chip CPUs are supported - by the 's3c2410' architecture of ARM Linux. Currently the S3C2410 and - the S3C2440 are supported CPUs. + by the 's3c2410' architecture of ARM Linux. Currently the S3C2410, + S3C2440 and S3C2442 devices are supported. Support for the S3C2400 series is in progress. + Support for the S3C2412 and S3C2413 CPUs is being merged. + Configuration ------------- @@ -43,9 +45,18 @@ Machines Samsung's own development board, geared for PDA work. + Samsung/Aiji SMDK2412 + + The S3C2412 version of the SMDK2440. + + Samsung/Aiji SMDK2413 + + The S3C2412 version of the SMDK2440. + Samsung/Meritech SMDK2440 - The S3C2440 compatible version of the SMDK2440 + The S3C2440 compatible version of the SMDK2440, which has the + option of an S3C2440 or S3C2442 CPU module. Thorcom VR1000 @@ -211,24 +222,6 @@ Port Contributors Lucas Correia Villa Real (S3C2400 port) -Document Changes ----------------- - - 05 Sep 2004 - BJD - Added Document Changes section - 05 Sep 2004 - BJD - Added Klaus Fetscher to list of contributors - 25 Oct 2004 - BJD - Added Dimitry Andric to list of contributors - 25 Oct 2004 - BJD - Updated the MTD from the 2.6.9 merge - 21 Jan 2005 - BJD - Added rx3715, added Shannon to contributors - 10 Feb 2005 - BJD - Added Guillaume Gourat to contributors - 02 Mar 2005 - BJD - Added SMDK2440 to list of machines - 06 Mar 2005 - BJD - Added Christer Weinigel - 08 Mar 2005 - BJD - Added LCVR to list of people, updated introduction - 08 Mar 2005 - BJD - Added section on adding machines - 09 Sep 2005 - BJD - Added section on platform data - 11 Feb 2006 - BJD - Added I2C, RTC and Watchdog sections - 11 Feb 2006 - BJD - Added Osiris machine, and S3C2400 information - - Document Author --------------- diff --git a/Documentation/arm/Samsung-S3C24XX/S3C2412.txt b/Documentation/arm/Samsung-S3C24XX/S3C2412.txt new file mode 100644 index 000000000000..cb82a7fc7901 --- /dev/null +++ b/Documentation/arm/Samsung-S3C24XX/S3C2412.txt @@ -0,0 +1,120 @@ + S3C2412 ARM Linux Overview + ========================== + +Introduction +------------ + + The S3C2412 is part of the S3C24XX range of ARM9 System-on-Chip CPUs + from Samsung. This part has an ARM926-EJS core, capable of running up + to 266MHz (see data-sheet for more information) + + +Clock +----- + + The core clock code provides a set of clocks to the drivers, and allows + for source selection and a number of other features. + + +Power +----- + + No support for suspend/resume to RAM in the current system. + + +DMA +--- + + No current support for DMA. + + +GPIO +---- + + There is support for setting the GPIO to input/output/special function + and reading or writing to them. + + +UART +---- + + The UART hardware is similar to the S3C2440, and is supported by the + s3c2410 driver in the drivers/serial directory. + + +NAND +---- + + The NAND hardware is similar to the S3C2440, and is supported by the + s3c2410 driver in the drivers/mtd/nand directory. + + +USB Host +-------- + + The USB hardware is similar to the S3C2410, with extended clock source + control. The OHCI portion is supported by the ohci-s3c2410 driver, and + the clock control selection is supported by the core clock code. + + +USB Device +---------- + + No current support in the kernel + + +IRQs +---- + + All the standard, and external interrupt sources are supported. The + extra sub-sources are not yet supported. + + +RTC +--- + + The RTC hardware is similar to the S3C2410, and is supported by the + s3c2410-rtc driver. + + +Watchdog +-------- + + The watchdog harware is the same as the S3C2410, and is supported by + the s3c2410_wdt driver. + + +MMC/SD/SDIO +----------- + + No current support for the MMC/SD/SDIO block. + +IIC +--- + + The IIC hardware is the same as the S3C2410, and is supported by the + i2c-s3c24xx driver. + + +IIS +--- + + No current support for the IIS interface. + + +SPI +--- + + No current support for the SPI interfaces. + + +ATA +--- + + No current support for the on-board ATA block. + + +Document Author +--------------- + +Ben Dooks, (c) 2006 Simtec Electronics diff --git a/Documentation/arm/Samsung-S3C24XX/S3C2413.txt b/Documentation/arm/Samsung-S3C24XX/S3C2413.txt new file mode 100644 index 000000000000..ab2a88858f12 --- /dev/null +++ b/Documentation/arm/Samsung-S3C24XX/S3C2413.txt @@ -0,0 +1,21 @@ + S3C2413 ARM Linux Overview + ========================== + +Introduction +------------ + + The S3C2413 is an extended version of the S3C2412, with an camera + interface and mobile DDR memory support. See the S3C2412 support + documentation for more information. + + +Camera Interface +--------------- + + This block is currently not supported. + + +Document Author +--------------- + +Ben Dooks, (c) 2006 Simtec Electronics diff --git a/Documentation/arm/Sharp-LH/ADC-LH7-Touchscreen b/Documentation/arm/Sharp-LH/ADC-LH7-Touchscreen new file mode 100644 index 000000000000..1e6a23fdf2fc --- /dev/null +++ b/Documentation/arm/Sharp-LH/ADC-LH7-Touchscreen @@ -0,0 +1,61 @@ +README on the ADC/Touchscreen Controller +======================================== + +The LH79524 and LH7A404 include a built-in Analog to Digital +controller (ADC) that is used to process input from a touchscreen. +The driver only implements a four-wire touch panel protocol. + +The touchscreen driver is maintenance free except for the pen-down or +touch threshold. Some resistive displays and board combinations may +require tuning of this threshold. The driver exposes some of it's +internal state in the sys filesystem. If the kernel is configured +with it, CONFIG_SYSFS, and sysfs is mounted at /sys, there will be a +directory + + /sys/devices/platform/adc-lh7.0 + +containing these files. + + -r--r--r-- 1 root root 4096 Jan 1 00:00 samples + -rw-r--r-- 1 root root 4096 Jan 1 00:00 threshold + -r--r--r-- 1 root root 4096 Jan 1 00:00 threshold_range + +The threshold is the current touch threshold. It defaults to 750 on +most targets. + + # cat threshold + 750 + +The threshold_range contains the range of valid values for the +threshold. Values outside of this range will be silently ignored. + + # cat threshold_range + 0 1023 + +To change the threshold, write a value to the threshold file. + + # echo 500 > threshold + # cat threshold + 500 + +The samples file contains the most recently sampled values from the +ADC. There are 12. Below are typical of the last sampled values when +the pen has been released. The first two and last two samples are for +detecting whether or not the pen is down. The third through sixth are +X coordinate samples. The seventh through tenth are Y coordinate +samples. + + # cat samples + 1023 1023 0 0 0 0 530 529 530 529 1023 1023 + +To determine a reasonable threshold, press on the touch panel with an +appropriate stylus and read the values from samples. + + # cat samples + 1023 676 92 103 101 102 855 919 922 922 1023 679 + +The first and eleventh samples are discarded. Thus, the important +values are the second and twelfth which are used to determine if the +pen is down. When both are below the threshold, the driver registers +that the pen is down. When either is above the threshold, it +registers then pen is up. diff --git a/Documentation/arm/Sharp-LH/LCDPanels b/Documentation/arm/Sharp-LH/LCDPanels new file mode 100644 index 000000000000..fb1b21c2f2f4 --- /dev/null +++ b/Documentation/arm/Sharp-LH/LCDPanels @@ -0,0 +1,59 @@ +README on the LCD Panels +======================== + +Configuration options for several LCD panels, available from Logic PD, +are included in the kernel source. This README will help you +understand the configuration data and give you some guidance for +adding support for other panels if you wish. + + +lcd-panels.h +------------ + +There is no way, at present, to detect which panel is attached to the +system at runtime. Thus the kernel configuration is static. The file +arch/arm/mach-ld7a40x/lcd-panels.h (or similar) defines all of the +panel specific parameters. + +It should be possible for this data to be shared among several device +families. The current layout may be insufficiently general, but it is +amenable to improvement. + + +PIXEL_CLOCK +----------- + +The panel data sheets will give a range of acceptable pixel clocks. +The fundamental LCDCLK input frequency is divided down by a PCD +constant in field '.tim2'. It may happen that it is impossible to set +the pixel clock within this range. A clock which is too slow will +tend to flicker. For the highest quality image, set the clock as high +as possible. + + +MARGINS +------- + +These values may be difficult to glean from the panel data sheet. In +the case of the Sharp panels, the upper margin is explicitly called +out as a specific number of lines from the top of the frame. The +other values may not matter as much as the panels tend to +automatically center the image. + + +Sync Sense +---------- + +The sense of the hsync and vsync pulses may be called out in the data +sheet. On one panel, the sense of these pulses determine the height +of the visible region on the panel. Most of the Sharp panels use +negative sense sync pulses set by the TIM2_IHS and TIM2_IVS bits in +'.tim2'. + + +Pel Layout +---------- + +The Sharp color TFT panels are all configured for 16 bit direct color +modes. The amba-lcd driver sets the pel mode to 565 for 5 bits of +each red and blue and 6 bits of green. diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt index 23a1c2402bcc..2a63d5662a93 100644 --- a/Documentation/atomic_ops.txt +++ b/Documentation/atomic_ops.txt @@ -157,13 +157,13 @@ For example, smp_mb__before_atomic_dec() can be used like so: smp_mb__before_atomic_dec(); atomic_dec(&obj->ref_count); -It makes sure that all memory operations preceeding the atomic_dec() +It makes sure that all memory operations preceding the atomic_dec() call are strongly ordered with respect to the atomic counter -operation. In the above example, it guarentees that the assignment of +operation. In the above example, it guarantees that the assignment of "1" to obj->dead will be globally visible to other cpus before the atomic counter decrement. -Without the explicitl smp_mb__before_atomic_dec() call, the +Without the explicit smp_mb__before_atomic_dec() call, the implementation could legally allow the atomic counter update visible to other cpus before the "obj->dead = 1;" assignment. @@ -173,11 +173,11 @@ ordering with respect to memory operations after an atomic_dec() call (smp_mb__{before,after}_atomic_inc()). A missing memory barrier in the cases where they are required by the -atomic_t implementation above can have disasterous results. Here is -an example, which follows a pattern occuring frequently in the Linux +atomic_t implementation above can have disastrous results. Here is +an example, which follows a pattern occurring frequently in the Linux kernel. It is the use of atomic counters to implement reference counting, and it works such that once the counter falls to zero it can -be guarenteed that no other entity can be accessing the object: +be guaranteed that no other entity can be accessing the object: static void obj_list_add(struct obj *obj) { @@ -291,9 +291,9 @@ to the size of an "unsigned long" C data type, and are least of that size. The endianness of the bits within each "unsigned long" are the native endianness of the cpu. - void set_bit(unsigned long nr, volatils unsigned long *addr); - void clear_bit(unsigned long nr, volatils unsigned long *addr); - void change_bit(unsigned long nr, volatils unsigned long *addr); + void set_bit(unsigned long nr, volatile unsigned long *addr); + void clear_bit(unsigned long nr, volatile unsigned long *addr); + void change_bit(unsigned long nr, volatile unsigned long *addr); These routines set, clear, and change, respectively, the bit number indicated by "nr" on the bit mask pointed to by "ADDR". @@ -301,9 +301,9 @@ indicated by "nr" on the bit mask pointed to by "ADDR". They must execute atomically, yet there are no implicit memory barrier semantics required of these interfaces. - int test_and_set_bit(unsigned long nr, volatils unsigned long *addr); - int test_and_clear_bit(unsigned long nr, volatils unsigned long *addr); - int test_and_change_bit(unsigned long nr, volatils unsigned long *addr); + int test_and_set_bit(unsigned long nr, volatile unsigned long *addr); + int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr); + int test_and_change_bit(unsigned long nr, volatile unsigned long *addr); Like the above, except that these routines return a boolean which indicates whether the changed bit was set _BEFORE_ the atomic bit @@ -335,7 +335,7 @@ subsequent memory operation is made visible. For example: /* ... */; obj->killed = 1; -The implementation of test_and_set_bit() must guarentee that +The implementation of test_and_set_bit() must guarantee that "obj->dead = 1;" is visible to cpus before the atomic memory operation done by test_and_set_bit() becomes visible. Likewise, the atomic memory operation done by test_and_set_bit() must become visible before @@ -474,7 +474,7 @@ Now, as far as memory barriers go, as long as spin_lock() strictly orders all subsequent memory operations (including the cas()) with respect to itself, things will be fine. -Said another way, _atomic_dec_and_lock() must guarentee that +Said another way, _atomic_dec_and_lock() must guarantee that a counter dropping to zero is never made visible before the spinlock being acquired. diff --git a/Documentation/console/console.txt b/Documentation/console/console.txt new file mode 100644 index 000000000000..d3e17447321c --- /dev/null +++ b/Documentation/console/console.txt @@ -0,0 +1,144 @@ +Console Drivers +=============== + +The linux kernel has 2 general types of console drivers. The first type is +assigned by the kernel to all the virtual consoles during the boot process. +This type will be called 'system driver', and only one system driver is allowed +to exist. The system driver is persistent and it can never be unloaded, though +it may become inactive. + +The second type has to be explicitly loaded and unloaded. This will be called +'modular driver' by this document. Multiple modular drivers can coexist at +any time with each driver sharing the console with other drivers including +the system driver. However, modular drivers cannot take over the console +that is currently occupied by another modular driver. (Exception: Drivers that +call take_over_console() will succeed in the takeover regardless of the type +of driver occupying the consoles.) They can only take over the console that is +occupied by the system driver. In the same token, if the modular driver is +released by the console, the system driver will take over. + +Modular drivers, from the programmer's point of view, has to call: + + take_over_console() - load and bind driver to console layer + give_up_console() - unbind and unload driver + +In newer kernels, the following are also available: + + register_con_driver() + unregister_con_driver() + +If sysfs is enabled, the contents of /sys/class/vtconsole can be +examined. This shows the console backends currently registered by the +system which are named vtcon<n> where <n> is an integer fro 0 to 15. Thus: + + ls /sys/class/vtconsole + . .. vtcon0 vtcon1 + +Each directory in /sys/class/vtconsole has 3 files: + + ls /sys/class/vtconsole/vtcon0 + . .. bind name uevent + +What do these files signify? + + 1. bind - this is a read/write file. It shows the status of the driver if + read, or acts to bind or unbind the driver to the virtual consoles + when written to. The possible values are: + + 0 - means the driver is not bound and if echo'ed, commands the driver + to unbind + + 1 - means the driver is bound and if echo'ed, commands the driver to + bind + + 2. name - read-only file. Shows the name of the driver in this format: + + cat /sys/class/vtconsole/vtcon0/name + (S) VGA+ + + '(S)' stands for a (S)ystem driver, ie, it cannot be directly + commanded to bind or unbind + + 'VGA+' is the name of the driver + + cat /sys/class/vtconsole/vtcon1/name + (M) frame buffer device + + In this case, '(M)' stands for a (M)odular driver, one that can be + directly commanded to bind or unbind. + + 3. uevent - ignore this file + +When unbinding, the modular driver is detached first, and then the system +driver takes over the consoles vacated by the driver. Binding, on the other +hand, will bind the driver to the consoles that are currently occupied by a +system driver. + +NOTE1: Binding and binding must be selected in Kconfig. It's under: + +Device Drivers -> Character devices -> Support for binding and unbinding +console drivers + +NOTE2: If any of the virtual consoles are in KD_GRAPHICS mode, then binding or +unbinding will not succeed. An example of an application that sets the console +to KD_GRAPHICS is X. + +How useful is this feature? This is very useful for console driver +developers. By unbinding the driver from the console layer, one can unload the +driver, make changes, recompile, reload and rebind the driver without any need +for rebooting the kernel. For regular users who may want to switch from +framebuffer console to VGA console and vice versa, this feature also makes +this possible. (NOTE NOTE NOTE: Please read fbcon.txt under Documentation/fb +for more details). + +Notes for developers: +===================== + +take_over_console() is now broken up into: + + register_con_driver() + bind_con_driver() - private function + +give_up_console() is a wrapper to unregister_con_driver(), and a driver must +be fully unbound for this call to succeed. con_is_bound() will check if the +driver is bound or not. + +Guidelines for console driver writers: +===================================== + +In order for binding to and unbinding from the console to properly work, +console drivers must follow these guidelines: + +1. All drivers, except system drivers, must call either register_con_driver() + or take_over_console(). register_con_driver() will just add the driver to + the console's internal list. It won't take over the + console. take_over_console(), as it name implies, will also take over (or + bind to) the console. + +2. All resources allocated during con->con_init() must be released in + con->con_deinit(). + +3. All resources allocated in con->con_startup() must be released when the + driver, which was previously bound, becomes unbound. The console layer + does not have a complementary call to con->con_startup() so it's up to the + driver to check when it's legal to release these resources. Calling + con_is_bound() in con->con_deinit() will help. If the call returned + false(), then it's safe to release the resources. This balance has to be + ensured because con->con_startup() can be called again when a request to + rebind the driver to the console arrives. + +4. Upon exit of the driver, ensure that the driver is totally unbound. If the + condition is satisfied, then the driver must call unregister_con_driver() + or give_up_console(). + +5. unregister_con_driver() can also be called on conditions which make it + impossible for the driver to service console requests. This can happen + with the framebuffer console that suddenly lost all of its drivers. + +The current crop of console drivers should still work correctly, but binding +and unbinding them may cause problems. With minimal fixes, these drivers can +be made to work correctly. + +========================== +Antonino Daplas <adaplas@pol.net> + diff --git a/Documentation/devices.txt b/Documentation/devices.txt index b369a8c46a73..4aaf68fafebe 100644 --- a/Documentation/devices.txt +++ b/Documentation/devices.txt @@ -3,7 +3,7 @@ Maintained by Torben Mathiasen <device@lanana.org> - Last revised: 25 January 2005 + Last revised: 15 May 2006 This list is the Linux Device List, the official registry of allocated device numbers and /dev directory nodes for the Linux operating @@ -94,7 +94,6 @@ Your cooperation is appreciated. 9 = /dev/urandom Faster, less secure random number gen. 10 = /dev/aio Asyncronous I/O notification interface 11 = /dev/kmsg Writes to this come out as printk's - 12 = /dev/oldmem Access to crash dump from kexec kernel 1 block RAM disk 0 = /dev/ram0 First RAM disk 1 = /dev/ram1 Second RAM disk @@ -262,13 +261,13 @@ Your cooperation is appreciated. NOTE: These devices permit both read and write access. 7 block Loopback devices - 0 = /dev/loop0 First loopback device - 1 = /dev/loop1 Second loopback device + 0 = /dev/loop0 First loop device + 1 = /dev/loop1 Second loop device ... - The loopback devices are used to mount filesystems not + The loop devices are used to mount filesystems not associated with block devices. The binding to the - loopback devices is handled by mount(8) or losetup(8). + loop devices is handled by mount(8) or losetup(8). 8 block SCSI disk devices (0-15) 0 = /dev/sda First SCSI disk whole disk @@ -943,7 +942,7 @@ Your cooperation is appreciated. 240 = /dev/ftlp FTL on 16th Memory Technology Device Partitions are handled in the same way as for IDE - disks (see major number 3) expect that the partition + disks (see major number 3) except that the partition limit is 15 rather than 63 per disk (same as SCSI.) 45 char isdn4linux ISDN BRI driver @@ -1168,7 +1167,7 @@ Your cooperation is appreciated. The filename of the encrypted container and the passwords are sent via ioctls (using the sdmount tool) to the master node which then activates them via one of the - /dev/scramdisk/x nodes for loopback mounting (all handled + /dev/scramdisk/x nodes for loop mounting (all handled through the sdmount tool). Requested by: andy@scramdisklinux.org @@ -2538,18 +2537,32 @@ Your cooperation is appreciated. 0 = /dev/usb/lp0 First USB printer ... 15 = /dev/usb/lp15 16th USB printer - 16 = /dev/usb/mouse0 First USB mouse - ... - 31 = /dev/usb/mouse15 16th USB mouse - 32 = /dev/usb/ez0 First USB firmware loader - ... - 47 = /dev/usb/ez15 16th USB firmware loader 48 = /dev/usb/scanner0 First USB scanner ... 63 = /dev/usb/scanner15 16th USB scanner 64 = /dev/usb/rio500 Diamond Rio 500 65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de) 66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD) + 96 = /dev/usb/hiddev0 1st USB HID device + ... + 111 = /dev/usb/hiddev15 16th USB HID device + 112 = /dev/usb/auer0 1st auerswald ISDN device + ... + 127 = /dev/usb/auer15 16th auerswald ISDN device + 128 = /dev/usb/brlvgr0 First Braille Voyager device + ... + 131 = /dev/usb/brlvgr3 Fourth Braille Voyager device + 132 = /dev/usb/idmouse ID Mouse (fingerprint scanner) device + 133 = /dev/usb/sisusbvga1 First SiSUSB VGA device + ... + 140 = /dev/usb/sisusbvga8 Eigth SISUSB VGA device + 144 = /dev/usb/lcd USB LCD device + 160 = /dev/usb/legousbtower0 1st USB Legotower device + ... + 175 = /dev/usb/legousbtower15 16th USB Legotower device + 240 = /dev/usb/dabusb0 First daubusb device + ... + 243 = /dev/usb/dabusb3 Fourth dabusb device 180 block USB block devices 0 = /dev/uba First USB block device @@ -2710,6 +2723,17 @@ Your cooperation is appreciated. 1 = /dev/cpu/1/msr MSRs on CPU 1 ... +202 block Xen Virtual Block Device + 0 = /dev/xvda First Xen VBD whole disk + 16 = /dev/xvdb Second Xen VBD whole disk + 32 = /dev/xvdc Third Xen VBD whole disk + ... + 240 = /dev/xvdp Sixteenth Xen VBD whole disk + + Partitions are handled in the same way as for IDE + disks (see major number 3) except that the limit on + partitions is 15. + 203 char CPU CPUID information 0 = /dev/cpu/0/cpuid CPUID on CPU 0 1 = /dev/cpu/1/cpuid CPUID on CPU 1 @@ -2747,11 +2771,27 @@ Your cooperation is appreciated. 46 = /dev/ttyCPM0 PPC CPM (SCC or SMC) - port 0 ... 47 = /dev/ttyCPM5 PPC CPM (SCC or SMC) - port 5 - 50 = /dev/ttyIOC40 Altix serial card + 50 = /dev/ttyIOC0 Altix serial card + ... + 81 = /dev/ttyIOC31 Altix serial card + 82 = /dev/ttyVR0 NEC VR4100 series SIU + 83 = /dev/ttyVR1 NEC VR4100 series DSIU + 84 = /dev/ttyIOC84 Altix ioc4 serial card + ... + 115 = /dev/ttyIOC115 Altix ioc4 serial card + 116 = /dev/ttySIOC0 Altix ioc3 serial card + ... + 147 = /dev/ttySIOC31 Altix ioc3 serial card + 148 = /dev/ttyPSC0 PPC PSC - port 0 + ... + 153 = /dev/ttyPSC5 PPC PSC - port 5 + 154 = /dev/ttyAT0 ATMEL serial port 0 + ... + 169 = /dev/ttyAT15 ATMEL serial port 15 + 170 = /dev/ttyNX0 Hilscher netX serial port 0 ... - 81 = /dev/ttyIOC431 Altix serial card - 82 = /dev/ttyVR0 NEC VR4100 series SIU - 83 = /dev/ttyVR1 NEC VR4100 series DSIU + 185 = /dev/ttyNX15 Hilscher netX serial port 15 + 186 = /dev/ttyJ0 JTAG1 DCC protocol based serial port emulation 205 char Low-density serial ports (alternate device) 0 = /dev/culu0 Callout device for ttyLU0 @@ -2786,8 +2826,8 @@ Your cooperation is appreciated. 50 = /dev/cuioc40 Callout device for ttyIOC40 ... 81 = /dev/cuioc431 Callout device for ttyIOC431 - 82 = /dev/cuvr0 Callout device for ttyVR0 - 83 = /dev/cuvr1 Callout device for ttyVR1 + 82 = /dev/cuvr0 Callout device for ttyVR0 + 83 = /dev/cuvr1 Callout device for ttyVR1 206 char OnStream SC-x0 tape devices @@ -2897,7 +2937,6 @@ Your cooperation is appreciated. ... 196 = /dev/dvb/adapter3/video0 first video decoder of fourth card - 216 char Bluetooth RFCOMM TTY devices 0 = /dev/rfcomm0 First Bluetooth RFCOMM TTY device 1 = /dev/rfcomm1 Second Bluetooth RFCOMM TTY device @@ -3002,12 +3041,43 @@ Your cooperation is appreciated. ioctl()'s can be used to rewind the tape regardless of the device used to access it. -231 char InfiniBand MAD +231 char InfiniBand 0 = /dev/infiniband/umad0 1 = /dev/infiniband/umad1 - ... + ... + 63 = /dev/infiniband/umad63 63rd InfiniBandMad device + 64 = /dev/infiniband/issm0 First InfiniBand IsSM device + 65 = /dev/infiniband/issm1 Second InfiniBand IsSM device + ... + 127 = /dev/infiniband/issm63 63rd InfiniBand IsSM device + 128 = /dev/infiniband/uverbs0 First InfiniBand verbs device + 129 = /dev/infiniband/uverbs1 Second InfiniBand verbs device + ... + 159 = /dev/infiniband/uverbs31 31st InfiniBand verbs device + +232 char Biometric Devices + 0 = /dev/biometric/sensor0/fingerprint first fingerprint sensor on first device + 1 = /dev/biometric/sensor0/iris first iris sensor on first device + 2 = /dev/biometric/sensor0/retina first retina sensor on first device + 3 = /dev/biometric/sensor0/voiceprint first voiceprint sensor on first device + 4 = /dev/biometric/sensor0/facial first facial sensor on first device + 5 = /dev/biometric/sensor0/hand first hand sensor on first device + ... + 10 = /dev/biometric/sensor1/fingerprint first fingerprint sensor on second device + ... + 20 = /dev/biometric/sensor2/fingerprint first fingerprint sensor on third device + ... + +233 char PathScale InfiniPath interconnect + 0 = /dev/ipath Primary device for programs (any unit) + 1 = /dev/ipath0 Access specifically to unit 0 + 2 = /dev/ipath1 Access specifically to unit 1 + ... + 4 = /dev/ipath3 Access specifically to unit 3 + 129 = /dev/ipath_sma Device used by Subnet Management Agent + 130 = /dev/ipath_diag Device used by diagnostics programs -232-239 UNASSIGNED +234-239 UNASSIGNED 240-254 char LOCAL/EXPERIMENTAL USE 240-254 block LOCAL/EXPERIMENTAL USE @@ -3021,6 +3091,28 @@ Your cooperation is appreciated. This major is reserved to assist the expansion to a larger number space. No device nodes with this major should ever be created on the filesystem. + (This is probaly not true anymore, but I'll leave it + for now /Torben) + +---LARGE MAJORS!!!!!--- + +256 char Equinox SST multi-port serial boards + 0 = /dev/ttyEQ0 First serial port on first Equinox SST board + 127 = /dev/ttyEQ127 Last serial port on first Equinox SST board + 128 = /dev/ttyEQ128 First serial port on second Equinox SST board + ... + 1027 = /dev/ttyEQ1027 Last serial port on eighth Equinox SST board + +256 block Resident Flash Disk Flash Translation Layer + 0 = /dev/rfda First RFD FTL layer + 16 = /dev/rfdb Second RFD FTL layer + ... + 240 = /dev/rfdp 16th RFD FTL layer + +257 char Phoenix Technologies Cryptographic Services Driver + 0 = /dev/ptlsec Crypto Services Driver + + **** ADDITIONAL /dev DIRECTORY ENTRIES diff --git a/Documentation/driver-model/overview.txt b/Documentation/driver-model/overview.txt index ac4a7a737e43..2050c9ffc629 100644 --- a/Documentation/driver-model/overview.txt +++ b/Documentation/driver-model/overview.txt @@ -18,7 +18,7 @@ Traditional driver models implemented some sort of tree-like structure (sometimes just a list) for the devices they control. There wasn't any uniformity across the different bus types. -The current driver model provides a comon, uniform data model for describing +The current driver model provides a common, uniform data model for describing a bus and the devices that can appear under the bus. The unified bus model includes a set of common attributes which all busses carry, and a set of common callbacks, such as device discovery during bus probing, bus diff --git a/Documentation/fb/fbcon.txt b/Documentation/fb/fbcon.txt index 08dce0f631bf..f373df12ed4c 100644 --- a/Documentation/fb/fbcon.txt +++ b/Documentation/fb/fbcon.txt @@ -135,10 +135,10 @@ C. Boot options The angle can be changed anytime afterwards by 'echoing' the same numbers to any one of the 2 attributes found in - /sys/class/graphics/fb{x} + /sys/class/graphics/fbcon - con_rotate - rotate the display of the active console - con_rotate_all - rotate the display of all consoles + rotate - rotate the display of the active console + rotate_all - rotate the display of all consoles Console rotation will only become available if Console Rotation Support is compiled in your kernel. @@ -148,5 +148,177 @@ C. Boot options Actually, the underlying fb driver is totally ignorant of console rotation. ---- +C. Attaching, Detaching and Unloading + +Before going on on how to attach, detach and unload the framebuffer console, an +illustration of the dependencies may help. + +The console layer, as with most subsystems, needs a driver that interfaces with +the hardware. Thus, in a VGA console: + +console ---> VGA driver ---> hardware. + +Assuming the VGA driver can be unloaded, one must first unbind the VGA driver +from the console layer before unloading the driver. The VGA driver cannot be +unloaded if it is still bound to the console layer. (See +Documentation/console/console.txt for more information). + +This is more complicated in the case of the the framebuffer console (fbcon), +because fbcon is an intermediate layer between the console and the drivers: + +console ---> fbcon ---> fbdev drivers ---> hardware + +The fbdev drivers cannot be unloaded if it's bound to fbcon, and fbcon cannot +be unloaded if it's bound to the console layer. + +So to unload the fbdev drivers, one must first unbind fbcon from the console, +then unbind the fbdev drivers from fbcon. Fortunately, unbinding fbcon from +the console layer will automatically unbind framebuffer drivers from +fbcon. Thus, there is no need to explicitly unbind the fbdev drivers from +fbcon. + +So, how do we unbind fbcon from the console? Part of the answer is in +Documentation/console/console.txt. To summarize: + +Echo a value to the bind file that represents the framebuffer console +driver. So assuming vtcon1 represents fbcon, then: + +echo 1 > sys/class/vtconsole/vtcon1/bind - attach framebuffer console to + console layer +echo 0 > sys/class/vtconsole/vtcon1/bind - detach framebuffer console from + console layer + +If fbcon is detached from the console layer, your boot console driver (which is +usually VGA text mode) will take over. A few drivers (rivafb and i810fb) will +restore VGA text mode for you. With the rest, before detaching fbcon, you +must take a few additional steps to make sure that your VGA text mode is +restored properly. The following is one of the several methods that you can do: + +1. Download or install vbetool. This utility is included with most + distributions nowadays, and is usually part of the suspend/resume tool. + +2. In your kernel configuration, ensure that CONFIG_FRAMEBUFFER_CONSOLE is set + to 'y' or 'm'. Enable one or more of your favorite framebuffer drivers. + +3. Boot into text mode and as root run: + + vbetool vbestate save > <vga state file> + + The above command saves the register contents of your graphics + hardware to <vga state file>. You need to do this step only once as + the state file can be reused. + +4. If fbcon is compiled as a module, load fbcon by doing: + + modprobe fbcon + +5. Now to detach fbcon: + + vbetool vbestate restore < <vga state file> && \ + echo 0 > /sys/class/vtconsole/vtcon1/bind + +6. That's it, you're back to VGA mode. And if you compiled fbcon as a module, + you can unload it by 'rmmod fbcon' + +7. To reattach fbcon: + + echo 1 > /sys/class/vtconsole/vtcon1/bind + +8. Once fbcon is unbound, all drivers registered to the system will also +become unbound. This means that fbcon and individual framebuffer drivers +can be unloaded or reloaded at will. Reloading the drivers or fbcon will +automatically bind the console, fbcon and the drivers together. Unloading +all the drivers without unloading fbcon will make it impossible for the +console to bind fbcon. + +Notes for vesafb users: +======================= + +Unfortunately, if your bootline includes a vga=xxx parameter that sets the +hardware in graphics mode, such as when loading vesafb, vgacon will not load. +Instead, vgacon will replace the default boot console with dummycon, and you +won't get any display after detaching fbcon. Your machine is still alive, so +you can reattach vesafb. However, to reattach vesafb, you need to do one of +the following: + +Variation 1: + + a. Before detaching fbcon, do + + vbetool vbemode save > <vesa state file> # do once for each vesafb mode, + # the file can be reused + + b. Detach fbcon as in step 5. + + c. Attach fbcon + + vbetool vbestate restore < <vesa state file> && \ + echo 1 > /sys/class/vtconsole/vtcon1/bind + +Variation 2: + + a. Before detaching fbcon, do: + echo <ID> > /sys/class/tty/console/bind + + + vbetool vbemode get + + b. Take note of the mode number + + b. Detach fbcon as in step 5. + + c. Attach fbcon: + + vbetool vbemode set <mode number> && \ + echo 1 > /sys/class/vtconsole/vtcon1/bind + +Samples: +======== + +Here are 2 sample bash scripts that you can use to bind or unbind the +framebuffer console driver if you are in an X86 box: + +--------------------------------------------------------------------------- +#!/bin/bash +# Unbind fbcon + +# Change this to where your actual vgastate file is located +# Or Use VGASTATE=$1 to indicate the state file at runtime +VGASTATE=/tmp/vgastate + +# path to vbetool +VBETOOL=/usr/local/bin + + +for (( i = 0; i < 16; i++)) +do + if test -x /sys/class/vtconsole/vtcon$i; then + if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \ + = 1 ]; then + if test -x $VBETOOL/vbetool; then + echo Unbinding vtcon$i + $VBETOOL/vbetool vbestate restore < $VGASTATE + echo 0 > /sys/class/vtconsole/vtcon$i/bind + fi + fi + fi +done + +--------------------------------------------------------------------------- +#!/bin/bash +# Bind fbcon + +for (( i = 0; i < 16; i++)) +do + if test -x /sys/class/vtconsole/vtcon$i; then + if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \ + = 1 ]; then + echo Unbinding vtcon$i + echo 1 > /sys/class/vtconsole/vtcon$i/bind + fi + fi +done +--------------------------------------------------------------------------- + +-- Antonino Daplas <adaplas@pol.net> diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 43ab119963d5..033ac91da07a 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -33,27 +33,12 @@ Who: Adrian Bunk <bunk@stusta.de> --------------------------- -What: RCU API moves to EXPORT_SYMBOL_GPL -When: April 2006 -Files: include/linux/rcupdate.h, kernel/rcupdate.c -Why: Outside of Linux, the only implementations of anything even - vaguely resembling RCU that I am aware of are in DYNIX/ptx, - VM/XA, Tornado, and K42. I do not expect anyone to port binary - drivers or kernel modules from any of these, since the first two - are owned by IBM and the last two are open-source research OSes. - So these will move to GPL after a grace period to allow - people, who might be using implementations that I am not aware - of, to adjust to this upcoming change. -Who: Paul E. McKenney <paulmck@us.ibm.com> - ---------------------------- - What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN -When: November 2005 +When: November 2006 Why: Deprecated in favour of the new ioctl-based rawiso interface, which is more efficient. You should really be using libraw1394 for raw1394 access anyway. -Who: Jody McIntyre <scjody@steamballoon.com> +Who: Jody McIntyre <scjody@modernduck.com> --------------------------- @@ -192,6 +177,16 @@ Who: Jean Delvare <khali@linux-fr.org> --------------------------- +What: Unused EXPORT_SYMBOL/EXPORT_SYMBOL_GPL exports + (temporary transition config option provided until then) + The transition config option will also be removed at the same time. +When: before 2.6.19 +Why: Unused symbols are both increasing the size of the kernel binary + and are often a sign of "wrong API" +Who: Arjan van de Ven <arjan@linux.intel.com> + +--------------------------- + What: remove EXPORT_SYMBOL(tasklist_lock) When: August 2006 Files: kernel/fork.c @@ -212,15 +207,6 @@ Who: Greg Kroah-Hartman <gregkh@suse.de> --------------------------- -What: Support for NEC DDB5074 and DDB5476 evaluation boards. -When: June 2006 -Why: Board specific code doesn't build anymore since ~2.6.0 and no - users have complained indicating there is no more need for these - boards. This should really be considered a last call. -Who: Ralf Baechle <ralf@linux-mips.org> - ---------------------------- - What: USB driver API moves to EXPORT_SYMBOL_GPL When: Febuary 2008 Files: include/linux/usb.h, drivers/usb/core/driver.c diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 1045da582b9b..d31efbbdfe50 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -99,7 +99,7 @@ prototypes: int (*sync_fs)(struct super_block *sb, int wait); void (*write_super_lockfs) (struct super_block *); void (*unlockfs) (struct super_block *); - int (*statfs) (struct super_block *, struct kstatfs *); + int (*statfs) (struct dentry *, struct kstatfs *); int (*remount_fs) (struct super_block *, int *, char *); void (*clear_inode) (struct inode *); void (*umount_begin) (struct super_block *); @@ -142,15 +142,16 @@ see also dquot_operations section. --------------------------- file_system_type --------------------------- prototypes: - struct super_block *(*get_sb) (struct file_system_type *, int, - const char *, void *); + struct int (*get_sb) (struct file_system_type *, int, + const char *, void *, struct vfsmount *); void (*kill_sb) (struct super_block *); locking rules: may block BKL get_sb yes yes kill_sb yes yes -->get_sb() returns error or a locked superblock (exclusive on ->s_umount). +->get_sb() returns error or 0 with locked superblock attached to the vfsmount +(exclusive on ->s_umount). ->kill_sb() takes a write-locked superblock, does all shutdown work on it, unlocks and drops the reference. diff --git a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.txt index 58c65a1713e5..7cac200e2a85 100644 --- a/Documentation/filesystems/automount-support.txt +++ b/Documentation/filesystems/automount-support.txt @@ -19,7 +19,7 @@ following procedure: (2) Have the follow_link() op do the following steps: - (a) Call do_kern_mount() to call the appropriate filesystem to set up a + (a) Call vfs_kern_mount() to call the appropriate filesystem to set up a superblock and gain a vfsmount structure representing it. (b) Copy the nameidata provided as an argument and substitute the dentry diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt index afb1335c05d6..4aecc9bdb273 100644 --- a/Documentation/filesystems/ext3.txt +++ b/Documentation/filesystems/ext3.txt @@ -113,6 +113,14 @@ noquota grpquota usrquota +bh (*) ext3 associates buffer heads to data pages to +nobh (a) cache disk block mapping information + (b) link pages into transaction to provide + ordering guarantees. + "bh" option forces use of buffer heads. + "nobh" option tries to avoid associating buffer + heads (supported only for "writeback" mode). + Specification ============= diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt index 33f74310d161..a584f05403a4 100644 --- a/Documentation/filesystems/fuse.txt +++ b/Documentation/filesystems/fuse.txt @@ -18,6 +18,14 @@ Non-privileged mount (or user mount): user. NOTE: this is not the same as mounts allowed with the "user" option in /etc/fstab, which is not discussed here. +Filesystem connection: + + A connection between the filesystem daemon and the kernel. The + connection exists until either the daemon dies, or the filesystem is + umounted. Note that detaching (or lazy umounting) the filesystem + does _not_ break the connection, in this case it will exist until + the last reference to the filesystem is released. + Mount owner: The user who does the mounting. @@ -86,16 +94,20 @@ Mount options The default is infinite. Note that the size of read requests is limited anyway to 32 pages (which is 128kbyte on i386). -Sysfs -~~~~~ +Control filesystem +~~~~~~~~~~~~~~~~~~ + +There's a control filesystem for FUSE, which can be mounted by: -FUSE sets up the following hierarchy in sysfs: + mount -t fusectl none /sys/fs/fuse/connections - /sys/fs/fuse/connections/N/ +Mounting it under the '/sys/fs/fuse/connections' directory makes it +backwards compatible with earlier versions. -where N is an increasing number allocated to each new connection. +Under the fuse control filesystem each connection has a directory +named by a unique number. -For each connection the following attributes are defined: +For each connection the following files exist within this directory: 'waiting' @@ -110,7 +122,47 @@ For each connection the following attributes are defined: connection. This means that all waiting requests will be aborted an error returned for all aborted and new requests. -Only a privileged user may read or write these attributes. +Only the owner of the mount may read or write these files. + +Interrupting filesystem operations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If a process issuing a FUSE filesystem request is interrupted, the +following will happen: + + 1) If the request is not yet sent to userspace AND the signal is + fatal (SIGKILL or unhandled fatal signal), then the request is + dequeued and returns immediately. + + 2) If the request is not yet sent to userspace AND the signal is not + fatal, then an 'interrupted' flag is set for the request. When + the request has been successfully transfered to userspace and + this flag is set, an INTERRUPT request is queued. + + 3) If the request is already sent to userspace, then an INTERRUPT + request is queued. + +INTERRUPT requests take precedence over other requests, so the +userspace filesystem will receive queued INTERRUPTs before any others. + +The userspace filesystem may ignore the INTERRUPT requests entirely, +or may honor them by sending a reply to the _original_ request, with +the error set to EINTR. + +It is also possible that there's a race between processing the +original request and it's INTERRUPT request. There are two possibilities: + + 1) The INTERRUPT request is processed before the original request is + processed + + 2) The INTERRUPT request is processed after the original request has + been answered + +If the filesystem cannot find the original request, it should wait for +some timeout and/or a number of new requests to arrive, after which it +should reply to the INTERRUPT request with an EAGAIN error. In case +1) the INTERRUPT request will be requeued. In case 2) the INTERRUPT +reply will be ignored. Aborting a filesystem connection ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -139,8 +191,8 @@ the filesystem. There are several ways to do this: - Use forced umount (umount -f). Works in all cases but only if filesystem is still attached (it hasn't been lazy unmounted) - - Abort filesystem through the sysfs interface. Most powerful - method, always works. + - Abort filesystem through the FUSE control filesystem. Most + powerful method, always works. How do non-privileged mounts work? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -304,25 +356,7 @@ Scenario 1 - Simple deadlock | | for "file"] | | *DEADLOCK* -The solution for this is to allow requests to be interrupted while -they are in userspace: - - | [interrupted by signal] | - | <fuse_unlink() | - | [release semaphore] | [semaphore acquired] - | <sys_unlink() | - | | >fuse_unlink() - | | [queue req on fc->pending] - | | [wake up fc->waitq] - | | [sleep on req->waitq] - -If the filesystem daemon was single threaded, this will stop here, -since there's no other thread to dequeue and execute the request. -In this case the solution is to kill the FUSE daemon as well. If -there are multiple serving threads, you just have to kill them as -long as any remain. - -Moral: a filesystem which deadlocks, can soon find itself dead. +The solution for this is to allow the filesystem to be aborted. Scenario 2 - Tricky deadlock ---------------------------- @@ -355,24 +389,14 @@ but is caused by a pagefault. | | [lock page] | | * DEADLOCK * -Solution is again to let the the request be interrupted (not -elaborated further). - -An additional problem is that while the write buffer is being -copied to the request, the request must not be interrupted. This -is because the destination address of the copy may not be valid -after the request is interrupted. - -This is solved with doing the copy atomically, and allowing -interruption while the page(s) belonging to the write buffer are -faulted with get_user_pages(). The 'req->locked' flag indicates -when the copy is taking place, and interruption is delayed until -this flag is unset. +Solution is basically the same as above. -Scenario 3 - Tricky deadlock with asynchronous read ---------------------------------------------------- +An additional problem is that while the write buffer is being copied +to the request, the request must not be interrupted/aborted. This is +because the destination address of the copy may not be valid after the +request has returned. -The same situation as above, except thread-1 will wait on page lock -and hence it will be uninterruptible as well. The solution is to -abort the connection with forced umount (if mount is attached) or -through the abort attribute in sysfs. +This is solved with doing the copy atomically, and allowing abort +while the page(s) belonging to the write buffer are faulted with +get_user_pages(). The 'req->locked' flag indicates when the copy is +taking place, and abort is delayed until this flag is unset. diff --git a/Documentation/filesystems/inotify.txt b/Documentation/filesystems/inotify.txt index 6d501903f68e..59a919f16144 100644 --- a/Documentation/filesystems/inotify.txt +++ b/Documentation/filesystems/inotify.txt @@ -69,17 +69,135 @@ Prototypes: int inotify_rm_watch (int fd, __u32 mask); -(iii) Internal Kernel Implementation +(iii) Kernel Interface -Each inotify instance is associated with an inotify_device structure. +Inotify's kernel API consists a set of functions for managing watches and an +event callback. + +To use the kernel API, you must first initialize an inotify instance with a set +of inotify_operations. You are given an opaque inotify_handle, which you use +for any further calls to inotify. + + struct inotify_handle *ih = inotify_init(my_event_handler); + +You must provide a function for processing events and a function for destroying +the inotify watch. + + void handle_event(struct inotify_watch *watch, u32 wd, u32 mask, + u32 cookie, const char *name, struct inode *inode) + + watch - the pointer to the inotify_watch that triggered this call + wd - the watch descriptor + mask - describes the event that occurred + cookie - an identifier for synchronizing events + name - the dentry name for affected files in a directory-based event + inode - the affected inode in a directory-based event + + void destroy_watch(struct inotify_watch *watch) + +You may add watches by providing a pre-allocated and initialized inotify_watch +structure and specifying the inode to watch along with an inotify event mask. +You must pin the inode during the call. You will likely wish to embed the +inotify_watch structure in a structure of your own which contains other +information about the watch. Once you add an inotify watch, it is immediately +subject to removal depending on filesystem events. You must grab a reference if +you depend on the watch hanging around after the call. + + inotify_init_watch(&my_watch->iwatch); + inotify_get_watch(&my_watch->iwatch); // optional + s32 wd = inotify_add_watch(ih, &my_watch->iwatch, inode, mask); + inotify_put_watch(&my_watch->iwatch); // optional + +You may use the watch descriptor (wd) or the address of the inotify_watch for +other inotify operations. You must not directly read or manipulate data in the +inotify_watch. Additionally, you must not call inotify_add_watch() more than +once for a given inotify_watch structure, unless you have first called either +inotify_rm_watch() or inotify_rm_wd(). + +To determine if you have already registered a watch for a given inode, you may +call inotify_find_watch(), which gives you both the wd and the watch pointer for +the inotify_watch, or an error if the watch does not exist. + + wd = inotify_find_watch(ih, inode, &watchp); + +You may use container_of() on the watch pointer to access your own data +associated with a given watch. When an existing watch is found, +inotify_find_watch() bumps the refcount before releasing its locks. You must +put that reference with: + + put_inotify_watch(watchp); + +Call inotify_find_update_watch() to update the event mask for an existing watch. +inotify_find_update_watch() returns the wd of the updated watch, or an error if +the watch does not exist. + + wd = inotify_find_update_watch(ih, inode, mask); + +An existing watch may be removed by calling either inotify_rm_watch() or +inotify_rm_wd(). + + int ret = inotify_rm_watch(ih, &my_watch->iwatch); + int ret = inotify_rm_wd(ih, wd); + +A watch may be removed while executing your event handler with the following: + + inotify_remove_watch_locked(ih, iwatch); + +Call inotify_destroy() to remove all watches from your inotify instance and +release it. If there are no outstanding references, inotify_destroy() will call +your destroy_watch op for each watch. + + inotify_destroy(ih); + +When inotify removes a watch, it sends an IN_IGNORED event to your callback. +You may use this event as an indication to free the watch memory. Note that +inotify may remove a watch due to filesystem events, as well as by your request. +If you use IN_ONESHOT, inotify will remove the watch after the first event, at +which point you may call the final inotify_put_watch. + +(iv) Kernel Interface Prototypes + + struct inotify_handle *inotify_init(struct inotify_operations *ops); + + inotify_init_watch(struct inotify_watch *watch); + + s32 inotify_add_watch(struct inotify_handle *ih, + struct inotify_watch *watch, + struct inode *inode, u32 mask); + + s32 inotify_find_watch(struct inotify_handle *ih, struct inode *inode, + struct inotify_watch **watchp); + + s32 inotify_find_update_watch(struct inotify_handle *ih, + struct inode *inode, u32 mask); + + int inotify_rm_wd(struct inotify_handle *ih, u32 wd); + + int inotify_rm_watch(struct inotify_handle *ih, + struct inotify_watch *watch); + + void inotify_remove_watch_locked(struct inotify_handle *ih, + struct inotify_watch *watch); + + void inotify_destroy(struct inotify_handle *ih); + + void get_inotify_watch(struct inotify_watch *watch); + void put_inotify_watch(struct inotify_watch *watch); + + +(v) Internal Kernel Implementation + +Each inotify instance is represented by an inotify_handle structure. +Inotify's userspace consumers also have an inotify_device which is +associated with the inotify_handle, and on which events are queued. Each watch is associated with an inotify_watch structure. Watches are chained -off of each associated device and each associated inode. +off of each associated inotify_handle and each associated inode. -See fs/inotify.c for the locking and lifetime rules. +See fs/inotify.c and fs/inotify_user.c for the locking and lifetime rules. -(iv) Rationale +(vi) Rationale Q: What is the design decision behind not tying the watch to the open fd of the watched object? @@ -145,7 +263,7 @@ A: The poor user-space interface is the second biggest problem with dnotify. file descriptor-based one that allows basic file I/O and poll/select. Obtaining the fd and managing the watches could have been done either via a device file or a family of new system calls. We decided to implement a - family of system calls because that is the preffered approach for new kernel + family of system calls because that is the preferred approach for new kernel interfaces. The only real difference was whether we wanted to use open(2) and ioctl(2) or a couple of new system calls. System calls beat ioctls. diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index 2f388460cbe7..5531694059ab 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -50,10 +50,11 @@ Turn your foo_read_super() into a function that would return 0 in case of success and negative number in case of error (-EINVAL unless you have more informative error value to report). Call it foo_fill_super(). Now declare -struct super_block foo_get_sb(struct file_system_type *fs_type, - int flags, const char *dev_name, void *data) +int foo_get_sb(struct file_system_type *fs_type, + int flags, const char *dev_name, void *data, struct vfsmount *mnt) { - return get_sb_bdev(fs_type, flags, dev_name, data, ext2_fill_super); + return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super, + mnt); } (or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt index 60ab61e54e8a..25981e2e51be 100644 --- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt +++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt @@ -70,11 +70,13 @@ tmpfs mounts. See Documentation/filesystems/tmpfs.txt for more information. What is rootfs? --------------- -Rootfs is a special instance of ramfs, which is always present in 2.6 systems. -(It's used internally as the starting and stopping point for searches of the -kernel's doubly-linked list of mount points.) +Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is +always present in 2.6 systems. You can't unmount rootfs for approximately the +same reason you can't kill the init process; rather than having special code +to check for and handle an empty list, it's smaller and simpler for the kernel +to just make sure certain lists can't become empty. -Most systems just mount another filesystem over it and ignore it. The +Most systems just mount another filesystem over rootfs and ignore it. The amount of space an empty instance of ramfs takes up is tiny. What is initramfs? @@ -92,14 +94,16 @@ out of that. All this differs from the old initrd in several ways: - - The old initrd was a separate file, while the initramfs archive is linked - into the linux kernel image. (The directory linux-*/usr is devoted to - generating this archive during the build.) + - The old initrd was always a separate file, while the initramfs archive is + linked into the linux kernel image. (The directory linux-*/usr is devoted + to generating this archive during the build.) - The old initrd file was a gzipped filesystem image (in some file format, - such as ext2, that had to be built into the kernel), while the new + such as ext2, that needed a driver built into the kernel), while the new initramfs archive is a gzipped cpio archive (like tar only simpler, - see cpio(1) and Documentation/early-userspace/buffer-format.txt). + see cpio(1) and Documentation/early-userspace/buffer-format.txt). The + kernel's cpio extraction code is not only extremely small, it's also + __init data that can be discarded during the boot process. - The program run by the old initrd (which was called /initrd, not /init) did some setup and then returned to the kernel, while the init program from @@ -124,13 +128,14 @@ Populating initramfs: The 2.6 kernel build process always creates a gzipped cpio format initramfs archive and links it into the resulting kernel binary. By default, this -archive is empty (consuming 134 bytes on x86). The config option -CONFIG_INITRAMFS_SOURCE (for some reason buried under devices->block devices -in menuconfig, and living in usr/Kconfig) can be used to specify a source for -the initramfs archive, which will automatically be incorporated into the -resulting binary. This option can point to an existing gzipped cpio archive, a -directory containing files to be archived, or a text file specification such -as the following example: +archive is empty (consuming 134 bytes on x86). + +The config option CONFIG_INITRAMFS_SOURCE (for some reason buried under +devices->block devices in menuconfig, and living in usr/Kconfig) can be used +to specify a source for the initramfs archive, which will automatically be +incorporated into the resulting binary. This option can point to an existing +gzipped cpio archive, a directory containing files to be archived, or a text +file specification such as the following example: dir /dev 755 0 0 nod /dev/console 644 0 0 c 5 1 @@ -146,23 +151,84 @@ as the following example: Run "usr/gen_init_cpio" (after the kernel build) to get a usage message documenting the above file format. -One advantage of the text file is that root access is not required to +One advantage of the configuration file is that root access is not required to set permissions or create device nodes in the new archive. (Note that those two example "file" entries expect to find files named "init.sh" and "busybox" in a directory called "initramfs", under the linux-2.6.* directory. See Documentation/early-userspace/README for more details.) -The kernel does not depend on external cpio tools, gen_init_cpio is created -from usr/gen_init_cpio.c which is entirely self-contained, and the kernel's -boot-time extractor is also (obviously) self-contained. However, if you _do_ -happen to have cpio installed, the following command line can extract the -generated cpio image back into its component files: +The kernel does not depend on external cpio tools. If you specify a +directory instead of a configuration file, the kernel's build infrastructure +creates a configuration file from that directory (usr/Makefile calls +scripts/gen_initramfs_list.sh), and proceeds to package up that directory +using the config file (by feeding it to usr/gen_init_cpio, which is created +from usr/gen_init_cpio.c). The kernel's build-time cpio creation code is +entirely self-contained, and the kernel's boot-time extractor is also +(obviously) self-contained. + +The one thing you might need external cpio utilities installed for is creating +or extracting your own preprepared cpio files to feed to the kernel build +(instead of a config file or directory). + +The following command line can extract a cpio image (either by the above script +or by the kernel build) back into its component files: cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames +The following shell script can create a prebuilt cpio archive you can +use in place of the above config file: + + #!/bin/sh + + # Copyright 2006 Rob Landley <rob@landley.net> and TimeSys Corporation. + # Licensed under GPL version 2 + + if [ $# -ne 2 ] + then + echo "usage: mkinitramfs directory imagename.cpio.gz" + exit 1 + fi + + if [ -d "$1" ] + then + echo "creating $2 from $1" + (cd "$1"; find . | cpio -o -H newc | gzip) > "$2" + else + echo "First argument must be a directory" + exit 1 + fi + +Note: The cpio man page contains some bad advice that will break your initramfs +archive if you follow it. It says "A typical way to generate the list +of filenames is with the find command; you should give find the -depth option +to minimize problems with permissions on directories that are unwritable or not +searchable." Don't do this when creating initramfs.cpio.gz images, it won't +work. The Linux kernel cpio extractor won't create files in a directory that +doesn't exist, so the directory entries must go before the files that go in +those directories. The above script gets them in the right order. + +External initramfs images: +-------------------------- + +If the kernel has initrd support enabled, an external cpio.gz archive can also +be passed into a 2.6 kernel in place of an initrd. In this case, the kernel +will autodetect the type (initramfs, not initrd) and extract the external cpio +archive into rootfs before trying to run /init. + +This has the memory efficiency advantages of initramfs (no ramdisk block +device) but the separate packaging of initrd (which is nice if you have +non-GPL code you'd like to run from initramfs, without conflating it with +the GPL licensed Linux kernel binary). + +It can also be used to supplement the kernel's built-in initamfs image. The +files in the external archive will overwrite any conflicting files in +the built-in initramfs archive. Some distributors also prefer to customize +a single kernel image with task-specific initramfs images, without recompiling. + Contents of initramfs: ---------------------- +An initramfs archive is a complete self-contained root filesystem for Linux. If you don't already understand what shared libraries, devices, and paths you need to get a minimal root filesystem up and running, here are some references: @@ -176,13 +242,36 @@ code against, along with some related utilities. It is BSD licensed. I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net) myself. These are LGPL and GPL, respectively. (A self-contained initramfs -package is planned for the busybox 1.2 release.) +package is planned for the busybox 1.3 release.) In theory you could use glibc, but that's not well suited for small embedded uses like this. (A "hello world" program statically linked against glibc is over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do name lookups, even when otherwise statically linked.) +A good first step is to get initramfs to run a statically linked "hello world" +program as init, and test it under an emulator like qemu (www.qemu.org) or +User Mode Linux, like so: + + cat > hello.c << EOF + #include <stdio.h> + #include <unistd.h> + + int main(int argc, char *argv[]) + { + printf("Hello world!\n"); + sleep(999999999); + } + EOF + gcc -static hello2.c -o init + echo init | cpio -o -H newc | gzip > test.cpio.gz + # Testing external initramfs using the initrd loading mechanism. + qemu -kernel /boot/vmlinuz -initrd test.cpio.gz /dev/zero + +When debugging a normal root filesystem, it's nice to be able to boot with +"init=/bin/sh". The initramfs equivalent is "rdinit=/bin/sh", and it's +just as useful. + Why cpio rather than tar? ------------------------- @@ -241,7 +330,7 @@ the above threads) is: Future directions: ------------------ -Today (2.6.14), initramfs is always compiled in, but not always used. The +Today (2.6.16), initramfs is always compiled in, but not always used. The kernel falls back to legacy boot code that is reached only if initramfs does not contain an /init program. The fallback is legacy code, there to ensure a smooth transition and allowing early boot functionality to gradually move to @@ -258,8 +347,9 @@ and so on. This kind of complexity (which inevitably includes policy) is rightly handled in userspace. Both klibc and busybox/uClibc are working on simple initramfs -packages to drop into a kernel build, and when standard solutions are ready -and widely deployed, the kernel's legacy early boot code will become obsolete -and a candidate for the feature removal schedule. +packages to drop into a kernel build. -But that's a while off yet. +The klibc package has now been accepted into Andrew Morton's 2.6.17-mm tree. +The kernel's current early boot code (partition detection, etc) will probably +be migrated into a default initramfs, automatically created and used by the +kernel build. diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 3a2e5520c1e3..9d3aed628bc1 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -113,8 +113,8 @@ members are defined: struct file_system_type { const char *name; int fs_flags; - struct super_block *(*get_sb) (struct file_system_type *, int, - const char *, void *); + struct int (*get_sb) (struct file_system_type *, int, + const char *, void *, struct vfsmount *); void (*kill_sb) (struct super_block *); struct module *owner; struct file_system_type * next; @@ -211,7 +211,7 @@ struct super_operations { int (*sync_fs)(struct super_block *sb, int wait); void (*write_super_lockfs) (struct super_block *); void (*unlockfs) (struct super_block *); - int (*statfs) (struct super_block *, struct kstatfs *); + int (*statfs) (struct dentry *, struct kstatfs *); int (*remount_fs) (struct super_block *, int *, char *); void (*clear_inode) (struct inode *); void (*umount_begin) (struct super_block *); diff --git a/Documentation/hwmon/abituguru b/Documentation/hwmon/abituguru new file mode 100644 index 000000000000..69cdb527d58f --- /dev/null +++ b/Documentation/hwmon/abituguru @@ -0,0 +1,59 @@ +Kernel driver abituguru +======================= + +Supported chips: + * Abit uGuru (Hardware Monitor part only) + Prefix: 'abituguru' + Addresses scanned: ISA 0x0E0 + Datasheet: Not available, this driver is based on reverse engineering. + A "Datasheet" has been written based on the reverse engineering it + should be available in the same dir as this file under the name + abituguru-datasheet. + +Authors: + Hans de Goede <j.w.r.degoede@hhs.nl>, + (Initial reverse engineering done by Olle Sandberg + <ollebull@gmail.com>) + + +Module Parameters +----------------- + +* force: bool Force detection. Note this parameter only causes the + detection to be skipped, if the uGuru can't be read + the module initialization (insmod) will still fail. +* fan_sensors: int Tell the driver how many fan speed sensors there are + on your motherboard. Default: 0 (autodetect). +* pwms: int Tell the driver how many fan speed controls (fan + pwms) your motherboard has. Default: 0 (autodetect). +* verbose: int How verbose should the driver be? (0-3): + 0 normal output + 1 + verbose error reporting + 2 + sensors type probing info\n" + 3 + retryable error reporting + Default: 2 (the driver is still in the testing phase) + +Notice if you need any of the first three options above please insmod the +driver with verbose set to 3 and mail me <j.w.r.degoede@hhs.nl> the output of: +dmesg | grep abituguru + + +Description +----------- + +This driver supports the hardware monitoring features of the Abit uGuru chip +found on Abit uGuru featuring motherboards (most modern Abit motherboards). + +The uGuru chip in reality is a Winbond W83L950D in disguise (despite Abit +claiming it is "a new microprocessor designed by the ABIT Engineers"). +Unfortunatly this doesn't help since the W83L950D is a generic +microcontroller with a custom Abit application running on it. + +Despite Abit not releasing any information regarding the uGuru, Olle +Sandberg <ollebull@gmail.com> has managed to reverse engineer the sensor part +of the uGuru. Without his work this driver would not have been possible. + +Known Issues +------------ + +The voltage and frequency control parts of the Abit uGuru are not supported. diff --git a/Documentation/hwmon/abituguru-datasheet b/Documentation/hwmon/abituguru-datasheet new file mode 100644 index 000000000000..aef5a9b36846 --- /dev/null +++ b/Documentation/hwmon/abituguru-datasheet @@ -0,0 +1,312 @@ +uGuru datasheet +=============== + +First of all, what I know about uGuru is no fact based on any help, hints or +datasheet from Abit. The data I have got on uGuru have I assembled through +my weak knowledge in "backwards engineering". +And just for the record, you may have noticed uGuru isn't a chip developed by +Abit, as they claim it to be. It's realy just an microprocessor (uC) created by +Winbond (W83L950D). And no, reading the manual for this specific uC or +mailing Windbond for help won't give any usefull data about uGuru, as it is +the program inside the uC that is responding to calls. + +Olle Sandberg <ollebull@gmail.com>, 2005-05-25 + + +Original version by Olle Sandberg who did the heavy lifting of the initial +reverse engineering. This version has been almost fully rewritten for clarity +and extended with write support and info on more databanks, the write support +is once again reverse engineered by Olle the additional databanks have been +reverse engineered by me. I would like to express my thanks to Olle, this +document and the Linux driver could not have been written without his efforts. + +Note: because of the lack of specs only the sensors part of the uGuru is +described here and not the CPU / RAM / etc voltage & frequency control. + +Hans de Goede <j.w.r.degoede@hhs.nl>, 28-01-2006 + + +Detection +========= + +As far as known the uGuru is always placed at and using the (ISA) I/O-ports +0xE0 and 0xE4, so we don't have to scan any port-range, just check what the two +ports are holding for detection. We will refer to 0xE0 as CMD (command-port) +and 0xE4 as DATA because Abit refers to them with these names. + +If DATA holds 0x00 or 0x08 and CMD holds 0x00 or 0xAC an uGuru could be +present. We have to check for two different values at data-port, because +after a reboot uGuru will hold 0x00 here, but if the driver is removed and +later on attached again data-port will hold 0x08, more about this later. + +After wider testing of the Linux kernel driver some variants of the uGuru have +turned up which will hold 0x00 instead of 0xAC at the CMD port, thus we also +have to test CMD for two different values. On these uGuru's DATA will initally +hold 0x09 and will only hold 0x08 after reading CMD first, so CMD must be read +first! + +To be really sure an uGuru is present a test read of one or more register +sets should be done. + + +Reading / Writing +================= + +Addressing +---------- + +The uGuru has a number of different addressing levels. The first addressing +level we will call banks. A bank holds data for one or more sensors. The data +in a bank for a sensor is one or more bytes large. + +The number of bytes is fixed for a given bank, you should always read or write +that many bytes, reading / writing more will fail, the results when writing +less then the number of bytes for a given bank are undetermined. + +See below for all known bank addresses, numbers of sensors in that bank, +number of bytes data per sensor and contents/meaning of those bytes. + +Although both this document and the kernel driver have kept the sensor +terminoligy for the addressing within a bank this is not 100% correct, in +bank 0x24 for example the addressing within the bank selects a PWM output not +a sensor. + +Notice that some banks have both a read and a write address this is how the +uGuru determines if a read from or a write to the bank is taking place, thus +when reading you should always use the read address and when writing the +write address. The write address is always one (1) more then the read address. + + +uGuru ready +----------- + +Before you can read from or write to the uGuru you must first put the uGuru +in "ready" mode. + +To put the uGuru in ready mode first write 0x00 to DATA and then wait for DATA +to hold 0x09, DATA should read 0x09 within 250 read cycles. + +Next CMD _must_ be read and should hold 0xAC, usually CMD will hold 0xAC the +first read but sometimes it takes a while before CMD holds 0xAC and thus it +has to be read a number of times (max 50). + +After reading CMD, DATA should hold 0x08 which means that the uGuru is ready +for input. As above DATA will usually hold 0x08 the first read but not always. +This step can be skipped, but it is undetermined what happens if the uGuru has +not yet reported 0x08 at DATA and you proceed with writing a bank address. + + +Sending bank and sensor addresses to the uGuru +---------------------------------------------- + +First the uGuru must be in "ready" mode as described above, DATA should hold +0x08 indicating that the uGuru wants input, in this case the bank address. + +Next write the bank address to DATA. After the bank address has been written +wait for to DATA to hold 0x08 again indicating that it wants / is ready for +more input (max 250 reads). + +Once DATA holds 0x08 again write the sensor address to CMD. + + +Reading +------- + +First send the bank and sensor addresses as described above. +Then for each byte of data you want to read wait for DATA to hold 0x01 +which indicates that the uGuru is ready to be read (max 250 reads) and once +DATA holds 0x01 read the byte from CMD. + +Once all bytes have been read data will hold 0x09, but there is no reason to +test for this. Notice that the number of bytes is bank address dependent see +above and below. + +After completing a successfull read it is advised to put the uGuru back in +ready mode, so that it is ready for the next read / write cycle. This way +if your program / driver is unloaded and later loaded again the detection +algorithm described above will still work. + + + +Writing +------- + +First send the bank and sensor addresses as described above. +Then for each byte of data you want to write wait for DATA to hold 0x00 +which indicates that the uGuru is ready to be written (max 250 reads) and +once DATA holds 0x00 write the byte to CMD. + +Once all bytes have been written wait for DATA to hold 0x01 (max 250 reads) +don't ask why this is the way it is. + +Once DATA holds 0x01 read CMD it should hold 0xAC now. + +After completing a successfull write it is advised to put the uGuru back in +ready mode, so that it is ready for the next read / write cycle. This way +if your program / driver is unloaded and later loaded again the detection +algorithm described above will still work. + + +Gotchas +------- + +After wider testing of the Linux kernel driver some variants of the uGuru have +turned up which do not hold 0x08 at DATA within 250 reads after writing the +bank address. With these versions this happens quite frequent, using larger +timeouts doesn't help, they just go offline for a second or 2, doing some +internal callibration or whatever. Your code should be prepared to handle +this and in case of no response in this specific case just goto sleep for a +while and then retry. + + +Address Map +=========== + +Bank 0x20 Alarms (R) +-------------------- +This bank contains 0 sensors, iow the sensor address is ignored (but must be +written) just use 0. Bank 0x20 contains 3 bytes: + +Byte 0: +This byte holds the alarm flags for sensor 0-7 of Sensor Bank1, with bit 0 +corresponding to sensor 0, 1 to 1, etc. + +Byte 1: +This byte holds the alarm flags for sensor 8-15 of Sensor Bank1, with bit 0 +corresponding to sensor 8, 1 to 9, etc. + +Byte 2: +This byte holds the alarm flags for sensor 0-5 of Sensor Bank2, with bit 0 +corresponding to sensor 0, 1 to 1, etc. + + +Bank 0x21 Sensor Bank1 Values / Readings (R) +-------------------------------------------- +This bank contains 16 sensors, for each sensor it contains 1 byte. +So far the following sensors are known to be available on all motherboards: +Sensor 0 CPU temp +Sensor 1 SYS temp +Sensor 3 CPU core volt +Sensor 4 DDR volt +Sensor 10 DDR Vtt volt +Sensor 15 PWM temp + +Byte 0: +This byte holds the reading from the sensor. Sensors in Bank1 can be both +volt and temp sensors, this is motherboard specific. The uGuru however does +seem to know (be programmed with) what kindoff sensor is attached see Sensor +Bank1 Settings description. + +Volt sensors use a linear scale, a reading 0 corresponds with 0 volt and a +reading of 255 with 3494 mV. The sensors for higher voltages however are +connected through a division circuit. The currently known division circuits +in use result in ranges of: 0-4361mV, 0-6248mV or 0-14510mV. 3.3 volt sources +use the 0-4361mV range, 5 volt the 0-6248mV and 12 volt the 0-14510mV . + +Temp sensors also use a linear scale, a reading of 0 corresponds with 0 degree +Celsius and a reading of 255 with a reading of 255 degrees Celsius. + + +Bank 0x22 Sensor Bank1 Settings (R) +Bank 0x23 Sensor Bank1 Settings (W) +----------------------------------- + +This bank contains 16 sensors, for each sensor it contains 3 bytes. Each +set of 3 bytes contains the settings for the sensor with the same sensor +address in Bank 0x21 . + +Byte 0: +Alarm behaviour for the selected sensor. A 1 enables the described behaviour. +Bit 0: Give an alarm if measured temp is over the warning threshold (RW) * +Bit 1: Give an alarm if measured volt is over the max threshold (RW) ** +Bit 2: Give an alarm if measured volt is under the min threshold (RW) ** +Bit 3: Beep if alarm (RW) +Bit 4: 1 if alarm cause measured temp is over the warning threshold (R) +Bit 5: 1 if alarm cause measured volt is over the max threshold (R) +Bit 6: 1 if alarm cause measured volt is under the min threshold (R) +Bit 7: Volt sensor: Shutdown if alarm persist for more then 4 seconds (RW) + Temp sensor: Shutdown if temp is over the shutdown threshold (RW) + +* This bit is only honored/used by the uGuru if a temp sensor is connected +** This bit is only honored/used by the uGuru if a volt sensor is connected +Note with some trickery this can be used to find out what kinda sensor is +detected see the Linux kernel driver for an example with many comments on +how todo this. + +Byte 1: +Temp sensor: warning threshold (scale as bank 0x21) +Volt sensor: min threshold (scale as bank 0x21) + +Byte 2: +Temp sensor: shutdown threshold (scale as bank 0x21) +Volt sensor: max threshold (scale as bank 0x21) + + +Bank 0x24 PWM outputs for FAN's (R) +Bank 0x25 PWM outputs for FAN's (W) +----------------------------------- + +This bank contains 3 "sensors", for each sensor it contains 5 bytes. +Sensor 0 usually controls the CPU fan +Sensor 1 usually controls the NB (or chipset for single chip) fan +Sensor 2 usually controls the System fan + +Byte 0: +Flag 0x80 to enable control, Fan runs at 100% when disabled. +low nibble (temp)sensor address at bank 0x21 used for control. + +Byte 1: +0-255 = 0-12v (linear), specify voltage at which fan will rotate when under +low threshold temp (specified in byte 3) + +Byte 2: +0-255 = 0-12v (linear), specify voltage at which fan will rotate when above +high threshold temp (specified in byte 4) + +Byte 3: +Low threshold temp (scale as bank 0x21) + +byte 4: +High threshold temp (scale as bank 0x21) + + +Bank 0x26 Sensors Bank2 Values / Readings (R) +--------------------------------------------- + +This bank contains 6 sensors (AFAIK), for each sensor it contains 1 byte. +So far the following sensors are known to be available on all motherboards: +Sensor 0: CPU fan speed +Sensor 1: NB (or chipset for single chip) fan speed +Sensor 2: SYS fan speed + +Byte 0: +This byte holds the reading from the sensor. 0-255 = 0-15300 (linear) + + +Bank 0x27 Sensors Bank2 Settings (R) +Bank 0x28 Sensors Bank2 Settings (W) +------------------------------------ + +This bank contains 6 sensors (AFAIK), for each sensor it contains 2 bytes. + +Byte 0: +Alarm behaviour for the selected sensor. A 1 enables the described behaviour. +Bit 0: Give an alarm if measured rpm is under the min threshold (RW) +Bit 3: Beep if alarm (RW) +Bit 7: Shutdown if alarm persist for more then 4 seconds (RW) + +Byte 1: +min threshold (scale as bank 0x26) + + +Warning for the adventerous +=========================== + +A word of caution to those who want to experiment and see if they can figure +the voltage / clock programming out, I tried reading and only reading banks +0-0x30 with the reading code used for the sensor banks (0x20-0x28) and this +resulted in a _permanent_ reprogramming of the voltages, luckily I had the +sensors part configured so that it would shutdown my system on any out of spec +voltages which proprably safed my computer (after a reboot I managed to +immediatly enter the bios and reload the defaults). This probably means that +the read/write cycle for the non sensor part is different from the sensor part. diff --git a/Documentation/hwmon/lm70 b/Documentation/hwmon/lm70 new file mode 100644 index 000000000000..2bdd3feebf53 --- /dev/null +++ b/Documentation/hwmon/lm70 @@ -0,0 +1,31 @@ +Kernel driver lm70 +================== + +Supported chip: + * National Semiconductor LM70 + Datasheet: http://www.national.com/pf/LM/LM70.html + +Author: + Kaiwan N Billimoria <kaiwan@designergraphix.com> + +Description +----------- + +This driver implements support for the National Semiconductor LM70 +temperature sensor. + +The LM70 temperature sensor chip supports a single temperature sensor. +It communicates with a host processor (or microcontroller) via an +SPI/Microwire Bus interface. + +Communication with the LM70 is simple: when the temperature is to be sensed, +the driver accesses the LM70 using SPI communication: 16 SCLK cycles +comprise the MOSI/MISO loop. At the end of the transfer, the 11-bit 2's +complement digital temperature (sent via the SIO line), is available in the +driver for interpretation. This driver makes use of the kernel's in-core +SPI support. + +Thanks to +--------- +Jean Delvare <khali@linux-fr.org> for mentoring the hwmon-side driver +development. diff --git a/Documentation/hwmon/lm83 b/Documentation/hwmon/lm83 index 061d9ed8ff43..f7aad1489cb0 100644 --- a/Documentation/hwmon/lm83 +++ b/Documentation/hwmon/lm83 @@ -7,6 +7,10 @@ Supported chips: Addresses scanned: I2C 0x18 - 0x1a, 0x29 - 0x2b, 0x4c - 0x4e Datasheet: Publicly available at the National Semiconductor website http://www.national.com/pf/LM/LM83.html + * National Semiconductor LM82 + Addresses scanned: I2C 0x18 - 0x1a, 0x29 - 0x2b, 0x4c - 0x4e + Datasheet: Publicly available at the National Semiconductor website + http://www.national.com/pf/LM/LM82.html Author: Jean Delvare <khali@linux-fr.org> @@ -15,10 +19,11 @@ Description ----------- The LM83 is a digital temperature sensor. It senses its own temperature as -well as the temperature of up to three external diodes. It is compatible -with many other devices such as the LM84 and all other ADM1021 clones. -The main difference between the LM83 and the LM84 in that the later can -only sense the temperature of one external diode. +well as the temperature of up to three external diodes. The LM82 is +a stripped down version of the LM83 that only supports one external diode. +Both are compatible with many other devices such as the LM84 and all +other ADM1021 clones. The main difference between the LM83 and the LM84 +in that the later can only sense the temperature of one external diode. Using the adm1021 driver for a LM83 should work, but only two temperatures will be reported instead of four. @@ -30,12 +35,16 @@ contact us. Note that the LM90 can easily be misdetected as a LM83. Confirmed motherboards: SBS P014 + SBS PSL09 Unconfirmed motherboards: Gigabyte GA-8IK1100 Iwill MPX2 Soltek SL-75DRV5 +The LM82 is confirmed to have been found on most AMD Geode reference +designs and test platforms. + The driver has been successfully tested by Magnus Forsström, who I'd like to thank here. More testers will be of course welcome. diff --git a/Documentation/hwmon/smsc47m192 b/Documentation/hwmon/smsc47m192 new file mode 100644 index 000000000000..45d6453cd435 --- /dev/null +++ b/Documentation/hwmon/smsc47m192 @@ -0,0 +1,102 @@ +Kernel driver smsc47m192 +======================== + +Supported chips: + * SMSC LPC47M192 and LPC47M997 + Prefix: 'smsc47m192' + Addresses scanned: I2C 0x2c - 0x2d + Datasheet: The datasheet for LPC47M192 is publicly available from + http://www.smsc.com/ + The LPC47M997 is compatible for hardware monitoring. + +Author: Hartmut Rick <linux@rick.claranet.de> + Special thanks to Jean Delvare for careful checking + of the code and many helpful comments and suggestions. + + +Description +----------- + +This driver implements support for the hardware sensor capabilities +of the SMSC LPC47M192 and LPC47M997 Super-I/O chips. + +These chips support 3 temperature channels and 8 voltage inputs +as well as CPU voltage VID input. + +They do also have fan monitoring and control capabilities, but the +these features are accessed via ISA bus and are not supported by this +driver. Use the 'smsc47m1' driver for fan monitoring and control. + +Voltages and temperatures are measured by an 8-bit ADC, the resolution +of the temperatures is 1 bit per degree C. +Voltages are scaled such that the nominal voltage corresponds to +192 counts, i.e. 3/4 of the full range. Thus the available range for +each voltage channel is 0V ... 255/192*(nominal voltage), the resolution +is 1 bit per (nominal voltage)/192. +Both voltage and temperature values are scaled by 1000, the sys files +show voltages in mV and temperatures in units of 0.001 degC. + +The +12V analog voltage input channel (in4_input) is multiplexed with +bit 4 of the encoded CPU voltage. This means that you either get +a +12V voltage measurement or a 5 bit CPU VID, but not both. +The default setting is to use the pin as 12V input, and use only 4 bit VID. +This driver assumes that the information in the configuration register +is correct, i.e. that the BIOS has updated the configuration if +the motherboard has this input wired to VID4. + +The temperature and voltage readings are updated once every 1.5 seconds. +Reading them more often repeats the same values. + + +sysfs interface +--------------- + +in0_input - +2.5V voltage input +in1_input - CPU voltage input (nominal 2.25V) +in2_input - +3.3V voltage input +in3_input - +5V voltage input +in4_input - +12V voltage input (may be missing if used as VID4) +in5_input - Vcc voltage input (nominal 3.3V) + This is the supply voltage of the sensor chip itself. +in6_input - +1.5V voltage input +in7_input - +1.8V voltage input + +in[0-7]_min, +in[0-7]_max - lower and upper alarm thresholds for in[0-7]_input reading + + All voltages are read and written in mV. + +in[0-7]_alarm - alarm flags for voltage inputs + These files read '1' in case of alarm, '0' otherwise. + +temp1_input - chip temperature measured by on-chip diode +temp[2-3]_input - temperature measured by external diodes (one of these would + typically be wired to the diode inside the CPU) + +temp[1-3]_min, +temp[1-3]_max - lower and upper alarm thresholds for temperatures + +temp[1-3]_offset - temperature offset registers + The chip adds the offsets stored in these registers to + the corresponding temperature readings. + Note that temp1 and temp2 offsets share the same register, + they cannot both be different from zero at the same time. + Writing a non-zero number to one of them will reset the other + offset to zero. + + All temperatures and offsets are read and written in + units of 0.001 degC. + +temp[1-3]_alarm - alarm flags for temperature inputs, '1' in case of alarm, + '0' otherwise. +temp[2-3]_input_fault - diode fault flags for temperature inputs 2 and 3. + A fault is detected if the two pins for the corresponding + sensor are open or shorted, or any of the two is shorted + to ground or Vcc. '1' indicates a diode fault. + +cpu0_vid - CPU voltage as received from the CPU + +vrm - CPU VID standard used for decoding CPU voltage + + The *_min, *_max, *_offset and vrm files can be read and + written, all others are read-only. diff --git a/Documentation/hwmon/sysfs-interface b/Documentation/hwmon/sysfs-interface index a0d0ab24288e..d1d390aaf620 100644 --- a/Documentation/hwmon/sysfs-interface +++ b/Documentation/hwmon/sysfs-interface @@ -3,15 +3,15 @@ Naming and data format standards for sysfs files The libsensors library offers an interface to the raw sensors data through the sysfs interface. See libsensors documentation and source for -more further information. As of writing this document, libsensors -(from lm_sensors 2.8.3) is heavily chip-dependant. Adding or updating +further information. As of writing this document, libsensors +(from lm_sensors 2.8.3) is heavily chip-dependent. Adding or updating support for any given chip requires modifying the library's code. This is because libsensors was written for the procfs interface older kernel modules were using, which wasn't standardized enough. Recent versions of libsensors (from lm_sensors 2.8.2 and later) have support for the sysfs interface, though. -The new sysfs interface was designed to be as chip-independant as +The new sysfs interface was designed to be as chip-independent as possible. Note that motherboards vary widely in the connections to sensor chips. @@ -24,7 +24,7 @@ range using external resistors. Since the values of these resistors can change from motherboard to motherboard, the conversions cannot be hard coded into the driver and have to be done in user space. -For this reason, even if we aim at a chip-independant libsensors, it will +For this reason, even if we aim at a chip-independent libsensors, it will still require a configuration file (e.g. /etc/sensors.conf) for proper values conversion, labeling of inputs and hiding of unused inputs. @@ -39,15 +39,16 @@ If you are developing a userspace application please send us feedback on this standard. Note that this standard isn't completely established yet, so it is subject -to changes, even important ones. One more reason to use the library instead -of accessing sysfs files directly. +to changes. If you are writing a new hardware monitoring driver those +features can't seem to fit in this interface, please contact us with your +extension proposal. Keep in mind that backward compatibility must be +preserved. Each chip gets its own directory in the sysfs /sys/devices tree. To -find all sensor chips, it is easier to follow the symlinks from -/sys/i2c/devices/ +find all sensor chips, it is easier to follow the device symlinks from +/sys/class/hwmon/hwmon*. -All sysfs values are fixed point numbers. To get the true value of some -of the values, you should divide by the specified value. +All sysfs values are fixed point numbers. There is only one value per file, unlike the older /proc specification. The common scheme for files naming is: <type><number>_<item>. Usual @@ -69,28 +70,40 @@ to cause an alarm) is chip-dependent. ------------------------------------------------------------------------- +[0-*] denotes any positive number starting from 0 +[1-*] denotes any positive number starting from 1 +RO read only value +RW read/write value + +Read/write values may be read-only for some chips, depending on the +hardware implementation. + +All entries are optional, and should only be created in a given driver +if the chip has the feature. + ************ * Voltages * ************ -in[0-8]_min Voltage min value. +in[0-*]_min Voltage min value. Unit: millivolt - Read/Write + RW -in[0-8]_max Voltage max value. +in[0-*]_max Voltage max value. Unit: millivolt - Read/Write + RW -in[0-8]_input Voltage input value. +in[0-*]_input Voltage input value. Unit: millivolt - Read only + RO + Voltage measured on the chip pin. Actual voltage depends on the scaling resistors on the motherboard, as recommended in the chip datasheet. This varies by chip and by motherboard. Because of this variation, values are generally NOT scaled by the chip driver, and must be done by the application. However, some drivers (notably lm87 and via686a) - do scale, with various degrees of success. + do scale, because of internal resistors built into a chip. These drivers will output the actual voltage. Typical usage: @@ -104,58 +117,72 @@ in[0-8]_input Voltage input value. in7_* varies in8_* varies -cpu[0-1]_vid CPU core reference voltage. +cpu[0-*]_vid CPU core reference voltage. Unit: millivolt - Read only. + RO Not always correct. vrm Voltage Regulator Module version number. - Read only. - Two digit number, first is major version, second is - minor version. + RW (but changing it should no more be necessary) + Originally the VRM standard version multiplied by 10, but now + an arbitrary number, as not all standards have a version + number. Affects the way the driver calculates the CPU core reference voltage from the vid pins. +Also see the Alarms section for status flags associated with voltages. + ******** * Fans * ******** -fan[1-3]_min Fan minimum value +fan[1-*]_min Fan minimum value Unit: revolution/min (RPM) - Read/Write. + RW -fan[1-3]_input Fan input value. +fan[1-*]_input Fan input value. Unit: revolution/min (RPM) - Read only. + RO -fan[1-3]_div Fan divisor. +fan[1-*]_div Fan divisor. Integer value in powers of two (1, 2, 4, 8, 16, 32, 64, 128). + RW Some chips only support values 1, 2, 4 and 8. Note that this is actually an internal clock divisor, which affects the measurable speed range, not the read value. +Also see the Alarms section for status flags associated with fans. + + ******* * PWM * ******* -pwm[1-3] Pulse width modulation fan control. +pwm[1-*] Pulse width modulation fan control. Integer value in the range 0 to 255 - Read/Write + RW 255 is max or 100%. -pwm[1-3]_enable +pwm[1-*]_enable Switch PWM on and off. Not always present even if fan*_pwm is. - 0 to turn off - 1 to turn on in manual mode - 2 to turn on in automatic mode - Read/Write + 0: turn off + 1: turn on in manual mode + 2+: turn on in automatic mode + Check individual chip documentation files for automatic mode details. + RW + +pwm[1-*]_mode + 0: DC mode + 1: PWM mode + RW pwm[1-*]_auto_channels_temp Select which temperature channels affect this PWM output in auto mode. Bitfield, 1 is temp1, 2 is temp2, 4 is temp3 etc... Which values are possible depend on the chip used. + RW pwm[1-*]_auto_point[1-*]_pwm pwm[1-*]_auto_point[1-*]_temp @@ -163,6 +190,7 @@ pwm[1-*]_auto_point[1-*]_temp_hyst Define the PWM vs temperature curve. Number of trip points is chip-dependent. Use this for chips which associate trip points to PWM output channels. + RW OR @@ -172,50 +200,57 @@ temp[1-*]_auto_point[1-*]_temp_hyst Define the PWM vs temperature curve. Number of trip points is chip-dependent. Use this for chips which associate trip points to temperature channels. + RW **************** * Temperatures * **************** -temp[1-3]_type Sensor type selection. +temp[1-*]_type Sensor type selection. Integers 1 to 4 or thermistor Beta value (typically 3435) - Read/Write. + RW 1: PII/Celeron Diode 2: 3904 transistor 3: thermal diode 4: thermistor (default/unknown Beta) Not all types are supported by all chips -temp[1-4]_max Temperature max value. - Unit: millidegree Celcius - Read/Write value. +temp[1-*]_max Temperature max value. + Unit: millidegree Celsius (or millivolt, see below) + RW -temp[1-3]_min Temperature min value. - Unit: millidegree Celcius - Read/Write value. +temp[1-*]_min Temperature min value. + Unit: millidegree Celsius + RW -temp[1-3]_max_hyst +temp[1-*]_max_hyst Temperature hysteresis value for max limit. - Unit: millidegree Celcius + Unit: millidegree Celsius Must be reported as an absolute temperature, NOT a delta from the max value. - Read/Write value. + RW -temp[1-4]_input Temperature input value. - Unit: millidegree Celcius - Read only value. +temp[1-*]_input Temperature input value. + Unit: millidegree Celsius + RO -temp[1-4]_crit Temperature critical value, typically greater than +temp[1-*]_crit Temperature critical value, typically greater than corresponding temp_max values. - Unit: millidegree Celcius - Read/Write value. + Unit: millidegree Celsius + RW -temp[1-2]_crit_hyst +temp[1-*]_crit_hyst Temperature hysteresis value for critical limit. - Unit: millidegree Celcius + Unit: millidegree Celsius Must be reported as an absolute temperature, NOT a delta from the critical value. + RW + +temp[1-4]_offset + Temperature offset which is added to the temperature reading + by the chip. + Unit: millidegree Celsius Read/Write value. If there are multiple temperature sensors, temp1_* is @@ -225,6 +260,17 @@ temp[1-2]_crit_hyst itself, for example the thermal diode inside the CPU or a thermistor nearby. +Some chips measure temperature using external thermistors and an ADC, and +report the temperature measurement as a voltage. Converting this voltage +back to a temperature (or the other way around for limits) requires +mathematical functions not available in the kernel, so the conversion +must occur in user space. For these chips, all temp* files described +above should contain values expressed in millivolt instead of millidegree +Celsius. In other words, such temperature channels are handled as voltage +channels by the driver. + +Also see the Alarms section for status flags associated with temperatures. + ************ * Currents * @@ -233,25 +279,88 @@ temp[1-2]_crit_hyst Note that no known chip provides current measurements as of writing, so this part is theoretical, so to say. -curr[1-n]_max Current max value +curr[1-*]_max Current max value Unit: milliampere - Read/Write. + RW -curr[1-n]_min Current min value. +curr[1-*]_min Current min value. Unit: milliampere - Read/Write. + RW -curr[1-n]_input Current input value +curr[1-*]_input Current input value Unit: milliampere - Read only. + RO -********* -* Other * -********* +********** +* Alarms * +********** + +Each channel or limit may have an associated alarm file, containing a +boolean value. 1 means than an alarm condition exists, 0 means no alarm. + +Usually a given chip will either use channel-related alarms, or +limit-related alarms, not both. The driver should just reflect the hardware +implementation. + +in[0-*]_alarm +fan[1-*]_alarm +temp[1-*]_alarm + Channel alarm + 0: no alarm + 1: alarm + RO + +OR + +in[0-*]_min_alarm +in[0-*]_max_alarm +fan[1-*]_min_alarm +temp[1-*]_min_alarm +temp[1-*]_max_alarm +temp[1-*]_crit_alarm + Limit alarm + 0: no alarm + 1: alarm + RO + +Each input channel may have an associated fault file. This can be used +to notify open diodes, unconnected fans etc. where the hardware +supports it. When this boolean has value 1, the measurement for that +channel should not be trusted. + +in[0-*]_input_fault +fan[1-*]_input_fault +temp[1-*]_input_fault + Input fault condition + 0: no fault occured + 1: fault condition + RO + +Some chips also offer the possibility to get beeped when an alarm occurs: + +beep_enable Master beep enable + 0: no beeps + 1: beeps + RW + +in[0-*]_beep +fan[1-*]_beep +temp[1-*]_beep + Channel beep + 0: disable + 1: enable + RW + +In theory, a chip could provide per-limit beep masking, but no such chip +was seen so far. + +Old drivers provided a different, non-standard interface to alarms and +beeps. These interface files are deprecated, but will be kept around +for compatibility reasons: alarms Alarm bitmask. - Read only. + RO Integer representation of one to four bytes. A '1' bit means an alarm. Chips should be programmed for 'comparator' mode so that @@ -259,35 +368,26 @@ alarms Alarm bitmask. if it is still valid. Generally a direct representation of a chip's internal alarm registers; there is no standard for the position - of individual bits. + of individual bits. For this reason, the use of this + interface file for new drivers is discouraged. Use + individual *_alarm and *_fault files instead. Bits are defined in kernel/include/sensors.h. -alarms_in Alarm bitmask relative to in (voltage) channels - Read only - A '1' bit means an alarm, LSB corresponds to in0 and so on - Prefered to 'alarms' for newer chips - -alarms_fan Alarm bitmask relative to fan channels - Read only - A '1' bit means an alarm, LSB corresponds to fan1 and so on - Prefered to 'alarms' for newer chips - -alarms_temp Alarm bitmask relative to temp (temperature) channels - Read only - A '1' bit means an alarm, LSB corresponds to temp1 and so on - Prefered to 'alarms' for newer chips +beep_mask Bitmask for beep. + Same format as 'alarms' with the same bit locations, + use discouraged for the same reason. Use individual + *_beep files instead. + RW -beep_enable Beep/interrupt enable - 0 to disable. - 1 to enable. - Read/Write -beep_mask Bitmask for beep. - Same format as 'alarms' with the same bit locations. - Read/Write +********* +* Other * +********* eeprom Raw EEPROM data in binary form. - Read only. + RO pec Enable or disable PEC (SMBus only) - Read/Write + 0: disable + 1: enable + RW diff --git a/Documentation/hwmon/userspace-tools b/Documentation/hwmon/userspace-tools index 2622aac65422..19900a8fe679 100644 --- a/Documentation/hwmon/userspace-tools +++ b/Documentation/hwmon/userspace-tools @@ -6,31 +6,32 @@ voltages, fans speed). They are often connected through an I2C bus, but some are also connected directly through the ISA bus. The kernel drivers make the data from the sensor chips available in the /sys -virtual filesystem. Userspace tools are then used to display or set or the -data in a more friendly manner. +virtual filesystem. Userspace tools are then used to display the measured +values or configure the chips in a more friendly manner. Lm-sensors ---------- -Core set of utilites that will allow you to obtain health information, +Core set of utilities that will allow you to obtain health information, setup monitoring limits etc. You can get them on their homepage http://www.lm-sensors.nu/ or as a package from your Linux distribution. If from website: -Get lmsensors from project web site. Please note, you need only userspace -part, so compile with "make user_install" target. +Get lm-sensors from project web site. Please note, you need only userspace +part, so compile with "make user" and install with "make user_install". General hints to get things working: 0) get lm-sensors userspace utils -1) compile all drivers in I2C section as modules in your kernel +1) compile all drivers in I2C and Hardware Monitoring sections as modules + in your kernel 2) run sensors-detect script, it will tell you what modules you need to load. 3) load them and run "sensors" command, you should see some results. 4) fix sensors.conf, labels, limits, fan divisors 5) if any more problems consult FAQ, or documentation -Other utilites --------------- +Other utilities +--------------- If you want some graphical indicators of system health look for applications like: gkrellm, ksensors, xsensors, wmtemp, wmsensors, wmgtemp, ksysguardd, diff --git a/Documentation/hwmon/w83791d b/Documentation/hwmon/w83791d new file mode 100644 index 000000000000..83a3836289c2 --- /dev/null +++ b/Documentation/hwmon/w83791d @@ -0,0 +1,113 @@ +Kernel driver w83791d +===================== + +Supported chips: + * Winbond W83791D + Prefix: 'w83791d' + Addresses scanned: I2C 0x2c - 0x2f + Datasheet: http://www.winbond-usa.com/products/winbond_products/pdfs/PCIC/W83791Da.pdf + +Author: Charles Spirakis <bezaur@gmail.com> + +This driver was derived from the w83781d.c and w83792d.c source files. + +Credits: + w83781d.c: + Frodo Looijaard <frodol@dds.nl>, + Philip Edelbrock <phil@netroedge.com>, + and Mark Studebaker <mdsxyz123@yahoo.com> + w83792d.c: + Chunhao Huang <DZShen@Winbond.com.tw>, + Rudolf Marek <r.marek@sh.cvut.cz> + +Module Parameters +----------------- + +* init boolean + (default 0) + Use 'init=1' to have the driver do extra software initializations. + The default behavior is to do the minimum initialization possible + and depend on the BIOS to properly setup the chip. If you know you + have a w83791d and you're having problems, try init=1 before trying + reset=1. + +* reset boolean + (default 0) + Use 'reset=1' to reset the chip (via index 0x40, bit 7). The default + behavior is no chip reset to preserve BIOS settings. + +* force_subclients=bus,caddr,saddr,saddr + This is used to force the i2c addresses for subclients of + a certain chip. Example usage is `force_subclients=0,0x2f,0x4a,0x4b' + to force the subclients of chip 0x2f on bus 0 to i2c addresses + 0x4a and 0x4b. + + +Description +----------- + +This driver implements support for the Winbond W83791D chip. + +Detection of the chip can sometimes be foiled because it can be in an +internal state that allows no clean access (Bank with ID register is not +currently selected). If you know the address of the chip, use a 'force' +parameter; this will put it into a more well-behaved state first. + +The driver implements three temperature sensors, five fan rotation speed +sensors, and ten voltage sensors. + +Temperatures are measured in degrees Celsius and measurement resolution is 1 +degC for temp1 and 0.5 degC for temp2 and temp3. An alarm is triggered when +the temperature gets higher than the Overtemperature Shutdown value; it stays +on until the temperature falls below the Hysteresis value. + +Fan rotation speeds are reported in RPM (rotations per minute). An alarm is +triggered if the rotation speed has dropped below a programmable limit. Fan +readings can be divided by a programmable divider (1, 2, 4, 8 for fan 1/2/3 +and 1, 2, 4, 8, 16, 32, 64 or 128 for fan 4/5) to give the readings more +range or accuracy. + +Voltage sensors (also known as IN sensors) report their values in millivolts. +An alarm is triggered if the voltage has crossed a programmable minimum +or maximum limit. + +Alarms are provided as output from a "realtime status register". The +following bits are defined: + +bit - alarm on: +0 - Vcore +1 - VINR0 +2 - +3.3VIN +3 - 5VDD +4 - temp1 +5 - temp2 +6 - fan1 +7 - fan2 +8 - +12VIN +9 - -12VIN +10 - -5VIN +11 - fan3 +12 - chassis +13 - temp3 +14 - VINR1 +15 - reserved +16 - tart1 +17 - tart2 +18 - tart3 +19 - VSB +20 - VBAT +21 - fan4 +22 - fan5 +23 - reserved + +When an alarm goes off, you can be warned by a beeping signal through your +computer speaker. It is possible to enable all beeping globally, or only +the beeping for some alarms. + +The driver only reads the chip values each 3 seconds; reading them more +often will do no harm, but will return 'old' values. + +W83791D TODO: +--------------- +Provide a patch for per-file alarms as discussed on the mailing list +Provide a patch for smart-fan control (still need appropriate motherboard/fans) diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801 index fd4b2712d570..e46c23458242 100644 --- a/Documentation/i2c/busses/i2c-i801 +++ b/Documentation/i2c/busses/i2c-i801 @@ -21,8 +21,7 @@ Authors: Module Parameters ----------------- -* force_addr: int - Forcibly enable the ICH at the given address. EXTREMELY DANGEROUS! +None. Description diff --git a/Documentation/i2c/busses/i2c-nforce2 b/Documentation/i2c/busses/i2c-nforce2 index d751282d9b2a..cd49c428a3ab 100644 --- a/Documentation/i2c/busses/i2c-nforce2 +++ b/Documentation/i2c/busses/i2c-nforce2 @@ -7,6 +7,8 @@ Supported adapters: * nForce3 250Gb MCP 10de:00E4 * nForce4 MCP 10de:0052 * nForce4 MCP-04 10de:0034 + * nForce4 MCP51 10de:0264 + * nForce4 MCP55 10de:0368 Datasheet: not publically available, but seems to be similar to the AMD-8111 SMBus 2.0 adapter. diff --git a/Documentation/i2c/busses/i2c-ocores b/Documentation/i2c/busses/i2c-ocores new file mode 100644 index 000000000000..cfcebb10d14e --- /dev/null +++ b/Documentation/i2c/busses/i2c-ocores @@ -0,0 +1,51 @@ +Kernel driver i2c-ocores + +Supported adapters: + * OpenCores.org I2C controller by Richard Herveille (see datasheet link) + Datasheet: http://www.opencores.org/projects.cgi/web/i2c/overview + +Author: Peter Korsgaard <jacmet@sunsite.dk> + +Description +----------- + +i2c-ocores is an i2c bus driver for the OpenCores.org I2C controller +IP core by Richard Herveille. + +Usage +----- + +i2c-ocores uses the platform bus, so you need to provide a struct +platform_device with the base address and interrupt number. The +dev.platform_data of the device should also point to a struct +ocores_i2c_platform_data (see linux/i2c-ocores.h) describing the +distance between registers and the input clock speed. + +E.G. something like: + +static struct resource ocores_resources[] = { + [0] = { + .start = MYI2C_BASEADDR, + .end = MYI2C_BASEADDR + 8, + .flags = IORESOURCE_MEM, + }, + [1] = { + .start = MYI2C_IRQ, + .end = MYI2C_IRQ, + .flags = IORESOURCE_IRQ, + }, +}; + +static struct ocores_i2c_platform_data myi2c_data = { + .regstep = 2, /* two bytes between registers */ + .clock_khz = 50000, /* input clock of 50MHz */ +}; + +static struct platform_device myi2c = { + .name = "ocores-i2c", + .dev = { + .platform_data = &myi2c_data, + }, + .num_resources = ARRAY_SIZE(ocores_resources), + .resource = ocores_resources, +}; diff --git a/Documentation/i2c/busses/i2c-piix4 b/Documentation/i2c/busses/i2c-piix4 index a1c8f581afed..921476333235 100644 --- a/Documentation/i2c/busses/i2c-piix4 +++ b/Documentation/i2c/busses/i2c-piix4 @@ -6,6 +6,8 @@ Supported adapters: Datasheet: Publicly available at the Intel website * ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges Datasheet: Only available via NDA from ServerWorks + * ATI IXP southbridges IXP200, IXP300, IXP400 + Datasheet: Not publicly available * Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge Datasheet: Publicly available at the SMSC website http://www.smsc.com @@ -21,8 +23,6 @@ Module Parameters Forcibly enable the PIIX4. DANGEROUS! * force_addr: int Forcibly enable the PIIX4 at the given address. EXTREMELY DANGEROUS! -* fix_hstcfg: int - Fix config register. Needed on some boards (Force CPCI735). Description @@ -63,10 +63,36 @@ The PIIX4E is just an new version of the PIIX4; it is supported as well. The PIIX/PIIX3 does not implement an SMBus or I2C bus, so you can't use this driver on those mainboards. -The ServerWorks Southbridges, the Intel 440MX, and the Victory766 are +The ServerWorks Southbridges, the Intel 440MX, and the Victory66 are identical to the PIIX4 in I2C/SMBus support. -A few OSB4 southbridges are known to be misconfigured by the BIOS. In this -case, you have you use the fix_hstcfg module parameter. Do not use it -unless you know you have to, because in some cases it also breaks -configuration on southbridges that don't need it. +If you own Force CPCI735 motherboard or other OSB4 based systems you may need +to change the SMBus Interrupt Select register so the SMBus controller uses +the SMI mode. + +1) Use lspci command and locate the PCI device with the SMBus controller: + 00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 4f) + The line may vary for different chipsets. Please consult the driver source + for all possible PCI ids (and lspci -n to match them). Lets assume the + device is located at 00:0f.0. +2) Now you just need to change the value in 0xD2 register. Get it first with + command: lspci -xxx -s 00:0f.0 + If the value is 0x3 then you need to change it to 0x1 + setpci -s 00:0f.0 d2.b=1 + +Please note that you don't need to do that in all cases, just when the SMBus is +not working properly. + + +Hardware-specific issues +------------------------ + +This driver will refuse to load on IBM systems with an Intel PIIX4 SMBus. +Some of these machines have an RFID EEPROM (24RF08) connected to the SMBus, +which can easily get corrupted due to a state machine bug. These are mostly +Thinkpad laptops, but desktop systems may also be affected. We have no list +of all affected systems, so the only safe solution was to prevent access to +the SMBus on all IBM systems (detected using DMI data.) + +For additional information, read: +http://www2.lm-sensors.nu/~lm78/cvs/lm_sensors2/README.thinkpad diff --git a/Documentation/i2c/busses/scx200_acb b/Documentation/i2c/busses/scx200_acb index f50e69981ec6..7c07883d4dfc 100644 --- a/Documentation/i2c/busses/scx200_acb +++ b/Documentation/i2c/busses/scx200_acb @@ -2,14 +2,31 @@ Kernel driver scx200_acb Author: Christer Weinigel <wingel@nano-system.com> +The driver supersedes the older, never merged driver named i2c-nscacb. + Module Parameters ----------------- -* base: int +* base: up to 4 ints Base addresses for the ACCESS.bus controllers on SCx200 and SC1100 devices + By default the driver uses two base addresses 0x820 and 0x840. + If you want only one base address, specify the second as 0 so as to + override this default. + Description ----------- Enable the use of the ACCESS.bus controller on the Geode SCx200 and SC1100 processors and the CS5535 and CS5536 Geode companion devices. + +Device-specific notes +--------------------- + +The SC1100 WRAP boards are known to use base addresses 0x810 and 0x820. +If the scx200_acb driver is built into the kernel, add the following +parameter to your boot command line: + scx200_acb.base=0x810,0x820 +If the scx200_acb driver is built as a module, add the following line to +the file /etc/modprobe.conf instead: + options scx200_acb base=0x810,0x820 diff --git a/Documentation/ia64/aliasing.txt b/Documentation/ia64/aliasing.txt new file mode 100644 index 000000000000..38f9a52d1820 --- /dev/null +++ b/Documentation/ia64/aliasing.txt @@ -0,0 +1,208 @@ + MEMORY ATTRIBUTE ALIASING ON IA-64 + + Bjorn Helgaas + <bjorn.helgaas@hp.com> + May 4, 2006 + + +MEMORY ATTRIBUTES + + Itanium supports several attributes for virtual memory references. + The attribute is part of the virtual translation, i.e., it is + contained in the TLB entry. The ones of most interest to the Linux + kernel are: + + WB Write-back (cacheable) + UC Uncacheable + WC Write-coalescing + + System memory typically uses the WB attribute. The UC attribute is + used for memory-mapped I/O devices. The WC attribute is uncacheable + like UC is, but writes may be delayed and combined to increase + performance for things like frame buffers. + + The Itanium architecture requires that we avoid accessing the same + page with both a cacheable mapping and an uncacheable mapping[1]. + + The design of the chipset determines which attributes are supported + on which regions of the address space. For example, some chipsets + support either WB or UC access to main memory, while others support + only WB access. + +MEMORY MAP + + Platform firmware describes the physical memory map and the + supported attributes for each region. At boot-time, the kernel uses + the EFI GetMemoryMap() interface. ACPI can also describe memory + devices and the attributes they support, but Linux/ia64 currently + doesn't use this information. + + The kernel uses the efi_memmap table returned from GetMemoryMap() to + learn the attributes supported by each region of physical address + space. Unfortunately, this table does not completely describe the + address space because some machines omit some or all of the MMIO + regions from the map. + + The kernel maintains another table, kern_memmap, which describes the + memory Linux is actually using and the attribute for each region. + This contains only system memory; it does not contain MMIO space. + + The kern_memmap table typically contains only a subset of the system + memory described by the efi_memmap. Linux/ia64 can't use all memory + in the system because of constraints imposed by the identity mapping + scheme. + + The efi_memmap table is preserved unmodified because the original + boot-time information is required for kexec. + +KERNEL IDENTITY MAPPINGS + + Linux/ia64 identity mappings are done with large pages, currently + either 16MB or 64MB, referred to as "granules." Cacheable mappings + are speculative[2], so the processor can read any location in the + page at any time, independent of the programmer's intentions. This + means that to avoid attribute aliasing, Linux can create a cacheable + identity mapping only when the entire granule supports cacheable + access. + + Therefore, kern_memmap contains only full granule-sized regions that + can referenced safely by an identity mapping. + + Uncacheable mappings are not speculative, so the processor will + generate UC accesses only to locations explicitly referenced by + software. This allows UC identity mappings to cover granules that + are only partially populated, or populated with a combination of UC + and WB regions. + +USER MAPPINGS + + User mappings are typically done with 16K or 64K pages. The smaller + page size allows more flexibility because only 16K or 64K has to be + homogeneous with respect to memory attributes. + +POTENTIAL ATTRIBUTE ALIASING CASES + + There are several ways the kernel creates new mappings: + + mmap of /dev/mem + + This uses remap_pfn_range(), which creates user mappings. These + mappings may be either WB or UC. If the region being mapped + happens to be in kern_memmap, meaning that it may also be mapped + by a kernel identity mapping, the user mapping must use the same + attribute as the kernel mapping. + + If the region is not in kern_memmap, the user mapping should use + an attribute reported as being supported in the EFI memory map. + + Since the EFI memory map does not describe MMIO on some + machines, this should use an uncacheable mapping as a fallback. + + mmap of /sys/class/pci_bus/.../legacy_mem + + This is very similar to mmap of /dev/mem, except that legacy_mem + only allows mmap of the one megabyte "legacy MMIO" area for a + specific PCI bus. Typically this is the first megabyte of + physical address space, but it may be different on machines with + several VGA devices. + + "X" uses this to access VGA frame buffers. Using legacy_mem + rather than /dev/mem allows multiple instances of X to talk to + different VGA cards. + + The /dev/mem mmap constraints apply. + + However, since this is for mapping legacy MMIO space, WB access + does not make sense. This matters on machines without legacy + VGA support: these machines may have WB memory for the entire + first megabyte (or even the entire first granule). + + On these machines, we could mmap legacy_mem as WB, which would + be safe in terms of attribute aliasing, but X has no way of + knowing that it is accessing regular memory, not a frame buffer, + so the kernel should fail the mmap rather than doing it with WB. + + read/write of /dev/mem + + This uses copy_from_user(), which implicitly uses a kernel + identity mapping. This is obviously safe for things in + kern_memmap. + + There may be corner cases of things that are not in kern_memmap, + but could be accessed this way. For example, registers in MMIO + space are not in kern_memmap, but could be accessed with a UC + mapping. This would not cause attribute aliasing. But + registers typically can be accessed only with four-byte or + eight-byte accesses, and the copy_from_user() path doesn't allow + any control over the access size, so this would be dangerous. + + ioremap() + + This returns a kernel identity mapping for use inside the + kernel. + + If the region is in kern_memmap, we should use the attribute + specified there. Otherwise, if the EFI memory map reports that + the entire granule supports WB, we should use that (granules + that are partially reserved or occupied by firmware do not appear + in kern_memmap). Otherwise, we should use a UC mapping. + +PAST PROBLEM CASES + + mmap of various MMIO regions from /dev/mem by "X" on Intel platforms + + The EFI memory map may not report these MMIO regions. + + These must be allowed so that X will work. This means that + when the EFI memory map is incomplete, every /dev/mem mmap must + succeed. It may create either WB or UC user mappings, depending + on whether the region is in kern_memmap or the EFI memory map. + + mmap of 0x0-0xA0000 /dev/mem by "hwinfo" on HP sx1000 with VGA enabled + + See https://bugzilla.novell.com/show_bug.cgi?id=140858. + + The EFI memory map reports the following attributes: + 0x00000-0x9FFFF WB only + 0xA0000-0xBFFFF UC only (VGA frame buffer) + 0xC0000-0xFFFFF WB only + + This mmap is done with user pages, not kernel identity mappings, + so it is safe to use WB mappings. + + The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000, + which will use a granule-sized UC mapping covering 0-0xFFFFF. This + granule covers some WB-only memory, but since UC is non-speculative, + the processor will never generate an uncacheable reference to the + WB-only areas unless the driver explicitly touches them. + + mmap of 0x0-0xFFFFF legacy_mem by "X" + + If the EFI memory map reports this entire range as WB, there + is no VGA MMIO hole, and the mmap should fail or be done with + a WB mapping. + + There's no easy way for X to determine whether the 0xA0000-0xBFFFF + region is a frame buffer or just memory, so I think it's best to + just fail this mmap request rather than using a WB mapping. As + far as I know, there's no need to map legacy_mem with WB + mappings. + + Otherwise, a UC mapping of the entire region is probably safe. + The VGA hole means the region will not be in kern_memmap. The + HP sx1000 chipset doesn't support UC access to the memory surrounding + the VGA hole, but X doesn't need that area anyway and should not + reference it. + + mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled + + The EFI memory map reports the following attributes: + 0x00000-0xFFFFF WB only (no VGA MMIO hole) + + This is a special case of the previous case, and the mmap should + fail for the same reason as above. + +NOTES + + [1] SDM rev 2.2, vol 2, sec 4.4.1. + [2] SDM rev 2.2, vol 2, sec 4.4.6. diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt index 5c5a4ccce76a..187035560d7f 100644 --- a/Documentation/infiniband/ipoib.txt +++ b/Documentation/infiniband/ipoib.txt @@ -1,10 +1,10 @@ IP OVER INFINIBAND The ib_ipoib driver is an implementation of the IP over InfiniBand - protocol as specified by the latest Internet-Drafts issued by the - IETF ipoib working group. It is a "native" implementation in the - sense of setting the interface type to ARPHRD_INFINIBAND and the - hardware address length to 20 (earlier proprietary implementations + protocol as specified by RFC 4391 and 4392, issued by the IETF ipoib + working group. It is a "native" implementation in the sense of + setting the interface type to ARPHRD_INFINIBAND and the hardware + address length to 20 (earlier proprietary implementations masqueraded to the kernel as ethernet interfaces). Partitions and P_Keys @@ -53,3 +53,7 @@ References IETF IP over InfiniBand (ipoib) Working Group http://ietf.org/html.charters/ipoib-charter.html + Transmission of IP over InfiniBand (IPoIB) (RFC 4391) + http://ietf.org/rfc/rfc4391.txt + IP over InfiniBand (IPoIB) Architecture (RFC 4392) + http://ietf.org/rfc/rfc4392.txt diff --git a/Documentation/ioctl-number.txt b/Documentation/ioctl-number.txt index 171a44ebd939..1543802ef53e 100644 --- a/Documentation/ioctl-number.txt +++ b/Documentation/ioctl-number.txt @@ -85,7 +85,9 @@ Code Seq# Include File Comments <mailto:maassen@uni-freiburg.de> 'C' all linux/soundcard.h 'D' all asm-s390/dasd.h +'E' all linux/input.h 'F' all linux/fb.h +'H' all linux/hiddev.h 'I' all linux/isdn.h 'J' 00-1F drivers/scsi/gdth_ioctl.h 'K' all linux/kd.h diff --git a/Documentation/isdn/README.gigaset b/Documentation/isdn/README.gigaset index 85a64defd385..fa0d4cca964a 100644 --- a/Documentation/isdn/README.gigaset +++ b/Documentation/isdn/README.gigaset @@ -124,7 +124,8 @@ GigaSet 307x Device Driver You can use some configuration tool of your distribution to configure this "modem" or configure pppd/wvdial manually. There are some example ppp - configuration files and chat scripts in the gigaset-VERSION/ppp directory. + configuration files and chat scripts in the gigaset-VERSION/ppp directory + in the driver packages from http://sourceforge.net/projects/gigaset307x/. Please note that the USB drivers are not able to change the state of the control lines (the M105 driver can be configured to use some undocumented control requests, if you really need the control lines, though). This means @@ -164,8 +165,8 @@ GigaSet 307x Device Driver If you want both of these at once, you are out of luck. - You can also use /sys/module/<name>/parameters/cidmode for changing - the CID mode setting (<name> is usb_gigaset or bas_gigaset). + You can also use /sys/class/tty/ttyGxy/cidmode for changing the CID mode + setting (ttyGxy is ttyGU0 or ttyGB0). 3. Troubleshooting diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index a9c00facdf40..14ef3868a328 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt @@ -1123,6 +1123,14 @@ The top Makefile exports the following variables: $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE). The user may override this value on the command line if desired. + INSTALL_MOD_STRIP + + If this variable is specified, will cause modules to be stripped + after they are installed. If INSTALL_MOD_STRIP is '1', then the + default option --strip-debug will be used. Otherwise, + INSTALL_MOD_STRIP will used as the option(s) to the strip command. + + === 8 Makefile language The kernel Makefiles are designed to run with GNU Make. The Makefiles diff --git a/Documentation/kdump/gdbmacros.txt b/Documentation/kdump/gdbmacros.txt index dcf5580380ab..9b9b454b048a 100644 --- a/Documentation/kdump/gdbmacros.txt +++ b/Documentation/kdump/gdbmacros.txt @@ -175,7 +175,7 @@ end document trapinfo Run info threads and lookup pid of thread #1 'trapinfo <pid>' will tell you by which trap & possibly - addresthe kernel paniced. + address the kernel panicked. end diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt index 212cf3c21abf..08bafa8c1caa 100644 --- a/Documentation/kdump/kdump.txt +++ b/Documentation/kdump/kdump.txt @@ -1,155 +1,325 @@ -Documentation for kdump - the kexec-based crash dumping solution +================================================================ +Documentation for Kdump - The kexec-based Crash Dumping Solution ================================================================ -DESIGN -====== +This document includes overview, setup and installation, and analysis +information. -Kdump uses kexec to reboot to a second kernel whenever a dump needs to be -taken. This second kernel is booted with very little memory. The first kernel -reserves the section of memory that the second kernel uses. This ensures that -on-going DMA from the first kernel does not corrupt the second kernel. +Overview +======== -All the necessary information about Core image is encoded in ELF format and -stored in reserved area of memory before crash. Physical address of start of -ELF header is passed to new kernel through command line parameter elfcorehdr=. +Kdump uses kexec to quickly boot to a dump-capture kernel whenever a +dump of the system kernel's memory needs to be taken (for example, when +the system panics). The system kernel's memory image is preserved across +the reboot and is accessible to the dump-capture kernel. -On i386, the first 640 KB of physical memory is needed to boot, irrespective -of where the kernel loads. Hence, this region is backed up by kexec just before -rebooting into the new kernel. +You can use common Linux commands, such as cp and scp, to copy the +memory image to a dump file on the local disk, or across the network to +a remote system. -In the second kernel, "old memory" can be accessed in two ways. +Kdump and kexec are currently supported on the x86, x86_64, and ppc64 +architectures. -- The first one is through a /dev/oldmem device interface. A capture utility - can read the device file and write out the memory in raw format. This is raw - dump of memory and analysis/capture tool should be intelligent enough to - determine where to look for the right information. ELF headers (elfcorehdr=) - can become handy here. +When the system kernel boots, it reserves a small section of memory for +the dump-capture kernel. This ensures that ongoing Direct Memory Access +(DMA) from the system kernel does not corrupt the dump-capture kernel. +The kexec -p command loads the dump-capture kernel into this reserved +memory. -- The second interface is through /proc/vmcore. This exports the dump as an ELF - format file which can be written out using any file copy command - (cp, scp, etc). Further, gdb can be used to perform limited debugging on - the dump file. This method ensures methods ensure that there is correct - ordering of the dump pages (corresponding to the first 640 KB that has been - relocated). +On x86 machines, the first 640 KB of physical memory is needed to boot, +regardless of where the kernel loads. Therefore, kexec backs up this +region just before rebooting into the dump-capture kernel. -SETUP -===== +All of the necessary information about the system kernel's core image is +encoded in the ELF format, and stored in a reserved area of memory +before a crash. The physical address of the start of the ELF header is +passed to the dump-capture kernel through the elfcorehdr= boot +parameter. + +With the dump-capture kernel, you can access the memory image, or "old +memory," in two ways: + +- Through a /dev/oldmem device interface. A capture utility can read the + device file and write out the memory in raw format. This is a raw dump + of memory. Analysis and capture tools must be intelligent enough to + determine where to look for the right information. + +- Through /proc/vmcore. This exports the dump as an ELF-format file that + you can write out using file copy commands such as cp or scp. Further, + you can use analysis tools such as the GNU Debugger (GDB) and the Crash + tool to debug the dump file. This method ensures that the dump pages are + correctly ordered. + + +Setup and Installation +====================== + +Install kexec-tools and the Kdump patch +--------------------------------------- + +1) Login as the root user. + +2) Download the kexec-tools user-space package from the following URL: + + http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz + +3) Unpack the tarball with the tar command, as follows: + + tar xvpzf kexec-tools-1.101.tar.gz + +4) Download the latest consolidated Kdump patch from the following URL: + + http://lse.sourceforge.net/kdump/ + + (This location is being used until all the user-space Kdump patches + are integrated with the kexec-tools package.) + +5) Change to the kexec-tools-1.101 directory, as follows: + + cd kexec-tools-1.101 + +6) Apply the consolidated patch to the kexec-tools-1.101 source tree + with the patch command, as follows. (Modify the path to the downloaded + patch as necessary.) + + patch -p1 < /path-to-kdump-patch/kexec-tools-1.101-kdump.patch + +7) Configure the package, as follows: + + ./configure + +8) Compile the package, as follows: + + make + +9) Install the package, as follows: + + make install + + +Download and build the system and dump-capture kernels +------------------------------------------------------ + +Download the mainline (vanilla) kernel source code (2.6.13-rc1 or newer) +from http://www.kernel.org. Two kernels must be built: a system kernel +and a dump-capture kernel. Use the following steps to configure these +kernels with the necessary kexec and Kdump features: + +System kernel +------------- + +1) Enable "kexec system call" in "Processor type and features." + + CONFIG_KEXEC=y + +2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo + filesystems." This is usually enabled by default. + + CONFIG_SYSFS=y + + Note that "sysfs file system support" might not appear in the "Pseudo + filesystems" menu if "Configure standard kernel features (for small + systems)" is not enabled in "General Setup." In this case, check the + .config file itself to ensure that sysfs is turned on, as follows: + + grep 'CONFIG_SYSFS' .config + +3) Enable "Compile the kernel with debug info" in "Kernel hacking." + + CONFIG_DEBUG_INFO=Y + + This causes the kernel to be built with debug symbols. The dump + analysis tools require a vmlinux with debug symbols in order to read + and analyze a dump file. + +4) Make and install the kernel and its modules. Update the boot loader + (such as grub, yaboot, or lilo) configuration files as necessary. + +5) Boot the system kernel with the boot parameter "crashkernel=Y@X", + where Y specifies how much memory to reserve for the dump-capture kernel + and X specifies the beginning of this reserved memory. For example, + "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory + starting at physical address 0x01000000 for the dump-capture kernel. + + On x86 and x86_64, use "crashkernel=64M@16M". + + On ppc64, use "crashkernel=128M@32M". + + +The dump-capture kernel +----------------------- -1) Download the upstream kexec-tools userspace package from - http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz. - - Apply the latest consolidated kdump patch on top of kexec-tools-1.101 - from http://lse.sourceforge.net/kdump/. This arrangment has been made - till all the userspace patches supporting kdump are integrated with - upstream kexec-tools userspace. - -2) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernels. - Two kernels need to be built in order to get this feature working. - Following are the steps to properly configure the two kernels specific - to kexec and kdump features: - - A) First kernel or regular kernel: - ---------------------------------- - a) Enable "kexec system call" feature (in Processor type and features). - CONFIG_KEXEC=y - b) Enable "sysfs file system support" (in Pseudo filesystems). - CONFIG_SYSFS=y - c) make - d) Boot into first kernel with the command line parameter "crashkernel=Y@X". - Use appropriate values for X and Y. Y denotes how much memory to reserve - for the second kernel, and X denotes at what physical address the - reserved memory section starts. For example: "crashkernel=64M@16M". - - - B) Second kernel or dump capture kernel: - --------------------------------------- - a) For i386 architecture enable Highmem support - CONFIG_HIGHMEM=y - b) Enable "kernel crash dumps" feature (under "Processor type and features") - CONFIG_CRASH_DUMP=y - c) Make sure a suitable value for "Physical address where the kernel is - loaded" (under "Processor type and features"). By default this value - is 0x1000000 (16MB) and it should be same as X (See option d above), - e.g., 16 MB or 0x1000000. - CONFIG_PHYSICAL_START=0x1000000 - d) Enable "/proc/vmcore support" (Optional, under "Pseudo filesystems"). - CONFIG_PROC_VMCORE=y - -3) After booting to regular kernel or first kernel, load the second kernel - using the following command: - - kexec -p <second-kernel> --args-linux --elf32-core-headers - --append="root=<root-dev> init 1 irqpoll maxcpus=1" - - Notes: - ====== - i) <second-kernel> has to be a vmlinux image ie uncompressed elf image. - bzImage will not work, as of now. - ii) --args-linux has to be speicfied as if kexec it loading an elf image, - it needs to know that the arguments supplied are of linux type. - iii) By default ELF headers are stored in ELF64 format to support systems - with more than 4GB memory. Option --elf32-core-headers forces generation - of ELF32 headers. The reason for this option being, as of now gdb can - not open vmcore file with ELF64 headers on a 32 bit systems. So ELF32 - headers can be used if one has non-PAE systems and hence memory less - than 4GB. - iv) Specify "irqpoll" as command line parameter. This reduces driver - initialization failures in second kernel due to shared interrupts. - v) <root-dev> needs to be specified in a format corresponding to the root - device name in the output of mount command. - vi) If you have built the drivers required to mount root file system as - modules in <second-kernel>, then, specify - --initrd=<initrd-for-second-kernel>. - vii) Specify maxcpus=1 as, if during first kernel run, if panic happens on - non-boot cpus, second kernel doesn't seem to be boot up all the cpus. - The other option is to always built the second kernel without SMP - support ie CONFIG_SMP=n - -4) After successfully loading the second kernel as above, if a panic occurs - system reboots into the second kernel. A module can be written to force - the panic or "ALT-SysRq-c" can be used initiate a crash dump for testing - purposes. - -5) Once the second kernel has booted, write out the dump file using +1) Under "General setup," append "-kdump" to the current string in + "Local version." + +2) On x86, enable high memory support under "Processor type and + features": + + CONFIG_HIGHMEM64G=y + or + CONFIG_HIGHMEM4G + +3) On x86 and x86_64, disable symmetric multi-processing support + under "Processor type and features": + + CONFIG_SMP=n + (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line + when loading the dump-capture kernel, see section "Load the Dump-capture + Kernel".) + +4) On ppc64, disable NUMA support and enable EMBEDDED support: + + CONFIG_NUMA=n + CONFIG_EMBEDDED=y + CONFIG_EEH=N for the dump-capture kernel + +5) Enable "kernel crash dumps" support under "Processor type and + features": + + CONFIG_CRASH_DUMP=y + +6) Use a suitable value for "Physical address where the kernel is + loaded" (under "Processor type and features"). This only appears when + "kernel crash dumps" is enabled. By default this value is 0x1000000 + (16MB). It should be the same as X in the "crashkernel=Y@X" boot + parameter discussed above. + + On x86 and x86_64, use "CONFIG_PHYSICAL_START=0x1000000". + + On ppc64 the value is automatically set at 32MB when + CONFIG_CRASH_DUMP is set. + +6) Optionally enable "/proc/vmcore support" under "Filesystems" -> + "Pseudo filesystems". + + CONFIG_PROC_VMCORE=y + (CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.) + +7) Make and install the kernel and its modules. DO NOT add this kernel + to the boot loader configuration files. + + +Load the Dump-capture Kernel +============================ + +After booting to the system kernel, load the dump-capture kernel using +the following command: + + kexec -p <dump-capture-kernel> \ + --initrd=<initrd-for-dump-capture-kernel> --args-linux \ + --append="root=<root-dev> init 1 irqpoll" + + +Notes on loading the dump-capture kernel: + +* <dump-capture-kernel> must be a vmlinux image (that is, an + uncompressed ELF image). bzImage does not work at this time. + +* By default, the ELF headers are stored in ELF64 format to support + systems with more than 4GB memory. The --elf32-core-headers option can + be used to force the generation of ELF32 headers. This is necessary + because GDB currently cannot open vmcore files with ELF64 headers on + 32-bit systems. ELF32 headers can be used on non-PAE systems (that is, + less than 4GB of memory). + +* The "irqpoll" boot parameter reduces driver initialization failures + due to shared interrupts in the dump-capture kernel. + +* You must specify <root-dev> in the format corresponding to the root + device name in the output of mount command. + +* "init 1" boots the dump-capture kernel into single-user mode without + networking. If you want networking, use "init 3." + + +Kernel Panic +============ + +After successfully loading the dump-capture kernel as previously +described, the system will reboot into the dump-capture kernel if a +system crash is triggered. Trigger points are located in panic(), +die(), die_nmi() and in the sysrq handler (ALT-SysRq-c). + +The following conditions will execute a crash trigger point: + +If a hard lockup is detected and "NMI watchdog" is configured, the system +will boot into the dump-capture kernel ( die_nmi() ). + +If die() is called, and it happens to be a thread with pid 0 or 1, or die() +is called inside interrupt context or die() is called and panic_on_oops is set, +the system will boot into the dump-capture kernel. + +On powererpc systems when a soft-reset is generated, die() is called by all cpus and the system system will boot into the dump-capture kernel. + +For testing purposes, you can trigger a crash by using "ALT-SysRq-c", +"echo c > /proc/sysrq-trigger or write a module to force the panic. + +Write Out the Dump File +======================= + +After the dump-capture kernel is booted, write out the dump file with +the following command: cp /proc/vmcore <dump-file> - Dump memory can also be accessed as a /dev/oldmem device for a linear/raw - view. To create the device, type: +You can also access dumped memory as a /dev/oldmem device for a linear +and raw view. To create the device, use the following command: - mknod /dev/oldmem c 1 12 + mknod /dev/oldmem c 1 12 - Use "dd" with suitable options for count, bs and skip to access specific - portions of the dump. +Use the dd command with suitable options for count, bs, and skip to +access specific portions of the dump. - Entire memory: dd if=/dev/oldmem of=oldmem.001 +To see the entire memory, use the following command: + dd if=/dev/oldmem of=oldmem.001 -ANALYSIS + +Analysis ======== -Limited analysis can be done using gdb on the dump file copied out of -/proc/vmcore. Use vmlinux built with -g and run - gdb vmlinux <dump-file> +Before analyzing the dump image, you should reboot into a stable kernel. + +You can do limited analysis using GDB on the dump file copied out of +/proc/vmcore. Use the debug vmlinux built with -g and run the following +command: + + gdb vmlinux <dump-file> -Stack trace for the task on processor 0, register display, memory display -work fine. +Stack trace for the task on processor 0, register display, and memory +display work fine. -Note: gdb cannot analyse core files generated in ELF64 format for i386. +Note: GDB cannot analyze core files generated in ELF64 format for x86. +On systems with a maximum of 4GB of memory, you can generate +ELF32-format headers using the --elf32-core-headers kernel option on the +dump kernel. -Latest "crash" (crash-4.0-2.18) as available on Dave Anderson's site -http://people.redhat.com/~anderson/ works well with kdump format. +You can also use the Crash utility to analyze dump files in Kdump +format. Crash is available on Dave Anderson's site at the following URL: + http://people.redhat.com/~anderson/ + + +To Do +===== -TODO -==== -1) Provide a kernel pages filtering mechanism so that core file size is not - insane on systems having huge memory banks. -2) Relocatable kernel can help in maintaining multiple kernels for crashdump - and same kernel as the first kernel can be used to capture the dump. +1) Provide a kernel pages filtering mechanism, so core file size is not + extreme on systems with huge memory banks. +2) Relocatable kernel can help in maintaining multiple kernels for + crash_dump, and the same kernel as the system kernel can be used to + capture the dump. -CONTACT + +Contact ======= + Vivek Goyal (vgoyal@in.ibm.com) Maneesh Soni (maneesh@in.ibm.com) + + +Trademark +========= + +Linux is a trademark of Linus Torvalds in the United States, other +countries, or both. diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index b3a6187e5305..0d189c93eeaf 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -61,6 +61,7 @@ parameter is applicable: MTD MTD support is enabled. NET Appropriate network support is enabled. NUMA NUMA support is enabled. + GENERIC_TIME The generic timeofday code is enabled. NFS Appropriate NFS support is enabled. OSS OSS sound support is enabled. PARIDE The ParIDE subsystem is enabled. @@ -147,6 +148,9 @@ running once the system is up. acpi_irq_isa= [HW,ACPI] If irq_balance, mark listed IRQs used by ISA Format: <irq>,<irq>... + acpi_os_name= [HW,ACPI] Tell ACPI BIOS the name of the OS + Format: To spoof as Windows 98: ="Microsoft Windows" + acpi_osi= [HW,ACPI] empty param disables _OSI acpi_serialize [HW,ACPI] force serialization of AML methods @@ -176,6 +180,11 @@ running once the system is up. override platform specific driver. See also Documentation/acpi-hotkey.txt. + acpi_pm_good [IA-32,X86-64] + Override the pmtimer bug detection: force the kernel + to assume that this machine's pmtimer latches its value + and always returns good values. + enable_timer_pin_1 [i386,x86-64] Enable PIN 1 of APIC timer Can be useful to work around chipset bugs @@ -338,10 +347,11 @@ running once the system is up. Value can be changed at runtime via /selinux/checkreqprot. - clock= [BUGS=IA-32,HW] gettimeofday timesource override. - Forces specified timesource (if avaliable) to be used - when calculating gettimeofday(). If specicified - timesource is not avalible, it defaults to PIT. + clock= [BUGS=IA-32, HW] gettimeofday clocksource override. + [Deprecated] + Forces specified clocksource (if avaliable) to be used + when calculating gettimeofday(). If specified + clocksource is not avalible, it defaults to PIT. Format: { pit | tsc | cyclone | pmtmr } disable_8254_timer @@ -1402,6 +1412,15 @@ running once the system is up. If enabled at boot time, /selinux/disable can be used later to disable prior to initial policy load. + selinux_compat_net = + [SELINUX] Set initial selinux_compat_net flag value. + Format: { "0" | "1" } + 0 -- use new secmark-based packet controls + 1 -- use legacy packet controls + Default value is 0 (preferred). + Value can be changed at runtime via + /selinux/compat_net. + serialnumber [BUGS=IA-32] sg_def_reserved_size= [SCSI] @@ -1605,6 +1624,10 @@ running once the system is up. time Show timing data prefixed to each printk message line + clocksource= [GENERIC_TIME] Override the default clocksource + Override the default clocksource and use the clocksource + with the name specified. + tipar.timeout= [HW,PPT] Set communications timeout in tenths of a second (default 15). @@ -1646,6 +1669,10 @@ running once the system is up. usbhid.mousepoll= [USBHID] The interval which mice are to be polled at. + vdso= [IA-32] + vdso=1: enable VDSO (default) + vdso=0: disable VDSO mapping + video= [FB] Frame buffer configuration See Documentation/fb/modedb.txt. diff --git a/Documentation/keys.txt b/Documentation/keys.txt index aaa01b0e3ee9..61c0fad2fe2f 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt @@ -19,6 +19,7 @@ This document has the following sections: - Key overview - Key service overview - Key access permissions + - SELinux support - New procfs files - Userspace system call interface - Kernel services @@ -232,6 +233,39 @@ For changing the ownership, group ID or permissions mask, being the owner of the key or having the sysadmin capability is sufficient. +=============== +SELINUX SUPPORT +=============== + +The security class "key" has been added to SELinux so that mandatory access +controls can be applied to keys created within various contexts. This support +is preliminary, and is likely to change quite significantly in the near future. +Currently, all of the basic permissions explained above are provided in SELinux +as well; SELinux is simply invoked after all basic permission checks have been +performed. + +The value of the file /proc/self/attr/keycreate influences the labeling of +newly-created keys. If the contents of that file correspond to an SELinux +security context, then the key will be assigned that context. Otherwise, the +key will be assigned the current context of the task that invoked the key +creation request. Tasks must be granted explicit permission to assign a +particular context to newly-created keys, using the "create" permission in the +key security class. + +The default keyrings associated with users will be labeled with the default +context of the user if and only if the login programs have been instrumented to +properly initialize keycreate during the login process. Otherwise, they will +be labeled with the context of the login program itself. + +Note, however, that the default keyrings associated with the root user are +labeled with the default kernel context, since they are created early in the +boot process, before root has a chance to log in. + +The keyrings associated with new threads are each labeled with the context of +their associated thread, and both session and process keyrings are handled +similarly. + + ================ NEW PROCFS FILES ================ @@ -241,9 +275,17 @@ about the status of the key service: (*) /proc/keys - This lists all the keys on the system, giving information about their - type, description and permissions. The payload of the key is not available - this way: + This lists the keys that are currently viewable by the task reading the + file, giving information about their type, description and permissions. + It is not possible to view the payload of the key this way, though some + information about it may be given. + + The only keys included in the list are those that grant View permission to + the reading process whether or not it possesses them. Note that LSM + security checks are still performed, and may further filter out keys that + the current process is not authorised to view. + + The contents of the file look like this: SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY 00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4 @@ -271,7 +313,7 @@ about the status of the key service: (*) /proc/key-users This file lists the tracking data for each user that has at least one key - on the system. Such data includes quota information and statistics: + on the system. Such data includes quota information and statistics: [root@andromeda root]# cat /proc/key-users 0: 46 45/45 1/100 13/10000 @@ -935,6 +977,16 @@ The structure has a number of fields, some of which are mandatory: It is not safe to sleep in this method; the caller may hold spinlocks. + (*) void (*revoke)(struct key *key); + + This method is optional. It is called to discard part of the payload + data upon a key being revoked. The caller will have the key semaphore + write-locked. + + It is safe to sleep in this method, though care should be taken to avoid + a deadlock against the key semaphore. + + (*) void (*destroy)(struct key *key); This method is optional. It is called to discard the payload data on a key diff --git a/Documentation/md.txt b/Documentation/md.txt index 03a13c462cf2..0668f9dc9d29 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -200,6 +200,17 @@ All md devices contain: This can be written only while the array is being assembled, not after it is started. + layout + The "layout" for the array for the particular level. This is + simply a number that is interpretted differently by different + levels. It can be written while assembling an array. + + resync_start + The point at which resync should start. If no resync is needed, + this will be a very large number. At array creation it will + default to 0, though starting the array as 'clean' will + set it much larger. + new_dev This file can be written but not read. The value written should be a block device number as major:minor. e.g. 8:0 @@ -207,6 +218,54 @@ All md devices contain: available. It will then appear at md/dev-XXX (depending on the name of the device) and further configuration is then possible. + safe_mode_delay + When an md array has seen no write requests for a certain period + of time, it will be marked as 'clean'. When another write + request arrive, the array is marked as 'dirty' before the write + commenses. This is known as 'safe_mode'. + The 'certain period' is controlled by this file which stores the + period as a number of seconds. The default is 200msec (0.200). + Writing a value of 0 disables safemode. + + array_state + This file contains a single word which describes the current + state of the array. In many cases, the state can be set by + writing the word for the desired state, however some states + cannot be explicitly set, and some transitions are not allowed. + + clear + No devices, no size, no level + Writing is equivalent to STOP_ARRAY ioctl + inactive + May have some settings, but array is not active + all IO results in error + When written, doesn't tear down array, but just stops it + suspended (not supported yet) + All IO requests will block. The array can be reconfigured. + Writing this, if accepted, will block until array is quiessent + readonly + no resync can happen. no superblocks get written. + write requests fail + read-auto + like readonly, but behaves like 'clean' on a write request. + + clean - no pending writes, but otherwise active. + When written to inactive array, starts without resync + If a write request arrives then + if metadata is known, mark 'dirty' and switch to 'active'. + if not known, block and switch to write-pending + If written to an active array that has pending writes, then fails. + active + fully active: IO and resync can be happening. + When written to inactive array, starts with resync + + write-pending + clean, but writes are blocked waiting for 'active' to be written. + + active-idle + like active, but no writes have been seen for a while (safe_mode_delay). + + sync_speed_min sync_speed_max This are similar to /proc/sys/dev/raid/speed_limit_{min,max} @@ -250,10 +309,18 @@ Each directory contains: faulty - device has been kicked from active use due to a detected fault in_sync - device is a fully in-sync member of the array + writemostly - device will only be subject to read + requests if there are no other options. + This applies only to raid1 arrays. spare - device is working, but not a full member. This includes spares that are in the process of being recoverred to This list make grow in future. + This can be written to. + Writing "faulty" simulates a failure on the device. + Writing "remove" removes the device from the array. + Writing "writemostly" sets the writemostly flag. + Writing "-writemostly" clears the writemostly flag. errors An approximate count of read errors that have been detected on diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 4710845dbac4..cf0d5416a4c3 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -262,9 +262,14 @@ What is required is some way of intervening to instruct the compiler and the CPU to restrict the order. Memory barriers are such interventions. They impose a perceived partial -ordering between the memory operations specified on either side of the barrier. -They request that the sequence of memory events generated appears to other -parts of the system as if the barrier is effective on that CPU. +ordering over the memory operations on either side of the barrier. + +Such enforcement is important because the CPUs and other devices in a system +can use a variety of tricks to improve performance - including reordering, +deferral and combination of memory operations; speculative loads; speculative +branch prediction and various types of caching. Memory barriers are used to +override or suppress these tricks, allowing the code to sanely control the +interaction of multiple CPUs and/or devices. VARIETIES OF MEMORY BARRIER @@ -282,7 +287,7 @@ Memory barriers come in four basic varieties: A write barrier is a partial ordering on stores only; it is not required to have any effect on loads. - A CPU can be viewed as as commiting a sequence of store operations to the + A CPU can be viewed as committing a sequence of store operations to the memory system as time progresses. All stores before a write barrier will occur in the sequence _before_ all the stores after the write barrier. @@ -413,7 +418,7 @@ There are certain things that the Linux kernel memory barriers do not guarantee: indirect effect will be the order in which the second CPU sees the effects of the first CPU's accesses occur, but see the next point: - (*) There is no guarantee that the a CPU will see the correct order of effects + (*) There is no guarantee that a CPU will see the correct order of effects from a second CPU's accesses, even _if_ the second CPU uses a memory barrier, unless the first CPU _also_ uses a matching memory barrier (see the subsection on "SMP Barrier Pairing"). @@ -461,8 +466,8 @@ Whilst this may seem like a failure of coherency or causality maintenance, it isn't, and this behaviour can be observed on certain real CPUs (such as the DEC Alpha). -To deal with this, a data dependency barrier must be inserted between the -address load and the data load: +To deal with this, a data dependency barrier or better must be inserted +between the address load and the data load: CPU 1 CPU 2 =============== =============== @@ -484,7 +489,7 @@ lines. The pointer P might be stored in an odd-numbered cache line, and the variable B might be stored in an even-numbered cache line. Then, if the even-numbered bank of the reading CPU's cache is extremely busy while the odd-numbered bank is idle, one can see the new value of the pointer P (&B), -but the old value of the variable B (1). +but the old value of the variable B (2). Another example of where data dependency barriers might by required is where a @@ -744,7 +749,7 @@ some effectively random order, despite the write barrier issued by CPU 1: : : -If, however, a read barrier were to be placed between the load of E and the +If, however, a read barrier were to be placed between the load of B and the load of A on CPU 2: CPU 1 CPU 2 @@ -1461,9 +1466,8 @@ instruction itself is complete. On a UP system - where this wouldn't be a problem - the smp_mb() is just a compiler barrier, thus making sure the compiler emits the instructions in the -right order without actually intervening in the CPU. Since there there's only -one CPU, that CPU's dependency ordering logic will take care of everything -else. +right order without actually intervening in the CPU. Since there's only one +CPU, that CPU's dependency ordering logic will take care of everything else. ATOMIC OPERATIONS @@ -1640,9 +1644,9 @@ functions: The PCI bus, amongst others, defines an I/O space concept - which on such CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O - space. However, it may also mapped as a virtual I/O space in the CPU's - memory map, particularly on those CPUs that don't support alternate - I/O spaces. + space. However, it may also be mapped as a virtual I/O space in the CPU's + memory map, particularly on those CPUs that don't support alternate I/O + spaces. Accesses to this space may be fully synchronous (as on i386), but intermediary bridges (such as the PCI host bridge) may not fully honour diff --git a/Documentation/networking/README.ipw2200 b/Documentation/networking/README.ipw2200 index acb30c5dcff3..4f2a40f1dbc6 100644 --- a/Documentation/networking/README.ipw2200 +++ b/Documentation/networking/README.ipw2200 @@ -14,8 +14,8 @@ Copyright (C) 2004-2006, Intel Corporation README.ipw2200 -Version: 1.0.8 -Date : October 20, 2005 +Version: 1.1.2 +Date : March 30, 2006 Index @@ -103,7 +103,7 @@ file. 1.1. Overview of Features ----------------------------------------------- -The current release (1.0.8) supports the following features: +The current release (1.1.2) supports the following features: + BSS mode (Infrastructure, Managed) + IBSS mode (Ad-Hoc) @@ -247,8 +247,8 @@ and can set the contents via echo. For example: % cat /sys/bus/pci/drivers/ipw2200/debug_level Will report the current debug level of the driver's logging subsystem -(only available if CONFIG_IPW_DEBUG was configured when the driver was -built). +(only available if CONFIG_IPW2200_DEBUG was configured when the driver +was built). You can set the debug level via: diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 8d8b4e5ea184..afac780445cd 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -1,7 +1,7 @@ Linux Ethernet Bonding Driver HOWTO - Latest update: 21 June 2005 + Latest update: 24 April 2006 Initial release : Thomas Davis <tadavis at lbl.gov> Corrections, HA extensions : 2000/10/03-15 : @@ -12,6 +12,8 @@ Corrections, HA extensions : 2000/10/03-15 : - Jay Vosburgh <fubar at us dot ibm dot com> Reorganized and updated Feb 2005 by Jay Vosburgh +Added Sysfs information: 2006/04/24 + - Mitch Williams <mitch.a.williams at intel.com> Introduction ============ @@ -38,61 +40,62 @@ Table of Contents 2. Bonding Driver Options 3. Configuring Bonding Devices -3.1 Configuration with sysconfig support -3.1.1 Using DHCP with sysconfig -3.1.2 Configuring Multiple Bonds with sysconfig -3.2 Configuration with initscripts support -3.2.1 Using DHCP with initscripts -3.2.2 Configuring Multiple Bonds with initscripts -3.3 Configuring Bonding Manually +3.1 Configuration with Sysconfig Support +3.1.1 Using DHCP with Sysconfig +3.1.2 Configuring Multiple Bonds with Sysconfig +3.2 Configuration with Initscripts Support +3.2.1 Using DHCP with Initscripts +3.2.2 Configuring Multiple Bonds with Initscripts +3.3 Configuring Bonding Manually with Ifenslave 3.3.1 Configuring Multiple Bonds Manually +3.4 Configuring Bonding Manually via Sysfs -5. Querying Bonding Configuration -5.1 Bonding Configuration -5.2 Network Configuration +4. Querying Bonding Configuration +4.1 Bonding Configuration +4.2 Network Configuration -6. Switch Configuration +5. Switch Configuration -7. 802.1q VLAN Support +6. 802.1q VLAN Support -8. Link Monitoring -8.1 ARP Monitor Operation -8.2 Configuring Multiple ARP Targets -8.3 MII Monitor Operation +7. Link Monitoring +7.1 ARP Monitor Operation +7.2 Configuring Multiple ARP Targets +7.3 MII Monitor Operation -9. Potential Trouble Sources -9.1 Adventures in Routing -9.2 Ethernet Device Renaming -9.3 Painfully Slow Or No Failed Link Detection By Miimon +8. Potential Trouble Sources +8.1 Adventures in Routing +8.2 Ethernet Device Renaming +8.3 Painfully Slow Or No Failed Link Detection By Miimon -10. SNMP agents +9. SNMP agents -11. Promiscuous mode +10. Promiscuous mode -12. Configuring Bonding for High Availability -12.1 High Availability in a Single Switch Topology -12.2 High Availability in a Multiple Switch Topology -12.2.1 HA Bonding Mode Selection for Multiple Switch Topology -12.2.2 HA Link Monitoring for Multiple Switch Topology +11. Configuring Bonding for High Availability +11.1 High Availability in a Single Switch Topology +11.2 High Availability in a Multiple Switch Topology +11.2.1 HA Bonding Mode Selection for Multiple Switch Topology +11.2.2 HA Link Monitoring for Multiple Switch Topology -13. Configuring Bonding for Maximum Throughput -13.1 Maximum Throughput in a Single Switch Topology -13.1.1 MT Bonding Mode Selection for Single Switch Topology -13.1.2 MT Link Monitoring for Single Switch Topology -13.2 Maximum Throughput in a Multiple Switch Topology -13.2.1 MT Bonding Mode Selection for Multiple Switch Topology -13.2.2 MT Link Monitoring for Multiple Switch Topology +12. Configuring Bonding for Maximum Throughput +12.1 Maximum Throughput in a Single Switch Topology +12.1.1 MT Bonding Mode Selection for Single Switch Topology +12.1.2 MT Link Monitoring for Single Switch Topology +12.2 Maximum Throughput in a Multiple Switch Topology +12.2.1 MT Bonding Mode Selection for Multiple Switch Topology +12.2.2 MT Link Monitoring for Multiple Switch Topology -14. Switch Behavior Issues -14.1 Link Establishment and Failover Delays -14.2 Duplicated Incoming Packets +13. Switch Behavior Issues +13.1 Link Establishment and Failover Delays +13.2 Duplicated Incoming Packets -15. Hardware Specific Considerations -15.1 IBM BladeCenter +14. Hardware Specific Considerations +14.1 IBM BladeCenter -16. Frequently Asked Questions +15. Frequently Asked Questions -17. Resources and Links +16. Resources and Links 1. Bonding Driver Installation @@ -156,6 +159,9 @@ you're trying to build it for. Some distros (e.g., Red Hat from 7.1 onwards) do not have /usr/include/linux symbolically linked to the default kernel source include directory. +SECOND IMPORTANT NOTE: + If you plan to configure bonding using sysfs, you do not need +to use ifenslave. 2. Bonding Driver Options ========================= @@ -270,7 +276,7 @@ mode In bonding version 2.6.2 or later, when a failover occurs in active-backup mode, bonding will issue one or more gratuitous ARPs on the newly active slave. - One gratutious ARP is issued for the bonding master + One gratuitous ARP is issued for the bonding master interface and each VLAN interfaces configured above it, provided that the interface has at least one IP address configured. Gratuitous ARPs issued for VLAN @@ -377,7 +383,7 @@ mode When a link is reconnected or a new slave joins the bond the receive traffic is redistributed among all active slaves in the bond by initiating ARP Replies - with the selected mac address to each of the + with the selected MAC address to each of the clients. The updelay parameter (detailed below) must be set to a value equal or greater than the switch's forwarding delay so that the ARP Replies sent to the @@ -498,11 +504,12 @@ not exist, and the layer2 policy is the only policy. 3. Configuring Bonding Devices ============================== - There are, essentially, two methods for configuring bonding: -with support from the distro's network initialization scripts, and -without. Distros generally use one of two packages for the network -initialization scripts: initscripts or sysconfig. Recent versions of -these packages have support for bonding, while older versions do not. + You can configure bonding using either your distro's network +initialization scripts, or manually using either ifenslave or the +sysfs interface. Distros generally use one of two packages for the +network initialization scripts: initscripts or sysconfig. Recent +versions of these packages have support for bonding, while older +versions do not. We will first describe the options for configuring bonding for distros using versions of initscripts and sysconfig with full or @@ -530,7 +537,7 @@ $ grep ifenslave /sbin/ifup If this returns any matches, then your initscripts or sysconfig has support for bonding. -3.1 Configuration with sysconfig support +3.1 Configuration with Sysconfig Support ---------------------------------------- This section applies to distros using a version of sysconfig @@ -538,7 +545,7 @@ with bonding support, for example, SuSE Linux Enterprise Server 9. SuSE SLES 9's networking configuration system does support bonding, however, at this writing, the YaST system configuration -frontend does not provide any means to work with bonding devices. +front end does not provide any means to work with bonding devices. Bonding devices can be managed by hand, however, as follows. First, if they have not already been configured, configure the @@ -660,7 +667,7 @@ format can be found in an example ifcfg template file: Note that the template does not document the various BONDING_ settings described above, but does describe many of the other options. -3.1.1 Using DHCP with sysconfig +3.1.1 Using DHCP with Sysconfig ------------------------------- Under sysconfig, configuring a device with BOOTPROTO='dhcp' @@ -670,7 +677,7 @@ attempt to obtain the device address from DHCP prior to adding any of the slave devices. Without active slaves, the DHCP requests are not sent to the network. -3.1.2 Configuring Multiple Bonds with sysconfig +3.1.2 Configuring Multiple Bonds with Sysconfig ----------------------------------------------- The sysconfig network initialization system is capable of @@ -685,7 +692,7 @@ ifcfg-bondX files. options in the ifcfg-bondX file, it is not necessary to add them to the system /etc/modules.conf or /etc/modprobe.conf configuration file. -3.2 Configuration with initscripts support +3.2 Configuration with Initscripts Support ------------------------------------------ This section applies to distros using a version of initscripts @@ -756,7 +763,7 @@ options for your configuration. will restart the networking subsystem and your bond link should be now up and running. -3.2.1 Using DHCP with initscripts +3.2.1 Using DHCP with Initscripts --------------------------------- Recent versions of initscripts (the version supplied with @@ -768,7 +775,7 @@ above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" and add a line consisting of "TYPE=Bonding". Note that the TYPE value is case sensitive. -3.2.2 Configuring Multiple Bonds with initscripts +3.2.2 Configuring Multiple Bonds with Initscripts ------------------------------------------------- At this writing, the initscripts package does not directly @@ -784,8 +791,8 @@ Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels exhibiting this problem, it will be impossible to configure multiple bonds with differing parameters. -3.3 Configuring Bonding Manually --------------------------------- +3.3 Configuring Bonding Manually with Ifenslave +----------------------------------------------- This section applies to distros whose network initialization scripts (the sysconfig or initscripts package) do not have specific @@ -889,11 +896,139 @@ install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \ This may be repeated any number of times, specifying a new and unique name in place of bond1 for each subsequent instance. +3.4 Configuring Bonding Manually via Sysfs +------------------------------------------ + + Starting with version 3.0, Channel Bonding may be configured +via the sysfs interface. This interface allows dynamic configuration +of all bonds in the system without unloading the module. It also +allows for adding and removing bonds at runtime. Ifenslave is no +longer required, though it is still supported. + + Use of the sysfs interface allows you to use multiple bonds +with different configurations without having to reload the module. +It also allows you to use multiple, differently configured bonds when +bonding is compiled into the kernel. + + You must have the sysfs filesystem mounted to configure +bonding this way. The examples in this document assume that you +are using the standard mount point for sysfs, e.g. /sys. If your +sysfs filesystem is mounted elsewhere, you will need to adjust the +example paths accordingly. + +Creating and Destroying Bonds +----------------------------- +To add a new bond foo: +# echo +foo > /sys/class/net/bonding_masters + +To remove an existing bond bar: +# echo -bar > /sys/class/net/bonding_masters + +To show all existing bonds: +# cat /sys/class/net/bonding_masters + +NOTE: due to 4K size limitation of sysfs files, this list may be +truncated if you have more than a few hundred bonds. This is unlikely +to occur under normal operating conditions. + +Adding and Removing Slaves +-------------------------- + Interfaces may be enslaved to a bond using the file +/sys/class/net/<bond>/bonding/slaves. The semantics for this file +are the same as for the bonding_masters file. + +To enslave interface eth0 to bond bond0: +# ifconfig bond0 up +# echo +eth0 > /sys/class/net/bond0/bonding/slaves + +To free slave eth0 from bond bond0: +# echo -eth0 > /sys/class/net/bond0/bonding/slaves + + NOTE: The bond must be up before slaves can be added. All +slaves are freed when the interface is brought down. + + When an interface is enslaved to a bond, symlinks between the +two are created in the sysfs filesystem. In this case, you would get +/sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and +/sys/class/net/eth0/master pointing to /sys/class/net/bond0. + + This means that you can tell quickly whether or not an +interface is enslaved by looking for the master symlink. Thus: +# echo -eth0 > /sys/class/net/eth0/master/bonding/slaves +will free eth0 from whatever bond it is enslaved to, regardless of +the name of the bond interface. + +Changing a Bond's Configuration +------------------------------- + Each bond may be configured individually by manipulating the +files located in /sys/class/net/<bond name>/bonding + + The names of these files correspond directly with the command- +line parameters described elsewhere in in this file, and, with the +exception of arp_ip_target, they accept the same values. To see the +current setting, simply cat the appropriate file. + + A few examples will be given here; for specific usage +guidelines for each parameter, see the appropriate section in this +document. + +To configure bond0 for balance-alb mode: +# ifconfig bond0 down +# echo 6 > /sys/class/net/bond0/bonding/mode + - or - +# echo balance-alb > /sys/class/net/bond0/bonding/mode + NOTE: The bond interface must be down before the mode can be +changed. + +To enable MII monitoring on bond0 with a 1 second interval: +# echo 1000 > /sys/class/net/bond0/bonding/miimon + NOTE: If ARP monitoring is enabled, it will disabled when MII +monitoring is enabled, and vice-versa. + +To add ARP targets: +# echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target +# echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target + NOTE: up to 10 target addresses may be specified. + +To remove an ARP target: +# echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target + +Example Configuration +--------------------- + We begin with the same example that is shown in section 3.3, +executed with sysfs, and without using ifenslave. + + To make a simple bond of two e100 devices (presumed to be eth0 +and eth1), and have it persist across reboots, edit the appropriate +file (/etc/init.d/boot.local or /etc/rc.d/rc.local), and add the +following: + +modprobe bonding +modprobe e100 +echo balance-alb > /sys/class/net/bond0/bonding/mode +ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up +echo 100 > /sys/class/net/bond0/bonding/miimon +echo +eth0 > /sys/class/net/bond0/bonding/slaves +echo +eth1 > /sys/class/net/bond0/bonding/slaves + + To add a second bond, with two e1000 interfaces in +active-backup mode, using ARP monitoring, add the following lines to +your init script: + +modprobe e1000 +echo +bond1 > /sys/class/net/bonding_masters +echo active-backup > /sys/class/net/bond1/bonding/mode +ifconfig bond1 192.168.2.1 netmask 255.255.255.0 up +echo +192.168.2.100 /sys/class/net/bond1/bonding/arp_ip_target +echo 2000 > /sys/class/net/bond1/bonding/arp_interval +echo +eth2 > /sys/class/net/bond1/bonding/slaves +echo +eth3 > /sys/class/net/bond1/bonding/slaves + -5. Querying Bonding Configuration +4. Querying Bonding Configuration ================================= -5.1 Bonding Configuration +4.1 Bonding Configuration ------------------------- Each bonding device has a read-only file residing in the @@ -923,7 +1058,7 @@ generally as follows: The precise format and contents will change depending upon the bonding configuration, state, and version of the bonding driver. -5.2 Network configuration +4.2 Network configuration ------------------------- The network configuration can be inspected using the ifconfig @@ -958,7 +1093,7 @@ eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 collisions:0 txqueuelen:100 Interrupt:9 Base address:0x1400 -6. Switch Configuration +5. Switch Configuration ======================= For this section, "switch" refers to whatever system the @@ -991,7 +1126,7 @@ transmit policy for an EtherChannel group; all three will interoperate with another EtherChannel group. -7. 802.1q VLAN Support +6. 802.1q VLAN Support ====================== It is possible to configure VLAN devices over a bond interface @@ -1042,7 +1177,7 @@ underlying device -- i.e. the bonding interface -- to promiscuous mode, which might not be what you want. -8. Link Monitoring +7. Link Monitoring ================== The bonding driver at present supports two schemes for @@ -1053,7 +1188,7 @@ monitor. bonding driver itself, it is not possible to enable both ARP and MII monitoring simultaneously. -8.1 ARP Monitor Operation +7.1 ARP Monitor Operation ------------------------- The ARP monitor operates as its name suggests: it sends ARP @@ -1071,7 +1206,7 @@ those slaves will stay down. If networking monitoring (tcpdump, etc) shows the ARP requests and replies on the network, then it may be that your device driver is not updating last_rx and trans_start. -8.2 Configuring Multiple ARP Targets +7.2 Configuring Multiple ARP Targets ------------------------------------ While ARP monitoring can be done with just one target, it can @@ -1094,7 +1229,7 @@ alias bond0 bonding options bond0 arp_interval=60 arp_ip_target=192.168.0.100 -8.3 MII Monitor Operation +7.3 MII Monitor Operation ------------------------- The MII monitor monitors only the carrier state of the local @@ -1120,14 +1255,14 @@ does not support or had some error in processing both the MII register and ethtool requests), then the MII monitor will assume the link is up. -9. Potential Sources of Trouble +8. Potential Sources of Trouble =============================== -9.1 Adventures in Routing +8.1 Adventures in Routing ------------------------- When bonding is configured, it is important that the slave -devices not have routes that supercede routes of the master (or, +devices not have routes that supersede routes of the master (or, generally, not have routes at all). For example, suppose the bonding device bond0 has two slaves, eth0 and eth1, and the routing table is as follows: @@ -1154,11 +1289,11 @@ by the state of the routing table. The solution here is simply to insure that slaves do not have routes of their own, and if for some reason they must, those routes do -not supercede routes of their master. This should generally be the +not supersede routes of their master. This should generally be the case, but unusual configurations or errant manual or automatic static route additions may cause trouble. -9.2 Ethernet Device Renaming +8.2 Ethernet Device Renaming ---------------------------- On systems with network configuration scripts that do not @@ -1207,7 +1342,7 @@ modprobe with --ignore-install to cause the normal action to then take place. Full documentation on this can be found in the modprobe.conf and modprobe manual pages. -9.3. Painfully Slow Or No Failed Link Detection By Miimon +8.3. Painfully Slow Or No Failed Link Detection By Miimon --------------------------------------------------------- By default, bonding enables the use_carrier option, which @@ -1235,7 +1370,7 @@ carrier state. It has no way to determine the state of devices on or beyond other ports of a switch, or if a switch is refusing to pass traffic while still maintaining carrier on. -10. SNMP agents +9. SNMP agents =============== If running SNMP agents, the bonding driver should be loaded @@ -1281,7 +1416,7 @@ ifDescr, the association between the IP address and IfIndex remains and SNMP functions such as Interface_Scan_Next will report that association. -11. Promiscuous mode +10. Promiscuous mode ==================== When running network monitoring tools, e.g., tcpdump, it is @@ -1308,7 +1443,7 @@ sending to peers that are unassigned or if the load is unbalanced. the active slave changes (e.g., due to a link failure), the promiscuous setting will be propagated to the new active slave. -12. Configuring Bonding for High Availability +11. Configuring Bonding for High Availability ============================================= High Availability refers to configurations that provide @@ -1318,7 +1453,7 @@ goal is to provide the maximum availability of network connectivity (i.e., the network always works), even though other configurations could provide higher throughput. -12.1 High Availability in a Single Switch Topology +11.1 High Availability in a Single Switch Topology -------------------------------------------------- If two hosts (or a host and a single switch) are directly @@ -1332,7 +1467,7 @@ the load will be rebalanced across the remaining devices. See Section 13, "Configuring Bonding for Maximum Throughput" for information on configuring bonding with one peer device. -12.2 High Availability in a Multiple Switch Topology +11.2 High Availability in a Multiple Switch Topology ---------------------------------------------------- With multiple switches, the configuration of bonding and the @@ -1359,7 +1494,7 @@ switches (ISL, or inter switch link), and multiple ports connecting to the outside world ("port3" on each switch). There is no technical reason that this could not be extended to a third switch. -12.2.1 HA Bonding Mode Selection for Multiple Switch Topology +11.2.1 HA Bonding Mode Selection for Multiple Switch Topology ------------------------------------------------------------- In a topology such as the example above, the active-backup and @@ -1381,7 +1516,7 @@ broadcast: This mode is really a special purpose mode, and is suitable necessary for some specific one-way traffic to reach both independent networks, then the broadcast mode may be suitable. -12.2.2 HA Link Monitoring Selection for Multiple Switch Topology +11.2.2 HA Link Monitoring Selection for Multiple Switch Topology ---------------------------------------------------------------- The choice of link monitoring ultimately depends upon your @@ -1402,10 +1537,10 @@ regardless of which switch is active, the ARP monitor has a suitable target to query. -13. Configuring Bonding for Maximum Throughput +12. Configuring Bonding for Maximum Throughput ============================================== -13.1 Maximizing Throughput in a Single Switch Topology +12.1 Maximizing Throughput in a Single Switch Topology ------------------------------------------------------ In a single switch configuration, the best method to maximize @@ -1476,7 +1611,7 @@ destination to make load balancing decisions. The behavior of each mode is described below. -13.1.1 MT Bonding Mode Selection for Single Switch Topology +12.1.1 MT Bonding Mode Selection for Single Switch Topology ----------------------------------------------------------- This configuration is the easiest to set up and to understand, @@ -1607,7 +1742,7 @@ balance-alb: This mode is everything that balance-tlb is, and more. device driver must support changing the hardware address while the device is open. -13.1.2 MT Link Monitoring for Single Switch Topology +12.1.2 MT Link Monitoring for Single Switch Topology ---------------------------------------------------- The choice of link monitoring may largely depend upon which @@ -1616,7 +1751,7 @@ support the use of the ARP monitor, and are thus restricted to using the MII monitor (which does not provide as high a level of end to end assurance as the ARP monitor). -13.2 Maximum Throughput in a Multiple Switch Topology +12.2 Maximum Throughput in a Multiple Switch Topology ----------------------------------------------------- Multiple switches may be utilized to optimize for throughput @@ -1651,7 +1786,7 @@ a single 72 port switch. can be equipped with an additional network device connected to an external network; this host then additionally acts as a gateway. -13.2.1 MT Bonding Mode Selection for Multiple Switch Topology +12.2.1 MT Bonding Mode Selection for Multiple Switch Topology ------------------------------------------------------------- In actual practice, the bonding mode typically employed in @@ -1664,7 +1799,7 @@ packets has arrived). When employed in this fashion, the balance-rr mode allows individual connections between two hosts to effectively utilize greater than one interface's bandwidth. -13.2.2 MT Link Monitoring for Multiple Switch Topology +12.2.2 MT Link Monitoring for Multiple Switch Topology ------------------------------------------------------ Again, in actual practice, the MII monitor is most often used @@ -1674,10 +1809,10 @@ advantages over the MII monitor are mitigated by the volume of probes needed as the number of systems involved grows (remember that each host in the network is configured with bonding). -14. Switch Behavior Issues +13. Switch Behavior Issues ========================== -14.1 Link Establishment and Failover Delays +13.1 Link Establishment and Failover Delays ------------------------------------------- Some switches exhibit undesirable behavior with regard to the @@ -1712,7 +1847,7 @@ switches take a long time to go into backup mode, it may be desirable to not activate a backup interface immediately after a link goes down. Failover may be delayed via the downdelay bonding module option. -14.2 Duplicated Incoming Packets +13.2 Duplicated Incoming Packets -------------------------------- It is not uncommon to observe a short burst of duplicated @@ -1751,14 +1886,14 @@ behavior, it can be induced by clearing the MAC forwarding table (on most Cisco switches, the privileged command "clear mac address-table dynamic" will accomplish this). -15. Hardware Specific Considerations +14. Hardware Specific Considerations ==================================== This section contains additional information for configuring bonding on specific hardware platforms, or for interfacing bonding with particular switches or other devices. -15.1 IBM BladeCenter +14.1 IBM BladeCenter -------------------- This applies to the JS20 and similar systems. @@ -1861,7 +1996,7 @@ bonding driver. avoid fail-over delay issues when using bonding. -16. Frequently Asked Questions +15. Frequently Asked Questions ============================== 1. Is it SMP safe? @@ -1925,7 +2060,7 @@ not have special switch requirements, but do need device drivers that support specific features (described in the appropriate section under module parameters, above). - In 802.3ad mode, it works with with systems that support IEEE + In 802.3ad mode, it works with systems that support IEEE 802.3ad Dynamic Link Aggregation. Most managed and many unmanaged switches currently available support 802.3ad. diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index f12007b80a46..d46338af6002 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -362,6 +362,13 @@ tcp_workaround_signed_windows - BOOLEAN not receive a window scaling option from them. Default: 0 +tcp_slow_start_after_idle - BOOLEAN + If set, provide RFC2861 behavior and time out the congestion + window after an idle period. An idle period is defined at + the current RTO. If unset, the congestion window will not + be timed out after an idle period. + Default: 1 + IP Variables: ip_local_port_range - 2 INTEGERS diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt index 3c0a5ba614d7..847cedb238f6 100644 --- a/Documentation/networking/netdevices.txt +++ b/Documentation/networking/netdevices.txt @@ -42,9 +42,9 @@ dev->get_stats: Context: nominally process, but don't sleep inside an rwlock dev->hard_start_xmit: - Synchronization: dev->xmit_lock spinlock. + Synchronization: netif_tx_lock spinlock. When the driver sets NETIF_F_LLTX in dev->features this will be - called without holding xmit_lock. In this case the driver + called without holding netif_tx_lock. In this case the driver has to lock by itself when needed. It is recommended to use a try lock for this and return -1 when the spin lock fails. The locking there should also properly protect against @@ -62,12 +62,12 @@ dev->hard_start_xmit: Only valid when NETIF_F_LLTX is set. dev->tx_timeout: - Synchronization: dev->xmit_lock spinlock. + Synchronization: netif_tx_lock spinlock. Context: BHs disabled Notes: netif_queue_stopped() is guaranteed true dev->set_multicast_list: - Synchronization: dev->xmit_lock spinlock. + Synchronization: netif_tx_lock spinlock. Context: BHs disabled dev->poll: diff --git a/Documentation/networking/tuntap.txt b/Documentation/networking/tuntap.txt index 76750fb9151a..839cbb71388b 100644 --- a/Documentation/networking/tuntap.txt +++ b/Documentation/networking/tuntap.txt @@ -39,10 +39,13 @@ Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com> mknod /dev/net/tun c 10 200 Set permissions: - e.g. chmod 0700 /dev/net/tun - if you want the device only accessible by root. Giving regular users the - right to assign network devices is NOT a good idea. Users could assign - bogus network interfaces to trick firewalls or administrators. + e.g. chmod 0666 /dev/net/tun + There's no harm in allowing the device to be accessible by non-root users, + since CAP_NET_ADMIN is required for creating network devices or for + connecting to network devices which aren't owned by the user in question. + If you want to create persistent devices and give ownership of them to + unprivileged users, then you need the /dev/net/tun device to be usable by + those users. Driver module autoloading diff --git a/Documentation/pci.txt b/Documentation/pci.txt index 66bbbf1d1ef6..3242e5c1ee9c 100644 --- a/Documentation/pci.txt +++ b/Documentation/pci.txt @@ -213,9 +213,17 @@ have been remapped by the kernel. See Documentation/IO-mapping.txt for how to access device memory. - You still need to call request_region() for I/O regions and -request_mem_region() for memory regions to make sure nobody else is using the -same device. + The device driver needs to call pci_request_region() to make sure +no other device is already using the same resource. The driver is expected +to determine MMIO and IO Port resource availability _before_ calling +pci_enable_device(). Conversely, drivers should call pci_release_region() +_after_ calling pci_disable_device(). The idea is to prevent two devices +colliding on the same address range. + +Generic flavors of pci_request_region() are request_mem_region() +(for MMIO ranges) and request_region() (for IO Port ranges). +Use these for address resources that are not described by "normal" PCI +interfaces (e.g. BAR). All interrupt handlers should be registered with SA_SHIRQ and use the devid to map IRQs to devices (remember that all PCI interrupts are shared). diff --git a/Documentation/pi-futex.txt b/Documentation/pi-futex.txt new file mode 100644 index 000000000000..5d61dacd21f6 --- /dev/null +++ b/Documentation/pi-futex.txt @@ -0,0 +1,121 @@ +Lightweight PI-futexes +---------------------- + +We are calling them lightweight for 3 reasons: + + - in the user-space fastpath a PI-enabled futex involves no kernel work + (or any other PI complexity) at all. No registration, no extra kernel + calls - just pure fast atomic ops in userspace. + + - even in the slowpath, the system call and scheduling pattern is very + similar to normal futexes. + + - the in-kernel PI implementation is streamlined around the mutex + abstraction, with strict rules that keep the implementation + relatively simple: only a single owner may own a lock (i.e. no + read-write lock support), only the owner may unlock a lock, no + recursive locking, etc. + +Priority Inheritance - why? +--------------------------- + +The short reply: user-space PI helps achieving/improving determinism for +user-space applications. In the best-case, it can help achieve +determinism and well-bound latencies. Even in the worst-case, PI will +improve the statistical distribution of locking related application +delays. + +The longer reply: +----------------- + +Firstly, sharing locks between multiple tasks is a common programming +technique that often cannot be replaced with lockless algorithms. As we +can see it in the kernel [which is a quite complex program in itself], +lockless structures are rather the exception than the norm - the current +ratio of lockless vs. locky code for shared data structures is somewhere +between 1:10 and 1:100. Lockless is hard, and the complexity of lockless +algorithms often endangers to ability to do robust reviews of said code. +I.e. critical RT apps often choose lock structures to protect critical +data structures, instead of lockless algorithms. Furthermore, there are +cases (like shared hardware, or other resource limits) where lockless +access is mathematically impossible. + +Media players (such as Jack) are an example of reasonable application +design with multiple tasks (with multiple priority levels) sharing +short-held locks: for example, a highprio audio playback thread is +combined with medium-prio construct-audio-data threads and low-prio +display-colory-stuff threads. Add video and decoding to the mix and +we've got even more priority levels. + +So once we accept that synchronization objects (locks) are an +unavoidable fact of life, and once we accept that multi-task userspace +apps have a very fair expectation of being able to use locks, we've got +to think about how to offer the option of a deterministic locking +implementation to user-space. + +Most of the technical counter-arguments against doing priority +inheritance only apply to kernel-space locks. But user-space locks are +different, there we cannot disable interrupts or make the task +non-preemptible in a critical section, so the 'use spinlocks' argument +does not apply (user-space spinlocks have the same priority inversion +problems as other user-space locking constructs). Fact is, pretty much +the only technique that currently enables good determinism for userspace +locks (such as futex-based pthread mutexes) is priority inheritance: + +Currently (without PI), if a high-prio and a low-prio task shares a lock +[this is a quite common scenario for most non-trivial RT applications], +even if all critical sections are coded carefully to be deterministic +(i.e. all critical sections are short in duration and only execute a +limited number of instructions), the kernel cannot guarantee any +deterministic execution of the high-prio task: any medium-priority task +could preempt the low-prio task while it holds the shared lock and +executes the critical section, and could delay it indefinitely. + +Implementation: +--------------- + +As mentioned before, the userspace fastpath of PI-enabled pthread +mutexes involves no kernel work at all - they behave quite similarly to +normal futex-based locks: a 0 value means unlocked, and a value==TID +means locked. (This is the same method as used by list-based robust +futexes.) Userspace uses atomic ops to lock/unlock these mutexes without +entering the kernel. + +To handle the slowpath, we have added two new futex ops: + + FUTEX_LOCK_PI + FUTEX_UNLOCK_PI + +If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to +TID fails], then FUTEX_LOCK_PI is called. The kernel does all the +remaining work: if there is no futex-queue attached to the futex address +yet then the code looks up the task that owns the futex [it has put its +own TID into the futex value], and attaches a 'PI state' structure to +the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware, +kernel-based synchronization object. The 'other' task is made the owner +of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the +futex value. Then this task tries to lock the rt-mutex, on which it +blocks. Once it returns, it has the mutex acquired, and it sets the +futex value to its own TID and returns. Userspace has no other work to +perform - it now owns the lock, and futex value contains +FUTEX_WAITERS|TID. + +If the unlock side fastpath succeeds, [i.e. userspace manages to do a +TID -> 0 atomic transition of the futex value], then no kernel work is +triggered. + +If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), +then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the +behalf of userspace - and it also unlocks the attached +pi_state->rt_mutex and thus wakes up any potential waiters. + +Note that under this approach, contrary to previous PI-futex approaches, +there is no prior 'registration' of a PI-futex. [which is not quite +possible anyway, due to existing ABI properties of pthread mutexes.] + +Also, under this scheme, 'robustness' and 'PI' are two orthogonal +properties of futexes, and all four combinations are possible: futex, +robust-futex, PI-futex, robust+PI-futex. + +More details about priority inheritance can be found in +Documentation/rtmutex.txt. diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt index f987afe43e28..fba1e05c47c7 100644 --- a/Documentation/power/devices.txt +++ b/Documentation/power/devices.txt @@ -135,96 +135,6 @@ HW. FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from scratch. That probably means stop accepting upstream requests, the -actual policy of what to do with them beeing specific to a given -driver. It's acceptable for a network driver to just drop packets -while a block driver is expected to block the queue so no request is -lost. (Use IDE as an example on how to do that). FREEZE requires no -power state change, and it's expected for drivers to be able to -quickly transition back to operating state. - -SUSPEND -- like FREEZE, but also put hardware into low-power state. If -there's need to distinguish several levels of sleep, additional flag -is probably best way to do that. - -Transitions are only from a resumed state to a suspended state, never -between 2 suspended states. (ON -> FREEZE or ON -> SUSPEND can happen, -FREEZE -> SUSPEND or SUSPEND -> FREEZE can not). - -All events are: - -[NOTE NOTE NOTE: If you are driver author, you should not care; you -should only look at event, and ignore flags.] - -#Prepare for suspend -- userland is still running but we are going to -#enter suspend state. This gives drivers chance to load firmware from -#disk and store it in memory, or do other activities taht require -#operating userland, ability to kmalloc GFP_KERNEL, etc... All of these -#are forbiden once the suspend dance is started.. event = ON, flags = -#PREPARE_TO_SUSPEND - -Apm standby -- prepare for APM event. Quiesce devices to make life -easier for APM BIOS. event = FREEZE, flags = APM_STANDBY - -Apm suspend -- same as APM_STANDBY, but it we should probably avoid -spinning down disks. event = FREEZE, flags = APM_SUSPEND - -System halt, reboot -- quiesce devices to make life easier for BIOS. event -= FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT - -System shutdown -- at least disks need to be spun down, or data may be -lost. Quiesce devices, just to make life easier for BIOS. event = -FREEZE, flags = SYSTEM_SHUTDOWN - -Kexec -- turn off DMAs and put hardware into some state where new -kernel can take over. event = FREEZE, flags = KEXEC - -Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake -may need to be enabled on some devices. This actually has at least 3 -subtypes, system can reboot, enter S4 and enter S5 at the end of -swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT, -SYSTEM_SHUTDOWN, SYSTEM_S4 - -Suspend to ram -- put devices into low power state. event = SUSPEND, -flags = SUSPEND_TO_RAM - -Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put -devices into low power mode, but you must be able to reinitialize -device from scratch in resume method. This has two flavors, its done -once on suspending kernel, once on resuming kernel. event = FREEZE, -flags = DURING_SUSPEND or DURING_RESUME - -Device detach requested from /sys -- deinitialize device; proably same as -SYSTEM_SHUTDOWN, I do not understand this one too much. probably event -= FREEZE, flags = DEV_DETACH. - -#These are not really events sent: -# -#System fully on -- device is working normally; this is probably never -#passed to suspend() method... event = ON, flags = 0 -# -#Ready after resume -- userland is now running, again. Time to free any -#memory you ate during prepare to suspend... event = ON, flags = -#READY_AFTER_RESUME -# - - -pm_message_t meaning - -pm_message_t has two fields. event ("major"), and flags. If driver -does not know event code, it aborts the request, returning error. Some -drivers may need to deal with special cases based on the actual type -of suspend operation being done at the system level. This is why -there are flags. - -Event codes are: - -ON -- no need to do anything except special cases like broken -HW. - -# NOTIFICATION -- pretty much same as ON? - -FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from -scratch. That probably means stop accepting upstream requests, the actual policy of what to do with them being specific to a given driver. It's acceptable for a network driver to just drop packets while a block driver is expected to block the queue so no request is diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt index d7814a113ee1..823b2cf6e3dc 100644 --- a/Documentation/power/swsusp.txt +++ b/Documentation/power/swsusp.txt @@ -18,10 +18,11 @@ Some warnings, first. * * (*) suspend/resume support is needed to make it safe. * - * If you have any filesystems on USB devices mounted before suspend, + * If you have any filesystems on USB devices mounted before software suspend, * they won't be accessible after resume and you may lose data, as though - * you have unplugged the USB devices with mounted filesystems on them - * (see the FAQ below for details). + * you have unplugged the USB devices with mounted filesystems on them; + * see the FAQ below for details. (This is not true for more traditional + * power states like "standby", which normally don't turn USB off.) You need to append resume=/dev/your_swap_partition to kernel command line. Then you suspend by @@ -204,7 +205,7 @@ Q: There don't seem to be any generally useful behavioral distinctions between SUSPEND and FREEZE. A: Doing SUSPEND when you are asked to do FREEZE is always correct, -but it may be unneccessarily slow. If you want USB to stay simple, +but it may be unneccessarily slow. If you want your driver to stay simple, slowness may not matter to you. It can always be fixed later. For devices like disk it does matter, you do not want to spindown for @@ -349,25 +350,72 @@ Q: How do I make suspend more verbose? A: If you want to see any non-error kernel messages on the virtual terminal the kernel switches to during suspend, you have to set the -kernel console loglevel to at least 5, for example by doing - - echo 5 > /proc/sys/kernel/printk +kernel console loglevel to at least 4 (KERN_WARNING), for example by +doing + + # save the old loglevel + read LOGLEVEL DUMMY < /proc/sys/kernel/printk + # set the loglevel so we see the progress bar. + # if the level is higher than needed, we leave it alone. + if [ $LOGLEVEL -lt 5 ]; then + echo 5 > /proc/sys/kernel/printk + fi + + IMG_SZ=0 + read IMG_SZ < /sys/power/image_size + echo -n disk > /sys/power/state + RET=$? + # + # the logic here is: + # if image_size > 0 (without kernel support, IMG_SZ will be zero), + # then try again with image_size set to zero. + if [ $RET -ne 0 -a $IMG_SZ -ne 0 ]; then # try again with minimal image size + echo 0 > /sys/power/image_size + echo -n disk > /sys/power/state + RET=$? + fi + + # restore previous loglevel + echo $LOGLEVEL > /proc/sys/kernel/printk + exit $RET Q: Is this true that if I have a mounted filesystem on a USB device and I suspend to disk, I can lose data unless the filesystem has been mounted with "sync"? -A: That's right. It depends on your hardware, and it could be true even for -suspend-to-RAM. In fact, even with "-o sync" you can lose data if your -programs have information in buffers they haven't written out to disk. +A: That's right ... if you disconnect that device, you may lose data. +In fact, even with "-o sync" you can lose data if your programs have +information in buffers they haven't written out to a disk you disconnect, +or if you disconnect before the device finished saving data you wrote. -If you're lucky, your hardware will support low-power modes for USB -controllers while the system is asleep. Lots of hardware doesn't, -however. Shutting off the power to a USB controller is equivalent to -unplugging all the attached devices. +Software suspend normally powers down USB controllers, which is equivalent +to disconnecting all USB devices attached to your system. -Remember that it's always a bad idea to unplug a disk drive containing a -mounted filesystem. With USB that's true even when your system is asleep! -The safest thing is to unmount all USB-based filesystems before suspending -and remount them after resuming. +Your system might well support low-power modes for its USB controllers +while the system is asleep, maintaining the connection, using true sleep +modes like "suspend-to-RAM" or "standby". (Don't write "disk" to the +/sys/power/state file; write "standby" or "mem".) We've not seen any +hardware that can use these modes through software suspend, although in +theory some systems might support "platform" or "firmware" modes that +won't break the USB connections. +Remember that it's always a bad idea to unplug a disk drive containing a +mounted filesystem. That's true even when your system is asleep! The +safest thing is to unmount all filesystems on removable media (such USB, +Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays) +before suspending; then remount them after resuming. + +Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were +compiled with the similar configuration files. Anyway I found that +suspend to disk (and resume) is much slower on 2.6.16 compared to +2.6.15. Any idea for why that might happen or how can I speed it up? + +A: This is because the size of the suspend image is now greater than +for 2.6.15 (by saving more data we can get more responsive system +after resume). + +There's the /sys/power/image_size knob that controls the size of the +image. If you set it to 0 (eg. by echo 0 > /sys/power/image_size as +root), the 2.6.15 behavior should be restored. If it is still too +slow, take a look at suspend.sf.net -- userland suspend is faster and +supports LZF compression to speed it up further. diff --git a/Documentation/power/video.txt b/Documentation/power/video.txt index 43a889f8f08d..d859faa3a463 100644 --- a/Documentation/power/video.txt +++ b/Documentation/power/video.txt @@ -90,6 +90,7 @@ Table of known working notebooks: Model hack (or "how to do it") ------------------------------------------------------------------------------ Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI +Acer TM 230 s3_bios (2) Acer TM 242FX vbetool (6) Acer TM C110 video_post (8) Acer TM C300 vga=normal (only suspend on console, not in X), vbetool (6) or video_post (8) @@ -115,6 +116,7 @@ Dell D610 vga=normal and X (possibly vbestate (6) too, but not tested) Dell Inspiron 4000 ??? (*) Dell Inspiron 500m ??? (*) Dell Inspiron 510m ??? +Dell Inspiron 5150 vbetool needed (6) Dell Inspiron 600m ??? (*) Dell Inspiron 8200 ??? (*) Dell Inspiron 8500 ??? (*) @@ -125,6 +127,7 @@ HP NX7000 ??? (*) HP Pavilion ZD7000 vbetool post needed, need open-source nv driver for X HP Omnibook XE3 athlon version none (1) HP Omnibook XE3GC none (1), video is S3 Savage/IX-MV +HP Omnibook XE3L-GF vbetool (6) HP Omnibook 5150 none (1), (S1 also works OK) IBM TP T20, model 2647-44G none (1), video is S3 Inc. 86C270-294 Savage/IX-MV, vesafb gets "interesting" but X work. IBM TP A31 / Type 2652-M5G s3_mode (3) [works ok with BIOS 1.04 2002-08-23, but not at all with BIOS 1.11 2004-11-05 :-(] @@ -157,6 +160,7 @@ Sony Vaio vgn-s260 X or boot-radeon can init it (5) Sony Vaio vgn-S580BH vga=normal, but suspend from X. Console will be blank unless you return to X. Sony Vaio vgn-FS115B s3_bios (2),s3_mode (4) Toshiba Libretto L5 none (1) +Toshiba Libretto 100CT/110CT vbetool (6) Toshiba Portege 3020CT s3_mode (3) Toshiba Satellite 4030CDT s3_mode (3) (S1 also works OK) Toshiba Satellite 4080XCDT s3_mode (3) (S1 also works OK) diff --git a/Documentation/robust-futexes.txt b/Documentation/robust-futexes.txt index df82d75245a0..76e8064b8c3a 100644 --- a/Documentation/robust-futexes.txt +++ b/Documentation/robust-futexes.txt @@ -95,7 +95,7 @@ comparison. If the thread has registered a list, then normally the list is empty. If the thread/process crashed or terminated in some incorrect way then the list might be non-empty: in this case the kernel carefully walks the list [not trusting it], and marks all locks that are owned by -this thread with the FUTEX_OWNER_DEAD bit, and wakes up one waiter (if +this thread with the FUTEX_OWNER_DIED bit, and wakes up one waiter (if any). The list is guaranteed to be private and per-thread at do_exit() time, diff --git a/Documentation/rt-mutex-design.txt b/Documentation/rt-mutex-design.txt new file mode 100644 index 000000000000..c472ffacc2f6 --- /dev/null +++ b/Documentation/rt-mutex-design.txt @@ -0,0 +1,781 @@ +# +# Copyright (c) 2006 Steven Rostedt +# Licensed under the GNU Free Documentation License, Version 1.2 +# + +RT-mutex implementation design +------------------------------ + +This document tries to describe the design of the rtmutex.c implementation. +It doesn't describe the reasons why rtmutex.c exists. For that please see +Documentation/rt-mutex.txt. Although this document does explain problems +that happen without this code, but that is in the concept to understand +what the code actually is doing. + +The goal of this document is to help others understand the priority +inheritance (PI) algorithm that is used, as well as reasons for the +decisions that were made to implement PI in the manner that was done. + + +Unbounded Priority Inversion +---------------------------- + +Priority inversion is when a lower priority process executes while a higher +priority process wants to run. This happens for several reasons, and +most of the time it can't be helped. Anytime a high priority process wants +to use a resource that a lower priority process has (a mutex for example), +the high priority process must wait until the lower priority process is done +with the resource. This is a priority inversion. What we want to prevent +is something called unbounded priority inversion. That is when the high +priority process is prevented from running by a lower priority process for +an undetermined amount of time. + +The classic example of unbounded priority inversion is were you have three +processes, let's call them processes A, B, and C, where A is the highest +priority process, C is the lowest, and B is in between. A tries to grab a lock +that C owns and must wait and lets C run to release the lock. But in the +meantime, B executes, and since B is of a higher priority than C, it preempts C, +but by doing so, it is in fact preempting A which is a higher priority process. +Now there's no way of knowing how long A will be sleeping waiting for C +to release the lock, because for all we know, B is a CPU hog and will +never give C a chance to release the lock. This is called unbounded priority +inversion. + +Here's a little ASCII art to show the problem. + + grab lock L1 (owned by C) + | +A ---+ + C preempted by B + | +C +----+ + +B +--------> + B now keeps A from running. + + +Priority Inheritance (PI) +------------------------- + +There are several ways to solve this issue, but other ways are out of scope +for this document. Here we only discuss PI. + +PI is where a process inherits the priority of another process if the other +process blocks on a lock owned by the current process. To make this easier +to understand, let's use the previous example, with processes A, B, and C again. + +This time, when A blocks on the lock owned by C, C would inherit the priority +of A. So now if B becomes runnable, it would not preempt C, since C now has +the high priority of A. As soon as C releases the lock, it loses its +inherited priority, and A then can continue with the resource that C had. + +Terminology +----------- + +Here I explain some terminology that is used in this document to help describe +the design that is used to implement PI. + +PI chain - The PI chain is an ordered series of locks and processes that cause + processes to inherit priorities from a previous process that is + blocked on one of its locks. This is described in more detail + later in this document. + +mutex - In this document, to differentiate from locks that implement + PI and spin locks that are used in the PI code, from now on + the PI locks will be called a mutex. + +lock - In this document from now on, I will use the term lock when + referring to spin locks that are used to protect parts of the PI + algorithm. These locks disable preemption for UP (when + CONFIG_PREEMPT is enabled) and on SMP prevents multiple CPUs from + entering critical sections simultaneously. + +spin lock - Same as lock above. + +waiter - A waiter is a struct that is stored on the stack of a blocked + process. Since the scope of the waiter is within the code for + a process being blocked on the mutex, it is fine to allocate + the waiter on the process's stack (local variable). This + structure holds a pointer to the task, as well as the mutex that + the task is blocked on. It also has the plist node structures to + place the task in the waiter_list of a mutex as well as the + pi_list of a mutex owner task (described below). + + waiter is sometimes used in reference to the task that is waiting + on a mutex. This is the same as waiter->task. + +waiters - A list of processes that are blocked on a mutex. + +top waiter - The highest priority process waiting on a specific mutex. + +top pi waiter - The highest priority process waiting on one of the mutexes + that a specific process owns. + +Note: task and process are used interchangeably in this document, mostly to + differentiate between two processes that are being described together. + + +PI chain +-------- + +The PI chain is a list of processes and mutexes that may cause priority +inheritance to take place. Multiple chains may converge, but a chain +would never diverge, since a process can't be blocked on more than one +mutex at a time. + +Example: + + Process: A, B, C, D, E + Mutexes: L1, L2, L3, L4 + + A owns: L1 + B blocked on L1 + B owns L2 + C blocked on L2 + C owns L3 + D blocked on L3 + D owns L4 + E blocked on L4 + +The chain would be: + + E->L4->D->L3->C->L2->B->L1->A + +To show where two chains merge, we could add another process F and +another mutex L5 where B owns L5 and F is blocked on mutex L5. + +The chain for F would be: + + F->L5->B->L1->A + +Since a process may own more than one mutex, but never be blocked on more than +one, the chains merge. + +Here we show both chains: + + E->L4->D->L3->C->L2-+ + | + +->B->L1->A + | + F->L5-+ + +For PI to work, the processes at the right end of these chains (or we may +also call it the Top of the chain) must be equal to or higher in priority +than the processes to the left or below in the chain. + +Also since a mutex may have more than one process blocked on it, we can +have multiple chains merge at mutexes. If we add another process G that is +blocked on mutex L2: + + G->L2->B->L1->A + +And once again, to show how this can grow I will show the merging chains +again. + + E->L4->D->L3->C-+ + +->L2-+ + | | + G-+ +->B->L1->A + | + F->L5-+ + + +Plist +----- + +Before I go further and talk about how the PI chain is stored through lists +on both mutexes and processes, I'll explain the plist. This is similar to +the struct list_head functionality that is already in the kernel. +The implementation of plist is out of scope for this document, but it is +very important to understand what it does. + +There are a few differences between plist and list, the most important one +being that plist is a priority sorted linked list. This means that the +priorities of the plist are sorted, such that it takes O(1) to retrieve the +highest priority item in the list. Obviously this is useful to store processes +based on their priorities. + +Another difference, which is important for implementation, is that, unlike +list, the head of the list is a different element than the nodes of a list. +So the head of the list is declared as struct plist_head and nodes that will +be added to the list are declared as struct plist_node. + + +Mutex Waiter List +----------------- + +Every mutex keeps track of all the waiters that are blocked on itself. The mutex +has a plist to store these waiters by priority. This list is protected by +a spin lock that is located in the struct of the mutex. This lock is called +wait_lock. Since the modification of the waiter list is never done in +interrupt context, the wait_lock can be taken without disabling interrupts. + + +Task PI List +------------ + +To keep track of the PI chains, each process has its own PI list. This is +a list of all top waiters of the mutexes that are owned by the process. +Note that this list only holds the top waiters and not all waiters that are +blocked on mutexes owned by the process. + +The top of the task's PI list is always the highest priority task that +is waiting on a mutex that is owned by the task. So if the task has +inherited a priority, it will always be the priority of the task that is +at the top of this list. + +This list is stored in the task structure of a process as a plist called +pi_list. This list is protected by a spin lock also in the task structure, +called pi_lock. This lock may also be taken in interrupt context, so when +locking the pi_lock, interrupts must be disabled. + + +Depth of the PI Chain +--------------------- + +The maximum depth of the PI chain is not dynamic, and could actually be +defined. But is very complex to figure it out, since it depends on all +the nesting of mutexes. Let's look at the example where we have 3 mutexes, +L1, L2, and L3, and four separate functions func1, func2, func3 and func4. +The following shows a locking order of L1->L2->L3, but may not actually +be directly nested that way. + +void func1(void) +{ + mutex_lock(L1); + + /* do anything */ + + mutex_unlock(L1); +} + +void func2(void) +{ + mutex_lock(L1); + mutex_lock(L2); + + /* do something */ + + mutex_unlock(L2); + mutex_unlock(L1); +} + +void func3(void) +{ + mutex_lock(L2); + mutex_lock(L3); + + /* do something else */ + + mutex_unlock(L3); + mutex_unlock(L2); +} + +void func4(void) +{ + mutex_lock(L3); + + /* do something again */ + + mutex_unlock(L3); +} + +Now we add 4 processes that run each of these functions separately. +Processes A, B, C, and D which run functions func1, func2, func3 and func4 +respectively, and such that D runs first and A last. With D being preempted +in func4 in the "do something again" area, we have a locking that follows: + +D owns L3 + C blocked on L3 + C owns L2 + B blocked on L2 + B owns L1 + A blocked on L1 + +And thus we have the chain A->L1->B->L2->C->L3->D. + +This gives us a PI depth of 4 (four processes), but looking at any of the +functions individually, it seems as though they only have at most a locking +depth of two. So, although the locking depth is defined at compile time, +it still is very difficult to find the possibilities of that depth. + +Now since mutexes can be defined by user-land applications, we don't want a DOS +type of application that nests large amounts of mutexes to create a large +PI chain, and have the code holding spin locks while looking at a large +amount of data. So to prevent this, the implementation not only implements +a maximum lock depth, but also only holds at most two different locks at a +time, as it walks the PI chain. More about this below. + + +Mutex owner and flags +--------------------- + +The mutex structure contains a pointer to the owner of the mutex. If the +mutex is not owned, this owner is set to NULL. Since all architectures +have the task structure on at least a four byte alignment (and if this is +not true, the rtmutex.c code will be broken!), this allows for the two +least significant bits to be used as flags. This part is also described +in Documentation/rt-mutex.txt, but will also be briefly described here. + +Bit 0 is used as the "Pending Owner" flag. This is described later. +Bit 1 is used as the "Has Waiters" flags. This is also described later + in more detail, but is set whenever there are waiters on a mutex. + + +cmpxchg Tricks +-------------- + +Some architectures implement an atomic cmpxchg (Compare and Exchange). This +is used (when applicable) to keep the fast path of grabbing and releasing +mutexes short. + +cmpxchg is basically the following function performed atomically: + +unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C) +{ + unsigned long T = *A; + if (*A == *B) { + *A = *C; + } + return T; +} +#define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c) + +This is really nice to have, since it allows you to only update a variable +if the variable is what you expect it to be. You know if it succeeded if +the return value (the old value of A) is equal to B. + +The macro rt_mutex_cmpxchg is used to try to lock and unlock mutexes. If +the architecture does not support CMPXCHG, then this macro is simply set +to fail every time. But if CMPXCHG is supported, then this will +help out extremely to keep the fast path short. + +The use of rt_mutex_cmpxchg with the flags in the owner field help optimize +the system for architectures that support it. This will also be explained +later in this document. + + +Priority adjustments +-------------------- + +The implementation of the PI code in rtmutex.c has several places that a +process must adjust its priority. With the help of the pi_list of a +process this is rather easy to know what needs to be adjusted. + +The functions implementing the task adjustments are rt_mutex_adjust_prio, +__rt_mutex_adjust_prio (same as the former, but expects the task pi_lock +to already be taken), rt_mutex_get_prio, and rt_mutex_setprio. + +rt_mutex_getprio and rt_mutex_setprio are only used in __rt_mutex_adjust_prio. + +rt_mutex_getprio returns the priority that the task should have. Either the +task's own normal priority, or if a process of a higher priority is waiting on +a mutex owned by the task, then that higher priority should be returned. +Since the pi_list of a task holds an order by priority list of all the top +waiters of all the mutexes that the task owns, rt_mutex_getprio simply needs +to compare the top pi waiter to its own normal priority, and return the higher +priority back. + +(Note: if looking at the code, you will notice that the lower number of + prio is returned. This is because the prio field in the task structure + is an inverse order of the actual priority. So a "prio" of 5 is + of higher priority than a "prio" of 10.) + +__rt_mutex_adjust_prio examines the result of rt_mutex_getprio, and if the +result does not equal the task's current priority, then rt_mutex_setprio +is called to adjust the priority of the task to the new priority. +Note that rt_mutex_setprio is defined in kernel/sched.c to implement the +actual change in priority. + +It is interesting to note that __rt_mutex_adjust_prio can either increase +or decrease the priority of the task. In the case that a higher priority +process has just blocked on a mutex owned by the task, __rt_mutex_adjust_prio +would increase/boost the task's priority. But if a higher priority task +were for some reason to leave the mutex (timeout or signal), this same function +would decrease/unboost the priority of the task. That is because the pi_list +always contains the highest priority task that is waiting on a mutex owned +by the task, so we only need to compare the priority of that top pi waiter +to the normal priority of the given task. + + +High level overview of the PI chain walk +---------------------------------------- + +The PI chain walk is implemented by the function rt_mutex_adjust_prio_chain. + +The implementation has gone through several iterations, and has ended up +with what we believe is the best. It walks the PI chain by only grabbing +at most two locks at a time, and is very efficient. + +The rt_mutex_adjust_prio_chain can be used either to boost or lower process +priorities. + +rt_mutex_adjust_prio_chain is called with a task to be checked for PI +(de)boosting (the owner of a mutex that a process is blocking on), a flag to +check for deadlocking, the mutex that the task owns, and a pointer to a waiter +that is the process's waiter struct that is blocked on the mutex (although this +parameter may be NULL for deboosting). + +For this explanation, I will not mention deadlock detection. This explanation +will try to stay at a high level. + +When this function is called, there are no locks held. That also means +that the state of the owner and lock can change when entered into this function. + +Before this function is called, the task has already had rt_mutex_adjust_prio +performed on it. This means that the task is set to the priority that it +should be at, but the plist nodes of the task's waiter have not been updated +with the new priorities, and that this task may not be in the proper locations +in the pi_lists and wait_lists that the task is blocked on. This function +solves all that. + +A loop is entered, where task is the owner to be checked for PI changes that +was passed by parameter (for the first iteration). The pi_lock of this task is +taken to prevent any more changes to the pi_list of the task. This also +prevents new tasks from completing the blocking on a mutex that is owned by this +task. + +If the task is not blocked on a mutex then the loop is exited. We are at +the top of the PI chain. + +A check is now done to see if the original waiter (the process that is blocked +on the current mutex) is the top pi waiter of the task. That is, is this +waiter on the top of the task's pi_list. If it is not, it either means that +there is another process higher in priority that is blocked on one of the +mutexes that the task owns, or that the waiter has just woken up via a signal +or timeout and has left the PI chain. In either case, the loop is exited, since +we don't need to do any more changes to the priority of the current task, or any +task that owns a mutex that this current task is waiting on. A priority chain +walk is only needed when a new top pi waiter is made to a task. + +The next check sees if the task's waiter plist node has the priority equal to +the priority the task is set at. If they are equal, then we are done with +the loop. Remember that the function started with the priority of the +task adjusted, but the plist nodes that hold the task in other processes +pi_lists have not been adjusted. + +Next, we look at the mutex that the task is blocked on. The mutex's wait_lock +is taken. This is done by a spin_trylock, because the locking order of the +pi_lock and wait_lock goes in the opposite direction. If we fail to grab the +lock, the pi_lock is released, and we restart the loop. + +Now that we have both the pi_lock of the task as well as the wait_lock of +the mutex the task is blocked on, we update the task's waiter's plist node +that is located on the mutex's wait_list. + +Now we release the pi_lock of the task. + +Next the owner of the mutex has its pi_lock taken, so we can update the +task's entry in the owner's pi_list. If the task is the highest priority +process on the mutex's wait_list, then we remove the previous top waiter +from the owner's pi_list, and replace it with the task. + +Note: It is possible that the task was the current top waiter on the mutex, + in which case the task is not yet on the pi_list of the waiter. This + is OK, since plist_del does nothing if the plist node is not on any + list. + +If the task was not the top waiter of the mutex, but it was before we +did the priority updates, that means we are deboosting/lowering the +task. In this case, the task is removed from the pi_list of the owner, +and the new top waiter is added. + +Lastly, we unlock both the pi_lock of the task, as well as the mutex's +wait_lock, and continue the loop again. On the next iteration of the +loop, the previous owner of the mutex will be the task that will be +processed. + +Note: One might think that the owner of this mutex might have changed + since we just grab the mutex's wait_lock. And one could be right. + The important thing to remember is that the owner could not have + become the task that is being processed in the PI chain, since + we have taken that task's pi_lock at the beginning of the loop. + So as long as there is an owner of this mutex that is not the same + process as the tasked being worked on, we are OK. + + Looking closely at the code, one might be confused. The check for the + end of the PI chain is when the task isn't blocked on anything or the + task's waiter structure "task" element is NULL. This check is + protected only by the task's pi_lock. But the code to unlock the mutex + sets the task's waiter structure "task" element to NULL with only + the protection of the mutex's wait_lock, which was not taken yet. + Isn't this a race condition if the task becomes the new owner? + + The answer is No! The trick is the spin_trylock of the mutex's + wait_lock. If we fail that lock, we release the pi_lock of the + task and continue the loop, doing the end of PI chain check again. + + In the code to release the lock, the wait_lock of the mutex is held + the entire time, and it is not let go when we grab the pi_lock of the + new owner of the mutex. So if the switch of a new owner were to happen + after the check for end of the PI chain and the grabbing of the + wait_lock, the unlocking code would spin on the new owner's pi_lock + but never give up the wait_lock. So the PI chain loop is guaranteed to + fail the spin_trylock on the wait_lock, release the pi_lock, and + try again. + + If you don't quite understand the above, that's OK. You don't have to, + unless you really want to make a proof out of it ;) + + +Pending Owners and Lock stealing +-------------------------------- + +One of the flags in the owner field of the mutex structure is "Pending Owner". +What this means is that an owner was chosen by the process releasing the +mutex, but that owner has yet to wake up and actually take the mutex. + +Why is this important? Why can't we just give the mutex to another process +and be done with it? + +The PI code is to help with real-time processes, and to let the highest +priority process run as long as possible with little latencies and delays. +If a high priority process owns a mutex that a lower priority process is +blocked on, when the mutex is released it would be given to the lower priority +process. What if the higher priority process wants to take that mutex again. +The high priority process would fail to take that mutex that it just gave up +and it would need to boost the lower priority process to run with full +latency of that critical section (since the low priority process just entered +it). + +There's no reason a high priority process that gives up a mutex should be +penalized if it tries to take that mutex again. If the new owner of the +mutex has not woken up yet, there's no reason that the higher priority process +could not take that mutex away. + +To solve this, we introduced Pending Ownership and Lock Stealing. When a +new process is given a mutex that it was blocked on, it is only given +pending ownership. This means that it's the new owner, unless a higher +priority process comes in and tries to grab that mutex. If a higher priority +process does come along and wants that mutex, we let the higher priority +process "steal" the mutex from the pending owner (only if it is still pending) +and continue with the mutex. + + +Taking of a mutex (The walk through) +------------------------------------ + +OK, now let's take a look at the detailed walk through of what happens when +taking a mutex. + +The first thing that is tried is the fast taking of the mutex. This is +done when we have CMPXCHG enabled (otherwise the fast taking automatically +fails). Only when the owner field of the mutex is NULL can the lock be +taken with the CMPXCHG and nothing else needs to be done. + +If there is contention on the lock, whether it is owned or pending owner +we go about the slow path (rt_mutex_slowlock). + +The slow path function is where the task's waiter structure is created on +the stack. This is because the waiter structure is only needed for the +scope of this function. The waiter structure holds the nodes to store +the task on the wait_list of the mutex, and if need be, the pi_list of +the owner. + +The wait_lock of the mutex is taken since the slow path of unlocking the +mutex also takes this lock. + +We then call try_to_take_rt_mutex. This is where the architecture that +does not implement CMPXCHG would always grab the lock (if there's no +contention). + +try_to_take_rt_mutex is used every time the task tries to grab a mutex in the +slow path. The first thing that is done here is an atomic setting of +the "Has Waiters" flag of the mutex's owner field. Yes, this could really +be false, because if the the mutex has no owner, there are no waiters and +the current task also won't have any waiters. But we don't have the lock +yet, so we assume we are going to be a waiter. The reason for this is to +play nice for those architectures that do have CMPXCHG. By setting this flag +now, the owner of the mutex can't release the mutex without going into the +slow unlock path, and it would then need to grab the wait_lock, which this +code currently holds. So setting the "Has Waiters" flag forces the owner +to synchronize with this code. + +Now that we know that we can't have any races with the owner releasing the +mutex, we check to see if we can take the ownership. This is done if the +mutex doesn't have a owner, or if we can steal the mutex from a pending +owner. Let's look at the situations we have here. + + 1) Has owner that is pending + ---------------------------- + + The mutex has a owner, but it hasn't woken up and the mutex flag + "Pending Owner" is set. The first check is to see if the owner isn't the + current task. This is because this function is also used for the pending + owner to grab the mutex. When a pending owner wakes up, it checks to see + if it can take the mutex, and this is done if the owner is already set to + itself. If so, we succeed and leave the function, clearing the "Pending + Owner" bit. + + If the pending owner is not current, we check to see if the current priority is + higher than the pending owner. If not, we fail the function and return. + + There's also something special about a pending owner. That is a pending owner + is never blocked on a mutex. So there is no PI chain to worry about. It also + means that if the mutex doesn't have any waiters, there's no accounting needed + to update the pending owner's pi_list, since we only worry about processes + blocked on the current mutex. + + If there are waiters on this mutex, and we just stole the ownership, we need + to take the top waiter, remove it from the pi_list of the pending owner, and + add it to the current pi_list. Note that at this moment, the pending owner + is no longer on the list of waiters. This is fine, since the pending owner + would add itself back when it realizes that it had the ownership stolen + from itself. When the pending owner tries to grab the mutex, it will fail + in try_to_take_rt_mutex if the owner field points to another process. + + 2) No owner + ----------- + + If there is no owner (or we successfully stole the lock), we set the owner + of the mutex to current, and set the flag of "Has Waiters" if the current + mutex actually has waiters, or we clear the flag if it doesn't. See, it was + OK that we set that flag early, since now it is cleared. + + 3) Failed to grab ownership + --------------------------- + + The most interesting case is when we fail to take ownership. This means that + there exists an owner, or there's a pending owner with equal or higher + priority than the current task. + +We'll continue on the failed case. + +If the mutex has a timeout, we set up a timer to go off to break us out +of this mutex if we failed to get it after a specified amount of time. + +Now we enter a loop that will continue to try to take ownership of the mutex, or +fail from a timeout or signal. + +Once again we try to take the mutex. This will usually fail the first time +in the loop, since it had just failed to get the mutex. But the second time +in the loop, this would likely succeed, since the task would likely be +the pending owner. + +If the mutex is TASK_INTERRUPTIBLE a check for signals and timeout is done +here. + +The waiter structure has a "task" field that points to the task that is blocked +on the mutex. This field can be NULL the first time it goes through the loop +or if the task is a pending owner and had it's mutex stolen. If the "task" +field is NULL then we need to set up the accounting for it. + +Task blocks on mutex +-------------------- + +The accounting of a mutex and process is done with the waiter structure of +the process. The "task" field is set to the process, and the "lock" field +to the mutex. The plist nodes are initialized to the processes current +priority. + +Since the wait_lock was taken at the entry of the slow lock, we can safely +add the waiter to the wait_list. If the current process is the highest +priority process currently waiting on this mutex, then we remove the +previous top waiter process (if it exists) from the pi_list of the owner, +and add the current process to that list. Since the pi_list of the owner +has changed, we call rt_mutex_adjust_prio on the owner to see if the owner +should adjust its priority accordingly. + +If the owner is also blocked on a lock, and had its pi_list changed +(or deadlock checking is on), we unlock the wait_lock of the mutex and go ahead +and run rt_mutex_adjust_prio_chain on the owner, as described earlier. + +Now all locks are released, and if the current process is still blocked on a +mutex (waiter "task" field is not NULL), then we go to sleep (call schedule). + +Waking up in the loop +--------------------- + +The schedule can then wake up for a few reasons. + 1) we were given pending ownership of the mutex. + 2) we received a signal and was TASK_INTERRUPTIBLE + 3) we had a timeout and was TASK_INTERRUPTIBLE + +In any of these cases, we continue the loop and once again try to grab the +ownership of the mutex. If we succeed, we exit the loop, otherwise we continue +and on signal and timeout, will exit the loop, or if we had the mutex stolen +we just simply add ourselves back on the lists and go back to sleep. + +Note: For various reasons, because of timeout and signals, the steal mutex + algorithm needs to be careful. This is because the current process is + still on the wait_list. And because of dynamic changing of priorities, + especially on SCHED_OTHER tasks, the current process can be the + highest priority task on the wait_list. + +Failed to get mutex on Timeout or Signal +---------------------------------------- + +If a timeout or signal occurred, the waiter's "task" field would not be +NULL and the task needs to be taken off the wait_list of the mutex and perhaps +pi_list of the owner. If this process was a high priority process, then +the rt_mutex_adjust_prio_chain needs to be executed again on the owner, +but this time it will be lowering the priorities. + + +Unlocking the Mutex +------------------- + +The unlocking of a mutex also has a fast path for those architectures with +CMPXCHG. Since the taking of a mutex on contention always sets the +"Has Waiters" flag of the mutex's owner, we use this to know if we need to +take the slow path when unlocking the mutex. If the mutex doesn't have any +waiters, the owner field of the mutex would equal the current process and +the mutex can be unlocked by just replacing the owner field with NULL. + +If the owner field has the "Has Waiters" bit set (or CMPXCHG is not available), +the slow unlock path is taken. + +The first thing done in the slow unlock path is to take the wait_lock of the +mutex. This synchronizes the locking and unlocking of the mutex. + +A check is made to see if the mutex has waiters or not. On architectures that +do not have CMPXCHG, this is the location that the owner of the mutex will +determine if a waiter needs to be awoken or not. On architectures that +do have CMPXCHG, that check is done in the fast path, but it is still needed +in the slow path too. If a waiter of a mutex woke up because of a signal +or timeout between the time the owner failed the fast path CMPXCHG check and +the grabbing of the wait_lock, the mutex may not have any waiters, thus the +owner still needs to make this check. If there are no waiters than the mutex +owner field is set to NULL, the wait_lock is released and nothing more is +needed. + +If there are waiters, then we need to wake one up and give that waiter +pending ownership. + +On the wake up code, the pi_lock of the current owner is taken. The top +waiter of the lock is found and removed from the wait_list of the mutex +as well as the pi_list of the current owner. The task field of the new +pending owner's waiter structure is set to NULL, and the owner field of the +mutex is set to the new owner with the "Pending Owner" bit set, as well +as the "Has Waiters" bit if there still are other processes blocked on the +mutex. + +The pi_lock of the previous owner is released, and the new pending owner's +pi_lock is taken. Remember that this is the trick to prevent the race +condition in rt_mutex_adjust_prio_chain from adding itself as a waiter +on the mutex. + +We now clear the "pi_blocked_on" field of the new pending owner, and if +the mutex still has waiters pending, we add the new top waiter to the pi_list +of the pending owner. + +Finally we unlock the pi_lock of the pending owner and wake it up. + + +Contact +------- + +For updates on this document, please email Steven Rostedt <rostedt@goodmis.org> + + +Credits +------- + +Author: Steven Rostedt <rostedt@goodmis.org> + +Reviewers: Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and Randy Dunlap + +Updates +------- + +This document was originally written for 2.6.17-rc3-mm1 diff --git a/Documentation/rt-mutex.txt b/Documentation/rt-mutex.txt new file mode 100644 index 000000000000..243393d882ee --- /dev/null +++ b/Documentation/rt-mutex.txt @@ -0,0 +1,79 @@ +RT-mutex subsystem with PI support +---------------------------------- + +RT-mutexes with priority inheritance are used to support PI-futexes, +which enable pthread_mutex_t priority inheritance attributes +(PTHREAD_PRIO_INHERIT). [See Documentation/pi-futex.txt for more details +about PI-futexes.] + +This technology was developed in the -rt tree and streamlined for +pthread_mutex support. + +Basic principles: +----------------- + +RT-mutexes extend the semantics of simple mutexes by the priority +inheritance protocol. + +A low priority owner of a rt-mutex inherits the priority of a higher +priority waiter until the rt-mutex is released. If the temporarily +boosted owner blocks on a rt-mutex itself it propagates the priority +boosting to the owner of the other rt_mutex it gets blocked on. The +priority boosting is immediately removed once the rt_mutex has been +unlocked. + +This approach allows us to shorten the block of high-prio tasks on +mutexes which protect shared resources. Priority inheritance is not a +magic bullet for poorly designed applications, but it allows +well-designed applications to use userspace locks in critical parts of +an high priority thread, without losing determinism. + +The enqueueing of the waiters into the rtmutex waiter list is done in +priority order. For same priorities FIFO order is chosen. For each +rtmutex, only the top priority waiter is enqueued into the owner's +priority waiters list. This list too queues in priority order. Whenever +the top priority waiter of a task changes (for example it timed out or +got a signal), the priority of the owner task is readjusted. [The +priority enqueueing is handled by "plists", see include/linux/plist.h +for more details.] + +RT-mutexes are optimized for fastpath operations and have no internal +locking overhead when locking an uncontended mutex or unlocking a mutex +without waiters. The optimized fastpath operations require cmpxchg +support. [If that is not available then the rt-mutex internal spinlock +is used] + +The state of the rt-mutex is tracked via the owner field of the rt-mutex +structure: + +rt_mutex->owner holds the task_struct pointer of the owner. Bit 0 and 1 +are used to keep track of the "owner is pending" and "rtmutex has +waiters" state. + + owner bit1 bit0 + NULL 0 0 mutex is free (fast acquire possible) + NULL 0 1 invalid state + NULL 1 0 Transitional state* + NULL 1 1 invalid state + taskpointer 0 0 mutex is held (fast release possible) + taskpointer 0 1 task is pending owner + taskpointer 1 0 mutex is held and has waiters + taskpointer 1 1 task is pending owner and mutex has waiters + +Pending-ownership handling is a performance optimization: +pending-ownership is assigned to the first (highest priority) waiter of +the mutex, when the mutex is released. The thread is woken up and once +it starts executing it can acquire the mutex. Until the mutex is taken +by it (bit 0 is cleared) a competing higher priority thread can "steal" +the mutex which puts the woken up thread back on the waiters list. + +The pending-ownership optimization is especially important for the +uninterrupted workflow of high-prio tasks which repeatedly +takes/releases locks that have lower-prio waiters. Without this +optimization the higher-prio thread would ping-pong to the lower-prio +task [because at unlock time we always assign a new owner]. + +(*) The "mutex has waiters" bit gets set to take the lock. If the lock +doesn't already have an owner, this bit is quickly cleared if there are +no waiters. So this is a transitional state to synchronize with looking +at the owner field of the mutex and the mutex owner releasing the lock. diff --git a/Documentation/rtc.txt b/Documentation/rtc.txt index 95d17b3e2eee..2a58f985795a 100644 --- a/Documentation/rtc.txt +++ b/Documentation/rtc.txt @@ -44,8 +44,10 @@ normal timer interrupt, which is 100Hz. Programming and/or enabling interrupt frequencies greater than 64Hz is only allowed by root. This is perhaps a bit conservative, but we don't want an evil user generating lots of IRQs on a slow 386sx-16, where it might have -a negative impact on performance. Note that the interrupt handler is only -a few lines of code to minimize any possibility of this effect. +a negative impact on performance. This 64Hz limit can be changed by writing +a different value to /proc/sys/dev/rtc/max-user-freq. Note that the +interrupt handler is only a few lines of code to minimize any possibility +of this effect. Also, if the kernel time is synchronized with an external source, the kernel will write the time back to the CMOS clock every 11 minutes. In @@ -81,6 +83,7 @@ that will be using this driver. */ #include <stdio.h> +#include <stdlib.h> #include <linux/rtc.h> #include <sys/ioctl.h> #include <sys/time.h> diff --git a/Documentation/scsi/00-INDEX b/Documentation/scsi/00-INDEX index e7da8c3a255b..12354830c6b0 100644 --- a/Documentation/scsi/00-INDEX +++ b/Documentation/scsi/00-INDEX @@ -30,8 +30,6 @@ aic7xxx.txt - info on driver for Adaptec controllers aic7xxx_old.txt - info on driver for Adaptec controllers, old generation -cpqfc.txt - - info on driver for Compaq Tachyon TS adapters dpti.txt - info on driver for DPT SmartRAID and Adaptec I2O RAID based adapters dtc3x80.txt diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas index 2dafa63bd370..0a85a7e8120e 100644 --- a/Documentation/scsi/ChangeLog.megaraid_sas +++ b/Documentation/scsi/ChangeLog.megaraid_sas @@ -1,3 +1,16 @@ + +1 Release Date : Wed Feb 03 14:31:44 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com> +2 Current Version : 00.00.02.04 +3 Older Version : 00.00.02.04 + +i. Remove superflous instance_lock + + gets rid of the otherwise superflous instance_lock and avoids an unsave + unsynchronized access in the error handler. + + - Christoph Hellwig <hch@lst.de> + + 1 Release Date : Wed Feb 03 14:31:44 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com> 2 Current Version : 00.00.02.04 3 Older Version : 00.00.02.04 diff --git a/Documentation/scsi/aacraid.txt b/Documentation/scsi/aacraid.txt index 820fd0793502..be55670851a4 100644 --- a/Documentation/scsi/aacraid.txt +++ b/Documentation/scsi/aacraid.txt @@ -24,10 +24,10 @@ Supported Cards/Chipsets 9005:0285:9005:0296 Adaptec 2240S (SabreExpress) 9005:0285:9005:0290 Adaptec 2410SA (Jaguar) 9005:0285:9005:0293 Adaptec 21610SA (Corsair-16) - 9005:0285:103c:3227 Adaptec 2610SA (Bearcat) + 9005:0285:103c:3227 Adaptec 2610SA (Bearcat HP release) 9005:0285:9005:0292 Adaptec 2810SA (Corsair-8) 9005:0285:9005:0294 Adaptec Prowler - 9005:0286:9005:029d Adaptec 2420SA (Intruder) + 9005:0286:9005:029d Adaptec 2420SA (Intruder HP release) 9005:0286:9005:029c Adaptec 2620SA (Intruder) 9005:0286:9005:029b Adaptec 2820SA (Intruder) 9005:0286:9005:02a7 Adaptec 2830SA (Skyray) @@ -38,7 +38,7 @@ Supported Cards/Chipsets 9005:0285:9005:0297 Adaptec 4005SAS (AvonPark) 9005:0285:9005:0299 Adaptec 4800SAS (Marauder-X) 9005:0285:9005:029a Adaptec 4805SAS (Marauder-E) - 9005:0286:9005:02a2 Adaptec 4810SAS (Hurricane) + 9005:0286:9005:02a2 Adaptec 3800SAS (Hurricane44) 1011:0046:9005:0364 Adaptec 5400S (Mustang) 1011:0046:9005:0365 Adaptec 5400S (Mustang) 9005:0283:9005:0283 Adaptec Catapult (3210S with arc firmware) @@ -72,7 +72,7 @@ Supported Cards/Chipsets 9005:0286:9005:02a1 ICP ICP9087MA (Lancer) 9005:0286:9005:02a4 ICP ICP9085LI (Marauder-X) 9005:0286:9005:02a5 ICP ICP5085BR (Marauder-E) - 9005:0286:9005:02a3 ICP ICP5085AU (Hurricane) + 9005:0286:9005:02a3 ICP ICP5445AU (Hurricane44) 9005:0286:9005:02a6 ICP ICP9067MA (Intruder-6) 9005:0286:9005:02a9 ICP ICP5087AU (Skyray) 9005:0286:9005:02aa ICP ICP5047AU (Skyray) diff --git a/Documentation/scsi/cpqfc.txt b/Documentation/scsi/cpqfc.txt deleted file mode 100644 index dd33e61c0645..000000000000 --- a/Documentation/scsi/cpqfc.txt +++ /dev/null @@ -1,272 +0,0 @@ -Notes for CPQFCTS driver for Compaq Tachyon TS -Fibre Channel Host Bus Adapter, PCI 64-bit, 66MHz -for Linux (RH 6.1, 6.2 kernel 2.2.12-32, 2.2.14-5) -SMP tested -Tested in single and dual HBA configuration, 32 and 64bit busses, -33 and 66MHz. Only supports FC-AL. -SEST size 512 Exchanges (simultaneous I/Os) limited by module kmalloc() - max of 128k bytes contiguous. - -Ver 2.5.4 Oct 03, 2002 - * fixed memcpy of sense buffer in ioctl to copy the smaller defined size -Ver 2.5.3 Aug 01, 2002 - * fix the passthru ioctl to handle the Scsi_Cmnd->request being a pointer -Ver 2.5.1 Jul 30, 2002 - * fix ioctl to pay attention to the specified LUN. -Ver 2.5.0 Nov 29, 2001 - * eliminated io_request_lock. This change makes the driver specific - to the 2.5.x kernels. - * silenced excessively noisy printks. - -Ver 2.1.2 July 23, 2002 - * initialize DumCmnd->lun in cpqfcTS_ioctl (used in fcFindLoggedInPorts as LUN index) - -Ver 2.1.1 Oct 18, 2001 - * reinitialize Cmnd->SCp.sent_command (used to identify commands as - passthrus) on calling scsi_done, since the scsi mid layer does not - use (or reinitialize) this field to prevent subsequent comands from - having it set incorrectly. - -Ver 2.1.0 Aug 27, 2001 - * Revise driver to use new kernel 2.4.x PCI DMA API, instead of - virt_to_bus(). (enables driver to work w/ ia64 systems with >2Gb RAM.) - Rework main scatter-gather code to handle cases where SG element - lengths are larger than 0x7FFFF bytes and use as many scatter - gather pages as necessary. (Steve Cameron) - * Makefile changes to bring cpqfc into line w/ rest of SCSI drivers - (thanks to Keith Owens) - -Ver 2.0.5 Aug 06, 2001 - * Reject non-existent luns in the driver rather than letting the - hardware do it. (some HW behaves differently than others in this area.) - * Changed Makefile to rely on "make dep" instead of explicit dependencies - * ifdef'ed out fibre channel analyzer triggering debug code - * fixed a jiffies wrapping issue - -Ver 2.0.4 Aug 01, 2001 - * Incorporated fix for target device reset from Steeleye - * Fixed passthrough ioctl so it doesn't hang. - * Fixed hang in launch_FCworker_thread() that occurred on some machines. - * Avoid problem when number of volumes in a single cabinet > 8 - -Ver 2.0.2 July 23, 2001 - Changed the semiphore changes so the driver would compile in 2.4.7. - This version is for 2.4.7 and beyond. - -Ver 2.0.1 May 7, 2001 - Merged version 1.3.6 fixes into version 2.0.0. - -Ver 2.0.0 May 7, 2001 - Fixed problem so spinlock is being initialized to UNLOCKED. - Fixed updated driver so it compiles in the 2.4 tree. - - Ver 1.3.6 Feb 27, 2001 - Added Target_Device_Reset function for SCSI error handling - Fixed problem with not reseting addressing mode after implicit logout - - -Ver 1.3.4 Sep 7, 2000 - Added Modinfo information - Fixed problem with statically linking the driver - -Ver 1.3.3, Aug 23, 2000 - Fixed device/function number in ioctl - -Ver 1.3.2, July 27, 2000 - Add include for Alpha compile on 2.2.14 kernel (cpq*i2c.c) - Change logic for different FCP-RSP sense_buffer location for HSG80 target - And search for Agilent Tachyon XL2 HBAs (not finished! - in test) - -Tested with -(storage): - Compaq RA-4x000, RAID firmware ver 2.40 - 2.54 - Seagate FC drives model ST39102FC, rev 0006 - Hitachi DK31CJ-72FC rev J8A8 - IBM DDYF-T18350R rev F60K - Compaq FC-SCSI bridge w/ DLT 35/70 Gb DLT (tape) -(servers): - Compaq PL-1850R - Compaq PL-6500 Xeon (400MHz) - Compaq PL-8500 (500MHz, 66MHz, 64bit PCI) - Compaq Alpha DS20 (RH 6.1) -(hubs): - Vixel Rapport 1000 (7-port "dumb") - Gadzoox Gibralter (12-port "dumb") - Gadzoox Capellix 2000, 3000 -(switches): - Brocade 2010, 2400, 2800, rev 2.0.3a (& later) - Gadzoox 3210 (Fabric blade beta) - Vixel 7100 (Fabric beta firmare - known hot plug issues) -using "qa_test" (esp. io_test script) suite modified from Unix tests. - -Installation: -make menuconfig - (select SCSI low-level, Compaq FC HBA) -make modules -make modules_install - -e.g. insmod -f cpqfc - -Due to Fabric/switch delays, driver requires 4 seconds -to initialize. If adapters are found, there will be a entries at -/proc/scsi/cpqfcTS/* - -sample contents of startup messages - -************************* - scsi_register allocating 3596 bytes for CPQFCHBA - ioremap'd Membase: c887e600 - HBA Tachyon RevId 1.2 -Allocating 119808 for 576 Exchanges @ c0dc0000 -Allocating 112904 for LinkQ @ c0c20000 (576 elements) -Allocating 110600 for TachSEST for 512 Exchanges - cpqfcTS: writing IMQ BASE 7C0000h PI 7C4000h - cpqfcTS: SEST c0e40000(virt): Wrote base E40000h @ c887e740 -cpqfcTS: New FC port 0000E8h WWN: 500507650642499D SCSI Chan/Trgt 0/0 -cpqfcTS: New FC port 0000EFh WWN: 50000E100000D5A6 SCSI Chan/Trgt 0/1 -cpqfcTS: New FC port 0000E4h WWN: 21000020370097BB SCSI Chan/Trgt 0/2 -cpqfcTS: New FC port 0000E2h WWN: 2100002037009946 SCSI Chan/Trgt 0/3 -cpqfcTS: New FC port 0000E1h WWN: 21000020370098FE SCSI Chan/Trgt 0/4 -cpqfcTS: New FC port 0000E0h WWN: 21000020370097B2 SCSI Chan/Trgt 0/5 -cpqfcTS: New FC port 0000DCh WWN: 2100002037006CC1 SCSI Chan/Trgt 0/6 -cpqfcTS: New FC port 0000DAh WWN: 21000020370059F6 SCSI Chan/Trgt 0/7 -cpqfcTS: New FC port 00000Fh WWN: 500805F1FADB0E20 SCSI Chan/Trgt 0/8 -cpqfcTS: New FC port 000008h WWN: 500805F1FADB0EBA SCSI Chan/Trgt 0/9 -cpqfcTS: New FC port 000004h WWN: 500805F1FADB1EB9 SCSI Chan/Trgt 0/10 -cpqfcTS: New FC port 000002h WWN: 500805F1FADB1ADE SCSI Chan/Trgt 0/11 -cpqfcTS: New FC port 000001h WWN: 500805F1FADBA2CA SCSI Chan/Trgt 0/12 -scsi4 : Compaq FibreChannel HBA Tachyon TS HPFC-5166A/1.2: WWN 500508B200193F50 - on PCI bus 0 device 0xa0fc irq 5 IObaseL 0x3400, MEMBASE 0xc6ef8600 -PCI bus width 32 bits, bus speed 33 MHz -FCP-SCSI Driver v1.3.0 -GBIC detected: Short-wave. LPSM 0h Monitor -scsi : 5 hosts. - Vendor: IBM Model: DDYF-T18350R Rev: F60K - Type: Direct-Access ANSI SCSI revision: 03 -Detected scsi disk sdb at scsi4, channel 0, id 0, lun 0 - Vendor: HITACHI Model: DK31CJ-72FC Rev: J8A8 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdc at scsi4, channel 0, id 1, lun 0 - Vendor: SEAGATE Model: ST39102FC Rev: 0006 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdd at scsi4, channel 0, id 2, lun 0 - Vendor: SEAGATE Model: ST39102FC Rev: 0006 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sde at scsi4, channel 0, id 3, lun 0 - Vendor: SEAGATE Model: ST39102FC Rev: 0006 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdf at scsi4, channel 0, id 4, lun 0 - Vendor: SEAGATE Model: ST39102FC Rev: 0006 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdg at scsi4, channel 0, id 5, lun 0 - Vendor: SEAGATE Model: ST39102FC Rev: 0006 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdh at scsi4, channel 0, id 6, lun 0 - Vendor: SEAGATE Model: ST39102FC Rev: 0006 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdi at scsi4, channel 0, id 7, lun 0 - Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.48 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdj at scsi4, channel 0, id 8, lun 0 - Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.48 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdk at scsi4, channel 0, id 8, lun 1 - Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.40 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdl at scsi4, channel 0, id 9, lun 0 - Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.40 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdm at scsi4, channel 0, id 9, lun 1 - Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.54 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdn at scsi4, channel 0, id 10, lun 0 - Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.54 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdo at scsi4, channel 0, id 11, lun 0 - Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.54 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdp at scsi4, channel 0, id 11, lun 1 - Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.54 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdq at scsi4, channel 0, id 12, lun 0 - Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.54 - Type: Direct-Access ANSI SCSI revision: 02 -Detected scsi disk sdr at scsi4, channel 0, id 12, lun 1 -resize_dma_pool: unknown device type 12 -resize_dma_pool: unknown device type 12 -SCSI device sdb: hdwr sector= 512 bytes. Sectors= 35843670 [17501 MB] [17.5 GB] - sdb: sdb1 -SCSI device sdc: hdwr sector= 512 bytes. Sectors= 144410880 [70513 MB] [70.5 GB] - sdc: sdc1 -SCSI device sdd: hdwr sector= 512 bytes. Sectors= 17783240 [8683 MB] [8.7 GB] - sdd: sdd1 -SCSI device sde: hdwr sector= 512 bytes. Sectors= 17783240 [8683 MB] [8.7 GB] - sde: sde1 -SCSI device sdf: hdwr sector= 512 bytes. Sectors= 17783240 [8683 MB] [8.7 GB] - sdf: sdf1 -SCSI device sdg: hdwr sector= 512 bytes. Sectors= 17783240 [8683 MB] [8.7 GB] - sdg: sdg1 -SCSI device sdh: hdwr sector= 512 bytes. Sectors= 17783240 [8683 MB] [8.7 GB] - sdh: sdh1 -SCSI device sdi: hdwr sector= 512 bytes. Sectors= 17783240 [8683 MB] [8.7 GB] - sdi: sdi1 -SCSI device sdj: hdwr sector= 512 bytes. Sectors= 2056160 [1003 MB] [1.0 GB] - sdj: sdj1 -SCSI device sdk: hdwr sector= 512 bytes. Sectors= 2052736 [1002 MB] [1.0 GB] - sdk: sdk1 -SCSI device sdl: hdwr sector= 512 bytes. Sectors= 17764320 [8673 MB] [8.7 GB] - sdl: sdl1 -SCSI device sdm: hdwr sector= 512 bytes. Sectors= 8380320 [4091 MB] [4.1 GB] - sdm: sdm1 -SCSI device sdn: hdwr sector= 512 bytes. Sectors= 17764320 [8673 MB] [8.7 GB] - sdn: sdn1 -SCSI device sdo: hdwr sector= 512 bytes. Sectors= 17764320 [8673 MB] [8.7 GB] - sdo: sdo1 -SCSI device sdp: hdwr sector= 512 bytes. Sectors= 17764320 [8673 MB] [8.7 GB] - sdp: sdp1 -SCSI device sdq: hdwr sector= 512 bytes. Sectors= 2056160 [1003 MB] [1.0 GB] - sdq: sdq1 -SCSI device sdr: hdwr sector= 512 bytes. Sectors= 2052736 [1002 MB] [1.0 GB] - sdr: sdr1 - -************************* - -If a GBIC of type Short-wave, Long-wave, or Copper is detected, it will -print out; otherwise, "none" is displayed. If the cabling is correct -and a loop circuit is completed, you should see "Monitor"; otherwise, -"LoopFail" (on open circuit) or some LPSM number/state with bit 3 set. - - -ERRATA: -1. Normally, Linux Scsi queries FC devices with INQUIRY strings. All LUNs -found according to INQUIRY should get READ commands at sector 0 to find -partition table, etc. Older kernels only query the first 4 devices. Some -Linux kernels only look for one LUN per target (i.e. FC device). - -2. Physically removing a device, or a malfunctioning system which hides a -device, leads to a 30-second timeout and subsequent _abort call. -In some process contexts, this will hang the kernel (crashing the system). -Single bit errors in frames and virtually all hot plugging events are -gracefully handled with internal driver timer and Abort processing. - -3. Some SCSI drives with error conditions will not handle the 7 second timeout -in this software driver, leading to infinite retries on timed out SCSI commands. -The 7 secs balances the need to quickly recover from lost frames (esp. on sequence -initiatives) and time needed by older/slower/error-state drives in responding. -This can be easily changed in "Exchanges[].timeOut". - -4. Due to the nature of FC soft addressing, there is no assurance that the -same LUNs (drives) will have the same path (e.g. /dev/sdb1) from one boot to -next. Dynamic soft address changes (i.e. 24-bit FC port_id) are -supported during run time (e.g. due to hot plug event) by the use of WWN to -SCSI Nexus (channel/target/LUN) mapping. - -5. Compaq RA4x00 firmware version 2.54 and later supports SSP (Selective -Storage Presentation), which maps LUNs to a WWN. If RA4x00 firmware prior -2.54 (e.g. older controller) is used, or the FC HBA is replaced (another WWN -is used), logical volumes on the RA4x00 will no longer be visible. - - -Send questions/comments to: -Amy Vanzant-Hodge (fibrechannel@compaq.com) - diff --git a/Documentation/scsi/hptiop.txt b/Documentation/scsi/hptiop.txt new file mode 100644 index 000000000000..d28a31247d4c --- /dev/null +++ b/Documentation/scsi/hptiop.txt @@ -0,0 +1,92 @@ +HIGHPOINT ROCKETRAID 3xxx RAID DRIVER (hptiop) + +Controller Register Map +------------------------- + +The controller IOP is accessed via PCI BAR0. + + BAR0 offset Register + 0x10 Inbound Message Register 0 + 0x14 Inbound Message Register 1 + 0x18 Outbound Message Register 0 + 0x1C Outbound Message Register 1 + 0x20 Inbound Doorbell Register + 0x24 Inbound Interrupt Status Register + 0x28 Inbound Interrupt Mask Register + 0x30 Outbound Interrupt Status Register + 0x34 Outbound Interrupt Mask Register + 0x40 Inbound Queue Port + 0x44 Outbound Queue Port + + +I/O Request Workflow +---------------------- + +All queued requests are handled via inbound/outbound queue port. +A request packet can be allocated in either IOP or host memory. + +To send a request to the controller: + + - Get a free request packet by reading the inbound queue port or + allocate a free request in host DMA coherent memory. + + The value returned from the inbound queue port is an offset + relative to the IOP BAR0. + + Requests allocated in host memory must be aligned on 32-bytes boundary. + + - Fill the packet. + + - Post the packet to IOP by writing it to inbound queue. For requests + allocated in IOP memory, write the offset to inbound queue port. For + requests allocated in host memory, write (0x80000000|(bus_addr>>5)) + to the inbound queue port. + + - The IOP process the request. When the request is completed, it + will be put into outbound queue. An outbound interrupt will be + generated. + + For requests allocated in IOP memory, the request offset is posted to + outbound queue. + + For requests allocated in host memory, (0x80000000|(bus_addr>>5)) + is posted to the outbound queue. If IOP_REQUEST_FLAG_OUTPUT_CONTEXT + flag is set in the request, the low 32-bit context value will be + posted instead. + + - The host read the outbound queue and complete the request. + + For requests allocated in IOP memory, the host driver free the request + by writing it to the outbound queue. + +Non-queued requests (reset/flush etc) can be sent via inbound message +register 0. An outbound message with the same value indicates the completion +of an inbound message. + + +User-level Interface +--------------------- + +The driver exposes following sysfs attributes: + + NAME R/W Description + driver-version R driver version string + firmware-version R firmware version string + +The driver registers char device "hptiop" to communicate with HighPoint RAID +management software. Its ioctl routine acts as a general binary interface +between the IOP firmware and HighPoint RAID management software. New management +functions can be implemented in application/firmware without modification +in driver code. + + +----------------------------------------------------------------------------- +Copyright (C) 2006 HighPoint Technologies, Inc. All Rights Reserved. + + This file is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + linux@highpoint-tech.com + http://www.highpoint-tech.com diff --git a/Documentation/scsi/ppa.txt b/Documentation/scsi/ppa.txt index 0dac88d86d87..5d9223bc1bd5 100644 --- a/Documentation/scsi/ppa.txt +++ b/Documentation/scsi/ppa.txt @@ -12,5 +12,3 @@ http://www.torque.net/parport/ Email list for Linux Parport linux-parport@torque.net -Email for problems with ZIP or ZIP Plus drivers -campbell@torque.net diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt index 0ee2c7dfc482..87d76a5c73d0 100644 --- a/Documentation/sound/alsa/ALSA-Configuration.txt +++ b/Documentation/sound/alsa/ALSA-Configuration.txt @@ -366,7 +366,9 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. Module for C-Media CMI8338 and 8738 PCI sound cards. - mpu_port - 0x300,0x310,0x320,0x330, 0 = disable (default) + mpu_port - 0x300,0x310,0x320,0x330 = legacy port, + 1 = integrated PCI port, + 0 = disable (default) fm_port - 0x388 (default), 0 = disable (default) soft_ac3 - Software-conversion of raw SPDIF packets (model 033 only) (default = 1) @@ -468,7 +470,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. Module for multifunction CS5535 companion PCI device - This module supports multiple cards. + The power-management is supported. Module snd-dt019x ----------------- @@ -707,8 +709,10 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. Module snd-hda-intel -------------------- - Module for Intel HD Audio (ICH6, ICH6M, ICH7), ATI SB450, - VIA VT8251/VT8237A + Module for Intel HD Audio (ICH6, ICH6M, ESB2, ICH7, ICH8), + ATI SB450, SB600, RS600, + VIA VT8251/VT8237A, + SIS966, ULI M5461 model - force the model name position_fix - Fix DMA pointer (0 = auto, 1 = none, 2 = POSBUF, 3 = FIFO size) @@ -778,6 +782,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. AD1981 basic 3-jack (default) hp HP nx6320 + thinkpad Lenovo Thinkpad T60/X60/Z60 AD1986A 6stack 6-jack, separate surrounds (default) @@ -1633,9 +1638,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. About capture IBL, see the description of snd-vx222 module. - Note: the driver is build only when CONFIG_ISA is set. - - Note2: snd-vxp440 driver is merged to snd-vxpocket driver since + Note: snd-vxp440 driver is merged to snd-vxpocket driver since ALSA 1.0.10. The power-management is supported. @@ -1662,8 +1665,6 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. Module for Sound Core PDAudioCF sound card. - Note: the driver is build only when CONFIG_ISA is set. - The power-management is supported. diff --git a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl index 1faf76383bab..635cbb94357c 100644 --- a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl +++ b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl @@ -4215,7 +4215,7 @@ struct _snd_pcm_runtime { <programlisting> <![CDATA[ struct snd_rawmidi *rmidi; - snd_mpu401_uart_new(card, 0, MPU401_HW_MPU401, port, integrated, + snd_mpu401_uart_new(card, 0, MPU401_HW_MPU401, port, info_flags, irq, irq_flags, &rmidi); ]]> </programlisting> @@ -4242,15 +4242,36 @@ struct _snd_pcm_runtime { </para> <para> + The 5th argument is bitflags for additional information. When the i/o port address above is a part of the PCI i/o region, the MPU401 i/o port might have been already allocated - (reserved) by the driver itself. In such a case, pass non-zero - to the 5th argument - (<parameter>integrated</parameter>). Otherwise, pass 0 to it, + (reserved) by the driver itself. In such a case, pass a bit flag + <constant>MPU401_INFO_INTEGRATED</constant>, and the mpu401-uart layer will allocate the i/o ports by itself. </para> + <para> + When the controller supports only the input or output MIDI stream, + pass <constant>MPU401_INFO_INPUT</constant> or + <constant>MPU401_INFO_OUTPUT</constant> bitflag, respectively. + Then the rawmidi instance is created as a single stream. + </para> + + <para> + <constant>MPU401_INFO_MMIO</constant> bitflag is used to change + the access method to MMIO (via readb and writeb) instead of + iob and outb. In this case, you have to pass the iomapped address + to <function>snd_mpu401_uart_new()</function>. + </para> + + <para> + When <constant>MPU401_INFO_TX_IRQ</constant> is set, the output + stream isn't checked in the default interrupt handler. The driver + needs to call <function>snd_mpu401_uart_interrupt_tx()</function> + by itself to start processing the output stream in irq handler. + </para> + <para> Usually, the port address corresponds to the command port and port + 1 corresponds to the data port. If not, you may change @@ -5333,7 +5354,7 @@ struct _snd_pcm_runtime { <informalexample> <programlisting> <![CDATA[ - snd_info_set_text_ops(entry, chip, read_size, my_proc_read); + snd_info_set_text_ops(entry, chip, my_proc_read); ]]> </programlisting> </informalexample> @@ -5394,7 +5415,6 @@ struct _snd_pcm_runtime { <informalexample> <programlisting> <![CDATA[ - entry->c.text.write_size = 256; entry->c.text.write = my_proc_write; ]]> </programlisting> @@ -5402,22 +5422,6 @@ struct _snd_pcm_runtime { </para> <para> - The buffer size for read is set to 1024 implicitly by - <function>snd_info_set_text_ops()</function>. It should suffice - in most cases (the size will be aligned to - <constant>PAGE_SIZE</constant> anyway), but if you need to handle - very large text files, you can set it explicitly, too. - - <informalexample> - <programlisting> -<![CDATA[ - entry->c.text.read_size = 65536; -]]> - </programlisting> - </informalexample> - </para> - - <para> For the write callback, you can use <function>snd_info_get_line()</function> to get a text line, and <function>snd_info_get_str()</function> to retrieve a string from @@ -5562,7 +5566,7 @@ struct _snd_pcm_runtime { power status.</para></listitem> <listitem><para>Call <function>snd_pcm_suspend_all()</function> to suspend the running PCM streams.</para></listitem> <listitem><para>If AC97 codecs are used, call - <function>snd_ac97_resume()</function> for each codec.</para></listitem> + <function>snd_ac97_suspend()</function> for each codec.</para></listitem> <listitem><para>Save the register values if necessary.</para></listitem> <listitem><para>Stop the hardware if necessary.</para></listitem> <listitem><para>Disable the PCI device by calling diff --git a/Documentation/sparc/sbus_drivers.txt b/Documentation/sparc/sbus_drivers.txt index 876195dc2aef..4b9351624f13 100644 --- a/Documentation/sparc/sbus_drivers.txt +++ b/Documentation/sparc/sbus_drivers.txt @@ -25,42 +25,84 @@ the bits necessary to run your device. The most commonly used members of this structure, and their typical usage, will be detailed below. - Here is how probing is performed by an SBUS driver -under Linux: + Here is a piece of skeleton code for perofming a device +probe in an SBUS driverunder Linux: - static void init_one_mydevice(struct sbus_dev *sdev) + static int __devinit mydevice_probe_one(struct sbus_dev *sdev) { + struct mysdevice *mp = kzalloc(sizeof(*mp), GFP_KERNEL); + + if (!mp) + return -ENODEV; + + ... + dev_set_drvdata(&sdev->ofdev.dev, mp); + return 0; ... } - static int mydevice_match(struct sbus_dev *sdev) + static int __devinit mydevice_probe(struct of_device *dev, + const struct of_device_id *match) { - if (some_criteria(sdev)) - return 1; - return 0; + struct sbus_dev *sdev = to_sbus_device(&dev->dev); + + return mydevice_probe_one(sdev); } - static void mydevice_probe(void) + static int __devexit mydevice_remove(struct of_device *dev) { - struct sbus_bus *sbus; - struct sbus_dev *sdev; + struct sbus_dev *sdev = to_sbus_device(&dev->dev); + struct mydevice *mp = dev_get_drvdata(&dev->dev); - for_each_sbus(sbus) { - for_each_sbusdev(sdev, sbus) { - if (mydevice_match(sdev)) - init_one_mydevice(sdev); - } - } + return mydevice_remove_one(sdev, mp); } - All this does is walk through all SBUS devices in the -system, checks each to see if it is of the type which -your driver is written for, and if so it calls the init -routine to attach the device and prepare to drive it. + static struct of_device_id mydevice_match[] = { + { + .name = "mydevice", + }, + {}, + }; + + MODULE_DEVICE_TABLE(of, mydevice_match); - "init_one_mydevice" might do things like allocate software -state structures, map in I/O registers, place the hardware -into an initialized state, etc. + static struct of_platform_driver mydevice_driver = { + .name = "mydevice", + .match_table = mydevice_match, + .probe = mydevice_probe, + .remove = __devexit_p(mydevice_remove), + }; + + static int __init mydevice_init(void) + { + return of_register_driver(&mydevice_driver, &sbus_bus_type); + } + + static void __exit mydevice_exit(void) + { + of_unregister_driver(&mydevice_driver); + } + + module_init(mydevice_init); + module_exit(mydevice_exit); + + The mydevice_match table is a series of entries which +describes what SBUS devices your driver is meant for. In the +simplest case you specify a string for the 'name' field. Every +SBUS device with a 'name' property matching your string will +be passed one-by-one to your .probe method. + + You should store away your device private state structure +pointer in the drvdata area so that you can retrieve it later on +in your .remove method. + + Any memory allocated, registers mapped, IRQs registered, +etc. must be undone by your .remove method so that all resources +of your device are relased by the time it returns. + + You should _NOT_ use the for_each_sbus(), for_each_sbusdev(), +and for_all_sbusdev() interfaces. They are deprecated, will be +removed, and no new driver should reference them ever. Mapping and Accessing I/O Registers @@ -263,10 +305,3 @@ discussed above and plus it handles both PCI and SBUS boards. Lance driver abuses consistent mappings for data transfer. It is a nifty trick which we do not particularly recommend... Just check it out and know that it's legal. - - Bad examples, do NOT use - - drivers/video/cgsix.c - This one uses result of sbus_ioremap as if it is an address. -This does NOT work on sparc64 and therefore is broken. We will -convert it at a later date. diff --git a/Documentation/sparse.txt b/Documentation/sparse.txt index 3f1c5464b1c9..5a311c38dd1a 100644 --- a/Documentation/sparse.txt +++ b/Documentation/sparse.txt @@ -1,5 +1,6 @@ Copyright 2004 Linus Torvalds Copyright 2004 Pavel Machek <pavel@suse.cz> +Copyright 2006 Bob Copeland <me@bobcopeland.com> Using sparse for typechecking ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -41,15 +42,8 @@ sure that bitwise types don't get mixed up (little-endian vs big-endian vs cpu-endian vs whatever), and there the constant "0" really _is_ special. -Use - - make C=[12] CF=-Wbitwise - -or you don't get any checking at all. - - -Where to get sparse -~~~~~~~~~~~~~~~~~~~ +Getting sparse +~~~~~~~~~~~~~~ With git, you can just get it from @@ -57,7 +51,7 @@ With git, you can just get it from and DaveJ has tar-balls at - http://www.codemonkey.org.uk/projects/git-snapshots/sparse/ + http://www.codemonkey.org.uk/projects/git-snapshots/sparse/ Once you have it, just do @@ -65,8 +59,20 @@ Once you have it, just do make make install -as your regular user, and it will install sparse in your ~/bin directory. -After that, doing a kernel make with "make C=1" will run sparse on all the -C files that get recompiled, or with "make C=2" will run sparse on the -files whether they need to be recompiled or not (ie the latter is fast way -to check the whole tree if you have already built it). +as a regular user, and it will install sparse in your ~/bin directory. + +Using sparse +~~~~~~~~~~~~ + +Do a kernel make with "make C=1" to run sparse on all the C files that get +recompiled, or use "make C=2" to run sparse on the files whether they need to +be recompiled or not. The latter is a fast way to check the whole tree if you +have already built it. + +The optional make variable CF can be used to pass arguments to sparse. The +build system passes -Wbitwise to sparse automatically. To perform endianness +checks, you may define __CHECK_ENDIAN__: + + make C=2 CF="-D__CHECK_ENDIAN__" + +These checks are disabled by default as they generate a host of warnings. diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index a46c10fcddfc..2dc246af4885 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -29,6 +29,7 @@ Currently, these files are in /proc/sys/vm: - drop-caches - zone_reclaim_mode - zone_reclaim_interval +- panic_on_oom ============================================================== @@ -178,3 +179,15 @@ Time is set in seconds and set by default to 30 seconds. Reduce the interval if undesired off node allocations occur. However, too frequent scans will have a negative impact onoff node allocation performance. +============================================================= + +panic_on_oom + +This enables or disables panic on out-of-memory feature. If this is set to 1, +the kernel panics when out-of-memory happens. If this is set to 0, the kernel +will kill some rogue process, called oom_killer. Usually, oom_killer can kill +rogue processes and system will survive. If you want to panic the system +rather than killing rogue processes, set this to 1. + +The default value is 0. + diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt index ad0bedf678b3..e0188a23fd5e 100644 --- a/Documentation/sysrq.txt +++ b/Documentation/sysrq.txt @@ -115,8 +115,9 @@ trojan program is running at console and which could grab your password when you would try to login. It will kill all programs on given console and thus letting you make sure that the login prompt you see is actually the one from init, not some trojan program. -IMPORTANT:In its true form it is not a true SAK like the one in :IMPORTANT -IMPORTANT:c2 compliant systems, and it should be mistook as such. :IMPORTANT +IMPORTANT: In its true form it is not a true SAK like the one in a :IMPORTANT +IMPORTANT: c2 compliant system, and it should not be mistaken as :IMPORTANT +IMPORTANT: such. :IMPORTANT It seems other find it useful as (System Attention Key) which is useful when you want to exit a program that will not let you switch consoles. (For example, X or a svgalib program.) diff --git a/Documentation/tty.txt b/Documentation/tty.txt index 8ff7bc2a0811..dab56604745d 100644 --- a/Documentation/tty.txt +++ b/Documentation/tty.txt @@ -80,13 +80,6 @@ receive_buf() - Hand buffers of bytes from the driver to the ldisc for processing. Semantics currently rather mysterious 8( -receive_room() - Can be called by the driver layer at any time when - the ldisc is opened. The ldisc must be able to - handle the reported amount of data at that instant. - Synchronization between active receive_buf and - receive_room calls is down to the driver not the - ldisc. Must not sleep. - write_wakeup() - May be called at any point between open and close. The TTY_DO_WRITE_WAKEUP flag indicates if a call is needed but always races versus calls. Thus the diff --git a/Documentation/usb/usbmon.txt b/Documentation/usb/usbmon.txt index 63cb7edd177e..e65ec828d7aa 100644 --- a/Documentation/usb/usbmon.txt +++ b/Documentation/usb/usbmon.txt @@ -29,14 +29,13 @@ if usbmon is built into the kernel. # mount -t debugfs none_debugs /sys/kernel/debug # modprobe usbmon +# Verify that bus sockets are present. -[root@lembas zaitcev]# ls /sys/kernel/debug/usbmon +# ls /sys/kernel/debug/usbmon 1s 1t 2s 2t 3s 3t 4s 4t -[root@lembas zaitcev]# - -# ls /sys/kernel +# 2. Find which bus connects to the desired device @@ -76,7 +75,7 @@ that the file size is not excessive for your favourite editor. * Raw text data format -The '0t' type data consists of a stream of events, such as URB submission, +The '1t' type data consists of a stream of events, such as URB submission, URB callback, submission error. Every event is a text line, which consists of whitespace separated words. The number of position of words may depend on the event type, but there is a set of words, common for all types. @@ -97,20 +96,25 @@ Here is the list of words, from left to right: Zi Zo Isochronous input and output Ii Io Interrupt input and output Bi Bo Bulk input and output - Device address and Endpoint number are decimal numbers with leading zeroes - or 3 and 2 positions, correspondingly. -- URB Status. This field makes no sense for submissions, but is present - to help scripts with parsing. In error case, it contains the error code. - In case of a setup packet, it contains a Setup Tag. If scripts read a number - in this field, they proceed to read Data Length. Otherwise, they read - the setup packet before reading the Data Length. + Device address and Endpoint number are 3-digit and 2-digit (respectively) + decimal numbers, with leading zeroes. +- URB Status. In most cases, this field contains a number, sometimes negative, + which represents a "status" field of the URB. This field makes no sense for + submissions, but is present anyway to help scripts with parsing. When an + error occurs, the field contains the error code. In case of a submission of + a Control packet, this field contains a Setup Tag instead of an error code. + It is easy to tell whether the Setup Tag is present because it is never a + number. Thus if scripts find a number in this field, they proceed to read + Data Length. If they find something else, like a letter, they read the setup + packet before reading the Data Length. - Setup packet, if present, consists of 5 words: one of each for bmRequestType, bRequest, wValue, wIndex, wLength, as specified by the USB Specification 2.0. These words are safe to decode if Setup Tag was 's'. Otherwise, the setup packet was present, but not captured, and the fields contain filler. -- Data Length. This is the actual length in the URB. +- Data Length. For submissions, this is the requested length. For callbacks, + this is the actual length. - Data tag. The usbmon may not always capture data, even if length is nonzero. - Only if tag is '=', the data words are present. + The data words are present only if this tag is '='. - Data words follow, in big endian hexadecimal format. Notice that they are not machine words, but really just a byte stream split into words to make it easier to read. Thus, the last word may contain from one to four bytes. diff --git a/Documentation/video4linux/CARDLIST.bttv b/Documentation/video4linux/CARDLIST.bttv index b72706c58a44..4efa4645885f 100644 --- a/Documentation/video4linux/CARDLIST.bttv +++ b/Documentation/video4linux/CARDLIST.bttv @@ -87,7 +87,7 @@ 86 -> Osprey 101/151 w/ svid 87 -> Osprey 200/201/250/251 88 -> Osprey 200/250 [0070:ff01] - 89 -> Osprey 210/220 + 89 -> Osprey 210/220/230 90 -> Osprey 500 [0070:ff02] 91 -> Osprey 540 [0070:ff04] 92 -> Osprey 2000 [0070:ff03] @@ -111,7 +111,7 @@ 110 -> IVC-100 [ff00:a132] 111 -> IVC-120G [ff00:a182,ff01:a182,ff02:a182,ff03:a182,ff04:a182,ff05:a182,ff06:a182,ff07:a182,ff08:a182,ff09:a182,ff0a:a182,ff0b:a182,ff0c:a182,ff0d:a182,ff0e:a182,ff0f:a182] 112 -> pcHDTV HD-2000 TV [7063:2000] -113 -> Twinhan DST + clones [11bd:0026,1822:0001,270f:fc00] +113 -> Twinhan DST + clones [11bd:0026,1822:0001,270f:fc00,1822:0026] 114 -> Winfast VC100 [107d:6607] 115 -> Teppro TEV-560/InterVision IV-560 116 -> SIMUS GVC1100 [aa6a:82b2] diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88 index 3b39a91b24bd..6cb63ddf6163 100644 --- a/Documentation/video4linux/CARDLIST.cx88 +++ b/Documentation/video4linux/CARDLIST.cx88 @@ -15,7 +15,7 @@ 14 -> KWorld/VStream XPert DVB-T [17de:08a6] 15 -> DViCO FusionHDTV DVB-T1 [18ac:db00] 16 -> KWorld LTV883RF - 17 -> DViCO FusionHDTV 3 Gold-Q [18ac:d810] + 17 -> DViCO FusionHDTV 3 Gold-Q [18ac:d810,18ac:d800] 18 -> Hauppauge Nova-T DVB-T [0070:9002,0070:9001] 19 -> Conexant DVB-T reference design [14f1:0187] 20 -> Provideo PV259 [1540:2580] @@ -40,8 +40,13 @@ 39 -> KWorld DVB-S 100 [17de:08b2] 40 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid [0070:9400,0070:9402] 41 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid (Low Profile) [0070:9800,0070:9802] - 42 -> digitalnow DNTV Live! DVB-T Pro [1822:0025] + 42 -> digitalnow DNTV Live! DVB-T Pro [1822:0025,1822:0019] 43 -> KWorld/VStream XPert DVB-T with cx22702 [17de:08a1] 44 -> DViCO FusionHDTV DVB-T Dual Digital [18ac:db50,18ac:db54] 45 -> KWorld HardwareMpegTV XPert [17de:0840] 46 -> DViCO FusionHDTV DVB-T Hybrid [18ac:db40,18ac:db44] + 47 -> pcHDTV HD5500 HDTV [7063:5500] + 48 -> Kworld MCE 200 Deluxe [17de:0841] + 49 -> PixelView PlayTV P7000 [1554:4813] + 50 -> NPG Tech Real TV FM Top 10 [14f1:0842] + 51 -> WinFast DTV2000 H [107d:665e] diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134 index bca50903233f..9068b669f5ee 100644 --- a/Documentation/video4linux/CARDLIST.saa7134 +++ b/Documentation/video4linux/CARDLIST.saa7134 @@ -93,3 +93,4 @@ 92 -> AVerMedia A169 B1 [1461:6360] 93 -> Medion 7134 Bridge #2 [16be:0005] 94 -> LifeView FlyDVB-T Hybrid Cardbus [5168:3306,5168:3502] + 95 -> LifeView FlyVIDEO3000 (NTSC) [5169:0138] diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner index 1bcdac67dd8c..44134f04b82a 100644 --- a/Documentation/video4linux/CARDLIST.tuner +++ b/Documentation/video4linux/CARDLIST.tuner @@ -62,7 +62,7 @@ tuner=60 - Thomson DTT 761X (ATSC/NTSC) tuner=61 - Tena TNF9533-D/IF/TNF9533-B/DF tuner=62 - Philips TEA5767HN FM Radio tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner -tuner=64 - LG TDVS-H062F/TUA6034 +tuner=64 - LG TDVS-H06xF tuner=65 - Ymec TVF66T5-B/DFF tuner=66 - LG TALN series tuner=67 - Philips TD1316 Hybrid Tuner @@ -71,3 +71,4 @@ tuner=69 - Tena TNF 5335 and similar models tuner=70 - Samsung TCPN 2121P30A tuner=71 - Xceive xc3028 tuner=72 - Thomson FE6600 +tuner=73 - Samsung TCPG 6121P30A diff --git a/Documentation/video4linux/CQcam.txt b/Documentation/video4linux/CQcam.txt index 464e4cec94cb..ade8651e2443 100644 --- a/Documentation/video4linux/CQcam.txt +++ b/Documentation/video4linux/CQcam.txt @@ -185,207 +185,10 @@ this work is documented at the video4linux2 site listed below. 9.0 --- A sample program using v4lgrabber, -This program is a simple image grabber that will copy a frame from the +v4lgrab is a simple image grabber that will copy a frame from the first video device, /dev/video0 to standard output in portable pixmap -format (.ppm) Using this like: 'v4lgrab | convert - c-qcam.jpg' -produced this picture of me at - http://mug.sys.virginia.edu/~drf5n/extras/c-qcam.jpg - --------------------- 8< ---------------- 8< ----------------------------- - -/* Simple Video4Linux image grabber. */ -/* - * Video4Linux Driver Test/Example Framegrabbing Program - * - * Compile with: - * gcc -s -Wall -Wstrict-prototypes v4lgrab.c -o v4lgrab - * Use as: - * v4lgrab >image.ppm - * - * Copyright (C) 1998-05-03, Phil Blundell <philb@gnu.org> - * Copied from http://www.tazenda.demon.co.uk/phil/vgrabber.c - * with minor modifications (Dave Forrest, drf5n@virginia.edu). - * - */ - -#include <unistd.h> -#include <sys/types.h> -#include <sys/stat.h> -#include <fcntl.h> -#include <stdio.h> -#include <sys/ioctl.h> -#include <stdlib.h> - -#include <linux/types.h> -#include <linux/videodev.h> - -#define FILE "/dev/video0" - -/* Stole this from tvset.c */ - -#define READ_VIDEO_PIXEL(buf, format, depth, r, g, b) \ -{ \ - switch (format) \ - { \ - case VIDEO_PALETTE_GREY: \ - switch (depth) \ - { \ - case 4: \ - case 6: \ - case 8: \ - (r) = (g) = (b) = (*buf++ << 8);\ - break; \ - \ - case 16: \ - (r) = (g) = (b) = \ - *((unsigned short *) buf); \ - buf += 2; \ - break; \ - } \ - break; \ - \ - \ - case VIDEO_PALETTE_RGB565: \ - { \ - unsigned short tmp = *(unsigned short *)buf; \ - (r) = tmp&0xF800; \ - (g) = (tmp<<5)&0xFC00; \ - (b) = (tmp<<11)&0xF800; \ - buf += 2; \ - } \ - break; \ - \ - case VIDEO_PALETTE_RGB555: \ - (r) = (buf[0]&0xF8)<<8; \ - (g) = ((buf[0] << 5 | buf[1] >> 3)&0xF8)<<8; \ - (b) = ((buf[1] << 2 ) & 0xF8)<<8; \ - buf += 2; \ - break; \ - \ - case VIDEO_PALETTE_RGB24: \ - (r) = buf[0] << 8; (g) = buf[1] << 8; \ - (b) = buf[2] << 8; \ - buf += 3; \ - break; \ - \ - default: \ - fprintf(stderr, \ - "Format %d not yet supported\n", \ - format); \ - } \ -} - -int get_brightness_adj(unsigned char *image, long size, int *brightness) { - long i, tot = 0; - for (i=0;i<size*3;i++) - tot += image[i]; - *brightness = (128 - tot/(size*3))/3; - return !((tot/(size*3)) >= 126 && (tot/(size*3)) <= 130); -} - -int main(int argc, char ** argv) -{ - int fd = open(FILE, O_RDONLY), f; - struct video_capability cap; - struct video_window win; - struct video_picture vpic; - - unsigned char *buffer, *src; - int bpp = 24, r, g, b; - unsigned int i, src_depth; - - if (fd < 0) { - perror(FILE); - exit(1); - } - - if (ioctl(fd, VIDIOCGCAP, &cap) < 0) { - perror("VIDIOGCAP"); - fprintf(stderr, "(" FILE " not a video4linux device?)\n"); - close(fd); - exit(1); - } - - if (ioctl(fd, VIDIOCGWIN, &win) < 0) { - perror("VIDIOCGWIN"); - close(fd); - exit(1); - } - - if (ioctl(fd, VIDIOCGPICT, &vpic) < 0) { - perror("VIDIOCGPICT"); - close(fd); - exit(1); - } - - if (cap.type & VID_TYPE_MONOCHROME) { - vpic.depth=8; - vpic.palette=VIDEO_PALETTE_GREY; /* 8bit grey */ - if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { - vpic.depth=6; - if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { - vpic.depth=4; - if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { - fprintf(stderr, "Unable to find a supported capture format.\n"); - close(fd); - exit(1); - } - } - } - } else { - vpic.depth=24; - vpic.palette=VIDEO_PALETTE_RGB24; - - if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { - vpic.palette=VIDEO_PALETTE_RGB565; - vpic.depth=16; - - if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) { - vpic.palette=VIDEO_PALETTE_RGB555; - vpic.depth=15; - - if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) { - fprintf(stderr, "Unable to find a supported capture format.\n"); - return -1; - } - } - } - } - - buffer = malloc(win.width * win.height * bpp); - if (!buffer) { - fprintf(stderr, "Out of memory.\n"); - exit(1); - } - - do { - int newbright; - read(fd, buffer, win.width * win.height * bpp); - f = get_brightness_adj(buffer, win.width * win.height, &newbright); - if (f) { - vpic.brightness += (newbright << 8); - if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) { - perror("VIDIOSPICT"); - break; - } - } - } while (f); - - fprintf(stdout, "P6\n%d %d 255\n", win.width, win.height); - - src = buffer; - - for (i = 0; i < win.width * win.height; i++) { - READ_VIDEO_PIXEL(src, vpic.palette, src_depth, r, g, b); - fputc(r>>8, stdout); - fputc(g>>8, stdout); - fputc(b>>8, stdout); - } - - close(fd); - return 0; -} --------------------- 8< ---------------- 8< ----------------------------- +format (.ppm) To produce .jpg output, you can use it like this: +'v4lgrab | convert - c-qcam.jpg' 10.0 --- Other Information diff --git a/Documentation/video4linux/README.pvrusb2 b/Documentation/video4linux/README.pvrusb2 new file mode 100644 index 000000000000..c73a32c34528 --- /dev/null +++ b/Documentation/video4linux/README.pvrusb2 @@ -0,0 +1,212 @@ + +$Id$ +Mike Isely <isely@pobox.com> + + pvrusb2 driver + +Background: + + This driver is intended for the "Hauppauge WinTV PVR USB 2.0", which + is a USB 2.0 hosted TV Tuner. This driver is a work in progress. + Its history started with the reverse-engineering effort by Björn + Danielsson <pvrusb2@dax.nu> whose web page can be found here: + + http://pvrusb2.dax.nu/ + + From there Aurelien Alleaume <slts@free.fr> began an effort to + create a video4linux compatible driver. I began with Aurelien's + last known snapshot and evolved the driver to the state it is in + here. + + More information on this driver can be found at: + + http://www.isely.net/pvrusb2.html + + + This driver has a strong separation of layers. They are very + roughly: + + 1a. Low level wire-protocol implementation with the device. + + 1b. I2C adaptor implementation and corresponding I2C client drivers + implemented elsewhere in V4L. + + 1c. High level hardware driver implementation which coordinates all + activities that ensure correct operation of the device. + + 2. A "context" layer which manages instancing of driver, setup, + tear-down, arbitration, and interaction with high level + interfaces appropriately as devices are hotplugged in the + system. + + 3. High level interfaces which glue the driver to various published + Linux APIs (V4L, sysfs, maybe DVB in the future). + + The most important shearing layer is between the top 2 layers. A + lot of work went into the driver to ensure that any kind of + conceivable API can be laid on top of the core driver. (Yes, the + driver internally leverages V4L to do its work but that really has + nothing to do with the API published by the driver to the outside + world.) The architecture allows for different APIs to + simultaneously access the driver. I have a strong sense of fairness + about APIs and also feel that it is a good design principle to keep + implementation and interface isolated from each other. Thus while + right now the V4L high level interface is the most complete, the + sysfs high level interface will work equally well for similar + functions, and there's no reason I see right now why it shouldn't be + possible to produce a DVB high level interface that can sit right + alongside V4L. + + NOTE: Complete documentation on the pvrusb2 driver is contained in + the html files within the doc directory; these are exactly the same + as what is on the web site at the time. Browse those files + (especially the FAQ) before asking questions. + + +Building + + To build these modules essentially amounts to just running "Make", + but you need the kernel source tree nearby and you will likely also + want to set a few controlling environment variables first in order + to link things up with that source tree. Please see the Makefile + here for comments that explain how to do that. + + +Source file list / functional overview: + + (Note: The term "module" used below generally refers to loosely + defined functional units within the pvrusb2 driver and bears no + relation to the Linux kernel's concept of a loadable module.) + + pvrusb2-audio.[ch] - This is glue logic that resides between this + driver and the msp3400.ko I2C client driver (which is found + elsewhere in V4L). + + pvrusb2-context.[ch] - This module implements the context for an + instance of the driver. Everything else eventually ties back to + or is otherwise instanced within the data structures implemented + here. Hotplugging is ultimately coordinated here. All high level + interfaces tie into the driver through this module. This module + helps arbitrate each interface's access to the actual driver core, + and is designed to allow concurrent access through multiple + instances of multiple interfaces (thus you can for example change + the tuner's frequency through sysfs while simultaneously streaming + video through V4L out to an instance of mplayer). + + pvrusb2-debug.h - This header defines a printk() wrapper and a mask + of debugging bit definitions for the various kinds of debug + messages that can be enabled within the driver. + + pvrusb2-debugifc.[ch] - This module implements a crude command line + oriented debug interface into the driver. Aside from being part + of the process for implementing manual firmware extraction (see + the pvrusb2 web site mentioned earlier), probably I'm the only one + who has ever used this. It is mainly a debugging aid. + + pvrusb2-eeprom.[ch] - This is glue logic that resides between this + driver the tveeprom.ko module, which is itself implemented + elsewhere in V4L. + + pvrusb2-encoder.[ch] - This module implements all protocol needed to + interact with the Conexant mpeg2 encoder chip within the pvrusb2 + device. It is a crude echo of corresponding logic in ivtv, + however the design goals (strict isolation) and physical layer + (proxy through USB instead of PCI) are enough different that this + implementation had to be completely different. + + pvrusb2-hdw-internal.h - This header defines the core data structure + in the driver used to track ALL internal state related to control + of the hardware. Nobody outside of the core hardware-handling + modules should have any business using this header. All external + access to the driver should be through one of the high level + interfaces (e.g. V4L, sysfs, etc), and in fact even those high + level interfaces are restricted to the API defined in + pvrusb2-hdw.h and NOT this header. + + pvrusb2-hdw.h - This header defines the full internal API for + controlling the hardware. High level interfaces (e.g. V4L, sysfs) + will work through here. + + pvrusb2-hdw.c - This module implements all the various bits of logic + that handle overall control of a specific pvrusb2 device. + (Policy, instantiation, and arbitration of pvrusb2 devices fall + within the jurisdiction of pvrusb-context not here). + + pvrusb2-i2c-chips-*.c - These modules implement the glue logic to + tie together and configure various I2C modules as they attach to + the I2C bus. There are two versions of this file. The "v4l2" + version is intended to be used in-tree alongside V4L, where we + implement just the logic that makes sense for a pure V4L + environment. The "all" version is intended for use outside of + V4L, where we might encounter other possibly "challenging" modules + from ivtv or older kernel snapshots (or even the support modules + in the standalone snapshot). + + pvrusb2-i2c-cmd-v4l1.[ch] - This module implements generic V4L1 + compatible commands to the I2C modules. It is here where state + changes inside the pvrusb2 driver are translated into V4L1 + commands that are in turn send to the various I2C modules. + + pvrusb2-i2c-cmd-v4l2.[ch] - This module implements generic V4L2 + compatible commands to the I2C modules. It is here where state + changes inside the pvrusb2 driver are translated into V4L2 + commands that are in turn send to the various I2C modules. + + pvrusb2-i2c-core.[ch] - This module provides an implementation of a + kernel-friendly I2C adaptor driver, through which other external + I2C client drivers (e.g. msp3400, tuner, lirc) may connect and + operate corresponding chips within the the pvrusb2 device. It is + through here that other V4L modules can reach into this driver to + operate specific pieces (and those modules are in turn driven by + glue logic which is coordinated by pvrusb2-hdw, doled out by + pvrusb2-context, and then ultimately made available to users + through one of the high level interfaces). + + pvrusb2-io.[ch] - This module implements a very low level ring of + transfer buffers, required in order to stream data from the + device. This module is *very* low level. It only operates the + buffers and makes no attempt to define any policy or mechanism for + how such buffers might be used. + + pvrusb2-ioread.[ch] - This module layers on top of pvrusb2-io.[ch] + to provide a streaming API usable by a read() system call style of + I/O. Right now this is the only layer on top of pvrusb2-io.[ch], + however the underlying architecture here was intended to allow for + other styles of I/O to be implemented with additonal modules, like + mmap()'ed buffers or something even more exotic. + + pvrusb2-main.c - This is the top level of the driver. Module level + and USB core entry points are here. This is our "main". + + pvrusb2-sysfs.[ch] - This is the high level interface which ties the + pvrusb2 driver into sysfs. Through this interface you can do + everything with the driver except actually stream data. + + pvrusb2-tuner.[ch] - This is glue logic that resides between this + driver and the tuner.ko I2C client driver (which is found + elsewhere in V4L). + + pvrusb2-util.h - This header defines some common macros used + throughout the driver. These macros are not really specific to + the driver, but they had to go somewhere. + + pvrusb2-v4l2.[ch] - This is the high level interface which ties the + pvrusb2 driver into video4linux. It is through here that V4L + applications can open and operate the driver in the usual V4L + ways. Note that **ALL** V4L functionality is published only + through here and nowhere else. + + pvrusb2-video-*.[ch] - This is glue logic that resides between this + driver and the saa711x.ko I2C client driver (which is found + elsewhere in V4L). Note that saa711x.ko used to be known as + saa7115.ko in ivtv. There are two versions of this; one is + selected depending on the particular saa711[5x].ko that is found. + + pvrusb2.h - This header contains compile time tunable parameters + (and at the moment the driver has very little that needs to be + tuned). + + + -Mike Isely + isely@pobox.com + diff --git a/Documentation/video4linux/Zoran b/Documentation/video4linux/Zoran index be9f21b84555..040a2c841ae9 100644 --- a/Documentation/video4linux/Zoran +++ b/Documentation/video4linux/Zoran @@ -33,6 +33,21 @@ Inputs/outputs: Composite and S-video Norms: PAL, SECAM (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) Card number: 7 +AverMedia 6 Eyes AVS6EYES: +* Zoran zr36067 PCI controller +* Zoran zr36060 MJPEG codec +* Samsung ks0127 TV decoder +* Conexant bt866 TV encoder +Drivers to use: videodev, i2c-core, i2c-algo-bit, + videocodec, ks0127, bt866, zr36060, zr36067 +Inputs/outputs: Six physical inputs. 1-6 are composite, + 1-2, 3-4, 5-6 doubles as S-video, + 1-3 triples as component. + One composite output. +Norms: PAL, SECAM (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) +Card number: 8 +Not autodetected, card=8 is necessary. + Linux Media Labs LML33: * Zoran zr36067 PCI controller * Zoran zr36060 MJPEG codec @@ -192,6 +207,10 @@ Micronas vpx3220a TV decoder was introduced in 1996, is used in the DC30 and DC30+ and can handle: PAL B/G/H/I, PAL N, PAL M, NTSC M, NTSC 44, PAL 60, SECAM,NTSC Comb +Samsung ks0127 TV decoder +is used in the AVS6EYES card and +can handle: NTSC-M/N/44, PAL-M/N/B/G/H/I/D/K/L and SECAM + =========================== 1.2 What the TV encoder can do an what not @@ -221,6 +240,10 @@ ITT mse3000 TV encoder was introduced in 1991, is used in the DC10 old can generate: PAL , NTSC , SECAM +Conexant bt866 TV encoder +is used in AVS6EYES, and +can generate: NTSC/PAL, PALM, PALN + The adv717x, should be able to produce PAL N. But you find nothing PAL N specific in the registers. Seem that you have to reuse a other standard to generate PAL N, maybe it would work if you use the PAL M settings. diff --git a/Documentation/video4linux/bttv/CONTRIBUTORS b/Documentation/video4linux/bttv/CONTRIBUTORS index aef49db8847d..8aad6dd93d6b 100644 --- a/Documentation/video4linux/bttv/CONTRIBUTORS +++ b/Documentation/video4linux/bttv/CONTRIBUTORS @@ -1,4 +1,4 @@ -Contributors to bttv: +Contributors to bttv: Michael Chu <mmchu@pobox.com> AverMedia fix and more flexible card recognition @@ -8,8 +8,8 @@ Alan Cox <alan@redhat.com> Chris Kleitsch Hardware I2C - -Gerd Knorr <kraxel@cs.tu-berlin.de> + +Gerd Knorr <kraxel@cs.tu-berlin.de> Radio card (ITT sound processor) bigfoot <bigfoot@net-way.net> @@ -18,7 +18,7 @@ Ragnar Hojland Espinosa <ragnar@macula.net> + many more (please mail me if you are missing in this list and would - like to be mentioned) + like to be mentioned) diff --git a/Documentation/video4linux/cx2341x/fw-calling.txt b/Documentation/video4linux/cx2341x/fw-calling.txt new file mode 100644 index 000000000000..8d21181de537 --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-calling.txt @@ -0,0 +1,69 @@ +This page describes how to make calls to the firmware api. + +How to call +=========== + +The preferred calling convention is known as the firmware mailbox. The +mailboxes are basically a fixed length array that serves as the call-stack. + +Firmware mailboxes can be located by searching the encoder and decoder memory +for a 16 byte signature. That signature will be located on a 256-byte boundary. + +Signature: +0x78, 0x56, 0x34, 0x12, 0x12, 0x78, 0x56, 0x34, +0x34, 0x12, 0x78, 0x56, 0x56, 0x34, 0x12, 0x78 + +The firmware implements 20 mailboxes of 20 32-bit words. The first 10 are +reserved for API calls. The second 10 are used by the firmware for event +notification. + + Index Name + ----- ---- + 0 Flags + 1 Command + 2 Return value + 3 Timeout + 4-19 Parameter/Result + + +The flags are defined in the following table. The direction is from the +perspective of the firmware. + + Bit Direction Purpose + --- --------- ------- + 2 O Firmware has processed the command. + 1 I Driver has finished setting the parameters. + 0 I Driver is using this mailbox. + + +The command is a 32-bit enumerator. The API specifics may be found in the +fw-*-api.txt documents. + +The return value is a 32-bit enumerator. Only two values are currently defined: +0=success and -1=command undefined. + +There are 16 parameters/results 32-bit fields. The driver populates these fields +with values for all the parameters required by the call. The driver overwrites +these fields with result values returned by the call. The API specifics may be +found in the fw-*-api.txt documents. + +The timeout value protects the card from a hung driver thread. If the driver +doesn't handle the completed call within the timeout specified, the firmware +will reset that mailbox. + +To make an API call, the driver iterates over each mailbox looking for the +first one available (bit 0 has been cleared). The driver sets that bit, fills +in the command enumerator, the timeout value and any required parameters. The +driver then sets the parameter ready bit (bit 1). The firmware scans the +mailboxes for pending commands, processes them, sets the result code, populates +the result value array with that call's return values and sets the call +complete bit (bit 2). Once bit 2 is set, the driver should retrieve the results +and clear all the flags. If the driver does not perform this task within the +time set in the timeout register, the firmware will reset that mailbox. + +Event notifications are sent from the firmware to the host. The host tells the +firmware which events it is interested in via an API call. That call tells the +firmware which notification mailbox to use. The firmware signals the host via +an interrupt. Only the 16 Results fields are used, the Flags, Command, Return +value and Timeout words are not used. + diff --git a/Documentation/video4linux/cx2341x/fw-decoder-api.txt b/Documentation/video4linux/cx2341x/fw-decoder-api.txt new file mode 100644 index 000000000000..9df4fb3ea0f2 --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-decoder-api.txt @@ -0,0 +1,319 @@ +Decoder firmware API description +================================ + +Note: this API is part of the decoder firmware, so it's cx23415 only. + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_PING_FW +Enum 0/0x00 +Description + This API call does nothing. It may be used to check if the firmware + is responding. + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_START_PLAYBACK +Enum 1/0x01 +Description + Begin or resume playback. +Param[0] + 0 based frame number in GOP to begin playback from. +Param[1] + Specifies the number of muted audio frames to play before normal + audio resumes. + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_STOP_PLAYBACK +Enum 2/0x02 +Description + Ends playback and clears all decoder buffers. If PTS is not zero, + playback stops at specified PTS. +Param[0] + Display 0=last frame, 1=black +Param[1] + PTS low +Param[2] + PTS high + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_SET_PLAYBACK_SPEED +Enum 3/0x03 +Description + Playback stream at speed other than normal. There are two modes of + operation: + Smooth: host transfers entire stream and firmware drops unused + frames. + Coarse: host drops frames based on indexing as required to achieve + desired speed. +Param[0] + Bitmap: + 0:7 0 normal + 1 fast only "1.5 times" + n nX fast, 1/nX slow + 30 Framedrop: + '0' during 1.5 times play, every other B frame is dropped + '1' during 1.5 times play, stream is unchanged (bitrate + must not exceed 8mbps) + 31 Speed: + '0' slow + '1' fast +Param[1] + Direction: 0=forward, 1=reverse +Param[2] + Picture mask: + 1=I frames + 3=I, P frames + 7=I, P, B frames +Param[3] + B frames per GOP (for reverse play only) +Param[4] + Mute audio: 0=disable, 1=enable +Param[5] + Display 0=frame, 1=field +Param[6] + Specifies the number of muted audio frames to play before normal audio + resumes. + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_STEP_VIDEO +Enum 5/0x05 +Description + Each call to this API steps the playback to the next unit defined below + in the current playback direction. +Param[0] + 0=frame, 1=top field, 2=bottom field + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_SET_DMA_BLOCK_SIZE +Enum 8/0x08 +Description + Set DMA transfer block size. Counterpart to API 0xC9 +Param[0] + DMA transfer block size in bytes. A different size may be specified + when issuing the DMA transfer command. + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_GET_XFER_INFO +Enum 9/0x09 +Description + This API call may be used to detect an end of stream condtion. +Result[0] + Stream type +Result[1] + Address offset +Result[2] + Maximum bytes to transfer +Result[3] + Buffer fullness + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_GET_DMA_STATUS +Enum 10/0x0A +Description + Status of the last DMA transfer +Result[0] + Bit 1 set means transfer complete + Bit 2 set means DMA error + Bit 3 set means linked list error +Result[1] + DMA type: 0=MPEG, 1=OSD, 2=YUV + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_SCHED_DMA_FROM_HOST +Enum 11/0x0B +Description + Setup DMA from host operation. Counterpart to API 0xCC +Param[0] + Memory address of link list +Param[1] + Total # of bytes to transfer +Param[2] + DMA type (0=MPEG, 1=OSD, 2=YUV) + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_PAUSE_PLAYBACK +Enum 13/0x0D +Description + Freeze playback immediately. In this mode, when internal buffers are + full, no more data will be accepted and data request IRQs will be + masked. +Param[0] + Display: 0=last frame, 1=black + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_HALT_FW +Enum 14/0x0E +Description + The firmware is halted and no further API calls are serviced until + the firmware is uploaded again. + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_SET_STANDARD +Enum 16/0x10 +Description + Selects display standard +Param[0] + 0=NTSC, 1=PAL + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_GET_VERSION +Enum 17/0x11 +Description + Returns decoder firmware version information +Result[0] + Version bitmask: + Bits 0:15 build + Bits 16:23 minor + Bits 24:31 major + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_SET_STREAM_INPUT +Enum 20/0x14 +Description + Select decoder stream input port +Param[0] + 0=memory (default), 1=streaming + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_GET_TIMING_INFO +Enum 21/0x15 +Description + Returns timing information from start of playback +Result[0] + Frame count by decode order +Result[1] + Video PTS bits 0:31 by display order +Result[2] + Video PTS bit 32 by display order +Result[3] + SCR bits 0:31 by display order +Result[4] + SCR bit 32 by display order + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_SET_AUDIO_MODE +Enum 22/0x16 +Description + Select audio mode +Param[0] + Dual mono mode action +Param[1] + Stereo mode action: + 0=Stereo, 1=Left, 2=Right, 3=Mono, 4=Swap, -1=Unchanged + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_SET_EVENT_NOTIFICATION +Enum 23/0x17 +Description + Setup firmware to notify the host about a particular event. + Counterpart to API 0xD5 +Param[0] + Event: 0=Audio mode change between stereo and dual channel +Param[1] + Notification 0=disabled, 1=enabled +Param[2] + Interrupt bit +Param[3] + Mailbox slot, -1 if no mailbox required. + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_SET_DISPLAY_BUFFERS +Enum 24/0x18 +Description + Number of display buffers. To decode all frames in reverse playback you + must use nine buffers. +Param[0] + 0=six buffers, 1=nine buffers + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_EXTRACT_VBI +Enum 25/0x19 +Description + Extracts VBI data +Param[0] + 0=extract from extension & user data, 1=extract from private packets +Result[0] + VBI table location +Result[1] + VBI table size + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_SET_DECODER_SOURCE +Enum 26/0x1A +Description + Selects decoder source. Ensure that the parameters passed to this + API match the encoder settings. +Param[0] + Mode: 0=MPEG from host, 1=YUV from encoder, 2=YUV from host +Param[1] + YUV picture width +Param[2] + YUV picture height +Param[3] + Bitmap: see Param[0] of API 0xBD + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_SET_AUDIO_OUTPUT +Enum 27/0x1B +Description + Select audio output format +Param[0] + Bitmask: + 0:1 Data size: + '00' 16 bit + '01' 20 bit + '10' 24 bit + 2:7 Unused + 8:9 Mode: + '00' 2 channels + '01' 4 channels + '10' 6 channels + '11' 6 channels with one line data mode + (for left justified MSB first mode, 20 bit only) + 10:11 Unused + 12:13 Channel format: + '00' right justified MSB first mode + '01' left justified MSB first mode + '10' I2S mode + 14:15 Unused + 16:21 Right justify bit count + 22:31 Unused + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_SET_AV_DELAY +Enum 28/0x1C +Description + Set audio/video delay in 90Khz ticks +Param[0] + 0=A/V in sync, negative=audio lags, positive=video lags + +------------------------------------------------------------------------------- + +Name CX2341X_DEC_SET_PREBUFFERING +Enum 30/0x1E +Description + Decoder prebuffering, when enabled up to 128KB are buffered for + streams <8mpbs or 640KB for streams >8mbps +Param[0] + 0=off, 1=on diff --git a/Documentation/video4linux/cx2341x/fw-dma.txt b/Documentation/video4linux/cx2341x/fw-dma.txt new file mode 100644 index 000000000000..8123e262d5b6 --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-dma.txt @@ -0,0 +1,94 @@ +This page describes the structures and procedures used by the cx2341x DMA +engine. + +Introduction +============ + +The cx2341x PCI interface is busmaster capable. This means it has a DMA +engine to efficiently transfer large volumes of data between the card and main +memory without requiring help from a CPU. Like most hardware, it must operate +on contiguous physical memory. This is difficult to come by in large quantities +on virtual memory machines. + +Therefore, it also supports a technique called "scatter-gather". The card can +transfer multiple buffers in one operation. Instead of allocating one large +contiguous buffer, the driver can allocate several smaller buffers. + +In practice, I've seen the average transfer to be roughly 80K, but transfers +above 128K were not uncommon, particularly at startup. The 128K figure is +important, because that is the largest block that the kernel can normally +allocate. Even still, 128K blocks are hard to come by, so the driver writer is +urged to choose a smaller block size and learn the scatter-gather technique. + +Mailbox #10 is reserved for DMA transfer information. + +Flow +==== + +This section describes, in general, the order of events when handling DMA +transfers. Detailed information follows this section. + +- The card raises the Encoder interrupt. +- The driver reads the transfer type, offset and size from Mailbox #10. +- The driver constructs the scatter-gather array from enough free dma buffers + to cover the size. +- The driver schedules the DMA transfer via the ScheduleDMAtoHost API call. +- The card raises the DMA Complete interrupt. +- The driver checks the DMA status register for any errors. +- The driver post-processes the newly transferred buffers. + +NOTE! It is possible that the Encoder and DMA Complete interrupts get raised +simultaneously. (End of the last, start of the next, etc.) + +Mailbox #10 +=========== + +The Flags, Command, Return Value and Timeout fields are ignored. + +Name: Mailbox #10 +Results[0]: Type: 0: MPEG. +Results[1]: Offset: The position relative to the card's memory space. +Results[2]: Size: The exact number of bytes to transfer. + +My speculation is that since the StartCapture API has a capture type of "RAW" +available, that the type field will have other values that correspond to YUV +and PCM data. + +Scatter-Gather Array +==================== + +The scatter-gather array is a contiguously allocated block of memory that +tells the card the source and destination of each data-block to transfer. +Card "addresses" are derived from the offset supplied by Mailbox #10. Host +addresses are the physical memory location of the target DMA buffer. + +Each S-G array element is a struct of three 32-bit words. The first word is +the source address, the second is the destination address. Both take up the +entire 32 bits. The lowest 16 bits of the third word is the transfer byte +count. The high-bit of the third word is the "last" flag. The last-flag tells +the card to raise the DMA_DONE interrupt. From hard personal experience, if +you forget to set this bit, the card will still "work" but the stream will +most likely get corrupted. + +The transfer count must be a multiple of 256. Therefore, the driver will need +to track how much data in the target buffer is valid and deal with it +accordingly. + +Array Element: + +- 32-bit Source Address +- 32-bit Destination Address +- 16-bit reserved (high bit is the last flag) +- 16-bit byte count + +DMA Transfer Status +=================== + +Register 0x0004 holds the DMA Transfer Status: + +Bit +4 Scatter-Gather array error +3 DMA write error +2 DMA read error +1 write completed +0 read completed diff --git a/Documentation/video4linux/cx2341x/fw-encoder-api.txt b/Documentation/video4linux/cx2341x/fw-encoder-api.txt new file mode 100644 index 000000000000..001c68644b08 --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-encoder-api.txt @@ -0,0 +1,694 @@ +Encoder firmware API description +================================ + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_PING_FW +Enum 128/0x80 +Description + Does nothing. Can be used to check if the firmware is responding. + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_START_CAPTURE +Enum 129/0x81 +Description + Commences the capture of video, audio and/or VBI data. All encoding + parameters must be initialized prior to this API call. Captures frames + continuously or until a predefined number of frames have been captured. +Param[0] + Capture stream type: + 0=MPEG + 1=Raw + 2=Raw passthrough + 3=VBI + +Param[1] + Bitmask: + Bit 0 when set, captures YUV + Bit 1 when set, captures PCM audio + Bit 2 when set, captures VBI (same as param[0]=3) + Bit 3 when set, the capture destination is the decoder + (same as param[0]=2) + Bit 4 when set, the capture destination is the host + Note: this parameter is only meaningful for RAW capture type. + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_STOP_CAPTURE +Enum 130/0x82 +Description + Ends a capture in progress +Param[0] + 0=stop at end of GOP (generates IRQ) + 1=stop immediate (no IRQ) +Param[1] + Stream type to stop, see param[0] of API 0x81 +Param[2] + Subtype, see param[1] of API 0x81 + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_AUDIO_ID +Enum 137/0x89 +Description + Assigns the transport stream ID of the encoded audio stream +Param[0] + Audio Stream ID + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_VIDEO_ID +Enum 139/0x8B +Description + Set video transport stream ID +Param[0] + Video stream ID + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_PCR_ID +Enum 141/0x8D +Description + Assigns the transport stream ID for PCR packets +Param[0] + PCR Stream ID + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_FRAME_RATE +Enum 143/0x8F +Description + Set video frames per second. Change occurs at start of new GOP. +Param[0] + 0=30fps + 1=25fps + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_FRAME_SIZE +Enum 145/0x91 +Description + Select video stream encoding resolution. +Param[0] + Height in lines. Default 480 +Param[1] + Width in pixels. Default 720 + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_BIT_RATE +Enum 149/0x95 +Description + Assign average video stream bitrate. Note on the last three params: + Param[3] and [4] seem to be always 0, param [5] doesn't seem to be used. +Param[0] + 0=variable bitrate, 1=constant bitrate +Param[1] + bitrate in bits per second +Param[2] + peak bitrate in bits per second, divided by 400 +Param[3] + Mux bitrate in bits per second, divided by 400. May be 0 (default). +Param[4] + Rate Control VBR Padding +Param[5] + VBV Buffer used by encoder + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_GOP_PROPERTIES +Enum 151/0x97 +Description + Setup the GOP structure +Param[0] + GOP size (maximum is 34) +Param[1] + Number of B frames between the I and P frame, plus 1. + For example: IBBPBBPBBPBB --> GOP size: 12, number of B frames: 2+1 = 3 + Note that GOP size must be a multiple of (B-frames + 1). + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_ASPECT_RATIO +Enum 153/0x99 +Description + Sets the encoding aspect ratio. Changes in the aspect ratio take effect + at the start of the next GOP. +Param[0] + '0000' forbidden + '0001' 1:1 square + '0010' 4:3 + '0011' 16:9 + '0100' 2.21:1 + '0101' reserved + .... + '1111' reserved + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_DNR_FILTER_MODE +Enum 155/0x9B +Description + Assign Dynamic Noise Reduction operating mode +Param[0] + Bit0: Spatial filter, set=auto, clear=manual + Bit1: Temporal filter, set=auto, clear=manual +Param[1] + Median filter: + 0=Disabled + 1=Horizontal + 2=Vertical + 3=Horiz/Vert + 4=Diagonal + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_DNR_FILTER_PROPS +Enum 157/0x9D +Description + These Dynamic Noise Reduction filter values are only meaningful when + the respective filter is set to "manual" (See API 0x9B) +Param[0] + Spatial filter: default 0, range 0:15 +Param[1] + Temporal filter: default 0, range 0:31 + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_CORING_LEVELS +Enum 159/0x9F +Description + Assign Dynamic Noise Reduction median filter properties. +Param[0] + Threshold above which the luminance median filter is enabled. + Default: 0, range 0:255 +Param[1] + Threshold below which the luminance median filter is enabled. + Default: 255, range 0:255 +Param[2] + Threshold above which the chrominance median filter is enabled. + Default: 0, range 0:255 +Param[3] + Threshold below which the chrominance median filter is enabled. + Default: 255, range 0:255 + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_SPATIAL_FILTER_TYPE +Enum 161/0xA1 +Description + Assign spatial prefilter parameters +Param[0] + Luminance filter + 0=Off + 1=1D Horizontal + 2=1D Vertical + 3=2D H/V Separable (default) + 4=2D Symmetric non-separable +Param[1] + Chrominance filter + 0=Off + 1=1D Horizontal (default) + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_3_2_PULLDOWN +Enum 177/0xB1 +Description + 3:2 pulldown properties +Param[0] + 0=enabled + 1=disabled + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_VBI_LINE +Enum 183/0xB7 +Description + Selects VBI line number. +Param[0] + Bits 0:4 line number + Bit 31 0=top_field, 1=bottom_field + Bits 0:31 all set specifies "all lines" +Param[1] + VBI line information features: 0=disabled, 1=enabled +Param[2] + Slicing: 0=None, 1=Closed Caption + Almost certainly not implemented. Set to 0. +Param[3] + Luminance samples in this line. + Almost certainly not implemented. Set to 0. +Param[4] + Chrominance samples in this line + Almost certainly not implemented. Set to 0. + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_STREAM_TYPE +Enum 185/0xB9 +Description + Assign stream type + Note: Transport stream is not working in recent firmwares. + And in older firmwares the timestamps in the TS seem to be + unreliable. +Param[0] + 0=Program stream + 1=Transport stream + 2=MPEG1 stream + 3=PES A/V stream + 5=PES Video stream + 7=PES Audio stream + 10=DVD stream + 11=VCD stream + 12=SVCD stream + 13=DVD_S1 stream + 14=DVD_S2 stream + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_OUTPUT_PORT +Enum 187/0xBB +Description + Assign stream output port. Normally 0 when the data is copied through + the PCI bus (DMA), and 1 when the data is streamed to another chip + (pvrusb and cx88-blackbird). +Param[0] + 0=Memory (default) + 1=Streaming + 2=Serial +Param[1] + Unknown, but leaving this to 0 seems to work best. Indications are that + this might have to do with USB support, although passing anything but 0 + onl breaks things. + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_AUDIO_PROPERTIES +Enum 189/0xBD +Description + Set audio stream properties, may be called while encoding is in progress. + Note: all bitfields are consistent with ISO11172 documentation except + bits 2:3 which ISO docs define as: + '11' Layer I + '10' Layer II + '01' Layer III + '00' Undefined + This discrepancy may indicate a possible error in the documentation. + Testing indicated that only Layer II is actually working, and that + the minimum bitrate should be 192 kbps. +Param[0] + Bitmask: + 0:1 '00' 44.1Khz + '01' 48Khz + '10' 32Khz + '11' reserved + + 2:3 '01'=Layer I + '10'=Layer II + + 4:7 Bitrate: + Index | Layer I | Layer II + ------+-------------+------------ + '0000' | free format | free format + '0001' | 32 kbit/s | 32 kbit/s + '0010' | 64 kbit/s | 48 kbit/s + '0011' | 96 kbit/s | 56 kbit/s + '0100' | 128 kbit/s | 64 kbit/s + '0101' | 160 kbit/s | 80 kbit/s + '0110' | 192 kbit/s | 96 kbit/s + '0111' | 224 kbit/s | 112 kbit/s + '1000' | 256 kbit/s | 128 kbit/s + '1001' | 288 kbit/s | 160 kbit/s + '1010' | 320 kbit/s | 192 kbit/s + '1011' | 352 kbit/s | 224 kbit/s + '1100' | 384 kbit/s | 256 kbit/s + '1101' | 416 kbit/s | 320 kbit/s + '1110' | 448 kbit/s | 384 kbit/s + Note: For Layer II, not all combinations of total bitrate + and mode are allowed. See ISO11172-3 3-Annex B, Table 3-B.2 + + 8:9 '00'=Stereo + '01'=JointStereo + '10'=Dual + '11'=Mono + Note: testing seems to indicate that Mono and possibly + JointStereo are not working (default to stereo). + Dual does work, though. + + 10:11 Mode Extension used in joint_stereo mode. + In Layer I and II they indicate which subbands are in + intensity_stereo. All other subbands are coded in stereo. + '00' subbands 4-31 in intensity_stereo, bound==4 + '01' subbands 8-31 in intensity_stereo, bound==8 + '10' subbands 12-31 in intensity_stereo, bound==12 + '11' subbands 16-31 in intensity_stereo, bound==16 + + 12:13 Emphasis: + '00' None + '01' 50/15uS + '10' reserved + '11' CCITT J.17 + + 14 CRC: + '0' off + '1' on + + 15 Copyright: + '0' off + '1' on + + 16 Generation: + '0' copy + '1' original + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_HALT_FW +Enum 195/0xC3 +Description + The firmware is halted and no further API calls are serviced until the + firmware is uploaded again. + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_GET_VERSION +Enum 196/0xC4 +Description + Returns the version of the encoder firmware. +Result[0] + Version bitmask: + Bits 0:15 build + Bits 16:23 minor + Bits 24:31 major + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_GOP_CLOSURE +Enum 197/0xC5 +Description + Assigns the GOP open/close property. +Param[0] + 0=Open + 1=Closed + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_GET_SEQ_END +Enum 198/0xC6 +Description + Obtains the sequence end code of the encoder's buffer. When a capture + is started a number of interrupts are still generated, the last of + which will have Result[0] set to 1 and Result[1] will contain the size + of the buffer. +Result[0] + State of the transfer (1 if last buffer) +Result[1] + If Result[0] is 1, this contains the size of the last buffer, undefined + otherwise. + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_PGM_INDEX_INFO +Enum 199/0xC7 +Description + Sets the Program Index Information. +Param[0] + Picture Mask: + 0=No index capture + 1=I frames + 3=I,P frames + 7=I,P,B frames +Param[1] + Elements requested (up to 400) +Result[0] + Offset in SDF memory of the table. +Result[1] + Number of allocated elements up to a maximum of Param[1] + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_VBI_CONFIG +Enum 200/0xC8 +Description + Configure VBI settings +Param[0] + Bitmap: + 0 Mode '0' Sliced, '1' Raw + 1:3 Insertion: + '000' insert in extension & user data + '001' insert in private packets + '010' separate stream and user data + '111' separate stream and private data + 8:15 Stream ID (normally 0xBD) +Param[1] + Frames per interrupt (max 8). Only valid in raw mode. +Param[2] + Total raw VBI frames. Only valid in raw mode. +Param[3] + Start codes +Param[4] + Stop codes +Param[5] + Lines per frame +Param[6] + Byte per line +Result[0] + Observed frames per interrupt in raw mode only. Rage 1 to Param[1] +Result[1] + Observed number of frames in raw mode. Range 1 to Param[2] +Result[2] + Memory offset to start or raw VBI data + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_DMA_BLOCK_SIZE +Enum 201/0xC9 +Description + Set DMA transfer block size +Param[0] + DMA transfer block size in bytes or frames. When unit is bytes, + supported block sizes are 2^7, 2^8 and 2^9 bytes. +Param[1] + Unit: 0=bytes, 1=frames + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_GET_PREV_DMA_INFO_MB_10 +Enum 202/0xCA +Description + Returns information on the previous DMA transfer in conjunction with + bit 27 of the interrupt mask. Uses mailbox 10. +Result[0] + Type of stream +Result[1] + Address Offset +Result[2] + Maximum size of transfer + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_GET_PREV_DMA_INFO_MB_9 +Enum 203/0xCB +Description + Returns information on the previous DMA transfer in conjunction with + bit 27 of the interrupt mask. Uses mailbox 9. +Result[0] + Status bits: + Bit 0 set indicates transfer complete + Bit 2 set indicates transfer error + Bit 4 set indicates linked list error +Result[1] + DMA type +Result[2] + Presentation Time Stamp bits 0..31 +Result[3] + Presentation Time Stamp bit 32 + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SCHED_DMA_TO_HOST +Enum 204/0xCC +Description + Setup DMA to host operation +Param[0] + Memory address of link list +Param[1] + Length of link list (wtf: what units ???) +Param[2] + DMA type (0=MPEG) + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_INITIALIZE_INPUT +Enum 205/0xCD +Description + Initializes the video input + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_FRAME_DROP_RATE +Enum 208/0xD0 +Description + For each frame captured, skip specified number of frames. +Param[0] + Number of frames to skip + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_PAUSE_ENCODER +Enum 210/0xD2 +Description + During a pause condition, all frames are dropped instead of being encoded. +Param[0] + 0=Pause encoding + 1=Continue encoding + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_REFRESH_INPUT +Enum 211/0xD3 +Description + Refreshes the video input + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_COPYRIGHT +Enum 212/0xD4 +Description + Sets stream copyright property +Param[0] + 0=Stream is not copyrighted + 1=Stream is copyrighted + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_EVENT_NOTIFICATION +Enum 213/0xD5 +Description + Setup firmware to notify the host about a particular event. Host must + unmask the interrupt bit. +Param[0] + Event (0=refresh encoder input) +Param[1] + Notification 0=disabled 1=enabled +Param[2] + Interrupt bit +Param[3] + Mailbox slot, -1 if no mailbox required. + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_NUM_VSYNC_LINES +Enum 214/0xD6 +Description + Depending on the analog video decoder used, this assigns the number + of lines for field 1 and 2. +Param[0] + Field 1 number of lines: + 0x00EF for SAA7114 + 0x00F0 for SAA7115 + 0x0105 for Micronas +Param[1] + Field 2 number of lines: + 0x00EF for SAA7114 + 0x00F0 for SAA7115 + 0x0106 for Micronas + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_SET_PLACEHOLDER +Enum 215/0xD7 +Description + Provides a mechanism of inserting custom user data in the MPEG stream. +Param[0] + 0=extension & user data + 1=private packet with stream ID 0xBD +Param[1] + Rate at which to insert data, in units of frames (for private packet) + or GOPs (for ext. & user data) +Param[2] + Number of data DWORDs (below) to insert +Param[3] + Custom data 0 +Param[4] + Custom data 1 +Param[5] + Custom data 2 +Param[6] + Custom data 3 +Param[7] + Custom data 4 +Param[8] + Custom data 5 +Param[9] + Custom data 6 +Param[10] + Custom data 7 +Param[11] + Custom data 8 + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_MUTE_VIDEO +Enum 217/0xD9 +Description + Video muting +Param[0] + Bit usage: + 0 '0'=video not muted + '1'=video muted, creates frames with the YUV color defined below + 1:7 Unused + 8:15 V chrominance information + 16:23 U chrominance information + 24:31 Y luminance information + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_MUTE_AUDIO +Enum 218/0xDA +Description + Audio muting +Param[0] + 0=audio not muted + 1=audio muted (produces silent mpeg audio stream) + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_UNKNOWN +Enum 219/0xDB +Description + Unknown API, it's used by Hauppauge though. +Param[0] + 0 This is the value Hauppauge uses, Unknown what it means. + +------------------------------------------------------------------------------- + +Name CX2341X_ENC_MISC +Enum 220/0xDC +Description + Miscellaneous actions. Not known for 100% what it does. It's really a + sort of ioctl call. The first parameter is a command number, the second + the value. +Param[0] + Command number: + 1=set initial SCR value when starting encoding. + 2=set quality mode (apparently some test setting). + 3=setup advanced VIM protection handling (supposedly only for the cx23416 + for raw YUV). + Actually it looks like this should be 0 for saa7114/5 based card and 1 + for cx25840 based cards. + 4=generate artificial PTS timestamps + 5=USB flush mode + 6=something to do with the quantization matrix + 7=set navigation pack insertion for DVD + 8=enable scene change detection (seems to be a failure) + 9=set history parameters of the video input module + 10=set input field order of VIM + 11=set quantization matrix + 12=reset audio interface + 13=set audio volume delay + 14=set audio delay + +Param[1] + Command value. diff --git a/Documentation/video4linux/cx2341x/fw-memory.txt b/Documentation/video4linux/cx2341x/fw-memory.txt new file mode 100644 index 000000000000..ef0aad3f88fc --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-memory.txt @@ -0,0 +1,141 @@ +This document describes the cx2341x memory map and documents some of the register +space. + +Warning! This information was figured out from searching through the memory and +registers, this information may not be correct and is certainly not complete, and +was not derived from anything more than searching through the memory space with +commands like: + + ivtvctl -O min=0x02000000,max=0x020000ff + +So take this as is, I'm always searching for more stuff, it's a large +register space :-). + +Memory Map +========== + +The cx2341x exposes its entire 64M memory space to the PCI host via the PCI BAR0 +(Base Address Register 0). The addresses here are offsets relative to the +address held in BAR0. + +0x00000000-0x00ffffff Encoder memory space +0x00000000-0x0003ffff Encode.rom + ???-??? MPEG buffer(s) + ???-??? Raw video capture buffer(s) + ???-??? Raw audio capture buffer(s) + ???-??? Display buffers (6 or 9) + +0x01000000-0x01ffffff Decoder memory space +0x01000000-0x0103ffff Decode.rom + ???-??? MPEG buffers(s) +0x0114b000-0x0115afff Audio.rom (deprecated?) + +0x02000000-0x0200ffff Register Space + +Registers +========= + +The registers occupy the 64k space starting at the 0x02000000 offset from BAR0. +All of these registers are 32 bits wide. + +DMA Registers 0x000-0xff: + + 0x00 - Control: + 0=reset/cancel, 1=read, 2=write, 4=stop + 0x04 - DMA status: + 1=read busy, 2=write busy, 4=read error, 8=write error, 16=link list error + 0x08 - pci DMA pointer for read link list + 0x0c - pci DMA pointer for write link list + 0x10 - read/write DMA enable: + 1=read enable, 2=write enable + 0x14 - always 0xffffffff, if set any lower instability occurs, 0x00 crashes + 0x18 - ?? + 0x1c - always 0x20 or 32, smaller values slow down DMA transactions + 0x20 - always value of 0x780a010a + 0x24-0x3c - usually just random values??? + 0x40 - Interrupt status + 0x44 - Write a bit here and shows up in Interrupt status 0x40 + 0x48 - Interrupt Mask + 0x4C - always value of 0xfffdffff, + if changed to 0xffffffff DMA write interrupts break. + 0x50 - always 0xffffffff + 0x54 - always 0xffffffff (0x4c, 0x50, 0x54 seem like interrupt masks, are + 3 processors on chip, Java ones, VPU, SPU, APU, maybe these are the + interrupt masks???). + 0x60-0x7C - random values + 0x80 - first write linked list reg, for Encoder Memory addr + 0x84 - first write linked list reg, for pci memory addr + 0x88 - first write linked list reg, for length of buffer in memory addr + (|0x80000000 or this for last link) + 0x8c-0xcc - rest of write linked list reg, 8 sets of 3 total, DMA goes here + from linked list addr in reg 0x0c, firmware must push through or + something. + 0xe0 - first (and only) read linked list reg, for pci memory addr + 0xe4 - first (and only) read linked list reg, for Decoder memory addr + 0xe8 - first (and only) read linked list reg, for length of buffer + 0xec-0xff - Nothing seems to be in these registers, 0xec-f4 are 0x00000000. + +Memory locations for Encoder Buffers 0x700-0x7ff: + +These registers show offsets of memory locations pertaining to each +buffer area used for encoding, have to shift them by <<1 first. + +0x07F8: Encoder SDRAM refresh +0x07FC: Encoder SDRAM pre-charge + +Memory locations for Decoder Buffers 0x800-0x8ff: + +These registers show offsets of memory locations pertaining to each +buffer area used for decoding, have to shift them by <<1 first. + +0x08F8: Decoder SDRAM refresh +0x08FC: Decoder SDRAM pre-charge + +Other memory locations: + +0x2800: Video Display Module control +0x2D00: AO (audio output?) control +0x2D24: Bytes Flushed +0x7000: LSB I2C write clock bit (inverted) +0x7004: LSB I2C write data bit (inverted) +0x7008: LSB I2C read clock bit +0x700c: LSB I2C read data bit +0x9008: GPIO get input state +0x900c: GPIO set output state +0x9020: GPIO direction (Bit7 (GPIO 0..7) - 0:input, 1:output) +0x9050: SPU control +0x9054: Reset HW blocks +0x9058: VPU control +0xA018: Bit6: interrupt pending? +0xA064: APU command + + +Interrupt Status Register +========================= + +The definition of the bits in the interrupt status register 0x0040, and the +interrupt mask 0x0048. If a bit is cleared in the mask, then we want our ISR to +execute. + +Bit +31 Encoder Start Capture +30 Encoder EOS +29 Encoder VBI capture +28 Encoder Video Input Module reset event +27 Encoder DMA complete +26 +25 Decoder copy protect detection event +24 Decoder audio mode change detection event +23 +22 Decoder data request +21 Decoder I-Frame? done +20 Decoder DMA complete +19 Decoder VBI re-insertion +18 Decoder DMA err (linked-list bad) + +Missing +Encoder API call completed +Decoder API call completed +Encoder API post(?) +Decoder API post(?) +Decoder VTRACE event diff --git a/Documentation/video4linux/cx2341x/fw-osd-api.txt b/Documentation/video4linux/cx2341x/fw-osd-api.txt new file mode 100644 index 000000000000..da98ae30a37a --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-osd-api.txt @@ -0,0 +1,342 @@ +OSD firmware API description +============================ + +Note: this API is part of the decoder firmware, so it's cx23415 only. + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_GET_FRAMEBUFFER +Enum 65/0x41 +Description + Return base and length of contiguous OSD memory. +Result[0] + OSD base address +Result[1] + OSD length + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_GET_PIXEL_FORMAT +Enum 66/0x42 +Description + Query OSD format +Result[0] + 0=8bit index, 4=AlphaRGB 8:8:8:8 + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_SET_PIXEL_FORMAT +Enum 67/0x43 +Description + Assign pixel format +Param[0] + 0=8bit index, 4=AlphaRGB 8:8:8:8 + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_GET_STATE +Enum 68/0x44 +Description + Query OSD state +Result[0] + Bit 0 0=off, 1=on + Bits 1:2 alpha control + Bits 3:5 pixel format + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_SET_STATE +Enum 69/0x45 +Description + OSD switch +Param[0] + 0=off, 1=on + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_GET_OSD_COORDS +Enum 70/0x46 +Description + Retrieve coordinates of OSD area blended with video +Result[0] + OSD buffer address +Result[1] + Stride in pixels +Result[2] + Lines in OSD buffer +Result[3] + Horizontal offset in buffer +Result[4] + Vertical offset in buffer + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_SET_OSD_COORDS +Enum 71/0x47 +Description + Assign the coordinates of the OSD area to blend with video +Param[0] + buffer address +Param[1] + buffer stride in pixels +Param[2] + lines in buffer +Param[3] + horizontal offset +Param[4] + vertical offset + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_GET_SCREEN_COORDS +Enum 72/0x48 +Description + Retrieve OSD screen area coordinates +Result[0] + top left horizontal offset +Result[1] + top left vertical offset +Result[2] + bottom right hotizontal offset +Result[3] + bottom right vertical offset + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_SET_SCREEN_COORDS +Enum 73/0x49 +Description + Assign the coordinates of the screen area to blend with video +Param[0] + top left horizontal offset +Param[1] + top left vertical offset +Param[2] + bottom left horizontal offset +Param[3] + bottom left vertical offset + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_GET_GLOBAL_ALPHA +Enum 74/0x4A +Description + Retrieve OSD global alpha +Result[0] + global alpha: 0=off, 1=on +Result[1] + bits 0:7 global alpha + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_SET_GLOBAL_ALPHA +Enum 75/0x4B +Description + Update global alpha +Param[0] + global alpha: 0=off, 1=on +Param[1] + global alpha (8 bits) +Param[2] + local alpha: 0=on, 1=off + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_SET_BLEND_COORDS +Enum 78/0x4C +Description + Move start of blending area within display buffer +Param[0] + horizontal offset in buffer +Param[1] + vertical offset in buffer + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_GET_FLICKER_STATE +Enum 79/0x4F +Description + Retrieve flicker reduction module state +Result[0] + flicker state: 0=off, 1=on + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_SET_FLICKER_STATE +Enum 80/0x50 +Description + Set flicker reduction module state +Param[0] + State: 0=off, 1=on + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_BLT_COPY +Enum 82/0x52 +Description + BLT copy +Param[0] +'0000' zero +'0001' ~destination AND ~source +'0010' ~destination AND source +'0011' ~destination +'0100' destination AND ~source +'0101' ~source +'0110' destination XOR source +'0111' ~destination OR ~source +'1000' ~destination AND ~source +'1001' destination XNOR source +'1010' source +'1011' ~destination OR source +'1100' destination +'1101' destination OR ~source +'1110' destination OR source +'1111' one + +Param[1] + Resulting alpha blending + '01' source_alpha + '10' destination_alpha + '11' source_alpha*destination_alpha+1 + (zero if both source and destination alpha are zero) +Param[2] + '00' output_pixel = source_pixel + + '01' if source_alpha=0: + output_pixel = destination_pixel + if 256 > source_alpha > 1: + output_pixel = ((source_alpha + 1)*source_pixel + + (255 - source_alpha)*destination_pixel)/256 + + '10' if destination_alpha=0: + output_pixel = source_pixel + if 255 > destination_alpha > 0: + output_pixel = ((255 - destination_alpha)*source_pixel + + (destination_alpha + 1)*destination_pixel)/256 + + '11' if source_alpha=0: + source_temp = 0 + if source_alpha=255: + source_temp = source_pixel*256 + if 255 > source_alpha > 0: + source_temp = source_pixel*(source_alpha + 1) + if destination_alpha=0: + destination_temp = 0 + if destination_alpha=255: + destination_temp = destination_pixel*256 + if 255 > destination_alpha > 0: + destination_temp = destination_pixel*(destination_alpha + 1) + output_pixel = (source_temp + destination_temp)/256 +Param[3] + width +Param[4] + height +Param[5] + destination pixel mask +Param[6] + destination rectangle start address +Param[7] + destination stride in dwords +Param[8] + source stride in dwords +Param[9] + source rectangle start address + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_BLT_FILL +Enum 83/0x53 +Description + BLT fill color +Param[0] + Same as Param[0] on API 0x52 +Param[1] + Same as Param[1] on API 0x52 +Param[2] + Same as Param[2] on API 0x52 +Param[3] + width +Param[4] + height +Param[5] + destination pixel mask +Param[6] + destination rectangle start address +Param[7] + destination stride in dwords +Param[8] + color fill value + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_BLT_TEXT +Enum 84/0x54 +Description + BLT for 8 bit alpha text source +Param[0] + Same as Param[0] on API 0x52 +Param[1] + Same as Param[1] on API 0x52 +Param[2] + Same as Param[2] on API 0x52 +Param[3] + width +Param[4] + height +Param[5] + destination pixel mask +Param[6] + destination rectangle start address +Param[7] + destination stride in dwords +Param[8] + source stride in dwords +Param[9] + source rectangle start address +Param[10] + color fill value + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_SET_FRAMEBUFFER_WINDOW +Enum 86/0x56 +Description + Positions the main output window on the screen. The coordinates must be + such that the entire window fits on the screen. +Param[0] + window width +Param[1] + window height +Param[2] + top left window corner horizontal offset +Param[3] + top left window corner vertical offset + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_SET_CHROMA_KEY +Enum 96/0x60 +Description + Chroma key switch and color +Param[0] + state: 0=off, 1=on +Param[1] + color + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_GET_ALPHA_CONTENT_INDEX +Enum 97/0x61 +Description + Retrieve alpha content index +Result[0] + alpha content index, Range 0:15 + +------------------------------------------------------------------------------- + +Name CX2341X_OSD_SET_ALPHA_CONTENT_INDEX +Enum 98/0x62 +Description + Assign alpha content index +Param[0] + alpha content index, range 0:15 diff --git a/Documentation/video4linux/cx2341x/fw-upload.txt b/Documentation/video4linux/cx2341x/fw-upload.txt new file mode 100644 index 000000000000..60c502ce3215 --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-upload.txt @@ -0,0 +1,49 @@ +This document describes how to upload the cx2341x firmware to the card. + +How to find +=========== + +See the web pages of the various projects that uses this chip for information +on how to obtain the firmware. + +The firmware stored in a Windows driver can be detected as follows: + +- Each firmware image is 256k bytes. +- The 1st 32-bit word of the Encoder image is 0x0000da7 +- The 1st 32-bit word of the Decoder image is 0x00003a7 +- The 2nd 32-bit word of both images is 0xaa55bb66 + +How to load +=========== + +- Issue the FWapi command to stop the encoder if it is running. Wait for the + command to complete. +- Issue the FWapi command to stop the decoder if it is running. Wait for the + command to complete. +- Issue the I2C command to the digitizer to stop emitting VSYNC events. +- Issue the FWapi command to halt the encoder's firmware. +- Sleep for 10ms. +- Issue the FWapi command to halt the decoder's firmware. +- Sleep for 10ms. +- Write 0x00000000 to register 0x2800 to stop the Video Display Module. +- Write 0x00000005 to register 0x2D00 to stop the AO (audio output?). +- Write 0x00000000 to register 0xA064 to ping? the APU. +- Write 0xFFFFFFFE to register 0x9058 to stop the VPU. +- Write 0xFFFFFFFF to register 0x9054 to reset the HW blocks. +- Write 0x00000001 to register 0x9050 to stop the SPU. +- Sleep for 10ms. +- Write 0x0000001A to register 0x07FC to init the Encoder SDRAM's pre-charge. +- Write 0x80000640 to register 0x07F8 to init the Encoder SDRAM's refresh to 1us. +- Write 0x0000001A to register 0x08FC to init the Decoder SDRAM's pre-charge. +- Write 0x80000640 to register 0x08F8 to init the Decoder SDRAM's refresh to 1us. +- Sleep for 512ms. (600ms is recommended) +- Transfer the encoder's firmware image to offset 0 in Encoder memory space. +- Transfer the decoder's firmware image to offset 0 in Decoder memory space. +- Use a read-modify-write operation to Clear bit 0 of register 0x9050 to + re-enable the SPU. +- Sleep for 1 second. +- Use a read-modify-write operation to Clear bits 3 and 0 of register 0x9058 + to re-enable the VPU. +- Sleep for 1 second. +- Issue status API commands to both firmware images to verify. + diff --git a/Documentation/video4linux/cx88/hauppauge-wintv-cx88-ir.txt b/Documentation/video4linux/cx88/hauppauge-wintv-cx88-ir.txt new file mode 100644 index 000000000000..93fec32a1188 --- /dev/null +++ b/Documentation/video4linux/cx88/hauppauge-wintv-cx88-ir.txt @@ -0,0 +1,54 @@ +The controls for the mux are GPIO [0,1] for source, and GPIO 2 for muting. + +GPIO0 GPIO1 + 0 0 TV Audio + 1 0 FM radio + 0 1 Line-In + 1 1 Mono tuner bypass or CD passthru (tuner specific) + +GPIO 16(i believe) is tied to the IR port (if present). + +------------------------------------------------------------------------------------ + +>From the data sheet: + Register 24'h20004 PCI Interrupt Status + bit [18] IR_SMP_INT Set when 32 input samples have been collected over + gpio[16] pin into GP_SAMPLE register. + +What's missing from the data sheet: + +Setup 4KHz sampling rate (roughly 2x oversampled; good enough for our RC5 +compat remote) +set register 0x35C050 to 0xa80a80 + +enable sampling +set register 0x35C054 to 0x5 + +Of course, enable the IRQ bit 18 in the interrupt mask register .(and +provide for a handler) + +GP_SAMPLE register is at 0x35C058 + +Bits are then right shifted into the GP_SAMPLE register at the specified +rate; you get an interrupt when a full DWORD is recieved. +You need to recover the actual RC5 bits out of the (oversampled) IR sensor +bits. (Hint: look for the 0/1and 1/0 crossings of the RC5 bi-phase data) An +actual raw RC5 code will span 2-3 DWORDS, depending on the actual alignment. + +I'm pretty sure when no IR signal is present the receiver is always in a +marking state(1); but stray light, etc can cause intermittent noise values +as well. Remember, this is a free running sample of the IR receiver state +over time, so don't assume any sample starts at any particular place. + +http://www.atmel.com/dyn/resources/prod_documents/doc2817.pdf +This data sheet (google search) seems to have a lovely description of the +RC5 basics + +http://users.pandora.be/nenya/electronics/rc5/ and more data + +http://www.ee.washington.edu/circuit_archive/text/ir_decode.txt +and even a reference to how to decode a bi-phase data stream. + +http://www.xs4all.nl/~sbp/knowledge/ir/rc5.htm +still more info + diff --git a/Documentation/video4linux/et61x251.txt b/Documentation/video4linux/et61x251.txt index 29340282ab5f..cd584f20a997 100644 --- a/Documentation/video4linux/et61x251.txt +++ b/Documentation/video4linux/et61x251.txt @@ -1,9 +1,9 @@ - ET61X[12]51 PC Camera Controllers - Driver for Linux - ================================= + ET61X[12]51 PC Camera Controllers + Driver for Linux + ================================= - - Documentation - + - Documentation - Index @@ -156,46 +156,46 @@ Name: video_nr Type: short array (min = 0, max = 64) Syntax: <-1|n[,...]> Description: Specify V4L2 minor mode number: - -1 = use next available - n = use minor number n - You can specify up to 64 cameras this way. - For example: - video_nr=-1,2,-1 would assign minor number 2 to the second - registered camera and use auto for the first one and for every - other camera. + -1 = use next available + n = use minor number n + You can specify up to 64 cameras this way. + For example: + video_nr=-1,2,-1 would assign minor number 2 to the second + registered camera and use auto for the first one and for every + other camera. Default: -1 ------------------------------------------------------------------------------- Name: force_munmap Type: bool array (min = 0, max = 64) Syntax: <0|1[,...]> Description: Force the application to unmap previously mapped buffer memory - before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not - all the applications support this feature. This parameter is - specific for each detected camera. - 0 = do not force memory unmapping - 1 = force memory unmapping (save memory) + before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not + all the applications support this feature. This parameter is + specific for each detected camera. + 0 = do not force memory unmapping + 1 = force memory unmapping (save memory) Default: 0 ------------------------------------------------------------------------------- Name: frame_timeout Type: uint array (min = 0, max = 64) Syntax: <n[,...]> Description: Timeout for a video frame in seconds. This parameter is - specific for each detected camera. This parameter can be - changed at runtime thanks to the /sys filesystem interface. + specific for each detected camera. This parameter can be + changed at runtime thanks to the /sys filesystem interface. Default: 2 ------------------------------------------------------------------------------- Name: debug Type: ushort Syntax: <n> Description: Debugging information level, from 0 to 3: - 0 = none (use carefully) - 1 = critical errors - 2 = significant informations - 3 = more verbose messages - Level 3 is useful for testing only, when only one device - is used at the same time. It also shows some more informations - about the hardware being detected. This module parameter can be - changed at runtime thanks to the /sys filesystem interface. + 0 = none (use carefully) + 1 = critical errors + 2 = significant informations + 3 = more verbose messages + Level 3 is useful for testing only, when only one device + is used at the same time. It also shows some more informations + about the hardware being detected. This module parameter can be + changed at runtime thanks to the /sys filesystem interface. Default: 2 ------------------------------------------------------------------------------- diff --git a/Documentation/video4linux/ibmcam.txt b/Documentation/video4linux/ibmcam.txt index 4a40a2e99451..397a94eb77b8 100644 --- a/Documentation/video4linux/ibmcam.txt +++ b/Documentation/video4linux/ibmcam.txt @@ -21,7 +21,7 @@ Internal interface: Video For Linux (V4L) Supported controls: - by V4L: Contrast, Brightness, Color, Hue - by driver options: frame rate, lighting conditions, video format, - default picture settings, sharpness. + default picture settings, sharpness. SUPPORTED CAMERAS: @@ -191,66 +191,66 @@ init_model2_sat Integer 0..255 [0x34] init_model2_sat=65 init_model2_yb Integer 0..255 [0xa0] init_model2_yb=200 debug You don't need this option unless you are a developer. - If you are a developer then you will see in the code - what values do what. 0=off. + If you are a developer then you will see in the code + what values do what. 0=off. flags This is a bit mask, and you can combine any number of - bits to produce what you want. Usually you don't want - any of extra features this option provides: - - FLAGS_RETRY_VIDIOCSYNC 1 This bit allows to retry failed - VIDIOCSYNC ioctls without failing. - Will work with xawtv, will not - with xrealproducer. Default is - not set. - FLAGS_MONOCHROME 2 Activates monochrome (b/w) mode. - FLAGS_DISPLAY_HINTS 4 Shows colored pixels which have - magic meaning to developers. - FLAGS_OVERLAY_STATS 8 Shows tiny numbers on screen, - useful only for debugging. - FLAGS_FORCE_TESTPATTERN 16 Shows blue screen with numbers. - FLAGS_SEPARATE_FRAMES 32 Shows each frame separately, as - it was received from the camera. - Default (not set) is to mix the - preceding frame in to compensate - for occasional loss of Isoc data - on high frame rates. - FLAGS_CLEAN_FRAMES 64 Forces "cleanup" of each frame - prior to use; relevant only if - FLAGS_SEPARATE_FRAMES is set. - Default is not to clean frames, - this is a little faster but may - produce flicker if frame rate is - too high and Isoc data gets lost. - FLAGS_NO_DECODING 128 This flag turns the video stream - decoder off, and dumps the raw - Isoc data from the camera into - the reading process. Useful to - developers, but not to users. + bits to produce what you want. Usually you don't want + any of extra features this option provides: + + FLAGS_RETRY_VIDIOCSYNC 1 This bit allows to retry failed + VIDIOCSYNC ioctls without failing. + Will work with xawtv, will not + with xrealproducer. Default is + not set. + FLAGS_MONOCHROME 2 Activates monochrome (b/w) mode. + FLAGS_DISPLAY_HINTS 4 Shows colored pixels which have + magic meaning to developers. + FLAGS_OVERLAY_STATS 8 Shows tiny numbers on screen, + useful only for debugging. + FLAGS_FORCE_TESTPATTERN 16 Shows blue screen with numbers. + FLAGS_SEPARATE_FRAMES 32 Shows each frame separately, as + it was received from the camera. + Default (not set) is to mix the + preceding frame in to compensate + for occasional loss of Isoc data + on high frame rates. + FLAGS_CLEAN_FRAMES 64 Forces "cleanup" of each frame + prior to use; relevant only if + FLAGS_SEPARATE_FRAMES is set. + Default is not to clean frames, + this is a little faster but may + produce flicker if frame rate is + too high and Isoc data gets lost. + FLAGS_NO_DECODING 128 This flag turns the video stream + decoder off, and dumps the raw + Isoc data from the camera into + the reading process. Useful to + developers, but not to users. framerate This setting controls frame rate of the camera. This is - an approximate setting (in terms of "worst" ... "best") - because camera changes frame rate depending on amount - of light available. Setting 0 is slowest, 6 is fastest. - Beware - fast settings are very demanding and may not - work well with all video sizes. Be conservative. + an approximate setting (in terms of "worst" ... "best") + because camera changes frame rate depending on amount + of light available. Setting 0 is slowest, 6 is fastest. + Beware - fast settings are very demanding and may not + work well with all video sizes. Be conservative. hue_correction This highly optional setting allows to adjust the - hue of the image in a way slightly different from - what usual "hue" control does. Both controls affect - YUV colorspace: regular "hue" control adjusts only - U component, and this "hue_correction" option similarly - adjusts only V component. However usually it is enough - to tweak only U or V to compensate for colored light or - color temperature; this option simply allows more - complicated correction when and if it is necessary. + hue of the image in a way slightly different from + what usual "hue" control does. Both controls affect + YUV colorspace: regular "hue" control adjusts only + U component, and this "hue_correction" option similarly + adjusts only V component. However usually it is enough + to tweak only U or V to compensate for colored light or + color temperature; this option simply allows more + complicated correction when and if it is necessary. init_brightness These settings specify _initial_ values which will be init_contrast used to set up the camera. If your V4L application has init_color its own controls to adjust the picture then these init_hue controls will be used too. These options allow you to - preconfigure the camera when it gets connected, before - any V4L application connects to it. Good for webcams. + preconfigure the camera when it gets connected, before + any V4L application connects to it. Good for webcams. init_model2_rg These initial settings alter color balance of the init_model2_rg2 camera on hardware level. All four settings may be used @@ -258,47 +258,47 @@ init_model2_sat to tune the camera to specific lighting conditions. These init_model2_yb settings only apply to Model 2 cameras. lighting This option selects one of three hardware-defined - photosensitivity settings of the camera. 0=bright light, - 1=Medium (default), 2=Low light. This setting affects - frame rate: the dimmer the lighting the lower the frame - rate (because longer exposition time is needed). The - Model 2 cameras allow values more than 2 for this option, - thus enabling extremely high sensitivity at cost of frame - rate, color saturation and imaging sensor noise. + photosensitivity settings of the camera. 0=bright light, + 1=Medium (default), 2=Low light. This setting affects + frame rate: the dimmer the lighting the lower the frame + rate (because longer exposition time is needed). The + Model 2 cameras allow values more than 2 for this option, + thus enabling extremely high sensitivity at cost of frame + rate, color saturation and imaging sensor noise. sharpness This option controls smoothing (noise reduction) - made by camera. Setting 0 is most smooth, setting 6 - is most sharp. Be aware that CMOS sensor used in the - camera is pretty noisy, so if you choose 6 you will - be greeted with "snowy" image. Default is 4. Model 2 - cameras do not support this feature. + made by camera. Setting 0 is most smooth, setting 6 + is most sharp. Be aware that CMOS sensor used in the + camera is pretty noisy, so if you choose 6 you will + be greeted with "snowy" image. Default is 4. Model 2 + cameras do not support this feature. size This setting chooses one of several image sizes that are - supported by this driver. Cameras may support more, but - it's difficult to reverse-engineer all formats. - Following video sizes are supported: - - size=0 128x96 (Model 1 only) - size=1 160x120 - size=2 176x144 - size=3 320x240 (Model 2 only) - size=4 352x240 (Model 2 only) - size=5 352x288 - size=6 640x480 (Model 3 only) - - The 352x288 is the native size of the Model 1 sensor - array, so it's the best resolution the camera can - yield. The best resolution of Model 2 is 176x144, and - larger images are produced by stretching the bitmap. - Model 3 has sensor with 640x480 grid, and it works too, - but the frame rate will be exceptionally low (1-2 FPS); - it may be still OK for some applications, like security. - Choose the image size you need. The smaller image can - support faster frame rate. Default is 352x288. + supported by this driver. Cameras may support more, but + it's difficult to reverse-engineer all formats. + Following video sizes are supported: + + size=0 128x96 (Model 1 only) + size=1 160x120 + size=2 176x144 + size=3 320x240 (Model 2 only) + size=4 352x240 (Model 2 only) + size=5 352x288 + size=6 640x480 (Model 3 only) + + The 352x288 is the native size of the Model 1 sensor + array, so it's the best resolution the camera can + yield. The best resolution of Model 2 is 176x144, and + larger images are produced by stretching the bitmap. + Model 3 has sensor with 640x480 grid, and it works too, + but the frame rate will be exceptionally low (1-2 FPS); + it may be still OK for some applications, like security. + Choose the image size you need. The smaller image can + support faster frame rate. Default is 352x288. For more information and the Troubleshooting FAQ visit this URL: - http://www.linux-usb.org/ibmcam/ + http://www.linux-usb.org/ibmcam/ WHAT NEEDS TO BE DONE: diff --git a/Documentation/video4linux/ov511.txt b/Documentation/video4linux/ov511.txt index 142741e3c578..79af610d4ba5 100644 --- a/Documentation/video4linux/ov511.txt +++ b/Documentation/video4linux/ov511.txt @@ -81,7 +81,7 @@ MODULE PARAMETERS: TYPE: integer (Boolean) DEFAULT: 1 DESC: Brightness is normally under automatic control and can't be set - manually by the video app. Set to 0 for manual control. + manually by the video app. Set to 0 for manual control. NAME: autogain TYPE: integer (Boolean) @@ -97,13 +97,13 @@ MODULE PARAMETERS: TYPE: integer (0-6) DEFAULT: 3 DESC: Sets the threshold for printing debug messages. The higher the value, - the more is printed. The levels are cumulative, and are as follows: - 0=no debug messages - 1=init/detection/unload and other significant messages - 2=some warning messages - 3=config/control function calls - 4=most function calls and data parsing messages - 5=highly repetitive mesgs + the more is printed. The levels are cumulative, and are as follows: + 0=no debug messages + 1=init/detection/unload and other significant messages + 2=some warning messages + 3=config/control function calls + 4=most function calls and data parsing messages + 5=highly repetitive mesgs NAME: snapshot TYPE: integer (Boolean) @@ -116,24 +116,24 @@ MODULE PARAMETERS: TYPE: integer (1-4 for OV511, 1-31 for OV511+) DEFAULT: 1 DESC: Number of cameras allowed to stream simultaneously on a single bus. - Values higher than 1 reduce the data rate of each camera, allowing two - or more to be used at once. If you have a complicated setup involving - both OV511 and OV511+ cameras, trial-and-error may be necessary for - finding the optimum setting. + Values higher than 1 reduce the data rate of each camera, allowing two + or more to be used at once. If you have a complicated setup involving + both OV511 and OV511+ cameras, trial-and-error may be necessary for + finding the optimum setting. NAME: compress TYPE: integer (Boolean) DEFAULT: 0 DESC: Set this to 1 to turn on the camera's compression engine. This can - potentially increase the frame rate at the expense of quality, if you - have a fast CPU. You must load the proper compression module for your - camera before starting your application (ov511_decomp or ov518_decomp). + potentially increase the frame rate at the expense of quality, if you + have a fast CPU. You must load the proper compression module for your + camera before starting your application (ov511_decomp or ov518_decomp). NAME: testpat TYPE: integer (Boolean) DEFAULT: 0 DESC: This configures the camera's sensor to transmit a colored test-pattern - instead of an image. This does not work correctly yet. + instead of an image. This does not work correctly yet. NAME: dumppix TYPE: integer (0-2) diff --git a/Documentation/video4linux/sn9c102.txt b/Documentation/video4linux/sn9c102.txt index 142920bc011f..1d20895b4354 100644 --- a/Documentation/video4linux/sn9c102.txt +++ b/Documentation/video4linux/sn9c102.txt @@ -1,9 +1,9 @@ - SN9C10x PC Camera Controllers - Driver for Linux - ============================= + SN9C10x PC Camera Controllers + Driver for Linux + ============================= - - Documentation - + - Documentation - Index @@ -176,46 +176,46 @@ Name: video_nr Type: short array (min = 0, max = 64) Syntax: <-1|n[,...]> Description: Specify V4L2 minor mode number: - -1 = use next available - n = use minor number n - You can specify up to 64 cameras this way. - For example: - video_nr=-1,2,-1 would assign minor number 2 to the second - recognized camera and use auto for the first one and for every - other camera. + -1 = use next available + n = use minor number n + You can specify up to 64 cameras this way. + For example: + video_nr=-1,2,-1 would assign minor number 2 to the second + recognized camera and use auto for the first one and for every + other camera. Default: -1 ------------------------------------------------------------------------------- Name: force_munmap Type: bool array (min = 0, max = 64) Syntax: <0|1[,...]> Description: Force the application to unmap previously mapped buffer memory - before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not - all the applications support this feature. This parameter is - specific for each detected camera. - 0 = do not force memory unmapping - 1 = force memory unmapping (save memory) + before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not + all the applications support this feature. This parameter is + specific for each detected camera. + 0 = do not force memory unmapping + 1 = force memory unmapping (save memory) Default: 0 ------------------------------------------------------------------------------- Name: frame_timeout Type: uint array (min = 0, max = 64) Syntax: <n[,...]> Description: Timeout for a video frame in seconds. This parameter is - specific for each detected camera. This parameter can be - changed at runtime thanks to the /sys filesystem interface. + specific for each detected camera. This parameter can be + changed at runtime thanks to the /sys filesystem interface. Default: 2 ------------------------------------------------------------------------------- Name: debug Type: ushort Syntax: <n> Description: Debugging information level, from 0 to 3: - 0 = none (use carefully) - 1 = critical errors - 2 = significant informations - 3 = more verbose messages - Level 3 is useful for testing only, when only one device - is used. It also shows some more informations about the - hardware being detected. This parameter can be changed at - runtime thanks to the /sys filesystem interface. + 0 = none (use carefully) + 1 = critical errors + 2 = significant informations + 3 = more verbose messages + Level 3 is useful for testing only, when only one device + is used. It also shows some more informations about the + hardware being detected. This parameter can be changed at + runtime thanks to the /sys filesystem interface. Default: 2 ------------------------------------------------------------------------------- @@ -280,24 +280,24 @@ Byte # Value Description 0x04 0xC4 Frame synchronisation pattern. 0x05 0x96 Frame synchronisation pattern. 0x06 0xXX Unknown meaning. The exact value depends on the chip; - possible values are 0x00, 0x01 and 0x20. + possible values are 0x00, 0x01 and 0x20. 0x07 0xXX Variable value, whose bits are ff00uzzc, where ff is a - frame counter, u is unknown, zz is a size indicator - (00 = VGA, 01 = SIF, 10 = QSIF) and c stands for - "compression enabled" (1 = yes, 0 = no). + frame counter, u is unknown, zz is a size indicator + (00 = VGA, 01 = SIF, 10 = QSIF) and c stands for + "compression enabled" (1 = yes, 0 = no). 0x08 0xXX Brightness sum inside Auto-Exposure area (low-byte). 0x09 0xXX Brightness sum inside Auto-Exposure area (high-byte). - For a pure white image, this number will be equal to 500 - times the area of the specified AE area. For images - that are not pure white, the value scales down according - to relative whiteness. + For a pure white image, this number will be equal to 500 + times the area of the specified AE area. For images + that are not pure white, the value scales down according + to relative whiteness. 0x0A 0xXX Brightness sum outside Auto-Exposure area (low-byte). 0x0B 0xXX Brightness sum outside Auto-Exposure area (high-byte). - For a pure white image, this number will be equal to 125 - times the area outside of the specified AE area. For - images that are not pure white, the value scales down - according to relative whiteness. - according to relative whiteness. + For a pure white image, this number will be equal to 125 + times the area outside of the specified AE area. For + images that are not pure white, the value scales down + according to relative whiteness. + according to relative whiteness. The following bytes are used by the SN9C103 bridge only: diff --git a/Documentation/video4linux/v4lgrab.c b/Documentation/video4linux/v4lgrab.c new file mode 100644 index 000000000000..079b628481cf --- /dev/null +++ b/Documentation/video4linux/v4lgrab.c @@ -0,0 +1,192 @@ +/* Simple Video4Linux image grabber. */ +/* + * Video4Linux Driver Test/Example Framegrabbing Program + * + * Compile with: + * gcc -s -Wall -Wstrict-prototypes v4lgrab.c -o v4lgrab + * Use as: + * v4lgrab >image.ppm + * + * Copyright (C) 1998-05-03, Phil Blundell <philb@gnu.org> + * Copied from http://www.tazenda.demon.co.uk/phil/vgrabber.c + * with minor modifications (Dave Forrest, drf5n@virginia.edu). + * + */ + +#include <unistd.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> +#include <stdio.h> +#include <sys/ioctl.h> +#include <stdlib.h> + +#include <linux/types.h> +#include <linux/videodev.h> + +#define FILE "/dev/video0" + +/* Stole this from tvset.c */ + +#define READ_VIDEO_PIXEL(buf, format, depth, r, g, b) \ +{ \ + switch (format) \ + { \ + case VIDEO_PALETTE_GREY: \ + switch (depth) \ + { \ + case 4: \ + case 6: \ + case 8: \ + (r) = (g) = (b) = (*buf++ << 8);\ + break; \ + \ + case 16: \ + (r) = (g) = (b) = \ + *((unsigned short *) buf); \ + buf += 2; \ + break; \ + } \ + break; \ + \ + \ + case VIDEO_PALETTE_RGB565: \ + { \ + unsigned short tmp = *(unsigned short *)buf; \ + (r) = tmp&0xF800; \ + (g) = (tmp<<5)&0xFC00; \ + (b) = (tmp<<11)&0xF800; \ + buf += 2; \ + } \ + break; \ + \ + case VIDEO_PALETTE_RGB555: \ + (r) = (buf[0]&0xF8)<<8; \ + (g) = ((buf[0] << 5 | buf[1] >> 3)&0xF8)<<8; \ + (b) = ((buf[1] << 2 ) & 0xF8)<<8; \ + buf += 2; \ + break; \ + \ + case VIDEO_PALETTE_RGB24: \ + (r) = buf[0] << 8; (g) = buf[1] << 8; \ + (b) = buf[2] << 8; \ + buf += 3; \ + break; \ + \ + default: \ + fprintf(stderr, \ + "Format %d not yet supported\n", \ + format); \ + } \ +} + +int get_brightness_adj(unsigned char *image, long size, int *brightness) { + long i, tot = 0; + for (i=0;i<size*3;i++) + tot += image[i]; + *brightness = (128 - tot/(size*3))/3; + return !((tot/(size*3)) >= 126 && (tot/(size*3)) <= 130); +} + +int main(int argc, char ** argv) +{ + int fd = open(FILE, O_RDONLY), f; + struct video_capability cap; + struct video_window win; + struct video_picture vpic; + + unsigned char *buffer, *src; + int bpp = 24, r, g, b; + unsigned int i, src_depth; + + if (fd < 0) { + perror(FILE); + exit(1); + } + + if (ioctl(fd, VIDIOCGCAP, &cap) < 0) { + perror("VIDIOGCAP"); + fprintf(stderr, "(" FILE " not a video4linux device?)\n"); + close(fd); + exit(1); + } + + if (ioctl(fd, VIDIOCGWIN, &win) < 0) { + perror("VIDIOCGWIN"); + close(fd); + exit(1); + } + + if (ioctl(fd, VIDIOCGPICT, &vpic) < 0) { + perror("VIDIOCGPICT"); + close(fd); + exit(1); + } + + if (cap.type & VID_TYPE_MONOCHROME) { + vpic.depth=8; + vpic.palette=VIDEO_PALETTE_GREY; /* 8bit grey */ + if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { + vpic.depth=6; + if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { + vpic.depth=4; + if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { + fprintf(stderr, "Unable to find a supported capture format.\n"); + close(fd); + exit(1); + } + } + } + } else { + vpic.depth=24; + vpic.palette=VIDEO_PALETTE_RGB24; + + if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { + vpic.palette=VIDEO_PALETTE_RGB565; + vpic.depth=16; + + if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) { + vpic.palette=VIDEO_PALETTE_RGB555; + vpic.depth=15; + + if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) { + fprintf(stderr, "Unable to find a supported capture format.\n"); + return -1; + } + } + } + } + + buffer = malloc(win.width * win.height * bpp); + if (!buffer) { + fprintf(stderr, "Out of memory.\n"); + exit(1); + } + + do { + int newbright; + read(fd, buffer, win.width * win.height * bpp); + f = get_brightness_adj(buffer, win.width * win.height, &newbright); + if (f) { + vpic.brightness += (newbright << 8); + if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) { + perror("VIDIOSPICT"); + break; + } + } + } while (f); + + fprintf(stdout, "P6\n%d %d 255\n", win.width, win.height); + + src = buffer; + + for (i = 0; i < win.width * win.height; i++) { + READ_VIDEO_PIXEL(src, vpic.palette, src_depth, r, g, b); + fputc(r>>8, stdout); + fputc(g>>8, stdout); + fputc(b>>8, stdout); + } + + close(fd); + return 0; +} diff --git a/Documentation/video4linux/w9968cf.txt b/Documentation/video4linux/w9968cf.txt index 3b704f2aae6d..0d53ce774b01 100644 --- a/Documentation/video4linux/w9968cf.txt +++ b/Documentation/video4linux/w9968cf.txt @@ -1,9 +1,9 @@ - W996[87]CF JPEG USB Dual Mode Camera Chip - Driver for Linux 2.6 (basic version) - ========================================= + W996[87]CF JPEG USB Dual Mode Camera Chip + Driver for Linux 2.6 (basic version) + ========================================= - - Documentation - + - Documentation - Index @@ -188,57 +188,57 @@ Name: ovmod_load Type: bool Syntax: <0|1> Description: Automatic 'ovcamchip' module loading: 0 disabled, 1 enabled. - If enabled, 'insmod' searches for the required 'ovcamchip' - module in the system, according to its configuration, and - loads that module automatically. This action is performed as - once soon as the 'w9968cf' module is loaded into memory. + If enabled, 'insmod' searches for the required 'ovcamchip' + module in the system, according to its configuration, and + loads that module automatically. This action is performed as + once soon as the 'w9968cf' module is loaded into memory. Default: 1 Note: The kernel must be compiled with the CONFIG_KMOD option - enabled for the 'ovcamchip' module to be loaded and for - this parameter to be present. + enabled for the 'ovcamchip' module to be loaded and for + this parameter to be present. ------------------------------------------------------------------------------- Name: simcams Type: int Syntax: <n> Description: Number of cameras allowed to stream simultaneously. - n may vary from 0 to 32. + n may vary from 0 to 32. Default: 32 ------------------------------------------------------------------------------- Name: video_nr Type: int array (min = 0, max = 32) Syntax: <-1|n[,...]> Description: Specify V4L minor mode number. - -1 = use next available - n = use minor number n - You can specify up to 32 cameras this way. - For example: - video_nr=-1,2,-1 would assign minor number 2 to the second - recognized camera and use auto for the first one and for every - other camera. + -1 = use next available + n = use minor number n + You can specify up to 32 cameras this way. + For example: + video_nr=-1,2,-1 would assign minor number 2 to the second + recognized camera and use auto for the first one and for every + other camera. Default: -1 ------------------------------------------------------------------------------- Name: packet_size Type: int array (min = 0, max = 32) Syntax: <n[,...]> Description: Specify the maximum data payload size in bytes for alternate - settings, for each device. n is scaled between 63 and 1023. + settings, for each device. n is scaled between 63 and 1023. Default: 1023 ------------------------------------------------------------------------------- Name: max_buffers Type: int array (min = 0, max = 32) Syntax: <n[,...]> Description: For advanced users. - Specify the maximum number of video frame buffers to allocate - for each device, from 2 to 32. + Specify the maximum number of video frame buffers to allocate + for each device, from 2 to 32. Default: 2 ------------------------------------------------------------------------------- Name: double_buffer Type: bool array (min = 0, max = 32) Syntax: <0|1[,...]> Description: Hardware double buffering: 0 disabled, 1 enabled. - It should be enabled if you want smooth video output: if you - obtain out of sync. video, disable it, or try to - decrease the 'clockdiv' module parameter value. + It should be enabled if you want smooth video output: if you + obtain out of sync. video, disable it, or try to + decrease the 'clockdiv' module parameter value. Default: 1 for every device. ------------------------------------------------------------------------------- Name: clamping @@ -251,9 +251,9 @@ Name: filter_type Type: int array (min = 0, max = 32) Syntax: <0|1|2[,...]> Description: Video filter type. - 0 none, 1 (1-2-1) 3-tap filter, 2 (2-3-6-3-2) 5-tap filter. - The filter is used to reduce noise and aliasing artifacts - produced by the CCD or CMOS image sensor. + 0 none, 1 (1-2-1) 3-tap filter, 2 (2-3-6-3-2) 5-tap filter. + The filter is used to reduce noise and aliasing artifacts + produced by the CCD or CMOS image sensor. Default: 0 for every device. ------------------------------------------------------------------------------- Name: largeview @@ -266,9 +266,9 @@ Name: upscaling Type: bool array (min = 0, max = 32) Syntax: <0|1[,...]> Description: Software scaling (for non-compressed video only): - 0 disabled, 1 enabled. - Disable it if you have a slow CPU or you don't have enough - memory. + 0 disabled, 1 enabled. + Disable it if you have a slow CPU or you don't have enough + memory. Default: 0 for every device. Note: If 'w9968cf-vpp' is not present, this parameter is set to 0. ------------------------------------------------------------------------------- @@ -276,36 +276,36 @@ Name: decompression Type: int array (min = 0, max = 32) Syntax: <0|1|2[,...]> Description: Software video decompression: - 0 = disables decompression - (doesn't allow formats needing decompression). - 1 = forces decompression - (allows formats needing decompression only). - 2 = allows any permitted formats. - Formats supporting (de)compressed video are YUV422P and - YUV420P/YUV420 in any resolutions where width and height are - multiples of 16. + 0 = disables decompression + (doesn't allow formats needing decompression). + 1 = forces decompression + (allows formats needing decompression only). + 2 = allows any permitted formats. + Formats supporting (de)compressed video are YUV422P and + YUV420P/YUV420 in any resolutions where width and height are + multiples of 16. Default: 2 for every device. Note: If 'w9968cf-vpp' is not present, forcing decompression is not - allowed; in this case this parameter is set to 2. + allowed; in this case this parameter is set to 2. ------------------------------------------------------------------------------- Name: force_palette Type: int array (min = 0, max = 32) Syntax: <0|9|10|13|15|8|7|1|6|3|4|5[,...]> Description: Force picture palette. - In order: - 0 = Off - allows any of the following formats: - 9 = UYVY 16 bpp - Original video, compression disabled - 10 = YUV420 12 bpp - Original video, compression enabled - 13 = YUV422P 16 bpp - Original video, compression enabled - 15 = YUV420P 12 bpp - Original video, compression enabled - 8 = YUVY 16 bpp - Software conversion from UYVY - 7 = YUV422 16 bpp - Software conversion from UYVY - 1 = GREY 8 bpp - Software conversion from UYVY - 6 = RGB555 16 bpp - Software conversion from UYVY - 3 = RGB565 16 bpp - Software conversion from UYVY - 4 = RGB24 24 bpp - Software conversion from UYVY - 5 = RGB32 32 bpp - Software conversion from UYVY - When not 0, this parameter will override 'decompression'. + In order: + 0 = Off - allows any of the following formats: + 9 = UYVY 16 bpp - Original video, compression disabled + 10 = YUV420 12 bpp - Original video, compression enabled + 13 = YUV422P 16 bpp - Original video, compression enabled + 15 = YUV420P 12 bpp - Original video, compression enabled + 8 = YUVY 16 bpp - Software conversion from UYVY + 7 = YUV422 16 bpp - Software conversion from UYVY + 1 = GREY 8 bpp - Software conversion from UYVY + 6 = RGB555 16 bpp - Software conversion from UYVY + 3 = RGB565 16 bpp - Software conversion from UYVY + 4 = RGB24 24 bpp - Software conversion from UYVY + 5 = RGB32 32 bpp - Software conversion from UYVY + When not 0, this parameter will override 'decompression'. Default: 0 for every device. Initial palette is 9 (UYVY). Note: If 'w9968cf-vpp' is not present, this parameter is set to 9. ------------------------------------------------------------------------------- @@ -313,77 +313,77 @@ Name: force_rgb Type: bool array (min = 0, max = 32) Syntax: <0|1[,...]> Description: Read RGB video data instead of BGR: - 1 = use RGB component ordering. - 0 = use BGR component ordering. - This parameter has effect when using RGBX palettes only. + 1 = use RGB component ordering. + 0 = use BGR component ordering. + This parameter has effect when using RGBX palettes only. Default: 0 for every device. ------------------------------------------------------------------------------- Name: autobright Type: bool array (min = 0, max = 32) Syntax: <0|1[,...]> Description: Image sensor automatically changes brightness: - 0 = no, 1 = yes + 0 = no, 1 = yes Default: 0 for every device. ------------------------------------------------------------------------------- Name: autoexp Type: bool array (min = 0, max = 32) Syntax: <0|1[,...]> Description: Image sensor automatically changes exposure: - 0 = no, 1 = yes + 0 = no, 1 = yes Default: 1 for every device. ------------------------------------------------------------------------------- Name: lightfreq Type: int array (min = 0, max = 32) Syntax: <50|60[,...]> Description: Light frequency in Hz: - 50 for European and Asian lighting, 60 for American lighting. + 50 for European and Asian lighting, 60 for American lighting. Default: 50 for every device. ------------------------------------------------------------------------------- Name: bandingfilter Type: bool array (min = 0, max = 32) Syntax: <0|1[,...]> Description: Banding filter to reduce effects of fluorescent - lighting: - 0 disabled, 1 enabled. - This filter tries to reduce the pattern of horizontal - light/dark bands caused by some (usually fluorescent) lighting. + lighting: + 0 disabled, 1 enabled. + This filter tries to reduce the pattern of horizontal + light/dark bands caused by some (usually fluorescent) lighting. Default: 0 for every device. ------------------------------------------------------------------------------- Name: clockdiv Type: int array (min = 0, max = 32) Syntax: <-1|n[,...]> Description: Force pixel clock divisor to a specific value (for experts): - n may vary from 0 to 127. - -1 for automatic value. - See also the 'double_buffer' module parameter. + n may vary from 0 to 127. + -1 for automatic value. + See also the 'double_buffer' module parameter. Default: -1 for every device. ------------------------------------------------------------------------------- Name: backlight Type: bool array (min = 0, max = 32) Syntax: <0|1[,...]> Description: Objects are lit from behind: - 0 = no, 1 = yes + 0 = no, 1 = yes Default: 0 for every device. ------------------------------------------------------------------------------- Name: mirror Type: bool array (min = 0, max = 32) Syntax: <0|1[,...]> Description: Reverse image horizontally: - 0 = no, 1 = yes + 0 = no, 1 = yes Default: 0 for every device. ------------------------------------------------------------------------------- Name: monochrome Type: bool array (min = 0, max = 32) Syntax: <0|1[,...]> Description: The image sensor is monochrome: - 0 = no, 1 = yes + 0 = no, 1 = yes Default: 0 for every device. ------------------------------------------------------------------------------- Name: brightness Type: long array (min = 0, max = 32) Syntax: <n[,...]> Description: Set picture brightness (0-65535). - This parameter has no effect if 'autobright' is enabled. + This parameter has no effect if 'autobright' is enabled. Default: 31000 for every device. ------------------------------------------------------------------------------- Name: hue @@ -414,23 +414,23 @@ Name: debug Type: int Syntax: <n> Description: Debugging information level, from 0 to 6: - 0 = none (use carefully) - 1 = critical errors - 2 = significant informations - 3 = configuration or general messages - 4 = warnings - 5 = called functions - 6 = function internals - Level 5 and 6 are useful for testing only, when only one - device is used. + 0 = none (use carefully) + 1 = critical errors + 2 = significant informations + 3 = configuration or general messages + 4 = warnings + 5 = called functions + 6 = function internals + Level 5 and 6 are useful for testing only, when only one + device is used. Default: 2 ------------------------------------------------------------------------------- Name: specific_debug Type: bool Syntax: <0|1> Description: Enable or disable specific debugging messages: - 0 = print messages concerning every level <= 'debug' level. - 1 = print messages concerning the level indicated by 'debug'. + 0 = print messages concerning every level <= 'debug' level. + 1 = print messages concerning the level indicated by 'debug'. Default: 0 ------------------------------------------------------------------------------- diff --git a/Documentation/video4linux/zc0301.txt b/Documentation/video4linux/zc0301.txt index f55262c6733b..f406f5e80046 100644 --- a/Documentation/video4linux/zc0301.txt +++ b/Documentation/video4linux/zc0301.txt @@ -1,9 +1,9 @@ - ZC0301 Image Processor and Control Chip - Driver for Linux - ======================================= + ZC0301 and ZC0301P Image Processor and Control Chip + Driver for Linux + =================================================== - - Documentation - + - Documentation - Index @@ -51,13 +51,13 @@ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 4. Overview and features ======================== -This driver supports the video interface of the devices mounting the ZC0301 -Image Processor and Control Chip. +This driver supports the video interface of the devices mounting the ZC0301 or +ZC0301P Image Processors and Control Chips. The driver relies on the Video4Linux2 and USB core modules. It has been designed to run properly on SMP systems as well. -The latest version of the ZC0301 driver can be found at the following URL: +The latest version of the ZC0301[P] driver can be found at the following URL: http://www.linux-projects.org/ Some of the features of the driver are: @@ -117,7 +117,7 @@ supported by the USB Audio driver thanks to the ALSA API: And finally: - # USB Multimedia devices + # V4L USB devices # CONFIG_USB_ZC0301=m @@ -146,46 +146,46 @@ Name: video_nr Type: short array (min = 0, max = 64) Syntax: <-1|n[,...]> Description: Specify V4L2 minor mode number: - -1 = use next available - n = use minor number n - You can specify up to 64 cameras this way. - For example: - video_nr=-1,2,-1 would assign minor number 2 to the second - registered camera and use auto for the first one and for every - other camera. + -1 = use next available + n = use minor number n + You can specify up to 64 cameras this way. + For example: + video_nr=-1,2,-1 would assign minor number 2 to the second + registered camera and use auto for the first one and for every + other camera. Default: -1 ------------------------------------------------------------------------------- Name: force_munmap Type: bool array (min = 0, max = 64) Syntax: <0|1[,...]> Description: Force the application to unmap previously mapped buffer memory - before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not - all the applications support this feature. This parameter is - specific for each detected camera. - 0 = do not force memory unmapping - 1 = force memory unmapping (save memory) + before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not + all the applications support this feature. This parameter is + specific for each detected camera. + 0 = do not force memory unmapping + 1 = force memory unmapping (save memory) Default: 0 ------------------------------------------------------------------------------- Name: frame_timeout Type: uint array (min = 0, max = 64) Syntax: <n[,...]> Description: Timeout for a video frame in seconds. This parameter is - specific for each detected camera. This parameter can be - changed at runtime thanks to the /sys filesystem interface. + specific for each detected camera. This parameter can be + changed at runtime thanks to the /sys filesystem interface. Default: 2 ------------------------------------------------------------------------------- Name: debug Type: ushort Syntax: <n> Description: Debugging information level, from 0 to 3: - 0 = none (use carefully) - 1 = critical errors - 2 = significant informations - 3 = more verbose messages - Level 3 is useful for testing only, when only one device - is used at the same time. It also shows some more informations - about the hardware being detected. This module parameter can be - changed at runtime thanks to the /sys filesystem interface. + 0 = none (use carefully) + 1 = critical errors + 2 = significant informations + 3 = more verbose messages + Level 3 is useful for testing only, when only one device + is used at the same time. It also shows some more informations + about the hardware being detected. This module parameter can be + changed at runtime thanks to the /sys filesystem interface. Default: 2 ------------------------------------------------------------------------------- @@ -204,11 +204,25 @@ Vendor ID Product ID 0x041e 0x4017 0x041e 0x401c 0x041e 0x401e +0x041e 0x401f +0x041e 0x4022 0x041e 0x4034 0x041e 0x4035 +0x041e 0x4036 +0x041e 0x403a +0x0458 0x7007 +0x0458 0x700C +0x0458 0x700f +0x046d 0x08ae +0x055f 0xd003 +0x055f 0xd004 0x046d 0x08ae 0x0ac8 0x0301 +0x0ac8 0x301b +0x0ac8 0x303b +0x10fd 0x0128 0x10fd 0x8050 +0x10fd 0x804e The list above does not imply that all those devices work with this driver: up until now only the ones that mount the following image sensors are supported; @@ -217,6 +231,7 @@ kernel messages will always tell you whether this is the case: Model Manufacturer ----- ------------ PAS202BCB PixArt Imaging, Inc. +PB-0330 Photobit Corporation 9. Notes for V4L2 application developers @@ -250,5 +265,6 @@ the fingerprint is: '88E8 F32F 7244 68BA 3958 5D40 99DA 5D2A FCE6 35A4'. been taken from the documentation of the ZC030x Video4Linux1 driver written by Andrew Birkett <andy@nobugs.org>; - The initialization values of the ZC0301 controller connected to the PAS202BCB - image sensor have been taken from the SPCA5XX driver maintained by - Michel Xhaard <mxhaard@magic.fr>. + and PB-0330 image sensors have been taken from the SPCA5XX driver maintained + by Michel Xhaard <mxhaard@magic.fr>; +- Stanislav Lechev donated one camera. diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration index 0dd4ef30c361..99f89aa10169 100644 --- a/Documentation/vm/page_migration +++ b/Documentation/vm/page_migration @@ -26,8 +26,13 @@ a process are located. See also the numa_maps manpage in the numactl package. Manual migration is useful if for example the scheduler has relocated a process to a processor on a distant node. A batch scheduler or an administrator may detect the situation and move the pages of the process -nearer to the new processor. At some point in the future we may have -some mechanism in the scheduler that will automatically move the pages. +nearer to the new processor. The kernel itself does only provide +manual page migration support. Automatic page migration may be implemented +through user space processes that move pages. A special function call +"move_pages" allows the moving of individual pages within a process. +A NUMA profiler may f.e. obtain a log showing frequent off node +accesses and may use the result to move pages to more advantageous +locations. Larger installations usually partition the system using cpusets into sections of nodes. Paul Jackson has equipped cpusets with the ability to @@ -62,22 +67,14 @@ A. In kernel use of migrate_pages() It also prevents the swapper or other scans to encounter the page. -2. Generate a list of newly allocates page. These pages will contain the - contents of the pages from the first list after page migration is - complete. +2. We need to have a function of type new_page_t that can be + passed to migrate_pages(). This function should figure out + how to allocate the correct new page given the old page. 3. The migrate_pages() function is called which attempts - to do the migration. It returns the moved pages in the - list specified as the third parameter and the failed - migrations in the fourth parameter. The first parameter - will contain the pages that could still be retried. - -4. The leftover pages of various types are returned - to the LRU using putback_to_lru_pages() or otherwise - disposed of. The pages will still have the refcount as - increased by isolate_lru_pages() if putback_to_lru_pages() is not - used! The kernel may want to handle the various cases of failures in - different ways. + to do the migration. It will call the function to allocate + the new page for each page that is considered for + moving. B. How migrate_pages() works ---------------------------- @@ -93,83 +90,58 @@ Steps: 2. Insure that writeback is complete. -3. Make sure that the page has assigned swap cache entry if - it is an anonyous page. The swap cache reference is necessary - to preserve the information contain in the page table maps while - page migration occurs. - -4. Prep the new page that we want to move to. It is locked +3. Prep the new page that we want to move to. It is locked and set to not being uptodate so that all accesses to the new page immediately lock while the move is in progress. -5. All the page table references to the page are either dropped (file - backed pages) or converted to swap references (anonymous pages). - This should decrease the reference count. +4. The new page is prepped with some settings from the old page so that + accesses to the new page will discover a page with the correct settings. + +5. All the page table references to the page are converted + to migration entries or dropped (nonlinear vmas). + This decrease the mapcount of a page. If the resulting + mapcount is not zero then we do not migrate the page. + All user space processes that attempt to access the page + will now wait on the page lock. 6. The radix tree lock is taken. This will cause all processes trying - to reestablish a pte to block on the radix tree spinlock. + to access the page via the mapping to block on the radix tree spinlock. 7. The refcount of the page is examined and we back out if references remain otherwise we know that we are the only one referencing this page. 8. The radix tree is checked and if it does not contain the pointer to this - page then we back out because someone else modified the mapping first. - -9. The mapping is checked. If the mapping is gone then a truncate action may - be in progress and we back out. - -10. The new page is prepped with some settings from the old page so that - accesses to the new page will be discovered to have the correct settings. + page then we back out because someone else modified the radix tree. -11. The radix tree is changed to point to the new page. +9. The radix tree is changed to point to the new page. -12. The reference count of the old page is dropped because the radix tree - reference is gone. +10. The reference count of the old page is dropped because the radix tree + reference is gone. A reference to the new page is established because + the new page is referenced to by the radix tree. -13. The radix tree lock is dropped. With that lookups become possible again - and other processes will move from spinning on the tree lock to sleeping on - the locked new page. +11. The radix tree lock is dropped. With that lookups in the mapping + become possible again. Processes will move from spinning on the tree_lock + to sleeping on the locked new page. -14. The page contents are copied to the new page. +12. The page contents are copied to the new page. -15. The remaining page flags are copied to the new page. +13. The remaining page flags are copied to the new page. -16. The old page flags are cleared to indicate that the page does - not use any information anymore. +14. The old page flags are cleared to indicate that the page does + not provide any information anymore. -17. Queued up writeback on the new page is triggered. +15. Queued up writeback on the new page is triggered. -18. If swap pte's were generated for the page then replace them with real - ptes. This will reenable access for processes not blocked by the page lock. +16. If migration entries were page then replace them with real ptes. Doing + so will enable access for user space processes not already waiting for + the page lock. 19. The page locks are dropped from the old and new page. - Processes waiting on the page lock can continue. + Processes waiting on the page lock will redo their page faults + and will reach the new page. 20. The new page is moved to the LRU and can be scanned by the swapper etc again. -TODO list ---------- - -- Page migration requires the use of swap handles to preserve the - information of the anonymous page table entries. This means that swap - space is reserved but never used. The maximum number of swap handles used - is determined by CHUNK_SIZE (see mm/mempolicy.c) per ongoing migration. - Reservation of pages could be avoided by having a special type of swap - handle that does not require swap space and that would only track the page - references. Something like that was proposed by Marcelo Tosatti in the - past (search for migration cache on lkml or linux-mm@kvack.org). - -- Page migration unmaps ptes for file backed pages and requires page - faults to reestablish these ptes. This could be optimized by somehow - recording the references before migration and then reestablish them later. - However, there are several locking challenges that have to be overcome - before this is possible. - -- Page migration generates read ptes for anonymous pages. Dirty page - faults are required to make the pages writable again. It may be possible - to generate a pte marked dirty if it is known that the page is dirty and - that this process has the only reference to that page. - -Christoph Lameter, March 8, 2006. +Christoph Lameter, May 8, 2006. diff --git a/Documentation/w1/masters/ds2490 b/Documentation/w1/masters/ds2490 new file mode 100644 index 000000000000..44a4918bd7f2 --- /dev/null +++ b/Documentation/w1/masters/ds2490 @@ -0,0 +1,18 @@ +Kernel driver ds2490 +==================== + +Supported chips: + * Maxim DS2490 based + +Author: Evgeniy Polyakov <johnpol@2ka.mipt.ru> + + +Description +----------- + +The Maixm/Dallas Semiconductor DS2490 is a chip +which allows to build USB <-> W1 bridges. + +DS9490(R) is a USB <-> W1 bus master device +which has 0x81 family ID integrated chip and DS2490 +low-level operational chip. diff --git a/Documentation/w1/w1.generic b/Documentation/w1/w1.generic index f937fbe1cacb..4c6509dd4789 100644 --- a/Documentation/w1/w1.generic +++ b/Documentation/w1/w1.generic @@ -27,8 +27,19 @@ When a w1 master driver registers with the w1 subsystem, the following occurs: When a device is found on the bus, w1 core checks if driver for it's family is loaded. If so, the family driver is attached to the slave. -If there is no driver for the family, a simple sysfs entry is created -for the slave device. +If there is no driver for the family, default one is assigned, which allows to perform +almost any kind of operations. Each logical operation is a transaction +in nature, which can contain several (two or one) low-level operations. +Let's see how one can read EEPROM context: +1. one must write control buffer, i.e. buffer containing command byte +and two byte address. At this step bus is reset and appropriate device +is selected using either W1_SKIP_ROM or W1_MATCH_ROM command. +Then provided control buffer is being written to the wire. +2. reading. This will issue reading eeprom response. + +It is possible that between 1. and 2. w1 master thread will reset bus for searching +and slave device will be even removed, but in this case 0xff will +be read, since no device was selected. W1 device families @@ -89,4 +100,5 @@ driver - (standard) symlink to the w1 driver name - the device name, usually the same as the directory name w1_slave - (optional) a binary file whose meaning depends on the family driver - +rw - (optional) created for slave devices which do not have + appropriate family driver. Allows to read/write binary data. diff --git a/Documentation/w1/w1.netlink b/Documentation/w1/w1.netlink new file mode 100644 index 000000000000..3640c7c87d45 --- /dev/null +++ b/Documentation/w1/w1.netlink @@ -0,0 +1,98 @@ +Userspace communication protocol over connector [1]. + + +Message types. +============= + +There are three types of messages between w1 core and userspace: +1. Events. They are generated each time new master or slave device found + either due to automatic or requested search. +2. Userspace commands. Includes read/write and search/alarm search comamnds. +3. Replies to userspace commands. + + +Protocol. +======== + +[struct cn_msg] - connector header. It's length field is equal to size of the attached data. +[struct w1_netlink_msg] - w1 netlink header. + __u8 type - message type. + W1_SLAVE_ADD/W1_SLAVE_REMOVE - slave add/remove events. + W1_MASTER_ADD/W1_MASTER_REMOVE - master add/remove events. + W1_MASTER_CMD - userspace command for bus master device (search/alarm search). + W1_SLAVE_CMD - userspace command for slave device (read/write/ search/alarm search + for bus master device where given slave device found). + __u8 res - reserved + __u16 len - size of attached to this header data. + union { + __u8 id; - slave unique device id + struct w1_mst { + __u32 id; - master's id. + __u32 res; - reserved + } mst; + } id; + +[strucrt w1_netlink_cmd] - command for gived master or slave device. + __u8 cmd - command opcode. + W1_CMD_READ - read command. + W1_CMD_WRITE - write command. + W1_CMD_SEARCH - search command. + W1_CMD_ALARM_SEARCH - alarm search command. + __u8 res - reserved + __u16 len - length of data for this command. + For read command data must be allocated like for write command. + __u8 data[0] - data for this command. + + +Each connector message can include one or more w1_netlink_msg with zero of more attached w1_netlink_cmd messages. + +For event messages there are no w1_netlink_cmd embedded structures, only connector header +and w1_netlink_msg strucutre with "len" field being zero and filled type (one of event types) +and id - either 8 bytes of slave unique id in host order, or master's id, which is assigned +to bus master device when it is added to w1 core. + +Currently replies to userspace commands are only generated for read command request. +One reply is generated exactly for one w1_netlink_cmd read request. +Replies are not combined when sent - i.e. typical reply messages looks like the following: +[cn_msg][w1_netlink_msg][w1_netlink_cmd] +cn_msg.len = sizeof(struct w1_netlink_msg) + sizeof(struct w1_netlink_cmd) + cmd->len; +w1_netlink_msg.len = sizeof(struct w1_netlink_cmd) + cmd->len; +w1_netlink_cmd.len = cmd->len; + + +Operation steps in w1 core when new command is received. +======================================================= + +When new message (w1_netlink_msg) is received w1 core detects if it is master of slave request, +according to w1_netlink_msg.type field. +Then master or slave device is searched for. +When found, master device (requested or those one on where slave device is found) is locked. +If slave command is requested, then reset/select procedure is started to select given device. + +Then all requested in w1_netlink_msg operations are performed one by one. +If command requires reply (like read command) it is sent on command completion. + +When all commands (w1_netlink_cmd) are processed muster device is unlocked +and next w1_netlink_msg header processing started. + + +Connector [1] specific documentation. +==================================== + +Each connector message includes two u32 fields as "address". +w1 uses CN_W1_IDX and CN_W1_VAL defined in include/linux/connector.h header. +Each message also includes sequence and acknowledge numbers. +Sequence number for event messages is appropriate bus master sequence number increased with +each event message sent "through" this master. +Sequence number for userspace requests is set by userspace application. +Sequence number for reply is the same as was in request, and +acknowledge number is set to seq+1. + + +Additional documantion, source code examples. +============================================ + +1. Documentation/connector +2. http://tservice.net.ru/~s0mbre/archive/w1 +This archive includes userspace application w1d.c which +uses read/write/search commands for all master/slave devices found on the bus. diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt index f2cd6ef53ff3..6887d44d2661 100644 --- a/Documentation/x86_64/boot-options.txt +++ b/Documentation/x86_64/boot-options.txt @@ -205,6 +205,27 @@ IOMMU pages Prereserve that many 128K pages for the software IO bounce buffering. force Force all IO through the software TLB. + calgary=[64k,128k,256k,512k,1M,2M,4M,8M] + calgary=[translate_empty_slots] + calgary=[disable=<PCI bus number>] + + 64k,...,8M - Set the size of each PCI slot's translation table + when using the Calgary IOMMU. This is the size of the translation + table itself in main memory. The smallest table, 64k, covers an IO + space of 32MB; the largest, 8MB table, can cover an IO space of + 4GB. Normally the kernel will make the right choice by itself. + + translate_empty_slots - Enable translation even on slots that have + no devices attached to them, in case a device will be hotplugged + in the future. + + disable=<PCI bus number> - Disable translation on a given PHB. For + example, the built-in graphics adapter resides on the first bridge + (PCI bus number 0); if translation (isolation) is enabled on this + bridge, X servers that access the hardware directly from user + space might stop working. Use this option if you have devices that + are accessed from userspace directly on some PCI host bridge. + Debugging oops=panic Always panic on oopses. Default is to just kill the process, |