diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/kernel-parameters.txt | 8 | ||||
-rw-r--r-- | Documentation/x86_64/boot-options.txt | 132 | ||||
-rw-r--r-- | Documentation/x86_64/cpu-hotplug-spec | 2 | ||||
-rw-r--r-- | Documentation/x86_64/kernel-stacks | 26 | ||||
-rw-r--r-- | Documentation/x86_64/machinecheck | 70 | ||||
-rw-r--r-- | Documentation/x86_64/mm.txt | 22 |
6 files changed, 186 insertions, 74 deletions
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index d25acd51e181..22b19962a1a2 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -104,6 +104,9 @@ loader, and have no meaning to the kernel directly. Do not modify the syntax of boot loader parameters without extreme need or coordination with <Documentation/i386/boot.txt>. +There are also arch-specific kernel-parameters not documented here. +See for example <Documentation/x86_64/boot-options.txt>. + Note that ALL kernel parameters listed below are CASE SENSITIVE, and that a trailing = on the name of any parameter states that that parameter will be entered as an environment variable, whereas its absence indicates that @@ -361,6 +364,11 @@ and is between 256 and 4096 characters. It is defined in the file clocksource is not available, it defaults to PIT. Format: { pit | tsc | cyclone | pmtmr } + code_bytes [IA32] How many bytes of object code to print in an + oops report. + Range: 0 - 8192 + Default: 64 + disable_8254_timer enable_8254_timer [IA32/X86_64] Disable/Enable interrupt 0 timer routing diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt index 5c86ed6f0448..625a21db0c2a 100644 --- a/Documentation/x86_64/boot-options.txt +++ b/Documentation/x86_64/boot-options.txt @@ -180,40 +180,81 @@ PCI pci=lastbus=NUMBER Scan upto NUMBER busses, no matter what the mptable says. pci=noacpi Don't use ACPI to set up PCI interrupt routing. -IOMMU - - iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]][,merge] - [,forcesac][,fullflush][,nomerge][,noaperture][,calgary] - size set size of iommu (in bytes) - noagp don't initialize the AGP driver and use full aperture. - off don't use the IOMMU - leak turn on simple iommu leak tracing (only when CONFIG_IOMMU_LEAK is on) - memaper[=order] allocate an own aperture over RAM with size 32MB^order. - noforce don't force IOMMU usage. Default. - force Force IOMMU. - merge Do SG merging. Implies force (experimental) - nomerge Don't do SG merging. - forcesac For SAC mode for masks <40bits (experimental) - fullflush Flush IOMMU on each allocation (default) - nofullflush Don't use IOMMU fullflush - allowed overwrite iommu off workarounds for specific chipsets. - soft Use software bounce buffering (default for Intel machines) - noaperture Don't touch the aperture for AGP. - allowdac Allow DMA >4GB - When off all DMA over >4GB is forced through an IOMMU or bounce - buffering. - nodac Forbid DMA >4GB - panic Always panic when IOMMU overflows - calgary Use the Calgary IOMMU if it is available - - swiotlb=pages[,force] - - pages Prereserve that many 128K pages for the software IO bounce buffering. - force Force all IO through the software TLB. - - calgary=[64k,128k,256k,512k,1M,2M,4M,8M] - calgary=[translate_empty_slots] - calgary=[disable=<PCI bus number>] +IOMMU (input/output memory management unit) + + Currently four x86-64 PCI-DMA mapping implementations exist: + + 1. <arch/x86_64/kernel/pci-nommu.c>: use no hardware/software IOMMU at all + (e.g. because you have < 3 GB memory). + Kernel boot message: "PCI-DMA: Disabling IOMMU" + + 2. <arch/x86_64/kernel/pci-gart.c>: AMD GART based hardware IOMMU. + Kernel boot message: "PCI-DMA: using GART IOMMU" + + 3. <arch/x86_64/kernel/pci-swiotlb.c> : Software IOMMU implementation. Used + e.g. if there is no hardware IOMMU in the system and it is need because + you have >3GB memory or told the kernel to us it (iommu=soft)) + Kernel boot message: "PCI-DMA: Using software bounce buffering + for IO (SWIOTLB)" + + 4. <arch/x86_64/pci-calgary.c> : IBM Calgary hardware IOMMU. Used in IBM + pSeries and xSeries servers. This hardware IOMMU supports DMA address + mapping with memory protection, etc. + Kernel boot message: "PCI-DMA: Using Calgary IOMMU" + + iommu=[<size>][,noagp][,off][,force][,noforce][,leak[=<nr_of_leak_pages>] + [,memaper[=<order>]][,merge][,forcesac][,fullflush][,nomerge] + [,noaperture][,calgary] + + General iommu options: + off Don't initialize and use any kind of IOMMU. + noforce Don't force hardware IOMMU usage when it is not needed. + (default). + force Force the use of the hardware IOMMU even when it is + not actually needed (e.g. because < 3 GB memory). + soft Use software bounce buffering (SWIOTLB) (default for + Intel machines). This can be used to prevent the usage + of an available hardware IOMMU. + + iommu options only relevant to the AMD GART hardware IOMMU: + <size> Set the size of the remapping area in bytes. + allowed Overwrite iommu off workarounds for specific chipsets. + fullflush Flush IOMMU on each allocation (default). + nofullflush Don't use IOMMU fullflush. + leak Turn on simple iommu leak tracing (only when + CONFIG_IOMMU_LEAK is on). Default number of leak pages + is 20. + memaper[=<order>] Allocate an own aperture over RAM with size 32MB<<order. + (default: order=1, i.e. 64MB) + merge Do scatter-gather (SG) merging. Implies "force" + (experimental). + nomerge Don't do scatter-gather (SG) merging. + noaperture Ask the IOMMU not to touch the aperture for AGP. + forcesac Force single-address cycle (SAC) mode for masks <40bits + (experimental). + noagp Don't initialize the AGP driver and use full aperture. + allowdac Allow double-address cycle (DAC) mode, i.e. DMA >4GB. + DAC is used with 32-bit PCI to push a 64-bit address in + two cycles. When off all DMA over >4GB is forced through + an IOMMU or software bounce buffering. + nodac Forbid DAC mode, i.e. DMA >4GB. + panic Always panic when IOMMU overflows. + calgary Use the Calgary IOMMU if it is available + + iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU + implementation: + swiotlb=<pages>[,force] + <pages> Prereserve that many 128K pages for the software IO + bounce buffering. + force Force all IO through the software TLB. + + Settings for the IBM Calgary hardware IOMMU currently found in IBM + pSeries and xSeries machines: + + calgary=[64k,128k,256k,512k,1M,2M,4M,8M] + calgary=[translate_empty_slots] + calgary=[disable=<PCI bus number>] + panic Always panic when IOMMU overflows 64k,...,8M - Set the size of each PCI slot's translation table when using the Calgary IOMMU. This is the size of the translation @@ -234,14 +275,14 @@ IOMMU Debugging - oops=panic Always panic on oopses. Default is to just kill the process, - but there is a small probability of deadlocking the machine. - This will also cause panics on machine check exceptions. - Useful together with panic=30 to trigger a reboot. + oops=panic Always panic on oopses. Default is to just kill the process, + but there is a small probability of deadlocking the machine. + This will also cause panics on machine check exceptions. + Useful together with panic=30 to trigger a reboot. - kstack=N Print that many words from the kernel stack in oops dumps. + kstack=N Print N words from the kernel stack in oops dumps. - pagefaulttrace Dump all page faults. Only useful for extreme debugging + pagefaulttrace Dump all page faults. Only useful for extreme debugging and will create a lot of output. call_trace=[old|both|newfallback|new] @@ -251,15 +292,8 @@ Debugging newfallback: use new unwinder but fall back to old if it gets stuck (default) - call_trace=[old|both|newfallback|new] - old: use old inexact backtracer - new: use new exact dwarf2 unwinder - both: print entries from both - newfallback: use new unwinder but fall back to old if it gets - stuck (default) - -Misc +Miscellaneous noreplacement Don't replace instructions with more appropriate ones for the CPU. This may be useful on asymmetric MP systems - where some CPU have less capabilities than the others. + where some CPUs have less capabilities than others. diff --git a/Documentation/x86_64/cpu-hotplug-spec b/Documentation/x86_64/cpu-hotplug-spec index 5c0fa345e556..3c23e0587db3 100644 --- a/Documentation/x86_64/cpu-hotplug-spec +++ b/Documentation/x86_64/cpu-hotplug-spec @@ -2,7 +2,7 @@ Firmware support for CPU hotplug under Linux/x86-64 --------------------------------------------------- Linux/x86-64 supports CPU hotplug now. For various reasons Linux wants to -know in advance boot time the maximum number of CPUs that could be plugged +know in advance of boot time the maximum number of CPUs that could be plugged into the system. ACPI 3.0 currently has no official way to supply this information from the firmware to the operating system. diff --git a/Documentation/x86_64/kernel-stacks b/Documentation/x86_64/kernel-stacks index bddfddd466ab..5ad65d51fb95 100644 --- a/Documentation/x86_64/kernel-stacks +++ b/Documentation/x86_64/kernel-stacks @@ -9,9 +9,9 @@ zombie. While the thread is in user space the kernel stack is empty except for the thread_info structure at the bottom. In addition to the per thread stacks, there are specialized stacks -associated with each cpu. These stacks are only used while the kernel -is in control on that cpu, when a cpu returns to user space the -specialized stacks contain no useful data. The main cpu stacks is +associated with each CPU. These stacks are only used while the kernel +is in control on that CPU; when a CPU returns to user space the +specialized stacks contain no useful data. The main CPU stacks are: * Interrupt stack. IRQSTACKSIZE @@ -32,17 +32,17 @@ x86_64 also has a feature which is not available on i386, the ability to automatically switch to a new stack for designated events such as double fault or NMI, which makes it easier to handle these unusual events on x86_64. This feature is called the Interrupt Stack Table -(IST). There can be up to 7 IST entries per cpu. The IST code is an -index into the Task State Segment (TSS), the IST entries in the TSS -point to dedicated stacks, each stack can be a different size. +(IST). There can be up to 7 IST entries per CPU. The IST code is an +index into the Task State Segment (TSS). The IST entries in the TSS +point to dedicated stacks; each stack can be a different size. -An IST is selected by an non-zero value in the IST field of an +An IST is selected by a non-zero value in the IST field of an interrupt-gate descriptor. When an interrupt occurs and the hardware loads such a descriptor, the hardware automatically sets the new stack pointer based on the IST value, then invokes the interrupt handler. If software wants to allow nested IST interrupts then the handler must adjust the IST values on entry to and exit from the interrupt handler. -(this is occasionally done, e.g. for debug exceptions) +(This is occasionally done, e.g. for debug exceptions.) Events with different IST codes (i.e. with different stacks) can be nested. For example, a debug interrupt can safely be interrupted by an @@ -58,17 +58,17 @@ The currently assigned IST stacks are :- Used for interrupt 12 - Stack Fault Exception (#SS). - This allows to recover from invalid stack segments. Rarely + This allows the CPU to recover from invalid stack segments. Rarely happens. * DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE). Used for interrupt 8 - Double Fault Exception (#DF). - Invoked when handling a exception causes another exception. Happens - when the kernel is very confused (e.g. kernel stack pointer corrupt) - Using a separate stack allows to recover from it well enough in many - cases to still output an oops. + Invoked when handling one exception causes another exception. Happens + when the kernel is very confused (e.g. kernel stack pointer corrupt). + Using a separate stack allows the kernel to recover from it well enough + in many cases to still output an oops. * NMI_STACK. EXCEPTION_STKSZ (PAGE_SIZE). diff --git a/Documentation/x86_64/machinecheck b/Documentation/x86_64/machinecheck new file mode 100644 index 000000000000..068a6d9904b9 --- /dev/null +++ b/Documentation/x86_64/machinecheck @@ -0,0 +1,70 @@ + +Configurable sysfs parameters for the x86-64 machine check code. + +Machine checks report internal hardware error conditions detected +by the CPU. Uncorrected errors typically cause a machine check +(often with panic), corrected ones cause a machine check log entry. + +Machine checks are organized in banks (normally associated with +a hardware subsystem) and subevents in a bank. The exact meaning +of the banks and subevent is CPU specific. + +mcelog knows how to decode them. + +When you see the "Machine check errors logged" message in the system +log then mcelog should run to collect and decode machine check entries +from /dev/mcelog. Normally mcelog should be run regularly from a cronjob. + +Each CPU has a directory in /sys/devices/system/machinecheck/machinecheckN +(N = CPU number) + +The directory contains some configurable entries: + +Entries: + +bankNctl +(N bank number) + 64bit Hex bitmask enabling/disabling specific subevents for bank N + When a bit in the bitmask is zero then the respective + subevent will not be reported. + By default all events are enabled. + Note that BIOS maintain another mask to disable specific events + per bank. This is not visible here + +The following entries appear for each CPU, but they are truly shared +between all CPUs. + +check_interval + How often to poll for corrected machine check errors, in seconds + (Note output is hexademical). Default 5 minutes. + +tolerant + Tolerance level. When a machine check exception occurs for a non + corrected machine check the kernel can take different actions. + Since machine check exceptions can happen any time it is sometimes + risky for the kernel to kill a process because it defies + normal kernel locking rules. The tolerance level configures + how hard the kernel tries to recover even at some risk of deadlock. + + 0: always panic, + 1: panic if deadlock possible, + 2: try to avoid panic, + 3: never panic or exit (for testing only) + + Default: 1 + + Note this only makes a difference if the CPU allows recovery + from a machine check exception. Current x86 CPUs generally do not. + +trigger + Program to run when a machine check event is detected. + This is an alternative to running mcelog regularly from cron + and allows to detect events faster. + +TBD document entries for AMD threshold interrupt configuration + +For more details about the x86 machine check architecture +see the Intel and AMD architecture manuals from their developer websites. + +For more details about the architecture see +see http://one.firstfloor.org/~andi/mce.pdf diff --git a/Documentation/x86_64/mm.txt b/Documentation/x86_64/mm.txt index 133561b9cb0c..f42798ed1c54 100644 --- a/Documentation/x86_64/mm.txt +++ b/Documentation/x86_64/mm.txt @@ -3,26 +3,26 @@ Virtual memory map with 4 level page tables: -0000000000000000 - 00007fffffffffff (=47bits) user space, different per mm +0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm hole caused by [48:63] sign extension -ffff800000000000 - ffff80ffffffffff (=40bits) guard hole -ffff810000000000 - ffffc0ffffffffff (=46bits) direct mapping of all phys. memory -ffffc10000000000 - ffffc1ffffffffff (=40bits) hole -ffffc20000000000 - ffffe1ffffffffff (=45bits) vmalloc/ioremap space +ffff800000000000 - ffff80ffffffffff (=40 bits) guard hole +ffff810000000000 - ffffc0ffffffffff (=46 bits) direct mapping of all phys. memory +ffffc10000000000 - ffffc1ffffffffff (=40 bits) hole +ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space ... unused hole ... -ffffffff80000000 - ffffffff82800000 (=40MB) kernel text mapping, from phys 0 +ffffffff80000000 - ffffffff82800000 (=40 MB) kernel text mapping, from phys 0 ... unused hole ... -ffffffff88000000 - fffffffffff00000 (=1919MB) module mapping space +ffffffff88000000 - fffffffffff00000 (=1919 MB) module mapping space -The direct mapping covers all memory in the system upto the highest +The direct mapping covers all memory in the system up to the highest memory address (this means in some cases it can also include PCI memory -holes) +holes). vmalloc space is lazily synchronized into the different PML4 pages of the processes using the page fault handler, with init_level4_pgt as reference. -Current X86-64 implementations only support 40 bit of address space, -but we support upto 46bits. This expands into MBZ space in the page tables. +Current X86-64 implementations only support 40 bits of address space, +but we support up to 46 bits. This expands into MBZ space in the page tables. -Andi Kleen, Jul 2004 |