Diffstat (limited to 'Documentation/x86_64/machinecheck')
-rw-r--r-- | Documentation/x86_64/machinecheck | 77 |
1 files changed, 0 insertions, 77 deletions
diff --git a/Documentation/x86_64/machinecheck b/Documentation/x86_64/machinecheck
deleted file mode 100644
index a05e58e7b159..000000000000
--- a/Documentation/x86_64/machinecheck
+++ /dev/null
@@ -1,77 +0,0 @@
-
-Configurable sysfs parameters for the x86-64 machine check code.
-
-Machine checks report internal hardware error conditions detected
-by the CPU. Uncorrected errors typically cause a machine check
-(often with panic), corrected ones cause a machine check log entry.
-
-Machine checks are organized in banks (normally associated with
-a hardware subsystem) and subevents in a bank. The exact meaning
-of the banks and subevent is CPU specific.
-
-mcelog knows how to decode them.
-
-When you see the "Machine check errors logged" message in the system
-log then mcelog should run to collect and decode machine check entries
-from /dev/mcelog. Normally mcelog should be run regularly from a cronjob.
-
-Each CPU has a directory in /sys/devices/system/machinecheck/machinecheckN
-(N = CPU number)
-
-The directory contains some configurable entries:
-
-Entries:
-
-bankNctl
-(N bank number)
-	64bit Hex bitmask enabling/disabling specific subevents for bank N
-	When a bit in the bitmask is zero then the respective
-	subevent will not be reported.
-	By default all events are enabled.
-	Note that BIOS maintain another mask to disable specific events
-	per bank. This is not visible here
-
-The following entries appear for each CPU, but they are truly shared
-between all CPUs.
-
-check_interval
-	How often to poll for corrected machine check errors, in seconds
-	(Note output is hexademical). Default 5 minutes. When the poller
-	finds MCEs it triggers an exponential speedup (poll more often) on
-	the polling interval. When the poller stops finding MCEs, it
-	triggers an exponential backoff (poll less often) on the polling
-	interval. The check_interval variable is both the initial and
-	maximum polling interval.
-
-tolerant
-	Tolerance level. When a machine check exception occurs for a non
-	corrected machine check the kernel can take different actions.
-	Since machine check exceptions can happen any time it is sometimes
-	risky for the kernel to kill a process because it defies
-	normal kernel locking rules. The tolerance level configures
-	how hard the kernel tries to recover even at some risk of
-	deadlock. Higher tolerant values trade potentially better uptime
-	with the risk of a crash or even corruption (for tolerant >= 3).
-
-	0: always panic on uncorrected errors, log corrected errors
-	1: panic or SIGBUS on uncorrected errors, log corrected errors
-	2: SIGBUS or log uncorrected errors, log corrected errors
-	3: never panic or SIGBUS, log all errors (for testing only)
-
-	Default: 1
-
-	Note this only makes a difference if the CPU allows recovery
-	from a machine check exception. Current x86 CPUs generally do not.
-
-trigger
-	Program to run when a machine check event is detected.
-	This is an alternative to running mcelog regularly from cron
-	and allows to detect events faster.
-
-TBD document entries for AMD threshold interrupt configuration
-
-For more details about the x86 machine check architecture
-see the Intel and AMD architecture manuals from their developer websites.
-
-For more details about the architecture see
-see http://one.firstfloor.org/~andi/mce.pdf
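
For readers following the removed text, a minimal sketch of how the described sysfs values might be interpreted. It assumes only what the deleted file states: check_interval is reported in hexadecimal seconds, bankNctl is a 64-bit hex bitmask, and tolerant takes the levels 0 to 3. The helper names (parse_check_interval, enabled_subevents, describe_tolerant) are hypothetical and not part of the kernel or mcelog.

```python
from typing import List

# Tolerance levels as listed in the removed documentation.
TOLERANT_LEVELS = {
    0: "always panic on uncorrected errors, log corrected errors",
    1: "panic or SIGBUS on uncorrected errors, log corrected errors",
    2: "SIGBUS or log uncorrected errors, log corrected errors",
    3: "never panic or SIGBUS, log all errors (for testing only)",
}


def parse_check_interval(raw: str) -> int:
    """Convert the hexadecimal check_interval output to seconds."""
    return int(raw.strip(), 16)


def enabled_subevents(raw_mask: str) -> List[int]:
    """Return the bit positions set in a 64-bit bankNctl hex mask.

    A set bit means the corresponding subevent is reported.
    """
    mask = int(raw_mask.strip(), 16)
    return [bit for bit in range(64) if (mask >> bit) & 1]


def describe_tolerant(level: int) -> str:
    """Map a tolerant level to the behaviour described in the text."""
    return TOLERANT_LEVELS[level]


if __name__ == "__main__":
    # Example inputs are made up for illustration, not read from sysfs.
    print(parse_check_interval("12c\n"))   # default of 5 minutes, in seconds
    print(enabled_subevents("5"))          # subevents 0 and 2 enabled
    print(describe_tolerant(1))            # documented default level
```

On a system that exposes these entries, the strings would come from files under /sys/devices/system/machinecheck/machinecheckN; the sketch keeps parsing separate from file access so it can be tried without that hardware interface.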