diff options
author | Varun Wadekar <vwadekar@nvidia.com> | 2012-04-07 01:52:57 +0530 |
---|---|---|
committer | Varun Wadekar <vwadekar@nvidia.com> | 2012-04-07 01:52:57 +0530 |
commit | 97caf63d0c837f9b5c9f6f469979e68c0378e83f (patch) | |
tree | c6fc834bcfb66268f474324eca619db419010532 /Documentation/powerpc | |
parent | 6a1a6f4f69adf0febfd923795b45edeff63e75ed (diff) |
Merge branch '3.4-rc1' into android-tegra-nv-3.3-rebased
Change-Id: Ib3b69ffc5ac3e07c9cc44cc49e9142088eec477e
Signed-off-by: Varun Wadekar <vwadekar@nvidia.com>
Diffstat (limited to 'Documentation/powerpc')
-rw-r--r-- | Documentation/powerpc/firmware-assisted-dump.txt | 270 | ||||
-rw-r--r-- | Documentation/powerpc/mpc52xx.txt | 12 | ||||
-rw-r--r-- | Documentation/powerpc/phyp-assisted-dump.txt | 127 |
3 files changed, 276 insertions, 133 deletions
diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt new file mode 100644 index 000000000000..3007bc98af28 --- /dev/null +++ b/Documentation/powerpc/firmware-assisted-dump.txt @@ -0,0 +1,270 @@ + + Firmware-Assisted Dump + ------------------------ + July 2011 + +The goal of firmware-assisted dump is to enable the dump of +a crashed system, and to do so from a fully-reset system, and +to minimize the total elapsed time until the system is back +in production use. + +- Firmware assisted dump (fadump) infrastructure is intended to replace + the existing phyp assisted dump. +- Fadump uses the same firmware interfaces and memory reservation model + as phyp assisted dump. +- Unlike phyp dump, fadump exports the memory dump through /proc/vmcore + in the ELF format in the same way as kdump. This helps us reuse the + kdump infrastructure for dump capture and filtering. +- Unlike phyp dump, userspace tool does not need to refer any sysfs + interface while reading /proc/vmcore. +- Unlike phyp dump, fadump allows user to release all the memory reserved + for dump, with a single operation of echo 1 > /sys/kernel/fadump_release_mem. +- Once enabled through kernel boot parameter, fadump can be + started/stopped through /sys/kernel/fadump_registered interface (see + sysfs files section below) and can be easily integrated with kdump + service start/stop init scripts. + +Comparing with kdump or other strategies, firmware-assisted +dump offers several strong, practical advantages: + +-- Unlike kdump, the system has been reset, and loaded + with a fresh copy of the kernel. In particular, + PCI and I/O devices have been reinitialized and are + in a clean, consistent state. +-- Once the dump is copied out, the memory that held the dump + is immediately available to the running kernel. And therefore, + unlike kdump, fadump doesn't need a 2nd reboot to get back + the system to the production configuration. + +The above can only be accomplished by coordination with, +and assistance from the Power firmware. The procedure is +as follows: + +-- The first kernel registers the sections of memory with the + Power firmware for dump preservation during OS initialization. + These registered sections of memory are reserved by the first + kernel during early boot. + +-- When a system crashes, the Power firmware will save + the low memory (boot memory of size larger of 5% of system RAM + or 256MB) of RAM to the previous registered region. It will + also save system registers, and hardware PTE's. + + NOTE: The term 'boot memory' means size of the low memory chunk + that is required for a kernel to boot successfully when + booted with restricted memory. By default, the boot memory + size will be the larger of 5% of system RAM or 256MB. + Alternatively, user can also specify boot memory size + through boot parameter 'fadump_reserve_mem=' which will + override the default calculated size. Use this option + if default boot memory size is not sufficient for second + kernel to boot successfully. + +-- After the low memory (boot memory) area has been saved, the + firmware will reset PCI and other hardware state. It will + *not* clear the RAM. It will then launch the bootloader, as + normal. + +-- The freshly booted kernel will notice that there is a new + node (ibm,dump-kernel) in the device tree, indicating that + there is crash data available from a previous boot. During + the early boot OS will reserve rest of the memory above + boot memory size effectively booting with restricted memory + size. This will make sure that the second kernel will not + touch any of the dump memory area. + +-- User-space tools will read /proc/vmcore to obtain the contents + of memory, which holds the previous crashed kernel dump in ELF + format. The userspace tools may copy this info to disk, or + network, nas, san, iscsi, etc. as desired. + +-- Once the userspace tool is done saving dump, it will echo + '1' to /sys/kernel/fadump_release_mem to release the reserved + memory back to general use, except the memory required for + next firmware-assisted dump registration. + + e.g. + # echo 1 > /sys/kernel/fadump_release_mem + +Please note that the firmware-assisted dump feature +is only available on Power6 and above systems with recent +firmware versions. + +Implementation details: +---------------------- + +During boot, a check is made to see if firmware supports +this feature on that particular machine. If it does, then +we check to see if an active dump is waiting for us. If yes +then everything but boot memory size of RAM is reserved during +early boot (See Fig. 2). This area is released once we finish +collecting the dump from user land scripts (e.g. kdump scripts) +that are run. If there is dump data, then the +/sys/kernel/fadump_release_mem file is created, and the reserved +memory is held. + +If there is no waiting dump data, then only the memory required +to hold CPU state, HPTE region, boot memory dump and elfcore +header, is reserved at the top of memory (see Fig. 1). This area +is *not* released: this region will be kept permanently reserved, +so that it can act as a receptacle for a copy of the boot memory +content in addition to CPU state and HPTE region, in the case a +crash does occur. + + o Memory Reservation during first kernel + + Low memory Top of memory + 0 boot memory size | + | | |<--Reserved dump area -->| + V V | Permanent Reservation V + +-----------+----------/ /----------+---+----+-----------+----+ + | | |CPU|HPTE| DUMP |ELF | + +-----------+----------/ /----------+---+----+-----------+----+ + | ^ + | | + \ / + ------------------------------------------- + Boot memory content gets transferred to + reserved area by firmware at the time of + crash + Fig. 1 + + o Memory Reservation during second kernel after crash + + Low memory Top of memory + 0 boot memory size | + | |<------------- Reserved dump area ----------- -->| + V V V + +-----------+----------/ /----------+---+----+-----------+----+ + | | |CPU|HPTE| DUMP |ELF | + +-----------+----------/ /----------+---+----+-----------+----+ + | | + V V + Used by second /proc/vmcore + kernel to boot + Fig. 2 + +Currently the dump will be copied from /proc/vmcore to a +a new file upon user intervention. The dump data available through +/proc/vmcore will be in ELF format. Hence the existing kdump +infrastructure (kdump scripts) to save the dump works fine with +minor modifications. + +The tools to examine the dump will be same as the ones +used for kdump. + +How to enable firmware-assisted dump (fadump): +------------------------------------- + +1. Set config option CONFIG_FA_DUMP=y and build kernel. +2. Boot into linux kernel with 'fadump=on' kernel cmdline option. +3. Optionally, user can also set 'fadump_reserve_mem=' kernel cmdline + to specify size of the memory to reserve for boot memory dump + preservation. + +NOTE: If firmware-assisted dump fails to reserve memory then it will + fallback to existing kdump mechanism if 'crashkernel=' option + is set at kernel cmdline. + +Sysfs/debugfs files: +------------ + +Firmware-assisted dump feature uses sysfs file system to hold +the control files and debugfs file to display memory reserved region. + +Here is the list of files under kernel sysfs: + + /sys/kernel/fadump_enabled + + This is used to display the fadump status. + 0 = fadump is disabled + 1 = fadump is enabled + + This interface can be used by kdump init scripts to identify if + fadump is enabled in the kernel and act accordingly. + + /sys/kernel/fadump_registered + + This is used to display the fadump registration status as well + as to control (start/stop) the fadump registration. + 0 = fadump is not registered. + 1 = fadump is registered and ready to handle system crash. + + To register fadump echo 1 > /sys/kernel/fadump_registered and + echo 0 > /sys/kernel/fadump_registered for un-register and stop the + fadump. Once the fadump is un-registered, the system crash will not + be handled and vmcore will not be captured. This interface can be + easily integrated with kdump service start/stop. + + /sys/kernel/fadump_release_mem + + This file is available only when fadump is active during + second kernel. This is used to release the reserved memory + region that are held for saving crash dump. To release the + reserved memory echo 1 to it: + + echo 1 > /sys/kernel/fadump_release_mem + + After echo 1, the content of the /sys/kernel/debug/powerpc/fadump_region + file will change to reflect the new memory reservations. + + The existing userspace tools (kdump infrastructure) can be easily + enhanced to use this interface to release the memory reserved for + dump and continue without 2nd reboot. + +Here is the list of files under powerpc debugfs: +(Assuming debugfs is mounted on /sys/kernel/debug directory.) + + /sys/kernel/debug/powerpc/fadump_region + + This file shows the reserved memory regions if fadump is + enabled otherwise this file is empty. The output format + is: + <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size> + + e.g. + Contents when fadump is registered during first kernel + + # cat /sys/kernel/debug/powerpc/fadump_region + CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0 + HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0 + DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0 + + Contents when fadump is active during second kernel + + # cat /sys/kernel/debug/powerpc/fadump_region + CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020 + HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x1000 + DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x10000000 + : [0x00000010000000-0x0000006ffaffff] 0x5ffb0000 bytes, Dumped: 0x5ffb0000 + +NOTE: Please refer to Documentation/filesystems/debugfs.txt on + how to mount the debugfs filesystem. + + +TODO: +----- + o Need to come up with the better approach to find out more + accurate boot memory size that is required for a kernel to + boot successfully when booted with restricted memory. + o The fadump implementation introduces a fadump crash info structure + in the scratch area before the ELF core header. The idea of introducing + this structure is to pass some important crash info data to the second + kernel which will help second kernel to populate ELF core header with + correct data before it gets exported through /proc/vmcore. The current + design implementation does not address a possibility of introducing + additional fields (in future) to this structure without affecting + compatibility. Need to come up with the better approach to address this. + The possible approaches are: + 1. Introduce version field for version tracking, bump up the version + whenever a new field is added to the structure in future. The version + field can be used to find out what fields are valid for the current + version of the structure. + 2. Reserve the area of predefined size (say PAGE_SIZE) for this + structure and have unused area as reserved (initialized to zero) + for future field additions. + The advantage of approach 1 over 2 is we don't need to reserve extra space. +--- +Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> +This document is based on the original documentation written for phyp +assisted dump by Linas Vepstas and Manish Ahuja. diff --git a/Documentation/powerpc/mpc52xx.txt b/Documentation/powerpc/mpc52xx.txt index 10dd4ab93b85..0d540a31ea1a 100644 --- a/Documentation/powerpc/mpc52xx.txt +++ b/Documentation/powerpc/mpc52xx.txt @@ -2,7 +2,7 @@ Linux 2.6.x on MPC52xx family ----------------------------- For the latest info, go to http://www.246tNt.com/mpc52xx/ - + To compile/use : - U-Boot: @@ -10,23 +10,23 @@ To compile/use : if you wish to ). # make lite5200_defconfig # make uImage - + then, on U-boot: => tftpboot 200000 uImage => tftpboot 400000 pRamdisk => bootm 200000 400000 - + - DBug: # <edit Makefile to set ARCH=ppc & CROSS_COMPILE=... ( also EXTRAVERSION if you wish to ). # make lite5200_defconfig # cp your_initrd.gz arch/ppc/boot/images/ramdisk.image.gz - # make zImage.initrd - # make + # make zImage.initrd + # make then in DBug: DBug> dn -i zImage.initrd.lite5200 - + Some remarks : - The port is named mpc52xxx, and config options are PPC_MPC52xx. The MGT5100 diff --git a/Documentation/powerpc/phyp-assisted-dump.txt b/Documentation/powerpc/phyp-assisted-dump.txt deleted file mode 100644 index ad340205d96a..000000000000 --- a/Documentation/powerpc/phyp-assisted-dump.txt +++ /dev/null @@ -1,127 +0,0 @@ - - Hypervisor-Assisted Dump - ------------------------ - November 2007 - -The goal of hypervisor-assisted dump is to enable the dump of -a crashed system, and to do so from a fully-reset system, and -to minimize the total elapsed time until the system is back -in production use. - -As compared to kdump or other strategies, hypervisor-assisted -dump offers several strong, practical advantages: - --- Unlike kdump, the system has been reset, and loaded - with a fresh copy of the kernel. In particular, - PCI and I/O devices have been reinitialized and are - in a clean, consistent state. --- As the dump is performed, the dumped memory becomes - immediately available to the system for normal use. --- After the dump is completed, no further reboots are - required; the system will be fully usable, and running - in its normal, production mode on its normal kernel. - -The above can only be accomplished by coordination with, -and assistance from the hypervisor. The procedure is -as follows: - --- When a system crashes, the hypervisor will save - the low 256MB of RAM to a previously registered - save region. It will also save system state, system - registers, and hardware PTE's. - --- After the low 256MB area has been saved, the - hypervisor will reset PCI and other hardware state. - It will *not* clear RAM. It will then launch the - bootloader, as normal. - --- The freshly booted kernel will notice that there - is a new node (ibm,dump-kernel) in the device tree, - indicating that there is crash data available from - a previous boot. It will boot into only 256MB of RAM, - reserving the rest of system memory. - --- Userspace tools will parse /sys/kernel/release_region - and read /proc/vmcore to obtain the contents of memory, - which holds the previous crashed kernel. The userspace - tools may copy this info to disk, or network, nas, san, - iscsi, etc. as desired. - - For Example: the values in /sys/kernel/release-region - would look something like this (address-range pairs). - CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: / - DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A - --- As the userspace tools complete saving a portion of - dump, they echo an offset and size to - /sys/kernel/release_region to release the reserved - memory back to general use. - - An example of this is: - "echo 0x40000000 0x10000000 > /sys/kernel/release_region" - which will release 256MB at the 1GB boundary. - -Please note that the hypervisor-assisted dump feature -is only available on Power6-based systems with recent -firmware versions. - -Implementation details: ----------------------- - -During boot, a check is made to see if firmware supports -this feature on this particular machine. If it does, then -we check to see if a active dump is waiting for us. If yes -then everything but 256 MB of RAM is reserved during early -boot. This area is released once we collect a dump from user -land scripts that are run. If there is dump data, then -the /sys/kernel/release_region file is created, and -the reserved memory is held. - -If there is no waiting dump data, then only the highest -256MB of the ram is reserved as a scratch area. This area -is *not* released: this region will be kept permanently -reserved, so that it can act as a receptacle for a copy -of the low 256MB in the case a crash does occur. See, -however, "open issues" below, as to whether -such a reserved region is really needed. - -Currently the dump will be copied from /proc/vmcore to a -a new file upon user intervention. The starting address -to be read and the range for each data point in provided -in /sys/kernel/release_region. - -The tools to examine the dump will be same as the ones -used for kdump. - -General notes: --------------- -Security: please note that there are potential security issues -with any sort of dump mechanism. In particular, plaintext -(unencrypted) data, and possibly passwords, may be present in -the dump data. Userspace tools must take adequate precautions to -preserve security. - -Open issues/ToDo: ------------- - o The various code paths that tell the hypervisor that a crash - occurred, vs. it simply being a normal reboot, should be - reviewed, and possibly clarified/fixed. - - o Instead of using /sys/kernel, should there be a /sys/dump - instead? There is a dump_subsys being created by the s390 code, - perhaps the pseries code should use a similar layout as well. - - o Is reserving a 256MB region really required? The goal of - reserving a 256MB scratch area is to make sure that no - important crash data is clobbered when the hypervisor - save low mem to the scratch area. But, if one could assure - that nothing important is located in some 256MB area, then - it would not need to be reserved. Something that can be - improved in subsequent versions. - - o Still working the kdump team to integrate this with kdump, - some work remains but this would not affect the current - patches. - - o Still need to write a shell script, to copy the dump away. - Currently I am parsing it manually. |