diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2010-08-06 09:30:52 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2010-08-06 09:30:52 -0700 |
commit | 4aed2fd8e3181fea7c09ba79cf64e7e3f4413bf9 (patch) | |
tree | 1f69733e5daab4915a76a41de0e4d1dc61e12cfb /Documentation/trace | |
parent | 3a3527b6461b1298cc53ce72f336346739297ac8 (diff) | |
parent | fc9ea5a1e53ee54f681e226d735008e2a6f8f470 (diff) |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
tracing/kprobes: unregister_trace_probe needs to be called under mutex
perf: expose event__process function
perf events: Fix mmap offset determination
perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
perf, powerpc: Convert the FSL driver to use local64_t
perf tools: Don't keep unreferenced maps when unmaps are detected
perf session: Invalidate last_match when removing threads from rb_tree
perf session: Free the ref_reloc_sym memory at the right place
x86,mmiotrace: Add support for tracing STOS instruction
perf, sched migration: Librarize task states and event headers helpers
perf, sched migration: Librarize the GUI class
perf, sched migration: Make the GUI class client agnostic
perf, sched migration: Make it vertically scrollable
perf, sched migration: Parameterize cpu height and spacing
perf, sched migration: Fix key bindings
perf, sched migration: Ignore unhandled task states
perf, sched migration: Handle ignored migrate out events
perf: New migration tool overview
tracing: Drop cpparg() macro
perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
...
Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c
Diffstat (limited to 'Documentation/trace')
-rw-r--r-- | Documentation/trace/ftrace-design.txt | 153 | ||||
-rw-r--r-- | Documentation/trace/kmemtrace.txt | 126 | ||||
-rw-r--r-- | Documentation/trace/kprobetrace.txt | 2 |
3 files changed, 149 insertions, 132 deletions
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt index f1f81afee8a0..dc52bd442c92 100644 --- a/Documentation/trace/ftrace-design.txt +++ b/Documentation/trace/ftrace-design.txt @@ -13,6 +13,9 @@ Note that this focuses on architecture implementation details only. If you want more explanation of a feature in terms of common code, review the common ftrace.txt file. +Ideally, everyone who wishes to retain performance while supporting tracing in +their kernel should make it all the way to dynamic ftrace support. + Prerequisites ------------- @@ -215,7 +218,7 @@ An arch may pass in a unique value (frame pointer) to both the entering and exiting of a function. On exit, the value is compared and if it does not match, then it will panic the kernel. This is largely a sanity check for bad code generation with gcc. If gcc for your port sanely updates the frame -pointer under different opitmization levels, then ignore this option. +pointer under different optimization levels, then ignore this option. However, adding support for it isn't terribly difficult. In your assembly code that calls prepare_ftrace_return(), pass the frame pointer as the 3rd argument. @@ -234,7 +237,7 @@ If you can't trace NMI functions, then skip this option. HAVE_SYSCALL_TRACEPOINTS ---------------------- +------------------------ You need very few things to get the syscalls tracing in an arch. @@ -250,12 +253,152 @@ You need very few things to get the syscalls tracing in an arch. HAVE_FTRACE_MCOUNT_RECORD ------------------------- -See scripts/recordmcount.pl for more info. +See scripts/recordmcount.pl for more info. Just fill in the arch-specific +details for how to locate the addresses of mcount call sites via objdump. +This option doesn't make much sense without also implementing dynamic ftrace. + +HAVE_DYNAMIC_FTRACE +------------------- + +You will first need HAVE_FTRACE_MCOUNT_RECORD and HAVE_FUNCTION_TRACER, so +scroll your reader back up if you got over eager. + +Once those are out of the way, you will need to implement: + - asm/ftrace.h: + - MCOUNT_ADDR + - ftrace_call_adjust() + - struct dyn_arch_ftrace{} + - asm code: + - mcount() (new stub) + - ftrace_caller() + - ftrace_call() + - ftrace_stub() + - C code: + - ftrace_dyn_arch_init() + - ftrace_make_nop() + - ftrace_make_call() + - ftrace_update_ftrace_func() + +First you will need to fill out some arch details in your asm/ftrace.h. + +Define MCOUNT_ADDR as the address of your mcount symbol similar to: + #define MCOUNT_ADDR ((unsigned long)mcount) +Since no one else will have a decl for that function, you will need to: + extern void mcount(void); + +You will also need the helper function ftrace_call_adjust(). Most people +will be able to stub it out like so: + static inline unsigned long ftrace_call_adjust(unsigned long addr) + { + return addr; + } <details to be filled> +Lastly you will need the custom dyn_arch_ftrace structure. If you need +some extra state when runtime patching arbitrary call sites, this is the +place. For now though, create an empty struct: + struct dyn_arch_ftrace { + /* No extra data needed */ + }; + +With the header out of the way, we can fill out the assembly code. While we +did already create a mcount() function earlier, dynamic ftrace only wants a +stub function. This is because the mcount() will only be used during boot +and then all references to it will be patched out never to return. Instead, +the guts of the old mcount() will be used to create a new ftrace_caller() +function. Because the two are hard to merge, it will most likely be a lot +easier to have two separate definitions split up by #ifdefs. Same goes for +the ftrace_stub() as that will now be inlined in ftrace_caller(). + +Before we get confused anymore, let's check out some pseudo code so you can +implement your own stuff in assembly: -HAVE_DYNAMIC_FTRACE ---------------------- +void mcount(void) +{ + return; +} + +void ftrace_caller(void) +{ + /* implement HAVE_FUNCTION_TRACE_MCOUNT_TEST if you desire */ + + /* save all state needed by the ABI (see paragraph above) */ + + unsigned long frompc = ...; + unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE; + +ftrace_call: + ftrace_stub(frompc, selfpc); + + /* restore all state needed by the ABI */ + +ftrace_stub: + return; +} + +This might look a little odd at first, but keep in mind that we will be runtime +patching multiple things. First, only functions that we actually want to trace +will be patched to call ftrace_caller(). Second, since we only have one tracer +active at a time, we will patch the ftrace_caller() function itself to call the +specific tracer in question. That is the point of the ftrace_call label. + +With that in mind, let's move on to the C code that will actually be doing the +runtime patching. You'll need a little knowledge of your arch's opcodes in +order to make it through the next section. + +Every arch has an init callback function. If you need to do something early on +to initialize some state, this is the time to do that. Otherwise, this simple +function below should be sufficient for most people: + +int __init ftrace_dyn_arch_init(void *data) +{ + /* return value is done indirectly via data */ + *(unsigned long *)data = 0; + + return 0; +} + +There are two functions that are used to do runtime patching of arbitrary +functions. The first is used to turn the mcount call site into a nop (which +is what helps us retain runtime performance when not tracing). The second is +used to turn the mcount call site into a call to an arbitrary location (but +typically that is ftracer_caller()). See the general function definition in +linux/ftrace.h for the functions: + ftrace_make_nop() + ftrace_make_call() +The rec->ip value is the address of the mcount call site that was collected +by the scripts/recordmcount.pl during build time. + +The last function is used to do runtime patching of the active tracer. This +will be modifying the assembly code at the location of the ftrace_call symbol +inside of the ftrace_caller() function. So you should have sufficient padding +at that location to support the new function calls you'll be inserting. Some +people will be using a "call" type instruction while others will be using a +"branch" type instruction. Specifically, the function is: + ftrace_update_ftrace_func() + + +HAVE_DYNAMIC_FTRACE + HAVE_FUNCTION_GRAPH_TRACER +------------------------------------------------ + +The function grapher needs a few tweaks in order to work with dynamic ftrace. +Basically, you will need to: + - update: + - ftrace_caller() + - ftrace_graph_call() + - ftrace_graph_caller() + - implement: + - ftrace_enable_ftrace_graph_caller() + - ftrace_disable_ftrace_graph_caller() <details to be filled> +Quick notes: + - add a nop stub after the ftrace_call location named ftrace_graph_call; + stub needs to be large enough to support a call to ftrace_graph_caller() + - update ftrace_graph_caller() to work with being called by the new + ftrace_caller() since some semantics may have changed + - ftrace_enable_ftrace_graph_caller() will runtime patch the + ftrace_graph_call location with a call to ftrace_graph_caller() + - ftrace_disable_ftrace_graph_caller() will runtime patch the + ftrace_graph_call location with nops diff --git a/Documentation/trace/kmemtrace.txt b/Documentation/trace/kmemtrace.txt deleted file mode 100644 index 6308735e58ca..000000000000 --- a/Documentation/trace/kmemtrace.txt +++ /dev/null @@ -1,126 +0,0 @@ - kmemtrace - Kernel Memory Tracer - - by Eduard - Gabriel Munteanu - <eduard.munteanu@linux360.ro> - -I. Introduction -=============== - -kmemtrace helps kernel developers figure out two things: -1) how different allocators (SLAB, SLUB etc.) perform -2) how kernel code allocates memory and how much - -To do this, we trace every allocation and export information to the userspace -through the relay interface. We export things such as the number of requested -bytes, the number of bytes actually allocated (i.e. including internal -fragmentation), whether this is a slab allocation or a plain kmalloc() and so -on. - -The actual analysis is performed by a userspace tool (see section III for -details on where to get it from). It logs the data exported by the kernel, -processes it and (as of writing this) can provide the following information: -- the total amount of memory allocated and fragmentation per call-site -- the amount of memory allocated and fragmentation per allocation -- total memory allocated and fragmentation in the collected dataset -- number of cross-CPU allocation and frees (makes sense in NUMA environments) - -Moreover, it can potentially find inconsistent and erroneous behavior in -kernel code, such as using slab free functions on kmalloc'ed memory or -allocating less memory than requested (but not truly failed allocations). - -kmemtrace also makes provisions for tracing on some arch and analysing the -data on another. - -II. Design and goals -==================== - -kmemtrace was designed to handle rather large amounts of data. Thus, it uses -the relay interface to export whatever is logged to userspace, which then -stores it. Analysis and reporting is done asynchronously, that is, after the -data is collected and stored. By design, it allows one to log and analyse -on different machines and different arches. - -As of writing this, the ABI is not considered stable, though it might not -change much. However, no guarantees are made about compatibility yet. When -deemed stable, the ABI should still allow easy extension while maintaining -backward compatibility. This is described further in Documentation/ABI. - -Summary of design goals: - - allow logging and analysis to be done across different machines - - be fast and anticipate usage in high-load environments (*) - - be reasonably extensible - - make it possible for GNU/Linux distributions to have kmemtrace - included in their repositories - -(*) - one of the reasons Pekka Enberg's original userspace data analysis - tool's code was rewritten from Perl to C (although this is more than a - simple conversion) - - -III. Quick usage guide -====================== - -1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable -CONFIG_KMEMTRACE). - -2) Get the userspace tool and build it: -$ git clone git://repo.or.cz/kmemtrace-user.git # current repository -$ cd kmemtrace-user/ -$ ./autogen.sh -$ ./configure -$ make - -3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the -'single' runlevel (so that relay buffers don't fill up easily), and run -kmemtrace: -# '$' does not mean user, but root here. -$ mount -t debugfs none /sys/kernel/debug -$ mount -t proc none /proc -$ cd path/to/kmemtrace-user/ -$ ./kmemtraced -Wait a bit, then stop it with CTRL+C. -$ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't - # overrun, should - # be zero. -$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to - check its correctness] -$ ./kmemtrace-report - -Now you should have a nice and short summary of how the allocator performs. - -IV. FAQ and known issues -======================== - -Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix -this? Should I worry? -A: If it's non-zero, this affects kmemtrace's accuracy, depending on how -large the number is. You can fix it by supplying a higher -'kmemtrace.subbufs=N' kernel parameter. ---- - -Q: kmemtrace_check reports errors, how do I fix this? Should I worry? -A: This is a bug and should be reported. It can occur for a variety of -reasons: - - possible bugs in relay code - - possible misuse of relay by kmemtrace - - timestamps being collected unorderly -Or you may fix it yourself and send us a patch. ---- - -Q: kmemtrace_report shows many errors, how do I fix this? Should I worry? -A: This is a known issue and I'm working on it. These might be true errors -in kernel code, which may have inconsistent behavior (e.g. allocating memory -with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed -out this behavior may work with SLAB, but may fail with other allocators. - -It may also be due to lack of tracing in some unusual allocator functions. - -We don't want bug reports regarding this issue yet. ---- - -V. See also -=========== - -Documentation/kernel-parameters.txt -Documentation/ABI/testing/debugfs-kmemtrace - diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index ec94748ae65b..5f77d94598dd 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -42,7 +42,7 @@ Synopsis of kprobe_events +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**) NAME=FETCHARG : Set NAME as the argument name of FETCHARG. FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types - (u8/u16/u32/u64/s8/s16/s32/s64) are supported. + (u8/u16/u32/u64/s8/s16/s32/s64) and string are supported. (*) only for return probe. (**) this is useful for fetching a field of data structures. |