| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Extend tracing option mask to 64 bits
The trace options were defined by a 32 bit variable. This limits the
tracing instances to have a total of 32 different options. As that
limit has been hit, and more options are being added, increase the
option mask to a 64 bit number, doubling the number of options
available.
As this is required for the kprobe topic branches as well as the
tracing topic branch, a separate branch was created and merged into
both.
- Make trace_user_fault_read() available for the rest of tracing
The function trace_user_fault_read() is used by trace_marker file
read to allow reading user space to be done fast and without locking
or allocations. Make this available so that the system call trace
events can use it too.
- Have system call trace events read user space values
Now that the system call trace events callbacks are called in a
faultable context, take advantage of this and read the user space
buffers for various system calls. For example, show the path name of
the openat system call instead of just showing the pointer to that
path name in user space. Also show the contents of the buffer of the
write system call. Several system call trace events are updated to
make tracing into a light weight strace tool for all applications in
the system.
- Update perf system call tracing to do the same
- And a config and syscall_user_buf_size file to control the size of
the buffer
Limit the amount of data that can be read from user space. The
default size is 63 bytes but that can be expanded to 165 bytes.
- Allow the persistent ring buffer to print system calls normally
The persistent ring buffer prints trace events by their type and
ignores the print_fmt. This is because the print_fmt may change from
kernel to kernel. As the system call output is fixed by the system
call ABI itself, there's no reason to limit that. This makes reading
the system call events in the persistent ring buffer much nicer and
easier to understand.
- Add options to show text offset to function profiler
The function profiler that counts the number of times a function is
hit currently lists all functions by its name and offset. But this
becomes ambiguous when there are several functions with the same
name.
Add a tracing option that changes the output to be that of
'_text+offset' instead. Now a user space tool can use this
information to map the '_text+offset' to the unique function it is
counting.
- Report bad dynamic event command
If a bad command is passed to the dynamic_events file, report it
properly in the error log.
- Clean up tracer options
Clean up the tracer option code a bit, by removing some useless code
and also using switch statements instead of a series of if
statements.
- Have tracing options be instance specific
Tracers can have their own options (function tracer, irqsoff tracer,
function graph tracer, etc). But now that the same tracer can be
enabled in multiple trace instances, their options are still global.
The API is per instance, thus changing one affects other instances.
This isn't even consistent, as the option take affect differently
depending on when an tracer started in an instance. Make the options
for instances only affect the instance it is changed under.
- Optimize pid_list lock contention
Whenever the pid_list is read, it uses a spin lock. This happens at
every sched switch. Taking the lock at sched switch can be removed by
instead using a seqlock counter.
- Clean up the trace trigger structures
The trigger code uses two different structures to implement a single
tigger. This was due to trying to reuse code for the two different
types of triggers (always on trigger, and count limited trigger). But
by adding a single field to one structure, the other structure could
be absorbed into the first structure making he code easier to
understand.
- Create a bulk garbage collector for trace triggers
If user space has triggers for several hundreds of events and then
removes them, it can take several seconds to complete. This is
because each removal calls tracepoint_synchronize_unregister() that
can take hundreds of milliseconds to complete.
Instead, create a helper thread that will do the clean up. When a
trigger is removed, it will create the kthread if it isn't already
created, and then add the trigger to a llist. The kthread will take
the items off the llist, call tracepoint_synchronize_unregister(),
and then remove the items it took off. It will then check if there's
more items to free before sleeping.
This makes user space removing all these triggers to finish in less
than a second.
- Allow function tracing of some of the tracing infrastructure code
Because the tracing code can cause recursion issues if it is traced
by the function tracer the entire tracing directory disables function
tracing. But not all of tracing causes issues if it is traced.
Namely, the event tracing code. Add a config that enables some of the
tracing code to be traced to help in debugging it. Note, when this is
enabled, it does add noise to general function tracing, especially if
events are enabled as well (which is a common case).
- Add boot-time backup instance for persistent buffer
The persistent ring buffer is used mostly for kernel crash analysis
in the field. One issue is that if there's a crash, the data in the
persistent ring buffer must be read before tracing can begin using
it. This slows down the boot process. Once tracing starts in the
persistent ring buffer, the old data must be freed and the addresses
no longer match and old events can't be in the buffer with new
events.
Create a way to create a backup buffer that copies the persistent
ring buffer at boot up. Then after a crash, the always on tracer can
begin immediately as well as the normal boot process while the crash
analysis tooling uses the backup buffer. After the backup buffer is
finished being read, it can be removed.
- Enable function graph args and return address options at the same
time
Currently the when reading of arguments in the function graph tracer
is enabled, the option to record the parent function in the entry
event can not be enabled. Update the code so that it can.
- Add new struct_offset() helper macro
Add a new macro that takes a pointer to a structure and a name of one
of its members and it will return the offset of that member. This
allows the ring buffer code to simplify the following:
From: size = struct_size(entry, buf, cnt - sizeof(entry->id));
To: size = struct_offset(entry, id) + cnt;
There should be other simplifications that this macro can help out
with as well
* tag 'trace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (42 commits)
overflow: Introduce struct_offset() to get offset of member
function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously
tracing: Add boot-time backup of persistent ring buffer
ftrace: Allow tracing of some of the tracing code
tracing: Use strim() in trigger_process_regex() instead of skip_spaces()
tracing: Add bulk garbage collection of freeing event_trigger_data
tracing: Remove unneeded event_mutex lock in event_trigger_regex_release()
tracing: Merge struct event_trigger_ops into struct event_command
tracing: Remove get_trigger_ops() and add count_func() from trigger ops
tracing: Show the tracer options in boot-time created instance
ftrace: Avoid redundant initialization in register_ftrace_direct
tracing: Remove unused variable in tracing_trace_options_show()
fgraph: Make fgraph_no_sleep_time signed
tracing: Convert function graph set_flags() to use a switch() statement
tracing: Have function graph tracer option sleep-time be per instance
tracing: Move graph-time out of function graph options
tracing: Have function graph tracer option funcgraph-irqs be per instance
trace/pid_list: optimize pid_list->lock contention
tracing: Have function graph tracer define options per instance
tracing: Have function tracer define options per instance
...
|
|
simultaneously
Currently, the funcgraph-args and funcgraph-retaddr features are
mutually exclusive. This patch resolves this limitation by allowing
funcgraph-retaddr to have an args array.
To verify the change, use perf to trace vfs_write with both options
enabled:
Before:
# perf ftrace -G vfs_write --graph-opts args,retaddr
......
down_read() { /* <-n_tty_write+0xa3/0x540 */
__cond_resched(); /* <-down_read+0x12/0x160 */
preempt_count_add(); /* <-down_read+0x3b/0x160 */
preempt_count_sub(); /* <-down_read+0x8b/0x160 */
}
After:
# perf ftrace -G vfs_write --graph-opts args,retaddr
......
down_read(sem=0xffff8880100bea78) { /* <-n_tty_write+0xa3/0x540 */
__cond_resched(); /* <-down_read+0x12/0x160 */
preempt_count_add(val=1); /* <-down_read+0x3b/0x160 */
preempt_count_sub(val=1); /* <-down_read+0x8b/0x160 */
}
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Xiaoqin Zhang <zhangxiaoqin@xiaomi.com>
Link: https://patch.msgid.link/20251125093425.2563849-1-dolinux.peng@gmail.com
Signed-off-by: pengdonglin <pengdonglin@xiaomi.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
For now, the "nop" will be replaced with a "call" instruction when a
function is hooked by the ftrace. However, sometimes the "call" can break
the RSB and introduce extra overhead. Therefore, introduce the flag
FTRACE_OPS_FL_JMP, which indicate that the ftrace_ops should be called
with a "jmp" instead of "call". For now, it is only used by the direct
call case.
When a direct ftrace_ops is marked with FTRACE_OPS_FL_JMP, the last bit of
the ops->direct_call will be set to 1. Therefore, we can tell if we should
use "jmp" for the callback in ftrace_call_replace().
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20251118123639.688444-2-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently we don't get stack trace via ORC unwinder on top of fgraph exit
handler. We can see that when generating stacktrace from kretprobe_multi
bpf program which is based on fprobe/fgraph.
The reason is that the ORC unwind code won't get pass the return_to_handler
callback installed by fgraph return probe machinery.
Solving this by creating stack frame in return_to_handler expected by
ftrace_graph_ret_addr function to recover original return address and
continue with the unwind.
Also updating the pt_regs data with cs/flags/rsp which are needed for
successful stack retrieval from ebpf bpf_get_stackid helper.
- in get_perf_callchain we check user_mode(regs) so CS has to be set
- in perf_callchain_kernel we call perf_hw_regs(regs), so EFLAGS/FIXED
has to be unset
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251104215405.168643-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Ftrace is tightly coupled with architecture specific code because it
requires the use of trampolines written in assembly. This means that when
a new feature or optimization is made, it must be done for all
architectures. To simplify the approach, CONFIG_HAVE_FTRACE_* configs are
added to denote which architecture has the new enhancement so that other
architectures can still function until they too have been updated.
The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the
DYNAMIC_FTRACE work, but now every architecture that implements
DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it redundant
with the HAVE_DYNAMIC_FTRACE.
Remove the HAVE_FTRACE_MCOUNT config and use DYNAMIC_FTRACE directly where
applicable.
Link: https://lore.kernel.org/all/20250703154916.48e3ada7@gandalf.local.home/
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20250704104838.27a18690@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for the FWFT SBI extension, which is part of SBI 3.0 and a
dependency for many new SBI and ISA extensions
- Support for getrandom() in the VDSO
- Support for mseal
- Optimized routines for raid6 syndrome and recovery calculations
- kexec_file() supports loading Image-formatted kernel binaries
- Improvements to the instruction patching framework to allow for
atomic instruction patching, along with rules as to how systems need
to behave in order to function correctly
- Support for a handful of new ISA extensions: Svinval, Zicbop, Zabha,
some SiFive vendor extensions
- Various fixes and cleanups, including: misaligned access handling,
perf symbol mangling, module loading, PUD THPs, and improved uaccess
routines
* tag 'riscv-for-linus-6.16-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (69 commits)
riscv: uaccess: Only restore the CSR_STATUS SUM bit
RISC-V: vDSO: Wire up getrandom() vDSO implementation
riscv: enable mseal sysmap for RV64
raid6: Add RISC-V SIMD syndrome and recovery calculations
riscv: mm: Add support for Svinval extension
RISC-V: Documentation: Add enough title underlines to CMODX
riscv: Improve Kconfig help for RISCV_ISA_V_PREEMPTIVE
MAINTAINERS: Update Atish's email address
riscv: uaccess: do not do misaligned accesses in get/put_user()
riscv: process: use unsigned int instead of unsigned long for put_user()
riscv: make unsafe user copy routines use existing assembly routines
riscv: hwprobe: export Zabha extension
riscv: Make regs_irqs_disabled() more clear
perf symbols: Ignore mapping symbols on riscv
RISC-V: Kconfig: Fix help text of CMDLINE_EXTEND
riscv: module: Optimize PLT/GOT entry counting
riscv: Add support for PUD THP
riscv: xchg: Prefetch the destination word for sc.w
riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop
riscv: Add support for Zicbop
...
|
|
The following ftrace patch for riscv uses a data store to update ftrace
function. Therefore, a romote fence is required to order it against
function_trace_op updates. The mechanism is similar to the fence between
function_trace_op and update_ftrace_func in the generic ftrace, so we
leverage the same ftrace_sync_ipi function.
[ alex: Fix build warning when !CONFIG_DYNAMIC_FTRACE ]
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20250407180838.42877-4-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Have module addresses get updated in the persistent ring buffer
The addresses of the modules from the previous boot are saved in the
persistent ring buffer. If the same modules are loaded and an address
is in the old buffer points to an address that was both saved in the
persistent ring buffer and is loaded in memory, shift the address to
point to the address that is loaded in memory in the trace event.
- Print function names for irqs off and preempt off callsites
When ignoring the print fmt of a trace event and just printing the
fields directly, have the fields for preempt off and irqs off events
still show the function name (via kallsyms) instead of just showing
the raw address.
- Clean ups of the histogram code
The histogram functions saved over 800 bytes on the stack to process
events as they come in. Instead, create per-cpu buffers that can hold
this information and have a separate location for each context level
(thread, softirq, IRQ and NMI).
Also add some more comments to the code.
- Add "common_comm" field for histograms
Add "common_comm" that uses the current->comm as a field in an event
histogram and acts like any of the other fields of the event.
- Show "subops" in the enabled_functions file
When the function graph infrastructure is used, a subsystem has a
"subops" that it attaches its callback function to. Instead of the
enabled_functions just showing a function calling the function that
calls the subops functions, also show the subops functions that will
get called for that function too.
- Add "copy_trace_marker" option to instances
There are cases where an instance is created for tooling to write
into, but the old tooling has the top level instance hardcoded into
the application. New tools want to consume the data from an instance
and not the top level buffer. By adding a copy_trace_marker option,
whenever the top instance trace_marker is written into, a copy of it
is also written into the instance with this option set. This allows
new tools to read what old tools are writing into the top buffer.
If this option is cleared by the top instance, then what is written
into the trace_marker is not written into the top instance. This is a
way to redirect the trace_marker writes into another instance.
- Have tracepoints created by DECLARE_TRACE() use trace_<name>_tp()
If a tracepoint is created by DECLARE_TRACE() instead of
TRACE_EVENT(), then it will not be exposed via tracefs. Currently
there's no way to differentiate in the kernel the tracepoint
functions between those that are exposed via tracefs or not. A
calling convention has been made manually to append a "_tp" prefix
for events created by DECLARE_TRACE(). Instead of doing this
manually, force it so that all DECLARE_TRACE() events have this
notation.
- Use __string() for task->comm in some sched events
Instead of hardcoding the comm to be TASK_COMM_LEN in some of the
scheduler events use __string() which makes it dynamic. Note, if
these events are parsed by user space it they may break, and the
event may have to be converted back to the hardcoded size.
- Have function graph "depth" be unsigned to the user
Internally to the kernel, the "depth" field of the function graph
event is signed due to -1 being used for end of boundary. What
actually gets recorded in the event itself is zero or positive.
Reflect this to user space by showing "depth" as unsigned int and be
consistent across all events.
- Allow an arbitrary long CPU string to osnoise_cpus_write()
The filtering of which CPUs to write to can exceed 256 bytes. If a
machine has 256 CPUs, and the filter is to filter every other CPU,
the write would take a string larger than 256 bytes. Instead of using
a fixed size buffer on the stack that is 256 bytes, allocate it to
handle what is passed in.
- Stop having ftrace check the per-cpu data "disabled" flag
The "disabled" flag in the data structure passed to most ftrace
functions is checked to know if tracing has been disabled or not.
This flag was added back in 2008 before the ring buffer had its own
way to disable tracing. The "disable" flag is now not always set when
needed, and the ring buffer flag should be used in all locations
where the disabled is needed. Since the "disable" flag is redundant
and incorrect, stop using it. Fix up some locations that use the
"disable" flag to use the ring buffer info.
- Use a new tracer_tracing_disable/enable() instead of data->disable
flag
There's a few cases that set the data->disable flag to stop tracing,
but this flag is not consistently used. It is also an on/off switch
where if a function set it and calls another function that sets it,
the called function may incorrectly enable it.
Use a new trace_tracing_disable() and tracer_tracing_enable() that
uses a counter and can be nested. These use the ring buffer flags
which are always checked making the disabling more consistent.
- Save the trace clock in the persistent ring buffer
Save what clock was used for tracing in the persistent ring buffer
and set it back to that clock after a reboot.
- Remove unused reference to a per CPU data pointer in mmiotrace
functions
- Remove unused buffer_page field from trace_array_cpu structure
- Remove more strncpy() instances
- Other minor clean ups and fixes
* tag 'trace-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (36 commits)
tracing: Fix compilation warning on arm32
tracing: Record trace_clock and recover when reboot
tracing/sched: Use __string() instead of fixed lengths for task->comm
tracepoint: Have tracepoints created with DECLARE_TRACE() have _tp suffix
tracing: Cleanup upper_empty() in pid_list
tracing: Allow the top level trace_marker to write into another instances
tracing: Add a helper function to handle the dereference arg in verifier
tracing: Remove unnecessary "goto out" that simply returns ret is trigger code
tracing: Fix error handling in event_trigger_parse()
tracing: Rename event_trigger_alloc() to trigger_data_alloc()
tracing: Replace deprecated strncpy() with strscpy() for stack_trace_filter_buf
tracing: Remove unused buffer_page field from trace_array_cpu structure
tracing: Use atomic_inc_return() for updating "disabled" counter in irqsoff tracer
tracing: Convert the per CPU "disabled" counter to local from atomic
tracing: branch: Use trace_tracing_is_on_cpu() instead of "disabled" field
ring-buffer: Add ring_buffer_record_is_on_cpu()
tracing: Do not use per CPU array_buffer.data->disabled for cpumask
ftrace: Do not disabled function graph based on "disabled" field
tracing: kdb: Use tracer_tracing_on/off() instead of setting per CPU disabled
tracing: Use tracer_tracing_disable() instead of "disabled" field for ftrace_dump_one()
...
|
|
The function graph infrastructure uses subops of the function tracer.
These are not shown in enabled_functions. Add a "subops:" section to the
enabled_functions line to show what functions are attached via subops. If
the subops is from the function_graph infrastructure, then show the entry
and return callbacks that are attached.
Here's an example of the output:
schedule_on_each_cpu (1) tramp: 0xffffffffc03ef000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 subops: {ent:trace_graph_entry+0x0/0x20 ret:trace_graph_return+0x0/0x150}
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250410153830.5d97f108@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Move stack_tracer_enabled into trace_stack_sysctl_table. This is part of
a greater effort to move ctl tables into their respective subsystems
which will reduce the merge conflicts in kernel/sysctl.c.
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
Move trace ctl tables into their own const array in
kernel/trace/trace.c. The sysctl table register is called with
subsys_initcall placing if after its original place in proc_root_init.
This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The function graph infrastructure is now generic so that kretprobes,
fprobes and BPF can use it. But there is still some leftover logic that
only the function graph tracer itself uses. This is the calculation of the
calltime and return time of the functions. The calculation of the calltime
has been moved into the function graph tracer and those users that need it
so that it doesn't cause overhead to the other users. But the return
function timestamp was still called.
Instead of just moving the taking of the timestamp into the function graph
trace remove the calltime and rettime completely from the ftrace_graph_ret
structure. Instead, move it into the function graph return entry event
structure and this also moves all the calltime and rettime logic out of
the generic fgraph.c code and into the tracing code that uses it.
This has been reported to decrease the overhead by ~27%.
Link: https://lore.kernel.org/all/Z3aSuql3fnXMVMoM@krava/
Link: https://lore.kernel.org/all/173665959558.1629214.16724136597211810729.stgit@devnote2/
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250121194436.15bdf71a@gandalf.local.home
Reported-by: Jiri Olsa <olsajiri@gmail.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
This introduces ftrace_get_symaddr() which tries to convert fentry_ip
passed by ftrace or fgraph callback to symaddr without calling
kallsyms API. It returns the symbol address or 0 if it fails to
convert it.
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/173519011487.391279.5450806886342723151.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202412061423.K79V55Hd-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202412061804.5VRzF14E-lkp@intel.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Allow fprobe events to be enabled with CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
With this change, fprobe events mostly use ftrace_regs instead of pt_regs.
Note that if the arch doesn't enable HAVE_FTRACE_REGS_HAVING_PT_REGS,
fprobe events will not be able to be used from perf.
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/173518999352.391279.13332699755290175168.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add ftrace_fill_perf_regs() which should be compatible with the
perf_fetch_caller_regs(). In other words, the pt_regs returned from the
ftrace_fill_perf_regs() must satisfy 'user_mode(regs) == false' and can be
used for stack tracing.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/173518997908.391279.15910334347345106424.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add ftrace_partial_regs() which converts the ftrace_regs to pt_regs.
This is for the eBPF which needs this to keep the same pt_regs interface
to access registers.
Thus when replacing the pt_regs with ftrace_regs in fprobes (which is
used by kprobe_multi eBPF event), this will be used.
If the architecture defines its own ftrace_regs, this copies partial
registers to pt_regs and returns it. If not, ftrace_regs is the same as
pt_regs and ftrace_partial_regs() will return ftrace_regs::regs.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Florent Revest <revest@chromium.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Link: https://lore.kernel.org/173518996761.391279.4987911298206448122.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Change the fprobe exit handler to use ftrace_regs structure instead of
pt_regs. This also introduce HAVE_FTRACE_REGS_HAVING_PT_REGS which
means the ftrace_regs is including the pt_regs so that ftrace_regs
can provide pt_regs without memory allocation.
Fprobe introduces a new dependency with that.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Song Liu <song@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Matt Bobrowski <mattbobrowski@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Hao Luo <haoluo@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/173518995092.391279.6765116450352977627.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Pass ftrace_regs to the fgraph_ops::retfunc(). If ftrace_regs is not
available, it passes a NULL instead. User callback function can access
some registers (including return address) via this ftrace_regs.
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/173518992972.391279.14055405490327765506.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Use ftrace_regs instead of fgraph_ret_regs for tracing return value
on function_graph tracer because of simplifying the callback interface.
The CONFIG_HAVE_FUNCTION_GRAPH_RETVAL is also replaced by
CONFIG_HAVE_FUNCTION_GRAPH_FREGS.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/173518991508.391279.16635322774382197642.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Pass ftrace_regs to the fgraph_ops::entryfunc(). If ftrace_regs is not
available, it passes a NULL instead. User callback function can access
some registers (including return address) via this ftrace_regs.
Note that the ftrace_regs can be NULL when the arch does NOT define:
HAVE_DYNAMIC_FTRACE_WITH_ARGS or HAVE_DYNAMIC_FTRACE_WITH_REGS.
More specifically, if HAVE_DYNAMIC_FTRACE_WITH_REGS is defined but
not the HAVE_DYNAMIC_FTRACE_WITH_ARGS, and the ftrace ops used to
register the function callback does not set FTRACE_OPS_FL_SAVE_REGS.
In this case, ftrace_regs can be NULL in user callback.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/173518990044.391279.17406984900626078579.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Most architectures use pt_regs within ftrace_regs making a lot of the
accessor functions just calls to the pt_regs internally. Instead of
duplication this effort, use a HAVE_ARCH_FTRACE_REGS for architectures
that have their own ftrace_regs that is not based on pt_regs and will
define all the accessor functions, and for the architectures that just use
pt_regs, it will leave it undefined, and the default accessor functions
will be used.
Note, this will also make it easier to add new accessor functions to
ftrace_regs as it will mean having to touch less architectures.
Cc: <linux-arch@vger.kernel.org>
Cc: "x86@kernel.org" <x86@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/20241010202114.2289f6fd@gandalf.local.home
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> # powerpc
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
ftrace_regs was created to hold registers that store information to save
function parameters, return value and stack. Since it is a subset of
pt_regs, it should only be used by its accessor functions. But because
pt_regs can easily be taken from ftrace_regs (on most archs), it is
tempting to use it directly. But when running on other architectures, it
may fail to build or worse, build but crash the kernel!
Instead, make struct ftrace_regs an empty structure and have the
architectures define __arch_ftrace_regs and all the accessor functions
will typecast to it to get to the actual fields. This will help avoid
usage of ftrace_regs directly.
Link: https://lore.kernel.org/all/20241007171027.629bdafd@gandalf.local.home/
Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Cc: "x86@kernel.org" <x86@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/20241008230628.958778821@goodmis.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Simplify return address printing in the function graph tracer by removing
fgraph_extras. Since this feature is only used by the function graph
tracer and the feature flags can directly accessible from the function
graph tracer, fgraph_extras can be removed from the fgraph callback.
Cc: Donglin Peng <dolinux.peng@gmail.com>
Link: https://lore.kernel.org/172857234900.270774.15378354017601069781.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When using function_graph tracer to analyze the flow of kernel function
execution, it is often necessary to quickly locate the exact line of code
where the call occurs. While this may be easy at times, it can be more
time-consuming when some functions are inlined or the flow is too long.
This feature aims to simplify the process by recording the return address
of traced funcions and printing it when outputing trace logs.
To enhance human readability, the prefix 'ret=' is used for the kernel return
value, while '<-' serves as the prefix for the return address in trace logs to
make it look more like the function tracer.
A new trace option named 'funcgraph-retaddr' has been introduced, and the
existing option 'sym-addr' can be used to control the format of the return
address.
See below logs with both funcgraph-retval and funcgraph-retaddr enabled.
0) | load_elf_binary() { /* <-bprm_execve+0x249/0x600 */
0) | load_elf_phdrs() { /* <-load_elf_binary+0x84/0x1730 */
0) | __kmalloc_noprof() { /* <-load_elf_phdrs+0x4a/0xb0 */
0) 3.657 us | __cond_resched(); /* <-__kmalloc_noprof+0x28c/0x390 ret=0x0 */
0) + 24.335 us | } /* __kmalloc_noprof ret=0xffff8882007f3000 */
0) | kernel_read() { /* <-load_elf_phdrs+0x6c/0xb0 */
0) | rw_verify_area() { /* <-kernel_read+0x2b/0x50 */
0) | security_file_permission() { /* <-kernel_read+0x2b/0x50 */
0) | selinux_file_permission() { /* <-security_file_permission+0x26/0x40 */
0) | __inode_security_revalidate() { /* <-selinux_file_permission+0x6d/0x140 */
0) 2.034 us | __cond_resched(); /* <-__inode_security_revalidate+0x5f/0x80 ret=0x0 */
0) 6.602 us | } /* __inode_security_revalidate ret=0x0 */
0) 2.214 us | avc_policy_seqno(); /* <-selinux_file_permission+0x107/0x140 ret=0x0 */
0) + 16.670 us | } /* selinux_file_permission ret=0x0 */
0) + 20.809 us | } /* security_file_permission ret=0x0 */
0) + 25.217 us | } /* rw_verify_area ret=0x0 */
0) | __kernel_read() { /* <-load_elf_phdrs+0x6c/0xb0 */
0) | ext4_file_read_iter() { /* <-__kernel_read+0x160/0x2e0 */
Then, we can use the faddr2line to locate the source code, for example:
$ ./scripts/faddr2line ./vmlinux load_elf_phdrs+0x6c/0xb0
load_elf_phdrs+0x6c/0xb0:
elf_read at fs/binfmt_elf.c:471
(inlined by) load_elf_phdrs at fs/binfmt_elf.c:531
Link: https://lore.kernel.org/20240915032912.1118397-1-dolinux.peng@gmail.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202409150605.HgUmU8ea-lkp@intel.com/
Signed-off-by: Donglin Peng <dolinux.peng@gmail.com>
[ Rebased to handle text_delta offsets ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The calltime field in the shadow stack frame is only used by the function
graph tracer and profiler. But now that there's other users of the function
graph infrastructure, this adds overhead and wastes space on the shadow
stack. Move the calltime to the fgraph data storage, where the function
graph and profiler entry functions will save it in its own graph storage and
retrieve it in its exit functions.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Link: https://lore.kernel.org/20240914214827.096968730@goodmis.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Instead of having the "subtime" for the function profiler in the
infrastructure ftrace_ret_stack structure, have it use the fgraph data
reserve and retrieve functions.
This will keep the limited shadow stack from wasting 8 bytes for something
that is seldom used.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Link: https://lore.kernel.org/20240914214826.780323141@goodmis.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
To clarify what will be expected on ftrace_regs, add a comment to the
architecture independent definition of the ftrace_regs.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.
This patch has been generated by the following coccinelle script:
```
virtual patch
@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos);
@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }
@r3@
identifier func;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int , void *, size_t *, loff_t *);
@r4@
identifier func, ctl;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int , void *, size_t *, loff_t *);
@r5@
identifier func, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code
conventions. The xfs_stats_clear_proc_handler,
xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified.
This is called from a proc_handler itself and is calling back into
another proc_handler, making it necessary to change it as part of the
proc_handler migration.
Co-developed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Co-developed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:
"Rewrite of function graph tracer to allow multiple users
Up until now, the function graph tracer could only have a single user
attached to it. If another user tried to attach to the function graph
tracer while one was already attached, it would fail. Allowing
function graph tracer to have more than one user has been asked for
since 2009, but it required a rewrite to the logic to pull it off so
it never happened. Until now!
There's three systems that trace the return of a function. That is
kretprobes, function graph tracer, and BPF. kretprobes and function
graph tracing both do it similarly. The difference is that kretprobes
uses a shadow stack per callback and function graph tracer creates a
shadow stack for all tasks. The function graph tracer method makes it
possible to trace the return of all functions. As kretprobes now needs
that feature too, allowing it to use function graph tracer was needed.
BPF also wants to trace the return of many probes and its method
doesn't scale either. Having it use function graph tracer would
improve that.
By allowing function graph tracer to have multiple users allows both
kretprobes and BPF to use function graph tracer in these cases. This
will allow kretprobes code to be removed in the future as it's version
will no longer be needed.
Note, function graph tracer is only limited to 16 simultaneous users,
due to shadow stack size and allocated slots"
* tag 'ftrace-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (49 commits)
fgraph: Use str_plural() in test_graph_storage_single()
function_graph: Add READ_ONCE() when accessing fgraph_array[]
ftrace: Add missing kerneldoc parameters to unregister_ftrace_direct()
function_graph: Everyone uses HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, remove it
function_graph: Fix up ftrace_graph_ret_addr()
function_graph: Make fgraph_update_pid_func() a stub for !DYNAMIC_FTRACE
function_graph: Rename BYTE_NUMBER to CHAR_NUMBER in selftests
fgraph: Remove some unused functions
ftrace: Hide one more entry in stack trace when ftrace_pid is enabled
function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabled
function_graph: Make fgraph_do_direct static key static
ftrace: Fix prototypes for ftrace_startup/shutdown_subops()
ftrace: Assign RCU list variable with rcu_assign_ptr()
ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU
ftrace: Declare function_trace_op in header to quiet sparse warning
ftrace: Add comments to ftrace_hash_move() and friends
ftrace: Convert "inc" parameter to bool in ftrace_hash_rec_update_modify()
ftrace: Add comments to ftrace_hash_rec_disable/enable()
ftrace: Remove "filter_hash" parameter from __ftrace_hash_rec_update()
ftrace: Rename dup_hash() and comment it
...
|
|
Building with W=1 in some configurations produces a false positive
warning for kallsyms:
kernel/kallsyms.c: In function '__sprint_symbol.isra':
kernel/kallsyms.c:503:17: error: 'strcpy' source argument is the same as destination [-Werror=restrict]
503 | strcpy(buffer, name);
| ^~~~~~~~~~~~~~~~~~~~
This originally showed up while building with -O3, but later started
happening in other configurations as well, depending on inlining
decisions. The underlying issue is that the local 'name' variable is
always initialized to the be the same as 'buffer' in the called functions
that fill the buffer, which gcc notices while inlining, though it could
see that the address check always skips the copy.
The calling conventions here are rather unusual, as all of the internal
lookup functions (bpf_address_lookup, ftrace_mod_address_lookup,
ftrace_func_address_lookup, module_address_lookup and
kallsyms_lookup_buildid) already use the provided buffer and either return
the address of that buffer to indicate success, or NULL for failure,
but the callers are written to also expect an arbitrary other buffer
to be returned.
Rework the calling conventions to return the length of the filled buffer
instead of its address, which is simpler and easier to follow as well
as avoiding the warning. Leave only the kallsyms_lookup() calling conventions
unchanged, since that is called from 16 different functions and
adapting this would be a much bigger change.
Link: https://lore.kernel.org/lkml/20200107214042.855757-1-arnd@arndb.de/
Link: https://lore.kernel.org/lkml/20240326130647.7bfb1d92@gandalf.local.home/
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
All architectures that implement function graph also implements
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR. Remove it, as it is no longer a
differentiator.
Link: https://lore.kernel.org/linux-trace-kernel/20240611031737.982047614@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Sparse complains that function_trace_op is not static but is not declared
in a header file. It is used only in assembly code. But add it to a header
so that sparse no longer complains:
kernel/trace/ftrace.c:99:19: warning: symbol 'function_trace_op' was not declared. Should it be static?
Link: https://lore.kernel.org/linux-trace-kernel/20240605202708.289105647@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Added functions that can be called by a fgraph_ops entryfunc and retfunc to
store state between the entry of the function being traced to the exit of
the same function. The fgraph_ops entryfunc() may call
fgraph_reserve_data() to store up to 32 words onto the task's shadow
ret_stack and this then can be retrieved by fgraph_retrieve_data() called
by the corresponding retfunc().
Co-developed with Masami Hiramatsu:
Link: https://lore.kernel.org/linux-trace-kernel/171509109089.162236.11372474169781184034.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190823.959703050@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add a "task variables" array on the tasks shadow ret_stack that is the
size of longs for each possible registered fgraph_ops. That's a total
of 16, taking up 8 * 16 = 128 bytes (out of a page size 4k).
This will allow for fgraph_ops to do specific features on a per task basis
having a way to maintain state for each task.
Co-developed with Masami Hiramatsu:
Link: https://lore.kernel.org/linux-trace-kernel/171509104383.162236.12239656156685718550.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190823.308806126@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Now that the function_graph has a main callback that handles the function
graph subops tracing, it no longer honors the pid filtering of ftrace. Add
back this logic in the function_graph code to update the gops callback for
the entry function to test if it should trace the current task or not.
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.991720703@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Allow for instances to have their own ftrace_ops part of the fgraph_ops
that makes the funtion_graph tracer filter on the set_ftrace_filter file
of the instance and not the top instance.
This uses the new ftrace_startup_subops(), by using graph_ops as the
"manager ops" that defines the callback function and adds the functions
defined by the filters of the ops for each trace instance. The callback
defined by the manager ops will call the registered fgraph ops that were
added to the fgraph_array.
Co-developed with Masami Hiramatsu:
Link: https://lore.kernel.org/linux-trace-kernel/171509102088.162236.15758883237657317789.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.832946261@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The subops filters use a "manager" ops to enable and disable its filters.
The manager ops can handle more than one subops, and its filter is what
controls what functions get set. Add a ftrace_hash_move_and_update_subops()
function that will update the manager ops when the subops filters change.
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.673932251@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
There are cases where a single system will use a single function callback
to handle multiple users. For example, to allow function_graph tracer to
have multiple users where each can trace their own set of functions, it is
useful to only have one ftrace_ops registered to ftrace that will call a
function by the function_graph tracer to handle the multiplexing with the
different registered function_graph tracers.
Add a "subop_list" to the ftrace_ops that will hold a list of other
ftrace_ops that the top ftrace_ops will manage.
The function ftrace_startup_subops() that takes the manager ftrace_ops and
a subop ftrace_ops it will manage. If there are no subops with the
ftrace_ops yet, it will copy the ftrace_ops subop filters to the manager
ftrace_ops and register that with ftrace_startup(), and adds the subop to
its subop_list. If the manager ops already has something registered, it
will then merge the new subop filters with what it has and enable the new
functions that covers all the subops it has.
To remove a subop, ftrace_shutdown_subops() is called which will use the
subop_list of the manager ops to rebuild all the functions it needs to
trace, and update the ftrace records to only call the functions it now has
registered. If there are no more functions registered, it will then call
ftrace_shutdown() to disable itself completely.
Note, it is up to the manager ops callback to always make sure that the
subops callbacks are called if its filter matches, as there are times in
the update where the callback could be calling more functions than those
that are currently registered.
This could be updated to handle other systems other than function_graph,
for example, fprobes could use this (but will need an interface to call
ftrace_startup_subops()).
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.508431129@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Some of the flags for ftrace_startup() may be exposed even when
CONFIG_DYNAMIC_FTRACE is not configured in. This is fine as the difference
between dynamic ftrace and static ftrace is done within the internals of
ftrace itself. No need to have use cases fail to compile because dynamic
ftrace is disabled.
This change is needed to move some of the logic of what is passed to
ftrace_startup() out of the parameters of ftrace_startup().
Link: https://lore.kernel.org/linux-trace-kernel/171509100890.162236.4362350342549122222.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.350654104@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Now that function graph tracing can handle more than one user, allow it to
be enabled in the ftrace instances. Note, the filtering of the functions is
still joined by the top level set_ftrace_filter and friends, as well as the
graph and nograph files.
Co-developed with Masami Hiramatsu:
Link: https://lore.kernel.org/linux-trace-kernel/171509099743.162236.1699959255446248163.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.190630762@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Pass the fgraph_ops structure to the function graph callbacks. This will
allow callbacks to add a descriptor to a fgraph_ops private field that wil
be added in the future and use it for the callbacks. This will be useful
when more than one callback can be registered to the function graph tracer.
Co-developed with Masami Hiramatsu:
Link: https://lore.kernel.org/linux-trace-kernel/171509098588.162236.4787930115997357578.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.035147698@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Allow for multiple users to attach to function graph tracer at the same
time. Only 16 simultaneous users can attach to the tracer. This is because
there's an array that stores the pointers to the attached fgraph_ops. When
a function being traced is entered, each of the ftrace_ops entryfunc is
called and if it returns non zero, its index into the array will be added
to the shadow stack.
On exit of the function being traced, the shadow stack will contain the
indexes of the ftrace_ops on the array that want their retfunc to be
called.
Because a function may sleep for a long time (if a task sleeps itself),
the return of the function may be literally days later. If the ftrace_ops
is removed, its place on the array is replaced with a ftrace_ops that
contains the stub functions and that will be called when the function
finally returns.
If another ftrace_ops is added that happens to get the same index into the
array, its return function may be called. But that's actually the way
things current work with the old function graph tracer. If one tracer is
removed and another is added, the new one will get the return calls of the
function traced by the previous one, thus this is not a regression. This
can be fixed by adding a counter to each time the array item is updated and
save that on the shadow stack as well, such that it won't be called if the
index saved does not match the index on the array.
Note, being able to filter functions when both are called is not completely
handled yet, but that shouldn't be too hard to manage.
Co-developed with Masami Hiramatsu:
Link: https://lore.kernel.org/linux-trace-kernel/171509096221.162236.8806372072523195752.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190821.555493396@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
This commit replaces riscv's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This is required for the ongoing effort to stop
relying on stop_machine() for RISCV's implementation of ftrace.
The main relevant benefit that this change will bring for the above
use-case is that now we don't have separate ftrace_caller and
ftrace_regs_caller trampolines. This will allow the callsite to call
ftrace_caller by modifying a single instruction. Now the callsite can
do something similar to:
When not tracing: | When tracing:
func: func:
auipc t0, ftrace_caller_top auipc t0, ftrace_caller_top
nop <=========<Enable/Disable>=========> jalr t0, ftrace_caller_bottom
[...] [...]
The above assumes that we are dropping the support of calling a direct
trampoline from the callsite. We need to drop this as the callsite can't
change the target address to call, it can only enable/disable a call to
a preset target (ftrace_caller in the above diagram). We can later optimize
this by calling an intermediate dispatcher trampoline before ftrace_caller.
Currently, ftrace_regs_caller saves all CPU registers in the format of
struct pt_regs and allows the tracer to modify them. We don't need to
save all of the CPU registers because at function entry only a subset of
pt_regs is live:
|----------+----------+---------------------------------------------|
| Register | ABI Name | Description |
|----------+----------+---------------------------------------------|
| x1 | ra | Return address for traced function |
| x2 | sp | Stack pointer |
| x5 | t0 | Return address for ftrace_caller trampoline |
| x8 | s0/fp | Frame pointer |
| x10-11 | a0-1 | Function arguments/return values |
| x12-17 | a2-7 | Function arguments |
|----------+----------+---------------------------------------------|
See RISCV calling convention[1] for the above table.
Saving just the live registers decreases the amount of stack space
required from 288 Bytes to 112 Bytes.
Basic testing was done with this on the VisionFive 2 development board.
Note:
- Moving from REGS to ARGS will mean that RISCV will stop supporting
KPROBES_ON_FTRACE as it requires full pt_regs to be saved.
- KPROBES_ON_FTRACE will be supplanted by FPROBES see [2].
[1] https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf
[2] https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240405142453.4187-1-puranjay@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Commit 8788ca164eb4b ("ftrace: Remove the legacy _ftrace_direct API")
stopped setting the 'ftrace_direct_func_count' variable, but left
it around. Clean it up.
Link: https://lore.kernel.org/linux-trace-kernel/20240506233305.215735-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Commit 8788ca164eb4b ("ftrace: Remove the legacy _ftrace_direct API")
stopped using 'ftrace_direct_funcs' (and the associated
struct ftrace_direct_func). Remove them.
Build tested only (on x86-64 with FTRACE and DYNAMIC_FTRACE
enabled)
Link: https://lore.kernel.org/linux-trace-kernel/20240504132303.67538-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Currently ftrace only dumps the global trace buffer on an OOPs. For
debugging a production usecase, instance trace will be helpful to
check specific problems since global trace buffer may be used for
other purposes.
This patch extend the ftrace_dump_on_oops parameter to dump a specific
or multiple trace instances:
- ftrace_dump_on_oops=0: as before -- don't dump
- ftrace_dump_on_oops[=1]: as before -- dump the global trace buffer
on all CPUs
- ftrace_dump_on_oops=2 or =orig_cpu: as before -- dump the global
trace buffer on CPU that triggered the oops
- ftrace_dump_on_oops=<instance_name>: new behavior -- dump the
tracing instance matching <instance_name>
- ftrace_dump_on_oops[=2/orig_cpu],<instance1_name>[=2/orig_cpu],
<instrance2_name>[=2/orig_cpu]: new behavior -- dump the global trace
buffer and multiple instance buffer on all CPUs, or only dump on CPU
that triggered the oops if =2 or =orig_cpu is given
Also, the sysctl node can handle the input accordingly.
Link: https://lore.kernel.org/linux-trace-kernel/20240223083126.1817731-1-quic_hyiwei@quicinc.com
Cc: Ross Zwisler <zwisler@google.com>
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: <mcgrof@kernel.org>
Cc: <keescook@chromium.org>
Cc: <j.granados@samsung.com>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <corbet@lwn.net>
Signed-off-by: Huang Yiwei <quic_hyiwei@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
ftrace_disable_daemon()
The definition of ftrace_enable_daemon() and ftrace_disable_daemon() has
been removed since commit cb7be3b2fc2c ("ftrace: remove daemon"), remain
the declarations in the header files, so remove it.
Link: https://lore.kernel.org/linux-trace-kernel/20230804013636.115940-1-zhangzekun11@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
commit 6a9c981b1e96 ("ftrace: Remove unused function ftrace_arch_read_dyn_info()")
left ftrace_arch_read_dyn_info() extern declaration.
And commit 1d74f2a0f64b ("ftrace: remove ftrace_ip_converted()")
leave ftrace_ip_converted() declaration.
Link: https://lore.kernel.org/linux-trace-kernel/20230725134808.9716-1-yuehaibing@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
These are all tracing W=1 warnings in arm64 allmodconfig about missing
prototypes:
kernel/trace/trace_kprobe_selftest.c:7:5: error: no previous prototype for 'kprobe_trace_selftest_target' [-Werror=missing-pro
totypes]
kernel/trace/ftrace.c:329:5: error: no previous prototype for '__register_ftrace_function' [-Werror=missing-prototypes]
kernel/trace/ftrace.c:372:5: error: no previous prototype for '__unregister_ftrace_function' [-Werror=missing-prototypes]
kernel/trace/ftrace.c:4130:15: error: no previous prototype for 'arch_ftrace_match_adjust' [-Werror=missing-prototypes]
kernel/trace/fgraph.c:243:15: error: no previous prototype for 'ftrace_return_to_handler' [-Werror=missing-prototypes]
kernel/trace/fgraph.c:358:6: error: no previous prototype for 'ftrace_graph_sleep_time_control' [-Werror=missing-prototypes]
arch/arm64/kernel/ftrace.c:460:6: error: no previous prototype for 'prepare_ftrace_return' [-Werror=missing-prototypes]
arch/arm64/kernel/ptrace.c:2172:5: error: no previous prototype for 'syscall_trace_enter' [-Werror=missing-prototypes]
arch/arm64/kernel/ptrace.c:2195:6: error: no previous prototype for 'syscall_trace_exit' [-Werror=missing-prototypes]
Move the declarations to an appropriate header where they can be seen
by the caller and callee, and make sure the headers are included where
needed.
Link: https://lore.kernel.org/linux-trace-kernel/20230517125215.930689-1-arnd@kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Florent Revest <revest@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
[ Fixed ftrace_return_to_handler() to handle CONFIG_HAVE_FUNCTION_GRAPH_RETVAL case ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Adding new available_filter_functions_addrs file that shows all available
functions (same as available_filter_functions) together with addresses,
like:
# cat available_filter_functions_addrs | head
ffffffff81000770 __traceiter_initcall_level
ffffffff810007c0 __traceiter_initcall_start
ffffffff81000810 __traceiter_initcall_finish
ffffffff81000860 trace_initcall_finish_cb
...
Note displayed address is the patch-site address and can differ from
/proc/kallsyms address.
It's useful to have address avilable for traceable symbols, so we don't
need to allways cross check kallsyms with available_filter_functions
(or the other way around) and have all the data in single file.
For backwards compatibility reasons we can't change the existing
available_filter_functions file output, but we need to add new file.
The problem is that we need to do 2 passes:
- through available_filter_functions and find out if the function is traceable
- through /proc/kallsyms to get the address for traceable function
Having available_filter_functions symbols together with addresses allow
us to skip the kallsyms step and we are ok with the address in
available_filter_functions_addr not being the function entry, because
kprobe_multi uses fprobe and that handles both entry and patch-site
address properly.
We have 2 interfaces how to create kprobe_multi link:
a) passing symbols to kernel
1) user gathers symbols and need to ensure that they are
trace-able -> pass through available_filter_functions file
2) kernel takes those symbols and translates them to addresses
through kallsyms api
3) addresses are passed to fprobe/ftrace through:
register_fprobe_ips
-> ftrace_set_filter_ips
b) passing addresses to kernel
1) user gathers symbols and needs to ensure that they are
trace-able -> pass through available_filter_functions file
2) user takes those symbols and translates them to addresses
through /proc/kallsyms
3) addresses are passed to the kernel and kernel calls:
register_fprobe_ips
-> ftrace_set_filter_ips
The new available_filter_functions_addrs file helps us with option b),
because we can make 'b 1' and 'b 2' in one step - while filtering traceable
functions, we get the address directly.
Link: https://lore.kernel.org/linux-trace-kernel/20230611130029.1202298-1-jolsa@kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Tested-by: Jackie Liu <liuyun01@kylinos.cn> # x86
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|