From 0f72027bb9fb77a2b2ea7d73e6d58bb9c0efb1e8 Mon Sep 17 00:00:00 2001
From: Howard Chu
Date: Wed, 30 Apr 2025 19:28:00 -0700
Subject: perf record --off-cpu: Parse off-cpu event

Parse the off-cpu event using parse_event(), as bpf-output.

Call evlist__enable_evsel() on the off-cpu event. This fixes the
inability to collect direct off-cpu samples on a workload, as reported
by Arnaldo Carvalho de Melo.

The reason is that a workload sets enable_on_exec instead of calling
evlist__enable(), but the off-cpu event does not attach to an executable
and execve won't be called for it, so the fds from perf_event_open() are
never enabled.

no-inherit should be set to 1, for the following reason:

We update the BPF perf_event map for direct off-cpu sample dumping (in
following patches). The update executes as follows:

bpf_map_update_value()
 bpf_fd_array_map_update_elem()
  perf_event_fd_array_get_ptr()
   perf_event_read_local()

In perf_event_read_local(), there is:

int perf_event_read_local(struct perf_event *event, u64 *value,
			  u64 *enabled, u64 *running)
{
...
	/*
	 * It must not be an event with inherit set, we cannot read
	 * all child counters from atomic context.
	 */
	if (event->attr.inherit) {
		ret = -EOPNOTSUPP;
		goto out;
	}

This means no-inherit has to be true for updating the BPF perf_event
map.

Moreover, for bpf-output events, we primarily want a system-wide event
instead of a per-task event. The reason is that in BPF's
bpf_perf_event_output(), BPF uses the CPU index to retrieve the
perf_event file descriptor it outputs to. Making a bpf-output event
system-wide naturally satisfies this requirement by mapping CPUs
appropriately.
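
As an illustration only, the two constraints above (no inherit, and one
event per CPU index) can be modelled in plain user-space C. This is a toy
sketch, not kernel or perf code: struct toy_event, struct toy_perf_map,
TOY_NR_CPUS, toy_map_update() and toy_output_fd() are all hypothetical
names, and the error codes other than -EOPNOTSUPP for inherit are chosen
here for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Hypothetical stand-in for an event opened with perf_event_open(). */
struct toy_event {
	int  fd;	/* file descriptor the samples would go to */
	bool inherit;	/* mirrors perf_event_attr.inherit */
};

/* Hypothetical stand-in for the BPF perf_event map: one slot per CPU. */
struct toy_perf_map {
	const struct toy_event *slot[TOY_NR_CPUS];
};

/*
 * Install an event in the slot for @cpu. Events with inherit set are
 * refused, mirroring the perf_event_read_local() check quoted above.
 */
static int toy_map_update(struct toy_perf_map *map, int cpu,
			  const struct toy_event *ev)
{
	assert(map != NULL && ev != NULL);

	if (cpu < 0 || cpu >= TOY_NR_CPUS)
		return -E2BIG;		/* illustrative choice */
	if (ev->inherit)
		return -EOPNOTSUPP;

	map->slot[cpu] = ev;
	return 0;
}

/*
 * Pick the output fd by the CPU the (modelled) BPF program runs on.
 * Returns -ENOENT (illustrative choice) if that CPU has no event.
 */
static int toy_output_fd(const struct toy_perf_map *map, int cur_cpu)
{
	assert(map != NULL);

	if (cur_cpu < 0 || cur_cpu >= TOY_NR_CPUS)
		return -E2BIG;
	if (map->slot[cur_cpu] == NULL)
		return -ENOENT;

	return map->slot[cur_cpu]->fd;
}
```

In this model, installing one non-inherited event per CPU gives every CPU
index an fd to output to, whereas an inherited event is rejected at update
time and its slot stays empty.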

Suggested-by: Namhyung Kim
Reviewed-by: Ian Rogers
Signed-off-by: Howard Chu
Tested-by: Arnaldo Carvalho de Melo
Tested-by: Gautam Menghani
Tested-by: Ian Rogers
Acked-by: Namhyung Kim
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Ingo Molnar
Cc: James Clark
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Mark Rutland
Cc: Peter Zijlstra
Link: https://lore.kernel.org/r/20241108204137.2444151-4-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-3-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-record.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 6637a3acb1f1..4194ea5ac729 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2568,6 +2568,13 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	if (!target__none(&opts->target) && !opts->target.initial_delay)
 		evlist__enable(rec->evlist);
 
+	/*
+	 * offcpu-time does not call execve, so enable_on_exec wouldn't work
+	 * when recording a workload, do it manually
+	 */
+	if (rec->off_cpu)
+		evlist__enable_evsel(rec->evlist, (char *)OFFCPU_EVENT);
+
 	/*
 	 * Let the child rip
 	 */
--
cgit v1.2.3