linux-toradex.git/tools/lib/perf, branch v6.14-rc2

Merge remote-tracking branch 'torvalds/master' into perf-tools-next

2024-12-13T14:53:27+00:00

To get the fixes that went through perf-tools for v6.13.

Signed-off-by: Arnaldo Carvalho de Melo

libperf: evlist: Fix --cpu argument on hybrid platform

2024-12-11T17:19:44+00:00

Since the commit named in the Fixes: tag below, specifying a CPU on
hybrid platforms results in an error because perf tries to open an
extended type event on "any" CPU, which isn't valid. Extended type
events can only be opened on CPUs that match the type.

Before (working):

  $ perf record --cpu 1 -- true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.385 MB perf.data (7 samples) ]

After (not working):

  $ perf record -C 1 -- true
  WARNING: A requested CPU in '1' is not supported by PMU 'cpu_atom' (CPUs 16-27) for event 'cycles:P'
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cpu_atom/cycles:P/).
  /bin/dmesg | grep -i perf may provide additional information.

(Ignore the warning message; it is expected and not particularly
relevant to this issue.)

This is because perf_cpu_map__intersect() of the user specified CPU (1)
and one of the PMU's CPUs (16-27) correctly results in an empty (NULL)
CPU map. However, for the purposes of opening an event, libperf converts
empty CPU maps into an "any CPU" (-1) map, which the kernel rejects.

Fix it by deleting evsels with empty CPU maps in the specific case where
user requested CPU maps are evaluated.
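
The behaviour described above can be modelled with a small standalone
sketch. It does not use libperf; the toy_* names, the fixed-size map and
the 0-15 CPU range used in the usage note are hypothetical and only
illustrate why an empty intersection must lead to the event being
dropped rather than opened on CPU -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CPUS 64

/* Hypothetical stand-in for a CPU map: a sorted list of CPU ids. */
struct toy_cpu_map {
	int cpus[TOY_MAX_CPUS];
	size_t nr;
};

/* Build a map holding the inclusive range first..last. */
static struct toy_cpu_map toy_range(int first, int last)
{
	struct toy_cpu_map m = { .nr = 0 };

	assert(first >= 0 && first <= last && last - first < TOY_MAX_CPUS);
	for (int cpu = first; cpu <= last; cpu++)
		m.cpus[m.nr++] = cpu;
	return m;
}

/* Intersect two sorted maps; an empty result has nr == 0. */
static struct toy_cpu_map toy_intersect(const struct toy_cpu_map *a,
					const struct toy_cpu_map *b)
{
	struct toy_cpu_map out = { .nr = 0 };
	size_t i = 0, j = 0;

	while (i < a->nr && j < b->nr) {
		if (a->cpus[i] < b->cpus[j]) {
			i++;
		} else if (a->cpus[i] > b->cpus[j]) {
			j++;
		} else {
			out.cpus[out.nr++] = a->cpus[i];
			i++;
			j++;
		}
	}
	return out;
}

/*
 * Models the conversion that caused the bug: an empty map is treated
 * as "any CPU" (-1) when choosing the CPU to open the event on.
 */
static int toy_cpu_for_open(const struct toy_cpu_map *map)
{
	return map->nr ? map->cpus[0] : -1;
}

/*
 * Models the fix: when the user requested specific CPUs, an event whose
 * PMU shares no CPU with that request is dropped instead of opened.
 */
static bool toy_keep_event(const struct toy_cpu_map *user,
			   const struct toy_cpu_map *pmu)
{
	struct toy_cpu_map both = toy_intersect(user, pmu);

	return both.nr > 0;
}
```

With a user map of CPU 1 and a PMU map of CPUs 16-27, toy_intersect()
returns an empty map, toy_cpu_for_open() would yield -1, and
toy_keep_event() returns false. Against a hypothetical PMU covering
CPUs 0-15 the same request keeps the event and opens it on CPU 1.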

Fixes: 251aa040244a ("perf parse-events: Wildcard most "numeric" events")
Reviewed-by: Ian Rogers 
Tested-by: Thomas Falcon 
Signed-off-by: James Clark 
Tested-by: Arnaldo Carvalho de Melo 
Link: https://lore.kernel.org/r/20241114160450.295844-2-james.clark@linaro.org
Signed-off-by: Namhyung Kim

libperf cpumap: Grow array of read CPUs in smaller increments

2024-12-09T20:52:41+00:00

Instead of growing the array by 2048, grow by the larger of the current
range or 16.

Ranges are typical for things like the online CPUs, so this usually
means a single allocation happens.

Uncore CPU maps will instead grow 16 entries at a time, which is
generous except, say, on large servers.
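
A minimal sketch of the growth policy, assuming a plain realloc()-backed
array. The struct and function names are hypothetical and this is not
the libperf code itself; it only shows the "larger of the current range
or 16" rule and counts how many allocations result.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MIN_GROW 16

/* Hypothetical growable array of CPU ids. */
struct toy_cpu_array {
	int *cpus;
	size_t nr;		/* entries in use */
	size_t cap;		/* entries allocated */
	size_t nr_allocs;	/* how many times the array was grown */
};

/* Grow by the size of the range being added, but by at least 16. */
static size_t toy_grow_increment(int start_cpu, int end_cpu)
{
	size_t range = (size_t)(end_cpu - start_cpu) + 1;

	return range > TOY_MIN_GROW ? range : TOY_MIN_GROW;
}

/* Append the inclusive range start_cpu..end_cpu; returns 0 or -1. */
static int toy_append_range(struct toy_cpu_array *a, int start_cpu, int end_cpu)
{
	assert(start_cpu >= 0 && start_cpu <= end_cpu);

	for (int cpu = start_cpu; cpu <= end_cpu; cpu++) {
		if (a->nr == a->cap) {
			size_t new_cap = a->cap +
				toy_grow_increment(start_cpu, end_cpu);
			int *tmp = realloc(a->cpus, new_cap * sizeof(*tmp));

			if (!tmp)
				return -1;
			a->cpus = tmp;
			a->cap = new_cap;
			a->nr_allocs++;
		}
		a->cpus[a->nr++] = cpu;
	}
	return 0;
}
```

Appending the range 0-255 to an empty array grows it once, by 256
entries. Appending single CPUs one at a time grows it in steps of 16.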

Reviewed-by: Leo Yan 
Signed-off-by: Ian Rogers 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ben Gainey 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Kyle Meyer 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20241206044035.1062032-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo

libperf cpumap: Remove perf_cpu_map__read()

2024-12-09T20:52:41+00:00

Function is no longer used and duplicates the parsing logic from
perf_cpu_map__new().

Remove to allow simplification.

Reviewed-by: Leo Yan 
Signed-off-by: Ian Rogers 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ben Gainey 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Kyle Meyer 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20241206044035.1062032-8-irogers@google.com
[ Applied manually to cope with "libperf cpumap: Refactor perf_cpu_map__merge()" ]
Signed-off-by: Arnaldo Carvalho de Melo

libperf cpumap: Remove use of perf_cpu_map__read()

2024-12-09T20:52:41+00:00

Remove use of a FILE and switch to reading a string that is then
passed to perf_cpu_map__new().

Being able to remove perf_cpu_map__read() avoids duplicated parsing
logic.

Reviewed-by: Leo Yan 
Signed-off-by: Ian Rogers 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ben Gainey 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Kyle Meyer 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20241206044035.1062032-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo

libperf cpumap: Be tolerant of newline at the end of a cpumask

2024-12-09T20:52:41+00:00

File cpumasks often have a newline that shouldn't trigger the invalid
parsing case in perf_cpu_map__new().
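
As an illustration only, the sketch below parses a CPU list such as
"0-3,8" and accepts one trailing newline, as found when the string comes
from a file. toy_parse_cpu_list() is a hypothetical helper written for
this example; it is not perf_cpu_map__new() and makes no claim about how
that function treats other inputs such as an empty string.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse "a-b,c,..." into out[]. Returns the number of CPUs stored, or
 * -1 on malformed input or if more than max CPUs would be needed.
 */
static int toy_parse_cpu_list(const char *s, int *out, int max)
{
	int nr = 0;

	assert(s && out && max > 0);

	while (isdigit((unsigned char)*s)) {
		char *p;
		unsigned long start = strtoul(s, &p, 10);
		unsigned long end = start;

		if (*p == '-') {
			s = p + 1;
			if (!isdigit((unsigned char)*s))
				return -1;
			end = strtoul(s, &p, 10);
		}
		if (end < start || end > INT_MAX)
			return -1;

		for (unsigned long cpu = start; cpu <= end; cpu++) {
			if (nr == max)
				return -1;
			out[nr++] = (int)cpu;
		}

		if (*p == ',') {
			/* A comma must be followed by another entry. */
			s = p + 1;
			if (!isdigit((unsigned char)*s))
				return -1;
			continue;
		}
		s = p;
		break;
	}

	/* Tolerate a single trailing newline, but nothing else. */
	if (*s == '\n')
		s++;

	return (*s == '\0' && nr > 0) ? nr : -1;
}
```

In this sketch "0-3\n" and "0-3" both yield four CPUs, while "0-3x" is
still rejected as invalid.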

Reviewed-by: Leo Yan 
Signed-off-by: Ian Rogers 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ben Gainey 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Kyle Meyer 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20241206044035.1062032-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo

libperf cpumap: Hide/reduce scope of MAX_NR_CPUS

2024-12-09T20:52:41+00:00

Avoid redefinition of MAX_NR_CPUS as a global constant; the original
definition is in tools/perf/perf.h.

Reviewed-by: Leo Yan 
Signed-off-by: Ian Rogers 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ben Gainey 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Kyle Meyer 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20241206044035.1062032-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo

perf: Increase MAX_NR_CPUS to 4096

2024-12-09T20:52:41+00:00

Systems have surpassed 2048 CPUs. Increase MAX_NR_CPUS to 4096.

Bitmaps declared with MAX_NR_CPUS bits will increase from 256B to 512B,
cpus_runtime will increase from 81960B to 163880B, and max_entries will
increase from 8192B to 16384B.
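
The sizes quoted above can be checked with simple arithmetic. In the
sketch below the helper names are hypothetical, and the element sizes
(40 bytes with one extra slot for cpus_runtime, 4 bytes for
max_entries) are inferred from the quoted totals rather than read from
the source, so treat them as assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a bitmap of nr_bits, rounded up to 64-bit words. */
static size_t toy_bitmap_bytes(size_t nr_bits)
{
	size_t bits_per_word = 8 * sizeof(uint64_t);
	size_t nr_words = (nr_bits + bits_per_word - 1) / bits_per_word;

	return nr_words * sizeof(uint64_t);
}

/* Bytes needed for an array of nr_elems elements of elem_size bytes. */
static size_t toy_array_bytes(size_t nr_elems, size_t elem_size)
{
	assert(elem_size == 0 || nr_elems <= SIZE_MAX / elem_size);
	return nr_elems * elem_size;
}
```

For example, toy_bitmap_bytes(2048) is 256 and toy_bitmap_bytes(4096)
is 512. Under the assumed element sizes, toy_array_bytes(2048 + 1, 40)
is 81960 and toy_array_bytes(4096 + 1, 40) is 163880, matching the
cpus_runtime figures, and 4 bytes per CPU gives 8192 and 16384.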

Reviewed-by: Ian Rogers 
Reviewed-by: Leo Yan 
Signed-off-by: Kyle Meyer 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ben Gainey 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20241206044035.1062032-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo

libperf cpumap: Refactor perf_cpu_map__merge()

2024-12-09T20:52:41+00:00

The perf_cpu_map__merge() function has two arguments, 'orig' and
'other'.  The function definition might cause confusion as it could give
the impression that the CPU maps in the two arguments are copied into a
newly allocated structure, which is then returned as the result.

The purpose of the function is to merge the CPU map 'other' into the CPU
map 'orig'.  This commit changes the 'orig' argument to a pointer to
pointer, so the new result will be updated into 'orig'.

The return value is changed to an int: 0 on success or an error number
on failure.

Update callers and tests for the new function definition.
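
The calling convention described above can be sketched with a toy map
type. Everything named toy_* is hypothetical. Unlike libperf, which
reference-counts its maps, this sketch simply frees the old map, and
the -ENOMEM return value is an assumption about what "an error number"
looks like.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical CPU map: sorted, unique CPU ids. */
struct toy_map {
	size_t nr;
	int cpus[];
};

static struct toy_map *toy_map_new(const int *cpus, size_t nr)
{
	struct toy_map *m = malloc(sizeof(*m) + nr * sizeof(m->cpus[0]));

	if (!m)
		return NULL;
	m->nr = nr;
	for (size_t i = 0; i < nr; i++)
		m->cpus[i] = cpus[i];
	return m;
}

/*
 * Merge 'other' into '*orig'. On success '*orig' points at the merged
 * map and 0 is returned; on failure '*orig' is left untouched.
 */
static int toy_map_merge(struct toy_map **orig, const struct toy_map *other)
{
	const struct toy_map *a;
	struct toy_map *merged;
	size_t a_nr, i = 0, j = 0, k = 0;

	assert(orig);
	if (!other || other->nr == 0)
		return 0;

	a = *orig;
	a_nr = a ? a->nr : 0;
	merged = malloc(sizeof(*merged) +
			(a_nr + other->nr) * sizeof(merged->cpus[0]));
	if (!merged)
		return -ENOMEM;

	/* Standard merge of two sorted lists, dropping duplicates. */
	while (i < a_nr && j < other->nr) {
		if (a->cpus[i] < other->cpus[j]) {
			merged->cpus[k++] = a->cpus[i++];
		} else if (a->cpus[i] > other->cpus[j]) {
			merged->cpus[k++] = other->cpus[j++];
		} else {
			merged->cpus[k++] = a->cpus[i++];
			j++;
		}
	}
	while (i < a_nr)
		merged->cpus[k++] = a->cpus[i++];
	while (j < other->nr)
		merged->cpus[k++] = other->cpus[j++];
	merged->nr = k;

	free(*orig);
	*orig = merged;
	return 0;
}
```

A caller passes the address of its own pointer, for example
toy_map_merge(&orig, other), checks the int result, and afterwards
keeps using orig, which now refers to the merged map.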

Reviewed-by: Adrian Hunter 
Signed-off-by: Leo Yan 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Ian Rogers 
Cc: James Clark 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Link: https://lore.kernel.org/r/20241107125308.41226-2-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo

tools/perf: Correctly calculate sample period for inherited SAMPLE_READ values

2024-10-02T21:58:03+00:00

Sample period calculation in deliver_sample_value is updated to
calculate the per-thread period delta for events that are inherit +
PERF_SAMPLE_READ. When the sampling event has this configuration, the
read_format.id is used with the tid from the sample to look up the
storage of the previously accumulated counter total before calculating
the delta. All existing valid configurations where read_format.value
represents some global value continue to use just the read_format.id to
locate the storage of the previously accumulated total.

perf_sample_id is modified to support tracking per-thread
values, along with the existing global per-id values. In the
per-thread case, values are stored in a hash by tid within the
perf_sample_id, and are dynamically allocated as the number is not known
ahead of time.
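
A rough sketch of the per-thread bookkeeping, assuming a small chained
hash keyed by tid. The toy_* types and functions are hypothetical and
are not the perf_sample_id layout; they only show how the previous
total is found either globally or per tid before the delta is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_HASH_SIZE 32

/* Previously accumulated total for one thread. */
struct toy_period {
	struct toy_period *next;
	uint32_t tid;
	uint64_t last;
};

/* Hypothetical per-id state: one global total plus per-tid totals. */
struct toy_sample_id {
	uint64_t last;
	struct toy_period *by_tid[TOY_HASH_SIZE];
};

/* Find (or allocate) the storage for the previous total. */
static uint64_t *toy_last_value(struct toy_sample_id *sid, bool per_thread,
				uint32_t tid)
{
	struct toy_period **head, *p;

	if (!per_thread)
		return &sid->last;

	head = &sid->by_tid[tid % TOY_HASH_SIZE];
	for (p = *head; p; p = p->next) {
		if (p->tid == tid)
			return &p->last;
	}

	/* Threads are not known ahead of time, so allocate on demand. */
	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;
	p->tid = tid;
	p->next = *head;
	*head = p;
	return &p->last;
}

/* Store value - previous total in *delta; returns 0 or -1. */
static int toy_period_delta(struct toy_sample_id *sid, bool per_thread,
			    uint32_t tid, uint64_t value, uint64_t *delta)
{
	uint64_t *last = toy_last_value(sid, per_thread, tid);

	if (!last)
		return -1;
	*delta = value - *last;
	*last = value;
	return 0;
}

static void toy_sample_id_free(struct toy_sample_id *sid)
{
	for (size_t i = 0; i < TOY_HASH_SIZE; i++) {
		while (sid->by_tid[i]) {
			struct toy_period *p = sid->by_tid[i];

			sid->by_tid[i] = p->next;
			free(p);
		}
	}
}
```

With per-thread tracking, counter totals of 100 then 160 for tid 10 and
50 then 90 for tid 20 give deltas of 100, 60, 50 and 40, regardless of
how the samples from the two threads interleave.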

Signed-off-by: Ben Gainey 
Cc: james.clark@arm.com
Link: https://lore.kernel.org/r/20241001121505.1009685-2-ben.gainey@arm.com
Signed-off-by: Namhyung Kim