summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)Author
2026-04-05perf sample: Add evsel to struct perf_sampleIan Rogers
Add the evsel from evsel__parse_sample into the struct perf_sample. Sometimes we want to alter the evsel associated with a sample, such as with off-cpu bpf-output events. In general the evsel and perf_sample are passed as a pair, but this makes an altered evsel something of a chore to keep checking for and setting up. Later patches will remove passing an evsel with the perf_sample and switch to just using the perf_sample's value. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-05perf sample: Make sure perf_sample__init/exit are usedIan Rogers
The deferred stack trace code wasn't using perf_sample__init/exit. Add the deferred stack trace clean up to perf_sample__exit which requires proper NULL initialization in perf_sample__init. Make the perf_sample__exit robust to being called more than once by using zfree. Make the error paths in evsel__parse_sample exit the sample. Add a merged_callchain boolean to capture that callchain is allocated, deferred_callchain doen't suffice for this. Pack the struct variables to avoid padding bytes for this. Similiarly powerpc_vpadtl_sample wasn't using perf_sample__init/exit, use it for consistency and potential issues with uninitialized variables. Similarly guest_session__inject_events in builtin-inject wasn't using perf_sample_init/exit. The lifetime management for fetched events is somewhat complex there, but when an event is fetched the sample should be initialized and needs exiting on error. The sample may be left in place so that future injects have access to it. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-05perf sample: Document struct perf_sampleIan Rogers
Add kernel-doc for struct perf_sample capturing the somewhat unusual population of fields and lifetime relationships. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-05perf tools: Save cln_size headerRicky Ringler
Store cacheline size during perf record in header, so that cacheline size can be used for other features, like sort keys for perf report. Testing example with feat enabled: $ perf record ./Example $ perf report --header-only | grep -C 3 cacheline CPU_DOMAIN_INFO info available, use -I to display e_machine : 62 e_flags : 0 cacheline size: 64 missing features: TRACING_DATA BUILD_ID BRANCH_STACK GROUP_DESC AUXTRACE \ STAT CLOCKID DIR_FORMAT COMPRESSED CLOCK_DATA ======== [namhyung: Update the commit message and remove blank lines] Signed-off-by: Ricky Ringler <ricky.ringler@proton.me> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-05perf tests sched stats: Write output to temp fileIan Rogers
Writing to the perf.data file can fail in various contexts such as continual test. Other tests write to a mktemp-ed file, make the "perf sched stats tests" follow this convention. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Swapnil Sapkal <swapnil.sapkal@amd.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-05perf sched: Avoid crash for unexpected perf sched stats reportNamhyung Kim
Doing a `perf sched record` then `perf sched stats report` crashes as the tp_handler isn't set. Add a dummy tp_handler for it rather than adding an extra check. Reported-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-04prctl: cfi: change the branch landing pad prctl()s to be more descriptivePaul Walmsley
Per Linus' comments requesting the replacement of "INDIR_BR_LP" in the indirect branch tracking prctl()s with something more readable, and suggesting the use of the speculation control prctl()s as an exemplar, reimplement the prctl()s and related constants that control per-task forward-edge control flow integrity. This primarily involves two changes. First, the prctls are restructured to resemble the style of the speculative execution workaround control prctls PR_{GET,SET}_SPECULATION_CTRL, to make them easier to extend in the future. Second, the "indir_br_lp" abbrevation is expanded to "branch_landing_pads" to be less telegraphic. The kselftest and documentation is adjusted accordingly. Link: https://lore.kernel.org/linux-riscv/CAHk-=whhSLGZAx3N5jJpb4GLFDqH_QvS07D+6BnkPWmCEzTAgw@mail.gmail.com/ Cc: Deepak Gupta <debug@rivosinc.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-02perf metrics: Make common stalled metrics conditional on having the eventIan Rogers
The metric code uses the event parsing code but it generally assumes all events are supported. Arnaldo reported AMD supporting stalled-cycles-frontend but not stalled-cycles-backend [1]. An issue with this is that before parsing happens the metric code tries to share events within groups to reduce the number of events and multiplexing. If the group has some supported and not supported events, the whole group will become broken. To avoid this situation add has_event tests to the metrics for stalled-cycles-frontend and stalled-cycles-backend. has_events is evaluated when parsing the metric and its result constant propagated (with if-elses) to reduce the number of events. This means when the metric code considers sharing the events, only supported events will be shared. Note for backporting. This change updates tools/perf/pmu-events/empty-pmu-events.c a convenience file for builds on systems without python present. While the metrics.json code should backport easily there can be conflicts on empty-pmu-events.c. In this case the build will have left a file test-empty-pmu-events.c that can be copied over empty-pmu-events.c to resolve issues and make an appropriate empty-pmu-events.c for the json in the source tree at the time of the build. [1] https://lore.kernel.org/lkml/abm1nR-2xjOUBroD@x1/ Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Closes: https://lore.kernel.org/lkml/abm1nR-2xjOUBroD@x1/ Fixes: c7adeb0974f1 ("perf jevents: Add set of common metrics based on default ones") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf tests kwork: Add basic kwork coverage testsIan Rogers
Add basic kwork coverage tests for record, report, latency, timehist and top. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf data convert ctf: Pipe mode improvementsIan Rogers
Handle the finished_round event. Set up the CTF events when the feature event desc is read. In pipe mode the attr events will create the evsels and the feature event desc events will name the evsels. The CTF events need the evsel name, so wait until feature event descs are read (in pipe mode) before setting up the events except for tracepoint events. Handle the tracing_data event so that tracepoint information is available when setting up tracepoint events. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf evsel: Make unknown event names more uniqueIan Rogers
In situations like the perf data converter the evsel__name will be used to create babeltrace events. If the events have the same name then creation can fail. Avoid these failures by including more information into the unknown event names. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf ordered-events: Event processing consistency with the regular readerIan Rogers
Some event processing functions like perf_event__process_tracing_data return a zero or positive value on success. Ordered event processing handles any non-zero value as an error, which is inconsistent with reader__process_events and reader__read_event that only treat negative values as errors. Make the ordered events error handling consistent with that of the events reader. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf header: Refactor pipe mode end marker handlingIan Rogers
In non-pipe/data mode the header has a 256-bit bitmap representing whether a feature is enabled or not. In pipe mode features are written out in perf_event__synthesize_features as PERF_RECORD_HEADER_FEATURE events with a special zero sized marker for the last feature. If a new feature is added the last feature marker event appears as that feature from old pipe mode perf data. As the event is zero sized it will fail to be processed and generally terminate perf. Add a last_feat variable to the header that in non-pipe/data mode is just HEADER_LAST_FEATURE. In pipe mode compute the last_feat by handling zero sized feature events, assuming they are the marker and updating last_feat accordingly. Potentially a feature event could be zero sized and so still process the feature event, just ignore the error if it fails. As perf_event__process_feature can properly handle pipe mode data, migrate users to it except for report that still wants to group events and stop header printing with the last feature marker. Make perf_event__process_feature non-fatal in the case of a newer feature than this version of perf's HEADER_LAST_FEATURE, which was the behavior all users wanted. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf session: Extra logging for failed to process eventsIan Rogers
Print log information in ordered event processing so that the cause of finished round failing is clearer. Print the event name along with its number when an event isn't processed. Add extra detail about where the failure happened. The following log lines come from running `perf data convert`. Before: 0xa250 [0x10]: failed to process type: 80 After: 0xa250 [0x10]: piped event processing failed for event of type: FEATURE (80) Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf header: Properly warn/print when libtraceevent/libbpf support is missingIan Rogers
By removing the features from feat_ops with ifdefs the previous logic would print "# (null)" when perf processed a feature that lacked builtin support. Remove the ifdefs from feat_ops and in the relevant functions print errors/messages about the lack of support. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf header: Add utility to convert feature number to a stringIan Rogers
For logging and debug messages it can be convenient to convert a feature number to a name. Add header_feat__name for this and reuse the data already within the feat_ops struct. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf clockid: Add missing includeIan Rogers
clockid_t is declared in time.h but the include is missing. Reordering header files may result in build breakages. Add the include to avoid this. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf trace: Fix potential u64 underflow in duration calculationMichael Petlan
Although it happens very rarely, in case of out-of-order events (i.e. due to CPU migration when a syscall is executed), the calculation of event duration might underflow and thus a bogus value is printed: 2.804 ( 0.001 ms): :49553/49553 rt_sigaction(sig: QUIT, act: 0x7fff403ed6e0, oact: 0x7fff403ed780, sigsetsize: 8) = 0 2.807 ( 0.001 ms): :49553/49553 rt_sigaction(sig: CHLD, act: 0x7fff403ed6e0, oact: 0x7fff403ed780, sigsetsize: 8) = 0 2.815 (18446744073709.438 ms): :49553/49553 execve(filename: 0xbb173a30, argv: 0x55aabb171930, envp: 0x55aabb171120) = 0 2.815 ( 0.534 ms): pwd/49553 ... [continued]: execve()) = 0 Check for possible underflow first and in case of a bogus value, do not print it. 2.804 ( 0.001 ms): :49553/49553 rt_sigaction(sig: QUIT, act: 0x7fff403ed6e0, oact: 0x7fff403ed780, sigsetsize: 8) = 0 2.807 ( 0.001 ms): :49553/49553 rt_sigaction(sig: CHLD, act: 0x7fff403ed6e0, oact: 0x7fff403ed780, sigsetsize: 8) = 0 2.815 ( ): :49553/49553 execve(filename: 0xbb173a30, argv: 0x55aabb171930, envp: 0x55aabb171120) = 0 2.815 ( 0.534 ms): pwd/49553 ... [continued]: execve()) = 0 Signed-off-by: Michael Petlan <mpetlan@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf header: Validate build_id filename length to prevent buffer overflowSeungJu Cheon
The build_id parsing functions calculate a filename length from the event header size and read directly into a stack buffer of PATH_MAX bytes without bounds checking. A malformed perf.data file with a crafted header.size can cause the length to be negative or exceed PATH_MAX, resulting in a stack buffer overflow. Add bounds checking for the filename length in both perf_header__read_build_ids() and the ABI quirk variant. Print a warning message when invalid length is detected. Signed-off-by: SeungJu Cheon <suunj1331@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf metricgroup: Refine error logsLeo Yan
Return -ENOENT when no metric/group matches, and directly use the return value from expr__find_ids(), so -EINVAL is reserved for parse failures. Print separate logs to make it clear. Before: perf stat -C 5 -vvv Using CPUID 0x00000000410fd490 metric expr 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES) for backend_bound parsing metric: 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES) Failure to read '#slots' literal: #slots = nan syntax error Cannot find metric or group `Default' After: perf stat -C 5 -vvv Using CPUID 0x00000000410fd490 metric expr 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES) for backend_bound parsing metric: 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES) Failure to read '#slots' literal: #slots = nan syntax error Fail to parse metric or group `Default' Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf expr: Add '\n' in literal parse errorsLeo Yan
Add a trailing newline for logs. Before: perf stat -C 5 Failure to read '#slots'Cannot find metric or group `Default' After: perf stat -C 5 Failure to read '#slots' Cannot find metric or group `Default' Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf expr: Return -EINVAL for syntax error in expr__find_ids()Leo Yan
expr__find_ids() propagates the parser return value directly. For syntax errors, the parser can return a positive value, but callers treat it as success, e.g., for below case on Arm64 platform: metric expr 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES) for backend_bound parsing metric: 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES) Failure to read '#slots' literal: #slots = nan syntax error Convert positive parser returns in expr__find_ids() to -EINVAL, as a result, the error value will be respected by callers. Before: perf stat -C 5 Failure to read '#slots'Failure to read '#slots'Failure to read '#slots'Failure to read '#slots'Segmentation fault After: perf stat -C 5 Failure to read '#slots'Cannot find metric or group `Default' Fixes: ded80bda8bc9 ("perf expr: Migrate expr ids table to a hashmap") Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf test: Skip perf data type profiling tests for s390Thomas Richter
Test case 'perf data type profiling tests' fails on s390 with this error: # ./perf mem record -- ./perf test -w code_with_type failed: no PMU supports the memory events # echo $? 255 # because s390 does not support memory events at all. According to the man page, perf annotate --code-with-type only works with memory instructions only. As command 'perf mem record ...' is not supported on s390, skip this test for s390. Output before: # ./perf test 'perf data type profiling tests' 77: perf data type profiling tests : FAILED! Output after: # ./perf test 'perf data type profiling tests' 77: perf data type profiling tests : Skip Fixes: f60a5c22967b8 ("perf tests: Test annotate with data type profiling and rust") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Suggested-by: Namhyung Kim <namhyung@kernel.org> Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf tools: prevent null dsos from being addedAnubhav Shelat
When sorting the dso array we sometimes get a crash due to null comparisons in comparator functions. So prevent __dsos__add from adding null to the dso array to avoid out-of-memory related errors. Signed-off-by: Anubhav Shelat <ashelat@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf test: Fix ratio_to_prev event parsing testThomas Falcon
test__ratio_to_prev() assumed the first event in a group is the leader, which is not the case when the event is expanded into two event groups on hybrid PMU's with auto counter reload support. Instead, iterate over the event group generated for each core PMU. Also update "wrong leader" test to check that the subordinate event has the correct leader instead of checking that it is not the group leader. Finally, do not exit immediately if a PMU without auto counter reload support is found. Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Fixes: 56be0fe5f62c ("perf record: Add auto counter reload parse and regression tests") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf tools: Fix module symbol resolution for non-zero .text sh_addrChuck Lever
When perf resolves symbols from kernel module ELF files (ET_REL), it converts symbol addresses to file offsets so that sample IPs can be matched to the correct symbol. The conversion adjusts each symbol's st_value: sym->st_value -= shdr->sh_addr - shdr->sh_offset; For vmlinux (ET_EXEC), st_value is a virtual address and sh_addr is the section's virtual base, so subtracting sh_addr and adding sh_offset correctly yields a file offset. For kernel modules (ET_REL), st_value is a section-relative offset. The module loader ignores sh_addr entirely and places symbols at module_base + st_value. Converting to file offset requires only adding sh_offset; subtracting sh_addr introduces an error equal to sh_addr bytes. When .text has sh_addr == 0 -- the historical norm for simple modules -- both formulas produce the same result and the bug is latent. As modules gain more metadata sections before .text (.note, .static_call.text, etc.), the linker assigns .text a non-zero sh_addr, exposing the defect. For example, nfsd.ko on this kernel has sh_addr=0xa80, kvm-intel.ko has sh_addr=0x1e90. The effect is that all .text symbols in affected modules shift by sh_addr bytes relative to sample IPs, causing perf report to attribute samples to incorrect, nearby symbols. This was observed as 13% of LLC-load-miss samples misattributed to nfsd_file_get_dio_attrs when the actual hot function was nfsd_cache_lookup, approximately 0xa80 bytes away in the symbol table. Use the existing dso__rel() flag (already set for ET_REL modules) to select the correct adjustment: add sh_offset for ET_REL, subtract (sh_addr - sh_offset) for ET_EXEC/ET_DYN. Fixes: 0131c4ec794a ("perf tools: Make it possible to read object code from kernel modules") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf trace: Skip unnecessary synthesis for summary-only modeNamhyung Kim
It needs to synthesize task info for the comm name. The mmap information is only needed for callchain symbolization which is not used by the summary mode. Also total or cgroup summary mode don't require the task info. Let's skip the processing if possible. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-02perf stat: Fix crash on arm64Breno Leitao
Perf stat is crashing on arm64 hosts with the following issue: # make -C tools/perf DEBUG=1 # perf stat sleep 1 perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed. [1] 1220794 IOT instruction (core dumped) ./perf stat The sorting function introduced by commit a745c0831c15c ("perf stat: Sort default events/metrics") compares events based on their individual properties. This can cause events from different groups to be interleaved, resulting in group members appearing before their leaders in the sorted evlist. When the iterator opens events in list order, a group member may be processed before its leader has been opened. For example, CPU_CYCLES (idx=32) with leader STALL_SLOT_BACKEND (idx=37) could be sorted before its leader, causing the crash when CPU_CYCLES tries to get its group fd from the not-yet-opened leader. Fix this by comparing events based on their leader's attributes instead of their own attributes when the events are in different groups. This ensures all members of a group share the same sort key as their leader, keeping groups together and guaranteeing leaders are opened before their members. Fixes: a745c0831c15c ("perf stat: Sort default events/metrics") Reported-by: Denis Yaroshevskiy <dyaroshev@meta.com> Tested-by: Dmitry Ilvokhin <d@ilvokhin.com> Tested-by: Ian Rogers <irogers@google.com> Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-01perf test: Fix perf stat --bpf-counters on hybrid machinesNamhyung Kim
The test constantly fails on my Intel hybrid machine. The issue was it has two events in the output even if I only gave it one event. $ perf stat -e instructions -- perf test -w sqrtloop Performance counter stats for 'perf test -w sqrtloop': 910,856,421 cpu_atom/instructions/ (28.05%) 14,852,865,997 cpu_core/instructions/ (96.79%) 1.014313341 seconds time elapsed 1.004114000 seconds user 0.008174000 seconds sys Let's modify the awk script to add the values for each line and print the total. The variable 'i' has a number of input lines that have valid output and variable 'c' has the sum of actual counter values. That way it should work on any platforms. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-01perf tests: Write test files to tmpdirIan Rogers
Writing to the test output files in the current working directory can fail in various contexts such as continual test. Other tests write to a mktemp-ed file, make the "perf script task-analyszer tests" follow this convention too. Currently this isn't possible for the perf.data file due to a lack of perf script support, add a variable for when this support is available. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-04-01libperf cpumap: Make index and nr types unsignedIan Rogers
The index into the cpumap array and the number of entries within the array can never be negative, so let's make them unsigned. This is prompted by reports that gcc 13 with -O6 is giving a alloc-size-larger-than errors. The change makes the cpumap changes and then updates the declaration of index variables throughout perf and libperf to be unsigned. The two things are hard to separate as compiler warnings about mixing signed and unsigned types breaks the build. Reported-by: Chingbin Li <liqb365@163.com> Closes: https://lore.kernel.org/lkml/20260212025127.841090-1-liqb365@163.com/ Tested-by: Chingbin Li <liqb365@163.com> Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-31perf beauty: Move copy of fadvise.h from tools/include/ to ↵Arnaldo Carvalho de Melo
tools/perf/trace/beauty/include/ As it is not really used when compiling anything, just being parsed to collect number->string tables for 'perf trace'. $ git grep fadvise.h tools/ tools/perf/Makefile.perf:$(fadvise_advice_array): $(beauty_uapi_linux_dir)/fadvise.h $(fadvise_advice_tbl) tools/perf/check-headers.sh: "include/uapi/linux/fadvise.h" tools/perf/trace/beauty/fadvise.sh:grep -E $regex ${header_dir}/fadvise.h | \ tools/perf/trace/beauty/fadvise.sh:# tools/include/uapi/linux/fadvise.h for details. $ Link: https://lore.kernel.org/r/CAP-5=fVBNQVF8k3JUQjH1nkP69ZVp8BqP+uwygcx=xO0zC4xrg@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-31perf beauty: Move tools/include/uapi/drm to tools/perf/trace/beauty/include/uapiArnaldo Carvalho de Melo
As it is used only to parse ioctl numbers, not to build perf and so far no other tools/ living tool uses it, so to clean up tools/include/ to be used just for building tools, to have access to things available in the kernel and not yet in the system headers, move it to the directory where just the tools/perf/trace/beauty/ scripts can use to generate tables used by perf. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-31perf build: Add -funsigned-char to default CFLAGSIan Rogers
Commit 3bc753c06dd0 ("kbuild: treat char as always unsigned") made chars unsigned by default in the Linux kernel. To avoid similar kinds of bugs and warnings, make unsigned chars the default for the perf tool. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-27perf tools: Add --pmu-filter option for filtering PMUsQinxin Xia
This patch adds a new --pmu-filter option to perf-stat command to allow filtering events on specific PMUs. This is useful when there are multiple PMUs with same type (e.g. hisi_sicl2_cpa0 and hisi_sicl0_cpa0). [root@localhost tmp]# perf stat -M cpa_p0_avg_bw Performance counter stats for 'system wide': 19,417,779,115 hisi_sicl0_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl0_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl0_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl0_cpa0/cpa_p0_rd_dat_32b/ 19,417,751,103 hisi_sicl10_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl10_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl10_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl10_cpa0/cpa_p0_rd_dat_32b/ 19,417,730,679 hisi_sicl2_cpa0/cpa_cycles/ # 0.31 cpa_p0_avg_bw 75,635,749 hisi_sicl2_cpa0/cpa_p0_wr_dat/ 18,520,640 hisi_sicl2_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl2_cpa0/cpa_p0_rd_dat_32b/ 19,417,674,227 hisi_sicl8_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl8_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl8_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl8_cpa0/cpa_p0_rd_dat_32b/ 19.417734480 seconds time elapsed [root@localhost tmp]# perf stat --pmu-filter hisi_sicl2_cpa0 -M cpa_p0_avg_bw Performance counter stats for 'system wide': 6,234,093,559 cpa_cycles # 0.60 cpa_p0_avg_bw 50,548,465 cpa_p0_wr_dat 7,552,182 cpa_p0_rd_dat_64b 0 cpa_p0_rd_dat_32b 6.234139320 seconds time elapsed Signed-off-by: Qinxin Xia <xiaqinxin@huawei.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-26perf report: Add comm_nodigit sort keyStephen Brennan
The "comm" column allows grouping events by the process command. It is intended to group like programs, despite having different PIDs. But some workloads may adjust their own command, so that a unique identifier (e.g. a PID or some other numeric value) is part of the command name. This destroys the utility of "comm", forcing perf to place each unique process name into its own bucket, which can contribute to a combinatorial explosion of memory use in perf report. Create a less strict version of this column, which ignores digits when comparing command names. Commands whose names are the same (ignoring digits) are sorted into the same histogram buckets, and displayed with the placeholder value "<N>" in the place of digits. For example, hypothetical command names "kworker/1" "kworker/2" "kworker/3" would sort into the same bucket and be represented as "kworker/<N>". Committer testing: $ perf report -s comm,comm_nodigit | grep -F "<N>" 0.01% CPU 6/TCG CPU <N>/TCG 0.01% kworker/53:2-mm kworker/<N>:<N>-mm 0.01% migration/24 migration/<N> 0.01% kworker/24:1-ev kworker/<N>:<N>-ev 0.01% llvmpipe-8 llvmpipe-<N> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-26perf stat: Fix opt->value type for parse_cache_levelIan Rogers
Commit f5803651b4a4 ("perf stat: Choose the most disaggregate command line option") changed aggregation option handling for `perf stat` but not `perf stat report` leading to parse_cache_level being passed a struct in the `perf stat` case but erroneously an aggr_mode enum value for `perf stat report`. Change the `perf stat report` aggregation handling to use the same opt_aggr_mode as `perf stat`. Also, just pass the boolean for consistency with other boolean argument handling. Fixes: f5803651b4a4 ("perf stat: Choose the most disaggregate command line option") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-26perf lock: Fix option value type in parse_max_stackIan Rogers
The value is a void* and the address of an int, max_stack_depth, is set up in the perf lock options. The parse_max_stack function treats the int* as a long*, make this more correct by declaring the value to be an int*. Fixes: 0a277b622670 ("perf lock contention: Check --max-stack option") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-26perf record: Add support for arch_sdt_arg_parse_op() on s390Thomas Richter
commit e5e66adfe45a6 ("perf regs: Remove __weak attributive arch_sdt_arg_parse_op() function") removes arch_sdt_arg_parse_op() functions and reveals missing s390 support. The following warning is printed: Unknown ELF machine 22, standard arguments parse will be skipped. ELF machine 22 is the EM_S390 host. This happens with command # ./perf record -v -- stress-ng -t 1s --matrix 0 when the event is not specified. Add s390 specific __perf_sdt_arg_parse_op_s390() function to support -architecture calls to arch_sdt_arg_parse_op() for s390. The warning disappears. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Jan Polensky <japo@linux.ibm.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-25Merge tag 'perf-tools-fixes-for-v7.0-2-2026-03-23' into perf-tools-nextNamhyung Kim
To get the various fixes for v7.0. Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf evlist: Improve default event for s390Ian Rogers
Frame pointer callchains are not supported on s390 and dwarf callchains are only supported on software events. Switch the default event from the hardware 'cycles' event to the software 'cpu-clock' or 'task-clock' on s390 if callchains are enabled. Move some of the target initialization earlier in builtin-top and builtin-record, so it is ready for use by evlist__new_default. If frame pointer callchains are requested on s390 show a warning. Modify the '-g' option of `perf top` and `perf record` to default to dwarf callchains on s390. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf callchain: Refactor callchain option parsingIan Rogers
record_opts__parse_callchain is shared by builtin-record and builtin-trace, it is declared in callchain.h. Move the declaration to callchain.c for consistency with the header. In other cases make the option callback a small static stub that then calls into callchain.c. Make the no argument '-g' callchain option just a short-cut for '--call-graph fp' so that there is consistency in how the arguments are handled. This requires the const char* string to be strdup-ed in __parse_callchain_report_opt. For consistency also make parse_callchain_record use strdup and remove some unnecessary casts. Also, be more explicit about the '-g' behavior if there is a .perfconfig file setting. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf evsel: Constify option arguments to config functionsIan Rogers
The options are used to configure the evsel but are not themselves configured. Make the arguments const to better capture this. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf target: Constify simple check functionsIan Rogers
Allow the target to be const in callers. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf evsel: Improve falling back from cyclesIan Rogers
Switch to using evsel__match rather than comparing perf_event_attr values, this is robust on hybrid architectures. Ensure evsel->pmu matches the evsel->core.attr. Remove exclude bits that get set in other fallback attempts when switching the event. Log the event name with modifiers when switching the event on fallback. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf dwarf-aux: Collect all variable locations for insn trackingZecheng Li
Previously, only the first DWARF location entry was collected for each variable. This was based on the assumption that instruction tracking could reconstruct the remaining state. However, variables may have different locations across different address ranges, and relying solely on instruction tracking can miss valid type information. Change __die_collect_vars_cb() to iterate over all location entries using dwarf_getlocations() in a loop. This ensures that variables with multiple location ranges are properly tracked, improving type coverage. Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf annotate-data: Use DWARF location ranges to preserve reg stateZecheng Li
When a function call occurs, caller-saved registers are typically invalidated since the callee may clobber them. However, DWARF debug info provides location ranges that indicate exactly where a variable is valid in a register. Track the DWARF location range end address in type_state_reg and use it to determine if a caller-saved register should be preserved across a call. If the current call address is within the DWARF-specified lifetime of the variable, keep the register state valid instead of invalidating it. This improves type annotation for code where the compiler knows a register value survives across calls (e.g., when the callee is known not to clobber certain registers or when the value is reloaded after the call at the same logical location). Changes: - Add `end` and `has_range` fields to die_var_type to capture DWARF location range information - Add `lifetime_active` and `lifetime_end` fields to type_state_reg - Check location lifetime before invalidating caller-saved registers Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf annotate-data: Invalidate caller-saved regs for all callsZecheng Li
Previously, the x86 call handler returned early without invalidating caller-saved registers when the call target symbol could not be resolved (func == NULL). This violated the ABI which requires caller-saved registers to be considered clobbered after any call instruction. Fix this by: 1. Always invalidating caller-saved registers for any call instruction (except __fentry__ which preserves registers) 2. Using dl->ops.target.name as fallback when func->name is unavailable, allowing return type lookup for more call targets This is a conservative change that may reduce type coverage for indirect calls (e.g., callq *(%rax)) where we cannot determine the return type but it ensures correctness. Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf annotate-data: Add invalidate_reg_state() helper for x86Zecheng Li
Add a helper function to consistently invalidate register state instead of field assignments. This ensures kind, ok, and copied_from are all properly cleared when a register becomes invalid. The helper sets: - kind = TSR_KIND_INVALID - ok = false - copied_from = -1 Replace all invalidation patterns with calls to this helper. No functional change and this removes some incorrect annotations that were caused by incomplete invalidation (e.g. a obsolete copied_from from an invalidated register). Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf annotate-data: Handle global variable access with const registerZecheng Li
When a register holds a constant value (TSR_KIND_CONST) and is used with a negative offset, treat it as a potential global variable access instead of falling through to CFA (frame) handling. This fixes cases like array indexing with computed offsets: movzbl -0x7d72725a(%rax), %eax # array[%rax] Where %rax contains a computed index and the negative offset points to a global array. Previously this fell through to the CFA path which doesn't handle global variables, resulting in "no type information". The fix redirects such accesses to check_kernel which calls get_global_var_type() to resolve the type from the global variable cache. This is only done for kernel DSOs since the pattern relies on kernel-specific global variable resolution. We could also treat registers with integer types to the global variable path, but this requires more changes. Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Namhyung Kim <namhyung@kernel.org>