path: root/tools/perf
Age | Commit message | Author
2025-12-02 | perf tools: Fallback to initial kernel map properly | Namhyung Kim
In maps__split_kallsyms(), it assumes new kernel map when it finds a symbol without module after any module and the initial kernel map has some symbols. Because it expects modules are out of the kernel map so modules should not have symbols in the kernel map.

For example, the following memory map shows symbols and maps. Any symbols in the module 1 area will go to the module 1. The main kernel map starts at 0xffffffffbc200000. But if any symbol has a module between the symbols in that area, next symbols after 0xffffffffbd008000 will generate new kernel maps like [kernel].1.

    kernel address     |                     |
                       |                     |
    0xffffffffc0000000 |---------------------|
                       |     (symbols)       |
                       |        ...          | <--- [kernel].N
    0xffffffffbc400000 |---------------------|
                       |     (symbols)       |
                       |      module 2       | <--- bad?
    0xffffffffbc380000 |---------------------|
                       |        ...          |
                       |     (symbols)       |
                       | [kernel.kallsyms]   | <--- initial map
    0xffffffffbc200000 |---------------------|
                       |                     |
                       |                     |
    0xffffffffabcde000 |---------------------|
                       |     (symbols)       |
                       |      module 1       |
    0xffffffffabcd0000 |---------------------|

This is very fragile when the module has a symbol that falls into the main kernel map for some reason. My system has a livepatch module with such symbols. And it created a lot of new kernel maps after those symbols.

But the symbol may have broken addresses and the later symbols can still be found in the initial kernel map. Let's check the symbol address in the initial map and use it if found.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf tools: Fix split kallsyms DSO counting | Namhyung Kim
It's counted twice as it's increased after calling maps__insert(). I guess we want to increase it only after it's added properly. Reviewed-by: Ian Rogers <irogers@google.com> Fixes: 2e538c4a1847291cf ("perf tools: Improve kernel/modules symbol lookup") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf tools: Mark split kallsyms DSOs as loaded | Namhyung Kim
The maps__split_kallsyms() will split symbols to module DSOs if it comes from a module. It also handled some unusual kernel symbols after modules by creating new kernel maps like "[kernel].0". But they are pseudo DSOs to have those unexpected symbols. They should not be considered as unloaded kernel DSOs. Otherwise the dso__load() for them will end up calling dso__load_kallsyms() and then maps__split_kallsyms() again and again. Reviewed-by: Ian Rogers <irogers@google.com> Fixes: 2e538c4a1847291cf ("perf tools: Improve kernel/modules symbol lookup") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf tools: Flush remaining samples w/o deferred callchains | Namhyung Kim
It's possible that some kernel samples don't have matching deferred callchain records when the profiling session was ended before the threads came back to userspace. Let's flush the samples before finishing the session. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf tools: Merge deferred user callchains | Namhyung Kim
Save samples with deferred callchains in a separate list and deliver them after merging the user callchains. If users don't want to merge they can set tool->merge_deferred_callchains to false to prevent the behavior.

With previous result, now perf script will show the merged callchains.

    $ perf script
    ...
    pwd    2312   121.163435:     249113 cpu/cycles/P:
            ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
            ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
            ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
            ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
            ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
            ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
            ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
                7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
                7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
                7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
    ...

The old output can be obtained with the --no-merge-callchain option. Also perf report can get the user callchain entry at the end.

    $ perf report --no-children --stdio -q -S __build_id_parse.isra.0
    # symbol: __build_id_parse.isra.0
         8.40%  pwd      [kernel.kallsyms]
                |
                ---__build_id_parse.isra.0
                   perf_event_mmap
                   mprotect_fixup
                   do_mprotect_pkey
                   __x64_sys_mprotect
                   do_syscall_64
                   entry_SYSCALL_64_after_hwframe
                   mprotect
                   _dl_sysdep_start
                   _dl_start_user

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
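The merge step described above can be sketched as follows. This is only an illustration in Python, not perf's C implementation: the `Sample` and `Deferred` classes and the `merge_deferred()` helper are hypothetical names, and delivery order is simplified. Samples whose user callchain is deferred wait in a pending list until a deferred record with the same thread and cookie arrives; anything still pending at the end is flushed unmerged, which is the case the "Flush remaining samples" commit covers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    tid: int
    frames: List[int]             # kernel frames seen at sample time
    cookie: Optional[int] = None  # None: the callchain is already complete


@dataclass
class Deferred:
    tid: int
    cookie: int
    frames: List[int]             # user frames captured on return to userspace


def merge_deferred(records):
    """Return samples with user frames appended where a deferred record matches."""
    pending = []  # samples still waiting for their user callchain
    out = []
    for rec in records:
        if isinstance(rec, Sample):
            (pending if rec.cookie is not None else out).append(rec)
            continue
        still_waiting = []
        for s in pending:
            if s.tid == rec.tid and s.cookie == rec.cookie:
                out.append(Sample(s.tid, s.frames + rec.frames))
            else:
                still_waiting.append(s)
        pending = still_waiting
    out.extend(pending)  # flush samples that never got a deferred record
    return out
```

With the thread from the example output, a sample carrying cookie 0xb00000006 would be delivered only once the matching deferred record has been seen, with the three user frames appended after the kernel frames.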
2025-12-02 | perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED | Namhyung Kim
Handle the deferred callchains in the script output.

    $ perf script
    ...
    pwd    2312   121.163435:     249113 cpu/cycles/P:
            ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
            ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
            ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
            ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
            ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
            ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
            ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
                   b00000006 (cookie) ([unknown])

    pwd    2312   121.163447: DEFERRED CALLCHAIN [cookie: b00000006]
                7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
                7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
                7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf record: Add --call-graph fp,defer option for deferred callchains | Namhyung Kim
Add a new callchain record mode option for deferred callchains. For now it only works with FP (frame-pointer) mode.

And add the missing feature detection logic to clear the flag on old kernels.

    $ perf record --call-graph fp,defer -vv true
    ...
    ------------------------------------------------------------
    perf_event_attr:
      type                             0 (PERF_TYPE_HARDWARE)
      size                             136
      config                           0 (PERF_COUNT_HW_CPU_CYCLES)
      { sample_period, sample_freq }   4000
      sample_type                      IP|TID|TIME|CALLCHAIN|PERIOD
      read_format                      ID|LOST
      disabled                         1
      inherit                          1
      mmap                             1
      comm                             1
      freq                             1
      enable_on_exec                   1
      task                             1
      sample_id_all                    1
      mmap2                            1
      comm_exec                        1
      ksymbol                          1
      bpf_event                        1
      defer_callchain                  1
      defer_output                     1
    ------------------------------------------------------------
    sys_perf_event_open: pid 162755  cpu 0  group_fd -1  flags 0x8
    sys_perf_event_open failed, error -22
    switching off deferred callchain support

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
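The feature-detection fallback seen in the log (open fails with error -22, i.e. EINVAL, then deferred callchain support is switched off) follows a common try-then-retry pattern. The sketch below shows that pattern only; the dictionary-based attribute and the `open_with_defer_fallback()` and `old_kernel_open()` helpers are hypothetical and stand in for perf's C code and the real syscall.

```python
import errno


def open_with_defer_fallback(attr, open_event):
    """Open an event; on EINVAL retry once with the deferred-callchain bits cleared.

    Returns (fd, attr_actually_used).
    """
    try:
        return open_event(attr), attr
    except OSError as err:
        # Only fall back when the failure could be caused by the new bits.
        if err.errno != errno.EINVAL or not attr.get("defer_callchain"):
            raise
    fallback = dict(attr, defer_callchain=0, defer_output=0)
    return open_event(fallback), fallback


def old_kernel_open(attr):
    # Hypothetical stand-in for a kernel that rejects the new attribute bits.
    if attr.get("defer_callchain") or attr.get("defer_output"):
        raise OSError(errno.EINVAL, "Invalid argument")
    return 3  # pretend file descriptor
```

Returning the attribute that was actually used lets the caller remember that the feature is unavailable and skip the failing attempt for later events.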
2025-12-02 | perf tools: Minimal DEFERRED_CALLCHAIN support | Namhyung Kim
Add a new event type for deferred callchains and a new callback for the struct perf_tool. For now it doesn't actually handle the deferred callchains but it just marks the sample if it has the PERF_CONTEXT_USER_DEFERRED in the callchain array.

At least, perf report can dump the raw data with this change. Actually this requires the next commit to enable attr.defer_callchain, but if you already have a data file, it'll show the following result.

    $ perf report -D
    ...
    0x2158@perf.data [0x40]: event: 22
    .
    . ... raw event: size 64 bytes
    .  0000:  16 00 00 00 02 00 40 00 06 00 00 00 0b 00 00 00  ......@.........
    .  0010:  03 00 00 00 00 00 00 00 a7 7f 33 fe 18 7f 00 00  ..........3.....
    .  0020:  0f 0e 33 fe 18 7f 00 00 48 14 33 fe 18 7f 00 00  ..3.....H.3.....
    .  0030:  08 09 00 00 08 09 00 00 e6 7a e7 35 1c 00 00 00  .........z.5....

    121163447014 0x2158 [0x40]: PERF_RECORD_CALLCHAIN_DEFERRED(IP, 0x2): 2312/2312: 0xb00000006
    ... FP chain: nr:3
    .....  0: 00007f18fe337fa7
    .....  1: 00007f18fe330e0f
    .....  2: 00007f18fe331448
    : unhandled!

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf jevents: Skip optional metrics in metric group list | Ian Rogers
For metric groups, skip metrics in the list that are None. This allows functions to better optionally return metrics. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
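A minimal sketch of the idea, assuming metrics are represented by plain strings: the `llc_miss_metric()` and `metric_group()` helpers are hypothetical and are not the jevents `Metric`/`MetricGroup` API. A helper that cannot build its metric on a given model returns None, and the group constructor drops such entries instead of failing.

```python
from typing import List, Optional


def llc_miss_metric(has_llc_events: bool) -> Optional[str]:
    """Return a metric name, or None when the model lacks the needed events."""
    return "llc_miss_rate" if has_llc_events else None


def metric_group(name: str, metrics: List[Optional[str]]) -> dict:
    """Build a group, silently skipping metrics that are None."""
    return {"name": name, "metrics": [m for m in metrics if m is not None]}


group = metric_group("cache", [llc_miss_metric(False), "l1_miss_rate"])
```

Callers can then list every candidate metric unconditionally and let each helper decide whether it applies.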
2025-12-02 | perf jevents: Drop duplicate pending metrics | Ian Rogers
Drop adding a pending metric if there is an existing one. Ensure the PMUs differ for hybrid systems. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf jevents: Move json encoding to its own functions | Ian Rogers
Have dedicated encode functions rather than having them embedded in MetricGroup. This is to provide some uniformity in the Metric ToXXX routines. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf jevents: Add threshold expressions to Metric | Ian Rogers
Allow threshold expressions for metrics to be generated. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf jevents: Term list fix in event parsing | Ian Rogers
Fix events seemingly broken apart at a comma. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf jevents: Support parsing negative exponents | Ian Rogers
Support negative exponents when parsing from a json metric string by making the numbers after the 'e' optional in the 'Event' insertion fix up. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf jevents: Allow metric groups not to be named | Ian Rogers
It can be convenient to have unnamed metric groups for the sake of organizing other metrics and metric groups. An unspecified name shouldn't contribute to the MetricGroup json value, so don't record it. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf jevents: Add descriptions to metricgroup abstraction | Ian Rogers
Add a function to recursively generate metric group descriptions. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf jevents: Update metric constraint support | Ian Rogers
Previous metric constraints were binary, either none or don't group when the NMI watchdog is present. Update to match the definitions in 'enum metric_event_groups' in pmu-events.h. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf jevents: Allow multiple metricgroups.json files | Ian Rogers
Allow multiple metricgroups.json files by handling any file ending with metricgroups.json as a metricgroups file. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf ilist: Be tolerant of reading a metric on the wrong CPU | Ian Rogers
This happens on hybrid machine metrics. Be tolerant and don't cause the ilist application to crash with an exception. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf python: Correct copying of metric_leader in an evsel | Ian Rogers
Ensure the metric_leader is copied and set up correctly. In compute_metric determine the correct metric_leader event to match the requested CPU. Fixes the handling of metrics particularly on hybrid machines. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf test: Add python JIT dump test | Namhyung Kim
Add a test case for the python interpreter like below so that we can make sure it won't break again. To validate the effect of build-ID generation, it adds and removes the JIT'ed DSOs to/from the build-ID cache for the test.

    $ perf test -vv jitdump
     84: python profiling with jitdump:
    --- start ---
    test child forked, pid 214316
    Run python with -Xperf_jit
    [ perf record: Woken up 5 times to write data ]
    [ perf record: Captured and wrote 1.180 MB /tmp/__perf_test.perf.data.XbqZNm (140 samples) ]
    Generate JIT-ed DSOs using perf inject
    Add JIT-ed DSOs to the build-ID cache
    Check the symbol containing the script name
    Found 108 matching lines
    Remove JIT-ed DSOs from the build-ID cache
    ---- end(0) ----
     84: python profiling with jitdump : Ok

Cc: Pablo Galindo <pablogsal@gmail.com>
Link: https://docs.python.org/3/howto/perf_profiling.html#how-to-work-without-frame-pointers
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf jitdump: Add sym/str-tables to build-ID generation | Namhyung Kim
It was reported that python backtrace with JIT dump was broken after the change to the built-in SHA-1 implementation.

It seems python generates the same JIT code for each function. They will become separate DSOs but the contents are the same. The only difference is in the symbol name.

But this caused a problem: every JIT'ed DSO will have the same build-ID, which confuses perf. And it resulted in no python symbols (from JIT) in the output.

Looking back at the original code before the conversion, it used the load_addr as well as the code section to distinguish each DSO. But it'd be better to use the contents of symtab and strtab instead as it aligns with some linker behaviors.

This patch adds a buffer to save all the contents in a single place for SHA-1 calculation. Probably we need to add sha1_update() or similar to update the existing hash value with different contents and use it here. But it's out of scope for this change and I'd like something that can be backported to the stable trees easily.

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Pablo Galindo <pablogsal@gmail.com>
Cc: Fangrui Song <maskray@sourceware.org>
Link: https://github.com/python/cpython/issues/139544
Fixes: e3f612c1d8f3945b ("perf genelf: Remove libcrypto dependency and use built-in sha1()")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
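The collision and the fix can be illustrated with Python's hashlib. This is a sketch of the idea only, not the genelf code: the function names and the byte strings are made up, and real symtab/strtab sections have a binary ELF layout. Hashing only the code gives two JIT'ed DSOs with identical code the same ID; concatenating code, symtab and strtab into one buffer before hashing, as the commit describes, makes the differing symbol name change the result.

```python
import hashlib


def build_id_code_only(code: bytes) -> str:
    """SHA-1 over the code section alone (collides for identical code)."""
    return hashlib.sha1(code).hexdigest()


def build_id_with_tables(code: bytes, symtab: bytes, strtab: bytes) -> str:
    """SHA-1 over one buffer holding code, symtab and strtab back to back."""
    buf = bytearray()
    buf += code
    buf += symtab
    buf += strtab
    return hashlib.sha1(bytes(buf)).hexdigest()


# Two JIT'ed functions with identical (made-up) code but different names.
code = b"\x55\x48\x89\xe5\xc3"
symtab = b"\x00" * 24            # placeholder symbol table bytes
strtab_a = b"\x00py::foo:/tmp/a.py\x00"
strtab_b = b"\x00py::bar:/tmp/a.py\x00"

id_a = build_id_with_tables(code, symtab, strtab_a)
id_b = build_id_with_tables(code, symtab, strtab_b)
```

hashlib objects also offer an incremental `update()`; the commit notes that perf's built-in SHA-1 has no such interface yet, which is why a single buffer is used there.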
2025-12-02 | perf test: Fix hybrid testing of event fallback test | Ian Rogers
The mem-loads-aux event exists on hybrid systems but the "cpu" PMU does not. This causes an event parsing error which erroneously makes the test look like it is failing. Avoid naming the PMU to avoid this. Rather than cleaning up perf.data in the directory the test is run, explicitly send the 'perf record' output to /dev/null and avoid any cleanup scripts. Fixes: fc9c17b22352 ("perf test: Add a perf event fallback test") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-02 | perf tools: Remove a trailing newline in the event terms | Namhyung Kim
So that it can show the correct encoding info in the JSON output.

    $ perf list -j hw
    [
    {
            "Unit": "cpu",
            "Topic": "legacy hardware",
            "EventName": "branch-instructions",
            "EventType": "Kernel PMU event",
            "BriefDescription": "Retired branch instructions [This event is an alias of branches]",
            "Encoding": "cpu/event=0xc4/"
    },
    ...

Reviewed-by: Ian Rogers <irogers@google.com>
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-12-01 | Merge tag 'objtool-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull objtool updates from Ingo Molnar:

 - klp-build livepatch module generation (Josh Poimboeuf)

   Introduce new objtool features and a klp-build script to generate livepatch modules using a source .patch as input.

   This builds on concepts from the longstanding out-of-tree kpatch project which began in 2012 and has been used for many years to generate livepatch modules for production kernels. However, this is a complete rewrite which incorporates hard-earned lessons from 12+ years of maintaining kpatch.

   Key improvements compared to kpatch-build:

    - Integrated with objtool: Leverages objtool's existing control-flow graph analysis to help detect changed functions.

    - Works on vmlinux.o: Supports late-linked objects, making it compatible with LTO, IBT, and similar.

    - Simplified code base: ~3k fewer lines of code.

    - Upstream: No more out-of-tree #ifdef hacks, far less cruft.

    - Cleaner internals: Vastly simplified logic for symbol/section/reloc inclusion and special section extraction.

    - Robust __LINE__ macro handling: Avoids false positive binary diffs caused by the __LINE__ macro by introducing a fix-patch-lines script which injects #line directives into the source .patch to preserve the original line numbers at compile time.

 - Disassemble code with libopcodes instead of running objdump (Alexandre Chartre)

 - Disassemble support (-d option to objtool) by Alexandre Chartre, which supports the decoding of various Linux kernel code generation specials such as alternatives:

      17ef:  sched_balance_find_dst_group+0x62f    mov    0x34(%r9),%edx
      17f3:  sched_balance_find_dst_group+0x633  | <alternative.17f3>             | X86_FEATURE_POPCNT
      17f3:  sched_balance_find_dst_group+0x633  | call   0x17f8 <__sw_hweight64> | popcnt %rdi,%rax
      17f8:  sched_balance_find_dst_group+0x638    cmp    %eax,%edx
      ...

   jump table alternatives:

      1895:  sched_use_asym_prio+0x5     test   $0x8,%ch
      1898:  sched_use_asym_prio+0x8     je     0x18a9 <sched_use_asym_prio+0x19>
      189a:  sched_use_asym_prio+0xa   | <jump_table.189a>                        | JUMP
      189a:  sched_use_asym_prio+0xa   | jmp    0x18ae <sched_use_asym_prio+0x1e> | nop2
      189c:  sched_use_asym_prio+0xc     mov    $0x1,%eax
      18a1:  sched_use_asym_prio+0x11    and    $0x80,%ecx
      ...

   exception table alternatives:

    native_read_msr:
      5b80:  native_read_msr+0x0    mov    %edi,%ecx
      5b82:  native_read_msr+0x2  | <ex_table.5b82> | EXCEPTION
      5b82:  native_read_msr+0x2  | rdmsr           | resume at 0x5b84 <native_read_msr+0x4>
      5b84:  native_read_msr+0x4    shl    $0x20,%rdx
      ....

   x86 feature flag decoding (also see the X86_FEATURE_POPCNT example in sched_balance_find_dst_group() above):

      2faaf:  start_thread_common.constprop.0+0x1f    jne    0x2fba4 <start_thread_common.constprop.0+0x114>
      2fab5:  start_thread_common.constprop.0+0x25  | <alternative.2fab5>                  | X86_FEATURE_ALWAYS                                  | X86_BUG_NULL_SEG
      2fab5:  start_thread_common.constprop.0+0x25  | jmp    0x2faba <.altinstr_aux+0x2f4> | jmp    0x4b0 <start_thread_common.constprop.0+0x3f> | nop5
      2faba:  start_thread_common.constprop.0+0x2a    mov    $0x2b,%eax
      ...

   NOP sequence shortening:

      1048e2:  snapshot_write_finalize+0xc2    je     0x104917 <snapshot_write_finalize+0xf7>
      1048e4:  snapshot_write_finalize+0xc4    nop6
      1048ea:  snapshot_write_finalize+0xca    nop11
      1048f5:  snapshot_write_finalize+0xd5    nop11
      104900:  snapshot_write_finalize+0xe0    mov    %rax,%rcx
      104903:  snapshot_write_finalize+0xe3    mov    0x10(%rdx),%rax
      ...

   and much more.

 - Function validation tracing support (Alexandre Chartre)

 - Various -ffunction-sections fixes (Josh Poimboeuf)

 - Clang AutoFDO (Automated Feedback-Directed Optimizations) support (Josh Poimboeuf)

 - Misc fixes and cleanups (Borislav Petkov, Chen Ni, Dylan Hatch, Ingo Molnar, John Wang, Josh Poimboeuf, Pankaj Raghav, Peter Zijlstra, Thorsten Blum)

* tag 'objtool-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
  objtool: Fix segfault on unknown alternatives
  objtool: Build with disassembly can fail when including bdf.h
  objtool: Trim trailing NOPs in alternative
  objtool: Add wide output for disassembly
  objtool: Compact output for alternatives with one instruction
  objtool: Improve naming of group alternatives
  objtool: Add Function to get the name of a CPU feature
  objtool: Provide access to feature and flags of group alternatives
  objtool: Fix address references in alternatives
  objtool: Disassemble jump table alternatives
  objtool: Disassemble exception table alternatives
  objtool: Print addresses with alternative instructions
  objtool: Disassemble group alternatives
  objtool: Print headers for alternatives
  objtool: Preserve alternatives order
  objtool: Add the --disas=<function-pattern> action
  objtool: Do not validate IBT for .return_sites and .call_sites
  objtool: Improve tracing of alternative instructions
  objtool: Add functions to better name alternatives
  objtool: Identify the different types of alternatives
  ...
2025-11-29 | perf trace: Skip internal syscall arguments | Namhyung Kim
Recent changes in the linux-next kernel add a new field to the syscall tracepoints that holds contents from userspace, like below.

    # cat /sys/kernel/tracing/events/syscalls/sys_enter_write/format
    name: sys_enter_write
    ID: 758
    format:
            field:unsigned short common_type;        offset:0;   size:2;  signed:0;
            field:unsigned char common_flags;        offset:2;   size:1;  signed:0;
            field:unsigned char common_preempt_count;  offset:3;  size:1;  signed:0;
            field:int common_pid;                    offset:4;   size:4;  signed:1;

            field:int __syscall_nr;                  offset:8;   size:4;  signed:1;
            field:unsigned int fd;                   offset:16;  size:8;  signed:0;
            field:const char * buf;                  offset:24;  size:8;  signed:0;
            field:size_t count;                      offset:32;  size:8;  signed:0;
            field:__data_loc char[] __buf_val;       offset:40;  size:4;  signed:0;

    print fmt: "fd: 0x%08lx, buf: 0x%08lx (%s), count: 0x%08lx", ((unsigned long)(REC->fd)), ((unsigned long)(REC->buf)), __print_dynamic_array(__buf_val, 1), ((unsigned long)(REC->count))

We have a different way to handle those arguments, so this change confuses perf trace and makes some tests fail. Fix it by skipping the new fields that have "__data_loc char[]" type.

Maybe we can switch to this instead of the BPF augmentation later.

Reviewed-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Howard Chu <howardchu95@gmail.com>
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
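The filtering rule can be sketched by parsing a tracepoint format description and dropping fields whose type is "__data_loc char[]". This Python illustration is not perf trace's C code: the `syscall_args()` helper and its regular expression are hypothetical, and skipping the `common_*` fields and `__syscall_nr` is an assumption made here to leave only the syscall arguments.

```python
import re

# One "field:<type> <name>; offset:N; size:N; signed:N;" entry per match.
FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);"
    r"\s*size:(?P<size>\d+);\s*signed:(?P<signed>\d+);"
)


def syscall_args(format_text: str) -> list:
    """Return syscall argument names, skipping internal and __data_loc fields."""
    args = []
    for m in FIELD_RE.finditer(format_text):
        ctype, _, name = m.group("decl").strip().rpartition(" ")
        if name.startswith("common_") or name == "__syscall_nr":
            continue  # tracepoint bookkeeping, not a syscall argument
        if ctype.startswith("__data_loc char[]"):
            continue  # kernel-provided copy of user data, not an argument
        args.append(name)
    return args


SYS_ENTER_WRITE = """
field:unsigned short common_type; offset:0; size:2; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int __syscall_nr; offset:8; size:4; signed:1;
field:unsigned int fd; offset:16; size:8; signed:0;
field:const char * buf; offset:24; size:8; signed:0;
field:size_t count; offset:32; size:8; signed:0;
field:__data_loc char[] __buf_val; offset:40; size:4; signed:0;
"""
```

For the `sys_enter_write` format shown in the commit message this leaves `fd`, `buf` and `count`, which matches the three arguments of write(2).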
2025-11-26 | perf tools: Don't read build-ids from non-regular files | James Clark
Simplify the build ID reading code by removing the non-blocking option. Having to pass the correct option to this function was fragile and a mistake would result in a hang, see the linked fix. Furthermore, compressed files are always opened blocking anyway, ignoring the non-blocking option. We also don't expect to read build IDs from non-regular files. The only hits to this function that are non-regular are devices that won't be elf files with build IDs, for example "/dev/dri/renderD129". Now instead of opening these as non-blocking and failing to read, we skip them. Even if something like a pipe or character device did have a build ID, I don't think it would have worked because you need to call read() in a loop, check for -EAGAIN and handle timeouts to make non-blocking reads work. Link: https://lore.kernel.org/linux-perf-users/20251022-james-perf-fix-dso-block-v1-1-c4faab150546@linaro.org/ Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
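The "skip anything that is not a regular file" check can be sketched with a stat() call made before the file is opened, so a device node or pipe is never read at all. The `is_regular_file()` helper below is a hypothetical Python illustration, not the perf implementation, which is written in C.

```python
import os
import stat


def is_regular_file(path: str) -> bool:
    """True only for regular files; devices, pipes, directories and
    missing paths are all rejected without opening them."""
    try:
        st = os.stat(path)  # follows symlinks, like opening the file would
    except OSError:
        return False
    return stat.S_ISREG(st.st_mode)
```

A caller would test the path first and return "no build ID" when the check fails, which avoids both the blocking read and the need for a non-blocking open mode.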
2025-11-26 | perf vendor events riscv: add T-HEAD C920V2 JSON support | Inochi Amaoto
T-HEAD C920 has a V2 iteration, which supports Sscompmf. The V2 iteration supports the same perf events as V1. Reuse T-HEAD c900-legacy JSON file for T-HEAD C920V2. Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Acked-by: Paul Walmsley <pjw@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-26 | perf pmu: fix duplicate conditional statement | Anubhav Shelat
Remove duplicate check for PERF_PMU_TYPE_DRM_END in perf_pmu__kind. Fixes: f0feb21e0a10 ("perf pmu: Add PMU kind to simplify differentiating") Signed-off-by: Anubhav Shelat <ashelat@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Closes: https://lore.kernel.org/linux-perf-users/CA+G8Dh+wLx+FvjjoEkypqvXhbzWEQVpykovzrsHi2_eQjHkzQA@mail.gmail.com/ Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-24 | perf docs: arm-spe: Document new SPE filtering features | James Clark
FEAT_SPE_EFT and FEAT_SPE_FDS etc have new user facing format attributes so document them. Also document existing 'event_filter' bits that were missing from the doc and the fact that latency values are stored in the weight field. Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-24 | perf tools: Add support for perf_event_attr::config4 | James Clark
perf_event_attr has gained a new field, config4, so add support for it extending the existing configN support. Reviewed-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-20 | perf: replace strcpy() with strncpy() in util/jitdump.c | Hrishikesh Suresh
Usage of strcpy() can lead to buffer overflows. Therefore, it has been replaced with strncpy(). The output file path is provided as a parameter and might be restricted by command-line by default. But this defensive patch will prevent any potential overflow, making the code more robust against future changes in input handling. Testing: - ran perf test from tools/perf and did not observe any regression with the earlier code Signed-off-by: Hrishikesh Suresh <hrishikesh123s@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-20 | perf list: Support filtering in JSON output | Namhyung Kim
Like regular output mode, it should honor command line arguments to limit to a certain type of PMUs or events.

    $ perf list -j hw
    [
    {
            "Unit": "cpu",
            "Topic": "legacy hardware",
            "EventName": "branch-instructions",
            "EventType": "Kernel PMU event",
            "BriefDescription": "Retired branch instructions [This event is an alias of branches]",
            "Encoding": "cpu/event=0xc4\n/"
    },
    {
            "Unit": "cpu",
            "Topic": "legacy hardware",
            "EventName": "branch-misses",
            "EventType": "Kernel PMU event",
            "BriefDescription": "Mispredicted branch instructions",
            "Encoding": "cpu/event=0xc5\n/"
    },
    ...

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-20 | perf list: Share print state with JSON output | Namhyung Kim
The JSON print state has only one different field (need_sep). Let's add the default print state to the json state and use it. Then we can use the 'ps' variable to update the state properly. This is a preparation for the next commit. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-20 | perf list: Print matching PMU events for --unit | Namhyung Kim
When --unit option is used, pmu_glob is set to the argument. It should match with event PMU and display the matching ones only. But it also shows raw events and metrics after that.

    $ perf list --unit tool

    List of pre-defined events (to be used in -e or -M):

    tool:
      core_wide
           [1 if not SMT,if SMT are events being gathered on all SMT threads 1 otherwise 0. Unit: tool]
      duration_time
           [Wall clock interval time in nanoseconds. Unit: tool]
      has_pmem
           [1 if persistent memory installed otherwise 0. Unit: tool]
      num_cores
           [Number of cores. A core consists of 1 or more thread,with each thread being associated with a logical Linux CPU. Unit: tool]
      num_cpus
           [Number of logical Linux CPUs. There may be multiple such CPUs on a core. Unit: tool]
      ...
      rNNN                                                [Raw event descriptor]
      cpu/event=0..255,pc,edge,.../modifier               [Raw event descriptor]
           [(see 'man perf-list' or 'man perf-record' on how to encode it)]
      breakpoint//modifier                                [Raw event descriptor]
      cstate_core/event=0..0xffffffffffffffff/modifier    [Raw event descriptor]
      cstate_pkg/event=0..0xffffffffffffffff/modifier     [Raw event descriptor]
      drm_i915//modifier                                  [Raw event descriptor]
      hwmon_acpitz//modifier                              [Raw event descriptor]
      hwmon_ac//modifier                                  [Raw event descriptor]
      hwmon_bat0//modifier                                [Raw event descriptor]
      hwmon_coretemp//modifier                            [Raw event descriptor]
      ...

    Metric Groups:

    Backend: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
      tma_core_bound
           [This metric represents fraction of slots where Core non-memory issues were of a bottleneck]
      tma_info_core_ilp
           [Instruction-Level-Parallelism (average number of uops executed when there is execution) per thread (logical-processor)]
      tma_info_memory_l2mpki
           [L2 cache true misses per kilo instruction for retired demand loads]
      ...

This change makes it print the tool PMU events only.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-20 | perf test all metrics: Fully ignore Default metric failures | Ian Rogers
Determine if a metric is default from `perf list --raw-dump $m` eg:
```
$ perf list --raw-dump l1_prefetch_miss_rate
Default4 l1_prefetch_miss_rate
```
If a metric has "not supported" or "no supported events" then ignore these failures for default metrics. Tidy up the skip/fail messages in the output to make them easier to spot/read.
```
$ perf list -vv "all metrics"
...
Testing llc_miss_rate
[Ignored llc_miss_rate] failed but as a Default metric this can be expected
Error: No supported events found.
  The LLC-loads event is not supported.
...
```

Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Closes: https://lore.kernel.org/linux-perf-users/20251119104751.51960-1-tmricht@linux.ibm.com/
Reported-by: Namhyung Kim <namhyung@kernel.org>
Reported-by: James Clark <james.clark@linaro.org>
Closes: https://lore.kernel.org/lkml/aRi9xnwdLh3Dir9f@google.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-20 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.18-rc7). No conflicts, adjacent changes: tools/testing/selftests/net/af_unix/Makefile e1bb28bf13f4 ("selftest: af_unix: Add test for SO_PEEK_OFF.") 45a1cd8346ca ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19 | perf evsel: Skip store_evsel_ids for non-perf-event PMUs | Ian Rogers
The IDs are associated with perf events and not applicable to non-perf event PMUs. The failure to generate the ids was causing perf stat record to fail.
```
$ perf stat record -a sleep 1

 Performance counter stats for 'system wide':

         47,941       context-switches          #  nan cs/sec  cs_per_second
           0.00 msec  cpu-clock                 #  0.0 CPUs  CPUs_utilized
          3,261       cpu-migrations            #  nan migrations/sec  migrations_per_second
            516       page-faults               #  nan faults/sec  page_faults_per_second
      7,525,483       cpu_core/branch-misses/   #  2.3 %  branch_miss_rate
    322,069,004       cpu_core/branches/        #  nan M/sec  branch_frequency
  1,895,684,291       cpu_core/cpu-cycles/      #  nan GHz  cycles_frequency
  2,789,777,426       cpu_core/instructions/    #  1.5 instructions  insn_per_cycle
      7,074,765       cpu_atom/branch-misses/   #  3.2 %  branch_miss_rate         (49.89%)
    224,225,412       cpu_atom/branches/        #  nan M/sec  branch_frequency     (50.29%)
  2,061,679,981       cpu_atom/cpu-cycles/      #  nan GHz  cycles_frequency       (50.33%)
  2,011,242,533       cpu_atom/instructions/    #  1.0 instructions  insn_per_cycle  (50.33%)
                      TopdownL1 (cpu_core)      #  9.0 %  tma_bad_speculation
                                                # 28.3 %  tma_frontend_bound
                                                # 35.2 %  tma_backend_bound
                                                # 27.5 %  tma_retiring
                      TopdownL1 (cpu_atom)      # 36.8 %  tma_backend_bound        (59.65%)
                                                # 22.8 %  tma_frontend_bound       (59.60%)
                                                # 11.6 %  tma_bad_speculation
                                                # 28.8 %  tma_retiring             (59.59%)

    1.006777519 seconds time elapsed

$ perf stat report

 Performance counter stats for 'perf':

  1,013,376,154       duration_time
  <not counted>       duration_time
  <not counted>       duration_time
  <not counted>       duration_time
  <not counted>       duration_time
  <not counted>       duration_time
         47,941       context-switches
           0.00 msec  cpu-clock
          3,261       cpu-migrations
            516       page-faults
      7,525,483       cpu_core/branch-misses/
    322,069,814       cpu_core/branches/
    322,069,004       cpu_core/branches/
  1,895,684,291       cpu_core/cpu-cycles/
  1,895,679,209       cpu_core/cpu-cycles/
  2,789,777,426       cpu_core/instructions/
  <not counted>       cpu_core/cpu-cycles/
  <not counted>       cpu_core/stalled-cycles-frontend/
  <not counted>       cpu_core/cpu-cycles/
  <not counted>       cpu_core/stalled-cycles-backend/
  <not counted>       cpu_core/stalled-cycles-backend/
  <not counted>       cpu_core/instructions/
  <not counted>       cpu_core/stalled-cycles-frontend/
      7,074,765       cpu_atom/branch-misses/           (49.89%)
    221,679,088       cpu_atom/branches/                (49.89%)
    224,225,412       cpu_atom/branches/                (50.29%)
  2,061,679,981       cpu_atom/cpu-cycles/              (50.33%)
  2,016,259,567       cpu_atom/cpu-cycles/              (50.33%)
  2,011,242,533       cpu_atom/instructions/            (50.33%)
  <not counted>       cpu_atom/cpu-cycles/
  <not counted>       cpu_atom/stalled-cycles-frontend/
  <not counted>       cpu_atom/cpu-cycles/
  <not counted>       cpu_atom/stalled-cycles-backend/
  <not counted>       cpu_atom/stalled-cycles-backend/
  <not counted>       cpu_atom/instructions/
  <not counted>       cpu_atom/stalled-cycles-frontend/
     17,145,113       cpu_core/INT_MISC.UOP_DROPPING/
 10,594,226,100       cpu_core/TOPDOWN.SLOTS/
  2,919,021,401       cpu_core/topdown-retiring/
    943,101,838       cpu_core/topdown-bad-spec/
  3,031,152,533       cpu_core/topdown-fe-bound/
  3,739,756,791       cpu_core/topdown-be-bound/
  1,909,501,648       cpu_atom/CPU_CLK_UNHALTED.CORE/   (60.04%)
  3,516,608,359       cpu_atom/TOPDOWN_BE_BOUND.ALL/    (59.65%)
  2,179,403,876       cpu_atom/TOPDOWN_FE_BOUND.ALL/    (59.60%)
  2,745,732,458       cpu_atom/TOPDOWN_RETIRING.ALL/    (59.59%)

    1.006777519 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
	echo 0 > /proc/sys/kernel/nmi_watchdog
	perf stat ...
	echo 1 > /proc/sys/kernel/nmi_watchdog
```

Reported-by: James Clark <james.clark@linaro.org>
Closes: https://lore.kernel.org/lkml/ca0f0cd3-7335-48f9-8737-2f70a75b019a@linaro.org/
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-19perf pmu: Add PMU kind to simplify differentiatingIan Rogers
Rather than perf_pmu__is_xxx calls, add a notion of kind so that a single call can be used. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
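As a rough illustration of the idea (not the actual perf code), a single kind accessor can stand in for a chain of per-type predicates. The enum values, struct layout, and function names below are hypothetical and chosen only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PMU kinds; the real enum in perf may differ. */
enum pmu_kind {
	PMU_KIND_PERF_EVENT,	/* events opened through the perf event interface */
	PMU_KIND_TOOL,		/* tool-computed events such as duration_time */
	PMU_KIND_HWMON,		/* values read outside of perf events */
};

/* Hypothetical, trimmed-down PMU descriptor. */
struct pmu {
	const char *name;
	enum pmu_kind kind;
};

/* One accessor instead of several is_xxx() style predicates. */
static enum pmu_kind pmu_kind(const struct pmu *pmu)
{
	assert(pmu != NULL);
	return pmu->kind;
}

/* Example consumer: only perf-event PMUs have event IDs worth storing. */
static int pmu_has_event_ids(const struct pmu *pmu)
{
	switch (pmu_kind(pmu)) {
	case PMU_KIND_PERF_EVENT:
		return 1;
	case PMU_KIND_TOOL:
	case PMU_KIND_HWMON:
		return 0;
	}
	return 0;
}
```

A caller can then branch once on the kind, and a switch without a default case lets the compiler warn when a new kind is added but not handled.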
2025-11-19perf header: Switch "cpu" for find_core_pmu in caps feature writingIan Rogers
Writing currently fails on non-x86 and hybrid CPUs. Switch to the more regular find_core_pmu that is normally used in this case. Tested on a hybrid Alder Lake system. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-19perf test maps: Additional maps__fixup_overlap_and_insert testsIan Rogers
Add additional tests to the maps test suite covering maps__fixup_overlap_and_insert. Change the test suite to hold more than one test. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-19perf maps: Avoid RC_CHK use after freeIan Rogers
The case of __maps__fixup_overlap_and_insert where the "new" map covers existing mappings can create a use-after-free with reference count checking enabled. The issue is that "pos" holds a map pointer from maps_by_address that is put from maps_by_address but then used to look for a map in maps_by_name (the compared map is now a use-after-free). The issue stems from using maps__remove, which redoes some of the searches already done by __maps__fixup_overlap_and_insert, so optimize the code (by avoiding repeated searches) and avoid the use-after-free by inlining the appropriate removal code. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202511141407.f9edcfa6-lkp@intel.com Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf stat: Read tool events lastIan Rogers
When reading a metric like memory bandwidth on multiple sockets, the additional sockets will be on CPUs > 0. Because of the affinity reading, the counters are read on CPU 0 along with the time, then the later sockets are read. This can lead to the later sockets having a bandwidth larger than is possible for the period of time. To avoid this, move the reading of tool events to occur after all other events are read. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
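The ordering change can be sketched as a two-pass read. This is a minimal illustration, not the perf stat implementation: the struct, its fields, and read_all() are hypothetical, and the "read" is modeled by recording each event's position in the read sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event descriptor, for illustration only. */
struct event {
	const char *name;
	int is_tool;	/* non-zero for tool events such as duration_time */
	int read_order;	/* set by read_all(): position in the read sequence */
};

/* Read all regular counters first, then the tool events. */
static void read_all(struct event *events, size_t nr)
{
	int order = 0;
	size_t i;

	assert(events != NULL || nr == 0);

	/* Pass 1: every non-tool event, regardless of where it sits in the list. */
	for (i = 0; i < nr; i++) {
		if (!events[i].is_tool)
			events[i].read_order = order++;
	}

	/* Pass 2: tool events last, so the time value covers all counter reads above. */
	for (i = 0; i < nr; i++) {
		if (events[i].is_tool)
			events[i].read_order = order++;
	}
}
```

With this ordering, the elapsed time used in a rate metric is sampled after the counters on every socket, so no counter value is newer than the time it is divided by.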
2025-11-18perf arm_spe: Synthesize memory samples for SIMD operationsLeo Yan
Synthesize memory samples for SIMD operations (including Advanced SIMD, SVE, and SME). To provide complete information, also generate data source entries for SIMD operations. Since memory operations are not limited to load and store, set PERF_MEM_OP_STORE if the operation does not fall into these cases. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Expose SIMD information in other operationsLeo Yan
The other operations contain SME data processing, ASE (Advanced SIMD) and floating-point operations. Expose this info in the records. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Report GCS in recordLeo Yan
Report GCS related info in records. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Report memset and memcpy in recordsLeo Yan
Expose memset and memcpy related info in records. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Report associated info for SVE / SME operationsLeo Yan
SVE / SME operations can be predicated or can be gather loads / scatter stores; save the relevant info in the record. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Report extended memory operations in recordsLeo Yan
Extended memory operations include atomic (AT), acquire/release (AR), and exclusive (EXCL) operations. Save the relevant information in the records. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Report MTE allocation tag in recordLeo Yan
Save MTE tag info in memory record. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>