linux-toradex.git/tools/perf/util/stat-display.c, branch v6.8

perf stat: Combine the -A/--no-aggr and --no-merge options

2023-12-14T21:24:38+00:00

The -A or --no-aggr option disables aggregation of core events:

  $ perf stat -A -e cycles,data_total -a true

   Performance counter stats for 'system wide':

  CPU0            1,287,665      cycles
  CPU1            1,831,681      cycles
  CPU2           27,345,998      cycles
  CPU3            1,964,799      cycles
  CPU4              236,174      cycles
  CPU5            3,302,825      cycles
  CPU6            9,201,446      cycles
  CPU7            1,403,043      cycles
  CPU0               110.90 MiB  data_total

         0.008961761 seconds time elapsed

The --no-merge option disables the aggregation of uncore events:

  $ perf stat --no-merge -e cycles,data_total -a true

   Performance counter stats for 'system wide':

          38,482,778      cycles
               15.04 MiB  data_total [uncore_imc_free_running_1]
               15.00 MiB  data_total [uncore_imc_free_running_0]

         0.005915155 seconds time elapsed

Having two options confuses users, who generally don't appreciate the
difference between core and uncore PMUs. Keep all the options but make
each of them disable aggregation of both core and uncore events:

  $ perf stat -A -e cycles,data_total -a true

   Performance counter stats for 'system wide':

  CPU0               85,878      cycles
  CPU1               88,179      cycles
  CPU2               60,872      cycles
  CPU3            3,265,567      cycles
  CPU4               82,357      cycles
  CPU5               83,383      cycles
  CPU6               84,156      cycles
  CPU7              220,803      cycles
  CPU0                 2.38 MiB  data_total [uncore_imc_free_running_0]
  CPU0                 2.38 MiB  data_total [uncore_imc_free_running_1]

         0.001397205 seconds time elapsed
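
The combined behavior can be pictured as a small option-normalization
step. This is only an illustrative sketch: struct stat_opts_sketch, its
fields and normalize_aggr_opts() are hypothetical names, not the
identifiers used in perf.

```c
#include <stdbool.h>

/* Hypothetical option state; perf's real config structure differs. */
struct stat_opts_sketch {
	bool no_aggr;   /* set by -A / --no-aggr */
	bool no_merge;  /* set by --no-merge */
};

/*
 * After this call, either flag being set implies both, so core and
 * uncore events are both reported without aggregation.
 */
static void normalize_aggr_opts(struct stat_opts_sketch *opts)
{
	if (opts->no_aggr || opts->no_merge) {
		opts->no_aggr = true;
		opts->no_merge = true;
	}
}
```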

Update the relevant 'perf stat' man page information.

Reviewed-by: Kan Liang 
Signed-off-by: Ian Rogers 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Athira Rajeev 
Cc: Changbin Du 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jiri Olsa 
Cc: John Garry 
Cc: K Prateek Nayak 
Cc: Kaige Ye 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Nick Desaulniers 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20231214060256.2094017-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo

perf stat-display: Check if snprintf()'s fmt argument is NULL

2023-08-21T13:54:22+00:00

It is undefined behavior to pass NULL as snprintf()'s fmt argument.
Here is an example to trigger the problem:

  $ perf stat --metric-only -x, -e instructions -- sleep 1
  insn per cycle,
  Segmentation fault (core dumped)

With this patch:

  $ perf stat --metric-only -x, -e instructions -- sleep 1
  insn per cycle,
  ,
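
A minimal sketch of the kind of guard involved, assuming a hypothetical
wrapper named format_or_empty() (not a function in perf): when fmt is
NULL the buffer is left as an empty string instead of being handed to
the printf family.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical wrapper: formats into buf only when fmt is non-NULL.
 * For a NULL fmt, buf becomes an empty string and 0 is returned;
 * otherwise the vsnprintf() result is returned unchanged.
 */
static int format_or_empty(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int ret;

	if (!fmt) {
		if (size > 0)
			buf[0] = '\0';
		return 0;
	}

	va_start(ap, fmt);
	ret = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	return ret;
}
```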

Reviewed-by: Ian Rogers 
Signed-off-by: Kaige Ye 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/01CA7674B690CA24+20230804020907.144562-2-ye@kaige.org
Signed-off-by: Arnaldo Carvalho de Melo

perf stat: Don't display zero tool counts

2023-08-08T17:33:57+00:00

Andi reported (see link below) a regression when printing the
'duration_time' tool event: it gets printed as "not counted" for most
of the CPUs. Fix it by skipping zero counts for tool events.
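
The skip condition can be sketched as below. struct counter_sketch and
skip_zero_tool_count() are hypothetical and much simpler than perf's
evsel handling; they only illustrate the rule described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a counter; not perf's struct evsel. */
struct counter_sketch {
	bool is_tool_event;  /* true for tool events such as duration_time */
	uint64_t val;        /* count read for this CPU */
};

/* Return true when the row should be left out of the output. */
static bool skip_zero_tool_count(const struct counter_sketch *c)
{
	return c->is_tool_event && c->val == 0;
}
```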

Reported-by: Andi Kleen 
Signed-off-by: Ian Rogers 
Tested-by: Andi Kleen 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Athira Rajeev 
Cc: Claire Jensen 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/all/ZMlrzcVrVi1lTDmn@tassilo/
Signed-off-by: Arnaldo Carvalho de Melo

perf stat: New metricgroup output for the default mode

2023-06-16T12:57:19+00:00

In the default mode, the current metricgroup output includes both
events and metrics, which is unnecessary and makes the output hard to
read. Since different ARCHs (even different generations of the same
ARCH) may use different events, the output also varies across
platforms.

For a metricgroup, only outputting the value of each metric is good
enough.

Add a new field default_metricgroup in evsel to indicate an event of the
default metricgroup. For those events, printout() should print the
metricgroup name rather than each event.

Add perf_stat__skip_metric_event() to skip an evsel in the Default
metricgroup if it's not running or is not the metric event.

Add print_metricgroup_header_t to pass the functions which print the
display name of each metricgroup in the Default metricgroup. Support all
three output methods.

Factor out perf_stat__print_shadow_stats_metricgroup() to print out each
metric.
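
The idea of printing the metricgroup name once, followed by one line per
metric, can be sketched as follows. struct metric_sketch and
print_metricgroups() are hypothetical; the column widths are arbitrary
and do not reproduce perf's exact layout.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical metric record; perf's internal types differ. */
struct metric_sketch {
	const char *group;  /* metricgroup name, e.g. "TopdownL1" */
	const char *name;   /* metric name, e.g. "tma_retiring" */
	double pct;         /* metric value in percent */
};

/*
 * Write one line per metric to out. The group name is printed only on
 * the first line of each run of consecutive metrics sharing a group.
 * Returns the number of lines that carried a group name.
 */
static int print_metricgroups(FILE *out, const struct metric_sketch *m, int n)
{
	const char *last = NULL;
	int headers = 0;

	for (int i = 0; i < n; i++) {
		int first = !last || strcmp(last, m[i].group) != 0;

		fprintf(out, "%-12s # %6.1f %%  %s\n",
			first ? m[i].group : "", m[i].pct, m[i].name);
		if (first)
			headers++;
		last = m[i].group;
	}
	return headers;
}
```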

On SPR:

Before:

$ ./perf_old stat sleep 1

 Performance counter stats for 'sleep 1':

              0.54 msec task-clock:u                     #    0.001 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
                68      page-faults:u                    #  125.445 K/sec
           540,970      cycles:u                         #    0.998 GHz
           556,325      instructions:u                   #    1.03  insn per cycle
           123,602      branches:u                       #  228.018 M/sec
             6,889      branch-misses:u                  #    5.57% of all branches
         3,245,820      TOPDOWN.SLOTS:u                  #     18.4 %  tma_backend_bound
                                                  #     17.2 %  tma_retiring
                                                  #     23.1 %  tma_bad_speculation
                                                  #     41.4 %  tma_frontend_bound
           564,859      topdown-retiring:u
         1,370,999      topdown-fe-bound:u
           603,271      topdown-be-bound:u
           744,874      topdown-bad-spec:u
            12,661      INT_MISC.UOP_DROPPING:u          #   23.357 M/sec

       1.001798215 seconds time elapsed

       0.000193000 seconds user
       0.001700000 seconds sys

After:

$ ./perf stat sleep 1

 Performance counter stats for 'sleep 1':

              0.51 msec task-clock:u                     #    0.001 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
                68      page-faults:u                    #  132.683 K/sec
           545,228      cycles:u                         #    1.064 GHz
           555,509      instructions:u                   #    1.02  insn per cycle
           123,574      branches:u                       #  241.120 M/sec
             6,957      branch-misses:u                  #    5.63% of all branches
                        TopdownL1                 #     17.5 %  tma_backend_bound
                                                  #     22.6 %  tma_bad_speculation
                                                  #     42.7 %  tma_frontend_bound
                                                  #     17.1 %  tma_retiring
                        TopdownL2                 #     21.8 %  tma_branch_mispredicts
                                                  #     11.5 %  tma_core_bound
                                                  #     13.4 %  tma_fetch_bandwidth
                                                  #     29.3 %  tma_fetch_latency
                                                  #      2.7 %  tma_heavy_operations
                                                  #     14.5 %  tma_light_operations
                                                  #      0.8 %  tma_machine_clears
                                                  #      6.1 %  tma_memory_bound

       1.001712086 seconds time elapsed

       0.000151000 seconds user
       0.001618000 seconds sys

Reviewed-by: Ian Rogers 
Signed-off-by: Kan Liang 
Cc: Adrian Hunter 
Cc: Ahmad Yasin 
Cc: Andi Kleen 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: https://lore.kernel.org/r/20230616031420.3751973-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

perf pmus: Remove perf_pmus__has_hybrid

2023-05-27T12:42:38+00:00

perf_pmus__has_hybrid was used to detect when there was >1 core PMU.
This can be achieved with perf_pmus__num_core_pmus, which doesn't depend
upon is_pmu_hybrid and PMU name comparisons. When modifying the
function calls, take the opportunity to improve comments, to
enable/simplify tests that were previously failing for hybrid but now
pass, and to simplify generic code.
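
As a rough illustration of replacing a name-based hybrid check with a
count of core PMUs, consider the sketch below. struct pmu_sketch and
both helpers are hypothetical and do not mirror perf's pmus.[ch]
implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified PMU descriptor. */
struct pmu_sketch {
	const char *name;
	bool is_core;
};

/* Count the core PMUs in an array of descriptors. */
static int count_core_pmus(const struct pmu_sketch *pmus, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		if (pmus[i].is_core)
			count++;
	return count;
}

/* ">1 core PMU" is the condition the old has_hybrid check detected. */
static bool has_multiple_core_pmus(const struct pmu_sketch *pmus, size_t n)
{
	return count_core_pmus(pmus, n) > 1;
}
```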

Reviewed-by: Kan Liang 
Signed-off-by: Ian Rogers 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ali Saidi 
Cc: Athira Rajeev 
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Huacai Chen 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jing Zhang 
Cc: Jiri Olsa 
Cc: John Garry 
Cc: Kajol Jain 
Cc: Kang Minchul 
Cc: Leo Yan 
Cc: Madhavan Srinivasan 
Cc: Mark Rutland 
Cc: Mike Leach 
Cc: Ming Wang 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Ravi Bangoria 
Cc: Rob Herring 
Cc: Sandipan Das 
Cc: Sean Christopherson 
Cc: Suzuki Poulouse 
Cc: Thomas Richter 
Cc: Will Deacon 
Cc: Xing Zhengjun 
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230527072210.2900565-34-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo

perf pmu: Separate pmu and pmus

2023-05-27T12:41:39+00:00

Separate and hide the pmus list in pmus.[ch]. Move pmus functionality
out of pmu.[ch] into pmus.[ch] renaming pmus functions which were
prefixed perf_pmu__ to perf_pmus__.

Reviewed-by: Kan Liang 
Signed-off-by: Ian Rogers 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ali Saidi 
Cc: Athira Rajeev 
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Huacai Chen 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jing Zhang 
Cc: Jiri Olsa 
Cc: John Garry 
Cc: Kajol Jain 
Cc: Kang Minchul 
Cc: Leo Yan 
Cc: Madhavan Srinivasan 
Cc: Mark Rutland 
Cc: Mike Leach 
Cc: Ming Wang 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Ravi Bangoria 
Cc: Rob Herring 
Cc: Sandipan Das 
Cc: Sean Christopherson 
Cc: Suzuki Poulouse 
Cc: Thomas Richter 
Cc: Will Deacon 
Cc: Xing Zhengjun 
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230527072210.2900565-28-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo

perf stat: Avoid hybrid PMU list

2023-05-27T12:40:57+00:00

perf_pmu__is_hybrid implicitly uses the hybrid PMU list. Instead,
return false if hybrid isn't present; if it is, see if any evsel's
PMUs are core.
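
The two-step check reads roughly like the sketch below, assuming a
hypothetical struct event_sketch and that the number of core PMUs is
passed in by the caller; perf's real code walks the evlist instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified event: records only whether its PMU is core. */
struct event_sketch {
	const char *name;
	bool pmu_is_core;
};

static bool events_have_hybrid(const struct event_sketch *evs, size_t n,
			       int num_core_pmus)
{
	/* Hybrid isn't present: nothing more to check. */
	if (num_core_pmus <= 1)
		return false;

	/* Hybrid is present: look for any event on a core PMU. */
	for (size_t i = 0; i < n; i++)
		if (evs[i].pmu_is_core)
			return true;
	return false;
}
```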

Reviewed-by: Kan Liang 
Signed-off-by: Ian Rogers 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ali Saidi 
Cc: Athira Rajeev 
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Huacai Chen 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jing Zhang 
Cc: Jiri Olsa 
Cc: John Garry 
Cc: Kajol Jain 
Cc: Kang Minchul 
Cc: Leo Yan 
Cc: Madhavan Srinivasan 
Cc: Mark Rutland 
Cc: Mike Leach 
Cc: Ming Wang 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Ravi Bangoria 
Cc: Rob Herring 
Cc: Sandipan Das 
Cc: Sean Christopherson 
Cc: Suzuki Poulouse 
Cc: Thomas Richter 
Cc: Will Deacon 
Cc: Xing Zhengjun 
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230527072210.2900565-23-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo

perf evlist: Reduce scope of evlist__has_hybrid

2023-05-27T12:39:51+00:00

The function is only used in printout, so reduce its scope to
stat-display.c. Remove the now empty evlist-hybrid.c and
evlist-hybrid.h.

Reviewed-by: Kan Liang 
Signed-off-by: Ian Rogers 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ali Saidi 
Cc: Athira Rajeev 
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Huacai Chen 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jing Zhang 
Cc: Jiri Olsa 
Cc: John Garry 
Cc: Kajol Jain 
Cc: Kang Minchul 
Cc: Leo Yan 
Cc: Madhavan Srinivasan 
Cc: Mark Rutland 
Cc: Mike Leach 
Cc: Ming Wang 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Ravi Bangoria 
Cc: Rob Herring 
Cc: Sandipan Das 
Cc: Sean Christopherson 
Cc: Suzuki Poulouse 
Cc: Thomas Richter 
Cc: Will Deacon 
Cc: Xing Zhengjun 
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230527072210.2900565-15-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo

perf stat: Setup the foundation to allow aggregation based on cache topology

2023-05-23T19:08:08+00:00

Processors based on a chiplet architecture, such as AMD EPYC and Hygon,
do not expose the chiplet details in the sysfs CPU topology information.
However, this information can be derived from the per-CPU cache level
information in sysfs.

'perf stat' already supports aggregation based on topology
information using core ID, socket ID, etc. It is useful to aggregate
based on the cache topology to detect problems like imbalance and
cache-to-cache sharing at various cache levels.

This patch lays the foundation for aggregating data in 'perf stat' based
on the processor's cache topology. The cmdline option to aggregate data
based on the cache topology is added in Patch 4 of the series while this
patch sets up all the necessary functions and variables required to
support the new aggregation option.

The patch also adds support to display per-cache aggregation, or save it
as JSON or CSV, since splitting that into a separate patch would break
builds when compiling with "-Werror=switch-enum", where the compiler
complains about the lack of handling for the AGGR_CACHE case in the
output functions.
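
To illustrate the build constraint, here is a reduced sketch using a
hypothetical enum aggr_mode_sketch (perf's real enum has more values).
With "-Werror=switch-enum", deleting the SKETCH_AGGR_CACHE case below
turns into a compile error, which is why a new enum value and its
handling in the output switches have to land together.

```c
/* Hypothetical subset of aggregation modes; perf's enum has more values. */
enum aggr_mode_sketch {
	SKETCH_AGGR_SOCKET,
	SKETCH_AGGR_CORE,
	SKETCH_AGGR_CACHE,
};

/* Every enumerator has an explicit case, as -Wswitch-enum requires. */
static const char *aggr_header(enum aggr_mode_sketch mode)
{
	switch (mode) {
	case SKETCH_AGGR_SOCKET:
		return "socket";
	case SKETCH_AGGR_CORE:
		return "core";
	case SKETCH_AGGR_CACHE:
		return "cache";
	}
	return "unknown";
}
```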

Committer notes:

Don't use perf_stat_config in tools/perf/util/cpumap.c: that would make
code in util/, which is not specific to a single builtin, depend on a
builtin-specific config structure.

Move the functions introduced in this patch out of
tools/perf/util/cpumap.c, since they need access to the builtin-specific
config and do not strictly need to live in the util/ directory.

With this 'perf test python' is back building.

Suggested-by: Gautham Shenoy 
Signed-off-by: K Prateek Nayak 
Acked-by: Ian Rogers 
Cc: Alexander Shishkin 
Cc: Ananth Narayan 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Ravi Bangoria 
Cc: Sandipan Das 
Cc: Stephane Eranian 
Cc: Wen Pu 
Link: https://lore.kernel.org/r/20230517172745.5833-3-kprateek.nayak@amd.com
Signed-off-by: Arnaldo Carvalho de Melo

perf metric: Change divide by zero and !support events behavior

2023-05-10T15:35:02+00:00

Division by zero causes expression parsing to fail and no metric to be
generated. For short-running benchmarks this can mean that metrics are
not shown. Change the behavior to make the value nan, which gets shown like:

'''
$ perf stat -M TopdownL2 true

 Performance counter stats for 'true':

         1,031,492      INST_RETIRED.ANY                 #      nan %  tma_fetch_bandwidth
                                                  #      nan %  tma_heavy_operations
                                                  #      nan %  tma_light_operations
            29,304      CPU_CLK_UNHALTED.REF_XCLK        #      nan %  tma_fetch_latency
                                                  #      nan %  tma_branch_mispredicts
                                                  #      nan %  tma_machine_clears
                                                  #      nan %  tma_core_bound
                                                  #      nan %  tma_memory_bound
         2,658,319      IDQ_UOPS_NOT_DELIVERED.CORE
            11,167      EXE_ACTIVITY.BOUND_ON_STORES
           262,058      EXE_ACTIVITY.1_PORTS_UTIL
           BR_MISP_RETIRED.ALL_BRANCHES                                            (0.00%)
           INT_MISC.RECOVERY_CYCLES_ANY                                            (0.00%)
           CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE                                        (0.00%)
           CPU_CLK_UNHALTED.THREAD                                                 (0.00%)
           UOPS_RETIRED.RETIRE_SLOTS                                               (0.00%)
           CYCLE_ACTIVITY.STALLS_MEM_ANY                                           (0.00%)
           UOPS_RETIRED.MACRO_FUSED                                                (0.00%)
           IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE                                        (0.00%)
           EXE_ACTIVITY.2_PORTS_UTIL                                               (0.00%)
           CYCLE_ACTIVITY.STALLS_TOTAL                                             (0.00%)
           MACHINE_CLEARS.COUNT                                                    (0.00%)
           UOPS_ISSUED.ANY                                                         (0.00%)

       0.002864879 seconds time elapsed

       0.003012000 seconds user
       0.000000000 seconds sys
'''
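
The nan-on-division-by-zero behavior can be sketched with a small
helper. metric_ratio() is a hypothetical name and stands in for the
expression evaluator; it is not perf's parser code.

```c
#include <math.h>

/*
 * Hypothetical helper: yield NAN instead of failing when the
 * denominator is zero, so a metric line can still be printed.
 */
static double metric_ratio(double num, double den)
{
	if (den == 0.0)
		return NAN;
	return num / den;
}
```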

When events aren't supported, a count of 0 can be confusing and make
metrics look meaningful. Change these to nan as well, which, with the
next change, gets shown like:

'''
$ perf stat true
 Performance counter stats for 'true':

              1.25 msec task-clock:u                     #    0.387 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
                46      page-faults:u                    #   36.702 K/sec
           255,942      cycles:u                         #    0.204 GHz                         (88.66%)
           123,046      instructions:u                   #    0.48  insn per cycle
            28,301      branches:u                       #   22.580 M/sec
             2,489      branch-misses:u                  #    8.79% of all branches
             4,719      CPU_CLK_UNHALTED.REF_XCLK:u      #    3.765 M/sec
                                                  #      nan %  tma_frontend_bound
                                                  #      nan %  tma_retiring
                                                  #      nan %  tma_backend_bound
                                                  #      nan %  tma_bad_speculation
           344,855      IDQ_UOPS_NOT_DELIVERED.CORE:u    #  275.147 M/sec
         INT_MISC.RECOVERY_CYCLES_ANY:u
           CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u                                        (0.00%)
           CPU_CLK_UNHALTED.THREAD:u                                               (0.00%)
           UOPS_RETIRED.RETIRE_SLOTS:u                                             (0.00%)
           UOPS_ISSUED.ANY:u                                                       (0.00%)

       0.003238142 seconds time elapsed

       0.000000000 seconds user
       0.003434000 seconds sys
'''

Ensure that nan metric values are quoted, as nan isn't a valid number
in JSON.
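
A minimal sketch of the quoting rule, assuming a hypothetical
json_metric_value() helper and a plain "%f" format for ordinary values
(perf's actual JSON writer and number formatting may differ):

```c
#include <math.h>
#include <stdio.h>

/*
 * Hypothetical helper: emit nan as a quoted JSON string, and any other
 * value as a bare JSON number. Returns the snprintf() result.
 */
static int json_metric_value(char *buf, size_t size, double val)
{
	if (isnan(val))
		return snprintf(buf, size, "\"nan\"");
	return snprintf(buf, size, "%f", val);
}
```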

Signed-off-by: Ian Rogers 
Tested-by: Kan Liang 
Cc: Adrian Hunter 
Cc: Ahmad Yasin 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Athira Rajeev 
Cc: Caleb Biggers 
Cc: Edward Baker 
Cc: Florian Fischer 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jiri Olsa 
Cc: John Garry 
Cc: Kajol Jain 
Cc: Kang Minchul 
Cc: Leo Yan 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Perry Taylor 
Cc: Peter Zijlstra 
Cc: Ravi Bangoria 
Cc: Rob Herring 
Cc: Samantha Alt 
Cc: Stephane Eranian 
Cc: Sumanth Korikkar 
Cc: Suzuki Poulouse 
Cc: Thomas Richter 
Cc: Tiezhu Yang 
Cc: Weilin Wang 
Cc: Xing Zhengjun 
Cc: Yang Jihong 
Link: https://lore.kernel.org/r/20230502223851.2234828-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo