<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/trace, branch toradex_5.3.y</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>tracing: Fix "gfp_t" format for synthetic events</title>
<updated>2019-11-10T10:34:15+00:00</updated>
<author>
<name>Zhengjun Xing</name>
<email>zhengjun.xing@linux.intel.com</email>
</author>
<published>2019-10-18T01:20:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e3d2a341b2de3c4a43e39dcbb01b7af20f0e13f4'/>
<id>e3d2a341b2de3c4a43e39dcbb01b7af20f0e13f4</id>
<content type='text'>
[ Upstream commit 9fa8c9c647be624e91b09ecffa7cd97ee0600b40 ]

In the format of synthetic events, "gfp_t" is shown as "signed:1",
but "gfp_t" is in fact unsigned and should be shown as "signed:0".

The issue can be reproduced by the following commands:

echo 'memlatency u64 lat; unsigned int order; gfp_t gfp_flags; int migratetype' &gt; /sys/kernel/debug/tracing/synthetic_events
cat  /sys/kernel/debug/tracing/events/synthetic/memlatency/format

name: memlatency
ID: 2233
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:u64 lat;  offset:8;       size:8; signed:0;
        field:unsigned int order;       offset:16;      size:4; signed:0;
        field:gfp_t gfp_flags;  offset:24;      size:4; signed:1;
        field:int migratetype;  offset:32;      size:4; signed:1;

print fmt: "lat=%llu, order=%u, gfp_flags=%x, migratetype=%d", REC-&gt;lat, REC-&gt;order, REC-&gt;gfp_flags, REC-&gt;migratetype
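
The fix teaches the synthetic-event signedness check to treat gfp_t as
unsigned. A minimal Python sketch of that check (synth_field_signed() is
the kernel helper's name; this userspace model is illustrative, not the
actual patch):

```python
def synth_field_signed(type_name):
    # Types starting with "u" (u8, u64, "unsigned int", ...) are unsigned;
    # after the fix, "gfp_t" is treated as unsigned as well.
    if type_name.startswith("u") or type_name == "gfp_t":
        return 0
    return 1

print(synth_field_signed("gfp_t"))   # 0, so the format shows "signed:0"
```

With this, the gfp_flags field in the format output above reads
"signed:0" like the other unsigned fields.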

Link: http://lkml.kernel.org/r/20191018012034.6404-1-zhengjun.xing@linux.intel.com

Reviewed-by: Tom Zanussi &lt;tom.zanussi@linux.intel.com&gt;
Signed-off-by: Zhengjun Xing &lt;zhengjun.xing@linux.intel.com&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 9fa8c9c647be624e91b09ecffa7cd97ee0600b40 ]

In the format of synthetic events, "gfp_t" is shown as "signed:1",
but "gfp_t" is in fact unsigned and should be shown as "signed:0".

The issue can be reproduced by the following commands:

echo 'memlatency u64 lat; unsigned int order; gfp_t gfp_flags; int migratetype' &gt; /sys/kernel/debug/tracing/synthetic_events
cat  /sys/kernel/debug/tracing/events/synthetic/memlatency/format

name: memlatency
ID: 2233
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:u64 lat;  offset:8;       size:8; signed:0;
        field:unsigned int order;       offset:16;      size:4; signed:0;
        field:gfp_t gfp_flags;  offset:24;      size:4; signed:1;
        field:int migratetype;  offset:32;      size:4; signed:1;

print fmt: "lat=%llu, order=%u, gfp_flags=%x, migratetype=%d", REC-&gt;lat, REC-&gt;order, REC-&gt;gfp_flags, REC-&gt;migratetype

Link: http://lkml.kernel.org/r/20191018012034.6404-1-zhengjun.xing@linux.intel.com

Reviewed-by: Tom Zanussi &lt;tom.zanussi@linux.intel.com&gt;
Signed-off-by: Zhengjun Xing &lt;zhengjun.xing@linux.intel.com&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Initialize iter-&gt;seq after zeroing in tracing_read_pipe()</title>
<updated>2019-11-06T12:08:47+00:00</updated>
<author>
<name>Petr Mladek</name>
<email>pmladek@suse.com</email>
</author>
<published>2019-10-11T14:21:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1eca0cbc9b613ac67f71e84f97c5380c563cd63a'/>
<id>1eca0cbc9b613ac67f71e84f97c5380c563cd63a</id>
<content type='text'>
[ Upstream commit d303de1fcf344ff7c15ed64c3f48a991c9958775 ]

A customer reported the following softlockup:

[899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
[899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] Kernel panic - not syncing: softlockup: hung tasks
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
[899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
[899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
[899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
[899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
[899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
[899688.160002]  tracing_read_pipe+0x336/0x3c0
[899688.160002]  __vfs_read+0x26/0x140
[899688.160002]  vfs_read+0x87/0x130
[899688.160002]  SyS_read+0x42/0x90
[899688.160002]  do_syscall_64+0x74/0x160

It caught the process in the middle of trace_access_unlock(), which
contains no loop. So it must be looping in the caller,
tracing_read_pipe(), via the "waitagain" label.

Crashdump analysis uncovered that iter-&gt;seq was completely zeroed
at this point, including iter-&gt;seq.seq.size. It means that
print_trace_line() was never able to print anything and
there was no forward progress.

The culprit seems to be in the code:

	/* reset all but tr, trace, and overruns */
	memset(&amp;iter-&gt;seq, 0,
	       sizeof(struct trace_iterator) -
	       offsetof(struct trace_iterator, seq));

It was added in v2.6.27-rc1 by commit 53d0aa773053ab182877 ("ftrace:
add logic to record overruns"), at a time when iter-&gt;seq looked like:

     struct trace_seq {
	unsigned char		buffer[PAGE_SIZE];
	unsigned int		len;
     };

There was no "size" variable and zeroing was perfectly fine.

The solution is to reinitialize the structure after or without
zeroing.
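
The effect of the fix can be modeled in a few lines of Python (the field
names mirror struct trace_seq; this userspace model is ours, not the
kernel code):

```python
PAGE_SIZE = 4096

class TraceSeq:
    def __init__(self):
        # trace_seq_init(): the full buffer is available again
        self.size = PAGE_SIZE
        self.len = 0

def reset_iter_seq(seq):
    # The old memset() zeroed everything, including seq.size, so the
    # buffer looked permanently full and print_trace_line() could never
    # make forward progress.
    seq.size = 0
    seq.len = 0
    # The fix: re-initialize after zeroing.
    seq.__init__()

seq = TraceSeq()
reset_iter_seq(seq)
print(seq.size)   # 4096 again, so printing can proceed
```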

Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com

Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit d303de1fcf344ff7c15ed64c3f48a991c9958775 ]

A customer reported the following softlockup:

[899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
[899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] Kernel panic - not syncing: softlockup: hung tasks
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
[899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
[899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
[899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
[899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
[899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
[899688.160002]  tracing_read_pipe+0x336/0x3c0
[899688.160002]  __vfs_read+0x26/0x140
[899688.160002]  vfs_read+0x87/0x130
[899688.160002]  SyS_read+0x42/0x90
[899688.160002]  do_syscall_64+0x74/0x160

It caught the process in the middle of trace_access_unlock(), which
contains no loop. So it must be looping in the caller,
tracing_read_pipe(), via the "waitagain" label.

Crashdump analysis uncovered that iter-&gt;seq was completely zeroed
at this point, including iter-&gt;seq.seq.size. It means that
print_trace_line() was never able to print anything and
there was no forward progress.

The culprit seems to be in the code:

	/* reset all but tr, trace, and overruns */
	memset(&amp;iter-&gt;seq, 0,
	       sizeof(struct trace_iterator) -
	       offsetof(struct trace_iterator, seq));

It was added in v2.6.27-rc1 by commit 53d0aa773053ab182877 ("ftrace:
add logic to record overruns"), at a time when iter-&gt;seq looked like:

     struct trace_seq {
	unsigned char		buffer[PAGE_SIZE];
	unsigned int		len;
     };

There was no "size" variable and zeroing was perfectly fine.

The solution is to reinitialize the structure after or without
zeroing.

Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com

Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Fix race in perf_trace_buf initialization</title>
<updated>2019-10-29T08:22:39+00:00</updated>
<author>
<name>Prateek Sood</name>
<email>prsood@codeaurora.org</email>
</author>
<published>2019-10-15T06:17:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2872b10f19556b3c568f3dcf44c595a028b07e16'/>
<id>2872b10f19556b3c568f3dcf44c595a028b07e16</id>
<content type='text'>
commit 6b1340cc00edeadd52ebd8a45171f38c8de2a387 upstream.

A race condition exists while initializing perf_trace_buf from
perf_trace_init() and perf_kprobe_init().

      CPU0                                        CPU1
perf_trace_init()
  mutex_lock(&amp;event_mutex)
    perf_trace_event_init()
      perf_trace_event_reg()
        total_ref_count == 0
	buf = alloc_percpu()
        perf_trace_buf[i] = buf
        tp_event-&gt;class-&gt;reg() //fails       perf_kprobe_init()
	goto fail                              perf_trace_event_init()
                                                 perf_trace_event_reg()
        fail:
	  total_ref_count == 0

                                                   total_ref_count == 0
                                                   buf = alloc_percpu()
                                                   perf_trace_buf[i] = buf
                                                   tp_event-&gt;class-&gt;reg()
                                                   total_ref_count++

          free_percpu(perf_trace_buf[i])
          perf_trace_buf[i] = NULL

Any subsequent call to perf_trace_event_reg() will observe total_ref_count &gt; 0,
causing perf_trace_buf to remain NULL. This can result in perf_trace_buf
being accessed from perf_trace_buf_alloc() without being initialized. Acquiring
event_mutex in perf_kprobe_init() before calling perf_trace_event_init()
fixes this race.

The race caused the following bug:

 Unable to handle kernel paging request at virtual address 0000003106f2003c
 Mem abort info:
   ESR = 0x96000045
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000045
   CM = 0, WnR = 1
 user pgtable: 4k pages, 39-bit VAs, pgdp = ffffffc034b9b000
 [0000003106f2003c] pgd=0000000000000000, pud=0000000000000000
 Internal error: Oops: 96000045 [#1] PREEMPT SMP
 Process syz-executor (pid: 18393, stack limit = 0xffffffc093190000)
 pstate: 80400005 (Nzcv daif +PAN -UAO)
 pc : __memset+0x20/0x1ac
 lr : memset+0x3c/0x50
 sp : ffffffc09319fc50

  __memset+0x20/0x1ac
  perf_trace_buf_alloc+0x140/0x1a0
  perf_trace_sys_enter+0x158/0x310
  syscall_trace_enter+0x348/0x7c0
  el0_svc_common+0x11c/0x368
  el0_svc_handler+0x12c/0x198
  el0_svc+0x8/0xc

Ramdumps showed the following:
  total_ref_count = 3
  perf_trace_buf = (
      0x0 -&gt; NULL,
      0x0 -&gt; NULL,
      0x0 -&gt; NULL,
      0x0 -&gt; NULL)

Link: http://lkml.kernel.org/r/1571120245-4186-1-git-send-email-prsood@codeaurora.org

Cc: stable@vger.kernel.org
Fixes: e12f03d7031a9 ("perf/core: Implement the 'perf_kprobe' PMU")
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Prateek Sood &lt;prsood@codeaurora.org&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6b1340cc00edeadd52ebd8a45171f38c8de2a387 upstream.

A race condition exists while initializing perf_trace_buf from
perf_trace_init() and perf_kprobe_init().

      CPU0                                        CPU1
perf_trace_init()
  mutex_lock(&amp;event_mutex)
    perf_trace_event_init()
      perf_trace_event_reg()
        total_ref_count == 0
	buf = alloc_percpu()
        perf_trace_buf[i] = buf
        tp_event-&gt;class-&gt;reg() //fails       perf_kprobe_init()
	goto fail                              perf_trace_event_init()
                                                 perf_trace_event_reg()
        fail:
	  total_ref_count == 0

                                                   total_ref_count == 0
                                                   buf = alloc_percpu()
                                                   perf_trace_buf[i] = buf
                                                   tp_event-&gt;class-&gt;reg()
                                                   total_ref_count++

          free_percpu(perf_trace_buf[i])
          perf_trace_buf[i] = NULL

Any subsequent call to perf_trace_event_reg() will observe total_ref_count &gt; 0,
causing perf_trace_buf to remain NULL. This can result in perf_trace_buf
being accessed from perf_trace_buf_alloc() without being initialized. Acquiring
event_mutex in perf_kprobe_init() before calling perf_trace_event_init()
fixes this race.

The race caused the following bug:

 Unable to handle kernel paging request at virtual address 0000003106f2003c
 Mem abort info:
   ESR = 0x96000045
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000045
   CM = 0, WnR = 1
 user pgtable: 4k pages, 39-bit VAs, pgdp = ffffffc034b9b000
 [0000003106f2003c] pgd=0000000000000000, pud=0000000000000000
 Internal error: Oops: 96000045 [#1] PREEMPT SMP
 Process syz-executor (pid: 18393, stack limit = 0xffffffc093190000)
 pstate: 80400005 (Nzcv daif +PAN -UAO)
 pc : __memset+0x20/0x1ac
 lr : memset+0x3c/0x50
 sp : ffffffc09319fc50

  __memset+0x20/0x1ac
  perf_trace_buf_alloc+0x140/0x1a0
  perf_trace_sys_enter+0x158/0x310
  syscall_trace_enter+0x348/0x7c0
  el0_svc_common+0x11c/0x368
  el0_svc_handler+0x12c/0x198
  el0_svc+0x8/0xc

Ramdumps showed the following:
  total_ref_count = 3
  perf_trace_buf = (
      0x0 -&gt; NULL,
      0x0 -&gt; NULL,
      0x0 -&gt; NULL,
      0x0 -&gt; NULL)

Link: http://lkml.kernel.org/r/1571120245-4186-1-git-send-email-prsood@codeaurora.org

Cc: stable@vger.kernel.org
Fixes: e12f03d7031a9 ("perf/core: Implement the 'perf_kprobe' PMU")
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Prateek Sood &lt;prsood@codeaurora.org&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Get trace_array reference for available_tracers files</title>
<updated>2019-10-17T20:47:31+00:00</updated>
<author>
<name>Steven Rostedt (VMware)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2019-10-11T22:19:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2851a66c6243e93615f7fff00ecb56dd1e58e323'/>
<id>2851a66c6243e93615f7fff00ecb56dd1e58e323</id>
<content type='text'>
commit 194c2c74f5532e62c218adeb8e2b683119503907 upstream.

As instances may have different tracers available, we need to look at the
trace_array descriptor that shows the list of the available tracers for the
instance. But there's a race between opening the file and an admin
deleting the instance. trace_array_get() needs to be called before
accessing the trace_array.
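
The pattern is the usual get-on-open, put-on-release refcounting; a
hedged Python sketch (trace_array_get() is a real kernel helper, the
rest of this model is ours):

```python
class TraceArray:
    def __init__(self):
        self.ref = 0
        self.deleted = False

def trace_array_get(tr):
    # Fails if an admin already deleted the instance.
    if tr.deleted:
        return False
    tr.ref += 1
    return True

def available_tracers_open(tr):
    if not trace_array_get(tr):
        raise OSError("instance is going away")
    # ... safe to walk tr's list of available tracers until release ...

def available_tracers_release(tr):
    tr.ref -= 1   # trace_array_put()
```

While the reference is held, the instance cannot be torn down under the
reader.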

Cc: stable@vger.kernel.org
Fixes: 607e2ea167e56 ("tracing: Set up infrastructure to allow tracers for instances")
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 194c2c74f5532e62c218adeb8e2b683119503907 upstream.

As instances may have different tracers available, we need to look at the
trace_array descriptor that shows the list of the available tracers for the
instance. But there's a race between opening the file and an admin
deleting the instance. trace_array_get() needs to be called before
accessing the trace_array.

Cc: stable@vger.kernel.org
Fixes: 607e2ea167e56 ("tracing: Set up infrastructure to allow tracers for instances")
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ftrace: Get a reference counter for the trace_array on filter files</title>
<updated>2019-10-17T20:47:31+00:00</updated>
<author>
<name>Steven Rostedt (VMware)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2019-10-11T21:56:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4887abbcb6357005fbf684b29e8a36da05f5abe9'/>
<id>4887abbcb6357005fbf684b29e8a36da05f5abe9</id>
<content type='text'>
commit 9ef16693aff8137faa21d16ffe65bb9832d24d71 upstream.

The ftrace set_ftrace_filter and set_ftrace_notrace files are now specific
to an instance. They need to take a reference to the instance, otherwise
there could be a race between accessing the files and deleting the instance.

It wasn't until the :mod: caching that these file operations started
referencing the trace_array directly.

Cc: stable@vger.kernel.org
Fixes: 673feb9d76ab3 ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 9ef16693aff8137faa21d16ffe65bb9832d24d71 upstream.

The ftrace set_ftrace_filter and set_ftrace_notrace files are now specific
to an instance. They need to take a reference to the instance, otherwise
there could be a race between accessing the files and deleting the instance.

It wasn't until the :mod: caching that these file operations started
referencing the trace_array directly.

Cc: stable@vger.kernel.org
Fixes: 673feb9d76ab3 ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency</title>
<updated>2019-10-17T20:47:31+00:00</updated>
<author>
<name>Srivatsa S. Bhat (VMware)</name>
<email>srivatsa@csail.mit.edu</email>
</author>
<published>2019-10-10T18:51:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0dd65ecad9d832c9d8112b4822c1dea5f3882699'/>
<id>0dd65ecad9d832c9d8112b4822c1dea5f3882699</id>
<content type='text'>
commit fc64e4ad80d4b72efce116f87b3174f0b7196f8e upstream.

max_latency is intended to record the maximum ever observed hardware
latency, which may occur in either part of the loop (inner/outer). So
we need to also consider the outer-loop sample when updating
max_latency.
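
The gist of the fix, modeled in Python (the variable names follow the
commit message; this is a sketch, not the kernel diff):

```python
def update_max_latency(max_latency, inner_sample, outer_sample):
    # Consider BOTH halves of the sampling loop, not just the inner one.
    sample = max(inner_sample, outer_sample)
    return max(max_latency, sample)

print(update_max_latency(10, 5, 25))   # 25: the outer-loop spike counts
```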

Link: http://lkml.kernel.org/r/157073345463.17189.18124025522664682811.stgit@srivatsa-ubuntu

Fixes: e7c15cd8a113 ("tracing: Added hardware latency tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) &lt;srivatsa@csail.mit.edu&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit fc64e4ad80d4b72efce116f87b3174f0b7196f8e upstream.

max_latency is intended to record the maximum ever observed hardware
latency, which may occur in either part of the loop (inner/outer). So
we need to also consider the outer-loop sample when updating
max_latency.

Link: http://lkml.kernel.org/r/157073345463.17189.18124025522664682811.stgit@srivatsa-ubuntu

Fixes: e7c15cd8a113 ("tracing: Added hardware latency tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) &lt;srivatsa@csail.mit.edu&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>tracing/hwlat: Report total time spent in all NMIs during the sample</title>
<updated>2019-10-17T20:47:30+00:00</updated>
<author>
<name>Srivatsa S. Bhat (VMware)</name>
<email>srivatsa@csail.mit.edu</email>
</author>
<published>2019-10-10T18:50:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=86012359cfa0a78aff030b3b804b7b812a0cfbbe'/>
<id>86012359cfa0a78aff030b3b804b7b812a0cfbbe</id>
<content type='text'>
commit 98dc19c11470ee6048aba723d77079ad2cda8a52 upstream.

nmi_total_ts is supposed to record the total time spent in *all* NMIs
that occur on the given CPU during the (active portion of the)
sampling window. However, the code seems to be overwriting this
variable for each NMI, thereby only recording the time spent in the
most recent NMI. Fix it by accumulating the duration instead.
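
The one-character nature of the fix ("=" becoming "+=") can be shown in
a tiny Python model (nmi_total_ts is the kernel variable's name; the
timestamps here are invented for illustration):

```python
nmi_total_ts = 0

def account_nmi(enter_ts, exit_ts):
    global nmi_total_ts
    # Before the fix this was "=", which kept only the last NMI's time.
    nmi_total_ts += exit_ts - enter_ts

account_nmi(100, 130)   # first NMI: 30 units
account_nmi(200, 210)   # second NMI: 10 units
print(nmi_total_ts)     # 40: both NMIs counted, not just the last one
```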

Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu

Fixes: 7b2c86250122 ("tracing: Add NMI tracing in hwlat detector")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) &lt;srivatsa@csail.mit.edu&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 98dc19c11470ee6048aba723d77079ad2cda8a52 upstream.

nmi_total_ts is supposed to record the total time spent in *all* NMIs
that occur on the given CPU during the (active portion of the)
sampling window. However, the code seems to be overwriting this
variable for each NMI, thereby only recording the time spent in the
most recent NMI. Fix it by accumulating the duration instead.

Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu

Fixes: 7b2c86250122 ("tracing: Add NMI tracing in hwlat detector")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) &lt;srivatsa@csail.mit.edu&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Fix bpf_event_output re-entry issue</title>
<updated>2019-10-11T16:36:48+00:00</updated>
<author>
<name>Allan Zhang</name>
<email>allanzhang@google.com</email>
</author>
<published>2019-09-25T23:43:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7943c097dd0e9b955593a2cb538e75036e7a443a'/>
<id>7943c097dd0e9b955593a2cb538e75036e7a443a</id>
<content type='text'>
[ Upstream commit 768fb61fcc13b2acaca758275d54c09a65e2968b ]

A BPF_PROG_TYPE_SOCK_OPS program can reenter bpf_event_output() because it
can be called from both atomic and non-atomic contexts, and there is no
bpf_prog_active counter to prevent this.

This patch enables 3 levels of nesting to support normal, IRQ and NMI
contexts.
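
A per-context buffer with a nesting counter, sketched in Python (the
kernel uses per-CPU data and different helper names; everything below is
an illustrative model):

```python
NESTING_LEVELS = 3   # normal, irq, nmi

bufs = [bytearray(512) for _ in range(NESTING_LEVELS)]
nest_level = 0       # per-CPU counter in the kernel; a plain int here

def get_trace_buf():
    global nest_level
    if nest_level == NESTING_LEVELS:
        return None       # nested deeper than supported: drop the event
    buf = bufs[nest_level]
    nest_level += 1
    return buf            # each nesting level gets its own buffer

def put_trace_buf():
    global nest_level
    nest_level -= 1
```

Because each level of reentry uses a distinct buffer, an NMI or IRQ
arriving mid-output no longer clobbers the interrupted context's data.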

We can easily reproduce the issue by running netperf crr mode with 100
flows and 10 threads from the netperf client side.

Here is the whole stack dump:

[  515.228898] WARNING: CPU: 20 PID: 14686 at kernel/trace/bpf_trace.c:549 bpf_event_output+0x1f9/0x220
[  515.228903] CPU: 20 PID: 14686 Comm: tcp_crr Tainted: G        W        4.15.0-smp-fixpanic #44
[  515.228904] Hardware name: Intel TBG,ICH10/Ikaria_QC_1b, BIOS 1.22.0 06/04/2018
[  515.228905] RIP: 0010:bpf_event_output+0x1f9/0x220
[  515.228906] RSP: 0018:ffff9a57ffc03938 EFLAGS: 00010246
[  515.228907] RAX: 0000000000000012 RBX: 0000000000000001 RCX: 0000000000000000
[  515.228907] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff836b0f80
[  515.228908] RBP: ffff9a57ffc039c8 R08: 0000000000000004 R09: 0000000000000012
[  515.228908] R10: ffff9a57ffc1de40 R11: 0000000000000000 R12: 0000000000000002
[  515.228909] R13: ffff9a57e13bae00 R14: 00000000ffffffff R15: ffff9a57ffc1e2c0
[  515.228910] FS:  00007f5a3e6ec700(0000) GS:ffff9a57ffc00000(0000) knlGS:0000000000000000
[  515.228910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  515.228911] CR2: 0000537082664fff CR3: 000000061fed6002 CR4: 00000000000226f0
[  515.228911] Call Trace:
[  515.228913]  &lt;IRQ&gt;
[  515.228919]  [&lt;ffffffff82c6c6cb&gt;] bpf_sockopt_event_output+0x3b/0x50
[  515.228923]  [&lt;ffffffff8265daee&gt;] ? bpf_ktime_get_ns+0xe/0x10
[  515.228927]  [&lt;ffffffff8266fda5&gt;] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100
[  515.228930]  [&lt;ffffffff82cf90a5&gt;] ? tcp_init_transfer+0x125/0x150
[  515.228933]  [&lt;ffffffff82cf9159&gt;] ? tcp_finish_connect+0x89/0x110
[  515.228936]  [&lt;ffffffff82cf98e4&gt;] ? tcp_rcv_state_process+0x704/0x1010
[  515.228939]  [&lt;ffffffff82c6e263&gt;] ? sk_filter_trim_cap+0x53/0x2a0
[  515.228942]  [&lt;ffffffff82d90d1f&gt;] ? tcp_v6_inbound_md5_hash+0x6f/0x1d0
[  515.228945]  [&lt;ffffffff82d92160&gt;] ? tcp_v6_do_rcv+0x1c0/0x460
[  515.228947]  [&lt;ffffffff82d93558&gt;] ? tcp_v6_rcv+0x9f8/0xb30
[  515.228951]  [&lt;ffffffff82d737c0&gt;] ? ip6_route_input+0x190/0x220
[  515.228955]  [&lt;ffffffff82d5f7ad&gt;] ? ip6_protocol_deliver_rcu+0x6d/0x450
[  515.228958]  [&lt;ffffffff82d60246&gt;] ? ip6_rcv_finish+0xb6/0x170
[  515.228961]  [&lt;ffffffff82d5fb90&gt;] ? ip6_protocol_deliver_rcu+0x450/0x450
[  515.228963]  [&lt;ffffffff82d60361&gt;] ? ipv6_rcv+0x61/0xe0
[  515.228966]  [&lt;ffffffff82d60190&gt;] ? ipv6_list_rcv+0x330/0x330
[  515.228969]  [&lt;ffffffff82c4976b&gt;] ? __netif_receive_skb_one_core+0x5b/0xa0
[  515.228972]  [&lt;ffffffff82c497d1&gt;] ? __netif_receive_skb+0x21/0x70
[  515.228975]  [&lt;ffffffff82c4a8d2&gt;] ? process_backlog+0xb2/0x150
[  515.228978]  [&lt;ffffffff82c4aadf&gt;] ? net_rx_action+0x16f/0x410
[  515.228982]  [&lt;ffffffff830000dd&gt;] ? __do_softirq+0xdd/0x305
[  515.228986]  [&lt;ffffffff8252cfdc&gt;] ? irq_exit+0x9c/0xb0
[  515.228989]  [&lt;ffffffff82e02de5&gt;] ? smp_call_function_single_interrupt+0x65/0x120
[  515.228991]  [&lt;ffffffff82e020e1&gt;] ? call_function_single_interrupt+0x81/0x90
[  515.228992]  &lt;/IRQ&gt;
[  515.228996]  [&lt;ffffffff82a11ff0&gt;] ? io_serial_in+0x20/0x20
[  515.229000]  [&lt;ffffffff8259c040&gt;] ? console_unlock+0x230/0x490
[  515.229003]  [&lt;ffffffff8259cbaa&gt;] ? vprintk_emit+0x26a/0x2a0
[  515.229006]  [&lt;ffffffff8259cbff&gt;] ? vprintk_default+0x1f/0x30
[  515.229008]  [&lt;ffffffff8259d9f5&gt;] ? vprintk_func+0x35/0x70
[  515.229011]  [&lt;ffffffff8259d4bb&gt;] ? printk+0x50/0x66
[  515.229013]  [&lt;ffffffff82637637&gt;] ? bpf_event_output+0xb7/0x220
[  515.229016]  [&lt;ffffffff82c6c6cb&gt;] ? bpf_sockopt_event_output+0x3b/0x50
[  515.229019]  [&lt;ffffffff8265daee&gt;] ? bpf_ktime_get_ns+0xe/0x10
[  515.229023]  [&lt;ffffffff82c29e87&gt;] ? release_sock+0x97/0xb0
[  515.229026]  [&lt;ffffffff82ce9d6a&gt;] ? tcp_recvmsg+0x31a/0xda0
[  515.229029]  [&lt;ffffffff8266fda5&gt;] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100
[  515.229032]  [&lt;ffffffff82ce77c1&gt;] ? tcp_set_state+0x191/0x1b0
[  515.229035]  [&lt;ffffffff82ced10e&gt;] ? tcp_disconnect+0x2e/0x600
[  515.229038]  [&lt;ffffffff82cecbbb&gt;] ? tcp_close+0x3eb/0x460
[  515.229040]  [&lt;ffffffff82d21082&gt;] ? inet_release+0x42/0x70
[  515.229043]  [&lt;ffffffff82d58809&gt;] ? inet6_release+0x39/0x50
[  515.229046]  [&lt;ffffffff82c1f32d&gt;] ? __sock_release+0x4d/0xd0
[  515.229049]  [&lt;ffffffff82c1f3e5&gt;] ? sock_close+0x15/0x20
[  515.229052]  [&lt;ffffffff8273b517&gt;] ? __fput+0xe7/0x1f0
[  515.229055]  [&lt;ffffffff8273b66e&gt;] ? ____fput+0xe/0x10
[  515.229058]  [&lt;ffffffff82547bf2&gt;] ? task_work_run+0x82/0xb0
[  515.229061]  [&lt;ffffffff824086df&gt;] ? exit_to_usermode_loop+0x7e/0x11f
[  515.229064]  [&lt;ffffffff82408171&gt;] ? do_syscall_64+0x111/0x130
[  515.229067]  [&lt;ffffffff82e0007c&gt;] ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Fixes: a5a3a828cd00 ("bpf: add perf event notificaton support for sock_ops")
Signed-off-by: Allan Zhang &lt;allanzhang@google.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20190925234312.94063-2-allanzhang@google.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 768fb61fcc13b2acaca758275d54c09a65e2968b ]

A BPF_PROG_TYPE_SOCK_OPS program can reenter bpf_event_output() because it
can be called from both atomic and non-atomic contexts, and there is no
bpf_prog_active counter to prevent this.

This patch enables 3 levels of nesting to support normal, IRQ and NMI
contexts.

We can easily reproduce the issue by running netperf crr mode with 100
flows and 10 threads from the netperf client side.

Here is the whole stack dump:

[  515.228898] WARNING: CPU: 20 PID: 14686 at kernel/trace/bpf_trace.c:549 bpf_event_output+0x1f9/0x220
[  515.228903] CPU: 20 PID: 14686 Comm: tcp_crr Tainted: G        W        4.15.0-smp-fixpanic #44
[  515.228904] Hardware name: Intel TBG,ICH10/Ikaria_QC_1b, BIOS 1.22.0 06/04/2018
[  515.228905] RIP: 0010:bpf_event_output+0x1f9/0x220
[  515.228906] RSP: 0018:ffff9a57ffc03938 EFLAGS: 00010246
[  515.228907] RAX: 0000000000000012 RBX: 0000000000000001 RCX: 0000000000000000
[  515.228907] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff836b0f80
[  515.228908] RBP: ffff9a57ffc039c8 R08: 0000000000000004 R09: 0000000000000012
[  515.228908] R10: ffff9a57ffc1de40 R11: 0000000000000000 R12: 0000000000000002
[  515.228909] R13: ffff9a57e13bae00 R14: 00000000ffffffff R15: ffff9a57ffc1e2c0
[  515.228910] FS:  00007f5a3e6ec700(0000) GS:ffff9a57ffc00000(0000) knlGS:0000000000000000
[  515.228910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  515.228911] CR2: 0000537082664fff CR3: 000000061fed6002 CR4: 00000000000226f0
[  515.228911] Call Trace:
[  515.228913]  &lt;IRQ&gt;
[  515.228919]  [&lt;ffffffff82c6c6cb&gt;] bpf_sockopt_event_output+0x3b/0x50
[  515.228923]  [&lt;ffffffff8265daee&gt;] ? bpf_ktime_get_ns+0xe/0x10
[  515.228927]  [&lt;ffffffff8266fda5&gt;] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100
[  515.228930]  [&lt;ffffffff82cf90a5&gt;] ? tcp_init_transfer+0x125/0x150
[  515.228933]  [&lt;ffffffff82cf9159&gt;] ? tcp_finish_connect+0x89/0x110
[  515.228936]  [&lt;ffffffff82cf98e4&gt;] ? tcp_rcv_state_process+0x704/0x1010
[  515.228939]  [&lt;ffffffff82c6e263&gt;] ? sk_filter_trim_cap+0x53/0x2a0
[  515.228942]  [&lt;ffffffff82d90d1f&gt;] ? tcp_v6_inbound_md5_hash+0x6f/0x1d0
[  515.228945]  [&lt;ffffffff82d92160&gt;] ? tcp_v6_do_rcv+0x1c0/0x460
[  515.228947]  [&lt;ffffffff82d93558&gt;] ? tcp_v6_rcv+0x9f8/0xb30
[  515.228951]  [&lt;ffffffff82d737c0&gt;] ? ip6_route_input+0x190/0x220
[  515.228955]  [&lt;ffffffff82d5f7ad&gt;] ? ip6_protocol_deliver_rcu+0x6d/0x450
[  515.228958]  [&lt;ffffffff82d60246&gt;] ? ip6_rcv_finish+0xb6/0x170
[  515.228961]  [&lt;ffffffff82d5fb90&gt;] ? ip6_protocol_deliver_rcu+0x450/0x450
[  515.228963]  [&lt;ffffffff82d60361&gt;] ? ipv6_rcv+0x61/0xe0
[  515.228966]  [&lt;ffffffff82d60190&gt;] ? ipv6_list_rcv+0x330/0x330
[  515.228969]  [&lt;ffffffff82c4976b&gt;] ? __netif_receive_skb_one_core+0x5b/0xa0
[  515.228972]  [&lt;ffffffff82c497d1&gt;] ? __netif_receive_skb+0x21/0x70
[  515.228975]  [&lt;ffffffff82c4a8d2&gt;] ? process_backlog+0xb2/0x150
[  515.228978]  [&lt;ffffffff82c4aadf&gt;] ? net_rx_action+0x16f/0x410
[  515.228982]  [&lt;ffffffff830000dd&gt;] ? __do_softirq+0xdd/0x305
[  515.228986]  [&lt;ffffffff8252cfdc&gt;] ? irq_exit+0x9c/0xb0
[  515.228989]  [&lt;ffffffff82e02de5&gt;] ? smp_call_function_single_interrupt+0x65/0x120
[  515.228991]  [&lt;ffffffff82e020e1&gt;] ? call_function_single_interrupt+0x81/0x90
[  515.228992]  &lt;/IRQ&gt;
[  515.228996]  [&lt;ffffffff82a11ff0&gt;] ? io_serial_in+0x20/0x20
[  515.229000]  [&lt;ffffffff8259c040&gt;] ? console_unlock+0x230/0x490
[  515.229003]  [&lt;ffffffff8259cbaa&gt;] ? vprintk_emit+0x26a/0x2a0
[  515.229006]  [&lt;ffffffff8259cbff&gt;] ? vprintk_default+0x1f/0x30
[  515.229008]  [&lt;ffffffff8259d9f5&gt;] ? vprintk_func+0x35/0x70
[  515.229011]  [&lt;ffffffff8259d4bb&gt;] ? printk+0x50/0x66
[  515.229013]  [&lt;ffffffff82637637&gt;] ? bpf_event_output+0xb7/0x220
[  515.229016]  [&lt;ffffffff82c6c6cb&gt;] ? bpf_sockopt_event_output+0x3b/0x50
[  515.229019]  [&lt;ffffffff8265daee&gt;] ? bpf_ktime_get_ns+0xe/0x10
[  515.229023]  [&lt;ffffffff82c29e87&gt;] ? release_sock+0x97/0xb0
[  515.229026]  [&lt;ffffffff82ce9d6a&gt;] ? tcp_recvmsg+0x31a/0xda0
[  515.229029]  [&lt;ffffffff8266fda5&gt;] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100
[  515.229032]  [&lt;ffffffff82ce77c1&gt;] ? tcp_set_state+0x191/0x1b0
[  515.229035]  [&lt;ffffffff82ced10e&gt;] ? tcp_disconnect+0x2e/0x600
[  515.229038]  [&lt;ffffffff82cecbbb&gt;] ? tcp_close+0x3eb/0x460
[  515.229040]  [&lt;ffffffff82d21082&gt;] ? inet_release+0x42/0x70
[  515.229043]  [&lt;ffffffff82d58809&gt;] ? inet6_release+0x39/0x50
[  515.229046]  [&lt;ffffffff82c1f32d&gt;] ? __sock_release+0x4d/0xd0
[  515.229049]  [&lt;ffffffff82c1f3e5&gt;] ? sock_close+0x15/0x20
[  515.229052]  [&lt;ffffffff8273b517&gt;] ? __fput+0xe7/0x1f0
[  515.229055]  [&lt;ffffffff8273b66e&gt;] ? ____fput+0xe/0x10
[  515.229058]  [&lt;ffffffff82547bf2&gt;] ? task_work_run+0x82/0xb0
[  515.229061]  [&lt;ffffffff824086df&gt;] ? exit_to_usermode_loop+0x7e/0x11f
[  515.229064]  [&lt;ffffffff82408171&gt;] ? do_syscall_64+0x111/0x130
[  515.229067]  [&lt;ffffffff82e0007c&gt;] ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Fixes: a5a3a828cd00 ("bpf: add perf event notificaton support for sock_ops")
Signed-off-by: Allan Zhang &lt;allanzhang@google.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20190925234312.94063-2-allanzhang@google.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Make sure variable reference alias has correct var_ref_idx</title>
<updated>2019-10-11T16:36:07+00:00</updated>
<author>
<name>Tom Zanussi</name>
<email>zanussi@kernel.org</email>
</author>
<published>2019-09-01T22:02:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=57b2ba362cdaca559908a0634518e2cdcb740379'/>
<id>57b2ba362cdaca559908a0634518e2cdcb740379</id>
<content type='text'>
commit 17f8607a1658a8e70415eef67909f990d13017b5 upstream.

Original changelog from Steve Rostedt (except last sentence which
explains the problem, and the Fixes: tag):

I performed a three way histogram with the following commands:

echo 'irq_lat u64 lat pid_t pid' &gt; synthetic_events
echo 'wake_lat u64 lat u64 irqlat pid_t pid' &gt;&gt; synthetic_events
echo 'hist:keys=common_pid:irqts=common_timestamp.usecs if function == 0xffffffff81200580' &gt; events/timer/hrtimer_start/trigger
echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$irqts:onmatch(timer.hrtimer_start).irq_lat($lat,pid) if common_flags &amp; 1' &gt; events/sched/sched_waking/trigger
echo 'hist:keys=pid:wakets=common_timestamp.usecs,irqlat=lat' &gt; events/synthetic/irq_lat/trigger
echo 'hist:keys=next_pid:lat=common_timestamp.usecs-$wakets,irqlat=$irqlat:onmatch(synthetic.irq_lat).wake_lat($lat,$irqlat,next_pid)' &gt; events/sched/sched_switch/trigger
echo 1 &gt; events/synthetic/wake_lat/enable

Basically I wanted to see:

 hrtimer_start (calling function tick_sched_timer)

Note:

  # grep tick_sched_timer /proc/kallsyms
ffffffff81200580 t tick_sched_timer

And save the time of that. Then, if sched_waking is called in
interrupt context with the same pid as the hrtimer_start, record the
latency between that and the waking event.

I then look at when the task that is woken is scheduled in, and record
the latency between the wakeup and the task running.

At the end, the wake_lat synthetic event will show the wakeup to
scheduled latency, as well as the irq latency from hrtimer_start to
the wakeup. The problem is that I found this:

          &lt;idle&gt;-0     [007] d...   190.485261: wake_lat: lat=27 irqlat=190485230 pid=698
          &lt;idle&gt;-0     [005] d...   190.485283: wake_lat: lat=40 irqlat=190485239 pid=10
          &lt;idle&gt;-0     [002] d...   190.488327: wake_lat: lat=56 irqlat=190488266 pid=335
          &lt;idle&gt;-0     [005] d...   190.489330: wake_lat: lat=64 irqlat=190489262 pid=10
          &lt;idle&gt;-0     [003] d...   190.490312: wake_lat: lat=43 irqlat=190490265 pid=77
          &lt;idle&gt;-0     [005] d...   190.493322: wake_lat: lat=54 irqlat=190493262 pid=10
          &lt;idle&gt;-0     [005] d...   190.497305: wake_lat: lat=35 irqlat=190497267 pid=10
          &lt;idle&gt;-0     [005] d...   190.501319: wake_lat: lat=50 irqlat=190501264 pid=10

The irqlat seemed quite large! Investigating this further, if I had
enabled the irq_lat synthetic event, I noticed this:

          &lt;idle&gt;-0     [002] d.s.   249.429308: irq_lat: lat=164968 pid=335
          &lt;idle&gt;-0     [002] d...   249.429369: wake_lat: lat=55 irqlat=249429308 pid=335

Notice that the timestamp of the irq_lat "249.429308" is awfully
similar to the reported irqlat variable. In fact, all instances were
like this. It appeared that:

  irqlat=$irqlat

wasn't assigning the old $irqlat to the new irqlat variable, but was
instead assigning $irqts to it.

The issue is that assigning the old $irqlat to the new irqlat variable
creates a variable reference alias, but the alias creation code
forgets to make sure the alias uses the same var_ref_idx to access the
reference.

Link: http://lkml.kernel.org/r/1567375321.5282.12.camel@kernel.org

Cc: Linux Trace Devel &lt;linux-trace-devel@vger.kernel.org&gt;
Cc: linux-rt-users &lt;linux-rt-users@vger.kernel.org&gt;
Cc: stable@vger.kernel.org
Fixes: 7e8b88a30b085 ("tracing: Add hist trigger support for variable reference aliases")
Reported-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Tom Zanussi &lt;zanussi@kernel.org&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>tracing: Correct kdoc formats</title>
<updated>2019-08-31T10:51:56+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>jakub.kicinski@netronome.com</email>
</author>
<published>2019-08-28T05:25:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c68c9ec1c52e5bcd221eb09bc5344ad4f407b204'/>
<id>c68c9ec1c52e5bcd221eb09bc5344ad4f407b204</id>
<content type='text'>
Fix the following kdoc warnings:

kernel/trace/trace.c:1579: warning: Function parameter or member 'tr' not described in 'update_max_tr_single'
kernel/trace/trace.c:1579: warning: Function parameter or member 'tsk' not described in 'update_max_tr_single'
kernel/trace/trace.c:1579: warning: Function parameter or member 'cpu' not described in 'update_max_tr_single'
kernel/trace/trace.c:1776: warning: Function parameter or member 'type' not described in 'register_tracer'
kernel/trace/trace.c:2239: warning: Function parameter or member 'task' not described in 'tracing_record_taskinfo'
kernel/trace/trace.c:2239: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo'
kernel/trace/trace.c:2269: warning: Function parameter or member 'prev' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:2269: warning: Function parameter or member 'next' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:2269: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:3078: warning: Function parameter or member 'ip' not described in 'trace_vbprintk'
kernel/trace/trace.c:3078: warning: Function parameter or member 'fmt' not described in 'trace_vbprintk'
kernel/trace/trace.c:3078: warning: Function parameter or member 'args' not described in 'trace_vbprintk'

Link: http://lkml.kernel.org/r/20190828052549.2472-2-jakub.kicinski@netronome.com

Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
</feed>
