summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2017-01-09fgraph: Handle a case where a tracer ignores set_graph_notraceSteven Rostedt (Red Hat)
commit 794de08a16cf1fc1bf785dc48f66d36218cf6d88 upstream. Both the wakeup and irqsoff tracers can use the function graph tracer when the display-graph option is set. The problem is that they ignore the notrace file, and record the entry of functions that would be ignored by the function_graph tracer. This causes the trace->depth to be recorded into the ring buffer. The set_graph_notrace uses a trick by adding a large negative number to the trace->depth when a graph function is to be ignored. On trace output, the graph function uses the depth to record a stack of functions. But since the depth is negative, it accesses the array with a negative number and causes an out of bounds access that can cause a kernel oops or corrupt data. Have the print functions handle cases where a tracer still records functions even when they are in set_graph_notrace. Also add warnings if the depth is below zero before accessing the array. Note, the function graph logic will still prevent the return of these functions from being recorded, which means that they will be left hanging without a return. For example: # echo '*spin*' > set_graph_notrace # echo 1 > options/display-graph # echo wakeup > current_tracer # cat trace [...] _raw_spin_lock() { preempt_count_add() { do_raw_spin_lock() { update_rq_clock(); Where it should look like: _raw_spin_lock() { preempt_count_add(); do_raw_spin_lock(); } update_rq_clock(); Cc: Namhyung Kim <namhyung.kim@lge.com> Fixes: 29ad23b00474 ("ftrace: Add set_graph_notrace filter") Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30tracing: Move mutex to protect against resetting of seq dataSteven Rostedt (Red Hat)
commit 1245800c0f96eb6ebb368593e251d66c01e61022 upstream. The iter->seq can be reset outside the protection of the mutex. So can reading of user data. Move the mutex up to the beginning of the function. Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants") Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30fix memory leaks in tracing_buffers_splice_read()Al Viro
commit 1ae2293dd6d2f5c823cf97e60b70d03631cd622f upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30Makefile: Mute warning for __builtin_return_address(>0) for tracing onlySteven Rostedt
commit 377ccbb483738f84400ddf5840c7dd8825716985 upstream. With the latest gcc compilers, they give a warning if __builtin_return_address() parameter is greater than 0. That is because if it is used by a function called by a top level function (or in the case of the kernel, by assembly), it can try to access stack frames outside the stack and crash the system. The tracing system uses __builtin_return_address() of up to 2! But it is well aware of the dangers that it may have, and has even added precautions to protect against it (see the thunk code in arch/x86/entry/thunk*.S) Linus originally added KBUILD_CFLAGS that would suppress the warning for the entire kernel, as simply adding KBUILD_CFLAGS to the tracing directory wouldn't work. The tracing directory plays a bit with the CFLAGS and requires a little more logic. This adds that special logic to only suppress the warning for the tracing directory. If it is used anywhere else outside of tracing, the warning will still be triggered. Link: http://lkml.kernel.org/r/20160728223043.51996267@grimm.local.home Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27tracing: Handle NULL formats in hold_module_trace_bprintk_format()Steven Rostedt (Red Hat)
commit 70c8217acd4383e069fe1898bbad36ea4fcdbdcc upstream. If a task uses a non constant string for the format parameter in trace_printk(), then the trace_printk_fmt variable is set to NULL. This variable is then saved in the __trace_printk_fmt section. The function hold_module_trace_bprintk_format() checks to see if duplicate formats are used by modules, and reuses them if so (saves them to the list if it is new). But this function calls lookup_format() that does a strcmp() to the value (which is now NULL) and can cause a kernel oops. This wasn't an issue till 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()") which added "__used" to the trace_printk_fmt variable, and before that, the kernel simply optimized it out (no NULL value was saved). The fix is simply to handle the NULL pointer in lookup_format() and have the caller ignore the value if it was NULL. Link: http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.xing@intel.com Reported-by: xingzhen <zhengjun.xing@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Fixes: 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()") Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01ring-buffer: Prevent overflow of size in ring_buffer_resize()Steven Rostedt (Red Hat)
commit 59643d1535eb220668692a5359de22545af579f6 upstream. If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE then the DIV_ROUND_UP() will return zero. Here's the details: # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb tracing_entries_write() processes this and converts kb to bytes. 18014398509481980 << 10 = 18446744073709547520 and this is passed to ring_buffer_resize() as unsigned long size. size = DIV_ROUND_UP(size, BUF_PAGE_SIZE); Where DIV_ROUND_UP(a, b) is (a + b - 1)/b BUF_PAGE_SIZE is 4080 and here 18446744073709547520 + 4080 - 1 = 18446744073709551599 where 18446744073709551599 is still smaller than 2^64 2^64 - 18446744073709551599 = 17 But now 18446744073709551599 / 4080 = 4521260802379792 and size = size * 4080 = 18446744073709551360 This is checked to make sure its still greater than 2 * 4080, which it is. Then we convert to the number of buffer pages needed. nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE) but this time size is 18446744073709551360 and 2^64 - (18446744073709551360 + 4080 - 1) = -3823 Thus it overflows and the resulting number is less than 4080, which makes 3823 / 4080 = 0 an nr_pages is set to this. As we already checked against the minimum that nr_pages may be, this causes the logic to fail as well, and we crash the kernel. There's no reason to have the two DIV_ROUND_UP() (that's just result of historical code changes), clean up the code and fix this bug. Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic") Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01ring-buffer: Use long for nr_pages to avoid overflow failuresSteven Rostedt (Red Hat)
commit 9b94a8fba501f38368aef6ac1b30e7335252a220 upstream. The size variable to change the ring buffer in ftrace is a long. The nr_pages used to update the ring buffer based on the size is int. On 64 bit machines this can cause an overflow problem. For example, the following will cause the ring buffer to crash: # cd /sys/kernel/debug/tracing # echo 10 > buffer_size_kb # echo 8556384240 > buffer_size_kb Then you get the warning of: WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260 Which is: RB_WARN_ON(cpu_buffer, nr_removed); Note each ring buffer page holds 4080 bytes. This is because: 1) 10 causes the ring buffer to have 3 pages. (10kb requires 3 * 4080 pages to hold) 2) (2^31 / 2^10 + 1) * 4080 = 8556384240 The value written into buffer_size_kb is shifted by 10 and then passed to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760 3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE which is 4080. 8761737461760 / 4080 = 2147484672 4) nr_pages is subtracted from the current nr_pages (3) and we get: 2147484669. This value is saved in a signed integer nr_pages_to_update 5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int turns into the value of -2147482627 6) As the value is a negative number, in update_pages_handler() it is negated and passed to rb_remove_pages() and 2147482627 pages will be removed, which is much larger than 3 and it causes the warning because not all the pages asked to be removed were removed. Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001 Fixes: 7a8e76a3829f1 ("tracing: unified trace buffer") Reported-by: Hao Qin <QEver.cn@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11tracing: Don't display trigger file for events that can't be enabledChunyu Hu
commit 854145e0a8e9a05f7366d240e2f99d9c1ca6d6dd upstream. Currently register functions for events will be called through the 'reg' field of event class directly without any check when seting up triggers. Triggers for events that don't support register through debug fs (events under events/ftrace are for trace-cmd to read event format, and most of them don't have a register function except events/ftrace/functionx) can't be enabled at all, and an oops will be hit when setting up trigger for those events, so just not creating them is an easy way to avoid the oops. Link: http://lkml.kernel.org/r/1462275274-3911-1-git-send-email-chuhu@redhat.com Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12tracing: Fix trace_printk() to print when not using bprintk()Steven Rostedt (Red Hat)
commit 3debb0a9ddb16526de8b456491b7db60114f7b5e upstream. The trace_printk() code will allocate extra buffers if the compile detects that a trace_printk() is used. To do this, the format of the trace_printk() is saved to the __trace_printk_fmt section, and if that section is bigger than zero, the buffers are allocated (along with a message that this has happened). If trace_printk() uses a format that is not a constant, and thus something not guaranteed to be around when the print happens, the compiler optimizes the fmt out, as it is not used, and the __trace_printk_fmt section is not filled. This means the kernel will not allocate the special buffers needed for the trace_printk() and the trace_printk() will not write anything to the tracing buffer. Adding a "__used" to the variable in the __trace_printk_fmt section will keep it around, even though it is set to NULL. This will keep the string from being printed in the debugfs/tracing/printk_formats section as it is not needed. Reported-by: Vlastimil Babka <vbabka@suse.cz> Fixes: 07d777fe8c398 "tracing: Add percpu buffers for trace_printk()" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12tracing: Fix crash from reading trace_pipe with sendfileSteven Rostedt (Red Hat)
commit a29054d9478d0435ab01b7544da4f674ab13f533 upstream. If tracing contains data and the trace_pipe file is read with sendfile(), then it can trigger a NULL pointer dereference and various BUG_ON within the VM code. There's a patch to fix this in the splice_to_pipe() code, but it's also a good idea to not let that happen from trace_pipe either. Link: http://lkml.kernel.org/r/1457641146-9068-1-git-send-email-rabin@rab.in Reported-by: Rabin Vincent <rabin.vincent@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12tracing: Have preempt(irqs)off trace preempt disabled functionsSteven Rostedt (Red Hat)
commit cb86e05390debcc084cfdb0a71ed4c5dbbec517d upstream. Joel Fernandes reported that the function tracing of preempt disabled sections was not being reported when running either the preemptirqsoff or preemptoff tracers. This was due to the fact that the function tracer callback for those tracers checked if irqs were disabled before tracing. But this fails when we want to trace preempt off locations as well. Joel explained that he wanted to see funcitons where interrupts are enabled but preemption was disabled. The expected output he wanted: <...>-2265 1d.h1 3419us : preempt_count_sub <-irq_exit <...>-2265 1d..1 3419us : __do_softirq <-irq_exit <...>-2265 1d..1 3419us : msecs_to_jiffies <-__do_softirq <...>-2265 1d..1 3420us : irqtime_account_irq <-__do_softirq <...>-2265 1d..1 3420us : __local_bh_disable_ip <-__do_softirq <...>-2265 1..s1 3421us : run_timer_softirq <-__do_softirq <...>-2265 1..s1 3421us : hrtimer_run_pending <-run_timer_softirq <...>-2265 1..s1 3421us : _raw_spin_lock_irq <-run_timer_softirq <...>-2265 1d.s1 3422us : preempt_count_add <-_raw_spin_lock_irq <...>-2265 1d.s2 3422us : _raw_spin_unlock_irq <-run_timer_softirq <...>-2265 1..s2 3422us : preempt_count_sub <-_raw_spin_unlock_irq <...>-2265 1..s1 3423us : rcu_bh_qs <-__do_softirq <...>-2265 1d.s1 3423us : irqtime_account_irq <-__do_softirq <...>-2265 1d.s1 3423us : __local_bh_enable <-__do_softirq There's a comment saying that the irq disabled check is because there's a possible race that tracing_cpu may be set when the function is executed. But I don't remember that race. For now, I added a check for preemption being enabled too to not record the function, as there would be no race if that was the case. I need to re-investigate this, as I'm now thinking that the tracing_cpu will always be correct. But no harm in keeping the check for now, except for the slight performance hit. Link: http://lkml.kernel.org/r/1457770386-88717-1-git-send-email-agnel.joel@gmail.com Fixes: 5e6d2b9cfa3a "tracing: Use one prologue for the preempt irqs off tracer function tracers" Reported-by: Joel Fernandes <agnel.joel@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-09tracing: Do not have 'comm' filter override event 'comm' fieldSteven Rostedt (Red Hat)
commit e57cbaf0eb006eaa207395f3bfd7ce52c1b5539c upstream. Commit 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and process names" added a 'comm' filter that will filter events based on the current tasks struct 'comm'. But this now hides the ability to filter events that have a 'comm' field too. For example, sched_migrate_task trace event. That has a 'comm' field of the task to be migrated. echo 'comm == "bash"' > events/sched_migrate_task/filter will now filter all sched_migrate_task events for tasks named "bash" that migrates other tasks (in interrupt context), instead of seeing when "bash" itself gets migrated. This fix requires a couple of changes. 1) Change the look up order for filter predicates to look at the events fields before looking at the generic filters. 2) Instead of basing the filter function off of the "comm" name, have the generic "comm" filter have its own filter_type (FILTER_COMM). Test against the type instead of the name to assign the filter function. 3) Add a new "COMM" filter that works just like "comm" but will filter based on the current task, even if the trace event contains a "comm" field. Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU". Fixes: 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and process names" Reported-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03tracing: Fix showing function event in available_eventsSteven Rostedt (Red Hat)
commit d045437a169f899dfb0f6f7ede24cc042543ced9 upstream. The ftrace:function event is only displayed for parsing the function tracer data. It is not used to enable function tracing, and does not include an "enable" file in its event directory. Originally, this event was kept separate from other events because it did not have a ->reg parameter. But perf added a "reg" parameter for its use which caused issues, because it made the event available to functions where it was not compatible for. Commit 9b63776fa3ca9 "tracing: Do not enable function event with enable" added a TRACE_EVENT_FL_IGNORE_ENABLE flag that prevented the function event from being enabled by normal trace events. But this commit missed keeping the function event from being displayed by the "available_events" directory, which is used to show what events can be enabled by set_event. One documented way to enable all events is to: cat available_events > set_event But because the function event is displayed in the available_events, this now causes an INVALID error: cat: write error: Invalid argument Reported-by: Chunyu Hu <chuhu@redhat.com> Fixes: 9b63776fa3ca9 "tracing: Do not enable function event with enable" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-17tracing/stacktrace: Show entire trace if passed in function not foundSteven Rostedt
commit 6ccd83714a009ee301b50c15f6c3a5dc1f30164c upstream. When a max stack trace is discovered, the stack dump is saved. In order to not record the overhead of the stack tracer, the ip of the traced function is looked for within the dump. The trace is started from the location of that function. But if for some reason the ip is not found, the entire stack trace is then truncated. That's not very useful. Instead, print everything if the ip of the traced function is not found within the trace. This issue showed up on s390. Link: http://lkml.kernel.org/r/20160129102241.1b3c9c04@gandalf.local.home Fixes: 72ac426a5bb0 ("tracing: Clean up stack tracing and fix fentry updates") Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-17tracing: Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()Steven Rostedt (Red Hat)
commit 7717c6be699975f6733d278b13b7c4295d73caf6 upstream. While cleaning the stacktrace code I unintentially changed the skip depth of trace_buffer_unlock_commit_regs() from 0 to 6. kprobes uses this function, and with skipping 6 call backs, it can easily produce no stack. Here's how I tested it: # echo 'p:ext4_sync_fs ext4_sync_fs ' > /sys/kernel/debug/tracing/kprobe_events # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable # cat /sys/kernel/debug/trace sync-2394 [005] 502.457060: ext4_sync_fs: (ffffffff81317650) sync-2394 [005] 502.457063: kernel_stack: <stack trace> sync-2394 [005] 502.457086: ext4_sync_fs: (ffffffff81317650) sync-2394 [005] 502.457087: kernel_stack: <stack trace> sync-2394 [005] 502.457091: ext4_sync_fs: (ffffffff81317650) After putting back the skip stack to zero, we have: sync-2270 [000] 748.052693: ext4_sync_fs: (ffffffff81317650) sync-2270 [000] 748.052695: kernel_stack: <stack trace> => iterate_supers (ffffffff8126412e) => sys_sync (ffffffff8129c4b6) => entry_SYSCALL_64_fastpath (ffffffff8181f0b2) sync-2270 [000] 748.053017: ext4_sync_fs: (ffffffff81317650) sync-2270 [000] 748.053019: kernel_stack: <stack trace> => iterate_supers (ffffffff8126412e) => sys_sync (ffffffff8129c4b6) => entry_SYSCALL_64_fastpath (ffffffff8181f0b2) sync-2270 [000] 748.053381: ext4_sync_fs: (ffffffff81317650) sync-2270 [000] 748.053383: kernel_stack: <stack trace> => iterate_supers (ffffffff8126412e) => sys_sync (ffffffff8129c4b6) => entry_SYSCALL_64_fastpath (ffffffff8181f0b2) Fixes: 73dddbb57bb0 "tracing: Only create stacktrace option when STACKTRACE is configured" Reported-by: Brendan Gregg <brendan.d.gregg@gmail.com> Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-04tracing: Fix setting of start_index in find_next()Qiu Peiyang
When we do cat /sys/kernel/debug/tracing/printk_formats, we hit kernel panic at t_show. general protection fault: 0000 [#1] PREEMPT SMP CPU: 0 PID: 2957 Comm: sh Tainted: G W O 3.14.55-x86_64-01062-gd4acdc7 #2 RIP: 0010:[<ffffffff811375b2>] [<ffffffff811375b2>] t_show+0x22/0xe0 RSP: 0000:ffff88002b4ebe80 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004 RDX: 0000000000000004 RSI: ffffffff81fd26a6 RDI: ffff880032f9f7b1 RBP: ffff88002b4ebe98 R08: 0000000000001000 R09: 000000000000ffec R10: 0000000000000000 R11: 000000000000000f R12: ffff880004d9b6c0 R13: 7365725f6d706400 R14: ffff880004d9b6c0 R15: ffffffff82020570 FS: 0000000000000000(0000) GS:ffff88003aa00000(0063) knlGS:00000000f776bc40 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 00000000f6c02ff0 CR3: 000000002c2b3000 CR4: 00000000001007f0 Call Trace: [<ffffffff811dc076>] seq_read+0x2f6/0x3e0 [<ffffffff811b749b>] vfs_read+0x9b/0x160 [<ffffffff811b7f69>] SyS_read+0x49/0xb0 [<ffffffff81a3a4b9>] ia32_do_call+0x13/0x13 ---[ end trace 5bd9eb630614861e ]--- Kernel panic - not syncing: Fatal exception When the first time find_next calls find_next_mod_format, it should iterate the trace_bprintk_fmt_list to find the first print format of the module. However in current code, start_index is smaller than *pos at first, and code will not iterate the list. Latter container_of will get the wrong address with former v, which will cause mod_fmt be a meaningless object and so is the returned mod_fmt->fmt. This patch will fix it by correcting the start_index. After fixed, when the first time calls find_next_mod_format, start_index will be equal to *pos, and code will iterate the trace_bprintk_fmt_list to get the right module printk format, so is the returned mod_fmt->fmt. Link: http://lkml.kernel.org/r/5684B900.9000309@intel.com Cc: stable@vger.kernel.org # 3.12+ Fixes: 102c9323c35a8 "tracing: Add __tracepoint_string() to export string pointers" Signed-off-by: Qiu Peiyang <peiyangx.qiu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-08Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This tree includes four core perf fixes for misc bugs, three fixes to x86 PMU drivers, and two updates to old email addresses" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Do not send exit event twice perf/x86/intel: Fix INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA macro perf/x86/intel: Make L1D_PEND_MISS.FB_FULL not constrained on Haswell perf: Fix PERF_EVENT_IOC_PERIOD deadlock treewide: Remove old email address perf/x86: Fix LBR call stack save/restore perf: Update email address in MAINTAINERS perf/core: Robustify the perf_cgroup_from_task() RCU checks perf/core: Fix RCU problem with cgroup context switching code
2015-12-01tracing: Add sched_wakeup_new and sched_waking tracepoints for pid filterSteven Rostedt (Red Hat)
The set_event_pid filter relies on attaching to the sched_switch and sched_wakeup tracepoints to see if it should filter the tracing on schedule tracepoints. By adding the callbacks to sched_wakeup, pids in the set_event_pid file will trace the wakeups of those tasks with those pids. But sched_wakeup_new and sched_waking were missed. These two should also be traced. Luckily, these tracepoints share the same class as sched_wakeup which means they can use the same pre and post callbacks as sched_wakeup does. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24ring-buffer: Put back the length if crossed page with add_timestampSteven Rostedt (Red Hat)
Commit fcc742eaad7c "ring-buffer: Add event descriptor to simplify passing data" added a descriptor that holds various data instead of passing around several variables through parameters. The problem was that one of the parameters was modified in a function and the code was designed not to have an effect on that modified parameter. Now that the parameter is a descriptor and any modifications to it are non-volatile, the size of the data could be unnecessarily expanded. Remove the extra space added if a timestamp was added and the event went across the page. Cc: stable@vger.kernel.org # 4.3+ Fixes: fcc742eaad7c "ring-buffer: Add event descriptor to simplify passing data" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24ring-buffer: Update read stamp with first real commit on pageSteven Rostedt (Red Hat)
Do not update the read stamp after swapping out the reader page from the write buffer. If the reader page is swapped out of the buffer before an event is written to it, then the read_stamp may get an out of date timestamp, as the page timestamp is updated on the first commit to that page. rb_get_reader_page() only returns a page if it has an event on it, otherwise it will return NULL. At that point, check if the page being returned has events and has not been read yet. Then at that point update the read_stamp to match the time stamp of the reader page. Cc: stable@vger.kernel.org # 2.6.30+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-23treewide: Remove old email addressPeter Zijlstra
There were still a number of references to my old Red Hat email address in the kernel source. Remove these while keeping the Red Hat copyright notices intact. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12Merge tag 'trace-v4.4-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull trace cleanups from Steven Rostedt: "This contains three more clean up patches. One patch is needed to make tracing work without debugfs now that tracing uses its own tracefs. The second is removing an unused variable. The third is fixing a warning about unused variables when MAX_TRACER is not configured. Note, this warning shows up in gcc 6.0, but does not show up in gcc 4.9, as it seems that gcc does not complain about constants not being used" * tag 'trace-v4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: #ifdef out uses of max trace when CONFIG_TRACER_MAX_TRACE is not set tracing: Remove unused ftrace_cpu_disabled per cpu variable tracing: Make tracing work when debugfs is not configured in
2015-11-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix null deref in xt_TEE netfilter module, from Eric Dumazet. 2) Several spots need to get to the original listner for SYN-ACK packets, most spots got this ok but some were not. Whilst covering the remaining cases, create a helper to do this. From Eric Dumazet. 3) Missiing check of return value from alloc_netdev() in CAIF SPI code, from Rasmus Villemoes. 4) Don't sleep while != TASK_RUNNING in macvtap, from Vlad Yasevich. 5) Use after free in mvneta driver, from Justin Maggard. 6) Fix race on dst->flags access in dst_release(), from Eric Dumazet. 7) Add missing ZLIB_INFLATE dependency for new qed driver. From Arnd Bergmann. 8) Fix multicast getsockopt deadlock, from WANG Cong. 9) Fix deadlock in btusb, from Kuba Pawlak. 10) Some ipv6_add_dev() failure paths were not cleaning up the SNMP6 counter state. From Sabrina Dubroca. 11) Fix packet_bind() race, which can cause lost notifications, from Francesco Ruggeri. 12) Fix MAC restoration in qlcnic driver during bonding mode changes, from Jarod Wilson. 13) Revert bridging forward delay change which broke libvirt and other userspace things, from Vlad Yasevich. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits) Revert "bridge: Allow forward delay to be cfgd when STP enabled" bpf_trace: Make dependent on PERF_EVENTS qed: select ZLIB_INFLATE net: fix a race in dst_release() net: mvneta: Fix memory use after free. net: Documentation: Fix default value tcp_limit_output_bytes macvtap: Resolve possible __might_sleep warning in macvtap_do_read() mvneta: add FIXED_PHY dependency net: caif: check return value of alloc_netdev net: hisilicon: NET_VENDOR_HISILICON should depend on HAS_DMA drivers: net: xgene: fix RGMII 10/100Mb mode netfilter: nft_meta: use skb_to_full_sk() helper net_sched: em_meta: use skb_to_full_sk() helper sched: cls_flow: use skb_to_full_sk() helper netfilter: xt_owner: use skb_to_full_sk() helper smack: use skb_to_full_sk() helper net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid() bpf: doc: correct arch list for supported eBPF JIT dwc_eth_qos: Delete an unnecessary check before the function call "of_node_put" bonding: fix panic on non-ARPHRD_ETHER enslave failure ...
2015-11-10bpf_trace: Make dependent on PERF_EVENTSSteven Rostedt
Arnd Bergmann reported: In my ARM randconfig tests, I'm getting a build error for newly added code in bpf_perf_event_read and bpf_perf_event_output whenever CONFIG_PERF_EVENTS is disabled: kernel/trace/bpf_trace.c: In function 'bpf_perf_event_read': kernel/trace/bpf_trace.c:203:11: error: 'struct perf_event' has no member named 'oncpu' if (event->oncpu != smp_processor_id() || ^ kernel/trace/bpf_trace.c:204:11: error: 'struct perf_event' has no member named 'pmu' event->pmu->count) This can happen when UPROBE_EVENT is enabled but KPROBE_EVENT is disabled. I'm not sure if that is a configuration we care about, otherwise we could prevent this case from occuring by adding Kconfig dependencies. Looking at this further, it's really that UPROBE_EVENT enables PERF_EVENTS. By just having BPF_EVENTS depend on PERF_EVENTS, then all is fine. Link: http://lkml.kernel.org/r/4525348.Aq9YoXkChv@wuerfel Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-10tracing: #ifdef out uses of max trace when CONFIG_TRACER_MAX_TRACE is not setChen Gang
tracing_max_lat_fops is used only when TRACER_MAX_TRACE enabled, so also swith the related code. The related warning with defconfig under x86_64: CC kernel/trace/trace.o kernel/trace/trace.c:5466:37: warning: ‘tracing_max_lat_fops’ defined but not used [-Wunused-const-variable] static const struct file_operations tracing_max_lat_fops = { Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-07tracing: Remove unused ftrace_cpu_disabled per cpu variableDmitry Safonov
Since the ring buffer is lockless, there is no need to disable ftrace on CPU. And no one doing so: after commit 68179686ac67cb ("tracing: Remove ftrace_disable/enable_cpu()") ftrace_cpu_disabled stays the same after initialization, nothing changes it. ftrace_cpu_disabled shouldn't be used by any external module since it disables only function and graph_function tracers but not any other tracer. Link: http://lkml.kernel.org/r/1446836846-22239-1-git-send-email-0x7f454c46@gmail.com Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-06Merge tag 'trace-v4.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracking updates from Steven Rostedt: "Most of the changes are clean ups and small fixes. Some of them have stable tags to them. I searched through my INBOX just as the merge window opened and found lots of patches to pull. I ran them through all my tests and they were in linux-next for a few days. Features added this release: ---------------------------- - Module globbing. You can now filter function tracing to several modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov) - Tracer specific options are now visible even when the tracer is not active. It was rather annoying that you can only see and modify tracer options after enabling the tracer. Now they are in the options/ directory even when the tracer is not active. Although they are still only visible when the tracer is active in the trace_options file. - Trace options are now per instance (although some of the tracer specific options are global) - New tracefs file: set_event_pid. If any pid is added to this file, then all events in the instance will filter out events that are not part of this pid. sched_switch and sched_wakeup events handle next and the wakee pids" * tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits) tracefs: Fix refcount imbalance in start_creating() tracing: Put back comma for empty fields in boot string parsing tracing: Apply tracer specific options from kernel command line. tracing: Add some documentation about set_event_pid ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark tracing: Allow dumping traces without tracking trace started cpus ring_buffer: Fix more races when terminating the producer in the benchmark ring_buffer: Do no not complete benchmark reader too early tracing: Remove redundant TP_ARGS redefining tracing: Rename max_stack_lock to stack_trace_max_lock tracing: Allow arch-specific stack tracer recordmcount: arm64: Replace the ignored mcount call into nop recordmcount: Fix endianness handling bug for nop_mcount tracepoints: Fix documentation of RCU lockdep checks tracing: ftrace_event_is_function() can return boolean tracing: is_legal_op() can return boolean ring-buffer: rb_event_is_commit() can return boolean ring-buffer: rb_per_cpu_empty() can return boolean ring_buffer: ring_buffer_empty{cpu}() can return boolean ring-buffer: rb_is_reader_page() can return boolean ...
2015-11-06tracing: Make tracing work when debugfs is not configured inJiaxing Wang
Currently tracing_init_dentry() returns -ENODEV when debugfs is not configured in, which causes tracefs not populated with tracing files and directories, so we will get an empty directory even after we manually mount tracefs. We can make tracing_init_dentry() return NULL if debugfs is not configured in and can manually mount tracefs. But return -ENODEV if debugfs is configured in but not initialized or failed to create automount point as that would break backward compatibility with older tools. Link: http://lkml.kernel.org/r/1446797056-11683-1-git-send-email-hello.wjx@gmail.com Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-04Merge branch 'for-4.4/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: "This is the core block pull request for 4.4. I've got a few more topic branches this time around, some of them will layer on top of the core+drivers changes and will come in a separate round. So not a huge chunk of changes in this round. This pull request contains: - Enable blk-mq page allocation tracking with kmemleak, from Catalin. - Unused prototype removal in blk-mq from Christoph. - Cleanup of the q->blk_trace exchange, using cmpxchg instead of two xchg()'s, from Davidlohr. - A plug flush fix from Jeff. - Also from Jeff, a fix that means we don't have to update shared tag sets at init time unless we do a state change. This cuts down boot times on thousands of devices a lot with scsi/blk-mq. - blk-mq waitqueue barrier fix from Kosuke. - Various fixes from Ming: - Fixes for segment merging and splitting, and checks, for the old core and blk-mq. - Potential blk-mq speedup by marking ctx pending at the end of a plug insertion batch in blk-mq. - direct-io no page dirty on kernel direct reads. - A WRITE_SYNC fix for mpage from Roman" * 'for-4.4/core' of git://git.kernel.dk/linux-block: blk-mq: avoid excessive boot delays with large lun counts blktrace: re-write setting q->blk_trace blk-mq: mark ctx as pending at batch in flush plug path blk-mq: fix for trace_block_plug() block: check bio_mergeable() early before merging blk-mq: check bio_mergeable() early before merging block: avoid to merge splitted bio block: setup bi_phys_segments after splitting block: fix plug list flushing for nomerge queues blk-mq: remove unused blk_mq_clone_flush_request prototype blk-mq: fix waitqueue_active without memory barrier in block/blk-mq-tag.c fs: direct-io: don't dirtying pages for ITER_BVEC/ITER_KVEC direct read fs/mpage.c: forgotten WRITE_SYNC in case of data integrity write block: kmemleak: Track the page allocations for struct request
2015-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: Changes of note: 1) Allow to schedule ICMP packets in IPVS, from Alex Gartrell. 2) Provide FIB table ID in ipv4 route dumps just as ipv6 does, from David Ahern. 3) Allow the user to ask for the statistics to be filtered out of ipv4/ipv6 address netlink dumps. From Sowmini Varadhan. 4) More work to pass the network namespace context around deep into various packet path APIs, starting with the netfilter hooks. From Eric W Biederman. 5) Add layer 2 TX/RX checksum offloading to qeth driver, from Thomas Richter. 6) Use usec resolution for SYN/ACK RTTs in TCP, from Yuchung Cheng. 7) Support Very High Throughput in wireless MESH code, from Bob Copeland. 8) Allow setting the ageing_time in switchdev/rocker. From Scott Feldman. 9) Properly autoload L2TP type modules, from Stephen Hemminger. 10) Fix and enable offload features by default in 8139cp driver, from David Woodhouse. 11) Support both ipv4 and ipv6 sockets in a single vxlan device, from Jiri Benc. 12) Fix CWND limiting of thin streams in TCP, from Bendik Rønning Opstad. 13) Fix IPSEC flowcache overflows on large systems, from Steffen Klassert. 14) Convert bridging to track VLANs using rhashtable entries rather than a bitmap. From Nikolay Aleksandrov. 15) Make TCP listener handling completely lockless, this is a major accomplishment. Incoming request sockets now live in the established hash table just like any other socket too. From Eric Dumazet. 15) Provide more bridging attributes to netlink, from Nikolay Aleksandrov. 16) Use hash based algorithm for ipv4 multipath routing, this was very long overdue. From Peter Nørlund. 17) Several y2038 cures, mostly avoiding timespec. From Arnd Bergmann. 18) Allow non-root execution of EBPF programs, from Alexei Starovoitov. 19) Support SO_INCOMING_CPU as setsockopt, from Eric Dumazet. This influences the port binding selection logic used by SO_REUSEPORT. 20) Add ipv6 support to VRF, from David Ahern. 21) Add support for Mellanox Spectrum switch ASIC, from Jiri Pirko. 22) Add rtl8xxxu Realtek wireless driver, from Jes Sorensen. 23) Implement RACK loss recovery in TCP, from Yuchung Cheng. 24) Support multipath routes in MPLS, from Roopa Prabhu. 25) Fix POLLOUT notification for listening sockets in AF_UNIX, from Eric Dumazet. 26) Add new QED Qlogic river, from Yuval Mintz, Manish Chopra, and Sudarsana Kalluru. 27) Don't fetch timestamps on AF_UNIX sockets, from Hannes Frederic Sowa. 28) Support ipv6 geneve tunnels, from John W Linville. 29) Add flood control support to switchdev layer, from Ido Schimmel. 30) Fix CHECKSUM_PARTIAL handling of potentially fragmented frames, from Hannes Frederic Sowa. 31) Support persistent maps and progs in bpf, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1790 commits) sh_eth: use DMA barriers switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion net: sched: kill dead code in sch_choke.c irda: Delete an unnecessary check before the function call "irlmp_unregister_service" net: dsa: mv88e6xxx: include DSA ports in VLANs net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports net/core: fix for_each_netdev_feature vlan: Invoke driver vlan hooks only if device is present arcnet/com20020: add LEDS_CLASS dependency bpf, verifier: annotate verbose printer with __printf dp83640: Only wait for timestamps for packets with timestamping enabled. ptp: Change ptp_class to a proper bitmask dp83640: Prune rx timestamp list before reading from it dp83640: Delay scheduled work. dp83640: Include hash in timestamp/packet matching ipv6: fix tunnel error handling net/mlx5e: Fix LSO vlan insertion net/mlx5e: Re-eanble client vlan TX acceleration net/mlx5e: Return error in case mlx5e_set_features() fails net/mlx5e: Don't allow more than max supported channels ...
2015-11-03tracing: Put back comma for empty fields in boot string parsingSteven Rostedt (Red Hat)
Both early_enable_events() and apply_trace_boot_options() parse a boot string that may get parsed later on. They both use strsep() which converts a comma into a nul character. To still allow the boot string to be parsed again the same way, the nul character gets converted back to a comma after the token is processed. The problem is that these two functions check for an empty parameter (two commas in a row ",,"), and continue the loop if the parameter is empty, but fails to place the comma back. In this case, the second parsing will end at this blank field, and not process fields afterward. In most cases, users should not have an empty field, but if its going to be checked, the code might as well be correct. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03tracing: Apply tracer specific options from kernel command line.Jiaxing Wang
Currently, the trace_options parameter is only applied in tracer_alloc_buffers() when global_trace.current_trace is nop_trace, so a tracer specific option will not be applied even when the specific tracer is also enabled from kernel command line. For example, the 'func_stack_trace' option can't be enabled with the following kernel parameter: ftrace=function ftrace_filter=kfree trace_options=func_stack_trace We can enable tracer specific options by simply apply the options again if the specific tracer is also supplied from command line and started in register_tracer(). To make trace_boot_options_buf can be parsed again, a comma and a space is put back if they were replaced by strsep and strstrip respectively. Also make register_tracer() be __init to access the __init data, and in fact register_tracer is only called from __init code. Link: http://lkml.kernel.org/r/1446599669-9294-1-git-send-email-hello.wjx@gmail.com Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this cycle were: - sched/fair load tracking fixes and cleanups (Byungchul Park) - Make load tracking frequency scale invariant (Dietmar Eggemann) - sched/deadline updates (Juri Lelli) - stop machine fixes, cleanups and enhancements for bugs triggered by CPU hotplug stress testing (Oleg Nesterov) - scheduler preemption code rework: remove PREEMPT_ACTIVE and related cleanups (Peter Zijlstra) - Rework the sched_info::run_delay code to fix races (Peter Zijlstra) - Optimize per entity utilization tracking (Peter Zijlstra) - ... misc other fixes, cleanups and smaller updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) sched: Don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop() sched: Start stopper early stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark() stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works() stop_machine: Ensure that a queued callback will be called before cpu_stop_park() sched/x86: Fix typo in __switch_to() comments sched/core: Remove a parameter in the migrate_task_rq() function sched/core: Drop unlikely behind BUG_ON() sched/core: Fix task and run queue sched_info::run_delay inconsistencies sched/numa: Fix task_tick_fair() from disabling numa_balancing sched/core: Add preempt_count invariant check sched/core: More notrace annotations sched/core: Kill PREEMPT_ACTIVE sched/core, sched/x86: Kill thread_info::saved_preempt_count sched/core: Simplify preempt_count tests sched/core: Robustify preemption leak checks sched/core: Stop setting PREEMPT_ACTIVE ...
2015-11-03ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmarkSteven Rostedt (Red Hat)
wake_up_process() has a memory barrier before doing anything, thus adding a memory barrier before calling it is redundant. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03tracing: Allow dumping traces without tracking trace started cpusSasha Levin
We don't init iter->started when dumping the ftrace buffer, and there's no real need to do so - so allow skipping that check if the iter doesn't have an initialized ->started cpumask. Link: http://lkml.kernel.org/r/1441385156-27279-1-git-send-email-sasha.levin@oracle.com Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03ring_buffer: Fix more races when terminating the producer in the benchmarkPetr Mladek
The commit b44754d8262d3aab8 ("ring_buffer: Allow to exit the ring buffer benchmark immediately") added a hack into ring_buffer_producer() that set @kill_test when kthread_should_stop() returned true. It improved the situation a lot. It stopped the kthread in most cases because the producer spent most of the time in the patched while cycle. But there are still few possible races when kthread_should_stop() is set outside of the cycle. Then we do not set @kill_test and some other checks pass. This patch adds a better fix. It renames @test_kill/TEST_KILL() into a better descriptive @test_error/TEST_ERROR(). Also it introduces break_test() function that checks for both @test_error and kthread_should_stop(). The new function is used in the producer when the check for @test_error is not enough. It is not used in the consumer because its state is manipulated by the producer via the "reader_finish" variable. Also we add a missing check into ring_buffer_producer_thread() between setting TASK_INTERRUPTIBLE and calling schedule_timeout(). Otherwise, we might miss a wakeup from kthread_stop(). Link: http://lkml.kernel.org/r/1441629518-32712-3-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03ring_buffer: Do no not complete benchmark reader too earlyPetr Mladek
It seems that complete(&read_done) might be called too early in some situations. 1st scenario: ------------- CPU0 CPU1 ring_buffer_producer_thread() wake_up_process(consumer); wait_for_completion(&read_start); ring_buffer_consumer_thread() complete(&read_start); ring_buffer_producer() # producing data in # the do-while cycle ring_buffer_consumer(); # reading data # got error # set kill_test = 1; set_current_state( TASK_INTERRUPTIBLE); if (reader_finish) # false schedule(); # producer still in the middle of # do-while cycle if (consumer && !(cnt % wakeup_interval)) wake_up_process(consumer); # spurious wakeup while (!reader_finish && !kill_test) # leaving because # kill_test == 1 reader_finish = 0; complete(&read_done); 1st BANG: We might access uninitialized "read_done" if this is the the first round. # producer finally leaving # the do-while cycle because kill_test == 1; if (consumer) { reader_finish = 1; wake_up_process(consumer); wait_for_completion(&read_done); 2nd BANG: This will never complete because consumer already did the completion. 2nd scenario: ------------- CPU0 CPU1 ring_buffer_producer_thread() wake_up_process(consumer); wait_for_completion(&read_start); ring_buffer_consumer_thread() complete(&read_start); ring_buffer_producer() # CPU3 removes the module <--- difference from # and stops producer <--- the 1st scenario if (kthread_should_stop()) kill_test = 1; ring_buffer_consumer(); while (!reader_finish && !kill_test) # kill_test == 1 => we never go # into the top level while() reader_finish = 0; complete(&read_done); # producer still in the middle of # do-while cycle if (consumer && !(cnt % wakeup_interval)) wake_up_process(consumer); # spurious wakeup while (!reader_finish && !kill_test) # leaving because kill_test == 1 reader_finish = 0; complete(&read_done); BANG: We are in the same "bang" situations as in the 1st scenario. Root of the problem: -------------------- ring_buffer_consumer() must complete "read_done" only when "reader_finish" variable is set. It must not be skipped due to other conditions. Note that we still must keep the check for "reader_finish" in a loop because there might be spurious wakeups as described in the above scenarios. Solution: ---------- The top level cycle in ring_buffer_consumer() will finish only when "reader_finish" is set. The data will be read in "while-do" cycle so that they are not read after an error (kill_test == 1) or a spurious wake up. In addition, "reader_finish" is manipulated by the producer thread. Therefore we add READ_ONCE() to make sure that the fresh value is read in each cycle. Also we add the corresponding barrier to synchronize the sleep check. Next we set the state back to TASK_RUNNING for the situation where we did not sleep. Just from paranoid reasons, we initialize both completions statically. This is safer, in case there are other races that we are unaware of. As a side effect we could remove the memory barrier from ring_buffer_producer_thread(). IMHO, this was the reason for the barrier. ring_buffer_reset() uses spin locks that should provide the needed memory barrier for using the buffer. Link: http://lkml.kernel.org/r/1441629518-32712-2-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03tracing: Remove redundant TP_ARGS redefiningDmitry Safonov
TP_ARGS is not used anywhere in trace.h nor trace_entries.h Firstly, I left just #undef TP_ARGS and had no errors - remove it. Link: http://lkml.kernel.org/r/1446576560-14085-1-git-send-email-0x7f454c46@gmail.com Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03tracing: Rename max_stack_lock to stack_trace_max_lockSteven Rostedt (Red Hat)
Now that max_stack_lock is a global variable, it requires a naming convention that is unlikely to collide. Rename it to the same naming convention that the other stack_trace variables have. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03tracing: Allow arch-specific stack tracerAKASHI Takahiro
A stack frame may be used in a different way depending on cpu architecture. Thus it is not always appropriate to slurp the stack contents, as current check_stack() does, in order to calcurate a stack index (height) at a given function call. At least not on arm64. In addition, there is a possibility that we will mistakenly detect a stale stack frame which has not been overwritten. This patch makes check_stack() a weak function so as to later implement arch-specific version. Link: http://lkml.kernel.org/r/1446182741-31019-5-git-send-email-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02tracing: ftrace_event_is_function() can return booleanYaowei Bai
Make ftrace_event_is_function() return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-9-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02tracing: is_legal_op() can return booleanYaowei Bai
Make is_legal_op() return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-8-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02ring-buffer: rb_event_is_commit() can return booleanYaowei Bai
Make rb_event_is_commit() return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-7-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02ring-buffer: rb_per_cpu_empty() can return booleanYaowei Bai
Makes rb_per_cpu_empty() return bool to improve readability. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-6-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02ring_buffer: ring_buffer_empty{cpu}() can return booleanYaowei Bai
Make ring_buffer_empty() and ring_buffer_empty_cpu() return bool. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-5-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02ring-buffer: rb_is_reader_page() can return booleanYaowei Bai
Make rb_is_reader_page() return bool to improve readability due to this particular function only using either true or false as its return value. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-4-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02tracing: report_latency() in trace_irqsoff.c can return booleanYaowei Bai
This patch makes report_latency return bool due to this particular function only using either one or zero as its return value. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-3-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02tracing: report_latency() in trace_sched_wakeup.c can return booleanYaowei Bai
This patch makes report_latency return bool to improve readability, indicating whether this new latency should be reported/recorded. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-2-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02tracing: Update instance_rmdir() to use tracefs_remove_recursiveJiaxing Wang
Update instancd_rmdir to use tracefs_remove_recursive instead of debugfs_remove_recursive.This was left in the transition from debugfs to tracefs. Link: http://lkml.kernel.org/r/1445169490-18315-2-git-send-email-hello.wjx@gmail.com Cc: stable@vger.kernel.org # 4.1+ Fixes: 8434dc9340cd2 ("tracing: Convert the tracing facility over to use tracefs") Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02tracing: Only benchmark the time tracepoints take if tracing is onChunyan Zhang
There's no need to record the time tracepoints take when tracing is off. This is because: 1) We cannot see these records since ring_buffer record is off at that moment. 2) If tracing is off and benchmark tracepoint is enabled, the time tracepoint takes is fewer than the same situation when tracing is on, since the tracepoints need to be wrote into ring_buffer, it would take more time. If turn on tracing at this moment, the average and standard deviation cannot exactly present the time that tracepoints take to write data into ring_buffer. Link: http://lkml.kernel.org/r/1445947933-27955-1-git-send-email-zhang.chunyan@linaro.org Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>