From 3f005e7de3db8d0b3f7a1f399aa061dc35b65864 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Tue, 26 Jul 2016 18:12:21 +0100 Subject: perf/core: Sched out groups atomically Groups of events are supposed to be scheduled atomically, such that it is possible to derive meaningful ratios between their values. We take great pains to achieve this when scheduling event groups to a PMU in group_sched_in(), calling {start,commit}_txn() (which fall back to perf_pmu_{disable,enable}() if necessary) to provide this guarantee. However we don't mirror this in group_sched_out(), and in some cases events will not be scheduled out atomically. For example, if we disable an event group with PERF_EVENT_IOC_DISABLE, we'll cross-call __perf_event_disable() for the group leader, and will call group_sched_out() without having first disabled the relevant PMU. We will disable/enable the PMU around each pmu->del() call, but between each call the PMU will be enabled and events may count. Avoid this by explicitly disabling and enabling the PMU around event removal in group_sched_out(), mirroring what we do in group_sched_in(). Signed-off-by: Mark Rutland Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Link: http://lkml.kernel.org/r/1469553141-28314-1-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar --- kernel/events/core.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'kernel') diff --git a/kernel/events/core.c b/kernel/events/core.c index 1903b8f3a705..11f6bbe168ab 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1796,6 +1796,8 @@ group_sched_out(struct perf_event *group_event, struct perf_event *event; int state = group_event->state; + perf_pmu_disable(ctx->pmu); + event_sched_out(group_event, cpuctx, ctx); /* @@ -1804,6 +1806,8 @@ group_sched_out(struct perf_event *group_event, list_for_each_entry(event, &group_event->sibling_list, group_entry) event_sched_out(event, cpuctx, ctx); + perf_pmu_enable(ctx->pmu); + if (state == PERF_EVENT_STATE_ACTIVE && group_event->attr.exclusive) cpuctx->exclusive = 0; } -- cgit v1.2.3 From 09e61b4f78498bd9f213b0a536e80b79507ea89f Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 6 Jul 2016 18:02:43 +0200 Subject: perf/x86/intel: Rework the large PEBS setup code In order to allow optimizing perf_pmu_sched_task() we must ensure perf_sched_cb_{inc,dec}() are no longer called from NMI context; this means that pmu::{start,stop}() can no longer use them. Prepare for this by reworking the whole large PEBS setup code. The current code relied on the cpuc->pebs_enabled state, however since that reflects the current active state as per pmu::{start,stop}() we can no longer rely on this. Introduce two counters: cpuc->n_pebs and cpuc->n_large_pebs which count the total number of PEBS events and the number of PEBS events that have FREERUNNING set, resp.. With this we can tell if the current setup requires a single record interrupt threshold or can use a larger buffer. This also improves the code in that it re-enables the large threshold once the PEBS event that required single record gets removed. Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar --- kernel/events/core.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'kernel') diff --git a/kernel/events/core.c b/kernel/events/core.c index 11f6bbe168ab..57aff715039f 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2818,6 +2818,10 @@ void perf_sched_cb_inc(struct pmu *pmu) /* * This function provides the context switch callback to the lower code * layer. It is invoked ONLY when the context switch callback is enabled. + * + * This callback is relevant even to per-cpu events; for example multi event + * PEBS requires this to provide PID/TID information. This requires we flush + * all queued PEBS records before we context switch to a new task. */ static void perf_pmu_sched_task(struct task_struct *prev, struct task_struct *next, -- cgit v1.2.3 From e48c178814b4a33f84f62d01f5a601ebd57fbba8 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 6 Jul 2016 09:18:30 +0200 Subject: perf/core: Optimize perf_pmu_sched_task() For perf record -b, which requires the pmu::sched_task callback the current code is rather expensive: 7.68% sched-pipe [kernel.vmlinux] [k] perf_pmu_sched_task 5.95% sched-pipe [kernel.vmlinux] [k] __switch_to 5.20% sched-pipe [kernel.vmlinux] [k] __intel_pmu_disable_all 3.95% sched-pipe perf [.] worker_thread The problem is that it will iterate all registered PMUs, most of which will not have anything to do. Avoid this by keeping an explicit list of PMUs that have requested the callback. The perf_sched_cb_{inc,dec}() functions already takes the required pmu argument, and now that these functions are no longer called from NMI context we can use them to manage a list. With this patch applied the function doesn't show up in the top 4 anymore (it dropped to 18th place). 6.67% sched-pipe [kernel.vmlinux] [k] __switch_to 6.18% sched-pipe [kernel.vmlinux] [k] __intel_pmu_disable_all 3.92% sched-pipe [kernel.vmlinux] [k] switch_mm_irqs_off 3.71% sched-pipe perf [.] worker_thread Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar --- kernel/events/core.c | 43 ++++++++++++++++++++++++------------------- 1 file changed, 24 insertions(+), 19 deletions(-) (limited to 'kernel') diff --git a/kernel/events/core.c b/kernel/events/core.c index 57aff715039f..803481cb6cbd 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2805,13 +2805,26 @@ unlock: } } +static DEFINE_PER_CPU(struct list_head, sched_cb_list); + void perf_sched_cb_dec(struct pmu *pmu) { + struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context); + this_cpu_dec(perf_sched_cb_usages); + + if (!--cpuctx->sched_cb_usage) + list_del(&cpuctx->sched_cb_entry); } + void perf_sched_cb_inc(struct pmu *pmu) { + struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context); + + if (!cpuctx->sched_cb_usage++) + list_add(&cpuctx->sched_cb_entry, this_cpu_ptr(&sched_cb_list)); + this_cpu_inc(perf_sched_cb_usages); } @@ -2829,34 +2842,24 @@ static void perf_pmu_sched_task(struct task_struct *prev, { struct perf_cpu_context *cpuctx; struct pmu *pmu; - unsigned long flags; if (prev == next) return; - local_irq_save(flags); - - rcu_read_lock(); - - list_for_each_entry_rcu(pmu, &pmus, entry) { - if (pmu->sched_task) { - cpuctx = this_cpu_ptr(pmu->pmu_cpu_context); - - perf_ctx_lock(cpuctx, cpuctx->task_ctx); + list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list), sched_cb_entry) { + pmu = cpuctx->unique_pmu; /* software PMUs will not have sched_task */ - perf_pmu_disable(pmu); + if (WARN_ON_ONCE(!pmu->sched_task)) + continue; - pmu->sched_task(cpuctx->task_ctx, sched_in); + perf_ctx_lock(cpuctx, cpuctx->task_ctx); + perf_pmu_disable(pmu); - perf_pmu_enable(pmu); + pmu->sched_task(cpuctx->task_ctx, sched_in); - perf_ctx_unlock(cpuctx, cpuctx->task_ctx); - } + perf_pmu_enable(pmu); + perf_ctx_unlock(cpuctx, cpuctx->task_ctx); } - - rcu_read_unlock(); - - local_irq_restore(flags); } static void perf_event_switch(struct task_struct *task, @@ -10393,6 +10396,8 @@ static void __init perf_event_init_all_cpus(void) INIT_LIST_HEAD(&per_cpu(pmu_sb_events.list, cpu)); raw_spin_lock_init(&per_cpu(pmu_sb_events.lock, cpu)); + + INIT_LIST_HEAD(&per_cpu(sched_cb_list, cpu)); } } -- cgit v1.2.3 From bdfaa2eecd5f6ca0cb5cff2bc7a974a15a2fd21b Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Wed, 17 Aug 2016 17:37:04 +0200 Subject: uprobes: Rename the "struct page *" args of __replace_page() Purely cosmetic, no changes in the compiled code. Perhaps it is just me but I can hardly read __replace_page() because I can't distinguish "page" from "kpage" and because I need to look at the caller to to ensure that, say, kpage is really the new page and the code is correct. Rename them to old_page and new_page, this matches the caller. Signed-off-by: Oleg Nesterov Cc: Alexander Shishkin Cc: Alexei Starovoitov Cc: Arnaldo Carvalho de Melo Cc: Arnaldo Carvalho de Melo Cc: Brenden Blanco Cc: Jiri Olsa Cc: Johannes Weiner Cc: Linus Torvalds Cc: Michal Hocko Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vladimir Davydov Link: http://lkml.kernel.org/r/20160817153704.GC29724@redhat.com Signed-off-by: Ingo Molnar --- kernel/events/uprobes.c | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-) (limited to 'kernel') diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 8c50276b60d1..d4129bb05e5d 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -150,7 +150,7 @@ static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr) * Returns 0 on success, -EFAULT on failure. */ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, - struct page *page, struct page *kpage) + struct page *old_page, struct page *new_page) { struct mm_struct *mm = vma->vm_mm; spinlock_t *ptl; @@ -161,49 +161,49 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, const unsigned long mmun_end = addr + PAGE_SIZE; struct mem_cgroup *memcg; - err = mem_cgroup_try_charge(kpage, vma->vm_mm, GFP_KERNEL, &memcg, + err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL, &memcg, false); if (err) return err; /* For try_to_free_swap() and munlock_vma_page() below */ - lock_page(page); + lock_page(old_page); mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end); err = -EAGAIN; - ptep = page_check_address(page, mm, addr, &ptl, 0); + ptep = page_check_address(old_page, mm, addr, &ptl, 0); if (!ptep) { - mem_cgroup_cancel_charge(kpage, memcg, false); + mem_cgroup_cancel_charge(new_page, memcg, false); goto unlock; } - get_page(kpage); - page_add_new_anon_rmap(kpage, vma, addr, false); - mem_cgroup_commit_charge(kpage, memcg, false, false); - lru_cache_add_active_or_unevictable(kpage, vma); + get_page(new_page); + page_add_new_anon_rmap(new_page, vma, addr, false); + mem_cgroup_commit_charge(new_page, memcg, false, false); + lru_cache_add_active_or_unevictable(new_page, vma); - if (!PageAnon(page)) { - dec_mm_counter(mm, mm_counter_file(page)); + if (!PageAnon(old_page)) { + dec_mm_counter(mm, mm_counter_file(old_page)); inc_mm_counter(mm, MM_ANONPAGES); } flush_cache_page(vma, addr, pte_pfn(*ptep)); ptep_clear_flush_notify(vma, addr, ptep); - set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot)); + set_pte_at_notify(mm, addr, ptep, mk_pte(new_page, vma->vm_page_prot)); - page_remove_rmap(page, false); - if (!page_mapped(page)) - try_to_free_swap(page); + page_remove_rmap(old_page, false); + if (!page_mapped(old_page)) + try_to_free_swap(old_page); pte_unmap_unlock(ptep, ptl); if (vma->vm_flags & VM_LOCKED) - munlock_vma_page(page); - put_page(page); + munlock_vma_page(old_page); + put_page(old_page); err = 0; unlock: mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); - unlock_page(page); + unlock_page(old_page); return err; } -- cgit v1.2.3 From 29dd3288705f26cc27663e79061209dabce2d5b9 Mon Sep 17 00:00:00 2001 From: Madhavan Srinivasan Date: Wed, 17 Aug 2016 15:06:08 +0530 Subject: bitmap.h, perf/core: Fix the mask in perf_output_sample_regs() When decoding the perf_regs mask in perf_output_sample_regs(), we loop through the mask using find_first_bit and find_next_bit functions. While the exisiting code works fine in most of the case, the logic is broken for big-endian 32-bit kernels. When reading a u64 mask using (u32 *)(&val)[0], find_*_bit() assumes that it gets the lower 32 bits of u64, but instead it gets the upper 32 bits - which is wrong. The fix is to swap the words of the u64 to handle this case. This is _not_ a regular endianness swap. Suggested-by: Yury Norov Signed-off-by: Madhavan Srinivasan Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Yury Norov Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Jiri Olsa Cc: Linus Torvalds Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1471426568-31051-2-git-send-email-maddy@linux.vnet.ibm.com Signed-off-by: Ingo Molnar --- kernel/events/core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'kernel') diff --git a/kernel/events/core.c b/kernel/events/core.c index ca4fde5ed268..849919c2f3d7 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5340,9 +5340,10 @@ perf_output_sample_regs(struct perf_output_handle *handle, struct pt_regs *regs, u64 mask) { int bit; + DECLARE_BITMAP(_mask, 64); - for_each_set_bit(bit, (const unsigned long *) &mask, - sizeof(mask) * BITS_PER_BYTE) { + bitmap_from_u64(_mask, mask); + for_each_set_bit(bit, _mask, sizeof(mask) * BITS_PER_BYTE) { u64 val; val = perf_reg_value(regs, bit); -- cgit v1.2.3 From 4ff6a8debf48a7bf48e93c01da720785070d3a25 Mon Sep 17 00:00:00 2001 From: David Carrillo-Cisneros Date: Wed, 17 Aug 2016 13:55:05 -0700 Subject: perf/core: Generalize event->group_flags Currently, PERF_GROUP_SOFTWARE is used in the group_flags field of a group's leader to indicate that is_software_event(event) is true for all events in a group. This is the only usage of event->group_flags. This pattern of setting a group level flags when all events in the group share a property is useful for the flag introduced in the next patch and for future CQM/CMT flags. So this patches generalizes group_flags to work as an aggregate of event level flags. PERF_GROUP_SOFTWARE denotes an inmutable event's property. All other flags that I intend to add are also determinable at event initialization. To better convey the above, this patch renames event's group_flags to group_caps and PERF_GROUP_SOFTWARE to PERF_EV_CAP_SOFTWARE. Individual event flags are stored in the new event->event_caps. Since the cap flags do not change after event initialization, there is no need to serialize event_caps. This new field is used when events are added to a context, similarly to how PERF_GROUP_SOFTWARE and is_software_event() worked. Lastly, for consistency, updates is_software_event() to rely in event_cap instead of the context index. Signed-off-by: David Carrillo-Cisneros Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Kan Liang Cc: Linus Torvalds Cc: Paul Turner Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vegard Nossum Cc: Vince Weaver Link: http://lkml.kernel.org/r/1471467307-61171-3-git-send-email-davidcc@google.com Signed-off-by: Ingo Molnar --- kernel/events/core.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'kernel') diff --git a/kernel/events/core.c b/kernel/events/core.c index 849919c2f3d7..8c42a5ae9030 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1475,8 +1475,7 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx) if (event->group_leader == event) { struct list_head *list; - if (is_software_event(event)) - event->group_flags |= PERF_GROUP_SOFTWARE; + event->group_caps = event->event_caps; list = ctx_group_list(event, ctx); list_add_tail(&event->group_entry, list); @@ -1630,9 +1629,7 @@ static void perf_group_attach(struct perf_event *event) WARN_ON_ONCE(group_leader->ctx != event->ctx); - if (group_leader->group_flags & PERF_GROUP_SOFTWARE && - !is_software_event(event)) - group_leader->group_flags &= ~PERF_GROUP_SOFTWARE; + group_leader->group_caps &= event->event_caps; list_add_tail(&event->group_entry, &group_leader->sibling_list); group_leader->nr_siblings++; @@ -1723,7 +1720,7 @@ static void perf_group_detach(struct perf_event *event) sibling->group_leader = sibling; /* Inherit group flags from the previous leader */ - sibling->group_flags = event->group_flags; + sibling->group_caps = event->group_caps; WARN_ON_ONCE(sibling->ctx != event->ctx); } @@ -2149,7 +2146,7 @@ static int group_can_go_on(struct perf_event *event, /* * Groups consisting entirely of software events can always go on. */ - if (event->group_flags & PERF_GROUP_SOFTWARE) + if (event->group_caps & PERF_EV_CAP_SOFTWARE) return 1; /* * If an exclusive group is already on, no other hardware @@ -9490,6 +9487,9 @@ SYSCALL_DEFINE5(perf_event_open, goto err_alloc; } + if (pmu->task_ctx_nr == perf_sw_context) + event->event_caps |= PERF_EV_CAP_SOFTWARE; + if (group_leader && (is_software_event(event) != is_software_event(group_leader))) { if (is_software_event(event)) { @@ -9503,7 +9503,7 @@ SYSCALL_DEFINE5(perf_event_open, */ pmu = group_leader->pmu; } else if (is_software_event(group_leader) && - (group_leader->group_flags & PERF_GROUP_SOFTWARE)) { + (group_leader->group_caps & PERF_EV_CAP_SOFTWARE)) { /* * In case the group is a pure software group, and we * try to add a hardware event, move the whole group to -- cgit v1.2.3 From d6a2f9035bfc27d0e9d78b13635dda9fb017ac01 Mon Sep 17 00:00:00 2001 From: David Carrillo-Cisneros Date: Wed, 17 Aug 2016 13:55:06 -0700 Subject: perf/core: Introduce PMU_EV_CAP_READ_ACTIVE_PKG Introduce the flag PMU_EV_CAP_READ_ACTIVE_PKG, useful for uncore events, that allows a PMU to signal the generic perf code that an event is readable in the current CPU if the event is active in a CPU in the same package as the current CPU. This is an optimization that avoids a unnecessary IPI for the common case where uncore events are run and read in the same package but in different CPUs. As an example, the IPI removal speeds up perf_read() in my Haswell system as follows: - For event UNC_C_LLC_LOOKUP: From 260 us to 31 us. - For event RAPL's power/energy-cores/: From to 255 us to 27 us. For the optimization to work, all events in the group must have it (similarly to PERF_EV_CAP_SOFTWARE). Signed-off-by: David Carrillo-Cisneros Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: David Carrillo-Cisneros Cc: Jiri Olsa Cc: Kan Liang Cc: Linus Torvalds Cc: Paul Turner Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vegard Nossum Cc: Vince Weaver Link: http://lkml.kernel.org/r/1471467307-61171-4-git-send-email-davidcc@google.com Signed-off-by: Ingo Molnar --- kernel/events/core.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) (limited to 'kernel') diff --git a/kernel/events/core.c b/kernel/events/core.c index 8c42a5ae9030..3f07e6cfc1b6 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -3424,6 +3424,22 @@ struct perf_read_data { int ret; }; +static int find_cpu_to_read(struct perf_event *event, int local_cpu) +{ + int event_cpu = event->oncpu; + u16 local_pkg, event_pkg; + + if (event->group_caps & PERF_EV_CAP_READ_ACTIVE_PKG) { + event_pkg = topology_physical_package_id(event_cpu); + local_pkg = topology_physical_package_id(local_cpu); + + if (event_pkg == local_pkg) + return local_cpu; + } + + return event_cpu; +} + /* * Cross CPU call to read the hardware event */ @@ -3545,7 +3561,7 @@ u64 perf_event_read_local(struct perf_event *event) static int perf_event_read(struct perf_event *event, bool group) { - int ret = 0; + int ret = 0, cpu_to_read, local_cpu; /* * If event is enabled and currently active on a CPU, update the @@ -3557,7 +3573,12 @@ static int perf_event_read(struct perf_event *event, bool group) .group = group, .ret = 0, }; - ret = smp_call_function_single(event->oncpu, __perf_event_read, &data, 1); + + local_cpu = get_cpu(); + cpu_to_read = find_cpu_to_read(event, local_cpu); + put_cpu(); + + ret = smp_call_function_single(cpu_to_read, __perf_event_read, &data, 1); /* The event must have been read from an online CPU: */ WARN_ON_ONCE(ret); ret = ret ? : data.ret; -- cgit v1.2.3 From 17ce3dc7e5a0e4796cc7838d1f7b2531d0bca130 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Thu, 18 Aug 2016 17:57:50 +0900 Subject: ftrace: kprobe: uprobe: Add x8/x16/x32/x64 for hexadecimal types Add x8/x16/x32/x64 for hexadecimal type casting to kprobe/uprobe event tracer. These type casts can be used for integer arguments for explicitly showing them in hexadecimal digits in formatted text. Signed-off-by: Masami Hiramatsu Acked-by: Steven Rostedt Cc: Alexander Shishkin Cc: Hemant Kumar Cc: Naohiro Aota Cc: Peter Zijlstra Cc: Wang Nan Link: http://lkml.kernel.org/r/147151067029.12957.11591314629326414783.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo --- kernel/trace/trace_kprobe.c | 4 ++++ kernel/trace/trace_probe.c | 30 +++++++++++++++++------------- kernel/trace/trace_probe.h | 9 +++++++++ kernel/trace/trace_uprobe.c | 4 ++++ 4 files changed, 34 insertions(+), 13 deletions(-) (limited to 'kernel') diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 9aedb0b06683..eb6c9f1d3a93 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -253,6 +253,10 @@ static const struct fetch_type kprobes_fetch_type_table[] = { ASSIGN_FETCH_TYPE(s16, u16, 1), ASSIGN_FETCH_TYPE(s32, u32, 1), ASSIGN_FETCH_TYPE(s64, u64, 1), + ASSIGN_FETCH_TYPE_ALIAS(x8, u8, u8, 0), + ASSIGN_FETCH_TYPE_ALIAS(x16, u16, u16, 0), + ASSIGN_FETCH_TYPE_ALIAS(x32, u32, u32, 0), + ASSIGN_FETCH_TYPE_ALIAS(x64, u64, u64, 0), ASSIGN_FETCH_TYPE_END }; diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c index 74e80a582c28..725af9dcbdff 100644 --- a/kernel/trace/trace_probe.c +++ b/kernel/trace/trace_probe.c @@ -36,24 +36,28 @@ const char *reserved_field_names[] = { }; /* Printing in basic type function template */ -#define DEFINE_BASIC_PRINT_TYPE_FUNC(type, fmt) \ -int PRINT_TYPE_FUNC_NAME(type)(struct trace_seq *s, const char *name, \ +#define DEFINE_BASIC_PRINT_TYPE_FUNC(tname, type, fmt) \ +int PRINT_TYPE_FUNC_NAME(tname)(struct trace_seq *s, const char *name, \ void *data, void *ent) \ { \ trace_seq_printf(s, " %s=" fmt, name, *(type *)data); \ return !trace_seq_has_overflowed(s); \ } \ -const char PRINT_TYPE_FMT_NAME(type)[] = fmt; \ -NOKPROBE_SYMBOL(PRINT_TYPE_FUNC_NAME(type)); - -DEFINE_BASIC_PRINT_TYPE_FUNC(u8 , "0x%x") -DEFINE_BASIC_PRINT_TYPE_FUNC(u16, "0x%x") -DEFINE_BASIC_PRINT_TYPE_FUNC(u32, "0x%x") -DEFINE_BASIC_PRINT_TYPE_FUNC(u64, "0x%Lx") -DEFINE_BASIC_PRINT_TYPE_FUNC(s8, "%d") -DEFINE_BASIC_PRINT_TYPE_FUNC(s16, "%d") -DEFINE_BASIC_PRINT_TYPE_FUNC(s32, "%d") -DEFINE_BASIC_PRINT_TYPE_FUNC(s64, "%Ld") +const char PRINT_TYPE_FMT_NAME(tname)[] = fmt; \ +NOKPROBE_SYMBOL(PRINT_TYPE_FUNC_NAME(tname)); + +DEFINE_BASIC_PRINT_TYPE_FUNC(u8, u8, "0x%x") +DEFINE_BASIC_PRINT_TYPE_FUNC(u16, u16, "0x%x") +DEFINE_BASIC_PRINT_TYPE_FUNC(u32, u32, "0x%x") +DEFINE_BASIC_PRINT_TYPE_FUNC(u64, u64, "0x%Lx") +DEFINE_BASIC_PRINT_TYPE_FUNC(s8, s8, "%d") +DEFINE_BASIC_PRINT_TYPE_FUNC(s16, s16, "%d") +DEFINE_BASIC_PRINT_TYPE_FUNC(s32, s32, "%d") +DEFINE_BASIC_PRINT_TYPE_FUNC(s64, s64, "%Ld") +DEFINE_BASIC_PRINT_TYPE_FUNC(x8, u8, "0x%x") +DEFINE_BASIC_PRINT_TYPE_FUNC(x16, u16, "0x%x") +DEFINE_BASIC_PRINT_TYPE_FUNC(x32, u32, "0x%x") +DEFINE_BASIC_PRINT_TYPE_FUNC(x64, u64, "0x%Lx") /* Print type function for string type */ int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s, const char *name, diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h index 45400ca5ded1..f0c470a10edd 100644 --- a/kernel/trace/trace_probe.h +++ b/kernel/trace/trace_probe.h @@ -149,6 +149,11 @@ DECLARE_BASIC_PRINT_TYPE_FUNC(s8); DECLARE_BASIC_PRINT_TYPE_FUNC(s16); DECLARE_BASIC_PRINT_TYPE_FUNC(s32); DECLARE_BASIC_PRINT_TYPE_FUNC(s64); +DECLARE_BASIC_PRINT_TYPE_FUNC(x8); +DECLARE_BASIC_PRINT_TYPE_FUNC(x16); +DECLARE_BASIC_PRINT_TYPE_FUNC(x32); +DECLARE_BASIC_PRINT_TYPE_FUNC(x64); + DECLARE_BASIC_PRINT_TYPE_FUNC(string); #define FETCH_FUNC_NAME(method, type) fetch_##method##_##type @@ -234,6 +239,10 @@ ASSIGN_FETCH_FUNC(file_offset, ftype), \ #define ASSIGN_FETCH_TYPE(ptype, ftype, sign) \ __ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, #ptype) +/* If ptype is an alias of atype, use this macro (show atype in format) */ +#define ASSIGN_FETCH_TYPE_ALIAS(ptype, atype, ftype, sign) \ + __ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, #atype) + #define ASSIGN_FETCH_TYPE_END {} #define FETCH_TYPE_STRING 0 diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index c53485441c88..7a687320f867 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -211,6 +211,10 @@ static const struct fetch_type uprobes_fetch_type_table[] = { ASSIGN_FETCH_TYPE(s16, u16, 1), ASSIGN_FETCH_TYPE(s32, u32, 1), ASSIGN_FETCH_TYPE(s64, u64, 1), + ASSIGN_FETCH_TYPE_ALIAS(x8, u8, u8, 0), + ASSIGN_FETCH_TYPE_ALIAS(x16, u16, u16, 0), + ASSIGN_FETCH_TYPE_ALIAS(x32, u32, u32, 0), + ASSIGN_FETCH_TYPE_ALIAS(x64, u64, u64, 0), ASSIGN_FETCH_TYPE_END }; -- cgit v1.2.3 From 864256255597aad86abcecbe6c53da8852ded15b Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Thu, 18 Aug 2016 17:58:15 +0900 Subject: ftrace: probe: Add README entries for k/uprobe-events Add README entries for kprobe-events and uprobe-events. This allows user to check what options can be acceptable for running kernel. E.g. perf tools can choose correct types for the kernel. Signed-off-by: Masami Hiramatsu Acked-by: Steven Rostedt Cc: Alexander Shishkin Cc: Hemant Kumar Cc: Naohiro Aota Cc: Peter Zijlstra Cc: Wang Nan Link: http://lkml.kernel.org/r/147151069524.12957.12957179170304055028.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo --- kernel/trace/trace.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) (limited to 'kernel') diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index dade4c9559cc..1e2ce3b52e51 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4123,6 +4123,30 @@ static const char readme_msg[] = "\t\t\t traces\n" #endif #endif /* CONFIG_STACK_TRACER */ +#ifdef CONFIG_KPROBE_EVENT + " kprobe_events\t\t- Add/remove/show the kernel dynamic events\n" + "\t\t\t Write into this file to define/undefine new trace events.\n" +#endif +#ifdef CONFIG_UPROBE_EVENT + " uprobe_events\t\t- Add/remove/show the userspace dynamic events\n" + "\t\t\t Write into this file to define/undefine new trace events.\n" +#endif +#if defined(CONFIG_KPROBE_EVENT) || defined(CONFIG_UPROBE_EVENT) + "\t accepts: event-definitions (one definition per line)\n" + "\t Format: p|r[:[/]] []\n" + "\t -:[/]\n" +#ifdef CONFIG_KPROBE_EVENT + "\t place: [:][+]|\n" +#endif +#ifdef CONFIG_UPROBE_EVENT + "\t place: :\n" +#endif + "\t args: =fetcharg[:type]\n" + "\t fetcharg: %, @
, @[+|-],\n" + "\t $stack, $stack, $retval, $comm\n" + "\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string,\n" + "\t b@/\n" +#endif " events/\t\t- Directory containing all trace event subsystems:\n" " enable\t\t- Write 0/1 to enable/disable tracing of all events\n" " events//\t- Directory containing all trace events for :\n" -- cgit v1.2.3 From bdca79c2bf40556b664c9b1c32aec103e9bdb4a9 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Thu, 18 Aug 2016 17:59:21 +0900 Subject: ftrace: kprobe: uprobe: Show u8/u16/u32/u64 types in decimal Change kprobe/uprobe-tracer to show the arguments type-casted with u8/u16/u32/u64 in decimal digits instead of hexadecimal. To minimize compatibility issue, the arguments without type casting are typed by x64 (or x32 for 32bit arch) by default. Note: all arguments set by old perf probe without types are shown in decimal by default. Signed-off-by: Masami Hiramatsu Acked-by: Steven Rostedt Cc: Alexander Shishkin Cc: Hemant Kumar Cc: Naohiro Aota Cc: Peter Zijlstra Cc: Wang Nan Link: http://lkml.kernel.org/r/147151076135.12957.14684546093034343894.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo --- kernel/trace/trace_probe.c | 8 ++++---- kernel/trace/trace_probe.h | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) (limited to 'kernel') diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c index 725af9dcbdff..8c0553d9afd3 100644 --- a/kernel/trace/trace_probe.c +++ b/kernel/trace/trace_probe.c @@ -46,10 +46,10 @@ int PRINT_TYPE_FUNC_NAME(tname)(struct trace_seq *s, const char *name, \ const char PRINT_TYPE_FMT_NAME(tname)[] = fmt; \ NOKPROBE_SYMBOL(PRINT_TYPE_FUNC_NAME(tname)); -DEFINE_BASIC_PRINT_TYPE_FUNC(u8, u8, "0x%x") -DEFINE_BASIC_PRINT_TYPE_FUNC(u16, u16, "0x%x") -DEFINE_BASIC_PRINT_TYPE_FUNC(u32, u32, "0x%x") -DEFINE_BASIC_PRINT_TYPE_FUNC(u64, u64, "0x%Lx") +DEFINE_BASIC_PRINT_TYPE_FUNC(u8, u8, "%u") +DEFINE_BASIC_PRINT_TYPE_FUNC(u16, u16, "%u") +DEFINE_BASIC_PRINT_TYPE_FUNC(u32, u32, "%u") +DEFINE_BASIC_PRINT_TYPE_FUNC(u64, u64, "%Lu") DEFINE_BASIC_PRINT_TYPE_FUNC(s8, s8, "%d") DEFINE_BASIC_PRINT_TYPE_FUNC(s16, s16, "%d") DEFINE_BASIC_PRINT_TYPE_FUNC(s32, s32, "%d") diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h index f0c470a10edd..0c0ae54d44c6 100644 --- a/kernel/trace/trace_probe.h +++ b/kernel/trace/trace_probe.h @@ -208,7 +208,7 @@ DEFINE_FETCH_##method(u32) \ DEFINE_FETCH_##method(u64) /* Default (unsigned long) fetch type */ -#define __DEFAULT_FETCH_TYPE(t) u##t +#define __DEFAULT_FETCH_TYPE(t) x##t #define _DEFAULT_FETCH_TYPE(t) __DEFAULT_FETCH_TYPE(t) #define DEFAULT_FETCH_TYPE _DEFAULT_FETCH_TYPE(BITS_PER_LONG) #define DEFAULT_FETCH_TYPE_STR __stringify(DEFAULT_FETCH_TYPE) -- cgit v1.2.3 From c9bbdd4830ab06288bb1d8c00ed8c8c6e80e377a Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Mon, 15 Aug 2016 11:42:45 +0100 Subject: perf/core: Don't pass PERF_EF_START to the PMU ->start callback PERF_EF_START is a flag to indicate to the PMU ->add() callback that, as well as claiming the PMU resources required by the event being added, it should also start the PMU. Passing this flag to the ->start() callback doesn't make sense, because ->start() always tries to start the PMU. Remove it. Signed-off-by: Will Deacon Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Cc: mark.rutland@arm.com Link: http://lkml.kernel.org/r/1471257765-29662-1-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar --- kernel/events/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'kernel') diff --git a/kernel/events/core.c b/kernel/events/core.c index dff00c787867..74f22a95eb63 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2492,7 +2492,7 @@ static int __perf_event_stop(void *info) * while restarting. */ if (sd->restart) - event->pmu->start(event, PERF_EF_START); + event->pmu->start(event, 0); return 0; } -- cgit v1.2.3