linux-toradex.git/kernel/events, branch v3.12.29

perf: Fix race in removing an event

2014-06-20T15:33:52+00:00

commit 46ce0fe97a6be7532ce6126bb26ce89fed81528c upstream.

When removing a (sibling) event we do:

	raw_spin_lock_irq(&ctx->lock);
	perf_group_detach(event);
	raw_spin_unlock_irq(&ctx->lock);

	

	perf_remove_from_context(event);
		raw_spin_lock_irq(&ctx->lock);
		...
		raw_spin_unlock_irq(&ctx->lock);

Now, assuming the event is a sibling, it will be 'unreachable' for
things like ctx_sched_out() because that iterates the
groups->siblings, and we just unhooked the sibling.

So, if during  we get ctx_sched_out(), it will miss the event
and not call event_sched_out() on it, leaving it programmed on the
PMU.

The subsequent perf_remove_from_context() call will find the ctx is
inactive and only call list_del_event() to remove the event from all
other lists.

Hereafter we can proceed to free the event; while still programmed!

Close this hole by moving perf_group_detach() inside the same
ctx->lock region(s) perf_remove_from_context() has.

The condition on inherited events only in __perf_event_exit_task() is
likely complete crap because non-inherited events are part of groups
too and we're tearing down just the same. But leave that for another
patch.

Most-likely-Fixes: e03a9a55b4e ("perf: Change close() semantics for group events")
Reported-by: Vince Weaver 
Tested-by: Vince Weaver 
Much-staring-at-traces-by: Vince Weaver 
Much-staring-at-traces-by: Thomas Gleixner 
Cc: Arnaldo Carvalho de Melo 
Cc: Linus Torvalds 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20140505093124.GN17778@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Jiri Slaby

perf: Limit perf_event_attr::sample_period to 63 bits

2014-06-20T15:33:52+00:00

commit 0819b2e30ccb93edf04876237b6205eef84ec8d2 upstream.

Vince reported that using a large sample_period (one with bit 63 set)
results in wreckage since while the sample_period is fundamentally
unsigned (negative periods don't make sense) the way we implement
things very much rely on signed logic.

So limit sample_period to 63 bits to avoid tripping over this.

Reported-by: Vince Weaver 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/n/tip-p25fhunibl4y3qi0zuqmyf4b@git.kernel.org
Signed-off-by: Thomas Gleixner 
Signed-off-by: Jiri Slaby

perf: Prevent false warning in perf_swevent_add

2014-06-20T15:33:51+00:00

commit 39af6b1678afa5880dda7e375cf3f9d395087f6d upstream.

The perf cpu offline callback takes down all cpu context
events and releases swhash->swevent_hlist.

This could race with task context software event being just
scheduled on this cpu via perf_swevent_add while cpu hotplug
code already cleaned up event's data.

The race happens in the gap between the cpu notifier code
and the cpu being actually taken down. Note that only cpu
ctx events are terminated in the perf cpu hotplug code.

It's easily reproduced with:
  $ perf record -e faults perf bench sched pipe

while putting one of the cpus offline:
  # echo 0 > /sys/devices/system/cpu/cpu1/online

Console emits following warning:
  WARNING: CPU: 1 PID: 2845 at kernel/events/core.c:5672 perf_swevent_add+0x18d/0x1a0()
  Modules linked in:
  CPU: 1 PID: 2845 Comm: sched-pipe Tainted: G        W    3.14.0+ #256
  Hardware name: Intel Corporation Montevina platform/To be filled by O.E.M., BIOS AMVACRB1.86C.0066.B00.0805070703 05/07/2008
   0000000000000009 ffff880077233ab8 ffffffff81665a23 0000000000200005
   0000000000000000 ffff880077233af8 ffffffff8104732c 0000000000000046
   ffff88007467c800 0000000000000002 ffff88007a9cf2a0 0000000000000001
  Call Trace:
   [] dump_stack+0x4f/0x7c
   [] warn_slowpath_common+0x8c/0xc0
   [] warn_slowpath_null+0x1a/0x20
   [] perf_swevent_add+0x18d/0x1a0
   [] event_sched_in.isra.75+0x9e/0x1f0
   [] group_sched_in+0x6a/0x1f0
   [] ? sched_clock_local+0x25/0xa0
   [] ctx_sched_in+0x1f6/0x450
   [] perf_event_sched_in+0x6b/0xa0
   [] perf_event_context_sched_in+0x7b/0xc0
   [] __perf_event_task_sched_in+0x43e/0x460
   [] ? put_lock_stats.isra.18+0xe/0x30
   [] finish_task_switch+0xb8/0x100
   [] __schedule+0x30e/0xad0
   [] ? pipe_read+0x3e2/0x560
   [] ? preempt_schedule_irq+0x3e/0x70
   [] ? preempt_schedule_irq+0x3e/0x70
   [] preempt_schedule_irq+0x44/0x70
   [] retint_kernel+0x20/0x30
   [] ? lockdep_sys_exit+0x1a/0x90
   [] lockdep_sys_exit_thunk+0x35/0x67
   [] ? sysret_check+0x5/0x56

Fixing this by tracking the cpu hotplug state and displaying
the WARN only if current cpu is initialized properly.

Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Arnaldo Carvalho de Melo 
Reported-by: Fengguang Wu 
Signed-off-by: Jiri Olsa 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1396861448-10097-1-git-send-email-jolsa@redhat.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Jiri Slaby

list: introduce list_next_entry() and list_prev_entry()

2014-05-29T09:37:44+00:00

commit 008208c6b26f21c2648c250a09c55e737c02c5f8 upstream.

Add two trivial helpers list_next_entry() and list_prev_entry(), they
can have a lot of users including list.h itself.  In fact the 1st one is
already defined in events/core.c and bnx2x_sp.c, so the patch simply
moves the definition to list.h.

Signed-off-by: Oleg Nesterov 
Cc: Eilon Greenstein 
Cc: Greg Kroah-Hartman 
Cc: Peter Zijlstra 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

perf: Fix hotplug splat

2014-03-05T16:13:51+00:00

commit e3703f8cdfcf39c25c4338c3ad8e68891cca3731 upstream.

Drew Richardson reported that he could make the kernel go *boom* when hotplugging
while having perf events active.

It turned out that when you have a group event, the code in
__perf_event_exit_context() fails to remove the group siblings from
the context.

We then proceed with destroying and freeing the event, and when you
re-plug the CPU and try and add another event to that CPU, things go
*boom* because you've still got dead entries there.

Reported-by: Drew Richardson 
Signed-off-by: Peter Zijlstra 
Cc: Will Deacon 
Link: http://lkml.kernel.org/n/tip-k6v5wundvusvcseqj1si0oz0@git.kernel.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Jiri Slaby

perf: Enforce 1 as lower limit for perf_event_max_sample_rate

2014-02-26T09:24:22+00:00

commit 723478c8a471403c53cf144999701f6e0c4bbd11 upstream.

/proc/sys/kernel/perf_event_max_sample_rate will accept
negative values as well as 0.

Negative values are unreasonable, and 0 causes a
divide by zero exception in perf_proc_update_handler.

This patch enforces a lower limit of 1.

Signed-off-by: Knut Petersen 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/5242DB0C.4070005@t-online.de
Signed-off-by: Ingo Molnar 
Signed-off-by: Jiri Slaby

perf: Fix perf ring buffer memory ordering

2013-10-29T11:01:19+00:00

The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.

Reported-by: Victor Kaplansky 
Tested-by: Victor Kaplansky 
Signed-off-by: Peter Zijlstra 
Cc: Mathieu Desnoyers 
Cc: michael@ellerman.id.au
Cc: Paul McKenney 
Cc: Michael Neuling 
Cc: Frederic Weisbecker 
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar

perf: Disable PERF_RECORD_MMAP2 support

2013-10-17T19:27:14+00:00

For now, we disable the extended MMAP record support (MMAP2).

We have identified cases where it would not report the correct mapping
information, clone(VM_CLONE) but with separate pids.  We will revisit
the support once we find a solution for this case.

The patch changes the kernel to return EINVAL if attr->mmap2 is set. The
patch also modifies the perf tool to use regular PERF_RECORD_MMAP for
synthetic events and it also prevents the tool from requesting
attr->mmap2 mode because the kernel would reject it.

The support will be revisited once the kenrel interface is updated.

In V2, we reduce the patch to the strict minimum.

In V3, we avoid calling perf_event_open() with mmap2 set because we know
it will fail and require fallback retry.

Signed-off-by: Stephane Eranian 
Cc: Andi Kleen 
Cc: David Ahern 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20131017173215.GA8820@quad
Signed-off-by: Arnaldo Carvalho de Melo

perf: Fix perf_pmu_migrate_context

2013-10-04T07:58:53+00:00

While auditing the list_entry usage due to a trinity bug I found that
perf_pmu_migrate_context violates the rules for
perf_event::event_entry.

The problem is that perf_event::event_entry is a RCU list element, and
hence we must wait for a full RCU grace period before re-using the
element after deletion.

Therefore the usage in perf_pmu_migrate_context() which re-uses the
entry immediately is broken. For now introduce another list_head into
perf_event for this specific usage.

This doesn't actually fix the trinity report because that never goes
through this code.

Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/n/tip-mkj72lxagw1z8fvjm648iznw@git.kernel.org
Signed-off-by: Ingo Molnar

perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'

2013-09-20T07:45:11+00:00

Solve the problems around the broken definition of perf_event_mmap_page::
cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially
fixed by:

  860f085b74e9 ("perf: Fix broken union in 'struct perf_event_mmap_page'")

The problem with the fix (merged in v3.12-rc1 and not yet released
officially), noticed by Vince Weaver is that the new behavior is
not detectable by new user-space, and that due to the reuse of the
field names it's easy to mis-compile a binary if old headers are used
on a new kernel or new headers are used on an old kernel.

To solve all that make this change explicit, detectable and self-contained,
by iterating the ABI the following way:

 - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
   confuse old user-space binaries. RDPMC will be marked as unavailable
   to old binaries but that's within the ABI, this is a capability bit.

 - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
   libraries can reliably detect that bit 0 is deprecated and perma-zero
   without having to check the kernel version.

 - Use bits 2, 3, 4 for the newly defined, correct functionality:

	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
	cap_user_time		: 1, /* The time_* fields are used */
	cap_user_time_zero	: 1, /* The time_zero field is used */

 - Rename all the bitfield names in perf_event.h to be different from the
   old names, to make sure it's not possible to mis-compile it
   accidentally with old assumptions.

The 'size' field can then be used in the future to add new fields and it
will act as a natural ABI version indicator as well.

Also adjust tools/perf/ userspace for the new definitions, noticed by
Adrian Hunter.

Reported-by: Vince Weaver 
Signed-off-by: Peter Zijlstra 
Also-Fixed-by: Adrian Hunter 
Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.org
Signed-off-by: Ingo Molnar