linux-toradex.git/kernel/events, branch v3.2.60

perf: Prevent false warning in perf_swevent_add

2014-06-09T12:29:13+00:00

commit 39af6b1678afa5880dda7e375cf3f9d395087f6d upstream.

The perf cpu offline callback takes down all cpu context
events and releases swhash->swevent_hlist.

This could race with task context software event being just
scheduled on this cpu via perf_swevent_add while cpu hotplug
code already cleaned up event's data.

The race happens in the gap between the cpu notifier code
and the cpu being actually taken down. Note that only cpu
ctx events are terminated in the perf cpu hotplug code.

It's easily reproduced with:
  $ perf record -e faults perf bench sched pipe

while putting one of the cpus offline:
  # echo 0 > /sys/devices/system/cpu/cpu1/online

Console emits following warning:
  WARNING: CPU: 1 PID: 2845 at kernel/events/core.c:5672 perf_swevent_add+0x18d/0x1a0()
  Modules linked in:
  CPU: 1 PID: 2845 Comm: sched-pipe Tainted: G        W    3.14.0+ #256
  Hardware name: Intel Corporation Montevina platform/To be filled by O.E.M., BIOS AMVACRB1.86C.0066.B00.0805070703 05/07/2008
   0000000000000009 ffff880077233ab8 ffffffff81665a23 0000000000200005
   0000000000000000 ffff880077233af8 ffffffff8104732c 0000000000000046
   ffff88007467c800 0000000000000002 ffff88007a9cf2a0 0000000000000001
  Call Trace:
   [] dump_stack+0x4f/0x7c
   [] warn_slowpath_common+0x8c/0xc0
   [] warn_slowpath_null+0x1a/0x20
   [] perf_swevent_add+0x18d/0x1a0
   [] event_sched_in.isra.75+0x9e/0x1f0
   [] group_sched_in+0x6a/0x1f0
   [] ? sched_clock_local+0x25/0xa0
   [] ctx_sched_in+0x1f6/0x450
   [] perf_event_sched_in+0x6b/0xa0
   [] perf_event_context_sched_in+0x7b/0xc0
   [] __perf_event_task_sched_in+0x43e/0x460
   [] ? put_lock_stats.isra.18+0xe/0x30
   [] finish_task_switch+0xb8/0x100
   [] __schedule+0x30e/0xad0
   [] ? pipe_read+0x3e2/0x560
   [] ? preempt_schedule_irq+0x3e/0x70
   [] ? preempt_schedule_irq+0x3e/0x70
   [] preempt_schedule_irq+0x44/0x70
   [] retint_kernel+0x20/0x30
   [] ? lockdep_sys_exit+0x1a/0x90
   [] lockdep_sys_exit_thunk+0x35/0x67
   [] ? sysret_check+0x5/0x56

Fixing this by tracking the cpu hotplug state and displaying
the WARN only if current cpu is initialized properly.

Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Arnaldo Carvalho de Melo 
Reported-by: Fengguang Wu 
Signed-off-by: Jiri Olsa 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1396861448-10097-1-git-send-email-jolsa@redhat.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Ben Hutchings

perf: Limit perf_event_attr::sample_period to 63 bits

2014-06-09T12:29:13+00:00

commit 0819b2e30ccb93edf04876237b6205eef84ec8d2 upstream.

Vince reported that using a large sample_period (one with bit 63 set)
results in wreckage since while the sample_period is fundamentally
unsigned (negative periods don't make sense) the way we implement
things very much rely on signed logic.

So limit sample_period to 63 bits to avoid tripping over this.

Reported-by: Vince Weaver 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/n/tip-p25fhunibl4y3qi0zuqmyf4b@git.kernel.org
Signed-off-by: Thomas Gleixner 
Signed-off-by: Ben Hutchings

perf: Fix hotplug splat

2014-04-01T23:58:56+00:00

commit e3703f8cdfcf39c25c4338c3ad8e68891cca3731 upstream.

Drew Richardson reported that he could make the kernel go *boom* when hotplugging
while having perf events active.

It turned out that when you have a group event, the code in
__perf_event_exit_context() fails to remove the group siblings from
the context.

We then proceed with destroying and freeing the event, and when you
re-plug the CPU and try and add another event to that CPU, things go
*boom* because you've still got dead entries there.

Reported-by: Drew Richardson 
Signed-off-by: Peter Zijlstra 
Cc: Will Deacon 
Link: http://lkml.kernel.org/n/tip-k6v5wundvusvcseqj1si0oz0@git.kernel.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf: Fix perf ring buffer memory ordering

2013-11-28T14:01:59+00:00

commit bf378d341e4873ed928dc3c636252e6895a21f50 upstream.

The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.

Reported-by: Victor Kaplansky 
Tested-by: Victor Kaplansky 
Signed-off-by: Peter Zijlstra 
Cc: Mathieu Desnoyers 
Cc: michael@ellerman.id.au
Cc: Paul McKenney 
Cc: Michael Neuling 
Cc: Frederic Weisbecker 
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings

perf: Fix perf_cgroup_switch for sw-events

2013-10-26T20:06:12+00:00

commit 95cf59ea72331d0093010543b8951bb43f262cac upstream.

Jiri reported that he could trigger the WARN_ON_ONCE() in
perf_cgroup_switch() using sw-events. This is because sw-events share
a cpuctx with multiple PMUs.

Use the ->unique_pmu pointer to limit the pmu iteration to unique
cpuctx instances.

Reported-and-Tested-by: Jiri Olsa 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/n/tip-so7wi2zf3jjzrwcutm2mkz0j@git.kernel.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmu

2013-10-26T20:06:11+00:00

commit 3f1f33206c16c7b3839d71372bc2ac3f305aa802 upstream.

Stephane thought the perf_cpu_context::active_pmu name confusing and
suggested using 'unique_pmu' instead.

This pointer is a pointer to a 'random' pmu sharing the cpuctx
instance, therefore limiting a for_each_pmu loop to those where
cpuctx->unique_pmu matches the pmu we get a loop over unique cpuctx
instances.

Suggested-by: Stephane Eranian 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/n/tip-kxyjqpfj2fn9gt7kwu5ag9ks@git.kernel.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf: Use css_tryget() to avoid propping up css refcount

2013-10-26T20:06:10+00:00

commit 9c5da09d266ca9b32eb16cf940f8161d949c2fe5 upstream.

An rmdir pushes css's ref count to zero.  However, if the associated
directory is open at the time, the dentry ref count is non-zero.  If
the fd for this directory is then passed into perf_event_open, it
does a css_get().  This bounces the ref count back up from zero.  This
is a problem by itself.  But what makes it turn into a crash is the
fact that we end up doing an extra dput, since we perform a dput
when css_put sees the ref count go down to zero.

css_tryget() does not fall into that trap. So, we use that instead.

Reproduction test-case for the bug:

 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 

 #define PERF_FLAG_PID_CGROUP    (1U << 2)

 int perf_event_open(struct perf_event_attr *hw_event_uptr,
                     pid_t pid, int cpu, int group_fd, unsigned long flags) {
         return syscall(__NR_perf_event_open,hw_event_uptr, pid, cpu,
                 group_fd, flags);
 }

 /*
  * Directly poke at the perf_event bug, since it's proving hard to repro
  * depending on where in the kernel tree.  what moved?
  */
 int main(int argc, char **argv)
 {
        int fd;
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.exclude_kernel = 1;
        attr.size = sizeof(attr);
        mkdir("/dev/cgroup/perf_event/blah", 0777);
        fd = open("/dev/cgroup/perf_event/blah", O_RDONLY);
        perror("open");
        rmdir("/dev/cgroup/perf_event/blah");
        sleep(2);
        perf_event_open(&attr, fd, 0, -1,  PERF_FLAG_PID_CGROUP);
        perror("perf_event_open");
        close(fd);
        return 0;
 }

Signed-off-by: Salman Qazi 
Signed-off-by: Peter Zijlstra 
Acked-by: Tejun Heo 
Link: http://lkml.kernel.org/r/20120614223108.1025.2503.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf: Fix event group context move

2013-09-10T00:57:05+00:00

commit 0231bb5336758426b44ccd798ccd3c5419c95d58 upstream.

When we have group with mixed events (hw/sw) we want to end up
with group leader being in hw context. So if group leader is
initialy sw event, we move all the events under hw context.

The move is done for each event by removing it from its context
and adding it back into proper one. As a part of the removal the
event is automatically disabled, which is not what we want at
this stage of creating groups.

The fix is to initialize event state after removal from sw
context.

This fix resulted from the following discussion:

  http://thread.gmane.org/gmane.linux.kernel.perf.user/1144

Reported-by: Andreas Hollmann 
Signed-off-by: Jiri Olsa 
Cc: Arnaldo Carvalho de Melo 
Cc: Namhyung Kim 
Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Vince Weaver 
Link: http://lkml.kernel.org/r/1359714225-4231-1-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf: Fix mmap() accounting hole

2013-07-27T04:34:32+00:00

commit 9bb5d40cd93c9dd4be74834b1dcb1ba03629716b upstream.

Vince's fuzzer once again found holes. This time it spotted a leak in
the locked page accounting.

When an event had redirected output and its close() was the last
reference to the buffer we didn't have a vm context to undo accounting.

Change the code to destroy the buffer on the last munmap() and detach
all redirected events at that time. This provides us the right context
to undo the vm accounting.

[Backporting for 3.4-stable.
VM_RESERVED flag was replaced with pair 'VM_DONTEXPAND | VM_DONTDUMP' in
314e51b9 since 3.7.0-rc1, and 314e51b9 comes from a big patchset, we didn't
backport the patchset, so I restored 'VM_DNOTEXPAND | VM_DONTDUMP' as before:
-	vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
+	vma->vm_flags |= VM_DONTCOPY | VM_RESERVED;
 -- zliu]

Reported-and-tested-by: Vince Weaver 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20130604084421.GI8923@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Zhouping Liu 
Signed-off-by: Greg Kroah-Hartman 
[bwh: Backported to 3.2: drop unrelated addition of braces in free_event()]
Signed-off-by: Ben Hutchings

perf: Fix perf mmap bugs

2013-07-27T04:34:32+00:00

commit 26cb63ad11e04047a64309362674bcbbd6a6f246 upstream.

Vince reported a problem found by his perf specific trinity
fuzzer.

Al noticed 2 problems with perf's mmap():

 - it has issues against fork() since we use vma->vm_mm for accounting.
 - it has an rb refcount leak on double mmap().

We fix the issues against fork() by using VM_DONTCOPY; I don't
think there's code out there that uses this; we didn't hear
about weird accounting problems/crashes. If we do need this to
work, the previously proposed VM_PINNED could make this work.

Aside from the rb reference leak spotted by Al, Vince's example
prog was indeed doing a double mmap() through the use of
perf_event_set_output().

This exposes another problem, since we now have 2 events with
one buffer, the accounting gets screwy because we account per
event. Fix this by making the buffer responsible for its own
accounting.

[Backporting for 3.4-stable.
VM_RESERVED flag was replaced with pair 'VM_DONTEXPAND | VM_DONTDUMP' in
314e51b9 since 3.7.0-rc1, and 314e51b9 comes from a big patchset, we didn't
backport the patchset, so I restored 'VM_DNOTEXPAND | VM_DONTDUMP' as before:
-       vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
+       vma->vm_flags |= VM_DONTCOPY | VM_RESERVED;
 -- zliu]

Reported-by: Vince Weaver 
Signed-off-by: Peter Zijlstra 
Cc: Al Viro 
Cc: Paul Mackerras 
Cc: Arnaldo Carvalho de Melo 
Link: http://lkml.kernel.org/r/20130528085548.GA12193@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Zhouping Liu 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ben Hutchings