<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/arch/x86/kernel/cpu/perf_event.c, branch v3.0.56</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>x86, perf: Check that current-&gt;mm is alive before getting user callchain</title>
<updated>2011-10-03T18:40:09+00:00</updated>
<author>
<name>Andrey Vagin</name>
<email>avagin@openvz.org</email>
</author>
<published>2011-08-30T08:32:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2dd9ce06633ff92cd2b6770c717372ef44693bfd'/>
<id>2dd9ce06633ff92cd2b6770c717372ef44693bfd</id>
<content type='text'>
commit 20afc60f892d285fde179ead4b24e6a7938c2f1b upstream.

An event may occur when an mm is already released.

I added an event in dequeue_entity() and caught a panic with
the following backtrace:

[  434.421110] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[  434.421258] IP: [&lt;ffffffff810464ac&gt;] __get_user_pages_fast+0x9c/0x120
...
[  434.421258] Call Trace:
[  434.421258]  [&lt;ffffffff8101ae81&gt;] copy_from_user_nmi+0x51/0xf0
[  434.421258]  [&lt;ffffffff8109a0d5&gt;] ? sched_clock_local+0x25/0x90
[  434.421258]  [&lt;ffffffff8101b048&gt;] perf_callchain_user+0x128/0x170
[  434.421258]  [&lt;ffffffff811154cd&gt;] ? __perf_event_header__init_id+0xed/0x100
[  434.421258]  [&lt;ffffffff81116690&gt;] perf_prepare_sample+0x200/0x280
[  434.421258]  [&lt;ffffffff81118da8&gt;] __perf_event_overflow+0x1b8/0x290
[  434.421258]  [&lt;ffffffff81065240&gt;] ? tg_shares_up+0x0/0x670
[  434.421258]  [&lt;ffffffff8104fe1a&gt;] ? walk_tg_tree+0x6a/0xb0
[  434.421258]  [&lt;ffffffff81118f44&gt;] perf_swevent_overflow+0xc4/0xf0
[  434.421258]  [&lt;ffffffff81119150&gt;] do_perf_sw_event+0x1e0/0x250
[  434.421258]  [&lt;ffffffff81119204&gt;] perf_tp_event+0x44/0x70
[  434.421258]  [&lt;ffffffff8105701f&gt;] ftrace_profile_sched_block+0xdf/0x110
[  434.421258]  [&lt;ffffffff8106121d&gt;] dequeue_entity+0x2ad/0x2d0
[  434.421258]  [&lt;ffffffff810614ec&gt;] dequeue_task_fair+0x1c/0x60
[  434.421258]  [&lt;ffffffff8105818a&gt;] dequeue_task+0x9a/0xb0
[  434.421258]  [&lt;ffffffff810581e2&gt;] deactivate_task+0x42/0xe0
[  434.421258]  [&lt;ffffffff814bc019&gt;] thread_return+0x191/0x808
[  434.421258]  [&lt;ffffffff81098a44&gt;] ? switch_task_namespaces+0x24/0x60
[  434.421258]  [&lt;ffffffff8106f4c4&gt;] do_exit+0x464/0x910
[  434.421258]  [&lt;ffffffff8106f9c8&gt;] do_group_exit+0x58/0xd0
[  434.421258]  [&lt;ffffffff8106fa57&gt;] sys_exit_group+0x17/0x20
[  434.421258]  [&lt;ffffffff8100b202&gt;] system_call_fastpath+0x16/0x1b

Signed-off-by: Andrey Vagin &lt;avagin@openvz.org&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1314693156-24131-1-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 20afc60f892d285fde179ead4b24e6a7938c2f1b upstream.

An event may occur when an mm is already released.

I added an event in dequeue_entity() and caught a panic with
the following backtrace:

[  434.421110] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[  434.421258] IP: [&lt;ffffffff810464ac&gt;] __get_user_pages_fast+0x9c/0x120
...
[  434.421258] Call Trace:
[  434.421258]  [&lt;ffffffff8101ae81&gt;] copy_from_user_nmi+0x51/0xf0
[  434.421258]  [&lt;ffffffff8109a0d5&gt;] ? sched_clock_local+0x25/0x90
[  434.421258]  [&lt;ffffffff8101b048&gt;] perf_callchain_user+0x128/0x170
[  434.421258]  [&lt;ffffffff811154cd&gt;] ? __perf_event_header__init_id+0xed/0x100
[  434.421258]  [&lt;ffffffff81116690&gt;] perf_prepare_sample+0x200/0x280
[  434.421258]  [&lt;ffffffff81118da8&gt;] __perf_event_overflow+0x1b8/0x290
[  434.421258]  [&lt;ffffffff81065240&gt;] ? tg_shares_up+0x0/0x670
[  434.421258]  [&lt;ffffffff8104fe1a&gt;] ? walk_tg_tree+0x6a/0xb0
[  434.421258]  [&lt;ffffffff81118f44&gt;] perf_swevent_overflow+0xc4/0xf0
[  434.421258]  [&lt;ffffffff81119150&gt;] do_perf_sw_event+0x1e0/0x250
[  434.421258]  [&lt;ffffffff81119204&gt;] perf_tp_event+0x44/0x70
[  434.421258]  [&lt;ffffffff8105701f&gt;] ftrace_profile_sched_block+0xdf/0x110
[  434.421258]  [&lt;ffffffff8106121d&gt;] dequeue_entity+0x2ad/0x2d0
[  434.421258]  [&lt;ffffffff810614ec&gt;] dequeue_task_fair+0x1c/0x60
[  434.421258]  [&lt;ffffffff8105818a&gt;] dequeue_task+0x9a/0xb0
[  434.421258]  [&lt;ffffffff810581e2&gt;] deactivate_task+0x42/0xe0
[  434.421258]  [&lt;ffffffff814bc019&gt;] thread_return+0x191/0x808
[  434.421258]  [&lt;ffffffff81098a44&gt;] ? switch_task_namespaces+0x24/0x60
[  434.421258]  [&lt;ffffffff8106f4c4&gt;] do_exit+0x464/0x910
[  434.421258]  [&lt;ffffffff8106f9c8&gt;] do_group_exit+0x58/0xd0
[  434.421258]  [&lt;ffffffff8106fa57&gt;] sys_exit_group+0x17/0x20
[  434.421258]  [&lt;ffffffff8100b202&gt;] system_call_fastpath+0x16/0x1b

Signed-off-by: Andrey Vagin &lt;avagin@openvz.org&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1314693156-24131-1-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86: Remove warning and warning_symbol from struct stacktrace_ops</title>
<updated>2011-05-12T13:31:28+00:00</updated>
<author>
<name>Richard Weinberger</name>
<email>richard@nod.at</email>
</author>
<published>2011-05-12T13:11:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=449a66fd1fa75d36dca917704827c40c8f416bca'/>
<id>449a66fd1fa75d36dca917704827c40c8f416bca</id>
<content type='text'>
Both warning and warning_symbol are nowhere used.
Let's get rid of them.

Signed-off-by: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Huang Ying &lt;ying.huang@intel.com&gt;
Cc: Soeren Sandmann Pedersen &lt;ssp@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@gmail.com&gt;
Cc: x86 &lt;x86@kernel.org&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Robert Richter &lt;robert.richter@amd.com&gt;
Cc: Paul Mundt &lt;lethal@linux-sh.org&gt;
Link: http://lkml.kernel.org/r/1305205872-10321-2-git-send-email-richard@nod.at
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Both warning and warning_symbol are nowhere used.
Let's get rid of them.

Signed-off-by: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Huang Ying &lt;ying.huang@intel.com&gt;
Cc: Soeren Sandmann Pedersen &lt;ssp@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@gmail.com&gt;
Cc: x86 &lt;x86@kernel.org&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Robert Richter &lt;robert.richter@amd.com&gt;
Cc: Paul Mundt &lt;lethal@linux-sh.org&gt;
Link: http://lkml.kernel.org/r/1305205872-10321-2-git-send-email-richard@nod.at
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core</title>
<updated>2011-05-01T17:09:39+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2011-05-01T17:09:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=809435ff4f43a5c0cb0201b3b89176253d5ade18'/>
<id>809435ff4f43a5c0cb0201b3b89176253d5ade18</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>perf, x86, nmi: Move LVT un-masking into irq handlers</title>
<updated>2011-04-27T15:59:11+00:00</updated>
<author>
<name>Don Zickus</name>
<email>dzickus@redhat.com</email>
</author>
<published>2011-04-27T10:32:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2bce5daca28346f19c190dbdb5542c9fe3e8c6e6'/>
<id>2bce5daca28346f19c190dbdb5542c9fe3e8c6e6</id>
<content type='text'>
It was noticed that P4 machines were generating double NMIs for
each perf event.  These extra NMIs lead to 'Dazed and confused'
messages on the screen.

I tracked this down to a P4 quirk that said the overflow bit had
to be cleared before re-enabling the apic LVT mask.  My first
attempt was to move the un-masking inside the perf nmi handler
from before the chipset NMI handler to after.

This broke Nehalem boxes that seem to like the unmasking before
the counters themselves are re-enabled.

In order to keep this change simple for 2.6.39, I decided to
just simply move the apic LVT un-masking to the beginning of all
the chipset NMI handlers, with the exception of Pentium4's to
fix the double NMI issue.

Later on we can move the un-masking to later in the handlers to
save a number of 'extra' NMIs on those particular chipsets.

I tested this change on a P4 machine, an AMD machine, a Nehalem
box, and a core2quad box.  'perf top' worked correctly along
with various other small 'perf record' runs.  Anything high
stress breaks all the machines but that is a different problem.

Thanks to various people for testing different versions of this
patch.

Reported-and-tested-by: Shaun Ruffell &lt;sruffell@digium.com&gt;
Signed-off-by: Don Zickus &lt;dzickus@redhat.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Link: http://lkml.kernel.org/r/1303900353-10242-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
CC: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It was noticed that P4 machines were generating double NMIs for
each perf event.  These extra NMIs lead to 'Dazed and confused'
messages on the screen.

I tracked this down to a P4 quirk that said the overflow bit had
to be cleared before re-enabling the apic LVT mask.  My first
attempt was to move the un-masking inside the perf nmi handler
from before the chipset NMI handler to after.

This broke Nehalem boxes that seem to like the unmasking before
the counters themselves are re-enabled.

In order to keep this change simple for 2.6.39, I decided to
just simply move the apic LVT un-masking to the beginning of all
the chipset NMI handlers, with the exception of Pentium4's to
fix the double NMI issue.

Later on we can move the un-masking to later in the handlers to
save a number of 'extra' NMIs on those particular chipsets.

I tested this change on a P4 machine, an AMD machine, a Nehalem
box, and a core2quad box.  'perf top' worked correctly along
with various other small 'perf record' runs.  Anything high
stress breaks all the machines but that is a different problem.

Thanks to various people for testing different versions of this
patch.

Reported-and-tested-by: Shaun Ruffell &lt;sruffell@digium.com&gt;
Signed-off-by: Don Zickus &lt;dzickus@redhat.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Link: http://lkml.kernel.org/r/1303900353-10242-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
CC: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf, x86: Fix BTS condition</title>
<updated>2011-04-26T11:34:34+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-04-26T11:24:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=18a073a3acd3a47fbb5e23333df7fad28d576345'/>
<id>18a073a3acd3a47fbb5e23333df7fad28d576345</id>
<content type='text'>
Currently the x86 backend incorrectly assumes that any BRANCH_INSN
with sample_period==1 is a BTS request. This is not true when we do
frequency driven profiling such as 'perf record -e branches'.

Solves this error:

  $ perf record -e branches ./array
  Error: sys_perf_event_open() syscall returned with 95 (Operation not supported).

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Reported-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: "Metzger, Markus T" &lt;markus.t.metzger@intel.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Link: http://lkml.kernel.org/n/tip-rd2y4ct71hjawzz6fpvsy9hg@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently the x86 backend incorrectly assumes that any BRANCH_INSN
with sample_period==1 is a BTS request. This is not true when we do
frequency driven profiling such as 'perf record -e branches'.

Solves this error:

  $ perf record -e branches ./array
  Error: sys_perf_event_open() syscall returned with 95 (Operation not supported).

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Reported-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: "Metzger, Markus T" &lt;markus.t.metzger@intel.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Link: http://lkml.kernel.org/n/tip-rd2y4ct71hjawzz6fpvsy9hg@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, perf event: Turn off unstructured raw event access to offcore registers</title>
<updated>2011-04-22T08:02:53+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2011-04-22T06:44:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b52c55c6a25e4515b5e075a989ff346fc251ed09'/>
<id>b52c55c6a25e4515b5e075a989ff346fc251ed09</id>
<content type='text'>
Andi Kleen pointed out that the Intel offcore support patches were merged
without user-space tool support to the functionality:

 |
 | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
 | user space bits were not. This made it impossible to set the extra mask
 | and actually do the OFFCORE profiling
 |

Andi submitted a preliminary patch for user-space support, as an
extension to perf's raw event syntax:

 |
 | Some raw events -- like the Intel OFFCORE events -- support additional
 | parameters. These can be appended after a ':'.
 |
 | For example on a multi socket Intel Nehalem:
 |
 |    perf stat -e r1b7:20ff -a sleep 1
 |
 | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
 | that measures any access to DRAM on another socket.
 |

But this kind of usability is absolutely unacceptable - users should not
be expected to type in magic, CPU and model specific incantations to get
access to useful hardware functionality.

The proper solution is to expose useful offcore functionality via
generalized events - that way users do not have to care which specific
CPU model they are using, they can use the conceptual event and not some
model specific quirky hexa number.

We already have such generalization in place for CPU cache events,
and it's all very extensible.

"Offcore" events measure general DRAM access patters along various
parameters. They are particularly useful in NUMA systems.

We want to support them via generalized DRAM events: either as the
fourth level of cache (after the last-level cache), or as a separate
generalization category.

That way user-space support would be very obvious, memory access
profiling could be done via self-explanatory commands like:

  perf record -e dram ./myapp
  perf record -e dram-remote ./myapp

... to measure DRAM accesses or more expensive cross-node NUMA DRAM
accesses.

These generalized events would work on all CPUs and architectures that
have comparable PMU features.

( Note, these are just examples: actual implementation could have more
  sophistication and more parameter - as long as they center around
  similarly simple usecases. )

Now we do not want to revert *all* of the current offcore bits, as they
are still somewhat useful for generic last-level-cache events, implemented
in this commit:

  e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere

But we definitely do not yet want to expose the unstructured raw events
to user-space, until better generalization and usability is implemented
for these hardware event features.

( Note: after generalization has been implemented raw offcore events can be
  supported as well: there can always be an odd event that is marginally
  useful but not useful enough to generalize. DRAM profiling is definitely
  *not* such a category so generalization must be done first. )

Furthermore, PERF_TYPE_RAW access to these registers was not intended
to go upstream without proper support - it was a side-effect of the above
e994d7d23a0b commit, not mentioned in the changelog.

As v2.6.39 is nearing release we go for the simplest approach: disable
the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
kernel and becomes an ABI.

Once proper structure is implemented for these hardware events and users
are offered usable solutions we can revisit this issue.

Reported-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Andi Kleen pointed out that the Intel offcore support patches were merged
without user-space tool support to the functionality:

 |
 | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
 | user space bits were not. This made it impossible to set the extra mask
 | and actually do the OFFCORE profiling
 |

Andi submitted a preliminary patch for user-space support, as an
extension to perf's raw event syntax:

 |
 | Some raw events -- like the Intel OFFCORE events -- support additional
 | parameters. These can be appended after a ':'.
 |
 | For example on a multi socket Intel Nehalem:
 |
 |    perf stat -e r1b7:20ff -a sleep 1
 |
 | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
 | that measures any access to DRAM on another socket.
 |

But this kind of usability is absolutely unacceptable - users should not
be expected to type in magic, CPU and model specific incantations to get
access to useful hardware functionality.

The proper solution is to expose useful offcore functionality via
generalized events - that way users do not have to care which specific
CPU model they are using, they can use the conceptual event and not some
model specific quirky hexa number.

We already have such generalization in place for CPU cache events,
and it's all very extensible.

"Offcore" events measure general DRAM access patters along various
parameters. They are particularly useful in NUMA systems.

We want to support them via generalized DRAM events: either as the
fourth level of cache (after the last-level cache), or as a separate
generalization category.

That way user-space support would be very obvious, memory access
profiling could be done via self-explanatory commands like:

  perf record -e dram ./myapp
  perf record -e dram-remote ./myapp

... to measure DRAM accesses or more expensive cross-node NUMA DRAM
accesses.

These generalized events would work on all CPUs and architectures that
have comparable PMU features.

( Note, these are just examples: actual implementation could have more
  sophistication and more parameter - as long as they center around
  similarly simple usecases. )

Now we do not want to revert *all* of the current offcore bits, as they
are still somewhat useful for generic last-level-cache events, implemented
in this commit:

  e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere

But we definitely do not yet want to expose the unstructured raw events
to user-space, until better generalization and usability is implemented
for these hardware event features.

( Note: after generalization has been implemented raw offcore events can be
  supported as well: there can always be an odd event that is marginally
  useful but not useful enough to generalize. DRAM profiling is definitely
  *not* such a category so generalization must be done first. )

Furthermore, PERF_TYPE_RAW access to these registers was not intended
to go upstream without proper support - it was a side-effect of the above
e994d7d23a0b commit, not mentioned in the changelog.

As v2.6.39 is nearing release we go for the simplest approach: disable
the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
kernel and becomes an ABI.

Once proper structure is implemented for these hardware events and users
are offered usable solutions we can revisit this issue.

Reported-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE</title>
<updated>2011-04-19T08:08:12+00:00</updated>
<author>
<name>Robert Richter</name>
<email>robert.richter@amd.com</email>
</author>
<published>2011-04-16T00:27:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c8e5910edf8bbe2e5c6c35a4ef2a578cc7893b25'/>
<id>c8e5910edf8bbe2e5c6c35a4ef2a578cc7893b25</id>
<content type='text'>
Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids
an extra pointer chase and data cache hit.

Signed-off-by: Robert Richter &lt;robert.richter@amd.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1302913676-14352-4-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids
an extra pointer chase and data cache hit.

Signed-off-by: Robert Richter &lt;robert.richter@amd.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1302913676-14352-4-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip</title>
<updated>2011-03-26T00:53:09+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-03-26T00:53:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2a20b02c055a14eb60ac8da737d79dc940bb9ee0'/>
<id>2a20b02c055a14eb60ac8da737d79dc940bb9ee0</id>
<content type='text'>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue
  perf, x86: P4 PMU - Read proper MSR register to catch unflagged overflows
  perf symbols: Look at .dynsym again if .symtab not found
  perf build-id: Add quirk to deal with perf.data file format breakage
  perf session: Pass evsel in event_ops-&gt;sample()
  perf: Better fit max unprivileged mlock pages for tools needs
  perf_events: Fix stale -&gt;cgrp pointer in update_cgrp_time_from_cpuctx()
  perf top: Fix uninitialized 'counter' variable
  tracing: Fix set_ftrace_filter probe function display
  perf, x86: Fix Intel fixed counters base initialization
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue
  perf, x86: P4 PMU - Read proper MSR register to catch unflagged overflows
  perf symbols: Look at .dynsym again if .symtab not found
  perf build-id: Add quirk to deal with perf.data file format breakage
  perf session: Pass evsel in event_ops-&gt;sample()
  perf: Better fit max unprivileged mlock pages for tools needs
  perf_events: Fix stale -&gt;cgrp pointer in update_cgrp_time_from_cpuctx()
  perf top: Fix uninitialized 'counter' variable
  tracing: Fix set_ftrace_filter probe function display
  perf, x86: Fix Intel fixed counters base initialization
</pre>
</div>
</content>
</entry>
<entry>
<title>perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue</title>
<updated>2011-03-25T10:23:41+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2011-03-25T09:24:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=45daae575e08bbf7405c5a3633e956fa364d1b4f'/>
<id>45daae575e08bbf7405c5a3633e956fa364d1b4f</id>
<content type='text'>
Eric Dumazet reported that hardware PMU events do not work on his
system, due to the BIOS corrupting PMU state:

    Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only.
    [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 43003c)

Linus suggested that we continue in the face of such BIOS-induced CPU
state corruption:

   http://lkml.org/lkml/2011/3/24/608

Such BIOSes will have to be fixed - Linux developers rely on a working and
fully capable PMU and the BIOS interfering with the CPU's PMU state is simply
not acceptable.

So this patch changes perf to continue when it detects such BIOS
interaction, some hardware events may be unreliable due to the BIOS
writing and re-writing them - there's not much the kernel can do
about that but to detect the corruption and report it.

Reported-and-tested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
LKML-Reference: &lt;new-submission&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Eric Dumazet reported that hardware PMU events do not work on his
system, due to the BIOS corrupting PMU state:

    Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only.
    [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 43003c)

Linus suggested that we continue in the face of such BIOS-induced CPU
state corruption:

   http://lkml.org/lkml/2011/3/24/608

Such BIOSes will have to be fixed - Linux developers rely on a working and
fully capable PMU and the BIOS interfering with the CPU's PMU state is simply
not acceptable.

So this patch changes perf to continue when it detects such BIOS
interaction, some hardware events may be unreliable due to the BIOS
writing and re-writing them - there's not much the kernel can do
about that but to detect the corruption and report it.

Reported-and-tested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
LKML-Reference: &lt;new-submission&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf, x86: Fix Intel fixed counters base initialization</title>
<updated>2011-03-19T18:00:49+00:00</updated>
<author>
<name>Stephane Eranian</name>
<email>eranian@google.com</email>
</author>
<published>2011-03-19T17:20:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fc66c5210ec2539e800e87d7b3a985323c7be96e'/>
<id>fc66c5210ec2539e800e87d7b3a985323c7be96e</id>
<content type='text'>
The following patch solves the problems introduced by Robert's
commit 41bf498 and reported by Arun Sharma. This commit gets rid
of the base + index notation for reading and writing PMU msrs.

The problem is that for fixed counters, the new calculation for
the base did not take into account the fixed counter indexes,
thus all fixed counters were read/written from fixed counter 0.
Although all fixed counters share the same config MSR, they each
have their own counter register.

Without:

 $ task -e unhalted_core_cycles -e instructions_retired -e baclears noploop 1 noploop for 1 seconds

  242202299 unhalted_core_cycles (0.00% scaling, ena=1000790892, run=1000790892)
 2389685946 instructions_retired (0.00% scaling, ena=1000790892, run=1000790892)
      49473 baclears             (0.00% scaling, ena=1000790892, run=1000790892)

With:

 $ task -e unhalted_core_cycles -e instructions_retired -e baclears noploop 1 noploop for 1 seconds

 2392703238 unhalted_core_cycles (0.00% scaling, ena=1000840809, run=1000840809)
 2389793744 instructions_retired (0.00% scaling, ena=1000840809, run=1000840809)
      47863 baclears             (0.00% scaling, ena=1000840809, run=1000840809)

Signed-off-by: Stephane Eranian &lt;eranian@google.com&gt;
Cc: peterz@infradead.org
Cc: ming.m.lin@intel.com
Cc: robert.richter@amd.com
Cc: asharma@fb.com
Cc: perfmon2-devel@lists.sf.net
LKML-Reference: &lt;20110319172005.GB4978@quad&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following patch solves the problems introduced by Robert's
commit 41bf498 and reported by Arun Sharma. This commit gets rid
of the base + index notation for reading and writing PMU msrs.

The problem is that for fixed counters, the new calculation for
the base did not take into account the fixed counter indexes,
thus all fixed counters were read/written from fixed counter 0.
Although all fixed counters share the same config MSR, they each
have their own counter register.

Without:

 $ task -e unhalted_core_cycles -e instructions_retired -e baclears noploop 1 noploop for 1 seconds

  242202299 unhalted_core_cycles (0.00% scaling, ena=1000790892, run=1000790892)
 2389685946 instructions_retired (0.00% scaling, ena=1000790892, run=1000790892)
      49473 baclears             (0.00% scaling, ena=1000790892, run=1000790892)

With:

 $ task -e unhalted_core_cycles -e instructions_retired -e baclears noploop 1 noploop for 1 seconds

 2392703238 unhalted_core_cycles (0.00% scaling, ena=1000840809, run=1000840809)
 2389793744 instructions_retired (0.00% scaling, ena=1000840809, run=1000840809)
      47863 baclears             (0.00% scaling, ena=1000840809, run=1000840809)

Signed-off-by: Stephane Eranian &lt;eranian@google.com&gt;
Cc: peterz@infradead.org
Cc: ming.m.lin@intel.com
Cc: robert.richter@amd.com
Cc: asharma@fb.com
Cc: perfmon2-devel@lists.sf.net
LKML-Reference: &lt;20110319172005.GB4978@quad&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
</feed>
