linux-toradex.git, branch v4.4.7

Linux 4.4.7

2016-04-12T16:09:26+00:00

perf/x86/intel: Fix PEBS data source interpretation on Nehalem/Westmere

2016-04-12T16:09:06+00:00

commit e17dc65328057c00db7e1bfea249c8771a78b30b upstream.

Jiri reported some time ago that some entries in the PEBS data source table
in perf do not agree with the SDM. We investigated and the bits
changed for Sandy Bridge, but the SDM was not updated.

perf already implements the bits correctly for Sandy Bridge
and later. This patch patches it up for Nehalem and Westmere.

Signed-off-by: Andi Kleen 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1456871124-15985-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

perf/x86/intel: Use PAGE_SIZE for PEBS buffer size on Core2

2016-04-12T16:09:06+00:00

commit e72daf3f4d764c47fb71c9bdc7f9c54a503825b1 upstream.

Using buffers larger than PAGE_SIZE makes the WRMSR to PERF_GLOBAL_CTRL
in intel_pmu_enable_all() mysteriously hang on Core2. As a workaround,
limit the PEBS buffer to PAGE_SIZE on Core2.

The hard lockup is easily triggered by running 'perf test attr'
repeatedly. Most of the time it gets stuck on a sample session with
small periods.

  # perf test attr -vv
  14: struct perf_event_attr setup                             :
  --- start ---
  ...
    'PERF_TEST_ATTR=/tmp/tmpuEKz3B /usr/bin/perf record -o /tmp/tmpuEKz3B/perf.data -c 123 kill >/dev/null 2>&1' ret 1

Reported-by: Arnaldo Carvalho de Melo 
Signed-off-by: Jiri Olsa 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: Wang Nan 
Link: http://lkml.kernel.org/r/20160301190352.GA8355@krava.redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

perf/x86/intel: Fix PEBS warning by only restoring active PMU in pmi

2016-04-12T16:09:06+00:00

commit c3d266c8a9838cc141b69548bc3b1b18808ae8c4 upstream.

This patch tries to fix a PEBS warning found in my stress test. The
following perf command can easily trigger the PEBS warning or a spurious
NMI error on Skylake/Broadwell/Haswell platforms:

  sudo perf record -e 'cpu/umask=0x04,event=0xc4/pp,cycles,branches,ref-cycles,cache-misses,cache-references' --call-graph fp -b -c1000 -a

Also the NMI watchdog must be enabled.

In this case the number of events is larger than the number of
counters, so perf has to multiplex.

perf_mux_hrtimer_handler() calls perf_pmu_disable(), schedules out the
old events, rotates the context (rotate_ctx), schedules in the new
events and finally calls perf_pmu_enable().

If the old events include a precise event, MSR_IA32_PEBS_ENABLE should
be cleared by perf_pmu_disable() and should stay 0 until
perf_pmu_enable() is called and the new event is a precise event.

However, there is a corner case which can restore PEBS_ENABLE to a
stale value during that window. perf_pmu_disable() sets GLOBAL_CTRL to
0 to stop further overflows and the PMIs that follow them. But a PMI
from an earlier overflow may already be pending, and it cannot be
stopped. So even with GLOBAL_CTRL cleared, the kernel can still
receive a PMI. At the end of the PMI handler,
__intel_pmu_enable_all() is called, which restores the stale values if
the old events have not been scheduled out yet.

Once the stale PEBS value is set, it cannot be corrected if the new
events are non-precise: pebs_enabled is then set to 0, so
x86_pmu.enable_all() ignores the MSR_IA32_PEBS_ENABLE setting. As a
result, the next NMI arrives with a stale PEBS_ENABLE and triggers the
PEBS warning.

The pending PMI after enabled=0 becomes harmless if the NMI handler
does not change the state. This patch checks cpuc->enabled in the PMI
handler and only restores the state when the PMU is active.
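
The effect of the check can be illustrated with a small user-space
model. This is only a sketch under stated assumptions: struct pmu_model
and the model_* helpers are hypothetical stand-ins for the per-CPU
state and for the MSR write, not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-CPU PMU state (not the kernel's struct). */
struct pmu_model {
	bool     enabled;          /* plays the role of cpuc->enabled */
	uint64_t pebs_enabled;     /* software copy of the PEBS enable mask */
	uint64_t msr_pebs_enable;  /* stands in for MSR_IA32_PEBS_ENABLE */
};

/* Model of perf_pmu_disable(): mark the PMU inactive and clear the MSR. */
static void model_pmu_disable(struct pmu_model *p)
{
	assert(p);
	p->enabled = false;
	p->msr_pebs_enable = 0;
}

/*
 * Model of the tail of the PMI handler. With check_enabled == false the
 * state is restored unconditionally (the old behaviour); with true it is
 * only restored while the PMU is active (the behaviour described above).
 */
static void model_pmi_tail(struct pmu_model *p, bool check_enabled)
{
	assert(p);
	if (check_enabled && !p->enabled)
		return;                         /* pending PMI stays harmless */
	p->msr_pebs_enable = p->pebs_enabled;   /* may write back a stale value */
}
```

In this model, a PMI that arrives after model_pmu_disable() rewrites
the stale mask only when the check is skipped.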

Here is the dump:

  Call Trace:
   dump_stack+0x63/0x85
   warn_slowpath_common+0x82/0xc0
   warn_slowpath_null+0x1a/0x20
   intel_pmu_drain_pebs_nhm+0x2be/0x320
   intel_pmu_handle_irq+0x279/0x460
   ? native_write_msr_safe+0x6/0x40
   ? vunmap_page_range+0x20d/0x330
   ? unmap_kernel_range_noflush+0x11/0x20
   ? ghes_copy_tofrom_phys+0x10f/0x2a0
   ? ghes_read_estatus+0x98/0x170
   perf_event_nmi_handler+0x2d/0x50
   nmi_handle+0x69/0x120
   default_do_nmi+0xe6/0x100
   do_nmi+0xe2/0x130
   end_repeat_nmi+0x1a/0x1e
   ? native_write_msr_safe+0x6/0x40
   ? native_write_msr_safe+0x6/0x40
   ? native_write_msr_safe+0x6/0x40
   ? x86_perf_event_set_period+0xd8/0x180
   x86_pmu_start+0x4c/0x100
   x86_pmu_enable+0x28d/0x300
   perf_pmu_enable.part.81+0x7/0x10
   perf_mux_hrtimer_handler+0x200/0x280
   ? __perf_install_in_context+0xc0/0xc0
   __hrtimer_run_queues+0xfd/0x280
   hrtimer_interrupt+0xa8/0x190
   ? __perf_read_group_add.part.61+0x1a0/0x1a0
   local_apic_timer_interrupt+0x38/0x60
   smp_apic_timer_interrupt+0x3d/0x50
   apic_timer_interrupt+0x8c/0xa0
   ? __perf_read_group_add.part.61+0x1a0/0x1a0
   ? smp_call_function_single+0xd5/0x130
   ? smp_call_function_single+0xcb/0x130
   ? __perf_read_group_add.part.61+0x1a0/0x1a0
   event_function_call+0x10a/0x120
   ? ctx_resched+0x90/0x90
   ? cpu_clock_event_read+0x30/0x30
   ? _perf_event_disable+0x60/0x60
   _perf_event_enable+0x5b/0x70
   perf_event_for_each_child+0x38/0xa0
   ? _perf_event_disable+0x60/0x60
   perf_ioctl+0x12d/0x3c0
   ? selinux_file_ioctl+0x95/0x1e0
   do_vfs_ioctl+0xa1/0x5a0
   ? sched_clock+0x9/0x10
   SyS_ioctl+0x79/0x90
   entry_SYSCALL_64_fastpath+0x1a/0xa4
  ---[ end trace aef202839fe9a71d ]---
  Uhhuh. NMI received for unknown reason 2d on CPU 2.
  Do you have a strange power saving mode enabled?

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: http://lkml.kernel.org/r/1457046448-6184-1-git-send-email-kan.liang@intel.com
[ Fixed various typos and other small details. ]
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

perf/x86/pebs: Add workaround for broken OVFL status on HSW+

2016-04-12T16:09:05+00:00

commit 8077eca079a212f26419c57226f28696b7100683 upstream.

This patch fixes an issue with the GLOBAL_OVERFLOW_STATUS bits on
Haswell, Broadwell and Skylake processors when using PEBS.

The SDM stipulates that when the PEBS interrupt threshold is crossed,
an interrupt is posted and the kernel is interrupted. The kernel will
find GLOBAL_OVF_STATUS bit 62 set indicating there are PEBS records to
drain. But the bits corresponding to the actual counters should NOT be
set. The kernel follows the SDM and assumes that all PEBS events are
processed in the drain_pebs() callback. The kernel then checks for
remaining overflows on any other (non-PEBS) events and processes these
in the for_each_bit_set(&status) loop.

As it turns out, under certain conditions on HSW and later processors,
on PEBS buffer interrupt, bit 62 is set but the counter bits may be
set as well. In that case, the kernel drains PEBS and generates
SAMPLES with the EXACT tag, then it processes the counter bits, and
generates normal (non-EXACT) SAMPLES.

I ran into this problem by trying to understand why on HSW sampling on
a PEBS event was sometimes returning SAMPLES without the EXACT tag.
This should not happen on user level code because HSW has the
eventing_ip which always points to the instruction that caused the
event.

The workaround in this patch simply ensures that the bits for the
counters used for PEBS events are cleared after the PEBS buffer has
been drained. With this fix 100% of the PEBS samples on my user code
report the EXACT tag.
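
The masking step described above can be sketched as a pure function on
the overflow status word. This is an illustrative model, not the
kernel's handler: model_status_after_drain() and its parameter names
are hypothetical, and the mask is assumed to carry one bit per counter
that is used for a PEBS event.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_OVF_PEBS_BIT 62	/* "PEBS records to drain" bit, per the text */

/*
 * Return the status bits that are still left for the regular
 * (non-PEBS) overflow loop after the PEBS buffer has been drained.
 */
static uint64_t model_status_after_drain(uint64_t status,
					 uint64_t pebs_counter_mask)
{
	const uint64_t pebs_ovf = UINT64_C(1) << MODEL_OVF_PEBS_BIT;

	/* This sketch only models counter bits in the mask. */
	assert((pebs_counter_mask & pebs_ovf) == 0);

	if (status & pebs_ovf) {
		status &= ~pebs_ovf;           /* PEBS buffer is drained here */
		status &= ~pebs_counter_mask;  /* already reported as EXACT */
	}
	return status;
}
```

With bit 62 set, counter 0 used for PEBS and counters 0 and 2 flagged,
only counter 2 remains for the regular loop in this model.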

Before:
  $ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
  $ perf report -D | fgrep SAMPLES
  PERF_RECORD_SAMPLE(IP, 0x2): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
                           \--- EXACT tag is missing

After:
  $ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
  $ perf report -D | fgrep SAMPLES
  PERF_RECORD_SAMPLE(IP, 0x4002): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
                           \--- EXACT tag is set

The problem tends to appear more often when multiple PEBS events are used.
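
For readers comparing the two dumps: the second value printed in
PERF_RECORD_SAMPLE(IP, ...) appears to be the record header's misc
field, and the EXACT tag corresponds to bit 14 of it. A minimal decoder,
assuming that reading of the output, could look as follows; the helper
names are made up for this sketch, while the constants mirror
include/uapi/linux/perf_event.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants mirroring include/uapi/linux/perf_event.h */
#define PERF_RECORD_MISC_CPUMODE_MASK	(7 << 0)
#define PERF_RECORD_MISC_USER		(2 << 0)
#define PERF_RECORD_MISC_EXACT_IP	(1 << 14)

static_assert(PERF_RECORD_MISC_EXACT_IP == 0x4000, "EXACT tag is bit 14");

/* True if the sample carries the EXACT tag. */
static bool sample_has_exact_ip(uint16_t misc)
{
	return (misc & PERF_RECORD_MISC_EXACT_IP) != 0;
}

/* True if the sample was taken in user mode. */
static bool sample_is_user(uint16_t misc)
{
	return (misc & PERF_RECORD_MISC_CPUMODE_MASK) == PERF_RECORD_MISC_USER;
}
```

Under that assumption, 0x2 decodes as a user-mode sample without the
EXACT tag and 0x4002 as a user-mode sample with it.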

Signed-off-by: Stephane Eranian 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/1457034642-21837-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

sched/cputime: Fix steal time accounting vs. CPU hotplug

2016-04-12T16:09:05+00:00

commit e9532e69b8d1d1284e8ecf8d2586de34aec61244 upstream.

On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
value over CPU down and up. So after the CPU comes up again the delta
calculation in steal_account_process_tick() goes wrong due to the
unsigned math:

	 u64 steal = paravirt_steal_clock(smp_processor_id());

	 steal -= this_rq()->prev_steal_time;

So if steal is smaller than rq->prev_steal_time we end up with an insanely
large value which then gets added to rq->prev_steal_time, permanently
wrecking the accounting. As a consequence the per-CPU stats in /proc/stat
become stale.

Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.

None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the /proc/stat visible wreckage. The
prev_time values are still wrong.

The solution is simple: Reset rq->prev_*_time when the CPU is plugged in again.
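
The wrap-around and the effect of the reset can be reproduced in user
space. The following is a sketch only: struct rq_model and the model_*
helpers are hypothetical and track a single prev_steal_time value,
whereas the text above speaks of all rq->prev_*_time fields.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical stand-in for the runqueue field used above. */
struct rq_model {
	u64 prev_steal_time;
};

/* Same arithmetic as the quoted snippet: steal -= prev_steal_time. */
static u64 model_steal_delta(const struct rq_model *rq, u64 steal_clock)
{
	assert(rq);
	/* Unsigned subtraction wraps modulo 2^64 if prev is larger. */
	return steal_clock - rq->prev_steal_time;
}

/* Model of the fix: forget the stale baseline when the CPU comes up. */
static void model_reset_on_cpu_up(struct rq_model *rq)
{
	assert(rq);
	rq->prev_steal_time = 0;
}
```

With a stale baseline of 500 and a new steal clock reading of 100, the
delta in this model is 2^64 - 400 rather than a small number; after the
reset the same reading yields 100.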

Signed-off-by: Thomas Gleixner 
Acked-by: Rik van Riel 
Cc: Frederic Weisbecker 
Cc: Glauber Costa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa483808516c "sched: Remove irq time from available CPU power"
Fixes: commit e6e6685accfa "KVM guest: Steal time accounting"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

scsi_common: do not clobber fixed sense information

2016-04-12T16:09:05+00:00

commit ba08311647892cc7912de74525fd78416caf544a upstream.

For fixed sense the information field is 32 bits, so we need to truncate
the information field to avoid clobbering the sense code.
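
As an illustration of the truncation: in fixed-format sense data the
INFORMATION field occupies bytes 3 to 6, followed by the additional
sense length in byte 7 and the sense code further on. The sketch below
writes only those four bytes. store_be32() and
fixed_sense_set_information() are hypothetical user-space helpers
written for this example, not the kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in big-endian byte order at p. */
static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Set the INFORMATION field of a fixed-format sense buffer.
 * Only the low 32 bits of info fit; bytes 7 and up are left untouched.
 */
static void fixed_sense_set_information(uint8_t *buf, size_t len,
					uint64_t info)
{
	assert(buf && len >= 8);
	buf[0] |= 0x80;                      /* VALID bit */
	store_be32(&buf[3], (uint32_t)info); /* bytes 3..6 only */
}
```

Writing all 64 bits at offset 3 would instead run on through byte 10
and overwrite the fields that follow the INFORMATION field.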

Fixes: a1524f226a02 ("libata-eh: Set 'information' field for autosense")
Signed-off-by: Hannes Reinecke 
Reviewed-by: Lee Duncan 
Reviewed-by: Bart Van Assche 
Reviewed-by: Ewan D. Milne 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Greg Kroah-Hartman

PM / sleep: Clear pm_suspend_global_flags upon hibernate

2016-04-12T16:09:05+00:00

commit 276142730c39c9839465a36a90e5674a8c34e839 upstream.

When suspending to RAM, waking up and later suspending to disk,
we gratuitously runtime resume devices after the thaw phase.
This does not occur if we always suspend to RAM or always to disk.

pm_complete_with_resume_check(), which gets called from
pci_pm_complete() among others, schedules a runtime resume
if PM_SUSPEND_FLAG_FW_RESUME is set. The flag is set during
a suspend-to-RAM cycle. It is cleared at the beginning of
the suspend-to-RAM cycle but not afterwards and it is not
cleared during a suspend-to-disk cycle at all. Fix it.
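
The sequence can be modelled with a single flags word. This is a rough
sketch, not the PM core: struct pm_model, the model_* helpers and the
bit value of MODEL_FLAG_FW_RESUME are assumptions made for
illustration, and the clear_flags parameter selects between the old and
the fixed behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Stands in for PM_SUSPEND_FLAG_FW_RESUME; the bit value is illustrative. */
#define MODEL_FLAG_FW_RESUME (1u << 1)

/* Hypothetical model of pm_suspend_global_flags. */
struct pm_model {
	unsigned int flags;
};

/* Suspend-to-RAM: flags are cleared at the start, then possibly set. */
static void model_suspend_to_ram(struct pm_model *pm, bool firmware_resumed)
{
	assert(pm);
	pm->flags = 0;
	if (firmware_resumed)
		pm->flags |= MODEL_FLAG_FW_RESUME;
}

/* Suspend-to-disk: only clears the flags when clear_flags is true. */
static void model_hibernate(struct pm_model *pm, bool clear_flags)
{
	assert(pm);
	if (clear_flags)
		pm->flags = 0;
}

/* Would the completion path schedule a runtime resume? */
static bool model_complete_schedules_resume(const struct pm_model *pm)
{
	assert(pm);
	return (pm->flags & MODEL_FLAG_FW_RESUME) != 0;
}
```

In this model a suspend-to-RAM cycle followed by hibernation leaves the
flag set unless the hibernate path clears it first.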

Fixes: ef25ba047601 (PM / sleep: Add flags to indicate platform firmware involvement)
Signed-off-by: Lukas Wunner 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman

intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled

2016-04-12T16:09:05+00:00

commit d70e28f57e14a481977436695b0c9ba165472431 upstream.

Some SKL-H configurations require "intel_idle.max_cstate=7" to boot.
While that is an effective workaround, it disables C10.

This patch detects the problematic configuration,
and disables C8 and C9, keeping C10 enabled.

Note that enabling SGX in BIOS SETUP can also prevent this issue,
if the system BIOS provides that option.

https://bugzilla.kernel.org/show_bug.cgi?id=109081
"Freezes with Intel i7 6700HQ (Skylake), unless intel_idle.max_cstate=7"

Signed-off-by: Len Brown 
Signed-off-by: Greg Kroah-Hartman

mtd: onenand: fix deadlock in onenand_block_markbad

2016-04-12T16:09:05+00:00

commit 5e64c29e98bfbba1b527b0a164f9493f3db9e8cb upstream.

Commit 5942ddbc500d ("mtd: introduce mtd_block_markbad interface")
incorrectly changed onenand_block_markbad() to call mtd_block_markbad
instead of onenand_chip's block_markbad function. As a result the function
will now recurse and deadlock. Fix by reverting the change.
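
The recursion can be shown with a toy model in which the driver entry
point takes a non-recursive lock and then either calls back into the
generic entry point (the bug) or calls the chip-level hook (the fix).
All names below are hypothetical, and the lock is modelled as a flag
that records a second acquisition instead of actually blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the chip state. */
struct onenand_model {
	bool lock_held;
	bool deadlocked;	/* set when the lock is taken twice */
	bool marked;		/* set when the chip hook has run */
};

static int model_lock(struct onenand_model *c)
{
	if (c->lock_held) {
		c->deadlocked = true;	/* a real lock would block forever */
		return -1;
	}
	c->lock_held = true;
	return 0;
}

/* Chip-level hook: does the actual work. */
static int chip_block_markbad(struct onenand_model *c)
{
	c->marked = true;
	return 0;
}

static int driver_block_markbad(struct onenand_model *c, bool buggy);

/* Generic entry point: dispatches to the driver. */
static int generic_block_markbad(struct onenand_model *c, bool buggy)
{
	return driver_block_markbad(c, buggy);
}

/* Driver entry point: takes the lock, then calls one level down. */
static int driver_block_markbad(struct onenand_model *c, bool buggy)
{
	int ret;

	assert(c);
	if (model_lock(c))
		return -1;
	ret = buggy ? generic_block_markbad(c, buggy)	/* re-enters itself */
		    : chip_block_markbad(c);
	c->lock_held = false;
	return ret;
}
```

In the buggy variant the second lock attempt is recorded as a deadlock
and the block is never marked; in the fixed variant the chip hook runs
once.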

Fixes: 5942ddbc500d ("mtd: introduce mtd_block_markbad interface")
Signed-off-by: Aaro Koskinen 
Acked-by: Artem Bityutskiy 
Signed-off-by: Brian Norris 
Signed-off-by: Greg Kroah-Hartman