<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/time, branch v3.17-rc4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>timekeeping: Update timekeeper before updating vsyscall and pvclock</title>
<updated>2014-09-06T10:58:18+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-09-06T10:24:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9bf2419fa7bffa16ce58a4d5c20399eff8c970c9'/>
<id>9bf2419fa7bffa16ce58a4d5c20399eff8c970c9</id>
<content type='text'>
The update_walltime() code works on the shadow timekeeper to make the
seqcount protected region as short as possible. But that update to the
shadow timekeeper does not update all timekeeper fields because it's
sufficient to do that once before it becomes live. One of these fields
is tkr.base_mono. That stays stale in the shadow timekeeper unless an
operation happens which copies the real timekeeper to the shadow.

The update function is called after the update calls to vsyscall and
pvclock. While not correct, it did not cause any problems because none
of the invoked update functions used base_mono.

commit cbcf2dd3b3d4 (x86: kvm: Make kvm_get_time_and_clockread()
nanoseconds based) changed that in the kvm pvclock update function, so
the stale base_mono value got used and caused kvm-clock to malfunction.

Put the update where it belongs and fix the issue.
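
[ Ed. sketch: a minimal Python model of the ordering problem, not the
  kernel code. The Timekeeper class, refresh_derived(), tick() and
  pvclock_listener() are hypothetical names chosen for illustration;
  only base_mono comes from the text above. It shows that a listener
  notified before the derived field is refreshed sees a stale value. ]

```python
from dataclasses import dataclass, replace

@dataclass
class Timekeeper:
    # Hypothetical, heavily simplified stand-in for struct timekeeper.
    wall_ns: int = 0          # wall-clock time in nanoseconds
    wall_to_mono_ns: int = 0  # offset from wall time to monotonic time
    base_mono: int = 0        # derived field: wall_ns + wall_to_mono_ns

def refresh_derived(tk):
    # Stand-in for the "update function" of the commit message.
    tk.base_mono = tk.wall_ns + tk.wall_to_mono_ns

def tick(shadow, delta_ns, listeners, refresh_first):
    # Advance the shadow copy, notify the listeners, then publish it.
    shadow.wall_ns += delta_ns
    if refresh_first:
        refresh_derived(shadow)      # fixed ordering
    seen = [listener(shadow) for listener in listeners]
    if not refresh_first:
        refresh_derived(shadow)      # old ordering: too late for listeners
    return replace(shadow), seen     # published copy, values listeners saw

def pvclock_listener(tk):
    # Hypothetical consumer that reads base_mono during its update.
    return tk.base_mono

old_live, old_seen = tick(Timekeeper(), 1000, [pvclock_listener], refresh_first=False)
new_live, new_seen = tick(Timekeeper(), 1000, [pvclock_listener], refresh_first=True)
```

In both orderings the published copy ends up consistent; only the value
observed by the listener differs, which is why the old ordering went
unnoticed until a listener started reading base_mono.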

Reported-by: Chris J Arges &lt;chris.j.arges@canonical.com&gt;
Reported-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Gleb Natapov &lt;gleb@kernel.org&gt;
Cc: John Stultz &lt;john.stultz@linaro.org&gt;
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanos
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The update_walltime() code works on the shadow timekeeper to make the
seqcount protected region as short as possible. But that update to the
shadow timekeeper does not update all timekeeper fields because it's
sufficient to do that once before it becomes live. One of these fields
is tkr.base_mono. That stays stale in the shadow timekeeper unless an
operation happens which copies the real timekeeper to the shadow.

The update function is called after the update calls to vsyscall and
pvclock. While not correct, it did not cause any problems because none
of the invoked update functions used base_mono.

commit cbcf2dd3b3d4 (x86: kvm: Make kvm_get_time_and_clockread()
nanoseconds based) changed that in the kvm pvclock update function, so
the stale base_mono value got used and caused kvm-clock to malfunction.

Put the update where it belongs and fix the issue.

Reported-by: Chris J Arges &lt;chris.j.arges@canonical.com&gt;
Reported-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Gleb Natapov &lt;gleb@kernel.org&gt;
Cc: John Stultz &lt;john.stultz@linaro.org&gt;
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanos
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nohz: Restore NMI safe local irq work for local nohz kick</title>
<updated>2014-09-04T20:35:59+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2014-08-13T16:50:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=40bea039593dfc7f3f9814dab844f6db43ae580b'/>
<id>40bea039593dfc7f3f9814dab844f6db43ae580b</id>
<content type='text'>
The local nohz kick is currently used by perf, which needs it to be
NMI-safe. However, a recent commit (7d1311b93e58ed55f3a31cc8f94c4b8fe988a2b9)
changed its implementation to fire the local kick using the remote kick
API. That made the code more generic, but the remote kick isn't
NMI-safe.

As a result:

	WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
	CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
	0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
	0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
	0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
	Call Trace:
	&lt;NMI&gt;  [&lt;ffffffff9a7f1e37&gt;] dump_stack+0x4e/0x7a
	[&lt;ffffffff9a0791dd&gt;] warn_slowpath_common+0x7d/0xa0
	[&lt;ffffffff9a07930a&gt;] warn_slowpath_null+0x1a/0x20
	[&lt;ffffffff9a17ca1e&gt;] irq_work_queue_on+0x11e/0x140
	[&lt;ffffffff9a10a2c7&gt;] tick_nohz_full_kick_cpu+0x57/0x90
	[&lt;ffffffff9a186cd5&gt;] __perf_event_overflow+0x275/0x350
	[&lt;ffffffff9a184f80&gt;] ? perf_event_task_disable+0xa0/0xa0
	[&lt;ffffffff9a01a4cf&gt;] ? x86_perf_event_set_period+0xbf/0x150
	[&lt;ffffffff9a187934&gt;] perf_event_overflow+0x14/0x20
	[&lt;ffffffff9a020386&gt;] intel_pmu_handle_irq+0x206/0x410
	[&lt;ffffffff9a0b54d3&gt;] ? arch_vtime_task_switch+0x63/0x130
	[&lt;ffffffff9a01937b&gt;] perf_event_nmi_handler+0x2b/0x50
	[&lt;ffffffff9a007b72&gt;] nmi_handle+0xd2/0x390
	[&lt;ffffffff9a007aa5&gt;] ? nmi_handle+0x5/0x390
	[&lt;ffffffff9a0d131b&gt;] ? lock_release+0xab/0x330
	[&lt;ffffffff9a008062&gt;] default_do_nmi+0x72/0x1c0
	[&lt;ffffffff9a0c925f&gt;] ? cpuacct_account_field+0xcf/0x200
	[&lt;ffffffff9a008268&gt;] do_nmi+0xb8/0x100

Let's fix this by restoring the use of local irq work for the nohz local
kick.

Reported-by: Catalin Iacob &lt;iacobcatalin@gmail.com&gt;
Reported-and-tested-by: Dave Jones &lt;davej@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The local nohz kick is currently used by perf, which needs it to be
NMI-safe. However, a recent commit (7d1311b93e58ed55f3a31cc8f94c4b8fe988a2b9)
changed its implementation to fire the local kick using the remote kick
API. That made the code more generic, but the remote kick isn't
NMI-safe.

As a result:

	WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
	CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
	0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
	0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
	0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
	Call Trace:
	&lt;NMI&gt;  [&lt;ffffffff9a7f1e37&gt;] dump_stack+0x4e/0x7a
	[&lt;ffffffff9a0791dd&gt;] warn_slowpath_common+0x7d/0xa0
	[&lt;ffffffff9a07930a&gt;] warn_slowpath_null+0x1a/0x20
	[&lt;ffffffff9a17ca1e&gt;] irq_work_queue_on+0x11e/0x140
	[&lt;ffffffff9a10a2c7&gt;] tick_nohz_full_kick_cpu+0x57/0x90
	[&lt;ffffffff9a186cd5&gt;] __perf_event_overflow+0x275/0x350
	[&lt;ffffffff9a184f80&gt;] ? perf_event_task_disable+0xa0/0xa0
	[&lt;ffffffff9a01a4cf&gt;] ? x86_perf_event_set_period+0xbf/0x150
	[&lt;ffffffff9a187934&gt;] perf_event_overflow+0x14/0x20
	[&lt;ffffffff9a020386&gt;] intel_pmu_handle_irq+0x206/0x410
	[&lt;ffffffff9a0b54d3&gt;] ? arch_vtime_task_switch+0x63/0x130
	[&lt;ffffffff9a01937b&gt;] perf_event_nmi_handler+0x2b/0x50
	[&lt;ffffffff9a007b72&gt;] nmi_handle+0xd2/0x390
	[&lt;ffffffff9a007aa5&gt;] ? nmi_handle+0x5/0x390
	[&lt;ffffffff9a0d131b&gt;] ? lock_release+0xab/0x330
	[&lt;ffffffff9a008062&gt;] default_do_nmi+0x72/0x1c0
	[&lt;ffffffff9a0c925f&gt;] ? cpuacct_account_field+0xcf/0x200
	[&lt;ffffffff9a008268&gt;] do_nmi+0xb8/0x100

Let's fix this by restoring the use of local irq work for the nohz local
kick.

Reported-by: Catalin Iacob &lt;iacobcatalin@gmail.com&gt;
Reported-and-tested-by: Dave Jones &lt;davej@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>timekeeping: Another fix to the VSYSCALL_OLD update_vsyscall</title>
<updated>2014-08-14T17:04:11+00:00</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2014-08-13T19:47:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0680eb1f485ba5aac2ee02c9f0622239c9a4b16c'/>
<id>0680eb1f485ba5aac2ee02c9f0622239c9a4b16c</id>
<content type='text'>
Benjamin Herrenschmidt pointed out that I further missed modifying
update_vsyscall after the wall_to_mono value was changed to a
timespec64.  This causes issues on powerpc32, which expects a 32bit
timespec.

This patch fixes the problem by properly converting from a timespec64 to
a timespec before passing the value on to the arch-specific vsyscall
logic.
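
[ Ed. sketch: a small Python illustration of an explicit field-by-field
  conversion, not the kernel helper. Timespec64, Timespec32 and
  timespec64_to_timespec32() are hypothetical names, and the 32-bit
  signed seconds field is an assumption about the target. ]

```python
import ctypes
from collections import namedtuple

# Hypothetical stand-ins for the two C structs; field names follow the
# usual tv_sec/tv_nsec convention.
Timespec64 = namedtuple("Timespec64", ["tv_sec", "tv_nsec"])
Timespec32 = namedtuple("Timespec32", ["tv_sec", "tv_nsec"])

def timespec64_to_timespec32(ts64):
    # Convert field by field instead of handing over the wider struct.
    # c_int32 models a 32-bit signed seconds field; out-of-range values
    # wrap the way a typical two's-complement cast does.
    sec32 = ctypes.c_int32(ts64.tv_sec).value
    return Timespec32(sec32, ts64.tv_nsec)

small = timespec64_to_timespec32(Timespec64(1407959234, 500))
wrapped = timespec64_to_timespec32(Timespec64(2**31, 0))
```

Values that fit in 32 bits pass through unchanged, while the second
call shows the wraparound that a narrow seconds field implies.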

[ Thomas is currently on vacation, but reviewed it and wanted me to send
  this fix on to you directly. ]

Cc: LKML &lt;linux-kernel@vger.kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Reported-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Benjamin Herrenschmidt pointed out that I further missed modifying
update_vsyscall after the wall_to_mono value was changed to a
timespec64.  This causes issues on powerpc32, which expects a 32bit
timespec.

This patch fixes the problem by properly converting from a timespec64 to
a timespec before passing the value on to the arch-specific vsyscall
logic.

[ Thomas is currently on vacation, but reviewed it and wanted me to send
  this fix on to you directly. ]

Cc: LKML &lt;linux-kernel@vger.kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Reported-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2014-08-06T00:46:42+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-08-06T00:46:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e7fda6c4c3c1a7d6996dd75fd84670fa0b5d448f'/>
<id>e7fda6c4c3c1a7d6996dd75fd84670fa0b5d448f</id>
<content type='text'>
Pull timer and time updates from Thomas Gleixner:
 "A rather large update of timers, timekeeping &amp; co

   - Core timekeeping code is year-2038 safe now for 32bit machines.
     Now we just need to fix all in kernel users and the gazillion of
     user space interfaces which rely on timespec/timeval :)

   - Better cache layout for the timekeeping internal data structures.

   - Proper nanosecond based interfaces for in kernel users.

   - Tree wide cleanup of code which wants nanoseconds but does hoops
     and loops to convert back and forth from timespecs.  Some of it
     definitely belongs into the ugly code museum.

   - Consolidation of the timekeeping interface zoo.

   - A fast NMI safe accessor to clock monotonic for tracing.  This is a
     long standing request to support correlated user/kernel space
     traces.  With proper NTP frequency correction it's also suitable
     for correlation of traces across separate machines.

   - Checkpoint/restart support for timerfd.

   - A few NOHZ[_FULL] improvements in the [hr]timer code.

   - Code move from kernel to kernel/time of all time* related code.

   - New clocksource/event drivers from the ARM universe.  I'm really
     impressed that despite an architected timer in the newer chips SoC
     manufacturers insist on inventing new and differently broken SoC
     specific timers.

[ Ed. "Impressed"? I don't think that word means what you think it means ]

   - Another round of code move from arch to drivers.  Looks like most
     of the legacy mess in ARM regarding timers is sorted out except for
     a few obnoxious strongholds.

   - The usual updates and fixlets all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  timekeeping: Fixup typo in update_vsyscall_old definition
  clocksource: document some basic timekeeping concepts
  timekeeping: Use cached ntp_tick_length when accumulating error
  timekeeping: Rework frequency adjustments to work better w/ nohz
  timekeeping: Minor fixup for timespec64-&gt;timespec assignment
  ftrace: Provide trace clocks monotonic
  timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
  seqcount: Add raw_write_seqcount_latch()
  seqcount: Provide raw_read_seqcount()
  timekeeping: Use tk_read_base as argument for timekeeping_get_ns()
  timekeeping: Create struct tk_read_base and use it in struct timekeeper
  timekeeping: Restructure the timekeeper some more
  clocksource: Get rid of cycle_last
  clocksource: Move cycle_last validation to core code
  clocksource: Make delta calculation a function
  wireless: ath9k: Get rid of timespec conversions
  drm: vmwgfx: Use nsec based interfaces
  drm: i915: Use nsec based interfaces
  timekeeping: Provide ktime_get_raw()
  hangcheck-timer: Use ktime_get_ns()
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull timer and time updates from Thomas Gleixner:
 "A rather large update of timers, timekeeping &amp; co

   - Core timekeeping code is year-2038 safe now for 32bit machines.
     Now we just need to fix all in kernel users and the gazillion of
     user space interfaces which rely on timespec/timeval :)

   - Better cache layout for the timekeeping internal data structures.

   - Proper nanosecond based interfaces for in kernel users.

   - Tree wide cleanup of code which wants nanoseconds but does hoops
     and loops to convert back and forth from timespecs.  Some of it
     definitely belongs into the ugly code museum.

   - Consolidation of the timekeeping interface zoo.

   - A fast NMI safe accessor to clock monotonic for tracing.  This is a
     long standing request to support correlated user/kernel space
     traces.  With proper NTP frequency correction it's also suitable
     for correlation of traces across separate machines.

   - Checkpoint/restart support for timerfd.

   - A few NOHZ[_FULL] improvements in the [hr]timer code.

   - Code move from kernel to kernel/time of all time* related code.

   - New clocksource/event drivers from the ARM universe.  I'm really
     impressed that despite an architected timer in the newer chips SoC
     manufacturers insist on inventing new and differently broken SoC
     specific timers.

[ Ed. "Impressed"? I don't think that word means what you think it means ]

   - Another round of code move from arch to drivers.  Looks like most
     of the legacy mess in ARM regarding timers is sorted out except for
     a few obnoxious strongholds.

   - The usual updates and fixlets all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  timekeeping: Fixup typo in update_vsyscall_old definition
  clocksource: document some basic timekeeping concepts
  timekeeping: Use cached ntp_tick_length when accumulating error
  timekeeping: Rework frequency adjustments to work better w/ nohz
  timekeeping: Minor fixup for timespec64-&gt;timespec assignment
  ftrace: Provide trace clocks monotonic
  timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
  seqcount: Add raw_write_seqcount_latch()
  seqcount: Provide raw_read_seqcount()
  timekeeping: Use tk_read_base as argument for timekeeping_get_ns()
  timekeeping: Create struct tk_read_base and use it in struct timekeeper
  timekeeping: Restructure the timekeeper some more
  clocksource: Get rid of cycle_last
  clocksource: Move cycle_last validation to core code
  clocksource: Make delta calculation a function
  wireless: ath9k: Get rid of timespec conversions
  drm: vmwgfx: Use nsec based interfaces
  drm: i915: Use nsec based interfaces
  timekeeping: Provide ktime_get_raw()
  hangcheck-timer: Use ktime_get_ns()
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'staging-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging</title>
<updated>2014-08-05T01:36:12+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-08-05T01:36:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=53ee983378ff23e8f3ff95ecf99dea7c6c221900'/>
<id>53ee983378ff23e8f3ff95ecf99dea7c6c221900</id>
<content type='text'>
Pull staging driver updates from Greg KH:
 "Here's the big pull request for the staging driver tree for 3.17-rc1.

  Lots of things in here, over 2000 patches, but the best part is this:
   1480 files changed, 39070 insertions(+), 254659 deletions(-)

  Thanks to the great work of Kristina Martšenko, 14 different staging
  drivers have been removed from the tree as they were obsolete and no
  one was willing to work on cleaning them up.  Other than the driver
  removals, loads of cleanups are in here (comedi, lustre, etc.) as well
  as the usual IIO driver updates and additions.

  All of this has been in the linux-next tree for a while"

* tag 'staging-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (2199 commits)
  staging: comedi: addi_apci_1564: remove diagnostic interrupt support code
  staging: comedi: addi_apci_1564: add subdevice to check diagnostic status
  staging: wlan-ng: coding style problem fix
  staging: wlan-ng: fixing coding style problems
  staging: comedi: ii_pci20kc: request and ioremap memory
  staging: lustre: bitwise vs logical typo
  staging: dgnc: Remove unneeded dgnc_trace.c and dgnc_trace.h
  staging: dgnc: rephrase comment
  staging: comedi: ni_tio: remove some dead code
  staging: rtl8723au: Fix static symbol sparse warning
  staging: rtl8723au: usb_dvobj_init(): Remove unused variable 'pdev_desc'
  staging: rtl8723au: Do not duplicate kernel provided USB macros
  staging: rtl8723au: Remove never set struct pwrctrl_priv.bHWPowerdown
  staging: rtl8723au: Remove two never set variables
  staging: rtl8723au: RSSI_test is never set
  staging:r8190: coding style: Fixed checkpatch reported Error
  staging:r8180: coding style: Fixed too long lines
  staging:r8180: coding style: Fixed commenting style
  staging: lustre: ptlrpc: lproc_ptlrpc.c - fix dereferenceing user space buffer
  staging: lustre: ldlm: ldlm_resource.c - fix dereferenceing user space buffer
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull staging driver updates from Greg KH:
 "Here's the big pull request for the staging driver tree for 3.17-rc1.

  Lots of things in here, over 2000 patches, but the best part is this:
   1480 files changed, 39070 insertions(+), 254659 deletions(-)

  Thanks to the great work of Kristina Martšenko, 14 different staging
  drivers have been removed from the tree as they were obsolete and no
  one was willing to work on cleaning them up.  Other than the driver
  removals, loads of cleanups are in here (comedi, lustre, etc.) as well
  as the usual IIO driver updates and additions.

  All of this has been in the linux-next tree for a while"

* tag 'staging-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (2199 commits)
  staging: comedi: addi_apci_1564: remove diagnostic interrupt support code
  staging: comedi: addi_apci_1564: add subdevice to check diagnostic status
  staging: wlan-ng: coding style problem fix
  staging: wlan-ng: fixing coding style problems
  staging: comedi: ii_pci20kc: request and ioremap memory
  staging: lustre: bitwise vs logical typo
  staging: dgnc: Remove unneeded dgnc_trace.c and dgnc_trace.h
  staging: dgnc: rephrase comment
  staging: comedi: ni_tio: remove some dead code
  staging: rtl8723au: Fix static symbol sparse warning
  staging: rtl8723au: usb_dvobj_init(): Remove unused variable 'pdev_desc'
  staging: rtl8723au: Do not duplicate kernel provided USB macros
  staging: rtl8723au: Remove never set struct pwrctrl_priv.bHWPowerdown
  staging: rtl8723au: Remove two never set variables
  staging: rtl8723au: RSSI_test is never set
  staging:r8190: coding style: Fixed checkpatch reported Error
  staging:r8180: coding style: Fixed too long lines
  staging:r8180: coding style: Fixed commenting style
  staging: lustre: ptlrpc: lproc_ptlrpc.c - fix dereferenceing user space buffer
  staging: lustre: ldlm: ldlm_resource.c - fix dereferenceing user space buffer
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2014-08-04T23:23:30+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-08-04T23:23:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=98959948a7ba33cf8c708626e0d2a1456397e1c6'/>
<id>98959948a7ba33cf8c708626e0d2a1456397e1c6</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:

 - Move the nohz kick code out of the scheduler tick to a dedicated IPI,
   from Frederic Weisbecker.

  This necessitated quite some background infrastructure rework,
  including:

   * Clean up some irq-work internals
   * Implement remote irq-work
   * Implement nohz kick on top of remote irq-work
   * Move full dynticks timer enqueue notification to new kick
   * Move multi-task notification to new kick
   * Remove unnecessary barriers on multi-task notification

 - Remove proliferation of wait_on_bit() action functions and allow
   wait_on_bit_action() functions to support a timeout.  (Neil Brown)

 - Another round of sched/numa improvements, cleanups and fixes.  (Rik
   van Riel)

 - Implement fast idling of CPUs when the system is partially loaded,
   for better scalability.  (Tim Chen)

 - Restructure and fix the CPU hotplug handling code that may leave
   cfs_rq and rt_rq's throttled when tasks are migrated away from a dead
   cpu.  (Kirill Tkhai)

 - Robustify the sched topology setup code.  (Peter Zijlstra)

 - Improve sched_feat() handling wrt. static_keys (Jason Baron)

 - Misc fixes.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  sched/fair: Fix 'make xmldocs' warning caused by missing description
  sched: Use macro for magic number of -1 for setparam
  sched: Robustify topology setup
  sched: Fix sched_setparam() policy == -1 logic
  sched: Allow wait_on_bit_action() functions to support a timeout
  sched: Remove proliferation of wait_on_bit() action functions
  sched/numa: Revert "Use effective_load() to balance NUMA loads"
  sched: Fix static_key race with sched_feat()
  sched: Remove extra static_key*() function indirection
  sched/rt: Fix replenish_dl_entity() comments to match the current upstream code
  sched: Transform resched_task() into resched_curr()
  sched/deadline: Kill task_struct-&gt;pi_top_task
  sched: Rework check_for_tasks()
  sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime()
  sched/fair: Disable runtime_enabled on dying rq
  sched/numa: Change scan period code to match intent
  sched/numa: Rework best node setting in task_numa_migrate()
  sched/numa: Examine a task move when examining a task swap
  sched/numa: Simplify task_numa_compare()
  sched/numa: Use effective_load() to balance NUMA loads
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull scheduler updates from Ingo Molnar:

 - Move the nohz kick code out of the scheduler tick to a dedicated IPI,
   from Frederic Weisbecker.

  This necessitated quite some background infrastructure rework,
  including:

   * Clean up some irq-work internals
   * Implement remote irq-work
   * Implement nohz kick on top of remote irq-work
   * Move full dynticks timer enqueue notification to new kick
   * Move multi-task notification to new kick
   * Remove unnecessary barriers on multi-task notification

 - Remove proliferation of wait_on_bit() action functions and allow
   wait_on_bit_action() functions to support a timeout.  (Neil Brown)

 - Another round of sched/numa improvements, cleanups and fixes.  (Rik
   van Riel)

 - Implement fast idling of CPUs when the system is partially loaded,
   for better scalability.  (Tim Chen)

 - Restructure and fix the CPU hotplug handling code that may leave
   cfs_rq and rt_rq's throttled when tasks are migrated away from a dead
   cpu.  (Kirill Tkhai)

 - Robustify the sched topology setup code.  (Peter Zijlstra)

 - Improve sched_feat() handling wrt. static_keys (Jason Baron)

 - Misc fixes.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  sched/fair: Fix 'make xmldocs' warning caused by missing description
  sched: Use macro for magic number of -1 for setparam
  sched: Robustify topology setup
  sched: Fix sched_setparam() policy == -1 logic
  sched: Allow wait_on_bit_action() functions to support a timeout
  sched: Remove proliferation of wait_on_bit() action functions
  sched/numa: Revert "Use effective_load() to balance NUMA loads"
  sched: Fix static_key race with sched_feat()
  sched: Remove extra static_key*() function indirection
  sched/rt: Fix replenish_dl_entity() comments to match the current upstream code
  sched: Transform resched_task() into resched_curr()
  sched/deadline: Kill task_struct-&gt;pi_top_task
  sched: Rework check_for_tasks()
  sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime()
  sched/fair: Disable runtime_enabled on dying rq
  sched/numa: Change scan period code to match intent
  sched/numa: Rework best node setting in task_numa_migrate()
  sched/numa: Examine a task move when examining a task swap
  sched/numa: Simplify task_numa_compare()
  sched/numa: Use effective_load() to balance NUMA loads
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2014-08-04T22:55:08+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-08-04T22:55:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5bda4f638f36ef4c4e3b1397b02affc3db94356e'/>
<id>5bda4f638f36ef4c4e3b1397b02affc3db94356e</id>
<content type='text'>
Pull RCU changes from Ingo Molnar:
 "The main changes:

   - torture-test updates
   - callback-offloading changes
   - maintainership changes
   - update RCU documentation
   - miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missing
  rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp()
  rcu: Fix a sparse warning in rcu_initiate_boost()
  rcu: Fix __rcu_reclaim() to use true/false for bool
  rcu: Remove CONFIG_PROVE_RCU_DELAY
  rcu: Use __this_cpu_read() instead of per_cpu_ptr()
  rcu: Don't use NMIs to dump other CPUs' stacks
  rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  rcu: Simplify priority boosting by putting rt_mutex in rcu_node
  rcu: Check both root and current rcu_node when setting up future grace period
  rcu: Allow post-unlock reference for rt_mutex
  rcu: Loosen __call_rcu()'s rcu_head alignment constraint
  rcu: Eliminate read-modify-write ACCESS_ONCE() calls
  rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu
  rcu: Make rcu node arrays static const char * const
  signal: Explain local_irq_save() call
  rcu: Handle obsolete references to TINY_PREEMPT_RCU
  rcu: Document deadlock-avoidance information for rcu_read_unlock()
  scripts: Teach get_maintainer.pl about the new "R:" tag
  rcu: Update rcu torture maintainership filename patterns
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull RCU changes from Ingo Molnar:
 "The main changes:

   - torture-test updates
   - callback-offloading changes
   - maintainership changes
   - update RCU documentation
   - miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missing
  rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp()
  rcu: Fix a sparse warning in rcu_initiate_boost()
  rcu: Fix __rcu_reclaim() to use true/false for bool
  rcu: Remove CONFIG_PROVE_RCU_DELAY
  rcu: Use __this_cpu_read() instead of per_cpu_ptr()
  rcu: Don't use NMIs to dump other CPUs' stacks
  rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  rcu: Simplify priority boosting by putting rt_mutex in rcu_node
  rcu: Check both root and current rcu_node when setting up future grace period
  rcu: Allow post-unlock reference for rt_mutex
  rcu: Loosen __call_rcu()'s rcu_head alignment constraint
  rcu: Eliminate read-modify-write ACCESS_ONCE() calls
  rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu
  rcu: Make rcu node arrays static const char * const
  signal: Explain local_irq_save() call
  rcu: Handle obsolete references to TINY_PREEMPT_RCU
  rcu: Document deadlock-avoidance information for rcu_read_unlock()
  scripts: Teach get_maintainer.pl about the new "R:" tag
  rcu: Update rcu torture maintainership filename patterns
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks</title>
<updated>2014-08-01T10:54:41+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2014-08-01T10:20:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=504d58745c9ca28d33572e2d8a9990b43e06075d'/>
<id>504d58745c9ca28d33572e2d8a9990b43e06075d</id>
<content type='text'>
clockevents_increase_min_delta() calls printk() from under
hrtimer_bases.lock. That causes lock inversion on scheduler locks because
printk() can call into the scheduler. Lockdep puts it as:

======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc8-06195-g939f04b #2 Not tainted
-------------------------------------------------------
trinity-main/74 is trying to acquire lock:
 (&amp;port_lock_key){-.....}, at: [&lt;811c60be&gt;] serial8250_console_write+0x8c/0x10c

but task is already holding lock:
 (hrtimer_bases.lock){-.-...}, at: [&lt;8103caeb&gt;] hrtimer_try_to_cancel+0x13/0x66

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-&gt; #5 (hrtimer_bases.lock){-.-...}:
       [&lt;8104a942&gt;] lock_acquire+0x92/0x101
       [&lt;8142f11d&gt;] _raw_spin_lock_irqsave+0x2e/0x3e
       [&lt;8103c918&gt;] __hrtimer_start_range_ns+0x1c/0x197
       [&lt;8107ec20&gt;] perf_swevent_start_hrtimer.part.41+0x7a/0x85
       [&lt;81080792&gt;] task_clock_event_start+0x3a/0x3f
       [&lt;810807a4&gt;] task_clock_event_add+0xd/0x14
       [&lt;8108259a&gt;] event_sched_in+0xb6/0x17a
       [&lt;810826a2&gt;] group_sched_in+0x44/0x122
       [&lt;81082885&gt;] ctx_sched_in.isra.67+0x105/0x11f
       [&lt;810828e6&gt;] perf_event_sched_in.isra.70+0x47/0x4b
       [&lt;81082bf6&gt;] __perf_install_in_context+0x8b/0xa3
       [&lt;8107eb8e&gt;] remote_function+0x12/0x2a
       [&lt;8105f5af&gt;] smp_call_function_single+0x2d/0x53
       [&lt;8107e17d&gt;] task_function_call+0x30/0x36
       [&lt;8107fb82&gt;] perf_install_in_context+0x87/0xbb
       [&lt;810852c9&gt;] SYSC_perf_event_open+0x5c6/0x701
       [&lt;810856f9&gt;] SyS_perf_event_open+0x17/0x19
       [&lt;8142f8ee&gt;] syscall_call+0x7/0xb

-&gt; #4 (&amp;ctx-&gt;lock){......}:
       [&lt;8104a942&gt;] lock_acquire+0x92/0x101
       [&lt;8142f04c&gt;] _raw_spin_lock+0x21/0x30
       [&lt;81081df3&gt;] __perf_event_task_sched_out+0x1dc/0x34f
       [&lt;8142cacc&gt;] __schedule+0x4c6/0x4cb
       [&lt;8142cae0&gt;] schedule+0xf/0x11
       [&lt;8142f9a6&gt;] work_resched+0x5/0x30

-&gt; #3 (&amp;rq-&gt;lock){-.-.-.}:
       [&lt;8104a942&gt;] lock_acquire+0x92/0x101
       [&lt;8142f04c&gt;] _raw_spin_lock+0x21/0x30
       [&lt;81040873&gt;] __task_rq_lock+0x33/0x3a
       [&lt;8104184c&gt;] wake_up_new_task+0x25/0xc2
       [&lt;8102474b&gt;] do_fork+0x15c/0x2a0
       [&lt;810248a9&gt;] kernel_thread+0x1a/0x1f
       [&lt;814232a2&gt;] rest_init+0x1a/0x10e
       [&lt;817af949&gt;] start_kernel+0x303/0x308
       [&lt;817af2ab&gt;] i386_start_kernel+0x79/0x7d

-&gt; #2 (&amp;p-&gt;pi_lock){-.-...}:
       [&lt;8104a942&gt;] lock_acquire+0x92/0x101
       [&lt;8142f11d&gt;] _raw_spin_lock_irqsave+0x2e/0x3e
       [&lt;810413dd&gt;] try_to_wake_up+0x1d/0xd6
       [&lt;810414cd&gt;] default_wake_function+0xb/0xd
       [&lt;810461f3&gt;] __wake_up_common+0x39/0x59
       [&lt;81046346&gt;] __wake_up+0x29/0x3b
       [&lt;811b8733&gt;] tty_wakeup+0x49/0x51
       [&lt;811c3568&gt;] uart_write_wakeup+0x17/0x19
       [&lt;811c5dc1&gt;] serial8250_tx_chars+0xbc/0xfb
       [&lt;811c5f28&gt;] serial8250_handle_irq+0x54/0x6a
       [&lt;811c5f57&gt;] serial8250_default_handle_irq+0x19/0x1c
       [&lt;811c56d8&gt;] serial8250_interrupt+0x38/0x9e
       [&lt;810510e7&gt;] handle_irq_event_percpu+0x5f/0x1e2
       [&lt;81051296&gt;] handle_irq_event+0x2c/0x43
       [&lt;81052cee&gt;] handle_level_irq+0x57/0x80
       [&lt;81002a72&gt;] handle_irq+0x46/0x5c
       [&lt;810027df&gt;] do_IRQ+0x32/0x89
       [&lt;8143036e&gt;] common_interrupt+0x2e/0x33
       [&lt;8142f23c&gt;] _raw_spin_unlock_irqrestore+0x3f/0x49
       [&lt;811c25a4&gt;] uart_start+0x2d/0x32
       [&lt;811c2c04&gt;] uart_write+0xc7/0xd6
       [&lt;811bc6f6&gt;] n_tty_write+0xb8/0x35e
       [&lt;811b9beb&gt;] tty_write+0x163/0x1e4
       [&lt;811b9cd9&gt;] redirected_tty_write+0x6d/0x75
       [&lt;810b6ed6&gt;] vfs_write+0x75/0xb0
       [&lt;810b7265&gt;] SyS_write+0x44/0x77
       [&lt;8142f8ee&gt;] syscall_call+0x7/0xb

-&gt; #1 (&amp;tty-&gt;write_wait){-.....}:
       [&lt;8104a942&gt;] lock_acquire+0x92/0x101
       [&lt;8142f11d&gt;] _raw_spin_lock_irqsave+0x2e/0x3e
       [&lt;81046332&gt;] __wake_up+0x15/0x3b
       [&lt;811b8733&gt;] tty_wakeup+0x49/0x51
       [&lt;811c3568&gt;] uart_write_wakeup+0x17/0x19
       [&lt;811c5dc1&gt;] serial8250_tx_chars+0xbc/0xfb
       [&lt;811c5f28&gt;] serial8250_handle_irq+0x54/0x6a
       [&lt;811c5f57&gt;] serial8250_default_handle_irq+0x19/0x1c
       [&lt;811c56d8&gt;] serial8250_interrupt+0x38/0x9e
       [&lt;810510e7&gt;] handle_irq_event_percpu+0x5f/0x1e2
       [&lt;81051296&gt;] handle_irq_event+0x2c/0x43
       [&lt;81052cee&gt;] handle_level_irq+0x57/0x80
       [&lt;81002a72&gt;] handle_irq+0x46/0x5c
       [&lt;810027df&gt;] do_IRQ+0x32/0x89
       [&lt;8143036e&gt;] common_interrupt+0x2e/0x33
       [&lt;8142f23c&gt;] _raw_spin_unlock_irqrestore+0x3f/0x49
       [&lt;811c25a4&gt;] uart_start+0x2d/0x32
       [&lt;811c2c04&gt;] uart_write+0xc7/0xd6
       [&lt;811bc6f6&gt;] n_tty_write+0xb8/0x35e
       [&lt;811b9beb&gt;] tty_write+0x163/0x1e4
       [&lt;811b9cd9&gt;] redirected_tty_write+0x6d/0x75
       [&lt;810b6ed6&gt;] vfs_write+0x75/0xb0
       [&lt;810b7265&gt;] SyS_write+0x44/0x77
       [&lt;8142f8ee&gt;] syscall_call+0x7/0xb

-&gt; #0 (&amp;port_lock_key){-.....}:
       [&lt;8104a62d&gt;] __lock_acquire+0x9ea/0xc6d
       [&lt;8104a942&gt;] lock_acquire+0x92/0x101
       [&lt;8142f11d&gt;] _raw_spin_lock_irqsave+0x2e/0x3e
       [&lt;811c60be&gt;] serial8250_console_write+0x8c/0x10c
       [&lt;8104e402&gt;] call_console_drivers.constprop.31+0x87/0x118
       [&lt;8104f5d5&gt;] console_unlock+0x1d7/0x398
       [&lt;8104fb70&gt;] vprintk_emit+0x3da/0x3e4
       [&lt;81425f76&gt;] printk+0x17/0x19
       [&lt;8105bfa0&gt;] clockevents_program_min_delta+0x104/0x116
       [&lt;8105c548&gt;] clockevents_program_event+0xe7/0xf3
       [&lt;8105cc1c&gt;] tick_program_event+0x1e/0x23
       [&lt;8103c43c&gt;] hrtimer_force_reprogram+0x88/0x8f
       [&lt;8103c49e&gt;] __remove_hrtimer+0x5b/0x79
       [&lt;8103cb21&gt;] hrtimer_try_to_cancel+0x49/0x66
       [&lt;8103cb4b&gt;] hrtimer_cancel+0xd/0x18
       [&lt;8107f102&gt;] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
       [&lt;81080705&gt;] task_clock_event_stop+0x20/0x64
       [&lt;81080756&gt;] task_clock_event_del+0xd/0xf
       [&lt;81081350&gt;] event_sched_out+0xab/0x11e
       [&lt;810813e0&gt;] group_sched_out+0x1d/0x66
       [&lt;81081682&gt;] ctx_sched_out+0xaf/0xbf
       [&lt;81081e04&gt;] __perf_event_task_sched_out+0x1ed/0x34f
       [&lt;8142cacc&gt;] __schedule+0x4c6/0x4cb
       [&lt;8142cae0&gt;] schedule+0xf/0x11
       [&lt;8142f9a6&gt;] work_resched+0x5/0x30

other info that might help us debug this:

Chain exists of:
  &amp;port_lock_key --&gt; &amp;ctx-&gt;lock --&gt; hrtimer_bases.lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(hrtimer_bases.lock);
                               lock(&amp;ctx-&gt;lock);
                               lock(hrtimer_bases.lock);
  lock(&amp;port_lock_key);

 *** DEADLOCK ***

4 locks held by trinity-main/74:
 #0:  (&amp;rq-&gt;lock){-.-.-.}, at: [&lt;8142c6f3&gt;] __schedule+0xed/0x4cb
 #1:  (&amp;ctx-&gt;lock){......}, at: [&lt;81081df3&gt;] __perf_event_task_sched_out+0x1dc/0x34f
 #2:  (hrtimer_bases.lock){-.-...}, at: [&lt;8103caeb&gt;] hrtimer_try_to_cancel+0x13/0x66
 #3:  (console_lock){+.+...}, at: [&lt;8104fb5d&gt;] vprintk_emit+0x3c7/0x3e4

stack backtrace:
CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2
 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
Call Trace:
 [&lt;81426f69&gt;] dump_stack+0x16/0x18
 [&lt;81425a99&gt;] print_circular_bug+0x18f/0x19c
 [&lt;8104a62d&gt;] __lock_acquire+0x9ea/0xc6d
 [&lt;8104a942&gt;] lock_acquire+0x92/0x101
 [&lt;811c60be&gt;] ? serial8250_console_write+0x8c/0x10c
 [&lt;811c6032&gt;] ? wait_for_xmitr+0x76/0x76
 [&lt;8142f11d&gt;] _raw_spin_lock_irqsave+0x2e/0x3e
 [&lt;811c60be&gt;] ? serial8250_console_write+0x8c/0x10c
 [&lt;811c60be&gt;] serial8250_console_write+0x8c/0x10c
 [&lt;8104af87&gt;] ? lock_release+0x191/0x223
 [&lt;811c6032&gt;] ? wait_for_xmitr+0x76/0x76
 [&lt;8104e402&gt;] call_console_drivers.constprop.31+0x87/0x118
 [&lt;8104f5d5&gt;] console_unlock+0x1d7/0x398
 [&lt;8104fb70&gt;] vprintk_emit+0x3da/0x3e4
 [&lt;81425f76&gt;] printk+0x17/0x19
 [&lt;8105bfa0&gt;] clockevents_program_min_delta+0x104/0x116
 [&lt;8105cc1c&gt;] tick_program_event+0x1e/0x23
 [&lt;8103c43c&gt;] hrtimer_force_reprogram+0x88/0x8f
 [&lt;8103c49e&gt;] __remove_hrtimer+0x5b/0x79
 [&lt;8103cb21&gt;] hrtimer_try_to_cancel+0x49/0x66
 [&lt;8103cb4b&gt;] hrtimer_cancel+0xd/0x18
 [&lt;8107f102&gt;] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
 [&lt;81080705&gt;] task_clock_event_stop+0x20/0x64
 [&lt;81080756&gt;] task_clock_event_del+0xd/0xf
 [&lt;81081350&gt;] event_sched_out+0xab/0x11e
 [&lt;810813e0&gt;] group_sched_out+0x1d/0x66
 [&lt;81081682&gt;] ctx_sched_out+0xaf/0xbf
 [&lt;81081e04&gt;] __perf_event_task_sched_out+0x1ed/0x34f
 [&lt;8104416d&gt;] ? __dequeue_entity+0x23/0x27
 [&lt;81044505&gt;] ? pick_next_task_fair+0xb1/0x120
 [&lt;8142cacc&gt;] __schedule+0x4c6/0x4cb
 [&lt;81047574&gt;] ? trace_hardirqs_off_caller+0xd7/0x108
 [&lt;810475b0&gt;] ? trace_hardirqs_off+0xb/0xd
 [&lt;81056346&gt;] ? rcu_irq_exit+0x64/0x77

Fix the problem by using printk_deferred(), which does not call into the
scheduler.

Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
clockevents_increase_min_delta() calls printk() from under
hrtimer_bases.lock. That causes lock inversion on scheduler locks because
printk() can call into the scheduler. Lockdep puts it as:

======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc8-06195-g939f04b #2 Not tainted
-------------------------------------------------------
trinity-main/74 is trying to acquire lock:
 (&amp;port_lock_key){-.....}, at: [&lt;811c60be&gt;] serial8250_console_write+0x8c/0x10c

but task is already holding lock:
 (hrtimer_bases.lock){-.-...}, at: [&lt;8103caeb&gt;] hrtimer_try_to_cancel+0x13/0x66

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-&gt; #5 (hrtimer_bases.lock){-.-...}:
       [&lt;8104a942&gt;] lock_acquire+0x92/0x101
       [&lt;8142f11d&gt;] _raw_spin_lock_irqsave+0x2e/0x3e
       [&lt;8103c918&gt;] __hrtimer_start_range_ns+0x1c/0x197
       [&lt;8107ec20&gt;] perf_swevent_start_hrtimer.part.41+0x7a/0x85
       [&lt;81080792&gt;] task_clock_event_start+0x3a/0x3f
       [&lt;810807a4&gt;] task_clock_event_add+0xd/0x14
       [&lt;8108259a&gt;] event_sched_in+0xb6/0x17a
       [&lt;810826a2&gt;] group_sched_in+0x44/0x122
       [&lt;81082885&gt;] ctx_sched_in.isra.67+0x105/0x11f
       [&lt;810828e6&gt;] perf_event_sched_in.isra.70+0x47/0x4b
       [&lt;81082bf6&gt;] __perf_install_in_context+0x8b/0xa3
       [&lt;8107eb8e&gt;] remote_function+0x12/0x2a
       [&lt;8105f5af&gt;] smp_call_function_single+0x2d/0x53
       [&lt;8107e17d&gt;] task_function_call+0x30/0x36
       [&lt;8107fb82&gt;] perf_install_in_context+0x87/0xbb
       [&lt;810852c9&gt;] SYSC_perf_event_open+0x5c6/0x701
       [&lt;810856f9&gt;] SyS_perf_event_open+0x17/0x19
       [&lt;8142f8ee&gt;] syscall_call+0x7/0xb

-&gt; #4 (&amp;ctx-&gt;lock){......}:
       [&lt;8104a942&gt;] lock_acquire+0x92/0x101
       [&lt;8142f04c&gt;] _raw_spin_lock+0x21/0x30
       [&lt;81081df3&gt;] __perf_event_task_sched_out+0x1dc/0x34f
       [&lt;8142cacc&gt;] __schedule+0x4c6/0x4cb
       [&lt;8142cae0&gt;] schedule+0xf/0x11
       [&lt;8142f9a6&gt;] work_resched+0x5/0x30

-&gt; #3 (&amp;rq-&gt;lock){-.-.-.}:
       [&lt;8104a942&gt;] lock_acquire+0x92/0x101
       [&lt;8142f04c&gt;] _raw_spin_lock+0x21/0x30
       [&lt;81040873&gt;] __task_rq_lock+0x33/0x3a
       [&lt;8104184c&gt;] wake_up_new_task+0x25/0xc2
       [&lt;8102474b&gt;] do_fork+0x15c/0x2a0
       [&lt;810248a9&gt;] kernel_thread+0x1a/0x1f
       [&lt;814232a2&gt;] rest_init+0x1a/0x10e
       [&lt;817af949&gt;] start_kernel+0x303/0x308
       [&lt;817af2ab&gt;] i386_start_kernel+0x79/0x7d

-&gt; #2 (&amp;p-&gt;pi_lock){-.-...}:
       [&lt;8104a942&gt;] lock_acquire+0x92/0x101
       [&lt;8142f11d&gt;] _raw_spin_lock_irqsave+0x2e/0x3e
       [&lt;810413dd&gt;] try_to_wake_up+0x1d/0xd6
       [&lt;810414cd&gt;] default_wake_function+0xb/0xd
       [&lt;810461f3&gt;] __wake_up_common+0x39/0x59
       [&lt;81046346&gt;] __wake_up+0x29/0x3b
       [&lt;811b8733&gt;] tty_wakeup+0x49/0x51
       [&lt;811c3568&gt;] uart_write_wakeup+0x17/0x19
       [&lt;811c5dc1&gt;] serial8250_tx_chars+0xbc/0xfb
       [&lt;811c5f28&gt;] serial8250_handle_irq+0x54/0x6a
       [&lt;811c5f57&gt;] serial8250_default_handle_irq+0x19/0x1c
       [&lt;811c56d8&gt;] serial8250_interrupt+0x38/0x9e
       [&lt;810510e7&gt;] handle_irq_event_percpu+0x5f/0x1e2
       [&lt;81051296&gt;] handle_irq_event+0x2c/0x43
       [&lt;81052cee&gt;] handle_level_irq+0x57/0x80
       [&lt;81002a72&gt;] handle_irq+0x46/0x5c
       [&lt;810027df&gt;] do_IRQ+0x32/0x89
       [&lt;8143036e&gt;] common_interrupt+0x2e/0x33
       [&lt;8142f23c&gt;] _raw_spin_unlock_irqrestore+0x3f/0x49
       [&lt;811c25a4&gt;] uart_start+0x2d/0x32
       [&lt;811c2c04&gt;] uart_write+0xc7/0xd6
       [&lt;811bc6f6&gt;] n_tty_write+0xb8/0x35e
       [&lt;811b9beb&gt;] tty_write+0x163/0x1e4
       [&lt;811b9cd9&gt;] redirected_tty_write+0x6d/0x75
       [&lt;810b6ed6&gt;] vfs_write+0x75/0xb0
       [&lt;810b7265&gt;] SyS_write+0x44/0x77
       [&lt;8142f8ee&gt;] syscall_call+0x7/0xb

-&gt; #1 (&amp;tty-&gt;write_wait){-.....}:
       [&lt;8104a942&gt;] lock_acquire+0x92/0x101
       [&lt;8142f11d&gt;] _raw_spin_lock_irqsave+0x2e/0x3e
       [&lt;81046332&gt;] __wake_up+0x15/0x3b
       [&lt;811b8733&gt;] tty_wakeup+0x49/0x51
       [&lt;811c3568&gt;] uart_write_wakeup+0x17/0x19
       [&lt;811c5dc1&gt;] serial8250_tx_chars+0xbc/0xfb
       [&lt;811c5f28&gt;] serial8250_handle_irq+0x54/0x6a
       [&lt;811c5f57&gt;] serial8250_default_handle_irq+0x19/0x1c
       [&lt;811c56d8&gt;] serial8250_interrupt+0x38/0x9e
       [&lt;810510e7&gt;] handle_irq_event_percpu+0x5f/0x1e2
       [&lt;81051296&gt;] handle_irq_event+0x2c/0x43
       [&lt;81052cee&gt;] handle_level_irq+0x57/0x80
       [&lt;81002a72&gt;] handle_irq+0x46/0x5c
       [&lt;810027df&gt;] do_IRQ+0x32/0x89
       [&lt;8143036e&gt;] common_interrupt+0x2e/0x33
       [&lt;8142f23c&gt;] _raw_spin_unlock_irqrestore+0x3f/0x49
       [&lt;811c25a4&gt;] uart_start+0x2d/0x32
       [&lt;811c2c04&gt;] uart_write+0xc7/0xd6
       [&lt;811bc6f6&gt;] n_tty_write+0xb8/0x35e
       [&lt;811b9beb&gt;] tty_write+0x163/0x1e4
       [&lt;811b9cd9&gt;] redirected_tty_write+0x6d/0x75
       [&lt;810b6ed6&gt;] vfs_write+0x75/0xb0
       [&lt;810b7265&gt;] SyS_write+0x44/0x77
       [&lt;8142f8ee&gt;] syscall_call+0x7/0xb

-&gt; #0 (&amp;port_lock_key){-.....}:
       [&lt;8104a62d&gt;] __lock_acquire+0x9ea/0xc6d
       [&lt;8104a942&gt;] lock_acquire+0x92/0x101
       [&lt;8142f11d&gt;] _raw_spin_lock_irqsave+0x2e/0x3e
       [&lt;811c60be&gt;] serial8250_console_write+0x8c/0x10c
       [&lt;8104e402&gt;] call_console_drivers.constprop.31+0x87/0x118
       [&lt;8104f5d5&gt;] console_unlock+0x1d7/0x398
       [&lt;8104fb70&gt;] vprintk_emit+0x3da/0x3e4
       [&lt;81425f76&gt;] printk+0x17/0x19
       [&lt;8105bfa0&gt;] clockevents_program_min_delta+0x104/0x116
       [&lt;8105c548&gt;] clockevents_program_event+0xe7/0xf3
       [&lt;8105cc1c&gt;] tick_program_event+0x1e/0x23
       [&lt;8103c43c&gt;] hrtimer_force_reprogram+0x88/0x8f
       [&lt;8103c49e&gt;] __remove_hrtimer+0x5b/0x79
       [&lt;8103cb21&gt;] hrtimer_try_to_cancel+0x49/0x66
       [&lt;8103cb4b&gt;] hrtimer_cancel+0xd/0x18
       [&lt;8107f102&gt;] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
       [&lt;81080705&gt;] task_clock_event_stop+0x20/0x64
       [&lt;81080756&gt;] task_clock_event_del+0xd/0xf
       [&lt;81081350&gt;] event_sched_out+0xab/0x11e
       [&lt;810813e0&gt;] group_sched_out+0x1d/0x66
       [&lt;81081682&gt;] ctx_sched_out+0xaf/0xbf
       [&lt;81081e04&gt;] __perf_event_task_sched_out+0x1ed/0x34f
       [&lt;8142cacc&gt;] __schedule+0x4c6/0x4cb
       [&lt;8142cae0&gt;] schedule+0xf/0x11
       [&lt;8142f9a6&gt;] work_resched+0x5/0x30

other info that might help us debug this:

Chain exists of:
  &amp;port_lock_key --&gt; &amp;ctx-&gt;lock --&gt; hrtimer_bases.lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(hrtimer_bases.lock);
                               lock(&amp;ctx-&gt;lock);
                               lock(hrtimer_bases.lock);
  lock(&amp;port_lock_key);

 *** DEADLOCK ***

4 locks held by trinity-main/74:
 #0:  (&amp;rq-&gt;lock){-.-.-.}, at: [&lt;8142c6f3&gt;] __schedule+0xed/0x4cb
 #1:  (&amp;ctx-&gt;lock){......}, at: [&lt;81081df3&gt;] __perf_event_task_sched_out+0x1dc/0x34f
 #2:  (hrtimer_bases.lock){-.-...}, at: [&lt;8103caeb&gt;] hrtimer_try_to_cancel+0x13/0x66
 #3:  (console_lock){+.+...}, at: [&lt;8104fb5d&gt;] vprintk_emit+0x3c7/0x3e4

stack backtrace:
CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2
 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
Call Trace:
 [&lt;81426f69&gt;] dump_stack+0x16/0x18
 [&lt;81425a99&gt;] print_circular_bug+0x18f/0x19c
 [&lt;8104a62d&gt;] __lock_acquire+0x9ea/0xc6d
 [&lt;8104a942&gt;] lock_acquire+0x92/0x101
 [&lt;811c60be&gt;] ? serial8250_console_write+0x8c/0x10c
 [&lt;811c6032&gt;] ? wait_for_xmitr+0x76/0x76
 [&lt;8142f11d&gt;] _raw_spin_lock_irqsave+0x2e/0x3e
 [&lt;811c60be&gt;] ? serial8250_console_write+0x8c/0x10c
 [&lt;811c60be&gt;] serial8250_console_write+0x8c/0x10c
 [&lt;8104af87&gt;] ? lock_release+0x191/0x223
 [&lt;811c6032&gt;] ? wait_for_xmitr+0x76/0x76
 [&lt;8104e402&gt;] call_console_drivers.constprop.31+0x87/0x118
 [&lt;8104f5d5&gt;] console_unlock+0x1d7/0x398
 [&lt;8104fb70&gt;] vprintk_emit+0x3da/0x3e4
 [&lt;81425f76&gt;] printk+0x17/0x19
 [&lt;8105bfa0&gt;] clockevents_program_min_delta+0x104/0x116
 [&lt;8105cc1c&gt;] tick_program_event+0x1e/0x23
 [&lt;8103c43c&gt;] hrtimer_force_reprogram+0x88/0x8f
 [&lt;8103c49e&gt;] __remove_hrtimer+0x5b/0x79
 [&lt;8103cb21&gt;] hrtimer_try_to_cancel+0x49/0x66
 [&lt;8103cb4b&gt;] hrtimer_cancel+0xd/0x18
 [&lt;8107f102&gt;] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
 [&lt;81080705&gt;] task_clock_event_stop+0x20/0x64
 [&lt;81080756&gt;] task_clock_event_del+0xd/0xf
 [&lt;81081350&gt;] event_sched_out+0xab/0x11e
 [&lt;810813e0&gt;] group_sched_out+0x1d/0x66
 [&lt;81081682&gt;] ctx_sched_out+0xaf/0xbf
 [&lt;81081e04&gt;] __perf_event_task_sched_out+0x1ed/0x34f
 [&lt;8104416d&gt;] ? __dequeue_entity+0x23/0x27
 [&lt;81044505&gt;] ? pick_next_task_fair+0xb1/0x120
 [&lt;8142cacc&gt;] __schedule+0x4c6/0x4cb
 [&lt;81047574&gt;] ? trace_hardirqs_off_caller+0xd7/0x108
 [&lt;810475b0&gt;] ? trace_hardirqs_off+0xb/0xd
 [&lt;81056346&gt;] ? rcu_irq_exit+0x64/0x77

Fix the problem by using printk_deferred(), which does not call into the
scheduler.

Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'sched/urgent' into sched/core, to merge fixes before applying new changes</title>
<updated>2014-07-28T08:03:00+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2014-07-28T08:03:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ca5bc6cd5de5b53eb8fd6fea39aa3fe2a1e8c3d9'/>
<id>ca5bc6cd5de5b53eb8fd6fea39aa3fe2a1e8c3d9</id>
<content type='text'>
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched_clock: Avoid corrupting hrtimer tree during suspend</title>
<updated>2014-07-24T10:02:49+00:00</updated>
<author>
<name>Stephen Boyd</name>
<email>sboyd@codeaurora.org</email>
</author>
<published>2014-07-24T04:03:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f723aa1817dd8f4fe005aab52ba70c8ab0ef9457'/>
<id>f723aa1817dd8f4fe005aab52ba70c8ab0ef9457</id>
<content type='text'>
During suspend we call sched_clock_poll() to update the epoch and
accumulated time and reprogram the sched_clock_timer to fire
before the next wrap-around time. Unfortunately,
sched_clock_poll() doesn't restart the timer; it relies on the
hrtimer layer to do that, and during suspend we aren't calling
that function from the hrtimer layer. Instead, we're
reprogramming the expires time while the hrtimer is enqueued,
which can cause the hrtimer tree to be corrupted. Furthermore, we
restart the timer during suspend but update the epoch during
resume, which seems counter-intuitive.

Let's fix this by saving the accumulated state and canceling the
timer during suspend. On resume we can update the epoch and
restart the timer, similar to what we would do if we were starting
the clock for the first time.

Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
Signed-off-by: Stephen Boyd &lt;sboyd@codeaurora.org&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Link: http://lkml.kernel.org/r/1406174630-23458-1-git-send-email-john.stultz@linaro.org
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: stable &lt;stable@vger.kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
During suspend we call sched_clock_poll() to update the epoch and
accumulated time and reprogram the sched_clock_timer to fire
before the next wrap-around time. Unfortunately,
sched_clock_poll() doesn't restart the timer; it relies on the
hrtimer layer to do that, and during suspend we aren't calling
that function from the hrtimer layer. Instead, we're
reprogramming the expires time while the hrtimer is enqueued,
which can cause the hrtimer tree to be corrupted. Furthermore, we
restart the timer during suspend but update the epoch during
resume, which seems counter-intuitive.

Let's fix this by saving the accumulated state and canceling the
timer during suspend. On resume we can update the epoch and
restart the timer, similar to what we would do if we were starting
the clock for the first time.

Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
Signed-off-by: Stephen Boyd &lt;sboyd@codeaurora.org&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Link: http://lkml.kernel.org/r/1406174630-23458-1-git-send-email-john.stultz@linaro.org
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: stable &lt;stable@vger.kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
