<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/rcutree.c, branch v3.10.51</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration</title>
<updated>2013-06-10T20:37:12+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2013-06-02T14:13:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=971394f389992f8462c4e5ae0e3b49a10a9534a3'/>
<id>971394f389992f8462c4e5ae0e3b49a10a9534a3</id>
<content type='text'>
In Steven Rostedt's words:

&gt; I've been debugging the last couple of days why my tests have been
&gt; locking up. One of my tracing tests runs all available tracers. The
&gt; lockup always happened with the mmiotrace, which is used to trace
&gt; interactions between priority drivers and the kernel. But to do this
&gt; easily, when the tracer gets registered, it disables all but the boot
&gt; CPUs. The lockup always happened after it got done disabling the CPUs.
&gt;
&gt; Then I decided to try this:
&gt;
&gt; while :; do
&gt; 	for i in 1 2 3; do
&gt; 		echo 0 &gt; /sys/devices/system/cpu/cpu$i/online
&gt; 	done
&gt; 	for i in 1 2 3; do
&gt; 		echo 1 &gt; /sys/devices/system/cpu/cpu$i/online
&gt; 	done
&gt; done
&gt;
&gt; Well, sure enough, that locked up too, with the same users. Doing a
&gt; sysrq-w (showing all blocked tasks):
&gt;
&gt; [ 2991.344562]   task                        PC stack   pid father
&gt; [ 2991.344562] rcu_preempt     D ffff88007986fdf8     0    10      2 0x00000000
&gt; [ 2991.344562]  ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908
&gt; [ 2991.344562]  ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80
&gt; [ 2991.344562]  ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295
&gt; [ 2991.344562] Call Trace:
&gt; [ 2991.344562]  [&lt;ffffffff815437ba&gt;] schedule+0x64/0x66
&gt; [ 2991.344562]  [&lt;ffffffff81541750&gt;] schedule_timeout+0xbc/0xf9
&gt; [ 2991.344562]  [&lt;ffffffff8154bec0&gt;] ? ftrace_call+0x5/0x2f
&gt; [ 2991.344562]  [&lt;ffffffff81049513&gt;] ? cascade+0xa8/0xa8
&gt; [ 2991.344562]  [&lt;ffffffff815417ab&gt;] schedule_timeout_uninterruptible+0x1e/0x20
&gt; [ 2991.344562]  [&lt;ffffffff810c980c&gt;] rcu_gp_kthread+0x502/0x94b
&gt; [ 2991.344562]  [&lt;ffffffff81062791&gt;] ? __init_waitqueue_head+0x50/0x50
&gt; [ 2991.344562]  [&lt;ffffffff810c930a&gt;] ? rcu_gp_fqs+0x64/0x64
&gt; [ 2991.344562]  [&lt;ffffffff81061cdb&gt;] kthread+0xb1/0xb9
&gt; [ 2991.344562]  [&lt;ffffffff81091e31&gt;] ? lock_release_holdtime.part.23+0x4e/0x55
&gt; [ 2991.344562]  [&lt;ffffffff81061c2a&gt;] ? __init_kthread_worker+0x58/0x58
&gt; [ 2991.344562]  [&lt;ffffffff8154c1dc&gt;] ret_from_fork+0x7c/0xb0
&gt; [ 2991.344562]  [&lt;ffffffff81061c2a&gt;] ? __init_kthread_worker+0x58/0x58
&gt; [ 2991.344562] kworker/0:1     D ffffffff81a30680     0    47      2 0x00000000
&gt; [ 2991.344562] Workqueue: events cpuset_hotplug_workfn
&gt; [ 2991.344562]  ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8
&gt; [ 2991.344562]  ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80
&gt; [ 2991.344562]  ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000
&gt; [ 2991.344562] Call Trace:
&gt; [ 2991.344562]  [&lt;ffffffff81541fcf&gt;] ? __mutex_lock_common+0x3d4/0x609
&gt; [ 2991.344562]  [&lt;ffffffff815437ba&gt;] schedule+0x64/0x66
&gt; [ 2991.344562]  [&lt;ffffffff81543a39&gt;] schedule_preempt_disabled+0x18/0x24
&gt; [ 2991.344562]  [&lt;ffffffff81541fcf&gt;] __mutex_lock_common+0x3d4/0x609
&gt; [ 2991.344562]  [&lt;ffffffff8103d11b&gt;] ? get_online_cpus+0x3c/0x50
&gt; [ 2991.344562]  [&lt;ffffffff8103d11b&gt;] ? get_online_cpus+0x3c/0x50
&gt; [ 2991.344562]  [&lt;ffffffff815422ff&gt;] mutex_lock_nested+0x3b/0x40
&gt; [ 2991.344562]  [&lt;ffffffff8103d11b&gt;] get_online_cpus+0x3c/0x50
&gt; [ 2991.344562]  [&lt;ffffffff810af7e6&gt;] rebuild_sched_domains_locked+0x6e/0x3a8
&gt; [ 2991.344562]  [&lt;ffffffff810b0ec6&gt;] rebuild_sched_domains+0x1c/0x2a
&gt; [ 2991.344562]  [&lt;ffffffff810b109b&gt;] cpuset_hotplug_workfn+0x1c7/0x1d3
&gt; [ 2991.344562]  [&lt;ffffffff810b0ed9&gt;] ? cpuset_hotplug_workfn+0x5/0x1d3
&gt; [ 2991.344562]  [&lt;ffffffff81058e07&gt;] process_one_work+0x2d4/0x4d1
&gt; [ 2991.344562]  [&lt;ffffffff81058d3a&gt;] ? process_one_work+0x207/0x4d1
&gt; [ 2991.344562]  [&lt;ffffffff8105964c&gt;] worker_thread+0x2e7/0x3b5
&gt; [ 2991.344562]  [&lt;ffffffff81059365&gt;] ? rescuer_thread+0x332/0x332
&gt; [ 2991.344562]  [&lt;ffffffff81061cdb&gt;] kthread+0xb1/0xb9
&gt; [ 2991.344562]  [&lt;ffffffff81061c2a&gt;] ? __init_kthread_worker+0x58/0x58
&gt; [ 2991.344562]  [&lt;ffffffff8154c1dc&gt;] ret_from_fork+0x7c/0xb0
&gt; [ 2991.344562]  [&lt;ffffffff81061c2a&gt;] ? __init_kthread_worker+0x58/0x58
&gt; [ 2991.344562] bash            D ffffffff81a4aa80     0  2618   2612 0x10000000
&gt; [ 2991.344562]  ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c
&gt; [ 2991.344562]  ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80
&gt; [ 2991.344562]  ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000
&gt; [ 2991.344562] Call Trace:
&gt; [ 2991.344562]  [&lt;ffffffff81541fcf&gt;] ? __mutex_lock_common+0x3d4/0x609
&gt; [ 2991.344562]  [&lt;ffffffff815437ba&gt;] schedule+0x64/0x66
&gt; [ 2991.344562]  [&lt;ffffffff81543a39&gt;] schedule_preempt_disabled+0x18/0x24
&gt; [ 2991.344562]  [&lt;ffffffff81541fcf&gt;] __mutex_lock_common+0x3d4/0x609
&gt; [ 2991.344562]  [&lt;ffffffff81530078&gt;] ? rcu_cpu_notify+0x2f5/0x86e
&gt; [ 2991.344562]  [&lt;ffffffff81530078&gt;] ? rcu_cpu_notify+0x2f5/0x86e
&gt; [ 2991.344562]  [&lt;ffffffff815422ff&gt;] mutex_lock_nested+0x3b/0x40
&gt; [ 2991.344562]  [&lt;ffffffff81530078&gt;] rcu_cpu_notify+0x2f5/0x86e
&gt; [ 2991.344562]  [&lt;ffffffff81091c99&gt;] ? __lock_is_held+0x32/0x53
&gt; [ 2991.344562]  [&lt;ffffffff81548912&gt;] notifier_call_chain+0x6b/0x98
&gt; [ 2991.344562]  [&lt;ffffffff810671fd&gt;] __raw_notifier_call_chain+0xe/0x10
&gt; [ 2991.344562]  [&lt;ffffffff8103cf64&gt;] __cpu_notify+0x20/0x32
&gt; [ 2991.344562]  [&lt;ffffffff8103cf8d&gt;] cpu_notify_nofail+0x17/0x36
&gt; [ 2991.344562]  [&lt;ffffffff815225de&gt;] _cpu_down+0x154/0x259
&gt; [ 2991.344562]  [&lt;ffffffff81522710&gt;] cpu_down+0x2d/0x3a
&gt; [ 2991.344562]  [&lt;ffffffff81526351&gt;] store_online+0x4e/0xe7
&gt; [ 2991.344562]  [&lt;ffffffff8134d764&gt;] dev_attr_store+0x20/0x22
&gt; [ 2991.344562]  [&lt;ffffffff811b3c5f&gt;] sysfs_write_file+0x108/0x144
&gt; [ 2991.344562]  [&lt;ffffffff8114c5ef&gt;] vfs_write+0xfd/0x158
&gt; [ 2991.344562]  [&lt;ffffffff8114c928&gt;] SyS_write+0x5c/0x83
&gt; [ 2991.344562]  [&lt;ffffffff8154c494&gt;] tracesys+0xdd/0xe2
&gt;
&gt; As well as held locks:
&gt;
&gt; [ 3034.728033] Showing all locks held in the system:
&gt; [ 3034.728033] 1 lock held by rcu_preempt/10:
&gt; [ 3034.728033]  #0:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [&lt;ffffffff810c9471&gt;] rcu_gp_kthread+0x167/0x94b
&gt; [ 3034.728033] 4 locks held by kworker/0:1/47:
&gt; [ 3034.728033]  #0:  (events){.+.+.+}, at: [&lt;ffffffff81058d3a&gt;] process_one_work+0x207/0x4d1
&gt; [ 3034.728033]  #1:  (cpuset_hotplug_work){+.+.+.}, at: [&lt;ffffffff81058d3a&gt;] process_one_work+0x207/0x4d1
&gt; [ 3034.728033]  #2:  (cpuset_mutex){+.+.+.}, at: [&lt;ffffffff810b0ec1&gt;] rebuild_sched_domains+0x17/0x2a
&gt; [ 3034.728033]  #3:  (cpu_hotplug.lock){+.+.+.}, at: [&lt;ffffffff8103d11b&gt;] get_online_cpus+0x3c/0x50
&gt; [ 3034.728033] 1 lock held by mingetty/2563:
&gt; [ 3034.728033]  #0:  (&amp;ldata-&gt;atomic_read_lock){+.+...}, at: [&lt;ffffffff8131e28a&gt;] n_tty_read+0x252/0x7e8
&gt; [ 3034.728033] 1 lock held by mingetty/2565:
&gt; [ 3034.728033]  #0:  (&amp;ldata-&gt;atomic_read_lock){+.+...}, at: [&lt;ffffffff8131e28a&gt;] n_tty_read+0x252/0x7e8
&gt; [ 3034.728033] 1 lock held by mingetty/2569:
&gt; [ 3034.728033]  #0:  (&amp;ldata-&gt;atomic_read_lock){+.+...}, at: [&lt;ffffffff8131e28a&gt;] n_tty_read+0x252/0x7e8
&gt; [ 3034.728033] 1 lock held by mingetty/2572:
&gt; [ 3034.728033]  #0:  (&amp;ldata-&gt;atomic_read_lock){+.+...}, at: [&lt;ffffffff8131e28a&gt;] n_tty_read+0x252/0x7e8
&gt; [ 3034.728033] 1 lock held by mingetty/2575:
&gt; [ 3034.728033]  #0:  (&amp;ldata-&gt;atomic_read_lock){+.+...}, at: [&lt;ffffffff8131e28a&gt;] n_tty_read+0x252/0x7e8
&gt; [ 3034.728033] 7 locks held by bash/2618:
&gt; [ 3034.728033]  #0:  (sb_writers#5){.+.+.+}, at: [&lt;ffffffff8114bc3f&gt;] file_start_write+0x2a/0x2c
&gt; [ 3034.728033]  #1:  (&amp;buffer-&gt;mutex#2){+.+.+.}, at: [&lt;ffffffff811b3b93&gt;] sysfs_write_file+0x3c/0x144
&gt; [ 3034.728033]  #2:  (s_active#54){.+.+.+}, at: [&lt;ffffffff811b3c3e&gt;] sysfs_write_file+0xe7/0x144
&gt; [ 3034.728033]  #3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [&lt;ffffffff810217c2&gt;] cpu_hotplug_driver_lock+0x17/0x19
&gt; [ 3034.728033]  #4:  (cpu_add_remove_lock){+.+.+.}, at: [&lt;ffffffff8103d196&gt;] cpu_maps_update_begin+0x17/0x19
&gt; [ 3034.728033]  #5:  (cpu_hotplug.lock){+.+.+.}, at: [&lt;ffffffff8103cfd8&gt;] cpu_hotplug_begin+0x2c/0x6d
&gt; [ 3034.728033]  #6:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [&lt;ffffffff81530078&gt;] rcu_cpu_notify+0x2f5/0x86e
&gt; [ 3034.728033] 1 lock held by bash/2980:
&gt; [ 3034.728033]  #0:  (&amp;ldata-&gt;atomic_read_lock){+.+...}, at: [&lt;ffffffff8131e28a&gt;] n_tty_read+0x252/0x7e8
&gt;
&gt; Things looked a little weird. Also, this is a deadlock that lockdep did
&gt; not catch. But what we have here does not look like a circular lock
&gt; issue:
&gt;
&gt; Bash is blocked in rcu_cpu_notify():
&gt;
&gt; 1961		/* Exclude any attempts to start a new grace period. */
&gt; 1962		mutex_lock(&amp;rsp-&gt;onoff_mutex);
&gt;
&gt;
&gt; kworker is blocked in get_online_cpus(), which makes sense as we are
&gt; currently taking down a CPU.
&gt;
&gt; But rcu_preempt is not blocked on anything. It is simply sleeping in
&gt; rcu_gp_kthread (really rcu_gp_init) here:
&gt;
&gt; 1453	#ifdef CONFIG_PROVE_RCU_DELAY
&gt; 1454			if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 &amp;&amp;
&gt; 1455			    system_state == SYSTEM_RUNNING)
&gt; 1456				schedule_timeout_uninterruptible(2);
&gt; 1457	#endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
&gt;
&gt; And it does this while holding the onoff_mutex that bash is waiting for.
&gt;
&gt; Doing a function trace, it showed me where it happened:
&gt;
&gt; [  125.940066] rcu_pree-10      3.... 28384115273: schedule_timeout_uninterruptible &lt;-rcu_gp_kthread
&gt; [...]
&gt; [  125.940066] rcu_pree-10      3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==&gt; next_comm=watchdog/3 next_pid=38 next_prio=120
&gt;
&gt; The watchdog ran, and then:
&gt;
&gt; [  125.940066] watchdog-38      3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==&gt; next_comm=modprobe next_pid=2848 next_prio=118
&gt;
&gt; Not sure what modprobe was doing, but shortly after that:
&gt;
&gt; [  125.940066] modprobe-2848    3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==&gt; next_comm=migration/3 next_pid=40 next_prio=0
&gt;
&gt; Where the migration thread took down the CPU:
&gt;
&gt; [  125.940066] migratio-40      3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==&gt; next_comm=swapper/3 next_pid=0 next_prio=120
&gt;
&gt; which finally did:
&gt;
&gt; [  125.940066]   &lt;idle&gt;-0       3...1 28389282142: arch_cpu_idle_dead &lt;-cpu_startup_entry
&gt; [  125.940066]   &lt;idle&gt;-0       3...1 28389282548: native_play_dead &lt;-arch_cpu_idle_dead
&gt; [  125.940066]   &lt;idle&gt;-0       3...1 28389282924: play_dead_common &lt;-native_play_dead
&gt; [  125.940066]   &lt;idle&gt;-0       3...1 28389283468: idle_task_exit &lt;-play_dead_common
&gt; [  125.940066]   &lt;idle&gt;-0       3...1 28389284644: amd_e400_remove_cpu &lt;-play_dead_common
&gt;
&gt;
&gt; CPU 3 is now offline, and the rcu_preempt thread that ran on CPU 3 is
&gt; still doing a schedule_timeout_uninterruptible(), having registered its
&gt; timeout on the timer base for CPU 3. You would think that it would get
&gt; migrated, right? The issue here is that the timer migration happens in
&gt; the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for
&gt; CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is
&gt; held by the thread that just put itself into an uninterruptible sleep.
&gt; That thread won't wake up until the CPU_DEAD notifier of the timer
&gt; infrastructure is called, which won't happen until the rcu notifier
&gt; finishes. Here's our deadlock!
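
The circular wait described above can be sketched as a tiny wait-for graph.
This is a purely illustrative user-space Python model (the node names are
invented for the example, not kernel symbols):

```python
# Illustrative sketch only, not kernel code: model each blocked party
# from the analysis above as a node in a wait-for graph, then walk the
# graph to recover the cycle. Node names are invented for this example.

waits_for = {
    # rcu_preempt sleeps on a timer registered on (now offline) CPU 3;
    # that timer only runs once the timer core's CPU_DEAD work migrates it
    "rcu_preempt": "timer_cpu_dead_notifier",
    # CPU_DEAD processing cannot begin until the CPU_DOWN notifier chain,
    # including RCU's notifier, has finished
    "timer_cpu_dead_notifier": "rcu_cpu_down_notifier",
    # RCU's CPU_DOWN notifier blocks on onoff_mutex, held by rcu_preempt
    "rcu_cpu_down_notifier": "rcu_preempt",
}

def find_cycle(graph, start):
    """Follow wait-for edges from start and return the cycle, if any."""
    path = []
    node = start
    while node in graph:
        if node in path:
            return path[path.index(node):]
        path.append(node)
        node = graph[node]
    return None

print(find_cycle(waits_for, "rcu_preempt"))
```

Each task waits on the next and the last waits on the first, which is
exactly the deadlock the traces show.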

This commit breaks this deadlock cycle by substituting a shorter udelay()
for the previous schedule_timeout_uninterruptible(), while at the same
time increasing the probability of the delay.  This maintains the intensity
of the testing.

Reported-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Tested-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>rcu: Don't call wakeup() with rcu_node structure -&gt;lock held</title>
<updated>2013-06-10T20:37:11+00:00</updated>
<author>
<name>Steven Rostedt</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2013-05-28T21:32:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=016a8d5be6ddcc72ef0432d82d9f6fa34f61b907'/>
<id>016a8d5be6ddcc72ef0432d82d9f6fa34f61b907</id>
<content type='text'>
This commit fixes a lockdep-detected deadlock by moving a wake_up()
call out of an rnp-&gt;lock critical section.  Please see below for
the long version of this story.
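
The pattern of the fix, recording inside the critical section that a
wake-up is needed and issuing it only after the lock is dropped, can be
sketched in user-space Python. This is an illustration of the pattern,
not the actual patch; threading primitives stand in for kernel locks
and wait queues:

```python
# Illustrative sketch, not the kernel patch: defer the wake-up so that it
# happens after the lock is released, never inside the critical section.
import threading

rnp_lock = threading.Lock()       # stands in for the rcu_node lock
gp_waitqueue = threading.Event()  # stands in for the grace-period wait queue

def start_gp_buggy():
    with rnp_lock:
        gp_waitqueue.set()  # wake-up inside the critical section: the woken
                            # path may take locks ordered after this one

def start_gp_fixed():
    need_wake = False
    with rnp_lock:
        need_wake = True    # only record that a wake-up is needed
    if need_wake:
        gp_waitqueue.set()  # wake-up issued after the lock is dropped

start_gp_fixed()
print(gp_waitqueue.is_set())
```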

On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:

&gt; [12572.705832] ======================================================
&gt; [12572.750317] [ INFO: possible circular locking dependency detected ]
&gt; [12572.796978] 3.10.0-rc3+ #39 Not tainted
&gt; [12572.833381] -------------------------------------------------------
&gt; [12572.862233] trinity-child17/31341 is trying to acquire lock:
&gt; [12572.870390]  (rcu_node_0){..-.-.}, at: [&lt;ffffffff811054ff&gt;] rcu_read_unlock_special+0x9f/0x4c0
&gt; [12572.878859]
&gt; but task is already holding lock:
&gt; [12572.894894]  (&amp;ctx-&gt;lock){-.-...}, at: [&lt;ffffffff811390ed&gt;] perf_lock_task_context+0x7d/0x2d0
&gt; [12572.903381]
&gt; which lock already depends on the new lock.
&gt;
&gt; [12572.927541]
&gt; the existing dependency chain (in reverse order) is:
&gt; [12572.943736]
&gt; -&gt; #4 (&amp;ctx-&gt;lock){-.-...}:
&gt; [12572.960032]        [&lt;ffffffff810b9851&gt;] lock_acquire+0x91/0x1f0
&gt; [12572.968337]        [&lt;ffffffff816ebc90&gt;] _raw_spin_lock+0x40/0x80
&gt; [12572.976633]        [&lt;ffffffff8113c987&gt;] __perf_event_task_sched_out+0x2e7/0x5e0
&gt; [12572.984969]        [&lt;ffffffff81088953&gt;] perf_event_task_sched_out+0x93/0xa0
&gt; [12572.993326]        [&lt;ffffffff816ea0bf&gt;] __schedule+0x2cf/0x9c0
&gt; [12573.001652]        [&lt;ffffffff816eacfe&gt;] schedule_user+0x2e/0x70
&gt; [12573.009998]        [&lt;ffffffff816ecd64&gt;] retint_careful+0x12/0x2e
&gt; [12573.018321]
&gt; -&gt; #3 (&amp;rq-&gt;lock){-.-.-.}:
&gt; [12573.034628]        [&lt;ffffffff810b9851&gt;] lock_acquire+0x91/0x1f0
&gt; [12573.042930]        [&lt;ffffffff816ebc90&gt;] _raw_spin_lock+0x40/0x80
&gt; [12573.051248]        [&lt;ffffffff8108e6a7&gt;] wake_up_new_task+0xb7/0x260
&gt; [12573.059579]        [&lt;ffffffff810492f5&gt;] do_fork+0x105/0x470
&gt; [12573.067880]        [&lt;ffffffff81049686&gt;] kernel_thread+0x26/0x30
&gt; [12573.076202]        [&lt;ffffffff816cee63&gt;] rest_init+0x23/0x140
&gt; [12573.084508]        [&lt;ffffffff81ed8e1f&gt;] start_kernel+0x3f1/0x3fe
&gt; [12573.092852]        [&lt;ffffffff81ed856f&gt;] x86_64_start_reservations+0x2a/0x2c
&gt; [12573.101233]        [&lt;ffffffff81ed863d&gt;] x86_64_start_kernel+0xcc/0xcf
&gt; [12573.109528]
&gt; -&gt; #2 (&amp;p-&gt;pi_lock){-.-.-.}:
&gt; [12573.125675]        [&lt;ffffffff810b9851&gt;] lock_acquire+0x91/0x1f0
&gt; [12573.133829]        [&lt;ffffffff816ebe9b&gt;] _raw_spin_lock_irqsave+0x4b/0x90
&gt; [12573.141964]        [&lt;ffffffff8108e881&gt;] try_to_wake_up+0x31/0x320
&gt; [12573.150065]        [&lt;ffffffff8108ebe2&gt;] default_wake_function+0x12/0x20
&gt; [12573.158151]        [&lt;ffffffff8107bbf8&gt;] autoremove_wake_function+0x18/0x40
&gt; [12573.166195]        [&lt;ffffffff81085398&gt;] __wake_up_common+0x58/0x90
&gt; [12573.174215]        [&lt;ffffffff81086909&gt;] __wake_up+0x39/0x50
&gt; [12573.182146]        [&lt;ffffffff810fc3da&gt;] rcu_start_gp_advanced.isra.11+0x4a/0x50
&gt; [12573.190119]        [&lt;ffffffff810fdb09&gt;] rcu_start_future_gp+0x1c9/0x1f0
&gt; [12573.198023]        [&lt;ffffffff810fe2c4&gt;] rcu_nocb_kthread+0x114/0x930
&gt; [12573.205860]        [&lt;ffffffff8107a91d&gt;] kthread+0xed/0x100
&gt; [12573.213656]        [&lt;ffffffff816f4b1c&gt;] ret_from_fork+0x7c/0xb0
&gt; [12573.221379]
&gt; -&gt; #1 (&amp;rsp-&gt;gp_wq){..-.-.}:
&gt; [12573.236329]        [&lt;ffffffff810b9851&gt;] lock_acquire+0x91/0x1f0
&gt; [12573.243783]        [&lt;ffffffff816ebe9b&gt;] _raw_spin_lock_irqsave+0x4b/0x90
&gt; [12573.251178]        [&lt;ffffffff810868f3&gt;] __wake_up+0x23/0x50
&gt; [12573.258505]        [&lt;ffffffff810fc3da&gt;] rcu_start_gp_advanced.isra.11+0x4a/0x50
&gt; [12573.265891]        [&lt;ffffffff810fdb09&gt;] rcu_start_future_gp+0x1c9/0x1f0
&gt; [12573.273248]        [&lt;ffffffff810fe2c4&gt;] rcu_nocb_kthread+0x114/0x930
&gt; [12573.280564]        [&lt;ffffffff8107a91d&gt;] kthread+0xed/0x100
&gt; [12573.287807]        [&lt;ffffffff816f4b1c&gt;] ret_from_fork+0x7c/0xb0

Notice the above call chain.

rcu_start_future_gp() is called with the rnp-&gt;lock held. It then calls
rcu_start_gp_advanced(), which does a wake-up.

You can't do wake-ups while holding the rnp-&gt;lock, as that would mean
that you could not do an rcu_read_unlock() while holding the rq lock, or
any lock that was taken while holding the rq lock. This is because...
(See below).

&gt; [12573.295067]
&gt; -&gt; #0 (rcu_node_0){..-.-.}:
&gt; [12573.309293]        [&lt;ffffffff810b8d36&gt;] __lock_acquire+0x1786/0x1af0
&gt; [12573.316568]        [&lt;ffffffff810b9851&gt;] lock_acquire+0x91/0x1f0
&gt; [12573.323825]        [&lt;ffffffff816ebc90&gt;] _raw_spin_lock+0x40/0x80
&gt; [12573.331081]        [&lt;ffffffff811054ff&gt;] rcu_read_unlock_special+0x9f/0x4c0
&gt; [12573.338377]        [&lt;ffffffff810760a6&gt;] __rcu_read_unlock+0x96/0xa0
&gt; [12573.345648]        [&lt;ffffffff811391b3&gt;] perf_lock_task_context+0x143/0x2d0
&gt; [12573.352942]        [&lt;ffffffff8113938e&gt;] find_get_context+0x4e/0x1f0
&gt; [12573.360211]        [&lt;ffffffff811403f4&gt;] SYSC_perf_event_open+0x514/0xbd0
&gt; [12573.367514]        [&lt;ffffffff81140e49&gt;] SyS_perf_event_open+0x9/0x10
&gt; [12573.374816]        [&lt;ffffffff816f4dd4&gt;] tracesys+0xdd/0xe2

Notice the above trace.

perf took its own ctx-&gt;lock, which can be taken while holding the rq
lock. While holding this lock, it did an rcu_read_unlock(). The
perf_lock_task_context() code basically looks like this:

rcu_read_lock();
raw_spin_lock(ctx-&gt;lock);
rcu_read_unlock();

Now, what appears to have happened is that we were scheduled out after
taking that first rcu_read_lock() but before taking the spinlock. When
we were scheduled back in and took the ctx-&gt;lock, the subsequent
rcu_read_unlock() triggered the "special" code.

rcu_read_unlock_special() takes the rnp-&gt;lock, which gives us a
possible deadlock scenario:

	CPU0		CPU1		CPU2
	----		----		----

				     rcu_nocb_kthread()
    lock(rq-&gt;lock);
		    lock(ctx-&gt;lock);
				     lock(rnp-&gt;lock);

				     wake_up();

				     lock(rq-&gt;lock);

		    rcu_read_unlock();

		    rcu_read_unlock_special();

		    lock(rnp-&gt;lock);
    lock(ctx-&gt;lock);

**** DEADLOCK ****

&gt; [12573.382068]
&gt; other info that might help us debug this:
&gt;
&gt; [12573.403229] Chain exists of:
&gt;   rcu_node_0 --&gt; &amp;rq-&gt;lock --&gt; &amp;ctx-&gt;lock
&gt;
&gt; [12573.424471]  Possible unsafe locking scenario:
&gt;
&gt; [12573.438499]        CPU0                    CPU1
&gt; [12573.445599]        ----                    ----
&gt; [12573.452691]   lock(&amp;ctx-&gt;lock);
&gt; [12573.459799]                                lock(&amp;rq-&gt;lock);
&gt; [12573.467010]                                lock(&amp;ctx-&gt;lock);
&gt; [12573.474192]   lock(rcu_node_0);
&gt; [12573.481262]
&gt;  *** DEADLOCK ***
&gt;
&gt; [12573.501931] 1 lock held by trinity-child17/31341:
&gt; [12573.508990]  #0:  (&amp;ctx-&gt;lock){-.-...}, at: [&lt;ffffffff811390ed&gt;] perf_lock_task_context+0x7d/0x2d0
&gt; [12573.516475]
&gt; stack backtrace:
&gt; [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
&gt; [12573.545357]  ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
&gt; [12573.552868]  ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
&gt; [12573.560353]  0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
&gt; [12573.567856] Call Trace:
&gt; [12573.575011]  [&lt;ffffffff816e375b&gt;] dump_stack+0x19/0x1b
&gt; [12573.582284]  [&lt;ffffffff816dfa5d&gt;] print_circular_bug+0x200/0x20f
&gt; [12573.589637]  [&lt;ffffffff810b8d36&gt;] __lock_acquire+0x1786/0x1af0
&gt; [12573.596982]  [&lt;ffffffff810918f5&gt;] ? sched_clock_cpu+0xb5/0x100
&gt; [12573.604344]  [&lt;ffffffff810b9851&gt;] lock_acquire+0x91/0x1f0
&gt; [12573.611652]  [&lt;ffffffff811054ff&gt;] ? rcu_read_unlock_special+0x9f/0x4c0
&gt; [12573.619030]  [&lt;ffffffff816ebc90&gt;] _raw_spin_lock+0x40/0x80
&gt; [12573.626331]  [&lt;ffffffff811054ff&gt;] ? rcu_read_unlock_special+0x9f/0x4c0
&gt; [12573.633671]  [&lt;ffffffff811054ff&gt;] rcu_read_unlock_special+0x9f/0x4c0
&gt; [12573.640992]  [&lt;ffffffff811390ed&gt;] ? perf_lock_task_context+0x7d/0x2d0
&gt; [12573.648330]  [&lt;ffffffff810b429e&gt;] ? put_lock_stats.isra.29+0xe/0x40
&gt; [12573.655662]  [&lt;ffffffff813095a0&gt;] ? delay_tsc+0x90/0xe0
&gt; [12573.662964]  [&lt;ffffffff810760a6&gt;] __rcu_read_unlock+0x96/0xa0
&gt; [12573.670276]  [&lt;ffffffff811391b3&gt;] perf_lock_task_context+0x143/0x2d0
&gt; [12573.677622]  [&lt;ffffffff81139070&gt;] ? __perf_event_enable+0x370/0x370
&gt; [12573.684981]  [&lt;ffffffff8113938e&gt;] find_get_context+0x4e/0x1f0
&gt; [12573.692358]  [&lt;ffffffff811403f4&gt;] SYSC_perf_event_open+0x514/0xbd0
&gt; [12573.699753]  [&lt;ffffffff8108cd9d&gt;] ? get_parent_ip+0xd/0x50
&gt; [12573.707135]  [&lt;ffffffff810b71fd&gt;] ? trace_hardirqs_on_caller+0xfd/0x1c0
&gt; [12573.714599]  [&lt;ffffffff81140e49&gt;] SyS_perf_event_open+0x9/0x10
&gt; [12573.721996]  [&lt;ffffffff816f4dd4&gt;] tracesys+0xdd/0xe2

This commit therefore defers the wakeup via irq_work(), which is what
perf and ftrace use to perform wakeups from within critical sections.
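
The resulting change can be sketched as follows (kernel C, following the
names used by the upstream fix; this is an outline of the pattern, not
the complete patch):

```c
/*
 * The irq_work handler runs from interrupt context after the
 * rnp-&gt;lock has been released, so the wakeup is now safe.
 * rcu_init_one() arms it once with:
 *	init_irq_work(&amp;rsp-&gt;wakeup_work, rsp_wakeup);
 */
static void rsp_wakeup(struct irq_work *work)
{
	struct rcu_state *rsp = container_of(work, struct rcu_state, wakeup_work);

	/* Wake up rcu_gp_kthread() to start the grace period. */
	wake_up(&amp;rsp-&gt;gp_wq);
}

static void rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
				  struct rcu_data *rdp)
{
	/* ... grace-period bookkeeping elided ... */

	/*
	 * We can't do wakeups while holding the rnp-&gt;lock, as that
	 * could deadlock against the rq-&gt;lock.  Defer the wakeup to
	 * irq_work context instead.
	 */
	irq_work_queue(&amp;rsp-&gt;wakeup_work);	/* was: wake_up(&amp;rsp-&gt;gp_wq); */
}
```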

Reported-by: Dave Jones &lt;davej@redhat.com&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This commit fixes a lockdep-detected deadlock by moving a wake_up()
call out from a rnp-&gt;lock critical section.  Please see below for
the long version of this story.

On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:

&gt; [12572.705832] ======================================================
&gt; [12572.750317] [ INFO: possible circular locking dependency detected ]
&gt; [12572.796978] 3.10.0-rc3+ #39 Not tainted
&gt; [12572.833381] -------------------------------------------------------
&gt; [12572.862233] trinity-child17/31341 is trying to acquire lock:
&gt; [12572.870390]  (rcu_node_0){..-.-.}, at: [&lt;ffffffff811054ff&gt;] rcu_read_unlock_special+0x9f/0x4c0
&gt; [12572.878859]
&gt; but task is already holding lock:
&gt; [12572.894894]  (&amp;ctx-&gt;lock){-.-...}, at: [&lt;ffffffff811390ed&gt;] perf_lock_task_context+0x7d/0x2d0
&gt; [12572.903381]
&gt; which lock already depends on the new lock.
&gt;
&gt; [12572.927541]
&gt; the existing dependency chain (in reverse order) is:
&gt; [12572.943736]
&gt; -&gt; #4 (&amp;ctx-&gt;lock){-.-...}:
&gt; [12572.960032]        [&lt;ffffffff810b9851&gt;] lock_acquire+0x91/0x1f0
&gt; [12572.968337]        [&lt;ffffffff816ebc90&gt;] _raw_spin_lock+0x40/0x80
&gt; [12572.976633]        [&lt;ffffffff8113c987&gt;] __perf_event_task_sched_out+0x2e7/0x5e0
&gt; [12572.984969]        [&lt;ffffffff81088953&gt;] perf_event_task_sched_out+0x93/0xa0
&gt; [12572.993326]        [&lt;ffffffff816ea0bf&gt;] __schedule+0x2cf/0x9c0
&gt; [12573.001652]        [&lt;ffffffff816eacfe&gt;] schedule_user+0x2e/0x70
&gt; [12573.009998]        [&lt;ffffffff816ecd64&gt;] retint_careful+0x12/0x2e
&gt; [12573.018321]
&gt; -&gt; #3 (&amp;rq-&gt;lock){-.-.-.}:
&gt; [12573.034628]        [&lt;ffffffff810b9851&gt;] lock_acquire+0x91/0x1f0
&gt; [12573.042930]        [&lt;ffffffff816ebc90&gt;] _raw_spin_lock+0x40/0x80
&gt; [12573.051248]        [&lt;ffffffff8108e6a7&gt;] wake_up_new_task+0xb7/0x260
&gt; [12573.059579]        [&lt;ffffffff810492f5&gt;] do_fork+0x105/0x470
&gt; [12573.067880]        [&lt;ffffffff81049686&gt;] kernel_thread+0x26/0x30
&gt; [12573.076202]        [&lt;ffffffff816cee63&gt;] rest_init+0x23/0x140
&gt; [12573.084508]        [&lt;ffffffff81ed8e1f&gt;] start_kernel+0x3f1/0x3fe
&gt; [12573.092852]        [&lt;ffffffff81ed856f&gt;] x86_64_start_reservations+0x2a/0x2c
&gt; [12573.101233]        [&lt;ffffffff81ed863d&gt;] x86_64_start_kernel+0xcc/0xcf
&gt; [12573.109528]
&gt; -&gt; #2 (&amp;p-&gt;pi_lock){-.-.-.}:
&gt; [12573.125675]        [&lt;ffffffff810b9851&gt;] lock_acquire+0x91/0x1f0
&gt; [12573.133829]        [&lt;ffffffff816ebe9b&gt;] _raw_spin_lock_irqsave+0x4b/0x90
&gt; [12573.141964]        [&lt;ffffffff8108e881&gt;] try_to_wake_up+0x31/0x320
&gt; [12573.150065]        [&lt;ffffffff8108ebe2&gt;] default_wake_function+0x12/0x20
&gt; [12573.158151]        [&lt;ffffffff8107bbf8&gt;] autoremove_wake_function+0x18/0x40
&gt; [12573.166195]        [&lt;ffffffff81085398&gt;] __wake_up_common+0x58/0x90
&gt; [12573.174215]        [&lt;ffffffff81086909&gt;] __wake_up+0x39/0x50
&gt; [12573.182146]        [&lt;ffffffff810fc3da&gt;] rcu_start_gp_advanced.isra.11+0x4a/0x50
&gt; [12573.190119]        [&lt;ffffffff810fdb09&gt;] rcu_start_future_gp+0x1c9/0x1f0
&gt; [12573.198023]        [&lt;ffffffff810fe2c4&gt;] rcu_nocb_kthread+0x114/0x930
&gt; [12573.205860]        [&lt;ffffffff8107a91d&gt;] kthread+0xed/0x100
&gt; [12573.213656]        [&lt;ffffffff816f4b1c&gt;] ret_from_fork+0x7c/0xb0
&gt; [12573.221379]
&gt; -&gt; #1 (&amp;rsp-&gt;gp_wq){..-.-.}:
&gt; [12573.236329]        [&lt;ffffffff810b9851&gt;] lock_acquire+0x91/0x1f0
&gt; [12573.243783]        [&lt;ffffffff816ebe9b&gt;] _raw_spin_lock_irqsave+0x4b/0x90
&gt; [12573.251178]        [&lt;ffffffff810868f3&gt;] __wake_up+0x23/0x50
&gt; [12573.258505]        [&lt;ffffffff810fc3da&gt;] rcu_start_gp_advanced.isra.11+0x4a/0x50
&gt; [12573.265891]        [&lt;ffffffff810fdb09&gt;] rcu_start_future_gp+0x1c9/0x1f0
&gt; [12573.273248]        [&lt;ffffffff810fe2c4&gt;] rcu_nocb_kthread+0x114/0x930
&gt; [12573.280564]        [&lt;ffffffff8107a91d&gt;] kthread+0xed/0x100
&gt; [12573.287807]        [&lt;ffffffff816f4b1c&gt;] ret_from_fork+0x7c/0xb0

Notice the above call chain.

rcu_start_future_gp() is called with the rnp-&gt;lock held. Then it calls
rcu_start_gp_advanced(), which does a wakeup.

You can't do wakeups while holding the rnp-&gt;lock, as that would mean
that you could not do an rcu_read_unlock() while holding the rq lock, or
any lock that was taken while holding the rq lock. The reason is
illustrated by the dependency chain below.

&gt; [12573.295067]
&gt; -&gt; #0 (rcu_node_0){..-.-.}:
&gt; [12573.309293]        [&lt;ffffffff810b8d36&gt;] __lock_acquire+0x1786/0x1af0
&gt; [12573.316568]        [&lt;ffffffff810b9851&gt;] lock_acquire+0x91/0x1f0
&gt; [12573.323825]        [&lt;ffffffff816ebc90&gt;] _raw_spin_lock+0x40/0x80
&gt; [12573.331081]        [&lt;ffffffff811054ff&gt;] rcu_read_unlock_special+0x9f/0x4c0
&gt; [12573.338377]        [&lt;ffffffff810760a6&gt;] __rcu_read_unlock+0x96/0xa0
&gt; [12573.345648]        [&lt;ffffffff811391b3&gt;] perf_lock_task_context+0x143/0x2d0
&gt; [12573.352942]        [&lt;ffffffff8113938e&gt;] find_get_context+0x4e/0x1f0
&gt; [12573.360211]        [&lt;ffffffff811403f4&gt;] SYSC_perf_event_open+0x514/0xbd0
&gt; [12573.367514]        [&lt;ffffffff81140e49&gt;] SyS_perf_event_open+0x9/0x10
&gt; [12573.374816]        [&lt;ffffffff816f4dd4&gt;] tracesys+0xdd/0xe2

Notice the above trace.

perf took its own ctx-&gt;lock, which can be taken while holding the rq
lock. While holding this lock, it did an rcu_read_unlock(). The
perf_lock_task_context() code basically looks like this:

rcu_read_lock();
raw_spin_lock(ctx-&gt;lock);
rcu_read_unlock();

Now, what appears to have happened is that we were scheduled out after
taking that first rcu_read_lock() but before taking the spinlock. When
we were scheduled back in and took the ctx-&gt;lock, the subsequent
rcu_read_unlock() triggered the "special" code.

rcu_read_unlock_special() takes the rnp-&gt;lock, which gives us a
possible deadlock scenario:

	CPU0		CPU1		CPU2
	----		----		----

				     rcu_nocb_kthread()
    lock(rq-&gt;lock);
		    lock(ctx-&gt;lock);
				     lock(rnp-&gt;lock);

				     wake_up();

				     lock(rq-&gt;lock);

		    rcu_read_unlock();

		    rcu_read_unlock_special();

		    lock(rnp-&gt;lock);
    lock(ctx-&gt;lock);

**** DEADLOCK ****

&gt; [12573.382068]
&gt; other info that might help us debug this:
&gt;
&gt; [12573.403229] Chain exists of:
&gt;   rcu_node_0 --&gt; &amp;rq-&gt;lock --&gt; &amp;ctx-&gt;lock
&gt;
&gt; [12573.424471]  Possible unsafe locking scenario:
&gt;
&gt; [12573.438499]        CPU0                    CPU1
&gt; [12573.445599]        ----                    ----
&gt; [12573.452691]   lock(&amp;ctx-&gt;lock);
&gt; [12573.459799]                                lock(&amp;rq-&gt;lock);
&gt; [12573.467010]                                lock(&amp;ctx-&gt;lock);
&gt; [12573.474192]   lock(rcu_node_0);
&gt; [12573.481262]
&gt;  *** DEADLOCK ***
&gt;
&gt; [12573.501931] 1 lock held by trinity-child17/31341:
&gt; [12573.508990]  #0:  (&amp;ctx-&gt;lock){-.-...}, at: [&lt;ffffffff811390ed&gt;] perf_lock_task_context+0x7d/0x2d0
&gt; [12573.516475]
&gt; stack backtrace:
&gt; [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
&gt; [12573.545357]  ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
&gt; [12573.552868]  ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
&gt; [12573.560353]  0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
&gt; [12573.567856] Call Trace:
&gt; [12573.575011]  [&lt;ffffffff816e375b&gt;] dump_stack+0x19/0x1b
&gt; [12573.582284]  [&lt;ffffffff816dfa5d&gt;] print_circular_bug+0x200/0x20f
&gt; [12573.589637]  [&lt;ffffffff810b8d36&gt;] __lock_acquire+0x1786/0x1af0
&gt; [12573.596982]  [&lt;ffffffff810918f5&gt;] ? sched_clock_cpu+0xb5/0x100
&gt; [12573.604344]  [&lt;ffffffff810b9851&gt;] lock_acquire+0x91/0x1f0
&gt; [12573.611652]  [&lt;ffffffff811054ff&gt;] ? rcu_read_unlock_special+0x9f/0x4c0
&gt; [12573.619030]  [&lt;ffffffff816ebc90&gt;] _raw_spin_lock+0x40/0x80
&gt; [12573.626331]  [&lt;ffffffff811054ff&gt;] ? rcu_read_unlock_special+0x9f/0x4c0
&gt; [12573.633671]  [&lt;ffffffff811054ff&gt;] rcu_read_unlock_special+0x9f/0x4c0
&gt; [12573.640992]  [&lt;ffffffff811390ed&gt;] ? perf_lock_task_context+0x7d/0x2d0
&gt; [12573.648330]  [&lt;ffffffff810b429e&gt;] ? put_lock_stats.isra.29+0xe/0x40
&gt; [12573.655662]  [&lt;ffffffff813095a0&gt;] ? delay_tsc+0x90/0xe0
&gt; [12573.662964]  [&lt;ffffffff810760a6&gt;] __rcu_read_unlock+0x96/0xa0
&gt; [12573.670276]  [&lt;ffffffff811391b3&gt;] perf_lock_task_context+0x143/0x2d0
&gt; [12573.677622]  [&lt;ffffffff81139070&gt;] ? __perf_event_enable+0x370/0x370
&gt; [12573.684981]  [&lt;ffffffff8113938e&gt;] find_get_context+0x4e/0x1f0
&gt; [12573.692358]  [&lt;ffffffff811403f4&gt;] SYSC_perf_event_open+0x514/0xbd0
&gt; [12573.699753]  [&lt;ffffffff8108cd9d&gt;] ? get_parent_ip+0xd/0x50
&gt; [12573.707135]  [&lt;ffffffff810b71fd&gt;] ? trace_hardirqs_on_caller+0xfd/0x1c0
&gt; [12573.714599]  [&lt;ffffffff81140e49&gt;] SyS_perf_event_open+0x9/0x10
&gt; [12573.721996]  [&lt;ffffffff816f4dd4&gt;] tracesys+0xdd/0xe2

This commit therefore defers the wakeup via irq_work(), which is what
perf and ftrace use to perform wakeups from within critical sections.
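
The resulting change can be sketched as follows (kernel C, following the
names used by the upstream fix; this is an outline of the pattern, not
the complete patch):

```c
/*
 * The irq_work handler runs from interrupt context after the
 * rnp-&gt;lock has been released, so the wakeup is now safe.
 * rcu_init_one() arms it once with:
 *	init_irq_work(&amp;rsp-&gt;wakeup_work, rsp_wakeup);
 */
static void rsp_wakeup(struct irq_work *work)
{
	struct rcu_state *rsp = container_of(work, struct rcu_state, wakeup_work);

	/* Wake up rcu_gp_kthread() to start the grace period. */
	wake_up(&amp;rsp-&gt;gp_wq);
}

static void rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
				  struct rcu_data *rdp)
{
	/* ... grace-period bookkeeping elided ... */

	/*
	 * We can't do wakeups while holding the rnp-&gt;lock, as that
	 * could deadlock against the rq-&gt;lock.  Defer the wakeup to
	 * irq_work context instead.
	 */
	irq_work_queue(&amp;rsp-&gt;wakeup_work);	/* was: wake_up(&amp;rsp-&gt;gp_wq); */
}
```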

Reported-by: Dave Jones &lt;davej@redhat.com&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge commit '8700c95adb03' into timers/nohz</title>
<updated>2013-05-02T15:54:19+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2013-05-02T15:37:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c032862fba51a3ca504752d3a25186b324c5ce83'/>
<id>c032862fba51a3ca504752d3a25186b324c5ce83</id>
<content type='text'>
The full dynticks tree needs the latest RCU and sched
upstream updates in order to fix some dependencies.

Merge a common upstream merge point that has these
updates.

Conflicts:
	include/linux/perf_event.h
	kernel/rcutree.h
	kernel/rcutree_plugin.h

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The full dynticks tree needs the latest RCU and sched
upstream updates in order to fix some dependencies.

Merge a common upstream merge point that has these
updates.

Conflicts:
	include/linux/perf_event.h
	kernel/rcutree.h
	kernel/rcutree_plugin.h

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2013-04-30T14:39:01+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-04-30T14:39:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1f889ec62c3f0d8913f3c32f9aff2a1e15099346'/>
<id>1f889ec62c3f0d8913f3c32f9aff2a1e15099346</id>
<content type='text'>
Pull RCU updates from Ingo Molnar:
 "The main changes in this cycle are mostly related to preparatory work
  for the full-dynticks work:

   - Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ take
     advantage of numbered callbacks, do callback accelerations based on
     numbered callbacks.  Posted to LKML at
        https://lkml.org/lkml/2013/3/18/960

   - RCU documentation updates.  Posted to LKML at
        https://lkml.org/lkml/2013/3/18/570

   - Miscellaneous fixes.  Posted to LKML at
        https://lkml.org/lkml/2013/3/18/594"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  rcu: Make rcu_accelerate_cbs() note need for future grace periods
  rcu: Abstract rcu_start_future_gp() from rcu_nocb_wait_gp()
  rcu: Rename n_nocb_gp_requests to need_future_gp
  rcu: Push lock release to rcu_start_gp()'s callers
  rcu: Repurpose no-CBs event tracing to future-GP events
  rcu: Rearrange locking in rcu_start_gp()
  rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks
  rcu: Accelerate RCU callbacks at grace-period end
  rcu: Export RCU_FAST_NO_HZ parameters to sysfs
  rcu: Distinguish "rcuo" kthreads by RCU flavor
  rcu: Add event tracing for no-CBs CPUs' grace periods
  rcu: Add event tracing for no-CBs CPUs' callback registration
  rcu: Introduce proper blocking to no-CBs kthreads GP waits
  rcu: Provide compile-time control for no-CBs CPUs
  rcu: Tone down debugging during boot-up and shutdown.
  rcu: Add softirq-stall indications to stall-warning messages
  rcu: Documentation update
  rcu: Make bugginess of code sample more evident
  rcu: Fix hlist_bl_set_first_rcu() annotation
  rcu: Delete unused rcu_node "wakemask" field
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull RCU updates from Ingo Molnar:
 "The main changes in this cycle are mostly related to preparatory work
  for the full-dynticks work:

   - Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ take
     advantage of numbered callbacks, do callback accelerations based on
     numbered callbacks.  Posted to LKML at
        https://lkml.org/lkml/2013/3/18/960

   - RCU documentation updates.  Posted to LKML at
        https://lkml.org/lkml/2013/3/18/570

   - Miscellaneous fixes.  Posted to LKML at
        https://lkml.org/lkml/2013/3/18/594"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  rcu: Make rcu_accelerate_cbs() note need for future grace periods
  rcu: Abstract rcu_start_future_gp() from rcu_nocb_wait_gp()
  rcu: Rename n_nocb_gp_requests to need_future_gp
  rcu: Push lock release to rcu_start_gp()'s callers
  rcu: Repurpose no-CBs event tracing to future-GP events
  rcu: Rearrange locking in rcu_start_gp()
  rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks
  rcu: Accelerate RCU callbacks at grace-period end
  rcu: Export RCU_FAST_NO_HZ parameters to sysfs
  rcu: Distinguish "rcuo" kthreads by RCU flavor
  rcu: Add event tracing for no-CBs CPUs' grace periods
  rcu: Add event tracing for no-CBs CPUs' callback registration
  rcu: Introduce proper blocking to no-CBs kthreads GP waits
  rcu: Provide compile-time control for no-CBs CPUs
  rcu: Tone down debugging during boot-up and shutdown.
  rcu: Add softirq-stall indications to stall-warning messages
  rcu: Documentation update
  rcu: Make bugginess of code sample more evident
  rcu: Fix hlist_bl_set_first_rcu() annotation
  rcu: Delete unused rcu_node "wakemask" field
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>kernel/: rename random32() to prandom_u32()</title>
<updated>2013-04-30T01:28:42+00:00</updated>
<author>
<name>Akinobu Mita</name>
<email>akinobu.mita@gmail.com</email>
</author>
<published>2013-04-29T23:21:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6d65df3325c380f3c897330c48f0e53d73b8f362'/>
<id>6d65df3325c380f3c897330c48f0e53d73b8f362</id>
<content type='text'>
Use a preferable function name, one that implies the use of a
pseudo-random number generator.

Signed-off-by: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use a preferable function name, one that implies the use of a
pseudo-random number generator.

Signed-off-by: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nohz: Ensure full dynticks CPUs are RCU nocbs</title>
<updated>2013-04-19T11:54:04+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2013-03-26T22:47:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d1e43fa5f8bb25f83a86a29f11fcfb57ed4d7566'/>
<id>d1e43fa5f8bb25f83a86a29f11fcfb57ed4d7566</id>
<content type='text'>
We need full dynticks CPUs to also be RCU nocb CPUs so
that we don't have to keep the tick around to handle RCU
callbacks.

Make sure the range passed to the nohz_full= boot
parameter is a subset of rcu_nocbs=.

The CPUs that fail to meet this requirement will be
excluded from the nohz_full range. This is checked
early at boot time, before any CPU has had the opportunity
to stop its tick.

Suggested-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Chris Metcalf &lt;cmetcalf@tilera.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Geoff Levand &lt;geoff@infradead.org&gt;
Cc: Gilad Ben Yossef &lt;gilad@benyossef.com&gt;
Cc: Hakan Akkan &lt;hakanakkan@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Kevin Hilman &lt;khilman@linaro.org&gt;
Cc: Li Zhong &lt;zhong@linux.vnet.ibm.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We need full dynticks CPUs to also be RCU nocb CPUs so
that we don't have to keep the tick around to handle RCU
callbacks.

Make sure the range passed to the nohz_full= boot
parameter is a subset of rcu_nocbs=.

The CPUs that fail to meet this requirement will be
excluded from the nohz_full range. This is checked
early at boot time, before any CPU has had the opportunity
to stop its tick.

Suggested-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Chris Metcalf &lt;cmetcalf@tilera.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Geoff Levand &lt;geoff@infradead.org&gt;
Cc: Gilad Ben Yossef &lt;gilad@benyossef.com&gt;
Cc: Hakan Akkan &lt;hakanakkan@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Kevin Hilman &lt;khilman@linaro.org&gt;
Cc: Li Zhong &lt;zhong@linux.vnet.ibm.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Kick adaptive-ticks CPUs that are holding up RCU grace periods</title>
<updated>2013-04-15T18:18:36+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2013-04-12T23:19:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=65d798f0f9339ae2c4ebe9480e3260b33382a584'/>
<id>65d798f0f9339ae2c4ebe9480e3260b33382a584</id>
<content type='text'>
Adaptive-ticks CPUs inform RCU when they enter kernel mode, but they do
not necessarily turn the scheduler-clock tick back on.  This state of
affairs could result in RCU waiting on an adaptive-ticks CPU running
for an extended period in kernel mode.  Such a CPU will never run the
RCU state machine, and could therefore indefinitely extend the current
grace period, sooner or later resulting in an OOM condition.

This patch, inspired by an earlier patch by Frederic Weisbecker, therefore
causes RCU's force-quiescent-state processing to check for this condition
and to send an IPI to CPUs that remain in that state for too long.
"Too long" currently means about three jiffies by default, which is
quite some time for a CPU to remain in the kernel without blocking.
The rcutree.jiffies_till_first_fqs and rcutree.jiffies_till_next_fqs
sysfs variables may be used to tune "too long" if needed.

Reported-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Chris Metcalf &lt;cmetcalf@tilera.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Geoff Levand &lt;geoff@infradead.org&gt;
Cc: Gilad Ben Yossef &lt;gilad@benyossef.com&gt;
Cc: Hakan Akkan &lt;hakanakkan@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Kevin Hilman &lt;khilman@linaro.org&gt;
Cc: Li Zhong &lt;zhong@linux.vnet.ibm.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Adaptive-ticks CPUs inform RCU when they enter kernel mode, but they do
not necessarily turn the scheduler-clock tick back on.  This state of
affairs could result in RCU waiting on an adaptive-ticks CPU running
for an extended period in kernel mode.  Such a CPU will never run the
RCU state machine, and could therefore indefinitely extend the current
grace period, sooner or later resulting in an OOM condition.

This patch, inspired by an earlier patch by Frederic Weisbecker, therefore
causes RCU's force-quiescent-state processing to check for this condition
and to send an IPI to CPUs that remain in that state for too long.
"Too long" currently means about three jiffies by default, which is
quite some time for a CPU to remain in the kernel without blocking.
The rcutree.jiffies_till_first_fqs and rcutree.jiffies_till_next_fqs
sysfs variables may be used to tune "too long" if needed.

Reported-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Chris Metcalf &lt;cmetcalf@tilera.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Geoff Levand &lt;geoff@infradead.org&gt;
Cc: Gilad Ben Yossef &lt;gilad@benyossef.com&gt;
Cc: Hakan Akkan &lt;hakanakkan@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Kevin Hilman &lt;khilman@linaro.org&gt;
Cc: Li Zhong &lt;zhong@linux.vnet.ibm.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branches 'doc.2013.03.12a', 'fixes.2013.03.13a' and 'idlenocb.2013.03.26b' into HEAD</title>
<updated>2013-03-26T15:07:38+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2013-03-26T15:07:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6d87669357936bffa1e8fea7a4e7743e76905736'/>
<id>6d87669357936bffa1e8fea7a4e7743e76905736</id>
<content type='text'>
doc.2013.03.12a: Documentation changes.

fixes.2013.03.13a: Miscellaneous fixes.

idlenocb.2013.03.26b: Remove restrictions on no-CBs CPUs, make
	RCU_FAST_NO_HZ take advantage of numbered callbacks, add
	callback acceleration based on numbered callbacks.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
doc.2013.03.12a: Documentation changes.

fixes.2013.03.13a: Miscellaneous fixes.

idlenocb.2013.03.26b: Remove restrictions on no-CBs CPUs, make
	RCU_FAST_NO_HZ take advantage of numbered callbacks, add
	callback acceleration based on numbered callbacks.
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Make rcu_accelerate_cbs() note need for future grace periods</title>
<updated>2013-03-26T15:04:58+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paul.mckenney@linaro.org</email>
</author>
<published>2012-12-31T10:24:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=910ee45db2f4837c8440e770474758493ab94bf7'/>
<id>910ee45db2f4837c8440e770474758493ab94bf7</id>
<content type='text'>
Now that rcu_start_future_gp() has been abstracted from
rcu_nocb_wait_gp(), rcu_accelerate_cbs() can invoke rcu_start_future_gp()
so as to register the need for any future grace periods needed by a
CPU about to enter dyntick-idle mode.  This commit makes this change.
Note that some refactoring of rcu_start_gp() is carried out to avoid
recursion and subsequent self-deadlocks.

Signed-off-by: Paul E. McKenney &lt;paul.mckenney@linaro.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that rcu_start_future_gp() has been abstracted from
rcu_nocb_wait_gp(), rcu_accelerate_cbs() can invoke rcu_start_future_gp()
so as to register the need for any future grace periods needed by a
CPU about to enter dyntick-idle mode.  This commit makes this change.
Note that some refactoring of rcu_start_gp() is carried out to avoid
recursion and subsequent self-deadlocks.

Signed-off-by: Paul E. McKenney &lt;paul.mckenney@linaro.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Abstract rcu_start_future_gp() from rcu_nocb_wait_gp()</title>
<updated>2013-03-26T15:04:57+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paul.mckenney@linaro.org</email>
</author>
<published>2012-12-30T23:21:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0446be489795d8bb994125a916ef03211f539e54'/>
<id>0446be489795d8bb994125a916ef03211f539e54</id>
<content type='text'>
CPUs going idle will need to record the need for a future grace
period, but won't actually need to block waiting on it.  This commit
therefore splits rcu_start_future_gp(), which does the recording, from
rcu_nocb_wait_gp(), which now invokes rcu_start_future_gp() to do the
recording, after which rcu_nocb_wait_gp() does the waiting.

Signed-off-by: Paul E. McKenney &lt;paul.mckenney@linaro.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
CPUs going idle will need to record the need for a future grace
period, but won't actually need to block waiting on it.  This commit
therefore splits rcu_start_future_gp(), which does the recording, from
rcu_nocb_wait_gp(), which now invokes rcu_start_future_gp() to do the
recording, after which rcu_nocb_wait_gp() does the waiting.

Signed-off-by: Paul E. McKenney &lt;paul.mckenney@linaro.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
