<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/softirq.c, branch v3.2.73</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Fix lockup related to stop_machine being stuck in __do_softirq.</title>
<updated>2015-08-06T23:32:19+00:00</updated>
<author>
<name>Ben Greear</name>
<email>greearb@candelatech.com</email>
</author>
<published>2013-06-06T21:29:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=383253cfed2018d4422c6251170107823594309d'/>
<id>383253cfed2018d4422c6251170107823594309d</id>
<content type='text'>
commit 34376a50fb1fa095b9d0636fa41ed2e73125f214 upstream.

The stop machine logic can lock up if all but one of the migration
threads make it through the disable-irq step and the one remaining
thread gets stuck in __do_softirq.  The reason __do_softirq can hang is
that it has a bail-out based on jiffies timeout, but in the lockup case,
jiffies itself is not incremented.

To work around this, re-add the max_restart counter in __do_irq and stop
processing irqs after 10 restarts.

Thanks to Tejun Heo and Rusty Russell and others for helping me track
this down.

This was introduced in 3.9 by commit c10d73671ad3 ("softirq: reduce
latencies").

It may be worth looking into ath9k to see if it has issues with its irq
handler at a later date.

The hang stack traces look something like this:

    ------------[ cut here ]------------
    WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
    Watchdog detected hard LOCKUP on cpu 2
    Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
    Pid: 23, comm: migration/2 Tainted: G         C   3.9.4+ #11
    Call Trace:
     &lt;NMI&gt;   warn_slowpath_common+0x85/0x9f
      warn_slowpath_fmt+0x46/0x48
      watchdog_overflow_callback+0x9c/0xa7
      __perf_event_overflow+0x137/0x1cb
      perf_event_overflow+0x14/0x16
      intel_pmu_handle_irq+0x2dc/0x359
      perf_event_nmi_handler+0x19/0x1b
      nmi_handle+0x7f/0xc2
      do_nmi+0xbc/0x304
      end_repeat_nmi+0x1e/0x2e
     &lt;&lt;EOE&gt;&gt;
      cpu_stopper_thread+0xae/0x162
      smpboot_thread_fn+0x258/0x260
      kthread+0xc7/0xcf
      ret_from_fork+0x7c/0xb0
    ---[ end trace 4947dfa9b0a4cec3 ]---
    BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
    Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
    irq event stamp: 835637905
    hardirqs last  enabled at (835637904): __do_softirq+0x9f/0x257
    hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
    softirqs last  enabled at (5654720): __do_softirq+0x1ff/0x257
    softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
    CPU 1
    Pid: 17, comm: migration/1 Tainted: G        WC   3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
    RIP: tasklet_hi_action+0xf0/0xf0
    Process migration/1
    Call Trace:
     &lt;IRQ&gt;
      __do_softirq+0x117/0x257
      irq_exit+0x5f/0xbb
      smp_apic_timer_interrupt+0x8a/0x98
      apic_timer_interrupt+0x72/0x80
     &lt;EOI&gt;
      printk+0x4d/0x4f
      stop_machine_cpu_stop+0x22c/0x274
      cpu_stopper_thread+0xae/0x162
      smpboot_thread_fn+0x258/0x260
      kthread+0xc7/0xcf
      ret_from_fork+0x7c/0xb0

Signed-off-by: Ben Greear &lt;greearb@candelatech.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Pekka Riikonen &lt;priikone@iki.fi&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Cc: Rui Xiang &lt;rui.xiang@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 34376a50fb1fa095b9d0636fa41ed2e73125f214 upstream.

The stop machine logic can lock up if all but one of the migration
threads make it through the disable-irq step and the one remaining
thread gets stuck in __do_softirq.  The reason __do_softirq can hang is
that it has a bail-out based on jiffies timeout, but in the lockup case,
jiffies itself is not incremented.

To work around this, re-add the max_restart counter in __do_irq and stop
processing irqs after 10 restarts.

Thanks to Tejun Heo and Rusty Russell and others for helping me track
this down.

This was introduced in 3.9 by commit c10d73671ad3 ("softirq: reduce
latencies").

It may be worth looking into ath9k to see if it has issues with its irq
handler at a later date.

The hang stack traces look something like this:

    ------------[ cut here ]------------
    WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
    Watchdog detected hard LOCKUP on cpu 2
    Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
    Pid: 23, comm: migration/2 Tainted: G         C   3.9.4+ #11
    Call Trace:
     &lt;NMI&gt;   warn_slowpath_common+0x85/0x9f
      warn_slowpath_fmt+0x46/0x48
      watchdog_overflow_callback+0x9c/0xa7
      __perf_event_overflow+0x137/0x1cb
      perf_event_overflow+0x14/0x16
      intel_pmu_handle_irq+0x2dc/0x359
      perf_event_nmi_handler+0x19/0x1b
      nmi_handle+0x7f/0xc2
      do_nmi+0xbc/0x304
      end_repeat_nmi+0x1e/0x2e
     &lt;&lt;EOE&gt;&gt;
      cpu_stopper_thread+0xae/0x162
      smpboot_thread_fn+0x258/0x260
      kthread+0xc7/0xcf
      ret_from_fork+0x7c/0xb0
    ---[ end trace 4947dfa9b0a4cec3 ]---
    BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
    Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
    irq event stamp: 835637905
    hardirqs last  enabled at (835637904): __do_softirq+0x9f/0x257
    hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
    softirqs last  enabled at (5654720): __do_softirq+0x1ff/0x257
    softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
    CPU 1
    Pid: 17, comm: migration/1 Tainted: G        WC   3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
    RIP: tasklet_hi_action+0xf0/0xf0
    Process migration/1
    Call Trace:
     &lt;IRQ&gt;
      __do_softirq+0x117/0x257
      irq_exit+0x5f/0xbb
      smp_apic_timer_interrupt+0x8a/0x98
      apic_timer_interrupt+0x72/0x80
     &lt;EOI&gt;
      printk+0x4d/0x4f
      stop_machine_cpu_stop+0x22c/0x274
      cpu_stopper_thread+0xae/0x162
      smpboot_thread_fn+0x258/0x260
      kthread+0xc7/0xcf
      ret_from_fork+0x7c/0xb0

Signed-off-by: Ben Greear &lt;greearb@candelatech.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Pekka Riikonen &lt;priikone@iki.fi&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Cc: Rui Xiang &lt;rui.xiang@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>softirq: reduce latencies</title>
<updated>2015-08-06T23:32:19+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-01-10T23:26:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=29a07c1eb1ddd861fbf501df25bbc154999d06cb'/>
<id>29a07c1eb1ddd861fbf501df25bbc154999d06cb</id>
<content type='text'>
commit c10d73671ad30f54692f7f69f0e09e75d3a8926a upstream.

In various network workloads, __do_softirq() latencies can be up
to 20 ms if HZ=1000, and 200 ms if HZ=100.

This is because we iterate 10 times in the softirq dispatcher,
and some actions can consume a lot of cycles.

This patch changes the fallback to ksoftirqd condition to :

- A time limit of 2 ms.
- need_resched() being set on current task

When one of this condition is met, we wakeup ksoftirqd for further
softirq processing if we still have pending softirqs.

Using need_resched() as the only condition can trigger RCU stalls,
as we can keep BH disabled for too long.

I ran several benchmarks and got no significant difference in
throughput, but a very significant reduction of latencies (one order
of magnitude) :

In following bench, 200 antagonist "netperf -t TCP_RR" are started in
background, using all available cpus.

Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC
IRQ (hard+soft)

Before patch :

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=550110.424
MIN_LATENCY=146858
MAX_LATENCY=997109
P50_LATENCY=305000
P90_LATENCY=550000
P99_LATENCY=710000
MEAN_LATENCY=376989.12
STDDEV_LATENCY=184046.92

After patch :

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=40545.492
MIN_LATENCY=9834
MAX_LATENCY=78366
P50_LATENCY=33583
P90_LATENCY=59000
P99_LATENCY=69000
MEAN_LATENCY=38364.67
STDDEV_LATENCY=12865.26

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Cc: Rui Xiang &lt;rui.xiang@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c10d73671ad30f54692f7f69f0e09e75d3a8926a upstream.

In various network workloads, __do_softirq() latencies can be up
to 20 ms if HZ=1000, and 200 ms if HZ=100.

This is because we iterate 10 times in the softirq dispatcher,
and some actions can consume a lot of cycles.

This patch changes the fallback to ksoftirqd condition to :

- A time limit of 2 ms.
- need_resched() being set on current task

When one of this condition is met, we wakeup ksoftirqd for further
softirq processing if we still have pending softirqs.

Using need_resched() as the only condition can trigger RCU stalls,
as we can keep BH disabled for too long.

I ran several benchmarks and got no significant difference in
throughput, but a very significant reduction of latencies (one order
of magnitude) :

In following bench, 200 antagonist "netperf -t TCP_RR" are started in
background, using all available cpus.

Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC
IRQ (hard+soft)

Before patch :

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=550110.424
MIN_LATENCY=146858
MAX_LATENCY=997109
P50_LATENCY=305000
P90_LATENCY=550000
P99_LATENCY=710000
MEAN_LATENCY=376989.12
STDDEV_LATENCY=184046.92

After patch :

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=40545.492
MIN_LATENCY=9834
MAX_LATENCY=78366
P50_LATENCY=33583
P90_LATENCY=59000
P99_LATENCY=69000
MEAN_LATENCY=38364.67
STDDEV_LATENCY=12865.26

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Cc: Rui Xiang &lt;rui.xiang@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kernel: Map most files to use export.h instead of module.h</title>
<updated>2011-10-31T13:20:12+00:00</updated>
<author>
<name>Paul Gortmaker</name>
<email>paul.gortmaker@windriver.com</email>
</author>
<published>2011-05-23T18:51:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9984de1a5a8a96275fcab818f7419af5a3c86e71'/>
<id>9984de1a5a8a96275fcab818f7419af5a3c86e71</id>
<content type='text'>
The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else.  Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

  -#include &lt;linux/module.h&gt;
  +#include &lt;linux/export.h&gt;

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else.  Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

  -#include &lt;linux/module.h&gt;
  +#include &lt;linux/export.h&gt;

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>softirq,rcu: Inform RCU of irq_exit() activity</title>
<updated>2011-07-20T17:50:12+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-07-19T22:32:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ec433f0c51527426989ea8a38a856d810d739414'/>
<id>ec433f0c51527426989ea8a38a856d810d739414</id>
<content type='text'>
The rcu_read_unlock_special() function relies on in_irq() to exclude
scheduler activity from interrupt level.  This fails because exit_irq()
can invoke the scheduler after clearing the preempt_count() bits that
in_irq() uses to determine that it is at interrupt level.  This situation
can result in failures as follows:

 $task			IRQ		SoftIRQ

 rcu_read_lock()

 /* do stuff */

 &lt;preempt&gt; |= UNLOCK_BLOCKED

 rcu_read_unlock()
   --t-&gt;rcu_read_lock_nesting

			irq_enter();
			/* do stuff, don't use RCU */
			irq_exit();
			  sub_preempt_count(IRQ_EXIT_OFFSET);
			  invoke_softirq()

					ttwu();
					  spin_lock_irq(&amp;pi-&gt;lock)
					  rcu_read_lock();
					  /* do stuff */
					  rcu_read_unlock();
					    rcu_read_unlock_special()
					      rcu_report_exp_rnp()
					        ttwu()
					          spin_lock_irq(&amp;pi-&gt;lock) /* deadlock */

   rcu_read_unlock_special(t);

Ed can simply trigger this 'easy' because invoke_softirq() immediately
does a ttwu() of ksoftirqd/# instead of doing the in-place softirq stuff
first, but even without that the above happens.

Cure this by also excluding softirqs from the
rcu_read_unlock_special() handler and ensuring the force_irqthreads
ksoftirqd/# wakeup is done from full softirq context.

[ Alternatively, delaying the -&gt;rcu_read_lock_nesting decrement
  until after the special handling would make the thing more robust
  in the face of interrupts as well.  And there is a separate patch
  for that. ]

Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reported-and-tested-by: Ed Tomlinson &lt;edt@aei.ca&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The rcu_read_unlock_special() function relies on in_irq() to exclude
scheduler activity from interrupt level.  This fails because exit_irq()
can invoke the scheduler after clearing the preempt_count() bits that
in_irq() uses to determine that it is at interrupt level.  This situation
can result in failures as follows:

 $task			IRQ		SoftIRQ

 rcu_read_lock()

 /* do stuff */

 &lt;preempt&gt; |= UNLOCK_BLOCKED

 rcu_read_unlock()
   --t-&gt;rcu_read_lock_nesting

			irq_enter();
			/* do stuff, don't use RCU */
			irq_exit();
			  sub_preempt_count(IRQ_EXIT_OFFSET);
			  invoke_softirq()

					ttwu();
					  spin_lock_irq(&amp;pi-&gt;lock)
					  rcu_read_lock();
					  /* do stuff */
					  rcu_read_unlock();
					    rcu_read_unlock_special()
					      rcu_report_exp_rnp()
					        ttwu()
					          spin_lock_irq(&amp;pi-&gt;lock) /* deadlock */

   rcu_read_unlock_special(t);

Ed can simply trigger this 'easy' because invoke_softirq() immediately
does a ttwu() of ksoftirqd/# instead of doing the in-place softirq stuff
first, but even without that the above happens.

Cure this by also excluding softirqs from the
rcu_read_unlock_special() handler and ensuring the force_irqthreads
ksoftirqd/# wakeup is done from full softirq context.

[ Alternatively, delaying the -&gt;rcu_read_lock_nesting decrement
  until after the special handling would make the thing more robust
  in the face of interrupts as well.  And there is a separate patch
  for that. ]

Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reported-and-tested-by: Ed Tomlinson &lt;edt@aei.ca&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Use softirq to address performance regression</title>
<updated>2011-06-14T22:25:39+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shaohua.li@intel.com</email>
</author>
<published>2011-06-14T05:26:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=09223371deac67d08ca0b70bd18787920284c967'/>
<id>09223371deac67d08ca0b70bd18787920284c967</id>
<content type='text'>
Commit a26ac2455ffcf3(rcu: move TREE_RCU from softirq to kthread)
introduced performance regression. In an AIM7 test, this commit degraded
performance by about 40%.

The commit runs rcu callbacks in a kthread instead of softirq. We observed
high rate of context switch which is caused by this. Out test system has
64 CPUs and HZ is 1000, so we saw more than 64k context switch per second
which is caused by RCU's per-CPU kthread.  A trace showed that most of
the time the RCU per-CPU kthread doesn't actually handle any callbacks,
but instead just does a very small amount of work handling grace periods.
This means that RCU's per-CPU kthreads are making the scheduler do quite
a bit of work in order to allow a very small amount of RCU-related
processing to be done.

Alex Shi's analysis determined that this slowdown is due to lock
contention within the scheduler.  Unfortunately, as Peter Zijlstra points
out, the scheduler's real-time semantics require global action, which
means that this contention is inherent in real-time scheduling.  (Yes,
perhaps someone will come up with a workaround -- otherwise, -rt is not
going to do well on large SMP systems -- but this patch will work around
this issue in the meantime.  And "the meantime" might well be forever.)

This patch therefore re-introduces softirq processing to RCU, but only
for core RCU work.  RCU callbacks are still executed in kthread context,
so that only a small amount of RCU work runs in softirq context in the
common case.  This should minimize ksoftirqd execution, allowing us to
skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.

Signed-off-by: Shaohua Li &lt;shaohua.li@intel.com&gt;
Tested-by: "Alex,Shi" &lt;alex.shi@intel.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit a26ac2455ffcf3(rcu: move TREE_RCU from softirq to kthread)
introduced performance regression. In an AIM7 test, this commit degraded
performance by about 40%.

The commit runs rcu callbacks in a kthread instead of softirq. We observed
high rate of context switch which is caused by this. Out test system has
64 CPUs and HZ is 1000, so we saw more than 64k context switch per second
which is caused by RCU's per-CPU kthread.  A trace showed that most of
the time the RCU per-CPU kthread doesn't actually handle any callbacks,
but instead just does a very small amount of work handling grace periods.
This means that RCU's per-CPU kthreads are making the scheduler do quite
a bit of work in order to allow a very small amount of RCU-related
processing to be done.

Alex Shi's analysis determined that this slowdown is due to lock
contention within the scheduler.  Unfortunately, as Peter Zijlstra points
out, the scheduler's real-time semantics require global action, which
means that this contention is inherent in real-time scheduling.  (Yes,
perhaps someone will come up with a workaround -- otherwise, -rt is not
going to do well on large SMP systems -- but this patch will work around
this issue in the meantime.  And "the meantime" might well be forever.)

This patch therefore re-introduces softirq processing to RCU, but only
for core RCU work.  RCU callbacks are still executed in kthread context,
so that only a small amount of RCU work runs in softirq context in the
common case.  This should minimize ksoftirqd execution, allowing us to
skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.

Signed-off-by: Shaohua Li &lt;shaohua.li@intel.com&gt;
Tested-by: "Alex,Shi" &lt;alex.shi@intel.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: move TREE_RCU from softirq to kthread</title>
<updated>2011-05-06T06:16:54+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paul.mckenney@linaro.org</email>
</author>
<published>2011-01-12T22:10:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a26ac2455ffcf3be5c6ef92bc6df7182700f2114'/>
<id>a26ac2455ffcf3be5c6ef92bc6df7182700f2114</id>
<content type='text'>
If RCU priority boosting is to be meaningful, callback invocation must
be boosted in addition to preempted RCU readers.  Otherwise, in presence
of CPU real-time threads, the grace period ends, but the callbacks don't
get invoked.  If the callbacks don't get invoked, the associated memory
doesn't get freed, so the system is still subject to OOM.

But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
moves the callback invocations to a kthread, which can be boosted easily.

Also add comments and properly synchronized all accesses to
rcu_cpu_kthread_task, as suggested by Lai Jiangshan.

Signed-off-by: Paul E. McKenney &lt;paul.mckenney@linaro.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If RCU priority boosting is to be meaningful, callback invocation must
be boosted in addition to preempted RCU readers.  Otherwise, in presence
of CPU real-time threads, the grace period ends, but the callbacks don't
get invoked.  If the callbacks don't get invoked, the associated memory
doesn't get freed, so the system is still subject to OOM.

But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
moves the callback invocations to a kthread, which can be boosted easily.

Also add comments and properly synchronized all accesses to
rcu_cpu_kthread_task, as suggested by Lai Jiangshan.

Signed-off-by: Paul E. McKenney &lt;paul.mckenney@linaro.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix common misspellings</title>
<updated>2011-03-31T14:26:23+00:00</updated>
<author>
<name>Lucas De Marchi</name>
<email>lucas.demarchi@profusion.mobi</email>
</author>
<published>2011-03-31T01:57:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=25985edcedea6396277003854657b5f3cb31a628'/>
<id>25985edcedea6396277003854657b5f3cb31a628</id>
<content type='text'>
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@profusion.mobi&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@profusion.mobi&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kthread: use kthread_create_on_node()</title>
<updated>2011-03-23T00:44:01+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-03-22T23:30:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=94dcf29a11b3d20a28790598d701f98484a969da'/>
<id>94dcf29a11b3d20a28790598d701f98484a969da</id>
<content type='text'>
ksoftirqd, kworker, migration, and pktgend kthreads can be created with
kthread_create_on_node(), to get proper NUMA affinities for their stack and
task_struct.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Reviewed-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Acked-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ksoftirqd, kworker, migration, and pktgend kthreads can be created with
kthread_create_on_node(), to get proper NUMA affinities for their stack and
task_struct.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Reviewed-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Acked-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip</title>
<updated>2011-03-16T02:23:40+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-03-16T02:23:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5f6fb45466b2273ffb91c9cf209f164f666c33b1'/>
<id>5f6fb45466b2273ffb91c9cf209f164f666c33b1</id>
<content type='text'>
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (116 commits)
  x86: Enable forced interrupt threading support
  x86: Mark low level interrupts IRQF_NO_THREAD
  x86: Use generic show_interrupts
  x86: ioapic: Avoid redundant lookup of irq_cfg
  x86: ioapic: Use new move_irq functions
  x86: Use the proper accessors in fixup_irqs()
  x86: ioapic: Use irq_data-&gt;state
  x86: ioapic: Simplify irq chip and handler setup
  x86: Cleanup the genirq name space
  genirq: Add chip flag to force mask on suspend
  genirq: Add desc-&gt;irq_data accessor
  genirq: Add comments to Kconfig switches
  genirq: Fixup fasteoi handler for oneshot mode
  genirq: Provide forced interrupt threading
  sched: Switch wait_task_inactive to schedule_hrtimeout()
  genirq: Add IRQF_NO_THREAD
  genirq: Allow shared oneshot interrupts
  genirq: Prepare the handling of shared oneshot interrupts
  genirq: Make warning in handle_percpu_event useful
  x86: ioapic: Move trigger defines to io_apic.h
  ...

Fix up trivial(?) conflicts in arch/x86/pci/xen.c due to genirq name
space changes clashing with the Xen cleanups.  The set_irq_msi() had
moved to xen_bind_pirq_msi_to_irq().
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (116 commits)
  x86: Enable forced interrupt threading support
  x86: Mark low level interrupts IRQF_NO_THREAD
  x86: Use generic show_interrupts
  x86: ioapic: Avoid redundant lookup of irq_cfg
  x86: ioapic: Use new move_irq functions
  x86: Use the proper accessors in fixup_irqs()
  x86: ioapic: Use irq_data-&gt;state
  x86: ioapic: Simplify irq chip and handler setup
  x86: Cleanup the genirq name space
  genirq: Add chip flag to force mask on suspend
  genirq: Add desc-&gt;irq_data accessor
  genirq: Add comments to Kconfig switches
  genirq: Fixup fasteoi handler for oneshot mode
  genirq: Provide forced interrupt threading
  sched: Switch wait_task_inactive to schedule_hrtimeout()
  genirq: Add IRQF_NO_THREAD
  genirq: Allow shared oneshot interrupts
  genirq: Prepare the handling of shared oneshot interrupts
  genirq: Make warning in handle_percpu_event useful
  x86: ioapic: Move trigger defines to io_apic.h
  ...

Fix up trivial(?) conflicts in arch/x86/pci/xen.c due to genirq name
space changes clashing with the Xen cleanups.  The set_irq_msi() had
moved to xen_bind_pirq_msi_to_irq().
</pre>
</div>
</content>
</entry>
<entry>
<title>genirq: Provide forced interrupt threading</title>
<updated>2011-02-26T10:57:18+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2011-02-23T23:52:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8d32a307e4faa8b123dc8a9cd56d1a7525f69ad3'/>
<id>8d32a307e4faa8b123dc8a9cd56d1a7525f69ad3</id>
<content type='text'>
Add a commandline parameter "threadirqs" which forces all interrupts except
those marked IRQF_NO_THREAD to run threaded. That's mostly a debug option to
allow retrieving better debug data from crashing interrupt handlers. If
"threadirqs" is not enabled on the kernel command line, then there is no
impact in the interrupt hotpath.

Architecture code needs to select CONFIG_IRQ_FORCED_THREADING after
marking the interrupts which cant be threaded IRQF_NO_THREAD. All
interrupts which have IRQF_TIMER set are implict marked
IRQF_NO_THREAD. Also all PER_CPU interrupts are excluded.

Forced threading hard interrupts also forces all soft interrupt
handling into thread context.

When enabled it might slow down things a bit, but for debugging problems in
interrupt code it's a reasonable penalty as it does not immediately
crash and burn the machine when an interrupt handler is buggy.

Some test results on a Core2Duo machine:

Cache cold run of:
 # time git grep irq_desc

      non-threaded       threaded
 real 1m18.741s          1m19.061s
 user 0m1.874s           0m1.757s
 sys  0m5.843s           0m5.427s

 # iperf -c server
non-threaded
[  3]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   934 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
threaded
[  3]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   934 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   937 Mbits/sec

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
LKML-Reference: &lt;20110223234956.772668648@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a commandline parameter "threadirqs" which forces all interrupts except
those marked IRQF_NO_THREAD to run threaded. That's mostly a debug option to
allow retrieving better debug data from crashing interrupt handlers. If
"threadirqs" is not enabled on the kernel command line, then there is no
impact in the interrupt hotpath.

Architecture code needs to select CONFIG_IRQ_FORCED_THREADING after
marking the interrupts which cant be threaded IRQF_NO_THREAD. All
interrupts which have IRQF_TIMER set are implict marked
IRQF_NO_THREAD. Also all PER_CPU interrupts are excluded.

Forced threading hard interrupts also forces all soft interrupt
handling into thread context.

When enabled it might slow down things a bit, but for debugging problems in
interrupt code it's a reasonable penalty as it does not immediately
crash and burn the machine when an interrupt handler is buggy.

Some test results on a Core2Duo machine:

Cache cold run of:
 # time git grep irq_desc

      non-threaded       threaded
 real 1m18.741s          1m19.061s
 user 0m1.874s           0m1.757s
 sys  0m5.843s           0m5.427s

 # iperf -c server
non-threaded
[  3]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   934 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
threaded
[  3]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   934 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   937 Mbits/sec

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
LKML-Reference: &lt;20110223234956.772668648@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
