summaryrefslogtreecommitdiff
path: root/kernel/rcu
AgeCommit message (Collapse)Author
2014-08-28rcu: Make nocb leader kthreads process pending callbacks after spawningPranith Kumar
The nocb callbacks generated before the nocb kthreads are spawned are enqueued in the nocb queue for later processing. Commit fbce7497ee5af ("rcu: Parallelize and economize NOCB kthread wakeups") introduced nocb leader kthreads which checked the nocb_leader_wake flag to see if there were any such pending callbacks. A case was reported in which newly spawned leader kthreads were not processing the pending callbacks as this flag was not set, which led to a boot hang. The following commit ensures that the newly spawned nocb kthreads process the pending callbacks by allowing the kthreads to run immediately after spawning instead of waiting. This is done by inverting the logic of nocb_leader_wake tests to nocb_leader_sleep which allows us to use the default initialization of this flag to 0 to let the kthreads run. Reported-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Link: http://www.spinics.net/lists/kernel/msg1802899.html [ paulmck: Backported to v3.17-rc2. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Amit Shah <amit.shah@redhat.com>
2014-08-04Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU changes from Ingo Molar: "The main changes: - torture-test updates - callback-offloading changes - maintainership changes - update RCU documentation - miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missing rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp() rcu: Fix a sparse warning in rcu_initiate_boost() rcu: Fix __rcu_reclaim() to use true/false for bool rcu: Remove CONFIG_PROVE_RCU_DELAY rcu: Use __this_cpu_read() instead of per_cpu_ptr() rcu: Don't use NMIs to dump other CPUs' stacks rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs rcu: Simplify priority boosting by putting rt_mutex in rcu_node rcu: Check both root and current rcu_node when setting up future grace period rcu: Allow post-unlock reference for rt_mutex rcu: Loosen __call_rcu()'s rcu_head alignment constraint rcu: Eliminate read-modify-write ACCESS_ONCE() calls rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu rcu: Make rcu node arrays static const char * const signal: Explain local_irq_save() call rcu: Handle obsolete references to TINY_PREEMPT_RCU rcu: Document deadlock-avoidance information for rcu_read_unlock() scripts: Teach get_maintainer.pl about the new "R:" tag rcu: Update rcu torture maintainership filename patterns ...
2014-07-30Josh has movedJosh Triplett
My IBM email addresses haven't worked for years; also map some old-but-functional forwarding addresses to my canonical address. Update my GPG key fingerprint; I moved to 4096R a long time ago. Update description. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-16rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missingPaul E. McKenney
If there isn't a nohz_full= kernel parameter specified, then tick_nohz_full_mask can legitimately be NULL. This can cause problems when RCU's boot code tries to cpumask_or() this value into rcu_nocb_mask. In addition, if NO_HZ_FULL_ALL=y, there is no point in doing the cpumask_or() in the first place because this will cause RCU_NOCB_CPU_ALL=y, which in turn will have all bits already set in rcu_nocb_mask. This commit therefore avoids the cpumask_or() if NO_HZ_FULL_ALL=y and checks for !tick_nohz_full_running otherwise, this latter check catching cases when there was no nohz_full= kernel parameter specified. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-07-09Merge branches 'doc.2014.07.08a', 'fixes.2014.07.09a', ↵Paul E. McKenney
'maintainers.2014.07.08b', 'nocbs.2014.07.07a' and 'torture.2014.07.07a' into HEAD doc.2014.07.08a: Documentation updates. fixes.2014.07.09a: Miscellaneous fixes. maintainers.2014.07.08b: Maintainership updates. nocbs.2014.07.07a: Callback-offloading fixes. torture.2014.07.07a: Torture-test updates.
2014-07-09rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp()Pranith Kumar
This commit annotates rcu_report_unblock_qs_rnp() in order to fix the following sparse warning: kernel/rcu/tree_plugin.h:990:13: warning: context imbalance in 'rcu_report_unblock_qs_rnp' - unexpected unlock Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-07-09rcu: Fix a sparse warning in rcu_initiate_boost()Pranith Kumar
This commit annotates rcu_initiate_boost() fixes the following sparse warning: kernel/rcu/tree_plugin.h:1494:13: warning: context imbalance in 'rcu_initiate_boost' - unexpected unlock Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-07-09rcu: Fix __rcu_reclaim() to use true/false for boolPaul E. McKenney
The __rcu_reclaim() function returned 0/1, which is not proper for a function of type bool. This commit therefore converts to false/true. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-07-09rcu: Remove CONFIG_PROVE_RCU_DELAYPaul E. McKenney
The CONFIG_PROVE_RCU_DELAY Kconfig parameter doesn't appear to be very effective at finding race conditions, so this commit removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> [ paulmck: Remove definition and uses as noted by Paul Bolle. ]
2014-07-09rcu: Use __this_cpu_read() instead of per_cpu_ptr()Shan Wei
The __this_cpu_read() function produces better code than does per_cpu_ptr() on both ARM and x86. For example, gcc (Ubuntu/Linaro 4.7.3-12ubuntu1) 4.7.3 produces the following: ARMv7 per_cpu_ptr(): force_quiescent_state: mov r3, sp @, bic r1, r3, #8128 @ tmp171,, ldr r2, .L98 @ tmp169, bic r1, r1, #63 @ tmp170, tmp171, ldr r3, [r0, #220] @ __ptr, rsp_6(D)->rda ldr r1, [r1, #20] @ D.35903_68->cpu, D.35903_68->cpu mov r6, r0 @ rsp, rsp ldr r2, [r2, r1, asl #2] @ tmp173, __per_cpu_offset add r3, r3, r2 @ tmp175, __ptr, tmp173 ldr r5, [r3, #12] @ rnp_old, D.29162_13->mynode ARMv7 __this_cpu_read(): force_quiescent_state: ldr r3, [r0, #220] @ rsp_7(D)->rda, rsp_7(D)->rda mov r6, r0 @ rsp, rsp add r3, r3, #12 @ __ptr, rsp_7(D)->rda, ldr r5, [r2, r3] @ rnp_old, *D.29176_13 Using gcc 4.8.2: x86_64 per_cpu_ptr(): movl %gs:cpu_number,%edx # cpu_number, pscr_ret__ movslq %edx, %rdx # pscr_ret__, pscr_ret__ movq __per_cpu_offset(,%rdx,8), %rdx # __per_cpu_offset, tmp93 movq %rdi, %r13 # rsp, rsp movq 1000(%rdi), %rax # rsp_9(D)->rda, __ptr movq 24(%rdx,%rax), %r12 # _15->mynode, rnp_old x86_64 __this_cpu_read(): movq %rdi, %r13 # rsp, rsp movq 1000(%rdi), %rax # rsp_9(D)->rda, rsp_9(D)->rda movq %gs:24(%rax),%r12 # _10->mynode, rnp_old Because this change produces significant benefits for these two very diverse architectures, this commit makes this change. Signed-off-by: Shan Wei <davidshan@tencent.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-07-09rcu: Don't use NMIs to dump other CPUs' stacksPaul E. McKenney
Although NMI-based stack dumps are in principle more accurate, they are also more likely to trigger deadlocks. This commit therefore replaces all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so that the CPU detecting an RCU CPU stall does the stack dumping. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-07-09rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUsPaul E. McKenney
Binding the grace-period kthreads to the timekeeping CPU resulted in significant performance decreases for some workloads. For more detail, see: https://lkml.org/lkml/2014/6/3/395 for benchmark numbers https://lkml.org/lkml/2014/6/4/218 for CPU statistics It turns out that it is necessary to bind the grace-period kthreads to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other. In other cases, it suffices to bind the grace-period kthreads to the set of non-nohz_full CPUs. This commit therefore creates a tick_nohz_not_full_mask that is the complement of tick_nohz_full_mask, and then binds the grace-period kthread to the set of CPUs indicated by this new mask, which covers the CONFIG_NO_HZ_FULL_SYSIDLE=n case. The CONFIG_NO_HZ_FULL_SYSIDLE=y case still binds the grace-period kthreads to the timekeeping CPU. This commit also includes the tick_nohz_full_enabled() check suggested by Frederic Weisbecker. Reported-by: Jet Chen <jet.chen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Created housekeeping_affine() and housekeeping_mask per fweisbec feedback. ]
2014-07-09rcu: Simplify priority boosting by putting rt_mutex in rcu_nodePaul E. McKenney
RCU priority boosting currently checks for boosting via a pointer in task_struct. However, this is not needed: As Oleg noted, if the rt_mutex is placed in the rcu_node instead of on the booster's stack, the boostee can simply check it see if it owns the lock. This commit makes this change, shrinking task_struct by one pointer and the kernel by thirteen lines. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-07-09rcu: Check both root and current rcu_node when setting up future grace periodPranith Kumar
The rcu_start_future_gp() function checks the current rcu_node's ->gpnum and ->completed twice, once without ACCESS_ONCE() and once with it. Which is pointless because we hold that rcu_node's ->lock at that point. The intent was to check the current rcu_node structure and the root rcu_node structure, the latter locklessly with ACCESS_ONCE(). This commit therefore makes that change. The reason that it is safe to locklessly check the root rcu_nodes's ->gpnum and ->completed fields is that we hold the current rcu_node's ->lock, which constrains the root rcu_node's ability to change its ->gpnum and ->completed fields. Of course, if there is a single rcu_node structure, then rnp_root==rnp, and holding the lock prevents all changes. If there is more than one rcu_node structure, then the code updates the fields in the following order: 1. Increment rnp_root->gpnum to start new grace period. 2. Increment rnp->gpnum to initialize the current rcu_node, continuing initialization for the new grace period. 3. Increment rnp_root->completed to end the current grace period. 4. Increment rnp->completed to continue cleaning up after the old grace period. So there are four possible combinations of relative values of these four fields: N N N N: RCU idle, new grace period must be initiated. Although rnp_root->gpnum might be incremented immediately after we check, that will just result in unnecessary work. The grace period already started, and we try to start it. N+1 N N N: RCU grace period just started. No further change is possible because we hold rnp->lock, so the checks of rnp_root->gpnum and rnp_root->completed are stable. We know that our request for a future grace period will be seen during grace-period cleanup. N+1 N N+1 N: RCU grace period is ongoing. Because rnp->gpnum is different than rnp->completed, we won't even look at rnp_root->gpnum and rnp_root->completed, so the possible concurrent change to rnp_root->completed does not matter. We know that our request for a future grace period will be seen during grace-period cleanup, which cannot pass this rcu_node because we hold its ->lock. N+1 N+1 N+1 N: RCU grace period has ended, but not yet been cleaned up. Because rnp->gpnum is different than rnp->completed, we won't look at rnp_root->gpnum and rnp_root->completed, so the possible concurrent change to rnp_root->completed does not matter. We know that our request for a future grace period will be seen during grace-period cleanup, which cannot pass this rcu_node because we hold its ->lock. Therefore, despite initial appearances, the lockless check is safe. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> [ paulmck: Update comment to say why the lockless check is safe. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-07-09rcu: Allow post-unlock reference for rt_mutexPaul E. McKenney
The current approach to RCU priority boosting uses an rt_mutex strictly for its priority-boosting side effects. The rt_mutex_init_proxy_locked() function is used by the booster to initialize the lock as held by the boostee. The booster then uses rt_mutex_lock() to acquire this rt_mutex, which priority-boosts the boostee. When the boostee reaches the end of its outermost RCU read-side critical section, it checks a field in its task structure to see whether it has been boosted, and, if so, uses rt_mutex_unlock() to release the rt_mutex. The booster can then go on to boost the next task that is blocking the current RCU grace period. But reasonable implementations of rt_mutex_unlock() might result in the boostee referencing the rt_mutex's data after releasing it. But the booster might have re-initialized the rt_mutex between the time that the boostee released it and the time that it later referenced it. This is clearly asking for trouble, so this commit introduces a completion that forces the booster to wait until the boostee has completely finished with the rt_mutex, thus avoiding the case where the booster is re-initializing the rt_mutex before the last boostee's last reference to that rt_mutex. This of course does introduce some overhead, but the priority-boosting code paths are miles from any possible fastpath, and the overhead of executing the completion will normally be quite small compared to the overhead of priority boosting and deboosting, so this should be OK. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-07-09rcu: Loosen __call_rcu()'s rcu_head alignment constraintPaul E. McKenney
The m68k architecture aligns only to 16-bit boundaries, which can cause the align-to-32-bits check in __call_rcu() to trigger. Because there is currently no known potential need for more than one low-order bit, this commit loosens the check to 16-bit boundaries. Reported-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-07-09rcu: Eliminate read-modify-write ACCESS_ONCE() callsPaul E. McKenney
RCU contains code of the following forms: ACCESS_ONCE(x)++; ACCESS_ONCE(x) += y; ACCESS_ONCE(x) -= y; Now these constructs do operate correctly, but they really result in a pair of volatile accesses, one to do the load and another to do the store. This can be confusing, as the casual reader might well assume that (for example) gcc might generate a memory-to-memory add instruction for each of these three cases. In fact, gcc will do no such thing. Also, there is a good chance that the kernel will move to separate load and store variants of ACCESS_ONCE(), and constructs like the above could easily confuse both people and scripts attempting to make that sort of change. Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really only need the store to be volatile, so that the read-modify-write form might be misleading. This commit therefore changes the above forms in RCU so that each instance of ACCESS_ONCE() either does a load or a store, but not both. In a few cases, ACCESS_ONCE() was not critical, for example, for maintaining statisitics. In these cases, ACCESS_ONCE() has been dispensed with entirely. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-07-09rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpuPaul E. McKenney
In kernels built with CONFIG_NO_HZ_FULL, tick_do_timer_cpu is constant once boot completes. Thus, there is no need to wrap it in ACCESS_ONCE() in code that is built only when CONFIG_NO_HZ_FULL. This commit therefore removes the redundant ACCESS_ONCE(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-07-09rcu: Make rcu node arrays static const char * constFabian Frederick
Those two arrays are being passed to lockdep_init_map(), which expects const char *, and are stored in lockdep_map the same way. Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-07-07rcu: Don't offload callbacks unless specifically requestedPaul E. McKenney
Enabling NO_HZ_FULL currently has the side effect of enabling callback offloading on all CPUs. This results in lots of additional rcuo kthreads, and can also increase context switching and wakeups, even in cases where callback offloading is neither needed nor particularly desirable. This commit therefore enables callback offloading on a given CPU only if specifically requested at build time or boot time, or if that CPU has been specifically designated (again, either at build time or boot time) as a nohz_full CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-07-07rcu: Parallelize and economize NOCB kthread wakeupsPaul E. McKenney
An 80-CPU system with a context-switch-heavy workload can require so many NOCB kthread wakeups that the RCU grace-period kthreads spend several tens of percent of a CPU just awakening things. This clearly will not scale well: If you add enough CPUs, the RCU grace-period kthreads would get behind, increasing grace-period latency. To avoid this problem, this commit divides the NOCB kthreads into leaders and followers, where the grace-period kthreads awaken the leaders each of whom in turn awakens its followers. By default, the number of groups of kthreads is the square root of the number of CPUs, but this default may be overridden using the rcutree.rcu_nocb_leader_stride boot parameter. This reduces the number of wakeups done per grace period by the RCU grace-period kthread by the square root of the number of CPUs, but of course by shifting those wakeups to the leaders. In addition, because the leaders do grace periods on behalf of their respective followers, the number of wakeups of the followers decreases by up to a factor of two. Instead of being awakened once when new callbacks arrive and again at the end of the grace period, the followers are awakened only at the end of the grace period. For a numerical example, in a 4096-CPU system, the grace-period kthread would awaken 64 leaders, each of which would awaken its 63 followers at the end of the grace period. This compares favorably with the 79 wakeups for the grace-period kthread on an 80-CPU system. Reported-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-06-23rcu: Reduce overhead of cond_resched() checks for RCUPaul E. McKenney
Commit ac1bea85781e (Make cond_resched() report RCU quiescent states) fixed a problem where a CPU looping in the kernel with but one runnable task would give RCU CPU stall warnings, even if the in-kernel loop contained cond_resched() calls. Unfortunately, in so doing, it introduced performance regressions in Anton Blanchard's will-it-scale "open1" test. The problem appears to be not so much the increased cond_resched() path length as an increase in the rate at which grace periods complete, which increased per-update grace-period overhead. This commit takes a different approach to fixing this bug, mainly by moving the RCU-visible quiescent state from cond_resched() to rcu_note_context_switch(), and by further reducing the check to a simple non-zero test of a single per-CPU variable. However, this approach requires that the force-quiescent-state processing send resched IPIs to the offending CPUs. These will be sent only once the grace period has reached an age specified by the boot/sysfs parameter rcutree.jiffies_till_sched_qs, or once the grace period reaches an age halfway to the point at which RCU CPU stall warnings will be emitted, whichever comes first. Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@gentwo.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the ktest build robot. Also fixed smp_mb() comment as noted by Oleg Nesterov. ] Merge with e552592e (Reduce overhead of cond_resched() checks for RCU) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-06-23rcu: Export debug_init_rcu_head() and and debug_init_rcu_head()Paul E. McKenney
Currently, call_rcu() relies on implicit allocation and initialization for the debug-objects handling of RCU callbacks. If you hammer the kernel hard enough with Sasha's modified version of trinity, you can end up with the sl*b allocators recursing into themselves via this implicit call_rcu() allocation. This commit therefore exports the debug_init_rcu_head() and debug_rcu_head_free() functions, which permits the allocators to allocated and pre-initialize the debug-objects information, so that there no longer any need for call_rcu() to do that initialization, which in turn prevents the recursion into the memory allocators. Reported-by: Sasha Levin <sasha.levin@oracle.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Looks-good-to: Christoph Lameter <cl@linux.com>
2014-06-03Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull core locking updates from Ingo Molnar: "The main changes in this cycle were: - reduced/streamlined smp_mb__*() interface that allows more usecases and makes the existing ones less buggy, especially in rarer architectures - add rwsem implementation comments - bump up lockdep limits" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) rwsem: Add comments to explain the meaning of the rwsem's count field lockdep: Increase static allocations arch: Mass conversion of smp_mb__*() arch,doc: Convert smp_mb__*() arch,xtensa: Convert smp_mb__*() arch,x86: Convert smp_mb__*() arch,tile: Convert smp_mb__*() arch,sparc: Convert smp_mb__*() arch,sh: Convert smp_mb__*() arch,score: Convert smp_mb__*() arch,s390: Convert smp_mb__*() arch,powerpc: Convert smp_mb__*() arch,parisc: Convert smp_mb__*() arch,openrisc: Convert smp_mb__*() arch,mn10300: Convert smp_mb__*() arch,mips: Convert smp_mb__*() arch,metag: Convert smp_mb__*() arch,m68k: Convert smp_mb__*() arch,m32r: Convert smp_mb__*() arch,ia64: Convert smp_mb__*() ...
2014-05-19rcu: Provide API to suppress stall warnings while sysrc runsRik van Riel
Some sysrq handlers can run for a long time, because they dump a lot of data onto a serial console. Having RCU stall warnings pop up in the middle of them only makes the problem worse. This commit provides rcu_sysrq_start() and rcu_sysrq_end() APIs to temporarily suppress RCU CPU stall warnings while a sysrq request is handled. Signed-off-by: Rik van Riel <riel@redhat.com> [ paulmck: Fix TINY_RCU build error. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-14rcu: Variable name changed in tree_plugin.h and used in tree.cUma Sharma
The variable and struct both having the name "rcu_state" confuses sparse in some situations, so this commit changes the variable to "rcu_state_p" in order to avoid this confusion. This also makes things easier for human readers. Signed-off-by: Uma Sharma <uma.sharma523@gmail.com> [ paulmck: Changed the declaration and several additional uses. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-14Merge branches 'doc.2014.04.29a', 'fixes.2014.04.29a' and ↵Paul E. McKenney
'torture.2014.05.14a' into HEAD doc.2014.04.29a: Documentation updates. fixes.2014.04.29a: Miscellaneous fixes. torture.2014.05.14a: RCU/Lock torture tests.
2014-05-14torture: Check for multiple concurrent torture testsPaul E. McKenney
The torture tests are designed to run in isolation, but do not enforce this isolation. This commit therefore checks for concurrent torture tests, and refuses to start new tests while old tests are running. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14rcutorture: Run rcu_torture_writer at normal priorityPaul E. McKenney
There are usually lots of readers and only one writer, so if there has to be a choice, we would want rcu_torture_writer to win. This commit therefore removes the set_user_nice() from rcu_torture_writer(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14rcutorture: Add missing destroy_timer_on_stack()Thomas Gleixner
The rcu_torture_reader() function uses an on-stack timer_list structure which it initializes with setup_timer_on_stack(). However, it fails to use destroy_timer_on_stack() before exiting, which results in leaking a tracking object if DEBUG_OBJECTS is enabled. This commit therefore invokes destroy_timer_on_stack() to avoid this leakage. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-14rcutorture: Explicitly test synchronous grace-period primitivesPaul E. McKenney
The original rcu_torture_writer() avoided testing the synchronous grace-period primitives because they were simply wrappers around call_rcu() invocations. The testing of these synchronous primitives was delegated to the fake writers. However, there really is no excuse not to test them, especially in the case of SRCU, where the wrappering is somewhat more elaborate. This commit therefore makes the default rcutorture parameters cause rcu_torture_writer() to include synchronous grace-period primitives in its testing. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14rcutorture: Add tests for get_state_synchronize_rcu()Paul E. McKenney
This commit adds rcutorture testing for get_state_synchronize_rcu() and cond_synchronize_rcu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14rcutorture: Check for rcu_torture_fqs creation errorsPaul E. McKenney
The return value from torture_create_kthread() is currently ignored when creating the rcu_torture_fqs kthread. This commit therefore captures the return value so that it can be tested for errors. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14rcutorture: Make rcu_torture_reader() use cond_resched()Paul E. McKenney
The rcu_torture_reader() function currently uses schedule(). This commit therefore speeds things up a bit by substituting cond_resched(). This change makes rcu_torture_reader() more CPU-bound, so this commit also adjusts the number of readers (the "nreaders" module parameter, which feeds into the "nrealreaders" variable) to allow one CPU to be free of readers on SMP systems. The point of this is to increase the probability that readers will be watching while an updater makes a change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14sched,rcu: Make cond_resched() report RCU quiescent statesPaul E. McKenney
Given a CPU running a loop containing cond_resched(), with no other tasks runnable on that CPU, RCU will eventually report RCU CPU stall warnings due to lack of quiescent states. Fortunately, every call to cond_resched() is a perfectly good quiescent state. Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight for cond_resched(), especially given the need to disable preemption, and, for RCU-preempt, interrupts as well. This commit therefore maintains a per-CPU counter that causes cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call rcu_note_context_switch(), but only about once per 256 invocations. This ratio was chosen in keeping with the relative time constants of RCU grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14rcutorture: Export RCU grace-period kthread wait state to rcutorturePaul E. McKenney
This commit allows rcutorture to print additional state for the RCU grace-period kthreads in cases where RCU seems reluctant to start a new grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14torture: Dump ftrace buffer when the RCU grace period stallsPaul E. McKenney
This commit adds a call to rcutorture_trace_dump() to dump the ftrace buffer when the RCU grace period stalls in order to help debug the stall. Note that this is different than the RCU CPU stall warning, as it is rcutorture detecting the stall rather than the underlying RCU implementation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14rcutorture: Print negatives for SRCU counter wraparoundPaul E. McKenney
The srcu_torture_stats() function prints SRCU's per-CPU c[] array with an unsigned format, which means that the number one less than zero is a very large number. This commit therefore prints this array with a signed format in order to improve readability of the rcutorture output. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14rcutorture: Mark function as static in kernel/rcu/torture.cRashika Kheria
Mark functions as static in kernel/rcu/torture.c because they are not used outside this file. This eliminates the following warning in kernel/rcu/torture.c: kernel/rcu/torture.c:902:6: warning: no previous prototype for ‘rcutorture_trace_dump’ [-Wmissing-prototypes] kernel/rcu/torture.c:1572:6: warning: no previous prototype for ‘rcu_torture_barrier_cbf’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-13rcutorture: Add forward-progress checking for writerPaul E. McKenney
The rcutorture output currently does not distinguish between stalls in the RCU implementation and stalls in the rcu_torture_writer() kthreads. This commit therefore adds some diagnostics to help distinguish between these two conditions, at least for the non-SRCU implementations. (SRCU does not provide evidence of update-side forward progress by design.) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-04-29rcu: Replace __this_cpu_ptr() uses with raw_cpu_ptr()Christoph Lameter
__this_cpu_ptr is being phased out. One special case is increment_cpu_stall_ticks(). A per cpu variable is incremented so use raw_cpu_inc(). Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-04-29rcu: Remove duplicate resched_cpu() declarationPranith Kumar
Signed-off-by: Pranith Kumar <pranith@gatech.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-04-29rcu: Make large and small sysidle systems use same state machinePaul E. McKenney
Currently, small systems move back into RCU_SYSIDLE_NOT from RCU_SYSIDLE_SHORT and large systems do not. This works because moving aggressively to RCU_SYSIDLE_NOT affects only performance, not correctness, and on small systems, the performance impact should be negligible. That said, this difference does make RCU a bit more complex, and RCU does not seem to be suffering from any lack of complexity. This commit therefore adjusts small-system operation to match that of large systems, so that the state never moves back to RCU_SYSIDLE_NOT from RCU_SYSIDLE_SHORT. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-04-29rcu: Bind RCU grace-period kthreads if NO_HZ_FULLPaul E. McKenney
Currently, RCU binds the grace-period kthreads to the timekeeping CPU only if CONFIG_NO_HZ_FULL_SYSIDLE=y. This means that these kthreads must be bound manually when CONFIG_NO_HZ_FULL_SYSIDLE=n and CONFIG_NO_HZ_FULL=y: Otherwise, these kthreads will induce OS jitter on random CPUs. Given that we are trying to reduce the amount of manual tweaking required to make CONFIG_NO_HZ_FULL=y work nicely, this commit makes this binding happen when CONFIG_NO_HZ_FULL=y, even in cases where CONFIG_NO_HZ_FULL_SYSIDLE=n. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-04-29rcu: Merge rcu_sched_force_quiescent_state() with rcu_force_quiescent_state()Andreea-Cristina Bernat
This patch merges the function rcu_force_quiescent_state() with rcu_sched_force_quiescent_state(), using the rcu_state pointer. Firstly, the rcu_sched_force_quiescent_state() function is deleted from the file kernel/rcu/tree.c. Also, the rcu_force_quiescent_state() function that was calling force_quiescent_state with the argument rcu_preempt_state pointer was deleted as well. The new function that combines the old ones uses the rcu_state pointer and is located after rcu_batches_completed_bh() in kernel/rcu/tree.c. Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-04-29rcu: Consolidate kfree_call_rcu() to use rcu_state pointerAndreea-Cristina Bernat
kfree_call_rcu is defined two times. When defined under CONFIG_TREE_PREEMPT_RCU, it uses rcu_preempt_state. Otherwise, it uses rcu_sched_state. This patch uses the rcu_state_pointer to combine the two definitions into one. The resulting function is placed after the closing of the preprocessor conditional CONFIG_TREE_PREEMPT_RCU. Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-04-29rcu: Replace NR_CPUS with nr_cpu_idsHimangi Saraogi
This patch replaces NR_CPUS with nr_cpu_ids as NR_CPUS should consider cpumask_var_t. Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-04-29rcu: Add event tracing to dyntick_save_progress_counter().Andreea-Cristina Bernat
This patch adds event tracing to dyntick_save_progress_counter() in the case where it returns 1. I used the tracepoint string "dti" because this function returns 1 in case the CPU is in dynticks idle mode. Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-04-29rcu: Protect uses of ->jiffies_stall with ACCESS_ONCE()Himangi Saraogi
Some of the accesses to the rcu_state structure's ->jiffies_stall field are unprotected. This patch protects them with ACCESS_ONCE(). The following coccinelle script was used to acheive this: /* coccinelle script to protect uses of ->jiffies_stall with ACCESS_ONCE() */ @@ identifier a; @@ ( ACCESS_ONCE(a->jiffies_stall) | - a->jiffies_stall + ACCESS_ONCE(a->jiffies_stall) ) Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-04-29rcu: Make callers awaken grace-period kthreadPaul E. McKenney
The rcu_start_gp_advanced() function currently uses irq_work_queue() to defer wakeups of the RCU grace-period kthread. This deferring is necessary to avoid RCU-scheduler deadlocks involving the rcu_node structure's lock, meaning that RCU cannot call any of the scheduler's wake-up functions while holding one of these locks. Unfortunately, the second and subsequent calls to irq_work_queue() are ignored, and the first call will be ignored (aside from queuing the work item) if the scheduler-clock tick is turned off. This is OK for many uses, especially those where irq_work_queue() is called from an interrupt or softirq handler, because in those cases the scheduler-clock-tick state will be re-evaluated, which will turn the scheduler-clock tick back on. On the next tick, any deferred work will then be processed. However, this strategy does not always work for RCU, which can be invoked at process level from idle CPUs. In this case, the tick might never be turned back on, indefinitely defering a grace-period start request. Note that the RCU CPU stall detector cannot see this condition, because there is no RCU grace period in progress. Therefore, we can (and do!) see long tens-of-seconds stalls in grace-period handling. In theory, we could see a full grace-period hang, but rcutorture testing to date has seen only the tens-of-seconds stalls. Event tracing demonstrates that irq_work_queue() is being called repeatedly to no effect during these stalls: The "newreq" event appears repeatedly from a task that is not one of the grace-period kthreads. In theory, irq_work_queue() might be fixed to avoid this sort of issue, but RCU's requirements are unusual and it is quite straightforward to pass wake-up responsibility up through RCU's call chain, so that the wakeup happens when the offending locks are released. This commit therefore makes this change. The rcu_start_gp_advanced(), rcu_start_future_gp(), rcu_accelerate_cbs(), rcu_advance_cbs(), __note_gp_changes(), and rcu_start_gp() functions now return a boolean which indicates when a wake-up is needed. A new rcu_gp_kthread_wake() does the wakeup when it is necessary and safe to do so: No self-wakes, no wake-ups if the ->gp_flags field indicates there is no need (as in someone else did the wake-up before we got around to it), and no wake-ups before the grace-period kthread has been created. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>