author | Paul E. McKenney <paulmck@linux.ibm.com> | 2019-08-13 14:41:48 -0700 |
---|---|---|
committer | Paul E. McKenney <paulmck@linux.ibm.com> | 2019-08-13 14:41:48 -0700 |
commit | 07f038a408fb215fd656de78304b6ff4c7e4e490 (patch) | |
tree | 7e78a89f7a5981382d252cafcc58d9a4d66c9957 /Documentation | |
parent | 6738ff85c3ee8073d5b030cb26241d0009d4ce29 (diff) | |
parent | cfcdef5e30469f3f2d6786ad35fc3fdef2a3833f (diff) |
Merge LKMM and RCU commits
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/RCU/Design/Requirements/Requirements.html | 73 |
-rw-r--r-- | Documentation/RCU/stallwarn.txt | 6 |
-rw-r--r-- | Documentation/admin-guide/kernel-parameters.txt | 17 |
3 files changed, 89 insertions, 7 deletions
diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index 5a9238a2883c..467251f7fef6 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -2129,6 +2129,8 @@ Some of the relevant points of interest are as follows:
 <li> <a href="#Hotplug CPU">Hotplug CPU</a>.
 <li> <a href="#Scheduler and RCU">Scheduler and RCU</a>.
 <li> <a href="#Tracing and RCU">Tracing and RCU</a>.
+<li> <a href="#Accesses to User Memory and RCU">
+Accesses to User Memory and RCU</a>.
 <li> <a href="#Energy Efficiency">Energy Efficiency</a>.
 <li> <a href="#Scheduling-Clock Interrupts and RCU">
 	Scheduling-Clock Interrupts and RCU</a>.
@@ -2512,7 +2514,7 @@ disabled across the entire RCU read-side critical section.
 <p>
 It is possible to use tracing on RCU code, but tracing itself
 uses RCU.
-For this reason, <tt>rcu_dereference_raw_notrace()</tt>
+For this reason, <tt>rcu_dereference_raw_check()</tt>
 is provided for use by tracing, which avoids the destructive
 recursion that could otherwise ensue.
 This API is also used by virtualization in some architectures,
@@ -2521,6 +2523,75 @@ cannot be used.
 The tracing folks both located the requirement and provided the
 needed fix, so this surprise requirement was relatively painless.

+<h3><a name="Accesses to User Memory and RCU">
+Accesses to User Memory and RCU</a></h3>
+
+<p>
+The kernel needs to access user-space memory, for example, to access
+data referenced by system-call parameters.
+The <tt>get_user()</tt> macro does this job.
+
+<p>
+However, user-space memory might well be paged out, which means
+that <tt>get_user()</tt> might well page-fault and thus block while
+waiting for the resulting I/O to complete.
+It would be a very bad thing for the compiler to reorder
+a <tt>get_user()</tt> invocation into an RCU read-side critical
+section.
+For example, suppose that the source code looked like this:
+
+<blockquote>
+<pre>
+ 1 rcu_read_lock();
+ 2 p = rcu_dereference(gp);
+ 3 v = p->value;
+ 4 rcu_read_unlock();
+ 5 get_user(user_v, user_p);
+ 6 do_something_with(v, user_v);
+</pre>
+</blockquote>
+
+<p>
+The compiler must not be permitted to transform this source code into
+the following:
+
+<blockquote>
+<pre>
+ 1 rcu_read_lock();
+ 2 p = rcu_dereference(gp);
+ 3 get_user(user_v, user_p); // BUG: POSSIBLE PAGE FAULT!!!
+ 4 v = p->value;
+ 5 rcu_read_unlock();
+ 6 do_something_with(v, user_v);
+</pre>
+</blockquote>
+
+<p>
+If the compiler did make this transformation in a
+<tt>CONFIG_PREEMPT=n</tt> kernel build, and if <tt>get_user()</tt> did
+page fault, the result would be a quiescent state in the middle
+of an RCU read-side critical section.
+This misplaced quiescent state could result in line 4 being
+a use-after-free access, which could be bad for your kernel's
+actuarial statistics.
+Similar examples can be constructed with the call to <tt>get_user()</tt>
+preceding the <tt>rcu_read_lock()</tt>.
+
+<p>
+Unfortunately, <tt>get_user()</tt> doesn't have any particular
+ordering properties, and in some architectures the underlying <tt>asm</tt>
+isn't even marked <tt>volatile</tt>.
+And even if it was marked <tt>volatile</tt>, the above access to
+<tt>p->value</tt> is not volatile, so the compiler would not have any
+reason to keep those two accesses in order.
+
+<p>
+Therefore, the Linux-kernel definitions of <tt>rcu_read_lock()</tt>
+and <tt>rcu_read_unlock()</tt> must act as compiler barriers,
+at least for outermost instances of <tt>rcu_read_lock()</tt> and
+<tt>rcu_read_unlock()</tt> within a nested set of RCU read-side critical
+sections.
+
 <h3><a name="Energy Efficiency">Energy Efficiency</a></h3>

 <p>
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 13e88fc00f01..f48f4621ccbc 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -57,6 +57,12 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
 	CONFIG_PREEMPT_RCU case, you might see stall-warning
 	messages.

+	You can use the rcutree.kthread_prio kernel boot parameter to
+	increase the scheduling priority of RCU's kthreads, which can
+	help avoid this problem. However, please note that doing this
+	can increase your system's context-switch rate and thus degrade
+	performance.
+
 o	A periodic interrupt whose handler takes longer than the time
 	interval between successive pairs of interrupts. This can
 	prevent RCU's kthreads and softirq handlers from running.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7ccd158b3894..79b983bedcaa 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3837,12 +3837,13 @@
 			RCU_BOOST is not set, valid values are 0-99 and
 			the default is zero (non-realtime operation).

-	rcutree.rcu_nocb_leader_stride= [KNL]
-			Set the number of NOCB kthread groups, which
-			defaults to the square root of the number of
-			CPUs. Larger numbers reduces the wakeup overhead
-			on the per-CPU grace-period kthreads, but increases
-			that same overhead on each group's leader.
+	rcutree.rcu_nocb_gp_stride= [KNL]
+			Set the number of NOCB callback kthreads in
+			each group, which defaults to the square root
+			of the number of CPUs. Larger numbers reduce
+			the wakeup overhead on the global grace-period
+			kthread, but increases that same overhead on
+			each group's NOCB grace-period kthread.

 	rcutree.qhimark= [KNL]
 			Set threshold of queued RCU callbacks beyond which
@@ -4047,6 +4048,10 @@
 	rcutorture.verbose= [KNL]
 			Enable additional printk() statements.

+	rcupdate.rcu_cpu_stall_ftrace_dump= [KNL]
+			Dump ftrace buffer after reporting RCU CPU
+			stall warning.
+
 	rcupdate.rcu_cpu_stall_suppress= [KNL]
 			Suppress RCU CPU stall warning messages.
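
The Requirements.html hunk in this diff concludes that `rcu_read_lock()` and `rcu_read_unlock()` must act as compiler barriers so that a `get_user()` access cannot be reordered into the read-side critical section. The userspace sketch below illustrates only that compiler-barrier idea and is not the kernel's implementation. It assumes GCC- or Clang-style extended asm. The names `compiler_barrier()`, `toy_rcu_read_lock()`, `toy_rcu_read_unlock()`, `fake_get_user()`, `struct foo`, and `reader()` are hypothetical stand-ins introduced for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumption: GCC/Clang extended asm. An empty asm statement with a
 * "memory" clobber emits no instructions, but the compiler may not move
 * memory accesses across it.
 */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

struct foo {
	int value;
};

static struct foo the_foo = { .value = 42 };
static struct foo *gp = &the_foo; /* stand-in for an RCU-protected pointer */

/*
 * Hypothetical toy markers. Only the compiler-barrier property is
 * modeled here: no grace periods, no preemption control, no nesting.
 */
static inline void toy_rcu_read_lock(void)
{
	compiler_barrier();
}

static inline void toy_rcu_read_unlock(void)
{
	compiler_barrier();
}

/*
 * Hypothetical stand-in for get_user(): a plain load. In the kernel the
 * real access might page-fault and block, which is why it must stay
 * outside the critical section.
 */
static inline int fake_get_user(const int *user_p)
{
	assert(user_p != NULL);
	return *user_p;
}

int reader(const int *user_p)
{
	struct foo *p;
	int v;
	int user_v;

	toy_rcu_read_lock();
	p = gp;                  /* the kernel would use rcu_dereference(gp) */
	v = p->value;
	toy_rcu_read_unlock();   /* barrier: the load below may not move above it */
	user_v = fake_get_user(user_p);
	return v + user_v;
}
```

With `the_foo.value` at 42, calling `reader()` on a pointer to an `int` holding 8 returns 50. The barrier changes nothing about that result. It only constrains which orderings of the two loads the compiler is allowed to generate.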