diff options
| author | Ingo Molnar <mingo@kernel.org> | 2018-07-17 09:16:02 +0200 |
|---|---|---|
| committer | Ingo Molnar <mingo@kernel.org> | 2018-07-17 09:16:02 +0200 |
| commit | ea73a5c6929b9c7d30b7b424414645641cb7d1d9 (patch) | |
| tree | 19de3c2469c6476be82b143535a018d0053f9460 /Documentation/RCU/Design/Data-Structures/Data-Structures.html | |
| parent | 9d3cce1e8b8561fed5f383d22a4d6949db4eadbe (diff) | |
| parent | 18952651dae8efcc6d565c97f8fe5629b399cb3e (diff) | |
Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:
- An optimization and a fix for RCU expedited grace periods, with
the fix being from Boqun Feng.
- Miscellaneous fixes, including a lockdep-annotation fix from
Boqun Feng.
- SRCU updates.
- Updates to rcutorture and associated scripting.
- Introduce grace-period sequence numbers to the RCU-bh, RCU-preempt,
and RCU-sched flavors, replacing the old ->gpnum and ->completed
pair of fields. This change allows lockless code to obtain the
complete grace-period state with a single READ_ONCE(), which is
needed to maintain tolerable lock contention during the upcoming
consolidation of the three RCU flavors. Note that grace-period
sequence numbers are already used by rcu_barrier(), expedited
RCU grace periods, and SRCU, and are thus already heavily used
and well-tested. Joel Fernandes contributed a number of excellent
fixes and improvements.
- Clean up some grace-period-reporting loose ends, including
improving the handling of quiescent states from offline CPUs
and fixing some false-positive WARN_ON_ONCE() invocations.
(Strictly speaking, the WARN_ON_ONCE() invocations were quite
correct, but their invariants were (harmlessly) violated by the
earlier sloppy handling of quiescent states from offline CPUs.)
In addition, improve grace-period forward-progress guarantees so
as to allow removal of fail-safe checks that required otherwise
needless lock acquisitions. Finally, add more diagnostics to
help debug the upcoming consolidation of the RCU-bh, RCU-preempt,
and RCU-sched flavors.
- Additional miscellaneous fixes, including those contributed by
Byungchul Park, Mauro Carvalho Chehab, Joe Perches, Joel Fernandes,
Steven Rostedt, Andrea Parri, and Neil Brown.
- Additional torture-test changes, including several contributed by
Arnd Bergmann and Joel Fernandes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'Documentation/RCU/Design/Data-Structures/Data-Structures.html')
| -rw-r--r-- | Documentation/RCU/Design/Data-Structures/Data-Structures.html | 118 |
1 files changed, 63 insertions, 55 deletions
diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html index 6c06e10bd04b..f5120a00f511 100644 --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html @@ -380,31 +380,26 @@ and therefore need no protection. as follows: <pre> - 1 unsigned long gpnum; - 2 unsigned long completed; + 1 unsigned long gp_seq; </pre> <p>RCU grace periods are numbered, and -the <tt>->gpnum</tt> field contains the number of the grace -period that started most recently. -The <tt>->completed</tt> field contains the number of the -grace period that completed most recently. -If the two fields are equal, the RCU grace period that most recently -started has already completed, and therefore the corresponding -flavor of RCU is idle. -If <tt>->gpnum</tt> is one greater than <tt>->completed</tt>, -then <tt>->gpnum</tt> gives the number of the current RCU -grace period, which has not yet completed. -Any other combination of values indicates that something is broken. -These two fields are protected by the root <tt>rcu_node</tt>'s +the <tt>->gp_seq</tt> field contains the current grace-period +sequence number. +The bottom two bits are the state of the current grace period, +which can be zero for not yet started or one for in progress. +In other words, if the bottom two bits of <tt>->gp_seq</tt> are +zero, the corresponding flavor of RCU is idle. +Any other value in the bottom two bits indicates that something is broken. +This field is protected by the root <tt>rcu_node</tt> structure's <tt>->lock</tt> field. -</p><p>There are <tt>->gpnum</tt> and <tt>->completed</tt> fields +</p><p>There are <tt>->gp_seq</tt> fields in the <tt>rcu_node</tt> and <tt>rcu_data</tt> structures as well. The fields in the <tt>rcu_state</tt> structure represent the -most current values, and those of the other structures are compared -in order to detect the start of a new grace period in a distributed +most current value, and those of the other structures are compared +in order to detect the beginnings and ends of grace periods in a distributed fashion. The values flow from <tt>rcu_state</tt> to <tt>rcu_node</tt> (down the tree from the root to the leaves) to <tt>rcu_data</tt>. @@ -512,27 +507,47 @@ than to be heisenbugged out of existence. as follows: <pre> - 1 unsigned long gpnum; - 2 unsigned long completed; + 1 unsigned long gp_seq; + 2 unsigned long gp_seq_needed; </pre> -<p>These fields are the counterparts of the fields of the same name in -the <tt>rcu_state</tt> structure. -They each may lag up to one behind their <tt>rcu_state</tt> -counterparts. -If a given <tt>rcu_node</tt> structure's <tt>->gpnum</tt> and -<tt>->complete</tt> fields are equal, then this <tt>rcu_node</tt> +<p>The <tt>rcu_node</tt> structures' <tt>->gp_seq</tt> fields are +the counterparts of the field of the same name in the <tt>rcu_state</tt> +structure. +They each may lag up to one step behind their <tt>rcu_state</tt> +counterpart. +If the bottom two bits of a given <tt>rcu_node</tt> structure's +<tt>->gp_seq</tt> field is zero, then this <tt>rcu_node</tt> structure believes that RCU is idle. -Otherwise, as with the <tt>rcu_state</tt> structure, -the <tt>->gpnum</tt> field will be one greater than the -<tt>->complete</tt> fields, with <tt>->gpnum</tt> -indicating which grace period this <tt>rcu_node</tt> believes -is still being waited for. +</p><p>The <tt>>gp_seq</tt> field of each <tt>rcu_node</tt> +structure is updated at the beginning and the end +of each grace period. + +<p>The <tt>->gp_seq_needed</tt> fields record the +furthest-in-the-future grace period request seen by the corresponding +<tt>rcu_node</tt> structure. The request is considered fulfilled when +the value of the <tt>->gp_seq</tt> field equals or exceeds that of +the <tt>->gp_seq_needed</tt> field. -</p><p>The <tt>>gpnum</tt> field of each <tt>rcu_node</tt> -structure is updated at the beginning -of each grace period, and the <tt>->completed</tt> fields are -updated at the end of each grace period. +<table> +<tr><th> </th></tr> +<tr><th align="left">Quick Quiz:</th></tr> +<tr><td> + Suppose that this <tt>rcu_node</tt> structure doesn't see + a request for a very long time. + Won't wrapping of the <tt>->gp_seq</tt> field cause + problems? +</td></tr> +<tr><th align="left">Answer:</th></tr> +<tr><td bgcolor="#ffffff"><font color="ffffff"> + No, because if the <tt>->gp_seq_needed</tt> field lags behind the + <tt>->gp_seq</tt> field, the <tt>->gp_seq_needed</tt> field + will be updated at the end of the grace period. + Modulo-arithmetic comparisons therefore will always get the + correct answer, even with wrapping. +</font></td></tr> +<tr><td> </td></tr> +</table> <h5>Quiescent-State Tracking</h5> @@ -626,9 +641,8 @@ normal and expedited grace periods, respectively. </ol> <p><font color="ffffff">So the locking is absolutely required in - order to coordinate - clearing of the bits with the grace-period numbers in - <tt>->gpnum</tt> and <tt>->completed</tt>. + order to coordinate clearing of the bits with updating of the + grace-period sequence number in <tt>->gp_seq</tt>. </font></td></tr> <tr><td> </td></tr> </table> @@ -1038,15 +1052,15 @@ out any <tt>rcu_data</tt> structure for which this flag is not set. as follows: <pre> - 1 unsigned long completed; - 2 unsigned long gpnum; + 1 unsigned long gp_seq; + 2 unsigned long gp_seq_needed; 3 bool cpu_no_qs; 4 bool core_needs_qs; 5 bool gpwrap; 6 unsigned long rcu_qs_ctr_snap; </pre> -<p>The <tt>completed</tt> and <tt>gpnum</tt> +<p>The <tt>->gp_seq</tt> and <tt>->gp_seq_needed</tt> fields are the counterparts of the fields of the same name in the <tt>rcu_state</tt> and <tt>rcu_node</tt> structures. They may each lag up to one behind their <tt>rcu_node</tt> @@ -1054,15 +1068,9 @@ counterparts, but in <tt>CONFIG_NO_HZ_IDLE</tt> and <tt>CONFIG_NO_HZ_FULL</tt> kernels can lag arbitrarily far behind for CPUs in dyntick-idle mode (but these counters will catch up upon exit from dyntick-idle mode). -If a given <tt>rcu_data</tt> structure's <tt>->gpnum</tt> and -<tt>->complete</tt> fields are equal, then this <tt>rcu_data</tt> +If the lower two bits of a given <tt>rcu_data</tt> structure's +<tt>->gp_seq</tt> are zero, then this <tt>rcu_data</tt> structure believes that RCU is idle. -Otherwise, as with the <tt>rcu_state</tt> and <tt>rcu_node</tt> -structure, -the <tt>->gpnum</tt> field will be one greater than the -<tt>->complete</tt> fields, with <tt>->gpnum</tt> -indicating which grace period this <tt>rcu_data</tt> believes -is still being waited for. <table> <tr><th> </th></tr> @@ -1070,13 +1078,13 @@ is still being waited for. <tr><td> All this replication of the grace period numbers can only cause massive confusion. - Why not just keep a global pair of counters and be done with it??? + Why not just keep a global sequence number and be done with it??? </td></tr> <tr><th align="left">Answer:</th></tr> <tr><td bgcolor="#ffffff"><font color="ffffff"> - Because if there was only a single global pair of grace-period + Because if there was only a single global sequence numbers, there would need to be a single global lock to allow - safely accessing and updating them. + safely accessing and updating it. And if we are not going to have a single global lock, we need to carefully manage the numbers on a per-node basis. Recall from the answer to a previous Quick Quiz that the consequences @@ -1091,8 +1099,8 @@ CPU has not yet passed through a quiescent state, while the <tt>->core_needs_qs</tt> flag indicates that the RCU core needs a quiescent state from the corresponding CPU. The <tt>->gpwrap</tt> field indicates that the corresponding -CPU has remained idle for so long that the <tt>completed</tt> -and <tt>gpnum</tt> counters are in danger of overflow, which +CPU has remained idle for so long that the +<tt>gp_seq</tt> counter is in danger of overflow, which will cause the CPU to disregard the values of its counters on its next exit from idle. Finally, the <tt>rcu_qs_ctr_snap</tt> field is used to detect @@ -1130,10 +1138,10 @@ The CPU advances the callbacks in its <tt>rcu_data</tt> structure whenever it notices that another RCU grace period has completed. The CPU detects the completion of an RCU grace period by noticing that the value of its <tt>rcu_data</tt> structure's -<tt>->completed</tt> field differs from that of its leaf +<tt>->gp_seq</tt> field differs from that of its leaf <tt>rcu_node</tt> structure. Recall that each <tt>rcu_node</tt> structure's -<tt>->completed</tt> field is updated at the end of each +<tt>->gp_seq</tt> field is updated at the beginnings and ends of each grace period. <p> |
