From 832aa35a65bac800a1adbf2eab0b42427032cab8 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 29 Aug 2018 13:37:47 -0700 Subject: doc: Set down forward-progress requirements This commit adds a section to the requirements documentation setting down requirements for grace-period and callback-invocation forward progress. Signed-off-by: Paul E. McKenney --- .../RCU/Design/Requirements/Requirements.html | 110 ++++++++++++++++++++- 1 file changed, 108 insertions(+), 2 deletions(-) (limited to 'Documentation/RCU/Design') diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index 43c4e2f05f40..7efc1c1da7af 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -1381,6 +1381,7 @@ Classes of quality-of-implementation requirements are as follows:
  1. Specialization
  2. Performance and Scalability +
  3. Forward Progress
  4. Composability
  5. Corner Cases
@@ -1822,6 +1823,106 @@ so it is too early to tell whether they will stand the test of time. RCU thus provides a range of tools to allow updaters to strike the required tradeoff between latency, flexibility and CPU overhead. +

Forward Progress

+ +

+In theory, delaying grace-period completion and callback invocation +is harmless. +In practice, not only are memory sizes finite but also callbacks sometimes +do wakeups, and sufficiently deferred wakeups can be difficult +to distinguish from system hangs. +Therefore, RCU must provide a number of mechanisms to promote forward +progress. + +

+These mechanisms are not foolproof, nor can they be. +For one simple example, an infinite loop in an RCU read-side critical +section must by definition prevent later grace periods from ever completing. +For a more involved example, consider a 64-CPU system built with +CONFIG_RCU_NOCB_CPU=y and booted with rcu_nocbs=1-63, +where CPUs 1 through 63 spin in tight loops that invoke +call_rcu(). +Even if these tight loops also contain calls to cond_resched() +(thus allowing grace periods to complete), CPU 0 simply will +not be able to invoke callbacks as fast as the other 63 CPUs can +register them, at least not until the system runs out of memory. +In both of these examples, the Spiderman principle applies: With great +power comes great responsibility. +However, short of this level of abuse, RCU is required to +ensure timely completion of grace periods and timely invocation of +callbacks. + +

+RCU takes the following steps to encourage timely completion of +grace periods: + +

    +
  1. If a grace period fails to complete within 100 milliseconds, + RCU causes future invocations of cond_resched() on + the holdout CPUs to provide an RCU quiescent state. + RCU also causes those CPUs' need_resched() invocations + to return true, but only after the corresponding CPU's + next scheduling-clock. +
  2. CPUs mentioned in the nohz_full kernel boot parameter + can run indefinitely in the kernel without scheduling-clock + interrupts, which defeats the above need_resched() + strategem. + RCU will therefore invoke resched_cpu() on any + nohz_full CPUs still holding out after + 109 milliseconds. +
  3. In kernels built with CONFIG_RCU_BOOST=y, if a given + task that has been preempted within an RCU read-side critical + section is holding out for more than 500 milliseconds, + RCU will resort to priority boosting. +
  4. If a CPU is still holding out 10 seconds into the grace + period, RCU will invoke resched_cpu() on it regardless + of its nohz_full state. +
+ +

+The above values are defaults for systems running with HZ=1000. +They will vary as the value of HZ varies, and can also be +changed using the relevant Kconfig options and kernel boot parameters. +RCU currently does not do much sanity checking of these +parameters, so please use caution when changing them. +Note that these forward-progress measures are provided only for RCU, +not for +SRCU or +Tasks RCU. + +

+RCU takes the following steps in call_rcu() to encourage timely +invocation of callbacks when any given non-rcu_nocbs CPU has +10,000 callbacks, or has 10,000 more callbacks than it had the last time +encouragement was provided: + +

    +
  1. Starts a grace period, if one is not already in progress. +
  2. Forces immediate checking for quiescent states, rather than + waiting for three milliseconds to have elapsed since the + beginning of the grace period. +
  3. Immediately tags the CPU's callbacks with their grace period + completion numbers, rather than waiting for the RCU_SOFTIRQ + handler to get around to it. +
  4. Lifts callback-execution batch limits, which speeds up callback + invocation at the expense of degrading realtime response. +
+ +

+Again, these are default values when running at HZ=1000, +and can be overridden. +Again, these forward-progress measures are provided only for RCU, +not for +SRCU or +Tasks RCU. +Even for RCU, callback-invocation forward progress for rcu_nocbs +CPUs is much less well-developed, in part because workloads benefiting +from rcu_nocbs CPUs tend to invoke call_rcu() +relatively infrequently. +If workloads emerge that need both rcu_nocbs CPUs and high +call_rcu() invocation rates, then additional forward-progress +work will be required. +

Composability

@@ -2272,7 +2373,7 @@ that meets this requirement. Furthermore, NMI handlers can be interrupted by what appear to RCU to be normal interrupts. One way that this can happen is for code that directly invokes -rcu_irq_enter() and rcu_irq_exit() to be called +rcu_irq_enter() and rcu_irq_exit() to be called from an NMI handler. This astonishing fact of life prompted the current code structure, which has rcu_irq_enter() invoking rcu_nmi_enter() @@ -2294,7 +2395,7 @@ via del_timer_sync() or similar.

Unfortunately, there is no way to cancel an RCU callback; once you invoke call_rcu(), the callback function is -going to eventually be invoked, unless the system goes down first. +eventually going to be invoked, unless the system goes down first. Because it is normally considered socially irresponsible to crash the system in response to a module unload request, we need some other way to deal with in-flight RCU callbacks. @@ -3233,6 +3334,11 @@ For example, RCU callback overhead might be charged back to the originating call_rcu() instance, though probably not in production kernels. +

+Additional work may be required to provide reasonable forward-progress +guarantees under heavy load for grace periods and for callback +invocation. +

Summary

-- cgit v1.2.3 From 2d0350a8f0e6eb5494141c61c5c749b5155df33d Mon Sep 17 00:00:00 2001 From: "Joel Fernandes (Google)" Date: Fri, 21 Sep 2018 18:31:53 -0400 Subject: doc: Clarify RCU data-structure comment about rcu_tree fanout RCU Data-Structures document describes a trick to test RCU with small number of CPUs but with a taller tree. It wasn't immediately clear how the document arrived at 16 CPUs which also requires setting the FANOUT_LEAF to 2 instead of the default of 16. This commit therefore provides the needed clarification. Signed-off-by: Joel Fernandes (Google) Cc: Signed-off-by: Paul E. McKenney --- Documentation/RCU/Design/Data-Structures/Data-Structures.html | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'Documentation/RCU/Design') diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html index 1d2051c0c3fc..476b1ac38e4c 100644 --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html @@ -127,9 +127,11 @@ CPUs, RCU would configure the rcu_node tree as follows:

RCU currently permits up to a four-level tree, which on a 64-bit system accommodates up to 4,194,304 CPUs, though only a mere 524,288 CPUs for 32-bit systems. -On the other hand, you can set CONFIG_RCU_FANOUT to be -as small as 2 if you wish, which would permit only 16 CPUs, which -is useful for testing. +On the other hand, you can set both CONFIG_RCU_FANOUT and +CONFIG_RCU_FANOUT_LEAF to be as small as 2, which would result +in a 16-CPU test using a 4-level tree. +This can be useful for testing large-system capabilities on small test +machines.

This multi-level combining tree allows us to get most of the performance and scalability -- cgit v1.2.3 From 5cc379a42acd7104747077db7aaf4b01115ee484 Mon Sep 17 00:00:00 2001 From: "Joel Fernandes (Google)" Date: Tue, 25 Sep 2018 11:25:57 -0700 Subject: doc: Update information about resched_cpu Since commit fced9c8cfe6b ("rcu: Avoid resched_cpu() when rescheduling the current CPU"), resched_cpu is not directly called from sync_sched_exp_handler. Update the documentation about the same. Signed-off-by: Joel Fernandes (Google) Cc: Signed-off-by: Paul E. McKenney --- .../RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation/RCU/Design') diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html index e62c7c34a369..8e4f873b979f 100644 --- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html +++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html @@ -160,9 +160,9 @@ was in flight. If the CPU is idle, then sync_sched_exp_handler() reports the quiescent state. -

-Otherwise, the handler invokes resched_cpu(), which forces -a future context switch. +

Otherwise, the handler forces a future context switch by setting the +NEED_RESCHED flag of the current task's thread flag and the CPU preempt +counter. At the time of the context switch, the CPU reports the quiescent state. Should the CPU go offline first, it will report the quiescent state at that time. -- cgit v1.2.3 From c9b6f899e120c83ef144b3d4a8365413ef49cce4 Mon Sep 17 00:00:00 2001 From: "Joel Fernandes (Google)" Date: Wed, 3 Oct 2018 17:37:25 -0700 Subject: doc: Remove rcu_dynticks from Data-Structures rcu_dynticks was folded into rcu_data structure. Update the data structures RCU document accordingly. Signed-off-by: Joel Fernandes (Google) Cc: Signed-off-by: Paul E. McKenney --- .../Data-Structures/BigTreeClassicRCUBHdyntick.svg | 695 --------------------- .../Design/Data-Structures/Data-Structures.html | 90 +-- 2 files changed, 25 insertions(+), 760 deletions(-) delete mode 100644 Documentation/RCU/Design/Data-Structures/BigTreeClassicRCUBHdyntick.svg (limited to 'Documentation/RCU/Design') diff --git a/Documentation/RCU/Design/Data-Structures/BigTreeClassicRCUBHdyntick.svg b/Documentation/RCU/Design/Data-Structures/BigTreeClassicRCUBHdyntick.svg deleted file mode 100644 index 21ba7823479d..000000000000 --- a/Documentation/RCU/Design/Data-Structures/BigTreeClassicRCUBHdyntick.svg +++ /dev/null @@ -1,695 +0,0 @@ - - - - - - - - - - - - image/svg+xml - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - rcu_bh - - struct - - rcu_node - - struct - - rcu_node - - rcu_node - - struct - - struct - - rcu_data - - struct - - rcu_data - - struct - - rcu_data - - struct - - rcu_data - - struct rcu_state - - struct - - rcu_dynticks - - struct - - rcu_dynticks - - struct - - rcu_dynticks - - struct - - rcu_dynticks - - rcu_sched - - - - - diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html index 476b1ac38e4c..4eb603e3a005 100644 --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html @@ -23,8 +23,6 @@ to each other. The rcu_segcblist Structure

  • The rcu_data Structure -
  • - The rcu_dynticks Structure
  • The rcu_head Structure
  • @@ -174,16 +172,8 @@ said to be in dyntick-idle mode. RCU must handle dyntick-idle CPUs specially because RCU would otherwise wake up each CPU on every grace period, which would defeat the whole purpose of CONFIG_NO_HZ_IDLE. -RCU uses the rcu_dynticks structure to track -which CPUs are in dyntick idle mode, as shown below: - -

    BigTreeClassicRCUBHdyntick.svg - -

    However, if a CPU is in dyntick-idle mode, it is in that mode -for all flavors of RCU. -Therefore, a single rcu_dynticks structure is allocated per -CPU, and all of a given CPU's rcu_data structures share -that rcu_dynticks, as shown in the figure. +RCU uses the dynticks related fields in the rcu_data structure +to track which CPUs are in dyntick idle mode.

    Kernels built with CONFIG_PREEMPT_RCU support rcu_preempt in addition to rcu_sched and rcu_bh, as shown below: @@ -216,9 +206,6 @@ its own synchronization:

  • Each rcu_node structure has a spinlock.
  • The fields in rcu_data are private to the corresponding CPU, although a few can be read and written by other CPUs. -
  • Similarly, the fields in rcu_dynticks are private - to the corresponding CPU, although a few can be read by - other CPUs.

    It is important to note that different data structures can have @@ -274,11 +261,6 @@ follows: access to this information from the corresponding CPU. Finally, this structure records past dyntick-idle state for the corresponding CPU and also tracks statistics. -

  • rcu_dynticks: - This per-CPU structure tracks the current dyntick-idle - state for the corresponding CPU. - Unlike the other three structures, the rcu_dynticks - structure is not replicated per RCU flavor.
  • rcu_head: This structure represents RCU callbacks, and is the only structure allocated and managed by RCU users. @@ -289,8 +271,8 @@ follows:

    If all you wanted from this article was a general notion of how RCU's data structures are related, you are done. Otherwise, each of the following sections give more details on -the rcu_state, rcu_node, rcu_data, -and rcu_dynticks data structures. +the rcu_state, rcu_node and rcu_data data +structures.

    The rcu_state Structure

    @@ -1017,30 +999,19 @@ as follows:
       1   int cpu;
    -  2   struct rcu_state *rsp;
    -  3   struct rcu_node *mynode;
    -  4   struct rcu_dynticks *dynticks;
    -  5   unsigned long grpmask;
    -  6   bool beenonline;
    +  2   struct rcu_node *mynode;
    +  3   unsigned long grpmask;
    +  4   bool beenonline;
     

    The ->cpu field contains the number of the -corresponding CPU, the ->rsp pointer references -the corresponding rcu_state structure (and is most frequently -used to locate the name of the corresponding flavor of RCU for tracing), -and the ->mynode field references the corresponding -rcu_node structure. +corresponding CPU and the ->mynode field references the +corresponding rcu_node structure. The ->mynode is used to propagate quiescent states up the combining tree. -

    The ->dynticks pointer references the -rcu_dynticks structure corresponding to this -CPU. -Recall that a single per-CPU instance of the rcu_dynticks -structure is shared among all flavors of RCU. -These first four fields are constant and therefore require not -synchronization. +These two fields are constant and therefore do not require synchronization. -

    The ->grpmask field indicates the bit in +

    The ->grpmask field indicates the bit in the ->mynode->qsmask corresponding to this rcu_data structure, and is also used when propagating quiescent states. @@ -1181,26 +1152,22 @@ Finally, the ->dynticks_fqs field is used to count the number of times this CPU is determined to be in dyntick-idle state, and is used for tracing and debugging purposes. -

    -The rcu_dynticks Structure

    - -

    The rcu_dynticks maintains the per-CPU dyntick-idle state -for the corresponding CPU. -Unlike the other structures, rcu_dynticks is not -replicated over the different flavors of RCU. -The fields in this structure may be accessed only from the corresponding -CPU (and from tracing) unless otherwise stated. -Its fields are as follows: +

    +This portion of the rcu_data structure is declared as follows:

       1   long dynticks_nesting;
       2   long dynticks_nmi_nesting;
       3   atomic_t dynticks;
       4   bool rcu_need_heavy_qs;
    -  5   unsigned long rcu_qs_ctr;
    -  6   bool rcu_urgent_qs;
    +  5   bool rcu_urgent_qs;
     
    +

    These fields in the rcu_data structure maintain the per-CPU dyntick-idle +state for the corresponding CPU. +The fields may be accessed only from the corresponding CPU (and from tracing) +unless otherwise stated. +

    The ->dynticks_nesting field counts the nesting depth of process execution, so that in normal circumstances this counter has value zero or one. @@ -1242,19 +1209,12 @@ it is willing to call for heavy-weight dyntick-counter operations. This flag is checked by RCU's context-switch and cond_resched() code, which provide a momentary idle sojourn in response. -

    The ->rcu_qs_ctr field is used to record -quiescent states from cond_resched(). -Because cond_resched() can execute quite frequently, this -must be quite lightweight, as in a non-atomic increment of this -per-CPU field. -

    Finally, the ->rcu_urgent_qs field is used to record -the fact that the RCU core code would really like to see a quiescent -state from the corresponding CPU, with the various other fields indicating -just how badly RCU wants this quiescent state. -This flag is checked by RCU's context-switch and cond_resched() -code, which, if nothing else, non-atomically increment ->rcu_qs_ctr -in response. +the fact that the RCU core code would really like to see a quiescent state from +the corresponding CPU, with the various other fields indicating just how badly +RCU wants this quiescent state. +This flag is checked by RCU's context-switch path +(rcu_note_context_switch) and the cond_resched code. @@ -1431,7 +1391,7 @@ So each flavor of RCU is represented by an rcu_state structure, which contains a combining tree of rcu_node and rcu_data structures. Finally, in CONFIG_NO_HZ_IDLE kernels, each CPU's dyntick-idle -state is tracked by an rcu_dynticks structure. +state is tracked by dynticks-related fields in the rcu_data structure. If you made it this far, you are well prepared to read the code walkthroughs in the other articles in this series. -- cgit v1.2.3 From b54d9db26031d6dc96222164092eacbaa0329255 Mon Sep 17 00:00:00 2001 From: "Joel Fernandes (Google)" Date: Wed, 3 Oct 2018 17:40:28 -0700 Subject: doc: rcu: Update Data-Structures for RCU flavor consolidation This patch updates all Data-Structures document figures and text and removes some unwanted figures, to reflect the recent work Paul has been doing with consolidating all flavors of RCU. Signed-off-by: Joel Fernandes (Google) Cc: Signed-off-by: Paul E. McKenney --- .../Design/Data-Structures/BigTreeClassicRCUBH.svg | 499 ------------ .../Data-Structures/BigTreePreemptRCUBHdyntick.svg | 741 ------------------ .../BigTreePreemptRCUBHdyntickCB.svg | 834 ++++++++------------- .../Design/Data-Structures/Data-Structures.html | 49 +- .../RCU/Design/Data-Structures/blkd_task.svg | 676 ++++++----------- 5 files changed, 559 insertions(+), 2240 deletions(-) delete mode 100644 Documentation/RCU/Design/Data-Structures/BigTreeClassicRCUBH.svg delete mode 100644 Documentation/RCU/Design/Data-Structures/BigTreePreemptRCUBHdyntick.svg (limited to 'Documentation/RCU/Design') diff --git a/Documentation/RCU/Design/Data-Structures/BigTreeClassicRCUBH.svg b/Documentation/RCU/Design/Data-Structures/BigTreeClassicRCUBH.svg deleted file mode 100644 index 9bbb1944f962..000000000000 --- a/Documentation/RCU/Design/Data-Structures/BigTreeClassicRCUBH.svg +++ /dev/null @@ -1,499 +0,0 @@ - - - - - - - - - - - - image/svg+xml - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - rcu_bh - - struct - - rcu_node - - struct - - rcu_node - - rcu_node - - struct - - struct - - rcu_data - - struct - - rcu_data - - struct - - rcu_data - - struct - - rcu_data - - struct rcu_state - - rcu_sched - - - - - - - - - - - diff --git a/Documentation/RCU/Design/Data-Structures/BigTreePreemptRCUBHdyntick.svg b/Documentation/RCU/Design/Data-Structures/BigTreePreemptRCUBHdyntick.svg deleted file mode 100644 index 15adcac036c7..000000000000 --- a/Documentation/RCU/Design/Data-Structures/BigTreePreemptRCUBHdyntick.svg +++ /dev/null @@ -1,741 +0,0 @@ - - - - - - - - - - - - image/svg+xml - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - rcu_bh - - struct - - rcu_node - - struct - - rcu_node - - rcu_node - - struct - - struct - - rcu_data - - struct - - rcu_data - - struct - - rcu_data - - struct - - rcu_data - - struct rcu_state - - struct - - rcu_dynticks - - struct - - rcu_dynticks - - struct - - rcu_dynticks - - struct - - rcu_dynticks - - rcu_preempt - - rcu_sched - - - - - diff --git a/Documentation/RCU/Design/Data-Structures/BigTreePreemptRCUBHdyntickCB.svg b/Documentation/RCU/Design/Data-Structures/BigTreePreemptRCUBHdyntickCB.svg index bbc3801470d0..3a1a4f85dc3a 100644 --- a/Documentation/RCU/Design/Data-Structures/BigTreePreemptRCUBHdyntickCB.svg +++ b/Documentation/RCU/Design/Data-Structures/BigTreePreemptRCUBHdyntickCB.svg @@ -13,12 +13,12 @@ xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd" xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape" - width="7.4in" - height="9.9in" - viewBox="-44 -44 8938 11938" + width="7.4000001in" + height="7.9000001in" + viewBox="-44 -44 8938 9526.283" id="svg2" version="1.1" - inkscape:version="0.48.4 r9939" + inkscape:version="0.92.2pre0 (973e216, 2017-07-25)" sodipodi:docname="BigTreePreemptRCUBHdyntickCB.svg"> @@ -37,15 +37,46 @@ + + + + + + + style="overflow:visible"> + d="M 0,0 5,-5 -12.5,0 5,5 Z" + style="fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt" + transform="matrix(-0.4,0,0,-0.4,-4,0)" + inkscape:connector-curvature="0" /> + style="fill:none;stroke-width:0.025in" + id="g4" + transform="translate(0,-2415.6743)"> - - - - - - - + style="stroke:#00ff00;stroke-width:14;stroke-miterlimit:8" + id="polyline20" + transform="matrix(1,0,0,0.95854605,12.340758,1579.9033)" /> - + style="stroke:#00ff00;stroke-width:14;stroke-miterlimit:8" + id="polyline24" + transform="matrix(1,0,0,0.95854605,12.340758,1579.9033)" /> - + style="stroke:#00ff00;stroke-width:14;stroke-miterlimit:8" + id="polyline28" + transform="matrix(1,0,0,0.95854605,12.340758,1579.9033)" /> - + style="stroke:#00ff00;stroke-width:14;stroke-miterlimit:8" + id="polyline32" + transform="matrix(1,0,0,0.95854605,12.340758,1579.9033)" /> - + style="stroke:#00ff00;stroke-width:14;stroke-miterlimit:8" + id="polyline36" + transform="matrix(1,0,0,0.95854605,12.340758,1579.9033)" /> - + style="stroke:#00ff00;stroke-width:14;stroke-miterlimit:8" + id="polyline40" + transform="matrix(1,0,0,0.95854605,12.340758,1579.9033)" /> + style="stroke:#00d1d1;stroke-width:29.99464035;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline46" + transform="matrix(1,0,0,0.95854605,12.340758,1579.9033)" /> + style="stroke:#00d1d1;stroke-width:29.99464035;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline50" + transform="matrix(1,0,0,0.95854605,12.340758,1579.9033)" /> + style="stroke:#00d1d1;stroke-width:29.99464035;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline54" + transform="matrix(1,0,0,0.95854605,12.340758,1579.9033)" /> + style="stroke:#00d1d1;stroke-width:29.99464035;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline58" + transform="matrix(1,0,0,0.95854605,12.340758,1579.9033)" /> + style="stroke:#00d1d1;stroke-width:29.99464035;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline62" + transform="matrix(1,0,0,0.95854605,12.340758,1579.9033)" /> - - - - - - + - + - + - + - + - + - + - + - + + style="stroke:#000000;stroke-width:30;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend)" + id="polyline108" + transform="matrix(1,0,0,0.95854605,-604.69715,525.62477)" /> + style="stroke:#000000;stroke-width:30;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend)" + id="polyline114" + transform="matrix(1,0,0,0.95854605,-604.69715,525.62477)" /> - - - - struct + id="text140" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">struct rcu_head + id="text142" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">rcu_head struct + id="text144" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">struct rcu_head + id="text146" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">rcu_head struct + id="text148" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">struct rcu_head + id="text150" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">rcu_head rcu_sched + id="text152" + style="font-style:normal;font-weight:normal;font-size:187.978302px;font-family:Helvetica;text-anchor:end;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">rcu_state - rcu_bh struct + id="text156" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">struct rcu_node + id="text158" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">rcu_node struct + id="text160" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">struct rcu_node + id="text162" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">rcu_node rcu_node + id="text164" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">rcu_node struct + id="text166" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">struct struct + id="text168" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">struct rcu_data + id="text170" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">rcu_data struct + id="text172" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">struct rcu_data + id="text174" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">rcu_data struct + id="text176" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">struct rcu_data + id="text178" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">rcu_data struct + id="text180" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">struct rcu_data + id="text182" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:middle;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">rcu_data struct rcu_state + id="text184" + style="font-style:normal;font-weight:bold;font-size:187.978302px;font-family:Courier;text-anchor:start;fill:#000000;stroke-width:0.02447634in" + transform="scale(1.0213945,0.97905363)">struct rcu_state - struct - rcu_dynticks - struct - rcu_dynticks - struct - rcu_dynticks - struct - rcu_dynticks - rcu_preempt + style="stroke:#00d1d1;stroke-width:29.99464035;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline204" + transform="matrix(1,0,0,0.95854605,12.340758,1579.9033)" /> + diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html index 4eb603e3a005..28b241074c86 100644 --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html @@ -154,36 +154,9 @@ on that root rcu_node structure remains acceptably low. keeping lock contention under control at all tree levels regardless of the level of loading on the system. -

    The Linux kernel actually supports multiple flavors of RCU -running concurrently, so RCU builds separate data structures for each -flavor. -For example, for CONFIG_TREE_RCU=y kernels, RCU provides -rcu_sched and rcu_bh, as shown below: - -

    BigTreeClassicRCUBH.svg - -

    Energy efficiency is increasingly important, and for that -reason the Linux kernel provides CONFIG_NO_HZ_IDLE, which -turns off the scheduling-clock interrupts on idle CPUs, which in -turn allows those CPUs to attain deeper sleep states and to consume -less energy. -CPUs whose scheduling-clock interrupts have been turned off are -said to be in dyntick-idle mode. -RCU must handle dyntick-idle CPUs specially -because RCU would otherwise wake up each CPU on every grace period, -which would defeat the whole purpose of CONFIG_NO_HZ_IDLE. -RCU uses the dynticks related fields in the rcu_data structure -to track which CPUs are in dyntick idle mode. - -

    Kernels built with CONFIG_PREEMPT_RCU support -rcu_preempt in addition to rcu_sched and rcu_bh, as shown below: - -

    BigTreePreemptRCUBHdyntick.svg -

    RCU updaters wait for normal grace periods by registering RCU callbacks, either directly via call_rcu() and friends (namely call_rcu_bh() and call_rcu_sched()), -there being a separate interface per flavor of RCU) or indirectly via synchronize_rcu() and friends. RCU callbacks are represented by rcu_head structures, which are queued on rcu_data structures while they are @@ -278,7 +251,7 @@ structures. The rcu_state Structure

    The rcu_state structure is the base structure that -represents a flavor of RCU. +represents the state of RCU in the system. This structure forms the interconnection between the rcu_node and rcu_data structures, tracks grace periods, contains the lock used to @@ -373,7 +346,7 @@ sequence number. The bottom two bits are the state of the current grace period, which can be zero for not yet started or one for in progress. In other words, if the bottom two bits of ->gp_seq are -zero, the corresponding flavor of RCU is idle. +zero, then RCU is idle. Any other value in the bottom two bits indicates that something is broken. This field is protected by the root rcu_node structure's ->lock field. @@ -403,10 +376,10 @@ as follows: grace period in jiffies. It is protected by the root rcu_node's ->lock. -

    The ->name field points to the name of the RCU flavor -(for example, “rcu_sched”), and is constant. -The ->abbr field contains a one-character abbreviation, -for example, “s” for RCU-sched. +

    The ->name and ->abbr fields distinguish +between preemptible RCU (“rcu_preempt” and “p”) +and non-preemptible RCU (“rcu_sched” and “s”). +These fields are used for diagnostic and tracing purposes.

    The rcu_node Structure

    @@ -972,8 +945,7 @@ of rcu_barrier().

    The rcu_data Structure

    -

    The rcu_data maintains the per-CPU state for the -corresponding flavor of RCU. +

    The rcu_data maintains the per-CPU state for the RCU subsystem. The fields in this structure may be accessed only from the corresponding CPU (and from tracing) unless otherwise stated. This structure is the @@ -1030,7 +1002,6 @@ as follows: 3 bool cpu_no_qs; 4 bool core_needs_qs; 5 bool gpwrap; - 6 unsigned long rcu_qs_ctr_snap;

    The ->gp_seq and ->gp_seq_needed @@ -1076,10 +1047,6 @@ CPU has remained idle for so long that the gp_seq counter is in danger of overflow, which will cause the CPU to disregard the values of its counters on its next exit from idle. -Finally, the rcu_qs_ctr_snap field is used to detect -cases where a given operation has resulted in a quiescent state -for all flavors of RCU, for example, cond_resched() -when RCU has indicated a need for quiescent states.

    RCU Callback Handling
    @@ -1387,7 +1354,7 @@ the last part of the array, thus traversing only the leaf

    Summary

    -So each flavor of RCU is represented by an rcu_state structure, +So the state of RCU is represented by an rcu_state structure, which contains a combining tree of rcu_node and rcu_data structures. Finally, in CONFIG_NO_HZ_IDLE kernels, each CPU's dyntick-idle diff --git a/Documentation/RCU/Design/Data-Structures/blkd_task.svg b/Documentation/RCU/Design/Data-Structures/blkd_task.svg index 00e810bb8419..bed13e9ecab8 100644 --- a/Documentation/RCU/Design/Data-Structures/blkd_task.svg +++ b/Documentation/RCU/Design/Data-Structures/blkd_task.svg @@ -14,12 +14,12 @@ xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd" xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape" width="10.1in" - height="8.6in" - viewBox="-44 -44 12088 10288" + height="6.5999999in" + viewBox="-44 -44 12088 7895.4414" id="svg2" version="1.1" - inkscape:version="0.48.4 r9939" - sodipodi:docname="blkd_task.fig"> + inkscape:version="0.92.2pre0 (973e216, 2017-07-25)" + sodipodi:docname="blkd_task.svg"> @@ -37,15 +37,16 @@ + style="overflow:visible"> + d="M 0,0 5,-5 -12.5,0 5,5 Z" + style="fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt" + transform="matrix(-0.4,0,0,-0.4,-4,0)" + inkscape:connector-curvature="0" /> + inkscape:cx="456.40569" + inkscape:cy="348.88682" + inkscape:window-x="0" + inkscape:window-y="0" + inkscape:window-maximized="1" + inkscape:current-layer="g4" + showguides="false" /> + style="fill:none;stroke-width:0.025in" + id="g4" + transform="translate(0,-2393.6637)"> - - - - + style="stroke:#00ff00;stroke-width:14;stroke-miterlimit:8" + id="polyline14" + transform="translate(23.757862,2185.7233)" /> - + style="stroke:#00ff00;stroke-width:14;stroke-miterlimit:8" + id="polyline18" + transform="translate(23.757862,2185.7233)" /> - + style="stroke:#00ff00;stroke-width:14;stroke-miterlimit:8" + id="polyline22" + transform="translate(23.757862,2185.7233)" /> - + style="stroke:#00ff00;stroke-width:14;stroke-miterlimit:8" + id="polyline26" + transform="translate(23.757862,2185.7233)" /> + style="stroke:#00d1d1;stroke-width:30.00057793;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline32" + transform="translate(23.757862,2185.7233)" /> + style="stroke:#00d1d1;stroke-width:30.00057793;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline36" + transform="translate(23.757862,2185.7233)" /> + style="stroke:#00d1d1;stroke-width:30.00057793;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline40" + transform="translate(23.757862,2185.7233)" /> + style="stroke:#00d1d1;stroke-width:30.00057793;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline44" + transform="translate(23.757862,2185.7233)" /> + style="stroke:#00d1d1;stroke-width:30.00057793;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline48" + transform="translate(23.757862,2185.7233)" /> - - - - - - - - + points="7350,2850 7350,5100 5550,4350 5550,3450 " + style="fill:#ffbfbf;stroke:#000000;stroke-width:14;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:120, 120" + id="polygon106" + transform="translate(23.757862,2185.7233)" /> + style="stroke:#000000;stroke-width:30.00057793;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline108" + transform="translate(23.757862,2185.7233)" /> + style="stroke:#000000;stroke-width:30.00057793;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline114" + transform="translate(23.757862,2185.7233)" /> + style="stroke:#000000;stroke-width:30.00057793;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline120" + transform="translate(23.757862,2185.7233)" /> + style="stroke:#000000;stroke-width:30.00057793;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline126" + transform="translate(23.757862,2185.7233)" /> + style="stroke:#000000;stroke-width:30.00057793;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline130" + transform="translate(23.757862,2185.7233)" /> - rcu_bh struct + id="text136" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">struct rcu_node + id="text138" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">rcu_node struct + id="text140" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">struct rcu_node + id="text142" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">rcu_node struct + id="text144" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">struct rcu_data + id="text146" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">rcu_data struct + id="text148" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">struct rcu_data + id="text150" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">rcu_data struct + id="text152" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">struct rcu_data + id="text154" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">rcu_data struct + id="text156" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">struct rcu_data + id="text158" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">rcu_data struct rcu_state + id="text160" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:start;fill:#000000">struct rcu_state - struct - rcu_dynticks - struct - rcu_dynticks - struct - rcu_dynticks - struct - rcu_dynticks rcu_sched + id="text178" + style="font-style:normal;font-weight:normal;font-size:192px;font-family:Helvetica;text-anchor:end;fill:#000000">rcu_state T3 + id="text180" + style="font-style:normal;font-weight:normal;font-size:216px;font-family:Helvetica;text-anchor:middle;fill:#000000">T3 T2 + id="text182" + style="font-style:normal;font-weight:normal;font-size:216px;font-family:Helvetica;text-anchor:middle;fill:#000000">T2 T1 + id="text184" + style="font-style:normal;font-weight:normal;font-size:216px;font-family:Helvetica;text-anchor:middle;fill:#000000">T1 + style="stroke:#00d1d1;stroke-width:30.00057793;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;marker-end:url(#Arrow1Mend)" + id="polyline186" + transform="translate(23.757862,2185.7233)" /> rcu_node + id="text198" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">rcu_node struct + id="text200" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:middle;fill:#000000">struct blkd_tasks + id="text202" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:start;fill:#000000">blkd_tasks gp_tasks + id="text204" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:start;fill:#000000">gp_tasks exp_tasks + id="text206" + style="font-style:normal;font-weight:bold;font-size:192px;font-family:Courier;text-anchor:start;fill:#000000">exp_tasks -- cgit v1.2.3 From 82eccec851478e55bfd398d7e9d03300026fc4a9 Mon Sep 17 00:00:00 2001 From: "Joel Fernandes (Google)" Date: Tue, 25 Sep 2018 11:26:00 -0700 Subject: doc: rcu: Better clarify the rcu_segcblist ->len field An important note under the rcu_segcblist description could use a more detailed description. Especially explanation of the scenario where the ->head field may be temporarily NULL making it not wise to rely on it to determine if callbacks are associated with the rcu_segcblist. Signed-off-by: Joel Fernandes (Google) Cc: Signed-off-by: Paul E. McKenney --- .../Design/Data-Structures/Data-Structures.html | 23 ++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) (limited to 'Documentation/RCU/Design') diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html index 28b241074c86..3ed5f0182bc4 100644 --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html @@ -928,17 +928,24 @@ this rcu_segcblist structure, not the ->head pointer. The reason for this is that all the ready-to-invoke callbacks (that is, those in the RCU_DONE_TAIL segment) are extracted -all at once at callback-invocation time. +all at once at callback-invocation time (rcu_do_batch), due +to which ->head may be set to NULL if there are no not-done +callbacks remaining in the rcu_segcblist. If callback invocation must be postponed, for example, because a high-priority process just woke up on this CPU, then the remaining -callbacks are placed back on the RCU_DONE_TAIL segment. -Either way, the ->len and ->len_lazy counts -are adjusted after the corresponding callbacks have been invoked, and so -again it is the ->len count that accurately reflects whether -or not there are callbacks associated with this rcu_segcblist -structure. +callbacks are placed back on the RCU_DONE_TAIL segment and +->head once again points to the start of the segment. +In short, the head field can briefly be NULL even though the +CPU has callbacks present the entire time. +Therefore, it is not appropriate to test the ->head pointer +for NULL. + +

    In contrast, the ->len and ->len_lazy counts +are adjusted only after the corresponding callbacks have been invoked. +This means that the ->len count is zero only if +the rcu_segcblist structure really is devoid of callbacks. Of course, off-CPU sampling of the ->len count requires -the use of appropriate synchronization, for example, memory barriers. +careful use of appropriate synchronization, for example, memory barriers. This synchronization can be a bit subtle, particularly in the case of rcu_barrier(). -- cgit v1.2.3 From 70f0508caba2ccb564337e7a2ac4816b094abc00 Mon Sep 17 00:00:00 2001 From: "Joel Fernandes (Google)" Date: Tue, 25 Sep 2018 11:26:01 -0700 Subject: doc: rcu: Update description of gp_seq fields in rcu_data The rcu_state structure doesn't have a gp_seq_needed field. Update the description under rcu_data accordingly, to reflect this. Signed-off-by: Joel Fernandes (Google) Cc: Signed-off-by: Paul E. McKenney --- Documentation/RCU/Design/Data-Structures/Data-Structures.html | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'Documentation/RCU/Design') diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html index 3ed5f0182bc4..18f179807563 100644 --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html @@ -1011,9 +1011,10 @@ as follows: 5 bool gpwrap; -

    The ->gp_seq and ->gp_seq_needed -fields are the counterparts of the fields of the same name -in the rcu_state and rcu_node structures. +

    The ->gp_seq field is the counterpart of the field of the same +name in the rcu_state and rcu_node structures. The +->gp_seq_needed field is the counterpart of the field of the same +name in the rcu_node structure. They may each lag up to one behind their rcu_node counterparts, but in CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_FULL kernels can lag -- cgit v1.2.3 From 8b9df28d7f2e50dc1be758e98dad61ec77d6f6b5 Mon Sep 17 00:00:00 2001 From: "Joel Fernandes (Google)" Date: Sun, 14 Oct 2018 14:29:55 -0700 Subject: doc: Remove obsolete (non-)requirement about disabling preemption The Requirements.html document says "Disabling Preemption Does Not Block Grace Periods". However this is no longer true with the RCU consolidation. This commit therefore removes the obsolete (non-)requirement entirely. Signed-off-by: Joel Fernandes (Google) Signed-off-by: Paul E. McKenney --- .../RCU/Design/Requirements/Requirements.html | 50 ---------------------- 1 file changed, 50 deletions(-) (limited to 'Documentation/RCU/Design') diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index 7efc1c1da7af..4fae55056c1d 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -900,8 +900,6 @@ Except where otherwise noted, these non-guarantees were premeditated. Grace Periods Don't Partition Read-Side Critical Sections

  • Read-Side Critical Sections Don't Partition Grace Periods -
  • - Disabling Preemption Does Not Block Grace Periods

    Readers Impose Minimal Ordering

    @@ -1259,54 +1257,6 @@ of RCU grace periods.
  •  
     
    -

    -Disabling Preemption Does Not Block Grace Periods

    - -

    -There was a time when disabling preemption on any given CPU would block -subsequent grace periods. -However, this was an accident of implementation and is not a requirement. -And in the current Linux-kernel implementation, disabling preemption -on a given CPU in fact does not block grace periods, as Oleg Nesterov -demonstrated. - -

    -If you need a preempt-disable region to block grace periods, you need to add -rcu_read_lock() and rcu_read_unlock(), for example -as follows: - -

    -
    - 1 preempt_disable();
    - 2 rcu_read_lock();
    - 3 do_something();
    - 4 rcu_read_unlock();
    - 5 preempt_enable();
    - 6
    - 7 /* Spinlocks implicitly disable preemption. */
    - 8 spin_lock(&mylock);
    - 9 rcu_read_lock();
    -10 do_something();
    -11 rcu_read_unlock();
    -12 spin_unlock(&mylock);
    -
    -
    - -

    -In theory, you could enter the RCU read-side critical section first, -but it is more efficient to keep the entire RCU read-side critical -section contained in the preempt-disable region as shown above. -Of course, RCU read-side critical sections that extend outside of -preempt-disable regions will work correctly, but such critical sections -can be preempted, which forces rcu_read_unlock() to do -more work. -And no, this is not an invitation to enclose all of your RCU -read-side critical sections within preempt-disable regions, because -doing so would degrade real-time response. - -

    -This non-requirement appeared with preemptible RCU. -

    Parallelism Facts of Life

    -- cgit v1.2.3 From 97949f0176da396c32e7c881cbfbc61642fb1266 Mon Sep 17 00:00:00 2001 From: "Joel Fernandes (Google)" Date: Sun, 14 Oct 2018 19:29:42 -0700 Subject: doc: Make listing in RCU perf/scale requirements use rcu_assign_pointer() The code listing under this section has a quick quiz that says line 19 uses rcu_access_pointer, but the code listing itself instead uses rcu_dereference(). This commit therefore makes the code listing match the quick quiz. Signed-off-by: Joel Fernandes (Google) Signed-off-by: Paul E. McKenney --- Documentation/RCU/Design/Requirements/Requirements.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/RCU/Design') diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index 4fae55056c1d..f74a2233865c 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -1596,7 +1596,7 @@ used in place of synchronize_rcu() as follows: 16 struct foo *p; 17 18 spin_lock(&gp_lock); -19 p = rcu_dereference(gp); +19 p = rcu_access_pointer(gp); 20 if (!p) { 21 spin_unlock(&gp_lock); 22 return false; -- cgit v1.2.3 From 97562c018135a9d01c59bd3bf95a9458548b79e2 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 15 Oct 2018 10:54:13 -0700 Subject: doc: RCU scheduler spinlock rcu_read_unlock() restriction remains Given RCU flavor consolidation, when rcu_read_unlock() is invoked with interrupts disabled, the reporting of the corresponding quiescent state is deferred until interrupts are re-enabled. There was therefore some hope that this would allow dropping the restriction against holding scheduler spinlocks across an rcu_read_unlock() without disabling interrupts across the entire corresponding RCU read-side critical section. Unfortunately, the need to quickly provide a quiescent state to expedited grace periods sometimes requires a call to raise_softirq() during rcu_read_unlock() execution. Because raise_softirq() can sometimes acquire the scheduler spinlocks, the restriction must remain in effect. This commit therefore updates the RCU requirements documentation accordingly. Signed-off-by: Paul E. McKenney --- .../RCU/Design/Requirements/Requirements.html | 44 ++++++++++++++-------- 1 file changed, 29 insertions(+), 15 deletions(-) (limited to 'Documentation/RCU/Design') diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index f74a2233865c..9fca73e03a98 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -2475,23 +2475,37 @@ for context-switch-heavy CONFIG_NO_HZ_FULL=y workloads, but there is room for further improvement.

    -In the past, it was forbidden to disable interrupts across an -rcu_read_unlock() unless that interrupt-disabled region -of code also included the matching rcu_read_lock(). -Violating this restriction could result in deadlocks involving the -scheduler's runqueue and priority-inheritance spinlocks. -This restriction was lifted when interrupt-disabled calls to -rcu_read_unlock() started deferring the reporting of -the resulting RCU-preempt quiescent state until the end of that +It is forbidden to hold any of scheduler's runqueue or priority-inheritance +spinlocks across an rcu_read_unlock() unless interrupts have been +disabled across the entire RCU read-side critical section, that is, +up to and including the matching rcu_read_lock(). +Violating this restriction can result in deadlocks involving these +scheduler spinlocks. +There was hope that this restriction might be lifted when interrupt-disabled +calls to rcu_read_unlock() started deferring the reporting of +the resulting RCU-preempt quiescent state until the end of the corresponding interrupts-disabled region. -This deferred reporting means that the scheduler's runqueue and -priority-inheritance locks cannot be held while reporting an RCU-preempt -quiescent state, which lifts the earlier restriction, at least from -a deadlock perspective. -Unfortunately, real-time systems using RCU priority boosting may +Unfortunately, timely reporting of the corresponding quiescent state +to expedited grace periods requires a call to raise_softirq(), +which can acquire these scheduler spinlocks. +In addition, real-time systems using RCU priority boosting need this restriction to remain in effect because deferred -quiescent-state reporting also defers deboosting, which in turn -degrades real-time latencies. +quiescent-state reporting would also defer deboosting, which in turn +would degrade real-time latencies. + +

    +In theory, if a given RCU read-side critical section could be +guaranteed to be less than one second in duration, holding a scheduler +spinlock across that critical section's rcu_read_unlock() +would require only that preemption be disabled across the entire +RCU read-side critical section, not interrupts. +Unfortunately, given the possibility of vCPU preemption, long-running +interrupts, and so on, it is not possible in practice to guarantee +that a given RCU read-side critical section will complete in less than +one second. +Therefore, as noted above, if scheduler spinlocks are held across +a given call to rcu_read_unlock(), interrupts must be +disabled across the entire RCU read-side critical section.

    Tracing and RCU

    -- cgit v1.2.3 From 97b59370fa5959d5833a54f303f640d094af3d3c Mon Sep 17 00:00:00 2001 From: "Joel Fernandes (Google)" Date: Sat, 27 Oct 2018 21:30:46 -0700 Subject: doc: Fix "struction" typo in RCU memory-ordering documentation This commit replaces "struction" with the correct "structure". Signed-off-by: Joel Fernandes (Google) Signed-off-by: Paul E. McKenney --- Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/RCU/Design') diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html index a346ce0116eb..e4d94fba6c89 100644 --- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html +++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html @@ -77,7 +77,7 @@ The key point is that the lock-acquisition functions, including smp_mb__after_unlock_lock() immediately after successful acquisition of the lock. -

    Therefore, for any given rcu_node struction, any access +

    Therefore, for any given rcu_node structure, any access happening before one of the above lock-release functions will be seen by all CPUs as happening before any access happening after a later one of the above lock-acquisition functions. -- cgit v1.2.3