linux-toradex.git/kernel/rcu/tree.c, branch v5.17-rc7

Merge branches 'doc.2021.11.30c', 'exp.2021.12.07a', 'fastnohz.2021.11.30c', 'fixes.2021.11.30c', 'nocb.2021.12.09a', 'nolibc.2021.11.30c', 'tasks.2021.12.09a', 'torture.2021.12.07a' and 'torturescript.2021.11.30c' into HEAD

2021-12-09T19:38:09+00:00

doc.2021.11.30c: Documentation updates.
exp.2021.12.07a: Expedited-grace-period fixes.
fastnohz.2021.11.30c: Remove CONFIG_RCU_FAST_NO_HZ.
fixes.2021.11.30c: Miscellaneous fixes.
nocb.2021.12.09a: No-CB CPU updates.
nolibc.2021.11.30c: Tiny in-kernel library updates.
tasks.2021.12.09a: RCU-tasks updates, including update-side scalability.
torture.2021.12.07a: Torture-test in-kernel module updates.
torturescript.2021.11.30c: Torture-test scripting updates.

rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread

2021-12-08T00:24:44+00:00

rcu_core() tries to ensure that its self-invocation in case of callbacks
overload only happens in softirq/rcuc mode. Indeed it doesn't make sense
to trigger local RCU core from nocb_cb kthread since it can execute
on a CPU different from the target rdp. Also in case of overload, the
nocb_cb kthread simply iterates a new loop of callbacks processing.

However the "offloaded" check that aims at preventing misplaced
rcu_core() invocations is wrong. First of all that state is volatile
and second: softirq/rcuc can execute while the target rdp is offloaded.
As a result rcu_core() can be invoked on the wrong CPU while in the
process of (de-)offloading.

Fix that by moving the rcu_core() self-invocation to rcu_core() itself,
irrespective of the rdp offloaded state.

Tested-by: Valentin Schneider 
Tested-by: Sebastian Andrzej Siewior 
Signed-off-by: Frederic Weisbecker 
Cc: Valentin Schneider 
Cc: Peter Zijlstra 
Cc: Sebastian Andrzej Siewior 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Boqun Feng 
Cc: Neeraj Upadhyay 
Cc: Uladzislau Rezki 
Cc: Thomas Gleixner 
Signed-off-by: Paul E. McKenney

rcu: Apply callbacks processing time limit only on softirq

2021-12-08T00:24:44+00:00

Time limit only makes sense when callbacks are serviced in softirq mode
because:

_ In case we need to get back to the scheduler,
  cond_resched_tasks_rcu_qs() is called after each callback.

_ In case some other softirq vector needs the CPU, the call to
  local_bh_enable() before cond_resched_tasks_rcu_qs() takes care of
  it via a call to do_softirq().

Therefore, make sure the time limit only applies to softirq mode.

Reviewed-by: Valentin Schneider 
Tested-by: Valentin Schneider 
Tested-by: Sebastian Andrzej Siewior 
Signed-off-by: Frederic Weisbecker 
Cc: Valentin Schneider 
Cc: Peter Zijlstra 
Cc: Sebastian Andrzej Siewior 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Boqun Feng 
Cc: Neeraj Upadhyay 
Cc: Uladzislau Rezki 
Cc: Thomas Gleixner 
Signed-off-by: Paul E. McKenney

rcu: Fix callbacks processing time limit retaining cond_resched()

2021-12-08T00:24:44+00:00

The callbacks processing time limit makes sure we are not exceeding a
given amount of time executing the queue.

However its "continue" clause bypasses the cond_resched() call on
rcuc and NOCB kthreads, delaying it until we reach the limit, which can
be very long...

Make sure the scheduler has a higher priority than the time limit.
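
As a rough illustration of the intended ordering, here is a minimal
user-space sketch, not the kernel code. The names model_batch() and
struct batch_stats are hypothetical, time is modeled as one tick per
callback, and sampling the clock only every 32 callbacks is an assumption
of this model:

```c
#include <assert.h>
#include <stdbool.h>

struct batch_stats {
	long invoked;         /* callbacks run before leaving the loop */
	long resched_points;  /* times a cond_resched()-like point was reached */
};

/* tlimit is in ticks; 0 means "no time limit" in this model. */
struct batch_stats model_batch(long n_cbs, bool in_softirq, long tlimit)
{
	struct batch_stats st = { 0, 0 };

	assert(n_cbs >= 0 && tlimit >= 0);

	while (st.invoked < n_cbs) {
		st.invoked++;	/* invoke one callback, costing one tick */

		if (in_softirq) {
			/* Only softirq mode honors the time limit. */
			if (!tlimit || (st.invoked & 31) || st.invoked < tlimit)
				continue;
			break;	/* time limit exceeded, leave the loop */
		}

		/* kthread mode: the rescheduling point is never skipped. */
		st.resched_points++;
	}
	return st;
}
```

Under these assumptions, a kthread run over 100 callbacks reaches 100
rescheduling points whatever the limit is, while a softirq run with a
limit of 64 ticks leaves the loop after 64 callbacks.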

Reviewed-by: Valentin Schneider 
Tested-by: Valentin Schneider 
Tested-by: Sebastian Andrzej Siewior 
Signed-off-by: Frederic Weisbecker 
Cc: Valentin Schneider 
Cc: Peter Zijlstra 
Cc: Sebastian Andrzej Siewior 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Boqun Feng 
Cc: Neeraj Upadhyay 
Cc: Uladzislau Rezki 
Cc: Thomas Gleixner 
Signed-off-by: Paul E. McKenney

rcu/nocb: Limit number of softirq callbacks only on softirq

2021-12-08T00:24:44+00:00

The current condition to limit the number of callbacks executed in a
row checks the offloaded state of the rdp. Not only is it volatile
but it is also misleading: rcu_core() may well be executing
callbacks concurrently with NOCB kthreads, and the offloaded state
would then hold in both cases. As a result the limit would
spuriously not apply anymore on softirq while in the middle of
the (de-)offloading process.

Fix and clarify the condition with those constraints in mind:

_ If callbacks are processed either by rcuc or NOCB kthread, the call
  to cond_resched_tasks_rcu_qs() is enough to take care of the overload.

_ If instead callbacks are processed by softirqs:
  * If need_resched(), exit the callbacks processing
  * Otherwise if CPU is idle we can continue
  * Otherwise exit because a softirq shouldn't interrupt a task for too
    long nor deprive other pending softirq vectors of the CPU.
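
The constraints above can be sketched as a small decision function. This
is an illustrative model only: enum cb_context and should_exit_batch() are
hypothetical names, and the need_resched and cpu_idle flags stand in for
the kernel's own checks:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: which context is invoking the callbacks. */
enum cb_context { CB_SOFTIRQ, CB_RCUC_KTHREAD, CB_NOCB_KTHREAD };

/*
 * Return true if callbacks processing should stop, given that 'count'
 * callbacks have been invoked against a batch limit of 'limit'.
 */
bool should_exit_batch(enum cb_context ctx, long count, long limit,
		       bool need_resched, bool cpu_idle)
{
	assert(count >= 0 && limit > 0);

	/* rcuc and NOCB kthreads rely on their rescheduling point instead. */
	if (ctx != CB_SOFTIRQ)
		return false;

	/* The limit only matters once it has been reached. */
	if (count < limit)
		return false;

	if (need_resched)
		return true;	/* the scheduler wants the CPU */
	if (cpu_idle)
		return false;	/* nothing else to run, keep going */

	/* Don't hold back the interrupted task or other softirq vectors. */
	return true;
}
```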

Tested-by: Valentin Schneider 
Tested-by: Sebastian Andrzej Siewior 
Signed-off-by: Frederic Weisbecker 
Cc: Valentin Schneider 
Cc: Peter Zijlstra 
Cc: Sebastian Andrzej Siewior 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Boqun Feng 
Cc: Neeraj Upadhyay 
Cc: Uladzislau Rezki 
Cc: Thomas Gleixner 
Signed-off-by: Paul E. McKenney

rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()

2021-12-08T00:24:44+00:00

Instead of hardcoding IRQ save and nocb lock, use the consolidated
API (and fix a comment as per Valentin Schneider's suggestion).

Reviewed-by: Valentin Schneider 
Tested-by: Valentin Schneider 
Tested-by: Sebastian Andrzej Siewior 
Signed-off-by: Frederic Weisbecker 
Cc: Valentin Schneider 
Cc: Peter Zijlstra 
Cc: Sebastian Andrzej Siewior 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Boqun Feng 
Cc: Neeraj Upadhyay 
Cc: Uladzislau Rezki 
Cc: Thomas Gleixner 
Signed-off-by: Paul E. McKenney

rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check

2021-12-08T00:24:44+00:00

It's not entirely obvious why rdp->qlen_last_fqs_check is updated before
processing the queue only on offloaded rdp. This can have different
effects, either in favour of triggering the force quiescent state
path or not. For example:

1) If the number of callbacks has decreased since the last
   rdp->qlen_last_fqs_check update (because we recently called
   rcu_do_batch() and we executed fewer than qhimark callbacks) and the
   number of processed callbacks on a subsequent rcu_do_batch() arranges for
   exceeding qhimark on non-offloaded but not on offloaded setup, then we
   may spare a later run of the force quiescent state
   slow path on __call_rcu_nocb_wake(), as compared to the non-offloaded
   counterpart scenario.

   Here is such an offloaded scenario instance:

    qhimark = 1000
    rdp->qlen_last_fqs_check = 3000
    rcu_segcblist_n_cbs(rdp) = 2000

    rcu_do_batch() {
        if (offloaded)
            rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000
        // run 1000 callbacks
        rcu_segcblist_n_cbs(rdp) = 1000
        // Not updating rdp->qlen_last_fqs_check
        if (count < rdp->qlen_last_fqs_check - qhimark)
            rdp->qlen_last_fqs_check = count;
    }

    call_rcu() * 1001 {
        __call_rcu_nocb_wake() {
            // not taking the fqs slowpath:
            // rcu_segcblist_n_cbs(rdp) == 2001
            // rdp->qlen_last_fqs_check == 2000
            // qhimark == 1000
            if (len > rdp->qlen_last_fqs_check + qhimark)
                ...
        }
    }

    In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check
    would be 1000 and the fqs slowpath would have executed.

2) If the number of callbacks has increased since the last
   rdp->qlen_last_fqs_check update (because we recently queued fewer than
   qhimark callbacks) and the number of callbacks executed in rcu_do_batch()
   doesn't exceed qhimark for either offloaded or non-offloaded setup,
   then it's possible that the offloaded scenario later runs the force
   quiescent state slow path on __call_rcu_nocb_wake() while the
   non-offloaded one doesn't.

    qhimark = 1000
    rdp->qlen_last_fqs_check = 3000
    rcu_segcblist_n_cbs(rdp) = 2000

    rcu_do_batch() {
        if (offloaded)
            rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000
        // run 100 callbacks
        // 100 callbacks concurrently queued
        rcu_segcblist_n_cbs(rdp) = 2000
        // Not updating rdp->qlen_last_fqs_check
        if (count < rdp->qlen_last_fqs_check - qhimark)
            rdp->qlen_last_fqs_check = count;
    }

    call_rcu() * 1001 {
        __call_rcu_nocb_wake() {
            // Taking the fqs slowpath:
            // rcu_segcblist_n_cbs(rdp) == 3001
            // rdp->qlen_last_fqs_check == 2000
            // qhimark == 1000
            if (len > rdp->qlen_last_fqs_check + qhimark)
                ...
        }
    }

    In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check
    would be 3000 and the fqs slowpath would not have executed
    (3001 is not above 3000 + 1000).
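
The arithmetic of both scenarios can be replayed with a small user-space
model. This is only a sketch of the pseudocode above, not the kernel
implementation: struct rdp_model, model_do_batch(),
model_takes_fqs_slowpath() and run_scenario() are hypothetical names, and
only the two comparisons shown in the scenarios are modeled:

```c
#include <assert.h>
#include <stdbool.h>

#define QHIMARK 1000L	/* value used in the scenarios above */

/* Simplified stand-in for the rcu_data fields involved. */
struct rdp_model {
	long n_cbs;               /* models rcu_segcblist_n_cbs(rdp) */
	long qlen_last_fqs_check; /* models rdp->qlen_last_fqs_check */
};

/* Models the bookkeeping around one rcu_do_batch() pass. */
void model_do_batch(struct rdp_model *rdp, bool offloaded,
		    long invoked, long queued_meanwhile)
{
	assert(rdp);

	if (offloaded)
		rdp->qlen_last_fqs_check = rdp->n_cbs;

	rdp->n_cbs += queued_meanwhile - invoked;

	if (rdp->n_cbs < rdp->qlen_last_fqs_check - QHIMARK)
		rdp->qlen_last_fqs_check = rdp->n_cbs;
}

/* Models the slow path test after 'queued' further call_rcu() calls. */
bool model_takes_fqs_slowpath(struct rdp_model *rdp, long queued)
{
	assert(rdp);

	rdp->n_cbs += queued;
	return rdp->n_cbs > rdp->qlen_last_fqs_check + QHIMARK;
}

/* Starts from the state shared by both scenarios, then queues 1001. */
bool run_scenario(bool offloaded, long invoked, long queued_meanwhile)
{
	struct rdp_model rdp = { .n_cbs = 2000, .qlen_last_fqs_check = 3000 };

	model_do_batch(&rdp, offloaded, invoked, queued_meanwhile);
	return model_takes_fqs_slowpath(&rdp, 1001);
}
```

In this model, scenario 1 is run_scenario(offloaded, 1000, 0) and
scenario 2 is run_scenario(offloaded, 100, 100). The slow path is taken
in scenario 1 only when not offloaded, and in scenario 2 only when
offloaded.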

The reason for updating rdp->qlen_last_fqs_check when invoking callbacks
for offloaded CPUs is that there is usually no point in waking up either
the rcuog or rcuoc kthreads while in this state.  After all, both threads
are prohibited from indefinite sleeps.

The exception is when some huge number of callbacks are enqueued while
rcu_do_batch() is in the midst of invoking, in which case interrupting
the rcuog kthread's timed sleep might get more callbacks set up for the
next grace period.

Reported-and-tested-by: Valentin Schneider 
Tested-by: Sebastian Andrzej Siewior 
Original-patch-by: Thomas Gleixner 
Signed-off-by: Frederic Weisbecker 
Cc: Valentin Schneider 
Cc: Peter Zijlstra 
Cc: Sebastian Andrzej Siewior 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Boqun Feng 
Cc: Neeraj Upadhyay 
Cc: Uladzislau Rezki 
Cc: Thomas Gleixner 
Signed-off-by: Paul E. McKenney

rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe

2021-12-08T00:24:44+00:00

When callbacks are offloaded, the NOCB kthreads handle the callbacks
progression on behalf of rcu_core().

However during the (de-)offloading process, the kthread may not be
entirely up to the task. As a result the grace period sequence numbers
of some callbacks may remain stale for a while because rcu_core() won't
take care of them either.

Fix this by forcing callbacks acceleration from rcu_core() as long
as the offloading process isn't complete.

Reported-and-tested-by: Valentin Schneider 
Tested-by: Sebastian Andrzej Siewior 
Signed-off-by: Frederic Weisbecker 
Cc: Valentin Schneider 
Cc: Peter Zijlstra 
Cc: Sebastian Andrzej Siewior 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Boqun Feng 
Cc: Neeraj Upadhyay 
Cc: Uladzislau Rezki 
Cc: Thomas Gleixner 
Signed-off-by: Paul E. McKenney

rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe

2021-12-08T00:24:44+00:00

While reporting a quiescent state for a given CPU, rcu_core() takes
advantage of the freshly loaded grace period sequence number and the
locked rnp to accelerate the callbacks whose sequence number has been
assigned a stale value.

This action is only necessary when the rdp isn't offloaded, otherwise
the NOCB kthreads already take care of the callbacks progression.

However the check for the offloaded state is volatile because it is
performed outside the IRQs disabled section. It's possible for the
offloading process to preempt rcu_core() at that point on PREEMPT_RT.

This is dangerous because rcu_core() may end up accelerating callbacks
concurrently with NOCB kthreads without appropriate locking.

Fix this by moving the offloaded check inside the rnp locking section.
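
The pattern can be pictured with a small user-space analogy, which is not
the kernel code. All names here (struct rnp_model, struct rdp_state,
report_qs(), run_report_qs()) are hypothetical, and the boolean "lock"
merely stands in for the rnp lock held with IRQs disabled:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the rnp lock and the rdp state. */
struct rnp_model { bool locked; };
struct rdp_state {
	bool offloaded;   /* may be flipped by (de-)offloading while unlocked */
	int accelerated;  /* times callbacks were accelerated from here */
};

void rnp_lock(struct rnp_model *rnp)
{
	assert(!rnp->locked);
	rnp->locked = true;
}

void rnp_unlock(struct rnp_model *rnp)
{
	assert(rnp->locked);
	rnp->locked = false;
}

/* Reads the offloaded state; only legal while the lock is held. */
bool rdp_is_offloaded_locked(const struct rnp_model *rnp,
			     const struct rdp_state *rdp)
{
	assert(rnp->locked);
	return rdp->offloaded;
}

/* The check and the acceleration share one locked section. */
void report_qs(struct rnp_model *rnp, struct rdp_state *rdp)
{
	rnp_lock(rnp);
	if (!rdp_is_offloaded_locked(rnp, rdp))
		rdp->accelerated++;
	rnp_unlock(rnp);
}

/* Usage helper: returns how many accelerations one report performed. */
int run_report_qs(bool offloaded)
{
	struct rnp_model rnp = { false };
	struct rdp_state rdp = { offloaded, 0 };

	report_qs(&rnp, &rdp);
	return rdp.accelerated;
}
```

Because the state is read and acted upon inside the same locked section,
it cannot change between the check and the acceleration in this model.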

Reported-and-tested-by: Valentin Schneider 
Reviewed-by: Valentin Schneider 
Tested-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Sebastian Andrzej Siewior 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Boqun Feng 
Cc: Neeraj Upadhyay 
Cc: Uladzislau Rezki 
Cc: Thomas Gleixner 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Paul E. McKenney

rcu/nocb: Invoke rcu_core() at the start of deoffloading

2021-12-08T00:24:44+00:00

On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
some work, such as callbacks acceleration and invocation, may be left
unattended due to the volatile checks on the offloaded state.

In the worst case this work is postponed until the next rcu_pending()
check that can take a jiffy to reach, which can be a problem in case
of callbacks flooding.

Solve that by invoking rcu_core() early in the de-offloading process.
This way any work dismissed by an ongoing rcu_core() call fooled by
a preempting de-offloading process will be caught up by a subsequent
call to rcu_core(), this time fully aware of the de-offloading state.

Tested-by: Valentin Schneider 
Tested-by: Sebastian Andrzej Siewior 
Signed-off-by: Frederic Weisbecker 
Cc: Valentin Schneider 
Cc: Peter Zijlstra 
Cc: Sebastian Andrzej Siewior 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Boqun Feng 
Cc: Neeraj Upadhyay 
Cc: Uladzislau Rezki 
Cc: Thomas Gleixner 
Signed-off-by: Paul E. McKenney