linux-toradex.git/kernel/time/tick-broadcast.c, branch v3.10.78

tick: Make oneshot broadcast robust vs. CPU offlining

2014-03-24T04:38:21+00:00

commit c9b5a266b103af873abb9ac03bc3d067702c8f4b upstream.

In periodic mode we remove offline cpus from the broadcast propagation
mask. In oneshot mode we fail to do so. This was not a problem so far,
but the recent changes to the broadcast propagation introduced a
constellation which can result in a NULL pointer dereference.

What happens is:

CPU0			CPU1
			idle()
			  arch_idle()
			    tick_broadcast_oneshot_control(OFF);
			      set cpu1 in tick_broadcast_force_mask
			  if (cpu_offline())
			     arch_cpu_dead()

cpu_dead_cleanup(cpu1)
 cpu1 tickdevice pointer = NULL

broadcast interrupt
  dereference cpu1 tickdevice pointer -> OOPS

We dereference the pointer because cpu1 is still set in
tick_broadcast_force_mask and tick_do_broadcast() expects a valid
cpumask and therefore lacks any further checks.

Remove the cpu from the tick_broadcast_force_mask before we set the
tick device pointer to NULL. Also add a sanity check to the oneshot
broadcast function, so we can detect such issues w/o crashing the
machine.
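
The ordering described above can be illustrated with a small userspace
model. This is only a sketch, not the kernel code: every model_* name is
hypothetical, and plain 32-bit integers stand in for the cpumasks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-ins for the per-cpu tick devices and the cpumasks. */
struct model_tick_dev { int events; };

static struct model_tick_dev model_devs[MODEL_NR_CPUS];
static struct model_tick_dev *model_dev_ptr[MODEL_NR_CPUS];
static uint32_t model_oneshot_mask, model_force_mask;

/* Offline cleanup: drop the cpu from the masks first, then clear the pointer. */
static void model_cpu_dead_cleanup(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_oneshot_mask &= ~(1u << cpu);
	model_force_mask &= ~(1u << cpu);
	model_dev_ptr[cpu] = NULL;
}

/* Deliver a broadcast; skip and count cpus that have no tick device. */
static int model_do_broadcast(uint32_t mask)
{
	int skipped = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(mask & (1u << cpu)))
			continue;
		if (!model_dev_ptr[cpu]) {	/* sanity check instead of an oops */
			skipped++;
			continue;
		}
		model_dev_ptr[cpu]->events++;
	}
	return skipped;
}
```

With the cleanup in place the broadcast never sees the dead cpu. The
sanity check only fires if a stale bit is passed in anyway.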

Reported-by: Prarit Bhargava 
Cc: athorlton@sgi.com
Cc: CAI Qian 
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1306261303260.4013@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner 
Signed-off-by: Preeti U Murthy 
Signed-off-by: Greg Kroah-Hartman

tick: Clear broadcast pending bit when switching to oneshot

2014-02-22T20:41:29+00:00

commit dd5fd9b91a77b4c9c28b7ef9c181b1a875820d0a upstream.

AMD systems which use the C1E workaround in the amd_e400_idle routine
trigger the WARN_ON_ONCE in the broadcast code when onlining a CPU.

The reason is that the idle routine of those AMD systems switches the
cpu into forced broadcast mode early on before the newly brought up
CPU can switch over to high resolution / NOHZ mode. The timer related
CPU1 bringup looks like this:

  clockevent_register_device(local_apic);
  tick_setup(local_apic);
  ...
  idle()
	tick_broadcast_on_off(FORCE);
	tick_broadcast_oneshot_control(ENTER)
	  cpumask_set(cpu, broadcast_oneshot_mask);
	halt();

Now the broadcast interrupt on CPU0 sets CPU1 in the
broadcast_pending_mask and wakes CPU1. So CPU1 continues:

	local_apic_timer_interrupt()
	   tick_handle_periodic();
	   softirq()
	     tick_init_highres();
	       cpumask_clr(cpu, broadcast_oneshot_mask);

	tick_broadcast_oneshot_control(ENTER)
	   WARN_ON(cpumask_test(cpu, broadcast_pending_mask));

So while we remove CPU1 from the broadcast_oneshot_mask when we switch
over to highres mode, we do not clear the pending bit, which then
triggers the warning when we go back to idle.

The reason why this is only visible on C1E affected AMD systems is
that the other machines enter the deep sleep states via
acpi_idle/intel_idle and exit the broadcast mode before executing the
remote triggered local_apic_timer_interrupt. So the pending bit is
already cleared when the switch over to highres mode is clearing the
oneshot mask.

The solution is simple: Clear the pending bit together with the mask
bit when we switch over to highres mode.
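
A minimal userspace sketch of that sequence, assuming plain integers in
place of the cpumasks. All model_* names are hypothetical, and the
'fixed' flag exists only to compare the behaviour with and without the
change:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t model_oneshot_mask, model_pending_mask;

/* Idle entry: returns true if the pending-bit warning would fire. */
static bool model_broadcast_enter(int cpu)
{
	uint32_t bit = 1u << cpu;
	bool warn = false;

	assert(cpu >= 0 && cpu < 32);
	if (!(model_oneshot_mask & bit)) {
		model_oneshot_mask |= bit;
		warn = (model_pending_mask & bit) != 0;
	}
	return warn;
}

/* Broadcast interrupt on another cpu: mark an expired cpu as pending. */
static void model_broadcast_irq(int cpu)
{
	if (model_oneshot_mask & (1u << cpu))
		model_pending_mask |= 1u << cpu;
}

/* Switch to highres mode; 'fixed' selects whether pending is cleared too. */
static void model_switch_to_highres(int cpu, bool fixed)
{
	model_oneshot_mask &= ~(1u << cpu);
	if (fixed)
		model_pending_mask &= ~(1u << cpu);
}
```

Running enter, irq, switch, enter with fixed == false leaves the pending
bit stale and the second enter reports the warning. With fixed == true
it does not.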

Stanislaw came up independently with the same patch by enforcing the
C1E workaround and debugging the fallout. I picked mine, because mine
has a changelog :)

Reported-by: poma 
Debugged-by: Stanislaw Gruszka 
Signed-off-by: Thomas Gleixner 
Cc: Olaf Hering 
Cc: Dave Jones 
Cc: Justin M. Forbes 
Cc: Josh Boyer 
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402111434180.21991@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

clockevents: Split out selection logic

2013-12-08T15:29:27+00:00

commit 45cb8e01b2ecef1c2afb18333e95793fa1a90281 upstream.

Split out the clockevent device selection logic. Preparatory patch to
allow unbinding active clockevent devices.

Signed-off-by: Thomas Gleixner 
Cc: John Stultz 
Cc: Magnus Damm 
Link: http://lkml.kernel.org/r/20130425143436.431796247@linutronix.de
Signed-off-by: Thomas Gleixner 
Cc: Kim Phillips 
Signed-off-by: Greg Kroah-Hartman

clockevents: Add module refcount

2013-12-08T15:29:27+00:00

commit ccf33d6880f39a35158fff66db13000ae4943fac upstream.

We want to be able to remove clockevent modules as well. Add a
refcount so we don't remove a module with an active clock event
device.

Signed-off-by: Thomas Gleixner 
Cc: John Stultz 
Cc: Magnus Damm 
Link: http://lkml.kernel.org/r/20130425143436.307435149@linutronix.de
Signed-off-by: Thomas Gleixner 
Cc: Kim Phillips 
Signed-off-by: Greg Kroah-Hartman

clockevents: Get rid of the notifier chain

2013-12-08T15:29:27+00:00

commit 7172a286ced0c1f4f239a0fa09db54ed37d3ead2 upstream.

7+ years and still a single user. Kill it.

Signed-off-by: Thomas Gleixner 
Cc: John Stultz 
Cc: Magnus Damm 
Link: http://lkml.kernel.org/r/20130425143436.098520211@linutronix.de
Signed-off-by: Thomas Gleixner 
Cc: Kim Phillips 
Signed-off-by: Greg Kroah-Hartman

tick: Prevent uncontrolled switch to oneshot mode

2013-07-25T21:07:29+00:00

commit 1f73a9806bdd07a5106409bbcab3884078bd34fe upstream.

When the system switches from periodic to oneshot mode, the broadcast
logic makes it possible for a CPU which has not yet switched to
oneshot mode to put its own clock event device into oneshot mode
without updating the state and the timer handler.

CPU0				CPU1
				per cpu tickdev is in periodic mode
				and switched to broadcast

Switch to oneshot mode
 tick_broadcast_switch_to_oneshot()
  cpumask_copy(tick_broadcast_oneshot_mask,
	       tick_broadcast_mask);

  broadcast device mode = oneshot

				Timer interrupt

				irq_enter()
				 tick_check_oneshot_broadcast()
				  dev->set_mode(ONESHOT);

				tick_handle_periodic()
				 if (dev->mode == ONESHOT)
				   dev->next_event += period;
				   FAIL.

We fail, because dev->next_event contains KTIME_MAX, if the device was
in periodic mode before the uncontrolled switch to oneshot happened.

We must copy the broadcast bits over to the oneshot mask, because
otherwise a CPU which relies on the broadcast would no longer be woken
up after the broadcast device switched to oneshot mode.

So we need to verify in tick_check_oneshot_broadcast() whether the CPU
has already switched to oneshot mode. If not, leave the device
untouched and let the CPU switch into oneshot mode in a controlled way.
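
The check can be sketched in userspace as follows. This is an
illustration under assumptions, not the kernel implementation: the
model_* types are hypothetical and keep the mode of the tick layer
separate from the mode programmed into the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_mode { MODEL_PERIODIC, MODEL_ONESHOT };

struct model_tick_device {
	enum model_mode tick_mode;	/* mode the tick layer has switched to */
	enum model_mode hw_mode;	/* mode programmed into the hardware */
};

/* Returns true if the hardware was put into oneshot mode. */
static bool model_check_oneshot_broadcast(struct model_tick_device *td,
					  uint32_t oneshot_mask, int cpu)
{
	assert(td && cpu >= 0 && cpu < 32);
	if (!(oneshot_mask & (1u << cpu)))
		return false;
	/* Only touch the device if this cpu already switched to oneshot. */
	if (td->tick_mode != MODEL_ONESHOT)
		return false;
	td->hw_mode = MODEL_ONESHOT;
	return true;
}
```

A cpu whose tick layer is still periodic keeps its hardware in periodic
mode, so the periodic handler never sees an unexpected oneshot device.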

This is a long-standing bug, which was never noticed, because the main
user of the broadcast, x86, cannot run into that scenario, AFAICT. The
nonarchitected timer mess of ARM creates a gazillion of differently
broken abominations which trigger the shortcomings of that broadcast
code, which better had never been necessary in the first place.

Reported-and-tested-by: Stehle Vincent-B46079 
Reviewed-by: Stephen Boyd 
Cc: John Stultz
Cc: Mark Rutland 
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

tick: Sanitize broadcast control logic

2013-07-25T21:07:29+00:00

commit 07bd1172902e782f288e4d44b1fde7dec0f08b6f upstream.

The recent implementation of a generic dummy timer resulted in a
different registration order of per cpu local timers which made the
broadcast control logic go belly up.

If the dummy timer is the first clock event device which is registered
for a CPU, then it is installed, the broadcast timer is initialized
and the CPU is marked as broadcast target.

If a real clock event device is installed after that, we can fail to
take the CPU out of the broadcast mask. In the worst case we end up
with two periodic timer events firing for the same CPU. One from the
per cpu hardware device and one from the broadcast.

Now the problem is that we have no way to distinguish whether the
system is in a state which makes broadcasting necessary or the
broadcast bit was set due to the nonfunctional dummy timer
installment.

To solve this we need to keep track of the system state separately and
provide a more detailed decision logic whether we keep the CPU in
broadcast mode or not.

The old decision logic only clears the broadcast mode, if the newly
installed clock event device is not affected by power states.

The new logic clears the broadcast mode if one of the following is
true:

  - The new device is not affected by power states.

  - The system is not in a power state affected mode.

  - The system has switched to oneshot mode. The oneshot broadcast is
    controlled from the deep idle state. The CPU is not in idle at
    this point, so it's safe to remove it from the mask.

If we clear the broadcast bit for the CPU when a new device is
installed, we also shut down the broadcast device if this was the
last CPU in the broadcast mask.

If the broadcast bit is kept, then we leave the new device in shutdown
state and rely on the broadcast to deliver the timer interrupts via
the broadcast IPIs.
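
The three conditions of the new logic reduce to a single predicate. The
sketch below only restates the list from this changelog. The struct and
its field names are hypothetical and do not correspond to kernel data
structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs naming the three conditions from the changelog. */
struct model_state {
	bool dev_affected_by_power_states;
	bool system_in_power_state_affected_mode;
	bool system_in_oneshot_mode;
};

/* True if the broadcast bit of the cpu should be cleared. */
static bool model_clear_broadcast_bit(const struct model_state *s)
{
	assert(s);
	return !s->dev_affected_by_power_states ||
	       !s->system_in_power_state_affected_mode ||
	       s->system_in_oneshot_mode;
}
```

The bit is kept only when the device is affected by power states, the
system is in a power state affected mode and the system has not yet
switched to oneshot mode.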

Reported-and-tested-by: Stehle Vincent-B46079 
Reviewed-by: Stephen Boyd 
Cc: John Stultz
Cc: Mark Rutland 
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

tick: Fix tick_broadcast_pending_mask not cleared

2013-06-21T11:10:34+00:00

The recent modification in the cpuidle framework consolidated the
timer broadcast code across the different drivers by setting a new
flag in the idle state. It tells the cpuidle core code to enter/exit
the broadcast mode for the cpu when entering a deep idle state. The
broadcast timer enter/exit is no longer handled by the back-end
driver.

This change caused local interrupts to be enabled *before* the
CLOCK_EVT_NOTIFY_BROADCAST_EXIT notification is sent.

On a Tegra114, a four-core system, the following warning appeared once
the flag was introduced in the driver:

WARNING: at kernel/time/tick-broadcast.c:578 tick_broadcast_oneshot_control
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.0-rc3-next-20130529+ #15
[] (tick_broadcast_oneshot_control+0x1a4/0x1d0) from [] (tick_notify+0x240/0x40c)
[] (tick_notify+0x240/0x40c) from [] (notifier_call_chain+0x44/0x84)
[] (notifier_call_chain+0x44/0x84) from [] (raw_notifier_call_chain+0x18/0x20)
[] (raw_notifier_call_chain+0x18/0x20) from [] (clockevents_notify+0x28/0x170)
[] (clockevents_notify+0x28/0x170) from [] (cpuidle_idle_call+0x11c/0x168)
[] (cpuidle_idle_call+0x11c/0x168) from [] (arch_cpu_idle+0x8/0x38)
[] (arch_cpu_idle+0x8/0x38) from [] (cpu_startup_entry+0x60/0x134)
[] (cpu_startup_entry+0x60/0x134) from [<804fe9a4>] (0x804fe9a4)

I don't have the hardware, so I wasn't able to reproduce the warning
but after looking a while at the code, I deduced the following:

 1. CPU2 enters a deep idle state and sets the broadcast timer

 2. the timer expires, the tick_handle_oneshot_broadcast function is
    called, setting the tick_broadcast_pending_mask and waking up the
    idle cpu CPU2

 3. CPU2 exits idle, handles the interrupt and then invokes
    tick_broadcast_oneshot_control with
    CLOCK_EVT_NOTIFY_BROADCAST_EXIT, which runs the following code:

    [...]
    if (dev->next_event.tv64 == KTIME_MAX)
            goto out;

    if (cpumask_test_and_clear_cpu(cpu,
                                 tick_broadcast_pending_mask))
            goto out;
    [...]

    So if there is no next event scheduled for CPU2, we fulfil the
    first condition and jump out without clearing the
    tick_broadcast_pending_mask.

 4. CPU2 goes to deep idle again and calls
    tick_broadcast_oneshot_control with CLOCK_EVT_NOTIFY_BROADCAST_ENTER but
    with the tick_broadcast_pending_mask set for CPU2, triggering the
    warning.

The issue only surfaced due to the modifications of the cpuidle
framework, which resulted in interrupts being enabled before the call
to the clockevents code. If the call happens before interrupts have
been enabled, the warning cannot trigger, because there is still the
event pending which caused the broadcast timer expiry.

Move the check for the next event below the check for the pending bit,
so the pending bit gets cleared whether an event is scheduled on the
cpu or not.
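
A userspace sketch of the reordered exit path, assuming a plain integer
for the pending mask. MODEL_KTIME_MAX and the model_* names are
hypothetical stand-ins, and the function only models the two checks
quoted above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_KTIME_MAX INT64_MAX	/* stand-in for "no event scheduled" */

static uint32_t model_pending_mask;

/* Broadcast exit: returns true if the local timer must be reprogrammed. */
static bool model_broadcast_exit(int cpu, int64_t next_event)
{
	uint32_t bit = 1u << cpu;

	assert(cpu >= 0 && cpu < 32);
	/* Test and clear the pending bit first, whether an event exists or not. */
	if (model_pending_mask & bit) {
		model_pending_mask &= ~bit;
		return false;
	}
	if (next_event == MODEL_KTIME_MAX)
		return false;
	return true;
}
```

With this order a cpu without a scheduled event still leaves the exit
path with its pending bit cleared, so the next idle entry cannot trip
over a stale bit.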

[ tglx: Massaged changelog ]

Signed-off-by: Daniel Lezcano 
Reported-and-tested-by: Joseph Lo 
Cc: Stephen Warren 
Cc: linux-arm-kernel@lists.infradead.org
Cc: linaro-kernel@lists.linaro.org
Link: http://lkml.kernel.org/r/1371485735-31249-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Thomas Gleixner

tick: Remove useless timekeeping duty attribution to broadcast source

2013-05-31T13:58:32+00:00

Since 7300711e ("clockevents: broadcast fixup possible waiters"),
the timekeeping duty is assigned to the CPU that handles the tick
broadcast clock device by the time it is set in oneshot mode.

This is an issue in full dynticks mode where the timekeeping duty
must stay handled by the boot CPU for now. Otherwise it prevents
secondary CPUs from offlining and this breaks
suspend/shutdown/reboot/...

As it appears there is no reason for this timekeeping duty to be
moved to the broadcast CPU, and nothing prevents it from being
re-assigned later to another target, let's simply remove it.

Signed-off-by: Jiri Bohac 
Reported-by: Steven Rostedt 
Acked-by: Thomas Gleixner 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Ingo Molnar

tick: Cure broadcast false positive pending bit warning

2013-05-28T07:33:01+00:00

commit 26517f3e (tick: Avoid programming the local cpu timer if
broadcast pending) added a warning if the cpu enters broadcast mode
again while the pending bit is still set. Meelis reported that the
warning triggers. There are two corner cases which have not been
considered:

1) cpuidle calls clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
   twice. That can result in the following scenario

   CPU0                    CPU1
                           cpuidle_idle_call()
                             clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                               set cpu in tick_broadcast_oneshot_mask

   broadcast interrupt
     event expired for cpu1
     set pending bit

                             acpi_idle_enter_simple()
                               clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                 WARN_ON(pending bit)

  Move the WARN_ON into the section where we enter broadcast mode so
  it won't produce false positives on the second call.

2) safe_halt() enables interrupts, so a broadcast interrupt can be
   delivered before the broadcast mode is disabled. That sets the
   pending bit for the CPU which receives the broadcast
   interrupt. However, the interrupt is delivered right away from the
   broadcast handler, which leaves the pending bit stale.

   Clear the pending bit for the current cpu in the broadcast handler.
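
The second fix can be sketched as a userspace model of the broadcast
handler. The model_* names are hypothetical and a plain integer stands
in for the pending cpumask:

```c
#include <assert.h>
#include <stdint.h>

static uint32_t model_pending_mask;

/*
 * Broadcast handler running on this_cpu: mark every expired cpu pending,
 * then clear the bit of this_cpu, which handles its event right away.
 * Returns the mask of remote cpus that still need a wakeup.
 */
static uint32_t model_handle_broadcast(int this_cpu, uint32_t expired)
{
	assert(this_cpu >= 0 && this_cpu < 32);
	model_pending_mask |= expired;
	model_pending_mask &= ~(1u << this_cpu);
	return expired & ~(1u << this_cpu);
}
```

Only remote cpus keep their pending bit. The cpu that runs the handler
never carries a stale bit into its next idle entry.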

Reported-and-tested-by: Meelis Roos 
Cc: Len Brown 
Cc: Frederic Weisbecker 
Cc: Borislav Petkov 
Cc: Rafael J. Wysocki 
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305271841130.4220@ionos
Signed-off-by: Thomas Gleixner