Age         Commit message                                              Author
2011-11-14  ifdef out include of atomic.h  [v3.2-rc1-rt1]  (Clark Williams)
Signed-off-by: Clark Williams <williams@redhat.com>
2011-11-14  kconfig-preempt-rt-full.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  kconfig-disable-a-few-options-rt.patch  (Thomas Gleixner)
Disable stuff which is known to have issues on RT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  scsi-fcoe-rt-aware.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  x86-kvm-require-const-tsc-for-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  sysrq: Allow immediate Magic SysRq output for PREEMPT_RT_FULL  (Frank Rowand)
Add a CONFIG option to allow the output from Magic SysRq to be output immediately, even if this causes large latencies.

If PREEMPT_RT_FULL, printk() will not try to acquire the console lock when interrupts or preemption are disabled. If the console lock is not acquired, the printk() output will be buffered but will not be output immediately. Some drivers call into the Magic SysRq code with interrupts or preemption disabled, so if this option is not selected the Magic SysRq output will be buffered instead of printed immediately. Even with this option selected, Magic SysRq output will be delayed if the attempt to acquire the console lock fails.

Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Link: http://lkml.kernel.org/r/4E7CEF60.5020508@am.sony.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  ipc/sem: Rework semaphore wakeups  (Peter Zijlstra)
Current sysv sems have a weird ass wakeup scheme that involves keeping preemption disabled over a potential O(n^2) loop and busy waiting on that on other CPUs.

Kill this and simply wake the task directly from under the sem_lock.

This was discovered by a migrate_disable() debug feature that disallows:

    spin_lock();
    preempt_disable();
    spin_unlock();
    preempt_enable();

Cc: Manfred Spraul <manfred@colorfullife.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Manfred Spraul <manfred@colorfullife.com>
Link: http://lkml.kernel.org/r/1315994224.5040.1.camel@twins
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
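A minimal sketch of the direct wakeup described above, assuming illustrative names and dropping the error handling and reference-count subtleties of the real ipc/sem.c code:

    /* Called with sem_lock held: hand the result to the sleeper and wake it
     * directly instead of going through the old busy-wait handshake. */
    static void wake_up_sem_queue(struct sem_queue *q, int error)
    {
        struct task_struct *p = q->sleeper;

        get_task_struct(p);     /* keep the task alive across the wakeup */
        q->status = error;      /* publish the result to the sleeper */
        wake_up_process(p);
        put_task_struct(p);
    }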
2011-11-14  mm, rt: kmap_atomic scheduling  (Peter Zijlstra)
In fact, with migrate_disable() existing one could play games with kmap_atomic. You could save/restore the kmap_atomic slots on context switch (if there are any in use of course); this should be especially easy now that we have a kmap_atomic stack.

Something like the below.. it wants replacing all the preempt_disable() stuff with pagefault_disable() && migrate_disable() of course, but then you can flip kmaps around like below.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins
2011-11-14  add /sys/kernel/realtime entry  (Clark Williams)
Add a /sys/kernel entry to indicate that the kernel is a realtime kernel.

Clark says that he needs this for udev rules: udev needs to evaluate whether it is a PREEMPT_RT kernel a few thousand times, and parsing uname output is too slow or so.

Are there better solutions? Should it exist and return 0 on !-rt?

Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
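A minimal sketch of such an entry, assuming illustrative names; the attribute reads back 1 on an -rt kernel and could simply be absent (or return 0) otherwise:

    #include <linux/init.h>
    #include <linux/kobject.h>
    #include <linux/sysfs.h>

    static ssize_t realtime_show(struct kobject *kobj,
                                 struct kobj_attribute *attr, char *buf)
    {
        return sprintf(buf, "%d\n", 1);
    }

    static struct kobj_attribute realtime_attr = __ATTR_RO(realtime);

    static int __init rt_sysfs_init(void)
    {
        /* kernel_kobj is the kobject behind /sys/kernel */
        return sysfs_create_file(kernel_kobj, &realtime_attr.attr);
    }
    late_initcall(rt_sysfs_init);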
2011-11-14  kgdb/serial: Short term workaround  (Jason Wessel)
On 07/27/2011 04:37 PM, Thomas Gleixner wrote:
> - KGDB (not yet disabled) is reportedly unusable on -rt right now due
>   to missing hacks in the console locking which I dropped on purpose.

To work around this in the short term you can use this patch, in addition to the clocksource watchdog patch that Thomas brewed up.

Comments are welcome of course. Ultimately the right solution is to change the separation between the console and the HW to have a polled mode + work queue so as not to introduce any kind of latency.

Thanks,
Jason.
2011-11-14  ping-sysrq.patch  (Carsten Emde)
There are (probably rare) situations when a system has crashed and the system console becomes unresponsive but the network icmp layer is still alive. Wouldn't it be wonderful if we then could submit a sysrq command via ping?

This patch provides this facility. Please consult the updated documentation Documentation/sysrq.txt for details.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
2011-11-14  net: Avoid livelock in net_tx_action() on RT  (Steven Rostedt)
qdisc_lock is taken w/o disabling interrupts or bottom halves, so code holding a qdisc_lock() can be interrupted and softirqs can run on the return from interrupt in !RT.

The spin_trylock() in net_tx_action() makes sure that the softirq does not deadlock. When the lock can't be acquired q is requeued and the NET_TX softirq is raised. That causes the softirq to run over and over.

That works in mainline as do_softirq() has a retry loop limit and leaves the softirq processing in the interrupt return path and schedules ksoftirqd. The task which holds qdisc_lock cannot be preempted, so the lock is released and either ksoftirqd or the next softirq in the return from interrupt path can proceed. Though it's a bit strange to actually run MAX_SOFTIRQ_RESTART (10) loops before it decides to bail out even if it's clear in the first iteration :)

On RT all softirq processing is done in a FIFO thread and we don't have a loop limit, so ksoftirqd preempts the lock holder forever and unqueues and requeues until the reset button is hit.

Due to the forced threading of ksoftirqd on RT we actually cannot deadlock on qdisc_lock because it's a "sleeping lock". So it's safe to replace the spin_trylock() with a spin_lock(). When contended, ksoftirqd is scheduled out and the lock holder can proceed.

[ tglx: Massaged changelog and code comments ]

Solved-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Claudio R. Goncalves <lclaudio@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
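A minimal sketch of the resulting shape, assuming an illustrative helper name; on RT the plain spin_lock() may sleep, so the lock holder is never livelocked:

    /* !RT: must not spin in softirq context, so try and requeue on failure.
     *  RT: the qdisc root lock is a sleeping lock, so blocking on it is safe
     *      and the holder simply runs until it releases the lock. */
    #ifdef CONFIG_PREEMPT_RT_FULL
    static inline int qdisc_root_trylock(spinlock_t *root_lock)
    {
        spin_lock(root_lock);
        return 1;
    }
    #else
    static inline int qdisc_root_trylock(spinlock_t *root_lock)
    {
        return spin_trylock(root_lock);
    }
    #endif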
2011-11-14  mips-disable-highmem-on-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  ARM: at91: tclib: Default to tclib timer for RT  (Thomas Gleixner)
RT is not too happy about the shared timer interrupt in AT91 devices. Default to tclib timer for RT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  arm-disable-highmem-on-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  power-disable-highmem-on-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  power-use-generic-rwsem-on-rt  (Thomas Gleixner)
2011-11-14  console-make-rt-friendly.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  x86-no-perf-irq-work-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  skbufhead-raw-lock.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  jump-label-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  debugobjects-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  hotplug-stuff.patch  (Thomas Gleixner)
Do not take the lock for non-handled cases (might be atomic context).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  workqueue: Use get_cpu_light() in flush_gcwq()  (Yong Zhang)
BUG: sleeping function called from invalid context at kernel/rtmutex.c:645
in_atomic(): 1, irqs_disabled(): 0, pid: 1739, name: bash
Pid: 1739, comm: bash Not tainted 3.0.6-rt17-00284-gb76d419 #3
Call Trace:
 [<c06e3b5d>] ? printk+0x1d/0x20
 [<c01390b6>] __might_sleep+0xe6/0x110
 [<c06e633c>] rt_spin_lock+0x1c/0x30
 [<c01655a6>] flush_gcwq+0x236/0x320
 [<c021c651>] ? kfree+0xe1/0x1a0
 [<c05b7178>] ? __cpufreq_remove_dev+0xf8/0x260
 [<c0183fad>] ? rt_down_write+0xd/0x10
 [<c06cd91e>] workqueue_cpu_down_callback+0x26/0x2d
 [<c06e9d65>] notifier_call_chain+0x45/0x60
 [<c0171cfe>] __raw_notifier_call_chain+0x1e/0x30
 [<c014c9b4>] __cpu_notify+0x24/0x40
 [<c06cbc6f>] _cpu_down+0xdf/0x330
 [<c06cbef0>] cpu_down+0x30/0x50
 [<c06cd6b0>] store_online+0x50/0xa7
 [<c06cd660>] ? acpi_os_map_memory+0xec/0xec
 [<c04f2faa>] sysdev_store+0x2a/0x40
 [<c02887a4>] sysfs_write_file+0xa4/0x100
 [<c0229ab2>] vfs_write+0xa2/0x170
 [<c0288700>] ? sysfs_poll+0x90/0x90
 [<c0229d92>] sys_write+0x42/0x70
 [<c06ecedf>] sysenter_do_call+0x12/0x2d
CPU 1 is now offline
SMP alternatives: switching to UP code
SMP alternatives: switching to SMP code
Booting Node 0 Processor 1 APIC 0x1
smpboot cpu 1: start_ip = 9b000
Initializing CPU#1
BUG: sleeping function called from invalid context at kernel/rtmutex.c:645
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: kworker/0:0
Pid: 0, comm: kworker/0:0 Not tainted 3.0.6-rt17-00284-gb76d419 #3
Call Trace:
 [<c06e3b5d>] ? printk+0x1d/0x20
 [<c01390b6>] __might_sleep+0xe6/0x110
 [<c06e633c>] rt_spin_lock+0x1c/0x30
 [<c06cd85b>] workqueue_cpu_up_callback+0x56/0xf3
 [<c06e9d65>] notifier_call_chain+0x45/0x60
 [<c0171cfe>] __raw_notifier_call_chain+0x1e/0x30
 [<c014c9b4>] __cpu_notify+0x24/0x40
 [<c014c9ec>] cpu_notify+0x1c/0x20
 [<c06e1d43>] notify_cpu_starting+0x1e/0x20
 [<c06e0aad>] smp_callin+0xfb/0x10e
 [<c06e0ad9>] start_secondary+0x19/0xd7
NMI watchdog enabled, takes one hw-pmu counter.
Switched to NOHz mode on CPU #1

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Link: http://lkml.kernel.org/r/1318762607-2261-5-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
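A minimal sketch of the get_cpu_light() pattern the fix uses; the workqueue internals around it are only indicated, and the function name is illustrative:

    /* get_cpu_light() pins the task via migrate_disable() instead of
     * disabling preemption, so the sleeping spinlocks taken below are
     * legal on RT. */
    static void flush_gcwq_on_this_cpu(void)
    {
        int cpu = get_cpu_light();  /* migrate_disable() + smp_processor_id() */

        /* ... lock the per-cpu gcwq for 'cpu' (an rt_mutex-backed lock on RT)
         * and push its pending works away, as flush_gcwq() does ... */

        put_cpu_light();            /* migrate_enable() */
        (void)cpu;                  /* used by the elided locking in the real code */
    }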
2011-11-14  workqueue: Fix PF_THREAD_BOUND abuse  (Peter Zijlstra)
PF_THREAD_BOUND is set by kthread_bind() and means the thread is bound to a particular cpu for correctness. The workqueue code abuses this flag and blindly sets it for all created threads, including those that are free to migrate.

Restore the original semantics now that the worst abuses in the cpu-hotplug path are gone. The only icky bit is the rescue thread for per-cpu workqueues; this cannot use kthread_bind() but will use set_cpus_allowed_ptr() to migrate itself to the desired cpu. Set and clear PF_THREAD_BOUND manually here.

XXX: I think worker_maybe_bind_and_lock()/worker_unbind_and_unlock() should also do a get_online_cpus(), this would likely allow us to remove the while loop.

XXX: should probably repurpose GCWQ_DISASSOCIATED to warn on adding works after CPU_DOWN_PREPARE -- its dual use to mark unbound gcwqs is a tad annoying though.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  workqueue: Fix cpuhotplug trainwreck  (Peter Zijlstra)
The current workqueue code does crazy stuff on cpu unplug: it relies on forced affinity breakage, thereby violating per-cpu expectations. Worse, it tries to re-attach to a cpu if the thing comes up again before all previously queued works are finished. This breaks (admittedly bonkers) cpu-hotplug use that relies on a down-up cycle to push all usage away.

Introduce a new WQ_NON_AFFINE flag that indicates a per-cpu workqueue will not respect cpu affinity and use this to migrate all its pending works to whatever cpu is doing cpu-down.

This also adds a warning for queue_on_cpu() users which warns when it is used on WQ_NON_AFFINE workqueues, since the API implies you care about what cpu things are run on while such workqueues cannot guarantee this.

For the rest, simply flush all per-cpu works and don't mess about. This also means that currently all workqueues that are manually flushing things on cpu-down in order to provide the per-cpu guarantee no longer need to do so.

In short, we tell the WQ what we want it to do, provide validation for this and lose ~250 lines of code.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  mm-vmalloc.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  epoll.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  workqueue-use-get-cpu-light.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  x86: Disable IST stacks for debug/int 3/stack fault for PREEMPT_RT  (Andi Kleen)
Normally the x86-64 trap handlers for debug/int 3/stack fault run on a special interrupt stack to make them more robust when dealing with kernel code.

The PREEMPT_RT kernel can sleep in locks even while allocating GFP_ATOMIC memory. When one of these trap handlers needs to send real time signals for ptrace it allocates memory and could then try to schedule. But it is not allowed to schedule on an IST stack. This can cause warnings and hangs.

This patch disables the IST stacks for these handlers for PREEMPT_RT kernels. Instead let them run on the normal process stack.

The kernel only really needs the ISTs here to make kernel debuggers more robust in case someone sets a break point somewhere where the stack is invalid. But there are no kernel debuggers in the standard kernel that do this. It also means kprobes cannot be set in situations with invalid stack; but that sounds like a reasonable restriction.

The stack fault change could minimally impact oops quality, but not very much because stack faults are fairly rare. A better solution would be to use similar logic as the NMI "paranoid" path: check if the signal is for user space, if yes go back to entry.S, switch stack, call sync_regs, then do the signal sending etc. But this patch is much simpler and should work too with minimal impact.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
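A minimal sketch of the approach, assuming illustrative macro values (the real change also renumbers the remaining exception stacks); an IST index of 0 means the vector runs on the current kernel stack:

    /* On PREEMPT_RT_FULL, do not give debug/int3/stack-fault a dedicated
     * IST stack: index 0 selects the normal kernel stack. */
    #ifdef CONFIG_PREEMPT_RT_FULL
    # define STACKFAULT_STACK 0
    # define DEBUG_STACK      0
    #else
    # define STACKFAULT_STACK 1
    # define DEBUG_STACK      4
    #endif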
2011-11-14  x86: Use generic rwsem_spinlocks on -rt  (Thomas Gleixner)
Simplifies the separation of anon_rw_semaphores and rw_semaphores for -rt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  x86: stackprotector: Avoid random pool on rt  (Thomas Gleixner)
CPU bringup calls into the random pool to initialize the stack canary. During boot that works nicely even on RT as the might-sleep checks are disabled. During CPU hotplug the might-sleep checks trigger. Making the locks in random raw is a major PITA, so avoiding the call on RT is the only sensible solution. This is basically the same randomness which we get during boot where the random pool has no entropy and we rely on the TSC randomness.

Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
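A minimal sketch of the idea, assuming the surrounding canary setup looks roughly like mainline's boot_init_stack_canary() (illustrative, not the literal patch): skip the random pool on RT and rely on the TSC-derived value.

    static __always_inline void boot_init_stack_canary(void)
    {
        u64 canary = 0;
        u64 tsc;

    #ifndef CONFIG_PREEMPT_RT_FULL
        /* The random pool takes sleeping locks on RT and CPU bringup is
         * atomic, so only ask for real entropy on !RT kernels. */
        get_random_bytes(&canary, sizeof(canary));
    #endif
        /* Boot-time fallback either way: mix in the TSC. */
        tsc = native_read_tsc();
        canary += tsc + (tsc << 32UL);

        current->stack_canary = canary;
        /* ... also store the canary in the per-CPU GS-based slot ... */
    }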
2011-11-14  x86: Convert mce timer to hrtimer  (Thomas Gleixner)
mce_timer is started in atomic contexts of cpu bringup. This results in might_sleep() warnings on RT. Convert mce_timer to a hrtimer to avoid this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
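A minimal sketch of the conversion, assuming an illustrative setup-function name and a polling interval in seconds; an hrtimer can be armed from the atomic bringup path without tripping the RT sleeping-lock checks that mod_timer() hits there.

    static int check_interval = 5 * 60;     /* seconds, as in mce.c */
    static struct hrtimer mce_timer;

    static enum hrtimer_restart mce_timer_fn(struct hrtimer *timer)
    {
        /* ... poll the machine-check banks as the old timer callback did ... */
        hrtimer_forward_now(timer, ktime_set(check_interval, 0));
        return HRTIMER_RESTART;
    }

    static void mce_setup_timer(void)
    {
        hrtimer_init(&mce_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
        mce_timer.function = mce_timer_fn;
        hrtimer_start(&mce_timer, ktime_set(check_interval, 0),
                      HRTIMER_MODE_REL);
    }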
2011-11-14  fs: ntfs: disable interrupt only on !RT  (Mike Galbraith)
On Sat, 2007-10-27 at 11:44 +0200, Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> > > [10138.175796] [<c0105de3>] show_trace+0x12/0x14
> > > [10138.180291] [<c0105dfb>] dump_stack+0x16/0x18
> > > [10138.184769] [<c011609f>] native_smp_call_function_mask+0x138/0x13d
> > > [10138.191117] [<c0117606>] smp_call_function+0x1e/0x24
> > > [10138.196210] [<c012f85c>] on_each_cpu+0x25/0x50
> > > [10138.200807] [<c0115c74>] flush_tlb_all+0x1e/0x20
> > > [10138.205553] [<c016caaf>] kmap_high+0x1b6/0x417
> > > [10138.210118] [<c011ec88>] kmap+0x4d/0x4f
> > > [10138.214102] [<c026a9d8>] ntfs_end_buffer_async_read+0x228/0x2f9
> > > [10138.220163] [<c01a0e9e>] end_bio_bh_io_sync+0x26/0x3f
> > > [10138.225352] [<c01a2b09>] bio_endio+0x42/0x6d
> > > [10138.229769] [<c02c2a08>] __end_that_request_first+0x115/0x4ac
> > > [10138.235682] [<c02c2da7>] end_that_request_chunk+0x8/0xa
> > > [10138.241052] [<c0365943>] ide_end_request+0x55/0x10a
> > > [10138.246058] [<c036dae3>] ide_dma_intr+0x6f/0xac
> > > [10138.250727] [<c0366d83>] ide_intr+0x93/0x1e0
> > > [10138.255125] [<c015afb4>] handle_IRQ_event+0x5c/0xc9
> >
> > Looks like ntfs is kmap()ing from interrupt context. Should be using
> > kmap_atomic instead, I think.
>
> it's not atomic interrupt context but irq thread context - and -rt
> remaps kmap_atomic() to kmap() internally.

Hm. Looking at the change to mm/bounce.c, perhaps I should do this instead?

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
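A minimal sketch of the resulting shape in the ntfs read-completion path (condensed; the RT tree expresses this with local_irq_save_nort()-style helpers, and the function name here is illustrative):

    static void ntfs_end_read_serialize(struct buffer_head *first)
    {
        unsigned long flags = 0;

    #ifndef CONFIG_PREEMPT_RT_FULL
        /* Only a real hard-irq completion needs interrupts off here;
         * on RT this runs in the threaded irq handler. */
        local_irq_save(flags);
    #endif
        bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
        /* ... per-page buffer bookkeeping ... */
        bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
    #ifndef CONFIG_PREEMPT_RT_FULL
        local_irq_restore(flags);
    #endif
        (void)flags;    /* silence unused warning when RT compiles the irq fiddling out */
    }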
2011-11-14  fs-block-rt-support.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  mm-protect-activate-switch-mm.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  fs: namespace preemption fix  (Thomas Gleixner)
On RT we cannot loop with preemption disabled here as mnt_make_readonly() might have been preempted. We can safely enable preemption while waiting for MNT_WRITE_HOLD to be cleared. Safe on !RT as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
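A minimal sketch of the wait-loop shape described above (field names as in the mainline mnt_want_write() path of that era; illustrative rather than the literal patch):

    /* Called with preemption disabled. Spin until the holder clears
     * MNT_WRITE_HOLD, but drop preemption around cpu_relax() so a preempted
     * mnt_make_readonly() caller can make progress. Safe on !RT as well. */
    static void wait_for_mnt_write_hold(struct vfsmount *mnt)
    {
        while (ACCESS_ONCE(mnt->mnt_flags) & MNT_WRITE_HOLD) {
            preempt_enable();
            cpu_relax();
            preempt_disable();
        }
    }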
2011-11-14  drivers-tty-fix-omap-lock-crap.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  serial: 8250: Call flush_to_ldisc when the irq is threaded  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14  serial: 8250: Clean up the locking for -rt  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  lglocks-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  rcu: Make ksoftirqd do RCU quiescent states  (Paul E. McKenney)
Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable to network-based denial-of-service attacks. This patch therefore makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq() is running in ksoftirqd context. A wrapper layer is interposed so that other calls to __do_softirq() avoid invoking rcu_bh_qs(). The underlying function __do_softirq_common() does the actual work.

The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is that there might be a local_bh_enable() inside an RCU-preempt read-side critical section. This local_bh_enable() can invoke __do_softirq() directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be an illegal RCU-preempt quiescent state in the middle of an RCU-preempt read-side critical section. Therefore, quiescent states can only happen in cases where __do_softirq() is invoked directly from ksoftirqd.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
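A minimal sketch of the wrapper layering described above; helper names other than __do_softirq_common() are illustrative:

    static void __do_softirq_common(int need_rcu_bh_qs)
    {
        if (need_rcu_bh_qs)
            rcu_bh_qs(smp_processor_id());

        /* ... run the pending softirq handlers ... */
    }

    void __do_softirq(void)
    {
        /* Callers such as local_bh_enable() may sit inside an RCU-preempt
         * read-side critical section, so no quiescent state here. */
        __do_softirq_common(0);
    }

    static void ksoftirqd_do_softirq(void)
    {
        /* Only the ksoftirqd thread is a safe place to report the QS. */
        __do_softirq_common(1);
    }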
2011-11-14  rcu-more-fallout.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  rcu: Fix macro substitution for synchronize_rcu_bh() on RT  (John Kacur)
kernel/rcutorture.c:492: error: ‘synchronize_rcu_bh’ undeclared here (not in a function)

The macro substitution for CONFIG_PREEMPT_RT_FULL doesn't cover function pointer assignment unless we remove the brackets.

Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/1321235083-21756-1-git-send-email-jkacur@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
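A minimal sketch of the point: a function-like macro only expands at a call site, so a bare synchronize_rcu_bh used as a function pointer (as rcutorture's ".sync = synchronize_rcu_bh" does) stays undeclared; an object-like alias covers both uses. Exact definitions are illustrative:

    #ifdef CONFIG_PREEMPT_RT_FULL
    /* Before: only expands when written as synchronize_rcu_bh() */
    /* #define synchronize_rcu_bh()  synchronize_rcu() */

    /* After: plain token substitution also works in pointer assignments */
    # define synchronize_rcu_bh      synchronize_rcu
    #endif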
2011-11-14  rcu: Merge RCU-bh into RCU-preempt  (Thomas Gleixner)
The Linux kernel has long RCU-bh read-side critical sections that intolerably increase scheduling latency under mainline's RCU-bh rules, which include RCU-bh read-side critical sections being non-preemptible. This patch therefore arranges for RCU-bh to be implemented in terms of RCU-preempt for CONFIG_PREEMPT_RT_FULL=y.

This has the downside of defeating the purpose of RCU-bh, namely, handling the case where the system is subjected to a network-based denial-of-service attack that keeps at least one CPU doing full-time softirq processing. This issue will be fixed by a later commit.

The current commit will need some work to make it appropriate for mainline use, for example, it needs to be extended to cover Tiny RCU.

[ paulmck: Added a useful changelog ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
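A minimal sketch of the mapping for CONFIG_PREEMPT_RT_FULL=y (illustrative; the real change also redirects call_rcu_bh(), rcu_barrier_bh() and friends):

    #ifdef CONFIG_PREEMPT_RT_FULL
    static inline void rcu_read_lock_bh(void)
    {
        /* The BH-disabled region is still wanted, but the read-side critical
         * section itself becomes the preemptible RCU-preempt flavour. */
        local_bh_disable();
        rcu_read_lock();
    }

    static inline void rcu_read_unlock_bh(void)
    {
        rcu_read_unlock();
        local_bh_enable();
    }

    # define synchronize_rcu_bh    synchronize_rcu
    #endif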
2011-11-14  rcu: Frob softirq test  (Peter Zijlstra)
With RT_FULL we get the below wreckage:

[ 126.060484] =======================================================
[ 126.060486] [ INFO: possible circular locking dependency detected ]
[ 126.060489] 3.0.1-rt10+ #30
[ 126.060490] -------------------------------------------------------
[ 126.060492] irq/24-eth0/1235 is trying to acquire lock:
[ 126.060495]  (&(lock)->wait_lock#2){+.+...}, at: [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[ 126.060503]
[ 126.060504] but task is already holding lock:
[ 126.060506]  (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429
[ 126.060511]
[ 126.060511] which lock already depends on the new lock.
[ 126.060513]
[ 126.060514]
[ 126.060514] the existing dependency chain (in reverse order) is:
[ 126.060516]
[ 126.060516] -> #1 (&p->pi_lock){-...-.}:
[ 126.060519]        [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[ 126.060524]        [<ffffffff8150291e>] _raw_spin_lock_irqsave+0x4b/0x85
[ 126.060527]        [<ffffffff810b5aa4>] task_blocks_on_rt_mutex+0x36/0x20f
[ 126.060531]        [<ffffffff815019bb>] rt_mutex_slowlock+0xd1/0x15a
[ 126.060534]        [<ffffffff81501ae3>] rt_mutex_lock+0x2d/0x2f
[ 126.060537]        [<ffffffff810d9020>] rcu_boost+0xad/0xde
[ 126.060541]        [<ffffffff810d90ce>] rcu_boost_kthread+0x7d/0x9b
[ 126.060544]        [<ffffffff8109a760>] kthread+0x99/0xa1
[ 126.060547]        [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[ 126.060551]
[ 126.060552] -> #0 (&(lock)->wait_lock#2){+.+...}:
[ 126.060555]        [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816
[ 126.060558]        [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[ 126.060561]        [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73
[ 126.060564]        [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[ 126.060566]        [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29
[ 126.060569]        [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4
[ 126.060573]        [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89
[ 126.060576]        [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5
[ 126.060580]        [<ffffffff8107511c>] try_to_wake_up+0x175/0x429
[ 126.060583]        [<ffffffff81075425>] wake_up_process+0x15/0x17
[ 126.060585]        [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26
[ 126.060590]        [<ffffffff81081df9>] irq_exit+0x49/0x55
[ 126.060593]        [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98
[ 126.060597]        [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20
[ 126.060600]        [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44
[ 126.060603]        [<ffffffff810d582c>] irq_thread+0xde/0x1af
[ 126.060606]        [<ffffffff8109a760>] kthread+0x99/0xa1
[ 126.060608]        [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[ 126.060611]
[ 126.060612] other info that might help us debug this:
[ 126.060614]
[ 126.060615]  Possible unsafe locking scenario:
[ 126.060616]
[ 126.060617]        CPU0                    CPU1
[ 126.060619]        ----                    ----
[ 126.060620]   lock(&p->pi_lock);
[ 126.060623]                                lock(&(lock)->wait_lock);
[ 126.060625]                                lock(&p->pi_lock);
[ 126.060627]   lock(&(lock)->wait_lock);
[ 126.060629]
[ 126.060629]  *** DEADLOCK ***
[ 126.060630]
[ 126.060632] 1 lock held by irq/24-eth0/1235:
[ 126.060633]  #0:  (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429
[ 126.060638]
[ 126.060638] stack backtrace:
[ 126.060641] Pid: 1235, comm: irq/24-eth0 Not tainted 3.0.1-rt10+ #30
[ 126.060643] Call Trace:
[ 126.060644]  <IRQ>  [<ffffffff810acbde>] print_circular_bug+0x289/0x29a
[ 126.060651]  [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816
[ 126.060655]  [<ffffffff810ab3aa>] ? trace_hardirqs_off_caller+0x1f/0x99
[ 126.060658]  [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[ 126.060661]  [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[ 126.060664]  [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[ 126.060668]  [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73
[ 126.060671]  [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[ 126.060674]  [<ffffffff810d9655>] ? rcu_report_qs_rsp+0x87/0x8c
[ 126.060677]  [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[ 126.060680]  [<ffffffff810d9ea3>] ? rcu_read_unlock_special+0x9b/0x1c4
[ 126.060683]  [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29
[ 126.060687]  [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4
[ 126.060690]  [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89
[ 126.060693]  [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5
[ 126.060696]  [<ffffffff810683da>] ? select_task_rq_rt+0x27/0xd5
[ 126.060701]  [<ffffffff810a852a>] ? clockevents_program_event+0x8e/0x90
[ 126.060704]  [<ffffffff8107511c>] try_to_wake_up+0x175/0x429
[ 126.060708]  [<ffffffff810a95dc>] ? tick_program_event+0x1f/0x21
[ 126.060711]  [<ffffffff81075425>] wake_up_process+0x15/0x17
[ 126.060715]  [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26
[ 126.060718]  [<ffffffff81081df9>] irq_exit+0x49/0x55
[ 126.060721]  [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98
[ 126.060724]  [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20
[ 126.060726]  <EOI>  [<ffffffff81072855>] ? migrate_disable+0x75/0x12d
[ 126.060733]  [<ffffffff81080a61>] ? local_bh_disable+0xe/0x1f
[ 126.060736]  [<ffffffff81080a70>] ? local_bh_disable+0x1d/0x1f
[ 126.060739]  [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44
[ 126.060742]  [<ffffffff81502ac0>] ? _raw_spin_unlock_irq+0x3b/0x59
[ 126.060745]  [<ffffffff810d582c>] irq_thread+0xde/0x1af
[ 126.060748]  [<ffffffff810d5937>] ? irq_thread_fn+0x3a/0x3a
[ 126.060751]  [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1
[ 126.060754]  [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1
[ 126.060757]  [<ffffffff8109a760>] kthread+0x99/0xa1
[ 126.060761]  [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[ 126.060764]  [<ffffffff81069ed7>] ? finish_task_switch+0x87/0x10a
[ 126.060768]  [<ffffffff81502ec4>] ? retint_restore_args+0xe/0xe
[ 126.060771]  [<ffffffff8109a6c7>] ? __init_kthread_worker+0x8c/0x8c
[ 126.060774]  [<ffffffff81509b10>] ? gs_change+0xb/0xb

Because irq_exit() does:

    void irq_exit(void)
    {
        account_system_vtime(current);
        trace_hardirq_exit();
        sub_preempt_count(IRQ_EXIT_OFFSET);
        if (!in_interrupt() && local_softirq_pending())
            invoke_softirq();
        ...
    }

which triggers a wakeup, which uses RCU. Now if the interrupted task has t->rcu_read_unlock_special set, the rcu usage from the wakeup will end up in rcu_read_unlock_special(). rcu_read_unlock_special() will test for in_irq(), which will fail as we just decremented preempt_count with IRQ_EXIT_OFFSET, and in_serving_softirq(), which for PREEMPT_RT_FULL reads:

    int in_serving_softirq(void)
    {
        int res;

        preempt_disable();
        res = __get_cpu_var(local_softirq_runner) == current;
        preempt_enable();
        return res;
    }

which will thus also fail, resulting in the above wreckage.

The 'somewhat' ugly solution is to open-code the preempt_count() test in rcu_read_unlock_special().

Also, we're not at all sure how ->rcu_read_unlock_special gets set here... so this is very likely a bandaid and more thought is required.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2011-11-14  RCU: Force PREEMPT_RCU for PREEMPT-RT  (Ingo Molnar)
PREEMPT_RT relies on PREEMPT_RCU - only allow RCU to be configured interactively in the !PREEMPT_RT case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-j1y0phicu6s6pu8guku2vca0@git.kernel.org
2011-11-14  timer-handle-idle-trylock-in-get-next-timer-irq.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-14  rwlocks: Fix section mismatch  (John Kacur)
This fixes the following build error for the preempt-rt kernel.

make kernel/fork.o
  CC      kernel/fork.o
kernel/fork.c:90: error: section of 'tasklist_lock' conflicts with previous declaration
make[2]: *** [kernel/fork.o] Error 1
make[1]: *** [kernel/fork.o] Error 2

The rt kernel cache aligns the RWLOCK in DEFINE_RWLOCK by default. The non-rt kernels explicitly cache align only the tasklist_lock in kernel/fork.c. That can create a build conflict. This fixes the build problem by making the non-rt kernels cache align RWLOCKs by default. The side effect is that the other RWLOCKs are also cache aligned for non-rt.

This is a short term solution for rt only. The longer term solution would be to push the cache aligned DEFINE_RWLOCK to mainline. If there are objections, then we could create a DEFINE_RWLOCK_CACHE_ALIGNED or something of that nature.

Comments? Objections?

Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.LFD.2.00.1109191104010.23118@localhost6.localdomain6
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
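A minimal sketch of the described change (illustrative; the point is only that the generic definition gains the same alignment the tasklist_lock declaration already requests):

    /* Cache-align every statically defined rwlock so that the explicitly
     * aligned tasklist_lock in kernel/fork.c no longer lands in a
     * conflicting section. */
    #define DEFINE_RWLOCK(name) \
        rwlock_t name __cacheline_aligned_in_smp = __RW_LOCK_UNLOCKED(name)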
2011-11-14  rt: Add the preempt-rt lock replacement APIs  (Thomas Gleixner)
Map spinlocks, rwlocks, rw_semaphores and semaphores to the rt_mutex based locking functions for preempt-rt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
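A minimal sketch of the substitution model (illustrative only; the real API is spread over the rt locking headers and additionally handles lockdep, nesting and the raw_spinlock_t split):

    /* On preempt-rt a spinlock_t becomes a "sleeping spinlock" built on an
     * rt_mutex; migrate_disable() keeps per-CPU assumptions intact while the
     * task may still be preempted. */
    typedef struct spinlock {
        struct rt_mutex lock;
    } spinlock_t;

    static inline void spin_lock(spinlock_t *lock)
    {
        migrate_disable();
        rt_mutex_lock(&lock->lock);
    }

    static inline void spin_unlock(spinlock_t *lock)
    {
        rt_mutex_unlock(&lock->lock);
        migrate_enable();
    }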