summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2012-02-15kconfig-preempt-rt-full.patchv3.2.6-rt13Thomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15wait-simple: Simple waitqueue implementationThomas Gleixner
wait_queue is a swiss army knife and in most of the cases the complexity is not needed. For RT waitqueues are a constant source of trouble as we can't convert the head lock to a raw spinlock due to fancy and long lasting callbacks. Provide a slim version, which allows RT to replace wait queues. This should go mainline as well, as it lowers memory consumption and runtime overhead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15sysrq: Allow immediate Magic SysRq output for PREEMPT_RT_FULLFrank Rowand
Add a CONFIG option to allow the output from Magic SysRq to be output immediately, even if this causes large latencies. If PREEMPT_RT_FULL, printk() will not try to acquire the console lock when interrupts or preemption are disabled. If the console lock is not acquired the printk() output will be buffered, but will not be output immediately. Some drivers call into the Magic SysRq code with interrupts or preemption disabled, so the output of Magic SysRq will be buffered instead of printing immediately if this option is not selected. Even with this option selected, Magic SysRq output will be delayed if the attempt to acquire the console lock fails. Signed-off-by: Frank Rowand <frank.rowand@am.sony.com> Link: http://lkml.kernel.org/r/4E7CEF60.5020508@am.sony.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15add /sys/kernel/realtime entryClark Williams
Add a /sys/kernel entry to indicate that the kernel is a realtime kernel. Clark says that he needs this for udev rules, udev needs to evaluate if its a PREEMPT_RT kernel a few thousand times and parsing uname output is too slow or so. Are there better solutions? Should it exist and return 0 on !-rt? Signed-off-by: Clark Williams <williams@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2012-02-15kgdb/serial: Short term workaroundJason Wessel
On 07/27/2011 04:37 PM, Thomas Gleixner wrote: > - KGDB (not yet disabled) is reportedly unusable on -rt right now due > to missing hacks in the console locking which I dropped on purpose. > To work around this in the short term you can use this patch, in addition to the clocksource watchdog patch that Thomas brewed up. Comments are welcome of course. Ultimately the right solution is to change separation between the console and the HW to have a polled mode + work queue so as not to introduce any kind of latency. Thanks, Jason.
2012-02-15printk: Disable migration instead of preemptionRichard Weinberger
There is no need do disable preemption in vprintk(), disable_migrate() is sufficient. This fixes the following bug in -rt: [ 14.759233] BUG: sleeping function called from invalid context at /home/rw/linux-rt/kernel/rtmutex.c:645 [ 14.759235] in_atomic(): 1, irqs_disabled(): 0, pid: 547, name: bash [ 14.759244] Pid: 547, comm: bash Not tainted 3.0.12-rt29+ #3 [ 14.759246] Call Trace: [ 14.759301] [<ffffffff8106fade>] __might_sleep+0xeb/0xf0 [ 14.759318] [<ffffffff810ad784>] rt_spin_lock_fastlock.constprop.9+0x21/0x43 [ 14.759336] [<ffffffff8161fef0>] rt_spin_lock+0xe/0x10 [ 14.759354] [<ffffffff81347ad1>] serial8250_console_write+0x81/0x121 [ 14.759366] [<ffffffff8107ecd3>] __call_console_drivers+0x7c/0x93 [ 14.759369] [<ffffffff8107ef31>] _call_console_drivers+0x5c/0x60 [ 14.759372] [<ffffffff8107f7e5>] console_unlock+0x147/0x1a2 [ 14.759374] [<ffffffff8107fd33>] vprintk+0x3ea/0x462 [ 14.759383] [<ffffffff816160e0>] printk+0x51/0x53 [ 14.759399] [<ffffffff811974e4>] ? proc_reg_poll+0x9a/0x9a [ 14.759403] [<ffffffff81335b42>] __handle_sysrq+0x50/0x14d [ 14.759406] [<ffffffff81335c8a>] write_sysrq_trigger+0x4b/0x53 [ 14.759408] [<ffffffff81335c3f>] ? __handle_sysrq+0x14d/0x14d [ 14.759410] [<ffffffff81197583>] proc_reg_write+0x9f/0xbe [ 14.759426] [<ffffffff811497ec>] vfs_write+0xac/0xf3 [ 14.759429] [<ffffffff8114a9b3>] ? fget_light+0x3a/0x9b [ 14.759431] [<ffffffff811499db>] sys_write+0x4a/0x6e [ 14.759438] [<ffffffff81625d52>] system_call_fastpath+0x16/0x1b Signed-off-by: Richard Weinberger <rw@linutronix.de> Link: http://lkml.kernel.org/r/1323696956-11445-1-git-send-email-rw@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15console-make-rt-friendly.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15x86-no-perf-irq-work-rt.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15hotplug-stuff.patchThomas Gleixner
Do not take lock for non handled cases (might be atomic context) Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15workqueue: Use get_cpu_light() in flush_gcwq()Yong Zhang
BUG: sleeping function called from invalid context at kernel/rtmutex.c:645 in_atomic(): 1, irqs_disabled(): 0, pid: 1739, name: bash Pid: 1739, comm: bash Not tainted 3.0.6-rt17-00284-gb76d419 #3 Call Trace: [<c06e3b5d>] ? printk+0x1d/0x20 [<c01390b6>] __might_sleep+0xe6/0x110 [<c06e633c>] rt_spin_lock+0x1c/0x30 [<c01655a6>] flush_gcwq+0x236/0x320 [<c021c651>] ? kfree+0xe1/0x1a0 [<c05b7178>] ? __cpufreq_remove_dev+0xf8/0x260 [<c0183fad>] ? rt_down_write+0xd/0x10 [<c06cd91e>] workqueue_cpu_down_callback+0x26/0x2d [<c06e9d65>] notifier_call_chain+0x45/0x60 [<c0171cfe>] __raw_notifier_call_chain+0x1e/0x30 [<c014c9b4>] __cpu_notify+0x24/0x40 [<c06cbc6f>] _cpu_down+0xdf/0x330 [<c06cbef0>] cpu_down+0x30/0x50 [<c06cd6b0>] store_online+0x50/0xa7 [<c06cd660>] ? acpi_os_map_memory+0xec/0xec [<c04f2faa>] sysdev_store+0x2a/0x40 [<c02887a4>] sysfs_write_file+0xa4/0x100 [<c0229ab2>] vfs_write+0xa2/0x170 [<c0288700>] ? sysfs_poll+0x90/0x90 [<c0229d92>] sys_write+0x42/0x70 [<c06ecedf>] sysenter_do_call+0x12/0x2d CPU 1 is now offline SMP alternatives: switching to UP code SMP alternatives: switching to SMP code Booting Node 0 Processor 1 APIC 0x1 smpboot cpu 1: start_ip = 9b000 Initializing CPU#1 BUG: sleeping function called from invalid context at kernel/rtmutex.c:645 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: kworker/0:0 Pid: 0, comm: kworker/0:0 Not tainted 3.0.6-rt17-00284-gb76d419 #3 Call Trace: [<c06e3b5d>] ? printk+0x1d/0x20 [<c01390b6>] __might_sleep+0xe6/0x110 [<c06e633c>] rt_spin_lock+0x1c/0x30 [<c06cd85b>] workqueue_cpu_up_callback+0x56/0xf3 [<c06e9d65>] notifier_call_chain+0x45/0x60 [<c0171cfe>] __raw_notifier_call_chain+0x1e/0x30 [<c014c9b4>] __cpu_notify+0x24/0x40 [<c014c9ec>] cpu_notify+0x1c/0x20 [<c06e1d43>] notify_cpu_starting+0x1e/0x20 [<c06e0aad>] smp_callin+0xfb/0x10e [<c06e0ad9>] start_secondary+0x19/0xd7 NMI watchdog enabled, takes one hw-pmu counter. Switched to NOHz mode on CPU #1 Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Link: http://lkml.kernel.org/r/1318762607-2261-5-git-send-email-yong.zhang0@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15workqueue: Fix PF_THREAD_BOUND abusePeter Zijlstra
PF_THREAD_BOUND is set by kthread_bind() and means the thread is bound to a particular cpu for correctness. The workqueue code abuses this flag and blindly sets it for all created threads, including those that are free to migrate. Restore the original semantics now that the worst abuse in the cpu-hotplug path are gone. The only icky bit is the rescue thread for per-cpu workqueues, this cannot use kthread_bind() but will use set_cpus_allowed_ptr() to migrate itself to the desired cpu. Set and clear PF_THREAD_BOUND manually here. XXX: I think worker_maybe_bind_and_lock()/worker_unbind_and_unlock() should also do a get_online_cpus(), this would likely allow us to remove the while loop. XXX: should probably repurpose GCWQ_DISASSOCIATED to warn on adding works after CPU_DOWN_PREPARE -- its dual use to mark unbound gcwqs is a tad annoying though. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15workqueue: Fix cpuhotplug trainwreckPeter Zijlstra
The current workqueue code does crazy stuff on cpu unplug, it relies on forced affine breakage, thereby violating per-cpu expectations. Worse, it tries to re-attach to a cpu if the thing comes up again before all previously queued works are finished. This breaks (admittedly bonkers) cpu-hotplug use that relies on a down-up cycle to push all usage away. Introduce a new WQ_NON_AFFINE flag that indicates a per-cpu workqueue will not respect cpu affinity and use this to migrate all its pending works to whatever cpu is doing cpu-down. This also adds a warning for queue_on_cpu() users which warns when its used on WQ_NON_AFFINE workqueues for the API implies you care about what cpu things are ran on when such workqueues cannot guarantee this. For the rest, simply flush all per-cpu works and don't mess about. This also means that currently all workqueues that are manually flushing things on cpu-down in order to provide the per-cpu guarantee no longer need to do so. In short, we tell the WQ what we want it to do, provide validation for this and loose ~250 lines of code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15workqueue-use-get-cpu-light.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15rt/rcutree: Move misplaced prototypeIngo Molnar
Fix this warning on x86 defconfig: kernel/rcutree.h:433:13: warning: ‘rcu_preempt_qs’ declared ‘static’ but never defined [-Wunused-function] The #ifdefs and prototypes here are a maze, move it closer to the usage site that needs it. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15rcu: Make ksoftirqd do RCU quiescent statesPaul E. McKenney
Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable to network-based denial-of-service attacks. This patch therefore makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq() is running in ksoftirqd context. A wrapper layer in interposed so that other calls to __do_softirq() avoid invoking rcu_bh_qs(). The underlying function __do_softirq_common() does the actual work. The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is that there might be a local_bh_enable() inside an RCU-preempt read-side critical section. This local_bh_enable() can invoke __do_softirq() directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be an illegal RCU-preempt quiescent state in the middle of an RCU-preempt read-side critical section. Therefore, quiescent states can only happen in cases where __do_softirq() is invoked directly from ksoftirqd. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15rcu-more-fallout.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15rcu: Merge RCU-bh into RCU-preemptThomas Gleixner
The Linux kernel has long RCU-bh read-side critical sections that intolerably increase scheduling latency under mainline's RCU-bh rules, which include RCU-bh read-side critical sections being non-preemptible. This patch therefore arranges for RCU-bh to be implemented in terms of RCU-preempt for CONFIG_PREEMPT_RT_FULL=y. This has the downside of defeating the purpose of RCU-bh, namely, handling the case where the system is subjected to a network-based denial-of-service attack that keeps at least one CPU doing full-time softirq processing. This issue will be fixed by a later commit. The current commit will need some work to make it appropriate for mainline use, for example, it needs to be extended to cover Tiny RCU. [ paulmck: Added a useful changelog ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15rcu: Frob softirq testPeter Zijlstra
With RT_FULL we get the below wreckage: [ 126.060484] ======================================================= [ 126.060486] [ INFO: possible circular locking dependency detected ] [ 126.060489] 3.0.1-rt10+ #30 [ 126.060490] ------------------------------------------------------- [ 126.060492] irq/24-eth0/1235 is trying to acquire lock: [ 126.060495] (&(lock)->wait_lock#2){+.+...}, at: [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55 [ 126.060503] [ 126.060504] but task is already holding lock: [ 126.060506] (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429 [ 126.060511] [ 126.060511] which lock already depends on the new lock. [ 126.060513] [ 126.060514] [ 126.060514] the existing dependency chain (in reverse order) is: [ 126.060516] [ 126.060516] -> #1 (&p->pi_lock){-...-.}: [ 126.060519] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a [ 126.060524] [<ffffffff8150291e>] _raw_spin_lock_irqsave+0x4b/0x85 [ 126.060527] [<ffffffff810b5aa4>] task_blocks_on_rt_mutex+0x36/0x20f [ 126.060531] [<ffffffff815019bb>] rt_mutex_slowlock+0xd1/0x15a [ 126.060534] [<ffffffff81501ae3>] rt_mutex_lock+0x2d/0x2f [ 126.060537] [<ffffffff810d9020>] rcu_boost+0xad/0xde [ 126.060541] [<ffffffff810d90ce>] rcu_boost_kthread+0x7d/0x9b [ 126.060544] [<ffffffff8109a760>] kthread+0x99/0xa1 [ 126.060547] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10 [ 126.060551] [ 126.060552] -> #0 (&(lock)->wait_lock#2){+.+...}: [ 126.060555] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816 [ 126.060558] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a [ 126.060561] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73 [ 126.060564] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55 [ 126.060566] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29 [ 126.060569] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4 [ 126.060573] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89 [ 126.060576] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5 [ 126.060580] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429 [ 126.060583] [<ffffffff81075425>] wake_up_process+0x15/0x17 [ 126.060585] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26 [ 126.060590] [<ffffffff81081df9>] irq_exit+0x49/0x55 [ 126.060593] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98 [ 126.060597] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20 [ 126.060600] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44 [ 126.060603] [<ffffffff810d582c>] irq_thread+0xde/0x1af [ 126.060606] [<ffffffff8109a760>] kthread+0x99/0xa1 [ 126.060608] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10 [ 126.060611] [ 126.060612] other info that might help us debug this: [ 126.060614] [ 126.060615] Possible unsafe locking scenario: [ 126.060616] [ 126.060617] CPU0 CPU1 [ 126.060619] ---- ---- [ 126.060620] lock(&p->pi_lock); [ 126.060623] lock(&(lock)->wait_lock); [ 126.060625] lock(&p->pi_lock); [ 126.060627] lock(&(lock)->wait_lock); [ 126.060629] [ 126.060629] *** DEADLOCK *** [ 126.060630] [ 126.060632] 1 lock held by irq/24-eth0/1235: [ 126.060633] #0: (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429 [ 126.060638] [ 126.060638] stack backtrace: [ 126.060641] Pid: 1235, comm: irq/24-eth0 Not tainted 3.0.1-rt10+ #30 [ 126.060643] Call Trace: [ 126.060644] <IRQ> [<ffffffff810acbde>] print_circular_bug+0x289/0x29a [ 126.060651] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816 [ 126.060655] [<ffffffff810ab3aa>] ? trace_hardirqs_off_caller+0x1f/0x99 [ 126.060658] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55 [ 126.060661] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a [ 126.060664] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55 [ 126.060668] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73 [ 126.060671] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55 [ 126.060674] [<ffffffff810d9655>] ? rcu_report_qs_rsp+0x87/0x8c [ 126.060677] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55 [ 126.060680] [<ffffffff810d9ea3>] ? rcu_read_unlock_special+0x9b/0x1c4 [ 126.060683] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29 [ 126.060687] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4 [ 126.060690] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89 [ 126.060693] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5 [ 126.060696] [<ffffffff810683da>] ? select_task_rq_rt+0x27/0xd5 [ 126.060701] [<ffffffff810a852a>] ? clockevents_program_event+0x8e/0x90 [ 126.060704] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429 [ 126.060708] [<ffffffff810a95dc>] ? tick_program_event+0x1f/0x21 [ 126.060711] [<ffffffff81075425>] wake_up_process+0x15/0x17 [ 126.060715] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26 [ 126.060718] [<ffffffff81081df9>] irq_exit+0x49/0x55 [ 126.060721] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98 [ 126.060724] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20 [ 126.060726] <EOI> [<ffffffff81072855>] ? migrate_disable+0x75/0x12d [ 126.060733] [<ffffffff81080a61>] ? local_bh_disable+0xe/0x1f [ 126.060736] [<ffffffff81080a70>] ? local_bh_disable+0x1d/0x1f [ 126.060739] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44 [ 126.060742] [<ffffffff81502ac0>] ? _raw_spin_unlock_irq+0x3b/0x59 [ 126.060745] [<ffffffff810d582c>] irq_thread+0xde/0x1af [ 126.060748] [<ffffffff810d5937>] ? irq_thread_fn+0x3a/0x3a [ 126.060751] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1 [ 126.060754] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1 [ 126.060757] [<ffffffff8109a760>] kthread+0x99/0xa1 [ 126.060761] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10 [ 126.060764] [<ffffffff81069ed7>] ? finish_task_switch+0x87/0x10a [ 126.060768] [<ffffffff81502ec4>] ? retint_restore_args+0xe/0xe [ 126.060771] [<ffffffff8109a6c7>] ? __init_kthread_worker+0x8c/0x8c [ 126.060774] [<ffffffff81509b10>] ? gs_change+0xb/0xb Because irq_exit() does: void irq_exit(void) { account_system_vtime(current); trace_hardirq_exit(); sub_preempt_count(IRQ_EXIT_OFFSET); if (!in_interrupt() && local_softirq_pending()) invoke_softirq(); ... } Which triggers a wakeup, which uses RCU, now if the interrupted task has t->rcu_read_unlock_special set, the rcu usage from the wakeup will end up in rcu_read_unlock_special(). rcu_read_unlock_special() will test for in_irq(), which will fail as we just decremented preempt_count with IRQ_EXIT_OFFSET, and in_sering_softirq(), which for PREEMPT_RT_FULL reads: int in_serving_softirq(void) { int res; preempt_disable(); res = __get_cpu_var(local_softirq_runner) == current; preempt_enable(); return res; } Which will thus also fail, resulting in the above wreckage. The 'somewhat' ugly solution is to open-code the preempt_count() test in rcu_read_unlock_special(). Also, we're not at all sure how ->rcu_read_unlock_special gets set here... so this is very likely a bandaid and more thought is required. Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2012-02-15timer-handle-idle-trylock-in-get-next-timer-irq.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15rwlocks: Fix section mismatchJohn Kacur
This fixes the following build error for the preempt-rt kernel. make kernel/fork.o CC kernel/fork.o kernel/fork.c:90: error: section of ¡tasklist_lock¢ conflicts with previous declaration make[2]: *** [kernel/fork.o] Error 1 make[1]: *** [kernel/fork.o] Error 2 The rt kernel cache aligns the RWLOCK in DEFINE_RWLOCK by default. The non-rt kernels explicitly cache align only the tasklist_lock in kernel/fork.c That can create a build conflict. This fixes the build problem by making the non-rt kernels cache align RWLOCKs by default. The side effect is that the other RWLOCKs are also cache aligned for non-rt. This is a short term solution for rt only. The longer term solution would be to push the cache aligned DEFINE_RWLOCK to mainline. If there are objections, then we could create a DEFINE_RWLOCK_CACHE_ALIGNED or something of that nature. Comments? Objections? Signed-off-by: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/alpine.LFD.2.00.1109191104010.23118@localhost6.localdomain6 Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15rt: Add the preempt-rt lock replacement APIsThomas Gleixner
Map spinlocks, rwlocks, rw_semaphores and semaphores to the rt_mutex based locking functions for preempt-rt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15rt-mutex-add-sleeping-spinlocks-support.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15futex: Fix bug on when a requeued RT task times outSteven Rostedt
Requeue with timeout causes a bug with PREEMPT_RT_FULL. The bug comes from a timed out condition. TASK 1 TASK 2 ------ ------ futex_wait_requeue_pi() futex_wait_queue_me() <timed out> double_lock_hb(); raw_spin_lock(pi_lock); if (current->pi_blocked_on) { } else { current->pi_blocked_on = PI_WAKE_INPROGRESS; run_spin_unlock(pi_lock); spin_lock(hb->lock); <-- blocked! plist_for_each_entry_safe(this) { rt_mutex_start_proxy_lock(); task_blocks_on_rt_mutex(); BUG_ON(task->pi_blocked_on)!!!! The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to grab the hb->lock, which it fails to do so. As the hb->lock is a mutex, it will block and set the "pi_blocked_on" to the hb->lock. When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGESS fails because the task1's pi_blocked_on is no longer set to that, but instead, set to the hb->lock. The fix: When calling rt_mutex_start_proxy_lock() a check is made to see if the proxy tasks pi_blocked_on is set. If so, exit out early. Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies the proxy task that it is being requeued, and will handle things appropriately. Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15rtmutex-futex-prepare-rt.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15rtmutex-lock-killable.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15genirq: Allow disabling of softirq processing in irq thread contextThomas Gleixner
The processing of softirqs in irq thread context is a performance gain for the non-rt workloads of a system, but it's counterproductive for interrupts which are explicitely related to the realtime workload. Allow such interrupts to prevent softirq processing in their thread context. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
2012-02-15tasklet: Prevent tasklets from going into infinite spin in RTIngo Molnar
When CONFIG_PREEMPT_RT_FULL is enabled, tasklets run as threads, and spinlocks turn are mutexes. But this can cause issues with tasks disabling tasklets. A tasklet runs under ksoftirqd, and if a tasklets are disabled with tasklet_disable(), the tasklet count is increased. When a tasklet runs, it checks this counter and if it is set, it adds itself back on the softirq queue and returns. The problem arises in RT because ksoftirq will see that a softirq is ready to run (the tasklet softirq just re-armed itself), and will not sleep, but instead run the softirqs again. The tasklet softirq will still see that the count is non-zero and will not execute the tasklet and requeue itself on the softirq again, which will cause ksoftirqd to run it again and again and again. It gets worse because ksoftirqd runs as a real-time thread. If it preempted the task that disabled tasklets, and that task has migration disabled, or can't run for other reasons, the tasklet softirq will never run because the count will never be zero, and ksoftirqd will go into an infinite loop. As an RT task, it this becomes a big problem. This is a hack solution to have tasklet_disable stop tasklets, and when a tasklet runs, instead of requeueing the tasklet softirqd it delays it. When tasklet_enable() is called, and tasklets are waiting, then the tasklet_enable() will kick the tasklets to continue. This prevents the lock up from ksoftirq going into an infinite loop. [ rostedt@goodmis.org: ported to 3.0-rt ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15softirq-make-fifo.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15softirq: Fix unplug deadlockPeter Zijlstra
If ksoftirqd gets woken during hot-unplug, __thread_do_softirq() will call pin_current_cpu() which will block on the held cpu_hotplug.lock. Moving the offline check in __thread_do_softirq() before the pin_current_cpu() call doesn't work, since the wakeup can happen before we mark the cpu offline. So here we have the ksoftirq thread stuck until hotplug finishes, but then the ksoftirq CPU_DOWN notifier issues kthread_stop() which will wait for the ksoftirq thread to go away -- while holding the hotplug lock. Sort this by delaying the kthread_stop() until CPU_POST_DEAD, which is outside of the cpu_hotplug.lock, but still serialized by the cpu_add_remove_lock. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: rostedt <rostedt@goodmis.org> Cc: Clark Williams <williams@redhat.com> Link: http://lkml.kernel.org/r/1317391156.12973.3.camel@twins Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15softirq: Export in_serving_softirq()John Kacur
ERROR: "in_serving_softirq" [net/sched/cls_cgroup.ko] undefined! The above can be fixed by exporting in_serving_softirq Signed-off-by: John Kacur <jkacur@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: stable-rt@vger.kernel.org Link: http://lkml.kernel.org/r/1321235083-21756-2-git-send-email-jkacur@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15softirq-local-lock.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15mutex-no-spin-on-rt.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15lockdep-rt.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15softirq: Sanitize softirq pending for NOHZ/RTThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15ring-buffer: Convert reader_lock from raw_spin_lock into spin_lockSteven Rostedt
The reader_lock is mostly taken in normal context with interrupts enabled. But because ftrace_dump() can happen anywhere, it is used as a spin lock and in some cases a check to in_nmi() is performed to determine if the ftrace_dump() was initiated from an NMI and if it is, the lock is not taken. But having the lock as a raw_spin_lock() causes issues with the real-time kernel as the lock is held during allocation and freeing of the buffer. As memory locks convert into mutexes, keeping the reader_lock as a spin_lock causes problems. Converting the reader_lock is not straight forward as we must still deal with the ftrace_dump() happening not only from an NMI but also from true interrupt context in PREEPMT_RT. Two wrapper functions are created to take and release the reader lock: int read_buffer_lock(cpu_buffer, unsigned long *flags) void read_buffer_unlock(cpu_buffer, unsigned long flags, int locked) The read_buffer_lock() returns 1 if it actually took the lock, disables interrupts and updates the flags. The only time it returns 0 is in the case of a ftrace_dump() happening in an unsafe context. The read_buffer_unlock() checks the return of locked and will simply unlock the spin lock if it was successfully taken. Instead of just having this in specific cases that the NMI might call into, all instances of the reader_lock is converted to the wrapper functions to make this a bit simpler to read and less error prone. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <clark@redhat.com> Link: http://lkml.kernel.org/r/1317146210.26514.33.camel@gandalf.stny.rr.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15ftrace-crap.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15sched-clear-pf-thread-bound-on-fallback-rq.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15sched: Have migrate_disable ignore bounded threadsPeter Zijlstra
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <williams@redhat.com> Link: http://lkml.kernel.org/r/20110927124423.567944215@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15sched: Do not compare cpu masks in schedulerPeter Zijlstra
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <williams@redhat.com> Link: http://lkml.kernel.org/r/20110927124423.128129033@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15sched: Postpone actual migration disalbe to scheduleSteven Rostedt
The migrate_disable() can cause a bit of a overhead to the RT kernel, as changing the affinity is expensive to do at every lock encountered. As a running task can not migrate, the actual disabling of migration does not need to occur until the task is about to schedule out. In most cases, a task that disables migration will enable it before it schedules making this change improve performance tremendously. [ Frank Rowand: UP compile fix ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <williams@redhat.com> Link: http://lkml.kernel.org/r/20110927124422.779693167@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15sched: teach migrate_disable about atomic contextsPeter Zijlstra
<NMI> [<ffffffff812dafd8>] spin_bug+0x94/0xa8 [<ffffffff812db07f>] do_raw_spin_lock+0x43/0xea [<ffffffff814fa9be>] _raw_spin_lock_irqsave+0x6b/0x85 [<ffffffff8106ff9e>] ? migrate_disable+0x75/0x12d [<ffffffff81078aaf>] ? pin_current_cpu+0x36/0xb0 [<ffffffff8106ff9e>] migrate_disable+0x75/0x12d [<ffffffff81115b9d>] pagefault_disable+0xe/0x1f [<ffffffff81047027>] copy_from_user_nmi+0x74/0xe6 [<ffffffff810489d7>] perf_callchain_user+0xf3/0x135 Now clearly we can't go around taking locks from NMI context, cure this by short-circuiting migrate_disable() when we're in an atomic context already. Add some extra debugging to avoid things like: preempt_disable() migrate_disable(); preempt_enable(); migrate_enable(); Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1314967297.1301.14.camel@twins Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/n/tip-wbot4vsmwhi8vmbf83hsclk6@git.kernel.org
2012-02-15sched, rt: Fix migrate_enable() thinkoMike Galbraith
Assigning mask = tsk_cpus_allowed(p) after p->migrate_disable = 0 ensures that we won't see a mask change.. no push/pull, we stack tasks on one CPU. Also add a couple fields to sched_debug for the next guy. [ Build fix from Stratos Psomadakis <psomas@gentoo.org> ] Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1314108763.6689.4.camel@marge.simson.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15sched: Generic migrate_disablePeter Zijlstra
Make migrate_disable() be a preempt_disable() for !rt kernels. This allows generic code to use it but still enforces that these code sections stay relatively small. A preemptible migrate_disable() accessible for general use would allow people growing arbitrary per-cpu crap instead of clean these things up. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-275i87sl8e1jcamtchmehonm@git.kernel.org
2012-02-15sched: Optimize migrate_disablePeter Zijlstra
Change from task_rq_lock() to raw_spin_lock(&rq->lock) to avoid a few atomic ops. See comment on why it should be safe. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-cbz6hkl5r5mvwtx5s3tor2y6@git.kernel.org
2012-02-15tracing: Show padding as unsigned shortSteven Rostedt
RT added two bytes to trace migrate disable counting to the trace events and used two bytes of the padding to make the change. The structures and all were updated correctly, but the display in the event formats was not: cat /debug/tracing/events/sched/sched_switch/format name: sched_switch ID: 51 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:unsigned short common_migrate_disable; offset:8; size:2; signed:0; field:int common_padding; offset:10; size:2; signed:0; The field for common_padding has the correct size and offset, but the use of "int" might confuse some parsers (and people that are reading it). This needs to be changed to "unsigned short". Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1321467575.4181.36.camel@frodo Cc: stable-rt@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15ftrace-migrate-disable-tracing.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15hotplug: Call cpu_unplug_begin() before DOWN_PREPAREYong Zhang
cpu_unplug_begin() should be called before CPU_DOWN_PREPARE, because at CPU_DOWN_PREPARE cpu_active is cleared and sched_domain is rebuilt. Otherwise the 'sync_unplug' thread will be running on the cpu on which it's created and not bound on the cpu which is about to go down. I found that by an incorrect warning on smp_processor_id() called by sync_unplug/1, and trace shows below: (echo 1 > /sys/device/system/cpu/cpu1/online) bash-1664 [000] 83.136620: _cpu_down: Bind sync_unplug to cpu 1 bash-1664 [000] 83.136623: sched_wait_task: comm=sync_unplug/1 pid=1724 prio=120 bash-1664 [000] 83.136624: _cpu_down: Wake sync_unplug bash-1664 [000] 83.136629: sched_wakeup: comm=sync_unplug/1 pid=1724 prio=120 success=1 target_cpu=000 Wants to be folded back.... Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Link: http://lkml.kernel.org/r/1318762607-2261-3-git-send-email-yong.zhang0@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15hotplug-use-migrate-disable.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15sched-migrate-disable.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15hotplug: Reread hotplug_pcp on pin_current_cpu() retryYong Zhang
When retry happens, it's likely that the task has been migrated to another cpu (except unplug failed), but it still derefernces the original hotplug_pcp per cpu data. Update the pointer to hotplug_pcp in the retry path, so it points to the current cpu. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110728031600.GA338@windriver.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>